proxmox/docs/03-deployment/MIGRATE_CT_R630_01_TO_R630_02.md

Migrate LXC Containers from r630-01 to r630-02

Purpose: Free space on r630-01's LVM thin pool (data) by moving selected containers to r630-02. Use when the pool is near or at 100% (e.g. to stabilise 2101 and other Besu nodes).

Hosts:

  • Source: r630-01 — 192.168.11.11 (pool data 200G; ~74.48% after all migrations: 5200-5202, 6000-6002, 6400-6402, 5700)
  • Target: r630-02 — 192.168.11.12 (pools: thin1-r630-02, thin2-thin6; see pvesm status on r630-02)

Completed 2026-02-15: CTs 5200, 5201, 5202 (Cacti), 6000, 6001, 6002 (Fabric), 6400, 6401, 6402 (Indy), 5700 (dev-vm) migrated from r630-01 to r630-02 (backup → copy → destroy on source → restore on target → start). Storage: Cacti → thin1-r630-02; Fabric → thin2; Indy + dev-vm → thin6. r630-01 pool dropped to 74.48%. Cluster migration (pct migrate) was not used (aliased volumes / storage mismatch). Script: scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh.


1. Check cluster (optional)

If both nodes are in the same Proxmox cluster, you can use live migration and skip backup/restore:

ssh root@192.168.11.11 "pvecm status"
ssh root@192.168.11.12 "pvecm status"

If both show the same cluster and each lists the other node, the migration is a single command:

# From r630-01 (or from any cluster node)
pct migrate <VMID> r630-02 --restart

CLI caveat: pct migrate may fail if the CT references storages that do not exist on the target (e.g. local-lvm on r630-02) or if the source storage ID is inactive on the target (e.g. thin1 on r630-02 vs thin1-r630-02). Remove stale unusedN volumes only after verifying with lvs that they are not the same LV as rootfs (see incident note below).

Recommended (PVE API, maps rootfs to target pool): use pvesh from the source node so disks land on e.g. thin5:

ssh root@192.168.11.11 "pvesh create /nodes/r630-01/lxc/<VMID>/migrate --target r630-02 --target-storage thin5 --restart 1"

This is the path that succeeded for 3501 (ccip-monitor) on 2026-03-28.

Storage will be copied to the target. The source volume is removed after a successful migrate. Do not use pct set <vmid> --delete unused0 when unused0 and rootfs both name vm-<id>-disk-0 on different storages — Proxmox can delete the only root LV (Oracle publisher 3500 incident, 2026-03-28).
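To make that rootfs/unused0 comparison explicit before deleting anything, a guard along these lines can help. This is a sketch: the inline CONFIG sample (VMID 3500 values) is illustrative; on a real host populate it from pct config as the comment shows.

```shell
# Guard before deleting an unusedN volume: refuse when it names the same
# LV as rootfs. CONFIG below is a sample `pct config <VMID>` excerpt;
# on the real host use:
#   CONFIG=$(ssh root@192.168.11.11 "pct config <VMID>")
CONFIG='rootfs: thin1-r630-02:vm-3500-disk-0,size=20G
unused0: thin1:vm-3500-disk-0'

# Field 3 with separators ':' and ',' is the disk/LV name on each line.
root_disk=$(echo "$CONFIG" | awk -F'[:,]' '/^rootfs:/  {print $3}')
unused_disk=$(echo "$CONFIG" | awk -F'[:,]' '/^unused0:/ {print $3}')

if [ "$root_disk" = "$unused_disk" ]; then
  echo "REFUSE: unused0 names the same LV ($root_disk) as rootfs"
else
  echo "ok to delete unused0 ($unused_disk)"
fi
```

With the sample config above (the 3500 incident shape), the guard prints the REFUSE line; deleting unused0 there would have removed the only root LV.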

If the nodes are not in a cluster, use the backup/restore method below.


2. Migration by backup/restore (standalone nodes)

Use this when there is no cluster or when you prefer a full backup before moving.

Prerequisites

  • SSH as root to both 192.168.11.11 and 192.168.11.12
  • Enough free space on r630-01 for the backup (or use a temporary NFS/shared path)
  • Enough free space on r630-02 in the chosen storage (e.g. thin1)

Steps (one container)

Replace <VMID> (e.g. 5200) and <TARGET_STORAGE> (e.g. thin1) as needed.

1. Stop the container on r630-01

ssh root@192.168.11.11 "pct stop <VMID>"

2. Create backup on r630-01

ssh root@192.168.11.11 "vzdump <VMID> --mode stop --compress zstd --dumpdir /var/lib/vz/dump"

Backup file will be under /var/lib/vz/dump/ (e.g. vzdump-lxc-<VMID>-*.tar.zst).

3. Copy backup to r630-02

BACKUP=$(ssh root@192.168.11.11 "ls -t /var/lib/vz/dump/vzdump-lxc-<VMID>-*.tar.zst 2>/dev/null | head -1")
scp "root@192.168.11.11:$BACKUP" /tmp/
scp "/tmp/$(basename "$BACKUP")" root@192.168.11.12:/var/lib/vz/dump/

Or from r630-01:

BACKUP=$(ls -t /var/lib/vz/dump/vzdump-lxc-<VMID>-*.tar.zst 2>/dev/null | head -1)
scp "$BACKUP" root@192.168.11.12:/var/lib/vz/dump/

4. Restore on r630-02

ssh root@192.168.11.12 "pct restore <VMID> /var/lib/vz/dump/$(basename "$BACKUP") --storage <TARGET_STORAGE>"

If the config has a fixed rootfs size (e.g. 50G), use:

ssh root@192.168.11.12 "pct restore <VMID> /var/lib/vz/dump/vzdump-lxc-<VMID>-*.tar.zst --storage thin1 --rootfs thin1:50"

5. Start container on r630-02

ssh root@192.168.11.12 "pct start <VMID>"
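Before proceeding to step 6, it is worth confirming the CT actually came up. A sketch: the inline STATUS sample is illustrative; on the real host populate it via ssh as the comment shows, and also check the service on its (unchanged) IP.

```shell
# Verify the restored CT is running before destroying the source copy.
# STATUS below is a sample; on the real host use:
#   STATUS=$(ssh root@192.168.11.12 "pct status <VMID>")
STATUS='status: running'

case "$STATUS" in
  "status: running") echo "OK: CT is running on r630-02" ;;
  *)                 echo "NOT running: do not destroy the source yet"; exit 1 ;;
esac

# Then confirm the guest answers on its unchanged IP, e.g.:
#   ping -c 3 <CT_IP>
```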

6. Free space on r630-01 (destroy original)

Only after you have verified the container works on r630-02:

ssh root@192.168.11.11 "pct destroy <VMID> --purge 1"

7. Update docs and scripts

  • Update any references that assume the container runs on r630-01 (e.g. config/ip-addresses.conf comments, runbooks, maintenance scripts). The IP does not change; only the Proxmox host changes.
  • If something (e.g. NPM, firewall) was keyed by host, point it at the same IP (unchanged).

3. Good candidates to migrate

Containers that free meaningful space on r630-01 and are reasonable to run on r630-02 (same LAN, same IP after move).

| VMID | Name / role | Approx. size (virtual) | Notes |
| --- | --- | --- | --- |
| 5200 | cacti-1 | 50G | Migrated (thin1-r630-02) |
| 5201 | cacti-alltra-1 | 50G | Migrated (thin1-r630-02) |
| 5202 | cacti-hybx-1 | 50G | Migrated (thin1-r630-02) |
| 6000 | fabric-1 | 50G | Migrated (thin2) |
| 6001 | fabric-alltra-1 | 100G | Migrated (thin2) |
| 6002 | fabric-hybx-1 | 100G | Migrated (thin2) |
| 6400 | indy-1 | 50G | Migrated (thin6) |
| 6401 | indy-alltra-1 | 100G | Migrated (thin6) |
| 6402 | indy-hybx-1 | 100G | Migrated (thin6) |
| 5700 | dev-vm | 400G (thin) | Migrated (thin6) |
| 3500 | oracle-publisher-1 | 20G | thin1 (was). 2026-03-28: root LV accidentally removed; CT recreated on r630-02 thin5 (fresh template). Redeploy app + .env. |
| 3501 | ccip-monitor-1 | 20G | 2026-03-28: migrated to r630-02 thin5 via pvesh … /migrate --target-storage thin5. Networking: an unprivileged Ubuntu image may leave eth0 DOWN after migrate; unprivileged cannot be toggled later. Mitigation: on r630-02 install scripts/maintenance/pct-lxc-3501-net-up.sh to /usr/local/sbin/ and an optional @reboot cron (see script header). |

High impact (larger disks):

  • 5700 (dev-vm) — 400G virtual (only ~5% used). Migrating it removes a large allocation from the source thin pool; the space actually freed depends on real usage. Consider moving it to r630-02 to avoid future pool pressure.

Do not migrate (keep on r630-01 for now):

  • 2101 (Core RPC) — critical; fix pool first, then decide.
  • 2500-2505 (RPC nodes) — same pool pressure; migrate only after pool is healthy or after moving other CTs.
  • 10130, 10150, 10151 (DBIS) — core apps; move only with a clear plan.
  • 1000-1502 (validators/sentries) — chain consensus; treat as critical.

4. Check storage on r630-02

Before restoring, confirm target storage name and space:

ssh root@192.168.11.12 "pvesm status"
ssh root@192.168.11.12 "lvs -o lv_name,data_percent,size"

Use a pool that has free space (e.g. thin1 at <85% or another thin*).
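Rather than eyeballing the output, the pool choice can be filtered mechanically. A sketch, assuming the usual pvesm status column layout (Name, Type, Status, Total, Used, Available, %); the sample rows and the 85% cutoff are illustrative.

```shell
# List lvmthin storages under 85% used as candidate targets.
# STATUS below is sample output; on the real host use:
#   STATUS=$(ssh root@192.168.11.12 "pvesm status")
STATUS='Name              Type     Status     Total      Used  Available    %
local              dir      active  98559220  12345678   81234567  12.53%
thin1-r630-02  lvmthin      active 242221056 215000000   27221056  88.76%
thin5          lvmthin      active 242221056  60000000  182221056  24.77%'

# $7+0 coerces "24.77%" to a number, dropping the % sign.
echo "$STATUS" | awk '$2 == "lvmthin" && $7+0 < 85 {print $1, $7}'
```

With the sample data only thin5 passes the filter, which matches the pool actually used for the 3500/3501 moves.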


5. Scripted single-CT migration

From project root you can run (script below):

./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> [target_storage]

Example:

./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 5200 thin1

See the script for exact steps (stop, vzdump, scp, restore, start, optional destroy on source).

Unprivileged CTs: vzdump often fails with a tar "Permission denied" error under lxc-usernsexec. For those guests, prefer the pvesh … /migrate call with --target-storage from section 1 instead of this script.

5a. Reprovision Oracle Publisher (VMID 3500) on r630-02

After a fresh LXC template or data loss, from project root (LAN, secrets loaded):

source scripts/lib/load-project-env.sh   # or ensure PRIVATE_KEY / smom-dbis-138/.env
./scripts/deployment/provision-oracle-publisher-lxc-3500.sh

Uses web3 6.x (POA middleware). If on-chain updateAnswer fails, use a PRIVATE_KEY for an EOA allowed on the aggregator contract.

5b. r630-02 disk / VG limits (cannot automate)

Each of the thin1-thin6 VGs on r630-02 is a single ~231 GiB SSD with ~124 MiB vg_free. There is no space to lvextend the pools until you grow the partition/PV or add hardware. Guest fstrim and migration to thin5 reduce data usage only within the existing pools.
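The "no headroom" claim can be re-checked per VG from vgs output. A sketch: the inline VGS sample mirrors the ~124 MiB figure above, and the 1 GiB threshold is an arbitrary illustration, not a Proxmox rule.

```shell
# Classify each VG by whether there is room to lvextend its thin pool.
# VGS below is a sample of `vgs --noheadings -o vg_name,vg_free --units m`;
# on the real host use:
#   VGS=$(ssh root@192.168.11.12 "vgs --noheadings -o vg_name,vg_free --units m")
VGS='  thin1   124.00m
  thin5   124.00m'

echo "$VGS" | awk '{
  free = $2 + 0                                    # strip the "m" unit suffix
  verdict = (free > 1024) ? "extendable" : "full"  # 1 GiB threshold (illustrative)
  print $1, verdict
}'
```

With ~124 MiB free everywhere, every VG classifies as full, which is why growing the partition/PV (or adding hardware) is the only way out.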


6. References