proxmox/docs/03-deployment/MIGRATE_CT_R630_01_TO_R630_02.md

Migrate LXC Containers from r630-01 to r630-02

Purpose: Free space on r630-01's LVM thin pool (data) by moving selected containers to r630-02. Use when the pool is near or at 100% (e.g. to stabilise 2101 and other Besu nodes).

Hosts:

  • Source: r630-01 — 192.168.11.11 (pool data 200G; ~74.48% after all migrations: 5200-5202, 6000-6002, 6400-6402, 5700)
  • Target: r630-02 — 192.168.11.12 (pools: thin1-r630-02, thin2-thin6; see pvesm status on r630-02)

Completed 2026-02-15: CTs 5200, 5201, 5202 (Cacti), 6000, 6001, 6002 (Fabric), 6400, 6401, 6402 (Indy), 5700 (dev-vm) migrated from r630-01 to r630-02 (backup → copy → destroy on source → restore on target → start). Storage: Cacti → thin1-r630-02; Fabric → thin2; Indy + dev-vm → thin6. r630-01 pool dropped to 74.48%. Cluster migration (pct migrate) was not used (aliased volumes / storage mismatch). Script: scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh.


1. Check cluster (optional)

If both nodes are in the same Proxmox cluster, you can use live migration and skip backup/restore:

ssh root@192.168.11.11 "pvecm status"
ssh root@192.168.11.12 "pvecm status"

If both show the same cluster and each lists the other node, the migration is a single command:

# From r630-01 (or from any cluster node)
pct migrate <VMID> r630-02 --restart

CLI caveat: pct migrate may fail if the CT references storages that do not exist on the target (e.g. local-lvm on r630-02) or if the source storage ID is inactive on the target (e.g. thin1 on r630-02 vs thin1-r630-02). Remove stale unusedN volumes only after verifying with lvs that they are not the same LV as rootfs (see incident note below).

Recommended (PVE API, maps rootfs to target pool): use pvesh from the source node so disks land on e.g. thin5:

ssh root@192.168.11.11 "pvesh create /nodes/r630-01/lxc/<VMID>/migrate --target r630-02 --target-storage thin5 --restart 1"

This is the path that succeeded for 3501 (ccip-monitor) on 2026-03-28.

Storage will be copied to the target. The source volume is removed after a successful migrate. Do not use pct set <vmid> --delete unused0 when unused0 and rootfs both name vm-<id>-disk-0 on different storages — Proxmox can delete the only root LV (Oracle publisher 3500 incident, 2026-03-28).
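To make that rootfs/unused0 comparison explicit before deleting anything, a guard along these lines can help. This is a sketch: the inline CONFIG sample (VMID 3500 values) is illustrative; on a real host populate it from pct config as the comment shows.

```shell
# Guard before deleting an unusedN volume: refuse when it names the same
# LV as rootfs. CONFIG below is a sample `pct config <VMID>` excerpt;
# on the real host use:
#   CONFIG=$(ssh root@192.168.11.11 "pct config <VMID>")
CONFIG='rootfs: thin1-r630-02:vm-3500-disk-0,size=20G
unused0: thin1:vm-3500-disk-0'

# Field 3 with separators ':' and ',' is the disk/LV name on each line.
root_disk=$(echo "$CONFIG" | awk -F'[:,]' '/^rootfs:/  {print $3}')
unused_disk=$(echo "$CONFIG" | awk -F'[:,]' '/^unused0:/ {print $3}')

if [ "$root_disk" = "$unused_disk" ]; then
  echo "REFUSE: unused0 names the same LV ($root_disk) as rootfs"
else
  echo "ok to delete unused0 ($unused_disk)"
fi
```

With the sample config above (the 3500 incident shape), the guard prints the REFUSE line; deleting unused0 there would have removed the only root LV.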

If the nodes are not in a cluster, use the backup/restore method below.


2. Migration by backup/restore (standalone nodes)

Use this when there is no cluster or when you prefer a full backup before moving.

Prerequisites

  • SSH as root to both 192.168.11.11 and 192.168.11.12
  • Enough free space on r630-01 for the backup (or use a temporary NFS/shared path)
  • Enough free space on r630-02 in the chosen storage (e.g. thin1)

Steps (one container)

Replace <VMID> (e.g. 5200) and <TARGET_STORAGE> (e.g. thin1) as needed.

1. Stop the container on r630-01

ssh root@192.168.11.11 "pct stop <VMID>"

2. Create backup on r630-01

ssh root@192.168.11.11 "vzdump <VMID> --mode stop --compress zstd --dumpdir /var/lib/vz/dump"

Backup file will be under /var/lib/vz/dump/ (e.g. vzdump-lxc-<VMID>-*.tar.zst).

3. Copy backup to r630-02

BACKUP=$(ssh root@192.168.11.11 "ls -t /var/lib/vz/dump/vzdump-lxc-<VMID>-*.tar.zst 2>/dev/null | head -1")
scp "root@192.168.11.11:$BACKUP" /tmp/
scp "/tmp/$(basename "$BACKUP")" root@192.168.11.12:/var/lib/vz/dump/

Or from r630-01:

BACKUP=$(ls -t /var/lib/vz/dump/vzdump-lxc-<VMID>-*.tar.zst 2>/dev/null | head -1)
scp "$BACKUP" root@192.168.11.12:/var/lib/vz/dump/

4. Restore on r630-02

ssh root@192.168.11.12 "pct restore <VMID> /var/lib/vz/dump/$(basename "$BACKUP") --storage <TARGET_STORAGE>"

If the config has a fixed rootfs size (e.g. 50G), use:

ssh root@192.168.11.12 "pct restore <VMID> /var/lib/vz/dump/vzdump-lxc-<VMID>-*.tar.zst --storage thin1 --rootfs thin1:50"

5. Start container on r630-02

ssh root@192.168.11.12 "pct start <VMID>"
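Before proceeding to step 6, it is worth confirming the CT actually came up. A sketch: the inline STATUS sample is illustrative; on the real host populate it via ssh as the comment shows, and also check the service on its (unchanged) IP.

```shell
# Verify the restored CT is running before destroying the source copy.
# STATUS below is a sample; on the real host use:
#   STATUS=$(ssh root@192.168.11.12 "pct status <VMID>")
STATUS='status: running'

case "$STATUS" in
  "status: running") echo "OK: CT is running on r630-02" ;;
  *)                 echo "NOT running: do not destroy the source yet"; exit 1 ;;
esac

# Then confirm the guest answers on its unchanged IP, e.g.:
#   ping -c 3 <CT_IP>
```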

6. Free space on r630-01 (destroy original)

Only after you have verified the container works on r630-02:

ssh root@192.168.11.11 "pct destroy <VMID> --purge 1"

7. Update docs and scripts

  • Update any references that assume the container runs on r630-01 (e.g. config/ip-addresses.conf comments, runbooks, maintenance scripts). The IP does not change; only the Proxmox host changes.
  • If something (e.g. NPM, firewall) was keyed by host, point it at the same IP (unchanged).

3. Good candidates to migrate

Containers that free meaningful space on r630-01 and are reasonable to run on r630-02 (same LAN, same IP after move).

| VMID | Name / role | Approx. size (virtual) | Notes |
| --- | --- | --- | --- |
| 5200 | cacti-1 | 50G | Migrated (thin1-r630-02) |
| 5201 | cacti-alltra-1 | 50G | Migrated (thin1-r630-02) |
| 5202 | cacti-hybx-1 | 50G | Migrated (thin1-r630-02) |
| 6000 | fabric-1 | 50G | Migrated (thin2) |
| 6001 | fabric-alltra-1 | 100G | Migrated (thin2) |
| 6002 | fabric-hybx-1 | 100G | Migrated (thin2) |
| 6400 | indy-1 | 50G | Migrated (thin6) |
| 6401 | indy-alltra-1 | 100G | Migrated (thin6) |
| 6402 | indy-hybx-1 | 100G | Migrated (thin6) |
| 5700 | dev-vm | 400G (thin) | Migrated (thin6) |
| 3500 | oracle-publisher-1 | 20G | thin1 (was). 2026-03-28: root LV accidentally removed; CT recreated on r630-02 thin5 (fresh template). Redeploy app + .env. |
| 3501 | ccip-monitor-1 | 20G | 2026-03-28: migrated to r630-02 thin5 via pvesh … /migrate --target-storage thin5. Networking: an unprivileged Ubuntu image may leave eth0 DOWN after migrate; unprivileged cannot be toggled later. Mitigation: on r630-02 install scripts/maintenance/pct-lxc-3501-net-up.sh to /usr/local/sbin/ and an optional @reboot cron (see script header). |

High impact (larger disks):

  • 5700 (dev-vm) — 400G virtual (only ~5% used). Migrating it removes a large allocation from the source thin pool; the space actually freed depends on real usage. Consider moving it to r630-02 to avoid future pool pressure.

Do not migrate (keep on r630-01 for now):

  • 2101 (Core RPC) — critical; fix pool first, then decide.
  • 2500-2505 (RPC nodes) — same pool pressure; migrate only after pool is healthy or after moving other CTs.
  • 10130, 10150, 10151 (DBIS) — core apps; move only with a clear plan.
  • 1000-1502 (validators/sentries) — chain consensus; treat as critical.

4. Check storage on r630-02

Before restoring, confirm target storage name and space:

ssh root@192.168.11.12 "pvesm status"
ssh root@192.168.11.12 "lvs -o lv_name,data_percent,size"

Use a pool that has free space (e.g. thin1 at <85% or another thin*).
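Rather than eyeballing the output, the pool choice can be filtered mechanically. A sketch, assuming the usual pvesm status column layout (Name, Type, Status, Total, Used, Available, %); the sample rows and the 85% cutoff are illustrative.

```shell
# List lvmthin storages under 85% used as candidate targets.
# STATUS below is sample output; on the real host use:
#   STATUS=$(ssh root@192.168.11.12 "pvesm status")
STATUS='Name              Type     Status     Total      Used  Available    %
local              dir      active  98559220  12345678   81234567  12.53%
thin1-r630-02  lvmthin      active 242221056 215000000   27221056  88.76%
thin5          lvmthin      active 242221056  60000000  182221056  24.77%'

# $7+0 coerces "24.77%" to a number, dropping the % sign.
echo "$STATUS" | awk '$2 == "lvmthin" && $7+0 < 85 {print $1, $7}'
```

With the sample data only thin5 passes the filter, which matches the pool actually used for the 3500/3501 moves.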


5. Scripted single-CT migration

From project root you can run (script below):

./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> [target_storage]

Example:

./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 5200 thin1

See the script for exact steps (stop, vzdump, scp, restore, start, optional destroy on source).

Unprivileged CTs: vzdump often fails with a tar "Permission denied" error under lxc-usernsexec. For those guests, prefer the pvesh … /migrate call with --target-storage from section 1 instead of this script.

5a. Reprovision Oracle Publisher (VMID 3500) on r630-02

After a fresh LXC template or data loss, from project root (LAN, secrets loaded):

source scripts/lib/load-project-env.sh   # or ensure PRIVATE_KEY / smom-dbis-138/.env
./scripts/deployment/provision-oracle-publisher-lxc-3500.sh

Uses web3 6.x (POA middleware). If on-chain updateAnswer fails, use a PRIVATE_KEY for an EOA allowed on the aggregator contract.

5b. r630-02 disk / VG limits (cannot automate)

Each of the thin1-thin6 VGs on r630-02 is a single ~231 GiB SSD with ~124 MiB vg_free. There is no space to lvextend the pools until you grow the partition/PV or add hardware. Guest fstrim and migration to thin5 reduce data usage only within the existing pools.
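The "no headroom" claim can be re-checked per VG from vgs output. A sketch: the inline VGS sample mirrors the ~124 MiB figure above, and the 1 GiB threshold is an arbitrary illustration, not a Proxmox rule.

```shell
# Classify each VG by whether there is room to lvextend its thin pool.
# VGS below is a sample of `vgs --noheadings -o vg_name,vg_free --units m`;
# on the real host use:
#   VGS=$(ssh root@192.168.11.12 "vgs --noheadings -o vg_name,vg_free --units m")
VGS='  thin1   124.00m
  thin5   124.00m'

echo "$VGS" | awk '{
  free = $2 + 0                                    # strip the "m" unit suffix
  verdict = (free > 1024) ? "extendable" : "full"  # 1 GiB threshold (illustrative)
  print $1, verdict
}'
```

With ~124 MiB free everywhere, every VG classifies as full, which is why growing the partition/PV (or adding hardware) is the only way out.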


6. References