# Proxmox load balancing runbook

Purpose: Reduce load on the busiest node (r630-01) by migrating selected LXC containers to r630-02. Migrating containers to another host also frees space on r630-01. Note: ml110 is being repurposed as an OPNsense/pfSense WAN aggregator; migrate workloads off ml110 to r630-01/r630-02 before the repurpose — see ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md.

Before you start: If you are considering adding a third or fourth R630 to the cluster, first see PROXMOX_ADD_THIRD_FOURTH_R630_DECISION.md — including whether you already have r630-03/r630-04 (powered off) to bring online.

Spare nodes (storage ready): r630-03 (192.168.11.13) and r630-04 (192.168.11.14) are in the cluster with data / local-lvm active (the shared /etc/pve/storage.cfg lists ml110, r630-01, r630-03, r630-04). For r630-03, you can also place CT disks on thin1-r630-03 through thin6-r630-03 (~226GiB pools, one per SSD). For r630-04, use data/local-lvm (Ceph OSD disks are separate). Scripts: scripts/proxmox/ensure-r630-spare-node-storage.sh, scripts/proxmox/provision-r630-03-six-ssd-thinpools.sh, and optionally scripts/proxmox/pve-spare-host-optional-tuneup.sh.

Current imbalance (typical):

| Node | IP | LXC count | Load (1/5/15) | Notes |
| --- | --- | --- | --- | --- |
| r630-01 | 192.168.11.11 | 58 | 56 / 81 / 92 | Historical sample only; re-check live load before acting |
| r630-02 | 192.168.11.12 | 23 | ~4 / 4 / 4 | Light |
| ml110 | 192.168.11.10 | 18 | ~7 / 7 / 9 | Repurposing to OPNsense/pfSense — migrate workloads off to r630-01/r630-02 |
| r630-03 | 192.168.11.13 | 0 (spare) | low | Migration target — ~1TiB data/local-lvm + thin1-r630-03 through thin6-r630-03 |
| r630-04 | 192.168.11.14 | 0 (spare) | low | Migration target — ~467GiB thin + Ceph OSDs |

Ways to balance:

  1. Cross-host migration (e.g. r630-01 → r630-02, r630-03, or r630-04) — Moves workload off r630-01. IP stays the same if the container uses a static IP; only the Proxmox host changes. (ml110 is no longer a migration target; migrate containers off ml110 first.)
  2. Same-host storage migration (r630-01 data → thin1) — Frees space on the data pool and can improve I/O; does not reduce CPU/load by much. See MIGRATION_PLAN_R630_01_DATA.md.

## 1. Check cluster (live migrate vs backup/restore)

If all nodes are in the same Proxmox cluster, you can try live migration (faster, less downtime):

```sh
ssh root@192.168.11.11 "pvecm status"
ssh root@192.168.11.12 "pvecm status"
```
  • If both show the same cluster name and list each other: use pct migrate <VMID> <target_node> --restart from any cluster node (run on r630-01 or from a host that SSHs to r630-01).
  • If nodes are not in a cluster (or migrate fails due to storage): use backup → copy → restore with the script below.
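The decision above can also be scripted: compare the cluster name that `pvecm status` reports on both hosts. A minimal sketch, assuming the standard `Name:` line in `pvecm status` output — the `cluster_name` and `same_cluster` helper names are hypothetical, not existing scripts:

```sh
# Extract the value of the "Name:" line from a captured `pvecm status` output.
cluster_name() {
  printf '%s\n' "$1" | awk '/^Name:/ {print $2; exit}'
}

# True only if both captured outputs name the same (non-empty) cluster.
same_cluster() {
  a=$(cluster_name "$1")
  b=$(cluster_name "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage: `if same_cluster "$(ssh root@192.168.11.11 pvecm status)" "$(ssh root@192.168.11.12 pvecm status)"; then` use `pct migrate`; otherwise fall back to backup → copy → restore.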

## 2. Cross-host migration (r630-01 → r630-02)

Script (backup/restore; works without shared storage):

```sh
cd /path/to/proxmox

# One container (replace VMID and target storage)
./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> [target_storage] [--destroy-source]

# Examples
./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 3501 thin1 --dry-run
./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 3501 thin1 --destroy-source
```

Target storage on r630-02: Check with ssh root@192.168.11.12 "pvesm status". Common: thin1, thin2, thin5, thin6.
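If you want to pick the target storage mechanically, you can parse `pvesm status` (columns: Name, Type, Status, Total, Used, Available) and take the active thin pool with the most free space. A sketch only — `best_thin_pool` is an illustrative name, and the column layout should be verified against your PVE version:

```sh
# Print the name of the active thin* storage with the largest Available
# column, reading `pvesm status` output on stdin.
best_thin_pool() {
  awk '$1 ~ /^thin/ && $3 == "active" && ($6 + 0) > max { max = $6 + 0; best = $1 }
       END { print best }'
}

# Example: ssh root@192.168.11.12 "pvesm status" | best_thin_pool
```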

If cluster works (live migrate):

```sh
ssh root@192.168.11.11 "pct migrate <VMID> r630-02 --storage thin1 --restart"
# pct migrate moves the CT and removes the source copy itself; pct destroy
# (e.g. pct destroy <VMID> --purge 1) is only needed after the backup/restore path.
```

## 3. Good candidates to move (r630-01 → r630-02)

These containers are safe to move (nothing chain/consensus-critical; static IPs are unaffected) and moving them reduces load. Prefer moving several smaller ones rather than one critical RPC.

| VMID | Name / role | Notes |
| --- | --- | --- |
| 3500 | oracle-publisher-1 | Oracle publisher |
| 3501 | ccip-monitor-1 | CCIP monitor |
| 7804 | gov-portals-dev | Gov portals (already migrated in the past; verify current host) |
| 8640 | vault-phoenix-1 | Vault (if not critical path) |
| 8642 | vault-phoenix-3 | Vault |
| 10232 | CT10232 | Small service |
| 10235 | npmplus-alltra-hybx | NPMplus instance (has its own NPM; update the UDM port forward if needed) |
| 10236 | npmplus-fourth | NPMplus instance |
| 10030–10092 | order-* (identity, intake, finance, etc.) | Order stack; move as a group if desired |
| 10200–10210 | order-prometheus, grafana, opensearch, haproxy | Monitoring/HA; move with order-* or after |
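If live migration works, the single-VMID candidates above can be turned into a reviewable dry run before anything executes. The `print_plan` helper below is illustrative (not an existing script); the order-stack ranges are left out so you can append them once you settle their order:

```sh
# Print the migrate commands for the single-VMID candidates without
# running anything; review the output, then execute on r630-01.
print_plan() {
  target_node=r630-02
  target_storage=thin1   # confirm with: pvesm status on the target
  for vmid in 3500 3501 7804 8640 8642 10232 10235 10236; do
    echo "pct migrate $vmid $target_node --storage $target_storage --restart"
  done
}
print_plan
```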

Do not move (keep on r630-01 for now):

  • 10233 — npmplus (main NPMplus; 76.53.10.36 → .167)
  • 2101 — besu-rpc-core-1 (core RPC for deploy/admin)
  • 2420/2430/2440/2460/2470/2480 — edge/private RPC lanes (critical; migrate only deliberately)
  • 1000–1002, 1500–1502 — validators and sentries (consensus)
  • 10130, 10150, 10151 — dbis-frontend, dbis-api (core apps; move only with a plan)
  • 100, 101, 102, 104, 105 — mail, datacenter, cloudflared, gitea (infra); 103 Omada retired 2026-04-04

## 4. Migrating workloads off ml110 (before OPNsense/pfSense repurpose)

ml110 (192.168.11.10) is being repurposed as an OPNsense/pfSense WAN aggregator (sitting between 6–10 cable modems and the UDM Pros). All containers/VMs on ml110 must be migrated to r630-01 or r630-02 before the repurpose.

  • If cluster: ssh root@192.168.11.10 "pct migrate <VMID> r630-01 --storage <storage> --restart" or ... r630-02 ...
  • If no cluster: Use backup on ml110, copy to r630-01 or r630-02, restore there (see MIGRATE_CT_R630_01_TO_R630_02.md and adapt for source=ml110, target=r630-01 or r630-02).
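To track drain progress, you can parse `pct list` output from ml110 and loop until nothing remains. The `remaining_cts` helper is a sketch, not an existing script:

```sh
# Extract VMIDs from `pct list` output (skip the header row); an empty
# result means the node has been fully drained.
remaining_cts() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Example: ssh root@192.168.11.10 "pct list" | remaining_cts
```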

After all workloads are off ml110, remove ml110 from the cluster (or reinstall the node with OPNsense/pfSense). See ML110_OPNSENSE_PFSENSE_WAN_AGGREGATOR.md.


## 4b. Prepare r630-03 for migrations from r630-01 (mail, TsunamiSwap, 57xx AI, Studio)

Goal: Move selected LXCs off r630-01 onto r630-03 (192.168.11.13) to reduce load. Use cluster online migration with explicit target storage local-lvm (each node has its own pve VG + data thin pool; disks are copied to r630-03).

Verified batch (source r630-01, static IPs on vmbr0 / gw 192.168.11.1):

| VMID | Hostname | RAM (MiB) | Cores | rootfs (source) | Size (config) | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | proxmox-mail-gateway | 4096 | 2 | thin1 (r630-01 only) | 10G | Must use --storage local-lvm (not thin1 on the target). |
| 5010 | tsunamiswap | 16384 | 8 | local-lvm | 160G | Largest disk; migrate when a window allows. |
| 5702 | ai-inf-1 | 16384 | 4 | local-lvm | 30G | |
| 5705 | ai-inf-2 | 16384 | 4 | local-lvm | 30G | |
| 7805 | sankofa-studio | 8192 | 4 | local-lvm | 60G | studio.sankofa.nexus — NPM unchanged if the IP stays .72. |

Rough total new allocation on r630-03: ~290G thin + ~60G RAM cap (not all resident at once). r630-03 had ~1TiB free on data / local-lvm and ~503GiB host RAM (check live: pvesm status, free -h on .13).

### Preparation checklist

  1. Cluster: pvecm status on r630-01 and r630-03 — same cluster, Quorate: Yes. local-lvm in /etc/pve/storage.cfg must list r630-03 (and source node).
  2. Network: Target node has vmbr0 on the same LAN/VLAN as r630-01 (static CT IPs unchanged).
  3. Backups: Take vzdump (or ZFS snapshot policy) for each VMID before migrating.
  4. Order (suggested): smaller disks first, 5010 last: 100 → 5702 → 5705 → 7805 → 5010. Alternatively, do 5010 first in a maintenance window if you want the biggest copy done when load is lowest.
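The suggested order falls out of sorting the table by rootfs size, smallest first. A sketch with the sizes hard-coded from the table above (`migration_order` is an illustrative name):

```sh
# Emit VMIDs ordered by disk size (GiB) ascending, so 5010 (160G) lands last.
migration_order() {
  printf '%s\n' "100 10" "5702 30" "5705 30" "7805 60" "5010 160" |
    sort -n -k2,2 | awk '{ print $1 }'
}
```

Feed the result into the migrate loop below if you want to drive it from one place.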

Migrate (from any node that can run pct, typically r630-01):

```sh
# Run from any cluster node that can run pct; here we SSH to the source, r630-01.
# Always set the target storage so thin1-only CT 100 lands on r630-03's pool.

ssh root@192.168.11.11 "pct migrate 100 r630-03 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 5702 r630-03 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 5705 r630-03 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 7805 r630-03 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 5010 r630-03 --storage local-lvm --restart"
```

If a migrate fails (lock, storage), stop the CT (pct stop <vmid>), retry with --restart, or use offline backup/restore per MIGRATE_CT_R630_01_TO_R630_02.md adapted for target r630-03 and storage local-lvm.
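The stop-and-retry fallback can be wrapped once instead of typed ad hoc. A sketch under the assumptions stated in its comments — `migrate_with_retry` is hypothetical, and the real fallback remains the offline backup/restore doc:

```sh
# Run a migrate command; on failure, stop the CT and retry once before
# giving up (at which point fall back to offline backup/restore).
migrate_with_retry() {
  vmid=$1; shift
  if "$@"; then return 0; fi
  echo "migrate of CT $vmid failed; stopping it and retrying once" >&2
  # Stop locally if pct is available; for a remote source, adapt to
  # e.g.: ssh root@192.168.11.11 "pct stop $vmid"
  if command -v pct >/dev/null 2>&1; then
    pct stop "$vmid" || true
  fi
  "$@"
}

# Example:
# migrate_with_retry 5010 ssh root@192.168.11.11 \
#   "pct migrate 5010 r630-03 --storage local-lvm --restart"
```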

Afterward: Update docs that still say these VMIDs live on r630-01 (verify with pct list on each node). Optional: bash scripts/proxmox/ensure-r630-spare-node-storage.sh --node r630-03 (dry-run) if you change the storage layout.

Helper (prints the same plan): bash scripts/proxmox/print-migrate-r630-01-to-r630-03-plan.sh


## 4c. First-wave offload from r630-01 to r630-04 (Order / vault / portal support workloads)

Goal: Reduce r630-01 skew and relieve pressure on /var/lib/vz / thin1 by moving a low-risk first wave onto r630-04 (192.168.11.14), a spare node with active data + local-lvm and essentially 0% thin usage.

Validated target readiness (live checks):

  • r630-04 is quorate in the same five-node cluster (pvecm status).
  • vmbr0 is up on 192.168.11.14/24.
  • pvesm status shows data and local-lvm both active.
  • lvs pve/data shows ~466.7G thin capacity with ~0% Data% and ~1% Meta%.
  • bash scripts/proxmox/ensure-r630-spare-node-storage.sh --node r630-04 reports local-lvm active and no corrective action needed.

Recommended first wave (ordered):

| VMID | Hostname | RAM | rootfs | Why this batch |
| --- | --- | --- | --- | --- |
| 10201 | order-grafana | 2G | thin1:20G | Very light support service; good first canary |
| 10210 | order-haproxy | 2G | thin1:20G | Small edge for the Order surface; easy to validate |
| 7804 | gov-portals-dev | 2G | thin1:20G | Small app workload; static IP on vmbr0 |
| 10020 | order-redis | 4G | thin1:50G | Light in the current sample; frees thin1 space |
| 10230 | order-vault | 2G | thin1:50G | Small support workload |
| 10092 | order-mcp-legal | 4G | thin1:50G | Small support workload |
| 8640 | vault-phoenix-1 | 4G | thin1:50G | Light current usage; frees thin1 |
| 8642 | vault-phoenix-3 | 4G | local-lvm:50G | Similar profile; keep after a few easy wins |
| 10091 | order-portal-internal | 4G | thin1:50G | Low CPU and RAM in the live sample |
| 10090 | order-portal-public | 4G | thin1:50G | Low CPU and RAM in the live sample |
| 10070 | order-legal | 4G | thin1:50G | Low current pressure |
| 10200 | order-prometheus | 4G | thin1:100G | Still reasonable, but leave it for last due to the larger disk |

Suggested batching:

  1. Canary batch: 10201, 10210, 7804
  2. Small support batch: 10020, 10230, 10092
  3. Portal / vault batch: 8640, 8642, 10091, 10090, 10070
  4. Last in wave: 10200
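The four batches can be driven from one place as a dry run that only prints the commands. `first_wave_plan` is an illustrative name; swap `echo` for the real ssh invocation to apply:

```sh
# Print the first-wave migrate commands batch by batch; nothing executes.
first_wave_plan() {
  for batch in "10201 10210 7804" \
               "10020 10230 10092" \
               "8640 8642 10091 10090 10070" \
               "10200"; do
    for vmid in $batch; do
      echo "pct migrate $vmid r630-04 --storage local-lvm --restart"
    done
  done
}
first_wave_plan
```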

Migrate (one at a time or by the batches above):

```sh
ssh root@192.168.11.11 "pct migrate 10201 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10210 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 7804 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10020 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10230 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10092 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 8640 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 8642 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10091 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10090 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10070 r630-04 --storage local-lvm --restart"
ssh root@192.168.11.11 "pct migrate 10200 r630-04 --storage local-lvm --restart"
```

### Preflight before the first command

  1. bash scripts/verify/poll-lxc-cluster-health.sh
  2. bash scripts/proxmox/ensure-r630-spare-node-storage.sh --node r630-04
  3. ssh root@192.168.11.14 "pvecm status; pvesm status | egrep '^(data|local-lvm|local)'"
  4. Take vzdump or snapshot coverage for the chosen batch if you want rollback points

### Post-check after each batch

  1. ssh root@192.168.11.11 "pct list" | egrep '^(VMID|10201|10210|7804|10020|10230|10092|8640|8642|10091|10090|10070|10200)\b'
  2. ssh root@192.168.11.14 "pct list" | egrep '^(VMID|10201|10210|7804|10020|10230|10092|8640|8642|10091|10090|10070|10200)\b'
  3. Re-run bash scripts/verify/poll-lxc-cluster-health.sh and confirm r630-01 skew / vz pressure trend down
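The long egrep alternation in the post-checks can be generated from a VMID list instead of maintained by hand; a sketch (`vmid_pattern` is hypothetical):

```sh
# Build an egrep pattern like ^(VMID|10201|10210)\b from a VMID list.
vmid_pattern() {
  printf '^(VMID|%s)\\b' "$(echo "$*" | tr ' ' '|')"
}

# Example:
# ssh root@192.168.11.14 "pct list" | egrep "$(vmid_pattern 10201 10210 7804)"
```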

Helper (prints the same plan): bash scripts/proxmox/print-migrate-r630-01-to-r630-04-first-wave.sh


## 5. After migration

  • IP: Containers keep the same IP if they use static IP in the CT config; no change needed for NPM/DNS if they point by IP.
  • Docs: Update any runbooks or configs that assume “VMID X is on r630-01” (e.g. config/ip-addresses.conf comments, backup scripts).
  • Verify: Re-run bash scripts/check-all-proxmox-hosts.sh and confirm load and container counts.

## 6. Special-case CTs

### Blockscout 5000

5000 (blockscout-1) cannot use the normal pvesh ... /migrate flow because it has a host-local bind mount:

```
mp1: /var/lib/vz/logs-vmid5000,mp=/var/log-remote
```

Live pvesh create /nodes/<src>/lxc/5000/migrate ... aborts with:

```
cannot migrate local bind mount point 'mp1'
```

Use the dedicated stop-and-restore helper instead:

```sh
bash scripts/proxmox/migrate-blockscout-5000-to-r630-04.sh --dry-run
PROXMOX_OPS_APPLY=1 bash scripts/proxmox/migrate-blockscout-5000-to-r630-04.sh --apply
```

The helper does four things in order:

  1. Seed and final-sync /var/lib/vz/logs-vmid5000 to r630-04
  2. Stop CT 5000
  3. Create and copy a vzdump archive, then pct restore it to local-lvm on r630-04
  4. Re-apply mp1 on the target and start the CT

The bind-mounted log tree is intentionally kept at the same host path on the target: `/var/lib/vz/logs-vmid5000`.

## 7. Quick reference

| Goal | Command / doc |
| --- | --- |
| Check current load | `bash scripts/check-all-proxmox-hosts.sh` |
| Migrate one CT (r630-01 → r630-02) | `./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> thin1 [--destroy-source]` |
| Plan r630-01 → r630-03 (100, 5010, 57xx, 7805) | `bash scripts/proxmox/print-migrate-r630-01-to-r630-03-plan.sh` — see §4b |
| Move Blockscout 5000 (bind mount) | `bash scripts/proxmox/migrate-blockscout-5000-to-r630-04.sh --dry-run` |
| Same-host (data → thin1) | MIGRATION_PLAN_R630_01_DATA.md, migrate-ct-r630-01-data-to-thin1.sh |
| Full migration doc | MIGRATE_CT_R630_01_TO_R630_02.md |