Migrate workload off 503 GB R630 → r630-05 through r630-08 (256 GB each)

Last Updated: 2026-01-31
Document Version: 1.0
Status: Active Documentation


Goal: Move all workload off the 503 GB R630 (r630-01) onto r630-05, r630-06, r630-07, r630-08, each with 256 GB RAM, for the HA cluster.


Current state (reference)

| Host    | IP            | RAM (from health check) | LXC count |
|---------|---------------|-------------------------|-----------|
| ml110   | 192.168.11.10 | 125 GB                  | 17        |
| r630-01 | 192.168.11.11 | 503 GB                  | 69        |
| r630-02 | 192.168.11.12 | 251 GB                  | 10        |

The 503 GB server is r630-01 (192.168.11.11). That is the source host to migrate workload from.


Target layout: r630-05 through r630-08 (256 GB each)

| Host    | IP (planned)  | RAM    | Role / use        |
|---------|---------------|--------|-------------------|
| r630-05 | 192.168.11.15 | 256 GB | HA compute node 1 |
| r630-06 | 192.168.11.16 | 256 GB | HA compute node 2 |
| r630-07 | 192.168.11.17 | 256 GB | HA compute node 3 |
| r630-08 | 192.168.11.18 | 256 GB | HA compute node 4 |
  • 4 nodes × 256 GB meets the HA cluster target (3–4 nodes, 128–256 GB per node).
  • Assign IPs 192.168.11.15–18 (or your chosen range) when the hosts are racked and on the same VLAN as the rest of the cluster.
  • Migrate workload from r630-01 (and optionally from ml110/r630-02) onto these four nodes.

Target state

  • No single server with 503 GB holding all workload.
  • r630-05, r630-06, r630-07, r630-08 as the primary HA compute pool, each 256 GB RAM.
  • Workload spread across the four nodes; critical services on nodes that participate in HA.

Phase 1: Inventory and plan

  1. List everything on the 503 GB host (r630-01).

    • From project root:
      ./scripts/quick-proxmox-inventory.sh
      or SSH:
      ssh root@192.168.11.11 "pct list; qm list"
    • Note: VMID, name, RAM/CPU, and whether it's critical (e.g. NPMplus 10233, RPC, Blockscout, DBIS, etc.).
  2. Decide destination per VM/container.

    • ml110 (125 GB): optional for lighter containers.
    • r630-02 (251 GB): optional overflow; long-term can also be migrated to r630-05..08.
    • r630-05, r630-06, r630-07, r630-08 (256 GB each): primary targets; spread workload from r630-01 across all four.
  3. Strategy.

    • Add r630-05, r630-06, r630-07, r630-08 to the cluster (256 GB each, IPs e.g. 192.168.11.15–18). Migrate workload from r630-01 to these four nodes.
    • When r630-01 is empty: power off, reduce RAM (remove DIMMs) if reusing; otherwise decommission.
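The inventory step above can be sketched as a small helper. This is a sketch, not the project's `quick-proxmox-inventory.sh`: it assumes the standard Proxmox `pct` CLI and that it runs on r630-01 (or wrapped in `ssh root@192.168.11.11 "..."`):

```shell
# Inventory LXC containers on the source host: VMID, hostname, configured RAM.
# `pct list` column layout can vary (the Lock column may be empty), so only
# the VMID column is taken from it; name and memory come from `pct config`.
inventory_lxc() {
    pct list | awk 'NR>1 {print $1}' | while read -r vmid; do
        name=$(pct config "$vmid" | awk '/^hostname:/ {print $2}')
        mem=$(pct config "$vmid" | awk '/^memory:/ {print $2}')
        # LXC containers default to 512 MB when memory: is not set explicitly.
        printf '%s\t%s\t%s MB\n' "$vmid" "$name" "${mem:-512}"
    done
}
```

Sorting the output by the RAM column is a quick way to plan which containers fit together on a 256 GB node.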

Phase 2: Migrate workload off r630-01

  1. Storage.
    Today there is no shared storage; each VM's or container's disk lives on its host. So migration is:

    • LXC: pct migrate <vmid> <target-node> (or stop → backup → restore on target).
    • QEMU: qm migrate <vmid> <target-node> (live if storage allows) or stop → backup → restore on target.
  2. Order (suggested).

    • Migrate non-critical containers first (test, dev, duplicate roles).
    • Then critical ones: NPMplus (10233), RPC-related, Blockscout, DBIS, etc. Do these in a maintenance window if you want minimal impact.
  3. Example migrate one LXC to r630-05 (or r630-06, r630-07, r630-08).
    From any node with cluster access, or from r630-01:

    # List containers on r630-01
    ssh root@192.168.11.11 "pct list"
    
    # Migrate LXC 10234 to r630-05 (target must have storage; use r630-05..08 as needed)
    pct migrate 10234 r630-05 --restart
    

    If pct migrate fails (e.g. no shared storage), use backup on source → restore on target:

    # On r630-01: back up with vzdump (there is no `pct backup` command)
    vzdump <vmid> --compress zstd --dumpdir /var/lib/vz/dump
    
    # Copy the resulting vzdump-lxc-<vmid>-<timestamp>.tar.zst to r630-05
    # (or shared storage), then on r630-05:
    pct restore <vmid> /path/to/vzdump-lxc-<vmid>-<timestamp>.tar.zst
    # Reconfigure network (IP, etc.) if needed, then start.
    
  4. After each move:
    Check service on the new host (IP, DNS, NPMplus proxy targets, etc.).
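The migrate-then-fallback flow in steps 1–3 can be wrapped in one helper. A minimal sketch, assuming the standard Proxmox `pct` and `vzdump` CLIs; the dump directory is an example path:

```shell
# Try restart migration of an LXC to a target node; if it fails (e.g. no
# shared storage), fall back to a vzdump backup for manual copy/restore.
migrate_or_dump() {
    vmid=$1; target=$2
    if pct migrate "$vmid" "$target" --restart; then
        echo "migrated $vmid -> $target"
    else
        echo "pct migrate failed for $vmid; falling back to vzdump" >&2
        vzdump "$vmid" --compress zstd --dumpdir /var/lib/vz/dump
        # Then copy vzdump-lxc-<vmid>-*.tar.zst to $target and run
        # `pct restore <vmid> <archive>` there, as in the example above.
    fi
}
```

Calling it per VMID (e.g. `migrate_or_dump 10234 r630-05`) keeps the critical-last ordering decision in your hands while making each individual move uniform.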


Phase 3: Downsize r630-01 to 128–256 GB

  1. When r630-01 has no (or minimal) workload:
    Power off r630-01 (or put in maintenance).

  2. Reseat / remove DIMMs so total RAM is 128 GB or 256 GB (per your choice).

    • Use Dell docs / R630 Owner's Manual for population rules (which slots to leave populated for 128 or 256 GB).
    • Keep DIMMs you pull for use in other R630s (to bring them to 128–256 GB).
  3. Power on r630-01, confirm RAM in BIOS and in Proxmox (e.g. free -h).

  4. Repeat for r630-02 if it currently has 251 GB and you want it at 128–256 GB; use DIMMs from r630-01 if needed.
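Step 3's RAM confirmation can be scripted for the whole pool. A small sketch reading MemTotal from /proc/meminfo (the file argument exists only to make the function easy to test); note that the OS reports slightly less than the installed total, as in the 251 GB reading on r630-02:

```shell
# Print total RAM in GiB (rounded down), from /proc/meminfo by default.
# Expect roughly 125 for a 128 GB build and roughly 250 for 256 GB.
check_ram_gib() {
    awk '/^MemTotal:/ {print int($2 / 1048576)}' "${1:-/proc/meminfo}"
}
```

Run `check_ram_gib` on the downsized host after the BIOS check to confirm Proxmox sees the new total.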


Phase 4: Balance and HA readiness

  • Ensure no single node is overloaded (CPU/RAM).
  • Document final RAM per server: e.g. ml110 125 GB, r630-01 256 GB (after downsizing), r630-02 256 GB, r630-05 through r630-08 256 GB each.
  • When you introduce shared storage (Ceph or NFS) and Proxmox HA, these 128–256 GB nodes will match the “128–256 GB per server” HA target.
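The overload check above can be done from one node with a loop over the pool. A sketch assuming root SSH access between nodes; the node names are the four targets from this plan:

```shell
# Report used/total RAM (GiB) for each node passed as an argument,
# e.g. check_pool_ram r630-05 r630-06 r630-07 r630-08
check_pool_ram() {
    for node in "$@"; do
        # free -g: the Mem: row has total in column 2, used in column 3.
        ssh "root@$node" "free -g" | awk -v n="$node" '/^Mem:/ {print n ": " $3 "/" $2 " GiB used"}'
    done
}
```

Anything consistently near its total is a candidate for rebalancing before HA is enabled.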

Quick reference

| Step | Action |
|------|--------|
| 1 | Inventory r630-01: `ssh root@192.168.11.11 "pct list; qm list"` or `./scripts/quick-proxmox-inventory.sh` |
| 2 | Choose destinations: r630-05, r630-06, r630-07, r630-08 (256 GB each); ml110/r630-02 optional. |
| 3 | Migrate LXC: `pct migrate <vmid> <target-node>` or backup/restore. |
| 4 | Migrate QEMU: `qm migrate <vmid> <target-node>` or backup/restore. |
| 5 | When r630-01 is empty: power off, reduce RAM to 128–256 GB, power on. |
| 6 | Add r630-05..08 to cluster (256 GB each); optionally downsize r630-02 using DIMMs from r630-01. |

Target: All workload off the 503 GB R630 onto r630-05, r630-06, r630-07, r630-08 (256 GB each) for the HA cluster.