
# Cluster Migration Plan - LXC Containers to pve2

**Date**: $(date)
**Status**: 📋 Planning Phase

## Cluster Overview

### Current Cluster Status

**Cluster Name**: h
**Nodes**: 3 (ml110, pve, pve2)
**Status**: ✅ Quorate (all nodes online)

### Node Resources

| Node | CPUs | RAM | RAM Used | RAM % | Disk | Disk Used | Disk % | Status |
|------|------|-----|----------|-------|------|-----------|--------|--------|
| **ml110** | 6 | 125.67 GB | 35.61 GB | 28.3% | 93.93 GB | 7.21 GB | 7.7% | 🟢 Online |
| **pve** | 32 | 503.79 GB | 5.62 GB | 1.1% | 538.78 GB | 2.06 GB | 0.4% | 🟢 Online |
| **pve2** | 56 | 251.77 GB | 4.49 GB | 1.8% | 222.90 GB | 1.97 GB | 0.9% | 🟢 Online |

**Analysis**:
- ml110 carries the entire workload: all 25 containers, with 28.3% RAM and 9.4% CPU in use
- pve2 has abundant resources (1.8% RAM, 0.06% CPU) - ideal migration target
- pve also has capacity but pve2 has more CPUs (56 vs 32)

## Current Container Distribution

**All containers are currently on ml110 node (25 total)**:

### Infrastructure Services (Keep on ml110)
- 100: proxmox-mail-gateway
- 101: proxmox-datacenter-manager
- 102: cloudflared
- 103: omada
- 104: gitea
- 105: nginxproxymanager
- 130: monitoring-1

### Besu Blockchain Nodes (High Priority for Migration)

**Validators** (High resource usage - 8GB RAM each):
- 1000: besu-validator-1
- 1001: besu-validator-2
- 1002: besu-validator-3
- 1003: besu-validator-4
- 1004: besu-validator-5

**Sentries** (Moderate resource usage - 4GB RAM each):
- 1500: besu-sentry-1
- 1501: besu-sentry-2
- 1502: besu-sentry-3
- 1503: besu-sentry-4

**RPC Nodes** (Very high resource usage - 16GB RAM each):
- 2500: besu-rpc-1
- 2501: besu-rpc-2
- 2502: besu-rpc-3

### Application Services (Medium Priority)

- 3000-3003: ml110 containers (4 containers)
- 3500: oracle-publisher-1
- 3501: ccip-monitor-1
- 5000: blockscout-1 (database intensive)
- 6200: firefly-1

## Migration Strategy

### Phase 1: High Resource Containers (Priority 1)

**Target**: Move high-resource Besu nodes to pve2

**Containers to Migrate**:
1. Besu RPC nodes (2500-2502) - 16GB RAM each = **48GB total**
2. Besu Validators (1000-1004) - 8GB RAM each = **40GB total**
3. Blockscout (5000) - Database intensive

**Expected Impact**:
- Moves ~88GB+ of allocated RAM off ml110 (measured usage on ml110 is only 35.61GB today, so the observed drop will be smaller than the allocation)
- Reduces ml110 CPU load significantly
- Utilizes pve2's 56 CPUs and 251GB RAM capacity
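
The RAM totals above can be checked with a short sketch. It sums the Phase 1 Besu allocations listed in this plan and compares the total with pve2's free RAM from the Node Resources table. The function names `phase1_alloc_gb` and `fits_on_node` are hypothetical helpers, not part of the repository scripts, and Blockscout is left out because the plan gives no RAM figure for it.

```bash
#!/usr/bin/env bash
# Sum the RAM allocations of the Phase 1 Besu containers and check the
# total against pve2's free RAM. All figures are taken from this plan.

# phase1_alloc_gb COUNT:SIZE_GB ... -> prints the total allocation in GB
phase1_alloc_gb() {
    local total=0 group
    for group in "$@"; do
        # ${group%%:*} is the container count, ${group##*:} the GB per container
        total=$(( total + ${group%%:*} * ${group##*:} ))
    done
    echo "$total"
}

# fits_on_node ALLOC_GB TOTAL_GB USED_GB -> exit 0 if ALLOC_GB fits in free RAM
fits_on_node() {
    awk -v a="$1" -v t="$2" -v u="$3" 'BEGIN { exit !(a <= t - u) }'
}

total=$(phase1_alloc_gb 3:16 5:8)   # 3 RPC nodes at 16GB, 5 validators at 8GB
echo "Phase 1 Besu allocation: ${total} GB"
if fits_on_node "$total" 251.77 4.49; then
    echo "Fits within pve2's free RAM"
fi
```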

**Migration Order**:
1. Start with RPC nodes (one at a time to minimize disruption)
2. Then validators (can migrate in parallel if needed)
3. Finally Blockscout (database may take longer)

### Phase 2: Medium Resource Containers (Priority 2)

**Containers to Migrate**:
- Besu Sentries (1500-1503) - 4GB RAM each = **16GB total**
- Oracle Publisher (3500)
- CCIP Monitor (3501)
- Firefly (6200)
- ml110 containers (3000-3003) - if needed

### Phase 3: Keep on ml110 (Infrastructure)

**Containers to Keep**:
- Infrastructure services (100-105) - Core infrastructure
- Monitoring (130) - Should remain on primary node
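
One way to enforce the keep list is a small guard that filters VMIDs before any migration command runs. This is a minimal sketch: `stays_on_ml110` and `filter_migratable` are hypothetical names, and the pinned IDs are the ones listed in this plan (100-105 and 130).

```bash
#!/usr/bin/env bash
# stays_on_ml110 VMID -> exit 0 if the VMID is one of the infrastructure
# containers this plan keeps on ml110 (100-105 and 130).
stays_on_ml110() {
    case "$1" in
        100|101|102|103|104|105|130) return 0 ;;
        *) return 1 ;;
    esac
}

# filter_migratable VMID ... -> prints only the VMIDs that may be migrated;
# skipped VMIDs are reported on stderr.
filter_migratable() {
    local vmid
    for vmid in "$@"; do
        if stays_on_ml110 "$vmid"; then
            echo "skipping ${vmid}: pinned to ml110" >&2
        else
            echo "$vmid"
        fi
    done
}

filter_migratable 104 130 2500 5000
```

The output of `filter_migratable` can be fed to a migration loop so that infrastructure containers are never moved by accident.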

## Migration Commands

### Single Container Migration

```bash
# Migrate a single container
ssh root@192.168.11.10 "pct migrate <VMID> pve2 --restart"

# Example: Migrate besu-rpc-1
ssh root@192.168.11.10 "pct migrate 2500 pve2 --restart"
```

### Using Migration Script

```bash
# Dry run to see what would be migrated
./scripts/migrate-containers-to-pve2.sh --dry-run

# Execute migration
./scripts/migrate-containers-to-pve2.sh
```

### Batch Migration

```bash
# Migrate all RPC nodes one at a time; stop at the first failure
for vmid in 2500 2501 2502; do
    ssh root@192.168.11.10 "pct migrate $vmid pve2 --restart" || break
    sleep 30  # Let the nodes settle before the next migration
done

# Migrate all validators one at a time; stop at the first failure
for vmid in 1000 1001 1002 1003 1004; do
    ssh root@192.168.11.10 "pct migrate $vmid pve2 --restart" || break
    sleep 30
done
```
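
The batch loops can also be wrapped in a function with a dry-run switch. The sketch below is an illustration only and is not the contents of `scripts/migrate-containers-to-pve2.sh`; `migrate_batch`, `SOURCE_HOST`, and `DRY_RUN` are assumed names, and `SOURCE_HOST` defaults to the address used in the commands above. With `DRY_RUN=1` (the default here) it only prints the commands it would run.

```bash
#!/usr/bin/env bash
# Assumed settings: SOURCE_HOST is the node that currently hosts the
# containers; DRY_RUN=1 prints the commands instead of running them.
SOURCE_HOST=${SOURCE_HOST:-192.168.11.10}
DRY_RUN=${DRY_RUN:-1}

# migrate_batch TARGET VMID ... -> migrates each VMID in order and stops
# at the first failure so a broken migration is not followed by more.
migrate_batch() {
    local target=$1 vmid cmd
    shift
    for vmid in "$@"; do
        cmd="pct migrate ${vmid} ${target} --restart"
        if [ "$DRY_RUN" = "1" ]; then
            echo "[dry-run] ssh root@${SOURCE_HOST} \"${cmd}\""
            continue
        fi
        if ! ssh "root@${SOURCE_HOST}" "$cmd"; then
            echo "migration of ${vmid} failed; stopping batch" >&2
            return 1
        fi
    done
}

migrate_batch pve2 2500 2501 2502
```

Run it with `DRY_RUN=0` once the printed commands look right.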

## Migration Considerations

### Pre-Migration Checklist

- [x] Verify cluster is quorate
- [x] Verify target node (pve2) is online
- [x] Check available storage on pve2
- [ ] Verify network connectivity between nodes
- [ ] Plan maintenance window if needed
- [ ] Back up critical containers (if needed)
- [ ] Notify users of potential brief service interruption
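
The quorum and connectivity items in the checklist can be scripted. The sketch below assumes `pvecm status` prints a `Quorate:` line with `Yes` or `No`; `is_quorate` and `node_reachable` are hypothetical helper names, and the sample text is an abbreviated illustration of that output format rather than a capture from this cluster.

```bash
#!/usr/bin/env bash
# is_quorate: reads `pvecm status` output on stdin and exits 0 if it
# contains a "Quorate: Yes" line.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# node_reachable HOST -> exit 0 if HOST answers one ping within 2 seconds
# (Linux iputils ping options).
node_reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Abbreviated sample input for illustration. On a cluster node, use:
#   pvecm status | is_quorate && echo "cluster is quorate"
sample='Quorum information
------------------
Nodes:            3
Quorate:          Yes'

if printf '%s\n' "$sample" | is_quorate; then
    echo "cluster is quorate"
fi
```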

### During Migration

1. **Migration is not live** - `pct migrate --restart` shuts the container down, migrates it, and starts it again on the target node
2. **Service downtime** - The container is offline from shutdown until it has restarted on the target node
3. **Storage migration** - Container disk images are copied to the target node
4. **Restart** - The container is started on the target node once the copy completes

### Post-Migration Verification

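```bash
# Verify containers are on pve2
ssh root@192.168.11.10 "pvesh get /nodes/pve2/lxc"

# Check container status (pct only manages containers on the local node,
# so query pve2 through the cluster API)
ssh root@192.168.11.10 "pvesh get /nodes/pve2/lxc/2500/status/current"

# Verify network connectivity from container (run on pve2; replace <PVE2_IP>)
ssh root@<PVE2_IP> "pct exec 2500 -- ping -c 3 192.168.11.250"
```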
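
Container placement can also be read from the cluster filesystem, where each LXC config is stored as `/etc/pve/nodes/<node>/lxc/<vmid>.conf`. The sketch below relies on that layout; `node_of_vmid` is a hypothetical helper, and its optional second argument (the directory to search) exists only so the function can be tried outside a cluster.

```bash
#!/usr/bin/env bash
# node_of_vmid VMID [ROOT] -> prints the node whose lxc directory holds
# VMID.conf; exits 1 if no node has it. ROOT defaults to /etc/pve/nodes.
node_of_vmid() {
    local vmid=$1 root=${2:-/etc/pve/nodes} conf
    for conf in "$root"/*/lxc/"$vmid".conf; do
        [ -e "$conf" ] || continue   # glob did not match anything
        conf=${conf%/lxc/*}          # strip "/lxc/<vmid>.conf"
        echo "${conf##*/}"           # keep the node directory name
        return 0
    done
    return 1
}

# Report where the RPC nodes currently live
for vmid in 2500 2501 2502; do
    node=$(node_of_vmid "$vmid") || node="not found"
    echo "${vmid}: ${node}"
done
```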

### Rollback Plan

If migration fails or issues occur:

```bash
# Migrate container back to ml110 (run on pve2, which now hosts the container;
# replace <PVE2_IP> with pve2's address)
ssh root@<PVE2_IP> "pct migrate <VMID> ml110 --restart"
```

## Expected Results

### After Phase 1 Migration

**ml110**:
- RAM usage: ~35.61GB → ~10GB (reduced by ~25GB)
- CPU usage: ~9.4% → ~3-4%
- Containers: 25 → 16 containers

**pve2**:
- RAM usage: ~4.49GB → up to ~90GB (increased by up to ~85GB if the migrated containers use their full allocation)
- CPU usage: ~0.06% → ~5-10%
- Containers: 0 → 9 containers (3 RPC nodes, 5 validators, Blockscout)

### Resource Distribution

| Node | Containers | RAM Usage | Status |
|------|------------|-----------|--------|
| ml110 | 16 | ~10GB (8%) | ✅ Balanced |
| pve | 0 | ~5.6GB (1.1%) | ✅ Available |
| pve2 | 9 | ~90GB (36%) | ✅ Well utilized |
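
The percentages in the table follow from the RAM totals in the Node Resources table. As a quick check, assuming the projected usage figures above, a hypothetical `percent_of` helper reproduces them:

```bash
#!/usr/bin/env bash
# percent_of USED_GB TOTAL_GB -> prints usage as a percentage, one decimal
percent_of() {
    awk -v u="$1" -v t="$2" 'BEGIN { printf "%.1f\n", u / t * 100 }'
}

echo "ml110 current:   $(percent_of 35.61 125.67)%"
echo "ml110 projected: $(percent_of 10 125.67)%"
echo "pve2 projected:  $(percent_of 90 251.77)%"
```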

## Next Steps

1. ✅ Review and approve migration plan
2. ⏳ Execute Phase 1 migrations (RPC nodes, validators, Blockscout)
3. ⏳ Verify all containers are running correctly on pve2
4. ⏳ Monitor resource usage on both nodes
5. ⏳ Execute Phase 2 migrations if needed
6. ⏳ Document final container distribution

## Related Scripts

- `scripts/analyze-cluster-migration.sh` - Analyze cluster and container distribution
- `scripts/migrate-containers-to-pve2.sh` - Execute container migrations
- `scripts/get-container-distribution.sh` - List containers by node

---

**Last Updated**: $(date)
**Status**: Ready for execution