proxmox/docs/archive/historical/CLUSTER_MIGRATION_PLAN.md
defiQUG cb47cce074 Complete markdown files cleanup and organization
2026-01-06 01:46:25 -08:00


# Cluster Migration Plan - LXC Containers to pve2
**Date**: $(date)
**Status**: 📋 Planning Phase
## Cluster Overview
### Current Cluster Status
**Cluster Name**: h
**Nodes**: 3 (ml110, pve, pve2)
**Status**: ✅ Quorate (all nodes online)
### Node Resources
| Node | CPUs | RAM | RAM Used | RAM % | Disk | Disk Used | Disk % | Status |
|------|------|-----|----------|-------|------|-----------|--------|--------|
| **ml110** | 6 | 125.67 GB | 35.61 GB | 28.3% | 93.93 GB | 7.21 GB | 7.7% | 🟢 Online |
| **pve** | 32 | 503.79 GB | 5.62 GB | 1.1% | 538.78 GB | 2.06 GB | 0.4% | 🟢 Online |
| **pve2** | 56 | 251.77 GB | 4.49 GB | 1.8% | 222.90 GB | 1.97 GB | 0.9% | 🟢 Online |
**Analysis**:
- ml110 carries the entire workload (28.3% RAM, 9.4% CPU) because it hosts all 25 containers, while pve and pve2 sit nearly idle
- pve2 has abundant resources (1.8% RAM, 0.06% CPU) - ideal migration target
- pve also has capacity but pve2 has more CPUs (56 vs 32)
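The percentages in the table can be re-derived from the raw figures. The sketch below is only an illustration: `percent_used` is a hypothetical helper name, and the inputs are copied by hand from the Node Resources table rather than read from the cluster.

```bash
# percent_used USED TOTAL -> prints USED as a percentage of TOTAL (one decimal place).
# Hypothetical helper for illustration; not part of the repository scripts.
percent_used() {
    awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", used / total * 100 }'
}

# RAM figures (GB) copied from the Node Resources table.
echo "ml110 RAM: $(percent_used 35.61 125.67)%"
echo "pve   RAM: $(percent_used 5.62 503.79)%"
echo "pve2  RAM: $(percent_used 4.49 251.77)%"
```

For live numbers, `pvesh get /cluster/resources --type node` on any cluster node should report per-node memory and CPU figures that can be fed into the same helper.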
## Current Container Distribution
**All containers are currently on ml110 node (25 total)**:
### Infrastructure Services (Keep on ml110)
- 100: proxmox-mail-gateway
- 101: proxmox-datacenter-manager
- 102: cloudflared
- 103: omada
- 104: gitea
- 105: nginxproxymanager
- 130: monitoring-1
### Besu Blockchain Nodes (High Priority for Migration)
**Validators** (High resource usage - 8GB RAM each):
- 1000: besu-validator-1
- 1001: besu-validator-2
- 1002: besu-validator-3
- 1003: besu-validator-4
- 1004: besu-validator-5
**Sentries** (Moderate resource usage - 4GB RAM each):
- 1500: besu-sentry-1
- 1501: besu-sentry-2
- 1502: besu-sentry-3
- 1503: besu-sentry-4
**RPC Nodes** (Very high resource usage - 16GB RAM each):
- 2500: besu-rpc-1
- 2501: besu-rpc-2
- 2502: besu-rpc-3
### Application Services (Medium Priority)
- 3000-3003: ml110 containers (4 containers)
- 3500: oracle-publisher-1
- 3501: ccip-monitor-1
- 5000: blockscout-1 (database intensive)
- 6200: firefly-1
## Migration Strategy
### Phase 1: High Resource Containers (Priority 1)
**Target**: Move high-resource Besu nodes to pve2
**Containers to Migrate**:
1. Besu RPC nodes (2500-2502) - 16GB RAM each = **48GB total**
2. Besu Validators (1000-1004) - 8GB RAM each = **40GB total**
3. Blockscout (5000) - Database intensive
**Expected Impact**:
- Frees ~88GB+ of RAM allocated on ml110 (48GB + 40GB, plus Blockscout); actual RAM in use is currently 35.61GB, so the measured drop will be smaller
- Reduces ml110 CPU load significantly
- Utilizes pve2's 56 CPUs and 251GB RAM capacity
**Migration Order**:
1. Start with RPC nodes (one at a time to minimize disruption)
2. Then validators (can migrate in parallel if needed)
3. Finally Blockscout (database may take longer)
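The Phase 1 allocation arithmetic can be checked with a small sketch. `total_ram_gb` is a hypothetical helper, the `COUNT:GB` argument format is an assumption made for this example, and Blockscout is left out because the plan does not state its RAM allocation.

```bash
# total_ram_gb COUNT:GB ... -> sums COUNT * GB over every pair given.
# Hypothetical helper for illustration only.
total_ram_gb() {
    local total=0 pair
    for pair in "$@"; do
        total=$(( total + ${pair%%:*} * ${pair##*:} ))
    done
    echo "$total"
}

# 3 RPC nodes at 16 GB each, 5 validators at 8 GB each.
phase1_gb=$(total_ram_gb 3:16 5:8)
echo "Phase 1 allocation (excluding Blockscout): ${phase1_gb} GB"

# Headroom on pve2, using the Node Resources table: 251.77 GB total, 4.49 GB used.
awk -v need="$phase1_gb" 'BEGIN {
    free = 251.77 - 4.49
    printf "pve2 free: %.2f GB, fits: %s\n", free, (need < free ? "yes" : "no")
}'
```

Even if every migrated container used its full allocation, pve2 would keep well over half of its RAM free.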
### Phase 2: Medium Resource Containers (Priority 2)
**Containers to Migrate**:
- Besu Sentries (1500-1503) - 4GB RAM each = **16GB total**
- Oracle Publisher (3500)
- CCIP Monitor (3501)
- Firefly (6200)
- ml110 containers (3000-3003) - if needed
### Phase 3: Keep on ml110 (Infrastructure)
**Containers to Keep**:
- Infrastructure services (100-105) - Core infrastructure
- Monitoring (130) - Should remain on primary node
## Migration Commands
### Single Container Migration
```bash
# Migrate a single container
ssh root@192.168.11.10 "pct migrate <VMID> pve2 --restart"
# Example: Migrate besu-rpc-1
ssh root@192.168.11.10 "pct migrate 2500 pve2 --restart"
```
### Using Migration Script
```bash
# Dry run to see what would be migrated
./scripts/migrate-containers-to-pve2.sh --dry-run
# Execute migration
./scripts/migrate-containers-to-pve2.sh
```
### Batch Migration
```bash
# Migrate all RPC nodes, one at a time; stop at the first failure
for vmid in 2500 2501 2502; do
    ssh root@192.168.11.10 "pct migrate $vmid pve2 --restart" || break
    sleep 30  # Wait between migrations
done
# Migrate all validators, one at a time; stop at the first failure
for vmid in 1000 1001 1002 1003 1004; do
    ssh root@192.168.11.10 "pct migrate $vmid pve2 --restart" || break
    sleep 30
done
```
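A dry-run wrapper around the same command might look like the sketch below. It is not the contents of `scripts/migrate-containers-to-pve2.sh`. The function names and the `EXECUTE`, `PVE_HOST`, and `TARGET_NODE` variables are assumptions made for illustration. Only `pct migrate <VMID> <node> --restart` and the SSH target come from this plan.

```bash
# Hypothetical helper; defaults to dry-run so nothing is migrated by accident.
PVE_HOST="${PVE_HOST:-root@192.168.11.10}"   # SSH target used throughout this plan
TARGET_NODE="${TARGET_NODE:-pve2}"
EXECUTE="${EXECUTE:-0}"                      # assumption: EXECUTE=1 performs the migration

# migrate_ct VMID -> prints the command in dry-run mode, runs it when EXECUTE=1.
migrate_ct() {
    local vmid="$1"
    local remote_cmd="pct migrate ${vmid} ${TARGET_NODE} --restart"
    if [ "$EXECUTE" = "1" ]; then
        ssh "$PVE_HOST" "$remote_cmd"
    else
        echo "[dry-run] ssh ${PVE_HOST} \"${remote_cmd}\""
    fi
}

# migrate_all VMID ... -> migrates sequentially and stops at the first failure.
migrate_all() {
    local vmid
    for vmid in "$@"; do
        migrate_ct "$vmid" || { echo "migration of ${vmid} failed, stopping" >&2; return 1; }
    done
}

migrate_all 2500 2501 2502
```

Defaulting to dry-run means the batch can be reviewed first and then re-run with `EXECUTE=1` once the printed commands look right.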
## Migration Considerations
### Pre-Migration Checklist
- [x] Verify cluster is quorate
- [x] Verify target node (pve2) is online
- [x] Check available storage on pve2
- [ ] Verify network connectivity between nodes
- [ ] Plan maintenance window if needed
- [ ] Back up critical containers (if needed)
- [ ] Notify users of potential brief service interruption
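The quorum item on the checklist can be scripted. The sketch below assumes that `pvecm status` prints a `Quorate:` line with `Yes` or `No`; `is_quorate` is a hypothetical helper, and the sample text is an abbreviated illustration rather than captured output from this cluster.

```bash
# is_quorate: reads `pvecm status` output on stdin and succeeds only if it
# contains a line reporting "Quorate: Yes". Hypothetical helper for illustration.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes'
}

# Abbreviated, illustrative sample of the relevant lines.
sample='Nodes:            3
Quorate:          Yes'

if printf '%s\n' "$sample" | is_quorate; then
    echo "cluster is quorate"
else
    echo "cluster is NOT quorate"
fi

# Against the real cluster (not run here):
#   ssh root@192.168.11.10 "pvecm status" | is_quorate
```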
### During Migration
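1. **Migration is not live** - Proxmox cannot live-migrate LXC containers; with `--restart` the container is shut down on ml110, moved, and started again on pve2
2. **Service downtime** - The container is unavailable from shutdown until it starts on the target node, so expect a brief outage per container
3. **Storage migration** - Container disk images are copied to the target node
4. **Restart** - Container is started on the target node once the copy completes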
### Post-Migration Verification
```bash
# Verify containers are on pve2
ssh root@192.168.11.10 "pvesh get /nodes/pve2/lxc"
# Check container status
ssh root@192.168.11.10 "pct status 2500"
# Verify network connectivity from container
ssh root@192.168.11.10 "pct exec 2500 -- ping -c 3 192.168.11.250"
```
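Because a restart-mode migration leaves the container briefly stopped, the status check can be wrapped in a polling loop. This is a sketch under assumptions: `pct status <VMID>` is expected to print `status: running` for a running container, and `ct_is_running`, `wait_until_running`, and the `STATUS_CMD` override are hypothetical names introduced here.

```bash
# Command used to query status; overridable so the loop can be tried without a cluster.
STATUS_CMD="${STATUS_CMD:-ssh root@192.168.11.10 pct status}"

# ct_is_running: reads `pct status` output on stdin; succeeds if it is "status: running".
ct_is_running() {
    grep -qx 'status: running'
}

# wait_until_running VMID [TRIES] [DELAY_SECONDS]
# Polls the container status until it is running or the attempts run out.
wait_until_running() {
    local vmid="$1" tries="${2:-10}" delay="${3:-5}" i
    for (( i = 1; i <= tries; i++ )); do
        if $STATUS_CMD "$vmid" | ct_is_running; then
            echo "CT ${vmid} is running"
            return 0
        fi
        sleep "$delay"
    done
    echo "CT ${vmid} did not reach running state" >&2
    return 1
}

# Example (not run here): wait_until_running 2500
```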
### Rollback Plan
If migration fails or issues occur:
```bash
# Migrate container back to ml110
ssh root@192.168.11.10 "pct migrate <VMID> ml110 --restart"
```
## Expected Results
### After Phase 1 Migration
**ml110**:
- RAM usage: ~35.61GB → ~10GB (reduced by ~25GB)
- CPU usage: ~9.4% → ~3-4%
- Containers: 25 → 16 (9 migrated: 3 RPC nodes, 5 validators, Blockscout)
**pve2**:
- RAM usage: ~4.49GB → ~30GB at current load (the ~25GB moved from ml110), up to ~90GB if the migrated containers use their full ~88GB allocation
- CPU usage: ~0.06% → ~5-10%
- Containers: 0 → 9
### Resource Distribution
| Node | Containers | RAM Usage | Status |
|------|------------|-----------|--------|
| ml110 | 16 | ~10GB (8%) | ✅ Balanced |
| pve | 0 | ~5.6GB (1.1%) | ✅ Available |
| pve2 | 9 | ~30GB (12%) at current load, up to ~90GB (36%) at full allocation | ✅ Well utilized |
## Next Steps
1. ✅ Review and approve migration plan
2. ⏳ Execute Phase 1 migrations (RPC nodes, validators, Blockscout)
3. ⏳ Verify all containers are running correctly on pve2
4. ⏳ Monitor resource usage on both nodes
5. ⏳ Execute Phase 2 migrations if needed
6. ⏳ Document final container distribution
## Related Scripts
- `scripts/analyze-cluster-migration.sh` - Analyze cluster and container distribution
- `scripts/migrate-containers-to-pve2.sh` - Execute container migrations
- `scripts/get-container-distribution.sh` - List containers by node
---
**Last Updated**: $(date)
**Status**: Ready for execution