# Cluster Migration Plan - LXC Containers to pve2

**Date**: $(date)

**Status**: 📋 Planning Phase

## Cluster Overview

### Current Cluster Status

**Cluster Name**: h

**Nodes**: 3 (ml110, pve, pve2)

**Status**: ✅ Quorate (all nodes online)

### Node Resources

| Node | CPUs | RAM | RAM Used | RAM % | Disk | Disk Used | Disk % | Status |
|------|------|-----|----------|-------|------|-----------|--------|--------|
| **ml110** | 6 | 125.67 GB | 35.61 GB | 28.3% | 93.93 GB | 7.21 GB | 7.7% | 🟢 Online |
| **pve** | 32 | 503.79 GB | 5.62 GB | 1.1% | 538.78 GB | 2.06 GB | 0.4% | 🟢 Online |
| **pve2** | 56 | 251.77 GB | 4.49 GB | 1.8% | 222.90 GB | 1.97 GB | 0.9% | 🟢 Online |

**Analysis**:

- ml110 carries nearly all of the workload (28.3% RAM, 9.4% CPU) because it hosts all 25 containers, and it has the fewest CPUs (6)
- pve2 is nearly idle (1.8% RAM, 0.06% CPU), which makes it the preferred migration target
- pve also has spare capacity, and more RAM than pve2 (503.79 GB vs 251.77 GB), but pve2 has more CPUs (56 vs 32)

## Current Container Distribution

**All containers are currently on the ml110 node (25 total)**:

### Infrastructure Services (Keep on ml110)

- 100: proxmox-mail-gateway
- 101: proxmox-datacenter-manager
- 102: cloudflared
- 103: omada
- 104: gitea
- 105: nginxproxymanager
- 130: monitoring-1

### Besu Blockchain Nodes (High Priority for Migration)

**Validators** (High resource usage - 8GB RAM each):

- 1000: besu-validator-1
- 1001: besu-validator-2
- 1002: besu-validator-3
- 1003: besu-validator-4
- 1004: besu-validator-5

**Sentries** (Moderate resource usage - 4GB RAM each):

- 1500: besu-sentry-1
- 1501: besu-sentry-2
- 1502: besu-sentry-3
- 1503: besu-sentry-4

**RPC Nodes** (Very high resource usage - 16GB RAM each):

- 2500: besu-rpc-1
- 2501: besu-rpc-2
- 2502: besu-rpc-3

### Application Services (Medium Priority)

- 3000-3003: ml110 containers (4 containers)
- 3500: oracle-publisher-1
- 3501: ccip-monitor-1
- 5000: blockscout-1 (database intensive)
- 6200: firefly-1

## Migration Strategy

### Phase 1: High Resource Containers (Priority 1)

**Target**: Move high-resource Besu nodes to pve2

**Containers to Migrate**:

1. Besu RPC nodes (2500-2502) - 16GB RAM each = **48GB total**
2. Besu Validators (1000-1004) - 8GB RAM each = **40GB total**
3. Blockscout (5000) - Database intensive

**Expected Impact**:

- Frees ~88GB+ of allocated container RAM on ml110 (48GB + 40GB, plus Blockscout). ml110 currently uses only 35.61GB, so the drop in *used* RAM will be smaller (about 25GB, see Expected Results)
- Reduces CPU load on ml110, which has only 6 CPUs
- Puts pve2's 56 CPUs and 251.77GB of RAM to use
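The Phase 1 memory figures can be checked with a small calculation. This is a sketch, assuming the 16GB and 8GB figures are configured allocations and taking pve2's totals from the Node Resources table; `BLOCKSCOUT_GB` is a hypothetical placeholder because Blockscout's allocation is not stated, and the function names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: does Phase 1's configured RAM fit on pve2?
# RPC/validator sizes and pve2 figures are taken from this plan.
# BLOCKSCOUT_GB is a hypothetical placeholder (whole GB); the plan does not state it.
RPC_GB=16; RPC_COUNT=3
VALIDATOR_GB=8; VALIDATOR_COUNT=5
BLOCKSCOUT_GB=${BLOCKSCOUT_GB:-0}

PVE2_TOTAL_GB=251.77
PVE2_USED_GB=4.49

# Total configured RAM of the Phase 1 containers, in GB
phase1_alloc_gb() {
  echo $(( RPC_GB * RPC_COUNT + VALIDATOR_GB * VALIDATOR_COUNT + BLOCKSCOUT_GB ))
}

# RAM left on pve2 if every Phase 1 container used its full allocation
pve2_headroom_gb() {
  awk -v total="$PVE2_TOTAL_GB" -v used="$PVE2_USED_GB" -v alloc="$(phase1_alloc_gb)" \
    'BEGIN { printf "%.2f\n", total - used - alloc }'
}
```

With Blockscout left at 0, `phase1_alloc_gb` prints 88 and `pve2_headroom_gb` prints 159.28, so pve2 has room even if every migrated container uses its full allocation.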
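
**Migration Order**:

1. Start with RPC nodes (one at a time to minimize disruption)
2. Then validators, also one at a time: each migration takes a validator offline, and taking several offline together can stall consensus (for example, if the network uses IBFT 2.0 or QBFT, only 1 of 5 validators can be offline)
3. Finally Blockscout (its database may take longer to copy)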
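The reason for moving validators one at a time can be made concrete. Assuming the network uses IBFT 2.0 or QBFT (the plan does not say), Besu needs ceil(2n/3) of n validators online to produce blocks. The helper names below are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch, assuming IBFT 2.0/QBFT with a quorum of ceil(2n/3) validators.

# Validators that must stay online for n validators
bft_quorum() {
  local n=$1
  if ! [[ $n =~ ^[1-9][0-9]*$ ]]; then
    echo "usage: bft_quorum <validator-count>" >&2
    return 1
  fi
  # ceil(2n/3) computed with integer arithmetic
  echo $(( (2 * n + 2) / 3 ))
}

# Validators that can be offline at once (e.g. mid-migration) without losing quorum
bft_max_offline() {
  local n=$1 q
  q=$(bft_quorum "$n") || return 1
  echo $(( n - q ))
}
```

With the 5 validators in this plan, `bft_max_offline 5` prints 1, which is why validator migrations should not overlap.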

### Phase 2: Medium Resource Containers (Priority 2)

**Containers to Migrate**:

- Besu Sentries (1500-1503) - 4GB RAM each = **16GB total**
- Oracle Publisher (3500)
- CCIP Monitor (3501)
- Firefly (6200)
- ml110 containers (3000-3003) - if needed

### Phase 3: Keep on ml110 (Infrastructure)

**Containers to Keep**:

- Infrastructure services (100-105) - Core infrastructure
- Monitoring (130) - Should remain on primary node
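
## Migration Commands

### Single Container Migration

```bash
# Migrate a single container. pct must run on the node that currently hosts
# the container (192.168.11.10 is assumed to be ml110). LXC containers cannot be
# live-migrated, so --restart shuts the container down, moves it, and starts it on pve2.
ssh root@192.168.11.10 "pct migrate <VMID> pve2 --restart"

# Example: Migrate besu-rpc-1
ssh root@192.168.11.10 "pct migrate 2500 pve2 --restart"
```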
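When several containers are migrated by hand, a small wrapper can validate the VMID and offer a dry run before anything is moved. This is a sketch: `migrate_ct`, `SOURCE_HOST`, `TARGET_NODE`, and `DRY_RUN` are hypothetical names, and it keeps the assumption that 192.168.11.10 is ml110.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the single-container command.
# SOURCE_HOST defaults to 192.168.11.10 (assumed to be ml110); TARGET_NODE to pve2.
# Set DRY_RUN=1 to print the command instead of running it.
SOURCE_HOST=${SOURCE_HOST:-192.168.11.10}
TARGET_NODE=${TARGET_NODE:-pve2}

migrate_ct() {
  local vmid=$1
  # Proxmox VMIDs are positive integers; reject anything else before calling ssh
  if ! [[ $vmid =~ ^[1-9][0-9]*$ ]]; then
    echo "invalid VMID: '$vmid'" >&2
    return 2
  fi
  local cmd="pct migrate $vmid $TARGET_NODE --restart"
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "[dry-run] ssh root@$SOURCE_HOST \"$cmd\""
    return 0
  fi
  ssh "root@$SOURCE_HOST" "$cmd"
}
```

For example, `DRY_RUN=1 migrate_ct 2500` prints the `pct migrate 2500 pve2 --restart` command without running it.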

### Using Migration Script

```bash
# Dry run to see what would be migrated
./scripts/migrate-containers-to-pve2.sh --dry-run

# Execute migration
./scripts/migrate-containers-to-pve2.sh
```
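
### Batch Migration

```bash
# Migrate all RPC nodes, one at a time; stop this batch at the first failure
for vmid in 2500 2501 2502; do
  ssh root@192.168.11.10 "pct migrate $vmid pve2 --restart" || { echo "Migration of $vmid failed" >&2; break; }
  sleep 30  # Wait between migrations
done

# Migrate all validators, one at a time, so only one validator is offline at once
for vmid in 1000 1001 1002 1003 1004; do
  ssh root@192.168.11.10 "pct migrate $vmid pve2 --restart" || { echo "Migration of $vmid failed" >&2; break; }
  sleep 30  # Wait before taking the next validator offline
done
```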
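The fixed `sleep 30` in the batch loops does not confirm that a container actually came back up. A sketch that instead polls the Proxmox API until the container reports `running` on pve2; it assumes `pvesh` on 192.168.11.10 can query pve2's endpoints (pvesh forwards `/nodes/<node>/...` requests to that node), and `wait_for_running` with its retry and delay defaults is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: poll until a migrated container reports "running" on the target node.
# Arguments: VMID, node (default pve2), number of checks (default 30), delay in s (default 10).
wait_for_running() {
  local vmid=$1 node=${2:-pve2} tries=${3:-30} delay=${4:-10}
  local i
  for (( i = 1; i <= tries; i++ )); do
    # status/current returns JSON such as {"status":"running",...}
    if ssh root@192.168.11.10 \
        "pvesh get /nodes/$node/lxc/$vmid/status/current --output-format json" 2>/dev/null \
        | grep -q '"status" *: *"running"'; then
      echo "CT $vmid is running on $node"
      return 0
    fi
    sleep "$delay"
  done
  echo "CT $vmid is not running on $node after $tries checks" >&2
  return 1
}
```

In either loop above, `wait_for_running "$vmid" pve2 || break` could take the place of `sleep 30`.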

## Migration Considerations

### Pre-Migration Checklist

- [x] Verify cluster is quorate
- [x] Verify target node (pve2) is online
- [x] Check available storage on pve2
- [ ] Verify network connectivity between nodes
- [ ] Plan maintenance window if needed
- [ ] Back up critical containers (if needed)
- [ ] Notify users of potential brief service interruption
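The unchecked backup item could be handled with `vzdump` on ml110 before any container is moved. A sketch, assuming the Phase 1 VMIDs listed in this plan and a backup storage named `local` (an assumption; any storage with backup content enabled works); `backup_ct`, `backup_phase1`, and the `DRY_RUN` switch are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: back up the Phase 1 containers on ml110 (assumed 192.168.11.10) before migrating.
# BACKUP_STORAGE="local" is an assumed storage name. Set DRY_RUN=1 to print commands only.
BACKUP_STORAGE=${BACKUP_STORAGE:-local}
PHASE1_VMIDS=(2500 2501 2502 1000 1001 1002 1003 1004 5000)

backup_ct() {
  local vmid=$1
  # snapshot mode avoids stopping the container during the backup
  local cmd="vzdump $vmid --mode snapshot --compress zstd --storage $BACKUP_STORAGE"
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$cmd"
  else
    ssh root@192.168.11.10 "$cmd"
  fi
}

# Back up every Phase 1 container in migration order; stop at the first failure
backup_phase1() {
  local vmid
  for vmid in "${PHASE1_VMIDS[@]}"; do
    backup_ct "$vmid" || { echo "Backup of $vmid failed" >&2; return 1; }
  done
}
```

Running `DRY_RUN=1 backup_phase1` first lists the nine `vzdump` commands for review.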
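
### During Migration

1. **No live migration** - LXC containers cannot be live-migrated; with `--restart`, each container is shut down on ml110 before it is moved
2. **Service downtime** - Services in the container are unavailable from shutdown until the container has started on pve2
3. **Storage migration** - Container volumes on local (non-shared) storage are copied to the target node, so larger disks mean longer downtime
4. **Restart** - The container is started on the target node once the move completes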
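
### Post-Migration Verification

```bash
# Verify containers are on pve2 (pvesh can query any node from any cluster node)
ssh root@192.168.11.10 "pvesh get /nodes/pve2/lxc"

# Check container status. pct only sees containers on its own node, so after
# migration query pve2 through the API instead of running "pct status" on ml110
ssh root@192.168.11.10 "pvesh get /nodes/pve2/lxc/2500/status/current"

# Verify network connectivity from the container. pct exec must run on pve2
# itself (<pve2-ip> is a placeholder for pve2's address)
ssh root@<pve2-ip> "pct exec 2500 -- ping -c 3 192.168.11.250"
```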
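To confirm that every Phase 1 container landed on pve2, the `pvesh get /nodes/pve2/lxc` listing can be checked against the expected VMIDs. A sketch, assuming JSON output via `--output-format json` and no `jq` on the host; `missing_on_node` is a hypothetical helper that prints each missing VMID and returns non-zero if any are missing.

```bash
#!/usr/bin/env bash
# Sketch: list Phase 1 VMIDs that are not (yet) on the given node.
# The optional quotes in the pattern accept vmid as either a JSON string or number.
PHASE1_VMIDS=(2500 2501 2502 1000 1001 1002 1003 1004 5000)

missing_on_node() {
  local node=${1:-pve2} json vmid missing=0
  json=$(ssh root@192.168.11.10 "pvesh get /nodes/$node/lxc --output-format json") || return 2
  for vmid in "${PHASE1_VMIDS[@]}"; do
    # Match "vmid":2500 or "vmid":"2500" followed by , or } (so 25000 does not match 2500)
    if ! grep -Eq "\"vmid\":\"?$vmid\"?[,}]" <<<"$json"; then
      echo "$vmid"
      missing=1
    fi
  done
  return "$missing"
}
```

An empty output and exit status 0 from `missing_on_node pve2` means all nine containers are listed on pve2.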

### Rollback Plan

If issues occur after a container has moved to pve2, migrate it back:

```bash
# Migrate the container back to ml110. Run this on pve2, which now hosts the
# container (<pve2-ip> is a placeholder for pve2's address)
ssh root@<pve2-ip> "pct migrate <VMID> ml110 --restart"
```

## Expected Results

### After Phase 1 Migration

**ml110**:

- RAM usage: ~35.61GB → ~10GB (reduced by ~25GB)
- CPU usage: ~9.4% → ~3-4%
- Containers: 25 → 16 (9 containers moved)

**pve2**:

- RAM usage: ~4.49GB → up to ~90GB (upper bound, if the moved containers use their full RAM allocations; given ml110's ~25GB drop, actual usage is likely lower)
- CPU usage: ~0.06% → ~5-10%
- Containers: 0 → 9

### Resource Distribution

| Node | Containers | RAM Usage | Status |
|------|------------|-----------|--------|
| ml110 | 16 | ~10GB (8%) | ✅ Balanced |
| pve | 0 | ~5.6GB (1.1%) | ✅ Available |
| pve2 | 9 | up to ~90GB (36%) | ✅ Well utilized |
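The percentages in the table can be recomputed from the projected GB values and the node totals in the Node Resources table. A minimal sketch; the function name `pct_used` is hypothetical, and the GB inputs are this plan's estimates.

```bash
#!/usr/bin/env bash
# Sketch: percentage of a node's RAM in use, to one decimal place.
# usage: pct_used <used-GB> <total-GB>
pct_used() {
  awk -v used="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * used / total }'
}

pct_used 10 125.67    # ml110 after Phase 1
pct_used 5.62 503.79  # pve (unchanged)
pct_used 90 251.77    # pve2 after Phase 1 (upper bound)
```

These print 8.0, 1.1, and 35.7, matching the rounded values in the table.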

## Next Steps

1. ✅ Review and approve migration plan
2. ⏳ Execute Phase 1 migrations (RPC nodes, validators, Blockscout)
3. ⏳ Verify all containers are running correctly on pve2
4. ⏳ Monitor resource usage on both nodes
5. ⏳ Execute Phase 2 migrations if needed
6. ⏳ Document final container distribution

## Related Scripts

- `scripts/analyze-cluster-migration.sh` - Analyze cluster and container distribution
- `scripts/migrate-containers-to-pve2.sh` - Execute container migrations
- `scripts/get-container-distribution.sh` - List containers by node

---

**Last Updated**: $(date)

**Status**: Ready for execution