245 lines
6.9 KiB
Markdown
245 lines
6.9 KiB
Markdown
|
|
# Immediate Actions Execution Review
|
||
|
|
|
||
|
|
**Date:** 2026-01-20
|
||
|
|
**Review of:** Execution results from immediate actions
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Executive Summary
|
||
|
|
|
||
|
|
### ✅ Successes
|
||
|
|
1. **CPU Load Reduction:** ml110 CPU usage dropped from **81.5% to 39.2%** (52% reduction!)
|
||
|
|
2. **7 Containers Successfully Migrated** to r630-01:
|
||
|
|
- besu-validator-1, 2, 3 (containers 1000, 1001, 1002)
|
||
|
|
- besu-sentry-1, 2, 3 (containers 1500, 1501, 1502)
|
||
|
|
- besu-rpc-core-1 (container 2101)
|
||
|
|
3. **r630-01 Utilization:** CPU usage increased from 8.2% to 12.9% (still very healthy)
|
||
|
|
4. **All containers running** successfully after migration
|
||
|
|
|
||
|
|
### ⚠️ Issues Encountered
|
||
|
|
|
||
|
|
#### 1. Storage Incompatibility on r630-02
|
||
|
|
**Problem:** All 7 migrations to r630-02 failed with error:
|
||
|
|
```
|
||
|
|
storage 'local-lvm' is not available on node 'r630-02'
|
||
|
|
```
|
||
|
|
|
||
|
|
**Root Cause:**
|
||
|
|
- Containers on ml110 use `local-lvm` storage
|
||
|
|
- r630-02 has different storage pools: `thin1-r630-02`, `thin2`, `thin3`, `thin4`, `thin5`, `thin6`
|
||
|
|
- The standard `pct migrate` command doesn't automatically handle storage conversion
|
||
|
|
|
||
|
|
**Affected Containers:**
|
||
|
|
- besu-validator-4, 5 (1003, 1004)
|
||
|
|
- besu-sentry-4, ali (1503, 1504)
|
||
|
|
- besu-rpc-public-1 (2201)
|
||
|
|
- besu-rpc-ali-0x8a (2303)
|
||
|
|
- besu-rpc-thirdweb-0x8a-1 (2401)
|
||
|
|
|
||
|
|
#### 2. thin2 Storage Migration Issue
|
||
|
|
**Problem:** Container 5000 (blockscout-1) migration failed due to incorrect command syntax:
|
||
|
|
```
|
||
|
|
Unknown option: storage
|
||
|
|
pct migrate <vmid> <target> [OPTIONS]
|
||
|
|
```
|
||
|
|
|
||
|
|
**Root Cause:** The `pct migrate` command doesn't support `--storage` flag directly. Need to use API-based migration.
|
||
|
|
|
||
|
|
**Current Status:**
|
||
|
|
- Container 5000 still on thin2 (200GB disk, 96% used)
|
||
|
|
- Container 6200 also on thin2 (50GB disk)
|
||
|
|
- thin2 is at 88.86% capacity (210.7GB used of 226.13GB)
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Current System State
|
||
|
|
|
||
|
|
### ml110
|
||
|
|
- **Before:** 23 containers, 81.5% CPU usage
|
||
|
|
- **After:** 16 containers, 39.2% CPU usage
|
||
|
|
- **Improvement:** ✅ 52% CPU reduction
|
||
|
|
- **Remaining High-CPU Containers:**
|
||
|
|
- besu-validator-4 (95.2% CPU) - Failed to migrate
|
||
|
|
- besu-validator-5 (60.9% CPU) - Failed to migrate
|
||
|
|
- besu-sentry-4 (96.8% CPU) - Failed to migrate
|
||
|
|
- besu-sentry-ali (94.1% CPU) - Failed to migrate
|
||
|
|
- besu-rpc-public-1 (80.0% CPU) - Failed to migrate
|
||
|
|
- besu-rpc-ali-0x8a (93.3% CPU) - Failed to migrate
|
||
|
|
- besu-rpc-thirdweb-0x8a-1 (94.1% CPU) - Failed to migrate
|
||
|
|
|
||
|
|
### r630-01
|
||
|
|
- **Before:** 50 containers, 8.2% CPU usage
|
||
|
|
- **After:** 57 containers, 12.9% CPU usage
|
||
|
|
- **Status:** ✅ Healthy, well within capacity
|
||
|
|
|
||
|
|
### r630-02
|
||
|
|
- **Before:** 7 containers, 5.3% CPU usage
|
||
|
|
- **After:** 7 containers, 5.3% CPU usage
|
||
|
|
- **Status:** ⚠️ Still underutilized - migrations failed
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Solutions Required
|
||
|
|
|
||
|
|
### 1. Fix r630-02 Migrations (High Priority)
|
||
|
|
|
||
|
|
**Solution:** Use API-based migration with storage parameter:
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# Method 1: Use pvesh API
|
||
|
|
pvesh create /nodes/ml110/lxc/<vmid>/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
|
||
|
|
# Method 2: Stop container, migrate, change storage
|
||
|
|
pct stop <vmid>
|
||
|
|
pct migrate <vmid> r630-02
|
||
|
|
# Then manually move storage if needed
|
||
|
|
```
|
||
|
|
|
||
|
|
**Available Storage on r630-02:**
|
||
|
|
- `thin1-r630-02`: 0.34% used (225.36 GiB available) ✅ **Recommended**
|
||
|
|
- `thin3`: 3.11% used (219.10 GiB available)
|
||
|
|
- `thin4`: 22.59% used (175.05 GiB available)
|
||
|
|
- `thin5`: 0.00% used (226.13 GiB available)
|
||
|
|
- `thin6`: 0.00% used (226.13 GiB available)
|
||
|
|
|
||
|
|
### 2. Fix thin2 Capacity Issue (Critical)
|
||
|
|
|
||
|
|
**Containers Using thin2:**
|
||
|
|
- CT 5000 (blockscout-1): 200GB disk, 96% used
|
||
|
|
- CT 6200: 50GB disk, 10% used
|
||
|
|
- Orphaned volume: vm-6201-disk-0 (50GB, 7.72% used) - may be unused
|
||
|
|
|
||
|
|
**Solutions:**
|
||
|
|
1. **Migrate containers to free storage:**
|
||
|
|
- Use `pvesh` API to migrate CT 5000 to `thin1-r630-02` or `thin3`
|
||
|
|
- Migrate CT 6200 to available storage
|
||
|
|
- Clean up orphaned volumes if not in use
|
||
|
|
|
||
|
|
2. **Alternative:** Expand thin2 storage if possible
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Recommended Next Steps
|
||
|
|
|
||
|
|
### Immediate (Critical)
|
||
|
|
1. ✅ **Complete r630-02 migrations** using API-based method with storage parameter
|
||
|
|
2. ✅ **Migrate containers from thin2** to free up capacity
|
||
|
|
3. ✅ **Verify all migrations** and check container health
|
||
|
|
|
||
|
|
### High Priority
|
||
|
|
4. ✅ **Monitor CPU usage** on ml110 - should stabilize around 30-40%
|
||
|
|
5. ✅ **Check container health** after migrations
|
||
|
|
6. ✅ **Document storage mapping** for future migrations
|
||
|
|
|
||
|
|
### Medium Priority
|
||
|
|
7. ✅ **Investigate inactive storage pools** (data/thin1 on r630-02 are node-restricted)
|
||
|
|
8. ✅ **Optimize storage distribution** across all nodes
|
||
|
|
9. ✅ **Set up monitoring alerts** for storage >80% and CPU >70%
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Migration Commands for r630-02
|
||
|
|
|
||
|
|
### Using API-based Migration (Correct Method)
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# On ml110 or via SSH
|
||
|
|
# For each container, use:
|
||
|
|
|
||
|
|
# besu-validator-4 (1003)
|
||
|
|
pvesh create /nodes/ml110/lxc/1003/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
|
||
|
|
# besu-validator-5 (1004)
|
||
|
|
pvesh create /nodes/ml110/lxc/1004/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
|
||
|
|
# besu-sentry-4 (1503)
|
||
|
|
pvesh create /nodes/ml110/lxc/1503/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
|
||
|
|
# besu-sentry-ali (1504)
|
||
|
|
pvesh create /nodes/ml110/lxc/1504/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
|
||
|
|
# besu-rpc-public-1 (2201)
|
||
|
|
pvesh create /nodes/ml110/lxc/2201/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
|
||
|
|
# besu-rpc-ali-0x8a (2303)
|
||
|
|
pvesh create /nodes/ml110/lxc/2303/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
|
||
|
|
# besu-rpc-thirdweb-0x8a-1 (2401)
|
||
|
|
pvesh create /nodes/ml110/lxc/2401/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 1
|
||
|
|
```
|
||
|
|
|
||
|
|
### Migrate thin2 Containers
|
||
|
|
|
||
|
|
```bash
|
||
|
|
# On r630-02
|
||
|
|
# Migrate CT 5000 (blockscout-1) to thin1-r630-02
|
||
|
|
pvesh create /nodes/r630-02/lxc/5000/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 0 # Stop first if needed
|
||
|
|
|
||
|
|
# Migrate CT 6200 to thin1-r630-02
|
||
|
|
pvesh create /nodes/r630-02/lxc/6200/migrate \
|
||
|
|
--target r630-02 \
|
||
|
|
--storage thin1-r630-02 \
|
||
|
|
--online 0
|
||
|
|
```
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Expected Results After Completion
|
||
|
|
|
||
|
|
### ml110
|
||
|
|
- **CPU Usage:** ~15-20% (down from 81.5%)
|
||
|
|
- **Container Count:** ~9 containers (down from 23)
|
||
|
|
- **Status:** ✅ Optimally loaded for management/light workloads
|
||
|
|
|
||
|
|
### r630-01
|
||
|
|
- **CPU Usage:** ~15-20% (up from 8.2%)
|
||
|
|
- **Container Count:** ~57 containers
|
||
|
|
- **Status:** ✅ Well-balanced workload distribution
|
||
|
|
|
||
|
|
### r630-02
|
||
|
|
- **CPU Usage:** ~15-20% (up from 5.3%)
|
||
|
|
- **Container Count:** ~14 containers (up from 7)
|
||
|
|
- **Status:** ✅ Better utilization of high-core CPU
|
||
|
|
- **Storage:** thin2 below 50% usage
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
## Lessons Learned
|
||
|
|
|
||
|
|
1. **Storage Compatibility:** Always check available storage on target node before migration
|
||
|
|
2. **API vs CLI:** Use `pvesh` API for migrations when storage conversion is needed
|
||
|
|
3. **Migration Strategy:** Consider two-step migration (node first, then storage) for complex scenarios
|
||
|
|
4. **Verification:** Always verify migrations and check container health after completion
|
||
|
|
|
||
|
|
---
|
||
|
|
|
||
|
|
**Report Generated:** 2026-01-20
|
||
|
|
**Status:** Partial Success - 7/14 migrations completed successfully
|