# ML110 Deployment Log Analysis
**Date**: 2025-12-20
**Deployment Attempt**: Complete Validated Deployment (Option 1)
## Summary
The deployment attempt hit network configuration errors during container creation. No containers were actually created, even though the logs contain success messages.
## Key Findings
### 1. Network Configuration Errors
All container creation attempts failed with:
```
400 Parameter verification failed.
net0: invalid format - format error
net0.ip: invalid format - value does not look like a valid ipv4 network configuration
```
**Affected Containers**:
- Validators: 1000, 1001, 1002, 1003, 1004
- Sentries: 1500, 1501, 1502, 1503
- RPC Nodes: 2500, 2501, 2502
### 2. Script Logic Issue
The deployment script reports "Container created" even when `pct create` fails. This is misleading because:
- The `pct create` command returns an error (400 status)
- Containers were never actually created (no config files exist)
- The script continues execution as if containers exist
- All subsequent steps fail because containers don't exist
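
One way to catch this kind of false success is to confirm that the container's config file exists before logging success or moving on. The sketch below assumes the standard Proxmox VE location for LXC configs (`/etc/pve/lxc/<vmid>.conf`). The function names and the `LXC_CONF_DIR` override are illustrative and are not taken from the deployment script.

```bash
#!/usr/bin/env bash
# Sketch: verify that containers really exist after a creation step.
# LXC_CONF_DIR is a hypothetical override (useful for testing); on a
# Proxmox VE host the LXC configs live under /etc/pve/lxc.
LXC_CONF_DIR="${LXC_CONF_DIR:-/etc/pve/lxc}"

# Succeeds only if a config file exists for the given VMID.
container_exists() {
    local vmid="$1"
    [[ -f "${LXC_CONF_DIR}/${vmid}.conf" ]]
}

# Prints every VMID without a config file; returns non-zero if any are missing.
report_missing_containers() {
    local vmid missing=0
    for vmid in "$@"; do
        if ! container_exists "$vmid"; then
            echo "missing: $vmid"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `report_missing_containers 1000 1001 1002 1003 1004` could run after the validator creation step, with the later configuration steps skipped when it returns non-zero.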
### 3. Network Format Validation
**Test Result**: The network configuration format is **CORRECT**:
```bash
bridge=vmbr0,name=eth0,ip=192.168.11.100/24,gw=192.168.11.1,type=veth
```
This format successfully created test container 99999.
### 4. Container History
System logs show that containers with these IDs were created and later deleted on Dec 19-20, before the timestamp in the deployment log filename (`20251220-112033`):
- Validators 1000-1004: Created Dec 19, deleted Dec 20 06:21-06:22
- Sentries 1500-1503: Created Dec 19, deleted Dec 20 06:22-06:23
- RPC nodes 2500-2502: Created Dec 19, deleted Dec 20 06:21
### 5. Deployment Script Issues
**Location**: `/opt/smom-dbis-138-proxmox/scripts/deployment/deploy-besu-nodes.sh`
**Problems**:
1. **Error Handling**: Script doesn't check `pct create` exit code properly
2. **False Success**: Reports success even when container creation fails
3. **Variable Expansion**: Possible issue with variable expansion in network config string
**Expected Network Config** (from script):
```bash
network_config="bridge=${PROXMOX_BRIDGE:-vmbr0},name=eth0,ip=${ip_address}/${netmask},gw=${gateway},type=veth"
```
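
Because the template interpolates three unguarded values, a pre-flight check can reject a malformed string before it reaches `pct create`. This is a minimal sketch, not the script's actual code: `is_ipv4` and `build_net0` are hypothetical helpers, and the checks are deliberately loose (for example, an empty address or a dotted netmask where a prefix length is expected).

```bash
#!/usr/bin/env bash
# Sketch: build the net0 string and reject obviously malformed values.
# Function names are hypothetical, not part of deploy-besu-nodes.sh.

# Loose IPv4 check: four dot-separated groups of 1-3 digits, each <= 255.
is_ipv4() {
    local ip="$1" octet
    [[ "$ip" =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
    local IFS=.
    for octet in $ip; do
        # 10# forces base 10 so a group like "08" is not parsed as octal.
        (( 10#$octet <= 255 )) || return 1
    done
}

# Usage: build_net0 <bridge> <ip_address> <prefix_length> <gateway>
build_net0() {
    local bridge="$1" ip_address="$2" netmask="$3" gateway="$4"
    [[ -n "$bridge" ]]    || { echo "bridge is empty" >&2; return 1; }
    is_ipv4 "$ip_address" || { echo "invalid ip_address: '$ip_address'" >&2; return 1; }
    is_ipv4 "$gateway"    || { echo "invalid gateway: '$gateway'" >&2; return 1; }
    # The netmask must be a prefix length between 0 and 32.
    [[ "$netmask" =~ ^[0-9]+$ ]] && (( 10#$netmask <= 32 )) \
        || { echo "invalid netmask: '$netmask'" >&2; return 1; }
    echo "bridge=${bridge},name=eth0,ip=${ip_address}/${netmask},gw=${gateway},type=veth"
}
```

A caller might use `network_config="$(build_net0 "${PROXMOX_BRIDGE:-vmbr0}" "$ip_address" "$netmask" "$gateway")" || return 1`, so a bad value stops the run with a specific message instead of a generic 400 from the API.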
### 6. Configuration Phase Issues
Since containers don't exist:
- All configuration file copy attempts are skipped
- Container status checks all fail (containers not running)
- Network bootstrap fails (no containers to collect enodes from)
## Root Cause Analysis
The actual error suggests that at runtime, the network configuration string may be malformed due to:
1. **Variable Not Set**: `PROXMOX_BRIDGE`, `GATEWAY`, or `NETMASK` may be empty or incorrect
2. **Variable Expansion**: Shell variable expansion might not be working as expected
3. **String Formatting**: The network config string might be getting corrupted during variable substitution
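
To illustrate the first hypothesis, the sketch below shows how an empty value silently yields a string such as `ip=/24`, and how `${var:?}` turns that into an explicit failure. The function names are hypothetical. Note that `${PROXMOX_BRIDGE:-vmbr0}` already falls back to `vmbr0` when the variable is unset or empty, so in the template shown earlier the unguarded `ip_address`, `netmask`, and `gateway` values are the more likely suspects.

```bash
#!/usr/bin/env bash
# Sketch: an empty variable corrupts the net0 string without any shell error;
# ${var:?message} makes the same situation fail loudly instead.

# Mirrors the script's template with no checks at all.
render_net0_unchecked() {
    local ip_address="${1-}" netmask="${2-}" gateway="${3-}"
    echo "bridge=${PROXMOX_BRIDGE:-vmbr0},name=eth0,ip=${ip_address}/${netmask},gw=${gateway},type=veth"
}

# Same template, but every required value must be non-empty.
# The body is a subshell, so a failed ${var:?} expansion ends only the
# subshell (with a non-zero status) rather than the calling script.
render_net0_checked() (
    ip_address="${1-}" netmask="${2-}" gateway="${3-}"
    echo "bridge=${PROXMOX_BRIDGE:-vmbr0},name=eth0,ip=${ip_address:?ip_address is empty}/${netmask:?netmask is empty},gw=${gateway:?gateway is empty},type=veth"
)
```

Calling `render_net0_unchecked "" 24 192.168.11.1` produces a string containing `ip=/24`, which is the kind of value the `net0.ip` error complains about. The checked variant prints nothing and returns a non-zero status for the same input.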
## Evidence
### Working Containers (Reference)
Containers 100-105 exist and are running, using DHCP:
```
net0: bridge=vmbr0,firewall=1,hwaddr=BC:24:11:XX:XX:XX,ip=dhcp,type=veth
```
### Test Container Creation
Created container 99999 successfully with static IP format:
```
bridge=vmbr0,name=eth0,ip=192.168.11.100/24,gw=192.168.11.1,type=veth
```
## Recommendations
### 1. Fix Script Error Handling
Update `deploy-besu-nodes.sh` to properly check `pct create` exit code:
```bash
if ! pct create "$vmid" ...; then
    log_error "Failed to create container $vmid"
    return 1
fi
log_success "Container $vmid created"
```
### 2. Debug Variable Values
Add logging to show actual network config values before container creation:
```bash
log_info "Network config: $network_config"
log_info "PROXMOX_BRIDGE: ${PROXMOX_BRIDGE:-vmbr0}"
log_info "GATEWAY: ${GATEWAY:-192.168.11.1}"
log_info "NETMASK: ${NETMASK:-24}"
log_info "IP Address: $ip_address"
```
### 3. Verify Configuration File
Check that `/opt/smom-dbis-138-proxmox/config/proxmox.conf` and `network.conf` have correct values:
- `PROXMOX_BRIDGE=vmbr0`
- `GATEWAY=192.168.11.1`
- `NETMASK=24`
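
This check can be scripted. The sketch below assumes the config files are plain shell `KEY=value` files that can be sourced, which the values listed above suggest but this analysis does not confirm. `check_required_vars` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: source a config file in a subshell and report required variables
# that are unset or empty. Assumes the file is shell-sourceable KEY=value.
# Caveat: the subshell inherits the caller's environment, so a variable
# already exported by the caller will mask a missing entry in the file.
check_required_vars() (
    conf_file="$1"; shift
    [[ -r "$conf_file" ]] || { echo "cannot read $conf_file" >&2; exit 2; }
    # shellcheck disable=SC1090
    source "$conf_file"
    status=0
    for name in "$@"; do
        # ${!name} is bash indirect expansion: the value of the variable
        # whose name is stored in $name.
        if [[ -z "${!name:-}" ]]; then
            echo "missing or empty: $name"
            status=1
        fi
    done
    exit "$status"
)
```

An invocation could look like `check_required_vars /opt/smom-dbis-138-proxmox/config/proxmox.conf PROXMOX_BRIDGE`. Which file defines which variable is not stated in this analysis, so the variable list per file would need adjusting.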
### 4. Alternative: Use DHCP Initially
Since containers 100-105 work with DHCP, consider:
1. Create containers with DHCP: `ip=dhcp`
2. After creation, use `pct set` to configure static IPs (as in `fix-container-ips.sh`)
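This two-step approach sidesteps the static-IP validation failure at creation time, since DHCP-based creation is already known to work on this host.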
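
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the contents of `fix-container-ips.sh`: the helper names are hypothetical, the template argument is a placeholder, and any other `pct create` options used by the real script are omitted because they are not shown in this analysis. `DRY_RUN` defaults to `1`, so the commands are printed rather than executed.

```bash
#!/usr/bin/env bash
# Sketch: create a container with DHCP, then switch net0 to a static IP.
# DRY_RUN=1 (the default here) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [[ "$DRY_RUN" == "1" ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: create_then_set_static <vmid> <template> <ip/prefix> <gateway>
create_then_set_static() {
    local vmid="$1" template="$2" ip_cidr="$3" gateway="$4"
    local bridge="${PROXMOX_BRIDGE:-vmbr0}"

    # Step 1: create with DHCP, as used by the working containers 100-105.
    run pct create "$vmid" "$template" \
        --net0 "name=eth0,bridge=${bridge},ip=dhcp,type=veth" || return 1

    # Step 2: replace net0 with the static configuration.
    run pct set "$vmid" \
        --net0 "name=eth0,bridge=${bridge},ip=${ip_cidr},gw=${gateway},type=veth" || return 1
}
```

Reviewing the dry-run output first makes it easy to inspect the exact `net0` strings before anything touches the host. Setting `DRY_RUN=0` runs the same commands for real.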
## Next Steps
1. **Fix Script**: Update error handling in `deploy-besu-nodes.sh`
2. **Add Debugging**: Add verbose logging for network config values
3. **Test Creation**: Create a single test container to verify fix
4. **Re-run Deployment**: Execute full deployment after fix
## Log Files
- **Deployment Log**: `/opt/smom-dbis-138-proxmox/logs/deploy-validated-set-20251220-112033.log`
- **System Logs**: `journalctl -u 'pve-container@*'` shows container lifecycle events (the pattern is quoted so the shell does not expand it)
---
**Status**: ❌ Deployment Failed - Containers not created due to network config error
**Action Required**: Fix script error handling and verify configuration variable values