# ML110 Deployment Log Analysis

**Date**: 2025-12-20
**Deployment Attempt**: Complete Validated Deployment (Option 1)

## Summary

The deployment attempt hit network configuration errors during container creation. No containers were actually created, even though the logs contain success messages.

## Key Findings

### 1. Network Configuration Errors

All container creation attempts failed with:
```
400 Parameter verification failed.
net0: invalid format - format error
net0.ip: invalid format - value does not look like a valid ipv4 network configuration
```

**Affected Containers**:
- Validators: 1000, 1001, 1002, 1003, 1004
- Sentries: 1500, 1501, 1502, 1503
- RPC Nodes: 2500, 2501, 2502

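
The `net0.ip` message indicates that the value after `ip=` did not parse as an IPv4 address with a prefix length. A pre-flight check along the following lines could reject such values before `pct create` is called. This is a minimal sketch: the helper name `is_valid_ipv4_cidr` is hypothetical, and the check only approximates whatever validation Proxmox performs internally.

```bash
# Hypothetical helper: succeeds if $1 looks like IPv4 CIDR notation
# (for example 192.168.11.100/24), fails otherwise.
is_valid_ipv4_cidr() {
    # Shape check: four dot-separated digit groups, a slash, a 1-2 digit prefix.
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]{1,2}$' || return 1
    # Range check: split on "." and "/" and compare each field numerically.
    printf '%s\n' "$1" | awk -F'[./]' '{
        for (i = 1; i <= 4; i++) if ($i > 255) exit 1
        if ($5 > 32) exit 1
    }'
}
```

A deployment script could call `is_valid_ipv4_cidr "${ip_address}/${netmask}"` and abort with a clear message when it fails, instead of relying on the 400 response.
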
### 2. Script Logic Issue

The deployment script reports "Container created" even when `pct create` fails. This is misleading because:

- The `pct create` command returns an error (400 status)
- Containers were never actually created (no config files exist)
- The script continues execution as if containers exist
- All subsequent steps fail because containers don't exist

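
Because the missing config files are what show that creation failed, the script could confirm each container's existence before continuing. The sketch below assumes the standard Proxmox VE location for LXC configs, `/etc/pve/lxc/<vmid>.conf`. The helper names and the `PCT_CONF_DIR` override are hypothetical and are not taken from `deploy-besu-nodes.sh`.

```bash
# Directory where Proxmox VE keeps LXC container configs.
# PCT_CONF_DIR is a hypothetical override, useful for testing off-host.
PCT_CONF_DIR="${PCT_CONF_DIR:-/etc/pve/lxc}"

# Hypothetical helper: succeeds only if a config file exists for the given VMID.
container_exists() {
    [ -f "${PCT_CONF_DIR}/$1.conf" ]
}

# Hypothetical helper: prints each VMID from its arguments that has no config file.
list_missing_containers() {
    for vmid in "$@"; do
        container_exists "$vmid" || printf '%s\n' "$vmid"
    done
}
```

For example, running `list_missing_containers 1000 1001 1002 1003 1004` after the creation phase would print every validator that was not actually created.
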
### 3. Network Format Validation

**Test Result**: The network configuration format is **CORRECT**:
```bash
bridge=vmbr0,name=eth0,ip=192.168.11.100/24,gw=192.168.11.1,type=veth
```

This format successfully created test container 99999.

### 4. Container History

System logs show that containers with these IDs existed earlier: they were created on Dec 19 and deleted on Dec 20:
- Validators 1000-1004: Created Dec 19, deleted Dec 20 06:21-06:22
- Sentries 1500-1503: Created Dec 19, deleted Dec 20 06:22-06:23
- RPC nodes 2500-2502: Created Dec 19, deleted Dec 20 06:21

### 5. Deployment Script Issues

**Location**: `/opt/smom-dbis-138-proxmox/scripts/deployment/deploy-besu-nodes.sh`

**Problems**:
1. **Error Handling**: Script doesn't check `pct create` exit code properly
2. **False Success**: Reports success even when container creation fails
3. **Variable Expansion**: Possible issue with variable expansion in network config string

**Expected Network Config** (from script):
```bash
network_config="bridge=${PROXMOX_BRIDGE:-vmbr0},name=eth0,ip=${ip_address}/${netmask},gw=${gateway},type=veth"
```

### 6. Configuration Phase Issues

Since containers don't exist:
- All configuration file copy attempts are skipped
- Container status checks all fail (containers not running)
- Network bootstrap fails (no containers to collect enodes from)

## Root Cause Analysis

The format itself is valid (see the test container above), so the error suggests that the network configuration string is malformed at runtime. Possible causes:

1. **Variable Not Set**: `PROXMOX_BRIDGE`, `GATEWAY`, or `NETMASK` may be empty or incorrect
2. **Variable Expansion**: Shell variable expansion might not be working as expected
3. **String Formatting**: The network config string might be getting corrupted during variable substitution

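
To illustrate the first cause, the sketch below rebuilds the script's template from positional arguments and shows what an empty value does to it. Both function names are hypothetical. The strict variant uses the shell's `${N:?message}` expansion, which aborts the current shell (or subshell) when the argument is unset or empty.

```bash
# Hypothetical builder mirroring the script's template. Arguments:
# bridge, IP address, netmask (prefix length), gateway.
build_network_config() {
    printf 'bridge=%s,name=eth0,ip=%s/%s,gw=%s,type=veth\n' "$1" "$2" "$3" "$4"
}

# Same builder, but each argument is guarded: ${N:?message} aborts the
# calling (sub)shell with an error if the argument is unset or empty.
build_network_config_strict() {
    build_network_config \
        "${1:?bridge is empty}" "${2:?ip_address is empty}" \
        "${3:?netmask is empty}" "${4:?gateway is empty}"
}

# With an empty netmask and gateway, the unguarded builder emits a malformed value:
build_network_config vmbr0 192.168.11.100 "" ""
# prints: bridge=vmbr0,name=eth0,ip=192.168.11.100/,gw=,type=veth
```

A value such as `ip=192.168.11.100/` is not a valid IPv4 network configuration, which matches the kind of input the `net0.ip` error describes. Calling the strict variant inside a command substitution, for example `network_config="$(build_network_config_strict ...)" || return 1`, keeps the abort confined to the subshell.
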
## Evidence

### Working Containers (Reference)
Containers 100-105 exist and are running, using DHCP:
```
net0: bridge=vmbr0,firewall=1,hwaddr=BC:24:11:XX:XX:XX,ip=dhcp,type=veth
```

### Test Container Creation
Created container 99999 successfully with static IP format:
```
bridge=vmbr0,name=eth0,ip=192.168.11.100/24,gw=192.168.11.1,type=veth
```

## Recommendations

### 1. Fix Script Error Handling

Update `deploy-besu-nodes.sh` to properly check `pct create` exit code:

```bash
if ! pct create "$vmid" ...; then
    log_error "Failed to create container $vmid"
    return 1
fi
log_success "Container $vmid created"
```

### 2. Debug Variable Values

Add logging to show the actual network config values before container creation. Log the raw values rather than fallback defaults, so that an unset or empty variable is visible in the log:

```bash
log_info "Network config: $network_config"
log_info "PROXMOX_BRIDGE: ${PROXMOX_BRIDGE:-<unset or empty>}"
log_info "GATEWAY: ${GATEWAY:-<unset or empty>}"
log_info "NETMASK: ${NETMASK:-<unset or empty>}"
log_info "IP Address: $ip_address"
```

### 3. Verify Configuration File

Check that `/opt/smom-dbis-138-proxmox/config/proxmox.conf` and `network.conf` have correct values:
- `PROXMOX_BRIDGE=vmbr0`
- `GATEWAY=192.168.11.1`
- `NETMASK=24`

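
This check could also be automated at the start of the deployment script. The sketch below assumes that both config files are plain shell fragments that can be sourced, and that `network.conf` lives in the same `config/` directory. The helper name `require_vars` is hypothetical.

```bash
# Hypothetical helper: reports every named variable that is unset or empty.
# Returns non-zero if at least one is missing.
require_vars() {
    missing=0
    for name in "$@"; do
        eval "value=\${$name:-}"   # indirect lookup of the variable called $name
        if [ -z "$value" ]; then
            echo "ERROR: $name is unset or empty" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Intended use (assumes the files are shell-sourceable and share a directory):
#   . /opt/smom-dbis-138-proxmox/config/proxmox.conf
#   . /opt/smom-dbis-138-proxmox/config/network.conf
#   require_vars PROXMOX_BRIDGE GATEWAY NETMASK || exit 1
```

Reporting all missing variables in one pass, rather than stopping at the first, keeps the fix-and-retry loop short.
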
### 4. Alternative: Use DHCP Initially

Since containers 100-105 work with DHCP, consider:
1. Create containers with DHCP: `ip=dhcp`
2. After creation, use `pct set` to configure static IPs (as in `fix-container-ips.sh`)

This two-step approach may be more reliable, because container creation then uses a network configuration that is already known to work on this host.

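
The two steps could be wrapped as follows. This is a sketch only: the function name and argument order are hypothetical, a real `pct create` call would carry further options (storage, hostname, and so on) that are omitted here, and the contents of `fix-container-ips.sh` are not shown in this report, so this does not claim to match it.

```bash
# Hypothetical two-step helper. Arguments: VMID, OS template, bridge,
# static IP, netmask (prefix length), gateway.
create_with_dhcp_then_static() {
    vmid="$1"; template="$2"; bridge="$3"; ip="$4"; mask="$5"; gw="$6"

    # Step 1: create the container with DHCP, the form used by containers 100-105.
    pct create "$vmid" "$template" \
        --net0 "bridge=${bridge},name=eth0,ip=dhcp,type=veth" || return 1

    # Step 2: switch net0 to the static address once the container exists.
    pct set "$vmid" \
        --net0 "bridge=${bridge},name=eth0,ip=${ip}/${mask},gw=${gw},type=veth"
}
```

If step 1 fails, the function returns immediately and `pct set` is never attempted, so a failed creation cannot be reported as a success.
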
## Next Steps

1. **Fix Script**: Update error handling in `deploy-besu-nodes.sh`
2. **Add Debugging**: Add verbose logging for network config values
3. **Test Creation**: Create a single test container to verify fix
4. **Re-run Deployment**: Execute full deployment after fix

## Log Files

- **Deployment Log**: `/opt/smom-dbis-138-proxmox/logs/deploy-validated-set-20251220-112033.log`
- **System Logs**: `journalctl -u 'pve-container@*'` shows container lifecycle (the pattern is quoted so the shell does not expand the `*`)

---

**Status**: ❌ Deployment Failed - Containers not created due to network config error
**Action Required**: Fix script error handling and verify configuration variable values