# Troubleshooting FAQ
Common issues and solutions for Besu validated set deployment.
## Table of Contents
1. [Container Issues](#container-issues)
2. [Service Issues](#service-issues)
3. [Network Issues](#network-issues)
4. [Consensus Issues](#consensus-issues)
5. [Configuration Issues](#configuration-issues)
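6. [Performance Issues](#performance-issues)
7. [General Troubleshooting Commands](#general-troubleshooting-commands)
8. [Getting Help](#getting-help)
9. [Related Documentation](#related-documentation)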
---
## Container Issues
### Q: Container won't start
**Symptoms**: `pct status <vmid>` shows "stopped" or errors during startup
**Solutions**:
```bash
# Check container status
pct status <vmid>
# View container console
pct console <vmid>
# Check logs
journalctl -u pve-container@<vmid>
# Check container configuration
pct config <vmid>
# Try starting manually
pct start <vmid>
```
**Common Causes**:
- Insufficient resources (RAM, disk)
- Network configuration errors
- Invalid container configuration
- OS template issues
---
### Q: Container runs out of disk space
**Symptoms**: Services fail, "No space left on device" errors
**Solutions**:
```bash
# Check disk usage
pct exec <vmid> -- df -h
# Check Besu database size
pct exec <vmid> -- du -sh /data/besu/database/
# Clean up old logs
pct exec <vmid> -- journalctl --vacuum-time=7d
# Grow the root filesystem by 10 GiB (the storage backend must support resizing; shrinking is not supported)
pct resize <vmid> rootfs +10G
```
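To spot nearly full filesystems without reading the whole `df` table, the output can be filtered by usage percentage. The following is a minimal sketch; the function name `over_threshold` and the 90% default are illustrative assumptions, not part of the deployment scripts.

```bash
# Print mount points whose usage meets or exceeds a threshold (default 90%).
# Reads `df -P` output on stdin; the POSIX format keeps each filesystem on one line.
over_threshold() {
    local limit="${1:-90}"
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)                 # strip the trailing percent sign
        if (use + 0 >= limit) print $6 " " $5
    }'
}

# Usage (on the Proxmox host):
#   pct exec <vmid> -- df -P | over_threshold 85
```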
---
### Q: Container network issues
**Symptoms**: Cannot ping, cannot connect to services
**Solutions**:
```bash
# Check network configuration
pct config <vmid> | grep net0
# Check if container has IP
pct exec <vmid> -- ip addr show
# Check routing
pct exec <vmid> -- ip route
# Restart the container to reapply its network configuration
pct stop <vmid>
pct start <vmid>
```
---
## Service Issues
### Q: Besu service won't start
**Symptoms**: `systemctl status besu-validator` shows failed
**Solutions**:
```bash
# Check service status
pct exec <vmid> -- systemctl status besu-validator
# View service logs
pct exec <vmid> -- journalctl -u besu-validator -n 100
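# List the options this Besu version supports, to compare against the config file
# (--help prints usage and exits; actual startup errors appear in the service logs)
pct exec <vmid> -- besu --config-file=/etc/besu/config-validator.toml --help
# Inspect the configuration file
pct exec <vmid> -- cat /etc/besu/config-validator.toml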
```
**Common Causes**:
- Missing configuration files
- Invalid configuration syntax
- Missing validator keys
- Port conflicts
- Insufficient resources
---
### Q: Service starts but crashes
**Symptoms**: Service starts then stops, high restart count
**Solutions**:
```bash
# Check crash logs
pct exec <vmid> -- journalctl -u besu-validator --since "10 minutes ago"
# Check for out-of-memory kills (if dmesg is not readable inside the container, run it on the Proxmox host)
pct exec <vmid> -- dmesg | grep -i "out of memory"
# Check system resources
pct exec <vmid> -- free -h
pct exec <vmid> -- df -h
# Check JVM heap settings
pct exec <vmid> -- cat /etc/systemd/system/besu-validator.service | grep BESU_OPTS
```
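The "high restart count" symptom can be read directly from systemd rather than inferred from logs. This sketch assumes the container's systemd exposes the `NRestarts` unit property (available in reasonably recent systemd releases); the helper name `restart_count` is hypothetical.

```bash
# Extract the restart counter from `systemctl show -p NRestarts` output,
# which has the form "NRestarts=<n>". Reads that output on stdin.
restart_count() {
    sed -n 's/^NRestarts=\([0-9][0-9]*\)$/\1/p'
}

# Usage (on the Proxmox host):
#   pct exec <vmid> -- systemctl show besu-validator -p NRestarts | restart_count
```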
---
### Q: Service shows as active but not responding
**Symptoms**: Service status shows "active" but RPC/P2P not responding
**Solutions**:
```bash
# Check if process is actually running
pct exec <vmid> -- ps aux | grep besu
# Check if ports are listening
pct exec <vmid> -- netstat -tuln | grep -E "30303|8545|9545"
# Check firewall rules
pct exec <vmid> -- iptables -L -n
# Test connectivity
pct exec <vmid> -- curl -s http://localhost:8545
```
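When several ports are expected, it can help to list only the ones with no listener. The sketch below filters `netstat -tuln` (or `ss -tuln`) output; the function name `missing_ports` is an assumption for illustration, and the port list should match your node's actual configuration.

```bash
# Report which expected ports have no listener.
# Reads `netstat -tuln` or `ss -tuln` output on stdin; ports are passed as arguments.
missing_ports() {
    local listing port
    listing="$(cat)"
    for port in "$@"; do
        # Require ":<port>" followed by whitespace so 8545 does not match 18545 or 85450.
        if ! printf '%s\n' "$listing" | grep -Eq ":${port}[[:space:]]"; then
            echo "$port"
        fi
    done
}

# Usage (on the Proxmox host):
#   pct exec <vmid> -- netstat -tuln | missing_ports 30303 8545 9545
```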
---
## Network Issues
### Q: Nodes cannot connect to peers
**Symptoms**: Low or zero peer count, "No peers" in logs
**Solutions**:
```bash
# Check static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json
# Check permissions-nodes.toml
pct exec <vmid> -- cat /etc/besu/permissions-nodes.toml
# Verify enode URLs are correct
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode
# Check P2P port is open
pct exec <vmid> -- netstat -tuln | grep 30303
# Test connectivity to peer
pct exec <vmid> -- ping -c 3 <peer-ip>
```
**Common Causes**:
- Incorrect enode URLs in static-nodes.json
- Firewall blocking P2P port (30303)
- Nodes not in permissions-nodes.toml
- Network connectivity issues
---
### Q: Invalid enode URL errors
**Symptoms**: "Invalid enode URL syntax" or "Invalid node ID" in logs
**Solutions**:
```bash
# Check node ID length (must be 128 hex chars)
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode | \
sed 's|^enode://||' | cut -d'@' -f1 | wc -c
# Should output 129 (128 chars + newline)
# Fix node IDs using allowlist scripts
./scripts/besu-collect-all-enodes.sh
./scripts/besu-generate-allowlist.sh
./scripts/besu-deploy-allowlist.sh
```
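Before regenerating the allowlist, individual enode URLs can be checked for shape. This is a minimal sketch that only validates the format described above (128 hex characters, then `@host:port`); the function name `valid_enode` is hypothetical, and the pattern assumes IPv4 addresses or hostnames rather than bracketed IPv6.

```bash
# Return success if the argument looks like enode://<128 hex chars>@<host>:<port>,
# optionally followed by ?discport=<port>.
valid_enode() {
    printf '%s\n' "$1" |
        grep -Eq '^enode://[0-9a-fA-F]{128}@[^:@/]+:[0-9]+(\?discport=[0-9]+)?$'
}

# Usage:
#   valid_enode "$url" && echo "format OK" || echo "format INVALID"
```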
---
### Q: RPC endpoint not accessible
**Symptoms**: Cannot connect to RPC on port 8545
**Solutions**:
```bash
# Check if RPC is enabled (validators typically don't have RPC)
pct exec <vmid> -- grep -i "rpc-http-enabled" /etc/besu/config-*.toml
# Check if RPC port is listening
pct exec <vmid> -- netstat -tuln | grep 8545
# Check firewall
pct exec <vmid> -- iptables -L -n | grep 8545
# Test from container
pct exec <vmid> -- curl -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8545
# Check host allowlist in config
pct exec <vmid> -- grep -i "host-allowlist\|rpc-http-host" /etc/besu/config-*.toml
```
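The `eth_blockNumber` call above returns the height as a hex string, which is awkward to compare by eye. A small sketch for converting it to decimal follows; the helper name `block_number_dec` is an assumption, and it expects the compact single-line JSON response.

```bash
# Convert the hex quantity in an eth_blockNumber response (e.g. "0x1a") to decimal.
# Reads the raw JSON-RPC response on stdin; fails if no hex result is present.
block_number_dec() {
    local hex
    hex="$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F][0-9a-fA-F]*\)".*/\1/p')"
    [ -n "$hex" ] || return 1
    printf '%d\n' "$hex"
}

# Usage (on the Proxmox host):
#   pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
#       -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#       http://localhost:8545 | block_number_dec
```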
---
## Consensus Issues
### Q: No blocks being produced
**Symptoms**: Block height not increasing, "No blocks" in logs
**Solutions**:
```bash
# Check validator service is running
pct exec <vmid> -- systemctl status besu-validator
# Check validator keys
pct exec <vmid> -- ls -la /keys/validators/
# Check consensus logs
pct exec <vmid> -- journalctl -u besu-validator | grep -i "consensus\|qbft\|proposing"
# Verify validators are in genesis (if static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | grep -A 20 "qbft"
# Check peer connectivity
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
http://localhost:8545
```
**Common Causes**:
- Validator keys missing or incorrect
- Not enough validators online
- Network connectivity issues
- Consensus configuration errors
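
The cause "Not enough validators online" can be made concrete. As a rough sketch, assuming the ceil(2n/3) quorum rule generally described for QBFT (verify against the Besu documentation for your version), the minimum number of reachable validators can be computed as follows. The function name `qbft_quorum` is illustrative.

```bash
# Minimum number of validators that must be online and agreeing for blocks
# to be produced, assuming a ceil(2n/3) quorum.
qbft_quorum() {
    local n="$1"
    echo $(( (2 * n + 2) / 3 ))   # integer form of ceil(2n/3)
}

qbft_quorum 4   # prints 3
qbft_quorum 5   # prints 4
```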
---
### Q: Validator not participating in consensus
**Symptoms**: Validator running but not producing blocks
**Solutions**:
```bash
# Verify validator address
pct exec <vmid> -- cat /keys/validators/validator-*/address.txt
# Check if address is in validator contract (for dynamic validators)
# Or check genesis.json (for static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool | grep -A 10 "qbft"
# Verify validator keys are loaded
pct exec <vmid> -- journalctl -u besu-validator | grep -i "validator.*key"
# Check for permission errors
pct exec <vmid> -- journalctl -u besu-validator | grep -i "permission\|denied"
```
---
## Configuration Issues
### Q: Configuration file not found
**Symptoms**: "File not found" errors, service won't start
**Solutions**:
```bash
# List all config files
pct exec <vmid> -- ls -la /etc/besu/
# Verify required files exist
pct exec <vmid> -- test -f /etc/besu/genesis.json && echo "genesis.json OK" || echo "genesis.json MISSING"
pct exec <vmid> -- test -f /etc/besu/config-validator.toml && echo "config OK" || echo "config MISSING"
# Copy missing files
# (Use copy-besu-config.sh script)
./scripts/copy-besu-config.sh /path/to/smom-dbis-138
```
---
### Q: Invalid configuration syntax
**Symptoms**: "Invalid option" or syntax errors in logs
**Solutions**:
```bash
# Validate TOML syntax (tomllib requires Python 3.11+)
pct exec <vmid> -- python3 -c "import tomllib; tomllib.load(open('/etc/besu/config-validator.toml', 'rb'))" 2>&1 && echo "TOML OK"
# Validate JSON syntax
pct exec <vmid> -- python3 -m json.tool /etc/besu/genesis.json > /dev/null
# Check for deprecated options
pct exec <vmid> -- journalctl -u besu-validator | grep -i "deprecated\|unknown option"
# Review Besu documentation for current options
```
---
### Q: Path errors in configuration
**Symptoms**: "File not found" errors with paths like "/config/genesis.json"
**Solutions**:
```bash
# Check configuration file paths
pct exec <vmid> -- grep -E "genesis-file|data-path" /etc/besu/config-validator.toml
# Correct paths should be:
# genesis-file="/etc/besu/genesis.json"
# data-path="/data/besu"
# Fix paths if needed
pct exec <vmid> -- sed -i 's|/config/|/etc/besu/|g' /etc/besu/config-validator.toml
```
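Because `sed -i` rewrites the file in place, it may be worth previewing the substitution first. The sketch below prints only the lines that would change; the helper name `preview_path_fix` is hypothetical.

```bash
# Show the lines that the /config/ -> /etc/besu/ substitution would change,
# without modifying anything. Reads the TOML file on stdin.
preview_path_fix() {
    sed -n 's|/config/|/etc/besu/|gp'
}

# Usage (on the Proxmox host):
#   pct exec <vmid> -- cat /etc/besu/config-validator.toml | preview_path_fix
```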
---
## Performance Issues
### Q: High CPU usage
**Symptoms**: Container CPU usage > 80% consistently
**Solutions**:
```bash
# Check CPU usage
pct exec <vmid> -- top -bn1 | head -20
# Check JVM GC activity
pct exec <vmid> -- journalctl -u besu-validator | grep -i "gc\|pause"
# Adjust JVM settings if needed:
# edit /etc/systemd/system/besu-validator.service and tune BESU_OPTS and JAVA_OPTS,
# then run `systemctl daemon-reload` and restart besu-validator inside the container
# Consider allocating more CPU cores
pct set <vmid> --cores 4
```
---
### Q: High memory usage
**Symptoms**: Container running out of memory, OOM kills
**Solutions**:
```bash
# Check memory usage
pct exec <vmid> -- free -h
# Check JVM heap settings
pct exec <vmid> -- ps aux | grep besu | grep -oP 'Xm[xs]\K[0-9]+[gGmM]'
# Reduce heap size if too large:
# edit /etc/systemd/system/besu-validator.service and set BESU_OPTS="-Xmx4g" to an appropriate size,
# then run `systemctl daemon-reload` and restart besu-validator inside the container
# Or increase container memory
pct set <vmid> --memory 8192
```
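Whether the heap is "too large" depends on how it compares with the container's memory limit. The following sketch converts a heap setting to MiB and checks it against a fraction of container memory. The helper names and the 75% default are assumptions for illustration, not a Besu recommendation.

```bash
# Convert a JVM heap size such as "4g" or "512m" to MiB.
heap_to_mib() {
    local value="${1%[gGmM]}" unit="${1#"${1%?}"}"   # numeric part, last character
    case "$unit" in
        g|G) echo $(( value * 1024 )) ;;
        m|M) echo "$value" ;;
        *)   return 1 ;;                             # unsupported or missing unit
    esac
}

# Succeed if the heap fits within a percentage of container memory (in MiB).
heap_fits() {
    local heap_mib container_mib="$2" percent="${3:-75}"
    heap_mib="$(heap_to_mib "$1")" || return 2
    [ $(( heap_mib * 100 )) -le $(( container_mib * percent )) ]
}

# Usage: heap_fits 4g 8192 && echo "heap OK" || echo "heap too large for container"
```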
---
### Q: Slow sync or block processing
**Symptoms**: Blocks processing slowly, falling behind
**Solutions**:
```bash
# Check database size and health
pct exec <vmid> -- du -sh /data/besu/database/
# Check disk I/O
pct exec <vmid> -- iostat -x 1 5
# Consider using SSD storage
# Check network latency
pct exec <vmid> -- ping -c 10 <peer-ip>
# Verify sufficient peers
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
http://localhost:8545 | python3 -c "import sys, json; print(len(json.load(sys.stdin).get('result', [])))"
```
---
## General Troubleshooting Commands
```bash
# View all container statuses
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
    echo "=== Container $vmid ==="
    pct status "$vmid"
done
# Check all validator service statuses
for vmid in 1000 1001 1002 1003 1004; do
    echo "=== Service status for container $vmid ==="
    pct exec "$vmid" -- systemctl status besu-validator --no-pager -l | head -10
done
# View recent logs from all validator nodes
for vmid in 1000 1001 1002 1003 1004; do
    echo "=== Logs for container $vmid ==="
    pct exec "$vmid" -- journalctl -u besu-validator -n 20 --no-pager
done
# Check network connectivity between nodes
pct exec 1000 -- ping -c 3 192.168.11.14 # validator to validator
# Verify RPC endpoint (RPC nodes only)
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8545 | python3 -m json.tool
```
---
## Getting Help
If issues persist:
1. **Collect Information**:
   - Service logs: `pct exec <vmid> -- journalctl -u besu-validator -n 100`
   - Container status: `pct status <vmid>`
   - Configuration: `pct exec <vmid> -- cat /etc/besu/config-validator.toml`
   - Network: `pct exec <vmid> -- ip addr show`
2. **Check Documentation**:
   - [Besu Nodes File Reference](BESU_NODES_FILE_REFERENCE.md)
   - [Deployment Guide](VALIDATED_SET_DEPLOYMENT_GUIDE.md)
   - [Besu Documentation](https://besu.hyperledger.org/)
3. **Validate Configuration**:
   - Run prerequisites check: `./scripts/validation/check-prerequisites.sh`
   - Validate validators: `./scripts/validation/validate-validator-set.sh`
4. **Review Logs**:
   - Check deployment logs: `logs/deploy-validated-set-*.log`
   - Check service logs in containers
   - Check Proxmox host logs
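
The items under "Collect Information" can be gathered in one pass. This is a sketch only: the function name `collect_diagnostics` and the output layout are assumptions, it expects to run on the Proxmox host where `pct` is available, and the collected files should be reviewed before sharing them.

```bash
# Gather status, service logs, configuration, and network details
# for one container into <outdir>/<vmid>/ and print that directory.
collect_diagnostics() {
    local vmid="$1" outdir="${2:-./diagnostics}/$1"
    mkdir -p "$outdir" || return 1
    pct status "$vmid" > "$outdir/status.txt" 2>&1
    pct exec "$vmid" -- journalctl -u besu-validator -n 100 --no-pager \
        > "$outdir/service.log" 2>&1
    pct exec "$vmid" -- cat /etc/besu/config-validator.toml \
        > "$outdir/config-validator.toml" 2>&1
    pct exec "$vmid" -- ip addr show > "$outdir/network.txt" 2>&1
    echo "$outdir"
}

# Usage (on the Proxmox host):
#   collect_diagnostics 1000 /tmp/besu-diagnostics
```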
---
## Related Documentation
### Operational Procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Complete operational runbooks
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting
### Deployment & Configuration
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Current deployment status
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** - Network architecture reference
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Deployment guide
### Monitoring
- **[MONITORING_SUMMARY.md](MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring
### Reference
- **[MASTER_INDEX.md](MASTER_INDEX.md)** - Complete documentation index
---
**Last Updated:** 2025-01-20
**Version:** 1.0