# Troubleshooting FAQ

Common issues and solutions for Besu validated set deployment.

## Table of Contents

1. [Container Issues](#container-issues)
2. [Service Issues](#service-issues)
3. [Network Issues](#network-issues)
4. [Consensus Issues](#consensus-issues)
5. [Configuration Issues](#configuration-issues)
6. [Performance Issues](#performance-issues)

---

## Container Issues

### Q: Container won't start

**Symptoms**: `pct status <vmid>` shows "stopped", or errors appear during startup

**Solutions**:
```bash
# Check container status
pct status <vmid>

# View container console
pct console <vmid>

# Check logs (on the Proxmox host)
journalctl -u pve-container@<vmid>

# Check container configuration
pct config <vmid>

# Try starting manually
pct start <vmid>
```

**Common Causes**:
- Insufficient resources (RAM, disk)
- Network configuration errors
- Invalid container configuration
- OS template issues

---

### Q: Container runs out of disk space

**Symptoms**: Services fail, "No space left on device" errors

**Solutions**:
```bash
# Check disk usage
pct exec <vmid> -- df -h

# Check Besu database size
pct exec <vmid> -- du -sh /data/besu/database/

# Clean up old logs
pct exec <vmid> -- journalctl --vacuum-time=7d

# Increase disk size (if using LVM)
pct resize <vmid> rootfs +10G
```

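The disk usage check above can be turned into a pass/fail signal. The following is a minimal sketch, assuming POSIX-format `df -P` output is piped in; the function name `usage_over` and the 90% threshold in the example are illustrative choices, not part of the deployment scripts.

```bash
# Hypothetical helper: read `df -P` output on stdin and list filesystems
# whose usage exceeds the given percentage. Returns non-zero if any are found.
usage_over() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)                # "91%" -> "91"
            if (use + 0 > limit + 0) {
                print $6 " " $5              # mount point and usage
                found = 1
            }
        }
        END { exit (found ? 1 : 0) }
    '
}

# Example (run on the Proxmox host):
#   pct exec <vmid> -- df -P | usage_over 90 || echo "disk nearly full"
```
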
---

### Q: Container network issues

**Symptoms**: Cannot ping the container, cannot connect to services

**Solutions**:
```bash
# Check network configuration
pct config <vmid> | grep net0

# Check if container has IP
pct exec <vmid> -- ip addr show

# Check routing
pct exec <vmid> -- ip route

# Restart the container to reset its networking
pct stop <vmid>
pct start <vmid>
```

---

## Service Issues

### Q: Besu service won't start

**Symptoms**: `systemctl status besu-validator` shows failed

**Solutions**:
```bash
# Check service status
pct exec <vmid> -- systemctl status besu-validator

# View service logs
pct exec <vmid> -- journalctl -u besu-validator -n 100

# Check for configuration errors (basic TOML syntax and unknown options)
pct exec <vmid> -- besu validate-config --config-file=/etc/besu/config-validator.toml

# Review configuration file contents
pct exec <vmid> -- cat /etc/besu/config-validator.toml
```

**Common Causes**:
- Missing configuration files
- Invalid configuration syntax
- Missing validator keys
- Port conflicts
- Insufficient resources

---

### Q: Service starts but crashes

**Symptoms**: Service starts then stops, high restart count

**Solutions**:
```bash
# Check crash logs
pct exec <vmid> -- journalctl -u besu-validator --since "10 minutes ago"

# Check for out of memory
pct exec <vmid> -- dmesg | grep -i "out of memory"

# Check system resources
pct exec <vmid> -- free -h
pct exec <vmid> -- df -h

# Check JVM heap settings
pct exec <vmid> -- cat /etc/systemd/system/besu-validator.service | grep BESU_OPTS
```

---

### Q: Service shows as active but not responding

**Symptoms**: Service status shows "active" but RPC/P2P not responding

**Solutions**:
```bash
# Check if process is actually running
pct exec <vmid> -- ps aux | grep besu

# Check if ports are listening
pct exec <vmid> -- netstat -tuln | grep -E "30303|8545|9545"

# Check firewall rules
pct exec <vmid> -- iptables -L -n

# Test connectivity
pct exec <vmid> -- curl -s http://localhost:8545
```

---

## Network Issues

### Q: Nodes cannot connect to peers

**Symptoms**: Low or zero peer count, "No peers" in logs

**Solutions**:
```bash
# Check static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json

# Check permissions-nodes.toml
pct exec <vmid> -- cat /etc/besu/permissions-nodes.toml

# Verify enode URLs are correct
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode

# Check P2P port is open
pct exec <vmid> -- netstat -tuln | grep 30303

# Test connectivity to peer
pct exec <vmid> -- ping -c 3 <peer-ip>
```

**Common Causes**:
- Incorrect enode URLs in static-nodes.json
- Firewall blocking P2P port (30303)
- Nodes not in permissions-nodes.toml
- Network connectivity issues

---

### Q: Invalid enode URL errors

**Symptoms**: "Invalid enode URL syntax" or "Invalid node ID" in logs

**Solutions**:
```bash
# Check node ID length (must be 128 hex chars)
pct exec <vmid> -- besu public-key export --node-private-key-file=/data/besu/nodekey --format=enode | \
  sed 's|^enode://||' | cut -d'@' -f1 | wc -c

# Should output 129 (128 chars + newline)

# Fix node IDs using allowlist scripts
./scripts/besu-collect-all-enodes.sh
./scripts/besu-generate-allowlist.sh
./scripts/besu-deploy-allowlist.sh
```

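The length check above can also be written as a reusable test that verifies the characters are hex as well. This is a minimal sketch; `enode_id_ok` is a hypothetical helper name, not one of the allowlist scripts, and it validates only the node ID, not the host or port.

```bash
# Hypothetical helper: succeed only if the enode URL carries a node ID
# of exactly 128 hexadecimal characters.
enode_id_ok() {
    case "$1" in
        enode://*@*) ;;                 # must have the scheme and a host part
        *) return 1 ;;
    esac
    _id=${1#enode://}                   # drop the scheme
    _id=${_id%%@*}                      # keep everything before the first '@'
    [ "${#_id}" -eq 128 ] || return 1   # exact length, no newline to account for
    case "$_id" in
        *[!0-9a-fA-F]*) return 1 ;;     # reject any non-hex character
    esac
    return 0
}

# Example:
#   enode_id_ok "enode://<node-id>@<peer-ip>:30303" || echo "invalid node ID"
```
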
---

### Q: RPC endpoint not accessible

**Symptoms**: Cannot connect to RPC on port 8545

**Solutions**:
```bash
# Check if RPC is enabled (validators typically don't have RPC)
pct exec <vmid> -- grep -i "rpc-http-enabled" /etc/besu/config-*.toml

# Check if RPC port is listening
pct exec <vmid> -- netstat -tuln | grep 8545

# Check firewall
pct exec <vmid> -- iptables -L -n | grep 8545

# Test from container
pct exec <vmid> -- curl -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545

# Check host allowlist in config
pct exec <vmid> -- grep -i "host-allowlist\|rpc-http-host" /etc/besu/config-*.toml
```

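`eth_blockNumber` returns the block height as a hex quantity, which is awkward to compare by eye. The sketch below converts a response to a decimal number; `block_number_from_response` is a hypothetical helper name, and the `sed` pattern assumes a single-line JSON response with a string `result` field.

```bash
# Hypothetical helper: read a JSON-RPC response on stdin and print the
# hex "result" field as a decimal number. Fails if no hex result is present.
block_number_from_response() {
    _hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F][0-9a-fA-F]*\)".*/\1/p')
    [ -n "$_hex" ] || return 1          # e.g. an error response
    printf '%d\n' "$_hex"               # printf accepts 0x-prefixed input
}

# Example:
#   pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#     http://localhost:8545 | block_number_from_response
```
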
---

## Consensus Issues

### Q: No blocks being produced

**Symptoms**: Block height not increasing, "No blocks" in logs

**Solutions**:
```bash
# Check validator service is running
pct exec <vmid> -- systemctl status besu-validator

# Check validator keys
pct exec <vmid> -- ls -la /keys/validators/

# Check consensus logs
pct exec <vmid> -- journalctl -u besu-validator | grep -i "consensus\|qbft\|proposing"

# Verify validators are in genesis (if static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | grep -A 20 "qbft"

# Check peer connectivity
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545
```

**Common Causes**:
- Validator keys missing or incorrect
- Not enough validators online
- Network connectivity issues
- Consensus configuration errors

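To judge whether "not enough validators online" applies, it helps to know the threshold. QBFT needs at least ceil(2n/3) of its n validators participating to agree on a block. The sketch below is only that arithmetic; `qbft_quorum` is a hypothetical helper name, and the five-validator example assumes the validator containers 1000 to 1004 used elsewhere in this guide.

```bash
# Hypothetical helper: minimum number of validators that must be online
# for a QBFT network of $1 validators to keep producing blocks.
qbft_quorum() {
    # Integer ceil(2n/3): adding 2 before dividing by 3 rounds up.
    echo $(( (2 * $1 + 2) / 3 ))
}

# Example: with 5 validators, at least 4 must be online.
#   qbft_quorum 5
```
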
---

### Q: Validator not participating in consensus

**Symptoms**: Validator running but not producing blocks

**Solutions**:
```bash
# Verify validator address
pct exec <vmid> -- cat /keys/validators/validator-*/address.txt

# Check if address is in validator contract (for dynamic validators)
# Or check genesis.json (for static validators)
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool | grep -A 10 "qbft"

# Verify validator keys are loaded
pct exec <vmid> -- journalctl -u besu-validator | grep -i "validator.*key"

# Check for permission errors
pct exec <vmid> -- journalctl -u besu-validator | grep -i "permission\|denied"
```

---

## Configuration Issues

### Q: Configuration file not found

**Symptoms**: "File not found" errors, service won't start

**Solutions**:
```bash
# List all config files
pct exec <vmid> -- ls -la /etc/besu/

# Verify required files exist
pct exec <vmid> -- test -f /etc/besu/genesis.json && echo "genesis.json OK" || echo "genesis.json MISSING"
pct exec <vmid> -- test -f /etc/besu/config-validator.toml && echo "config OK" || echo "config MISSING"

# Copy missing files
# (Use copy-besu-config.sh script)
./scripts/copy-besu-config.sh /path/to/smom-dbis-138
```

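The per-file `test -f` checks above can be folded into one loop. This is a minimal sketch; `missing_files` is a hypothetical helper name, and the file list in the example is assembled from the paths mentioned in this guide, so adjust it to the node type.

```bash
# Hypothetical helper: print each required file that is missing from a
# directory. Returns non-zero if anything is missing.
missing_files() {
    _dir=$1
    shift
    _rc=0
    for _f in "$@"; do
        if [ ! -f "$_dir/$_f" ]; then
            echo "MISSING: $_dir/$_f"
            _rc=1
        fi
    done
    return "$_rc"
}

# Example (run inside the container):
#   missing_files /etc/besu genesis.json config-validator.toml \
#     static-nodes.json permissions-nodes.toml
```
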
---

### Q: Invalid configuration syntax

**Symptoms**: "Invalid option" or syntax errors in logs

**Solutions**:
```bash
# Validate TOML syntax (tomllib requires Python 3.11+)
pct exec <vmid> -- python3 -c "import tomllib; tomllib.load(open('/etc/besu/config-validator.toml', 'rb'))" 2>&1

# Validate JSON syntax
pct exec <vmid> -- python3 -m json.tool /etc/besu/genesis.json > /dev/null

# Check for deprecated options
pct exec <vmid> -- journalctl -u besu-validator | grep -i "deprecated\|unknown option"

# Review Besu documentation for current options
```

---

### Q: Path errors in configuration

**Symptoms**: "File not found" errors with paths like "/config/genesis.json"

**Solutions**:
```bash
# Check configuration file paths
pct exec <vmid> -- grep -E "genesis-file|data-path" /etc/besu/config-validator.toml

# Correct paths should be:
# genesis-file="/etc/besu/genesis.json"
# data-path="/data/besu"

# Fix paths if needed
pct exec <vmid> -- sed -i 's|/config/|/etc/besu/|g' /etc/besu/config-validator.toml
```

---

## Performance Issues

### Q: High CPU usage

**Symptoms**: Container CPU usage consistently above 80%

**Solutions**:
```bash
# Check CPU usage
pct exec <vmid> -- top -bn1 | head -20

# Check JVM GC activity
pct exec <vmid> -- journalctl -u besu-validator | grep -i "gc\|pause"

# Adjust JVM settings if needed
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS and JAVA_OPTS
# Then run: systemctl daemon-reload && systemctl restart besu-validator

# Consider allocating more CPU cores
pct set <vmid> --cores 4
```

---

### Q: High memory usage

**Symptoms**: Container running out of memory, OOM kills

**Solutions**:
```bash
# Check memory usage
pct exec <vmid> -- free -h

# Check JVM heap settings
pct exec <vmid> -- ps aux | grep besu | grep -oP 'Xm[xs]\K[0-9]+[gm]'

# Reduce heap size if too large
# Edit /etc/systemd/system/besu-validator.service
# Adjust BESU_OPTS="-Xmx4g" to appropriate size
# Then run: systemctl daemon-reload && systemctl restart besu-validator

# Or increase container memory
pct set <vmid> --memory 8192
```

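The heap check above prints values such as `4g`, while `pct set --memory` takes MiB, so comparing the two needs a unit conversion. A minimal sketch follows; `heap_to_mib` is a hypothetical helper name and it handles only the `g` and `m` suffixes that the `grep` pattern above extracts.

```bash
# Hypothetical helper: convert a JVM heap value such as "4g" or "512m" to MiB.
heap_to_mib() {
    _num=${1%[gGmM]}                    # strip the unit suffix
    case "$1" in
        *[gG]) echo $(( _num * 1024 )) ;;
        *[mM]) echo "$_num" ;;
        *)     return 1 ;;              # unsupported unit (plain bytes or k)
    esac
}

# Example: check that the heap leaves headroom within an 8192 MiB container.
#   [ "$(heap_to_mib 4g)" -lt 8192 ] && echo "heap fits in container memory"
```
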
---

### Q: Slow sync or block processing

**Symptoms**: Blocks processing slowly, falling behind

**Solutions**:
```bash
# Check database size and health
pct exec <vmid> -- du -sh /data/besu/database/

# Check disk I/O
pct exec <vmid> -- iostat -x 1 5

# Consider using SSD storage
# Check network latency
pct exec <vmid> -- ping -c 10 <peer-ip>

# Verify sufficient peers
pct exec <vmid> -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
  http://localhost:8545 | python3 -c "import sys, json; print(len(json.load(sys.stdin).get('result', [])))"
```

---

## General Troubleshooting Commands

```bash
# View all container statuses
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
  echo "=== Container $vmid ==="
  pct status "$vmid"
done

# Check all service statuses
for vmid in 1000 1001 1002 1003 1004; do
  pct exec "$vmid" -- systemctl status besu-validator --no-pager -l | head -10
done

# View recent logs from all nodes
for vmid in 1000 1001 1002 1003 1004; do
  echo "=== Logs for container $vmid ==="
  pct exec "$vmid" -- journalctl -u besu-validator -n 20 --no-pager
done

# Check network connectivity between nodes
pct exec 1000 -- ping -c 3 192.168.11.14  # validator to validator

# Verify RPC endpoint (RPC nodes only)
pct exec 2500 -- curl -s -X POST -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545 | python3 -m json.tool
```

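The loops above repeat the same VMID lists. One way to avoid that is a small wrapper, sketched here under the assumption that the command takes the VMID as its last argument (as `pct status` does); `for_each_vmid` and `VALIDATOR_VMIDS` are hypothetical names.

```bash
# Validator container IDs, as used in the loops above.
VALIDATOR_VMIDS="1000 1001 1002 1003 1004"

# Hypothetical helper: run a command once per VMID, appending the VMID
# as the final argument and printing a header before each run.
for_each_vmid() {
    _ids=$1                             # space-separated VMID list
    shift
    for _id in $_ids; do                # unquoted on purpose: split on spaces
        echo "=== Container $_id ==="
        "$@" "$_id"
    done
}

# Example:
#   for_each_vmid "$VALIDATOR_VMIDS" pct status
```
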
---

## Getting Help

If issues persist:

1. **Collect Information**:
   - Service logs: `journalctl -u besu-validator -n 100`
   - Container status: `pct status <vmid>`
   - Configuration: `pct exec <vmid> -- cat /etc/besu/config-validator.toml`
   - Network: `pct exec <vmid> -- ip addr show`

2. **Check Documentation**:
   - [Besu Nodes File Reference](BESU_NODES_FILE_REFERENCE.md)
   - [Deployment Guide](VALIDATED_SET_DEPLOYMENT_GUIDE.md)
   - [Besu Documentation](https://besu.hyperledger.org/)

3. **Validate Configuration**:
   - Run prerequisites check: `./scripts/validation/check-prerequisites.sh`
   - Validate validators: `./scripts/validation/validate-validator-set.sh`

4. **Review Logs**:
   - Check deployment logs: `logs/deploy-validated-set-*.log`
   - Check service logs in containers
   - Check Proxmox host logs

---

## Related Documentation

### Operational Procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Complete operational runbooks
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting

### Deployment & Configuration
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Current deployment status
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** - Network architecture reference
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Deployment guide

### Monitoring
- **[MONITORING_SUMMARY.md](MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring

### Reference
- **[MASTER_INDEX.md](MASTER_INDEX.md)** - Complete documentation index

---

**Last Updated:** 2025-01-20
**Version:** 1.0