# Besu Configuration Deployment Monitoring Guide

> Historical note: This monitoring guide reflects a January 2026 deployment window and preserves the fleet assumptions from that rollout. Keep it as deployment history. For current nodes and services, verify against `docs/04-configuration/ALL_VMIDS_ENDPOINTS.md`, `docs/04-configuration/RPC_ENDPOINTS_MASTER.md`, and the active verification scripts.

**Last Updated:** 2026-01-31
**Document Version:** 1.0
**Status:** Active Documentation

---

**Date**: 2026-01-17
**Purpose**: Guide for monitoring Besu configuration deployments and verifying correct operation

---

## Overview

After deploying cleaned Besu configurations to running nodes, monitor the deployment to ensure services start correctly, configuration changes are applied, and no issues arise.

---

## Post-Deployment Monitoring Period

**Recommended**: 24-48 hours after deployment

**Intensive Monitoring**: First 4-6 hours
**Standard Monitoring**: 24-48 hours
**Ongoing Monitoring**: Regular health checks

---

## Monitoring Checklist

### Immediate (0-1 hour after deployment)

- [ ] Verify all services started successfully
- [ ] Check for configuration errors in logs
- [ ] Verify no restart loops
- [ ] Check logging levels are correct
- [ ] Test RPC endpoints (if applicable)

### Short-term (1-6 hours after deployment)

- [ ] Monitor service status
- [ ] Check for configuration-related errors
- [ ] Verify network connectivity
- [ ] Test consensus participation (validators)
- [ ] Test archive queries (sentries)

### Medium-term (6-48 hours after deployment)

- [ ] Monitor resource usage (memory, CPU, disk)
- [ ] Check peer connections
- [ ] Verify sync status
- [ ] Monitor for performance issues
- [ ] Check metrics endpoints

---

## Service Status Verification

### Check Systemd Service Status

```bash
# For each node (example for validator 1000)
pct exec 1000 -- systemctl status besu-validator.service

# Check if service is active
pct exec 1000 -- systemctl is-active besu-validator.service
# Expected: "active"

# Check service logs
pct exec 1000 -- journalctl -u besu-validator.service -n 50 --no-pager
```

### Verify No Restart Loops

```bash
# Check restart count (should be 0 or low after deployment)
# The systemd property is named NRestarts
pct exec 1000 -- systemctl show besu-validator.service -p NRestarts
# Expected: NRestarts=0 or a low number

# Check for frequent restarts
pct exec 1000 -- journalctl -u besu-validator.service --since "1 hour ago" --no-pager | grep "Started\|Stopped" | tail -10
```
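
The restart check can be turned into a pass/fail verdict. The following is a minimal sketch, not part of the deployment tooling: `restart_verdict` is a hypothetical helper, and the default threshold of 2 restarts is an assumption to tune per fleet. It takes a `NRestarts=<n>` line as printed by `systemctl show` and flags counts above the threshold.

```bash
# Hypothetical helper: classify a systemd restart counter.
#   $1 = a line such as "NRestarts=3"
#   $2 = highest count still considered healthy (assumed default: 2)
restart_verdict() {
  local count="${1#*=}"   # drop everything up to and including "="
  local max="${2:-2}"
  case "$count" in
    ''|*[!0-9]*) echo "unknown" ;;   # empty or non-numeric input
    *)
      if [ "$count" -gt "$max" ]; then
        echo "restart-loop"
      else
        echo "ok"
      fi
      ;;
  esac
}

# Usage on the Proxmox host (VMID and unit name from the example above):
#   restart_verdict "$(pct exec 1000 -- systemctl show besu-validator.service -p NRestarts)"
```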

---

## Configuration Verification

### Verify Logging Levels

**Validators and RPC**: Should log at `WARN` level
**Sentry nodes**: Should log at `INFO` level

```bash
# Check Besu logs for logging level (should show WARN or INFO)
pct exec 1000 -- journalctl -u besu-validator.service -n 20 | grep -i "log\|WARN\|INFO"

# Validators/RPC: Should see WARN-level messages (minimal logs)
# Sentries: Should see INFO-level messages (detailed logs)
```

### Check for Configuration Errors

```bash
# Look for configuration errors
pct exec 1000 -- journalctl -u besu-validator.service | grep -i "error\|unknown option\|configuration"

# Should NOT see:
# - "Unknown options in TOML configuration file"
# - "Configuration error"
# - Deprecated option warnings
```
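
For a quick numeric check, the same search can be reduced to a count. This is a sketch under assumptions: `count_config_errors` is a hypothetical helper, and the `deprecated` pattern assumes that deprecation warnings contain that word, which may not hold for every Besu version.

```bash
# Hypothetical helper: count log lines that look like configuration problems.
# Reads log text on stdin and prints the number of matching lines.
count_config_errors() {
  # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
  grep -ciE 'unknown option|configuration error|deprecated' || true
}

# Usage on the Proxmox host (a result of 0 is the expected outcome):
#   pct exec 1000 -- journalctl -u besu-validator.service --since "1 hour ago" --no-pager | count_config_errors
```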

---

## Functional Verification

### Validator Nodes

**Check Consensus Participation**:
```bash
# Verify validator is synced.
# Note: Validators have RPC disabled, so this request only succeeds where
# HTTP-RPC is exposed; otherwise use internal tools or the metrics endpoint.
curl -X POST http://192.168.11.100:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
# Expected: false (fully synced)
```

**Check Metrics** (validators enable metrics):
```bash
curl http://192.168.11.100:9545/metrics | grep besu_blocks_total
```

### Sentry Nodes (Archive)

**Check Archive Functionality**:
```bash
# Test historical query (verify archive mode)
curl -X POST http://192.168.11.150:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0x0000000000000000000000000000000000000000","0x100"],"id":1}'
# Should return historical balance (archive nodes only)
```

**Check Sync Status**:
```bash
curl -X POST http://192.168.11.150:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
# Expected: false (fully synced)
```
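
An `eth_syncing` response carries either `false` or an object with block progress fields in `result`. The sketch below classifies a response body read from stdin; `sync_state` is a hypothetical helper, and it uses plain pattern matching rather than a JSON parser, which is an assumption made to keep it dependency-free.

```bash
# Hypothetical helper: classify an eth_syncing JSON-RPC response read from stdin.
sync_state() {
  local body
  body="$(tr -d '[:space:]')"    # remove whitespace so formatting does not matter
  case "$body" in
    *'"result":false'*) echo "synced" ;;    # node reports it is not syncing
    *'"result":{'*)     echo "syncing" ;;   # progress object returned
    *)                  echo "unknown" ;;   # error body, empty reply, etc.
  esac
}

# Usage (sentry endpoint from the example above):
#   curl -s -X POST http://192.168.11.150:8545 \
#     -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' | sync_state
```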

### RPC Nodes

**Test RPC Endpoints**:
```bash
# Test HTTP-RPC
curl -X POST http://192.168.11.250:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Test chain ID
curl -X POST http://192.168.11.250:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
# Expected: "0x8a" (hex for chain ID 138)
```
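
JSON-RPC returns quantities such as the block number and chain ID as hex strings, so comparing them with decimal values takes one conversion step. The helpers below are a sketch: `extract_result` and `hex_to_dec` are hypothetical names, and the `sed` extraction assumes a single-line response with a string `result` (a JSON tool such as `jq` would be more robust where it is installed).

```bash
# Hypothetical helper: print the string value of "result" from a JSON-RPC response on stdin.
extract_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: convert a 0x-prefixed hex quantity to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Usage (RPC endpoint from the example above); a chain ID of 138 is expected:
#   hex_to_dec "$(curl -s -X POST http://192.168.11.250:8545 \
#     -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' | extract_result)"
```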

**Verify Logging Level** (should be WARN, minimal logs):
```bash
# Check logs show minimal output (WARN level)
pct exec 2500 -- journalctl -u besu-rpc.service -n 20 --no-pager
# Should see mostly warnings/errors, not info messages
```

---

## Network Connectivity

### Peer Connections

**Check Peer Count**:
```bash
# Via metrics (if available)
curl http://192.168.11.150:9545/metrics | grep besu_peers

# Via logs (look for peer connection messages)
pct exec 1500 -- journalctl -u besu-sentry.service | grep -i "peer\|connected"
```
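
Metrics endpoints in the Prometheus text format interleave `# HELP` and `# TYPE` comment lines with the samples, so a plain `grep` returns both. The sketch below keeps only sample lines whose name starts with a given prefix. `metric_lines` is a hypothetical helper, and the prefix is passed in because exact Besu metric names can differ between versions.

```bash
# Hypothetical helper: print metric samples (not comments) whose name starts with a prefix.
#   $1 = metric name prefix; metrics text is read from stdin
metric_lines() {
  awk -v p="$1" '!/^#/ && index($0, p) == 1'
}

# Usage (sentry metrics endpoint from the example above):
#   curl -s http://192.168.11.150:9545/metrics | metric_lines besu_peers
```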

**Expected**:
- Validators: Connected to sentries (and other validators)
- Sentries: Connected to validators and external peers
- RPC: Connected to internal peers (sentries/validators)

---

## Performance Monitoring

### Resource Usage

**Memory Usage**:
```bash
# Check Besu process memory
pct exec 1000 -- ps aux | grep besu | awk '{print $4,$11}'

# Check systemd memory limit
pct exec 1000 -- systemctl show besu-validator.service | grep MemoryMax
```
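
`systemctl show` reports `MemoryMax` in bytes, or as `infinity` when no limit is set, which is awkward to compare against the memory percentage from `ps`. The sketch below converts the value to MiB; `memory_max_mib` is a hypothetical helper.

```bash
# Hypothetical helper: render a systemd MemoryMax value in MiB.
#   $1 = "MemoryMax=<bytes>", "MemoryMax=infinity", or the bare value
memory_max_mib() {
  local value="${1#MemoryMax=}"
  case "$value" in
    infinity)    echo "unlimited" ;;
    ''|*[!0-9]*) echo "unknown" ;;    # empty or non-numeric input
    *)           echo "$(( value / 1024 / 1024 )) MiB" ;;
  esac
}

# Usage on the Proxmox host:
#   memory_max_mib "$(pct exec 1000 -- systemctl show besu-validator.service -p MemoryMax)"
```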

**CPU Usage**:
```bash
# Monitor CPU usage
pct exec 1000 -- top -bn1 | grep besu
```

**Disk Usage**:
```bash
# Check disk usage
pct exec 1500 -- df -h /data/besu

# Check database size
pct exec 1500 -- du -sh /data/besu/database/
```

---

## Configuration Drift Detection

### Compare Running Configs to Templates

```bash
# Use audit script
./scripts/audit-besu-configs.sh

# Manual comparison
# 1. Copy running config from node
pct exec 1000 -- cat /etc/besu/config-validator.toml > /tmp/running-config.toml

# 2. Compare to template
diff /tmp/running-config.toml smom-dbis-138-proxmox/templates/besu-configs/config-validator.toml
```

**Expected**: Running configs should match templates (after deployment)
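
A raw `diff` also reports differences in comments, blank lines, and key order, which are not real drift. The sketch below normalizes both files before comparing them. `normalize_toml` and `config_drift` are hypothetical helpers, and sorting the lines assumes flat key/value files with no `[table]` sections, which should be checked against the actual templates.

```bash
# Hypothetical helper: strip trailing whitespace, full-line comments, and blank lines, then sort.
normalize_toml() {
  sed -e 's/[[:space:]]*$//' -e '/^[[:space:]]*#/d' -e '/^$/d' "$1" | sort
}

# Hypothetical helper: report whether two config files differ after normalization.
#   $1 = running config, $2 = template
config_drift() {
  if [ "$(normalize_toml "$1")" = "$(normalize_toml "$2")" ]; then
    echo "in-sync"
  else
    echo "drift"
  fi
}

# Usage with the files from the manual comparison above:
#   config_drift /tmp/running-config.toml smom-dbis-138-proxmox/templates/besu-configs/config-validator.toml
```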

---

## Troubleshooting

### Issue: Service Fails to Start

**Symptoms**:
- Service status: `failed` or `inactive`
- Frequent restarts
- Configuration errors in logs

**Diagnosis**:
```bash
# Check service status
pct exec 1000 -- systemctl status besu-validator.service

# Check logs for errors
pct exec 1000 -- journalctl -u besu-validator.service -n 100 --no-pager
```

**Common Causes**:
1. Configuration syntax error
2. Deprecated options still present
3. Invalid option values
4. Missing required files (genesis.json, etc.)

**Resolution**:
1. Validate config with `validate-besu-config.sh`
2. Check for deprecated options
3. Review Besu logs for specific errors
4. Restore from backup if needed

---

### Issue: Configuration Not Applied

**Symptoms**:
- Logging level unchanged
- Service running but with old settings

**Diagnosis**:
```bash
# Check if config file was updated
pct exec 1000 -- stat /etc/besu/config-validator.toml

# Check actual logging level in Besu logs
pct exec 1000 -- journalctl -u besu-validator.service | grep -i "logging\|WARN\|INFO"
```
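
One way to tell whether a running service predates the deployed file is to compare the config's modification time with the service start time: a file modified after the service started has not been read yet. The sketch below shows only the comparison. `restart_needed` is a hypothetical helper, and the usage lines assume GNU `stat` and `date` and a timestamp format that `date -d` can parse.

```bash
# Hypothetical helper: compare two Unix timestamps (seconds since the epoch).
#   $1 = config file modification time, $2 = service start time
restart_needed() {
  if [ "$1" -gt "$2" ]; then
    echo "restart-needed"          # config changed after the service started
  else
    echo "started-after-config"    # service started after the last config change
  fi
}

# Usage on the Proxmox host:
#   mtime=$(pct exec 1000 -- stat -c %Y /etc/besu/config-validator.toml)
#   started=$(pct exec 1000 -- systemctl show besu-validator.service -p ActiveEnterTimestamp --value)
#   restart_needed "$mtime" "$(date -d "$started" +%s)"
```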

**Resolution**:
1. Verify config file was copied correctly
2. Ensure service was restarted after config update
3. Check for file permission issues
4. Verify Besu is reading correct config file

---

### Issue: Logging Level Incorrect

**Symptoms**:
- Validators showing INFO logs (should be WARN)
- RPC nodes showing INFO logs (should be WARN)
- Sentries showing WARN logs (should be INFO)

**Diagnosis**:
```bash
# Check config file logging setting
pct exec 1000 -- grep "^logging" /etc/besu/config-validator.toml
# Expected: logging="WARN" for validators

# Check actual log output
pct exec 1000 -- journalctl -u besu-validator.service -n 20
# Should see minimal logs (WARN level)
```
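
The expected level depends on the node role, so the check can be expressed as a comparison between the configured and the expected value. This is a sketch: `configured_log_level` and `expected_log_level` are hypothetical helpers, and the role names are labels chosen here for illustration.

```bash
# Hypothetical helper: print the value of the top-level logging key from TOML text on stdin.
configured_log_level() {
  sed -n 's/^logging[[:space:]]*=[[:space:]]*"\([A-Za-z]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: expected level per role, as stated in this guide.
expected_log_level() {
  case "$1" in
    validator|rpc) echo "WARN" ;;
    sentry)        echo "INFO" ;;
    *)             echo "unknown" ;;
  esac
}

# Usage on the Proxmox host:
#   actual=$(pct exec 1000 -- cat /etc/besu/config-validator.toml | configured_log_level)
#   [ "$actual" = "$(expected_log_level validator)" ] || echo "Logging level mismatch: $actual"
```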
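
**Resolution**:
1. Verify config file has correct `logging="WARN"` or `logging="INFO"`
2. Ensure service was restarted
3. If entries from before the restart obscure the check, limit the query with `journalctl --since`. `journalctl --vacuum-time=1s` also clears them, but it permanently deletes archived journal data, so use it only when that history is not needed.

---
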
## Monitoring Scripts

### Automated Monitoring

Create a monitoring script that checks all nodes:

```bash
#!/bin/bash
# monitor-besu-deployment.sh

NODES=(1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502)

for vmid in "${NODES[@]}"; do
  echo "Checking VMID $vmid..."

  # Check service status (pattern is quoted so systemctl, not the host shell, expands it)
  status=$(pct exec "$vmid" -- systemctl is-active 'besu-*.service' 2>/dev/null | head -1)
  echo "  Service status: ${status:-unknown}"

  # Count log lines mentioning errors in the last hour
  errors=$(pct exec "$vmid" -- journalctl -u 'besu-*.service' --since "1 hour ago" --no-pager 2>/dev/null | grep -ci "error")
  echo "  Errors in last hour: $errors"

  # Check restart count (the systemd property is NRestarts)
  restarts=$(pct exec "$vmid" -- systemctl show 'besu-*.service' -p NRestarts --value 2>/dev/null | head -1)
  echo "  Restart count: ${restarts:-unknown}"
done
```

---

## Success Criteria

### Deployment Successful If:

✅ **All services running**:
- Systemd status: `active`
- No restart loops
- Services stable for 24+ hours

✅ **Configuration applied**:
- Logging levels correct (WARN for validators/RPC, INFO for sentries)
- No deprecated options in use
- All configs match templates

✅ **Functionality verified**:
- Validators participating in consensus
- Sentries providing archive queries
- RPC nodes serving API requests
- Network connectivity normal

✅ **No errors**:
- No configuration errors in logs
- No "Unknown options" errors
- Services starting cleanly

---

## Monitoring Timeline

### Hour 0-1: Immediate Verification
- Service status
- Configuration errors
- Basic functionality

### Hour 1-6: Intensive Monitoring
- Service stability
- Performance metrics
- Network connectivity
- Detailed verification

### Hour 6-24: Standard Monitoring
- Ongoing health checks
- Resource usage
- Performance trends

### Day 2+: Ongoing Monitoring
- Regular health checks
- Performance monitoring
- Configuration drift detection

---

## Post-Deployment Checklist

- [ ] All services running (validators, sentries, RPC)
- [ ] No configuration errors in logs
- [ ] Logging levels correct (WARN/INFO as appropriate)
- [ ] No restart loops
- [ ] Validators participating in consensus
- [ ] Sentries providing archive queries
- [ ] RPC nodes serving API requests
- [ ] Network connectivity normal
- [ ] Peer connections healthy
- [ ] Resource usage within expected ranges
- [ ] Configuration drift: None detected

---

## Related Documentation

- `scripts/deploy-besu-configs.sh` - Deployment script
- `scripts/audit-besu-configs.sh` - Configuration audit
- `scripts/validate-besu-config.sh` - Configuration validation
- `docs/04-configuration/BESU_CONFIGURATION_GUIDE.md` - Configuration reference

---

**Last Updated**: 2026-01-17
**Status**: Monitoring Guide