# Besu Configuration Deployment Monitoring Guide
**Last Updated:** 2026-01-31
**Document Version:** 1.0
**Status:** Active Documentation
---
**Date**: 2026-01-17
**Purpose**: Guide for monitoring Besu configuration deployments and verifying correct operation
---
## Overview
After deploying cleaned Besu configurations to running nodes, monitor the deployment to ensure services start correctly, configuration changes are applied, and no issues arise.
---
## Post-Deployment Monitoring Period
- **Recommended total**: 24-48 hours after deployment
- **Intensive monitoring**: first 4-6 hours
- **Standard monitoring**: remainder of the 24-48 hour window
- **Ongoing monitoring**: regular health checks thereafter
---
## Monitoring Checklist
### Immediate (0-1 hour after deployment)
- [ ] Verify all services started successfully
- [ ] Check for configuration errors in logs
- [ ] Verify no restart loops
- [ ] Check logging levels are correct
- [ ] Test RPC endpoints (if applicable)
### Short-term (1-6 hours after deployment)
- [ ] Monitor service status
- [ ] Check for configuration-related errors
- [ ] Verify network connectivity
- [ ] Test consensus participation (validators)
- [ ] Test archive queries (sentries)
### Medium-term (6-48 hours after deployment)
- [ ] Monitor resource usage (memory, CPU, disk)
- [ ] Check peer connections
- [ ] Verify sync status
- [ ] Monitor for performance issues
- [ ] Check metrics endpoints
---
## Service Status Verification
### Check Systemd Service Status
```bash
# For each node (example for validator 1000)
pct exec 1000 -- systemctl status besu-validator.service
# Check if service is active
pct exec 1000 -- systemctl is-active besu-validator.service
# Expected: "active"
# Check service logs
pct exec 1000 -- journalctl -u besu-validator.service -n 50 --no-pager
```
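The examples in this guide use a different unit name per node role. A minimal sketch of that mapping follows, assuming the VMID ranges that appear in this guide (1000s for validators, 1500s for sentries, 2500s for RPC). The function name `unit_for_vmid` is hypothetical and not part of the repository scripts.

```bash
# unit_for_vmid: hypothetical helper. Maps a VMID to its Besu unit name using
# the ranges that appear in this guide. Adjust the patterns if your layout differs.
unit_for_vmid() {
    case "$1" in
        100[0-9]) echo "besu-validator.service" ;;
        150[0-9]) echo "besu-sentry.service" ;;
        250[0-9]) echo "besu-rpc.service" ;;
        *) echo "unknown VMID: $1" >&2; return 1 ;;
    esac
}
```

With this in place, a status check for any node could be written as `pct exec "$vmid" -- systemctl is-active "$(unit_for_vmid "$vmid")"`.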
### Verify No Restart Loops
```bash
# Check restart count (the systemd property is NRestarts)
pct exec 1000 -- systemctl show besu-validator.service -p NRestarts
# Expected: NRestarts=0 or a low number
# Check for frequent restarts
pct exec 1000 -- journalctl -u besu-validator.service --since "1 hour ago" --no-pager | grep "Started\|Stopped" | tail -10
```
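The restart check can be turned into a pass/fail test. The sketch below is illustrative only: the function name `check_restarts` and the default threshold of 3 are assumptions, not values taken from the deployment scripts. It reads `systemctl show` output on stdin, so it can be exercised without a running node.

```bash
# check_restarts: hypothetical helper, not part of the repository scripts.
# Reads `systemctl show <unit>` output on stdin and flags a possible restart loop.
# Usage: pct exec 1000 -- systemctl show besu-validator.service | check_restarts 3
check_restarts() {
    local max="${1:-3}"   # assumed threshold; tune for your environment
    local count
    # The systemd property is spelled NRestarts (plural)
    count=$(sed -n 's/^NRestarts=\([0-9][0-9]*\)$/\1/p' | head -n 1)
    if [ -z "$count" ]; then
        echo "UNKNOWN: NRestarts not found"
        return 2
    fi
    if [ "$count" -gt "$max" ]; then
        echo "WARN: $count restarts (threshold $max)"
        return 1
    fi
    echo "OK: $count restarts"
}
```

The non-zero return codes make the helper usable in an `if` statement or a cron job that alerts on failure.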
---
## Configuration Verification
### Verify Logging Levels
**Validators and RPC**: Should log at `WARN` level
**Sentry nodes**: Should log at `INFO` level
```bash
# Check Besu logs for logging level (should show WARN or INFO)
pct exec 1000 -- journalctl -u besu-validator.service -n 20 | grep -i "log\|WARN\|INFO"
# Validators/RPC: Should see WARN-level messages (minimal logs)
# Sentries: Should see INFO-level messages (detailed logs)
```
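Counting log lines per level gives a quicker signal than reading the output by eye. The following sketch assumes Besu's default console layout, in which fields are separated by ` | ` and the third field is the level; if a custom Log4j layout is in use, the field index will differ. The name `count_log_levels` is hypothetical.

```bash
# count_log_levels: hypothetical helper. Assumes the default Besu log layout, e.g.
#   2026-01-17 10:00:00.000+00:00 | main | INFO  | Besu | Starting Besu
# Usage: pct exec 1000 -- journalctl -u besu-validator.service -n 200 --no-pager -o cat | count_log_levels
count_log_levels() {
    awk -F'|' '
        NF >= 4 {
            level = $3
            gsub(/[ \t]/, "", level)   # strip padding around the level
            if (level ~ /^(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)$/) count[level]++
        }
        END {
            # Print in a fixed order so the output is stable
            n = split("FATAL ERROR WARN INFO DEBUG TRACE", order, " ")
            for (i = 1; i <= n; i++)
                if (order[i] in count) print order[i], count[order[i]]
        }'
}
```

On a node configured for `WARN`, any `INFO` count in recent output suggests the new logging level has not taken effect.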
### Check for Configuration Errors
```bash
# Look for configuration errors
pct exec 1000 -- journalctl -u besu-validator.service | grep -i "error\|unknown option\|configuration"
# Should NOT see:
# - "Unknown options in TOML configuration file"
# - "Configuration error"
# - Deprecated option warnings
```
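The patterns listed above can be wrapped in a small filter that returns a usable exit code. This is a sketch under the assumption that the three patterns from this section are the ones of interest; the name `scan_config_errors` is hypothetical.

```bash
# scan_config_errors: hypothetical helper. Reads log text on stdin and prints
# lines that match the configuration-error patterns listed in this section.
# Returns 0 when the log is clean, 1 when a pattern matched.
scan_config_errors() {
    local hits
    hits=$(grep -iE 'unknown options? in toml|configuration error|deprecated' || true)
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
    return 0
}
```

A possible invocation is `pct exec 1000 -- journalctl -u besu-validator.service --no-pager | scan_config_errors`.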
---
## Functional Verification
### Validator Nodes
**Check Consensus Participation**:
```bash
# Verify validator is synced.
# Note: validators normally have RPC disabled, so this call only succeeds when
# HTTP-RPC is enabled on the node. Otherwise, use the metrics endpoint or logs.
curl -s -X POST http://192.168.11.100:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
# Expected: "result":false (fully synced)
```
**Check Metrics** (validators enable metrics):
```bash
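# Metric names vary by Besu version; ethereum_blockchain_height tracks the chain head
curl -s http://192.168.11.100:9545/metrics | grep -E "besu_blocks_total|ethereum_blockchain_height"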
```
### Sentry Nodes (Archive)
**Check Archive Functionality**:
```bash
# Test historical query (verify archive mode)
curl -X POST http://192.168.11.150:8545 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0x0000000000000000000000000000000000000000","0x100"],"id":1}'
# Should return historical balance (archive nodes only)
```
**Check Sync Status**:
```bash
curl -X POST http://192.168.11.150:8545 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
# Expected: false (fully synced)
```
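`eth_syncing` returns `false` when the node is synced and an object containing `currentBlock` and `highestBlock` while it is still catching up. The sketch below interprets either form. It parses with `sed` to avoid a `jq` dependency and assumes a single-line response; the name `sync_state` is hypothetical.

```bash
# sync_state: hypothetical helper. Reads an eth_syncing JSON-RPC response on stdin.
# Prints "synced" when result is false, otherwise the number of blocks remaining.
sync_state() {
    local body current highest
    body=$(cat)
    if printf '%s' "$body" | grep -Eq '"result"[[:space:]]*:[[:space:]]*false'; then
        echo "synced"
        return 0
    fi
    current=$(printf '%s' "$body" | sed -n 's/.*"currentBlock"[^"]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
    highest=$(printf '%s' "$body" | sed -n 's/.*"highestBlock"[^"]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
    if [ -z "$current" ] || [ -z "$highest" ]; then
        echo "unknown"
        return 2
    fi
    # Shell arithmetic understands the 0x prefix
    echo "syncing: $(( highest - current )) blocks behind"
    return 1
}
```

Piping the `curl` output from the sync check into `sync_state` yields a one-line summary suitable for a monitoring loop.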
### RPC Nodes
**Test RPC Endpoints**:
```bash
# Test HTTP-RPC
curl -X POST http://192.168.11.250:8545 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
# Test chain ID
curl -X POST http://192.168.11.250:8545 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
# Expected: "0x8a" (138 in hex)
```
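JSON-RPC returns quantities as hex strings, which are easy to misread. A small sketch for converting the `result` field to decimal follows; it assumes a single-line response and uses `sed` rather than `jq`. The name `rpc_result_dec` is hypothetical.

```bash
# rpc_result_dec: hypothetical helper. Reads a JSON-RPC response on stdin and
# prints the hex "result" field as a decimal number.
rpc_result_dec() {
    local hex
    hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]\{1,\}\)".*/\1/p' | head -n 1)
    [ -n "$hex" ] || { echo "no hex result in response" >&2; return 1; }
    # printf accepts C-style hex constants
    printf '%d\n' "$hex"
}
```

Piping the `eth_chainId` response into `rpc_result_dec` should print `138` on this chain, and the same helper turns an `eth_blockNumber` response into a readable block height.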
**Verify Logging Level** (should be WARN, minimal logs):
```bash
# Check logs show minimal output (WARN level)
pct exec 2500 -- journalctl -u besu-rpc.service -n 20 --no-pager
# Should see mostly warnings/errors, not info messages
```
---
## Network Connectivity
### Peer Connections
**Check Peer Count**:
```bash
# Via metrics (if available)
curl http://192.168.11.150:9545/metrics | grep besu_peers
# Via logs (look for peer connection messages)
pct exec 1500 -- journalctl -u besu-sentry.service | grep -i "peer\|connected"
```
**Expected**:
- Validators: Connected to sentries (and other validators)
- Sentries: Connected to validators and external peers
- RPC: Connected to internal peers (sentries/validators)
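To compare peer counts against these expectations, a single value can be pulled out of the Prometheus text output. The sketch below is an assumption-laden example: `metric_value` is a hypothetical name, and `ethereum_peer_count` is used as an example metric name that should be confirmed against the node's own `/metrics` output.

```bash
# metric_value: hypothetical helper. Reads Prometheus text-format metrics on
# stdin and prints the value of the first sample whose name matches $1.
# Comment lines (# HELP / # TYPE) are skipped; labels are ignored.
# Usage: curl -s http://192.168.11.150:9545/metrics | metric_value ethereum_peer_count
metric_value() {
    awk -v name="$1" '
        /^#/ { next }
        {
            metric = $1
            sub(/[{].*/, "", metric)      # drop any {label="..."} suffix
            if (metric == name) { print $NF; exit }
        }'
}
```

The helper prints nothing when the metric is absent, which is itself a useful signal that the name needs checking.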
---
## Performance Monitoring
### Resource Usage
**Memory Usage**:
```bash
# Check Besu process memory
pct exec 1000 -- ps aux | grep besu | awk '{print $4,$11}'
# Check systemd memory limit
pct exec 1000 -- systemctl show besu-validator.service | grep MemoryMax
```
**CPU Usage**:
```bash
# Monitor CPU usage
pct exec 1000 -- top -bn1 | grep besu
```
**Disk Usage**:
```bash
# Check disk usage
pct exec 1500 -- df -h /data/besu
# Check database size
pct exec 1500 -- du -sh /data/besu/database/
```
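A threshold check on the `df` output makes it easier to spot a data volume that is filling up. In this sketch the function name `disk_usage_check` and the default limit of 80% are assumptions; it expects the POSIX output format produced by `df -P`.

```bash
# disk_usage_check: hypothetical helper. Reads `df -P <path>` output on stdin
# and warns when the Capacity column exceeds the limit (default 80, an assumed value).
# Usage: pct exec 1500 -- df -P /data/besu | disk_usage_check 80
disk_usage_check() {
    local limit="${1:-80}"
    awk -v limit="$limit" '
        NR == 2 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 > limit + 0) { printf "WARN: %s at %d%%\n", $6, pct; exit 1 }
            printf "OK: %s at %d%%\n", $6, pct
        }'
}
```

Archive nodes grow continuously, so a lower limit may be appropriate for sentries than for validators.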
---
## Configuration Drift Detection
### Compare Running Configs to Templates
```bash
# Use audit script
./scripts/audit-besu-configs.sh
# Manual comparison
# 1. Copy running config from node
pct exec 1000 -- cat /etc/besu/config-validator.toml > /tmp/running-config.toml
# 2. Compare to template
diff /tmp/running-config.toml smom-dbis-138-proxmox/templates/besu-configs/config-validator.toml
```
**Expected**: Running configs should match templates (after deployment)
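A plain `diff` also reports differences in comments, blank lines, and ordering, which are not real drift. One way to reduce that noise is to normalize both files first. The helpers below are a sketch with hypothetical names; they assume flat key/value config files and do not normalize spacing around `=`.

```bash
# normalize_toml: hypothetical helper. Strips full-line comments, blank lines,
# and surrounding whitespace, then sorts, so only real setting changes remain.
# Sorting ignores TOML table boundaries, which is acceptable for flat config files.
normalize_toml() {
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^#/d' -e '/^$/d' "$1" | LC_ALL=C sort
}

# config_drift: prints differing lines and returns non-zero when drift exists.
config_drift() {
    local a b rc
    a=$(mktemp) && b=$(mktemp) || return 2
    normalize_toml "$1" > "$a"
    normalize_toml "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

For example, `config_drift /tmp/running-config.toml smom-dbis-138-proxmox/templates/besu-configs/config-validator.toml` prints nothing and returns 0 when the settings match.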
---
## Troubleshooting
### Issue: Service Fails to Start
**Symptoms**:
- Service status: `failed` or `inactive`
- Frequent restarts
- Configuration errors in logs
**Diagnosis**:
```bash
# Check service status
pct exec 1000 -- systemctl status besu-validator.service
# Check logs for errors
pct exec 1000 -- journalctl -u besu-validator.service -n 100 --no-pager
```
**Common Causes**:
1. Configuration syntax error
2. Deprecated options still present
3. Invalid option values
4. Missing required files (genesis.json, etc.)
**Resolution**:
1. Validate the config with `scripts/validate-besu-config.sh`
2. Check for deprecated options
3. Review Besu logs for specific errors
4. Restore from backup if needed
---
### Issue: Configuration Not Applied
**Symptoms**:
- Logging level unchanged
- Service running but with old settings
**Diagnosis**:
```bash
# Check if config file was updated
pct exec 1000 -- stat /etc/besu/config-validator.toml
# Check actual logging level in Besu logs
pct exec 1000 -- journalctl -u besu-validator.service | grep -i "logging\|WARN\|INFO"
```
**Resolution**:
1. Verify config file was copied correctly
2. Ensure service was restarted after config update
3. Check for file permission issues
4. Verify Besu is reading correct config file
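Whether the service was restarted after the config update can be checked by comparing two timestamps. The sketch below is hypothetical (`needs_restart` is not a repository script) and takes both values as epoch seconds. The commented wiring assumes GNU `date` can parse the timestamp that systemd prints, which may need adjusting for some time zone abbreviations.

```bash
# needs_restart: hypothetical helper. Compares the config file's modification
# time with the time the service last became active (both as epoch seconds).
# Returns 0 (true) when the config changed after the service started.
needs_restart() {
    local config_mtime="$1" service_start="$2"
    [ "$config_mtime" -gt "$service_start" ]
}

# Example wiring (commented out; requires a Proxmox host):
# mtime=$(pct exec 1000 -- stat -c %Y /etc/besu/config-validator.toml)
# started=$(date -d "$(pct exec 1000 -- systemctl show besu-validator.service -p ActiveEnterTimestamp --value)" +%s)
# needs_restart "$mtime" "$started" && echo "Config is newer than the running service; restart required"
```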
---
### Issue: Logging Level Incorrect
**Symptoms**:
- Validators showing INFO logs (should be WARN)
- RPC nodes showing INFO logs (should be WARN)
- Sentries showing WARN logs (should be INFO)
**Diagnosis**:
```bash
# Check config file logging setting
pct exec 1000 -- grep "^logging" /etc/besu/config-validator.toml
# Expected: logging="WARN" for validators
# Check actual log output
pct exec 1000 -- journalctl -u besu-validator.service -n 20
# Should see minimal logs (WARN level)
```
**Resolution**:
1. Verify config file has correct `logging="WARN"` or `logging="INFO"`
2. Ensure service was restarted
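3. If older entries obscure the result, filter by time rather than deleting logs, for example `journalctl -u besu-validator.service --since "10 min ago"`. `journalctl --vacuum-time=1s` discards archived journal files and does not change the logging level, so treat it as a last resort.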
---
## Monitoring Scripts
### Automated Monitoring
Create a monitoring script that checks all nodes:
```bash
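#!/bin/bash
# monitor-besu-deployment.sh
# Run on the Proxmox host. Reports service state, recent errors, and restart count per node.
NODES=(1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502)
UNIT='besu-*.service'  # quoted so systemd, not the host shell, expands the glob

for vmid in "${NODES[@]}"; do
    echo "Checking VMID $vmid..."

    # Check service status. is-active exits non-zero for inactive or failed units,
    # so fall back to "unknown" only when nothing was printed.
    status=$(pct exec "$vmid" -- systemctl is-active "$UNIT" 2>/dev/null | head -1)
    echo "  Service status: ${status:-unknown}"

    # Count error lines logged in the last hour
    errors=$(pct exec "$vmid" -- journalctl -u "$UNIT" --since "1 hour ago" --no-pager 2>/dev/null | grep -ci "error")
    echo "  Errors in last hour: $errors"

    # Check restart count (the systemd property is NRestarts)
    restarts=$(pct exec "$vmid" -- systemctl show "$UNIT" -p NRestarts --value 2>/dev/null | head -1)
    echo "  Restart count: ${restarts:-unknown}"
done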
```
---
## Success Criteria
### Deployment Successful If:
**All services running**:
- Systemd status: `active`
- No restart loops
- Services stable for 24+ hours
**Configuration applied**:
- Logging levels correct (WARN for validators/RPC, INFO for sentries)
- No deprecated options in use
- All configs match templates
**Functionality verified**:
- Validators participating in consensus
- Sentries providing archive queries
- RPC nodes serving API requests
- Network connectivity normal
**No errors**:
- No configuration errors in logs
- No "Unknown options" errors
- Services starting cleanly
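The per-node criteria above can be expressed as a single verdict. The following is a sketch, not an official acceptance test: `node_verdict` is a hypothetical name, and the default of zero tolerated restarts is an assumption. It takes the service state, restart count, and error count that a monitoring run collects.

```bash
# node_verdict: hypothetical helper. Applies the per-node success criteria to
# values collected by a monitoring run: service state, restart count, error count.
node_verdict() {
    local status="$1" restarts="$2" errors="$3"
    local max_restarts="${4:-0}"   # assumed: no restarts tolerated once stable
    if [ "$status" != "active" ]; then
        echo "FAIL: service is $status"; return 1
    fi
    if [ "$restarts" -gt "$max_restarts" ]; then
        echo "FAIL: $restarts restarts"; return 1
    fi
    if [ "$errors" -gt 0 ]; then
        echo "FAIL: $errors error lines in logs"; return 1
    fi
    echo "PASS"
}
```

Criteria that span nodes, such as consensus participation and archive queries, still need the functional checks described earlier.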
---
## Monitoring Timeline
### Hour 0-1: Immediate Verification
- Service status
- Configuration errors
- Basic functionality
### Hour 1-6: Intensive Monitoring
- Service stability
- Performance metrics
- Network connectivity
- Detailed verification
### Hour 6-48: Standard Monitoring
- Ongoing health checks
- Resource usage
- Performance trends
### Day 2+: Ongoing Monitoring
- Regular health checks
- Performance monitoring
- Configuration drift detection
---
## Post-Deployment Checklist
- [ ] All services running (validators, sentries, RPC)
- [ ] No configuration errors in logs
- [ ] Logging levels correct (WARN/INFO as appropriate)
- [ ] No restart loops
- [ ] Validators participating in consensus
- [ ] Sentries providing archive queries
- [ ] RPC nodes serving API requests
- [ ] Network connectivity normal
- [ ] Peer connections healthy
- [ ] Resource usage within expected ranges
- [ ] Configuration drift: None detected
---
## Related Documentation
- `scripts/deploy-besu-configs.sh` - Deployment script
- `scripts/audit-besu-configs.sh` - Configuration audit
- `scripts/validate-besu-config.sh` - Configuration validation
- `docs/04-configuration/BESU_CONFIGURATION_GUIDE.md` - Configuration reference
---
**Last Updated**: 2026-01-17
**Status**: Monitoring Guide