# Besu Performance Tuning Guide
**Last Updated:** 2026-01-31
**Document Version:** 1.0
**Status:** Active Documentation
---
**Date**: 2026-01-17
**Purpose**: Performance optimization recommendations for Besu nodes
---
## Overview
This guide provides performance tuning recommendations for Besu nodes based on network size, node type, and operational requirements.
---
## Network Size Analysis
### Current Network Topology
- **Validators**: 5 nodes (VMIDs 1000-1004)
- **Sentries**: 4 nodes (VMIDs 1500-1503)
- **RPC Nodes**: 10+ nodes (VMIDs 2500+)
- **Total Nodes**: ~19-20 active nodes
### Expected Growth
- **Near-term**: 20-30 nodes
- **Medium-term**: 30-50 nodes
- **Long-term**: 50-100 nodes
---
## Performance Configuration Options
### max-peers
**Current Settings**:
- Validators: `25` peers
- Sentries: `25` peers
- RPC (Standard): `25` peers
- RPC (ThirdWeb): `50` peers
**Recommended Settings by Network Size**:
| Network Size | Validators | Sentries | RPC (Standard) | RPC (High Traffic) |
|--------------|------------|----------|----------------|-------------------|
| **10-20 nodes** | 15-20 | 20-25 | 20-25 | 30-40 |
| **20-50 nodes** | 20-25 | 25-30 | 25-30 | 40-50 |
| **50-100 nodes** | 25-30 | 30-40 | 30-40 | 50-75 |
| **100+ nodes** | 30-40 | 40-50 | 40-50 | 75-100 |
**Rationale**:
- **Validators**: Fewer peers needed (only sentries and other validators)
- **Sentries**: Moderate peers (handle P2P traffic for validators)
- **RPC Standard**: Moderate peers (serve API requests)
- **RPC High Traffic**: Higher peers (ThirdWeb, high-volume applications)
**Current Assessment**: ✅ Appropriate for current network size (20 nodes)
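
The table can also be read as a simple lookup. The following is a minimal Python sketch of that lookup; the function name `recommended_max_peers`, the role keys, and the decision to place a boundary value such as 20 nodes in the larger band are assumptions made for illustration and are not part of the deployment tooling.

```python
# Recommended max-peers ranges, transcribed from the table above.
# Each entry: (inclusive lower bound on node count, {role: (low, high)}).
# Assumption: a boundary value (20, 50, 100) belongs to the larger band.
BANDS = [
    (100, {"validator": (30, 40), "sentry": (40, 50), "rpc": (40, 50), "rpc-high": (75, 100)}),
    (50, {"validator": (25, 30), "sentry": (30, 40), "rpc": (30, 40), "rpc-high": (50, 75)}),
    (20, {"validator": (20, 25), "sentry": (25, 30), "rpc": (25, 30), "rpc-high": (40, 50)}),
    (10, {"validator": (15, 20), "sentry": (20, 25), "rpc": (20, 25), "rpc-high": (30, 40)}),
]


def recommended_max_peers(network_size, role):
    """Return the (low, high) max-peers range for a node count and role."""
    for lower_bound, ranges in BANDS:
        if network_size >= lower_bound:
            if role not in ranges:
                raise ValueError(f"unknown role: {role}")
            return ranges[role]
    raise ValueError("the table has no recommendation below 10 nodes")


def within_recommendation(network_size, role, max_peers):
    """True if a configured max-peers value falls inside the recommended range."""
    low, high = recommended_max_peers(network_size, role)
    return low <= max_peers <= high
```

Under these assumptions, `within_recommendation(20, "validator", 25)` is true, which matches the current assessment for a 20-node network.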
---
### P2P Configuration
```toml
# P2P host binding
p2p-host="0.0.0.0"
p2p-port=30303
# Maximum peer connections
max-peers=25
# Discovery
discovery-enabled=true # or false for isolated nodes
```
**Tuning Guidelines**:
- **Discovery enabled**: For public-facing nodes (sentries, public RPC)
- **Discovery disabled**: For internal-only nodes (validators, core RPC)
- **Max peers**: Balance between connectivity and resource usage
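
As an illustration of the guidelines above, the sketch below renders the P2P fragment for a node from its role. The role names (`sentry`, `public-rpc`, `validator`, `core-rpc`) and the helper `render_p2p_fragment` are hypothetical; only the option names and values come from the configuration block shown earlier in this section.

```python
# Roles grouped by the discovery guideline above (role names are assumptions).
PUBLIC_FACING = {"sentry", "public-rpc"}    # discovery enabled
INTERNAL_ONLY = {"validator", "core-rpc"}   # discovery disabled


def render_p2p_fragment(role, max_peers=25):
    """Return the P2P lines of a Besu TOML config for the given role."""
    if role in PUBLIC_FACING:
        discovery = "true"
    elif role in INTERNAL_ONLY:
        discovery = "false"
    else:
        raise ValueError(f"unknown role: {role}")
    lines = [
        'p2p-host="0.0.0.0"',
        "p2p-port=30303",
        f"max-peers={max_peers}",
        f"discovery-enabled={discovery}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_p2p_fragment("validator"))
    print(render_p2p_fragment("sentry", max_peers=30))
```

Generating the fragment from a role keeps the discovery setting and the peer limit in one place, so a node cannot end up public-facing with discovery switched off by accident.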
---
### Sync Mode Configuration
```toml
sync-mode="FULL"
```
**Options**:
- `FULL`: Full blockchain sync (validators, archive nodes)
- `FAST`: Fast sync (non-archive RPC nodes)
- `SNAP`: Snapshot sync (if available, fastest bootstrap)
**Recommendations**:
- ✅ **Validators**: `FULL` (required for consensus)
- ✅ **Sentries (Archive)**: `FULL` (archive nodes)
- ⚠️ **RPC Nodes**: Consider `FAST` for non-archive nodes (better performance)
**Note**: Current configs all use `FULL`. Consider `FAST` for non-archive RPC nodes if storage is a concern.
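
The recommendation above reduces to a two-input rule. A minimal sketch, assuming a hypothetical helper named `recommended_sync_mode` and a boolean `archive` flag that marks nodes expected to serve historical queries:

```python
def recommended_sync_mode(role, archive):
    """Pick a sync-mode value following the recommendations above.

    Validators and archive nodes stay on FULL; other nodes may use FAST.
    """
    if role == "validator" or archive:
        return "FULL"
    return "FAST"


def sync_mode_line(role, archive):
    """Render the matching TOML line for a node's config file."""
    return f'sync-mode="{recommended_sync_mode(role, archive)}"'
```

For example, `sync_mode_line("rpc", archive=False)` yields `sync-mode="FAST"`, while a sentry with `archive=True` stays on `FULL`.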
---
### Logging Configuration
```toml
# Validator and RPC node config files
logging="WARN"
# Sentry archive node config files (set only one logging value per file)
logging="INFO"
```
**Performance Impact**:
- **INFO logging**: ~10-20% I/O overhead
- **WARN logging**: Minimal I/O overhead (<5%)
- **DEBUG logging**: High I/O overhead (30-50%)
**Recommendation**: ✅ Current settings are optimal
- Validators/RPC: `WARN` (minimal overhead)
- Sentry archive: `INFO` (detailed logs for archival)
---
### RPC Configuration
#### HTTP-RPC Timeout
```toml
# ThirdWeb RPC uses extended timeout
rpc-http-timeout=60
```
**Default**: 60 seconds (Besu default)
**Tuning**:
- **Standard RPC**: Default (60s) is appropriate
- **High-volume RPC**: May need longer timeout for complex queries
- **Public RPC**: Default is sufficient
**Recommendation**: ✅ Current settings appropriate
---
#### WebSocket Configuration
```toml
rpc-ws-enabled=true
rpc-ws-port=8546
```
**Performance Considerations**:
- WebSocket connections consume memory
- Recommended for real-time applications (ThirdWeb, dApps)
- Not needed for simple read-only public RPC
**Current Usage**: ✅ Appropriate (enabled where needed, disabled for public RPC)
---
### Metrics Configuration
```toml
metrics-enabled=true
metrics-port=9545
metrics-host="0.0.0.0"
```
**Performance Impact**: Minimal (<2% overhead)
**Recommendation**: ✅ Keep enabled on all nodes for monitoring
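
With metrics enabled, the endpoint can be scraped and inspected outside Prometheus as well. The sketch below parses Prometheus text exposition format into a dictionary. It assumes samples carry no trailing timestamp, and the metric names in the sample text are placeholders, not names taken from Besu.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {sample_name: value}.

    Assumption: each sample line is "<name or name{labels}> <value>" with
    no trailing timestamp. Comment lines (# HELP, # TYPE) are skipped.
    """
    samples = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# Placeholder payload; real metric names depend on the Besu version.
SAMPLE = """\
# HELP example_peer_count Hypothetical gauge of connected peers
# TYPE example_peer_count gauge
example_peer_count 18
example_heap_used_bytes{area="heap"} 3.2e9
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

In practice the text would come from an HTTP GET against the host and port configured with `metrics-host` and `metrics-port`.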
---
## Resource Recommendations
### Memory (JVM Heap)
**Current Settings** (from deployment scripts):
- Validators: `-Xmx4g -Xms4g`
- Sentries: `-Xmx6g -Xms6g` (archive nodes need more)
- RPC: `-Xmx6g -Xms6g`
**Recommended by Node Type**:
| Node Type | Heap Size | Rationale |
|-----------|-----------|-----------|
| **Validator** | 4-8GB | Consensus operations, transaction pool |
| **Sentry (Archive)** | 8-12GB | Full archive database, historical queries |
| **RPC (Standard)** | 4-8GB | API serving, standard sync |
| **RPC (High Traffic)** | 8-12GB | High request volume, complex queries |
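**Current Assessment**: ✅ Appropriate for current workload. Note that the sentry heap (6 GB) sits below the 8-12 GB recommended for archive nodes; raise it if historical queries slow down.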
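
To compare the deployed heap sizes against the table, a short sketch follows. The dictionaries are transcribed from this section; the role keys and helper names are hypothetical, and sizes are treated as whole gigabytes as in the `-Xmx`/`-Xms` flags above.

```python
# Recommended heap ranges in GB, from the table above.
RECOMMENDED_HEAP_GB = {
    "validator": (4, 8),
    "sentry-archive": (8, 12),
    "rpc": (4, 8),
    "rpc-high": (8, 12),
}

# Current settings in GB, from the deployment scripts listed above.
CURRENT_HEAP_GB = {"validator": 4, "sentry-archive": 6, "rpc": 6}


def heap_flags(gb):
    """Render matching -Xms/-Xmx flags so the heap is fixed at one size."""
    return f"-Xms{gb}g -Xmx{gb}g"


def below_recommended(role, gb):
    """True if the configured heap is under the recommended minimum."""
    low, _high = RECOMMENDED_HEAP_GB[role]
    return gb < low


if __name__ == "__main__":
    for role, gb in CURRENT_HEAP_GB.items():
        note = "below recommended range" if below_recommended(role, gb) else "ok"
        print(f"{role}: {heap_flags(gb)} ({note})")
```

Setting `-Xms` equal to `-Xmx`, as the current settings do, avoids heap resizing at runtime at the cost of reserving the full amount up front.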
---
### CPU
**Recommendations**:
- **Validators**: 4+ vCPUs (consensus is CPU-intensive)
- **Sentries**: 4-8 vCPUs (P2P relay, archive queries)
- **RPC**: 4-8 vCPUs (API serving, request handling)
**Current VM Sizes**:
- Validators: `Standard_D4_v2` (4 vCPUs) ✅
- Sentries: `Standard_D4_v2` (4 vCPUs) ✅
- RPC: `Standard_D8s_v6` (8 vCPUs) ✅
**Assessment**: ✅ Current sizing is appropriate
---
### Disk I/O
**Archive Nodes (Sentries)**:
- High read I/O (historical queries)
- SSD recommended for archive database
- Consider high IOPS for archive nodes
**Validators/RPC**:
- Moderate I/O (recent block data)
- Standard storage sufficient
---
## Performance Monitoring
### Key Metrics to Monitor
1. **Peer Connections**:
   - Active peer count vs. `max-peers`
   - Peer connection churn
   - Peer latency
2. **Block Sync**:
   - Sync status (in-sync vs. syncing)
   - Block import rate
   - Sync lag (blocks behind)
3. **RPC Performance**:
   - Request rate (requests/second)
   - Response latency (p50, p95, p99)
   - Error rate
4. **Resource Usage**:
   - Memory usage (heap utilization)
   - CPU usage
   - Disk I/O (read/write rates)
5. **Transaction Pool**:
   - Transaction pool size
   - Transaction processing rate
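
Several of these metrics can be derived from standard JSON-RPC calls (`net_peerCount`, `eth_syncing`) and from client-side latency samples. The sketch below shows the arithmetic only and performs no network I/O; the helper names are hypothetical, and the nearest-rank percentile is one common definition chosen here for simplicity.

```python
import json
import math


def rpc_payload(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body for a method such as net_peerCount."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params or [], "id": request_id}
    )


def peer_utilization(net_peer_count_result, max_peers):
    """Fraction of max-peers in use; net_peerCount returns a hex string."""
    return int(net_peer_count_result, 16) / max_peers


def sync_lag(eth_syncing_result):
    """Blocks behind the best known block; eth_syncing returns False when in sync."""
    if eth_syncing_result is False:
        return 0
    highest = int(eth_syncing_result["highestBlock"], 16)
    current = int(eth_syncing_result["currentBlock"], 16)
    return highest - current


def percentile(values, p):
    """Nearest-rank percentile (p in 1..100) of a non-empty list of samples."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

For instance, a `net_peerCount` result of `"0x12"` on a node with `max-peers=25` means 18 of 25 peer slots are in use. A node that sits at its limit for long periods is a candidate for a higher `max-peers` value.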
---
## Tuning Recommendations by Network Growth
### Phase 1: Current (20 nodes)
**Current Settings**: ✅ Appropriate
- `max-peers=25` for most nodes
- `max-peers=50` for ThirdWeb RPC
- `sync-mode="FULL"` for all nodes
**No changes needed** at current scale.
---
### Phase 2: Medium Growth (30-50 nodes)
**Recommended Adjustments**:
1. Increase `max-peers` to 30-35 for sentries
2. Increase `max-peers` to 30-35 for standard RPC (high-traffic RPC already runs at 50)
3. Monitor peer connection health
4. Consider `FAST` sync for non-archive RPC nodes
---
### Phase 3: Large Growth (50-100 nodes)
**Recommended Adjustments**:
1. Increase `max-peers` to 40-50 for sentries
2. Increase `max-peers` to 50-75 for high-traffic RPC
3. Review JVM heap sizes (may need increase)
4. Monitor and optimize database performance
5. Consider horizontal scaling for RPC nodes
---
## Network-Specific Tuning
### Validator Network
**Characteristics**: Consensus-critical, low latency needed
**Tuning**:
- Lower `max-peers` (only sentries + validators)
- Prioritize stable peer connections
- Monitor consensus performance (block time, round time)
**Current**: ✅ Optimized for consensus performance
---
### Sentry Network
**Characteristics**: P2P relay, full archive
**Tuning**:
- Moderate `max-peers` (handle P2P traffic)
- Archive database optimization
- Higher memory for historical queries
**Current**: ✅ Configured for archive + P2P relay
---
### RPC Network
**Characteristics**: API serving, variable traffic
**Tuning**:
- Variable `max-peers` by traffic level
- WebSocket configuration based on use case
- RPC timeout based on query complexity
**Current**: ✅ Varied appropriately by use case
---
## Performance Optimization Checklist
### Initial Setup
- ✅ JVM heap size appropriate for node type
- ✅ `max-peers` configured for network size
- ✅ Logging level optimized (WARN for most, INFO for archive)
- ✅ Sync mode appropriate (FULL for archive, consider FAST for non-archive)
### Ongoing Monitoring
- ⏳ Monitor peer connection health
- ⏳ Track RPC request latency
- ⏳ Monitor memory/CPU usage
- ⏳ Check block sync status
### Optimization
- ⏳ Adjust `max-peers` based on network growth
- ⏳ Tune JVM GC settings if needed
- ⏳ Optimize database performance for archive nodes
- ⏳ Scale resources if performance degrades
---
## Best Practices
### 1. Start Conservative
- Begin with recommended settings
- Monitor performance
- Adjust based on actual workload
### 2. Scale Gradually
- Increase `max-peers` incrementally
- Monitor impact of changes
- Revert if issues occur
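
One way to make "incrementally" concrete is to plan the intermediate values ahead of time and apply them one at a time, checking metrics between steps. A minimal sketch; the helper name `peer_steps` and the default step of 5 are assumptions, not values from this guide.

```python
def peer_steps(current, target, step=5):
    """List the max-peers values to apply, in order, when raising the limit.

    Returns an empty list when no increase is needed. The final entry is
    always the target, even if it is less than a full step away.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if target <= current:
        return []
    steps = list(range(current + step, target, step))
    steps.append(target)
    return steps


if __name__ == "__main__":
    # Example: moving a sentry from 25 toward 40 peers.
    print(peer_steps(25, 40))
```

If a step causes problems, reverting means going back to the previous entry in the list, not all the way to the starting value.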
### 3. Monitor First, Tune Second
- Collect performance metrics
- Identify bottlenecks
- Tune specific issues
### 4. Document Changes
- Track configuration changes
- Document performance impact
- Maintain configuration history
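
A lightweight way to track configuration changes is to record the difference between the old and new settings alongside each change. The sketch below compares two settings dictionaries; the helper name `config_diff` is hypothetical, and loading the dictionaries from the TOML files is left out.

```python
def config_diff(old, new):
    """Return {key: (old_value, new_value)} for every setting that changed.

    A key missing on one side is reported with None for that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        before = old.get(key)
        after = new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


if __name__ == "__main__":
    before = {"max-peers": 25, "sync-mode": "FULL", "logging": "WARN"}
    after = {"max-peers": 30, "sync-mode": "FULL", "logging": "WARN"}
    for key, (old_value, new_value) in config_diff(before, after).items():
        print(f"{key}: {old_value} -> {new_value}")
```

Storing this output with a date and the observed performance impact gives the configuration history described above.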
---
## Related Documentation
- `docs/04-configuration/BESU_CONFIGURATION_GUIDE.md` - Configuration reference
- `docs/04-configuration/RPC_CONFIG_ANALYSIS.md` - RPC configuration analysis
- Monitoring dashboards (Grafana/Prometheus)