proxmox/docs/04-configuration/BESU_PERFORMANCE_TUNING.md

# Besu Performance Tuning Guide

**Last Updated:** 2026-01-31
**Document Version:** 1.0
**Status:** Active Documentation

---

**Date**: 2026-01-17
**Purpose**: Performance optimization recommendations for Besu nodes

---

## Overview

This guide provides performance tuning recommendations for Besu nodes based on network size, node type, and operational requirements.

---

## Network Size Analysis

### Current Network Topology

- **Validators**: 5 nodes (VMIDs 1000-1004)
- **Sentries**: 4 nodes (VMIDs 1500-1503)
- **RPC Nodes**: 10+ nodes (VMIDs 2500+)
- **Total Nodes**: ~19-20 active nodes

### Expected Growth

- **Near-term**: 20-30 nodes
- **Medium-term**: 30-50 nodes
- **Long-term**: 50-100 nodes

---

## Performance Configuration Options

### max-peers

**Current Settings**:
- Validators: `25` peers
- Sentries: `25` peers
- RPC (Standard): `25` peers
- RPC (ThirdWeb): `50` peers

**Recommended Settings by Network Size**:

| Network Size | Validators | Sentries | RPC (Standard) | RPC (High Traffic) |
|--------------|------------|----------|----------------|-------------------|
| **10-20 nodes** | 15-20 | 20-25 | 20-25 | 30-40 |
| **20-50 nodes** | 20-25 | 25-30 | 25-30 | 40-50 |
| **50-100 nodes** | 25-30 | 30-40 | 30-40 | 50-75 |
| **100+ nodes** | 30-40 | 40-50 | 40-50 | 75-100 |

**Rationale**:
- **Validators**: Fewer peers needed (only sentries and other validators)
- **Sentries**: Moderate peers (handle P2P traffic for validators)
- **RPC Standard**: Moderate peers (serve API requests)
- **RPC High Traffic**: Higher peers (ThirdWeb, high-volume applications)

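**Current Assessment**: ✅ Appropriate for current network size (~20 nodes). A cap of `25` already exceeds the number of other nodes available to peer with, so `max-peers` is not a limiting factor at this scale.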

---

### P2P Configuration

```toml
# P2P host binding
p2p-host="0.0.0.0"
p2p-port=30303

# Maximum peer connections
max-peers=25

# Discovery
discovery-enabled=true  # or false for isolated nodes
```

**Tuning Guidelines**:
- **Discovery enabled**: For public-facing nodes (sentries, public RPC)
- **Discovery disabled**: For internal-only nodes (validators, core RPC)
- **Max peers**: Balance between connectivity and resource usage
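
As a sketch of how these guidelines might translate into per-role config files, the fragments below contrast a validator with a sentry. The file names, the static-nodes path, and the enode placeholder are hypothetical; `static-nodes-file` and `bootnodes` are standard Besu options, but their values must come from the actual deployment.

```toml
# config-validator.toml (hypothetical file name)
# Internal-only node: discovery off, peers come from a static list.
p2p-host="0.0.0.0"
p2p-port=30303
discovery-enabled=false
static-nodes-file="/etc/besu/static-nodes.json"  # assumed path
max-peers=25
```

```toml
# config-sentry.toml (hypothetical file name)
# Public-facing node: discovery on, bootnodes seed the peer table.
p2p-host="0.0.0.0"
p2p-port=30303
discovery-enabled=true
bootnodes=["enode://<node-public-key>@<host>:30303"]  # placeholder, not a real enode
max-peers=25
```

With discovery disabled a node does not find peers on its own, so the static list should name every sentry and validator it is expected to reach.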

---

### Sync Mode Configuration

```toml
sync-mode="FULL"
```

**Options**:
- `FULL`: Full blockchain sync (validators, archive nodes)
- `FAST`: Fast sync (non-archive RPC nodes)
- `SNAP`: Snap sync (fastest bootstrap, where the Besu version and network support it)

**Recommendations**:
- ✅ **Validators**: `FULL` (required for consensus)
- ✅ **Sentries (Archive)**: `FULL` (archive nodes)
- ⚠️ **RPC Nodes**: Consider `FAST` for non-archive nodes (shorter initial sync and a smaller database)

**Note**: Current configs all use `FULL`. Consider `FAST` for non-archive RPC nodes if storage is a concern.
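
A minimal sketch of the sync-mode split suggested above, assuming separate config files per role (the file names are hypothetical). Newer Besu releases may favor `SNAP` over `FAST`, so check the release notes for the deployed version before switching.

```toml
# config-sentry.toml (hypothetical file name): archive node keeps full sync
sync-mode="FULL"
```

```toml
# config-rpc-standard.toml (hypothetical file name): non-archive RPC node
sync-mode="FAST"
```

Changing `sync-mode` on a node that already holds a database generally requires resyncing from an empty data directory, so the switch is easiest to apply when provisioning new RPC nodes.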

---

### Logging Configuration

```toml
logging="WARN"     # Validators and RPC nodes
# logging="INFO"   # Sentry archive nodes (set in their own config file)
```

**Performance Impact** (rough estimates; actual overhead depends on workload and storage):
- **INFO logging**: ~10-20% I/O overhead
- **WARN logging**: Minimal I/O overhead (<5%)
- **DEBUG logging**: High I/O overhead (30-50%)

**Recommendation**: ✅ Current settings are optimal
- Validators/RPC: `WARN` (minimal overhead)
- Sentry archive: `INFO` (detailed logs for archival)

---

### RPC Configuration

#### HTTP-RPC Timeout

```toml
# ThirdWeb RPC uses extended timeout
rpc-http-timeout=60
```

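**Default**: 60 seconds, so the ThirdWeb value above matches the default rather than extending it. Confirm the option name and default against the CLI reference for the deployed Besu version.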

**Tuning**:
- **Standard RPC**: Default (60s) is appropriate
- **High-volume RPC**: May need longer timeout for complex queries
- **Public RPC**: Default is sufficient

**Recommendation**: ✅ Current settings appropriate

---

#### WebSocket Configuration

```toml
rpc-ws-enabled=true
rpc-ws-port=8546
```

**Performance Considerations**:
- WebSocket connections consume memory
- Recommended for real-time applications (ThirdWeb, dApps)
- Not needed for simple read-only public RPC

**Current Usage**: ✅ Appropriate (enabled where needed, disabled for public RPC)
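
Because each WebSocket connection holds memory, one way to bound the cost is to cap concurrent connections and expose only the API groups clients need. The sketch below assumes Besu's `rpc-ws-api` and `rpc-ws-max-active-connections` options; the limit and the API list are illustrative values, not measured recommendations.

```toml
# WebSocket settings for a real-time RPC node (e.g. ThirdWeb, dApps)
rpc-ws-enabled=true
rpc-ws-host="0.0.0.0"
rpc-ws-port=8546
rpc-ws-api=["ETH","NET","WEB3"]      # illustrative: expose only what clients use
rpc-ws-max-active-connections=200    # illustrative cap; tune from observed usage
```

```toml
# Simple read-only public RPC node: WebSocket off
rpc-ws-enabled=false
```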

---

### Metrics Configuration

```toml
metrics-enabled=true
metrics-port=9545
metrics-host="0.0.0.0"
```

**Performance Impact**: Minimal (<2% overhead)

**Recommendation**: ✅ Keep enabled on all nodes for monitoring

---

## Resource Recommendations

### Memory (JVM Heap)

**Current Settings** (from deployment scripts):
- Validators: `-Xmx4g -Xms4g`
- Sentries: `-Xmx6g -Xms6g` (archive nodes need more)
- RPC: `-Xmx6g -Xms6g`

**Recommended by Node Type**:

| Node Type | Heap Size | Rationale |
|-----------|-----------|-----------|
| **Validator** | 4-8GB | Consensus operations, transaction pool |
| **Sentry (Archive)** | 8-12GB | Full archive database, historical queries |
| **RPC (Standard)** | 4-8GB | API serving, standard sync |
| **RPC (High Traffic)** | 8-12GB | High request volume, complex queries |

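**Current Assessment**: ✅ Validator (4 GB) and standard RPC (6 GB) heaps fall within the recommended ranges. ⚠️ Sentry heaps (6 GB) sit below the 8-12 GB recommended for archive nodes; watch heap utilization and raise them if GC pressure appears.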

---

### CPU

**Recommendations**:
- **Validators**: 4+ vCPUs (consensus is CPU-intensive)
- **Sentries**: 4-8 vCPUs (P2P relay, archive queries)
- **RPC**: 4-8 vCPUs (API serving, request handling)

**Current VM Sizes**:
- Validators: `Standard_D4_v2` (8 vCPUs) ✅
- Sentries: `Standard_D4_v2` (8 vCPUs) ✅
- RPC: `Standard_D8s_v6` (8 vCPUs) ✅

**Assessment**: ✅ Current sizing is appropriate

---

### Disk I/O

**Archive Nodes (Sentries)**:
- High read I/O (historical queries)
- SSD recommended for archive database
- Consider high IOPS for archive nodes

**Validators/RPC**:
- Moderate I/O (recent block data)
- Standard storage sufficient

---

## Performance Monitoring

### Key Metrics to Monitor

1. **Peer Connections**:
   - Active peer count vs. `max-peers`
   - Peer connection churn
   - Peer latency

2. **Block Sync**:
   - Sync status (in-sync vs. syncing)
   - Block import rate
   - Sync lag (blocks behind)

3. **RPC Performance**:
   - Request rate (requests/second)
   - Response latency (p50, p95, p99)
   - Error rate

4. **Resource Usage**:
   - Memory usage (heap utilization)
   - CPU usage
   - Disk I/O (read/write rates)

5. **Transaction Pool**:
   - Transaction pool size
   - Transaction processing rate

---

## Tuning Recommendations by Network Growth

### Phase 1: Current (20 nodes)

**Current Settings**: ✅ Appropriate
- `max-peers=25` for most nodes
- `max-peers=50` for ThirdWeb RPC
- `sync-mode="FULL"` for all nodes

**No changes needed** at current scale.

---

### Phase 2: Medium Growth (30-50 nodes)

**Recommended Adjustments**:
1. Increase `max-peers` to 25-30 for sentries, matching the 20-50 node row of the table above
2. Keep high-traffic RPC in the 40-50 range (ThirdWeb is already at `50`, so no change is needed)
3. Monitor peer connection health
4. Consider `FAST` sync for non-archive RPC nodes

---

### Phase 3: Large Growth (50-100 nodes)

**Recommended Adjustments**:
1. Increase `max-peers` to 30-40 for sentries (the 50-100 node row of the table above)
2. Increase `max-peers` to 50-75 for high-traffic RPC
3. Review JVM heap sizes (may need increase)
4. Monitor and optimize database performance
5. Consider horizontal scaling for RPC nodes

---

## Network-Specific Tuning

### Validator Network

**Characteristics**: Consensus-critical, low latency needed

**Tuning**:
- Lower `max-peers` (only sentries + validators)
- Prioritize stable peer connections
- Monitor consensus performance (block time, round time)

**Current**: ✅ Optimized for consensus performance

---

### Sentry Network

**Characteristics**: P2P relay, full archive

**Tuning**:
- Moderate `max-peers` (handle P2P traffic)
- Archive database optimization
- Higher memory for historical queries

**Current**: ✅ Configured for archive + P2P relay

---

### RPC Network

**Characteristics**: API serving, variable traffic

**Tuning**:
- Variable `max-peers` by traffic level
- WebSocket configuration based on use case
- RPC timeout based on query complexity

**Current**: ✅ Varied appropriately by use case

---

## Performance Optimization Checklist

### Initial Setup
- ✅ JVM heap size appropriate for node type
- ✅ `max-peers` configured for network size
- ✅ Logging level optimized (WARN for most, INFO for archive)
- ✅ Sync mode appropriate (FULL for archive, consider FAST for non-archive)

### Ongoing Monitoring
- ⏳ Monitor peer connection health
- ⏳ Track RPC request latency
- ⏳ Monitor memory/CPU usage
- ⏳ Check block sync status

### Optimization
- ⏳ Adjust `max-peers` based on network growth
- ⏳ Tune JVM GC settings if needed
- ⏳ Optimize database performance for archive nodes
- ⏳ Scale resources if performance degrades

---

## Best Practices

### 1. Start Conservative
- Begin with recommended settings
- Monitor performance
- Adjust based on actual workload

### 2. Scale Gradually
- Increase `max-peers` incrementally
- Monitor impact of changes
- Revert if issues occur

### 3. Monitor First, Tune Second
- Collect performance metrics
- Identify bottlenecks
- Tune specific issues

### 4. Document Changes
- Track configuration changes
- Document performance impact
- Maintain configuration history

---

## Related Documentation

- `docs/04-configuration/BESU_CONFIGURATION_GUIDE.md` - Configuration reference
- `docs/04-configuration/RPC_CONFIG_ANALYSIS.md` - RPC configuration analysis
- Monitoring dashboards (Grafana/Prometheus)
