docs/04-configuration/BESU_ARCHIVE_NODES.md

# Besu Archive Node Configuration Guide

**Last Updated:** 2026-01-31  
**Document Version:** 1.0  
**Status:** Active Documentation

---

**Date**: 2026-01-17  
**Purpose**: Guide for configuring and managing Besu archive nodes (sentry nodes)

---

## Overview

Sentry nodes are configured as **full archive nodes** so that they retain the complete blockchain history. This guide documents their configuration, storage requirements, and ongoing management.

---

## Archive Node Configuration

### Current Sentry Configuration

**Node Type**: Sentry (Full Archive)

**Key Configuration**:
```toml
# Archive node configuration
sync-mode="FULL"              # Full blockchain sync
logging="INFO"                # Detailed logs for archival

# RPC Configuration (internal only)
rpc-http-enabled=true
rpc-http-api=["ETH","NET","WEB3","ADMIN"]

# Network
discovery-enabled=true        # Open P2P discovery
max-peers=25

# Permissioning
permissions-nodes-config-file-enabled=true
```

**File**: `smom-dbis-138-proxmox/templates/besu-configs/config-sentry.toml`

---

## Archive Node Requirements

### 1. Sync Mode: FULL

```toml
sync-mode="FULL"
```

**Verification**: ✅ All sentry configs use `sync-mode="FULL"`

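**Purpose**:
- Downloads and validates every block from genesis, maintaining complete blockchain history
- Enables historical state queries, provided the data storage format also retains historical state (in Besu this is controlled by `data-storage-format`; check the default for the deployed Besu version)
- Required for full archive functionality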

---

### 2. Logging: INFO

```toml
logging="INFO"
```

**Verification**: ✅ All sentry configs use `logging="INFO"`

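**Rationale**:
- More detailed operational logs than WARN, retained for archival purposes
- Easier debugging of failed or slow archive queries
- Useful for historical analysis of node behavior (the log level does not change which chain data is stored)

**Trade-off**: Higher log I/O overhead (estimated ~10-20%) compared to WARN logging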

---

### 3. No Pruning

**Current Configuration**: ✅ Pruning not enabled (default: full archive)

**Verification**: No `pruning-enabled` or `pruning-blocks-retained` options in sentry configs

**Purpose**: 
- Keep all historical data
- Enable unlimited historical queries
- Maintain complete blockchain archive

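**Note**: If storage becomes an issue, pruning with a high retention value is an option, but it reduces archive completeness. Before relying on it, confirm that the deployed Besu version still accepts the `pruning-*` options, since they have been deprecated upstream.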

---

### 4. RPC APIs for Archive Queries

**Current APIs**: `["ETH","NET","WEB3","ADMIN"]`

**Archive-Relevant APIs**:
- `ETH`: Standard Ethereum APIs, including historical queries such as `eth_getBalance` at a past block
- `ADMIN`: Node and peer administration (operational rather than archive-specific)

**Verification**: ✅ Appropriate APIs enabled for archive access

---

## Storage Requirements

### Archive Database Growth

**Estimation** (per Besu documentation):
- **Block data**: ~2-5 KB per block
- **State data**: Variable (grows with contract storage)
- **Transaction receipts**: ~500 bytes per transaction

**Growth Rate**:
- **Current network**: ~30 blocks/minute (2-second block time) = ~1,800 blocks/hour
- **Block data growth**: ~3.6-9 MB/hour = ~86-216 MB/day
- **With state data**: Significantly higher (contract storage)
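
The block-data figures can be reproduced with a short calculation. This is a minimal sketch that assumes the 2-second block time used in the planning example later in this guide and the 2-5 KB per-block range above; the function name is illustrative, and state data is not included.

```bash
#!/usr/bin/env bash
# Estimate raw block-data growth from block time and average block size.
# Assumptions (illustrative): 1 MB = 1000 KB; state data is excluded.

# block_growth_mb_per_day BLOCK_TIME_SECONDS KB_PER_BLOCK
block_growth_mb_per_day() {
  LC_ALL=C awk -v t="$1" -v kb="$2" \
    'BEGIN { printf "%.1f\n", (86400 / t) * kb / 1000 }'
}

block_growth_mb_per_day 2 2   # lower bound: prints 86.4
block_growth_mb_per_day 2 5   # upper bound: prints 216.0
```

Changing the first argument shows how sensitive the estimate is to block time, which is worth re-checking if the network's consensus parameters change.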

**Storage Requirements**:

| Time Period | Estimated Storage | Notes |
|-------------|-------------------|-------|
| **1 month** | ~10-50 GB | Depends on transaction volume |
| **3 months** | ~30-150 GB | Linear growth expected |
| **1 year** | ~100-500 GB | State data may be higher |
| **5 years** | ~500 GB - 2.5 TB | Long-term archival |

**Current Assessment**: Monitor storage usage and plan for growth

---

### Storage Planning

**Recommendations**:

1. **Initial Allocation**: 
   - Minimum: 500 GB per archive node
   - Recommended: 1-2 TB per archive node

2. **Growth Planning**:
   - Monitor storage usage monthly
   - Plan expansion before reaching 80% capacity
   - Consider separate volumes for archive data

3. **Backup Strategy**:
   - Regular backups of archive database
   - Offsite backup for disaster recovery
   - Retention policy for backups

---

## Archive Node Verification

### Configuration Verification

```bash
# Verify sync mode is FULL
grep "sync-mode" /etc/besu/config-sentry.toml
# Expected: sync-mode="FULL"

# Verify logging is INFO
grep "logging" /etc/besu/config-sentry.toml
# Expected: logging="INFO"

# Verify no pruning options
grep -i "pruning" /etc/besu/config-sentry.toml
# Expected: No output (pruning not enabled = full archive)
```

**Current Status**: ✅ All sentry configs verified as archive nodes
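
The three manual checks can be combined into one function that returns a non-zero exit code on any mismatch, which makes it usable from cron or CI. This is a sketch only: the function name and messages are illustrative, and it treats any uncommented `pruning-*` line as a failure, matching the "no pruning options" rule in this guide.

```bash
#!/usr/bin/env bash
# Check a Besu TOML config for the archive-related settings described above.
# The function name and exit-code convention are illustrative, not part of Besu.

check_archive_config() {
  local file="$1" status=0

  # Expect an uncommented sync-mode="FULL" line.
  if ! grep -Eq '^[[:space:]]*sync-mode[[:space:]]*=[[:space:]]*"FULL"' "$file"; then
    echo "FAIL: sync-mode is not FULL"; status=1
  fi

  # Expect an uncommented logging="INFO" line.
  if ! grep -Eq '^[[:space:]]*logging[[:space:]]*=[[:space:]]*"INFO"' "$file"; then
    echo "FAIL: logging is not INFO"; status=1
  fi

  # Any uncommented pruning-* option disqualifies the config.
  if grep -Eiq '^[[:space:]]*pruning-' "$file"; then
    echo "FAIL: pruning option present"; status=1
  fi

  [ "$status" -eq 0 ] && echo "OK: $file matches the archive settings"
  return "$status"
}

# Example: check_archive_config /etc/besu/config-sentry.toml
```

Anchoring the patterns at the start of the line avoids false positives from commented-out settings, which a plain `grep "sync-mode"` would match.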

---

### Functional Verification

**Check Archive Status**:
```bash
# Check sync status
curl -X POST http://localhost:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
# Expected: false (fully synced)

# Check latest block
curl -X POST http://localhost:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Test historical query (verify archive capability)
curl -X POST http://localhost:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_getBalance","params":["0x...","0x100"],"id":1}'
# Should return historical balance (archive nodes only)
```
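
When scripting these checks, the raw JSON-RPC responses need to be reduced to a value that can be compared. The helpers below are a rough sketch that avoids a `jq` dependency; they assume compact single-line JSON with a scalar `result` (so they do not handle the object that `eth_syncing` returns while a sync is in progress), and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Helpers for interpreting the JSON-RPC responses shown above.
# Assumes single-line JSON with a scalar "result"; jq is not required.

# Read a response on stdin and print the raw "result" value.
rpc_result() {
  sed -E 's/.*"result"[[:space:]]*:[[:space:]]*"?([^",}]*)"?.*/\1/'
}

# Convert a hex quantity such as 0x1b4 to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Canned responses for illustration (replace with real curl output):
echo '{"jsonrpc":"2.0","id":1,"result":false}' | rpc_result      # prints false
echo '{"jsonrpc":"2.0","id":1,"result":"0x1b4"}' | rpc_result    # prints 0x1b4
hex_to_dec 0x1b4                                                 # prints 436
```

For anything beyond simple scalar results, a real JSON parser is the safer choice.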

---

## Archive Node Management

### Storage Management

**Monitor Storage Usage**:
```bash
# Check database size
du -sh /data/besu/database/

# Check disk usage
df -h /data/besu/

# Monitor growth over time
# (set up monitoring alerts at 80% capacity)
```
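
The 80% alert mentioned above can be approximated with a small script until proper monitoring is in place. This is a minimal sketch: the function names are illustrative, and it assumes the POSIX `df -P` output format, where the fifth column is the used percentage.

```bash
#!/usr/bin/env bash
# Warn when a filesystem crosses a usage threshold (80% in this guide).
# Function names are illustrative; the path is supplied by the caller.

# Print the used percentage (integer, no % sign) of the filesystem holding PATH.
disk_used_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 and print a warning if USED_PCT >= THRESHOLD_PCT (default 80).
over_threshold() {
  local used="$1" threshold="${2:-80}"
  if [ "$used" -ge "$threshold" ]; then
    echo "WARNING: disk usage ${used}% >= ${threshold}%"
    return 0
  fi
  return 1
}

# Example: over_threshold "$(disk_used_pct /data/besu)" 80
```

Keeping the measurement and the comparison in separate functions makes the threshold logic easy to test without touching a real disk.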

**Storage Expansion**:
1. Plan expansion when approaching 80% capacity
2. Backup archive data before expansion
3. Expand volume or add storage
4. Verify Besu continues operating

---

### Backup and Recovery

**Backup Strategy**:

1. **Database Backup**:
   - Full database backup weekly
   - Incremental backups daily
   - Offsite backup monthly

2. **Configuration Backup**:
   - Backup config files
   - Backup permission files
   - Backup node keys

3. **Recovery Procedures**:
   - Document recovery steps
   - Test recovery procedures
   - Maintain recovery runbook
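
The configuration backup item can be sketched as a timestamped tarball. The paths in the example call are hypothetical, and this covers only config, permission, and key files; the database itself is generally safer to back up with Besu stopped or from a volume snapshot.

```bash
#!/usr/bin/env bash
# Create a timestamped archive of a Besu configuration directory.
# Paths are hypothetical; the destination must be outside the source directory.

backup_config() {
  local src_dir="$1" dest_dir="$2"
  local stamp archive
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="${dest_dir}/besu-config-${stamp}.tar.gz"

  mkdir -p "$dest_dir"
  # -C keeps paths inside the archive relative to the source directory.
  tar -czf "$archive" -C "$src_dir" .
  # Node keys are sensitive: restrict the archive to its owner.
  chmod 600 "$archive"
  echo "$archive"
}

# Example: backup_config /etc/besu /var/backups/besu
```

The function prints the archive path so a caller can copy it offsite or verify it with `tar -tzf`.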

---

### Performance Optimization

**Archive Node Performance**:

1. **Storage Performance**:
   - Use SSD for archive database (high read I/O)
   - Consider NVMe for high-performance requirements
   - Monitor I/O performance

2. **Memory Optimization**:
   - Higher heap size (8-12 GB) for archive nodes
   - Cache frequently accessed historical data
   - Monitor memory usage for historical queries

3. **Query Optimization**:
   - Index historical data appropriately
   - Monitor query performance
   - Optimize frequently used historical queries

---

## Archive vs. Pruned Nodes

### Full Archive (Current Configuration)

**Characteristics**:
- ✅ Complete blockchain history
- ✅ All historical state queries supported
- ✅ Unlimited historical access
- ⚠️ Higher storage requirements
- ⚠️ Higher memory requirements

**Use Case**: ✅ Sentry nodes (archival purposes)

---

### Pruned Nodes (Not Recommended for Sentries)

**Configuration**:
```toml
pruning-enabled=true
pruning-blocks-retained=1024  # Keep last 1024 blocks
```

**Characteristics**:
- ❌ Limited historical data
- ❌ Historical queries may fail
- ✅ Lower storage requirements
- ✅ Lower memory requirements

**Use Case**: Non-archive RPC nodes (if storage is a concern)

**Note**: **Do NOT enable pruning on sentry nodes** - they are archive nodes.

---

## Alternative: Pruning Configuration (If Storage Becomes an Issue)

**Only consider if storage is a critical constraint**:

```toml
# Enable pruning with high retention (NOT RECOMMENDED for full archive)
pruning-enabled=true
pruning-blocks-retained=100000  # Keep last 100,000 blocks (~2.3 days at 2s/block)
```

**Warning**: This reduces archive completeness. Prefer expanding storage instead.
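
The time window covered by a given `pruning-blocks-retained` value follows directly from the block time. The sketch below assumes a 2-second block time; both function names are illustrative.

```bash
#!/usr/bin/env bash
# Relate a block-retention count to a time window, given the block time.

# retention_days BLOCKS BLOCK_TIME_SECONDS
retention_days() {
  LC_ALL=C awk -v b="$1" -v t="$2" 'BEGIN { printf "%.1f\n", b * t / 86400 }'
}

# blocks_for_days DAYS BLOCK_TIME_SECONDS
blocks_for_days() {
  echo $(( $1 * 86400 / $2 ))
}

retention_days 100000 2   # prints 2.3
blocks_for_days 70 2      # prints 3024000
```

At this block time, a retention window measured in months requires a value in the millions of blocks, which is another reason to prefer expanding storage.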

---

## Monitoring Archive Nodes

### Key Metrics

1. **Sync Status**:
   - Fully synced (archive complete)
   - Syncing (catching up)
   - Lag (blocks behind)

2. **Storage Usage**:
   - Database size
   - Disk usage
   - Growth rate

3. **Query Performance**:
   - Historical query latency
   - Query success rate
   - Archive query volume

4. **Resource Usage**:
   - Memory usage (historical queries)
   - Disk I/O (read-heavy)
   - CPU usage (query processing)
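
Two of these metrics, sync lag and storage growth rate, reduce to simple arithmetic on values that are already available. The sketch below assumes the lag inputs are the hex `currentBlock` and `highestBlock` fields that `eth_syncing` reports while syncing, and that the growth inputs are two database size samples in bytes; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Derive sync lag and storage growth rate from raw measurements.

# sync_lag CURRENT_HEX HIGHEST_HEX: blocks behind the chain head.
sync_lag() {
  echo $(( $2 - $1 ))
}

# growth_gb_per_day OLD_BYTES NEW_BYTES DAYS_BETWEEN_SAMPLES (1 GB = 1e9 bytes)
growth_gb_per_day() {
  LC_ALL=C awk -v a="$1" -v b="$2" -v d="$3" \
    'BEGIN { printf "%.2f\n", (b - a) / d / 1e9 }'
}

sync_lag 0x1b4 0x1f4                            # prints 64
growth_gb_per_day 100000000000 107000000000 7   # prints 1.00
```

Size samples can be collected with `du -sb` on the database directory at a fixed interval and fed into `growth_gb_per_day`.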

---

## Archive Node Strategy

### Current Implementation

✅ **Sentry nodes = Full archive nodes**
- Complete blockchain history
- Detailed logs (INFO)
- Full sync mode
- No pruning

✅ **Validators = Non-archive**
- Minimal logs (WARN)
- Full sync (consensus requirement)
- Not archive nodes (no historical queries)

✅ **RPC nodes = Non-archive (most)**
- Minimal logs (WARN)
- Full sync (currently)
- Not archive nodes (API serving)

---

### Archive Node Distribution

**Current**:
- **Archive Nodes**: 4 sentries (VMIDs 1500-1503)
- **Non-Archive Nodes**: Validators + RPC nodes

**Recommendation**: ✅ Appropriate distribution
- Sentries handle archival
- Other nodes run lean
- Centralized archive management

---

## Storage Planning Example

### Example: 1 Year Archive Growth

**Assumptions**:
- Block time: 2 seconds
- Blocks per day: 43,200
- Blocks per year: ~15.7 million
- Block data: ~3 KB per block (average)
- State data: Variable (depends on contracts)

**Estimation**:
- Block data: 15.7M × 3 KB ≈ 47 GB/year
- State data: 50-200 GB/year (varies widely)
- **Total**: ~100-250 GB/year per archive node

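**Planning** (assuming the upper bound of ~250 GB/year):
- Initial: 1 TB allocation
- After year 1: ~750 GB remaining
- After year 2: ~500 GB remaining
- After year 3: ~250 GB remaining (75% used)
- **Action**: Plan expansion by year 3, before the 80% capacity threshold is reached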
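
The same planning arithmetic can be expressed as a function, so it can be re-run when the allocation or the observed growth rate changes. This is a sketch under the assumptions of this example (linear growth and an 80% expansion threshold); the function name is illustrative.

```bash
#!/usr/bin/env bash
# Years until a volume reaches the expansion threshold, assuming linear growth.

# years_to_threshold CAPACITY_GB GROWTH_GB_PER_YEAR [THRESHOLD_PCT]
years_to_threshold() {
  LC_ALL=C awk -v c="$1" -v g="$2" -v p="${3:-80}" \
    'BEGIN { printf "%.1f\n", c * p / 100 / g }'
}

years_to_threshold 1000 250   # upper-bound growth: prints 3.2
years_to_threshold 1000 100   # lower-bound growth: prints 8.0
```

With 1 TB allocated, the 80% threshold falls somewhere between roughly 3 and 8 years depending on state growth, so the measured growth rate should replace these estimates once real data is available.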

---

## Best Practices

### 1. Storage Monitoring
- Monitor disk usage weekly
- Set alerts at 80% capacity
- Plan expansion proactively

### 2. Archive Verification
- Verify archive queries work
- Test historical state access
- Confirm sync status regularly

### 3. Backup Strategy
- Regular database backups
- Test recovery procedures
- Offsite backup for disaster recovery

### 4. Performance Monitoring
- Monitor query performance
- Track storage growth
- Optimize if performance degrades

---

## Related Documentation

- `docs/04-configuration/BESU_CONFIGURATION_GUIDE.md` - Configuration reference
- `docs/04-configuration/BESU_PERFORMANCE_TUNING.md` - Performance tuning
- `docs/04-configuration/BESU_PATH_REFERENCE.md` - Path structure

---

## Summary

### Archive Node Status

✅ **Configuration Verified**:
- All sentry nodes configured as full archive
- `sync-mode="FULL"` ✅
- `logging="INFO"` ✅
- No pruning enabled ✅

✅ **Storage Planning**:
- Monitor growth regularly
- Plan expansion proactively
- Maintain backup strategy

✅ **Performance**:
- Appropriate memory allocation
- SSD recommended for archive database
- Monitor query performance

---

**Last Updated**: 2026-01-17  
**Status**: Archive Configuration Verified