# Reorg Handling Specification
## Overview
This document specifies how the indexer handles blockchain reorganizations (reorgs), where the canonical chain changes and previously indexed blocks are no longer part of it. Reorg handling ensures data consistency and maintains accurate blockchain state.
## Reorg Detection
### Detection Methods
**1. Block Hash Mismatch**
- Compare stored block hash with RPC node block hash
- If mismatch detected, reorg has occurred
**2. Parent Hash Validation**
- Verify each block's parent_hash matches previous block's hash
- Chain break indicates reorg point
**3. Block Height Comparison**
- Monitor chain head block number
- Sudden decrease indicates potential reorg
**4. Chain Head Monitoring**
- Poll `eth_blockNumber` periodically
- Compare with last indexed block number
- Detect rollback scenarios
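Hash-mismatch detection (method 1, applied over a validation window as in the periodic strategy below) can be sketched as follows. This is a minimal sketch: `stored_hashes` and `rpc_block_hash` are hypothetical stand-ins for the indexer's database and RPC client.

```python
def find_mismatch(stored_hashes, rpc_block_hash, head_number, depth=100):
    """Scan the last `depth` indexed blocks and compare each stored hash
    against the canonical chain. Return the lowest mismatching block
    number, or None if every checked hash still agrees with the chain."""
    mismatch = None
    for number in range(head_number, max(head_number - depth, -1), -1):
        stored = stored_hashes.get(number)
        if stored is not None and stored != rpc_block_hash(number):
            mismatch = number  # keep scanning: a deeper mismatch may exist
    return mismatch
```

Returning the lowest mismatching number (rather than stopping at the first) matters for deep reorgs: the fork point may be well below the head.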
### Detection Strategy
**Real-time Monitoring**:
- Check block hash after each new block ingestion
- Compare with RPC node's block hash for same block number
- Immediate detection for recent blocks
**Periodic Validation**:
- Validate last N blocks (e.g., 100 blocks) every M seconds
- Detect deep reorgs that may have been missed
- Background validation job
**Checkpoint Validation**:
- Validate checkpoint blocks (every 1000 blocks)
- Ensure checkpoint blocks still exist in canonical chain
- Detect deep reorgs quickly
## Reorg Handling Flow
```mermaid
flowchart TB
    Detect[Detect Reorg]
    Identify[Identify Reorg Point<br/>Find Common Ancestor]
    Mark[Mark Blocks as Orphaned]
    Delete[Delete Orphaned Data]
    Reindex[Re-index New Chain]
    Verify[Verify Data Consistency]
    Detect --> Identify
    Identify --> Mark
    Mark --> Delete
    Delete --> Reindex
    Reindex --> Verify
    Verify --> Done[Complete]
    Verify -->|Inconsistency| Delete
```
### Step 1: Identify Reorg Point
**Goal**: Find the common ancestor block (last block that's still valid).
**Algorithm**:
1. Start from current chain head
2. Compare block hash with stored hash at each block number
3. When hash matches, that's the common ancestor
4. All blocks after common ancestor need reorg handling
**Optimization**:
- Binary search to find reorg point quickly
- Start from recent blocks and work backwards
- Cache last N block hashes for faster comparison
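The binary-search optimization works because matches are prefix-monotone: every block at or below the fork point matches the canonical chain, and every block above it does not. A sketch (again with hypothetical `stored_hashes`/`rpc_block_hash` stand-ins):

```python
def find_common_ancestor(stored_hashes, rpc_block_hash, head_number):
    """Binary-search for the highest block number whose stored hash still
    matches the canonical chain. Returns -1 if even block 0 mismatches
    (which should be impossible for a correctly bootstrapped index)."""
    lo, hi = 0, head_number
    ancestor = -1
    while lo <= hi:
        mid = (lo + hi) // 2
        if stored_hashes[mid] == rpc_block_hash(mid):
            ancestor = mid   # mid is still canonical; fork point is above
            lo = mid + 1
        else:
            hi = mid - 1     # mid is orphaned; fork point is below
    return ancestor
```

This costs O(log n) RPC calls instead of one per block, which matters for deep reorgs.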
### Step 2: Mark Blocks as Orphaned
**Strategy**: Mark blocks as orphaned before deletion (for audit trail).
**Database Update**:
```sql
UPDATE blocks
SET orphaned = true, orphaned_at = NOW()
WHERE chain_id = ?
AND block_number > ?
AND orphaned = false;
```
**Benefits**:
- Audit trail of reorgs
- Ability to query orphaned blocks
- Easier debugging
### Step 3: Delete Orphaned Data
**Cascade Deletion Order**:
1. Token transfers (depends on transactions)
2. Logs (depends on transactions)
3. Traces (depends on transactions)
4. Internal transactions (depends on transactions)
5. Transactions (depends on blocks)
6. Blocks (orphaned blocks)
**Implementation**:
- Use database transactions for atomicity
- Cascade deletes via foreign keys (if enabled)
- Or explicit deletion in correct order
**Performance Considerations**:
- Batch deletions for large reorgs
- Index on `block_number` for efficient deletion
- Consider soft deletes (mark as deleted) vs hard deletes
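The explicit-order deletion can be sketched as a generator of parameterized statements, run child-first inside a single database transaction. Table names are hypothetical, and the sketch assumes each child table carries a denormalized `chain_id`/`block_number` pair:

```python
# Child tables first, so no DELETE ever violates a foreign key.
DELETION_ORDER = [
    "token_transfers",
    "logs",
    "traces",
    "internal_transactions",
    "transactions",
    "blocks",
]

def orphan_delete_statements(chain_id, reorg_point):
    """Yield (sql, params) pairs deleting all data above the reorg point,
    in dependency order. The caller executes them in one transaction."""
    for table in DELETION_ORDER:
        yield (
            f"DELETE FROM {table} WHERE chain_id = %s AND block_number > %s",
            (chain_id, reorg_point),
        )
```

With `ON DELETE CASCADE` foreign keys, only the final `blocks` delete would be needed; the explicit order is the portable fallback.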
### Step 4: Re-index New Chain
**Process**:
1. Fetch new blocks from RPC node (from reorg point onward)
2. Process blocks through normal indexing pipeline
3. Verify each block before marking as indexed
**Optimization**:
- Parallel processing for multiple blocks (if safe)
- Use existing indexer workers
- Priority queue: reorg blocks before new blocks
### Step 5: Verify Data Consistency
**Validation Checks**:
- Block hashes match RPC node
- Transaction counts match
- Parent hashes form valid chain
- No gaps in block numbers
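The parent-hash and gap checks reduce to a single pass over the re-indexed range. A minimal sketch, representing blocks as `(number, hash, parent_hash)` tuples sorted by number:

```python
def verify_chain(blocks):
    """Return True if consecutive blocks have no number gaps and each
    block's parent_hash equals the previous block's hash."""
    for prev, cur in zip(blocks, blocks[1:]):
        if cur[0] != prev[0] + 1:
            return False  # gap in block numbers
        if cur[2] != prev[1]:
            return False  # broken parent link: chain is inconsistent
    return True
```

Run this over the reorged range (from the common ancestor to the new head) before marking the reorg as handled.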
**Metrics**:
- Reorg depth (number of blocks affected)
- Reorg duration (time to handle)
- Data consistency verification
## Data Consistency Guarantees
### Transaction Isolation
**Strategy**: Use database transactions to ensure atomic reorg handling.
**Implementation**:
```sql
BEGIN;
-- Mark blocks as orphaned
-- Delete orphaned data
-- Insert new blocks
-- Verify consistency
COMMIT;
```
**Rollback**: If any step fails, rollback entire operation.
### Idempotency
**Requirement**: Reorg handling must be idempotent (safe to retry).
**Mechanisms**:
- Check block hash before processing
- Skip blocks already correctly indexed
- Use unique constraints to prevent duplicates
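The check-before-process mechanism can be sketched as a small wrapper around the indexing step; `stored_hashes` and `index_block` are hypothetical stand-ins for the database and the normal indexing pipeline:

```python
def process_block(stored_hashes, number, canonical_hash, index_block):
    """Idempotent ingestion: skip a block that is already indexed with the
    correct hash, so retrying a partially handled reorg does no extra work.
    Returns True if the block was (re-)indexed, False if skipped."""
    if stored_hashes.get(number) == canonical_hash:
        return False  # already correct: safe no-op on retry
    index_block(number)
    stored_hashes[number] = canonical_hash
    return True
```

Combined with unique constraints on `(chain_id, block_number)`, a crashed or cancelled reorg handler can simply be re-run from the common ancestor.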
### Finality Considerations
**Reorg Depth Limits**:
- Only handle reorgs within finality window
- For PoW: Typically 12-100 blocks deep
- For PoS: Typically 1-2 epochs (32-64 blocks)
- For BFT: Typically immediate finality
**Configuration**:
- Configurable reorg depth limit per chain
- Skip reorgs beyond finality window (log for investigation)
## Performance Optimization
### Handling Frequent Reorgs
**Problem**: Some chains have frequent small reorgs (1-2 blocks).
**Solution**:
- Batch reorg handling (wait for stability)
- Detect reorgs but delay handling for short period
- Only handle if reorg persists
**Configuration**:
- Reorg confirmation period: 30 seconds
- Maximum reorg depth to handle immediately: 5 blocks
- Deeper reorgs: Manual investigation
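The confirmation-period behavior can be sketched as a polling loop that only reports a reorg if the hash mismatch survives the whole period. The clock and sleep are injectable here purely so the sketch is testable:

```python
import time

def confirm_reorg(rpc_block_hash, stored_hash, number,
                  confirmation_period=30, poll_interval=5,
                  sleep=time.sleep, now=time.monotonic):
    """Return True only if the stored hash still disagrees with the
    canonical chain after the confirmation period; transient flapping
    (the chain flipping back within the window) returns False."""
    deadline = now() + confirmation_period
    while now() < deadline:
        if rpc_block_hash(number) == stored_hash:
            return False  # chain flipped back: nothing to handle
        sleep(poll_interval)
    return rpc_block_hash(number) != stored_hash
```

Shallow reorgs (up to the configured immediate-depth limit) can bypass this delay and be handled at once.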
### Handling Deep Reorgs
**Problem**: Deep reorgs require deleting and re-indexing many blocks.
**Optimization Strategies**:
1. **Parallel Processing**: Process new chain blocks in parallel
2. **Batch Operations**: Batch database operations
3. **Incremental Updates**: Only update changed data
4. **Selective Deletion**: Only delete affected data (not entire blocks if possible)
### Index Maintenance
**During Reorg**:
- Pause index updates temporarily
- Resume after reorg handling complete
- Rebuild affected indexes if necessary
## Monitoring and Alerting
### Metrics
**Reorg Metrics**:
- Reorg count (per chain, per time period)
- Reorg depth distribution
- Reorg handling duration (p50, p95, p99)
- Data consistency check results
**Alerting Rules**:
- Reorg depth > 10 blocks: Warning (investigate)
- Reorg depth > 100 blocks: Critical (potential chain issue)
- Reorg handling duration > 5 minutes: Warning
- Data consistency check failure: Critical
### Logging
**Log Events**:
- Reorg detection (block number, depth)
- Reorg point identification (common ancestor)
- Blocks orphaned (count, block numbers)
- Re-indexing progress
- Data consistency verification results
**Log Levels**:
- INFO: Normal reorgs (< 5 blocks)
- WARN: Unusual reorgs (5-10 blocks)
- ERROR: Deep reorgs (> 10 blocks) or failures
## Edge Cases
### Multiple Reorgs in Quick Succession
**Scenario**: Chain reorgs, then reorgs again before first reorg is handled.
**Handling**:
- Cancel in-progress reorg handling
- Start new reorg handling from latest state
- Ensure idempotency
### Reorg During Backfill
**Scenario**: Historical block indexing encounters a reorg.
**Handling**:
- Pause backfill
- Handle reorg
- Resume backfill from reorg point
### Concurrent Reorg Handling
**Prevention**:
- Use database locks to prevent concurrent reorg handling
- Single reorg handler per chain
- Queue reorg events if handler is busy
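For a single process, per-chain mutual exclusion is a non-blocking lock acquire; across processes the same shape would use a database lock (e.g. a PostgreSQL advisory lock keyed by chain ID). An in-process sketch:

```python
import threading
from collections import defaultdict

# One lock per chain: at most one reorg handler runs per chain at a time.
_chain_locks = defaultdict(threading.Lock)

def handle_reorg_exclusively(chain_id, handler):
    """Run `handler` only if no other reorg handling is in progress for
    this chain. Returns False if the lock is held, in which case the
    caller should re-queue the reorg event."""
    lock = _chain_locks[chain_id]
    if not lock.acquire(blocking=False):
        return False
    try:
        handler()
        return True
    finally:
        lock.release()
```

Returning `False` instead of blocking lets the caller queue the event, matching the queue-if-busy rule above.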
## Recovery Procedures
### Manual Reorg Handling
**When to Use**:
- Automatic handling fails
- Deep reorgs beyond normal limits
- Data corruption detected
**Procedure**:
1. Identify reorg point manually
2. Verify with RPC node
3. Mark blocks as orphaned
4. Delete orphaned data
5. Trigger re-indexing
6. Verify data consistency
### Data Recovery
**Backup Strategy**:
- Regular database backups
- Point-in-time recovery capability
- Ability to restore to pre-reorg state
**Recovery Steps**:
1. Restore database to point before reorg
2. Re-run indexer from that point
3. Let automatic reorg handling process naturally
## Testing Strategy
### Unit Tests
- Reorg detection logic
- Common ancestor identification
- Orphan marking
- Data deletion logic
### Integration Tests
- Simulate reorgs (testnet with known reorgs)
- Verify data consistency after reorg
- Test concurrent reorg handling
- Test deep reorgs
### Load Tests
- Simulate frequent reorgs
- Measure performance impact
- Test reorg handling during high load
## Configuration
### Reorg Handling Configuration
```yaml
reorg:
  detection:
    check_interval_seconds: 10
    validation_depth: 100
    checkpoint_interval: 1000
  handling:
    max_depth: 100
    confirmation_period_seconds: 30
    batch_size: 1000
    parallel_processing: true
  finality:
    chain_138:
      type: "poa" # or "pos", "pow", "bft"
      depth_limit: 50
      finality_blocks: 12
```
## References
- Indexer Architecture: See `indexer-architecture.md`
- Data Models: See `data-models.md`
- Database Schema: See `../database/postgres-schema.md`