# Indexer Architecture Specification

## Overview

This document specifies the architecture for the blockchain indexing pipeline that ingests, processes, and stores blockchain data from ChainID 138 and other supported chains. The indexer is responsible for maintaining a complete, queryable database of blocks, transactions, logs, traces, and token transfers.

## Architecture
```mermaid
flowchart TB
    subgraph Input[Input Layer]
        Node[RPC Node<br/>ChainID 138]
        WS[WebSocket<br/>New Block Events]
    end

    subgraph Ingest[Ingestion Layer]
        BL[Block Listener<br/>Real-time]
        BW[Backfill Worker<br/>Historical]
        Q[Message Queue<br/>Kafka/RabbitMQ]
    end

    subgraph Process[Processing Layer]
        BP[Block Processor]
        TP[Transaction Processor]
        LP[Log Processor]
        TrP[Trace Processor]
        TokenP[Token Transfer Processor]
    end

    subgraph Decode[Decoding Layer]
        ABI[ABI Registry]
        SigDB[Signature Database]
        Decoder[Event Decoder]
    end

    subgraph Persist[Persistence Layer]
        PG[(PostgreSQL<br/>Canonical Data)]
        ES[(Elasticsearch<br/>Search Index)]
        TS[(TimescaleDB<br/>Metrics)]
    end

    subgraph Materialize[Materialization Layer]
        Agg[Aggregator<br/>TPS, Gas Stats]
        Cache[Cache Layer<br/>Redis]
    end

    Node --> BL
    Node --> BW
    WS --> BL

    BL --> Q
    BW --> Q

    Q --> BP
    BP --> TP
    BP --> LP
    BP --> TrP

    TP --> TokenP
    LP --> Decoder
    Decoder --> ABI
    Decoder --> SigDB

    BP --> PG
    TP --> PG
    LP --> PG
    TrP --> PG
    TokenP --> PG

    BP --> ES
    TP --> ES
    LP --> ES

    BP --> TS
    TP --> TS

    PG --> Agg
    Agg --> Cache
```

## Block Ingestion Pipeline

### Block Listener (Real-time)

**Purpose**: Monitor the blockchain for new blocks and ingest them immediately.

**Implementation**:
- Subscribe to `newHeads` via WebSocket
- Poll `eth_blockNumber` as a fallback (every 2 seconds)
- Handle WebSocket reconnection automatically

**Flow**:
1. Receive block header event
2. Fetch full block data via `eth_getBlockByNumber`
3. Enqueue block to processing queue
4. Acknowledge receipt

**Error Handling**:
- Retry on network errors (exponential backoff)
- Handle reorgs (see reorg handling section)
- Log errors for monitoring
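
The listener's control flow can be sketched as below. This is a minimal illustration with injected I/O: `fetch_block` and `enqueue` are hypothetical callables standing in for the RPC client and queue producer, and the dedupe guard handles headers replayed by WebSocket reconnects or the polling fallback.

```python
import time

class BlockListener:
    """Sketch of the real-time listener: receive a new-head event,
    fetch the full block, enqueue it. `fetch_block` and `enqueue`
    are illustrative stand-ins, not a real client API."""

    def __init__(self, fetch_block, enqueue, max_retries=10):
        self.fetch_block = fetch_block   # e.g. wraps eth_getBlockByNumber
        self.enqueue = enqueue           # e.g. publishes to the `blocks` topic
        self.max_retries = max_retries
        self.last_seen = -1              # dedupe guard for replayed headers

    def on_new_head(self, header):
        number = header["number"]
        if number <= self.last_seen:     # already ingested (WS replay / poll overlap)
            return False
        delay = 1.0
        for _ in range(self.max_retries):
            try:
                block = self.fetch_block(number)   # full block incl. transactions
                self.enqueue(block)
                self.last_seen = number
                return True
            except ConnectionError:
                time.sleep(delay)        # transient error: back off and retry
                delay = min(delay * 2, 300)
        raise RuntimeError(f"giving up on block {number}")
```

Keeping the I/O behind callables makes the retry and dedupe logic unit-testable without a live node.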

### Backfill Worker (Historical)

**Purpose**: Index historical blocks from genesis or a specific starting point.

**Implementation**:
- Parallel workers for faster indexing
- Configurable batch size (e.g., 100 blocks per batch)
- Rate limiting to avoid overloading the RPC node
- Checkpoint system for resuming interrupted backfills

**Flow**:
1. Determine starting block (checkpoint or genesis)
2. Fetch batch of blocks
3. Enqueue each block to processing queue
4. Update checkpoint
5. Repeat until caught up with chain head

**Optimization Strategies**:
- Parallel workers process different block ranges
- Skip blocks already indexed (idempotent processing)
- Batch RPC requests where possible
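
The batch-and-checkpoint loop above can be sketched as follows; `fetch_batch`, `enqueue`, and `save_checkpoint` are hypothetical callables for the batched RPC request, the queue producer, and checkpoint storage.

```python
def backfill(start, head, fetch_batch, enqueue, save_checkpoint, batch_size=100):
    """Backfill sketch: walk [start, head] in fixed-size batches,
    enqueue every block, and persist a checkpoint after each batch
    so an interrupted run resumes where it left off."""
    current = start
    while current <= head:
        end = min(current + batch_size - 1, head)
        for block in fetch_batch(current, end):   # one batched RPC round trip
            enqueue(block)
        save_checkpoint(end)                      # resume point on restart
        current = end + 1
```

Parallelism follows naturally: give each worker a disjoint `(start, head)` range and its own checkpoint key.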

### Message Queue

**Purpose**: Decouple ingestion from processing, enable scaling, and ensure durability.

**Technology**: Kafka or RabbitMQ

**Topics/Queues**:
- `blocks`: New blocks to process
- `transactions`: Transactions to decode
- `traces`: Traces to process (async)

**Configuration**:
- Durability: Persistent storage
- Replication: 3 replicas for high availability
- Partitioning: By chain_id and block number (for ordering)
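
One way to realize the per-chain ordering guarantee is to key partitions on `chain_id`, so every block of a given chain lands on the same partition in order while different chains spread across partitions. The sketch below shows a Kafka-style keyed partitioner; the hashing scheme is illustrative, not any specific client's.

```python
import hashlib

def partition_for(chain_id, num_partitions):
    """Map a chain_id to a stable partition index. Stable hashing
    keeps one chain's messages on one partition (ordered), while
    distributing chains across the available partitions."""
    digest = hashlib.sha256(str(chain_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```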

## Transaction Processing Flow

### Block Processing

**Steps**:
1. **Validate Block**: Verify block hash, parent hash, and block number
2. **Extract Transactions**: Get the transaction list from the block
3. **Fetch Receipts**: Get transaction receipts for all transactions
4. **Process Each Transaction**:
   - Store transaction data
   - Process receipt (logs, status)
   - Extract token transfers (ERC-20/721/1155)
   - Link to contract interactions

**Data Extracted**:
- Transaction fields (hash, from, to, value, gas, etc.)
- Receipt fields (status, gasUsed, logs, etc.)
- Contract creation detection
- Token transfer events

### Transaction Decoding

**Purpose**: Decode event logs and transaction data using ABIs.

**Process**:
1. Identify the contract address (`to` field or created address)
2. Look up the ABI in the registry (verified contracts)
3. Decode function calls and events
4. Store decoded data for search and filtering

**Fallback Strategies**:
- Signature database for unknown functions/events (4-byte signatures)
- Heuristic detection for common patterns (Transfer events)
- Store raw data when decoding fails

### ABI Registry

**Purpose**: Store contract ABIs for decoding transactions and events.

**Data Sources**:
- Contract verification submissions
- Sourcify integration
- Public ABI repositories (4byte.directory, etc.)

**Storage**:
- Database table: `contract_abis`
- Cache layer: Redis for frequently accessed ABIs
- Versioning: Support multiple ABI versions per contract

**Schema**:
```sql
CREATE TABLE contract_abis (
    id UUID PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    verified BOOLEAN DEFAULT false,
    source VARCHAR(50), -- 'verification', 'sourcify', 'public'
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    UNIQUE (chain_id, address)
);
```

### Signature Database

**Purpose**: Map 4-byte function signatures and 32-byte event signatures to function/event names.

**Data Sources**:
- Public signature databases (4byte.directory)
- User submissions
- Automatic extraction from verified contracts

**Usage**:
- Look up a function name from its signature (e.g., `0x095ea7b3` → `approve(address,uint256)`)
- Look up an event name from topic[0] (e.g., `0xddf252...` → `Transfer(address,address,uint256)`)
- Partial decoding when the full ABI is unavailable
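
The selector lookup itself is trivial once the database is populated; the sketch below uses a plain dict in place of the real signature store. The `approve` mapping is the well-known selector from the example above; returning `None` lets the caller fall back to storing raw data.

```python
# Seed data as it might arrive from a public signature database;
# 0x095ea7b3 -> approve(address,uint256) is a real, well-known selector.
FUNCTION_SIGS = {
    "0x095ea7b3": "approve(address,uint256)",
}

def lookup_function(calldata_hex, sig_db=FUNCTION_SIGS):
    """Partial-decoding fallback: map the first 4 bytes of calldata
    to a readable signature, or None when the selector is unknown."""
    selector = calldata_hex[:10]   # '0x' + 8 hex chars = 4 bytes
    return sig_db.get(selector)
```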

## Event Log Indexing

### Log Processing

**Purpose**: Index event logs for efficient querying and filtering.

**Process**:
1. Extract logs from transaction receipts
2. Decode log topics and data using the ABI
3. Index by:
   - Contract address
   - Event signature (topic[0])
   - Indexed parameters (topic[1..3])
   - Block number and transaction hash
   - Log index

**Indexing Strategy**:
- PostgreSQL table: `logs` with indexes on (address, topic0, block_number)
- Elasticsearch index: Full-text search on decoded event data
- Time-series: Aggregate log counts per contract/event

### Event Decoding

**Decoding Flow**:
1. Identify the event signature from topic[0]
2. Look up the event definition in the ABI registry
3. Decode indexed parameters (topics 1–3)
4. Decode non-indexed parameters (data field)
5. Store decoded parameters as JSONB
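
For the common ERC-20 Transfer case, the flow above reduces to pure hex handling: the two indexed addresses arrive as left-padded 32-byte topics and the value sits in the data field. A minimal sketch, assuming `log` is a plain dict mirroring an `eth_getLogs` entry:

```python
# keccak256("Transfer(address,address,uint256)") — the canonical Transfer topic0
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_erc20_transfer(log):
    """Decode a raw ERC-20 Transfer log into from/to/value, or return
    None when topic0 does not match (caller falls back to raw storage)."""
    if log["topics"][0] != TRANSFER_TOPIC:
        return None
    frm = "0x" + log["topics"][1][-40:]   # address = last 20 bytes of padded topic
    to = "0x" + log["topics"][2][-40:]
    value = int(log["data"], 16)          # non-indexed uint256 in the data field
    return {"from": frm, "to": to, "value": value}
```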

**Common Events to Index**:
- ERC-20: `Transfer(address,address,uint256)`
- ERC-721: `Transfer(address,address,uint256)` (same signature string, but `tokenId` is indexed)
- ERC-1155: `TransferSingle`, `TransferBatch`
- Approval events: `Approval(address,address,uint256)`

## Trace Processing

### Call Trace Extraction

**Purpose**: Extract detailed call traces for transaction debugging and internal transaction tracking.

**Trace Types**:
- `call`: Contract calls
- `create`: Contract creation
- `suicide`: Contract self-destruct
- `delegatecall`: Delegate calls

**Process**:
1. Request trace via `trace_transaction` or `trace_block`
2. Parse trace result structure
3. Extract:
   - Call hierarchy (parent-child relationships)
   - Internal transactions (value transfers)
   - Gas usage per call
   - Revert information
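
Flattening the nested call tree into rows with `trace_address` paths can be sketched as below. The input shape here is an assumed minimal form (`type`, `from`, `to`, `value`, `calls`); real trace APIs return richer structures, but the hierarchy walk is the same.

```python
def flatten_trace(call, trace_address=None):
    """Flatten a nested call tree into a list of rows, each tagged with
    the trace_address path identifying its position in the hierarchy
    (root = [], first child = [0], its first child = [0, 0], ...)."""
    trace_address = trace_address if trace_address is not None else []
    rows = [{
        "trace_address": trace_address,
        "type": call["type"],
        "from": call["from"],
        "to": call["to"],
        "value": call.get("value", 0),
    }]
    for i, sub in enumerate(call.get("calls", [])):
        rows.extend(flatten_trace(sub, trace_address + [i]))  # child path = parent path + index
    return rows
```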

### Internal Transaction Tracking

**Purpose**: Track value transfers that occur inside transactions (not just top-level).

**Data Extracted**:
- From address (caller)
- To address (callee)
- Value transferred
- Call type (call, delegatecall, etc.)
- Success/failure status
- Gas used

**Storage**:
- Separate table: `internal_transactions`
- Link to the parent transaction via `transaction_hash`
- Link to the parent call via the `trace_address` array

## Token Transfer Extraction

### ERC-20 Transfer Detection

**Detection Method**:
1. Look for the `Transfer(address,address,uint256)` event
2. Decode event parameters (from, to, value)
3. Store in the `token_transfers` table
4. Update token holder balances

**Data Stored**:
- Token contract address
- From address
- To address
- Amount (with decimals)
- Block number
- Transaction hash
- Log index

### ERC-721 Transfer Detection

**Similar to ERC-20, but**:
- The token ID is tracked (unique NFT)
- Transfers can be from the zero address (mint) or to the zero address (burn)

### ERC-1155 Transfer Detection

**Events**:
- `TransferSingle`: Single token transfer
- `TransferBatch`: Batch token transfer

**Challenges**:
- Multiple token IDs and amounts per transfer
- Batch operations require array decoding
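
The array decoding that makes `TransferBatch` awkward follows the standard ABI layout for dynamic arrays: the data field carries two 32-byte offsets (for `ids[]` and `values[]`), and each array is a length word followed by its elements. A sketch of just that decoding step:

```python
def decode_transfer_batch_data(data_hex):
    """Decode the ABI-encoded `data` of a TransferBatch event into
    (id, value) pairs. Layout: word 0 and word 1 are byte offsets of
    ids[] and values[]; at each offset sits a length word, then the
    uint256 elements."""
    raw = bytes.fromhex(data_hex.removeprefix("0x"))
    word = lambda i: int.from_bytes(raw[i * 32:(i + 1) * 32], "big")

    def read_array(byte_offset):
        start = byte_offset // 32
        length = word(start)                        # first word at the offset
        return [word(start + 1 + k) for k in range(length)]

    ids = read_array(word(0))
    values = read_array(word(1))
    return list(zip(ids, values))
```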

### Token Holder Tracking

**Purpose**: Maintain the list of addresses holding each token.

**Strategy**:
- Real-time updates: Update on each transfer
- Periodic reconciliation: Verify balances via RPC
- Balance snapshots: Store balance at each block (for historical queries)

## Indexer Worker Scaling and Partitioning

### Horizontal Scaling

**Strategy**: Multiple indexer workers processing different blocks/chains.

**Partitioning Methods**:
1. **By Chain**: Each worker handles one chain
2. **By Block Range**: Workers split block ranges (for backfill)
3. **By Processing Stage**: Separate workers for blocks, traces, and token transfers

### Worker Coordination

**Mechanisms**:
- Message queue: Workers consume from a shared queue
- Database locks: Prevent duplicate processing
- Leader election: For single-worker tasks (reorg handling)

### Load Balancing

**Distribution**:
- Round-robin for backfill workers
- Sticky sessions for chain-specific workers
- Priority queuing: Real-time blocks before historical blocks

### Performance Targets

**Throughput**:
- Process 100 blocks/minute per worker
- Process 1,000 transactions/minute per worker
- Process 100 traces/minute per worker (trace operations are slower)

**Latency**:
- Real-time blocks: Indexed within 5 seconds of block production
- Historical blocks: Catch up to chain head within a reasonable time

## Data Consistency

### Transaction Isolation

**Strategy**: Process blocks atomically (all or nothing).

**Implementation**:
- Database transactions for block-level operations
- Idempotent processing (can safely retry)
- Checkpoint system to track the last processed block

### Idempotency

**Requirements**:
- Processing the same block multiple times must not create duplicates
- Use unique constraints in the database
- Use upsert operations where applicable
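
The unique-constraint-plus-upsert pattern can be sketched as below. The example uses SQLite (stdlib) so it is self-contained; in production the target is PostgreSQL, where `INSERT ... ON CONFLICT` works the same way, and the table here is a deliberately minimal stand-in for the real schema.

```python
import sqlite3

# In-memory stand-in for the canonical store; the unique constraint is
# what makes reprocessing a block safe.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE transactions (
        chain_id INTEGER NOT NULL,
        hash TEXT NOT NULL,
        block_number INTEGER NOT NULL,
        UNIQUE (chain_id, hash)
    )
""")

def store_transaction(chain_id, tx_hash, block_number):
    # Upsert: a retried block is a no-op, a reorg re-import updates in place
    db.execute(
        """INSERT INTO transactions (chain_id, hash, block_number)
           VALUES (?, ?, ?)
           ON CONFLICT (chain_id, hash)
           DO UPDATE SET block_number = excluded.block_number""",
        (chain_id, tx_hash, block_number),
    )
```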

## Error Handling and Retry Logic

### Error Types

1. **Transient Errors**: Network issues, temporary RPC failures
   - Retry with exponential backoff
   - Max retries: 10
   - Max backoff: 5 minutes

2. **Permanent Errors**: Invalid data, unsupported features
   - Log the error and skip
   - Alert for investigation

3. **Reorg Errors**: Block replaced by a different block
   - Handle via reorg detection (see the reorg handling spec)

### Retry Strategy

**Exponential Backoff**:
- Initial delay: 1 second
- Multiplier: 2x
- Max delay: 5 minutes
- Jitter: Random ±20% to avoid a thundering herd
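
The backoff parameters above translate directly into a delay function:

```python
import random

def backoff_delay(attempt, base=1.0, multiplier=2.0, max_delay=300.0, jitter=0.2):
    """Delay before retry `attempt` (0-based): exponential growth from
    1 s, capped at 5 minutes, with ±20% random jitter so many workers
    retrying at once don't stampede the RPC node."""
    delay = min(base * multiplier ** attempt, max_delay)
    return delay * random.uniform(1 - jitter, 1 + jitter)
```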

## Monitoring and Observability

### Key Metrics

**Throughput**:
- Blocks processed per minute
- Transactions processed per minute
- Logs indexed per minute

**Latency**:
- Time from block production to index completion
- Time to process a block (p50, p95, p99)

**Lag**:
- Block height lag (current block - last indexed block)
- Time lag (current time - last indexed block time)

**Errors**:
- Error rate by type
- Retry count
- Failed blocks

### Alerting Rules

- Block lag > 10 blocks: Warning
- Block lag > 100 blocks: Critical
- Error rate > 1%: Warning
- Error rate > 5%: Critical
- Worker down: Critical
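
The rules above map directly onto threshold checks; a sketch of the evaluation logic, independent of any alerting backend:

```python
def evaluate_alerts(block_lag, error_rate, worker_down):
    """Evaluate the alerting rules against current metrics, returning
    (severity, message) pairs; the higher threshold wins per metric."""
    alerts = []
    if block_lag > 100:
        alerts.append(("critical", f"block lag {block_lag} > 100"))
    elif block_lag > 10:
        alerts.append(("warning", f"block lag {block_lag} > 10"))
    if error_rate > 0.05:
        alerts.append(("critical", f"error rate {error_rate:.1%} > 5%"))
    elif error_rate > 0.01:
        alerts.append(("warning", f"error rate {error_rate:.1%} > 1%"))
    if worker_down:
        alerts.append(("critical", "worker down"))
    return alerts
```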

## Integration Points

### RPC Node Integration

- See `../infrastructure/node-rpc-architecture.md`
- Connection pooling
- Rate limiting awareness
- Failover handling

### Database Integration

- See `../database/postgres-schema.md`
- Connection pooling
- Batch inserts for performance
- Transaction management

### Search Integration

- See `../database/search-index-schema.md`
- Async indexing to Elasticsearch
- Bulk indexing for efficiency

## Implementation Guidelines

### Technology Stack

**Recommended**:
- **Language**: Go, Rust, or Python (performance considerations)
- **Queue**: Kafka (high throughput) or RabbitMQ (simpler setup)
- **Database**: PostgreSQL with connection pooling
- **Caching**: Redis for frequently accessed data

### Code Structure
```
indexer/
├── cmd/
│   ├── block-listener/    # Real-time block listener
│   ├── backfill-worker/   # Historical indexing worker
│   └── processor/         # Block/transaction processor
├── internal/
│   ├── ingestion/         # Ingestion logic
│   ├── processing/        # Processing logic
│   ├── decoding/          # ABI/signature decoding
│   └── persistence/       # Database operations
└── pkg/
    ├── abi/               # ABI registry
    └── rpc/               # RPC client
```

### Testing Strategy

**Unit Tests**:
- Decoding logic
- Data transformation
- Error handling

**Integration Tests**:
- End-to-end block processing
- Database operations
- Queue integration

**Load Tests**:
- Process historical blocks
- Simulate high block production rate
- Test worker scaling

## References

- Data Models: See `data-models.md`
- Reorg Handling: See `reorg-handling.md`
- Database Schema: See `../database/postgres-schema.md`
- RPC Architecture: See `../infrastructure/node-rpc-architecture.md`