Indexer Architecture Specification
Overview
This document specifies the architecture for the blockchain indexing pipeline that ingests, processes, and stores blockchain data from ChainID 138 and other supported chains. The indexer is responsible for maintaining a complete, queryable database of blocks, transactions, logs, traces, and token transfers.
Architecture
```mermaid
flowchart TB
    subgraph Input[Input Layer]
        Node[RPC Node<br/>ChainID 138]
        WS[WebSocket<br/>New Block Events]
    end

    subgraph Ingest[Ingestion Layer]
        BL[Block Listener<br/>Real-time]
        BW[Backfill Worker<br/>Historical]
        Q[Message Queue<br/>Kafka/RabbitMQ]
    end

    subgraph Process[Processing Layer]
        BP[Block Processor]
        TP[Transaction Processor]
        LP[Log Processor]
        TrP[Trace Processor]
        TokenP[Token Transfer Processor]
    end

    subgraph Decode[Decoding Layer]
        ABI[ABI Registry]
        SigDB[Signature Database]
        Decoder[Event Decoder]
    end

    subgraph Persist[Persistence Layer]
        PG[(PostgreSQL<br/>Canonical Data)]
        ES[(Elasticsearch<br/>Search Index)]
        TS[(TimescaleDB<br/>Metrics)]
    end

    subgraph Materialize[Materialization Layer]
        Agg[Aggregator<br/>TPS, Gas Stats]
        Cache[Cache Layer<br/>Redis]
    end

    Node --> BL
    Node --> BW
    WS --> BL
    BL --> Q
    BW --> Q
    Q --> BP
    BP --> TP
    BP --> LP
    BP --> TrP
    TP --> TokenP
    LP --> Decoder
    Decoder --> ABI
    Decoder --> SigDB
    BP --> PG
    TP --> PG
    LP --> PG
    TrP --> PG
    TokenP --> PG
    BP --> ES
    TP --> ES
    LP --> ES
    BP --> TS
    TP --> TS
    PG --> Agg
    Agg --> Cache
```
Block Ingestion Pipeline
Block Listener (Real-time)
Purpose: Monitor blockchain for new blocks and ingest them immediately.
Implementation:
- Subscribe to `newHeads` via WebSocket
- Poll `eth_blockNumber` as fallback (every 2 seconds)
- Handle WebSocket reconnection automatically
Flow:
- Receive block header event
- Fetch full block data via `eth_getBlockByNumber`
- Enqueue block to processing queue
- Acknowledge receipt
Error Handling:
- Retry on network errors (exponential backoff)
- Handle reorgs (see reorg handling section)
- Log errors for monitoring
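The listen-and-fill loop above can be sketched as follows; `fetch_head` is a hypothetical stand-in for an `eth_blockNumber` call, and the gap-filling inner loop covers blocks that arrive between polls (`max_polls` exists only to make the loop finite for testing):

```python
import time
from typing import Callable, Iterator, Optional

def follow_chain(fetch_head: Callable[[], int],
                 last_indexed: int,
                 poll_interval: float = 2.0,
                 max_polls: Optional[int] = None) -> Iterator[int]:
    """Yield every new block height exactly once, in order.

    Gaps are filled so no height is skipped even if several blocks
    are produced between two polls.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        head = fetch_head()
        while last_indexed < head:
            last_indexed += 1
            yield last_indexed  # enqueue this height for processing
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_interval)
```

A real listener would prefer the WebSocket `newHeads` stream and use this poll loop only as the fallback path; both feed the same per-height pipeline.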
Backfill Worker (Historical)
Purpose: Index historical blocks from genesis or a specific starting point.
Implementation:
- Parallel workers for faster indexing
- Configurable batch size (e.g., 100 blocks per batch)
- Rate limiting to avoid overloading RPC node
- Checkpoint system for resuming interrupted backfills
Flow:
- Determine starting block (checkpoint or genesis)
- Fetch batch of blocks
- Enqueue each block to processing queue
- Update checkpoint
- Repeat until caught up with chain head
Optimization Strategies:
- Parallel workers process different block ranges
- Skip blocks already indexed (idempotent processing)
- Batch RPC requests where possible
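A minimal sketch of the checkpointed batch loop, with `process_batch` and `save_checkpoint` as hypothetical callbacks:

```python
def batch_ranges(start: int, end: int, batch_size: int):
    """Yield inclusive (lo, hi) block ranges covering [start, end]."""
    lo = start
    while lo <= end:
        hi = min(lo + batch_size - 1, end)
        yield lo, hi
        lo = hi + 1

def backfill(process_batch, save_checkpoint, start: int, head: int,
             batch_size: int = 100) -> None:
    """Process each batch, then persist the checkpoint so an
    interrupted run resumes from the last completed batch."""
    for lo, hi in batch_ranges(start, head, batch_size):
        process_batch(lo, hi)   # fetch blocks lo..hi and enqueue them
        save_checkpoint(hi)     # resume point after a crash
```

Parallelism then falls out naturally: hand different `(start, head)` ranges to different workers, each with its own checkpoint.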
Message Queue
Purpose: Decouple ingestion from processing, enable scaling, ensure durability.
Technology: Kafka or RabbitMQ
Topics/Queues:
- `blocks`: New blocks to process
- `transactions`: Transactions to decode
- `traces`: Traces to process (async)
Configuration:
- Durability: Persistent storage
- Replication: 3 replicas for high availability
- Partitioning: By chain_id and block number (for ordering)
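One way to sketch the partitioning rule: keying by `chain_id` keeps every block of a chain in a single partition, which is what preserves per-chain ordering. (Kafka's default partitioner uses murmur2 over the key; any stable hash illustrates the idea.)

```python
import hashlib

def partition_for(chain_id: int, num_partitions: int) -> int:
    """Map a chain to a stable partition. All blocks of one chain
    land in the same partition, so consumers see them in order."""
    digest = hashlib.sha256(str(chain_id).encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```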
Transaction Processing Flow
Block Processing
Steps:
- Validate Block: Verify block hash, parent hash, block number
- Extract Transactions: Get transaction list from block
- Fetch Receipts: Get transaction receipts for all transactions
- Process Each Transaction:
- Store transaction data
- Process receipt (logs, status)
- Extract token transfers (ERC-20/721/1155)
- Link to contract interactions
Data Extracted:
- Transaction fields (hash, from, to, value, gas, etc.)
- Receipt fields (status, gasUsed, logs, etc.)
- Contract creation detection
- Token transfer events
Transaction Decoding
Purpose: Decode event logs and transaction data using ABIs.
Process:
- Identify contract address (to field or created address)
- Look up ABI in registry (verified contracts)
- Decode function calls and events
- Store decoded data for search and filtering
Fallback Strategies:
- Signature database for unknown functions/events (4-byte signatures)
- Heuristic detection for common patterns (Transfer events)
- Store raw data when decoding fails
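The fallback chain can be sketched with in-memory stand-ins for the ABI registry and signature database (both dict shapes here are hypothetical simplifications of the tables described below):

```python
def decode_log(log: dict, abi_registry: dict, signature_db: dict) -> dict:
    """Decode with fallbacks: full ABI -> signature DB -> raw.

    abi_registry maps contract address -> {topic0: event name} for
    verified contracts; signature_db maps topic0 -> text signature.
    """
    topic0 = log["topics"][0]
    events = abi_registry.get(log["address"], {})
    if topic0 in events:                      # verified contract ABI
        return {"status": "decoded", "event": events[topic0]}
    if topic0 in signature_db:                # partial decode via signature
        return {"status": "partial", "event": signature_db[topic0]}
    return {"status": "raw", "event": None}   # store raw, decode later
```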
ABI Registry
Purpose: Store contract ABIs for decoding transactions and events.
Data Sources:
- Contract verification submissions
- Sourcify integration
- Public ABI repositories (4byte.directory, etc.)
Storage:
- Database table: `contract_abis`
- Cache layer: Redis for frequently accessed ABIs
- Versioning: Support multiple ABI versions per contract
Schema:
```sql
contract_abis (
    id UUID PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    verified BOOLEAN DEFAULT false,
    source VARCHAR(50), -- 'verification', 'sourcify', 'public'
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    UNIQUE (chain_id, address)
)
```
Signature Database
Purpose: Map 4-byte function signatures and 32-byte event signatures to function/event names.
Data Sources:
- Public signature databases (4byte.directory)
- User submissions
- Automatic extraction from verified contracts
Usage:
- Lookup function name from signature (e.g., `0x095ea7b3` → `approve(address,uint256)`)
- Lookup event name from topic[0] (e.g., `0xddf252...` → `Transfer(address,address,uint256)`)
- Partial decoding when full ABI unavailable
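A minimal sketch of selector lookup against an in-memory slice of such a table (both entries are well-known public ERC-20 signatures):

```python
from typing import Optional

# Hypothetical in-memory slice of a 4byte.directory-style table.
FUNCTION_SIGNATURES = {
    "0x095ea7b3": "approve(address,uint256)",
    "0xa9059cbb": "transfer(address,uint256)",
}

def lookup_function(calldata: str) -> Optional[str]:
    """Resolve calldata's 4-byte selector to a text signature.

    The selector is the first 4 bytes of calldata: '0x' plus 8 hex
    characters. Returns None when the selector is unknown.
    """
    selector = calldata[:10].lower()
    return FUNCTION_SIGNATURES.get(selector)
```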
Event Log Indexing
Log Processing
Purpose: Index event logs for efficient querying and filtering.
Process:
- Extract logs from transaction receipts
- Decode log topics and data using ABI
- Index by:
- Contract address
- Event signature (topic[0])
- Indexed parameters (topic[1..3])
- Block number and transaction hash
- Log index
Indexing Strategy:
- PostgreSQL table: `logs` with indexes on (address, topic0, block_number)
- Elasticsearch index: Full-text search on decoded event data
- Time-series: Aggregate log counts per contract/event
Event Decoding
Decoding Flow:
- Identify event signature from topic[0]
- Look up event definition in ABI registry
- Decode indexed parameters (topics 1-3)
- Decode non-indexed parameters (data field)
- Store decoded parameters as JSONB
Common Events to Index:
- ERC-20: `Transfer(address,address,uint256)`
- ERC-721: `Transfer(address,address,uint256)`
- ERC-1155: `TransferSingle`, `TransferBatch`
- Approval events: `Approval(address,address,uint256)`
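Decoding an ERC-20 `Transfer` log follows directly from its layout: the two indexed addresses land in `topics[1..2]`, left-padded to 32 bytes, and the unindexed value sits in `data`. A sketch:

```python
# keccak256("Transfer(address,address,uint256)") -- the ERC-20/721 topic0.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_erc20_transfer(log: dict) -> dict:
    """Decode an ERC-20 Transfer log into (from, to, value).

    Indexed address topics are 32 bytes; the address is the last
    20 bytes (40 hex characters).
    """
    assert log["topics"][0] == TRANSFER_TOPIC
    return {
        "from": "0x" + log["topics"][1][-40:],
        "to": "0x" + log["topics"][2][-40:],
        "value": int(log["data"], 16),
    }
```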
Trace Processing
Call Trace Extraction
Purpose: Extract detailed call traces for transaction debugging and internal transaction tracking.
Trace Types:
- `call`: Contract calls
- `create`: Contract creation
- `suicide`: Contract self-destruct
- `delegatecall`: Delegate calls
Process:
- Request trace via `trace_transaction` or `trace_block`
- Parse trace result structure
- Extract:
- Call hierarchy (parent-child relationships)
- Internal transactions (value transfers)
- Gas usage per call
- Revert information
Internal Transaction Tracking
Purpose: Track value transfers that occur inside transactions (not just top-level).
Data Extracted:
- From address (caller)
- To address (callee)
- Value transferred
- Call type (call, delegatecall, etc.)
- Success/failure status
- Gas used
Storage:
- Separate table: `internal_transactions`
- Link to parent transaction via `transaction_hash`
- Link to parent call via `trace_address` array
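Flattening a nested trace into `internal_transactions` rows can be sketched as a recursive walk; the input shape assumed here follows a `callTracer`-style result (`from`/`to`/`value`/`calls`), with `trace_address` recording the path to each call:

```python
def flatten_trace(trace: dict, trace_address=None) -> list:
    """Emit one row per call that actually transfers value,
    tagged with its trace_address path into the call tree."""
    trace_address = trace_address or []
    rows = []
    value = int(trace.get("value", "0x0"), 16)
    if value > 0:
        rows.append({
            "from": trace["from"],
            "to": trace.get("to"),
            "value": value,
            "type": trace.get("type", "CALL"),
            "trace_address": trace_address,
        })
    for i, sub in enumerate(trace.get("calls", [])):
        rows.extend(flatten_trace(sub, trace_address + [i]))
    return rows
```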
Token Transfer Extraction
ERC-20 Transfer Detection
Detection Method:
- Look for `Transfer(address,address,uint256)` event
- Decode event parameters (from, to, value)
- Store in `token_transfers` table
- Update token holder balances
Data Stored:
- Token contract address
- From address
- To address
- Amount (with decimals)
- Block number
- Transaction hash
- Log index
ERC-721 Transfer Detection
Similar to ERC-20 but:
- Token ID is tracked (unique NFT)
- Transfer can be from zero address (mint) or to zero address (burn)
ERC-1155 Transfer Detection
Events:
- `TransferSingle`: Single token transfer
- `TransferBatch`: Batch token transfer
Challenges:
- Multiple token IDs and amounts per transfer
- Batch operations require array decoding
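The array decoding for `TransferBatch` reduces to standard ABI dynamic-array layout: the unindexed data is `(uint256[] ids, uint256[] values)`, encoded as two head words holding byte offsets, then each tail as a length word followed by one 32-byte word per element. A sketch:

```python
from typing import List, Tuple

def read_word(data: bytes, i: int) -> int:
    """Read the i-th 32-byte word as an unsigned integer."""
    return int.from_bytes(data[i * 32:(i + 1) * 32], "big")

def decode_uint_array(data: bytes, offset: int) -> List[int]:
    """Read a dynamic uint256[] starting at a byte offset:
    a length word, then that many 32-byte elements."""
    length = int.from_bytes(data[offset:offset + 32], "big")
    return [int.from_bytes(data[offset + 32 + 32 * k:offset + 64 + 32 * k], "big")
            for k in range(length)]

def decode_transfer_batch_data(data: bytes) -> Tuple[List[int], List[int]]:
    """Decode TransferBatch's (ids, values) from the log data field."""
    ids = decode_uint_array(data, read_word(data, 0))
    values = decode_uint_array(data, read_word(data, 1))
    return ids, values
```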
Token Holder Tracking
Purpose: Maintain list of addresses holding each token.
Strategy:
- Real-time updates: Update on each transfer
- Periodic reconciliation: Verify balances via RPC
- Balance snapshots: Store balance at each block (for historical queries)
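The real-time update rule can be sketched as below; transfers from the zero address are mints and transfers to it are burns, so the zero address itself never accumulates a balance:

```python
from collections import defaultdict

ZERO_ADDRESS = "0x" + "00" * 20

def apply_transfer(balances: dict, token: str, frm: str, to: str,
                   amount: int) -> None:
    """Update the (token, holder) -> balance map for one transfer.
    Mints (from zero) only credit; burns (to zero) only debit."""
    if frm != ZERO_ADDRESS:
        balances[(token, frm)] -= amount
    if to != ZERO_ADDRESS:
        balances[(token, to)] += amount
```

Periodic reconciliation then compares these derived balances against `balanceOf` calls via RPC and flags drift.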
Indexer Worker Scaling and Partitioning
Horizontal Scaling
Strategy: Multiple indexer workers processing different blocks/chains.
Partitioning Methods:
- By Chain: Each worker handles one chain
- By Block Range: Workers split block ranges (for backfill)
- By Processing Stage: Separate workers for blocks, traces, token transfers
Worker Coordination
Mechanisms:
- Message queue: Workers consume from shared queue
- Database locks: Prevent duplicate processing
- Leader election: For single-worker tasks (reorg handling)
Load Balancing
Distribution:
- Round-robin for backfill workers
- Sticky sessions for chain-specific workers
- Priority queuing: Real-time blocks before historical blocks
Performance Targets
Throughput:
- Process 100 blocks/minute per worker
- Process 1000 transactions/minute per worker
- Process 100 traces/minute per worker (trace operations are slower)
Latency:
- Real-time blocks: Indexed within 5 seconds of block production
- Historical blocks: Backfill throughput must exceed the chain's block production rate so workers eventually catch up to the head
Data Consistency
Transaction Isolation
Strategy: Process blocks atomically (all or nothing).
Implementation:
- Database transactions for block-level operations
- Idempotent processing (can safely retry)
- Checkpoint system to track last processed block
Idempotency
Requirements:
- Processing same block multiple times should not create duplicates
- Use unique constraints in database
- Upsert operations where applicable
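A sketch of the upsert pattern using SQLite as an in-memory stand-in for PostgreSQL (both accept the same `ON CONFLICT ... DO NOTHING` clause; the table here is a stripped-down hypothetical `blocks` schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE blocks (
        number INTEGER NOT NULL,
        hash   TEXT    NOT NULL,
        UNIQUE (number)
    )
""")

def upsert_block(number: int, block_hash: str) -> None:
    # Reprocessing the same block is a no-op: the unique constraint
    # plus ON CONFLICT DO NOTHING makes the insert idempotent.
    conn.execute(
        "INSERT INTO blocks (number, hash) VALUES (?, ?) "
        "ON CONFLICT (number) DO NOTHING",
        (number, block_hash),
    )

upsert_block(100, "0xabc")
upsert_block(100, "0xabc")  # safe retry, no duplicate row
```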
Error Handling and Retry Logic
Error Types
- Transient Errors: Network issues, temporary RPC failures
- Retry with exponential backoff
- Max retries: 10
- Max backoff: 5 minutes
- Permanent Errors: Invalid data, unsupported features
- Log error and skip
- Alert for investigation
- Reorg Errors: Block replaced by a different block
- Handle via reorg detection (see reorg handling spec)
Retry Strategy
Exponential Backoff:
- Initial delay: 1 second
- Multiplier: 2x
- Max delay: 5 minutes
- Jitter: Random ±20% to avoid thundering herd
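The schedule above, as a function of the 0-based attempt number:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0,
                  max_delay: float = 300.0, jitter: float = 0.2) -> float:
    """Delay before retry `attempt`: base * 2^attempt, capped at
    max_delay (5 minutes), with +/-jitter randomization applied
    to avoid a thundering herd of synchronized retries."""
    delay = min(base * (2 ** attempt), max_delay)
    return delay * (1 + random.uniform(-jitter, jitter))
```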
Monitoring and Observability
Key Metrics
Throughput:
- Blocks processed per minute
- Transactions processed per minute
- Logs indexed per minute
Latency:
- Time from block production to index completion
- Time to process block (p50, p95, p99)
Lag:
- Block height lag (current block - last indexed block)
- Time lag (current time - last indexed block time)
Errors:
- Error rate by type
- Retry count
- Failed blocks
Alerting Rules
- Block lag > 10 blocks: Warning
- Block lag > 100 blocks: Critical
- Error rate > 1%: Warning
- Error rate > 5%: Critical
- Worker down: Critical
Integration Points
RPC Node Integration
- See `../infrastructure/node-rpc-architecture.md`
- Connection pooling
- Rate limiting awareness
- Failover handling
Database Integration
- See `../database/postgres-schema.md`
- Connection pooling
- Batch inserts for performance
- Transaction management
Search Integration
- See `../database/search-index-schema.md`
- Async indexing to Elasticsearch
- Bulk indexing for efficiency
Implementation Guidelines
Technology Stack
Recommended:
- Language: Go, Rust, or Python (performance considerations)
- Queue: Kafka (high throughput) or RabbitMQ (simpler setup)
- Database: PostgreSQL with connection pooling
- Caching: Redis for frequently accessed data
Code Structure
```
indexer/
├── cmd/
│   ├── block-listener/    # Real-time block listener
│   ├── backfill-worker/   # Historical indexing worker
│   └── processor/         # Block/transaction processor
├── internal/
│   ├── ingestion/         # Ingestion logic
│   ├── processing/        # Processing logic
│   ├── decoding/          # ABI/signature decoding
│   └── persistence/       # Database operations
└── pkg/
    ├── abi/               # ABI registry
    └── rpc/               # RPC client
```
Testing Strategy
Unit Tests:
- Decoding logic
- Data transformation
- Error handling
Integration Tests:
- End-to-end block processing
- Database operations
- Queue integration
Load Tests:
- Process historical blocks
- Simulate high block production rate
- Test worker scaling
References
- Data Models: See `data-models.md`
- Reorg Handling: See `reorg-handling.md`
- Database Schema: See `../database/postgres-schema.md`
- RPC Architecture: See `../infrastructure/node-rpc-architecture.md`