Indexer Architecture Specification

Overview

This document specifies the architecture for the blockchain indexing pipeline that ingests, processes, and stores blockchain data from ChainID 138 and other supported chains. The indexer is responsible for maintaining a complete, queryable database of blocks, transactions, logs, traces, and token transfers.

Architecture

flowchart TB
    subgraph Input[Input Layer]
        Node[RPC Node<br/>ChainID 138]
        WS[WebSocket<br/>New Block Events]
    end
    
    subgraph Ingest[Ingestion Layer]
        BL[Block Listener<br/>Real-time]
        BW[Backfill Worker<br/>Historical]
        Q[Message Queue<br/>Kafka/RabbitMQ]
    end
    
    subgraph Process[Processing Layer]
        BP[Block Processor]
        TP[Transaction Processor]
        LP[Log Processor]
        TrP[Trace Processor]
        TokenP[Token Transfer Processor]
    end
    
    subgraph Decode[Decoding Layer]
        ABI[ABI Registry]
        SigDB[Signature Database]
        Decoder[Event Decoder]
    end
    
    subgraph Persist[Persistence Layer]
        PG[(PostgreSQL<br/>Canonical Data)]
        ES[(Elasticsearch<br/>Search Index)]
        TS[(TimescaleDB<br/>Metrics)]
    end
    
    subgraph Materialize[Materialization Layer]
        Agg[Aggregator<br/>TPS, Gas Stats]
        Cache[Cache Layer<br/>Redis]
    end
    
    Node --> BL
    Node --> BW
    WS --> BL
    
    BL --> Q
    BW --> Q
    
    Q --> BP
    BP --> TP
    BP --> LP
    BP --> TrP
    
    TP --> TokenP
    LP --> Decoder
    Decoder --> ABI
    Decoder --> SigDB
    
    BP --> PG
    TP --> PG
    LP --> PG
    TrP --> PG
    TokenP --> PG
    
    BP --> ES
    TP --> ES
    LP --> ES
    
    BP --> TS
    TP --> TS
    
    PG --> Agg
    Agg --> Cache

Block Ingestion Pipeline

Block Listener (Real-time)

Purpose: Monitor blockchain for new blocks and ingest them immediately.

Implementation:

  • Subscribe to newHeads via WebSocket
  • Poll eth_blockNumber as fallback (every 2 seconds)
  • Handle WebSocket reconnection automatically

Flow:

  1. Receive block header event
  2. Fetch full block data via eth_getBlockByNumber
  3. Enqueue block to processing queue
  4. Acknowledge receipt

Error Handling:

  • Retry on network errors (exponential backoff)
  • Handle reorgs (see reorg handling section)
  • Log errors for monitoring
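
The polling fallback and ingestion flow above can be sketched as follows. This is a minimal sketch: `rpc_get_latest`, `rpc_get_block`, and `enqueue` are placeholder callables standing in for the real RPC client and queue producer, and the WebSocket path, retries, and reorg handling are omitted.

```python
import time

def run_listener(rpc_get_latest, rpc_get_block, enqueue,
                 poll_interval=2.0, start_block=None, max_iterations=None):
    """Poll-based fallback loop: fetch every block past the last seen head and enqueue it."""
    last_seen = start_block if start_block is not None else rpc_get_latest()
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        head = rpc_get_latest()
        # Catch up block-by-block so no block is skipped between polls
        while last_seen < head:
            last_seen += 1
            enqueue(rpc_get_block(last_seen))
        iterations += 1
        if max_iterations is None or iterations < max_iterations:
            time.sleep(poll_interval)
```

In production this loop would run alongside the WebSocket subscription and only drive ingestion when the subscription is down.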

Backfill Worker (Historical)

Purpose: Index historical blocks from genesis or a specific starting point.

Implementation:

  • Parallel workers for faster indexing
  • Configurable batch size (e.g., 100 blocks per batch)
  • Rate limiting to avoid overloading RPC node
  • Checkpoint system for resuming interrupted backfills

Flow:

  1. Determine starting block (checkpoint or genesis)
  2. Fetch batch of blocks
  3. Enqueue each block to processing queue
  4. Update checkpoint
  5. Repeat until caught up with chain head

Optimization Strategies:

  • Parallel workers process different block ranges
  • Skip blocks already indexed (idempotent processing)
  • Batch RPC requests where possible
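
The checkpointed backfill flow above can be sketched as follows. This is a simplified single-worker version; `fetch_batch`, `enqueue`, `load_checkpoint`, and `save_checkpoint` are placeholder callables, and parallelism and rate limiting are left out.

```python
def backfill(fetch_batch, enqueue, load_checkpoint, save_checkpoint,
             head, batch_size=100):
    """Resume from the checkpoint, fetch blocks in batches, enqueue them, advance the checkpoint."""
    next_block = load_checkpoint() + 1
    while next_block <= head:
        end = min(next_block + batch_size - 1, head)
        for block in fetch_batch(next_block, end):
            enqueue(block)
        # Checkpoint only after the whole batch is enqueued, so an
        # interrupted run resumes at the last fully enqueued batch
        save_checkpoint(end)
        next_block = end + 1
```

Because downstream processing is idempotent, re-enqueuing a partially processed batch after a crash is safe.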

Message Queue

Purpose: Decouple ingestion from processing, enable scaling, ensure durability.

Technology: Kafka or RabbitMQ

Topics/Queues:

  • blocks: New blocks to process
  • transactions: Transactions to decode
  • traces: Traces to process (async)

Configuration:

  • Durability: Persistent storage
  • Replication: 3 replicas for high availability
  • Partitioning: Keyed by chain_id so that all of a chain's blocks land in the same partition and retain block-number order (queue systems like Kafka only guarantee ordering within a single partition)

Transaction Processing Flow

Block Processing

Steps:

  1. Validate Block: Verify block hash, parent hash, block number
  2. Extract Transactions: Get transaction list from block
  3. Fetch Receipts: Get transaction receipts for all transactions
  4. Process Each Transaction:
    • Store transaction data
    • Process receipt (logs, status)
    • Extract token transfers (ERC-20/721/1155)
    • Link to contract interactions

Data Extracted:

  • Transaction fields (hash, from, to, value, gas, etc.)
  • Receipt fields (status, gasUsed, logs, etc.)
  • Contract creation detection
  • Token transfer events

Transaction Decoding

Purpose: Decode event logs and transaction data using ABIs.

Process:

  1. Identify contract address (to field or created address)
  2. Look up ABI in registry (verified contracts)
  3. Decode function calls and events
  4. Store decoded data for search and filtering

Fallback Strategies:

  • Signature database for unknown functions/events (4-byte signatures)
  • Heuristic detection for common patterns (Transfer events)
  • Store raw data when decoding fails
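
The fallback chain above can be sketched as a single lookup function. The dict-shaped `abi_registry` and `sig_db` are stand-ins for the real registry and signature database described below; the actual parameter decoding is omitted.

```python
def decode_log(log, abi_registry, sig_db):
    """Try full ABI decode, then signature-only partial decode, else keep the raw log."""
    topic0 = log["topics"][0]
    # 1. Full decode: verified ABI known for this contract + event
    abi_event = abi_registry.get((log["chain_id"], log["address"], topic0))
    if abi_event is not None:
        return {"status": "decoded", "event": abi_event["name"], "log": log}
    # 2. Partial decode: event name known from the signature database
    name = sig_db.get(topic0)
    if name is not None:
        return {"status": "partial", "event": name, "log": log}
    # 3. Fallback: store raw, undecoded
    return {"status": "raw", "event": None, "log": log}
```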

ABI Registry

Purpose: Store contract ABIs for decoding transactions and events.

Data Sources:

  • Contract verification submissions
  • Sourcify integration
  • Public ABI repositories (4byte.directory, etc.)

Storage:

  • Database table: contract_abis
  • Cache layer: Redis for frequently accessed ABIs
  • Versioning: Support multiple ABI versions per contract

Schema:

contract_abis (
    id UUID PRIMARY KEY,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    verified BOOLEAN DEFAULT false,
    source VARCHAR(50), -- 'verification', 'sourcify', 'public'
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    UNIQUE(chain_id, address)
)

Signature Database

Purpose: Map 4-byte function signatures and 32-byte event signatures to function/event names.

Data Sources:

  • Public signature databases (4byte.directory)
  • User submissions
  • Automatic extraction from verified contracts

Usage:

  • Lookup function name from selector (e.g., 0x095ea7b3 → approve(address,uint256))
  • Lookup event name from topic[0] (e.g., 0xddf252... → Transfer(address,address,uint256))
  • Partial decoding when full ABI unavailable
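
Selector lookup reduces to slicing the first 4 bytes of calldata and querying the signature database (shown here as a plain dict stand-in):

```python
def lookup_selector(calldata_hex, sig_db):
    """The first 4 bytes of calldata ("0x" + 8 hex chars) are the function selector."""
    selector = calldata_hex[:10]
    return sig_db.get(selector, None)
```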

Event Log Indexing

Log Processing

Purpose: Index event logs for efficient querying and filtering.

Process:

  1. Extract logs from transaction receipts
  2. Decode log topics and data using ABI
  3. Index by:
    • Contract address
    • Event signature (topic[0])
    • Indexed parameters (topic[1..3])
    • Block number and transaction hash
    • Log index

Indexing Strategy:

  • PostgreSQL table: logs with indexes on (address, topic0, block_number)
  • Elasticsearch index: Full-text search on decoded event data
  • Time-series: Aggregate log counts per contract/event

Event Decoding

Decoding Flow:

  1. Identify event signature from topic[0]
  2. Look up event definition in ABI registry
  3. Decode indexed parameters (topics 1-3)
  4. Decode non-indexed parameters (data field)
  5. Store decoded parameters as JSONB
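
As a concrete instance of this flow, an ERC-20 Transfer log can be decoded with plain hex parsing: the two indexed parameters arrive right-aligned in 32-byte topics, and the non-indexed value sits in the data field. The log shape assumed here matches standard JSON-RPC receipt logs.

```python
# keccak256("Transfer(address,address,uint256)") -- the ERC-20 Transfer topic0
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_erc20_transfer(log):
    """Decode an ERC-20 Transfer log; returns None for non-matching logs."""
    if log["topics"][0] != TRANSFER_TOPIC or len(log["topics"]) != 3:
        return None  # ERC-721 shares the signature but indexes tokenId (4 topics)
    # Addresses are right-aligned in 32-byte topics: keep the last 40 hex chars
    frm = "0x" + log["topics"][1][-40:]
    to = "0x" + log["topics"][2][-40:]
    value = int(log["data"], 16)  # non-indexed uint256 amount
    return {"token": log["address"], "from": frm, "to": to, "value": value}
```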

Common Events to Index:

  • ERC-20: Transfer(address,address,uint256)
  • ERC-721: Transfer(address,address,uint256)
  • ERC-1155: TransferSingle, TransferBatch
  • Approval events: Approval(address,address,uint256)

Trace Processing

Call Trace Extraction

Purpose: Extract detailed call traces for transaction debugging and internal transaction tracking.

Trace Types:

  • call: Contract calls
  • create: Contract creation
  • suicide: Contract self-destruct
  • delegatecall: Delegate calls

Process:

  1. Request trace via trace_transaction or trace_block
  2. Parse trace result structure
  3. Extract:
    • Call hierarchy (parent-child relationships)
    • Internal transactions (value transfers)
    • Gas usage per call
    • Revert information
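
Extracting the call hierarchy amounts to a depth-first walk over the nested trace result, emitting one row per call tagged with its trace_address path. The nested `calls` shape assumed here follows Geth's callTracer-style output; field names may differ for other tracers.

```python
def flatten_trace(call, trace_address=()):
    """Depth-first walk producing one flat row per call with its trace_address path."""
    rows = [{
        "trace_address": list(trace_address),   # position in the call tree
        "type": call.get("type", "call"),
        "from": call.get("from"),
        "to": call.get("to"),
        "value": int(call.get("value", "0x0"), 16),
        "error": call.get("error"),             # revert information, if any
    }]
    for i, child in enumerate(call.get("calls", [])):
        rows.extend(flatten_trace(child, trace_address + (i,)))
    return rows
```

The resulting rows map directly onto the internal_transactions table described below, with trace_address linking each row to its parent call.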

Internal Transaction Tracking

Purpose: Track value transfers that occur inside transactions (not just top-level).

Data Extracted:

  • From address (caller)
  • To address (callee)
  • Value transferred
  • Call type (call, delegatecall, etc.)
  • Success/failure status
  • Gas used

Storage:

  • Separate table: internal_transactions
  • Link to parent transaction via transaction_hash
  • Link to parent call via trace_address array

Token Transfer Extraction

ERC-20 Transfer Detection

Detection Method:

  1. Look for Transfer(address,address,uint256) event
  2. Decode event parameters (from, to, value)
  3. Store in token_transfers table
  4. Update token holder balances

Data Stored:

  • Token contract address
  • From address
  • To address
  • Amount (with decimals)
  • Block number
  • Transaction hash
  • Log index

ERC-721 Transfer Detection

Similar to ERC-20 but:

  • Token ID is tracked (each NFT is unique)
  • The event signature is identical to ERC-20's; the two are distinguished by topic count (ERC-721 indexes the token ID, so its Transfer log has four topics instead of three)
  • Transfer can be from the zero address (mint) or to the zero address (burn)

ERC-1155 Transfer Detection

Events:

  • TransferSingle: Single token transfer
  • TransferBatch: Batch token transfer

Challenges:

  • Multiple token IDs and amounts per transfer
  • Batch operations require array decoding

Token Holder Tracking

Purpose: Maintain list of addresses holding each token.

Strategy:

  • Real-time updates: Update on each transfer
  • Periodic reconciliation: Verify balances via RPC
  • Balance snapshots: Store balance at each block (for historical queries)
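
The real-time update path can be sketched as a pure balance-delta function. The in-memory dict keyed by (token, holder) is an illustration stand-in for the holder-balances table; reconciliation and snapshots are separate concerns.

```python
ZERO_ADDRESS = "0x" + "0" * 40

def apply_transfer(balances, transfer):
    """Apply one token transfer to holder balances; zero address means mint/burn."""
    if transfer["from"] != ZERO_ADDRESS:  # not a mint: debit sender
        key = (transfer["token"], transfer["from"])
        balances[key] = balances.get(key, 0) - transfer["value"]
    if transfer["to"] != ZERO_ADDRESS:    # not a burn: credit receiver
        key = (transfer["token"], transfer["to"])
        balances[key] = balances.get(key, 0) + transfer["value"]
    return balances
```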

Indexer Worker Scaling and Partitioning

Horizontal Scaling

Strategy: Multiple indexer workers processing different blocks/chains.

Partitioning Methods:

  1. By Chain: Each worker handles one chain
  2. By Block Range: Workers split block ranges (for backfill)
  3. By Processing Stage: Separate workers for blocks, traces, token transfers

Worker Coordination

Mechanisms:

  • Message queue: Workers consume from shared queue
  • Database locks: Prevent duplicate processing
  • Leader election: For single-worker tasks (reorg handling)

Load Balancing

Distribution:

  • Round-robin for backfill workers
  • Sticky sessions for chain-specific workers
  • Priority queuing: Real-time blocks before historical blocks

Performance Targets

Throughput:

  • Process 100 blocks/minute per worker
  • Process 1000 transactions/minute per worker
  • Process 100 traces/minute per worker (trace operations are slower)

Latency:

  • Real-time blocks: Indexed within 5 seconds of block production
  • Historical blocks: Backfill throughput must exceed the chain's block production rate so the indexer converges on the chain head rather than falling further behind

Data Consistency

Transaction Isolation

Strategy: Process blocks atomically (all or nothing).

Implementation:

  • Database transactions for block-level operations
  • Idempotent processing (can safely retry)
  • Checkpoint system to track last processed block

Idempotency

Requirements:

  • Processing same block multiple times should not create duplicates
  • Use unique constraints in database
  • Upsert operations where applicable

Error Handling and Retry Logic

Error Types

  1. Transient Errors: Network issues, temporary RPC failures

    • Retry with exponential backoff
    • Max retries: 10
    • Max backoff: 5 minutes
  2. Permanent Errors: Invalid data, unsupported features

    • Log error and skip
    • Alert for investigation
  3. Reorg Errors: Block replaced by different block

    • Handle via reorg detection (see reorg handling spec)

Retry Strategy

Exponential Backoff:

  • Initial delay: 1 second
  • Multiplier: 2x
  • Max delay: 5 minutes
  • Jitter: Random ±20% to avoid thundering herd
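
The parameters above yield the following delay schedule (shown here as a pure function returning the delays rather than sleeping, with an injectable random source for testability):

```python
import random

def backoff_delays(attempts, base=1.0, multiplier=2.0, cap=300.0,
                   jitter=0.2, rng=random):
    """Exponential backoff schedule: base * multiplier**n, capped, with ±jitter randomization."""
    delays = []
    for n in range(attempts):
        d = min(base * (multiplier ** n), cap)       # cap at 5 minutes
        d *= 1 + rng.uniform(-jitter, jitter)        # ±20% jitter vs. thundering herd
        delays.append(d)
    return delays
```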

Monitoring and Observability

Key Metrics

Throughput:

  • Blocks processed per minute
  • Transactions processed per minute
  • Logs indexed per minute

Latency:

  • Time from block production to index completion
  • Time to process block (p50, p95, p99)

Lag:

  • Block height lag (current block - last indexed block)
  • Time lag (current time - last indexed block time)

Errors:

  • Error rate by type
  • Retry count
  • Failed blocks

Alerting Rules

  • Block lag > 10 blocks: Warning
  • Block lag > 100 blocks: Critical
  • Error rate > 1%: Warning
  • Error rate > 5%: Critical
  • Worker down: Critical
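
The rules above translate directly into an evaluation function. This is a sketch of the rule logic only; in practice these thresholds would live in the alerting system's configuration.

```python
def classify_alerts(block_lag, error_rate, workers_up):
    """Map raw metrics to alert levels per the thresholds above (error_rate as a fraction)."""
    alerts = []
    if block_lag > 100:
        alerts.append(("block_lag", "critical"))
    elif block_lag > 10:
        alerts.append(("block_lag", "warning"))
    if error_rate > 0.05:
        alerts.append(("error_rate", "critical"))
    elif error_rate > 0.01:
        alerts.append(("error_rate", "warning"))
    if not workers_up:
        alerts.append(("worker_down", "critical"))
    return alerts
```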

Integration Points

RPC Node Integration

  • See ../infrastructure/node-rpc-architecture.md
  • Connection pooling
  • Rate limiting awareness
  • Failover handling

Database Integration

  • See ../database/postgres-schema.md
  • Connection pooling
  • Batch inserts for performance
  • Transaction management

Search Integration

  • See ../database/search-index-schema.md
  • Async indexing to Elasticsearch
  • Bulk indexing for efficiency

Implementation Guidelines

Technology Stack

Recommended:

  • Language: Go, Rust, or Python (performance considerations)
  • Queue: Kafka (high throughput) or RabbitMQ (simpler setup)
  • Database: PostgreSQL with connection pooling
  • Caching: Redis for frequently accessed data

Code Structure

indexer/
├── cmd/
│   ├── block-listener/      # Real-time block listener
│   ├── backfill-worker/     # Historical indexing worker
│   └── processor/           # Block/transaction processor
├── internal/
│   ├── ingestion/           # Ingestion logic
│   ├── processing/          # Processing logic
│   ├── decoding/            # ABI/signature decoding
│   └── persistence/         # Database operations
└── pkg/
    ├── abi/                 # ABI registry
    └── rpc/                 # RPC client

Testing Strategy

Unit Tests:

  • Decoding logic
  • Data transformation
  • Error handling

Integration Tests:

  • End-to-end block processing
  • Database operations
  • Queue integration

Load Tests:

  • Process historical blocks
  • Simulate high block production rate
  • Test worker scaling

References

  • Data Models: See data-models.md
  • Reorg Handling: See reorg-handling.md
  • Database Schema: See ../database/postgres-schema.md
  • RPC Architecture: See ../infrastructure/node-rpc-architecture.md