Add full monorepo: virtual-banker, backend, frontend, docs, scripts, deployment

Co-authored-by: Cursor <cursoragent@cursor.com>
defiQUG
2026-02-10 11:32:49 -08:00
parent 4d4f8cedad
commit 903c03c65b
815 changed files with 125522 additions and 264 deletions


@@ -0,0 +1,294 @@
# Data Lake Schema Specification
## Overview
This document specifies the data lake schema for long-term storage of blockchain data in S3-compatible object storage using Parquet format for analytics, ML, and compliance purposes.
## Storage Structure
### Directory Layout
```
s3://explorer-data-lake/
├── raw/
│ ├── chain_id=138/
│ │ ├── year=2024/
│ │ │ ├── month=01/
│ │ │ │ ├── day=01/
│ │ │ │ │ ├── blocks.parquet
│ │ │ │ │ ├── transactions.parquet
│ │ │ │ │ └── logs.parquet
│ │ │ │ └── ...
│ │ │ └── ...
│ │ └── ...
│ └── ...
├── processed/
│ ├── chain_id=138/
│ │ ├── daily_aggregates/
│ │ │ ├── year=2024/
│ │ │ │ └── month=01/
│ │ │ │ └── day=01.parquet
│ │ └── ...
│ └── ...
└── archived/
└── ...
```
### Partitioning Strategy
**Partition Keys**:
- `chain_id`: Chain identifier
- `year`: Year (YYYY)
- `month`: Month (MM)
- `day`: Day (DD)
**Benefits**:
- Efficient query pruning
- Parallel processing
- Easy data management (delete by partition)
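These `key=value` path segments follow the Hive partitioning convention, so engines like Athena and Spark can prune partitions automatically. A minimal sketch of building a partition prefix (the function name is illustrative):

```python
from datetime import date

def partition_prefix(chain_id: int, d: date, layer: str = "raw") -> str:
    """Build the Hive-style partition prefix used in the data lake layout.

    Zero-padded month/day keeps lexicographic and chronological order aligned.
    """
    return (
        f"{layer}/chain_id={chain_id}/"
        f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/"
    )

# Example: the blocks file for 2024-01-01 on chain 138
key = partition_prefix(138, date(2024, 1, 1)) + "blocks.parquet"
```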
## Parquet Schema
### Blocks Parquet Schema
```json
{
"type": "struct",
"fields": [
{"name": "chain_id", "type": "integer", "nullable": false},
{"name": "number", "type": "long", "nullable": false},
{"name": "hash", "type": "string", "nullable": false},
{"name": "parent_hash", "type": "string", "nullable": false},
{"name": "timestamp", "type": "timestamp", "nullable": false},
{"name": "miner", "type": "string", "nullable": true},
{"name": "gas_used", "type": "long", "nullable": true},
{"name": "gas_limit", "type": "long", "nullable": true},
{"name": "transaction_count", "type": "integer", "nullable": true},
{"name": "size", "type": "integer", "nullable": true}
]
}
```
### Transactions Parquet Schema
```json
{
"type": "struct",
"fields": [
{"name": "chain_id", "type": "integer", "nullable": false},
{"name": "hash", "type": "string", "nullable": false},
{"name": "block_number", "type": "long", "nullable": false},
{"name": "transaction_index", "type": "integer", "nullable": false},
{"name": "from_address", "type": "string", "nullable": false},
{"name": "to_address", "type": "string", "nullable": true},
{"name": "value", "type": "string", "nullable": false}, // Decimal as string
{"name": "gas_price", "type": "long", "nullable": true},
{"name": "gas_used", "type": "long", "nullable": true},
{"name": "gas_limit", "type": "long", "nullable": false},
{"name": "status", "type": "integer", "nullable": true},
{"name": "timestamp", "type": "timestamp", "nullable": false}
]
}
```
### Logs Parquet Schema
```json
{
"type": "struct",
"fields": [
{"name": "chain_id", "type": "integer", "nullable": false},
{"name": "transaction_hash", "type": "string", "nullable": false},
{"name": "block_number", "type": "long", "nullable": false},
{"name": "log_index", "type": "integer", "nullable": false},
{"name": "address", "type": "string", "nullable": false},
{"name": "topic0", "type": "string", "nullable": true},
{"name": "topic1", "type": "string", "nullable": true},
{"name": "topic2", "type": "string", "nullable": true},
{"name": "topic3", "type": "string", "nullable": true},
{"name": "data", "type": "string", "nullable": true},
{"name": "timestamp", "type": "timestamp", "nullable": false}
]
}
```
### Token Transfers Parquet Schema
```json
{
"type": "struct",
"fields": [
{"name": "chain_id", "type": "integer", "nullable": false},
{"name": "transaction_hash", "type": "string", "nullable": false},
{"name": "block_number", "type": "long", "nullable": false},
{"name": "token_address", "type": "string", "nullable": false},
{"name": "token_type", "type": "string", "nullable": false},
{"name": "from_address", "type": "string", "nullable": false},
{"name": "to_address", "type": "string", "nullable": false},
{"name": "amount", "type": "string", "nullable": true},
{"name": "token_id", "type": "string", "nullable": true},
{"name": "timestamp", "type": "timestamp", "nullable": false}
]
}
```
## Data Ingestion
### ETL Pipeline
**Process**:
1. Extract: Query PostgreSQL for daily data
2. Transform: Convert to Parquet format
3. Load: Upload to S3 with partitioning
**Schedule**: Daily batch job that runs after the day ends
**Tools**: Apache Spark, AWS Glue, or custom ETL scripts
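The three steps can be wired together as a small skeleton. This is a sketch under stated assumptions: the stages are injected as callables, where a real job would wrap a PostgreSQL query, a Parquet encoder (e.g. pyarrow), and an S3 client.

```python
from datetime import date
from typing import Callable, Iterable, Mapping

def run_daily_etl(
    day: date,
    extract: Callable[[date], Iterable[Mapping]],     # e.g. a PostgreSQL query
    transform: Callable[[Iterable[Mapping]], bytes],  # e.g. a Parquet encoder
    load: Callable[[str, bytes], None],               # e.g. an S3 upload
    chain_id: int = 138,
) -> str:
    """Run one day's extract/transform/load and return the object key written."""
    rows = extract(day)
    payload = transform(rows)
    key = (
        f"raw/chain_id={chain_id}/year={day.year:04d}/"
        f"month={day.month:02d}/day={day.day:02d}/blocks.parquet"
    )
    load(key, payload)
    return key
```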
### Compression
**Format**: Snappy compression (good balance of speed and compression ratio)
**Alternative**: Gzip (better compression, slower)
### File Sizing
**Target Size**: 100-500 MB per Parquet file
- Smaller files: Better parallelism
- Larger files: Better compression
**Strategy**: Write files near the target size, splitting large days across multiple files or time ranges
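Picking the file count for a day is ceiling division against the target size; a small sketch (the 256 MB default is an assumption within the 100-500 MB range above):

```python
def plan_file_count(total_bytes: int, target_bytes: int = 256 * 1024 * 1024) -> int:
    """How many Parquet files keep each near the target size (ceiling division)."""
    return max(1, -(-total_bytes // target_bytes))

# A ~1.2 GB day at a 256 MB target comes out to 5 files.
files = plan_file_count(1_288_490_188)
```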
## Query Interface
### AWS Athena / Presto
**Table Definition**:
```sql
CREATE EXTERNAL TABLE blocks_138 (
  chain_id int,
  `number` bigint,
  hash string,
  parent_hash string,
  `timestamp` timestamp,
  miner string,
  gas_used bigint,
  gas_limit bigint,
  transaction_count int,
  size int
)
PARTITIONED BY (year int, month int, day int)
STORED AS PARQUET
LOCATION 's3://explorer-data-lake/raw/chain_id=138/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.year.type' = 'integer',
  'projection.year.range' = '2020,2030',
  'projection.month.type' = 'integer',
  'projection.month.range' = '1,12',
  'projection.month.digits' = '2',
  'projection.day.type' = 'integer',
  'projection.day.range' = '1,31',
  'projection.day.digits' = '2',
  'storage.location.template' = 's3://explorer-data-lake/raw/chain_id=138/year=${year}/month=${month}/day=${day}/'
);
```
### Query Examples
**Daily Transaction Count**:
```sql
SELECT
DATE(timestamp) as date,
COUNT(*) as transaction_count
FROM transactions_138
WHERE year = 2024 AND month = 1
GROUP BY DATE(timestamp)
ORDER BY date;
```
**Token Transfer Analytics**:
```sql
SELECT
token_address,
COUNT(*) as transfer_count,
SUM(CAST(amount AS DECIMAL(78, 0))) as total_volume
FROM token_transfers_138
WHERE year = 2024 AND month = 1
GROUP BY token_address
ORDER BY total_volume DESC
LIMIT 100;
```
## Data Retention
### Retention Policies
**Raw Data**: 7 years (compliance requirement)
**Processed Aggregates**: Indefinite
**Archived Data**: Move to Glacier after 1 year
### Lifecycle Policies
**S3 Lifecycle Rules**:
1. Move to Infrequent Access after 30 days
2. Move to Glacier after 1 year
3. Delete after 7 years (raw data)
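The three rules above map directly onto an S3 lifecycle configuration; a sketch of the boto3-style `put_bucket_lifecycle_configuration` payload (the rule ID is an assumption, and 7 years is approximated as 2555 days):

```python
# Lifecycle rules for the raw/ prefix: IA at 30 days, Glacier at 1 year,
# expiry at the 7-year retention boundary.
lifecycle = {
    "Rules": [
        {
            "ID": "raw-data-tiering",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 2555},
        }
    ]
}
```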
## Data Processing
### Aggregation Jobs
**Daily Aggregates**:
- Transaction counts by hour
- Gas usage statistics
- Token transfer volumes
- Address activity metrics
**Monthly Aggregates**:
- Network growth metrics
- Token distribution changes
- Protocol usage statistics
### ML/Analytics Workflows
**Use Cases**:
- Anomaly detection
- Fraud detection
- Market analysis
- Network health monitoring
**Tools**: Spark, Pandas, Jupyter notebooks
## Security and Access Control
### Access Control
**IAM Policies**: Restrict access to specific prefixes
**Encryption**: Server-side encryption (SSE-S3 or SSE-KMS)
**Audit Logging**: Enable S3 access logging
### Data Classification
**Public Data**: Blocks, transactions (public blockchain data)
**Sensitive Data**: User addresses, labels (requires authentication)
**Compliance Data**: Banking/transaction data (strict access control)
## Cost Optimization
### Storage Optimization
**Strategies**:
- Use appropriate storage classes (Standard, IA, Glacier)
- Compress data (Parquet + Snappy)
- Delete old data per retention policy
- Use intelligent tiering
### Query Optimization
**Strategies**:
- Partition pruning (query only relevant partitions)
- Column pruning (select only needed columns)
- Predicate pushdown (filter early)
## References
- Database Schema: See `postgres-schema.md`
- Analytics: See `../observability/metrics-monitoring.md`


@@ -0,0 +1,300 @@
# Graph Database Schema Specification
## Overview
This document specifies the Neo4j graph database schema for storing cross-chain entity relationships, address clustering, and protocol interactions.
## Schema Design
### Node Types
#### Address Node
**Labels**: `Address`, `Chain{chain_id}` (e.g., `Chain138`)
**Properties**:
```cypher
{
address: "0x...", // Unique identifier
chainId: 138, // Chain ID
label: "My Wallet", // Optional label
isContract: false, // Is contract address
firstSeen: timestamp, // First seen timestamp
lastSeen: timestamp, // Last seen timestamp
transactionCount: 100, // Transaction count
balance: "1.5" // Current balance (string for precision)
}
```
**Constraints**:
```cypher
CREATE CONSTRAINT address_address_chain_id FOR (a:Address)
REQUIRE (a.address, a.chainId) IS UNIQUE;
```
#### Contract Node
**Labels**: `Contract`, `Address`
**Properties**: Inherits from Address, plus:
```cypher
{
name: "MyToken",
verificationStatus: "verified",
compilerVersion: "0.8.19"
}
```
#### Token Node
**Labels**: `Token`, `Contract`
**Properties**: Inherits from Contract, plus:
```cypher
{
symbol: "MTK",
decimals: 18,
totalSupply: "1000000",
type: "ERC20" // ERC20, ERC721, ERC1155
}
```
#### Protocol Node
**Labels**: `Protocol`
**Properties**:
```cypher
{
name: "Uniswap V3",
category: "DEX",
website: "https://uniswap.org"
}
```
### Relationship Types
#### TRANSFERRED_TO
**Purpose**: Track token transfers between addresses.
**Properties**:
```cypher
{
amount: "1000000000000000000",
tokenAddress: "0x...",
transactionHash: "0x...",
blockNumber: 12345,
timestamp: timestamp
}
```
**Example**:
```cypher
(a1:Address {address: "0x..."})-[r:TRANSFERRED_TO {
amount: "1000000000000000000",
tokenAddress: "0x...",
transactionHash: "0x..."
}]->(a2:Address {address: "0x..."})
```
#### CALLED
**Purpose**: Track contract calls between addresses.
**Properties**:
```cypher
{
transactionHash: "0x...",
blockNumber: 12345,
timestamp: timestamp,
gasUsed: 21000,
method: "transfer"
}
```
#### OWNS
**Purpose**: Track token ownership (current balances).
**Properties**:
```cypher
{
balance: "1000000000000000000",
tokenId: "123", // For ERC-721/1155
updatedAt: timestamp
}
```
**Example**:
```cypher
(a:Address)-[r:OWNS {
balance: "1000000000000000000",
updatedAt: timestamp
}]->(t:Token)
```
#### INTERACTS_WITH
**Purpose**: Track protocol interactions.
**Properties**:
```cypher
{
interactionType: "swap", // swap, deposit, withdraw, etc.
transactionHash: "0x...",
timestamp: timestamp
}
```
**Example**:
```cypher
(a:Address)-[r:INTERACTS_WITH {
interactionType: "swap",
transactionHash: "0x..."
}]->(p:Protocol)
```
#### CLUSTERED_WITH
**Purpose**: Link addresses that belong to the same entity (address clustering).
**Properties**:
```cypher
{
confidence: 0.95, // Clustering confidence score
method: "heuristic", // Clustering method
createdAt: timestamp
}
```
#### CCIP_MESSAGE_LINK
**Purpose**: Link transactions across chains via CCIP messages.
**Properties**:
```cypher
{
messageId: "0x...",
sourceTxHash: "0x...",
destTxHash: "0x...",
status: "delivered",
timestamp: timestamp
}
```
**Example**:
```cypher
(srcTx:Transaction)-[r:CCIP_MESSAGE_LINK {
messageId: "0x...",
status: "delivered"
}]->(destTx:Transaction)
```
## Query Patterns
### Find Token Holders
```cypher
MATCH (a:Address)-[r:OWNS]->(t:Token {address: "0x...", chainId: 138})
WHERE toFloat(r.balance) > 0
RETURN a.address, r.balance
ORDER BY toFloat(r.balance) DESC
LIMIT 100;
```
### Find Transfer Path
```cypher
MATCH path = (a1:Address {address: "0x..."})-[:TRANSFERRED_TO*1..3]-(a2:Address {address: "0x..."})
WHERE ALL(r in relationships(path) WHERE r.tokenAddress = "0x...")
RETURN path
LIMIT 10;
```
### Find Protocol Users
```cypher
MATCH (a:Address)-[r:INTERACTS_WITH]->(p:Protocol {name: "Uniswap V3"})
RETURN a.address, count(r) as interactionCount
ORDER BY interactionCount DESC
LIMIT 100;
```
### Address Clustering
```cypher
MATCH (a1:Address)-[r:CLUSTERED_WITH]-(a2:Address)
WHERE a1.address = "0x..."
RETURN a2.address, r.confidence, r.method;
```
### Cross-Chain CCIP Links
```cypher
MATCH (srcTx:Transaction {hash: "0x..."})-[r:CCIP_MESSAGE_LINK]-(destTx:Transaction)
RETURN srcTx, r, destTx;
```
## Data Ingestion
### Transaction Ingestion
**Process**:
1. Process transaction from indexer
2. Create/update address nodes
3. Create TRANSFERRED_TO relationships for token transfers
4. Create CALLED relationships for contract calls
5. Update OWNS relationships for token balances
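Steps 2-3 reduce to building one parameter map per transfer for the MERGE statements shown earlier; a minimal sketch (the input field names mirror the indexer's transfer records and are assumptions):

```python
def transfer_params(chain_id: int, transfer: dict) -> dict:
    """Parameter map for a MERGE on the TRANSFERRED_TO relationship."""
    return {
        "chainId": chain_id,
        "from": transfer["from_address"].lower(),   # normalize address casing
        "to": transfer["to_address"].lower(),
        "txHash": transfer["transaction_hash"],
        "amount": str(transfer["amount"]),          # string preserves precision
    }
```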
### Batch Ingestion
**Strategy**:
- Use Neo4j Batch API for bulk inserts
- Batch size: 1000-10000 operations
- Use transactions for atomicity
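The batching strategy above can be sketched as a generator that groups operations into transaction-sized chunks:

```python
from typing import Iterable, Iterator, List

def batched(ops: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield operation batches sized for one Neo4j transaction apiece."""
    batch: List[dict] = []
    for op in ops:
        batch.append(op)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```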
### Incremental Updates
**Process**:
- Update relationships as new transactions are processed
- Maintain OWNS relationships (update balances)
- Add new relationships for new interactions
## Performance Optimization
### Indexing
**Indexes**:
```cypher
CREATE INDEX address_address FOR (a:Address) ON (a.address);
CREATE INDEX address_chain_id FOR (a:Address) ON (a.chainId);
CREATE INDEX transaction_hash FOR (t:Transaction) ON (t.hash);
```
### Relationship Constraints
**Uniqueness**: Use MERGE to avoid duplicate relationships
**Example**:
```cypher
MATCH (a1:Address {address: "0x...", chainId: 138})
MATCH (a2:Address {address: "0x...", chainId: 138})
MERGE (a1)-[r:TRANSFERRED_TO {
transactionHash: "0x..."
}]->(a2)
ON CREATE SET r.amount = "1000000", r.timestamp = timestamp();
```
## Data Retention
**Strategy**:
- Keep all current relationships
- Archive old relationships (older than 1 year) to separate database
- Keep aggregated statistics (interaction counts) instead of all relationships
## References
- Entity Graph: See `../multichain/entity-graph.md`
- CCIP Integration: See `../ccip/ccip-tracking.md`


@@ -0,0 +1,517 @@
# PostgreSQL Database Schema Specification
## Overview
This document specifies the complete PostgreSQL database schema for the explorer platform. The schema is designed to support multi-chain operation, high-performance queries, and data consistency.
## Schema Design Principles
1. **Multi-chain Support**: All tables include `chain_id` for chain isolation
2. **Normalization**: Normalized structure to avoid data duplication
3. **Performance**: Strategic indexing for common query patterns
4. **Consistency**: Foreign key constraints where appropriate
5. **Extensibility**: JSONB columns for flexible data storage
6. **Partitioning**: Large tables partitioned by `chain_id`
## Core Tables
### Blocks Table
See `../indexing/data-models.md` for detailed block schema.
**Partitioning**: Partition by `chain_id` for large deployments.
**Key Indexes**:
- Primary: `(chain_id, number)`
- Unique: `(chain_id, hash)`
- Index: `(chain_id, timestamp)` for time-range queries
### Transactions Table
See `../indexing/data-models.md` for detailed transaction schema.
**Key Indexes**:
- Primary: `(chain_id, hash)`
- Index: `(chain_id, block_number, transaction_index)` for block queries
- Index: `(chain_id, from_address)` for address queries
- Index: `(chain_id, to_address)` for address queries
- Index: `(chain_id, block_number, from_address)` for compound queries
### Logs Table
See `../indexing/data-models.md` for detailed log schema.
**Key Indexes**:
- Primary: `(chain_id, transaction_hash, log_index)`
- Index: `(chain_id, address)` for contract event queries
- Index: `(chain_id, topic0)` for event type queries
- Index: `(chain_id, address, topic0)` for filtered event queries
- Index: `(chain_id, block_number)` for block-based queries
### Traces Table
See `../indexing/data-models.md` for detailed trace schema.
**Key Indexes**:
- Primary: `(chain_id, transaction_hash, trace_address)`
- Index: `(chain_id, action_from)` for address queries
- Index: `(chain_id, action_to)` for address queries
- Index: `(chain_id, block_number)` for block queries
### Internal Transactions Table
See `../indexing/data-models.md` for detailed internal transaction schema.
**Key Indexes**:
- Primary: `(chain_id, transaction_hash, trace_address)`
- Index: `(chain_id, from_address)`
- Index: `(chain_id, to_address)`
- Index: `(chain_id, block_number)`
## Token Tables
### Tokens Table
```sql
CREATE TABLE tokens (
id BIGSERIAL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
type VARCHAR(10) NOT NULL CHECK (type IN ('ERC20', 'ERC721', 'ERC1155')),
name VARCHAR(255),
symbol VARCHAR(50),
decimals INTEGER CHECK (decimals >= 0 AND decimals <= 255), -- ERC-20 decimals is a uint8
total_supply NUMERIC(78, 0),
holder_count INTEGER DEFAULT 0,
transfer_count INTEGER DEFAULT 0,
logo_url TEXT,
website_url TEXT,
description TEXT,
verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (chain_id, id), -- constraints on partitioned tables must include the partition key
UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);
CREATE INDEX idx_tokens_chain_address ON tokens(chain_id, address);
CREATE INDEX idx_tokens_chain_type ON tokens(chain_id, type);
CREATE INDEX idx_tokens_chain_symbol ON tokens(chain_id, symbol);
```
### Token Transfers Table
```sql
CREATE TABLE token_transfers (
id BIGSERIAL,
chain_id INTEGER NOT NULL,
transaction_hash VARCHAR(66) NOT NULL,
block_number BIGINT NOT NULL,
log_index INTEGER NOT NULL,
token_address VARCHAR(42) NOT NULL,
token_type VARCHAR(10) NOT NULL CHECK (token_type IN ('ERC20', 'ERC721', 'ERC1155')),
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42) NOT NULL,
amount NUMERIC(78, 0),
token_id VARCHAR(78),
operator VARCHAR(42),
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (chain_id, id),
FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address),
UNIQUE (chain_id, transaction_hash, log_index)
) PARTITION BY LIST (chain_id);
CREATE INDEX idx_token_transfers_chain_token ON token_transfers(chain_id, token_address);
CREATE INDEX idx_token_transfers_chain_from ON token_transfers(chain_id, from_address);
CREATE INDEX idx_token_transfers_chain_to ON token_transfers(chain_id, to_address);
CREATE INDEX idx_token_transfers_chain_tx ON token_transfers(chain_id, transaction_hash);
CREATE INDEX idx_token_transfers_chain_block ON token_transfers(chain_id, block_number);
CREATE INDEX idx_token_transfers_chain_token_from ON token_transfers(chain_id, token_address, from_address);
CREATE INDEX idx_token_transfers_chain_token_to ON token_transfers(chain_id, token_address, to_address);
```
### Token Holders Table (Optional)
**Purpose**: Maintain current token balances for efficient queries.
```sql
CREATE TABLE token_holders (
id BIGSERIAL,
chain_id INTEGER NOT NULL,
token_address VARCHAR(42) NOT NULL,
address VARCHAR(42) NOT NULL,
balance NUMERIC(78, 0) NOT NULL DEFAULT 0,
token_id VARCHAR(78), -- For ERC-721/1155
updated_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (chain_id, id),
FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
) PARTITION BY LIST (chain_id);
-- Expressions are not allowed in table UNIQUE constraints, so enforce uniqueness with an index:
CREATE UNIQUE INDEX idx_token_holders_unique
ON token_holders(chain_id, token_address, address, COALESCE(token_id, ''));
CREATE INDEX idx_token_holders_chain_token ON token_holders(chain_id, token_address);
CREATE INDEX idx_token_holders_chain_address ON token_holders(chain_id, address);
```
## Contract Tables
### Contracts Table
```sql
CREATE TABLE contracts (
id BIGSERIAL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
name VARCHAR(255),
compiler_version VARCHAR(50),
optimization_enabled BOOLEAN,
optimization_runs INTEGER,
evm_version VARCHAR(20),
source_code TEXT,
abi JSONB,
constructor_arguments TEXT,
verification_status VARCHAR(20) NOT NULL CHECK (verification_status IN ('pending', 'verified', 'failed')),
verified_at TIMESTAMP,
verification_method VARCHAR(50),
license VARCHAR(50),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (chain_id, id),
UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);
CREATE INDEX idx_contracts_chain_address ON contracts(chain_id, address);
CREATE INDEX idx_contracts_chain_verified ON contracts(chain_id, verification_status);
CREATE INDEX idx_contracts_abi_gin ON contracts USING GIN (abi); -- For ABI queries
```
### Contract ABIs Table
```sql
CREATE TABLE contract_abis (
id BIGSERIAL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
abi JSONB NOT NULL,
source VARCHAR(50) NOT NULL,
verified BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (chain_id, id),
UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);
CREATE INDEX idx_abis_chain_address ON contract_abis(chain_id, address);
CREATE INDEX idx_abis_abi_gin ON contract_abis USING GIN (abi);
```
### Contract Verifications Table
```sql
CREATE TABLE contract_verifications (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
status VARCHAR(20) NOT NULL CHECK (status IN ('pending', 'processing', 'verified', 'failed', 'partially_verified')),
compiler_version VARCHAR(50),
optimization_enabled BOOLEAN,
optimization_runs INTEGER,
evm_version VARCHAR(20),
source_code TEXT,
abi JSONB,
constructor_arguments TEXT,
verification_method VARCHAR(50),
error_message TEXT,
verified_at TIMESTAMP,
version INTEGER DEFAULT 1,
is_active BOOLEAN DEFAULT true,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (chain_id, address) REFERENCES contracts(chain_id, address)
);
CREATE INDEX idx_verifications_chain_address ON contract_verifications(chain_id, address);
CREATE INDEX idx_verifications_status ON contract_verifications(status);
```
## Address-Related Tables
### Address Labels Table
```sql
CREATE TABLE address_labels (
id BIGSERIAL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
label VARCHAR(255) NOT NULL,
label_type VARCHAR(20) NOT NULL CHECK (label_type IN ('user', 'public', 'contract_name')),
user_id UUID,
source VARCHAR(50),
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (id),
UNIQUE (chain_id, address, label_type, user_id),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_labels_chain_address ON address_labels(chain_id, address);
CREATE INDEX idx_labels_chain_user ON address_labels(chain_id, user_id);
```
### Address Tags Table
```sql
CREATE TABLE address_tags (
id BIGSERIAL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
tag VARCHAR(50) NOT NULL,
tag_type VARCHAR(20) NOT NULL CHECK (tag_type IN ('category', 'risk', 'protocol')),
user_id UUID,
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (id),
UNIQUE (chain_id, address, tag, user_id),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_tags_chain_address ON address_tags(chain_id, address);
CREATE INDEX idx_tags_chain_tag ON address_tags(chain_id, tag);
```
## User Tables
### Users Table
```sql
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE,
username VARCHAR(100) UNIQUE,
password_hash TEXT,
api_key_hash TEXT,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
last_login_at TIMESTAMP
);
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_users_username ON users(username);
```
### Watchlists Table
```sql
CREATE TABLE watchlists (
id BIGSERIAL,
user_id UUID NOT NULL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
label VARCHAR(255),
created_at TIMESTAMP DEFAULT NOW(),
PRIMARY KEY (id),
UNIQUE (user_id, chain_id, address),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_watchlists_user ON watchlists(user_id);
CREATE INDEX idx_watchlists_chain_address ON watchlists(chain_id, address);
```
### API Keys Table
```sql
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL,
key_hash TEXT NOT NULL UNIQUE,
name VARCHAR(255),
tier VARCHAR(20) NOT NULL CHECK (tier IN ('free', 'pro', 'enterprise')),
rate_limit_per_second INTEGER,
rate_limit_per_minute INTEGER,
ip_whitelist TEXT[], -- Array of CIDR blocks
last_used_at TIMESTAMP,
expires_at TIMESTAMP,
revoked BOOLEAN DEFAULT false,
created_at TIMESTAMP DEFAULT NOW(),
FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_api_keys_user ON api_keys(user_id);
CREATE INDEX idx_api_keys_hash ON api_keys(key_hash);
```
## Multi-Chain Partitioning
### Partitioning Strategy
**Large Tables**: Partition by `chain_id` using LIST partitioning.
**Tables to Partition**:
- `blocks`
- `transactions`
- `logs`
- `traces`
- `internal_transactions`
- `token_transfers`
- `tokens`
- `token_holders` (if used)
### Partition Creation
**Example for blocks table**:
```sql
-- Create parent table
CREATE TABLE blocks (
-- columns
) PARTITION BY LIST (chain_id);
-- Create partitions
CREATE TABLE blocks_chain_138 PARTITION OF blocks
FOR VALUES IN (138);
CREATE TABLE blocks_chain_1 PARTITION OF blocks
FOR VALUES IN (1);
-- Add indexes to partitions (inherited from parent)
```
**Benefits**:
- Faster queries (partition pruning)
- Easier maintenance (per-chain operations)
- Parallel processing
- Data isolation
## Indexing Strategy
### Index Types
1. **B-tree**: Default for most indexes (equality, range, sorting)
2. **Hash**: For exact match only (rarely used, B-tree usually better)
3. **GIN**: For JSONB columns (ABIs, decoded data)
4. **BRIN**: For large ordered columns (block numbers, timestamps)
5. **Partial**: For filtered indexes (e.g., verified contracts only)
### Index Maintenance
**Regular Maintenance**:
- `VACUUM ANALYZE` regularly (auto-vacuum enabled)
- `REINDEX` if needed (bloat, corruption)
- Monitor index usage (`pg_stat_user_indexes`)
**Index Monitoring**:
- Track index sizes
- Monitor index bloat
- Remove unused indexes
## Data Retention and Archiving
### Retention Policies
**Hot Data**: Recent data (last 1 year)
- Fast access required
- All indexes maintained
**Warm Data**: Older data (1-5 years)
- Archive to slower storage
- Reduced indexing
**Cold Data**: Very old data (5+ years)
- Archive to object storage
- Minimal indexing
### Archiving Strategy
**Approach**:
1. Partition tables by time ranges (monthly/yearly)
2. Move old partitions to archive storage
3. Query archive when needed (slower but available)
**Implementation**:
- Use PostgreSQL table partitioning by date range
- Move partitions to archive storage (S3, etc.)
- Query via foreign data wrappers if needed
## Migration Strategy
### Versioning
**Migration Tool**: Use a schema migration tool such as Flyway, Liquibase, or a custom runner.
**Versioning Format**: `YYYYMMDDHHMMSS_description.sql`
**Example**:
```
20240101000001_initial_schema.sql
20240115000001_add_token_holders.sql
20240201000001_add_partitioning.sql
```
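Generating the version prefix from a timestamp keeps filenames in chronological order when sorted lexicographically; a sketch:

```python
from datetime import datetime

def migration_filename(description: str, now: datetime) -> str:
    """Build a YYYYMMDDHHMMSS_description.sql migration name."""
    return f"{now:%Y%m%d%H%M%S}_{description}.sql"

name = migration_filename("add_token_holders", datetime(2024, 1, 15, 0, 0, 1))
```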
### Migration Best Practices
1. **Backward Compatible**: Additive changes preferred
2. **Reversible**: All migrations should be reversible
3. **Tested**: Test on staging before production
4. **Documented**: Document breaking changes
5. **Rollback Plan**: Have rollback strategy
### Schema Evolution
**Adding Columns**:
- Use `ALTER TABLE ADD COLUMN` with default values
- Avoid NOT NULL without defaults (use two-step migration)
**Removing Columns**:
- Mark as deprecated first
- Remove after migration period
**Changing Types**:
- Create new column
- Migrate data
- Drop old column
- Rename new column
## Performance Optimization
### Query Optimization
**Common Query Patterns**:
1. Get block by number: Use `(chain_id, number)` index
2. Get transaction by hash: Use `(chain_id, hash)` index
3. Get address transactions: Use `(chain_id, from_address)` or `(chain_id, to_address)` index
4. Filter logs by address and event: Use `(chain_id, address, topic0)` index
### Connection Pooling
**Configuration**:
- Use connection pooler (PgBouncer, pgpool-II)
- Pool size: 20-100 connections per application server
- Statement-level pooling for better concurrency
### Read Replicas
**Strategy**:
- Primary: Write operations
- Replicas: Read operations (load balanced)
- Async replication (small lag acceptable)
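A thin router can direct each statement to the right pool; a sketch assuming named connection pools (the pool names are placeholders):

```python
import itertools

class PoolRouter:
    """Route writes to the primary and reads round-robin across replicas."""

    def __init__(self, primary: str, replicas: list):
        self._primary = primary
        self._replicas = itertools.cycle(replicas)

    def for_query(self, is_write: bool) -> str:
        return self._primary if is_write else next(self._replicas)

router = PoolRouter("pg-primary", ["pg-replica-1", "pg-replica-2"])
```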
## Backup and Recovery
### Backup Strategy
**Full Backups**: Daily full database dumps
**Incremental Backups**: Continuous WAL archiving
**Point-in-Time Recovery**: Enabled via WAL archiving
### Recovery Procedures
**RTO Target**: 1 hour
**RPO Target**: 5 minutes (max data loss)
## References
- Data Models: See `../indexing/data-models.md`
- Indexer Architecture: See `../indexing/indexer-architecture.md`
- Search Index Schema: See `search-index-schema.md`
- Multi-chain Architecture: See `../multichain/multichain-indexing.md`


@@ -0,0 +1,458 @@
# Search Index Schema Specification
## Overview
This document specifies the Elasticsearch/OpenSearch index schema for full-text search and faceted querying across blocks, transactions, addresses, tokens, and contracts.
## Architecture
```mermaid
flowchart LR
PG[(PostgreSQL<br/>Canonical Data)]
Transform[Data Transformer]
ES[(Elasticsearch<br/>Search Index)]
PG --> Transform
Transform --> ES
Query[Search Query]
Query --> ES
ES --> Results[Search Results]
```
## Index Structure
### Blocks Index
**Index Name**: `blocks-{chain_id}` (e.g., `blocks-138`)
**Document Structure**:
```json
{
"block_number": 12345,
"hash": "0x...",
"timestamp": "2024-01-01T00:00:00Z",
"miner": "0x...",
"transaction_count": 100,
"gas_used": 15000000,
"gas_limit": 20000000,
"chain_id": 138,
"parent_hash": "0x...",
"size": 1024
}
```
**Field Mappings**:
- `block_number`: `long` (not analyzed, for sorting/filtering)
- `hash`: `keyword` (exact match)
- `timestamp`: `date`
- `miner`: `keyword` (exact match)
- `transaction_count`: `integer`
- `gas_used`: `long`
- `gas_limit`: `long`
- `chain_id`: `integer`
- `parent_hash`: `keyword`
**Searchable Fields**:
- Hash (exact match)
- Miner address (exact match)
### Transactions Index
**Index Name**: `transactions-{chain_id}`
**Document Structure**:
```json
{
"hash": "0x...",
"block_number": 12345,
"transaction_index": 5,
"from_address": "0x...",
"to_address": "0x...",
"value": "1000000000000000000",
"gas_price": "20000000000",
"gas_used": 21000,
"status": "success",
"timestamp": "2024-01-01T00:00:00Z",
"chain_id": 138,
"input_data_length": 100,
"is_contract_creation": false,
"contract_address": null
}
```
**Field Mappings**:
- `hash`: `keyword`
- `block_number`: `long`
- `transaction_index`: `integer`
- `from_address`: `keyword`
- `to_address`: `keyword`
- `value`: `keyword` (decimal string; 256-bit values exceed `long` range)
- `value_numeric`: `long` (for range queries)
- `gas_price`: `long`
- `gas_used`: `long`
- `status`: `keyword`
- `timestamp`: `date`
- `chain_id`: `integer`
- `input_data_length`: `integer`
- `is_contract_creation`: `boolean`
- `contract_address`: `keyword`
**Searchable Fields**:
- Hash (exact match)
- From/to addresses (exact match)
- Value (range queries)
### Addresses Index
**Index Name**: `addresses-{chain_id}`
**Document Structure**:
```json
{
"address": "0x...",
"chain_id": 138,
"label": "My Wallet",
"tags": ["wallet", "exchange"],
"token_count": 10,
"transaction_count": 500,
"first_seen": "2024-01-01T00:00:00Z",
"last_seen": "2024-01-15T00:00:00Z",
"is_contract": true,
"contract_name": "MyToken",
"balance_eth": "1.5",
"balance_usd": "3000"
}
```
**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `label`: `text` (analyzed) + `keyword` (exact match)
- `tags`: `keyword` (array)
- `token_count`: `integer`
- `transaction_count`: `long`
- `first_seen`: `date`
- `last_seen`: `date`
- `is_contract`: `boolean`
- `contract_name`: `text` + `keyword`
- `balance_eth`: `double`
- `balance_usd`: `double`
**Searchable Fields**:
- Address (exact match, prefix match)
- Label (full-text search)
- Contract name (full-text search)
- Tags (facet filter)
### Tokens Index
**Index Name**: `tokens-{chain_id}`
**Document Structure**:
```json
{
"address": "0x...",
"chain_id": 138,
"name": "My Token",
"symbol": "MTK",
"type": "ERC20",
"decimals": 18,
"total_supply": "1000000000000000000000000",
"holder_count": 1000,
"transfer_count": 50000,
"logo_url": "https://...",
"verified": true,
"description": "A token description"
}
```
**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `name`: `text` (analyzed) + `keyword` (exact match)
- `symbol`: `keyword` (uppercase normalized)
- `type`: `keyword`
- `decimals`: `integer`
- `total_supply`: `text` (for large numbers)
- `total_supply_numeric`: `double` (for sorting)
- `holder_count`: `integer`
- `transfer_count`: `long`
- `logo_url`: `keyword`
- `verified`: `boolean`
- `description`: `text` (analyzed)
**Searchable Fields**:
- Name (full-text search)
- Symbol (exact match, prefix match)
- Address (exact match)
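The "uppercase normalized" `symbol` mapping can be expressed with a keyword normalizer. An illustrative settings-and-mappings fragment (the normalizer name is an assumption, not part of this spec):

```json
{
  "settings": {
    "analysis": {
      "normalizer": {
        "symbol_normalizer": {
          "type": "custom",
          "filter": ["uppercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "symbol": {
        "type": "keyword",
        "normalizer": "symbol_normalizer"
      }
    }
  }
}
```

With this mapping, `mtk` and `MTK` index and match as the same term while the field remains exact-match and sortable.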
### Contracts Index
**Index Name**: `contracts-{chain_id}`
**Document Structure**:
```json
{
"address": "0x...",
"chain_id": 138,
"name": "MyContract",
"verification_status": "verified",
"compiler_version": "0.8.19",
"source_code": "contract MyContract {...}",
"abi": [...],
"verified_at": "2024-01-01T00:00:00Z",
"transaction_count": 1000,
"created_at": "2024-01-01T00:00:00Z"
}
```
**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `name`: `text` + `keyword`
- `verification_status`: `keyword`
- `compiler_version`: `keyword`
- `source_code`: `text` (analyzed, indexed but not stored in full for large contracts)
- `abi`: `object` (nested, for structured queries)
- `verified_at`: `date`
- `transaction_count`: `long`
- `created_at`: `date`
**Searchable Fields**:
- Name (full-text search)
- Address (exact match)
- Source code (full-text search, limited)
## Indexing Pipeline
### Data Transformation
**Purpose**: Transform canonical PostgreSQL data into search-optimized documents.
**Transformation Steps**:
1. **Fetch Data**: Query PostgreSQL for entities to index
2. **Enrich Data**: Add computed fields (balances, counts, etc.)
3. **Normalize Data**: Normalize addresses, format values
4. **Index Document**: Send to Elasticsearch/OpenSearch
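The normalize step can be sketched as follows (pure Python; function names are illustrative and the row dict mirrors the transaction columns described above — this is a sketch, not the actual pipeline code):

```python
def normalize_address(addr):
    """Lowercase a hex address so keyword fields match exactly."""
    return addr.lower() if addr else None

def to_document(row):
    """Transform a raw transaction row into a search-ready document."""
    to_addr = normalize_address(row.get("to_address"))
    return {
        "hash": row["hash"].lower(),
        "chain_id": row["chain_id"],
        "from_address": normalize_address(row["from_address"]),
        "to_address": to_addr,
        # Contract creations have no recipient, so the flag is derived.
        "is_contract_creation": to_addr is None,
    }

doc = to_document({
    "hash": "0xABC123",
    "chain_id": 138,
    "from_address": "0xDEF456",
    "to_address": None,
})
```

Normalizing addresses to lowercase at index time keeps `keyword` exact-match queries insensitive to checksum casing.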
### Indexing Strategy
**Initial Indexing**:
- Bulk index existing data
- Process in batches (1000 documents per batch)
- Use bulk API for efficiency
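The bulk API consumes newline-delimited action/document pairs, so batching amounts to grouping pairs into payloads. A standard-library sketch (in practice an official client's bulk helper would handle this; the index-naming and `_id` choice here are assumptions):

```python
import json

def bulk_payloads(docs, index_name, batch_size=1000):
    """Yield NDJSON payloads of at most batch_size documents for the _bulk API."""
    batch = []
    for doc in docs:
        # Each document is preceded by an index action naming the target index.
        batch.append(json.dumps({"index": {"_index": index_name, "_id": doc["hash"]}}))
        batch.append(json.dumps(doc))
        if len(batch) >= 2 * batch_size:  # two lines per document
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"

docs = [{"hash": f"0x{i:x}", "chain_id": 138} for i in range(2500)]
payloads = list(bulk_payloads(docs, "transactions-138", batch_size=1000))
# 2500 documents at 1000 per batch -> 3 payloads (1000, 1000, 500)
```

Using the document hash as `_id` makes re-indexing idempotent: replaying a batch overwrites rather than duplicates.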
**Incremental Indexing**:
- Index new entities as they're created
- Update entities when changed
- Delete entities when removed
**Update Frequency**:
- Real-time: Index immediately after database insert/update
- Batch: Bulk update every N minutes for efficiency
### Index Aliases
**Purpose**: Enable zero-downtime index updates.
**Strategy**:
- Write to new index (e.g., `blocks-138-v2`)
- Build index in background
- Switch alias when ready
- Delete old index after switch
**Alias Names**:
- `blocks-{chain_id}` → points to latest version
- `transactions-{chain_id}` → points to latest version
- etc.
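Concretely, the switch is one atomic `_aliases` request (request body sketched below; index names follow the versioning scheme above):

```json
{
  "actions": [
    { "remove": { "index": "blocks-138-v1", "alias": "blocks-138" } },
    { "add": { "index": "blocks-138-v2", "alias": "blocks-138" } }
  ]
}
```

Because both actions apply atomically, queries against `blocks-138` always resolve to exactly one index version.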
## Query Patterns
### Full-Text Search
**Blocks Search** (hashes are `keyword` fields, so an exact lookup uses a `term` query rather than `match`):
```json
{
  "query": {
    "term": {
      "hash": "0x123..."
}
}
}
```
**Address Search**:
```json
{
"query": {
"bool": {
"should": [
{ "match": { "label": "wallet" } },
{ "prefix": { "address": "0x123" } }
]
}
}
}
```
**Token Search**:
```json
{
"query": {
"bool": {
"should": [
{ "match": { "name": "My Token" } },
{ "match": { "symbol": "MTK" } }
]
}
}
}
```
### Faceted Search
**Filter by Multiple Criteria**:
```json
{
"query": {
"bool": {
"must": [
{ "term": { "chain_id": 138 } },
{ "term": { "type": "ERC20" } },
{ "range": { "holder_count": { "gte": 100 } } }
]
}
},
"aggs": {
"by_type": {
"terms": { "field": "type" }
}
}
}
```
### Unified Search
**Cross-Entity Search**:
- Search across blocks, transactions, addresses, tokens
- Use `_index` field to filter by entity type
- Combine results with relevance scoring
**Multi-Index Query**:
```json
{
"query": {
"multi_match": {
"query": "0x123",
"fields": ["hash", "address", "from_address", "to_address"],
"type": "best_fields"
}
}
}
```
## Index Configuration
### Analysis Settings
**Custom Analyzer**:
- Address analyzer: Lowercase, no tokenization
- Symbol analyzer: Uppercase, no tokenization
- Text analyzer: Standard analyzer with lowercase
**Example Configuration**:
```json
{
"settings": {
"analysis": {
"analyzer": {
"address_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
}
```
### Sharding and Replication
**Sharding**:
- Number of shards: Based on index size
- Large indices (> 50GB): Multiple shards
- Small indices: Single shard
**Replication**:
- Replica count: 1-2 (for high availability)
- Increase replicas for read-heavy workloads
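Both knobs are fixed at index creation (replica count can be changed later). An illustrative creation-time settings fragment; the counts shown are deployment-dependent assumptions:

```json
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```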
## Performance Optimization
### Index Optimization
**Refresh Interval**:
- Default: 1 second
- For bulk indexing: Increase to 30 seconds, then reset
**Bulk Indexing**:
- Batch size: 1000-5000 documents
- Use bulk API
- Disable refresh during bulk indexing
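Disabling refresh for the duration of a bulk load is a single dynamic settings update; an illustrative request body:

```json
{
  "index": {
    "refresh_interval": "-1"
  }
}
```

Setting it back to `"1s"` (or the chosen steady-state interval) after the load restores near-real-time search.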
### Query Optimization
**Query Caching**:
- Enable query cache for repeated queries
- Cache filter results
**Field Data**:
- Use `doc_values` for sorting/aggregations
- Avoid `fielddata` for text fields
## Maintenance
### Index Monitoring
**Metrics**:
- Index size
- Document count
- Query performance (p50, p95, p99)
- Index lag (time behind database)
### Index Cleanup
**Strategy**:
- Delete old indices (after alias switch)
- Archive old indices to cold storage
- Compress indices for storage efficiency
## Integration with PostgreSQL
### Data Sync
**Sync Strategy**:
- Real-time: Listen to database changes (CDC, triggers, or polling)
- Batch: Periodic sync jobs
- Hybrid: Real-time for recent data, batch for historical
**Change Detection**:
- Use `updated_at` timestamp
- Use database triggers to queue changes
- Use CDC (Change Data Capture) if available
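A trigger-based queue could look like the following PostgreSQL sketch (the queue table, function, and trigger names are illustrative, and it assumes the `addresses` table from `postgres-schema.md`; the indexer drains `search_index_queue`):

```sql
CREATE TABLE search_index_queue (
    id BIGSERIAL PRIMARY KEY,
    entity_type TEXT NOT NULL,   -- 'block', 'transaction', 'address', ...
    entity_id TEXT NOT NULL,     -- hash or address
    chain_id INTEGER NOT NULL,
    queued_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE OR REPLACE FUNCTION enqueue_address_change() RETURNS TRIGGER AS $$
BEGIN
    INSERT INTO search_index_queue (entity_type, entity_id, chain_id)
    VALUES ('address', NEW.address, NEW.chain_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER addresses_search_sync
AFTER INSERT OR UPDATE ON addresses
FOR EACH ROW EXECUTE FUNCTION enqueue_address_change();
```

Draining the queue in order gives the indexer at-least-once delivery without coupling indexing latency to the write path.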
### Consistency
**Eventual Consistency**:
- Search index is eventually consistent with database
- Small lag acceptable (< 1 minute)
- Critical queries can fall back to database
## References
- Database Schema: See `postgres-schema.md`
- Indexer Architecture: See `../indexing/indexer-architecture.md`
- Unified Search: See `../multichain/unified-search.md`

# Time-Series Database Schema Specification
## Overview
This document specifies the time-series database schema using ClickHouse or TimescaleDB for storing mempool data, metrics, and analytics time-series data.
## Technology Choice
**Option 1: TimescaleDB** (PostgreSQL extension)
- Pros: PostgreSQL compatibility, SQL interface, easier integration
- Cons: Less optimized for very high throughput
**Option 2: ClickHouse**
- Pros: Very high performance, columnar storage, excellent compression
- Cons: Different SQL dialect, separate infrastructure
**Recommendation**: Start with TimescaleDB for easier integration, migrate to ClickHouse if needed for scale.
## TimescaleDB Schema
### Mempool Transactions Table
**Table**: `mempool_transactions`
```sql
CREATE TABLE mempool_transactions (
time TIMESTAMPTZ NOT NULL,
chain_id INTEGER NOT NULL,
hash VARCHAR(66) NOT NULL,
from_address VARCHAR(42) NOT NULL,
to_address VARCHAR(42),
value NUMERIC(78, 0),
gas_price BIGINT,
max_fee_per_gas BIGINT,
max_priority_fee_per_gas BIGINT,
gas_limit BIGINT,
nonce BIGINT,
input_data_length INTEGER,
first_seen TIMESTAMPTZ NOT NULL,
status VARCHAR(20) DEFAULT 'pending', -- 'pending', 'confirmed', 'dropped'
confirmed_block_number BIGINT,
confirmed_at TIMESTAMPTZ,
PRIMARY KEY (time, chain_id, hash)
);
SELECT create_hypertable('mempool_transactions', 'time');
CREATE INDEX idx_mempool_chain_hash ON mempool_transactions(chain_id, hash);
CREATE INDEX idx_mempool_chain_from ON mempool_transactions(chain_id, from_address);
CREATE INDEX idx_mempool_chain_status ON mempool_transactions(chain_id, status, time);
```
**Retention Policy**: 7 days for detailed data, aggregates for longer periods
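With TimescaleDB, this retention can be enforced automatically via the built-in policy function, which drops whole chunks past the cutoff:

```sql
-- Drop chunks older than 7 days automatically.
SELECT add_retention_policy('mempool_transactions', INTERVAL '7 days');
```

The other hypertables below would get analogous policies matching their stated retention windows.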
### Network Metrics Table
**Table**: `network_metrics`
```sql
CREATE TABLE network_metrics (
time TIMESTAMPTZ NOT NULL,
chain_id INTEGER NOT NULL,
block_number BIGINT,
tps DOUBLE PRECISION, -- Transactions per second
gps DOUBLE PRECISION, -- Gas per second
avg_gas_price BIGINT,
pending_transactions INTEGER,
block_time_seconds DOUBLE PRECISION,
PRIMARY KEY (time, chain_id)
);
SELECT create_hypertable('network_metrics', 'time');
CREATE INDEX idx_network_metrics_chain_time ON network_metrics(chain_id, time DESC);
```
**Aggregation**: Pre-aggregate to 1-minute, 5-minute, 1-hour intervals
### Gas Price History Table
**Table**: `gas_price_history`
```sql
CREATE TABLE gas_price_history (
time TIMESTAMPTZ NOT NULL,
chain_id INTEGER NOT NULL,
block_number BIGINT,
min_gas_price BIGINT,
max_gas_price BIGINT,
avg_gas_price BIGINT,
p25_gas_price BIGINT, -- 25th percentile
p50_gas_price BIGINT, -- 50th percentile (median)
p75_gas_price BIGINT, -- 75th percentile
p95_gas_price BIGINT, -- 95th percentile
p99_gas_price BIGINT, -- 99th percentile
PRIMARY KEY (time, chain_id)
);
SELECT create_hypertable('gas_price_history', 'time');
```
### Address Activity Metrics Table
**Table**: `address_activity_metrics`
```sql
CREATE TABLE address_activity_metrics (
time TIMESTAMPTZ NOT NULL,
chain_id INTEGER NOT NULL,
address VARCHAR(42) NOT NULL,
transaction_count INTEGER,
received_count INTEGER,
sent_count INTEGER,
total_received NUMERIC(78, 0),
total_sent NUMERIC(78, 0),
PRIMARY KEY (time, chain_id, address)
);
SELECT create_hypertable('address_activity_metrics', 'time',
chunk_time_interval => INTERVAL '1 day');
CREATE INDEX idx_address_activity_chain_address ON address_activity_metrics(chain_id, address, time DESC);
```
**Aggregation**: Pre-aggregate to hourly/daily for addresses
## ClickHouse Schema (Alternative)
### Mempool Transactions Table
```sql
CREATE TABLE mempool_transactions (
time DateTime('UTC') NOT NULL,
chain_id UInt32 NOT NULL,
hash String NOT NULL,
from_address String NOT NULL,
to_address Nullable(String),
value Decimal128(0),
gas_price UInt64,
max_fee_per_gas Nullable(UInt64),
max_priority_fee_per_gas Nullable(UInt64),
gas_limit UInt64,
nonce UInt64,
input_data_length UInt32,
first_seen DateTime('UTC') NOT NULL,
status String DEFAULT 'pending',
confirmed_block_number Nullable(UInt64),
confirmed_at Nullable(DateTime('UTC'))
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (chain_id, time, hash)
TTL time + INTERVAL 7 DAY; -- Auto-delete after 7 days
```
## Data Retention and Aggregation
### Retention Policies
**Raw Data**:
- Mempool transactions: 7 days
- Network metrics: 30 days
- Gas price history: 90 days
- Address activity: 30 days
**Aggregated Data**:
- 1-minute aggregates: 90 days
- 5-minute aggregates: 1 year
- 1-hour aggregates: 5 years
- Daily aggregates: Indefinite
### Continuous Aggregates (TimescaleDB)
```sql
-- 1-minute network metrics aggregate
CREATE MATERIALIZED VIEW network_metrics_1m
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 minute', time) AS bucket,
chain_id,
AVG(tps) AS avg_tps,
AVG(gps) AS avg_gps,
AVG(avg_gas_price) AS avg_gas_price,
AVG(pending_transactions) AS avg_pending_tx
FROM network_metrics
GROUP BY bucket, chain_id;
-- Add refresh policy
SELECT add_continuous_aggregate_policy('network_metrics_1m',
start_offset => INTERVAL '1 hour',
end_offset => INTERVAL '1 minute',
schedule_interval => INTERVAL '1 minute');
```
## Query Patterns
### Recent Mempool Transactions
```sql
SELECT * FROM mempool_transactions
WHERE chain_id = 138
AND time > NOW() - INTERVAL '1 hour'
AND status = 'pending'
ORDER BY time DESC
LIMIT 100;
```
### Gas Price Statistics
```sql
SELECT
time_bucket('5 minutes', time) AS bucket,
AVG(avg_gas_price) AS avg_gas_price,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY avg_gas_price) AS median_gas_price
FROM gas_price_history
WHERE chain_id = 138
AND time > NOW() - INTERVAL '24 hours'
GROUP BY bucket
ORDER BY bucket DESC;
```
### Network Throughput
```sql
SELECT
time_bucket('1 minute', time) AS bucket,
AVG(tps) AS avg_tps,
MAX(tps) AS max_tps
FROM network_metrics
WHERE chain_id = 138
AND time > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket DESC;
```
## References
- Mempool Service: See `../mempool/mempool-service.md`
- Observability: See `../observability/metrics-monitoring.md`