Add full monorepo: virtual-banker, backend, frontend, docs, scripts, deployment
Co-authored-by: Cursor <cursoragent@cursor.com>
`docs/specs/database/data-lake-schema.md` (new file, 294 lines)
# Data Lake Schema Specification

## Overview

This document specifies the data lake schema for long-term storage of blockchain data in S3-compatible object storage, using the Parquet format for analytics, ML, and compliance purposes.

## Storage Structure

### Directory Layout

```
s3://explorer-data-lake/
├── raw/
│   ├── chain_id=138/
│   │   ├── year=2024/
│   │   │   ├── month=01/
│   │   │   │   ├── day=01/
│   │   │   │   │   ├── blocks.parquet
│   │   │   │   │   ├── transactions.parquet
│   │   │   │   │   └── logs.parquet
│   │   │   │   └── ...
│   │   │   └── ...
│   │   └── ...
│   └── ...
├── processed/
│   ├── chain_id=138/
│   │   ├── daily_aggregates/
│   │   │   └── year=2024/
│   │   │       └── month=01/
│   │   │           └── day=01.parquet
│   │   └── ...
│   └── ...
└── archived/
    └── ...
```

### Partitioning Strategy

**Partition Keys**:
- `chain_id`: Chain identifier
- `year`: Year (YYYY)
- `month`: Month (MM)
- `day`: Day (DD)

**Benefits**:
- Efficient query pruning
- Parallel processing
- Easy data management (delete by partition)
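
The layout above maps a chain and a date to a deterministic key prefix. A minimal sketch of that mapping (the helper name is illustrative, not part of the spec):

```python
from datetime import date

def partition_prefix(chain_id: int, d: date, layer: str = "raw") -> str:
    """Build the Hive-style S3 key prefix for one chain-day of data."""
    return (
        f"{layer}/chain_id={chain_id}/"
        f"year={d.year}/month={d.month:02d}/day={d.day:02d}/"
    )

print(partition_prefix(138, date(2024, 1, 1)))
# raw/chain_id=138/year=2024/month=01/day=01/
```

Zero-padding the month and day keeps prefixes lexicographically sortable and matches the directory layout above.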

## Parquet Schema

### Blocks Parquet Schema

```json
{
  "type": "struct",
  "fields": [
    {"name": "chain_id", "type": "integer", "nullable": false},
    {"name": "number", "type": "long", "nullable": false},
    {"name": "hash", "type": "string", "nullable": false},
    {"name": "parent_hash", "type": "string", "nullable": false},
    {"name": "timestamp", "type": "timestamp", "nullable": false},
    {"name": "miner", "type": "string", "nullable": true},
    {"name": "gas_used", "type": "long", "nullable": true},
    {"name": "gas_limit", "type": "long", "nullable": true},
    {"name": "transaction_count", "type": "integer", "nullable": true},
    {"name": "size", "type": "integer", "nullable": true}
  ]
}
```

### Transactions Parquet Schema

```json
{
  "type": "struct",
  "fields": [
    {"name": "chain_id", "type": "integer", "nullable": false},
    {"name": "hash", "type": "string", "nullable": false},
    {"name": "block_number", "type": "long", "nullable": false},
    {"name": "transaction_index", "type": "integer", "nullable": false},
    {"name": "from_address", "type": "string", "nullable": false},
    {"name": "to_address", "type": "string", "nullable": true},
    {"name": "value", "type": "string", "nullable": false},
    {"name": "gas_price", "type": "long", "nullable": true},
    {"name": "gas_used", "type": "long", "nullable": true},
    {"name": "gas_limit", "type": "long", "nullable": false},
    {"name": "status", "type": "integer", "nullable": true},
    {"name": "timestamp", "type": "timestamp", "nullable": false}
  ]
}
```

`value` is stored as a decimal string so full uint256 precision survives the round trip.

### Logs Parquet Schema

```json
{
  "type": "struct",
  "fields": [
    {"name": "chain_id", "type": "integer", "nullable": false},
    {"name": "transaction_hash", "type": "string", "nullable": false},
    {"name": "block_number", "type": "long", "nullable": false},
    {"name": "log_index", "type": "integer", "nullable": false},
    {"name": "address", "type": "string", "nullable": false},
    {"name": "topic0", "type": "string", "nullable": true},
    {"name": "topic1", "type": "string", "nullable": true},
    {"name": "topic2", "type": "string", "nullable": true},
    {"name": "topic3", "type": "string", "nullable": true},
    {"name": "data", "type": "string", "nullable": true},
    {"name": "timestamp", "type": "timestamp", "nullable": false}
  ]
}
```

### Token Transfers Parquet Schema

```json
{
  "type": "struct",
  "fields": [
    {"name": "chain_id", "type": "integer", "nullable": false},
    {"name": "transaction_hash", "type": "string", "nullable": false},
    {"name": "block_number", "type": "long", "nullable": false},
    {"name": "token_address", "type": "string", "nullable": false},
    {"name": "token_type", "type": "string", "nullable": false},
    {"name": "from_address", "type": "string", "nullable": false},
    {"name": "to_address", "type": "string", "nullable": false},
    {"name": "amount", "type": "string", "nullable": true},
    {"name": "token_id", "type": "string", "nullable": true},
    {"name": "timestamp", "type": "timestamp", "nullable": false}
  ]
}
```

## Data Ingestion

### ETL Pipeline

**Process**:
1. Extract: Query PostgreSQL for the previous day's data
2. Transform: Convert to Parquet format
3. Load: Upload to S3 with partitioning

**Schedule**: Daily batch job that runs after the day closes

**Tools**: Apache Spark, AWS Glue, or custom ETL scripts

### Compression

**Format**: Snappy compression (good balance of speed and compression ratio)

**Alternative**: Gzip (better compression ratio, slower)

### File Sizing

**Target Size**: 100-500 MB per Parquet file
- Smaller files: better parallelism
- Larger files: better compression

**Strategy**: Write files near the target size, or split by time ranges
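
Planning the output file count for a day is simple arithmetic against the target; a sketch assuming a 256 MiB target (which sits inside the 100-500 MB guidance above; the helper name is illustrative):

```python
import math

def plan_output_files(estimated_bytes: int,
                      target_bytes: int = 256 * 1024 * 1024) -> int:
    """Number of Parquet files so each lands near the target size."""
    return max(1, math.ceil(estimated_bytes / target_bytes))

# A 2 GiB day of transactions at a 256 MiB target -> 8 files.
print(plan_output_files(2 * 1024**3))
```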

## Query Interface

### AWS Athena / Presto

**Table Definition** (partition columns are declared via `PARTITIONED BY` and resolved through partition projection):
```sql
CREATE EXTERNAL TABLE blocks_138 (
    chain_id int,
    number bigint,
    hash string,
    parent_hash string,
    `timestamp` timestamp,
    miner string,
    gas_used bigint,
    gas_limit bigint,
    transaction_count int,
    size int
)
PARTITIONED BY (year int, month int, day int)
STORED AS PARQUET
LOCATION 's3://explorer-data-lake/raw/chain_id=138/'
TBLPROPERTIES (
    'projection.enabled' = 'true',
    'projection.year.type' = 'integer',
    'projection.year.range' = '2020,2030',
    'projection.month.type' = 'integer',
    'projection.month.range' = '1,12',
    'projection.month.digits' = '2',
    'projection.day.type' = 'integer',
    'projection.day.range' = '1,31',
    'projection.day.digits' = '2'
);
```

The `digits` properties keep projected partition values zero-padded (`month=01`) to match the directory layout.

### Query Examples

**Daily Transaction Count**:
```sql
SELECT
    DATE(timestamp) AS date,
    COUNT(*) AS transaction_count
FROM transactions_138
WHERE year = 2024 AND month = 1
GROUP BY DATE(timestamp)
ORDER BY date;
```

**Token Transfer Analytics**:
```sql
SELECT
    token_address,
    COUNT(*) AS transfer_count,
    SUM(CAST(amount AS DECIMAL(38, 0))) AS total_volume -- Athena caps DECIMAL precision at 38 digits
FROM token_transfers_138
WHERE year = 2024 AND month = 1
GROUP BY token_address
ORDER BY total_volume DESC
LIMIT 100;
```
## Data Retention

### Retention Policies

**Raw Data**: 7 years (compliance requirement)
**Processed Aggregates**: Indefinite
**Archived Data**: Move to Glacier after 1 year

### Lifecycle Policies

**S3 Lifecycle Rules**:
1. Move to Infrequent Access after 30 days
2. Move to Glacier after 1 year
3. Delete after 7 years (raw data)
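
The three rules above can be expressed as one S3 lifecycle configuration; a minimal sketch for the `raw/` prefix (the rule ID is illustrative, and 2555 days ≈ 7 years):

```json
{
  "Rules": [
    {
      "ID": "raw-data-tiering",
      "Filter": { "Prefix": "raw/" },
      "Status": "Enabled",
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "GLACIER" }
      ],
      "Expiration": { "Days": 2555 }
    }
  ]
}
```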

## Data Processing

### Aggregation Jobs

**Daily Aggregates**:
- Transaction counts by hour
- Gas usage statistics
- Token transfer volumes
- Address activity metrics

**Monthly Aggregates**:
- Network growth metrics
- Token distribution changes
- Protocol usage statistics
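
As an illustration, a daily gas-usage aggregate over the raw transactions zone might look like the following (the output table `daily_gas_stats_138` is hypothetical; `transactions_138` follows the Athena definition above):

```sql
-- Hypothetical daily aggregation job: one run per day, over one partition
INSERT INTO daily_gas_stats_138
SELECT
    DATE(timestamp) AS date,
    HOUR(timestamp) AS hour,
    COUNT(*) AS transaction_count,
    SUM(gas_used) AS total_gas_used,
    AVG(gas_price) AS avg_gas_price
FROM transactions_138
WHERE year = 2024 AND month = 1 AND day = 1
GROUP BY DATE(timestamp), HOUR(timestamp);
```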

### ML/Analytics Workflows

**Use Cases**:
- Anomaly detection
- Fraud detection
- Market analysis
- Network health monitoring

**Tools**: Spark, Pandas, Jupyter notebooks

## Security and Access Control

### Access Control

**IAM Policies**: Restrict access to specific prefixes
**Encryption**: Server-side encryption (SSE-S3 or SSE-KMS)
**Audit Logging**: Enable S3 access logging

### Data Classification

**Public Data**: Blocks, transactions (public blockchain data)
**Sensitive Data**: User addresses, labels (requires authentication)
**Compliance Data**: Banking/transaction data (strict access control)

## Cost Optimization

### Storage Optimization

**Strategies**:
- Use appropriate storage classes (Standard, IA, Glacier)
- Compress data (Parquet + Snappy)
- Delete old data per retention policy
- Use intelligent tiering

### Query Optimization

**Strategies**:
- Partition pruning (query only relevant partitions)
- Column pruning (select only needed columns)
- Predicate pushdown (filter early)

## References

- Database Schema: See `postgres-schema.md`
- Analytics: See `../observability/metrics-monitoring.md`

`docs/specs/database/graph-schema.md` (new file, 300 lines)
# Graph Database Schema Specification

## Overview

This document specifies the Neo4j graph database schema for storing cross-chain entity relationships, address clustering, and protocol interactions.

## Schema Design

### Node Types

#### Address Node

**Labels**: `Address`, `Chain{chain_id}` (e.g., `Chain138`)

**Properties**:
```cypher
{
  address: "0x...",       // Unique identifier (per chain)
  chainId: 138,           // Chain ID
  label: "My Wallet",     // Optional label
  isContract: false,      // Is contract address
  firstSeen: timestamp,   // First seen timestamp
  lastSeen: timestamp,    // Last seen timestamp
  transactionCount: 100,  // Transaction count
  balance: "1.5"          // Current balance (string for precision)
}
```

**Constraints**:
```cypher
CREATE CONSTRAINT address_address_chain_id FOR (a:Address)
REQUIRE (a.address, a.chainId) IS UNIQUE;
```

#### Contract Node

**Labels**: `Contract`, `Address`

**Properties**: Inherits from Address, plus:
```cypher
{
  name: "MyToken",
  verificationStatus: "verified",
  compilerVersion: "0.8.19"
}
```

#### Token Node

**Labels**: `Token`, `Contract`

**Properties**: Inherits from Contract, plus:
```cypher
{
  symbol: "MTK",
  decimals: 18,
  totalSupply: "1000000",
  type: "ERC20" // ERC20, ERC721, ERC1155
}
```

#### Protocol Node

**Labels**: `Protocol`

**Properties**:
```cypher
{
  name: "Uniswap V3",
  category: "DEX",
  website: "https://uniswap.org"
}
```

### Relationship Types

#### TRANSFERRED_TO

**Purpose**: Track token transfers between addresses.

**Properties**:
```cypher
{
  amount: "1000000000000000000",
  tokenAddress: "0x...",
  transactionHash: "0x...",
  blockNumber: 12345,
  timestamp: timestamp
}
```

**Example**:
```cypher
(a1:Address {address: "0x..."})-[r:TRANSFERRED_TO {
  amount: "1000000000000000000",
  tokenAddress: "0x...",
  transactionHash: "0x..."
}]->(a2:Address {address: "0x..."})
```

#### CALLED

**Purpose**: Track contract calls between addresses.

**Properties**:
```cypher
{
  transactionHash: "0x...",
  blockNumber: 12345,
  timestamp: timestamp,
  gasUsed: 21000,
  method: "transfer"
}
```

#### OWNS

**Purpose**: Track token ownership (current balances).

**Properties**:
```cypher
{
  balance: "1000000000000000000",
  tokenId: "123", // For ERC-721/1155
  updatedAt: timestamp
}
```

**Example**:
```cypher
(a:Address)-[r:OWNS {
  balance: "1000000000000000000",
  updatedAt: timestamp
}]->(t:Token)
```

#### INTERACTS_WITH

**Purpose**: Track protocol interactions.

**Properties**:
```cypher
{
  interactionType: "swap", // swap, deposit, withdraw, etc.
  transactionHash: "0x...",
  timestamp: timestamp
}
```

**Example**:
```cypher
(a:Address)-[r:INTERACTS_WITH {
  interactionType: "swap",
  transactionHash: "0x..."
}]->(p:Protocol)
```

#### CLUSTERED_WITH

**Purpose**: Link addresses that belong to the same entity (address clustering).

**Properties**:
```cypher
{
  confidence: 0.95,     // Clustering confidence score
  method: "heuristic",  // Clustering method
  createdAt: timestamp
}
```

#### CCIP_MESSAGE_LINK

**Purpose**: Link transactions across chains via CCIP messages.

**Properties**:
```cypher
{
  messageId: "0x...",
  sourceTxHash: "0x...",
  destTxHash: "0x...",
  status: "delivered",
  timestamp: timestamp
}
```

**Example**:
```cypher
(srcTx:Transaction)-[r:CCIP_MESSAGE_LINK {
  messageId: "0x...",
  status: "delivered"
}]->(destTx:Transaction)
```
## Query Patterns

### Find Token Holders

```cypher
MATCH (t:Token {address: "0x...", chainId: 138})-[r:OWNS]-(a:Address)
WHERE toFloat(r.balance) > 0 // comparing the string directly would be lexicographic
RETURN a.address, r.balance
ORDER BY toFloat(r.balance) DESC
LIMIT 100;
```

### Find Transfer Path

```cypher
MATCH path = (a1:Address {address: "0x..."})-[:TRANSFERRED_TO*1..3]-(a2:Address {address: "0x..."})
WHERE ALL(r IN relationships(path) WHERE r.tokenAddress = "0x...")
RETURN path
LIMIT 10;
```

### Find Protocol Users

```cypher
MATCH (a:Address)-[r:INTERACTS_WITH]->(p:Protocol {name: "Uniswap V3"})
RETURN a.address, count(r) AS interactionCount
ORDER BY interactionCount DESC
LIMIT 100;
```

### Address Clustering

```cypher
MATCH (a1:Address)-[r:CLUSTERED_WITH]-(a2:Address)
WHERE a1.address = "0x..."
RETURN a2.address, r.confidence, r.method;
```

### Cross-Chain CCIP Links

```cypher
MATCH (srcTx:Transaction {hash: "0x..."})-[r:CCIP_MESSAGE_LINK]-(destTx:Transaction)
RETURN srcTx, r, destTx;
```
## Data Ingestion

### Transaction Ingestion

**Process**:
1. Process the transaction from the indexer
2. Create/update address nodes
3. Create TRANSFERRED_TO relationships for token transfers
4. Create CALLED relationships for contract calls
5. Update OWNS relationships for token balances
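
Steps 2-3 can be sketched for a single decoded transfer as one idempotent Cypher statement (parameter names are illustrative):

```cypher
// Upsert both endpoints, then the transfer edge (idempotent via MERGE)
MERGE (src:Address {address: $from, chainId: $chainId})
  ON CREATE SET src.firstSeen = $timestamp
SET src.lastSeen = $timestamp
MERGE (dst:Address {address: $to, chainId: $chainId})
  ON CREATE SET dst.firstSeen = $timestamp
SET dst.lastSeen = $timestamp
MERGE (src)-[r:TRANSFERRED_TO {transactionHash: $txHash, tokenAddress: $token}]->(dst)
  ON CREATE SET r.amount = $amount,
                r.blockNumber = $blockNumber,
                r.timestamp = $timestamp;
```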

### Batch Ingestion

**Strategy**:
- Use batched writes (e.g., `UNWIND` over parameter lists) for bulk inserts
- Batch size: 1,000-10,000 operations
- Use transactions for atomicity

### Incremental Updates

**Process**:
- Update relationships as new transactions are processed
- Maintain OWNS relationships (update balances)
- Add new relationships for new interactions
## Performance Optimization

### Indexing

**Indexes**:
```cypher
CREATE INDEX address_address FOR (a:Address) ON (a.address);
CREATE INDEX address_chain_id FOR (a:Address) ON (a.chainId);
CREATE INDEX transaction_hash FOR (t:Transaction) ON (t.hash);
```

### Relationship Constraints

**Uniqueness**: Use MERGE to avoid duplicate relationships

**Example**:
```cypher
MATCH (a1:Address {address: "0x...", chainId: 138})
MATCH (a2:Address {address: "0x...", chainId: 138})
MERGE (a1)-[r:TRANSFERRED_TO {
  transactionHash: "0x..."
}]->(a2)
ON CREATE SET r.amount = "1000000", r.timestamp = timestamp();
```
## Data Retention

**Strategy**:
- Keep all current relationships
- Archive relationships older than 1 year to a separate database
- Keep aggregated statistics (interaction counts) instead of all relationships

## References

- Entity Graph: See `../multichain/entity-graph.md`
- CCIP Integration: See `../ccip/ccip-tracking.md`

`docs/specs/database/postgres-schema.md` (new file, 517 lines)
# PostgreSQL Database Schema Specification

## Overview

This document specifies the complete PostgreSQL database schema for the explorer platform. The schema is designed to support multi-chain operation, high-performance queries, and data consistency.

## Schema Design Principles

1. **Multi-chain Support**: All tables include `chain_id` for chain isolation
2. **Normalization**: Normalized structure to avoid data duplication
3. **Performance**: Strategic indexing for common query patterns
4. **Consistency**: Foreign key constraints where appropriate
5. **Extensibility**: JSONB columns for flexible data storage
6. **Partitioning**: Large tables partitioned by `chain_id`

## Core Tables

### Blocks Table

See `../indexing/data-models.md` for the detailed block schema.

**Partitioning**: Partition by `chain_id` for large deployments.

**Key Indexes**:
- Primary: `(chain_id, number)`
- Unique: `(chain_id, hash)`
- Index: `(chain_id, timestamp)` for time-range queries

### Transactions Table

See `../indexing/data-models.md` for the detailed transaction schema.

**Key Indexes**:
- Primary: `(chain_id, hash)`
- Index: `(chain_id, block_number, transaction_index)` for block queries
- Index: `(chain_id, from_address)` for address queries
- Index: `(chain_id, to_address)` for address queries
- Index: `(chain_id, block_number, from_address)` for compound queries

### Logs Table

See `../indexing/data-models.md` for the detailed log schema.

**Key Indexes**:
- Primary: `(chain_id, transaction_hash, log_index)`
- Index: `(chain_id, address)` for contract event queries
- Index: `(chain_id, topic0)` for event type queries
- Index: `(chain_id, address, topic0)` for filtered event queries
- Index: `(chain_id, block_number)` for block-based queries

### Traces Table

See `../indexing/data-models.md` for the detailed trace schema.

**Key Indexes**:
- Primary: `(chain_id, transaction_hash, trace_address)`
- Index: `(chain_id, action_from)` for address queries
- Index: `(chain_id, action_to)` for address queries
- Index: `(chain_id, block_number)` for block queries

### Internal Transactions Table

See `../indexing/data-models.md` for the detailed internal transaction schema.

**Key Indexes**:
- Primary: `(chain_id, transaction_hash, trace_address)`
- Index: `(chain_id, from_address)`
- Index: `(chain_id, to_address)`
- Index: `(chain_id, block_number)`
## Token Tables

### Tokens Table

```sql
CREATE TABLE tokens (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    type VARCHAR(10) NOT NULL CHECK (type IN ('ERC20', 'ERC721', 'ERC1155')),
    name VARCHAR(255),
    symbol VARCHAR(50),
    decimals INTEGER CHECK (decimals >= 0 AND decimals <= 18),
    total_supply NUMERIC(78, 0),
    holder_count INTEGER DEFAULT 0,
    transfer_count INTEGER DEFAULT 0,
    logo_url TEXT,
    website_url TEXT,
    description TEXT,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- the partition key must be part of the primary key
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

-- (chain_id, address) is already indexed by the UNIQUE constraint
CREATE INDEX idx_tokens_chain_type ON tokens(chain_id, type);
CREATE INDEX idx_tokens_chain_symbol ON tokens(chain_id, symbol);
```

### Token Transfers Table

```sql
CREATE TABLE token_transfers (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    transaction_hash VARCHAR(66) NOT NULL,
    block_number BIGINT NOT NULL,
    log_index INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    token_type VARCHAR(10) NOT NULL CHECK (token_type IN ('ERC20', 'ERC721', 'ERC1155')),
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42) NOT NULL,
    amount NUMERIC(78, 0),
    token_id VARCHAR(78),
    operator VARCHAR(42),
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- the partition key must be part of the primary key
    FOREIGN KEY (chain_id, transaction_hash) REFERENCES transactions(chain_id, hash),
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address),
    UNIQUE (chain_id, transaction_hash, log_index)
) PARTITION BY LIST (chain_id);

CREATE INDEX idx_token_transfers_chain_token ON token_transfers(chain_id, token_address);
CREATE INDEX idx_token_transfers_chain_from ON token_transfers(chain_id, from_address);
CREATE INDEX idx_token_transfers_chain_to ON token_transfers(chain_id, to_address);
CREATE INDEX idx_token_transfers_chain_tx ON token_transfers(chain_id, transaction_hash);
CREATE INDEX idx_token_transfers_chain_block ON token_transfers(chain_id, block_number);
CREATE INDEX idx_token_transfers_chain_token_from ON token_transfers(chain_id, token_address, from_address);
CREATE INDEX idx_token_transfers_chain_token_to ON token_transfers(chain_id, token_address, to_address);
```

### Token Holders Table (Optional)

**Purpose**: Maintain current token balances for efficient queries.

```sql
CREATE TABLE token_holders (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    token_address VARCHAR(42) NOT NULL,
    address VARCHAR(42) NOT NULL,
    balance NUMERIC(78, 0) NOT NULL DEFAULT 0,
    token_id VARCHAR(78), -- For ERC-721/1155
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- the partition key must be part of the primary key
    FOREIGN KEY (chain_id, token_address) REFERENCES tokens(chain_id, address)
) PARTITION BY LIST (chain_id);

-- UNIQUE table constraints cannot contain expressions; use a unique index instead
CREATE UNIQUE INDEX idx_token_holders_identity
    ON token_holders(chain_id, token_address, address, COALESCE(token_id, ''));

CREATE INDEX idx_token_holders_chain_token ON token_holders(chain_id, token_address);
CREATE INDEX idx_token_holders_chain_address ON token_holders(chain_id, address);
```
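
Balance maintenance can then upsert against that uniqueness over `(chain_id, token_address, address, COALESCE(token_id, ''))`; a minimal sketch, where the addresses are placeholders and the inserted `balance` is treated as a signed delta:

```sql
INSERT INTO token_holders (chain_id, token_address, address, balance)
VALUES (138, '0xToken...', '0xHolder...', 1000)
ON CONFLICT (chain_id, token_address, address, (COALESCE(token_id, '')))
DO UPDATE SET balance = token_holders.balance + EXCLUDED.balance,
              updated_at = NOW();
```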

## Contract Tables

### Contracts Table

```sql
CREATE TABLE contracts (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    name VARCHAR(255),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_status VARCHAR(20) NOT NULL CHECK (verification_status IN ('pending', 'verified', 'failed')),
    verified_at TIMESTAMP,
    verification_method VARCHAR(50),
    license VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- the partition key must be part of the primary key
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

-- (chain_id, address) is already indexed by the UNIQUE constraint
CREATE INDEX idx_contracts_chain_verified ON contracts(chain_id, verification_status);
CREATE INDEX idx_contracts_abi_gin ON contracts USING GIN (abi); -- For ABI queries
```

### Contract ABIs Table

```sql
CREATE TABLE contract_abis (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    abi JSONB NOT NULL,
    source VARCHAR(50) NOT NULL,
    verified BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (chain_id, id), -- the partition key must be part of the primary key
    UNIQUE (chain_id, address)
) PARTITION BY LIST (chain_id);

-- (chain_id, address) is already indexed by the UNIQUE constraint
CREATE INDEX idx_abis_abi_gin ON contract_abis USING GIN (abi);
```

### Contract Verifications Table

```sql
CREATE TABLE contract_verifications (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    status VARCHAR(20) NOT NULL CHECK (status IN ('pending', 'processing', 'verified', 'failed', 'partially_verified')),
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_method VARCHAR(50),
    error_message TEXT,
    verified_at TIMESTAMP,
    version INTEGER DEFAULT 1,
    is_active BOOLEAN DEFAULT true,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, address) REFERENCES contracts(chain_id, address)
);

CREATE INDEX idx_verifications_chain_address ON contract_verifications(chain_id, address);
CREATE INDEX idx_verifications_status ON contract_verifications(status);
```

## Address-Related Tables

### Address Labels Table

```sql
CREATE TABLE address_labels (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255) NOT NULL,
    label_type VARCHAR(20) NOT NULL CHECK (label_type IN ('user', 'public', 'contract_name')),
    user_id UUID,
    source VARCHAR(50),
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (id),
    -- NULL user_id rows are never treated as duplicates; consider NULLS NOT DISTINCT (PostgreSQL 15+)
    UNIQUE (chain_id, address, label_type, user_id),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_labels_chain_address ON address_labels(chain_id, address);
CREATE INDEX idx_labels_chain_user ON address_labels(chain_id, user_id);
```

### Address Tags Table

```sql
CREATE TABLE address_tags (
    id BIGSERIAL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    tag VARCHAR(50) NOT NULL,
    tag_type VARCHAR(20) NOT NULL CHECK (tag_type IN ('category', 'risk', 'protocol')),
    user_id UUID,
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (id),
    UNIQUE (chain_id, address, tag, user_id),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_tags_chain_address ON address_tags(chain_id, address);
CREATE INDEX idx_tags_chain_tag ON address_tags(chain_id, tag);
```

## User Tables

### Users Table

```sql
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE,
    username VARCHAR(100) UNIQUE,
    password_hash TEXT,
    api_key_hash TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    last_login_at TIMESTAMP
);

-- email and username are already indexed by their UNIQUE constraints
```

### Watchlists Table

```sql
CREATE TABLE watchlists (
    id BIGSERIAL,
    user_id UUID NOT NULL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    label VARCHAR(255),
    created_at TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (id),
    UNIQUE (user_id, chain_id, address),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_watchlists_user ON watchlists(user_id);
CREATE INDEX idx_watchlists_chain_address ON watchlists(chain_id, address);
```

### API Keys Table

```sql
CREATE TABLE api_keys (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL,
    key_hash TEXT NOT NULL UNIQUE,
    name VARCHAR(255),
    tier VARCHAR(20) NOT NULL CHECK (tier IN ('free', 'pro', 'enterprise')),
    rate_limit_per_second INTEGER,
    rate_limit_per_minute INTEGER,
    ip_whitelist TEXT[], -- Array of CIDR blocks
    last_used_at TIMESTAMP,
    expires_at TIMESTAMP,
    revoked BOOLEAN DEFAULT false,
    created_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (user_id) REFERENCES users(id) ON DELETE CASCADE
);

CREATE INDEX idx_api_keys_user ON api_keys(user_id);
-- key_hash is already indexed by its UNIQUE constraint
```

## Multi-Chain Partitioning

### Partitioning Strategy

**Large Tables**: Partition by `chain_id` using LIST partitioning.

**Tables to Partition**:
- `blocks`
- `transactions`
- `logs`
- `traces`
- `internal_transactions`
- `token_transfers`
- `tokens`
- `token_holders` (if used)

### Partition Creation

**Example for blocks table**:

```sql
-- Create parent table
CREATE TABLE blocks (
    -- columns
) PARTITION BY LIST (chain_id);

-- Create partitions
CREATE TABLE blocks_chain_138 PARTITION OF blocks
    FOR VALUES IN (138);

CREATE TABLE blocks_chain_1 PARTITION OF blocks
    FOR VALUES IN (1);

-- Indexes created on the parent are automatically created on each partition
```
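
When rows arrive for a chain before its partition exists, a DEFAULT partition can act as a catch-all (a sketch; splitting its rows out later requires detaching and re-inserting):

```sql
-- Optional catch-all for chain_ids without a dedicated partition yet
CREATE TABLE blocks_default PARTITION OF blocks DEFAULT;
```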

**Benefits**:
- Faster queries (partition pruning)
- Easier maintenance (per-chain operations)
- Parallel processing
- Data isolation
## Indexing Strategy

### Index Types

1. **B-tree**: Default for most indexes (equality, range, sorting)
2. **Hash**: For exact match only (rarely used; B-tree is usually better)
3. **GIN**: For JSONB columns (ABIs, decoded data)
4. **BRIN**: For large ordered columns (block numbers, timestamps)
5. **Partial**: For filtered indexes (e.g., verified contracts only)
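
Concrete instances of types 4 and 5 on the tables above (index names are illustrative, and `blocks.number` is assumed from the data models spec):

```sql
-- BRIN suits the monotonically increasing block number column
CREATE INDEX idx_blocks_number_brin ON blocks USING BRIN (number);

-- Partial index covering only verified contracts
CREATE INDEX idx_contracts_verified_only ON contracts (chain_id, address)
    WHERE verification_status = 'verified';
```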

### Index Maintenance

**Regular Maintenance**:
- `VACUUM ANALYZE` regularly (auto-vacuum enabled)
- `REINDEX` if needed (bloat, corruption)
- Monitor index usage (`pg_stat_user_indexes`)

**Index Monitoring**:
- Track index sizes
- Monitor index bloat
- Remove unused indexes
## Data Retention and Archiving

### Retention Policies

**Hot Data**: Recent data (last 1 year)
- Fast access required
- All indexes maintained

**Warm Data**: Older data (1-5 years)
- Archive to slower storage
- Reduced indexing

**Cold Data**: Very old data (5+ years)
- Archive to object storage
- Minimal indexing

### Archiving Strategy

**Approach**:
1. Partition tables by time ranges (monthly/yearly)
2. Move old partitions to archive storage
3. Query archive when needed (slower but available)

**Implementation**:
- Use PostgreSQL table partitioning by date range
- Move partitions to archive storage (S3, etc.)
- Query via foreign data wrappers if needed

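An archiving job needs to classify each time partition into one of the three tiers above; a minimal sketch of that decision (thresholds taken from the retention policy):

```python
from datetime import datetime, timedelta, timezone

def storage_tier(record_time, now=None):
    """Map a record's age onto the hot/warm/cold tiers defined above."""
    now = now or datetime.now(timezone.utc)
    age = now - record_time
    if age <= timedelta(days=365):        # last 1 year
        return "hot"
    if age <= timedelta(days=5 * 365):    # 1-5 years
        return "warm"
    return "cold"                          # 5+ years
```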
## Migration Strategy

### Versioning

**Migration Tool**: Use a migration tool (Flyway, Liquibase, or custom).

**Versioning Format**: `YYYYMMDDHHMMSS_description.sql`

**Example**:
```
20240101000001_initial_schema.sql
20240115000001_add_token_holders.sql
20240201000001_add_partitioning.sql
```

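Generating a filename in this format from the current UTC time can be sketched as (the helper name is illustrative):

```python
import re
from datetime import datetime, timezone

def migration_filename(description, when=None):
    """Build a YYYYMMDDHHMMSS_description.sql migration filename."""
    when = when or datetime.now(timezone.utc)
    # Normalize the description into a lowercase, underscore-separated slug
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{when:%Y%m%d%H%M%S}_{slug}.sql"
```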
### Migration Best Practices

1. **Backward Compatible**: Additive changes preferred
2. **Reversible**: All migrations should be reversible
3. **Tested**: Test on staging before production
4. **Documented**: Document breaking changes
5. **Rollback Plan**: Have a rollback strategy

### Schema Evolution

**Adding Columns**:
- Use `ALTER TABLE ADD COLUMN` with default values
- Avoid `NOT NULL` without defaults (use a two-step migration)

**Removing Columns**:
- Mark as deprecated first
- Remove after a migration period

**Changing Types**:
- Create new column
- Migrate data
- Drop old column
- Rename new column

## Performance Optimization

### Query Optimization

**Common Query Patterns**:
1. Get block by number: Use `(chain_id, number)` index
2. Get transaction by hash: Use `(chain_id, hash)` index
3. Get address transactions: Use `(chain_id, from_address)` or `(chain_id, to_address)` index
4. Filter logs by address and event: Use `(chain_id, address, topic0)` index

### Connection Pooling

**Configuration**:
- Use a connection pooler (PgBouncer, pgpool-II)
- Pool size: 20-100 connections per application server
- Statement-level pooling for better concurrency

### Read Replicas

**Strategy**:
- Primary: Write operations
- Replicas: Read operations (load balanced)
- Async replication (small lag acceptable)

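The primary/replica split above implies a small routing layer in the application; a minimal round-robin sketch (DSN strings are placeholders, not real connection details):

```python
import itertools

class Router:
    """Route writes to the primary; load-balance reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin iterator over replica DSNs (None if no replicas)
        self._replicas = itertools.cycle(replicas) if replicas else None

    def dsn_for(self, is_write):
        if is_write or self._replicas is None:
            return self.primary
        return next(self._replicas)
```

With replication lag acceptable for reads, only writes (and any read that must see its own write) go to the primary.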
## Backup and Recovery

### Backup Strategy

**Full Backups**: Daily full database dumps
**Incremental Backups**: Continuous WAL archiving
**Point-in-Time Recovery**: Enabled via WAL archiving

### Recovery Procedures

**RTO Target**: 1 hour
**RPO Target**: 5 minutes (max data loss)

## References

- Data Models: See `../indexing/data-models.md`
- Indexer Architecture: See `../indexing/indexer-architecture.md`
- Search Index Schema: See `search-index-schema.md`
- Multi-chain Architecture: See `../multichain/multichain-indexing.md`

458
docs/specs/database/search-index-schema.md
Normal file
@@ -0,0 +1,458 @@

# Search Index Schema Specification

## Overview

This document specifies the Elasticsearch/OpenSearch index schema for full-text search and faceted querying across blocks, transactions, addresses, tokens, and contracts.

## Architecture

```mermaid
flowchart LR
    PG[(PostgreSQL<br/>Canonical Data)]
    Transform[Data Transformer]
    ES[(Elasticsearch<br/>Search Index)]

    PG --> Transform
    Transform --> ES

    Query[Search Query]
    Query --> ES
    ES --> Results[Search Results]
```

## Index Structure

### Blocks Index

**Index Name**: `blocks-{chain_id}` (e.g., `blocks-138`)

**Document Structure**:
```json
{
  "block_number": 12345,
  "hash": "0x...",
  "timestamp": "2024-01-01T00:00:00Z",
  "miner": "0x...",
  "transaction_count": 100,
  "gas_used": 15000000,
  "gas_limit": 20000000,
  "chain_id": 138,
  "parent_hash": "0x...",
  "size": 1024
}
```

**Field Mappings**:
- `block_number`: `long` (not analyzed, for sorting/filtering)
- `hash`: `keyword` (exact match)
- `timestamp`: `date`
- `miner`: `keyword` (exact match)
- `transaction_count`: `integer`
- `gas_used`: `long`
- `gas_limit`: `long`
- `chain_id`: `integer`
- `parent_hash`: `keyword`
- `size`: `integer`

**Searchable Fields**:
- Hash (exact match)
- Miner address (exact match)

### Transactions Index

**Index Name**: `transactions-{chain_id}`

**Document Structure**:
```json
{
  "hash": "0x...",
  "block_number": 12345,
  "transaction_index": 5,
  "from_address": "0x...",
  "to_address": "0x...",
  "value": "1000000000000000000",
  "gas_price": "20000000000",
  "gas_used": 21000,
  "status": "success",
  "timestamp": "2024-01-01T00:00:00Z",
  "chain_id": 138,
  "input_data_length": 100,
  "is_contract_creation": false,
  "contract_address": null
}
```

**Field Mappings**:
- `hash`: `keyword`
- `block_number`: `long`
- `transaction_index`: `integer`
- `from_address`: `keyword`
- `to_address`: `keyword`
- `value`: `text` (for full-text search on large numbers)
- `value_numeric`: `long` (for range queries)
- `gas_price`: `long`
- `gas_used`: `long`
- `status`: `keyword`
- `timestamp`: `date`
- `chain_id`: `integer`
- `input_data_length`: `integer`
- `is_contract_creation`: `boolean`
- `contract_address`: `keyword`

**Searchable Fields**:
- Hash (exact match)
- From/to addresses (exact match)
- Value (range queries)

### Addresses Index

**Index Name**: `addresses-{chain_id}`

**Document Structure**:
```json
{
  "address": "0x...",
  "chain_id": 138,
  "label": "My Wallet",
  "tags": ["wallet", "exchange"],
  "token_count": 10,
  "transaction_count": 500,
  "first_seen": "2024-01-01T00:00:00Z",
  "last_seen": "2024-01-15T00:00:00Z",
  "is_contract": true,
  "contract_name": "MyToken",
  "balance_eth": "1.5",
  "balance_usd": "3000"
}
```

**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `label`: `text` (analyzed) + `keyword` (exact match)
- `tags`: `keyword` (array)
- `token_count`: `integer`
- `transaction_count`: `long`
- `first_seen`: `date`
- `last_seen`: `date`
- `is_contract`: `boolean`
- `contract_name`: `text` + `keyword`
- `balance_eth`: `double`
- `balance_usd`: `double`

**Searchable Fields**:
- Address (exact match, prefix match)
- Label (full-text search)
- Contract name (full-text search)
- Tags (facet filter)

### Tokens Index

**Index Name**: `tokens-{chain_id}`

**Document Structure**:
```json
{
  "address": "0x...",
  "chain_id": 138,
  "name": "My Token",
  "symbol": "MTK",
  "type": "ERC20",
  "decimals": 18,
  "total_supply": "1000000000000000000000000",
  "holder_count": 1000,
  "transfer_count": 50000,
  "logo_url": "https://...",
  "verified": true,
  "description": "A token description"
}
```

**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `name`: `text` (analyzed) + `keyword` (exact match)
- `symbol`: `keyword` (uppercase normalized)
- `type`: `keyword`
- `decimals`: `integer`
- `total_supply`: `text` (for large numbers)
- `total_supply_numeric`: `double` (for sorting)
- `holder_count`: `integer`
- `transfer_count`: `long`
- `logo_url`: `keyword`
- `verified`: `boolean`
- `description`: `text` (analyzed)

**Searchable Fields**:
- Name (full-text search)
- Symbol (exact match, prefix match)
- Address (exact match)

### Contracts Index

**Index Name**: `contracts-{chain_id}`

**Document Structure**:
```json
{
  "address": "0x...",
  "chain_id": 138,
  "name": "MyContract",
  "verification_status": "verified",
  "compiler_version": "0.8.19",
  "source_code": "contract MyContract {...}",
  "abi": [...],
  "verified_at": "2024-01-01T00:00:00Z",
  "transaction_count": 1000,
  "created_at": "2024-01-01T00:00:00Z"
}
```

**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `name`: `text` + `keyword`
- `verification_status`: `keyword`
- `compiler_version`: `keyword`
- `source_code`: `text` (analyzed, indexed but not stored in full for large contracts)
- `abi`: `object` (nested, for structured queries)
- `verified_at`: `date`
- `transaction_count`: `long`
- `created_at`: `date`

**Searchable Fields**:
- Name (full-text search)
- Address (exact match)
- Source code (full-text search, limited)

## Indexing Pipeline

### Data Transformation

**Purpose**: Transform canonical PostgreSQL data into search-optimized documents.

**Transformation Steps**:
1. **Fetch Data**: Query PostgreSQL for entities to index
2. **Enrich Data**: Add computed fields (balances, counts, etc.)
3. **Normalize Data**: Normalize addresses, format values
4. **Index Document**: Send to Elasticsearch/OpenSearch

### Indexing Strategy

**Initial Indexing**:
- Bulk index existing data
- Process in batches (1000 documents per batch)
- Use bulk API for efficiency

**Incremental Indexing**:
- Index new entities as they're created
- Update entities when changed
- Delete entities when removed

**Update Frequency**:
- Real-time: Index immediately after database insert/update
- Batch: Bulk update every N minutes for efficiency

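The batching step feeding the bulk API can be sketched as a generic chunking helper (batch size from the strategy above):

```python
def batches(docs, size=1000):
    """Chunk an iterable of documents into fixed-size batches for the bulk API."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```

Because it consumes a plain iterable, the same helper works for both the initial backfill (streaming from PostgreSQL) and incremental updates.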
### Index Aliases

**Purpose**: Enable zero-downtime index updates.

**Strategy**:
- Write to new index (e.g., `blocks-138-v2`)
- Build index in background
- Switch alias when ready
- Delete old index after switch

**Alias Names**:
- `blocks-{chain_id}` → points to latest version
- `transactions-{chain_id}` → points to latest version
- etc.

## Query Patterns

### Full-Text Search

**Blocks Search**:
```json
{
  "query": {
    "match": {
      "hash": "0x123..."
    }
  }
}
```

**Address Search**:
```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "label": "wallet" } },
        { "prefix": { "address": "0x123" } }
      ]
    }
  }
}
```

**Token Search**:
```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "My Token" } },
        { "match": { "symbol": "MTK" } }
      ]
    }
  }
}
```

### Faceted Search

**Filter by Multiple Criteria**:
```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "chain_id": 138 } },
        { "term": { "type": "ERC20" } },
        { "range": { "holder_count": { "gte": 100 } } }
      ]
    }
  },
  "aggs": {
    "by_type": {
      "terms": { "field": "type" }
    }
  }
}
```

### Unified Search

**Cross-Entity Search**:
- Search across blocks, transactions, addresses, tokens
- Use `_index` field to filter by entity type
- Combine results with relevance scoring

**Multi-Index Query**:
```json
{
  "query": {
    "multi_match": {
      "query": "0x123",
      "fields": ["hash", "address", "from_address", "to_address"],
      "type": "best_fields"
    }
  }
}
```

## Index Configuration

### Analysis Settings

**Custom Analyzers**:
- Address analyzer: Lowercase, no tokenization
- Symbol analyzer: Uppercase, no tokenization
- Text analyzer: Standard analyzer with lowercase

**Example Configuration**:
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "address_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

### Sharding and Replication

**Sharding**:
- Number of shards: Based on index size
- Large indices (> 50GB): Multiple shards
- Small indices: Single shard

**Replication**:
- Replica count: 1-2 (for high availability)
- Increase replicas for read-heavy workloads

## Performance Optimization

### Index Optimization

**Refresh Interval**:
- Default: 1 second
- For bulk indexing: Increase to 30 seconds, then reset

**Bulk Indexing**:
- Batch size: 1000-5000 documents
- Use bulk API
- Disable refresh during bulk indexing

### Query Optimization

**Query Caching**:
- Enable query cache for repeated queries
- Cache filter results

**Field Data**:
- Use `doc_values` for sorting/aggregations
- Avoid `fielddata` for text fields

## Maintenance

### Index Monitoring

**Metrics**:
- Index size
- Document count
- Query performance (p50, p95, p99)
- Index lag (time behind database)

### Index Cleanup

**Strategy**:
- Delete old indices (after alias switch)
- Archive old indices to cold storage
- Compress indices for storage efficiency

## Integration with PostgreSQL

### Data Sync

**Sync Strategy**:
- Real-time: Listen to database changes (CDC, triggers, or polling)
- Batch: Periodic sync jobs
- Hybrid: Real-time for recent data, batch for historical

**Change Detection**:
- Use `updated_at` timestamp
- Use database triggers to queue changes
- Use CDC (Change Data Capture) if available

### Consistency

**Eventual Consistency**:
- Search index is eventually consistent with database
- Small lag acceptable (< 1 minute)
- Critical queries can fall back to database

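The `updated_at` polling variant can be sketched as a watermark loop; the `fetch` and `index` callbacks stand in for a real database query and the bulk indexer, and are assumptions for illustration:

```python
def sync_once(fetch, index, watermark):
    """Index rows changed since `watermark`; return the advanced watermark.

    fetch(watermark) stands in for e.g.
    SELECT ... WHERE updated_at > watermark ORDER BY updated_at.
    """
    rows = fetch(watermark)
    if rows:
        index(rows)
        watermark = max(r["updated_at"] for r in rows)
    return watermark
```

Persisting the returned watermark between runs makes the sync resumable; rows updated while a batch is in flight are picked up on the next pass, which is what makes the index eventually consistent.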
## References

- Database Schema: See `postgres-schema.md`
- Indexer Architecture: See `../indexing/indexer-architecture.md`
- Unified Search: See `../multichain/unified-search.md`

239
docs/specs/database/timeseries-schema.md
Normal file
@@ -0,0 +1,239 @@

# Time-Series Database Schema Specification

## Overview

This document specifies the time-series database schema using ClickHouse or TimescaleDB for storing mempool data, metrics, and analytics time-series data.

## Technology Choice

**Option 1: TimescaleDB** (PostgreSQL extension)
- Pros: PostgreSQL compatibility, SQL interface, easier integration
- Cons: Less optimized for very high throughput

**Option 2: ClickHouse**
- Pros: Very high performance, columnar storage, excellent compression
- Cons: Different SQL dialect, separate infrastructure

**Recommendation**: Start with TimescaleDB for easier integration; migrate to ClickHouse if needed for scale.

## TimescaleDB Schema

### Mempool Transactions Table

**Table**: `mempool_transactions`

```sql
CREATE TABLE mempool_transactions (
    time TIMESTAMPTZ NOT NULL,
    chain_id INTEGER NOT NULL,
    hash VARCHAR(66) NOT NULL,
    from_address VARCHAR(42) NOT NULL,
    to_address VARCHAR(42),
    value NUMERIC(78, 0),
    gas_price BIGINT,
    max_fee_per_gas BIGINT,
    max_priority_fee_per_gas BIGINT,
    gas_limit BIGINT,
    nonce BIGINT,
    input_data_length INTEGER,
    first_seen TIMESTAMPTZ NOT NULL,
    status VARCHAR(20) DEFAULT 'pending', -- 'pending', 'confirmed', 'dropped'
    confirmed_block_number BIGINT,
    confirmed_at TIMESTAMPTZ,
    PRIMARY KEY (time, chain_id, hash)
);

SELECT create_hypertable('mempool_transactions', 'time');

CREATE INDEX idx_mempool_chain_hash ON mempool_transactions(chain_id, hash);
CREATE INDEX idx_mempool_chain_from ON mempool_transactions(chain_id, from_address);
CREATE INDEX idx_mempool_chain_status ON mempool_transactions(chain_id, status, time);
```

**Retention Policy**: 7 days for detailed data, aggregates for longer periods

### Network Metrics Table

**Table**: `network_metrics`

```sql
CREATE TABLE network_metrics (
    time TIMESTAMPTZ NOT NULL,
    chain_id INTEGER NOT NULL,
    block_number BIGINT,
    tps DOUBLE PRECISION, -- Transactions per second
    gps DOUBLE PRECISION, -- Gas per second
    avg_gas_price BIGINT,
    pending_transactions INTEGER,
    block_time_seconds DOUBLE PRECISION,
    PRIMARY KEY (time, chain_id)
);

SELECT create_hypertable('network_metrics', 'time');

CREATE INDEX idx_network_metrics_chain_time ON network_metrics(chain_id, time DESC);
```

**Aggregation**: Pre-aggregate to 1-minute, 5-minute, 1-hour intervals

### Gas Price History Table

**Table**: `gas_price_history`

```sql
CREATE TABLE gas_price_history (
    time TIMESTAMPTZ NOT NULL,
    chain_id INTEGER NOT NULL,
    block_number BIGINT,
    min_gas_price BIGINT,
    max_gas_price BIGINT,
    avg_gas_price BIGINT,
    p25_gas_price BIGINT, -- 25th percentile
    p50_gas_price BIGINT, -- 50th percentile (median)
    p75_gas_price BIGINT, -- 75th percentile
    p95_gas_price BIGINT, -- 95th percentile
    p99_gas_price BIGINT, -- 99th percentile
    PRIMARY KEY (time, chain_id)
);

SELECT create_hypertable('gas_price_history', 'time');
```

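The percentile columns can be computed at ingest from a block's raw gas prices; a minimal sketch using the standard `statistics` module (inclusive quantiles, so small samples interpolate between observed values):

```python
import statistics

def gas_price_percentiles(prices):
    """Compute the percentile summary stored in gas_price_history."""
    # quantiles(n=100) returns the 99 cut points p1..p99
    qs = statistics.quantiles(sorted(prices), n=100, method="inclusive")
    return {
        "min": min(prices),
        "max": max(prices),
        "avg": statistics.mean(prices),
        "p25": qs[24], "p50": qs[49], "p75": qs[74],
        "p95": qs[94], "p99": qs[98],
    }
```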
### Address Activity Metrics Table

**Table**: `address_activity_metrics`

```sql
CREATE TABLE address_activity_metrics (
    time TIMESTAMPTZ NOT NULL,
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    transaction_count INTEGER,
    received_count INTEGER,
    sent_count INTEGER,
    total_received NUMERIC(78, 0),
    total_sent NUMERIC(78, 0),
    PRIMARY KEY (time, chain_id, address)
);

SELECT create_hypertable('address_activity_metrics', 'time',
    chunk_time_interval => INTERVAL '1 day');

CREATE INDEX idx_address_activity_chain_address ON address_activity_metrics(chain_id, address, time DESC);
```

**Aggregation**: Pre-aggregate to hourly/daily for addresses

## ClickHouse Schema (Alternative)

### Mempool Transactions Table

```sql
CREATE TABLE mempool_transactions (
    time DateTime('UTC') NOT NULL,
    chain_id UInt32 NOT NULL,
    hash String NOT NULL,
    from_address String NOT NULL,
    to_address Nullable(String),
    value Decimal128(0),
    gas_price UInt64,
    max_fee_per_gas Nullable(UInt64),
    max_priority_fee_per_gas Nullable(UInt64),
    gas_limit UInt64,
    nonce UInt64,
    input_data_length UInt32,
    first_seen DateTime('UTC') NOT NULL,
    status String DEFAULT 'pending',
    confirmed_block_number Nullable(UInt64),
    confirmed_at Nullable(DateTime('UTC'))
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (chain_id, time, hash)
TTL time + INTERVAL 7 DAY; -- Auto-delete after 7 days
```

## Data Retention and Aggregation

### Retention Policies

**Raw Data**:
- Mempool transactions: 7 days
- Network metrics: 30 days
- Gas price history: 90 days
- Address activity: 30 days

**Aggregated Data**:
- 1-minute aggregates: 90 days
- 5-minute aggregates: 1 year
- 1-hour aggregates: 5 years
- Daily aggregates: Indefinite

### Continuous Aggregates (TimescaleDB)

```sql
-- 1-minute network metrics aggregate
CREATE MATERIALIZED VIEW network_metrics_1m
WITH (timescaledb.continuous) AS
SELECT
    time_bucket('1 minute', time) AS bucket,
    chain_id,
    AVG(tps) AS avg_tps,
    AVG(gps) AS avg_gps,
    AVG(avg_gas_price) AS avg_gas_price,
    AVG(pending_transactions) AS avg_pending_tx
FROM network_metrics
GROUP BY bucket, chain_id;

-- Add refresh policy
SELECT add_continuous_aggregate_policy('network_metrics_1m',
    start_offset => INTERVAL '1 hour',
    end_offset => INTERVAL '1 minute',
    schedule_interval => INTERVAL '1 minute');
```

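`time_bucket` floors each timestamp to the start of its bucket; the equivalent behavior in application code (e.g. for client-side charting) can be sketched as:

```python
from datetime import datetime, timedelta, timezone

def time_bucket(width, ts):
    """Floor `ts` to the start of its bucket, like TimescaleDB's time_bucket."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    offset = (ts - epoch) % width  # distance past the last bucket boundary
    return ts - offset
```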
## Query Patterns

### Recent Mempool Transactions

```sql
SELECT * FROM mempool_transactions
WHERE chain_id = 138
  AND time > NOW() - INTERVAL '1 hour'
  AND status = 'pending'
ORDER BY time DESC
LIMIT 100;
```

### Gas Price Statistics

```sql
SELECT
    time_bucket('5 minutes', time) AS bucket,
    AVG(avg_gas_price) AS avg_gas_price,
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY avg_gas_price) AS median_gas_price
FROM gas_price_history
WHERE chain_id = 138
  AND time > NOW() - INTERVAL '24 hours'
GROUP BY bucket
ORDER BY bucket DESC;
```

### Network Throughput

```sql
SELECT
    time_bucket('1 minute', time) AS bucket,
    AVG(tps) AS avg_tps,
    MAX(tps) AS max_tps
FROM network_metrics
WHERE chain_id = 138
  AND time > NOW() - INTERVAL '1 hour'
GROUP BY bucket
ORDER BY bucket DESC;
```

## References

- Mempool Service: See `../mempool/mempool-service.md`
- Observability: See `../observability/metrics-monitoring.md`