# Search Index Schema Specification

## Overview

This document specifies the Elasticsearch/OpenSearch index schema for full-text search and faceted querying across blocks, transactions, addresses, tokens, and contracts.

## Architecture

```mermaid
flowchart LR
    PG[(PostgreSQL<br/>Canonical Data)]
    Transform[Data Transformer]
    ES[(Elasticsearch<br/>Search Index)]

    PG --> Transform
    Transform --> ES

    Query[Search Query]
    Query --> ES
    ES --> Results[Search Results]
```

## Index Structure

### Blocks Index

**Index Name**: `blocks-{chain_id}` (e.g., `blocks-138`)

**Document Structure**:
```json
{
  "block_number": 12345,
  "hash": "0x...",
  "timestamp": "2024-01-01T00:00:00Z",
  "miner": "0x...",
  "transaction_count": 100,
  "gas_used": 15000000,
  "gas_limit": 20000000,
  "chain_id": 138,
  "parent_hash": "0x...",
  "size": 1024
}
```

**Field Mappings**:
- `block_number`: `long` (not analyzed, for sorting/filtering)
- `hash`: `keyword` (exact match)
- `timestamp`: `date`
- `miner`: `keyword` (exact match)
- `transaction_count`: `integer`
- `gas_used`: `long`
- `gas_limit`: `long`
- `chain_id`: `integer`
- `parent_hash`: `keyword`

**Searchable Fields**:
- Hash (exact match)
- Miner address (exact match)
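
The field mappings above can be expressed as an index-creation body. The following is a minimal sketch that only builds that body as a Python dictionary; the helper name `blocks_index_body` is an assumption for illustration, and `size` is not listed because the specification gives no type for it.

```python
def blocks_index_body() -> dict:
    """Build a create-index body from the Blocks field mappings (hypothetical helper)."""
    return {
        "mappings": {
            "properties": {
                "block_number": {"type": "long"},
                "hash": {"type": "keyword"},
                "timestamp": {"type": "date"},
                "miner": {"type": "keyword"},
                "transaction_count": {"type": "integer"},
                "gas_used": {"type": "long"},
                "gas_limit": {"type": "long"},
                "chain_id": {"type": "integer"},
                "parent_hash": {"type": "keyword"},
            }
        }
    }
```

Serializing the result with `json.dumps` gives the JSON that would accompany the creation of an index such as `blocks-138`.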

### Transactions Index

**Index Name**: `transactions-{chain_id}`

**Document Structure**:
```json
{
  "hash": "0x...",
  "block_number": 12345,
  "transaction_index": 5,
  "from_address": "0x...",
  "to_address": "0x...",
  "value": "1000000000000000000",
  "gas_price": "20000000000",
  "gas_used": 21000,
  "status": "success",
  "timestamp": "2024-01-01T00:00:00Z",
  "chain_id": 138,
  "input_data_length": 100,
  "is_contract_creation": false,
  "contract_address": null
}
```

**Field Mappings**:
- `hash`: `keyword`
- `block_number`: `long`
- `transaction_index`: `integer`
- `from_address`: `keyword`
- `to_address`: `keyword`
- `value`: `text` (string form of the amount, so arbitrarily large numbers remain searchable)
- `value_numeric`: `long` (derived from `value`, for range queries; amounts above 2^63 − 1, about 9.22 × 10^18, do not fit in a `long`)
- `gas_price`: `long`
- `gas_used`: `long`
- `status`: `keyword`
- `timestamp`: `date`
- `chain_id`: `integer`
- `input_data_length`: `integer`
- `is_contract_creation`: `boolean`
- `contract_address`: `keyword`

**Searchable Fields**:
- Hash (exact match)
- From/to addresses (exact match)
- Value (range queries on `value_numeric`)
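
Because `value` arrives as a decimal string that can be larger than a signed 64-bit `long`, the transformer has to decide what to write into `value_numeric`. A minimal sketch, assuming that out-of-range amounts are left as `None` rather than indexed incorrectly; that policy and the name `value_fields` are assumptions, not part of this specification.

```python
LONG_MAX = 2**63 - 1  # largest value an Elasticsearch `long` can hold

def value_fields(value: str) -> dict:
    """Return the `value` / `value_numeric` pair for one transaction."""
    amount = int(value)  # Python integers are arbitrary precision
    fields = {"value": value}
    if 0 <= amount <= LONG_MAX:
        fields["value_numeric"] = amount
    else:
        # Assumed policy: leave the numeric field empty instead of overflowing.
        fields["value_numeric"] = None
    return fields
```

With this sketch, `value_fields("1000000000000000000")` keeps the numeric value, while an amount of 10^19 yields `None` and would only be findable through the string field.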

### Addresses Index

**Index Name**: `addresses-{chain_id}`

**Document Structure**:
```json
{
  "address": "0x...",
  "chain_id": 138,
  "label": "My Wallet",
  "tags": ["wallet", "exchange"],
  "token_count": 10,
  "transaction_count": 500,
  "first_seen": "2024-01-01T00:00:00Z",
  "last_seen": "2024-01-15T00:00:00Z",
  "is_contract": true,
  "contract_name": "MyToken",
  "balance_eth": "1.5",
  "balance_usd": "3000"
}
```

**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `label`: `text` (analyzed) + `keyword` (exact match)
- `tags`: `keyword` (array)
- `token_count`: `integer`
- `transaction_count`: `long`
- `first_seen`: `date`
- `last_seen`: `date`
- `is_contract`: `boolean`
- `contract_name`: `text` + `keyword`
- `balance_eth`: `double`
- `balance_usd`: `double`

**Searchable Fields**:
- Address (exact match, prefix match)
- Label (full-text search)
- Contract name (full-text search)
- Tags (facet filter)
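
The `text` + `keyword` pairs above are typically realized as multi-fields, and the tag facet as a `terms` aggregation. A minimal sketch, assuming the sub-field is called `keyword` (a common convention, not something this specification fixes); both helper names are hypothetical.

```python
def addresses_text_fields() -> dict:
    """Mappings for fields that need both full-text and exact matching."""
    def text_with_keyword() -> dict:
        # Analyzed `text` field with an exact-match `keyword` sub-field.
        return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}

    return {
        "label": text_with_keyword(),
        "contract_name": text_with_keyword(),
        "tags": {"type": "keyword"},  # arrays need no special mapping
    }

def tag_facet_query(tag: str) -> dict:
    """Filter addresses by one tag and count documents per tag."""
    return {
        "query": {"bool": {"filter": [{"term": {"tags": tag}}]}},
        "aggs": {"by_tag": {"terms": {"field": "tags"}}},
    }
```

Under this layout, full-text queries target `label`, while exact matches and sorting target `label.keyword`.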

### Tokens Index

**Index Name**: `tokens-{chain_id}`

**Document Structure**:
```json
{
  "address": "0x...",
  "chain_id": 138,
  "name": "My Token",
  "symbol": "MTK",
  "type": "ERC20",
  "decimals": 18,
  "total_supply": "1000000000000000000000000",
  "holder_count": 1000,
  "transfer_count": 50000,
  "logo_url": "https://...",
  "verified": true,
  "description": "A token description"
}
```

**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `name`: `text` (analyzed) + `keyword` (exact match)
- `symbol`: `keyword` (uppercase normalized)
- `type`: `keyword`
- `decimals`: `integer`
- `total_supply`: `text` (for large numbers)
- `total_supply_numeric`: `double` (for sorting)
- `holder_count`: `integer`
- `transfer_count`: `long`
- `logo_url`: `keyword`
- `verified`: `boolean`
- `description`: `text` (analyzed)

**Searchable Fields**:
- Name (full-text search)
- Symbol (exact match, prefix match)
- Address (exact match)
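
The `total_supply` / `total_supply_numeric` pair can be derived from the raw supply string during transformation. A minimal sketch, assuming the numeric field holds the raw (unscaled) supply; whether to divide by `10 ** decimals` first is a design choice this specification does not state, and the helper name is hypothetical.

```python
def supply_fields(total_supply: str) -> dict:
    """Return the string and sortable numeric forms of a token supply."""
    amount = int(total_supply)
    return {
        "total_supply": total_supply,
        # float() rounds to the nearest double, which is exact only up to 2**53.
        # That is sufficient for ordering but not for exact arithmetic.
        "total_supply_numeric": float(amount),
    }
```

The exact figure stays available in `total_supply`, so the rounding in the numeric field affects only sort order between nearly equal supplies.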

### Contracts Index

**Index Name**: `contracts-{chain_id}`

**Document Structure**:
```json
{
  "address": "0x...",
  "chain_id": 138,
  "name": "MyContract",
  "verification_status": "verified",
  "compiler_version": "0.8.19",
  "source_code": "contract MyContract {...}",
  "abi": [...],
  "verified_at": "2024-01-01T00:00:00Z",
  "transaction_count": 1000,
  "created_at": "2024-01-01T00:00:00Z"
}
```

**Field Mappings**:
- `address`: `keyword`
- `chain_id`: `integer`
- `name`: `text` + `keyword`
- `verification_status`: `keyword`
- `compiler_version`: `keyword`
- `source_code`: `text` (analyzed; indexed, but not stored in full for large contracts)
- `abi`: `object` (array of ABI entries, for structured queries; map as `nested` if individual entries must be matched independently)
- `verified_at`: `date`
- `transaction_count`: `long`
- `created_at`: `date`

**Searchable Fields**:
- Name (full-text search)
- Address (exact match)
- Source code (full-text search, limited)

## Indexing Pipeline

### Data Transformation

**Purpose**: Transform canonical PostgreSQL data into search-optimized documents.

**Transformation Steps**:
1. **Fetch Data**: Query PostgreSQL for entities to index
2. **Enrich Data**: Add computed fields (balances, counts, etc.)
3. **Normalize Data**: Normalize addresses, format values
4. **Index Document**: Send to Elasticsearch/OpenSearch
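
The normalization step can be sketched for a single block row. This is a minimal illustration, assuming the row arrives as a dictionary with the hypothetical column names shown and a timezone-aware timestamp; fetching from PostgreSQL and sending to the search cluster are left out.

```python
from datetime import datetime, timezone

def normalize_address(address: str) -> str:
    """Lowercase a hex address so exact-match lookups are case-insensitive."""
    return address.strip().lower()

def to_block_document(row: dict) -> dict:
    """Turn one fetched row (step 1) into a Blocks index document (steps 2-3)."""
    utc_time = row["timestamp"].astimezone(timezone.utc)
    return {
        "block_number": row["number"],  # hypothetical column name
        "hash": row["hash"].lower(),
        "timestamp": utc_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "miner": normalize_address(row["miner"]),
        "transaction_count": row["transaction_count"],
        "chain_id": row["chain_id"],
    }
```

The returned dictionary is what step 4 would send to the `blocks-{chain_id}` index.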

### Indexing Strategy

**Initial Indexing**:
- Bulk index existing data
- Process in batches (1000 documents per batch)
- Use bulk API for efficiency

**Incremental Indexing**:
- Index new entities as they're created
- Update entities when changed
- Delete entities when removed

**Update Frequency**:
- Real-time: Index immediately after database insert/update
- Batch: Bulk update every N minutes for efficiency
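
Batching and the bulk request format can be sketched without a client library. The bulk API takes newline-delimited JSON in which each document is preceded by an action line; the helper names and the choice of document ID field below are assumptions.

```python
import json
from typing import Iterable, Iterator

def batches(docs: Iterable[dict], size: int = 1000) -> Iterator[list]:
    """Yield lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def bulk_body(index: str, docs: list, id_field: str) -> str:
    """Build a `_bulk` payload: one action line, then one source line, per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc[id_field]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```

Using the entity's natural key (for example the transaction hash) as `_id` makes re-indexing the same row overwrite the document instead of duplicating it.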

### Index Aliases

**Purpose**: Enable zero-downtime index updates.

**Strategy**:
- Write to a new versioned index (e.g., `blocks-138-v2`)
- Build the index in the background
- Switch the alias once the new index is ready
- Delete the old index after the switch

**Alias Names**:
- `blocks-{chain_id}` → points to latest version
- `transactions-{chain_id}` → points to latest version
- The same pattern applies to `addresses-{chain_id}`, `tokens-{chain_id}`, and `contracts-{chain_id}`

Because an alias cannot share a name with an existing index, the concrete indices carry a version suffix, and the unversioned names above are the aliases that queries use.
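
The alias switch can be done in one request to the `_aliases` endpoint, which applies its actions atomically. A minimal sketch that builds the request body; the index name `blocks-138-v1` in the usage note is an assumed example of an older version.

```python
def alias_switch_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for `POST /_aliases` that repoints `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }
```

For example, `alias_switch_actions("blocks-138", "blocks-138-v1", "blocks-138-v2")` moves the alias in a single step, so queries never see a moment where the alias resolves to no index.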

## Query Patterns

### Full-Text Search

**Blocks Search**:
```json
{
  "query": {
    "match": {
      "hash": "0x123..."
    }
  }
}
```

**Address Search**:
```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "label": "wallet" } },
        { "prefix": { "address": "0x123" } }
      ]
    }
  }
}
```

**Token Search**:
```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "My Token" } },
        { "match": { "symbol": "MTK" } }
      ]
    }
  }
}
```

### Faceted Search

**Filter by Multiple Criteria**:
```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "chain_id": 138 } },
        { "term": { "type": "ERC20" } },
        { "range": { "holder_count": { "gte": 100 } } }
      ]
    }
  },
  "aggs": {
    "by_type": {
      "terms": { "field": "type" }
    }
  }
}
```

### Unified Search

**Cross-Entity Search**:
- Search across blocks, transactions, addresses, tokens
- Use `_index` field to filter by entity type
- Combine results with relevance scoring

**Multi-Index Query**:
```json
{
  "query": {
    "multi_match": {
      "query": "0x123",
      "fields": ["hash", "address", "from_address", "to_address"],
      "type": "best_fields"
    }
  }
}
```
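
Each hit in a multi-index response carries `_index`, `_score`, and `_source`, so results can be grouped by entity type on the client. A minimal sketch, assuming index names follow the `{entity}-{chain_id}` pattern used in this document, optionally with a version suffix; the helper names are hypothetical.

```python
from collections import defaultdict

def entity_type(index_name: str) -> str:
    """Map `transactions-138-v2` to `transactions`."""
    return index_name.split("-", 1)[0]

def group_hits(hits: list) -> dict:
    """Group hits by entity type, best-scoring first within each group."""
    grouped = defaultdict(list)
    for hit in sorted(hits, key=lambda h: h["_score"], reverse=True):
        grouped[entity_type(hit["_index"])].append(hit["_source"])
    return dict(grouped)
```

The input is the `hits.hits` array of a search response; the output maps names such as `blocks` or `transactions` to their documents in descending score order.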

## Index Configuration

### Analysis Settings

**Custom Analyzers**:
- Address analyzer: Lowercase, no tokenization
- Symbol analyzer: Uppercase, no tokenization
- Text analyzer: Standard analyzer (which already lowercases tokens)

**Example Configuration**:
```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "address_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```
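
The field mappings in this document type addresses and symbols as `keyword`, and `keyword` fields accept a `normalizer` rather than an `analyzer`. A minimal sketch of the equivalent normalizer configuration, built as a Python dictionary; the names `address_normalizer` and `symbol_normalizer` are assumptions.

```python
def keyword_normalizer_body() -> dict:
    """Settings and mappings that normalize case on `keyword` fields."""
    return {
        "settings": {
            "analysis": {
                "normalizer": {
                    # Normalizers have no tokenizer: the whole value is one term.
                    "address_normalizer": {"type": "custom", "filter": ["lowercase"]},
                    "symbol_normalizer": {"type": "custom", "filter": ["uppercase"]},
                }
            }
        },
        "mappings": {
            "properties": {
                "address": {"type": "keyword", "normalizer": "address_normalizer"},
                "symbol": {"type": "keyword", "normalizer": "symbol_normalizer"},
            }
        },
    }
```

The `address_analyzer` shown earlier remains applicable to `text` fields; the normalizer form covers the case where the field stays `keyword`.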

### Sharding and Replication

**Sharding**:
- Number of shards: Based on index size
- Large indices (> 50 GB): Multiple shards
- Small indices: Single shard

**Replication**:
- Replica count: 1-2 (for high availability)
- Increase replicas for read-heavy workloads
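
One way to turn the sizing guidance above into numbers is to target roughly 50 GB per shard. This is a sketch under that assumption only; the rounding rule and the helper names are illustrative and not prescribed by this document.

```python
import math

def shard_count(index_size_gb: float, target_shard_gb: float = 50.0) -> int:
    """Single shard up to the target size, then one shard per started 50 GB."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))

def index_settings(index_size_gb: float, replicas: int = 1) -> dict:
    """Settings body with shard and replica counts."""
    return {
        "settings": {
            "number_of_shards": shard_count(index_size_gb),
            "number_of_replicas": replicas,
        }
    }
```

Note that `number_of_shards` is fixed at index creation, so the estimate should use the expected size of the index rather than its current size.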

## Performance Optimization

### Index Optimization

**Refresh Interval**:
- Default: 1 second
- For bulk indexing: Increase to 30 seconds, then reset to the default

**Bulk Indexing**:
- Batch size: 1000-5000 documents
- Use bulk API
- Disable refresh (`refresh_interval: -1`) or raise the interval during bulk indexing
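
The refresh handling around a bulk load can be sketched as an ordered list of requests. The endpoints are the standard index settings, bulk, and refresh APIs; the plan structure and helper names are assumptions, and no HTTP call is made here.

```python
def refresh_settings(interval: str) -> dict:
    """Body for `PUT /{index}/_settings` that changes the refresh interval."""
    return {"index": {"refresh_interval": interval}}

def bulk_load_plan(index: str) -> list:
    """Requests to issue, in order, as (method, path, body) tuples."""
    return [
        ("PUT", f"/{index}/_settings", refresh_settings("30s")),  # slow down refreshes
        ("POST", "/_bulk", None),  # repeated once per batch
        ("PUT", f"/{index}/_settings", refresh_settings("1s")),  # restore the default
        ("POST", f"/{index}/_refresh", None),  # make the new documents searchable
    ]
```

Passing `"-1"` instead of `"30s"` disables refreshes entirely for the duration of the load.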

### Query Optimization

**Query Caching**:
- Enable query cache for repeated queries
- Cache filter results

**Field Data**:
- Use `doc_values` for sorting/aggregations
- Avoid `fielddata` for text fields

## Maintenance

### Index Monitoring

**Metrics**:
- Index size
- Document count
- Query performance (p50, p95, p99)
- Index lag (time behind database)
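
The latency percentiles and the index lag can be computed from raw samples. A minimal sketch, assuming the nearest-rank percentile definition and that lag is measured as the difference between the newest timestamp in the database and the newest one in the index; both choices are assumptions.

```python
import math
from datetime import datetime, timedelta

def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latencies."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def index_lag(db_latest: datetime, index_latest: datetime) -> timedelta:
    """How far the search index trails the database (never negative)."""
    return max(db_latest - index_latest, timedelta(0))
```

Both functions work on plain values, so they can be fed from query logs and from a `max(timestamp)` lookup on each side.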

### Index Cleanup

**Strategy**:
- Delete old indices (after alias switch)
- Archive old indices to cold storage
- Compress indices for storage efficiency

## Integration with PostgreSQL

### Data Sync

**Sync Strategy**:
- Real-time: Listen to database changes (CDC, triggers, or polling)
- Batch: Periodic sync jobs
- Hybrid: Real-time for recent data, batch for historical

**Change Detection**:
- Use `updated_at` timestamp
- Use database triggers to queue changes
- Use CDC (Change Data Capture) if available
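
The `updated_at` approach amounts to polling with a watermark. A minimal sketch, assuming a hypothetical `tokens` table and psycopg-style named parameters in the SQL text; only the watermark logic is executable here.

```python
from datetime import datetime

# Query a poller could run against PostgreSQL (table name is hypothetical).
CHANGED_ROWS_SQL = """
SELECT * FROM tokens
WHERE updated_at > %(watermark)s
ORDER BY updated_at
LIMIT %(batch_size)s
"""

def next_watermark(rows: list, watermark: datetime) -> datetime:
    """Advance the watermark to the newest `updated_at` seen in this batch."""
    return max([row["updated_at"] for row in rows], default=watermark)
```

With a strict `>` comparison, rows that share an `updated_at` value across a batch boundary can be skipped; ordering by a tie-breaking key as well is one way to address that.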

### Consistency

**Eventual Consistency**:
- Search index is eventually consistent with database
- Small lag acceptable (< 1 minute)
- Critical queries can fall back to database

## References

- Database Schema: See `postgres-schema.md`
- Indexer Architecture: See `../indexing/indexer-architecture.md`
- Unified Search: See `../multichain/unified-search.md`