# Contract Verification Pipeline Specification
## Overview
This document specifies the pipeline for verifying smart contracts on the explorer platform. Contract verification allows users to submit source code, which is compiled and compared against deployed bytecode to enable source code viewing, debugging, and ABI extraction.
## Architecture
```mermaid
flowchart TB
subgraph Submit[Submission]
User[User Submits<br/>Source Code]
UI[Verification UI]
API[Verification API]
end
subgraph Validate[Validation]
Val[Validate Input]
Check[Check Contract Exists]
Dup[Check Duplicate]
end
subgraph Compile[Compilation]
Comp[Compiler Service]
Versions[Compiler Version<br/>Registry]
Build[Build Artifacts]
end
subgraph Verify[Verification]
Match[Bytecode Matching]
Construct[Constructor Args<br/>Extraction]
MatchResult[Match Result]
end
subgraph Store[Storage]
DB[(Database)]
Artifacts[Artifact Storage<br/>S3/Immutable]
ABI[ABI Registry]
end
User --> UI
UI --> API
API --> Val
Val --> Check
Check --> Dup
Dup --> Comp
Comp --> Versions
Comp --> Build
Build --> Match
Match --> Construct
Construct --> MatchResult
MatchResult --> DB
MatchResult --> Artifacts
MatchResult --> ABI
```
## Source Code Submission Workflow
### Submission Methods
**1. Standard JSON Input** (Recommended)
- Submit Solidity compiler's standard JSON input format
- Includes source files, compiler settings, optimization
- Most reliable for complex contracts
**2. Multi-file Upload**
- Upload individual source files
- Specify compiler version and settings
- The verification service constructs the standard JSON input from the uploaded files
**3. Sourcify Integration**
- Verify via Sourcify API
- Automatic source code and metadata retrieval
- Supports verified contracts from Sourcify registry
**4. Flattened Source**
- Single flattened source file
- All imports inlined
- Simpler but less flexible
### Submission API
**Endpoint**: `POST /api/v1/contracts/{address}/verify`
**Request Body** (provide either `source_code` or `standard_json_input`):
```json
{
  "chain_id": 138,
  "address": "0x...",
  "compiler_version": "v0.8.19+commit.7dd6d404",
  "optimization_enabled": true,
  "optimization_runs": 200,
  "evm_version": "london",
  "source_code": "...",
  "constructor_arguments": "0x...",
  "library_addresses": {
    "Lib1": "0x..."
  },
  "verification_method": "standard_json"
}
```
**Response**:
```json
{
  "status": "pending",
  "verification_id": "uuid",
  "message": "Verification submitted"
}
```
### Input Validation
**Validation Rules**:
1. **Contract Address**: Must be valid Ethereum address, must exist on chain
2. **Compiler Version**: Must be supported compiler version
3. **Source Code**: Must be valid Solidity/Vyper code
4. **Constructor Arguments**: Must match deployed contract (if provided)
5. **Library Addresses**: Must match deployed libraries (if provided)
**Error Handling**:
- Invalid address: 400 Bad Request
- Unsupported compiler: 400 Bad Request
- Invalid source code: 400 Bad Request
- Contract not found: 404 Not Found
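
The rules and status codes above could be wired together roughly as follows. This is a minimal sketch, not the service's actual validator: `validate_submission`, `ValidationError`, and the `SUPPORTED_COMPILERS` whitelist are hypothetical names, and the on-chain existence check is assumed to be done by the caller and passed in as a boolean.

```python
import re
from dataclasses import dataclass
from typing import Optional

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")
# Hypothetical whitelist; in practice this would be read from the compiler registry.
SUPPORTED_COMPILERS = {"v0.8.19+commit.7dd6d404"}


@dataclass
class ValidationError:
    status: int   # HTTP status code to return
    message: str


def validate_submission(body: dict, contract_exists: bool) -> Optional[ValidationError]:
    """Return the first validation error, or None if the request is acceptable."""
    address = body.get("address", "")
    if not isinstance(address, str) or not ADDRESS_RE.fullmatch(address):
        return ValidationError(400, "Invalid address")
    if not contract_exists:
        return ValidationError(404, "Contract not found")
    if body.get("compiler_version") not in SUPPORTED_COMPILERS:
        return ValidationError(400, "Unsupported compiler")
    if not body.get("source_code") and not body.get("standard_json_input"):
        return ValidationError(400, "Invalid source code")
    return None
```

Checking the address format before the existence lookup avoids an RPC call for requests that are malformed anyway.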
## Compiler Version Management
### Compiler Registry
**Purpose**: Manage available compiler versions and their metadata.
**Storage**:
```sql
CREATE TABLE compiler_versions (
    id SERIAL PRIMARY KEY,
    version VARCHAR(50) UNIQUE NOT NULL,
    compiler_type VARCHAR(20) NOT NULL, -- 'solidity', 'vyper'
    evm_version VARCHAR(20),
    optimizer_available BOOLEAN DEFAULT true,
    download_url TEXT,
    checksum VARCHAR(64),
    installed BOOLEAN DEFAULT false,
    installed_path TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
```
### Compiler Installation
**Methods**:
1. **Pre-installed**: Common versions pre-installed on compilation servers
2. **On-demand**: Download and install when needed
3. **Docker**: Use compiler Docker images (isolated, reproducible)
**Recommended**: Docker-based compilation for isolation and reproducibility.
**Docker Setup**:
```dockerfile
FROM ethereum/solc:0.8.19
# Or use solc-select for version management
```
### Version Selection
**Strategy**:
- Exact match: User specifies exact version
- Pragma matching: Extract version from source code pragma
- Latest compatible: Use latest compatible version if exact not available
**Pragma Parsing**:
- Extract `pragma solidity ^0.8.0;` or `>=0.8.0 <0.9.0`
- Resolve to specific compiler version
- Handle caret (^), tilde (~), and range operators
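
Pragma resolution could look roughly like the sketch below. It assumes npm-style semver semantics, full `x.y.z` versions, and space-separated terms that are all ANDed together (no `||` alternatives). `resolve_pragma` and `expand_term` are illustrative names, not part of an existing module.

```python
import operator
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int, int]

PRAGMA_RE = re.compile(r"pragma\s+solidity\s+([^;]+);")
TERM_RE = re.compile(r"(\^|~|>=|<=|>|<|=)?\s*(\d+)\.(\d+)\.(\d+)")
COMPARE = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
           "<": operator.lt, "=": operator.eq}


def expand_term(op: str, v: Version) -> List[Tuple[str, Version]]:
    """Rewrite one pragma term as plain comparisons."""
    major, minor, patch = v
    if op == "^":
        # Caret: the left-most non-zero component must stay fixed.
        if major > 0:
            upper = (major + 1, 0, 0)
        elif minor > 0:
            upper = (0, minor + 1, 0)
        else:
            upper = (0, 0, patch + 1)
        return [(">=", v), ("<", upper)]
    if op == "~":
        # Tilde: only patch-level changes are allowed.
        return [(">=", v), ("<", (major, minor + 1, 0))]
    return [(op or "=", v)]


def resolve_pragma(source: str, available: List[str]) -> Optional[str]:
    """Return the highest available version that satisfies the first pragma."""
    match = PRAGMA_RE.search(source)
    if match is None:
        return None
    constraints: List[Tuple[str, Version]] = []
    for op, major, minor, patch in TERM_RE.findall(match.group(1)):
        constraints += expand_term(op, (int(major), int(minor), int(patch)))
    if not constraints:
        return None
    candidates = []
    for text in available:
        version = tuple(int(part) for part in text.split("."))
        if all(COMPARE[op](version, bound) for op, bound in constraints):
            candidates.append((version, text))
    return max(candidates)[1] if candidates else None


print(resolve_pragma("pragma solidity ^0.8.0;", ["0.7.6", "0.8.0", "0.8.19"]))  # 0.8.19
```

Picking the highest satisfying version is only a fallback heuristic. The deployed contract may have been built with any version in the range, so an exact user-specified version should take precedence when it is given.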
## Compilation Process
### Standard JSON Input Format
**Structure**:
```json
{
  "language": "Solidity",
  "sources": {
    "Contract.sol": {
      "content": "pragma solidity ^0.8.0; ..."
    }
  },
  "settings": {
    "optimizer": {
      "enabled": true,
      "runs": 200
    },
    "evmVersion": "london",
    "outputSelection": {
      "*": {
        "*": ["abi", "evm.bytecode", "evm.deployedBytecode"]
      }
    }
  }
}
```
### Compilation Steps
1. **Prepare Input**: Construct standard JSON input from user submission
2. **Select Compiler**: Choose appropriate compiler version
3. **Resolve Imports**: Handle import statements (local files, external URLs)
4. **Compile**: Execute compiler with standard JSON input
5. **Extract Artifacts**: Extract ABI, bytecode, deployed bytecode
6. **Handle Errors**: Parse compilation errors and return to user
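
Steps 1, 4, 5, and 6 could be sketched as below, assuming a `solc` binary of the selected version is available on the worker. The function names are hypothetical. `solc --standard-json` reads the input document from stdin and writes the output document to stdout.

```python
import json
import subprocess
from typing import Dict


def build_standard_json(sources: Dict[str, str], optimizer_enabled: bool,
                        runs: int, evm_version: str) -> dict:
    """Step 1: assemble the compiler input from a multi-file submission."""
    return {
        "language": "Solidity",
        "sources": {path: {"content": content} for path, content in sources.items()},
        "settings": {
            "optimizer": {"enabled": optimizer_enabled, "runs": runs},
            "evmVersion": evm_version,
            "outputSelection": {"*": {"*": ["abi", "evm.bytecode", "evm.deployedBytecode"]}},
        },
    }


def run_solc(standard_json: dict, solc_path: str = "solc") -> dict:
    """Step 4: run the compiler; solc_path is assumed to point at the chosen version."""
    proc = subprocess.run([solc_path, "--standard-json"], input=json.dumps(standard_json),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


def extract_artifacts(output: dict, file_name: str, contract_name: str) -> dict:
    """Steps 5 and 6: surface compiler errors, otherwise pull out ABI and bytecode."""
    errors = [e for e in output.get("errors", []) if e.get("severity") == "error"]
    if errors:
        raise ValueError("; ".join(e.get("formattedMessage") or e.get("message", "")
                                   for e in errors))
    contract = output["contracts"][file_name][contract_name]
    return {
        "abi": contract["abi"],
        "creation_bytecode": contract["evm"]["bytecode"]["object"],
        "runtime_bytecode": contract["evm"]["deployedBytecode"]["object"],
    }
```

Warnings also appear in the `errors` array of the compiler output, which is why only entries with severity `error` abort the run.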
### Import Resolution
**Import Types**:
- **Local Files**: Included in submission
- **External URLs**: Fetch from URL (GitHub, IPFS, etc.)
- **Standard Libraries**: Known library packages (OpenZeppelin, etc.)
**Resolution Strategy**:
1. Check local files first
2. Try external URL fetching
3. Check standard library registry
4. Fail if cannot resolve
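
The resolution order can be expressed as a chain of resolvers tried in sequence. The sketch below is illustrative: `Resolver` and `resolve_import` are hypothetical names, and the URL fetcher and library registry are represented by plain callables that return `None` when they cannot resolve a path.

```python
from typing import Callable, Dict, Optional, Sequence

# A resolver maps an import path to source text, or None if it cannot resolve it.
Resolver = Callable[[str], Optional[str]]


def local_resolver(files: Dict[str, str]) -> Resolver:
    """Resolver backed by the files included in the submission."""
    return lambda path: files.get(path)


def resolve_import(path: str, resolvers: Sequence[Resolver]) -> str:
    """Try each resolver in order and fail if none of them knows the path."""
    for resolver in resolvers:
        content = resolver(path)
        if content is not None:
            return content
    raise FileNotFoundError(f"cannot resolve import: {path}")


# Usage: local files first, then stand-ins for URL fetching and the library registry.
submitted = {"Token.sol": "contract Token {}"}
fetch_url: Resolver = lambda path: None   # placeholder for an HTTP/IPFS fetcher
registry: Resolver = lambda path: "library Math {}" if path == "lib/Math.sol" else None
chain = [local_resolver(submitted), fetch_url, registry]
```

Keeping each source behind the same callable signature makes it easy to disable external fetching in environments where outbound network access is not allowed.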
### Optimization Settings
**Optimizer Configuration**:
- **Enabled**: Boolean flag
- **Runs**: Optimization runs (affects bytecode size vs gas cost)
- **EVM Version**: Target EVM version (affects bytecode generation)
**Matching Strategy**:
- Must match deployed contract's optimization settings exactly
- Try multiple optimization combinations if initial match fails
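
One way to enumerate fallback combinations is sketched below. The submitted settings are tried first, followed by a small grid of alternatives. The default `runs_options` and `evm_versions` values are illustrative assumptions, not a list defined by this specification.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple

Settings = Tuple[bool, int, str]  # (optimizer enabled, runs, EVM version)


def candidate_settings(submitted: Settings,
                       runs_options: Sequence[int] = (200, 1000, 1),
                       evm_versions: Sequence[str] = ("london", "paris", "shanghai"),
                       ) -> Iterator[Settings]:
    """Yield the submitted settings first, then each other combination once."""
    seen = set()
    grid = product((True, False), runs_options, evm_versions)
    for settings in [submitted, *grid]:
        if settings not in seen:
            seen.add(settings)
            yield settings
```

Each candidate costs a full compilation, so the grid should stay small and the loop should stop at the first match.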
## Bytecode Matching
### Matching Process
**Goal**: Compare compiled bytecode with deployed bytecode.
**Steps**:
1. Fetch deployed bytecode from chain via `eth_getCode(address)`
2. Extract deployed bytecode from compilation artifacts
3. Compare bytecodes (exact match required after normalization)
4. Handle constructor arguments (appended to the creation code in the deployment transaction; they are not part of the deployed bytecode)
### Bytecode Normalization
**Normalization Steps**:
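1. Remove the metadata section at the end of the bytecode (typically the last 53 bytes)
2. Remove constructor arguments (only when comparing creation bytecode against the deployment transaction input)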
3. Compare remaining bytecode
**Metadata Hash**:
- Solidity appends metadata hash to bytecode
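- Typical format (IPFS hash plus compiler version): `0xa2646970667358221220...` + 43 bytes, 53 bytes in total; the final two bytes encode the length of the CBOR section, so the length should be read from the bytecode instead of being hard-coded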
- Should be excluded from comparison
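
A minimal sketch of the normalization, assuming the bytecode ends with a CBOR map followed by a two-byte big-endian length. It does not handle immutables or contracts that embed several metadata sections. `strip_metadata` and `runtime_matches` are hypothetical helper names.

```python
def strip_metadata(code: bytes) -> bytes:
    """Drop the trailing CBOR metadata section if one appears to be present."""
    if len(code) < 2:
        return code
    cbor_length = int.from_bytes(code[-2:], "big")
    total = cbor_length + 2  # CBOR payload plus the two length bytes
    if total > len(code):
        return code  # length field is implausible, assume no metadata
    if code[-total] >> 5 != 5:
        return code  # payload does not start with a CBOR map (major type 5)
    return code[:-total]


def _to_bytes(hex_code: str) -> bytes:
    return bytes.fromhex(hex_code[2:] if hex_code.startswith("0x") else hex_code)


def runtime_matches(compiled_hex: str, deployed_hex: str) -> bool:
    """Compare runtime bytecode while ignoring the metadata section."""
    return strip_metadata(_to_bytes(compiled_hex)) == strip_metadata(_to_bytes(deployed_hex))
```

Two builds of identical source can differ only in the metadata hash (for example when file paths or comments change), which is why the comparison ignores that section.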
### Constructor Arguments Extraction
**Purpose**: Extract constructor arguments from the contract creation transaction.
**Process**:
1. Creation transaction input: `creation_code + constructor_args`
2. Deployed bytecode: `runtime_code` (constructor args are not part of it)
3. Extract constructor args: the bytes of the creation transaction input that follow the compiled creation code, i.e. `tx_input[len(creation_code):]`
**Validation**:
- Verify extracted constructor args match user-provided args (if provided)
- Decode constructor args if ABI available
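
A sketch of the extraction, assuming the constructor arguments are the ABI-encoded tail of the creation transaction input and that the compiled creation code has the same length as the one that was deployed. The function names are hypothetical. Verifying that the code prefix itself matches is left to the bytecode comparison step.

```python
def _strip_0x(value: str) -> str:
    return value[2:] if value.startswith("0x") else value


def extract_constructor_args(creation_tx_input: str, compiled_creation_code: str) -> str:
    """Return the hex-encoded bytes that follow the creation code in the tx input."""
    tx_input = _strip_0x(creation_tx_input).lower()
    code = _strip_0x(compiled_creation_code).lower()
    if len(tx_input) < len(code):
        raise ValueError("transaction input is shorter than the creation code")
    args = tx_input[len(code):]
    # ABI-encoded arguments are made of 32-byte words (64 hex characters each).
    if len(args) % 64 != 0:
        raise ValueError("constructor arguments are not a whole number of 32-byte words")
    return "0x" + args


def args_match(extracted: str, user_provided: str) -> bool:
    """Compare extracted and user-provided arguments, ignoring case and the 0x prefix."""
    return _strip_0x(extracted).lower() == _strip_0x(user_provided).lower()
```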
### Library Linking
**Problem**: Contracts using libraries have placeholders in bytecode.
**Solution**:
1. Identify library placeholders in compiled bytecode
2. Replace placeholders with actual library addresses
3. Compare linked bytecode with deployed bytecode
**Library Placeholder Format**:
- `__$...$__` (Solidity)
- Must match user-provided library addresses
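
Linking can be sketched using the `linkReferences` section that the Solidity compiler emits next to unlinked bytecode, which lists the byte offset and length of every placeholder. The sketch assumes the bytecode is hex text without a `0x` prefix and that `library_addresses` is keyed by library name. `link_bytecode` is a hypothetical name.

```python
from typing import Dict, List


def link_bytecode(unlinked_hex: str,
                  link_references: Dict[str, Dict[str, List[dict]]],
                  library_addresses: Dict[str, str]) -> str:
    """Replace each library placeholder with the address supplied for that library."""
    code = unlinked_hex
    for file_name, libraries in link_references.items():
        for lib_name, positions in libraries.items():
            address = library_addresses.get(lib_name)
            if address is None:
                raise KeyError(f"no address provided for {file_name}:{lib_name}")
            addr_hex = (address[2:] if address.startswith("0x") else address).lower()
            if len(addr_hex) != 40:
                raise ValueError(f"invalid address for {lib_name}")
            for pos in positions:
                start = pos["start"] * 2          # byte offset to hex-char offset
                end = start + pos["length"] * 2   # placeholders are 20 bytes long
                code = code[:start] + addr_hex + code[end:]
    return code
```

Because a placeholder and an address are both 40 hex characters, replacements do not shift the offsets of later placeholders.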
## Verification Status Tracking
### Status States
**States**:
1. **pending**: Verification submitted, queued for processing
2. **processing**: Compilation/verification in progress
3. **verified**: Bytecode matches, contract verified
4. **failed**: Verification failed (mismatch, compilation error, etc.)
5. **partially_verified**: Some source files verified (multi-file contracts)
### Status Updates
**Database Schema**:
```sql
CREATE TABLE contract_verifications (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    chain_id INTEGER NOT NULL,
    address VARCHAR(42) NOT NULL,
    status VARCHAR(20) NOT NULL,
    compiler_version VARCHAR(50),
    optimization_enabled BOOLEAN,
    optimization_runs INTEGER,
    evm_version VARCHAR(20),
    source_code TEXT,
    abi JSONB,
    constructor_arguments TEXT,
    verification_method VARCHAR(50),
    error_message TEXT,
    verified_at TIMESTAMP,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    FOREIGN KEY (chain_id, address) REFERENCES contracts(chain_id, address)
);
```
**Status Transitions**:
- `pending` → `processing` → `verified` or `failed`
- Webhook/notification on status change (optional)
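
The transitions can be enforced with a small state machine. In this sketch, allowing `processing` to end in `partially_verified` is an assumption, since the transition list above does not mention that state. `Status`, `ALLOWED`, and `transition` are hypothetical names.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    VERIFIED = "verified"
    FAILED = "failed"
    PARTIALLY_VERIFIED = "partially_verified"


# Terminal states have no entry and therefore no outgoing transitions.
ALLOWED = {
    Status.PENDING: {Status.PROCESSING},
    Status.PROCESSING: {Status.VERIFIED, Status.FAILED, Status.PARTIALLY_VERIFIED},
}


def transition(current: Status, new: Status) -> Status:
    """Return the new status, or raise if the move is not permitted."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```

A retry after `failed` would then be modelled as a new verification record rather than as a transition back to `pending`.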
## Build Artifact Storage
### Artifact Types
**Artifacts to Store**:
1. **Source Code**: Original submitted source files
2. **Standard JSON Input**: Compiler input
3. **Compiler Output**: Full compiler JSON output
4. **ABI**: Extracted ABI
5. **Bytecode**: Creation and runtime bytecode
6. **Metadata**: Compiler metadata
### Storage Strategy
**Immutable Storage**:
- Use S3-compatible storage (AWS S3, MinIO, etc.)
- Immutable after verification (no updates)
- Versioned storage if updates needed
**Storage Path Structure**:
```
contracts/{chain_id}/{address}/verification_{id}/
  - source_code.sol
  - standard_json_input.json
  - compiler_output.json
  - abi.json
  - bytecode.txt
  - metadata.json
```
**Database Reference**:
- Store artifact storage path in database
- Link to contract record
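
Object keys following the path structure above could be derived as in this sketch. Lowercasing the address is an assumption made here to keep keys canonical. The helper names are hypothetical.

```python
from typing import Dict

ARTIFACT_FILES = (
    "source_code.sol",
    "standard_json_input.json",
    "compiler_output.json",
    "abi.json",
    "bytecode.txt",
    "metadata.json",
)


def artifact_prefix(chain_id: int, address: str, verification_id: str) -> str:
    """Storage prefix for one verification; this is the path stored in the database."""
    return f"contracts/{chain_id}/{address.lower()}/verification_{verification_id}/"


def artifact_keys(chain_id: int, address: str, verification_id: str) -> Dict[str, str]:
    """Map each artifact file name to its full object key."""
    prefix = artifact_prefix(chain_id, address, verification_id)
    return {name: prefix + name for name in ARTIFACT_FILES}
```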
### Access Control
**Public Access**:
- Verified contracts: Public read access
- Source code: Public read access
- Artifacts: Public read access
**Private Access**:
- Pending verifications: Owner only
- Failed verifications: Owner only (optionally public)
## Sourcify Integration
### Sourcify API
**Endpoint**: `GET /api/v1/verify/{chain_id}/{address}`
**Process**:
1. Query Sourcify API for contract verification
2. Retrieve source files and metadata
3. Verify match with deployed bytecode
4. Store in our database if match
**Benefits**:
- Leverage existing verified contracts
- Automatic verification for popular contracts
- Reduces manual verification workload
### Sourcify Format
**Structure**:
```
contracts/
  - {chain_id}/
    - {address}/
      - metadata.json
      - sources/
        - Contract.sol
```
**Metadata Format**:
- Compiler version
- Settings
- Source file mapping
## Multi-Compiler Version Support
### Supported Compilers
**Solidity**:
- Versions: 0.4.x through latest
- Multiple versions per contract (updates)
**Vyper**:
- Versions: 0.1.x through latest
- Similar workflow to Solidity
### Version Compatibility
**Handling**:
- Support multiple verification attempts with different versions
- Store all verification attempts (history)
- Mark latest successful verification as active
**Database Schema**:
```sql
-- Additional columns on contract_verifications for attempt history
ALTER TABLE contract_verifications
    ADD COLUMN version INTEGER DEFAULT 1,       -- Increment for each new verification
    ADD COLUMN is_active BOOLEAN DEFAULT true;  -- Latest successful verification
```
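
The history handling can be illustrated with an in-memory sketch. `Attempt` and `record_attempt` are hypothetical names, and a real implementation would perform the same updates in one database transaction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    version: int
    status: str
    is_active: bool = False


def record_attempt(history: List[Attempt], status: str) -> Attempt:
    """Append a new attempt; a successful one becomes the only active entry."""
    attempt = Attempt(version=len(history) + 1, status=status)
    if status == "verified":
        for previous in history:
            previous.is_active = False
        attempt.is_active = True
    history.append(attempt)
    return attempt
```

A failed attempt leaves the previously active verification untouched, so a contract never loses its verified status because of a later unsuccessful retry.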
## Error Handling
### Compilation Errors
**Error Types**:
- Syntax errors
- Type errors
- Import resolution errors
- Optimization errors
**Response**:
- Return detailed error messages to user
- Include file and line number
- Suggest fixes when possible
### Verification Failures
**Failure Reasons**:
- Bytecode mismatch
- Constructor arguments mismatch
- Library address mismatch
- Optimization settings mismatch
**Response**:
- Return specific mismatch reason
- Suggest correct settings if possible
- Allow retry with corrected input
## Performance Considerations
### Compilation Performance
**Optimization**:
- Cache compilation results (same source + settings)
- Parallel compilation for multiple contracts
- Compiler server pool for load distribution
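
A cache key for "same source + settings" can be derived by hashing the compiler version together with a canonical serialization of the standard JSON input. This is a sketch: the function name is hypothetical and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import json


def compilation_cache_key(compiler_version: str, standard_json: dict) -> str:
    """Hash the compiler version and a key-order-independent form of the input."""
    canonical = json.dumps(standard_json, sort_keys=True, separators=(",", ":"))
    payload = f"{compiler_version}\n{canonical}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Sorting keys makes two submissions that differ only in JSON key order hit the same cache entry, while any change to sources, optimizer settings, or compiler version yields a different key.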
### Queue Management
**Queue System**:
- Use message queue (RabbitMQ, Kafka) for verification jobs
- Priority queue: User submissions before automated checks
- Rate limiting per user/IP
**Processing Time**:
- Target: < 30 seconds for simple contracts
- Target: < 5 minutes for complex contracts
- Timeout: 10 minutes maximum
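
The priority rule can be illustrated with an in-memory stand-in for the message queue. This sketch uses `heapq`; a production deployment would rely on the broker's own priority or multi-queue features. `VerificationQueue` and its constants are hypothetical names.

```python
import heapq
import itertools
from typing import List, Tuple

USER_SUBMISSION = 0   # lower value is served first
AUTOMATED_CHECK = 1
JOB_TIMEOUT_SECONDS = 10 * 60  # hard limit from the processing-time targets


class VerificationQueue:
    """Priority queue that keeps FIFO order among jobs of equal priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._sequence = itertools.count()

    def push(self, job_id: str, priority: int) -> None:
        # The sequence number breaks ties so earlier jobs come out first.
        heapq.heappush(self._heap, (priority, next(self._sequence), job_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```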
## Security Considerations
### Source Code Validation
**Validation**:
- Validate source code size (max 10MB)
- Sanitize input to prevent injection attacks
- Validate compiler version (whitelist known versions)
### Artifact Storage Security
**Access Control**:
- Verify ownership before allowing updates
- Audit log all verification submissions
- Rate limit submissions per user/IP
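
Per-user or per-IP rate limiting could be implemented with a sliding window, as in the sketch below. The class name and the limits used in the example are assumptions. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_requests per key within any window_seconds interval."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_requests
        self._window = window_seconds
        self._clock = clock
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        events = self._events[key]
        # Forget submissions that have left the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False
        events.append(now)
        return True


# Example: at most 5 submissions per minute for each user or IP.
limiter = SlidingWindowLimiter(max_requests=5, window_seconds=60)
```

This in-memory version only works for a single API process; with several instances the counters would need to live in shared storage.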
## API Endpoints
### Submit Verification
`POST /api/v1/contracts/{address}/verify`
### Check Status
`GET /api/v1/contracts/{address}/verification/{verification_id}`
### Get Verified Contract
`GET /api/v1/contracts/{address}`
### List Verification History
`GET /api/v1/contracts/{address}/verifications`
## References
- Indexer Architecture: See `indexer-architecture.md`
- Data Models: See `data-models.md`
- Database Schema: See `../database/postgres-schema.md`
- API Specification: See `../api/rest-api.md`