Add full monorepo: virtual-banker, backend, frontend, docs, scripts, deployment
Co-authored-by: Cursor <cursoragent@cursor.com>
68
docs/specs/observability/dashboards.md
Normal file
@@ -0,0 +1,68 @@
# Dashboards Specification

## Overview

Observability dashboards for monitoring platform health and performance.

## Indexer Lag Dashboard

### Metrics

- Current block vs indexed block (per chain)
- Time lag (minutes behind)
- Processing rate (blocks/minute)
- Historical lag trends

### Visualizations

- Lag over time (line chart)
- Current lag by chain (bar chart)
- Alert status (indicator)
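The three indexer lag metrics reduce to simple arithmetic on periodic per-chain samples. A minimal sketch, assuming each sample records the chain head and the last indexed block together with their timestamps; the `LagSample` type and its fields are hypothetical, not part of this spec.

```python
from dataclasses import dataclass


@dataclass
class LagSample:
    """One observation for a chain (hypothetical shape, not defined by this spec)."""
    chain_id: int
    sampled_at_s: float          # wall-clock time the sample was taken (Unix seconds)
    head_block: int              # current block on the chain
    head_block_time_s: float     # timestamp of the head block
    indexed_block: int           # last block written by the indexer
    indexed_block_time_s: float  # timestamp of the last indexed block


def block_lag(s: LagSample) -> int:
    """Current block vs indexed block, clamped at zero."""
    return max(0, s.head_block - s.indexed_block)


def time_lag_minutes(s: LagSample) -> float:
    """Minutes of chain time between the head block and the last indexed block."""
    return max(0.0, s.head_block_time_s - s.indexed_block_time_s) / 60.0


def processing_rate(prev: LagSample, curr: LagSample) -> float:
    """Blocks indexed per minute between two samples of the same chain."""
    elapsed_min = (curr.sampled_at_s - prev.sampled_at_s) / 60.0
    if elapsed_min <= 0:
        raise ValueError("samples must be in chronological order")
    return (curr.indexed_block - prev.indexed_block) / elapsed_min


# Two illustrative samples for chain 138, taken two minutes apart.
prev = LagSample(138, 0.0, 1_000, 10_000.0, 990, 9_950.0)
curr = LagSample(138, 120.0, 1_024, 10_120.0, 1_020, 10_100.0)
```

With these samples the indexer is 4 blocks and about 20 seconds of chain time behind, and it is advancing 15 blocks/minute. Measuring time lag from block timestamps means an idle chain does not appear to fall behind; comparing against wall-clock time instead would also surface a stalled chain.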

## CCIP Message Lifecycle Dashboard

### Metrics

- Messages by status
- Success rate
- Average execution time
- Failure reasons

### Visualizations

- Message flow diagram
- Status distribution (pie chart)
- Latency over time
- Chain pair statistics
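A sketch of deriving the four lifecycle metrics, plus per-chain-pair counts, from message records in a single pass. The record shape and the status values `SUCCESS`, `FAILED`, and `PENDING` are assumptions for illustration, not the CCIP or platform schema. Success rate is taken over messages that reached a final state, so in-flight messages do not drag it down.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    # Hypothetical record shape; field names and status values are assumptions.
    status: str                       # "SUCCESS", "FAILED", or "PENDING"
    source_chain: int
    dest_chain: int
    sent_at_s: float
    executed_at_s: Optional[float] = None
    failure_reason: Optional[str] = None


def lifecycle_stats(messages: list[Message]) -> dict:
    by_status = Counter(m.status for m in messages)
    finished = by_status["SUCCESS"] + by_status["FAILED"]
    # Success rate over messages in a final state; pending ones are excluded.
    success_rate = by_status["SUCCESS"] / finished if finished else None
    exec_times = [m.executed_at_s - m.sent_at_s for m in messages
                  if m.status == "SUCCESS" and m.executed_at_s is not None]
    avg_exec_s = sum(exec_times) / len(exec_times) if exec_times else None
    failure_reasons = Counter(m.failure_reason or "unknown"
                              for m in messages if m.status == "FAILED")
    chain_pairs = Counter((m.source_chain, m.dest_chain) for m in messages)
    return {
        "by_status": dict(by_status),
        "success_rate": success_rate,
        "avg_execution_s": avg_exec_s,
        "failure_reasons": dict(failure_reasons),
        "chain_pairs": dict(chain_pairs),
    }


# Illustrative data only.
messages = [
    Message("SUCCESS", 1, 138, 0.0, 30.0),
    Message("SUCCESS", 1, 138, 10.0, 70.0),
    Message("FAILED", 138, 1, 5.0, failure_reason="reverted"),
    Message("PENDING", 138, 1, 20.0),
]
stats = lifecycle_stats(messages)
```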

## Transaction Funnel Analytics

### Funnel Stages

1. Quote requested
2. User approved
3. Transaction signed
4. Transaction broadcast
5. Transaction confirmed

### Metrics

- Conversion rate at each stage
- Drop-off reasons
- Time at each stage
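Stage-to-stage conversion can be computed from the furthest stage each session reached. A sketch, assuming events have already been reduced to one furthest stage per session; the stage keys mirror the list above, while the session IDs and the input shape are hypothetical.

```python
STAGES = [
    "quote_requested",
    "user_approved",
    "transaction_signed",
    "transaction_broadcast",
    "transaction_confirmed",
]


def funnel(furthest_stage: dict[str, str]) -> list[tuple[str, int, float]]:
    """Return (stage, sessions that reached it, conversion from the previous stage).

    furthest_stage maps a session ID to the last stage that session reached.
    """
    position = {name: i for i, name in enumerate(STAGES)}
    reached = [0] * len(STAGES)
    for stage in furthest_stage.values():
        # A session that reached stage k also passed every earlier stage.
        for i in range(position[stage] + 1):
            reached[i] += 1
    rows = []
    for i, name in enumerate(STAGES):
        previous = reached[i - 1] if i > 0 else reached[0]
        rate = reached[i] / previous if previous else 0.0
        rows.append((name, reached[i], rate))
    return rows


# Illustrative sessions keyed by hypothetical IDs.
sessions = {
    "a": "transaction_confirmed",
    "b": "transaction_signed",
    "c": "quote_requested",
    "d": "user_approved",
}
rows = funnel(sessions)
```

For the sample sessions, 75% of quotes were approved and half of the signed transactions were broadcast; the drop-off at each stage is one minus its conversion rate. Attributing drop-off *reasons* would need extra event data that this sketch does not model.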

## System Health Dashboard

### Components

- Service health status
- Error rates
- Request rates
- Resource usage
- Database health

## References

- Metrics & Monitoring: See `metrics-monitoring.md`
- Logging: See `logging.md`
75
docs/specs/observability/logging.md
Normal file
@@ -0,0 +1,75 @@
# Logging Architecture Specification

## Overview

Centralized logging architecture for the explorer platform.

## Log Aggregation Strategy

**Solution**: ELK Stack (Elasticsearch, Logstash, Kibana) or Loki + Grafana

**Flow**:
1. Services emit logs
2. Log collectors aggregate the logs
3. Logs are stored in a central store (Elasticsearch or Loki)
4. Users query logs and view dashboards in the UI (Kibana or Grafana)

## Log Levels and Categorization

### Log Levels

- **DEBUG**: Detailed debugging information
- **INFO**: General informational messages about normal operation
- **WARN**: Unexpected conditions that did not cause a failure
- **ERROR**: Failed operations, such as a request that could not be completed
- **FATAL**: Critical errors after which the service cannot continue

### Categories

- **Application Logs**: Business logic, API requests
- **Access Logs**: HTTP requests, authentication
- **System Logs**: Infrastructure, system events
- **Audit Logs**: Security events, compliance

## Structured Logging Format

### Log Format

```json
{
  "timestamp": "2024-01-01T00:00:00Z",
  "level": "INFO",
  "service": "explorer-api",
  "message": "Request processed",
  "request_id": "uuid",
  "user_id": "uuid",
  "chain_id": 138,
  "method": "GET",
  "path": "/api/v1/blocks",
  "status_code": 200,
  "duration_ms": 45,
  "metadata": {}
}
```
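One way to produce this format, sketched with Python's standard `logging` module for illustration (the services may use another language or logging library). The formatter maps Python's `WARNING` and `CRITICAL` level names to the spec's `WARN` and `FATAL`, and copies request fields supplied through `extra=`; the field list comes from the example above, and the formatter itself is an assumption, not an existing component.

```python
import json
import logging
from datetime import datetime, timezone

# Python's level names differ from the levels in this spec in two places.
LEVEL_MAP = {"WARNING": "WARN", "CRITICAL": "FATAL"}
REQUEST_FIELDS = ("request_id", "user_id", "chain_id", "method",
                  "path", "status_code", "duration_ms")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object in the structured format above."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": LEVEL_MAP.get(record.levelname, record.levelname),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Values passed via logger.info(..., extra={...}) become record attributes.
        for name in REQUEST_FIELDS:
            if hasattr(record, name):
                entry[name] = getattr(record, name)
        entry["metadata"] = getattr(record, "metadata", {})
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter("explorer-api"))
    logger = logging.getLogger("explorer-api")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("Request processed", extra={
        "request_id": "example-request-id", "chain_id": 138, "method": "GET",
        "path": "/api/v1/blocks", "status_code": 200, "duration_ms": 45,
    })
```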

## Log Retention Policies

- **Development**: 7 days
- **Staging**: 30 days
- **Production**: 90 days (hot), 1 year (cold archive)
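A sketch of how a cleanup job might apply these policies to a log of a given age. It assumes production's "1 year" is counted from when the log was written, so archived logs are deleted after 365 days in total; the spec leaves this open, and `retention_action` is a hypothetical helper.

```python
# (days in hot storage, total days before deletion) per environment.
RETENTION = {
    "development": (7, 7),
    "staging": (30, 30),
    "production": (90, 365),  # assumption: 1 year counted from creation
}


def retention_action(environment: str, age_days: int) -> str:
    """Return "hot", "archive", or "delete" for a log of the given age."""
    hot_days, total_days = RETENTION[environment]
    if age_days < hot_days:
        return "hot"
    if age_days < total_days:
        return "archive"
    return "delete"
```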

## PII Sanitization in Logs

**Strategy**: Remove or mask PII before it is written to logs

**Fields to Sanitize**:
- Email addresses
- Personal names
- Addresses
- API keys (partial masking)

**Implementation**: Log sanitization middleware
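A minimal sketch of the sanitizing step such middleware could run on each structured entry before it is written. It assumes names and addresses arrive in known fields (the hypothetical `name` and `postal_address`), because detecting those in free text with regular expressions is unreliable. E-mail addresses are additionally scrubbed from free text with a deliberately simple pattern, and API keys keep only their last four characters.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")
# Hypothetical field names; adjust to the real log schema.
REDACT_FIELDS = {"email", "name", "postal_address"}
MASK_FIELDS = {"api_key"}


def mask_secret(value: str, visible: int = 4) -> str:
    """Partially mask a secret, keeping only the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def sanitize(entry: dict) -> dict:
    """Return a copy of a structured log entry with PII removed or masked."""
    clean = {}
    for key, value in entry.items():
        if key in REDACT_FIELDS:
            clean[key] = "[REDACTED]"
        elif key in MASK_FIELDS and isinstance(value, str):
            clean[key] = mask_secret(value)
        elif isinstance(value, str):
            # Scrub e-mail addresses embedded in free text such as messages.
            clean[key] = EMAIL_RE.sub("[EMAIL]", value)
        elif isinstance(value, dict):
            clean[key] = sanitize(value)  # e.g. the "metadata" object
        else:
            clean[key] = value
    return clean
```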

## References

- Metrics & Monitoring: See `metrics-monitoring.md`
104
docs/specs/observability/metrics-monitoring.md
Normal file
@@ -0,0 +1,104 @@
# Metrics & Monitoring Specification

## Overview

Metrics collection and monitoring for the explorer platform.

## Metrics Catalog

### API Metrics

- Request rate (requests/second)
- Response time (p50, p95, p99)
- Error rate (by status code)
- Endpoint usage
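Percentiles are normally computed by the metrics backend (often estimated from histogram buckets), but the definition is simple to state on raw samples. A sketch using the nearest-rank method; a backend's estimate may differ, especially with coarse buckets.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative response times in milliseconds.
latencies_ms = [12, 15, 20, 22, 30, 45, 50, 80, 120, 900]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
```

With only ten samples, p95 and p99 both land on the single slowest request, which is why tail percentiles are only meaningful with enough traffic in the window.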

### Indexer Metrics

- Blocks processed per minute
- Transactions processed per minute
- Block lag (current block - last indexed)
- Error rate
- Processing time

### Database Metrics

- Query performance
- Connection pool usage
- Slow queries
- Replication lag

### Infrastructure Metrics

- CPU usage
- Memory usage
- Disk I/O
- Network I/O

## Dashboard Specifications

### Key Dashboards

**1. System Health**:
- Overall system status
- Service health
- Error rates
- Resource usage

**2. API Performance**:
- Request rates
- Latency percentiles
- Error rates
- Top endpoints

**3. Indexer Performance**:
- Block processing rate
- Indexer lag
- Error rates
- Chain status

## Alerting Rules

### Alert Conditions

**Critical**:
- Service down
- Error rate > 5%
- Indexer lag > 100 blocks
- Database connection failures

**Warning**:
- Error rate > 1%
- Indexer lag > 10 blocks
- High latency (p95 > 1s)
- High resource usage (> 80%)
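The thresholds above can be read as a per-metric check in which the critical threshold takes precedence over the warning one. A sketch over a single snapshot of metrics; the `Snapshot` fields are hypothetical names for catalog metrics, and "resource usage" is assumed to mean the highest of CPU, memory, and disk utilization. In practice these would live as rules in the alerting system, usually with a minimum duration so that brief spikes do not fire.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    # Hypothetical field names for metrics from the catalog above.
    service_up: bool
    db_connection_ok: bool
    error_rate: float          # fraction of failing requests, 0.05 == 5%
    indexer_lag_blocks: int
    p95_latency_s: float
    max_resource_usage: float  # highest of CPU/memory/disk, 0.8 == 80%


def evaluate(s: Snapshot) -> list[tuple[str, str]]:
    """Return (severity, reason) pairs; each metric raises at most its most severe alert."""
    alerts = []
    if not s.service_up:
        alerts.append(("critical", "service down"))
    if not s.db_connection_ok:
        alerts.append(("critical", "database connection failures"))
    if s.error_rate > 0.05:
        alerts.append(("critical", "error rate > 5%"))
    elif s.error_rate > 0.01:
        alerts.append(("warning", "error rate > 1%"))
    if s.indexer_lag_blocks > 100:
        alerts.append(("critical", "indexer lag > 100 blocks"))
    elif s.indexer_lag_blocks > 10:
        alerts.append(("warning", "indexer lag > 10 blocks"))
    if s.p95_latency_s > 1.0:
        alerts.append(("warning", "p95 latency > 1s"))
    if s.max_resource_usage > 0.8:
        alerts.append(("warning", "resource usage > 80%"))
    return alerts
```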

### Alert Channels

- Email
- Slack
- PagerDuty (critical alerts)

## SLO Definitions

### API SLOs

- **Availability**: 99.9% uptime
- **Latency**: p95 < 500ms
- **Error Rate**: < 0.1%
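An availability target converts directly into an error budget. A sketch, assuming a 30-day window (the spec does not fix one): 99.9% leaves 0.1% of 43,200 minutes, about 43.2 minutes of downtime.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of unavailability an availability SLO allows over the window."""
    return (1 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, downtime_min: float, window_days: int = 30) -> float:
    """Fraction of the error budget left; negative once the SLO is breached."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_min) / budget


budget = error_budget_minutes(0.999)  # about 43.2 minutes per 30 days
```

The same arithmetic applies to the < 0.1% error-rate target, which allows roughly 1,000 failed requests per million.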

### Indexer SLOs

- **Lag**: < 10 blocks behind chain head
- **Processing Time**: p95 < 5 seconds per block

### WebSocket SLOs

- **Delivery**: 99.9% message delivery
- **Latency**: < 100ms message delivery

## References

- Logging: See `logging.md`
- Tracing: See `tracing.md`
66
docs/specs/observability/tracing.md
Normal file
@@ -0,0 +1,66 @@
# Distributed Tracing Specification

## Overview

Distributed tracing for request tracking across services.

## Distributed Tracing Strategy

**Solution**: OpenTelemetry or Jaeger

**Implementation**:
- Instrument services with tracing
- Propagate trace context
- Collect and store traces
- Visualize in UI
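
## Trace Sampling

### Sampling Strategy

**Head-Based Sampling**:
- Sample rate: 1% of requests, decided when the trace starts
- Always sample errors
- Always sample slow requests (> 1s)

Whether a request fails or exceeds 1s is only known once it completes, so the last two rules cannot be enforced by a head-based decision alone; they require keeping spans until the trace finishes and deciding then, which is what tail-based sampling does.

**Tail-Based Sampling** (optional):
- Decide after the trace completes, based on its characteristics (for example, errors or latency)
- More efficient storage: retained traces are the informative ones rather than a random sample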
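A sketch of the combined decision, assuming spans for every request reach a collector that decides once the trace completes. The 1% head decision is derived deterministically from the trace ID, so every service reaches the same verdict without coordination. This is similar in spirit to the ratio-based samplers in OpenTelemetry SDKs, but the bit selection and the `CompletedTrace` type are assumptions here, not the SDK's implementation.

```python
from dataclasses import dataclass

HEAD_SAMPLE_RATE = 0.01
SLOW_THRESHOLD_S = 1.0
_ID_SPACE = 1 << 64


def head_sampled(trace_id: str, rate: float = HEAD_SAMPLE_RATE) -> bool:
    """Deterministic head decision from the low 64 bits of a 32-hex-char trace ID."""
    low_bits = int(trace_id[-16:], 16)
    return low_bits < int(rate * _ID_SPACE)


@dataclass
class CompletedTrace:
    # Hypothetical summary that is only available after the trace finishes.
    trace_id: str
    had_error: bool
    duration_s: float


def keep_trace(t: CompletedTrace) -> bool:
    """Tail decision: always keep errors and slow traces, else defer to the head decision."""
    if t.had_error or t.duration_s > SLOW_THRESHOLD_S:
        return True
    return head_sampled(t.trace_id)
```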

## Trace Correlation Across Services

### Trace Context Propagation

**Headers**:
- `traceparent` (W3C Trace Context)
- `tracestate` (W3C Trace Context)

**Propagation**: HTTP headers, message queue metadata
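A version-`00` `traceparent` value has four dash-separated lowercase hex fields: version, a 32-character trace ID, a 16-character parent ID, and 2-character trace flags, where bit `01` means sampled. A minimal parser and builder for illustration; in practice the tracing SDK's propagator handles this, the helper names are hypothetical, and this sketch accepts only the exact version-`00` layout.

```python
import re
import secrets
from typing import NamedTuple, Optional

_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceParent]:
    """Return the parsed context, or None if the header is not a valid version-00 value."""
    m = _TRACEPARENT_RE.fullmatch(header.strip())
    if not m:
        return None
    trace_id, parent_id, flags = m.groups()
    # All-zero trace and parent IDs are invalid.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceParent(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceParent) -> str:
    """Header for an outgoing call: same trace ID, a new span ID as the parent ID."""
    span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{span_id}-{flags}"
```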

### Trace Structure

```
Trace (request)
├── Span (API Gateway)
│   ├── Span (Explorer API)
│   │   ├── Span (Database Query)
│   │   └── Span (Cache Lookup)
│   └── Span (Search Service)
└── Span (Response)
```

## Performance Analysis Workflows

### Analysis Steps

1. Identify slow requests
2. Trace request path
3. Identify bottlenecks
4. Optimize slow components
5. Verify improvements
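Identifying bottlenecks (step 3) often comes down to finding the span with the largest *self time*: its duration minus the time spent in its children. A sketch over a span tree shaped like the one above, with invented durations; the `Span` type is hypothetical, children are assumed not to overlap (concurrent children would need interval merging), and results are keyed by span name for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def self_times(root: Span) -> dict[str, float]:
    """Map span name to self time, assuming unique names and non-overlapping children."""
    out = {}
    stack = [root]
    while stack:
        span = stack.pop()
        child_total = sum(c.duration_ms for c in span.children)
        out[span.name] = max(0.0, span.duration_ms - child_total)
        stack.extend(span.children)
    return out


def bottleneck(root: Span) -> tuple[str, float]:
    """Return the span with the largest self time."""
    times = self_times(root)
    name = max(times, key=times.get)
    return name, times[name]


# Illustrative durations in milliseconds.
trace = Span("API Gateway", 250, [
    Span("Explorer API", 200, [Span("Database Query", 150), Span("Cache Lookup", 5)]),
    Span("Search Service", 30),
])
```

Here the database query dominates with 150 ms of self time, even though API Gateway and Explorer API have longer total durations.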

## References

- Logging: See `logging.md`
- Metrics: See `metrics-monitoring.md`