Add full monorepo: virtual-banker, backend, frontend, docs, scripts, deployment

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 11:32:49 -08:00
parent aafcd913c2
commit 88bc76da91
815 changed files with 125522 additions and 264 deletions
--- a/docs/specs/observability/dashboards.md
+++ b/docs/specs/observability/dashboards.md
@@ -0,0 +1,68 @@
+# Dashboards Specification
+
+## Overview
+
+Observability dashboards for monitoring platform health and performance.
+
+## Indexer Lag Dashboard
+
+### Metrics
+
+- Current block vs indexed block (per chain)
+- Time lag (minutes behind)
+- Processing rate (blocks/minute)
+- Historical lag trends
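A minimal sketch of how the lag metrics above could be derived from two block numbers per chain. The `ChainProgress` type, its field names, and the 2-second block time are assumptions for illustration, not part of the spec.

```python
from dataclasses import dataclass


@dataclass
class ChainProgress:
    chain_id: int
    head_block: int           # latest block reported by the chain node
    indexed_block: int        # last block the indexer has persisted
    avg_block_time_s: float   # assumed average block interval for this chain


def block_lag(p: ChainProgress) -> int:
    """Blocks behind chain head, clamped at zero if the node briefly trails the indexer."""
    return max(0, p.head_block - p.indexed_block)


def time_lag_minutes(p: ChainProgress) -> float:
    """Approximate time lag, estimated from block lag and the average block time."""
    return block_lag(p) * p.avg_block_time_s / 60.0


def processing_rate(blocks_indexed: int, window_seconds: float) -> float:
    """Blocks indexed per minute over an observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return blocks_indexed * 60.0 / window_seconds


progress = ChainProgress(chain_id=138, head_block=1_000_120,
                         indexed_block=1_000_000, avg_block_time_s=2.0)
print(block_lag(progress), time_lag_minutes(progress), processing_rate(300, 120))
```

Estimating time lag from block lag is only an approximation; comparing block timestamps would be more accurate where they are available.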
+
+### Visualizations
+
+- Lag over time (line chart)
+- Current lag by chain (bar chart)
+- Alert status (indicator)
+
+## CCIP Message Lifecycle Dashboard
+
+### Metrics
+
+- Messages by status
+- Success rate
+- Average execution time
+- Failure reasons
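The four metrics above can be sketched as one aggregation over message records. The `CcipMessage` shape and the status values `"pending"`, `"success"`, and `"failed"` are hypothetical; the spec does not define them.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CcipMessage:
    message_id: str
    status: str                              # hypothetical: "pending", "success", "failed"
    execution_seconds: Optional[float] = None
    failure_reason: Optional[str] = None


def summarize(messages: List[CcipMessage]) -> dict:
    by_status = Counter(m.status for m in messages)
    # Success rate is computed over finished messages only; pending ones are excluded.
    finished = by_status["success"] + by_status["failed"]
    success_rate = by_status["success"] / finished if finished else None
    times = [m.execution_seconds for m in messages
             if m.status == "success" and m.execution_seconds is not None]
    avg_time = sum(times) / len(times) if times else None
    reasons = Counter(m.failure_reason for m in messages if m.status == "failed")
    return {
        "by_status": dict(by_status),
        "success_rate": success_rate,
        "avg_execution_seconds": avg_time,
        "failure_reasons": dict(reasons),
    }


sample = [
    CcipMessage("m1", "success", execution_seconds=60.0),
    CcipMessage("m2", "success", execution_seconds=90.0),
    CcipMessage("m3", "success", execution_seconds=120.0),
    CcipMessage("m4", "failed", failure_reason="out_of_gas"),
    CcipMessage("m5", "pending"),
]
summary = summarize(sample)
print(summary)
```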
+
+### Visualizations
+
+- Message flow diagram
+- Status distribution (pie chart)
+- Latency over time
+- Chain pair statistics
+
+## Transaction Funnel Analytics
+
+### Funnel Stages
+
+1. Quote requested
+2. User approved
+3. Transaction signed
+4. Transaction broadcast
+5. Transaction confirmed
+
+### Metrics
+
+- Conversion rate at each stage
+- Drop-off reasons
+- Time at each stage
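A small sketch of the per-stage conversion rate, assuming each stage reports a count of sessions that reached it. The counts below are made up for illustration.

```python
from typing import List, Tuple

# Stage names follow the funnel above; the counts are illustrative only.
funnel: List[Tuple[str, int]] = [
    ("Quote requested", 1000),
    ("User approved", 800),
    ("Transaction signed", 600),
    ("Transaction broadcast", 540),
    ("Transaction confirmed", 513),
]


def conversion_rates(stages: List[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Conversion from each stage to the next, as a fraction of the earlier stage."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(stages, stages[1:]):
        rate = count / prev_count if prev_count else 0.0
        rates.append((f"{prev_name} -> {name}", rate))
    return rates


def overall_conversion(stages: List[Tuple[str, int]]) -> float:
    """Fraction of sessions entering the first stage that reach the last one."""
    first, last = stages[0][1], stages[-1][1]
    return last / first if first else 0.0


for step, rate in conversion_rates(funnel):
    print(f"{step}: {rate:.1%}")
print(f"overall: {overall_conversion(funnel):.1%}")
```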
+
+## System Health Dashboard
+
+### Components
+
+- Service health status
+- Error rates
+- Request rates
+- Resource usage
+- Database health
+
+## References
+
+- Metrics & Monitoring: See `metrics-monitoring.md`
+- Logging: See `logging.md`
+
--- a/docs/specs/observability/logging.md
+++ b/docs/specs/observability/logging.md
@@ -0,0 +1,75 @@
+# Logging Architecture Specification
+
+## Overview
+
+Centralized logging architecture for the explorer platform.
+
+## Log Aggregation Strategy
+
+**Solution**: ELK Stack (Elasticsearch, Logstash, Kibana) or Loki + Grafana
+
+**Flow**:
+1. Services emit logs
+2. Log collectors aggregate logs
+3. Logs stored in central store
+4. Dashboards and queries via UI
+
+## Log Levels and Categorization
+
+### Log Levels
+
+- **DEBUG**: Detailed debugging information
+- **INFO**: General informational messages
+- **WARN**: Warning messages
+- **ERROR**: Error messages
+- **FATAL**: Critical errors
+
+### Categories
+
+- **Application Logs**: Business logic, API requests
+- **Access Logs**: HTTP requests, authentication
+- **System Logs**: Infrastructure, system events
+- **Audit Logs**: Security events, compliance
+
+## Structured Logging Format
+
+### Log Format
+
+```json
+{
+  "timestamp": "2024-01-01T00:00:00Z",
+  "level": "INFO",
+  "service": "explorer-api",
+  "message": "Request processed",
+  "request_id": "uuid",
+  "user_id": "uuid",
+  "chain_id": 138,
+  "method": "GET",
+  "path": "/api/v1/blocks",
+  "status_code": 200,
+  "duration_ms": 45,
+  "metadata": {}
+}
+```
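One way to emit entries in this shape is a custom formatter on Python's standard `logging` module. This is a sketch, not the platform's implementation: `JsonFormatter`, `build_logger`, and the `fields` key passed through `extra` are names chosen here for illustration. Python calls two levels `WARNING` and `CRITICAL`, so they are mapped to the spec's `WARN` and `FATAL`.

```python
import io
import json
import logging
import sys
from datetime import datetime, timezone

# Map Python's level names to the names used in the spec.
LEVEL_NAMES = {"WARNING": "WARN", "CRITICAL": "FATAL"}


class JsonFormatter(logging.Formatter):
    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Request-scoped fields arrive via logger.info(..., extra={"fields": {...}}).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(service: str, stream=sys.stdout) -> logging.Logger:
    logger = logging.getLogger(service)
    logger.handlers.clear()          # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger("explorer-api", buffer)
log.info("Request processed", extra={"fields": {
    "request_id": "uuid", "chain_id": 138, "method": "GET",
    "path": "/api/v1/blocks", "status_code": 200, "duration_ms": 45,
}})
print(buffer.getvalue(), end="")
```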
+
+## Log Retention Policies
+
+- **Development**: 7 days
+- **Staging**: 30 days
+- **Production**: 90 days (hot), 1 year (cold archive)
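The retention policy can be sketched as a lookup that maps a log entry's age to a storage tier. This assumes the 1-year cold archive is measured from log creation rather than from the end of the hot period, which the spec leaves open; the function and tier names are hypothetical.

```python
# Hot retention per environment, in days, taken from the policy above.
HOT_DAYS = {"development": 7, "staging": 30, "production": 90}

# Assumption: the cold archive window is counted from log creation.
COLD_ARCHIVE_DAYS = {"production": 365}


def storage_tier(environment: str, age_days: int) -> str:
    """Return "hot", "cold", or "expired" for a log entry of the given age."""
    if age_days <= HOT_DAYS[environment]:
        return "hot"
    if age_days <= COLD_ARCHIVE_DAYS.get(environment, 0):
        return "cold"
    return "expired"


for env, age in [("development", 8), ("staging", 30), ("production", 91)]:
    print(env, age, storage_tier(env, age))
```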
+
+## PII Sanitization in Logs
+
+**Strategy**: Remove or mask PII before log entries are written
+**Fields to Sanitize**:
+- Email addresses
+- Personal names
+- Addresses
+- API keys (partial masking)
+
+**Implementation**: Log sanitization middleware
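A rough sketch of what the sanitization step might do to a log entry before it is emitted. The field names in `REDACT_KEYS` and `API_KEY_FIELDS`, the placeholder strings, and the choice to keep the last four characters of an API key are all assumptions; the email pattern is deliberately simple and will not catch every valid address.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical field names; a real deployment would define its own list.
REDACT_KEYS = {"email", "name", "full_name", "postal_address"}
API_KEY_FIELDS = {"api_key"}


def mask_api_key(key: str, visible: int = 4) -> str:
    """Partial masking: keep only the last `visible` characters."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


def sanitize(entry: dict) -> dict:
    """Return a copy of the entry with PII removed or masked."""
    clean = {}
    for key, value in entry.items():
        if key in REDACT_KEYS:
            clean[key] = "[REDACTED]"
        elif key in API_KEY_FIELDS and isinstance(value, str):
            clean[key] = mask_api_key(value)
        elif isinstance(value, dict):
            clean[key] = sanitize(value)            # recurse into nested metadata
        elif isinstance(value, str):
            clean[key] = EMAIL_RE.sub("[EMAIL]", value)
        else:
            clean[key] = value
    return clean


raw = {
    "message": "Login by alice@example.com",
    "api_key": "sk_live_abcdef123456",
    "name": "Alice",
    "status_code": 200,
    "metadata": {"email": "alice@example.com"},
}
print(sanitize(raw))
```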
+
+## References
+
+- Metrics & Monitoring: See `metrics-monitoring.md`
+
--- a/docs/specs/observability/metrics-monitoring.md
+++ b/docs/specs/observability/metrics-monitoring.md
@@ -0,0 +1,104 @@
+# Metrics & Monitoring Specification
+
+## Overview
+
+Metrics collection and monitoring for the explorer platform.
+
+## Metrics Catalog
+
+### API Metrics
+
+- Request rate (requests/second)
+- Response time (p50, p95, p99)
+- Error rate (by status code)
+- Endpoint usage
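For reference, the response-time percentiles can be computed from raw samples with the nearest-rank method, sketched below. This is an illustration only; monitoring systems typically estimate percentiles from histogram buckets rather than storing every sample.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer percentiles avoid float rounding surprises.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative latencies in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95), percentile(latencies_ms, 99))
```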
+
+### Indexer Metrics
+
+- Blocks processed per minute
+- Transactions processed per minute
+- Block lag (chain head block number minus last indexed block number)
+- Error rate
+- Processing time
+
+### Database Metrics
+
+- Query performance
+- Connection pool usage
+- Slow queries
+- Replication lag
+
+### Infrastructure Metrics
+
+- CPU usage
+- Memory usage
+- Disk I/O
+- Network I/O
+
+## Dashboard Specifications
+
+### Key Dashboards
+
+**1. System Health**:
+- Overall system status
+- Service health
+- Error rates
+- Resource usage
+
+**2. API Performance**:
+- Request rates
+- Latency percentiles
+- Error rates
+- Top endpoints
+
+**3. Indexer Performance**:
+- Block processing rate
+- Indexer lag
+- Error rates
+- Chain status
+
+## Alerting Rules
+
+### Alert Conditions
+
+**Critical**:
+- Service down
+- Error rate > 5%
+- Indexer lag > 100 blocks
+- Database connection failures
+
+**Warning**:
+- Error rate > 1%
+- Indexer lag > 10 blocks
+- High latency (p95 > 1s)
+- High resource usage (> 80%)
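The thresholds above can be sketched as a single evaluation function. The `Snapshot` type and its field names are hypothetical; rates and usage are expressed as fractions (0.05 for 5%).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Snapshot:
    service_up: bool
    db_connected: bool
    error_rate: float          # fraction of failed requests, e.g. 0.02 for 2%
    indexer_lag_blocks: int
    p95_latency_s: float
    resource_usage: float      # fraction of capacity, e.g. 0.85 for 85%


def evaluate(s: Snapshot) -> Tuple[str, List[str], List[str]]:
    """Return (severity, critical_reasons, warning_reasons)."""
    critical, warning = [], []
    if not s.service_up:
        critical.append("service down")
    if not s.db_connected:
        critical.append("database connection failure")
    if s.error_rate > 0.05:
        critical.append("error rate > 5%")
    elif s.error_rate > 0.01:
        warning.append("error rate > 1%")
    if s.indexer_lag_blocks > 100:
        critical.append("indexer lag > 100 blocks")
    elif s.indexer_lag_blocks > 10:
        warning.append("indexer lag > 10 blocks")
    if s.p95_latency_s > 1.0:
        warning.append("p95 latency > 1s")
    if s.resource_usage > 0.80:
        warning.append("resource usage > 80%")
    severity = "critical" if critical else "warning" if warning else "ok"
    return severity, critical, warning


snapshot = Snapshot(service_up=True, db_connected=True, error_rate=0.02,
                    indexer_lag_blocks=150, p95_latency_s=0.4, resource_usage=0.5)
print(evaluate(snapshot))
```

Each metric raises at most one alert here: a value that crosses the critical threshold does not also produce the matching warning.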
+
+### Alert Channels
+
+- Email
+- Slack
+- PagerDuty (for critical)
+
+## SLO Definitions
+
+### API SLOs
+
+- **Availability**: 99.9% uptime
+- **Latency**: p95 < 500ms
+- **Error Rate**: < 0.1%
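To make the API SLOs concrete, the sketch below converts them into an error budget. The 30-day window is an assumption, since the spec does not state a measurement period; the function names are illustrative.

```python
def allowed_downtime_minutes(availability_pct: float, window_days: int = 30) -> float:
    """Downtime permitted by an availability SLO over the window."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - availability_pct) / 100.0


def error_budget_remaining(total_requests: int, failed_requests: int,
                           max_error_rate: float = 0.001) -> float:
    """Failed requests still allowed before the error-rate SLO (< 0.1%) is breached."""
    return total_requests * max_error_rate - failed_requests


# 99.9% availability over an assumed 30-day window.
print(f"{allowed_downtime_minutes(99.9):.1f} minutes of downtime allowed")
# One million requests with 400 failures so far.
print(f"{error_budget_remaining(1_000_000, 400):.0f} failures left in budget")
```

Under the 30-day assumption, 99.9% availability leaves roughly 43 minutes of downtime per window.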
+
+### Indexer SLOs
+
+- **Lag**: < 10 blocks behind chain head
+- **Processing Time**: p95 < 5 seconds per block
+
+### WebSocket SLOs
+
+- **Delivery**: 99.9% message delivery
+- **Latency**: < 100ms message delivery
+
+## References
+
+- Logging: See `logging.md`
+- Tracing: See `tracing.md`
+
--- a/docs/specs/observability/tracing.md
+++ b/docs/specs/observability/tracing.md
@@ -0,0 +1,66 @@
+# Distributed Tracing Specification
+
+## Overview
+
+Distributed tracing for request tracking across services.
+
+## Distributed Tracing Strategy
+
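+**Solution**: OpenTelemetry for instrumentation and context propagation, with Jaeger (or another OpenTelemetry-compatible backend) for trace storage and visualization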
+
+**Implementation**:
+- Instrument services with tracing
+- Propagate trace context
+- Collect and store traces
+- Visualize in UI
+
+## Trace Sampling
+
+### Sampling Strategy
+
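+**Head-Based Sampling**:
+- Sample rate: 1% of requests, decided when the trace starts
+- Always sample errors (the outcome is not known at trace start, so this needs a tail-based or collector-side rule)
+- Always sample slow requests (> 1s) (same constraint as errors)
+
+**Tail-Based Sampling** (optional):
+- Decide after the trace completes, based on trace characteristics such as status and duration
+- More efficient storage, because uninteresting traces are dropped before they are stored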
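A sketch of how the two sampling decisions could fit together. The head decision is derived deterministically from the trace ID so that every service reaches the same answer for a given trace; the keep-on-error and keep-on-slow rules run only once the trace has finished. Function names and the use of the last 16 hex characters of the trace ID are choices made here for illustration.

```python
def head_sample(trace_id: str, rate: float = 0.01) -> bool:
    """Decide at trace start: map the trace ID to [0, 2**64) and keep the lowest `rate` share."""
    bucket = int(trace_id[-16:], 16)
    return bucket < rate * 2**64


def tail_keep(head_sampled: bool, had_error: bool, duration_s: float,
              slow_threshold_s: float = 1.0) -> bool:
    """Decide after the trace completes: keep sampled, failed, or slow traces."""
    return head_sampled or had_error or duration_s > slow_threshold_s


trace_id = "4bf92f3577b34da6a3ce929d0e0e4736"
sampled = head_sample(trace_id)
print(sampled, tail_keep(sampled, had_error=True, duration_s=0.2))
```

Keeping errors and slow requests this way means spans must be buffered until the trace ends, which is the main cost of tail-based sampling.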
+
+## Trace Correlation Across Services
+
+### Trace Context Propagation
+
+**Headers**:
+- `traceparent` (W3C Trace Context)
+- `tracestate` (W3C Trace Context)
+
+**Propagation**: HTTP headers, message queue metadata
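For illustration, a minimal parser and generator for the `traceparent` header in its version `00` layout: `version-traceid-parentid-flags`, all lowercase hex. The `TraceContext` type and function names are hypothetical, and in practice an OpenTelemetry SDK would handle propagation rather than hand-written code like this.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass
class TraceContext:
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent value; return None if it is malformed."""
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch only handles version 00; all-zero IDs are invalid.
    if version != "00" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceContext(trace_id, parent_id, sampled=bool(int(flags, 16) & 0x01))


def child_traceparent(ctx: TraceContext) -> str:
    """Header for an outgoing call: same trace ID, fresh span ID, same sampled flag."""
    span_id = secrets.token_hex(8)
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{span_id}-{flags}"


incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
ctx = parse_traceparent(incoming)
print(ctx)
print(child_traceparent(ctx))
```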
+
+### Trace Structure
+
+```text
+Trace (request)
+  ├── Span (API Gateway)
+  │   ├── Span (Explorer API)
+  │   │   ├── Span (Database Query)
+  │   │   └── Span (Cache Lookup)
+  │   └── Span (Search Service)
+  └── Span (Response)
+```
+
+## Performance Analysis Workflows
+
+### Analysis Steps
+
+1. Identify slow requests
+2. Trace request path
+3. Identify bottlenecks
+4. Optimize slow components
+5. Verify improvements
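Steps 2 and 3 can be sketched over a span tree like the one in the trace structure above. The sketch ranks spans by self time (a span's duration minus its children's), which assumes child spans run one after another without overlapping. The `Span` type and the durations are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Span:
    name: str
    duration_ms: float
    children: List["Span"] = field(default_factory=list)


def self_time_ms(span: Span) -> float:
    """Time spent in the span itself, excluding its (assumed sequential) children."""
    return span.duration_ms - sum(child.duration_ms for child in span.children)


def walk(span: Span) -> Iterator[Span]:
    """Depth-first traversal of the span tree."""
    yield span
    for child in span.children:
        yield from walk(child)


def bottleneck(root: Span) -> Span:
    """The span with the largest self time."""
    return max(walk(root), key=self_time_ms)


trace = Span("API Gateway", 120, [
    Span("Explorer API", 90, [
        Span("Database Query", 60),
        Span("Cache Lookup", 5),
    ]),
    Span("Search Service", 20),
])

slowest = bottleneck(trace)
print(slowest.name, self_time_ms(slowest))
```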
+
+## References
+
+- Logging: See `logging.md`
+- Metrics: See `metrics-monitoring.md`
+