# Deployment Runbook

## SolaceScanScout Explorer - Production Deployment Guide

**Last Updated**: $(date)

**Version**: 1.0.0

---

## Table of Contents

1. [Pre-Deployment Checklist](#pre-deployment-checklist)
2. [Environment Setup](#environment-setup)
3. [Database Migration](#database-migration)
4. [Service Deployment](#service-deployment)
5. [Health Checks](#health-checks)
6. [Rollback Procedures](#rollback-procedures)
7. [Post-Deployment Verification](#post-deployment-verification)
8. [Troubleshooting](#troubleshooting)
9. [Emergency Procedures](#emergency-procedures)
10. [Maintenance Windows](#maintenance-windows)
11. [Contact Information](#contact-information)

---

## Pre-Deployment Checklist

### Infrastructure Requirements

- [ ] Kubernetes cluster (AKS) or VM infrastructure ready
- [ ] PostgreSQL 16+ with TimescaleDB extension
- [ ] Redis cluster (for production cache/rate limiting)
- [ ] Elasticsearch/OpenSearch cluster
- [ ] Load balancer configured
- [ ] SSL certificates provisioned
- [ ] DNS records configured
- [ ] Monitoring stack deployed (Prometheus, Grafana)

### Configuration

- [ ] Environment variables configured
- [ ] Secrets stored in Key Vault
- [ ] Database credentials verified
- [ ] Redis connection string verified
- [ ] RPC endpoint URLs verified
- [ ] JWT secret configured (strong random value)

### Code & Artifacts

- [ ] All tests passing
- [ ] Docker images built and tagged
- [ ] Images pushed to container registry
- [ ] Database migrations reviewed
- [ ] Rollback plan documented

---

## Environment Setup

### 1. Set Environment Variables

Replace the angle-bracket placeholders with real values. Keep secret values in single quotes so that characters such as `$`, `&`, or `<` are not interpreted by the shell.

```bash
# Database
export DB_HOST=postgres.example.com
export DB_PORT=5432
export DB_USER=explorer
export DB_PASSWORD='<from-key-vault>'        # value stored in Key Vault
export DB_NAME=explorer

# Redis (for production)
export REDIS_URL=redis://redis.example.com:6379

# RPC
export RPC_URL=https://rpc.d-bis.org
export WS_URL=wss://rpc.d-bis.org

# Application
export CHAIN_ID=138
export PORT=8080
export JWT_SECRET='<strong-random-secret>'   # e.g. generate with: openssl rand -hex 32

# Optional
export LOG_LEVEL=info
export ENABLE_METRICS=true
```
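
Before running anything else, it can help to fail fast when a required variable is missing or still holds a placeholder. Here is a minimal sketch that assumes Bash and treats the exports above as required. `require_env` is a hypothetical helper and is not part of the repository:

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail fast if a required variable is unset, empty,
# or still holds an angle-bracket placeholder copied from this runbook.
require_env() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is Bash indirect expansion: the value of the variable called $name.
    case "${!name:-}" in
      '' | '<'*)
        echo "missing or placeholder value: $name" >&2
        missing=1
        ;;
    esac
  done
  return "$missing"
}

# Usage, with the variables exported above (LOG_LEVEL and ENABLE_METRICS are optional):
# require_env DB_HOST DB_PORT DB_USER DB_PASSWORD DB_NAME REDIS_URL \
#   RPC_URL WS_URL CHAIN_ID PORT JWT_SECRET || exit 1
```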
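
### 2. Verify Secrets

```bash
# Let psql and pg_dump authenticate without an interactive prompt
# (a ~/.pgpass file is an alternative to PGPASSWORD)
export PGPASSWORD="$DB_PASSWORD"
export PGPORT="$DB_PORT"

# Test database connection
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1;"

# Test Redis connection (expected reply: PONG)
redis-cli -u "$REDIS_URL" ping

# Test RPC endpoint (expected: JSON with a hex "result" block number)
curl -s -X POST "$RPC_URL" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
```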
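
`eth_blockNumber` returns the block height as a hex string, for example `"result":"0x1b4"`. The following sketch converts that value to decimal, which makes it easier to compare with the explorer's latest block. It assumes a single-line JSON response and uses only `sed` and Bash `printf`. `rpc_block_number` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract the hex "result" from an eth_blockNumber response
# and print it in decimal. Prints nothing and returns 1 for an error response.
rpc_block_number() {
  local response=$1 hex
  hex=$(printf '%s' "$response" | sed -n 's/.*"result" *: *"\(0x[0-9a-fA-F]*\)".*/\1/p')
  [ -n "$hex" ] || return 1
  printf '%d\n' "$hex"   # Bash printf accepts 0x-prefixed hex for %d
}

# Usage:
# resp=$(curl -s -X POST "$RPC_URL" -H "Content-Type: application/json" \
#   -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}')
# rpc_block_number "$resp"
```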

---

## Database Migration

### 1. Backup Existing Database

```bash
# Create backup
pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" > "backup_$(date +%Y%m%d_%H%M%S).sql"

# Verify backup (a failed pg_dump can still leave an empty file behind)
ls -lh backup_*.sql
```
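
Timestamped dumps accumulate quickly. The `backup_%Y%m%d_%H%M%S.sql` names used above sort lexicographically in chronological order, so the newest backups can be selected by name alone. Here is a minimal sketch. The `prune_backups` helper and the default retention of 7 are assumptions, and dumps should be copied off the host before any are pruned:

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the newest $2 (default 7) backup_*.sql files in
# directory $1 and delete the rest. Relies on YYYYMMDD_HHMMSS names sorting in time order.
prune_backups() {
  local dir=${1:-.} keep=${2:-7}
  # Newest first, skip the first $keep, remove everything after that.
  find "$dir" -maxdepth 1 -type f -name 'backup_*.sql' -print \
    | sort -r \
    | tail -n +"$((keep + 1))" \
    | while IFS= read -r old; do
        rm -f -- "$old"
      done
}

# Usage:
# prune_backups /path/to/backups 7
```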

### 2. Run Migrations

```bash
cd explorer-monorepo/backend/database/migrations

# Review pending migrations
go run migrate.go --status

# Run migrations
go run migrate.go --up

# Verify migration
go run migrate.go --status
```
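
A failed migration is undone by restoring the dump from step 1, so a script can refuse to migrate unless a fresh, non-empty backup exists. Here is a minimal sketch. The backup directory and the 60-minute freshness window are assumptions, `recent_backup_exists` is a hypothetical name, and the `migrate.go` flags are the ones shown above:

```bash
#!/usr/bin/env bash
# Hypothetical guard: succeed only if directory $1 holds a non-empty backup_*.sql
# modified within the last $2 minutes (default 60).
recent_backup_exists() {
  local dir=${1:-.} max_age=${2:-60}
  # -size +0 skips empty files; -mmin -N matches files modified less than N minutes ago.
  find "$dir" -maxdepth 1 -type f -name 'backup_*.sql' -size +0 -mmin "-$max_age" \
    | grep -q .
}

# Usage (adjust the backup path):
# recent_backup_exists /path/to/backups 60 || { echo "take a backup first" >&2; exit 1; }
# go run migrate.go --up
```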

### 3. Verify Schema

```bash
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "\dt"
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "\d blocks"
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "\d transactions"
```
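
The `\dt` output is formatted for people to read. For a scripted check, the following sketch reads table names from psql in tuples-only, unaligned mode (`-tA`) and reports any expected table that is absent. It assumes the tables live in the `public` schema. `blocks` and `transactions` come from the commands above, and `missing_tables` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read actual table names (one per line) on stdin and
# print each expected table (given as arguments) that is missing.
# Returns 1 if anything is missing.
missing_tables() {
  local actual expected status=0
  actual=$(cat)
  for expected in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: quiet
    if ! printf '%s\n' "$actual" | grep -qxF -- "$expected"; then
      echo "$expected"
      status=1
    fi
  done
  return "$status"
}

# Usage:
# psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -tA \
#   -c "SELECT tablename FROM pg_tables WHERE schemaname = 'public';" \
#   | missing_tables blocks transactions
```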

---

## Service Deployment

### Option 1: Kubernetes Deployment

#### 1. Deploy API Server

```bash
kubectl apply -f k8s/api-server-deployment.yaml
kubectl apply -f k8s/api-server-service.yaml
kubectl apply -f k8s/api-server-ingress.yaml

# Verify deployment
kubectl get pods -l app=api-server
kubectl logs -f deployment/api-server   # follows the log stream; press Ctrl+C to stop
```

#### 2. Deploy Indexer

```bash
kubectl apply -f k8s/indexer-deployment.yaml

# Verify deployment
kubectl get pods -l app=indexer
kubectl logs -f deployment/indexer
```
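
`kubectl logs -f` follows logs indefinitely, so it cannot tell a script when a deployment is ready. `kubectl rollout status` with `--timeout` waits until the rollout completes or the timeout expires, and it exits non-zero on failure. Here is a minimal sketch that wraps the apply and wait steps. The `deploy_and_wait` name, the `ROLLOUT_TIMEOUT` variable, and the 300s default are assumptions. The manifest paths are the ones used above:

```bash
#!/usr/bin/env bash
# Hypothetical helper: apply one or more manifests, then wait for the rollout.
# Usage: deploy_and_wait <deployment-name> <manifest>...
deploy_and_wait() {
  local name=$1 manifest
  shift
  for manifest in "$@"; do
    kubectl apply -f "$manifest" || return 1
  done
  kubectl rollout status "deployment/$name" --timeout="${ROLLOUT_TIMEOUT:-300s}"
}

# Usage:
# deploy_and_wait api-server k8s/api-server-deployment.yaml \
#   k8s/api-server-service.yaml k8s/api-server-ingress.yaml
# deploy_and_wait indexer k8s/indexer-deployment.yaml
```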

#### 3. Rolling Update

```bash
# Update image
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.1.0

# Monitor rollout
kubectl rollout status deployment/api-server

# Roll back if needed
kubectl rollout undo deployment/api-server
```
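
The update, monitor, and rollback steps above can be combined so that a rollout that does not become ready within a timeout is undone automatically. Here is a minimal sketch. As in the `kubectl set image` example, it assumes the container has the same name as the deployment. `update_or_undo` and the 300s default are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical helper: set a new image; if the rollout does not complete in time,
# undo to the previous revision and return non-zero.
# Usage: update_or_undo <deployment> <container> <image>
update_or_undo() {
  local deploy=$1 container=$2 image=$3
  kubectl set image "deployment/$deploy" "$container=$image" || return 1
  if ! kubectl rollout status "deployment/$deploy" --timeout="${ROLLOUT_TIMEOUT:-300s}"; then
    echo "rollout of $image did not complete; undoing" >&2
    kubectl rollout undo "deployment/$deploy"
    return 1
  fi
}

# Usage:
# update_or_undo api-server api-server registry.example.com/explorer-api:v1.1.0
```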

### Option 2: Docker Compose Deployment

```bash
cd explorer-monorepo/deployment

# Start services (with Compose v2, use `docker compose` instead of `docker-compose`)
docker-compose up -d

# Verify services
docker-compose ps
docker-compose logs -f api-server
```

---

## Health Checks

### 1. API Health Endpoint

```bash
# Check health
curl -s https://api.d-bis.org/health

# Expected response (the timestamp will differ):
# {
#   "status": "ok",
#   "timestamp": "2024-01-01T00:00:00Z",
#   "database": "connected"
# }
```
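
Right after a deployment, the health endpoint may fail briefly while pods start. Here is a polling sketch that retries until the body reports `"status": "ok"`, as in the expected response above. The retry count, the delay, and the plain string match used in place of a JSON parser are simplifying assumptions. `wait_for_healthy` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll a health URL until its JSON body contains "status": "ok".
# Usage: wait_for_healthy <url> [attempts=30] [delay_seconds=2]
wait_for_healthy() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i body
  for ((i = 1; i <= attempts; i++)); do
    # -f: fail on HTTP errors; -s: silent; --max-time: cap each request at 5 seconds
    body=$(curl -fs --max-time 5 "$url") || body=""
    if printf '%s' "$body" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'; then
      return 0
    fi
    sleep "$delay"
  done
  echo "not healthy after $attempts attempts: $url" >&2
  return 1
}

# Usage:
# wait_for_healthy https://api.d-bis.org/health 30 2
```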

### 2. Service Health

```bash
# Kubernetes
kubectl get pods
kubectl describe pod <pod-name>

# Docker
docker ps
docker inspect <container-id>
```

### 3. Database Connectivity

```bash
# As reported by the API server
curl -s https://api.d-bis.org/health | jq .database

# Direct check (COUNT(*) scans the whole table and can be slow on a large chain)
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT COUNT(*) FROM blocks;"
```

### 4. Redis Connectivity

```bash
# Test Redis
redis-cli -u "$REDIS_URL" ping

# Check cache stats
redis-cli -u "$REDIS_URL" INFO stats
```
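
`INFO stats` reports `keyspace_hits` and `keyspace_misses`. The cache hit rate in the monitoring checklist is hits / (hits + misses). The following sketch computes it with `awk`. The counters are cumulative since the Redis server started (or since `CONFIG RESETSTAT`), so the result covers that whole period rather than only the current deployment. `redis_hit_rate` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `redis-cli INFO stats` output on stdin and print the
# keyspace hit rate as a percentage with one decimal place ("n/a" if no lookups yet).
redis_hit_rate() {
  # INFO lines are "key:value" and may end in CRLF, so strip \r first.
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) print "n/a"
      else printf "%.1f\n", 100 * hits / total
    }'
}

# Usage:
# redis-cli -u "$REDIS_URL" INFO stats | redis_hit_rate
```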

---

## Rollback Procedures

### Quick Rollback (Kubernetes)

```bash
# Roll back to the previous version
kubectl rollout undo deployment/api-server
kubectl rollout undo deployment/indexer

# Verify rollback
kubectl rollout status deployment/api-server
kubectl rollout status deployment/indexer
```
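
### Database Rollback

```bash
# Restore from backup. A plain-SQL dump restores cleanly only into an empty
# database, so drop and recreate the database first (or take dumps with
# pg_dump --clean --if-exists). ON_ERROR_STOP=1 makes psql stop at the first error.
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 < backup_YYYYMMDD_HHMMSS.sql

# Or roll back migrations
cd explorer-monorepo/backend/database/migrations
go run migrate.go --down 1
```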

### Full Rollback

```bash
# 1. Stop new services
kubectl scale deployment/api-server --replicas=0
kubectl scale deployment/indexer --replicas=0

# 2. Restore database (stop at the first error)
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -v ON_ERROR_STOP=1 < backup_YYYYMMDD_HHMMSS.sql

# 3. Start previous version
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.0.0
kubectl scale deployment/api-server --replicas=3
# The indexer is still scaled to 0 from step 1: set it back to its previous
# image and replica count as well.
```

---

## Post-Deployment Verification

### 1. Functional Tests

```bash
# Test Track 1 endpoints (public)
curl https://api.d-bis.org/api/v1/track1/blocks/latest

# Test search (quote the URL so the shell does not treat ? as a glob character)
curl "https://api.d-bis.org/api/v1/search?q=1000"

# Test health
curl https://api.d-bis.org/health
```
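
The three `curl` calls above print response bodies but do not fail on HTTP errors. The following sketch checks each endpoint's status code and returns non-zero if any endpoint does not answer 200. The endpoint list comes from above. Expecting exactly 200 is an assumption, and `smoke_test` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: request each URL, report its HTTP status,
# and return 1 if any status is not 200.
smoke_test() {
  local url code failed=0
  for url in "$@"; do
    # -o /dev/null discards the body; -w prints only the status code ("000" if unreachable).
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code="000"
    if [ "$code" = "200" ]; then
      echo "OK   $code $url"
    else
      echo "FAIL $code $url"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
# smoke_test "https://api.d-bis.org/health" \
#   "https://api.d-bis.org/api/v1/track1/blocks/latest" \
#   "https://api.d-bis.org/api/v1/search?q=1000"
```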

### 2. Performance Tests

```bash
# Load test: 1000 requests, 10 concurrent (this puts real load on production)
ab -n 1000 -c 10 https://api.d-bis.org/api/v1/track1/blocks/latest

# Check response times (curl-format.txt holds the --write-out template)
curl -w "@curl-format.txt" -o /dev/null -s https://api.d-bis.org/api/v1/track1/blocks/latest
```
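
The response-time command reads its output format from `curl-format.txt`, which this runbook does not include. Here is one possible version that uses curl's standard `--write-out` timing variables. The layout is our own choice. curl ignores the file's line breaks, so each line ends with an explicit `\n`:

```bash
#!/usr/bin/env bash
# Write a curl --write-out template; each %{time_*} variable is reported in seconds.
cat > curl-format.txt <<'EOF'
    time_namelookup:  %{time_namelookup}s\n
       time_connect:  %{time_connect}s\n
    time_appconnect:  %{time_appconnect}s\n
 time_starttransfer:  %{time_starttransfer}s\n
                    ----------\n
         time_total:  %{time_total}s\n
EOF
```

Run the `curl -w "@curl-format.txt" ...` command from the same directory as the file.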

### 3. Monitoring

- [ ] Check Grafana dashboards
- [ ] Verify Prometheus metrics
- [ ] Check error rates
- [ ] Monitor response times
- [ ] Check database connection pool
- [ ] Verify Redis cache hit rate

---

## Troubleshooting

### Common Issues

#### 1. Database Connection Errors

**Symptoms**: 500 errors, "database connection failed"

**Resolution**:

```bash
# Check database status
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1;"

# Check connection usage against the server limit
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT count(*) FROM pg_stat_activity;"
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SHOW max_connections;"

# Review database/migrations for connection pool settings

# Restart service
kubectl rollout restart deployment/api-server
```

#### 2. Redis Connection Errors

**Symptoms**: Cache misses, rate limiting not working

**Resolution**:

```bash
# Test Redis connection
redis-cli -u "$REDIS_URL" ping

# Check Redis logs
kubectl logs -l app=redis

# Fall back to the in-memory cache/rate limiter (temporary; in-memory state is per pod):
# remove REDIS_URL from the deployment's environment (the trailing "-" unsets it)
kubectl set env deployment/api-server REDIS_URL-
```

#### 3. High Memory Usage

**Symptoms**: OOM kills, slow responses

**Resolution**:

```bash
# Check memory usage, highest first
kubectl top pods --sort-by=memory

# Increase memory limits
kubectl set resources deployment/api-server --limits=memory=2Gi

# Review cache TTL settings
```

#### 4. Slow Response Times

**Symptoms**: High latency, timeout errors

**Resolution**:

```bash
# Check database query performance (substitute a query that is slow in production)
psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "EXPLAIN ANALYZE SELECT * FROM blocks LIMIT 10;"

# Check indexer lag
curl https://api.d-bis.org/api/v1/track2/stats

# Review connection pool settings
```

---

## Emergency Procedures

### Service Outage

1. **Immediate Actions**:
   - Check service status: `kubectl get pods`
   - Check logs: `kubectl logs -f deployment/api-server`
   - Check database: `psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"`
   - Check Redis: `redis-cli -u $REDIS_URL ping`

2. **Quick Recovery**:
   - Restart services: `kubectl rollout restart deployment/api-server`
   - Scale up: `kubectl scale deployment/api-server --replicas=5`
   - Roll back if needed: `kubectl rollout undo deployment/api-server`

3. **Communication**:
   - Update status page
   - Notify team via Slack/email
   - Document incident

### Data Corruption

1. **Immediate Actions**:
   - Stop writes (the indexer writes to the database too): `kubectl scale deployment/api-server --replicas=0` and `kubectl scale deployment/indexer --replicas=0`
   - Back up current state: `pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > emergency_backup.sql`

2. **Recovery**:
   - Restore from last known good backup
   - Verify data integrity
   - Resume services

---

## Maintenance Windows

### Scheduled Maintenance

1. **Pre-Maintenance**:
   - Notify users 24 hours in advance
   - Create maintenance mode flag
   - Prepare rollback plan

2. **During Maintenance**:
   - Enable maintenance mode
   - Perform updates
   - Run health checks

3. **Post-Maintenance**:
   - Disable maintenance mode
   - Verify all services
   - Monitor for issues

---

## Contact Information

- **On-Call Engineer**: Check PagerDuty
- **Slack Channel**: #explorer-deployments
- **Emergency**: [Emergency Contact]

---

**Document Version**: 1.0.0

**Last Reviewed**: $(date)

**Next Review**: $(date -d "+3 months")