
Sankofa Phoenix - Deployment Execution Plan

Date: 2025-01-XX
Status: Ready for Execution


Executive Summary

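This document provides a step-by-step execution plan for deploying Sankofa and Sankofa Phoenix. The VM YAML files are ready and the Proxmox infrastructure is operational; the items listed under Requires Verification in the Pre-Execution Checklist must be confirmed before execution begins.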


Pre-Execution Checklist

Completed

  • Proxmox infrastructure operational (2 sites)
  • All 21 VM YAML files updated with enhanced template
  • Guest agent configuration complete
  • OS images available (ubuntu-22.04-cloud.img)
  • Network configuration verified
  • Documentation comprehensive
  • Scripts ready for deployment

⚠️ Requires Verification

  • Resource quota check (run ./scripts/check-proxmox-quota.sh)
  • Kubernetes cluster status
  • Database connectivity
  • Keycloak deployment status

Execution Phases

Phase 1: Resource Verification (15 minutes)

Objective: Verify Proxmox resources are sufficient for deployment

Steps:

cd /home/intlc/projects/Sankofa

# 1. Run resource quota check
./scripts/check-proxmox-quota.sh

# 2. Review output
# Expected: Available resources >= 72 CPU, 140 GiB RAM, 278 GiB disk

# 3. If insufficient, record the shortfall and plan capacity expansion before continuing

Success Criteria:

  • Resources sufficient for all 18 VMs
  • Storage pools have adequate space
  • Network connectivity verified

Rollback: None required - verification only
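
The comparison in step 2 can be made mechanical. The following is a minimal sketch, assuming the available CPU, RAM, and disk figures have already been read off the quota script's output by hand; the check_quota helper is hypothetical and does not parse that output, whose format is not documented here.

```bash
#!/usr/bin/env bash
# Required totals, copied from the "Expected" comment in Phase 1.
REQ_CPU=72
REQ_RAM_GIB=140
REQ_DISK_GIB=278

# check_quota AVAIL_CPU AVAIL_RAM_GIB AVAIL_DISK_GIB
# Hypothetical helper: returns 0 when every resource meets its requirement,
# 1 otherwise, and prints one line per shortfall.
check_quota() {
  local cpu=$1 ram=$2 disk=$3 rc=0
  if [ "$cpu" -lt "$REQ_CPU" ]; then
    echo "CPU short by $((REQ_CPU - cpu)) cores"; rc=1
  fi
  if [ "$ram" -lt "$REQ_RAM_GIB" ]; then
    echo "RAM short by $((REQ_RAM_GIB - ram)) GiB"; rc=1
  fi
  if [ "$disk" -lt "$REQ_DISK_GIB" ]; then
    echo "Disk short by $((REQ_DISK_GIB - disk)) GiB"; rc=1
  fi
  return "$rc"
}

# Usage (values are illustrative):
#   check_quota 80 160 300 && echo "Proceed to Phase 2"
```

Returning a non-zero status lets the check gate the next phase in a script instead of relying on a visual review.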


Phase 2: Kubernetes Control Plane (30-60 minutes)

Objective: Deploy and verify Kubernetes control plane components

Steps:

# 1. Verify Kubernetes cluster
kubectl cluster-info
kubectl get nodes

# 2. Create namespaces
kubectl create namespace sankofa --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace crossplane-system --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -

# 3. Deploy Crossplane
kubectl apply -f gitops/apps/crossplane/
kubectl wait --for=condition=Ready pod -l app=crossplane -n crossplane-system --timeout=300s

# 4. Deploy Proxmox Provider
kubectl apply -f crossplane-provider-proxmox/config/
kubectl wait --for=condition=Installed provider -l pkg.crossplane.io/name=provider-proxmox --timeout=300s

# 5. Create ProviderConfig
kubectl apply -f crossplane-provider-proxmox/config/provider.yaml

# 6. Verify
kubectl get pods -n crossplane-system
kubectl get providerconfig -A

Success Criteria:

  • Crossplane pods running
  • Proxmox provider installed
  • ProviderConfig ready

Rollback:

kubectl delete -f crossplane-provider-proxmox/config/
kubectl delete -f gitops/apps/crossplane/
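
Step 2 pipes a client-side dry run into kubectl apply so that namespace creation is idempotent: kubectl create alone fails when the namespace already exists, while apply succeeds either way. A small sketch of the same pattern as a reusable function follows; the function names are illustrative and not part of the repository.

```bash
#!/usr/bin/env bash
# ensure_namespace NAME
# Renders the Namespace manifest locally (--dry-run=client), then applies it.
# The pipeline's exit status is that of "kubectl apply".
ensure_namespace() {
  kubectl create namespace "$1" --dry-run=client -o yaml | kubectl apply -f -
}

# ensure_namespaces NAME...
# Stops at the first namespace that cannot be applied.
ensure_namespaces() {
  local ns
  for ns in "$@"; do
    ensure_namespace "$ns" || return 1
  done
}

# Usage:
#   ensure_namespaces sankofa crossplane-system monitoring
```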

Phase 3: Database and Identity (30-45 minutes)

Objective: Deploy PostgreSQL and Keycloak

Steps:

# 1. Deploy PostgreSQL (if not external)
kubectl apply -f gitops/apps/postgresql/  # If exists

# 2. Run database migrations
# (the subshell keeps later steps at the repository root)
(cd api && npm install && npm run db:migrate)

# 3. Verify migrations
psql -h <db-host> -U postgres -d sankofa -c "\dt" | grep -E "tenants|billing"

# 4. Deploy Keycloak
kubectl apply -f gitops/apps/keycloak/

# 5. Wait for Keycloak ready
kubectl wait --for=condition=Ready pod -l app=keycloak -n sankofa --timeout=600s

# 6. Configure Keycloak clients
kubectl apply -f gitops/apps/keycloak/keycloak-clients.yaml

Success Criteria:

  • Database migrations complete (26 migrations)
  • Keycloak pods running
  • Keycloak clients configured

Rollback:

kubectl delete -f gitops/apps/keycloak/
# Database rollback: restore from the pre-migration backup (re-running migrations does not undo schema changes)
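
Risk Mitigation calls for a backup before migrations, but the steps above do not take one. The following is a minimal sketch of such a backup, assuming pg_dump is installed on the operator machine and that the postgres user and sankofa database from step 3 apply; the backup_database helper and the output directory are hypothetical.

```bash
#!/usr/bin/env bash
# backup_database HOST DB OUTDIR
# Writes a custom-format dump (-Fc), which pg_restore can replay, and prints
# the path of the dump file on success.
backup_database() {
  local host=$1 db=$2 outdir=$3
  local file
  file="$outdir/${db}-$(date +%Y%m%d-%H%M%S).dump"
  pg_dump -h "$host" -U postgres -d "$db" -Fc -f "$file" || return 1
  echo "$file"
}

# Usage (run before step 2; <db-host> is the same placeholder as in step 3):
#   dump=$(backup_database <db-host> sankofa /var/backups) || exit 1
#   (cd api && npm run db:migrate)
# Restore, if the migration has to be rolled back:
#   pg_restore -h <db-host> -U postgres -d sankofa --clean "$dump"
```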

Phase 4: Application Deployment (30-45 minutes)

Objective: Deploy API, Frontend, and Portal

Steps:

# 1. Create secrets
kubectl create secret generic api-secrets -n sankofa \
  --from-literal=DB_PASSWORD=<db-password> \
  --from-literal=JWT_SECRET=<jwt-secret> \
  --from-literal=KEYCLOAK_CLIENT_SECRET=<keycloak-secret> \
  --dry-run=client -o yaml | kubectl apply -f -

# 2. Deploy API
kubectl apply -f gitops/apps/api/
kubectl wait --for=condition=Ready pod -l app=api -n sankofa --timeout=300s

# 3. Deploy Frontend
kubectl apply -f gitops/apps/frontend/
kubectl wait --for=condition=Ready pod -l app=frontend -n sankofa --timeout=300s

# 4. Deploy Portal
kubectl apply -f gitops/apps/portal/
kubectl wait --for=condition=Ready pod -l app=portal -n sankofa --timeout=300s

# 5. Verify health endpoints
curl http://api.sankofa.nexus/health
curl http://frontend.sankofa.nexus
curl http://portal.sankofa.nexus

Success Criteria:

  • All application pods running
  • Health endpoints responding
  • No critical errors in logs

Rollback:

kubectl rollout undo deployment/api -n sankofa
kubectl rollout undo deployment/frontend -n sankofa
kubectl rollout undo deployment/portal -n sankofa
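
Steps 2 to 4 repeat the same apply-then-wait pair. As a sketch, the pair can be wrapped so that the sequence halts at the first application that fails to become Ready, rather than deploying the Frontend on top of a broken API. The function names are illustrative; the paths, labels, and namespace are the ones used in the steps above.

```bash
#!/usr/bin/env bash
# deploy_and_wait APP [NAMESPACE] [TIMEOUT]
# Applies gitops/apps/APP/ and waits for pods labelled app=APP to be Ready.
deploy_and_wait() {
  local app=$1 ns=${2:-sankofa} timeout=${3:-300s}
  kubectl apply -f "gitops/apps/$app/" || return 1
  kubectl wait --for=condition=Ready pod -l "app=$app" -n "$ns" --timeout="$timeout"
}

# deploy_in_order APP...
# Deploys each app in turn and stops at the first failure.
deploy_in_order() {
  local app
  for app in "$@"; do
    echo "Deploying $app..."
    deploy_and_wait "$app" || { echo "FAILED: $app" >&2; return 1; }
  done
}

# Usage (from the repository root):
#   deploy_in_order api frontend portal
```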

Phase 5: Infrastructure VMs (15-30 minutes)

Objective: Deploy Nginx Proxy and Cloudflare Tunnel VMs

Steps:

# 1. Deploy Nginx Proxy VM
kubectl apply -f examples/production/nginx-proxy-vm.yaml

# 2. Deploy Cloudflare Tunnel VM
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml

# 3. Monitor deployment
watch kubectl get proxmoxvm -A

# 4. Wait for VMs ready (check status)
kubectl wait --for=condition=Ready proxmoxvm nginx-proxy-vm -n default --timeout=600s
kubectl wait --for=condition=Ready proxmoxvm cloudflare-tunnel-vm -n default --timeout=600s

# 5. Verify VM creation in Proxmox
ssh root@192.168.11.10 "qm list | grep -E 'nginx-proxy|cloudflare-tunnel'"

# 6. Check guest agent
ssh root@192.168.11.10 "qm guest exec <vmid> -- cat /etc/os-release"

Success Criteria:

  • Both VMs created and running
  • Guest agent running
  • VMs accessible via SSH
  • Cloud-init completed

Rollback:

kubectl delete proxmoxvm nginx-proxy-vm -n default
kubectl delete proxmoxvm cloudflare-tunnel-vm -n default
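
Step 6 needs the numeric VMID, which step 5 only displays. A sketch of extracting it from the qm list output follows, assuming the usual column order (VMID first, NAME second); the vmid_for_name helper is hypothetical.

```bash
#!/usr/bin/env bash
# vmid_for_name NAME
# Reads "qm list" output on stdin, skips the header row, and prints the VMID
# of the row whose NAME column matches exactly.
vmid_for_name() {
  awk -v name="$1" 'NR > 1 && $2 == name { print $1 }'
}

# Usage:
#   vmid=$(ssh root@192.168.11.10 "qm list" | vmid_for_name nginx-proxy-vm)
#   ssh root@192.168.11.10 "qm guest exec $vmid -- cat /etc/os-release"
```

An exact match on the NAME column avoids the partial matches that grep -E would accept, such as a VM named nginx-proxy-vm-old.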

Phase 6: Application VMs (30-60 minutes)

Objective: Deploy all 16 SMOM-DBIS-138 VMs

Steps:

# 1. Deploy all VMs
kubectl apply -f examples/production/smom-dbis-138/

# 2. Monitor deployment (in separate terminal)
watch kubectl get proxmoxvm -A

# 3. Check controller logs (in separate terminal)
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50 -f

# 4. Wait for all VMs ready (this may take 10-30 minutes)
# Monitor progress and verify each VM reaches Ready state

# 5. Verify VM creation
kubectl get proxmoxvm -A -o wide

# 6. Check guest agent on all VMs
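kubectl get proxmoxvm -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{"\n"}{end}' |
while read -r ns vm; do
  echo "Checking $ns/$vm..."
  # A resource name cannot be combined with -A, so query each VM in its own namespace
  kubectl get proxmoxvm "$vm" -n "$ns" -o jsonpath='{.status.conditions[*].status}{"\n"}'
done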

VM Deployment Order (if deploying sequentially):

  1. validator-01, validator-02, validator-03, validator-04
  2. sentry-01, sentry-02, sentry-03, sentry-04
  3. rpc-node-01, rpc-node-02, rpc-node-03, rpc-node-04
  4. services, blockscout, monitoring, management
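
The sequential order above can be sketched as a script that brings up one group at a time and waits for it before starting the next. Two assumptions are made and are not confirmed by this plan: each manifest is named after its VM (for example validator-01.yaml), and the VMs are created in the default namespace as in Phase 5.

```bash
#!/usr/bin/env bash
VM_DIR="examples/production/smom-dbis-138"

# deploy_vm_group NAME...
# Applies each VM manifest in the group, then waits until every VM in the
# group reports Ready, so the next group starts only after this one is up.
# ASSUMPTION: manifests are named <vm-name>.yaml; adjust to the real names.
deploy_vm_group() {
  local vm
  for vm in "$@"; do
    kubectl apply -f "$VM_DIR/$vm.yaml" || return 1
  done
  for vm in "$@"; do
    kubectl wait --for=condition=Ready proxmoxvm "$vm" -n default --timeout=600s || return 1
  done
}

# deploy_all_groups
# Runs the four groups in the documented order; && stops at the first failure.
deploy_all_groups() {
  deploy_vm_group validator-01 validator-02 validator-03 validator-04 &&
  deploy_vm_group sentry-01 sentry-02 sentry-03 sentry-04 &&
  deploy_vm_group rpc-node-01 rpc-node-02 rpc-node-03 rpc-node-04 &&
  deploy_vm_group services blockscout monitoring management
}
```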

Success Criteria:

  • All 16 VMs created
  • All VMs in Running state
  • Guest agent running on all VMs
  • Cloud-init completed successfully

Rollback:

# Delete all VMs
kubectl delete -f examples/production/smom-dbis-138/

Phase 7: Monitoring Stack (20-30 minutes)

Objective: Deploy monitoring and observability stack

Steps:

# 1. Deploy Prometheus
kubectl apply -f gitops/apps/monitoring/prometheus/
kubectl wait --for=condition=Ready pod -l app=prometheus -n monitoring --timeout=300s

# 2. Deploy Grafana
kubectl apply -f gitops/apps/monitoring/grafana/
kubectl wait --for=condition=Ready pod -l app=grafana -n monitoring --timeout=300s

# 3. Deploy Loki
kubectl apply -f gitops/apps/monitoring/loki/
kubectl wait --for=condition=Ready pod -l app=loki -n monitoring --timeout=300s

# 4. Deploy Alertmanager
kubectl apply -f gitops/apps/monitoring/alertmanager/

# 5. Deploy backup CronJob
kubectl apply -f gitops/apps/monitoring/backup-cronjob.yaml

# 6. Verify
kubectl get pods -n monitoring
curl http://grafana.sankofa.nexus

Success Criteria:

  • All monitoring pods running
  • Prometheus scraping metrics
  • Grafana accessible
  • Loki ingesting logs
  • Backup CronJob scheduled

Rollback:

# -R recurses into the prometheus/, grafana/, loki/ and alertmanager/ subdirectories
kubectl delete -R -f gitops/apps/monitoring/

Phase 8: Network Configuration (30-45 minutes)

Objective: Configure Cloudflare Tunnel, Nginx, and DNS

Steps:

# 1. Configure Cloudflare Tunnel
./scripts/configure-cloudflare-tunnel.sh

# Or manually:
# - Create tunnel in Cloudflare dashboard
# - Download credentials JSON
# - Upload to cloudflare-tunnel-vm: /etc/cloudflared/tunnel-credentials.json
# - Update /etc/cloudflared/config.yaml with ingress rules
# - Restart cloudflared service

# 2. Configure Nginx Proxy
./scripts/configure-nginx-proxy.sh

# Or manually:
# - SSH into nginx-proxy-vm
# - Update /etc/nginx/conf.d/*.conf
# - Run certbot for SSL certificates
# - Test: nginx -t
# - Reload: systemctl reload nginx

# 3. Configure DNS
./scripts/setup-dns-records.sh

# Or manually in Cloudflare:
# - Create A/CNAME records
# - Point to Cloudflare Tunnel
# - Enable proxy (orange cloud)

Success Criteria:

  • Cloudflare Tunnel connected
  • Nginx proxying correctly
  • DNS records created
  • SSL certificates issued
  • Services accessible via public URLs

Rollback:

  • Revert DNS changes in Cloudflare
  • Restore previous Nginx configuration
  • Disable Cloudflare Tunnel

Phase 9: Multi-Tenancy Setup (15-20 minutes)

Objective: Create system tenant and configure multi-tenancy

Steps:

# 1. Get API endpoint and admin token
API_URL="http://api.sankofa.nexus/graphql"
ADMIN_TOKEN="<get-from-keycloak>"

# 2. Create system tenant
curl -X POST $API_URL \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "query": "mutation { createTenant(input: { name: \"system\", tier: SOVEREIGN }) { id name billingAccountId } }"
  }'

# 3. Get system tenant ID from response
SYSTEM_TENANT_ID="<from-response>"

# 4. Add admin user to system tenant
curl -X POST $API_URL \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d "{
    \"query\": \"mutation { addUserToTenant(tenantId: \\\"$SYSTEM_TENANT_ID\\\", userId: \\\"<admin-user-id>\\\", role: TENANT_OWNER) }\"
  }"

# 5. Verify tenant
curl -X POST $API_URL \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -d '{
    "query": "query { myTenant { id name status tier } }"
  }'

Success Criteria:

  • System tenant created
  • Admin user assigned
  • Tenant accessible via API
  • RBAC working correctly

Rollback:

  • Delete tenant via API (if supported)
  • Or manually remove from database
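
The nested escaping in step 4 is easy to get wrong when edited by hand. As a sketch, the request body can be produced by a small printf-based helper instead; add_user_payload is a hypothetical name, and the mutation text is the one shown in step 4.

```bash
#!/usr/bin/env bash
# add_user_payload TENANT_ID USER_ID
# Prints the JSON request body for the addUserToTenant mutation.
# In the printf format, \\" emits \" so the inner quotes stay escaped inside
# the JSON string. IDs are inserted verbatim: this sketch assumes they contain
# no quotes or backslashes (true for UUID-style IDs).
add_user_payload() {
  printf '{"query":"mutation { addUserToTenant(tenantId: \\"%s\\", userId: \\"%s\\", role: TENANT_OWNER) }"}\n' "$1" "$2"
}

# Usage:
#   curl -X POST "$API_URL" \
#     -H "Content-Type: application/json" \
#     -H "Authorization: Bearer $ADMIN_TOKEN" \
#     -d "$(add_user_payload "$SYSTEM_TENANT_ID" "<admin-user-id>")"
```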

Phase 10: Verification and Testing (30-45 minutes)

Objective: Verify deployment and run tests

Steps:

# 1. Health checks
curl http://api.sankofa.nexus/health
curl http://frontend.sankofa.nexus
curl http://portal.sankofa.nexus
curl http://keycloak.sankofa.nexus/health

# 2. Check all VMs
kubectl get proxmoxvm -A

# 3. Check all pods
kubectl get pods -A

# 4. Run smoke tests
./scripts/smoke-tests.sh

# 5. Run performance tests (optional)
./scripts/performance-test.sh

# 6. Verify monitoring
curl http://grafana.sankofa.nexus
kubectl get pods -n monitoring

# 7. Check backups
./scripts/verify-backups.sh

Success Criteria:

  • All health checks passing
  • All VMs running
  • All pods running
  • Smoke tests passing
  • Monitoring operational
  • Backups configured

Rollback: N/A - verification only
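
A plain curl exits with status 0 even when the server answers with an HTTP error, so the health checks in step 1 have to be read by eye. The sketch below wraps them in a loop that reports each URL and returns the number of failures; check_urls is a hypothetical helper, and the 10-second limit is an arbitrary choice.

```bash
#!/usr/bin/env bash
# check_urls URL...
# Prints OK or FAIL per URL and returns the number of failures (0 = healthy).
# -f makes curl exit non-zero on HTTP 4xx/5xx; --max-time bounds each probe.
check_urls() {
  local url failures=0
  for url in "$@"; do
    if curl -fsS --max-time 10 -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failures=$((failures + 1))
    fi
  done
  return "$failures"
}

# Usage:
#   check_urls http://api.sankofa.nexus/health \
#              http://frontend.sankofa.nexus \
#              http://portal.sankofa.nexus \
#              http://keycloak.sankofa.nexus/health
```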


Execution Timeline

Estimated Total Time: 4-6.5 hours (sum of the phase durations below when run sequentially)

| Phase | Duration | Dependencies |
|---|---|---|
| Phase 1: Resource Verification | 15 min | None |
| Phase 2: Kubernetes Control Plane | 30-60 min | Kubernetes cluster |
| Phase 3: Database and Identity | 30-45 min | Phase 2 |
| Phase 4: Application Deployment | 30-45 min | Phase 3 |
| Phase 5: Infrastructure VMs | 15-30 min | Phase 2, Phase 4 |
| Phase 6: Application VMs | 30-60 min | Phase 5 |
| Phase 7: Monitoring Stack | 20-30 min | Phase 2 |
| Phase 8: Network Configuration | 30-45 min | Phase 5 |
| Phase 9: Multi-Tenancy Setup | 15-20 min | Phase 3, Phase 4 |
| Phase 10: Verification and Testing | 30-45 min | All phases |
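
As a cross-check on the total, the per-phase durations can be summed directly. The sketch below copies the minimum and maximum minutes from the timeline and assumes the phases run strictly one after another.

```bash
#!/usr/bin/env bash
# Per-phase minimum and maximum durations in minutes, Phases 1 to 10.
MIN="15 30 30 30 15 30 20 30 15 30"
MAX="15 60 45 45 30 60 30 45 20 45"

# sum_minutes N...  -> prints the total
sum_minutes() {
  local total=0 n
  for n in "$@"; do total=$((total + n)); done
  echo "$total"
}

# fmt_hm MINUTES  -> prints "Hh MMm"
fmt_hm() { printf '%dh %02dm\n' $(( $1 / 60 )) $(( $1 % 60 )); }

# total_range  -> prints the sequential best and worst case
# Word splitting of $MIN and $MAX is intentional here.
total_range() {
  echo "$(fmt_hm "$(sum_minutes $MIN)") to $(fmt_hm "$(sum_minutes $MAX)")"
}

# total_range prints: 4h 05m to 6h 35m
```

Phases whose only dependency is Phase 2, such as Phase 7, could run alongside later phases, which would shorten the wall-clock time relative to this sequential sum.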

Risk Mitigation

High-Risk Areas

  1. VM Deployment: May take longer than expected
    • Mitigation: Monitor the proxmoxvm resources and provider logs as in Phase 6, and budget the upper end of each estimate (VMs may take 10-30 minutes to reach Ready)
  2. Network Configuration: DNS propagation delays
    • Mitigation: Test with IP addresses first, then DNS
  3. Database Migrations: Potential data loss
    • Mitigation: Back up before migrations; test in staging first

Rollback Procedures

  • Each phase includes rollback steps
  • Document any issues encountered
  • Keep backups of all configurations

Post-Deployment

Immediate (First 24 hours)

  • Monitor all services
  • Review logs for errors
  • Verify all VMs accessible
  • Check monitoring dashboards
  • Verify backups running

Short-term (First week)

  • Performance optimization
  • Security hardening
  • Documentation updates
  • Team training
  • Support procedures

Success Criteria

Technical

  • All 18 VMs deployed and running
  • All services healthy
  • Guest agent on all VMs
  • Monitoring operational
  • Backups configured

Functional

  • Portal accessible
  • API responding
  • Multi-tenancy working
  • Resource provisioning functional

Last Updated: 2025-01-XX
Status: Ready for Execution