VM Deployment Plan
Date: 2025-01-XX
Status: Ready for Review
Version: 2.0
Executive Summary
This document provides a comprehensive deployment plan for all virtual machines in the Sankofa Phoenix infrastructure. The plan includes hardware capabilities, resource allocation, deployment priorities, and step-by-step deployment procedures.
Key Constraints
- ML110-01 (Site-1): 6 CPU cores, 256 GB RAM
- R630-01 (Site-2): 28 CPU cores, 768 GB RAM
- Total VMs to Deploy: 30 VMs
- Deployment Method: Crossplane Proxmox Provider via Kubernetes
Hardware Capabilities
Site-1: ML110-01
Location: 192.168.11.10
Hardware Specifications:
- CPU: Intel Xeon E5-2603 v3 @ 1.60GHz
- CPU Cores: 6 cores (6 threads, no hyperthreading)
- RAM: 256 GB (251 GiB usable; ~248 GB available for VMs after the host reserve)
- Storage:
- local-lvm: 794.3 GB available
- ceph-fs: 384 GB available
- Network: vmbr0 (1GbE)
Resource Allocation Strategy:
- Reserve 1 core for Proxmox host (5 cores available for VMs)
- Reserve 8 GB RAM for Proxmox host (~248 GB available for VMs)
- Suitable for: Light-to-medium workloads, infrastructure services
Site-2: R630-01
Location: 192.168.11.11
Hardware Specifications:
- CPU: Intel Xeon E5-2660 v4 @ 2.00GHz (dual socket)
- CPU Cores: 28 cores (56 threads with hyperthreading)
- RAM: 768 GB (755 GiB usable; ~752 GB available for VMs after the host reserve)
- Storage:
- local-lvm: 171.3 GB available
- Ceph OSD: Configured
- Network: vmbr0 (10GbE capable)
Resource Allocation Strategy:
- Reserve 2 cores for Proxmox host (26 cores available for VMs)
- Reserve 16 GB RAM for Proxmox host (~752 GB available for VMs)
- Suitable for: High-resource workloads, compute-intensive applications, blockchain nodes
VM Inventory and Resource Requirements
Summary Statistics
| Category | Count | Total CPU | Total RAM | Total Disk |
|---|---|---|---|---|
| Phoenix Infrastructure | 8 | 36+ cores | 88+ GiB | 1,150+ GiB |
| Core Infrastructure | 2 | 4 cores | 8 GiB | 30 GiB |
| SMOM-DBIS-138 Blockchain | 16 | 72 cores | 144 GiB | 320 GiB |
| Test/Example VMs | 4 | 16 cores | 32 GiB | 200 GiB |
| TOTAL | 30 | 128+ cores | 272+ GiB | 1,700+ GiB |
(The three Phase 2 gateways with TBD resources are excluded from the Phoenix row, hence the "+" figures.)
Note: These totals exceed available resources on a single node. VMs are distributed across both nodes.
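Category totals like these drift easily as per-VM allocations change; a small awk one-liner can recompute them from a CSV of name,cpu,ram,disk rows. The two rows below are illustrative, not the full inventory.

```shell
# Recompute totals from a per-VM CSV (name,cpu,ram_gib,disk_gib).
# The inline rows are illustrative; feed the full inventory in practice.
printf 'nginx-proxy,2,4,20\ncloudflare-tunnel,2,4,10\n' |
  awk -F, '{cpu+=$2; ram+=$3; disk+=$4}
           END {printf "CPU=%d RAM=%dGiB Disk=%dGiB\n", cpu, ram, disk}'
```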
VM Deployment Schedule
Phase 1: Core Infrastructure (Priority: CRITICAL)
Deployment Order: Deploy these first as they support other services.
1.1 Nginx Proxy VM
- Node: ml110-01
- Site: site-1
- Resources: 2 CPU, 4 GiB RAM, 20 GiB disk
- Purpose: Reverse proxy and SSL termination
- Dependencies: None
- Deployment File:
examples/production/nginx-proxy-vm.yaml
1.2 Cloudflare Tunnel VM
- Node: r630-01
- Site: site-2
- Resources: 2 CPU, 4 GiB RAM, 10 GiB disk
- Purpose: Cloudflare Tunnel for secure outbound connectivity
- Dependencies: None
- Deployment File:
examples/production/cloudflare-tunnel-vm.yaml
Phase 1 Resource Usage:
- ML110-01: 2 CPU, 4 GiB RAM, 20 GiB disk
- R630-01: 2 CPU, 4 GiB RAM, 10 GiB disk
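For orientation, a minimal manifest for the Nginx Proxy VM might look like the sketch below. The apiVersion, field names, and units are assumptions, not the provider's confirmed API; the actual schema is defined by the installed Crossplane Proxmox provider (`kubectl explain proxmoxvm` shows the real fields).

```yaml
# Hypothetical sketch of examples/production/nginx-proxy-vm.yaml.
# apiVersion, field names, and units are assumptions; verify against the
# provider's CRD before use.
apiVersion: proxmox.crossplane.io/v1alpha1
kind: ProxmoxVM
metadata:
  name: nginx-proxy
spec:
  forProvider:
    node: ml110-01
    cores: 2
    memory: 4096   # MiB (4 GiB)
    disk:
      storage: local-lvm
      size: 20     # GiB
    network:
      bridge: vmbr0
  providerConfigRef:
    name: default
```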
Phase 2: Phoenix Infrastructure Services (Priority: HIGH)
Deployment Order: Deploy in dependency order.
2.1 DNS Primary Server
- Node: ml110-01
- Site: site-1
- Resources: 4 CPU, 8 GiB RAM, 50 GiB disk
- Purpose: Primary DNS server (BIND9)
- Dependencies: None
- Deployment File:
examples/production/phoenix/dns-primary.yaml
2.2 Git Server
- Node: ml110-01
- Site: site-1
- Resources: 8 CPU, 16 GiB RAM, 500 GiB disk
- Purpose: Git repository hosting (Gitea/GitLab)
- Dependencies: DNS (optional)
- Deployment File:
examples/production/phoenix/git-server.yaml
2.3 Email Server
- Node: ml110-01
- Site: site-1
- Resources: 8 CPU, 16 GiB RAM, 200 GiB disk
- Purpose: Email services (Postfix/Dovecot)
- Dependencies: DNS (optional)
- Deployment File:
examples/production/phoenix/email-server.yaml
2.4 DevOps Runner
- Node: ml110-01
- Site: site-1
- Resources: 8 CPU, 16 GiB RAM, 200 GiB disk
- Purpose: CI/CD runner (Jenkins/GitLab Runner)
- Dependencies: Git Server (optional)
- Deployment File:
examples/production/phoenix/devops-runner.yaml
2.5 Codespaces IDE
- Node: ml110-01
- Site: site-1
- Resources: 8 CPU, 32 GiB RAM, 200 GiB disk
- Purpose: Cloud IDE (code-server)
- Dependencies: None
- Deployment File:
examples/production/phoenix/codespaces-ide.yaml
2.6 AS4 Gateway
- Node: ml110-01
- Site: site-1
- Resources: TBD
- Purpose: AS4 messaging gateway
- Dependencies: DNS, Email
- Deployment File:
examples/production/phoenix/as4-gateway.yaml
2.7 Business Integration Gateway
- Node: ml110-01
- Site: site-1
- Resources: TBD
- Purpose: Business integration services
- Dependencies: DNS
- Deployment File:
examples/production/phoenix/business-integration-gateway.yaml
2.8 Financial Messaging Gateway
- Node: ml110-01
- Site: site-1
- Resources: TBD
- Purpose: Financial messaging services
- Dependencies: DNS
- Deployment File:
examples/production/phoenix/financial-messaging-gateway.yaml
Phase 2 Resource Usage:
- ML110-01: 36+ CPU, 88+ GiB RAM, 1,150+ GiB disk (the three TBD gateways are not counted)
- R630-01: 0 CPU, 0 GiB RAM, 0 GiB disk
⚠️ WARNING: Phase 2 alone exceeds ML110-01 CPU capacity (5 cores available after the host reserve). Some VMs may need to be moved to R630-01 or have their CPU allocations reduced.
Phase 3: SMOM-DBIS-138 Blockchain Infrastructure (Priority: HIGH)
Deployment Order: Deploy validators first, then sentries, then RPC nodes, then services.
3.1 Validators (Site-1: ml110-01)
- smom-validator-01: 6 CPU, 12 GiB RAM, 20 GiB disk
- smom-validator-02: 6 CPU, 12 GiB RAM, 20 GiB disk
- smom-validator-03: 6 CPU, 12 GiB RAM, 20 GiB disk
- smom-validator-04: 6 CPU, 12 GiB RAM, 20 GiB disk
- Total: 24 CPU, 48 GiB RAM, 80 GiB disk
- Deployment Files:
examples/production/smom-dbis-138/validator-*.yaml
⚠️ WARNING: 24 CPU cores required but only 5 available on ML110-01 after the host reserve. RECOMMENDATION: Move validators to R630-01 or reduce CPU allocation.
3.2 Sentries (Distributed)
- Site-1 (ml110-01):
- smom-sentry-01: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-sentry-02: 4 CPU, 8 GiB RAM, 20 GiB disk
- Site-2 (r630-01):
- smom-sentry-03: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-sentry-04: 4 CPU, 8 GiB RAM, 20 GiB disk
- Total: 16 CPU, 32 GiB RAM, 80 GiB disk
- Deployment Files:
examples/production/smom-dbis-138/sentry-*.yaml
3.3 RPC Nodes (Site-2: r630-01)
- smom-rpc-node-01: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-rpc-node-02: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-rpc-node-03: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-rpc-node-04: 4 CPU, 8 GiB RAM, 20 GiB disk
- Total: 16 CPU, 32 GiB RAM, 80 GiB disk
- Deployment Files:
examples/production/smom-dbis-138/rpc-node-*.yaml
3.4 Services (Site-2: r630-01)
- smom-management: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-monitoring: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-services: 4 CPU, 8 GiB RAM, 20 GiB disk
- smom-blockscout: 4 CPU, 8 GiB RAM, 20 GiB disk
- Total: 16 CPU, 32 GiB RAM, 80 GiB disk
- Deployment Files:
examples/production/smom-dbis-138/{management,monitoring,services,blockscout}.yaml
Phase 3 Resource Usage:
- ML110-01: 32 CPU, 64 GiB RAM, 120 GiB disk as listed (8 CPU, 16 GiB RAM, 40 GiB disk if the validators move to R630-01 per the 3.1 recommendation)
- R630-01: 40 CPU, 80 GiB RAM, 200 GiB disk as listed (64 CPU, 128 GiB RAM, 280 GiB disk if they do)
Phase 4: Test/Example VMs (Priority: LOW)
Deployment Order: Deploy after production VMs are stable.
- vm-100: ml110-01, 2 CPU, 4 GiB RAM, 50 GiB disk
- basic-vm: ml110-01, 2 CPU, 4 GiB RAM, 50 GiB disk
- medium-vm: ml110-01, 4 CPU, 8 GiB RAM, 50 GiB disk
- large-vm: ml110-01, 8 CPU, 16 GiB RAM, 50 GiB disk
Phase 4 Resource Usage:
- ML110-01: 16 CPU, 32 GiB RAM, 200 GiB disk
Resource Allocation Analysis
ML110-01 (Site-1) - Resource Constraints
Available Resources:
- CPU: 5 cores (6 - 1 reserved)
- RAM: ~248 GB (256 - 8 reserved)
- Disk: 794.3 GB (local-lvm) + 384 GB (ceph-fs)
Requested Resources (Phases 1-2):
- CPU: 38+ cores ⚠️ EXCEEDS CAPACITY BY ~7.5x
- RAM: 92+ GiB ✅ Within capacity
- Disk: 1,170+ GiB ⚠️ EXCEEDS local-lvm CAPACITY (794.3 GB); requires Ceph
Requested Resources (Phases 1-3, validators excluded per the Phase 3 recommendation):
- CPU: 46+ cores ⚠️ EXCEEDS CAPACITY BY ~9x
- RAM: 108+ GiB ✅ Within capacity
- Disk: 1,210+ GiB ⚠️ EXCEEDS COMBINED local-lvm + ceph-fs CAPACITY (~1,178 GB)
Recommendations:
- Move high-CPU VMs to R630-01: Git Server, Email Server, DevOps Runner, Codespaces IDE
- Reduce CPU allocations: Use 2-4 cores instead of 8 cores for most services
- Use Ceph storage: Move large disk VMs to Ceph storage
- Prioritize critical services: Deploy only essential services on ML110-01
R630-01 (Site-2) - Resource Capacity
Available Resources:
- CPU: 26 cores (28 - 2 reserved)
- RAM: ~752 GB (768 - 16 reserved)
- Disk: 171.3 GB (local-lvm) + Ceph OSD
Requested Resources (Phase 3):
- CPU: 40 cores (64 if the validators relocate here) ⚠️ EXCEEDS CAPACITY BY ~1.5-2.5x
- RAM: 80 GiB (128 with the validators) ✅ Within capacity
- Disk: 200 GiB ⚠️ EXCEEDS local-lvm CAPACITY (171.3 GB); use Ceph
Recommendations:
- Reduce CPU allocations: Use 2-3 cores per validator instead of 6
- Use Ceph storage: Move VM disks to Ceph storage
- Optimize resource allocation: Share resources more efficiently
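Overcommit figures like the ones above can be checked with a one-liner. The values here are illustrative: the four validators at 6 cores each against ML110-01's 5 usable cores.

```shell
# CPU overcommit ratio: requested cores vs. cores available for VMs.
# Illustrative values: 4 validators x 6 cores vs. 5 usable cores on ML110-01.
requested=24
available=5
awk -v r="$requested" -v a="$available" 'BEGIN { printf "overcommit: %.1fx\n", r / a }'
```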
Revised Deployment Plan
Optimized Resource Allocation
ML110-01 (Site-1) - Light Workloads Only
Phase 1: Core Infrastructure
- Nginx Proxy VM: 2 CPU, 4 GiB RAM, 20 GiB disk ✅
Phase 2: Phoenix Infrastructure (Reduced)
- DNS Primary: 2 CPU, 4 GiB RAM, 50 GiB disk ✅
- Git Server: MOVE TO R630-01 or reduce to 2 CPU
- Email Server: MOVE TO R630-01 or reduce to 2 CPU
- DevOps Runner: MOVE TO R630-01 or reduce to 2 CPU
- Codespaces IDE: MOVE TO R630-01 or reduce to 2 CPU, 16 GiB RAM
- AS4 Gateway: 2 CPU, 4 GiB RAM, 50 GiB disk ✅
- Business Integration Gateway: 2 CPU, 4 GiB RAM, 50 GiB disk ✅
- Financial Messaging Gateway: 2 CPU, 4 GiB RAM, 50 GiB disk ✅
Phase 3: Blockchain (Sentries Only)
- smom-sentry-01: 2 CPU, 4 GiB RAM, 20 GiB disk ✅
- smom-sentry-02: 2 CPU, 4 GiB RAM, 20 GiB disk ✅
ML110-01 Total: 14-22 CPU cores requested (14 if the four flagged VMs move to R630-01; 22 if they stay at 2 CPU each), 5 available ⚠️ Still exceeds capacity
Final Recommendation: Deploy only 2-3 critical VMs on ML110-01 and move the rest to R630-01.
R630-01 (Site-2) - Primary Compute Node
Phase 1: Core Infrastructure
- Cloudflare Tunnel VM: 2 CPU, 4 GiB RAM, 10 GiB disk ✅
Phase 2: Phoenix Infrastructure (Moved)
- Git Server: 4 CPU, 16 GiB RAM, 500 GiB disk (use Ceph)
- Email Server: 4 CPU, 16 GiB RAM, 200 GiB disk (use Ceph)
- DevOps Runner: 4 CPU, 16 GiB RAM, 200 GiB disk (use Ceph)
- Codespaces IDE: 4 CPU, 32 GiB RAM, 200 GiB disk (use Ceph)
Phase 3: Blockchain Infrastructure
- Validators (4x): 3 CPU each = 12 CPU, 12 GiB RAM each = 48 GiB RAM, 80 GiB disk (use Ceph)
- Sentries (2x): 2 CPU each = 4 CPU, 4 GiB RAM each = 8 GiB RAM, 40 GiB disk
- RPC Nodes (4x): 2 CPU each = 8 CPU, 4 GiB RAM each = 16 GiB RAM, 80 GiB disk (use Ceph)
- Services (4x): 2 CPU each = 8 CPU, 4 GiB RAM each = 16 GiB RAM, 80 GiB disk (use Ceph)
R630-01 Total: 50 CPU cores requested, 26 available ⚠️ Exceeds capacity by ~1.9x
Final Recommendation: Reduce CPU allocations further or deploy in batches.
Deployment Execution Plan
Step 1: Pre-Deployment Verification
# 1. Verify Proxmox nodes are accessible
./scripts/check-proxmox-quota-ssh.sh
# 2. Verify images are available
./scripts/verify-image-availability.sh
# 3. Check Crossplane provider is ready
kubectl get providerconfig -n crossplane-system
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
Step 2: Deploy Phase 1 - Core Infrastructure
# Deploy Nginx Proxy (ML110-01)
kubectl apply -f examples/production/nginx-proxy-vm.yaml
# Deploy Cloudflare Tunnel (R630-01)
kubectl apply -f examples/production/cloudflare-tunnel-vm.yaml
# Monitor deployment
kubectl get proxmoxvm -w
Wait for: Both VMs to be in "Running" state before proceeding.
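The wait step can be automated with a small polling loop. The status query below is a placeholder (the exact status field is provider-specific; something like `kubectl get proxmoxvm <name> -o jsonpath='{.status.state}'`), stubbed here so the control flow can be shown standalone.

```shell
# Sketch of a readiness gate before proceeding to Phase 2.
# vm_state is a stub; replace it with the real status query for your provider.
vm_state() { echo "Running"; }   # placeholder for the real kubectl query

for _ in $(seq 1 60); do
  if [ "$(vm_state nginx-proxy)" = "Running" ]; then
    echo "nginx-proxy is Running"
    break
  fi
  sleep 10   # poll every 10s, up to 10 minutes
done
```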
Step 3: Deploy Phase 2 - Phoenix Infrastructure
# Deploy DNS Primary (ML110-01)
kubectl apply -f examples/production/phoenix/dns-primary.yaml
# Wait for DNS to be ready, then deploy other services
kubectl apply -f examples/production/phoenix/git-server.yaml
kubectl apply -f examples/production/phoenix/email-server.yaml
kubectl apply -f examples/production/phoenix/devops-runner.yaml
kubectl apply -f examples/production/phoenix/codespaces-ide.yaml
kubectl apply -f examples/production/phoenix/as4-gateway.yaml
kubectl apply -f examples/production/phoenix/business-integration-gateway.yaml
kubectl apply -f examples/production/phoenix/financial-messaging-gateway.yaml
Note: Adjust node assignments and CPU allocations based on resource constraints.
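The eight apply commands above can equally be driven by a loop that stops at the first failure. `kubectl` is stubbed here purely so the loop can be demonstrated standalone; delete the stub line to run it against a real cluster.

```shell
# Apply Phase 2 manifests in dependency order, aborting on the first failure.
set -e
kubectl() { echo "kubectl $*"; }   # stub for illustration only; remove to run for real

for svc in dns-primary git-server email-server devops-runner codespaces-ide \
           as4-gateway business-integration-gateway financial-messaging-gateway; do
  kubectl apply -f "examples/production/phoenix/${svc}.yaml"
done
```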
Step 4: Deploy Phase 3 - Blockchain Infrastructure
# Deploy validators first
kubectl apply -f examples/production/smom-dbis-138/validator-01.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-02.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-03.yaml
kubectl apply -f examples/production/smom-dbis-138/validator-04.yaml
# Deploy sentries
kubectl apply -f examples/production/smom-dbis-138/sentry-01.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-02.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-03.yaml
kubectl apply -f examples/production/smom-dbis-138/sentry-04.yaml
# Deploy RPC nodes
kubectl apply -f examples/production/smom-dbis-138/rpc-node-01.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-02.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-03.yaml
kubectl apply -f examples/production/smom-dbis-138/rpc-node-04.yaml
# Deploy services
kubectl apply -f examples/production/smom-dbis-138/management.yaml
kubectl apply -f examples/production/smom-dbis-138/monitoring.yaml
kubectl apply -f examples/production/smom-dbis-138/services.yaml
kubectl apply -f examples/production/smom-dbis-138/blockscout.yaml
Step 5: Deploy Phase 4 - Test VMs (Optional)
# Deploy test VMs only if resources allow
kubectl apply -f examples/production/vm-100.yaml
kubectl apply -f examples/production/basic-vm.yaml
kubectl apply -f examples/production/medium-vm.yaml
kubectl apply -f examples/production/large-vm.yaml
Monitoring and Verification
Real-Time Monitoring
# Watch all VM deployments
kubectl get proxmoxvm -A -w
# Check specific VM status
kubectl describe proxmoxvm <vm-name>
# Check controller logs
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=100 -f
Resource Monitoring
# Check Proxmox node resources
./scripts/check-proxmox-quota-ssh.sh
# Check VM resource usage
kubectl get proxmoxvm -A -o wide
Post-Deployment Verification
# List any VMs not in the Running state (the header line will also appear)
kubectl get proxmoxvm -A | grep -v Running
# Check VM IP addresses
kubectl get proxmoxvm -A -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.network.ipAddress}{"\n"}{end}'
# Verify guest agents
./scripts/verify-guest-agent.sh
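A stricter check counts the non-Running VMs explicitly instead of eyeballing grep output. The sample input below stands in for real `kubectl get proxmoxvm -A --no-headers` output (column order assumed: namespace, name, state).

```shell
# Count VMs not in the Running state. The sample stands in for
# `kubectl get proxmoxvm -A --no-headers` output (namespace name state).
sample='default nginx-proxy Running
default smom-validator-01 Provisioning'
printf '%s\n' "$sample" |
  awk '$3 != "Running" {n++; print "not running:", $2}
       END {printf "total not Running: %d\n", n+0}'
```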
Risk Mitigation
Resource Overcommitment
Risk: Requested resources exceed available capacity.
Mitigation:
- Deploy VMs in batches, monitoring resource usage
- Reduce CPU allocations where possible
- Use Ceph storage for large disk requirements
- Move high-resource VMs to R630-01
- Consider adding additional Proxmox nodes
Deployment Failures
Risk: VM creation may fail due to resource constraints or configuration errors.
Mitigation:
- Validate all VM configurations before deployment
- Check Proxmox quotas before each deployment
- Monitor controller logs for errors
- Have rollback procedures ready
- Test deployments on non-critical VMs first
Network Issues
Risk: Network connectivity problems may prevent VM deployment or operation.
Mitigation:
- Verify network bridges exist on all nodes
- Test network connectivity before deployment
- Configure proper DNS resolution
- Verify firewall rules allow required traffic
Deployment Timeline
Estimated Timeline
- Phase 1 (Core Infrastructure): 30 minutes
- Phase 2 (Phoenix Infrastructure): 2-4 hours
- Phase 3 (Blockchain Infrastructure): 3-6 hours
- Phase 4 (Test VMs): 1 hour (optional)
Total Estimated Time: ~6.5-11.5 hours (excluding verification and troubleshooting)
Critical Path
- Core Infrastructure (Nginx, Cloudflare Tunnel) → 30 min
- DNS Primary → 15 min
- Git Server, Email Server → 1 hour
- DevOps Runner, Codespaces IDE → 1 hour
- Blockchain Validators → 2 hours
- Blockchain Sentries → 1 hour
- Blockchain RPC Nodes → 1 hour
- Blockchain Services → 1 hour
Next Steps
- Review and Approve: Review this plan and approve resource allocations
- Update VM Configurations: Update VM YAML files with optimized resource allocations
- Pre-Deployment Checks: Run all pre-deployment verification scripts
- Execute Deployment: Follow deployment steps in order
- Monitor and Verify: Continuously monitor deployment progress
- Post-Deployment: Verify all services are operational
Related Documentation
- VM Deployment Checklist - Step-by-step checklist
- VM Creation Procedure - Detailed creation procedures
- VM Specifications - Complete VM specifications
- Deployment Requirements - Overall deployment requirements
Last Updated: 2025-01-XX
Status: Ready for Review
Maintainer: Infrastructure Team
Version: 2.0