Orchestration Deployment Guide - Enterprise-Grade
Sankofa / Phoenix / PanTel · ChainID 138 · Proxmox + Cloudflare Zero Trust + Dual ISP + 6×/28
Last Updated: 2025-01-20
Document Version: 1.0
Status: Buildable Blueprint
Overview
This guide is the complete technical orchestration plan for your environment. It uses your actual Spectrum /28 (Block #1), placeholders for the other five /28 blocks, and maps explicitly to your hardware:
- 2× ER605 (edge + HA/failover design)
- 3× ES216G switches
- 1× ML110 Gen9 (management / seed / bootstrap)
- 4× Dell R630 (compute cluster; 512GB RAM each; 2×600GB boot; 6×250GB SSD)
This guide provides a buildable blueprint: network, VLANs, Proxmox cluster, IPAM, CCIP next-phase matrix, Cloudflare Zero Trust, and operational runbooks.
Table of Contents
- Core Principles
- Physical Topology & Roles
- ISP & Public IP Plan
- Layer-2 & VLAN Orchestration
- Routing, NAT, and Egress Segmentation
- Proxmox Cluster Orchestration
- Cloudflare Zero Trust Orchestration
- VMID Allocation Registry
- CCIP Fleet Deployment Matrix
- Deployment Orchestration Workflow
- Operational Runbooks
Core Principles
- No public IPs on Proxmox hosts or LXCs/VMs (default)
- Inbound access = Cloudflare Zero Trust + cloudflared (primary)
- Public IPs are used for:
- ER605 WAN addressing
- Egress NAT pools (role-based allowlisting)
- Break-glass emergency endpoints only
- Segmentation by VLAN/VRF: consensus vs services vs sovereign tenants vs ops
- Deterministic VMID registry, with IPAM kept in step with it
Physical Topology & Roles
Hardware Role Assignment
Edge / Routing
ER605-A (Primary Edge Router)
- WAN1: Spectrum primary with Block #1 (76.53.10.32/28)
- WAN2: ISP #2 (failover/alternate policy)
- Role: Active edge router, NAT pools, routing
ER605-B (Standby Edge Router / Alternate WAN policy)
- Role: Standby router OR dedicated to WAN2 policies/testing
- Note: ER605 does not support full stateful HA. This is active/standby operational redundancy, not automatic session-preserving HA.
Switching Fabric
- ES216G-1: Core / uplinks / trunks
- ES216G-2: Compute rack aggregation
- ES216G-3: Mgmt + out-of-band / staging
Compute
- ML110 Gen9: "Bootstrap & Management" node
- IP: 192.168.11.10
- Role: Proxmox mgmt services, Omada controller, Git, monitoring seed
- 4× Dell R630: Proxmox compute cluster nodes
- Resources: 512GB RAM each, 2×600GB boot, 6×250GB SSD
- Role: Production workloads, CCIP fleet, sovereign tenants, services
ISP & Public IP Plan (6× /28)
Public Block #1 (Known - Spectrum)
| Property | Value |
|---|---|
| Network | 76.53.10.32/28 |
| Gateway | 76.53.10.33 |
| Usable Range | 76.53.10.33–76.53.10.46 |
| Broadcast | 76.53.10.47 |
| ER605 WAN1 IP | 76.53.10.34 (router interface) |
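The Block #1 values can be re-derived with Python's standard `ipaddress` module, which may also be a convenient way to fill in the rows for Blocks #2–#6 once those blocks are known. This is a minimal sketch: the helper name `describe_block` is illustrative, and it assumes the ISP gateway is the first usable address, as it is for Block #1.

```python
import ipaddress

def describe_block(cidr: str) -> dict:
    """Return network, gateway, usable range, and broadcast for an IPv4 block.

    Assumption: the gateway is the first usable host, as in Block #1.
    """
    net = ipaddress.ip_network(cidr, strict=True)
    hosts = list(net.hosts())
    return {
        "network": str(net),
        "gateway": str(hosts[0]),
        "usable_first": str(hosts[0]),
        "usable_last": str(hosts[-1]),
        "broadcast": str(net.broadcast_address),
        "usable_count": len(hosts),
    }

block1 = describe_block("76.53.10.32/28")
for key, value in block1.items():
    print(f"{key}: {value}")
```

With `strict=True`, a block that is not aligned to a /28 boundary raises `ValueError`, which catches transcription mistakes before they reach the router configuration.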
Public Blocks #2–#6 (Placeholders - To Be Configured)
| Block | Network | Gateway | Usable Range | Broadcast | Designated Use |
|---|---|---|---|---|---|
| #2 | <PUBLIC_BLOCK_2>/28 | <GW2> | <USABLE2> | <BCAST2> | CCIP Commit egress NAT pool |
| #3 | <PUBLIC_BLOCK_3>/28 | <GW3> | <USABLE3> | <BCAST3> | CCIP Execute egress NAT pool |
| #4 | <PUBLIC_BLOCK_4>/28 | <GW4> | <USABLE4> | <BCAST4> | RMN egress NAT pool |
| #5 | <PUBLIC_BLOCK_5>/28 | <GW5> | <USABLE5> | <BCAST5> | Sankofa/Phoenix/PanTel service egress |
| #6 | <PUBLIC_BLOCK_6>/28 | <GW6> | <USABLE6> | <BCAST6> | Sovereign Cloud Band tenant egress |
Public IP Usage Policy (Role-based)
| Public /28 Block | Designated Use | Why |
|---|---|---|
| #1 (76.53.10.32/28) | Router WAN + break-glass VIPs | Primary connectivity + emergency |
| #2 | CCIP Commit egress NAT pool | Allowlistable egress for source RPCs |
| #3 | CCIP Execute egress NAT pool | Allowlistable egress for destination RPCs |
| #4 | RMN egress NAT pool | Independent security-plane egress |
| #5 | Sankofa/Phoenix/PanTel service egress | Service-plane separation |
| #6 | Sovereign Cloud Band tenant egress | Per-sovereign policy control |
Layer-2 & VLAN Orchestration
VLAN Set (Authoritative)
Migration Note: The environment currently runs on a flat LAN (192.168.11.0/24). This plan migrates to VLANs and keeps that subnet in place as VLAN 11 (MGMT-LAN), so existing management addresses remain valid.
| VLAN ID | VLAN Name | Purpose | Subnet | Gateway |
|---|---|---|---|---|
| 11 | MGMT-LAN | Proxmox mgmt, switches mgmt, admin endpoints | 192.168.11.0/24 | 192.168.11.1 |
| 110 | BESU-VAL | Validator-only network (no member access) | 10.110.0.0/24 | 10.110.0.1 |
| 111 | BESU-SEN | Sentry mesh | 10.111.0.0/24 | 10.111.0.1 |
| 112 | BESU-RPC | RPC / gateway tier | 10.112.0.0/24 | 10.112.0.1 |
| 120 | BLOCKSCOUT | Explorer + DB | 10.120.0.0/24 | 10.120.0.1 |
| 121 | CACTI | Interop middleware | 10.121.0.0/24 | 10.121.0.1 |
| 130 | CCIP-OPS | Ops/admin | 10.130.0.0/24 | 10.130.0.1 |
| 132 | CCIP-COMMIT | Commit-role DON | 10.132.0.0/24 | 10.132.0.1 |
| 133 | CCIP-EXEC | Execute-role DON | 10.133.0.0/24 | 10.133.0.1 |
| 134 | CCIP-RMN | Risk management network | 10.134.0.0/24 | 10.134.0.1 |
| 140 | FABRIC | Fabric | 10.140.0.0/24 | 10.140.0.1 |
| 141 | FIREFLY | FireFly | 10.141.0.0/24 | 10.141.0.1 |
| 150 | INDY | Identity | 10.150.0.0/24 | 10.150.0.1 |
| 160 | SANKOFA-SVC | Sankofa/Phoenix/PanTel service layer | 10.160.0.0/22 | 10.160.0.1 |
| 200 | PHX-SOV-SMOM | Sovereign tenant | 10.200.0.0/20 | 10.200.0.1 |
| 201 | PHX-SOV-ICCC | Sovereign tenant | 10.201.0.0/20 | 10.201.0.1 |
| 202 | PHX-SOV-DBIS | Sovereign tenant | 10.202.0.0/20 | 10.202.0.1 |
| 203 | PHX-SOV-AR | Absolute Realms tenant | 10.203.0.0/20 | 10.203.0.1 |
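Before the plan is applied to switches and routers, it may be worth checking mechanically that no two subnets overlap and that every gateway follows the same convention. The sketch below copies the table above into a list; the names `VLAN_PLAN` and `validate_plan` are illustrative, and the "gateway is the first host address" rule is inferred from the table rather than stated as policy.

```python
import ipaddress
from itertools import combinations

# (VLAN ID, name, subnet, gateway), copied from the VLAN table.
VLAN_PLAN = [
    (11, "MGMT-LAN", "192.168.11.0/24", "192.168.11.1"),
    (110, "BESU-VAL", "10.110.0.0/24", "10.110.0.1"),
    (111, "BESU-SEN", "10.111.0.0/24", "10.111.0.1"),
    (112, "BESU-RPC", "10.112.0.0/24", "10.112.0.1"),
    (120, "BLOCKSCOUT", "10.120.0.0/24", "10.120.0.1"),
    (121, "CACTI", "10.121.0.0/24", "10.121.0.1"),
    (130, "CCIP-OPS", "10.130.0.0/24", "10.130.0.1"),
    (132, "CCIP-COMMIT", "10.132.0.0/24", "10.132.0.1"),
    (133, "CCIP-EXEC", "10.133.0.0/24", "10.133.0.1"),
    (134, "CCIP-RMN", "10.134.0.0/24", "10.134.0.1"),
    (140, "FABRIC", "10.140.0.0/24", "10.140.0.1"),
    (141, "FIREFLY", "10.141.0.0/24", "10.141.0.1"),
    (150, "INDY", "10.150.0.0/24", "10.150.0.1"),
    (160, "SANKOFA-SVC", "10.160.0.0/22", "10.160.0.1"),
    (200, "PHX-SOV-SMOM", "10.200.0.0/20", "10.200.0.1"),
    (201, "PHX-SOV-ICCC", "10.201.0.0/20", "10.201.0.1"),
    (202, "PHX-SOV-DBIS", "10.202.0.0/20", "10.202.0.1"),
    (203, "PHX-SOV-AR", "10.203.0.0/20", "10.203.0.1"),
]

def validate_plan(plan):
    """Return a list of human-readable problems; empty means consistent."""
    errors = []
    nets = []
    for vlan_id, name, subnet, gateway in plan:
        net = ipaddress.ip_network(subnet)  # raises on misaligned subnets
        nets.append((vlan_id, net))
        if ipaddress.ip_address(gateway) != net.network_address + 1:
            errors.append(f"VLAN {vlan_id} ({name}): gateway is not first host")
    for (id_a, net_a), (id_b, net_b) in combinations(nets, 2):
        if net_a.overlaps(net_b):
            errors.append(f"VLAN {id_a} overlaps VLAN {id_b}")
    return errors

errors = validate_plan(VLAN_PLAN)
print(errors if errors else "VLAN plan is consistent")
```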
Switching Configuration (ES216G)
- ES216G-1: Core (all VLAN trunks to ES216G-2/3 + ER605-A)
- ES216G-2: Compute (trunks to R630s + ML110)
- ES216G-3: Mgmt/OOB (mgmt access ports, staging, out-of-band)
All Proxmox uplinks should be 802.1Q trunk ports.
Routing, NAT, and Egress Segmentation
Dual Router Roles
- ER605-A: Active edge router (WAN1 = Spectrum primary with Block #1)
- ER605-B: Standby router OR dedicated to WAN2 policies/testing (no inbound services)
NAT Policies (Critical)
Inbound NAT
- Default: none
- Break-glass only (optional):
- Jumpbox/SSH (single port, IP allowlist, Cloudflare Access preferred)
- Proxmox admin should remain LAN-only
Outbound NAT (Role-based Pools Using /28 Blocks)
| Private Subnet | Role | Egress NAT Pool | Public Block |
|---|---|---|---|
| 10.132.0.0/24 | CCIP Commit | Block #2 <PUBLIC_BLOCK_2>/28 | #2 |
| 10.133.0.0/24 | CCIP Execute | Block #3 <PUBLIC_BLOCK_3>/28 | #3 |
| 10.134.0.0/24 | RMN | Block #4 <PUBLIC_BLOCK_4>/28 | #4 |
| 10.160.0.0/22 | Sankofa/Phoenix/PanTel | Block #5 <PUBLIC_BLOCK_5>/28 | #5 |
| 10.200.0.0/20–10.203.0.0/20 | Sovereign tenants | Block #6 <PUBLIC_BLOCK_6>/28 | #6 |
| 192.168.11.0/24 | Mgmt | Block #1 (or none; tightly restricted) | #1 |
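Because each role egresses from its own public block, traffic can be attributed to a role by source address alone. This yields provable separation, lets counterparties allowlist one role at a time, and scopes an incident to a single block.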
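The mapping in the outbound NAT table can be expressed as a lookup from a node's private address to its egress role and block. The sketch below uses block labels only, since Blocks #2–#6 are still placeholders; `EGRESS_POOLS` and `egress_block_for` are illustrative names, not ER605 configuration.

```python
import ipaddress

# (private subnet, role, public block label), per the outbound NAT table.
EGRESS_POOLS = [
    ("10.132.0.0/24", "CCIP Commit", "#2"),
    ("10.133.0.0/24", "CCIP Execute", "#3"),
    ("10.134.0.0/24", "RMN", "#4"),
    ("10.160.0.0/22", "Sankofa/Phoenix/PanTel", "#5"),
    ("10.200.0.0/20", "Sovereign tenants", "#6"),
    ("10.201.0.0/20", "Sovereign tenants", "#6"),
    ("10.202.0.0/20", "Sovereign tenants", "#6"),
    ("10.203.0.0/20", "Sovereign tenants", "#6"),
    ("192.168.11.0/24", "Mgmt", "#1"),
]

def egress_block_for(source_ip: str):
    """Return (role, block label) for a private source IP, or None if the
    table defines no egress pool for that subnet."""
    addr = ipaddress.ip_address(source_ip)
    for subnet, role, block in EGRESS_POOLS:
        if addr in ipaddress.ip_network(subnet):
            return role, block
    return None

print(egress_block_for("10.132.0.25"))
print(egress_block_for("10.203.4.9"))
print(egress_block_for("10.110.0.5"))
```

A `None` result, as for the validator subnet 10.110.0.0/24, means the table assigns no egress pool, which is a useful prompt to decide whether that subnet should have outbound access at all.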
Proxmox Cluster Orchestration
Node Layout
- ml110 (192.168.11.10): mgmt + seed services + initial automation runner
- r630-01..04: production compute
Proxmox Networking (per host)
vmbr0: VLAN-aware bridge
- Native VLAN: 11 (MGMT)
- Tagged VLANs: 110,111,112,120,121,130,132,133,134,140,141,150,160,200–203
- Proxmox host IP remains on VLAN 11 only.
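One way to keep the bridge definition identical across nodes is to render it from the VLAN list. The sketch below prints an `/etc/network/interfaces` stanza of the kind Proxmox uses for a VLAN-aware bridge. It assumes an uplink NIC named `eno1` (hypothetical; check the real interface name on each host) and a switch port that delivers VLAN 11 untagged, so the host address sits directly on `vmbr0`.

```python
# Tagged VLANs from the list above (200–203 expanded).
TAGGED_VLANS = [110, 111, 112, 120, 121, 130, 132, 133, 134,
                140, 141, 150, 160, 200, 201, 202, 203]

def render_vmbr0(host_ip: str, uplink: str = "eno1",
                 gateway: str = "192.168.11.1") -> str:
    """Render a VLAN-aware bridge stanza; host_ip stays on the mgmt subnet."""
    vids = " ".join(str(v) for v in TAGGED_VLANS)
    return "\n".join([
        "auto vmbr0",
        "iface vmbr0 inet static",
        f"    address {host_ip}/24",
        f"    gateway {gateway}",
        f"    bridge-ports {uplink}",   # assumed NIC name
        "    bridge-stp off",
        "    bridge-fd 0",
        "    bridge-vlan-aware yes",
        f"    bridge-vids {vids}",
    ])

# 192.168.11.10 is the ML110 address given earlier in this guide.
stanza = render_vmbr0("192.168.11.10")
print(stanza)
```

Listing the VLAN IDs explicitly in `bridge-vids`, rather than allowing the full 2-4094 range, keeps the bridge limited to VLANs that exist in the plan. Guests then select their VLAN through the tag on their virtual NIC.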
Storage Orchestration (R630)
Hardware:
- 2×600GB boot (mirror recommended)
- 6×250GB SSD
Recommended:
- Boot drives: ZFS mirror or hardware RAID1
- Data SSDs: ZFS pool (striped mirrors if you can pair, or RAIDZ1/2 depending on risk tolerance)
- High-write workloads (logs/metrics/indexers) on dedicated dataset with quotas
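To make the layout trade-off concrete, the arithmetic for six 250GB SSDs can be worked through as follows. These are raw estimates that ignore ZFS metadata, padding, and reserved space, so real usable capacity will be somewhat lower; the function name is illustrative.

```python
DISK_COUNT = 6
DISK_GB = 250

def raw_usable_gb(layout: str) -> int:
    """Approximate usable capacity before ZFS overhead."""
    if layout == "striped-mirrors":   # three 2-way mirrors striped together
        return (DISK_COUNT // 2) * DISK_GB
    if layout == "raidz1":            # one disk's worth of parity
        return (DISK_COUNT - 1) * DISK_GB
    if layout == "raidz2":            # two disks' worth of parity
        return (DISK_COUNT - 2) * DISK_GB
    raise ValueError(f"unknown layout: {layout}")

for layout in ("striped-mirrors", "raidz1", "raidz2"):
    print(f"{layout}: ~{raw_usable_gb(layout)} GB")
```

Striped mirrors give roughly 750 GB and tolerate one failed disk per mirror pair. RAIDZ1 gives roughly 1,250 GB and tolerates any single disk failure. RAIDZ2 gives roughly 1,000 GB and tolerates any two.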
Cloudflare Zero Trust Orchestration
cloudflared Gateway Pattern
Run 2 cloudflared LXCs for redundancy:
- cloudflared-1 on ML110
- cloudflared-2 on an R630
Both run tunnels for:
- Blockscout
- FireFly
- Gitea
- Internal admin dashboards (Grafana) behind Cloudflare Access
Keep Proxmox UI LAN-only; if needed, publish via Cloudflare Access with strict posture/MFA.
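Since both cloudflared instances should serve the same set of services, generating their ingress configuration from one list avoids drift between them. The sketch below renders a cloudflared `config.yml` with `ingress` rules and the required final catch-all rule. Every hostname, origin address, port, the tunnel ID, and the credentials path shown here is a hypothetical placeholder, not a value from this environment.

```python
# Hypothetical public hostnames mapped to hypothetical internal origins.
SERVICES = {
    "blockscout.example.com": "http://10.120.0.10:4000",
    "firefly.example.com": "http://10.141.0.10:5000",
    "gitea.example.com": "http://192.168.11.10:3000",
    "grafana.example.com": "http://192.168.11.10:3001",
}

def render_cloudflared_config(tunnel_id: str, services: dict) -> str:
    """Render config.yml text; the last ingress rule must be a catch-all."""
    lines = [
        f"tunnel: {tunnel_id}",
        f"credentials-file: /etc/cloudflared/{tunnel_id}.json",  # assumed path
        "ingress:",
    ]
    for hostname, origin in services.items():
        lines.append(f"  - hostname: {hostname}")
        lines.append(f"    service: {origin}")
    # Requests that match no hostname get a 404 instead of reaching an origin.
    lines.append("  - service: http_status:404")
    return "\n".join(lines)

config = render_cloudflared_config("TUNNEL_ID_PLACEHOLDER", SERVICES)
print(config)
```

Access policies (identity, MFA, device posture) are configured on the Cloudflare side per hostname. This file only controls which internal origins the tunnel will forward to.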
VMID Allocation Registry
Authoritative Registry Summary
| VMID Range | Domain | Count | Notes |
|---|---|---|---|
| 1000–4999 | Besu | 4,000 | Validators, Sentries, RPC, Archive, Reserved |
| 5000–5099 | Blockscout | 100 | Explorer/Indexing |
| 5200–5299 | Cacti | 100 | Interop middleware |
| 5400–5599 | CCIP | 200 | Ops, Monitoring, Commit, Execute, RMN, Reserved |
| 6000–6099 | Fabric | 100 | Enterprise contracts |
| 6200–6299 | FireFly | 100 | Workflow/orchestration |
| 6400–7399 | Indy | 1,000 | Identity layer |
| 7800–8999 | Sankofa/Phoenix/PanTel | 1,200 | Service + Cloud + Telecom |
| 10000–13999 | Phoenix Sovereign Cloud Band | 4,000 | SMOM/ICCC/DBIS/AR tenants |
Total Allocated: 11,000 VMIDs (1000–13999) per the registry; the nine ranges summarized above account for 10,800 of them.
See VMID_ALLOCATION_FINAL.md for complete details.
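A deterministic registry is easy to enforce in automation: given a VMID, the owning domain follows directly from the ranges in the summary table. The sketch below is one possible lookup; `VMID_RANGES` and `domain_for_vmid` are illustrative names.

```python
# (first VMID, last VMID, domain), copied from the registry summary.
VMID_RANGES = [
    (1000, 4999, "Besu"),
    (5000, 5099, "Blockscout"),
    (5200, 5299, "Cacti"),
    (5400, 5599, "CCIP"),
    (6000, 6099, "Fabric"),
    (6200, 6299, "FireFly"),
    (6400, 7399, "Indy"),
    (7800, 8999, "Sankofa/Phoenix/PanTel"),
    (10000, 13999, "Phoenix Sovereign Cloud Band"),
]

def domain_for_vmid(vmid: int):
    """Return the owning domain, or None if the VMID falls in a gap
    that the summary table does not assign."""
    for first, last, domain in VMID_RANGES:
        if first <= vmid <= last:
            return domain
    return None

print(domain_for_vmid(5410))
print(domain_for_vmid(5150))
```

A provisioning script could call such a check before creating a guest and refuse any VMID that returns `None` or a domain other than the one being deployed.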
CCIP Fleet Deployment Matrix
Lane A — Minimum Production Fleet
Total new CCIP nodes: 41 (or 43 if you add 2 monitoring nodes)
VMIDs + Hostnames
| Group | Count | VMIDs | Hostname Pattern |
|---|---|---|---|
| Ops/Admin | 2 | 5400–5401 | ccip-ops-01..02 |
| Monitoring (optional) | 2 | 5402–5403 | ccip-mon-01..02 |
| Commit Oracles | 16 | 5410–5425 | ccip-commit-01..16 |
| Execute Oracles | 16 | 5440–5455 | ccip-exec-01..16 |
| RMN | 7 | 5470–5476 | ccip-rmn-01..07 |
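The table above can be expanded into an explicit VMID-to-hostname inventory, which also confirms the node counts. A minimal sketch, with `FLEET` and `expand_fleet` as illustrative names:

```python
# (hostname prefix, first VMID, count, optional), per the table above.
FLEET = [
    ("ccip-ops", 5400, 2, False),
    ("ccip-mon", 5402, 2, True),
    ("ccip-commit", 5410, 16, False),
    ("ccip-exec", 5440, 16, False),
    ("ccip-rmn", 5470, 7, False),
]

def expand_fleet(include_optional: bool = False) -> dict:
    """Map each VMID to its hostname, numbering hosts from 01 per group."""
    nodes = {}
    for prefix, first_vmid, count, optional in FLEET:
        if optional and not include_optional:
            continue
        for i in range(count):
            nodes[first_vmid + i] = f"{prefix}-{i + 1:02d}"
    return nodes

nodes = expand_fleet()
print(len(nodes), nodes[5425], nodes[5476])
print(len(expand_fleet(include_optional=True)))
```

The default call yields the 41-node minimum fleet. Passing `include_optional=True` adds the two monitoring nodes for 43.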
Private IP Assignments (VLAN-based)
Once VLANs are active, assign:
| Role | VLAN | Subnet |
|---|---|---|
| Ops/Admin | 130 | 10.130.0.0/24 |
| Commit | 132 | 10.132.0.0/24 |
| Execute | 133 | 10.133.0.0/24 |
| RMN | 134 | 10.134.0.0/24 |
Interim Plan: While still on the flat LAN, keep the current interim addressing (192.168.11.170+ block) and move nodes to the subnets above at VLAN cutover.
Egress NAT Mapping (Public blocks placeholder)
- Commit VLAN (10.132.0.0/24) → Block #2 <PUBLIC_BLOCK_2>/28
- Execute VLAN (10.133.0.0/24) → Block #3 <PUBLIC_BLOCK_3>/28
- RMN VLAN (10.134.0.0/24) → Block #4 <PUBLIC_BLOCK_4>/28
See CCIP_DEPLOYMENT_SPEC.md for complete specification.
Deployment Orchestration Workflow
Phase 0 — Validate Foundation
- ✅ Confirm ER605-A WAN1 static: 76.53.10.34/28, GW 76.53.10.33
- ⏳ Confirm WAN2 (ISP #2) failover on ER605-A
- ⏳ Confirm ES216G trunks and native VLAN 11 mgmt access are stable
- ⏳ Confirm Proxmox mgmt reachable only from trusted admin endpoints
Phase 1 — VLAN Enablement
- ⏳ Configure ES216G trunk ports
- ⏳ Enable VLAN-aware bridge vmbr0 on Proxmox nodes
- ⏳ Create VLAN interfaces on ER605 for routing + DHCP (where appropriate)
- ⏳ Move services one domain at a time (start with monitoring)
Phase 2 — Observability First
- ⏳ Deploy monitoring stack (Prometheus/Grafana/Loki/Alertmanager)
- ⏳ Publish Grafana via Cloudflare Access (not public IPs)
- ⏳ Set alerts for node health, disk, latency, chain metrics
Phase 3 — CCIP Fleet (Lane A)
- ⏳ Deploy CCIP Ops/Admin
- ⏳ Deploy 16 commit nodes (VLAN 132)
- ⏳ Deploy 16 execute nodes (VLAN 133)
- ⏳ Deploy 7 RMN nodes (VLAN 134)
- ⏳ Apply ER605 outbound NAT pools per VLAN using /28 blocks #2–#4 placeholders
- ⏳ Verify node egress identity by role (allowlisting ready)
Phase 4 — Sovereign Tenant Rollout
- ⏳ Stand up Phoenix Sovereign Cloud Band VLANs 200–203
- ⏳ Apply Block #6 egress NAT
- ⏳ Enforce tenant isolation (ACLs, deny east-west)
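Tenant isolation can be spot-checked by classifying a flow's endpoints against the sovereign subnets. The sketch below reads "deny east-west" as "deny traffic between different tenant VLANs", which is an interpretation; the function names are illustrative, and the actual enforcement would live in ER605 and switch ACLs.

```python
import ipaddress

# Sovereign tenant subnets from the VLAN plan (VLANs 200–203).
TENANTS = {
    "PHX-SOV-SMOM": ipaddress.ip_network("10.200.0.0/20"),
    "PHX-SOV-ICCC": ipaddress.ip_network("10.201.0.0/20"),
    "PHX-SOV-DBIS": ipaddress.ip_network("10.202.0.0/20"),
    "PHX-SOV-AR": ipaddress.ip_network("10.203.0.0/20"),
}

def tenant_of(ip: str):
    """Return the tenant name owning this address, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in TENANTS.items():
        if addr in net:
            return name
    return None

def is_cross_tenant(src: str, dst: str) -> bool:
    """True when both endpoints belong to tenants and the tenants differ,
    i.e. the flow an isolation ACL should deny."""
    src_tenant, dst_tenant = tenant_of(src), tenant_of(dst)
    return (src_tenant is not None and dst_tenant is not None
            and src_tenant != dst_tenant)

print(is_cross_tenant("10.200.1.5", "10.201.0.9"))
print(is_cross_tenant("10.200.1.5", "10.200.15.254"))
```

Feeding observed flows (for example from firewall logs) through such a check gives a quick signal that the ACLs are doing what the plan intends.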
Operational Runbooks
Network Operations
- ER605_ROUTER_CONFIGURATION.md - Router configuration guide
- BESU_ALLOWLIST_RUNBOOK.md - Besu allowlist management
- CLOUDFLARE_ZERO_TRUST_GUIDE.md - Cloudflare Zero Trust setup
Deployment Operations
- VALIDATED_SET_DEPLOYMENT_GUIDE.md - Validated set deployment
- CCIP_DEPLOYMENT_SPEC.md - CCIP fleet deployment
- DEPLOYMENT_READINESS.md - Pre-deployment validation
Troubleshooting
- TROUBLESHOOTING_FAQ.md - Common issues and solutions
- QBFT_TROUBLESHOOTING.md - QBFT consensus troubleshooting
Deliverables
Completed ✅
- ✅ Authoritative VLAN and subnet plan
- ✅ Public block usage model (with placeholders for 5 blocks)
- ✅ Proxmox cluster topology plan
- ✅ CCIP fleet deployment matrix
- ✅ Stepwise orchestration workflow
Pending ⏳
- ⏳ Exact NAT/VIP rules (requires public blocks #2-6)
- ⏳ ER605-B role decision (standby edge vs dedicated sovereign edge)
- ⏳ VLAN migration execution
- ⏳ CCIP fleet deployment
Next Steps
To Finalize Placeholders
Paste the other five /28 blocks in the same format as Block #1:
- Network / Gateway / Usable / Broadcast
And specify:
- ER605-B usage: standby edge OR dedicated sovereign edge
Then we can produce:
- Exact NAT pool assignment sheet per role
- Break-glass VIP table
- Complete ER605 configuration
Related Documentation
Prerequisites
- PREREQUISITES.md - System requirements and prerequisites
- DEPLOYMENT_READINESS.md - Pre-deployment validation checklist
Architecture
- NETWORK_ARCHITECTURE.md - Complete network architecture
- VMID_ALLOCATION_FINAL.md - VMID allocation registry
- CCIP_DEPLOYMENT_SPEC.md - CCIP deployment specification
Configuration
- ER605_ROUTER_CONFIGURATION.md - Router configuration
- CLOUDFLARE_ZERO_TRUST_GUIDE.md - Cloudflare Zero Trust setup
Operations
- OPERATIONAL_RUNBOOKS.md - Operational procedures
- DEPLOYMENT_STATUS_CONSOLIDATED.md - Deployment status
- TROUBLESHOOTING_FAQ.md - Troubleshooting guide
Best Practices
- RECOMMENDATIONS_AND_SUGGESTIONS.md - Comprehensive recommendations
- IMPLEMENTATION_CHECKLIST.md - Implementation checklist
Reference
- MASTER_INDEX.md - Complete documentation index
Document Status: Complete (v1.0)
Maintained By: Infrastructure Team
Review Cycle: Monthly