Orchestration Deployment Guide - Enterprise-Grade

Sankofa / Phoenix / PanTel · ChainID 138 · Proxmox + Cloudflare Zero Trust + Dual ISP + 6×/28

Last Updated: 2025-01-20
Document Version: 1.0
Status: Buildable Blueprint


Overview

This is the complete technical orchestration plan for your environment. It uses your actual Spectrum /28 (Block #1), placeholders for the other five /28 blocks, and maps explicitly to your hardware:

  • 2× ER605 (edge + HA/failover design)
  • 3× ES216G switches
  • 1× ML110 Gen9 (management / seed / bootstrap)
  • 4× Dell R630 (compute cluster; 512GB RAM each; 2×600GB boot; 6×250GB SSD)

This guide provides a buildable blueprint: network, VLANs, Proxmox cluster, IPAM, CCIP next-phase matrix, Cloudflare Zero Trust, and operational runbooks.


Table of Contents

  1. Core Principles
  2. Physical Topology & Roles
  3. ISP & Public IP Plan
  4. Layer-2 & VLAN Orchestration
  5. Routing, NAT, and Egress Segmentation
  6. Proxmox Cluster Orchestration
  7. Cloudflare Zero Trust Orchestration
  8. VMID Allocation Registry
  9. CCIP Fleet Deployment Matrix
  10. Deployment Orchestration Workflow
  11. Operational Runbooks

Core Principles

  1. No public IPs on Proxmox hosts or LXCs/VMs (default)
  2. Inbound access = Cloudflare Zero Trust + cloudflared (primary)
  3. Public IPs are used for:
    • ER605 WAN addressing
    • Egress NAT pools (role-based allowlisting)
    • Break-glass emergency endpoints only
  4. Segmentation by VLAN/VRF: consensus vs services vs sovereign tenants vs ops
  5. Deterministic VMID registry with a matching IPAM plan

Physical Topology & Roles

Hardware Role Assignment

Edge / Routing

ER605-A (Primary Edge Router)

  • WAN1: Spectrum primary with Block #1 (76.53.10.32/28)
  • WAN2: ISP #2 (failover/alternate policy)
  • Role: Active edge router, NAT pools, routing

ER605-B (Standby Edge Router / Alternate WAN policy)

  • Role: Standby router OR dedicated to WAN2 policies/testing
  • Note: ER605 does not support full stateful HA. This is active/standby operational redundancy, not automatic session-preserving HA.

Switching Fabric

  • ES216G-1: Core / uplinks / trunks
  • ES216G-2: Compute rack aggregation
  • ES216G-3: Mgmt + out-of-band / staging

Compute

  • ML110 Gen9: "Bootstrap & Management" node

    • IP: 192.168.11.10
    • Role: Proxmox mgmt services, Omada controller, Git, monitoring seed
  • 4× Dell R630: Proxmox compute cluster nodes

    • Resources: 512GB RAM each, 2×600GB boot, 6×250GB SSD
    • Role: Production workloads, CCIP fleet, sovereign tenants, services

ISP & Public IP Plan (6× /28)

Public Block #1 (Known - Spectrum)

| Property | Value |
|----------|-------|
| Network | 76.53.10.32/28 |
| Gateway | 76.53.10.33 |
| Usable Range | 76.53.10.33–76.53.10.46 |
| Broadcast | 76.53.10.47 |
| ER605 WAN1 IP | 76.53.10.34 (router interface) |
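
The Block #1 figures can be cross-checked with Python's standard `ipaddress` module. This is only a sanity-check sketch: it derives the usable range and broadcast address from the /28 and confirms that the WAN1 address falls inside the block.

```python
import ipaddress

# Block #1 as stated in this guide.
block1 = ipaddress.ip_network("76.53.10.32/28")
hosts = list(block1.hosts())  # usable addresses (network and broadcast excluded)

summary = {
    "network": str(block1.network_address),
    "first_usable": str(hosts[0]),
    "last_usable": str(hosts[-1]),
    "broadcast": str(block1.broadcast_address),
    "usable_count": len(hosts),
}
print(summary)

# The ER605 WAN1 interface address must be one of the usable hosts.
wan1_ok = ipaddress.ip_address("76.53.10.34") in hosts
print("WAN1 inside block:", wan1_ok)
```

The same check can be repeated for Blocks #2 through #6 once their real networks are known.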

Public Blocks #2–#6 (Placeholders - To Be Configured)

| Block | Network | Gateway | Usable Range | Broadcast | Designated Use |
|-------|---------|---------|--------------|-----------|----------------|
| #2 | <PUBLIC_BLOCK_2>/28 | <GW2> | <USABLE2> | <BCAST2> | CCIP Commit egress NAT pool |
| #3 | <PUBLIC_BLOCK_3>/28 | <GW3> | <USABLE3> | <BCAST3> | CCIP Execute egress NAT pool |
| #4 | <PUBLIC_BLOCK_4>/28 | <GW4> | <USABLE4> | <BCAST4> | RMN egress NAT pool |
| #5 | <PUBLIC_BLOCK_5>/28 | <GW5> | <USABLE5> | <BCAST5> | Sankofa/Phoenix/PanTel service egress |
| #6 | <PUBLIC_BLOCK_6>/28 | <GW6> | <USABLE6> | <BCAST6> | Sovereign Cloud Band tenant egress |

Public IP Usage Policy (Role-based)

| Public /28 Block | Designated Use | Why |
|------------------|----------------|-----|
| #1 (76.53.10.32/28) | Router WAN + break-glass VIPs | Primary connectivity + emergency |
| #2 | CCIP Commit egress NAT pool | Allowlistable egress for source RPCs |
| #3 | CCIP Execute egress NAT pool | Allowlistable egress for destination RPCs |
| #4 | RMN egress NAT pool | Independent security-plane egress |
| #5 | Sankofa/Phoenix/PanTel service egress | Service-plane separation |
| #6 | Sovereign Cloud Band tenant egress | Per-sovereign policy control |

Layer-2 & VLAN Orchestration

VLAN Set (Authoritative)

Migration Note: Currently on flat LAN 192.168.11.0/24. This plan migrates to VLANs while keeping compatibility.

| VLAN ID | VLAN Name | Purpose | Subnet | Gateway |
|---------|-----------|---------|--------|---------|
| 11 | MGMT-LAN | Proxmox mgmt, switches mgmt, admin endpoints | 192.168.11.0/24 | 192.168.11.1 |
| 110 | BESU-VAL | Validator-only network (no member access) | 10.110.0.0/24 | 10.110.0.1 |
| 111 | BESU-SEN | Sentry mesh | 10.111.0.0/24 | 10.111.0.1 |
| 112 | BESU-RPC | RPC / gateway tier | 10.112.0.0/24 | 10.112.0.1 |
| 120 | BLOCKSCOUT | Explorer + DB | 10.120.0.0/24 | 10.120.0.1 |
| 121 | CACTI | Interop middleware | 10.121.0.0/24 | 10.121.0.1 |
| 130 | CCIP-OPS | Ops/admin | 10.130.0.0/24 | 10.130.0.1 |
| 132 | CCIP-COMMIT | Commit-role DON | 10.132.0.0/24 | 10.132.0.1 |
| 133 | CCIP-EXEC | Execute-role DON | 10.133.0.0/24 | 10.133.0.1 |
| 134 | CCIP-RMN | Risk management network | 10.134.0.0/24 | 10.134.0.1 |
| 140 | FABRIC | Fabric | 10.140.0.0/24 | 10.140.0.1 |
| 141 | FIREFLY | FireFly | 10.141.0.0/24 | 10.141.0.1 |
| 150 | INDY | Identity | 10.150.0.0/24 | 10.150.0.1 |
| 160 | SANKOFA-SVC | Sankofa/Phoenix/PanTel service layer | 10.160.0.0/22 | 10.160.0.1 |
| 200 | PHX-SOV-SMOM | Sovereign tenant | 10.200.0.0/20 | 10.200.0.1 |
| 201 | PHX-SOV-ICCC | Sovereign tenant | 10.201.0.0/20 | 10.201.0.1 |
| 202 | PHX-SOV-DBIS | Sovereign tenant | 10.202.0.0/20 | 10.202.0.1 |
| 203 | PHX-SOV-AR | Absolute Realms tenant | 10.203.0.0/20 | 10.203.0.1 |
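
Before the VLAN plan is loaded into an IPAM, it can help to check it mechanically. The sketch below is illustrative rather than part of the plan: it copies the VLAN table into a list and verifies that no two subnets overlap and that every gateway is the first host address of its subnet. The `validate` helper is a hypothetical name.

```python
import ipaddress
from itertools import combinations

# (VLAN ID, name, subnet, gateway) copied from the VLAN table.
VLANS = [
    (11, "MGMT-LAN", "192.168.11.0/24", "192.168.11.1"),
    (110, "BESU-VAL", "10.110.0.0/24", "10.110.0.1"),
    (111, "BESU-SEN", "10.111.0.0/24", "10.111.0.1"),
    (112, "BESU-RPC", "10.112.0.0/24", "10.112.0.1"),
    (120, "BLOCKSCOUT", "10.120.0.0/24", "10.120.0.1"),
    (121, "CACTI", "10.121.0.0/24", "10.121.0.1"),
    (130, "CCIP-OPS", "10.130.0.0/24", "10.130.0.1"),
    (132, "CCIP-COMMIT", "10.132.0.0/24", "10.132.0.1"),
    (133, "CCIP-EXEC", "10.133.0.0/24", "10.133.0.1"),
    (134, "CCIP-RMN", "10.134.0.0/24", "10.134.0.1"),
    (140, "FABRIC", "10.140.0.0/24", "10.140.0.1"),
    (141, "FIREFLY", "10.141.0.0/24", "10.141.0.1"),
    (150, "INDY", "10.150.0.0/24", "10.150.0.1"),
    (160, "SANKOFA-SVC", "10.160.0.0/22", "10.160.0.1"),
    (200, "PHX-SOV-SMOM", "10.200.0.0/20", "10.200.0.1"),
    (201, "PHX-SOV-ICCC", "10.201.0.0/20", "10.201.0.1"),
    (202, "PHX-SOV-DBIS", "10.202.0.0/20", "10.202.0.1"),
    (203, "PHX-SOV-AR", "10.203.0.0/20", "10.203.0.1"),
]


def validate(vlans):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    nets = {vid: ipaddress.ip_network(subnet) for vid, _, subnet, _ in vlans}
    if len(nets) != len(vlans):
        problems.append("duplicate VLAN ID")
    for vid, name, _, gateway in vlans:
        first_host = nets[vid].network_address + 1
        if ipaddress.ip_address(gateway) != first_host:
            problems.append(f"VLAN {vid} ({name}): gateway is not {first_host}")
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"VLAN {a} overlaps VLAN {b}")
    return problems


print(validate(VLANS) or "VLAN plan is consistent")
```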

Switching Configuration (ES216G)

  • ES216G-1: Core (all VLAN trunks to ES216G-2/3 + ER605-A)
  • ES216G-2: Compute (trunks to R630s + ML110)
  • ES216G-3: Mgmt/OOB (mgmt access ports, staging, out-of-band)

All Proxmox uplinks should be 802.1Q trunk ports.


Routing, NAT, and Egress Segmentation

Dual Router Roles

  • ER605-A: Active edge router (WAN1 = Spectrum primary with Block #1)
  • ER605-B: Standby router OR dedicated to WAN2 policies/testing (no inbound services)

NAT Policies (Critical)

Inbound NAT

  • Default: none
  • Break-glass only (optional):
    • Jumpbox/SSH (single port, IP allowlist, Cloudflare Access preferred)
    • Proxmox admin should remain LAN-only

Outbound NAT (Role-based Pools Using /28 Blocks)

| Private Subnet | Role | Egress NAT Pool | Public Block |
|----------------|------|-----------------|--------------|
| 10.132.0.0/24 | CCIP Commit | Block #2 <PUBLIC_BLOCK_2>/28 | #2 |
| 10.133.0.0/24 | CCIP Execute | Block #3 <PUBLIC_BLOCK_3>/28 | #3 |
| 10.134.0.0/24 | RMN | Block #4 <PUBLIC_BLOCK_4>/28 | #4 |
| 10.160.0.0/22 | Sankofa/Phoenix/PanTel | Block #5 <PUBLIC_BLOCK_5>/28 | #5 |
| 10.200.0.0/20–10.203.0.0/20 | Sovereign tenants | Block #6 <PUBLIC_BLOCK_6>/28 | #6 |
| 192.168.11.0/24 | Mgmt | Block #1 (or none; tightly restricted) | #1 |

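Giving each role its own egress block means upstream providers can allowlist by source range, outbound traffic can be attributed to a role by its source address, and an incident can be scoped to a single block.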


Proxmox Cluster Orchestration

Node Layout

  • ml110 (192.168.11.10): mgmt + seed services + initial automation runner
  • r630-01..04: production compute

Proxmox Networking (per host)

  • vmbr0: VLAN-aware bridge
    • Native VLAN: 11 (MGMT)
    • Tagged VLANs: 110,111,112,120,121,130,132,133,134,140,141,150,160,200–203
  • Proxmox host IP remains on VLAN 11 only.
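
As an illustration of the per-host bridge, the sketch below renders an `/etc/network/interfaces` stanza for a VLAN-aware `vmbr0`. It assumes the switch port delivers VLAN 11 untagged (native), so the host address sits directly on the bridge. The NIC name `eno1` and the `render_vmbr0` helper are hypothetical; the address shown is the ML110 management IP from this guide.

```python
# Tagged VLAN IDs for the bridge; VLAN 11 is native (untagged) and is not listed.
TAGGED_VLANS = [110, 111, 112, 120, 121, 130, 132, 133, 134,
                140, 141, 150, 160, 200, 201, 202, 203]


def render_vmbr0(address, gateway, port, vids):
    """Render an /etc/network/interfaces stanza for a VLAN-aware vmbr0."""
    vid_list = " ".join(str(v) for v in sorted(vids))
    lines = [
        "auto vmbr0",
        "iface vmbr0 inet static",
        f"    address {address}",
        f"    gateway {gateway}",
        f"    bridge-ports {port}",
        "    bridge-stp off",
        "    bridge-fd 0",
        "    bridge-vlan-aware yes",
        f"    bridge-vids {vid_list}",
    ]
    return "\n".join(lines)


# "eno1" is a hypothetical NIC name; substitute the host's real uplink.
stanza = render_vmbr0("192.168.11.10/24", "192.168.11.1", "eno1", TAGGED_VLANS)
print(stanza)
```

Restricting `bridge-vids` to the planned VLANs, rather than allowing the full 2-4094 range, keeps an unplanned tag from being carried onto a host by accident.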

Storage Orchestration (R630)

Hardware:

  • 2×600GB boot (mirror recommended)
  • 6×250GB SSD

Recommended:

  • Boot drives: ZFS mirror or hardware RAID1
  • Data SSDs: ZFS pool (striped mirrors if you can pair, or RAIDZ1/2 depending on risk tolerance)
  • High-write workloads (logs/metrics/indexers) on dedicated dataset with quotas
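
To make the layout trade-off concrete, the following sketch compares nominal usable capacity and guaranteed fault tolerance for the six 250GB SSDs under each option. The figures are raw arithmetic only and ignore ZFS metadata, padding, and reserved space, so real usable capacity will be somewhat lower.

```python
DISKS = 6
DISK_GB = 250  # nominal size of each data SSD

# layout -> (data-bearing disks, disk failures always survivable)
LAYOUTS = {
    "striped-mirrors": (DISKS // 2, 1),  # three 2-way mirrors; a second failure
                                         # is survivable only in a different mirror
    "raidz1": (DISKS - 1, 1),            # one disk of parity
    "raidz2": (DISKS - 2, 2),            # two disks of parity
}


def nominal_usable_gb(layout):
    """Nominal usable capacity in GB, before any ZFS overhead."""
    data_disks, _ = LAYOUTS[layout]
    return data_disks * DISK_GB


for name, (_, tolerated) in LAYOUTS.items():
    print(f"{name}: {nominal_usable_gb(name)} GB usable, "
          f"survives any {tolerated} disk failure(s)")
```

Striped mirrors give the least capacity but are generally preferred for random-I/O workloads, while RAIDZ2 trades capacity for tolerance of any two failures.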

Cloudflare Zero Trust Orchestration

cloudflared Gateway Pattern

Run 2 cloudflared LXCs for redundancy:

  • cloudflared-1 on ML110
  • cloudflared-2 on an R630

Both run tunnels for:

  • Blockscout
  • FireFly
  • Gitea
  • Internal admin dashboards (Grafana) behind Cloudflare Access

Keep Proxmox UI LAN-only; if needed, publish via Cloudflare Access with strict posture/MFA.


VMID Allocation Registry

Authoritative Registry Summary

| VMID Range | Domain | Count | Notes |
|------------|--------|-------|-------|
| 1000–4999 | Besu | 4,000 | Validators, Sentries, RPC, Archive, Reserved |
| 5000–5099 | Blockscout | 100 | Explorer/Indexing |
| 5200–5299 | Cacti | 100 | Interop middleware |
| 5400–5599 | CCIP | 200 | Ops, Monitoring, Commit, Execute, RMN, Reserved |
| 6000–6099 | Fabric | 100 | Enterprise contracts |
| 6200–6299 | FireFly | 100 | Workflow/orchestration |
| 6400–7399 | Indy | 1,000 | Identity layer |
| 7800–8999 | Sankofa/Phoenix/PanTel | 1,200 | Service + Cloud + Telecom |
| 10000–13999 | Phoenix Sovereign Cloud Band | 4,000 | SMOM/ICCC/DBIS/AR tenants |

Total Allocated: 10,800 VMIDs across the ranges listed above (all within 1000–13999)

See VMID_ALLOCATION_FINAL.md for complete details.
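
A deterministic registry lends itself to a simple lookup that deployment scripts can call before creating a guest. The sketch below encodes the summary ranges and returns the owning domain for a VMID, or `None` for an ID that falls in an unallocated gap. The function names are illustrative; the authoritative data remains VMID_ALLOCATION_FINAL.md.

```python
# (first VMID, last VMID, domain), inclusive, from the registry summary.
VMID_RANGES = [
    (1000, 4999, "Besu"),
    (5000, 5099, "Blockscout"),
    (5200, 5299, "Cacti"),
    (5400, 5599, "CCIP"),
    (6000, 6099, "Fabric"),
    (6200, 6299, "FireFly"),
    (6400, 7399, "Indy"),
    (7800, 8999, "Sankofa/Phoenix/PanTel"),
    (10000, 13999, "Phoenix Sovereign Cloud Band"),
]


def domain_for(vmid):
    """Return the domain that owns a VMID, or None if it is unallocated."""
    for first, last, domain in VMID_RANGES:
        if first <= vmid <= last:
            return domain
    return None


def ranges_are_disjoint(ranges):
    """True when no two ranges share a VMID."""
    ordered = sorted(ranges)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


print(domain_for(5410))   # inside the CCIP range
print(domain_for(5100))   # gap between Blockscout and Cacti
print(ranges_are_disjoint(VMID_RANGES))
```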


CCIP Fleet Deployment Matrix

Lane A — Minimum Production Fleet

Total new CCIP nodes: 41 (or 43 if you add 2 monitoring nodes)

VMIDs + Hostnames

| Group | Count | VMIDs | Hostname Pattern |
|-------|-------|-------|------------------|
| Ops/Admin | 2 | 5400–5401 | ccip-ops-01..02 |
| Monitoring (optional) | 2 | 5402–5403 | ccip-mon-01..02 |
| Commit Oracles | 16 | 5410–5425 | ccip-commit-01..16 |
| Execute Oracles | 16 | 5440–5455 | ccip-exec-01..16 |
| RMN | 7 | 5470–5476 | ccip-rmn-01..07 |
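
The VMID and hostname patterns in this matrix can be expanded programmatically, which also confirms the node totals of 41 and 43. This is a sketch under the assumption that VMIDs are assigned consecutively from the first ID of each group; `build_fleet` is a hypothetical helper.

```python
# (hostname prefix, first VMID, node count) from the Lane A matrix.
FLEET = [
    ("ccip-ops", 5400, 2),
    ("ccip-mon", 5402, 2),      # optional monitoring nodes
    ("ccip-commit", 5410, 16),
    ("ccip-exec", 5440, 16),
    ("ccip-rmn", 5470, 7),
]


def build_fleet(include_monitoring=False):
    """Return {vmid: hostname} for the Lane A fleet."""
    nodes = {}
    for prefix, first_vmid, count in FLEET:
        if prefix == "ccip-mon" and not include_monitoring:
            continue
        for i in range(count):
            nodes[first_vmid + i] = f"{prefix}-{i + 1:02d}"
    return nodes


fleet = build_fleet()
print(len(fleet), "nodes without monitoring")
print(len(build_fleet(include_monitoring=True)), "nodes with monitoring")
print(fleet[5425], fleet[5476])
```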

Private IP Assignments (VLAN-based)

Once VLANs are active, assign:

| Role | VLAN | Subnet |
|------|------|--------|
| Ops/Admin | 130 | 10.130.0.0/24 |
| Commit | 132 | 10.132.0.0/24 |
| Execute | 133 | 10.133.0.0/24 |
| RMN | 134 | 10.134.0.0/24 |

Interim Plan: While still on the flat LAN, you can keep your interim plan (192.168.11.170+ block) and migrate later by VLAN cutover.

Egress NAT Mapping (Public blocks placeholder)

  • Commit VLAN (10.132.0.0/24) → Block #2 <PUBLIC_BLOCK_2>/28
  • Execute VLAN (10.133.0.0/24) → Block #3 <PUBLIC_BLOCK_3>/28
  • RMN VLAN (10.134.0.0/24) → Block #4 <PUBLIC_BLOCK_4>/28

See CCIP_DEPLOYMENT_SPEC.md for complete specification.


Deployment Orchestration Workflow

Phase 0 — Validate Foundation

  1. Confirm ER605-A WAN1 static: 76.53.10.34/28, GW 76.53.10.33
  2. Confirm WAN2 (ISP #2) failover on ER605-A
  3. Confirm ES216G trunks and native VLAN 11 mgmt access are stable
  4. Confirm Proxmox mgmt reachable only from trusted admin endpoints

Phase 1 — VLAN Enablement

  1. Configure ES216G trunk ports
  2. Enable VLAN-aware bridge vmbr0 on Proxmox nodes
  3. Create VLAN interfaces on ER605 for routing + DHCP (where appropriate)
  4. Move services one domain at a time (start with monitoring)

Phase 2 — Observability First

  1. Deploy monitoring stack (Prometheus/Grafana/Loki/Alertmanager)
  2. Publish Grafana via Cloudflare Access (not public IPs)
  3. Set alerts for node health, disk, latency, chain metrics

Phase 3 — CCIP Fleet (Lane A)

  1. Deploy CCIP Ops/Admin
  2. Deploy 16 commit nodes (VLAN 132)
  3. Deploy 16 execute nodes (VLAN 133)
  4. Deploy 7 RMN nodes (VLAN 134)
  5. Apply ER605 outbound NAT pools per VLAN using the /28 block #2–#4 placeholders
  6. Verify node egress identity by role (allowlisting ready)
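
The final verification step amounts to checking that each node's observed public source address lies inside the pool assigned to its role. The sketch below shows the comparison only. Because Blocks #2 through #4 are still placeholders, it uses RFC 5737 documentation ranges as stand-ins, and how the observed address is collected (for example from an external echo endpoint) is left out. All names here are hypothetical.

```python
import ipaddress

# Stand-in pools from the RFC 5737 documentation ranges. These are NOT the real
# blocks; replace them once <PUBLIC_BLOCK_2>..<PUBLIC_BLOCK_4> are known.
EXPECTED_POOL = {
    "commit": ipaddress.ip_network("192.0.2.0/28"),
    "execute": ipaddress.ip_network("198.51.100.0/28"),
    "rmn": ipaddress.ip_network("203.0.113.0/28"),
}


def egress_matches_role(role, observed_ip):
    """True when the observed public source IP lies inside the role's pool."""
    return ipaddress.ip_address(observed_ip) in EXPECTED_POOL[role]


def misrouted(observations):
    """Return the (node, role, ip) tuples whose egress IP is outside the pool."""
    return [(node, role, ip) for node, role, ip in observations
            if not egress_matches_role(role, ip)]


# Example observations; the second node is leaving through the wrong pool.
observations = [
    ("ccip-commit-01", "commit", "192.0.2.5"),
    ("ccip-rmn-01", "rmn", "192.0.2.9"),
]
print(misrouted(observations))
```

An empty result from `misrouted` indicates that every checked node egresses through its role's pool, which is the precondition for handing the block to an upstream provider for allowlisting.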

Phase 4 — Sovereign Tenant Rollout

  1. Stand up Phoenix Sovereign Cloud Band VLANs 200–203
  2. Apply Block #6 egress NAT
  3. Enforce tenant isolation (ACLs, deny east-west)
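
One way to reason about the isolation rule is as a predicate over source and destination addresses. The sketch below assumes that "deny east-west" means traffic between different sovereign tenant subnets is dropped while traffic within a tenant is permitted; if intra-tenant traffic should also be restricted, the policy would need to be tighter. It models the decision only and is not ER605 ACL syntax.

```python
import ipaddress

# Sovereign tenant subnets for VLANs 200-203.
TENANT_SUBNETS = {
    "PHX-SOV-SMOM": ipaddress.ip_network("10.200.0.0/20"),
    "PHX-SOV-ICCC": ipaddress.ip_network("10.201.0.0/20"),
    "PHX-SOV-DBIS": ipaddress.ip_network("10.202.0.0/20"),
    "PHX-SOV-AR": ipaddress.ip_network("10.203.0.0/20"),
}


def tenant_of(ip):
    """Return the tenant whose subnet contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for name, net in TENANT_SUBNETS.items():
        if addr in net:
            return name
    return None


def east_west_allowed(src_ip, dst_ip):
    """True only when both endpoints sit in the same tenant subnet."""
    src = tenant_of(src_ip)
    return src is not None and src == tenant_of(dst_ip)


print(east_west_allowed("10.200.1.5", "10.200.9.9"))  # same tenant
print(east_west_allowed("10.200.1.5", "10.201.1.5"))  # cross-tenant
```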

Operational Runbooks

Network Operations

Deployment Operations

Troubleshooting


Deliverables

Completed

  • Authoritative VLAN and subnet plan
  • Public block usage model (with placeholders for 5 blocks)
  • Proxmox cluster topology plan
  • CCIP fleet deployment matrix
  • Stepwise orchestration workflow

Pending

  • Exact NAT/VIP rules (requires public blocks #2–#6)
  • ER605-B role decision (standby edge vs dedicated sovereign edge)
  • VLAN migration execution
  • CCIP fleet deployment

Next Steps

To Finalize Placeholders

Paste the other five /28 blocks in the same format as Block #1:

  • Network / Gateway / Usable / Broadcast

And specify:

  • ER605-B usage: standby edge OR dedicated sovereign edge

Then we can produce:

  • Exact NAT pool assignment sheet per role
  • Break-glass VIP table
  • Complete ER605 configuration

Prerequisites

Architecture

Configuration

Operations

Best Practices

Reference


Document Status: Complete (v1.0)
Maintained By: Infrastructure Team
Review Cycle: Monthly