Complete markdown files cleanup and organization

- Organized 252 files across project
- Root directory: 187 → 2 files (98.9% reduction)
- Moved configuration guides to docs/04-configuration/
- Moved troubleshooting guides to docs/09-troubleshooting/
- Moved quick start guides to docs/01-getting-started/
- Moved reports to reports/ directory
- Archived temporary files
- Generated comprehensive reports and documentation
- Created maintenance scripts and guides

All files organized according to established standards.
This commit is contained in:
defiQUG
2026-01-06 01:46:25 -08:00
parent 1edcec953c
commit cb47cce074
1327 changed files with 217220 additions and 801 deletions


@@ -0,0 +1,547 @@
# Comprehensive Infrastructure Review
**Last Updated:** 2025-12-27
**Document Version:** 1.0
**Status:** Active Documentation
**Review Scope:** All Tunnels, DNS Entries, Nginx Configurations, VMIDs
---
## Executive Summary
This document provides a comprehensive review of:
- ✅ All Cloudflare Tunnels
- ✅ All DNS Entries
- ✅ All Nginx Configurations
- ✅ All VMIDs and Services
- ✅ Recommendations for Optimization
---
## 1. Cloudflare Tunnels Review
### Active Tunnels
| Tunnel Name | Tunnel ID | Status | Location | Purpose |
|-------------|-----------|--------|-----------|---------|
| `explorer.d-bis.org` | `b02fe1fe-cb7d-484e-909b-7cc41298ebe8` | ✅ HEALTHY | VMID 102 | Explorer/Blockscout |
| `rpc-http-pub.d-bis.org` | `10ab22da-8ea3-4e2e-a896-27ece2211a05` | ⚠️ DOWN | VMID 102 | RPC Services (needs config) |
| `mim4u-tunnel` | `f8d06879-04f8-44ef-aeda-ce84564a1792` | ✅ HEALTHY | Unknown | Miracles In Motion |
| `tunnel-ml110` | `ccd7150a-9881-4b8c-a105-9b4ead6e69a2` | ✅ HEALTHY | Unknown | Proxmox Host Access |
| `tunnel-r630-01` | `4481af8f-b24c-4cd3-bdd5-f562f4c97df4` | ✅ HEALTHY | Unknown | Proxmox Host Access |
| `tunnel-r630-02` | `0876f12b-64d7-4927-9ab3-94cb6cf48af9` | ✅ HEALTHY | Unknown | Proxmox Host Access |
### Current Tunnel Configuration (VMID 102)
**Active Tunnel**: `rpc-http-pub.d-bis.org` (Tunnel ID: `10ab22da-8ea3-4e2e-a896-27ece2211a05`)
**Current Routing** (from logs):
- `rpc-ws-pub.d-bis.org` → `https://192.168.11.252:443`
- `rpc-http-prv.d-bis.org` → `https://192.168.11.251:443`
- `rpc-ws-prv.d-bis.org` → `https://192.168.11.251:443`
- `rpc-http-pub.d-bis.org` → `https://192.168.11.252:443`
**⚠️ Issue**: Tunnel is routing directly to RPC nodes instead of central Nginx
**✅ Recommended Configuration**:
- All HTTP endpoints → `http://192.168.11.21:80` (Central Nginx)
- WebSocket endpoints → Direct to RPC nodes (as configured)
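The recommended split could be expressed as cloudflared ingress rules. The sketch below assumes the tunnel runs from a local `config.yml`; if it is dashboard-managed (remote configuration), the same hostname/service pairs would be entered in the Cloudflare UI instead. The credentials-file path is a placeholder.

```yaml
# Hypothetical /etc/cloudflared/config.yml for tunnel rpc-http-pub.d-bis.org
tunnel: 10ab22da-8ea3-4e2e-a896-27ece2211a05
credentials-file: /etc/cloudflared/10ab22da-8ea3-4e2e-a896-27ece2211a05.json
ingress:
  # HTTP endpoints → central Nginx (VMID 105)
  - hostname: rpc-http-pub.d-bis.org
    service: http://192.168.11.21:80
  - hostname: rpc-http-prv.d-bis.org
    service: http://192.168.11.21:80
  # WebSocket endpoints → direct to RPC nodes (as currently configured)
  - hostname: rpc-ws-pub.d-bis.org
    service: https://192.168.11.252:443
  - hostname: rpc-ws-prv.d-bis.org
    service: https://192.168.11.251:443
  # cloudflared requires a catch-all as the final rule
  - service: http_status:404
```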
---
## 2. DNS Entries Review
### Current DNS Records (from d-bis.org zone file)
#### A Records (Direct IPs)
| Domain | IP Address(es) | Proxy Status | Notes |
|--------|----------------|--------------|-------|
| `api.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `besu.d-bis.org` | 20.215.32.42, 70.153.83.83 | ✅ Proxied | **DUPLICATE** - Remove one |
| `blockscout.d-bis.org` | 20.215.32.42, 70.153.83.83 | ✅ Proxied | **DUPLICATE** - Remove one |
| `d-bis.org` (root) | 20.215.32.42, 20.215.32.15 | ✅ Proxied | **DUPLICATE** - Remove one |
| `docs.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `explorer.d-bis.org` | 20.215.32.42, 70.153.83.83 | ✅ Proxied | **DUPLICATE** - Remove one |
| `grafana.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `metrics.d-bis.org` | 70.153.83.83 | ❌ Not Proxied | Should use tunnel |
| `monitoring.d-bis.org` | 70.153.83.83 | ✅ Proxied | Should use tunnel |
| `prometheus.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `tessera.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `wallet.d-bis.org` | 70.153.83.83 | ✅ Proxied | Should use tunnel |
| `ws.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `www.d-bis.org` | 20.8.47.226 | ✅ Proxied | Should use tunnel |
#### CNAME Records (Tunnel-based)
| Domain | Target | Proxy Status | Notes |
|--------|--------|--------------|-------|
| `rpc.d-bis.org` | `dbis138fdendpoint-cgergbcqb7aca7at.a03.azurefd.net` | ✅ Proxied | Azure Front Door |
| `ipfs.d-bis.org` | `ipfs.cloudflare.com` | ✅ Proxied | Cloudflare IPFS |
#### Missing DNS Records (Should Exist)
| Domain | Type | Target | Status |
|--------|------|--------|--------|
| `rpc-http-pub.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `rpc-ws-pub.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `rpc-http-prv.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `rpc-ws-prv.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `dbis-admin.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `dbis-api.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `dbis-api-2.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `mim4u.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `www.mim4u.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
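The missing `d-bis.org` CNAMEs could be created through the Cloudflare API (v4). The sketch below only *prints* the `curl` commands rather than executing them, so it can be reviewed before anything touches DNS; `ZONE_ID` and `API_TOKEN` are placeholders, and the `mim4u.org` records would need the same treatment in their own zone.

```shell
# Hypothetical dry-run: emit one Cloudflare API call per missing CNAME.
TUNNEL_ID="10ab22da-8ea3-4e2e-a896-27ece2211a05"
TARGET="${TUNNEL_ID}.cfargotunnel.com"

for NAME in rpc-http-pub rpc-ws-pub rpc-http-prv rpc-ws-prv dbis-admin dbis-api dbis-api-2; do
  # proxied=true is the "orange cloud" (Cloudflare proxy enabled)
  PAYLOAD=$(printf '{"type":"CNAME","name":"%s.d-bis.org","content":"%s","proxied":true}' "$NAME" "$TARGET")
  echo "curl -X POST https://api.cloudflare.com/client/v4/zones/\$ZONE_ID/dns_records \\"
  echo "  -H 'Authorization: Bearer \$API_TOKEN' -H 'Content-Type: application/json' \\"
  echo "  -d '$PAYLOAD'"
done
```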
---
## 3. Nginx Configurations Review
### Central Nginx (VMID 105 - 192.168.11.21)
**Status**: ✅ Configured
**Configuration**: `/data/nginx/custom/http.conf`
**Type**: Nginx Proxy Manager (OpenResty)
**Configured Services**:
- `explorer.d-bis.org` → `http://192.168.11.140:80`
- `rpc-http-pub.d-bis.org` → `https://192.168.11.252:443`
- `rpc-http-prv.d-bis.org` → `https://192.168.11.251:443`
- `dbis-admin.d-bis.org` → `http://192.168.11.130:80`
- `dbis-api.d-bis.org` → `http://192.168.11.150:3000`
- `dbis-api-2.d-bis.org` → `http://192.168.11.151:3000`
- `mim4u.org` → `http://192.168.11.19:80`
- `www.mim4u.org` → 301 redirect → `mim4u.org`
**Note**: WebSocket endpoints (`rpc-ws-*`) are NOT in this config (routing directly)
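One entry in `/data/nginx/custom/http.conf` might look roughly like the sketch below. This is illustrative only — the real file is managed alongside Nginx Proxy Manager (OpenResty), so directive layout may differ. TLS is terminated at Cloudflare, so the central Nginx listens on plain HTTP.

```nginx
# Hypothetical server block for one routed service (dbis-api.d-bis.org)
server {
    listen 80;
    server_name dbis-api.d-bis.org;

    location / {
        proxy_pass http://192.168.11.150:3000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```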
### Blockscout Nginx (VMID 5000 - 192.168.11.140)
**Status**: ✅ Running
**Configuration**: `/etc/nginx/sites-available/blockscout`
**Purpose**: Local Nginx for Blockscout service
**Ports**:
- Port 80: HTTP (redirects to HTTPS or serves content)
- Port 443: HTTPS (proxies to Blockscout on port 4000)
### Miracles In Motion Nginx (VMID 7810 - 192.168.11.19)
**Status**: ✅ Running
**Configuration**: `/etc/nginx/sites-available/default`
**Purpose**: Web frontend and API proxy
**Ports**:
- Port 80: HTTP (serves static files, proxies API to 192.168.11.8:3001)
### DBIS Frontend Nginx (VMID 10130 - 192.168.11.130)
**Status**: ⚠️ Assumed Running (not directly verified)
**Purpose**: Frontend admin console
### RPC Nodes Nginx (VMIDs 2500, 2501, 2502)
**Status**: ⚠️ Partially Configured
**Purpose**: SSL termination and local routing
**VMID 2500** (192.168.11.250):
- Port 443: HTTPS RPC → `127.0.0.1:8545`
- Port 8443: HTTPS WebSocket → `127.0.0.1:8546`
**VMID 2501** (192.168.11.251):
- Port 443: HTTPS RPC → `127.0.0.1:8545`
- Port 443: HTTPS WebSocket → `127.0.0.1:8546` (SNI-based)
**VMID 2502** (192.168.11.252):
- Port 443: HTTPS RPC → `127.0.0.1:8545`
- Port 443: HTTPS WebSocket → `127.0.0.1:8546` (SNI-based)
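The SNI-based split on VMIDs 2501/2502 could be sketched as two server blocks on the same port, with Nginx selecting by `server_name`. Certificate paths are placeholders and the actual configs on the nodes are authoritative.

```nginx
# Both blocks listen on 443; Nginx routes by SNI / server_name.
server {
    listen 443 ssl;
    server_name rpc-http-prv.d-bis.org;
    # ssl_certificate / ssl_certificate_key omitted (placeholder paths)
    location / {
        proxy_pass http://127.0.0.1:8545;   # Besu JSON-RPC
    }
}
server {
    listen 443 ssl;
    server_name rpc-ws-prv.d-bis.org;
    location / {
        proxy_pass http://127.0.0.1:8546;   # Besu WebSocket
        # WebSocket upgrade requires HTTP/1.1 and the Upgrade/Connection headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```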
---
## 4. VMIDs Review
### Infrastructure Services
| VMID | Name | IP | Status | Purpose |
|------|------|----|----|---------|
| 100 | proxmox-mail-gateway | 192.168.11.32 | ✅ Running | Mail gateway |
| 101 | proxmox-datacenter-manager | 192.168.11.33 | ✅ Running | Datacenter management |
| 102 | cloudflared | 192.168.11.34 | ✅ Running | Cloudflare tunnel client |
| 103 | omada | 192.168.11.30 | ✅ Running | Network management |
| 104 | gitea | 192.168.11.31 | ✅ Running | Git repository |
| 105 | nginxproxymanager | 192.168.11.26 | ✅ Running | Central Nginx reverse proxy |
| 130 | monitoring-1 | 192.168.11.27 | ✅ Running | Monitoring stack |
### Blockchain Services
| VMID | Name | IP | Status | Purpose | Notes |
|------|------|----|----|---------|-------|
| 5000 | blockscout-1 | 192.168.11.140 | ✅ Running | Blockchain explorer | Has local Nginx |
| 6200 | firefly-1 | 192.168.11.7 | ✅ Running | Hyperledger Firefly | Web3 gateway |
### RPC Nodes
| VMID | Name | IP | Status | Purpose | Notes |
|------|------|----|----|---------|-------|
| 2500 | besu-rpc-1 | 192.168.11.250 | ✅ Running | Core RPC | Located on ml110 (192.168.11.10) |
| 2501 | besu-rpc-2 | 192.168.11.251 | ✅ Running | Permissioned RPC | Located on ml110 (192.168.11.10) |
| 2502 | besu-rpc-3 | 192.168.11.252 | ✅ Running | Public RPC | Located on ml110 (192.168.11.10) |
**✅ Status**: RPC nodes are running on ml110 (192.168.11.10), not on pve2.
### Application Services
| VMID | Name | IP | Status | Purpose |
|------|------|----|----|---------|
| 7800 | sankofa-api-1 | 192.168.11.13 | ✅ Running | Sankofa API |
| 7801 | sankofa-portal-1 | 192.168.11.16 | ✅ Running | Sankofa Portal |
| 7802 | sankofa-keycloak-1 | 192.168.11.17 | ✅ Running | Sankofa Keycloak |
| 7810 | mim-web-1 | 192.168.11.19 | ✅ Running | Miracles In Motion Web |
| 7811 | mim-api-1 | 192.168.11.8 | ✅ Running | Miracles In Motion API |
### DBIS Core Services
| VMID | Name | IP | Status | Purpose | Notes |
|------|------|----|----|---------|-------|
| 10100 | dbis-postgres-primary | 192.168.11.100 | ✅ Running | PostgreSQL Primary | Located on ml110 (192.168.11.10) |
| 10101 | dbis-postgres-replica-1 | 192.168.11.101 | ✅ Running | PostgreSQL Replica | Located on ml110 (192.168.11.10) |
| 10120 | dbis-redis | 192.168.11.120 | ✅ Running | Redis Cache | Located on ml110 (192.168.11.10) |
| 10130 | dbis-frontend | 192.168.11.130 | ✅ Running | Frontend Admin | Located on ml110 (192.168.11.10) |
| 10150 | dbis-api-primary | 192.168.11.150 | ✅ Running | API Primary | Located on ml110 (192.168.11.10) |
| 10151 | dbis-api-secondary | 192.168.11.151 | ✅ Running | API Secondary | Located on ml110 (192.168.11.10) |
**✅ Status**: DBIS Core containers are running on ml110 (192.168.11.10), not on pve2.
---
## 5. Critical Issues Identified
### 🔴 High Priority
1. **Tunnel Configuration Mismatch**
- Tunnel `rpc-http-pub.d-bis.org` is DOWN
- Currently routing directly to RPC nodes instead of central Nginx
- **Action**: Update Cloudflare dashboard to route HTTP endpoints to `http://192.168.11.21:80`
2. **Missing DNS Records**
- RPC endpoints (`rpc-http-pub`, `rpc-ws-pub`, `rpc-http-prv`, `rpc-ws-prv`) missing CNAME records
- DBIS services (`dbis-admin`, `dbis-api`, `dbis-api-2`) missing CNAME records
- `mim4u.org` and `www.mim4u.org` missing CNAME records
- **Action**: Create CNAME records pointing to tunnel
3. **Duplicate DNS A Records**
- `besu.d-bis.org`: 2 A records (20.215.32.42, 70.153.83.83)
- `blockscout.d-bis.org`: 2 A records (20.215.32.42, 70.153.83.83)
- `explorer.d-bis.org`: 2 A records (20.215.32.42, 70.153.83.83)
- `d-bis.org`: 2 A records (20.215.32.42, 20.215.32.15)
- **Action**: Remove duplicate records, keep single authoritative IP
4. **RPC Nodes Location**
- ✅ VMIDs 2500, 2501, 2502 found on ml110 (192.168.11.10)
- **Action**: Verify network connectivity from pve2 to ml110
5. **DBIS Core Services Location**
- ✅ VMIDs 10100-10151 found on ml110 (192.168.11.10)
- **Action**: Verify network connectivity from pve2 to ml110
### 🟡 Medium Priority
6. **DNS Records Using Direct IPs Instead of Tunnels**
- Many services use A records with direct IPs
- Should use CNAME records pointing to tunnel
- **Action**: Migrate to tunnel-based DNS
7. **Inconsistent Proxy Status**
- Some records proxied, some not
- **Action**: Standardize proxy status (proxied for public services)
8. **Multiple Nginx Instances**
- Central Nginx (105), Blockscout Nginx (5000), MIM Nginx (7810), RPC Nginx (2500-2502)
- **Action**: Consider consolidating or document purpose of each
### 🟢 Low Priority
9. **Documentation Gaps**
- Some VMIDs have incomplete documentation
- **Action**: Update documentation with current status
10. **Service Discovery**
- No centralized service registry
- **Action**: Consider implementing service discovery
---
## 6. Recommendations
### Immediate Actions (Critical)
1. **Fix Tunnel Configuration**
```yaml
# Update Cloudflare dashboard for tunnel: rpc-http-pub.d-bis.org
# Route all HTTP endpoints to central Nginx:
- explorer.d-bis.org → http://192.168.11.21:80
- rpc-http-pub.d-bis.org → http://192.168.11.21:80
- rpc-http-prv.d-bis.org → http://192.168.11.21:80
- dbis-admin.d-bis.org → http://192.168.11.21:80
- dbis-api.d-bis.org → http://192.168.11.21:80
- dbis-api-2.d-bis.org → http://192.168.11.21:80
- mim4u.org → http://192.168.11.21:80
- www.mim4u.org → http://192.168.11.21:80
```
2. **Create Missing DNS Records**
- Create CNAME records for all RPC endpoints
- Create CNAME records for DBIS services
- Create CNAME records for MIM services
- All should point to: `<tunnel-id>.cfargotunnel.com`
- Enable proxy (orange cloud) for all
3. **Remove Duplicate DNS Records**
- Remove duplicate A records for `besu.d-bis.org`
- Remove duplicate A records for `blockscout.d-bis.org`
- Remove duplicate A records for `explorer.d-bis.org`
- Remove duplicate A records for `d-bis.org` (keep 20.215.32.15)
4. **Verify Located VMIDs**
- Confirm RPC nodes (2500-2502), found on ml110, are reachable from central Nginx
- Verify DBIS Core services (10100-10151) deployment status on ml110
### Short-term Improvements
5. **DNS Migration to Tunnels**
- Migrate all A records to CNAME records pointing to tunnels
- Remove direct IP exposure
- Enable proxy for all public services
6. **Tunnel Consolidation**
- Consider consolidating multiple tunnels into single tunnel
- Use central Nginx for all HTTP routing
- Simplify tunnel management
7. **Nginx Architecture Review**
- Document purpose of each Nginx instance
- Consider if all are necessary
- Standardize configuration approach
### Long-term Optimizations
8. **Service Discovery**
- Implement centralized service registry
- Automate DNS record creation
- Dynamic service routing
9. **Monitoring and Alerting**
- Monitor all tunnel health
- Alert on tunnel failures
- Track DNS record changes
10. **Documentation**
- Maintain up-to-date infrastructure map
- Document all service dependencies
- Create runbooks for common operations
---
## 7. Architecture Recommendations
### Recommended Architecture
```
Internet
   ↓
Cloudflare (DNS + SSL Termination)
   ↓
Cloudflare Tunnel (VMID 102)
   ↓
Routing Decision:
├─ HTTP Services → Central Nginx (VMID 105:80) → Internal Services
└─ WebSocket Services → Direct to RPC Nodes (bypass Nginx)
```
**Key Principle**:
- HTTP traffic routes through central Nginx for unified management
- WebSocket traffic routes directly to RPC nodes for optimal performance
### Benefits
1. **Single Point of Configuration**: All HTTP routing in one place
2. **Simplified Management**: Easy to add/remove services
3. **Better Security**: No direct IP exposure
4. **Centralized Logging**: All traffic logs in one location
5. **Easier Troubleshooting**: Single point to check routing
---
## 8. Action Items Checklist
### Critical (Do First)
- [ ] Update Cloudflare tunnel configuration to route HTTP endpoints to central Nginx
- [ ] Create missing DNS CNAME records for all services
- [ ] Remove duplicate DNS A records
- [x] Locate and verify RPC nodes (2500-2502) - ✅ Found on ml110
- [x] Verify DBIS Core services deployment status - ✅ Found on ml110
- [ ] Verify network connectivity from pve2 (192.168.11.12) to ml110 (192.168.11.10)
### Important (Do Next)
- [ ] Migrate remaining A records to CNAME (tunnel-based)
- [ ] Standardize proxy status across all DNS records
- [ ] Document all Nginx instances and their purposes
- [ ] Test all endpoints after configuration changes
### Nice to Have
- [ ] Implement service discovery
- [ ] Set up monitoring and alerting
- [ ] Create comprehensive infrastructure documentation
- [ ] Automate DNS record management
---
## 9. DNS Records Migration Plan
### Current State (A Records - Direct IPs)
Many services use A records pointing to direct IPs. These should be migrated to CNAME records pointing to Cloudflare tunnels.
### Migration Priority
**High Priority** (Public-facing services):
1. `explorer.d-bis.org` → CNAME to tunnel
2. `rpc-http-pub.d-bis.org` → CNAME to tunnel
3. `rpc-ws-pub.d-bis.org` → CNAME to tunnel
4. `rpc-http-prv.d-bis.org` → CNAME to tunnel
5. `rpc-ws-prv.d-bis.org` → CNAME to tunnel
**Medium Priority** (Internal services):
6. `dbis-admin.d-bis.org` → CNAME to tunnel
7. `dbis-api.d-bis.org` → CNAME to tunnel
8. `dbis-api-2.d-bis.org` → CNAME to tunnel
9. `mim4u.org` → CNAME to tunnel
10. `www.mim4u.org` → CNAME to tunnel
**Low Priority** (Monitoring/internal):
11. `grafana.d-bis.org` → CNAME to tunnel (if public access needed)
12. `prometheus.d-bis.org` → CNAME to tunnel (if public access needed)
13. `monitoring.d-bis.org` → CNAME to tunnel
### Migration Steps
For each domain:
1. Create CNAME record: `<subdomain>` → `<tunnel-id>.cfargotunnel.com`
2. Enable proxy (orange cloud)
3. Wait for DNS propagation (1-5 minutes)
4. Test endpoint accessibility
5. Remove old A record (if exists)
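The per-domain verification (steps 3-4) can be scripted. The helper below only prints the `dig` and `curl` commands to run, so it is safe to execute anywhere; the expected CNAME target assumes the tunnel-based records have been created.

```shell
# Hypothetical helper: print the verification commands for one migrated domain.
verify_migration() {
  local DOMAIN="$1"
  echo "dig +short CNAME ${DOMAIN}            # pre-proxy: expect <tunnel-id>.cfargotunnel.com"
  echo "curl -sI https://${DOMAIN} | head -n 1  # expect a 2xx/3xx once the record is live"
}

for d in explorer.d-bis.org rpc-http-pub.d-bis.org rpc-ws-pub.d-bis.org; do
  verify_migration "$d"
done
```

Note that while a record is proxied (orange cloud), public resolvers return Cloudflare A/AAAA records rather than the CNAME, so the `dig` check is most useful before enabling the proxy.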
---
## 10. Testing Plan
After implementing recommendations:
1. **Test HTTP Endpoints**:
```bash
curl https://explorer.d-bis.org/api/v2/stats
curl -X POST https://rpc-http-pub.d-bis.org \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
curl https://dbis-admin.d-bis.org
curl https://mim4u.org
```
2. **Test WebSocket Endpoints**:
```bash
wscat -c wss://rpc-ws-pub.d-bis.org
wscat -c wss://rpc-ws-prv.d-bis.org
```
3. **Test Redirects**:
```bash
curl -I https://www.mim4u.org # Should redirect to mim4u.org
```
4. **Verify Tunnel Health**:
- Check Cloudflare dashboard for tunnel status
- Verify all tunnels show HEALTHY
- Check tunnel logs for errors
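The dashboard checks above can be complemented from the cloudflared host (VMID 102, 192.168.11.34). The sketch prints the commands rather than running them, since it assumes SSH access and a systemd-managed `cloudflared` service.

```shell
# Hypothetical tunnel-health commands, emitted as an SSH checklist.
for CMD in \
  "cloudflared tunnel list" \
  "cloudflared tunnel info rpc-http-pub.d-bis.org" \
  "journalctl -u cloudflared -n 50 --no-pager"; do
  echo "ssh root@192.168.11.34 '$CMD'"
done
```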
---
## 11. Summary of Recommendations
### 🔴 Critical (Fix Immediately)
1. **Update Cloudflare Tunnel Configuration**
- Tunnel: `rpc-http-pub.d-bis.org` (Tunnel ID: `10ab22da-8ea3-4e2e-a896-27ece2211a05`)
- Action: Route all HTTP endpoints to `http://192.168.11.21:80` (central Nginx)
- Keep WebSocket endpoints routing directly to RPC nodes
2. **Create Missing DNS CNAME Records**
- `rpc-http-pub.d-bis.org` → CNAME to tunnel
- `rpc-ws-pub.d-bis.org` → CNAME to tunnel
- `rpc-http-prv.d-bis.org` → CNAME to tunnel
- `rpc-ws-prv.d-bis.org` → CNAME to tunnel
- `dbis-admin.d-bis.org` → CNAME to tunnel
- `dbis-api.d-bis.org` → CNAME to tunnel
- `dbis-api-2.d-bis.org` → CNAME to tunnel
- `mim4u.org` → CNAME to tunnel
- `www.mim4u.org` → CNAME to tunnel
3. **Remove Duplicate DNS A Records**
- `besu.d-bis.org`: Remove one IP (keep single authoritative)
- `blockscout.d-bis.org`: Remove one IP
- `explorer.d-bis.org`: Remove one IP
- `d-bis.org`: Remove 20.215.32.42 (keep 20.215.32.15)
### 🟡 Important (Fix Soon)
4. **Migrate A Records to CNAME (Tunnel-based)**
- Convert remaining A records to CNAME records
- Point all to Cloudflare tunnel endpoints
- Enable proxy (orange cloud) for all public services
5. **Verify Network Connectivity**
- Test connectivity from pve2 (192.168.11.12) to ml110 (192.168.11.10)
- Ensure RPC nodes (2500-2502) are accessible from central Nginx
- Ensure DBIS services (10100-10151) are accessible from central Nginx
### 🟢 Optimization (Nice to Have)
6. **Documentation Updates**
- Update all service documentation with current IPs and locations
- Document network topology (pve2 vs ml110)
- Create service dependency map
7. **Monitoring Setup**
- Monitor all tunnel health
- Alert on tunnel failures
- Track DNS record changes
---
## Related Documentation
### Architecture Documents
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Complete network architecture
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration
- **[DOMAIN_STRUCTURE.md](DOMAIN_STRUCTURE.md)** ⭐⭐ - Domain structure
### Network Documents
- **[../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md)** - Cloudflare tunnel routing
- **[../05-network/CENTRAL_NGINX_ROUTING_SETUP.md](../05-network/CENTRAL_NGINX_ROUTING_SETUP.md)** - Central Nginx routing
### Configuration Documents
- **[../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md](../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md)** - DNS mapping to containers
- **[../04-configuration/RPC_DNS_CONFIGURATION.md](../04-configuration/RPC_DNS_CONFIGURATION.md)** - RPC DNS configuration
---
**Last Updated:** 2025-12-27
**Document Version:** 1.0
**Review Cycle:** Quarterly


@@ -0,0 +1,172 @@
# Domain Structure
**Last Updated:** 2025-01-03
**Document Version:** 1.0
**Status:** Active Documentation
---
## Overview
This document defines the domain structure for the infrastructure, clarifying which domains are used for different purposes.
---
## Domain Assignments
### 1. sankofa.nexus - Hardware Infrastructure
**Purpose:** Physical hardware hostnames and internal network DNS
**Usage:**
- All physical servers (ml110, r630-01 through r630-04)
- Internal network DNS resolution
- SSH access via FQDN
- Internal service discovery
**Examples:**
- `ml110.sankofa.nexus` → 192.168.11.10
- `r630-01.sankofa.nexus` → 192.168.11.11
- `r630-02.sankofa.nexus` → 192.168.11.12
- `r630-03.sankofa.nexus` → 192.168.11.13
- `r630-04.sankofa.nexus` → 192.168.11.14
**DNS Configuration:**
- Internal DNS server (typically on ER605 or Omada controller)
- Not publicly resolvable (internal network only)
- Used for local network service discovery
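If the internal zone were served by dnsmasq, the host entries could be sketched as below. This is an assumption for illustration — the actual internal DNS may live on the ER605 or the Omada controller, in which case the equivalent static entries are configured in that platform's UI.

```conf
# Hypothetical dnsmasq entries for the internal sankofa.nexus zone
address=/ml110.sankofa.nexus/192.168.11.10
address=/r630-01.sankofa.nexus/192.168.11.11
address=/r630-02.sankofa.nexus/192.168.11.12
address=/r630-03.sankofa.nexus/192.168.11.13
address=/r630-04.sankofa.nexus/192.168.11.14
```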
**Related Documentation:**
- [Physical Hardware Inventory](./PHYSICAL_HARDWARE_INVENTORY.md)
---
### 2. d-bis.org - ChainID 138 Services
**Purpose:** Public-facing services for ChainID 138 blockchain network
**Usage:**
- RPC endpoints (public and permissioned)
- Block explorer
- WebSocket endpoints
- Cloudflare tunnels for Proxmox hosts
- All ChainID 138 blockchain-related services
**Examples:**
- `rpc.d-bis.org` - Primary RPC endpoint
- `rpc2.d-bis.org` - Secondary RPC endpoint
- `explorer.d-bis.org` - Block explorer (Blockscout)
- `ml110-01.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
- `r630-01.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
- `r630-02.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
- `r630-03.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
- `r630-04.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
**DNS Configuration:**
- Cloudflare DNS (proxied)
- Publicly resolvable
- SSL/TLS via Cloudflare
**Related Documentation:**
- [Cloudflare Tunnel Setup](../04-configuration/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md)
- [RPC Configuration](../04-configuration/RPC_DNS_CONFIGURATION.md)
- [Blockscout Setup](../BLOCKSCOUT_COMPLETE_SUMMARY.md)
---
### 3. defi-oracle.io - ChainID 138 Legacy (ThirdWeb RPC)
**Purpose:** Legacy RPC endpoint for ThirdWeb integration
**Usage:**
- ThirdWeb RPC endpoint (VMID 2400)
- Legacy compatibility for existing integrations
- Public RPC access for ChainID 138
**Examples:**
- `rpc.defi-oracle.io` - Legacy RPC endpoint
- `rpc.public-0138.defi-oracle.io` - Specific ChainID 138 RPC endpoint
**DNS Configuration:**
- Cloudflare DNS (proxied)
- Publicly resolvable
- SSL/TLS via Cloudflare
**Note:** This domain is maintained for backward compatibility with ThirdWeb integrations. New integrations should use `d-bis.org` endpoints.
**Related Documentation:**
- [ThirdWeb RPC Setup](../04-configuration/THIRDWEB_RPC_CLOUDFLARE_SETUP.md)
- [VMID 2400 DNS Structure](../04-configuration/VMID2400_DNS_STRUCTURE.md)
---
## Domain Summary Table
| Domain | Purpose | Public | DNS Provider | SSL/TLS |
|--------|---------|--------|--------------|---------|
| `sankofa.nexus` | Hardware infrastructure | No (internal) | Internal DNS | Self-signed |
| `d-bis.org` | ChainID 138 services | Yes | Cloudflare | Cloudflare |
| `defi-oracle.io` | ChainID 138 legacy (ThirdWeb) | Yes | Cloudflare | Cloudflare |
---
## Domain Usage Guidelines
### When to Use sankofa.nexus
- Internal network communication
- SSH access to physical hosts
- Internal service discovery
- Local network DNS resolution
- Proxmox cluster communication
### When to Use d-bis.org
- Public blockchain RPC endpoints
- Block explorer access
- Public-facing Proxmox UI (via tunnels)
- ChainID 138 service endpoints
- New integrations and services
### When to Use defi-oracle.io
- ThirdWeb RPC endpoint (legacy)
- Backward compatibility
- Existing integrations that reference this domain
---
## Migration Notes
### From defi-oracle.io to d-bis.org
For new services and integrations:
- **Use `d-bis.org`** as the primary domain
- `defi-oracle.io` is maintained for legacy ThirdWeb RPC compatibility
- All new ChainID 138 services should use `d-bis.org`
### DNS Record Management
- **sankofa.nexus**: Managed via internal DNS (Omada controller or local DNS server)
- **d-bis.org**: Managed via Cloudflare DNS
- **defi-oracle.io**: Managed via Cloudflare DNS
---
## Related Documentation
### Architecture Documents
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Complete network architecture
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration
### Configuration Documents
- **[../04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md)** - Cloudflare tunnel configuration
- **[../04-configuration/RPC_DNS_CONFIGURATION.md](../04-configuration/RPC_DNS_CONFIGURATION.md)** - RPC DNS configuration
- **[../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md)** - Cloudflare routing architecture
---
**Last Updated:** 2025-01-03
**Document Version:** 1.0
**Review Cycle:** Quarterly


@@ -1,7 +1,10 @@
# Network Architecture - Enterprise Orchestration Plan
**Navigation:** [Home](../README.md) > [Architecture](README.md) > Network Architecture
**Last Updated:** 2025-01-20
**Document Version:** 2.0
**Status:** 🟢 Active Documentation
**Project:** Sankofa / Phoenix / PanTel · ChainID 138 · Proxmox + Cloudflare Zero Trust + Dual ISP + 6×/28
---
@@ -33,6 +36,8 @@ This document defines the complete enterprise-grade network architecture for the
## 1. Physical Topology & Hardware Roles
> **Reference:** For complete physical hardware inventory including IP addresses, credentials, and detailed specifications, see **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)**.
### 1.1 Hardware Role Assignment
#### Edge / Routing
@@ -65,13 +70,14 @@ This document defines the complete enterprise-grade network architecture for the
### Public Block #1 (Known - Spectrum)
| Property | Value |
|----------|-------|
| **Network** | `76.53.10.32/28` |
| **Gateway** | `76.53.10.33` |
| **Usable Range** | `76.53.10.33`–`76.53.10.46` |
| **Broadcast** | `76.53.10.47` |
| **ER605 WAN1 IP** | `76.53.10.34` (router interface) |
| Property | Value | Status |
|----------|-------|--------|
| **Network** | `76.53.10.32/28` | ✅ Configured |
| **Gateway** | `76.53.10.33` | ✅ Active |
| **Usable Range** | `76.53.10.33`–`76.53.10.46` | ✅ In Use |
| **Broadcast** | `76.53.10.47` | - |
| **ER605 WAN1 IP** | `76.53.10.34` (router interface) | ✅ Active |
| **Available IPs** | 12 (`76.53.10.35`–`76.53.10.46`) | ✅ Available |
### Public Blocks #2–#6 (Placeholders - To Be Configured)
@@ -318,7 +324,43 @@ This architecture should be reflected in:
---
## Related Documentation
### Architecture Documents
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Complete physical hardware inventory and specifications
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Enterprise deployment orchestration guide
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** ⭐⭐⭐ - VMID allocation registry
- **[DOMAIN_STRUCTURE.md](DOMAIN_STRUCTURE.md)** ⭐⭐ - Domain structure and DNS assignments
- **[HOSTNAME_MIGRATION_GUIDE.md](HOSTNAME_MIGRATION_GUIDE.md)** ⭐ - Hostname migration procedures
### Configuration Documents
- **[../04-configuration/ER605_ROUTER_CONFIGURATION.md](../04-configuration/ER605_ROUTER_CONFIGURATION.md)** - Router configuration
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
- **[../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md)** - Cloudflare tunnel routing
### Deployment Documents
- **[../03-deployment/ORCHESTRATION_DEPLOYMENT_GUIDE.md](../03-deployment/ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment orchestration
- **[../07-ccip/CCIP_DEPLOYMENT_SPEC.md](../07-ccip/CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment specification
---
**Document Status:** Complete (v2.0)
**Maintained By:** Infrastructure Team
**Review Cycle:** Quarterly
**Next Update:** After public blocks #2-6 are assigned
---
## Change Log
### Version 2.0 (2025-01-20)
- Added network topology Mermaid diagram
- Added VLAN architecture Mermaid diagram
- Added ASCII art network topology
- Enhanced public IP block matrix with status indicators
- Added breadcrumb navigation
- Added status indicators
### Version 1.0 (2024-12-15)
- Initial version
- Basic network architecture documentation


@@ -1,10 +1,12 @@
# Orchestration Deployment Guide - Enterprise-Grade
**Navigation:** [Home](../README.md) > [Architecture](README.md) > Orchestration Deployment Guide
**Sankofa / Phoenix / PanTel · ChainID 138 · Proxmox + Cloudflare Zero Trust + Dual ISP + 6×/28**
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Buildable Blueprint
**Document Version:** 1.1
**Status:** 🟢 Active Documentation
---
@@ -23,17 +25,20 @@ This guide provides a **buildable blueprint**: network, VLANs, Proxmox cluster,
## Table of Contents
1. [Core Principles](#core-principles)
2. [Physical Topology & Roles](#physical-topology--roles)
3. [ISP & Public IP Plan](#isp--public-ip-plan)
4. [Layer-2 & VLAN Orchestration](#layer-2--vlan-orchestration)
5. [Routing, NAT, and Egress Segmentation](#routing-nat-and-egress-segmentation)
6. [Proxmox Cluster Orchestration](#proxmox-cluster-orchestration)
7. [Cloudflare Zero Trust Orchestration](#cloudflare-zero-trust-orchestration)
8. [VMID Allocation Registry](#vmid-allocation-registry)
9. [CCIP Fleet Deployment Matrix](#ccip-fleet-deployment-matrix)
10. [Deployment Orchestration Workflow](#deployment-orchestration-workflow)
11. [Operational Runbooks](#operational-runbooks)
**Estimated Reading Time:** 45 minutes
**Progress:** Use this TOC to track your reading progress
1. ✅ [Core Principles](#core-principles) - *Foundation concepts*
2. [Physical Topology & Roles](#physical-topology--roles) - *Hardware layout*
3. [ISP & Public IP Plan](#isp--public-ip-plan) - *Public IP allocation*
4. ✅ [Layer-2 & VLAN Orchestration](#layer-2--vlan-orchestration) - *VLAN configuration*
5. [Routing, NAT, and Egress Segmentation](#routing-nat-and-egress-segmentation) - *Network routing*
6. [Proxmox Cluster Orchestration](#proxmox-cluster-orchestration) - *Proxmox setup*
7. ✅ [Cloudflare Zero Trust Orchestration](#cloudflare-zero-trust-orchestration) - *Cloudflare integration*
8. ✅ [VMID Allocation Registry](#vmid-allocation-registry) - *VMID planning*
9. ✅ [CCIP Fleet Deployment Matrix](#ccip-fleet-deployment-matrix) - *CCIP deployment*
10. ✅ [Deployment Orchestration Workflow](#deployment-orchestration-workflow) - *Deployment process*
11. ✅ [Operational Runbooks](#operational-runbooks) - *Operations guide*
---
@@ -52,205 +57,88 @@ This guide provides a **buildable blueprint**: network, VLANs, Proxmox cluster,
## Physical Topology & Roles
### Hardware Role Assignment
> **Reference:** For complete hardware role assignments, physical topology, and detailed specifications, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#1-physical-topology--hardware-roles)**.
#### Edge / Routing
> **Hardware Inventory:** For complete physical hardware inventory including IP addresses, credentials, hostnames, and detailed specifications, see **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐.
**ER605-A (Primary Edge Router)**
- WAN1: Spectrum primary with Block #1 (76.53.10.32/28)
- WAN2: ISP #2 (failover/alternate policy)
- Role: Active edge router, NAT pools, routing
**ER605-B (Standby Edge Router / Alternate WAN policy)**
- Role: Standby router OR dedicated to WAN2 policies/testing
- Note: ER605 does not support full stateful HA. This is **active/standby operational redundancy**, not automatic session-preserving HA.
#### Switching Fabric
- **ES216G-1**: Core / uplinks / trunks
- **ES216G-2**: Compute rack aggregation
- **ES216G-3**: Mgmt + out-of-band / staging
#### Compute
- **ML110 Gen9**: "Bootstrap & Management" node
- IP: 192.168.11.10
- Role: Proxmox mgmt services, Omada controller, Git, monitoring seed
- **4× Dell R630**: Proxmox compute cluster nodes
- Resources: 512GB RAM each, 2×600GB boot, 6×250GB SSD
- Role: Production workloads, CCIP fleet, sovereign tenants, services
**Summary:**
- **2× ER605** (edge + HA/failover design)
- **3× ES216G switches** (core, compute, mgmt)
- **1× ML110 Gen9** (management / seed / bootstrap) - IP: 192.168.11.10
- **4× Dell R630** (compute cluster; 512GB RAM each; 2×600GB boot; 6×250GB SSD)
---
## ISP & Public IP Plan
### Public Block #1 (Known - Spectrum)
> **Reference:** For complete public IP block plan, usage policy, and NAT pool assignments, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#2-isp--public-ip-plan-6--28)**.
| Property | Value |
|----------|-------|
| **Network** | `76.53.10.32/28` |
| **Gateway** | `76.53.10.33` |
| **Usable Range** | `76.53.10.33-76.53.10.46` |
| **Broadcast** | `76.53.10.47` |
| **ER605 WAN1 IP** | `76.53.10.34` (router interface) |
### Public Blocks #2-#6 (Placeholders - To Be Configured)
| Block | Network | Gateway | Usable Range | Broadcast | Designated Use |
|-------|--------|---------|--------------|-----------|----------------|
| **#2** | `<PUBLIC_BLOCK_2>/28` | `<GW2>` | `<USABLE2>` | `<BCAST2>` | CCIP Commit egress NAT pool |
| **#3** | `<PUBLIC_BLOCK_3>/28` | `<GW3>` | `<USABLE3>` | `<BCAST3>` | CCIP Execute egress NAT pool |
| **#4** | `<PUBLIC_BLOCK_4>/28` | `<GW4>` | `<USABLE4>` | `<BCAST4>` | RMN egress NAT pool |
| **#5** | `<PUBLIC_BLOCK_5>/28` | `<GW5>` | `<USABLE5>` | `<BCAST5>` | Sankofa/Phoenix/PanTel service egress |
| **#6** | `<PUBLIC_BLOCK_6>/28` | `<GW6>` | `<USABLE6>` | `<BCAST6>` | Sovereign Cloud Band tenant egress |
### Public IP Usage Policy (Role-based)
| Public /28 Block | Designated Use | Why |
|------------------|----------------|-----|
| **#1** (76.53.10.32/28) | Router WAN + break-glass VIPs | Primary connectivity + emergency |
| **#2** | CCIP Commit egress NAT pool | Allowlistable egress for source RPCs |
| **#3** | CCIP Execute egress NAT pool | Allowlistable egress for destination RPCs |
| **#4** | RMN egress NAT pool | Independent security-plane egress |
| **#5** | Sankofa/Phoenix/PanTel service egress | Service-plane separation |
| **#6** | Sovereign Cloud Band tenant egress | Per-sovereign policy control |
**Summary:**
- **Block #1** (76.53.10.32/28): Router WAN + break-glass VIPs ✅ Configured
- **Blocks #2-6**: Placeholders for CCIP Commit, Execute, RMN, Service, and Sovereign tenant egress NAT pools
---
## Layer-2 & VLAN Orchestration
### VLAN Set (Authoritative)
> **Reference:** For complete VLAN orchestration plan, subnet allocations, and switching configuration, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#3-layer-2--vlan-orchestration-plan)**.
> **Migration Note:** Currently on flat LAN 192.168.11.0/24. This plan migrates to VLANs while keeping compatibility.
| VLAN ID | VLAN Name | Purpose | Subnet | Gateway |
|--------:|-----------|---------|--------|---------|
| **11** | MGMT-LAN | Proxmox mgmt, switches mgmt, admin endpoints | 192.168.11.0/24 | 192.168.11.1 |
| 110 | BESU-VAL | Validator-only network (no member access) | 10.110.0.0/24 | 10.110.0.1 |
| 111 | BESU-SEN | Sentry mesh | 10.111.0.0/24 | 10.111.0.1 |
| 112 | BESU-RPC | RPC / gateway tier | 10.112.0.0/24 | 10.112.0.1 |
| 120 | BLOCKSCOUT | Explorer + DB | 10.120.0.0/24 | 10.120.0.1 |
| 121 | CACTI | Interop middleware | 10.121.0.0/24 | 10.121.0.1 |
| 130 | CCIP-OPS | Ops/admin | 10.130.0.0/24 | 10.130.0.1 |
| 132 | CCIP-COMMIT | Commit-role DON | 10.132.0.0/24 | 10.132.0.1 |
| 133 | CCIP-EXEC | Execute-role DON | 10.133.0.0/24 | 10.133.0.1 |
| 134 | CCIP-RMN | Risk management network | 10.134.0.0/24 | 10.134.0.1 |
| 140 | FABRIC | Fabric | 10.140.0.0/24 | 10.140.0.1 |
| 141 | FIREFLY | FireFly | 10.141.0.0/24 | 10.141.0.1 |
| 150 | INDY | Identity | 10.150.0.0/24 | 10.150.0.1 |
| 160 | SANKOFA-SVC | Sankofa/Phoenix/PanTel service layer | 10.160.0.0/22 | 10.160.0.1 |
| 200 | PHX-SOV-SMOM | Sovereign tenant | 10.200.0.0/20 | 10.200.0.1 |
| 201 | PHX-SOV-ICCC | Sovereign tenant | 10.201.0.0/20 | 10.201.0.1 |
| 202 | PHX-SOV-DBIS | Sovereign tenant | 10.202.0.0/20 | 10.202.0.1 |
| 203 | PHX-SOV-AR | Absolute Realms tenant | 10.203.0.0/20 | 10.203.0.1 |
### Switching Configuration (ES216G)
- **ES216G-1**: **Core** (all VLAN trunks to ES216G-2/3 + ER605-A)
- **ES216G-2**: **Compute** (trunks to R630s + ML110)
- **ES216G-3**: **Mgmt/OOB** (mgmt access ports, staging, out-of-band)
**All Proxmox uplinks should be 802.1Q trunk ports.**
**Summary:**
- **19 VLANs** defined with complete subnet plan
- **VLAN 11**: MGMT-LAN (192.168.11.0/24) - Current flat LAN
- **VLANs 110-203**: Service-specific VLANs (10.x.0.0/24 or /20 or /22)
- **Migration path**: From flat LAN to VLANs while maintaining compatibility
---
## Routing, NAT, and Egress Segmentation
### Dual Router Roles
> **Reference:** For complete routing configuration, NAT policies, and egress segmentation details, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#4-routing-nat-and-egress-segmentation-er605)**.
- **ER605-A**: Active edge router (WAN1 = Spectrum primary with Block #1)
- **ER605-B**: Standby router OR dedicated to WAN2 policies/testing (no inbound services)
### NAT Policies (Critical)
#### Inbound NAT
- **Default: none**
- Break-glass only (optional):
- Jumpbox/SSH (single port, IP allowlist, Cloudflare Access preferred)
- Proxmox admin should remain **LAN-only**
#### Outbound NAT (Role-based Pools Using /28 Blocks)
| Private Subnet | Role | Egress NAT Pool | Public Block |
|----------------|------|-----------------|--------------|
| 10.132.0.0/24 | CCIP Commit | **Block #2** `<PUBLIC_BLOCK_2>/28` | #2 |
| 10.133.0.0/24 | CCIP Execute | **Block #3** `<PUBLIC_BLOCK_3>/28` | #3 |
| 10.134.0.0/24 | RMN | **Block #4** `<PUBLIC_BLOCK_4>/28` | #4 |
| 10.160.0.0/22 | Sankofa/Phoenix/PanTel | **Block #5** `<PUBLIC_BLOCK_5>/28` | #5 |
| 10.200.0.0/20-10.203.0.0/20 | Sovereign tenants | **Block #6** `<PUBLIC_BLOCK_6>/28` | #6 |
| 192.168.11.0/24 | Mgmt | Block #1 (or none; tightly restricted) | #1 |
This yields **provable separation**, allowlisting, and incident scoping.
**Summary:**
- **Inbound NAT**: Default none (Cloudflare Tunnel primary)
- **Outbound NAT**: Role-based pools using /28 blocks #2-6
- **Egress Segmentation**: CCIP Commit → Block #2, Execute → Block #3, RMN → Block #4, Services → Block #5, Sovereign → Block #6
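For a Linux-based edge, or for lab validation of the policy before committing it to the ER605 GUI, the role-to-pool mapping can be sketched as per-subnet SNAT rules. This is an illustration only: the `<BLOCKn_FIRST>-<BLOCKn_LAST>` pool boundaries stand in for the unconfigured placeholder blocks above, and `eth0` is a hypothetical WAN interface.

```shell
# Illustration of the role-based egress policy as iptables SNAT rules.
# Pool boundaries are placeholders from the plan above; eth0 is hypothetical.
iptables -t nat -A POSTROUTING -s 10.132.0.0/24 -o eth0 -j SNAT --to-source '<BLOCK2_FIRST>-<BLOCK2_LAST>'  # CCIP Commit
iptables -t nat -A POSTROUTING -s 10.133.0.0/24 -o eth0 -j SNAT --to-source '<BLOCK3_FIRST>-<BLOCK3_LAST>'  # CCIP Execute
iptables -t nat -A POSTROUTING -s 10.134.0.0/24 -o eth0 -j SNAT --to-source '<BLOCK4_FIRST>-<BLOCK4_LAST>'  # RMN
iptables -t nat -A POSTROUTING -s 10.160.0.0/22 -o eth0 -j SNAT --to-source '<BLOCK5_FIRST>-<BLOCK5_LAST>'  # Sankofa/Phoenix/PanTel
# The four sovereign /20s are not contiguous, so one rule per tenant subnet:
for net in 10.200.0.0/20 10.201.0.0/20 10.202.0.0/20 10.203.0.0/20; do
  iptables -t nat -A POSTROUTING -s "$net" -o eth0 -j SNAT --to-source '<BLOCK6_FIRST>-<BLOCK6_LAST>'       # Sovereign tenants
done
```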
---
## Proxmox Cluster Orchestration
### Node Layout
> **Reference:** For complete Proxmox cluster orchestration, networking, and storage details, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#5-proxmox-cluster-orchestration)**.
- **ml110 (192.168.11.10)**: mgmt + seed services + initial automation runner
- **r630-01..04**: production compute
### Proxmox Networking (per host)
- **`vmbr0`**: VLAN-aware bridge
- Native VLAN: 11 (MGMT)
- Tagged VLANs: 110,111,112,120,121,130,132,133,134,140,141,150,160,200203
- **Proxmox host IP** remains on **VLAN 11** only.
### Storage Orchestration (R630)
**Hardware:**
- 2×600GB boot (mirror recommended)
- 6×250GB SSD
**Recommended:**
- **Boot drives**: ZFS mirror or hardware RAID1
- **Data SSDs**: ZFS pool (striped mirrors if you can pair, or RAIDZ1/2 depending on risk tolerance)
- **High-write workloads** (logs/metrics/indexers) on dedicated dataset with quotas
**Summary:**
- **Node Layout**: ml110 (mgmt) + r630-01..04 (compute)
- **Networking**: VLAN-aware bridge `vmbr0` with native VLAN 11
- **Storage**: ZFS recommended for R630 data SSDs
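The recommended layout can be applied per R630 roughly as follows. This is a sketch under stated assumptions: the device names (`/dev/sdc`..`/dev/sdh`) are examples to confirm with `lsblk` first, the boot mirror is normally created by the Proxmox installer's ZFS RAID1 option, and the 200G quota and `data-zfs` storage ID are illustrative.

```shell
# Data pool from the 6x250GB SSDs as 3 striped mirrors (device names are examples).
zpool create -o ashift=12 data \
  mirror /dev/sdc /dev/sdd \
  mirror /dev/sde /dev/sdf \
  mirror /dev/sdg /dev/sdh

# Dedicated, quota-bounded dataset for high-write workloads (logs/metrics/indexers).
zfs create -o quota=200G -o compression=lz4 data/logs

# Register the pool as Proxmox storage on this node (storage ID is an example).
pvesm add zfspool data-zfs --pool data --nodes r630-01 --content images,rootdir
```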
---
## Cloudflare Zero Trust Orchestration
### cloudflared Gateway Pattern
> **Reference:** For complete Cloudflare Zero Trust orchestration, cloudflared gateway pattern, and tunnel configuration, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#6-cloudflare-zero-trust-orchestration)**.
Run **2 cloudflared LXCs** for redundancy:
- `cloudflared-1` on ML110
- `cloudflared-2` on an R630
Both run tunnels for:
- Blockscout
- FireFly
- Gitea
- Internal admin dashboards (Grafana) behind Cloudflare Access
**Keep Proxmox UI LAN-only**; if needed, publish via Cloudflare Access with strict posture/MFA.
**Summary:**
- **2 cloudflared LXCs** for redundancy (ML110 + R630)
- **Tunnels for**: Blockscout, FireFly, Gitea, internal admin dashboards
- **Proxmox UI**: LAN-only (publish via Cloudflare Access if needed)
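Bootstrapping one of the redundant tunnels looks roughly like this. The tunnel name `svc-tunnel-1` and the `grafana.d-bis.org` hostname are examples (only `explorer.d-bis.org` is confirmed in the tunnel registry); repeat on the second LXC with its own tunnel name.

```shell
# Run inside cloudflared-1 (LXC on ML110); tunnel name and grafana hostname are examples.
cloudflared tunnel login                        # authorize against the Cloudflare zone
cloudflared tunnel create svc-tunnel-1          # writes credentials under ~/.cloudflared/
cloudflared tunnel route dns svc-tunnel-1 explorer.d-bis.org
cloudflared tunnel route dns svc-tunnel-1 grafana.d-bis.org
# Ingress rules (hostname -> internal service) go in /etc/cloudflared/config.yml,
# then install the systemd service so the tunnel survives reboots:
cloudflared --config /etc/cloudflared/config.yml service install
```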
For detailed Cloudflare configuration guides, see:
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)**
- **[../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md](../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md)**
---
## VMID Allocation Registry
### Authoritative Registry Summary
> **Reference:** For complete VMID allocation registry with detailed breakdowns, see **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)**.
| VMID Range | Domain | Count | Notes |
|-----------:|--------|------:|-------|
| 1000-4999 | **Besu** | 4,000 | Validators, Sentries, RPC, Archive, Reserved |
| 5000-5099 | **Blockscout** | 100 | Explorer/Indexing |
| 5200-5299 | **Cacti** | 100 | Interop middleware |
| 5400-5599 | **CCIP** | 200 | Ops, Monitoring, Commit, Execute, RMN, Reserved |
| 6000-6099 | **Fabric** | 100 | Enterprise contracts |
| 6200-6299 | **FireFly** | 100 | Workflow/orchestration |
| 6400-7399 | **Indy** | 1,000 | Identity layer |
| 7800-8999 | **Sankofa/Phoenix/PanTel** | 1,200 | Service + Cloud + Telecom |
| 10000-13999 | **Phoenix Sovereign Cloud Band** | 4,000 | SMOM/ICCC/DBIS/AR tenants |
**Summary:**
- **Total Allocated**: 11,000 VMIDs (1000-13999)
- **Besu Network**: 4,000 VMIDs (1000-4999)
- **CCIP**: 200 VMIDs (5400-5599)
- **Sovereign Cloud Band**: 4,000 VMIDs (10000-13999)
See **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** for complete details.
See also **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#7-complete-vmid-and-network-allocation-table)** for VMID-to-VLAN mapping.
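The registry can also be enforced in tooling. A minimal bash sketch that maps a VMID to its allocation domain, with the ranges transcribed from the registry above (gaps report as unallocated):

```shell
#!/usr/bin/env bash
# Sketch: map a VMID to its allocation domain per the VMID registry.
vmid_domain() {
  local id=$1
  if   (( id >= 1000  && id <= 4999  )); then echo "Besu"
  elif (( id >= 5000  && id <= 5099  )); then echo "Blockscout"
  elif (( id >= 5200  && id <= 5299  )); then echo "Cacti"
  elif (( id >= 5400  && id <= 5599  )); then echo "CCIP"
  elif (( id >= 6000  && id <= 6099  )); then echo "Fabric"
  elif (( id >= 6200  && id <= 6299  )); then echo "FireFly"
  elif (( id >= 6400  && id <= 7399  )); then echo "Indy"
  elif (( id >= 7800  && id <= 8999  )); then echo "Sankofa/Phoenix/PanTel"
  elif (( id >= 10000 && id <= 13999 )); then echo "Phoenix Sovereign Cloud Band"
  else echo "UNALLOCATED"
  fi
}

vmid_domain 1002    # prints: Besu
vmid_domain 5400    # prints: CCIP
```

A check like this can gate provisioning scripts so a new guest cannot be created outside its domain's range.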
---
## Deployment Orchestration Workflow
### Deployment Workflow Diagram
```mermaid
flowchart TD
Start[Start Deployment] --> Phase0[Phase 0: Validate Foundation]
Phase0 --> Check1{Foundation Valid?}
Check1 -->|No| Fix1[Fix Issues]
Fix1 --> Phase0
Check1 -->|Yes| Phase1[Phase 1: Enable VLANs]
Phase1 --> Verify1{VLANs Working?}
Verify1 -->|No| FixVLAN[Fix VLAN Config]
FixVLAN --> Phase1
Verify1 -->|Yes| Phase2[Phase 2: Deploy Observability]
Phase2 --> Verify2{Monitoring Active?}
Verify2 -->|No| FixMonitor[Fix Monitoring]
FixMonitor --> Phase2
Verify2 -->|Yes| Phase3[Phase 3: Deploy CCIP Fleet]
Phase3 --> Verify3{CCIP Nodes Running?}
Verify3 -->|No| FixCCIP[Fix CCIP Config]
FixCCIP --> Phase3
Verify3 -->|Yes| Phase4[Phase 4: Deploy Sovereign Tenants]
Phase4 --> Verify4{Tenants Operational?}
Verify4 -->|No| FixTenants[Fix Tenant Config]
FixTenants --> Phase4
Verify4 -->|Yes| Complete[Deployment Complete]
```
### Phase 0 — Validate Foundation
1. ✅ Confirm ER605-A WAN1 static: **76.53.10.34/28**, GW **76.53.10.33**
### Network Operations
- **[ER605_ROUTER_CONFIGURATION.md](ER605_ROUTER_CONFIGURATION.md)** - Router configuration guide
- **[BESU_ALLOWLIST_RUNBOOK.md](BESU_ALLOWLIST_RUNBOOK.md)** - Besu allowlist management
- **[CLOUDFLARE_ZERO_TRUST_GUIDE.md](CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
- **[../04-configuration/ER605_ROUTER_CONFIGURATION.md](../04-configuration/ER605_ROUTER_CONFIGURATION.md)** - Router configuration guide
- **[../06-besu/BESU_ALLOWLIST_RUNBOOK.md](../06-besu/BESU_ALLOWLIST_RUNBOOK.md)** - Besu allowlist management
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
### Deployment Operations
### Troubleshooting
- **[../09-troubleshooting/TROUBLESHOOTING_FAQ.md](../09-troubleshooting/TROUBLESHOOTING_FAQ.md)** - Common issues and solutions
- **[../09-troubleshooting/QBFT_TROUBLESHOOTING.md](../09-troubleshooting/QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
---
## Related Documentation
### Prerequisites
- **[../01-getting-started/PREREQUISITES.md](../01-getting-started/PREREQUISITES.md)** - System requirements and prerequisites
- **[../03-deployment/DEPLOYMENT_READINESS.md](../03-deployment/DEPLOYMENT_READINESS.md)** - Pre-deployment validation checklist
### Architecture
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Complete network architecture (authoritative reference)
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory and specifications
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** ⭐⭐⭐ - VMID allocation registry
- **[DOMAIN_STRUCTURE.md](DOMAIN_STRUCTURE.md)** ⭐⭐ - Domain structure and DNS assignments
- **[CCIP_DEPLOYMENT_SPEC.md](../07-ccip/CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment specification
### Configuration
- **[../04-configuration/ER605_ROUTER_CONFIGURATION.md](../04-configuration/ER605_ROUTER_CONFIGURATION.md)** - Router configuration
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
### Operations
- **[../03-deployment/OPERATIONAL_RUNBOOKS.md](../03-deployment/OPERATIONAL_RUNBOOKS.md)** - Operational procedures
- **[../03-deployment/DEPLOYMENT_STATUS_CONSOLIDATED.md](../03-deployment/DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Deployment status
- **[../09-troubleshooting/TROUBLESHOOTING_FAQ.md](../09-troubleshooting/TROUBLESHOOTING_FAQ.md)** - Troubleshooting guide
### Best Practices
- **[../10-best-practices/RECOMMENDATIONS_AND_SUGGESTIONS.md](../10-best-practices/RECOMMENDATIONS_AND_SUGGESTIONS.md)** - Comprehensive recommendations
- **[../10-best-practices/IMPLEMENTATION_CHECKLIST.md](../10-best-practices/IMPLEMENTATION_CHECKLIST.md)** - Implementation checklist
### Reference
- **[MASTER_INDEX.md](MASTER_INDEX.md)** - Complete documentation index
---
**Document Status:** Complete (v1.1)
**Maintained By:** Infrastructure Team
**Review Cycle:** Monthly
**Last Updated:** 2025-01-20
---
## Change Log
### Version 1.1 (2025-01-20)
- Removed duplicate network architecture content
- Added references to NETWORK_ARCHITECTURE.md
- Added deployment workflow Mermaid diagram
- Added ASCII art process flow
- Added breadcrumb navigation
- Added status indicators
### Version 1.0 (2024-12-15)
- Initial version
- Complete deployment orchestration guide
---
# Proxmox Cluster Architecture
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation
---
## Overview
This document describes the Proxmox cluster architecture, including node configuration, storage setup, network bridges, and VM/container distribution.
---
## Cluster Architecture Diagram
```mermaid
graph TB
Cluster[Proxmox Cluster<br/>Name: h]
ML110[ML110 Management Node<br/>192.168.11.10<br/>6 cores, 125GB RAM]
R6301[R630-01<br/>192.168.11.11<br/>32 cores, 503GB RAM]
R6302[R630-02<br/>192.168.11.12<br/>32 cores, 503GB RAM]
R6303[R630-03<br/>192.168.11.13<br/>32 cores, 512GB RAM]
R6304[R630-04<br/>192.168.11.14<br/>32 cores, 512GB RAM]
Cluster --> ML110
Cluster --> R6301
Cluster --> R6302
Cluster --> R6303
Cluster --> R6304
ML110 --> Storage1[local: 94GB<br/>local-lvm: 813GB]
R6301 --> Storage2[local: 536GB<br/>local-lvm: Available]
R6302 --> Storage3[local: Available<br/>local-lvm: Available]
R6303 --> Storage4[Storage: Available]
R6304 --> Storage5[Storage: Available]
ML110 --> Bridge1[vmbr0<br/>VLAN-aware]
R6301 --> Bridge2[vmbr0<br/>VLAN-aware]
R6302 --> Bridge3[vmbr0<br/>VLAN-aware]
R6303 --> Bridge4[vmbr0<br/>VLAN-aware]
R6304 --> Bridge5[vmbr0<br/>VLAN-aware]
```
---
## Cluster Nodes
### Node Summary
| Hostname | IP Address | CPU | RAM | Storage | VMs/Containers | Status |
|----------|------------|-----|-----|---------|----------------|--------|
| ml110 | 192.168.11.10 | 6 cores @ 1.60GHz | 125GB | local (94GB), local-lvm (813GB) | 34 | ✅ Active |
| r630-01 | 192.168.11.11 | 32 cores @ 2.40GHz | 503GB | local (536GB), local-lvm (available) | 0 | ✅ Active |
| r630-02 | 192.168.11.12 | 32 cores @ 2.40GHz | 503GB | local (available), local-lvm (available) | 0 | ✅ Active |
| r630-03 | 192.168.11.13 | 32 cores | 512GB | Available | 0 | ✅ Active |
| r630-04 | 192.168.11.14 | 32 cores | 512GB | Available | 0 | ✅ Active |
---
## Storage Configuration
### Storage Types
**local (Directory Storage):**
- Type: Directory-based storage
- Used for: ISO images, container templates, backups
- Location: `/var/lib/vz`
**local-lvm (LVM Thin Storage):**
- Type: LVM thin provisioning
- Used for: VM/container disk images
- Benefits: Thin provisioning, snapshots, efficient space usage
### Storage by Node
**ml110:**
- `local`: 94GB total, 7.4GB used (7.87%)
- `local-lvm`: 813GB total, 214GB used (26.29%)
- Status: ✅ Active and operational
**r630-01:**
- `local`: 536GB total, 0% used
- `local-lvm`: Available (needs activation)
- Status: ⏳ Storage available, ready for use
**r630-02:**
- `local`: Available
- `local-lvm`: Available (needs activation)
- Status: ⏳ Storage available, ready for use
**r630-03/r630-04:**
- Storage: Available
- Status: ⏳ Ready for configuration
---
## Network Configuration
### Network Bridge (vmbr0)
**All nodes use VLAN-aware bridge:**
```bash
# Bridge configuration (all nodes)
auto vmbr0
iface vmbr0 inet static
address 192.168.11.<HOST_IP>/24
gateway 192.168.11.1
bridge-ports <PHYSICAL_INTERFACE>
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 11 110 111 112 120 121 130 132 133 134 140 141 150 160 200 201 202 203
```
**Bridge Features:**
- **VLAN-aware:** Supports multiple VLANs on single bridge
- **Native VLAN:** 11 (MGMT-LAN)
- **Tagged VLANs:** All service VLANs (110-203)
- **802.1Q Trunking:** Enabled for VLAN support
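A quick sanity check that a host's bridge matches this design (read-only `iproute2` commands, assumed available on the Proxmox host; VLAN 160 is just one example ID to spot-check):

```shell
# Verify vmbr0 is VLAN-aware and trunking the expected IDs (read-only checks).
ip -d link show vmbr0 | grep -o 'vlan_filtering [01]'    # 1 means VLAN-aware
bridge vlan show dev vmbr0                               # VIDs permitted on the bridge itself
bridge vlan show | grep -qw 160 || echo "VLAN 160 (SANKOFA-SVC) not trunked"
```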
---
## VM/Container Distribution
### Current Distribution
**ml110 (192.168.11.10):**
- **Total:** 34 containers/VMs
- **Services:** All current services running here
- **Breakdown:**
- Besu validators: 5 (VMIDs 1000-1004)
- Besu sentries: 4 (VMIDs 1500-1503)
- Besu RPC: 3+ (VMIDs 2500-2502+)
- Blockscout: 1 (VMID 5000)
- DBIS services: Multiple
- Other services: Various
**r630-01, r630-02, r630-03, r630-04:**
- **Total:** 0 containers/VMs
- **Status:** Ready for VM migration/deployment
---
## High Availability
### Current Setup
- **Cluster Name:** "h"
- **HA Mode:** Active/Standby (manual)
- **Quorum:** 3+ nodes required for quorum
- **Storage:** Local storage (not shared)
### HA Considerations
**Current Limitations:**
- No shared storage (each node has local storage)
- Manual VM migration required
- No automatic failover
**Future Enhancements:**
- Consider shared storage (NFS, Ceph, etc.) for true HA
- Implement automatic VM migration
- Configure HA groups for critical services
---
## Resource Allocation
### CPU Resources
| Node | CPU Cores | CPU Usage | Available |
|------|-----------|-----------|-----------|
| ml110 | 6 @ 1.60GHz | High | Limited |
| r630-01 | 32 @ 2.40GHz | Low | Excellent |
| r630-02 | 32 @ 2.40GHz | Low | Excellent |
| r630-03 | 32 cores | Low | Excellent |
| r630-04 | 32 cores | Low | Excellent |
### Memory Resources
| Node | Total RAM | Used | Available | Usage % |
|------|-----------|------|-----------|---------|
| ml110 | 125GB | 94GB | 31GB | 75% ⚠️ |
| r630-01 | 503GB | ~5GB | ~498GB | 1% ✅ |
| r630-02 | 503GB | ~5GB | ~498GB | 1% ✅ |
| r630-03 | 512GB | Low | High | Low ✅ |
| r630-04 | 512GB | Low | High | Low ✅ |
---
## Storage Recommendations
### For R630 Nodes
**Boot Drives (2×600GB):**
- **Recommended:** ZFS mirror or hardware RAID1
- **Purpose:** Proxmox OS and boot files
- **Benefits:** Redundancy, data integrity
**Data SSDs (6×250GB):**
- **Option 1:** ZFS striped mirrors (3 pairs)
- Capacity: ~750GB usable
- Performance: High
- Redundancy: Good
- **Option 2:** ZFS RAIDZ1 (5 drives + 1 parity)
- Capacity: ~1.25TB usable
- Performance: Good
- Redundancy: Single drive failure tolerance
- **Option 3:** ZFS RAIDZ2 (4 drives + 2 parity)
- Capacity: ~1TB usable
- Performance: Good
- Redundancy: Dual drive failure tolerance
---
## Network Recommendations
### VLAN Configuration
**All Proxmox hosts should:**
- Use VLAN-aware bridge (vmbr0)
- Support all 19 VLANs
- Maintain native VLAN 11 for management
- Enable 802.1Q trunking on physical interfaces
### Network Performance
- **Link Speed:** Ensure 1Gbps or higher for trunk ports
- **Jumbo Frames:** Consider enabling if supported
- **Bonding:** Consider link aggregation for redundancy
---
## Related Documentation
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Network architecture with VLAN plan
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
- **[PROXMOX_COMPREHENSIVE_REVIEW.md](PROXMOX_COMPREHENSIVE_REVIEW.md)** ⭐⭐ - Comprehensive Proxmox review
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration
---
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Review Cycle:** Quarterly
---
# Proxmox VE Comprehensive Configuration Review
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation
---
## Executive Summary
### ✅ Completed Tasks
- [x] Hostname migration (pve → r630-01, pve2 → r630-02)
- [x] IP address audit (no conflicts found)
- [x] Proxmox services verified (all operational)
- [x] Storage configuration reviewed
### ⚠️ Issues Identified
- r630-01 and r630-02 have LVM thin storage **disabled**
- All VMs/containers currently on ml110 only
- Storage not optimized for performance on r630-01/r630-02
---
## Hostname Migration - COMPLETE ✅
### Status
- **r630-01** (192.168.11.11): ✅ Hostname changed from `pve` to `r630-01`
- **r630-02** (192.168.11.12): ✅ Hostname changed from `pve2` to `r630-02`
### Verification
```bash
ssh root@192.168.11.11 "hostname" # Returns: r630-01 ✅
ssh root@192.168.11.12 "hostname" # Returns: r630-02 ✅
```
### Notes
- Both hosts are in a cluster (cluster name: "h")
- Cluster configuration may need update to reflect new hostnames
- /etc/hosts updated on both hosts for proper resolution
---
## IP Address Audit - COMPLETE ✅
### Results
- **Total VMs/Containers:** 34 with static IPs
- **IP Conflicts:** 0 ✅
- **Invalid IPs:** 0 ✅
- **DHCP IPs:** 2 (VMIDs 3500, 3501)
### All VMs Currently On
- **ml110** (192.168.11.10): All 34 VMs/containers
- **r630-01** (192.168.11.11): 0 VMs/containers
- **r630-02** (192.168.11.12): 0 VMs/containers
### IP Allocation Summary
| IP Range | Count | Purpose |
|----------|-------|---------|
| 192.168.11.57 | 1 | Firefly (stopped) |
| 192.168.11.60-63 | 4 | ML nodes |
| 192.168.11.64 | 1 | Indy |
| 192.168.11.80 | 1 | Cacti |
| 192.168.11.100-104 | 5 | Besu Validators |
| 192.168.11.105-106 | 2 | DBIS PostgreSQL |
| 192.168.11.112 | 1 | Fabric |
| 192.168.11.120 | 1 | DBIS Redis |
| 192.168.11.130 | 1 | DBIS Frontend |
| 192.168.11.150-154 | 5 | Besu Sentries |
| 192.168.11.155-156 | 2 | DBIS API |
| 192.168.11.201-204 | 4 | Named RPC |
| 192.168.11.240-242 | 3 | ThirdWeb RPC |
| 192.168.11.250-254 | 5 | Public RPC |
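The audit can be re-run at any time. A minimal sketch that lists `VMID IP` pairs from LXC configs and flags any IP used twice; `pct` is the Proxmox CLI (assumed present on the host), while the duplicate filter itself is plain GNU coreutils text processing:

```shell
#!/usr/bin/env bash
# Sketch: re-run the IP conflict audit on a Proxmox host.
# list_static_ips emits "VMID IP" pairs from every LXC config (pct is Proxmox's CLI).
list_static_ips() {
  for id in $(pct list | awk 'NR>1 {print $1}'); do
    pct config "$id" | grep -oP 'ip=\K[0-9.]+' | sed "s/^/$id /"
  done
}

# find_duplicate_ips prints every "VMID IP" line whose IP appears more than once.
find_duplicate_ips() { sort -k2,2 | uniq -D -f 1; }

command -v pct >/dev/null && list_static_ips | find_duplicate_ips || true
```

Empty output means no conflicts, matching the result recorded above.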
---
## Proxmox Host Configuration Review
### ml110 (192.168.11.10)
| Property | Value | Status |
|----------|-------|--------|
| **Hostname** | ml110 | ✅ Correct |
| **Proxmox Version** | 9.1.0 (kernel 6.17.4-1-pve) | ✅ Current |
| **CPU** | Intel Xeon E5-2603 v3 @ 1.60GHz (6 cores) | ⚠️ Older, slower |
| **Memory** | 125GB total, 94GB used, 31GB available | ⚠️ High usage |
| **Storage - local** | 94GB total, 7.4GB used (7.87%) | ✅ Good |
| **Storage - local-lvm** | 813GB total, 214GB used (26.29%) | ✅ Active |
| **VMs/Containers** | 34 total | ✅ All here |
**Storage Details:**
- `local`: Directory storage, active, 94GB available
- `local-lvm`: LVM thin, active, 600GB available
- `thin1-thin6`: Configured but disabled (not in use)
**Recommendations:**
- ⚠️ **CPU is older/slower** - Consider workload distribution
- ⚠️ **Memory usage high (75%)** - Monitor closely
- ✅ **Storage well configured** - LVM thin active and working
### r630-01 (192.168.11.11) - Previously "pve"
| Property | Value | Status |
|----------|-------|--------|
| **Hostname** | r630-01 | ✅ Migrated |
| **Proxmox Version** | 9.1.0 (kernel 6.17.4-1-pve) | ✅ Current |
| **CPU** | Intel Xeon E5-2630 v3 @ 2.40GHz (32 cores) | ✅ Good |
| **Memory** | 503GB total, 6.4GB used, 497GB available | ✅ Excellent |
| **Storage - local** | 536GB total, 0.1GB used (0.00%) | ✅ Available |
| **Storage - local-lvm** | **DISABLED** | ⚠️ **Issue** |
| **Storage - thin1-thin6** | **DISABLED** | ⚠️ **Issue** |
| **VMs/Containers** | 0 | ⏳ Ready for deployment |
**Storage Details:**
- **Volume Group:** `pve` exists with 2 physical volumes
- **Thin Pools:** `data` (200GB) and `thin1` (208GB) exist
- **Disks:** 4 disks (sda, sdb: 558GB each; sdc, sdd: 232GB each)
- **LVM Setup:** Properly configured
- **Storage Config Issue:** Storage configured but node references point to "pve" (old hostname) or "pve2"
**Issues:**
- ⚠️ **Storage configured but node references outdated** - Points to "pve" instead of "r630-01"
- ⚠️ **Storage may show as disabled** - Due to hostname mismatch in config
- ⚠️ **Need to update storage.cfg** - Update node references to r630-01
**Recommendations:**
- 🔴 **CRITICAL:** Enable local-lvm storage to use existing LVM thin pools
- 🔴 **CRITICAL:** Activate thin1 storage for better performance
- ✅ **Ready for VMs** - Excellent resources available
### r630-02 (192.168.11.12) - Previously "pve2"
| Property | Value | Status |
|----------|-------|--------|
| **Hostname** | r630-02 | ✅ Migrated |
| **Proxmox Version** | 9.1.0 (kernel 6.17.4-1-pve) | ✅ Current |
| **CPU** | Intel Xeon E5-2660 v4 @ 2.00GHz (56 cores) | ✅ Excellent |
| **Memory** | 251GB total, 4.4GB used, 247GB available | ✅ Excellent |
| **Storage - local** | 220GB total, 0.1GB used (0.06%) | ✅ Available |
| **Storage - local-lvm** | **DISABLED** | ⚠️ **Issue** |
| **Storage - thin1-thin6** | **DISABLED** | ⚠️ **Issue** |
| **VMs/Containers** | 0 | ⏳ Ready for deployment |
**Storage Details:**
- Need to check LVM configuration (command timed out)
- Storage shows as disabled in Proxmox
**Issues:**
- ⚠️ **Storage configured but node references outdated** - Points to "pve2" instead of "r630-02"
- ⚠️ **VMs already exist on storage** - Need to verify they're accessible
- ⚠️ **Need to update storage.cfg** - Update node references to r630-02
**Recommendations:**
- 🔴 **CRITICAL:** Check and configure LVM storage
- 🔴 **CRITICAL:** Enable local-lvm or thin storage
- ✅ **Ready for VMs** - Excellent resources available
---
## Storage Configuration Analysis
### Current Storage Status
| Host | Storage Type | Status | Size | Usage | Recommendation |
|------|--------------|--------|------|-------|----------------|
| **ml110** | local | ✅ Active | 94GB | 7.87% | ✅ Good |
| **ml110** | local-lvm | ✅ Active | 813GB | 26.29% | ✅ Good |
| **r630-01** | local | ✅ Active | 536GB | 0.00% | ✅ Ready |
| **r630-01** | local-lvm | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |
| **r630-01** | thin1 | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |
| **r630-02** | local | ✅ Active | 220GB | 0.06% | ✅ Ready |
| **r630-02** | local-lvm | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |
| **r630-02** | thin1-thin6 | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |
### Storage Issues
#### r630-01 Storage Issue
**Problem:** LVM thin pools exist (`data` 200GB, `thin1` 208GB) but Proxmox storage is disabled
**Root Cause:** Storage configured in Proxmox but not activated/enabled
**Solution:**
```bash
# Update storage.cfg node references on r630-01
ssh root@192.168.11.11
# Update node references from "pve" to "r630-01"
sed -i 's/nodes pve$/nodes r630-01/' /etc/pve/storage.cfg
sed -i 's/nodes pve /nodes r630-01 /' /etc/pve/storage.cfg
# Enable storage
pvesm set local-lvm --disable 0 2>/dev/null || true
pvesm set thin1 --disable 0 2>/dev/null || true
```
#### r630-02 Storage Issue
**Problem:** Storage disabled, LVM configuration unknown
**Solution:**
```bash
# Update storage.cfg node references on r630-02
ssh root@192.168.11.12
# Update node references from "pve2" to "r630-02"
sed -i 's/nodes pve2$/nodes r630-02/' /etc/pve/storage.cfg
sed -i 's/nodes pve2 /nodes r630-02 /' /etc/pve/storage.cfg
# Enable all thin storage pools
for storage in thin1 thin2 thin3 thin4 thin5 thin6; do
pvesm set "$storage" --disable 0 2>/dev/null || true
done
```
---
## Critical Recommendations
### 1. Enable LVM Thin Storage on r630-01 and r630-02 🔴 CRITICAL
**Priority:** HIGH
**Impact:** Cannot migrate VMs or create new VMs with optimal storage
**Action Required:**
1. Enable `local-lvm` storage on both hosts
2. Activate `thin1` storage pools if they exist
3. Verify storage is accessible and working
**Script Available:** `scripts/enable-local-lvm-storage.sh` (may need updates)
### 2. Distribute VMs Across Hosts ⚠️ RECOMMENDED
**Current State:** All 34 VMs on ml110 (overloaded)
**Recommendation:**
- Migrate some VMs to r630-01 and r630-02
- Balance workload across all three hosts
- Use r630-01/r630-02 for new deployments
**Benefits:**
- Better resource utilization
- Improved performance (ml110 CPU is slower)
- Better redundancy
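With storage enabled on the R630s, rebalancing is a per-guest migration. The VMIDs below are examples only; because storage is local rather than shared, disks are copied during the move, containers restart briefly, and VMs can migrate live:

```shell
# Examples only: move a container (restart migration) and a VM (live) to r630-01.
pct migrate 2500 r630-01 --restart                     # LXC: brief restart on the target
qm migrate 5000 r630-01 --online --with-local-disks    # VM: live, copying local disks
```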
### 3. Update Cluster Configuration ⚠️ RECOMMENDED
**Issue:** Hostnames changed but cluster may still reference old names
**Action:**
```bash
# Check cluster configuration
pvecm status
pvecm nodes
# Update if needed (may require cluster reconfiguration)
```
### 4. Storage Performance Optimization ⚠️ RECOMMENDED
**Current:**
- ml110: Using local-lvm (good)
- r630-01: Only local (directory) available (slower)
- r630-02: Only local (directory) available (slower)
**Recommendation:**
- Enable LVM thin storage on r630-01/r630-02 for better performance
- Use thin provisioning for space efficiency
- Monitor storage usage
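Once a thin pool exists, registering it in Proxmox is a single stanza in `/etc/pve/storage.cfg` (or the equivalent `pvesm add lvmthin` call). The names below (storage `thin1`, VG `pve`, pool `data`) are assumptions; substitute the real ones reported by `vgs`/`lvs`:

```
lvmthin: thin1
        thinpool data
        vgname pve
        content rootdir,images
        nodes r630-01,r630-02
```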
### 5. Resource Monitoring ⚠️ RECOMMENDED
**ml110:**
- Memory usage: 75% (high) - Monitor closely
- CPU: Older/slower - Consider workload reduction
**r630-01/r630-02:**
- Excellent resources available
- Ready for heavy workloads
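A quick way to watch ml110's memory is to compute the used percentage from `free`. The sketch below runs against a fabricated sample line so the arithmetic is visible; on the host, replace the sample with the real `free | grep '^Mem:'` output:

```bash
# Fabricated sample of `free` output (total, used, free); on the host:
#   sample=$(free | grep '^Mem:')
sample='Mem:  16282412 12211809  4070603'
used_pct=$(printf '%s\n' "$sample" | awk '/^Mem:/ { printf "%d", $3 * 100 / $2 }')
if [ "$used_pct" -ge 75 ]; then
  echo "WARNING: memory at ${used_pct}%, consider moving VMs off this host"
else
  echo "OK: memory at ${used_pct}%"
fi
```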
---
## Detailed Recommendations by Category
### Storage Recommendations
#### Immediate Actions
1. **Enable local-lvm on r630-01**
- LVM thin pools already exist
- Just need to activate in Proxmox
- Will enable efficient storage for VMs
2. **Configure storage on r630-02**
- Check LVM configuration
- Enable appropriate storage type
- Ensure compatibility with cluster
3. **Verify storage after enabling**
- Test VM creation
- Test storage migration
- Monitor performance
#### Long-term Actions
1. **Implement storage monitoring**
- Set up alerts for storage usage >80%
- Monitor thin pool usage
- Track storage growth trends
2. **Consider shared storage**
- For easier VM migration
- For better redundancy
- NFS or Ceph options
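For the >80% storage alert, `lvs` already reports thin pool usage as `data_percent`. The sketch below filters a fabricated sample; on a host, the pipeline input would be the real `lvs --noheadings -o lv_name,data_percent` output:

```bash
threshold=80
# Fabricated lvs output: pool name and data_percent
sample='  thin1  82.14
  thin2  41.00'
printf '%s\n' "$sample" | awk -v t="$threshold" '
  NF == 2 && $2 + 0 > t { printf "ALERT: thin pool %s at %s%% used\n", $1, $2 }'
```

Dropped into cron with a mail or webhook action, this covers the alerting recommendation above.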
### Network Recommendations
#### Current Status
- All hosts on 192.168.11.0/24 network
- Flat network (no VLANs yet)
- Gateway: 192.168.11.1 (ER605-1)
#### Recommendations
1. **VLAN Migration** (Planned)
- Segment network by service type
- Improve security and isolation
- Better traffic management
2. **Network Monitoring**
- Monitor bandwidth usage
- Track network performance
- Alert on network issues
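Bandwidth on the bridge can be estimated from two `/proc/net/dev` readings taken one second apart. The two samples below are fabricated for illustration; on a host you would `grep vmbr0: /proc/net/dev`, `sleep 1`, and read again:

```bash
# rx_bytes is the first counter after the interface name (field 2)
s1='vmbr0: 1000000 800 0 0 0 0 0 0 500000 600 0 0 0 0 0 0'
s2='vmbr0: 1125000 900 0 0 0 0 0 0 562500 700 0 0 0 0 0 0'
rx1=$(echo "$s1" | awk '{print $2}')
rx2=$(echo "$s2" | awk '{print $2}')
echo "vmbr0 rx: $((rx2 - rx1)) bytes/s"
```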
### Cluster Recommendations
#### Current Status
- Cluster name: "h"
- 3 nodes: ml110, r630-01, r630-02
- Cluster operational
#### Recommendations
1. **Update Cluster Configuration**
- Verify hostname changes reflected in cluster
- Update any references to old hostnames
- Test cluster operations
2. **Cluster Quorum**
- Ensure quorum is maintained
- Monitor cluster health
- Document cluster procedures
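A lightweight quorum check just greps `pvecm status`. The sample below stands in for the real command output:

```bash
# Sample excerpt; on a node, use: status=$(pvecm status)
status='Quorum information
------------------
Quorate:          Yes'
if printf '%s\n' "$status" | grep -q '^Quorate:[[:space:]]*Yes'; then
  echo "cluster quorate"
else
  echo "ALERT: quorum lost"
fi
```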
### Performance Recommendations
#### ml110
- **CPU:** Older/slower - Consider reducing workload
- **Memory:** High usage - Monitor and optimize
- **Storage:** Well configured - No changes needed
#### r630-01
- **CPU:** Good performance - Ready for workloads
- **Memory:** Excellent - Can handle many VMs
- **Storage:** Needs activation - Critical fix needed
#### r630-02
- **CPU:** Excellent (56 cores) - Best performance
- **Memory:** Excellent - Can handle many VMs
- **Storage:** Needs configuration - Critical fix needed
---
## Action Items
### Critical (Do Before Starting VMs)
1. ✅ **Hostname Migration** - COMPLETE
2. ✅ **IP Address Audit** - COMPLETE
3. 🔴 **Enable local-lvm storage on r630-01** - PENDING
4. 🔴 **Configure storage on r630-02** - PENDING
5. ⚠️ **Verify cluster configuration** - PENDING
### High Priority
1. ⚠️ **Test VM creation on r630-01/r630-02** - After storage enabled
2. ⚠️ **Update cluster configuration** - Verify hostname changes
3. ⚠️ **Plan VM distribution** - Balance workload across hosts
### Medium Priority
1. ⚠️ **Implement storage monitoring** - Set up alerts
2. ⚠️ **Document storage procedures** - For future reference
3. ⚠️ **Plan VLAN migration** - Network segmentation
---
## Verification Checklist
### Hostname Verification
- [x] r630-01 hostname correct
- [x] r630-02 hostname correct
- [x] /etc/hosts updated on both hosts
- [ ] Cluster configuration updated (if needed)
### IP Address Verification
- [x] No conflicts detected
- [x] No invalid IPs
- [x] All IPs documented
- [x] IP audit script working
### Storage Verification
- [x] ml110 storage working
- [ ] r630-01 local-lvm enabled
- [ ] r630-02 storage configured
- [ ] Storage tested and working
### Service Verification
- [x] All Proxmox services running
- [x] Web interfaces accessible
- [x] Cluster operational
- [ ] Storage accessible
---
## Next Steps
### Immediate (Before Starting VMs)
1. **Enable Storage on r630-01:**
```bash
ssh root@192.168.11.11
# Check current storage config
cat /etc/pve/storage.cfg
# Enable local-lvm
pvesm set local-lvm --disable 0
# Or reconfigure if needed
```
2. **Configure Storage on r630-02:**
```bash
ssh root@192.168.11.12
# Check LVM setup
vgs
lvs
# Configure appropriate storage
```
3. **Verify Storage:**
```bash
# On each host
pvesm status
# Should show local-lvm as active
```
### After Storage is Enabled
1. **Test VM Creation:**
- Create test container on r630-01
- Create test container on r630-02
- Verify storage works correctly
2. **Start VMs:**
- All IPs verified, no conflicts
- Hostnames correct
- Storage ready
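The storage test in step 1 can be scripted as a create-and-destroy round trip. The sketch below only echoes the commands; VMID 999, the Debian template filename, and storage `local-lvm` are assumptions, so confirm them (and drop the `echo` prefix) before running on a node:

```bash
for node in r630-01 r630-02; do
  echo "# on $node:"
  echo "pct create 999 local:vztmpl/debian-12-standard_12.7-1_amd64.tar.zst --rootfs local-lvm:4 --hostname storage-test"
  echo "pct start 999 && pct status 999"
  echo "pct stop 999 && pct destroy 999"
done
```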
---
## Scripts Available
1. **`scripts/check-all-vm-ips.sh`** - ✅ Working - IP audit
2. **`scripts/migrate-hostnames-proxmox.sh`** - ✅ Complete - Hostname migration
3. **`scripts/diagnose-proxmox-hosts.sh`** - ✅ Working - Diagnostics
4. **`scripts/enable-local-lvm-storage.sh`** - ⏳ May need updates for r630-01/r630-02
---
## Related Documentation
### Architecture Documents
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Network architecture
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration
### Deployment Documents
- **[../03-deployment/PRE_START_CHECKLIST.md](../03-deployment/PRE_START_CHECKLIST.md)** - Pre-start checklist
- **[../03-deployment/LVM_THIN_PVE_ENABLED.md](../03-deployment/LVM_THIN_PVE_ENABLED.md)** - LVM thin storage setup
- **[../09-troubleshooting/STORAGE_MIGRATION_ISSUE.md](../09-troubleshooting/STORAGE_MIGRATION_ISSUE.md)** - Storage migration troubleshooting
---
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Review Cycle:** Quarterly


# Final VMID Allocation Plan
**Updated**: Complete sovereign-scale allocation with all domains
**Navigation:** [Home](../README.md) > [Architecture](README.md) > VMID Allocation
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** 🟢 Active Documentation
---
## Complete VMID Allocation Table