Complete markdown files cleanup and organization
- Organized 252 files across project
- Root directory: 187 → 2 files (98.9% reduction)
- Moved configuration guides to docs/04-configuration/
- Moved troubleshooting guides to docs/09-troubleshooting/
- Moved quick start guides to docs/01-getting-started/
- Moved reports to reports/ directory
- Archived temporary files
- Generated comprehensive reports and documentation
- Created maintenance scripts and guides

All files organized according to established standards.
547
docs/02-architecture/COMPREHENSIVE_INFRASTRUCTURE_REVIEW.md
Normal file
@@ -0,0 +1,547 @@
|
||||
# Comprehensive Infrastructure Review
|
||||
|
||||
**Last Updated:** 2025-12-27
|
||||
**Document Version:** 1.0
|
||||
**Status:** Active Documentation
|
||||
**Review Scope:** All Tunnels, DNS Entries, Nginx Configurations, VMIDs
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
This document provides a comprehensive review of:
|
||||
- ✅ All Cloudflare Tunnels
|
||||
- ✅ All DNS Entries
|
||||
- ✅ All Nginx Configurations
|
||||
- ✅ All VMIDs and Services
|
||||
- ✅ Recommendations for Optimization
|
||||
|
||||
---
|
||||
|
||||
## 1. Cloudflare Tunnels Review
|
||||
|
||||
### Active Tunnels
|
||||
|
||||
| Tunnel Name | Tunnel ID | Status | Location | Purpose |
|-------------|-----------|--------|----------|---------|
| `explorer.d-bis.org` | `b02fe1fe-cb7d-484e-909b-7cc41298ebe8` | ✅ HEALTHY | VMID 102 | Explorer/Blockscout |
| `rpc-http-pub.d-bis.org` | `10ab22da-8ea3-4e2e-a896-27ece2211a05` | ⚠️ DOWN | VMID 102 | RPC Services (needs config) |
| `mim4u-tunnel` | `f8d06879-04f8-44ef-aeda-ce84564a1792` | ✅ HEALTHY | Unknown | Miracles In Motion |
| `tunnel-ml110` | `ccd7150a-9881-4b8c-a105-9b4ead6e69a2` | ✅ HEALTHY | Unknown | Proxmox Host Access |
| `tunnel-r630-01` | `4481af8f-b24c-4cd3-bdd5-f562f4c97df4` | ✅ HEALTHY | Unknown | Proxmox Host Access |
| `tunnel-r630-02` | `0876f12b-64d7-4927-9ab3-94cb6cf48af9` | ✅ HEALTHY | Unknown | Proxmox Host Access |
|
||||
|
||||
### Current Tunnel Configuration (VMID 102)
|
||||
|
||||
**Active Tunnel**: `rpc-http-pub.d-bis.org` (Tunnel ID: `10ab22da-8ea3-4e2e-a896-27ece2211a05`)
|
||||
|
||||
**Current Routing** (from logs):
|
||||
- `rpc-ws-pub.d-bis.org` → `https://192.168.11.252:443`
|
||||
- `rpc-http-prv.d-bis.org` → `https://192.168.11.251:443`
|
||||
- `rpc-ws-prv.d-bis.org` → `https://192.168.11.251:443`
|
||||
- `rpc-http-pub.d-bis.org` → `https://192.168.11.252:443`
|
||||
|
||||
**⚠️ Issue**: Tunnel is routing directly to RPC nodes instead of central Nginx
|
||||
|
||||
**✅ Recommended Configuration**:
|
||||
- All HTTP endpoints → `http://192.168.11.21:80` (Central Nginx)
|
||||
- WebSocket endpoints → Direct to RPC nodes (as configured)
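
The recommended split can be sketched as a small shell helper. This is an illustration only: the function name `route_for_host` is hypothetical, the backend addresses are copied from the routing list above, and the real routing is set in the Cloudflare dashboard rather than in a script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the backend a tunnel hostname should route to
# under the recommended configuration. Addresses are taken from this review.
route_for_host() {
  local host="$1"
  case "$host" in
    # WebSocket endpoints keep routing directly to the RPC nodes.
    rpc-ws-pub.d-bis.org) echo "https://192.168.11.252:443" ;;
    rpc-ws-prv.d-bis.org) echo "https://192.168.11.251:443" ;;
    # Every other (HTTP) hostname goes through central Nginx.
    *)                    echo "http://192.168.11.21:80" ;;
  esac
}
```

For example, `route_for_host rpc-http-pub.d-bis.org` prints the central Nginx address, while `route_for_host rpc-ws-pub.d-bis.org` prints the public RPC node.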
|
||||
|
||||
---
|
||||
|
||||
## 2. DNS Entries Review
|
||||
|
||||
### Current DNS Records (from d-bis.org zone file)
|
||||
|
||||
#### A Records (Direct IPs)
|
||||
|
||||
| Domain | IP Address(es) | Proxy Status | Notes |
|--------|----------------|--------------|-------|
| `api.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `besu.d-bis.org` | 20.215.32.42, 70.153.83.83 | ✅ Proxied | **DUPLICATE** - Remove one |
| `blockscout.d-bis.org` | 20.215.32.42, 70.153.83.83 | ✅ Proxied | **DUPLICATE** - Remove one |
| `d-bis.org` (root) | 20.215.32.42, 20.215.32.15 | ✅ Proxied | **DUPLICATE** - Remove one |
| `docs.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `explorer.d-bis.org` | 20.215.32.42, 70.153.83.83 | ✅ Proxied | **DUPLICATE** - Remove one |
| `grafana.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `metrics.d-bis.org` | 70.153.83.83 | ❌ Not Proxied | Should use tunnel |
| `monitoring.d-bis.org` | 70.153.83.83 | ✅ Proxied | Should use tunnel |
| `prometheus.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `tessera.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `wallet.d-bis.org` | 70.153.83.83 | ✅ Proxied | Should use tunnel |
| `ws.d-bis.org` | 20.8.47.226 | ❌ Not Proxied | Should use tunnel |
| `www.d-bis.org` | 20.8.47.226 | ✅ Proxied | Should use tunnel |
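
Names with more than one A record, such as those flagged as duplicates in this table, can be found mechanically. The sketch below assumes the zone has been exported to plain `<name> <ip>` lines, one A record per line; that input format and the function name `find_duplicate_a_records` are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "<name> <ip>" pairs on stdin (one A record per
# line) and print each name that has more than one A record, with its count.
find_duplicate_a_records() {
  awk 'NF >= 2 { count[$1]++ }
       END { for (name in count) if (count[name] > 1) print name, count[name] }' | sort
}
```

Piping the exported records into `find_duplicate_a_records` lists only the names that need cleanup; names with a single A record are not printed.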
|
||||
|
||||
#### CNAME Records (Existing)
|
||||
|
||||
| Domain | Target | Proxy Status | Notes |
|--------|--------|--------------|-------|
| `rpc.d-bis.org` | `dbis138fdendpoint-cgergbcqb7aca7at.a03.azurefd.net` | ✅ Proxied | Azure Front Door |
| `ipfs.d-bis.org` | `ipfs.cloudflare.com` | ✅ Proxied | Cloudflare IPFS |
|
||||
|
||||
#### Missing DNS Records (Should Exist)
|
||||
|
||||
| Domain | Type | Target | Status |
|--------|------|--------|--------|
| `rpc-http-pub.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `rpc-ws-pub.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `rpc-http-prv.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `rpc-ws-prv.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `dbis-admin.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `dbis-api.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `dbis-api-2.d-bis.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `mim4u.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
| `www.mim4u.org` | CNAME | `<tunnel-id>.cfargotunnel.com` | ❌ Missing |
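
As a rough sketch, the missing records can be generated from a tunnel ID rather than typed by hand. The function names below are hypothetical, and the tunnel ID is left as a parameter because this review does not state which tunnel each hostname should use.

```bash
#!/usr/bin/env bash
# Build the CNAME target for a Cloudflare tunnel from its ID.
tunnel_cname_target() {
  local tunnel_id="$1"
  echo "${tunnel_id}.cfargotunnel.com"
}

# Hostnames listed as missing in this review.
missing_hosts() {
  printf '%s\n' \
    rpc-http-pub.d-bis.org rpc-ws-pub.d-bis.org \
    rpc-http-prv.d-bis.org rpc-ws-prv.d-bis.org \
    dbis-admin.d-bis.org dbis-api.d-bis.org dbis-api-2.d-bis.org \
    mim4u.org www.mim4u.org
}

# Print one "<host> CNAME <target>" line per missing record.
print_missing_cnames() {
  local tunnel_id="$1"
  missing_hosts | while read -r host; do
    echo "${host} CNAME $(tunnel_cname_target "$tunnel_id")"
  done
}
```

Calling `print_missing_cnames <tunnel-id>` produces a nine-line plan that can be reviewed before any record is created.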
|
||||
|
||||
---
|
||||
|
||||
## 3. Nginx Configurations Review
|
||||
|
||||
### Central Nginx (VMID 105 - 192.168.11.21)
|
||||
|
||||
**Status**: ✅ Configured
|
||||
**Configuration**: `/data/nginx/custom/http.conf`
|
||||
**Type**: Nginx Proxy Manager (OpenResty)
|
||||
|
||||
**Configured Services**:
|
||||
- ✅ `explorer.d-bis.org` → `http://192.168.11.140:80`
|
||||
- ✅ `rpc-http-pub.d-bis.org` → `https://192.168.11.252:443`
|
||||
- ✅ `rpc-http-prv.d-bis.org` → `https://192.168.11.251:443`
|
||||
- ✅ `dbis-admin.d-bis.org` → `http://192.168.11.130:80`
|
||||
- ✅ `dbis-api.d-bis.org` → `http://192.168.11.150:3000`
|
||||
- ✅ `dbis-api-2.d-bis.org` → `http://192.168.11.151:3000`
|
||||
- ✅ `mim4u.org` → `http://192.168.11.19:80`
|
||||
- ✅ `www.mim4u.org` → `301 Redirect` → `mim4u.org`
|
||||
|
||||
**Note**: WebSocket endpoints (`rpc-ws-*`) are NOT in this config (routing directly)
|
||||
|
||||
### Blockscout Nginx (VMID 5000 - 192.168.11.140)
|
||||
|
||||
**Status**: ✅ Running
|
||||
**Configuration**: `/etc/nginx/sites-available/blockscout`
|
||||
**Purpose**: Local Nginx for Blockscout service
|
||||
|
||||
**Ports**:
|
||||
- Port 80: HTTP (redirects to HTTPS or serves content)
|
||||
- Port 443: HTTPS (proxies to Blockscout on port 4000)
|
||||
|
||||
### Miracles In Motion Nginx (VMID 7810 - 192.168.11.19)
|
||||
|
||||
**Status**: ✅ Running
|
||||
**Configuration**: `/etc/nginx/sites-available/default`
|
||||
**Purpose**: Web frontend and API proxy
|
||||
|
||||
**Ports**:
|
||||
- Port 80: HTTP (serves static files, proxies API to 192.168.11.8:3001)
|
||||
|
||||
### DBIS Frontend Nginx (VMID 10130 - 192.168.11.130)
|
||||
|
||||
**Status**: ✅ Running (assumed)
|
||||
**Purpose**: Frontend admin console
|
||||
|
||||
### RPC Nodes Nginx (VMIDs 2500, 2501, 2502)
|
||||
|
||||
**Status**: ⚠️ Partially Configured
|
||||
**Purpose**: SSL termination and local routing
|
||||
|
||||
**VMID 2500** (192.168.11.250):
|
||||
- Port 443: HTTPS RPC → `127.0.0.1:8545`
|
||||
- Port 8443: HTTPS WebSocket → `127.0.0.1:8546`
|
||||
|
||||
**VMID 2501** (192.168.11.251):
|
||||
- Port 443: HTTPS RPC → `127.0.0.1:8545`
|
||||
- Port 443: HTTPS WebSocket → `127.0.0.1:8546` (SNI-based)
|
||||
|
||||
**VMID 2502** (192.168.11.252):
|
||||
- Port 443: HTTPS RPC → `127.0.0.1:8545`
|
||||
- Port 443: HTTPS WebSocket → `127.0.0.1:8546` (SNI-based)
|
||||
|
||||
---
|
||||
|
||||
## 4. VMIDs Review
|
||||
|
||||
### Infrastructure Services
|
||||
|
||||
| VMID | Name | IP | Status | Purpose |
|------|------|----|--------|---------|
| 100 | proxmox-mail-gateway | 192.168.11.32 | ✅ Running | Mail gateway |
| 101 | proxmox-datacenter-manager | 192.168.11.33 | ✅ Running | Datacenter management |
| 102 | cloudflared | 192.168.11.34 | ✅ Running | Cloudflare tunnel client |
| 103 | omada | 192.168.11.30 | ✅ Running | Network management |
| 104 | gitea | 192.168.11.31 | ✅ Running | Git repository |
| 105 | nginxproxymanager | 192.168.11.26 | ✅ Running | Central Nginx reverse proxy |
| 130 | monitoring-1 | 192.168.11.27 | ✅ Running | Monitoring stack |
|
||||
|
||||
### Blockchain Services
|
||||
|
||||
| VMID | Name | IP | Status | Purpose | Notes |
|------|------|----|--------|---------|-------|
| 5000 | blockscout-1 | 192.168.11.140 | ✅ Running | Blockchain explorer | Has local Nginx |
| 6200 | firefly-1 | 192.168.11.7 | ✅ Running | Hyperledger Firefly | Web3 gateway |
|
||||
|
||||
### RPC Nodes
|
||||
|
||||
| VMID | Name | IP | Status | Purpose | Notes |
|------|------|----|--------|---------|-------|
| 2500 | besu-rpc-1 | 192.168.11.250 | ✅ Running | Core RPC | Located on ml110 (192.168.11.10) |
| 2501 | besu-rpc-2 | 192.168.11.251 | ✅ Running | Permissioned RPC | Located on ml110 (192.168.11.10) |
| 2502 | besu-rpc-3 | 192.168.11.252 | ✅ Running | Public RPC | Located on ml110 (192.168.11.10) |
|
||||
|
||||
**✅ Status**: RPC nodes are running on ml110 (192.168.11.10), not on pve2.
|
||||
|
||||
### Application Services
|
||||
|
||||
| VMID | Name | IP | Status | Purpose |
|------|------|----|--------|---------|
| 7800 | sankofa-api-1 | 192.168.11.13 | ✅ Running | Sankofa API |
| 7801 | sankofa-portal-1 | 192.168.11.16 | ✅ Running | Sankofa Portal |
| 7802 | sankofa-keycloak-1 | 192.168.11.17 | ✅ Running | Sankofa Keycloak |
| 7810 | mim-web-1 | 192.168.11.19 | ✅ Running | Miracles In Motion Web |
| 7811 | mim-api-1 | 192.168.11.8 | ✅ Running | Miracles In Motion API |
|
||||
|
||||
### DBIS Core Services
|
||||
|
||||
| VMID | Name | IP | Status | Purpose | Notes |
|------|------|----|--------|---------|-------|
| 10100 | dbis-postgres-primary | 192.168.11.100 | ✅ Running | PostgreSQL Primary | Located on ml110 (192.168.11.10) |
| 10101 | dbis-postgres-replica-1 | 192.168.11.101 | ✅ Running | PostgreSQL Replica | Located on ml110 (192.168.11.10) |
| 10120 | dbis-redis | 192.168.11.120 | ✅ Running | Redis Cache | Located on ml110 (192.168.11.10) |
| 10130 | dbis-frontend | 192.168.11.130 | ✅ Running | Frontend Admin | Located on ml110 (192.168.11.10) |
| 10150 | dbis-api-primary | 192.168.11.150 | ✅ Running | API Primary | Located on ml110 (192.168.11.10) |
| 10151 | dbis-api-secondary | 192.168.11.151 | ✅ Running | API Secondary | Located on ml110 (192.168.11.10) |
|
||||
|
||||
**✅ Status**: DBIS Core containers are running on ml110 (192.168.11.10), not on pve2.
|
||||
|
||||
---
|
||||
|
||||
## 5. Critical Issues Identified
|
||||
|
||||
### 🔴 High Priority
|
||||
|
||||
1. **Tunnel Configuration Mismatch**
|
||||
- Tunnel `rpc-http-pub.d-bis.org` is DOWN
|
||||
- Currently routing directly to RPC nodes instead of central Nginx
|
||||
- **Action**: Update Cloudflare dashboard to route HTTP endpoints to `http://192.168.11.21:80`
|
||||
|
||||
2. **Missing DNS Records**
|
||||
- RPC endpoints (`rpc-http-pub`, `rpc-ws-pub`, `rpc-http-prv`, `rpc-ws-prv`) missing CNAME records
|
||||
- DBIS services (`dbis-admin`, `dbis-api`, `dbis-api-2`) missing CNAME records
|
||||
- `mim4u.org` and `www.mim4u.org` missing CNAME records
|
||||
- **Action**: Create CNAME records pointing to tunnel
|
||||
|
||||
3. **Duplicate DNS A Records**
|
||||
- `besu.d-bis.org`: 2 A records (20.215.32.42, 70.153.83.83)
|
||||
- `blockscout.d-bis.org`: 2 A records (20.215.32.42, 70.153.83.83)
|
||||
- `explorer.d-bis.org`: 2 A records (20.215.32.42, 70.153.83.83)
|
||||
- `d-bis.org`: 2 A records (20.215.32.42, 20.215.32.15)
|
||||
- **Action**: Remove duplicate records, keep single authoritative IP
|
||||
|
||||
4. **RPC Nodes Location**
|
||||
- ✅ VMIDs 2500, 2501, 2502 found on ml110 (192.168.11.10)
|
||||
- **Action**: Verify network connectivity from pve2 to ml110
|
||||
|
||||
5. **DBIS Core Services Location**
|
||||
- ✅ VMIDs 10100-10151 found on ml110 (192.168.11.10)
|
||||
- **Action**: Verify network connectivity from pve2 to ml110
|
||||
|
||||
### 🟡 Medium Priority
|
||||
|
||||
6. **DNS Records Using Direct IPs Instead of Tunnels**
|
||||
- Many services use A records with direct IPs
|
||||
- Should use CNAME records pointing to tunnel
|
||||
- **Action**: Migrate to tunnel-based DNS
|
||||
|
||||
7. **Inconsistent Proxy Status**
|
||||
- Some records proxied, some not
|
||||
- **Action**: Standardize proxy status (proxied for public services)
|
||||
|
||||
8. **Multiple Nginx Instances**
|
||||
- Central Nginx (105), Blockscout Nginx (5000), MIM Nginx (7810), RPC Nginx (2500-2502)
|
||||
- **Action**: Consider consolidating or document purpose of each
|
||||
|
||||
### 🟢 Low Priority
|
||||
|
||||
9. **Documentation Gaps**
|
||||
- Some VMIDs have incomplete documentation
|
||||
- **Action**: Update documentation with current status
|
||||
|
||||
10. **Service Discovery**
|
||||
- No centralized service registry
|
||||
- **Action**: Consider implementing service discovery
|
||||
|
||||
---
|
||||
|
||||
## 6. Recommendations
|
||||
|
||||
### Immediate Actions (Critical)
|
||||
|
||||
1. **Fix Tunnel Configuration**
|
||||
   ```yaml
   # Update Cloudflare dashboard for tunnel: rpc-http-pub.d-bis.org
   # Route all HTTP endpoints to central Nginx:
   - explorer.d-bis.org → http://192.168.11.21:80
   - rpc-http-pub.d-bis.org → http://192.168.11.21:80
   - rpc-http-prv.d-bis.org → http://192.168.11.21:80
   - dbis-admin.d-bis.org → http://192.168.11.21:80
   - dbis-api.d-bis.org → http://192.168.11.21:80
   - dbis-api-2.d-bis.org → http://192.168.11.21:80
   - mim4u.org → http://192.168.11.21:80
   - www.mim4u.org → http://192.168.11.21:80
   ```
|
||||
|
||||
2. **Create Missing DNS Records**
|
||||
- Create CNAME records for all RPC endpoints
|
||||
- Create CNAME records for DBIS services
|
||||
- Create CNAME records for MIM services
|
||||
- All should point to: `<tunnel-id>.cfargotunnel.com`
|
||||
- Enable proxy (orange cloud) for all
|
||||
|
||||
3. **Remove Duplicate DNS Records**
|
||||
- Remove duplicate A records for `besu.d-bis.org`
|
||||
- Remove duplicate A records for `blockscout.d-bis.org`
|
||||
- Remove duplicate A records for `explorer.d-bis.org`
|
||||
- Remove duplicate A records for `d-bis.org` (keep 20.215.32.15)
|
||||
|
||||
4. **Verify Connectivity to Relocated VMIDs**
   - RPC nodes (2500-2502) and DBIS Core services (10100-10151) have been located on ml110 (192.168.11.10)
   - Verify network connectivity from pve2 (192.168.11.12) to ml110
|
||||
|
||||
### Short-term Improvements
|
||||
|
||||
5. **DNS Migration to Tunnels**
|
||||
- Migrate all A records to CNAME records pointing to tunnels
|
||||
- Remove direct IP exposure
|
||||
- Enable proxy for all public services
|
||||
|
||||
6. **Tunnel Consolidation**
|
||||
- Consider consolidating multiple tunnels into single tunnel
|
||||
- Use central Nginx for all HTTP routing
|
||||
- Simplify tunnel management
|
||||
|
||||
7. **Nginx Architecture Review**
|
||||
- Document purpose of each Nginx instance
|
||||
- Consider if all are necessary
|
||||
- Standardize configuration approach
|
||||
|
||||
### Long-term Optimizations
|
||||
|
||||
8. **Service Discovery**
|
||||
- Implement centralized service registry
|
||||
- Automate DNS record creation
|
||||
- Dynamic service routing
|
||||
|
||||
9. **Monitoring and Alerting**
|
||||
- Monitor all tunnel health
|
||||
- Alert on tunnel failures
|
||||
- Track DNS record changes
|
||||
|
||||
10. **Documentation**
|
||||
- Maintain up-to-date infrastructure map
|
||||
- Document all service dependencies
|
||||
- Create runbooks for common operations
|
||||
|
||||
---
|
||||
|
||||
## 7. Architecture Recommendations
|
||||
|
||||
### Recommended Architecture
|
||||
|
||||
```
Internet
    ↓
Cloudflare (DNS + SSL Termination)
    ↓
Cloudflare Tunnel (VMID 102)
    ↓
Routing Decision:
    ├─ HTTP Services → Central Nginx (VMID 105:80) → Internal Services
    └─ WebSocket Services → Direct to RPC Nodes (bypass Nginx)
```
|
||||
|
||||
**Key Principle**:
|
||||
- HTTP traffic routes through central Nginx for unified management
|
||||
- WebSocket traffic routes directly to RPC nodes for optimal performance
|
||||
|
||||
### Benefits
|
||||
|
||||
1. **Single Point of Configuration**: All HTTP routing in one place
|
||||
2. **Simplified Management**: Easy to add/remove services
|
||||
3. **Better Security**: No direct IP exposure
|
||||
4. **Centralized Logging**: All traffic logs in one location
|
||||
5. **Easier Troubleshooting**: Single point to check routing
|
||||
|
||||
---
|
||||
|
||||
## 8. Action Items Checklist
|
||||
|
||||
### Critical (Do First)
|
||||
|
||||
- [ ] Update Cloudflare tunnel configuration to route HTTP endpoints to central Nginx
|
||||
- [ ] Create missing DNS CNAME records for all services
|
||||
- [ ] Remove duplicate DNS A records
|
||||
- [x] Locate and verify RPC nodes (2500-2502) - ✅ Found on ml110
|
||||
- [x] Verify DBIS Core services deployment status - ✅ Found on ml110
|
||||
- [ ] Verify network connectivity from pve2 (192.168.11.12) to ml110 (192.168.11.10)
|
||||
|
||||
### Important (Do Next)
|
||||
|
||||
- [ ] Migrate remaining A records to CNAME (tunnel-based)
|
||||
- [ ] Standardize proxy status across all DNS records
|
||||
- [ ] Document all Nginx instances and their purposes
|
||||
- [ ] Test all endpoints after configuration changes
|
||||
|
||||
### Nice to Have
|
||||
|
||||
- [ ] Implement service discovery
|
||||
- [ ] Set up monitoring and alerting
|
||||
- [ ] Create comprehensive infrastructure documentation
|
||||
- [ ] Automate DNS record management
|
||||
|
||||
---
|
||||
|
||||
## 9. DNS Records Migration Plan
|
||||
|
||||
### Current State (A Records - Direct IPs)
|
||||
|
||||
Many services use A records pointing to direct IPs. These should be migrated to CNAME records pointing to Cloudflare tunnels.
|
||||
|
||||
### Migration Priority
|
||||
|
||||
**High Priority** (Public-facing services):
|
||||
1. `explorer.d-bis.org` → CNAME to tunnel
|
||||
2. `rpc-http-pub.d-bis.org` → CNAME to tunnel
|
||||
3. `rpc-ws-pub.d-bis.org` → CNAME to tunnel
|
||||
4. `rpc-http-prv.d-bis.org` → CNAME to tunnel
|
||||
5. `rpc-ws-prv.d-bis.org` → CNAME to tunnel
|
||||
|
||||
**Medium Priority** (Internal services):
|
||||
6. `dbis-admin.d-bis.org` → CNAME to tunnel
|
||||
7. `dbis-api.d-bis.org` → CNAME to tunnel
|
||||
8. `dbis-api-2.d-bis.org` → CNAME to tunnel
|
||||
9. `mim4u.org` → CNAME to tunnel
|
||||
10. `www.mim4u.org` → CNAME to tunnel
|
||||
|
||||
**Low Priority** (Monitoring/internal):
|
||||
11. `grafana.d-bis.org` → CNAME to tunnel (if public access needed)
|
||||
12. `prometheus.d-bis.org` → CNAME to tunnel (if public access needed)
|
||||
13. `monitoring.d-bis.org` → CNAME to tunnel
|
||||
|
||||
### Migration Steps
|
||||
|
||||
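For each domain:
1. Remove the old A record, if one exists (DNS does not allow a CNAME to coexist with other records of the same name, so the CNAME cannot be created while the A record is present; expect a brief gap in resolution until the next step completes)
2. Create CNAME record: `<subdomain>` → `<tunnel-id>.cfargotunnel.com`
3. Enable proxy (orange cloud)
4. Wait for DNS propagation (1-5 minutes)
5. Test endpoint accessibility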
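
The record-creation and proxy steps above can also be scripted against the Cloudflare API instead of the dashboard. The following is a minimal sketch, not a tested procedure: `CF_ZONE_ID` and `CF_API_TOKEN` are assumed environment variables, and the function names are hypothetical. The request is a POST to the zone's `dns_records` endpoint with `proxied` set to true.

```bash
#!/usr/bin/env bash
# Build the JSON body for a proxied CNAME record pointing at a tunnel.
cname_payload() {
  local name="$1" tunnel_id="$2"
  printf '{"type":"CNAME","name":"%s","content":"%s.cfargotunnel.com","proxied":true}\n' \
    "$name" "$tunnel_id"
}

# Create the record through the Cloudflare API. CF_ZONE_ID and CF_API_TOKEN
# are assumed to be set by the caller; nothing runs until this is called.
create_tunnel_cname() {
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(cname_payload "$1" "$2")"
}
```

Running `cname_payload rpc-http-pub.d-bis.org <tunnel-id>` on its own prints the request body, which is a safe way to review the change before calling `create_tunnel_cname`.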
|
||||
|
||||
---
|
||||
|
||||
## 10. Testing Plan
|
||||
|
||||
After implementing recommendations:
|
||||
|
||||
1. **Test HTTP Endpoints**:
|
||||
   ```bash
   curl https://explorer.d-bis.org/api/v2/stats
   curl -X POST https://rpc-http-pub.d-bis.org \
     -H "Content-Type: application/json" \
     -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
   curl https://dbis-admin.d-bis.org
   curl https://mim4u.org
   ```
|
||||
|
||||
2. **Test WebSocket Endpoints**:
|
||||
   ```bash
   wscat -c wss://rpc-ws-pub.d-bis.org
   wscat -c wss://rpc-ws-prv.d-bis.org
   ```
|
||||
|
||||
3. **Test Redirects**:
|
||||
   ```bash
   curl -I https://www.mim4u.org  # Should redirect to mim4u.org
   ```
|
||||
|
||||
4. **Verify Tunnel Health**:
|
||||
- Check Cloudflare dashboard for tunnel status
|
||||
- Verify all tunnels show HEALTHY
|
||||
- Check tunnel logs for errors
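
The redirect test in step 3 relies on reading the `curl -I` output by eye. A small filter can turn it into a pass/fail check; this is a sketch under the assumption that the redirect is a 301 with a `Location` header, as the central Nginx configuration in this review describes, and `is_redirect_to` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read HTTP response headers on stdin and succeed only
# if the status is 301 and the Location header points at the given host.
is_redirect_to() {
  local target="$1"
  awk -v target="$target" '
    NR == 1 && $2 == "301" { status_ok = 1 }
    tolower($1) == "location:" && index($2, "//" target) { location_ok = 1 }
    END { exit !(status_ok && location_ok) }
  '
}
```

Usage: `curl -sI https://www.mim4u.org | is_redirect_to mim4u.org && echo "redirect OK"`.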
|
||||
|
||||
---
|
||||
|
||||
|
||||
## 11. Summary of Recommendations
|
||||
|
||||
### 🔴 Critical (Fix Immediately)
|
||||
|
||||
1. **Update Cloudflare Tunnel Configuration**
|
||||
- Tunnel: `rpc-http-pub.d-bis.org` (Tunnel ID: `10ab22da-8ea3-4e2e-a896-27ece2211a05`)
|
||||
- Action: Route all HTTP endpoints to `http://192.168.11.21:80` (central Nginx)
|
||||
- Keep WebSocket endpoints routing directly to RPC nodes
|
||||
|
||||
2. **Create Missing DNS CNAME Records**
|
||||
- `rpc-http-pub.d-bis.org` → CNAME to tunnel
|
||||
- `rpc-ws-pub.d-bis.org` → CNAME to tunnel
|
||||
- `rpc-http-prv.d-bis.org` → CNAME to tunnel
|
||||
- `rpc-ws-prv.d-bis.org` → CNAME to tunnel
|
||||
- `dbis-admin.d-bis.org` → CNAME to tunnel
|
||||
- `dbis-api.d-bis.org` → CNAME to tunnel
|
||||
- `dbis-api-2.d-bis.org` → CNAME to tunnel
|
||||
- `mim4u.org` → CNAME to tunnel
|
||||
- `www.mim4u.org` → CNAME to tunnel
|
||||
|
||||
3. **Remove Duplicate DNS A Records**
|
||||
- `besu.d-bis.org`: Remove one IP (keep single authoritative)
|
||||
- `blockscout.d-bis.org`: Remove one IP
|
||||
- `explorer.d-bis.org`: Remove one IP
|
||||
- `d-bis.org`: Remove 20.215.32.42 (keep 20.215.32.15)
|
||||
|
||||
### 🟡 Important (Fix Soon)
|
||||
|
||||
4. **Migrate A Records to CNAME (Tunnel-based)**
|
||||
- Convert remaining A records to CNAME records
|
||||
- Point all to Cloudflare tunnel endpoints
|
||||
- Enable proxy (orange cloud) for all public services
|
||||
|
||||
5. **Verify Network Connectivity**
|
||||
- Test connectivity from pve2 (192.168.11.12) to ml110 (192.168.11.10)
|
||||
- Ensure RPC nodes (2500-2502) are accessible from central Nginx
|
||||
- Ensure DBIS services (10100-10151) are accessible from central Nginx
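
The connectivity checks above can be approximated with a TCP probe from the central Nginx host. This is a sketch only: the backend list is copied from the central Nginx configuration in section 3, the function names are hypothetical, and it assumes an `nc` build that supports `-z` and `-w`.

```bash
#!/usr/bin/env bash
# Backends that central Nginx proxies to, as "<ip>:<port>" (from section 3).
nginx_backends() {
  printf '%s\n' \
    192.168.11.140:80 192.168.11.252:443 192.168.11.251:443 \
    192.168.11.130:80 192.168.11.150:3000 192.168.11.151:3000 \
    192.168.11.19:80
}

# Probe each backend with a TCP connect and report OK or FAIL per target.
check_backends() {
  nginx_backends | while IFS=: read -r ip port; do
    if nc -z -w 3 "$ip" "$port" 2>/dev/null; then
      echo "OK   ${ip}:${port}"
    else
      echo "FAIL ${ip}:${port}"
    fi
  done
}
```

Running `check_backends` on the central Nginx container gives one line per backend. A successful TCP connect shows only that the port is reachable, not that the service behind it is healthy.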
|
||||
|
||||
### 🟢 Optimization (Nice to Have)
|
||||
|
||||
6. **Documentation Updates**
|
||||
- Update all service documentation with current IPs and locations
|
||||
- Document network topology (pve2 vs ml110)
|
||||
- Create service dependency map
|
||||
|
||||
7. **Monitoring Setup**
|
||||
- Monitor all tunnel health
|
||||
- Alert on tunnel failures
|
||||
- Track DNS record changes
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
### Architecture Documents
|
||||
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Complete network architecture
|
||||
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
|
||||
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration
|
||||
- **[DOMAIN_STRUCTURE.md](DOMAIN_STRUCTURE.md)** ⭐⭐ - Domain structure
|
||||
|
||||
### Network Documents
|
||||
- **[../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md)** - Cloudflare tunnel routing
|
||||
- **[../05-network/CENTRAL_NGINX_ROUTING_SETUP.md](../05-network/CENTRAL_NGINX_ROUTING_SETUP.md)** - Central Nginx routing
|
||||
|
||||
### Configuration Documents
|
||||
- **[../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md](../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md)** - DNS mapping to containers
|
||||
- **[../04-configuration/RPC_DNS_CONFIGURATION.md](../04-configuration/RPC_DNS_CONFIGURATION.md)** - RPC DNS configuration
|
||||
|
||||
---
|
||||
|
||||
**Last Updated:** 2025-12-27
|
||||
**Document Version:** 1.0
|
||||
**Review Cycle:** Quarterly
|
||||
|
||||
172
docs/02-architecture/DOMAIN_STRUCTURE.md
Normal file
@@ -0,0 +1,172 @@
|
||||
# Domain Structure
|
||||
|
||||
**Last Updated:** 2025-01-03
|
||||
**Document Version:** 1.0
|
||||
**Status:** Active Documentation
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This document defines the domain structure for the infrastructure, clarifying which domains are used for different purposes.
|
||||
|
||||
---
|
||||
|
||||
## Domain Assignments
|
||||
|
||||
### 1. sankofa.nexus - Hardware Infrastructure
|
||||
|
||||
**Purpose:** Physical hardware hostnames and internal network DNS
|
||||
|
||||
**Usage:**
|
||||
- All physical servers (ml110, r630-01 through r630-04)
|
||||
- Internal network DNS resolution
|
||||
- SSH access via FQDN
|
||||
- Internal service discovery
|
||||
|
||||
**Examples:**
|
||||
- `ml110.sankofa.nexus` → 192.168.11.10
|
||||
- `r630-01.sankofa.nexus` → 192.168.11.11
|
||||
- `r630-02.sankofa.nexus` → 192.168.11.12
|
||||
- `r630-03.sankofa.nexus` → 192.168.11.13
|
||||
- `r630-04.sankofa.nexus` → 192.168.11.14
|
||||
|
||||
**DNS Configuration:**
|
||||
- Internal DNS server (typically on ER605 or Omada controller)
|
||||
- Not publicly resolvable (internal network only)
|
||||
- Used for local network service discovery
|
||||
|
||||
**Related Documentation:**
|
||||
- [Physical Hardware Inventory](./PHYSICAL_HARDWARE_INVENTORY.md)
|
||||
|
||||
---
|
||||
|
||||
### 2. d-bis.org - ChainID 138 Services
|
||||
|
||||
**Purpose:** Public-facing services for ChainID 138 blockchain network
|
||||
|
||||
**Usage:**
|
||||
- RPC endpoints (public and permissioned)
|
||||
- Block explorer
|
||||
- WebSocket endpoints
|
||||
- Cloudflare tunnels for Proxmox hosts
|
||||
- All ChainID 138 blockchain-related services
|
||||
|
||||
**Examples:**
|
||||
- `rpc.d-bis.org` - Primary RPC endpoint
|
||||
- `rpc2.d-bis.org` - Secondary RPC endpoint
|
||||
- `explorer.d-bis.org` - Block explorer (Blockscout)
|
||||
- `ml110-01.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
|
||||
- `r630-01.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
|
||||
- `r630-02.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
|
||||
- `r630-03.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
|
||||
- `r630-04.d-bis.org` - Proxmox UI (via Cloudflare tunnel)
|
||||
|
||||
**DNS Configuration:**
|
||||
- Cloudflare DNS (proxied)
|
||||
- Publicly resolvable
|
||||
- SSL/TLS via Cloudflare
|
||||
|
||||
**Related Documentation:**
|
||||
- [Cloudflare Tunnel Setup](../04-configuration/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md)
|
||||
- [RPC Configuration](../04-configuration/RPC_DNS_CONFIGURATION.md)
|
||||
- [Blockscout Setup](../BLOCKSCOUT_COMPLETE_SUMMARY.md)
|
||||
|
||||
---
|
||||
|
||||
### 3. defi-oracle.io - ChainID 138 Legacy (ThirdWeb RPC)
|
||||
|
||||
**Purpose:** Legacy RPC endpoint for ThirdWeb integration
|
||||
|
||||
**Usage:**
|
||||
- ThirdWeb RPC endpoint (VMID 2400)
|
||||
- Legacy compatibility for existing integrations
|
||||
- Public RPC access for ChainID 138
|
||||
|
||||
**Examples:**
|
||||
- `rpc.defi-oracle.io` - Legacy RPC endpoint
|
||||
- `rpc.public-0138.defi-oracle.io` - Specific ChainID 138 RPC endpoint
|
||||
|
||||
**DNS Configuration:**
|
||||
- Cloudflare DNS (proxied)
|
||||
- Publicly resolvable
|
||||
- SSL/TLS via Cloudflare
|
||||
|
||||
**Note:** This domain is maintained for backward compatibility with ThirdWeb integrations. New integrations should use `d-bis.org` endpoints.
|
||||
|
||||
**Related Documentation:**
|
||||
- [ThirdWeb RPC Setup](../04-configuration/THIRDWEB_RPC_CLOUDFLARE_SETUP.md)
|
||||
- [VMID 2400 DNS Structure](../04-configuration/VMID2400_DNS_STRUCTURE.md)
|
||||
|
||||
---
|
||||
|
||||
## Domain Summary Table
|
||||
|
||||
| Domain | Purpose | Public | DNS Provider | SSL/TLS |
|--------|---------|--------|--------------|---------|
| `sankofa.nexus` | Hardware infrastructure | No (internal) | Internal DNS | Self-signed |
| `d-bis.org` | ChainID 138 services | Yes | Cloudflare | Cloudflare |
| `defi-oracle.io` | ChainID 138 legacy (ThirdWeb) | Yes | Cloudflare | Cloudflare |
|
||||
|
||||
---
|
||||
|
||||
## Domain Usage Guidelines
|
||||
|
||||
### When to Use sankofa.nexus
|
||||
|
||||
- Internal network communication
|
||||
- SSH access to physical hosts
|
||||
- Internal service discovery
|
||||
- Local network DNS resolution
|
||||
- Proxmox cluster communication
|
||||
|
||||
### When to Use d-bis.org
|
||||
|
||||
- Public blockchain RPC endpoints
|
||||
- Block explorer access
|
||||
- Public-facing Proxmox UI (via tunnels)
|
||||
- ChainID 138 service endpoints
|
||||
- New integrations and services
|
||||
|
||||
### When to Use defi-oracle.io
|
||||
|
||||
- ThirdWeb RPC endpoint (legacy)
|
||||
- Backward compatibility
|
||||
- Existing integrations that reference this domain
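
The guidelines above amount to a lookup by domain suffix. A minimal sketch, with the hypothetical function name `domain_role` and role labels paraphrased from the summary table:

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a hostname by the domain it belongs to.
domain_role() {
  case "$1" in
    *.sankofa.nexus|sankofa.nexus)   echo "internal hardware (internal DNS)" ;;
    *.d-bis.org|d-bis.org)           echo "ChainID 138 services (Cloudflare)" ;;
    *.defi-oracle.io|defi-oracle.io) echo "ChainID 138 legacy ThirdWeb RPC (Cloudflare)" ;;
    *)                               echo "unassigned" ;;
  esac
}
```

For example, `domain_role ml110.sankofa.nexus` reports the internal hardware role, and any hostname outside the three domains is reported as unassigned.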
|
||||
|
||||
---
|
||||
|
||||
## Migration Notes
|
||||
|
||||
### From defi-oracle.io to d-bis.org
|
||||
|
||||
For new services and integrations:
|
||||
- **Use `d-bis.org`** as the primary domain
|
||||
- `defi-oracle.io` is maintained for legacy ThirdWeb RPC compatibility
|
||||
- All new ChainID 138 services should use `d-bis.org`
|
||||
|
||||
### DNS Record Management
|
||||
|
||||
- **sankofa.nexus**: Managed via internal DNS (Omada controller or local DNS server)
|
||||
- **d-bis.org**: Managed via Cloudflare DNS
|
||||
- **defi-oracle.io**: Managed via Cloudflare DNS
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
### Architecture Documents
|
||||
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
|
||||
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Complete network architecture
|
||||
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration
|
||||
|
||||
### Configuration Documents
|
||||
- **[../04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_TUNNEL_CONFIGURATION_GUIDE.md)** - Cloudflare tunnel configuration
|
||||
- **[../04-configuration/RPC_DNS_CONFIGURATION.md](../04-configuration/RPC_DNS_CONFIGURATION.md)** - RPC DNS configuration
|
||||
- **[../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md)** - Cloudflare routing architecture
|
||||
|
||||
---
|
||||
|
||||
**Last Updated:** 2025-01-03
|
||||
**Document Version:** 1.0
|
||||
**Review Cycle:** Quarterly
|
||||
@@ -1,7 +1,10 @@
|
||||
# Network Architecture - Enterprise Orchestration Plan
|
||||
|
||||
**Navigation:** [Home](../README.md) > [Architecture](README.md) > Network Architecture
|
||||
|
||||
**Last Updated:** 2025-01-20
|
||||
**Document Version:** 2.0
|
||||
**Status:** 🟢 Active Documentation
|
||||
**Project:** Sankofa / Phoenix / PanTel · ChainID 138 · Proxmox + Cloudflare Zero Trust + Dual ISP + 6×/28
|
||||
|
||||
---
|
||||
@@ -33,6 +36,8 @@ This document defines the complete enterprise-grade network architecture for the
|
||||
|
||||
## 1. Physical Topology & Hardware Roles
|
||||
|
||||
> **Reference:** For complete physical hardware inventory including IP addresses, credentials, and detailed specifications, see **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)**.
|
||||
|
||||
### 1.1 Hardware Role Assignment
|
||||
|
||||
#### Edge / Routing
|
||||
@@ -65,13 +70,14 @@ This document defines the complete enterprise-grade network architecture for the
|
||||
|
||||
### Public Block #1 (Known - Spectrum)
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| **Network** | `76.53.10.32/28` |
|
||||
| **Gateway** | `76.53.10.33` |
|
||||
| **Usable Range** | `76.53.10.33–76.53.10.46` |
|
||||
| **Broadcast** | `76.53.10.47` |
|
||||
| **ER605 WAN1 IP** | `76.53.10.34` (router interface) |
|
||||
| Property | Value | Status |
|----------|-------|--------|
| **Network** | `76.53.10.32/28` | ✅ Configured |
| **Gateway** | `76.53.10.33` | ✅ Active |
| **Usable Range** | `76.53.10.33–76.53.10.46` | ✅ In Use |
| **Broadcast** | `76.53.10.47` | - |
| **ER605 WAN1 IP** | `76.53.10.34` (router interface) | ✅ Active |
| **Available IPs** | 12 (76.53.10.35–46; excludes gateway .33 and router .34) | ✅ Available |
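
The figures in the Block #1 table can be derived rather than maintained by hand. The following is a minimal sketch using Python's standard `ipaddress` module; the variable names are illustrative and not part of any existing tooling in this repository.

```python
import ipaddress

# Values taken from the Block #1 table
block = ipaddress.ip_network("76.53.10.32/28")
gateway = ipaddress.ip_address("76.53.10.33")
router_wan1 = ipaddress.ip_address("76.53.10.34")

# hosts() skips the network and broadcast addresses of the block
usable = list(block.hosts())

# Addresses still free once the gateway and the router interface are taken
free = [ip for ip in usable if ip not in (gateway, router_wan1)]

print("network:  ", block.network_address)
print("broadcast:", block.broadcast_address)
print("usable:   ", usable[0], "-", usable[-1], f"({len(usable)} addresses)")
print("free:     ", free[0], "-", free[-1], f"({len(free)} addresses)")
```

The same snippet can be reused for Blocks #2 through #6 once their networks are assigned, by swapping in the new CIDR and reserved addresses.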
|
||||
|
||||
### Public Blocks #2–#6 (Placeholders - To Be Configured)
|
||||
|
||||
@@ -318,7 +324,43 @@ This architecture should be reflected in:
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
### Architecture Documents
|
||||
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Complete physical hardware inventory and specifications
|
||||
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Enterprise deployment orchestration guide
|
||||
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** ⭐⭐⭐ - VMID allocation registry
|
||||
- **[DOMAIN_STRUCTURE.md](DOMAIN_STRUCTURE.md)** ⭐⭐ - Domain structure and DNS assignments
|
||||
- **[HOSTNAME_MIGRATION_GUIDE.md](HOSTNAME_MIGRATION_GUIDE.md)** ⭐ - Hostname migration procedures
|
||||
|
||||
### Configuration Documents
|
||||
- **[../04-configuration/ER605_ROUTER_CONFIGURATION.md](../04-configuration/ER605_ROUTER_CONFIGURATION.md)** - Router configuration
|
||||
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
|
||||
- **[../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md)** - Cloudflare tunnel routing
|
||||
|
||||
### Deployment Documents
|
||||
- **[../03-deployment/ORCHESTRATION_DEPLOYMENT_GUIDE.md](../03-deployment/ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment orchestration
|
||||
- **[../07-ccip/CCIP_DEPLOYMENT_SPEC.md](../07-ccip/CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment specification
|
||||
|
||||
---
|
||||
|
||||
**Document Status:** Complete (v2.0)
|
||||
**Maintained By:** Infrastructure Team
|
||||
**Review Cycle:** Quarterly
|
||||
**Next Update:** After public blocks #2-6 are assigned
|
||||
|
||||
---
|
||||
|
||||
## Change Log
|
||||
|
||||
### Version 2.0 (2025-01-20)
|
||||
- Added network topology Mermaid diagram
|
||||
- Added VLAN architecture Mermaid diagram
|
||||
- Added ASCII art network topology
|
||||
- Enhanced public IP block matrix with status indicators
|
||||
- Added breadcrumb navigation
|
||||
- Added status indicators
|
||||
|
||||
### Version 1.0 (2024-12-15)
|
||||
- Initial version
|
||||
- Basic network architecture documentation
|
||||
|
||||
@@ -1,10 +1,12 @@
|
||||
# Orchestration Deployment Guide - Enterprise-Grade
|
||||
|
||||
**Navigation:** [Home](../README.md) > [Architecture](README.md) > Orchestration Deployment Guide
|
||||
|
||||
**Sankofa / Phoenix / PanTel · ChainID 138 · Proxmox + Cloudflare Zero Trust + Dual ISP + 6×/28**
|
||||
|
||||
**Last Updated:** 2025-01-20
|
||||
**Document Version:** 1.0
|
||||
**Status:** Buildable Blueprint
|
||||
**Document Version:** 1.1
|
||||
**Status:** 🟢 Active Documentation
|
||||
|
||||
---
|
||||
|
||||
@@ -23,17 +25,20 @@ This guide provides a **buildable blueprint**: network, VLANs, Proxmox cluster,
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Core Principles](#core-principles)
|
||||
2. [Physical Topology & Roles](#physical-topology--roles)
|
||||
3. [ISP & Public IP Plan](#isp--public-ip-plan)
|
||||
4. [Layer-2 & VLAN Orchestration](#layer-2--vlan-orchestration)
|
||||
5. [Routing, NAT, and Egress Segmentation](#routing-nat-and-egress-segmentation)
|
||||
6. [Proxmox Cluster Orchestration](#proxmox-cluster-orchestration)
|
||||
7. [Cloudflare Zero Trust Orchestration](#cloudflare-zero-trust-orchestration)
|
||||
8. [VMID Allocation Registry](#vmid-allocation-registry)
|
||||
9. [CCIP Fleet Deployment Matrix](#ccip-fleet-deployment-matrix)
|
||||
10. [Deployment Orchestration Workflow](#deployment-orchestration-workflow)
|
||||
11. [Operational Runbooks](#operational-runbooks)
|
||||
**Estimated Reading Time:** 45 minutes
|
||||
**Progress:** Use this TOC to track your reading progress
|
||||
|
||||
1. ✅ [Core Principles](#core-principles) - *Foundation concepts*
|
||||
2. ✅ [Physical Topology & Roles](#physical-topology--roles) - *Hardware layout*
|
||||
3. ✅ [ISP & Public IP Plan](#isp--public-ip-plan) - *Public IP allocation*
|
||||
4. ✅ [Layer-2 & VLAN Orchestration](#layer-2--vlan-orchestration) - *VLAN configuration*
|
||||
5. ✅ [Routing, NAT, and Egress Segmentation](#routing-nat-and-egress-segmentation) - *Network routing*
|
||||
6. ✅ [Proxmox Cluster Orchestration](#proxmox-cluster-orchestration) - *Proxmox setup*
|
||||
7. ✅ [Cloudflare Zero Trust Orchestration](#cloudflare-zero-trust-orchestration) - *Cloudflare integration*
|
||||
8. ✅ [VMID Allocation Registry](#vmid-allocation-registry) - *VMID planning*
|
||||
9. ✅ [CCIP Fleet Deployment Matrix](#ccip-fleet-deployment-matrix) - *CCIP deployment*
|
||||
10. ✅ [Deployment Orchestration Workflow](#deployment-orchestration-workflow) - *Deployment process*
|
||||
11. ✅ [Operational Runbooks](#operational-runbooks) - *Operations guide*
|
||||
|
||||
---
|
||||
|
||||
@@ -52,205 +57,88 @@ This guide provides a **buildable blueprint**: network, VLANs, Proxmox cluster,
|
||||
|
||||
## Physical Topology & Roles
|
||||
|
||||
### Hardware Role Assignment
|
||||
> **Reference:** For complete hardware role assignments, physical topology, and detailed specifications, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#1-physical-topology--hardware-roles)**.
|
||||
|
||||
#### Edge / Routing
|
||||
> **Hardware Inventory:** For complete physical hardware inventory including IP addresses, credentials, hostnames, and detailed specifications, see **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐.
|
||||
|
||||
**ER605-A (Primary Edge Router)**
|
||||
- WAN1: Spectrum primary with Block #1 (76.53.10.32/28)
|
||||
- WAN2: ISP #2 (failover/alternate policy)
|
||||
- Role: Active edge router, NAT pools, routing
|
||||
|
||||
**ER605-B (Standby Edge Router / Alternate WAN policy)**
|
||||
- Role: Standby router OR dedicated to WAN2 policies/testing
|
||||
- Note: ER605 does not support full stateful HA. This is **active/standby operational redundancy**, not automatic session-preserving HA.
|
||||
|
||||
#### Switching Fabric
|
||||
|
||||
- **ES216G-1**: Core / uplinks / trunks
|
||||
- **ES216G-2**: Compute rack aggregation
|
||||
- **ES216G-3**: Mgmt + out-of-band / staging
|
||||
|
||||
#### Compute
|
||||
|
||||
- **ML110 Gen9**: "Bootstrap & Management" node
|
||||
- IP: 192.168.11.10
|
||||
- Role: Proxmox mgmt services, Omada controller, Git, monitoring seed
|
||||
|
||||
- **4× Dell R630**: Proxmox compute cluster nodes
|
||||
- Resources: 512GB RAM each, 2×600GB boot, 6×250GB SSD
|
||||
- Role: Production workloads, CCIP fleet, sovereign tenants, services
|
||||
**Summary:**
|
||||
- **2× ER605** (edge routers; active/standby operational redundancy, not stateful HA)
|
||||
- **3× ES216G switches** (core, compute, mgmt)
|
||||
- **1× ML110 Gen9** (management / seed / bootstrap) - IP: 192.168.11.10
|
||||
- **4× Dell R630** (compute cluster; 512GB RAM each; 2×600GB boot; 6×250GB SSD)
|
||||
|
||||
---
|
||||
|
||||
## ISP & Public IP Plan (6× /28)
|
||||
## ISP & Public IP Plan
|
||||
|
||||
### Public Block #1 (Known - Spectrum)
|
||||
> **Reference:** For complete public IP block plan, usage policy, and NAT pool assignments, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#2-isp--public-ip-plan-6--28)**.
|
||||
|
||||
| Property | Value |
|
||||
|----------|-------|
|
||||
| **Network** | `76.53.10.32/28` |
|
||||
| **Gateway** | `76.53.10.33` |
|
||||
| **Usable Range** | `76.53.10.33–76.53.10.46` |
|
||||
| **Broadcast** | `76.53.10.47` |
|
||||
| **ER605 WAN1 IP** | `76.53.10.34` (router interface) |
|
||||
|
||||
### Public Blocks #2–#6 (Placeholders - To Be Configured)
|
||||
|
||||
| Block | Network | Gateway | Usable Range | Broadcast | Designated Use |
|
||||
|-------|--------|---------|--------------|-----------|----------------|
|
||||
| **#2** | `<PUBLIC_BLOCK_2>/28` | `<GW2>` | `<USABLE2>` | `<BCAST2>` | CCIP Commit egress NAT pool |
|
||||
| **#3** | `<PUBLIC_BLOCK_3>/28` | `<GW3>` | `<USABLE3>` | `<BCAST3>` | CCIP Execute egress NAT pool |
|
||||
| **#4** | `<PUBLIC_BLOCK_4>/28` | `<GW4>` | `<USABLE4>` | `<BCAST4>` | RMN egress NAT pool |
|
||||
| **#5** | `<PUBLIC_BLOCK_5>/28` | `<GW5>` | `<USABLE5>` | `<BCAST5>` | Sankofa/Phoenix/PanTel service egress |
|
||||
| **#6** | `<PUBLIC_BLOCK_6>/28` | `<GW6>` | `<USABLE6>` | `<BCAST6>` | Sovereign Cloud Band tenant egress |
|
||||
|
||||
### Public IP Usage Policy (Role-based)
|
||||
|
||||
| Public /28 Block | Designated Use | Why |
|
||||
|------------------|----------------|-----|
|
||||
| **#1** (76.53.10.32/28) | Router WAN + break-glass VIPs | Primary connectivity + emergency |
|
||||
| **#2** | CCIP Commit egress NAT pool | Allowlistable egress for source RPCs |
|
||||
| **#3** | CCIP Execute egress NAT pool | Allowlistable egress for destination RPCs |
|
||||
| **#4** | RMN egress NAT pool | Independent security-plane egress |
|
||||
| **#5** | Sankofa/Phoenix/PanTel service egress | Service-plane separation |
|
||||
| **#6** | Sovereign Cloud Band tenant egress | Per-sovereign policy control |
|
||||
**Summary:**
|
||||
- **Block #1** (76.53.10.32/28): Router WAN + break-glass VIPs ✅ Configured
|
||||
- **Blocks #2-6**: Placeholders for CCIP Commit, Execute, RMN, Service, and Sovereign tenant egress NAT pools
|
||||
|
||||
---
|
||||
|
||||
## Layer-2 & VLAN Orchestration
|
||||
|
||||
### VLAN Set (Authoritative)
|
||||
> **Reference:** For complete VLAN orchestration plan, subnet allocations, and switching configuration, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#3-layer-2--vlan-orchestration-plan)**.
|
||||
|
||||
> **Migration Note:** Currently on flat LAN 192.168.11.0/24. This plan migrates to VLANs while keeping compatibility.
|
||||
|
||||
| VLAN ID | VLAN Name | Purpose | Subnet | Gateway |
|--------:|-----------|---------|--------|---------|
| **11** | MGMT-LAN | Proxmox mgmt, switches mgmt, admin endpoints | 192.168.11.0/24 | 192.168.11.1 |
| 110 | BESU-VAL | Validator-only network (no member access) | 10.110.0.0/24 | 10.110.0.1 |
| 111 | BESU-SEN | Sentry mesh | 10.111.0.0/24 | 10.111.0.1 |
| 112 | BESU-RPC | RPC / gateway tier | 10.112.0.0/24 | 10.112.0.1 |
| 120 | BLOCKSCOUT | Explorer + DB | 10.120.0.0/24 | 10.120.0.1 |
| 121 | CACTI | Interop middleware | 10.121.0.0/24 | 10.121.0.1 |
| 130 | CCIP-OPS | Ops/admin | 10.130.0.0/24 | 10.130.0.1 |
| 132 | CCIP-COMMIT | Commit-role DON | 10.132.0.0/24 | 10.132.0.1 |
| 133 | CCIP-EXEC | Execute-role DON | 10.133.0.0/24 | 10.133.0.1 |
| 134 | CCIP-RMN | Risk management network | 10.134.0.0/24 | 10.134.0.1 |
| 140 | FABRIC | Fabric | 10.140.0.0/24 | 10.140.0.1 |
| 141 | FIREFLY | FireFly | 10.141.0.0/24 | 10.141.0.1 |
| 150 | INDY | Identity | 10.150.0.0/24 | 10.150.0.1 |
| 160 | SANKOFA-SVC | Sankofa/Phoenix/PanTel service layer | 10.160.0.0/22 | 10.160.0.1 |
| 200 | PHX-SOV-SMOM | Sovereign tenant | 10.200.0.0/20 | 10.200.0.1 |
| 201 | PHX-SOV-ICCC | Sovereign tenant | 10.201.0.0/20 | 10.201.0.1 |
| 202 | PHX-SOV-DBIS | Sovereign tenant | 10.202.0.0/20 | 10.202.0.1 |
| 203 | PHX-SOV-AR | Absolute Realms tenant | 10.203.0.0/20 | 10.203.0.1 |
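
Because the plan mixes /24, /22, and /20 subnets, a quick consistency check before migration is cheap. The sketch below is one hypothetical way to do it: it copies the rows listed in the VLAN table above and verifies that each gateway sits inside its subnet and that no two subnets overlap. The function name `check_vlan_plan` is an illustration, not an existing script.

```python
import ipaddress
from itertools import combinations

# (vlan_id, name, subnet, gateway), copied from the VLAN table above
VLANS = [
    (11, "MGMT-LAN", "192.168.11.0/24", "192.168.11.1"),
    (110, "BESU-VAL", "10.110.0.0/24", "10.110.0.1"),
    (111, "BESU-SEN", "10.111.0.0/24", "10.111.0.1"),
    (112, "BESU-RPC", "10.112.0.0/24", "10.112.0.1"),
    (120, "BLOCKSCOUT", "10.120.0.0/24", "10.120.0.1"),
    (121, "CACTI", "10.121.0.0/24", "10.121.0.1"),
    (130, "CCIP-OPS", "10.130.0.0/24", "10.130.0.1"),
    (132, "CCIP-COMMIT", "10.132.0.0/24", "10.132.0.1"),
    (133, "CCIP-EXEC", "10.133.0.0/24", "10.133.0.1"),
    (134, "CCIP-RMN", "10.134.0.0/24", "10.134.0.1"),
    (140, "FABRIC", "10.140.0.0/24", "10.140.0.1"),
    (141, "FIREFLY", "10.141.0.0/24", "10.141.0.1"),
    (150, "INDY", "10.150.0.0/24", "10.150.0.1"),
    (160, "SANKOFA-SVC", "10.160.0.0/22", "10.160.0.1"),
    (200, "PHX-SOV-SMOM", "10.200.0.0/20", "10.200.0.1"),
    (201, "PHX-SOV-ICCC", "10.201.0.0/20", "10.201.0.1"),
    (202, "PHX-SOV-DBIS", "10.202.0.0/20", "10.202.0.1"),
    (203, "PHX-SOV-AR", "10.203.0.0/20", "10.203.0.1"),
]


def check_vlan_plan(vlans):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    parsed = [
        (vid, name, ipaddress.ip_network(subnet), ipaddress.ip_address(gw))
        for vid, name, subnet, gw in vlans
    ]
    # Every gateway must be an address inside its own subnet
    for vid, name, net, gw in parsed:
        if gw not in net:
            problems.append(f"VLAN {vid} ({name}): gateway {gw} is outside {net}")
    # No two subnets may share address space
    for (vid_a, _, net_a, _), (vid_b, _, net_b, _) in combinations(parsed, 2):
        if net_a.overlaps(net_b):
            problems.append(f"VLAN {vid_a} and VLAN {vid_b} overlap: {net_a} / {net_b}")
    return problems


for problem in check_vlan_plan(VLANS):
    print(problem)
```

An empty result means the listed rows are internally consistent; it says nothing about VLANs defined only in other documents.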
|
||||
|
||||
### Switching Configuration (ES216G)
|
||||
|
||||
- **ES216G-1**: **Core** (all VLAN trunks to ES216G-2/3 + ER605-A)
|
||||
- **ES216G-2**: **Compute** (trunks to R630s + ML110)
|
||||
- **ES216G-3**: **Mgmt/OOB** (mgmt access ports, staging, out-of-band)
|
||||
|
||||
**All Proxmox uplinks should be 802.1Q trunk ports.**
|
||||
**Summary:**
|
||||
- **19 VLANs** defined with complete subnet plan
|
||||
- **VLAN 11**: MGMT-LAN (192.168.11.0/24) - Current flat LAN
|
||||
- **VLANs 110-203**: Service-specific VLANs (10.x.0.0/24 or /20 or /22)
|
||||
- **Migration path**: From flat LAN to VLANs while maintaining compatibility
|
||||
|
||||
---
|
||||
|
||||
## Routing, NAT, and Egress Segmentation
|
||||
|
||||
### Dual Router Roles
|
||||
> **Reference:** For complete routing configuration, NAT policies, and egress segmentation details, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#4-routing-nat-and-egress-segmentation-er605)**.
|
||||
|
||||
- **ER605-A**: Active edge router (WAN1 = Spectrum primary with Block #1)
|
||||
- **ER605-B**: Standby router OR dedicated to WAN2 policies/testing (no inbound services)
|
||||
|
||||
### NAT Policies (Critical)
|
||||
|
||||
#### Inbound NAT
|
||||
|
||||
- **Default: none**
|
||||
- Break-glass only (optional):
|
||||
- Jumpbox/SSH (single port, IP allowlist, Cloudflare Access preferred)
|
||||
- Proxmox admin should remain **LAN-only**
|
||||
|
||||
#### Outbound NAT (Role-based Pools Using /28 Blocks)
|
||||
|
||||
| Private Subnet | Role | Egress NAT Pool | Public Block |
|----------------|------|-----------------|--------------|
| 10.132.0.0/24 | CCIP Commit | **Block #2** `<PUBLIC_BLOCK_2>/28` | #2 |
| 10.133.0.0/24 | CCIP Execute | **Block #3** `<PUBLIC_BLOCK_3>/28` | #3 |
| 10.134.0.0/24 | RMN | **Block #4** `<PUBLIC_BLOCK_4>/28` | #4 |
| 10.160.0.0/22 | Sankofa/Phoenix/PanTel | **Block #5** `<PUBLIC_BLOCK_5>/28` | #5 |
| 10.200.0.0/20–10.203.0.0/20 | Sovereign tenants | **Block #6** `<PUBLIC_BLOCK_6>/28` | #6 |
| 192.168.11.0/24 | Mgmt | Block #1 (or none; tightly restricted) | #1 |
|
||||
|
||||
This yields **provable separation**, allowlisting, and incident scoping.
|
||||
**Summary:**
|
||||
- **Inbound NAT**: Default none (Cloudflare Tunnel primary)
|
||||
- **Outbound NAT**: Role-based pools using /28 blocks #2-6
|
||||
- **Egress Segmentation**: CCIP Commit → Block #2, Execute → Block #3, RMN → Block #4, Services → Block #5, Sovereign → Block #6
|
||||
|
||||
---
|
||||
|
||||
## Proxmox Cluster Orchestration
|
||||
|
||||
### Node Layout
|
||||
> **Reference:** For complete Proxmox cluster orchestration, networking, and storage details, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#5-proxmox-cluster-orchestration)**.
|
||||
|
||||
- **ml110 (192.168.11.10)**: mgmt + seed services + initial automation runner
|
||||
- **r630-01..04**: production compute
|
||||
|
||||
### Proxmox Networking (per host)
|
||||
|
||||
- **`vmbr0`**: VLAN-aware bridge
|
||||
- Native VLAN: 11 (MGMT)
|
||||
- Tagged VLANs: 110,111,112,120,121,130,132,133,134,140,141,150,160,200–203
|
||||
- **Proxmox host IP** remains on **VLAN 11** only.
|
||||
|
||||
### Storage Orchestration (R630)
|
||||
|
||||
**Hardware:**
|
||||
- 2×600GB boot (mirror recommended)
|
||||
- 6×250GB SSD
|
||||
|
||||
**Recommended:**
|
||||
- **Boot drives**: ZFS mirror or hardware RAID1
|
||||
- **Data SSDs**: ZFS pool (striped mirrors if you can pair, or RAIDZ1/2 depending on risk tolerance)
|
||||
- **High-write workloads** (logs/metrics/indexers) on dedicated dataset with quotas
|
||||
**Summary:**
|
||||
- **Node Layout**: ml110 (mgmt) + r630-01..04 (compute)
|
||||
- **Networking**: VLAN-aware bridge `vmbr0` with native VLAN 11
|
||||
- **Storage**: ZFS recommended for R630 data SSDs
|
||||
|
||||
---
|
||||
|
||||
## Cloudflare Zero Trust Orchestration
|
||||
|
||||
### cloudflared Gateway Pattern
|
||||
> **Reference:** For complete Cloudflare Zero Trust orchestration, cloudflared gateway pattern, and tunnel configuration, see **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#6-cloudflare-zero-trust-orchestration)**.
|
||||
|
||||
Run **2 cloudflared LXCs** for redundancy:
|
||||
**Summary:**
|
||||
- **2 cloudflared LXCs** for redundancy (ML110 + R630)
|
||||
- **Tunnels for**: Blockscout, FireFly, Gitea, internal admin dashboards
|
||||
- **Proxmox UI**: LAN-only (publish via Cloudflare Access if needed)
|
||||
|
||||
- `cloudflared-1` on ML110
|
||||
- `cloudflared-2` on an R630
|
||||
|
||||
Both run tunnels for:
|
||||
- Blockscout
|
||||
- FireFly
|
||||
- Gitea
|
||||
- Internal admin dashboards (Grafana) behind Cloudflare Access
|
||||
|
||||
**Keep Proxmox UI LAN-only**; if needed, publish via Cloudflare Access with strict posture/MFA.
|
||||
For detailed Cloudflare configuration guides, see:
|
||||
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)**
|
||||
- **[../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md](../04-configuration/cloudflare/CLOUDFLARE_DNS_TO_CONTAINERS.md)**
|
||||
|
||||
---
|
||||
|
||||
## VMID Allocation Registry
|
||||
|
||||
### Authoritative Registry Summary
|
||||
> **Reference:** For complete VMID allocation registry with detailed breakdowns, see **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)**.
|
||||
|
||||
| VMID Range | Domain | Count | Notes |
|-----------:|--------|------:|-------|
| 1000–4999 | **Besu** | 4,000 | Validators, Sentries, RPC, Archive, Reserved |
| 5000–5099 | **Blockscout** | 100 | Explorer/Indexing |
| 5200–5299 | **Cacti** | 100 | Interop middleware |
| 5400–5599 | **CCIP** | 200 | Ops, Monitoring, Commit, Execute, RMN, Reserved |
| 6000–6099 | **Fabric** | 100 | Enterprise contracts |
| 6200–6299 | **FireFly** | 100 | Workflow/orchestration |
| 6400–7399 | **Indy** | 1,000 | Identity layer |
| 7800–8999 | **Sankofa/Phoenix/PanTel** | 1,200 | Service + Cloud + Telecom |
| 10000–13999 | **Phoenix Sovereign Cloud Band** | 4,000 | SMOM/ICCC/DBIS/AR tenants |
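
When provisioning, it helps to resolve a VMID to its domain programmatically instead of reading the table each time. The following is a small sketch under the assumption that the rows above are the complete set of ranges; `domain_for` is a hypothetical helper, and IDs that fall in the gaps between ranges (for example 5100–5199) resolve to `None`.

```python
# (first VMID, last VMID, domain), copied from the registry summary table
VMID_RANGES = [
    (1000, 4999, "Besu"),
    (5000, 5099, "Blockscout"),
    (5200, 5299, "Cacti"),
    (5400, 5599, "CCIP"),
    (6000, 6099, "Fabric"),
    (6200, 6299, "FireFly"),
    (6400, 7399, "Indy"),
    (7800, 8999, "Sankofa/Phoenix/PanTel"),
    (10000, 13999, "Phoenix Sovereign Cloud Band"),
]


def domain_for(vmid):
    """Return the domain that owns a VMID, or None if it is in an unassigned gap."""
    for first, last, domain in VMID_RANGES:
        if first <= vmid <= last:
            return domain
    return None


# Number of IDs covered by the rows listed in the summary table only
listed_total = sum(last - first + 1 for first, last, _ in VMID_RANGES)

print(domain_for(1503))    # a Besu sentry ID
print(domain_for(5000))    # the Blockscout container
print(domain_for(5100))    # falls in a gap between ranges
print(listed_total)
```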
|
||||
**Summary:**
|
||||
- **Total Allocated**: 11,000 VMIDs (1000-13999)
|
||||
- **Besu Network**: 4,000 VMIDs (1000-4999)
|
||||
- **CCIP**: 200 VMIDs (5400-5599)
|
||||
- **Sovereign Cloud Band**: 4,000 VMIDs (10000-13999)
|
||||
|
||||
**Total Allocated**: 11,000 VMIDs (1000-13999)
|
||||
|
||||
See **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** for complete details.
|
||||
See also **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md#7-complete-vmid-and-network-allocation-table)** for VMID-to-VLAN mapping.
|
||||
|
||||
---
|
||||
|
||||
@@ -295,6 +183,33 @@ See **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** for complete specific
|
||||
|
||||
## Deployment Orchestration Workflow
|
||||
|
||||
### Deployment Workflow Diagram
|
||||
|
||||
```mermaid
flowchart TD
    Start[Start Deployment] --> Phase0[Phase 0: Validate Foundation]
    Phase0 --> Check1{Foundation Valid?}
    Check1 -->|No| Fix1[Fix Issues]
    Fix1 --> Phase0
    Check1 -->|Yes| Phase1[Phase 1: Enable VLANs]
    Phase1 --> Verify1{VLANs Working?}
    Verify1 -->|No| FixVLAN[Fix VLAN Config]
    FixVLAN --> Phase1
    Verify1 -->|Yes| Phase2[Phase 2: Deploy Observability]
    Phase2 --> Verify2{Monitoring Active?}
    Verify2 -->|No| FixMonitor[Fix Monitoring]
    FixMonitor --> Phase2
    Verify2 -->|Yes| Phase3[Phase 3: Deploy CCIP Fleet]
    Phase3 --> Verify3{CCIP Nodes Running?}
    Verify3 -->|No| FixCCIP[Fix CCIP Config]
    FixCCIP --> Phase3
    Verify3 -->|Yes| Phase4[Phase 4: Deploy Sovereign Tenants]
    Phase4 --> Verify4{Tenants Operational?}
    Verify4 -->|No| FixTenants[Fix Tenant Config]
    FixTenants --> Phase4
    Verify4 -->|Yes| Complete[Deployment Complete]
```
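
The flowchart describes a gated sequence: each phase runs, is verified, and is repeated after a fix until verification passes. One way to express that loop in automation is sketched below. The callbacks `run_phase`, `verify`, and `fix` are placeholders for whatever scripts perform the real work, and the `max_attempts` limit is an added assumption (the diagram itself loops without a bound).

```python
# Phase names taken from the deployment workflow diagram
PHASES = [
    "Phase 0: Validate Foundation",
    "Phase 1: Enable VLANs",
    "Phase 2: Deploy Observability",
    "Phase 3: Deploy CCIP Fleet",
    "Phase 4: Deploy Sovereign Tenants",
]


def run_deployment(run_phase, verify, fix, max_attempts=3):
    """Run each phase in order; never start a phase before the previous one verifies."""
    completed = []
    for phase in PHASES:
        for _attempt in range(max_attempts):
            run_phase(phase)
            if verify(phase):
                completed.append(phase)
                break
            fix(phase)  # remediate, then retry the same phase
        else:
            # Verification never passed: stop instead of deploying on a broken base
            raise RuntimeError(f"{phase} still failing after {max_attempts} attempts")
    return completed


if __name__ == "__main__":
    # Stub callbacks: pretend the VLAN phase fails verification once
    pending_failures = {"Phase 1: Enable VLANs": 1}

    def verify(phase):
        if pending_failures.get(phase, 0) > 0:
            pending_failures[phase] -= 1
            return False
        return True

    done = run_deployment(
        run_phase=lambda phase: print("running", phase),
        verify=verify,
        fix=lambda phase: print("fixing", phase),
    )
    print(f"{len(done)} phases complete")
```

Bounding the retries is a deliberate choice in this sketch: a phase that cannot be fixed automatically should surface to an operator rather than loop indefinitely.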
|
||||
|
||||
### Phase 0 — Validate Foundation
|
||||
|
||||
1. ✅ Confirm ER605-A WAN1 static: **76.53.10.34/28**, GW **76.53.10.33**
|
||||
@@ -336,9 +251,9 @@ See **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** for complete specific
|
||||
|
||||
### Network Operations
|
||||
|
||||
- **[ER605_ROUTER_CONFIGURATION.md](ER605_ROUTER_CONFIGURATION.md)** - Router configuration guide
|
||||
- **[BESU_ALLOWLIST_RUNBOOK.md](BESU_ALLOWLIST_RUNBOOK.md)** - Besu allowlist management
|
||||
- **[CLOUDFLARE_ZERO_TRUST_GUIDE.md](CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
|
||||
- **[../04-configuration/ER605_ROUTER_CONFIGURATION.md](../04-configuration/ER605_ROUTER_CONFIGURATION.md)** - Router configuration guide
|
||||
- **[../06-besu/BESU_ALLOWLIST_RUNBOOK.md](../06-besu/BESU_ALLOWLIST_RUNBOOK.md)** - Besu allowlist management
|
||||
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
|
||||
|
||||
### Deployment Operations
|
||||
|
||||
@@ -348,8 +263,8 @@ See **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** for complete specific
|
||||
|
||||
### Troubleshooting
|
||||
|
||||
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Common issues and solutions
|
||||
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
|
||||
- **[../09-troubleshooting/TROUBLESHOOTING_FAQ.md](../09-troubleshooting/TROUBLESHOOTING_FAQ.md)** - Common issues and solutions
|
||||
- **[../09-troubleshooting/QBFT_TROUBLESHOOTING.md](../09-troubleshooting/QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
|
||||
|
||||
---
|
||||
|
||||
@@ -394,34 +309,52 @@ Then we can produce:
|
||||
## Related Documentation
|
||||
|
||||
### Prerequisites
|
||||
- **[PREREQUISITES.md](PREREQUISITES.md)** - System requirements and prerequisites
|
||||
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** - Pre-deployment validation checklist
|
||||
- **[../01-getting-started/PREREQUISITES.md](../01-getting-started/PREREQUISITES.md)** - System requirements and prerequisites
|
||||
- **[../03-deployment/DEPLOYMENT_READINESS.md](../03-deployment/DEPLOYMENT_READINESS.md)** - Pre-deployment validation checklist
|
||||
|
||||
### Architecture
|
||||
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** - Complete network architecture
|
||||
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** - VMID allocation registry
|
||||
- **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment specification
|
||||
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Complete network architecture (authoritative reference)
|
||||
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory and specifications
|
||||
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** ⭐⭐⭐ - VMID allocation registry
|
||||
- **[DOMAIN_STRUCTURE.md](DOMAIN_STRUCTURE.md)** ⭐⭐ - Domain structure and DNS assignments
|
||||
- **[CCIP_DEPLOYMENT_SPEC.md](../07-ccip/CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment specification
|
||||
|
||||
### Configuration
|
||||
- **[ER605_ROUTER_CONFIGURATION.md](ER605_ROUTER_CONFIGURATION.md)** - Router configuration
|
||||
- **[CLOUDFLARE_ZERO_TRUST_GUIDE.md](CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
|
||||
- **[../04-configuration/ER605_ROUTER_CONFIGURATION.md](../04-configuration/ER605_ROUTER_CONFIGURATION.md)** - Router configuration
|
||||
- **[../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md](../04-configuration/cloudflare/CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare Zero Trust setup
|
||||
|
||||
### Operations
|
||||
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational procedures
|
||||
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Deployment status
|
||||
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Troubleshooting guide
|
||||
- **[../03-deployment/OPERATIONAL_RUNBOOKS.md](../03-deployment/OPERATIONAL_RUNBOOKS.md)** - Operational procedures
|
||||
- **[../03-deployment/DEPLOYMENT_STATUS_CONSOLIDATED.md](../03-deployment/DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Deployment status
|
||||
- **[../09-troubleshooting/TROUBLESHOOTING_FAQ.md](../09-troubleshooting/TROUBLESHOOTING_FAQ.md)** - Troubleshooting guide
|
||||
|
||||
### Best Practices
|
||||
- **[RECOMMENDATIONS_AND_SUGGESTIONS.md](RECOMMENDATIONS_AND_SUGGESTIONS.md)** - Comprehensive recommendations
|
||||
- **[IMPLEMENTATION_CHECKLIST.md](IMPLEMENTATION_CHECKLIST.md)** - Implementation checklist
|
||||
- **[../10-best-practices/RECOMMENDATIONS_AND_SUGGESTIONS.md](../10-best-practices/RECOMMENDATIONS_AND_SUGGESTIONS.md)** - Comprehensive recommendations
|
||||
- **[../10-best-practices/IMPLEMENTATION_CHECKLIST.md](../10-best-practices/IMPLEMENTATION_CHECKLIST.md)** - Implementation checklist
|
||||
|
||||
### Reference
|
||||
- **[MASTER_INDEX.md](MASTER_INDEX.md)** - Complete documentation index
|
||||
|
||||
---
|
||||
|
||||
**Document Status:** Complete (v1.0)
|
||||
**Document Status:** Complete (v1.1)
|
||||
**Maintained By:** Infrastructure Team
|
||||
**Review Cycle:** Monthly
|
||||
**Last Updated:** 2025-01-20
|
||||
|
||||
---
|
||||
|
||||
## Change Log
|
||||
|
||||
### Version 1.1 (2025-01-20)
|
||||
- Removed duplicate network architecture content
|
||||
- Added references to NETWORK_ARCHITECTURE.md
|
||||
- Added deployment workflow Mermaid diagram
|
||||
- Added ASCII art process flow
|
||||
- Added breadcrumb navigation
|
||||
- Added status indicators
|
||||
|
||||
### Version 1.0 (2024-12-15)
|
||||
- Initial version
|
||||
- Complete deployment orchestration guide
|
||||
|
||||
|
||||
250
docs/02-architecture/PROXMOX_CLUSTER_ARCHITECTURE.md
Normal file
@@ -0,0 +1,250 @@
|
||||
# Proxmox Cluster Architecture
|
||||
|
||||
**Last Updated:** 2025-01-20
|
||||
**Document Version:** 1.0
|
||||
**Status:** Active Documentation
|
||||
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This document describes the Proxmox cluster architecture, including node configuration, storage setup, network bridges, and VM/container distribution.
|
||||
|
||||
---
|
||||
|
||||
## Cluster Architecture Diagram
|
||||
|
||||
```mermaid
graph TB
    Cluster[Proxmox Cluster<br/>Name: h]

    ML110[ML110 Management Node<br/>192.168.11.10<br/>6 cores, 125GB RAM]
    R6301[R630-01<br/>192.168.11.11<br/>32 cores, 503GB RAM]
    R6302[R630-02<br/>192.168.11.12<br/>32 cores, 503GB RAM]
    R6303[R630-03<br/>192.168.11.13<br/>32 cores, 512GB RAM]
    R6304[R630-04<br/>192.168.11.14<br/>32 cores, 512GB RAM]

    Cluster --> ML110
    Cluster --> R6301
    Cluster --> R6302
    Cluster --> R6303
    Cluster --> R6304

    ML110 --> Storage1[local: 94GB<br/>local-lvm: 813GB]
    R6301 --> Storage2[local: 536GB<br/>local-lvm: Available]
    R6302 --> Storage3[local: Available<br/>local-lvm: Available]
    R6303 --> Storage4[Storage: Available]
    R6304 --> Storage5[Storage: Available]

    ML110 --> Bridge1[vmbr0<br/>VLAN-aware]
    R6301 --> Bridge2[vmbr0<br/>VLAN-aware]
    R6302 --> Bridge3[vmbr0<br/>VLAN-aware]
    R6303 --> Bridge4[vmbr0<br/>VLAN-aware]
    R6304 --> Bridge5[vmbr0<br/>VLAN-aware]
```
|
||||
|
||||
---
|
||||
|
||||
## Cluster Nodes
|
||||
|
||||
### Node Summary
|
||||
|
||||
| Hostname | IP Address | CPU | RAM | Storage | VMs/Containers | Status |
|----------|------------|-----|-----|---------|----------------|--------|
| ml110 | 192.168.11.10 | 6 cores @ 1.60GHz | 125GB | local (94GB), local-lvm (813GB) | 34 | ✅ Active |
| r630-01 | 192.168.11.11 | 32 cores @ 2.40GHz | 503GB | local (536GB), local-lvm (available) | 0 | ✅ Active |
| r630-02 | 192.168.11.12 | 32 cores @ 2.40GHz | 503GB | local (available), local-lvm (available) | 0 | ✅ Active |
| r630-03 | 192.168.11.13 | 32 cores | 512GB | Available | 0 | ✅ Active |
| r630-04 | 192.168.11.14 | 32 cores | 512GB | Available | 0 | ✅ Active |
|
||||
|
||||
---
|
||||
|
||||
## Storage Configuration
|
||||
|
||||
### Storage Types
|
||||
|
||||
**local (Directory Storage):**
|
||||
- Type: Directory-based storage
|
||||
- Used for: ISO images, container templates, backups
|
||||
- Location: `/var/lib/vz`
|
||||
|
||||
**local-lvm (LVM Thin Storage):**
|
||||
- Type: LVM thin provisioning
|
||||
- Used for: VM/container disk images
|
||||
- Benefits: Thin provisioning, snapshots, efficient space usage
|
||||
|
||||
### Storage by Node
|
||||
|
||||
**ml110:**
|
||||
- `local`: 94GB total, 7.4GB used (7.87%)
|
||||
- `local-lvm`: 813GB total, 214GB used (26.29%)
|
||||
- Status: ✅ Active and operational
|
||||
|
||||
**r630-01:**
|
||||
- `local`: 536GB total, 0% used
|
||||
- `local-lvm`: Available (needs activation)
|
||||
- Status: ⏳ Storage available, ready for use
|
||||
|
||||
**r630-02:**
|
||||
- `local`: Available
|
||||
- `local-lvm`: Available (needs activation)
|
||||
- Status: ⏳ Storage available, ready for use
|
||||
|
||||
**r630-03/r630-04:**
|
||||
- Storage: Available
|
||||
- Status: ⏳ Ready for configuration
|
||||
|
||||
---
|
||||
|
||||
## Network Configuration
|
||||
|
||||
### Network Bridge (vmbr0)
|
||||
|
||||
**All nodes use VLAN-aware bridge:**
|
||||
|
||||
```bash
# Bridge configuration (all nodes)
auto vmbr0
iface vmbr0 inet static
    address 192.168.11.<HOST_IP>/24
    gateway 192.168.11.1
    bridge-ports <PHYSICAL_INTERFACE>
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 11 110 111 112 120 121 130 132 133 134 140 141 150 160 200 201 202 203
```
|
||||
|
||||
**Bridge Features:**
|
||||
- **VLAN-aware:** Supports multiple VLANs on single bridge
|
||||
- **Native VLAN:** 11 (MGMT-LAN)
|
||||
- **Tagged VLANs:** All service VLANs (110-203)
|
||||
- **802.1Q Trunking:** Enabled for VLAN support
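
Since the bridge stanza differs per node only in the host address and the physical interface, it can be rendered from a template to keep all five hosts identical. The sketch below shows one possible approach; `render_vmbr0` is a hypothetical helper and `eno1` is an assumed interface name that must be replaced with the real NIC on each host.

```python
# VLAN IDs carried on vmbr0, matching the bridge-vids line above
BRIDGE_VIDS = [11, 110, 111, 112, 120, 121, 130, 132, 133, 134,
               140, 141, 150, 160, 200, 201, 202, 203]


def render_vmbr0(host_octet, physical_interface):
    """Render the vmbr0 stanza for one node on the 192.168.11.0/24 management LAN."""
    vids = " ".join(str(vid) for vid in BRIDGE_VIDS)
    lines = [
        "auto vmbr0",
        "iface vmbr0 inet static",
        f"    address 192.168.11.{host_octet}/24",
        "    gateway 192.168.11.1",
        f"    bridge-ports {physical_interface}",
        "    bridge-stp off",
        "    bridge-fd 0",
        "    bridge-vlan-aware yes",
        f"    bridge-vids {vids}",
    ]
    return "\n".join(lines)


# Example: r630-01 (192.168.11.11); "eno1" is an assumed NIC name
print(render_vmbr0(11, "eno1"))
```

Rendering from a single VLAN list avoids drift between nodes when a VLAN is added or retired.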
|
||||
|
||||
---
|
||||
|
||||
## VM/Container Distribution
|
||||
|
||||
### Current Distribution
|
||||
|
||||
**ml110 (192.168.11.10):**
|
||||
- **Total:** 34 containers/VMs
|
||||
- **Services:** All current services running here
|
||||
- **Breakdown:**
|
||||
- Besu validators: 5 (VMIDs 1000-1004)
|
||||
- Besu sentries: 4 (VMIDs 1500-1503)
|
||||
- Besu RPC: 3+ (VMIDs 2500-2502+)
|
||||
- Blockscout: 1 (VMID 5000)
|
||||
- DBIS services: Multiple
|
||||
- Other services: Various
|
||||
|
||||
**r630-01, r630-02, r630-03, r630-04:**
|
||||
- **Total:** 0 containers/VMs
|
||||
- **Status:** Ready for VM migration/deployment
|
||||
|
||||
---
|
||||
|
||||
## High Availability
|
||||
|
||||
### Current Setup
|
||||
|
||||
- **Cluster Name:** "h"
|
||||
- **HA Mode:** Active/Standby (manual)
|
||||
- **Quorum:** 3+ nodes required for quorum
|
||||
- **Storage:** Local storage (not shared)
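
The quorum figure follows from simple majority voting. As a brief sketch, assuming one vote per node (no weighted votes or external quorum device), the required votes and tolerated node losses for the five-node cluster work out as follows:

```python
def quorum_votes(nodes):
    """Votes needed for a strict majority, assuming one vote per node."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Nodes that can be offline while the remaining nodes still hold quorum."""
    return nodes - quorum_votes(nodes)


cluster_nodes = 5  # ml110 + r630-01..04
print(quorum_votes(cluster_nodes))        # 3
print(tolerated_failures(cluster_nodes))  # 2
```

With five nodes the cluster keeps quorum with any two nodes offline, although workloads on a failed node still need manual recovery because storage is local.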
|
||||
|
||||
### HA Considerations
|
||||
|
||||
**Current Limitations:**
|
||||
- No shared storage (each node has local storage)
|
||||
- Manual VM migration required
|
||||
- No automatic failover
|
||||
|
||||
**Future Enhancements:**
|
||||
- Consider shared storage (NFS, Ceph, etc.) for true HA
|
||||
- Implement automatic VM migration
|
||||
- Configure HA groups for critical services
|
||||
|
||||
---
|
||||
|
||||
## Resource Allocation
|
||||
|
||||
### CPU Resources
|
||||
|
||||
| Node | CPU Cores | CPU Usage | Available |
|------|-----------|-----------|-----------|
| ml110 | 6 @ 1.60GHz | High | Limited |
| r630-01 | 32 @ 2.40GHz | Low | Excellent |
| r630-02 | 32 @ 2.40GHz | Low | Excellent |
| r630-03 | 32 cores | Low | Excellent |
| r630-04 | 32 cores | Low | Excellent |
|
||||
|
||||
### Memory Resources
|
||||
|
||||
| Node | Total RAM | Used | Available | Usage % |
|------|-----------|------|-----------|---------|
| ml110 | 125GB | 94GB | 31GB | 75% ⚠️ |
| r630-01 | 503GB | ~5GB | ~498GB | 1% ✅ |
| r630-02 | 503GB | ~5GB | ~498GB | 1% ✅ |
| r630-03 | 512GB | Low | High | Low ✅ |
| r630-04 | 512GB | Low | High | Low ✅ |
|
||||
|
||||
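The usage percentages in the memory table can be reproduced with integer arithmetic. A minimal sketch; `mem_usage_pct` and `check_mem` are hypothetical helper names, and the 70% warning threshold is an assumption, not a value taken from this document:

```bash
#!/usr/bin/env bash
# Hypothetical helper: memory usage as a percentage, rounded to the nearest integer.
mem_usage_pct() {
  local used_gb="$1" total_gb="$2"
  echo $(( (used_gb * 100 + total_gb / 2) / total_gb ))
}

# Flag a node when usage reaches the threshold (default 70 is an assumed value).
check_mem() {
  local node="$1" used_gb="$2" total_gb="$3" threshold="${4:-70}"
  local pct
  pct="$(mem_usage_pct "$used_gb" "$total_gb")"
  if [ "$pct" -ge "$threshold" ]; then
    echo "$node: ${pct}% used (WARN)"
  else
    echo "$node: ${pct}% used (ok)"
  fi
}

check_mem ml110 94 125    # prints "ml110: 75% used (WARN)"
check_mem r630-01 5 503   # prints "r630-01: 1% used (ok)"
```
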
---

## Storage Recommendations

### For R630 Nodes

**Boot Drives (2×600GB):**
- **Recommended:** ZFS mirror or hardware RAID1
- **Purpose:** Proxmox OS and boot files
- **Benefits:** Redundancy, data integrity

**Data SSDs (6×250GB):**
- **Option 1:** ZFS striped mirrors (3 pairs)
  - Capacity: ~750GB usable
  - Performance: High
  - Redundancy: Good

- **Option 2:** ZFS RAIDZ1 (5 drives + 1 parity)
  - Capacity: ~1.25TB usable
  - Performance: Good
  - Redundancy: Single drive failure tolerance

- **Option 3:** ZFS RAIDZ2 (4 drives + 2 parity)
  - Capacity: ~1TB usable
  - Performance: Good
  - Redundancy: Dual drive failure tolerance

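The capacity figures for the three data-SSD options follow from simple arithmetic on drive count and size. A minimal sketch that reproduces them; `zfs_usable_gb` is a hypothetical helper, and it ignores ZFS metadata and padding overhead, so real usable space will be somewhat lower:

```bash
#!/usr/bin/env bash
# Hypothetical helper: approximate usable capacity in GB for a pool of equal-size drives.
# Overhead (metadata, padding, reserved space) is deliberately ignored.
zfs_usable_gb() {
  local layout="$1" drives="$2" size_gb="$3"
  case "$layout" in
    mirror) echo $(( drives / 2 * size_gb )) ;;     # striped 2-way mirrors
    raidz1) echo $(( (drives - 1) * size_gb )) ;;   # one drive's worth of parity
    raidz2) echo $(( (drives - 2) * size_gb )) ;;   # two drives' worth of parity
    *) echo "unknown layout: $layout" >&2; return 1 ;;
  esac
}

for layout in mirror raidz1 raidz2; do
  echo "$layout: $(zfs_usable_gb "$layout" 6 250) GB"
done
# prints:
# mirror: 750 GB
# raidz1: 1250 GB
# raidz2: 1000 GB
```
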
---

## Network Recommendations

### VLAN Configuration

**All Proxmox hosts should:**
- Use VLAN-aware bridge (vmbr0)
- Support all 19 VLANs
- Maintain native VLAN 11 for management
- Enable 802.1Q trunking on physical interfaces

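As an illustration of the VLAN-aware bridge requirement, the sketch below prints a candidate `/etc/network/interfaces` stanza. It is only a sketch under stated assumptions: the NIC name `eno1` is hypothetical, `render_vmbr0` is a helper invented for this example, `bridge-vids 2-4094` simply allows every VLAN rather than the specific 19, and the native VLAN 11 is assumed to be set on the switch trunk port rather than on the host.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a VLAN-aware vmbr0 stanza for review.
# It does not modify any file; redirect the output manually once checked.
render_vmbr0() {
  local nic="$1" cidr="$2" gateway="$3"
  cat <<EOF
auto ${nic}
iface ${nic} inet manual

auto vmbr0
iface vmbr0 inet static
    address ${cidr}
    gateway ${gateway}
    bridge-ports ${nic}
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
EOF
}

# Example for ml110; "eno1" is an assumed interface name.
render_vmbr0 eno1 192.168.11.10/24 192.168.11.1
```

Printing the stanza instead of writing it keeps the change reviewable, since a wrong bridge definition can cut off management access to the host.
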
### Network Performance

- **Link Speed:** Ensure 1Gbps or higher for trunk ports
- **Jumbo Frames:** Consider enabling if supported
- **Bonding:** Consider link aggregation for redundancy

---

## Related Documentation

- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Network architecture with VLAN plan
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
- **[PROXMOX_COMPREHENSIVE_REVIEW.md](PROXMOX_COMPREHENSIVE_REVIEW.md)** ⭐⭐ - Comprehensive Proxmox review
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration

---

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Review Cycle:** Quarterly

483
docs/02-architecture/PROXMOX_COMPREHENSIVE_REVIEW.md
Normal file
@@ -0,0 +1,483 @@
# Proxmox VE Comprehensive Configuration Review

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation

---

## Executive Summary

### ✅ Completed Tasks
- [x] Hostname migration (pve → r630-01, pve2 → r630-02)
- [x] IP address audit (no conflicts found)
- [x] Proxmox services verified (all operational)
- [x] Storage configuration reviewed

### ⚠️ Issues Identified
- r630-01 and r630-02 have LVM thin storage **disabled**
- All VMs/containers currently on ml110 only
- Storage not optimized for performance on r630-01/r630-02

---

## Hostname Migration - COMPLETE ✅

### Status
- **r630-01** (192.168.11.11): ✅ Hostname changed from `pve` to `r630-01`
- **r630-02** (192.168.11.12): ✅ Hostname changed from `pve2` to `r630-02`

### Verification
```bash
ssh root@192.168.11.11 "hostname"  # Returns: r630-01 ✅
ssh root@192.168.11.12 "hostname"  # Returns: r630-02 ✅
```

### Notes
- Both hosts are in a cluster (cluster name: "h")
- Cluster configuration may need update to reflect new hostnames
- /etc/hosts updated on both hosts for proper resolution

---

## IP Address Audit - COMPLETE ✅

### Results
- **Total VMs/Containers:** 34 with static IPs
- **IP Conflicts:** 0 ✅
- **Invalid IPs:** 0 ✅
- **DHCP IPs:** 2 (VMIDs 3500, 3501)

### All VMs Currently On
- **ml110** (192.168.11.10): All 34 VMs/containers
- **r630-01** (192.168.11.11): 0 VMs/containers
- **r630-02** (192.168.11.12): 0 VMs/containers

### IP Allocation Summary
| IP Range | Count | Purpose |
|----------|-------|---------|
| 192.168.11.57 | 1 | Firefly (stopped) |
| 192.168.11.60-63 | 4 | ML nodes |
| 192.168.11.64 | 1 | Indy |
| 192.168.11.80 | 1 | Cacti |
| 192.168.11.100-104 | 5 | Besu Validators |
| 192.168.11.105-106 | 2 | DBIS PostgreSQL |
| 192.168.11.112 | 1 | Fabric |
| 192.168.11.120 | 1 | DBIS Redis |
| 192.168.11.130 | 1 | DBIS Frontend |
| 192.168.11.150-154 | 5 | Besu Sentries |
| 192.168.11.155-156 | 2 | DBIS API |
| 192.168.11.201-204 | 4 | Named RPC |
| 192.168.11.240-242 | 3 | ThirdWeb RPC |
| 192.168.11.250-254 | 5 | Public RPC |

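The conflict check behind the audit results can be sketched as a small filter. This is not the contents of `scripts/check-all-vm-ips.sh`, which is not shown here; `find_duplicate_ips` is a hypothetical helper that assumes its input is one `VMID IP` pair per line:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "VMID IP" pairs on stdin and print every IP
# that is assigned more than once.
find_duplicate_ips() {
  awk '{ print $2 }' | sort | uniq -d
}

# Made-up sample input that deliberately contains a conflict
# (the real audit reported none).
printf '%s\n' \
  '1000 192.168.11.100' \
  '1001 192.168.11.101' \
  '9999 192.168.11.100' | find_duplicate_ips
# prints: 192.168.11.100
```

An empty result means no address is assigned twice.
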
---

## Proxmox Host Configuration Review

### ml110 (192.168.11.10)

| Property | Value | Status |
|----------|-------|--------|
| **Hostname** | ml110 | ✅ Correct |
| **Proxmox Version** | 9.1.0 (kernel 6.17.4-1-pve) | ✅ Current |
| **CPU** | Intel Xeon E5-2603 v3 @ 1.60GHz (6 cores) | ⚠️ Older, slower |
| **Memory** | 125GB total, 94GB used, 31GB available | ⚠️ High usage |
| **Storage - local** | 94GB total, 7.4GB used (7.87%) | ✅ Good |
| **Storage - local-lvm** | 813GB total, 214GB used (26.29%) | ✅ Active |
| **VMs/Containers** | 34 total | ✅ All here |

**Storage Details:**
- `local`: Directory storage, active, 94GB total (about 87GB free)
- `local-lvm`: LVM thin, active, about 600GB free
- `thin1-thin6`: Configured but disabled (not in use)

**Recommendations:**
- ⚠️ **CPU is older/slower** - Consider workload distribution
- ⚠️ **Memory usage high (75%)** - Monitor closely
- ✅ **Storage well configured** - LVM thin active and working

### r630-01 (192.168.11.11) - Previously "pve"

| Property | Value | Status |
|----------|-------|--------|
| **Hostname** | r630-01 | ✅ Migrated |
| **Proxmox Version** | 9.1.0 (kernel 6.17.4-1-pve) | ✅ Current |
| **CPU** | Intel Xeon E5-2630 v3 @ 2.40GHz (32 cores) | ✅ Good |
| **Memory** | 503GB total, 6.4GB used, 497GB available | ✅ Excellent |
| **Storage - local** | 536GB total, 0.1GB used (0.00%) | ✅ Available |
| **Storage - local-lvm** | **DISABLED** | ⚠️ **Issue** |
| **Storage - thin1-thin6** | **DISABLED** | ⚠️ **Issue** |
| **VMs/Containers** | 0 | ⏳ Ready for deployment |

**Storage Details:**
- **Volume Group:** `pve` exists with 2 physical volumes
- **Thin Pools:** `data` (200GB) and `thin1` (208GB) exist
- **Disks:** 4 disks (sda, sdb: 558GB each; sdc, sdd: 232GB each)
- **LVM Setup:** Properly configured
- **Storage Config Issue:** Storage configured but node references point to "pve" (old hostname) or "pve2"

**Issues:**
- ⚠️ **Storage configured but node references outdated** - Points to "pve" instead of "r630-01"
- ⚠️ **Storage may show as disabled** - Due to hostname mismatch in config
- ⚠️ **Need to update storage.cfg** - Update node references to r630-01

**Recommendations:**
- 🔴 **CRITICAL:** Enable local-lvm storage to use existing LVM thin pools
- 🔴 **CRITICAL:** Activate thin1 storage for better performance
- ✅ **Ready for VMs** - Excellent resources available

### r630-02 (192.168.11.12) - Previously "pve2"

| Property | Value | Status |
|----------|-------|--------|
| **Hostname** | r630-02 | ✅ Migrated |
| **Proxmox Version** | 9.1.0 (kernel 6.17.4-1-pve) | ✅ Current |
| **CPU** | Intel Xeon E5-2660 v4 @ 2.00GHz (56 cores) | ✅ Excellent |
| **Memory** | 251GB total, 4.4GB used, 247GB available | ✅ Excellent |
| **Storage - local** | 220GB total, 0.1GB used (0.06%) | ✅ Available |
| **Storage - local-lvm** | **DISABLED** | ⚠️ **Issue** |
| **Storage - thin1-thin6** | **DISABLED** | ⚠️ **Issue** |
| **VMs/Containers** | 0 | ⏳ Ready for deployment |

**Storage Details:**
- Need to check LVM configuration (command timed out)
- Storage shows as disabled in Proxmox

**Issues:**
- ⚠️ **Storage configured but node references outdated** - Points to "pve2" instead of "r630-02"
- ⚠️ **VMs already exist on storage** - Need to verify they're accessible
- ⚠️ **Need to update storage.cfg** - Update node references to r630-02

**Recommendations:**
- 🔴 **CRITICAL:** Check and configure LVM storage
- 🔴 **CRITICAL:** Enable local-lvm or thin storage
- ✅ **Ready for VMs** - Excellent resources available

---

## Storage Configuration Analysis

### Current Storage Status

| Host | Storage Type | Status | Size | Usage | Recommendation |
|------|--------------|--------|------|-------|----------------|
| **ml110** | local | ✅ Active | 94GB | 7.87% | ✅ Good |
| **ml110** | local-lvm | ✅ Active | 813GB | 26.29% | ✅ Good |
| **r630-01** | local | ✅ Active | 536GB | 0.00% | ✅ Ready |
| **r630-01** | local-lvm | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |
| **r630-01** | thin1 | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |
| **r630-02** | local | ✅ Active | 220GB | 0.06% | ✅ Ready |
| **r630-02** | local-lvm | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |
| **r630-02** | thin1-thin6 | ❌ Disabled | 0GB | N/A | 🔴 **Enable** |

### Storage Issues

#### r630-01 Storage Issue
**Problem:** LVM thin pools exist (`data` 200GB, `thin1` 208GB) but Proxmox storage is disabled

**Root Cause:** The storage is defined in Proxmox, but its node references in `storage.cfg` still point to the old hostname (`pve`), so it is not active on r630-01

**Solution:**
```bash
# Update storage.cfg node references on r630-01
ssh root@192.168.11.11
# Update node references from "pve" to "r630-01"
sed -i 's/nodes pve$/nodes r630-01/' /etc/pve/storage.cfg
sed -i 's/nodes pve /nodes r630-01 /' /etc/pve/storage.cfg
# Enable storage (errors are suppressed if a storage ID does not exist)
pvesm set local-lvm --disable 0 2>/dev/null || true
pvesm set thin1 --disable 0 2>/dev/null || true
```

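The `sed` commands above match `nodes pve` only at the end of a line or before a space, so a comma-separated node list such as `nodes pve,pve2` would be left untouched. The sketch below is one possible, non-destructive alternative: it prints the rewritten file to stdout so the change can be reviewed first. `rename_node_in_cfg` is a hypothetical helper, it assumes GNU `sed` and plain alphanumeric hostnames, and it only touches `nodes` lines because the volume group on r630-01 is also named `pve`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print FILE with node name OLD replaced by NEW,
# but only on "nodes ..." lines (so "vgname pve" is left alone).
# Handles single names and comma-separated lists; does not edit FILE.
rename_node_in_cfg() {
  local old="$1" new="$2" file="$3"
  sed -E "/^[[:space:]]*nodes[[:space:]]/ s/([[:space:],])${old}(,|[[:space:]]|\$)/\1${new}\2/" "$file"
}
```

A possible way to use it on the host is to preview the change with `rename_node_in_cfg pve r630-01 /etc/pve/storage.cfg | diff /etc/pve/storage.cfg -` and apply it only once the diff looks right.
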
#### r630-02 Storage Issue
**Problem:** Storage disabled, LVM configuration unknown

**Solution:**
```bash
# Update storage.cfg node references on r630-02
ssh root@192.168.11.12
# Update node references from "pve2" to "r630-02"
sed -i 's/nodes pve2$/nodes r630-02/' /etc/pve/storage.cfg
sed -i 's/nodes pve2 /nodes r630-02 /' /etc/pve/storage.cfg
# Enable all thin storage pools
for storage in thin1 thin2 thin3 thin4 thin5 thin6; do
  pvesm set "$storage" --disable 0 2>/dev/null || true
done
```

---

## Critical Recommendations

### 1. Enable LVM Thin Storage on r630-01 and r630-02 🔴 CRITICAL

**Priority:** HIGH
**Impact:** Cannot migrate VMs or create new VMs with optimal storage

**Action Required:**
1. Enable `local-lvm` storage on both hosts
2. Activate `thin1` storage pools if they exist
3. Verify storage is accessible and working

**Script Available:** `scripts/enable-local-lvm-storage.sh` (may need updates)

### 2. Distribute VMs Across Hosts ⚠️ RECOMMENDED

**Current State:** All 34 VMs on ml110 (overloaded)

**Recommendation:**
- Migrate some VMs to r630-01 and r630-02
- Balance workload across all three hosts
- Use r630-01/r630-02 for new deployments

**Benefits:**
- Better resource utilization
- Improved performance (ml110 CPU is slower)
- Better redundancy

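One cautious way to approach the migration is to generate the commands first and review them before running anything. The sketch below only prints commands. It assumes the guests are LXC containers (for QEMU VMs the equivalent would be `qm migrate`), that restart-mode migration is acceptable because storage is local, and that the target node already has working storage. `plan_migration` is a hypothetical helper and the VMIDs are placeholders chosen for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one migrate command per container ID.
plan_migration() {
  local target="$1"; shift
  local vmid
  for vmid in "$@"; do
    echo "pct migrate ${vmid} ${target} --restart"
  done
}

# Placeholder VMIDs; replace with the containers actually selected for the move.
plan_migration r630-01 1500 1501
# prints:
# pct migrate 1500 r630-01 --restart
# pct migrate 1501 r630-01 --restart
```
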
### 3. Update Cluster Configuration ⚠️ RECOMMENDED

**Issue:** Hostnames changed but cluster may still reference old names

**Action:**
```bash
# Check cluster configuration
pvecm status
pvecm nodes

# Update if needed (may require cluster reconfiguration)
```

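To check whether the cluster still references the old hostnames, the node list in the corosync configuration can be searched for them. A minimal sketch, assuming the configuration lives at `/etc/pve/corosync.conf` and lists nodes with `name: <hostname>` lines; `stale_cluster_names` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each given hostname that still appears
# as a "name:" entry in the given corosync configuration file.
stale_cluster_names() {
  local file="$1"; shift
  local name
  for name in "$@"; do
    if grep -Eq "^[[:space:]]*name:[[:space:]]*${name}[[:space:]]*\$" "$file"; then
      echo "$name"
    fi
  done
  return 0
}
```

On a cluster node, a call such as `stale_cluster_names /etc/pve/corosync.conf pve pve2` prints any old name that is still present; no output suggests the node list already uses the new hostnames.
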
### 4. Storage Performance Optimization ⚠️ RECOMMENDED

**Current:**
- ml110: Using local-lvm (good)
- r630-01: Only local (directory) available (slower)
- r630-02: Only local (directory) available (slower)

**Recommendation:**
- Enable LVM thin storage on r630-01/r630-02 for better performance
- Use thin provisioning for space efficiency
- Monitor storage usage

### 5. Resource Monitoring ⚠️ RECOMMENDED

**ml110:**
- Memory usage: 75% (high) - Monitor closely
- CPU: Older/slower - Consider workload reduction

**r630-01/r630-02:**
- Excellent resources available
- Ready for heavy workloads

---

## Detailed Recommendations by Category

### Storage Recommendations

#### Immediate Actions
1. **Enable local-lvm on r630-01**
   - LVM thin pools already exist
   - Just need to activate in Proxmox
   - Will enable efficient storage for VMs

2. **Configure storage on r630-02**
   - Check LVM configuration
   - Enable appropriate storage type
   - Ensure compatibility with cluster

3. **Verify storage after enabling**
   - Test VM creation
   - Test storage migration
   - Monitor performance

#### Long-term Actions
1. **Implement storage monitoring**
   - Set up alerts for storage usage >80%
   - Monitor thin pool usage
   - Track storage growth trends

2. **Consider shared storage**
   - For easier VM migration
   - For better redundancy
   - NFS or Ceph options

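The 80% alert mentioned above could start as a simple filter over `pvesm status` output. A minimal sketch, assuming the usual column layout of that command (Name, Type, Status, Total, Used, Available, %); `storage_over_threshold` is a hypothetical helper, and how the result is delivered (mail, monitoring system) is left open:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pvesm status` output on stdin and print
# active storages whose usage exceeds the threshold (default 80).
# Assumes the columns: Name Type Status Total Used Available %
storage_over_threshold() {
  local threshold="${1:-80}"
  awk -v t="$threshold" '
    NR > 1 && $3 == "active" {
      pct = $7
      sub(/%/, "", pct)          # strip the trailing percent sign
      if (pct + 0 > t) print $1, $7
    }'
}
```

On a host this would be used as `pvesm status | storage_over_threshold 80`, for example from a cron job.
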
### Network Recommendations

#### Current Status
- All hosts on 192.168.11.0/24 network
- Flat network (no VLANs yet)
- Gateway: 192.168.11.1 (ER605-1)

#### Recommendations
1. **VLAN Migration** (Planned)
   - Segment network by service type
   - Improve security and isolation
   - Better traffic management

2. **Network Monitoring**
   - Monitor bandwidth usage
   - Track network performance
   - Alert on network issues

### Cluster Recommendations

#### Current Status
- Cluster name: "h"
- 3 nodes: ml110, r630-01, r630-02
- Cluster operational

#### Recommendations
1. **Update Cluster Configuration**
   - Verify hostname changes reflected in cluster
   - Update any references to old hostnames
   - Test cluster operations

2. **Cluster Quorum**
   - Ensure quorum is maintained
   - Monitor cluster health
   - Document cluster procedures

### Performance Recommendations

#### ml110
- **CPU:** Older/slower - Consider reducing workload
- **Memory:** High usage - Monitor and optimize
- **Storage:** Well configured - No changes needed

#### r630-01
- **CPU:** Good performance - Ready for workloads
- **Memory:** Excellent - Can handle many VMs
- **Storage:** Needs activation - Critical fix needed

#### r630-02
- **CPU:** Excellent (56 cores) - Best performance
- **Memory:** Excellent - Can handle many VMs
- **Storage:** Needs configuration - Critical fix needed

---

## Action Items

### Critical (Do Before Starting VMs)

1. ✅ **Hostname Migration** - COMPLETE
2. ✅ **IP Address Audit** - COMPLETE
3. 🔴 **Enable local-lvm storage on r630-01** - PENDING
4. 🔴 **Configure storage on r630-02** - PENDING
5. ⚠️ **Verify cluster configuration** - PENDING

### High Priority

1. ⚠️ **Test VM creation on r630-01/r630-02** - After storage enabled
2. ⚠️ **Update cluster configuration** - Verify hostname changes
3. ⚠️ **Plan VM distribution** - Balance workload across hosts

### Medium Priority

1. ⚠️ **Implement storage monitoring** - Set up alerts
2. ⚠️ **Document storage procedures** - For future reference
3. ⚠️ **Plan VLAN migration** - Network segmentation

---

## Verification Checklist

### Hostname Verification
- [x] r630-01 hostname correct
- [x] r630-02 hostname correct
- [x] /etc/hosts updated on both hosts
- [ ] Cluster configuration updated (if needed)

### IP Address Verification
- [x] No conflicts detected
- [x] No invalid IPs
- [x] All IPs documented
- [x] IP audit script working

### Storage Verification
- [x] ml110 storage working
- [ ] r630-01 local-lvm enabled
- [ ] r630-02 storage configured
- [ ] Storage tested and working

### Service Verification
- [x] All Proxmox services running
- [x] Web interfaces accessible
- [x] Cluster operational
- [ ] Storage accessible

---

## Next Steps

### Immediate (Before Starting VMs)

1. **Enable Storage on r630-01:**
   ```bash
   ssh root@192.168.11.11
   # Check current storage config
   cat /etc/pve/storage.cfg
   # Enable local-lvm
   pvesm set local-lvm --disable 0
   # Or reconfigure if needed
   ```

2. **Configure Storage on r630-02:**
   ```bash
   ssh root@192.168.11.12
   # Check LVM setup
   vgs
   lvs
   # Configure appropriate storage
   ```

3. **Verify Storage:**
   ```bash
   # On each host
   pvesm status
   # Should show local-lvm as active
   ```

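The final verification step can be turned into a pass/fail check instead of a visual inspection. A minimal sketch, again assuming the `pvesm status` column layout (Name, Type, Status, ...); `storage_is_active` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `pvesm status` output on stdin and succeed
# (exit status 0) only if the named storage is listed as "active".
storage_is_active() {
  local name="$1"
  awk -v n="$name" '$1 == n && $3 == "active" { found = 1 } END { exit !found }'
}
```

For example, `pvesm status | storage_is_active local-lvm && echo "local-lvm is active"` on each host gives a result that a script can act on.
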
### After Storage is Enabled

1. **Test VM Creation:**
   - Create test container on r630-01
   - Create test container on r630-02
   - Verify storage works correctly

2. **Start VMs:**
   - All IPs verified, no conflicts
   - Hostnames correct
   - Storage ready

---

## Scripts Available

1. **`scripts/check-all-vm-ips.sh`** - ✅ Working - IP audit
2. **`scripts/migrate-hostnames-proxmox.sh`** - ✅ Complete - Hostname migration
3. **`scripts/diagnose-proxmox-hosts.sh`** - ✅ Working - Diagnostics
4. **`scripts/enable-local-lvm-storage.sh`** - ⏳ May need updates for r630-01/r630-02

---

## Related Documentation

### Architecture Documents
- **[PHYSICAL_HARDWARE_INVENTORY.md](PHYSICAL_HARDWARE_INVENTORY.md)** ⭐⭐⭐ - Physical hardware inventory
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** ⭐⭐⭐ - Network architecture
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Deployment orchestration

### Deployment Documents
- **[../03-deployment/PRE_START_CHECKLIST.md](../03-deployment/PRE_START_CHECKLIST.md)** - Pre-start checklist
- **[../03-deployment/LVM_THIN_PVE_ENABLED.md](../03-deployment/LVM_THIN_PVE_ENABLED.md)** - LVM thin storage setup
- **[../09-troubleshooting/STORAGE_MIGRATION_ISSUE.md](../09-troubleshooting/STORAGE_MIGRATION_ISSUE.md)** - Storage migration troubleshooting

---

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Review Cycle:** Quarterly

@@ -1,6 +1,12 @@
# Final VMID Allocation Plan

**Updated**: Complete sovereign-scale allocation with all domains
**Navigation:** [Home](../README.md) > [Architecture](README.md) > VMID Allocation

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** 🟢 Active Documentation

---

## Complete VMID Allocation Table
