Refactor code for improved readability and performance

This commit is contained in:
defiQUG
2025-12-21 22:32:09 -08:00
parent 79e3c02f50
commit b45c2006be
2259 changed files with 380318 additions and 2 deletions

View File

@@ -0,0 +1,284 @@
# Deployment Readiness Checklist
**Target:** ml110-01 (192.168.11.10)
**Status:** ✅ **READY FOR DEPLOYMENT**
**Date:** $(date)
---
## ✅ Pre-Deployment Validation
### System Prerequisites
- [x] Node.js 16+ installed (v22.21.1) ✅
- [x] pnpm 8+ installed (10.24.0) ✅
- [x] Git installed (2.43.0) ✅
- [x] Required tools (curl, jq, bash) ✅
### Workspace Setup
- [x] Project structure organized ✅
- [x] All submodules initialized ✅
- [x] All dependencies installed ✅
- [x] Scripts directory organized ✅
- [x] Documentation organized ✅
### Configuration
- [x] `.env` file configured ✅
- [x] PROXMOX_HOST set (192.168.11.10) ✅
- [x] PROXMOX_USER set (root@pam) ✅
- [x] PROXMOX_TOKEN_NAME set (mcp-server) ✅
- [x] PROXMOX_TOKEN_VALUE configured ✅
- [x] API connection verified ✅
- [x] Deployment configs created ✅
### Validation Results
- [x] Prerequisites: 32/32 passing (100%) ✅
- [x] Deployment validation: 41/41 passing (100%) ✅
- [x] API connection: Working (Proxmox 9.1.1) ✅
- [x] Storage accessible ✅
- [x] Templates accessible ✅
- [x] No VMID conflicts ✅
---
## 🚀 Deployment Steps
### Step 1: Review Configuration
```bash
# Review deployment configuration
cat smom-dbis-138-proxmox/config/proxmox.conf
cat smom-dbis-138-proxmox/config/network.conf
```
**Key Settings:**
- Target Node: Auto-detected from Proxmox
- Storage: local-lvm (or configured storage)
- Network: 10.3.1.0/24
- VMID Ranges: Configured (106-153)
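The "No VMID conflicts" check above amounts to comparing the configured range against what already exists on the host. A minimal sketch of that comparison, with the existing VMID list mocked (on the Proxmox host it would come from `pct list | awk 'NR>1 {print $1}'`):

```shell
# Mocked existing VMIDs; replace with real pct list output on the host
existing="100 101 105"

# Walk the configured range and flag any overlap
for vmid in $(seq 106 153); do
  for e in $existing; do
    if [ "$vmid" -eq "$e" ]; then
      echo "CONFLICT: VMID $vmid already in use"
    fi
  done
done
echo "VMID range 106-153 checked"
```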
### Step 2: Verify Resources
**Estimated Requirements:**
- Memory: ~96GB
- Disk: ~1.35TB
- CPU: ~42 cores (can be shared)
**Current Status:**
- Check available resources on ml110-01
- Ensure sufficient capacity for deployment
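The capacity check can be scripted; the sketch below uses a placeholder for available memory (on the host, derive it from `free -g` or `pvesh get /nodes/<node>/status`):

```shell
required_gb=96
available_gb=128   # placeholder for illustration; read from the host in practice

if [ "$available_gb" -ge "$required_gb" ]; then
  echo "memory OK: ${available_gb}GB available >= ${required_gb}GB required"
else
  echo "memory SHORT: need $(( required_gb - available_gb ))GB more"
fi
```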
### Step 3: Run Deployment
**Option A: Deploy Everything (Recommended)**
```bash
cd smom-dbis-138-proxmox
sudo ./scripts/deployment/deploy-all.sh
```
**Option B: Deploy Step-by-Step**
```bash
cd smom-dbis-138-proxmox
# 1. Deploy Besu nodes
sudo ./scripts/deployment/deploy-besu-nodes.sh
# 2. Deploy services
sudo ./scripts/deployment/deploy-services.sh
# 3. Deploy Hyperledger services
sudo ./scripts/deployment/deploy-hyperledger-services.sh
# 4. Deploy monitoring
sudo ./scripts/deployment/deploy-monitoring.sh
# 5. Deploy explorer
sudo ./scripts/deployment/deploy-explorer.sh
```
### Step 4: Post-Deployment
After containers are created:
1. **Copy Configuration Files**
```bash
# Copy genesis.json and configs to containers
# (Adjust paths as needed)
```
2. **Copy Validator Keys**
```bash
# Copy keys to validator containers only
```
3. **Update Static Nodes**
```bash
./scripts/network/update-static-nodes.sh
```
4. **Start Services**
```bash
# Start Besu services in containers
```
5. **Verify Deployment**
```bash
# Check container status
# Verify network connectivity
# Test RPC endpoints
```
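Step 3 above distributes the peer list; for reference, `static-nodes.json` is a plain JSON array of enode URIs, one per peer. A sketch of its shape (the node IDs are placeholders, not real keys):

```shell
# Write an illustrative static-nodes.json; real files use 128-hex-char node IDs
cat > /tmp/static-nodes.json <<'EOF'
[
  "enode://<validator-1-node-id>@10.3.1.10:30303",
  "enode://<validator-2-node-id>@10.3.1.11:30303"
]
EOF
grep -c 'enode://' /tmp/static-nodes.json
```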
---
## 📋 Deployment Components
### Phase 1: Blockchain Core (Besu)
- **Validators** (VMID 106-109): 4 nodes
- **Sentries** (VMID 110-114): 3 nodes
- **RPC Nodes** (VMID 115-119): 3 nodes
### Phase 2: Services
- **Oracle Publisher** (VMID 120)
- **CCIP Monitor** (VMID 121)
- **Keeper** (VMID 122)
- **Financial Tokenization** (VMID 123)
### Phase 3: Hyperledger Services
- **Firefly** (VMID 150)
- **Cacti** (VMID 151)
- **Fabric** (VMID 152) - Optional
- **Indy** (VMID 153) - Optional
### Phase 4: Monitoring
- **Monitoring Stack** (VMID 130)
### Phase 5: Explorer
- **Blockscout** (VMID 140)
**Total Containers:** ~20-25
---
## ⚠️ Important Notes
### Resource Considerations
- Memory warning: Estimated ~96GB needed, verify available capacity
- Disk space: ~1.35TB estimated, ensure sufficient storage
- CPU: Can be shared, but ensure adequate cores
### Network Configuration
- Subnet: 10.3.1.0/24
- Gateway: 10.3.1.1
- VLANs: Configured per node type
### Security
- API token configured and working
- Containers will be created with proper permissions
- Network isolation via VLANs
---
## 🔍 Verification Commands
### Check Deployment Status
```bash
# List all containers
pct list
# Check specific container
pct status <vmid>
# View container config
pct config <vmid>
```
### Test Connectivity
```bash
# Test RPC endpoint
curl -X POST http://10.3.1.40:8545 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
```
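`eth_blockNumber` returns the height as a hex string, so a sanity check usually decodes it. A sketch using an illustrative response value:

```shell
# Example response body (the block number here is illustrative)
response='{"jsonrpc":"2.0","id":1,"result":"0x4b7"}'

# Extract the hex result and print it as decimal
hex=$(printf '%s' "$response" | sed -n 's/.*"result":"\(0x[0-9a-fA-F]*\)".*/\1/p')
printf 'block height: %d\n' "$hex"
```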
### Monitor Resources
```bash
# Check node resources
pvesh get /nodes/<node>/status
# Check storage
pvesh get /nodes/<node>/storage
```
---
## 📊 Deployment Timeline
**Estimated Time:**
- Besu nodes: ~30-60 minutes
- Services: ~15-30 minutes
- Hyperledger: ~30-45 minutes
- Monitoring: ~15-20 minutes
- Explorer: ~20-30 minutes
- **Total: ~2-3 hours** (depending on resources)
---
## 🆘 Troubleshooting
### If Deployment Fails
1. **Check Logs**
```bash
tail -f smom-dbis-138-proxmox/logs/deployment-*.log
```
2. **Verify Resources**
```bash
./scripts/validate-ml110-deployment.sh
```
3. **Check API Connection**
```bash
./scripts/test-connection.sh
```
4. **Review Configuration**
```bash
cat smom-dbis-138-proxmox/config/proxmox.conf
```
---
## ✅ Final Checklist
Before starting deployment:
- [x] All prerequisites met
- [x] Configuration reviewed
- [x] Resources verified
- [x] API connection working
- [x] Storage accessible
- [x] Templates available
- [x] No VMID conflicts
- [ ] Backup plan in place (recommended)
- [ ] Deployment window scheduled (if production)
- [ ] Team notified (if applicable)
---
## 🎯 Ready to Deploy
**Status:** ✅ **ALL SYSTEMS GO**
All validations passed. The system is ready for deployment to ml110-01.
**Next Command:**
```bash
cd smom-dbis-138-proxmox && sudo ./scripts/deployment/deploy-all.sh
```
---
**Last Updated:** $(date)
**Validation Status:** ✅ Complete
**Deployment Status:** ✅ Ready

View File

@@ -0,0 +1,258 @@
# Deployment Status - Consolidated
**Last Updated:** 2025-01-20
**Document Version:** 2.0
**Status:** Active Deployment
---
## Overview
This document consolidates all deployment status information into a single authoritative source. It replaces multiple status documents with one comprehensive view.
---
## Current Deployment Status
### Proxmox Host: ml110 (192.168.11.10)
**Status:** ✅ Operational
### Active Containers
| VMID | Hostname | Status | IP Address | VLAN | Service Status | Notes |
|------|----------|--------|------------|------|----------------|-------|
| 1000 | besu-validator-1 | ✅ Running | 192.168.11.100 | 11 (mgmt) | ✅ Active | Static IP |
| 1001 | besu-validator-2 | ✅ Running | 192.168.11.101 | 11 (mgmt) | ✅ Active | Static IP |
| 1002 | besu-validator-3 | ✅ Running | 192.168.11.102 | 11 (mgmt) | ✅ Active | Static IP |
| 1003 | besu-validator-4 | ✅ Running | 192.168.11.103 | 11 (mgmt) | ✅ Active | Static IP |
| 1004 | besu-validator-5 | ✅ Running | 192.168.11.104 | 11 (mgmt) | ✅ Active | Static IP |
| 1500 | besu-sentry-1 | ✅ Running | 192.168.11.150 | 11 (mgmt) | ✅ Active | Static IP |
| 1501 | besu-sentry-2 | ✅ Running | 192.168.11.151 | 11 (mgmt) | ✅ Active | Static IP |
| 1502 | besu-sentry-3 | ✅ Running | 192.168.11.152 | 11 (mgmt) | ✅ Active | Static IP |
| 1503 | besu-sentry-4 | ✅ Running | 192.168.11.153 | 11 (mgmt) | ✅ Active | Static IP |
| 2500 | besu-rpc-1 | ✅ Running | 192.168.11.250 | 11 (mgmt) | ✅ Active | Static IP |
| 2501 | besu-rpc-2 | ✅ Running | 192.168.11.251 | 11 (mgmt) | ✅ Active | Static IP |
| 2502 | besu-rpc-3 | ✅ Running | 192.168.11.252 | 11 (mgmt) | ✅ Active | Static IP |
**Total Active Containers:** 12
**Total Memory:** 104GB
**Total CPU Cores:** 40 cores
### Network Status
**Current Network:** Flat LAN (192.168.11.0/24)
**VLAN Migration:** ⏳ Pending
**Target Network:** VLAN-based (see [NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md))
### Service Status
**Besu Services:**
- ✅ 5 Validators: Active
- ✅ 4 Sentries: Active
- ✅ 3 RPC Nodes: Active
**Consensus:**
- ✅ QBFT consensus operational
- ✅ Block production: Normal
- ✅ Validator participation: 5/5
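The 5/5 participation figure sits on standard BFT arithmetic: with n validators, roughly f = (n-1)/3 faulty nodes can be tolerated, and ceil(2n/3) must agree on each block. A quick check for this deployment:

```shell
n=5
f=$(( (n - 1) / 3 ))            # tolerated faulty validators
quorum=$(( (2 * n + 2) / 3 ))   # integer ceil(2n/3)
echo "n=$n tolerated_faults=$f quorum=$quorum"
```

So this 5-validator set keeps producing blocks with one validator down, but not two.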
---
## Deployment Phases
### Phase 0 — Foundation ✅
- [x] ER605-A WAN1 configured: 76.53.10.34/28
- [x] Proxmox mgmt accessible
- [x] Basic containers deployed
### Phase 1 — VLAN Enablement ⏳
- [ ] ES216G trunk ports configured
- [ ] VLAN-aware bridge enabled on Proxmox
- [ ] VLAN interfaces created on ER605
- [ ] Services migrated to VLANs
### Phase 2 — Observability ⏳
- [ ] Monitoring stack deployed
- [ ] Grafana published via Cloudflare Access
- [ ] Alerts configured
### Phase 3 — CCIP Fleet ⏳
- [ ] CCIP Ops/Admin deployed
- [ ] 16 commit nodes deployed
- [ ] 16 execute nodes deployed
- [ ] 7 RMN nodes deployed
- [ ] NAT pools configured
### Phase 4 — Sovereign Tenants ⏳
- [ ] Sovereign VLANs configured
- [ ] Tenant isolation enforced
- [ ] Access control configured
---
## Resource Usage
### Current Resources (ml110)
| Resource | Allocated | Available | Usage % |
|----------|-----------|-----------|---------|
| Memory | 104GB | [TBD] | [TBD] |
| CPU Cores | 40 | [TBD] | [TBD] |
| Disk | ~1.2TB | [TBD] | [TBD] |
### Planned Resources (R630 Cluster)
| Node | Memory | CPU | Disk | Status |
|------|--------|-----|------|--------|
| r630-01 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |
| r630-02 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |
| r630-03 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |
| r630-04 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |
---
## Network Architecture
### Current (Flat LAN)
- **Network:** 192.168.11.0/24
- **Gateway:** 192.168.11.1
- **All services:** On same network
### Target (VLAN-based)
See **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** for complete VLAN plan.
**Key VLANs:**
- VLAN 11: MGMT-LAN (192.168.11.0/24) - Legacy compatibility
- VLAN 110: BESU-VAL (10.110.0.0/24) - Validators
- VLAN 111: BESU-SEN (10.111.0.0/24) - Sentries
- VLAN 112: BESU-RPC (10.112.0.0/24) - RPC nodes
- VLAN 132: CCIP-COMMIT (10.132.0.0/24) - CCIP Commit nodes
- VLAN 133: CCIP-EXEC (10.133.0.0/24) - CCIP Execute nodes
- VLAN 134: CCIP-RMN (10.134.0.0/24) - CCIP RMN nodes
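The plan above maps VLAN ID N to subnet 10.N.0.0/24; generating the table from the IDs makes drift between VLAN config and addressing easy to spot:

```shell
# Derive each subnet from its VLAN ID per the convention above
for vlan in 110 111 112 132 133 134; do
  echo "VLAN $vlan -> 10.$vlan.0.0/24"
done
```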
---
## Public IP Blocks
### Block #1 (Configured)
- **Network:** 76.53.10.32/28
- **Gateway:** 76.53.10.33
- **ER605 WAN1:** 76.53.10.34
- **Usage:** Router WAN + break-glass VIPs
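For sizing: a /28 holds 2^(32-28) = 16 addresses, of which the network (.32) and broadcast (.47) are unusable, leaving .33 through .46:

```shell
prefix=28
total=$(( 1 << (32 - prefix) ))   # 2^(32-prefix) addresses in the block
usable=$(( total - 2 ))           # minus network and broadcast
echo "total=$total usable=$usable"
```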
### Blocks #2-6 (Pending)
- **Block #2:** CCIP Commit egress NAT pool
- **Block #3:** CCIP Execute egress NAT pool
- **Block #4:** RMN egress NAT pool
- **Block #5:** Sankofa/Phoenix/PanTel service egress
- **Block #6:** Sovereign Cloud Band tenant egress
See **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** for details.
---
## Known Issues
### Resolved ✅
- ✅ VMID 1000 IP configuration fixed (now 192.168.11.100)
- ✅ Besu services active (11/12 services running)
- ✅ Validator key issues resolved
### Pending ⏳
- ⏳ VLAN migration not started
- ⏳ CCIP fleet not deployed
- ⏳ Monitoring stack not deployed
- ⏳ Cloudflare Zero Trust not configured
---
## Next Steps
### Immediate (This Week)
1. **Complete VLAN Planning**
- Finalize VLAN configuration
- Plan migration sequence
- Prepare migration scripts
2. **Deploy Monitoring Stack**
- Prometheus
- Grafana
- Loki
- Alertmanager
3. **Configure Cloudflare Zero Trust**
- Set up cloudflared tunnels
- Publish applications
- Configure access policies
### Short-term (This Month)
1. **VLAN Migration**
- Configure ES216G switches
- Enable VLAN-aware bridge
- Migrate services
2. **CCIP Fleet Deployment**
- Deploy Ops/Admin nodes
- Deploy Commit nodes
- Deploy Execute nodes
- Deploy RMN nodes
3. **NAT Pool Configuration**
- Configure Block #2-6 (when assigned)
- Set up role-based egress NAT
- Test allowlisting
### Long-term (This Quarter)
1. **Sovereign Tenant Rollout**
- Configure tenant VLANs
- Deploy tenant services
- Enforce isolation
2. **High Availability**
- Deploy R630 cluster
- Configure HA for critical services
- Test failover
---
## References
### Architecture
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** - Complete network architecture
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment guide
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** - VMID allocation
### Deployment
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Validated set deployment
- **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** - Deployment readiness
### Operations
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational runbooks
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Troubleshooting guide
---
**Document Status:** Active
**Maintained By:** Infrastructure Team
**Review Cycle:** Weekly
**Last Updated:** 2025-01-20

View File

@@ -0,0 +1,351 @@
# Operational Runbooks - Master Index
**Last Updated:** 2025-01-20
**Document Version:** 1.0
---
## Overview
This document provides a master index of all operational runbooks and procedures for the Sankofa/Phoenix/PanTel Proxmox deployment.
---
## Quick Reference
### Emergency Procedures
- **[Emergency Access](#emergency-access)** - Break-glass access procedures
- **[Service Recovery](#service-recovery)** - Recovering failed services
- **[Network Recovery](#network-recovery)** - Network connectivity issues
### Common Operations
- **[Adding a Validator](#adding-a-validator)** - Add new validator node
- **[Removing a Validator](#removing-a-validator)** - Remove validator node
- **[Upgrading Besu](#upgrading-besu)** - Besu version upgrade
- **[Key Rotation](#key-rotation)** - Validator key rotation
---
## Network Operations
### ER605 Router Configuration
- **[ER605_ROUTER_CONFIGURATION.md](ER605_ROUTER_CONFIGURATION.md)** - Complete router configuration guide
- **VLAN Configuration** - Setting up VLANs on ER605
- **NAT Pool Configuration** - Configuring role-based egress NAT
- **Failover Configuration** - Setting up WAN failover
### VLAN Management
- **VLAN Migration** - Migrating from flat LAN to VLANs
- **VLAN Troubleshooting** - Common VLAN issues and solutions
- **Inter-VLAN Routing** - Configuring routing between VLANs
### Cloudflare Zero Trust
- **[CLOUDFLARE_ZERO_TRUST_GUIDE.md](CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Complete Cloudflare setup
- **Tunnel Management** - Managing cloudflared tunnels
- **Application Publishing** - Publishing applications via Cloudflare Access
- **Access Policy Management** - Managing access policies
---
## Besu Operations
### Node Management
#### Adding a Validator
**Prerequisites:**
- Validator key generated
- VMID allocated (1000-1499 range)
- VLAN 110 configured (if migrated)
**Steps:**
1. Create LXC container with VMID
2. Install Besu
3. Configure validator key
4. Add to static-nodes.json on all nodes
5. Update allowlist (if using permissioning)
6. Start Besu service
7. Verify validator is participating
**See:** [VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)
#### Removing a Validator
**Prerequisites:**
- Validator is not critical (check quorum requirements)
- Backup validator key
**Steps:**
1. Stop Besu service
2. Remove from static-nodes.json on all nodes
3. Update allowlist (if using permissioning)
4. Remove container (optional)
5. Document removal
#### Upgrading Besu
**Prerequisites:**
- Backup current configuration
- Test upgrade in dev environment
- Create snapshot before upgrade
**Steps:**
1. Create snapshot: `pct snapshot <vmid> pre-upgrade-$(date +%Y%m%d)`
2. Stop Besu service
3. Backup configuration and keys
4. Install new Besu version
5. Update configuration if needed
6. Start Besu service
7. Verify node is syncing
8. Monitor for issues
**Rollback:**
- If issues occur: `pct rollback <vmid> pre-upgrade-YYYYMMDD`
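Because the snapshot name embeds the date, the matching rollback target on a given day is deterministic. A sketch that assembles both commands without executing them (`<vmid>` stays a placeholder):

```shell
# Build the snapshot/rollback command pair for today's date
snap="pre-upgrade-$(date +%Y%m%d)"
echo "pct snapshot <vmid> $snap"
echo "pct rollback <vmid> $snap"
```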
### Allowlist Management
- **[BESU_ALLOWLIST_RUNBOOK.md](BESU_ALLOWLIST_RUNBOOK.md)** - Complete allowlist guide
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Quick start for allowlist issues
**Common Operations:**
- Generate allowlist from nodekeys
- Update allowlist on all nodes
- Verify allowlist is correct
- Troubleshoot allowlist issues
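For orientation, the node allowlist Besu permissioning reads is a TOML file with a `nodes-allowlist` array of enode URIs (key name per current Besu docs; the node IDs below are placeholders):

```shell
# Write an illustrative permissions_config.toml
cat > /tmp/permissions_config.toml <<'EOF'
nodes-allowlist=[
  "enode://<node-1-id>@192.168.11.100:30303",
  "enode://<node-2-id>@192.168.11.101:30303"
]
EOF
grep -c 'enode://' /tmp/permissions_config.toml
```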
### Consensus Troubleshooting
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **Block Production Issues** - Troubleshooting block production
- **Validator Recognition** - Validator not being recognized
---
## CCIP Operations
### CCIP Deployment
- **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** - Complete CCIP deployment specification
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment orchestration
**Deployment Phases:**
1. Deploy Ops/Admin nodes (5400-5401)
2. Deploy Monitoring nodes (5402-5403)
3. Deploy Commit nodes (5410-5425)
4. Deploy Execute nodes (5440-5455)
5. Deploy RMN nodes (5470-5476)
### CCIP Node Management
- **Adding CCIP Node** - Add new CCIP node to fleet
- **Removing CCIP Node** - Remove CCIP node from fleet
- **CCIP Node Troubleshooting** - Common CCIP issues
---
## Monitoring & Observability
### Monitoring Setup
- **[MONITORING_SUMMARY.md](MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring
**Components:**
- Prometheus metrics collection
- Grafana dashboards
- Loki log aggregation
- Alertmanager alerting
### Health Checks
- **Node Health Checks** - Check individual node health
- **Service Health Checks** - Check service status
- **Network Health Checks** - Check network connectivity
**Scripts:**
- `check-node-health.sh` - Node health check script
- `check-service-status.sh` - Service status check
---
## Backup & Recovery
### Backup Procedures
- **Configuration Backup** - Backup all configuration files
- **Validator Key Backup** - Encrypted backup of validator keys
- **Container Backup** - Backup container configurations
**Automated Backups:**
- Scheduled daily backups
- Encrypted storage
- Multiple locations
- 30-day retention
### Disaster Recovery
- **Service Recovery** - Recover failed services
- **Network Recovery** - Recover network connectivity
- **Full System Recovery** - Complete system recovery
**Recovery Procedures:**
1. Identify failure point
2. Restore from backup
3. Verify service status
4. Monitor for issues
---
## Security Operations
### Key Management
- **[SECRETS_KEYS_CONFIGURATION.md](SECRETS_KEYS_CONFIGURATION.md)** - Secrets and keys management
- **Validator Key Rotation** - Rotate validator keys
- **API Token Rotation** - Rotate API tokens
### Access Control
- **SSH Key Management** - Manage SSH keys
- **Cloudflare Access** - Manage Cloudflare Access policies
- **Firewall Rules** - Manage firewall rules
---
## Troubleshooting
### Common Issues
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Common issues and solutions
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting
### Diagnostic Procedures
1. **Check Service Status**
```bash
systemctl status besu-validator
```
2. **Check Logs**
```bash
journalctl -u besu-validator -f
```
3. **Check Network Connectivity**
```bash
ping <node-ip>
```
4. **Check Node Health**
```bash
./scripts/health/check-node-health.sh <vmid>
```
---
## Emergency Procedures
### Emergency Access
**Break-glass Access:**
1. Use emergency SSH endpoint (if configured)
2. Access via Cloudflare Access (if available)
3. Physical console access (last resort)
**Emergency Contacts:**
- Infrastructure Team: [contact info]
- On-call Engineer: [contact info]
### Service Recovery
**Priority Order:**
1. Validators (critical for consensus)
2. RPC nodes (critical for access)
3. Monitoring (important for visibility)
4. Other services
**Recovery Steps:**
1. Identify failed service
2. Check service logs
3. Restart service
4. If restart fails, restore from backup
5. Verify service is operational
### Network Recovery
**Network Issues:**
1. Check ER605 router status
2. Check switch status
3. Check VLAN configuration
4. Check firewall rules
5. Test connectivity
**VLAN Issues:**
1. Verify VLAN configuration on switches
2. Verify VLAN configuration on ER605
3. Verify Proxmox bridge configuration
4. Test inter-VLAN routing
---
## Maintenance Windows
### Scheduled Maintenance
- **Weekly:** Health checks, log review
- **Monthly:** Security updates, configuration review
- **Quarterly:** Full system review, backup testing
### Maintenance Procedures
1. **Notify Stakeholders** - Send maintenance notification
2. **Create Snapshots** - Snapshot all containers before changes
3. **Perform Maintenance** - Execute maintenance tasks
4. **Verify Services** - Verify all services are operational
5. **Document Changes** - Document all changes made
---
## Related Documentation
### Troubleshooting
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Common issues and solutions - **Start here for problems**
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting
### Architecture & Design
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** - Network architecture
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment guide
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** - VMID allocation
### Configuration
- **[ER605_ROUTER_CONFIGURATION.md](ER605_ROUTER_CONFIGURATION.md)** - Router configuration
- **[CLOUDFLARE_ZERO_TRUST_GUIDE.md](CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare setup
- **[SECRETS_KEYS_CONFIGURATION.md](SECRETS_KEYS_CONFIGURATION.md)** - Secrets management
### Deployment
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Validated set deployment
- **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** - Deployment readiness
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Current deployment status
### Monitoring
- **[MONITORING_SUMMARY.md](MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring
### Reference
- **[MASTER_INDEX.md](MASTER_INDEX.md)** - Complete documentation index
---
**Document Status:** Active
**Maintained By:** Infrastructure Team
**Review Cycle:** Monthly
**Last Updated:** 2025-01-20

View File

@@ -0,0 +1,28 @@
# Deployment & Operations
This directory contains deployment guides and operational procedures.
## Documents
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Complete enterprise deployment orchestration
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Validated set deployment procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** ⭐⭐⭐ - All operational procedures
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** ⭐⭐ - Pre-deployment validation checklist
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](DEPLOYMENT_STATUS_CONSOLIDATED.md)** ⭐⭐⭐ - Current deployment status
- **[RUN_DEPLOYMENT.md](RUN_DEPLOYMENT.md)** ⭐⭐ - Deployment execution guide
- **[REMOTE_DEPLOYMENT.md](REMOTE_DEPLOYMENT.md)** ⭐ - Remote deployment procedures
## Quick Reference
**Deployment Paths:**
- **Enterprise Deployment:** Start with ORCHESTRATION_DEPLOYMENT_GUIDE.md
- **Validated Set:** Start with VALIDATED_SET_DEPLOYMENT_GUIDE.md
- **Operations:** See OPERATIONAL_RUNBOOKS.md for all procedures
## Related Documentation
- **[../02-architecture/](../02-architecture/)** - Architecture reference
- **[../04-configuration/](../04-configuration/)** - Configuration guides
- **[../09-troubleshooting/](../09-troubleshooting/)** - Troubleshooting guides
- **[../10-best-practices/](../10-best-practices/)** - Best practices

View File

@@ -0,0 +1,189 @@
# Remote Deployment Guide
## Issue: Deployment Scripts Require Proxmox Host Access
The deployment scripts (`deploy-all.sh`, etc.) are designed to run **ON the Proxmox host** because they use the `pct` command-line tool, which is only available on Proxmox hosts.
**Error you encountered:**
```
[ERROR] pct command not found. This script must be run on Proxmox host.
```
---
## Solutions
### Option 1: Copy to Proxmox Host (Recommended)
**Best approach:** Copy the deployment package to the Proxmox host and run it there.
#### Step 1: Copy Deployment Package
```bash
# From your local machine
cd /home/intlc/projects/proxmox
# Copy to Proxmox host
scp -r smom-dbis-138-proxmox root@192.168.11.10:/opt/
```
#### Step 2: SSH to Proxmox Host
```bash
ssh root@192.168.11.10
```
#### Step 3: Run Deployment on Host
```bash
cd /opt/smom-dbis-138-proxmox
# Make scripts executable
chmod +x scripts/deployment/*.sh
chmod +x install/*.sh
# Run deployment
./scripts/deployment/deploy-all.sh
```
#### Automated Script
Use the provided script to automate this:
```bash
./scripts/deploy-to-proxmox-host.sh
```
This script will:
1. Copy the deployment package to the Proxmox host
2. SSH into the host
3. Run the deployment automatically
---
### Option 2: Hybrid Approach (API + SSH)
Create containers via API, then configure via SSH.
#### Step 1: Create Containers via API
```bash
# Use the remote deployment script (creates containers via API)
cd smom-dbis-138-proxmox
./scripts/deployment/deploy-remote.sh
```
#### Step 2: Copy Files and Install
```bash
# Copy installation scripts to Proxmox host
scp -r install/ root@192.168.11.10:/opt/smom-dbis-138-proxmox/
# SSH and run installations
ssh root@192.168.11.10
cd /opt/smom-dbis-138-proxmox
# Install in each container
for vmid in 106 107 108 109; do
pct push $vmid install/besu-validator-install.sh /tmp/install.sh
pct exec $vmid -- bash /tmp/install.sh
done
```
---
### Option 3: Use MCP Server Tools
The MCP server provides API-based tools that can create containers remotely.
**Available via MCP:**
- Container creation
- Container management
- Configuration
**Limitations:**
- File upload (`pct push`) still requires local access
- Some operations may need local execution
---
## Why `pct` is Required
The `pct` (Proxmox Container Toolkit) command:
- Is only available on Proxmox hosts
- Provides direct access to container filesystem
- Allows file upload (`pct push`)
- Allows command execution (`pct exec`)
- Is more efficient than API for some operations
**API Alternative:**
- Container creation: ✅ Supported
- Container management: ✅ Supported
- File upload: ⚠️ Limited (requires workarounds)
- Command execution: ✅ Supported (with limitations)
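As a sketch of the API route, container creation is a POST to `/api2/json/nodes/<node>/lxc` authenticated with the API token. The snippet assembles the request and prints it rather than sending it; the token, node, and template values are placeholders:

```shell
# Build (but do not send) the Proxmox API container-creation request
api="https://192.168.11.10:8006/api2/json"
auth="PVEAPIToken=root@pam!mcp-server=<token-value>"
req="curl -sk -X POST -H 'Authorization: $auth' \
  --data 'vmid=106&hostname=besu-validator-1&ostemplate=<template>&storage=local-lvm&memory=4096' \
  '$api/nodes/<node>/lxc'"
echo "$req"
```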
---
## Recommended Workflow
### For Remote Deployment:
1. **Copy Package to Host**
```bash
./scripts/deploy-to-proxmox-host.sh
```
2. **Or Manual Copy:**
```bash
scp -r smom-dbis-138-proxmox root@192.168.11.10:/opt/
ssh root@192.168.11.10
cd /opt/smom-dbis-138-proxmox
./scripts/deployment/deploy-all.sh
```
### For Local Deployment:
If you have direct access to the Proxmox host:
```bash
# On Proxmox host
cd /opt/smom-dbis-138-proxmox
./scripts/deployment/deploy-all.sh
```
---
## Troubleshooting
### Issue: "pct command not found"
**Solution:** Run deployment on Proxmox host, not remotely.
### Issue: "Permission denied"
**Solution:** Run with `sudo` or as `root` user.
### Issue: "Container creation failed"
**Check:**
- API token has proper permissions
- Storage is available
- Template exists
- Sufficient resources
---
## Summary
**Best Practice:** Copy deployment package to Proxmox host and run there.
**Quick Command:**
```bash
./scripts/deploy-to-proxmox-host.sh
```
This automates the entire process of copying and deploying.
---
**Last Updated:** $(date)

View File

@@ -0,0 +1,251 @@
# Run Deployment - Execution Guide
## ✅ Scripts Validated and Ready
All scripts have been validated:
- ✓ Syntax OK
- ✓ Executable permissions set
- ✓ Dependencies present
- ✓ Help/usage messages working
## Quick Start
### Step 1: Copy Scripts to Proxmox Host
**From your local machine:**
```bash
cd /home/intlc/projects/proxmox
./scripts/copy-scripts-to-proxmox.sh
```
This copies all deployment scripts to the Proxmox host at `/opt/smom-dbis-138-proxmox/scripts/`.
### Step 2: Run Deployment on Proxmox Host
**SSH to Proxmox host and execute:**
```bash
# 1. SSH to Proxmox host
ssh root@192.168.11.10
# 2. Navigate to deployment directory
cd /opt/smom-dbis-138-proxmox
# 3. Run complete deployment
sudo ./scripts/deployment/deploy-validated-set.sh \
--source-project /home/intlc/projects/smom-dbis-138
```
**Note**: The source project path must be accessible from the Proxmox host. If the Proxmox host is remote, ensure:
- The directory is mounted/shared, OR
- Configuration files are copied separately to the Proxmox host
## Execution Options
### Option 1: Complete Deployment (First Time)
Deploys everything from scratch:
```bash
sudo ./scripts/deployment/deploy-validated-set.sh \
--source-project /path/to/smom-dbis-138
```
**What it does:**
1. Deploys containers
2. Copies configuration files
3. Bootstraps network
4. Validates deployment
### Option 2: Bootstrap Existing Containers
If containers are already deployed:
```bash
sudo ./scripts/network/bootstrap-network.sh
```
Or using the main script:
```bash
sudo ./scripts/deployment/deploy-validated-set.sh \
--skip-deployment \
--skip-config \
--source-project /path/to/smom-dbis-138
```
### Option 3: Validate Only
Just validate the current deployment:
```bash
sudo ./scripts/validation/validate-validator-set.sh
```
### Option 4: Check Node Health
Check health of a specific node:
```bash
# Human-readable output
sudo ./scripts/health/check-node-health.sh 1000
# JSON output (for automation)
sudo ./scripts/health/check-node-health.sh 1000 --json
```
## Expected Output
### Successful Deployment
```
=========================================
Deploy Validated Set - Script-Based Approach
=========================================
=== Pre-Deployment Validation ===
[✓] Prerequisites checked
=========================================
Phase 1: Deploy Containers
=========================================
[INFO] Deploying Besu nodes...
[✓] Besu nodes deployed
=========================================
Phase 2: Copy Configuration Files
=========================================
[INFO] Copying Besu configuration files...
[✓] Configuration files copied
=========================================
Phase 3: Bootstrap Network
=========================================
[INFO] Bootstrapping network...
[INFO] Collecting enodes from validators...
[✓] Network bootstrapped
=========================================
Phase 4: Validate Deployment
=========================================
[INFO] Validating validator set...
[✓] All validators validated successfully!
=========================================
[✓] Deployment Complete!
=========================================
```
## Monitoring During Execution
### Watch Logs in Real-Time
```bash
# In another terminal, watch the log file
tail -f /opt/smom-dbis-138-proxmox/logs/deploy-validated-set-*.log
```
### Check Container Status
```bash
# List all containers
pct list | grep -E "1000|1001|1002|1003|1004|1500|1501|1502|1503|2500|2501|2502"
# Check specific container
pct status 1000
```
### Monitor Service Logs
```bash
# Watch Besu service logs
pct exec 1000 -- journalctl -u besu-validator -f
```
## Troubleshooting
### If Deployment Fails
1. **Check the log file:**
```bash
tail -100 /opt/smom-dbis-138-proxmox/logs/deploy-validated-set-*.log
```
2. **Check container status:**
```bash
pct list
```
3. **Check service status:**
```bash
pct exec <vmid> -- systemctl status besu-validator
```
4. **Review error messages** in the script output
### Common Issues
**Issue: Containers not starting**
- Check resources (RAM, disk)
- Check OS template availability
- Review container logs
**Issue: Configuration copy fails**
- Verify source project path is correct
- Check source files exist
- Verify containers are running
**Issue: Bootstrap fails**
- Ensure containers are running
- Check P2P port (30303) is accessible
- Verify enode extraction works
**Issue: Validation fails**
- Check validator keys exist
- Verify configuration files are present
- Check services are running
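The container checks above can be folded into a quick triage helper. A minimal sketch for the "containers not starting" case — it classifies the output of `pct status <vmid>` (e.g. `status=$(pct status 1000)`); the suggested next steps mirror the list above:

```bash
# Triage sketch: classify `pct status` output and suggest a next step.
triage() {
  case "$1" in
    *running*) echo "OK: container running" ;;
    *stopped*) echo "CHECK: container stopped - review resources and template" ;;
    *)         echo "UNKNOWN: $1" ;;
  esac
}
triage "status: stopped"
# → CHECK: container stopped - review resources and template
```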
## Post-Deployment Verification
After successful deployment, verify:
```bash
# 1. Check all services are running
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
echo "=== Container $vmid ==="
pct exec $vmid -- systemctl status besu-validator besu-sentry besu-rpc --no-pager 2>/dev/null | head -5
done
# 2. Check consensus (block production)
pct exec 2500 -- curl -s -X POST \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8545 | python3 -m json.tool
# 3. Check peer connections
pct exec 2500 -- curl -s -X POST \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
http://localhost:8545 | python3 -m json.tool
```
## Success Criteria
Deployment is successful when:
- ✓ All containers are running
- ✓ All services are active
- ✓ Network is bootstrapped (static-nodes.json deployed)
- ✓ Validators are validated
- ✓ Consensus is active (blocks being produced)
- ✓ Nodes can connect to peers
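The "consensus is active" criterion can be checked mechanically: sample `eth_blockNumber` twice a few seconds apart and confirm the height increased. A minimal sketch of the comparison — feed it the hex results captured with the `pct exec` + `curl` command from the verification section above:

```bash
# Returns success when the second eth_blockNumber sample is higher than the
# first, i.e. the chain is advancing. Inputs are hex strings as returned by
# JSON-RPC (e.g. "0x1a4"); ${var#0x} strips the prefix before base-16 math.
chain_advancing() {
  local first=$1 second=$2
  [ $((16#${second#0x})) -gt $((16#${first#0x})) ]
}
if chain_advancing "0x1a4" "0x1a9"; then
  echo "consensus active: blocks are being produced"
fi
```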
## Next Steps
After successful deployment:
1. Set up monitoring
2. Configure backups
3. Document node endpoints
4. Set up alerting
5. Plan maintenance schedule


# Validated Set Deployment Guide
Complete guide for deploying a validated Besu node set using the script-based approach.
## Overview
This guide covers deploying a validated set of Besu nodes (validators, sentries, RPC) on Proxmox VE LXC containers using automated scripts. The deployment uses a **script-based approach** with `static-nodes.json` for peer discovery (no boot node required).
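For reference, `static-nodes.json` is a JSON array of enode URLs — one entry per peer each node should dial directly at startup. An illustrative sketch (the public-key placeholders and the 10.3.1.x addresses are examples, not values from this deployment):

```bash
# Write an illustrative static-nodes.json and confirm it parses as JSON.
# The enode public keys and IPs below are placeholders only.
cat > /tmp/static-nodes.example.json <<'EOF'
[
  "enode://<validator-1-pubkey>@10.3.1.10:30303",
  "enode://<validator-2-pubkey>@10.3.1.11:30303"
]
EOF
python3 -m json.tool /tmp/static-nodes.example.json >/dev/null && echo "valid JSON"
```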
## Prerequisites
- Proxmox VE 7.0+ installed
- Root access to Proxmox host
- Sufficient resources (RAM, disk, CPU)
- Network connectivity
- Source project with Besu configuration files
## Deployment Methods
### Method 1: Complete Deployment (Recommended)
Deploy everything from scratch in one command:
```bash
cd /opt/smom-dbis-138-proxmox
sudo ./scripts/deployment/deploy-validated-set.sh \
--source-project /path/to/smom-dbis-138
```
**What this does:**
1. Deploys all containers (validators, sentries, RPC)
2. Copies configuration files from source project
3. Bootstraps the network (generates and deploys static-nodes.json)
4. Validates the deployment
### Method 2: Step-by-Step Deployment
If you prefer more control, deploy step by step:
```bash
# Step 1: Deploy containers
sudo ./scripts/deployment/deploy-besu-nodes.sh
# Step 2: Copy configuration files
SOURCE_PROJECT=/path/to/smom-dbis-138 \
./scripts/copy-besu-config.sh
# Step 3: Bootstrap network
sudo ./scripts/network/bootstrap-network.sh
# Step 4: Validate validators
sudo ./scripts/validation/validate-validator-set.sh
```
### Method 3: Bootstrap Existing Containers
If containers are already deployed and configured:
```bash
# Quick bootstrap (just network bootstrap)
sudo ./scripts/deployment/bootstrap-quick.sh
# Or use the full script with skip options
sudo ./scripts/deployment/deploy-validated-set.sh \
--skip-deployment \
--skip-config \
--source-project /path/to/smom-dbis-138
```
## Detailed Steps
### Step 1: Prepare Source Project
Ensure your source project has the required files:
```
smom-dbis-138/
├── config/
│ ├── genesis.json
│ ├── permissions-nodes.toml
│ ├── permissions-accounts.toml
│ ├── static-nodes.json (will be generated/updated)
│ ├── config-validator.toml
│ ├── config-sentry.toml
│ └── config-rpc-public.toml
└── keys/
└── validators/
├── validator-1/
├── validator-2/
├── validator-3/
├── validator-4/
└── validator-5/
```
### Step 2: Review Configuration
Check your deployment configuration:
```bash
cat config/proxmox.conf
cat config/network.conf
```
Key settings:
- `VALIDATOR_START`, `VALIDATOR_COUNT` - Validator VMID range
- `SENTRY_START`, `SENTRY_COUNT` - Sentry VMID range
- `RPC_START`, `RPC_COUNT` - RPC VMID range
- `CONTAINER_OS_TEMPLATE` - OS template to use
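As a hedged example, settings consistent with the VMID ranges used throughout this guide (validators 1000-1004, sentries 1500-1503, RPC 2500-2502); the OS template path is an assumption, not taken from this project's actual `proxmox.conf`:

```bash
# Illustrative config/proxmox.conf values matching this guide's VMID ranges.
VALIDATOR_START=1000 VALIDATOR_COUNT=5
SENTRY_START=1500 SENTRY_COUNT=4
RPC_START=2500 RPC_COUNT=3
# Assumed example template path, not verified from this project
CONTAINER_OS_TEMPLATE="local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst"
echo "validators: ${VALIDATOR_START}-$((VALIDATOR_START + VALIDATOR_COUNT - 1))"
# → validators: 1000-1004
```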
### Step 3: Run Deployment
Execute the deployment script:
```bash
sudo ./scripts/deployment/deploy-validated-set.sh \
--source-project /path/to/smom-dbis-138
```
### Step 4: Monitor Progress
The script will output progress for each phase:
```
=========================================
Phase 1: Deploy Containers
=========================================
[INFO] Deploying Besu nodes...
[✓] Besu nodes deployed
=========================================
Phase 2: Copy Configuration Files
=========================================
[INFO] Copying Besu configuration files...
[✓] Configuration files copied
=========================================
Phase 3: Bootstrap Network
=========================================
[INFO] Bootstrapping network...
[INFO] Collecting enodes from validators...
[✓] Network bootstrapped
=========================================
Phase 4: Validate Deployment
=========================================
[INFO] Validating validator set...
[✓] All validators validated successfully!
```
### Step 5: Verify Deployment
After deployment completes, verify everything is working:
```bash
# Check all containers are running
pct list | grep -E "1000|1001|1002|1003|1004|1500|1501|1502|1503|2500|2501|2502"
# Check service status
for vmid in 1000 1001 1002 1003 1004; do
echo "=== Validator $vmid ==="
pct exec $vmid -- systemctl status besu-validator --no-pager -l
done
# Check consensus is active (blocks being produced)
pct exec 2500 -- curl -s -X POST \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8545 | python3 -m json.tool
```
## Health Checks
### Check Individual Node Health
```bash
# Human-readable output
sudo ./scripts/health/check-node-health.sh 1000
# JSON output (for automation)
sudo ./scripts/health/check-node-health.sh 1000 --json
```
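In automation, the JSON output can be gated with `jq` (listed as a required tool). A sketch assuming the output carries a top-level `status` field — the field names here are illustrative, not verified against `check-node-health.sh` itself:

```bash
# Gate on a health-check JSON payload; the sample below stands in for
# `check-node-health.sh 1000 --json` output, and its fields are assumptions.
health='{"vmid": 1000, "status": "healthy", "service": "besu-validator"}'
if echo "$health" | jq -e '.status == "healthy"' >/dev/null; then
  echo "node 1000 healthy"
else
  echo "node 1000 UNHEALTHY"
fi
```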
### Validate Validator Set
```bash
sudo ./scripts/validation/validate-validator-set.sh
```
This checks:
- Container and service status
- Validator keys exist and are accessible
- Configuration files are present
- Consensus participation
## Troubleshooting
### Containers Won't Start
```bash
# Check container status
pct status <vmid>
# View container console
pct console <vmid>
# Check logs
pct exec <vmid> -- journalctl -xe
```
### Services Won't Start
```bash
# Check service status
pct exec <vmid> -- systemctl status besu-validator
# View service logs
pct exec <vmid> -- journalctl -u besu-validator -f
# Check configuration
pct exec <vmid> -- cat /etc/besu/config-validator.toml
```
### Network Connectivity Issues
```bash
# Check P2P port is listening (ss is available on minimal images where netstat is not)
pct exec <vmid> -- ss -tuln | grep 30303
# Check peer connections (if RPC enabled)
pct exec <vmid> -- curl -s -X POST \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
http://localhost:8545
# Verify static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json
```
### Consensus Issues
```bash
# Check validator is participating
pct exec <vmid> -- journalctl -u besu-validator --no-pager | grep -i "consensus\|qbft\|proposing"
# Verify validator keys
pct exec <vmid> -- ls -la /keys/validators/
# Check genesis file
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool
```
## Rollback
If deployment fails, you can remove containers:
```bash
# Remove specific containers
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
pct stop $vmid 2>/dev/null || true
pct destroy $vmid 2>/dev/null || true
done
```
Then re-run the deployment after fixing any issues.
## Post-Deployment
After successful deployment:
1. **Monitor Logs**: Keep an eye on service logs for the first few hours
2. **Verify Consensus**: Ensure blocks are being produced
3. **Check Resources**: Monitor CPU, memory, and disk usage
4. **Network Health**: Verify all nodes are connected
5. **Backup**: Consider creating snapshots of working containers
## Next Steps
- Set up monitoring (Prometheus, Grafana)
- Configure backups
- Document node endpoints
- Set up alerting
- Plan for maintenance windows
## Additional Resources
- [Besu Nodes File Reference](BESU_NODES_FILE_REFERENCE.md)
- [Network Bootstrap Guide](NETWORK_BOOTSTRAP_GUIDE.md)
- [Boot Node Runbook](BOOT_NODE_RUNBOOK.md) (if using boot node)
- [Besu Allowlist Runbook](BESU_ALLOWLIST_RUNBOOK.md)