Refactor code for improved readability and performance
docs/03-deployment/DEPLOYMENT_READINESS.md

# Deployment Readiness Checklist

**Target:** ml110-01 (192.168.11.10)
**Status:** ✅ **READY FOR DEPLOYMENT**
**Date:** [TBD]

---

## ✅ Pre-Deployment Validation

### System Prerequisites
- [x] Node.js 16+ installed (v22.21.1) ✅
- [x] pnpm 8+ installed (10.24.0) ✅
- [x] Git installed (2.43.0) ✅
- [x] Required tools (curl, jq, bash) ✅

### Workspace Setup
- [x] Project structure organized ✅
- [x] All submodules initialized ✅
- [x] All dependencies installed ✅
- [x] Scripts directory organized ✅
- [x] Documentation organized ✅

### Configuration
- [x] `.env` file configured ✅
- [x] PROXMOX_HOST set (192.168.11.10) ✅
- [x] PROXMOX_USER set (root@pam) ✅
- [x] PROXMOX_TOKEN_NAME set (mcp-server) ✅
- [x] PROXMOX_TOKEN_VALUE configured ✅
- [x] API connection verified ✅
- [x] Deployment configs created ✅

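The configuration items above can be re-checked with a small script. This is a minimal sketch. It assumes `.env` holds plain `KEY=value` lines, optionally prefixed with `export`. It only confirms that the four Proxmox variables are present and non-empty, not that the token authenticates.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a .env file defines the Proxmox API settings checked above.
# Assumes plain KEY=value lines, optionally prefixed with "export".

REQUIRED_VARS=(PROXMOX_HOST PROXMOX_USER PROXMOX_TOKEN_NAME PROXMOX_TOKEN_VALUE)

# check_env_file FILE -> prints each missing/empty key; returns 1 if any are missing.
check_env_file() {
  local file="$1" var missing=0
  for var in "${REQUIRED_VARS[@]}"; do
    # Require "KEY=" followed by at least one non-space character.
    if ! grep -Eq "^(export[[:space:]]+)?${var}=[^[:space:]]+" "$file"; then
      echo "missing or empty: ${var}"
      missing=1
    fi
  done
  return "$missing"
}

# Example: check_env_file .env && echo ".env defines all Proxmox API settings"
```

A passing check does not show that the token is valid. The API connection test in the checklist covers that.
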
### Validation Results
- [x] Prerequisites: 32/32 passing (100%) ✅
- [x] Deployment validation: 41/41 passing (100%) ✅
- [x] API connection: Working (Proxmox 9.1.1) ✅
- [x] Storage accessible ✅
- [x] Templates accessible ✅
- [x] No VMID conflicts ✅

---

## 🚀 Deployment Steps

### Step 1: Review Configuration

```bash
# Review deployment configuration
cat smom-dbis-138-proxmox/config/proxmox.conf
cat smom-dbis-138-proxmox/config/network.conf
```

**Key Settings:**
- Target Node: Auto-detected from Proxmox
- Storage: local-lvm (or configured storage)
- Network: 10.3.1.0/24
- VMID Ranges: Configured (106-153)

### Step 2: Verify Resources

**Estimated Requirements:**
- Memory: ~96GB
- Disk: ~1.35TB
- CPU: ~42 cores (can be shared)

**Current Status:**
- Check available resources on ml110-01
- Ensure sufficient capacity for deployment

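You can compare the estimates with the capacity actually free on ml110-01 using the sketch below. It assumes free capacity is available in bytes, for example from the `memory.free` field of `pvesh get /nodes/<node>/status --output-format json`. That field name is an assumption to verify on your host. The function names are illustrative. The sketch reads the checklist's GB/TB figures as GiB/TiB, which errs on the safe side.

```bash
#!/usr/bin/env bash
# Sketch: compare an estimated requirement (GiB) with free capacity reported in bytes.

# fits_in_capacity REQUIRED_GIB FREE_BYTES -> returns 0 if the requirement fits.
fits_in_capacity() {
  local required_gib="$1" free_bytes="$2"
  local required_bytes=$(( required_gib * 1024 * 1024 * 1024 ))
  (( free_bytes >= required_bytes ))
}

# report NAME REQUIRED_GIB FREE_BYTES -> one OK/SHORT line per resource.
report() {
  local name="$1" required_gib="$2" free_bytes="$3" verdict=SHORT
  local free_gib=$(( free_bytes / 1024 / 1024 / 1024 ))
  fits_in_capacity "$required_gib" "$free_bytes" && verdict=OK
  echo "${verdict} ${name}: need ${required_gib} GiB, ${free_gib} GiB free"
}

# Examples (field names assumed; check the pvesh output on your host):
#   report memory 96 "$(pvesh get /nodes/<node>/status --output-format json | jq '.memory.free')"
#   report disk 1383 "$storage_avail_bytes"   # ~1.35 TiB, rounded up
```

CPU is left out on purpose. Container vCPUs can be overcommitted, so the ~42-core figure is a sizing guide rather than a hard limit.
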
### Step 3: Run Deployment

Run these commands on the Proxmox host itself. The scripts call `pct`, which exists only there; see REMOTE_DEPLOYMENT.md when working from another machine.

**Option A: Deploy Everything (Recommended)**
```bash
cd smom-dbis-138-proxmox
sudo ./scripts/deployment/deploy-all.sh
```

**Option B: Deploy Step-by-Step**
```bash
cd smom-dbis-138-proxmox

# 1. Deploy Besu nodes
sudo ./scripts/deployment/deploy-besu-nodes.sh

# 2. Deploy services
sudo ./scripts/deployment/deploy-services.sh

# 3. Deploy Hyperledger services
sudo ./scripts/deployment/deploy-hyperledger-services.sh

# 4. Deploy monitoring
sudo ./scripts/deployment/deploy-monitoring.sh

# 5. Deploy explorer
sudo ./scripts/deployment/deploy-explorer.sh
```

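Option B can also be driven by a small wrapper that stops as soon as one phase fails, so later phases never run against a half-built base. This is a sketch, meant to be run from `smom-dbis-138-proxmox` as root. The `DRY_RUN` switch belongs to this wrapper only and is not an option of the deployment scripts.

```bash
#!/usr/bin/env bash
# Sketch: run the Option B phase scripts in order, stopping at the first failure.
PHASES=(
  scripts/deployment/deploy-besu-nodes.sh
  scripts/deployment/deploy-services.sh
  scripts/deployment/deploy-hyperledger-services.sh
  scripts/deployment/deploy-monitoring.sh
  scripts/deployment/deploy-explorer.sh
)

run_phases() {
  local script
  for script in "${PHASES[@]}"; do
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "would run: ./${script}"     # print only, execute nothing
      continue
    fi
    echo "==> ./${script}"
    if ! "./${script}"; then
      echo "phase failed: ${script}; later phases were not started" >&2
      return 1
    fi
  done
}

# Example: DRY_RUN=1 run_phases   # preview, then: run_phases
```
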
### Step 4: Post-Deployment

After containers are created:

1. **Copy Configuration Files**
   ```bash
   # Copy genesis.json and configs to containers
   # (Adjust paths as needed)
   ```

2. **Copy Validator Keys**
   ```bash
   # Copy keys to validator containers only
   ```

3. **Update Static Nodes**
   ```bash
   ./scripts/network/update-static-nodes.sh
   ```

4. **Start Services**
   ```bash
   # Start Besu services in containers
   ```

5. **Verify Deployment**
   ```bash
   # Check container status
   # Verify network connectivity
   # Test RPC endpoints
   ```

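The Step 4 placeholders could be filled in along these lines. This is a sketch only. The in-container destination `/etc/besu/`, the local `CONFIG_DIR`, the key file names and the systemd unit names are hypothetical. They must match whatever the install scripts actually create, and the destination directory must already exist inside each container. The sketch relies on `pct push`, which copies a host file into a container, and `pct exec`, which runs a command inside one.

```bash
#!/usr/bin/env bash
# Sketch of Step 4 on the Proxmox host. CONFIG_DIR, /etc/besu/* and unit names are
# hypothetical placeholders; adjust them to the layout your install scripts use.
CONFIG_DIR="${CONFIG_DIR:-./config/besu}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi
}

# Step 4.1: copy genesis.json into every Besu container given as an argument.
push_genesis() {
  local vmid
  for vmid in "$@"; do
    run pct push "$vmid" "${CONFIG_DIR}/genesis.json" /etc/besu/genesis.json || return 1
  done
}

# Step 4.2: copy one validator key into one validator container, owner-readable only.
push_validator_key() {
  local vmid="$1" keyfile="$2"
  run pct push "$vmid" "$keyfile" /etc/besu/key --perms 0600
}

# Step 4.4: start the Besu unit inside a container.
start_besu() {
  local vmid="$1" unit="$2"
  run pct exec "$vmid" -- systemctl start "$unit"
}

# Example for validators 106-109 (key paths hypothetical):
#   push_genesis 106 107 108 109
#   for v in 106 107 108 109; do push_validator_key "$v" "keys/validator-$v/key"; done
#   ./scripts/network/update-static-nodes.sh
#   for v in 106 107 108 109; do start_besu "$v" besu-validator; done
```

If Besu runs as a non-root user inside the containers, `pct push` also accepts `--user`/`--group` so the key is readable by that account.
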
---

## 📋 Deployment Components

### Phase 1: Blockchain Core (Besu)
- **Validators** (VMID 106-109): 4 nodes
- **Sentries** (VMID 110-114): 3 nodes
- **RPC Nodes** (VMID 115-119): 3 nodes

### Phase 2: Services
- **Oracle Publisher** (VMID 120)
- **CCIP Monitor** (VMID 121)
- **Keeper** (VMID 122)
- **Financial Tokenization** (VMID 123)

### Phase 3: Hyperledger Services
- **Firefly** (VMID 150)
- **Cacti** (VMID 151)
- **Fabric** (VMID 152) - Optional
- **Indy** (VMID 153) - Optional

### Phase 4: Monitoring
- **Monitoring Stack** (VMID 130)

### Phase 5: Explorer
- **Blockscout** (VMID 140)

**Total Containers:** ~20-25 containers

---

## ⚠️ Important Notes

### Resource Considerations
- Memory: ~96GB estimated; verify available capacity
- Disk space: ~1.35TB estimated; ensure sufficient storage
- CPU: Can be shared, but ensure adequate cores

### Network Configuration
- Subnet: 10.3.1.0/24
- Gateway: 10.3.1.1
- VLANs: Configured per node type

### Security
- API token configured and working
- Containers will be created with proper permissions
- Network isolation via VLANs

---

## 🔍 Verification Commands

### Check Deployment Status
```bash
# List all containers
pct list

# Check specific container
pct status <vmid>

# View container config
pct config <vmid>
```

### Test Connectivity
```bash
# Test RPC endpoint
curl -X POST http://10.3.1.40:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
```

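`eth_blockNumber` returns the height as a hex string, so a node that answers but has stalled is easy to miss by eye. The sketch below extracts and converts the value in plain Bash, without `jq`. It assumes the compact JSON reply Besu normally sends. Taking two readings a few seconds apart shows whether blocks are being produced.

```bash
#!/usr/bin/env bash
RPC_URL="${RPC_URL:-http://10.3.1.40:8545}"

# block_number REPLY -> decimal block height from an eth_blockNumber JSON-RPC reply.
block_number() {
  local re='"result"[[:space:]]*:[[:space:]]*"0x([0-9a-fA-F]+)"'
  if [[ "$1" =~ $re ]]; then
    echo $(( 16#${BASH_REMATCH[1]} ))
  else
    echo "no result in reply: $1" >&2
    return 1
  fi
}

# query_block_number -> ask the RPC endpoint and print the decimal height.
query_block_number() {
  local reply
  reply=$(curl -s -X POST "$RPC_URL" \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}') || return 1
  block_number "$reply"
}

# Example: a=$(query_block_number); sleep 10; b=$(query_block_number); echo "$a -> $b"
```
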
### Monitor Resources
```bash
# Check node resources
pvesh get /nodes/<node>/status

# Check storage
pvesh get /nodes/<node>/storage
```

---

## 📊 Deployment Timeline

**Estimated Time:**
- Besu nodes: ~30-60 minutes
- Services: ~15-30 minutes
- Hyperledger: ~30-45 minutes
- Monitoring: ~15-20 minutes
- Explorer: ~20-30 minutes
- **Total: ~2-3 hours** (depending on resources)

---

## 🆘 Troubleshooting

### If Deployment Fails

1. **Check Logs**
   ```bash
   tail -f smom-dbis-138-proxmox/logs/deployment-*.log
   ```

2. **Verify Resources**
   ```bash
   ./scripts/validate-ml110-deployment.sh
   ```

3. **Check API Connection**
   ```bash
   ./scripts/test-connection.sh
   ```

4. **Review Configuration**
   ```bash
   cat smom-dbis-138-proxmox/config/proxmox.conf
   ```

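When several deployment runs have left logs behind, following all of them with one `tail -f` mixes their output. The small helper below picks the most recently modified log instead. It is a sketch that assumes the `deployment-*.log` naming used in the log-checking step.

```bash
#!/usr/bin/env bash
# latest_log DIR -> path of the most recently modified deployment-*.log in DIR.
latest_log() {
  local dir="$1" newest="" f
  for f in "$dir"/deployment-*.log; do
    [[ -e "$f" ]] || continue            # glob matched nothing
    if [[ -z "$newest" || "$f" -nt "$newest" ]]; then
      newest="$f"
    fi
  done
  if [[ -z "$newest" ]]; then
    echo "no deployment-*.log files in ${dir}" >&2
    return 1
  fi
  echo "$newest"
}

# Example:
#   log=$(latest_log smom-dbis-138-proxmox/logs) && grep -n -i 'error' "$log"
#   tail -f "$log"
```
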
---

## ✅ Final Checklist

Before starting deployment:

- [x] All prerequisites met
- [x] Configuration reviewed
- [x] Resources verified
- [x] API connection working
- [x] Storage accessible
- [x] Templates available
- [x] No VMID conflicts
- [ ] Backup plan in place (recommended)
- [ ] Deployment window scheduled (if production)
- [ ] Team notified (if applicable)

---

## 🎯 Ready to Deploy

**Status:** ✅ **ALL SYSTEMS GO**

All validations passed. The system is ready for deployment to ml110-01.

**Next Command:**
```bash
cd smom-dbis-138-proxmox && sudo ./scripts/deployment/deploy-all.sh
```

---

**Last Updated:** [TBD]
**Validation Status:** ✅ Complete
**Deployment Status:** ✅ Ready

docs/03-deployment/DEPLOYMENT_STATUS_CONSOLIDATED.md

# Deployment Status - Consolidated

**Last Updated:** 2025-01-20
**Document Version:** 2.0
**Status:** Active Deployment

---

## Overview

This document consolidates all deployment status information into a single authoritative source, replacing the previous separate status documents.

---

## Current Deployment Status

### Proxmox Host: ml110 (192.168.11.10)

**Status:** ✅ Operational

### Active Containers

| VMID | Hostname | Status | IP Address | VLAN | Service Status | Notes |
|------|----------|--------|------------|------|----------------|-------|
| 1000 | besu-validator-1 | ✅ Running | 192.168.11.100 | 11 (mgmt) | ✅ Active | Static IP |
| 1001 | besu-validator-2 | ✅ Running | 192.168.11.101 | 11 (mgmt) | ✅ Active | Static IP |
| 1002 | besu-validator-3 | ✅ Running | 192.168.11.102 | 11 (mgmt) | ✅ Active | Static IP |
| 1003 | besu-validator-4 | ✅ Running | 192.168.11.103 | 11 (mgmt) | ✅ Active | Static IP |
| 1004 | besu-validator-5 | ✅ Running | 192.168.11.104 | 11 (mgmt) | ✅ Active | Static IP |
| 1500 | besu-sentry-1 | ✅ Running | 192.168.11.150 | 11 (mgmt) | ✅ Active | Static IP |
| 1501 | besu-sentry-2 | ✅ Running | 192.168.11.151 | 11 (mgmt) | ✅ Active | Static IP |
| 1502 | besu-sentry-3 | ✅ Running | 192.168.11.152 | 11 (mgmt) | ✅ Active | Static IP |
| 1503 | besu-sentry-4 | ✅ Running | 192.168.11.153 | 11 (mgmt) | ✅ Active | Static IP |
| 2500 | besu-rpc-1 | ✅ Running | 192.168.11.250 | 11 (mgmt) | ✅ Active | Static IP |
| 2501 | besu-rpc-2 | ✅ Running | 192.168.11.251 | 11 (mgmt) | ✅ Active | Static IP |
| 2502 | besu-rpc-3 | ✅ Running | 192.168.11.252 | 11 (mgmt) | ✅ Active | Static IP |

**Total Active Containers:** 12
**Total Memory (allocated):** 104GB
**Total CPU Cores (allocated):** 40 cores

### Network Status

**Current Network:** Flat LAN (192.168.11.0/24)
**VLAN Migration:** ⏳ Pending
**Target Network:** VLAN-based (see [NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md))

### Service Status

**Besu Services:**
- ✅ 5 Validators: Active
- ✅ 4 Sentries: Active
- ✅ 3 RPC Nodes: Active

**Consensus:**
- ✅ QBFT consensus operational
- ✅ Block production: Normal
- ✅ Validator participation: 5/5

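Validator-set membership can be checked from any RPC node with Besu's `qbft_getValidatorsByBlockNumber` method. The sketch assumes two things this document does not state: the `QBFT` JSON-RPC API is enabled on besu-rpc-1 (192.168.11.250), and that node listens on the default HTTP port 8545. The check counts set membership only. Per-validator block production, which the participation figure above describes, is better read from `qbft_getSignerMetrics`.

```bash
#!/usr/bin/env bash
# Sketch: count validators in the QBFT validator set as seen by an RPC node.
# Assumptions: QBFT JSON-RPC API enabled on that node, HTTP JSON-RPC on port 8545.
RPC_URL="${RPC_URL:-http://192.168.11.250:8545}"   # besu-rpc-1 from the table above

# query_validators -> raw reply of qbft_getValidatorsByBlockNumber for the latest block.
query_validators() {
  curl -s -X POST "$RPC_URL" \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"qbft_getValidatorsByBlockNumber","params":["latest"],"id":1}'
}

# count_validators REPLY -> number of quoted 20-byte hex addresses in the reply.
count_validators() {
  grep -oE '"0x[0-9a-fA-F]{40}"' <<< "$1" | wc -l | tr -d ' '
}

# Example:
#   n=$(count_validators "$(query_validators)"); echo "validators in set: $n (expected 5)"
```
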
---

## Deployment Phases

### Phase 0 — Foundation ✅

- [x] ER605-A WAN1 configured: 76.53.10.34/28
- [x] Proxmox mgmt accessible
- [x] Basic containers deployed

### Phase 1 — VLAN Enablement ⏳

- [ ] ES216G trunk ports configured
- [ ] VLAN-aware bridge enabled on Proxmox
- [ ] VLAN interfaces created on ER605
- [ ] Services migrated to VLANs

### Phase 2 — Observability ⏳

- [ ] Monitoring stack deployed
- [ ] Grafana published via Cloudflare Access
- [ ] Alerts configured

### Phase 3 — CCIP Fleet ⏳

- [ ] CCIP Ops/Admin deployed
- [ ] 16 commit nodes deployed
- [ ] 16 execute nodes deployed
- [ ] 7 RMN nodes deployed
- [ ] NAT pools configured

### Phase 4 — Sovereign Tenants ⏳

- [ ] Sovereign VLANs configured
- [ ] Tenant isolation enforced
- [ ] Access control configured

---

## Resource Usage

### Current Resources (ml110)

| Resource | Allocated | Available | Usage % |
|----------|-----------|-----------|---------|
| Memory | 104GB | [TBD] | [TBD] |
| CPU Cores | 40 | [TBD] | [TBD] |
| Disk | ~1.2TB | [TBD] | [TBD] |

### Planned Resources (R630 Cluster)

| Node | Memory | CPU | Disk | Status |
|------|--------|-----|------|--------|
| r630-01 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |
| r630-02 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |
| r630-03 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |
| r630-04 | 512GB | [TBD] | 2×600GB + 6×250GB | ⏳ Pending |

---

## Network Architecture

### Current (Flat LAN)

- **Network:** 192.168.11.0/24
- **Gateway:** 192.168.11.1
- **All services:** On the same network

### Target (VLAN-based)

See **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** for the complete VLAN plan.

**Key VLANs:**
- VLAN 11: MGMT-LAN (192.168.11.0/24) - Legacy compatibility
- VLAN 110: BESU-VAL (10.110.0.0/24) - Validators
- VLAN 111: BESU-SEN (10.111.0.0/24) - Sentries
- VLAN 112: BESU-RPC (10.112.0.0/24) - RPC nodes
- VLAN 132: CCIP-COMMIT (10.132.0.0/24) - CCIP Commit nodes
- VLAN 133: CCIP-EXEC (10.133.0.0/24) - CCIP Execute nodes
- VLAN 134: CCIP-RMN (10.134.0.0/24) - CCIP RMN nodes

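In the target plan, each Besu and CCIP VLAN ID is mirrored in its subnet (VLAN 110 → 10.110.0.0/24), while VLAN 11 keeps 192.168.11.0/24. The sketch below derives a container's `--net0` value from that convention. Two things in it are assumptions: the bridge name `vmbr0` and a `.1` gateway on every VLAN. Only 192.168.11.1 is documented here, so check both against NETWORK_ARCHITECTURE.md before use.

```bash
#!/usr/bin/env bash
# Sketch: derive a container's --net0 value from the key VLAN plan above.
# Assumptions (not from this document): bridge vmbr0 and a .1 gateway on every VLAN.

# vlan_prefix VLAN -> first three octets of that VLAN's /24.
vlan_prefix() {
  case "$1" in
    11)                      echo "192.168.11" ;;
    110|111|112|132|133|134) echo "10.$1.0" ;;
    *) echo "VLAN $1 is not in the key VLAN list" >&2; return 1 ;;
  esac
}

# net0_for VLAN HOST_OCTET -> value for `pct set <vmid> --net0 ...`.
net0_for() {
  local vlan="$1" host="$2" prefix
  prefix=$(vlan_prefix "$vlan") || return 1
  echo "name=eth0,bridge=${BRIDGE:-vmbr0},tag=${vlan},ip=${prefix}.${host}/24,gw=${prefix}.1"
}

# Example (during a maintenance window): pct set 1000 --net0 "$(net0_for 110 100)"
```

Redefining `net0` without an `hwaddr=` entry may give the container a new MAC address. Record the existing value from `pct config <vmid>` first.
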
---

## Public IP Blocks

### Block #1 (Configured)

- **Network:** 76.53.10.32/28
- **Gateway:** 76.53.10.33
- **ER605 WAN1:** 76.53.10.34
- **Usage:** Router WAN + break-glass VIPs

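The /28 arithmetic behind Block #1 is easy to check. A /28 holds 16 addresses. The first is the network address and the last is the broadcast address, which leaves 14 usable addresses (76.53.10.33-46). With the gateway on .33 and ER605 WAN1 on .34, at most twelve addresses (.35-.46) remain for break-glass VIPs. The pure-Bash sketch below repeats this check for any of the pending blocks once they are assigned. It is valid for prefixes of /30 and shorter.

```bash
#!/usr/bin/env bash
# ip_to_int A.B.C.D -> the address as a 32-bit integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# int_to_ip N -> dotted-quad form of a 32-bit integer.
int_to_ip() {
  local n="$1"
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# block_summary CIDR -> network, first/last usable, broadcast and usable host count.
block_summary() {
  local net="${1%/*}" bits="${1#*/}"
  local size=$(( 1 << (32 - bits) ))
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  local base=$(( $(ip_to_int "$net") & mask ))
  echo "network=$(int_to_ip "$base")"
  echo "first=$(int_to_ip $(( base + 1 )))"
  echo "last=$(int_to_ip $(( base + size - 2 )))"
  echo "broadcast=$(int_to_ip $(( base + size - 1 )))"
  echo "usable=$(( size - 2 ))"
}

# Example: block_summary 76.53.10.32/28
```
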
### Blocks #2-6 (Pending)

- **Block #2:** CCIP Commit egress NAT pool
- **Block #3:** CCIP Execute egress NAT pool
- **Block #4:** RMN egress NAT pool
- **Block #5:** Sankofa/Phoenix/PanTel service egress
- **Block #6:** Sovereign Cloud Band tenant egress

See **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** for details.

---

## Known Issues

### Resolved ✅

- ✅ VMID 1000 IP configuration fixed (now 192.168.11.100)
- ✅ Besu services active (11/12 services running)
- ✅ Validator key issues resolved

### Pending ⏳

- ⏳ VLAN migration not started
- ⏳ CCIP fleet not deployed
- ⏳ Monitoring stack not deployed
- ⏳ Cloudflare Zero Trust not configured

---

## Next Steps

### Immediate (This Week)

1. **Complete VLAN Planning**
   - Finalize VLAN configuration
   - Plan migration sequence
   - Prepare migration scripts

2. **Deploy Monitoring Stack**
   - Prometheus
   - Grafana
   - Loki
   - Alertmanager

3. **Configure Cloudflare Zero Trust**
   - Set up cloudflared tunnels
   - Publish applications
   - Configure access policies

### Short-term (This Month)

1. **VLAN Migration**
   - Configure ES216G switches
   - Enable VLAN-aware bridge
   - Migrate services

2. **CCIP Fleet Deployment**
   - Deploy Ops/Admin nodes
   - Deploy Commit nodes
   - Deploy Execute nodes
   - Deploy RMN nodes

3. **NAT Pool Configuration**
   - Configure Blocks #2-6 (when assigned)
   - Set up role-based egress NAT
   - Test allowlisting

### Long-term (This Quarter)

1. **Sovereign Tenant Rollout**
   - Configure tenant VLANs
   - Deploy tenant services
   - Enforce isolation

2. **High Availability**
   - Deploy R630 cluster
   - Configure HA for critical services
   - Test failover

---

## References

### Architecture

- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** - Complete network architecture
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment guide
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** - VMID allocation

### Deployment

- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Validated set deployment
- **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** - Deployment readiness

### Operations

- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational runbooks
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Troubleshooting guide

---

**Document Status:** Active
**Maintained By:** Infrastructure Team
**Review Cycle:** Weekly
**Last Updated:** 2025-01-20

docs/03-deployment/OPERATIONAL_RUNBOOKS.md

# Operational Runbooks - Master Index

**Last Updated:** 2025-01-20
**Document Version:** 1.0

---

## Overview

This document provides a master index of all operational runbooks and procedures for the Sankofa/Phoenix/PanTel Proxmox deployment.

---

## Quick Reference

### Emergency Procedures

- **[Emergency Access](#emergency-access)** - Break-glass access procedures
- **[Service Recovery](#service-recovery)** - Recovering failed services
- **[Network Recovery](#network-recovery)** - Network connectivity issues

### Common Operations

- **[Adding a Validator](#adding-a-validator)** - Add new validator node
- **[Removing a Validator](#removing-a-validator)** - Remove validator node
- **[Upgrading Besu](#upgrading-besu)** - Besu version upgrade
- **[Key Rotation](#key-management)** - Validator key rotation

---

## Network Operations

### ER605 Router Configuration

- **[ER605_ROUTER_CONFIGURATION.md](ER605_ROUTER_CONFIGURATION.md)** - Complete router configuration guide
- **VLAN Configuration** - Setting up VLANs on ER605
- **NAT Pool Configuration** - Configuring role-based egress NAT
- **Failover Configuration** - Setting up WAN failover

### VLAN Management

- **VLAN Migration** - Migrating from flat LAN to VLANs
- **VLAN Troubleshooting** - Common VLAN issues and solutions
- **Inter-VLAN Routing** - Configuring routing between VLANs

### Cloudflare Zero Trust

- **[CLOUDFLARE_ZERO_TRUST_GUIDE.md](CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Complete Cloudflare setup
- **Tunnel Management** - Managing cloudflared tunnels
- **Application Publishing** - Publishing applications via Cloudflare Access
- **Access Policy Management** - Managing access policies

---

## Besu Operations

### Node Management

#### Adding a Validator

**Prerequisites:**
- Validator key generated
- VMID allocated (1000-1499 range)
- VLAN 110 configured (if migrated)

**Steps:**
1. Create LXC container with VMID
2. Install Besu
3. Configure validator key
4. Add to static-nodes.json on all nodes
5. Update allowlist (if using permissioning)
6. Start Besu service
7. Propose the new validator's address from more than half of the existing validators (if the validator set is managed by block-header voting)
8. Verify validator is participating

**See:** [VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)

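Step 4 needs the new node's enode URL. The sketch below assembles it and renders `static-nodes.json`. It assumes the node's 64-byte public key is already available as hex; Besu's `public-key export` subcommand can print it with a `0x` prefix. It also assumes the node uses the default P2P port 30303. The helper names are illustrative.

```bash
#!/usr/bin/env bash
# enode_url PUBKEY IP [PORT] -> enode URL; PUBKEY is 128 hex chars, "0x" prefix optional.
enode_url() {
  local key="${1#0x}" ip="$2" port="${3:-30303}"
  local re='^[0-9a-fA-F]{128}$'
  if [[ ! "$key" =~ $re ]]; then
    echo "expected a 128-hex-character public key, got ${#key} characters" >&2
    return 1
  fi
  echo "enode://${key}@${ip}:${port}"
}

# static_nodes_json ENODE... -> static-nodes.json content (a JSON array of enode URLs).
static_nodes_json() {
  local first=1 e
  printf '['
  for e in "$@"; do
    (( first )) || printf ','          # comma before every entry except the first
    printf '\n  "%s"' "$e"
    first=0
  done
  printf '\n]\n'
}

# Example (hypothetical file names):
#   static_nodes_json "$(enode_url "$(cat val1.pub)" 192.168.11.100)" ... > static-nodes.json
```
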
#### Removing a Validator

**Prerequisites:**
- Remaining validators can still reach quorum (check quorum requirements)
- Validator key backed up

**Steps:**
1. Propose removal of the validator's address from more than half of the validators (if the validator set is managed by block-header voting), and wait until it leaves the validator set
2. Stop Besu service
3. Remove from static-nodes.json on all nodes
4. Update allowlist (if using permissioning)
5. Remove container (optional)
6. Document removal

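The quorum check in the prerequisites comes down to integer arithmetic. QBFT needs a quorum of ceil(2N/3) validators and tolerates floor((N-1)/3) faulty ones. With the current 5 validators, the quorum is 4 and only one may be offline. Dropping to 4 validators keeps the tolerance at one. The sketch prints both numbers. It also builds the `qbft_proposeValidatorVote` request body, an address plus `true` to add or `false` to remove. That applies when the validator set is managed by block-header voting; a contract-managed set is changed through its contract instead.

```bash
#!/usr/bin/env bash
# qbft_tolerance N -> quorum size ceil(2N/3) and tolerated faulty validators floor((N-1)/3).
qbft_tolerance() {
  local n="$1"
  echo "quorum=$(( (2 * n + 2) / 3 )) faulty_tolerated=$(( (n - 1) / 3 ))"
}

# vote_payload ADDRESS true|false -> JSON-RPC body proposing to add (true) or remove (false).
vote_payload() {
  printf '{"jsonrpc":"2.0","method":"qbft_proposeValidatorVote","params":["%s",%s],"id":1}\n' "$1" "$2"
}

# Before removing one of 5 validators, compare both sizes:
#   qbft_tolerance 5; qbft_tolerance 4
# Then send the vote to each validator's JSON-RPC endpoint (QBFT API must be enabled):
#   curl -s -X POST http://<validator-ip>:8545 -H "Content-Type: application/json" \
#     -d "$(vote_payload 0x<validator-address> false)"
```
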
#### Upgrading Besu

**Prerequisites:**
- Back up current configuration
- Test upgrade in dev environment
- Create snapshot before upgrade

**Steps:**
1. Create snapshot: `pct snapshot <vmid> pre-upgrade-$(date +%Y%m%d)`
2. Stop Besu service
3. Back up configuration and keys
4. Install new Besu version
5. Update configuration if needed
6. Start Besu service
7. Verify node is syncing
8. Monitor for issues

**Rollback:**
- If issues occur: `pct rollback <vmid> pre-upgrade-YYYYMMDD`

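The snapshot step can be wrapped so the snapshot name and the matching rollback command are always printed together. This is a sketch. `pct snapshot` needs snapshot-capable storage such as LVM-thin (`local-lvm`) or ZFS. A date-only name also collides if the same container is snapshotted twice on the same day.

```bash
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1, otherwise execute it.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# snapshot_name [YYYYMMDD] -> "pre-upgrade-YYYYMMDD" (today when no date is given).
snapshot_name() {
  echo "pre-upgrade-${1:-$(date +%Y%m%d)}"
}

# pre_upgrade_snapshot VMID -> take the snapshot, then print how to roll back to it.
pre_upgrade_snapshot() {
  local vmid="$1" snap
  snap=$(snapshot_name)
  run pct snapshot "$vmid" "$snap" || return 1
  echo "rollback: pct rollback ${vmid} ${snap}"
}

# Example: pre_upgrade_snapshot 1000 && pct listsnapshot 1000
```
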
### Allowlist Management

- **[BESU_ALLOWLIST_RUNBOOK.md](BESU_ALLOWLIST_RUNBOOK.md)** - Complete allowlist guide
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Quick start for allowlist issues

**Common Operations:**
- Generate allowlist from nodekeys
- Update allowlist on all nodes
- Verify allowlist is correct
- Troubleshoot allowlist issues

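Besu's local node permissioning reads a `nodes-allowlist` entry from its permissions TOML file (enabled with `--permissions-nodes-config-file-enabled`). The sketch below renders that line from a list of enode URLs. It also reports enodes that are missing from an existing file, which helps with "Verify allowlist is correct". The file you pass in is up to you. Runtime updates through the `PERM` JSON-RPC API are outside this sketch.

```bash
#!/usr/bin/env bash
# nodes_allowlist_toml ENODE... -> one nodes-allowlist line for permissions_config.toml.
nodes_allowlist_toml() {
  local out="" e
  for e in "$@"; do
    out+="${out:+,}\"${e}\""           # comma-separate every entry after the first
  done
  echo "nodes-allowlist=[${out}]"
}

# allowlist_missing FILE ENODE... -> print each ENODE not present in FILE; return 1 if any.
allowlist_missing() {
  local file="$1" e rc=0
  shift
  for e in "$@"; do
    if ! grep -qF "\"${e}\"" "$file"; then
      echo "$e"
      rc=1
    fi
  done
  return "$rc"
}
```
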
### Consensus Troubleshooting

- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **Block Production Issues** - Troubleshooting block production
- **Validator Recognition** - Validator not being recognized

---

## CCIP Operations

### CCIP Deployment

- **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** - Complete CCIP deployment specification
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment orchestration

**Deployment Phases:**
1. Deploy Ops/Admin nodes (VMIDs 5400-5401)
2. Deploy Monitoring nodes (VMIDs 5402-5403)
3. Deploy Commit nodes (VMIDs 5410-5425)
4. Deploy Execute nodes (VMIDs 5440-5455)
5. Deploy RMN nodes (VMIDs 5470-5476)

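The VMID ranges above translate directly into VMID lists. That makes it easy to loop over one phase at a time and to cross-check the counts against the 16 commit, 16 execute and 7 RMN nodes planned in DEPLOYMENT_STATUS_CONSOLIDATED.md. A sketch using `seq`:

```bash
#!/usr/bin/env bash
# ccip_vmids PHASE -> the VMIDs reserved for that CCIP deployment phase, one per line.
ccip_vmids() {
  case "$1" in
    ops)        seq 5400 5401 ;;
    monitoring) seq 5402 5403 ;;
    commit)     seq 5410 5425 ;;
    execute)    seq 5440 5455 ;;
    rmn)        seq 5470 5476 ;;
    *) echo "unknown phase: $1 (ops|monitoring|commit|execute|rmn)" >&2; return 1 ;;
  esac
}

# Example: show the state of every commit-node VMID on this host.
#   for vmid in $(ccip_vmids commit); do echo "$vmid: $(pct status "$vmid" 2>&1)"; done
```
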
### CCIP Node Management

- **Adding CCIP Node** - Add new CCIP node to fleet
- **Removing CCIP Node** - Remove CCIP node from fleet
- **CCIP Node Troubleshooting** - Common CCIP issues

---

## Monitoring & Observability

### Monitoring Setup

- **[MONITORING_SUMMARY.md](MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring

**Components:**
- Prometheus metrics collection
- Grafana dashboards
- Loki log aggregation
- Alertmanager alerting

### Health Checks

- **Node Health Checks** - Check individual node health
- **Service Health Checks** - Check service status
- **Network Health Checks** - Check network connectivity

**Scripts:**
- `check-node-health.sh` - Node health check script
- `check-service-status.sh` - Service status check

---

## Backup & Recovery

### Backup Procedures

- **Configuration Backup** - Back up all configuration files
- **Validator Key Backup** - Encrypted backup of validator keys
- **Container Backup** - Back up container configurations

**Automated Backups:**
- Scheduled daily backups
- Encrypted storage
- Multiple locations
- 30-day retention

### Disaster Recovery

- **Service Recovery** - Recover failed services
- **Network Recovery** - Recover network connectivity
- **Full System Recovery** - Complete system recovery

**Recovery Procedures:**
1. Identify failure point
2. Restore from backup
3. Verify service status
4. Monitor for issues

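On Proxmox, the container-level parts of backup and restore map onto `vzdump` and `pct restore`. This is a sketch. It assumes a backup storage ID of your choosing (`backup-nfs` in the example is hypothetical) and a Proxmox VE release whose `vzdump` supports `--prune-backups`. `keep-daily=30` keeps the newest backup from each of the last 30 days that have one, which approximates the 30-day retention. Validator keys still need their own encrypted backup, as listed above.

```bash
#!/usr/bin/env bash
# run: print the command when DRY_RUN=1, otherwise execute it.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# backup_containers STORAGE VMID... -> snapshot-mode, zstd-compressed vzdump per container.
backup_containers() {
  local storage="$1" vmid
  shift
  for vmid in "$@"; do
    run vzdump "$vmid" --mode snapshot --compress zstd \
      --storage "$storage" --prune-backups keep-daily=30 || return 1
  done
}

# restore_container VMID ARCHIVE STORAGE -> recreate a container from a vzdump archive.
# Restoring onto an existing VMID additionally requires --force (which overwrites it).
restore_container() {
  run pct restore "$1" "$2" --storage "$3"
}

# Example (nightly, e.g. from cron or a Datacenter backup job):
#   backup_containers backup-nfs 1000 1001 1002 1003 1004
```
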
---

## Security Operations

### Key Management

- **[SECRETS_KEYS_CONFIGURATION.md](SECRETS_KEYS_CONFIGURATION.md)** - Secrets and keys management
- **Validator Key Rotation** - Rotate validator keys
- **API Token Rotation** - Rotate API tokens

### Access Control

- **SSH Key Management** - Manage SSH keys
- **Cloudflare Access** - Manage Cloudflare Access policies
- **Firewall Rules** - Manage firewall rules

---

## Troubleshooting

### Common Issues

- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Common issues and solutions
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting

### Diagnostic Procedures

Run steps 1 and 2 inside the container (for example after `pct enter <vmid>`), or prefix them with `pct exec <vmid> --` when working from the Proxmox host. The examples use the validator unit `besu-validator`.

1. **Check Service Status**
   ```bash
   systemctl status besu-validator
   ```

2. **Check Logs**
   ```bash
   journalctl -u besu-validator -f
   ```

3. **Check Network Connectivity**
   ```bash
   ping -c 4 <node-ip>
   ```

4. **Check Node Health**
   ```bash
   ./scripts/health/check-node-health.sh <vmid>
   ```

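To run the service-status check across many containers at once from the Proxmox host, call `systemctl is-active` through `pct exec` and filter the results. This is a sketch. Only `besu-validator` is a unit name taken from this runbook, so pass the correct unit for sentries and RPC nodes.

```bash
#!/usr/bin/env bash
# check_units UNIT VMID... -> "VMID STATE" for each container (run on the Proxmox host).
check_units() {
  local unit="$1" vmid state
  shift
  for vmid in "$@"; do
    # is-active exits non-zero for inactive/failed units; keep its printed state anyway.
    state=$(pct exec "$vmid" -- systemctl is-active "$unit" 2>/dev/null) || true
    echo "${vmid} ${state:-unknown}"
  done
}

# failed_only -> read "VMID STATE" lines on stdin, print the non-active ones; exit 1 if any.
failed_only() {
  awk '$2 != "active" { print; bad = 1 } END { exit bad }'
}

# Example: check_units besu-validator 1000 1001 1002 1003 1004 | failed_only
```
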
---

## Emergency Procedures

### Emergency Access

**Break-glass Access:**
1. Use emergency SSH endpoint (if configured)
2. Access via Cloudflare Access (if available)
3. Physical console access (last resort)

**Emergency Contacts:**
- Infrastructure Team: [contact info]
- On-call Engineer: [contact info]

### Service Recovery

**Priority Order:**
1. Validators (critical for consensus)
2. RPC nodes (critical for access)
3. Monitoring (important for visibility)
4. Other services

**Recovery Steps:**
1. Identify failed service
2. Check service logs
3. Restart service
4. If restart fails, restore from backup
5. Verify service is operational

### Network Recovery

**Network Issues:**
1. Check ER605 router status
2. Check switch status
3. Check VLAN configuration
4. Check firewall rules
5. Test connectivity

**VLAN Issues:**
1. Verify VLAN configuration on switches
2. Verify VLAN configuration on ER605
3. Verify Proxmox bridge configuration
4. Test inter-VLAN routing

---

## Maintenance Windows

### Scheduled Maintenance

- **Weekly:** Health checks, log review
- **Monthly:** Security updates, configuration review
- **Quarterly:** Full system review, backup testing

### Maintenance Procedures

1. **Notify Stakeholders** - Send maintenance notification
2. **Create Snapshots** - Snapshot all containers before changes
3. **Perform Maintenance** - Execute maintenance tasks
4. **Verify Services** - Verify all services are operational
5. **Document Changes** - Document all changes made

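Step 2 (Create Snapshots) can be scripted by reading the VMID column of `pct list`. This is a sketch. The parser skips the header row and assumes the default `pct list` layout, with the VMID in the first column.

```bash
#!/usr/bin/env bash
# vmids_from_pct_list -> read `pct list` output on stdin, print the VMID column.
vmids_from_pct_list() {
  awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# snapshot_all NAME -> snapshot every container listed by `pct list`; report failures.
snapshot_all() {
  local name="$1" vmid rc=0
  for vmid in $(pct list | vmids_from_pct_list); do
    if ! pct snapshot "$vmid" "$name"; then
      echo "snapshot failed for ${vmid}" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Example (snapshot names must start with a letter): snapshot_all "maint-$(date +%Y%m%d)"
```
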
---

## Related Documentation

### Troubleshooting
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - Common issues and solutions - **Start here for problems**
- **[QBFT_TROUBLESHOOTING.md](QBFT_TROUBLESHOOTING.md)** - QBFT consensus troubleshooting
- **[BESU_ALLOWLIST_QUICK_START.md](BESU_ALLOWLIST_QUICK_START.md)** - Allowlist troubleshooting

### Architecture & Design
- **[NETWORK_ARCHITECTURE.md](NETWORK_ARCHITECTURE.md)** - Network architecture
- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** - Deployment guide
- **[VMID_ALLOCATION_FINAL.md](VMID_ALLOCATION_FINAL.md)** - VMID allocation

### Configuration
- **[ER605_ROUTER_CONFIGURATION.md](ER605_ROUTER_CONFIGURATION.md)** - Router configuration
- **[CLOUDFLARE_ZERO_TRUST_GUIDE.md](CLOUDFLARE_ZERO_TRUST_GUIDE.md)** - Cloudflare setup
- **[SECRETS_KEYS_CONFIGURATION.md](SECRETS_KEYS_CONFIGURATION.md)** - Secrets management

### Deployment
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** - Validated set deployment
- **[CCIP_DEPLOYMENT_SPEC.md](CCIP_DEPLOYMENT_SPEC.md)** - CCIP deployment
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** - Deployment readiness
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](DEPLOYMENT_STATUS_CONSOLIDATED.md)** - Current deployment status

### Monitoring
- **[MONITORING_SUMMARY.md](MONITORING_SUMMARY.md)** - Monitoring setup
- **[BLOCK_PRODUCTION_MONITORING.md](BLOCK_PRODUCTION_MONITORING.md)** - Block production monitoring

### Reference
- **[MASTER_INDEX.md](MASTER_INDEX.md)** - Complete documentation index

---

**Document Status:** Active
**Maintained By:** Infrastructure Team
**Review Cycle:** Monthly
**Last Updated:** 2025-01-20

docs/03-deployment/README.md

# Deployment & Operations

This directory contains deployment guides and operational procedures.

## Documents

- **[ORCHESTRATION_DEPLOYMENT_GUIDE.md](ORCHESTRATION_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Complete enterprise deployment orchestration
- **[VALIDATED_SET_DEPLOYMENT_GUIDE.md](VALIDATED_SET_DEPLOYMENT_GUIDE.md)** ⭐⭐⭐ - Validated set deployment procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** ⭐⭐⭐ - All operational procedures
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** ⭐⭐ - Pre-deployment validation checklist
- **[DEPLOYMENT_STATUS_CONSOLIDATED.md](DEPLOYMENT_STATUS_CONSOLIDATED.md)** ⭐⭐⭐ - Current deployment status
- **[RUN_DEPLOYMENT.md](RUN_DEPLOYMENT.md)** ⭐⭐ - Deployment execution guide
- **[REMOTE_DEPLOYMENT.md](REMOTE_DEPLOYMENT.md)** ⭐ - Remote deployment procedures

## Quick Reference

**Deployment Paths:**
- **Enterprise Deployment:** Start with ORCHESTRATION_DEPLOYMENT_GUIDE.md
- **Validated Set:** Start with VALIDATED_SET_DEPLOYMENT_GUIDE.md
- **Operations:** See OPERATIONAL_RUNBOOKS.md for all procedures

## Related Documentation

- **[../02-architecture/](../02-architecture/)** - Architecture reference
- **[../04-configuration/](../04-configuration/)** - Configuration guides
- **[../09-troubleshooting/](../09-troubleshooting/)** - Troubleshooting guides
- **[../10-best-practices/](../10-best-practices/)** - Best practices

docs/03-deployment/REMOTE_DEPLOYMENT.md

# Remote Deployment Guide

## Issue: Deployment Scripts Require Proxmox Host Access

The deployment scripts (`deploy-all.sh`, etc.) are designed to run **ON the Proxmox host** because they use the `pct` command-line tool, which is only available on Proxmox hosts.

**Error you encountered:**
```
[ERROR] pct command not found. This script must be run on Proxmox host.
```

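Scripts can fail fast with a guard like the one below before touching any container. This is a sketch of such a check, not the exact code in `deploy-all.sh`.

```bash
#!/usr/bin/env bash
# require_cmd NAME -> succeed if NAME is on PATH; otherwise print an error and fail.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "[ERROR] $1 command not found. This script must be run on Proxmox host." >&2
    return 1
  fi
}

# Example at the top of a deployment script:
#   require_cmd pct || exit 1
```
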
---

## Solutions

### Option 1: Copy to Proxmox Host (Recommended)

**Best approach:** Copy the deployment package to the Proxmox host and run it there.

#### Step 1: Copy Deployment Package

```bash
# From your local machine
cd /home/intlc/projects/proxmox

# Copy to Proxmox host
scp -r smom-dbis-138-proxmox root@192.168.11.10:/opt/
```

#### Step 2: SSH to Proxmox Host

```bash
ssh root@192.168.11.10
```

#### Step 3: Run Deployment on Host

```bash
cd /opt/smom-dbis-138-proxmox

# Make scripts executable
chmod +x scripts/deployment/*.sh
chmod +x install/*.sh

# Run deployment
./scripts/deployment/deploy-all.sh
```

#### Automated Script

Use the provided script to automate this:

```bash
./scripts/deploy-to-proxmox-host.sh
```

This script will:

1. Copy the deployment package to the Proxmox host
2. SSH into the host
3. Run the deployment automatically
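
The script's exact behavior is defined in the repository. The following is a rough, hypothetical sketch of those three steps. The function names, the `PROXMOX_HOST` default and the `DRY_RUN` switch are illustrative assumptions and are not taken from `deploy-to-proxmox-host.sh`:

```bash
# Hypothetical sketch of a copy-and-deploy helper; the real
# scripts/deploy-to-proxmox-host.sh may differ in names and options.
PROXMOX_HOST="${PROXMOX_HOST:-192.168.11.10}"   # host used throughout this guide
REMOTE_DIR="/opt/smom-dbis-138-proxmox"

# Run a command, or only print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

deploy_to_host() {
  # 1. Copy the deployment package to the Proxmox host
  run scp -r smom-dbis-138-proxmox "root@${PROXMOX_HOST}:/opt/"
  # 2-3. SSH into the host and run the deployment there
  run ssh "root@${PROXMOX_HOST}" \
    "cd ${REMOTE_DIR} && chmod +x scripts/deployment/*.sh install/*.sh && ./scripts/deployment/deploy-all.sh"
}
```

Running `DRY_RUN=1 deploy_to_host` prints the `scp` and `ssh` commands without executing them. This is a cheap way to check paths before a real run.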

---

### Option 2: Hybrid Approach (API + SSH)

Create containers via the Proxmox API, then configure them via SSH.

#### Step 1: Create Containers via API

```bash
# Use the remote deployment script (creates containers via API)
cd smom-dbis-138-proxmox
./scripts/deployment/deploy-remote.sh
```

#### Step 2: Copy Files and Install

```bash
# From your local machine (inside smom-dbis-138-proxmox):
# make sure the target directory exists, then copy the installation scripts
ssh root@192.168.11.10 mkdir -p /opt/smom-dbis-138-proxmox
scp -r install/ root@192.168.11.10:/opt/smom-dbis-138-proxmox/

# SSH to the Proxmox host; the remaining commands run there
ssh root@192.168.11.10
cd /opt/smom-dbis-138-proxmox

# Install in each container
for vmid in 106 107 108 109; do
    pct push "$vmid" install/besu-validator-install.sh /tmp/install.sh
    pct exec "$vmid" -- bash /tmp/install.sh
done
```
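
Below is a slightly more defensive version of the loop above, sketched as a hypothetical helper for the Proxmox host. It checks `pct status` before pushing the script, keeps going if one container fails, and returns non-zero at the end so failures are not silently lost:

```bash
# Hypothetical helper (run on the Proxmox host): push an install script into
# each container and execute it, skipping containers that are not running.
install_in_containers() {
  local script="$1"; shift
  local vmid status rc=0
  for vmid in "$@"; do
    # `pct status` prints a line such as "status: running"
    status="$(pct status "$vmid" 2>/dev/null | awk '{print $2}')"
    if [ "$status" != "running" ]; then
      echo "skip $vmid: status '${status:-unknown}'" >&2
      rc=1
      continue
    fi
    if ! { pct push "$vmid" "$script" /tmp/install.sh &&
           pct exec "$vmid" -- bash /tmp/install.sh; }; then
      echo "install failed in $vmid" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `install_in_containers install/besu-validator-install.sh 106 107 108 109`.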

---

### Option 3: Use MCP Server Tools

The MCP server provides API-based tools that can create containers remotely.

**Available via MCP:**
- Container creation
- Container management
- Configuration

**Limitations:**
- File upload (`pct push`) still requires shell access on the Proxmox host
- Some operations may need to be executed on the host itself

---

## Why `pct` is Required

The `pct` (Proxmox Container Toolkit) command:
- Is only available on Proxmox hosts
- Provides direct access to the container filesystem
- Allows file upload into containers (`pct push`)
- Allows command execution inside containers (`pct exec`)
- Is more efficient than the API for some operations

**API Alternative:**
- Container creation: ✅ Supported
- Container management: ✅ Supported
- File upload: ⚠️ Limited (requires workarounds)
- Command execution: ✅ Supported (with limitations)
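
For the API route, Proxmox VE accepts API tokens in an `Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET` header on port 8006. The sketch below lists the LXC containers on one node. It assumes the `PROXMOX_HOST`, `PROXMOX_USER`, `PROXMOX_TOKEN_NAME` and `PROXMOX_TOKEN_VALUE` variables from the project's `.env`. The helper names are hypothetical. The `-k` flag skips TLS verification to accommodate the default self-signed certificate:

```bash
# Sketch: list LXC containers on a node via the Proxmox VE REST API.
# Assumes PROXMOX_HOST, PROXMOX_USER, PROXMOX_TOKEN_NAME, PROXMOX_TOKEN_VALUE are set (e.g. from .env).

pve_auth_header() {
  # Token header format: PVEAPIToken=USER@REALM!TOKENID=SECRET
  printf 'Authorization: PVEAPIToken=%s!%s=%s' \
    "$PROXMOX_USER" "$PROXMOX_TOKEN_NAME" "$PROXMOX_TOKEN_VALUE"
}

pve_list_containers() {
  local node="$1"   # Proxmox node name
  # -k accepts the default self-signed certificate; prefer --cacert in production
  curl -sk -H "$(pve_auth_header)" \
    "https://${PROXMOX_HOST}:8006/api2/json/nodes/${node}/lxc"
}
```

Assuming the node is named `ml110-01` (the target named in the deployment readiness checklist), `pve_list_containers ml110-01 | python3 -m json.tool` prints the node's containers as JSON.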

---

## Recommended Workflow

### For Remote Deployment:

1. **Automated (copy and deploy in one step):**
   ```bash
   ./scripts/deploy-to-proxmox-host.sh
   ```

2. **Or copy manually:**
   ```bash
   scp -r smom-dbis-138-proxmox root@192.168.11.10:/opt/
   ssh root@192.168.11.10
   # On the Proxmox host:
   cd /opt/smom-dbis-138-proxmox
   ./scripts/deployment/deploy-all.sh
   ```

### For Local Deployment:

If you have direct access to the Proxmox host:

```bash
# On Proxmox host
cd /opt/smom-dbis-138-proxmox
./scripts/deployment/deploy-all.sh
```

---

## Troubleshooting

### Issue: "pct command not found"

**Solution:** Run the deployment on the Proxmox host, not remotely (see Option 1 above).
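
To fail fast with a clearer message, a script can check for `pct` before doing anything else. Here is a minimal sketch; the function name `require_pct` is hypothetical and does not come from the project's scripts:

```bash
# Hypothetical preflight check: succeed only where `pct` is available (i.e. on a Proxmox host).
require_pct() {
  if command -v pct >/dev/null 2>&1; then
    return 0
  fi
  echo "[ERROR] pct not found: run this on the Proxmox host," >&2
  echo "        e.g. ssh root@192.168.11.10 and cd /opt/smom-dbis-138-proxmox" >&2
  return 1
}
```

Placing `require_pct || exit 1` near the top of a script stops it before any partial work is done.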

### Issue: "Permission denied"

**Solution:** Run as `root` (or with `sudo`), and make sure the scripts are executable (`chmod +x`, as in Step 3 above).

### Issue: "Container creation failed"

**Check:**
- API token has proper permissions
- Storage is available
- Template exists
- Sufficient resources

---

## Summary

**Best Practice:** Copy the deployment package to the Proxmox host and run it there.

**Quick Command:**
```bash
./scripts/deploy-to-proxmox-host.sh
```

This automates the entire process of copying and deploying.

---

**Last Updated:** $(date)

251
docs/03-deployment/RUN_DEPLOYMENT.md
Normal file
@@ -0,0 +1,251 @@

# Run Deployment - Execution Guide

## ✅ Scripts Validated and Ready

All scripts have been validated:
- ✓ Syntax OK
- ✓ Executable permissions set
- ✓ Dependencies present
- ✓ Help/usage messages working

## Quick Start

### Step 1: Copy Scripts to Proxmox Host

**From your local machine:**

```bash
cd /home/intlc/projects/proxmox
./scripts/copy-scripts-to-proxmox.sh
```

This copies all deployment scripts to the Proxmox host at `/opt/smom-dbis-138-proxmox/scripts/`.

### Step 2: Run Deployment on Proxmox Host

**SSH to the Proxmox host and execute:**

```bash
# 1. SSH to Proxmox host
ssh root@192.168.11.10

# 2. Navigate to deployment directory
cd /opt/smom-dbis-138-proxmox

# 3. Run complete deployment
sudo ./scripts/deployment/deploy-validated-set.sh \
    --source-project /home/intlc/projects/smom-dbis-138
```

**Note:** The `--source-project` path is read on the Proxmox host, so it must exist there. If the source project lives on another machine, either:
- mount or share the directory on the Proxmox host, or
- copy the configuration files to the Proxmox host separately and pass that location to `--source-project`.

## Execution Options

### Option 1: Complete Deployment (First Time)

Deploys everything from scratch:

```bash
sudo ./scripts/deployment/deploy-validated-set.sh \
    --source-project /path/to/smom-dbis-138
```

**What it does:**
1. Deploys containers
2. Copies configuration files
3. Bootstraps the network
4. Validates the deployment

### Option 2: Bootstrap Existing Containers

If containers are already deployed and configured:

```bash
sudo ./scripts/network/bootstrap-network.sh
```

Or use the main script with the deployment and configuration phases skipped:

```bash
sudo ./scripts/deployment/deploy-validated-set.sh \
    --skip-deployment \
    --skip-config \
    --source-project /path/to/smom-dbis-138
```

### Option 3: Validate Only

Validate the current deployment:

```bash
sudo ./scripts/validation/validate-validator-set.sh
```

### Option 4: Check Node Health

Check the health of a specific node by VMID:

```bash
# Human-readable output
sudo ./scripts/health/check-node-health.sh 1000

# JSON output (for automation)
sudo ./scripts/health/check-node-health.sh 1000 --json
```

## Expected Output

### Successful Deployment

```
=========================================
Deploy Validated Set - Script-Based Approach
=========================================

=== Pre-Deployment Validation ===
[✓] Prerequisites checked

=========================================
Phase 1: Deploy Containers
=========================================
[INFO] Deploying Besu nodes...
[✓] Besu nodes deployed

=========================================
Phase 2: Copy Configuration Files
=========================================
[INFO] Copying Besu configuration files...
[✓] Configuration files copied

=========================================
Phase 3: Bootstrap Network
=========================================
[INFO] Bootstrapping network...
[INFO] Collecting enodes from validators...
[✓] Network bootstrapped

=========================================
Phase 4: Validate Deployment
=========================================
[INFO] Validating validator set...
[✓] All validators validated successfully!

=========================================
[✓] Deployment Complete!
=========================================
```

## Monitoring During Execution

### Watch Logs in Real-Time

```bash
# In another terminal, watch the log file
tail -f /opt/smom-dbis-138-proxmox/logs/deploy-validated-set-*.log
```
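
If several runs have left multiple logs behind, the glob above follows all of them at once. The sketch below picks only the most recently modified file. The `latest_log` helper is hypothetical, and it assumes filenames without newlines because it parses `ls -t` output:

```bash
# Sketch: print the newest deploy-validated-set log in a directory.
latest_log() {
  local dir="${1:-/opt/smom-dbis-138-proxmox/logs}"
  # ls -t sorts by modification time, newest first
  ls -t "$dir"/deploy-validated-set-*.log 2>/dev/null | head -n 1
}
```

Usage: `tail -f "$(latest_log)"`.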

### Check Container Status

```bash
# List the deployment's containers (header line plus the matching VMIDs)
pct list | awk 'NR == 1 || $1 ~ /^(100[0-4]|150[0-3]|250[0-2])$/'

# Check specific container
pct status 1000
```
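
To check the whole set at once rather than one VMID at a time, the sketch below reports every expected container whose `pct status` is not `running`. It returns non-zero if any are found. The VMID list is the one used throughout this guide, and the `check_all_running` name is hypothetical:

```bash
# Sketch: report every expected container that is not running.
# The VMID list matches the one used in this guide; adjust it to your config.
EXPECTED_VMIDS="1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502"

check_all_running() {
  local vmid status failed=0
  for vmid in "$@"; do
    # `pct status` prints "status: <state>"; empty output means no such container
    status="$(pct status "$vmid" 2>/dev/null | awk '{print $2}')"
    if [ "$status" != "running" ]; then
      echo "$vmid: ${status:-missing}"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Usage: `check_all_running $EXPECTED_VMIDS && echo "all containers running"`. Leave `$EXPECTED_VMIDS` unquoted so it splits into separate VMIDs.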

### Monitor Service Logs

```bash
# Watch Besu service logs
pct exec 1000 -- journalctl -u besu-validator -f
```

## Troubleshooting

### If Deployment Fails

1. **Check the log file:**
   ```bash
   tail -n 100 /opt/smom-dbis-138-proxmox/logs/deploy-validated-set-*.log
   ```

2. **Check container status:**
   ```bash
   pct list
   ```

3. **Check service status:**
   ```bash
   # Validators run besu-validator; sentries and RPC nodes run besu-sentry / besu-rpc
   pct exec <vmid> -- systemctl status besu-validator
   ```

4. **Review error messages** in the script output.

### Common Issues

**Issue: Containers not starting**
- Check resources (RAM, disk)
- Check OS template availability
- Review container logs

**Issue: Configuration copy fails**
- Verify the source project path is correct
- Check that the source files exist
- Verify the containers are running

**Issue: Bootstrap fails**
- Ensure the containers are running
- Check that the P2P port (30303) is accessible
- Verify that enode extraction works

**Issue: Validation fails**
- Check that validator keys exist
- Verify configuration files are present
- Check that services are running

## Post-Deployment Verification

After successful deployment, verify:

```bash
# 1. Check the Besu services in every container (each container runs one of them)
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
    echo "=== Container $vmid ==="
    pct exec "$vmid" -- systemctl list-units --all 'besu-*' --no-pager --no-legend
done

# 2. Check consensus (block production)
pct exec 2500 -- curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545 | python3 -m json.tool

# 3. Check peer connections (admin_peers requires the ADMIN API to be enabled on this node)
pct exec 2500 -- curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
    http://localhost:8545 | python3 -m json.tool
```
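
A single `eth_blockNumber` call only shows the current height. Consensus is active when that height keeps increasing. The sketch below samples the height twice. It assumes two things: the RPC node (VMID 2500) serves HTTP JSON-RPC on `localhost:8545` as in the commands above, and `python3` is available on the Proxmox host to parse the hex `result`. The helper names are hypothetical:

```bash
# Sketch: confirm blocks are being produced by sampling eth_blockNumber twice.
# Assumes HTTP JSON-RPC on localhost:8545 inside the container and python3 on the host.

# Print the current block height of a node as a decimal number
block_number() {
  local vmid="$1"
  pct exec "$vmid" -- curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545 |
    python3 -c 'import json, sys; print(int(json.load(sys.stdin)["result"], 16))'
}

# Succeed only if the height increases within wait_s seconds
blocks_advancing() {
  local vmid="${1:-2500}" wait_s="${2:-15}" first second
  first="$(block_number "$vmid")" || return 1
  sleep "$wait_s"
  second="$(block_number "$vmid")" || return 1
  echo "block $first -> $second"
  [ "$second" -gt "$first" ]
}
```

`blocks_advancing 2500 15` waits 15 seconds between samples. Pick an interval comfortably longer than the chain's block period, which is configured in `genesis.json`.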

## Success Criteria

Deployment is successful when:
- ✓ All containers are running
- ✓ All services are active
- ✓ The network is bootstrapped (static-nodes.json deployed)
- ✓ Validator set validation passes
- ✓ Consensus is active (blocks are being produced)
- ✓ Nodes are connected to their peers

## Next Steps

After successful deployment:
1. Set up monitoring
2. Configure backups
3. Document node endpoints
4. Set up alerting
5. Plan a maintenance schedule

289
docs/03-deployment/VALIDATED_SET_DEPLOYMENT_GUIDE.md
Normal file
@@ -0,0 +1,289 @@

# Validated Set Deployment Guide

Complete guide for deploying a validated Besu node set using the script-based approach.

## Overview

This guide covers deploying a validated set of Besu nodes (validators, sentries, RPC) on Proxmox VE LXC containers using automated scripts. The deployment uses a **script-based approach** with `static-nodes.json` for peer discovery (no boot node required).

## Prerequisites

- Proxmox VE 7.0+ installed
- Root access to Proxmox host
- Sufficient resources (RAM, disk, CPU)
- Network connectivity
- Source project with Besu configuration files

## Deployment Methods

### Method 1: Complete Deployment (Recommended)

Deploy everything from scratch in one command:

```bash
cd /opt/smom-dbis-138-proxmox
sudo ./scripts/deployment/deploy-validated-set.sh \
    --source-project /path/to/smom-dbis-138
```

**What this does:**
1. Deploys all containers (validators, sentries, RPC)
2. Copies configuration files from source project
3. Bootstraps the network (generates and deploys static-nodes.json)
4. Validates the deployment
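
Besu's `static-nodes.json` is a JSON array of enode URLs of the form `enode://<128-hex-char public key>@<IP>:<port>`. The bootstrap phase generates this file for you. The sketch below only illustrates the format and is not the project's bootstrap logic. The helper names are hypothetical, and the check accepts IPv4 addresses only:

```bash
# Sketch: validate enode URLs and render them as a static-nodes.json array.
# Helper names are hypothetical; bootstrap-network.sh does the real work.

is_valid_enode() {
  # 64-byte public key (128 hex chars) @ IPv4 address : port
  local re='^enode://[0-9a-fA-F]{128}@([0-9]{1,3}\.){3}[0-9]{1,3}:[0-9]{1,5}$'
  [[ "$1" =~ $re ]]
}

make_static_nodes() {
  local enode sep=''
  # Validate everything first so no partial JSON is printed
  for enode in "$@"; do
    is_valid_enode "$enode" || { echo "invalid enode: $enode" >&2; return 1; }
  done
  printf '['
  for enode in "$@"; do
    printf '%s"%s"' "$sep" "$enode"
    sep=','
  done
  printf ']\n'
}
```

For example, `make_static_nodes "$enode1" "$enode2" > static-nodes.json` writes the array. Here the enode values stand for those collected from the validators.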

### Method 2: Step-by-Step Deployment

If you prefer more control, deploy step by step:

```bash
# Step 1: Deploy containers
sudo ./scripts/deployment/deploy-besu-nodes.sh

# Step 2: Copy configuration files
SOURCE_PROJECT=/path/to/smom-dbis-138 \
    ./scripts/copy-besu-config.sh

# Step 3: Bootstrap network
sudo ./scripts/network/bootstrap-network.sh

# Step 4: Validate validators
sudo ./scripts/validation/validate-validator-set.sh
```
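
When running the steps by hand, it helps to stop at the first failure and know where to resume. Below is a minimal sketch with a hypothetical `run_steps` helper. It runs each command string through `bash -c` in order:

```bash
# Sketch: run the manual steps in order, stopping at the first failure.
# `run_steps` is a hypothetical helper; each argument is one shell command.
run_steps() {
  local i=0 step
  for step in "$@"; do
    i=$((i + 1))
    echo "=== Step $i: $step ==="
    if ! bash -c "$step"; then
      echo "Step $i failed; fix the issue and resume from this step." >&2
      return "$i"   # exit status = number of the failed step
    fi
  done
}

# Example:
# run_steps \
#   "sudo ./scripts/deployment/deploy-besu-nodes.sh" \
#   "SOURCE_PROJECT=/path/to/smom-dbis-138 ./scripts/copy-besu-config.sh" \
#   "sudo ./scripts/network/bootstrap-network.sh" \
#   "sudo ./scripts/validation/validate-validator-set.sh"
```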

### Method 3: Bootstrap Existing Containers

If containers are already deployed and configured:

```bash
# Quick bootstrap (just network bootstrap)
sudo ./scripts/deployment/bootstrap-quick.sh

# Or use the full script with skip options
sudo ./scripts/deployment/deploy-validated-set.sh \
    --skip-deployment \
    --skip-config \
    --source-project /path/to/smom-dbis-138
```

## Detailed Steps

### Step 1: Prepare Source Project

Ensure your source project has the required files:

```
smom-dbis-138/
├── config/
│   ├── genesis.json
│   ├── permissions-nodes.toml
│   ├── permissions-accounts.toml
│   ├── static-nodes.json (will be generated/updated)
│   ├── config-validator.toml
│   ├── config-sentry.toml
│   └── config-rpc-public.toml
└── keys/
    └── validators/
        ├── validator-1/
        ├── validator-2/
        ├── validator-3/
        ├── validator-4/
        └── validator-5/
```

### Step 2: Review Configuration

Check your deployment configuration:

```bash
cat config/proxmox.conf
cat config/network.conf
```

Key settings:
- `VALIDATOR_START`, `VALIDATOR_COUNT` - Validator VMID range
- `SENTRY_START`, `SENTRY_COUNT` - Sentry VMID range
- `RPC_START`, `RPC_COUNT` - RPC VMID range
- `CONTAINER_OS_TEMPLATE` - OS template to use
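
Each VMID range is a start value plus a count. Here is a minimal sketch for expanding a range and checking that two ranges do not collide. It assumes the values are plain integers. The example values (validators 1000 x 5, sentries 1500 x 4, RPC 2500 x 3) are inferred from the VMIDs used elsewhere in this guide, not read from the actual config. The helper names are hypothetical:

```bash
# Sketch: expand a START/COUNT pair into VMIDs and detect overlapping ranges.

# Print the VMIDs start .. start+count-1, space-separated
vmid_range() {
  local start="$1" count="$2"
  [ "$count" -gt 0 ] || return 0
  seq -s ' ' "$start" $((start + count - 1))
}

# Succeed if two non-empty ranges share at least one VMID
ranges_overlap() {
  local s1="$1" c1="$2" s2="$3" c2="$4"
  [ "$c1" -gt 0 ] && [ "$c2" -gt 0 ] &&
    [ "$s1" -lt $((s2 + c2)) ] && [ "$s2" -lt $((s1 + c1)) ]
}

# Example, with values inferred from the VMIDs in this guide:
# vmid_range 1000 5              # validators: 1000 1001 1002 1003 1004
# ranges_overlap 1000 5 1500 4 || echo "validator and sentry ranges are disjoint"
```

If `config/proxmox.conf` uses plain `KEY=value` lines (an assumption), you can source it and call `vmid_range "$VALIDATOR_START" "$VALIDATOR_COUNT"` to get the loop lists used in later steps instead of hard-coding VMIDs.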

### Step 3: Run Deployment

Execute the deployment script:

```bash
sudo ./scripts/deployment/deploy-validated-set.sh \
    --source-project /path/to/smom-dbis-138
```

### Step 4: Monitor Progress

The script will output progress for each phase:

```
=========================================
Phase 1: Deploy Containers
=========================================
[INFO] Deploying Besu nodes...
[✓] Besu nodes deployed

=========================================
Phase 2: Copy Configuration Files
=========================================
[INFO] Copying Besu configuration files...
[✓] Configuration files copied

=========================================
Phase 3: Bootstrap Network
=========================================
[INFO] Bootstrapping network...
[INFO] Collecting enodes from validators...
[✓] Network bootstrapped

=========================================
Phase 4: Validate Deployment
=========================================
[INFO] Validating validator set...
[✓] All validators validated successfully!
```

### Step 5: Verify Deployment

After deployment completes, verify everything is working:

```bash
# Check all containers are running (header line plus the matching VMIDs)
pct list | awk 'NR == 1 || $1 ~ /^(100[0-4]|150[0-3]|250[0-2])$/'

# Check service status
for vmid in 1000 1001 1002 1003 1004; do
    echo "=== Validator $vmid ==="
    pct exec "$vmid" -- systemctl status besu-validator --no-pager -l
done

# Check consensus is active (blocks being produced)
pct exec 2500 -- curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
    http://localhost:8545 | python3 -m json.tool
```

## Health Checks

### Check Individual Node Health

```bash
# Human-readable output
sudo ./scripts/health/check-node-health.sh 1000

# JSON output (for automation)
sudo ./scripts/health/check-node-health.sh 1000 --json
```

### Validate Validator Set

```bash
sudo ./scripts/validation/validate-validator-set.sh
```

This checks:
- Container and service status
- Validator keys exist and are accessible
- Configuration files are present
- Consensus participation

## Troubleshooting

### Containers Won't Start

```bash
# Check container status
pct status <vmid>

# View container console
pct console <vmid>

# Check logs inside the container (only works once it is running)
pct exec <vmid> -- journalctl -xe

# If the container fails to start, check its unit log on the host
journalctl -u pve-container@<vmid>
```

### Services Won't Start

```bash
# Check service status
pct exec <vmid> -- systemctl status besu-validator

# View service logs
pct exec <vmid> -- journalctl -u besu-validator -f

# Check configuration
pct exec <vmid> -- cat /etc/besu/config-validator.toml
```

### Network Connectivity Issues

```bash
# Check P2P port is listening (ss ships with iproute2; netstat needs net-tools)
pct exec <vmid> -- ss -tuln | grep 30303

# Check peer connections (requires HTTP RPC with the ADMIN API enabled on that node)
pct exec <vmid> -- curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"admin_peers","params":[],"id":1}' \
    http://localhost:8545

# Verify static-nodes.json
pct exec <vmid> -- cat /etc/besu/static-nodes.json
```

### Consensus Issues

```bash
# Check validator is participating
pct exec <vmid> -- journalctl -u besu-validator --no-pager | grep -i "consensus\|qbft\|proposing"

# Verify validator keys
pct exec <vmid> -- ls -la /keys/validators/

# Check genesis file
pct exec <vmid> -- cat /etc/besu/genesis.json | python3 -m json.tool
```
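
With QBFT, you can also ask a node which validators it currently recognizes. Besu exposes `qbft_getValidatorsByBlockNumber` when the QBFT API group is enabled in `--rpc-http-api`. The sketch below assumes that, along with HTTP RPC on `localhost:8545` inside the container and `python3` on the host. The default of 5 expected validators comes from the five validator key directories in the source project layout above. The helper names are hypothetical:

```bash
# Sketch: count the validators a node reports for the latest block.
# Assumes the QBFT RPC API is enabled on the queried node.

validator_count() {
  local vmid="$1"
  pct exec "$vmid" -- curl -s -X POST \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"qbft_getValidatorsByBlockNumber","params":["latest"],"id":1}' \
    http://localhost:8545 |
    python3 -c 'import json, sys; print(len(json.load(sys.stdin)["result"]))'
}

check_validator_count() {
  local vmid="$1" expected="${2:-5}" actual
  actual="$(validator_count "$vmid")" || return 1
  echo "node $vmid reports $actual validators (expected $expected)"
  [ "$actual" -eq "$expected" ]
}
```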

## Rollback

If deployment fails, you can remove the containers. **Warning:** `pct destroy` permanently deletes a container and its volumes, so double-check the VMID list first.

```bash
# Remove specific containers
for vmid in 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502; do
    pct stop "$vmid" 2>/dev/null || true
    pct destroy "$vmid" 2>/dev/null || true
done
```
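
Because the loop above is irreversible, a safer pattern is to preview first. The hypothetical sketch below defaults to a dry run and skips VMIDs that do not exist on the host. It only destroys containers when `DRY_RUN=0` is set explicitly:

```bash
# Sketch: preview the rollback first; containers are destroyed only when DRY_RUN=0.
# `rollback_containers` is a hypothetical helper, not one of the project's scripts.
rollback_containers() {
  local vmid
  for vmid in "$@"; do
    # Skip VMIDs that do not exist on this host
    if ! pct status "$vmid" >/dev/null 2>&1; then
      echo "skip $vmid: no such container"
      continue
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would run: pct stop $vmid; pct destroy $vmid"
    else
      pct stop "$vmid" 2>/dev/null || true   # may already be stopped
      pct destroy "$vmid"
    fi
  done
}
```

First run `rollback_containers 1000 1001 1002 1003 1004 1500 1501 1502 1503 2500 2501 2502` to see what would be removed. Once the list is correct, repeat the command with `DRY_RUN=0`.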

Then re-run the deployment after fixing any issues.

## Post-Deployment

After successful deployment:

1. **Monitor Logs**: Keep an eye on service logs for the first few hours
2. **Verify Consensus**: Ensure blocks are being produced
3. **Check Resources**: Monitor CPU, memory, and disk usage
4. **Network Health**: Verify all nodes are connected
5. **Backup**: Consider creating snapshots of working containers

## Next Steps

- Set up monitoring (Prometheus, Grafana)
- Configure backups
- Document node endpoints
- Set up alerting
- Plan for maintenance windows

## Additional Resources

- [Besu Nodes File Reference](BESU_NODES_FILE_REFERENCE.md)
- [Network Bootstrap Guide](NETWORK_BOOTSTRAP_GUIDE.md)
- [Boot Node Runbook](BOOT_NODE_RUNBOOK.md) (if using boot node)
- [Besu Allowlist Runbook](BESU_ALLOWLIST_RUNBOOK.md)