Complete markdown files cleanup and organization

- Organized 252 files across project
- Root directory: 187 → 2 files (98.9% reduction)
- Moved configuration guides to docs/04-configuration/
- Moved troubleshooting guides to docs/09-troubleshooting/
- Moved quick start guides to docs/01-getting-started/
- Moved reports to reports/ directory
- Archived temporary files
- Generated comprehensive reports and documentation
- Created maintenance scripts and guides

All files organized according to established standards.
defiQUG
2026-01-06 01:46:25 -08:00
parent 1edcec953c
commit cb47cce074
1327 changed files with 217220 additions and 801 deletions


@@ -0,0 +1,342 @@
# Backup and Restore Procedures
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation
---
## Overview
This document provides detailed procedures for backing up and restoring Proxmox VMs, containers, and configuration.
---
## Backup Strategy
### Backup Types
1. **VM/Container Backups:**
- Full VM snapshots
- Container backups
- Application data backups
2. **Configuration Backups:**
- Proxmox host configuration
- Network configuration
- Storage configuration
3. **Data Backups:**
- Database backups
- Application data
- Configuration files
---
## Backup Procedures
### Proxmox VM/Container Backups
#### Using Proxmox Backup Server (PBS)
**Setup:**
1. **Install PBS** (if not already installed)
2. **Add PBS to Proxmox:**
- Datacenter → Storage → Add → Proxmox Backup Server
- Enter PBS server details
- Test connection
**Scheduled Backups:**
1. **Create Backup Job:**
- Datacenter → Backup → Add
- Select VMs/containers
- Set schedule (daily, weekly, etc.)
- Choose retention policy
2. **Backup Options:**
- **Mode:** Snapshot (recommended for running VMs)
- **Compression:** ZSTD (recommended)
- **Storage:** Proxmox Backup Server
**Manual Backup:**
```bash
# Backup single VM
vzdump <vmid> --storage <storage-name> --mode snapshot
# Backup multiple VMs
vzdump 100 101 102 --storage <storage-name> --mode snapshot
# Backup all VMs
vzdump --all --storage <storage-name> --mode snapshot
```
#### Using vzdump (Direct)
**Backup to Local Storage:**
```bash
# Backup VM to local storage
vzdump <vmid> --storage local --mode snapshot --compress zstd
# Backup with retention (--maxfiles is deprecated in current Proxmox releases)
vzdump <vmid> --storage local --mode snapshot --prune-backups keep-last=7
```
**Backup to NFS:**
```bash
# Add NFS storage first
# Datacenter → Storage → Add → NFS
# Backup to NFS
vzdump <vmid> --storage nfs-backup --mode snapshot
```
---
### Configuration Backups
#### Proxmox Host Configuration
**Backup Configuration Files:**
```bash
# Backup Proxmox configuration
tar -czf /backup/proxmox-config-$(date +%Y%m%d).tar.gz \
/etc/pve/ \
/etc/network/interfaces \
/etc/hosts \
/etc/hostname
```
**Restore Configuration:**
```bash
# Extract configuration
# Caution: /etc/pve is the pmxcfs cluster filesystem; restore files under it
# selectively rather than untarring blindly over a running cluster
tar -xzf /backup/proxmox-config-YYYYMMDD.tar.gz -C /
# Restart services (the daemon units are pvedaemon and pveproxy)
systemctl restart pve-cluster
systemctl restart pvedaemon pveproxy
```
#### Network Configuration
**Backup Network Config:**
```bash
# Backup network configuration
cp /etc/network/interfaces /backup/interfaces-$(date +%Y%m%d)
cp /etc/hosts /backup/hosts-$(date +%Y%m%d)
```
**Version Control:**
- Store network configuration in Git
- Track changes over time
- Easy rollback if needed
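The version-control bullets above can be automated with a small, cron-able script. This is a sketch: the repository path and the inline committer identity are assumptions, not fixed by this document.

```bash
#!/usr/bin/env bash
# Sketch: snapshot network config into a local Git repo for easy diff/rollback.
# REPO path and the committer identity are assumptions -- adjust as needed.
set -euo pipefail
REPO="${REPO:-/tmp/network-config-git}"
mkdir -p "$REPO" && cd "$REPO"
[ -d .git ] || git init -q
# Copy whichever of the tracked config files exist on this host
for f in /etc/network/interfaces /etc/hosts; do
  if [ -f "$f" ]; then cp "$f" .; fi
done
git add -A
# Commit only when something changed; identity set inline for unattended runs
git -c user.name=ops -c user.email=ops@example.invalid \
  commit -q -m "network config snapshot $(date +%Y%m%d)" || true
git log --oneline | tail -n 3   # show recent snapshots
```

Run it from cron (or a systemd timer) after any network change; `git diff` between snapshots then shows exactly what changed.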
---
### Application Data Backups
#### Database Backups
**PostgreSQL:**
```bash
# Backup PostgreSQL database
pg_dump -U <user> <database> > /backup/db-$(date +%Y%m%d).sql
# Restore
psql -U <user> <database> < /backup/db-YYYYMMDD.sql
```
**MySQL/MariaDB:**
```bash
# Backup MySQL database
mysqldump -u <user> -p <database> > /backup/db-$(date +%Y%m%d).sql
# Restore
mysql -u <user> -p <database> < /backup/db-YYYYMMDD.sql
```
#### Application Files
```bash
# Backup application directory
tar -czf /backup/app-$(date +%Y%m%d).tar.gz /path/to/application
# Restore
tar -xzf /backup/app-YYYYMMDD.tar.gz -C /
```
---
## Restore Procedures
### Restore VM/Container from Backup
#### From Proxmox Backup Server
**Via Web UI:**
1. **Select VM/Container:**
- Datacenter → Backup → Select backup
- Click "Restore"
2. **Restore Options:**
- Select target storage
- Choose new VMID (or keep original)
- Set network configuration
3. **Start Restore:**
- Click "Restore"
- Monitor progress
**Via Command Line:**
```bash
# List backup volumes on the storage (note the volume ID)
pvesm list <storage> --content backup
# Restore a VM from a backup volume
qmrestore <backup-volid> <vmid> --storage <target-storage>
# Restore a container from a backup volume
pct restore <vmid> <backup-volid> --storage <target-storage>
```
#### From vzdump Backup
```bash
# Restore a VM from a vzdump archive
qmrestore /path/to/vzdump-qemu-<vmid>-<timestamp>.vma.zst <new-vmid> --storage <storage>
# Restore a container from a vzdump archive
pct restore <new-vmid> /path/to/vzdump-lxc-<vmid>-<timestamp>.tar.zst --storage <storage>
```
---
### Restore Configuration
#### Restore Proxmox Configuration
```bash
# Stop Proxmox services
systemctl stop pvedaemon pveproxy
systemctl stop pve-cluster
# Restore configuration
tar -xzf /backup/proxmox-config-YYYYMMDD.tar.gz -C /
# Start services
systemctl start pve-cluster
systemctl start pvedaemon pveproxy
```
#### Restore Network Configuration
```bash
# Restore network config
cp /backup/interfaces-YYYYMMDD /etc/network/interfaces
cp /backup/hosts-YYYYMMDD /etc/hosts
# Restart networking
systemctl restart networking
```
---
## Backup Verification
### Verify Backup Integrity
**Check Backup Files:**
```bash
# List backups on a storage
pvesm list <storage> --content backup
```
On Proxmox Backup Server, verification runs as a datastore verify job (Datastore → Verify Jobs in the PBS UI, or `proxmox-backup-manager verify <datastore>` on the PBS host).
**Test Restore:**
- Monthly restore test
- Verify VM/container starts
- Test application functionality
- Document results
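The monthly restore test can be scripted. This is a hedged sketch: the scratch VMID 9999, the storage name `local`, and the default dump path are assumptions for a single-node setup.

```bash
#!/usr/bin/env bash
# Sketch of an automated restore test: restore the newest vzdump archive of a
# VM to a scratch VMID, boot it, then destroy it. VMID 9999, storage "local",
# and DUMP_DIR are assumptions -- adjust for your environment.
set -euo pipefail

restore_test() {
  local src_vmid=$1 test_vmid=9999 storage=local backup
  # Newest vzdump archive for the source VM
  backup=$(ls -t "${DUMP_DIR:-/var/lib/vz/dump}"/vzdump-qemu-"${src_vmid}"-*.vma* | head -n1)
  qmrestore "$backup" "$test_vmid" --storage "$storage"
  qm start "$test_vmid"
  sleep 30
  qm status "$test_vmid"            # expect: status: running
  qm stop "$test_vmid"
  qm destroy "$test_vmid"
  echo "restore test for VM $src_vmid completed $(date +%F)"
}

if command -v qmrestore >/dev/null; then
  restore_test "${1:-100}"
else
  echo "qmrestore not found; run this on a Proxmox host"
fi
```

Document each run's result as part of the monthly test record.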
---
## Backup Retention Policy
### Retention Schedule
- **Daily Backups:** Keep 7 days
- **Weekly Backups:** Keep 4 weeks
- **Monthly Backups:** Keep 12 months
- **Yearly Backups:** Keep 7 years
### Cleanup Old Backups
```bash
# Prune after each backup run according to the retention schedule above
vzdump <vmid> --storage <storage-name> --mode snapshot \
  --prune-backups keep-daily=7,keep-weekly=4,keep-monthly=12,keep-yearly=7
```
---
## Backup Monitoring
### Backup Status Monitoring
**Check Backup Jobs:**
- Datacenter → Backup → Jobs
- Review last backup time
- Check for errors
**Automated Monitoring:**
- Set up alerts for failed backups
- Monitor backup storage usage
- Track backup completion times
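A minimal staleness check covering the bullets above. The dump path and the 26-hour threshold (daily schedule plus slack) are assumptions; wire the WARNING output into whatever alerting you use.

```bash
#!/usr/bin/env bash
# Sketch: warn when the newest vzdump archive is older than expected.
# DUMP_DIR and the 26-hour threshold are assumptions for a daily schedule.
set -euo pipefail
DUMP_DIR="${DUMP_DIR:-/var/lib/vz/dump}"
MAX_AGE_HOURS=26

# True (exit 0) when the newest backup is older than the allowed age
backup_is_stale() {   # args: newest-epoch now-epoch max-age-hours
  local newest=$1 now=$2 max_h=$3
  [ $(( now - newest )) -gt $(( max_h * 3600 )) ]
}

newest=$(find "$DUMP_DIR" -name 'vzdump-*' -printf '%T@\n' 2>/dev/null \
         | sort -n | tail -1 | cut -d. -f1 || true)
if [ -z "${newest:-}" ]; then
  echo "WARNING: no backups found in $DUMP_DIR"
elif backup_is_stale "$newest" "$(date +%s)" "$MAX_AGE_HOURS"; then
  echo "WARNING: newest backup in $DUMP_DIR is older than ${MAX_AGE_HOURS}h"
else
  echo "OK: backups are current"
fi
```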
---
## Best Practices
1. **Test Restores Regularly:**
- Monthly restore tests
- Verify data integrity
- Document results
2. **Multiple Backup Locations:**
- Local backups (fast restore)
- Remote backups (disaster recovery)
- Offsite backups (complete protection)
3. **Document Backup Procedures:**
- Keep procedures up to date
- Document restore procedures
- Maintain backup inventory
4. **Monitor Backup Storage:**
- Check available space regularly
- Clean up old backups
- Plan for storage growth
---
## Related Documentation
- **[DISASTER_RECOVERY.md](DISASTER_RECOVERY.md)** - Disaster recovery procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational procedures
- **[../../04-configuration/SECRETS_KEYS_CONFIGURATION.md](../../04-configuration/SECRETS_KEYS_CONFIGURATION.md)** - Secrets backup
---
**Last Updated:** 2025-01-20
**Review Cycle:** Monthly


@@ -0,0 +1,229 @@
# ChainID 138 Automation Scripts
**Date:** December 26, 2024
**Status:** ✅ All automation scripts created and ready
---
## Overview
This document describes the automation scripts created for ChainID 138 deployment. These scripts can be run once containers are created to automate the complete configuration process.
---
## Available Scripts
### 1. Main Deployment Script
**File:** `scripts/deploy-all-chain138-containers.sh`
**Purpose:** Master script that orchestrates the complete deployment process.
**What it does:**
1. Configures all Besu nodes (static-nodes.json, permissioned-nodes.json)
2. Verifies configuration
3. Sets up JWT authentication for RPC containers
4. Generates JWT tokens for operators
**Usage:**
```bash
cd /home/intlc/projects/proxmox
./scripts/deploy-all-chain138-containers.sh
```
**Note:** This script will prompt for confirmation before proceeding.
---
### 2. JWT Authentication Setup
**File:** `scripts/setup-jwt-auth-all-rpc-containers.sh`
**Purpose:** Configures JWT authentication for all RPC containers (2503-2508).
**What it does:**
- Installs nginx and dependencies on each container
- Generates JWT secret keys
- Creates JWT validation service
- Configures nginx with JWT authentication
- Sets up SSL certificates
- Starts JWT validation service and nginx
**Usage:**
```bash
./scripts/setup-jwt-auth-all-rpc-containers.sh
```
**Requirements:**
- Containers must be running
- SSH access to Proxmox host
- Root access on Proxmox host
---
### 3. JWT Token Generation
**File:** `scripts/generate-jwt-token-for-container.sh`
**Purpose:** Generates JWT tokens for specific containers and operators.
**Usage:**
```bash
# Generate token for a specific container
./scripts/generate-jwt-token-for-container.sh <VMID> <username> [expiry_days]
# Examples:
./scripts/generate-jwt-token-for-container.sh 2503 ali-full-access 365
./scripts/generate-jwt-token-for-container.sh 2505 luis-rpc-access 365
./scripts/generate-jwt-token-for-container.sh 2507 putu-rpc-access 365
```
**Parameters:**
- `VMID`: Container VMID (2503-2508)
- `username`: Username for the token (e.g., ali-full-access, luis-rpc-access)
- `expiry_days`: Token expiry in days (default: 365)
**Output:**
- JWT token
- Usage example with curl command
---
### 4. Besu Configuration
**File:** `scripts/configure-besu-chain138-nodes.sh`
**Purpose:** Configures all Besu nodes with static-nodes.json and permissioned-nodes.json.
**What it does:**
1. Collects enodes from all Besu nodes
2. Generates static-nodes.json
3. Generates permissioned-nodes.json
4. Deploys configurations to all containers
5. Configures discovery settings
6. Restarts Besu services
**Usage:**
```bash
./scripts/configure-besu-chain138-nodes.sh
```
---
### 5. Configuration Verification
**File:** `scripts/verify-chain138-config.sh`
**Purpose:** Verifies the configuration of all Besu nodes.
**What it checks:**
- File existence (static-nodes.json, permissioned-nodes.json)
- Discovery settings
- Peer connections
- Service status
**Usage:**
```bash
./scripts/verify-chain138-config.sh
```
---
## Deployment Workflow
### Step 1: Create Containers
First, create all required containers (see `docs/MISSING_CONTAINERS_LIST.md`):
- 1504 - besu-sentry-5
- 2503-2508 - All RPC nodes
- 6201 - firefly-2
- Other services as needed
### Step 2: Run Main Deployment Script
Once containers are created and running:
```bash
cd /home/intlc/projects/proxmox
./scripts/deploy-all-chain138-containers.sh
```
This will:
1. Configure all Besu nodes
2. Verify configuration
3. Set up JWT authentication
4. Generate JWT tokens
### Step 3: Test and Verify
After deployment:
```bash
# Verify configuration
./scripts/verify-chain138-config.sh
# Test JWT authentication on each container
for vmid in 2503 2504 2505 2506 2507 2508; do
  echo "Testing VMID $vmid:"
  # Replace XXX with the IP assigned to container $vmid
  curl -k -H "Authorization: Bearer <TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    https://192.168.11.XXX/
done
```
---
## Token Distribution
After generating tokens, distribute them to operators:
### Ali (Full Access)
- VMID 2503 (0x8a identity): Full access token
- VMID 2504 (0x1 identity): Full access token
### Luis (RPC-Only Access)
- VMID 2505 (0x8a identity): RPC-only token
- VMID 2506 (0x1 identity): RPC-only token
### Putu (RPC-Only Access)
- VMID 2507 (0x8a identity): RPC-only token
- VMID 2508 (0x1 identity): RPC-only token
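The mapping above can be driven by one loop over the documented token-generation script (the 365-day expiry mirrors the earlier examples; the script path is relative to the project root):

```bash
#!/usr/bin/env bash
# Sketch: generate one token per container/operator pair listed above.
set -euo pipefail
declare -A OPERATORS=(
  [2503]=ali-full-access  [2504]=ali-full-access
  [2505]=luis-rpc-access  [2506]=luis-rpc-access
  [2507]=putu-rpc-access  [2508]=putu-rpc-access
)
SCRIPT=./scripts/generate-jwt-token-for-container.sh
for vmid in "${!OPERATORS[@]}"; do
  if [ -x "$SCRIPT" ]; then
    "$SCRIPT" "$vmid" "${OPERATORS[$vmid]}" 365
  else
    echo "would run: $SCRIPT $vmid ${OPERATORS[$vmid]} 365"
  fi
done
```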
---
## Troubleshooting
### Containers Not Running
If containers are not running, the scripts will skip them with a warning. Re-run the scripts after containers are started.
### JWT Secret Not Found
If JWT secret is not found:
1. Run `setup-jwt-auth-all-rpc-containers.sh` first
2. Check that container is running
3. Verify SSH access to Proxmox host
### Configuration Files Not Found
If configuration files are missing:
1. Run `configure-besu-chain138-nodes.sh` first
2. Check that all Besu containers are running
3. Verify network connectivity
---
## Related Documentation
- [Next Steps](CHAIN138_NEXT_STEPS.md)
- [Missing Containers List](MISSING_CONTAINERS_LIST.md)
- [JWT Authentication Requirements](CHAIN138_JWT_AUTH_REQUIREMENTS.md)
- [Complete Implementation](CHAIN138_COMPLETE_IMPLEMENTATION.md)
---
**Last Updated:** December 26, 2024
**Status:** ✅ Ready for use


@@ -0,0 +1,278 @@
# Change Management Process
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation
---
## Overview
This document defines the change management process for the Proxmox infrastructure, ensuring all changes are properly planned, approved, implemented, and documented.
---
## Change Types
### Standard Changes
**Definition:** Pre-approved, low-risk changes that follow established procedures.
**Examples:**
- Routine maintenance
- Scheduled updates
- Standard VM/container deployments
**Process:**
- No formal approval required
- Document in change log
- Follow standard procedures
### Normal Changes
**Definition:** Changes that require review and approval but are not emergency.
**Examples:**
- Network configuration changes
- Storage modifications
- Security updates
- New service deployments
**Process:**
- Submit change request
- Review and approval
- Schedule implementation
- Document results
### Emergency Changes
**Definition:** Urgent changes required to resolve critical issues.
**Examples:**
- Security patches
- Critical bug fixes
- Service restoration
**Process:**
- Implement immediately
- Document during/after
- Post-implementation review
- Retrospective approval
---
## Change Request Process
### 1. Change Request Submission
**Required Information:**
1. **Change Details:**
- Description of change
- Reason for change
- Expected impact
2. **Technical Details:**
- Systems affected
- Implementation steps
- Rollback plan
3. **Risk Assessment:**
- Risk level (Low/Medium/High)
- Potential impact
- Mitigation strategies
4. **Timeline:**
- Proposed implementation date
- Estimated duration
- Maintenance window (if needed)
### 2. Change Review
**Review Criteria:**
1. **Technical Review:**
- Feasibility
- Impact assessment
- Risk evaluation
2. **Business Review:**
- Business impact
- Resource requirements
- Timeline alignment
3. **Security Review:**
- Security implications
- Compliance requirements
- Risk assessment
### 3. Change Approval
**Approval Levels:**
- **Standard Changes:** No approval required
- **Normal Changes:** Infrastructure lead approval
- **High-Risk Changes:** Management approval
- **Emergency Changes:** Post-implementation approval
### 4. Change Implementation
**Pre-Implementation:**
1. **Preparation:**
- Verify backups
- Prepare rollback plan
- Notify stakeholders
- Schedule maintenance window (if needed)
2. **Implementation:**
- Follow documented procedures
- Document steps taken
- Monitor for issues
3. **Verification:**
- Test functionality
- Verify system health
- Check logs for errors
### 5. Post-Implementation
**Activities:**
1. **Documentation:**
- Update documentation
- Document any issues
- Update change log
2. **Review:**
- Post-implementation review
- Lessons learned
- Process improvements
---
## Change Request Template
```markdown
# Change Request
## Change Information
- **Requestor:** [Name]
- **Date:** [Date]
- **Change Type:** [Standard/Normal/Emergency]
- **Priority:** [Low/Medium/High/Critical]
## Change Description
[Detailed description of the change]
## Reason for Change
[Why is this change needed?]
## Systems Affected
[List of systems, VMs, containers, or services]
## Implementation Plan
[Step-by-step implementation plan]
## Rollback Plan
[How to rollback if issues occur]
## Risk Assessment
- **Risk Level:** [Low/Medium/High]
- **Potential Impact:** [Description]
- **Mitigation:** [How to mitigate risks]
## Testing Plan
[How the change will be tested]
## Timeline
- **Proposed Date:** [Date]
- **Estimated Duration:** [Time]
- **Maintenance Window:** [If applicable]
## Approval
- **Reviewed By:** [Name]
- **Approved By:** [Name]
- **Date:** [Date]
```
---
## Change Log
### Change Log Format
| Date | Change ID | Description | Type | Status | Implemented By |
|------|-----------|-------------|------|--------|----------------|
| 2025-01-20 | CHG-001 | Network VLAN configuration | Normal | Completed | [Name] |
| 2025-01-19 | CHG-002 | Security patch deployment | Emergency | Completed | [Name] |
---
## Best Practices
1. **Plan Ahead:**
- Submit change requests early
- Allow time for review
- Schedule during maintenance windows
2. **Document Everything:**
- Document all changes
- Keep change log updated
- Update procedures
3. **Test First:**
- Test in non-production
- Verify rollback procedures
- Document test results
4. **Communicate:**
- Notify stakeholders
- Provide status updates
- Document issues
5. **Review Regularly:**
- Review change process
- Identify improvements
- Update procedures
---
## Emergency Change Process
### When to Use
- Critical security issues
- Service outages
- Data loss prevention
- Regulatory compliance
### Process
1. **Implement Immediately:**
- Take necessary action
- Document as you go
- Notify stakeholders
2. **Post-Implementation:**
- Complete change request
- Document what was done
- Conduct review
3. **Retrospective:**
- Review emergency change
- Identify improvements
- Update procedures
---
## Related Documentation
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational procedures
- **[DISASTER_RECOVERY.md](DISASTER_RECOVERY.md)** - Disaster recovery
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** - Deployment procedures
---
**Last Updated:** 2025-01-20
**Review Cycle:** Quarterly


@@ -40,6 +40,39 @@
---
## Deployment Decision Tree
```mermaid
flowchart TD
Start[New Deployment?] --> EnvType{Environment Type?}
EnvType -->|Production| ProdCheck{Production Ready?}
EnvType -->|Staging| StagingDeploy[Staging Deployment]
EnvType -->|Development| DevDeploy[Development Deployment]
ProdCheck -->|No| PrepProd[Prepare Production<br/>Review Checklist<br/>Verify Resources]
ProdCheck -->|Yes| ProdDeploy[Production Deployment]
PrepProd --> ProdDeploy
ProdDeploy --> WhichComponents{Which Components?}
StagingDeploy --> WhichComponents
DevDeploy --> WhichComponents
WhichComponents -->|Full Stack| FullDeploy[Deploy Full Stack<br/>Validators, Sentries, RPC,<br/>Services, Monitoring]
WhichComponents -->|Besu Only| BesuDeploy[Deploy Besu Network<br/>Validators, Sentries, RPC]
WhichComponents -->|CCIP Only| CCIPDeploy[Deploy CCIP Fleet<br/>Commit, Execute, RMN]
WhichComponents -->|Services Only| ServicesDeploy[Deploy Services<br/>Blockscout, Cacti, etc.]
FullDeploy --> ValidateDeploy[Validate Deployment]
BesuDeploy --> ValidateDeploy
CCIPDeploy --> ValidateDeploy
ServicesDeploy --> ValidateDeploy
ValidateDeploy --> DeployComplete[Deployment Complete]
```
---
## 🚀 Deployment Steps
### Step 1: Review Configuration


@@ -0,0 +1,232 @@
# Chain 138 Deployment Readiness Checklist
**Date**: $(date)
**Purpose**: Verify all prerequisites are met before deploying smart contracts
---
## ✅ Network Readiness
### RPC Endpoints
- [x] **RPC-01 (VMID 2500)**: ✅ Operational
- IP: 192.168.11.250
- HTTP RPC: Port 8545 ✅ Listening
- WebSocket RPC: Port 8546 ✅ Listening
- P2P: Port 30303 ✅ Listening
- Metrics: Port 9545 ✅ Listening
- Status: Active, syncing blocks
- [ ] **RPC-02 (VMID 2501)**: ⏳ Check status
- [ ] **RPC-03 (VMID 2502)**: ⏳ Check status
### Network Connectivity
- [x] RPC endpoint responds to `eth_blockNumber`
- [x] RPC endpoint responds to `eth_chainId`
- [x] Chain ID verified: 138
- [x] Network producing blocks (block number > 0)
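The connectivity items above can be scripted; the easy mistake is comparing the hex `eth_chainId` result against the decimal chain ID. A sketch (`RPC_URL` defaults to the RPC-01 endpoint documented here; `jq` is assumed to be installed):

```bash
#!/usr/bin/env bash
# Sketch: confirm the endpoint answers eth_chainId with 138 (hex 0x8a).
set -euo pipefail
RPC_URL="${RPC_URL:-http://192.168.11.250:8545}"

hex_to_dec() { printf '%d\n' "$1"; }   # accepts 0x-prefixed hex

check_chain() {
  local url=$1 chain_hex
  chain_hex=$(curl -s -X POST "$url" -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' | jq -r .result)
  if [ "$(hex_to_dec "$chain_hex")" = "138" ]; then
    echo "chain id OK (138)"
  else
    echo "unexpected chain id: $chain_hex"
    return 1
  fi
}
# check_chain "$RPC_URL"   # run from a host that can reach the endpoint
```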
### Validator Network
- [ ] All validators (1000-1004) operational
- [ ] Network consensus active
- [ ] Block production stable
---
## ✅ Configuration Readiness
### Deployment Scripts
- [x] **Deployment script updated**: `deploy-contracts-once-ready.sh`
- IP address updated: `10.3.1.4:8545` → `192.168.11.250:8545`
- Location: `/home/intlc/projects/smom-dbis-138/scripts/deployment/`
- [x] **Installation scripts updated**: All service install scripts
- Oracle Publisher: ✅ Updated
- CCIP Monitor: ✅ Updated
- Keeper: ✅ Updated
- Financial Tokenization: ✅ Updated
- Firefly: ✅ Updated
- Cacti: ✅ Updated
- Blockscout: ✅ Updated
### Configuration Templates
- [x] **Besu RPC config template**: ✅ Updated
- Deprecated options removed
- File: `templates/besu-configs/config-rpc.toml`
- [x] **Service installation script**: ✅ Updated
- Config file name corrected
- File: `install/besu-rpc-install.sh`
---
## ⏳ Deployment Prerequisites
### Environment Setup
- [ ] **Source project `.env` file configured**
- Location: `/home/intlc/projects/smom-dbis-138/.env`
- Required variables:
- `RPC_URL_138=http://192.168.11.250:8545`
- `PRIVATE_KEY=<deployer-private-key>`
- `RESERVE_ADMIN=<admin-address>`
- `KEEPER_ADDRESS=<keeper-address>`
- `ORACLE_PRICE_FEED=<oracle-address>` (after Oracle deployment)
### Deployer Account
- [ ] **Deployer account has sufficient balance**
- Check balance: `cast balance <deployer-address> --rpc-url http://192.168.11.250:8545`
- Minimum recommended: 1 ETH equivalent
### Network Verification
- [x] **Network is producing blocks**
- Verified: ✅ Yes
- Current block: > 11,200 (as of troubleshooting)
- [x] **Chain ID correct**
- Expected: 138
- Verified: ✅ Yes
---
## 📋 Contract Deployment Order
### Phase 1: Core Infrastructure (Priority 1)
1. [ ] **Oracle Contract**
- Script: `DeployOracle.s.sol`
- Dependencies: None
- Required for: Keeper, Price Feeds
2. [ ] **CCIP Router**
- Script: `DeployCCIPRouter.s.sol`
- Dependencies: None
- Required for: CCIP Sender, Cross-chain operations
3. [ ] **CCIP Sender**
- Script: `DeployCCIPSender.s.sol`
- Dependencies: CCIP Router
- Required for: Cross-chain messaging
### Phase 2: Supporting Contracts (Priority 2)
4. [ ] **Multicall**
- Script: `DeployMulticall.s.sol`
- Dependencies: None
- Utility contract
5. [ ] **MultiSig**
- Script: `DeployMultiSig.s.sol`
- Dependencies: None
- Governance contract
### Phase 3: Application Contracts (Priority 3)
6. [ ] **Price Feed Keeper**
- Script: `reserve/DeployKeeper.s.sol`
- Dependencies: Oracle Price Feed
- Required for: Automated price updates
7. [ ] **Reserve System**
- Script: `reserve/DeployReserveSystem.s.sol`
- Dependencies: Token Factory (if applicable)
- Required for: Financial tokenization
---
## 🔧 Service Configuration
### After Contract Deployment
Once contracts are deployed, update service configurations:
- [ ] **Oracle Publisher (VMID 3500)**
- Update `.env` with Oracle contract address
- Restart service
- [ ] **CCIP Monitor (VMID 3501)**
- Update `.env` with CCIP Router and Sender addresses
- Restart service
- [ ] **Keeper (VMID 3502)**
- Update `.env` with Keeper contract address
- Restart service
- [ ] **Financial Tokenization (VMID 3503)**
- Update `.env` with Reserve System address
- Restart service
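After editing each `.env`, the restarts can be done in one pass from the Proxmox host. The `financial-tokenization` unit name is an assumption; the other three match the journalctl units used elsewhere in this checklist.

```bash
#!/usr/bin/env bash
# Sketch: restart every service container after its .env is updated.
# The financial-tokenization unit name is an assumption -- verify it first.
set -euo pipefail
SERVICES="3500:oracle-publisher 3501:ccip-monitor 3502:price-feed-keeper 3503:financial-tokenization"
for pair in $SERVICES; do
  vmid=${pair%%:*}
  svc=${pair##*:}
  if command -v pct >/dev/null; then
    pct exec "$vmid" -- systemctl restart "$svc"
  else
    echo "would run: pct exec $vmid -- systemctl restart $svc"
  fi
done
```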
---
## ✅ Verification Steps
### After Deployment
1. **Verify Contracts on Chain**
```bash
cast code <contract-address> --rpc-url http://192.168.11.250:8545
```
2. **Verify Service Connections**
```bash
# Test Oracle Publisher
pct exec 3500 -- curl -X POST http://localhost:8000/health
# Test CCIP Monitor
pct exec 3501 -- curl -X POST http://localhost:8000/health
# Test Keeper
pct exec 3502 -- curl -X POST http://localhost:3000/health
```
3. **Check Service Logs**
```bash
# Oracle Publisher
pct exec 3500 -- journalctl -u oracle-publisher -f
# CCIP Monitor
pct exec 3501 -- journalctl -u ccip-monitor -f
# Keeper
pct exec 3502 -- journalctl -u price-feed-keeper -f
```
---
## 📊 Current Status Summary
### Completed ✅
- ✅ RPC-01 (VMID 2500) troubleshooting and fix
- ✅ Configuration files updated
- ✅ Deployment scripts updated with correct IPs
- ✅ Network verified (producing blocks, Chain ID 138)
- ✅ RPC endpoint accessible and responding
### Pending ⏳
- ⏳ Verify RPC-02 and RPC-03 status
- ⏳ Configure deployer account and `.env` file
- ⏳ Deploy contracts (waiting for user action)
- ⏳ Update service configurations with deployed addresses
---
## 🚀 Ready for Deployment
**Status**: ✅ **READY** (pending deployer account setup)
All infrastructure, scripts, and documentation are in place. The network is operational and ready for contract deployment.
**Next Action**: Configure deployer account and `.env` file, then proceed with contract deployment.
---
**Last Updated**: $(date)


@@ -0,0 +1,451 @@
# Deployment Runbook
## SolaceScanScout Explorer - Production Deployment Guide
**Last Updated**: $(date)
**Version**: 1.0.0
---
## Table of Contents
1. [Pre-Deployment Checklist](#pre-deployment-checklist)
2. [Environment Setup](#environment-setup)
3. [Database Migration](#database-migration)
4. [Service Deployment](#service-deployment)
5. [Health Checks](#health-checks)
6. [Rollback Procedures](#rollback-procedures)
7. [Post-Deployment Verification](#post-deployment-verification)
8. [Troubleshooting](#troubleshooting)
---
## Pre-Deployment Checklist
### Infrastructure Requirements
- [ ] Kubernetes cluster (AKS) or VM infrastructure ready
- [ ] PostgreSQL 16+ with TimescaleDB extension
- [ ] Redis cluster (for production cache/rate limiting)
- [ ] Elasticsearch/OpenSearch cluster
- [ ] Load balancer configured
- [ ] SSL certificates provisioned
- [ ] DNS records configured
- [ ] Monitoring stack deployed (Prometheus, Grafana)
### Configuration
- [ ] Environment variables configured
- [ ] Secrets stored in Key Vault
- [ ] Database credentials verified
- [ ] Redis connection string verified
- [ ] RPC endpoint URLs verified
- [ ] JWT secret configured (strong random value)
### Code & Artifacts
- [ ] All tests passing
- [ ] Docker images built and tagged
- [ ] Images pushed to container registry
- [ ] Database migrations reviewed
- [ ] Rollback plan documented
---
## Environment Setup
### 1. Set Environment Variables
```bash
# Database
export DB_HOST=postgres.example.com
export DB_PORT=5432
export DB_USER=explorer
export DB_PASSWORD=<from-key-vault>
export DB_NAME=explorer
# Redis (for production)
export REDIS_URL=redis://redis.example.com:6379
# RPC
export RPC_URL=https://rpc.d-bis.org
export WS_URL=wss://rpc.d-bis.org
# Application
export CHAIN_ID=138
export PORT=8080
export JWT_SECRET=<strong-random-secret>
# Optional
export LOG_LEVEL=info
export ENABLE_METRICS=true
```
### 2. Verify Secrets
```bash
# Test database connection
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"
# Test Redis connection
redis-cli -u $REDIS_URL ping
# Test RPC endpoint
curl -X POST $RPC_URL \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
```
---
## Database Migration
### 1. Backup Existing Database
```bash
# Create backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > backup_$(date +%Y%m%d_%H%M%S).sql
# Verify backup
ls -lh backup_*.sql
```
### 2. Run Migrations
```bash
cd explorer-monorepo/backend/database/migrations
# Review pending migrations
go run migrate.go --status
# Run migrations
go run migrate.go --up
# Verify migration
go run migrate.go --status
```
### 3. Verify Schema
```bash
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\dt"
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\d blocks"
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\d transactions"
```
---
## Service Deployment
### Option 1: Kubernetes Deployment
#### 1. Deploy API Server
```bash
kubectl apply -f k8s/api-server-deployment.yaml
kubectl apply -f k8s/api-server-service.yaml
kubectl apply -f k8s/api-server-ingress.yaml
# Verify deployment
kubectl get pods -l app=api-server
kubectl logs -f deployment/api-server
```
#### 2. Deploy Indexer
```bash
kubectl apply -f k8s/indexer-deployment.yaml
# Verify deployment
kubectl get pods -l app=indexer
kubectl logs -f deployment/indexer
```
#### 3. Rolling Update
```bash
# Update image
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.1.0
# Monitor rollout
kubectl rollout status deployment/api-server
# Rollback if needed
kubectl rollout undo deployment/api-server
```
### Option 2: Docker Compose Deployment
```bash
cd explorer-monorepo/deployment
# Start services
docker-compose up -d
# Verify services
docker-compose ps
docker-compose logs -f api-server
```
---
## Health Checks
### 1. API Health Endpoint
```bash
# Check health
curl https://api.d-bis.org/health
# Expected response
{
"status": "ok",
"timestamp": "2024-01-01T00:00:00Z",
"database": "connected"
}
```
### 2. Service Health
```bash
# Kubernetes
kubectl get pods
kubectl describe pod <pod-name>
# Docker
docker ps
docker inspect <container-id>
```
### 3. Database Connectivity
```bash
# From API server
curl https://api.d-bis.org/health | jq .database
# Direct check
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT COUNT(*) FROM blocks;"
```
### 4. Redis Connectivity
```bash
# Test Redis
redis-cli -u $REDIS_URL ping
# Check cache stats
redis-cli -u $REDIS_URL INFO stats
```
---
## Rollback Procedures
### Quick Rollback (Kubernetes)
```bash
# Rollback to previous version
kubectl rollout undo deployment/api-server
kubectl rollout undo deployment/indexer
# Verify rollback
kubectl rollout status deployment/api-server
```
### Database Rollback
```bash
# Restore from backup
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < backup_YYYYMMDD_HHMMSS.sql
# Or rollback migrations
cd explorer-monorepo/backend/database/migrations
go run migrate.go --down 1
```
### Full Rollback
```bash
# 1. Stop new services
kubectl scale deployment/api-server --replicas=0
kubectl scale deployment/indexer --replicas=0
# 2. Restore database
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < backup_YYYYMMDD_HHMMSS.sql
# 3. Start previous version
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.0.0
kubectl scale deployment/api-server --replicas=3
```
---
## Post-Deployment Verification
### 1. Functional Tests
```bash
# Test Track 1 endpoints (public)
curl https://api.d-bis.org/api/v1/track1/blocks/latest
# Test search
curl https://api.d-bis.org/api/v1/search?q=1000
# Test health
curl https://api.d-bis.org/health
```
### 2. Performance Tests
```bash
# Load test
ab -n 1000 -c 10 https://api.d-bis.org/api/v1/track1/blocks/latest
# Check response times
curl -w "@curl-format.txt" -o /dev/null -s https://api.d-bis.org/api/v1/track1/blocks/latest
```
### 3. Monitoring
- [ ] Check Grafana dashboards
- [ ] Verify Prometheus metrics
- [ ] Check error rates
- [ ] Monitor response times
- [ ] Check database connection pool
- [ ] Verify Redis cache hit rate
---
## Troubleshooting
### Common Issues
#### 1. Database Connection Errors
**Symptoms**: 500 errors, "database connection failed"
**Resolution**:
```bash
# Check database status
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"
# Check connection pool
# Review database/migrations for connection pool settings
# Restart service
kubectl rollout restart deployment/api-server
```
#### 2. Redis Connection Errors
**Symptoms**: Cache misses, rate limiting not working
**Resolution**:
```bash
# Test Redis connection
redis-cli -u $REDIS_URL ping
# Check Redis logs
kubectl logs -l app=redis
# Fallback to in-memory (temporary)
# Remove REDIS_URL from environment
```
#### 3. High Memory Usage
**Symptoms**: OOM kills, slow responses
**Resolution**:
```bash
# Check memory usage
kubectl top pods
# Increase memory limits
kubectl set resources deployment/api-server --limits=memory=2Gi
# Review cache TTL settings
```
#### 4. Slow Response Times
**Symptoms**: High latency, timeout errors
**Resolution**:
```bash
# Check database query performance
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "EXPLAIN ANALYZE SELECT * FROM blocks LIMIT 10;"
# Check indexer lag
curl https://api.d-bis.org/api/v1/track2/stats
# Review connection pool settings
```
---
## Emergency Procedures
### Service Outage
1. **Immediate Actions**:
- Check service status: `kubectl get pods`
- Check logs: `kubectl logs -f deployment/api-server`
- Check database: `psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"`
- Check Redis: `redis-cli -u $REDIS_URL ping`
2. **Quick Recovery**:
- Restart services: `kubectl rollout restart deployment/api-server`
- Scale up: `kubectl scale deployment/api-server --replicas=5`
- Rollback if needed: `kubectl rollout undo deployment/api-server`
3. **Communication**:
- Update status page
- Notify team via Slack/email
- Document incident
### Data Corruption
1. **Immediate Actions**:
- Stop writes: `kubectl scale deployment/api-server --replicas=0`
- Backup current state: `pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > emergency_backup.sql`
2. **Recovery**:
- Restore from last known good backup
- Verify data integrity
- Resume services
---
## Maintenance Windows
### Scheduled Maintenance
1. **Pre-Maintenance**:
- Notify users 24 hours in advance
- Create maintenance mode flag
- Prepare rollback plan
2. **During Maintenance**:
- Enable maintenance mode
- Perform updates
- Run health checks
3. **Post-Maintenance**:
- Disable maintenance mode
- Verify all services
- Monitor for issues
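Post-maintenance verification usually needs a few retries while pods warm up. A small retry helper like the sketch below keeps the checks scriptable (the health-check URL in the usage comment is illustrative):

```bash
# Retry a command up to N times with a delay between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command...>
retry() {
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "FAILED after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "OK after $attempt attempt(s): $*"
}

# Example (illustrative endpoint):
#   retry 5 10 curl -fsS https://api.d-bis.org/api/v1/track1/blocks/latest
```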
---
## Contact Information
- **On-Call Engineer**: Check PagerDuty
- **Slack Channel**: #explorer-deployments
- **Emergency**: [Emergency Contact]
---
**Document Version**: 1.0.0
**Last Reviewed**: $(date)
**Next Review**: $(date -d "+3 months")

# Disaster Recovery Procedures
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation
---
## Overview
This document outlines disaster recovery procedures for the Proxmox infrastructure, including recovery from hardware failures, data loss, network outages, and security incidents.
---
## Recovery Scenarios
### 1. Complete Host Failure
**Scenario:** A Proxmox host (R630 or ML110) fails completely and cannot be recovered.
**Recovery Steps:**
1. **Assess Impact:**
```bash
# Check which VMs/containers were running on failed host
pvecm status
pvecm nodes
```
2. **Recover from Backup:**
- Identify backup location (Proxmox Backup Server or external storage)
- Restore VMs/containers to another host in the cluster
- Verify network connectivity and services
3. **Rejoin Cluster (if host is replaced):**
```bash
# On new/repaired host
pvecm add <ip-of-existing-cluster-node> --link0 <local-address>
```
4. **Verify Services:**
- Check all critical services are running
- Verify network connectivity
- Test application functionality
**Recovery Time Objective (RTO):** 4 hours
**Recovery Point Objective (RPO):** Last backup (typically daily)
---
### 2. Storage Failure
**Scenario:** Storage pool fails (ZFS pool corruption, disk failure, etc.)
**Recovery Steps:**
1. **Immediate Actions:**
- Stop all VMs/containers using affected storage
- Assess extent of damage
- Check backup availability
2. **Storage Recovery:**
```bash
# For ZFS pools
zpool status
zpool import -f <pool-name>
zpool scrub <pool-name>
```
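After a scrub completes, scan the `zpool status` output for any vdev or device that is not ONLINE. A sketch of the check against a captured sample (pipe the live `zpool status` output in instead):

```bash
# Captured `zpool status` fragment; replace with live output in production
status=$(cat <<'EOF'
  pool: tank
 state: DEGRADED
config:
        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror-0  DEGRADED     0     0     0
            sda     ONLINE       0     0     0
            sdb     FAULTED      3     0     0
EOF
)

# List every vdev/device whose state is not healthy
bad=$(printf '%s\n' "$status" \
  | awk '$1 != "state:" && $2 ~ /DEGRADED|FAULTED|OFFLINE|UNAVAIL/ {print $1}' \
  | xargs)
echo "unhealthy devices/vdevs: $bad"
```

An empty result means every listed device reported ONLINE.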
3. **Data Recovery:**
- Restore from backups if pool cannot be recovered
- Use Proxmox Backup Server if available
- Restore individual VMs/containers as needed
4. **Verification:**
- Verify data integrity
- Test restored VMs/containers
- Document lessons learned
**RTO:** 8 hours
**RPO:** Last backup
---
### 3. Network Outage
**Scenario:** Complete network failure or misconfiguration
**Recovery Steps:**
1. **Local Access:**
- Use console access (iDRAC, iLO, or physical console)
- Verify Proxmox host is running
- Check network configuration
2. **Network Restoration:**
```bash
# Check network interfaces
ip addr show
ip link show
# Check routing
ip route show
# Restart networking if needed
systemctl restart networking
```
3. **VLAN Restoration:**
- Verify VLAN configuration on switches
- Check Proxmox bridge configuration
- Test connectivity between VLANs
4. **Service Verification:**
- Test internal services
- Verify external connectivity (if applicable)
- Check Cloudflare tunnels (if used)
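For reference during step 3, a VLAN-aware Proxmox bridge in `/etc/network/interfaces` typically looks like the sketch below (the interface name and addresses are example values, not this cluster's actual configuration):

```
auto vmbr0
iface vmbr0 inet static
    address 192.168.11.11/24
    gateway 192.168.11.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
```

After editing, apply the change with `ifreload -a` (ifupdown2, the Proxmox default) and re-test inter-VLAN connectivity.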
**RTO:** 2 hours
**RPO:** No data loss (network issue only)
---
### 4. Data Corruption
**Scenario:** VM/container data corruption or accidental deletion
**Recovery Steps:**
1. **Immediate Actions:**
- Stop affected VM/container
- Do not attempt repairs that might worsen corruption
- Document what was lost
2. **Recovery Options:**
- **From Snapshot:** Restore from most recent snapshot
- **From Backup:** Restore from Proxmox Backup Server
- **From External Backup:** Use external backup solution
3. **Restoration:**
```bash
# Restore a VM from a PBS/vzdump backup
qmrestore <backup-volume> <vmid> --storage <storage>
# Restore a container from a backup
pct restore <vmid> <backup-volume> --storage <storage>
# Or roll a VM back to a snapshot
qm rollback <vmid> <snapshot-name>
```
4. **Verification:**
- Verify data integrity
- Test application functionality
- Update documentation
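One simple way to verify data integrity after a restore is to record checksums of critical files before backup and re-check them afterwards. A minimal sketch of the pattern (the `/var/lib/app` paths in the comments are illustrative):

```bash
# In the guest, before backup (paths illustrative):
#   find /var/lib/app -type f -exec sha256sum {} + > /root/app.sha256
# After restore; non-zero exit means at least one file changed:
#   sha256sum --check --quiet /root/app.sha256

# Self-contained demonstration of the same pattern:
workdir=$(mktemp -d)
echo "important data" > "$workdir/data.txt"
( cd "$workdir" && sha256sum data.txt > manifest.sha256 )
if ( cd "$workdir" && sha256sum --check --quiet manifest.sha256 ); then
  result="integrity OK"
else
  result="integrity FAILED"
fi
echo "$result"
rm -rf "$workdir"
```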
**RTO:** 4 hours
**RPO:** Last snapshot/backup
---
### 5. Security Incident
**Scenario:** Security breach, unauthorized access, or malware
**Recovery Steps:**
1. **Immediate Containment:**
- Isolate affected systems
- Disconnect from network if necessary
- Preserve evidence (logs, snapshots)
2. **Assessment:**
- Identify scope of breach
- Determine what was accessed/modified
- Check for data exfiltration
3. **Recovery:**
- Restore from known-good backups (pre-incident)
- Rebuild affected systems if necessary
- Update all credentials and keys
4. **Hardening:**
- Review and update security policies
- Patch vulnerabilities
- Enhance monitoring
5. **Documentation:**
- Document incident timeline
- Update security procedures
- Conduct post-incident review
**RTO:** 24 hours
**RPO:** Pre-incident state
---
## Backup Strategy
### Backup Schedule
- **Critical VMs/Containers:** Daily backups
- **Standard VMs/Containers:** Weekly backups
- **Configuration:** Daily backups of Proxmox configuration
- **Network Configuration:** Version controlled (Git)
### Backup Locations
1. **Primary:** Proxmox Backup Server (if available)
2. **Secondary:** External storage (NFS, SMB, or USB)
3. **Offsite:** Cloud storage or remote location
### Backup Verification
- Weekly restore tests
- Monthly full disaster recovery drill
- Quarterly review of backup strategy
---
## Recovery Contacts
### Primary Contacts
- **Infrastructure Lead:** [Contact Information]
- **Network Administrator:** [Contact Information]
- **Security Team:** [Contact Information]
### Escalation
- **Level 1:** Infrastructure team (4 hours)
- **Level 2:** Management (8 hours)
- **Level 3:** External support (24 hours)
---
## Testing and Maintenance
### Quarterly DR Drills
1. **Test Scenario:** Simulate host failure
2. **Test Scenario:** Simulate storage failure
3. **Test Scenario:** Simulate network outage
4. **Document Results:** Update procedures based on findings
### Annual Full DR Test
- Complete infrastructure rebuild from backups
- Verify all services
- Update documentation
---
## Related Documentation
- **[BACKUP_AND_RESTORE.md](BACKUP_AND_RESTORE.md)** - Detailed backup procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational procedures
- **[../../09-troubleshooting/TROUBLESHOOTING_FAQ.md](../../09-troubleshooting/TROUBLESHOOTING_FAQ.md)** - Troubleshooting guide
---
**Last Updated:** 2025-01-20
**Review Cycle:** Quarterly

# LVM Thin Storage Enabled on pve
**Date**: $(date)
**Status**: ✅ LVM Thin Storage Configured
## Summary
LVM thin storage has been successfully enabled on the pve node to support migrations.
## Configuration
### Volume Group
- **Name**: `pve`
- **Physical Volumes**: 2 disks (sdc, sdd)
- **Total Size**: ~465.77GB
- **Free Space**: ~257.77GB
### Thin Pool
- **Name**: `thin1`
- **Volume Group**: `pve`
- **Size**: 208GB
- **Type**: LVM thin pool
- **Status**: Created and configured
### Proxmox Storage
- **Name**: `thin1`
- **Type**: `lvmthin`
- **Configuration**:
- Thin pool: `thin1`
- Volume group: `pve`
- Content: `images,rootdir`
- Nodes: `pve`
## Storage Status
```
pve storage:
- local: active (directory storage)
- thin1: configured (LVM thin storage)
- local-lvm: disabled (configured for ml110 only)
```
## Usage
### Migrate VMs to pve with thin1 storage
```bash
# From source node (e.g., ml110)
ssh root@192.168.11.10
# Migrate with thin1 storage
pct migrate <VMID> pve --target-storage thin1
# Or using API
pvesh create /nodes/ml110/lxc/<VMID>/migrate --target pve --target-storage thin1 --online 0
```
### Create new VMs on pve
When creating new containers on pve, you can now use:
- `thin1` - LVM thin storage (recommended for performance)
- `local` - Directory storage (slower but works)
## Storage Capacity
- **thin1**: 208GB total (available for VMs)
- **local**: 564GB total, 2.9GB used, 561GB available
## Verification
### Check storage status
```bash
ssh root@192.168.11.11 "pvesm status"
```
### Check volume groups
```bash
ssh root@192.168.11.11 "vgs"
```
### Check thin pools
```bash
ssh root@192.168.11.11 "lvs pve"
```
### List storage contents
```bash
ssh root@192.168.11.11 "pvesm list thin1"
```
## Notes
- The thin pool is created and ready for use
- Storage may show as "inactive" in `pvesm status` until first use - this is normal
- The storage is properly configured and will activate when used
- Both `thin1` (LVM thin) and `local` (directory) storage are available on pve
## Related Documentation
- `docs/STORAGE_FIX_COMPLETE.md`: Complete storage fix documentation
- `docs/MIGRATION_STORAGE_FIX.md`: Migration guide
- `scripts/enable-lvm-thin-pve.sh`: Script used to enable storage

# Missing LXC Containers - Complete List
**Date:** December 26, 2024
**Status:** Inventory of containers that need to be created
---
## Summary
| Category | Missing | Total Expected | Status |
|----------|---------|----------------|--------|
| **Besu Nodes** | 7 | 19 | 12/19 deployed |
| **Hyperledger Services** | 5 | 5 | 0/5 deployed |
| **Explorer** | 1 | 1 | 0/1 deployed |
| **TOTAL** | **13** | **25** | **12/25 deployed** |
---
## 🔴 Missing Containers by Category
### 1. Besu Nodes (ChainID 138)
#### Missing Sentry Node
| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **1504** | `besu-sentry-5` | Besu Sentry Node | 192.168.11.154 | **High** | New container for Ali's dedicated host |
**Specifications:**
- Memory: 4GB
- CPU: 2 cores
- Disk: 100GB
- Network: 192.168.11.154
- Discovery: Enabled
- Access: Ali (Full)
---
#### Missing RPC Nodes
| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **2503** | `besu-rpc-4` | Besu RPC Node (Ali - 0x8a) | 192.168.11.253 | **High** | Ali's RPC node - Permissioned identity: 0x8a |
| **2504** | `besu-rpc-4` | Besu RPC Node (Ali - 0x1) | 192.168.11.254 | **High** | Ali's RPC node - Permissioned identity: 0x1 |
| **2505** | `besu-rpc-luis` | Besu RPC Node (Luis - 0x8a) | 192.168.11.255 | **High** | Luis's RPC container - Permissioned identity: 0x8a |
| **2506** | `besu-rpc-luis` | Besu RPC Node (Luis - 0x1) | 192.168.11.256 | **High** | Luis's RPC container - Permissioned identity: 0x1 |
| **2507** | `besu-rpc-putu` | Besu RPC Node (Putu - 0x8a) | 192.168.11.257 | **High** | Putu's RPC container - Permissioned identity: 0x8a |
| **2508** | `besu-rpc-putu` | Besu RPC Node (Putu - 0x1) | 192.168.11.258 | **High** | Putu's RPC container - Permissioned identity: 0x1 |

**Note:** 192.168.11.255 is the /24 broadcast address, and .256-.258 are not valid IPv4 host addresses. The assignments for 2505-2508 must be corrected before deployment.
**Specifications (per container):**
- Memory: 16GB
- CPU: 4 cores
- Disk: 200GB
- Discovery: **Disabled** (prevents connection to Ethereum mainnet while reporting chainID 0x1 to MetaMask for wallet compatibility)
- **Authentication: JWT Auth Required** (all containers)
**Access Model:**
- **2503** (besu-rpc-4): Ali (Full) - 0x8a identity
- **2504** (besu-rpc-4): Ali (Full) - 0x1 identity
- **2505** (besu-rpc-luis): Luis (RPC-only) - 0x8a identity
- **2506** (besu-rpc-luis): Luis (RPC-only) - 0x1 identity
- **2507** (besu-rpc-putu): Putu (RPC-only) - 0x8a identity
- **2508** (besu-rpc-putu): Putu (RPC-only) - 0x1 identity
**Configuration:**
- All use permissioned RPC configuration
- Discovery disabled for all (same mainnet-isolation rationale as in the specifications above)
- Each container has separate permissioned identity access
- **All require JWT authentication** via nginx reverse proxy
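As a rough illustration of the JWT gate in front of each RPC port, a location block like the sketch below would sit in the nginx reverse proxy. This assumes NGINX Plus's `auth_jwt` module; the actual proxy configuration may differ (for example, open-source nginx with an njs-based check or an auth sidecar), and the key-file path and upstream port are example values:

```
location / {
    auth_jwt           "chain138-rpc";
    auth_jwt_key_file  /etc/nginx/jwt-keys.json;

    proxy_pass         http://127.0.0.1:8545;
    proxy_set_header   Host $host;
}
```

Requests without a valid signed token are rejected before they ever reach the Besu RPC listener.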
---
### 2. Hyperledger Services
#### Firefly
| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **6200** | `firefly-1` | Hyperledger Firefly Core | 192.168.11.66 | **High** | Workflow/orchestration |
| **6201** | `firefly-2` | Hyperledger Firefly Node | 192.168.11.67 | **High** | For Ali's dedicated host (ChainID 138) |
**Specifications (per container):**
- Memory: 4GB
- CPU: 2 cores
- Disk: 50GB
- Access: Ali (Full)
**Notes:**
- 6201 is specifically mentioned in ChainID 138 documentation
- 6200 is the core Firefly service
---
#### Cacti
| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **5200** | `cacti-1` | Hyperledger Cacti | 192.168.11.64 | **High** | Interop middleware |
**Specifications:**
- Memory: 4GB
- CPU: 2 cores
- Disk: 50GB
---
#### Fabric
| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **6000** | `fabric-1` | Hyperledger Fabric | 192.168.11.65 | Medium | Enterprise contracts |
**Specifications:**
- Memory: 8GB
- CPU: 4 cores
- Disk: 100GB
---
#### Indy
| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **6400** | `indy-1` | Hyperledger Indy | 192.168.11.68 | Medium | Identity layer |
**Specifications:**
- Memory: 8GB
- CPU: 4 cores
- Disk: 100GB
---
### 3. Explorer
#### Blockscout
| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **5000** | `blockscout-1` | Blockscout Explorer | TBD | **High** | Blockchain explorer for ChainID 138 |
**Specifications:**
- Memory: 8GB+
- CPU: 4 cores+
- Disk: 200GB+
- Requires: PostgreSQL database
---
## 📊 Deployment Priority
### Priority 1 - High (ChainID 138 Critical)
1. **1504** - `besu-sentry-5` (Ali's dedicated host)
2. **2503** - `besu-rpc-4` (Ali's RPC node - 0x8a identity)
3. **2504** - `besu-rpc-4` (Ali's RPC node - 0x1 identity)
4. **2505** - `besu-rpc-luis` (Luis's RPC container - 0x8a identity)
5. **2506** - `besu-rpc-luis` (Luis's RPC container - 0x1 identity)
6. **2507** - `besu-rpc-putu` (Putu's RPC container - 0x8a identity)
7. **2508** - `besu-rpc-putu` (Putu's RPC container - 0x1 identity)
8. **6201** - `firefly-2` (Ali's dedicated host, ChainID 138)
9. **5000** - `blockscout-1` (Explorer for ChainID 138)
**Note:** All RPC containers require JWT authentication via nginx reverse proxy.
### Priority 2 - High (Infrastructure)
10. **6200** - `firefly-1` (Core Firefly service)
11. **5200** - `cacti-1` (Interop middleware)
### Priority 3 - Medium
12. **6000** - `fabric-1` (Enterprise contracts)
13. **6400** - `indy-1` (Identity layer)
---
## ✅ Currently Deployed Containers
### Besu Network (12/14)
| VMID | Hostname | Status |
|------|----------|--------|
| 1000 | besu-validator-1 | ✅ Deployed |
| 1001 | besu-validator-2 | ✅ Deployed |
| 1002 | besu-validator-3 | ✅ Deployed |
| 1003 | besu-validator-4 | ✅ Deployed |
| 1004 | besu-validator-5 | ✅ Deployed |
| 1500 | besu-sentry-1 | ✅ Deployed |
| 1501 | besu-sentry-2 | ✅ Deployed |
| 1502 | besu-sentry-3 | ✅ Deployed |
| 1503 | besu-sentry-4 | ✅ Deployed |
| 1504 | besu-sentry-5 | ❌ **MISSING** |
| 2500 | besu-rpc-1 | ✅ Deployed |
| 2501 | besu-rpc-2 | ✅ Deployed |
| 2502 | besu-rpc-3 | ✅ Deployed |
| 2503 | besu-rpc-4 | ❌ **MISSING** |
### Services (2/4)
| VMID | Hostname | Status |
|------|----------|--------|
| 3500 | oracle-publisher-1 | ✅ Deployed |
| 3501 | ccip-monitor-1 | ✅ Deployed |
---
## 🚀 Deployment Scripts Available
### For Besu Nodes
- **Main deployment:** `smom-dbis-138-proxmox/scripts/deployment/deploy-besu-nodes.sh`
- **Configuration:** `scripts/configure-besu-chain138-nodes.sh`
- **Quick setup:** `scripts/setup-new-chain138-containers.sh`
### For Hyperledger Services
- **Deployment:** `smom-dbis-138-proxmox/scripts/deployment/deploy-hyperledger-services.sh`
### For Explorer
- **Deployment:** Check Blockscout deployment scripts
---
## 📝 Deployment Checklist
### Besu Nodes (Priority 1)
- [ ] **1504** - Create `besu-sentry-5` container
- [ ] Configure static-nodes.json
- [ ] Configure permissioned-nodes.json
- [ ] Enable discovery
- [ ] Verify peer connections
- [ ] Access: Ali (Full)
- [ ] **2503** - Create `besu-rpc-4` container (Ali's RPC - 0x8a)
- [ ] Use permissioned RPC configuration
- [ ] Configure static-nodes.json
- [ ] Configure permissioned-nodes.json
- [ ] **Disable discovery** (critical!)
- [ ] Configure permissioned identity (0x8a)
- [ ] Set up JWT authentication
- [ ] Access: Ali (Full)
- [ ] **2504** - Create `besu-rpc-4` container (Ali's RPC - 0x1)
- [ ] Use permissioned RPC configuration
- [ ] Configure static-nodes.json
- [ ] Configure permissioned-nodes.json
- [ ] **Disable discovery** (critical!)
- [ ] Configure permissioned identity (0x1)
- [ ] Set up JWT authentication
- [ ] Access: Ali (Full)
- [ ] **2505** - Create `besu-rpc-luis` container (Luis's RPC - 0x8a)
- [ ] Use permissioned RPC configuration
- [ ] Configure static-nodes.json
- [ ] Configure permissioned-nodes.json
- [ ] **Disable discovery** (critical!)
- [ ] Configure permissioned identity (0x8a)
- [ ] Set up JWT authentication
- [ ] Set up RPC-only access for Luis
- [ ] Access: Luis (RPC-only, 0x8a identity)
- [ ] **2506** - Create `besu-rpc-luis` container (Luis's RPC - 0x1)
- [ ] Use permissioned RPC configuration
- [ ] Configure static-nodes.json
- [ ] Configure permissioned-nodes.json
- [ ] **Disable discovery** (critical!)
- [ ] Configure permissioned identity (0x1)
- [ ] Set up JWT authentication
- [ ] Set up RPC-only access for Luis
- [ ] Access: Luis (RPC-only, 0x1 identity)
- [ ] **2507** - Create `besu-rpc-putu` container (Putu's RPC - 0x8a)
- [ ] Use permissioned RPC configuration
- [ ] Configure static-nodes.json
- [ ] Configure permissioned-nodes.json
- [ ] **Disable discovery** (critical!)
- [ ] Configure permissioned identity (0x8a)
- [ ] Set up JWT authentication
- [ ] Set up RPC-only access for Putu
- [ ] Access: Putu (RPC-only, 0x8a identity)
- [ ] **2508** - Create `besu-rpc-putu` container (Putu's RPC - 0x1)
- [ ] Use permissioned RPC configuration
- [ ] Configure static-nodes.json
- [ ] Configure permissioned-nodes.json
- [ ] **Disable discovery** (critical!)
- [ ] Configure permissioned identity (0x1)
- [ ] Set up JWT authentication
- [ ] Set up RPC-only access for Putu
- [ ] Access: Putu (RPC-only, 0x1 identity)
### Hyperledger Services
- [ ] **6200** - Create `firefly-1` container
- [ ] **6201** - Create `firefly-2` container (Ali's host)
- [ ] **5200** - Create `cacti-1` container
- [ ] **6000** - Create `fabric-1` container
- [ ] **6400** - Create `indy-1` container
### Explorer
- [ ] **5000** - Create `blockscout-1` container
- [ ] Set up PostgreSQL database
- [ ] Configure RPC endpoints
- [ ] Set up indexing
---
## 🔗 Related Documentation
- [ChainID 138 Configuration Guide](CHAIN138_BESU_CONFIGURATION.md)
- [ChainID 138 Quick Start](CHAIN138_QUICK_START.md)
- [VMID Allocation](smom-dbis-138-proxmox/config/proxmox.conf)
- [Deployment Plan](dbis_core/DEPLOYMENT_PLAN.md)
---
## 📊 Summary Statistics
**Total Missing:** 13 containers
- Besu Nodes: 7 (1504, 2503, 2504, 2505, 2506, 2507, 2508)
- Hyperledger Services: 5 (6200, 6201, 5200, 6000, 6400)
- Explorer: 1 (5000)
**Total Expected:** 25 containers
- Besu Network: 19 (12 existing + 7 new: 1504, 2503-2508)
- Hyperledger Services: 5
- Explorer: 1
**Deployment Rate:** 48% (12/25)
**Important:** All RPC containers (2503-2508) require JWT authentication via nginx reverse proxy.
---
**Last Updated:** December 26, 2024

# Pre-Start Audit Plan - Hostnames and IP Addresses
**Date:** 2025-01-20
**Purpose:** Comprehensive audit and fix of hostnames and IP addresses before starting VMs
---
## Tasks
### 1. Hostname Migration
- **pve** (192.168.11.11) → **r630-01**
- **pve2** (192.168.11.12) → **r630-02**
### 2. IP Address Audit
- Check all VMs/containers across all Proxmox hosts
- Verify no IP conflicts
- Verify no invalid IPs (network/broadcast addresses)
- Document all IP assignments
### 3. Consistency Check
- Verify IPs match documentation
- Check for inconsistencies between hosts
- Ensure all static IPs are properly configured
---
## Scripts Available
1. **`scripts/comprehensive-ip-audit.sh`** - Audits all IPs for conflicts
2. **`scripts/migrate-hostnames-proxmox.sh`** - Migrates hostnames properly
---
## Execution Order
1. **Run IP Audit First**
```bash
./scripts/comprehensive-ip-audit.sh
```
2. **Fix any IP conflicts found**
3. **Migrate Hostnames**
```bash
./scripts/migrate-hostnames-proxmox.sh
```
4. **Re-run IP Audit to verify**
5. **Start VMs**
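The core of the conflict check in step 1 is just deduplication over the collected IP list. A minimal sketch of the idea, using an inline sample (the real audit script gathers the addresses from each host's `/etc/pve/qemu-server/*.conf` and `/etc/pve/lxc/*.conf`):

```bash
# Collected IPs, one per line; the heredoc stands in for the real collection
ips=$(cat <<'EOF'
192.168.11.100
192.168.11.101
192.168.11.150
192.168.11.100
EOF
)

# Any address appearing more than once is a conflict
conflicts=$(printf '%s\n' "$ips" | sort | uniq -d)
if [ -n "$conflicts" ]; then
  echo "CONFLICT: $conflicts"
else
  echo "no conflicts"
fi
```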
---
## Current Known IPs (from VMID_IP_ADDRESS_LIST.md)
### Validators (1000-1004)
- 192.168.11.100-104
### Sentries (1500-1503)
- 192.168.11.150-153
### RPC Nodes
- 192.168.11.240-242 (ThirdWeb)
- 192.168.11.250-252 (Public RPC)
- 192.168.11.201-204 (Named RPC)
### DBIS Core
- 192.168.11.105-106 (PostgreSQL)
- 192.168.11.120 (Redis)
- 192.168.11.130 (Frontend)
- 192.168.11.155-156 (API)
### Other Services
- 192.168.11.60-63 (ML nodes)
- 192.168.11.64 (Indy)
- 192.168.11.80 (Cacti)
- 192.168.11.112 (Fabric)
---
**Status:** Ready to execute

# Pre-Start Checklist - Hostnames and IP Addresses
**Date:** 2025-01-20
**Purpose:** Complete audit and fixes before starting VMs on pve and pve2
---
## ✅ IP Address Audit - COMPLETE
**Status:** All IPs audited, no conflicts found
**Results:**
- All 34 VMs/containers are currently on **ml110** (192.168.11.10)
- **pve** (192.168.11.11) and **pve2** (192.168.11.12) have no VMs/containers yet
- **No IP conflicts detected** across all hosts
- **No invalid IPs** (network/broadcast addresses)
**Allocated IPs (34 total):**
- 192.168.11.57, .60-.64, .80, .100-.106, .112, .120, .130, .150-.156, .201-.204, .240-.242, .250-.254
---
## ⏳ Hostname Migration - PENDING
### Current State
- **pve** (192.168.11.11) - hostname: `pve`, should be: `r630-01`
- **pve2** (192.168.11.12) - hostname: `pve2`, should be: `r630-02`
### Migration Steps
**Script Available:** `scripts/migrate-hostnames-proxmox.sh`
**What it does:**
1. Updates `/etc/hostname` on both hosts
2. Updates `/etc/hosts` to ensure proper resolution
3. Restarts Proxmox services
4. Verifies hostname changes
**To execute:**
```bash
cd /home/intlc/projects/proxmox
./scripts/migrate-hostnames-proxmox.sh
```
**Manual steps (if script fails):**
```bash
# On pve (192.168.11.11)
ssh root@192.168.11.11
hostnamectl set-hostname r630-01
echo "r630-01" > /etc/hostname
# Update /etc/hosts to include: 192.168.11.11 r630-01 r630-01.sankofa.nexus pve pve.sankofa.nexus
systemctl restart pve-cluster pvestatd pvedaemon pveproxy
# On pve2 (192.168.11.12)
ssh root@192.168.11.12
hostnamectl set-hostname r630-02
echo "r630-02" > /etc/hostname
# Update /etc/hosts to include: 192.168.11.12 r630-02 r630-02.sankofa.nexus pve2 pve2.sankofa.nexus
systemctl restart pve-cluster pvestatd pvedaemon pveproxy
```
---
## Verification Steps
### 1. Verify Hostnames
```bash
ssh root@192.168.11.11 "hostname" # Should return: r630-01
ssh root@192.168.11.12 "hostname" # Should return: r630-02
```
### 2. Verify IP Resolution
```bash
ssh root@192.168.11.11 "getent hosts r630-01" # Should return: 192.168.11.11
ssh root@192.168.11.12 "getent hosts r630-02" # Should return: 192.168.11.12
```
### 3. Verify Proxmox Services
```bash
ssh root@192.168.11.11 "systemctl status pve-cluster pveproxy | grep Active"
ssh root@192.168.11.12 "systemctl status pve-cluster pveproxy | grep Active"
```
### 4. Re-run IP Audit
```bash
./scripts/check-all-vm-ips.sh
```
---
## Summary
### ✅ Completed
- [x] IP address audit across all hosts
- [x] Conflict detection (none found)
- [x] Invalid IP detection (none found)
- [x] Documentation of all IP assignments
### ⏳ Pending
- [ ] Hostname migration (pve → r630-01)
- [ ] Hostname migration (pve2 → r630-02)
- [ ] Verification of hostname changes
- [ ] Final IP audit after hostname changes
### 📋 Ready to Execute
1. Run hostname migration script
2. Verify changes
3. Start VMs on pve/pve2
---
## Scripts Available
1. **`scripts/check-all-vm-ips.sh`** - ✅ Working - Audits all IPs
2. **`scripts/migrate-hostnames-proxmox.sh`** - Ready - Migrates hostnames
3. **`scripts/diagnose-proxmox-hosts.sh`** - ✅ Working - Diagnostics
---
**Status:** IP audit complete, ready for hostname migration