Complete markdown files cleanup and organization
- Organized 252 files across project
- Root directory: 187 → 2 files (98.9% reduction)
- Moved configuration guides to docs/04-configuration/
- Moved troubleshooting guides to docs/09-troubleshooting/
- Moved quick start guides to docs/01-getting-started/
- Moved reports to reports/ directory
- Archived temporary files
- Generated comprehensive reports and documentation
- Created maintenance scripts and guides

All files organized according to established standards.
342
docs/03-deployment/BACKUP_AND_RESTORE.md
Normal file
@@ -0,0 +1,342 @@
# Backup and Restore Procedures

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation

---

## Overview

This document provides detailed procedures for backing up and restoring Proxmox VMs, containers, and configuration.

---

## Backup Strategy

### Backup Types

1. **VM/Container Backups:**
   - Full VM snapshots
   - Container backups
   - Application data backups

2. **Configuration Backups:**
   - Proxmox host configuration
   - Network configuration
   - Storage configuration

3. **Data Backups:**
   - Database backups
   - Application data
   - Configuration files

---

## Backup Procedures

### Proxmox VM/Container Backups

#### Using Proxmox Backup Server (PBS)

**Setup:**

1. **Install PBS** (if not already installed)
2. **Add PBS to Proxmox:**
   - Datacenter → Storage → Add → Proxmox Backup Server
   - Enter PBS server details
   - Test connection

**Scheduled Backups:**

1. **Create Backup Job:**
   - Datacenter → Backup → Add
   - Select VMs/containers
   - Set schedule (daily, weekly, etc.)
   - Choose retention policy

2. **Backup Options:**
   - **Mode:** Snapshot (recommended for running VMs)
   - **Compression:** ZSTD (recommended)
   - **Storage:** Proxmox Backup Server

**Manual Backup:**

```bash
# Backup single VM
vzdump <vmid> --storage <storage-name> --mode snapshot

# Backup multiple VMs
vzdump 100 101 102 --storage <storage-name> --mode snapshot

# Backup all VMs
vzdump --all --storage <storage-name> --mode snapshot
```

#### Using vzdump (Direct)

**Backup to Local Storage:**

```bash
# Backup VM to local storage
vzdump <vmid> --storage local --mode snapshot --compress zstd

# Backup with retention (--maxfiles is deprecated on current Proxmox VE;
# use --prune-backups instead)
vzdump <vmid> --storage local --mode snapshot --prune-backups keep-last=7
```

**Backup to NFS:**

```bash
# Add NFS storage first:
# Datacenter → Storage → Add → NFS

# Backup to NFS
vzdump <vmid> --storage nfs-backup --mode snapshot
```

---

### Configuration Backups

#### Proxmox Host Configuration

**Backup Configuration Files:**

```bash
# Backup Proxmox configuration
tar -czf /backup/proxmox-config-$(date +%Y%m%d).tar.gz \
    /etc/pve/ \
    /etc/network/interfaces \
    /etc/hosts \
    /etc/hostname
```
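
To run that configuration archive on a schedule, a cron entry can be used. This is a sketch: the file name and schedule are assumptions, and note that `%` must be backslash-escaped inside crontab lines.

```bash
# /etc/cron.d/proxmox-config-backup (hypothetical file)
# Nightly at 02:00: archive key Proxmox configuration paths.
0 2 * * * root tar -czf /backup/proxmox-config-$(date +\%Y\%m\%d).tar.gz /etc/pve /etc/network/interfaces /etc/hosts /etc/hostname
```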

**Restore Configuration:**

```bash
# Extract configuration
tar -xzf /backup/proxmox-config-YYYYMMDD.tar.gz -C /

# Restart services (the daemon unit is "pvedaemon")
systemctl restart pve-cluster
systemctl restart pvedaemon
```

#### Network Configuration

**Backup Network Config:**

```bash
# Backup network configuration
cp /etc/network/interfaces /backup/interfaces-$(date +%Y%m%d)
cp /etc/hosts /backup/hosts-$(date +%Y%m%d)
```

**Version Control:**

- Store network configuration in Git
- Track changes over time
- Easy rollback if needed

---

### Application Data Backups

#### Database Backups

**PostgreSQL:**

```bash
# Backup PostgreSQL database
pg_dump -U <user> <database> > /backup/db-$(date +%Y%m%d).sql

# Restore
psql -U <user> <database> < /backup/db-YYYYMMDD.sql
```

**MySQL/MariaDB:**

```bash
# Backup MySQL database
mysqldump -u <user> -p <database> > /backup/db-$(date +%Y%m%d).sql

# Restore
mysql -u <user> -p <database> < /backup/db-YYYYMMDD.sql
```

#### Application Files

```bash
# Backup application directory
tar -czf /backup/app-$(date +%Y%m%d).tar.gz /path/to/application

# Restore
tar -xzf /backup/app-YYYYMMDD.tar.gz -C /
```
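
The dated archives above accumulate over time; a small helper can prune anything older than a retention window. This is a sketch, not part of the documented tooling — the function name and defaults are assumptions:

```bash
# Sketch: delete dated backup archives older than a retention window.
# prune_old_archives and its defaults are hypothetical, not from this runbook.
prune_old_archives() {
  local backup_dir="$1" keep_days="$2"
  # -mtime +N matches files last modified more than N days ago
  find "$backup_dir" -maxdepth 1 -name '*.tar.gz' -mtime +"$keep_days" -delete
}

# Example: prune_old_archives /backup 7
```

Pairing this with the tar commands above in the same cron job keeps cleanup running after each backup.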

---

## Restore Procedures

### Restore VM/Container from Backup

#### From Proxmox Backup Server

**Via Web UI:**

1. **Select VM/Container:**
   - Datacenter → Backup → Select backup
   - Click "Restore"

2. **Restore Options:**
   - Select target storage
   - Choose new VMID (or keep original)
   - Set network configuration

3. **Start Restore:**
   - Click "Restore"
   - Monitor progress

**Via Command Line:**

```bash
# vzdump has no "restore" subcommand; use qmrestore for VMs
# and pct restore for containers (both accept PBS-backed volumes)
qmrestore <backup-volume> <vmid> --storage <storage>

# Restore a container, optionally to a new VMID
pct restore <vmid> <backup-volume> --storage <storage>
```

#### From vzdump Backup

```bash
# Restore a VM from a vzdump archive file
qmrestore <backup-file.vma.gz> <vmid> --storage <storage>
```

---

### Restore Configuration

#### Restore Proxmox Configuration

```bash
# Stop Proxmox services
systemctl stop pve-cluster
systemctl stop pvedaemon

# Restore configuration
tar -xzf /backup/proxmox-config-YYYYMMDD.tar.gz -C /

# Start services
systemctl start pve-cluster
systemctl start pvedaemon
```

#### Restore Network Configuration

```bash
# Restore network config
cp /backup/interfaces-YYYYMMDD /etc/network/interfaces
cp /backup/hosts-YYYYMMDD /etc/hosts

# Restart networking
systemctl restart networking
```

---

## Backup Verification

### Verify Backup Integrity

**Check Backup Files:**

```bash
# List backups on a storage (vzdump has no "list" subcommand)
pvesm list <storage> --content backup

# Verify backups stored on Proxmox Backup Server
# (run on the PBS host; available on recent PBS versions)
proxmox-backup-manager verify <datastore>
```

**Test Restore:**

- Monthly restore test
- Verify VM/container starts
- Test application functionality
- Document results

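The monthly test can be scripted against a scratch VMID so the production guest is never touched. A sketch for a container; the VMID, archive path, and storage are placeholders:

```bash
# Restore into a scratch VMID, boot it, spot-check, then discard.
pct restore <scratch-vmid> <backup-archive.tar.zst> --storage <storage>
pct start <scratch-vmid>
pct exec <scratch-vmid> -- systemctl is-system-running   # spot-check the guest
pct stop <scratch-vmid>
pct destroy <scratch-vmid>
```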
---

## Backup Retention Policy

### Retention Schedule

- **Daily Backups:** Keep 7 days
- **Weekly Backups:** Keep 4 weeks
- **Monthly Backups:** Keep 12 months
- **Yearly Backups:** Keep 7 years

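On current Proxmox VE releases this schedule can be expressed directly in vzdump's `--prune-backups` option, so old backups are pruned automatically at backup time (the storage name is a placeholder):

```bash
vzdump <vmid> --storage <storage-name> --mode snapshot \
  --prune-backups keep-daily=7,keep-weekly=4,keep-monthly=12,keep-yearly=7
```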
### Cleanup Old Backups

```bash
# Note: vzdump has no "prune" subcommand. Retention is applied at backup
# time via --prune-backups, or via the storage's retention settings.
vzdump <vmid> --storage <storage> --mode snapshot --prune-backups keep-last=7
```

---

## Backup Monitoring

### Backup Status Monitoring

**Check Backup Jobs:**

- Datacenter → Backup → Jobs
- Review last backup time
- Check for errors

**Automated Monitoring:**

- Set up alerts for failed backups
- Monitor backup storage usage
- Track backup completion times

---

## Best Practices

1. **Test Restores Regularly:**
   - Monthly restore tests
   - Verify data integrity
   - Document results

2. **Multiple Backup Locations:**
   - Local backups (fast restore)
   - Remote backups (disaster recovery)
   - Offsite backups (complete protection)

3. **Document Backup Procedures:**
   - Keep procedures up to date
   - Document restore procedures
   - Maintain backup inventory

4. **Monitor Backup Storage:**
   - Check available space regularly
   - Clean up old backups
   - Plan for storage growth

---

## Related Documentation

- **[DISASTER_RECOVERY.md](DISASTER_RECOVERY.md)** - Disaster recovery procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational procedures
- **[../../04-configuration/SECRETS_KEYS_CONFIGURATION.md](../../04-configuration/SECRETS_KEYS_CONFIGURATION.md)** - Secrets backup

---

**Last Updated:** 2025-01-20
**Review Cycle:** Monthly
229
docs/03-deployment/CHAIN138_AUTOMATION_SCRIPTS.md
Normal file
@@ -0,0 +1,229 @@
# ChainID 138 Automation Scripts

**Date:** December 26, 2024
**Status:** ✅ All automation scripts created and ready

---

## Overview

This document describes the automation scripts created for ChainID 138 deployment. These scripts can be run once containers are created to automate the complete configuration process.

---

## Available Scripts

### 1. Main Deployment Script

**File:** `scripts/deploy-all-chain138-containers.sh`

**Purpose:** Master script that orchestrates the complete deployment process.

**What it does:**
1. Configures all Besu nodes (static-nodes.json, permissioned-nodes.json)
2. Verifies configuration
3. Sets up JWT authentication for RPC containers
4. Generates JWT tokens for operators

**Usage:**
```bash
cd /home/intlc/projects/proxmox
./scripts/deploy-all-chain138-containers.sh
```

**Note:** This script will prompt for confirmation before proceeding.

---

### 2. JWT Authentication Setup

**File:** `scripts/setup-jwt-auth-all-rpc-containers.sh`

**Purpose:** Configures JWT authentication for all RPC containers (2503-2508).

**What it does:**
- Installs nginx and dependencies on each container
- Generates JWT secret keys
- Creates JWT validation service
- Configures nginx with JWT authentication
- Sets up SSL certificates
- Starts JWT validation service and nginx

**Usage:**
```bash
./scripts/setup-jwt-auth-all-rpc-containers.sh
```

**Requirements:**
- Containers must be running
- SSH access to Proxmox host
- Root access on Proxmox host

---

### 3. JWT Token Generation

**File:** `scripts/generate-jwt-token-for-container.sh`

**Purpose:** Generates JWT tokens for specific containers and operators.

**Usage:**
```bash
# Generate token for a specific container
./scripts/generate-jwt-token-for-container.sh <VMID> <username> [expiry_days]

# Examples:
./scripts/generate-jwt-token-for-container.sh 2503 ali-full-access 365
./scripts/generate-jwt-token-for-container.sh 2505 luis-rpc-access 365
./scripts/generate-jwt-token-for-container.sh 2507 putu-rpc-access 365
```

**Parameters:**
- `VMID`: Container VMID (2503-2508)
- `username`: Username for the token (e.g., ali-full-access, luis-rpc-access)
- `expiry_days`: Token expiry in days (default: 365)

**Output:**
- JWT token
- Usage example with curl command

---

### 4. Besu Configuration

**File:** `scripts/configure-besu-chain138-nodes.sh`

**Purpose:** Configures all Besu nodes with static-nodes.json and permissioned-nodes.json.

**What it does:**
1. Collects enodes from all Besu nodes
2. Generates static-nodes.json
3. Generates permissioned-nodes.json
4. Deploys configurations to all containers
5. Configures discovery settings
6. Restarts Besu services

**Usage:**
```bash
./scripts/configure-besu-chain138-nodes.sh
```

---

### 5. Configuration Verification

**File:** `scripts/verify-chain138-config.sh`

**Purpose:** Verifies the configuration of all Besu nodes.

**What it checks:**
- File existence (static-nodes.json, permissioned-nodes.json)
- Discovery settings
- Peer connections
- Service status

**Usage:**
```bash
./scripts/verify-chain138-config.sh
```

---

## Deployment Workflow

### Step 1: Create Containers

First, create all required containers (see `docs/MISSING_CONTAINERS_LIST.md`):

- 1504 - besu-sentry-5
- 2503-2508 - All RPC nodes
- 6201 - firefly-2
- Other services as needed

### Step 2: Run Main Deployment Script

Once containers are created and running:

```bash
cd /home/intlc/projects/proxmox
./scripts/deploy-all-chain138-containers.sh
```

This will:
1. Configure all Besu nodes
2. Verify configuration
3. Set up JWT authentication
4. Generate JWT tokens

### Step 3: Test and Verify

After deployment:

```bash
# Verify configuration
./scripts/verify-chain138-config.sh

# Test JWT authentication on each container
for vmid in 2503 2504 2505 2506 2507 2508; do
  echo "Testing VMID $vmid:"
  curl -k -H "Authorization: Bearer <TOKEN>" \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
    https://192.168.11.XXX/
done
```

---

## Token Distribution

After generating tokens, distribute them to operators:

### Ali (Full Access)
- VMID 2503 (0x8a identity): Full access token
- VMID 2504 (0x1 identity): Full access token

### Luis (RPC-Only Access)
- VMID 2505 (0x8a identity): RPC-only token
- VMID 2506 (0x1 identity): RPC-only token

### Putu (RPC-Only Access)
- VMID 2507 (0x8a identity): RPC-only token
- VMID 2508 (0x1 identity): RPC-only token

---

## Troubleshooting

### Containers Not Running

If containers are not running, the scripts will skip them with a warning. Re-run the scripts after containers are started.

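Before re-running, the expected containers can be checked in one pass on the Proxmox host. A sketch: `check_containers` is a hypothetical helper, and the VMID list is taken from this document.

```bash
# Print the status of each expected container; "not found" flags missing ones.
check_containers() {
  local vmid
  for vmid in "$@"; do
    printf '%s: %s\n' "$vmid" "$(pct status "$vmid" 2>/dev/null || echo 'not found')"
  done
}

# On the Proxmox host:
# check_containers 1504 2503 2504 2505 2506 2507 2508 6201
```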
### JWT Secret Not Found

If the JWT secret is not found:
1. Run `setup-jwt-auth-all-rpc-containers.sh` first
2. Check that the container is running
3. Verify SSH access to the Proxmox host

### Configuration Files Not Found

If configuration files are missing:
1. Run `configure-besu-chain138-nodes.sh` first
2. Check that all Besu containers are running
3. Verify network connectivity

---

## Related Documentation

- [Next Steps](CHAIN138_NEXT_STEPS.md)
- [Missing Containers List](MISSING_CONTAINERS_LIST.md)
- [JWT Authentication Requirements](CHAIN138_JWT_AUTH_REQUIREMENTS.md)
- [Complete Implementation](CHAIN138_COMPLETE_IMPLEMENTATION.md)

---

**Last Updated:** December 26, 2024
**Status:** ✅ Ready for use
278
docs/03-deployment/CHANGE_MANAGEMENT.md
Normal file
@@ -0,0 +1,278 @@
# Change Management Process

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation

---

## Overview

This document defines the change management process for the Proxmox infrastructure, ensuring all changes are properly planned, approved, implemented, and documented.

---

## Change Types

### Standard Changes

**Definition:** Pre-approved, low-risk changes that follow established procedures.

**Examples:**
- Routine maintenance
- Scheduled updates
- Standard VM/container deployments

**Process:**
- No formal approval required
- Document in change log
- Follow standard procedures

### Normal Changes

**Definition:** Changes that require review and approval but are not emergencies.

**Examples:**
- Network configuration changes
- Storage modifications
- Security updates
- New service deployments

**Process:**
- Submit change request
- Review and approval
- Schedule implementation
- Document results

### Emergency Changes

**Definition:** Urgent changes required to resolve critical issues.

**Examples:**
- Security patches
- Critical bug fixes
- Service restoration

**Process:**
- Implement immediately
- Document during/after
- Post-implementation review
- Retrospective approval

---

## Change Request Process

### 1. Change Request Submission

**Required Information:**

1. **Change Details:**
   - Description of change
   - Reason for change
   - Expected impact

2. **Technical Details:**
   - Systems affected
   - Implementation steps
   - Rollback plan

3. **Risk Assessment:**
   - Risk level (Low/Medium/High)
   - Potential impact
   - Mitigation strategies

4. **Timeline:**
   - Proposed implementation date
   - Estimated duration
   - Maintenance window (if needed)

### 2. Change Review

**Review Criteria:**

1. **Technical Review:**
   - Feasibility
   - Impact assessment
   - Risk evaluation

2. **Business Review:**
   - Business impact
   - Resource requirements
   - Timeline alignment

3. **Security Review:**
   - Security implications
   - Compliance requirements
   - Risk assessment

### 3. Change Approval

**Approval Levels:**

- **Standard Changes:** No approval required
- **Normal Changes:** Infrastructure lead approval
- **High-Risk Changes:** Management approval
- **Emergency Changes:** Post-implementation approval

### 4. Change Implementation

**Pre-Implementation:**

1. **Preparation:**
   - Verify backups
   - Prepare rollback plan
   - Notify stakeholders
   - Schedule maintenance window (if needed)

2. **Implementation:**
   - Follow documented procedures
   - Document steps taken
   - Monitor for issues

3. **Verification:**
   - Test functionality
   - Verify system health
   - Check logs for errors

### 5. Post-Implementation

**Activities:**

1. **Documentation:**
   - Update documentation
   - Document any issues
   - Update change log

2. **Review:**
   - Post-implementation review
   - Lessons learned
   - Process improvements

---

## Change Request Template

```markdown
# Change Request

## Change Information
- **Requestor:** [Name]
- **Date:** [Date]
- **Change Type:** [Standard/Normal/Emergency]
- **Priority:** [Low/Medium/High/Critical]

## Change Description
[Detailed description of the change]

## Reason for Change
[Why is this change needed?]

## Systems Affected
[List of systems, VMs, containers, or services]

## Implementation Plan
[Step-by-step implementation plan]

## Rollback Plan
[How to roll back if issues occur]

## Risk Assessment
- **Risk Level:** [Low/Medium/High]
- **Potential Impact:** [Description]
- **Mitigation:** [How to mitigate risks]

## Testing Plan
[How the change will be tested]

## Timeline
- **Proposed Date:** [Date]
- **Estimated Duration:** [Time]
- **Maintenance Window:** [If applicable]

## Approval
- **Reviewed By:** [Name]
- **Approved By:** [Name]
- **Date:** [Date]
```

---

## Change Log

### Change Log Format

| Date | Change ID | Description | Type | Status | Implemented By |
|------|-----------|-------------|------|--------|----------------|
| 2025-01-20 | CHG-001 | Network VLAN configuration | Normal | Completed | [Name] |
| 2025-01-19 | CHG-002 | Security patch deployment | Emergency | Completed | [Name] |

---

## Best Practices

1. **Plan Ahead:**
   - Submit change requests early
   - Allow time for review
   - Schedule during maintenance windows

2. **Document Everything:**
   - Document all changes
   - Keep change log updated
   - Update procedures

3. **Test First:**
   - Test in non-production
   - Verify rollback procedures
   - Document test results

4. **Communicate:**
   - Notify stakeholders
   - Provide status updates
   - Document issues

5. **Review Regularly:**
   - Review change process
   - Identify improvements
   - Update procedures

---

## Emergency Change Process

### When to Use

- Critical security issues
- Service outages
- Data loss prevention
- Regulatory compliance

### Process

1. **Implement Immediately:**
   - Take necessary action
   - Document as you go
   - Notify stakeholders

2. **Post-Implementation:**
   - Complete change request
   - Document what was done
   - Conduct review

3. **Retrospective:**
   - Review emergency change
   - Identify improvements
   - Update procedures

---

## Related Documentation

- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational procedures
- **[DISASTER_RECOVERY.md](DISASTER_RECOVERY.md)** - Disaster recovery
- **[DEPLOYMENT_READINESS.md](DEPLOYMENT_READINESS.md)** - Deployment procedures

---

**Last Updated:** 2025-01-20
**Review Cycle:** Quarterly
@@ -40,6 +40,39 @@

---

## Deployment Decision Tree

```mermaid
flowchart TD
    Start[New Deployment?] --> EnvType{Environment Type?}

    EnvType -->|Production| ProdCheck{Production Ready?}
    EnvType -->|Staging| StagingDeploy[Staging Deployment]
    EnvType -->|Development| DevDeploy[Development Deployment]

    ProdCheck -->|No| PrepProd[Prepare Production<br/>Review Checklist<br/>Verify Resources]
    ProdCheck -->|Yes| ProdDeploy[Production Deployment]
    PrepProd --> ProdDeploy

    ProdDeploy --> WhichComponents{Which Components?}
    StagingDeploy --> WhichComponents
    DevDeploy --> WhichComponents

    WhichComponents -->|Full Stack| FullDeploy[Deploy Full Stack<br/>Validators, Sentries, RPC,<br/>Services, Monitoring]
    WhichComponents -->|Besu Only| BesuDeploy[Deploy Besu Network<br/>Validators, Sentries, RPC]
    WhichComponents -->|CCIP Only| CCIPDeploy[Deploy CCIP Fleet<br/>Commit, Execute, RMN]
    WhichComponents -->|Services Only| ServicesDeploy[Deploy Services<br/>Blockscout, Cacti, etc.]

    FullDeploy --> ValidateDeploy[Validate Deployment]
    BesuDeploy --> ValidateDeploy
    CCIPDeploy --> ValidateDeploy
    ServicesDeploy --> ValidateDeploy

    ValidateDeploy --> DeployComplete[Deployment Complete]
```

---

## 🚀 Deployment Steps

### Step 1: Review Configuration
232
docs/03-deployment/DEPLOYMENT_READINESS_CHECKLIST.md
Normal file
@@ -0,0 +1,232 @@
# Chain 138 Deployment Readiness Checklist

**Date**: $(date)
**Purpose**: Verify all prerequisites are met before deploying smart contracts

---

## ✅ Network Readiness

### RPC Endpoints

- [x] **RPC-01 (VMID 2500)**: ✅ Operational
  - IP: 192.168.11.250
  - HTTP RPC: Port 8545 ✅ Listening
  - WebSocket RPC: Port 8546 ✅ Listening
  - P2P: Port 30303 ✅ Listening
  - Metrics: Port 9545 ✅ Listening
  - Status: Active, syncing blocks

- [ ] **RPC-02 (VMID 2501)**: ⏳ Check status
- [ ] **RPC-03 (VMID 2502)**: ⏳ Check status

### Network Connectivity

- [x] RPC endpoint responds to `eth_blockNumber`
- [x] RPC endpoint responds to `eth_chainId`
- [x] Chain ID verified: 138
- [x] Network producing blocks (block number > 0)

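The two pending RPC endpoints can be probed the same way RPC-01 was verified. A sketch: the RPC-02/RPC-03 addresses are placeholders, and a healthy node should return Chain ID 138 as `0x8a` in the hex-encoded response.

```bash
for rpc in http://192.168.11.250:8545 http://<rpc-02-ip>:8545 http://<rpc-03-ip>:8545; do
  echo "== $rpc"
  curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' "$rpc"
  echo
done
```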
### Validator Network

- [ ] All validators (1000-1004) operational
- [ ] Network consensus active
- [ ] Block production stable

---

## ✅ Configuration Readiness

### Deployment Scripts

- [x] **Deployment script updated**: `deploy-contracts-once-ready.sh`
  - IP address updated: `10.3.1.4:8545` → `192.168.11.250:8545`
  - Location: `/home/intlc/projects/smom-dbis-138/scripts/deployment/`

- [x] **Installation scripts updated**: All service install scripts
  - Oracle Publisher: ✅ Updated
  - CCIP Monitor: ✅ Updated
  - Keeper: ✅ Updated
  - Financial Tokenization: ✅ Updated
  - Firefly: ✅ Updated
  - Cacti: ✅ Updated
  - Blockscout: ✅ Updated

### Configuration Templates

- [x] **Besu RPC config template**: ✅ Updated
  - Deprecated options removed
  - File: `templates/besu-configs/config-rpc.toml`

- [x] **Service installation script**: ✅ Updated
  - Config file name corrected
  - File: `install/besu-rpc-install.sh`

---

## ⏳ Deployment Prerequisites

### Environment Setup

- [ ] **Source project `.env` file configured**
  - Location: `/home/intlc/projects/smom-dbis-138/.env`
  - Required variables:
    - `RPC_URL_138=http://192.168.11.250:8545`
    - `PRIVATE_KEY=<deployer-private-key>`
    - `RESERVE_ADMIN=<admin-address>`
    - `KEEPER_ADDRESS=<keeper-address>`
    - `ORACLE_PRICE_FEED=<oracle-address>` (after Oracle deployment)

### Deployer Account

- [ ] **Deployer account has sufficient balance**
  - Check balance: `cast balance <deployer-address> --rpc-url http://192.168.11.250:8545`
  - Minimum recommended: 1 ETH equivalent

### Network Verification

- [x] **Network is producing blocks**
  - Verified: ✅ Yes
  - Current block: > 11,200 (as of troubleshooting)

- [x] **Chain ID correct**
  - Expected: 138
  - Verified: ✅ Yes

---

## 📋 Contract Deployment Order

### Phase 1: Core Infrastructure (Priority 1)

1. [ ] **Oracle Contract**
   - Script: `DeployOracle.s.sol`
   - Dependencies: None
   - Required for: Keeper, Price Feeds

2. [ ] **CCIP Router**
   - Script: `DeployCCIPRouter.s.sol`
   - Dependencies: None
   - Required for: CCIP Sender, Cross-chain operations

3. [ ] **CCIP Sender**
   - Script: `DeployCCIPSender.s.sol`
   - Dependencies: CCIP Router
   - Required for: Cross-chain messaging

### Phase 2: Supporting Contracts (Priority 2)

4. [ ] **Multicall**
   - Script: `DeployMulticall.s.sol`
   - Dependencies: None
   - Utility contract

5. [ ] **MultiSig**
   - Script: `DeployMultiSig.s.sol`
   - Dependencies: None
   - Governance contract

### Phase 3: Application Contracts (Priority 3)

6. [ ] **Price Feed Keeper**
   - Script: `reserve/DeployKeeper.s.sol`
   - Dependencies: Oracle Price Feed
   - Required for: Automated price updates

7. [ ] **Reserve System**
   - Script: `reserve/DeployReserveSystem.s.sol`
   - Dependencies: Token Factory (if applicable)
   - Required for: Financial tokenization

---

## 🔧 Service Configuration

### After Contract Deployment

Once contracts are deployed, update service configurations:

- [ ] **Oracle Publisher (VMID 3500)**
  - Update `.env` with Oracle contract address
  - Restart service

- [ ] **CCIP Monitor (VMID 3501)**
  - Update `.env` with CCIP Router and Sender addresses
  - Restart service

- [ ] **Keeper (VMID 3502)**
  - Update `.env` with Keeper contract address
  - Restart service

- [ ] **Financial Tokenization (VMID 3503)**
  - Update `.env` with Reserve System address
  - Restart service

---

## ✅ Verification Steps

### After Deployment

1. **Verify Contracts on Chain**

   ```bash
   cast code <contract-address> --rpc-url http://192.168.11.250:8545
   ```

2. **Verify Service Connections**

   ```bash
   # Test Oracle Publisher
   pct exec 3500 -- curl -X POST http://localhost:8000/health

   # Test CCIP Monitor
   pct exec 3501 -- curl -X POST http://localhost:8000/health

   # Test Keeper
   pct exec 3502 -- curl -X POST http://localhost:3000/health
   ```

3. **Check Service Logs**

   ```bash
   # Oracle Publisher
   pct exec 3500 -- journalctl -u oracle-publisher -f

   # CCIP Monitor
   pct exec 3501 -- journalctl -u ccip-monitor -f

   # Keeper
   pct exec 3502 -- journalctl -u price-feed-keeper -f
   ```

---

## 📊 Current Status Summary

### Completed ✅

- ✅ RPC-01 (VMID 2500) troubleshooting and fix
- ✅ Configuration files updated
- ✅ Deployment scripts updated with correct IPs
- ✅ Network verified (producing blocks, Chain ID 138)
- ✅ RPC endpoint accessible and responding

### Pending ⏳

- ⏳ Verify RPC-02 and RPC-03 status
- ⏳ Configure deployer account and `.env` file
- ⏳ Deploy contracts (waiting for user action)
- ⏳ Update service configurations with deployed addresses

---

## 🚀 Ready for Deployment

**Status**: ✅ **READY** (pending deployer account setup)

All infrastructure, scripts, and documentation are in place. The network is operational and ready for contract deployment.

**Next Action**: Configure deployer account and `.env` file, then proceed with contract deployment.

---

**Last Updated**: $(date)
451
docs/03-deployment/DEPLOYMENT_RUNBOOK.md
Normal file
@@ -0,0 +1,451 @@
# Deployment Runbook

## SolaceScanScout Explorer - Production Deployment Guide

**Last Updated**: $(date)
**Version**: 1.0.0

---

## Table of Contents

1. [Pre-Deployment Checklist](#pre-deployment-checklist)
2. [Environment Setup](#environment-setup)
3. [Database Migration](#database-migration)
4. [Service Deployment](#service-deployment)
5. [Health Checks](#health-checks)
6. [Rollback Procedures](#rollback-procedures)
7. [Post-Deployment Verification](#post-deployment-verification)
8. [Troubleshooting](#troubleshooting)

---

## Pre-Deployment Checklist

### Infrastructure Requirements

- [ ] Kubernetes cluster (AKS) or VM infrastructure ready
- [ ] PostgreSQL 16+ with TimescaleDB extension
- [ ] Redis cluster (for production cache/rate limiting)
- [ ] Elasticsearch/OpenSearch cluster
- [ ] Load balancer configured
- [ ] SSL certificates provisioned
- [ ] DNS records configured
- [ ] Monitoring stack deployed (Prometheus, Grafana)

### Configuration

- [ ] Environment variables configured
- [ ] Secrets stored in Key Vault
- [ ] Database credentials verified
- [ ] Redis connection string verified
- [ ] RPC endpoint URLs verified
- [ ] JWT secret configured (strong random value)

### Code & Artifacts

- [ ] All tests passing
- [ ] Docker images built and tagged
- [ ] Images pushed to container registry
- [ ] Database migrations reviewed
- [ ] Rollback plan documented

---

## Environment Setup

### 1. Set Environment Variables

```bash
# Database
export DB_HOST=postgres.example.com
export DB_PORT=5432
export DB_USER=explorer
export DB_PASSWORD=<from-key-vault>
export DB_NAME=explorer

# Redis (for production)
export REDIS_URL=redis://redis.example.com:6379

# RPC
export RPC_URL=https://rpc.d-bis.org
export WS_URL=wss://rpc.d-bis.org

# Application
export CHAIN_ID=138
export PORT=8080
export JWT_SECRET=<strong-random-secret>

# Optional
export LOG_LEVEL=info
export ENABLE_METRICS=true
```

### 2. Verify Secrets

```bash
# Test database connection
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"

# Test Redis connection
redis-cli -u $REDIS_URL ping

# Test RPC endpoint
curl -X POST $RPC_URL \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
```
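Before running any of the steps that follow, it is worth failing fast on missing configuration. A minimal preflight sketch, assuming the variable names from the Environment Setup section above:

```shell
#!/usr/bin/env bash
# Verify that every required environment variable is set and non-empty.
# The variable list mirrors "1. Set Environment Variables" above.
preflight() {
  local missing=0 v
  for v in DB_HOST DB_PORT DB_USER DB_PASSWORD DB_NAME RPC_URL CHAIN_ID PORT JWT_SECRET; do
    if [ -z "${!v:-}" ]; then
      echo "missing required variable: $v" >&2
      missing=1
    fi
  done
  return "$missing"
}
# usage: preflight || exit 1
```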
---

## Database Migration

### 1. Backup Existing Database

```bash
# Create backup
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > backup_$(date +%Y%m%d_%H%M%S).sql

# Verify backup
ls -lh backup_*.sql
```
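The two commands above can be combined into one helper that refuses to accept an empty dump, which is the most common silent failure mode of `pg_dump`. A sketch, reusing the `DB_*` variables from Environment Setup:

```shell
#!/usr/bin/env bash
# Timestamped pg_dump wrapper with a non-empty sanity check.
backup_name() { printf 'backup_%s.sql' "$(date +%Y%m%d_%H%M%S)"; }

run_backup() {
  local out; out="$(backup_name)"
  pg_dump -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" > "$out" || return 1
  # refuse to accept an empty dump file
  [ -s "$out" ] || { echo "empty backup: $out" >&2; return 1; }
  echo "$out"
}
# usage: backup_file="$(run_backup)" || exit 1
```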
### 2. Run Migrations

```bash
cd explorer-monorepo/backend/database/migrations

# Review pending migrations
go run migrate.go --status

# Run migrations
go run migrate.go --up

# Verify migration
go run migrate.go --status
```

### 3. Verify Schema

```bash
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\dt"
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\d blocks"
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "\d transactions"
```
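The `\d` checks above are manual; a scriptable variant asserts that the tables the explorer needs actually exist. A sketch, with the table names taken from the checks above:

```shell
#!/usr/bin/env bash
# Assert that required tables are present in a list of table names.
tables_ok() {
  local have="$1" t   # "$1": output of psql -Atc "SELECT tablename FROM pg_tables ..."
  for t in blocks transactions; do
    echo "$have" | grep -qx "$t" || { echo "missing table: $t" >&2; return 1; }
  done
  echo "schema ok"
}
# usage:
# tables_ok "$(psql -h $DB_HOST -U $DB_USER -d $DB_NAME -Atc \
#   "SELECT tablename FROM pg_tables WHERE schemaname='public'")"
```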
---

## Service Deployment

### Option 1: Kubernetes Deployment

#### 1. Deploy API Server

```bash
kubectl apply -f k8s/api-server-deployment.yaml
kubectl apply -f k8s/api-server-service.yaml
kubectl apply -f k8s/api-server-ingress.yaml

# Verify deployment
kubectl get pods -l app=api-server
kubectl logs -f deployment/api-server
```

#### 2. Deploy Indexer

```bash
kubectl apply -f k8s/indexer-deployment.yaml

# Verify deployment
kubectl get pods -l app=indexer
kubectl logs -f deployment/indexer
```

#### 3. Rolling Update

```bash
# Update image
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.1.0

# Monitor rollout
kubectl rollout status deployment/api-server

# Rollback if needed
kubectl rollout undo deployment/api-server
```

### Option 2: Docker Compose Deployment

```bash
cd explorer-monorepo/deployment

# Start services
docker-compose up -d

# Verify services
docker-compose ps
docker-compose logs -f api-server
```

---

## Health Checks

### 1. API Health Endpoint

```bash
# Check health
curl https://api.d-bis.org/health

# Expected response
{
  "status": "ok",
  "timestamp": "2024-01-01T00:00:00Z",
  "database": "connected"
}
```
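After a rollout, services usually take a moment to become healthy. A minimal polling sketch that waits for the `"status":"ok"` field shown in the expected response above (URL and attempt count are parameters, not fixed values from this runbook):

```shell
#!/usr/bin/env bash
# Poll a health endpoint until it reports "status":"ok" or attempts run out.
wait_healthy() {
  local url="${1:?usage: wait_healthy <url> [attempts]}" attempts="${2:-30}" i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS "$url" 2>/dev/null | grep -q '"status"[[:space:]]*:[[:space:]]*"ok"'; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "service not healthy after $attempts attempts" >&2
  return 1
}
# usage: wait_healthy https://api.d-bis.org/health 30
```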
### 2. Service Health

```bash
# Kubernetes
kubectl get pods
kubectl describe pod <pod-name>

# Docker
docker ps
docker inspect <container-id>
```

### 3. Database Connectivity

```bash
# From API server
curl https://api.d-bis.org/health | jq .database

# Direct check
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT COUNT(*) FROM blocks;"
```

### 4. Redis Connectivity

```bash
# Test Redis
redis-cli -u $REDIS_URL ping

# Check cache stats
redis-cli -u $REDIS_URL INFO stats
```

---

## Rollback Procedures

### Quick Rollback (Kubernetes)

```bash
# Rollback to previous version
kubectl rollout undo deployment/api-server
kubectl rollout undo deployment/indexer

# Verify rollback
kubectl rollout status deployment/api-server
```

### Database Rollback

```bash
# Restore from backup
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < backup_YYYYMMDD_HHMMSS.sql

# Or rollback migrations
cd explorer-monorepo/backend/database/migrations
go run migrate.go --down 1
```
### Full Rollback

```bash
# 1. Stop new services
kubectl scale deployment/api-server --replicas=0
kubectl scale deployment/indexer --replicas=0

# 2. Restore database
psql -h $DB_HOST -U $DB_USER -d $DB_NAME < backup_YYYYMMDD_HHMMSS.sql

# 3. Start previous version
kubectl set image deployment/api-server api-server=registry.example.com/explorer-api:v1.0.0
kubectl scale deployment/api-server --replicas=3
```

---

## Post-Deployment Verification

### 1. Functional Tests

```bash
# Test Track 1 endpoints (public)
curl https://api.d-bis.org/api/v1/track1/blocks/latest

# Test search (URL quoted so the shell does not interpret the query string)
curl "https://api.d-bis.org/api/v1/search?q=1000"

# Test health
curl https://api.d-bis.org/health
```

### 2. Performance Tests

```bash
# Load test
ab -n 1000 -c 10 https://api.d-bis.org/api/v1/track1/blocks/latest

# Check response times
curl -w "@curl-format.txt" -o /dev/null -s https://api.d-bis.org/api/v1/track1/blocks/latest
```
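The `-w "@curl-format.txt"` flag reads a `--write-out` template from a file, which this runbook does not include. A minimal example file (the `%{...}` fields are standard curl write-out variables; extend as needed):

```
    time_namelookup: %{time_namelookup}s
       time_connect: %{time_connect}s
 time_starttransfer: %{time_starttransfer}s
         time_total: %{time_total}s
```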
### 3. Monitoring

- [ ] Check Grafana dashboards
- [ ] Verify Prometheus metrics
- [ ] Check error rates
- [ ] Monitor response times
- [ ] Check database connection pool
- [ ] Verify Redis cache hit rate

---

## Troubleshooting

### Common Issues

#### 1. Database Connection Errors

**Symptoms**: 500 errors, "database connection failed"

**Resolution**:
```bash
# Check database status
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"

# Check connection pool
# Review database/migrations for connection pool settings

# Restart service
kubectl rollout restart deployment/api-server
```

#### 2. Redis Connection Errors

**Symptoms**: Cache misses, rate limiting not working

**Resolution**:
```bash
# Test Redis connection
redis-cli -u $REDIS_URL ping

# Check Redis logs
kubectl logs -l app=redis

# Fallback to in-memory (temporary)
# Remove REDIS_URL from environment
```

#### 3. High Memory Usage

**Symptoms**: OOM kills, slow responses

**Resolution**:
```bash
# Check memory usage
kubectl top pods

# Increase memory limits
kubectl set resources deployment/api-server --limits=memory=2Gi

# Review cache TTL settings
```

#### 4. Slow Response Times

**Symptoms**: High latency, timeout errors

**Resolution**:
```bash
# Check database query performance
psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "EXPLAIN ANALYZE SELECT * FROM blocks LIMIT 10;"

# Check indexer lag
curl https://api.d-bis.org/api/v1/track2/stats

# Review connection pool settings
```

---

## Emergency Procedures

### Service Outage

1. **Immediate Actions**:
   - Check service status: `kubectl get pods`
   - Check logs: `kubectl logs -f deployment/api-server`
   - Check database: `psql -h $DB_HOST -U $DB_USER -d $DB_NAME -c "SELECT 1;"`
   - Check Redis: `redis-cli -u $REDIS_URL ping`

2. **Quick Recovery**:
   - Restart services: `kubectl rollout restart deployment/api-server`
   - Scale up: `kubectl scale deployment/api-server --replicas=5`
   - Rollback if needed: `kubectl rollout undo deployment/api-server`

3. **Communication**:
   - Update status page
   - Notify team via Slack/email
   - Document incident
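During an outage, running the immediate-action checks one by one is slow. A hedged sketch that runs them all and reports pass/fail without stopping at the first failure (assumes `DB_HOST`, `DB_USER`, `DB_NAME`, and `REDIS_URL` are exported as in Environment Setup):

```shell
#!/usr/bin/env bash
# One-shot outage triage: run every status check from "Immediate Actions"
# and print ok/FAIL per check; exit non-zero if anything failed.
triage() {
  local failed=0
  run_check() {  # run_check <label> <command...>
    if "${@:2}" >/dev/null 2>&1; then
      echo "ok:   $1"
    else
      echo "FAIL: $1"
      failed=1
    fi
  }
  run_check "pods"     kubectl get pods
  run_check "database" psql -h "$DB_HOST" -U "$DB_USER" -d "$DB_NAME" -c "SELECT 1;"
  run_check "redis"    redis-cli -u "$REDIS_URL" ping
  return "$failed"
}
```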
### Data Corruption

1. **Immediate Actions**:
   - Stop writes: `kubectl scale deployment/api-server --replicas=0`
   - Backup current state: `pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > emergency_backup.sql`

2. **Recovery**:
   - Restore from last known good backup
   - Verify data integrity
   - Resume services

---

## Maintenance Windows

### Scheduled Maintenance

1. **Pre-Maintenance**:
   - Notify users 24 hours in advance
   - Create maintenance mode flag
   - Prepare rollback plan

2. **During Maintenance**:
   - Enable maintenance mode
   - Perform updates
   - Run health checks

3. **Post-Maintenance**:
   - Disable maintenance mode
   - Verify all services
   - Monitor for issues

---

## Contact Information

- **On-Call Engineer**: Check PagerDuty
- **Slack Channel**: #explorer-deployments
- **Emergency**: [Emergency Contact]

---

**Document Version**: 1.0.0
**Last Reviewed**: $(date)
**Next Review**: $(date -d "+3 months")
260
docs/03-deployment/DISASTER_RECOVERY.md
Normal file
@@ -0,0 +1,260 @@
# Disaster Recovery Procedures

**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation

---

## Overview

This document outlines disaster recovery procedures for the Proxmox infrastructure, including recovery from hardware failures, data loss, network outages, and security incidents.

---

## Recovery Scenarios

### 1. Complete Host Failure

**Scenario:** A Proxmox host (R630 or ML110) fails completely and cannot be recovered.

**Recovery Steps:**

1. **Assess Impact:**
   ```bash
   # Check which VMs/containers were running on failed host
   pvecm status
   pvecm nodes
   ```

2. **Recover from Backup:**
   - Identify backup location (Proxmox Backup Server or external storage)
   - Restore VMs/containers to another host in the cluster
   - Verify network connectivity and services

3. **Rejoin Cluster (if host is replaced):**
   ```bash
   # On the new/repaired host, join via the address of an existing cluster node
   pvecm add <ip-of-existing-cluster-node> --link0 <local-cluster-address>
   ```

4. **Verify Services:**
   - Check all critical services are running
   - Verify network connectivity
   - Test application functionality

**Recovery Time Objective (RTO):** 4 hours
**Recovery Point Objective (RPO):** Last backup (typically daily)

---

### 2. Storage Failure

**Scenario:** Storage pool fails (ZFS pool corruption, disk failure, etc.)

**Recovery Steps:**

1. **Immediate Actions:**
   - Stop all VMs/containers using affected storage
   - Assess extent of damage
   - Check backup availability

2. **Storage Recovery:**
   ```bash
   # For ZFS pools
   zpool status
   zpool import -f <pool-name>
   zpool scrub <pool-name>
   ```
3. **Data Recovery:**
   - Restore from backups if pool cannot be recovered
   - Use Proxmox Backup Server if available
   - Restore individual VMs/containers as needed

4. **Verification:**
   - Verify data integrity
   - Test restored VMs/containers
   - Document lessons learned

**RTO:** 8 hours
**RPO:** Last backup

---

### 3. Network Outage

**Scenario:** Complete network failure or misconfiguration

**Recovery Steps:**

1. **Local Access:**
   - Use console access (iDRAC, iLO, or physical console)
   - Verify Proxmox host is running
   - Check network configuration

2. **Network Restoration:**
   ```bash
   # Check network interfaces
   ip addr show
   ip link show

   # Check routing
   ip route show

   # Restart networking if needed
   systemctl restart networking
   ```

3. **VLAN Restoration:**
   - Verify VLAN configuration on switches
   - Check Proxmox bridge configuration
   - Test connectivity between VLANs

4. **Service Verification:**
   - Test internal services
   - Verify external connectivity (if applicable)
   - Check Cloudflare tunnels (if used)

**RTO:** 2 hours
**RPO:** No data loss (network issue only)

---

### 4. Data Corruption

**Scenario:** VM/container data corruption or accidental deletion

**Recovery Steps:**

1. **Immediate Actions:**
   - Stop affected VM/container
   - Do not attempt repairs that might worsen corruption
   - Document what was lost

2. **Recovery Options:**
   - **From Snapshot:** Restore from most recent snapshot
   - **From Backup:** Restore from Proxmox Backup Server
   - **From External Backup:** Use external backup solution

3. **Restoration:**
   ```bash
   # Restore a VM from a vzdump backup
   qmrestore <backup-file> <vmid> --storage <storage>

   # Restore a container from a vzdump backup
   pct restore <vmid> <backup-file> --storage <storage>

   # Or roll a VM back to a snapshot
   qm rollback <vmid> <snapshot-name>
   ```
4. **Verification:**
   - Verify data integrity
   - Test application functionality
   - Update documentation

**RTO:** 4 hours
**RPO:** Last snapshot/backup

---

### 5. Security Incident

**Scenario:** Security breach, unauthorized access, or malware

**Recovery Steps:**

1. **Immediate Containment:**
   - Isolate affected systems
   - Disconnect from network if necessary
   - Preserve evidence (logs, snapshots)

2. **Assessment:**
   - Identify scope of breach
   - Determine what was accessed/modified
   - Check for data exfiltration

3. **Recovery:**
   - Restore from known-good backups (pre-incident)
   - Rebuild affected systems if necessary
   - Update all credentials and keys

4. **Hardening:**
   - Review and update security policies
   - Patch vulnerabilities
   - Enhance monitoring

5. **Documentation:**
   - Document incident timeline
   - Update security procedures
   - Conduct post-incident review

**RTO:** 24 hours
**RPO:** Pre-incident state

---

## Backup Strategy

### Backup Schedule

- **Critical VMs/Containers:** Daily backups
- **Standard VMs/Containers:** Weekly backups
- **Configuration:** Daily backups of Proxmox configuration
- **Network Configuration:** Version controlled (Git)
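The schedule above is normally configured as Proxmox backup jobs in the GUI, but the equivalent `vzdump` invocations are useful for ad-hoc runs. A sketch (the VMID lists are placeholders; `--mode snapshot` and `--compress zstd` are standard `vzdump` options):

```shell
#!/usr/bin/env bash
# Build the vzdump command for one backup set.
vzdump_cmd() {  # vzdump_cmd <storage> <vmid>...
  local storage="$1"; shift
  printf 'vzdump %s --storage %s --mode snapshot --compress zstd' "$*" "$storage"
}

# Example: daily critical set (placeholder VMIDs)
echo "$(vzdump_cmd pbs 1000 1001 1002)"
```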
### Backup Locations

1. **Primary:** Proxmox Backup Server (if available)
2. **Secondary:** External storage (NFS, SMB, or USB)
3. **Offsite:** Cloud storage or remote location

### Backup Verification

- Weekly restore tests
- Monthly full disaster recovery drill
- Quarterly review of backup strategy

---

## Recovery Contacts

### Primary Contacts

- **Infrastructure Lead:** [Contact Information]
- **Network Administrator:** [Contact Information]
- **Security Team:** [Contact Information]

### Escalation

- **Level 1:** Infrastructure team (4 hours)
- **Level 2:** Management (8 hours)
- **Level 3:** External support (24 hours)

---

## Testing and Maintenance

### Quarterly DR Drills

1. **Test Scenario:** Simulate host failure
2. **Test Scenario:** Simulate storage failure
3. **Test Scenario:** Simulate network outage
4. **Document Results:** Update procedures based on findings

### Annual Full DR Test

- Complete infrastructure rebuild from backups
- Verify all services
- Update documentation

---

## Related Documentation

- **[BACKUP_AND_RESTORE.md](BACKUP_AND_RESTORE.md)** - Detailed backup procedures
- **[OPERATIONAL_RUNBOOKS.md](OPERATIONAL_RUNBOOKS.md)** - Operational procedures
- **[../../09-troubleshooting/TROUBLESHOOTING_FAQ.md](../../09-troubleshooting/TROUBLESHOOTING_FAQ.md)** - Troubleshooting guide

---

**Last Updated:** 2025-01-20
**Review Cycle:** Quarterly
103
docs/03-deployment/LVM_THIN_PVE_ENABLED.md
Normal file
@@ -0,0 +1,103 @@
# LVM Thin Storage Enabled on pve

**Date**: $(date)
**Status**: ✅ LVM Thin Storage Configured

## Summary

LVM thin storage has been successfully enabled on the pve node for migrations.

## Configuration

### Volume Group

- **Name**: `pve`
- **Physical Volumes**: 2 disks (sdc, sdd)
- **Total Size**: ~465.77GB
- **Free Space**: ~257.77GB

### Thin Pool

- **Name**: `thin1`
- **Volume Group**: `pve`
- **Size**: 208GB
- **Type**: LVM thin pool
- **Status**: Created and configured

### Proxmox Storage

- **Name**: `thin1`
- **Type**: `lvmthin`
- **Configuration**:
  - Thin pool: `thin1`
  - Volume group: `pve`
  - Content: `images,rootdir`
  - Nodes: `pve`

## Storage Status

```
pve storage:
- local: active (directory storage)
- thin1: configured (LVM thin storage)
- local-lvm: disabled (configured for ml110 only)
```

## Usage

### Migrate VMs to pve with thin1 storage

```bash
# From source node (e.g., ml110)
ssh root@192.168.11.10

# Migrate with thin1 storage
pct migrate <VMID> pve --storage thin1

# Or using API
pvesh create /nodes/ml110/lxc/<VMID>/migrate --target pve --storage thin1 --online 0
```

### Create new VMs on pve

When creating new containers on pve, you can now use:

- `thin1` - LVM thin storage (recommended for performance)
- `local` - Directory storage (slower but works)

## Storage Capacity

- **thin1**: 208GB total (available for VMs)
- **local**: 564GB total, 2.9GB used, 561GB available

## Verification

### Check storage status

```bash
ssh root@192.168.11.11 "pvesm status"
```

### Check volume groups

```bash
ssh root@192.168.11.11 "vgs"
```

### Check thin pools

```bash
ssh root@192.168.11.11 "lvs pve"
```

### List storage contents

```bash
ssh root@192.168.11.11 "pvesm list thin1"
```
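Thin pools overcommit space, so the `data_percent` column from `lvs` is worth watching alongside the checks above. A hedged sketch (pool name `pve/thin1` from this document; the 80% threshold is an assumption):

```shell
#!/usr/bin/env bash
# Warn when thin-pool data usage passes a threshold percentage.
pool_usage_warn() {  # pool_usage_warn <data_percent> [threshold]
  local pct="${1%.*}" threshold="${2:-80}"   # strip decimals for integer compare
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARN: thin pool at ${1}%"
  else
    echo "ok: ${1}%"
  fi
}

# usage against the live pool:
# usage=$(ssh root@192.168.11.11 "lvs --noheadings -o data_percent pve/thin1" | tr -d ' ')
# pool_usage_warn "$usage"
```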
## Notes

- The thin pool is created and ready for use
- Storage may show as "inactive" in `pvesm status` until first use - this is normal
- The storage is properly configured and will activate when used
- Both `thin1` (LVM thin) and `local` (directory) storage are available on pve

## Related Documentation

- `docs/STORAGE_FIX_COMPLETE.md`: Complete storage fix documentation
- `docs/MIGRATION_STORAGE_FIX.md`: Migration guide
- `scripts/enable-lvm-thin-pve.sh`: Script used to enable storage
339
docs/03-deployment/MISSING_CONTAINERS_LIST.md
Normal file
@@ -0,0 +1,339 @@
# Missing LXC Containers - Complete List

**Date:** December 26, 2024
**Status:** Inventory of containers that need to be created

---

## Summary

| Category | Missing | Total Expected | Status |
|----------|---------|----------------|--------|
| **Besu Nodes** | 7 | 19 | 12/19 deployed |
| **Hyperledger Services** | 5 | 5 | 0/5 deployed |
| **Explorer** | 1 | 1 | 0/1 deployed |
| **TOTAL** | **13** | **25** | **12/25 deployed** |

---

## 🔴 Missing Containers by Category

### 1. Besu Nodes (ChainID 138)

#### Missing Sentry Node

| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **1504** | `besu-sentry-5` | Besu Sentry Node | 192.168.11.154 | **High** | New container for Ali's dedicated host |

**Specifications:**

- Memory: 4GB
- CPU: 2 cores
- Disk: 100GB
- Network: 192.168.11.154
- Discovery: Enabled
- Access: Ali (Full)

---

#### Missing RPC Nodes

| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **2503** | `besu-rpc-4` | Besu RPC Node (Ali - 0x8a) | 192.168.11.253 | **High** | Ali's RPC node - Permissioned identity: 0x8a |
| **2504** | `besu-rpc-4` | Besu RPC Node (Ali - 0x1) | 192.168.11.254 | **High** | Ali's RPC node - Permissioned identity: 0x1 |
| **2505** | `besu-rpc-luis` | Besu RPC Node (Luis - 0x8a) | 192.168.11.255 | **High** | Luis's RPC container - Permissioned identity: 0x8a |
| **2506** | `besu-rpc-luis` | Besu RPC Node (Luis - 0x1) | 192.168.11.256 | **High** | Luis's RPC container - Permissioned identity: 0x1 |
| **2507** | `besu-rpc-putu` | Besu RPC Node (Putu - 0x8a) | 192.168.11.257 | **High** | Putu's RPC container - Permissioned identity: 0x8a |
| **2508** | `besu-rpc-putu` | Besu RPC Node (Putu - 0x1) | 192.168.11.258 | **High** | Putu's RPC container - Permissioned identity: 0x1 |

**Specifications (per container):**

- Memory: 16GB
- CPU: 4 cores
- Disk: 200GB
- Discovery: **Disabled** (prevents connection to Ethereum mainnet while reporting chainID 0x1 to MetaMask for wallet compatibility)
- **Authentication: JWT Auth Required** (all containers)

**Access Model:**

- **2503** (besu-rpc-4): Ali (Full) - 0x8a identity
- **2504** (besu-rpc-4): Ali (Full) - 0x1 identity
- **2505** (besu-rpc-luis): Luis (RPC-only) - 0x8a identity
- **2506** (besu-rpc-luis): Luis (RPC-only) - 0x1 identity
- **2507** (besu-rpc-putu): Putu (RPC-only) - 0x8a identity
- **2508** (besu-rpc-putu): Putu (RPC-only) - 0x1 identity

**Configuration:**

- All use permissioned RPC configuration
- Discovery disabled for all (prevents connection to Ethereum mainnet while reporting chainID 0x1 to MetaMask for wallet compatibility)
- Each container has separate permissioned identity access
- **All require JWT authentication** via nginx reverse proxy
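The per-container specifications above map directly onto a `pct create` call. A hedged sketch for one RPC container (memory/cores/disk come from the spec; the OS template, storage name, bridge, and gateway are assumptions to adjust for your hosts):

```shell
#!/usr/bin/env bash
# Build the pct create command for one 16GB/4-core/200GB RPC container.
make_rpc_ct() {  # make_rpc_ct <vmid> <hostname> <ip>
  printf 'pct create %s local:vztmpl/debian-12-standard_12.2-1_amd64.tar.zst --hostname %s --memory 16384 --cores 4 --rootfs thin1:200 --net0 name=eth0,bridge=vmbr0,ip=%s/24,gw=192.168.11.1' \
    "$1" "$2" "$3"
}

# Example for VMID 2503 from the table above
echo "$(make_rpc_ct 2503 besu-rpc-4 192.168.11.253)"
```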
---

### 2. Hyperledger Services

#### Firefly

| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **6200** | `firefly-1` | Hyperledger Firefly Core | 192.168.11.66 | **High** | Workflow/orchestration |
| **6201** | `firefly-2` | Hyperledger Firefly Node | 192.168.11.67 | **High** | For Ali's dedicated host (ChainID 138) |

**Specifications (per container):**

- Memory: 4GB
- CPU: 2 cores
- Disk: 50GB
- Access: Ali (Full)

**Notes:**

- 6201 is specifically mentioned in ChainID 138 documentation
- 6200 is the core Firefly service

---

#### Cacti

| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **5200** | `cacti-1` | Hyperledger Cacti | 192.168.11.64 | **High** | Interop middleware |

**Specifications:**

- Memory: 4GB
- CPU: 2 cores
- Disk: 50GB

---

#### Fabric

| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **6000** | `fabric-1` | Hyperledger Fabric | 192.168.11.65 | Medium | Enterprise contracts |

**Specifications:**

- Memory: 8GB
- CPU: 4 cores
- Disk: 100GB

---

#### Indy

| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **6400** | `indy-1` | Hyperledger Indy | 192.168.11.68 | Medium | Identity layer |

**Specifications:**

- Memory: 8GB
- CPU: 4 cores
- Disk: 100GB

---

### 3. Explorer

#### Blockscout

| VMID | Hostname | Role | IP Address | Priority | Notes |
|------|----------|------|------------|----------|-------|
| **5000** | `blockscout-1` | Blockscout Explorer | TBD | **High** | Blockchain explorer for ChainID 138 |

**Specifications:**

- Memory: 8GB+
- CPU: 4 cores+
- Disk: 200GB+
- Requires: PostgreSQL database

---

## 📊 Deployment Priority

### Priority 1 - High (ChainID 138 Critical)

1. **1504** - `besu-sentry-5` (Ali's dedicated host)
2. **2503** - `besu-rpc-4` (Ali's RPC node - 0x8a identity)
3. **2504** - `besu-rpc-4` (Ali's RPC node - 0x1 identity)
4. **2505** - `besu-rpc-luis` (Luis's RPC container - 0x8a identity)
5. **2506** - `besu-rpc-luis` (Luis's RPC container - 0x1 identity)
6. **2507** - `besu-rpc-putu` (Putu's RPC container - 0x8a identity)
7. **2508** - `besu-rpc-putu` (Putu's RPC container - 0x1 identity)
8. **6201** - `firefly-2` (Ali's dedicated host, ChainID 138)
9. **5000** - `blockscout-1` (Explorer for ChainID 138)

**Note:** All RPC containers require JWT authentication via nginx reverse proxy.

### Priority 2 - High (Infrastructure)

10. **6200** - `firefly-1` (Core Firefly service)
11. **5200** - `cacti-1` (Interop middleware)

### Priority 3 - Medium

12. **6000** - `fabric-1` (Enterprise contracts)
13. **6400** - `indy-1` (Identity layer)
---
|
||||
|
||||
## ✅ Currently Deployed Containers
|
||||
|
||||
### Besu Network (12/14)
|
||||
|
||||
| VMID | Hostname | Status |
|
||||
|------|----------|--------|
|
||||
| 1000 | besu-validator-1 | ✅ Deployed |
|
||||
| 1001 | besu-validator-2 | ✅ Deployed |
|
||||
| 1002 | besu-validator-3 | ✅ Deployed |
|
||||
| 1003 | besu-validator-4 | ✅ Deployed |
|
||||
| 1004 | besu-validator-5 | ✅ Deployed |
|
||||
| 1500 | besu-sentry-1 | ✅ Deployed |
|
||||
| 1501 | besu-sentry-2 | ✅ Deployed |
|
||||
| 1502 | besu-sentry-3 | ✅ Deployed |
|
||||
| 1503 | besu-sentry-4 | ✅ Deployed |
|
||||
| 1504 | besu-sentry-5 | ❌ **MISSING** |
|
||||
| 2500 | besu-rpc-1 | ✅ Deployed |
|
||||
| 2501 | besu-rpc-2 | ✅ Deployed |
|
||||
| 2502 | besu-rpc-3 | ✅ Deployed |
|
||||
| 2503 | besu-rpc-4 | ❌ **MISSING** |
|
||||
|
||||
### Services (2/4)
|
||||
|
||||
| VMID | Hostname | Status |
|
||||
|------|----------|--------|
|
||||
| 3500 | oracle-publisher-1 | ✅ Deployed |
|
||||
| 3501 | ccip-monitor-1 | ✅ Deployed |
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Deployment Scripts Available
|
||||
|
||||
### For Besu Nodes
|
||||
|
||||
- **Main deployment:** `smom-dbis-138-proxmox/scripts/deployment/deploy-besu-nodes.sh`
|
||||
- **Configuration:** `scripts/configure-besu-chain138-nodes.sh`
|
||||
- **Quick setup:** `scripts/setup-new-chain138-containers.sh`
|
||||
|
||||
### For Hyperledger Services
|
||||
|
||||
- **Deployment:** `smom-dbis-138-proxmox/scripts/deployment/deploy-hyperledger-services.sh`
|
||||
|
||||
### For Explorer
|
||||
|
||||
- **Deployment:** Check Blockscout deployment scripts
|
||||
|
||||
---
|
||||
|
||||
## 📝 Deployment Checklist
|
||||
|
||||
### Besu Nodes (Priority 1)
|
||||
|
||||
- [ ] **1504** - Create `besu-sentry-5` container
  - [ ] Configure static-nodes.json
  - [ ] Configure permissioned-nodes.json
  - [ ] Enable discovery
  - [ ] Verify peer connections
  - [ ] Access: Ali (Full)

- [ ] **2503** - Create `besu-rpc-4` container (Ali's RPC - 0x8a)
  - [ ] Use permissioned RPC configuration
  - [ ] Configure static-nodes.json
  - [ ] Configure permissioned-nodes.json
  - [ ] **Disable discovery** (critical!)
  - [ ] Configure permissioned identity (0x8a)
  - [ ] Set up JWT authentication
  - [ ] Access: Ali (Full)

- [ ] **2504** - Create `besu-rpc-4` container (Ali's RPC - 0x1)
  - [ ] Use permissioned RPC configuration
  - [ ] Configure static-nodes.json
  - [ ] Configure permissioned-nodes.json
  - [ ] **Disable discovery** (critical!)
  - [ ] Configure permissioned identity (0x1)
  - [ ] Set up JWT authentication
  - [ ] Access: Ali (Full)

- [ ] **2505** - Create `besu-rpc-luis` container (Luis's RPC - 0x8a)
  - [ ] Use permissioned RPC configuration
  - [ ] Configure static-nodes.json
  - [ ] Configure permissioned-nodes.json
  - [ ] **Disable discovery** (critical!)
  - [ ] Configure permissioned identity (0x8a)
  - [ ] Set up JWT authentication
  - [ ] Set up RPC-only access for Luis
  - [ ] Access: Luis (RPC-only, 0x8a identity)

- [ ] **2506** - Create `besu-rpc-luis` container (Luis's RPC - 0x1)
  - [ ] Use permissioned RPC configuration
  - [ ] Configure static-nodes.json
  - [ ] Configure permissioned-nodes.json
  - [ ] **Disable discovery** (critical!)
  - [ ] Configure permissioned identity (0x1)
  - [ ] Set up JWT authentication
  - [ ] Set up RPC-only access for Luis
  - [ ] Access: Luis (RPC-only, 0x1 identity)

- [ ] **2507** - Create `besu-rpc-putu` container (Putu's RPC - 0x8a)
  - [ ] Use permissioned RPC configuration
  - [ ] Configure static-nodes.json
  - [ ] Configure permissioned-nodes.json
  - [ ] **Disable discovery** (critical!)
  - [ ] Configure permissioned identity (0x8a)
  - [ ] Set up JWT authentication
  - [ ] Set up RPC-only access for Putu
  - [ ] Access: Putu (RPC-only, 0x8a identity)

- [ ] **2508** - Create `besu-rpc-putu` container (Putu's RPC - 0x1)
  - [ ] Use permissioned RPC configuration
  - [ ] Configure static-nodes.json
  - [ ] Configure permissioned-nodes.json
  - [ ] **Disable discovery** (critical!)
  - [ ] Configure permissioned identity (0x1)
  - [ ] Set up JWT authentication
  - [ ] Set up RPC-only access for Putu
  - [ ] Access: Putu (RPC-only, 0x1 identity)
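A `static-nodes.json` file is simply a JSON array of enode URLs; `permissioned-nodes.json` uses the same enode-list format. A minimal sketch for one of the new nodes, assuming the validator/sentry IPs documented for this network and Besu's default p2p port 30303 — the `<...-pubkey>` placeholders stand in for each node's real 128-hex-character public key and are hypothetical:

```json
[
  "enode://<validator-1-pubkey>@192.168.11.100:30303",
  "enode://<validator-2-pubkey>@192.168.11.101:30303",
  "enode://<sentry-1-pubkey>@192.168.11.150:30303"
]
```

Every enode listed must match the key and address the peer actually advertises, or the connection is silently dropped.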
### Hyperledger Services

- [ ] **6200** - Create `firefly-1` container
- [ ] **6201** - Create `firefly-2` container (Ali's host)
- [ ] **5200** - Create `cacti-1` container
- [ ] **6000** - Create `fabric-1` container
- [ ] **6400** - Create `indy-1` container

### Explorer

- [ ] **5000** - Create `blockscout-1` container
  - [ ] Set up PostgreSQL database
  - [ ] Configure RPC endpoints
  - [ ] Set up indexing
---

## 🔗 Related Documentation

- [ChainID 138 Configuration Guide](CHAIN138_BESU_CONFIGURATION.md)
- [ChainID 138 Quick Start](CHAIN138_QUICK_START.md)
- [VMID Allocation](smom-dbis-138-proxmox/config/proxmox.conf)
- [Deployment Plan](dbis_core/DEPLOYMENT_PLAN.md)

---

## 📊 Summary Statistics
**Total Missing:** 13 containers

- Besu Nodes: 7 (1504, 2503, 2504, 2505, 2506, 2507, 2508)
- Hyperledger Services: 5 (6200, 6201, 5200, 6000, 6400)
- Explorer: 1 (5000)

**Total Expected:** 25 containers

- Besu Network: 19 (12 existing + 7 new: 1504, 2503-2508)
- Hyperledger Services: 5
- Explorer: 1

**Deployment Rate:** 48% (12/25)
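The totals above are self-consistent, as a quick arithmetic check shows:

```bash
# Cross-check the summary statistics from the per-category counts.
missing=$((7 + 5 + 1))             # Besu + Hyperledger + Explorer = 13
expected=$((19 + 5 + 1))           # = 25
deployed=$((expected - missing))   # = 12
rate=$((100 * deployed / expected))
echo "${rate}% (${deployed}/${expected})"
# → 48% (12/25)
```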
**Important:** All RPC containers (2503-2508) require JWT authentication via nginx reverse proxy.
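Whether the check ultimately lives at nginx or at the node, Besu can also enforce JWT on its HTTP RPC endpoint directly as a second layer. A hedged sketch of the relevant `config.toml` entries for one of the permissioned RPC containers — the option names are Besu's public-key JWT authentication and node-permissioning flags, but the key path is an assumption:

```toml
# Permissioned RPC node sketch (file paths are assumptions)
discovery-enabled=false                      # critical: no peer discovery on RPC nodes
permissions-nodes-config-file-enabled=true   # honor the node allowlist
rpc-http-enabled=true
rpc-http-authentication-enabled=true
rpc-http-authentication-jwt-public-key-file="/etc/besu/jwt-public.pem"
```

With this in place, requests without a token signed by the matching private key are rejected by Besu itself even if the proxy is bypassed.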
---

**Last Updated:** December 26, 2024
81
docs/03-deployment/PRE_START_AUDIT_PLAN.md
Normal file
@@ -0,0 +1,81 @@
# Pre-Start Audit Plan - Hostnames and IP Addresses

**Date:** 2025-01-20
**Purpose:** Comprehensive audit and fix of hostnames and IP addresses before starting VMs

---
## Tasks

### 1. Hostname Migration

- **pve** (192.168.11.11) → **r630-01**
- **pve2** (192.168.11.12) → **r630-02**

### 2. IP Address Audit

- Check all VMs/containers across all Proxmox hosts
- Verify no IP conflicts
- Verify no invalid IPs (network/broadcast addresses)
- Document all IP assignments

### 3. Consistency Check

- Verify IPs match documentation
- Check for inconsistencies between hosts
- Ensure all static IPs are properly configured

---
## Scripts Available

1. **`scripts/comprehensive-ip-audit.sh`** - Audits all IPs for conflicts
2. **`scripts/migrate-hostnames-proxmox.sh`** - Migrates hostnames properly

---
## Execution Order

1. **Run IP Audit First**

   ```bash
   ./scripts/comprehensive-ip-audit.sh
   ```

2. **Fix any IP conflicts found**

3. **Migrate Hostnames**

   ```bash
   ./scripts/migrate-hostnames-proxmox.sh
   ```

4. **Re-run IP Audit to verify**

5. **Start VMs**

---
## Current Known IPs (from VMID_IP_ADDRESS_LIST.md)

### Validators (1000-1004)

- 192.168.11.100-104

### Sentries (1500-1503)

- 192.168.11.150-153

### RPC Nodes

- 192.168.11.240-242 (ThirdWeb)
- 192.168.11.250-252 (Public RPC)
- 192.168.11.201-204 (Named RPC)

### DBIS Core

- 192.168.11.105-106 (PostgreSQL)
- 192.168.11.120 (Redis)
- 192.168.11.130 (Frontend)
- 192.168.11.155-156 (API)

### Other Services

- 192.168.11.60-63 (ML nodes)
- 192.168.11.64 (Indy)
- 192.168.11.80 (Cacti)
- 192.168.11.112 (Fabric)

---

**Status:** Ready to execute
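The conflict-detection part of the audit can be sketched in a few lines of shell: expand the `192.168.11.100-104` shorthand used above into individual addresses, then flag any address assigned more than once. The function names here are illustrative, not the actual audit script's:

```bash
# Expand "a.b.c.X-Y" shorthand into one address per line.
expand() {
  local base="${1%.*}" last="${1##*.}"
  if [[ "$last" == *-* ]]; then
    seq -f "${base}.%g" "${last%-*}" "${last#*-}"
  else
    echo "$1"
  fi
}

# Print every IP that appears in more than one range (empty output = no conflicts).
conflicts() {
  for spec in "$@"; do expand "$spec"; done | sort | uniq -d
}

# Example: two ranges that accidentally overlap on .104
conflicts "192.168.11.100-104" "192.168.11.104-106"
# → 192.168.11.104
```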
120
docs/03-deployment/PRE_START_CHECKLIST.md
Normal file
@@ -0,0 +1,120 @@
# Pre-Start Checklist - Hostnames and IP Addresses

**Date:** 2025-01-20
**Purpose:** Complete audit and fixes before starting VMs on pve and pve2

---
## ✅ IP Address Audit - COMPLETE

**Status:** All IPs audited, no conflicts found

**Results:**

- All 34 VMs/containers are currently on **ml110** (192.168.11.10)
- **pve** (192.168.11.11) and **pve2** (192.168.11.12) have no VMs/containers yet
- **No IP conflicts detected** across all hosts
- **No invalid IPs** (network/broadcast addresses)

**Allocated IPs (34 total):**

- 192.168.11.57, .60-.64, .80, .100-.106, .112, .120, .130, .150-.156, .201-.204, .240-.242, .250-.254

---
## ⏳ Hostname Migration - PENDING

### Current State

- **pve** (192.168.11.11) - hostname: `pve`, should be: `r630-01`
- **pve2** (192.168.11.12) - hostname: `pve2`, should be: `r630-02`

### Migration Steps

**Script Available:** `scripts/migrate-hostnames-proxmox.sh`

**What it does:**

1. Updates `/etc/hostname` on both hosts
2. Updates `/etc/hosts` to ensure proper resolution
3. Restarts Proxmox services
4. Verifies hostname changes

**To execute:**

```bash
cd /home/intlc/projects/proxmox
./scripts/migrate-hostnames-proxmox.sh
```
**Manual steps (if script fails):**

```bash
# On pve (192.168.11.11)
ssh root@192.168.11.11
hostnamectl set-hostname r630-01
echo "r630-01" > /etc/hostname
# Update /etc/hosts to include: 192.168.11.11 r630-01 r630-01.sankofa.nexus pve pve.sankofa.nexus
systemctl restart pve-cluster pvestatd pvedaemon pveproxy

# On pve2 (192.168.11.12)
ssh root@192.168.11.12
hostnamectl set-hostname r630-02
echo "r630-02" > /etc/hostname
# Update /etc/hosts to include: 192.168.11.12 r630-02 r630-02.sankofa.nexus pve2 pve2.sankofa.nexus
systemctl restart pve-cluster pvestatd pvedaemon pveproxy
```

---
## Verification Steps

### 1. Verify Hostnames

```bash
ssh root@192.168.11.11 "hostname"  # Should return: r630-01
ssh root@192.168.11.12 "hostname"  # Should return: r630-02
```

### 2. Verify IP Resolution

```bash
ssh root@192.168.11.11 "getent hosts r630-01"  # Should return: 192.168.11.11
ssh root@192.168.11.12 "getent hosts r630-02"  # Should return: 192.168.11.12
```

### 3. Verify Proxmox Services

```bash
ssh root@192.168.11.11 "systemctl status pve-cluster pveproxy | grep Active"
ssh root@192.168.11.12 "systemctl status pve-cluster pveproxy | grep Active"
```

### 4. Re-run IP Audit

```bash
./scripts/check-all-vm-ips.sh
```

---
## Summary

### ✅ Completed

- [x] IP address audit across all hosts
- [x] Conflict detection (none found)
- [x] Invalid IP detection (none found)
- [x] Documentation of all IP assignments

### ⏳ Pending

- [ ] Hostname migration (pve → r630-01)
- [ ] Hostname migration (pve2 → r630-02)
- [ ] Verification of hostname changes
- [ ] Final IP audit after hostname changes

### 📋 Ready to Execute

1. Run hostname migration script
2. Verify changes
3. Start VMs on pve/pve2

---
## Scripts Available

1. **`scripts/check-all-vm-ips.sh`** - ✅ Working - Audits all IPs
2. **`scripts/migrate-hostnames-proxmox.sh`** - Ready - Migrates hostnames
3. **`scripts/diagnose-proxmox-hosts.sh`** - ✅ Working - Diagnostics

---

**Status:** IP audit complete, ready for hostname migration