# Backup and Recreation Plan

**Date:** January 7, 2026
**Status:** 📋 **PLAN READY FOR IMPLEMENTATION**

---

## Executive Summary

This document outlines a comprehensive plan for:

1. **Setting up automated backups** to prevent future data loss
2. **Recreating lost containers** from their configurations
3. **Restoring data** from backups if available
4. **Best practices** for ongoing backup management

---

## Current Situation

### Containers Status

**Containers with Data ✅ (7 containers):**
- 100, 101, 102, 103, 104, 105, 130 (migrated from r630-02)

**Containers Without Data ❌ (~28 containers):**
- 106, 107, 108 (empty volumes)
- 3000-10151 (empty volumes)

### Data Loss Summary

- **Lost During:** RAID 10 expansion (4→6 disks)
- **Cause:** RAID recreation wiped all data structures
- **Recovery:** Not possible from thin1 (data overwritten)
- **Solution:** Restore from backups or recreate from templates

---

## Part 1: Automated Backup Setup

### Objective

Set up automated daily backups for all containers/VMs to prevent future data loss.
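
A minimal sketch of what such a daily job could look like, assuming it wraps `vzdump` in snapshot mode with gzip compression and keeps the last 7 backups. The names `BACKUP_STORAGE`, `KEEP_LAST`, `DRY_RUN`, and `run_backup` are hypothetical and are not taken from `scripts/setup-automated-backups.sh`; the cron file path in the comment is likewise an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical nightly backup wrapper (a sketch, not the project's actual script).
set -euo pipefail

BACKUP_STORAGE="${BACKUP_STORAGE:-local}"   # assumed storage ID for backups
KEEP_LAST="${KEEP_LAST:-7}"                 # number of backups to retain per guest

# Print the vzdump command when DRY_RUN=1 (the default), run it otherwise.
run_backup() {
    # Built at call time so overrides of the variables above take effect.
    local cmd=(vzdump --all --mode snapshot --compress gzip
               --storage "$BACKUP_STORAGE"
               --prune-backups "keep-last=$KEEP_LAST")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}

# Run only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]:-}" == "$0" ]]; then
    run_backup
fi

# Example cron entry for 2:00 AM daily (file name and script path are assumptions):
#   /etc/cron.d/nightly-backup:
#   0 2 * * * root DRY_RUN=0 /usr/local/bin/nightly-backup.sh
```

Defaulting to a dry run means the command can be reviewed safely before the job is scheduled; setting `DRY_RUN=0` performs the real backup.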

### Implementation

#### Step 1: Create Backup Script

**Script:** `scripts/setup-automated-backups.sh`

**Features:**
- Daily backups at 2 AM
- Snapshot mode (no downtime)
- Gzip compression
- Automatic cleanup (keep 7 days)
- Logging to `/var/log/proxmox-backups/`

**Usage:**
```bash
./scripts/setup-automated-backups.sh
```

#### Step 2: Manual Backup Script

**Script:** `/usr/local/bin/manual-backup.sh` (created on r630-01)

**Usage:**
```bash
# Backup specific containers
ssh root@192.168.11.11 "/usr/local/bin/manual-backup.sh 106 107 108"

# Backup all running containers
ssh root@192.168.11.11 "pct list | awk 'NR>1 && \$2==\"running\" {print \$1}' | xargs /usr/local/bin/manual-backup.sh"
```

#### Step 3: Backup Storage

**Current Storage:**
- `local` storage: `/var/lib/vz/dump/` (directory storage)
- Capacity: ~536GB available

**Backup Location:**
- `/var/lib/vz/dump/vzdump-lxc-<vmid>-<timestamp>.tar.gz`
- `/var/lib/vz/dump/vzdump-qemu-<vmid>-<timestamp>.vma.gz`

#### Step 4: Backup Schedule

**Automated:**
- **Frequency:** Daily at 2:00 AM
- **Mode:** Snapshot (no downtime)
- **Compression:** Gzip
- **Retention:** 7 days

**Manual:**
- Run anytime using `/usr/local/bin/manual-backup.sh`

---

## Part 2: Container Recreation Plan

### Objective

Recreate containers that lost data, restoring them to a working state.

### Approach

#### Option A: Restore from Backups (If Available)

**Steps:**

1. Check for backups:
   ```bash
   find /var/lib/vz/dump -name "vzdump-lxc-106-*" -o -name "vzdump-lxc-107-*" -o -name "vzdump-lxc-108-*"
   ```

2. Restore container:
   ```bash
   # If the empty container still exists, destroy it first or add --force
   pct restore <vmid> <backup-file> --storage thin1
   ```

3. Start container:
   ```bash
   pct start <vmid>
   ```

#### Option B: Recreate from Templates (If No Backups)

**Steps:**

1. Use recreation script:
   ```bash
   ./scripts/recreate-containers-from-configs.sh 106 107 108
   ```

2. Script will:
   - Read container configuration
   - Destroy empty container
   - Recreate from template
   - Restore configuration
   - Create volume on correct storage

3. Manual recreation:
   ```bash
   # Download template if needed
   pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst

   # Recreate container (rootfs size is given in GiB, without a unit suffix)
   pct create <vmid> /var/lib/vz/template/cache/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
     --storage thin1 --rootfs thin1:10 \
     --hostname <hostname> \
     --memory <memory> --swap <swap> --cores <cores> \
     --net0 name=eth0,bridge=vmbr0,ip=<ip>/24
   ```

### Container Recreation Priority

**High Priority (Critical Services):**
1. **106** - redis-rpc-translator
2. **107** - web3signer-rpc-translator
3. **108** - vault-rpc-translator
4. **3000-3003** - ml110 containers
5. **3500** - oracle-publisher-1
6. **3501** - ccip-monitor-1

**Medium Priority:**
7. **5200** - cacti-1
8. **6000** - fabric-1
9. **6400** - indy-1
10. **10100-10151** - dbis containers

**Lower Priority:**
11. **10000-10092** - order containers
12. **10200-10230** - monitoring containers

---

## Part 3: Data Restoration Procedures

### Step 1: Check for Existing Backups

```bash
# Check all nodes for backups
for node in ml110 r630-01 r630-02; do
  echo "=== $node ==="
  ssh root@$node "find /var/lib/vz/dump -name 'vzdump-lxc-106-*' -o -name 'vzdump-lxc-107-*' -o -name 'vzdump-lxc-108-*'"
done

# List backup-capable storages, including Proxmox Backup Server (if configured)
pvesm status --content backup
```

### Step 2: Restore from Backup

```bash
# Copy backup to r630-01
scp root@source:/var/lib/vz/dump/vzdump-lxc-106-*.tar.gz root@192.168.11.11:/var/lib/vz/dump/

# Restore container (the glob must match exactly one archive)
ssh root@192.168.11.11 "pct restore 106 /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz --storage thin1"

# Start container
ssh root@192.168.11.11 "pct start 106"
```

### Step 3: Verify Restoration

```bash
# Check container status
pct list | grep 106

# Check container logs (pct has no "logs" subcommand; read the journal inside the container)
pct exec 106 -- journalctl -n 50 --no-pager

# Test services
pct exec 106 -- systemctl status <service>
```

---

## Part 4: Ongoing Backup Management

### Daily Operations

**Automated Backups:**
- Run automatically at 2 AM daily
- Logs available in `/var/log/proxmox-backups/`
- Check logs weekly for errors

**Manual Backups:**
- Before major changes
- Before migrations
- Before updates

### Backup Verification

**Weekly Checks:**
```bash
# Check backup directory
ls -lh /var/lib/vz/dump/

# Check backup logs
tail -f /var/log/proxmox-backups/backup_$(date +%Y%m%d).log

# Spot-check that the archive is readable (lists only the first entries)
tar -tzf /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz | head -10
```

### Backup Retention

**Current Policy:**
- Keep last 7 days of backups
- Cleanup runs automatically after each backup

**Recommended Policy:**
- Daily backups: Keep 7 days
- Weekly backups: Keep 4 weeks
- Monthly backups: Keep 12 months

### Backup Storage Management

**Monitor Storage:**
```bash
# Check backup storage usage
df -h /var/lib/vz/dump/
pvesm status | grep local

# Cleanup old container backups manually if needed
find /var/lib/vz/dump -name "*.tar.gz" -mtime +7 -delete
```

---

## Part 5: Implementation Checklist

### Phase 1: Backup Setup ✅

- [ ] Run `scripts/setup-automated-backups.sh`
- [ ] Verify cron job is set up
- [ ] Test manual backup script
- [ ] Verify backup storage has space
- [ ] Run test backup

### Phase 2: Check for Existing Backups

- [ ] Check `/var/lib/vz/dump/` on all nodes
- [ ] Check external backup locations
- [ ] Check Proxmox Backup Server (if configured)
- [ ] Document found backups

### Phase 3: Restore from Backups (If Available)

- [ ] Copy backups to r630-01
- [ ] Restore containers using `pct restore`
- [ ] Start containers
- [ ] Verify containers are working
- [ ] Test services

### Phase 4: Recreate Containers (If No Backups)

- [ ] Prioritize containers by importance
- [ ] Download required templates
- [ ] Run recreation script for each container
- [ ] Restore configurations manually
- [ ] Install applications
- [ ] Restore application data (if available)

### Phase 5: Verify and Document

- [ ] Verify all containers are running
- [ ] Test all services
- [ ] Document restoration process
- [ ] Update backup procedures
- [ ] Schedule regular backup verification

---

## Part 6: Best Practices

### Backup Best Practices

1. **Automated Backups:**
   - Set up daily automated backups
   - Use snapshot mode for running containers
   - Compress backups to save space
   - Keep multiple backup copies

2. **Backup Storage:**
   - Use separate storage for backups
   - Monitor backup storage usage
   - Consider off-site backups
   - Use Proxmox Backup Server for better management

3. **Backup Testing:**
   - Test backup restoration regularly
   - Verify backup integrity
   - Document restoration procedures
   - Keep backup logs

### Container Recreation Best Practices

1. **Before Recreation:**
   - Check for backups first
   - Document container configurations
   - Note any custom settings
   - Plan recreation order

2. **During Recreation:**
   - Recreate from templates
   - Restore configurations
   - Install applications
   - Restore data if available

3. **After Recreation:**
   - Verify containers work
   - Test all services
   - Update documentation
   - Set up backups immediately

---

## Part 7: Recovery Procedures

### Container Recovery Workflow

```
1. Check for Backups
   ├─ Found? → Restore from Backup
   └─ Not Found? → Recreate from Template

2. Restore from Backup
   ├─ Copy backup to r630-01
   ├─ Restore using pct restore
   ├─ Start container
   └─ Verify services

3. Recreate from Template
   ├─ Read container config
   ├─ Download template
   ├─ Create container
   ├─ Restore configuration
   ├─ Install applications
   └─ Restore data (if available)
```

### Emergency Recovery

**If backups fail:**
1. Stop affected containers
2. Check backup logs
3. Manually create backup
4. Restore from manual backup
5. Verify restoration

**If recreation fails:**
1. Check container logs
2. Verify template exists
3. Check storage availability
4. Retry recreation
5. Contact support if needed

---

## Part 8: Monitoring and Maintenance

### Backup Monitoring

**Daily:**
- Check backup logs for errors
- Verify backups completed successfully

**Weekly:**
- Review backup storage usage
- Test backup restoration
- Clean up old backups

**Monthly:**
- Review backup policies
- Update backup procedures
- Document any issues

### Container Monitoring

**After Recreation:**
- Monitor container status
- Check service logs
- Verify applications work
- Test functionality

**Ongoing:**
- Regular health checks
- Monitor resource usage
- Update applications
- Maintain backups

---

## Commands Reference

### Backup Commands

```bash
# Setup automated backups
./scripts/setup-automated-backups.sh

# Manual backup
ssh root@192.168.11.11 "/usr/local/bin/manual-backup.sh 106 107 108"

# Check backup status
ssh root@192.168.11.11 "ls -lh /var/lib/vz/dump/"

# View backup logs (\$ is escaped so the date is evaluated on the remote host)
ssh root@192.168.11.11 "tail -f /var/log/proxmox-backups/backup_\$(date +%Y%m%d).log"
```

### Restoration Commands

```bash
# Restore from backup (the glob must match exactly one archive)
ssh root@192.168.11.11 "pct restore 106 /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz --storage thin1"

# Recreate from template
./scripts/recreate-containers-from-configs.sh 106 107 108

# Check container status
ssh root@192.168.11.11 "pct list | grep 106"
```

### Verification Commands

```bash
# Check container volumes
ssh root@192.168.11.11 "lvs pve | grep vm-106-disk"

# Check container config
ssh root@192.168.11.11 "pct config 106"

# Check container logs (read the journal inside the container)
ssh root@192.168.11.11 "pct exec 106 -- journalctl -n 50 --no-pager"
```

---

## Next Steps

1. **Immediate:**
   - [ ] Run backup setup script
   - [ ] Check for existing backups
   - [ ] Document found backups

2. **Short-term:**
   - [ ] Restore containers from backups (if available)
   - [ ] Recreate high-priority containers
   - [ ] Verify all services work

3. **Long-term:**
   - [ ] Set up Proxmox Backup Server
   - [ ] Implement off-site backups
   - [ ] Regular backup testing
   - [ ] Documentation updates

---

**Status:** 📋 **PLAN READY**
**Next Action:** Run backup setup script and check for existing backups
**Last Updated:** January 7, 2026
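
The "check for existing backups" action, together with the restore-or-recreate decision in the recovery workflow, could be sketched roughly as follows. This is an illustration under stated assumptions, not part of the project's scripts: the function names `latest_backup` and `plan_recovery` and the `DUMP_DIR` variable are hypothetical, and the helper only prints the action it would take rather than calling `pct`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: choose between restore and recreation for one container.
set -euo pipefail

DUMP_DIR="${DUMP_DIR:-/var/lib/vz/dump}"   # dump directory used in this plan

# Print the newest vzdump LXC archive for a container ID, or nothing if none exists.
latest_backup() {
    local vmid="$1"
    [ -d "$DUMP_DIR" ] || return 0
    # vzdump file names embed a sortable timestamp, so a lexical sort
    # places the newest archive last.
    find "$DUMP_DIR" -maxdepth 1 -type f -name "vzdump-lxc-${vmid}-*.tar.gz" \
        | sort | tail -n 1
}

# Print the recovery action the workflow would take for a container ID.
plan_recovery() {
    local vmid="$1" backup
    backup="$(latest_backup "$vmid")"
    if [ -n "$backup" ]; then
        echo "restore $vmid from $backup"
    else
        echo "recreate $vmid from template"
    fi
}
```

Calling `plan_recovery 106` on a node prints either a restore line naming the newest matching archive or a recreate line. Matching on `vzdump-lxc-<vmid>-` rather than on the bare ID avoids picking up archives of other containers whose IDs merely contain the same digits.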