# Backup and Recreation Plan

**Date:** January 7, 2026
**Status:** 📋 **PLAN READY FOR IMPLEMENTATION**

---

## Executive Summary

This document outlines a comprehensive plan for:
1. **Setting up automated backups** to prevent future data loss
2. **Recreating lost containers** from their configurations
3. **Restoring data** from backups if available
4. **Best practices** for ongoing backup management

---

## Current Situation

### Containers Status

**Containers with Data ✅ (7 containers):**
- 100, 101, 102, 103, 104, 105, 130 (migrated from r630-02)

**Containers Without Data ❌ (~28 containers):**
- 106, 107, 108 (empty volumes)
- 3000-10151 (empty volumes)

### Data Loss Summary

- **Lost During:** RAID 10 expansion (4→6 disks)
- **Cause:** RAID recreation wiped all data structures
- **Recovery:** Not possible from thin1 (data overwritten)
- **Solution:** Restore from backups or recreate from templates

---

## Part 1: Automated Backup Setup

### Objective

Set up automated daily backups for all containers/VMs to prevent future data loss.

### Implementation

#### Step 1: Create Backup Script

**Script:** `scripts/setup-automated-backups.sh`

**Features:**
- Daily backups at 2 AM
- Snapshot mode (no downtime)
- Gzip compression
- Automatic cleanup (keep 7 days)
- Logging to `/var/log/proxmox-backups/`

**Usage:**
```bash
./scripts/setup-automated-backups.sh
```
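
The contents of `scripts/setup-automated-backups.sh` are not reproduced in this plan. As a rough sketch of what the features listed above could translate to, the snippet below assembles a `vzdump` command line and a matching cron entry. The helper names (`nightly_backup_cmd`, `nightly_cron_line`), the `local` storage target, and the use of `--prune-backups keep-daily=7` for the 7-day retention are assumptions, not the script's actual implementation.

```bash
#!/bin/sh
# Hypothetical sketch; the real setup script may be structured differently.
LOG_DIR="/var/log/proxmox-backups"

# Print the vzdump command for one nightly run.
# $1 = target storage (assumed: local), $2 = daily backups to keep (assumed: 7)
nightly_backup_cmd() {
    printf 'vzdump --all 1 --mode snapshot --compress gzip --storage %s --prune-backups keep-daily=%s\n' \
        "${1:-local}" "${2:-7}"
}

# Print a cron.d line that runs the command at 02:00 and appends to a dated log.
# Percent signs are escaped because cron treats a bare % as a newline.
nightly_cron_line() {
    printf '0 2 * * * root %s >> %s/backup_$(date +\\%%Y\\%%m\\%%d).log 2>&1\n' \
        "$(nightly_backup_cmd "$@")" "$LOG_DIR"
}
```

Under these assumptions, `nightly_cron_line > /etc/cron.d/proxmox-backups` would install the schedule once the log directory exists. Printing the line first makes it easy to review before anything is written.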

#### Step 2: Manual Backup Script

**Script:** `/usr/local/bin/manual-backup.sh` (created on r630-01)

**Usage:**
```bash
# Back up specific containers
ssh root@192.168.11.11 "/usr/local/bin/manual-backup.sh 106 107 108"

# Back up all running containers (-r skips the run if none are running)
ssh root@192.168.11.11 "pct list | awk 'NR>1 && \$2==\"running\" {print \$1}' | xargs -r /usr/local/bin/manual-backup.sh"
```

#### Step 3: Backup Storage

**Current Storage:**
- `local` storage: `/var/lib/vz/dump/` (directory storage)
- Capacity: ~536GB available

**Backup Location:**
- `/var/lib/vz/dump/vzdump-lxc-<vmid>-<timestamp>.tar.gz`
- `/var/lib/vz/dump/vzdump-qemu-<vmid>-<timestamp>.vma.gz`
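
Because the guest type, VMID, and timestamp are all encoded in the archive name, a script can recover them without opening the file. The following is a minimal sketch, assuming the usual `YYYY_MM_DD-HH_MM_SS` timestamp layout; `parse_vzdump_name` is a hypothetical helper, not part of Proxmox.

```bash
#!/bin/sh
# Hypothetical helper: split a vzdump archive name into "type vmid timestamp".
# Returns non-zero for names that do not follow the two patterns above.
parse_vzdump_name() {
    vz_name=$(basename "$1")
    case "$vz_name" in
        vzdump-lxc-*.tar.gz|vzdump-qemu-*.vma.gz) ;;
        *) return 1 ;;
    esac
    vz_rest=${vz_name#vzdump-}     # lxc-106-2026_01_07-02_00_01.tar.gz
    vz_type=${vz_rest%%-*}         # lxc
    vz_rest=${vz_rest#*-}          # 106-2026_01_07-02_00_01.tar.gz
    vz_vmid=${vz_rest%%-*}         # 106
    vz_stamp=${vz_rest#*-}         # 2026_01_07-02_00_01.tar.gz
    vz_stamp=${vz_stamp%.*.gz}     # drop the .tar.gz / .vma.gz suffix
    printf '%s %s %s\n' "$vz_type" "$vz_vmid" "$vz_stamp"
}
```

For example, `parse_vzdump_name /var/lib/vz/dump/vzdump-lxc-106-2026_01_07-02_00_01.tar.gz` prints `lxc 106 2026_01_07-02_00_01`.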

#### Step 4: Backup Schedule

**Automated:**
- **Frequency:** Daily at 2:00 AM
- **Mode:** Snapshot (no downtime)
- **Compression:** Gzip
- **Retention:** 7 days

**Manual:**
- Run anytime using `/usr/local/bin/manual-backup.sh`

---

## Part 2: Container Recreation Plan

### Objective

Recreate containers that lost data, restoring them to a working state.

### Approach

#### Option A: Restore from Backups (If Available)

**Steps:**
1. Check for backups:
   ```bash
   find /var/lib/vz/dump -name 'vzdump-*-106-*' -o -name 'vzdump-*-107-*' -o -name 'vzdump-*-108-*'
   ```

2. Restore container:
   ```bash
   pct restore <vmid> <backup_file> --storage thin1
   ```

3. Start container:
   ```bash
   pct start <vmid>
   ```

#### Option B: Recreate from Templates (If No Backups)

**Steps:**
1. Use recreation script:
   ```bash
   ./scripts/recreate-containers-from-configs.sh 106 107 108
   ```

2. Script will:
   - Read container configuration
   - Destroy empty container
   - Recreate from template
   - Restore configuration
   - Create volume on correct storage

3. Manual recreation:
   ```bash
   # Download template if needed
   pveam download local ubuntu-22.04-standard_22.04-1_amd64.tar.zst

   # Recreate container (rootfs size is given in GiB, without a unit suffix)
   pct create <vmid> /var/lib/vz/template/cache/ubuntu-22.04-standard_22.04-1_amd64.tar.zst \
     --storage thin1 --rootfs thin1:10 \
     --hostname <hostname> \
     --memory <memory> --swap <swap> --cores <cores> \
     --net0 name=eth0,bridge=vmbr0,ip=<ip>/24
   ```

### Container Recreation Priority

**High Priority (Critical Services):**
1. **106** - redis-rpc-translator
2. **107** - web3signer-rpc-translator
3. **108** - vault-rpc-translator
4. **3000-3003** - ml110 containers
5. **3500** - oracle-publisher-1
6. **3501** - ccip-monitor-1

**Medium Priority:**
7. **5200** - cacti-1
8. **6000** - fabric-1
9. **6400** - indy-1
10. **10100-10151** - dbis containers

**Lower Priority:**
11. **10000-10092** - order containers
12. **10200-10230** - monitoring containers
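
Several entries in the priority list are ID ranges rather than single containers. The sketch below expands a mixed list of IDs and `start-end` ranges into one ID per line so that a tier can be processed in order. `expand_ids` is a hypothetical helper, and it does not know which IDs inside a range actually exist, so each one should be checked (for example with `pct config <vmid>`) before acting on it.

```bash
#!/bin/sh
# Hypothetical helper: expand "106 3000-3003" into 106 3000 3001 3002 3003,
# one ID per line, preserving the order in which the arguments were given.
expand_ids() {
    for id_spec in "$@"; do
        case "$id_spec" in
            *-*) seq "${id_spec%-*}" "${id_spec#*-}" ;;   # range: start-end
            *)   printf '%s\n' "$id_spec" ;;              # single ID
        esac
    done
}
```

The high-priority tier could then be passed to the recreation script as `expand_ids 106 107 108 3000-3003 3500 3501 | xargs ./scripts/recreate-containers-from-configs.sh`, assuming the script accepts any number of IDs as it does in the usage shown earlier.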

---

## Part 3: Data Restoration Procedures

### Step 1: Check for Existing Backups

```bash
# Check all nodes for backups
for node in ml110 r630-01 r630-02; do
    echo "=== $node ==="
    ssh "root@$node" "find /var/lib/vz/dump -name 'vzdump-*-106-*' -o -name 'vzdump-*-107-*' -o -name 'vzdump-*-108-*'"
done

# Check backup-capable storages; a Proxmox Backup Server (if configured) shows type "pbs"
pvesm status --content backup
```

### Step 2: Restore from Backup

```bash
# Copy backup to r630-01
scp root@source:/var/lib/vz/dump/vzdump-lxc-106-*.tar.gz root@192.168.11.11:/var/lib/vz/dump/

# Restore container
ssh root@192.168.11.11 "pct restore 106 /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz --storage thin1"

# Start container
ssh root@192.168.11.11 "pct start 106"
```
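
The `vzdump-lxc-106-*.tar.gz` glob in the restore command can match more than one archive once several backups exist, while `pct restore` expects a single archive. One way to handle this, sketched below, is to pick the newest archive by name first. `latest_backup` is a hypothetical helper; it relies on the timestamp in the file name sorting lexicographically, which holds for the `YYYY_MM_DD-HH_MM_SS` layout.

```bash
#!/bin/sh
# Hypothetical helper: print the newest LXC archive for a VMID, or fail if none.
# $1 = vmid, $2 = dump directory (defaults to /var/lib/vz/dump)
latest_backup() {
    lb_latest=""
    # Glob results are sorted by name, so the last existing match is the newest.
    for lb_file in "${2:-/var/lib/vz/dump}"/vzdump-lxc-"$1"-*.tar.gz; do
        if [ -e "$lb_file" ]; then
            lb_latest=$lb_file
        fi
    done
    if [ -z "$lb_latest" ]; then
        return 1
    fi
    printf '%s\n' "$lb_latest"
}
```

On the target node this could be used as `archive=$(latest_backup 106) && pct restore 106 "$archive" --storage thin1`, so the restore only runs when an archive was found.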

### Step 3: Verify Restoration

```bash
# Check container status
pct status 106

# Check recent container logs (systemd journal inside the container)
pct exec 106 -- journalctl -n 50 --no-pager

# Test services
pct exec 106 -- systemctl status <service>
```

---

## Part 4: Ongoing Backup Management

### Daily Operations

**Automated Backups:**
- Run automatically at 2 AM daily
- Logs available in `/var/log/proxmox-backups/`
- Check logs weekly for errors

**Manual Backups:**
- Before major changes
- Before migrations
- Before updates

### Backup Verification

**Weekly Checks:**
```bash
# Check backup directory
ls -lh /var/lib/vz/dump/

# Check backup logs
tail -f /var/log/proxmox-backups/backup_$(date +%Y%m%d).log

# Verify backup integrity
tar -tzf /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz | head -10
```
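
Listing the first few entries of an archive only shows that its beginning is readable. A somewhat stronger check, sketched here under the assumption that backups are gzip-compressed as described in Part 1, tests the whole gzip stream and walks the full tar index. `verify_backup` is a hypothetical helper name.

```bash
#!/bin/sh
# Hypothetical helper: basic integrity check for one gzip-compressed archive.
# Prints "ok: <file>" on success; prints a reason to stderr and fails otherwise.
verify_backup() {
    if [ ! -s "$1" ]; then
        echo "missing or empty: $1" >&2
        return 1
    fi
    # gzip -t decompresses the entire stream and checks its CRC.
    if ! gzip -t "$1" 2>/dev/null; then
        echo "corrupt gzip stream: $1" >&2
        return 1
    fi
    # For container archives, also make sure the tar index is readable end to end.
    case "$1" in
        *.tar.gz)
            if ! tar -tzf "$1" >/dev/null 2>&1; then
                echo "unreadable tar archive: $1" >&2
                return 1
            fi
            ;;
    esac
    echo "ok: $1"
}
```

A weekly check could loop over the dump directory, for example `for f in /var/lib/vz/dump/vzdump-*.gz; do verify_backup "$f"; done`. This still does not replace an occasional test restore, which is the only way to confirm a backup is actually usable.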

### Backup Retention

**Current Policy:**
- Keep last 7 days of backups
- Cleanup runs automatically after each backup

**Recommended Policy:**
- Daily backups: Keep 7 days
- Weekly backups: Keep 4 weeks
- Monthly backups: Keep 12 months
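
The recommended tiers correspond to the `keep-daily`, `keep-weekly`, and `keep-monthly` keys of vzdump's `--prune-backups` option. The sketch below only builds that option string; `prune_spec` is a hypothetical helper, and whether the existing backup script passes retention this way is an assumption.

```bash
#!/bin/sh
# Hypothetical helper: build a --prune-backups value from the three tiers.
# $1 = daily, $2 = weekly, $3 = monthly (defaults follow the recommended policy)
prune_spec() {
    printf 'keep-daily=%s,keep-weekly=%s,keep-monthly=%s\n' \
        "${1:-7}" "${2:-4}" "${3:-12}"
}
```

For instance, `vzdump 106 --mode snapshot --compress gzip --storage local --prune-backups "$(prune_spec)"` would back up container 106 and then prune its older archives according to the recommended policy.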

### Backup Storage Management

**Monitor Storage:**
```bash
# Check backup storage usage
df -h /var/lib/vz/dump/
pvesm status | grep local

# Cleanup old backups manually if needed
find /var/lib/vz/dump -name "*.tar.gz" -mtime +7 -delete
```
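
The usage check above can be turned into a simple threshold test suitable for a cron job or a pre-backup guard. This is a sketch only: the helper names and the 80% default threshold are assumptions, not values taken from the existing scripts.

```bash
#!/bin/sh
# Hypothetical helper: print the used percentage (digits only) of the
# filesystem that holds the given path, using POSIX-format df output.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed while usage is below the threshold.
# $1 = path (defaults to the dump directory), $2 = threshold in percent (assumed: 80)
storage_ok() {
    so_used=$(usage_pct "${1:-/var/lib/vz/dump}")
    [ "$so_used" -lt "${2:-80}" ]
}
```

A guard such as `storage_ok || echo "backup storage above 80%" >&2` could run before the nightly job so that a nearly full disk is noticed before a backup fails partway through.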

---

## Part 5: Implementation Checklist

### Phase 1: Backup Setup ✅

- [ ] Run `scripts/setup-automated-backups.sh`
- [ ] Verify cron job is set up
- [ ] Test manual backup script
- [ ] Verify backup storage has space
- [ ] Run test backup

### Phase 2: Check for Existing Backups

- [ ] Check `/var/lib/vz/dump/` on all nodes
- [ ] Check external backup locations
- [ ] Check Proxmox Backup Server (if configured)
- [ ] Document found backups

### Phase 3: Restore from Backups (If Available)

- [ ] Copy backups to r630-01
- [ ] Restore containers using `pct restore`
- [ ] Start containers
- [ ] Verify containers are working
- [ ] Test services

### Phase 4: Recreate Containers (If No Backups)

- [ ] Prioritize containers by importance
- [ ] Download required templates
- [ ] Run recreation script for each container
- [ ] Restore configurations manually
- [ ] Install applications
- [ ] Restore application data (if available)

### Phase 5: Verify and Document

- [ ] Verify all containers are running
- [ ] Test all services
- [ ] Document restoration process
- [ ] Update backup procedures
- [ ] Schedule regular backup verification

---

## Part 6: Best Practices

### Backup Best Practices

1. **Automated Backups:**
   - Set up daily automated backups
   - Use snapshot mode for running containers
   - Compress backups to save space
   - Keep multiple backup copies

2. **Backup Storage:**
   - Use separate storage for backups
   - Monitor backup storage usage
   - Consider off-site backups
   - Use Proxmox Backup Server for better management

3. **Backup Testing:**
   - Test backup restoration regularly
   - Verify backup integrity
   - Document restoration procedures
   - Keep backup logs

### Container Recreation Best Practices

1. **Before Recreation:**
   - Check for backups first
   - Document container configurations
   - Note any custom settings
   - Plan recreation order

2. **During Recreation:**
   - Recreate from templates
   - Restore configurations
   - Install applications
   - Restore data if available

3. **After Recreation:**
   - Verify containers work
   - Test all services
   - Update documentation
   - Set up backups immediately

---

## Part 7: Recovery Procedures

### Container Recovery Workflow

```
1. Check for Backups
   ├─ Found? → Restore from Backup
   └─ Not Found? → Recreate from Template

2. Restore from Backup
   ├─ Copy backup to r630-01
   ├─ Restore using pct restore
   ├─ Start container
   └─ Verify services

3. Recreate from Template
   ├─ Read container config
   ├─ Download template
   ├─ Create container
   ├─ Restore configuration
   ├─ Install applications
   └─ Restore data (if available)
```
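
The first branch of the workflow can be previewed as a dry run before anything is restored or destroyed. The sketch below prints one line per container stating which path it would take. `recovery_plan` is a hypothetical helper that only inspects file names in a dump directory; it runs no `pct` commands and assumes LXC archives named as in Part 1.

```bash
#!/bin/sh
# Hypothetical helper: for each VMID print "<vmid> restore <archive>" when an
# LXC archive exists in the dump directory, otherwise "<vmid> recreate".
# $1 = dump directory, remaining arguments = VMIDs
recovery_plan() {
    rp_dir=$1
    shift
    for rp_vmid in "$@"; do
        rp_found=""
        # Names sort by timestamp, so the last existing match is the newest.
        for rp_file in "$rp_dir"/vzdump-lxc-"$rp_vmid"-*.tar.gz; do
            if [ -e "$rp_file" ]; then
                rp_found=$rp_file
            fi
        done
        if [ -n "$rp_found" ]; then
            printf '%s restore %s\n' "$rp_vmid" "$rp_found"
        else
            printf '%s recreate\n' "$rp_vmid"
        fi
    done
}
```

Running `recovery_plan /var/lib/vz/dump 106 107 108` on each node gives a list that can be recorded under Phase 2 of the checklist before the restore or recreation steps begin.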

### Emergency Recovery

**If backups fail:**
1. Stop affected containers
2. Check backup logs
3. Manually create backup
4. Restore from manual backup
5. Verify restoration

**If recreation fails:**
1. Check container logs
2. Verify template exists
3. Check storage availability
4. Retry recreation
5. Contact support if needed

---

## Part 8: Monitoring and Maintenance

### Backup Monitoring

**Daily:**
- Check backup logs for errors
- Verify backups completed successfully

**Weekly:**
- Review backup storage usage
- Test backup restoration
- Clean up old backups

**Monthly:**
- Review backup policies
- Update backup procedures
- Document any issues

### Container Monitoring

**After Recreation:**
- Monitor container status
- Check service logs
- Verify applications work
- Test functionality

**Ongoing:**
- Regular health checks
- Monitor resource usage
- Update applications
- Maintain backups

---

## Commands Reference

### Backup Commands

```bash
# Setup automated backups
./scripts/setup-automated-backups.sh

# Manual backup
ssh root@192.168.11.11 "/usr/local/bin/manual-backup.sh 106 107 108"

# Check backup status
ssh root@192.168.11.11 "ls -lh /var/lib/vz/dump/"

# View backup logs (the escaped \$ makes the date expand on the remote host)
ssh root@192.168.11.11 "tail -f /var/log/proxmox-backups/backup_\$(date +%Y%m%d).log"
```

### Restoration Commands

```bash
# Restore from backup
ssh root@192.168.11.11 "pct restore 106 /var/lib/vz/dump/vzdump-lxc-106-*.tar.gz --storage thin1"

# Recreate from template
./scripts/recreate-containers-from-configs.sh 106 107 108

# Check container status
ssh root@192.168.11.11 "pct status 106"
```

### Verification Commands

```bash
# Check container volumes
ssh root@192.168.11.11 "lvs pve | grep vm-106-disk"

# Check container config
ssh root@192.168.11.11 "pct config 106"

# Check recent container logs (systemd journal inside the container)
ssh root@192.168.11.11 "pct exec 106 -- journalctl -n 50 --no-pager"
```

---

## Next Steps

1. **Immediate:**
   - [ ] Run backup setup script
   - [ ] Check for existing backups
   - [ ] Document found backups

2. **Short-term:**
   - [ ] Restore containers from backups (if available)
   - [ ] Recreate high-priority containers
   - [ ] Verify all services work

3. **Long-term:**
   - [ ] Set up Proxmox Backup Server
   - [ ] Implement off-site backups
   - [ ] Regular backup testing
   - [ ] Documentation updates

---

**Status:** 📋 **PLAN READY**
**Next Action:** Run backup setup script and check for existing backups
**Last Updated:** January 7, 2026