# QEMU Guest Agent: Complete Setup and Verification Procedure
**Last Updated**: 2025-12-11
**Status**: ✅ Complete and Verified
---
## Overview
This document describes how to configure and verify the QEMU Guest Agent on every VM in the Sankofa Phoenix infrastructure. The guest agent is critical for:
- Graceful VM shutdown/restart
- VM lock prevention
- Guest OS command execution
- IP address detection
- Resource monitoring
---
## Architecture
### Two-Level Configuration
1. **Proxmox Level** (`agent: 1` in VM config)
- Configured by Crossplane provider automatically
- Enables guest agent communication channel
2. **Guest OS Level** (package + service)
- `qemu-guest-agent` package installed
- `qemu-guest-agent` service running
- Configured via cloud-init in all templates
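
Because both levels must be in place before the agent works, a diagnosis can check them in order. A minimal sketch, assuming the two states have already been determined; the function name `next_action` and its output strings are hypothetical and not part of the Sankofa scripts:

```bash
# Map the two configuration levels to the next action.
# $1: "1" if `agent: 1` is set in the Proxmox VM config, otherwise "0"
# $2: "1" if the qemu-guest-agent service is active in the guest, otherwise "0"
next_action() {
  proxmox_level=$1
  guest_level=$2
  if [ "$proxmox_level" != "1" ]; then
    # Without the Proxmox-level channel, the guest service has nothing to talk to.
    echo "enable-in-proxmox"
  elif [ "$guest_level" != "1" ]; then
    echo "fix-guest-service"
  else
    echo "ok"
  fi
}

# Usage:
#   next_action 1 0   # prints: fix-guest-service
```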
---
## Automatic Configuration
### ✅ Crossplane Provider (Automatic)
The Crossplane provider **automatically** sets `agent: 1` during:
- **VM Creation** (`pkg/proxmox/client.go:317`)
- **VM Cloning** (`pkg/proxmox/client.go:242`)
- **VM Updates** (`pkg/proxmox/client.go:671`)
**No manual intervention required** - this is handled by the provider.
### ✅ Cloud-Init Templates (Automatic)
All VM templates include enhanced guest agent configuration:
1. **Package Installation**: `qemu-guest-agent` in packages list
2. **Service Enablement**: `systemctl enable qemu-guest-agent`
3. **Service Start**: `systemctl start qemu-guest-agent`
4. **Verification**: Automatic retry logic with status checks
5. **Error Handling**: Automatic installation if package missing
**Templates Updated**:
- `examples/production/basic-vm.yaml`
- `examples/production/medium-vm.yaml`
- `examples/production/large-vm.yaml`
- `crossplane-provider-proxmox/examples/vm-example.yaml`
- `gitops/infrastructure/claims/vm-claim-example.yaml`
- ✅ All 29 production VM templates (via enhancement script)
---
## Verification Procedures
### 1. Check Proxmox Configuration
**On Proxmox Node:**
```bash
# Check if guest agent is enabled in VM config
qm config <VMID> | grep agent
# Expected output:
# agent: 1
```
**If not enabled:**
```bash
qm set <VMID> --agent 1
```
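Proxmox may also store the option with extra properties (for example `agent: 1,fstrim_cloned_disks=1` or `agent: enabled=1`), so a plain `grep agent` can be ambiguous in scripts. The sketch below is one way to test the flag; the helper name `agent_enabled` is hypothetical, and it assumes the enabled flag is the first property on the line:

```bash
# Read `qm config <VMID>` output on stdin; succeed only if the agent is enabled.
agent_enabled() {
  grep -Eq '^agent: *(enabled=)?1(,|$)'
}

# Usage on a Proxmox node:
#   qm config <VMID> | agent_enabled || qm set <VMID> --agent 1
```

Anchoring the pattern at `^agent:` avoids false matches on other config lines, such as a VM whose `name:` contains the word "agent".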
### 2. Check Guest OS Package
**On Proxmox Node (requires working guest agent):**
```bash
# Check if package is installed (dpkg runs inside the guest)
qm guest exec <VMID> -- dpkg -l qemu-guest-agent
# qm guest exec wraps the result in JSON; "out-data" should contain a line like:
# ii qemu-guest-agent <version> amd64 Guest communication agent for QEMU
```
**If not installed (via console/SSH):**
```bash
apt-get update
apt-get install -y qemu-guest-agent
systemctl enable qemu-guest-agent
systemctl start qemu-guest-agent
```
### 3. Check Guest OS Service
**On Proxmox Node:**
```bash
# Check service status
qm guest exec <VMID> -- systemctl status qemu-guest-agent
# Expected output:
# ● qemu-guest-agent.service - QEMU Guest Agent
# Loaded: loaded (...)
# Active: active (running) since ...
```
**If not running (via console/SSH, because `qm guest exec` itself needs a running agent):**
```bash
systemctl enable qemu-guest-agent
systemctl start qemu-guest-agent
```
### 4. Comprehensive Check Script
**Use the automated check script:**
```bash
# On Proxmox node
/usr/local/bin/complete-vm-100-guest-agent-check.sh
# Or for another VM (assuming the script reads VMID from its environment):
VMID=<VMID> /usr/local/bin/complete-vm-100-guest-agent-check.sh
```
**Script checks:**
- ✅ VM exists and is running
- ✅ Proxmox guest agent config (`agent: 1`)
- ✅ Package installation
- ✅ Service status
- ✅ Provides clear error messages
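
A script performing these checks could be structured as below. This is a hypothetical sketch, not the contents of `complete-vm-100-guest-agent-check.sh`; the function names are illustrative, and it covers only the VM-state and agent-reachability checks, using `qm status` and `qm guest cmd <VMID> ping`:

```bash
# Print a pass/fail line for one named check; return 0 on success, 1 on failure.
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "✅ $label"
  else
    echo "❌ $label"
    return 1
  fi
}

# `qm status` prints a line such as "status: running".
vm_is_running() { qm status "$1" | grep -q '^status: running'; }

# The ping command succeeds only if the agent inside the guest answers.
agent_responds() { qm guest cmd "$1" ping; }

# Run the checks in order and stop at the first failure.
check_vm() {
  vmid=$1
  check "VM $vmid is running" vm_is_running "$vmid" &&
    check "Guest agent responds on VM $vmid" agent_responds "$vmid"
}

# Usage on a Proxmox node:
#   check_vm 100
```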
---
## Troubleshooting
### Issue: "No QEMU guest agent configured"
**Symptoms:**
- `qm guest exec` commands fail
- Proxmox shows "No Guest Agent" in UI
**Causes:**
1. Guest agent not enabled in Proxmox config
2. Package not installed in guest OS
3. Service not running in guest OS
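4. VM not stopped and started since `agent: 1` was set (the change stays pending until a full stop/start; a reboot from inside the guest is not enough)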
**Solutions:**
1. **Enable in Proxmox:**
```bash
qm set <VMID> --agent 1
```
2. **Install in Guest OS:**
```bash
# Via console or SSH
apt-get update
apt-get install -y qemu-guest-agent
systemctl enable qemu-guest-agent
systemctl start qemu-guest-agent
```
3. **Restart VM:**
```bash
qm shutdown <VMID> # Graceful (requires working agent)
# OR
qm stop <VMID> # Force stop
qm start <VMID>
```
### Issue: VM Lock Issues
**Symptoms:**
- `qm` commands fail with lock errors
- VM appears stuck
**Solution:**
```bash
# Check for locks
ls -la /var/lock/qemu-server/lock-<VMID>.conf
# Remove lock (if safe)
qm unlock <VMID>
# Force stop if needed
qm stop <VMID> --skiplock
```
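Before unlocking, it can help to see why the VM is locked, since Proxmox records the reason as a `lock:` line in the VM config (for example `lock: backup`). A minimal sketch; the helper name `lock_reason` is hypothetical:

```bash
# Read `qm config <VMID>` output on stdin and print the lock reason, or "none".
lock_reason() {
  reason=$(sed -n 's/^lock: *//p')
  echo "${reason:-none}"
}

# Usage on a Proxmox node:
#   qm config <VMID> | lock_reason
```

If the reason is `backup`, `migrate`, or `snapshot`, confirm that the corresponding task has really finished or failed before running `qm unlock`.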
### Issue: Guest Agent Not Starting
**Symptoms:**
- Package installed but service not running
- Service fails to start
**Diagnosis:**
```bash
# Check service logs
journalctl -u qemu-guest-agent -n 50
# Check service status
systemctl status qemu-guest-agent -l
```
**Common Causes:**
- Missing dependencies
- Permission issues
- VM needs restart
**Solution:**
```bash
# Reinstall package
apt-get remove --purge qemu-guest-agent
apt-get install -y qemu-guest-agent
# Restart service
systemctl restart qemu-guest-agent
# If still failing, restart VM
```
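One further cause worth ruling out, stated here as an assumption about a typical Linux guest rather than something taken from the Sankofa templates: the agent talks to the host over a virtio-serial port, which usually appears in the guest as `/dev/virtio-ports/org.qemu.guest_agent.0`. If that device is absent, the VM was most likely started before `agent: 1` was set, and reinstalling the package will not help. A sketch of the check, with a hypothetical helper name:

```bash
# Default device path of the guest agent channel inside a Linux guest (assumed).
AGENT_PORT=${AGENT_PORT:-/dev/virtio-ports/org.qemu.guest_agent.0}

# Succeed if the agent's virtio-serial port exists; an alternate path may be passed as $1.
agent_channel_present() {
  [ -e "${1:-$AGENT_PORT}" ]
}

# Usage inside the guest:
#   agent_channel_present || echo "No agent device: stop and start the VM from Proxmox"
```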
---
## Best Practices
### 1. Always Include Guest Agent in Templates
**Required cloud-init configuration:**
```yaml
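packages:
  - qemu-guest-agent
runcmd:
  - systemctl enable qemu-guest-agent
  - systemctl start qemu-guest-agent
  - |
    # Verification with retry. runcmd runs under /bin/sh, so avoid bash-only {1..30},
    # and use break rather than exit so later runcmd entries still run.
    i=0
    while [ "$i" -lt 30 ]; do
      if systemctl is-active --quiet qemu-guest-agent; then
        echo "✅ Guest agent running"
        break
      fi
      i=$((i + 1))
      sleep 1
    done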
### 2. Verify After VM Creation
**Always verify guest agent after creating a VM:**
```bash
# Wait for cloud-init to complete (usually 1-2 minutes)
sleep 120
# Check status
qm guest exec <VMID> -- systemctl status qemu-guest-agent
```
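A fixed `sleep 120` either waits longer than necessary or not long enough. As an alternative, the sketch below polls until a command succeeds or a timeout expires; `wait_until` is a hypothetical helper, and using `qm guest cmd <VMID> ping` as the readiness probe is an assumption:

```bash
# Retry a command until it succeeds or the timeout (in seconds) is reached.
# $1: timeout in seconds, $2: poll interval in seconds (must be >= 1), rest: command
wait_until() {
  timeout=$1
  interval=$2
  shift 2
  elapsed=0
  while ! "$@" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Usage on a Proxmox node (wait up to 3 minutes, probing every 5 seconds):
#   wait_until 180 5 qm guest cmd <VMID> ping &&
#     qm guest exec <VMID> -- systemctl status qemu-guest-agent
```

The elapsed time counts only the sleeps, so the real wait can be somewhat longer when the probe itself is slow to fail.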
### 3. Monitor Guest Agent Status
**Regular monitoring:**
```bash
# Check all VMs
for vmid in $(qm list | tail -n +2 | awk '{print $1}'); do
  echo "VM $vmid:"
  # Anchor on ^agent so other config lines (e.g. a VM name) cannot match
  qm config "$vmid" | grep '^agent' || echo " ⚠️ Agent not configured"
  qm guest exec "$vmid" -- systemctl is-active qemu-guest-agent >/dev/null 2>&1 && echo " ✅ Running" || echo " ❌ Not running"
done
```
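`qm list` also reports stopped VMs, for which `qm guest exec` always fails, so the loop above may show ❌ for VMs that are simply powered off. One way to restrict the loop to running VMs is to filter on the STATUS column; the helper name `running_vmids` is hypothetical:

```bash
# Read `qm list` output on stdin and print the VMID of every running VM.
# Columns: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
running_vmids() {
  awk '$3 == "running" { print $1 }'
}

# Usage on a Proxmox node:
#   for vmid in $(qm list | running_vmids); do echo "VM $vmid"; done
```

The header row needs no special handling because its third field is `STATUS`, which never equals `running`.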
### 4. Document Exceptions
If a VM cannot have guest agent (rare), document why:
- Legacy OS without support
- Special security requirements
- Known limitations
---
## Scripts and Tools
### Available Scripts
1. **`scripts/complete-vm-100-guest-agent-check.sh`**
- Comprehensive check for VM 100
- Installed on both Proxmox nodes
- Location: `/usr/local/bin/complete-vm-100-guest-agent-check.sh`
2. **`scripts/copy-script-to-proxmox-nodes.sh`**
- Copies scripts to Proxmox nodes
- Uses SSH with password from `.env`
3. **`scripts/enhance-guest-agent-verification.py`**
- Enhanced all 29 VM templates
- Adds robust verification logic
### Usage
**Copy script to Proxmox nodes:**
```bash
bash scripts/copy-script-to-proxmox-nodes.sh
```
**Run check on Proxmox node:**
```bash
ssh root@<proxmox-node>
/usr/local/bin/complete-vm-100-guest-agent-check.sh
```
---
## Verification Checklist
### For New VMs
- [ ] VM created with Crossplane provider (automatic `agent: 1`)
- [ ] Cloud-init template includes `qemu-guest-agent` package
- [ ] Cloud-init includes service enable/start commands
- [ ] Wait for cloud-init to complete (1-2 minutes)
- [ ] Verify package installed: `qm guest exec <VMID> -- dpkg -l | grep qemu-guest-agent`
- [ ] Verify service running: `qm guest exec <VMID> -- systemctl status qemu-guest-agent`
- [ ] Test graceful shutdown: `qm shutdown <VMID>`
### For Existing VMs
- [ ] Check Proxmox config: `qm config <VMID> | grep agent`
- [ ] Enable if missing: `qm set <VMID> --agent 1`
- [ ] Check package: `qm guest exec <VMID> -- dpkg -l | grep qemu-guest-agent`
- [ ] Install if missing (via console/SSH, since `qm guest exec` needs the agent): `apt-get install -y qemu-guest-agent`
- [ ] Check service: `qm guest exec <VMID> -- systemctl status qemu-guest-agent`
- [ ] Start if stopped (via console/SSH): `systemctl start qemu-guest-agent`
- [ ] Restart VM if needed: `qm shutdown <VMID> && qm start <VMID>` or `qm stop <VMID> && qm start <VMID>`
---
## Summary
✅ **Automatic Configuration:**
- Crossplane provider sets `agent: 1` automatically
- All templates include guest agent in cloud-init
✅ **Verification:**
- Use check scripts on Proxmox nodes
- Verify both Proxmox config and guest OS service
✅ **Troubleshooting:**
- Enable in Proxmox: `qm set <VMID> --agent 1`
- Install in guest: `apt-get install -y qemu-guest-agent`
- Start service: `systemctl start qemu-guest-agent`
- Restart VM if needed
✅ **Best Practices:**
- Always include in templates
- Verify after creation
- Monitor regularly
- Document exceptions
---
**Related Documents:**
- `docs/GUEST_AGENT_CONFIGURATION_ANALYSIS.md`
- `docs/VM_100_GUEST_AGENT_FIXED.md`
- `docs/GUEST_AGENT_VERIFICATION_ENHANCEMENT_COMPLETE.md`
- `docs/SCRIPT_COPIED_TO_PROXMOX_NODES.md`