Guest Agent IP Discovery - Architecture Guide
Date: 2025-11-27
Purpose: Document the guest-agent IP discovery pattern for all scripts
Overview
Scripts that use SSH discover VM IPs dynamically from the QEMU Guest Agent instead of hard-coding IP addresses. This provides:
- Flexibility: VMs can change IPs without breaking scripts
- Maintainability: No IP addresses scattered throughout codebase
- Reliability: Single source of truth (guest agent)
- Scalability: Easy to add new VMs without updating IP lists
Architecture
Helper Library
Location: scripts/lib/proxmox_vm_helpers.sh
Key Functions:
- get_vm_ip_from_guest_agent <vmid> - Get IP from guest agent
- get_vm_ip_or_warn <vmid> <name> - Get IP with warning if unavailable
- get_vm_ip_or_fallback <vmid> <name> <fallback> - Get IP with fallback
- ensure_guest_agent_enabled <vmid> - Enable agent in VM config
- wait_for_guest_agent <vmid> <timeout> - Wait for agent to be ready
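To make the first helper concrete, here is a minimal sketch of how get_vm_ip_from_guest_agent could be built on qm guest cmd and jq. It is an illustration, not the code in proxmox_vm_helpers.sh: the extract_first_ipv4 helper is hypothetical, and the choice to return the first non-loopback IPv4 address is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real helper in proxmox_vm_helpers.sh may differ.

# Read guest-agent interface JSON on stdin and print the first IPv4
# address that does not belong to the loopback interface.
extract_first_ipv4() {
  jq -r '[ .[]
           | select(.name != "lo")
           | .["ip-addresses"][]?
           | select(.["ip-address-type"] == "ipv4")
           | .["ip-address"]
         ] | first // empty'
}

# Ask the guest agent of VM <vmid> for its interfaces and print one IP.
# Returns non-zero if the agent is unreachable or reports no IPv4 address.
get_vm_ip_from_guest_agent() {
  local vmid="$1" json ip
  json="$(qm guest cmd "$vmid" network-get-interfaces 2>/dev/null)" || return 1
  ip="$(printf '%s' "$json" | extract_first_ipv4)" || return 1
  [ -n "$ip" ] || return 1
  printf '%s\n' "$ip"
}
```

On a VM with several interfaces (for example a Docker bridge or a CNI interface on k3s-master), taking the first address may not be the right choice; filtering by interface name or subnet would be a natural refinement.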
VM Array Pattern
Before (hard-coded IPs):
VMS=(
"100 cloudflare-tunnel 192.168.1.60"
"101 k3s-master 192.168.1.188"
)
After (IP-free):
VMS=(
"100 cloudflare-tunnel"
"101 k3s-master"
)
Script Pattern
Before:
read -r vmid name ip <<< "$vm_spec"
ssh "${VM_USER}@${ip}" ...
After:
read -r vmid name <<< "$vm_spec"
ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"
[[ -z "$ip" ]] && continue
ssh "${VM_USER}@${ip}" ...
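The After pattern depends on two properties of get_vm_ip_or_warn: it prints only the IP on stdout, and it returns non-zero when no IP is found (hence the || true, which keeps a script running under set -e). A minimal sketch of a helper with those properties, assuming it wraps get_vm_ip_from_guest_agent; the warning text is made up and the real implementation may differ:

```shell
#!/usr/bin/env bash
# Hypothetical sketch; relies on get_vm_ip_from_guest_agent from the helper library.

get_vm_ip_or_warn() {
  local vmid="$1" name="$2" ip
  if ip="$(get_vm_ip_from_guest_agent "$vmid")" && [ -n "$ip" ]; then
    printf '%s\n' "$ip"
    return 0
  fi
  # The warning goes to stderr so that ip="$(...)" captures only the address.
  printf 'WARN: no guest-agent IP for VM %s (%s); skipping\n' "$vmid" "$name" >&2
  return 1
}
```

Because the warning is written to stderr, the caller's empty-string check is enough to decide whether to skip the VM.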
Bootstrap Problem
The Challenge
Guest-agent IP discovery only works after the QEMU Guest Agent (QGA) is installed and running in the VM, so the script that installs the agent cannot rely on it.
Solution: Fallback Pattern
For bootstrap scripts (installing QGA itself), use fallback IPs:
# Fallback IPs for bootstrap
declare -A FALLBACK_IPS=(
["100"]="192.168.1.60"
["101"]="192.168.1.188"
)
# Get IP with fallback
ip="$(get_vm_ip_or_fallback "$vmid" "$name" "${FALLBACK_IPS[$vmid]:-}" || true)"
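The fallback helper can be thought of as discovery first, static IP second. The sketch below shows one way get_vm_ip_or_fallback could behave, assuming it wraps get_vm_ip_from_guest_agent; the messages and exact return codes are illustrative, not taken from the helper library.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; relies on get_vm_ip_from_guest_agent from the helper library.

get_vm_ip_or_fallback() {
  local vmid="$1" name="$2" fallback="${3:-}" ip
  # Prefer the guest agent whenever it answers.
  if ip="$(get_vm_ip_from_guest_agent "$vmid")" && [ -n "$ip" ]; then
    printf '%s\n' "$ip"
    return 0
  fi
  # Otherwise use the static fallback, if one was supplied.
  if [ -n "$fallback" ]; then
    printf 'WARN: guest agent unavailable for VM %s (%s); using fallback %s\n' \
      "$vmid" "$name" "$fallback" >&2
    printf '%s\n' "$fallback"
    return 0
  fi
  printf 'WARN: no IP for VM %s (%s) and no fallback configured\n' "$vmid" "$name" >&2
  return 1
}
```

Passing "${FALLBACK_IPS[$vmid]:-}" as the third argument means a VM without a fallback entry is skipped with a warning instead of aborting the loop.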
Bootstrap Flow
- First Pass: Use fallback IPs to install QGA
- After QGA: All subsequent scripts use guest-agent discovery
- No More Hard-coded IPs: Once QGA is installed everywhere, the fallback IPs are no longer needed
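Between the first pass and the discovery-based scripts, it helps to know when the agent actually starts answering. The following is a minimal sketch of what wait_for_guest_agent might do, assuming it polls qm guest cmd <vmid> ping once per second; the 60-second default and the polling interval are assumptions, not values from the helper library.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real wait_for_guest_agent may differ.

wait_for_guest_agent() {
  local vmid="$1" timeout="${2:-60}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # The ping succeeds only when the agent inside the VM responds.
    if qm guest cmd "$vmid" ping >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage (after installing QGA in the VM):
#   wait_for_guest_agent 100 120 || echo "agent on VM 100 not ready" >&2
```

The timeout counts polling attempts, not wall-clock seconds: a failing ping can itself take a while to return, so the real wait may be longer than the number suggests.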
Updated Scripts
✅ Refactored Scripts
- scripts/ops/ssh-test-all.sh - Example SSH test script
- scripts/deploy/configure-vm-services.sh - Service deployment
- scripts/deploy/add-ssh-keys-to-vms.sh - SSH key management
- scripts/deploy/verify-cloud-init.sh - Cloud-init verification
- scripts/infrastructure/install-qemu-guest-agent.sh - QGA installation (with fallback)
📋 Scripts to Update
All scripts that use hard-coded IPs should be updated:
- scripts/troubleshooting/diagnose-vm-issues.sh
- scripts/troubleshooting/test-all-access-paths.sh
- scripts/deploy/deploy-vms-via-api.sh (IPs needed for creation, but can use discovery after)
- And many more...
Usage Examples
Example 1: Simple SSH Script
#!/bin/bash
source "$PROJECT_ROOT/scripts/lib/proxmox_vm_helpers.sh"
VMS=(
"100 cloudflare-tunnel"
"101 k3s-master"
)
for vm_spec in "${VMS[@]}"; do
read -r vmid name <<< "$vm_spec"
ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"
[[ -z "$ip" ]] && continue
ssh "${VM_USER}@${ip}" "hostname"
done
Example 2: Bootstrap Script (with Fallback)
#!/bin/bash
source "$PROJECT_ROOT/scripts/lib/proxmox_vm_helpers.sh"
VMS=(
"100 cloudflare-tunnel"
)
# Fallback IPs, used only until QGA is installed
declare -A FALLBACK_IPS=(
["100"]="192.168.1.60"
)
for vm_spec in "${VMS[@]}"; do
read -r vmid name <<< "$vm_spec"
ip="$(get_vm_ip_or_fallback "$vmid" "$name" "${FALLBACK_IPS[$vmid]:-}" || true)"
[[ -z "$ip" ]] && continue
# Install and start QGA using discovered/fallback IP
ssh "${VM_USER}@${ip}" "sudo apt install -y qemu-guest-agent && sudo systemctl enable --now qemu-guest-agent"
done
Example 3: Service Deployment
#!/bin/bash
source "$PROJECT_ROOT/scripts/lib/proxmox_vm_helpers.sh"
declare -A VM_IPS
# Discover all IPs first
for vm_spec in "${VMS[@]}"; do
read -r vmid name <<< "$vm_spec"
ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"
[[ -n "$ip" ]] && VM_IPS["$vmid"]="$ip"
done
# Use discovered IPs
if [[ -n "${VM_IPS[102]:-}" ]]; then
deploy_gitea "${VM_IPS[102]}"
fi
Prerequisites
On Proxmox Host
- jq installed:
  apt update && apt install -y jq
- Helper library accessible:
  - Scripts run on Proxmox host: Direct access
  - Scripts run remotely: Copy helper or source via SSH
In VMs
- QEMU Guest Agent installed:
  sudo apt install -y qemu-guest-agent
  sudo systemctl enable --now qemu-guest-agent
- Agent enabled in VM config:
  qm set <vmid> --agent enabled=1
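The host-side prerequisites can be checked at the top of a script before any discovery is attempted. The sketch below uses a hypothetical require_commands function (not part of the helper library as described here) that reports every missing command instead of stopping at the first one.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper; the name require_commands is an assumption.

require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf 'ERROR: required command not found: %s\n' "$cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the Proxmox host:
#   require_commands qm jq || exit 1
```

Failing early with a clear message avoids the less obvious situation where a missing jq makes every VM look as if it had no guest agent.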
Migration Checklist
For each script that uses hard-coded IPs:
- Remove IPs from VM array (keep only VMID and NAME)
- Add source for helper library
- Replace read -r vmid name ip with read -r vmid name
- Add IP discovery:
  ip="$(get_vm_ip_or_warn "$vmid" "$name" || true)"
- Add skip logic:
  [[ -z "$ip" ]] && continue
- Test script with guest agent enabled
- For bootstrap scripts, add fallback IPs
Benefits
- No IP Maintenance: IPs change? Scripts still work
- Single Source of Truth: Guest agent provides accurate IPs
- Easier Testing: Can test with different IPs without code changes
- Better Error Handling: Scripts gracefully handle missing guest agent
- Future-Proof: Works with DHCP, dynamic IPs, multiple interfaces
Troubleshooting
"No IP from guest agent"
Causes:
- QEMU Guest Agent not installed in VM
- Agent not enabled in VM config
- VM not powered on
- Agent service not running
Fix:
# In VM
sudo apt install -y qemu-guest-agent
sudo systemctl enable --now qemu-guest-agent
# On Proxmox host (the change takes effect after a full VM stop/start, not a reboot from inside the guest)
qm set <vmid> --agent enabled=1
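The causes listed above can be checked in order from the Proxmox host. This is a rough diagnostic sketch, not an existing script: the function name diagnose_guest_agent is hypothetical, and it assumes qm status prints a line such as "status: running" and qm config prints the agent setting as "agent: 1" or "agent: enabled=1".

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; run on the Proxmox host.

diagnose_guest_agent() {
  local vmid="$1" agent_line
  # 1. Is the VM powered on?
  if ! qm status "$vmid" 2>/dev/null | grep -q 'status: running'; then
    echo "VM $vmid is not running"
    return 1
  fi
  # 2. Is the agent enabled in the VM config?
  agent_line="$(qm config "$vmid" 2>/dev/null | grep '^agent:' || true)"
  case "$agent_line" in
    "agent: 1"*|*"enabled=1"*) ;;
    *) echo "VM $vmid: agent not enabled in VM config"; return 1 ;;
  esac
  # 3. Does the agent inside the VM answer?
  if ! qm guest cmd "$vmid" ping >/dev/null 2>&1; then
    echo "VM $vmid: agent enabled but not responding (not installed or service not running)"
    return 1
  fi
  echo "VM $vmid: guest agent OK"
}
```

The last case cannot distinguish a missing package from a stopped service; that still has to be checked inside the VM.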
"jq command not found"
Fix:
apt update && apt install -y jq
Scripts run remotely (not on Proxmox host)
Options:
- Copy helper library to remote location
- Source via SSH:
  ssh proxmox-host "source /path/to/helpers.sh && get_vm_ip_or_warn 100 test"
- Use Proxmox API instead of qm commands
Status: Helper library created, key scripts refactored. Remaining scripts should follow the same pattern.