VM Template Image Issue Analysis

Date: 2025-12-11
Issue: VMs 100 and 101 created without attached disk or image


Problem Summary

VMs 100 and 101 were created, but were left in a broken state:

  • No attached disk
  • No bootable image
  • Stuck in "lock: create" state
  • Provider unable to complete image import

Root Cause Analysis

Template Configuration

File: examples/production/vm-100.yaml

  • Image specified: local:iso/ubuntu-22.04-cloud.img
  • Format: Volid format (storage:path)

Provider Code Flow

  1. Image Detection (lines 275-276 in client.go):

    if strings.Contains(spec.Image, ":") {
        imageVolid = spec.Image  // Treats as volid
    }
    
  2. Import Decision (lines 291-292):

    if strings.HasSuffix(imageVolid, ".img") || strings.HasSuffix(imageVolid, ".qcow2") {
        needsImageImport = true  // Triggers importdisk API
    }
    
  3. VM Creation (Line 294):

    • Creates VM with blank disk first
    • Then attempts to import image using importdisk API
  4. Import Process (lines 350-399):

    • Calls /nodes/{node}/qemu/{vmid}/importdisk
    • Creates new disk (usually scsi1)
    • Tries to replace scsi0 with imported disk
    • PROBLEM: Import operation holds lock, preventing config updates
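
To make the decision path concrete, the two checks quoted in steps 1 and 2 can be combined into a small standalone function. This is only a sketch: classifyImage is a hypothetical name, and because the source does not show how images without a ":" are handled, the sketch simply returns an empty volid for them.

```go
package main

import (
	"fmt"
	"strings"
)

// classifyImage mirrors the two checks quoted from client.go. It returns the
// volid the provider would use and whether the import path would be taken.
// Assumption: images without a ":" yield an empty volid, since that branch
// is not shown in the analysis.
func classifyImage(image string) (imageVolid string, needsImageImport bool) {
	if strings.Contains(image, ":") {
		imageVolid = image // treated as a volid (storage:path)
	}
	if strings.HasSuffix(imageVolid, ".img") || strings.HasSuffix(imageVolid, ".qcow2") {
		needsImageImport = true // would trigger the importdisk call
	}
	return imageVolid, needsImageImport
}

func main() {
	images := []string{
		"local:iso/ubuntu-22.04-cloud.img",
		"local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst",
	}
	for _, img := range images {
		volid, needsImport := classifyImage(img)
		fmt.Printf("%s\n  volid=%t needsImageImport=%t\n", img, volid != "", needsImport)
	}
}
```

With the image from vm-100.yaml, both checks match, which is why that template ends up on the import path.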

The Issue

The importdisk API operation:

  1. Creates a lock on the VM (lock: create)
  2. Takes time to copy/import the image
  3. Provider tries to update config while lock is held
  4. Update fails with "VM is locked (create)" error
  5. Lock never releases properly, leaving VM in stuck state

Template Review

Current Template Format

image: "local:iso/ubuntu-22.04-cloud.img"

Assessment:

  • The volid format itself is correct
  • However, the .img suffix triggers the importdisk path (slow, can get stuck)
  • The import path requires lock coordination
  • There is no timeout handling for import operations

Alternative Approaches

Option 1: Use Template Instead of Image Import

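image: "local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst"

  • Direct template usage (no import needed)
  • Faster creation
  • No lock issues
  • Trade-off: a different image (standard vs cloud). vztmpl is also the Proxmox content type for LXC container templates, so confirm that the provider can use it for a QEMU VM before relying on this option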

Option 2: Pre-import Image to Storage

  • Upload image to local-lvm storage pool
  • Use as direct disk reference
  • Avoids importdisk API

Option 3: Fix Provider Code

  • Add proper task monitoring for importdisk
  • Wait for import to complete before updating config
  • Add timeout and retry logic
  • Better lock management

Recommendations

Immediate Fix

  1. Use existing template (if acceptable):

    image: "local:vztmpl/ubuntu-22.04-standard_22.04-1_amd64.tar.zst"
    
  2. Or pre-import cloud image to local-lvm:

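    # On the Proxmox node. qm disk import expects a filesystem path as the source, not a volid;
    # the path below assumes the default location of the "local" storage (/var/lib/vz).
    qm disk import <vmid> /var/lib/vz/template/iso/ubuntu-22.04-cloud.img local-lvm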
    

Long-term Fix

  1. Enhance provider code:

    • Monitor importdisk task status
    • Wait for completion before config updates
    • Add proper error handling and timeouts
    • Implement lock release on failure
  2. Template standardization:

    • Document image format requirements
    • Provide pre-imported images in storage
    • Use templates when possible (faster)

Verification Steps

After fixing templates:

  1. Check image availability:

    pvesm list local | grep ubuntu
    pvesm list local-lvm | grep ubuntu
    
  2. Verify template format:

    • Use volid format: storage:path/to/image
    • Or template format: storage:vztmpl/template.tar.zst
  3. Test VM creation:

    • Create test VM
    • Verify disk is attached
    • Verify boot order is set
    • Verify VM can start
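
The checks in step 3 can also be scripted against the VM config. The sketch below assumes the config is available as key/value strings and that the boot order uses the "order=dev1;dev2" form. checkVMConfig and bootsFrom are hypothetical helpers, and the config values in main are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// bootsFrom reports whether disk appears in a boot setting of the form
// "order=scsi0;net0". Other boot formats are not handled in this sketch.
func bootsFrom(boot, disk string) bool {
	if !strings.HasPrefix(boot, "order=") {
		return false
	}
	for _, dev := range strings.Split(strings.TrimPrefix(boot, "order="), ";") {
		if dev == disk {
			return true
		}
	}
	return false
}

// checkVMConfig returns a list of problems found in a VM config: a leftover
// lock, a missing disk, or a boot order that does not include the disk.
func checkVMConfig(cfg map[string]string, disk string) []string {
	var problems []string
	if lock := cfg["lock"]; lock != "" {
		problems = append(problems, "VM is locked: "+lock)
	}
	if cfg[disk] == "" {
		problems = append(problems, "no disk attached on "+disk)
	}
	if !bootsFrom(cfg["boot"], disk) {
		problems = append(problems, "boot order does not include "+disk)
	}
	return problems
}

func main() {
	// Illustrative configs, not captured from VMs 100 or 101.
	stuck := map[string]string{"lock": "create"}
	healthy := map[string]string{
		"scsi0": "local-lvm:vm-100-disk-0,size=20G",
		"boot":  "order=scsi0",
	}
	fmt.Println("stuck VM problems:", len(checkVMConfig(stuck, "scsi0")))
	fmt.Println("healthy VM problems:", len(checkVMConfig(healthy, "scsi0")))
}
```

An empty result covers the first three checks in step 3. Whether the VM actually starts still has to be tested on the node.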

Related Files

  • examples/production/vm-100.yaml - Problematic template
  • examples/production/basic-vm.yaml - Base template
  • crossplane-provider-proxmox/pkg/proxmox/client.go - Provider code

    • Lines 274-470: Image handling and import logic

Status: ⚠️ ISSUE IDENTIFIED - NEEDS FIX

Next Steps:

  1. Review all templates for image format
  2. Decide on image strategy (template vs import)
  3. Update templates accordingly
  4. Test VM creation