# Deployment Next Steps

**Date**: 2025-12-09
**Status**: ⚠️ **LOCK ISSUE - MANUAL RESOLUTION REQUIRED**

---

## Current Situation

### ✅ Completed
1. **Provider Configuration**: ✅ Verified and working
2. **VM Resource Created**: ✅ basic-vm-001 (VMID 100)
3. **Deployment Initiated**: ✅ VM created in Proxmox

### ⚠️ Blocking Issue
**VM Lock Timeout**: Configuration update blocked by Proxmox lock file

**Error**: `can't lock file '/var/lock/qemu-server/lock-100.conf' - got timeout`

---

## Immediate Action Required

### Step 1: Resolve Lock on Proxmox Node

**Access the Proxmox node and clear the lock:**

```bash
# Connect to the Proxmox node (replace with the actual IP/hostname)
ssh root@<proxmox-node-ip>

# Check VM status
qm status 100

# Unlock the VM
qm unlock 100

# If unlock doesn't work, remove the lock file
rm -f /var/lock/qemu-server/lock-100.conf

# Verify the lock is cleared ("No such file or directory" is the expected result)
ls -la /var/lock/qemu-server/lock-100.conf
```

**Note**: If you don't have direct SSH access, you may need to:
- Use the Proxmox web UI
- Access the node via console
- Use another method to access the node

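
Before running `qm unlock`, it can help to confirm that the VM configuration actually carries a lock flag. The following is a minimal sketch, assuming that `qm config <vmid>` prints a `lock: <type>` line while a config lock is set. The function names `vm_config_lock` and `unlock_if_locked` are illustrative and not part of any Proxmox tooling.

```bash
# Print the lock type recorded in the VM config, or nothing if no lock is set.
# Assumption: `qm config` emits "lock: <type>" while a config lock is present.
vm_config_lock() {
  qm config "$1" | awk -F': ' '$1 == "lock" { print $2 }'
}

# Run `qm unlock` only when the config reports a lock.
unlock_if_locked() {
  local vmid="$1" lock
  lock="$(vm_config_lock "$vmid")"
  if [ -z "$lock" ]; then
    echo "VM $vmid: no config lock set"
    return 0
  fi
  echo "VM $vmid: config lock '$lock' found, running qm unlock"
  qm unlock "$vmid"
}

# Usage on the Proxmox node:
#   unlock_if_locked 100
```

If no lock flag is reported but the timeout persists, the lock file may be held by another process that is still running on the node, in which case that process is the thing to investigate.
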
### Step 2: Verify Image Availability

**While on the Proxmox node, verify the image exists:**

```bash
# Check for image
find /var/lib/vz/template/iso -name "ubuntu-22.04-cloud.img"
pvesm list local-lvm | grep ubuntu-22.04-cloud

# If missing, download it
cd /var/lib/vz/template/iso
wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
mv jammy-server-cloudimg-amd64.img ubuntu-22.04-cloud.img
```

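
A truncated or corrupted download can look like a valid image until the VM fails to boot, so a checksum comparison after the download is a cheap safeguard. The sketch below assumes a `SHA256SUMS` file has been fetched from the same directory as the image (for example `https://cloud-images.ubuntu.com/jammy/current/SHA256SUMS`) and that it lists the image under its upstream name. The function name `verify_image_checksum` is hypothetical.

```bash
# Compare a local image against the entry for its upstream name in a SHA256SUMS file.
# Returns 0 on match, 1 on mismatch, 2 if the sums file has no entry for that name.
verify_image_checksum() {
  local image="$1" sums="$2" upstream_name="$3" expected actual
  # Entries look like "<hash> *<filename>"; strip the optional leading "*".
  expected="$(awk -v n="$upstream_name" \
    '{ f = $2; sub(/^\*/, "", f); if (f == n) { print $1; exit } }' "$sums")"
  if [ -z "$expected" ]; then
    echo "no checksum entry for $upstream_name in $sums" >&2
    return 2
  fi
  actual="$(sha256sum "$image" | awk '{ print $1 }')"
  if [ "$expected" != "$actual" ]; then
    echo "checksum mismatch for $image" >&2
    return 1
  fi
  echo "checksum OK for $image"
}

# Usage (the image was renamed after download, so the upstream name is passed separately):
#   verify_image_checksum ubuntu-22.04-cloud.img SHA256SUMS jammy-server-cloudimg-amd64.img
```
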
### Step 3: Monitor Automatic Retry

**After clearing the lock, the provider will automatically retry:**

```bash
# Watch VM status
kubectl get proxmoxvm basic-vm-001 -w

# Watch provider logs
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50 -f
```

**Expected Timeline**: 1-5 minutes after lock is cleared

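
For unattended use, the retry can be polled with a deadline instead of watched by hand. This is a rough sketch, assuming the `proxmoxvm` resource reports a `Ready` entry under `.status.conditions`, as Crossplane managed resources conventionally do. The function name `wait_for_ready` and its default timeout and interval are illustrative.

```bash
# Poll the Ready condition until it is "True" or the timeout (in seconds) expires.
# Assumption: the resource exposes .status.conditions with a type=Ready entry.
wait_for_ready() {
  local name="$1" timeout="${2:-300}" interval="${3:-10}" elapsed=0 ready
  while [ "$elapsed" -lt "$timeout" ]; do
    ready="$(kubectl get proxmoxvm "$name" \
      -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)"
    if [ "$ready" = "True" ]; then
      echo "$name is Ready after ${elapsed}s"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$name not Ready after ${timeout}s" >&2
  return 1
}

# Usage: allow the full 5 minutes from the expected timeline.
#   wait_for_ready basic-vm-001 300 10
```

`kubectl wait --for=condition=Ready proxmoxvm/basic-vm-001 --timeout=300s` covers the same case in a single command when no custom reporting is needed.
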
---

## After Lock Resolution

### Expected Sequence

1. **Provider retries** configuration update (automatic)
2. **VM configuration** completes successfully
3. **Image import** (if needed) completes
4. **Boot order** set correctly
5. **Cloud-init** configured
6. **VM boots** successfully
7. **VM reaches "running" state**
8. **IP address assigned**
9. **Ready condition becomes "True"**

### Verification Steps

Once the VM is running:

```bash
# Get VM IP
IP=$(kubectl get proxmoxvm basic-vm-001 -o jsonpath='{.status.networkInterfaces[0].ipAddress}')

# Check cloud-init logs (last 50 lines)
ssh "admin@$IP" "tail -n 50 /var/log/cloud-init-output.log"

# Verify services
ssh "admin@$IP" "systemctl status qemu-guest-agent chrony unattended-upgrades"

# Test SSH access
ssh "admin@$IP" "hostname && uptime"
```

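
The jsonpath query returns an empty string until the VM has reported an address, in which case the `ssh` commands fail with a confusing hostname error. A small guard avoids that. The sketch below is a plain shell IPv4 check; the function name `is_ipv4` is illustrative, and it assumes the reported address is IPv4.

```bash
# Return 0 if the argument is a dotted-quad IPv4 address with octets in 0-255.
is_ipv4() {
  local ip="$1" old_ifs octet
  # Reject empty input, non-digit/non-dot characters, and misplaced dots.
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  # Split on dots into positional parameters.
  old_ifs="$IFS"; IFS=.
  set -- $ip
  IFS="$old_ifs"
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
  return 0
}

# Usage:
#   IP=$(kubectl get proxmoxvm basic-vm-001 -o jsonpath='{.status.networkInterfaces[0].ipAddress}')
#   is_ipv4 "$IP" || { echo "no IP address reported yet: '$IP'" >&2; exit 1; }
```
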
---

## If Lock Resolution Fails

### Alternative: Delete and Redeploy

If the lock cannot be cleared:

```bash
# 1. Delete Kubernetes resource
kubectl delete proxmoxvm basic-vm-001

# 2. On Proxmox node, force delete VM
ssh root@<proxmox-node> "qm destroy 100 --purge --skiplock"

# 3. Clean up locks
ssh root@<proxmox-node> "rm -f /var/lock/qemu-server/lock-100.conf"

# 4. Wait for cleanup
sleep 10

# 5. Redeploy
kubectl apply -f examples/production/basic-vm.yaml
```

---

## Long-term Solutions

### 1. Code Enhancement

**Add lock handling to provider code:**

- Detect lock errors in `UpdateVM`
- Automatically call `qm unlock` before retry
- Increase timeout for lock operations
- Add exponential backoff for lock retries

**File**: `crossplane-provider-proxmox/pkg/proxmox/client.go`

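
The detect-and-back-off logic described above can be prototyped in shell before it is ported into the provider. The sketch below is not the provider's Go code: `is_lock_timeout` and `retry_on_lock` are hypothetical names, and the match pattern is taken from the error message quoted earlier in this document. Only lock timeouts are retried; any other failure is returned immediately.

```bash
# Return 0 if the text looks like a Proxmox lock timeout.
is_lock_timeout() {
  case "$1" in
    *"can't lock file"*"got timeout"*) return 0 ;;
    *) return 1 ;;
  esac
}

# retry_on_lock MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND while it fails with a lock timeout, doubling the delay each time.
retry_on_lock() {
  local max_attempts="$1" delay="$2" attempt=1 output rc
  shift 2
  while :; do
    if output="$("$@" 2>&1)"; then
      printf '%s\n' "$output"
      return 0
    else
      rc=$?
    fi
    # Give up on non-lock errors or once the attempt budget is spent.
    if ! is_lock_timeout "$output" || [ "$attempt" -ge "$max_attempts" ]; then
      printf '%s\n' "$output" >&2
      return "$rc"
    fi
    echo "attempt $attempt hit a lock timeout; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage: up to 5 attempts, waiting 2s, 4s, 8s, 16s between them.
#   retry_on_lock 5 2 qm set 100 --memory 4096
```
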
### 2. Pre-deployment Checks

**Add validation before VM creation:**

- Check for existing locks on target node
- Verify no conflicting operations
- Ensure Proxmox node is healthy

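
One way to check for an existing lock without disturbing it is a non-blocking lock attempt on the lock file. This sketch assumes that Proxmox serializes configuration changes with an advisory `flock` on the per-VM lock file and that the `flock` utility from util-linux is installed on the node. Both points should be confirmed on the target system, and the function name `lock_is_free` is illustrative.

```bash
# Return 0 if no process currently holds an advisory lock on the given file.
# A missing lock file is treated as "free"; the file is never created or removed.
lock_is_free() {
  local lock_file="$1"
  [ -e "$lock_file" ] || return 0
  # -n: fail immediately instead of waiting if another process holds the lock.
  flock -n "$lock_file" true
}

# Usage on the Proxmox node before creating or updating VM 100:
#   lock_is_free /var/lock/qemu-server/lock-100.conf || echo "VM 100 is busy" >&2
```
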
### 3. Deployment Strategy

**For full deployment:**

- Deploy VMs sequentially (not in parallel)
- Add delays between deployments (30-60 seconds)
- Monitor each deployment before proceeding
- Implement retry logic with lock handling

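
The sequential strategy can be sketched as a loop that applies one manifest, waits for it to become Ready, and pauses before the next. This assumes each manifest defines resources that expose a `Ready` condition. The function name `deploy_sequentially`, the 600-second timeout, and the manifest paths in the usage comment are placeholders rather than values taken from the repository.

```bash
# deploy_sequentially DELAY_SECONDS TIMEOUT_SECONDS MANIFEST [MANIFEST...]
# Applies manifests one at a time and stops at the first failure.
deploy_sequentially() {
  local delay="$1" timeout="$2" manifest first=1
  shift 2
  for manifest in "$@"; do
    # Pause between deployments, but not before the first one.
    if [ "$first" -eq 0 ]; then
      sleep "$delay"
    fi
    first=0
    echo "deploying $manifest"
    kubectl apply -f "$manifest" || return 1
    if ! kubectl wait --for=condition=Ready -f "$manifest" --timeout="${timeout}s"; then
      echo "$manifest did not become Ready within ${timeout}s; stopping" >&2
      return 1
    fi
  done
}

# Usage: 30s between VMs, up to 10 minutes for each to become Ready.
#   deploy_sequentially 30 600 nginx-proxy-vm.yaml cloudflare-tunnel-vm.yaml
```

Stopping at the first failure keeps a lock problem on one VM from being repeated across the rest of the batch.
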
---

## Full Deployment Plan (After Test Success)

### Phase 1: Infrastructure (2 VMs)
1. nginx-proxy-vm.yaml
2. cloudflare-tunnel-vm.yaml

### Phase 2: SMOM-DBIS-138 Core (8 VMs)
3-6. validator-01 through validator-04
7-10. sentry-01 through sentry-04

### Phase 3: SMOM-DBIS-138 Services (8 VMs)
11-14. rpc-node-01 through rpc-node-04
15. services.yaml
16. blockscout.yaml
17. monitoring.yaml
18. management.yaml

### Phase 4: Phoenix VMs (8 VMs)
19-26. All Phoenix VMs

### Phase 5: Template VMs (2 VMs - Optional)
27. medium-vm.yaml
28. large-vm.yaml

**Total**: 28 additional VMs after test VM

---

## Summary

### Current Status
- ✅ Provider: Working
- ✅ VM Created: Yes (VMID 100)
- ⚠️ Configuration: Blocked by lock
- ⚠️ State: Stopped

### Required Action
**Manual lock resolution on Proxmox node**

### After Resolution
- Provider will automatically retry
- VM should complete configuration
- VM should boot successfully
- Full deployment can proceed

---

**Last Updated**: 2025-12-09
**Status**: ⚠️ **WAITING FOR MANUAL LOCK RESOLUTION**