# VM Placement Explanation - Why VMs Don't Need to Be on Both Servers
**Date:** 2025-11-27
**Question:** Why are VMs 100-103 required on both servers?
## Short Answer
**VMs 100-103 are NOT required on both servers.** They are deployed once and can run on either node in the Proxmox cluster. The cluster provides high availability through VM migration, not duplication.
## Architecture Overview
### Current Setup
- **Proxmox Cluster:** 2 nodes (ML110 and R630)
- **VMs 100-103:** Deployed on ML110 (can run on either node)
- **Shared Storage:** NFS (when configured) allows VM migration
### How It Works
```
        Proxmox VE Cluster (hc-cluster)

  ┌──────────────┐            ┌──────────────┐
  │    ML110     │◄──────────►│     R630     │
  │   (Node 1)   │  Cluster   │   (Node 2)   │
  │              │  Network   │              │
  └──────┬───────┘            └──────┬───────┘
         │                           │
         └─────────────┬─────────────┘
                       │
                 ┌─────▼─────┐
                 │    NFS    │
                 │  Storage  │
                 │  (Shared) │
                 └─────┬─────┘
                       │
      ┌───────────┬────┴──────┬───────────┐
      │           │           │           │
  ┌───▼────┐  ┌───▼────┐  ┌───▼────┐  ┌───▼────┐
  │ VM 100 │  │ VM 101 │  │ VM 102 │  │ VM 103 │
  │ can run│  │ can run│  │ can run│  │ can run│
  │ on     │  │ on     │  │ on     │  │ on     │
  │ either │  │ either │  │ either │  │ either │
  │ node   │  │ node   │  │ node   │  │ node   │
  └────────┘  └────────┘  └────────┘  └────────┘
```
## Key Concepts
### 1. Cluster = Shared Management, Not Duplication
A Proxmox cluster means:
- **Shared management:** Both nodes managed together
- **Shared storage:** VMs stored on shared storage (NFS)
- **VM migration:** VMs can move between nodes
- **High availability:** If one node fails, VMs can run on the other
**It does NOT mean:**
- ❌ Duplicate VMs on both nodes
- ❌ VMs running simultaneously on both nodes
- ❌ Separate VM instances per node
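The single-instance model above can be verified from the cluster itself. A minimal sketch, assuming shell access to either node (both commands are standard Proxmox VE CLI tools):

```shell
# On either cluster node: confirm both nodes are joined and the cluster is quorate.
pvecm status

# List every VM in the cluster together with the node it currently runs on.
# Each VM ID appears exactly once -- one instance, not one copy per node.
pvesh get /cluster/resources --type vm
```

If a VM ID ever appeared twice in that listing, something would be wrong; the cluster resource view is the authoritative record of where each single instance lives.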
### 2. VM Placement Strategy
**Current Deployment:**
- VMs 100-103 are deployed on ML110
- They can be migrated to R630 if needed
- Only one instance of each VM exists
**Why Deploy on One Node Initially:**
- Simpler initial setup
- ML110 has SSH access configured
- Can migrate later if needed
**When to Migrate:**
- Load balancing (spread VMs across nodes)
- Maintenance (move VMs off node being maintained)
- Failure recovery (automatic or manual migration)
### 3. High Availability Options
#### Option A: Manual Migration (Current Setup)
- VMs run on one node
- Planned moves use live migration; if a node fails outright, its VMs must be restarted on the surviving node
- Requires shared storage (NFS)
#### Option B: HA Groups (Future)
- Configure HA groups in Proxmox
- Automatic failover if node fails
- Requires shared storage and quorum
#### Option C: Load Balancing
- Distribute VMs across both nodes
- Better resource utilization
- Still one instance per VM
## VM Details
### VM 100 - Cloudflare Tunnel
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single instance sufficient, can migrate if needed
### VM 101 - K3s Master
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single K3s master, can migrate if needed
### VM 102 - Git Server
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single Git server, can migrate if needed
### VM 103 - Observability
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single observability stack, can migrate if needed
## When You WOULD Need VMs on Both Servers
### Scenario 1: Separate Environments
- **Dev on ML110, Prod on R630**
- Different VM IDs (e.g., 100-103 on ML110, 200-203 on R630)
- Not a cluster, separate deployments
### Scenario 2: Load Balancing
- **VM 100, 102 on ML110**
- **VM 101, 103 on R630**
- Still one instance per VM, just distributed
### Scenario 3: High Availability Pairs
- **VM 100 primary on ML110, standby on R630**
- Requires application-level HA (not Proxmox)
- More complex setup
## Current Architecture Benefits
### ✅ Advantages of Current Setup
1. **Simplicity:** One deployment, easier management
2. **Resource Efficiency:** No duplicate resource usage
3. **Flexibility:** Can migrate VMs as needed
4. **Cost:** Lower resource requirements
### ⚠️ Considerations
1. **Single Point of Failure:** If ML110 fails, its VMs are offline until they are restarted on R630 (manually, unless HA is configured)
2. **Load Distribution:** All VMs on one node may cause resource contention
3. **Maintenance:** Need to migrate VMs for ML110 maintenance
## Recommendations
### For Current Setup
- **Keep VMs on ML110** (where they are now)
- **Configure shared storage** (NFS) for migration capability
- **Test VM migration** between nodes
- **Monitor resource usage** on ML110
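Configuring the shared NFS storage is the prerequisite for every migration scenario above. A hedged sketch using the standard `pvesm` tool; the storage ID, server address, and export path below are placeholders, not values from this environment:

```shell
# Register an NFS export as cluster-wide shared storage.
# Storage ID (nfs-shared), server IP, and export path are illustrative --
# substitute the actual NFS server details for this deployment.
pvesm add nfs nfs-shared \
  --server 192.168.1.50 \
  --export /export/proxmox \
  --content images,rootdir

# Verify the storage is active and visible on both nodes.
pvesm status
```

Because NFS storage added at the cluster level is shared by default, VM disks placed on it are reachable from both ML110 and R630, which is what makes live migration possible without copying disk data.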
### For Future Optimization
- **Distribute VMs** across both nodes for load balancing:
- ML110: VM 100, 102
- R630: VM 101, 103
- **Configure HA groups** for automatic failover
- **Monitor and balance** resource usage
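The HA-group recommendation can be sketched with Proxmox's `ha-manager` CLI. The group name below is illustrative, and note that a two-node cluster normally needs an external quorum device (QDevice) before HA failover is safe:

```shell
# Define an HA group spanning both nodes (group name is a placeholder).
ha-manager groupadd ha-all --nodes ml110,r630

# Enroll VM 100 under HA management: Proxmox will restart it on the
# surviving node if its current node fails (requires quorum and
# shared storage for the VM's disks).
ha-manager add vm:100 --group ha-all --state started

# Check the HA stack and resource states.
ha-manager status
```

Repeating the `ha-manager add` line for VMs 101-103 would bring the whole stack under automatic failover; until then, recovery after a node failure remains a manual step.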
## Migration Example
### How to Migrate a VM
**Via Web UI:**
1. Select VM → Migrate
2. Choose target node (R630)
3. Start migration
**Via CLI:**
```bash
# Migrate VM 100 from ML110 to R630
qm migrate 100 r630 --online
```
**Via API:**
```bash
# Migrate VM 100 from ML110 to R630 via the cluster API.
# The node name in the URL path is the *source* node; it must match
# the names shown by `pvecm nodes`. Fill in a valid auth cookie and
# CSRF token from a prior /access/ticket login.
curl -k -X POST \
  -H "Cookie: PVEAuthCookie=..." \
  -H "CSRFPreventionToken: ..." \
  -d "target=r630" \
  "https://192.168.1.206:8006/api2/json/nodes/ml110/qemu/100/migrate"
```
## Summary
**VMs 100-103 are NOT required on both servers.** They are:
- ✅ Deployed once (currently on ML110)
- ✅ Stored on shared storage (when NFS configured)
- ✅ Can run on either node in the cluster
- ✅ Can be migrated between nodes as needed
The cluster provides **high availability through migration**, not duplication. This is the standard Proxmox cluster architecture.
---
**If you need VMs on both servers for a specific reason, please clarify the requirement and we can adjust the architecture accordingly.**