# VM Placement Explanation - Why VMs Don't Need to Be on Both Servers
**Date:** 2025-11-27
**Question:** Why are VMs 100-103 required on both servers?
## Short Answer
**VMs 100-103 are NOT required on both servers.** They are deployed once and can run on either node in the Proxmox cluster. The cluster provides high availability through VM migration, not duplication.
## Architecture Overview
### Current Setup
- **Proxmox Cluster:** 2 nodes (ML110 and R630)
- **VMs 100-103:** Deployed on ML110 (can run on either node)
- **Shared Storage:** NFS (when configured) allows VM migration
### How It Works
```
        Proxmox VE Cluster (hc-cluster)

  ┌──────────────┐            ┌──────────────┐
  │    ML110     │◄──────────►│     R630     │
  │   (Node 1)   │  Cluster   │   (Node 2)   │
  │              │  Network   │              │
  └──────┬───────┘            └──────┬───────┘
         │                           │
         └─────────────┬─────────────┘
                       │
                 ┌─────▼─────┐
                 │    NFS    │
                 │  Storage  │
                 │  (Shared) │
                 └─────┬─────┘
                       │
      ┌───────────┬────┴──────┬───────────┐
      │           │           │           │
  ┌───▼────┐  ┌───▼────┐  ┌───▼────┐  ┌───▼────┐
  │ VM 100 │  │ VM 101 │  │ VM 102 │  │ VM 103 │
  │ can run│  │ can run│  │ can run│  │ can run│
  │ on     │  │ on     │  │ on     │  │ on     │
  │ either │  │ either │  │ either │  │ either │
  │ node   │  │ node   │  │ node   │  │ node   │
  └────────┘  └────────┘  └────────┘  └────────┘
```
## Key Concepts
### 1. Cluster = Shared Management, Not Duplication
A Proxmox cluster means:
- **Shared management:** Both nodes managed together
- **Shared storage:** VMs stored on shared storage (NFS)
- **VM migration:** VMs can move between nodes
- **High availability:** If one node fails, VMs can run on the other
**It does NOT mean:**
- ❌ Duplicate VMs on both nodes
- ❌ VMs running simultaneously on both nodes
- ❌ Separate VM instances per node
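The single-instance model above can be verified from the cluster itself. A minimal sketch, assuming shell access to either node (both commands are standard Proxmox VE CLI tools):

```shell
# On either cluster node: confirm both nodes are joined and the cluster is quorate.
pvecm status

# List every VM in the cluster together with the node it currently runs on.
# Each VM ID appears exactly once -- one instance, not one copy per node.
pvesh get /cluster/resources --type vm
```

If a VM ID ever appeared twice in that listing, something would be wrong; the cluster resource view is the authoritative record of where each single instance lives.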
### 2. VM Placement Strategy
**Current Deployment:**
- VMs 100-103 are deployed on ML110
- They can be migrated to R630 if needed
- Only one instance of each VM exists
**Why Deploy on One Node Initially:**
- Simpler initial setup
- ML110 has SSH access configured
- Can migrate later if needed
**When to Migrate:**
- Load balancing (spread VMs across nodes)
- Maintenance (move VMs off node being maintained)
- Failure recovery (automatic or manual migration)
### 3. High Availability Options
#### Option A: Manual Migration (Current Setup)
- VMs run on one node
- Planned moves use live migration; if a node fails outright, its VMs must be restarted on the surviving node
- Requires shared storage (NFS)
#### Option B: HA Groups (Future)
- Configure HA groups in Proxmox
- Automatic failover if node fails
- Requires shared storage and quorum
#### Option C: Load Balancing
- Distribute VMs across both nodes
- Better resource utilization
- Still one instance per VM
## VM Details
### VM 100 - Cloudflare Tunnel
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single instance sufficient, can migrate if needed
### VM 101 - K3s Master
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single K3s master, can migrate if needed
### VM 102 - Git Server
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single Git server, can migrate if needed
### VM 103 - Observability
- **Current Location:** ML110
- **Can Run On:** Either node
- **Why:** Single observability stack, can migrate if needed
## When You WOULD Need VMs on Both Servers
### Scenario 1: Separate Environments
- **Dev on ML110, Prod on R630**
- Different VM IDs (e.g., 100-103 on ML110, 200-203 on R630)
- Not a cluster, separate deployments
### Scenario 2: Load Balancing
- **VM 100, 102 on ML110**
- **VM 101, 103 on R630**
- Still one instance per VM, just distributed
### Scenario 3: High Availability Pairs
- **VM 100 primary on ML110, standby on R630**
- Requires application-level HA (not Proxmox)
- More complex setup
## Current Architecture Benefits
### ✅ Advantages of Current Setup
1. **Simplicity:** One deployment, easier management
2. **Resource Efficiency:** No duplicate resource usage
3. **Flexibility:** Can migrate VMs as needed
4. **Cost:** Lower resource requirements
### ⚠️ Considerations
1. **Single Point of Failure:** If ML110 fails, its VMs are offline until they are restarted on R630 (manually, unless HA is configured)
2. **Load Distribution:** All VMs on one node may cause resource contention
3. **Maintenance:** Need to migrate VMs for ML110 maintenance
## Recommendations
### For Current Setup
- **Keep VMs on ML110** (where they are now)
- **Configure shared storage** (NFS) for migration capability
- **Test VM migration** between nodes
- **Monitor resource usage** on ML110
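Configuring the shared NFS storage is the prerequisite for every migration scenario above. A hedged sketch using the standard `pvesm` tool; the storage ID, server address, and export path below are placeholders, not values from this environment:

```shell
# Register an NFS export as cluster-wide shared storage.
# Storage ID (nfs-shared), server IP, and export path are illustrative --
# substitute the actual NFS server details for this deployment.
pvesm add nfs nfs-shared \
  --server 192.168.1.50 \
  --export /export/proxmox \
  --content images,rootdir

# Verify the storage is active and visible on both nodes.
pvesm status
```

Because NFS storage added at the cluster level is shared by default, VM disks placed on it are reachable from both ML110 and R630, which is what makes live migration possible without copying disk data.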
### For Future Optimization
- **Distribute VMs** across both nodes for load balancing:
- ML110: VM 100, 102
- R630: VM 101, 103
- **Configure HA groups** for automatic failover
- **Monitor and balance** resource usage
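The HA-group recommendation can be sketched with Proxmox's `ha-manager` CLI. The group name below is illustrative, and note that a two-node cluster normally needs an external quorum device (QDevice) before HA failover is safe:

```shell
# Define an HA group spanning both nodes (group name is a placeholder).
ha-manager groupadd ha-all --nodes ml110,r630

# Enroll VM 100 under HA management: Proxmox will restart it on the
# surviving node if its current node fails (requires quorum and
# shared storage for the VM's disks).
ha-manager add vm:100 --group ha-all --state started

# Check the HA stack and resource states.
ha-manager status
```

Repeating the `ha-manager add` line for VMs 101-103 would bring the whole stack under automatic failover; until then, recovery after a node failure remains a manual step.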
## Migration Example
### How to Migrate a VM
**Via Web UI:**
1. Select VM → Migrate
2. Choose target node (R630)
3. Start migration
**Via CLI:**
```bash
# Migrate VM 100 from ML110 to R630
qm migrate 100 r630 --online
```
**Via API:**
```bash
# Migrate VM 100 from ML110 to R630 via the cluster API.
# The node name in the URL path is the *source* node; it must match
# the names shown by `pvecm nodes`. Fill in a valid auth cookie and
# CSRF token from a prior /access/ticket login.
curl -k -X POST \
  -H "Cookie: PVEAuthCookie=..." \
  -H "CSRFPreventionToken: ..." \
  -d "target=r630" \
  "https://192.168.1.206:8006/api2/json/nodes/ml110/qemu/100/migrate"
```
## Summary
**VMs 100-103 are NOT required on both servers.** They are:
- ✅ Deployed once (currently on ML110)
- ✅ Stored on shared storage (when NFS configured)
- ✅ Can run on either node in the cluster
- ✅ Can be migrated between nodes as needed
The cluster provides **high availability through migration**, not duplication. This is the standard Proxmox cluster architecture.
---
**If you need VMs on both servers for a specific reason, please clarify the requirement and we can adjust the architecture accordingly.**