Proxmox VE Fix Complete - pve and pve2
Date: 2025-01-20
Status: ✅ ALL ISSUES RESOLVED
Issues Fixed
Root Cause
The primary issue was a hostname resolution failure. The pve-cluster service on each host could not resolve its node name ("pve" or "pve2") to a non-loopback IP address. As a result:
- the pve-cluster service failed
- the /etc/pve filesystem was not mounted
- the SSL certificates (which live under /etc/pve) were not accessible
- pveproxy workers kept crashing
Error Message
```
Unable to resolve node name 'pve' to a non-loopback IP address - missing entry in '/etc/hosts' or DNS?
```
Fixes Applied
1. Hostname Resolution Fix
Script: scripts/fix-proxmox-hostname-resolution.sh
What it did:
- Added proper entries to /etc/hosts on both hosts
- Ensured hostnames resolve to their actual IP addresses (not loopback)
- Added both the current hostname (pve/pve2) and the correct hostname (r630-01/r630-02)
Results:
- ✅ pve-cluster service started successfully on both hosts
- ✅ /etc/pve filesystem is now mounted
- ✅ SSL certificates are accessible
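The /etc/hosts change described above can be sketched as an idempotent shell function. This is a hypothetical illustration, not the contents of scripts/fix-proxmox-hostname-resolution.sh: the function name `ensure_hosts_entry` is an assumption, and it assumes the standard hosts-file layout (an IP followed by whitespace-separated names). It removes the node names from any 127.x.x.x line (keeping entries such as `localhost`), replaces any existing line for the target IP, and appends one correct line, so running it twice leaves the file unchanged.

```bash
# Hypothetical helper (not taken from the repository scripts).
# Usage: ensure_hosts_entry FILE IP NAME [NAME...]
ensure_hosts_entry() {
    local file="$1" ip="$2"
    shift 2
    local names="$*"
    local tmp
    tmp=$(mktemp) || return 1
    awk -v ip="$ip" -v names="$names" '
        BEGIN { n = split(names, want, " ") }
        # Drop the old line for the target IP; a fresh one is appended below.
        $1 == ip { next }
        # On loopback lines, remove only the node names and keep e.g. localhost.
        $1 ~ /^127\./ {
            out = $1; kept = 0
            for (i = 2; i <= NF; i++) {
                drop = 0
                for (j = 1; j <= n; j++) if ($i == want[j]) drop = 1
                if (!drop) { out = out " " $i; kept++ }
            }
            if (kept > 0) print out
            next
        }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Append the single authoritative entry for this node.
    printf '%s %s\n' "$ip" "$names" >> "$tmp"
    # Write back in place so the original owner and permissions are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For example, on pve: `ensure_hosts_entry /etc/hosts 192.168.11.11 pve pve.sankofa.nexus r630-01 r630-01.sankofa.nexus`. Whitespace on rewritten loopback lines is normalised to single spaces.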
2. SSL and Cluster Service Fix
Script: scripts/fix-proxmox-ssl-cluster.sh
What it did:
- Regenerated SSL certificates
- Restarted all Proxmox services in the correct order (pve-cluster first)
- Verified service status
Results:
- ✅ All services running
- ✅ Web interface accessible (HTTP 200)
- ✅ No worker exit errors
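A restart sequence in the order described above might look like the following sketch. It is not the contents of scripts/fix-proxmox-ssl-cluster.sh; it assumes the certificates are regenerated with `pvecm updatecerts --force` (a standard Proxmox VE command, though the script may do this differently) and that pve-cluster must be up before the daemons that read /etc/pve. The `run` wrapper and the `DRY_RUN` variable are hypothetical conveniences that print the commands instead of executing them.

```bash
# Hypothetical sketch of the restart order; set DRY_RUN=1 to only print the commands.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

restart_proxmox_services() {
    # 1. pmxcfs must be running first: certificates and configs live in /etc/pve.
    run systemctl restart pve-cluster || return 1
    # 2. Regenerate the node SSL certificates (needs /etc/pve mounted).
    run pvecm updatecerts --force || return 1
    # 3. Restart the daemons that depend on /etc/pve; the proxy goes last.
    for svc in pvedaemon pvestatd pveproxy; do
        run systemctl restart "$svc" || return 1
    done
    # 4. Report the resulting state of each unit (one line per unit).
    run systemctl is-active pve-cluster pvedaemon pvestatd pveproxy
}
```

Running `DRY_RUN=1 restart_proxmox_services` first shows the exact sequence without touching the node.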
Current Status
pve (192.168.11.11 - r630-01)
| Service | Status | Notes |
|---|---|---|
| pve-cluster | ✅ Active (running) | Cluster filesystem mounted |
| pvestatd | ✅ Active (running) | Status daemon working |
| pvedaemon | ✅ Active (running) | API daemon working |
| pveproxy | ✅ Active (running) | Web interface accessible |
| Web Interface | ✅ Accessible | HTTP Status: 200 |
| Port 8006 | ✅ Listening | Workers running normally |
pve2 (192.168.11.12 - r630-02)
| Service | Status | Notes |
|---|---|---|
| pve-cluster | ✅ Active (running) | Cluster filesystem mounted |
| pvestatd | ✅ Active (running) | Status daemon working |
| pvedaemon | ✅ Active (running) | API daemon working |
| pveproxy | ✅ Active (running) | Web interface accessible |
| Web Interface | ✅ Accessible | HTTP Status: 200 |
| Port 8006 | ✅ Listening | Workers running normally |
/etc/hosts Configuration
pve (192.168.11.11)
```
192.168.11.11 pve pve.sankofa.nexus r630-01 r630-01.sankofa.nexus
```
pve2 (192.168.11.12)
```
192.168.11.12 pve2 pve2.sankofa.nexus r630-02 r630-02.sankofa.nexus
```
Key Point: Each hostname (pve/pve2) must resolve to the host's actual IP address (192.168.11.11 and 192.168.11.12 respectively), not to a loopback address such as 127.0.0.1 (or any other 127.x.x.x address). pve-cluster cannot start otherwise.
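The key point can be checked on a node by looking up its name and rejecting any loopback answer. This is a hypothetical sketch (the function name `check_node_resolution` is not from the scripts in this document); it uses `getent hosts`, which follows /etc/nsswitch.conf, so the answer reflects the system resolver configuration, /etc/hosts included.

```bash
# Hypothetical check: does NAME (default: this host) resolve to a non-loopback address?
# Prints the address and returns 0 on success; prints an error and returns 1 otherwise.
check_node_resolution() {
    local name="${1:-$(hostname)}"
    local addr
    # Take the address column from the first result only.
    addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
    case "$addr" in
        "")        echo "ERROR: '$name' does not resolve" >&2; return 1 ;;
        127.*|::1) echo "ERROR: '$name' resolves to loopback address $addr" >&2; return 1 ;;
        *)         echo "$addr" ;;
    esac
}
```

On pve, `check_node_resolution pve` should print 192.168.11.11.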
Cluster Status
Both nodes are members of the same cluster:
- Cluster Name: h
- Config Version: 3
- Transport: knet
- Status: Operational
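The cluster details above correspond to fields printed by `pvecm status`. The sketch below extracts the name, config version, transport and quorum flag from that output. It assumes the usual `Key: value` layout of the "Cluster information" and "Quorum information" sections (including a `Quorate:` line); the function name `summarize_pvecm_status` is hypothetical, and the field names should be checked against the output on your own nodes.

```bash
# Hypothetical parser for `pvecm status` output read from stdin.
# Prints a one-line summary; exits 0 only if the cluster reports Quorate: Yes.
summarize_pvecm_status() {
    awk -F':' '
        # Split on the first colon only; values such as dates contain colons too.
        {
            key = $1
            val = substr($0, length($1) + 2)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        key == "Name"           { name = val }
        key == "Config Version" { version = val }
        key == "Transport"      { transport = val }
        key == "Quorate"        { quorate = val }
        END {
            printf "name=%s version=%s transport=%s quorate=%s\n", name, version, transport, quorate
            exit (quorate == "Yes" ? 0 : 1)
        }
    '
}
```

Usage: `ssh root@192.168.11.11 pvecm status | summarize_pvecm_status`.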
Verification
Web Interface Access
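```bash
# pve: print only the HTTP status code
curl -k -s -o /dev/null -w '%{http_code}\n' https://192.168.11.11:8006/
# Expected: 200 ✅
# pve2
curl -k -s -o /dev/null -w '%{http_code}\n' https://192.168.11.12:8006/
# Expected: 200 ✅
```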
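To check both hosts in one pass and fail on anything other than 200, a small loop around the same curl call is enough. This is a sketch; the function name `check_web_ui` is hypothetical, and `-k` is kept because the nodes present certificates that clients do not trust by default.

```bash
# Hypothetical helper: check the web interface on each given host.
check_web_ui() {
    local failed=0 host code
    for host in "$@"; do
        # -s: quiet, -o /dev/null: discard the body, -w: print only the status code.
        # curl prints 000 when it cannot connect at all.
        code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$host:8006/")
        if [ "$code" = 200 ]; then
            echo "$host: OK (HTTP $code)"
        else
            echo "$host: FAIL (HTTP $code)"
            failed=1
        fi
    done
    return "$failed"
}
```

Usage: `check_web_ui 192.168.11.11 192.168.11.12`.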
Service Status
```bash
# Check services on pve
ssh root@192.168.11.11 "systemctl status pve-cluster pvestatd pvedaemon pveproxy"
# Check services on pve2
ssh root@192.168.11.12 "systemctl status pve-cluster pvestatd pvedaemon pveproxy"
```
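`systemctl status` output is meant for people. For a scripted yes/no answer, `systemctl is-active` prints one state per unit; the sketch below runs it per service over SSH. The function name `check_pve_services` is hypothetical, and the `SSH` variable exists only so the remote call can be swapped out in a test.

```bash
# Services checked, in the same order as the status tables above.
PVE_SERVICES="pve-cluster pvestatd pvedaemon pveproxy"

# Hypothetical helper: returns 0 only if every service on HOST is active.
check_pve_services() {
    local host="$1" svc state failed=0
    for svc in $PVE_SERVICES; do
        # is-active prints a state such as "active", "inactive" or "failed".
        state=$(${SSH:-ssh} "root@$host" systemctl is-active "$svc" 2>/dev/null)
        echo "$host $svc: ${state:-unknown}"
        [ "$state" = active ] || failed=1
    done
    return "$failed"
}
```

Usage: `check_pve_services 192.168.11.11 && check_pve_services 192.168.11.12`.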
No Worker Exits
```bash
# Check for worker exit errors in the last 50 pveproxy log lines
ssh root@192.168.11.11 "journalctl -u pveproxy -n 50 | grep 'worker exit'"
# Expected: no output (no recent worker exit errors) ✅
```
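A count over a time window is easier to act on than an empty grep. The sketch below counts pveproxy journal lines containing "worker exit" (the same pattern as the check above) since a given time; the function name `count_worker_exits` and the one-hour default are hypothetical choices.

```bash
# Hypothetical helper: count pveproxy "worker exit" messages since a given time.
count_worker_exits() {
    local since="${1:-1 hour ago}"
    # journalctl --since accepts relative times such as "1 hour ago".
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps the status 0.
    journalctl -u pveproxy --since "$since" --no-pager 2>/dev/null | grep -c 'worker exit' || true
}
```

Run it on a node, for example `count_worker_exits '24 hours ago'`; a healthy node should print 0.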
Scripts Created
- scripts/diagnose-proxmox-hosts.sh - Comprehensive diagnostic tool
  - Tests connectivity, SSH, and all Proxmox services
  - Usage: `./scripts/diagnose-proxmox-hosts.sh [pve|pve2|both]`
- scripts/fix-proxmox-hostname-resolution.sh - Fixes hostname resolution issues
  - Updates /etc/hosts with correct entries
  - Usage: `./scripts/fix-proxmox-hostname-resolution.sh`
- scripts/fix-proxmox-ssl-cluster.sh - Fixes SSL and cluster service issues
  - Regenerates certificates and restarts services
  - Usage: `./scripts/fix-proxmox-ssl-cluster.sh [pve|pve2|both]`
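Two of the scripts take a `[pve|pve2|both]` target argument. One way such an argument could be mapped to the host addresses used in this document is sketched below; the function name `resolve_targets` and the default of `both` are assumptions, and the real scripts may parse their arguments differently.

```bash
# Hypothetical mapping from the target argument to host IPs (default assumed: both).
resolve_targets() {
    case "${1:-both}" in
        pve)  echo "192.168.11.11" ;;
        pve2) echo "192.168.11.12" ;;
        both) echo "192.168.11.11 192.168.11.12" ;;
        *)    echo "usage: $0 [pve|pve2|both]" >&2; return 2 ;;
    esac
}
```

A caller would then loop over the result, for example `for host in $(resolve_targets "$1"); do ssh "root@$host" hostname; done`.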
Lessons Learned
- Hostname Resolution is Critical
  - Proxmox VE requires each node's hostname to resolve to a non-loopback IP address
  - /etc/hosts must have proper entries
  - DNS alone may not be sufficient
- Service Dependencies
  - pve-cluster must be running before the other services
  - The /etc/pve filesystem must be mounted for the SSL certificates to be accessible
  - Services must be started in the correct order
- Cluster Filesystem
  - pmxcfs (Proxmox Cluster File System) is required
  - It provides /etc/pve as a FUSE filesystem
  - Without it, SSL certificates and configuration are inaccessible
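Whether pmxcfs is actually serving /etc/pve can be confirmed from the mount table before looking at certificates. The sketch assumes /etc/pve is listed in /proc/mounts with a filesystem type starting with `fuse` (how a FUSE mount normally appears); the function name `pmxcfs_mounted` and its optional mounts-file argument are hypothetical, the latter only to make the check testable.

```bash
# Hypothetical check: returns 0 if /etc/pve is mounted as a FUSE filesystem.
pmxcfs_mounted() {
    local mounts="${1:-/proc/mounts}"
    # /proc/mounts fields: device, mount point, fs type, options, dump, pass.
    awk '$2 == "/etc/pve" && $3 ~ /^fuse/ { found = 1 } END { exit !found }' "$mounts"
}
```

Usage: `pmxcfs_mounted || echo "/etc/pve is not mounted; check pve-cluster"`.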
Next Steps
- ✅ Monitor Services
  - Watch for any worker exit errors
  - Verify web interface remains accessible
- Consider Hostname Migration
  - Current hostnames: pve, pve2
  - Correct hostnames: r630-01, r630-02
  - Migration can be done later if needed (see HOSTNAME_MIGRATION_GUIDE.md)
- Document Cluster Configuration
  - Document cluster setup
  - Note any cluster-specific requirements
Related Documentation
- Proxmox Issues Analysis - Original issue analysis
- Hostname Migration Guide - How to change hostnames
- R630-04 Troubleshooting - Similar issues on r630-04
Last Updated: 2025-01-20
Status: ✅ All Issues Resolved
Both hosts are now fully operational!