# Common Issues and Solutions

This document covers frequently encountered problems and their solutions.

## Proxmox Issues

### Cannot Connect to Proxmox Web UI

**Symptoms:**

- Browser shows connection error
- SSL certificate warning

**Solutions:**

1. Verify IP address and port (default: 8006)
2. Accept self-signed certificate in browser
3. Check firewall rules: `iptables -L -n`
4. Verify the Proxmox proxy service is running: `systemctl status pveproxy`

### VM Won't Start

**Symptoms:**

- VM shows as stopped
- Error messages in logs

**Solutions:**

1. Check VM configuration: `qm config `
2. Verify storage availability: `pvesm status`
3. Check resource limits: `pvesh get /nodes//status`
4. Review VM logs: `journalctl -u qemu-server@`

### Cluster Issues

**Symptoms:**

- Nodes not showing in cluster
- Quorum errors

**Solutions:**

1. Check cluster status: `pvecm status`
2. Verify network connectivity between nodes
3. Check cluster configuration: `cat /etc/pve/corosync.conf`
4. Restart cluster services: `systemctl restart pve-cluster`

## Azure Arc Issues

### Agent Not Connecting

**Symptoms:**

- Machine not appearing in Azure Portal
- Connection errors in logs

**Solutions:**

1. Check agent status: `azcmagent show`
2. Verify network connectivity to Azure: `curl -v https://management.azure.com`
3. Check agent logs: `journalctl -u himdsd -f`
4. Re-register agent: `azcmagent connect --resource-group --tenant-id `

### Policy Not Applying

**Symptoms:**

- Policies not showing as compliant
- Assignment errors

**Solutions:**

1. Verify agent is connected: `azcmagent show`
2. Check policy assignment in Azure Portal
3. Collect agent logs for review: `azcmagent logs`
4. Re-assign policies if needed

## Kubernetes Issues

### Pods Not Starting

**Symptoms:**

- Pods in Pending or CrashLoopBackOff state
- Resource errors

**Solutions:**

1. Check pod status: `kubectl describe pod `
2. Check node resources: `kubectl top nodes`
3. Review pod logs: `kubectl logs `
4. Check events: `kubectl get events --sort-by='.lastTimestamp'`

### Services Not Accessible

**Symptoms:**

- Cannot reach service endpoints
- Connection timeouts

**Solutions:**

1. Check service configuration: `kubectl get svc -o yaml`
2. Verify endpoints: `kubectl get endpoints `
3. Check ingress configuration: `kubectl get ingress`
4. Test from within cluster: `kubectl run test --image=busybox --rm -it -- wget -O- `

## Network Issues

### VLAN Not Working

**Symptoms:**

- VMs cannot communicate on VLAN
- Network isolation not working

**Solutions:**

1. Verify VLAN configuration: `cat /etc/network/interfaces`
2. Check bridge configuration: `ip link show`
3. Verify VLAN tagging: `qm config | grep net`
4. Test VLAN connectivity: `ping `

### DNS Resolution Issues

**Symptoms:**

- Cannot resolve hostnames
- Service discovery not working

**Solutions:**

1. Check DNS configuration: `cat /etc/resolv.conf`
2. Test DNS resolution: `nslookup `
3. Verify CoreDNS in Kubernetes: `kubectl get pods -n kube-system | grep coredns`
4. Check DNS service: `kubectl get svc kube-dns -n kube-system`

## Storage Issues

### Storage Not Available

**Symptoms:**

- Cannot create VMs
- Storage errors

**Solutions:**

1. Check storage status: `pvesm status`
2. Verify storage mounts: `df -h`
3. Check storage permissions: `ls -la /var/lib/vz/`
4. Review storage logs: `journalctl -u pvestatd`

### Performance Issues

**Symptoms:**

- Slow VM performance
- High I/O wait

**Solutions:**

1. Check disk I/O: `iostat -x 1`
2. Verify storage type (SSD vs HDD)
3. Check for disk errors: `dmesg | grep -i error`
4. Consider storage optimization settings

## Cloudflare Tunnel Issues

### Tunnel Not Connecting

**Symptoms:**

- Services not accessible externally
- Tunnel errors in logs

**Solutions:**

1. Check tunnel status: `cloudflared tunnel info`
2. Verify the tunnel token is set: `echo $CLOUDFLARE_TUNNEL_TOKEN`
3. Check tunnel logs: `journalctl -u cloudflared -f`
4. Test tunnel connection: `cloudflared tunnel run `

### Zero Trust Not Working

**Symptoms:**

- Access policies not applying
- SSO not working

**Solutions:**

1. Verify Zero Trust configuration in Cloudflare Dashboard
2. Check policy rules and conditions
3. Review access logs in Cloudflare Dashboard
4. Test with different user accounts

## General Troubleshooting Steps

1. **Check Logs**: Always review relevant logs first
2. **Verify Configuration**: Ensure all configuration files are correct
3. **Test Connectivity**: Verify network connectivity between components
4. **Check Resources**: Ensure sufficient CPU, memory, and storage
5. **Review Documentation**: Check relevant documentation and runbooks
6. **Search Issues**: Look for similar issues in logs or documentation

## Getting Help

If you cannot resolve an issue:

1. Review the relevant runbook in `docs/operations/runbooks/`
2. Check the troubleshooting guide for your specific component
3. Review logs and error messages carefully
4. Document the issue with steps to reproduce
5. Check for known issues in the project repository

## Additional Resources

- [VM Troubleshooting](vm-troubleshooting.md)
- [Proxmox Operations Runbook](../operations/runbooks/proxmox-operations.md)
- [Azure Arc Troubleshooting Runbook](../operations/runbooks/azure-arc-troubleshooting.md)
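
## Example: Scripted First-Pass Checks

Several of the status commands in this guide can be chained into a quick first-pass check. The sketch below is illustrative only: `run_check` and `first_pass_checks` are hypothetical helper names, not part of any tool mentioned here, and the choice of which commands to group together is an assumption. Commands that are not installed on the host are skipped rather than counted as failures.

```shell
#!/usr/bin/env bash
# Illustrative first-pass triage helpers. The function names are hypothetical.

# run_check LABEL COMMAND [ARGS...]
# Runs COMMAND quietly if it is installed and prints a one-line result.
# Returns 0 when the command succeeds or is skipped, 1 when it fails.
run_check() {
  local label="$1"
  shift
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "SKIP: ${label} ($1 not installed)"
    return 0
  fi
  local status=0
  "$@" >/dev/null 2>&1 || status=$?
  if [ "$status" -eq 0 ]; then
    echo "OK: ${label}"
  else
    echo "FAIL: ${label} (exit ${status})"
    return 1
  fi
}

# first_pass_checks
# Runs a handful of status commands from this guide and counts failures.
# The selection of commands is an assumption; adjust it to your environment.
first_pass_checks() {
  local failures=0
  run_check "Proxmox web proxy (pveproxy)" systemctl status pveproxy || failures=$((failures + 1))
  run_check "Proxmox storage" pvesm status || failures=$((failures + 1))
  run_check "Proxmox cluster" pvecm status || failures=$((failures + 1))
  run_check "Kubernetes node resources" kubectl top nodes || failures=$((failures + 1))
  run_check "Mounted filesystems" df -h || failures=$((failures + 1))
  echo "Failed checks: ${failures}"
  [ "$failures" -eq 0 ]
}
```

To try it, save the block to a file (the name `first-pass.sh` is arbitrary), run `source first-pass.sh`, and then call `first_pass_checks`. A `FAIL` line only means the command exited non-zero; rerun that command by hand to see its full output. On a standalone Proxmox node, a failing cluster check may simply mean no cluster is configured.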