# Common Issues and Solutions

This document covers frequently encountered problems with Proxmox, Azure Arc, Kubernetes, networking, storage, and Cloudflare Tunnel, and how to resolve them.
## Proxmox Issues

### Cannot Connect to Proxmox Web UI

**Symptoms:**

- Browser shows a connection error
- Browser shows an SSL certificate warning

**Solutions:**

1. Verify the IP address and port (default: 8006), for example `https://<node-ip>:8006`
2. Accept the self-signed certificate in the browser (the warning is expected with the default certificate)
3. Check firewall rules: `iptables -L -n` (and `pve-firewall status` if the Proxmox firewall is enabled)
4. Verify the Proxmox web proxy service: `systemctl status pveproxy`
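
Running the same request from another machine helps separate a network problem from a service problem. The sketch below assumes `curl` is installed; `check_pve_ui` is a hypothetical helper, not a Proxmox tool, and `192.0.2.10` is a placeholder address.

```bash
# Hypothetical helper: report whether the Proxmox web UI answers over HTTPS.
# Usage: check_pve_ui <host> [port]   (port defaults to 8006)
check_pve_ui() {
  local host="$1" port="${2:-8006}" code
  # -k accepts the self-signed certificate, -o /dev/null discards the body,
  # -w prints only the HTTP status code ("000" when no connection was made).
  code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "https://${host}:${port}/") || code="000"
  if [ "$code" = "000" ]; then
    echo "no HTTPS response from ${host}:${port}"
    return 1
  fi
  echo "web UI answered on ${host}:${port} with HTTP ${code}"
}
```

For example, run `check_pve_ui 192.0.2.10` from a workstation and `check_pve_ui 127.0.0.1` on the node itself. An answer on the node but not from the workstation points at the firewall or routing; no answer in either place points at `pveproxy`.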
### VM Won't Start

**Symptoms:**

- VM shows as stopped
- Error messages in logs

**Solutions:**

1. Check the VM configuration: `qm config <vmid>`
2. Verify storage availability: `pvesm status`
3. Check node resource usage (CPU, memory): `pvesh get /nodes/<node>/status`
4. Start the VM from the shell to see the error directly: `qm start <vmid>`; then review the journal for the current boot: `journalctl -b | grep <vmid>`
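
A VM often fails to start because one of its disks lives on a storage that is not active on the current node. The sketch below cross-references the disk entries from `qm config` with `pvesm status`. `vm_storage_check` is a hypothetical helper, and the disk keys it matches (`scsiN`, `virtioN`, `sataN`, `ideN`, `efidiskN`, `tpmstateN`) and the `pvesm status` column order are assumptions to verify against your Proxmox version.

```bash
# Hypothetical helper: list the storages a VM's disks live on and whether
# `pvesm status` reports each one as active on this node.
# Usage: vm_storage_check <vmid>
vm_storage_check() {
  local vmid="$1" sid status rc=0
  # Disk lines look like "scsi0: local-lvm:vm-100-disk-0,size=32G";
  # the storage ID is the part of the value before its first colon.
  for sid in $(qm config "$vmid" | awk -F': ' '
      /^(scsi|virtio|sata|ide|efidisk|tpmstate)[0-9]+: / {
        split($2, parts, ":")
        # skip "none" (no colon) and raw device paths starting with "/"
        if (parts[1] != $2 && parts[1] !~ /^\//) print parts[1]
      }' | sort -u); do
    # Assumed pvesm status columns: Name Type Status Total Used Available %
    status=$(pvesm status | awk -v s="$sid" 'NR > 1 && $1 == s { print $3 }')
    echo "${sid}: ${status:-not listed on this node}"
    [ "$status" = "active" ] || rc=1
  done
  return "$rc"
}
```

`vm_storage_check 100` prints one line per storage and returns non-zero if any of them is not active.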
### Cluster Issues

**Symptoms:**

- Nodes not showing in the cluster
- Quorum errors

**Solutions:**

1. Check cluster status: `pvecm status`
2. Verify network connectivity between nodes (corosync traffic must reach every node)
3. Check the cluster configuration: `cat /etc/pve/corosync.conf`
4. Review the corosync log (`journalctl -u corosync`), then restart the cluster services: `systemctl restart corosync pve-cluster`
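
Before restarting services, it is worth confirming whether the node has actually lost quorum. The sketch assumes `pvecm status` prints a `Quorate:` line in its quorum information section; `is_quorate` is a hypothetical helper.

```bash
# Hypothetical helper: read `pvecm status` output on stdin and report quorum.
# Usage: pvecm status | is_quorate
is_quorate() {
  awk -F':' '
    $1 ~ /^Quorate/ { q = $2; gsub(/[[:space:]]/, "", q) }
    END {
      if (q == "Yes") { print "quorate"; exit 0 }
      print "NOT quorate (Quorate: " (q == "" ? "field missing" : q) ")"
      exit 1
    }'
}
```

Without quorum the cluster filesystem mounted at `/etc/pve` becomes read-only, so configuration changes fail until quorum returns; restoring connectivity between the nodes usually matters more than restarting services.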
## Azure Arc Issues

### Agent Not Connecting

**Symptoms:**

- Machine not appearing in the Azure Portal
- Connection errors in logs

**Solutions:**

1. Check the agent status: `azcmagent show`
2. Verify network connectivity to the required Azure endpoints: `azcmagent check` (or a quick manual test: `curl -v https://management.azure.com`)
3. Check the agent logs: `journalctl -u himdsd -f` or the files under `/var/opt/azcmagent/log/`
4. Re-register the agent: `azcmagent disconnect`, then `azcmagent connect --resource-group <rg> --tenant-id <tenant> --subscription-id <subscription> --location <region>`
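
For scripts or monitoring, the agent state can be checked programmatically instead of by eye. The sketch assumes `azcmagent show` prints an `Agent Status : <value>` line; the exact layout may differ between agent versions. `arc_agent_connected` is a hypothetical helper.

```bash
# Hypothetical helper: read `azcmagent show` output on stdin and succeed
# only when the "Agent Status" field is "Connected".
# Usage: azcmagent show | arc_agent_connected || azcmagent check
arc_agent_connected() {
  local status
  status=$(awk -F':' '
    $1 ~ /^Agent Status[[:space:]]*$/ {
      v = $2
      for (i = 3; i <= NF; i++) v = v ":" $i   # re-join values that contain ":"
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
      print v
      exit
    }')
  echo "Agent Status: ${status:-unknown}"
  [ "$status" = "Connected" ]
}
```

Because the helper reports through its exit status, it can gate a follow-up connectivity check or an alert in a cron job.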
### Policy Not Applying

**Symptoms:**

- Policies not showing as compliant
- Assignment errors

**Solutions:**

1. Verify the agent is connected: `azcmagent show`
2. Check the policy assignment and its scope in the Azure Portal
3. Review the guest configuration service logs: `journalctl -u gcad`
4. Trigger a new compliance scan (`az policy state trigger-scan --resource-group <rg>`) or re-assign the policies if needed
## Kubernetes Issues

### Pods Not Starting

**Symptoms:**

- Pods in `Pending` or `CrashLoopBackOff` state
- Resource errors

**Solutions:**

1. Check pod status and events: `kubectl describe pod <pod-name> -n <namespace>`
2. Check node resource usage: `kubectl top nodes` (requires metrics-server)
3. Review pod logs: `kubectl logs <pod-name> -n <namespace>` (add `--previous` to see the output of the last crashed container)
4. Check events: `kubectl get events --sort-by='.lastTimestamp'`
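
On a cluster with many pods, filtering the list first narrows down what to describe. A crash-looping pod's phase is still `Running`, so the STATUS column is more telling than a `status.phase` field selector. The sketch assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, ...); `unhealthy_pods` is a hypothetical helper.

```bash
# Hypothetical helper: read `kubectl get pods -A --no-headers` on stdin and print
# "namespace/name STATUS READY" for pods that are neither Running nor Completed,
# plus Running pods whose containers are not all ready (e.g. 0/1).
# Usage: kubectl get pods -A --no-headers | unhealthy_pods
unhealthy_pods() {
  awk '{
    split($3, ready, "/")
    if (($4 != "Running" && $4 != "Completed") || ($4 == "Running" && ready[1] != ready[2]))
      print $1 "/" $2, $4, $3
  }'
}
```

For each pod it prints, the Events section of `kubectl describe pod <pod-name> -n <namespace>` usually names the scheduling, image-pull, or probe failure.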
### Services Not Accessible

**Symptoms:**

- Cannot reach service endpoints
- Connection timeouts

**Solutions:**

1. Check the service configuration: `kubectl get svc <service-name> -o yaml`
2. Verify endpoints: `kubectl get endpoints <service-name>`
3. Check the ingress configuration: `kubectl get ingress`
4. Test from within the cluster: `kubectl run test --image=busybox --rm -it --restart=Never -- wget -qO- http://<service-name>.<namespace>.svc.cluster.local:<port>`
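
A frequent cause of timeouts is a Service whose selector matches no Ready pod, which leaves it without endpoints. The sketch below checks for that case and prints the selector for comparison with the pod labels. `check_service_endpoints` is a hypothetical helper; it reads the classic `Endpoints` object, and newer clusters expose the same data through EndpointSlices (`kubectl get endpointslices -l kubernetes.io/service-name=<service-name>`).

```bash
# Hypothetical helper: report the ready endpoint IPs of a Service.
# Usage: check_service_endpoints <service-name> [namespace]
check_service_endpoints() {
  local svc="$1" ns="${2:-default}" ips
  # Only ready addresses are listed under .subsets[*].addresses.
  ips=$(kubectl get endpoints "$svc" -n "$ns" -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "${ns}/${svc}: no ready endpoints; compare the selector with the pod labels:"
    kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}'
    echo
    return 1
  fi
  echo "${ns}/${svc}: endpoints ${ips}"
}
```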
## Network Issues

### VLAN Not Working

**Symptoms:**

- VMs cannot communicate on a VLAN
- Network isolation not working

**Solutions:**

1. Verify the VLAN configuration: `cat /etc/network/interfaces`
2. Check the bridge configuration: `ip -d link show` (and `bridge vlan show` for VLAN-aware bridges)
3. Verify the VM's VLAN tag (`tag=` on its network device): `qm config <vmid> | grep net`
4. Test VLAN connectivity: `ping <vlan-ip>`
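
When VM network devices carry a `tag=` but traffic does not pass, one thing to confirm is which bridge mode is configured. The sketch looks for `bridge-vlan-aware yes` inside the bridge's `iface` stanza in `/etc/network/interfaces`. `bridge_is_vlan_aware` is a hypothetical helper; it assumes the hyphenated option spelling and does not follow `source` includes.

```bash
# Hypothetical helper: succeed if the bridge's stanza contains "bridge-vlan-aware yes".
# Usage: bridge_is_vlan_aware vmbr0 [/etc/network/interfaces]
bridge_is_vlan_aware() {
  local bridge="$1" file="${2:-/etc/network/interfaces}"
  awk -v br="$bridge" '
    $1 == "iface" { in_stanza = ($2 == br); next }   # a new stanza starts
    $1 == "auto" || $1 ~ /^allow-/ || $1 ~ /^source/ { in_stanza = 0; next }
    in_stanza && $1 == "bridge-vlan-aware" && $2 == "yes" { found = 1 }
    END { exit (found ? 0 : 1) }' "$file"
}
```

If the bridge is VLAN-aware, `bridge vlan show` should list the VM's VLAN ID on its `tap<vmid>i<n>` interface; the upstream switch port must also carry that VLAN tagged.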
### DNS Resolution Issues

**Symptoms:**

- Cannot resolve hostnames
- Service discovery not working

**Solutions:**

1. Check DNS configuration: `cat /etc/resolv.conf`
2. Test DNS resolution: `nslookup <hostname>`
3. Verify CoreDNS in Kubernetes: `kubectl get pods -n kube-system | grep coredns`
4. Check DNS service: `kubectl get svc kube-dns -n kube-system`
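
`nslookup` typically queries DNS servers directly and bypasses `/etc/hosts`, so it can succeed while applications still fail, or the reverse. The sketch below resolves a name the way most glibc-based programs do, through `getent hosts`; `resolve_check` is a hypothetical helper.

```bash
# Hypothetical helper: resolve a name through the system resolver (NSS: /etc/hosts, DNS, ...).
# Usage: resolve_check <hostname>
resolve_check() {
  local name="$1" answer
  if answer=$(getent hosts "$name"); then
    # getent prints "<address> <canonical-name> [aliases...]"; show the first address.
    echo "${name} resolves to $(printf '%s\n' "$answer" | awk 'NR == 1 { print $1 }')"
  else
    echo "${name} does not resolve"
    return 1
  fi
}
```

Inside Kubernetes, run the equivalent lookup from a pod, for example `kubectl run dns-test --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default`.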
## Storage Issues

### Storage Not Available

**Symptoms:**

- Cannot create VMs
- Storage errors

**Solutions:**

1. Check storage status: `pvesm status`
2. Verify storage mounts: `df -h`
3. Check storage permissions: `ls -la /var/lib/vz/`
4. Review storage logs: `journalctl -u pvestatd`
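
A storage that `pvesm status` lists as active can still refuse new disks when the filesystem behind it is full. The sketch flags filesystems above a usage threshold from `df -P` output (the POSIX format, one line per filesystem). `full_filesystems` is a hypothetical helper and assumes mount points without spaces.

```bash
# Hypothetical helper: read `df -P` output on stdin and print "<mount> <use%>"
# for filesystems at or above the threshold (default 90%).
# Usage: df -P | full_filesystems 85
full_filesystems() {
  local threshold="${1:-90}"
  # Column 5 is "Capacity" (e.g. 93%), column 6 the mount point.
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%$/, "", use)
    if ((use + 0) >= (t + 0)) print $6, $5
  }'
}
```

LVM-thin pools such as `local-lvm` do not appear in `df`; check their `Data%` and `Meta%` columns with `lvs` instead.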
### Performance Issues

**Symptoms:**

- Slow VM performance
- High I/O wait

**Solutions:**

1. Check disk I/O: `iostat -x 1` (provided by the `sysstat` package)
2. Verify the storage type (SSD vs HDD)
3. Check for disk errors: `dmesg | grep -i error`
4. Review the VM disk options shown by `qm config <vmid>`, such as the SCSI controller type and the `iothread`, `cache`, and `discard` settings
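
If `iostat` is not installed, a comparable I/O wait figure can be derived from two samples of the aggregate `cpu` line at the top of `/proc/stat`, whose fifth value is iowait. In this sketch `cpu_iowait_sample` and `iowait_percent` are hypothetical helpers that count user through steal as total time. iowait is only an indicator (CPUs idle while I/O is outstanding), not a direct measure of disk saturation.

```bash
# Hypothetical helper: print "iowait total" (in clock ticks) from the first line of
# /proc/stat, the aggregate "cpu" line. Its first eight values are user nice system
# idle iowait irq softirq steal; guest time is already counted in user, so it is skipped.
cpu_iowait_sample() {
  local label user nice system idle iowait irq softirq steal rest
  read -r label user nice system idle iowait irq softirq steal rest < /proc/stat
  echo "$iowait $((user + nice + system + idle + iowait + irq + softirq + steal))"
}

# Hypothetical helper: whole-percent share of CPU time spent in iowait over an interval.
# Usage: iowait_percent [seconds]   (default 1)
iowait_percent() {
  local interval="${1:-1}" s1 s2 dio dt v=0
  s1=$(cpu_iowait_sample)
  sleep "$interval"
  s2=$(cpu_iowait_sample)
  dio=$(( ${s2% *} - ${s1% *} ))   # iowait ticks during the interval
  dt=$(( ${s2#* } - ${s1#* } ))    # all counted ticks during the interval
  if [ "$dt" -gt 0 ]; then
    v=$(( 100 * dio / dt ))
  fi
  # The kernel documents that the iowait counter can decrease; clamp at 0.
  if [ "$v" -lt 0 ]; then
    v=0
  fi
  echo "$v"
}
```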
## Cloudflare Tunnel Issues

### Tunnel Not Connecting

**Symptoms:**

- Services not accessible externally
- Tunnel errors in logs

**Solutions:**

1. Check tunnel status: `cloudflared tunnel info <tunnel-name>`
2. Verify the tunnel token is set without printing the secret: `test -n "$CLOUDFLARE_TUNNEL_TOKEN" && echo set || echo unset`
3. Check tunnel logs: `journalctl -u cloudflared -f`
4. Test the tunnel connection in the foreground: `cloudflared tunnel run <tunnel-name>`
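
The token in your interactive shell is not necessarily the one the `cloudflared` service uses. The sketch checks that the unit is running and shows how the token is passed to it, with the value masked. `check_cloudflared_service` is a hypothetical helper; it assumes the token appears either as `--token <value>` on the `ExecStart` line (as with a unit created by `cloudflared service install <token>`) or as a `TUNNEL_TOKEN=` environment entry.

```bash
# Hypothetical helper: confirm the cloudflared unit is active and show how the
# token reaches it, replacing the secret with "<redacted>".
# Usage: check_cloudflared_service [unit]   (unit defaults to cloudflared)
check_cloudflared_service() {
  local unit="${1:-cloudflared}"
  if systemctl is-active --quiet "$unit"; then
    echo "${unit} is active"
  else
    echo "${unit} is not active; recent log lines:"
    journalctl -u "$unit" -n 20 --no-pager
    return 1
  fi
  systemctl cat "$unit" | grep -E 'ExecStart|Environment' |
    sed -E 's/(--token[ =]|TUNNEL_TOKEN=)[^ "]+/\1<redacted>/g'
}
```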
### Zero Trust Not Working

**Symptoms:**

- Access policies not applying
- SSO not working

**Solutions:**

1. Verify Zero Trust configuration in Cloudflare Dashboard
2. Check policy rules and conditions
3. Review access logs in Cloudflare Dashboard
4. Test with different user accounts
## General Troubleshooting Steps

1. **Check Logs**: Always review relevant logs first
2. **Verify Configuration**: Ensure all configuration files are correct
3. **Test Connectivity**: Verify network connectivity between components
4. **Check Resources**: Ensure sufficient CPU, memory, and storage
5. **Review Documentation**: Check relevant documentation and runbooks
6. **Search Issues**: Look for similar issues in logs or documentation
## Getting Help

If you cannot resolve an issue:

1. Review the relevant runbook in `docs/operations/runbooks/`
2. Check the troubleshooting guide for your specific component
3. Review logs and error messages carefully
4. Document the issue with steps to reproduce
5. Check for known issues in the project repository
## Additional Resources

- [VM Troubleshooting](vm-troubleshooting.md)
- [Proxmox Operations Runbook](../operations/runbooks/proxmox-operations.md)
- [Azure Arc Troubleshooting Runbook](../operations/runbooks/azure-arc-troubleshooting.md)