Files
Sankofa/docs/runbooks/PROXMOX_TROUBLESHOOTING.md
defiQUG 9daf1fd378 Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
- Add comprehensive database migrations (001-024) for schema evolution
- Enhance API schema with expanded type definitions and resolvers
- Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth
- Implement new services: AI optimization, billing, blockchain, compliance, marketplace
- Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage)
- Update Crossplane provider with enhanced VM management capabilities
- Add comprehensive test suite for API endpoints and services
- Update frontend components with improved GraphQL subscriptions and real-time updates
- Enhance security configurations and headers (CSP, CORS, etc.)
- Update documentation and configuration files
- Add new CI/CD workflows and validation scripts
- Implement design system improvements and UI enhancements
2025-12-12 18:01:35 -08:00

273 lines
6.1 KiB
Markdown

# Proxmox Troubleshooting Guide
## Common Issues and Solutions
### Provider Not Connecting
#### Symptoms
- Provider logs show connection errors
- ProviderConfig status is not Ready
- VM creation fails with connection errors
#### Solutions
1. **Verify Endpoint**:
```bash
curl -k https://your-proxmox:8006/api2/json/version
```
2. **Check Credentials**:
```bash
kubectl get secret proxmox-credentials -n crossplane-system -o yaml
```
3. **Test Authentication**:
```bash
curl -k -X POST \
-d "username=root@pam&password=your-password" \
https://your-proxmox:8006/api2/json/access/ticket
```
4. **Check Provider Logs**:
```bash
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=100
```
### VM Creation Fails
#### Symptoms
- VM resource stuck in Creating state
- Error messages in VM resource status
- No VM appears in Proxmox
#### Solutions
1. **Check VM Resource**:
```bash
kubectl describe proxmoxvm <vm-name>
```
2. **Verify Site Configuration**:
- Site must exist in ProviderConfig
- Endpoint must be reachable
- Node name must match actual Proxmox node
3. **Check Proxmox Resources**:
- Storage pool must exist
- Network bridge must exist
- OS template must exist
4. **Check Proxmox Logs**:
- Log into Proxmox Web UI
- Check System Log
- Review task history
### VM Status Not Updating
#### Symptoms
- VM status remains unknown
- IP address not populated
- State not reflecting actual VM state
#### Solutions
1. **Check Provider Connectivity**:
```bash
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox | grep -i error
```
2. **Verify VM Exists in Proxmox**:
- Check Proxmox Web UI
- Verify VM ID matches
3. **Check Reconciliation**:
```bash
kubectl get proxmoxvm <vm-name> -o yaml | grep -A 5 conditions
```
### Storage Issues
#### Symptoms
- VM creation fails with storage errors
- "Storage not found" errors
- Insufficient storage errors
#### Solutions
1. **List Available Storage**:
```bash
# Via Proxmox API
curl -k -H "Authorization: PVEAuthCookie=TOKEN" \
https://your-proxmox:8006/api2/json/storage
```
2. **Check Storage Capacity**:
- Log into Proxmox Web UI
- Check Storage section
- Verify available space
3. **Update Storage Name**:
- Verify actual storage pool name
- Update VM manifest if needed
### Network Issues
#### Symptoms
- VM created but no network connectivity
- IP address not assigned
- Network bridge errors
#### Solutions
1. **Verify Network Bridge**:
```bash
# Via Proxmox API
curl -k -H "Authorization: PVEAuthCookie=TOKEN" \
https://your-proxmox:8006/api2/json/nodes/ML110-01/network
```
2. **Check Network Configuration**:
- Verify bridge name in VM manifest
- Check bridge exists on node
- Verify bridge is active
3. **Check DHCP**:
- Verify DHCP server is running
- Check network configuration
- Review VM network settings
### Authentication Failures
#### Symptoms
- 401 Unauthorized errors
- Authentication failed messages
- Token/ticket errors
#### Solutions
1. **Verify Credentials**:
- Check username format: `user@realm`
- Verify password is correct
- Check token format if using tokens
2. **Test Authentication**:
```bash
# Password auth
curl -k -X POST \
-d "username=root@pam&password=your-password" \
https://your-proxmox:8006/api2/json/access/ticket
# Token auth
curl -k -H "Authorization: PVEAuthCookie=TOKEN" \
https://your-proxmox:8006/api2/json/version
```
3. **Check Permissions**:
- Verify user has VM creation permissions
- Check token permissions
- Review Proxmox user roles
### Provider Pod Issues
#### Symptoms
- Provider pod not starting
- Provider pod crashing
- Provider pod in Error state
#### Solutions
1. **Check Pod Status**:
```bash
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
kubectl describe pod -n crossplane-system -l app=crossplane-provider-proxmox
```
2. **Check Pod Logs**:
```bash
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=100
```
3. **Check Image**:
```bash
kubectl get deployment -n crossplane-system crossplane-provider-proxmox -o yaml | grep image
```
4. **Verify Resources**:
```bash
kubectl get deployment -n crossplane-system crossplane-provider-proxmox -o yaml | grep -A 5 resources
```
## Diagnostic Commands
### Check Provider Health
```bash
# Provider status
kubectl get deployment -n crossplane-system crossplane-provider-proxmox
# Provider logs
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50
# Provider metrics
kubectl port-forward -n crossplane-system deployment/crossplane-provider-proxmox 8080:8080
curl http://localhost:8080/metrics
```
### Check VM Resources
```bash
# List all VMs
kubectl get proxmoxvm
# Get VM details
kubectl get proxmoxvm <vm-name> -o yaml
# Check VM events
kubectl describe proxmoxvm <vm-name>
```
### Check ProviderConfig
```bash
# List ProviderConfigs
kubectl get providerconfig
# Get ProviderConfig details
kubectl get providerconfig proxmox-provider-config -o yaml
# Check ProviderConfig status
kubectl describe providerconfig proxmox-provider-config
```
## Escalation Procedures
### Level 1: Basic Troubleshooting
1. Check provider logs
2. Verify credentials
3. Test connectivity
4. Review VM resource status
### Level 2: Advanced Troubleshooting
1. Check Proxmox Web UI
2. Review Proxmox logs
3. Verify network connectivity
4. Check resource availability
### Level 3: Infrastructure Issues
1. Contact Proxmox administrator
2. Check infrastructure status
3. Review network configuration
4. Verify DNS resolution
## Prevention
1. **Regular Monitoring**: Set up alerts for provider health
2. **Resource Verification**: Verify resources before deployment
3. **Credential Rotation**: Rotate credentials regularly
4. **Backup Configuration**: Backup ProviderConfig and secrets
5. **Documentation**: Keep documentation up to date
## Related Documentation
- [VM Provisioning Runbook](./PROXMOX_VM_PROVISIONING.md)
- [Deployment Guide](../proxmox/DEPLOYMENT_GUIDE.md)
- [Site Mapping](../proxmox/SITE_MAPPING.md)