# Proxmox Deployment Task List
Generated: 2024-12-19
## Overview
This document contains the comprehensive task list for connecting, reviewing, and deploying Proxmox infrastructure across both instances.
## Immediate Tasks (Priority: High)
### Connection and Authentication
- [ ] **TASK-001**: Verify network connectivity to Proxmox Instance 1
- **URL**: https://192.168.11.10:8006
- **Command**: `curl -k https://192.168.11.10:8006/api2/json/version`
- **Expected**: JSON response with Proxmox version information
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-002**: Verify network connectivity to Proxmox Instance 2
- **URL**: https://192.168.11.11:8006
- **Command**: `curl -k https://192.168.11.11:8006/api2/json/version`
- **Expected**: JSON response with Proxmox version information
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [x] **TASK-003**: Test authentication to Instance 1
- **Action**: ✅ Verify credentials or create API token
- **Location**: Proxmox Web UI -> Datacenter -> Permissions -> API Tokens
- **Token Name**: `sankofa-instance-1-api-token`
- **User**: `root@pam`
- **Permissions**: Administrator
- **Status**: Completed
- **Completed**: 2024-12-19
- **Note**: API token created and verified, authentication working
- [x] **TASK-004**: Test authentication to Instance 2
- **Action**: ✅ Verify credentials or create API token
- **Location**: Proxmox Web UI -> Datacenter -> Permissions -> API Tokens
- **Token Name**: `sankofa-instance-2-api-token`
- **User**: `root@pam`
- **Permissions**: Administrator
- **Status**: Completed
- **Completed**: 2024-12-19
- **Note**: API token created and verified, authentication working
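TASK-001 through TASK-004 can be scripted together. The sketch below is a minimal, unofficial helper. It builds the `Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET` header that the Proxmox VE API uses for API tokens, then queries `/api2/json/version` on each instance with that instance's token. Unauthenticated API requests generally return `401` rather than version data, so sending the token header gives a more informative result than a bare `curl -k`. The environment variables `PVE_TOKEN_SECRET_1` and `PVE_TOKEN_SECRET_2` are hypothetical names for the token secrets, which should not be written into the script.

```bash
#!/usr/bin/env bash
# Sketch for TASK-001..TASK-004: connectivity + API token check.
# PVE_TOKEN_SECRET_1 / PVE_TOKEN_SECRET_2 are hypothetical env var names.

# Proxmox API token header: PVEAPIToken=USER@REALM!TOKENID=SECRET
pve_token_header() {
  local user="$1" token_id="$2" secret="$3"
  printf 'Authorization: PVEAPIToken=%s!%s=%s' "$user" "$token_id" "$secret"
}

# GET /api2/json/version; -k skips TLS verification (self-signed certs).
pve_version() {
  local ip="$1" header="$2"
  curl -ks --fail --max-time 10 -H "$header" "https://${ip}:8006/api2/json/version"
}

# Check both instances with the tokens created in TASK-003 and TASK-004.
check_instances() {
  local ips=(192.168.11.10 192.168.11.11) rc=0 n var header
  for n in 1 2; do
    var="PVE_TOKEN_SECRET_${n}"
    if [[ -z "${!var:-}" ]]; then echo "set ${var}" >&2; return 2; fi
    header=$(pve_token_header root@pam "sankofa-instance-${n}-api-token" "${!var}")
    if pve_version "${ips[n-1]}" "$header"; then
      echo " <- instance ${n} (${ips[n-1]}) OK"
    else
      echo "instance ${n} (${ips[n-1]}) FAILED" >&2; rc=1
    fi
  done
  return "$rc"
}
```

Usage: source the file, export both secrets, and run `check_instances`; a non-zero exit code means at least one instance did not answer with a valid token.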
### Configuration Review
- [ ] **TASK-005**: Review current provider-config.yaml
- **File**: `crossplane-provider-proxmox/examples/provider-config.yaml`
- **Actions**:
- Verify endpoints match actual Proxmox instances
- Update site mappings if necessary
- Verify node names match actual cluster nodes
- Check TLS verification settings
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-006**: Review Cloudflare tunnel configurations
- **Files**:
- `cloudflare/tunnel-configs/proxmox-site-1.yaml`
- `cloudflare/tunnel-configs/proxmox-site-2.yaml`
- `cloudflare/tunnel-configs/proxmox-site-3.yaml`
- **Actions**:
- Verify hostnames match actual domain configuration
- Update `.local` addresses to actual IPs or hostnames
- Verify tunnel credentials are configured
- Check ingress rules for all nodes
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [x] **TASK-007**: Map Proxmox instances to sites
- **Current Configuration**:
- us-sfvalley: https://ml110-01.sankofa.nexus:8006 (node: ML110-01)
- us-sfvalley-2: https://r630-01.sankofa.nexus:8006 (node: R630-01)
- **Actions**:
- ✅ Determine which physical instance (192.168.11.10 or 192.168.11.11) corresponds to which site
- ✅ Update provider-config.yaml with correct mappings
- ✅ Document mapping in architecture docs
- **Status**: Completed
- **Mapping**:
- Instance 1 (192.168.11.10) = ML110-01 → us-sfvalley (ml110-01.sankofa.nexus)
- Instance 2 (192.168.11.11) = R630-01 → us-sfvalley-2 (r630-01.sankofa.nexus)
- Instance 2 (192.168.11.11) = R630-01 → also mapped to eu-west-1 and apac-1
- **Assignee**: TBD
- **Due Date**: TBD
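TASK-005 and TASK-007 both depend on which node answers at which address. One way to confirm the mapping is to ask each instance for `/api2/json/cluster/status`, whose node entries carry a `name` and an `ip`. The sketch below rests on several assumptions: `PVE_AUTH_HEADER` (a hypothetical variable) holds a full `Authorization: PVEAPIToken=...` header, the `ip` the cluster reports for each node is its management address, and responses are compact JSON with flat node objects (parsed with `tr`/`grep` to avoid a `jq` dependency). The node names are lowercase because TASK-040 lists `ml110-01` and `r630-01`; TASK-007 writes them as `ML110-01` and `R630-01`, so a mismatch here may simply mean the casing in `provider-config.yaml` needs checking.

```bash
#!/usr/bin/env bash
# Sketch for TASK-005/TASK-007: confirm which node answers at each IP.
# PVE_AUTH_HEADER (hypothetical) holds "Authorization: PVEAPIToken=...".

# SITE NODE IP, per the TASK-007 mapping (node names as listed in TASK-040).
SITE_MAP=(
  "us-sfvalley ml110-01 192.168.11.10"
  "us-sfvalley-2 r630-01 192.168.11.11"
)

# stdin: /cluster/status JSON; prints the name of the node whose ip is $1.
node_for_ip() {
  tr '{' '\n' | grep -F "\"ip\":\"$1\"" | grep -o '"name":"[^"]*"' | cut -d'"' -f4
}

check_site_map() {
  local entry site node ip got rc=0
  for entry in "${SITE_MAP[@]}"; do
    read -r site node ip <<<"$entry"
    got=$(curl -ks --fail -H "${PVE_AUTH_HEADER:?set PVE_AUTH_HEADER}" \
      "https://${ip}:8006/api2/json/cluster/status" | node_for_ip "$ip")
    if [[ "$got" == "$node" ]]; then
      echo "OK   ${site}: ${node} @ ${ip}"
    else
      echo "FAIL ${site}: expected ${node} @ ${ip}, got '${got}'"; rc=1
    fi
  done
  return "$rc"
}
```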
## Short-term Tasks (Priority: Medium)
### Crossplane Provider
- [x] **TASK-008**: Complete Proxmox API client implementation
- **File**: `crossplane-provider-proxmox/pkg/proxmox/client.go`
- **Current Status**: ✅ All methods implemented
- **Actions**:
- ✅ Implement actual HTTP client with authentication (`pkg/proxmox/http_client.go`)
- ✅ Implement `createVM()` method
- ✅ Implement `updateVM()` method
- ✅ Implement `deleteVM()` method
- ✅ Implement `getVMStatus()` method
- ✅ Implement `ListNodes()` with actual API calls
- ✅ Implement `ListVMs()` with actual API calls
- ✅ Implement `ListStorages()` with actual API calls
- ✅ Implement `ListNetworks()` with actual API calls
- ✅ Implement `GetClusterInfo()` with actual API calls
- ✅ Add proper error handling
- ✅ Add request/response logging
- **Status**: Completed
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-009**: Build and test Crossplane provider
- **Actions**:
- Run `cd crossplane-provider-proxmox && make build`
- Fix any build errors
- Run unit tests
- Test provider locally with kind/minikube
- Verify CRDs are generated correctly
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-010**: Deploy Crossplane provider to Kubernetes
- **Actions**:
- Apply CRDs: `kubectl apply -f crossplane-provider-proxmox/config/crd/bases/`
- Deploy provider: `kubectl apply -f crossplane-provider-proxmox/config/provider.yaml`
- Verify provider pod is running
- Check provider logs for errors
- Verify provider is registered with Crossplane
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-011**: Create ProviderConfig resource
- **Actions**:
- Update `crossplane-provider-proxmox/examples/provider-config.yaml` with actual values
- Create Kubernetes secret with credentials:
```bash
kubectl create secret generic proxmox-credentials \
--from-literal=credentials.json='{"username":"root@pam","password":"..."}' \
-n crossplane-system
```
- Apply ProviderConfig: `kubectl apply -f crossplane-provider-proxmox/examples/provider-config.yaml`
- Verify ProviderConfig status is Ready
- Test provider connectivity to both Proxmox instances
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
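The secret command in TASK-011 embeds the password directly, where it lands in shell history and breaks if the password contains a double quote. A minimal alternative sketch, assuming the password is exported as `PROXMOX_PASSWORD` (a hypothetical name), builds the same `credentials.json` payload with escaping and applies the secret idempotently via `--dry-run=client -o yaml | kubectl apply -f -`, so reruns update it in place. The value is still passed briefly as a `kubectl` argument.

```bash
#!/usr/bin/env bash
# Sketch for TASK-011: create/update proxmox-credentials without typing the
# password on the command line. PROXMOX_PASSWORD is a hypothetical env var.

# Escape backslashes and double quotes for a JSON string value.
# Control characters are not handled.
json_escape() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Same payload shape as TASK-011: {"username": ..., "password": ...}
credentials_json() {
  printf '{"username":"%s","password":"%s"}' "$(json_escape "$1")" "$(json_escape "$2")"
}

# Render the secret client-side and apply it.
apply_proxmox_secret() {
  kubectl create secret generic proxmox-credentials \
    --from-literal=credentials.json="$(credentials_json root@pam "${PROXMOX_PASSWORD:?set PROXMOX_PASSWORD}")" \
    -n crossplane-system --dry-run=client -o yaml | kubectl apply -f -
}
```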
### Infrastructure Setup
- [ ] **TASK-012**: Deploy Prometheus exporters to Proxmox nodes
- **Script**: `scripts/setup-proxmox-agents.sh`
- **Actions**:
- Run script on each Proxmox node:
```bash
SITE=us-sfvalley NODE=ML110-01 ./scripts/setup-proxmox-agents.sh
```
- Verify pve_exporter is installed and running
- Test metrics endpoint: `curl http://localhost:9221/metrics`
- Configure Prometheus to scrape metrics
- Verify metrics are being collected
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-013**: Configure Cloudflare tunnels
- **Actions**:
- Deploy tunnel configs to Proxmox nodes
- Install cloudflared on each node
- Configure tunnel credentials
- Start tunnel service: `systemctl start cloudflared-tunnel`
- Verify tunnel is connected: `systemctl status cloudflared-tunnel`
- Test access via Cloudflare hostnames
- Verify all ingress rules are working
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-014**: Set up monitoring dashboards
- **Actions**:
- Import Grafana dashboards for Proxmox
- Configure data sources (Prometheus)
- Set up alerts for:
- Node down
- High CPU usage
- High memory usage
- Storage full
- VM failures
- Test alert notifications
- Document dashboard access
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
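TASK-012 runs `scripts/setup-proxmox-agents.sh` once per node. A loop over the nodes from TASK-007 can be sketched as below, assuming root SSH access (TASK-030 notes SSH is required) and that the script works when streamed to `bash -s` instead of being copied to each node; both assumptions should be checked against the script itself. The follow-up request only confirms that something answers on the exporter port named in TASK-012, run from the node itself.

```bash
#!/usr/bin/env bash
# Sketch for TASK-012: run the agent setup script on each node over SSH.
# Assumes root SSH access and that the script works when streamed to `bash -s`.

# NODE SITE IP, following the TASK-007 mapping.
NODES=(
  "ML110-01 us-sfvalley 192.168.11.10"
  "R630-01 us-sfvalley-2 192.168.11.11"
)

# Environment prefix the script expects, as in the TASK-012 example.
node_env() {
  local node site
  read -r node site _ <<<"$1"
  printf 'SITE=%s NODE=%s' "$site" "$node"
}

deploy_agents() {
  local entry ip rc=0
  for entry in "${NODES[@]}"; do
    read -r _ _ ip <<<"$entry"
    echo "==> ${entry}"
    ssh "root@${ip}" "$(node_env "$entry") bash -s" \
      < scripts/setup-proxmox-agents.sh || rc=1
    # Liveness check on the exporter port from TASK-012, run on the node.
    if ssh -n "root@${ip}" "curl -sf -o /dev/null http://localhost:9221/metrics"; then
      echo "    pve_exporter responding on :9221"
    else
      echo "    pve_exporter not responding on :9221" >&2; rc=1
    fi
  done
  return "$rc"
}
```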
## Long-term Tasks (Priority: Low)
### Testing and Validation
- [ ] **TASK-015**: Deploy test VMs via Crossplane
- **Actions**:
- Create test VM manifest for Instance 1
- Apply manifest: `kubectl apply -f test-vm-instance-1.yaml`
- Verify VM is created in Proxmox
- Verify VM status in Kubernetes
- Repeat for Instance 2
- Test VM lifecycle operations (start, stop, delete)
- Verify VM IP address is reported correctly
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-016**: End-to-end testing
- **Actions**:
- Test VM creation from portal UI
- Test VM management operations (start, stop, restart, delete)
- Test multi-site deployments
- Test VM migration between nodes
- Test storage operations
- Test network configuration
- Verify all operations are logged
- Test error handling and recovery
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-017**: Performance testing
- **Actions**:
- Load test API endpoints
- Test concurrent VM operations
- Measure response times for:
- VM creation
- VM status queries
- VM operations (start/stop)
- Test with multiple concurrent users
- Identify bottlenecks
- Optimize slow operations
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
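For TASK-017, a sequential latency baseline is useful before running concurrent load. The sketch below times repeated requests with curl's `%{time_total}` write-out variable and summarizes them with `awk`. It is deliberately not a concurrency test; the request count and optional header are left to the caller.

```bash
#!/usr/bin/env bash
# Sketch for TASK-017: sequential latency baseline for one HTTP endpoint.

# Read one duration (seconds) per line; print count, min, avg, max.
summarize_times() {
  awk 'NR == 1 { min = max = $1 }
       { sum += $1; if ($1 < min) min = $1; if ($1 > max) max = $1 }
       END { if (NR) printf "n=%d min=%.3f avg=%.3f max=%.3f\n", NR, min, sum / NR, max }'
}

# time_endpoint URL [COUNT] [HEADER]
time_endpoint() {
  local url="$1" count="${2:-20}" header="${3:-}" i
  local args=(-ks -o /dev/null -w '%{time_total}\n' --max-time 30)
  if [[ -n "$header" ]]; then args+=(-H "$header"); fi
  for ((i = 0; i < count; i++)); do
    curl "${args[@]}" "$url"
  done | summarize_times
}
```

Usage: `time_endpoint https://192.168.11.10:8006/api2/json/version 50 "$PVE_AUTH_HEADER"`, where `PVE_AUTH_HEADER` is a hypothetical variable holding an API token header.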
### Documentation and Operations
- [x] **TASK-018**: Create operational runbooks
- **Actions**:
- ✅ Create VM provisioning runbook (`docs/runbooks/PROXMOX_VM_PROVISIONING.md`)
- ✅ Create troubleshooting guide (`docs/runbooks/PROXMOX_TROUBLESHOOTING.md`)
- ✅ Create disaster recovery procedures (`docs/runbooks/PROXMOX_DISASTER_RECOVERY.md`)
- ✅ Document common issues and solutions
- ✅ Create escalation procedures
- ✅ Document maintenance windows
- **Status**: Completed
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-019**: Set up backup procedures
- **Actions**:
- Configure automated VM backups
- Set up backup schedules
- Test backup procedures
- Test restore procedures
- Document backup retention policies
- Set up backup monitoring and alerts
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-020**: Security audit
- **Actions**:
- Review access controls
- Enable TLS certificate validation
- Rotate API tokens
- Review firewall rules
- Audit user permissions
- Review audit logs
- Implement security best practices
- Document security procedures
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
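For TASK-019, Proxmox VE's own `vzdump` and `qmrestore` tools cover backup and restore testing. The sketch below runs on a Proxmox node and is illustrative only: the VMID, the storage name, and the default of keeping seven backups are placeholders, and the target storage must have the `backup` content type enabled. Recurring schedules would normally be set up as backup jobs rather than by calling this by hand.

```bash
#!/usr/bin/env bash
# Sketch for TASK-019: one-off backup with retention, plus a test restore.
# VMIDs, storage names and the keep-last count are illustrative placeholders.

# Print the vzdump argv, one element per line (easy to inspect or log).
vzdump_cmd() {
  local vmid="$1" storage="$2" keep_last="${3:-7}"
  printf '%s\n' vzdump "$vmid" --storage "$storage" --mode snapshot \
    --compress zstd --prune-backups "keep-last=${keep_last}"
}

# run_backup VMID STORAGE [KEEP_LAST]
run_backup() {
  local -a cmd
  mapfile -t cmd < <(vzdump_cmd "$@")
  "${cmd[@]}"
}

# Restore an archive into an unused VMID; --unique 1 assigns new MAC addresses.
test_restore() {
  local archive="$1" new_vmid="$2" storage="$3"
  qmrestore "$archive" "$new_vmid" --storage "$storage" --unique 1
}
```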
## Additional Gap and Placeholder Tasks
### Configuration Placeholders
- [ ] **TASK-021**: Replace `yourdomain.com` placeholders in Cloudflare tunnel configs
- **Files**:
- `cloudflare/tunnel-configs/proxmox-site-1.yaml` (lines 9, 19, 29, 39, 49)
- `cloudflare/tunnel-configs/proxmox-site-2.yaml` (lines 9, 19, 29, 39, 49)
- `cloudflare/tunnel-configs/proxmox-site-3.yaml` (lines 9, 19, 29, 39)
- **Actions**:
- Replace all `yourdomain.com` with actual domain (e.g., `sankofa.nexus`)
- Update DNS records to point to Cloudflare
- Verify hostnames are accessible
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-022**: Replace `.local` placeholders in Cloudflare tunnel configs
- **Files**: All `proxmox-site-*.yaml` files
- **Actions**:
- Replace `pve*.local` with actual IP addresses or hostnames
- Update `httpHostHeader` values
- Test connectivity to actual Proxmox nodes
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-023**: Replace `your-proxmox-password` placeholder in provider-config.yaml
- **File**: `crossplane-provider-proxmox/examples/provider-config.yaml` (line 11)
- **Actions**:
- Update with actual password or use API token
- Ensure credentials are stored securely in Kubernetes secret
- Never commit actual passwords to git
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-024**: Replace `yourregistry` placeholder in provider.yaml
- **File**: `crossplane-provider-proxmox/config/provider.yaml` (line 24)
- **Actions**:
- Update image path to actual container registry
- Build and push provider image to registry
- Update imagePullPolicy if using specific tags
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-025**: Replace `yourorg.io` placeholders in GitOps files
- **Files**:
- `gitops/infrastructure/claims/vm-claim-example.yaml` (line 1)
- `gitops/infrastructure/xrds/virtualmachine.yaml` (lines 4, 6)
- **Actions**:
- Replace with the actual API group (e.g., `proxmox.sankofa.nexus`)
- Update all references consistently
- Verify the XRD and the claim reference the updated API group
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
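TASK-021 through TASK-025 identify placeholders by file and line number, and those line numbers drift as files change. Searching by pattern is more robust. The sketch below, run from the repository root, reports any remaining placeholder and rewrites only `yourdomain.com`, since the `.local` hosts, registry, password, and API group each need a per-file decision. The pattern list mirrors the task titles and is an assumption about what counts as a placeholder.

```bash
#!/usr/bin/env bash
# Sketch for TASK-021..TASK-025: find placeholders and replace the domain.

PLACEHOLDER_RE='yourdomain\.com|pve[[:alnum:]-]*\.local|your-proxmox-password|yourregistry|yourorg\.io'

# Print file:line:text for every remaining placeholder (exit 1 if none).
find_placeholders() {
  grep -rnE "$PLACEHOLDER_RE" "$@"
}

# replace_domain DOMAIN FILE...  (keeps a .bak copy of each file)
replace_domain() {
  local domain="$1"; shift
  sed -i.bak "s/yourdomain\\.com/${domain}/g" "$@"
}
```

Usage: `find_placeholders cloudflare/tunnel-configs crossplane-provider-proxmox gitops`, then `replace_domain sankofa.nexus cloudflare/tunnel-configs/proxmox-site-*.yaml`.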
### Implementation Gaps
- [ ] **TASK-026**: Implement HTTP client in Proxmox API client
- **File**: `crossplane-provider-proxmox/pkg/proxmox/client.go`
- **Actions**:
- Add HTTP client with proper TLS configuration
- Implement authentication (ticket and token support)
- Add request/response logging
- Handle CSRF tokens properly
- Add connection pooling and timeouts
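- **Status**: Pending
- **Note**: Overlaps with TASK-008, which is marked completed and lists an authenticated HTTP client in `pkg/proxmox/http_client.go`; confirm whether ticket/token auth, CSRF handling, pooling, and timeouts are covered before closing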
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-027**: Replace placeholder metrics collector in controller
- **File**: `crossplane-provider-proxmox/pkg/controller/vmscaleset/controller.go` (line 49)
- **Actions**:
- Implement actual metrics collection
- Add Prometheus metrics for VM operations
- Track VM creation/deletion/update metrics
- Add error rate metrics
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [x] **TASK-028**: Verify and update Proxmox resource names
- **Actions**:
- ✅ Connected to both instances via API
- ✅ Gathered storage pool information
- ✅ Gathered network interface information
- ✅ Documented available resources in INSTANCE_INVENTORY.md
- ⚠️ Some endpoints require Sys.Audit permission (token may need additional permissions)
- **Status**: Completed (with limitations)
- **Completed**: 2024-12-19
- **Note**: Resource inventory gathered via API, documented in INSTANCE_INVENTORY.md
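TASK-026 calls for ticket authentication and CSRF handling. The request flow can be prototyped with `curl` before it is ported to Go: `POST /api2/json/access/ticket` returns a `ticket` and a `CSRFPreventionToken`; later requests send the ticket as the `PVEAuthCookie` cookie, and state-changing requests also send the `CSRFPreventionToken` header. API-token requests need neither, and tickets have a limited lifetime, so a real client must also refresh them. The sketch extracts fields with `sed` to avoid a `jq` dependency, which assumes compact single-line JSON; `PVE_PASSWORD` is a hypothetical variable name.

```bash
#!/usr/bin/env bash
# Sketch of Proxmox ticket authentication for TASK-026 (reference only).
# PVE_PASSWORD is a hypothetical env var; responses assumed to be one line.

# Extract a string field from a compact, single-line JSON body.
json_field() {
  sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"
}

# pve_login HOST USER  -> sets PVE_TICKET and PVE_CSRF
pve_login() {
  local host="$1" user="$2" resp
  resp=$(curl -ks --fail \
    --data-urlencode "username=${user}" \
    --data-urlencode "password=${PVE_PASSWORD:?set PVE_PASSWORD}" \
    "https://${host}:8006/api2/json/access/ticket") || return 1
  PVE_TICKET=$(json_field ticket <<<"$resp")
  PVE_CSRF=$(json_field CSRFPreventionToken <<<"$resp")
  [[ -n "$PVE_TICKET" && -n "$PVE_CSRF" ]]
}

# pve_request HOST METHOD PATH [extra curl args...]
pve_request() {
  local host="$1" method="$2" path="$3"; shift 3
  local args=(-ks --fail -X "$method" -b "PVEAuthCookie=${PVE_TICKET}")
  # Only state-changing requests need the CSRF header.
  if [[ "$method" != GET ]]; then args+=(-H "CSRFPreventionToken: ${PVE_CSRF}"); fi
  curl "${args[@]}" "$@" "https://${host}:8006/api2/json${path}"
}
```

Usage: `pve_login 192.168.11.10 root@pam && pve_request 192.168.11.10 GET /nodes`.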
### DNS and Network Configuration
- [x] **TASK-029**: Configure DNS records for Proxmox hostnames
- **Actions**:
- ✅ Create DNS A records for:
- `ml110-01.sankofa.nexus` → 192.168.11.10 (Instance 1)
- `r630-01.sankofa.nexus` → 192.168.11.11 (Instance 2)
- ✅ Create CNAME records for API endpoints:
- `ml110-01-api.sankofa.nexus` → `ml110-01.sankofa.nexus`
- `r630-01-api.sankofa.nexus` → `r630-01.sankofa.nexus`
- ✅ Create CNAME records for metrics:
- `ml110-01-metrics.sankofa.nexus` → `ml110-01.sankofa.nexus`
- `r630-01-metrics.sankofa.nexus` → `r630-01.sankofa.nexus`
- ✅ DNS records created via Cloudflare API
- ✅ DNS configuration files and scripts created
- ✅ DNS propagation verified
- **Status**: Completed
- **Completed**: 2024-12-19
- **Files Created**:
- `cloudflare/dns/sankofa.nexus-records.yaml` - DNS record definitions
- `cloudflare/terraform/dns.tf` - Terraform DNS configuration
- `scripts/setup-dns-records.sh` - Automated DNS setup script
- `scripts/hosts-entries.txt` - Local /etc/hosts entries
- `docs/proxmox/DNS_CONFIGURATION.md` - Complete DNS guide
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-030**: Generate Cloudflare tunnel credentials
- **Actions**:
- Create tunnel for each site via Cloudflare dashboard or API
- Generate tunnel credentials for:
- `proxmox-site-1-tunnel`
- `proxmox-site-2-tunnel`
- `proxmox-site-3-tunnel`
- Store credentials securely (not in git)
- Deploy credentials to Proxmox nodes
- Test tunnel connectivity
- **Status**: Pending
- **Note**: Requires SSH access to nodes
- **Assignee**: TBD
- **Due Date**: TBD
- [x] **TASK-040**: Create Proxmox cluster
- **Actions**:
- ✅ Create cluster on ML110-01 (first node)
- ✅ Add R630-01 to cluster (second node)
- ⚠️ Configure quorum for 2-node cluster (verify via Web UI/SSH)
- ✅ Verify cluster status (ML110-01 sees 2 nodes - cluster likely exists)
- **Status**: Completed (pending final verification)
- **Cluster Name**: sankofa-sfv-01
- **Evidence**: ML110-01 nodes list shows both r630-01 and ml110-01
- **Completed**: 2024-12-19
- **Note**: Cluster appears to exist based on node visibility. Final verification recommended via Web UI.
- **Methods Available**:
1. **Web UI** (Recommended): Datacenter → Cluster → Create/Join
2. **SSH**: Use `pvecm create` and `pvecm add` commands
3. **Script**: `./scripts/create-proxmox-cluster-ssh.sh` (requires SSH)
- **Documentation**: `docs/proxmox/CLUSTER_SETUP.md`
- **Note**: API-based cluster creation is limited; requires SSH or Web UI
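TASK-029 reports DNS propagation as verified, and the check can be repeated at any time with `dig`. The sketch below assumes the records are DNS-only (not proxied through Cloudflare), because proxied records answer with Cloudflare addresses instead of the values listed in TASK-029, and it expects CNAME targets with the trailing dot that `dig +short` prints.

```bash
#!/usr/bin/env bash
# Sketch: re-check the TASK-029 records. Requires dig (bind-utils/dnsutils).

# NAME TYPE EXPECTED (CNAME targets end with a dot in dig output).
EXPECTED_RECORDS=(
  "ml110-01.sankofa.nexus A 192.168.11.10"
  "r630-01.sankofa.nexus A 192.168.11.11"
  "ml110-01-api.sankofa.nexus CNAME ml110-01.sankofa.nexus."
  "r630-01-api.sankofa.nexus CNAME r630-01.sankofa.nexus."
  "ml110-01-metrics.sankofa.nexus CNAME ml110-01.sankofa.nexus."
  "r630-01-metrics.sankofa.nexus CNAME r630-01.sankofa.nexus."
)

# First answer for NAME/TYPE.
resolve() {
  dig +short "$1" "$2" | head -n 1
}

check_records() {
  local entry name type want got rc=0
  for entry in "${EXPECTED_RECORDS[@]}"; do
    read -r name type want <<<"$entry"
    got=$(resolve "$name" "$type")
    if [[ "$got" == "$want" ]]; then
      echo "OK   $name $type $got"
    else
      echo "FAIL $name $type: got '${got}', want '${want}'"; rc=1
    fi
  done
  return "$rc"
}
```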
### Test Resources
- [ ] **TASK-031**: Create test VM manifests for both instances
- **Actions**:
- Create `test-vm-instance-1.yaml` with actual values
- Create `test-vm-instance-2.yaml` with actual values
- Use verified storage pool names
- Use verified network bridge names
- Use verified OS template names
- Include valid SSH keys (not placeholders)
- Test manifests before deployment
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-032**: Replace placeholder SSH keys in examples
- **Files**:
- `crossplane-provider-proxmox/examples/vm-example.yaml` (lines 21, 23)
- `gitops/infrastructure/claims/vm-claim-example.yaml` (line 22)
- **Actions**:
- Replace with actual SSH public keys or remove if not needed
- Document how to add SSH keys
- Consider using secrets for SSH keys
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
### Module and Build Configuration
- [ ] **TASK-033**: Verify and update Go module paths
- **File**: `crossplane-provider-proxmox/go.mod`
- **Actions**:
- Verify module path matches actual repository
- Update imports if module path changed
- Ensure all dependencies are correct
- Run `go mod tidy` to clean up
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-034**: Create Makefile for Crossplane provider
- **Actions**:
- Create `Makefile` with build targets
- Add targets for:
- `build` - Build provider binary
- `test` - Run tests
- `generate` - Generate CRDs
- `docker-build` - Build container image
- `docker-push` - Push to registry
- Document build process
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
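For TASK-033, the expected module path can be derived from the Git remote and compared with `go list -m`, which prints the main module's path. The sketch assumes the provider is a Go module in the repository's `crossplane-provider-proxmox/` subdirectory (so its path is the repository path plus that directory name) and that `origin` points at the canonical repository; both are assumptions to confirm before acting on a mismatch.

```bash
#!/usr/bin/env bash
# Sketch for TASK-033: compare go.mod's module path with the Git remote.

# Turn a Git remote URL into a Go import path prefix, e.g.
# git@host:org/repo.git or https://host/org/repo.git -> host/org/repo
remote_to_module() {
  local url="$1"
  url=${url%.git}
  url=${url#https://}
  url=${url#http://}
  url=${url#ssh://}
  url=${url#git@}
  # scp-style "host:org/repo" -> "host/org/repo"
  if [[ "$url" == *:* ]]; then url="${url%%:*}/${url#*:}"; fi
  printf '%s' "$url"
}

check_module_path() {
  local dir="crossplane-provider-proxmox" want got
  want="$(remote_to_module "$(git remote get-url origin)")/${dir}"
  got=$(cd "$dir" && go list -m) || return 1
  if [[ "$got" == "$want" ]]; then
    echo "module path OK: $got"
  else
    echo "go.mod declares '$got'; repository suggests '$want'" >&2
    return 1
  fi
}
```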
### Documentation Gaps
- [ ] **TASK-035**: Create Grafana dashboard JSON files
- **Actions**:
- Create Proxmox cluster dashboard
- Create Proxmox node dashboard
- Create VM metrics dashboard
- Export dashboards as JSON
- Store in `infrastructure/monitoring/dashboards/`
- Document dashboard import process
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-036**: Create operational runbooks
- **Actions**:
- VM provisioning runbook
- Troubleshooting guide with common issues
- Disaster recovery procedures
- Maintenance procedures
- Escalation procedures
- Store in `docs/runbooks/`
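- **Status**: Pending
- **Note**: Overlaps with TASK-018 (completed), which created these runbooks in `docs/runbooks/`; close this task or narrow it to any remaining gaps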
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-037**: Document actual Proxmox resources
- **Actions**:
- Document available storage pools
- Document available network bridges
- Document available OS templates/images
- Document node names and roles
- Create resource inventory document
- Update examples with actual values
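- **Status**: Pending
- **Note**: Partly covered by TASK-028 (completed), which documented storage pools and network interfaces in INSTANCE_INVENTORY.md; OS templates, node roles, and example updates may remain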
- **Assignee**: TBD
- **Due Date**: TBD
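TASK-035 ends with documenting the import process. One option is Grafana's dashboard HTTP API, which accepts `POST /api/dashboards/db` with a `{"dashboard": ..., "overwrite": true}` body. The sketch assumes the files under `infrastructure/monitoring/dashboards/` are plain dashboard models (not the UI's "export for sharing" format with `__inputs`), and that `GRAFANA_URL` and `GRAFANA_TOKEN` (hypothetical names) hold the Grafana address and a service-account token.

```bash
#!/usr/bin/env bash
# Sketch for TASK-035: push dashboard JSON files to Grafana's HTTP API.
# GRAFANA_URL and GRAFANA_TOKEN are hypothetical variable names.

# Wrap a raw dashboard model in the body /api/dashboards/db expects.
dashboard_payload() {
  printf '{"dashboard":%s,"overwrite":true}' "$(cat "$1")"
}

# import_dashboards [DIR]
import_dashboards() {
  local dir="${1:-infrastructure/monitoring/dashboards}" f rc=0
  for f in "$dir"/*.json; do
    [[ -e "$f" ]] || continue
    if dashboard_payload "$f" | curl -s --fail -X POST \
        -H "Authorization: Bearer ${GRAFANA_TOKEN:?set GRAFANA_TOKEN}" \
        -H "Content-Type: application/json" --data-binary @- \
        "${GRAFANA_URL:?set GRAFANA_URL}/api/dashboards/db" >/dev/null; then
      echo "imported $f"
    else
      echo "failed   $f" >&2; rc=1
    fi
  done
  return "$rc"
}
```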
### Security and Compliance
- [ ] **TASK-038**: Review and update TLS configuration
- **Actions**:
- Enable TLS certificate validation (set `insecureSkipTLSVerify: false`)
- Obtain proper SSL certificates for Proxmox nodes
- Configure certificate rotation
- Document certificate management
- Test TLS connections
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
- [ ] **TASK-039**: Audit and secure API tokens
- **Actions**:
- Review token permissions (principle of least privilege)
- Set token expiration dates
- Rotate tokens regularly
- Document token management procedures
- Store tokens securely (Kubernetes secrets, not in code)
- **Status**: Pending
- **Assignee**: TBD
- **Due Date**: TBD
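TASK-039 (and the Sys.Audit gap noted in TASK-028) can be addressed with privilege-separated, expiring tokens created with `pveum` on a Proxmox node. With `--privsep 1`, a token's effective permissions are the intersection of the user's and the token's own ACLs, and `--expire` takes a Unix timestamp. The token name, the 90-day lifetime, and the read-only `PVEAuditor` role (which includes `Sys.Audit`) in the usage line are illustrative; a token the Crossplane provider uses to create VMs would need broader roles.

```bash
#!/usr/bin/env bash
# Sketch for TASK-039: create an expiring, privilege-separated API token.
# Run on a Proxmox node as root. Names, role and lifetime are illustrative.

# Unix timestamp N days from now (what `--expire` expects).
expiry_epoch() {
  echo $(( $(date +%s) + $1 * 86400 ))
}

# create_scoped_token USER TOKEN ROLE [DAYS]
create_scoped_token() {
  local user="$1" token="$2" role="$3" days="${4:-90}"
  # The secret is printed once here; store it in a Kubernetes secret, not git.
  pveum user token add "$user" "$token" --privsep 1 \
    --expire "$(expiry_epoch "$days")" --comment "sankofa, expires in ${days}d"
  # With privsep, grant the token its own ACL entry.
  pveum acl modify / --tokens "${user}!${token}" --roles "$role"
}
```

Usage: `create_scoped_token root@pam sankofa-readonly PVEAuditor 90`, then `pveum user token list root@pam` to review existing tokens and their expiry.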
## Multi-Tenancy Tasks (NEW - Sovereign, Superior to Azure)
### Database & Schema
- [x] **TASK-041**: Create multi-tenant database schema with tenants, tenant_users, and billing tables
- **Status**: Completed
- **Completed**: Current session
- **Note**: Migration 012_tenants_and_billing.ts created
- [x] **TASK-042**: Add tenant_id to resources, sites, and resource_inventory tables
- **Status**: Completed
- **Completed**: Current session
### Identity & Access Management
- [x] **TASK-043**: Implement Keycloak-based sovereign identity service
- **Status**: Completed
- **Completed**: Current session
- **Note**: NO Azure dependencies - fully sovereign
- [x] **TASK-044**: Create tenant-aware authentication middleware
- **Status**: Completed
- **Completed**: Current session
- [ ] **TASK-045**: Configure Keycloak multi-realm support
- **Status**: Pending
- **Note**: Requires Keycloak deployment
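Once Keycloak is deployed, TASK-045's per-tenant realms can be created with Keycloak's admin CLI, `kcadm.sh`. The sketch is hypothetical: the `tenant-<slug>` realm naming, the `KEYCLOAK_URL` variable, and the `/opt/keycloak/bin/kcadm.sh` path (the location in the official Keycloak container image) are assumptions, and it does not configure clients, roles, or identity providers.

```bash
#!/usr/bin/env bash
# Sketch for TASK-045: one Keycloak realm per tenant via kcadm.sh.
# KCADM and KEYCLOAK_URL are assumptions; realm naming is hypothetical.
KCADM=${KCADM:-/opt/keycloak/bin/kcadm.sh}

# "Acme Corp" -> "tenant-acme-corp"
tenant_realm() {
  local slug
  slug=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -c 'a-z0-9-' '-')
  printf 'tenant-%s' "$slug"
}

# Authenticate against the master realm; kcadm.sh prompts for the password.
kc_login() {
  "$KCADM" config credentials --server "${KEYCLOAK_URL:?set KEYCLOAK_URL}" \
    --realm master --user "${1:-admin}"
}

# create_tenant_realm "Tenant Display Name"
create_tenant_realm() {
  "$KCADM" create realms -s "realm=$(tenant_realm "$1")" -s enabled=true
}
```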
### GraphQL & API
- [x] **TASK-046**: Add Tenant types and queries to GraphQL schema
- **Status**: Completed
- **Completed**: Current session
- [x] **TASK-047**: Add billing queries and mutations to GraphQL schema
- **Status**: Completed
- **Completed**: Current session
- [x] **TASK-048**: Update resource queries to be tenant-aware
- **Status**: Completed
- **Completed**: Current session
### Billing (Superior to Azure Cost Management)
- [x] **TASK-049**: Implement billing service with per-second granularity
- **Status**: Completed
- **Completed**: Current session
- **Note**: Per-second vs Azure's hourly
- [x] **TASK-050**: Create cost breakdown and forecasting
- **Status**: Completed
- **Completed**: Current session
- [ ] **TASK-051**: Implement invoice generation
- **Status**: Partial (createInvoice method exists, needs full implementation)
- **Note**: Basic structure complete
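To illustrate TASK-049's per-second granularity, consider an assumed rate of 0.12 per hour and a VM that ran 1 h 5 min 30 s (3,930 s). Billed per second it costs 0.12 × 3930 / 3600 = 0.1310; rounded up to whole hours it would cost 0.12 × 2 = 0.24. The small calculator below reproduces that arithmetic; it is independent of the billing service's code, and the inputs are made up.

```bash
#!/usr/bin/env bash
# Illustration for TASK-049: per-second vs whole-hour billing of one interval.
# The rate and duration are made-up inputs, not values from the billing service.

# per_second_cost HOURLY_RATE SECONDS
per_second_cost() {
  awk -v r="$1" -v s="$2" 'BEGIN { printf "%.4f\n", r * s / 3600 }'
}

# hourly_rounded_cost HOURLY_RATE SECONDS (partial hours round up)
hourly_rounded_cost() {
  awk -v r="$1" -v s="$2" 'BEGIN { printf "%.4f\n", r * int((s + 3599) / 3600) }'
}
```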
### Documentation
- [x] **TASK-052**: Create tenant management documentation
- **Status**: Completed
- **Completed**: Current session
- [x] **TASK-053**: Create billing guide documentation
- **Status**: Completed
- **Completed**: Current session
- [x] **TASK-054**: Create identity setup documentation
- **Status**: Completed
- **Completed**: Current session
- [x] **TASK-055**: Create Azure migration guide
- **Status**: Completed
- **Completed**: Current session
## Task Summary
- **Total Tasks**: 55 (40 original + 15 multi-tenancy tasks)
- **High Priority**: 7
- **Medium Priority**: 7
- **Low Priority**: 6
- **Gap/Placeholder Tasks**: 20
- **Multi-Tenancy Tasks**: 15
- **Completed**: 21 (38%)
- **In Progress**: 1 (TASK-051, partial)
- **Pending**: 33 (60%)
- **Configuration Ready**: 3 (DNS, ProviderConfig, Scripts)
## Next Steps
1. **For Multi-Tenancy Deployment**: See [REMAINING_TASKS.md](../REMAINING_TASKS.md) for complete task list including deployment procedures
2. Run the review script to gather current status:
```bash
./scripts/proxmox-review-and-plan.sh
# or
python3 ./scripts/proxmox-review-and-plan.py
```
3. Review the generated status reports in `docs/proxmox-review/`
4. Start with TASK-001 and TASK-002 to verify connectivity
5. For quick deployment: See [QUICK_START_DEPLOYMENT.md](../QUICK_START_DEPLOYMENT.md)
6. Update this document as tasks are completed
## Notes
- All tasks should be updated with actual status, assignee, and due dates
- Use the review scripts to gather current state before starting tasks
- Document any issues or blockers encountered
- Update configuration files as mappings are determined