Update documentation structure and enhance .gitignore

- Added generated index files and report directories to .gitignore to prevent unnecessary tracking of transient files.
- Updated README links to reflect new documentation paths for better navigation.
- Reorganized documentation so that all links point to the correct locations.
Author: defiQUG
Date: 2025-12-12 21:18:55 -08:00
Parent: 664707d912
Commit: fe0365757a
106 changed files with 4666 additions and 2294 deletions

# Build and Deploy Instructions
**Date**: 2025-12-11
**Status**: ✅ **CODE FIXED - NEEDS IMAGE LOADING**
---
## Build Status
**Provider code fixed and built successfully**
- Fixed compilation errors
- Added `findVMNode` function
- Fixed variable scoping issue
- Image built: `crossplane-provider-proxmox:latest`
---
## Deployment Steps
### 1. Build Provider Image
```bash
cd crossplane-provider-proxmox
docker build -t crossplane-provider-proxmox:latest .
```
**COMPLETE**
### 2. Load Image into Kind Cluster
**Required**: `kind` command must be installed
```bash
kind load docker-image crossplane-provider-proxmox:latest --name sankofa
```
⚠️ **PENDING**: `kind` command not available in current environment
**Alternative Methods**:
#### Option A: Install kind
```bash
# Install kind
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
# Then load image
kind load docker-image crossplane-provider-proxmox:latest --name sankofa
```
#### Option B: Use Registry
```bash
# Tag and push to registry
docker tag crossplane-provider-proxmox:latest <registry>/crossplane-provider-proxmox:latest
docker push <registry>/crossplane-provider-proxmox:latest
# Update provider.yaml to use registry image
# Change imagePullPolicy from "Never" to "Always" or "IfNotPresent"
```
#### Option C: Manual Copy (Advanced)
```bash
# Save image to file
docker save crossplane-provider-proxmox:latest -o provider-image.tar
# Copy to kind node and load
docker cp provider-image.tar sankofa-control-plane:/tmp/
docker exec sankofa-control-plane ctr -n=k8s.io images import /tmp/provider-image.tar
```
### 3. Restart Provider
```bash
kubectl rollout restart deployment/crossplane-provider-proxmox -n crossplane-system
kubectl rollout status deployment/crossplane-provider-proxmox -n crossplane-system
```
**COMPLETE** (the provider keeps running the old image until step 2 is done)
### 4. Verify Deployment
```bash
kubectl get pods -n crossplane-system -l app=crossplane-provider-proxmox
kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=20
```
---
## Current Status
### ✅ Completed
1. Code fixes applied
2. Provider image built
3. Templates updated to cloud image format
4. Provider deployment restarted
### ⏳ Pending
1. **Load image into kind cluster** (requires `kind` command)
2. Test VM creation with new provider
---
## Next Steps
1. **Install kind** or use alternative image loading method
2. **Load image** into cluster
3. **Restart provider** (if not already done)
4. **Test VM 100** creation
5. **Verify** task monitoring works
---
## Verification
After loading image and restarting:
1. **Check provider logs** for task monitoring:
   ```bash
   kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox | grep -i "task\|importdisk\|upid"
   ```
2. **Deploy VM 100**:
   ```bash
   kubectl apply -f examples/production/vm-100.yaml
   ```
3. **Monitor creation**:
   ```bash
   kubectl get proxmoxvm vm-100 -w
   ```
4. **Check Proxmox**:
   ```bash
   qm status 100
   qm config 100
   ```
---
## Expected Behavior
With the fixed provider:
- ✅ Provider waits for `importdisk` task to complete
- ✅ No lock timeouts
- ✅ VM configured correctly after import
- ✅ Boot disk attached properly
---
**Status**: ⏳ **AWAITING IMAGE LOAD INTO CLUSTER**

# Code Documentation Guide
This guide outlines the standards and best practices for documenting code in the Sankofa Phoenix project.
## JSDoc Standards
### Function Documentation
All public functions should include JSDoc comments with:
- Description of what the function does
- `@param` tags for each parameter
- `@returns` tag describing the return value
- `@throws` tags for exceptions that may be thrown
- `@example` tag with usage example (for complex functions)
**Example:**
````typescript
/**
 * Authenticate a user and return a JWT token
 *
 * @param email - User email address
 * @param password - User password
 * @returns Authentication payload with JWT token and user information
 * @throws {AuthenticationError} If credentials are invalid
 * @example
 * ```typescript
 * const result = await login('user@example.com', 'password123');
 * console.log(result.token); // JWT token
 * ```
 */
export async function login(email: string, password: string): Promise<AuthPayload> {
  // implementation
}
````
### Class Documentation
Classes should include:
- Description of the class purpose
- `@example` tag showing basic usage
**Example:**
````typescript
/**
 * Proxmox VE Infrastructure Adapter
 *
 * Implements the InfrastructureAdapter interface for Proxmox VE infrastructure.
 * Provides resource discovery, creation, update, deletion, metrics, and health checks.
 *
 * @example
 * ```typescript
 * const adapter = new ProxmoxAdapter({
 *   apiUrl: 'https://proxmox.example.com:8006',
 *   apiToken: 'token-id=...'
 * });
 * const resources = await adapter.discoverResources();
 * ```
 */
export class ProxmoxAdapter implements InfrastructureAdapter {
  // implementation
}
````
### Interface Documentation
Complex interfaces should include documentation:
```typescript
/**
 * Resource filter criteria for querying resources
 *
 * @property type - Filter by resource type (e.g., 'VM', 'CONTAINER')
 * @property status - Filter by resource status (e.g., 'RUNNING', 'STOPPED')
 * @property siteId - Filter by site ID
 * @property tenantId - Filter by tenant ID
 */
export interface ResourceFilter {
  type?: string
  status?: string
  siteId?: string
  tenantId?: string
}
```
### Method Documentation
Class methods should follow the same pattern as functions:
````typescript
/**
 * Discover all resources across all Proxmox nodes
 *
 * @returns Array of normalized resources (VMs) from all nodes
 * @throws {Error} If the API connection fails or nodes cannot be retrieved
 * @example
 * ```typescript
 * const resources = await adapter.discoverResources();
 * console.log(`Found ${resources.length} VMs`);
 * ```
 */
async discoverResources(): Promise<NormalizedResource[]> {
  // implementation
}
````
## Inline Comments
### When to Use Inline Comments
- **Complex logic**: Explain non-obvious algorithms or business rules
- **Workarounds**: Document temporary fixes or known issues
- **Performance optimizations**: Explain why a particular approach was chosen
- **Business rules**: Document domain-specific logic
### Comment Style
```typescript
// Good: explains why, not what
// Tenant-aware filtering (superior to Azure multi-tenancy)
if (context.tenantContext) {
  // System admins can see all resources
  if (context.tenantContext.isSystemAdmin) {
    // No filtering needed
  } else if (context.tenantContext.tenantId) {
    // Filter by tenant ID
    query += ` AND r.tenant_id = $${paramCount}`
  }
}

// Bad: states the obvious
// Loop through nodes
for (const node of nodes) {
  // Get VMs
  const vms = await this.getVMs(node.node)
}
```
## TODO Comments
Use TODO comments for known improvements:
```typescript
// TODO: Add rate limiting to prevent API abuse
// TODO: Implement caching for frequently accessed resources
// FIXME: This workaround should be removed when upstream issue is fixed
```
## Documentation Checklist
When adding new code, ensure:
- [ ] Public functions have JSDoc comments
- [ ] Complex private functions have inline comments
- [ ] Classes have class-level documentation
- [ ] Interfaces have documentation for complex types
- [ ] Examples are provided for public APIs
- [ ] Error cases are documented with `@throws`
- [ ] Complex algorithms have explanatory comments
- [ ] Business rules are documented
## Tools
- **TypeScript**: Built-in JSDoc support
- **VS Code**: JSDoc snippets and IntelliSense
- **TSDoc**: The standard for TypeScript documentation comments
---
**Last Updated**: 2025-01-09

# Contributing to Sankofa
**Last Updated**: 2025-01-09
Thank you for your interest in contributing to Sankofa! This document provides guidelines and instructions for contributing to the Sankofa ecosystem and Sankofa Phoenix cloud platform.
## Code of Conduct
- Be respectful and inclusive
- Welcome newcomers and help them learn
- Focus on constructive feedback
- Respect different viewpoints and experiences
## Getting Started
1. Fork the repository
2. Clone your fork: `git clone https://github.com/yourusername/Sankofa.git`
3. Create a branch: `git checkout -b feature/your-feature-name`
4. Make your changes
5. Commit your changes: `git commit -m "Add your feature"`
6. Push to your fork: `git push origin feature/your-feature-name`
7. Open a Pull Request
## Development Setup
See [DEVELOPMENT.md](./DEVELOPMENT.md) for detailed setup instructions.
## Pull Request Process
1. Ensure your code follows the project's style guidelines
2. Add tests for new features
3. Ensure all tests pass: `pnpm test`
4. Update documentation as needed
5. Ensure your branch is up to date with the main branch
6. Submit your PR with a clear description
## Coding Standards
### TypeScript/JavaScript
- Use TypeScript for all new code
- Follow the existing code style
- Use meaningful variable and function names
- Add JSDoc comments for public APIs
- Avoid `any` types - use proper typing
### React Components
- Use functional components with hooks
- Keep components small and focused
- Extract reusable logic into custom hooks
- Use proper prop types or TypeScript interfaces
### Git Commits
- Use clear, descriptive commit messages
- Follow conventional commits format when possible
- Keep commits focused on a single change
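For reference, a commit message can be checked against the Conventional Commits pattern with a small shell function. This is only a sketch that could back a `commit-msg` hook; the list of allowed types is illustrative, not project policy:

```bash
# Sketch of a Conventional Commits check (illustrative type list).
check_commit_msg() {
  # Accepts "type(optional-scope): description", e.g. "feat(api): add tenant filter"
  echo "$1" | grep -Eq '^(feat|fix|docs|style|refactor|perf|test|chore)(\([a-z0-9-]+\))?!?: .+'
}

check_commit_msg "feat(portal): add VM status page" && echo "valid"
check_commit_msg "added stuff" || echo "invalid"
```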
## Testing
- Write tests for all new features
- Ensure existing tests still pass
- Aim for >80% code coverage
- Test both success and error cases
## Documentation
- Update README.md if needed
- Add JSDoc comments for new functions
- Update API documentation for backend changes
- Keep architecture docs up to date
## Questions?
Feel free to open an issue for questions or reach out to the maintainers.

docs/guides/DEVELOPMENT.md
# Development Guide
**Last Updated**: 2025-01-09
This guide will help you set up your development environment for Sankofa Phoenix.
## Prerequisites
- Node.js 18+ and pnpm (or npm/yarn)
- PostgreSQL 14+ (for API)
- Go 1.21+ (for Crossplane provider)
- Docker (optional, for local services)
## Initial Setup
### 1. Clone the Repository
```bash
git clone https://github.com/sankofa/Sankofa.git
cd Sankofa
```
### 2. Install Dependencies
```bash
# Main application
pnpm install
# Portal
cd portal
npm install
cd ..
# API
cd api
npm install
cd ..
# Crossplane Provider
cd crossplane-provider-proxmox
go mod download
cd ..
```
### 3. Set Up Environment Variables
Create `.env.local` files:
```bash
# Root .env.local
cp .env.example .env.local
# Portal .env.local
cd portal
cp .env.example .env.local
cd ..
# API .env.local
cd api
cp .env.example .env.local
cd ..
```
### 4. Set Up Database
```bash
# Create database
createdb sankofa
# Run migrations
cd api
npm run db:migrate
```
## Running the Application
### Development Mode
```bash
# Main app (port 3000)
pnpm dev
# Portal (port 3001)
cd portal
npm run dev
# API (port 4000)
cd api
npm run dev
```
### Running Tests
```bash
# Main app tests
pnpm test
# Portal tests
cd portal
npm test
# Crossplane provider tests
cd crossplane-provider-proxmox
go test ./...
```
## Project Structure
```
Sankofa/
├── src/ # Main Next.js app
├── portal/ # Portal application
├── api/ # GraphQL API server
├── crossplane-provider-proxmox/ # Crossplane provider
├── gitops/ # GitOps configurations
├── cloudflare/ # Cloudflare configs
└── docs/ # Documentation
```
## Common Tasks
### Adding a New Component
1. Create component in `src/components/`
2. Add tests in `src/components/**/*.test.tsx`
3. Export from appropriate index file
4. Update Storybook (if applicable)
### Adding a New API Endpoint
1. Add GraphQL type definition in `api/src/schema/typeDefs.ts`
2. Add resolver in `api/src/schema/resolvers.ts`
3. Add service logic in `api/src/services/`
4. Add tests
### Database Migrations
```bash
cd api
# Create migration
npm run db:migrate:create migration-name
# Run migrations
npm run db:migrate
```
## Debugging
### Frontend
- Use React DevTools
- Check browser console
- Use Next.js debug mode: `NODE_OPTIONS='--inspect' pnpm dev`
### Backend
- Use VS Code debugger
- Check API logs
- Use GraphQL Playground at `http://localhost:4000/graphql`
## Code Quality
### Linting
```bash
pnpm lint
```
### Type Checking
```bash
pnpm type-check
```
### Formatting
```bash
pnpm format
```
## Troubleshooting
See [TROUBLESHOOTING.md](./TROUBLESHOOTING.md) for common issues and solutions.

# Force Unlock VM Instructions
**Date**: 2025-12-09
**Issue**: `qm unlock 100` is timing out
---
## Problem
The `qm unlock` command is timing out, which indicates:
- A stuck process is holding the lock
- The lock file is corrupted or in an invalid state
- Another operation is blocking the unlock
---
## Solution: Force Unlock
### Option 1: Use the Script (Recommended)
**On Proxmox Node (root@ml110-01)**:
```bash
# Copy the script to the Proxmox node
# Or run commands manually (see Option 2)
# Run the script
bash force-unlock-vm-proxmox.sh 100
```
### Option 2: Manual Commands
**On Proxmox Node (root@ml110-01)**:
```bash
# 1. Check for stuck processes
ps aux | grep -E 'qm|qemu' | grep 100
# 2. Check lock file
ls -la /var/lock/qemu-server/lock-100.conf
cat /var/lock/qemu-server/lock-100.conf 2>/dev/null
# 3. Kill stuck processes (if found)
pkill -9 -f 'qm.*100'
pkill -9 -f 'qemu.*100'
# 4. Wait a moment
sleep 2
# 5. Force remove lock file
rm -f /var/lock/qemu-server/lock-100.conf
# 6. Verify lock is gone
ls -la /var/lock/qemu-server/lock-100.conf
# Should show: No such file or directory
# 7. Check VM status
qm status 100
# 8. Try unlock again (should work now)
qm unlock 100
```
---
## If Lock Persists
### Check for Other Issues
```bash
# Check if VM is in a transitional state
qm status 100
# Check VM configuration
qm config 100
# Check for other locks
ls -la /var/lock/qemu-server/lock-*.conf
# Check system resources
df -h
free -h
```
### Nuclear Option: Restart Proxmox Services
**⚠️ WARNING: This will affect all VMs on the node**
```bash
# Only if absolutely necessary
systemctl restart pve-cluster
systemctl restart pvedaemon
```
---
## After Successful Unlock
1. **Monitor VM status**:
   ```bash
   qm status 100
   ```
2. **Check provider logs** (from Kubernetes):
   ```bash
   kubectl logs -n crossplane-system -l app=crossplane-provider-proxmox --tail=50 -f
   ```
3. **Watch the VM resource**:
   ```bash
   kubectl get proxmoxvm basic-vm-001 -w
   ```
4. **Expected outcome**:
   - The provider retries within 1 minute
   - VM configuration completes
   - The VM boots successfully
---
## Prevention
To prevent this issue in the future:
1. **Ensure proper VM shutdown** before operations
2. **Wait for operations to complete** before starting new ones
3. **Monitor for stuck processes** regularly
4. **Implement lock timeout handling** in provider code (already added)
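As a rough illustration of point 4, a lock-with-timeout pattern can be sketched in shell using an atomic `mkdir` as the mutex. This is illustrative only: Proxmox manages the real per-VM lock files under `/var/lock/qemu-server/` itself, and the provider implements its timeout handling in Go.

```bash
# Illustrative lock-with-timeout: mkdir is atomic, so the directory acts as a mutex.
acquire_vm_lock() {
  vmid="$1"; timeout="${2:-10}"; lockdir="/tmp/vm-lock-${vmid}"
  waited=0
  until mkdir "$lockdir" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timeout waiting for lock on VM $vmid" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "acquired lock for VM $vmid"
}

release_vm_lock() { rmdir "/tmp/vm-lock-${1}"; }

acquire_vm_lock 100 5
release_vm_lock 100
```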
---
**Last Updated**: 2025-12-09
**Status**: ⚠️ **MANUAL FORCE UNLOCK REQUIRED**

# Keycloak Deployment Guide
**Last Updated**: 2025-01-09
This guide covers deploying and configuring Keycloak for the Sankofa Phoenix platform.
## Prerequisites
- Kubernetes cluster with admin access
- kubectl configured
- Helm 3.x installed
- PostgreSQL database (for Keycloak persistence)
- Domain name configured (e.g., `keycloak.sankofa.nexus`)
## Deployment Steps
### 1. Deploy Keycloak via Helm
```bash
# Add Keycloak Helm repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Create namespace
kubectl create namespace keycloak
# Deploy Keycloak
helm install keycloak bitnami/keycloak \
  --namespace keycloak \
  --set auth.adminUser=admin \
  --set auth.adminPassword=$(openssl rand -base64 32) \
  --set postgresql.enabled=true \
  --set postgresql.auth.postgresPassword=$(openssl rand -base64 32) \
  --set ingress.enabled=true \
  --set ingress.hostname=keycloak.sankofa.nexus \
  --set ingress.tls=true \
  --set ingress.certManager=true \
  --set service.type=ClusterIP \
  --set service.port=8080
```
### 2. Configure Keycloak Clients
Apply the client configuration:
```bash
kubectl apply -f gitops/apps/keycloak/keycloak-clients.yaml
```
Or configure manually via Keycloak Admin Console:
#### Portal Client
- **Client ID**: `portal-client`
- **Client Protocol**: `openid-connect`
- **Access Type**: `confidential`
- **Valid Redirect URIs**:
- `https://portal.sankofa.nexus/*`
- `http://localhost:3000/*` (for development)
- **Web Origins**: `+`
- **Standard Flow Enabled**: Yes
- **Direct Access Grants Enabled**: Yes
#### API Client
- **Client ID**: `api-client`
- **Client Protocol**: `openid-connect`
- **Access Type**: `confidential`
- **Service Accounts Enabled**: Yes
- **Standard Flow Enabled**: Yes
### 3. Configure Multi-Realm Support
For multi-tenant support, create realms per tenant:
```bash
# Create realm for tenant
kubectl exec -it -n keycloak deployment/keycloak -- \
  /opt/bitnami/keycloak/bin/kcadm.sh create realms \
  -s realm=tenant-1 \
  -s enabled=true \
  --no-config \
  --server http://localhost:8080 \
  --realm master \
  --user admin \
  --password $(kubectl get secret keycloak-admin -n keycloak -o jsonpath='{.data.password}' | base64 -d)
```
### 4. Configure Identity Providers
#### LDAP/Active Directory
1. Navigate to Identity Providers in Keycloak Admin Console
2. Add LDAP provider
3. Configure connection settings:
- **Vendor**: Active Directory (or other)
- **Connection URL**: `ldap://your-ldap-server:389`
- **Users DN**: `ou=Users,dc=example,dc=com`
- **Bind DN**: `cn=admin,dc=example,dc=com`
- **Bind Credential**: (stored in secret)
#### SAML Providers
1. Add SAML 2.0 provider
2. Configure:
- **Entity ID**: Your SAML entity ID
- **SSO URL**: Your SAML SSO endpoint
- **Signing Certificate**: Your SAML signing certificate
### 5. Enable Blockchain Identity Verification
For blockchain-based identity verification:
1. Install Keycloak Identity Provider plugin (if available)
2. Configure blockchain connection:
- **Blockchain RPC URL**: `https://besu.sankofa.nexus:8545`
- **Contract Address**: (deployed identity contract)
- **Private Key**: (stored in Kubernetes Secret)
### 6. Configure Environment Variables
Update API service environment variables:
```yaml
env:
  - name: KEYCLOAK_URL
    value: "https://keycloak.sankofa.nexus"
  - name: KEYCLOAK_REALM
    value: "master" # or tenant-specific realm
  - name: KEYCLOAK_CLIENT_ID
    value: "api-client"
  - name: KEYCLOAK_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: keycloak-client-secret
        key: api-client-secret
```
### 7. Set Up Secrets
Create Kubernetes secrets for client credentials:
```bash
# Create secret for API client
kubectl create secret generic keycloak-client-secret \
  --from-literal=api-client-secret=$(openssl rand -base64 32) \
  --namespace keycloak

# Create secret for portal client
kubectl create secret generic keycloak-portal-secret \
  --from-literal=portal-client-secret=$(openssl rand -base64 32) \
  --namespace keycloak
```
### 8. Configure Cloudflare Access
If using Cloudflare Zero Trust:
1. Configure Cloudflare Access application for Keycloak
2. Set domain: `keycloak.sankofa.nexus`
3. Configure access policies (see `cloudflare/access-policies.yaml`)
4. Require MFA for admin access
### 9. Verify Deployment
```bash
# Check Keycloak pods
kubectl get pods -n keycloak
# Check Keycloak service
kubectl get svc -n keycloak
# Test Keycloak health
curl https://keycloak.sankofa.nexus/health
# Access Admin Console
# https://keycloak.sankofa.nexus/admin
```
### 10. Post-Deployment Configuration
1. **Change Admin Password**: Change default admin password immediately
2. **Configure Email**: Set up SMTP for password reset emails
3. **Enable MFA**: Configure TOTP and backup codes
4. **Set Up Themes**: Customize Keycloak themes for branding
5. **Configure Events**: Set up event listeners for audit logging
6. **Backup Configuration**: Export realm configuration regularly
## Troubleshooting
### Keycloak Not Starting
- Check PostgreSQL connection
- Verify resource limits
- Check logs: `kubectl logs -n keycloak deployment/keycloak`
### Client Authentication Failing
- Verify client secret matches
- Check redirect URIs are correct
- Verify realm name matches
### Multi-Realm Issues
- Ensure realm names match tenant IDs
- Verify realm is enabled
- Check realm configuration
## Security Best Practices
1. **Use Strong Passwords**: Generate strong passwords for all accounts
2. **Enable MFA**: Require MFA for admin and privileged users
3. **Rotate Secrets**: Regularly rotate client secrets
4. **Monitor Access**: Enable audit logging
5. **Use HTTPS**: Always use TLS for Keycloak
6. **Limit Admin Access**: Restrict admin console access via Cloudflare Access
7. **Backup Regularly**: Export and backup realm configurations
## References
- [Keycloak Documentation](https://www.keycloak.org/documentation)
- [Keycloak Helm Chart](https://github.com/bitnami/charts/tree/main/bitnami/keycloak)
- Client configuration: `gitops/apps/keycloak/keycloak-clients.yaml`

# Migration Guide
**Last Updated**: 2025-01-09
## Overview
This guide provides instructions for migrating between versions of Sankofa Phoenix and migrating from other platforms.
## Table of Contents
- [API Version Migration](#api-version-migration)
- [Database Migration](#database-migration)
- [Configuration Migration](#configuration-migration)
- [Azure Migration](#azure-migration)
- [Deployment Migration](#deployment-migration)
---
## API Version Migration
### Migrating Between API Versions
See [API Versioning Guide](./api/API_VERSIONING.md) for detailed API migration instructions.
### Quick Steps
1. Review API changelog for breaking changes
2. Update client code to use new API version
3. Test all API interactions
4. Deploy updated client code
5. Monitor for issues
---
## Database Migration
### Schema Migrations
Database migrations are managed automatically:
```bash
# Run migrations
cd api
npm run db:migrate
# Rollback if needed
npm run db:rollback
```
### Manual Migration Steps
1. **Backup Database**: Always back up before migrating
   ```bash
   pg_dump sankofa > backup_$(date +%Y%m%d).sql
   ```
2. **Run Migrations**: Execute migration scripts
   ```bash
   npm run db:migrate
   ```
3. **Verify Migration**: Check migration status
   ```bash
   npm run db:migrate:status
   ```
4. **Test Application**: Verify application functionality
5. **Monitor**: Watch for errors post-migration
### Data Migration
For data migrations:
1. **Export Data**: Export from source
2. **Transform Data**: Apply necessary transformations
3. **Import Data**: Import to new schema
4. **Validate**: Verify data integrity
5. **Update References**: Update any code references
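The export, transform, and import steps above can be sketched with a toy CSV. A real migration would export with `pg_dump` or `COPY` and import with `psql`; the file names and the added `tenant_id` column here are purely illustrative:

```bash
# 1. Export (stand-in for a real pg_dump/COPY export)
cat > /tmp/resources_export.csv <<'EOF'
id,name,status
101,web-01,RUNNING
102,db-01,STOPPED
EOF

# 2. Transform: append a tenant_id column to every row
awk -F, 'NR==1 {print $0 ",tenant_id"; next} {print $0 ",tenant-1"}' \
  /tmp/resources_export.csv > /tmp/resources_import.csv

# 3. Import would load the transformed file; validate row counts and spot-check data
head -1 /tmp/resources_import.csv
```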
---
## Configuration Migration
### Environment Variables
When updating configuration:
1. **Review Changes**: Check configuration changes in release notes
2. **Update `.env` Files**: Update environment variables
3. **Test Configuration**: Verify configuration is correct
4. **Deploy**: Deploy updated configuration
### Configuration Files
```bash
# Backup current configuration
cp .env.local .env.local.backup
# Update configuration
# Edit .env.local with new values
# Verify configuration
npm run config:validate
```
---
## Azure Migration
### From Azure to Sankofa Phoenix
See [Azure Migration Guide](./tenants/AZURE_MIGRATION.md) for comprehensive Azure migration instructions.
### Key Migration Areas
1. **Identity**: Migrate from Azure AD to Keycloak
2. **Resources**: Migrate VMs and resources
3. **Networking**: Update network configurations
4. **Storage**: Migrate data and storage
5. **Applications**: Update application configurations
---
## Deployment Migration
### Upgrading Deployment
1. **Review Release Notes**: Check for breaking changes
2. **Update Dependencies**: Update package versions
3. **Run Tests**: Ensure all tests pass
4. **Deploy**: Follow deployment procedures
5. **Verify**: Confirm deployment success
### Rolling Back Deployment
1. **Identify Issue**: Determine what needs rollback
2. **Stop Services**: Stop affected services
3. **Restore Previous Version**: Deploy previous version
4. **Restore Database** (if needed): Restore database backup
5. **Verify**: Confirm rollback success
---
## Common Migration Scenarios
### Scenario 1: Minor Version Update
**Steps:**
1. Review changelog
2. Update dependencies
3. Run tests
4. Deploy
5. Verify
### Scenario 2: Major Version Update
**Steps:**
1. Review migration guide for major version
2. Backup all data
3. Update configuration
4. Run database migrations
5. Update code for breaking changes
6. Test thoroughly
7. Deploy in staging first
8. Deploy to production
9. Monitor closely
### Scenario 3: Platform Migration
**Steps:**
1. Plan migration timeline
2. Set up new platform
3. Migrate data
4. Migrate applications
5. Update DNS/configurations
6. Test thoroughly
7. Cutover
8. Monitor and verify
---
## Migration Checklist
### Pre-Migration
- [ ] Review migration documentation
- [ ] Backup all data
- [ ] Test migration in staging
- [ ] Notify stakeholders
- [ ] Schedule migration window
### During Migration
- [ ] Execute migration steps
- [ ] Monitor progress
- [ ] Verify each step
- [ ] Document any issues
### Post-Migration
- [ ] Verify all functionality
- [ ] Test critical paths
- [ ] Monitor for errors
- [ ] Update documentation
- [ ] Communicate completion
---
## Troubleshooting
### Common Issues
1. **Migration Fails**: Check logs, rollback if needed
2. **Data Loss**: Restore from backup
3. **Configuration Errors**: Verify environment variables
4. **Service Downtime**: Check service status and logs
### Getting Help
- Check [Troubleshooting Guide](./TROUBLESHOOTING_GUIDE.md)
- Review migration documentation
- Check logs for specific errors
- Contact support if needed
---
## Related Documentation
- [API Versioning Guide](./api/API_VERSIONING.md)
- [Deployment Guide](./DEPLOYMENT.md)
- [Troubleshooting Guide](./TROUBLESHOOTING_GUIDE.md)
- [Azure Migration Guide](./tenants/AZURE_MIGRATION.md)
---
**Note**: Always backup data before performing migrations. Test migrations in a staging environment first.

# Monitoring and Observability Guide
**Last Updated**: 2025-01-09
This guide covers monitoring setup, Grafana dashboards, and observability for Sankofa Phoenix.
## Overview
Sankofa Phoenix uses a comprehensive monitoring stack:
- **Prometheus**: Metrics collection and storage
- **Grafana**: Visualization and dashboards
- **Loki**: Log aggregation
- **Alertmanager**: Alert routing and notification
## Tenant-Aware Metrics
All metrics are tagged with tenant IDs for multi-tenant isolation.
### Metric Naming Convention
```
sankofa_<component>_<metric>_<unit>{tenant_id="<id>",...}
```
Examples:
- `sankofa_api_requests_total{tenant_id="tenant-1",method="POST",status="200"}`
- `sankofa_billing_cost_usd{tenant_id="tenant-1",service="compute"}`
- `sankofa_proxmox_vm_cpu_usage_percent{tenant_id="tenant-1",vm_id="101"}`
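A quick sanity check of a metric's base name (everything before the label braces) against this convention can be scripted. This is only a sketch; in practice such a check would live in CI or in the exporter code:

```bash
# Check that a metric base name matches sankofa_<component>_<metric>[_<unit>]
valid_metric_name() {
  echo "$1" | grep -Eq '^sankofa(_[a-z0-9]+){2,}$'
}

valid_metric_name "sankofa_api_requests_total" && echo "valid"
valid_metric_name "sankofaRequestsTotal" || echo "invalid"
```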
## Grafana Dashboards
### 1. System Overview Dashboard
**Location**: `grafana/dashboards/system-overview.json`
**Metrics**:
- API request rate and latency
- Database connection pool usage
- Keycloak authentication rate
- System resource usage (CPU, memory, disk)
**Panels**:
- Request rate (requests/sec)
- P95 latency (ms)
- Error rate (%)
- Active connections
- Authentication success rate
### 2. Tenant Dashboard
**Location**: `grafana/dashboards/tenant-overview.json`
**Metrics**:
- Tenant resource usage
- Tenant cost tracking
- Tenant API usage
- Tenant user activity
**Panels**:
- Resource usage by tenant
- Cost breakdown by tenant
- API calls by tenant
- Active users by tenant
### 3. Billing Dashboard
**Location**: `grafana/dashboards/billing.json`
**Metrics**:
- Real-time cost tracking
- Cost by service/resource
- Budget vs actual spend
- Cost forecast
- Billing anomalies
**Panels**:
- Current month cost
- Cost trend (7d, 30d)
- Top resources by cost
- Budget utilization
- Anomaly detection alerts
### 4. Proxmox Infrastructure Dashboard
**Location**: `grafana/dashboards/proxmox-infrastructure.json`
**Metrics**:
- VM status and health
- Node resource usage
- Storage utilization
- Network throughput
- VM creation/deletion rate
**Panels**:
- VM status overview
- Node CPU/memory usage
- Storage pool usage
- Network I/O
- VM lifecycle events
### 5. Security Dashboard
**Location**: `grafana/dashboards/security.json`
**Metrics**:
- Authentication events
- Failed login attempts
- Policy violations
- Incident response metrics
- Audit log events
**Panels**:
- Authentication success/failure rate
- Policy violations by severity
- Incident response time
- Audit log volume
- Security events timeline
## Prometheus Configuration
### Scrape Configs
```yaml
scrape_configs:
  - job_name: 'sankofa-api'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - api
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: api
    metric_relabel_configs:
      - source_labels: [tenant_id]
        target_label: tenant_id
        regex: '(.+)'
        replacement: '${1}'

  - job_name: 'proxmox'
    static_configs:
      - targets:
          - proxmox-exporter:9091
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
```
### Recording Rules
```yaml
groups:
  - name: sankofa_rules
    interval: 30s
    rules:
      - record: sankofa:api:requests:rate5m
        expr: rate(sankofa_api_requests_total[5m])
      - record: sankofa:billing:cost:rate1h
        expr: rate(sankofa_billing_cost_usd[1h])
      - record: sankofa:proxmox:vm:count
        expr: count(sankofa_proxmox_vm_info) by (tenant_id)
```
## Alerting Rules
### Critical Alerts
```yaml
groups:
  - name: sankofa_critical
    interval: 30s
    rules:
      - alert: HighErrorRate
        expr: rate(sankofa_api_requests_total{status=~"5.."}[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors/sec"
      - alert: DatabaseConnectionPoolExhausted
        expr: sankofa_db_connections_active / sankofa_db_connections_max > 0.9
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Database connection pool nearly exhausted"
      - alert: BudgetExceeded
        expr: sankofa_billing_cost_usd / sankofa_billing_budget_usd > 1.0
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Budget exceeded for tenant {{ $labels.tenant_id }}"
      - alert: ProxmoxNodeDown
        expr: up{job="proxmox"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Proxmox node {{ $labels.instance }} is down"
```
### Billing Anomaly Detection
```yaml
- name: sankofa_billing_anomalies
  interval: 1h
  rules:
    - alert: CostAnomalyDetected
      expr: |
        (
          sankofa_billing_cost_usd
          - predict_linear(sankofa_billing_cost_usd[7d], 3600)
        ) / predict_linear(sankofa_billing_cost_usd[7d], 3600) > 0.5
      for: 2h
      labels:
        severity: warning
      annotations:
        summary: "Unusual cost increase detected for tenant {{ $labels.tenant_id }}"
```
## Real-Time Cost Tracking
### Metrics Exposed
- `sankofa_billing_cost_usd{tenant_id, service, resource_id}` - Current cost
- `sankofa_billing_cost_rate_usd_per_hour{tenant_id}` - Cost rate
- `sankofa_billing_budget_usd{tenant_id}` - Budget limit
- `sankofa_billing_budget_utilization_percent{tenant_id}` - Budget usage %
### Grafana Query Example
```promql
# Current month cost by tenant
sum(sankofa_billing_cost_usd) by (tenant_id)
# Cost trend (7 days)
rate(sankofa_billing_cost_usd[1h]) * 24 * 7
# Budget utilization
sankofa_billing_cost_usd / sankofa_billing_budget_usd * 100
```
## Log Aggregation
### Loki Configuration
Logs are collected with tenant context:
```yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
    tenant_id: ${TENANT_ID}
```
### Log Labels
- `tenant_id`: Tenant identifier
- `service`: Service name (api, portal, etc.)
- `level`: Log level (info, warn, error)
- `component`: Component name
### Log Queries
```logql
# Errors for a specific tenant
{tenant_id="tenant-1", level="error"}
# API errors (scope to the last hour via the query's time range)
{service="api", level="error"} | json
# Authentication failures
{component="auth"} | json | status="failed"
```
## Deployment
### Install Monitoring Stack
```bash
# Add Prometheus Operator Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--create-namespace \
--values grafana/values.yaml
# Apply custom dashboards
kubectl apply -f grafana/dashboards/
```
### Import Dashboards
```bash
# Import all dashboards
for dashboard in grafana/dashboards/*.json; do
kubectl create configmap $(basename $dashboard .json) \
--from-file=$dashboard \
--namespace=monitoring \
--dry-run=client -o yaml | kubectl apply -f -
done
```
## Access
- **Grafana**: https://grafana.sankofa.nexus
- **Prometheus**: https://prometheus.sankofa.nexus
- **Alertmanager**: https://alertmanager.sankofa.nexus
Default credentials (change immediately):
- Username: `admin`
- Password: (from secret `monitoring-grafana`)
## Best Practices
1. **Tenant Isolation**: Always filter metrics by tenant_id
2. **Retention**: Configure appropriate retention periods
3. **Cardinality**: Avoid high-cardinality labels
4. **Alerts**: Set up alerting for critical metrics
5. **Dashboards**: Create tenant-specific dashboards
6. **Cost Tracking**: Monitor billing metrics closely
7. **Anomaly Detection**: Enable anomaly detection for billing
## References
- Dashboard definitions: `grafana/dashboards/`
- Prometheus config: `monitoring/prometheus/`
- Alert rules: `monitoring/alerts/`

# Operations Runbook
**Last Updated**: 2025-01-09
This runbook provides operational procedures for Sankofa Phoenix.
## Table of Contents
1. [Daily Operations](#daily-operations)
2. [Tenant Management](#tenant-management)
3. [Backup Procedures](#backup-procedures)
4. [Incident Response](#incident-response)
5. [Maintenance Windows](#maintenance-windows)
6. [Troubleshooting](#troubleshooting)
## Daily Operations
### Health Checks
```bash
# Check all pods
kubectl get pods --all-namespaces
# Check API health
curl https://api.sankofa.nexus/health
# Check Keycloak health
curl https://keycloak.sankofa.nexus/health
# Check database connections
kubectl exec -it -n api deployment/api -- \
psql $DATABASE_URL -c "SELECT 1"
```
### Monitoring Dashboard Review
1. Review system overview dashboard
2. Check error rates and latency
3. Review billing anomalies
4. Check security events
5. Review Proxmox infrastructure status
### Log Review
```bash
# Recent errors
kubectl logs -n api deployment/api --tail=100 | grep -i error
# Authentication failures
kubectl logs -n api deployment/api | grep -i "auth.*fail"
# Billing issues
kubectl logs -n api deployment/api | grep -i billing
```
## Tenant Management
### Create New Tenant
Via GraphQL:

```graphql
mutation {
  createTenant(input: {
    name: "New Tenant"
    domain: "tenant.example.com"
    tier: STANDARD
  }) {
    id
    name
    status
  }
}
```

Or via the HTTP endpoint:

```bash
curl -X POST https://api.sankofa.nexus/graphql \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "mutation { createTenant(...) }"}'
```
### Suspend Tenant
```graphql
# Update tenant status
mutation {
  updateTenant(id: "tenant-id", input: { status: SUSPENDED }) {
    id
    status
  }
}
```
### Delete Tenant
```graphql
# Soft delete (recommended)
mutation {
  updateTenant(id: "tenant-id", input: { status: DELETED }) {
    id
    status
  }
}

# Hard delete (requires confirmation)
# This will delete all tenant resources
```
### Tenant Resource Quotas
```graphql
# Check quota usage
query {
  tenant(id: "tenant-id") {
    quotaLimits {
      compute { vcpu memory instances }
      storage { total perInstance }
    }
    usage {
      totalCost
      byResource {
        resourceId
        cost
      }
    }
  }
}
```
## Backup Procedures
### Database Backups
#### Automated Backups
Backups run daily at 2 AM UTC:
```bash
# Check backup job status
kubectl get cronjob -n api postgres-backup
# View recent backups
kubectl get pvc -n api | grep backup
```
#### Manual Backup
```bash
# Create backup (no -t flag: a TTY would corrupt the dump stream)
kubectl exec -n api deployment/postgres -- \
  pg_dump -U sankofa sankofa > backup-$(date +%Y%m%d).sql
# Restore from backup
kubectl exec -i -n api deployment/postgres -- \
psql -U sankofa sankofa < backup-20240101.sql
```
### Keycloak Backups
```bash
# Export realm configuration
kubectl exec -it -n keycloak deployment/keycloak -- \
/opt/keycloak/bin/kcadm.sh get realms/master \
--realm master \
--server http://localhost:8080 \
--user admin \
--password $ADMIN_PASSWORD > keycloak-realm-$(date +%Y%m%d).json
```
### Proxmox Backups
```bash
# Back up a VM with vzdump on the Proxmox node (adjust storage name)
vzdump <vmid> --mode snapshot --storage <backup-storage> --compress zstd
# Or schedule backups via Datacenter -> Backup in the Proxmox web UI
# Store exported configurations in version control or backup storage
```
### Tenant-Specific Backups
```graphql
# Export tenant metadata
query {
  tenant(id: "tenant-id") {
    id
    name
    resources {
      id
      name
      type
    }
  }
}
```

For the resource data itself, use the resource export API or a database dump filtered by `tenant_id`.
## Incident Response
### Incident Classification
- **P0 - Critical**: System down, data loss, security breach
- **P1 - High**: Major feature broken, performance degradation
- **P2 - Medium**: Minor feature broken, non-critical issues
- **P3 - Low**: Cosmetic issues, minor bugs
### Incident Response Process
1. **Detection**: Monitor alerts, user reports
2. **Triage**: Classify severity, assign owner
3. **Containment**: Isolate affected systems
4. **Investigation**: Root cause analysis
5. **Resolution**: Fix and verify
6. **Post-Mortem**: Document and improve
### Common Incidents
#### API Down
```bash
# Check pod status
kubectl get pods -n api
# Check logs
kubectl logs -n api deployment/api --tail=100
# Restart if needed
kubectl rollout restart deployment/api -n api
# Check database
kubectl exec -it -n api deployment/postgres -- \
psql -U sankofa -c "SELECT 1"
```
#### Database Connection Issues
```bash
# Check connection pool
kubectl exec -it -n api deployment/api -- \
curl http://localhost:4000/metrics | grep db_connections
# Restart API to reset connections
kubectl rollout restart deployment/api -n api
# Check database load
kubectl exec -it -n api deployment/postgres -- \
psql -U sankofa -c "SELECT * FROM pg_stat_activity"
```
#### High Error Rate
```bash
# Check error logs
kubectl logs -n api deployment/api | grep -i error | tail -50
# Check recent deployments
kubectl rollout history deployment/api -n api
# Rollback if needed
kubectl rollout undo deployment/api -n api
```
#### Billing Anomaly
```bash
# Check billing metrics
curl 'https://prometheus.sankofa.nexus/api/v1/query?query=sankofa_billing_cost_usd'
```

Review recent usage records:

```graphql
query {
  usage(tenantId: "tenant-id", timeRange: {...}) {
    totalCost
    byResource {
      resourceId
      cost
    }
  }
}
```

```bash
# Check for resource leaks (the label key is deployment-specific)
kubectl get all --all-namespaces -l tenant=tenant-id
```
## Maintenance Windows
### Scheduled Maintenance
Maintenance windows are scheduled:
- **Weekly**: Sunday 2-4 AM UTC (low traffic)
- **Monthly**: First Sunday 2-6 AM UTC (major updates)
### Pre-Maintenance Checklist
- [ ] Notify all tenants (24h advance)
- [ ] Create backup of database
- [ ] Create backup of Keycloak
- [ ] Review recent changes
- [ ] Prepare rollback plan
- [ ] Set maintenance mode flag
### Maintenance Mode
```bash
# Enable maintenance mode
kubectl set env deployment/api -n api MAINTENANCE_MODE=true
# Disable maintenance mode
kubectl set env deployment/api -n api MAINTENANCE_MODE=false
```
### Post-Maintenance Checklist
- [ ] Verify all services are up
- [ ] Run health checks
- [ ] Check error rates
- [ ] Verify backups completed
- [ ] Notify tenants of completion
- [ ] Update documentation
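The service checks in this list lend themselves to a small script; a minimal sketch where `true` stands in for the real probes (the curl and kubectl commands from Daily Operations):

```shell
# Run each probe, print PASS/FAIL, and report overall status at the end.
overall=0
check() {
  name=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS $name"
  else
    echo "FAIL $name"
    overall=1
  fi
}
check "api-health"      true  # e.g. curl -fsS https://api.sankofa.nexus/health
check "keycloak-health" true  # e.g. curl -fsS https://keycloak.sankofa.nexus/health
check "pods-ready"      true  # e.g. kubectl get pods -n api
if [ "$overall" -eq 0 ]; then echo "all checks passed"; else echo "some checks failed"; fi
```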
## Troubleshooting
### API Not Responding
```bash
# Check pod status
kubectl describe pod -n api -l app=api
# Check logs
kubectl logs -n api -l app=api --tail=100
# Check resource limits
kubectl top pod -n api
# Check network policies
kubectl get networkpolicies -n api
```
### Database Performance Issues
```bash
# Check slow queries
kubectl exec -it -n api deployment/postgres -- \
  psql -U sankofa -c "SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10"
# Check table sizes
kubectl exec -it -n api deployment/postgres -- \
psql -U sankofa -c "SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size FROM pg_tables ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC LIMIT 10"
# Analyze tables
kubectl exec -it -n api deployment/postgres -- \
psql -U sankofa -c "ANALYZE"
```
### Keycloak Issues
```bash
# Check Keycloak logs
kubectl logs -n keycloak deployment/keycloak --tail=100
# Check database connection
kubectl exec -it -n keycloak deployment/keycloak -- \
curl http://localhost:8080/health/ready
# Restart Keycloak
kubectl rollout restart deployment/keycloak -n keycloak
```
### Proxmox Integration Issues
```bash
# Check Crossplane provider
kubectl get pods -n crossplane-system | grep proxmox
# Check provider logs
kubectl logs -n crossplane-system deployment/crossplane-provider-proxmox
# Test Proxmox connection
kubectl exec -it -n crossplane-system deployment/crossplane-provider-proxmox -- \
curl https://proxmox-endpoint:8006/api2/json/version
```
## Security Audit
### Monthly Security Review
1. Review access logs
2. Check for failed authentication attempts
3. Review policy violations
4. Check for unusual API usage
5. Review incident response logs
6. Update security documentation
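For step 2, failed attempts can be tallied straight from the logs. A sketch assuming an `auth failed user=<name>` log format — the sample lines stand in for real `kubectl logs -n api deployment/api` output:

```shell
# Count authentication failures per user, most frequent first
printf '%s\n' \
  'auth failed user=alice' \
  'auth ok user=bob' \
  'auth failed user=alice' \
  'auth failed user=carol' |
  grep 'auth failed' | sed 's/.*user=//' | sort | uniq -c | sort -rn
```

Repeated failures against a single account are a common brute-force signal worth alerting on.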
### Access Review
```graphql
# List all users
query {
  users {
    id
    email
    role
    lastLogin
  }
}

# Review tenant access
query {
  tenant(id: "tenant-id") {
    users {
      id
      email
      role
    }
  }
}
```
## Emergency Contacts
- **On-Call Engineer**: (configure in PagerDuty/Opsgenie)
- **Database Admin**: (configure)
- **Security Team**: (configure)
- **Management**: (configure)
## References
- Monitoring Guide: `docs/MONITORING_GUIDE.md`
- Deployment Guide: `docs/DEPLOYMENT_GUIDE.md`
- Keycloak Guide: `docs/KEYCLOAK_DEPLOYMENT.md`

# pnpm Migration Guide
This guide explains the package management setup for the Sankofa Phoenix project.
## Current Status
The project supports both **pnpm** (recommended) and **npm** (fallback) for package management.
- **Root**: Uses `pnpm` with `pnpm-lock.yaml`
- **API**: Supports both `pnpm` and `npm` (via `.npmrc` configuration)
- **Portal**: Supports both `pnpm` and `npm` (via `.npmrc` configuration)
## Why pnpm?
pnpm offers several advantages:
1. **Disk Space Efficiency**: Shared dependency store across projects
2. **Speed**: Faster installation due to content-addressable storage
3. **Strict Dependency Resolution**: Prevents phantom dependencies
4. **Better Monorepo Support**: Excellent for managing multiple packages
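The disk-space saving comes from hard links: each file under `node_modules` is another directory entry for the same inode as the copy in pnpm's global store. The mechanism in miniature:

```shell
# Two names, one inode -- the second "copy" costs no additional disk blocks.
tmp=$(mktemp -d)
echo 'package file contents' > "$tmp/store-copy"
ln "$tmp/store-copy" "$tmp/node_modules-copy"   # hard link, as pnpm does
if [ "$tmp/store-copy" -ef "$tmp/node_modules-copy" ]; then
  echo 'hard-linked: same inode'
fi
rm -rf "$tmp"
```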
## Installation
### Using pnpm (Recommended)
```bash
# Install pnpm globally
npm install -g pnpm
# Or using corepack (Node.js 16.13+)
corepack enable
corepack prepare pnpm@latest --activate
# Install dependencies
pnpm install
# In API directory
cd api
pnpm install
# In Portal directory
cd portal
pnpm install
```
### Using npm (Fallback)
```bash
# Install dependencies with npm
npm install
# In API directory
cd api
npm install
# In Portal directory
cd portal
npm install
```
## CI/CD
The CI/CD pipeline (`.github/workflows/ci.yml`) supports both package managers:
```yaml
- name: Install dependencies
  # npm has no --frozen-lockfile flag; npm ci is the equivalent
  run: pnpm install --frozen-lockfile || npm ci
```
This ensures CI works regardless of which package manager is used locally.
## Migration Steps (Optional)
If you want to fully migrate to pnpm:
1. **Remove package-lock.json files** (if any exist):
```bash
find . -name "package-lock.json" -not -path "*/node_modules/*" -delete
```
2. **Install with pnpm**:
```bash
pnpm install
```
3. **Verify installation**:
```bash
pnpm list
```
4. **Update CI/CD** (optional):
- The current CI already supports both, so no changes needed
- You can make it pnpm-only if desired
## Benefits of Current Setup
The current flexible setup provides:
- ✅ **Backward Compatibility**: Works with both package managers
- ✅ **Team Flexibility**: Team members can use their preferred tool
- ✅ **CI Resilience**: CI works with either package manager
- ✅ **Gradual Migration**: Can migrate at own pace
## Recommended Practice
While both are supported, we recommend:
- **Local Development**: Use `pnpm` for better performance
- **CI/CD**: Current setup (both supported) is fine
- **Documentation**: Update to reflect pnpm as primary, npm as fallback
## Troubleshooting
### Module not found errors
If you encounter module resolution issues:
1. Delete `node_modules` and lock file
2. Reinstall with your chosen package manager:
```bash
rm -rf node_modules package-lock.json
pnpm install # or npm install
```
### Lock file conflicts
If you see conflicts between `package-lock.json` and `pnpm-lock.yaml`:
- Use `.gitignore` to exclude `package-lock.json` (already configured)
- Team should agree on primary package manager
- Document choice in README
---
**Last Updated**: 2025-01-09

# Quick Guide: Install Guest Agent via Proxmox Console
## Problem
VMs are not accessible via SSH from your current network location. Use Proxmox Web UI console instead.
## Solution: Proxmox Web UI Console
### Access Proxmox Web UI
**Site 1:** https://192.168.11.10:8006
**Site 2:** https://192.168.11.11:8006
### For Each VM (14 total):
1. **Open VM Console:**
- Click on the VM in Proxmox Web UI
- Click **"Console"** button
- Console opens in browser
2. **Login:**
- Username: `admin`
- Password: (your VM password)
3. **Install Guest Agent:**
```bash
sudo apt-get update
sudo apt-get install -y qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
sudo systemctl status qemu-guest-agent
```
4. **Verify:**
- Should see: `active (running)`
### After Installing on All VMs
Run verification:
```bash
./scripts/verify-guest-agent-complete.sh
./scripts/check-all-vm-ips.sh
```
## VM List
**Site 1 (8 VMs):**
- 136: nginx-proxy-vm
- 139: smom-management
- 141: smom-rpc-node-01
- 142: smom-rpc-node-02
- 145: smom-sentry-01
- 146: smom-sentry-02
- 150: smom-validator-01
- 151: smom-validator-02
**Site 2 (6 VMs):**
- 101: smom-rpc-node-03
- 104: smom-validator-04
- 137: cloudflare-tunnel-vm
- 138: smom-blockscout
- 144: smom-rpc-node-04
- 148: smom-sentry-04
## Expected Result
Once guest agent is running:
- ✅ Proxmox can automatically detect IP addresses
- ✅ IP assignment capability fully functional
- ✅ All guest agent features available

# Guides
This directory contains step-by-step guides and how-to documentation.
## Contents
- **[Build and Deploy Instructions](BUILD_AND_DEPLOY_INSTRUCTIONS.md)** - Instructions for building and deploying the system
- **[Force Unlock Instructions](FORCE_UNLOCK_INSTRUCTIONS.md)** - Instructions for force unlocking resources
- **[Quick Install Guest Agent](QUICK_INSTALL_GUEST_AGENT.md)** - Quick installation guide for guest agent
- **[Enable Guest Agent Manual](enable-guest-agent-manual.md)** - Manual steps for enabling guest agent
---
**Last Updated**: 2025-01-09

# Testing Guide
**Last Updated**: 2025-01-09 for Sankofa Phoenix
## Overview
This guide covers testing strategies, test suites, and best practices for the Sankofa Phoenix platform.
## Test Structure
```
api/
  src/
    services/__tests__/*.test.ts    # Unit tests for services
    adapters/__tests__/*.test.ts    # Adapter tests
    schema/__tests__/*.test.ts      # GraphQL resolver tests
portal/
  src/
    components/__tests__/*.test.tsx # Component tests
    lib/__tests__/*.test.ts         # Utility tests
blockchain/
  tests/*.test.ts                   # Smart contract tests
```
## Running Tests
### Frontend Tests
```bash
npm test # Run all frontend tests
npm test -- --ui # Run with Vitest UI
npm test -- --coverage # Generate coverage report
```
### Backend Tests
```bash
cd api
npm test # Run all API tests
npm test -- --coverage # Generate coverage report
```
### Blockchain Tests
```bash
cd blockchain
npm test # Run smart contract tests
```
### E2E Tests
```bash
npm run test:e2e # Run end-to-end tests
```
## Test Types
### 1. Unit Tests
Test individual functions and methods in isolation.
**Example: Resource Service Test**
```typescript
import { describe, it, expect } from 'vitest'
import { getResources } from '../services/resource'

describe('getResources', () => {
  it('should return resources', async () => {
    const mockContext = createMockContext()
    const result = await getResources(mockContext)
    expect(result).toBeDefined()
  })
})
```
### 2. Integration Tests
Test interactions between multiple components.
**Example: GraphQL Resolver Test**
```typescript
import { describe, it, expect } from 'vitest'
import { createTestSchema } from '../schema'
import { graphql } from 'graphql'

describe('Resource Resolvers', () => {
  it('should query resources', async () => {
    const query = `
      query {
        resources {
          id
          name
        }
      }
    `
    const result = await graphql({ schema: createTestSchema(), source: query })
    expect(result.data).toBeDefined()
  })
})
```
### 3. Component Tests
Test React components in isolation.
**Example: ResourceList Component Test**
```typescript
import { render, screen, waitFor } from '@testing-library/react'
import { ResourceList } from '../ResourceList'

describe('ResourceList', () => {
  it('should render resources', async () => {
    render(<ResourceList />)
    await waitFor(() => {
      expect(screen.getByText('Test Resource')).toBeInTheDocument()
    })
  })
})
```
### 4. E2E Tests
Test complete user workflows.
**Example: Resource Provisioning E2E**
```typescript
import { test, expect } from '@playwright/test'

test('should provision resource', async ({ page }) => {
  await page.goto('/resources')
  await page.click('text=Provision Resource')
  await page.fill('[name="name"]', 'test-resource')
  await page.selectOption('[name="type"]', 'VM')
  await page.click('text=Create')
  await expect(page.locator('text=test-resource')).toBeVisible()
})
```
## Test Coverage Goals
- **Unit Tests**: >80% coverage
- **Integration Tests**: >60% coverage
- **Component Tests**: >70% coverage
- **E2E Tests**: Critical user paths covered
## Mocking
### Mock Database
```typescript
const mockDb = {
  query: vi.fn().mockResolvedValue({ rows: [] }),
}
```
### Mock GraphQL Client
```typescript
vi.mock('@/lib/graphql/client', () => ({
  apolloClient: {
    query: vi.fn(),
    mutate: vi.fn(),
  },
}))
```
### Mock Provider APIs
```typescript
global.fetch = vi.fn().mockResolvedValue({
  ok: true,
  json: async () => ({ data: [] }),
})
```
## Test Utilities
### Test Helpers
```typescript
// test-utils.tsx
export function createMockContext(): Context {
  return {
    db: createMockDb(),
    user: {
      id: 'test-user',
      email: 'test@sankofa.nexus',
      name: 'Test User',
      role: 'ADMIN',
    },
  }
}
```
### Test Data Factories
```typescript
export function createMockResource(overrides = {}) {
  return {
    id: 'resource-1',
    name: 'Test Resource',
    type: 'VM',
    status: 'RUNNING',
    ...overrides,
  }
}
```
## CI/CD Integration
Tests run automatically on:
- **Pull Requests**: All test suites
- **Main Branch**: All tests + coverage reports
- **Releases**: Full test suite + E2E tests
## Best Practices
1. **Write tests before fixing bugs** (TDD approach)
2. **Test edge cases and error conditions**
3. **Keep tests independent and isolated**
4. **Use descriptive test names**
5. **Mock external dependencies**
6. **Clean up after tests**
7. **Maintain test coverage**
## Performance Testing
### Load Testing
```bash
# Use k6 for load testing
k6 run tests/load/api-load-test.js
```
### Stress Testing
```bash
# Test API under load
artillery run tests/stress/api-stress.yml
```
## Security Testing
- **Dependency scanning**: `npm audit`
- **SAST**: SonarQube analysis
- **DAST**: OWASP ZAP scans
- **Penetration testing**: Quarterly assessments
## Test Reports
Test reports are generated in:
- `coverage/` - Coverage reports
- `test-results/` - Test execution results
- `playwright-report/` - E2E test reports
## Troubleshooting Tests
### Tests Timing Out
- Check for unclosed connections
- Verify mocks are properly reset
- Increase timeout values if needed
### Flaky Tests
- Ensure tests are deterministic
- Fix race conditions
- Use proper wait conditions
### Database Test Issues
- Ensure test database is isolated
- Clean up test data after each test
- Use transactions for isolation

# Test Examples and Patterns
This document provides examples and patterns for writing tests in the Sankofa Phoenix project.
## Unit Tests
### Testing Service Functions
```typescript
// api/src/services/auth.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest'
import { login } from './auth'
import { getDb } from '../db'

// Mock dependencies (vi.mock calls are hoisted, so they must sit at module scope)
vi.mock('../db')
vi.mock('bcryptjs', () => ({
  compare: vi.fn().mockResolvedValue(true)
}))

describe('auth service', () => {
  beforeEach(() => {
    vi.clearAllMocks()
  })

  it('should authenticate valid user', async () => {
    const mockDb = {
      query: vi.fn().mockResolvedValue({
        rows: [{
          id: '1',
          email: 'user@example.com',
          name: 'Test User',
          password_hash: '$2a$10$hashed',
          role: 'USER',
          created_at: new Date(),
          updated_at: new Date(),
        }]
      })
    }
    vi.mocked(getDb).mockReturnValue(mockDb as any)

    const result = await login('user@example.com', 'password123')
    expect(result).toHaveProperty('token')
    expect(result.user.email).toBe('user@example.com')
  })

  it('should throw error for invalid credentials', async () => {
    const mockDb = {
      query: vi.fn().mockResolvedValue({
        rows: []
      })
    }
    vi.mocked(getDb).mockReturnValue(mockDb as any)

    await expect(login('invalid@example.com', 'wrong')).rejects.toThrow()
  })
})
```
### Testing GraphQL Resolvers
```typescript
// api/src/schema/resolvers.test.ts
import { describe, it, expect, vi } from 'vitest'
import { resolvers } from './resolvers'
import * as resourceService from '../services/resource'

vi.mock('../services/resource')

describe('GraphQL resolvers', () => {
  it('should return resources', async () => {
    const mockContext = {
      user: { id: '1', email: 'test@example.com', role: 'USER' },
      db: {} as any,
      tenantContext: null
    }
    const mockResources = [
      { id: '1', name: 'Resource 1', type: 'VM', status: 'RUNNING' }
    ]
    vi.mocked(resourceService.getResources).mockResolvedValue(mockResources as any)

    const result = await resolvers.Query.resources({}, {}, mockContext)
    expect(result).toEqual(mockResources)
    expect(resourceService.getResources).toHaveBeenCalledWith(mockContext, undefined)
  })
})
```
### Testing Adapters
```typescript
// api/src/adapters/proxmox/adapter.test.ts
import { describe, it, expect, vi, beforeEach } from 'vitest'
import { ProxmoxAdapter } from './adapter'

// Mock fetch
global.fetch = vi.fn()

describe('ProxmoxAdapter', () => {
  let adapter: ProxmoxAdapter

  beforeEach(() => {
    adapter = new ProxmoxAdapter({
      apiUrl: 'https://proxmox.example.com:8006',
      apiToken: 'test-token'
    })
    vi.clearAllMocks()
  })

  it('should discover resources', async () => {
    vi.mocked(fetch)
      .mockResolvedValueOnce({
        ok: true,
        json: async () => ({
          data: [{ node: 'node1' }]
        })
      } as Response)
      .mockResolvedValueOnce({
        ok: true,
        json: async () => ({
          data: [
            { vmid: 100, name: 'vm-100', status: 'running' }
          ]
        })
      } as Response)

    const resources = await adapter.discoverResources()
    expect(resources).toHaveLength(1)
    expect(resources[0].name).toBe('vm-100')
  })

  it('should handle API errors', async () => {
    vi.mocked(fetch).mockResolvedValueOnce({
      ok: false,
      status: 401,
      statusText: 'Unauthorized',
      text: async () => 'Authentication failed'
    } as Response)

    await expect(adapter.discoverResources()).rejects.toThrow()
  })
})
```
## Integration Tests
### Testing Database Operations
```typescript
// api/src/services/resource.integration.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest'
import { getDb } from '../db'
import { createResource, getResource } from './resource'

describe('resource service integration', () => {
  let db: any
  let context: any

  beforeAll(async () => {
    db = getDb()
    context = {
      user: { id: 'test-user', role: 'ADMIN' },
      db,
      tenantContext: null
    }
  })

  afterAll(async () => {
    // Cleanup test data
    await db.query('DELETE FROM resources WHERE name LIKE $1', ['test-%'])
    await db.end()
  })

  it('should create and retrieve resource', async () => {
    const input = {
      name: 'test-vm',
      type: 'VM',
      siteId: 'test-site'
    }
    const created = await createResource(context, input)
    expect(created.name).toBe('test-vm')

    const retrieved = await getResource(context, created.id)
    expect(retrieved.id).toBe(created.id)
    expect(retrieved.name).toBe('test-vm')
  })
})
```
## E2E Tests
### Testing API Endpoints
```typescript
// e2e/api.test.ts
import { describe, it, expect, beforeAll } from 'vitest'
import { request } from './helpers'

describe('API E2E tests', () => {
  let authToken: string

  beforeAll(async () => {
    // Login to get token
    const response = await request('/graphql', {
      method: 'POST',
      body: JSON.stringify({
        query: `
          mutation {
            login(email: "test@example.com", password: "test123") {
              token
            }
          }
        `
      })
    })
    const data = await response.json()
    authToken = data.data.login.token
  })

  it('should get resources', async () => {
    const response = await request('/graphql', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${authToken}`
      },
      body: JSON.stringify({
        query: `
          query {
            resources {
              id
              name
              type
            }
          }
        `
      })
    })
    const data = await response.json()
    expect(data.data.resources).toBeInstanceOf(Array)
  })
})
```
## React Component Tests
```typescript
// portal/src/components/Dashboard.test.tsx
import { describe, it, expect, vi } from 'vitest'
import { render, screen, waitFor } from '@testing-library/react'
import { Dashboard } from './Dashboard'

vi.mock('../lib/crossplane-client', () => ({
  createCrossplaneClient: () => ({
    getVMs: vi.fn().mockResolvedValue([
      { id: '1', name: 'vm-1', status: 'running' }
    ])
  })
}))

describe('Dashboard', () => {
  it('should render VM list', async () => {
    render(<Dashboard />)
    await waitFor(() => {
      expect(screen.getByText('vm-1')).toBeInTheDocument()
    })
  })
})
```
## Best Practices
1. **Use descriptive test names**: Describe what is being tested
2. **Arrange-Act-Assert pattern**: Structure tests clearly
3. **Mock external dependencies**: Don't rely on real external services
4. **Test error cases**: Verify error handling
5. **Clean up test data**: Remove data created during tests
6. **Use fixtures**: Create reusable test data
7. **Test edge cases**: Include boundary conditions
8. **Keep tests isolated**: Tests should not depend on each other
## Running Tests
```bash
# Run all tests
pnpm test
# Run tests in watch mode
pnpm test:watch
# Run tests with coverage
pnpm test:coverage
# Run specific test file
pnpm test path/to/test/file.test.ts
```
---
**Last Updated**: 2025-01-09

# Troubleshooting Guide
**Last Updated**: 2025-01-09
Common issues and solutions for Sankofa Phoenix.
## Table of Contents
1. [API Issues](#api-issues)
2. [Database Issues](#database-issues)
3. [Authentication Issues](#authentication-issues)
4. [Resource Provisioning](#resource-provisioning)
5. [Billing Issues](#billing-issues)
6. [Performance Issues](#performance-issues)
7. [Deployment Issues](#deployment-issues)
## API Issues
### API Not Responding
**Symptoms:**
- 503 Service Unavailable
- Connection timeout
- Health check fails
**Diagnosis:**
```bash
# Check pod status
kubectl get pods -n api
# Check logs
kubectl logs -n api deployment/api --tail=100
# Check service
kubectl get svc -n api api
```
**Solutions:**
1. Restart API deployment:
```bash
kubectl rollout restart deployment/api -n api
```
2. Check resource limits:
```bash
kubectl describe pod -n api -l app=api
```
3. Verify database connection:
```bash
kubectl exec -it -n api deployment/api -- \
psql $DATABASE_URL -c "SELECT 1"
```
### GraphQL Query Errors
**Symptoms:**
- GraphQL errors in response
- "Internal server error"
- Query timeouts
**Diagnosis:**
```bash
# Check API logs for errors
kubectl logs -n api deployment/api | grep -i error
# Test GraphQL endpoint
curl -X POST https://api.sankofa.nexus/graphql \
-H "Content-Type: application/json" \
-d '{"query": "{ health { status } }"}'
```
**Solutions:**
1. Check query syntax
2. Verify authentication token
3. Check database query performance
4. Review resolver logs
### Rate Limiting
**Symptoms:**
- 429 Too Many Requests
- Rate limit headers present
**Solutions:**
1. Implement request batching
2. Use subscriptions for real-time updates
3. Request rate limit increase (admin)
4. Implement client-side caching
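On the client side, the standard companion to these measures is retrying 429 responses with exponential backoff. A minimal sketch, with `false`/`true` standing in for the real request command (e.g. a `curl -fsS` call that fails on 429):

```shell
# retry <max_attempts> <cmd...>: rerun a command with 1s, 2s, 4s, ... backoff
retry() {
  max=$1; shift
  attempt=1; delay=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts"
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s"
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

retry 3 false || echo "request ultimately failed"
retry 3 true && echo "request succeeded"
```

Production clients should also honor the `Retry-After` header when the server sends one.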
## Database Issues
### Connection Pool Exhausted
**Symptoms:**
- "Too many connections" errors
- Slow query responses
- Database connection timeouts
**Diagnosis:**
```bash
# Check active connections
kubectl exec -it -n api deployment/postgres -- \
psql -U sankofa -c "SELECT count(*) FROM pg_stat_activity"
# Check connection pool metrics
curl https://api.sankofa.nexus/metrics | grep db_connections
```
**Solutions:**
1. Increase connection pool size:
```yaml
env:
  - name: DB_POOL_SIZE
    value: "30"
```
2. Close idle connections:
```sql
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle' AND state_change < NOW() - INTERVAL '5 minutes';
```
3. Restart API to reset connections
### Slow Queries
**Symptoms:**
- High query latency
- Timeout errors
- Database CPU high
**Diagnosis:**
```sql
-- Find slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Check table sizes
SELECT schemaname, tablename, pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
```
**Solutions:**
1. Add database indexes:
```sql
CREATE INDEX idx_resources_tenant_id ON resources(tenant_id);
CREATE INDEX idx_resources_status ON resources(status);
```
2. Analyze tables:
```sql
ANALYZE resources;
```
3. Optimize queries
4. Consider read replicas for heavy read workloads
### Database Lock Issues
**Symptoms:**
- Queries hanging
- "Lock timeout" errors
- Deadlock errors
**Solutions:**
1. Check for long-running transactions:
```sql
SELECT pid, state, query, now() - xact_start AS duration
FROM pg_stat_activity
WHERE state = 'active' AND xact_start IS NOT NULL
ORDER BY duration DESC;
```
2. Terminate blocking queries (if safe)
3. Review transaction isolation levels
4. Break up large transactions
## Authentication Issues
### Token Expired
**Symptoms:**
- 401 Unauthorized
- "Token expired" error
- Keycloak errors
**Solutions:**
1. Refresh token via Keycloak
2. Re-authenticate
3. Check token expiration settings in Keycloak
### Invalid Token
**Symptoms:**
- 401 Unauthorized
- "Invalid token" error
**Diagnosis:**
```bash
# Verify Keycloak is accessible
curl https://keycloak.sankofa.nexus/health
# Check Keycloak logs
kubectl logs -n keycloak deployment/keycloak --tail=100
```
**Solutions:**
1. Verify token format
2. Check Keycloak client configuration
3. Verify token signature
4. Check clock synchronization
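For steps 1 and 4, the token's payload can be decoded locally to inspect claims such as `exp` and `iat` without a server round-trip. JWTs are three dot-separated base64url segments; this sketch builds a sample token and decodes its payload the same way you would a real one:

```shell
# Build a sample payload segment (a real token comes from Keycloak)
payload_json='{"exp":1700000000,"sub":"test-user"}'
seg=$(printf '%s' "$payload_json" | base64 | tr -d '=\n' | tr '+/' '-_')
token="header.$seg.signature"

# Decode: take the 2nd segment, undo base64url, re-pad, then base64-decode
p=$(printf '%s' "$token" | cut -d. -f2 | tr '_-' '/+')
case $(( ${#p} % 4 )) in
  2) p="$p==" ;;
  3) p="$p=" ;;
esac
printf '%s' "$p" | base64 -d; echo
```

Comparing the decoded `exp` value against `date +%s` shows immediately whether the token is simply expired rather than malformed.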
### Permission Denied
**Symptoms:**
- 403 Forbidden
- "Access denied" error
**Solutions:**
1. Verify user role in Keycloak
2. Check tenant context
3. Review RBAC policies
4. Verify resource ownership
## Resource Provisioning
### VM Creation Fails
**Symptoms:**
- Resource stuck in PENDING
- Proxmox errors
- Crossplane errors
**Diagnosis:**
```bash
# Check Crossplane provider
kubectl get pods -n crossplane-system | grep proxmox
# Check ProxmoxVM resource
kubectl describe proxmoxvm -n default test-vm
# Check Proxmox connectivity
kubectl exec -it -n crossplane-system deployment/crossplane-provider-proxmox -- \
curl https://proxmox-endpoint:8006/api2/json/version
```
**Solutions:**
1. Verify Proxmox credentials
2. Check Proxmox node availability
3. Verify resource quotas
4. Check Crossplane provider logs
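For steps 1 and 4, the underlying Proxmox API error usually shows up in the provider logs, and Crossplane also surfaces failures as conditions on the managed resource. A sketch using the same resource name as the diagnosis commands above:

```shell
VM=test-vm
NS=default

# Provider logs usually contain the raw Proxmox API error message.
kubectl logs -n crossplane-system deployment/crossplane-provider-proxmox \
  --tail=200 | grep -i error

# Crossplane reports reconcile failures as status conditions on the resource.
kubectl get proxmoxvm "$VM" -n "$NS" \
  -o jsonpath='{range .status.conditions[*]}{.type}={.status} {.reason}{"\n"}{end}'
```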
### Resource Update Fails
**Symptoms:**
- Update mutation fails
- Resource not updating
- Status mismatch
**Solutions:**
1. Check resource state
2. Verify update permissions
3. Review resource constraints
4. Check for conflicting updates
## Billing Issues
### Incorrect Costs
**Symptoms:**
- Unexpected charges
- Missing usage records
- Cost discrepancies
**Diagnosis:**
```sql
-- Check usage records
SELECT * FROM usage_records
WHERE tenant_id = 'tenant-id'
ORDER BY timestamp DESC
LIMIT 100;
-- Check billing calculations
SELECT * FROM invoices
WHERE tenant_id = 'tenant-id'
ORDER BY created_at DESC;
```
**Solutions:**
1. Review usage records
2. Verify pricing configuration
3. Check for duplicate records
4. Recalculate costs if needed
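Duplicate usage records are a common cause of inflated costs and can be detected directly in SQL. `resource_id` below is an assumed column name; `tenant_id` and `timestamp` match the queries above:

```shell
# `resource_id` is an assumed column name -- adjust to the real schema.
DUP_SQL="SELECT tenant_id, resource_id, timestamp, COUNT(*) AS copies
FROM usage_records
GROUP BY tenant_id, resource_id, timestamp
HAVING COUNT(*) > 1;"

kubectl exec -n api deployment/postgres -- psql -U sankofa -c "$DUP_SQL"
```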
### Budget Alerts Not Triggering
**Symptoms:**
- Budget exceeded but no alert
- Alerts not sent
**Diagnosis:**
```sql
-- Check budget status
SELECT * FROM budgets
WHERE tenant_id = 'tenant-id';
-- Check alert configuration
SELECT * FROM billing_alerts
WHERE tenant_id = 'tenant-id' AND enabled = true;
```
**Solutions:**
1. Verify alert configuration
2. Check alert evaluation schedule
3. Review notification channels
4. Test alert manually
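Step 4 can be approximated by comparing spend against the budget by hand. The join below is only a sketch: apart from `tenant_id`, the column names (`amount`, `cost`) are hypothetical and must be adapted to the actual schema:

```shell
TENANT="tenant-id"

# `amount` and `cost` are hypothetical column names -- adapt to your schema.
kubectl exec -n api deployment/postgres -- psql -U sankofa -c \
  "SELECT b.id, b.amount AS budget, COALESCE(SUM(u.cost), 0) AS spend
   FROM budgets b
   LEFT JOIN usage_records u ON u.tenant_id = b.tenant_id
   WHERE b.tenant_id = '$TENANT'
   GROUP BY b.id, b.amount;"
```

If spend exceeds the budget here but no alert fired, the problem is in alert evaluation or delivery, not in the data.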
### Invoice Generation Fails
**Symptoms:**
- Invoice creation error
- Missing line items
- PDF generation fails
**Solutions:**
1. Check usage records exist
2. Verify billing period
3. Check PDF service
4. Review invoice template
## Performance Issues
### High Latency
**Symptoms:**
- Slow API responses
- Timeout errors
- High P95 latency
**Diagnosis:**
```bash
# Check API metrics
curl https://api.sankofa.nexus/metrics | grep request_duration
# Check database performance (requires the pg_stat_statements extension)
kubectl exec -it -n api deployment/postgres -- \
  psql -U sankofa -c "SELECT * FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10"
```
**Solutions:**
1. Add caching layer
2. Optimize database queries
3. Scale API horizontally
4. Review N+1 query problems
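P95 latency can also be queried from Prometheus directly. The metric name below is an assumption based on the `request_duration` grep above; adjust it to whatever `/metrics` actually exposes:

```shell
# Metric name is an assumption -- check /metrics for the real bucket name.
QUERY='histogram_quantile(0.95, sum(rate(request_duration_seconds_bucket[5m])) by (le))'

# Standard Prometheus HTTP API instant query.
curl -sG 'https://prometheus.sankofa.nexus/api/v1/query' \
  --data-urlencode "query=$QUERY"
```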
### High Memory Usage
**Symptoms:**
- OOM kills
- Pod restarts
- Memory warnings
**Solutions:**
1. Increase memory limits
2. Review memory leaks
3. Optimize data structures
4. Implement pagination
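Before raising limits, confirm which pods are actually near them. A sketch using standard `kubectl` commands; the limit values are illustrative starting points, not recommendations:

```shell
NS=api

# Show per-pod memory usage, highest first (requires metrics-server).
kubectl top pods -n "$NS" --sort-by=memory

# Raise requests/limits on the deployment; values here are illustrative.
kubectl set resources deployment/api -n "$NS" \
  --requests=memory=512Mi --limits=memory=1Gi
```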
### High CPU Usage
**Symptoms:**
- Slow responses
- CPU throttling
- Pod evictions
**Solutions:**
1. Scale horizontally
2. Optimize algorithms
3. Add caching
4. Review expensive operations
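Horizontal scaling can be done by hand for quick relief, or delegated to an HPA so it does not recur. Replica counts and thresholds below are illustrative:

```shell
NS=api
REPLICAS=4

# Quick relief: add replicas behind the service.
kubectl scale deployment/api -n "$NS" --replicas="$REPLICAS"

# Longer term: let an HPA track CPU utilization instead of scaling by hand.
kubectl autoscale deployment/api -n "$NS" --min=2 --max=6 --cpu-percent=70
```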
## Deployment Issues
### Pods Not Starting
**Symptoms:**
- Pods in Pending/CrashLoopBackOff
- Image pull errors
- Init container failures
**Diagnosis:**
```bash
# Check pod status
kubectl describe pod -n api <pod-name>
# Check events
kubectl get events -n api --sort-by='.lastTimestamp'
# Check logs
kubectl logs -n api <pod-name>
```
**Solutions:**
1. Check image availability
2. Verify resource requests/limits
3. Check node resources
4. Review init container logs
### Service Not Accessible
**Symptoms:**
- Service unreachable
- DNS resolution fails
- Ingress errors
**Diagnosis:**
```bash
# Check service
kubectl get svc -n api
# Check ingress
kubectl describe ingress -n api api
# Test service directly
kubectl port-forward -n api svc/api 8080:80
curl http://localhost:8080/health
```
**Solutions:**
1. Verify service selector matches pods
2. Check ingress configuration
3. Verify DNS records
4. Check network policies
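Step 1 comes down to checking the service's endpoints: if the selector matches no pods, the endpoints list is empty and traffic has nowhere to go. A sketch using the same `api` service as the diagnosis above:

```shell
SVC=api
NS=api

# An empty ENDPOINTS column means the selector matches no running pods.
kubectl get endpoints "$SVC" -n "$NS"

# Compare the service selector against the labels on the pods.
kubectl get svc "$SVC" -n "$NS" -o jsonpath='{.spec.selector}'; echo
kubectl get pods -n "$NS" --show-labels
```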
### Configuration Issues
**Symptoms:**
- Wrong environment variables
- Missing secrets
- ConfigMap errors
**Solutions:**
1. Verify environment variables:
```bash
kubectl exec -n api deployment/api -- env | grep -E "DB_|KEYCLOAK_"
```
2. Check secrets:
```bash
kubectl get secrets -n api
```
3. Review ConfigMaps:
```bash
kubectl get configmaps -n api
```
## Getting Help
### Logs
```bash
# API logs
kubectl logs -n api deployment/api --tail=100 -f
# Database logs
kubectl logs -n api deployment/postgres --tail=100
# Keycloak logs
kubectl logs -n keycloak deployment/keycloak --tail=100
# Crossplane logs
kubectl logs -n crossplane-system deployment/crossplane-provider-proxmox --tail=100
```
### Metrics
```bash
# Prometheus queries
curl 'https://prometheus.sankofa.nexus/api/v1/query?query=up'
# Grafana dashboards
# Access: https://grafana.sankofa.nexus
```
### Support
- **Documentation**: See `docs/` directory
- **Operations Runbook**: `docs/OPERATIONS_RUNBOOK.md`
- **API Documentation**: `docs/API_DOCUMENTATION.md`
## Common Error Messages
### "Database connection failed"
- Check database pod status
- Verify connection string
- Check network policies
### "Authentication required"
- Verify token in request
- Check token expiration
- Verify Keycloak is accessible
### "Quota exceeded"
- Review tenant quotas
- Check resource usage
- Request quota increase
### "Resource not found"
- Verify resource ID
- Check tenant context
- Review access permissions
### "Internal server error"
- Check application logs
- Review error details
- Check system resources
---
# Enable Guest Agent on VMs
## Automated Scripts (Recommended)
The project includes automated scripts for managing the guest agent:
### Enable Guest Agent
```bash
./scripts/enable-guest-agent-existing-vms.sh
```
This script will:
- Automatically discover all nodes on each Proxmox site
- Automatically discover all VMs on each node
- Check if guest agent is already enabled
- Enable guest agent on VMs that need it
- Provide detailed summary statistics
### Verify Guest Agent Status
```bash
./scripts/verify-guest-agent.sh
```
This script will:
- List all VMs with their guest agent status
- Show which VMs have guest agent enabled/disabled
- Provide per-node and per-site summaries
- Display VM names and VMIDs for easy identification
## Manual Instructions (Alternative)
If the automated scripts don't work, you can run the Proxmox CLI (`qm`) directly over SSH.
## Site 1 (ml110-01) - 192.168.11.10
### Step 1: Connect to Proxmox Host
```bash
ssh root@192.168.11.10
```
### Step 2: Enable Guest Agent for All VMs
```bash
for vmid in 118 132 133 127 128 123 124 121; do
echo "Enabling guest agent on VMID $vmid..."
qm set $vmid --agent 1
echo "✅ VMID $vmid done"
done
```
### Step 3: Verify (Optional)
```bash
for vmid in 118 132 133 127 128 123 124 121; do
agent=$(qm config $vmid | grep '^agent:' | cut -d: -f2 | tr -d ' ')
echo "VMID $vmid: agent=${agent:-not set}"
done
```
### Step 4: Exit
```bash
exit
```
## Site 2 (r630-01) - 192.168.11.11
### Step 1: Connect to Proxmox Host
```bash
ssh root@192.168.11.11
```
### Step 2: Enable Guest Agent for All VMs
```bash
for vmid in 119 134 135 122 129 130 125 126 131 120; do
echo "Enabling guest agent on VMID $vmid..."
qm set $vmid --agent 1
echo "✅ VMID $vmid done"
done
```
### Step 3: Verify (Optional)
```bash
for vmid in 119 134 135 122 129 130 125 126 131 120; do
agent=$(qm config $vmid | grep '^agent:' | cut -d: -f2 | tr -d ' ')
echo "VMID $vmid: agent=${agent:-not set}"
done
```
### Step 4: Exit
```bash
exit
```
## Quick One-Liners (Alternative)
If you have SSH key-based authentication set up, you can run these one-liners:
```bash
# Site 1
ssh root@192.168.11.10 "for vmid in 118 132 133 127 128 123 124 121; do qm set \$vmid --agent 1; done"
# Site 2
ssh root@192.168.11.11 "for vmid in 119 134 135 122 129 130 125 126 131 120; do qm set \$vmid --agent 1; done"
```
## VMID Reference
### Site 1 (ml110-01)
- 118: nginx-proxy-vm
- 132: smom-validator-01
- 133: smom-validator-02
- 127: smom-sentry-01
- 128: smom-sentry-02
- 123: smom-rpc-node-01
- 124: smom-rpc-node-02
- 121: smom-management
### Site 2 (r630-01)
- 119: cloudflare-tunnel-vm
- 134: smom-validator-03
- 135: smom-validator-04
- 122: smom-sentry-03
- 129: smom-sentry-04
- 130: smom-rpc-node-03
- 125: smom-rpc-node-04
- 126: smom-services
- 131: smom-blockscout
- 120: smom-monitoring
## Next Steps
After enabling guest agent in Proxmox:
1. **Wait for VMs to get IP addresses** (if they don't have them yet)
2. **Install guest agent package in each VM** (if not already installed):
```bash
ssh admin@<vm-ip>
sudo apt-get update
sudo apt-get install -y qemu-guest-agent
sudo systemctl enable qemu-guest-agent
sudo systemctl start qemu-guest-agent
```
## Automatic Guest Agent Enablement
**New VMs** created with the updated Crossplane provider will automatically have guest agent enabled in Proxmox configuration. The provider code has been updated to set `agent=1` for all new VMs, cloned VMs, and when updating existing VMs.
The guest agent package (`qemu-guest-agent`) is also automatically installed via cloud-init userData in the VM manifests, so new VMs will have both:
1. Guest agent enabled in Proxmox config (`agent=1`)
2. Guest agent package installed and running in the OS
For existing VMs, use the automated script or follow the manual instructions above.
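Whichever path you take, the end-to-end check is the same: if the Proxmox host can ping the agent, both the `agent=1` flag and the in-guest service are working. For example, from a Proxmox host:

```shell
VMID=118   # any VMID from the reference list above

# Succeeds only when agent=1 is set AND qemu-guest-agent runs in the guest.
qm agent "$VMID" ping && echo "guest agent responding on VMID $VMID"
```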