defiQUG 9daf1fd378 Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements

- Add comprehensive database migrations (001-024) for schema evolution
- Enhance API schema with expanded type definitions and resolvers
- Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth
- Implement new services: AI optimization, billing, blockchain, compliance, marketplace
- Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage)
- Update Crossplane provider with enhanced VM management capabilities
- Add comprehensive test suite for API endpoints and services
- Update frontend components with improved GraphQL subscriptions and real-time updates
- Enhance security configurations and headers (CSP, CORS, etc.)
- Update documentation and configuration files
- Add new CI/CD workflows and validation scripts
- Implement design system improvements and UI enhancements

2025-12-12 18:01:35 -08:00


Troubleshooting Guide

Common issues and solutions for Sankofa Phoenix.

API Issues
Database Issues
Authentication Issues
Resource Provisioning
Billing Issues
Performance Issues
Deployment Issues

API Issues

API Not Responding

Symptoms:

503 Service Unavailable
Connection timeout
Health check fails

Diagnosis:

# Check pod status
kubectl get pods -n api

# Check logs
kubectl logs -n api deployment/api --tail=100

# Check service
kubectl get svc -n api api

Solutions:

Restart API deployment:

kubectl rollout restart deployment/api -n api

Check resource limits:
```
kubectl describe pod -n api -l app=api
```

Verify database connection:

# Single quotes make $DATABASE_URL expand inside the pod, not in your local shell
kubectl exec -it -n api deployment/api -- \
  sh -c 'psql "$DATABASE_URL" -c "SELECT 1"'

GraphQL Query Errors

Symptoms:

GraphQL errors in response
"Internal server error"
Query timeouts

Diagnosis:

# Check API logs for errors
kubectl logs -n api deployment/api | grep -i error

# Test GraphQL endpoint
curl -X POST https://api.sankofa.nexus/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ health { status } }"}'

Solutions:

Check query syntax
Verify authentication token
Check database query performance
Review resolver logs
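
GraphQL servers commonly return HTTP 200 even when a query fails, with the details in a top-level "errors" array, so checking only the status code can hide the problem. The following is a minimal Python sketch for summarizing that array from a saved response body. It assumes the API follows the standard GraphQL response shape; `summarize_graphql_errors` is a hypothetical helper name, not part of the Sankofa Phoenix codebase.

```python
import json


def summarize_graphql_errors(body):
    """Return one line per entry in the response's top-level "errors" array."""
    payload = json.loads(body)
    lines = []
    for err in payload.get("errors") or []:
        # "path" is optional; when present it names the field whose resolver failed.
        path = ".".join(str(p) for p in err.get("path", []))
        message = err.get("message", "<no message>")
        lines.append(f"{path}: {message}" if path else message)
    return lines
```

Feed it the body returned by the curl test above. An empty list means the response carried no GraphQL-level errors; a path in the output points at the resolver to look for in the logs.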

Rate Limiting

Symptoms:

429 Too Many Requests
Rate-limit headers present in the response

Solutions:

Implement request batching
Use subscriptions for real-time updates
Ask an administrator to raise the rate limit
Implement client-side caching
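
Clients that hit 429 responses usually also benefit from backing off before retrying. The sketch below shows one common approach in Python: honor a `Retry-After` header if the server sends one, otherwise use exponential backoff with jitter. Whether this API sends `Retry-After` is an assumption, and `retry_delay`, `call_with_retry`, and the `send` callback are hypothetical names used only for illustration.

```python
import random
import time


def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After in its delta-seconds form, e.g. "12".
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass  # The HTTP-date form is not handled in this sketch.
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a status other than 429.

    `send` is a caller-supplied function returning (status, headers, body).
    Returns the last (status, body) seen, which may still be a 429.
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, headers.get("Retry-After")))
    return status, body
```

The `sleep` parameter is injectable so the retry loop can be exercised in tests without real delays.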

Database Issues

Connection Pool Exhausted

Symptoms:

"Too many connections" errors
Slow query responses
Database connection timeouts

Diagnosis:

# Check active connections
kubectl exec -it -n api deployment/postgres -- \
  psql -U sankofa -c "SELECT count(*) FROM pg_stat_activity"

# Check connection pool metrics
curl https://api.sankofa.nexus/metrics | grep db_connections
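
When the grep output is noisy, it can help to parse the metrics into values you can compare. This is a small Python sketch that assumes the endpoint serves the Prometheus text exposition format; the metric names in the usage note are hypothetical examples, not confirmed names from this API.

```python
def matching_samples(metrics_text, needle):
    """Return {series: value} for samples whose metric name contains `needle`."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # "<name>{<labels>} <value> [timestamp]"
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:].split()
        else:
            # "<name> <value> [timestamp]"
            fields = line.split()
            series, rest = fields[0], fields[1:]
        name = series.split("{", 1)[0]
        if needle in name and rest:
            samples[series] = float(rest[0])
    return samples
```

For example, `matching_samples(text, "db_connections")` on a saved copy of the `/metrics` output returns only the connection-related series, keyed by name and labels.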

Solutions:

Increase connection pool size:

env:
  - name: DB_POOL_SIZE
    value: "30"

Close idle connections:

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle' AND state_change < NOW() - INTERVAL '5 minutes';

Restart API to reset connections
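
Raising `DB_POOL_SIZE` only helps if PostgreSQL's `max_connections` setting can accommodate the pools of every API replica combined. A rough sizing check, sketched in Python: the `reserved` headroom for admin sessions and other clients is a hypothetical default, and the function name is illustrative.

```python
def max_pool_size_per_replica(max_connections, replicas, reserved=10):
    """Largest per-replica pool size that keeps the total under the server limit."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    available = max_connections - reserved
    return max(available // replicas, 0)
```

With `max_connections = 100` and three API replicas, this gives 30 per replica; with four replicas it drops to 22, so a pool size of 30 would oversubscribe the database.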

Slow Queries

Symptoms:

High query latency
Timeout errors
High database CPU usage

Diagnosis:

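-- Find slow queries (requires the pg_stat_statements extension;
-- the column is mean_exec_time in PostgreSQL 13+, mean_time in older versions)
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;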

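-- Check table sizes, largest first (format('%I.%I') quotes names that need it)
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass)) AS size
FROM pg_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(format('%I.%I', schemaname, tablename)::regclass) DESC;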

Solutions:

Add database indexes:

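-- CONCURRENTLY avoids blocking writes; it cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_resources_tenant_id ON resources(tenant_id);
CREATE INDEX CONCURRENTLY idx_resources_status ON resources(status);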

Analyze tables:
```
ANALYZE resources;
```

Optimize queries
Consider read replicas for heavy read workloads

Database Lock Issues

Symptoms:

Queries hanging
"Lock timeout" errors
Deadlock errors

Solutions:

Check for long-running transactions, including sessions left idle in a transaction (these still hold locks):

SELECT pid, state, query, now() - xact_start AS duration
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY duration DESC;

Terminate blocking queries with pg_terminate_backend(pid), but only if it is safe to do so
Review transaction isolation levels
Break up large transactions
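
Breaking up a large transaction usually means processing rows in batches and committing after each one, so locks are held only briefly. A minimal Python sketch of the batching step; `chunked` is a hypothetical helper, and how each batch is written depends on your database client.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

A caller would loop over `chunked(resource_ids, 500)` and run one short transaction per batch instead of a single transaction covering every row. The batch size of 500 is only an illustration.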

Authentication Issues

Token Expired

Symptoms:

401 Unauthorized
"Token expired" error
Keycloak errors

Solutions:

Refresh token via Keycloak
Re-authenticate
Check token expiration settings in Keycloak

Invalid Token

Symptoms:

401 Unauthorized
"Invalid token" error

Diagnosis:

# Verify Keycloak is accessible
curl https://keycloak.sankofa.nexus/health

# Check Keycloak logs
kubectl logs -n keycloak deployment/keycloak --tail=100

Solutions:

Verify token format
Check Keycloak client configuration
Verify token signature
Check clock synchronization
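
To check the token format and spot clock skew, it helps to look inside the token. The Python sketch below decodes a JWT's header and payload without verifying the signature, so it is suitable for inspection only. It assumes the tokens are standard three-segment JWTs; the function names are hypothetical.

```python
import base64
import json
import time


def decode_jwt_part(token, index):
    """Decode one base64url segment of a JWT (0 = header, 1 = payload).

    The signature is NOT verified; use this for inspection only.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated segments")
    segment = parts[index]
    # JWTs strip base64 padding; restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def seconds_until_expiry(token, now=None):
    """Positive if the token is still valid by the local clock, negative if expired."""
    claims = decode_jwt_part(token, 1)
    now = time.time() if now is None else now
    return claims["exp"] - now
```

Compare the `iss`, `aud`, and `exp` claims with what the API expects. A freshly issued token that already looks expired, or one with an `iat` in the future, suggests the clocks are out of sync.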

Permission Denied

Symptoms:

403 Forbidden
"Access denied" error

Solutions:

Verify user role in Keycloak
Check tenant context
Review RBAC policies
Verify resource ownership

Resource Provisioning

VM Creation Fails

Symptoms:

Resource stuck in PENDING
Proxmox errors
Crossplane errors

Diagnosis:

# Check Crossplane provider
kubectl get pods -n crossplane-system | grep proxmox

# Check ProxmoxVM resource
kubectl describe proxmoxvm -n default test-vm

# Check Proxmox connectivity
kubectl exec -it -n crossplane-system deployment/crossplane-provider-proxmox -- \
  curl https://proxmox-endpoint:8006/api2/json/version

Solutions:

Verify Proxmox credentials
Check Proxmox node availability
Verify resource quotas
Check Crossplane provider logs

Resource Update Fails

Symptoms:

Update mutation fails
Resource not updating
Status mismatch

Solutions:

Check resource state
Verify update permissions
Review resource constraints
Check for conflicting updates

Billing Issues

Incorrect Costs

Symptoms:

Unexpected charges
Missing usage records
Cost discrepancies

Diagnosis:

-- Check usage records
SELECT * FROM usage_records
WHERE tenant_id = 'tenant-id'
ORDER BY timestamp DESC
LIMIT 100;

-- Check billing calculations
SELECT * FROM invoices
WHERE tenant_id = 'tenant-id'
ORDER BY created_at DESC;

Solutions:

Review usage records
Verify pricing configuration
Check for duplicate records
Recalculate costs if needed
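
Duplicate usage records are a frequent cause of inflated costs. One way to find them, sketched in Python over rows exported from `usage_records`: the `tenant_id` and `timestamp` columns come from the queries above, while `resource_id` in the usage note is a hypothetical column name.

```python
from collections import Counter


def duplicate_keys(records, key_fields):
    """Return {key: count} for key tuples that occur more than once in `records`."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return {key: n for key, n in counts.items() if n > 1}
```

For example, `duplicate_keys(rows, ("tenant_id", "resource_id", "timestamp"))` returns every combination that was recorded more than once, along with how many times.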

Budget Alerts Not Triggering

Symptoms:

Budget exceeded but no alert
Alerts not sent

Diagnosis:

-- Check budget status
SELECT * FROM budgets
WHERE tenant_id = 'tenant-id';

-- Check alert configuration
SELECT * FROM billing_alerts
WHERE tenant_id = 'tenant-id' AND enabled = true;

Solutions:

Verify alert configuration
Check alert evaluation schedule
Review notification channels
Test alert manually

Invoice Generation Fails

Symptoms:

Invoice creation error
Missing line items
PDF generation fails

Solutions:

Check usage records exist
Verify billing period
Check PDF service
Review invoice template

Performance Issues

High Latency

Symptoms:

Slow API responses
Timeout errors
High P95 latency

Diagnosis:

# Check API metrics
curl https://api.sankofa.nexus/metrics | grep request_duration

# Check database performance
kubectl exec -it -n api deployment/postgres -- \
  psql -U sankofa -c "SELECT * FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10"

Solutions:

Add caching layer
Optimize database queries
Scale API horizontally
Review N+1 query problems
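
As an illustration of the caching suggestion, here is a minimal in-memory cache with a time-to-live, sketched in Python. It is not the project's caching layer; `TTLCache` and `get_or_load` are hypothetical names, and a shared cache would be needed once the API runs on more than one replica.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value for `key`, calling loader(key) on a miss or expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)
        self._entries[key] = (now + self.ttl, value)
        return value
```

Repeated lookups for the same key within the TTL reach the database only once. The clock is injectable so expiry can be tested without waiting.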

High Memory Usage

Symptoms:

OOM kills
Pod restarts
Memory warnings

Solutions:

Increase memory limits
Investigate memory leaks
Optimize data structures
Implement pagination
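
Pagination keeps memory bounded because only one page of results is held at a time. A cursor-based sketch in Python follows; the `fetch_page(cursor, limit)` callback and its `(items, next_cursor)` return shape are assumptions for illustration, not this API's actual contract.

```python
def iter_pages(fetch_page, page_size=100):
    """Yield items one at a time while holding only one page in memory.

    `fetch_page(cursor, limit)` is caller-supplied and returns
    (items, next_cursor); next_cursor is None on the last page.
    """
    cursor = None
    while True:
        items, cursor = fetch_page(cursor, page_size)
        yield from items
        if cursor is None:
            break
```

Because `iter_pages` is a generator, a caller that stops iterating early never fetches the remaining pages.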

High CPU Usage

Symptoms:

Slow responses
CPU throttling
Pod evictions

Solutions:

Scale horizontally
Optimize algorithms
Add caching
Review expensive operations

Deployment Issues

Pods Not Starting

Symptoms:

Pods stuck in Pending or CrashLoopBackOff
Image pull errors
Init container failures

Diagnosis:

# Check pod status
kubectl describe pod -n api <pod-name>

# Check events
kubectl get events -n api --sort-by='.lastTimestamp'

# Check logs (add --previous to see output from the last crashed container)
kubectl logs -n api <pod-name>

Solutions:

Check image availability
Verify resource requests/limits
Check node resources
Review init container logs
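
With many pods, it can be quicker to summarize why containers are waiting than to describe each pod in turn. The Python sketch below reads the JSON printed by `kubectl get pods -n api -o json` and lists the waiting reason per container, including init containers. `waiting_reasons` is a hypothetical helper name.

```python
import json


def waiting_reasons(pod_list_json):
    """Map pod name -> list of "container: reason" for containers stuck waiting."""
    result = {}
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        containers = status.get("initContainerStatuses", []) + status.get("containerStatuses", [])
        reasons = [
            f'{c["name"]}: {c["state"]["waiting"].get("reason", "Unknown")}'
            for c in containers
            if "waiting" in c.get("state", {})
        ]
        if reasons:
            result[pod["metadata"]["name"]] = reasons
    return result
```

Pods that have not been scheduled yet have no container statuses and will not appear here; for those, the events output above is the place to look.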

Service Not Accessible

Symptoms:

Service unreachable
DNS resolution fails
Ingress errors

Diagnosis:

# Check service
kubectl get svc -n api

# Check ingress
kubectl describe ingress -n api api

# Test service directly (port-forward blocks, so run curl in a second terminal)
kubectl port-forward -n api svc/api 8080:80
curl http://localhost:8080/health

Solutions:

Verify service selector matches pods
Check ingress configuration
Verify DNS records
Check network policies
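
A Service routes traffic only to pods whose labels contain every key/value pair in its selector, and a Service with no selector gets no endpoints created for it automatically. The check can be expressed in a few lines of Python; `selector_matches` is a hypothetical helper, and the `app=api` label is the one used by the commands earlier in this guide.

```python
def selector_matches(selector, pod_labels):
    """True if the selector is non-empty and every pair in it appears in the pod's labels."""
    return bool(selector) and all(
        pod_labels.get(key) == value for key, value in selector.items()
    )
```

Extra labels on the pod are fine, but a single missing or different label in the selector is enough to leave the Service with no endpoints.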

Configuration Issues

Symptoms:

Wrong environment variables
Missing secrets
ConfigMap errors

Solutions:

Verify environment variables:

kubectl exec -n api deployment/api -- env | grep -E "DB_|KEYCLOAK_"

Check secrets:
```
kubectl get secrets -n api
```

Review ConfigMaps:
```
kubectl get configmaps -n api
```

Getting Help

Logs

# API logs
kubectl logs -n api deployment/api --tail=100 -f

# Database logs
kubectl logs -n api deployment/postgres --tail=100

# Keycloak logs
kubectl logs -n keycloak deployment/keycloak --tail=100

# Crossplane logs
kubectl logs -n crossplane-system deployment/crossplane-provider-proxmox --tail=100

Metrics

# Prometheus queries
curl 'https://prometheus.sankofa.nexus/api/v1/query?query=up'

# Grafana dashboards
# Access: https://grafana.sankofa.nexus

Support

Documentation: See docs/ directory
Operations Runbook: docs/OPERATIONS_RUNBOOK.md
API Documentation: docs/API_DOCUMENTATION.md

Common Error Messages

"Database connection failed"

Check database pod status
Verify connection string
Check network policies

"Authentication required"

Verify token in request
Check token expiration
Verify Keycloak is accessible

"Quota exceeded"

Review tenant quotas
Check resource usage
Request quota increase

"Resource not found"

Verify resource ID
Check tenant context
Review access permissions

"Internal server error"

Check application logs
Review error details
Check system resources
