Apply Composer changes: comprehensive API updates, migrations, middleware, and infrastructure improvements
- Add comprehensive database migrations (001-024) for schema evolution
- Enhance API schema with expanded type definitions and resolvers
- Add new middleware: audit logging, rate limiting, MFA enforcement, security, tenant auth
- Implement new services: AI optimization, billing, blockchain, compliance, marketplace
- Add adapter layer for cloud integrations (Cloudflare, Kubernetes, Proxmox, storage)
- Update Crossplane provider with enhanced VM management capabilities
- Add comprehensive test suite for API endpoints and services
- Update frontend components with improved GraphQL subscriptions and real-time updates
- Enhance security configurations and headers (CSP, CORS, etc.)
- Update documentation and configuration files
- Add new CI/CD workflows and validation scripts
- Implement design system improvements and UI enhancements
240
infrastructure/monitoring/README.md
Normal file
@@ -0,0 +1,240 @@
# Infrastructure Monitoring

Comprehensive monitoring solutions for all infrastructure components in Sankofa Phoenix.

## Overview

This directory contains monitoring components including custom Prometheus exporters, Grafana dashboards, and alerting rules for infrastructure monitoring.

## Components

### Exporters (`exporters/`)

Custom Prometheus exporters for:
- Proxmox VE metrics
- TP-Link Omada metrics
- Network switch/router metrics
- Infrastructure health checks

### Dashboards (`dashboards/`)

Grafana dashboards for:
- Infrastructure overview
- Proxmox cluster health
- Network performance
- Omada controller status
- Site-level monitoring

## Exporters

### Proxmox Exporter

The Proxmox exporter (`pve_exporter`) provides metrics for:
- VM status and resource usage
- Node health and performance
- Storage pool utilization
- Network interface statistics
- Cluster status

**Installation:**
```bash
# The PyPI package is prometheus-pve-exporter; it installs the pve_exporter command
pip install prometheus-pve-exporter
```

**Configuration:**
```yaml
exporter:
  listen_address: 0.0.0.0:9221
proxmox:
  endpoint: https://pve1.sankofa.nexus:8006
  username: monitoring@pam
  password: ${PROXMOX_PASSWORD}
```
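
The configuration above refers to the password through a `${PROXMOX_PASSWORD}` placeholder. Whether the exporter expands that placeholder itself is not stated here, so the following is only a minimal sketch of rendering the file before start-up, assuming the placeholder must be substituted by a wrapper. The function name `render_config` is hypothetical.

```python
from string import Template

# Copy of the configuration shown above, with the placeholder left in place.
RAW_CONFIG = """\
exporter:
  listen_address: 0.0.0.0:9221
proxmox:
  endpoint: https://pve1.sankofa.nexus:8006
  username: monitoring@pam
  password: ${PROXMOX_PASSWORD}
"""


def render_config(raw: str, env: dict) -> str:
    """Substitute ${VAR} placeholders from env.

    Template.substitute raises KeyError when a variable is missing, which
    avoids silently starting the exporter with an empty password.
    """
    return Template(raw).substitute(env)
```

In practice `env` would typically be `dict(os.environ)`, so the secret stays out of the file kept in version control.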

### Omada Exporter

Custom exporter for TP-Link Omada Controller metrics:
- Access point status
- Client device counts
- Network throughput
- Controller health

**See**: `exporters/omada_exporter/` for implementation

### Network Exporter

SNMP-based exporter for network devices:
- Switch port statistics
- Router interface metrics
- VLAN utilization
- Network topology changes

**See**: `exporters/network_exporter/` for implementation

## Dashboards

### Infrastructure Overview

Comprehensive dashboard showing:
- All sites status
- Resource utilization
- Health scores
- Alert summary

**Location**: `dashboards/infrastructure-overview.json`

### Proxmox Cluster

Dashboard for Proxmox clusters:
- Cluster health
- Node performance
- VM resource usage
- Storage utilization

**Location**: `dashboards/proxmox-cluster.json`

### Network Performance

Network performance dashboard:
- Bandwidth utilization
- Latency metrics
- Error rates
- Top talkers

**Location**: `dashboards/network-performance.json`

### Omada Controller

Omada-specific dashboard:
- Controller status
- Access point health
- Client statistics
- Network policies

**Location**: `dashboards/omada-controller.json`

## Installation

### Deploy Exporters

```bash
# Deploy all exporters
kubectl apply -f exporters/manifests/

# Or deploy individually
kubectl apply -f exporters/manifests/proxmox-exporter.yaml
kubectl apply -f exporters/manifests/omada-exporter.yaml
```

### Import Dashboards
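
```bash
# Import all dashboards to Grafana
./scripts/import-dashboards.sh

# Or import individually through the Grafana HTTP API.
# GRAFANA_URL and GRAFANA_API_TOKEN are placeholders for your instance and credentials.
curl -X POST "$GRAFANA_URL/api/dashboards/db" \
  -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d @dashboards/infrastructure-overview.json
```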
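
Before importing, it can help to sanity-check a dashboard file. The sketch below assumes the files in `dashboards/` wrap their content in a top-level `"dashboard"` object and position panels with `gridPos` on Grafana's 24-column grid. The function name `check_dashboard` and the trimmed sample payload are illustrative, not part of this repository.

```python
import json

GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid


def check_dashboard(payload: dict) -> list:
    """Return a list of problems found in a {"dashboard": {...}} payload."""
    dashboard = payload.get("dashboard")
    if not isinstance(dashboard, dict):
        return ['missing top-level "dashboard" object']
    problems = []
    if not dashboard.get("title"):
        problems.append("dashboard has no title")
    seen_ids = set()
    for panel in dashboard.get("panels", []):
        panel_id = panel.get("id")
        if panel_id in seen_ids:
            problems.append(f"duplicate panel id {panel_id}")
        seen_ids.add(panel_id)
        pos = panel.get("gridPos", {})
        if pos.get("x", 0) + pos.get("w", 0) > GRID_COLUMNS:
            problems.append(f"panel {panel_id} extends past column {GRID_COLUMNS}")
        if not panel.get("targets"):
            problems.append(f"panel {panel_id} has no query targets")
    return problems


# Trimmed, hypothetical sample shaped like the dashboards in this directory.
SAMPLE = json.loads("""
{
  "dashboard": {
    "title": "Proxmox Cluster Overview",
    "panels": [
      {"id": 1, "type": "stat", "targets": [{"expr": "count(pve_vm_info)"}],
       "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}},
      {"id": 2, "type": "graph", "targets": [{"expr": "pve_node_cpu_usage"}],
       "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4}}
    ]
  }
}
""")

print(check_dashboard(SAMPLE))  # prints []
```

An empty list means none of these structural checks failed; it does not confirm that the queries return data.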

## Configuration

### Prometheus Scrape Configuration

```yaml
scrape_configs:
  - job_name: 'proxmox'
    static_configs:
      - targets:
          - 'pve-exporter.monitoring.svc.cluster.local:9221'

  - job_name: 'omada'
    static_configs:
      - targets:
          - 'omada-exporter.monitoring.svc.cluster.local:9222'

  - job_name: 'network'
    static_configs:
      - targets:
          - 'network-exporter.monitoring.svc.cluster.local:9223'
```

### Alerting Rules

Alert rules are defined in `exporters/alert-rules/`:

- `proxmox-alerts.yaml`: Proxmox cluster alerts
- `omada-alerts.yaml`: Omada controller alerts
- `network-alerts.yaml`: Network infrastructure alerts

## Metrics

### Proxmox Metrics

- `pve_node_status`: Node status (0=offline, 1=online)
- `pve_vm_status`: VM status
- `pve_storage_used_bytes`: Storage usage
- `pve_network_rx_bytes`: Network receive bytes
- `pve_network_tx_bytes`: Network transmit bytes

### Omada Metrics

- `omada_ap_status`: Access point status
- `omada_clients_total`: Total client count
- `omada_throughput_bytes`: Network throughput
- `omada_controller_status`: Controller health

### Network Metrics

- `network_port_status`: Switch port status
- `network_port_rx_bytes`: Port receive bytes
- `network_port_tx_bytes`: Port transmit bytes
- `network_vlan_utilization`: VLAN utilization
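
To make the metric names above concrete, here is a minimal sketch of how one of them might appear on an exporter's `/metrics` endpoint in the Prometheus text exposition format. The `node` label and the node names `pve1` and `pve2` are assumptions for illustration; the real exporters may attach different labels. Label-value escaping is deliberately left out.

```python
def render_metric(name: str, help_text: str, samples: list) -> str:
    """Render one gauge in the Prometheus text exposition format.

    samples is a list of (labels_dict, value) pairs. Label values are not
    escaped here, so they must not contain quotes, backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "pve_node_status",
    "Node status (0=offline, 1=online)",
    [({"node": "pve1"}, 1), ({"node": "pve2"}, 0)],
))
```

Each sample line carries the metric name, its labels in braces, and the value, for example `pve_node_status{node="pve1"} 1`.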

## Alerts

### Critical Alerts

- Proxmox cluster node down
- Omada controller unreachable
- Network switch offline
- High resource utilization (>90%)

### Warning Alerts

- High resource utilization (>80%)
- Network latency spikes
- Access point offline
- Storage pool >80% full
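
The two resource-utilization thresholds listed above form tiers: a value above 90% is critical, and a value above 80% but not above 90% is a warning. A small sketch of that tiering follows, assuming utilization is expressed as a percentage. The function name is hypothetical, and the actual rules live in the alert rule files rather than in code like this.

```python
from typing import Optional

CRITICAL_THRESHOLD = 90.0  # percent, from the Critical Alerts list
WARNING_THRESHOLD = 80.0   # percent, from the Warning Alerts list


def utilization_severity(percent: float) -> Optional[str]:
    """Map a utilization percentage to an alert severity, or None if healthy."""
    # Check the stricter threshold first so a 95% reading is not reported
    # as a mere warning.
    if percent > CRITICAL_THRESHOLD:
        return "critical"
    if percent > WARNING_THRESHOLD:
        return "warning"
    return None


for value in (75.0, 85.0, 95.0):
    print(value, utilization_severity(value))
```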

## Troubleshooting

### Exporter Issues

```bash
# Check exporter status
kubectl get pods -n monitoring -l app=proxmox-exporter

# View exporter logs
kubectl logs -n monitoring -l app=proxmox-exporter

# Test exporter endpoint
curl http://proxmox-exporter.monitoring.svc.cluster.local:9221/metrics
```
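
Once the endpoint responds, a quick way to see whether the expected series are present is to scan the returned text for metric names. The sketch below assumes the output has been saved to a string. The sample output, its `storage` label, and the helper names are made up for illustration.

```python
def parse_metric_names(exposition_text: str) -> set:
    """Collect metric names from Prometheus text-format output, skipping comments."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The metric name ends at the first "{" (labels) or space (value).
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names


def missing_metrics(exposition_text: str, expected: set) -> set:
    """Return the expected metric names that do not appear in the output."""
    return expected - parse_metric_names(exposition_text)


# Hypothetical response body from the /metrics endpoint.
SAMPLE_OUTPUT = """\
# HELP pve_node_status Node status (0=offline, 1=online)
# TYPE pve_node_status gauge
pve_node_status{node="pve1"} 1
pve_storage_used_bytes{storage="local"} 1024
"""

EXPECTED = {"pve_node_status", "pve_vm_status", "pve_storage_used_bytes"}
print(sorted(missing_metrics(SAMPLE_OUTPUT, EXPECTED)))  # prints ['pve_vm_status']
```

A missing name usually points at the exporter's collection step or its credentials rather than at Prometheus or Grafana.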

### Dashboard Issues

```bash
# Verify dashboard import through the Grafana HTTP API
# (GRAFANA_URL and GRAFANA_API_TOKEN are placeholders for your instance and credentials)
curl -H "Authorization: Bearer $GRAFANA_API_TOKEN" "$GRAFANA_URL/api/search?type=dash-db"

# Check dashboard data sources
# In Grafana UI: Configuration > Data Sources
```

## Related Documentation

- [Proxmox Management](../proxmox/README.md)
- [Omada Management](../omada/README.md)
- [Network Management](../network/README.md)
- [Infrastructure Management](../README.md)

85
infrastructure/monitoring/dashboards/proxmox-cluster.json
Normal file
@@ -0,0 +1,85 @@
{
  "dashboard": {
    "title": "Proxmox Cluster Overview",
    "tags": ["proxmox", "infrastructure"],
    "timezone": "browser",
    "schemaVersion": 16,
    "version": 1,
    "refresh": "30s",
    "panels": [
      {
        "id": 1,
        "title": "Cluster Nodes Status",
        "type": "stat",
        "targets": [
          {
            "expr": "up{job=\"pve_exporter\"}",
            "legendFormat": "{{instance}}"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Total VMs",
        "type": "stat",
        "targets": [
          {
            "expr": "count(pve_vm_info)",
            "legendFormat": "VMs"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0}
      },
      {
        "id": 3,
        "title": "Running VMs",
        "type": "stat",
        "targets": [
          {
            "expr": "count(pve_vm_info{status=\"running\"})",
            "legendFormat": "Running"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 12, "y": 0}
      },
      {
        "id": 4,
        "title": "CPU Usage by Node",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_node_cpu_usage",
            "legendFormat": "{{node}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4}
      },
      {
        "id": 5,
        "title": "Memory Usage by Node",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_node_memory_usage",
            "legendFormat": "{{node}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4}
      },
      {
        "id": 6,
        "title": "Storage Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_storage_usage",
            "legendFormat": "{{storage}}"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 12}
      }
    ]
  }
}

131
infrastructure/monitoring/dashboards/proxmox-node.json
Normal file
@@ -0,0 +1,131 @@
{
  "dashboard": {
    "title": "Proxmox Node Details",
    "tags": ["proxmox", "node", "infrastructure"],
    "timezone": "browser",
    "schemaVersion": 16,
    "version": 1,
    "refresh": "30s",
    "panels": [
      {
        "id": 1,
        "title": "Node Status",
        "type": "stat",
        "targets": [
          {
            "expr": "up{job=\"pve_exporter\",instance=~\"$node\"}",
            "legendFormat": "{{instance}}"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "CPU Usage",
        "type": "gauge",
        "targets": [
          {
            "expr": "pve_node_cpu_usage{node=~\"$node\"}",
            "legendFormat": "{{node}}"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 6, "y": 0}
      },
      {
        "id": 3,
        "title": "Memory Usage",
        "type": "gauge",
        "targets": [
          {
            "expr": "pve_node_memory_usage{node=~\"$node\"}",
            "legendFormat": "{{node}}"
          }
        ],
        "gridPos": {"h": 4, "w": 6, "x": 12, "y": 0}
      },
      {
        "id": 4,
        "title": "CPU Usage Over Time",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_node_cpu_usage{node=~\"$node\"}",
            "legendFormat": "{{node}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 4}
      },
      {
        "id": 5,
        "title": "Memory Usage Over Time",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_node_memory_usage{node=~\"$node\"}",
            "legendFormat": "{{node}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 4}
      },
      {
        "id": 6,
        "title": "Storage Usage by Pool",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_storage_usage{node=~\"$node\"}",
            "legendFormat": "{{storage}}"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 12}
      },
      {
        "id": 7,
        "title": "Network I/O",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_node_net_in{node=~\"$node\"}",
            "legendFormat": "{{node}} - In"
          },
          {
            "expr": "pve_node_net_out{node=~\"$node\"}",
            "legendFormat": "{{node}} - Out"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 12}
      },
      {
        "id": 8,
        "title": "Disk I/O",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_node_disk_read{node=~\"$node\"}",
            "legendFormat": "{{node}} - Read"
          },
          {
            "expr": "pve_node_disk_write{node=~\"$node\"}",
            "legendFormat": "{{node}} - Write"
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 20}
      }
    ],
    "templating": {
      "list": [
        {
          "name": "node",
          "type": "query",
          "query": "label_values(pve_node_info, node)",
          "current": {
            "text": "All",
            "value": "$__all"
          },
          "options": []
        }
      ]
    }
  }
}

82
infrastructure/monitoring/dashboards/proxmox-vms.json
Normal file
@@ -0,0 +1,82 @@
{
  "dashboard": {
    "title": "Proxmox VMs",
    "tags": ["proxmox", "vms"],
    "timezone": "browser",
    "schemaVersion": 16,
    "version": 1,
    "refresh": "30s",
    "panels": [
      {
        "id": 1,
        "title": "VM CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_vm_cpu_usage",
            "legendFormat": "{{name}} ({{vmid}})"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "VM Memory Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_vm_memory_usage",
            "legendFormat": "{{name}} ({{vmid}})"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      },
      {
        "id": 3,
        "title": "VM Network I/O",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_vm_net_in",
            "legendFormat": "{{name}} - In"
          },
          {
            "expr": "pve_vm_net_out",
            "legendFormat": "{{name}} - Out"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 8}
      },
      {
        "id": 4,
        "title": "VM Disk I/O",
        "type": "graph",
        "targets": [
          {
            "expr": "pve_vm_disk_read",
            "legendFormat": "{{name}} - Read"
          },
          {
            "expr": "pve_vm_disk_write",
            "legendFormat": "{{name}} - Write"
          }
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 8}
      },
      {
        "id": 5,
        "title": "VM Status",
        "type": "table",
        "targets": [
          {
            "expr": "pve_vm_info",
            "format": "table",
            "instant": true
          }
        ],
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 16}
      }
    ]
  }
}
