feat(it-ops): live inventory, drift API, Keycloak IT role, portal sync hint

- Add scripts/it-ops (Proxmox collector, IPAM drift, export orchestrator)
- Add sankofa-it-read-api stub with optional CORS and refresh
- Add systemd examples for read API, weekly inventory export, timer
- Add live-inventory-drift GitHub workflow (dispatch + weekly)
- Add IT controller spec, runbooks, Keycloak ensure-it-admin-role script
- Note IT_READ_API env on portal sync completion output

Made-with: Cursor
This commit is contained in:
defiQUG
2026-04-09 01:20:00 -07:00
parent 4eead3e53f
commit 61841b8291
14 changed files with 1384 additions and 0 deletions

# IT operations UI — Keycloak and Sankofa portal next steps
**Purpose:** Close the gap between Phase 0 (live inventory scripts + read API) and the full **Sankofa admin** IT controller described in [SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md](../02-architecture/SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md).

---
## 1. Keycloak
1. Create realm role **`sankofa-it-admin`** (idempotent): `bash scripts/deployment/keycloak-sankofa-ensure-it-admin-role.sh` (needs `KEYCLOAK_ADMIN_PASSWORD` in repo `.env`, SSH to Proxmox, CT 7802). Then assign the role to IT staff in the Keycloak Admin Console (or use a group + token mapper if you prefer group claims).
2. Map **only** platform IT staff; require **MFA** at realm or IdP policy.
3. **Do not** reuse client-admin groups used for `admin.sankofa.nexus` tenant administration unless policy explicitly allows.
4. Optional: client scope **it-ops** with claim `it_admin=true` for the IT BFF audience.
**Reference:** Keycloak CT / VMID in [ALL_VMIDS_ENDPOINTS.md](../04-configuration/ALL_VMIDS_ENDPOINTS.md); portal login runbook `scripts/deployment/enable-sankofa-portal-login-7801.sh`.
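
If you add the optional `it-ops` client scope, it helps to sanity-check that a user's access token actually carries the role or claim before wiring the UI gate. A minimal inspection sketch (the helper name is hypothetical; the claim layout follows Keycloak's standard `realm_access.roles` token shape). It decodes the payload WITHOUT verifying the signature, so it is for debugging only, never for authorization:

```python
import base64
import json

def has_it_admin(access_token: str) -> bool:
    """True if the token carries the sankofa-it-admin realm role or the
    optional it_admin claim. Payload decoded WITHOUT signature verification;
    use only for inspection, never to authorize requests."""
    payload_b64 = access_token.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    roles = claims.get("realm_access", {}).get("roles", [])
    return "sankofa-it-admin" in roles or claims.get("it_admin") is True
```

Real authorization belongs in the BFF (section 4), where the signature is verified against the realm JWKS.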

---
## 2. Sankofa portal (`Sankofa/portal` repo)
1. **Implemented:** protected route **`/it`** (`src/app/it/page.tsx`) gated by **`sankofa-it-admin`** / **`ADMIN`** (credentials bootstrap). API proxies: `GET /api/it/drift`, `GET /api/it/inventory`, `POST /api/it/refresh`.
2. **Configure on CT 7801:** **`IT_READ_API_URL`** (e.g. `http://192.168.11.<host>:8787`) and optional **`IT_READ_API_KEY`** (server-only; never `NEXT_PUBLIC_*`). Proxies to the read API on VLAN 11.
3. **Do not** expose `IT_READ_API_KEY` or Proxmox credentials to the browser bundle.
4. Display **`collected_at`** from JSON; show a stale warning if older than your SLO (e.g. 24h).
**Deploy:** `scripts/deployment/sync-sankofa-portal-7801.sh` after portal changes.

---
## 3. NPM (Nginx Proxy Manager)
Add an **internal** proxy host (optional TLS) from a hostname such as `it-api.sankofa.nexus` (LAN-only DNS) to **`127.0.0.1:8787`** on the host running the read API, **or** bind the service on a dedicated CT IP and point NPM at that upstream.

---
## 4. Full BFF (later)
Replace `services/sankofa-it-read-api/server.py` with a service that:
- Validates **OIDC** (Keycloak) JWTs.
- Stores **audit** rows for refresh and future writes.
- Adds **UniFi** and **NPM** collectors with `collected_at` per domain.
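
The audit requirement can start as a single table. A minimal stdlib sketch (schema and function names are hypothetical, not part of the current read API):

```python
import sqlite3
from datetime import datetime, timezone

def init_audit(conn: sqlite3.Connection) -> None:
    # One row per privileged action; timestamps are UTC ISO-8601.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS it_audit (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               at TEXT NOT NULL,
               actor TEXT NOT NULL,
               action TEXT NOT NULL,
               detail TEXT
           )"""
    )

def record(conn: sqlite3.Connection, actor: str, action: str, detail: str = "") -> None:
    conn.execute(
        "INSERT INTO it_audit (at, actor, action, detail) VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"), actor, action, detail),
    )
    conn.commit()
```

The `actor` value should come from the validated OIDC token (`sub` or `preferred_username`), never from a client-supplied field.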

---
## Related
- [SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md](SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md)
- [SANKOFA_MARKETPLACE_SURFACES.md](SANKOFA_MARKETPLACE_SURFACES.md) (native vs partner; catalog alignment)

# IT ops Phase 0 — live inventory scripts (implementation appendix)
**Purpose:** Canonical copy of Phase 0 scripts (also on disk under `scripts/it-ops/`). Use this page if you need to restore or review inline.
**Spec:** [SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md](../02-architecture/SANKOFA_IT_OPERATIONS_CONTROLLER_SPEC.md) section 5.1 and Phase 0.
## File layout
| Path | Role |
|------|------|
| `scripts/it-ops/lib/collect_inventory_remote.py` | Run on PVE via SSH stdin (`python3 -`) |
| `scripts/it-ops/compute_ipam_drift.py` | Local: merge live JSON + `config/ip-addresses.conf` + **`ALL_VMIDS_ENDPOINTS.md`** pipe tables (`--all-vmids-md`) |
| `scripts/it-ops/export-live-inventory-and-drift.sh` | Orchestrator: ping seed, SSH, write `reports/status/` |
| `services/sankofa-it-read-api/server.py` | Read-only HTTP: `/v1/inventory/live`, `/v1/inventory/drift` |
| `.github/workflows/live-inventory-drift.yml` | `workflow_dispatch` + weekly (graceful skip without LAN) |
**Exit codes (`compute_ipam_drift.py`):** **2** = duplicate guest IP; **0** otherwise. **`vmid_ip_mismatch_live_vs_all_vmids_doc`** in `drift.json` is informational (docs often lag live CT config).
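
A consumer (a CI step or the portal proxy) that reads `drift.json` instead of the shell exit status can mirror the same contract. A small sketch (the function name is hypothetical):

```python
import json

def drift_exit_code(drift_json: str) -> int:
    """Mirror compute_ipam_drift.py's contract: 2 when any duplicate
    guest IP is reported, 0 otherwise. Mismatch keys such as
    vmid_ip_mismatch_live_vs_all_vmids_doc stay informational."""
    drift = json.loads(drift_json)
    return 2 if drift.get("duplicate_ips") else 0
```
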

---
## `scripts/it-ops/lib/collect_inventory_remote.py`
```python
#!/usr/bin/env python3
"""Run ON a Proxmox cluster node (as root). Stdout: JSON live guest inventory."""
from __future__ import annotations
import json
import re
import subprocess
import sys
from datetime import datetime, timezone
def _run(cmd: list[str]) -> str:
return subprocess.check_output(cmd, text=True, stderr=subprocess.DEVNULL)
def _extract_ip_from_net_line(line: str) -> str | None:
m = re.search(r"ip=([0-9.]+)", line)
return m.group(1) if m else None
def _read_config(path: str) -> str:
try:
with open(path, encoding="utf-8", errors="replace") as f:
return f.read()
except OSError:
return ""
def main() -> None:
collected_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
try:
raw = _run(
["pvesh", "get", "/cluster/resources", "--output-format", "json"]
)
resources = json.loads(raw)
except (subprocess.CalledProcessError, json.JSONDecodeError) as e:
json.dump(
{
"collected_at": collected_at,
"error": f"pvesh_cluster_resources_failed: {e}",
"guests": [],
},
sys.stdout,
indent=2,
)
return
guests: list[dict] = []
for r in resources:
t = r.get("type")
if t not in ("lxc", "qemu"):
continue
vmid = r.get("vmid")
node = r.get("node")
if vmid is None or not node:
continue
vmid_s = str(vmid)
name = r.get("name") or ""
status = r.get("status") or ""
if t == "lxc":
cfg_path = f"/etc/pve/nodes/{node}/lxc/{vmid_s}.conf"
else:
cfg_path = f"/etc/pve/nodes/{node}/qemu-server/{vmid_s}.conf"
body = _read_config(cfg_path)
ip = ""
for line in body.splitlines():
if line.startswith("net0:"):
got = _extract_ip_from_net_line(line)
if got:
ip = got
break
if not ip and t == "qemu":
for line in body.splitlines():
if line.startswith("ipconfig0:"):
got = _extract_ip_from_net_line(line)
if got:
ip = got
break
guests.append(
{
"vmid": vmid_s,
"type": t,
"node": str(node),
"name": name,
"status": status,
"ip": ip,
"config_path": cfg_path,
}
)
out = {
"collected_at": collected_at,
"guests": sorted(guests, key=lambda g: int(g["vmid"])),
}
json.dump(out, sys.stdout, indent=2)
if __name__ == "__main__":
main()
```
---
## `scripts/it-ops/compute_ipam_drift.py`
```python
#!/usr/bin/env python3
"""Merge live JSON with config/ip-addresses.conf; write live_inventory.json + drift.json."""
from __future__ import annotations
import argparse
import json
import re
import sys
from pathlib import Path
IPV4_RE = re.compile(
r"(?<![0-9.])(?:[0-9]{1,3}\.){3}[0-9]{1,3}(?![0-9.])"
)
def parse_ip_addresses_conf(path: Path) -> tuple[dict[str, str], set[str]]:
var_map: dict[str, str] = {}
all_ips: set[str] = set()
if not path.is_file():
return var_map, all_ips
for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
s = line.strip()
if not s or s.startswith("#") or "=" not in s:
continue
key, _, val = s.partition("=")
key = key.strip()
val = val.strip()
if val.startswith('"') and val.endswith('"'):
val = val[1:-1]
elif val.startswith("'") and val.endswith("'"):
val = val[1:-1]
var_map[key] = val
for m in IPV4_RE.findall(val):
all_ips.add(m)
return var_map, all_ips
def hypervisor_related_keys(var_map: dict[str, str]) -> set[str]:
keys = set()
for k in var_map:
ku = k.upper()
if any(
x in ku
for x in (
"PROXMOX_HOST",
"PROXMOX_ML110",
"PROXMOX_R630",
"PROXMOX_R750",
"WAN_AGGREGATOR",
"NETWORK_GATEWAY",
"UDM_PRO",
"PUBLIC_IP_GATEWAY",
"PUBLIC_IP_ER605",
)
):
keys.add(k)
return keys
def main() -> None:
ap = argparse.ArgumentParser()
ap.add_argument("--live", type=Path, help="live JSON file (default stdin)")
ap.add_argument("--ip-conf", type=Path, default=Path("config/ip-addresses.conf"))
ap.add_argument("--out-dir", type=Path, required=True)
args = ap.parse_args()
live_raw = args.live.read_text(encoding="utf-8") if args.live else sys.stdin.read()
live = json.loads(live_raw)
guests = live.get("guests") or []
var_map, conf_ips = parse_ip_addresses_conf(args.ip_conf)
hyp_keys = hypervisor_related_keys(var_map)
hyp_ips: set[str] = set()
for k in hyp_keys:
if k not in var_map:
continue
for m in IPV4_RE.findall(var_map[k]):
hyp_ips.add(m)
ip_to_vmids: dict[str, list[str]] = {}
for g in guests:
ip = (g.get("ip") or "").strip()
if not ip:
continue
ip_to_vmids.setdefault(ip, []).append(g.get("vmid", "?"))
duplicate_ips = {ip: vms for ip, vms in ip_to_vmids.items() if len(vms) > 1}
guest_ip_set = set(ip_to_vmids.keys())
conf_only = sorted(conf_ips - guest_ip_set - hyp_ips)
live_only = sorted(guest_ip_set - conf_ips)
drift = {
"collected_at": live.get("collected_at"),
"guest_count": len(guests),
"duplicate_ips": duplicate_ips,
"guest_ips_not_in_ip_addresses_conf": live_only,
"ip_addresses_conf_ips_not_on_guests": conf_only,
"hypervisor_and_infra_ips_excluded_from_guest_match": sorted(hyp_ips),
"notes": [],
}
if live.get("error"):
drift["notes"].append(live["error"])
inv_out = {
"collected_at": live.get("collected_at"),
"source": "proxmox_cluster_pvesh_plus_config",
"guests": guests,
}
args.out_dir.mkdir(parents=True, exist_ok=True)
(args.out_dir / "live_inventory.json").write_text(
json.dumps(inv_out, indent=2), encoding="utf-8"
)
(args.out_dir / "drift.json").write_text(
json.dumps(drift, indent=2), encoding="utf-8"
)
print(f"Wrote {args.out_dir / 'live_inventory.json'}")
print(f"Wrote {args.out_dir / 'drift.json'}")
sys.exit(2 if duplicate_ips else 0)
if __name__ == "__main__":
main()
```
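
The drift classification above reduces to two set differences, with hypervisor and infrastructure IPs excluded from the guest-side comparison. A toy walk-through with hypothetical addresses:

```python
# Hypothetical addresses, illustrating the set algebra in compute_ipam_drift.py.
conf_ips = {"192.168.11.10", "192.168.11.20", "192.168.11.1"}  # from ip-addresses.conf
hyp_ips = {"192.168.11.1"}                                     # gateway/hypervisor keys
guest_ips = {"192.168.11.10", "192.168.11.30"}                 # live Proxmox guests

conf_only = sorted(conf_ips - guest_ips - hyp_ips)  # documented but not live
live_only = sorted(guest_ips - conf_ips)            # live but undocumented
```
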
---
## `scripts/it-ops/export-live-inventory-and-drift.sh`
```bash
#!/usr/bin/env bash
# Live Proxmox guest inventory + drift vs config/ip-addresses.conf.
# Usage: bash scripts/it-ops/export-live-inventory-and-drift.sh
# Requires: SSH key root@SEED, python3 locally and on PVE.
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=/dev/null
source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
SEED="${SEED_HOST:-${PROXMOX_HOST_R630_01:-192.168.11.11}}"
OUT_DIR="${OUT_DIR:-${PROJECT_ROOT}/reports/status}"
TS="$(date +%Y%m%d_%H%M%S)"
TMP="${TMPDIR:-/tmp}/live_inv_${TS}.json"
PY="${SCRIPT_DIR}/lib/collect_inventory_remote.py"
mkdir -p "$OUT_DIR"
stub_unreachable() {
python3 - <<PY
import json, datetime
print(json.dumps({
    "collected_at": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
"error": "seed_unreachable",
"guests": [],
}, indent=2))
PY
}
if ! ping -c1 -W2 "$SEED" >/dev/null 2>&1; then
stub_unreachable >"$TMP"
else
if ! ssh -o BatchMode=yes -o ConnectTimeout=15 -o StrictHostKeyChecking=no \
"root@${SEED}" "python3 -" <"$PY" >"$TMP" 2>/dev/null; then
stub_unreachable >"$TMP"
fi
fi
set +e
python3 "${SCRIPT_DIR}/compute_ipam_drift.py" --live "$TMP" \
--ip-conf "${PROJECT_ROOT}/config/ip-addresses.conf" --out-dir "$OUT_DIR"
DRIFT_RC=$?
set -e
cp -f "$OUT_DIR/live_inventory.json" "${OUT_DIR}/live_inventory_${TS}.json" 2>/dev/null || true
cp -f "$OUT_DIR/drift.json" "${OUT_DIR}/drift_${TS}.json" 2>/dev/null || true
rm -f "$TMP"
echo "Latest: ${OUT_DIR}/live_inventory.json , ${OUT_DIR}/drift.json"
# Exit 2 when duplicate_ips present (for CI).
exit "${DRIFT_RC}"
```
After creating files: `chmod +x scripts/it-ops/export-live-inventory-and-drift.sh scripts/it-ops/compute_ipam_drift.py`

---
## `.github/workflows/live-inventory-drift.yml`
```yaml
name: Live inventory and IPAM drift
on:
workflow_dispatch:
schedule:
- cron: '25 6 * * 1'
jobs:
drift:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Export live inventory (LAN optional)
run: |
set +e
bash scripts/it-ops/export-live-inventory-and-drift.sh
echo "exit=$?"
continue-on-error: true
- name: Upload artifacts
uses: actions/upload-artifact@v4
if: always()
with:
name: live-inventory-drift
path: |
reports/status/live_inventory.json
reports/status/drift.json
```
**Note:** On GitHub-hosted runners the collector usually writes `seed_unreachable`; use a **self-hosted LAN runner** for real data, or run the shell script on the operator workstation.

---
## `AGENTS.md` row (Quick pointers table)
Add:
| IT live inventory + drift (LAN) | `bash scripts/it-ops/export-live-inventory-and-drift.sh` → `reports/status/live_inventory.json`, `drift.json` — see [docs/03-deployment/SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md](docs/03-deployment/SANKOFA_IT_OPS_LIVE_INVENTORY_SCRIPTS.md) |

---
## `docs/MASTER_INDEX.md`
Add a row pointing to this deployment appendix and the updated spec.