Sync workspace: config, docs, scripts, CI, operator rules, and submodule pointers.

- Update dbis_core, cross-chain-pmm-lps, explorer-monorepo, metamask-integration, pr-workspace/chains
- Omit embedded publish git dirs and empty placeholders from index

Made-with: Cursor
defiQUG
2026-04-12 06:12:20 -07:00
parent 6fb6bd3993
commit dbd517b279
2935 changed files with 327972 additions and 5533 deletions

View File

@@ -2,17 +2,17 @@
**health-check-rpc-2101.sh** — Health check for Besu RPC on VMID 2101: container status, besu-rpc service, port 8545, eth_chainId, eth_blockNumber. Run from project root (LAN). See docs/09-troubleshooting/RPC_NODES_BLOCK_PRODUCTION_FIX.md.
**fix-core-rpc-2101.sh** — One-command fix for Core RPC 2101: start CT if stopped, restart Besu, verify RPC. Options: `--dry-run`, `--restart-only`. If Besu fails with JNA/NoClassDefFoundError, run fix-rpc-2101-jna-reinstall.sh first.
**fix-core-rpc-2101.sh** — One-command fix for Core RPC 2101: start CT if stopped, restart Besu, verify RPC. Options: `--dry-run`, `--apply` (mutations when `PROXMOX_SAFE_DEFAULTS=1`), `--restart-only`. Optional `PROXMOX_OPS_ALLOWED_VMIDS`. If Besu fails with JNA/NoClassDefFoundError, run fix-rpc-2101-jna-reinstall.sh first.
**fix-rpc-2101-jna-reinstall.sh** — Reinstall Besu in CT 2101 to fix JNA/NoClassDefFoundError; then re-run fix-core-rpc-2101.sh. Use `--dry-run` to print steps only.
**check-disk-all-vmids.sh** — Check root disk usage in all running containers on ml110, r630-01, r630-02. Use `--csv` for tab-separated output. For prevention and audits.
**run-all-maintenance-via-proxmox-ssh.sh** — Run all maintenance/fix scripts that use SSH to Proxmox VE (r630-01, ml110, r630-02). **Runs make-rpc-vmids-writable-via-ssh.sh first** (so 2101, 2500-2505 are writable), then resolve-and-fix-all, fix-rpc-2101-jna-reinstall, install-besu-permanent-on-missing-nodes, address-all-remaining-502s; optional E2E with `--e2e`. Use `--no-npm` to skip NPM proxy update, `--dry-run` to print steps only, `--verbose` to show all step output (no stderr hidden). Step 2 (2101 fix) has optional timeout: `STEP2_TIMEOUT=900` (default) or `STEP2_TIMEOUT=0` to disable. Run from project root (LAN).
**run-all-maintenance-via-proxmox-ssh.sh** — Run all maintenance/fix scripts that use SSH to Proxmox VE (r630-01, ml110, r630-02). **Runs make-rpc-vmids-writable-via-ssh.sh --apply first** (so 2101, 2500-2505 are writable), then resolve-and-fix-all, fix-rpc-2101-jna-reinstall, install-besu-permanent-on-missing-nodes, address-all-remaining-502s; optional E2E with `--e2e`. Use `--no-npm` to skip NPM proxy update, `--dry-run` to print steps only, `--verbose` to show all step output (no stderr hidden). Step 2 (2101 fix) has optional timeout: `STEP2_TIMEOUT=900` (default) or `STEP2_TIMEOUT=0` to disable. Run from project root (LAN).
**make-rpc-vmids-writable-via-ssh.sh** — SSHs to r630-01 and for each VMID 2101, 2500-2505: stops the CT, runs `e2fsck -f -y` on the rootfs LV, starts the CT. Use before fix-rpc-2101 or install-besu-permanent when CTs are read-only. `--dry-run` to print only. Run from project root (LAN).
**make-rpc-vmids-writable-via-ssh.sh** — SSHs to r630-01 and for each VMID (default 2101; override with `BESU_WRITABLE_VMIDS`): stops the CT, runs `e2fsck -f -y` on the rootfs LV, starts the CT. Use before fix-rpc-2101 or install-besu-permanent when CTs are read-only. `--dry-run` / `--apply`; with `PROXMOX_SAFE_DEFAULTS=1`, default is dry-run unless `--apply` or `PROXMOX_OPS_APPLY=1`. Optional `PROXMOX_OPS_ALLOWED_VMIDS`. Run from project root (LAN).
**make-validator-vmids-writable-via-ssh.sh** — SSHs to r630-01 (1000, 1001, 1002) and ml110 (1003, 1004); stops each validator CT, runs `e2fsck -f -y` on rootfs, starts the CT. Fixes "Read-only file system" / JNA crash loop on validators. Then run `fix-all-validators-and-txpool.sh`. See docs/08-monitoring/RPC_AND_VALIDATOR_TESTING_RUNBOOK.md.
**make-validator-vmids-writable-via-ssh.sh** — SSHs to r630-01 (1000, 1001, 1002) and r630-03 (1003, 1004); stops each validator CT, runs `e2fsck -f -y` on rootfs, starts the CT. Fixes "Read-only file system" / JNA crash loop on validators. Then run `fix-all-validators-and-txpool.sh`. See docs/08-monitoring/RPC_AND_VALIDATOR_TESTING_RUNBOOK.md.
**Sentries 1500-1502 (r630-01)** — If deploy-besu-node-lists or set-all-besu-max-peers-32 reports Skip/fail or "Read-only file system" for 1500-1502, they have the same read-only root issue. On the host: `pct stop 1500; e2fsck -f -y /dev/pve/vm-1500-disk-0; pct start 1500` (repeat for 1501, 1502). Then re-run deploy and max-peers/restart.
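The per-CT procedure above can be sketched as a dry-run loop that only prints the host-side commands (the `/dev/pve/vm-<vmid>-disk-0` LV path follows the 1500 example in the text; adjust if your rootfs LVs are named differently):

```shell
# Dry-run sketch of the sentry read-only fix: print, don't execute.
# LV path pattern follows the pct stop / e2fsck / pct start example above.
fix_cmds=""
for vmid in 1500 1501 1502; do
  fix_cmds="${fix_cmds}pct stop ${vmid}; e2fsck -f -y /dev/pve/vm-${vmid}-disk-0; pct start ${vmid}
"
done
printf '%s' "$fix_cmds"
```

Pipe the output to `bash` on the Proxmox host only after reviewing it.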
@@ -23,10 +23,18 @@
**fix-all-502s-comprehensive.sh** — Starts/serves backends for 10130, 10150/10151, 2101, 2500-2505, Cacti (Python stubs if needed). Use `--dry-run` to print actions without SSH. Does not update NPMplus; use `update-npmplus-proxy-hosts-api.sh` from LAN for that.
**daily-weekly-checks.sh** — Daily (explorer, indexer lag, RPC) and weekly (config API, thin pool, log reminder).
**schedule-daily-weekly-cron.sh** — Install cron: daily 08:00, weekly Sun 09:00.
**schedule-daily-weekly-cron.sh** — Install cron: daily 08:00, weekly Sun 09:00. Run from a persistent host checkout; set `CRON_PROJECT_ROOT=/srv/proxmox` when installing on a Proxmox node.
**ensure-firefly-primary-via-ssh.sh** — SSHs to r630-02 and normalizes `/opt/firefly/docker-compose.yml` on VMID 6200, installs an idempotent helper-backed `firefly.service`, and verifies `/api/v1/status`. It is safe for the current mixed stack where `firefly-core` already exists outside compose while Postgres and IPFS remain compose-managed. Use `--dry-run` to print actions only.
**ensure-fabric-sample-network-via-ssh.sh** — SSHs to r630-02 and ensures VMID 6000 has nested-LXC features, a boot-time `fabric-sample-network.service`, and a queryable `mychannel`. Use `--dry-run` to print actions only.
**ensure-legacy-monitor-networkd-via-ssh.sh** — SSHs to r630-01 and fixes the legacy `3000`-`3003` monitor/RPC-adjacent LXCs so `systemd-networkd` is enabled host-side and started in-guest. This is the safe path for unprivileged guests where `systemctl enable` fails from inside the CT. `--dry-run` / `--apply`; same `PROXMOX_SAFE_DEFAULTS` behavior as other guarded maintenance scripts.
**check-and-fix-explorer-lag.sh** — Checks RPC vs Blockscout block; if lag > threshold (default 500), runs `fix-explorer-indexer-lag.sh` (restart Blockscout).
**schedule-explorer-lag-cron.sh** — Install cron for lag check-and-fix: every 6 hours (0, 6, 12, 18). Log: `logs/explorer-lag-fix.log`. Use `--show` to print the line, `--install` to add to crontab, `--remove` to remove.
**schedule-explorer-lag-cron.sh** — Install cron for lag check-and-fix: every 6 hours (0, 6, 12, 18). Log: `logs/explorer-lag-fix.log`. Use `--show` to print the line, `--install` to add to crontab, `--remove` to remove. Run from a persistent host checkout; set `CRON_PROJECT_ROOT=/srv/proxmox` when installing on a Proxmox node.
**All schedule-*.sh installers** — Refuse transient roots such as `/tmp/...`. Install from a persistent checkout only.
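A minimal sketch of the transient-root refusal these installers share (the variable names and path list are assumptions for illustration, not the scripts' actual code; `/srv/proxmox` is the documented `CRON_PROJECT_ROOT` for Proxmox nodes):

```shell
# Hypothetical sketch of the guard: refuse to install cron entries whose
# project root lives under a transient mount.
PROJECT_ROOT="${CRON_PROJECT_ROOT:-/srv/proxmox}"
case "$PROJECT_ROOT" in
  /tmp/*|/var/tmp/*|/dev/shm/*)
    echo "Refusing transient project root: $PROJECT_ROOT" >&2
    GUARD_RESULT=refused
    ;;
  *)
    GUARD_RESULT=ok
    ;;
esac
echo "guard=$GUARD_RESULT root=$PROJECT_ROOT"
```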
## Optional: Alerting on failures

View File

@@ -5,7 +5,7 @@
# Usage: ./scripts/maintenance/apply-peer-plan-fixes.sh [--deploy-only] [--restart-2101-only]
# --deploy-only Only deploy node lists (no restarts).
# --restart-2101-only Only restart VMID 2101 (assumes lists already deployed).
# Requires: SSH to Proxmox hosts (r630-01, r630-02, ml110). Run from LAN.
# Requires: SSH to Proxmox hosts (r630-01, r630-02, r630-03). Run from LAN.
set -euo pipefail
@@ -32,19 +32,19 @@ if [[ "$RESTART_2101_ONLY" != true ]]; then
fi
if [[ "$DEPLOY_ONLY" == true ]]; then
echo "Done (deploy only). To restart RPC 2101: $PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh --restart-only"
echo "Done (deploy only). To restart RPC 2101: $PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh --restart-only --apply"
exit 0
fi
echo "--- Restart RPC 2101 to load new node lists ---"
"$PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh" --restart-only || { echo "Restart 2101 failed."; exit 1; }
"$PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh" --restart-only --apply || { echo "Restart 2101 failed."; exit 1; }
echo ""
echo "--- Optional: 2102 and 2201 max-peers=32 ---"
echo "Repo updated: smom-dbis-138/config/config-rpc-public.toml has max-peers=32."
echo "--- Optional: 2102 and 2201 max-peers=40 ---"
echo "Repo and live fleet now use max-peers=40 on the modern RPC tier."
echo "To apply on nodes (from host with SSH):"
echo " - 2102 (ml110): ensure config uses max-peers=32 (e.g. copy from repo config-rpc-core.toml), restart Besu."
echo " - 2201 (r630-02): ensure config uses max-peers=32 (e.g. copy from repo config-rpc-public.toml), restart Besu."
echo " - 2102 (r630-03): ensure config uses max-peers=40 (e.g. copy from repo config-rpc-core.toml), restart Besu."
echo " - 2201 (r630-02): ensure config uses max-peers=40 (e.g. copy from repo config-rpc-public.toml), restart Besu."
echo "Then re-run: ./scripts/verify/check-rpc-2101-all-peers.sh"
echo ""
echo "Done. Verify: ./scripts/verify/verify-rpc-2101-approve-and-sync.sh && ./scripts/verify/check-rpc-2101-all-peers.sh"

View File

@@ -0,0 +1,117 @@
#!/usr/bin/env bash
# Ensure the public Cacti CTs on r630-02 keep both their nginx landing page and
# Docker-backed Hyperledger Cacti API healthy.
#
# Expected runtime:
# - VMID 5201 / 5202: nginx on :80 for the public landing page
# - VMID 5201 / 5202: cacti.service exposing the internal API on :4000
# - Proxmox CT config includes `features: nesting=1,keyctl=1` for Docker-in-LXC
#
# Usage: ./scripts/maintenance/ensure-cacti-web-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_02 (default 192.168.11.12)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_02:-192.168.11.12}"
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
ensure_ct_features() {
local vmid="$1"
local conf="/etc/pve/lxc/${vmid}.conf"
local features
features="$(run_ssh "awk -F': ' '/^features:/{print \$2}' ${conf@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$features" == *"nesting=1"* && "$features" == *"keyctl=1"* ]]; then
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would add features: nesting=1,keyctl=1 to VMID $vmid and restart the CT"
return 0
fi
run_ssh "cp ${conf@Q} /root/${vmid}.conf.pre-codex.\$(date +%Y%m%d_%H%M%S)"
if [[ -n "$features" ]]; then
run_ssh "sed -i 's/^features:.*/features: nesting=1,keyctl=1/' ${conf@Q}"
else
run_ssh "printf '%s\n' 'features: nesting=1,keyctl=1' >> ${conf@Q}"
fi
run_ssh "pct shutdown $vmid --timeout 30 >/dev/null 2>&1 || pct stop $vmid >/dev/null 2>&1 || true"
run_ssh "pct start $vmid"
sleep 8
}
ensure_cacti_surface() {
local vmid="$1"
local ip="$2"
local label="$3"
local status
local local_check
local remote_script
ensure_ct_features "$vmid"
status="$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
log_warn "$label (VMID $vmid) is not running"
return 0
fi
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' http://${ip}/ 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]] && run_ssh "pct exec $vmid -- bash -lc 'curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null 2>&1'" >/dev/null 2>&1; then
log_ok "$label already serves both the landing page and internal Cacti API"
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would restart nginx/docker/cacti.service in VMID $vmid (${label})"
return 0
fi
printf -v remote_script '%s' "$(cat <<'EOF'
set -e
id -nG cacti 2>/dev/null | grep -qw docker || usermod -aG docker cacti || true
systemctl restart docker
systemctl enable --now nginx
systemctl reset-failed cacti.service || true
systemctl enable --now cacti.service
for _ in $(seq 1 20); do
if curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null 2>&1; then
break
fi
sleep 2
done
curl -fsS http://127.0.0.1/ >/dev/null
curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null
EOF
)"
run_ssh "pct exec $vmid -- bash --norc -lc $(printf '%q' "$remote_script")"
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' http://${ip}/ 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]] && run_ssh "pct exec $vmid -- bash -lc 'curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null 2>&1'" >/dev/null 2>&1; then
log_ok "$label restored on ${ip}:80 with a healthy internal Cacti API"
else
log_warn "$label is still only partially healthy on VMID $vmid"
fi
}
echo ""
echo "=== Ensure Cacti surfaces ==="
echo " Host: $PROXMOX_HOST dry-run=$DRY_RUN"
echo ""
ensure_cacti_surface 5201 "192.168.11.177" "Cacti ALLTRA"
ensure_cacti_surface 5202 "192.168.11.251" "Cacti HYBX"
echo ""
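Instead of editing `/etc/pve/lxc/<vmid>.conf` with `sed`, the same features can usually be set with `pct set` (shown here as a printed sketch; confirm the flag against your PVE version before running):

```shell
# Print the pct-native equivalent of the features edit above for both
# Cacti CTs; pct set rewrites the CT config, a restart applies it.
for vmid in 5201 5202; do
  cmd="pct set ${vmid} --features nesting=1,keyctl=1"
  echo "$cmd"
done
```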

View File

@@ -1,9 +1,10 @@
#!/usr/bin/env bash
# Ensure Core RPC nodes 2101 and 2102 have TXPOOL and ADMIN (and DEBUG) in rpc-http-api and rpc-ws-api.
# Ensure Core RPC nodes 2101, 2103 (Thirdweb admin core), and 2102 have TXPOOL and ADMIN (and DEBUG) in rpc-http-api and rpc-ws-api.
# Does NOT add txpool_besuClear/txpool_clear/admin_removeTransaction — Besu does not implement them.
# VMID 2103 uses /etc/besu/config-rpc-core.toml on r630-01; repo canonical: smom-dbis-138/config/config-rpc-thirdweb-admin-core.toml
# See: docs/04-configuration/CORE_RPC_2101_2102_TXPOOL_ADMIN_STATUS.md
#
# Usage: ./scripts/maintenance/ensure-core-rpc-config-2101-2102.sh [--dry-run] [--2101-only] [--2102-only]
# Usage: ./scripts/maintenance/ensure-core-rpc-config-2101-2102.sh [--dry-run] [--2101-only] [--2102-only] [--2103-only]
set -euo pipefail
@@ -17,18 +18,23 @@ RPC_WS_API='["ETH","NET","WEB3","TXPOOL","QBFT","ADMIN"]'
VMID_2101=2101
VMID_2102=2102
VMID_2103=2103
HOST_2101="${PROXMOX_HOST_R630_01:-192.168.11.11}"
HOST_2102="${PROXMOX_HOST_ML110:-192.168.11.10}"
HOST_2103="${PROXMOX_HOST_R630_01:-192.168.11.11}"
CONFIG_2101="/etc/besu/config-rpc-core.toml"
CONFIG_2102="/etc/besu/config-rpc.toml"
CONFIG_2103="/etc/besu/config-rpc-core.toml"
DRY_RUN=false
ONLY_2101=false
ONLY_2102=false
ONLY_2103=false
for a in "$@"; do
[[ "$a" == "--dry-run" ]] && DRY_RUN=true
[[ "$a" == "--2101-only" ]] && ONLY_2101=true
[[ "$a" == "--2102-only" ]] && ONLY_2102=true
[[ "$a" == "--2103-only" ]] && ONLY_2103=true
done
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$1" "$2"; }
@@ -42,7 +48,7 @@ ensure_apis() {
local config_path=$3
log_info "VMID $vmid ($host): ensuring $config_path has TXPOOL, ADMIN, DEBUG..."
if $DRY_RUN; then
echo " Would set rpc-http-api and rpc-ws-api to include TXPOOL, ADMIN, DEBUG, QBFT, TRACE (2101/2102)"
echo " Would set rpc-http-api and rpc-ws-api to include TXPOOL, ADMIN, DEBUG, QBFT, TRACE (2101/2103/2102)"
return 0
fi
# Pass API lists via env so quoting is safe; remote sed updates the config
@@ -63,18 +69,23 @@ ensure_apis() {
}
echo ""
echo "=== Ensure Core RPC 2101 / 2102 — TXPOOL + ADMIN (max Besu supports) ==="
echo " dry-run=$DRY_RUN 2101-only=$ONLY_2101 2102-only=$ONLY_2102"
echo "=== Ensure Core RPC 2101 / 2103 / 2102 — TXPOOL + ADMIN (max Besu supports) ==="
echo " dry-run=$DRY_RUN 2101-only=$ONLY_2101 2102-only=$ONLY_2102 2103-only=$ONLY_2103"
echo " Note: txpool_besuClear, txpool_clear, admin_removeTransaction are NOT in Besu; use clear-all-transaction-pools.sh to clear stuck txs."
echo ""
if [[ "$ONLY_2102" != true ]]; then
if $ONLY_2103; then
ensure_apis "$VMID_2103" "$HOST_2103" "$CONFIG_2103" || true
elif $ONLY_2101; then
ensure_apis "$VMID_2101" "$HOST_2101" "$CONFIG_2101" || true
fi
if [[ "$ONLY_2101" != true ]]; then
elif $ONLY_2102; then
ensure_apis "$VMID_2102" "$HOST_2102" "$CONFIG_2102" || true
else
ensure_apis "$VMID_2101" "$HOST_2101" "$CONFIG_2101" || true
ensure_apis "$VMID_2103" "$HOST_2103" "$CONFIG_2103" || true
ensure_apis "$VMID_2102" "$HOST_2102" "$CONFIG_2102" || true
fi
echo ""
echo "Done. Verify: ./scripts/maintenance/health-check-rpc-2101.sh and curl to 192.168.11.212:8545 for 2102."
echo "Done. Verify: ./scripts/maintenance/health-check-rpc-2101.sh; curl 192.168.11.217:8545 (2103); curl 192.168.11.212:8545 (2102)."
echo "Ref: docs/04-configuration/CORE_RPC_2101_2102_TXPOOL_ADMIN_STATUS.md"
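Beyond the health-check script, the newly enabled namespaces can be probed directly over JSON-RPC. `txpool_besuStatistics` and `admin_peers` are methods Besu does implement; the 2102 address comes from the verify line above. This sketch only prints the probe commands:

```shell
# Build JSON-RPC probe commands for the TXPOOL and ADMIN namespaces on 2102.
rpc_payload() { printf '{"jsonrpc":"2.0","method":"%s","params":[],"id":1}' "$1"; }
for method in txpool_besuStatistics admin_peers; do
  echo "curl -s -X POST -H 'Content-Type: application/json'" \
       "--data '$(rpc_payload "$method")' http://192.168.11.212:8545"
done
```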

View File

@@ -1,6 +1,10 @@
#!/usr/bin/env bash
# Ensure web/API services inside DBIS containers (10130, 10150, 10151) are running.
# Fixes 502 when containers are up but nginx or app inside is stopped.
# Ensure the deployed DBIS frontend/API surfaces on r630-01 stay healthy.
#
# Expected runtime:
# - VMID 10130: nginx serving the built DBIS frontend on port 80
# - VMID 10150: dbis-api.service serving the primary DBIS API on port 3000
# - VMID 10151: dbis-api.service serving the secondary DBIS API on port 3000
#
# Usage: ./scripts/maintenance/ensure-dbis-services-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_01 (default 192.168.11.11)
@@ -21,20 +25,83 @@ log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
build_url() {
local ip="$1"
local port="$2"
local path="$3"
if [[ "$port" == "80" ]]; then
printf 'http://%s%s' "$ip" "$path"
else
printf 'http://%s:%s%s' "$ip" "$port" "$path"
fi
}
ensure_service_surface() {
local vmid="$1"
local ip="$2"
local port="$3"
local path="$4"
local service="$5"
local label="$6"
local status
local local_url
local remote_url
local local_check
local remote_script
status="$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
log_warn "$label (VMID $vmid) is not running"
return 0
fi
local_url="$(build_url "$ip" "$port" "$path")"
remote_url="$(build_url "127.0.0.1" "$port" "$path")"
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' ${local_url@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]]; then
log_ok "$label already responds at $local_url"
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would restart $service on VMID $vmid and recheck $local_url"
return 0
fi
printf -v remote_script '%s' "$(cat <<EOF
set -e
systemctl reset-failed ${service} >/dev/null 2>&1 || true
systemctl restart ${service}
for _ in \$(seq 1 15); do
if curl -fsS ${remote_url@Q} >/dev/null 2>&1; then
exit 0
fi
sleep 2
done
curl -fsS ${remote_url@Q} >/dev/null
EOF
)"
if ! run_ssh "pct exec $vmid -- bash --norc -lc $(printf '%q' "$remote_script")"; then
log_warn "$label restart path failed on VMID $vmid"
fi
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' ${local_url@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]]; then
log_ok "$label restored at $local_url"
else
log_warn "$label still not healthy at $local_url (curl=${local_check:-000})"
fi
}
echo ""
echo "=== Ensure DBIS container services (fix 502) ==="
echo "=== Ensure DBIS deployed services ==="
echo " Host: $PROXMOX_HOST dry-run=$DRY_RUN"
echo ""
for vmid in 10130 10150 10151; do
if [[ "$DRY_RUN" == true ]]; then
log_info "Would ensure nginx/node in VMID $vmid"
continue
fi
status=$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")
[[ "$status" != "running" ]] && { log_warn "VMID $vmid not running"; continue; }
run_ssh "pct exec $vmid -- systemctl start nginx 2>/dev/null" || true
run_ssh "pct exec $vmid -- systemctl start node 2>/dev/null" || true
log_ok "VMID $vmid services started"
done
ensure_service_surface 10130 "192.168.11.130" "80" "/" "nginx" "DBIS frontend"
ensure_service_surface 10150 "192.168.11.155" "3000" "/v1/health" "dbis-api.service" "DBIS API primary"
ensure_service_surface 10151 "192.168.11.156" "3000" "/v1/health" "dbis-api.service" "DBIS API secondary"
echo ""
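The `build_url` helper above elides the default HTTP port so landing-page and API checks share one code path; a standalone copy shows the behavior:

```shell
# Standalone copy of build_url from the script above: port 80 is dropped
# from the URL, any other port is kept.
build_url() {
  local ip="$1" port="$2" path="$3"
  if [[ "$port" == "80" ]]; then
    printf 'http://%s%s' "$ip" "$path"
  else
    printf 'http://%s:%s%s' "$ip" "$port" "$path"
  fi
}
echo "$(build_url 192.168.11.130 80 /)"            # http://192.168.11.130/
echo "$(build_url 192.168.11.155 3000 /v1/health)" # http://192.168.11.155:3000/v1/health
```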

View File

@@ -0,0 +1,189 @@
#!/usr/bin/env bash
# Ensure the Hyperledger Fabric sample network on VMID 6000 is up, queryable,
# and boot-recoverable after container restarts.
#
# Expected runtime:
# - VMID 6000 running on r630-02
# - docker + nested LXC features enabled
# - fabric-samples test-network payload under /opt/fabric/fabric-samples/test-network
# - orderer.example.com, peer0.org1.example.com, peer0.org2.example.com running
# - peer channel getinfo -c mychannel succeeds for Org1
#
# Usage: ./scripts/maintenance/ensure-fabric-sample-network-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_02 (default 192.168.11.12)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_02:-192.168.11.12}"
VMID=6000
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
log_err() { echo -e "\033[0;31m[ERR]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
run_scp() { scp -o ConnectTimeout=10 -o StrictHostKeyChecking=no "$@"; }
ensure_ct_features() {
local conf="/etc/pve/lxc/${VMID}.conf"
local features
features="$(run_ssh "awk -F': ' '/^features:/{print \$2}' ${conf@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$features" == *"nesting=1"* && "$features" == *"keyctl=1"* ]]; then
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would add features: nesting=1,keyctl=1 to VMID $VMID and restart the CT"
return 0
fi
run_ssh "cp ${conf@Q} /root/${VMID}.conf.pre-codex.\$(date +%Y%m%d_%H%M%S)"
if [[ -n "$features" ]]; then
run_ssh "sed -i 's/^features:.*/features: nesting=1,keyctl=1/' ${conf@Q}"
else
run_ssh "printf '%s\n' 'features: nesting=1,keyctl=1' >> ${conf@Q}"
fi
run_ssh "pct shutdown $VMID --timeout 30 >/dev/null 2>&1 || pct stop $VMID >/dev/null 2>&1 || true"
run_ssh "pct start $VMID"
sleep 8
}
ensure_boot_service() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would install and enable fabric-sample-network.service in VMID $VMID"
return 0
fi
local helper_tmp unit_tmp
helper_tmp="$(mktemp)"
unit_tmp="$(mktemp)"
cat > "$helper_tmp" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
cd /opt/fabric/fabric-samples/test-network
verify() {
docker ps --format '{{.Names}}' | grep -qx orderer.example.com
docker ps --format '{{.Names}}' | grep -qx peer0.org1.example.com
docker ps --format '{{.Names}}' | grep -qx peer0.org2.example.com
export PATH=/opt/fabric/fabric-samples/bin:$PATH
export FABRIC_CFG_PATH=/opt/fabric/fabric-samples/config
export $(./setOrgEnv.sh Org1 | xargs)
peer channel getinfo -c mychannel >/tmp/fabric-channel-info.txt
}
if verify 2>/dev/null; then
exit 0
fi
./network.sh up >/tmp/fabric-network-up.log 2>&1 || true
verify
EOF
cat > "$unit_tmp" <<'EOF'
[Unit]
Description=Ensure Hyperledger Fabric sample network
After=docker.service network-online.target
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/fabric/fabric-samples/test-network
ExecStart=/usr/local/bin/ensure-fabric-sample-network
[Install]
WantedBy=multi-user.target
EOF
run_scp "$helper_tmp" "root@$PROXMOX_HOST:/tmp/ensure-fabric-sample-network"
run_scp "$unit_tmp" "root@$PROXMOX_HOST:/tmp/fabric-sample-network.service"
run_ssh "pct exec $VMID -- rm -f /usr/local/bin/ensure-fabric-sample-network /etc/systemd/system/fabric-sample-network.service"
run_ssh "pct push $VMID /tmp/ensure-fabric-sample-network /usr/local/bin/ensure-fabric-sample-network --perms 755"
run_ssh "pct push $VMID /tmp/fabric-sample-network.service /etc/systemd/system/fabric-sample-network.service --perms 644"
run_ssh "rm -f /tmp/ensure-fabric-sample-network /tmp/fabric-sample-network.service"
rm -f "$helper_tmp" "$unit_tmp"
run_ssh "pct exec $VMID -- bash -lc 'systemctl daemon-reload && systemctl enable fabric-sample-network.service >/dev/null 2>&1 && systemctl start fabric-sample-network.service'"
}
verify_fabric_sample_network() {
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
cd /opt/fabric/fabric-samples/test-network
echo service=\$(systemctl is-active fabric-sample-network.service 2>/dev/null || echo unknown)
docker ps --format \"{{.Names}}\" | grep -qx orderer.example.com
docker ps --format \"{{.Names}}\" | grep -qx peer0.org1.example.com
docker ps --format \"{{.Names}}\" | grep -qx peer0.org2.example.com
export PATH=/opt/fabric/fabric-samples/bin:\$PATH
export FABRIC_CFG_PATH=/opt/fabric/fabric-samples/config
export \$(./setOrgEnv.sh Org1 | xargs)
peer channel getinfo -c mychannel >/tmp/fabric-channel-info.txt
cat /tmp/fabric-channel-info.txt
'" 2>/dev/null
}
restore_fabric_sample_network() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would run ./network.sh up inside VMID $VMID and then verify mychannel"
return 0
fi
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
cd /opt/fabric/fabric-samples/test-network
./network.sh up >/tmp/fabric-network-up.log 2>&1 || true
cat /tmp/fabric-network-up.log
'"
}
echo ""
echo "=== Ensure Fabric sample network ==="
echo " Host: $PROXMOX_HOST vmid=$VMID dry-run=$DRY_RUN"
echo ""
ensure_ct_features
status="$(run_ssh "pct status $VMID 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
if [[ "$DRY_RUN" == true ]]; then
log_info "Would start VMID $VMID"
else
run_ssh "pct start $VMID"
sleep 8
fi
fi
ensure_boot_service
if [[ "$DRY_RUN" == true ]]; then
log_info "Would verify running orderer/peer containers and peer channel getinfo -c mychannel"
exit 0
fi
if fabric_info="$(verify_fabric_sample_network)"; then
log_ok "Fabric sample network already healthy"
printf '%s\n' "$fabric_info"
exit 0
fi
log_warn "Fabric sample network not fully healthy; attempting restore"
restore_fabric_sample_network
if fabric_info="$(verify_fabric_sample_network)"; then
log_ok "Fabric sample network restored"
printf '%s\n' "$fabric_info"
else
log_err "Fabric sample network is still not healthy after restore attempt"
exit 1
fi
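The `export $(./setOrgEnv.sh Org1 | xargs)` idiom used in the verify and helper paths flattens KEY=VALUE lines into a single export. A demo with a hypothetical stand-in emitter (values mirror the usual fabric-samples Org1 defaults); note the idiom breaks if any value contains whitespace:

```shell
# emit_env stands in for setOrgEnv.sh Org1 (hypothetical emitter; values
# are the common fabric-samples Org1 defaults). xargs joins the lines and
# the unquoted expansion lets export see one KEY=VALUE word each.
emit_env() {
  printf 'CORE_PEER_LOCALMSPID=Org1MSP\n'
  printf 'CORE_PEER_ADDRESS=localhost:7051\n'
}
export $(emit_env | xargs)
echo "msp=$CORE_PEER_LOCALMSPID peer=$CORE_PEER_ADDRESS"
```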

View File

@@ -0,0 +1,186 @@
#!/usr/bin/env bash
# Ensure the Hyperledger FireFly primary on VMID 6200 has a valid compose file
# and an active systemd unit.
#
# Expected runtime:
# - VMID 6200 running on r630-02
# - /opt/firefly/docker-compose.yml present
# - firefly.service enabled and active
# - firefly-core, firefly-postgres, firefly-ipfs using restart=unless-stopped
# - GET /api/v1/status succeeds on localhost:5000
#
# Usage: ./scripts/maintenance/ensure-firefly-primary-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_02 (default 192.168.11.12)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_02:-192.168.11.12}"
VMID=6200
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_err() { echo -e "\033[0;31m[ERR]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
normalize_compose() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would normalize /opt/firefly/docker-compose.yml and validate it with docker-compose config -q"
return 0
fi
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
test -f /opt/firefly/docker-compose.yml
if grep -qE \"^version:[[:space:]]*3\\.8[[:space:]]*$\" /opt/firefly/docker-compose.yml; then
sed -i \"s/^version:[[:space:]]*3\\.8[[:space:]]*$/version: \\\"3.8\\\"/\" /opt/firefly/docker-compose.yml
fi
docker-compose -f /opt/firefly/docker-compose.yml config -q
'"
}
install_firefly_helper() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would install an idempotent FireFly helper and systemd unit in VMID $VMID"
return 0
fi
local helper_tmp unit_tmp
helper_tmp="$(mktemp)"
unit_tmp="$(mktemp)"
cat > "$helper_tmp" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
COMPOSE_FILE=/opt/firefly/docker-compose.yml
STATUS_URL=http://127.0.0.1:5000/api/v1/status
start_stack() {
cd /opt/firefly
test -f "$COMPOSE_FILE"
if grep -qE '^version:[[:space:]]*3\.8[[:space:]]*$' "$COMPOSE_FILE"; then
sed -i 's/^version:[[:space:]]*3\.8[[:space:]]*$/version: "3.8"/' "$COMPOSE_FILE"
fi
docker-compose -f "$COMPOSE_FILE" config -q
docker-compose -f "$COMPOSE_FILE" up -d postgres ipfs >/dev/null
if docker ps -a --format '{{.Names}}' | grep -qx firefly-core; then
docker start firefly-core >/dev/null 2>&1 || true
else
docker-compose -f "$COMPOSE_FILE" up -d firefly-core >/dev/null
fi
curl -fsS "$STATUS_URL" >/dev/null
}
stop_stack() {
docker stop firefly-core firefly-postgres firefly-ipfs >/dev/null 2>&1 || true
}
case "${1:-start}" in
start)
start_stack
;;
stop)
stop_stack
;;
*)
echo "Usage: $0 [start|stop]" >&2
exit 64
;;
esac
EOF
cat > "$unit_tmp" <<'EOF'
[Unit]
Description=Ensure Hyperledger FireFly primary stack
After=docker.service network-online.target
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/firefly
User=firefly
Group=firefly
ExecStart=/usr/local/bin/ensure-firefly-primary start
ExecStop=/usr/local/bin/ensure-firefly-primary stop
[Install]
WantedBy=multi-user.target
EOF
scp -o ConnectTimeout=10 -o StrictHostKeyChecking=no "$helper_tmp" "root@$PROXMOX_HOST:/tmp/ensure-firefly-primary"
scp -o ConnectTimeout=10 -o StrictHostKeyChecking=no "$unit_tmp" "root@$PROXMOX_HOST:/tmp/firefly.service"
run_ssh "pct exec $VMID -- rm -f /usr/local/bin/ensure-firefly-primary /etc/systemd/system/firefly.service"
run_ssh "pct push $VMID /tmp/ensure-firefly-primary /usr/local/bin/ensure-firefly-primary --perms 755"
run_ssh "pct push $VMID /tmp/firefly.service /etc/systemd/system/firefly.service --perms 644"
run_ssh "rm -f /tmp/ensure-firefly-primary /tmp/firefly.service"
rm -f "$helper_tmp" "$unit_tmp"
}
ensure_firefly_service() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would reset-failed and enable/start firefly.service in VMID $VMID"
return 0
fi
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
systemctl daemon-reload
systemctl reset-failed firefly.service || true
systemctl enable firefly.service >/dev/null 2>&1
systemctl start firefly.service
'"
}
verify_firefly_primary() {
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
echo service=\$(systemctl is-active firefly.service)
docker inspect -f \"{{.HostConfig.RestartPolicy.Name}}\" firefly-core | grep -qx unless-stopped
docker inspect -f \"{{.HostConfig.RestartPolicy.Name}}\" firefly-postgres | grep -qx unless-stopped
docker inspect -f \"{{.HostConfig.RestartPolicy.Name}}\" firefly-ipfs | grep -qx unless-stopped
curl -fsS http://127.0.0.1:5000/api/v1/status
'" 2>/dev/null
}
echo ""
echo "=== Ensure FireFly primary ==="
echo " Host: $PROXMOX_HOST vmid=$VMID dry-run=$DRY_RUN"
echo ""
status="$(run_ssh "pct status $VMID 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
if [[ "$DRY_RUN" == true ]]; then
log_info "Would start VMID $VMID"
else
run_ssh "pct start $VMID"
sleep 8
fi
fi
normalize_compose
install_firefly_helper
if [[ "$DRY_RUN" == true ]]; then
log_info "Would enable/start firefly.service and verify API health"
exit 0
fi
ensure_firefly_service
if firefly_info="$(verify_firefly_primary)"; then
log_ok "FireFly primary healthy"
printf '%s\n' "$firefly_info"
else
log_err "FireFly primary is still not healthy after normalization"
exit 1
fi
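The `normalize_compose` step exists because YAML parses a bare `version: 3.8` as a float while docker-compose expects a string; the same sed fix can be exercised safely on a scratch copy:

```shell
# Reproduce the compose normalization above against a throwaway file
# instead of /opt/firefly/docker-compose.yml.
tmp="$(mktemp)"
printf 'version: 3.8\nservices: {}\n' > "$tmp"
sed -i 's/^version:[[:space:]]*3\.8[[:space:]]*$/version: "3.8"/' "$tmp"
fixed_line="$(head -n1 "$tmp")"
echo "$fixed_line"
rm -f "$tmp"
```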

View File

@@ -0,0 +1,91 @@
#!/usr/bin/env bash
# Ensure the legacy 3000-3003 monitor/RPC-adjacent LXCs have working static
# networking and a boot-time systemd-networkd enablement, even though the
# unprivileged guests cannot write their own multi-user.target.wants entries.
#
# Usage: ./scripts/maintenance/ensure-legacy-monitor-networkd-via-ssh.sh [--dry-run] [--apply]
# Env: PROXMOX_HOST_R630_01 (default 192.168.11.11)
# PROXMOX_SAFE_DEFAULTS=1 — default dry-run unless --apply or PROXMOX_OPS_APPLY=1
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=../lib/proxmox-production-guard.sh
source "${PROJECT_ROOT}/scripts/lib/proxmox-production-guard.sh"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
APPLY=false
EXPLICIT_DRY=false
for _arg in "$@"; do
case "$_arg" in
--dry-run) DRY_RUN=true; EXPLICIT_DRY=true ;;
--apply) APPLY=true; DRY_RUN=false ;;
esac
done
if [[ "$EXPLICIT_DRY" != true ]] && pguard_mutations_allowed; then
APPLY=true
DRY_RUN=false
fi
if [[ "$EXPLICIT_DRY" != true ]] && pguard_safe_defaults_enabled && [[ "$APPLY" != true ]]; then
DRY_RUN=true
fi
PROXMOX_HOST="${PROXMOX_HOST_R630_01:-192.168.11.11}"
VMIDS=(3000 3001 3002 3003)
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_err() { echo -e "\033[0;31m[ERR]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
enable_hostside() {
local vmid="$1"
run_ssh "pct unmount $vmid >/dev/null 2>&1 || true"
  # Chain with && so an empty mountpoint match never writes into the host's own /etc.
  run_ssh "mp=\$(pct mount $vmid | sed -n \"s/^mounted CT [0-9]\\+ in '\\''\\(.*\\)'\\''$/\\1/p\") && test -n \"\$mp\" && mkdir -p \"\$mp/etc/systemd/system/multi-user.target.wants\" && ln -sf /lib/systemd/system/systemd-networkd.service \"\$mp/etc/systemd/system/multi-user.target.wants/systemd-networkd.service\"; pct unmount $vmid"
}
start_and_verify() {
local vmid="$1"
run_ssh "pct exec $vmid -- systemctl start systemd-networkd"
run_ssh "pct exec $vmid -- sh -c 'printf \"%s\\n\" \"active=\$(systemctl is-active systemd-networkd)\"; printf \"%s\\n\" \"enabled=\$(systemctl is-enabled systemd-networkd 2>/dev/null || true)\"; hostname -I 2>/dev/null'"
}
echo ""
echo "=== Ensure legacy monitor networking ==="
echo " Host: $PROXMOX_HOST vmids=${VMIDS[*]} dry-run=$DRY_RUN apply=$APPLY"
echo ""
for vmid in "${VMIDS[@]}"; do
pguard_vmid_allowed "$vmid" || exit 2
done
if [[ "$DRY_RUN" == true ]]; then
for vmid in "${VMIDS[@]}"; do
log_info "Would mount CT $vmid, create host-side systemd-networkd enablement symlink, start systemd-networkd, and verify hostname -I"
done
exit 0
fi
for vmid in "${VMIDS[@]}"; do
  # awk exits 0 even when pct fails, so test for empty output rather than exit status.
  status="$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || true)"
  if [[ -z "$status" ]]; then
    log_err "VMID $vmid is missing or unreachable"
    exit 1
  fi
if [[ "$status" != "running" ]]; then
run_ssh "pct start $vmid"
sleep 4
fi
enable_hostside "$vmid"
monitor_info="$(start_and_verify "$vmid")"
if grep -q '^active=active$' <<<"$monitor_info" && grep -q '^enabled=enabled$' <<<"$monitor_info" && grep -Eq '^192\.168\.11\.[0-9]+' <<<"$monitor_info"; then
log_ok "VMID $vmid networking healthy"
printf '%s\n' "$monitor_info"
else
log_err "VMID $vmid networking verification failed"
printf '%s\n' "$monitor_info"
exit 1
fi
done
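The flag/env precedence repeated by these guarded scripts (an explicit --dry-run always wins; otherwise PROXMOX_OPS_APPLY=1 forces apply; otherwise PROXMOX_SAFE_DEFAULTS=1 without --apply falls back to dry-run) can be condensed into one helper. A minimal sketch: resolve_mode is a hypothetical name, and the env checks stand in for pguard_mutations_allowed / pguard_safe_defaults_enabled from scripts/lib/proxmox-production-guard.sh, which is not shown here:

```shell
# Condensed dry-run/apply precedence, matching the guarded scripts:
#   1. explicit --dry-run always wins;
#   2. otherwise PROXMOX_OPS_APPLY=1 forces apply;
#   3. otherwise PROXMOX_SAFE_DEFAULTS=1 without --apply falls back to dry-run.
resolve_mode() {
  local dry=false apply=false explicit_dry=false a
  for a in "$@"; do
    case "$a" in
      --dry-run) dry=true; explicit_dry=true ;;
      --apply)   apply=true; dry=false ;;
    esac
  done
  if [[ "$explicit_dry" != true && "${PROXMOX_OPS_APPLY:-0}" == "1" ]]; then
    apply=true; dry=false
  fi
  if [[ "$explicit_dry" != true && "${PROXMOX_SAFE_DEFAULTS:-0}" == "1" && "$apply" != true ]]; then
    dry=true
  fi
  if [[ "$dry" == true ]]; then echo dry-run; else echo apply; fi
}
```

With no flags and no env, the mode is apply, which matches the pre-guard behavior of these scripts.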


@@ -67,7 +67,7 @@ log "2101 (rpc-http-prv): ensure nodekey and fix Besu..."
if run "$R630_01" "pct status 2101 2>/dev/null | awk '{print \$2}'" 2>/dev/null | grep -q running; then
run "$R630_01" "pct exec 2101 -- sh -c 'mkdir -p /data/besu; [ -f /data/besu/nodekey ] || [ -f /data/besu/key ] || openssl rand -hex 32 > /data/besu/nodekey'" 2>/dev/null || true
fi
if $DRY_RUN; then log "Would run fix-core-rpc-2101.sh"; else "${SCRIPT_DIR}/fix-core-rpc-2101.sh" 2>/dev/null && ok "2101 fix run" || warn "2101 fix had issues"; fi
if $DRY_RUN; then log "Would run fix-core-rpc-2101.sh"; else "${SCRIPT_DIR}/fix-core-rpc-2101.sh" --apply 2>/dev/null && ok "2101 fix run" || warn "2101 fix had issues"; fi
# --- 2500-2505 Alltra/HYBX RPC: ensure nodekey then start besu ---
for v in 2500 2501 2502 2503 2504 2505; do


@@ -2,9 +2,12 @@
# Fix Core Besu RPC on VMID 2101 (Chain 138 admin/deploy — RPC_URL_138).
# Starts container if stopped, starts/restarts Besu service, verifies RPC.
#
# Usage: ./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--restart-only]
# --dry-run Print actions only; do not run.
# Usage: ./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--apply] [--restart-only]
# --dry-run Print actions only; do not run.
# --apply Perform mutations (required when PROXMOX_SAFE_DEFAULTS=1 is set).
# --restart-only Skip pct start; only restart Besu service inside CT.
# Env: PROXMOX_SAFE_DEFAULTS=1 — default to dry-run unless --apply or PROXMOX_OPS_APPLY=1.
# PROXMOX_OPS_ALLOWED_VMIDS — optional allowlist (e.g. only 2101 for this script).
# Requires: SSH to r630-01 (key-based). Run from LAN or VPN.
#
# See: docs/00-meta/502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md (rpc-http-prv)
@@ -14,7 +17,9 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
# shellcheck source=../lib/proxmox-production-guard.sh
source "${PROJECT_ROOT}/scripts/lib/proxmox-production-guard.sh"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
VMID=2101
HOST="${PROXMOX_HOST_R630_01:-${PROXMOX_R630_01:-192.168.11.11}}"
@@ -22,8 +27,27 @@ RPC_IP="${RPC_CORE_1:-192.168.11.211}"
RPC_PORT=8545
DRY_RUN=false
APPLY=false
EXPLICIT_DRY=false
RESTART_ONLY=false
for a in "$@"; do [[ "$a" == "--dry-run" ]] && DRY_RUN=true; [[ "$a" == "--restart-only" ]] && RESTART_ONLY=true; done
for a in "$@"; do
case "$a" in
--dry-run) DRY_RUN=true; EXPLICIT_DRY=true ;;
--apply) APPLY=true; DRY_RUN=false ;;
--restart-only) RESTART_ONLY=true ;;
esac
done
# PROXMOX_OPS_APPLY=1 acts like --apply unless operator explicitly passed --dry-run
if [[ "$EXPLICIT_DRY" != true ]] && pguard_mutations_allowed; then
APPLY=true
DRY_RUN=false
fi
if [[ "$EXPLICIT_DRY" != true ]] && pguard_safe_defaults_enabled && [[ "$APPLY" != true ]]; then
DRY_RUN=true
fi
if ! pguard_vmid_allowed "$VMID"; then
exit 2
fi
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
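pguard_vmid_allowed is sourced from scripts/lib/proxmox-production-guard.sh, which is not part of this diff; a hypothetical stand-in showing the assumed PROXMOX_OPS_ALLOWED_VMIDS semantics (an empty allowlist permits every VMID, otherwise the VMID must appear in the space-separated list):

```shell
# Hypothetical stand-in for pguard_vmid_allowed: empty allowlist means
# no restriction; otherwise the VMID must be listed explicitly.
vmid_allowed() {
  local vmid="$1" allowed="${PROXMOX_OPS_ALLOWED_VMIDS:-}"
  [[ -z "$allowed" ]] && return 0
  local v
  for v in $allowed; do
    [[ "$v" == "$vmid" ]] && return 0
  done
  return 1
}

# Callers gate mutations up front, e.g.: vmid_allowed 2101 || exit 2
```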


@@ -0,0 +1,118 @@
#!/usr/bin/env bash
# Repair keycloak.sankofa.nexus after duplicate-IP / stale-neighbor regressions.
# Current durable path is the direct upstream:
# keycloak.sankofa.nexus -> 192.168.11.52:8080
#
# This script:
# 1. Removes the stray 192.168.11.52 alias from CT 10232 if present
# 2. Removes the guest-side reboot job that reintroduces the bad alias
# 3. Flushes stale neighbor state in the primary NPMplus CT
# 4. Forces NPMplus proxy host 60 back to 192.168.11.52:8080
# 5. Disables temporary relay services if they exist
#
# Usage: ./scripts/maintenance/fix-keycloak-relay-via-ssh.sh [--dry-run]
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_01:-192.168.11.11}"
KEYCLOAK_IP="${IP_KEYCLOAK:-192.168.11.52}"
NPM_CID="${NPMPLUS_PRIMARY_VMID:-10233}"
CONFLICT_CID="${KEYCLOAK_CONFLICT_VMID:-10232}"
PROXY_HOST_ID="${KEYCLOAK_NPM_PROXY_HOST_ID:-60}"
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
if [[ "$DRY_RUN" == true ]]; then
echo ""
echo "=== Fix Keycloak direct routing via SSH ==="
echo " Host: $PROXMOX_HOST dry-run=true"
echo ""
log_info "Would remove stray ${KEYCLOAK_IP}/24 from CT ${CONFLICT_CID}"
log_info "Would remove CT ${CONFLICT_CID} reboot hooks that re-add ${KEYCLOAK_IP}/24"
log_info "Would flush neighbor cache for ${KEYCLOAK_IP} in NPMplus CT ${NPM_CID}"
log_info "Would update NPMplus proxy host ${PROXY_HOST_ID} to ${KEYCLOAK_IP}:8080"
log_info "Would disable temporary keycloak relay services if present"
echo ""
exit 0
fi
echo ""
echo "=== Fix Keycloak direct routing via SSH ==="
echo " Host: $PROXMOX_HOST dry-run=false"
echo ""
log_info "Removing any stray ${KEYCLOAK_IP}/24 alias from CT ${CONFLICT_CID}"
run_ssh "pct exec ${CONFLICT_CID} -- bash --norc -c '
ip addr del ${KEYCLOAK_IP}/24 dev eth0 2>/dev/null || true
ip -br addr
'"
log_ok "Conflict CT ${CONFLICT_CID} no longer carries ${KEYCLOAK_IP}"
log_info "Removing guest-side reboot hooks that reintroduce ${KEYCLOAK_IP}/24 in CT ${CONFLICT_CID}"
run_ssh "pct exec ${CONFLICT_CID} -- bash --norc -c '
set -e
CRON_TMP=\$(mktemp)
if crontab -l >/tmp/keycloak-crontab.current 2>/dev/null; then
grep -vF \"/usr/local/bin/configure-network.sh\" /tmp/keycloak-crontab.current >\"\$CRON_TMP\" || true
crontab \"\$CRON_TMP\"
else
: >\"\$CRON_TMP\"
fi
rm -f /tmp/keycloak-crontab.current \"\$CRON_TMP\"
if [[ -f /usr/local/bin/configure-network.sh ]]; then
cp /usr/local/bin/configure-network.sh /usr/local/bin/configure-network.sh.bak.\$(date +%Y%m%d%H%M%S)
cat > /usr/local/bin/configure-network.sh <<\"EOF\"
#!/bin/bash
set -euo pipefail
ip link set eth0 up 2>/dev/null || true
ip addr del ${KEYCLOAK_IP}/24 dev eth0 2>/dev/null || true
ip addr flush dev eth0 scope global 2>/dev/null || true
ip addr add 192.168.11.56/24 dev eth0
ip route replace default via 192.168.11.11 dev eth0
EOF
chmod 0755 /usr/local/bin/configure-network.sh
fi
ip addr del ${KEYCLOAK_IP}/24 dev eth0 2>/dev/null || true
ip route del default via 192.168.11.1 dev eth0 2>/dev/null || true
ip route replace default via 192.168.11.11 dev eth0
ip -br addr show dev eth0
ip route show default
crontab -l 2>/dev/null || true
'"
log_ok "Conflict CT ${CONFLICT_CID} no longer re-adds ${KEYCLOAK_IP} on reboot"
log_info "Disabling temporary relay services"
run_ssh "bash --norc -c '
systemctl disable --now keycloak-host-relay.service 2>/dev/null || true
pct exec 7802 -- systemctl disable --now keycloak-ct-relay.service 2>/dev/null || true
pct exec 7804 -- pkill -f /tmp/keycloak_gov_relay.py 2>/dev/null || true
'"
log_ok "Temporary relays disabled"
log_info "Flushing neighbor state for ${KEYCLOAK_IP} in NPMplus CT ${NPM_CID}"
run_ssh "pct exec ${NPM_CID} -- bash --norc -c '
ip neigh del ${KEYCLOAK_IP} dev eth0 2>/dev/null || true
curl -s -o /dev/null -w \"%{http_code} %{redirect_url}\n\" -H \"Host: keycloak.sankofa.nexus\" http://${KEYCLOAK_IP}:8080/
'"
log_ok "Direct Keycloak upstream responds from NPMplus CT ${NPM_CID}"
log_info "Re-applying canonical NPMplus proxy host mapping for Keycloak"
bash "${PROJECT_ROOT}/scripts/nginx-proxy-manager/update-npmplus-proxy-hosts-api.sh" >/tmp/keycloak-npmplus-sync.log 2>&1
run_ssh "pct exec ${NPM_CID} -- bash --norc -c '
curl -k -I -s -H \"Host: keycloak.sankofa.nexus\" https://127.0.0.1 | sed -n \"1,10p\"
'"
log_ok "NPMplus proxy host ${PROXY_HOST_ID} restored to direct upstream"
echo ""


@@ -2,8 +2,9 @@
# Make Besu CT rootfs writable by running e2fsck on their root LV (fixes read-only / emergency_ro after ext4 errors).
# SSHs to the Proxmox host (r630-01), stops each CT, runs e2fsck -f -y on the LV, starts the CT.
#
# Usage: ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh [--dry-run]
# Usage: ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh [--dry-run] [--apply]
# Optional: BESU_WRITABLE_VMIDS="1500 1501 1502" to add sentries or other CTs (default: Core RPC 2101 only).
# Env: PROXMOX_SAFE_DEFAULTS=1 — default dry-run unless --apply or PROXMOX_OPS_APPLY=1. PROXMOX_OPS_ALLOWED_VMIDS optional.
# Run from project root. Requires: SSH to r630-01 (root, key-based).
# See: docs/00-meta/502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md §Read-only CT
@@ -11,7 +12,9 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
# shellcheck source=../lib/proxmox-production-guard.sh
source "${PROJECT_ROOT}/scripts/lib/proxmox-production-guard.sh"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
HOST="${PROXMOX_HOST_R630_01:-192.168.11.11}"
# Default: Core RPC on r630-01 (2101). 2500-2505 removed — destroyed; see ALL_VMIDS_ENDPOINTS.md.
@@ -24,7 +27,21 @@ fi
SSH_OPTS="-o ConnectTimeout=20 -o ServerAliveInterval=15 -o StrictHostKeyChecking=accept-new"
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
APPLY=false
EXPLICIT_DRY=false
for _arg in "$@"; do
case "$_arg" in
--dry-run) DRY_RUN=true; EXPLICIT_DRY=true ;;
--apply) APPLY=true; DRY_RUN=false ;;
esac
done
if [[ "$EXPLICIT_DRY" != true ]] && pguard_mutations_allowed; then
APPLY=true
DRY_RUN=false
fi
if [[ "$EXPLICIT_DRY" != true ]] && pguard_safe_defaults_enabled && [[ "$APPLY" != true ]]; then
DRY_RUN=true
fi
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
@@ -32,7 +49,7 @@ log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
echo ""
echo "=== Make RPC VMIDs writable via Proxmox SSH ==="
echo " Host: $HOST VMIDs: ${RPC_VMIDS[*]} dry-run=$DRY_RUN"
echo " Host: $HOST VMIDs: ${RPC_VMIDS[*]} dry-run=$DRY_RUN apply=$APPLY"
echo ""
if ! ssh $SSH_OPTS "root@$HOST" "echo OK" 2>/dev/null; then
@@ -46,6 +63,10 @@ if $DRY_RUN; then
exit 0
fi
for vmid in "${RPC_VMIDS[@]}"; do
pguard_vmid_allowed "$vmid" || exit 2
done
for vmid in "${RPC_VMIDS[@]}"; do
log_info "VMID $vmid: stop, e2fsck, start..."
status=$(ssh $SSH_OPTS "root@$HOST" "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")


@@ -1,10 +1,10 @@
#!/usr/bin/env bash
# Make validator VMIDs (1000-1004) writable by running e2fsck on their rootfs.
# Fixes "Read-only file system" / JNA UnsatisfiedLinkError when Besu tries to write temp files.
# SSHs to r630-01 (1000,1001,1002) and ml110 (1003,1004), stops each CT, e2fsck, starts.
# SSHs to r630-01 (1000,1001,1002) and r630-03 (1003,1004), stops each CT, e2fsck, starts.
#
# Usage: ./scripts/maintenance/make-validator-vmids-writable-via-ssh.sh [--dry-run]
# Run from project root. Requires SSH to r630-01 and ml110 (root, key-based).
# Run from project root. Requires SSH to r630-01 and r630-03 (root, key-based).
set -euo pipefail
@@ -13,16 +13,16 @@ PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
R630_01="${PROXMOX_HOST_R630_01:-192.168.11.11}"
ML110="${PROXMOX_ML110:-192.168.11.10}"
R630_03="${PROXMOX_R630_03:-192.168.11.13}"
SSH_OPTS="-o ConnectTimeout=15 -o StrictHostKeyChecking=accept-new"
# Validators: 1000,1001,1002 on r630-01; 1003,1004 on ml110
# Validators: 1000,1001,1002 on r630-01; 1003,1004 on r630-03
VALIDATORS=(
"1000:$R630_01"
"1001:$R630_01"
"1002:$R630_01"
"1003:$ML110"
"1004:$ML110"
"1003:$R630_03"
"1004:$R630_03"
)
DRY_RUN=false


@@ -0,0 +1,107 @@
#!/usr/bin/env bash
# Migrate Chain 138 / Besu LXCs from ml110 to r630-02 and r630-03 (cluster copy migration).
# Use after freeing ml110 RAM (e.g. 4×32GB → 1×64GB): move validators, RPC, sentries, Thirdweb CTs off ml110.
#
# PVE 9: use --target-storage (not --storage). Running CTs need restart migration: --restart 1
#
# Target split (balances disk: r630-02 thin5 has ~200G+ free; r630-03 local-lvm ~1T free):
# r630-02 / thin5: 2305, 2306, 2307, 2308 (named RPCs — smaller footprint)
# r630-03 / local-lvm: everything else on ml110 (validators, core-2, private, 2304, sentries, thirdweb)
#
# Usage (from LAN, SSH key to Proxmox nodes):
# ./scripts/maintenance/migrate-ml110-besu-rpc-to-r630-02-03.sh # migrate all still on ml110
# ./scripts/maintenance/migrate-ml110-besu-rpc-to-r630-02-03.sh --dry-run
# ./scripts/maintenance/migrate-ml110-besu-rpc-to-r630-02-03.sh 2305 # single VMID
#
# Prerequisites: ml110, r630-02, r630-03 in same cluster; storages active on targets.
#
set -uo pipefail
# Do not use set -e: one failed migrate should not abort the whole batch (log and continue).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
SRC_IP="${PROXMOX_HOST_ML110:-192.168.11.10}"
SSH_OPTS="-o BatchMode=yes -o ConnectTimeout=20 -o StrictHostKeyChecking=accept-new"
# Order: stopped first (no --restart), then r630-02 RPCs, then r630-03 bulk (running with --restart).
R630_02_STORAGE="thin5"
R630_03_STORAGE="local-lvm"
# VMID -> "r630-02" or "r630-03"
declare -A TARGET_NODE
declare -A TARGET_STOR
for v in 2305 2306 2307 2308; do
TARGET_NODE[$v]="r630-02"
TARGET_STOR[$v]="$R630_02_STORAGE"
done
for v in 1003 1004 1503 1504 1505 1506 1507 1508 2102 2301 2304 2400 2402 2403; do
TARGET_NODE[$v]="r630-03"
TARGET_STOR[$v]="$R630_03_STORAGE"
done
ALL_ORDER=(1503 1504 1505 1506 1507 1508 2400 2402 2403 2305 2306 2307 2308 2304 2301 2102 1003 1004)
DRY_RUN=false
SINGLE=()
for arg in "$@"; do
[[ "$arg" == "--dry-run" ]] && DRY_RUN=true
[[ "$arg" =~ ^[0-9]+$ ]] && SINGLE+=("$arg")
done
log() { echo "[$(date -Iseconds)] $*"; }
ssh_src() { ssh $SSH_OPTS "root@${SRC_IP}" "$@"; }
migrate_one() {
local vmid="$1"
local node="${TARGET_NODE[$vmid]:-}"
local stor="${TARGET_STOR[$vmid]:-}"
if [[ -z "$node" || -z "$stor" ]]; then
log "SKIP $vmid — not in migration map (edit script)."
return 0
fi
if $DRY_RUN; then
echo " ssh root@${SRC_IP} \"pct migrate $vmid $node --target-storage $stor [--restart 1 if running]\""
return 0
fi
if ! ssh_src "pct config $vmid" &>/dev/null; then
log "SKIP $vmid — not on ${SRC_IP} (already migrated or missing)."
return 0
fi
local running
running=$(ssh_src "pct status $vmid 2>/dev/null | awk '{print \$2}'" || echo "unknown")
local extra=()
if [[ "$running" == "running" ]]; then
extra=(--restart 1)
fi
log "MIGRATE $vmid -> $node storage=$stor status=$running ${extra[*]:-}"
if ! ssh_src "pct migrate $vmid $node --target-storage $stor ${extra[*]:-}"; then
log "FAIL $vmid -> $node (see above). Fix and re-run this script; completed VMIDs are skipped."
return 1
fi
log "DONE $vmid -> $node"
}
main() {
if [[ ${#SINGLE[@]} -gt 0 ]]; then
for vmid in "${SINGLE[@]}"; do
migrate_one "$vmid" || true
done
return 0
fi
for vmid in "${ALL_ORDER[@]}"; do
migrate_one "$vmid" || true
done
}
main "$@"
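The per-VMID routing above reduces to two associative arrays plus an empty-lookup skip; a trimmed, runnable sketch with two entries taken from the map:

```shell
# VMID -> target lookup as used by the migration script: associative arrays
# keyed by VMID; an empty lookup means "not in map, skip".
declare -A TARGET_NODE=( [2305]="r630-02" [1003]="r630-03" )
declare -A TARGET_STOR=( [2305]="thin5"   [1003]="local-lvm" )

plan() {
  local vmid="$1"
  local node="${TARGET_NODE[$vmid]:-}" stor="${TARGET_STOR[$vmid]:-}"
  if [[ -z "$node" || -z "$stor" ]]; then
    echo "SKIP $vmid"
  else
    echo "pct migrate $vmid $node --target-storage $stor"
  fi
}

plan 2305   # prints the pct migrate command for a mapped VMID
plan 9999   # prints SKIP 9999 for an unmapped one
```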


@@ -88,7 +88,7 @@ echo ""
# 0. Make RPC VMIDs writable (e2fsck so fix/install scripts can write)
echo "[0/5] Making RPC VMIDs writable..."
echo "--- 0/5: Make RPC VMIDs writable (r630-01: 2101, 2500-2505) ---"
if run_step "${SCRIPT_DIR}/make-rpc-vmids-writable-via-ssh.sh"; then
if run_step "${SCRIPT_DIR}/make-rpc-vmids-writable-via-ssh.sh" --apply; then
echo " Done."
else
echo " Step had warnings (check output)."


@@ -6,20 +6,40 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
CHECKS_SCRIPT="$PROJECT_ROOT/scripts/maintenance/daily-weekly-checks.sh"
LOG_DIR="$PROJECT_ROOT/logs"
CRON_DAILY="0 8 * * * cd $PROJECT_ROOT && bash $CHECKS_SCRIPT daily >> $LOG_DIR/daily-weekly-checks.log 2>&1"
CRON_WEEKLY="0 9 * * 0 cd $PROJECT_ROOT && bash $CHECKS_SCRIPT weekly >> $LOG_DIR/daily-weekly-checks.log 2>&1"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
CHECKS_SCRIPT="$INSTALL_ROOT/scripts/maintenance/daily-weekly-checks.sh"
LOG_DIR="$INSTALL_ROOT/logs"
CRON_DAILY="0 8 * * * cd $INSTALL_ROOT && bash $CHECKS_SCRIPT daily >> $LOG_DIR/daily-weekly-checks.log 2>&1"
CRON_WEEKLY="0 9 * * 0 cd $INSTALL_ROOT && bash $CHECKS_SCRIPT weekly >> $LOG_DIR/daily-weekly-checks.log 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$CHECKS_SCRIPT" ]]; then
echo "Checks script not found at: $CHECKS_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/maintenance/daily-weekly-checks.sh."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
(crontab -l 2>/dev/null; echo "$CRON_DAILY"; echo "$CRON_WEEKLY") | crontab -
{
crontab -l 2>/dev/null | grep -v 'daily-weekly-checks.sh' || true
echo "$CRON_DAILY"
echo "$CRON_WEEKLY"
} | crontab -
echo "Installed daily (08:00) and weekly (Sun 09:00):"
echo " $CRON_DAILY"
echo " $CRON_WEEKLY"
;;
--show)
validate_install_root
echo "Daily (O-1, O-2): $CRON_DAILY"
echo "Weekly (O-3): $CRON_WEEKLY"
;;
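The --install branches in these cron helpers now share one idempotent pattern: filter out any previous entry matching a marker, then append the fresh lines, so repeated installs never duplicate jobs. The same pattern on plain text (dedupe_install is a hypothetical name; the real scripts pipe crontab -l through the identical filter):

```shell
# Idempotent line install: drop existing lines matching the marker,
# then append the replacement entries.
dedupe_install() {  # stdin: current entries; $1: marker; rest: new lines
  local marker="$1"; shift
  grep -v "$marker" || true
  printf '%s\n' "$@"
}
```

Usage mirrors the crontab form: `crontab -l 2>/dev/null | dedupe_install 'daily-weekly-checks.sh' "$CRON_DAILY" "$CRON_WEEKLY" | crontab -`.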


@@ -7,26 +7,41 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
LAG_SCRIPT="$PROJECT_ROOT/scripts/maintenance/check-and-fix-explorer-lag.sh"
LOG_DIR="$PROJECT_ROOT/logs"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
LAG_SCRIPT="$INSTALL_ROOT/scripts/maintenance/check-and-fix-explorer-lag.sh"
LOG_DIR="$INSTALL_ROOT/logs"
LOG_FILE="$LOG_DIR/explorer-lag-fix.log"
# Every 6 hours (0:00, 6:00, 12:00, 18:00)
CRON_LAG="0 */6 * * * cd $PROJECT_ROOT && bash $LAG_SCRIPT >> $LOG_FILE 2>&1"
CRON_LAG="0 */6 * * * cd $INSTALL_ROOT && bash $LAG_SCRIPT >> $LOG_FILE 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$LAG_SCRIPT" ]]; then
echo "Lag script not found at: $LAG_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/maintenance/check-and-fix-explorer-lag.sh."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
if crontab -l 2>/dev/null | grep -q "check-and-fix-explorer-lag.sh"; then
echo "Explorer lag cron already present in crontab."
else
(crontab -l 2>/dev/null; echo "$CRON_LAG") | crontab -
echo "Installed explorer lag cron (every 6 hours):"
echo " $CRON_LAG"
echo "Log: $LOG_FILE"
fi
{
crontab -l 2>/dev/null | grep -v "check-and-fix-explorer-lag.sh" || true
echo "$CRON_LAG"
} | crontab -
echo "Installed explorer lag cron (every 6 hours):"
echo " $CRON_LAG"
echo "Log: $LOG_FILE"
;;
--show)
validate_install_root
echo "Explorer lag check-and-fix (every 6 hours):"
echo " $CRON_LAG"
echo "Log: $LOG_FILE"


@@ -6,16 +6,36 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
BACKUP_SCRIPT="$PROJECT_ROOT/scripts/verify/backup-npmplus.sh"
CRON_LINE="0 3 * * * cd $PROJECT_ROOT && bash $BACKUP_SCRIPT >> $PROJECT_ROOT/logs/npmplus-backup.log 2>&1"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
BACKUP_SCRIPT="$INSTALL_ROOT/scripts/verify/backup-npmplus.sh"
LOG_DIR="$INSTALL_ROOT/logs"
CRON_LINE="0 3 * * * /usr/bin/flock -n /var/lock/npmplus-backup.lock bash -lc 'cd $INSTALL_ROOT && bash $BACKUP_SCRIPT >> $LOG_DIR/npmplus-backup.log 2>&1'"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$BACKUP_SCRIPT" ]]; then
echo "Backup script not found at: $BACKUP_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/verify/backup-npmplus.sh."
exit 1
fi
}
case "${1:-}" in
--install)
mkdir -p "$PROJECT_ROOT/logs"
(crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
validate_install_root
mkdir -p "$LOG_DIR"
{
crontab -l 2>/dev/null | grep -v 'backup-npmplus.sh' || true
echo "$CRON_LINE"
} | crontab -
echo "Installed: $CRON_LINE"
;;
--show)
validate_install_root
echo "Crontab line: $CRON_LINE"
;;
*)
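The /usr/bin/flock -n wrapper in CRON_LINE keeps backup runs from overlapping: a second invocation fails immediately instead of queueing behind the first. A self-contained demonstration with a throwaway lock file:

```shell
# Second flock -n attempt fails immediately while the first holder is alive.
LOCK="$(mktemp)"
flock -n "$LOCK" sleep 2 &     # background holder keeps the lock ~2s
holder=$!
sleep 1                        # let the holder acquire the lock first
if flock -n "$LOCK" true; then
  second="acquired"
else
  second="busy"
fi
wait "$holder"
rm -f "$LOCK"
echo "second attempt: $second"   # prints: second attempt: busy
```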


@@ -6,36 +6,44 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
COLLECT_SCRIPT="$PROJECT_ROOT/scripts/monitoring/collect-storage-growth-data.sh"
PRUNE_SNAPSHOTS="$PROJECT_ROOT/scripts/monitoring/prune-storage-snapshots.sh"
PRUNE_HISTORY="$PROJECT_ROOT/scripts/monitoring/prune-storage-history.sh"
LOG_DIR="$PROJECT_ROOT/logs/storage-growth"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
COLLECT_SCRIPT="$INSTALL_ROOT/scripts/monitoring/collect-storage-growth-data.sh"
PRUNE_SNAPSHOTS="$INSTALL_ROOT/scripts/monitoring/prune-storage-snapshots.sh"
PRUNE_HISTORY="$INSTALL_ROOT/scripts/monitoring/prune-storage-history.sh"
LOG_DIR="$INSTALL_ROOT/logs/storage-growth"
# Every 6 hours
CRON_STORAGE="0 */6 * * * cd $PROJECT_ROOT && bash $COLLECT_SCRIPT --append >> $LOG_DIR/cron.log 2>&1"
CRON_STORAGE="0 */6 * * * cd $INSTALL_ROOT && bash $COLLECT_SCRIPT --append >> $LOG_DIR/cron.log 2>&1"
# Weekly Sun 08:00: prune snapshots (30d) + history (~90d)
CRON_PRUNE="0 8 * * 0 cd $PROJECT_ROOT && bash $PRUNE_SNAPSHOTS >> $LOG_DIR/cron.log 2>&1 && bash $PRUNE_HISTORY >> $LOG_DIR/cron.log 2>&1"
CRON_PRUNE="0 8 * * 0 cd $INSTALL_ROOT && bash $PRUNE_SNAPSHOTS >> $LOG_DIR/cron.log 2>&1 && bash $PRUNE_HISTORY >> $LOG_DIR/cron.log 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$COLLECT_SCRIPT" || ! -f "$PRUNE_SNAPSHOTS" || ! -f "$PRUNE_HISTORY" ]]; then
echo "One or more storage growth scripts are missing under: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/monitoring/collect-storage-growth-data.sh and prune helpers."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
added=""
if ! crontab -l 2>/dev/null | grep -q "collect-storage-growth-data.sh"; then
(crontab -l 2>/dev/null; echo "$CRON_STORAGE") | crontab -
added="collect"
fi
if ! crontab -l 2>/dev/null | grep -q "prune-storage-snapshots.sh"; then
(crontab -l 2>/dev/null; echo "$CRON_PRUNE") | crontab -
added="${added:+$added + }prune"
fi
if [ -n "$added" ]; then
echo "Installed storage growth cron:"
echo " $CRON_STORAGE"
echo " $CRON_PRUNE"
else
echo "Storage growth cron already present in crontab."
fi
{
crontab -l 2>/dev/null | grep -v "collect-storage-growth-data.sh" | grep -v "prune-storage-snapshots.sh" | grep -v "prune-storage-history.sh" || true
echo "$CRON_STORAGE"
echo "$CRON_PRUNE"
} | crontab -
echo "Installed storage growth cron:"
echo " $CRON_STORAGE"
echo " $CRON_PRUNE"
;;
--show)
validate_install_root
echo "Storage growth (append every 6h): $CRON_STORAGE"
echo "Storage prune (weekly Sun 08:00): $CRON_PRUNE"
;;


@@ -6,23 +6,38 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
MONITOR_SCRIPT="$PROJECT_ROOT/scripts/storage-monitor.sh"
LOG_DIR="$PROJECT_ROOT/logs/storage-monitoring"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
MONITOR_SCRIPT="$INSTALL_ROOT/scripts/storage-monitor.sh"
LOG_DIR="$INSTALL_ROOT/logs/storage-monitoring"
# Daily at 07:00 (before daily-weekly-checks at 08:00)
CRON_STORAGE_MONITOR="0 7 * * * cd $PROJECT_ROOT && bash $MONITOR_SCRIPT >> $LOG_DIR/cron.log 2>&1"
CRON_STORAGE_MONITOR="0 7 * * * cd $INSTALL_ROOT && bash $MONITOR_SCRIPT >> $LOG_DIR/cron.log 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$MONITOR_SCRIPT" ]]; then
echo "Monitor script not found at: $MONITOR_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/storage-monitor.sh."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
if crontab -l 2>/dev/null | grep -q "storage-monitor.sh"; then
echo "Storage monitor cron already present in crontab."
else
(crontab -l 2>/dev/null; echo "$CRON_STORAGE_MONITOR") | crontab -
echo "Installed storage monitor cron (daily 07:00):"
echo " $CRON_STORAGE_MONITOR"
fi
{
crontab -l 2>/dev/null | grep -v "storage-monitor.sh" || true
echo "$CRON_STORAGE_MONITOR"
} | crontab -
echo "Installed storage monitor cron (daily 07:00):"
echo " $CRON_STORAGE_MONITOR"
;;
--show)
validate_install_root
echo "Storage monitor (daily 07:00):"
echo " $CRON_STORAGE_MONITOR"
;;


@@ -1,10 +1,10 @@
#!/usr/bin/env bash
# Set max-peers=32 in Besu config on all running Besu nodes (in-place sed).
# Set max-peers=40 in Besu config on all running Besu nodes (in-place sed).
# Run after repo configs are updated; then restart Besu with restart-besu-reload-node-lists.sh.
# See: docs/08-monitoring/PEER_CONNECTIONS_PLAN.md
#
# Usage: ./scripts/maintenance/set-all-besu-max-peers-32.sh [--dry-run]
# Requires: SSH to Proxmox hosts (r630-01, r630-02, ml110).
# Requires: SSH to Proxmox hosts (r630-01, r630-02, r630-03).
set -euo pipefail
@@ -14,16 +14,17 @@ PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
TARGET_MAX_PEERS=40
declare -A HOST_BY_VMID
for v in 1000 1001 1002 1500 1501 1502 2101 2500 2501 2502 2503 2504 2505; do HOST_BY_VMID[$v]="${PROXMOX_R630_01:-${PROXMOX_HOST_R630_01:-192.168.11.11}}"; done
for v in 2201 2303 2401; do HOST_BY_VMID[$v]="${PROXMOX_R630_02:-${PROXMOX_HOST_R630_02:-192.168.11.12}}"; done
for v in 1003 1004 1503 1504 1505 1506 1507 1508 2102 2301 2304 2305 2306 2307 2308 2400 2402 2403; do HOST_BY_VMID[$v]="${PROXMOX_ML110:-${PROXMOX_HOST_ML110:-192.168.11.10}}"; done
for v in 1000 1001 1002 1500 1501 1502 2101 2103 2500 2501 2502 2503 2504 2505; do HOST_BY_VMID[$v]="${PROXMOX_R630_01:-${PROXMOX_HOST_R630_01:-192.168.11.11}}"; done
for v in 2201 2303 2305 2306 2307 2308 2401; do HOST_BY_VMID[$v]="${PROXMOX_R630_02:-${PROXMOX_HOST_R630_02:-192.168.11.12}}"; done
for v in 1003 1004 1503 1504 1505 1506 1507 1508 2102 2301 2304 2400 2402 2403; do HOST_BY_VMID[$v]="${PROXMOX_R630_03:-${PROXMOX_HOST_R630_03:-192.168.11.13}}"; done
BESU_VMIDS=(1000 1001 1002 1003 1004 1500 1501 1502 1503 1504 1505 1506 1507 1508 2101 2102 2201 2301 2303 2304 2305 2306 2307 2308 2400 2401 2402 2403 2500 2501 2502 2503 2504 2505)
BESU_VMIDS=(1000 1001 1002 1003 1004 1500 1501 1502 1503 1504 1505 1506 1507 1508 2101 2102 2103 2201 2301 2303 2304 2305 2306 2307 2308 2400 2401 2402 2403 2500 2501 2502 2503 2504 2505)
SSH_OPTS="-o ConnectTimeout=8 -o StrictHostKeyChecking=accept-new"
echo "Set max-peers=32 on all Besu nodes (dry-run=$DRY_RUN)"
echo "Set max-peers=${TARGET_MAX_PEERS} on all Besu nodes (dry-run=$DRY_RUN)"
echo ""
for vmid in "${BESU_VMIDS[@]}"; do
@@ -35,7 +36,7 @@ for vmid in "${BESU_VMIDS[@]}"; do
continue
fi
if $DRY_RUN; then
echo "VMID $vmid @ $host: [dry-run] would sed max-peers=25 -> 32"
echo "VMID $vmid @ $host: [dry-run] would normalize max-peers -> ${TARGET_MAX_PEERS}"
continue
fi
# Try common Besu config locations; sed in place
@@ -44,7 +45,10 @@ for vmid in "${BESU_VMIDS[@]}"; do
[ -d \"\$d\" ] || continue
for f in \"\$d\"/*.toml; do
[ -f \"\$f\" ] || continue
grep -q \"max-peers=25\" \"\$f\" 2>/dev/null && sed -i \"s/max-peers=25/max-peers=32/g\" \"\$f\" && echo \"OK:\$f\"
if grep -qE \"^max-peers=\" \"\$f\" 2>/dev/null; then
sed -i -E \"s/^max-peers=.*/max-peers=${TARGET_MAX_PEERS}/\" \"\$f\"
echo \"OK:\$f\"
fi
done
done
'" 2>/dev/null || echo "FAIL")
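The new sed normalizes whatever max-peers value is present rather than only 25; the same substitution applied to a sample TOML fragment:

```shell
# Same normalization as the remote sed: rewrite any max-peers value in place.
TARGET_MAX_PEERS=40
tmp="$(mktemp)"
printf '%s\n' 'rpc-http-enabled=true' 'max-peers=25' > "$tmp"
sed -i -E "s/^max-peers=.*/max-peers=${TARGET_MAX_PEERS}/" "$tmp"
result="$(cat "$tmp")"
rm -f "$tmp"
printf '%s\n' "$result"   # rpc-http-enabled=true, then max-peers=40
```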