ops: oracle publisher LXC 3500/3501, CT migrate docs, Besu/RPC maintenance

- Provision oracle-publisher on CT 3500 (quoted DATA_SOURCE URLs, dotenv).
- Host-side pct-lxc-3501-net-up for ccip-monitor eth0 after migrate.
- CoinGecko key script: avoid sed & corruption; document quoted URLs.
- Besu node list reload, fstrim/RPC scripts, storage health docs.
- Submodule smom-dbis-138: web3 v6 pin, oracle check default host r630-02.

Made-with: Cursor
This commit is contained in:
defiQUG
2026-03-28 15:22:23 -07:00
parent 56b0abe3d1
commit e0bb17eff7
16 changed files with 530 additions and 45 deletions

View File

@@ -764,7 +764,7 @@
"vmid": 3500,
"hostname": "oracle-publisher-1",
"ipv4": "192.168.11.29",
"preferred_node": "r630-01",
"preferred_node": "r630-02",
"category": "oracle",
"ports": [],
"fqdns": []
@@ -773,7 +773,7 @@
"vmid": 3501,
"hostname": "ccip-monitor-1",
"ipv4": "192.168.11.28",
"preferred_node": "r630-01",
"preferred_node": "r630-02",
"category": "ccip",
"ports": [],
"fqdns": []

View File

@@ -26,7 +26,17 @@ If both show the same cluster and the other node is listed, migration is:
pct migrate <VMID> r630-02 --restart
```
Storage will be copied to the target; choose the target storage when prompted (e.g. `thin1`). Then **delete the container on r630-01, or leave it stopped**, so the same VMID/IP exist only on r630-02.
**CLI caveat:** `pct migrate` may fail if the CT references storages that do not exist on the target (e.g. `local-lvm` on r630-02) or if the source storage ID is inactive on the target (e.g. `thin1` on r630-02 vs `thin1-r630-02`). Remove **stale** `unusedN` volumes only after verifying with `lvs` that they are not the same LV as `rootfs` (see incident note below).
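One way to catch the storage-ID mismatch before migrating is to list what is actually active on the target. A minimal sketch (the `pvesm status` output below is sample data; on a real node you would pipe from `ssh root@<target> pvesm status`):

```shell
# Parse pvesm-status-style output and keep only active storages, so you can
# spot mismatches like thin1 (source) vs thin1-r630-02 (target) up front.
pvesm_out='Name Type Status Total Used Available %
thin5 lvmthin active 242187264 60546816 181640448 25.00%
thin1-r630-02 lvmthin active 242187264 12109363 230077901 5.00%
local-lvm lvmthin inactive 0 0 0 0.00%'
active=$(printf '%s\n' "$pvesm_out" | awk 'NR>1 && $3=="active" {print $1}')
printf '%s\n' "$active"
```

Any storage referenced in the CT config that does not appear in this list needs `--target-storage` (or a config fix) before `pct migrate` will succeed.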
**Recommended (PVE API, maps rootfs to target pool):** use `pvesh` from the source node so disks land on e.g. `thin5`:
```bash
ssh root@192.168.11.11 "pvesh create /nodes/r630-01/lxc/<VMID>/migrate --target r630-02 --target-storage thin5 --restart 1"
```
This is the path that succeeded for **3501** (ccip-monitor) on 2026-03-28.
Storage will be copied to the target. The source volume is removed after a successful migrate. **Do not** use `pct set <vmid> --delete unused0` when `unused0` and `rootfs` both name `vm-<id>-disk-0` on different storages — Proxmox can delete the **only** root LV (Oracle publisher **3500** incident, 2026-03-28).
If the nodes are **not** in a cluster, use the backup/restore method below.
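A guard for the `unusedN` pitfall above can be sketched as follows (the `pct config` lines are inlined sample data; on a node you would substitute `pct config <VMID>`):

```shell
# Refuse to delete unused0 when it names the same disk as rootfs on a
# different storage -- the pattern that destroyed 3500's root LV.
config='rootfs: thin1:vm-3500-disk-0,size=20G
unused0: local-lvm:vm-3500-disk-0'
rootfs_vol=$(printf '%s\n' "$config" | awk -F'[: ,]+' '/^rootfs:/ {print $3}')
unused_vol=$(printf '%s\n' "$config" | awk -F'[: ,]+' '/^unused0:/ {print $3}')
if [ "$rootfs_vol" = "$unused_vol" ]; then
  echo "DANGER: unused0 and rootfs both name $rootfs_vol; verify with lvs before deleting"
fi
```

If the names match, confirm with `lvs` on each storage that they are genuinely distinct LVs before running any `pct set --delete`.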
@@ -124,8 +134,8 @@ Containers that free meaningful space on r630-01 and are reasonable to run on r6
| 6401 | indy-alltra-1 | 100G | ✅ Migrated (thin6) |
| 6402 | indy-hybx-1 | 100G | ✅ Migrated (thin6) |
| 5700 | dev-vm | 400G (thin) | ✅ Migrated (thin6) |
| 3500 | oracle-publisher-1 | — | Oracle publisher |
| 3501 | ccip-monitor-1 | — | CCIP monitor |
| 3500 | oracle-publisher-1 | 20G thin1 (was) | **2026-03-28:** root LV accidentally removed; CT **recreated** on r630-02 `thin5` (fresh template). **Redeploy** app + `.env`. |
| 3501 | ccip-monitor-1 | 20G | **2026-03-28:** migrated to r630-02 **`thin5`** via `pvesh … /migrate --target-storage thin5`. **Networking:** unprivileged Ubuntu image may leave **eth0 DOWN** after migrate; `unprivileged` cannot be toggled later. Mitigation: on **r630-02** install `scripts/maintenance/pct-lxc-3501-net-up.sh` to `/usr/local/sbin/` and optional **`@reboot`** cron (see script header). |
**High impact (larger disks):**
@@ -169,6 +179,23 @@ Example:
See the script for exact steps (stop, vzdump, scp, restore, start, optional destroy on source).
**Unprivileged CTs:** `vzdump` often fails with tar `Permission denied` under `lxc-usernsexec`. Prefer **section 1** `pvesh … /migrate` with `--target-storage` instead of this script for those guests.
## 5a. Reprovision Oracle Publisher (VMID 3500) on r630-02
After a fresh LXC template or data loss, from project root (LAN, secrets loaded):
```bash
source scripts/lib/load-project-env.sh # or ensure PRIVATE_KEY / smom-dbis-138/.env
./scripts/deployment/provision-oracle-publisher-lxc-3500.sh
```
Uses `web3` 6.x (POA middleware). If on-chain `updateAnswer` fails, use a `PRIVATE_KEY` for an EOA allowed on the aggregator contract.
## 5b. r630-02 disk / VG limits (cannot automate)
Each `thin1`–`thin6` VG on r630-02 is a **single ~231 GiB SSD** with **~124 MiB `vg_free`**. There is **no** space to `lvextend` pools until you **grow the partition/PV** or add hardware. Guest `fstrim` and migration to `thin5` reduce **data** usage only within existing pools.
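A quick check for this condition can be sketched as below (sample `vgs` output shown; on a host you would pipe from `vgs --noheadings --units b --nosuffix -o vg_name,vg_free`):

```shell
# Flag VGs whose vg_free is under 1 GiB -- these cannot be lvextended
# until the underlying partition/PV is grown. 130023424 B is ~124 MiB.
vgs_out='thin1 130023424
thin5 251658240000'
low=$(printf '%s\n' "$vgs_out" | awk '$2 < 1073741824 {print $1}')
echo "cannot lvextend (grow PV first): $low"
```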
---
## 6. References

View File

@@ -316,8 +316,8 @@ The following VMIDs have been permanently removed:
| VMID | IP Address | Hostname | Status | Endpoints | Purpose |
|------|------------|----------|--------|-----------|---------|
| 3500 | 192.168.11.29 | oracle-publisher-1 | ✅ Running | Oracle: Various | Oracle publisher service |
| 3501 | 192.168.11.28 | ccip-monitor-1 | ✅ Running | Monitor: Various | CCIP monitoring service |
| 3500 | 192.168.11.29 | oracle-publisher-1 | ✅ Running (verify on-chain) | Oracle: Various | **r630-02** `thin5`. Reprovisioned 2026-03-28 via `scripts/deployment/provision-oracle-publisher-lxc-3500.sh` (systemd `oracle-publisher`). If `updateAnswer` txs revert, set `PRIVATE_KEY` in `/opt/oracle-publisher/.env` to an EOA **authorized on the aggregator** (may differ from deployer). Metrics: `:8000/metrics`. |
| 3501 | 192.168.11.28 | ccip-monitor-1 | ✅ Running | Monitor: Various | CCIP monitoring; **migrated 2026-03-28** to **r630-02** `thin5` (`pvesh … /migrate --target-storage thin5`). |
| 5200 | 192.168.11.80 | cacti-1 | ✅ Running | Web: 80, 443 | Network monitoring (Cacti); **host r630-02** (migrated 2026-02-15) |
---

View File

@@ -103,13 +103,19 @@ systemctl restart token-aggregation
### For Oracle Publisher Service
**Gas (Chain 138 / Besu):** In `/opt/oracle-publisher/.env`, use **`GAS_LIMIT=400000`** (not `100000`). The aggregator `updateAnswer` call can **run out of gas** at 100k (`gasUsed == gasLimit`, failed receipt) even when `isTransmitter` is true. Align with `ORACLE_UPDATE_GAS_LIMIT` in `smom-dbis-138/scripts/update-oracle-price.sh`. **`GAS_PRICE=1000000000`** (1 gwei) matches that script's legacy defaults.
**Quoted URLs in `.env`:** `DATA_SOURCE_1_URL` (and Coinbase `DATA_SOURCE_2_URL`) must be **double-quoted** when passed through **systemd `EnvironmentFile`**, because unquoted `&` in query strings can be parsed incorrectly and corrupt the value. **`scripts/update-oracle-publisher-coingecko-key.sh`** uses `grep -v` + append (not `sed` with `&` in the replacement). Do not use `sed 's|...|...&...|'` for URLs that contain `&`.
**Dotenv sources for provisioning:** `scripts/lib/load-project-env.sh` loads **project root `.env`** then **`smom-dbis-138/.env`** — so `PRIVATE_KEY` / `DEPLOYER_PRIVATE_KEY`, `COINGECKO_API_KEY` (root `.env`), and `AGGREGATOR_ADDRESS` are available to `scripts/deployment/provision-oracle-publisher-lxc-3500.sh` and `scripts/update-oracle-publisher-coingecko-key.sh`.
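The grep-and-append pattern above can be sketched as follows. The file path and URL are illustrative; in a sed replacement string `&` expands to the matched text, which is why the key script avoids sed entirely:

```shell
# Replace DATA_SOURCE_1_URL safely: drop the old line, append a freshly
# double-quoted one (quoting protects "&" under systemd EnvironmentFile).
envfile=$(mktemp)
printf 'DATA_SOURCE_1_URL=old\nOTHER=1\n' > "$envfile"
url='https://api.coingecko.com/api/v3/simple/price?ids=ethereum&vs_currencies=usd'
grep -v '^DATA_SOURCE_1_URL=' "$envfile" > "$envfile.new" && mv "$envfile.new" "$envfile"
printf 'DATA_SOURCE_1_URL="%s"\n' "$url" >> "$envfile"
result=$(grep '^DATA_SOURCE_1_URL=' "$envfile")
echo "$result"
rm -f "$envfile"
```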
**Step 1: SSH to Proxmox host**
```bash
ssh root@192.168.11.10
ssh root@192.168.11.12
```
**Step 2: Access Oracle Publisher container**
**Step 2: Access Oracle Publisher container** (VMID 3500 runs on **r630-02**)
```bash
pct exec 3500 -- bash
@@ -162,10 +168,10 @@ npm run test -- coingecko-adapter.test.ts
```bash
# Check .env file
ssh root@192.168.11.10 "pct exec 3500 -- cat /opt/oracle-publisher/.env | grep COINGECKO"
ssh root@192.168.11.12 "pct exec 3500 -- cat /opt/oracle-publisher/.env | grep COINGECKO"
# Check service logs
ssh root@192.168.11.10 "pct exec 3500 -- journalctl -u oracle-publisher -n 50 | grep -i coingecko"
ssh root@192.168.11.12 "pct exec 3500 -- journalctl -u oracle-publisher -n 50 | grep -i coingecko"
# Should see successful price fetches without 429 rate limit errors
```

View File

@@ -1,8 +1,18 @@
# Storage Growth and Health — Predictable Growth Table & Proactive Monitoring
**Last updated:** 2026-02-15
**Last updated:** 2026-03-28
**Purpose:** Real-time data collection and a predictable growth table so we can stay ahead of disk space issues on hosts and VMs.
### Recent operator maintenance (2026-03-28)
- **r630-01 `pve/data` (local-lvm):** Thin pool extended (+80 GiB data, +512 MiB metadata earlier); **LVM thin auto-extend** enabled in `lvm.conf` (`thin_pool_autoextend_threshold = 80`, `thin_pool_autoextend_percent = 20`); **dmeventd** must stay active.
- **r630-01 `pve/thin1`:** Pool extended (+48 GiB data, +256 MiB metadata) to reduce pressure; metadata percent dropped accordingly.
- **r630-01 `/var/lib/vz/dump`:** Removed obsolete **2026-02-15** vzdump archives/logs (~9 GiB); newer logs from 2026-02-28 retained.
- **Fleet guest `fstrim`:** `scripts/maintenance/fstrim-all-running-ct.sh` supports **`FSTRIM_TIMEOUT_SEC`** and **`FSTRIM_HOSTS`** (e.g. `ml110`, `r630-01`, `r630-02`). Many CTs return FITRIM “not permitted” (guest/filesystem); others reclaim space on the thin pools (notably on **r630-02**).
- **r630-02 `thin1`–`thin6` VGs:** Each VG is on a **single PV** with only **~124 MiB `vg_free`**; you **cannot** `lvextend` those thin pools until the underlying partition/disk is grown or a second PV is added. Monitor `pvesm status` and plan disk expansion before pools tighten.
- **CT migration** off r630-01 for load balance remains a **planned** action when maintenance windows and target storage allow (not automated here).
- **2026-03-28 (migration follow-up):** CT **3501** migrated to r630-02 **`thin5`** via `pvesh … lxc/3501/migrate --target-storage thin5`. CT **3500** had its root LV removed after a mistaken `pct set --delete unused0` (the config had both `unused0: local-lvm:vm-3500-disk-0` and `rootfs: thin1:vm-3500-disk-0`); **3500** was recreated empty on r630-02 `thin5`; **reinstall Oracle Publisher** on the guest. See `MIGRATE_CT_R630_01_TO_R630_02.md`.
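The per-CT timeout behind `FSTRIM_TIMEOUT_SEC` works because coreutils `timeout` kills the child after N seconds and exits 124, so one hung FITRIM cannot stall the whole fleet pass. A minimal sketch (1s timeout and `sleep` stand in for the real `pct exec … fstrim`):

```shell
# timeout returns the child's status on success, 124 when it had to kill it.
if timeout 1 sleep 5; then
  status=completed
else
  rc=$?
  status="killed (rc=$rc)"
fi
echo "$status"
```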
---
## 1. Real-time data collection

View File

@@ -46,7 +46,7 @@ for vmid in "${BESU_VMIDS[@]}"; do
continue
fi
# Detect Besu unit: besu-validator, besu-sentry, besu-rpc, or generic besu.service (1505-1508, 2500-2505)
result=$(ssh $SSH_OPTS "root@$host" "pct exec $vmid -- bash -c 'svc=\$(systemctl list-units --type=service --no-legend 2>/dev/null | grep -iE \"besu-validator|besu-sentry|besu-rpc|besu\\.service\" | head -1 | awk \"{print \\\$1}\"); if [ -n \"\$svc\" ]; then systemctl restart \"\$svc\" && echo \"OK:\$svc\"; else echo \"NONE\"; fi'" 2>/dev/null || echo "FAIL")
result=$(ssh $SSH_OPTS "root@$host" "timeout 180 pct exec $vmid -- bash -c 'svc=\$(systemctl list-units --type=service --no-legend 2>/dev/null | grep -iE \"besu-validator|besu-sentry|besu-rpc|besu\\.service\" | head -1 | awk \"{print \\\$1}\"); if [ -n \"\$svc\" ]; then systemctl restart \"\$svc\" && echo \"OK:\$svc\"; else echo \"NONE\"; fi'" 2>/dev/null || echo "FAIL")
if [[ "$result" == OK:* ]]; then
echo "VMID $vmid @ $host: restarted (${result#OK:})"
((ok++)) || true

View File

@@ -0,0 +1,136 @@
#!/usr/bin/env bash
# Install Oracle Publisher on LXC 3500 (fresh Ubuntu template). Run from project root on LAN.
# Sources scripts/lib/load-project-env.sh for PRIVATE_KEY, AGGREGATOR_ADDRESS, COINGECKO_API_KEY, etc.
#
# Usage: ./scripts/deployment/provision-oracle-publisher-lxc-3500.sh
# Env: ORACLE_LXC_PROXMOX_HOST (default 192.168.11.12 — node where VMID 3500 runs; do not use root PROXMOX_HOST)
# ORACLE_VMID (default 3500)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=/dev/null
source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh"
PROXMOX_HOST="${ORACLE_LXC_PROXMOX_HOST:-${PROXMOX_HOST_R630_02:-192.168.11.12}}"
ORACLE_VMID="${ORACLE_VMID:-3500}"
ORACLE_HOME="/opt/oracle-publisher"
ORACLE_USER="${ORACLE_USER:-oracle}"
RPC_URL="${RPC_URL:-http://192.168.11.211:8545}"
AGGREGATOR_ADDRESS="${AGGREGATOR_ADDRESS:-${ORACLE_AGGREGATOR_ADDRESS:-0x99b3511a2d315a497c8112c1fdd8d508d4b1e506}}"
ORACLE_PROXY_ADDRESS="${ORACLE_PROXY_ADDRESS:-0x3304b747e565a97ec8ac220b0b6a1f6ffdb837e6}"
SSH_OPTS=(-o ConnectTimeout=25 -o StrictHostKeyChecking=accept-new)
if [[ -z "${PRIVATE_KEY:-}" ]]; then
echo "ERROR: PRIVATE_KEY not set. Source smom-dbis-138/.env or export PRIVATE_KEY before running." >&2
exit 1
fi
PY_SRC="${PROJECT_ROOT}/smom-dbis-138/services/oracle-publisher/oracle_publisher.py"
REQ="${PROJECT_ROOT}/smom-dbis-138/services/oracle-publisher/requirements.txt"
[[ -f "$PY_SRC" ]] || { echo "ERROR: missing $PY_SRC" >&2; exit 1; }
[[ -f "$REQ" ]] || { echo "ERROR: missing $REQ" >&2; exit 1; }
remote() { ssh "${SSH_OPTS[@]}" "root@${PROXMOX_HOST}" "$@"; }
echo "=== Provisioning Oracle Publisher: host=${PROXMOX_HOST} vmid=${ORACLE_VMID} ==="
remote "pct status ${ORACLE_VMID}" >/dev/null
echo "[1/6] OS packages + oracle user..."
remote "pct exec ${ORACLE_VMID} -- bash -es" <<EOS
export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get install -y -qq python3 python3-pip python3-venv ca-certificates curl sudo
if ! id -u ${ORACLE_USER} &>/dev/null; then
useradd -r -s /bin/bash -d ${ORACLE_HOME} -m ${ORACLE_USER}
fi
mkdir -p ${ORACLE_HOME}
chown -R ${ORACLE_USER}:${ORACLE_USER} ${ORACLE_HOME}
EOS
echo "[2/6] Push Python app + requirements..."
scp "${SSH_OPTS[@]}" "$PY_SRC" "root@${PROXMOX_HOST}:/tmp/oracle_publisher.py"
scp "${SSH_OPTS[@]}" "$REQ" "root@${PROXMOX_HOST}:/tmp/oracle-requirements.txt"
remote "pct push ${ORACLE_VMID} /tmp/oracle_publisher.py ${ORACLE_HOME}/oracle_publisher.py"
remote "pct push ${ORACLE_VMID} /tmp/oracle-requirements.txt ${ORACLE_HOME}/requirements.txt"
remote "pct exec ${ORACLE_VMID} -- chown ${ORACLE_USER}:${ORACLE_USER} ${ORACLE_HOME}/oracle_publisher.py ${ORACLE_HOME}/requirements.txt"
remote "pct exec ${ORACLE_VMID} -- chmod 755 ${ORACLE_HOME}/oracle_publisher.py"
echo "[3/6] Python venv + pip..."
remote "pct exec ${ORACLE_VMID} -- bash -es" <<EOS
sudo -u ${ORACLE_USER} python3 -m venv ${ORACLE_HOME}/venv
sudo -u ${ORACLE_USER} ${ORACLE_HOME}/venv/bin/pip install -q --upgrade pip
sudo -u ${ORACLE_USER} ${ORACLE_HOME}/venv/bin/pip install -q -r ${ORACLE_HOME}/requirements.txt || true
# Minimal set if optional OTEL packages fail; web3 v7 breaks geth_poa_middleware — pin v6
sudo -u ${ORACLE_USER} ${ORACLE_HOME}/venv/bin/pip install -q 'web3>=6.15,<7' eth-account requests python-dotenv prometheus-client || true
EOS
echo "[4/6] Write .env (no stdout of secrets)..."
ENV_TMP="$(mktemp)"
chmod 600 "$ENV_TMP"
# Quote URLs for systemd EnvironmentFile: unquoted "&" can break parsing / concatenation.
DS1_URL="https://api.coingecko.com/api/v3/simple/price?ids=ethereum&vs_currencies=usd"
if [[ -n "${COINGECKO_API_KEY:-}" ]]; then
DS1_URL="${DS1_URL}&x_cg_demo_api_key=${COINGECKO_API_KEY}"
fi
{
echo "RPC_URL=${RPC_URL}"
echo "AGGREGATOR_ADDRESS=${AGGREGATOR_ADDRESS}"
echo "PRIVATE_KEY=${PRIVATE_KEY}"
echo "HEARTBEAT=60"
echo "DEVIATION_THRESHOLD=0.5"
echo "ORACLE_ADDRESS=${ORACLE_PROXY_ADDRESS}"
echo "CHAIN_ID=138"
echo "COINGECKO_API_KEY=${COINGECKO_API_KEY:-}"
echo "DATA_SOURCE_1_URL=\"${DS1_URL}\""
echo "DATA_SOURCE_1_PARSER=ethereum.usd"
echo "DATA_SOURCE_2_URL=\"https://api.coinbase.com/v2/prices/ETH-USD/spot\""
echo "DATA_SOURCE_2_PARSER=data.amount"
# Match smom-dbis-138/scripts/update-oracle-price.sh (100k was OOG on aggregator)
echo "GAS_LIMIT=400000"
echo "GAS_PRICE=1000000000"
} > "$ENV_TMP"
scp "${SSH_OPTS[@]}" "$ENV_TMP" "root@${PROXMOX_HOST}:/tmp/oracle-publisher.env"
rm -f "$ENV_TMP"
remote "pct push ${ORACLE_VMID} /tmp/oracle-publisher.env ${ORACLE_HOME}/.env"
remote "pct exec ${ORACLE_VMID} -- chown ${ORACLE_USER}:${ORACLE_USER} ${ORACLE_HOME}/.env"
remote "pct exec ${ORACLE_VMID} -- chmod 600 ${ORACLE_HOME}/.env"
remote "rm -f /tmp/oracle-publisher.env"
echo "[5/6] systemd unit..."
remote "pct exec ${ORACLE_VMID} -- bash -es" <<EOF
cat > /etc/systemd/system/oracle-publisher.service <<'UNIT'
[Unit]
Description=Oracle Publisher Service (Chain 138)
After=network.target
Wants=network-online.target
[Service]
Type=simple
User=${ORACLE_USER}
Group=${ORACLE_USER}
WorkingDirectory=${ORACLE_HOME}
Environment="PATH=${ORACLE_HOME}/venv/bin:/usr/local/bin:/usr/bin:/bin"
EnvironmentFile=-${ORACLE_HOME}/.env
ExecStart=${ORACLE_HOME}/venv/bin/python ${ORACLE_HOME}/oracle_publisher.py
Restart=always
RestartSec=15
NoNewPrivileges=true
[Install]
WantedBy=multi-user.target
UNIT
systemctl daemon-reload
systemctl enable oracle-publisher.service
EOF
echo "[6/6] Start service..."
remote "pct exec ${ORACLE_VMID} -- systemctl restart oracle-publisher.service"
sleep 3
remote "pct exec ${ORACLE_VMID} -- systemctl is-active oracle-publisher.service"
echo ""
echo "OK: Oracle Publisher on VMID ${ORACLE_VMID} (${PROXMOX_HOST})."
echo "Logs: ssh root@${PROXMOX_HOST} \"pct exec ${ORACLE_VMID} -- journalctl -u oracle-publisher -n 40 --no-pager\""

View File

@@ -3,6 +3,10 @@
# Usage: ./scripts/maintenance/fstrim-all-running-ct.sh [--dry-run]
# Requires: SSH key-based access to ml110, r630-01, r630-02.
# See: docs/04-configuration/STORAGE_GROWTH_AND_HEALTH.md
#
# Environment (optional):
# FSTRIM_TIMEOUT_SEC Seconds per CT (default 180). Use 45-60 for faster fleet passes when many CTs hang on FITRIM.
# FSTRIM_HOSTS Space-separated host keys: ml110 r630-01 r630-02 (default: all three).
set -euo pipefail
@@ -14,10 +18,14 @@ ML110="${PROXMOX_HOST_ML110:-192.168.11.10}"
R630_01="${PROXMOX_HOST_R630_01:-192.168.11.11}"
R630_02="${PROXMOX_HOST_R630_02:-192.168.11.12}"
FSTRIM_TIMEOUT_SEC="${FSTRIM_TIMEOUT_SEC:-180}"
# shellcheck disable=SC2206
FSTRIM_HOSTS_ARR=(${FSTRIM_HOSTS:-ml110 r630-01 r630-02})
DRY_RUN=0
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=1
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$1" "$2" 2>/dev/null || true; }
run_ssh() { ssh -o ConnectTimeout=15 -o ServerAliveInterval=10 -o StrictHostKeyChecking=accept-new root@"$1" "$2" 2>/dev/null || true; }
fstrim_host() {
local host_ip="$1" host_name="$2"
@@ -29,21 +37,30 @@ fstrim_host() {
fi
for vmid in $vmids; do
if [[ $DRY_RUN -eq 1 ]]; then
echo " [dry-run] $host_name VMID $vmid: would run fstrim -v /"
echo " [dry-run] $host_name VMID $vmid: would run fstrim -v / (timeout ${FSTRIM_TIMEOUT_SEC}s)"
else
out=$(run_ssh "$host_ip" "pct exec $vmid -- fstrim -v / 2>&1" || true)
# timeout: some CTs hang on FITRIM or slow storage; do not block entire fleet
out=$(run_ssh "$host_ip" "timeout \"${FSTRIM_TIMEOUT_SEC}\" pct exec $vmid -- fstrim -v / 2>&1" || true)
echo " $host_name VMID $vmid: ${out:-done}"
fi
done
}
echo "=== fstrim all running CTs (reclaim thin pool space) ==="
echo " timeout_per_ct=${FSTRIM_TIMEOUT_SEC}s hosts=${FSTRIM_HOSTS_ARR[*]}"
[[ $DRY_RUN -eq 1 ]] && echo "(dry-run: no changes)"
echo ""
fstrim_host "$ML110" "ml110"
fstrim_host "$R630_01" "r630-01"
fstrim_host "$R630_02" "r630-02"
for key in "${FSTRIM_HOSTS_ARR[@]}"; do
case "$key" in
ml110) fstrim_host "$ML110" "ml110" ;;
r630-01) fstrim_host "$R630_01" "r630-01" ;;
r630-02) fstrim_host "$R630_02" "r630-02" ;;
*)
echo " Unknown FSTRIM_HOSTS entry: $key (use ml110, r630-01, r630-02)"
;;
esac
done
echo ""
echo "Done. Schedule weekly via cron, or include in the weekly daily-weekly-checks pass."

View File

@@ -1,8 +1,9 @@
#!/usr/bin/env bash
# Make RPC VMIDs (2101, 2500-2505) writable by running e2fsck on their rootfs (fixes read-only remount after ext4 errors).
# Make Besu CT rootfs writable by running e2fsck on their root LV (fixes read-only / emergency_ro after ext4 errors).
# SSHs to the Proxmox host (r630-01), stops each CT, runs e2fsck -f -y on the LV, starts the CT.
#
# Usage: ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh [--dry-run]
# Optional: BESU_WRITABLE_VMIDS="1500 1501 1502" to add sentries or other CTs (default: Core RPC 2101 only).
# Run from project root. Requires: SSH to r630-01 (root, key-based).
# See: docs/00-meta/502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md §Read-only CT
@@ -13,9 +14,14 @@ PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
HOST="${PROXMOX_HOST_R630_01:-192.168.11.11}"
# RPC VMIDs on r630-01: Core (2101) + Alltra/HYBX (2500-2505)
RPC_VMIDS=(2101 2500 2501 2502 2503 2504 2505)
SSH_OPTS="-o ConnectTimeout=15 -o StrictHostKeyChecking=accept-new"
# Default: Core RPC on r630-01 (2101). 2500-2505 removed — destroyed; see ALL_VMIDS_ENDPOINTS.md.
# Add sentries with: BESU_WRITABLE_VMIDS="1500 1501 1502 2101" ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh
if [[ -n "${BESU_WRITABLE_VMIDS:-}" ]]; then
read -r -a RPC_VMIDS <<< "${BESU_WRITABLE_VMIDS}"
else
RPC_VMIDS=(2101)
fi
SSH_OPTS="-o ConnectTimeout=20 -o ServerAliveInterval=15 -o StrictHostKeyChecking=accept-new"
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true

View File

@@ -2,6 +2,14 @@
# Migrate one LXC container from r630-01 to r630-02 (backup → copy → restore).
# Use to free space on r630-01's thin pool. Run from project root (LAN); needs SSH to both hosts.
#
# IMPORTANT — unprivileged CTs: vzdump often fails with tar "Permission denied" inside the guest.
# Prefer cluster migration via API (maps source storage to target), e.g.:
# ssh root@192.168.11.11 "pvesh create /nodes/r630-01/lxc/<VMID>/migrate --target r630-02 --target-storage thin5 --restart 1"
# See docs/03-deployment/MIGRATE_CT_R630_01_TO_R630_02.md
#
# NEVER run `pct set <vmid> --delete unused0` if unused0 and rootfs reference the same disk name
# on different storages (e.g. local-lvm:vm-N-disk-0 vs thin1:vm-N-disk-0) — Proxmox may remove the only root LV.
#
# Usage:
# ./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh <VMID> [target_storage]
# ./scripts/maintenance/migrate-ct-r630-01-to-r630-02.sh 5200 thin1

View File

@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Bring up static networking inside unprivileged LXC 3501 (ccip-monitor) when eth0 stays DOWN.
# Run on the Proxmox node that hosts VMID 3501 (r630-02). Optional: @reboot cron.
#
# Usage (on r630-02 as root): /usr/local/sbin/pct-lxc-3501-net-up.sh
# Install: scp to r630-02 /usr/local/sbin/ && chmod +x
set -euo pipefail
VMID="${CCIP_MONITOR_VMID:-3501}"
IP="${CCIP_MONITOR_IP:-192.168.11.28/24}"
GW="${CCIP_MONITOR_GW:-192.168.11.1}"
BCAST="${CCIP_MONITOR_BCAST:-192.168.11.255}"
if ! pct status "$VMID" 2>/dev/null | grep -q running; then
exit 0
fi
pct exec "$VMID" -- ip link set eth0 up
pct exec "$VMID" -- ip addr replace "$IP" dev eth0 broadcast "$BCAST" 2>/dev/null || \
pct exec "$VMID" -- ip addr add "$IP" dev eth0 broadcast "$BCAST"
pct exec "$VMID" -- ip route replace default via "$GW" dev eth0 2>/dev/null || \
pct exec "$VMID" -- ip route add default via "$GW" dev eth0

View File

@@ -0,0 +1,115 @@
#!/usr/bin/env bash
# Additional pass: diagnose I/O + load on Proxmox nodes, then apply safe host-level optimizations.
# - Reports: load, PSI, zpool, pvesm, scrub, vzdump, running CT count
# - Applies (idempotent): vm.swappiness on ml110; sysstat; host fstrim where supported
#
# Usage: ./scripts/maintenance/proxmox-host-io-optimize-pass.sh [--diagnose-only]
# Requires: SSH key root@ ml110, r630-01, r630-02 (see config/ip-addresses.conf)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=/dev/null
source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
ML="${PROXMOX_ML110:-${PROXMOX_HOST_ML110:-192.168.11.10}}"
R1="${PROXMOX_R630_01:-${PROXMOX_HOST_R630_01:-192.168.11.11}}"
R2="${PROXMOX_R630_02:-${PROXMOX_HOST_R630_02:-192.168.11.12}}"
SSH_OPTS=(-o ConnectTimeout=20 -o ServerAliveInterval=15 -o StrictHostKeyChecking=accept-new)
DIAG_ONLY=false
[[ "${1:-}" == "--diagnose-only" ]] && DIAG_ONLY=true
remote() { ssh "${SSH_OPTS[@]}" "root@$1" bash -s; }
echo "=== Proxmox host I/O optimize pass ($(date -Is)) ==="
echo " ml110=$ML r630-01=$R1 r630-02=$R2 diagnose-only=$DIAG_ONLY"
echo ""
for H in "$ML" "$R1" "$R2"; do
echo "########## DIAGNOSTIC: $H ##########"
remote "$H" <<'EOS'
set +e
hostname
uptime
echo "--- PSI ---"
cat /proc/pressure/cpu 2>/dev/null | head -2
cat /proc/pressure/io 2>/dev/null | head -2
echo "--- pvesm ---"
pvesm status 2>/dev/null | head -25
echo "--- running workloads ---"
echo -n "LXC running: "; pct list 2>/dev/null | awk 'NR>1 && $2=="running"' | wc -l
echo -n "VM running: "; qm list 2>/dev/null | awk 'NR>1 && $3=="running"' | wc -l
echo "--- vzdump ---"
ps aux 2>/dev/null | grep -E '[v]zdump|[p]bs-|proxmox-backup' | head -5 || echo "(none visible)"
echo "--- ZFS ---"
zpool status 2>/dev/null | head -20 || echo "no zfs"
echo "--- scrub ---"
zpool status 2>/dev/null | grep -E 'scan|scrub' || true
EOS
echo ""
done
if $DIAG_ONLY; then
echo "Diagnose-only: done."
exit 0
fi
echo "########## OPTIMIZE: ml110 swappiness ##########"
remote "$ML" <<'EOS'
set -e
F=/etc/sysctl.d/99-proxmox-ml110-swappiness.conf
if ! grep -q '^vm.swappiness=10$' "$F" 2>/dev/null; then
printf '%s\n' '# Prefer RAM over swap when plenty of memory free (operator pass)' 'vm.swappiness=10' > "$F"
sysctl -p "$F"
echo "Wrote and applied $F"
else
echo "Already vm.swappiness=10 in $F"
sysctl vm.swappiness=10 2>/dev/null || true
fi
EOS
echo ""
echo "########## OPTIMIZE: sysstat (all hosts) ##########"
for H in "$ML" "$R1" "$R2"; do
echo "--- $H ---"
remote "$H" <<'EOS'
set -e
export DEBIAN_FRONTEND=noninteractive
if command -v sar >/dev/null 2>&1; then
echo "sysstat already present"
else
apt-get update -qq && apt-get install -y -qq sysstat
fi
sed -i 's/^ENABLED="false"/ENABLED="true"/' /etc/default/sysstat 2>/dev/null || true
systemctl enable sysstat 2>/dev/null || true
systemctl restart sysstat 2>/dev/null || true
echo "sar: $(command -v sar || echo missing)"
EOS
done
echo ""
echo "########## OPTIMIZE: host fstrim (hypervisor root / and /var/lib/vz if supported) ##########"
for H in "$ML" "$R1" "$R2"; do
echo "--- $H ---"
remote "$H" <<'EOS'
set +e
for m in / /var/lib/vz; do
if mountpoint -q "$m" 2>/dev/null; then
out=$(fstrim -v "$m" 2>&1)
echo "$m: $out"
fi
done
EOS
done
echo ""
echo "########## POST: quick load snapshot ##########"
for H in "$ML" "$R1" "$R2"; do
echo -n "$H "
ssh "${SSH_OPTS[@]}" "root@$H" "cat /proc/loadavg | cut -d' ' -f1-3" 2>/dev/null || echo "unreachable"
done
echo ""
echo "Done. Optional: run ./scripts/maintenance/fstrim-all-running-ct.sh during a quiet window (can be I/O heavy)."

View File

@@ -0,0 +1,55 @@
#!/usr/bin/env bash
# Staggered restart of Besu RPC services on ML110 (192.168.11.10) only.
# Use after fleet restarts or when multiple RPC CTs compete for disk — avoids all nodes stuck in RocksDB open/compact.
#
# Usage: ./scripts/maintenance/restart-ml110-besu-rpc-staggered.sh [--dry-run]
# Env: ML110_WAIT_SEC between restarts (default 75), PROXMOX_HOST_ML110 (default 192.168.11.10)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=/dev/null
source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
HOST="${PROXMOX_ML110:-${PROXMOX_HOST_ML110:-192.168.11.10}}"
WAIT="${ML110_WAIT_SEC:-75}"
SSH_OPTS=(-o ConnectTimeout=25 -o ServerAliveInterval=15 -o StrictHostKeyChecking=accept-new)
# RPC-only CTs on ML110 (see ALL_VMIDS_ENDPOINTS.md)
RPC_VMIDS=(2102 2301 2304 2305 2306 2307 2308 2400 2402 2403)
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
echo "=== Staggered besu-rpc restart on $HOST ==="
echo " VMIDs: ${RPC_VMIDS[*]}"
echo " Wait between: ${WAIT}s dry-run=$DRY_RUN"
echo ""
if ! ssh "${SSH_OPTS[@]}" "root@$HOST" "echo OK" 2>/dev/null; then
echo "Cannot SSH to root@$HOST" >&2
exit 1
fi
last="${RPC_VMIDS[$(( ${#RPC_VMIDS[@]} - 1 ))]}"
for vmid in "${RPC_VMIDS[@]}"; do
if $DRY_RUN; then
echo "[dry-run] would restart VMID $vmid"
else
echo "$(date -Is) restarting VMID $vmid ..."
if ssh "${SSH_OPTS[@]}" "root@$HOST" "timeout 180 pct exec $vmid -- systemctl restart besu-rpc.service"; then
echo " OK"
else
echo " FAIL (timeout or error)" >&2
fi
fi
if [[ "$vmid" != "$last" ]] && ! $DRY_RUN; then
echo " waiting ${WAIT}s ..."
sleep "$WAIT"
fi
done
echo ""
echo "Done. Wait 25 minutes for 2402/2403 if RocksDB compaction runs; then:"
echo " ./scripts/verify/check-chain138-rpc-health.sh"

View File

@@ -4,15 +4,11 @@
set -euo pipefail
# Load IP configuration
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
# shellcheck source=/dev/null
source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/.." && pwd)"
# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
@@ -25,8 +21,8 @@ log_success() { echo -e "${GREEN}[✓]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[⚠]${NC} $1"; }
log_error() { echo -e "${RED}[✗]${NC} $1"; }
# Configuration
PROXMOX_HOST="${PROXMOX_HOST:-192.168.11.10}"
# Oracle Publisher LXC 3500 is on r630-02 (2026-03-28+)
PROXMOX_HOST="${PROXMOX_ORACLE_PUBLISHER_HOST:-${PROXMOX_HOST_R630_02:-192.168.11.12}}"
ORACLE_VMID="${ORACLE_VMID:-3500}"
COINGECKO_API_KEY="${COINGECKO_API_KEY:?COINGECKO_API_KEY must be set. Export from .env or use: export COINGECKO_API_KEY=your-key}"
@@ -71,20 +67,12 @@ else
fi
# Update DATA_SOURCE_1_URL to include API key
# NOTE: Do not use sed with the URL in the replacement string — query params contain "&", which sed treats as "matched text".
log_info "Updating DATA_SOURCE_1_URL with API key..."
# Check if DATA_SOURCE_1_URL exists
if echo "$CURRENT_ENV" | grep -q "^DATA_SOURCE_1_URL="; then
# Update existing URL
NEW_URL="https://api.coingecko.com/api/v3/simple/price?ids=ethereum&vs_currencies=usd&x_cg_demo_api_key=$COINGECKO_API_KEY"
ssh "root@$PROXMOX_HOST" "pct exec $ORACLE_VMID -- bash -c 'sed -i \"s|^DATA_SOURCE_1_URL=.*|DATA_SOURCE_1_URL=$NEW_URL|\" /opt/oracle-publisher/.env'"
log_success "Updated DATA_SOURCE_1_URL"
else
# Add new URL
NEW_URL="https://api.coingecko.com/api/v3/simple/price?ids=ethereum&vs_currencies=usd&x_cg_demo_api_key=$COINGECKO_API_KEY"
ssh "root@$PROXMOX_HOST" "pct exec $ORACLE_VMID -- bash -c 'echo \"DATA_SOURCE_1_URL=$NEW_URL\" >> /opt/oracle-publisher/.env'"
log_success "Added DATA_SOURCE_1_URL"
fi
NEW_URL="https://api.coingecko.com/api/v3/simple/price?ids=ethereum&vs_currencies=usd&x_cg_demo_api_key=$COINGECKO_API_KEY"
# Double-quote value for systemd EnvironmentFile (ampersands in URL).
ssh "root@$PROXMOX_HOST" "pct exec $ORACLE_VMID -- bash -c 'grep -v \"^DATA_SOURCE_1_URL=\" /opt/oracle-publisher/.env > /tmp/op.env.$$ && mv /tmp/op.env.$$ /opt/oracle-publisher/.env && printf \"%s\\n\" \"DATA_SOURCE_1_URL=\\\"$NEW_URL\\\"\" >> /opt/oracle-publisher/.env'"
log_success "DATA_SOURCE_1_URL set (grep+append, quoted for systemd)"
# Ensure DATA_SOURCE_1_PARSER is set correctly
log_info "Ensuring DATA_SOURCE_1_PARSER is set..."
@@ -106,7 +94,7 @@ VERIFIED_KEY=$(ssh "root@$PROXMOX_HOST" "pct exec $ORACLE_VMID -- grep '^COINGEC
VERIFIED_URL=$(ssh "root@$PROXMOX_HOST" "pct exec $ORACLE_VMID -- grep '^DATA_SOURCE_1_URL=' /opt/oracle-publisher/.env | cut -d= -f2-" || echo "")
if [ "$VERIFIED_KEY" = "$COINGECKO_API_KEY" ]; then
log_success "CoinGecko API key verified: $VERIFIED_KEY"
log_success "CoinGecko API key verified (length ${#VERIFIED_KEY} chars; value not logged)"
else
log_error "API key verification failed"
exit 1

View File

@@ -0,0 +1,95 @@
#!/usr/bin/env bash
# Chain 138 — RPC health: parallel head check + per-node peer count.
# Exit 0 if all HTTP RPCs respond, head spread <= max_blocks_spread, each peer count >= min_peers.
#
# Usage: ./scripts/verify/check-chain138-rpc-health.sh
# Env: RPC_MAX_HEAD_SPREAD (default 12), RPC_MIN_PEERS (default 10), RPC_TIMEOUT_SEC (default 20)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=/dev/null
source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
MAX_SPREAD="${RPC_MAX_HEAD_SPREAD:-12}"
MIN_PEERS="${RPC_MIN_PEERS:-10}"
TO="${RPC_TIMEOUT_SEC:-20}"
# VMID|IP (HTTP :8545)
RPC_ROWS=(
"2101|${IP_BESU_RPC_CORE_1:-192.168.11.211}"
"2102|${IP_BESU_RPC_CORE_2:-192.168.11.212}"
"2201|${IP_BESU_RPC_PUBLIC_1:-192.168.11.221}"
"2301|${IP_BESU_RPC_PRIVATE_1:-192.168.11.232}"
"2303|192.168.11.233"
"2304|192.168.11.234"
"2305|192.168.11.235"
"2306|192.168.11.236"
"2307|192.168.11.237"
"2308|192.168.11.238"
"2400|192.168.11.240"
"2401|192.168.11.241"
"2402|192.168.11.242"
"2403|192.168.11.243"
)
PAYLOAD_BN='{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
PAYLOAD_PC='{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}'
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
for row in "${RPC_ROWS[@]}"; do
vmid="${row%%|*}"
ip="${row#*|}"
(
curl -sS -m "$TO" -X POST "http://${ip}:8545" -H "Content-Type: application/json" -d "$PAYLOAD_BN" >"$tmpdir/bn-$vmid.json" 2>/dev/null || echo '{"error":"curl"}' >"$tmpdir/bn-$vmid.json"
curl -sS -m "$TO" -X POST "http://${ip}:8545" -H "Content-Type: application/json" -d "$PAYLOAD_PC" >"$tmpdir/pc-$vmid.json" 2>/dev/null || echo '{"error":"curl"}' >"$tmpdir/pc-$vmid.json"
) &
done
wait
fail=0
min_b=999999999
max_b=0
echo "Chain 138 RPC health (parallel sample)"
printf '%-5s %-15s %-10s %-8s\n' "VMID" "IP" "block(dec)" "peers"
echo "------------------------------------------------------------"
for row in "${RPC_ROWS[@]}"; do
vmid="${row%%|*}"
ip="${row#*|}"
bh=$(jq -r '.result // empty' "$tmpdir/bn-$vmid.json" 2>/dev/null || true)
ph=$(jq -r '.result // empty' "$tmpdir/pc-$vmid.json" 2>/dev/null || true)
if [[ -z "$bh" ]]; then
printf '%-5s %-15s %-10s %-8s\n' "$vmid" "$ip" "FAIL" "—"
((fail++)) || true
continue
fi
bd=$((bh))
pd=$((ph))
[[ "$bd" -lt "$min_b" ]] && min_b=$bd
[[ "$bd" -gt "$max_b" ]] && max_b=$bd
if [[ "$pd" -lt "$MIN_PEERS" ]]; then
printf '%-5s %-15s %-10s %-8s LOW_PEERS\n' "$vmid" "$ip" "$bd" "$pd"
((fail++)) || true
else
printf '%-5s %-15s %-10s %-8s\n' "$vmid" "$ip" "$bd" "$pd"
fi
done
spread=$((max_b - min_b))
echo "------------------------------------------------------------"
echo "Head spread (max-min): $spread (max allowed $MAX_SPREAD)"
if [[ "$spread" -gt "$MAX_SPREAD" ]]; then
echo "FAIL: head spread too large"
((fail++)) || true
fi
if [[ "$fail" -eq 0 ]]; then
echo "OK: all RPCs responded, peers >= $MIN_PEERS, spread <= $MAX_SPREAD"
exit 0
fi
echo "FAIL: $fail check(s) failed"
exit 1