Sync workspace: config, docs, scripts, CI, operator rules, and submodule pointers.

- Update dbis_core, cross-chain-pmm-lps, explorer-monorepo, metamask-integration, pr-workspace/chains
- Omit embedded publish git dirs and empty placeholders from index

Made-with: Cursor
defiQUG
2026-04-12 06:12:20 -07:00
parent 6fb6bd3993
commit dbd517b279
2935 changed files with 327972 additions and 5533 deletions

View File

@@ -2,17 +2,17 @@
**health-check-rpc-2101.sh** — Health check for Besu RPC on VMID 2101: container status, besu-rpc service, port 8545, eth_chainId, eth_blockNumber. Run from project root (LAN). See docs/09-troubleshooting/RPC_NODES_BLOCK_PRODUCTION_FIX.md.
**fix-core-rpc-2101.sh** — One-command fix for Core RPC 2101: start CT if stopped, restart Besu, verify RPC. Options: `--dry-run`, `--restart-only`. If Besu fails with JNA/NoClassDefFoundError, run fix-rpc-2101-jna-reinstall.sh first.
**fix-core-rpc-2101.sh** — One-command fix for Core RPC 2101: start CT if stopped, restart Besu, verify RPC. Options: `--dry-run`, `--apply` (mutations when `PROXMOX_SAFE_DEFAULTS=1`), `--restart-only`. Optional `PROXMOX_OPS_ALLOWED_VMIDS`. If Besu fails with JNA/NoClassDefFoundError, run fix-rpc-2101-jna-reinstall.sh first.
**fix-rpc-2101-jna-reinstall.sh** — Reinstall Besu in CT 2101 to fix JNA/NoClassDefFoundError; then re-run fix-core-rpc-2101.sh. Use `--dry-run` to print steps only.
**check-disk-all-vmids.sh** — Check root disk usage in all running containers on ml110, r630-01, r630-02. Use `--csv` for tab-separated output. For prevention and audits.
**run-all-maintenance-via-proxmox-ssh.sh** — Run all maintenance/fix scripts that use SSH to Proxmox VE (r630-01, ml110, r630-02). **Runs make-rpc-vmids-writable-via-ssh.sh first** (so 2101, 2500-2505 are writable), then resolve-and-fix-all, fix-rpc-2101-jna-reinstall, install-besu-permanent-on-missing-nodes, address-all-remaining-502s; optional E2E with `--e2e`. Use `--no-npm` to skip NPM proxy update, `--dry-run` to print steps only, `--verbose` to show all step output (no stderr hidden). Step 2 (2101 fix) has optional timeout: `STEP2_TIMEOUT=900` (default) or `STEP2_TIMEOUT=0` to disable. Run from project root (LAN).
**run-all-maintenance-via-proxmox-ssh.sh** — Run all maintenance/fix scripts that use SSH to Proxmox VE (r630-01, ml110, r630-02). **Runs make-rpc-vmids-writable-via-ssh.sh --apply first** (so 2101, 2500-2505 are writable), then resolve-and-fix-all, fix-rpc-2101-jna-reinstall, install-besu-permanent-on-missing-nodes, address-all-remaining-502s; optional E2E with `--e2e`. Use `--no-npm` to skip NPM proxy update, `--dry-run` to print steps only, `--verbose` to show all step output (no stderr hidden). Step 2 (2101 fix) has optional timeout: `STEP2_TIMEOUT=900` (default) or `STEP2_TIMEOUT=0` to disable. Run from project root (LAN).
**make-rpc-vmids-writable-via-ssh.sh** — SSHs to r630-01 and for each VMID 2101, 2500-2505: stops the CT, runs `e2fsck -f -y` on the rootfs LV, starts the CT. Use before fix-rpc-2101 or install-besu-permanent when CTs are read-only. `--dry-run` to print only. Run from project root (LAN).
**make-rpc-vmids-writable-via-ssh.sh** — SSHs to r630-01 and for each VMID (default 2101; override with `BESU_WRITABLE_VMIDS`): stops the CT, runs `e2fsck -f -y` on the rootfs LV, starts the CT. Use before fix-rpc-2101 or install-besu-permanent when CTs are read-only. `--dry-run` / `--apply`; with `PROXMOX_SAFE_DEFAULTS=1`, default is dry-run unless `--apply` or `PROXMOX_OPS_APPLY=1`. Optional `PROXMOX_OPS_ALLOWED_VMIDS`. Run from project root (LAN).
**make-validator-vmids-writable-via-ssh.sh** — SSHs to r630-01 (1000, 1001, 1002) and ml110 (1003, 1004); stops each validator CT, runs `e2fsck -f -y` on rootfs, starts the CT. Fixes "Read-only file system" / JNA crash loop on validators. Then run `fix-all-validators-and-txpool.sh`. See docs/08-monitoring/RPC_AND_VALIDATOR_TESTING_RUNBOOK.md.
**make-validator-vmids-writable-via-ssh.sh** — SSHs to r630-01 (1000, 1001, 1002) and r630-03 (1003, 1004); stops each validator CT, runs `e2fsck -f -y` on rootfs, starts the CT. Fixes "Read-only file system" / JNA crash loop on validators. Then run `fix-all-validators-and-txpool.sh`. See docs/08-monitoring/RPC_AND_VALIDATOR_TESTING_RUNBOOK.md.
**Sentries 1500-1502 (r630-01)** — If deploy-besu-node-lists or set-all-besu-max-peers-32 reports Skip/fail or "Read-only file system" for 1500-1502, they have the same read-only root issue. On the host: `pct stop 1500; e2fsck -f -y /dev/pve/vm-1500-disk-0; pct start 1500` (repeat for 1501, 1502). Then re-run deploy and max-peers/restart.
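The per-CT procedure above can be sketched as a dry-run loop that only prints the host-side commands (the `/dev/pve/vm-<vmid>-disk-0` LV path follows the 1500 example in the text; adjust if your rootfs LVs are named differently):

```shell
# Dry-run sketch of the sentry read-only fix: print, don't execute.
# LV path pattern follows the pct stop / e2fsck / pct start example above.
fix_cmds=""
for vmid in 1500 1501 1502; do
  fix_cmds="${fix_cmds}pct stop ${vmid}; e2fsck -f -y /dev/pve/vm-${vmid}-disk-0; pct start ${vmid}
"
done
printf '%s' "$fix_cmds"
```

Pipe the output to `bash` on the Proxmox host only after reviewing it.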
@@ -23,10 +23,18 @@
**fix-all-502s-comprehensive.sh** — Starts/serves backends for 10130, 10150/10151, 2101, 2500-2505, Cacti (Python stubs if needed). Use `--dry-run` to print actions without SSH. Does not update NPMplus; use `update-npmplus-proxy-hosts-api.sh` from LAN for that.
**daily-weekly-checks.sh** — Daily (explorer, indexer lag, RPC) and weekly (config API, thin pool, log reminder).
**schedule-daily-weekly-cron.sh** — Install cron: daily 08:00, weekly Sun 09:00.
**schedule-daily-weekly-cron.sh** — Install cron: daily 08:00, weekly Sun 09:00. Run from a persistent host checkout; set `CRON_PROJECT_ROOT=/srv/proxmox` when installing on a Proxmox node.
**ensure-firefly-primary-via-ssh.sh** — SSHs to r630-02 and normalizes `/opt/firefly/docker-compose.yml` on VMID 6200, installs an idempotent helper-backed `firefly.service`, and verifies `/api/v1/status`. It is safe for the current mixed stack where `firefly-core` already exists outside compose while Postgres and IPFS remain compose-managed. Use `--dry-run` to print actions only.
**ensure-fabric-sample-network-via-ssh.sh** — SSHs to r630-02 and ensures VMID 6000 has nested-LXC features, a boot-time `fabric-sample-network.service`, and a queryable `mychannel`. Use `--dry-run` to print actions only.
**ensure-legacy-monitor-networkd-via-ssh.sh** — SSHs to r630-01 and fixes the legacy `3000`-`3003` monitor/RPC-adjacent LXCs so `systemd-networkd` is enabled host-side and started in-guest. This is the safe path for unprivileged guests where `systemctl enable` fails from inside the CT. `--dry-run` / `--apply`; same `PROXMOX_SAFE_DEFAULTS` behavior as other guarded maintenance scripts.
**check-and-fix-explorer-lag.sh** — Checks RPC vs Blockscout block; if lag > threshold (default 500), runs `fix-explorer-indexer-lag.sh` (restart Blockscout).
**schedule-explorer-lag-cron.sh** — Install cron for lag check-and-fix: every 6 hours (0, 6, 12, 18). Log: `logs/explorer-lag-fix.log`. Use `--show` to print the line, `--install` to add to crontab, `--remove` to remove.
**schedule-explorer-lag-cron.sh** — Install cron for lag check-and-fix: every 6 hours (0, 6, 12, 18). Log: `logs/explorer-lag-fix.log`. Use `--show` to print the line, `--install` to add to crontab, `--remove` to remove. Run from a persistent host checkout; set `CRON_PROJECT_ROOT=/srv/proxmox` when installing on a Proxmox node.
**All schedule-*.sh installers** — Refuse transient roots such as `/tmp/...`. Install from a persistent checkout only.
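A minimal sketch of the transient-root refusal these installers share (the variable names and path list are assumptions for illustration, not the scripts' actual code; `/srv/proxmox` is the documented `CRON_PROJECT_ROOT` for Proxmox nodes):

```shell
# Hypothetical sketch of the guard: refuse to install cron entries whose
# project root lives under a transient mount.
PROJECT_ROOT="${CRON_PROJECT_ROOT:-/srv/proxmox}"
case "$PROJECT_ROOT" in
  /tmp/*|/var/tmp/*|/dev/shm/*)
    echo "Refusing transient project root: $PROJECT_ROOT" >&2
    GUARD_RESULT=refused
    ;;
  *)
    GUARD_RESULT=ok
    ;;
esac
echo "guard=$GUARD_RESULT root=$PROJECT_ROOT"
```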
## Optional: Alerting on failures

View File

@@ -5,7 +5,7 @@
# Usage: ./scripts/maintenance/apply-peer-plan-fixes.sh [--deploy-only] [--restart-2101-only]
# --deploy-only Only deploy node lists (no restarts).
# --restart-2101-only Only restart VMID 2101 (assumes lists already deployed).
# Requires: SSH to Proxmox hosts (r630-01, r630-02, ml110). Run from LAN.
# Requires: SSH to Proxmox hosts (r630-01, r630-02, r630-03). Run from LAN.
set -euo pipefail
@@ -32,19 +32,19 @@ if [[ "$RESTART_2101_ONLY" != true ]]; then
fi
if [[ "$DEPLOY_ONLY" == true ]]; then
echo "Done (deploy only). To restart RPC 2101: $PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh --restart-only"
echo "Done (deploy only). To restart RPC 2101: $PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh --restart-only --apply"
exit 0
fi
echo "--- Restart RPC 2101 to load new node lists ---"
"$PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh" --restart-only || { echo "Restart 2101 failed."; exit 1; }
"$PROJECT_ROOT/scripts/maintenance/fix-core-rpc-2101.sh" --restart-only --apply || { echo "Restart 2101 failed."; exit 1; }
echo ""
echo "--- Optional: 2102 and 2201 max-peers=32 ---"
echo "Repo updated: smom-dbis-138/config/config-rpc-public.toml has max-peers=32."
echo "--- Optional: 2102 and 2201 max-peers=40 ---"
echo "Repo and live fleet now use max-peers=40 on the modern RPC tier."
echo "To apply on nodes (from host with SSH):"
echo " - 2102 (ml110): ensure config uses max-peers=32 (e.g. copy from repo config-rpc-core.toml), restart Besu."
echo " - 2201 (r630-02): ensure config uses max-peers=32 (e.g. copy from repo config-rpc-public.toml), restart Besu."
echo " - 2102 (r630-03): ensure config uses max-peers=40 (e.g. copy from repo config-rpc-core.toml), restart Besu."
echo " - 2201 (r630-02): ensure config uses max-peers=40 (e.g. copy from repo config-rpc-public.toml), restart Besu."
echo "Then re-run: ./scripts/verify/check-rpc-2101-all-peers.sh"
echo ""
echo "Done. Verify: ./scripts/verify/verify-rpc-2101-approve-and-sync.sh && ./scripts/verify/check-rpc-2101-all-peers.sh"

View File

@@ -0,0 +1,117 @@
#!/usr/bin/env bash
# Ensure the public Cacti CTs on r630-02 keep both their nginx landing page and
# Docker-backed Hyperledger Cacti API healthy.
#
# Expected runtime:
# - VMID 5201 / 5202: nginx on :80 for the public landing page
# - VMID 5201 / 5202: cacti.service exposing the internal API on :4000
# - Proxmox CT config includes `features: nesting=1,keyctl=1` for Docker-in-LXC
#
# Usage: ./scripts/maintenance/ensure-cacti-web-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_02 (default 192.168.11.12)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_02:-192.168.11.12}"
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
ensure_ct_features() {
local vmid="$1"
local conf="/etc/pve/lxc/${vmid}.conf"
local features
features="$(run_ssh "awk -F': ' '/^features:/{print \$2}' ${conf@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$features" == *"nesting=1"* && "$features" == *"keyctl=1"* ]]; then
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would add features: nesting=1,keyctl=1 to VMID $vmid and restart the CT"
return 0
fi
run_ssh "cp ${conf@Q} /root/${vmid}.conf.pre-codex.\$(date +%Y%m%d_%H%M%S)"
if [[ -n "$features" ]]; then
run_ssh "sed -i 's/^features:.*/features: nesting=1,keyctl=1/' ${conf@Q}"
else
run_ssh "printf '%s\n' 'features: nesting=1,keyctl=1' >> ${conf@Q}"
fi
run_ssh "pct shutdown $vmid --timeout 30 >/dev/null 2>&1 || pct stop $vmid >/dev/null 2>&1 || true"
run_ssh "pct start $vmid"
sleep 8
}
ensure_cacti_surface() {
local vmid="$1"
local ip="$2"
local label="$3"
local status
local local_check
local remote_script
ensure_ct_features "$vmid"
status="$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
log_warn "$label (VMID $vmid) is not running"
return 0
fi
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' http://${ip}/ 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]] && run_ssh "pct exec $vmid -- bash -lc 'curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null 2>&1'" >/dev/null 2>&1; then
log_ok "$label already serves both the landing page and internal Cacti API"
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would restart nginx/docker/cacti.service in VMID $vmid (${label})"
return 0
fi
printf -v remote_script '%s' "$(cat <<'EOF'
set -e
id -nG cacti 2>/dev/null | grep -qw docker || usermod -aG docker cacti || true
systemctl restart docker
systemctl enable --now nginx
systemctl reset-failed cacti.service || true
systemctl enable --now cacti.service
for _ in $(seq 1 20); do
if curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null 2>&1; then
break
fi
sleep 2
done
curl -fsS http://127.0.0.1/ >/dev/null
curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null
EOF
)"
run_ssh "pct exec $vmid -- bash --norc -lc $(printf '%q' "$remote_script")"
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' http://${ip}/ 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]] && run_ssh "pct exec $vmid -- bash -lc 'curl -fsS http://127.0.0.1:4000/api/v1/api-server/healthcheck >/dev/null 2>&1'" >/dev/null 2>&1; then
log_ok "$label restored on ${ip}:80 with a healthy internal Cacti API"
else
log_warn "$label is still only partially healthy on VMID $vmid"
fi
}
echo ""
echo "=== Ensure Cacti surfaces ==="
echo " Host: $PROXMOX_HOST dry-run=$DRY_RUN"
echo ""
ensure_cacti_surface 5201 "192.168.11.177" "Cacti ALLTRA"
ensure_cacti_surface 5202 "192.168.11.251" "Cacti HYBX"
echo ""
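Instead of editing `/etc/pve/lxc/<vmid>.conf` with `sed`, the same features can usually be set with `pct set` (shown here as a printed sketch; confirm the flag against your PVE version before running):

```shell
# Print the pct-native equivalent of the features edit above for both
# Cacti CTs; pct set rewrites the CT config, a restart applies it.
for vmid in 5201 5202; do
  cmd="pct set ${vmid} --features nesting=1,keyctl=1"
  echo "$cmd"
done
```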

View File

@@ -1,9 +1,10 @@
#!/usr/bin/env bash
# Ensure Core RPC nodes 2101 and 2102 have TXPOOL and ADMIN (and DEBUG) in rpc-http-api and rpc-ws-api.
# Ensure Core RPC nodes 2101, 2103 (Thirdweb admin core), and 2102 have TXPOOL and ADMIN (and DEBUG) in rpc-http-api and rpc-ws-api.
# Does NOT add txpool_besuClear/txpool_clear/admin_removeTransaction — Besu does not implement them.
# VMID 2103 uses /etc/besu/config-rpc-core.toml on r630-01; repo canonical: smom-dbis-138/config/config-rpc-thirdweb-admin-core.toml
# See: docs/04-configuration/CORE_RPC_2101_2102_TXPOOL_ADMIN_STATUS.md
#
# Usage: ./scripts/maintenance/ensure-core-rpc-config-2101-2102.sh [--dry-run] [--2101-only] [--2102-only]
# Usage: ./scripts/maintenance/ensure-core-rpc-config-2101-2102.sh [--dry-run] [--2101-only] [--2102-only] [--2103-only]
set -euo pipefail
@@ -17,18 +18,23 @@ RPC_WS_API='["ETH","NET","WEB3","TXPOOL","QBFT","ADMIN"]'
VMID_2101=2101
VMID_2102=2102
VMID_2103=2103
HOST_2101="${PROXMOX_HOST_R630_01:-192.168.11.11}"
HOST_2102="${PROXMOX_HOST_ML110:-192.168.11.10}"
HOST_2103="${PROXMOX_HOST_R630_01:-192.168.11.11}"
CONFIG_2101="/etc/besu/config-rpc-core.toml"
CONFIG_2102="/etc/besu/config-rpc.toml"
CONFIG_2103="/etc/besu/config-rpc-core.toml"
DRY_RUN=false
ONLY_2101=false
ONLY_2102=false
ONLY_2103=false
for a in "$@"; do
[[ "$a" == "--dry-run" ]] && DRY_RUN=true
[[ "$a" == "--2101-only" ]] && ONLY_2101=true
[[ "$a" == "--2102-only" ]] && ONLY_2102=true
[[ "$a" == "--2103-only" ]] && ONLY_2103=true
done
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$1" "$2"; }
@@ -42,7 +48,7 @@ ensure_apis() {
local config_path=$3
log_info "VMID $vmid ($host): ensuring $config_path has TXPOOL, ADMIN, DEBUG..."
if $DRY_RUN; then
echo " Would set rpc-http-api and rpc-ws-api to include TXPOOL, ADMIN, DEBUG, QBFT, TRACE (2101/2102)"
echo " Would set rpc-http-api and rpc-ws-api to include TXPOOL, ADMIN, DEBUG, QBFT, TRACE (2101/2103/2102)"
return 0
fi
# Pass API lists via env so quoting is safe; remote sed updates the config
@@ -63,18 +69,23 @@ ensure_apis() {
}
echo ""
echo "=== Ensure Core RPC 2101 / 2102 — TXPOOL + ADMIN (max Besu supports) ==="
echo " dry-run=$DRY_RUN 2101-only=$ONLY_2101 2102-only=$ONLY_2102"
echo "=== Ensure Core RPC 2101 / 2103 / 2102 — TXPOOL + ADMIN (max Besu supports) ==="
echo " dry-run=$DRY_RUN 2101-only=$ONLY_2101 2102-only=$ONLY_2102 2103-only=$ONLY_2103"
echo " Note: txpool_besuClear, txpool_clear, admin_removeTransaction are NOT in Besu; use clear-all-transaction-pools.sh to clear stuck txs."
echo ""
if [[ "$ONLY_2102" != true ]]; then
if $ONLY_2103; then
ensure_apis "$VMID_2103" "$HOST_2103" "$CONFIG_2103" || true
elif $ONLY_2101; then
ensure_apis "$VMID_2101" "$HOST_2101" "$CONFIG_2101" || true
fi
if [[ "$ONLY_2101" != true ]]; then
elif $ONLY_2102; then
ensure_apis "$VMID_2102" "$HOST_2102" "$CONFIG_2102" || true
else
ensure_apis "$VMID_2101" "$HOST_2101" "$CONFIG_2101" || true
ensure_apis "$VMID_2103" "$HOST_2103" "$CONFIG_2103" || true
ensure_apis "$VMID_2102" "$HOST_2102" "$CONFIG_2102" || true
fi
echo ""
echo "Done. Verify: ./scripts/maintenance/health-check-rpc-2101.sh and curl to 192.168.11.212:8545 for 2102."
echo "Done. Verify: ./scripts/maintenance/health-check-rpc-2101.sh; curl 192.168.11.217:8545 (2103); curl 192.168.11.212:8545 (2102)."
echo "Ref: docs/04-configuration/CORE_RPC_2101_2102_TXPOOL_ADMIN_STATUS.md"
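Beyond the health-check script, the newly enabled namespaces can be probed directly over JSON-RPC. `txpool_besuStatistics` and `admin_peers` are methods Besu does implement; the 2102 address comes from the verify line above. This sketch only prints the probe commands:

```shell
# Build JSON-RPC probe commands for the TXPOOL and ADMIN namespaces on 2102.
rpc_payload() { printf '{"jsonrpc":"2.0","method":"%s","params":[],"id":1}' "$1"; }
for method in txpool_besuStatistics admin_peers; do
  echo "curl -s -X POST -H 'Content-Type: application/json'" \
       "--data '$(rpc_payload "$method")' http://192.168.11.212:8545"
done
```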

View File

@@ -1,6 +1,10 @@
#!/usr/bin/env bash
# Ensure web/API services inside DBIS containers (10130, 10150, 10151) are running.
# Fixes 502 when containers are up but nginx or app inside is stopped.
# Ensure the deployed DBIS frontend/API surfaces on r630-01 stay healthy.
#
# Expected runtime:
# - VMID 10130: nginx serving the built DBIS frontend on port 80
# - VMID 10150: dbis-api.service serving the primary DBIS API on port 3000
# - VMID 10151: dbis-api.service serving the secondary DBIS API on port 3000
#
# Usage: ./scripts/maintenance/ensure-dbis-services-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_01 (default 192.168.11.11)
@@ -21,20 +25,83 @@ log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
build_url() {
local ip="$1"
local port="$2"
local path="$3"
if [[ "$port" == "80" ]]; then
printf 'http://%s%s' "$ip" "$path"
else
printf 'http://%s:%s%s' "$ip" "$port" "$path"
fi
}
ensure_service_surface() {
local vmid="$1"
local ip="$2"
local port="$3"
local path="$4"
local service="$5"
local label="$6"
local status
local local_url
local remote_url
local local_check
local remote_script
status="$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
log_warn "$label (VMID $vmid) is not running"
return 0
fi
local_url="$(build_url "$ip" "$port" "$path")"
remote_url="$(build_url "127.0.0.1" "$port" "$path")"
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' ${local_url@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]]; then
log_ok "$label already responds at $local_url"
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would restart $service on VMID $vmid and recheck $local_url"
return 0
fi
printf -v remote_script '%s' "$(cat <<EOF
set -e
systemctl reset-failed ${service} >/dev/null 2>&1 || true
systemctl restart ${service}
for _ in \$(seq 1 15); do
if curl -fsS ${remote_url@Q} >/dev/null 2>&1; then
exit 0
fi
sleep 2
done
curl -fsS ${remote_url@Q} >/dev/null
EOF
)"
if ! run_ssh "pct exec $vmid -- bash --norc -lc $(printf '%q' "$remote_script")"; then
log_warn "$label restart path failed on VMID $vmid"
fi
local_check="$(run_ssh "timeout 5 curl -sS -o /dev/null -w '%{http_code}' ${local_url@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$local_check" == "200" ]]; then
log_ok "$label restored at $local_url"
else
log_warn "$label still not healthy at $local_url (curl=${local_check:-000})"
fi
}
echo ""
echo "=== Ensure DBIS container services (fix 502) ==="
echo "=== Ensure DBIS deployed services ==="
echo " Host: $PROXMOX_HOST dry-run=$DRY_RUN"
echo ""
for vmid in 10130 10150 10151; do
if [[ "$DRY_RUN" == true ]]; then
log_info "Would ensure nginx/node in VMID $vmid"
continue
fi
status=$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")
[[ "$status" != "running" ]] && { log_warn "VMID $vmid not running"; continue; }
run_ssh "pct exec $vmid -- systemctl start nginx 2>/dev/null" || true
run_ssh "pct exec $vmid -- systemctl start node 2>/dev/null" || true
log_ok "VMID $vmid services started"
done
ensure_service_surface 10130 "192.168.11.130" "80" "/" "nginx" "DBIS frontend"
ensure_service_surface 10150 "192.168.11.155" "3000" "/v1/health" "dbis-api.service" "DBIS API primary"
ensure_service_surface 10151 "192.168.11.156" "3000" "/v1/health" "dbis-api.service" "DBIS API secondary"
echo ""
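The `build_url` helper above elides the default HTTP port so landing-page and API checks share one code path; a standalone copy shows the behavior:

```shell
# Standalone copy of build_url from the script above: port 80 is dropped
# from the URL, any other port is kept.
build_url() {
  local ip="$1" port="$2" path="$3"
  if [[ "$port" == "80" ]]; then
    printf 'http://%s%s' "$ip" "$path"
  else
    printf 'http://%s:%s%s' "$ip" "$port" "$path"
  fi
}
echo "$(build_url 192.168.11.130 80 /)"            # http://192.168.11.130/
echo "$(build_url 192.168.11.155 3000 /v1/health)" # http://192.168.11.155:3000/v1/health
```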

View File

@@ -0,0 +1,189 @@
#!/usr/bin/env bash
# Ensure the Hyperledger Fabric sample network on VMID 6000 is up, queryable,
# and boot-recoverable after container restarts.
#
# Expected runtime:
# - VMID 6000 running on r630-02
# - docker + nested LXC features enabled
# - fabric-samples test-network payload under /opt/fabric/fabric-samples/test-network
# - orderer.example.com, peer0.org1.example.com, peer0.org2.example.com running
# - peer channel getinfo -c mychannel succeeds for Org1
#
# Usage: ./scripts/maintenance/ensure-fabric-sample-network-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_02 (default 192.168.11.12)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_02:-192.168.11.12}"
VMID=6000
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
log_err() { echo -e "\033[0;31m[ERR]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
run_scp() { scp -o ConnectTimeout=10 -o StrictHostKeyChecking=no "$@"; }
ensure_ct_features() {
local conf="/etc/pve/lxc/${VMID}.conf"
local features
features="$(run_ssh "awk -F': ' '/^features:/{print \$2}' ${conf@Q} 2>/dev/null || true" | tr -d '\r\n')"
if [[ "$features" == *"nesting=1"* && "$features" == *"keyctl=1"* ]]; then
return 0
fi
if [[ "$DRY_RUN" == true ]]; then
log_info "Would add features: nesting=1,keyctl=1 to VMID $VMID and restart the CT"
return 0
fi
run_ssh "cp ${conf@Q} /root/${VMID}.conf.pre-codex.\$(date +%Y%m%d_%H%M%S)"
if [[ -n "$features" ]]; then
run_ssh "sed -i 's/^features:.*/features: nesting=1,keyctl=1/' ${conf@Q}"
else
run_ssh "printf '%s\n' 'features: nesting=1,keyctl=1' >> ${conf@Q}"
fi
run_ssh "pct shutdown $VMID --timeout 30 >/dev/null 2>&1 || pct stop $VMID >/dev/null 2>&1 || true"
run_ssh "pct start $VMID"
sleep 8
}
ensure_boot_service() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would install and enable fabric-sample-network.service in VMID $VMID"
return 0
fi
local helper_tmp unit_tmp
helper_tmp="$(mktemp)"
unit_tmp="$(mktemp)"
cat > "$helper_tmp" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
cd /opt/fabric/fabric-samples/test-network
verify() {
docker ps --format '{{.Names}}' | grep -qx orderer.example.com
docker ps --format '{{.Names}}' | grep -qx peer0.org1.example.com
docker ps --format '{{.Names}}' | grep -qx peer0.org2.example.com
export PATH=/opt/fabric/fabric-samples/bin:$PATH
export FABRIC_CFG_PATH=/opt/fabric/fabric-samples/config
export $(./setOrgEnv.sh Org1 | xargs)
peer channel getinfo -c mychannel >/tmp/fabric-channel-info.txt
}
if verify 2>/dev/null; then
exit 0
fi
./network.sh up >/tmp/fabric-network-up.log 2>&1 || true
verify
EOF
cat > "$unit_tmp" <<'EOF'
[Unit]
Description=Ensure Hyperledger Fabric sample network
After=docker.service network-online.target
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/fabric/fabric-samples/test-network
ExecStart=/usr/local/bin/ensure-fabric-sample-network
[Install]
WantedBy=multi-user.target
EOF
run_scp "$helper_tmp" "root@$PROXMOX_HOST:/tmp/ensure-fabric-sample-network"
run_scp "$unit_tmp" "root@$PROXMOX_HOST:/tmp/fabric-sample-network.service"
run_ssh "pct exec $VMID -- rm -f /usr/local/bin/ensure-fabric-sample-network /etc/systemd/system/fabric-sample-network.service"
run_ssh "pct push $VMID /tmp/ensure-fabric-sample-network /usr/local/bin/ensure-fabric-sample-network --perms 755"
run_ssh "pct push $VMID /tmp/fabric-sample-network.service /etc/systemd/system/fabric-sample-network.service --perms 644"
run_ssh "rm -f /tmp/ensure-fabric-sample-network /tmp/fabric-sample-network.service"
rm -f "$helper_tmp" "$unit_tmp"
run_ssh "pct exec $VMID -- bash -lc 'systemctl daemon-reload && systemctl enable fabric-sample-network.service >/dev/null 2>&1 && systemctl start fabric-sample-network.service'"
}
verify_fabric_sample_network() {
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
cd /opt/fabric/fabric-samples/test-network
echo service=\$(systemctl is-active fabric-sample-network.service 2>/dev/null || echo unknown)
docker ps --format \"{{.Names}}\" | grep -qx orderer.example.com
docker ps --format \"{{.Names}}\" | grep -qx peer0.org1.example.com
docker ps --format \"{{.Names}}\" | grep -qx peer0.org2.example.com
export PATH=/opt/fabric/fabric-samples/bin:\$PATH
export FABRIC_CFG_PATH=/opt/fabric/fabric-samples/config
export \$(./setOrgEnv.sh Org1 | xargs)
peer channel getinfo -c mychannel >/tmp/fabric-channel-info.txt
cat /tmp/fabric-channel-info.txt
'" 2>/dev/null
}
restore_fabric_sample_network() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would run ./network.sh up inside VMID $VMID and then verify mychannel"
return 0
fi
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
cd /opt/fabric/fabric-samples/test-network
./network.sh up >/tmp/fabric-network-up.log 2>&1 || true
cat /tmp/fabric-network-up.log
'"
}
echo ""
echo "=== Ensure Fabric sample network ==="
echo " Host: $PROXMOX_HOST vmid=$VMID dry-run=$DRY_RUN"
echo ""
ensure_ct_features
status="$(run_ssh "pct status $VMID 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
if [[ "$DRY_RUN" == true ]]; then
log_info "Would start VMID $VMID"
else
run_ssh "pct start $VMID"
sleep 8
fi
fi
ensure_boot_service
if [[ "$DRY_RUN" == true ]]; then
log_info "Would verify running orderer/peer containers and peer channel getinfo -c mychannel"
exit 0
fi
if fabric_info="$(verify_fabric_sample_network)"; then
log_ok "Fabric sample network already healthy"
printf '%s\n' "$fabric_info"
exit 0
fi
log_warn "Fabric sample network not fully healthy; attempting restore"
restore_fabric_sample_network
if fabric_info="$(verify_fabric_sample_network)"; then
log_ok "Fabric sample network restored"
printf '%s\n' "$fabric_info"
else
log_err "Fabric sample network is still not healthy after restore attempt"
exit 1
fi
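The `export $(./setOrgEnv.sh Org1 | xargs)` idiom used in the verify and helper paths flattens KEY=VALUE lines into a single export. A demo with a hypothetical stand-in emitter (values mirror the usual fabric-samples Org1 defaults); note the idiom breaks if any value contains whitespace:

```shell
# emit_env stands in for setOrgEnv.sh Org1 (hypothetical emitter; values
# are the common fabric-samples Org1 defaults). xargs joins the lines and
# the unquoted expansion lets export see one KEY=VALUE word each.
emit_env() {
  printf 'CORE_PEER_LOCALMSPID=Org1MSP\n'
  printf 'CORE_PEER_ADDRESS=localhost:7051\n'
}
export $(emit_env | xargs)
echo "msp=$CORE_PEER_LOCALMSPID peer=$CORE_PEER_ADDRESS"
```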

View File

@@ -0,0 +1,186 @@
#!/usr/bin/env bash
# Ensure the Hyperledger FireFly primary on VMID 6200 has a valid compose file
# and an active systemd unit.
#
# Expected runtime:
# - VMID 6200 running on r630-02
# - /opt/firefly/docker-compose.yml present
# - firefly.service enabled and active
# - firefly-core, firefly-postgres, firefly-ipfs using restart=unless-stopped
# - GET /api/v1/status succeeds on localhost:5000
#
# Usage: ./scripts/maintenance/ensure-firefly-primary-via-ssh.sh [--dry-run]
# Env: PROXMOX_HOST_R630_02 (default 192.168.11.12)
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_02:-192.168.11.12}"
VMID=6200
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_err() { echo -e "\033[0;31m[ERR]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
normalize_compose() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would normalize /opt/firefly/docker-compose.yml and validate it with docker-compose config -q"
return 0
fi
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
test -f /opt/firefly/docker-compose.yml
if grep -qE \"^version:[[:space:]]*3\\.8[[:space:]]*$\" /opt/firefly/docker-compose.yml; then
sed -i \"s/^version:[[:space:]]*3\\.8[[:space:]]*$/version: \\\"3.8\\\"/\" /opt/firefly/docker-compose.yml
fi
docker-compose -f /opt/firefly/docker-compose.yml config -q
'"
}
install_firefly_helper() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would install an idempotent FireFly helper and systemd unit in VMID $VMID"
return 0
fi
local helper_tmp unit_tmp
helper_tmp="$(mktemp)"
unit_tmp="$(mktemp)"
cat > "$helper_tmp" <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
COMPOSE_FILE=/opt/firefly/docker-compose.yml
STATUS_URL=http://127.0.0.1:5000/api/v1/status
start_stack() {
cd /opt/firefly
test -f "$COMPOSE_FILE"
if grep -qE '^version:[[:space:]]*3\.8[[:space:]]*$' "$COMPOSE_FILE"; then
sed -i 's/^version:[[:space:]]*3\.8[[:space:]]*$/version: "3.8"/' "$COMPOSE_FILE"
fi
docker-compose -f "$COMPOSE_FILE" config -q
docker-compose -f "$COMPOSE_FILE" up -d postgres ipfs >/dev/null
if docker ps -a --format '{{.Names}}' | grep -qx firefly-core; then
docker start firefly-core >/dev/null 2>&1 || true
else
docker-compose -f "$COMPOSE_FILE" up -d firefly-core >/dev/null
fi
curl -fsS "$STATUS_URL" >/dev/null
}
stop_stack() {
docker stop firefly-core firefly-postgres firefly-ipfs >/dev/null 2>&1 || true
}
case "${1:-start}" in
start)
start_stack
;;
stop)
stop_stack
;;
*)
echo "Usage: $0 [start|stop]" >&2
exit 64
;;
esac
EOF
cat > "$unit_tmp" <<'EOF'
[Unit]
Description=Ensure Hyperledger FireFly primary stack
After=docker.service network-online.target
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/firefly
User=firefly
Group=firefly
ExecStart=/usr/local/bin/ensure-firefly-primary start
ExecStop=/usr/local/bin/ensure-firefly-primary stop
[Install]
WantedBy=multi-user.target
EOF
scp -o ConnectTimeout=10 -o StrictHostKeyChecking=no "$helper_tmp" "root@$PROXMOX_HOST:/tmp/ensure-firefly-primary"
scp -o ConnectTimeout=10 -o StrictHostKeyChecking=no "$unit_tmp" "root@$PROXMOX_HOST:/tmp/firefly.service"
run_ssh "pct exec $VMID -- rm -f /usr/local/bin/ensure-firefly-primary /etc/systemd/system/firefly.service"
run_ssh "pct push $VMID /tmp/ensure-firefly-primary /usr/local/bin/ensure-firefly-primary --perms 755"
run_ssh "pct push $VMID /tmp/firefly.service /etc/systemd/system/firefly.service --perms 644"
run_ssh "rm -f /tmp/ensure-firefly-primary /tmp/firefly.service"
rm -f "$helper_tmp" "$unit_tmp"
}
ensure_firefly_service() {
if [[ "$DRY_RUN" == true ]]; then
log_info "Would reset-failed and enable/start firefly.service in VMID $VMID"
return 0
fi
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
systemctl daemon-reload
systemctl reset-failed firefly.service || true
systemctl enable firefly.service >/dev/null 2>&1
systemctl start firefly.service
'"
}
verify_firefly_primary() {
run_ssh "pct exec $VMID -- bash -lc '
set -euo pipefail
echo service=\$(systemctl is-active firefly.service)
docker inspect -f \"{{.HostConfig.RestartPolicy.Name}}\" firefly-core | grep -qx unless-stopped
docker inspect -f \"{{.HostConfig.RestartPolicy.Name}}\" firefly-postgres | grep -qx unless-stopped
docker inspect -f \"{{.HostConfig.RestartPolicy.Name}}\" firefly-ipfs | grep -qx unless-stopped
curl -fsS http://127.0.0.1:5000/api/v1/status
'" 2>/dev/null
}
echo ""
echo "=== Ensure FireFly primary ==="
echo " Host: $PROXMOX_HOST vmid=$VMID dry-run=$DRY_RUN"
echo ""
status="$(run_ssh "pct status $VMID 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")"
if [[ "$status" != "running" ]]; then
if [[ "$DRY_RUN" == true ]]; then
log_info "Would start VMID $VMID"
else
run_ssh "pct start $VMID"
sleep 8
fi
fi
normalize_compose
install_firefly_helper
if [[ "$DRY_RUN" == true ]]; then
log_info "Would enable/start firefly.service and verify API health"
exit 0
fi
ensure_firefly_service
if firefly_info="$(verify_firefly_primary)"; then
log_ok "FireFly primary healthy"
printf '%s\n' "$firefly_info"
else
log_err "FireFly primary is still not healthy after normalization"
exit 1
fi
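The `normalize_compose` step exists because YAML parses a bare `version: 3.8` as a float while docker-compose expects a string; the same sed fix can be exercised safely on a scratch copy:

```shell
# Reproduce the compose normalization above against a throwaway file
# instead of /opt/firefly/docker-compose.yml.
tmp="$(mktemp)"
printf 'version: 3.8\nservices: {}\n' > "$tmp"
sed -i 's/^version:[[:space:]]*3\.8[[:space:]]*$/version: "3.8"/' "$tmp"
fixed_line="$(head -n1 "$tmp")"
echo "$fixed_line"
rm -f "$tmp"
```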

View File

@@ -0,0 +1,91 @@
#!/usr/bin/env bash
# Ensure the legacy 3000-3003 monitor/RPC-adjacent LXCs have working static
# networking and a boot-time systemd-networkd enablement, even though the
# unprivileged guests cannot write their own multi-user.target.wants entries.
#
# Usage: ./scripts/maintenance/ensure-legacy-monitor-networkd-via-ssh.sh [--dry-run] [--apply]
# Env: PROXMOX_HOST_R630_01 (default 192.168.11.11)
# PROXMOX_SAFE_DEFAULTS=1 — default dry-run unless --apply or PROXMOX_OPS_APPLY=1
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
# shellcheck source=../lib/proxmox-production-guard.sh
source "${PROJECT_ROOT}/scripts/lib/proxmox-production-guard.sh"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
APPLY=false
EXPLICIT_DRY=false
for _arg in "$@"; do
case "$_arg" in
--dry-run) DRY_RUN=true; EXPLICIT_DRY=true ;;
--apply) APPLY=true; DRY_RUN=false ;;
esac
done
if [[ "$EXPLICIT_DRY" != true ]] && pguard_mutations_allowed; then
APPLY=true
DRY_RUN=false
fi
if [[ "$EXPLICIT_DRY" != true ]] && pguard_safe_defaults_enabled && [[ "$APPLY" != true ]]; then
DRY_RUN=true
fi
PROXMOX_HOST="${PROXMOX_HOST_R630_01:-192.168.11.11}"
VMIDS=(3000 3001 3002 3003)
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_err() { echo -e "\033[0;31m[ERR]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
enable_hostside() {
local vmid="$1"
run_ssh "pct unmount $vmid >/dev/null 2>&1 || true"
  # Chain with && so an empty mountpoint match never writes into the host's own /etc.
  run_ssh "mp=\$(pct mount $vmid | sed -n \"s/^mounted CT [0-9]\\+ in '\\''\\(.*\\)'\\''$/\\1/p\") && test -n \"\$mp\" && mkdir -p \"\$mp/etc/systemd/system/multi-user.target.wants\" && ln -sf /lib/systemd/system/systemd-networkd.service \"\$mp/etc/systemd/system/multi-user.target.wants/systemd-networkd.service\"; pct unmount $vmid"
}
start_and_verify() {
local vmid="$1"
run_ssh "pct exec $vmid -- systemctl start systemd-networkd"
run_ssh "pct exec $vmid -- sh -c 'printf \"%s\\n\" \"active=\$(systemctl is-active systemd-networkd)\"; printf \"%s\\n\" \"enabled=\$(systemctl is-enabled systemd-networkd 2>/dev/null || true)\"; hostname -I 2>/dev/null'"
}
echo ""
echo "=== Ensure legacy monitor networking ==="
echo " Host: $PROXMOX_HOST vmids=${VMIDS[*]} dry-run=$DRY_RUN apply=$APPLY"
echo ""
for vmid in "${VMIDS[@]}"; do
pguard_vmid_allowed "$vmid" || exit 2
done
if [[ "$DRY_RUN" == true ]]; then
for vmid in "${VMIDS[@]}"; do
log_info "Would mount CT $vmid, create host-side systemd-networkd enablement symlink, start systemd-networkd, and verify hostname -I"
done
exit 0
fi
for vmid in "${VMIDS[@]}"; do
  # awk exits 0 even when pct fails, so test for empty output rather than exit status.
  status="$(run_ssh "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || true)"
  if [[ -z "$status" ]]; then
    log_err "VMID $vmid is missing or unreachable"
    exit 1
  fi
if [[ "$status" != "running" ]]; then
run_ssh "pct start $vmid"
sleep 4
fi
enable_hostside "$vmid"
monitor_info="$(start_and_verify "$vmid")"
if grep -q '^active=active$' <<<"$monitor_info" && grep -q '^enabled=enabled$' <<<"$monitor_info" && grep -Eq '^192\.168\.11\.[0-9]+' <<<"$monitor_info"; then
log_ok "VMID $vmid networking healthy"
printf '%s\n' "$monitor_info"
else
log_err "VMID $vmid networking verification failed"
printf '%s\n' "$monitor_info"
exit 1
fi
done
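The flag/env precedence repeated by these guarded scripts (an explicit --dry-run always wins; otherwise PROXMOX_OPS_APPLY=1 forces apply; otherwise PROXMOX_SAFE_DEFAULTS=1 without --apply falls back to dry-run) can be condensed into one helper. A minimal sketch: resolve_mode is a hypothetical name, and the env checks stand in for pguard_mutations_allowed / pguard_safe_defaults_enabled from scripts/lib/proxmox-production-guard.sh, which is not shown here:

```shell
# Condensed dry-run/apply precedence, matching the guarded scripts:
#   1. explicit --dry-run always wins;
#   2. otherwise PROXMOX_OPS_APPLY=1 forces apply;
#   3. otherwise PROXMOX_SAFE_DEFAULTS=1 without --apply falls back to dry-run.
resolve_mode() {
  local dry=false apply=false explicit_dry=false a
  for a in "$@"; do
    case "$a" in
      --dry-run) dry=true; explicit_dry=true ;;
      --apply)   apply=true; dry=false ;;
    esac
  done
  if [[ "$explicit_dry" != true && "${PROXMOX_OPS_APPLY:-0}" == "1" ]]; then
    apply=true; dry=false
  fi
  if [[ "$explicit_dry" != true && "${PROXMOX_SAFE_DEFAULTS:-0}" == "1" && "$apply" != true ]]; then
    dry=true
  fi
  if [[ "$dry" == true ]]; then echo dry-run; else echo apply; fi
}
```

With no flags and no env, the mode is apply, which matches the pre-guard behavior of these scripts.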


@@ -67,7 +67,7 @@ log "2101 (rpc-http-prv): ensure nodekey and fix Besu..."
if run "$R630_01" "pct status 2101 2>/dev/null | awk '{print \$2}'" 2>/dev/null | grep -q running; then
run "$R630_01" "pct exec 2101 -- sh -c 'mkdir -p /data/besu; [ -f /data/besu/nodekey ] || [ -f /data/besu/key ] || openssl rand -hex 32 > /data/besu/nodekey'" 2>/dev/null || true
fi
if $DRY_RUN; then log "Would run fix-core-rpc-2101.sh"; else "${SCRIPT_DIR}/fix-core-rpc-2101.sh" 2>/dev/null && ok "2101 fix run" || warn "2101 fix had issues"; fi
if $DRY_RUN; then log "Would run fix-core-rpc-2101.sh"; else "${SCRIPT_DIR}/fix-core-rpc-2101.sh" --apply 2>/dev/null && ok "2101 fix run" || warn "2101 fix had issues"; fi
# --- 2500-2505 Alltra/HYBX RPC: ensure nodekey then start besu ---
for v in 2500 2501 2502 2503 2504 2505; do


@@ -2,9 +2,12 @@
# Fix Core Besu RPC on VMID 2101 (Chain 138 admin/deploy — RPC_URL_138).
# Starts container if stopped, starts/restarts Besu service, verifies RPC.
#
# Usage: ./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--restart-only]
# --dry-run Print actions only; do not run.
# Usage: ./scripts/maintenance/fix-core-rpc-2101.sh [--dry-run] [--apply] [--restart-only]
# --dry-run Print actions only; do not run.
# --apply Perform mutations (required when PROXMOX_SAFE_DEFAULTS=1 is set).
# --restart-only Skip pct start; only restart Besu service inside CT.
# Env: PROXMOX_SAFE_DEFAULTS=1 — default to dry-run unless --apply or PROXMOX_OPS_APPLY=1.
# PROXMOX_OPS_ALLOWED_VMIDS — optional allowlist (e.g. only 2101 for this script).
# Requires: SSH to r630-01 (key-based). Run from LAN or VPN.
#
# See: docs/00-meta/502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md (rpc-http-prv)
@@ -14,7 +17,9 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
# shellcheck source=../lib/proxmox-production-guard.sh
source "${PROJECT_ROOT}/scripts/lib/proxmox-production-guard.sh"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
VMID=2101
HOST="${PROXMOX_HOST_R630_01:-${PROXMOX_R630_01:-192.168.11.11}}"
@@ -22,8 +27,27 @@ RPC_IP="${RPC_CORE_1:-192.168.11.211}"
RPC_PORT=8545
DRY_RUN=false
APPLY=false
EXPLICIT_DRY=false
RESTART_ONLY=false
for a in "$@"; do [[ "$a" == "--dry-run" ]] && DRY_RUN=true; [[ "$a" == "--restart-only" ]] && RESTART_ONLY=true; done
for a in "$@"; do
case "$a" in
--dry-run) DRY_RUN=true; EXPLICIT_DRY=true ;;
--apply) APPLY=true; DRY_RUN=false ;;
--restart-only) RESTART_ONLY=true ;;
esac
done
# PROXMOX_OPS_APPLY=1 acts like --apply unless operator explicitly passed --dry-run
if [[ "$EXPLICIT_DRY" != true ]] && pguard_mutations_allowed; then
APPLY=true
DRY_RUN=false
fi
if [[ "$EXPLICIT_DRY" != true ]] && pguard_safe_defaults_enabled && [[ "$APPLY" != true ]]; then
DRY_RUN=true
fi
if ! pguard_vmid_allowed "$VMID"; then
exit 2
fi
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
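pguard_vmid_allowed is sourced from scripts/lib/proxmox-production-guard.sh, which is not part of this diff; a hypothetical stand-in showing the assumed PROXMOX_OPS_ALLOWED_VMIDS semantics (an empty allowlist permits every VMID, otherwise the VMID must appear in the space-separated list):

```shell
# Hypothetical stand-in for pguard_vmid_allowed: empty allowlist means
# no restriction; otherwise the VMID must be listed explicitly.
vmid_allowed() {
  local vmid="$1" allowed="${PROXMOX_OPS_ALLOWED_VMIDS:-}"
  [[ -z "$allowed" ]] && return 0
  local v
  for v in $allowed; do
    [[ "$v" == "$vmid" ]] && return 0
  done
  return 1
}

# Callers gate mutations up front, e.g.: vmid_allowed 2101 || exit 2
```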


@@ -0,0 +1,118 @@
#!/usr/bin/env bash
# Repair keycloak.sankofa.nexus after duplicate-IP / stale-neighbor regressions.
# Current durable path is the direct upstream:
# keycloak.sankofa.nexus -> 192.168.11.52:8080
#
# This script:
# 1. Removes the stray 192.168.11.52 alias from CT 10232 if present
# 2. Removes the guest-side reboot job that reintroduces the bad alias
# 3. Flushes stale neighbor state in the primary NPMplus CT
# 4. Forces NPMplus proxy host 60 back to 192.168.11.52:8080
# 5. Disables temporary relay services if they exist
#
# Usage: ./scripts/maintenance/fix-keycloak-relay-via-ssh.sh [--dry-run]
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
PROXMOX_HOST="${PROXMOX_HOST_R630_01:-192.168.11.11}"
KEYCLOAK_IP="${IP_KEYCLOAK:-192.168.11.52}"
NPM_CID="${NPMPLUS_PRIMARY_VMID:-10233}"
CONFLICT_CID="${KEYCLOAK_CONFLICT_VMID:-10232}"
PROXY_HOST_ID="${KEYCLOAK_NPM_PROXY_HOST_ID:-60}"
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
run_ssh() { ssh -o ConnectTimeout=10 -o StrictHostKeyChecking=no root@"$PROXMOX_HOST" "$@"; }
if [[ "$DRY_RUN" == true ]]; then
echo ""
echo "=== Fix Keycloak direct routing via SSH ==="
echo " Host: $PROXMOX_HOST dry-run=true"
echo ""
log_info "Would remove stray ${KEYCLOAK_IP}/24 from CT ${CONFLICT_CID}"
log_info "Would remove CT ${CONFLICT_CID} reboot hooks that re-add ${KEYCLOAK_IP}/24"
log_info "Would flush neighbor cache for ${KEYCLOAK_IP} in NPMplus CT ${NPM_CID}"
log_info "Would update NPMplus proxy host ${PROXY_HOST_ID} to ${KEYCLOAK_IP}:8080"
log_info "Would disable temporary keycloak relay services if present"
echo ""
exit 0
fi
echo ""
echo "=== Fix Keycloak direct routing via SSH ==="
echo " Host: $PROXMOX_HOST dry-run=false"
echo ""
log_info "Removing any stray ${KEYCLOAK_IP}/24 alias from CT ${CONFLICT_CID}"
run_ssh "pct exec ${CONFLICT_CID} -- bash --norc -c '
ip addr del ${KEYCLOAK_IP}/24 dev eth0 2>/dev/null || true
ip -br addr
'"
log_ok "Conflict CT ${CONFLICT_CID} no longer carries ${KEYCLOAK_IP}"
log_info "Removing guest-side reboot hooks that reintroduce ${KEYCLOAK_IP}/24 in CT ${CONFLICT_CID}"
run_ssh "pct exec ${CONFLICT_CID} -- bash --norc -c '
set -e
CRON_TMP=\$(mktemp)
if crontab -l >/tmp/keycloak-crontab.current 2>/dev/null; then
grep -vF \"/usr/local/bin/configure-network.sh\" /tmp/keycloak-crontab.current >\"\$CRON_TMP\" || true
crontab \"\$CRON_TMP\"
else
: >\"\$CRON_TMP\"
fi
rm -f /tmp/keycloak-crontab.current \"\$CRON_TMP\"
if [[ -f /usr/local/bin/configure-network.sh ]]; then
cp /usr/local/bin/configure-network.sh /usr/local/bin/configure-network.sh.bak.\$(date +%Y%m%d%H%M%S)
cat > /usr/local/bin/configure-network.sh <<\"EOF\"
#!/bin/bash
set -euo pipefail
ip link set eth0 up 2>/dev/null || true
ip addr del ${KEYCLOAK_IP}/24 dev eth0 2>/dev/null || true
ip addr flush dev eth0 scope global 2>/dev/null || true
ip addr add 192.168.11.56/24 dev eth0
ip route replace default via 192.168.11.11 dev eth0
EOF
chmod 0755 /usr/local/bin/configure-network.sh
fi
ip addr del ${KEYCLOAK_IP}/24 dev eth0 2>/dev/null || true
ip route del default via 192.168.11.1 dev eth0 2>/dev/null || true
ip route replace default via 192.168.11.11 dev eth0
ip -br addr show dev eth0
ip route show default
crontab -l 2>/dev/null || true
'"
log_ok "Conflict CT ${CONFLICT_CID} no longer re-adds ${KEYCLOAK_IP} on reboot"
log_info "Disabling temporary relay services"
run_ssh "bash --norc -c '
systemctl disable --now keycloak-host-relay.service 2>/dev/null || true
pct exec 7802 -- systemctl disable --now keycloak-ct-relay.service 2>/dev/null || true
pct exec 7804 -- pkill -f /tmp/keycloak_gov_relay.py 2>/dev/null || true
'"
log_ok "Temporary relays disabled"
log_info "Flushing neighbor state for ${KEYCLOAK_IP} in NPMplus CT ${NPM_CID}"
run_ssh "pct exec ${NPM_CID} -- bash --norc -c '
ip neigh del ${KEYCLOAK_IP} dev eth0 2>/dev/null || true
curl -s -o /dev/null -w \"%{http_code} %{redirect_url}\n\" -H \"Host: keycloak.sankofa.nexus\" http://${KEYCLOAK_IP}:8080/
'"
log_ok "Direct Keycloak upstream responds from NPMplus CT ${NPM_CID}"
log_info "Re-applying canonical NPMplus proxy host mapping for Keycloak"
bash "${PROJECT_ROOT}/scripts/nginx-proxy-manager/update-npmplus-proxy-hosts-api.sh" >/tmp/keycloak-npmplus-sync.log 2>&1
run_ssh "pct exec ${NPM_CID} -- bash --norc -c '
curl -k -I -s -H \"Host: keycloak.sankofa.nexus\" https://127.0.0.1 | sed -n \"1,10p\"
'"
log_ok "NPMplus proxy host ${PROXY_HOST_ID} restored to direct upstream"
echo ""


@@ -2,8 +2,9 @@
# Make Besu CT rootfs writable by running e2fsck on their root LV (fixes read-only / emergency_ro after ext4 errors).
# SSHs to the Proxmox host (r630-01), stops each CT, runs e2fsck -f -y on the LV, starts the CT.
#
# Usage: ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh [--dry-run]
# Usage: ./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh [--dry-run] [--apply]
# Optional: BESU_WRITABLE_VMIDS="1500 1501 1502" to add sentries or other CTs (default: Core RPC 2101 only).
# Env: PROXMOX_SAFE_DEFAULTS=1 — default dry-run unless --apply or PROXMOX_OPS_APPLY=1. PROXMOX_OPS_ALLOWED_VMIDS optional.
# Run from project root. Requires: SSH to r630-01 (root, key-based).
# See: docs/00-meta/502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md §Read-only CT
@@ -11,7 +12,9 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
# shellcheck source=../lib/proxmox-production-guard.sh
source "${PROJECT_ROOT}/scripts/lib/proxmox-production-guard.sh"
[[ -f "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" ]] && source "${PROJECT_ROOT}/scripts/lib/load-project-env.sh" 2>/dev/null || true
HOST="${PROXMOX_HOST_R630_01:-192.168.11.11}"
# Default: Core RPC on r630-01 (2101). 2500-2505 removed — destroyed; see ALL_VMIDS_ENDPOINTS.md.
@@ -24,7 +27,21 @@ fi
SSH_OPTS="-o ConnectTimeout=20 -o ServerAliveInterval=15 -o StrictHostKeyChecking=accept-new"
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
APPLY=false
EXPLICIT_DRY=false
for _arg in "$@"; do
case "$_arg" in
--dry-run) DRY_RUN=true; EXPLICIT_DRY=true ;;
--apply) APPLY=true; DRY_RUN=false ;;
esac
done
if [[ "$EXPLICIT_DRY" != true ]] && pguard_mutations_allowed; then
APPLY=true
DRY_RUN=false
fi
if [[ "$EXPLICIT_DRY" != true ]] && pguard_safe_defaults_enabled && [[ "$APPLY" != true ]]; then
DRY_RUN=true
fi
log_info() { echo -e "\033[0;34m[INFO]\033[0m $1"; }
log_ok() { echo -e "\033[0;32m[✓]\033[0m $1"; }
@@ -32,7 +49,7 @@ log_warn() { echo -e "\033[0;33m[⚠]\033[0m $1"; }
echo ""
echo "=== Make RPC VMIDs writable via Proxmox SSH ==="
echo " Host: $HOST VMIDs: ${RPC_VMIDS[*]} dry-run=$DRY_RUN"
echo " Host: $HOST VMIDs: ${RPC_VMIDS[*]} dry-run=$DRY_RUN apply=$APPLY"
echo ""
if ! ssh $SSH_OPTS "root@$HOST" "echo OK" 2>/dev/null; then
@@ -46,6 +63,10 @@ if $DRY_RUN; then
exit 0
fi
for vmid in "${RPC_VMIDS[@]}"; do
pguard_vmid_allowed "$vmid" || exit 2
done
for vmid in "${RPC_VMIDS[@]}"; do
log_info "VMID $vmid: stop, e2fsck, start..."
status=$(ssh $SSH_OPTS "root@$HOST" "pct status $vmid 2>/dev/null | awk '{print \$2}'" 2>/dev/null || echo "missing")


@@ -1,10 +1,10 @@
#!/usr/bin/env bash
# Make validator VMIDs (1000-1004) writable by running e2fsck on their rootfs.
# Fixes "Read-only file system" / JNA UnsatisfiedLinkError when Besu tries to write temp files.
# SSHs to r630-01 (1000,1001,1002) and ml110 (1003,1004), stops each CT, e2fsck, starts.
# SSHs to r630-01 (1000,1001,1002) and r630-03 (1003,1004), stops each CT, e2fsck, starts.
#
# Usage: ./scripts/maintenance/make-validator-vmids-writable-via-ssh.sh [--dry-run]
# Run from project root. Requires SSH to r630-01 and ml110 (root, key-based).
# Run from project root. Requires SSH to r630-01 and r630-03 (root, key-based).
set -euo pipefail
@@ -13,16 +13,16 @@ PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
R630_01="${PROXMOX_HOST_R630_01:-192.168.11.11}"
ML110="${PROXMOX_ML110:-192.168.11.10}"
R630_03="${PROXMOX_R630_03:-192.168.11.13}"
SSH_OPTS="-o ConnectTimeout=15 -o StrictHostKeyChecking=accept-new"
# Validators: 1000,1001,1002 on r630-01; 1003,1004 on ml110
# Validators: 1000,1001,1002 on r630-01; 1003,1004 on r630-03
VALIDATORS=(
"1000:$R630_01"
"1001:$R630_01"
"1002:$R630_01"
"1003:$ML110"
"1004:$ML110"
"1003:$R630_03"
"1004:$R630_03"
)
DRY_RUN=false


@@ -0,0 +1,107 @@
#!/usr/bin/env bash
# Migrate Chain 138 / Besu LXCs from ml110 to r630-02 and r630-03 (cluster copy migration).
# Use after freeing ml110 RAM (e.g. 4×32GB → 1×64GB): move validators, RPC, sentries, Thirdweb CTs off ml110.
#
# PVE 9: use --target-storage (not --storage). Running CTs need restart migration: --restart 1
#
# Target split (balances disk: r630-02 thin5 has ~200G+ free; r630-03 local-lvm ~1T free):
# r630-02 / thin5: 2305, 2306, 2307, 2308 (named RPCs — smaller footprint)
# r630-03 / local-lvm: everything else on ml110 (validators, core-2, private, 2304, sentries, thirdweb)
#
# Usage (from LAN, SSH key to Proxmox nodes):
# ./scripts/maintenance/migrate-ml110-besu-rpc-to-r630-02-03.sh # migrate all still on ml110
# ./scripts/maintenance/migrate-ml110-besu-rpc-to-r630-02-03.sh --dry-run
# ./scripts/maintenance/migrate-ml110-besu-rpc-to-r630-02-03.sh 2305 # single VMID
#
# Prerequisites: ml110, r630-02, r630-03 in same cluster; storages active on targets.
#
set -uo pipefail
# Do not use set -e: one failed migrate should not abort the whole batch (log and continue).
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
[[ -f "${PROJECT_ROOT}/config/ip-addresses.conf" ]] && source "${PROJECT_ROOT}/config/ip-addresses.conf" 2>/dev/null || true
SRC_IP="${PROXMOX_HOST_ML110:-192.168.11.10}"
SSH_OPTS="-o BatchMode=yes -o ConnectTimeout=20 -o StrictHostKeyChecking=accept-new"
# Order: stopped first (no --restart), then r630-02 RPCs, then r630-03 bulk (running with --restart).
R630_02_STORAGE="thin5"
R630_03_STORAGE="local-lvm"
# VMID -> "r630-02" or "r630-03"
declare -A TARGET_NODE
declare -A TARGET_STOR
for v in 2305 2306 2307 2308; do
TARGET_NODE[$v]="r630-02"
TARGET_STOR[$v]="$R630_02_STORAGE"
done
for v in 1003 1004 1503 1504 1505 1506 1507 1508 2102 2301 2304 2400 2402 2403; do
TARGET_NODE[$v]="r630-03"
TARGET_STOR[$v]="$R630_03_STORAGE"
done
ALL_ORDER=(1503 1504 1505 1506 1507 1508 2400 2402 2403 2305 2306 2307 2308 2304 2301 2102 1003 1004)
DRY_RUN=false
SINGLE=()
for arg in "$@"; do
[[ "$arg" == "--dry-run" ]] && DRY_RUN=true
[[ "$arg" =~ ^[0-9]+$ ]] && SINGLE+=("$arg")
done
log() { echo "[$(date -Iseconds)] $*"; }
ssh_src() { ssh $SSH_OPTS "root@${SRC_IP}" "$@"; }
migrate_one() {
local vmid="$1"
local node="${TARGET_NODE[$vmid]:-}"
local stor="${TARGET_STOR[$vmid]:-}"
if [[ -z "$node" || -z "$stor" ]]; then
log "SKIP $vmid — not in migration map (edit script)."
return 0
fi
if $DRY_RUN; then
echo " ssh root@${SRC_IP} \"pct migrate $vmid $node --target-storage $stor [--restart 1 if running]\""
return 0
fi
if ! ssh_src "pct config $vmid" &>/dev/null; then
log "SKIP $vmid — not on ${SRC_IP} (already migrated or missing)."
return 0
fi
local running
running=$(ssh_src "pct status $vmid 2>/dev/null | awk '{print \$2}'" || echo "unknown")
local extra=()
if [[ "$running" == "running" ]]; then
extra=(--restart 1)
fi
log "MIGRATE $vmid -> $node storage=$stor status=$running ${extra[*]:-}"
if ! ssh_src "pct migrate $vmid $node --target-storage $stor ${extra[*]:-}"; then
log "FAIL $vmid -> $node (see above). Fix and re-run this script; completed VMIDs are skipped."
return 1
fi
log "DONE $vmid -> $node"
}
main() {
if [[ ${#SINGLE[@]} -gt 0 ]]; then
for vmid in "${SINGLE[@]}"; do
migrate_one "$vmid" || true
done
return 0
fi
for vmid in "${ALL_ORDER[@]}"; do
migrate_one "$vmid" || true
done
}
main "$@"
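The per-VMID routing above reduces to two associative arrays plus an empty-lookup skip; a trimmed, runnable sketch with two entries taken from the map:

```shell
# VMID -> target lookup as used by the migration script: associative arrays
# keyed by VMID; an empty lookup means "not in map, skip".
declare -A TARGET_NODE=( [2305]="r630-02" [1003]="r630-03" )
declare -A TARGET_STOR=( [2305]="thin5"   [1003]="local-lvm" )

plan() {
  local vmid="$1"
  local node="${TARGET_NODE[$vmid]:-}" stor="${TARGET_STOR[$vmid]:-}"
  if [[ -z "$node" || -z "$stor" ]]; then
    echo "SKIP $vmid"
  else
    echo "pct migrate $vmid $node --target-storage $stor"
  fi
}

plan 2305   # prints the pct migrate command for a mapped VMID
plan 9999   # prints SKIP 9999 for an unmapped one
```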


@@ -88,7 +88,7 @@ echo ""
# 0. Make RPC VMIDs writable (e2fsck so fix/install scripts can write)
echo "[0/5] Making RPC VMIDs writable..."
echo "--- 0/5: Make RPC VMIDs writable (r630-01: 2101, 2500-2505) ---"
if run_step "${SCRIPT_DIR}/make-rpc-vmids-writable-via-ssh.sh"; then
if run_step "${SCRIPT_DIR}/make-rpc-vmids-writable-via-ssh.sh" --apply; then
echo " Done."
else
echo " Step had warnings (check output)."


@@ -6,20 +6,40 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
CHECKS_SCRIPT="$PROJECT_ROOT/scripts/maintenance/daily-weekly-checks.sh"
LOG_DIR="$PROJECT_ROOT/logs"
CRON_DAILY="0 8 * * * cd $PROJECT_ROOT && bash $CHECKS_SCRIPT daily >> $LOG_DIR/daily-weekly-checks.log 2>&1"
CRON_WEEKLY="0 9 * * 0 cd $PROJECT_ROOT && bash $CHECKS_SCRIPT weekly >> $LOG_DIR/daily-weekly-checks.log 2>&1"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
CHECKS_SCRIPT="$INSTALL_ROOT/scripts/maintenance/daily-weekly-checks.sh"
LOG_DIR="$INSTALL_ROOT/logs"
CRON_DAILY="0 8 * * * cd $INSTALL_ROOT && bash $CHECKS_SCRIPT daily >> $LOG_DIR/daily-weekly-checks.log 2>&1"
CRON_WEEKLY="0 9 * * 0 cd $INSTALL_ROOT && bash $CHECKS_SCRIPT weekly >> $LOG_DIR/daily-weekly-checks.log 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$CHECKS_SCRIPT" ]]; then
echo "Checks script not found at: $CHECKS_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/maintenance/daily-weekly-checks.sh."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
(crontab -l 2>/dev/null; echo "$CRON_DAILY"; echo "$CRON_WEEKLY") | crontab -
{
crontab -l 2>/dev/null | grep -v 'daily-weekly-checks.sh' || true
echo "$CRON_DAILY"
echo "$CRON_WEEKLY"
} | crontab -
echo "Installed daily (08:00) and weekly (Sun 09:00):"
echo " $CRON_DAILY"
echo " $CRON_WEEKLY"
;;
--show)
validate_install_root
echo "Daily (O-1, O-2): $CRON_DAILY"
echo "Weekly (O-3): $CRON_WEEKLY"
;;
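The --install branches in these cron helpers now share one idempotent pattern: filter out any previous entry matching a marker, then append the fresh lines, so repeated installs never duplicate jobs. The same pattern on plain text (dedupe_install is a hypothetical name; the real scripts pipe crontab -l through the identical filter):

```shell
# Idempotent line install: drop existing lines matching the marker,
# then append the replacement entries.
dedupe_install() {  # stdin: current entries; $1: marker; rest: new lines
  local marker="$1"; shift
  grep -v "$marker" || true
  printf '%s\n' "$@"
}
```

Usage mirrors the crontab form: `crontab -l 2>/dev/null | dedupe_install 'daily-weekly-checks.sh' "$CRON_DAILY" "$CRON_WEEKLY" | crontab -`.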


@@ -7,26 +7,41 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
LAG_SCRIPT="$PROJECT_ROOT/scripts/maintenance/check-and-fix-explorer-lag.sh"
LOG_DIR="$PROJECT_ROOT/logs"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
LAG_SCRIPT="$INSTALL_ROOT/scripts/maintenance/check-and-fix-explorer-lag.sh"
LOG_DIR="$INSTALL_ROOT/logs"
LOG_FILE="$LOG_DIR/explorer-lag-fix.log"
# Every 6 hours (0:00, 6:00, 12:00, 18:00)
CRON_LAG="0 */6 * * * cd $PROJECT_ROOT && bash $LAG_SCRIPT >> $LOG_FILE 2>&1"
CRON_LAG="0 */6 * * * cd $INSTALL_ROOT && bash $LAG_SCRIPT >> $LOG_FILE 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$LAG_SCRIPT" ]]; then
echo "Lag script not found at: $LAG_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/maintenance/check-and-fix-explorer-lag.sh."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
if crontab -l 2>/dev/null | grep -q "check-and-fix-explorer-lag.sh"; then
echo "Explorer lag cron already present in crontab."
else
(crontab -l 2>/dev/null; echo "$CRON_LAG") | crontab -
echo "Installed explorer lag cron (every 6 hours):"
echo " $CRON_LAG"
echo "Log: $LOG_FILE"
fi
{
crontab -l 2>/dev/null | grep -v "check-and-fix-explorer-lag.sh" || true
echo "$CRON_LAG"
} | crontab -
echo "Installed explorer lag cron (every 6 hours):"
echo " $CRON_LAG"
echo "Log: $LOG_FILE"
;;
--show)
validate_install_root
echo "Explorer lag check-and-fix (every 6 hours):"
echo " $CRON_LAG"
echo "Log: $LOG_FILE"


@@ -6,16 +6,36 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
BACKUP_SCRIPT="$PROJECT_ROOT/scripts/verify/backup-npmplus.sh"
CRON_LINE="0 3 * * * cd $PROJECT_ROOT && bash $BACKUP_SCRIPT >> $PROJECT_ROOT/logs/npmplus-backup.log 2>&1"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
BACKUP_SCRIPT="$INSTALL_ROOT/scripts/verify/backup-npmplus.sh"
LOG_DIR="$INSTALL_ROOT/logs"
CRON_LINE="0 3 * * * /usr/bin/flock -n /var/lock/npmplus-backup.lock bash -lc 'cd $INSTALL_ROOT && bash $BACKUP_SCRIPT >> $LOG_DIR/npmplus-backup.log 2>&1'"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$BACKUP_SCRIPT" ]]; then
echo "Backup script not found at: $BACKUP_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/verify/backup-npmplus.sh."
exit 1
fi
}
case "${1:-}" in
--install)
mkdir -p "$PROJECT_ROOT/logs"
(crontab -l 2>/dev/null; echo "$CRON_LINE") | crontab -
validate_install_root
mkdir -p "$LOG_DIR"
{
crontab -l 2>/dev/null | grep -v 'backup-npmplus.sh' || true
echo "$CRON_LINE"
} | crontab -
echo "Installed: $CRON_LINE"
;;
--show)
validate_install_root
echo "Crontab line: $CRON_LINE"
;;
*)
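The /usr/bin/flock -n wrapper in CRON_LINE keeps backup runs from overlapping: a second invocation fails immediately instead of queueing behind the first. A self-contained demonstration with a throwaway lock file:

```shell
# Second flock -n attempt fails immediately while the first holder is alive.
LOCK="$(mktemp)"
flock -n "$LOCK" sleep 2 &     # background holder keeps the lock ~2s
holder=$!
sleep 1                        # let the holder acquire the lock first
if flock -n "$LOCK" true; then
  second="acquired"
else
  second="busy"
fi
wait "$holder"
rm -f "$LOCK"
echo "second attempt: $second"   # prints: second attempt: busy
```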


@@ -6,36 +6,44 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
COLLECT_SCRIPT="$PROJECT_ROOT/scripts/monitoring/collect-storage-growth-data.sh"
PRUNE_SNAPSHOTS="$PROJECT_ROOT/scripts/monitoring/prune-storage-snapshots.sh"
PRUNE_HISTORY="$PROJECT_ROOT/scripts/monitoring/prune-storage-history.sh"
LOG_DIR="$PROJECT_ROOT/logs/storage-growth"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
COLLECT_SCRIPT="$INSTALL_ROOT/scripts/monitoring/collect-storage-growth-data.sh"
PRUNE_SNAPSHOTS="$INSTALL_ROOT/scripts/monitoring/prune-storage-snapshots.sh"
PRUNE_HISTORY="$INSTALL_ROOT/scripts/monitoring/prune-storage-history.sh"
LOG_DIR="$INSTALL_ROOT/logs/storage-growth"
# Every 6 hours
CRON_STORAGE="0 */6 * * * cd $PROJECT_ROOT && bash $COLLECT_SCRIPT --append >> $LOG_DIR/cron.log 2>&1"
CRON_STORAGE="0 */6 * * * cd $INSTALL_ROOT && bash $COLLECT_SCRIPT --append >> $LOG_DIR/cron.log 2>&1"
# Weekly Sun 08:00: prune snapshots (30d) + history (~90d)
CRON_PRUNE="0 8 * * 0 cd $PROJECT_ROOT && bash $PRUNE_SNAPSHOTS >> $LOG_DIR/cron.log 2>&1 && bash $PRUNE_HISTORY >> $LOG_DIR/cron.log 2>&1"
CRON_PRUNE="0 8 * * 0 cd $INSTALL_ROOT && bash $PRUNE_SNAPSHOTS >> $LOG_DIR/cron.log 2>&1 && bash $PRUNE_HISTORY >> $LOG_DIR/cron.log 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$COLLECT_SCRIPT" || ! -f "$PRUNE_SNAPSHOTS" || ! -f "$PRUNE_HISTORY" ]]; then
echo "One or more storage growth scripts are missing under: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/monitoring/collect-storage-growth-data.sh and prune helpers."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
added=""
if ! crontab -l 2>/dev/null | grep -q "collect-storage-growth-data.sh"; then
(crontab -l 2>/dev/null; echo "$CRON_STORAGE") | crontab -
added="collect"
fi
if ! crontab -l 2>/dev/null | grep -q "prune-storage-snapshots.sh"; then
(crontab -l 2>/dev/null; echo "$CRON_PRUNE") | crontab -
added="${added:+$added + }prune"
fi
if [ -n "$added" ]; then
echo "Installed storage growth cron:"
echo " $CRON_STORAGE"
echo " $CRON_PRUNE"
else
echo "Storage growth cron already present in crontab."
fi
{
crontab -l 2>/dev/null | grep -v "collect-storage-growth-data.sh" | grep -v "prune-storage-snapshots.sh" | grep -v "prune-storage-history.sh" || true
echo "$CRON_STORAGE"
echo "$CRON_PRUNE"
} | crontab -
echo "Installed storage growth cron:"
echo " $CRON_STORAGE"
echo " $CRON_PRUNE"
;;
--show)
validate_install_root
echo "Storage growth (append every 6h): $CRON_STORAGE"
echo "Storage prune (weekly Sun 08:00): $CRON_PRUNE"
;;


@@ -6,23 +6,38 @@ set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
MONITOR_SCRIPT="$PROJECT_ROOT/scripts/storage-monitor.sh"
LOG_DIR="$PROJECT_ROOT/logs/storage-monitoring"
INSTALL_ROOT="${CRON_PROJECT_ROOT:-$PROJECT_ROOT}"
MONITOR_SCRIPT="$INSTALL_ROOT/scripts/storage-monitor.sh"
LOG_DIR="$INSTALL_ROOT/logs/storage-monitoring"
# Daily at 07:00 (before daily-weekly-checks at 08:00)
CRON_STORAGE_MONITOR="0 7 * * * cd $PROJECT_ROOT && bash $MONITOR_SCRIPT >> $LOG_DIR/cron.log 2>&1"
CRON_STORAGE_MONITOR="0 7 * * * cd $INSTALL_ROOT && bash $MONITOR_SCRIPT >> $LOG_DIR/cron.log 2>&1"
validate_install_root() {
if [[ "$INSTALL_ROOT" == /tmp/* ]]; then
echo "Refusing to install cron from ephemeral path: $INSTALL_ROOT"
echo "Set CRON_PROJECT_ROOT to a persistent checkout on the host, then rerun."
exit 1
fi
if [[ ! -f "$MONITOR_SCRIPT" ]]; then
echo "Monitor script not found at: $MONITOR_SCRIPT"
echo "Set CRON_PROJECT_ROOT to the host path that contains scripts/storage-monitor.sh."
exit 1
fi
}
case "${1:-}" in
--install)
validate_install_root
mkdir -p "$LOG_DIR"
if crontab -l 2>/dev/null | grep -q "storage-monitor.sh"; then
echo "Storage monitor cron already present in crontab."
else
(crontab -l 2>/dev/null; echo "$CRON_STORAGE_MONITOR") | crontab -
echo "Installed storage monitor cron (daily 07:00):"
echo " $CRON_STORAGE_MONITOR"
fi
{
crontab -l 2>/dev/null | grep -v "storage-monitor.sh" || true
echo "$CRON_STORAGE_MONITOR"
} | crontab -
echo "Installed storage monitor cron (daily 07:00):"
echo " $CRON_STORAGE_MONITOR"
;;
--show)
validate_install_root
echo "Storage monitor (daily 07:00):"
echo " $CRON_STORAGE_MONITOR"
;;


@@ -1,10 +1,10 @@
#!/usr/bin/env bash
# Set max-peers=32 in Besu config on all running Besu nodes (in-place sed).
# Set max-peers=40 in Besu config on all running Besu nodes (in-place sed).
# Run after repo configs are updated; then restart Besu with restart-besu-reload-node-lists.sh.
# See: docs/08-monitoring/PEER_CONNECTIONS_PLAN.md
#
# Usage: ./scripts/maintenance/set-all-besu-max-peers-32.sh [--dry-run]
# Requires: SSH to Proxmox hosts (r630-01, r630-02, ml110).
# Requires: SSH to Proxmox hosts (r630-01, r630-02, r630-03).
set -euo pipefail
@@ -14,16 +14,17 @@ PROJECT_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
DRY_RUN=false
[[ "${1:-}" == "--dry-run" ]] && DRY_RUN=true
TARGET_MAX_PEERS=40
declare -A HOST_BY_VMID
for v in 1000 1001 1002 1500 1501 1502 2101 2500 2501 2502 2503 2504 2505; do HOST_BY_VMID[$v]="${PROXMOX_R630_01:-${PROXMOX_HOST_R630_01:-192.168.11.11}}"; done
for v in 2201 2303 2401; do HOST_BY_VMID[$v]="${PROXMOX_R630_02:-${PROXMOX_HOST_R630_02:-192.168.11.12}}"; done
for v in 1003 1004 1503 1504 1505 1506 1507 1508 2102 2301 2304 2305 2306 2307 2308 2400 2402 2403; do HOST_BY_VMID[$v]="${PROXMOX_ML110:-${PROXMOX_HOST_ML110:-192.168.11.10}}"; done
for v in 1000 1001 1002 1500 1501 1502 2101 2103 2500 2501 2502 2503 2504 2505; do HOST_BY_VMID[$v]="${PROXMOX_R630_01:-${PROXMOX_HOST_R630_01:-192.168.11.11}}"; done
for v in 2201 2303 2305 2306 2307 2308 2401; do HOST_BY_VMID[$v]="${PROXMOX_R630_02:-${PROXMOX_HOST_R630_02:-192.168.11.12}}"; done
for v in 1003 1004 1503 1504 1505 1506 1507 1508 2102 2301 2304 2400 2402 2403; do HOST_BY_VMID[$v]="${PROXMOX_R630_03:-${PROXMOX_HOST_R630_03:-192.168.11.13}}"; done
BESU_VMIDS=(1000 1001 1002 1003 1004 1500 1501 1502 1503 1504 1505 1506 1507 1508 2101 2102 2201 2301 2303 2304 2305 2306 2307 2308 2400 2401 2402 2403 2500 2501 2502 2503 2504 2505)
BESU_VMIDS=(1000 1001 1002 1003 1004 1500 1501 1502 1503 1504 1505 1506 1507 1508 2101 2102 2103 2201 2301 2303 2304 2305 2306 2307 2308 2400 2401 2402 2403 2500 2501 2502 2503 2504 2505)
SSH_OPTS="-o ConnectTimeout=8 -o StrictHostKeyChecking=accept-new"
echo "Set max-peers=32 on all Besu nodes (dry-run=$DRY_RUN)"
echo "Set max-peers=${TARGET_MAX_PEERS} on all Besu nodes (dry-run=$DRY_RUN)"
echo ""
for vmid in "${BESU_VMIDS[@]}"; do
@@ -35,7 +36,7 @@ for vmid in "${BESU_VMIDS[@]}"; do
continue
fi
if $DRY_RUN; then
echo "VMID $vmid @ $host: [dry-run] would sed max-peers=25 -> 32"
echo "VMID $vmid @ $host: [dry-run] would normalize max-peers -> ${TARGET_MAX_PEERS}"
continue
fi
# Try common Besu config locations; sed in place
@@ -44,7 +45,10 @@ for vmid in "${BESU_VMIDS[@]}"; do
[ -d \"\$d\" ] || continue
for f in \"\$d\"/*.toml; do
[ -f \"\$f\" ] || continue
grep -q \"max-peers=25\" \"\$f\" 2>/dev/null && sed -i \"s/max-peers=25/max-peers=32/g\" \"\$f\" && echo \"OK:\$f\"
if grep -qE \"^max-peers=\" \"\$f\" 2>/dev/null; then
sed -i -E \"s/^max-peers=.*/max-peers=${TARGET_MAX_PEERS}/\" \"\$f\"
echo \"OK:\$f\"
fi
done
done
'" 2>/dev/null || echo "FAIL")
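The new sed normalizes whatever max-peers value is present rather than only 25; the same substitution applied to a sample TOML fragment:

```shell
# Same normalization as the remote sed: rewrite any max-peers value in place.
TARGET_MAX_PEERS=40
tmp="$(mktemp)"
printf '%s\n' 'rpc-http-enabled=true' 'max-peers=25' > "$tmp"
sed -i -E "s/^max-peers=.*/max-peers=${TARGET_MAX_PEERS}/" "$tmp"
result="$(cat "$tmp")"
rm -f "$tmp"
printf '%s\n' "$result"   # rpc-http-enabled=true, then max-peers=40
```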