# Maintenance scripts review

**Date:** 2026-02-15

**Scope:** RPC/502 fix flow, writability step, runner, and related docs.

---

## 1. Flow overview

| Step | Script | Purpose |
|------|--------|---------|
| 0 | `make-rpc-vmids-writable-via-ssh.sh` | Stop 2101, 2500–2505 on r630-01; e2fsck rootfs; start; verify /tmp writable |
| 1 | `resolve-and-fix-all-via-proxmox-ssh.sh` | Dev VM IP .59, start containers, DBIS services (r630-01, ml110) |
| 2 | `fix-rpc-2101-jna-reinstall.sh` | Reinstall Besu in 2101 (JNA fix), use /tmp in CT, set java.io.tmpdir=/data/besu/tmp |
| 3 | `install-besu-permanent-on-missing-nodes.sh` | Install Besu on 1505–1508 (ml110), 2500–2505 (r630-01) where missing |
| 4 | `address-all-remaining-502s.sh` | fix-all-502s-comprehensive + NPM proxy update + RPC diagnostics |
| 5 | `verify-end-to-end-routing.sh` | E2E (optional via `--e2e`) |

**Single entry point:** `./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh [--no-npm] [--e2e] [--dry-run]`

---

## 2. What works well

- **Writability first:** Step 0 fixes read-only root (ext4 errors) so steps 2 and 3 can write to CTs. All seven RPC VMIDs (2101, 2500–2505) are handled on r630-01.
- **Clear ordering:** Make writable → resolve/start → fix 2101 → install Besu on missing → address 502s → E2E. Dependencies are respected.
- **Config-driven:** Hosts and IPs come from `config/ip-addresses.conf` (PROXMOX_HOST_R630_01, etc.).
- **Idempotent / skip logic:** resolve-and-fix skips if already correct; install-besu-permanent skips VMIDs that already have `/opt/besu/bin/besu`.
- **Docs linked:** 502_DEEP_DIVE (§ Read-only CT), CHECK_ALL_UPDATES (§9 Remaining fixes), maintenance README all reference the runner and make-writable script.
- **JNA tmpdir:** Standalone installer and 2101 fix set `-Djava.io.tmpdir=/data/besu/tmp` so Besu/JNA work when `/tmp` is restricted.
- **Apt resilience:** Standalone installer allows `apt-get update` to fail (e.g.
  command-not-found I/O error) and still requires `java` and `wget` before continuing.

---

## 3. Gaps and risks

- **Step 2 (2101) can be slow:** Apt install inside the CT can take 5–15+ minutes; the runner has no per-step timeout, so the whole run can appear to hang at “Installing packages…”.
- **Errors hidden:** The runner uses `2>/dev/null` on each step and only prints “Done” or “Step had warnings.” Failures (e.g. a failed 2101 or 2505 install) are not surfaced unless you read the full output.
- **Disk space:** 2502/2504 have historically hit “No space left on device” in `/data/besu` (RocksDB). The scripts do not check or resize CT disk; that remains manual (e.g. `pct resize <vmid> rootfs +50G` or free space inside CT).
- **LV name assumption:** make-rpc-vmids-writable assumes LVs are `/dev/pve/vm-<vmid>-disk-0`. Different storage or naming would need script changes.
- **Single host for RPC:** make-rpc-vmids-writable only targets r630-01. If any RPC VMIDs are moved to ml110/r630-02, the script would need to be extended (or called a second time with a different host).

---

## 4. Recommendations and completion

1. **Optional verbose mode:** ✅ **Done.** Runner supports `--verbose`; when set, step output is not redirected (no `2>/dev/null`), so failures are visible.
2. **Optional timeout for step 2:** ✅ **Done.** `STEP2_TIMEOUT` (default 900) applies to the 2101 fix; exit code 124 is detected and a message tells the user to re-run the fix manually. Use `STEP2_TIMEOUT=0` to disable.
3. **§9 checklist:** ✅ **Done.** CHECK_ALL_UPDATES §9 includes "RPC CTs read-only → make-rpc-vmids-writable first"; operators have a single place for order of operations.
4. **Disk check (future):** Not implemented. Optionally run `pct exec <vmid> -- df -h / /data/besu` before install/fix and warn if usage > 90%.

---

## 5. File reference

| File | Role |
|------|------|
| `scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh` | Main runner (steps 0–5) |
| `scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh` | e2fsck 2101, 2500–2505 on r630-01 |
| `scripts/maintenance/address-all-remaining-502s.sh` | Backends + NPM + diagnostics |
| `scripts/maintenance/fix-rpc-2101-jna-reinstall.sh` | 2101 Besu reinstall, /tmp + JNA tmpdir |
| `scripts/install-besu-in-ct-standalone.sh` | In-CT Besu install; apt tolerant; JNA tmpdir |
| `scripts/besu/install-besu-permanent-on-missing-nodes.sh` | Besu on 1505–1508, 2500–2505; writability check |
| `docs/00-meta/502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md` | Root causes, Read-only CT, 2101/2500–2505 fixes |
| `docs/05-network/CHECK_ALL_UPDATES_AND_CLOUDFLARE_TUNNELS.md` | Config, tunnels, verification, §9 remaining fixes |

---

## 6. Quick commands

```bash
# Full run (writable → fix → install → 502s → E2E)
./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e

# Show all step output (no 2>/dev/null)
./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e --verbose

# Step 2 (2101 fix) timeout: default 900s; disable with 0
STEP2_TIMEOUT=1200 ./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e
STEP2_TIMEOUT=0 ./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --e2e

# Only make RPC CTs writable
./scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh

# Dry-run (print steps only)
./scripts/maintenance/run-all-maintenance-via-proxmox-ssh.sh --dry-run
```

Reports and diagnostics: `docs/04-configuration/verification-evidence/` (RPC diagnostics, E2E reports).
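
**Disk check sketch (not implemented):** The disk check that §4 lists as a future item could look roughly like the helper below. It is only a sketch under stated assumptions: `check_df_usage` and `DISK_WARN_PCT` are hypothetical names that do not exist in the current scripts, and it parses `df -P` (POSIX column layout) rather than `df -h` because fixed columns are easier to parse. The function reads `df -P` output on stdin, prints a warning for each filesystem above the threshold, and returns non-zero if any was found.

```bash
#!/usr/bin/env bash
# Sketch only: check_df_usage and DISK_WARN_PCT are hypothetical names,
# not part of the existing maintenance scripts.

DISK_WARN_PCT="${DISK_WARN_PCT:-90}"   # warn above this usage percentage

# Read `df -P` output on stdin and warn for each filesystem above the threshold.
# Returns 1 if at least one filesystem exceeded the threshold, 0 otherwise.
check_df_usage() {
  local threshold="$1"
  awk -v t="$threshold" '
    NR > 1 {                       # skip the df header line
      pct = $5; sub(/%/, "", pct)  # column 5 is "Capacity", e.g. "95%"
      if (pct + 0 > t) {
        printf "WARN: %s at %s%% (threshold %s%%)\n", $6, pct, t
        bad = 1
      }
    }
    END { exit (bad ? 1 : 0) }
  '
}

# Possible wiring (assumes root SSH to the Proxmox host, as an illustration only):
# for vmid in 2101 2500 2501 2502 2503 2504 2505; do
#   ssh "root@${PROXMOX_HOST_R630_01}" "pct exec ${vmid} -- df -P / /data/besu" \
#     | check_df_usage "$DISK_WARN_PCT" || echo "CT ${vmid}: low disk space" >&2
# done
```

Keeping the parsing separate from the `ssh`/`pct exec` call means the threshold logic can be exercised with canned `df` output, without touching any CT.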