# Concrete Next Steps: RPC 2101 and Storage (thin5 / data)
**Last updated:** 2026-02-28

---

## 1. VMID 2101 (Core RPC) — RPC not responding
**Symptom:** Container running, `besu-rpc` active, but RPC (e.g. `eth_blockNumber`) returns no response from 192.168.11.211:8545.
### Run order (from project root, on LAN with SSH to r630-01)

| Step | Action | Command |
|------|--------|---------|
| 1 | **Diagnose** | `bash scripts/maintenance/health-check-rpc-2101.sh` |
| 2a | If **read-only / database not writable** | `bash scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh` (then re-run step 1) |
| 2b | If **JNA / NoClassDefFoundError** in logs | `bash scripts/maintenance/fix-rpc-2101-jna-reinstall.sh` (then step 3) |
| 3 | **Fix** (start the CT if needed, restart Besu, verify) | `bash scripts/maintenance/fix-core-rpc-2101.sh` |
| 4 | **Verify** | `bash scripts/health/check-rpc-vms-health.sh` — 2101 should report a block number |

**Optional:** `fix-core-rpc-2101.sh --restart-only` if the container is already running and you only want to restart the Besu service.

**Docs:** `docs/09-troubleshooting/RPC_NODES_BLOCK_PRODUCTION_FIX.md`, `docs/03-deployment/RPC_2101_READONLY_FIX.md` (if present).
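If you want to probe the node by hand between steps, the check in step 4 boils down to one JSON-RPC call. A minimal sketch, using the 192.168.11.211:8545 endpoint from the symptom above (the `block_height` helper is hypothetical, not one of the repo scripts):

```shell
#!/usr/bin/env bash
# Decode the hex "result" field of an eth_blockNumber reply into a decimal height.
# Returns non-zero when the reply carries no result (RPC down or an error reply).
block_height() {
  local resp=$1 hex
  hex=${resp#*\"result\":\"0x}
  [ "$hex" = "$resp" ] && return 1   # no "result":"0x…" field present
  hex=${hex%%\"*}
  echo $((16#$hex))
}

# Manual probe (run on the LAN):
# resp=$(curl -s -X POST http://192.168.11.211:8545 \
#   -H 'Content-Type: application/json' \
#   --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}')
# block_height "$resp"
```

Any decimal number printed means the RPC is answering again; no output means you are back at step 1.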
|
||
|
||
---
## 2. r630-02 thin5 — 84.6% used (monitor / reduce)

**Risk:** thin5 is approaching the 85% WARN threshold; LVM thin pools can become slow or fail above roughly 90% usage.

### Immediate

| Step | Action | Command / notes |
|------|--------|------------------|
| 1 | **See which containers use thin5** | On r630-02: `ssh root@192.168.11.12 'pct list; for v in $(pct list 2>/dev/null \| awk "NR>1{print \$1}"); do grep -l thin5 /etc/pve/lxc/$v.conf 2>/dev/null && echo "VMID $v uses thin5"; done'` |
| 2 | **Check disk usage inside those CTs** | `bash scripts/maintenance/check-disk-all-vmids.sh` — find VMIDs on r630-02 with a high usage % |
| 3 | **Free space inside CTs** (Besu/DB, logs) | Per VMID: `pct exec <vmid> -- du -sh /data /var/log 2>/dev/null`; prune logs, old snapshots, or Besu temp files if safe |
| 4 | **Optional: migrate one CT to another thin pool** | If thin5 stays high: back up the CT, then restore it to thin2/thin3/thin4/thin6 (e.g. `pct restore <vmid> /path/to/dump --storage thin2`) |
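The 85%/90% thresholds above can be checked mechanically. A small sketch; the `pool_status` helper and the `vg/thin5` LVM path are illustrative assumptions (substitute the real volume group on r630-02):

```shell
#!/usr/bin/env bash
# Classify a thin-pool data percentage against the thresholds used above:
# >= 90 is the danger zone for LVM thin pools, >= 85 matches the WARN threshold.
pool_status() {
  local pct=${1%.*}            # drop decimals: "84.6" -> "84"
  if   [ "$pct" -ge 90 ]; then echo CRIT
  elif [ "$pct" -ge 85 ]; then echo WARN
  else                         echo OK
  fi
}

# On r630-02 the live number would come from LVM, e.g.:
# pct=$(lvs --noheadings -o data_percent vg/thin5 | tr -d ' ')   # "vg" is a placeholder
# pool_status "$pct"
```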
### Ongoing
| Step | Action | Command / notes |
|------|--------|------------------|
| 5 | **Track growth** | `bash scripts/monitoring/collect-storage-growth-data.sh --append` (or install a cron job: `bash scripts/maintenance/schedule-storage-growth-cron.sh --install`) |
| 6 | **Prune old snapshots** (on host) | `bash scripts/monitoring/prune-storage-snapshots.sh` (weekly; keeps the last 30 days) |
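What step 5's weekly collection buys you is a time series you can diff. The record format below is only a sketch, not the actual layout `collect-storage-growth-data.sh` writes under `logs/storage-growth/`:

```shell
#!/usr/bin/env bash
# Append one timestamped usage sample per storage, so week-over-week growth
# is a simple awk/diff away. (Illustrative format, not the real script's.)
growth_line() {
  # $1 = storage name, $2 = percent used
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2"
}

# growth_line thin5 84.6 >> logs/storage-growth/usage.log
# Last two samples for thin5:
# awk '$2=="thin5"{print $1, $3}' logs/storage-growth/usage.log | tail -n 2
```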
---
## 3. r630-01 data / local-lvm — 71.9% used (monitor)

**Risk:** Still healthy; monitor so usage does not reach 85%+.

### Immediate

| Step | Action | Command / notes |
|------|--------|------------------|
| 1 | **Snapshot + growth check** | `bash scripts/monitoring/collect-storage-growth-data.sh` — review `logs/storage-growth/` |
| 2 | **Identify large CTs on r630-01** | `bash scripts/maintenance/check-disk-all-vmids.sh` — covers ml110 and r630-01; VMIDs 2101 and 2500–2505 live on r630-01 |
### Ongoing
| Step | Action | Command / notes |
|------|--------|------------------|
| 3 | **Same as thin5** | Use `schedule-storage-growth-cron.sh --install` for weekly collection + prune |
| 4 | **Before new deployments** | Re-run `bash scripts/audit-proxmox-rpc-storage.sh` and check data% / local-lvm% |
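Step 4 can be turned into a go/no-go gate. A sketch that parses `pvesm status`-style output; the `storage_gate` helper is hypothetical, and the assumption that the percent sits in column 7 should be verified against your Proxmox version:

```shell
#!/usr/bin/env bash
# Fail (non-zero) when any watched storage is at or past the 85% WARN threshold.
# Expects `pvesm status`-style lines on stdin: name type status total used avail pct%
storage_gate() {
  awk '$1=="data" || $1=="local-lvm" {
         gsub(/%/, "", $7)
         printf "%s %s%%\n", $1, $7
         if ($7 + 0 >= 85) bad = 1
       }
       END { exit bad }'
}

# Pre-deployment check on r630-01:
# pvesm status | storage_gate || echo "storage too full - hold the deployment"
```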
---
## Quick reference
| Item | Script | Purpose |
|------|--------|---------|
| 2101 health | `scripts/maintenance/health-check-rpc-2101.sh` | Diagnose Core RPC |
| 2101 fix | `scripts/maintenance/fix-core-rpc-2101.sh` | Restart Besu, verify RPC |
| 2101 read-only | `scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh` | e2fsck RPC VMIDs on r630-01 |
| 2101 JNA | `scripts/maintenance/fix-rpc-2101-jna-reinstall.sh` | Reinstall Besu in 2101 |
| Storage audit | `scripts/audit-proxmox-rpc-storage.sh` | All hosts + RPC rootfs mapping |
| Disk in CTs | `scripts/maintenance/check-disk-all-vmids.sh` | Root / usage per running CT |
| Storage growth | `scripts/monitoring/collect-storage-growth-data.sh` | Snapshot of pvesm/lvs/df |
| Growth cron | `scripts/maintenance/schedule-storage-growth-cron.sh --install` | Weekly collect + prune |