# Concrete Next Steps: RPC 2101 and Storage (thin5 / data)
**Last updated:** 2026-02-28

---
## 1. VMID 2101 (Core RPC) — RPC not responding
**Symptom:** Container running and `besu-rpc` active, but the RPC endpoint at 192.168.11.211:8545 does not answer requests (e.g. `eth_blockNumber`).
### Run order (from project root, on LAN with SSH to r630-01)
| Step | Action | Command |
|------|--------|---------|
| 1 | **Diagnose** | `bash scripts/maintenance/health-check-rpc-2101.sh` |
| 2a | If **read-only / database not writable** | `bash scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh` (then re-run step 1) |
| 2b | If **JNA / NoClassDefFoundError** in logs | `bash scripts/maintenance/fix-rpc-2101-jna-reinstall.sh` (then step 3) |
| 3 | **Fix** (start CT if needed, restart Besu, verify) | `bash scripts/maintenance/fix-core-rpc-2101.sh` |
| 4 | **Verify** | `bash scripts/health/check-rpc-vms-health.sh` — 2101 should show block number |
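The run order above reduces to one question: does the node answer `eth_blockNumber`? A minimal manual probe, useful when you want to check by hand before (or after) running the scripts. The curl payload is the standard Ethereum JSON-RPC call; the `hex_to_dec` helper is illustrative, not part of the maintenance scripts:

```shell
#!/usr/bin/env bash
# Manual probe for the Core RPC endpoint (address from the symptom above).
# A healthy Besu node answers eth_blockNumber with a hex block height, e.g.:
#   curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#     http://192.168.11.211:8545
# which returns something like {"jsonrpc":"2.0","id":1,"result":"0x4b7"}.

# hex_to_dec: convert the hex "result" field to a decimal block number.
hex_to_dec() {
  printf '%d\n' "$1"
}

hex_to_dec "0x4b7"   # → 1207
```

No answer at all (timeout / connection refused) points at the container or service being down (step 3); an answer with a stale block number points at the block-production doc listed below.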
**Optional:** `fix-core-rpc-2101.sh --restart-only` if the container is already running and you only want to restart the Besu service.

**Docs:** `docs/09-troubleshooting/RPC_NODES_BLOCK_PRODUCTION_FIX.md`, `docs/03-deployment/RPC_2101_READONLY_FIX.md` (if present).

---
## 2. r630-02 thin5 — 84.6% used (monitor / reduce)
**Risk:** thin5 is approaching the 85% WARN threshold; LVM thin pools can become slow or fail above ~90%.
### Immediate
| Step | Action | Command / notes |
|------|--------|------------------|
| 1 | **See which containers use thin5** | On r630-02: `ssh root@192.168.11.12 'for v in $(pct list 2>/dev/null | awk "NR>1{print \$1}"); do grep -q thin5 /etc/pve/lxc/$v.conf 2>/dev/null && echo "VMID $v uses thin5"; done'` |
| 2 | **Check disk usage inside those CTs** | `bash scripts/maintenance/check-disk-all-vmids.sh` — find VMIDs on r630-02 with high % |
| 3 | **Free space inside CTs** (Besu/DB, logs) | Per VMID: `pct exec <vmid> -- du -sh /data /var/log 2>/dev/null`; prune logs, old snapshots, or Besu temp if safe |
| 4 | **Optional: migrate one CT to another thin** | If thin5 stays high: backup CT, restore to thin2/thin3/thin4/thin6 (e.g. `pct restore <vmid> /path/to/dump --storage thin2`) |
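Step 1's grep-over-configs can be paired with a pool-level view from `lvs`. A sketch of flagging thin pools near the WARN threshold; the 85% cutoff and the r630-02 address come from this doc, while the function name and the simplified two-column input format are illustrative (the real input would be `lvs --noheadings -o lv_name,data_percent` over SSH):

```shell
#!/usr/bin/env bash
# Flag LVM thin pools at or above a usage threshold. Real input (sketch):
#   ssh root@192.168.11.12 "lvs --noheadings -o lv_name,data_percent"
flag_full_pools() {
  local threshold="$1"
  # Input: one "pool_name data_percent" pair per line.
  awk -v t="$threshold" '{ if ($2 + 0 >= t) print $1, $2 "%" }'
}

# Sample lvs-style output; thin5 at 84.6 is just under the 85% WARN level.
printf 'thin2 41.2\nthin5 84.6\nthin6 22.0\n' | flag_full_pools 80
```

Running with a threshold of 80 (as above) gives early warning before the 85% WARN level is actually crossed.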
### Ongoing
| Step | Action | Command / notes |
|------|--------|------------------|
| 5 | **Track growth** | `bash scripts/monitoring/collect-storage-growth-data.sh --append` (or install cron: `bash scripts/maintenance/schedule-storage-growth-cron.sh --install`) |
| 6 | **Prune old snapshots** (on host) | `bash scripts/monitoring/prune-storage-snapshots.sh` (weekly; keeps last 30 days) |
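If you want to see what the cron installer sets up (or write the schedule by hand), the entries could look roughly like this. The schedule, install path, and pairing of collect + prune are assumptions for illustration; `schedule-storage-growth-cron.sh --install` is authoritative:

```shell
# Hypothetical crontab entries -- times and repo path are illustrative only;
# the schedule-storage-growth-cron.sh installer is the source of truth.
# Weekly growth snapshot (Mondays 03:00), appending to logs/storage-growth/
0 3 * * 1 bash /root/proxmox/scripts/monitoring/collect-storage-growth-data.sh --append
# Weekly snapshot prune (Mondays 03:30), keeping the last 30 days
30 3 * * 1 bash /root/proxmox/scripts/monitoring/prune-storage-snapshots.sh
```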
---
## 3. r630-01 data / local-lvm — 71.9% used (monitor)
**Risk:** Still healthy; monitor so it does not reach 85%+.
### Immediate
| Step | Action | Command / notes |
|------|--------|------------------|
| 1 | **Snapshot + growth check** | `bash scripts/monitoring/collect-storage-growth-data.sh` — review `logs/storage-growth/` |
| 2 | **Identify large CTs on r630-01** | `bash scripts/maintenance/check-disk-all-vmids.sh` — ml110 + r630-01; VMIDs 2101 and 2500–2505 are on r630-01 |
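The 85% threshold check itself is a one-liner over storage totals. A sketch, where the three-column input (name, total, used) is a simplified stand-in for the columns `pvesm status` prints on the host; the function name and sample numbers are ours (the percentages mirror the figures quoted in this doc):

```shell
#!/usr/bin/env bash
# Warn when a storage crosses 85% used. Real input would come from
# `pvesm status` on r630-01; columns here are simplified to name/total/used.
check_storage() {
  awk '{ pct = $3 / $2 * 100;
         printf "%s %.1f%%%s\n", $1, pct, (pct >= 85 ? " WARN" : "") }'
}

# Sample: data at 71.9% (healthy), thin5 close to WARN, one storage over.
printf 'data 1000 719\nthin5 1000 846\nlocal-lvm 1000 900\n' | check_storage
```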
### Ongoing
| Step | Action | Command / notes |
|------|--------|------------------|
| 3 | **Same as thin5** | Use `schedule-storage-growth-cron.sh --install` for weekly collection + prune |
| 4 | **Before new deployments** | Re-run `bash scripts/audit-proxmox-rpc-storage.sh` and check data% / local-lvm% |
---
## Quick reference
| Item | Script | Purpose |
|------|--------|---------|
| 2101 health | `scripts/maintenance/health-check-rpc-2101.sh` | Diagnose Core RPC |
| 2101 fix | `scripts/maintenance/fix-core-rpc-2101.sh` | Restart Besu, verify RPC |
| 2101 read-only | `scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh` | e2fsck RPC VMIDs on r630-01 |
| 2101 JNA | `scripts/maintenance/fix-rpc-2101-jna-reinstall.sh` | Reinstall Besu in 2101 |
| Storage audit | `scripts/audit-proxmox-rpc-storage.sh` | All hosts + RPC rootfs mapping |
| Disk in CTs | `scripts/maintenance/check-disk-all-vmids.sh` | Root / usage per running CT |
| Storage growth | `scripts/monitoring/collect-storage-growth-data.sh` | Snapshot pvesm/lvs/df |
| Growth cron | `scripts/maintenance/schedule-storage-growth-cron.sh --install` | Weekly collect + prune |
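For unattended use, the 2101 scripts above could be chained diagnose-first: run the fix only when the health check fails. A sketch, assuming the health-check script exits non-zero when the node is unhealthy (an assumption — verify its exit codes before relying on this):

```shell
#!/usr/bin/env bash
# fix_if_unhealthy: run $fix only when $diagnose fails (non-zero exit).
# Real usage (sketch, assuming non-zero exit on failure):
#   fix_if_unhealthy "bash scripts/maintenance/health-check-rpc-2101.sh" \
#                    "bash scripts/maintenance/fix-core-rpc-2101.sh"
fix_if_unhealthy() {
  local diagnose="$1" fix="$2"
  if ! $diagnose; then
    $fix
  else
    echo "healthy, no fix needed"
  fi
}

fix_if_unhealthy true  "echo running fix"   # healthy path: skips the fix
fix_if_unhealthy false "echo running fix"   # unhealthy path: runs the fix
```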