- **Fleet checks (same day, follow-up):** Ran `collect-storage-growth-data.sh --append`, `storage-monitor.sh check`, `proxmox-host-io-optimize-pass.sh` (swappiness/sysstat; host `fstrim` N/A on LVM root). **Load:** ml110 load dominated by **Besu (Java)** and **cloudflared**; r630-01 load improved after earlier spike (still many CTs). **r630-01 `data` thin:** after guest `fstrim` fleet, **pvesm** used% dropped slightly (e.g. **~71.6% → ~70.2%** on 2026-03-28 — reclaim varies by CT). **ZFS:** r630-01 / r630-02 `rpool` ONLINE; last scrub **2026-03-08**, 0 errors. **`/proc/mdstat` (r630-01):** RAID devices present and active (no resync observed during check).
- **ml110 `cloudflared`:** If the process still runs as `tunnel run --token …` on the **command line**, the token is visible in **`ps`** and process listings. Prefer **`credentials-file` + YAML** (`scripts/cloudflare-tunnels/systemd/cloudflared-ml110.service`) or **`EnvironmentFile`** with tight permissions; **rotate the tunnel token** in Cloudflare if it was ever logged.
- **CT 7811 (r630-02, thin4):** Root was **100%** full (**~44 GiB** in `/var/log/syslog` + rotated `syslog.1`). **Remediation:** truncated `syslog` / `syslog.1` and restarted `rsyslog`; root **~6%** after fix. **`/etc/logrotate.d/rsyslog`** updated to **`daily` + `maxsize 200M`** (was weekly-only) and **`su root syslog`** so rotation runs before another multi-GB spike. If logs still grow fast, reduce app/rsyslog **facility** noise at the source.
- **CT 10100 (r630-01, thin1):** Root **WARN** (**~88–90%** on 8 GiB); growth mostly **`/var/lib/postgresql` (~5 GiB)**. **Remediation:** `pct resize 10100 rootfs +4G` + `resize2fs`; root **~57%** after. **Note:** Proxmox warned about thin-pool **overcommit** vs the VG — monitor `pvesm` / `lvs` and avoid excessive concurrent disk expansions without pool growth.
- **`storage-monitor.sh`:** Fixed **`set -e` abort** on unreachable optional nodes and **pipe-subshell** so `ALERTS+=` runs in the main shell (alerts and summaries work).
- **Fleet guest `fstrim`:** `scripts/maintenance/fstrim-all-running-ct.sh` supports **`FSTRIM_TIMEOUT_SEC`** and **`FSTRIM_HOSTS`** (e.g. `ml110`, `r630-01`, `r630-02`). Many CTs return FITRIM “not permitted” (guest/filesystem); others reclaim space on the thin pools (notably on **r630-02**).
- **r630-02 `thin1`–`thin6` VGs:** Each VG is on a **single PV** with only **~124 MiB `vg_free`**; you **cannot** `lvextend` those thin pools until the underlying partition/disk is grown or a second PV is added. Monitor `pvesm status` and plan disk expansion before pools tighten.
- **CT migration** off r630-01 for load balance remains a **planned** action when maintenance windows and target storage allow (not automated here).
- **2026-03-28 (migration follow-up):** CT **3501** migrated to r630-02 **`thin5`** via `pvesh … lxc/3501/migrate --target-storage thin5`. CT **3500** had root LV removed after a mistaken `pct set --delete unused0` (config had `unused0: local-lvm:vm-3500-disk-0` and `rootfs: thin1:vm-3500-disk-0`); **3500** was recreated empty on r630-02 `thin5` — **reinstall Oracle Publisher** on the guest. See `MIGRATE_CT_R630_01_TO_R630_02.md`.
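The `credentials-file` approach recommended for the ml110 tunnel can be sketched as below. This is a minimal illustration only: the `/tmp` directory stands in for `/etc/cloudflared`, `<TUNNEL-UUID>` is a placeholder, and the repo's real unit lives at `scripts/cloudflare-tunnels/systemd/cloudflared-ml110.service`.

```bash
# Demo paths only; on the host this would be /etc/cloudflared/config.yml.
CONF_DIR=/tmp/cloudflared-demo
mkdir -p "$CONF_DIR"

# cloudflared reads the tunnel ID and credentials path from config.yml,
# so no token ever appears on the command line (and thus not in `ps`).
cat > "$CONF_DIR/config.yml" <<'EOF'
tunnel: <TUNNEL-UUID>
credentials-file: /etc/cloudflared/<TUNNEL-UUID>.json
EOF

# Keep the config (and the JSON credentials file it points at) tightly permissioned.
chmod 600 "$CONF_DIR/config.yml"
```

With this in place the unit can run `cloudflared tunnel run <name>` with no `--token` argument; rotating the token in Cloudflare remains necessary if the old one was ever logged.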
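The pipe-subshell pitfall fixed in `storage-monitor.sh` is standard bash behavior; here is a minimal standalone demo (not the script's actual code):

```bash
set -u
ALERTS=()

# A pipeline runs the while-loop in a subshell, so array appends are lost.
printf 'alert1\nalert2\n' | while read -r line; do ALERTS+=("$line"); done
echo "after pipe: ${#ALERTS[@]} alerts"        # still 0

# Process substitution keeps the loop in the main shell; appends survive.
while read -r line; do ALERTS+=("$line"); done < <(printf 'alert1\nalert2\n')
echo "after process substitution: ${#ALERTS[@]} alerts"   # 2
```

For the `set -e` side, the usual companion fix is guarding optional probes (e.g. SSH to a node that may be down) with `|| true` or an `if`, so one unreachable host cannot abort the whole run.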
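A quick way to flag VGs that are too tight for `lvextend` is to filter `vgs` output on `vg_free`. The sketch below inlines sample output so it is self-contained; on the hosts you would feed it `vgs --noheadings --units m -o vg_name,vg_free` instead, and the 1 GiB cutoff is illustrative.

```bash
# Sample `vg_name vg_free` lines standing in for real vgs output.
low=$(printf '%s\n' 'thin1 124.00m' 'thin2 2048.00m' |
  awk '{sub(/m$/, "", $2); if ($2 + 0 < 1024) print $1}')
echo "VGs too tight to lvextend: $low"
```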
**Output:** Snapshot file `logs/storage-growth/snapshot_YYYYMMDD_HHMMSS.txt`. Use `--append` to grow `logs/storage-growth/history.csv` for trend analysis.
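Trend analysis on `history.csv` is a simple delta between two timestamps. The column layout below (`timestamp,host,storage,used_pct`) is an assumption for illustration, not necessarily the file's real schema:

```bash
# Demo file standing in for logs/storage-growth/history.csv.
cat > /tmp/history-demo.csv <<'EOF'
2026-03-01T00:00,r630-01,data,68.0
2026-03-28T00:00,r630-01,data,71.6
EOF

# Delta of used% between the first and last sample for one host/storage pair.
delta=$(awk -F, '$2 == "r630-01" && $3 == "data" { u[n++] = $4 }
  END { printf "%.1f", u[n-1] - u[0] }' /tmp/history-demo.csv)
echo "used% delta over window: $delta"
```

Extrapolating that delta to a 30-day rate gives the "Est. monthly growth" figure for the table below.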
### Cron (proactive)
Use the scheduler script from project root (installs cron every 6 hours; uses `$PROJECT_ROOT`):
```bash
./scripts/maintenance/schedule-storage-growth-cron.sh --install # every 6h: collect + append
./scripts/maintenance/schedule-storage-growth-cron.sh --show # print cron line
```

**Retention:** Run `scripts/monitoring/prune-storage-snapshots.sh` weekly (e.g. keep the last 30 days of snapshot files). Options: `--days 14` or `--dry-run` to preview. See **STORAGE_GROWTH_AUTOMATION_TASKS.md** for the full automation list.
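The retention policy amounts to an age-based `find`. A self-contained sketch of the idea (demo directory, not the real `logs/storage-growth`; the actual script's interface is its `--days` / `--dry-run` flags):

```bash
# Demo: one snapshot older than the retention window, one recent.
SNAP_DIR=/tmp/prune-demo
mkdir -p "$SNAP_DIR"
touch -d '40 days ago' "$SNAP_DIR/snapshot_old.txt"
touch "$SNAP_DIR/snapshot_new.txt"

# Delete snapshots whose mtime is more than 30 full days old.
find "$SNAP_DIR" -name 'snapshot_*.txt' -mtime +30 -delete
ls "$SNAP_DIR"
```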
---
## 2. Predictable growth table (template)
Fill and refresh from real data. **Est. monthly growth** and **Growth factor** should be updated from `history.csv` or from observed rates.
| Host / VM | Storage / path | Current used | Capacity | Growth factor | Est. monthly growth | Threshold | Action when exceeded |
|---|---|---|---|---|---|---|---|
| **LVM thin pool data%** | Host (r630-01 `data`, r630-02 `thin*`, ml110 `thin1`) | | | | | 100% = no new writes | fstrim in CTs, migrate VMs, remove unused LVs, expand pool |
| **LVM thin metadata%** | Same | | | | | High metadata% can cause issues | Expand metadata LV or reduce snapshots |
| **RocksDB (Besu)** | `/data/besu` in 2101, 2500–2505, 2400, 2201, etc. | | | | | Grows with chain; compaction needs temp space | Ensure `/` and `/data` have headroom; avoid 100% thin pool |
| **Journal / systemd logs** | `/var/log` in every CT | | | | | Can grow if not rotated | logrotate, `journalctl --vacuum-time=7d` |
1. **Daily or every 6h:** Run `collect-storage-growth-data.sh --append` and inspect the latest snapshot under `logs/storage-growth/`.
2. **Weekly:** Review `logs/storage-growth/history.csv` for rising trends; update the **Predictable growth table** with current numbers and estimated monthly growth.
3. **When adding VMs or chain usage:** Re-estimate growth for the affected hosts and thin pools; adjust thresholds or capacity.
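Because snapshot filenames embed a sortable `YYYYMMDD_HHMMSS` timestamp, the latest one can be picked lexicographically. A sketch with demo files (the real snapshots live under `logs/storage-growth/`):

```bash
# Demo directory with two timestamped snapshots.
SNAP_DIR=/tmp/snap-demo
mkdir -p "$SNAP_DIR"
printf 'old\n' > "$SNAP_DIR/snapshot_20260327_060000.txt"
printf 'new\n' > "$SNAP_DIR/snapshot_20260328_060000.txt"

# Lexicographic sort == chronological sort for this naming scheme.
latest=$(ls "$SNAP_DIR"/snapshot_*.txt | sort | tail -n 1)
cat "$latest"
```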
---
## 5. Matching real-time data to the table
- **Host storage %:** From the script output sections “pvesm status” and “LVM thin pools (data%)”. Map to the row with “Host / VM” = host name and “Storage / path” = storage or LV name.
- **VM /, /data, /var/log:** From the “VM/CT on <host>” and “VMID <id>” sections in the same snapshot. Map to the row with “Host / VM” = VMID.
- **Growth over time:** Use `history.csv` (with `--append` runs). Compute delta of used% or used size between two timestamps to get rate; extrapolate to “Est. monthly growth” and “Action when exceeded”.
- **In-CT disk check:** `scripts/maintenance/check-disk-all-vmids.sh` (root /). Run daily via `daily-weekly-checks.sh` (cron 08:00).
- **Retention:** `scripts/monitoring/prune-storage-snapshots.sh` (snapshots), `scripts/monitoring/prune-storage-history.sh` (history.csv). Both run weekly when using `schedule-storage-growth-cron.sh --install`.
- **Weekly remediation:** `daily-weekly-checks.sh weekly` runs fstrim in all running CTs and journal vacuum in key CTs; see **STORAGE_GROWTH_AUTOMATION_TASKS.md**.
- **Making RPC VMIDs writable after full/read-only:** `scripts/maintenance/make-rpc-vmids-writable-via-ssh.sh`; see **502_DEEP_DIVE_ROOT_CAUSES_AND_FIXES.md**.
- **Thin pool full / migration:** **MIGRATE_CT_R630_01_TO_R630_02.md**, **R630-02_STORAGE_REVIEW.md**.
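The in-CT root check boils down to parsing `df -P` output against a used% threshold. A self-contained sketch with inlined sample output (the 85% cutoff is illustrative, not necessarily what `check-disk-all-vmids.sh` uses):

```bash
# Sample `df -P /` output standing in for a real in-CT run.
sample='Filesystem 1024-blocks Used Available Capacity Mounted-on
/dev/mapper/pve-root 8378368 7540531 837837 90% /'

# Strip the % sign from column 5 and flag filesystems over the threshold.
warn=$(printf '%s\n' "$sample" |
  awk 'NR > 1 { gsub(/%/, "", $5); if ($5 + 0 >= 85) print $6 " at " $5 "%" }')
echo "$warn"
```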