# Storage Recommendations by Fill Rate and Growth

**Last updated:** 2026-02-28

Based on current usage, the history in `logs/storage-growth/history.csv`, and the physical drive layout across ml110, r630-01, and r630-02.

**Completed (2026-02-28):** Storage growth cron verified; prune (VMID 5000 + r630-01 CTs) run; ml110 sdb added to VG `pve` and the data thin pool extended to ~1.7 TB (ml110 data now ~11% used).

**Phase 1 migration (r630-01 data → thin1):** 8 CTs migrated (10233, 10120, 10100, 10101, 10235, 10236, 7804, 8640); r630-01 data now **65.8%** (was 72%), thin1 50.6%.

---

## 1. Thresholds and monitoring

| Level | Use % | Action |
|-------|-------|--------|
| **Healthy** | < 75% | Continue normal collection; review quarterly. |
| **Watch** | 75–84% | Weekly review; plan prune or migration. |
| **WARN** | 85–94% | Prune and/or migrate within 1–2 weeks; do not add new large CTs. |
| **CRIT** | ≥ 95% | Immediate action; LVM thin pools can fail or go read-only. |

Current scripts: `check-disk-all-vmids.sh` uses WARN 85% and CRIT 95% for **container root** usage. These recommendations apply the same thresholds to **host storage (pvesm / LVM)**.

---

## 2. Observed fill behavior (from history)

| Host | Storage | Trend (recent) | Implied rate / note |
|------|---------|----------------|---------------------|
| **ml110** | data | ~28.7% → ~25% (Feb 15 → 27) | Slight decrease (prune/dedup). Plenty of free space. |
| **r630-01** | data | 88% → 100% → 72% → **65.8%** (Phase 1 migration) | After Phase 1 (8 CTs data → thin1). Main growth host (validators, RPCs, many CTs). |
| **r630-02** | thin1-r630-02 | ~26.5% stable | Low growth. |
| **r630-02** | thin2 | ~4.8% → ~9% after 5000 migration | Now holds Blockscout (5000); monitor. |
| **r630-02** | thin5 | 84.6% → 0% after migration | Empty; available for future moves. |

**Conclusion:** The pool that fills fastest and needs the most attention is **r630-01 data** (65.8% after Phase 1; many CTs, Besu/DB growth). **ml110 data** is stable and has headroom.
**r630-02** is manageable if you avoid concentrating more large CTs on a single thin pool.

---

## 3. Recommendations by host and pool

### ml110

- **data / local-lvm (~11% after sdb extension; was ~25%)**
  - **Rate:** Low/slow.
  - **Recommendations:**
    - Keep running `collect-storage-growth-data.sh --append` (e.g. via cron every 6 h).
    - Prune logs in CTs periodically (e.g. with `fix-storage-r630-01-and-thin5.sh`-style logic for ml110, or a dedicated prune script).
    - No urgency; review again when approaching 70%.
- **sdb (931G), now in use (added 2026-02-28)**
  - **Recommendation (completed):** Use it before adding new disks elsewhere.
  - **Option A (chosen):** Add sdb to VG `pve` and extend the `data` thin pool (or create a second thin pool). Frees pressure on sda and roughly doubles effective data capacity.
  - **Option B:** Create a separate VG + thin pool on sdb for new or migrated CTs.
  - Document the chosen layout and any new Proxmox storage names in `storage.cfg` and in `PHYSICAL_DRIVES_AND_CONFIG.md`.

### r630-01

- **data / local-lvm (~65.8% after Phase 1; was ~72%)**
  - **Rate:** Highest risk; this pool has the most CTs and Besu/DB growth.
  - **Recommendations:**
    1. **Short term:**
       - Run log/journal prune on all r630-01 CTs regularly (e.g. `fix-storage-r630-01-and-thin5.sh` Phase 2, or a cron job).
       - Keep storage growth collection running (e.g. every 6 h) and review weekly while > 70%.
    2. **Before 85%:**
       - Move one or more large CTs to **thin1** on r630-01 (~50.6% used after Phase 1, still has space) if VMIDs allow, or plan migration to r630-02 thin pools.
       - Identify the biggest CTs with `check-disk-all-vmids.sh` and `lvs` on r630-01 (data pool).
    3. **Before 90%:**
       - Decide on expansion (e.g. add disks to the RAID10 array and extend md0/LVM) or permanent migration of several CTs to r630-02.
       - **Do not** let this pool sit above 85% for long; it has already hit 100% once.
- **thin1 (~50.6% after Phase 1)**
  - **Rate:** Moderate.
  - **Recommendations:** Use as spillover for data pool migrations when possible. Monitor monthly; act if > 75%.

### r630-02

- **thin1-r630-02 (~26%)**
  - **Rate:** Low.
  - **Recommendation:** Monitor; no change needed unless you add many CTs here.
- **thin2 (~9% after 5000 migration)**
  - **Rate:** May grow with Blockscout (5000) and other CTs.
  - **Recommendations:**
    - Run the VMID 5000 prune periodically: `vmid5000-free-disk-and-logs.sh`.
    - If thin2 approaches 75%, consider moving one CT to thin5 (now empty) or thin6.
- **thin3, thin4, thin6 (roughly 11–22%)**
  - **Rate:** Low to moderate.
  - **Recommendation:** Include in the weekly pvesm/lvs review; no special action unless one pool trends > 75%.
- **thin5 (0% after migration)**
  - **Recommendation:** Keep as reserve for migrations from thin2 or other pools when they approach WARN.

---

## 4. Operational schedule (by fill rate)

| When | Action |
|------|--------|
| **Always** | Cron: `collect-storage-growth-data.sh --append` every 6 h; weekly: `prune-storage-snapshots.sh` (e.g. Sun 08:00). |
| **Weekly** | Review `pvesm status` and `lvs` (or run `audit-proxmox-rpc-storage.sh`); check any pool > 70%. |
| **75% ≤ use < 85%** | Plan and run prune; plan migration for the largest CTs on that pool. |
| **85% ≤ use < 95%** | Execute prune and migration within 1–2 weeks; do not add new large VMs/CTs to that pool. |
| **≥ 95%** | Immediate prune + migration; consider emergency migration to ml110 (data pool now extended) or r630-02. |

---

## 5. Scripts to support these recommendations

| Script | Purpose |
|--------|---------|
| `scripts/monitoring/collect-storage-growth-data.sh --append` | Record fill over time (for rate). |
| `scripts/maintenance/schedule-storage-growth-cron.sh --install` | Install 6 h collect + weekly prune. |
| `scripts/audit-proxmox-rpc-storage.sh` | Current pvesm + RPC rootfs mapping. |
| `scripts/maintenance/check-disk-all-vmids.sh` | Per-CT disk usage (find big consumers). |
| `scripts/maintenance/fix-storage-r630-01-and-thin5.sh` | Prune 5000 + r630-01 CT logs; optional migrate 5000. |
| `scripts/maintenance/migrate-ct-r630-01-data-to-thin1.sh` | Migrate one CT from r630-01 data → thin1 (same host). |
| `scripts/maintenance/vmid5000-free-disk-and-logs.sh` | Prune Blockscout (5000) only. |

---

## 6. Adding ml110 sdb to increase capacity (completed 2026-02-28)

1. On ml110: `vgextend pve /dev/sdb` (if sdb is already a PV) or `pvcreate /dev/sdb && vgextend pve /dev/sdb`.
2. Extend the data thin pool: `lvextend -L +900G /dev/pve/data` (or use `lvextend -l +100%FREE` and adjust as needed).
3. Re-run `pvesm status` and update documentation.
4. No CT migration is required; existing LVs on data can use the new space.

(If sdb is a raw disk with no PV, partition it or use the full disk as a PV per your policy; then add it to `pve` and extend the data thin pool as above.)

---

## 7. Summary table by risk

| Host | Pool | Current (approx) | Risk | Priority recommendation |
|------|------|------------------|------|-------------------------|
| ml110 | data | ~11% (post-extension) | Low | **Done:** sdb added; pool ~1.7 TB. Monitor as before. |
| ml110 | sdb | In use (extended data) | — | **Done:** sdb added to `pve`; data thin pool extended (~1.7 TB total). |
| r630-01 | data | ~65.8% (post-Phase 1) | High | Prune weekly; plan further migrations before 85%; consider thin1 spillover. |
| r630-01 | thin1 | ~50.6% (post-Phase 1) | Medium | Use for migrations from data; monitor monthly. |
| r630-02 | thin1-r630-02 | ~26% | Low | Monitor. |
| r630-02 | thin2 | ~9% | Low | Prune 5000 periodically; watch growth. |
| r630-02 | thin5 | 0% | Low | Keep as reserve for migrations. |
| r630-02 | thin3, thin4, thin6 | ~11–22% | Low | Include in weekly review. |

These recommendations are based on the fill rates observed in history and the current configuration; adjust the thresholds or schedule if your growth pattern changes.
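The section 1 thresholds can be checked mechanically during the weekly review. A minimal sketch (the `classify` helper is hypothetical, not one of the repo scripts, and the `pvesm status` column parsing in the comment is an assumption to verify against your Proxmox version):

```shell
#!/bin/sh
# Map a pool's use% onto the section 1 levels.
# classify <integer use%>  -> prints Healthy / Watch / WARN / CRIT
classify() {
    pct=$1
    if   [ "$pct" -ge 95 ]; then echo "CRIT"
    elif [ "$pct" -ge 85 ]; then echo "WARN"
    elif [ "$pct" -ge 75 ]; then echo "Watch"
    else                         echo "Healthy"
    fi
}

# In a weekly review this would be fed from `pvesm status` on each host,
# e.g. (the use% column position is an assumption; check your output):
#   pvesm status | awk 'NR>1 {gsub(/\..*$|%/, "", $7); print $1, $7}' |
#     while read -r pool pct; do echo "$pool: $(classify "$pct")"; done

# Demo with the approximate section 7 figures plus one hypothetical pool:
classify 66   # r630-01 data (post-Phase 1) -> Healthy
classify 51   # r630-01 thin1               -> Healthy
classify 87   # hypothetical pool at 87%    -> WARN
```

The 75/85/95 cutoffs mirror the table in section 1, so a pool that prints WARN here is the same pool the schedule in section 4 tells you to prune or migrate within 1–2 weeks.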