| **A1** | **Storage snapshot + history append** | Run `collect-storage-growth-data.sh --append` on a schedule so `history.csv` grows for trend analysis. | Cron every 6 hours (or daily). Install from a persistent host checkout, e.g. `CRON_PROJECT_ROOT=/srv/proxmox bash scripts/maintenance/schedule-storage-growth-cron.sh --install`. |
| **A2** | **Snapshot retention** | Prune old snapshot files under `logs/storage-growth/` so the directory does not grow unbounded. | **Done.** Script: `scripts/monitoring/prune-storage-snapshots.sh` (default keep 30 days; `--days N`, `--dry-run`). Schedule weekly or run manually. |
| **A3** | **History CSV retention** | Cap `history.csv` size (keep last 10k rows or ~90 days). | **Done.** Script: `scripts/monitoring/prune-storage-history.sh` (default 90 days proxy; `--max-rows N`, `--days N`, `--dry-run`). Run weekly via `schedule-storage-growth-cron.sh` (prune line). |
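
The row cap described in A3 can be sketched as follows. This is a minimal illustration, not the contents of `prune-storage-history.sh`: the function name `prune_history_rows` is hypothetical, and it assumes `history.csv` begins with a single header line that must survive pruning.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: keep the header plus the newest N data rows of a CSV.
# Assumes line 1 of the file is a header (an assumption, not confirmed here).
prune_history_rows() {
  local file="$1" max_rows="$2" tmp
  tmp="$(mktemp)" || return 1
  # Header first, then the last max_rows lines of the remaining data.
  { head -n 1 "$file"; tail -n +2 "$file" | tail -n "$max_rows"; } > "$tmp" || return 1
  # Overwrite in place so the original file's permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Calling `prune_history_rows logs/storage-growth/history.csv 10000` would keep the header and the most recent 10,000 rows; a real script would likely add a dry-run mode before overwriting anything.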
### 1.2 Threshold checks and alerting
| # | Task | Description | How |
|---|------|-------------|-----|
| **A4** | **Thin pool / pvesm check (all hosts)** | Fail or warn when any host’s thin pool or pvesm storage is ≥ 95% (critical) or ≥ 80% (warn). | **Done.** In `daily-weekly-checks.sh` weekly (F3/M2). |
| **A5** | **In-CT disk check in cron** | Run `check-disk-all-vmids.sh` on a schedule and log or alert on WARN/CRIT. | **Done.** Called from `daily-weekly-checks.sh` daily (cron 08:00). |
| **A7** | **Metric file for alerting** | Write a metric file (e.g. `logs/storage-growth/last_run.metric`) with max thin pool % and timestamp so an external monitor can alert. | **Done.** Weekly run writes `STORAGE_METRIC_FILE` (storage_max_pct, storage_metric_timestamp). |
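
As an illustration of A4 and A7 together, the sketch below classifies a usage percentage against warn/critical thresholds and writes a metric file. The function names and the `key value` line format are assumptions made for this example; only the keys `storage_max_pct` and `storage_metric_timestamp` come from the table above. The thresholds default to the 80/95 values in A4 and can be overridden per call.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: map a usage percentage to OK / WARN / CRIT.
# Usage: classify_pct <pct> [warn_threshold] [crit_threshold]
classify_pct() {
  local pct="${1%.*}" warn="${2:-80}" crit="${3:-95}"  # drop any decimal part
  if [ "$pct" -ge "$crit" ]; then
    echo CRIT
  elif [ "$pct" -ge "$warn" ]; then
    echo WARN
  else
    echo OK
  fi
}

# Hypothetical sketch: write the max percentage and a Unix timestamp.
# The "key value" layout is an assumed format, not the script's actual one.
write_metric_file() {
  local file="$1" max_pct="$2"
  mkdir -p "$(dirname "$file")" || return 1
  printf 'storage_max_pct %s\nstorage_metric_timestamp %s\n' \
    "$max_pct" "$(date +%s)" > "$file"
}
```

An external monitor could then alert both on the percentage and on a stale timestamp, which catches the case where the weekly run silently stops.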
### 1.3 Proactive remediation (optional)
| # | Task | Description | How |
|---|------|-------------|-----|
| **A8** | **Weekly fstrim in CTs** | Run `fstrim` inside running CTs on hosts with thin pools to reclaim space. | **Done.** `scripts/maintenance/fstrim-all-running-ct.sh`; run from `daily-weekly-checks.sh` weekly. |
| **A10** | **Journal vacuum** | Run `journalctl --vacuum-time=7d` in key CTs on a schedule. | **Done.** `scripts/maintenance/journal-vacuum-key-ct.sh`; run from `daily-weekly-checks.sh` weekly. |
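
The A8 loop can be sketched roughly as below. It is not the content of `fstrim-all-running-ct.sh`: the function names are hypothetical, and it assumes it runs directly on a Proxmox host where `pct list` prints a header line followed by rows with the VMID in the first column and the status in the second. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print VMIDs of running containers.
# Reads `pct list`-style text on stdin (assumed: header line, VMID, Status).
running_ct_ids() {
  awk 'NR > 1 && $2 == "running" { print $1 }'
}

# Hypothetical sketch: trim every running CT on the local host.
# With DRY_RUN=1 the commands are only printed.
fstrim_running_cts() {
  local vmid
  for vmid in $(pct list | running_ct_ids); do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "pct fstrim $vmid"
    else
      # Keep going if one CT fails so the rest are still trimmed.
      pct fstrim "$vmid" || echo "WARN: fstrim failed for CT $vmid" >&2
    fi
  done
}
```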
---
## 2. Fixes required
| # | Fix | Location | Detail |
|---|-----|----------|--------|
| **F1** | **Implement or remove `--json`** | `scripts/monitoring/collect-storage-growth-data.sh` | **Done.** `--json` outputs a JSON object with `timestamp` and `csv_rows` (array of CSV line strings). |
| **F2** | **CSV quoting for detail column** | `scripts/monitoring/collect-storage-growth-data.sh` | **Done.** Detail field is quoted when it contains commas or quotes via `csv_quote()`. |
| **F3** | **Thin pool check on all three hosts** | `scripts/maintenance/daily-weekly-checks.sh` | **Done.** [138a] now runs thin pool/storage check on r630-02, r630-01, and ml110 (WARN ≥85%, FAIL ≥95%/100%). |
| **M2** | **Extend weekly checks to all-host thin pool** | **Done.** Implemented with F3 in `daily-weekly-checks.sh`: `check_thin_pool_one_host` for r630-02, r630-01, ml110. |
| **M3** | **Doc and index updates** | **Done.** `STORAGE_GROWTH_AND_HEALTH.md` references `schedule-storage-growth-cron.sh` and the prune script; MASTER_INDEX and OPERATIONAL_RUNBOOKS list the storage growth cron. |
| **M4** | **Optional: CI job** | Add a GitHub Actions (or Gitea) workflow that runs `collect-storage-growth-data.sh --csv` (or a dry run that only checks script syntax / host reachability) so config changes don’t break the script. Optional because the script requires LAN/SSH to hosts. |
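
The quoting rule in F2 can be illustrated with the sketch below. The name `csv_quote` comes from the table above, but the body is an assumption about how such a helper might work, following the usual CSV convention: a field containing a comma or a double quote is wrapped in double quotes, and embedded quotes are doubled.

```bash
#!/usr/bin/env bash
# Sketch of a csv_quote helper (assumed behaviour, not the script's code).
# Fields with a comma or a double quote are wrapped in quotes, with any
# embedded double quotes doubled; other fields are printed unchanged.
csv_quote() {
  local field="$1" q='"' escaped
  case "$field" in
    *"$q"*|*,*)
      escaped="$(printf '%s' "$field" | sed 's/"/""/g')"
      printf '"%s"' "$escaped"
      ;;
    *)
      printf '%s' "$field"
      ;;
  esac
}
```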
---
## 4. Implementation order
1. **F2** (CSV quoting) and **F1** (`--json`) in `collect-storage-growth-data.sh`.
2. **M1** Add `schedule-storage-growth-cron.sh` and **M3** update docs.
3. **F3** and **M2** Extend `daily-weekly-checks.sh` to check thin pool on all three hosts.
4. **A1** Install storage growth cron (via M1).
5. **A2** Add `prune-storage-snapshots.sh` and schedule weekly (or in same cron wrapper).
6. **A4/A7** Optionally have weekly check write a metric file; wire A5 (`check-disk-all-vmids.sh`) into daily if desired.
7. **A8–A10** As needed (fstrim, logrotate audit, journal vacuum).
---
## 5. Quick reference
| Script | Purpose |
|--------|---------|
| `scripts/monitoring/collect-storage-growth-data.sh` | Collect host + VM storage; output snapshot + optional growth table; `--append` for history.csv. |