# SolaceScanScout Deep-Dive: All Fixes Needed & Proactive vs Reactive Timing
Last Updated: 2026-02-09
Purpose: Investigate all fixes needed for the explorer, and define correct timing so we can be proactive instead of reactive.
Related: SOLACESCANSCOUT_CONNECTIONS_FULL_TREE.md, SOLACESCANSCOUT_REVIEW.md, BLOCKSCOUT_FIX_RUNBOOK.md.
## Quick reference: when to act

| Frequency | What to do | Script / location |
|---|---|---|
| One-time / after change | Fix RPC URL on 5000 if RPC VMID retired; SSL in NPMplus; migrate 5000 to thin5 if thin1 full | BLOCKSCOUT_FIX_RUNBOOK; NEXT_STEPS_OPERATOR |
| Daily 08:00 | Explorer HTTPS + API must pass; indexer lag (RPC block vs explorer block) < threshold; RPC 2201 up | daily-weekly-checks.sh (harden per §6.1) |
| Weekly (e.g. Sun) | Explorer logs review; thin pool usage on r630-02 (warn >85%) | O-4; new thin-pool check §6.2 |
| On deploy / NPMplus change | E2E routing; full explorer E2E from LAN; Blockscout migrations if needed | verify-end-to-end-routing.sh; e2e-test-explorer.sh; fix-blockscout-ssl-and-migrations.sh |
## 1. Executive Summary

| Category | Reactive (we discover when it breaks) | Proactive (we detect before users do) |
|---|---|---|
| Explorer sync stop | Users see stale blocks; 15-day lag happened Jan 2026 | Daily check: compare RPC block vs explorer block; alert if lag > N blocks |
| 502 / DB / migrations | Public 502 on explorer.d-bis.org | Daily: HTTPS + API reachability; weekly: logs; storage check before full |
| Thin pool full | "No space left on device"; Docker/Blockscout fail | Weekly (or before major deploys): thin pool % on r630-02 |
| RPC endpoint wrong/down | Indexer stops (e.g. VMID 2500 destroyed) | Daily: RPC 2201 health; dependency list reviewed on infra changes |
| SSL / NPMplus | "Connection isn't private" or 502 | E2E run (e.g. after NPMplus changes); optional cert expiry check |
| Frontend/API config | Wrong API URL or missing routes | After deploy: E2E + explorer E2E from LAN |
**Key insight:** The Jan 2026 “explorer 15 days behind” incident was reactive: we had no check comparing the chain-head block to the explorer’s last indexed block. The daily cron only checks that the API returns 200 with total_blocks, and it does not fail when Blockscout is unreachable (it logs SKIP), so we stayed green until someone looked at the UI.
## 2. Complete Fix Inventory (All Known Issues & Fixes)

### 2.1 Critical (Explorer Unusable or Stale)

| # | Issue | Root Cause | Fix | Runbook / Script |
|---|---|---|---|---|
| C1 | Explorer stopped indexing (blocks stale) | RPC unreachable (wrong IP or VM down), or indexer/DB crash | Point ETHEREUM_JSONRPC_HTTP_URL to working RPC (e.g. 192.168.11.221:8545); restart Blockscout; fix DB if needed | SOLACESCANSCOUT_REVIEW.md; BLOCKSCOUT_FIX_RUNBOOK |
| C2 | 502 Bad Gateway on explorer.d-bis.org | Blockscout or Postgres down; or postgres nxdomain (Docker DNS); or thin pool full | Restart stack; fix Docker network/DB URL; or migrate VM 5000 to thin5 | BLOCKSCOUT_FIX_RUNBOOK; fix-blockscout-ssl-and-migrations.sh; fix-blockscout-1.sh |
| C3 | SSL/migrations (migrations_status, blocks table missing) | ECTO_USE_SSL=TRUE vs Postgres without SSL | Run migrations with ?sslmode=disable and ECTO_USE_SSL=false; persist in docker-compose/.env | fix-blockscout-ssl-and-migrations.sh |
| C4 | No space left on device (thin pool 100%) | thin1-r630-02 full; VM 5000 on thin1 | Migrate VMID 5000 to thin5 (vzdump → destroy → restore to thin5); or free thin1 by moving other VMs | BLOCKSCOUT_FIX_RUNBOOK; fix-blockscout-1.sh |
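The C1 fix above can be sketched as a small shell helper. This is a sketch only: the env file path (/opt/blockscout/.env) and the systemd unit name (blockscout) are assumptions, not confirmed values from VM 5000; verify both before running anything.

```shell
#!/bin/sh
# Sketch of the C1 fix: repoint Blockscout's RPC URL and restart.
# ASSUMPTIONS: env file path and unit name are illustrative.

NEW_RPC="http://192.168.11.221:8545"

# Replace (or append) the RPC URL in a dotenv-style file.
set_rpc_url() {
  env_file="$1"
  if grep -q '^ETHEREUM_JSONRPC_HTTP_URL=' "$env_file"; then
    sed -i "s|^ETHEREUM_JSONRPC_HTTP_URL=.*|ETHEREUM_JSONRPC_HTTP_URL=$NEW_RPC|" "$env_file"
  else
    echo "ETHEREUM_JSONRPC_HTTP_URL=$NEW_RPC" >> "$env_file"
  fi
}

# On the Proxmox host, apply inside container 5000 and restart, e.g.:
#   pct exec 5000 -- sh -c '<define set_rpc_url> && set_rpc_url /opt/blockscout/.env'
#   pct exec 5000 -- systemctl restart blockscout
```

The same helper also covers H1, since both reduce to updating the env var and restarting.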
### 2.2 High (Degraded or One-Time Config)

| # | Issue | Root Cause | Fix | Runbook / Script |
|---|---|---|---|---|
| H1 | RPC endpoint pointed to destroyed VM (e.g. 2500) | VMID 2500 decommissioned; Blockscout env not updated | Set ETHEREUM_JSONRPC_HTTP_URL=http://192.168.11.221:8545 (and WS if used) in Blockscout env on VM 5000 | SOLACESCANSCOUT_REVIEW.md |
| H2 | Explorer SSL "connection isn't private" | No or invalid Let's Encrypt cert for explorer.d-bis.org in NPMplus | NPMplus UI: SSL Certificates → request for explorer.d-bis.org; assign to proxy host, Force SSL | NEXT_STEPS_OPERATOR.md § Explorer SSL |
| H3 | NPMplus proxy wrong for explorer | Proxy host points to wrong IP/port | Update explorer.d-bis.org proxy to http://192.168.11.140:80 (and :4000 if API separate) | update-npmplus-proxy-hosts-api.sh; RPC_ENDPOINTS_MASTER.md |
| H4 | Blockscout container or service exited | Crash or OOM; systemd "active (exited)" | Restart: pct exec 5000 -- systemctl restart blockscout or docker-compose up -d; check logs | SOLACESCANSCOUT_REVIEW.md; OPERATIONAL_RUNBOOKS [138] |
### 2.3 Medium (Operational / Optional)

| # | Issue | Root Cause | Fix | Runbook / Script |
|---|---|---|---|---|
| M1 | Forge verification fails (params module/action) | Blockscout API expects query params; Forge sends JSON | Use run-contract-verification-with-proxy.sh or verify manually in the explorer UI | BLOCKSCOUT_FIX_RUNBOOK § Forge |
| M2 | Custom frontend not served (wrong index.html or nginx) | Nginx serves Blockscout at / instead of SolaceScanScout index.html | deploy-frontend-to-vmid5000.sh; fix-nginx-serve-custom-frontend.sh | deploy-frontend-to-vmid5000.sh |
| M3 | Token list stale | Token list not updated after new tokens | Bump version/timestamp in dbis-138.tokenlist.json; validate; update explorer/config API reference | OPERATIONAL_RUNBOOKS [139]; TOKEN_LIST_AUTHORING_GUIDE |
| M4 | Explorer logs full or errors unnoticed | No log review; disk full in container | Weekly log review; cleanup-blockscout-journal.sh if needed | OPERATIONAL_RUNBOOKS [138] (O-4) |
### 2.4 One-Time / After Change

| # | Issue | When | Fix |
|---|---|---|---|
| O1 | After destroying or changing RPC VMIDs | Any RPC VMID decommissioned or IP change | Update Blockscout env (and any script default RPC) to current RPC; update config/ip-addresses.conf and docs |
| O2 | After NPMplus restore or major config change | Restore from backup; new NPMplus instance | Re-verify proxy hosts (explorer.d-bis.org → 192.168.11.140:80); re-request SSL if needed |
| O3 | After Proxmox storage change | New thin pool; migration of VMs | Update BLOCKSCOUT_FIX_RUNBOOK and fix-blockscout-1.sh if default storage names change |
## 3. Reactive vs Proactive: When We Learn About Each Issue

| Issue | Reactive trigger (we find out when…) | Proactive detection (we could find out by…) |
|---|---|---|
| C1 Sync stop | User or operator notices blocks are old | Daily: compare RPC eth_blockNumber to Blockscout /api/v2/stats (or indexer block); alert if lag > e.g. 100 blocks or 10 min |
| C2 502 / DB | User gets 502; or E2E fails | Daily: GET https://explorer.d-bis.org and https://explorer.d-bis.org/api/v2/stats; fail if non-2xx |
| C3 SSL/migrations | Blockscout won’t start or crashes on boot | On deploy/restart: run migrations with correct flags; weekly: review logs for migration/DB errors |
| C4 Thin pool full | Docker or pct fails with "no space left" | Weekly (or before big deploy): on r630-02 run lvs / pvesm status and check thin1 (and thin5) usage; alert if >85% |
| H1 Wrong RPC | Indexer stops when that RPC is gone | On infra change, checklist: “Update Blockscout RPC URL if any RPC VMID/IP changed.” Daily: RPC 2201 health (already in daily-weekly-checks) |
| H2 SSL | User sees certificate warning | E2E run after NPMplus changes; optional monthly cert expiry check |
| H3 NPMplus proxy wrong | 502 or wrong site when opening explorer.d-bis.org | E2E: verify-end-to-end-routing.sh (DNS, SSL, HTTPS 200) |
| H4 Container exited | 502 or API down | Daily: same as C2 (HTTPS + API); weekly: logs (O-4) |
## 4. Current Monitoring vs What’s Missing

### 4.1 What Exists Today

| Check | Frequency | Script / Cron | Limitation |
|---|---|---|---|
| Explorer indexer (API reachable) | Daily 08:00 | daily-weekly-checks.sh [135] | Does not fail when Blockscout unreachable (logs SKIP) |
| RPC 2201 health | Daily 08:00 | daily-weekly-checks.sh [136] | Good; fails if RPC down |
| Config API | Weekly Sun 09:00 | daily-weekly-checks.sh [137] | Not explorer-specific |
| Explorer logs | Weekly (manual) | OPERATIONAL_RUNBOOKS [138] | Reminder only; no automated parse |
| E2E (DNS, SSL, HTTPS) | On-demand | verify-end-to-end-routing.sh | Optional Blockscout API; can skip off-LAN |
| Explorer + block production | On-demand | verify-explorer-and-block-production.sh | Compares RPC block to chain; does not compare explorer block to RPC block (indexer lag) |
| Thin pool | On-demand | fix-blockscout-1.sh (when already broken); investigate-thin2-storage.sh | No scheduled thin pool check for r630-02 thin1 |
### 4.2 Gaps (Why We Were Reactive)

- **No indexer lag check.** We never compare the latest block on the RPC with the latest block in Blockscout, so we don’t detect “API is up but indexer stopped” until someone looks at the UI or block count.
- **Explorer check is soft.** If Blockscout is down, daily-weekly-checks.sh prints SKIP and does not increment FAILED; cron stays “green” while the explorer is broken.
- **No thin pool monitoring.** thin1-r630-02 can reach 100% with no alert; the first sign is often “no space left on device” during a restart or pull.
- **No automated alerting.** Cron only logs to a file; there is no email, PagerDuty, or dashboard that goes red when the explorer or RPC fails.
- **RPC dependency not formalized.** When VMID 2500 was destroyed, Blockscout’s RPC URL wasn’t on a dependency list reviewed during infra changes.
## 5. Recommended Proactive Timing

### 5.1 One-Time (Do Once or After Change)

| Action | When | Owner |
|---|---|---|
| Fix RPC URL on VM 5000 | Already done (192.168.11.221). Re-do whenever an RPC VMID used by the explorer is retired or re-IP’d | Ops |
| Add explorer.d-bis.org to “infra dependency” list | When documenting RPC/explorer relationship | Ops |
| Request SSL for explorer.d-bis.org in NPMplus | Once (and after any NPMplus restore that loses certs) | Ops |
| Migrate VM 5000 to thin5 if thin1 is near full | Once (or when thin1 >85%) | Ops |
### 5.2 Daily (Catch Outages and Sync Stop)

| Action | When | Implementation |
|---|---|---|
| Explorer HTTPS 200 | Daily 08:00 (with existing cron) | Add to daily-weekly-checks: GET https://explorer.d-bis.org; fail if not 2xx (run from a host that can reach it, or use the public URL) |
| Explorer API 200 + body | Daily 08:00 | Same script: GET https://explorer.d-bis.org/api/v2/stats (or http://192.168.11.140:4000 from LAN); fail if not 200 or missing total_blocks/total_transactions |
| Indexer lag | Daily 08:00 | New check: (1) RPC eth_blockNumber → chain_head; (2) Blockscout API → last indexed block (or total_blocks); (3) if chain_head − last_indexed > threshold (e.g. 100 blocks or 5 min), fail |
| RPC 2201 health | Already daily 08:00 | Keep as-is (critical for indexer) |
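The HTTPS and API rows above can share one small helper. A minimal sketch, assuming the FAILED-counter convention from daily-weekly-checks.sh (the exact variable name and integration point in that script are assumptions):

```shell
#!/bin/sh
# Sketch of the daily explorer reachability checks (§5.2).
# ASSUMPTION: FAILED mirrors the counter in daily-weekly-checks.sh.

FAILED=0

# Treat any 2xx status as healthy; everything else fails the run.
status_ok() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

check_url() {
  url="$1"; label="$2"
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 15 "$url")
  if status_ok "$code"; then
    echo "OK   $label ($code)"
  else
    echo "FAIL $label ($code)"
    FAILED=$((FAILED + 1))
  fi
}

# check_url "https://explorer.d-bis.org" "explorer homepage"
# check_url "https://explorer.d-bis.org/api/v2/stats" "explorer API"
# exit "$FAILED"   # non-zero exit lets cron surface the failure
```

Note that curl returns 000 when it cannot connect at all, which status_ok correctly treats as a failure, unlike the current SKIP behavior.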
### 5.3 Weekly (Catch Slow Degradation)

| Action | When | Implementation |
|---|---|---|
| Review explorer logs | Weekly (e.g. Sun 09:00) | Keep O-4: pct exec 5000 -- journalctl -u blockscout -n 200 (or SSH); optional: grep for ERROR / nxdomain / ssl |
| Thin pool usage r630-02 | Weekly (e.g. Sun) or before major deploy | New: SSH to r630-02, run pvesm status \| grep thin and/or lvs \| grep thin; warn if thin1 >85%; fail if 100% |
| Config API | Already weekly | Keep [137] |
### 5.4 On-Deploy / On-Change

| Action | When | Implementation |
|---|---|---|
| E2E routing | After NPMplus or DNS changes | Run verify-end-to-end-routing.sh (include explorer.d-bis.org) |
| Full explorer E2E (LAN) | After frontend or Blockscout deploy | Run explorer-monorepo/scripts/e2e-test-explorer.sh from LAN |
| Blockscout migrations | Before/after Blockscout version or config change | fix-blockscout-ssl-and-migrations.sh or manual migration with sslmode=disable |
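The migrations fix (C3 above) reduces to two settings. A hedged .env fragment, using Blockscout's standard environment variable names; the host, database name, and password shown are placeholders, not the real values on VM 5000:

```shell
# Illustrative .env fragment for the C3/§5.4 migrations fix.
# CHANGE_ME, host "db", and db name are placeholders — use the
# values from VM 5000's existing configuration.
DATABASE_URL="postgresql://blockscout:CHANGE_ME@db:5432/blockscout?sslmode=disable"
ECTO_USE_SSL=false
```

After changing these, re-run migrations (fix-blockscout-ssl-and-migrations.sh) and restart the stack so the settings persist across redeploys.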
## 6. Concrete Script and Cron Changes
### 6.1 Harden daily-weekly-checks.sh (Explorer)
- Current: [135] Explorer indexer check curls :4000; on failure it prints SKIP and does not increment FAILED.
- Change:
  - Option A (minimal): when running from LAN (or when PUBLIC_EXPLORER_CHECK=1), also GET https://explorer.d-bis.org. If both the API and the homepage fail, increment FAILED.
  - Option B (recommended): add an indexer lag check:
    - From LAN: get the RPC block number (192.168.11.221:8545 eth_blockNumber).
    - Get the Blockscout last block from /api/v2/stats or /api/v2/blocks (or indexer stats).
    - If RPC_block − explorer_block > 500 (or time-based, e.g. >10 min), increment FAILED and log “Explorer indexer lag > 500 blocks”.
- Ensure at least one explorer check fails the daily run when the explorer is clearly broken (e.g. API unreachable from LAN).
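Option B can be sketched as follows. Assumptions worth verifying before wiring this into cron: jq is installed on the host running the check, and /api/v2/stats exposes total_blocks (the field the existing daily check already reads).

```shell
#!/bin/sh
# Sketch of the §6.1 Option B indexer-lag check.
# ASSUMPTIONS: jq available; total_blocks present in /api/v2/stats.

RPC_URL="http://192.168.11.221:8545"
EXPLORER_API="http://192.168.11.140:4000/api/v2/stats"
MAX_LAG=500

# eth_blockNumber returns hex ("0x1a2b"); convert to decimal.
hex_to_dec() {
  printf '%d\n' "$(( $1 ))"
}

# True (exit 0) when the indexer is more than $3 blocks behind.
lag_exceeded() {
  chain_head="$1"; last_indexed="$2"; threshold="$3"
  [ $((chain_head - last_indexed)) -gt "$threshold" ]
}

# chain_hex=$(curl -s "$RPC_URL" -H 'Content-Type: application/json' \
#   -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#   | jq -r .result)
# chain_head=$(hex_to_dec "$chain_hex")
# last_indexed=$(curl -s "$EXPLORER_API" | jq -r '.total_blocks' | tr -d ',')
# if lag_exceeded "$chain_head" "$last_indexed" "$MAX_LAG"; then
#   echo "FAIL Explorer indexer lag > $MAX_LAG blocks"   # and increment FAILED
# fi
```

If total_blocks lags behind the true last-indexed block number (e.g. after a partial reindex), a time-based variant using the latest block's timestamp from /api/v2/blocks may be more robust.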
### 6.2 Add Weekly Thin Pool Check
- New script or weekly block: on r630-02 (192.168.11.12), run:
  ssh root@192.168.11.12 'pvesm status 2>/dev/null | grep -E "thin1|thin5"'
- Parse usage (e.g. 5th column); if thin1-r630-02 > 85%, log a warning; if 100%, fail.
- Cron: add to the weekly branch of schedule-daily-weekly-cron.sh, or a separate weekly script that runs Sunday.
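A sketch of the parsing step. This assumes pvesm status prints total and used bytes in columns 4 and 5; verify the column layout on r630-02 before relying on it, since it can differ between Proxmox versions.

```shell
#!/bin/sh
# Sketch of the §6.2 weekly thin-pool check.
# ASSUMPTION: columns 4/5 of `pvesm status` are total/used.

WARN_PCT=85

# Integer percent-used from a "name type status total used ..." line.
usage_pct() {
  echo "$1" | awk '{ if ($4 > 0) printf "%d\n", ($5 * 100) / $4; else print 0 }'
}

check_pool() {
  line="$1"
  name=$(echo "$line" | awk '{print $1}')
  pct=$(usage_pct "$line")
  if [ "$pct" -ge 100 ]; then
    echo "FAIL $name at ${pct}% (full)"
  elif [ "$pct" -gt "$WARN_PCT" ]; then
    echo "WARN $name at ${pct}%"
  else
    echo "OK   $name at ${pct}%"
  fi
}

# ssh root@192.168.11.12 'pvesm status 2>/dev/null' \
#   | grep -E 'thin1|thin5' \
#   | while read -r l; do check_pool "$l"; done
```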
### 6.3 Optional: Alerting

- Pipe daily/weekly check output to a log; add a wrapper that:
  - sends email or Slack when FAILED > 0, or
  - writes a file that Prometheus/Grafana can scrape (e.g. “explorer_ok 0” vs “explorer_ok 1”).
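The Prometheus route can be a node_exporter textfile metric written at the end of the daily run. A sketch; the textfile collector path is illustrative, and whether node_exporter runs on the check host is an open assumption:

```shell
#!/bin/sh
# Sketch of the §6.3 alerting hook: node_exporter textfile metric.
# ASSUMPTION: the textfile collector path is illustrative.

METRIC_FILE="${METRIC_FILE:-/var/lib/node_exporter/textfile/explorer.prom}"

# Write explorer_ok 1 when FAILED == 0, else explorer_ok 0.
write_metric() {
  failed="$1"
  if [ "$failed" -eq 0 ]; then ok=1; else ok=0; fi
  printf '# HELP explorer_ok 1 if daily explorer checks passed\n' > "$METRIC_FILE"
  printf '# TYPE explorer_ok gauge\n' >> "$METRIC_FILE"
  printf 'explorer_ok %d\n' "$ok" >> "$METRIC_FILE"
}

# After the daily checks: write_metric "$FAILED"
# An email hook could be as simple as:
#   [ "$FAILED" -gt 0 ] && mail -s "explorer checks FAILED" ops@... < "$LOG"
```

Grafana can then alert on explorer_ok == 0, which also catches the case where the check script itself stops running (stale metric file).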
### 6.4 Dependency Checklist (Procedural)

- In OPERATIONAL_RUNBOOKS or BLOCKSCOUT_FIX_RUNBOOK, add: when decommissioning or changing RPC nodes, check whether Blockscout (VMID 5000) uses that RPC; if yes, update ETHEREUM_JSONRPC_HTTP_URL and restart Blockscout.
- In SOLACESCANSCOUT_CONNECTIONS_FULL_TREE or a “dependency” section, list: “Explorer (5000) depends on: RPC 2201 (192.168.11.221).”
## 7. Summary: From Reactive to Proactive

| Before (Reactive) | After (Proactive) |
|---|---|
| Discover sync stop when users report stale data | Daily: compare RPC block vs explorer block; fail if lag > threshold |
| Discover 502 when someone opens the explorer | Daily: HTTPS + API check that fails the run if down |
| Discover thin pool full when Docker fails | Weekly: check thin1 (and thin5) usage on r630-02; warn at 85% |
| Update RPC URL only after indexer breaks | Checklist on infra change: “Update Blockscout RPC if RPC VMID/IP changed” |
| Explorer check never fails cron | Harden daily check so an unreachable explorer or large indexer lag fails the job |
Implementing §5 (Recommended Proactive Timing) and §6 (Script and Cron Changes) will move SolaceScanScout operations from reactive to proactive, with clear timing for each fix category.
References: SOLACESCANSCOUT_CONNECTIONS_FULL_TREE.md, SOLACESCANSCOUT_REVIEW.md, BLOCKSCOUT_FIX_RUNBOOK.md, OPERATIONAL_RUNBOOKS.md, daily-weekly-checks.sh, verify-explorer-and-block-production.sh