
SolaceScanScout Deep-Dive: All Fixes Needed & Proactive vs Reactive Timing

Last Updated: 2026-02-09
Purpose: Investigate all fixes needed for the explorer, and define correct timing so we can be proactive instead of reactive.
Related: SOLACESCANSCOUT_CONNECTIONS_FULL_TREE.md, SOLACESCANSCOUT_REVIEW.md, BLOCKSCOUT_FIX_RUNBOOK.md.


Quick reference: when to act

| Frequency | What to do | Script / location |
|---|---|---|
| One-time / after change | Fix RPC URL on 5000 if RPC VMID retired; SSL in NPMplus; migrate 5000 to thin5 if thin1 full | BLOCKSCOUT_FIX_RUNBOOK; NEXT_STEPS_OPERATOR |
| Daily 08:00 | Explorer HTTPS + API must pass; indexer lag (RPC block vs explorer block) < threshold; RPC 2201 up | daily-weekly-checks.sh (harden per §6.1) |
| Weekly (e.g. Sun) | Explorer logs review; thin pool usage on r630-02 (warn >85%) | O-4; new thin-pool check §6.2 |
| On deploy / NPMplus change | E2E routing; full explorer E2E from LAN; Blockscout migrations if needed | verify-end-to-end-routing.sh; e2e-test-explorer.sh; fix-blockscout-ssl-and-migrations.sh |

1. Executive Summary

| Category | Reactive (we discover when it breaks) | Proactive (we detect before users do) |
|---|---|---|
| Explorer sync stop | Users see stale blocks; 15-day lag happened Jan 2026 | Daily check: compare RPC block vs explorer block; alert if lag > N blocks |
| 502 / DB / migrations | Public 502 on explorer.d-bis.org | Daily: HTTPS + API reachability; weekly: logs; storage check before full |
| Thin pool full | "No space left on device"; Docker/Blockscout fail | Weekly (or before major deploys): thin pool % on r630-02 |
| RPC endpoint wrong/down | Indexer stops (e.g. VMID 2500 destroyed) | Daily: RPC 2201 health; dependency list reviewed on infra changes |
| SSL / NPMplus | "Connection isn't private" or 502 | E2E run (e.g. after NPMplus changes); optional cert expiry check |
| Frontend/API config | Wrong API URL or missing routes | After deploy: E2E + explorer E2E from LAN |

Key insight: The Jan 2026 “explorer 15 days behind” incident was reactive: we had no check comparing the chain-head block to the explorer's last indexed block. The daily cron only checks that the API returns 200 with total_blocks, and it does not fail when Blockscout is unreachable (it logs SKIP). So the checks stayed green until someone looked at the UI.


2. Complete Fix Inventory (All Known Issues & Fixes)

2.1 Critical (Explorer Unusable or Stale)

| # | Issue | Root Cause | Fix | Runbook / Script |
|---|---|---|---|---|
| C1 | Explorer stopped indexing (blocks stale) | RPC unreachable (wrong IP or VM down), or indexer/DB crash | Point ETHEREUM_JSONRPC_HTTP_URL to working RPC (e.g. 192.168.11.221:8545); restart Blockscout; fix DB if needed | SOLACESCANSCOUT_REVIEW.md; BLOCKSCOUT_FIX_RUNBOOK |
| C2 | 502 Bad Gateway on explorer.d-bis.org | Blockscout or Postgres down; or postgres nxdomain (Docker DNS); or thin pool full | Restart stack; fix Docker network/DB URL; or migrate VM 5000 to thin5 | BLOCKSCOUT_FIX_RUNBOOK; fix-blockscout-ssl-and-migrations.sh; fix-blockscout-1.sh |
| C3 | SSL/migrations (migrations_status, blocks table missing) | ECTO_USE_SSL=TRUE vs Postgres without SSL | Run migrations with ?sslmode=disable and ECTO_USE_SSL=false; persist in docker-compose/.env | fix-blockscout-ssl-and-migrations.sh |
| C4 | No space left on device (thin pool 100%) | thin1-r630-02 full; VM 5000 on thin1 | Migrate VMID 5000 to thin5 (vzdump → destroy → restore to thin5); or free thin1 by moving other VMs | BLOCKSCOUT_FIX_RUNBOOK; fix-blockscout-1.sh |
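
The C3 fix persists as two environment settings. A minimal sketch of the docker-compose/.env entries (the db host, user, and password here are placeholders, not the real values on VM 5000):

```shell
# Hypothetical Blockscout .env fragment on VM 5000 -- persists the C3 fix so
# both migrations and the indexer connect to Postgres without SSL.
ECTO_USE_SSL=false
# "db", "blockscout", and CHANGE_ME are placeholders; keep the real values.
DATABASE_URL=postgresql://blockscout:CHANGE_ME@db:5432/blockscout?sslmode=disable
```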

2.2 High (Degraded or One-Time Config)

| # | Issue | Root Cause | Fix | Runbook / Script |
|---|---|---|---|---|
| H1 | RPC endpoint pointed to destroyed VM (e.g. 2500) | VMID 2500 decommissioned; Blockscout env not updated | Set ETHEREUM_JSONRPC_HTTP_URL=http://192.168.11.221:8545 (and WS if used) in Blockscout env on VM 5000 | SOLACESCANSCOUT_REVIEW.md |
| H2 | Explorer SSL "connection isn't private" | No or invalid Let's Encrypt cert for explorer.d-bis.org in NPMplus | NPMplus UI: SSL Certificates → request for explorer.d-bis.org; assign to proxy host, Force SSL | NEXT_STEPS_OPERATOR.md § Explorer SSL |
| H3 | NPMplus proxy wrong for explorer | Proxy host points to wrong IP/port | Update explorer.d-bis.org proxy to http://192.168.11.140:80 (and :4000 if API separate) | update-npmplus-proxy-hosts-api.sh; RPC_ENDPOINTS_MASTER.md |
| H4 | Blockscout container or service exited | Crash or OOM; systemd "active (exited)" | Restart: pct exec 5000 -- systemctl restart blockscout or docker-compose up -d; check logs | SOLACESCANSCOUT_REVIEW.md; OPERATIONAL_RUNBOOKS [138] |

2.3 Medium (Operational / Optional)

| # | Issue | Root Cause | Fix | Runbook / Script |
|---|---|---|---|---|
| M1 | Forge verification fails (params module/action) | Blockscout API expects query params; Forge sends JSON | Use run-contract-verification-with-proxy.sh or manual verification at explorer UI | BLOCKSCOUT_FIX_RUNBOOK § Forge |
| M2 | Custom frontend not served (wrong index.html or nginx) | Nginx serves Blockscout at / instead of SolaceScanScout index.html | deploy-frontend-to-vmid5000.sh; fix-nginx-serve-custom-frontend.sh | deploy-frontend-to-vmid5000.sh |
| M3 | Token list stale | Token list not updated after new tokens | Bump version/timestamp in dbis-138.tokenlist.json; validate; update explorer/config API reference | OPERATIONAL_RUNBOOKS [139]; TOKEN_LIST_AUTHORING_GUIDE |
| M4 | Explorer logs full or errors unnoticed | No log review; disk full in container | Weekly log review; cleanup-blockscout-journal.sh if needed | OPERATIONAL_RUNBOOKS [138] (O-4) |

2.4 One-Time / After Change

| # | Issue | When | Fix |
|---|---|---|---|
| O1 | After destroying or changing RPC VMIDs | Any RPC VMID decommissioned or IP change | Update Blockscout env (and any script default RPC) to current RPC; update config/ip-addresses.conf and docs |
| O2 | After NPMplus restore or major config change | Restore from backup; new NPMplus instance | Re-verify proxy hosts (explorer.d-bis.org → 192.168.11.140:80); re-request SSL if needed |
| O3 | After Proxmox storage change | New thin pool; migration of VMs | Update BLOCKSCOUT_FIX_RUNBOOK and fix-blockscout-1.sh if default storage names change |

3. Reactive vs Proactive: When We Learn About Each Issue

| Issue | Reactive trigger (we find out when…) | Proactive detection (we could find out by…) |
|---|---|---|
| C1 Sync stop | User or operator notices blocks are old | Daily: Compare RPC eth_blockNumber to Blockscout /api/v2/stats (or indexer block). Alert if lag > e.g. 100 blocks or 10 min. |
| C2 502 / DB | User gets 502; or E2E fails | Daily: GET https://explorer.d-bis.org and https://explorer.d-bis.org/api/v2/stats; fail if non-2xx. |
| C3 SSL/migrations | Blockscout won't start or crashes on boot | On deploy/restart: Run migrations with correct flags; weekly: review logs for migration/DB errors. |
| C4 Thin pool full | Docker or pct fails with "no space left" | Weekly (or before big deploy): On r630-02 run lvs / pvesm status and check thin1 (and thin5) usage; alert if >85%. |
| H1 Wrong RPC | Indexer stops when that RPC is gone | When changing infra: checklist item “Update Blockscout RPC URL if any RPC VMID/IP changed.” Daily: RPC 2201 health (already in daily-weekly-checks). |
| H2 SSL | User sees certificate warning | E2E run after NPMplus changes; optional monthly cert expiry check. |
| H3 NPMplus proxy wrong | 502 or wrong site when opening explorer.d-bis.org | E2E: verify-end-to-end-routing.sh (DNS, SSL, HTTPS 200). |
| H4 Container exited | 502 or API down | Daily: same as C2 (HTTPS + API); weekly: logs (O-4). |

4. Current Monitoring vs What's Missing

4.1 What Exists Today

| Check | Frequency | Script / Cron | Limitation |
|---|---|---|---|
| Explorer indexer (API reachable) | Daily 08:00 | daily-weekly-checks.sh [135] | Does not fail when Blockscout unreachable (logs SKIP). |
| RPC 2201 health | Daily 08:00 | daily-weekly-checks.sh [136] | Good; fails if RPC down. |
| Config API | Weekly Sun 09:00 | daily-weekly-checks.sh [137] | Not explorer-specific. |
| Explorer logs | Weekly (manual) | OPERATIONAL_RUNBOOKS [138] | Reminder only; no automated parse. |
| E2E (DNS, SSL, HTTPS) | On-demand | verify-end-to-end-routing.sh | Optional Blockscout API; can skip off-LAN. |
| Explorer + block production | On-demand | verify-explorer-and-block-production.sh | Compares RPC block to chain; does not compare explorer block to RPC block (indexer lag). |
| Thin pool | On-demand | fix-blockscout-1.sh (when already broken); investigate-thin2-storage.sh | No scheduled thin pool check for r630-02 thin1. |

4.2 Gaps (Why We Were Reactive)

  1. No indexer lag check
    We never compare “latest block on RPC” vs “latest block in Blockscout,” so we don't detect “API is up but the indexer stopped” until someone looks at the UI or block count.

  2. Explorer check is soft
    If Blockscout is down, daily-weekly-checks.sh prints SKIP and does not increment FAILED. Cron stays “green” while the explorer is broken.

  3. No thin pool monitoring
    thin1-r630-02 can reach 100% with no alert. First sign is often “no space left on device” during a restart or pull.

  4. No automated alerting
    Cron only logs to a file. No email, PagerDuty, or dashboard that fails when explorer or RPC fails.

  5. RPC dependency not formalized
    When VMID 2500 was destroyed, Blockscout's RPC URL wasn't in a “dependency list” that's reviewed on infra changes.


5. Recommended Proactive Timing

5.1 One-Time (Do Once or After Change)

| Action | When | Owner |
|---|---|---|
| Fix RPC URL on VM 5000 | Already done (192.168.11.221). Re-do whenever an RPC VMID used by the explorer is retired or re-IP'd | Ops |
| Add explorer.d-bis.org to “infra dependency” list | When documenting RPC/explorer relationship | Ops |
| Request SSL for explorer.d-bis.org in NPMplus | Once (and after any NPMplus restore that loses certs) | Ops |
| Migrate VM 5000 to thin5 if thin1 is near full | Once (or when thin1 >85%) | Ops |

5.2 Daily (Catch Outages and Sync Stop)

| Action | When | Implementation |
|---|---|---|
| Explorer HTTPS 200 | Daily 08:00 (with existing cron) | Add to daily-weekly-checks: GET https://explorer.d-bis.org; fail if not 2xx (run from a host that can reach it, or use the public URL). |
| Explorer API 200 + body | Daily 08:00 | Same script: GET https://explorer.d-bis.org/api/v2/stats (or http://192.168.11.140:4000 from LAN); fail if not 200 or missing total_blocks/total_transactions. |
| Indexer lag | Daily 08:00 | New check: (1) RPC eth_blockNumber → chain_head. (2) Blockscout API → last indexed block (or total_blocks). (3) If chain_head − last_indexed > threshold (e.g. 100 blocks or 5 min), fail. |
| RPC 2201 health | Already daily 08:00 | Keep as-is (critical for indexer). |
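
The first two daily checks can be sketched as follows. The URLs come from this document; stats_body_ok is an illustrative helper, not an existing function in daily-weekly-checks.sh, and the live curl calls are commented out because they need network reach to the explorer:

```shell
# Pass only if the stats JSON contains both expected fields.
stats_body_ok() {
  grep -q '"total_blocks"' <<<"$1" && grep -q '"total_transactions"' <<<"$1"
}

# Live daily checks (commented; require reachability to explorer.d-bis.org):
# curl -fsS -o /dev/null https://explorer.d-bis.org || echo "FAILED: homepage"
# if body=$(curl -fsS https://explorer.d-bis.org/api/v2/stats); then
#   stats_body_ok "$body" || echo "FAILED: stats body missing fields"
# else
#   echo "FAILED: API unreachable"
# fi
```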

5.3 Weekly (Catch Slow Degradation)

| Action | When | Implementation |
|---|---|---|
| Review explorer logs | Weekly (e.g. Sun 09:00) | Keep O-4: pct exec 5000 -- journalctl -u blockscout -n 200 (or SSH); optional: grep for ERROR / nxdomain / ssl. |
| Thin pool usage r630-02 | Weekly (e.g. Sun) or before major deploy | New: SSH to r630-02, run pvesm status \| grep thin and/or lvs \| grep thin; warn if thin1 >85%; fail if 100%. |
| Config API | Already weekly | Keep [137]. |

5.4 On-Deploy / On-Change

| Action | When | Implementation |
|---|---|---|
| E2E routing | After NPMplus or DNS changes | Run verify-end-to-end-routing.sh (include explorer.d-bis.org). |
| Full explorer E2E (LAN) | After frontend or Blockscout deploy | Run explorer-monorepo/scripts/e2e-test-explorer.sh from LAN. |
| Blockscout migrations | Before/after Blockscout version or config change | fix-blockscout-ssl-and-migrations.sh or manual migration with sslmode=disable. |

6. Concrete Script and Cron Changes

6.1 Harden daily-weekly-checks.sh (Explorer)

  • Current: [135] Explorer indexer: curl to :4000; on failure print SKIP and do not increment FAILED.
  • Change:
    • Option A (minimal): When running from LAN (or when PUBLIC_EXPLORER_CHECK=1), also GET https://explorer.d-bis.org. If both API and homepage fail, increment FAILED.
    • Option B (recommended): Add an indexer lag check:
      • From LAN: get RPC block number (192.168.11.221:8545 eth_blockNumber).
      • Get Blockscout last block from /api/v2/stats or /api/v2/blocks (or indexer stats).
      • If RPC_block - explorer_block > 500 (or time-based, e.g. >10 min), increment FAILED and log “Explorer indexer lag > 500 blocks”.
    • Ensure at least one explorer check fails the daily run when the explorer is clearly broken (e.g. API unreachable from LAN).
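
Option B's lag computation can be sketched with two small helpers. The RPC and API endpoints are the ones named above; jq is assumed to be installed on the check host, and the live calls are commented out:

```shell
# Convert a JSON-RPC hex block number (e.g. "0x1b4") to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Exit 0 if the explorer is within <threshold> blocks of the chain head.
lag_ok() {
  local chain_head=$1 explorer_block=$2 threshold=${3:-500}
  [ $(( chain_head - explorer_block )) -le "$threshold" ]
}

# Live usage from LAN (commented; endpoints per this runbook):
# chain_head=$(hex_to_dec "$(curl -s -X POST http://192.168.11.221:8545 \
#   -H 'Content-Type: application/json' \
#   -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#   | jq -r .result)")
# explorer_block=$(curl -s https://explorer.d-bis.org/api/v2/stats | jq -r .total_blocks)
# lag_ok "$chain_head" "$explorer_block" 500 \
#   || { echo "FAILED: Explorer indexer lag > 500 blocks"; FAILED=$((FAILED+1)); }
```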

6.2 Add Weekly Thin Pool Check

  • New script or block in weekly: On r630-02 (192.168.11.12), run:
    • ssh root@192.168.11.12 'pvesm status 2>/dev/null | grep -E "thin1|thin5"'
    • Parse usage (e.g. 5th column); if thin1-r630-02 > 85%, log warning; if 100%, fail.
  • Cron: Add to weekly branch of schedule-daily-weekly-cron.sh, or separate weekly script that runs Sunday.
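
The parse step above can be sketched like this. Assumption: the usage percentage is the last whitespace-separated field of each pvesm status line; verify the actual column position on r630-02 before wiring this into cron:

```shell
# Extract the integer usage percent from one pvesm-status-style line.
pool_usage_pct() {
  awk '{gsub(/%/, "", $NF); printf "%d\n", $NF}' <<<"$1"
}

# Exit 0 = OK, 1 = warn (>85%), 2 = fail (pool full).
check_pool() {
  local line=$1 pct
  pct=$(pool_usage_pct "$line")
  if [ "$pct" -ge 100 ]; then
    echo "FAIL: thin pool at ${pct}%"; return 2
  elif [ "$pct" -gt 85 ]; then
    echo "WARN: thin pool at ${pct}%"; return 1
  else
    echo "OK: thin pool at ${pct}%"; return 0
  fi
}

# Live weekly usage (commented; host per this runbook):
# ssh root@192.168.11.12 'pvesm status 2>/dev/null' | grep -E 'thin1|thin5' \
#   | while read -r l; do check_pool "$l"; done
```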

6.3 Optional: Alerting

  • Pipe daily/weekly check output to a log; have a wrapper that:
    • Sends email or Slack on FAILED > 0, or
    • Writes to a file that Prometheus/Grafana can scrape (e.g. “explorer_ok 0” vs “explorer_ok 1”).
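
A minimal sketch of the scrape-file option, assuming the check script exposes a FAILED count; the metric name, file path, and mail address are hypothetical:

```shell
# Write a Prometheus-textfile-style metric reflecting the daily run's result.
write_explorer_metric() {
  local failed=$1 out=$2
  if [ "$failed" -gt 0 ]; then
    echo "explorer_ok 0" > "$out"
  else
    echo "explorer_ok 1" > "$out"
  fi
}

# After daily-weekly-checks.sh sets FAILED (path and address are placeholders):
# write_explorer_metric "$FAILED" /var/lib/node_exporter/textfile/explorer.prom
# [ "$FAILED" -gt 0 ] && mail -s "Explorer daily check FAILED" ops@example.com </dev/null
```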

6.4 Dependency Checklist (Procedural)

  • In OPERATIONAL_RUNBOOKS or BLOCKSCOUT_FIX_RUNBOOK, add:
    • When decommissioning or changing RPC nodes: Check if Blockscout (VMID 5000) uses that RPC; if yes, update ETHEREUM_JSONRPC_HTTP_URL and restart Blockscout.
  • In SOLACESCANSCOUT_CONNECTIONS_FULL_TREE or a “dependency” section: list “Explorer (5000) depends on: RPC 2201 (192.168.11.221).”

7. Summary: From Reactive to Proactive

| Before (Reactive) | After (Proactive) |
|---|---|
| Discover sync stop when users report stale data | Daily: compare RPC block vs explorer block; fail if lag > threshold |
| Discover 502 when someone opens explorer | Daily: HTTPS + API check that fails the run if down |
| Discover thin pool full when Docker fails | Weekly: check thin1 (and thin5) usage on r630-02; warn at 85% |
| Update RPC URL only after indexer breaks | Checklist on infra change: “Update Blockscout RPC if RPC VMID/IP changed” |
| Explorer check never fails cron | Harden daily check so unreachable explorer or large indexer lag fails the job |

Implementing §5 (Recommended Proactive Timing) and §6 (Script and Cron Changes) will move SolaceScanScout operations from reactive to proactive, with clear timing for each fix category.


Last updated: 2026-02-09
References: SOLACESCANSCOUT_CONNECTIONS_FULL_TREE.md, SOLACESCANSCOUT_REVIEW.md, BLOCKSCOUT_FIX_RUNBOOK.md, OPERATIONAL_RUNBOOKS.md, daily-weekly-checks.sh, verify-explorer-and-block-production.sh