Finalize DBIS infra verification and runtime baselines

This commit is contained in:
defiQUG
2026-03-28 19:18:32 -07:00
parent 266a8ae30f
commit 6f53323eae
22 changed files with 1924 additions and 157 deletions


@@ -0,0 +1,23 @@
# Caliper performance hook — Chain 138 (Besu)
**Last updated:** 2026-03-28
**Purpose:** Satisfy [dbis_chain_138_technical_master_plan.md](../../dbis_chain_138_technical_master_plan.md) Section 14 without vendoring Caliper into this repository.
## Approach
1. Use upstream [Hyperledger Caliper](https://github.com/hyperledger/caliper) (npm package `@hyperledger/caliper-cli`).
2. Create a **separate** working directory (or CI job) with:
- `networkconfig.json` pointing `url` to Chain 138 HTTP RPC (prefer an isolated load-test node, not production public RPC).
- `benchmarks/` with a minimal `read` workload (`eth_blockNumber`, `eth_getBlockByNumber`) before write-heavy contracts.
3. Run: `npx caliper launch manager --caliper-workspace . --caliper-networkconfig networkconfig.json --caliper-benchconfig benchmarks/config.yaml`
4. Archive results (HTML/JSON) next to Phase 1 discovery reports if desired: `reports/phase1-discovery/` or `reports/caliper/`.
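Step 2's read workload can be pre-flighted before any Caliper launch, so write-heavy runs never target a dead node. A minimal sketch, assuming bash; the helper name `rpc_block_number` and the env var `RPC_URL_138` are illustrative, and the live `curl` call is shown only as a comment so the parsing itself stays testable offline:

```shell
# Illustrative pre-flight: confirm the target RPC answers a cheap read call
# before Caliper sends any load. Function name and RPC_URL_138 are assumptions.
# Live use:
#   rpc_block_number "$(curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#     "$RPC_URL_138")"
rpc_block_number() {
  local body="$1"
  local hex="${body##*\"result\":\"}"  # drop everything before the result value
  hex="${hex%%\"*}"                    # drop the closing quote and the rest
  printf '%d\n' "$((hex))"             # bash arithmetic accepts 0x-prefixed hex
}
```

If the probe prints a sane, advancing block height, the `read` workload is safe to start at a low rate.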
## Safety
- Use **low** transaction rates first; Besu validators and RPC tier are production assets.
- Do not point Caliper at **validator** JSON-RPC ports; use **RPC tier** only.
- Align gas and chain ID with `smom-dbis-138/.env` and [DEPLOYMENT_ORDER_OF_OPERATIONS.md](DEPLOYMENT_ORDER_OF_OPERATIONS.md).
## Wrapper
`bash scripts/verify/print-caliper-chain138-stub.sh` prints this path and suggested env vars (no network I/O).


@@ -0,0 +1,66 @@
# DBIS Hyperledger Runtime Status
**Last Reviewed:** 2026-03-28
**Purpose:** Concise app-level status table for the non-Besu Hyperledger footprint currently hosted on Proxmox. This complements the VMID inventory and discovery runbooks by recording what was actually verified inside the running containers.
## Scope
This document summarizes the latest operator verification for:
- FireFly CTs: `6200`, `6201`
- Fabric CTs: `6000`, `6001`, `6002`
- Indy CTs: `6400`, `6401`, `6402`
The checks were based on:
- `pct status`
- in-container process checks
- in-container listener checks
- FireFly API / Postgres / IPFS checks where applicable
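The CT-status portion of those checks follows a simple pattern. A hedged sketch, assuming Proxmox's `pct` CLI; the helper name is illustrative and it takes the command output as an argument so it can be exercised without a live host:

```shell
# Illustrative helper for scripting the "CT status" column: interpret the
# output of `pct status <vmid>` ("status: running" / "status: stopped").
# Live use (on the Proxmox host): ct_is_running "$(pct status 6200)"
ct_is_running() {
  case "$1" in
    *"status: running"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A running CT only clears the first column of the table; the app-level columns still need in-container process and listener evidence.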
## Current status table
| VMID | Service family | CT status | App-level status | Listening ports / probe | Notes |
|------|----------------|-----------|------------------|--------------------------|-------|
| `6200` | FireFly primary | Running | Healthy minimal local gateway | `5000/tcp` FireFly API, `5432/tcp` Postgres, `5001/tcp` IPFS | `firefly-core` restored on `ghcr.io/hyperledger/firefly:v1.2.0`; `GET /api/v1/status` returned `200`; Postgres `pg_isready` passed; IPFS version probe passed |
| `6201` | FireFly secondary | Stopped | Standby / incomplete | None verified | CT exists but rootfs is effectively empty and no valid FireFly deployment footprint was found; do not treat as active secondary |
| `6000` | Fabric primary | Running | Unproven | No Fabric listener verified | CT runs, but current app-level checks did not show active peer/orderer processes or expected listeners such as `7050` / `7051` |
| `6001` | Fabric secondary | Running | Unproven | No Fabric listener verified | Same current state as `6000` |
| `6002` | Fabric tertiary | Running | Unproven | No Fabric listener verified | Same current state as `6000` |
| `6400` | Indy primary | Running | Unproven | No Indy listener verified | CT runs, but current checks did not show Indy node listeners on expected ports such as `9701`-`9708` |
| `6401` | Indy secondary | Running | Unproven | No Indy listener verified | Same current state as `6400` |
| `6402` | Indy tertiary | Running | Unproven | No Indy listener verified | Same current state as `6400` |
## Interpretation
### Confirmed working now
- FireFly primary (`6200`) is restored enough to provide a working local FireFly API backed by Postgres and IPFS.
### Present but not currently proved as active application workloads
- Fabric CTs (`6000`-`6002`)
- Indy CTs (`6400`-`6402`)
These should be described as container footprints under validation, not as fully verified production application nodes, until app-level services and expected listeners are confirmed.
### Not currently active
- FireFly secondary (`6201`) should be treated as standby or incomplete deployment state unless it is intentionally rebuilt and verified.
## Operational follow-up
1. Keep `6200` under observation and preserve its working config/image path.
2. Do not force `6201` online unless its intended role and deployment assets are re-established.
3. For Fabric and Indy, the next verification step is app-native validation, not more CT-level checks.
4. Any governance or architecture document should distinguish:
- `deployed and app-healthy`
- `container present only`
- `planned / aspirational`
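For follow-up 3, the app-native check that would move a Fabric CT out of `container present only` can be sketched as a listener probe. This is an assumed shape, not an existing script: ports `7050`/`7051` come from the status table's Fabric notes, and the function name is illustrative:

```shell
# Hedged sketch of an app-native Fabric listener check: look for the expected
# orderer/peer ports among listening sockets inside the container.
# Live use (from the Proxmox host): fabric_ports_open "$(pct exec 6000 -- ss -ltn)"
fabric_ports_open() {
  local listeners="$1" port
  for port in 7050 7051; do
    case "$listeners" in
      *":$port "*) : ;;   # found a socket bound on this port
      *) return 1 ;;
    esac
  done
  return 0
}
```

An analogous probe over `9701`-`9708` would serve the Indy CTs.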
## Related artifacts
- [docs/02-architecture/DBIS_NODE_ROLE_MATRIX.md](../02-architecture/DBIS_NODE_ROLE_MATRIX.md)
- [docs/03-deployment/PHASE1_DISCOVERY_RUNBOOK.md](PHASE1_DISCOVERY_RUNBOOK.md)
- [docs/03-deployment/DBIS_PHASE3_E2E_PRODUCTION_SIMULATION_RUNBOOK.md](DBIS_PHASE3_E2E_PRODUCTION_SIMULATION_RUNBOOK.md)
- [dbis_chain_138_technical_master_plan.md](../../dbis_chain_138_technical_master_plan.md)


@@ -0,0 +1,76 @@
# DBIS Phase 3 — End-to-end production simulation
**Last updated:** 2026-03-28
**Purpose:** Operationalize [dbis_chain_138_technical_master_plan.md](../../dbis_chain_138_technical_master_plan.md) Section 18 (example flow) and Sections 14, 17 as **repeatable liveness and availability checks** — not a single product build or a full business-E2E execution harness.
**Prerequisites:** LAN access where noted; [DBIS_NODE_ROLE_MATRIX.md](../02-architecture/DBIS_NODE_ROLE_MATRIX.md) for IPs/VMIDs; operator env via `scripts/lib/load-project-env.sh` for on-chain steps.
---
## Section 18 flow → concrete checks
| Step | Master plan | Verification (repo-aligned) |
|------|-------------|-----------------------------|
| 1 | Identity issued (Indy) | Indy steward / node RPC on VMID **6400** (192.168.11.64); pool genesis tools — **manual** until automated issuer script exists. Current CTs `6400/6401/6402` are present, but app-level Indy listener verification is still pending. |
| 2 | Credential verified (Aries) | Aries agents (if colocated): confirm stack on Indy/FireFly integration path — **TBD** per deployment. |
| 3 | Workflow triggered (FireFly) | FireFly API on **6200** (currently restored as a minimal local gateway profile at `http://192.168.11.35:5000`). VMID **6201** is presently stopped / standby and should not be assumed active. |
| 4 | Settlement executed (Besu) | JSON-RPC `eth_chainId`, `eth_blockNumber`, optional test transaction via `smom-dbis-138` with `RPC_URL_138=http://192.168.11.211:8545`. PMM/oracle: [ORACLE_AND_KEEPER_CHAIN138.md](../../smom-dbis-138/docs/integration/ORACLE_AND_KEEPER_CHAIN138.md). |
| 5 | Cross-chain sync (Cacti) | Cacti = network monitoring here (VMID **5200**); **Hyperledger Cacti** interoperability is **future/optional** — track separately if deployed. **CCIP:** relay on r630-01 per [CCIP_RELAY_DEPLOYMENT.md](../07-ccip/CCIP_RELAY_DEPLOYMENT.md). |
| 6 | Compliance recorded (Fabric) | Fabric CTs `6000/6001/6002` are present, but current app-level verification has not yet proven active peer / orderer workloads inside those CTs. Treat Fabric business-flow validation as manual until that gap is closed. |
| 7 | Final settlement confirmed | Re-check Besu head on **2101** and **2201**; Blockscout **5000** for tx receipt if applicable. |
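Step 4's first gate, before sending any test transaction, is confirming the RPC really is Chain 138 (`eth_chainId` = `0x8a`, decimal 138). A minimal sketch; the helper name is illustrative, and parsing is separated from the network call (shown as a comment) so the logic can be tested offline:

```shell
# Hedged sketch of the chain-identity gate for step 4. Do not proceed to a
# test transaction if this fails.
# Live use:
#   chain_id_matches_138 "$(curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
#     "http://192.168.11.211:8545")"
chain_id_matches_138() {
  local hex="${1##*\"result\":\"}"
  hex="${hex%%\"*}"
  [ "$((hex))" -eq 138 ]   # 0x8a == 138
}
```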
---
## Automated wrapper (partial)
From repo root:
```bash
bash scripts/verify/run-dbis-phase3-e2e-simulation.sh
```
Optional:
```bash
RUN_CHAIN138_RPC_HEALTH=1 bash scripts/verify/run-dbis-phase3-e2e-simulation.sh
```
The script **does not** replace Indy/Fabric business transactions; it proves **liveness** of RPC, optional FireFly HTTP, and prints manual follow-ups. Treat it as a wrapper for infrastructure availability, not as proof that the complete seven-step business flow succeeded.
---
## Performance slice (Section 14 — Caliper)
Hyperledger Caliper is **not** vendored in this repo. To add benchmarks:
1. Install Caliper in a throwaway directory or CI image.
2. Point a Besu **SUT** at `http://192.168.11.211:8545` (deploy/core RPC only) or a dedicated load-test RPC.
3. Start with `simple` contract scenarios; record **TPS**, **latency p95**, and **error rate**.
**Suggested initial thresholds (tune per governance):**
| Metric | Initial gate (lab) |
|--------|-------------------|
| RPC error rate under steady load | less than 1% for 5 min |
| Block production | no stall > 30s (QBFT) |
| Public RPC `eth_blockNumber` lag vs core | within documented spread ([check-chain138-rpc-health.sh](../../scripts/verify/check-chain138-rpc-health.sh) defaults) |
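The lag gate in the last row can be expressed as simple arithmetic over the two `eth_blockNumber` results. A hedged sketch; the function name and the default spread of 5 blocks are assumptions here, and the authoritative defaults live in `check-chain138-rpc-health.sh`:

```shell
# Illustrative lag gate: given core and public eth_blockNumber results (hex),
# fail when the public tier trails the core tier by more than max_lag blocks.
rpc_lag_within_spread() {
  local core_hex="$1" public_hex="$2" max_lag="${3:-5}"
  local lag=$(( core_hex - public_hex ))
  [ "$lag" -lt 0 ] && lag=0   # public ahead of core counts as zero lag
  [ "$lag" -le "$max_lag" ]
}
```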
Details: [CALIPER_CHAIN138_PERF_HOOK.md](CALIPER_CHAIN138_PERF_HOOK.md).
---
## Production readiness certification (matrix-driven)
Use [OPERATOR_READY_CHECKLIST.md](../00-meta/OPERATOR_READY_CHECKLIST.md) section **10** plus:
- Phase 1 report timestamped under `reports/phase1-discovery/`.
- Phase 2 milestones acknowledged (Ceph/segmentation may be partial).
- Node Role Matrix: no critical **TBD** for entity-owned validators without a documented interim owner.
---
## Related
- [PHASE1_DISCOVERY_RUNBOOK.md](PHASE1_DISCOVERY_RUNBOOK.md)
- [DBIS_PHASE2_PROXMOX_SOVEREIGNIZATION_ROADMAP.md](../02-architecture/DBIS_PHASE2_PROXMOX_SOVEREIGNIZATION_ROADMAP.md)
- [verify-end-to-end-routing.sh](../../scripts/verify/verify-end-to-end-routing.sh) — public/private ingress


@@ -0,0 +1,119 @@
# Phase 1 — Reality mapping runbook
**Last updated:** 2026-03-28
**Purpose:** Operational steps for [dbis_chain_138_technical_master_plan.md](../../dbis_chain_138_technical_master_plan.md) Sections 3 and 19.1-19.3: inventory Proxmox, Besu, optional Hyperledger CTs, and record dependency context.
**Outputs:** Timestamped report under `reports/phase1-discovery/` (created by the orchestrator script).
**Pass / fail semantics:** the orchestrator still writes a full evidence report when a critical section fails, but it now exits **non-zero** and appends a final **Critical failure summary** section. Treat the markdown as evidence capture, not automatic proof of success.
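Those semantics follow a common accumulate-and-exit pattern. A hedged sketch of the assumed shape, not the actual orchestrator code: each critical section runs and is recorded, report writing continues regardless, and only the final exit code reflects failures:

```shell
# Sketch of evidence-capture-first exit semantics (illustrative names).
CRITICAL_FAILURES=0
run_critical() {
  local name="$1"; shift
  if "$@"; then
    echo "OK   $name"
  else
    echo "FAIL $name"                       # recorded, but the run continues
    CRITICAL_FAILURES=$((CRITICAL_FAILURES + 1))
  fi
}
# At end of run: [ "$CRITICAL_FAILURES" -eq 0 ] || exit 1
```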
---
## Prerequisites
- Repo root; `jq` recommended for template audit.
- **LAN:** SSH keys to Proxmox nodes (default `192.168.11.10`, `.11`, `.12` from `config/ip-addresses.conf`).
- Optional: `curl` for RPC probe.
---
## One-command orchestrator
```bash
bash scripts/verify/run-phase1-discovery.sh
```
Optional Hyperledger container smoke checks (SSH to r630-02, `pct exec`):
```bash
HYPERLEDGER_PROBE=1 bash scripts/verify/run-phase1-discovery.sh
```
Each run writes:
- `reports/phase1-discovery/phase1-discovery-YYYYMMDD_HHMMSS.md` — human-readable report with embedded diagram and command output.
- `reports/phase1-discovery/phase1-discovery-YYYYMMDD_HHMMSS.log` — plain-text log mirror of the same content.
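The naming scheme above can be reproduced with a small helper. This is illustrative only; the real orchestrator may construct paths differently, and the timestamp argument exists so the helper is deterministic under test:

```shell
# Illustrative stem builder for one run's report pair (.md and .log).
phase1_report_stem() {
  local ts="${1:-$(date +%Y%m%d_%H%M%S)}"   # injectable for testing
  printf 'reports/phase1-discovery/phase1-discovery-%s' "$ts"
}
# Per run: stem=$(phase1_report_stem); report is "$stem.md", log is "$stem.log"
```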
Critical sections for exit status:
- Proxmox template audit
- `pvecm` / `pvesm` / `pct list` / `qm list`
- Chain 138 core RPC quick probe
- `check-chain138-rpc-health.sh`
- `verify-besu-enodes-and-ips.sh`
- optional Hyperledger CT probe when `HYPERLEDGER_PROBE=1`
See also `reports/phase1-discovery/README.md`.
---
## Dependency graph (logical)
Ingress → RPC/sentries/validators → explorer; the CCIP relay on r630-01 uses the public RPC; FireFly/Fabric/Indy are optional DLT legs of the Section 18 flow.
```mermaid
flowchart TB
subgraph edge [EdgeIngress]
CF[Cloudflare_DNS]
NPM[NPMplus_LXC]
end
subgraph besu [Chain138_Besu]
RPCpub[RPC_public_2201]
RPCcore[RPC_core_2101]
Val[Validators_1000_1004]
Sen[Sentries_1500_1508]
end
subgraph observe [Observability]
BS[Blockscout_5000]
end
subgraph relay [CrossChain]
CCIP[CCIP_relay_r63001_host]
end
subgraph dlt [Hyperledger_optional]
FF[FireFly_6200_6201]
Fab[Fabric_6000_plus]
Indy[Indy_6400_plus]
end
CF --> NPM
NPM --> RPCpub
NPM --> RPCcore
NPM --> BS
RPCpub --> Sen
RPCcore --> Sen
Sen --> Val
CCIP --> RPCpub
FF --> Fab
FF --> Indy
```
**References:** [PROXMOX_VE_OPERATIONAL_DEPLOYMENT_TEMPLATE.md](PROXMOX_VE_OPERATIONAL_DEPLOYMENT_TEMPLATE.md), [ALL_VMIDS_ENDPOINTS.md](../04-configuration/ALL_VMIDS_ENDPOINTS.md), [NETWORK_CONFIGURATION_MASTER.md](../11-references/NETWORK_CONFIGURATION_MASTER.md).
---
## Manual follow-ups
| Task | Command / doc |
|------|----------------|
| Template vs live VMIDs | `bash scripts/verify/audit-proxmox-operational-template.sh` |
| Besu configs | `bash scripts/audit-besu-configs.sh` (review before running; LAN) |
| IP audit | `bash scripts/audit-all-vm-ips.sh` |
| Node role constitution | [DBIS_NODE_ROLE_MATRIX.md](../02-architecture/DBIS_NODE_ROLE_MATRIX.md) |
---
## ML110 documentation reconciliation
**Physical inventory** summary must match **live** role:
- If `192.168.11.10` still runs **Proxmox** and hosts guests, state that explicitly.
- If migration to **OPNsense/pfSense WAN aggregator** is in progress or complete, align with [NETWORK_CONFIGURATION_MASTER.md](../11-references/NETWORK_CONFIGURATION_MASTER.md) and [PHYSICAL_HARDWARE_INVENTORY.md](../02-architecture/PHYSICAL_HARDWARE_INVENTORY.md).
Use `pvecm status` and `pct list` on `.10` from the orchestrator output as evidence.
---
## Related
- [DBIS_NODE_ROLE_MATRIX.md](../02-architecture/DBIS_NODE_ROLE_MATRIX.md)
- [DBIS_PHASE2_PROXMOX_SOVEREIGNIZATION_ROADMAP.md](../02-architecture/DBIS_PHASE2_PROXMOX_SOVEREIGNIZATION_ROADMAP.md)
- [DBIS_PHASE3_E2E_PRODUCTION_SIMULATION_RUNBOOK.md](DBIS_PHASE3_E2E_PRODUCTION_SIMULATION_RUNBOOK.md)