# Vault Shard Custody Policy

**Last Updated:** 2026-04-18
**Status:** Proposed decision draft for approval or revision
**Scope:** Secret-shard custody for vault-related recovery material, rotation preparation, and admin-control continuity

---

## 1. Purpose

This document defines the custody policy for any shard-split recovery material used in the vault and contract-admin control plane. It exists to remove ambiguity before live admin rotation work begins.

The policy is designed to:

- prevent any one person from reconstructing privileged material alone;
- preserve recoverability if one or more custodians become unavailable;
- keep rotation work operationally realistic for Chain 138 and the follow-on chains;
- keep custody procedures compatible with a Safe-first admin model.

---

## 2. Decision Summary

**Recommended selection for §3:** adopt a **3-of-5 shard custody model** with **role-separated custodians**, **one shard per custodian**, and **Safe-based contract admin** as the steady-state control plane.

This means:

- recovery material is split into **5 shards**;
- any **3 shards** are required to reconstruct;
- no custodian may hold more than **1 shard**;
- the operational target admin on-chain is a **Gnosis Safe / Safe multisig**, not an EOA;
- shard reconstruction is allowed only for an approved rotation, incident recovery, or disaster-recovery drill.

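
To make the 5-shard, 3-to-reconstruct property concrete, the following is a minimal sketch of Shamir-style splitting over a prime field. It is an illustration only: the field prime, the function names, and the integer encoding of the secret are assumptions, and production shard generation would be expected to use audited tooling rather than this code.

```python
import secrets

# Assumption: the Mersenne prime 2**127 - 1 is used as the field modulus,
# which is large enough for a demo secret of up to 126 bits.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int = 3, shares: int = 5) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must fit in the field")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        acc = 0
        for c in reversed(coeffs):  # Horner's rule
            acc = (acc * x + c) % PRIME
        return acc

    # One (x, y) point per custodian, with x = 1..shares.
    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def reconstruct(points: list[tuple[int, int]]) -> int:
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


demo_secret = secrets.randbits(120)
shards = split_secret(demo_secret)
# Any three custodians can recover the secret, in any combination.
assert reconstruct([shards[0], shards[2], shards[4]]) == demo_secret
```

With fewer than three points the interpolation still returns a number, but it is unrelated to the secret, which is why no single custodian or pair of custodians can reconstruct alone.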

---

## 3. Recommended Custody Policy

### 3.1 Selected model

Use **Shamir-style 3-of-5 sharding** for the recovery secret or equivalent root material that would allow privileged admin continuity.

This is the recommended balance because:

- **2-of-3 is too fragile** for travel, turnover, illness, or simultaneous unavailability;
- **4-of-7 adds coordination drag** without enough extra safety for the current operator size;
- **3-of-5** keeps strong separation of control while still being practical during a real incident.

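
The trade-off between the three candidate thresholds can be put in rough numbers. The sketch below assumes each custodian is reachable independently with the same probability; the 0.9 figure is purely illustrative and not a measured value, and the function name is hypothetical.

```python
from math import comb


def quorum_probability(threshold: int, total: int, p_available: float) -> float:
    """Probability that at least `threshold` of `total` custodians are reachable.

    Assumes each custodian is reachable independently with probability p_available.
    """
    return sum(
        comb(total, k) * p_available**k * (1 - p_available) ** (total - k)
        for k in range(threshold, total + 1)
    )


# Assumption: 0.9 is an illustrative availability figure, not a measured one.
P_AVAILABLE = 0.9

for threshold, total in [(2, 3), (3, 5), (4, 7)]:
    print(
        f"{threshold}-of-{total}: "
        f"tolerates {total - threshold} unavailable, "
        f"resists {threshold - 1} colluding, "
        f"needs {threshold} people coordinated, "
        f"quorum probability {quorum_probability(threshold, total, P_AVAILABLE):.4f}"
    )
```

Under these assumed numbers the quorum probability is roughly 0.972 for 2-of-3, 0.991 for 3-of-5, and 0.997 for 4-of-7. The step from 3-of-5 to 4-of-7 buys a smaller gain while requiring a fourth person to be coordinated for every reconstruction.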

### 3.2 Required custodian classes

Assign the 5 shards to 5 distinct custodians drawn from distinct responsibility domains:

1. **Operations lead**
2. **Security lead**
3. **Platform or protocol lead**
4. **Executive or governance delegate**
5. **Independent recovery custodian**

The independent recovery custodian should not be part of the routine deployer or signer path. This can be a board-level delegate, outside counsel escrow, or another approved non-operator custodian with a documented identity and retrieval process.

### 3.3 Custody rules

- Each custodian holds exactly **one shard**.
- No household, single reporting line, or single laptop/password manager may control enough material to reach threshold alone.
- Shards must be stored **offline** or in an **offline-first** medium.
- At least **2 shards** must be held in physically distinct locations.
- No shard may be stored in the same place as the full reconstruction instructions unless those instructions are separately access-controlled.
- No plaintext shard may be committed to git, chat, ticketing, or shared cloud docs.
- Photographs, screenshots, clipboard sync, and auto-backup of shard material are prohibited.

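
Several of the rules above are mechanical enough to check against a custody register. The following is a sketch under stated assumptions: the `ShardHolding` fields are hypothetical (the policy does not define a schema), and the location rule is read as "shards span at least 2 distinct locations", which is one possible interpretation.

```python
from collections import Counter
from dataclasses import dataclass

THRESHOLD = 3  # from the 3-of-5 model


@dataclass(frozen=True)
class ShardHolding:
    # All field names are hypothetical; the policy does not define a schema.
    custodian: str
    household: str
    reporting_line: str
    location: str
    offline: bool


def custody_violations(holdings: list[ShardHolding]) -> list[str]:
    """Return human-readable rule violations; an empty list means none were found."""
    problems = []

    # Each custodian holds exactly one shard.
    for name, count in Counter(h.custodian for h in holdings).items():
        if count > 1:
            problems.append(f"{name} holds {count} shards")

    # No household or reporting line may reach the threshold alone.
    for label in ("household", "reporting_line"):
        for group, count in Counter(getattr(h, label) for h in holdings).items():
            if count >= THRESHOLD:
                problems.append(f"{label} '{group}' controls {count} shards")

    # One reading of the location rule: shards span at least 2 distinct locations.
    if len({h.location for h in holdings}) < 2:
        problems.append("all shards share one physical location")

    # Shards must be offline or offline-first.
    problems.extend(f"{h.custodian}: shard is not offline" for h in holdings if not h.offline)
    return problems
```

Rules about photographs, clipboard sync, and where reconstruction instructions are kept cannot be verified from a register alone and still depend on custodian attestation.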

### 3.4 Reconstruction approval gate

Reconstruction may occur only when one of the following is true:

- an approved **planned admin rotation** is in progress;
- a **suspected key compromise** or confirmed loss event requires emergency action;
- a scheduled **disaster-recovery drill** has been approved in advance.

Every reconstruction event requires:

- a ticket or written change record;
- named approvers;
- reason for reconstruction;
- time window;
- witness log of which custodians participated;
- post-event confirmation that temporary plaintext was destroyed.

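
The required fields above map naturally onto a structured change record. This is a hypothetical sketch, not an existing template in the repo: the class name, field names, and reason labels are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The three permitted triggers from this section; the string labels are hypothetical.
ALLOWED_REASONS = {"planned_admin_rotation", "compromise_or_loss", "dr_drill"}


@dataclass
class ReconstructionRecord:
    ticket: str
    approvers: list[str]
    reason: str
    window_start: datetime
    window_end: datetime
    participants: list[str] = field(default_factory=list)  # witness log
    plaintext_destroyed: bool = False

    def gaps(self, threshold: int = 3) -> list[str]:
        """List what is still missing before the event can be considered complete."""
        gaps = []
        if not self.ticket:
            gaps.append("ticket or change record")
        if not self.approvers:
            gaps.append("named approvers")
        if self.reason not in ALLOWED_REASONS:
            gaps.append("approved reason")
        if self.window_end <= self.window_start:
            gaps.append("valid time window")
        if len(set(self.participants)) < threshold:
            gaps.append("witness log of participating custodians")
        if not self.plaintext_destroyed:
            gaps.append("confirmation that temporary plaintext was destroyed")
        return gaps
```

A record whose `gaps()` list is empty has every item this section asks for; the destruction confirmation is deliberately the last item to clear, since it can only be set after the event.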

### 3.5 Safe-first operating model

The shard policy does **not** mean the reconstructed secret should become the daily operating admin.

The steady-state operating model is:

- on-chain admin lives in a **Safe**;
- Safe signers use hardware-backed wallets where possible;
- shard reconstruction is reserved for recovery or controlled migration events;
- post-rotation, recovered or superseded material is re-sharded or retired.

### 3.6 Recommended Safe posture

For `NEW_ADMIN_ADDRESS`, the preferred target is:

- a **Chain 138 Safe**;
- **3-of-5 threshold** if 5 reliable signers are available;
- **2-of-3 threshold** only as an interim fallback if the 5-signer set is not ready yet.

If a temporary `2-of-3` Safe is used to unblock Chain 138, it should be explicitly marked as **interim** and scheduled for migration to **3-of-5** before broader multi-chain rollout.

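
The posture choice reduces to a small decision rule, sketched below. The function name and return shape are hypothetical, and treating fewer than 3 reliable signers as an error is an assumption: the policy names no posture for that case.

```python
def safe_posture(reliable_signers: int) -> dict:
    """Pick the Safe threshold from the number of reliable signers available."""
    if reliable_signers >= 5:
        # Preferred steady state.
        return {"threshold": 3, "owners": 5, "interim": False}
    if reliable_signers >= 3:
        # Fallback only: must be flagged interim and scheduled for migration to 3-of-5.
        return {"threshold": 2, "owners": 3, "interim": True}
    # Assumption: the policy defines no posture below 3 signers, so refuse.
    raise ValueError("fewer than 3 reliable signers: no approved Safe posture")
```

Carrying the `interim` flag in the result keeps the temporary status visible to whatever records or deploys the Safe configuration.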

### 3.7 Lifecycle requirements

- Re-verify custodian availability quarterly.
- Re-issue shards after any custodian departure, suspected exposure, or failed drill.
- Run at least one documented recovery drill before expanding from Chain 138 to additional production chains.
- Keep a custody register with custodian name, role, storage method, storage region, last attestation date, and replacement contact path.

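
The custody register fields listed above, together with the quarterly re-verification, could be tracked as follows. This is a sketch only: the class and function names are hypothetical, and "quarterly" is approximated as 92 days, which is an assumption rather than a value set by the policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 92 days.
ATTESTATION_INTERVAL = timedelta(days=92)


@dataclass
class RegisterEntry:
    # Fields mirror the register columns named in the policy.
    custodian_name: str
    role: str
    storage_method: str
    storage_region: str
    last_attestation: date
    replacement_contact: str


def overdue_attestations(entries: list[RegisterEntry], today: date) -> list[str]:
    """Names of custodians whose last attestation is older than the interval."""
    return [
        e.custodian_name
        for e in entries
        if today - e.last_attestation > ATTESTATION_INTERVAL
    ]
```

The register itself should hold only metadata about custody; shard material never belongs in it.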

---

## 4. Operational Guidance

### 4.1 When this policy blocks work

Rotation work should be treated as blocked if any of the following remain unresolved:

- the five custodian slots are not assigned;
- the threshold is not approved;
- `NEW_ADMIN_ADDRESS` is still an undecided EOA-versus-Safe placeholder;
- no recovery log template exists for reconstruction events.

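
The four blocking conditions above can be expressed as a pre-flight check that a rotation runbook calls before any live step. The class and field names are hypothetical and only illustrate the shape of such a check.

```python
from dataclasses import dataclass


@dataclass
class RotationReadiness:
    # Hypothetical fields, one per blocking condition in this section.
    custodians_assigned: int
    threshold_approved: bool
    new_admin_decided: bool  # False while NEW_ADMIN_ADDRESS is still a placeholder
    recovery_log_template_exists: bool

    def blockers(self, required_custodians: int = 5) -> list[str]:
        """Return the unresolved items; rotation is blocked unless the list is empty."""
        found = []
        if self.custodians_assigned < required_custodians:
            found.append("custodian slots not fully assigned")
        if not self.threshold_approved:
            found.append("threshold not approved")
        if not self.new_admin_decided:
            found.append("NEW_ADMIN_ADDRESS still undecided")
        if not self.recovery_log_template_exists:
            found.append("no recovery log template")
        return found
```

Returning the full list rather than a single boolean lets the operator see every outstanding item at once instead of fixing them one failed run at a time.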

### 4.2 What can proceed before live rotation

The following work may proceed before live execution:

- non-broadcast Forge scripts;
- dry-run and compile validation;
- runbook drafting;
- Safe creation and signer enrollment;
- custody register preparation;
- incident and recovery checklist review.

### 4.3 What must not proceed before approval

The following should not proceed until this policy is approved, either as written or after revision:

- live admin transfer;
- live shard generation for production recovery material;
- any irreversible decommissioning of the old admin path;
- multi-chain expansion of the rotation procedure.

---

## 5. Approve / Revise Checklist

Approve if you agree with all of the following:

- **Threshold:** `3-of-5`
- **Steady-state admin:** Safe multisig
- **Chain 138 target:** `NEW_ADMIN_ADDRESS` is a Chain 138 Safe
- **Fallback:** temporary `2-of-3` Safe allowed only if explicitly interim
- **Rollout gate:** no other-chain live rotation until Chain 138 drill and verification are complete

Revise if you want to change:

- threshold size;
- custodian classes;
- whether the independent custodian is internal or external;
- whether interim `2-of-3` is allowed;
- whether one additional chain may run in parallel with Chain 138.

---

## 6. Recommendation

Approve the **3-of-5 Safe-first** model as written.

It is the strongest option that still fits the current repo’s operational maturity. It removes the single-operator failure mode and gives the Chain 138 rotation work a clear governance target without forcing immediate parallel expansion.