Vault Shard Custody Policy

Last Updated: 2026-04-18
Status: Proposed decision draft, pending approval or revision
Scope: Secret-shard custody for vault-related recovery material, rotation preparation, and admin-control continuity


1. Purpose

This document defines the custody policy for any shard-split recovery material used in the vault and contract-admin control plane. It exists to remove ambiguity before live admin rotation work begins.

The policy is designed to:

  • prevent any one person from reconstructing privileged material alone;
  • preserve recoverability if one or more custodians become unavailable;
  • keep rotation work operationally realistic for Chain 138 and the follow-on chains;
  • keep custody procedures compatible with a Safe-first admin model.

2. Decision Summary

Recommended selection for §3: adopt a 3-of-5 shard custody model with role-separated custodians, one shard per custodian, and Safe-based contract admin as the steady-state control plane.

This means:

  • recovery material is split into 5 shards;
  • any 3 shards are required to reconstruct;
  • no custodian may hold more than 1 shard;
  • the target on-chain operational admin is a Safe (formerly Gnosis Safe) multisig, not an EOA;
  • shard reconstruction is allowed only for an approved rotation, incident recovery, or disaster-recovery drill.

3. Shard Custody Model

3.1 Selected model

Use Shamir-style 3-of-5 sharding for the recovery secret or equivalent root material that would allow privileged admin continuity.
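To make the 3-of-5 property concrete, here is a minimal textbook Shamir split and combine over a prime field. It is a teaching sketch under stated assumptions, not the project's tooling: the field prime, the integer encoding of the secret, and the `split`/`combine` names are illustrative choices. Production shards should come from an audited implementation.

```python
"""Minimal Shamir secret sharing sketch (illustrative only).

Assumptions: the secret is an integer below PRIME; each share is an (x, y)
point on a random polynomial of degree k-1 over GF(PRIME). No integrity
checks and no constant-time arithmetic: not for production shards.
"""
import secrets

# 2**521 - 1 is a Mersenne prime, comfortably larger than a 256-bit secret.
PRIME = 2**521 - 1


def split(secret: int, k: int = 3, n: int = 5) -> list[tuple[int, int]]:
    """Split `secret` into n shares, any k of which reconstruct it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    # f(x) = secret + c1*x + ... + c_{k-1}*x^(k-1) with random coefficients.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, highest degree first
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Recover f(0) by Lagrange interpolation over GF(PRIME)."""
    xs = [x for x, _ in shares]
    if len(set(xs)) != len(xs):
        raise ValueError("duplicate share indices")
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

Any three of the five shares recover the secret. Note that `combine` cannot tell a below-threshold or corrupted share set from a valid one: it silently returns a wrong value, so real tooling should verify the recovered material (for example against a known public key) before use.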

This is the recommended balance because:

  • 2-of-3 is too fragile for travel, turnover, illness, or simultaneous unavailability;
  • 4-of-7 adds coordination drag without enough extra safety for the current size of the operating team;
  • 3-of-5 keeps strong separation of control while still being practical during a real incident.
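The trade-off can be put in rough numbers. The sketch below assumes each custodian is independently reachable during an incident with probability 0.8; that figure is purely illustrative, not a measured one, and independence is optimistic because absences often correlate (shared travel, one office).

```python
from math import comb


def p_reconstructable(k: int, n: int, p_available: float) -> float:
    """Probability that at least k of n custodians can be reached.

    Assumes each custodian is reachable independently with probability
    p_available; correlated absences make the real number worse.
    """
    return sum(
        comb(n, i) * p_available**i * (1 - p_available) ** (n - i)
        for i in range(k, n + 1)
    )


for k, n in [(2, 3), (3, 5), (4, 7)]:
    print(
        f"{k}-of-{n}: survives {n - k} missing custodians, "
        f"an attacker needs {k} shards, "
        f"P(reachable at p=0.8) = {p_reconstructable(k, n, 0.8):.3f}"
    )
# 2-of-3: ... P(reachable at p=0.8) = 0.896
# 3-of-5: ... P(reachable at p=0.8) = 0.942
# 4-of-7: ... P(reachable at p=0.8) = 0.967
```

Under these assumed numbers, moving from 2-of-3 to 3-of-5 raises both the collusion bar and the chance of reaching quorum, while 4-of-7 buys a smaller availability gain at the cost of two more custodian slots and one more person to coordinate.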

3.2 Required custodian classes

Assign the 5 shards to 5 distinct custodians drawn from distinct responsibility domains:

  1. Operations lead
  2. Security lead
  3. Platform or protocol lead
  4. Executive or governance delegate
  5. Independent recovery custodian

The independent recovery custodian should not be part of the routine deployer or signer path. This can be a board-level delegate, outside counsel escrow, or another approved non-operator custodian with a documented identity and retrieval process.

3.3 Custody rules

  • Each custodian holds exactly one shard.
  • No household, single reporting line, or single laptop/password manager may control enough material to reach the threshold alone.
  • Shards must be stored offline or in an offline-first medium.
  • At least 2 shards must be held in physically distinct locations.
  • No shard may be stored in the same place as the full reconstruction instructions unless those instructions are separately access-controlled.
  • No plaintext shard may be committed to git, chat, ticketing, or shared cloud docs.
  • Photographs, screenshots, clipboard sync, and auto-backup of shard material are prohibited.
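The machine-checkable custody rules above can be expressed as a validation pass over the custody register. This is a sketch under assumptions: the `Custodian` fields and the `custody_violations` name are hypothetical, and rules about storage media, git, chat, and screenshots still need human attestation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Custodian:
    # Hypothetical register fields; the real custody register defines these.
    name: str
    role: str            # custodian class, e.g. "security"
    household: str
    reporting_line: str
    device: str          # laptop or password manager that holds the shard
    location: str        # physical storage location
    shards: int = 1


def custody_violations(register: list[Custodian], threshold: int = 3) -> list[str]:
    """Return human-readable violations of the section 3.3 custody rules."""
    problems: list[str] = []
    # Each custodian holds exactly one shard.
    for c in register:
        if c.shards != 1:
            problems.append(f"{c.name} holds {c.shards} shards, expected exactly 1")
    # Each custodian class is filled by a different person.
    for role, count in Counter(c.role for c in register).items():
        if count > 1:
            problems.append(f"role {role!r} is held by {count} custodians")
    # No shared household, reporting line, or device may reach the threshold.
    for attr in ("household", "reporting_line", "device"):
        totals: Counter[str] = Counter()
        for c in register:
            totals[getattr(c, attr)] += c.shards
        for group, total in totals.items():
            if total >= threshold:
                problems.append(f"{attr} {group!r} controls {total} shards")
    # Shards must not all sit in one physical location.
    if len({c.location for c in register}) < 2:
        problems.append("all shards are stored in a single location")
    return problems
```

An empty result means the register passes these structural checks; it says nothing about whether each shard is actually stored offline.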

3.4 Reconstruction approval gate

Reconstruction may occur only when one of the following is true:

  • an approved planned admin rotation is in progress;
  • a suspected key compromise or confirmed loss event requires emergency action;
  • a scheduled disaster-recovery drill has been approved in advance.

Every reconstruction event requires:

  • a ticket or written change record;
  • named approvers;
  • reason for reconstruction;
  • time window;
  • witness log of which custodians participated;
  • post-event confirmation that temporary plaintext was destroyed.
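One way to enforce this gate is to keep each reconstruction event as a structured record and check it twice: before any shard is released, and before the record is closed. The record fields, reason labels, and function names below are hypothetical, sketched from the requirements listed above.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical labels for the three approved triggers in section 3.4.
ALLOWED_REASONS = {"planned_rotation", "incident_recovery", "dr_drill"}


@dataclass
class ReconstructionRecord:
    ticket: str                      # ticket or written change record ID
    approvers: list[str]             # named approvers
    reason: str                      # one of ALLOWED_REASONS
    window_start: datetime           # approved time window
    window_end: datetime
    participants: list[str] = field(default_factory=list)  # witness log
    plaintext_destroyed: bool = False  # post-event confirmation


def start_errors(rec: ReconstructionRecord, now: datetime) -> list[str]:
    """Checks that must pass before any custodian releases a shard."""
    errors = []
    if not rec.ticket:
        errors.append("missing ticket or change record")
    if not rec.approvers:
        errors.append("no named approvers")
    if rec.reason not in ALLOWED_REASONS:
        errors.append(f"reason {rec.reason!r} is not an approved trigger")
    if not rec.window_start <= now <= rec.window_end:
        errors.append("current time is outside the approved window")
    return errors


def close_errors(rec: ReconstructionRecord, threshold: int = 3) -> list[str]:
    """Checks that must pass before the record can be closed."""
    errors = []
    # A successful reconstruction needed at least `threshold` custodians.
    if len(set(rec.participants)) < threshold:
        errors.append("witness log names fewer custodians than the threshold")
    if not rec.plaintext_destroyed:
        errors.append("destruction of temporary plaintext not confirmed")
    return errors
```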

3.5 Safe-first operating model

The shard policy does not mean the reconstructed secret should become the daily operating admin.

The steady-state operating model is:

  • on-chain admin lives in a Safe;
  • Safe signers use hardware-backed wallets where possible;
  • shard reconstruction is reserved for recovery or controlled migration events;
  • after rotation, recovered or superseded material is re-sharded or retired.

For NEW_ADMIN_ADDRESS, the preferred target is:

  • a Chain 138 Safe;
  • 3-of-5 threshold if 5 reliable signers are available;
  • 2-of-3 threshold only as an interim fallback if the 5-signer set is not ready yet.

If a temporary 2-of-3 Safe is used to unblock Chain 138, it should be explicitly marked as interim and scheduled for migration to 3-of-5 before broader multi-chain rollout.
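The preference order for NEW_ADMIN_ADDRESS reduces to a small decision rule. This sketch assumes the only input is the count of reliable signers; `SafeTarget`, `choose_safe_target`, and `may_roll_out_multichain` are hypothetical names, and the sketch does not create or configure a Safe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeTarget:
    owners: int      # number of Safe signers
    threshold: int   # confirmations required per transaction
    interim: bool    # True if this shape must later migrate to 3-of-5


def choose_safe_target(reliable_signers: int) -> SafeTarget:
    """Preferred NEW_ADMIN_ADDRESS Safe shape on Chain 138, per section 3.5."""
    if reliable_signers >= 5:
        return SafeTarget(owners=5, threshold=3, interim=False)
    if reliable_signers >= 3:
        # Allowed only as an explicitly interim step to unblock Chain 138.
        return SafeTarget(owners=3, threshold=2, interim=True)
    raise ValueError("fewer than 3 reliable signers; no acceptable Safe shape")


def may_roll_out_multichain(target: SafeTarget) -> bool:
    """An interim 2-of-3 Safe must be migrated before broader rollout."""
    return not target.interim
```

Carrying the `interim` flag on the target itself keeps the migration obligation visible to any later rollout check instead of relying on memory.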

3.6 Lifecycle requirements

  • Re-verify custodian availability quarterly.
  • Re-issue shards after any custodian departure, suspected exposure, or failed drill.
  • Run at least one documented recovery drill before expanding from Chain 138 to additional production chains.
  • Keep a custody register with custodian name, role, storage method, storage region, last attestation date, and replacement contact path.
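The register fields and re-verification cadence above can be checked mechanically. This sketch assumes "quarterly" means at most 92 days between attestations; the `RegisterEntry` type, the event labels, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is enforced as at most 92 days between attestations.
MAX_ATTESTATION_AGE = timedelta(days=92)

# Hypothetical event labels for the shard re-issue triggers listed above.
RESHARD_TRIGGERS = {"custodian_departure", "suspected_exposure", "failed_drill"}


@dataclass
class RegisterEntry:
    custodian: str
    role: str
    storage_method: str
    storage_region: str
    last_attestation: date
    replacement_contact: str


def stale_custodians(register: list[RegisterEntry], today: date) -> list[str]:
    """Custodians whose availability has not been re-verified this quarter."""
    return [
        e.custodian
        for e in register
        if today - e.last_attestation > MAX_ATTESTATION_AGE
    ]


def must_reissue_shards(events: set[str]) -> bool:
    """True if any recorded event requires a fresh split of the secret."""
    return bool(events & RESHARD_TRIGGERS)
```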

4. Operational Guidance

4.1 When this policy blocks work

Rotation work should be treated as blocked if any of the following remain unresolved:

  • the five custodian slots are not assigned;
  • the threshold is not approved;
  • NEW_ADMIN_ADDRESS is still an undecided EOA-versus-Safe placeholder;
  • no recovery log template exists for reconstruction events.
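These blocking conditions can serve as a pre-flight check in rotation tooling. The sketch below is hypothetical in its field names and in how the admin decision is encoded; it also treats a decided EOA as a blocker, since the decision summary names a Safe as the target admin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RotationReadiness:
    custodian_slots_assigned: int    # out of 5
    threshold_approved: bool
    new_admin_kind: Optional[str]    # hypothetical: "safe", "eoa", or None if undecided
    recovery_log_template_exists: bool


def rotation_blockers(r: RotationReadiness) -> list[str]:
    """Section 4.1: any non-empty result means live rotation is blocked."""
    blockers = []
    if r.custodian_slots_assigned < 5:
        blockers.append(f"only {r.custodian_slots_assigned} of 5 custodian slots assigned")
    if not r.threshold_approved:
        blockers.append("threshold not approved")
    if r.new_admin_kind is None:
        blockers.append("NEW_ADMIN_ADDRESS is still an undecided EOA-versus-Safe placeholder")
    elif r.new_admin_kind != "safe":
        blockers.append("NEW_ADMIN_ADDRESS is not a Safe, contrary to the Safe-first model")
    if not r.recovery_log_template_exists:
        blockers.append("no recovery log template for reconstruction events")
    return blockers
```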

4.2 What can proceed before live rotation

The following work may proceed before live execution:

  • non-broadcast Forge scripts;
  • dry-run and compile validation;
  • runbook drafting;
  • Safe creation and signer enrollment;
  • custody register preparation;
  • incident and recovery checklist review.

4.3 What must not proceed before approval

The following should not proceed until this policy is approved as written, or revised and then approved:

  • live admin transfer;
  • live shard generation for production recovery material;
  • any irreversible decommissioning of the old admin path;
  • multi-chain expansion of the rotation procedure.
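The two lists in 4.2 and 4.3 can be encoded as an allow-list gate. The task labels are hypothetical shorthand for the items above, and the choice to reject unlisted tasks (fail closed) is an assumption of this sketch, not something the policy states.

```python
# Hypothetical task labels for the two lists in sections 4.2 and 4.3.
ALLOWED_BEFORE_APPROVAL = {
    "forge_script_no_broadcast",
    "dry_run_and_compile",
    "runbook_drafting",
    "safe_creation_and_signer_enrollment",
    "custody_register_preparation",
    "checklist_review",
}
REQUIRES_APPROVAL = {
    "live_admin_transfer",
    "live_production_shard_generation",
    "decommission_old_admin_path",
    "multichain_rotation_expansion",
}


def may_proceed(task: str, policy_approved: bool) -> bool:
    """Gate a task on the approval state of this policy."""
    if task in ALLOWED_BEFORE_APPROVAL:
        return True
    if task in REQUIRES_APPROVAL:
        return policy_approved
    # Fail closed: unlisted work is refused until someone classifies it.
    raise ValueError(f"unclassified task {task!r}")
```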

5. Approve / Revise Checklist

Approve if you agree with all of the following:

  • Threshold: 3-of-5
  • Steady-state admin: Safe multisig
  • Chain 138 target: NEW_ADMIN_ADDRESS is a Chain 138 Safe
  • Fallback: temporary 2-of-3 Safe allowed only if explicitly interim
  • Rollout gate: no live rotation on any other chain until the Chain 138 drill and verification are complete

Revise if you want to change:

  • threshold size;
  • custodian classes;
  • whether the independent custodian is internal or external;
  • whether interim 2-of-3 is allowed;
  • whether one additional chain may run in parallel with Chain 138.

6. Recommendation

Approve the 3-of-5 Safe-first model as written.

It is the strongest option that still fits the current repo's operational maturity, it removes the single-operator failure mode, and it gives the Chain 138 rotation work a clear governance target without forcing immediate parallel expansion.