# Vault Shard Custody Policy

**Last Updated:** 2026-04-18
**Status:** Proposed decision draft (approve or revise)
**Scope:** Secret-shard custody for vault-related recovery material, rotation preparation, and admin-control continuity

---

## 1. Purpose

This document defines the custody policy for any shard-split recovery material used in the vault and contract-admin control plane. It exists to remove ambiguity before live admin rotation work begins.

The policy is designed to:

- prevent any one person from reconstructing privileged material alone;
- preserve recoverability if one or more custodians become unavailable;
- keep rotation work operationally realistic for Chain 138 and the follow-on chains;
- keep custody procedures compatible with a Safe-first admin model.

---

## 2. Decision Summary

**Recommendation (detailed in §3):** adopt a **3-of-5 shard custody model** with **role-separated custodians**, **one shard per custodian**, and **Safe-based contract admin** as the steady-state control plane.

This means:

- recovery material is split into **5 shards**;
- any **3 shards** are required to reconstruct;
- no custodian may hold more than **1 shard**;
- the operational target admin on-chain is a **Safe (formerly Gnosis Safe) multisig**, not an EOA;
- shard reconstruction is allowed only for an approved rotation, incident recovery, or disaster-recovery drill.

---

## 3. Recommended Custody Policy

### 3.1 Selected model

Use **Shamir-style 3-of-5 sharding** for the recovery secret or equivalent root material that would allow privileged admin continuity.
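For reviewers less familiar with the scheme, here is a minimal Python sketch of Shamir secret sharing in a 3-of-5 configuration: the secret becomes the constant term of a random degree-2 polynomial over a prime field, each shard is one point on that polynomial, and any three points recover the constant term by Lagrange interpolation. It is illustrative only. The field prime (the Mersenne prime 2**521 - 1), the function names, and the integer encoding of the secret are assumptions chosen for readability, and the code is not hardened (no constant-time arithmetic, no share integrity checks). Production shards should come from an audited, reviewed implementation.

```python
import secrets

# Assumption: a Mersenne prime field large enough for a 32-byte (256-bit) secret.
PRIME = 2**521 - 1


def _eval_poly(coeffs: list[int], x: int) -> int:
    """Evaluate a polynomial (constant term first) at x modulo PRIME, using Horner's rule."""
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % PRIME
    return result


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Return `shares` points on a random degree-(threshold - 1) polynomial whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be an integer in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("require 1 <= threshold <= shares")
    # Random coefficients come from the OS CSPRNG via the secrets module.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    # x = 0 holds the secret, so shards use x = 1..shares.
    return [(x, _eval_poly(coeffs, x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 from shards with distinct x values."""
    xs = [x for x, _ in points]
    if len(set(xs)) != len(xs):
        raise ValueError("shards must have distinct x coordinates")
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = int.from_bytes(secrets.token_bytes(32), "big")
    shards = split_secret(key, threshold=3, shares=5)
    # Any three of the five shards reconstruct the key.
    assert recover_secret([shards[0], shards[2], shards[4]]) == key
```

Fewer than three shards yield an unrelated field element rather than a partial secret, which is what lets the policy tolerate the loss or exposure of up to two shards.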

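The 3-of-5 threshold is the recommended balance because:

- **2-of-3 is too fragile:** it tolerates only one unavailable custodian, so travel, turnover, or illness affecting two people at once blocks recovery;
- **4-of-7 adds coordination drag:** it requires four people to convene and seven shards to track, without enough extra safety for the current operator size;
- **3-of-5** tolerates two unavailable custodians and requires three to collude, which keeps strong separation of control while remaining practical during a real incident.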
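The trade-off can be made concrete with simple arithmetic. The sketch below assumes, hypothetically, that each custodian is independently reachable with probability 0.9. Under that assumption, 2-of-3 reaches its threshold about 97.2% of the time, 3-of-5 about 99.1%, and 4-of-7 about 99.7%, while the number of custodians who must collude rises from 2 to 3 to 4. Real outages are correlated (shared travel, one office, one team), so these figures are optimistic for every scheme.

```python
from math import comb


def scheme_profile(k: int, n: int, p_available: float) -> dict:
    """Summarize a k-of-n scheme.

    Assumes each custodian is independently reachable with probability
    p_available; correlated outages make real-world numbers worse.
    """
    # Binomial tail: probability that at least k of the n custodians are reachable.
    p_reach = sum(
        comb(n, i) * p_available**i * (1 - p_available) ** (n - i)
        for i in range(k, n + 1)
    )
    return {
        "scheme": f"{k}-of-{n}",
        "tolerated_unavailable": n - k,  # custodians who can be absent
        "colluders_needed": k,           # custodians who must conspire to reconstruct
        "shards_to_track": n,            # register entries to maintain and attest
        "p_threshold_reachable": p_reach,
    }


if __name__ == "__main__":
    for k, n in [(2, 3), (3, 5), (4, 7)]:
        print(scheme_profile(k, n, p_available=0.9))
```

Moving from 3-of-5 to 4-of-7 buys well under one percentage point of reachability in this model, at the cost of two more shards to manage and one more person to convene.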

### 3.2 Required custodian classes

Assign the 5 shards to 5 distinct custodians drawn from distinct responsibility domains:

1. **Operations lead**
2. **Security lead**
3. **Platform or protocol lead**
4. **Executive or governance delegate**
5. **Independent recovery custodian**

The independent recovery custodian should not be part of the routine deployer or signer path. This can be a board-level delegate, outside counsel escrow, or another approved non-operator custodian with a documented identity and retrieval process.

### 3.3 Custody rules

- Each custodian holds exactly **one shard**.
- No household, single reporting line, or single laptop/password manager may control enough material to reach the threshold alone.
- Shards must be stored **offline** or in an **offline-first** medium.
- At least **2 shards** must be held in physically distinct locations.
- No shard may be stored in the same place as the full reconstruction instructions unless those instructions are separately access-controlled.
- No plaintext shard may be committed to git, chat, ticketing, or shared cloud docs.
- Photographs, screenshots, clipboard sync, and auto-backup of shard material are prohibited.
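The countable rules above (exactly one shard per custodian, no household, reporting line, or storage container reaching the threshold, and more than one physical location) can be checked mechanically against the custody register. A minimal sketch follows; the `ShardHolding` fields and group labels are hypothetical, and the offline-storage and no-photograph rules still depend on human attestation rather than code.

```python
from collections import Counter
from dataclasses import dataclass

THRESHOLD = 3
TOTAL_SHARDS = 5


@dataclass(frozen=True)
class ShardHolding:
    custodian: str
    shard_id: int
    household: str
    reporting_line: str
    storage_container: str  # hypothetical label: laptop, password manager vault, safe-deposit box
    location: str


def custody_violations(holdings: list[ShardHolding]) -> list[str]:
    """Return human-readable violations of the countable custody rules."""
    problems: list[str] = []

    # Each custodian holds exactly one shard.
    for custodian, count in Counter(h.custodian for h in holdings).items():
        if count != 1:
            problems.append(f"custodian {custodian!r} holds {count} shards")

    # Every shard ID 1..TOTAL_SHARDS appears exactly once.
    ids = sorted(h.shard_id for h in holdings)
    if ids != list(range(1, TOTAL_SHARDS + 1)):
        problems.append(f"shard IDs {ids} do not cover 1..{TOTAL_SHARDS} exactly once")

    # No shared group may control enough shards to reach the threshold alone.
    for attr in ("household", "reporting_line", "storage_container"):
        for group, count in Counter(getattr(h, attr) for h in holdings).items():
            if count >= THRESHOLD:
                problems.append(f"{attr} {group!r} controls {count} shards (threshold {THRESHOLD})")

    # Shards must not all sit in one physical location.
    if len({h.location for h in holdings}) < 2:
        problems.append("all shards are held in a single physical location")

    return problems
```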

### 3.4 Reconstruction approval gate

Reconstruction may occur only when one of the following is true:

- an approved **planned admin rotation** is in progress;
- a **suspected key compromise** or confirmed loss event requires emergency action;
- a scheduled **disaster-recovery drill** has been approved in advance.

Every reconstruction event requires:

- a ticket or written change record;
- named approvers;
- reason for reconstruction;
- time window;
- witness log of which custodians participated;
- post-event confirmation that temporary plaintext was destroyed.
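One way to keep this gate auditable is to treat each reconstruction as a structured record that is validated twice: before custodians convene (trigger, approvers, window) and after the event (witness log, destruction confirmation). The sketch below is hypothetical; the field names, reason labels, and the two-phase split are assumptions, not an existing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The three triggers permitted by the approval gate (labels are hypothetical).
ALLOWED_REASONS = {"planned_admin_rotation", "suspected_compromise_or_loss", "approved_dr_drill"}


@dataclass
class ReconstructionRecord:
    change_ref: str                 # ticket or written change-record ID
    approvers: list[str]            # named approvers
    reason: str                     # one of ALLOWED_REASONS
    window_start: datetime
    window_end: datetime
    participants: list[str] = field(default_factory=list)  # witness log of custodians
    plaintext_destroyed: bool = False                       # post-event confirmation


def gate_errors(rec: ReconstructionRecord, threshold: int = 3, post_event: bool = False) -> list[str]:
    """List unmet requirements; pre-event checks always run, post-event checks on request."""
    errors: list[str] = []
    if not rec.change_ref.strip():
        errors.append("missing ticket or change record")
    if not rec.approvers:
        errors.append("no named approvers")
    if rec.reason not in ALLOWED_REASONS:
        errors.append(f"reason {rec.reason!r} is not an approved trigger")
    if rec.window_end <= rec.window_start:
        errors.append("time window is empty or inverted")
    if post_event:
        if len(set(rec.participants)) < threshold:
            errors.append("witness log names fewer custodians than the threshold")
        if not rec.plaintext_destroyed:
            errors.append("temporary plaintext destruction not confirmed")
    return errors
```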

### 3.5 Safe-first operating model

The shard policy does **not** mean the reconstructed secret should become the daily operating admin.

The steady-state operating model is:

- on-chain admin lives in a **Safe**;
- Safe signers use hardware-backed wallets where possible;
- shard reconstruction is reserved for recovery or controlled migration events;
- post-rotation, recovered or superseded material is re-sharded or retired.

### 3.6 Recommended Safe posture

For `NEW_ADMIN_ADDRESS`, the preferred target is:

- a **Chain 138 Safe**;
- **3-of-5 threshold** if 5 reliable signers are available;
- **2-of-3 threshold** only as an interim fallback if the 5-signer set is not ready yet.

If a temporary `2-of-3` Safe is used to unblock Chain 138, it should be explicitly marked as **interim** and scheduled for migration to **3-of-5** before broader multi-chain rollout.
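The selection rule above can be written down so the interim case is always flagged. The helper below is a hypothetical sketch, not part of any Safe SDK; it only decides owner count, threshold, and the interim marker. Actual Safe deployment and owner configuration would still go through the team's Safe tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafePlan:
    owners: int
    threshold: int
    interim: bool
    note: str


def plan_new_admin_safe(reliable_signers: int) -> SafePlan:
    """Pick the Safe posture for NEW_ADMIN_ADDRESS from the number of reliable signers."""
    if reliable_signers >= 5:
        return SafePlan(owners=5, threshold=3, interim=False, note="target posture")
    if reliable_signers >= 3:
        # Permitted only as an explicitly marked interim configuration.
        return SafePlan(
            owners=3,
            threshold=2,
            interim=True,
            note="interim: migrate to 3-of-5 before broader multi-chain rollout",
        )
    raise ValueError("fewer than 3 reliable signers: no policy-compliant Safe posture")
```

Raising on fewer than three signers, rather than returning a weaker Safe, keeps the policy from silently degrading below the documented fallback.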

### 3.7 Lifecycle requirements

- Re-verify custodian availability quarterly.
- Re-issue shards after any custodian departure, suspected exposure, or failed drill.
- Run at least one documented recovery drill before expanding from Chain 138 to additional production chains.
- Keep a custody register with custodian name, role, storage method, storage region, last attestation date, and replacement contact path.
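A custody register with these fields also makes the quarterly check mechanical. In the sketch below, the entry type, the helper names, and the reading of "quarterly" as at most 92 days between attestations are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "quarterly" is enforced as no more than 92 days between attestations.
MAX_ATTESTATION_AGE = timedelta(days=92)


@dataclass(frozen=True)
class RegisterEntry:
    custodian_name: str
    role: str
    storage_method: str
    storage_region: str
    last_attestation: date
    replacement_contact: str


def overdue_attestations(register: list[RegisterEntry], today: date) -> list[str]:
    """Names of custodians whose last attestation is older than the quarterly window."""
    return [e.custodian_name for e in register if today - e.last_attestation > MAX_ATTESTATION_AGE]


def reissue_required(custodian_departed: bool, suspected_exposure: bool, failed_drill: bool) -> bool:
    """Any one of the listed lifecycle events triggers a shard re-issue."""
    return custodian_departed or suspected_exposure or failed_drill
```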

---

## 4. Operational Guidance

### 4.1 When this policy blocks work

Rotation work should be treated as blocked if any of the following remain unresolved:

- the five custodian slots are not assigned;
- the threshold is not approved;
- `NEW_ADMIN_ADDRESS` is still an undecided EOA-versus-Safe placeholder;
- no recovery log template exists for reconstruction events.
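These blocking conditions can be evaluated as a simple pre-flight check before any rotation session. The function and parameter names below are hypothetical; a `None` value for the admin type models the undecided EOA-versus-Safe placeholder.

```python
from typing import Optional


def rotation_blockers(
    assigned_custodian_slots: int,
    threshold_approved: bool,
    new_admin_is_safe: Optional[bool],  # None means EOA-versus-Safe is still undecided
    recovery_log_template_exists: bool,
) -> list[str]:
    """Return the unresolved conditions that keep rotation work blocked."""
    blockers: list[str] = []
    if assigned_custodian_slots < 5:
        blockers.append(f"only {assigned_custodian_slots} of 5 custodian slots assigned")
    if not threshold_approved:
        blockers.append("threshold not approved")
    if new_admin_is_safe is None:
        blockers.append("NEW_ADMIN_ADDRESS is still an undecided EOA-versus-Safe placeholder")
    if not recovery_log_template_exists:
        blockers.append("no recovery log template for reconstruction events")
    return blockers
```

An empty list means none of the §4.1 conditions apply; it does not by itself approve live execution, which still depends on §4.3.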

### 4.2 What can proceed before live rotation

The following work may proceed before live execution:

- non-broadcast Forge scripts;
- dry-run and compile validation;
- runbook drafting;
- Safe creation and signer enrollment;
- custody register preparation;
- incident and recovery checklist review.

### 4.3 What must not proceed before approval

The following should not proceed until this policy is approved, either as written or after revision:

- live admin transfer;
- live shard generation for production recovery material;
- any irreversible decommissioning of the old admin path;
- multi-chain expansion of the rotation procedure.
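The split between §4.2 and §4.3 can be encoded as a lookup so tooling or checklists refuse gated work by default. The activity labels are hypothetical, and treating unlisted activities as blocked is an added assumption (default-deny). For multi-chain expansion, approval is necessary but not sufficient, since the Chain 138 drill must also be complete.

```python
from enum import Enum


class Gate(Enum):
    PRE_APPROVAL_OK = "may proceed before approval"
    NEEDS_APPROVAL = "must wait for policy approval"


# Hypothetical activity labels mirroring §4.2 and §4.3.
ACTIVITY_GATES = {
    "non_broadcast_forge_scripts": Gate.PRE_APPROVAL_OK,
    "dry_run_and_compile_validation": Gate.PRE_APPROVAL_OK,
    "runbook_drafting": Gate.PRE_APPROVAL_OK,
    "safe_creation_and_signer_enrollment": Gate.PRE_APPROVAL_OK,
    "custody_register_preparation": Gate.PRE_APPROVAL_OK,
    "incident_and_recovery_checklist_review": Gate.PRE_APPROVAL_OK,
    "live_admin_transfer": Gate.NEEDS_APPROVAL,
    "live_production_shard_generation": Gate.NEEDS_APPROVAL,
    "irreversible_old_admin_decommission": Gate.NEEDS_APPROVAL,
    "multi_chain_rotation_expansion": Gate.NEEDS_APPROVAL,
}


def may_proceed(activity: str, policy_approved: bool) -> bool:
    """True if the activity is allowed given the current approval state."""
    gate = ACTIVITY_GATES.get(activity)
    if gate is None:
        # Assumption: unlisted activities stay blocked until someone classifies them.
        return False
    return gate is Gate.PRE_APPROVAL_OK or policy_approved
```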

---

## 5. Approve / Revise Checklist

Approve if you agree with all of the following:

- **Threshold:** `3-of-5`
- **Steady-state admin:** Safe multisig
- **Chain 138 target:** `NEW_ADMIN_ADDRESS` is a Chain 138 Safe
- **Fallback:** temporary `2-of-3` Safe allowed only if explicitly interim
- **Rollout gate:** no other-chain live rotation until Chain 138 drill and verification are complete

Revise if you want to change:

- threshold size;
- custodian classes;
- whether the independent custodian is internal or external;
- whether interim `2-of-3` is allowed;
- whether one additional chain may run in parallel with Chain 138.

---

## 6. Recommendation

Approve the **3-of-5 Safe-first** model as written.

It is the strongest option that still fits the current repo’s operational maturity. It removes the single-operator failure mode and gives the Chain 138 rotation work a clear governance target without forcing immediate parallel expansion.