# Operations Runbook - Complete System

**Date**: Operations Runbook

**Status**: ✅ COMPLETE

---

## Overview

This runbook provides operational procedures for:

1. Vault System Operations
2. ISO-4217 W Token System Operations
3. Bridge System Operations
4. Emergency Procedures

---

## 1. Daily Operations

### 1.1 Vault System Monitoring

#### Health Check

```bash
# Check vault health ratios
cast call $LEDGER_ADDRESS "getVaultHealth(address)" $VAULT_ADDRESS --rpc-url $RPC_URL

# Check total collateral
cast call $LEDGER_ADDRESS "totalCollateral(address)" $ASSET_ADDRESS --rpc-url $RPC_URL

# Check total debt
cast call $LEDGER_ADDRESS "totalDebt(address)" $CURRENCY_ADDRESS --rpc-url $RPC_URL
```

#### Alert Thresholds

- **Health Ratio < 120%**: Warning alert
- **Health Ratio < 110%**: Critical alert (liquidation threshold)
- **Debt Ceiling > 90%**: Warning alert
- **Oracle Staleness > 1 hour**: Critical alert

---

### 1.2 ISO-4217 W Token Monitoring

#### Reserve Verification

```bash
# Check reserve sufficiency for USDW
cast call $USDW_ADDRESS "isReserveSufficient()" --rpc-url $RPC_URL

# Get reserve balance
cast call $USDW_ADDRESS "verifiedReserve()" --rpc-url $RPC_URL

# Get total supply
cast call $USDW_ADDRESS "totalSupply()" --rpc-url $RPC_URL

# Calculate reserve ratio
# Reserve Ratio = (verifiedReserve / totalSupply) * 100
```

#### Daily Reserve Check

1. **Check Reserve Oracle Reports**
   ```bash
   cast call $RESERVE_ORACLE "getVerifiedReserve(address)" $USDW_ADDRESS --rpc-url $RPC_URL
   ```
2. **Verify Quorum**
   ```bash
   cast call $RESERVE_ORACLE "isQuorumMet(address)" $USDW_ADDRESS --rpc-url $RPC_URL
   ```
3. **Check for Stale Reports**
   - Reports older than 1 hour should be removed
   - If quorum not met, investigate oracle issues

#### Alert Thresholds

- **Reserve Ratio < 100%**: CRITICAL - Minting must halt
- **Reserve Ratio < 105%**: Warning alert
- **Oracle Quorum Not Met**: Critical alert
- **Stale Reports Detected**: Warning alert

---

### 1.3 Bridge System Monitoring

#### Bridge Health Metrics

```bash
# Check bridge success rate
# Query bridge events for success/failure counts

# Check settlement times
# Monitor TransferStatusUpdated events

# Check reserve verification failures
# Monitor ReserveVerified events with sufficient=false
```

#### Alert Thresholds

- **Success Rate < 95%**: Warning alert
- **Success Rate < 90%**: Critical alert
- **Settlement Time > 1 hour**: Warning alert
- **Reserve Verification Failures**: Critical alert
- **Compliance Violations**: Critical alert

---

### 1.4 Reserve and Stabilization Policies (VAULT_SYSTEM_MASTER_TECHNICAL_PLAN)

The following formulas and checklists are from [VAULT_SYSTEM_MASTER_TECHNICAL_PLAN](../../../docs/VAULT_SYSTEM_MASTER_TECHNICAL_PLAN.md). Use them for sizing and operational verification.

#### Reserve Sizing Model

- **Variables:** PeakMinuteOutflow = P, StabilizationWindow = T (minutes).
- **Required reserve:** Reserve ≥ P × T.
- **Recommended safety factor:** 3–5× peak minute outflow.
- **Example:** P = 10,000, T = 5 min → Reserve ≥ 50,000; with 3× safety → 150,000.

#### Cantilever Stabilization Model

- **Condition:** s × f ≥ Δ (s = micro trade size, f = micro trade frequency, Δ = net imbalance per minute).
- **Dynamic rule:** If deviation > θ, set s = k × deviation (eliminates fixed frequency dependency).
- **Use:** Size and frequency of stabilization trades so throughput offsets macro flow.

#### Bridge Liquidity Buffer

- **Rule:** BridgeReserve ≥ PeakBridgeOutflow × Latency (where Latency = bridge settlement time).
- **Use:** Ensure cross-chain bridge buffers satisfy this rule so that outflows do not exhaust reserves during settlement.

#### Cross-chain parity and bridge buffer

- **Objective:** Maintain |Price138 − Price651940| < ArbitrageThreshold (see [CROSS_CHAIN_ARBITRAGE_DESIGN](../../../docs/07-ccip/CROSS_CHAIN_ARBITRAGE_DESIGN.md)). Cross-chain private arbitrage bots execute when the deviation exceeds the threshold; the bridge reserve must be sized so that outflows do not exhaust reserves during settlement.
- **Bridge buffer formula:** BridgeReserve ≥ PeakBridgeOutflow × Latency.
- **PeakBridgeOutflow:** Measure from bridge events (e.g. lock/release or TransferInitiated volume) over a rolling window (e.g. peak hourly or daily outflow in USD or token units).
- **Latency:** Bridge settlement time (e.g. typical time from lock on the source chain to release on the destination, in minutes or blocks). Use the historical median or P95, and express it in the same time unit as the PeakBridgeOutflow rate so that the product is an amount.
- **Sizing steps:** (1) Query bridge contract events for initiated/released amounts per time window; (2) compute peak outflow; (3) measure typical settlement latency; (4) set minimum reserve = Peak × Latency; (5) add a safety factor (e.g. 1.5–2×) and document it in the runbook.
- **Alert when reserve below:** If BridgeReserve < PeakBridgeOutflow × Latency (or below the safety threshold), trigger a **Warning** alert. If the reserve is falling and may breach within one settlement window, escalate to **Critical**. Integrate with existing monitoring (e.g. Prometheus + PagerDuty when the monitoring stack is deployed). See [VAULT_SYSTEM_MASTER_TECHNICAL_PLAN](../../../docs/VAULT_SYSTEM_MASTER_TECHNICAL_PLAN.md) §9.

#### Flash Loan Containment Checklist

- Use **TWAP deviation detection** (not single-block price).
- **Ignore single-block imbalance** for stabilizer triggers.
- Require **sustained deviation for N blocks** before rebalancing.
- **Cap per-block stabilization volume** to limit flash-driven execution.
- **Target:** Flash drain recovery <3 blocks (per Master Plan §16).
- **On-chain:** The [Stabilizer](../../contracts/bridge/trustless/integration/Stabilizer.sol) (Phase 3 + 6) implements block delay, sustained-deviation buffer, per-block volume cap, and slippage/gas checks; deploy and configure per [CONTRACT_DEPLOYMENT_RUNBOOK](../../../docs/03-deployment/CONTRACT_DEPLOYMENT_RUNBOOK.md) § Stabilizer.

---

## 2. Weekly Operations

### 2.1 Reserve Attestation

#### Weekly Reserve Report

1. **Collect Custodial Balances**
   - USDW: Check USD custodial account
   - EURW: Check EUR custodial account
   - GBPW: Check GBP custodial account
2. **Submit Oracle Reports**
   ```solidity
   reserveOracle.submitReserveReport(
       tokenAddress,
       reserveBalance,
       block.timestamp
   );
   ```
3. **Verify Consensus**
   - Ensure quorum is met
   - Verify consensus matches custodial balance
4. **Publish Proof-of-Reserves**
   - Generate Merkle tree of reserves
   - Publish on-chain hash
   - Update public dashboard

---

### 2.2 System Health Review

#### Review Metrics

- Total vaults created
- Total collateral locked
- Total debt issued
- W token supply per currency
- Reserve ratios
- Bridge operations count
- Success rates

#### Generate Report

- Weekly operations report
- Reserve attestation report
- Compliance status report

---

## 3. Monthly Operations

### 3.1 Security Review

#### Access Control Audit

1. Review all role assignments
2. Verify principle of least privilege
3. Check for unused roles
4. Review multi-sig configurations

#### Compliance Audit

1. Verify money multiplier = 1.0 (all W tokens)
2. Verify GRU isolation (no GRU conversions)
3. Verify ISO-4217 compliance
4. Review reserve attestations

#### Code Review

1. Review recent changes
2. Check for security updates
3. Review dependency updates
4. Verify test coverage

---

### 3.2 Performance Review

#### Gas Optimization

- Review gas usage trends
- Identify optimization opportunities
- Test optimization proposals

#### System Performance

- Review transaction throughput
- Check oracle update frequency
- Review bridge settlement times
- Analyze user patterns

---

## 4. Emergency Procedures

### 4.1 Reserve Shortfall (W Tokens)

#### Symptoms

- Reserve < Supply for any W token
- Money multiplier > 1.0 (supply exceeds verified reserve)
- Reserve verification fails

#### Immediate Actions

1. **Halt Minting**
   ```solidity
   // Revoke MINTER_ROLE so the mint controller can no longer mint
   mintController.revokeRole(keccak256("MINTER_ROLE"), minterAddress);
   ```
2. **Alert Team**
   - Notify operations team
   - Notify compliance team
   - Prepare public statement
3. **Investigate**
   - Check custodial account balance
   - Verify oracle reports
   - Check for accounting errors
4. **Remediation**
   - If accounting error: Correct and resume
   - If actual shortfall: Add reserves or halt operations
   - If oracle issue: Fix oracle and resume

#### Recovery Steps

1. Verify reserve restored
2. Re-enable minting
3. Resume normal operations
4. Post-mortem review

---

### 4.2 Vault Liquidation Event

#### Symptoms

- Vault health ratio < 110%
- Liquidation triggered

#### Immediate Actions

1. **Verify Liquidation**
   ```bash
   cast call $LIQUIDATION_ADDRESS "canLiquidate(address)" $VAULT_ADDRESS --rpc-url $RPC_URL
   ```
2. **Monitor Liquidation**
   - Track liquidation events
   - Verify collateral seized
   - Verify debt repaid
3. **Post-Liquidation**
   - Check remaining vault health
   - Verify system stability
   - Notify vault owner

---

### 4.3 Bridge Failure

#### Symptoms

- Bridge transaction fails
- Settlement timeout
- Reserve verification fails on bridge

#### Immediate Actions

1. **Check Bridge Status**
   ```bash
   cast call $BRIDGE_REGISTRY "destinations(uint256)" $CHAIN_ID --rpc-url $RPC_URL
   ```
2. **Investigate Failure**
   - Check transaction logs
   - Verify destination chain status
   - Check reserve verification
3. **Initiate Refund** (if timeout)
   ```solidity
   bridgeEscrowVault.initiateRefund(refundRequest, hsmSigner);
   bridgeEscrowVault.executeRefund(transferId);
   ```
4. **Resume Operations**
   - Fix underlying issue
   - Re-enable bridge route
   - Resume normal operations

---

### 4.4 Oracle Failure

#### Symptoms

- Oracle staleness detected
- Quorum not met
- Price feed failure

#### Immediate Actions

1. **Check Oracle Status**
   ```bash
   cast call $XAU_ORACLE "isFrozen()" --rpc-url $RPC_URL
   cast call $RESERVE_ORACLE "isQuorumMet(address)" $TOKEN_ADDRESS --rpc-url $RPC_URL
   ```
2. **Freeze System** (if critical)
   ```solidity
   xauOracle.freeze();
   // Pause vault operations if needed
   ```
3. **Fix Oracle**
   - Add new oracle feeds
   - Remove stale reports
   - Restore quorum
4. **Resume Operations**
   ```solidity
   xauOracle.unfreeze();
   ```

---

### 4.5 Compliance Violation

#### Symptoms

- Money multiplier > 1.0 detected
- GRU conversion detected
- ISO-4217 violation

#### Immediate Actions

1. **Halt Operations**
   - Pause minting
   - Pause bridging
   - Freeze affected tokens
2. **Investigate**
   - Review transaction history
   - Identify violation source
   - Check compliance guard logs
3. **Remediation**
   - Fix violation
   - Restore compliance
   - Resume operations
4. **Post-Mortem**
   - Document violation
   - Update compliance rules
   - Prevent recurrence

---

## 5. Incident Response

### 5.1 Incident Classification

#### Severity Levels

**CRITICAL (P0)**:

- Reserve < Supply (money multiplier violation)
- System compromise
- Complete system failure

**HIGH (P1)**:

- Reserve ratio < 105%
- Bridge failures > 10%
- Oracle quorum failure

**MEDIUM (P2)**:

- Reserve ratio < 110%
- Bridge failures 5-10%
- Single oracle failure

**LOW (P3)**:

- Minor performance issues
- Non-critical alerts
- Documentation updates

---

### 5.2 Incident Response Process

#### Step 1: Detection

- Monitor alerts
- Review logs
- User reports

#### Step 2: Assessment

- Classify severity
- Assess impact
- Identify root cause

#### Step 3: Containment

- Apply emergency procedures
- Halt affected operations
- Isolate issue

#### Step 4: Resolution

- Fix root cause
- Restore operations
- Verify fix

#### Step 5: Post-Mortem

- Document incident
- Identify improvements
- Update procedures

---

## 6. Backup & Recovery

### 6.1 Backup Procedures

#### Daily Backups

- Contract state snapshots
- Configuration backups
- Access control backups

#### Weekly Backups

- Complete system state
- Oracle configuration
- Compliance rules

#### Monthly Backups

- Full system archive
- Historical data
- Audit logs

---

### 6.2 Recovery Procedures

#### Contract State Recovery

1. Identify backup point
2. Restore contract state
3. Verify restoration
4. Resume operations

#### Configuration Recovery

1. Restore configuration files
2. Verify settings
3. Test functionality
4. Resume operations

---

## 7. Monitoring Setup

### 7.1 Key Metrics

#### Vault System Metrics

- Total vaults
- Total collateral (by asset)
- Total debt (by currency)
- Average health ratio
- Liquidation events

#### W Token Metrics

- Supply per token (USDW, EURW, etc.)
- Reserve balance per token
- Reserve ratio per token
- Mint/burn events
- Redemption events

#### Bridge Metrics

- Bridge success rate
- Average settlement time
- Reserve verification success rate
- Compliance check success rate
- Transfer volume

---

### 7.2 Alert Configuration

#### Critical Alerts

```yaml
- name: Reserve Shortfall
  condition: reserveRatio < 100%
  action: halt_minting

- name: Money Multiplier Violation
  condition: reserve < supply
  action: emergency_pause

- name: Bridge Failure Rate High
  condition: successRate < 90%
  action: alert_team
```

#### Warning Alerts

```yaml
- name: Reserve Ratio Low
  condition: reserveRatio < 105%
  action: alert_team

- name: Vault Health Low
  condition: healthRatio < 120%
  action: alert_team

- name: Oracle Staleness
  condition: reportAge > 1hour
  action: alert_team
```

---

## 8. Operational Checklists

### 8.1 Daily Checklist

- [ ] Check all reserve ratios (W tokens)
- [ ] Verify oracle quorum status
- [ ] Check vault health ratios
- [ ] Review bridge success rates
- [ ] Check for critical alerts
- [ ] Review error logs

### 8.2 Weekly Checklist

- [ ] Submit reserve attestations
- [ ] Review system metrics
- [ ] Check access control roles
- [ ] Review compliance status
- [ ] Generate weekly report
- [ ] Update documentation

### 8.3 Monthly Checklist

- [ ] Security review
- [ ] Compliance audit
- [ ] Performance review
- [ ] Backup verification
- [ ] Update procedures
- [ ] Team training

---

## 9. Contact Information

### Emergency Contacts

- **Operations Team**: [Contact Info]
- **Security Team**: [Contact Info]
- **Compliance Team**: [Contact Info]
- **On-Call Engineer**: [Contact Info]

### Escalation Path

1. Operations Team (First Response)
2. Security Team (Security Issues)
3. Compliance Team (Compliance Issues)
4. Management (Critical Issues)

---

**Last Updated**: Operations Runbook Complete
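
The reserve ratio calculation from section 1.2, its alert thresholds (below 100% critical, below 105% warning), and the reserve sizing rule from section 1.4 can be sketched as small shell helpers. This is a minimal sketch, assuming the raw `verifiedReserve()` and `totalSupply()` values have already been fetched (for example with the `cast call` commands in section 1.2) and are passed as plain decimal integers in the same units. The function names `reserve_ratio`, `reserve_alert_level`, and `required_reserve` are hypothetical and not part of any contract or tool referenced in this runbook. Applying the safety factor to P × T follows the worked example in section 1.4.

```bash
#!/bin/sh
# Hypothetical helpers for illustration; not part of the deployed tooling.

# reserve_ratio RESERVE SUPPLY
# Prints (verifiedReserve / totalSupply) * 100 with two decimals.
# awk uses floating point, so very large integers lose some precision;
# that is acceptable for alerting but not for accounting.
reserve_ratio() {
  awk -v r="$1" -v s="$2" 'BEGIN {
    if (s + 0 == 0) { print "n/a"; exit }   # no supply: ratio undefined
    printf "%.2f\n", (r / s) * 100
  }'
}

# reserve_alert_level RESERVE SUPPLY
# Maps reserve vs. supply onto the section 1.2 thresholds:
#   ratio < 100% -> CRITICAL, ratio < 105% -> WARNING, otherwise OK.
# Comparisons are cross-multiplied to avoid a division.
reserve_alert_level() {
  awk -v r="$1" -v s="$2" 'BEGIN {
    if (r + 0 < s + 0)            print "CRITICAL"
    else if (r * 100 < s * 105)   print "WARNING"
    else                          print "OK"
  }'
}

# required_reserve PEAK_MINUTE_OUTFLOW WINDOW_MINUTES [SAFETY_FACTOR]
# Reserve >= P * T, optionally scaled by an integer safety factor
# (assumption: integer inputs small enough for shell arithmetic).
required_reserve() {
  echo $(( $1 * $2 * ${3:-1} ))
}
```

For example, `required_reserve 10000 5 3` reproduces the 150,000 figure from the Reserve Sizing Model example, and `reserve_alert_level 1040 1000` reports `WARNING` because the ratio is 104%.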