smom-dbis-138/docs/bridge/RUNBOOK.md
defiQUG 50ab378da9 feat: Implement Universal Cross-Chain Asset Hub - All phases complete
2026-01-24 07:01:37 -08:00


# Bridge Operations Runbook
## Table of Contents
1. [Incident Response](#incident-response)
2. [Common Operations](#common-operations)
3. [Troubleshooting](#troubleshooting)
4. [Emergency Procedures](#emergency-procedures)
5. [Monitoring Checklist](#monitoring-checklist)
6. [Contact Information](#contact-information)
## Incident Response
### High Failure Rate
**Symptoms:**
- Success rate drops below 95%
- Multiple failed transfers within a short time
**Actions:**
1. Check Prometheus metrics: `bridge_success_rate < 95`
2. Review recent transfer logs for error patterns
3. Check destination chain status (RPC availability, finality issues)
4. Verify thirdweb API status
5. Check XRPL connection if XRPL routes affected
6. If the issue persists for more than 10 minutes, pause the affected route:
```bash
forge script script/bridge/interop/PauseDestination.s.sol \
--rpc-url $RPC_URL \
--private-key $ADMIN_KEY \
--broadcast \
--sig "run(address,uint256)" $REGISTRY_ADDRESS $CHAIN_ID
```
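The pause decision in step 6 can be sketched as a small guard. This is a minimal sketch, not part of the repository: the function name `should_pause_route` and its two inputs (success rate in percent, and minutes the rate has stayed below the threshold) are hypothetical. The 95% and 10-minute thresholds come from this runbook.

```bash
# Hypothetical helper: succeed (exit 0) when the route should be paused.
# $1 = current success rate in percent (may be fractional)
# $2 = minutes the rate has been below the threshold
should_pause_route() {
  # awk handles the fractional comparison; exit status 0 means "pause".
  awk -v r="$1" -v m="$2" 'BEGIN { exit !(r < 95 && m > 10) }'
}

# Example values only; real inputs would come from the Prometheus metric.
if should_pause_route 93.5 12; then
  echo "pause"
else
  echo "keep running"
fi
```

A guard like this could gate the `PauseDestination.s.sol` call, so the route is not paused on a short dip.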
### Liquidity Failure
**Symptoms:**
- Transfers failing with "insufficient liquidity" errors
- XRPL hot wallet balance low
**Actions:**
1. Check XRPL hot wallet balance:
```bash
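# The account is spliced in outside the single quotes so the shell expands it
curl -X POST "$XRPL_SERVER" \
-H "Content-Type: application/json" \
-d '{"method":"account_info","params":[{"account":"'"$XRPL_ACCOUNT"'"}]}'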
```
2. Replenish the hot wallet if its balance is below the configured threshold
3. Check EVM destination liquidity pools
4. If the situation is critical, pause the affected token:
```bash
forge script script/bridge/interop/PauseToken.s.sol \
--rpc-url $RPC_URL \
--private-key $ADMIN_KEY \
--broadcast \
--sig "run(address,address)" $REGISTRY_ADDRESS $TOKEN_ADDRESS
```
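The threshold check in step 2 can be sketched as follows. XRPL reports account balances in drops (1 XRP = 1,000,000 drops), so the comparison has to convert units. The function name `below_threshold` and the example values are hypothetical; the runbook does not state the actual threshold.

```bash
# Hypothetical helper: succeed when a balance in drops is below a threshold in whole XRP.
# $1 = balance in drops, $2 = threshold in XRP
below_threshold() {
  [ "$1" -lt $(( $2 * 1000000 )) ]
}

# Example values only; in practice the balance would be read from
# result.account_data.Balance in the account_info response.
if below_threshold 250000000 500; then
  echo "replenish hot wallet"
fi
```

Keeping the comparison in integer drops avoids floating-point rounding on the balance.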
### High Settlement Time
**Symptoms:**
- Average settlement time > 10 minutes
- Users reporting slow transfers
**Actions:**
1. Check destination chain finality requirements
2. Verify FireFly workflow engine is processing transfers
3. Check Cacti connector status
4. Review route health scores
5. Consider switching to alternative route if available
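To confirm the symptom before acting, the average can be computed from recent settlement times. This is a sketch under assumptions: the helper name `average_settlement_seconds` is hypothetical, and the sample times are illustrative. Only the 10-minute (600-second) threshold comes from this runbook.

```bash
# Hypothetical helper: print the integer mean of settlement times given in seconds.
average_settlement_seconds() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%d\n", sum / NR }'
}

# Example values only; real samples would come from transfer logs or metrics.
avg=$(average_settlement_seconds 300 720 900)
if [ "$avg" -gt 600 ]; then
  echo "average ${avg}s exceeds the 10-minute threshold"
fi
```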
### Bridge Pause
**Symptoms:**
- All transfers failing
- Bridge status shows "PAUSED"
**Actions:**
1. Identify reason for pause (check admin logs)
2. Resolve underlying issue
3. Unpause bridge:
```bash
forge script script/bridge/interop/UnpauseBridge.s.sol \
--rpc-url $RPC_URL \
--private-key $ADMIN_KEY \
--broadcast \
--sig "run(address)" $VAULT_ADDRESS
```
## Common Operations
### Add New Destination
1. Register destination in registry:
```bash
forge script script/bridge/interop/RegisterDestination.s.sol \
--rpc-url $RPC_URL \
--private-key $ADMIN_KEY \
--broadcast \
--sig "run(address,uint256,string,uint256,uint256,uint256,address)" \
$REGISTRY_ADDRESS \
$CHAIN_ID \
"Chain Name" \
$MIN_FINALITY_BLOCKS \
$TIMEOUT_SECONDS \
$BASE_FEE_BPS \
$FEE_RECIPIENT
```
2. Update FireFly configuration
3. Configure Cacti connector if needed
4. Test with a small-amount transfer
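Before registering a destination, it can help to sanity-check `$BASE_FEE_BPS`. The sketch below assumes the fee is a plain basis-point share of the transfer amount (1 bps = 0.01%, so 10000 bps = 100%). The registry's actual fee formula is not shown in this runbook, and the helper name `fee_for_amount` is hypothetical.

```bash
# Hypothetical helper: fee for an amount at a given basis-point rate.
# Integer division rounds down, as on-chain integer math typically does.
fee_for_amount() {
  if [ "$2" -lt 0 ] || [ "$2" -gt 10000 ]; then
    echo "fee bps must be between 0 and 10000" >&2
    return 1
  fi
  echo $(( $1 * $2 / 10000 ))
}

fee_for_amount 1000000 30   # 30 bps of 1,000,000 units: prints 3000
```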
### Add New Token
1. Register token in registry:
```bash
forge script script/bridge/interop/RegisterToken.s.sol \
--rpc-url $RPC_URL \
--private-key $ADMIN_KEY \
--broadcast \
--sig "run(address,address,uint256,uint256,uint256[],uint8,uint256)" \
$REGISTRY_ADDRESS \
$TOKEN_ADDRESS \
$MIN_AMOUNT \
$MAX_AMOUNT \
"[137,10,8453]" \
$RISK_LEVEL \
$BRIDGE_FEE_BPS
```
2. Verify token contract is valid
3. Test with a small-amount transfer
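The `uint256[]` argument is passed as a bracketed, comma-separated list such as `"[137,10,8453]"`. A small helper can build that string from individual chain IDs and reject non-numeric entries. The helper name `chain_id_list` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: join chain IDs into the bracketed list form used above.
chain_id_list() {
  out=""
  for id in "$@"; do
    case "$id" in
      ''|*[!0-9]*) echo "invalid chain ID: $id" >&2; return 1 ;;
    esac
    # Prefix a comma only when something has already been collected.
    out="${out:+$out,}$id"
  done
  echo "[$out]"
}

chain_id_list 137 10 8453   # prints [137,10,8453]
```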
### Process Refund
1. Verify transfer is eligible for refund:
```bash
cast call $VAULT_ADDRESS \
"isRefundable(bytes32)" \
$TRANSFER_ID \
--rpc-url $RPC_URL
```
2. Initiate refund (requires HSM signature):
```bash
# Generate HSM signature first
# Then call initiateRefund with signature
```
3. Execute refund:
```bash
forge script script/bridge/interop/ExecuteRefund.s.sol \
--rpc-url $RPC_URL \
--private-key $REFUND_OPERATOR_KEY \
--broadcast \
--sig "run(address,bytes32)" $VAULT_ADDRESS $TRANSFER_ID
```
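The eligibility check in step 1 returns a boolean. Assuming `cast call` prints the raw 32-byte return word when the signature carries no return type, and `true` or `false` when `(bool)` is appended to the signature, the result can be interpreted with a helper like the one below. The name `is_refundable_result` is hypothetical.

```bash
# Hypothetical helper: succeed when the eligibility check result means "refundable".
# Accepts a decoded "true" or a raw ABI bool word (0x followed by 64 hex digits).
is_refundable_result() {
  case "$1" in
    true) return 0 ;;
    0x*)
      # Drop the 0x prefix and all leading zeros; a true bool leaves exactly "1".
      stripped=$(printf '%s' "${1#0x}" | sed 's/^0*//')
      [ "$stripped" = "1" ]
      ;;
    *) return 1 ;;
  esac
}

raw="0x$(printf '%064d' 1)"   # example input: ABI encoding of true
if is_refundable_result "$raw"; then
  echo "eligible for refund"
fi
```

Gating steps 2 and 3 on this check keeps a refund from being initiated for a transfer that is not eligible.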
### Update Route Health
After a successful or failed transfer, update the route health:
```bash
curl -X POST $API_URL/api/admin/update-route-health \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{
"chainId": 137,
"token": "0x...",
"success": true,
"settlementTime": 300
}'
```
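To avoid hand-editing the JSON body, the payload can be generated from arguments. This is a sketch: the helper name `route_health_payload` is hypothetical, `0xTOKEN` is a placeholder rather than a real address, and the field names are the ones shown in the request above.

```bash
# Hypothetical helper: build the JSON body for the update-route-health request.
# $1 = chain ID, $2 = token address, $3 = true|false, $4 = settlement time in seconds
route_health_payload() {
  case "$3" in
    true|false) ;;
    *) echo "success must be true or false" >&2; return 1 ;;
  esac
  printf '{"chainId":%d,"token":"%s","success":%s,"settlementTime":%d}\n' \
    "$1" "$2" "$3" "$4"
}

route_health_payload 137 0xTOKEN true 300
```

The output can be passed to the request with `-d "$(route_health_payload ...)"`.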
## Troubleshooting
### Transfer Stuck in EXECUTING
**Check:**
1. FireFly workflow status
2. Cacti connector logs
3. Destination chain transaction status
**Resolution:**
- If destination tx confirmed, update status manually
- If destination tx failed, mark transfer as FAILED and initiate refund
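The resolution rules above amount to a small decision table. The sketch below encodes them. The state names `confirmed` and `failed` and the helper name `stuck_transfer_action` are hypothetical labels for illustration, not values defined by the bridge.

```bash
# Hypothetical helper: map the destination transaction state to the runbook action.
stuck_transfer_action() {
  case "$1" in
    confirmed) echo "update transfer status manually" ;;
    failed)    echo "mark transfer FAILED and initiate refund" ;;
    *)         echo "keep investigating: check FireFly, Cacti, and the destination chain" ;;
  esac
}

stuck_transfer_action failed
```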
### HSM Signing Fails
**Check:**
1. HSM service health: `curl $HSM_ENDPOINT/health`
2. HSM API key validity
3. Key ID exists and is accessible
**Resolution:**
- Restart HSM service if needed
- Verify HSM key configuration
- Check HSM logs for errors
### XRPL Connection Issues
**Check:**
1. XRPL server connectivity: `ping xrpl-server` (ICMP reachability only; this does not confirm that the XRPL API port is accepting requests)
2. XRPL account balance
3. XRPL network status
**Resolution:**
- Switch to backup XRPL server if available
- Verify XRPL account credentials
- Check XRPL network status page
### FireFly Not Processing
**Check:**
1. FireFly service status: `kubectl get pods -n firefly`
2. FireFly logs: `kubectl logs -f firefly-core -n firefly`
3. Database connectivity
**Resolution:**
- Restart FireFly service if needed
- Check database connection
- Verify FireFly configuration
## Emergency Procedures
### Global Pause
If a critical security issue is detected:
```bash
# Pause all contracts
forge script script/bridge/interop/EmergencyPause.s.sol \
--rpc-url $RPC_URL \
--private-key $ADMIN_KEY \
--broadcast
```
### Key Rotation
If the HSM key is compromised:
1. Generate new HSM key
2. Update HSM signer address in contracts:
```bash
forge script script/bridge/interop/UpdateHSMSigner.s.sol \
--rpc-url $RPC_URL \
--private-key $ADMIN_KEY \
--broadcast \
--sig "run(address,address)" $CONTROLLER_ADDRESS $NEW_HSM_SIGNER
```
3. Revoke old key access
4. Test with small operation
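Because a mistyped signer address would lock out signing, it may be worth validating `$NEW_HSM_SIGNER` before step 2. The sketch below checks only the format (`0x` plus 40 hex digits). It does not verify the EIP-55 checksum or that the address belongs to the new HSM key. The helper name `is_evm_address` is hypothetical.

```bash
# Hypothetical helper: succeed if $1 looks like a 20-byte hex address.
is_evm_address() {
  case "$1" in
    0x*) hex="${1#0x}" ;;
    *) return 1 ;;
  esac
  # Exactly 40 hex digits after the prefix.
  [ "${#hex}" -eq 40 ] || return 1
  case "$hex" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}

if is_evm_address "${NEW_HSM_SIGNER:-}"; then
  echo "address format ok"
else
  echo "refusing: NEW_HSM_SIGNER is not a 20-byte hex address"
fi
```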
### Disaster Recovery
If the bridge infrastructure fails:
1. **Immediate Actions:**
- Pause all bridge operations
- Notify users via status page
- Assess damage scope
2. **Recovery Steps:**
- Restore from backups
- Redeploy infrastructure
- Verify contract states
- Test with small transfers
- Gradually resume operations
3. **Post-Incident:**
- Document incident
- Review logs and metrics
- Update runbooks
- Conduct post-mortem
## Monitoring Checklist
Daily:
- [ ] Review success rate metrics
- [ ] Check for failed transfers
- [ ] Verify XRPL hot wallet balance
- [ ] Review alert notifications
Weekly:
- [ ] Review route health scores
- [ ] Analyze settlement time trends
- [ ] Check HSM service health
- [ ] Review proof-of-reserves
Monthly:
- [ ] Security audit review
- [ ] Update documentation
- [ ] Review and update runbooks
- [ ] Capacity planning review
## Contact Information
- **On-Call Engineer:** oncall@chain138.example.com
- **Security Team:** security@chain138.example.com
- **DevOps:** devops@chain138.example.com
**Emergency Escalation:**
1. Page on-call engineer
2. If no response in 15 minutes, escalate to team lead
3. For security incidents, immediately contact security team