Solution: QBFT Quorum Loss - Network Stalled
Last Updated: 2026-01-31
Document Version: 1.0
Status: Active Documentation
Date: 2026-01-24
Severity: 🔴 CRITICAL - ROOT CAUSE IDENTIFIED
🎯 Root Cause Found
The network has stopped because we lost QBFT validator quorum.
The Numbers
- Genesis configuration: 5 validators (192.168.11.100-104)
- Currently active: Only 2 validators (VMIDs 1003, 1004)
- Required for consensus: Minimum 4 validators (ceiling(2N/3) = 4 of 5)
- Validators lost: 3 out of 5 (60%)
Why Network Stalled
From Besu QBFT documentation:
"Configure your network to ensure you never lose more than 1/3 of your validators. If more than 1/3 of validators stop participating, the network stops creating new blocks and stalls."
We lost 60% of validators, far exceeding the 33% threshold.
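The stall condition can be checked with a few lines of Python (a minimal sketch of the "more than 1/3" rule quoted above, using the validator counts from this incident):

```python
def exceeds_stall_threshold(lost: int, total: int) -> bool:
    """True when strictly more than 1/3 of validators have stopped,
    which is the point at which a QBFT network stops producing blocks."""
    return lost * 3 > total

total, active = 5, 2
lost = total - active  # 3 of 5 validators lost
print(f"lost {lost}/{total}; stalled: {exceeds_stall_threshold(lost, total)}")
```

With 3 of 5 validators down the check returns True, matching the observed deadlock.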
📊 Current Network State
Missing Validators
| IP | Status | Evidence |
|---|---|---|
| 192.168.11.100 | ❌ Not running | No RPC endpoint |
| 192.168.11.101 | ❌ Not running | No RPC endpoint |
| 192.168.11.102 | ❌ Not running | No RPC endpoint |
Active Validators
| VMID | IP | Status |
|---|---|---|
| 1003 | 192.168.11.103 | ✅ Running (stuck in sync) |
| 1004 | 192.168.11.104 | ✅ Running (stuck in sync) |
What's Happening
- Validators 1003 & 1004 are running but can't produce blocks
- QBFT requires 4 out of 5 validators to reach consensus
- With only 2 active, consensus is impossible
- Validators are "stuck in sync" waiting for consensus
- Network is deadlocked
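This state can be confirmed by polling each node's JSON-RPC endpoint. The helper below only builds the standard request bodies (the URLs assume Besu's default RPC port 8545 on the IPs above; sending them is left to curl or an HTTP client):

```python
import json

VALIDATORS = {  # VMID -> RPC URL (assumed default Besu HTTP-RPC port)
    1003: "http://192.168.11.103:8545",
    1004: "http://192.168.11.104:8545",
}

def rpc_payload(method: str, request_id: int = 1) -> str:
    """JSON-RPC 2.0 request body for a parameterless call
    such as eth_blockNumber or eth_syncing."""
    return json.dumps({"jsonrpc": "2.0", "method": method,
                       "params": [], "id": request_id})

for vmid, url in VALIDATORS.items():
    # e.g. curl -s -X POST -H 'Content-Type: application/json' -d '<body>' <url>
    print(vmid, url, rpc_payload("eth_blockNumber"))
```

If `eth_blockNumber` returns the same height on repeated polls, the node is up but not producing blocks.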
🔧 Solution Options
Option 1: Reduce Validator Count (RECOMMENDED - Fast)
Update genesis to only include the 2 working validators (1003, 1004).
Pros:
- Fast implementation
- Uses existing working validators
- Network can resume immediately
Cons:
- Lower Byzantine fault tolerance (need both validators)
- Less decentralized
Steps:
- Stop validators 1003 & 1004
- Update genesis extraData to include only validators 1003 & 1004
- Update static-nodes.json and permissioned-nodes.json
- Restart validators
- Network should resume
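The extraData step can be prepared as sketched below. Besu ships an `rlp encode` subcommand for this; the addresses shown are placeholders and must be replaced with the real validator addresses for nodes 1003 and 1004 (derived from their node keys):

```python
import json

# Placeholder addresses (assumptions) -- substitute the actual
# validator addresses for nodes 1003 and 1004.
VALIDATOR_ADDRESSES = [
    "0x0000000000000000000000000000000000001003",
    "0x0000000000000000000000000000000000001004",
]

def write_to_encode(path: str, addresses: list[str]) -> None:
    """Write the JSON address list that `besu rlp encode` reads."""
    with open(path, "w") as f:
        json.dump(addresses, f)

write_to_encode("toEncode.json", VALIDATOR_ADDRESSES)
# Then, on a machine with besu installed:
#   besu rlp encode --from=toEncode.json --type=QBFT_EXTRA_DATA
# The hex output becomes the extraData field of the new genesis file.
```

The hex string printed by `besu rlp encode` replaces the old 5-validator extraData in genesis.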
Option 2: Start Missing Validators (IDEAL - Slower)
Find and start validators 1000, 1001, 1002 to restore full quorum.
Pros:
- Maintains Byzantine fault tolerance
- Network continues as originally designed
- Can lose 1 validator and still operate
Cons:
- Need to locate where these validators are/were
- May need to redeploy them
- Takes more time
Steps:
- Find if validators 1000-1002 exist on other Proxmox hosts
- If not, deploy new validators with correct keys
- Configure them with proper genesis
- Start them
- Network should resume when quorum is met
🚀 Recommended Action: Option 1
Since we need to resume the network quickly for bridge operations, implement Option 1:
Step 1: Create New Genesis ExtraData
Current extraData includes 5 validators. We need to generate new extraData with only 2:
- 192.168.11.103 (validator 1003)
- 192.168.11.104 (validator 1004)
Step 2: Update Static & Permissioned Nodes
Remove enodes for 192.168.11.100-102 from:
- /etc/besu/static-nodes.json
- /etc/besu/permissioned-nodes.json
Keep only:
- 192.168.11.103 (validator 1003)
- 192.168.11.104 (validator 1004)
- RPC and sentry nodes
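The pruning can be scripted as below (a minimal sketch; it assumes the standard `enode://<pubkey>@<ip>:<port>` URL format used in both node files):

```python
LOST_IPS = {"192.168.11.100", "192.168.11.101", "192.168.11.102"}

def prune_enodes(enodes: list[str], lost_ips: set[str]) -> list[str]:
    """Drop enode URLs whose host is one of the lost validators.

    An enode URL looks like enode://<pubkey>@<ip>:<port>.
    """
    def host(enode: str) -> str:
        return enode.split("@", 1)[1].split(":", 1)[0]
    return [e for e in enodes if host(e) not in lost_ips]

# Usage sketch against the files from the step above:
#   nodes = json.load(open("/etc/besu/static-nodes.json"))
#   json.dump(prune_enodes(nodes, LOST_IPS), out_file, indent=2)
```

Run it against both files so the surviving validators, RPC, and sentry entries are kept intact.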
Step 3: Restart Validators
With updated config, validators should:
- Skip full sync (already synced)
- Form quorum with 2/2 validators
- Resume block production
📝 Technical Details
QBFT Quorum Math
Validators: N = 5
Byzantine Fault Tolerance: F = floor((N - 1) / 3) = floor(1.33) = 1
Required for Consensus (Besu QBFT): ceiling(2N / 3) = ceiling(3.33) = 4
Note: Besu's ceiling(2N/3) quorum is stricter than the textbook 2F + 1 = 3 for N = 5.
Why 2 Validators Will Work
Validators: N = 2
Byzantine Fault Tolerance: F = floor((N - 1) / 3) = 0
Required for Consensus (Besu QBFT): ceiling(2N / 3) = ceiling(1.33) = 2
Both validators must be active, but that's what we have!
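The quorum arithmetic above can be expressed in code (a minimal sketch assuming Besu's ceiling(2N/3) quorum rule, which the figures in this section already reflect):

```python
import math

def qbft_quorum(n: int) -> int:
    """Validators that must agree for Besu QBFT to commit a block."""
    return math.ceil(2 * n / 3)

for n in (2, 4, 5):
    f = (n - 1) // 3  # Byzantine faults tolerated
    print(f"N={n}: F={f}, quorum={qbft_quorum(n)}")
```

For N = 5 the quorum is 4 (the stalled configuration); for N = 2 it is 2, so both remaining validators suffice.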
Limitation with 2 Validators
- Cannot tolerate ANY validator failure
- If one validator goes down, network stops
- Not Byzantine fault tolerant
- But it will work for bridge operations
⚠️ Important Notes
After Resuming Network
- Test immediately: Send a transaction to verify blocks produce
- Monitor closely: Watch both validators
- Plan for redundancy: Consider adding more validators later
- Document: Note that network now has reduced fault tolerance
Future Improvements
- Deploy 3 more validators to reach 5 total
- This provides 1 Byzantine fault tolerance
- Network can survive 1 validator failure
🎯 Next Steps
- ✅ Root cause identified: Quorum loss
- ⏳ Generate new genesis with 2 validators
- ⏳ Update node lists
- ⏳ Restart validators
- ⏳ Verify blocks resume
- ⏳ Test bridge transaction
📚 References
- Besu QBFT Documentation
- QBFT requires: "Configure your network to ensure you never lose more than 1/3 of your validators"
- Minimum validators for Byzantine fault tolerance: 4
Status: Root cause confirmed, solution ready to implement
Blocker: Insufficient validator quorum (2/5 vs 4/5 required)
Resolution: Reduce validator count to 2 or start 3 missing validators
Last Updated: 2026-01-24 01:32 PST