Complete markdown files cleanup and organization

- Organized 252 files across project
- Root directory: 187 → 2 files (98.9% reduction)
- Moved configuration guides to docs/04-configuration/
- Moved troubleshooting guides to docs/09-troubleshooting/
- Moved quick start guides to docs/01-getting-started/
- Moved reports to reports/ directory
- Archived temporary files
- Generated comprehensive reports and documentation
- Created maintenance scripts and guides

All files organized according to established standards.
defiQUG
2026-01-06 01:46:25 -08:00
parent 1edcec953c
commit cb47cce074
1327 changed files with 217220 additions and 801 deletions

@@ -0,0 +1,165 @@
# Fix Tunnel - Alternative Methods
## Problem
The `fix-shared-tunnel.sh` script cannot connect because your machine is on `192.168.1.0/24` and cannot directly reach `192.168.11.0/24`.
## Solution Methods
### Method 1: Use SSH Tunnel ⭐ Recommended
```bash
# Terminal 1: Start SSH tunnel
./setup_ssh_tunnel.sh
# Terminal 2: Run fix with localhost
PROXMOX_HOST=localhost ./fix-shared-tunnel.sh
```
### Method 2: Manual File Deployment
The script automatically generates configuration files when connection fails:
**Location**: `/tmp/tunnel-fix-10ab22da-8ea3-4e2e-a896-27ece2211a05/`
**Files**:
- `tunnel-services.yml` - Tunnel configuration
- `cloudflared-services.service` - Systemd service
- `DEPLOY_INSTRUCTIONS.md` - Deployment guide
**Deploy from Proxmox host**:
```bash
# Copy files to Proxmox host
scp -r /tmp/tunnel-fix-* root@192.168.11.12:/tmp/
# SSH to Proxmox host
ssh root@192.168.11.12
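# Deploy to container (use the exact directory; if several tunnel-fix-* directories exist, a glob would pass extra arguments to pct push)
pct push 102 /tmp/tunnel-fix-10ab22da-8ea3-4e2e-a896-27ece2211a05/tunnel-services.yml /etc/cloudflared/tunnel-services.yml
pct push 102 /tmp/tunnel-fix-10ab22da-8ea3-4e2e-a896-27ece2211a05/cloudflared-services.service /etc/systemd/system/cloudflared-services.service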
pct exec 102 -- chmod 600 /etc/cloudflared/tunnel-services.yml
pct exec 102 -- systemctl daemon-reload
pct exec 102 -- systemctl enable cloudflared-services.service
pct exec 102 -- systemctl start cloudflared-services.service
```
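Because `/tmp/tunnel-fix-*` can match more than one directory if the script has run several times, a sketch like the following takes the directory explicitly and runs the same `pct` steps in order, stopping at the first failure. The function name `deploy_tunnel_fix` and the `DRY_RUN` switch are illustrative assumptions, not part of `fix-shared-tunnel.sh`; run it on the Proxmox host after copying the files.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of fix-shared-tunnel.sh).
# Usage: deploy_tunnel_fix <generated-dir> [vmid]
# Set DRY_RUN=1 to print each pct command instead of executing it.
deploy_tunnel_fix() {
  local dir="$1" vmid="${2:-102}"
  local run=()
  [[ "${DRY_RUN:-0}" == 1 ]] && run=(echo)

  # Refuse to continue unless both generated files are present.
  if [[ ! -f "$dir/tunnel-services.yml" || ! -f "$dir/cloudflared-services.service" ]]; then
    echo "missing generated files in $dir" >&2
    return 1
  fi

  # Same steps as the manual commands; && stops at the first failing step.
  "${run[@]}" pct push "$vmid" "$dir/tunnel-services.yml" /etc/cloudflared/tunnel-services.yml &&
    "${run[@]}" pct push "$vmid" "$dir/cloudflared-services.service" /etc/systemd/system/cloudflared-services.service &&
    "${run[@]}" pct exec "$vmid" -- chmod 600 /etc/cloudflared/tunnel-services.yml &&
    "${run[@]}" pct exec "$vmid" -- systemctl daemon-reload &&
    "${run[@]}" pct exec "$vmid" -- systemctl enable --now cloudflared-services.service
}
```

For example, `DRY_RUN=1 deploy_tunnel_fix /tmp/tunnel-fix-10ab22da-8ea3-4e2e-a896-27ece2211a05` prints the five commands for review; `enable --now` combines the separate enable and start steps.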
### Method 3: Cloudflare Dashboard ⭐ Easiest
1. Go to: https://one.dash.cloudflare.com/
2. Navigate to: **Zero Trust** → **Networks** → **Tunnels**
3. Find tunnel: `10ab22da-8ea3-4e2e-a896-27ece2211a05`
4. Click **Configure**
5. Add all hostnames:
| Hostname | Service | URL |
|----------|---------|-----|
| dbis-admin.d-bis.org | HTTP | 192.168.11.21:80 |
| dbis-api.d-bis.org | HTTP | 192.168.11.21:80 |
| dbis-api-2.d-bis.org | HTTP | 192.168.11.21:80 |
| mim4u.org.d-bis.org | HTTP | 192.168.11.21:80 |
| www.mim4u.org.d-bis.org | HTTP | 192.168.11.21:80 |
| rpc-http-prv.d-bis.org | HTTP | 192.168.11.21:80 |
| rpc-http-pub.d-bis.org | HTTP | 192.168.11.21:80 |
| rpc-ws-prv.d-bis.org | HTTP | 192.168.11.21:80 |
| rpc-ws-pub.d-bis.org | HTTP | 192.168.11.21:80 |
6. Add catch-all rule: **HTTP 404: Not Found** (must be last)
7. Save configuration
8. Wait 1-2 minutes for tunnel to reload
### Method 4: Run from Proxmox Network
If you have access to a machine on `192.168.11.0/24`:
```bash
# Copy script to that machine
scp fix-shared-tunnel.sh user@192.168.11.x:/tmp/
# SSH to that machine and run
ssh user@192.168.11.x
cd /tmp
chmod +x fix-shared-tunnel.sh
./fix-shared-tunnel.sh
```
### Method 5: Direct Container Access
If you can reach the Proxmox host, you can write the configuration straight into container 102 with `pct exec`:
```bash
# Create config file inside container
pct exec 102 -- bash << 'EOF'
cat > /etc/cloudflared/tunnel-services.yml << 'CONFIG'
tunnel: 10ab22da-8ea3-4e2e-a896-27ece2211a05
credentials-file: /etc/cloudflared/credentials-services.json
ingress:
  - hostname: dbis-admin.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: dbis-admin.d-bis.org
  - hostname: dbis-api.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: dbis-api.d-bis.org
  - hostname: dbis-api-2.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: dbis-api-2.d-bis.org
  - hostname: mim4u.org.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: mim4u.org.d-bis.org
  - hostname: www.mim4u.org.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: www.mim4u.org.d-bis.org
  - hostname: rpc-http-prv.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: rpc-http-prv.d-bis.org
  - hostname: rpc-http-pub.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: rpc-http-pub.d-bis.org
  - hostname: rpc-ws-prv.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: rpc-ws-prv.d-bis.org
  - hostname: rpc-ws-pub.d-bis.org
    service: http://192.168.11.21:80
    originRequest:
      httpHostHeader: rpc-ws-pub.d-bis.org
  - service: http_status:404
metrics: 127.0.0.1:9090
loglevel: info
gracePeriod: 30s
CONFIG
chmod 600 /etc/cloudflared/tunnel-services.yml
# Restart the connector so it loads the new file (assumes the unit from Method 2 is installed)
systemctl restart cloudflared-services.service
EOF
```
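The nine ingress entries above differ only in the hostname, so hand-editing them invites indentation mistakes (YAML is whitespace-sensitive). A minimal sketch that prints the same structure from a hostname list is below; the function name is hypothetical, and it assumes every hostname shares one origin and the credentials path used above.

```bash
#!/usr/bin/env bash
# Hypothetical generator for a cloudflared ingress config.
# Usage: generate_ingress <tunnel-id> <origin-url> <hostname>...
generate_ingress() {
  local tunnel_id="$1" origin="$2" host
  shift 2
  printf 'tunnel: %s\n' "$tunnel_id"
  printf 'credentials-file: /etc/cloudflared/credentials-services.json\n'
  printf 'ingress:\n'
  for host in "$@"; do
    # One rule per hostname, forwarding the original Host header to the origin.
    printf '  - hostname: %s\n' "$host"
    printf '    service: %s\n' "$origin"
    printf '    originRequest:\n'
    printf '      httpHostHeader: %s\n' "$host"
  done
  # cloudflared requires the last rule to be a catch-all with no hostname.
  printf '  - service: http_status:404\n'
}
```

For example, `generate_ingress 10ab22da-8ea3-4e2e-a896-27ece2211a05 http://192.168.11.21:80 dbis-admin.d-bis.org dbis-api.d-bis.org > tunnel-services.yml`. Running `cloudflared tunnel --config tunnel-services.yml ingress validate` on the result should report rule errors before you deploy it; append the `metrics`/`loglevel` lines if you need them.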
## Verification
After applying any method:
```bash
# Check tunnel status in Cloudflare Dashboard
# Should change from DOWN to HEALTHY
# Test endpoints
curl -I https://dbis-admin.d-bis.org
curl -I https://rpc-http-pub.d-bis.org
curl -I https://dbis-api.d-bis.org
```
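To check every hostname in one pass instead of three, a sketch like the following prints each HTTP status code (it sends a GET rather than `-I`). It treats `000` (no connection) and any `5xx` as failure, on the assumption that a `4xx` still means the request reached the tunnel; note that the catch-all rule also answers `404`, so a `404` may mean the hostname is missing from the ingress list. The function name is hypothetical.

```bash
# Print "<host> <status>" for each hostname; return non-zero if any looks down.
check_tunnel_hosts() {
  local host code failed=0
  for host in "$@"; do
    # -w prints only the status code; curl prints 000 if it cannot connect.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$host" || true)
    echo "$host $code"
    if [[ "$code" == 000 || "$code" == 5* ]]; then
      failed=1
    fi
  done
  return "$failed"
}
# Usage: check_tunnel_hosts dbis-admin.d-bis.org rpc-http-pub.d-bis.org dbis-api.d-bis.org
```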
## Recommended Approach
**For Quick Fix**: Use **Method 3 (Cloudflare Dashboard)** - No SSH needed; changes apply within 1-2 minutes once the cloudflared connector is online
**For Automation**: Use **Method 1 (SSH Tunnel)** - Scriptable, repeatable
**For Production**: Use **Method 2 (Manual Deployment)** - Most control, can review files first

@@ -0,0 +1,460 @@
# MetaMask Troubleshooting Guide - ChainID 138
**Date**: $(date)
**Network**: SMOM-DBIS-138 (ChainID 138)
---
## 🔍 Common Issues & Solutions
### 1. Network Connection Issues
#### Issue: "Could not fetch chain ID. Is your RPC URL correct?"
**Symptoms**:
- MetaMask shows error: "Could not fetch chain ID. Is your RPC URL correct?"
- Network won't connect
- Can't fetch balance
**Root Cause**: The RPC endpoint requires JWT authentication, which MetaMask does not support.
**Solutions**:
1. **Remove and Re-add Network with Correct RPC URL**
- MetaMask → Settings → Networks
- Find "Defi Oracle Meta Mainnet" or "SMOM-DBIS-138"
- Click "Delete" or "Remove"
- Click "Add Network" → "Add a network manually"
- Enter these exact values:
- **Network Name**: `Defi Oracle Meta Mainnet`
- **RPC URL**: `https://rpc-http-pub.d-bis.org`
- **Chain ID**: `138` (must be decimal, not hex)
- **Currency Symbol**: `ETH`
- **Block Explorer URL**: `https://explorer.d-bis.org` (optional)
- Click "Save"
2. **If RPC URL Still Requires Authentication (Server Issue)**
- The public RPC endpoint should NOT require JWT authentication
- Contact network administrators to fix server configuration
- VMID 2502 should serve `rpc-http-pub.d-bis.org` WITHOUT authentication
- Check Nginx configuration on VMID 2502
3. **Verify RPC Endpoint is Working**
```bash
# Test if endpoint responds (should return chain ID 0x8a = 138)
curl -X POST https://rpc-http-pub.d-bis.org \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}'
```
- **Expected**: `{"jsonrpc":"2.0","id":1,"result":"0x8a"}`
- **If you get JWT error**: Server needs to be reconfigured
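The `result` is hex, while MetaMask's manual form expects decimal. A small sketch that extracts `result` from the response and converts it (using `sed` rather than assuming `jq` is installed) can confirm that `0x8a` really is `138`; the function name is illustrative.

```bash
# Extract "result" from an eth_chainId JSON-RPC response and print it in decimal.
chain_id_from_response() {
  local hex
  hex=$(printf '%s' "$1" | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  if [[ -z "$hex" ]]; then
    # No result field: typically an error object, such as a JWT/auth rejection.
    echo "no result in response: $1" >&2
    return 1
  fi
  printf '%d\n' "$hex"
}
# Usage: chain_id_from_response "$(curl -s -X POST https://rpc-http-pub.d-bis.org \
#   -H 'Content-Type: application/json' \
#   -d '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}')"
```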
#### Issue: "Network Error" or "Failed to Connect"
**Symptoms**:
- MetaMask shows "Network Error"
- Can't fetch balance
- Transactions fail immediately
**Solutions**:
1. **Verify RPC URL**
```
Correct: https://rpc-http-pub.d-bis.org
Incorrect: http://rpc-http-pub.d-bis.org (missing 's')
Incorrect: https://rpc-core.d-bis.org (deprecated/internal)
```
2. **Check Chain ID**
- Must be exactly `138` (decimal)
- Not `0x8a` (that's hex, but MetaMask expects decimal in manual entry)
- Verify in network settings
3. **Remove and Re-add Network**
- Settings → Networks → Remove the network
- Add network again with correct settings
- See [Quick Start Guide](./METAMASK_QUICK_START_GUIDE.md)
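4. **Reset Local Account Data**
- Settings → Advanced → Reset Account (if needed); this clears MetaMask's local activity and nonce data for the account but does not affect funds or keys
- Or clear browser cache and reload MetaMask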
5. **Check RPC Endpoint Status**
```bash
curl -X POST https://rpc-http-pub.d-bis.org \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'
```
---
### 2. Token Display Issues
#### Issue: "6,000,000,000.0T WETH" Instead of "6 WETH"
**Root Cause**: WETH9 contract's `decimals()` returns 0 instead of 18
**Solution**:
1. **Remove Token**
- Find WETH9 in token list
- Click token → "Hide token" or remove
2. **Re-import with Correct Decimals**
- Import tokens → Custom token
- Address: `0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2`
- Symbol: `WETH`
- **Decimals: `18`** ⚠️ **Critical: Must be 18**
3. **Verify Display**
- Should now show: "6 WETH" or "6.0 WETH"
- Not: "6,000,000,000.0T WETH"
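The display bug is a scaling problem: the wallet divides the raw on-chain integer by 10^decimals. The sketch below (a hypothetical helper that uses string operations so it does not overflow Bash's 64-bit integers) shows how the same raw balance of 6 × 10^18 renders with 18 decimals versus 0 decimals.

```bash
# Render a raw token amount (an integer string) using the given number of decimals.
format_units() {
  local raw="$1" decimals="$2" int frac
  # Left-pad with zeros so there is at least one digit before the decimal point.
  while (( ${#raw} <= decimals )); do raw="0$raw"; done
  int="${raw:0:${#raw}-decimals}"
  frac="${raw:${#raw}-decimals}"
  # Drop trailing zeros from the fractional part.
  frac="${frac%"${frac##*[!0]}"}"
  if [[ -n "$frac" ]]; then
    printf '%s.%s\n' "$int" "$frac"
  else
    printf '%s\n' "$int"
  fi
}
# format_units 6000000000000000000 18 prints 6
# format_units 6000000000000000000 0  prints 6000000000000000000
```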
**See**:
- [WETH9 Display Fix Instructions](./METAMASK_WETH9_FIX_INSTRUCTIONS.md)
- [MetaMask RPC Chain ID Error Fix](./METAMASK_RPC_CHAIN_ID_ERROR_FIX.md) - For "Could not fetch chain ID" errors
- [RPC Public Endpoint Routing](./RPC_PUBLIC_ENDPOINT_ROUTING.md) - Architecture and routing details
---
#### Issue: Token Not Showing Balance
**Symptoms**:
- Token imported but shows 0 balance
- Token doesn't appear in list
**Solutions**:
1. **Check Token Address**
- WETH9: `0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2`
- WETH10: `0xf4BB2e28688e89fCcE3c0580D37d36A7672E8A9f`
- Verify the address is correct; addresses are case-insensitive, but mixed case encodes an EIP-55 checksum, so copy it exactly
2. **Verify You Have Tokens**
```bash
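# "(uint256)" tells cast to decode the result as a decimal raw amount (divide by 10^18 for WETH)
cast call 0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2 \
"balanceOf(address)(uint256)" <YOUR_ADDRESS> \
--rpc-url https://rpc-http-pub.d-bis.org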
```
3. **Refresh Token List**
- Click "Import tokens" → Refresh
- Or remove and re-add token
4. **Check Network**
- Ensure you're on ChainID 138
- Tokens are chain-specific
---
### 3. Transaction Issues
#### Issue: Transaction Stuck or Pending Forever
**Symptoms**:
- Transaction shows "Pending" for extended time
- No confirmation after hours
**Solutions**:
1. **Check Network Status**
- Verify RPC endpoint is responding
- Check block explorer for recent blocks
2. **Check Gas Price**
- May need to increase gas price
- Network may be congested
3. **Replace Transaction** (Same Nonce)
- Create new transaction with same nonce
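- Use a higher gas price than the stuck transaction; nodes reject replacements below a minimum price bump
- Once the replacement is mined, the original can no longer be included (a zero-value transfer to yourself effectively cancels it)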
4. **Reset Nonce** (Last Resort)
- Settings → Advanced → Reset Account
- ⚠️ This clears transaction history
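A replacement is only accepted if its gas price exceeds the stuck transaction's by the node's minimum bump; 10% is a common default (for example in Geth and Besu), but that value is an assumption for this network. The hypothetical helper below computes the minimum replacement price in wei, rounding up; the commented `cast send` line shows one way to submit a zero-value self-transfer with the stuck nonce, assuming Foundry and a legacy (non-EIP-1559) transaction.

```bash
# Minimum replacement gas price in wei: ceil(old * (100 + pct) / 100).
bump_gas_price() {
  local old_wei="$1" pct="${2:-10}"
  echo $(( (old_wei * (100 + pct) + 99) / 100 ))
}

# Example (placeholders in <>; requires Foundry's cast):
#   cast send <YOUR_ADDRESS> --value 0 --nonce <STUCK_NONCE> --legacy \
#     --gas-price "$(bump_gas_price <OLD_GAS_PRICE_WEI> 10)" \
#     --private-key "$PRIVATE_KEY" --rpc-url https://rpc-http-pub.d-bis.org
```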
---
#### Issue: "Insufficient Funds for Gas"
**Symptoms**:
- Transaction fails immediately
- Error: "insufficient funds"
**Solutions**:
1. **Check ETH Balance**
- Need ETH for gas fees
- Gas costs vary (typically 0.001-0.01 ETH)
2. **Reduce Gas Limit** (If too high)
- MetaMask may estimate too high
- Try manual gas limit
3. **Get More ETH**
- Request from network administrators
- Bridge from another chain
- Use faucet (if available)
---
#### Issue: Transaction Reverted
**Symptoms**:
- Transaction confirmed but reverted
- Error in transaction details
**Solutions**:
1. **Check Transaction Details**
- View on block explorer
- Look for revert reason
2. **Common Revert Reasons**:
- Insufficient allowance (for token transfers)
- Contract logic error
- Invalid parameters
- Out of gas (the transaction is still mined and its fee is spent; raise the gas limit and resend)
3. **Verify Contract State**
- Check if contract is paused
- Verify you have permissions
- Check contract requirements
---
### 4. Price Feed Issues
#### Issue: Price Not Updating
**Symptoms**:
- Oracle price seems stale
- Price doesn't change
**Solutions**:
1. **Check Oracle Contract**
```bash
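# The return types make cast decode roundId, answer, startedAt, updatedAt, answeredInRound (one per line)
cast call 0x3304b747e565a97ec8ac220b0b6a1f6ffdb837e6 \
"latestRoundData()(uint80,int256,uint256,uint256,uint80)" \
--rpc-url https://rpc-http-pub.d-bis.org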
```
2. **Verify `updatedAt` Timestamp**
- Should update every 60 seconds
- If > 5 minutes old, Oracle Publisher may be down
3. **Check Oracle Publisher Service**
- Service should be running (VMID 3500)
- Check service logs for errors
4. **Manual Price Query**
- Use Web3.js or Ethers.js to query directly
- See [Oracle Integration Guide](./METAMASK_ORACLE_INTEGRATION.md)
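In the Chainlink-style aggregator interface, `latestRoundData()` returns `(roundId, answer, startedAt, updatedAt, answeredInRound)`. Assuming this proxy follows that interface, the sketch below asks `cast` to decode the tuple, takes the fourth value, and flags it as stale past 300 seconds (the 5-minute threshold above). Newer Foundry versions may append a bracketed scientific-notation hint after large numbers, so only the first field of the line is kept. Function names are illustrative.

```bash
ORACLE=0x3304b747e565a97ec8ac220b0b6a1f6ffdb837e6
RPC_URL=https://rpc-http-pub.d-bis.org

# Print updatedAt (unix seconds) from latestRoundData(); cast prints one value per line.
fetch_updated_at() {
  cast call "${1:-$ORACLE}" "latestRoundData()(uint80,int256,uint256,uint256,uint80)" \
    --rpc-url "${2:-$RPC_URL}" | sed -n '4p' | awk '{print $1}'
}

# Return non-zero if updatedAt is older than max_age seconds (now defaults to the current time).
check_oracle_age() {
  local updated_at="$1" max_age="${2:-300}" now="${3:-$(date +%s)}"
  local age=$(( now - updated_at ))
  echo "updatedAt=$updated_at age=${age}s"
  (( age <= max_age ))
}

# Usage: check_oracle_age "$(fetch_updated_at)" || echo "Oracle Publisher may be down"
```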
---
#### Issue: Price Returns Zero or Error
**Symptoms**:
- `latestRoundData()` returns 0
- Contract call fails
**Solutions**:
1. **Verify Contract Address**
- Oracle Proxy: `0x3304b747e565a97ec8ac220b0b6a1f6ffdb837e6`
- Ensure correct address
2. **Check Contract Deployment**
- Verify contract exists on ChainID 138
- Check block explorer
3. **Verify Network**
- Must be on ChainID 138
- Price feeds are chain-specific
---
### 5. Network Switching Issues
#### Issue: Can't Switch to ChainID 138
**Symptoms**:
- Network doesn't appear in list
- Switch fails
**Solutions**:
1. **Add Network Manually**
- See [Quick Start Guide](./METAMASK_QUICK_START_GUIDE.md)
- Ensure all fields are correct
2. **Programmatic Addition** (For dApps)
```javascript
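// Network parameters for ChainID 138 (values from the manual setup; nativeCurrency.name is assumed)
const networkConfig = {
  chainId: '0x8a', // 138 in hex; the RPC API requires hex here
  chainName: 'Defi Oracle Meta Mainnet',
  nativeCurrency: { name: 'Ether', symbol: 'ETH', decimals: 18 },
  rpcUrls: ['https://rpc-http-pub.d-bis.org'],
  blockExplorerUrls: ['https://explorer.d-bis.org'],
};

async function switchToChain138() {
  try {
    await window.ethereum.request({
      method: 'wallet_switchEthereumChain',
      params: [{ chainId: networkConfig.chainId }],
    });
  } catch (switchError) {
    // 4902: the chain has not been added to MetaMask yet, so add it
    if (switchError.code === 4902) {
      await window.ethereum.request({
        method: 'wallet_addEthereumChain',
        params: [networkConfig],
      });
    } else {
      throw switchError; // e.g. 4001 if the user rejected the request
    }
  }
}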
```
3. **Clear Network Cache**
- Remove network
- Re-add with correct settings
---
### 6. Account Issues
#### Issue: Wrong Account Connected
**Symptoms**:
- Different address than expected
- Can't see expected balance
**Solutions**:
1. **Switch Account in MetaMask**
- Click account icon
- Select correct account
2. **Import Account** (If needed)
- Settings → Import Account
- Use private key or seed phrase
3. **Verify Address**
- Check address matches expected
- Addresses are case-insensitive but verify format
---
#### Issue: Account Not Showing Balance
**Symptoms**:
- Account connected but balance is 0
- Expected to have ETH/tokens
**Solutions**:
1. **Verify Network**
- Must be on ChainID 138
- Balances are chain-specific
2. **Check Address**
- Verify correct address
- Check on block explorer
3. **Refresh Balance**
- Click refresh icon in MetaMask
- Or switch networks and switch back
---
## 🔧 Advanced Troubleshooting
### Enable Debug Mode
**MetaMask Settings**:
1. Settings → Advanced
2. Enable "Show Hex Data"
3. Enable "Enhanced Gas Fee UI"
4. Check browser console for errors
### Check Browser Console
**Open Console**:
- Chrome/Edge: F12 → Console
- Firefox: F12 → Console
- Safari: Cmd+Option+I → Console
**Look For**:
- RPC errors
- Network errors
- JavaScript errors
- MetaMask-specific errors
### Verify RPC Response
**Test RPC Endpoint**:
```bash
curl -X POST https://rpc-http-pub.d-bis.org \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "eth_blockNumber",
"params": [],
"id": 1
}'
```
**Expected Response**:
```json
{
"jsonrpc": "2.0",
"id": 1,
"result": "0x..."
}
```
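A single response only proves the endpoint answers; to confirm the chain is actually producing blocks, you can compare two readings a few seconds apart. A sketch follows; the function names and the 15-second default interval are assumptions (pick an interval longer than the network's block time).

```bash
RPC_URL=https://rpc-http-pub.d-bis.org

# Print the latest block number in decimal, or fail if the response has no result.
rpc_block_number() {
  local resp hex
  resp=$(curl -s -X POST "${1:-$RPC_URL}" \
    -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}') || return 1
  hex=$(printf '%s' "$resp" | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
  [[ -n "$hex" ]] || return 1
  printf '%d\n' "$hex"
}

# Succeed only if the block number increases within the given interval (seconds).
chain_is_advancing() {
  local first second
  first=$(rpc_block_number "$1") || return 1
  sleep "${2:-15}"
  second=$(rpc_block_number "$1") || return 1
  echo "block $first -> $second"
  (( second > first ))
}
# Usage: chain_is_advancing https://rpc-http-pub.d-bis.org 15
```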
---
## 📞 Getting Help
### Resources
1. **Documentation**:
- [Quick Start Guide](./METAMASK_QUICK_START_GUIDE.md)
- [Full Integration Requirements](./METAMASK_FULL_INTEGRATION_REQUIREMENTS.md)
- [Oracle Integration](./METAMASK_ORACLE_INTEGRATION.md)
2. **Block Explorer**:
- `https://explorer.d-bis.org`
- Check transactions, contracts, addresses
3. **Network Status**:
- RPC: `https://rpc-http-pub.d-bis.org` (public, no auth required)
- Permissioned RPC: `https://rpc-http-prv.d-bis.org` (requires JWT auth)
- Verify endpoint is responding
### Information to Provide When Reporting Issues
1. **MetaMask Version**: Settings → About
2. **Browser**: Chrome/Firefox/Safari + version
3. **Network**: ChainID 138
4. **Error Message**: Exact error text
5. **Steps to Reproduce**: What you did before error
6. **Console Errors**: Any JavaScript errors
7. **Transaction Hash**: If transaction-related
---
## ✅ Quick Diagnostic Checklist
Run through this checklist when troubleshooting:
- [ ] Network is "Defi Oracle Meta Mainnet" or "SMOM-DBIS-138" (ChainID 138)
- [ ] RPC URL is `https://rpc-http-pub.d-bis.org` (public endpoint, no auth)
- [ ] Chain ID is `138` (decimal, not hex)
- [ ] RPC endpoint does NOT require JWT authentication
- [ ] Account is connected and correct
- [ ] Sufficient ETH for gas fees
- [ ] Token decimals are correct (18 for WETH)
- [ ] Browser console shows no errors
- [ ] RPC endpoint is responding
- [ ] Block explorer shows recent blocks
---
**Last Updated**: $(date)

@@ -0,0 +1,115 @@
# Solution: Fix Tunnels Without SSH Access
## Problem
- All 6 Cloudflare tunnels are DOWN
- Cannot access Proxmox network via SSH (network segmentation)
- SSH tunnel setup fails (can't connect to establish tunnel)
## Solution: Cloudflare Dashboard ⭐ EASIEST
**No SSH needed!** Configure tunnels directly in Cloudflare Dashboard.
### Step-by-Step
1. **Access Dashboard**
- Go to: https://one.dash.cloudflare.com/
- Sign in
- Navigate to: **Zero Trust** → **Networks** → **Tunnels**
2. **For Each Tunnel** (6 total):
- Click on tunnel name
- Click **Configure** button
- Go to **Public Hostnames** tab
- Add/Edit hostname configurations
- Save
3. **Wait 1-2 Minutes**
- If the cloudflared connector is running, it picks up the new configuration automatically
- Status should change from **DOWN** to **HEALTHY** once the connector is connected
### Tunnel Configuration Details
#### Shared Tunnel (Most Important)
**Tunnel**: `rpc-http-pub.d-bis.org` (ID: `10ab22da-8ea3-4e2e-a896-27ece2211a05`)
**Add these 9 hostnames** (all pointing to `http://192.168.11.21:80`):
- `dbis-admin.d-bis.org`
- `dbis-api.d-bis.org`
- `dbis-api-2.d-bis.org`
- `mim4u.org.d-bis.org`
- `www.mim4u.org.d-bis.org`
- `rpc-http-prv.d-bis.org`
- `rpc-http-pub.d-bis.org`
- `rpc-ws-prv.d-bis.org`
- `rpc-ws-pub.d-bis.org`
**Important**: Add catch-all rule (HTTP 404) as the LAST entry.
#### Proxmox Tunnels
Each needs one hostname pointing to HTTPS:
| Tunnel | Hostname | Target |
|--------|----------|--------|
| tunnel-ml110 | ml110-01.d-bis.org | https://192.168.11.10:8006 |
| tunnel-r630-01 | r630-01.d-bis.org | https://192.168.11.11:8006 |
| tunnel-r630-02 | r630-02.d-bis.org | https://192.168.11.12:8006 |
**Options**: Enable "No TLS Verify" (Proxmox uses self-signed certs)
#### Other Tunnels
- `explorer.d-bis.org` → `http://192.168.11.21:80`
- `mim4u-tunnel` → `http://192.168.11.21:80`
## Why This Works
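Cloudflare tunnels use **outbound connections** from your infrastructure to Cloudflare, so no inbound access to `192.168.11.0/24` is needed. For a dashboard-managed tunnel, the dashboard configuration tells Cloudflare how to route traffic and is delivered to the connector. The dashboard cannot start the connector itself: a tunnel shows **HEALTHY** only while cloudflared is connected, and a connector that is down picks up the dashboard configuration once it reconnects. Tunnels run from a local config file (such as `tunnel-services.yml`) take their routing from that file unless they are migrated to dashboard management.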
## If Dashboard Method Doesn't Work
If tunnels remain DOWN after dashboard configuration, the tunnel connector (cloudflared in VMID 102) is likely not running. Starting it requires physical or network access to the Proxmox host, using one of the options below.
### Option 1: Physical Access to Proxmox Host
```bash
# Direct console access to 192.168.11.12
pct start 102
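# Quote the glob so the host shell does not expand it. systemctl patterns only match
# units already loaded; if nothing starts, list names with: systemctl list-unit-files 'cloudflared-*'
pct exec 102 -- systemctl start 'cloudflared-*'
pct exec 102 -- systemctl status 'cloudflared-*'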
```
### Option 2: VPN Access
If you have VPN access to `192.168.11.0/24` network:
```bash
# Connect via VPN first, then:
ssh root@192.168.11.12 "pct start 102"
ssh root@192.168.11.12 "pct exec 102 -- systemctl start 'cloudflared-*'"
```
### Option 3: Cloudflare Tunnel Token Method
If you can get new tunnel tokens from Cloudflare Dashboard:
1. Go to tunnel → Configure
2. Copy the connector install command shown there; it contains the tunnel token
3. Run it inside the container (requires access), e.g. `cloudflared service install <TOKEN>`
## Verification
After configuring in dashboard:
```bash
# Wait 1-2 minutes, then test:
curl -I https://ml110-01.d-bis.org
curl -I https://r630-01.d-bis.org
curl -I https://explorer.d-bis.org
curl -I https://rpc-http-pub.d-bis.org
```
## Summary
**Best Method**: Cloudflare Dashboard (no SSH needed)
⚠️ **If that fails**: Need physical/network access to start container
📋 **All tunnel IDs and configs**: See generated files in `/tmp/tunnel-fix-manual-*/`

@@ -0,0 +1,165 @@
# R630-04 Authentication Issue
**IP:** 192.168.11.14
**User:** root
**Status:** ❌ Permission denied with password authentication
---
## Current Situation
- **SSH Port:** ✅ Open and accepting connections (port 22)
- **Authentication Methods Offered:** `publickey,password`
- **Password Auth:** ❌ Failing (permission denied)
- **Public Key Auth:** ⚠️ Not configured
---
## Debug Information
From SSH verbose output:
```
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: password
Permission denied, please try again.
```
This shows:
- Server accepts both authentication methods
- Public key auth tried first (no keys configured)
- Password auth attempted but rejected
---
## Possible Solutions
### Option 1: Verify Password
Double-check the password. Common issues:
- Typos (especially with special characters like `@`)
- Caps Lock
- Wrong password entirely
- Password changed since last successful login
### Option 2: Connect from R630-03
Since R630-03 works, try:
```bash
# Connect to R630-03 first
ssh root@192.168.11.13
# Password: L@kers2010
# Then from R630-03, connect to R630-04
ssh root@192.168.11.14
# Try password: L@kers2010
```
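This helps if sshd only accepts root from certain addresses (for example an `AllowUsers root@192.168.11.*` or `Match Address` rule), which shows up as "Permission denied" even with the correct password.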
### Option 3: Use Console Access
If you have physical/console access to R630-04:
1. **Physical Console** - Connect KVM/keyboard directly
2. **iDRAC/iLO** - Use Dell's remote management (if available)
3. **Serial Console** - If configured
From console:
```bash
# Check effective SSH settings (includes /etc/ssh/sshd_config.d/ drop-ins)
sshd -T | grep -Ei '^(passwordauthentication|permitrootlogin)'
# Reset root password
passwd root
# Check account status (L = locked, P = usable password)
passwd -S root
# Most recent failed root logins (lastb lists newest first)
lastb root | head -10
```
### Option 4: Set Up SSH Key Authentication
If you can access R630-04 through another method (console, Proxmox host, etc.):
**Generate SSH key:**
```bash
# On your local machine
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_r630-04 -N ""
```
**Copy public key to R630-04:**
```bash
# If you have console access to R630-04
cat ~/.ssh/id_ed25519_r630-04.pub
# Then on R630-04:
mkdir -p /root/.ssh
chmod 700 /root/.ssh
echo "PASTE_PUBLIC_KEY_HERE" >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
```
**Connect with key:**
```bash
ssh -i ~/.ssh/id_ed25519_r630-04 root@192.168.11.14
```
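Before relying on the key, you can confirm that key authentication works on its own, without falling back to a password prompt. This sketch uses standard OpenSSH client options (`BatchMode`, `PreferredAuthentications`, `IdentitiesOnly`, `ConnectTimeout`); the function name is hypothetical and the default key path matches the one generated above.

```bash
# Succeeds only if the host accepts the given key without any password prompt.
check_key_auth() {
  local host="$1" key="${2:-$HOME/.ssh/id_ed25519_r630-04}"
  if ssh -i "$key" -o BatchMode=yes -o PreferredAuthentications=publickey \
       -o IdentitiesOnly=yes -o ConnectTimeout=5 "root@$host" true; then
    echo "key auth OK for root@$host"
  else
    echo "key auth FAILED for root@$host" >&2
    return 1
  fi
}
# Usage: check_key_auth 192.168.11.14
```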
### Option 5: Check if Password Was Changed
If you have access to another Proxmox host that manages R630-04, or have documentation, verify:
- When was the password last changed?
- Is there a password management system?
- Are there multiple root accounts or users?
---
## Quick Checklist
- [ ] Try password again carefully (check for typos)
- [ ] Try connecting from R630-03
- [ ] Check if password was changed
- [ ] Try console/iDRAC access
- [ ] Check if SSH keys are set up
- [ ] Verify you're using the correct username (root)
---
## If You Have Console Access
Once you can access the console, run:
```bash
# Reset root password
passwd root
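# Verify SSH allows password logins for root (effective config, lowercase keys)
sshd -T | grep -Ei '^(passwordauthentication|permitrootlogin)'
# Expected:
# passwordauthentication yes
# permitrootlogin yes
# OpenSSH defaults are PasswordAuthentication yes but PermitRootLogin prohibit-password,
# so a commented-out PermitRootLogin line still blocks root password logins.
# On Debian, /etc/ssh/sshd_config.d/*.conf is included first and takes precedence.
# If either value is wrong, set it explicitly:
sed -i 's/^#\?PasswordAuthentication .*/PasswordAuthentication yes/' /etc/ssh/sshd_config
sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin yes/' /etc/ssh/sshd_config
systemctl restart sshd
# Check root account status
passwd -S root
# Check for locked account
usermod -U root # Unlock if locked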
```
---
## Next Steps
1. **Try password one more time** - Make sure Caps Lock is off, type carefully
2. **Try from R630-03** - Network path might matter
3. **Get console access** - Physical KVM or iDRAC
4. **Check password documentation** - Verify if password was changed
5. **Set up SSH keys** - More secure and reliable long-term solution

@@ -0,0 +1,256 @@
# R630-04 Console Access Guide
**IP:** 192.168.11.14
**Status:** Console access available
**Tasks:** Reset password, fix pveproxy, verify web interface
---
## Step 1: Login via Console
Log in to R630-04 using your console access (physical keyboard, iDRAC KVM, etc.)
---
## Step 2: Check Current Status
Once logged in, run these commands to understand the current state:
```bash
# Check hostname
hostname
cat /etc/hostname
# Check Proxmox version
pveversion
# Check pveproxy service status
systemctl status pveproxy --no-pager -l
# Check recent pveproxy logs
journalctl -u pveproxy --no-pager -n 50
# Check if port 8006 is listening
ss -tlnp | grep 8006
```
---
## Step 3: Reset Root Password
Set a password for root (you can use `L@kers2010` to match other hosts, or choose a different one):
```bash
passwd root
# Enter new password twice when prompted
```
**Recommended:** Use `L@kers2010` to match R630-03 and ml110 for consistency.
---
## Step 4: Fix pveproxy Service
### 4.1 Check Service Status
```bash
systemctl status pveproxy --no-pager -l | head -40
```
### 4.2 Check Logs for Errors
```bash
journalctl -u pveproxy --no-pager -n 100 | grep -i error
journalctl -u pveproxy --no-pager -n 100 | tail -50
```
### 4.3 Restart pveproxy
```bash
systemctl restart pveproxy
sleep 3
systemctl status pveproxy --no-pager | head -20
```
### 4.4 Check if Port 8006 is Now Listening
```bash
ss -tlnp | grep 8006
```
Should show something like:
```
LISTEN 0 128 0.0.0.0:8006 0.0.0.0:* users:(("pveproxy",pid=1234,fd=6))
```
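pveproxy can take a few seconds to bind after a restart, so a single `ss` check right after `sleep 3` may give a false negative. A polling sketch follows; the helper name and the 30-second default timeout are assumptions.

```bash
# Poll until something is listening on TCP port $1, or give up after $2 seconds.
wait_for_port() {
  local port="$1" timeout="${2:-30}" i
  for (( i = 0; i < timeout; i++ )); do
    if ss -Hltn "sport = :$port" | grep -q .; then
      echo "port $port is listening"
      return 0
    fi
    sleep 1
  done
  echo "port $port not listening after ${timeout}s" >&2
  return 1
}
# Usage: systemctl restart pveproxy && wait_for_port 8006
```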
---
## Step 5: If pveproxy Still Fails
### 5.1 Check All Proxmox Services
```bash
systemctl list-units --type=service --all | grep -E 'pveproxy|pvedaemon|pve-cluster|pvestatd'
systemctl status pvedaemon --no-pager | head -20
systemctl status pve-cluster --no-pager | head -20
```
### 5.2 Restart All Proxmox Services
```bash
systemctl restart pveproxy pvedaemon pvestatd pve-cluster
sleep 5
systemctl status pveproxy --no-pager | head -20
```
### 5.3 Check for Port Conflicts
```bash
# Check if something else is using port 8006
lsof -i :8006
ss -tlnp | grep 8006
```
### 5.4 Check Disk Space
```bash
df -h
# Low disk space can cause service issues
```
### 5.5 Check Log Directory Permissions
```bash
ls -la /var/log/pveproxy/
# Typically owned by www-data:www-data (pveproxy workers run as www-data)
```
### 5.6 Check Proxmox Cluster Status (if in cluster)
```bash
pvecm status
```
---
## Step 6: Verify Web Interface Works
### 6.1 Test Locally
```bash
# Test HTTPS connection locally
curl -k https://localhost:8006 | head -20
# Should return HTML (Proxmox login page)
```
### 6.2 Test from Another Host
From another machine on the network:
```bash
# Test from R630-03 or your local machine
curl -k https://192.168.11.14:8006 | head -20
```
### 6.3 Open in Browser
Open in web browser:
```
https://192.168.11.14:8006
```
You should see the Proxmox login page.
---
## Step 7: Document Password
Once password is set and everything works, document it:
1. Update `docs/PROXMOX_HOST_PASSWORDS.md` with R630-04 password
2. Update `INFRASTRUCTURE_OVERVIEW_COMPLETE.md` with correct status
---
## Quick Command Reference
Copy-paste these commands in order:
```bash
# 1. Check status
hostname
pveversion
systemctl status pveproxy --no-pager -l | head -30
# 2. Reset password
passwd root
# Enter: L@kers2010 (or your chosen password)
# 3. Fix pveproxy
systemctl restart pveproxy
sleep 3
systemctl status pveproxy --no-pager | head -20
ss -tlnp | grep 8006
# 4. If still failing, restart all services
systemctl restart pveproxy pvedaemon pvestatd
systemctl status pveproxy --no-pager | head -20
# 5. Test web interface
curl -k https://localhost:8006 | head -10
```
---
## Expected Results
After completing these steps:
✅ Root password set and documented
✅ pveproxy service running
✅ Port 8006 listening
✅ Web interface accessible at https://192.168.11.14:8006
✅ SSH access working with new password
---
## If Issues Persist
If pveproxy still fails after restart:
1. **Check for specific error messages:**
```bash
journalctl -u pveproxy --no-pager -n 200 | grep -i "error\|fail\|exit"
```
2. **Check Proxmox installation:**
```bash
dpkg -l | grep proxmox
pveversion -v
```
3. **Reinstall pveproxy (if needed):**
```bash
apt update
apt install --reinstall pveproxy
systemctl restart pveproxy
```
4. **Check system resources:**
```bash
free -h
df -h
top -bn1 | head -20
```
---
**Once you're done, let me know:**
1. What password you set
2. Whether pveproxy is working
3. If the web interface is accessible
4. Any error messages you encountered
I'll update the documentation accordingly!

@@ -0,0 +1,185 @@
# R630-04 Proxmox Troubleshooting Guide
**IP Address:** 192.168.11.14
**Kernel:** 6.17.2-1-pve (reported as the Proxmox version; this is a kernel release, run `pveversion` for the Proxmox VE release)
**Issue:** pveproxy worker exit (web interface not accessible on port 8006)
---
## Problem Summary
- Proxmox VE is installed (running kernel 6.17.2-1-pve)
- SSH access works (port 22)
- Web interface not accessible (port 8006)
- pveproxy workers are crashing/exiting
---
## Diagnostic Steps
### 1. Check pveproxy Service Status
```bash
systemctl status pveproxy --no-pager -l
```
Look for:
- Service state (should be "active (running)")
- Worker process exits
- Error messages
### 2. Check Recent Logs
```bash
journalctl -u pveproxy --no-pager -n 100
```
Look for:
- Worker exit messages
- Error patterns
- Stack traces
### 3. Check Port 8006
```bash
ss -tlnp | grep 8006
# or
netstat -tlnp | grep 8006
```
Should show pveproxy listening on port 8006.
### 4. Check Proxmox Cluster Status
```bash
pvecm status
```
If in a cluster, verify cluster connectivity.
---
## Common Fixes
### Fix 1: Restart pveproxy Service
```bash
systemctl restart pveproxy
systemctl status pveproxy
```
### Fix 2: Check and Fix Configuration
```bash
# Check configuration files
ls -la /etc/pveproxy/
cat /etc/default/pveproxy 2>/dev/null
# List pveproxy's subcommands (confirms the binary runs; this does not validate configuration)
pveproxy --help
```
### Fix 3: Reinstall pveproxy Package
```bash
apt update
apt install --reinstall pveproxy
systemctl restart pveproxy
```
### Fix 4: Check for Port Conflicts
```bash
# Find what's using port 8006
ss -tlnp | grep 8006
lsof -i :8006
# If something else is using it, stop that service
```
### Fix 5: Check Disk Space and Permissions
```bash
# Check disk space
df -h
# Check log directory permissions
ls -la /var/log/pveproxy/
# Typically owned by www-data:www-data (pveproxy workers run as www-data)
```
### Fix 6: Check for Corrupted Database
```bash
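# List installed Proxmox package versions (this does not check any database)
pveversion -v
# pve-cluster serves /etc/pve from /var/lib/pve-cluster/config.db (also on standalone hosts)
systemctl status pve-cluster
# /etc/pve should be mounted and contain this node's certificates
ls /etc/pve/local/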
```
### Fix 7: Full Service Restart
```bash
# Restart all Proxmox services
systemctl restart pveproxy pvedaemon pvestatd pve-cluster
systemctl status pveproxy pvedaemon pvestatd pve-cluster
```
---
## Advanced Troubleshooting
### View Real-time Logs
```bash
journalctl -u pveproxy -f
```
### Check Worker Process Details
```bash
# See running pveproxy processes
ps aux | grep pveproxy
# Check process limits
cat /proc/$(pgrep -f pveproxy | head -1)/limits
```
### Test pveproxy Manually
```bash
# Stop service
systemctl stop pveproxy
# Try running manually to see errors
/usr/bin/pveproxy start
```
---
## Scripts Available
1. **check-r630-04-commands.sh** - Diagnostic commands
2. **fix-r630-04-pveproxy.sh** - Automated fix script
---
## Expected Resolution
After fixing:
- `systemctl status pveproxy` should show "active (running)"
- `ss -tlnp | grep 8006` should show pveproxy listening
- Web interface should be accessible at `https://192.168.11.14:8006`
---
## Additional Resources
- Proxmox VE Documentation: https://pve.proxmox.com/pve-docs/
- Proxmox Forum: https://forum.proxmox.com/
- Log locations:
- `/var/log/pveproxy/access.log`
- `/var/log/pveproxy/error.log`
- `journalctl -u pveproxy`

@@ -0,0 +1,329 @@
# Security Incident Response Procedures
**Last Updated:** 2025-01-20
**Document Version:** 1.0
**Status:** Active Documentation
---
## Overview
This document outlines procedures for responding to security incidents, including detection, containment, eradication, recovery, and post-incident activities.
---
## Incident Response Phases
### Phase 1: Preparation
**Pre-Incident Activities:**
1. **Incident Response Team:**
- Define roles and responsibilities
- Establish communication channels
- Create contact list
2. **Tools and Resources:**
- Log collection and analysis tools
- Forensic tools
- Backup systems
- Documentation
3. **Procedures:**
- Incident classification
- Escalation procedures
- Communication templates
---
### Phase 2: Detection and Analysis
#### Detection Methods
1. **Automated Detection:**
- Intrusion detection systems (IDS)
- Security information and event management (SIEM)
- Log analysis
- Anomaly detection
2. **Manual Detection:**
- User reports
- System administrator observations
- Security audits
#### Incident Classification
**Severity Levels:**
- **Critical:** Active breach, data exfiltration, system compromise
- **High:** Unauthorized access, potential data exposure
- **Medium:** Suspicious activity, policy violations
- **Low:** Minor security events, false positives
#### Initial Analysis
**Information Gathering:**
1. **What Happened:**
- Timeline of events
- Affected systems
- Indicators of compromise (IOCs)
2. **Who/What:**
- Source of attack
- Attack vector
- Tools used
3. **Impact Assessment:**
- Data accessed/modified
- Systems compromised
- Business impact
---
### Phase 3: Containment
#### Short-Term Containment
**Immediate Actions:**
1. **Isolate Affected Systems:**
```bash
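# Disable network interface (use the local or out-of-band console: this
# also cuts off any SSH session running over that interface)
ip link set <interface> down
# Block IP addresses (-I inserts at the top of INPUT, so an earlier ACCEPT
# rule cannot take precedence as it could with -A)
iptables -I INPUT -s <attacker-ip> -j DROP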
```
2. **Preserve Evidence:**
- Take snapshots of affected systems
- Copy logs
- Document current state
3. **Disable Compromised Accounts:**
```bash
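# Disable user account: -L locks the password, -e 1 expires the account,
# which also blocks SSH key logins (-L alone does not)
usermod -L -e 1 <username>
# Revoke API tokens
pveum user token remove <userid> <tokenid>
# Or via Proxmox UI: Datacenter → Permissions → API Tokens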
```
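Step 2 above (preserve evidence) can be partly scripted so that copied logs carry a checksum manifest. A minimal sketch, assuming GNU coreutils; `collect_evidence` and the destination path are illustrative, and the source files must have distinct basenames because the copies land in one flat directory:

```bash
# Copy evidence files into <dest-dir> and write a SHA-256 manifest of the copies.
collect_evidence() {
    local dest="$1"; shift
    if [ "$#" -eq 0 ]; then
        echo "usage: collect_evidence <dest-dir> <file>..." >&2
        return 2
    fi
    mkdir -p -- "$dest" || return 1
    local f
    for f in "$@"; do
        # -p preserves mode and timestamps, which matter when building a timeline
        cp -p -- "$f" "$dest/" || return 1
    done
    # Hash the copies (by basename), not the live files, which may still be changing
    ( cd -- "$dest" && sha256sum -- "${@##*/}" > SHA256SUMS ) || return 1
    # Note when and on which host the collection ran
    printf 'collected_utc=%s host=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(uname -n)" > "$dest/COLLECTED"
}

# Example:
#   collect_evidence "/root/incident-$(date -u +%Y%m%dT%H%M%SZ)" /var/log/auth.log /var/log/syslog
#   # later, to detect changes: (cd <dir> && sha256sum -c SHA256SUMS)
```

Copy the resulting directory off the affected host as well; a manifest stored only on a compromised system proves little.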
#### Long-Term Containment
**System Hardening:**
1. **Update Security Controls:**
- Patch vulnerabilities
- Update firewall rules
- Enhance monitoring
2. **Access Control:**
- Review user accounts
- Rotate credentials
- Implement MFA where possible
---
### Phase 4: Eradication
#### Remove Threat
**Actions:**
1. **Remove Malware:**
```bash
# Scan recursively; -i lists only infected files
clamscan -r -i /path/to/scan
# Remove infected files only after verifying the findings
# and preserving copies as evidence
```
2. **Close Attack Vectors:**
- Patch vulnerabilities
- Fix misconfigurations
- Update security policies
3. **Clean Compromised Systems:**
- Rebuild from known-good backups
- Verify system integrity
- Reinstall if necessary
---
### Phase 5: Recovery
#### System Restoration
**Steps:**
1. **Restore from Backups:**
- Use pre-incident backups
- Verify backup integrity
- Restore systems
2. **Verify System Integrity:**
- Check system logs
- Verify configurations
- Test functionality
3. **Monitor Systems:**
- Enhanced monitoring
- Watch for re-infection
- Track system behavior
#### Service Restoration
**Gradual Restoration:**
1. **Priority Systems First:**
- Critical services
- Business-critical applications
- User-facing services
2. **Verification:**
- Test each service
- Verify data integrity
- Confirm functionality
---
### Phase 6: Post-Incident Activity
#### Lessons Learned
**Post-Incident Review:**
1. **Timeline Review:**
- Document complete timeline
- Identify gaps in response
- Note what worked well
2. **Root Cause Analysis:**
- Identify root cause
- Determine contributing factors
- Document findings
3. **Improvements:**
- Update procedures
- Enhance security controls
- Improve monitoring
#### Documentation
**Incident Report:**
1. **Executive Summary:**
- Incident overview
- Impact assessment
- Response timeline
2. **Technical Details:**
- Attack vector
- IOCs
- Remediation steps
3. **Recommendations:**
- Security improvements
- Process improvements
- Training needs
---
## Incident Response Contacts
### Primary Contacts
- **Security Team Lead:** [Contact Information]
- **Infrastructure Lead:** [Contact Information]
- **Management:** [Contact Information]
### Escalation
- **Level 1:** Security team (immediate)
- **Level 2:** Management (1 hour)
- **Level 3:** External security firm (4 hours)
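A minimal sketch of the escalation timing above, assuming the times are measured from detection while the incident is still unresolved (an interpretation; the list does not say). The helper names are illustrative:

```bash
# Map minutes since detection (incident unresolved) to an escalation level:
# Level 1 immediately, Level 2 from 60 minutes, Level 3 from 240 minutes.
escalation_level() {
    local minutes="$1"
    if [ "$minutes" -ge 240 ]; then
        echo 3
    elif [ "$minutes" -ge 60 ]; then
        echo 2
    else
        echo 1
    fi
}

# Whole minutes elapsed since a detection time given in epoch seconds.
minutes_since() {
    local detected_epoch="$1"
    echo $(( ( $(date +%s) - detected_epoch ) / 60 ))
}

# Example (DETECTED_AT is a placeholder set when the incident is opened):
#   DETECTED_AT=$(date +%s)
#   escalation_level "$(minutes_since "$DETECTED_AT")"
```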
---
## Common Incident Scenarios
### Unauthorized Access
**Symptoms:**
- Unknown logins
- Unusual account activity
- Failed login attempts
**Response:**
1. Disable compromised accounts
2. Review access logs
3. Rotate all potentially exposed credentials (passwords, SSH keys, API tokens)
4. Investigate source
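Reviewing access logs (step 2) can start with a per-source count of failed password attempts. A minimal sketch, assuming OpenSSH's standard "Failed password for ... from <ip> port ..." wording as it appears in `journalctl -u ssh` or `/var/log/auth.log`; `failed_logins_by_ip` is illustrative:

```bash
# Count failed SSH password attempts per source IP from sshd log lines on stdin,
# highest count first (ties sorted by IP).
failed_logins_by_ip() {
    awk '/sshd.*Failed password/ {
            # The source address is the field right after "from"
            for (i = 1; i < NF; i++)
                if ($i == "from") { count[$(i + 1)]++; break }
         }
         END { for (ip in count) print count[ip], ip }' |
        sort -k1,1nr -k2,2
}

# Example:
#   journalctl -u ssh --no-pager | failed_logins_by_ip | head
```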
### Malware Infection
**Symptoms:**
- Unusual system behavior
- High CPU/memory usage
- Network anomalies
**Response:**
1. Isolate affected systems
2. Identify malware
3. Remove malware
4. Restore from backup if needed
### Data Breach
**Symptoms:**
- Unauthorized data access
- Data exfiltration
- Database anomalies
**Response:**
1. Contain breach
2. Assess data exposure
3. Notify affected parties (if required)
4. Enhance security controls
---
## Prevention
### Security Best Practices
1. **Regular Updates:**
- Keep systems patched
- Update security tools
- Review configurations
2. **Monitoring:**
- Log analysis
- Anomaly detection
- Regular audits
3. **Access Control:**
- Least privilege principle
- MFA where possible
- Regular access reviews
4. **Backups:**
- Regular backups
- Test restores
- Offsite backups
---
## Related Documentation
- **[DISASTER_RECOVERY.md](../03-deployment/DISASTER_RECOVERY.md)** - Disaster recovery procedures
- **[BACKUP_AND_RESTORE.md](../03-deployment/BACKUP_AND_RESTORE.md)** - Backup procedures
- **[TROUBLESHOOTING_FAQ.md](TROUBLESHOOTING_FAQ.md)** - General troubleshooting
---
**Review Cycle:** Quarterly


@@ -0,0 +1,113 @@
# Storage Migration Issue - pve2 Configuration
**Date**: $(date)
**Issue**: Container migrations failing due to storage configuration mismatch
## Problem
Container migrations from ml110 to pve2 are failing with the error:
```
Volume group "pve" not found
ERROR: storage migration for 'local-lvm:vm-XXXX-disk-0' to storage 'local-lvm' failed
```
## Root Cause
**ml110** (source):
- Has `local-lvm` storage **active**
- Uses volume group named **"pve"** (standard Proxmox setup)
- Containers stored on `local-lvm:vm-XXXX-disk-0`
**pve2** (target):
- Has `local-lvm` storage but it's **INACTIVE**
- Has volume groups named **lvm1, lvm2, lvm3, lvm4, lvm5, lvm6** instead of "pve"
- The cluster-wide `local-lvm` definition expects volume group "pve", which does not exist on pve2, so the storage cannot activate
## Storage Status
### ml110 Storage
```
local-lvm: lvmthin, active, 832GB total, 108GB used
Volume Group: pve (standard)
```
### pve2 Storage
```
local-lvm: lvmthin, INACTIVE, 0GB available
Volume Groups: lvm1, lvm2, lvm3, lvm4, lvm5, lvm6 (non-standard)
```
## Solutions
### Option 1: Configure pve2's local-lvm Storage (Recommended)
1. **Rename/create "pve" volume group on pve2**:
```bash
# On pve2, check current LVM setup
ssh root@192.168.11.12 "vgs; lvs"
# Rename one of the volume groups to "pve" (only if it is unused and holds
# the thin pool that local-lvm expects), e.g.: vgrename lvm1 pve
# OR create a new "pve" volume group from available space
```
2. **Activate local-lvm storage on pve2**:
```bash
# Check storage configuration
ssh root@192.168.11.12 "cat /etc/pve/storage.cfg"
# May need to reconfigure local-lvm to use correct volume group
```
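To compare what `local-lvm` expects with what pve2 actually has, the storage definition can be read directly. `/etc/pve/storage.cfg` is shared across the cluster, so the same entry (volume group and thin pool) applies on every node unless it is restricted with a `nodes` line. A minimal sketch; `storage_prop` is an illustrative helper that assumes the usual `<type>: <id>` header followed by indented `<key> <value>` lines:

```bash
# Print the value of <key> for storage <id> in a Proxmox storage.cfg file.
storage_prop() {
    local cfg="$1" id="$2" key="$3"
    awk -v id="$id" -v key="$key" '
        # Section header, e.g. "lvmthin: local-lvm"
        /^[^[:space:]#][^:]*:[[:space:]]/ { in_section = ($2 == id); next }
        # Indented property line inside the wanted section, e.g. "vgname pve"
        in_section && $1 == key {
            sub(/^[[:space:]]*[^[:space:]]+[[:space:]]+/, "")   # strip the key
            print
            exit
        }
    ' "$cfg"
}

# Example, run on pve2:
#   storage_prop /etc/pve/storage.cfg local-lvm vgname     # VG that local-lvm expects
#   storage_prop /etc/pve/storage.cfg local-lvm thinpool   # thin pool it expects
#   vgs --noheadings -o vg_name                             # VGs that actually exist
#   storage_prop /etc/pve/storage.cfg local content        # needs rootdir for Option 2
```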
### Option 2: Migrate to Different Storage on pve2
Use `local` (directory storage) instead of `local-lvm`:
```bash
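# Migrate, mapping the container's volumes to the "local" directory storage
pct migrate <VMID> pve2 --target-storage local --restart
```
**Pros**: No LVM reconfiguration needed (the `local` storage on pve2 must allow the `rootdir` content type; `pvesm status --content rootdir` lists the storages that do)
**Cons**: Directory storage is slower than LVM thin provisioning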
### Option 3: Use Shared Storage
Configure shared storage (NFS, Ceph, etc.) accessible from both nodes:
```bash
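# Add an NFS export as shared storage (ID, server and export path are placeholders)
pvesm add nfs <storage-id> --server <nfs-server-ip> --export <export-path> --content rootdir,images
# Then move each container's root disk onto it (pct move_volume on older releases)
pct move-volume <VMID> rootfs <storage-id>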
```
## Immediate Workaround
Until pve2's local-lvm is properly configured, we can:
1. **Skip migrations** for now
2. **Configure pve2 storage** first
3. **Then proceed with migrations**
## Next Steps
1. ⏳ Investigate pve2's LVM configuration
2. ⏳ Configure local-lvm storage on pve2 with "pve" volume group
3. ⏳ Verify storage is active and working
4. ⏳ Retry container migrations
## Verification Commands
```bash
# Check pve2 storage status
ssh root@192.168.11.12 "pvesm status"
# Check volume groups
ssh root@192.168.11.12 "vgs"
# Check local-lvm configuration
ssh root@192.168.11.12 "cat /etc/pve/storage.cfg | grep -A 5 local-lvm"
```
---
**Status**: ⚠️ Migrations paused pending storage configuration fix


@@ -4,12 +4,16 @@ Common issues and solutions for Besu validated set deployment.
## Table of Contents
**Estimated Reading Time:** 30 minutes
**Progress:** Check off sections as you read
1. [Container Issues](#container-issues) - *Container troubleshooting*
2. ✅ [Service Issues](#service-issues) - *Service troubleshooting*
3. [Network Issues](#network-issues) - *Network troubleshooting*
4. ✅ [Consensus Issues](#consensus-issues) - *Consensus troubleshooting*
5. ✅ [Configuration Issues](#configuration-issues) - *Configuration troubleshooting*
6. ✅ [Performance Issues](#performance-issues) - *Performance troubleshooting*
7. ✅ [Additional Common Questions](#additional-common-questions) - *More FAQs*
---
@@ -43,6 +47,27 @@ pct start <vmid>
- Invalid container configuration
- OS template issues
<details>
<summary>Click to expand advanced troubleshooting steps</summary>
**Advanced Diagnostics:**
```bash
# Check container configuration and resource usage
pct config <vmid>
pct status <vmid> --verbose
# Check Proxmox host resources
free -h
df -h
# Check container logs in detail
journalctl -u pve-container@<vmid> -n 100 --no-pager
# Verify container template (pveam list requires a storage ID, e.g. local)
pveam list local | grep <template-name>
```
</details>
---
### Q: Container runs out of disk space
@@ -483,6 +508,187 @@ If issues persist:
---
## Additional Common Questions
### Q: How do I add a new VMID?
**Answer:**
1. Check available VMID ranges in [VMID_ALLOCATION_FINAL.md](../02-architecture/VMID_ALLOCATION_FINAL.md)
2. Select an appropriate VMID from the designated range for your service
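3. Verify the VMID is not already in use: `pvesh get /cluster/nextid --vmid <vmid>` fails if the ID is taken anywhere in the cluster, whereas `pct list` and `qm list` only show guests on the local node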
4. Document the assignment in VMID_ALLOCATION_FINAL.md
5. Use the VMID when creating containers/VMs
**Example:**
```bash
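# Check if VMID 2503 is available (errors if it is already used anywhere in the cluster)
pvesh get /cluster/nextid --vmid 2503
# Local-node check; -w avoids matching longer IDs such as 25030
pct list | grep -w 2503
qm list | grep -w 2503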
# If available, create container with VMID 2503
pct create 2503 ...
```
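A plain `grep 2503` also matches longer IDs such as 25030 or a guest name that contains those digits. A minimal sketch that compares only the VMID column; `vmid_in_use` is an illustrative helper, and like `pct list` and `qm list` themselves it only sees the local node:

```bash
# Succeed if <vmid> appears in the first (VMID) column of `pct list` or
# `qm list` output read from stdin; header lines never match a number.
vmid_in_use() {
    local vmid="$1"
    awk -v id="$vmid" 'NR > 1 && $1 == id { found = 1 } END { exit !found }'
}

# Example:
#   { pct list; qm list; } | vmid_in_use 2503 && echo "VMID 2503 is taken on this node"
```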
**Related Documentation:**
- [VMID Allocation Registry](../02-architecture/VMID_ALLOCATION_FINAL.md) ⭐⭐⭐
- [VMID Quick Reference](../12-quick-reference/VMID_QUICK_REFERENCE.md) ⭐⭐⭐
---
### Q: What's the difference between public and private RPC?
**Answer:**
| Feature | Public RPC | Private RPC |
|---------|-----------|-------------|
| **Discovery** | Enabled | Disabled |
| **Permissioning** | Disabled | Enabled |
| **Access** | Public (CORS: *) | Restricted (internal only) |
| **APIs** | ETH, NET, WEB3 (read-only) | ETH, NET, WEB3, ADMIN, DEBUG (full) |
| **Use Case** | dApps, external users | Internal services, admin |
| **ChainID** | 0x8a (138) or 0x1 (wallet compatibility) | 0x8a (138) |
| **Domain** | rpc-http-pub.d-bis.org | rpc-http-prv.d-bis.org |
**Public RPC:**
- Accessible from the internet
- Used by dApps and external tools
- Read-only APIs for security
- May report chainID 0x1 for MetaMask compatibility
**Private RPC:**
- Internal network only
- Used by internal services and administration
- Full API access including ADMIN and DEBUG
- Strict permissioning and access control
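To see which chain ID an endpoint actually reports, query `eth_chainId` (a standard Ethereum JSON-RPC method) and convert the hex result. A minimal sketch using `sed` instead of `jq`, assuming a single-line JSON response; the URL in the example assumes the endpoint serves JSON-RPC at the domain root, and `chain_id_from_response` is illustrative:

```bash
# Read a JSON-RPC response on stdin, extract the hex "result" field and print it
# in decimal (bash's printf accepts 0x-prefixed numbers). Fails if there is no result.
chain_id_from_response() {
    local hex
    hex=$(sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p')
    [ -n "$hex" ] || return 1
    printf '%d\n' "$hex"
}

# Example:
#   curl -s -X POST -H 'Content-Type: application/json' \
#     --data '{"jsonrpc":"2.0","method":"eth_chainId","params":[],"id":1}' \
#     https://rpc-http-pub.d-bis.org | chain_id_from_response
# Per the table above, expect 138 from the private RPC and 138 or 1 from the public RPC.
```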
**Related Documentation:**
- [RPC Node Types Architecture](../05-network/RPC_NODE_TYPES_ARCHITECTURE.md) ⭐⭐
- [RPC Template Types](../05-network/RPC_TEMPLATE_TYPES.md) ⭐
---
### Q: How do I troubleshoot Cloudflare tunnel issues?
**Answer:**
**Step 1: Check Tunnel Status**
```bash
# Check cloudflared container status
pct status 102
# Check tunnel logs (read the journal inside the container)
pct exec 102 -- journalctl -u 'cloudflared*' -n 50 --no-pager
# Verify tunnel is running
pct exec 102 -- ps aux | grep cloudflared
```
**Step 2: Verify Configuration**
```bash
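# Check tunnel configuration
pct exec 102 -- cat /etc/cloudflared/config.yaml
# Validate the ingress rules in that file
pct exec 102 -- cloudflared tunnel --config /etc/cloudflared/config.yaml ingress validate
# Verify credentials file exists (sh -c so the glob expands inside the container, not on the host)
pct exec 102 -- sh -c 'ls -la /etc/cloudflared/*.json'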
```
**Step 3: Test Connectivity**
```bash
# Test from internal network
curl -I http://192.168.11.21:80
# Test from external (through Cloudflare)
curl -I https://explorer.d-bis.org
```
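Comparing the two tests in Step 3 narrows down where the fault is: if the internal URL fails, the origin service is the problem; if only the external URL fails, look at the tunnel, ingress rules, or DNS. A minimal sketch of that reasoning; `tunnel_fault_layer` is illustrative, and treating 2xx/3xx responses as healthy is an assumption:

```bash
# Succeed for HTTP status codes treated as healthy here (2xx and 3xx).
http_ok() { [ "$1" -ge 200 ] 2>/dev/null && [ "$1" -lt 400 ]; }

# Given internal and external status codes, print the most likely failing layer.
# curl's %{http_code} reports 000 when no HTTP response arrived at all.
tunnel_fault_layer() {
    local internal="$1" external="$2"
    if ! http_ok "$internal"; then
        echo "origin"   # the service behind the tunnel is down or erroring
    elif ! http_ok "$external"; then
        echo "tunnel"   # origin is fine; check cloudflared, ingress rules and DNS
    else
        echo "none"
    fi
}

# Example:
#   internal=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 http://192.168.11.21:80)
#   external=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 https://explorer.d-bis.org)
#   tunnel_fault_layer "$internal" "$external"
```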
**Step 4: Check Cloudflare Dashboard**
- Verify tunnel is healthy in Cloudflare Zero Trust dashboard
- Check ingress rules are configured correctly
- Verify DNS records point to tunnel
**Common Issues:**
- Tunnel not running → Restart the container: `pct reboot 102`
- Configuration error → Check YAML syntax
- Credentials invalid → Regenerate tunnel token
- DNS not resolving → Check Cloudflare DNS settings
**Related Documentation:**
- [Cloudflare Tunnel Routing Architecture](../05-network/CLOUDFLARE_TUNNEL_ROUTING_ARCHITECTURE.md) ⭐⭐⭐
- [Cloudflare Routing Master Reference](../05-network/CLOUDFLARE_ROUTING_MASTER.md) ⭐⭐⭐
- [Troubleshooting Quick Reference](../12-quick-reference/TROUBLESHOOTING_QUICK_REFERENCE.md) ⭐⭐⭐
---
### Q: What's the recommended storage configuration?
**Answer:**
**For R630 Compute Nodes:**
- **Boot drives (2×600GB):** ZFS mirror (recommended) or hardware RAID1
- **Data SSDs (6×250GB):** ZFS pool with one of:
- Striped mirrors (if pairs available)
- RAIDZ1 (single parity, 5 drives usable)
- RAIDZ2 (double parity, 4 drives usable)
- **High-write workloads:** Dedicated dataset with quotas
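The "drives usable" figures above are simple arithmetic. For the six 250 GB data SSDs this gives roughly 750 GB as three striped mirrors, 1,250 GB as RAIDZ1, and 1,000 GB as RAIDZ2, before ZFS metadata, padding and reserved space (real usable space is lower). A minimal sketch of the calculation; `zfs_usable_gb` is illustrative:

```bash
# Raw usable capacity in GB for <n> identical drives of <size_gb> each.
zfs_usable_gb() {
    local layout="$1" n="$2" size="$3"
    case "$layout" in
        mirror) echo $(( (n / 2) * size )) ;;   # striped 2-way mirrors; an odd drive is left over
        raidz1) echo $(( (n - 1) * size )) ;;   # one drive's worth of parity
        raidz2) echo $(( (n - 2) * size )) ;;   # two drives' worth of parity
        *) echo "unknown layout: $layout" >&2; return 1 ;;
    esac
}

# Example:
#   for l in mirror raidz1 raidz2; do echo "$l: $(zfs_usable_gb "$l" 6 250) GB"; done
```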
**For ML110 Management Node:**
- Standard Proxmox storage configuration
- Sufficient space for templates and backups
**Storage Best Practices:**
- Use ZFS for data integrity and snapshots
- Enable compression for space efficiency
- Set quotas for containers to prevent disk exhaustion
- Regular backups to external storage
**Related Documentation:**
- [Network Architecture - Storage Orchestration](../02-architecture/NETWORK_ARCHITECTURE.md#53-storage-orchestration-r630) ⭐⭐⭐
- [Backup and Restore](../03-deployment/BACKUP_AND_RESTORE.md) ⭐⭐
---
### Q: How do I migrate from flat LAN to VLANs?
**Answer:**
**Phase 1: Preparation**
1. Review VLAN plan in [NETWORK_ARCHITECTURE.md](../02-architecture/NETWORK_ARCHITECTURE.md)
2. Document current IP assignments
3. Plan IP address migration for each service
4. Create rollback plan
**Phase 2: Network Configuration**
1. Configure ES216G switches with VLAN trunks
2. Enable VLAN-aware bridge on Proxmox hosts
3. Create VLAN interfaces on ER605 router
4. Test VLAN connectivity
**Phase 3: Service Migration**
1. Migrate services one VLAN at a time
2. Start with non-critical services
3. Update container/VM network configuration
4. Verify connectivity after each migration
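For Phase 3, step 3, a container NIC is moved onto a VLAN by setting the `tag` option of its `net0` definition on the VLAN-aware bridge. A minimal sketch that builds that value; `vmbr0`, the VLAN ID and the addresses in the example are placeholders, and `net0_spec` is illustrative:

```bash
# Build a `pct set --net0` value placing eth0 on VLAN <vlan> of <bridge>.
net0_spec() {
    local bridge="$1" vlan="$2" cidr="$3" gw="$4"
    # Valid 802.1Q VLAN IDs are 1-4094; non-numeric input also fails the first test
    if ! [ "$vlan" -ge 1 ] 2>/dev/null || [ "$vlan" -gt 4094 ]; then
        echo "invalid VLAN ID: $vlan" >&2
        return 1
    fi
    printf 'name=eth0,bridge=%s,tag=%s,ip=%s,gw=%s\n' "$bridge" "$vlan" "$cidr" "$gw"
}

# Example (check `pct config <vmid>` first and carry over options such as hwaddr=):
#   pct set <vmid> --net0 "$(net0_spec vmbr0 110 <ip>/<prefix> <gateway>)"
```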
**Phase 4: Validation**
1. Test all services on new VLANs
2. Verify routing between VLANs
3. Test egress NAT pools
4. Document final configuration
**Migration Order (Recommended):**
1. Management services (VLAN 11) - Already active
2. Monitoring/observability (VLAN 120, 121)
3. Besu network (VLANs 110, 111, 112)
4. CCIP network (VLANs 130, 132, 133, 134)
5. Service layer (VLAN 160)
6. Sovereign tenants (VLANs 200-203)
**Related Documentation:**
- [Network Architecture - VLAN Orchestration](../02-architecture/NETWORK_ARCHITECTURE.md#3-layer-2--vlan-orchestration-plan) ⭐⭐⭐
- [Orchestration Deployment Guide - VLAN Enablement](../02-architecture/ORCHESTRATION_DEPLOYMENT_GUIDE.md#phase-1--vlan-enablement) ⭐⭐⭐
---
## Related Documentation
### Operational Procedures


@@ -0,0 +1,158 @@
# Comprehensive Troubleshooting Guide
**Purpose**: Common issues and solutions for bridge operations
---
## ❌ Common Errors
### "Execution reverted"
**Cause**: Transaction reverted by contract logic
**Solutions**:
1. Check contract state
2. Verify parameters
3. Check allowances
4. Verify balances
**Debug**:
```bash
cast call <CONTRACT> "<function>" <args> --rpc-url $RPC_URL
```
---
### "Insufficient funds"
**Cause**: Not enough ETH for gas or LINK for fees
**Solutions**:
1. Check ETH balance
```bash
cast balance <address> --ether --rpc-url $RPC_URL
```
2. Check LINK balance
```bash
cast call <LINK_TOKEN> "balanceOf(address)(uint256)" <address> --rpc-url $RPC_URL
```
3. Add funds if needed
---
### "Nonce too low"
**Cause**: Transaction nonce is lower than current nonce
**Solutions**:
1. Check current nonce
```bash
cast nonce <address> --rpc-url $RPC_URL
```
2. Wait for any pending transactions from the same address to be mined
3. Re-send with the nonce reported above (e.g. `cast send ... --nonce <n>`)
---
### "Replacement transaction underpriced"
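**Cause**: A transaction with the same nonce is already pending, and the new one's gas price is not high enough to replace it
**Solutions**:
1. Wait for the pending transaction to be mined
2. Re-send with the same nonce and a higher gas price (nodes typically require an increase of at least 10%)
3. Cancel the pending transaction (if possible) by replacing it with a higher-priced transaction that uses the same nonce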
---
### "Destination not enabled"
**Cause**: Destination chain not configured on bridge
**Solutions**:
1. Verify destination configuration
```bash
cast call <BRIDGE> "destinations(uint64)" <SELECTOR> --rpc-url $RPC_URL
```
2. Configure destination if missing
```bash
bash scripts/configure-bridge-destinations.sh
```
---
### "Gas price below minimum"
**Cause**: Gas price too low for network
**Solutions**:
1. Get current gas price
```bash
cast gas-price --rpc-url $RPC_URL
```
2. Use higher gas price (1.2x-1.5x current)
```bash
bash scripts/bridge-with-dynamic-gas.sh
```
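The 1.2x-1.5x multiplier can be applied with integer arithmetic on the wei value that `cast gas-price` prints. A minimal sketch; `bump_gas_price` and `PRIVATE_KEY` are illustrative, and rounding up keeps the result from falling just under the target. A factor of 110 covers replacing a pending transaction, since nodes typically require at least a 10% bump:

```bash
# Scale a gas price in wei by a percentage (120 = 1.2x), rounding up.
# Bash arithmetic is 64-bit, which is ample for gas prices in wei.
bump_gas_price() {
    local wei="$1" percent="$2"
    echo $(( (wei * percent + 99) / 100 ))
}

# Example:
#   gp=$(cast gas-price --rpc-url "$RPC_URL")
#   cast send <CONTRACT> "<function>" <args> \
#     --gas-price "$(bump_gas_price "$gp" 120)" \
#     --rpc-url "$RPC_URL" --private-key "$PRIVATE_KEY"
```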
---
## 🔍 Debugging Steps
### 1. Check System Status
```bash
bash scripts/health-check.sh
```
### 2. Check Transaction Status
```bash
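# Transaction details
cast tx <tx_hash> --rpc-url $RPC_URL
# Receipt status: 1 = success, 0 = reverted
cast receipt <tx_hash> status --rpc-url $RPC_URL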
```
### 3. Check Logs
```bash
tail -100 logs/alerts-$(date +%Y%m%d).log
```
### 4. Run Test Suite
```bash
bash scripts/test-suite.sh all
```
### 5. Check Recent Events
```bash
bash scripts/monitor-bridge-transfers.sh
```
---
## 🛠️ Advanced Troubleshooting
### Transaction Stuck
1. Check transaction status
2. Check nonce
3. Retry with higher gas
4. Consider canceling if possible
### Contract Not Found
1. Verify contract address
2. Check network
3. Verify contract deployment
### RPC Issues
1. Test RPC connectivity
2. Check RPC logs
3. Try backup RPC endpoint
---
**Last Updated**: $(date)


@@ -0,0 +1,121 @@
# Troubleshooting Proxmox Connection
## Current Issue
The Proxmox host `192.168.11.10` is not reachable from this machine.
## Diagnosis Results
- **Ping Test**: 100% packet loss (host unreachable)
- **Port 8006**: Not accessible
- **Configuration**: Loaded correctly from `~/.env`
## Possible Causes
1. **Network Connectivity**
- Host is on a different network segment
- VPN not connected
- Network routing issue
- Host is powered off
2. **Firewall**
- Firewall blocking port 8006
- Network firewall rules
3. **Wrong Host Address**
- Host IP may have changed
- Host may be on different network
## Troubleshooting Steps
### 1. Check Network Connectivity
```bash
# Test basic connectivity
ping -c 3 192.168.11.10
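# Check whether a route to the Proxmox network exists
ip route | grep 192.168.11.0
# Show which route and interface would actually be used
ip route get 192.168.11.10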
```
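Whether this machine is on the Proxmox network at all can be checked by comparing its addresses with 192.168.11.0/24. A minimal bash sketch for IPv4 only; `ip_to_int` and `in_subnet` are illustrative helpers:

```bash
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    local a b c d
    read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if <ip> lies inside <network>/<prefix>.
in_subnet() {
    local ip="$1" net="${2%/*}" prefix="${2#*/}"
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Example: report each local IPv4 address and whether it is on 192.168.11.0/24
#   for addr in $(ip -4 -o addr show | awk '{print $4}'); do
#       in_subnet "${addr%/*}" 192.168.11.0/24 && echo "$addr: on Proxmox network" || echo "$addr: different network"
#   done
```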
### 2. Check Alternative Hosts
If you have access to other Proxmox hosts, try:
```bash
# Test connectivity to alternative hosts
ping -c 3 <alternative-proxmox-host>
```
### 3. Use Shell Script (SSH Alternative)
If you have SSH access to the Proxmox node, use the shell script instead:
```bash
export PROXMOX_HOST=192.168.11.10
export PROXMOX_USER=root
./list_vms.sh
```
The shell script uses SSH which may work even if the API port is blocked.
### 4. Check VPN/Network Access
If the Proxmox host is on a remote network:
- Ensure VPN is connected
- Verify network routing
- Check if you're on the correct network segment
### 5. Verify Host is Running
- Check if Proxmox host is powered on
- Verify Proxmox services are running
- Check Proxmox web interface accessibility
### 6. Test from Proxmox Host Itself
If you can access the Proxmox host directly:
```bash
# SSH to Proxmox host
ssh root@192.168.11.10
# Test API locally
curl -k https://localhost:8006/api2/json/version
```
## Alternative: Use Shell Script
The shell script (`list_vms.sh`) uses SSH instead of the API, which may work even if:
- API port is blocked
- You're on a different network
- VPN provides SSH access but not API access
## Next Steps
1. **If host is accessible via SSH**: Use `list_vms.sh`
2. **If host is on different network**: Connect VPN or update network routing
3. **If host IP changed**: Update `PROXMOX_HOST` in `~/.env`
4. **If host is down**: Wait for it to come back online
## Quick Test Commands
```bash
# Test ping
ping -c 3 192.168.11.10
# Test port
timeout 5 bash -c "echo > /dev/tcp/192.168.11.10/8006" && echo "Port open" || echo "Port closed"
# Test SSH (if available)
ssh -o ConnectTimeout=5 root@192.168.11.10 "pvesh get /nodes" && echo "SSH works" || echo "SSH failed"
# Check current network
ip addr show | grep "inet "
```
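The single-host port test above extends to several hosts at once, for example the alternative Proxmox hosts from step 2. A minimal sketch using bash's `/dev/tcp` and coreutils `timeout`; `port_open` and `probe_hosts` are illustrative:

```bash
# Succeed if a TCP connection to <host>:<port> opens within 3 seconds.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c ": > /dev/tcp/$host/$port" 2>/dev/null
}

# Print open/closed for <port> on each host given.
probe_hosts() {
    local port="$1" h
    shift
    for h in "$@"; do
        if port_open "$h" "$port"; then
            echo "$h:$port open"
        else
            echo "$h:$port closed/unreachable"
        fi
    done
}

# Example:
#   probe_hosts 8006 192.168.11.10 192.168.11.11 192.168.11.12   # Proxmox API/web UI
#   probe_hosts 22   192.168.11.10 192.168.11.11 192.168.11.12   # SSH
```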


@@ -0,0 +1,57 @@
# Tunnel-Based Solutions for Proxmox Access
## Quick Reference
### Your Current Situation
- **Your Network**: `192.168.1.0/24` (IP: 192.168.1.36)
- **Proxmox Network**: `192.168.11.0/24` (Hosts: 192.168.11.10, 11, 12)
- **Problem**: Different network segments - direct connection blocked
### Available Tunnels
| Host | Internal IP | Tunnel URL | Status |
|------|-------------|------------|--------|
| ml110-01 | 192.168.11.10 | https://ml110-01.d-bis.org | ✅ Active |
| r630-01 | 192.168.11.11 | https://r630-01.d-bis.org | ✅ Active |
| r630-02 | 192.168.11.12 | https://r630-02.d-bis.org | ✅ Healthy |
## Solution 1: Use SSH Tunnel (Recommended for API)
```bash
# Start SSH tunnel
./setup_ssh_tunnel.sh
# In another terminal, use localhost
PROXMOX_HOST=localhost python3 list_vms.py
# Stop tunnel when done
./stop_ssh_tunnel.sh
```
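The contents of `setup_ssh_tunnel.sh` are not shown here. Because this machine cannot reach 192.168.11.0/24 directly, an SSH tunnel has to run through a host that can, such as a machine reachable from both networks. A minimal sketch of that port forward; `<user>@<jump-host>` is a placeholder and `open_proxmox_tunnel` is illustrative, not the project script:

```bash
# Forward localhost:<local_port> (default 8006) to <target>:8006 through <jump>.
# -N: run no remote command, only forward the port.
# ExitOnForwardFailure: quit instead of running without the forward (e.g. port busy).
# SSH can be overridden (e.g. SSH=echo) to print the arguments instead of connecting.
open_proxmox_tunnel() {
    local jump="$1" target="$2" local_port="${3:-8006}"
    "${SSH:-ssh}" -N -o ExitOnForwardFailure=yes \
        -L "127.0.0.1:${local_port}:${target}:8006" "$jump"
}

# Example (terminal 1, stays in the foreground until Ctrl+C):
#   open_proxmox_tunnel <user>@<jump-host> 192.168.11.10
# Terminal 2:
#   PROXMOX_HOST=localhost python3 list_vms.py
```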
## Solution 2: Access Web UI via Cloudflare Tunnel
Simply open in browser:
- https://ml110-01.d-bis.org (for ml110-01)
- https://r630-01.d-bis.org (for r630-01)
- https://r630-02.d-bis.org (for r630-02)
## Solution 3: Run Script from Proxmox Network
Copy scripts to a machine on `192.168.11.0/24` and run there.
## Solution 4: Use Shell Script via SSH
```bash
export PROXMOX_HOST=192.168.11.10
export PROXMOX_USER=root
./list_vms.sh
```
## Files Created
- `TUNNEL_ANALYSIS.md` - Complete tunnel analysis
- `list_vms_with_tunnels.py` - Enhanced script with tunnel awareness
- `setup_ssh_tunnel.sh` - SSH tunnel setup script
- `stop_ssh_tunnel.sh` - Stop SSH tunnel script
- `TUNNEL_SOLUTIONS.md` - This file


@@ -0,0 +1,133 @@
# Fix SSH "Failed to Load Local Private Key" Error
**Issue:** "failed to load local private key" error when trying to connect
---
## Common Causes
1. **SSH config references a key that doesn't exist**
2. **Private key has wrong permissions**
3. **Corrupted or missing private key**
4. **SSH trying to use wrong key file**
---
## Quick Fixes
### Option 1: Use Password Authentication Only (Temporary)
Force SSH to use password authentication and skip keys:
```bash
ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no root@192.168.11.14
```
Or with sshpass:
```bash
sshpass -p 'YOUR_PASSWORD' ssh -o PreferredAuthentications=password -o PubkeyAuthentication=no root@192.168.11.14
```
### Option 2: Check and Fix SSH Config
Check if there's a problematic SSH config entry:
```bash
cat ~/.ssh/config
```
If you see an entry for R630-04 or 192.168.11.14 with `IdentityFile` pointing to a missing key, either:
- Remove that entry
- Comment it out
- Create the missing key file
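To find config entries that point at key files which no longer exist, a minimal sketch (the helper name is illustrative) that expands only the `~/` and `%d/` prefixes; other `ssh_config` tokens, quoted paths with spaces, and the `IdentityFile=path` form are not handled:

```bash
# Print IdentityFile paths from an ssh config file that do not exist on disk.
missing_identity_files() {
    local cfg="${1:-$HOME/.ssh/config}" path
    awk 'tolower($1) == "identityfile" { print $2 }' "$cfg" |
    while IFS= read -r path; do
        case "$path" in
            "~/"*)  path="$HOME/${path#\~/}" ;;   # ~/ means the home directory
            "%d/"*) path="$HOME/${path#%d/}" ;;   # %d is ssh_config's home-directory token
        esac
        [ -e "$path" ] || echo "$path"
    done
}

# Example:
#   missing_identity_files            # checks ~/.ssh/config
```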
### Option 3: Fix Key Permissions
If keys exist but have wrong permissions:
```bash
chmod 600 ~/.ssh/id_*
chmod 644 ~/.ssh/id_*.pub
chmod 700 ~/.ssh
```
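OpenSSH ignores a private key that group or other users can access. A minimal sketch (illustrative helper, GNU `find` syntax for `-perm /mode`) that lists such keys so they can be tightened:

```bash
# List private keys (id_* but not *.pub) in an ssh directory that have any
# group or other permission bits set.
loose_private_keys() {
    local dir="${1:-$HOME/.ssh}"
    find "$dir" -maxdepth 1 -type f -name 'id_*' ! -name '*.pub' -perm /077 -print
}

# Example: fix whatever it reports
#   loose_private_keys | xargs -r chmod 600
```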
### Option 4: Remove Problematic Key References
If a specific key is causing issues, you can:
```bash
# Check which keys SSH is trying to use
ssh -v root@192.168.11.14 2>&1 | grep -i "identity\|key"
# If a specific key is problematic, temporarily rename it
mv ~/.ssh/id_rsa ~/.ssh/id_rsa.backup 2>/dev/null
mv ~/.ssh/id_ed25519 ~/.ssh/id_ed25519.backup 2>/dev/null
```
### Option 5: Clear SSH Agent (if using)
```bash
ssh-add -D # Remove all keys from agent
eval $(ssh-agent -k) # Kill agent
```
---
## Recommended Solution
Since you have console access and just want to reset the password, use password-only authentication:
```bash
# From your local machine
sshpass -p 'YOUR_PASSWORD' ssh \
-o PreferredAuthentications=password \
-o PubkeyAuthentication=no \
-o StrictHostKeyChecking=no \
root@192.168.11.14
```
Or if you're already on console, just run commands directly without SSH.
---
## For Console Access
If you're already logged in via console, you don't need SSH at all. Just run the commands directly on R630-04:
```bash
# Reset password
passwd root
# Fix pveproxy
systemctl restart pveproxy
# Check status
systemctl status pveproxy
ss -tlnp | grep 8006
```
---
## After Fixing
Once password is reset and you can SSH in, you can:
1. **Set up SSH keys properly** (optional):
```bash
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_r630-04 -N ""
ssh-copy-id -i ~/.ssh/id_ed25519_r630-04.pub root@192.168.11.14
```
2. **Update SSH config** (optional):
```bash
cat >> ~/.ssh/config << 'EOF'
Host r630-04
HostName 192.168.11.14
User root
IdentityFile ~/.ssh/id_ed25519_r630-04
EOF
```
But for now, just use password authentication or console access.


@@ -0,0 +1,179 @@
# SSH Connection Options for R630-04
**IP:** 192.168.11.14
**User:** root
**Issue:** Permission denied with password authentication
---
## Possible Causes
1. **Password incorrect** - Double-check the password
2. **Password authentication disabled** - Server may require key-based auth
3. **Account locked** - Too many failed attempts
4. **SSH configuration** - Server may have restrictive settings
5. **Wrong user** - May need different username
---
## Troubleshooting Steps
### 1. Check SSH Authentication Methods
From another host that can connect to R630-04, check:
```bash
ssh -v root@192.168.11.14 2>&1 | grep -i "auth"
```
Look for:
- `publickey` - Key-based authentication enabled
- `password` - Password authentication enabled
- `keyboard-interactive` - Interactive password prompt
### 2. Try Different Authentication Methods
**Option A: Use SSH Key (if available)**
```bash
# Check for existing SSH keys
ls -la ~/.ssh/id_*
# Copy public key to R630-04 (if you have access from another host)
ssh-copy-id root@192.168.11.14
```
**Option B: Check if password has special characters**
If the password contains special characters such as `@`, it should still work, but try:
- Typing it carefully
- Using copy-paste
- Checking for hidden characters
### 3. Connect from R630-03 (which works)
Since R630-03 works, you can:
```bash
# SSH to R630-03 first
ssh root@192.168.11.13
# Enter the root password when prompted
# Then from R630-03, SSH to R630-04
ssh root@192.168.11.14
```
### 4. Check SSH Configuration on R630-04
If you have console access or another way to access R630-04:
```bash
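# Check SSH configuration
cat /etc/ssh/sshd_config | grep -E "PasswordAuthentication|PubkeyAuthentication|PermitRootLogin"
# Effective settings, including drop-ins under /etc/ssh/sshd_config.d/
sshd -T | grep -Ei '^(passwordauthentication|pubkeyauthentication|permitrootlogin) '
# Should show:
# PasswordAuthentication yes (or the line commented out)
# PubkeyAuthentication yes
# PermitRootLogin yes (prohibit-password blocks root password logins,
#   which would explain "Permission denied")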
```
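The same check can be made on the effective settings. A minimal sketch; `sshd -T` (run as root) prints lowercase `keyword value` pairs, and `root_password_login_allowed` is an illustrative helper that ignores keyboard-interactive/PAM password prompts:

```bash
# Read `sshd -T` output on stdin; succeed only if root may log in with a password.
root_password_login_allowed() {
    awk '
        $1 == "permitrootlogin"        { root = $2 }
        $1 == "passwordauthentication" { pw = $2 }
        END { exit !(root == "yes" && pw == "yes") }
    '
}

# Example (on R630-04):
#   sshd -T | root_password_login_allowed && echo "root password login allowed" || echo "root password login blocked"
```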
### 5. Reset Root Password (if you have console access)
If you have physical/console access:
```bash
# Boot into single user mode or recovery
# Then reset password:
passwd root
```
### 6. Check Account Status
```bash
# Check if root account is locked
passwd -S root
# Check failed login attempts
lastb | grep root | tail -20
```
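The second field of `passwd -S` encodes the password state (L = locked, NP = no password, P = usable, per passwd(1)). A minimal sketch that spells it out; the helper name is illustrative:

```bash
# Translate the status field of `passwd -S <user>` output on stdin.
password_state() {
    awk '{ s = $2 }
         END {
             if (s == "L") print "locked"
             else if (s == "NP") print "no-password"
             else if (s == "P") print "usable"
             else print "unknown"
         }'
}

# Example:
#   passwd -S root | password_state     # if "locked", run `passwd -u root` from the console
```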
---
## Alternative Access Methods
### 1. Use Proxmox Console
If R630-04 is managed by another Proxmox host:
```bash
# From Proxmox host managing R630-04
pct enter <container-id> # if it's a container
# or
qm terminal <vm-id>        # if it's a VM with a serial console configured
```
### 2. Use iDRAC/iLO (Dell R630)
If it's a physical Dell R630 server:
- Access iDRAC interface (usually https://<idrac-ip>)
- Use remote console
- Reset password from console
### 3. Network Boot/KVM Access
If you have KVM over IP or network boot access, you can:
- Access console directly
- Reset password
- Check SSH configuration
---
## Quick Verification
Try these commands from R630-03 (which works):
```bash
# Log in to R630-03 first
ssh root@192.168.11.13
# Then, on R630-03, check which authentication methods R630-04 offers:
ssh -v root@192.168.11.14 2>&1 | grep -E "auth|password|key"
```
---
## Recommended Next Steps
1. **Try connecting from R630-03** - Sometimes network path matters
2. **Verify password** - Try typing it again carefully
3. **Check if password was changed** - May have been changed since last login
4. **Use console access** - If available (iDRAC, KVM, etc.)
5. **Check SSH logs on R630-04** - `/var/log/auth.log` or `journalctl -u ssh`
---
## If Password Authentication is Disabled
If the server only accepts SSH keys:
1. **Generate SSH key pair** (on your local machine):
```bash
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_r630-04
```
2. **Copy public key** (if you have another way to access):
```bash
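# Method 1: If R630-03 can already log in to R630-04 (e.g. with its own key),
# relay the key from your local machine through R630-03 (ssh-copy-id run on
# R630-03 would not find a key file that only exists locally)
ssh root@192.168.11.13 "ssh root@192.168.11.14 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys'" < ~/.ssh/id_ed25519_r630-04.pub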
# Method 2: Manual copy (if you have console access)
# Copy the public key content to:
# /root/.ssh/authorized_keys on R630-04
```
3. **Connect with key**:
```bash
ssh -i ~/.ssh/id_ed25519_r630-04 root@192.168.11.14
```