# NPMplus Network Routing Issue - Root Cause Analysis

**Last Updated:** 2026-01-31
**Document Version:** 1.0
**Status:** Active Documentation

---

**Date:** 2025-01-20
**Container:** 10233 (NPMplus)
**IP:** 192.168.11.166
**Issue:** Container cannot reach backend services on 192.168.11.0/24

---

## Current Status

### ✅ What's Working

- Container has the correct IP address: `192.168.11.166/24`
- Container can reach the gateway: `192.168.11.1` (UDM Pro)
- Routing table is correct: `192.168.11.0/24 dev eth0`
- Proxmox host CAN reach backend services
- Backend services are running and responding

### ❌ What's Not Working

- Container CANNOT ping backend services (all 7 services fail)
- All HTTPS domains return 502 errors
- Traffic from the container to backend hosts on 192.168.11.0/24 is blocked (the gateway itself remains reachable)

---

## Root Cause Analysis

### Finding 1: Proxmox Bridge VLAN Configuration

- **Container veth interface:** `veth10233i0` has VLAN 1 as its PVID, not VLAN 11
- **Container config:** Shows `tag=11`, but the veth interface does not reflect this
- **Bridge status:** `vmbr0` has a VLAN 11 sub-interface (`vmbr0v11`), but the container veth is on VLAN 1

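The mismatch described in Finding 1 can be checked mechanically. The sketch below is an illustration rather than part of the original diagnostics: `pvid_of` and `tag_of` are hypothetical helper names, and the parsing assumes the usual `bridge vlan show` and `pct config` output layouts, which may differ between versions.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not taken from the project's scripts).

# Print the PVID from `bridge vlan show dev <port>` output read on stdin.
# Assumes the PVID line looks like "<port>  1 PVID Egress Untagged".
pvid_of() {
    awk '{ for (i = 2; i <= NF; i++) if ($i == "PVID") print $(i - 1) }'
}

# Print the VLAN tag from the net0 line of `pct config <vmid>` output on stdin.
# Prints nothing when net0 carries no tag= option.
tag_of() {
    sed -n 's/^net0:.*[ ,]tag=\([0-9][0-9]*\).*/\1/p'
}

# Example usage against the live host (requires SSH access):
#   pvid=$(ssh root@192.168.11.11 "bridge vlan show dev veth10233i0" | pvid_of)
#   tag=$(ssh root@192.168.11.11 "pct config 10233" | tag_of)
#   [ "$pvid" = "$tag" ] || echo "mismatch: PVID=$pvid tag=$tag"
```

A mismatch only matters if the bridge actually filters by VLAN, so this check is best read together with the bridge settings.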
### Finding 2: Network Isolation

- Container is on the VLAN 11 network (`192.168.11.166`)
- Backend services are on the VLAN 11 network (`192.168.11.0/24`)
- Both should be on the same VLAN, but connectivity fails
- This suggests one of the following:
  1. UDM Pro firewall or ACL rules blocking traffic within VLAN 11 (unusual, because same-subnet traffic is normally switched at layer 2 rather than routed through the gateway)
  2. Proxmox bridge VLAN tagging not working correctly
  3. ARP/neighbor discovery failing

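One way to narrow down these three hypotheses is to inspect the container's neighbor table right after a failed ping. The following is a hedged sketch: `neigh_state` and `classify` are hypothetical names, and the mapping from state to cause is a rule of thumb, not a definitive diagnosis.

```bash
#!/usr/bin/env bash
# Print the neighbor state for one IP from `ip neigh show` output on stdin.
# The state (REACHABLE, STALE, FAILED, INCOMPLETE, ...) is the last field.
neigh_state() {
    awk -v ip="$1" '$1 == ip { print $NF }'
}

# Map a neighbor state to the layer that most likely needs attention.
# An empty state means the IP has no entry in the neighbor table at all.
classify() {
    case "$1" in
        FAILED|INCOMPLETE|"") echo "layer 2: ARP unanswered, check bridge/VLAN" ;;
        *)                    echo "layer 3+: ARP resolved, check firewall/ACL rules" ;;
    esac
}

# Example usage (requires SSH access to the Proxmox host):
#   state=$(ssh root@192.168.11.11 "pct exec 10233 -- ip neigh show" | neigh_state 192.168.11.140)
#   classify "$state"
```

If the backend's MAC address resolves but pings still fail, the frames are crossing the bridge and the problem more likely sits in filtering. An unanswered ARP request points back at the bridge or VLAN setup.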
### Finding 3: Proxmox Host Can Reach Backends

- Proxmox host (`192.168.11.11`) CAN ping backend services
- This confirms the backend services are up and reachable from the host
- The issue is specific to the container's networking

---

## Diagnostic Commands

### Check Container Network

```bash
# Show the container's address and routing table
ssh root@192.168.11.11 "pct exec 10233 -- ip addr show eth0"
ssh root@192.168.11.11 "pct exec 10233 -- ip route show"

# Ping the gateway, then a host on the backend subnet
ssh root@192.168.11.11 "pct exec 10233 -- ping -c 2 192.168.11.1"
ssh root@192.168.11.11 "pct exec 10233 -- ping -c 2 192.168.11.140"
```

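The individual pings above can be generalized into a loop over every backend. This is a minimal sketch under stated assumptions: `run_in_ct` and `check_hosts` are hypothetical helper names, and the host list is supplied by the caller because the backend addresses are not all listed in this document.

```bash
#!/usr/bin/env bash
# Run a command inside container 10233 via the Proxmox host.
# Hypothetical wrapper; redefine it to exercise the loop without SSH access.
run_in_ct() {
    ssh root@192.168.11.11 "pct exec 10233 -- $*"
}

# Ping each host given as an argument from inside the container.
# Prints one OK/FAIL line per host; the exit status is the number of failures.
check_hosts() {
    local failed=0 host
    for host in "$@"; do
        if run_in_ct ping -c 2 -W 2 "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=$((failed + 1))
        fi
    done
    return "$failed"
}

# Example: check_hosts 192.168.11.1 192.168.11.140
```

Returning the failure count as the exit status lets the same function serve as a pass/fail check once a fix has been applied.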
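### Check Proxmox Bridge VLAN

```bash
# List VLAN membership for all bridge ports, filtered to VLAN 11 / container 10233 entries
ssh root@192.168.11.11 "bridge vlan show | grep -E '11|10233'"

# Show VLAN membership for the container's veth only (the `dev` keyword selects the port)
ssh root@192.168.11.11 "bridge vlan show dev veth10233i0"
```
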
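Whether per-port VLAN entries matter depends on how the bridge is set up. On a VLAN-aware bridge, the port's PVID decides which VLAN untagged traffic joins. When a per-VLAN bridge such as `vmbr0v11` exists, what matters is which bridge the veth is attached to. The sketch below extracts both facts. The helper names are hypothetical, and the parsing assumes typical iproute2 output.

```bash
#!/usr/bin/env bash
# Print the bridge a port is attached to, from `ip -o link show <port>` on stdin.
# Prints nothing if the port has no master.
master_of() {
    sed -n 's/.* master \([^ ]*\).*/\1/p'
}

# Print the vlan_filtering flag (0 or 1) from `ip -d link show <bridge>` on stdin.
vlan_filtering_of() {
    sed -n 's/.*vlan_filtering \([01]\).*/\1/p'
}

# Example usage (requires SSH access to the Proxmox host):
#   ssh root@192.168.11.11 "ip -o link show veth10233i0" | master_of
#   ssh root@192.168.11.11 "ip -d link show vmbr0" | vlan_filtering_of
```

With `vlan_filtering 0`, the kernel ignores the per-port VLAN table, so a PVID of 1 on the veth would not by itself explain the failure. In that case the attached bridge is the more informative value.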
### Check UDM Pro Firewall Rules

```bash
# Via API: list enabled ACL rules (replace <API_KEY>; -k skips TLS certificate verification)
curl -k -X GET "https://192.168.11.1/proxy/network/integration/v1/sites/88f7af54-98f8-306a-a1c7-c9349722b1f6/acl-rules" \
  -H "X-API-KEY: <API_KEY>" \
  -H 'Accept: application/json' | jq '.data[] | select(.enabled == true)'
```

---

## Potential Solutions

### Solution 1: Fix Proxmox Bridge VLAN Tagging (Recommended)

The container's veth interface needs VLAN 11 as its PVID instead of VLAN 1:

```bash
# The host-side veth10233i0 exists only while the container is running,
# so confirm the container is up instead of stopping it first
ssh root@192.168.11.11 "pct status 10233"

# Remove VLAN 1 from veth interface
ssh root@192.168.11.11 "bridge vlan del vid 1 dev veth10233i0"

# Add VLAN 11 as PVID
ssh root@192.168.11.11 "bridge vlan add vid 11 pvid untagged dev veth10233i0"

# Verify the new VLAN membership
ssh root@192.168.11.11 "bridge vlan show dev veth10233i0"
```

**Note:** These changes do not persist across container restarts, because the veth interface is recreated each time the container starts. They also only take effect on a bridge with VLAN filtering enabled. For a persistent fix, set the VLAN tag in the Proxmox network configuration (see Solution 3).

### Solution 2: Check UDM Pro Firewall Rules

UDM Pro may have firewall rules blocking traffic even within the same VLAN:

1. Access UDM Pro web UI: `https://192.168.11.1`
2. Navigate to: **Settings → Firewall & Security → Firewall Rules**
3. Check for rules blocking:
   - Source: `192.168.11.166` or `192.168.11.0/24`
   - Destination: `192.168.11.0/24`
4. Ensure there's an ALLOW rule for same-VLAN communication

### Solution 3: Use Proxmox Network Configuration

Instead of configuring the bridge VLAN manually, reconfigure the container's network through Proxmox:

```bash
# Remove current network config
ssh root@192.168.11.11 "pct set 10233 -delete net0"

# Add network with proper VLAN tagging
ssh root@192.168.11.11 "pct set 10233 -net0 name=eth0,bridge=vmbr0,tag=11,firewall=1,ip=192.168.11.166/24,gw=192.168.11.1"

# Restart container
ssh root@192.168.11.11 "pct stop 10233 && pct start 10233"
```

### Solution 4: Check ARP/Neighbor Discovery

The container may be unable to resolve MAC addresses for hosts on its subnet:

```bash
# Check ARP table in container
ssh root@192.168.11.11 "pct exec 10233 -- arp -a"

# Add a static ARP entry for the gateway (replace <GATEWAY_MAC>) to test
# whether failed ARP resolution is the problem
ssh root@192.168.11.11 "pct exec 10233 -- arp -s 192.168.11.1 <GATEWAY_MAC>"
```

---

## Next Steps

1. **Immediate:** Check UDM Pro firewall rules via web UI
2. **If firewall is OK:** Fix Proxmox bridge VLAN configuration
3. **Verify:** Test connectivity after fixes
4. **Document:** Update configuration documentation

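For the verification step, checking the proxied domains directly shows whether the 502 errors have cleared. This is a hedged sketch: `http_status` and `verify_domains` are hypothetical helper names, and the domain names are passed in by the caller because they are not listed in this document.

```bash
#!/usr/bin/env bash
# Print the HTTP status code for a URL (curl prints 000 if the connection fails).
# -k skips certificate verification; -m 10 caps each request at 10 seconds.
http_status() {
    curl -sk -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Report whether each proxied domain still returns 502.
verify_domains() {
    local domain code
    for domain in "$@"; do
        code=$(http_status "https://$domain") || true
        if [ "$code" = "502" ]; then
            echo "STILL 502  $domain"
        else
            echo "HTTP $code   $domain"
        fi
    done
}

# Example (placeholder domain): verify_domains app.example.internal
```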
---

## Related Files

- `scripts/check-npmplus-network-connectivity.sh` - Diagnostic script
- `scripts/diagnose-npmplus-backend-services.sh` - Backend service check
- `docs/04-configuration/NPMPLUS_BACKEND_SERVICES_RESOLUTION.md` - Related documentation

---

**Status:** 🔴 **BLOCKED** - Network routing issue preventing backend connectivity