Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
defiQUG
2026-02-08 09:04:46 -08:00
commit c39465c2bd
386 changed files with 50649 additions and 0 deletions


@@ -0,0 +1,444 @@
# Azure Arc Onboarding Guide
## Overview
This document describes the Azure Arc onboarding process for all Linux hosts and VMs in the Azure Stack HCI environment, enabling Azure governance, monitoring, and management.
## Architecture
### Azure Arc Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Azure Portal │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Azure Arc │ │ Azure Policy │ │ Azure Monitor │ │
│ │ Servers │ │ │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Defender │ │ Update │ │ GitOps │ │
│ │ for Cloud │ │ Management │ │ (Flux) │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
│ HTTPS (443) Outbound
┌─────────────────────────────────────────────────────────┐
│ On-Premises Infrastructure │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Router │ │ Proxmox │ │ Ubuntu │ │
│ │ Server │ │ ML110/R630 │ │ Service VMs │ │
│ │ │ │ │ │ │ │
│ │ Arc Agent │ │ Arc Agent │ │ Arc Agent │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
```
## Prerequisites
### Azure Requirements
- Azure subscription with Contributor role
- Resource group created (or permissions to create one during onboarding)
- Azure CLI installed and authenticated
- Service principal or managed identity (optional)
### Network Requirements
- Outbound HTTPS (443) connectivity to Azure
- Proxy support if needed (see Proxy Configuration section)
- DNS resolution for Azure endpoints
### Target Systems
- Linux hosts (Proxmox VE, Ubuntu)
- Windows Server (optional, for management VM)
- Ubuntu VMs (service VMs)
### Environment Configuration
Before starting, ensure your `.env` file is configured with Azure credentials:
```bash
# Copy template if not already done
cp .env.example .env
# Edit .env and set:
# - AZURE_SUBSCRIPTION_ID
# - AZURE_TENANT_ID
# - AZURE_CLIENT_ID (optional, for service principal)
# - AZURE_CLIENT_SECRET (optional, for service principal)
# - AZURE_RESOURCE_GROUP
# - AZURE_LOCATION
```
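Before running any of the commands below, it helps to fail fast when a required variable is missing. A minimal bash sketch (`check_env` is an illustrative helper; the variable names match the template above):
```bash
# Fail fast if any required Azure variable is unset or empty.
check_env() {
  local var missing=0
  for var in AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID AZURE_RESOURCE_GROUP AZURE_LOCATION; do
    if [ -z "${!var}" ]; then
      echo "MISSING: $var"
      missing=1
    fi
  done
  return "$missing"
}
check_env && echo "environment OK" || echo "populate .env before continuing"
```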
## Installation
### Step 1: Prepare Azure Environment
```bash
# Load environment variables from .env (if using .env file)
export $(cat .env | grep -v '^#' | xargs)
# Set variables (use from .env or set manually)
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-your-subscription-id}"
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export LOCATION="${AZURE_LOCATION:-eastus}"
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
# Login to Azure
az login
# Set subscription
az account set --subscription $SUBSCRIPTION_ID
# Create resource group (if not exists)
az group create \
--name $RESOURCE_GROUP \
--location $LOCATION
```
### Step 2: Install Arc Agent on Linux
#### Ubuntu/Debian
```bash
# Download installation script (-L follows the aka.ms redirect)
curl -sL https://aka.ms/azcmagent -o /tmp/install_linux_azcmagent.sh
# Run installation
bash /tmp/install_linux_azcmagent.sh
# Verify installation
azcmagent version
```
#### Proxmox VE (Debian-based)
```bash
# Same as Ubuntu/Debian
curl -sL https://aka.ms/azcmagent -o /tmp/install_linux_azcmagent.sh
bash /tmp/install_linux_azcmagent.sh
azcmagent version
```
### Step 3: Onboard to Azure Arc
#### Using Service Principal
```bash
# Load environment variables from .env
export $(cat .env | grep -v '^#' | xargs)
# Use service principal from .env or create new one
if [ -z "$AZURE_CLIENT_ID" ] || [ -z "$AZURE_CLIENT_SECRET" ]; then
# Create service principal (if not exists)
az ad sp create-for-rbac \
--name "ArcOnboarding" \
--role "Azure Connected Machine Onboarding" \
--scopes "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP"
# Note the appId, password, and tenant values from the output and add them
# to .env as AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID
else
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export LOCATION="${AZURE_LOCATION:-eastus}"
export TENANT_ID="${AZURE_TENANT_ID}"
fi
# Onboard machine
azcmagent connect \
--service-principal-id "${AZURE_CLIENT_ID:-<app-id>}" \
--service-principal-secret "${AZURE_CLIENT_SECRET:-<password>}" \
--tenant-id "$TENANT_ID" \
--subscription-id "$SUBSCRIPTION_ID" \
--resource-group "$RESOURCE_GROUP" \
--location "$LOCATION" \
--tags "Environment=Production,Role=Router"
```
#### Using Interactive Login
```bash
# Load environment variables from .env
export $(cat .env | grep -v '^#' | xargs)
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export LOCATION="${AZURE_LOCATION:-eastus}"
# Onboard machine (will prompt for login)
azcmagent connect \
--subscription-id "$SUBSCRIPTION_ID" \
--resource-group "$RESOURCE_GROUP" \
--location "$LOCATION" \
--tags "Environment=Production,Role=Router"
```
### Step 4: Verify Onboarding
```bash
# Check agent status
azcmagent show
# Verify in Azure Portal
az connectedmachine list \
--resource-group $RESOURCE_GROUP \
--output table
```
## Proxy Configuration
### If Outbound Proxy Required
#### Configure Proxy for Arc Agent
```bash
# Set proxy environment variables
export https_proxy="http://proxy.example.com:8080"
export http_proxy="http://proxy.example.com:8080"
export no_proxy="localhost,127.0.0.1,.local"
# Configure Arc agent proxy
azcmagent config set proxy.url "http://proxy.example.com:8080"
azcmagent config set proxy.bypass "localhost,127.0.0.1,.local"
# Restart the agent service so the proxy settings take effect
# (azcmagent has no restart subcommand; on Linux the agent service
# is typically named himdsd)
sudo systemctl restart himdsd
```
#### Proxy Authentication
```bash
# If proxy requires authentication
azcmagent config set proxy.url "http://user:password@proxy.example.com:8080"
sudo systemctl restart himdsd
```
## Governance Configuration
### Azure Policy
#### Enable Policy for Arc Servers
```bash
# Assign built-in policy: "Enable Azure Monitor for VMs"
az policy assignment create \
--name "EnableAzureMonitorForVMs" \
--display-name "Enable Azure Monitor for VMs" \
--scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP" \
--policy "/providers/Microsoft.Authorization/policyDefinitions/0ef5aac7-c064-427a-b87b-d47b3ddcaf73"
```
#### Custom Policy Example
```json
{
"if": {
"allOf": [
{
"field": "type",
"equals": "Microsoft.HybridCompute/machines"
},
{
"field": "Microsoft.HybridCompute/machines/osName",
"notEquals": "Ubuntu"
}
]
},
"then": {
"effect": "audit"
}
}
```
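To register a rule like this one, it can be written to disk and submitted as a custom definition; the definition name below is illustrative, and validating the JSON first catches copy/paste errors:
```bash
# Save the policy rule shown above, check it parses, then (optionally)
# register it. The az command is left commented as a reference.
cat > /tmp/audit-non-ubuntu.json <<'EOF'
{
  "if": {
    "allOf": [
      { "field": "type", "equals": "Microsoft.HybridCompute/machines" },
      { "field": "Microsoft.HybridCompute/machines/osName", "notEquals": "Ubuntu" }
    ]
  },
  "then": { "effect": "audit" }
}
EOF
python3 -m json.tool /tmp/audit-non-ubuntu.json > /dev/null && echo "policy rule is valid JSON"
# az policy definition create \
#   --name "audit-non-ubuntu-arc" \
#   --mode Indexed \
#   --rules /tmp/audit-non-ubuntu.json
```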
### Azure Monitor
#### Enable Log Analytics
```bash
# Create Log Analytics workspace
az monitor log-analytics workspace create \
--resource-group $RESOURCE_GROUP \
--workspace-name "hci-logs-$LOCATION"
# Enable VM insights
az monitor log-analytics solution create \
--resource-group $RESOURCE_GROUP \
--name "VMInsights" \
--workspace "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.OperationalInsights/workspaces/hci-logs-$LOCATION" \
--plan-publisher "Microsoft" \
--plan-product "OMSGallery/VMInsights"
```
#### Configure Data Collection
```bash
# Create a data collection rule (depending on CLI version, the rule body
# may need to be supplied via --rule-file instead)
az monitor data-collection rule create \
--resource-group $RESOURCE_GROUP \
--name "hci-dcr" \
--location "$LOCATION" \
--log-analytics "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.OperationalInsights/workspaces/hci-logs-$LOCATION"
```
### Azure Defender
#### Enable Defender for Servers
```bash
# Enable Defender for Servers (the plan is set at the subscription level,
# so no resource group is specified)
az security pricing create \
--name "VirtualMachines" \
--tier "Standard"
```
#### Onboard Arc Servers to Defender
```bash
# Install the Microsoft Defender for Endpoint extension on a Linux Arc server
az connectedmachine extension create \
--machine-name "<machine-name>" \
--resource-group $RESOURCE_GROUP \
--location "$LOCATION" \
--name "MDE.Linux" \
--publisher "Microsoft.Azure.AzureDefenderForServers" \
--type "MDE.Linux"
```
### Update Management
#### Enable Update Management
```bash
# Enable Update Management via Azure Automation
# This is typically done through Azure Portal:
# 1. Create Automation Account
# 2. Enable Update Management solution
# 3. Add Arc servers to Update Management
```
## Tagging Strategy
### Recommended Tags
```bash
# Tag machines during onboarding
azcmagent connect \
--subscription-id "$SUBSCRIPTION_ID" \
--resource-group "$RESOURCE_GROUP" \
--location "$LOCATION" \
--tags "Environment=Production,Role=Router,Project=AzureStackHCI,ManagedBy=Arc"
```
### Update Tags
```bash
# Update tags after onboarding
az connectedmachine update \
--name "<machine-name>" \
--resource-group $RESOURCE_GROUP \
--tags "Environment=Production,Role=Router,Updated=2024-01-01"
```
## Verification
### Check Agent Status
```bash
# On each machine
azcmagent show
# Expected output:
# Agent Status: Connected
# Azure Resource ID: /subscriptions/.../resourceGroups/.../providers/Microsoft.HybridCompute/machines/...
```
### Verify in Azure Portal
1. Navigate to Azure Portal > Azure Arc > Servers
2. Verify all machines listed
3. Check machine status (Connected)
4. Review machine details and tags
### Test Policy Enforcement
```bash
# Check policy compliance
az policy state list \
--resource "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP" \
--output table
```
## Troubleshooting
### Agent Not Connecting
**Problem:** Agent shows as disconnected
- **Solution:**
- Check network connectivity (HTTPS 443)
- Verify proxy configuration if needed
- Check agent logs: `azcmagent logs`
- Verify Azure credentials
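Most of these checks are bundled into the agent itself. A hedged sketch (the `--location` value should match your onboarding region):
```bash
# `azcmagent check` runs the agent's built-in network/endpoint checks.
if command -v azcmagent >/dev/null 2>&1; then
  azcmagent check --location "${AZURE_LOCATION:-eastus}" || echo "connectivity checks reported problems"
else
  echo "azcmagent not installed on this machine"
fi
```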
### Proxy Issues
**Problem:** Agent can't connect through proxy
- **Solution:**
- Verify proxy URL and credentials
- Check proxy bypass list
- Test proxy connectivity manually
- Review agent logs
### Policy Not Applying
**Problem:** Azure Policy not enforcing
- **Solution:**
- Verify policy assignment scope
- Check policy evaluation status
- Verify machine tags match policy conditions
- Review policy compliance reports
### Monitoring Not Working
**Problem:** Azure Monitor not collecting data
- **Solution:**
- Verify Log Analytics workspace configuration
- Check data collection rules
- Verify agent extension installed
- Review Log Analytics workspace logs
## Best Practices
1. **Use Service Principals:**
- Create dedicated service principal for Arc onboarding
- Use least privilege permissions
- Rotate credentials regularly
2. **Tagging:**
- Use consistent tagging strategy
- Include environment, role, project tags
- Enable tag-based policy enforcement
3. **Monitoring:**
- Enable Azure Monitor for all Arc servers
- Configure alert rules
- Set up log retention policies
4. **Security:**
- Enable Azure Defender for all servers
- Configure security policies
- Review security recommendations regularly
5. **Updates:**
- Enable Update Management
- Schedule regular maintenance windows
- Test updates in dev environment first
## Related Documentation
- [Complete Architecture](complete-architecture.md) - Full architecture overview
- [Bring-Up Checklist](bring-up-checklist.md) - Installation guide
- [Microsoft Azure Arc Documentation](https://docs.microsoft.com/azure/azure-arc/)


@@ -0,0 +1,377 @@
# Bring-Up Checklist
## Day-One Installation Guide
This checklist provides a step-by-step guide for bringing up the complete Azure Stack HCI environment on installation day.
## Pre-Installation Preparation
### Hardware Verification
- [ ] Router server chassis received and inspected
- [ ] All PCIe cards received (NICs, HBAs, QAT)
- [ ] Memory modules received (8× 4GB DDR4 ECC RDIMM)
- [ ] Storage SSD received (256GB)
- [ ] All cables received (Ethernet, Mini-SAS HD)
- [ ] Storage shelves received and inspected
- [ ] Proxmox hosts (ML110, R630) verified operational
### Documentation Review
- [ ] Complete architecture reviewed
- [ ] PCIe slot allocation map reviewed
- [ ] Network topology and VLAN schema reviewed
- [ ] Driver matrix reviewed
- [ ] All configuration files prepared
### Environment Configuration
- [ ] Copy `.env.example` to `.env`
- [ ] Configure Azure credentials in `.env`:
- [ ] `AZURE_SUBSCRIPTION_ID`
- [ ] `AZURE_TENANT_ID`
- [ ] `AZURE_RESOURCE_GROUP`
- [ ] `AZURE_LOCATION`
- [ ] Configure Cloudflare credentials in `.env`:
- [ ] `CLOUDFLARE_API_TOKEN`
- [ ] `CLOUDFLARE_ACCOUNT_EMAIL`
- [ ] Configure Proxmox credentials in `.env`:
- [ ] `PVE_ROOT_PASS` (shared root password for all instances)
- [ ] `PROXMOX_ML110_URL`
- [ ] `PROXMOX_R630_URL`
- [ ] Note: Username `root@pam` is implied and should not be stored
- [ ] For production: Create RBAC accounts and use API tokens instead of root
- [ ] Verify `.env` file is in `.gitignore` (should not be committed)
## Phase 1: Hardware Installation
### Router Server Assembly
- [ ] Install CPU and memory (8× 4GB DDR4 ECC RDIMM)
- [ ] Install boot SSD (256GB)
- [ ] Install Intel QAT 8970 in x16_1 slot
- [ ] Install Intel X550-T2 in x8_1 slot
- [ ] Install LSI 9207-8e #1 in x8_2 slot
- [ ] Install LSI 9207-8e #2 in x8_3 slot
- [ ] Install Intel i350-T4 in x4_1 slot
- [ ] Install Intel i350-T8 in x4_2 slot
- [ ] Install Intel i225 Quad-Port in x4_3 slot
- [ ] Verify all cards seated properly
- [ ] Connect power and verify POST
### BIOS/UEFI Configuration
- [ ] Enter BIOS/UEFI setup
- [ ] Verify all PCIe cards detected
- [ ] Configure boot order (SSD first)
- [ ] Enable virtualization (Intel VT-x, VT-d)
- [ ] Configure memory settings (ECC enabled)
- [ ] Set date/time
- [ ] Save and exit BIOS
### Storage Shelf Cabling
- [ ] Connect SFF-8644 cables from LSI HBA #1 to shelves 1-2
- [ ] Connect SFF-8644 cables from LSI HBA #2 to shelves 3-4
- [ ] Power on storage shelves
- [ ] Verify shelf power and status LEDs
- [ ] Label all cables
### Network Cabling
- [ ] Connect 4× Cat6 cables from i350-T4 to Spectrum modems/ONTs (WAN1-4)
- [ ] Connect 2× Cat6a cables to X550-T2 (reserved for future)
- [ ] Connect 4× Cat6 cables from i225 Quad to ML110, R630, and key services
- [ ] Connect 8× Cat6 cables from i350-T8 to remaining servers/appliances
- [ ] Label all cables at both ends
- [ ] Document cable mapping
## Phase 2: Operating System Installation
### Router Server OS
**Option A: Windows Server Core**
- [ ] Boot from Windows Server installation media
- [ ] Install Windows Server Core
- [ ] Configure initial administrator password
- [ ] Install Windows Updates
- [ ] Configure static IP on management interface
- [ ] Enable Remote Desktop (if needed)
- [ ] Install Windows Admin Center
**Option B: Proxmox VE**
- [ ] Boot from Proxmox VE installation media
- [ ] Install Proxmox VE
- [ ] Configure initial root password
- [ ] Configure network (management interface)
- [ ] Update Proxmox packages
- [ ] Verify Proxmox web interface accessible
### Proxmox Hosts (ML110, R630)
- [ ] Verify Proxmox VE installed and updated
- [ ] Configure network interfaces
- [ ] Verify cluster status (if clustered)
- [ ] Test VM creation
## Phase 3: Driver Installation
### Router Server Drivers
- [ ] Install Intel PROSet drivers for all NICs
- [ ] i350-T4 (WAN)
- [ ] i350-T8 (LAN 1GbE)
- [ ] X550-T2 (10GbE)
- [ ] i225 Quad-Port (LAN 2.5GbE)
- [ ] Verify all NICs detected and functional
- [ ] Install LSI mpt3sas driver
- [ ] Flash LSI HBAs to IT mode
- [ ] Verify storage shelves detected
- [ ] Install Intel QAT drivers (qatlib)
- [ ] Install OpenSSL QAT engine
- [ ] Verify QAT acceleration working
### Driver Verification
- [ ] Run driver verification script
- [ ] Test all network ports
- [ ] Test storage connectivity
- [ ] Test QAT acceleration
- [ ] Document any issues
## Phase 4: Network Configuration
### OpenWrt VM Setup
- [ ] Create OpenWrt VM on Router server
- [ ] Configure OpenWrt network interfaces
- [ ] Configure VLANs (10, 20, 30, 40, 50, 60, 99)
- [ ] Configure mwan3 for 4× Spectrum WAN
- [ ] Configure firewall zones
- [ ] Test multi-WAN failover
- [ ] Configure inter-VLAN routing
### Proxmox VLAN Configuration
- [ ] Configure VLAN bridges on ML110
- [ ] Configure VLAN bridges on R630
- [ ] Test VLAN connectivity
- [ ] Verify VM network isolation
### IP Address Configuration
- [ ] Configure IP addresses per VLAN schema
- [ ] Configure DNS settings
- [ ] Test network connectivity
- [ ] Verify routing between VLANs
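A quick reachability sweep can confirm the routing items above. The gateway convention `10.10.<vlan>.1` is an assumption based on the addressing used elsewhere in these docs; adjust to the actual VLAN schema:
```bash
# Ping the first-hop gateway on each VLAN and report reachability.
for vlan in 10 20 30 40 50 60 99; do
  if ping -c 1 -W 1 "10.10.${vlan}.1" >/dev/null 2>&1; then
    echo "VLAN ${vlan}: gateway reachable"
  else
    echo "VLAN ${vlan}: gateway UNREACHABLE"
  fi
done
```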
## Phase 5: Storage Configuration
### Storage Spaces Direct Setup
- [ ] Verify all shelves detected
- [ ] Create Storage Spaces Direct pools
- [ ] Create volumes for VMs
- [ ] Create volumes for applications
- [ ] Configure storage exports (NFS/iSCSI)
### Proxmox Storage Mounts
- [ ] Configure NFS mounts on ML110
- [ ] Configure NFS mounts on R630
- [ ] Test storage connectivity
- [ ] Verify VM storage access
## Phase 6: Azure Arc Onboarding
### Arc Agent Installation
- [ ] Install Azure Arc agent on Router server (if Linux)
- [ ] Install Azure Arc agent on ML110
- [ ] Install Azure Arc agent on R630
- [ ] Install Azure Arc agent on Windows management VM (if applicable)
### Arc Onboarding
- [ ] Load environment variables from `.env`: `export $(cat .env | grep -v '^#' | xargs)`
- [ ] Configure Azure subscription and resource group (from `.env`)
- [ ] Onboard Router server to Azure Arc
- [ ] Onboard ML110 to Azure Arc
- [ ] Onboard R630 to Azure Arc
- [ ] Verify all resources visible in Azure Portal
### Arc Governance
- [ ] Configure Azure Policy
- [ ] Enable Azure Monitor
- [ ] Enable Azure Defender
- [ ] Configure Update Management
- [ ] Test policy enforcement
## Phase 7: Cloudflare Integration
### Cloudflare Tunnel Setup
- [ ] Create Cloudflare account (if not exists)
- [ ] Create Zero Trust organization
- [ ] Configure Cloudflare API token in `.env` file
- [ ] Install cloudflared on Ubuntu VM
- [ ] Authenticate cloudflared (interactive or using API token from `.env`)
- [ ] Configure Tunnel for WAC
- [ ] Configure Tunnel for Proxmox UI
- [ ] Configure Tunnel for dashboards
- [ ] Configure Tunnel for Git/CI services
### Zero Trust Policies
- [ ] Configure SSO (Azure AD/Okta)
- [ ] Configure MFA requirements
- [ ] Configure device posture checks
- [ ] Configure access policies
- [ ] Test external access
### WAF Configuration
- [ ] Configure WAF rules
- [ ] Test WAF protection
- [ ] Verify no inbound ports required
## Phase 8: Service VM Deployment
### Ubuntu VM Templates
- [ ] Create Ubuntu LTS template on Proxmox
- [ ] Install Azure Arc agent in template
- [ ] Configure base packages
- [ ] Create VM snapshots
### Service VM Deployment
- [ ] Deploy Cloudflare Tunnel VM (VLAN 99)
- [ ] Deploy Reverse Proxy VM (VLAN 30/99)
- [ ] Deploy Observability VM (VLAN 40)
- [ ] Deploy CI/CD VM (VLAN 50)
- [ ] Install Azure Arc agents on all VMs
### Service Configuration
- [ ] Configure Cloudflare Tunnel
- [ ] Configure reverse proxy (NGINX/Traefik)
- [ ] Configure observability stack (Prometheus/Grafana)
- [ ] Configure CI/CD (GitLab Runner/Jenkins)
## Phase 9: Verification and Testing
### Network Testing
- [ ] Test all WAN connections
- [ ] Test multi-WAN failover
- [ ] Test VLAN isolation
- [ ] Test inter-VLAN routing
- [ ] Test firewall rules
### Storage Testing
- [ ] Test storage read/write performance
- [ ] Test storage redundancy
- [ ] Test VM storage access
- [ ] Test storage exports
### Service Testing
- [ ] Test Cloudflare Tunnel access
- [ ] Test Azure Arc connectivity
- [ ] Test observability dashboards
- [ ] Test CI/CD pipelines
### Performance Testing
- [ ] Test QAT acceleration
- [ ] Test network throughput
- [ ] Test storage I/O
- [ ] Document performance metrics
## Phase 10: Documentation and Handoff
### Documentation
- [ ] Document all IP addresses
- [ ] Verify `.env` file contains all credentials (stored securely, not in version control)
- [ ] Document cable mappings
- [ ] Document VLAN configurations
- [ ] Document storage allocations
- [ ] Create network diagrams
- [ ] Create runbooks
- [ ] Verify `.env` is in `.gitignore` and not committed to repository
### Monitoring Setup
- [ ] Configure Grafana dashboards
- [ ] Configure Prometheus alerts
- [ ] Configure Azure Monitor alerts
- [ ] Test alerting
### Security Hardening
- [ ] Review firewall rules
- [ ] Review access policies
- [ ] Create RBAC accounts for Proxmox (replace root usage)
- [ ] Create service accounts for automation
- [ ] Create operator accounts with appropriate roles
- [ ] Generate API tokens for service accounts
- [ ] Document RBAC account usage (see docs/security/proxmox-rbac.md)
- [ ] Review secret management
- [ ] Perform security scan
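The Proxmox RBAC items above can be sketched with `pveum`; the account, role, and token names here are illustrative, and exact flags may vary by PVE version:
```bash
# Create a service account, grant it a VM-admin role, and mint an API token.
if command -v pveum >/dev/null 2>&1; then
  pveum user add svc-automation@pve --comment "automation service account"
  pveum acl modify / --users svc-automation@pve --roles PVEVMAdmin
  pveum user token add svc-automation@pve ci --privsep 1
else
  echo "run on a Proxmox VE host (pveum not found)"
fi
```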
## Post-Installation Tasks
### Ongoing Maintenance
- [ ] Schedule regular backups
- [ ] Schedule firmware updates
- [ ] Schedule driver updates
- [ ] Schedule OS updates
- [ ] Schedule security patches
### Monitoring
- [ ] Review monitoring dashboards daily
- [ ] Review Azure Arc status
- [ ] Review Cloudflare Tunnel status
- [ ] Review storage health
- [ ] Review network performance
## Troubleshooting Reference
### Common Issues
**Issue:** NIC not detected
- Check PCIe slot connection
- Check BIOS settings
- Update driver
**Issue:** Storage shelves not detected
- Check cable connections
- Check HBA firmware
- Check shelf power
**Issue:** Azure Arc not connecting
- Check network connectivity
- Check proxy settings
- Check Azure credentials
**Issue:** Cloudflare Tunnel not working
- Check cloudflared service
- Check Tunnel configuration
- Check Zero Trust policies
## Related Documentation
- [Complete Architecture](complete-architecture.md) - Full architecture overview
- [Hardware BOM](hardware-bom.md) - Complete bill of materials
- [PCIe Allocation](pcie-allocation.md) - Slot allocation map
- [Network Topology](network-topology.md) - VLAN/IP schema
- [Driver Matrix](driver-matrix.md) - Driver versions


@@ -0,0 +1,387 @@
# Cloudflare Integration Guide
## Overview
This document describes the Cloudflare Zero Trust and Tunnel integration for secure external access to the Azure Stack HCI environment without requiring inbound ports.
## Architecture
### Cloudflare Tunnel Architecture
```
┌─────────────────────────────────────────────────────────┐
│ Cloudflare Zero Trust Network │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Zero Trust │ │ WAF │ │ Tunnel │ │
│ │ Policies │ │ Rules │ │ Endpoints │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
└─────────────────────────────────────────────────────────┘
│ Outbound HTTPS (443)
┌─────────────────────────────────────────────────────────┐
│ On-Premises Infrastructure │
│ │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ Cloudflare Tunnel VM (VLAN 99) │ │
│ │ ┌──────────────┐ │ │
│ │ │ cloudflared │ │ │
│ │ │ daemon │ │ │
│ │ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────┘ │
│ │ │ │ │
│ ┌─────────▼──────┐ ┌────▼────┐ ┌─────▼─────┐ │
│ │ WAC │ │ Proxmox │ │ Dashboards│ │
│ │ (VLAN 60) │ │ UI │ │ (VLAN 40) │ │
│ └────────────────┘ └──────────┘ └───────────┘ │
└─────────────────────────────────────────────────────────┘
```
## Components
### Cloudflare Tunnel (cloudflared)
- **Purpose:** Secure outbound connection to Cloudflare network
- **Location:** Ubuntu VM in VLAN 99 (DMZ)
- **Protocol:** Outbound HTTPS (443) only
- **Benefits:** No inbound ports required, encrypted tunnel
### Zero Trust Policies
- **SSO Integration:** Azure AD, Okta, or other identity providers
- **MFA Requirements:** Multi-factor authentication enforcement
- **Device Posture:** Device health and compliance checks
- **Access Policies:** Least privilege access control
### WAF (Web Application Firewall)
- **Purpose:** Protect public ingress from attacks
- **Rules:** Custom WAF rules for application protection
- **Integration:** Works with Tunnel endpoints
## Installation
### Prerequisites
- Cloudflare account with Zero Trust enabled
- Ubuntu VM deployed in VLAN 99
- Network connectivity from Tunnel VM to services
- Azure AD or other SSO provider (optional)
### Environment Configuration
Before starting, ensure your `.env` file is configured with Cloudflare credentials:
```bash
# Copy template if not already done
cp .env.example .env
# Edit .env and set:
# - CLOUDFLARE_API_TOKEN (get from https://dash.cloudflare.com/profile/api-tokens)
# - CLOUDFLARE_ACCOUNT_EMAIL
# - CLOUDFLARE_ZONE_ID (optional)
```
### Step 1: Create Cloudflare Zero Trust Organization
1. Log in to [Cloudflare Dashboard](https://dash.cloudflare.com)
2. Navigate to Zero Trust
3. Create or select organization
4. Note your organization name
**Note**: If using automation scripts, ensure `CLOUDFLARE_API_TOKEN` is set in your `.env` file.
### Step 2: Install cloudflared
On the Ubuntu Tunnel VM:
```bash
# Download and install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
chmod +x /usr/local/bin/cloudflared
# Verify installation
cloudflared --version
```
### Step 3: Authenticate cloudflared
```bash
# Option 1: Interactive login (recommended for first-time setup)
cloudflared tunnel login
# This will open a browser for authentication
# Follow the prompts to authenticate
# Option 2: Using API token from .env (for automation)
# Load environment variables if using .env
export $(cat .env | grep -v '^#' | xargs)
# Note: Tunnel credentials are stored in /etc/cloudflared/<tunnel-id>.json
# This file should be secured (chmod 600) and not committed to version control
```
### Step 4: Create Tunnel
```bash
# Create a new tunnel
cloudflared tunnel create azure-stack-hci
# Note the tunnel ID for configuration
```
## Configuration
### Tunnel Configuration File
Create `/etc/cloudflared/config.yml`:
```yaml
tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json
ingress:
# Windows Admin Center
- hostname: wac.yourdomain.com
service: https://10.10.60.20:443
originRequest:
noHappyEyeballs: true
tcpKeepAlive: 30
# Proxmox UI
- hostname: proxmox.yourdomain.com
service: https://10.10.60.10:8006
originRequest:
noHappyEyeballs: true
tcpKeepAlive: 30
# Grafana Dashboard
- hostname: grafana.yourdomain.com
service: http://10.10.40.10:3000
originRequest:
noHappyEyeballs: true
# Git Server
- hostname: git.yourdomain.com
service: https://10.10.30.10:443
originRequest:
noHappyEyeballs: true
# CI/CD
- hostname: ci.yourdomain.com
service: https://10.10.50.10:443
originRequest:
noHappyEyeballs: true
# Catch-all (must be last)
- service: http_status:404
```
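cloudflared can check this file locally before the service is (re)started: `ingress validate` parses the rules, and `ingress rule` reports which entry a given URL would match:
```bash
# Validate the ingress rules and test hostname matching locally.
CONFIG=/etc/cloudflared/config.yml
if command -v cloudflared >/dev/null 2>&1; then
  cloudflared tunnel --config "$CONFIG" ingress validate
  cloudflared tunnel --config "$CONFIG" ingress rule https://proxmox.yourdomain.com
else
  echo "cloudflared not installed; skipping validation"
fi
```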
### DNS Configuration
In Cloudflare Dashboard:
1. Navigate to Zero Trust > Access > Tunnels
2. Select your tunnel
3. Configure public hostnames:
- `wac.yourdomain.com` → Tunnel
- `proxmox.yourdomain.com` → Tunnel
- `grafana.yourdomain.com` → Tunnel
- `git.yourdomain.com` → Tunnel
- `ci.yourdomain.com` → Tunnel
### Systemd Service
Create `/etc/systemd/system/cloudflared.service`:
```ini
[Unit]
Description=Cloudflare Tunnel
After=network.target
[Service]
Type=simple
# Create this account first: sudo useradd -r -s /usr/sbin/nologin cloudflared
User=cloudflared
ExecStart=/usr/local/bin/cloudflared tunnel --config /etc/cloudflared/config.yml run
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
```
Enable and start:
```bash
sudo systemctl enable cloudflared
sudo systemctl start cloudflared
sudo systemctl status cloudflared
```
## Zero Trust Policies
### SSO Configuration
1. Navigate to Zero Trust > Access > Authentication
2. Add identity provider:
- **Azure AD:** Configure Azure AD app registration
- **Okta:** Configure Okta application
- **Other:** Follow provider-specific instructions
### Access Policies
1. Navigate to Zero Trust > Access > Applications
2. Create application:
- **Application name:** WAC Access
- **Application domain:** `wac.yourdomain.com`
- **Session duration:** 24 hours
3. Configure policy:
- **Action:** Allow
- **Include:**
- Emails: `admin@yourdomain.com`
- Groups: `IT-Admins`
- **Require:**
- MFA: Yes
- Device posture: Optional
### Device Posture Checks
1. Navigate to Zero Trust > Settings > WARP
2. Configure device posture:
- **OS version:** Require minimum OS version
- **Disk encryption:** Require disk encryption
- **Firewall:** Require firewall enabled
## WAF Configuration
### WAF Rules
1. Navigate to Security > WAF
2. Create custom rules:
**Rule 1: Block Common Attacks**
- **Expression:** `(http.request.uri.path contains "/wp-admin" or http.request.uri.path contains "/phpmyadmin")`
- **Action:** Block
**Rule 2: Rate Limiting**
- Configured under Security > WAF > Rate limiting rules rather than as a custom-rule expression
- Example: challenge clients that exceed 100 requests in 10 minutes on matching paths
**Rule 3: Geographic Restrictions**
- **Expression:** `(ip.geoip.country ne "US" and ip.geoip.country ne "CA")`
- **Action:** Block (if needed)
## Proxmox Tunnel Example
### Community Patterns
For exposing Proxmox UI through Cloudflare Tunnel:
```yaml
# In config.yml
ingress:
- hostname: proxmox.yourdomain.com
service: https://10.10.60.10:8006
originRequest:
noHappyEyeballs: true
tcpKeepAlive: 30
connectTimeout: 10s
tlsTimeout: 10s
tcpKeepAliveTimeout: 30s
httpHostHeader: proxmox.yourdomain.com
```
### Proxmox Certificate Considerations
- Proxmox uses self-signed certificates by default
- Cloudflare Tunnel handles SSL termination
- Consider using Cloudflare's SSL/TLS mode: "Full (strict)" if using valid certificates
## Monitoring
### Tunnel Status
```bash
# Check tunnel status
sudo systemctl status cloudflared
# View tunnel logs
sudo journalctl -u cloudflared -f
# Test tunnel connectivity
cloudflared tunnel info <tunnel-id>
```
### Cloudflare Dashboard
- Navigate to Zero Trust > Access > Tunnels
- View tunnel status and metrics
- Monitor connection health
- Review access logs
## Troubleshooting
### Tunnel Not Connecting
**Problem:** Tunnel shows as disconnected
- **Solution:**
- Check network connectivity from VM
- Verify credentials file exists
- Check cloudflared service status
- Review logs: `journalctl -u cloudflared`
### Services Not Accessible
**Problem:** Can't access services through Tunnel
- **Solution:**
- Verify ingress rules in config.yml
- Check service connectivity from Tunnel VM
- Verify DNS configuration
- Check Zero Trust policies
### Authentication Issues
**Problem:** SSO not working
- **Solution:**
- Verify identity provider configuration
- Check application policies
- Verify user email addresses
- Check MFA configuration
### Performance Issues
**Problem:** Slow performance through Tunnel
- **Solution:**
- Check network latency
- Verify originRequest settings
- Consider using Cloudflare's Argo Smart Routing
- Review WAF rules for false positives
## Security Best Practices
1. **Use Zero Trust Policies:**
- Always require authentication
- Enforce MFA for sensitive services
- Use device posture checks
2. **WAF Rules:**
- Enable WAF for all public endpoints
- Configure rate limiting
- Block known attack patterns
3. **Tunnel Security:**
- Run cloudflared as non-root user
- Secure credentials file (chmod 600)
- Monitor tunnel logs for anomalies
4. **Network Isolation:**
- Keep Tunnel VM in DMZ (VLAN 99)
- Use firewall rules to restrict access
- Only allow necessary ports
## Related Documentation
- [Complete Architecture](complete-architecture.md) - Full architecture overview
- [Network Topology](network-topology.md) - VLAN/IP schema
- [Bring-Up Checklist](bring-up-checklist.md) - Installation guide


@@ -0,0 +1,485 @@
# Deployment Guide
## Prerequisites
Before starting the deployment, ensure you have:
1. **Two Proxmox VE hosts** with:
- Proxmox VE 7.0+ installed
- Static IP addresses configured
- At least 8GB RAM per node
- Network connectivity between nodes
- Root or sudo access
2. **Azure Subscription** with:
- Azure CLI installed and authenticated
- Contributor role on subscription
- Resource group creation permissions
3. **Network Requirements**:
- Static IP addresses for all nodes
- DNS resolution (or hosts file)
- Internet access for Azure Arc connectivity
- NFS server (optional, for shared storage)
4. **Tools Installed**:
- SSH client
- kubectl
- helm (optional)
- terraform (optional)
5. **Environment Configuration**:
- Copy `.env.example` to `.env` and fill in all credentials
- See [Configuration](#configuration) section for details
## Configuration
### Environment Variables Setup
Before starting deployment, configure your environment variables:
1. **Copy the template:**
```bash
cp .env.example .env
```
2. **Edit `.env` with your credentials:**
- Azure credentials: `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`
- Cloudflare: `CLOUDFLARE_API_TOKEN`
- Proxmox: `PVE_ROOT_PASS` (shared root password for all instances)
- Proxmox ML110: `PROXMOX_ML110_URL`
- Proxmox R630: `PROXMOX_R630_URL`
**Note**: The username `root@pam` is implied and should not be stored. For production operations, use RBAC accounts and API tokens instead of root credentials.
3. **Load environment variables:**
```bash
# Source the .env file
export $(cat .env | grep -v '^#' | xargs)
```
**Note**: All scripts in this guide will use environment variables from `.env` if available. You can also set them manually using `export` commands.
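The `export $(cat .env | ...)` one-liner breaks on values that contain spaces or quotes. A more robust sketch uses `set -a`, which marks every variable assigned while sourcing for export and handles quoted values correctly (the variable name checked at the end is one used later in this guide):

```shell
# 'set -a' auto-exports all variables assigned until 'set +a';
# sourcing .env this way preserves quoted values with spaces.
set -a
. ./.env
set +a

# Sanity check that a variable actually loaded
echo "Resource group: ${AZURE_RESOURCE_GROUP:-<unset>}"
```

This requires `.env` to use plain shell assignment syntax (`KEY=value` or `KEY="value with spaces"`), which `.env.example` should already follow.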
## Deployment Phases
### Phase 1: Proxmox Cluster Setup
#### Step 1.1: Configure Network on Both Nodes
On each Proxmox node:
```bash
# Option 1: Use .env file (recommended)
# Load environment variables from .env
export $(cat .env | grep -v '^#' | xargs)
# Option 2: Set environment variables manually
export NODE_IP=192.168.1.10 # Use appropriate IP for each node
export NODE_GATEWAY=192.168.1.1
export NODE_NETMASK=24
export NODE_HOSTNAME=pve-node-1 # Use appropriate hostname
# Run network configuration script
cd /path/to/loc_az_hci
./infrastructure/proxmox/network-config.sh
```
**For Node 2**, repeat with appropriate values:
```bash
export NODE_IP=192.168.1.11
export NODE_HOSTNAME=pve-node-2
./infrastructure/proxmox/network-config.sh
```
#### Step 1.2: Update Proxmox Repositories
On both nodes:
```bash
# Disable the enterprise repo and add the no-subscription repo.
# (A plain s/enterprise/no-subscription/ on the repo file would produce
# an invalid hostname; the no-subscription repo lives on a different host.)
# Adjust the codename: bullseye for PVE 7, bookworm for PVE 8.
sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list
echo "deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription" \
  > /etc/apt/sources.list.d/pve-no-subscription.list
apt update && apt dist-upgrade -y
```
#### Step 1.3: Configure Shared Storage (NFS)
**Option A: Using existing NFS server**
On both Proxmox nodes:
```bash
export NFS_SERVER=192.168.1.100
export NFS_PATH=/mnt/proxmox-storage
export STORAGE_NAME=nfs-shared
./infrastructure/proxmox/nfs-storage.sh
```
**Option B: Set up NFS server**
If you need to set up an NFS server, install and configure it on a separate machine or VM.
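If you are standing up a dedicated NFS server, a minimal sketch on a Debian/Ubuntu machine looks like the following (the export path matches `NFS_PATH` above; the subnet is an assumption from this guide's addressing and should match your LAN):

```shell
# Minimal NFS server sketch - run as root on the storage machine.
apt update && apt install -y nfs-kernel-server

# Create and export the share used by the Proxmox nodes
mkdir -p /mnt/proxmox-storage
chown nobody:nogroup /mnt/proxmox-storage
echo "/mnt/proxmox-storage 192.168.1.0/24(rw,sync,no_subtree_check)" >> /etc/exports

# Apply exports and start the service
exportfs -ra
systemctl enable --now nfs-kernel-server
```

Verify from a Proxmox node with `showmount -e <nfs-server-ip>` before running `nfs-storage.sh`.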
#### Step 1.4: Create Proxmox Cluster
**On Node 1** (cluster creator):
```bash
export NODE_ROLE=create
export CLUSTER_NAME=hc-cluster
./infrastructure/proxmox/cluster-setup.sh
```
**On Node 2** (join cluster):
```bash
export NODE_ROLE=join
export CLUSTER_NODE_IP=192.168.1.10 # IP of Node 1
export ROOT_PASSWORD=your-root-password # Optional, will prompt if not set
./infrastructure/proxmox/cluster-setup.sh
```
**Verify cluster**:
```bash
pvecm status
pvecm nodes
```
### Phase 2: Azure Arc Integration
#### Step 2.1: Prepare Azure Environment
```bash
# Load environment variables from .env (if using .env file)
export $(cat .env | grep -v '^#' | xargs)
# Login to Azure
az login
# Set subscription (use from .env or set manually)
az account set --subscription "${AZURE_SUBSCRIPTION_ID:-your-subscription-id}"
# Create resource group (if not exists)
az group create --name "${AZURE_RESOURCE_GROUP:-HC-Stack}" --location "${AZURE_LOCATION:-eastus}"
```
#### Step 2.2: Onboard Proxmox Hosts to Azure Arc
On each Proxmox node:
```bash
# Load environment variables from .env (if using .env file)
export $(cat .env | grep -v '^#' | xargs)
# Set Azure variables (use from .env or get from Azure CLI)
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}"
export LOCATION="${AZURE_LOCATION:-eastus}"
export TAGS="type=proxmox,environment=hybrid"
./scripts/azure-arc/onboard-proxmox-hosts.sh
```
**Verify in Azure Portal**:
- Navigate to: Azure Portal → Azure Arc → Servers
- You should see both Proxmox nodes
#### Step 2.3: Create VMs for Kubernetes and Git
Create VMs in Proxmox web UI or using Terraform:
```bash
# Load environment variables from .env
export $(cat .env | grep -v '^#' | xargs)
cd terraform/proxmox
# Create terraform.tfvars from environment variables or edit manually
cat > terraform.tfvars <<EOF
proxmox_host = "${PROXMOX_ML110_URL#https://}"
proxmox_username = "root@pam" # Not stored in .env; prefer a dedicated API token user in production
proxmox_password = "${PVE_ROOT_PASS}"
proxmox_node = "pve-node-1"
EOF
terraform init
terraform plan
terraform apply
```
#### Step 2.4: Onboard VMs to Azure Arc
For each VM:
```bash
# Load environment variables from .env
export $(cat .env | grep -v '^#' | xargs)
export VM_IP=192.168.1.188
export VM_USER=ubuntu
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}"
export LOCATION="${AZURE_LOCATION:-eastus}"
./scripts/azure-arc/onboard-vms.sh
```
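When onboarding several VMs, the same script can be driven from a loop. The IP list below is an example; substitute your actual service VMs:

```shell
# Shared Azure settings (same as the single-VM case above)
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}"
export LOCATION="${AZURE_LOCATION:-eastus}"
export VM_USER=ubuntu

# Example IP list - replace with your VMs
for ip in 192.168.1.188 192.168.1.189; do
  export VM_IP="$ip"
  ./scripts/azure-arc/onboard-vms.sh
done
```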
### Phase 3: Kubernetes Setup
#### Step 3.1: Install K3s
On the VM designated for Kubernetes:
```bash
export INSTALL_MODE=local
export K3S_VERSION=latest
./infrastructure/kubernetes/k3s-install.sh
```
**Or install remotely**:
```bash
export INSTALL_MODE=remote
export REMOTE_IP=192.168.1.188
export REMOTE_USER=ubuntu
./infrastructure/kubernetes/k3s-install.sh
```
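To manage the cluster from your workstation, copy the kubeconfig off the K3s node and point it at the node's IP instead of the loopback address (the file path is the K3s default):

```shell
REMOTE_IP=192.168.1.188
REMOTE_USER=ubuntu

# K3s writes its kubeconfig to /etc/rancher/k3s/k3s.yaml with
# server: https://127.0.0.1:6443 - rewrite that for remote use.
mkdir -p ~/.kube
ssh "${REMOTE_USER}@${REMOTE_IP}" "sudo cat /etc/rancher/k3s/k3s.yaml" > ~/.kube/config
sed -i "s/127.0.0.1/${REMOTE_IP}/" ~/.kube/config
chmod 600 ~/.kube/config

kubectl get nodes
```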
#### Step 3.2: Onboard Kubernetes to Azure Arc
```bash
# Load environment variables from .env
export $(cat .env | grep -v '^#' | xargs)
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}"
export LOCATION="${AZURE_LOCATION:-eastus}"
export CLUSTER_NAME=proxmox-k3s-cluster
# Ensure kubeconfig is set
export KUBECONFIG=~/.kube/config
./infrastructure/kubernetes/arc-onboard-k8s.sh
```
**Verify in Azure Portal**:
- Navigate to: Azure Portal → Azure Arc → Kubernetes
- You should see your cluster
#### Step 3.3: Install Base Infrastructure
```bash
# Apply namespace and base infrastructure
kubectl apply -f gitops/infrastructure/namespace.yaml
kubectl apply -f gitops/infrastructure/ingress-controller.yaml
kubectl apply -f gitops/infrastructure/cert-manager.yaml
```
### Phase 4: Git/DevOps Setup
#### Option A: Deploy Gitea (Recommended for small deployments)
```bash
export GITEA_DOMAIN=git.local
export GITEA_PORT=3000
./infrastructure/gitops/gitea-deploy.sh
```
Access Gitea at `http://git.local:3000` and complete initial setup.
#### Option B: Deploy GitLab CE
```bash
export GITLAB_DOMAIN=gitlab.local
export GITLAB_PORT=8080
./infrastructure/gitops/gitlab-deploy.sh
```
**Note**: GitLab requires at least 8GB RAM.
#### Option C: Azure DevOps Self-Hosted Agent
On a VM:
```bash
# Load environment variables from .env
export $(cat .env | grep -v '^#' | xargs)
export AZP_URL="${AZP_URL:-https://dev.azure.com/yourorg}"
export AZP_TOKEN="${AZP_TOKEN:-your-personal-access-token}"
export AZP_AGENT_NAME=proxmox-agent-1
export AZP_POOL=Default
./infrastructure/gitops/azure-devops-agent.sh
```
### Phase 5: Configure GitOps
#### Step 5.1: Create Git Repository
1. Create a new repository in your Git server (Gitea/GitLab)
2. Clone the repository locally
3. Copy the `gitops/` directory to your repository
4. Commit and push:
```bash
git clone http://git.local:3000/user/gitops-repo.git
cd gitops-repo
cp -r /path/to/loc_az_hci/gitops/* .
git add .
git commit -m "Initial GitOps configuration"
git push
```
#### Step 5.2: Connect GitOps to Azure Arc
In Azure Portal:
1. Navigate to: Azure Arc → Kubernetes → Your cluster
2. Go to "GitOps" section
3. Click "Add configuration"
4. Configure:
- Repository URL: `http://git.local:3000/user/gitops-repo.git`
- Branch: `main`
- Path: `gitops/`
- Authentication: Configure as needed
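The same configuration can be created from the Azure CLI instead of the Portal. A hedged sketch using the `k8s-configuration` extension follows; the configuration and kustomization names are examples, and the cluster name matches Step 3.2:

```shell
# Install/upgrade the CLI extension for Flux configurations
az extension add --name k8s-configuration --upgrade

# Attach the Git repo to the Arc-enabled cluster
az k8s-configuration flux create \
  --resource-group "${AZURE_RESOURCE_GROUP:-HC-Stack}" \
  --cluster-name proxmox-k3s-cluster \
  --cluster-type connectedClusters \
  --name hc-stack-gitops \
  --url http://git.local:3000/user/gitops-repo.git \
  --branch main \
  --kustomization name=apps path=./gitops prune=true
```

If the repository requires authentication, add `--https-user` and `--https-key` (or SSH key options) as appropriate for your Git server.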
### Phase 6: Deploy HC Stack Services
#### Option A: Deploy via GitOps (Recommended)
1. Update Helm chart values in your Git repository
2. Commit and push changes
3. Flux will automatically deploy updates
#### Option B: Deploy Manually with Helm
```bash
# Add Helm charts
helm install besu ./gitops/apps/besu -n blockchain
helm install firefly ./gitops/apps/firefly -n blockchain
helm install chainlink-ccip ./gitops/apps/chainlink-ccip -n blockchain
helm install blockscout ./gitops/apps/blockscout -n blockchain
helm install cacti ./gitops/apps/cacti -n monitoring
helm install nginx-proxy ./gitops/apps/nginx-proxy -n hc-stack
```
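These installs assume the `blockchain`, `monitoring`, and `hc-stack` namespaces already exist (the namespace manifest in Step 3.3 may cover some of them). An idempotent way to ensure all three, sketched here, is the dry-run/apply pattern; alternatively, append `--create-namespace` to each `helm install`:

```shell
# Create each namespace only if missing (apply is a no-op when it exists)
for ns in blockchain monitoring hc-stack; do
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
done
```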
#### Option C: Deploy with Terraform
```bash
cd terraform/kubernetes
terraform init
terraform plan
terraform apply
```
### Phase 7: Verify Deployment
#### Check Proxmox Cluster
```bash
pvecm status
pvesm status
```
#### Check Azure Arc
```bash
# List Arc-enabled servers
az connectedmachine list --resource-group HC-Stack -o table
# List Arc-enabled Kubernetes clusters
az connectedk8s list --resource-group HC-Stack -o table
```
#### Check Kubernetes
```bash
kubectl get nodes
kubectl get pods --all-namespaces
kubectl get services --all-namespaces
```
#### Check Applications
```bash
# Check Besu
kubectl get pods -n blockchain -l app=besu
# Check Firefly
kubectl get pods -n blockchain -l app=firefly
# Check all services
kubectl get all --all-namespaces
```
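Rather than polling `kubectl get pods` by hand, you can block until the core workloads report Ready (the labels match the checks above; the timeout is an example value):

```shell
# Wait up to 5 minutes for each core application's pods to become Ready
for app in besu firefly; do
  kubectl wait --for=condition=ready pod -l app="$app" \
    -n blockchain --timeout=300s
done
```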
## Post-Deployment Configuration
### 1. Configure Ingress
Update ingress configurations for external access:
```bash
# Edit ingress resources
kubectl edit ingress -n blockchain
```
### 2. Set Up Monitoring
- Configure Cacti to monitor your infrastructure
- Set up Azure Monitor alerts
- Configure log aggregation
### 3. Configure Backup
- Set up Proxmox backup schedules
- Configure Kubernetes backup (Velero)
- Set up Azure Backup for Arc resources
### 4. Security Hardening
- Enable Azure Policy for compliance
- Configure network policies
- Set up RBAC
- Enable Defender for Cloud
## Troubleshooting
### Common Issues
1. **Cluster creation fails**:
- Check network connectivity between nodes
- Verify firewall rules
- Check Corosync configuration
2. **Azure Arc connection fails**:
- Verify internet connectivity
- Check Azure credentials
- Review agent logs: `journalctl -u azcmagent`
3. **Kubernetes pods not starting**:
- Check resource limits
- Verify storage classes
- Review pod logs: `kubectl logs <pod-name>`
4. **GitOps not syncing**:
   - Check Flux controller logs: `kubectl logs -n flux-system deploy/source-controller` (and `deploy/kustomize-controller`)
- Verify repository access
- Check GitOps configuration in Azure Portal
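For Arc connectivity issues in particular, the Connected Machine agent ships its own diagnostics. A sketch of the usual first-line checks on an affected host (the location flag should match your onboarding region):

```shell
# Print agent status, resource linkage, and last heartbeat
azcmagent show

# Test outbound reachability of the required Azure endpoints
azcmagent check --location "${AZURE_LOCATION:-eastus}"
```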
## Next Steps
1. Review architecture documentation
2. Set up monitoring and alerting
3. Configure backup and disaster recovery
4. Implement security policies
5. Plan for scaling and expansion