Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)
Some checks failed
Test / test (push) Has been cancelled
Co-authored-by: Cursor <cursoragent@cursor.com>
This commit is contained in:
444
docs/deployment/azure-arc-onboarding.md
Normal file
@@ -0,0 +1,444 @@
# Azure Arc Onboarding Guide

## Overview

This document describes the Azure Arc onboarding process for all Linux hosts and VMs in the Azure Stack HCI environment, enabling Azure governance, monitoring, and management.

## Architecture

### Azure Arc Architecture

```
┌───────────────────────────────────────────────────────────┐
│                       Azure Portal                        │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │ Azure Arc     │  │ Azure Policy  │  │ Azure Monitor │  │
│  │ Servers       │  │               │  │               │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │ Defender      │  │ Update        │  │ GitOps        │  │
│  │ for Cloud     │  │ Management    │  │ (Flux)        │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
└───────────────────────────────────────────────────────────┘
                              │
                              │ HTTPS (443) Outbound
                              │
┌───────────────────────────────────────────────────────────┐
│                On-Premises Infrastructure                 │
│                                                           │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │ Router        │  │ Proxmox       │  │ Ubuntu        │  │
│  │ Server        │  │ ML110/R630    │  │ Service VMs   │  │
│  │               │  │               │  │               │  │
│  │ Arc Agent     │  │ Arc Agent     │  │ Arc Agent     │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
└───────────────────────────────────────────────────────────┘
```

## Prerequisites

### Azure Requirements

- Azure subscription with Contributor role
- Resource group created (or will be created)
- Azure CLI installed and authenticated
- Service principal or managed identity (optional)

### Network Requirements

- Outbound HTTPS (443) connectivity to Azure
- Proxy support if needed (see Proxy Configuration section)
- DNS resolution for Azure endpoints

### Target Systems

- Linux hosts (Proxmox VE, Ubuntu)
- Windows Server (optional, for management VM)
- Ubuntu VMs (service VMs)

### Environment Configuration

Before starting, ensure your `.env` file is configured with Azure credentials:

```bash
# Copy template if not already done
cp .env.example .env

# Edit .env and set:
# - AZURE_SUBSCRIPTION_ID
# - AZURE_TENANT_ID
# - AZURE_CLIENT_ID (optional, for service principal)
# - AZURE_CLIENT_SECRET (optional, for service principal)
# - AZURE_RESOURCE_GROUP
# - AZURE_LOCATION
```

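Before running the installation steps, it can help to confirm that the required variables are actually set. The following is a minimal sketch, not part of the repository: `require_env` is a hypothetical helper name, and it assumes the variables were exported when `.env` was loaded.

```bash
# Hypothetical helper (not part of the repository): report which of the
# named environment variables are unset or empty.
# Assumes the variables are exported, as they are when .env is loaded
# with auto-export.
require_env() {
  local missing=0 name
  for name in "$@"; do
    # printenv prints nothing for an unset (or unexported) variable
    if [ -z "$(printenv "$name")" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (after loading .env):
#   require_env AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID \
#               AZURE_RESOURCE_GROUP AZURE_LOCATION || echo "fix .env first"
```

The service principal variables are left out of the usage line because the guide treats them as optional.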
## Installation

### Step 1: Prepare Azure Environment

```bash
# Load environment variables from .env (if using .env file).
# Sourcing with auto-export copes with quoted values and spaces.
[ -f .env ] && { set -a; . ./.env; set +a; }

# Login to Azure first, so the tenant lookup below can succeed
az login

# Set variables (use from .env or set manually)
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-your-subscription-id}"
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export LOCATION="${AZURE_LOCATION:-eastus}"

# Set subscription
az account set --subscription "$SUBSCRIPTION_ID"

# Fall back to the tenant of the active subscription if AZURE_TENANT_ID is unset
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"

# Create resource group (if not exists)
az group create \
  --name "$RESOURCE_GROUP" \
  --location "$LOCATION"
```

### Step 2: Install Arc Agent on Linux

#### Ubuntu/Debian

```bash
# Download installation script (-L follows the aka.ms redirect)
curl -sSL https://aka.ms/azcmagent -o /tmp/install_linux_azcmagent.sh

# Run installation (requires root)
sudo bash /tmp/install_linux_azcmagent.sh

# Verify installation
azcmagent version
```

#### Proxmox VE (Debian-based)

```bash
# Same as Ubuntu/Debian; run as root (Proxmox VE does not ship sudo by default)
curl -sSL https://aka.ms/azcmagent -o /tmp/install_linux_azcmagent.sh
bash /tmp/install_linux_azcmagent.sh
azcmagent version
```

### Step 3: Onboard to Azure Arc

#### Using Service Principal

```bash
# Load environment variables from .env
set -a; . ./.env; set +a

# Resolve settings up front so both branches below can use them
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export LOCATION="${AZURE_LOCATION:-eastus}"
export TENANT_ID="${AZURE_TENANT_ID}"

# Use service principal from .env or create new one
if [ -z "$AZURE_CLIENT_ID" ] || [ -z "$AZURE_CLIENT_SECRET" ]; then
  # Create service principal (if not exists)
  az ad sp create-for-rbac \
    --name "ArcOnboarding" \
    --role "Azure Connected Machine Onboarding" \
    --scopes "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP"

  # Note: the output lists appId, password, and tenant. Add these to .env as
  # AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID, then reload it.
fi

# Onboard machine (run as root; replace the <app-id> and <password>
# placeholders if the variables are still unset)
sudo azcmagent connect \
  --service-principal-id "${AZURE_CLIENT_ID:-<app-id>}" \
  --service-principal-secret "${AZURE_CLIENT_SECRET:-<password>}" \
  --tenant-id "$TENANT_ID" \
  --subscription-id "$SUBSCRIPTION_ID" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$LOCATION" \
  --tags "Environment=Production,Role=Router"
```

#### Using Interactive Login

```bash
# Load environment variables from .env
set -a; . ./.env; set +a

export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID}"
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
export LOCATION="${AZURE_LOCATION:-eastus}"

# Onboard machine (will prompt for login; on Linux this is normally a
# device code that you enter in a browser on another machine)
sudo azcmagent connect \
  --subscription-id "$SUBSCRIPTION_ID" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$LOCATION" \
  --tags "Environment=Production,Role=Router"
```

### Step 4: Verify Onboarding

```bash
# Check agent status
azcmagent show

# Verify from the Azure CLI (requires the "connectedmachine" CLI extension)
az connectedmachine list \
  --resource-group "$RESOURCE_GROUP" \
  --output table
```

## Proxy Configuration

### If Outbound Proxy Required

#### Configure Proxy for Arc Agent

```bash
# Set proxy environment variables (used by the install script and other tools)
export https_proxy="http://proxy.example.com:8080"
export http_proxy="http://proxy.example.com:8080"
export no_proxy="localhost,127.0.0.1,.local"

# Configure Arc agent proxy
sudo azcmagent config set proxy.url "http://proxy.example.com:8080"

# proxy.bypass takes Azure service names (for example "Arc" when using
# private endpoints), not host patterns, so set it only if you need it:
# sudo azcmagent config set proxy.bypass "Arc"

# azcmagent has no "restart" subcommand; settings made with
# "azcmagent config set" apply without a restart. Confirm the value:
azcmagent config get proxy.url
```

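#### Proxy Authentication

```bash
# If proxy requires authentication.
# Caution: Microsoft's documentation has stated that the Connected Machine
# agent does not support proxies that require authentication, so verify this
# against the current agent documentation before relying on it.
sudo azcmagent config set proxy.url "http://user:password@proxy.example.com:8080"
```
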
## Governance Configuration

### Azure Policy

#### Enable Policy for Arc Servers

```bash
# Assign built-in policy: "Enable Azure Monitor for VMs"
az policy assignment create \
  --name "EnableAzureMonitorForVMs" \
  --display-name "Enable Azure Monitor for VMs" \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP" \
  --policy "/providers/Microsoft.Authorization/policyDefinitions/0ef5aac7-c064-427a-b87b-d47b3ddcaf73"
```

#### Custom Policy Example

```json
{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.HybridCompute/machines"
      },
      {
        "field": "Microsoft.HybridCompute/machines/osName",
        "notEquals": "Ubuntu"
      }
    ]
  },
  "then": {
    "effect": "audit"
  }
}
```

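The rule above only takes effect once it is registered as a policy definition and assigned to a scope. A minimal sketch of that flow follows; the definition name `audit-non-ubuntu-arc`, the display name, the file path, and the wrapper function `apply_custom_policy` are all illustrative assumptions, not names used by this repository.

```bash
# Write the rule shown above to a file (the path is arbitrary).
POLICY_RULE_FILE="${TMPDIR:-/tmp}/audit-non-ubuntu-arc.json"
cat > "$POLICY_RULE_FILE" <<'EOF'
{
  "if": {
    "allOf": [
      { "field": "type", "equals": "Microsoft.HybridCompute/machines" },
      { "field": "Microsoft.HybridCompute/machines/osName", "notEquals": "Ubuntu" }
    ]
  },
  "then": { "effect": "audit" }
}
EOF

# Hypothetical wrapper: create the definition, then assign it at
# resource-group scope. Expects SUBSCRIPTION_ID and RESOURCE_GROUP to be set.
apply_custom_policy() {
  az policy definition create \
    --name "audit-non-ubuntu-arc" \
    --display-name "Audit Arc machines whose osName is not Ubuntu" \
    --mode Indexed \
    --rules "$POLICY_RULE_FILE"

  az policy assignment create \
    --name "audit-non-ubuntu-arc" \
    --policy "audit-non-ubuntu-arc" \
    --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP"
}

# Usage, after `az login`:
#   apply_custom_policy
```

Because the effect is `audit`, non-matching machines are only reported as non-compliant; nothing is blocked or changed.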
### Azure Monitor

#### Enable Log Analytics

```bash
# Create Log Analytics workspace
az monitor log-analytics workspace create \
  --resource-group "$RESOURCE_GROUP" \
  --workspace-name "hci-logs-$LOCATION"

# Enable VM insights
az monitor log-analytics solution create \
  --resource-group "$RESOURCE_GROUP" \
  --name "VMInsights" \
  --workspace "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.OperationalInsights/workspaces/hci-logs-$LOCATION" \
  --plan-publisher "Microsoft" \
  --plan-product "OMSGallery/VMInsights"
```

#### Configure Data Collection

```bash
# Enable data collection rule
# Sketch only: a working rule also needs data sources and data flows; check
# `az monitor data-collection rule create --help` for the exact syntax
# supported by your CLI version.
az monitor data-collection rule create \
  --resource-group "$RESOURCE_GROUP" \
  --name "hci-dcr" \
  --location "$LOCATION" \
  --log-analytics "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.OperationalInsights/workspaces/hci-logs-$LOCATION"
```

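The workspace resource ID is spelled out in full in more than one command in this section, which makes typos easy. One way to avoid that is a small helper that builds the ID from its parts. This is a sketch: `workspace_id` is a hypothetical function name, and the workspace name follows the `hci-logs-$LOCATION` convention used above.

```bash
# Hypothetical helper: build the ARM resource ID of a Log Analytics workspace.
#   $1 = subscription ID, $2 = resource group, $3 = workspace name
workspace_id() {
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.OperationalInsights/workspaces/%s\n' \
    "$1" "$2" "$3"
}

# Usage:
#   WORKSPACE_ID="$(workspace_id "$SUBSCRIPTION_ID" "$RESOURCE_GROUP" "hci-logs-$LOCATION")"
#   az monitor log-analytics workspace show --ids "$WORKSPACE_ID" \
#     --query provisioningState -o tsv
```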
### Azure Defender

#### Enable Defender for Servers

```bash
# Enable Defender for Cloud (plans are set per subscription, so no
# resource group is passed)
az security pricing create \
  --name "VirtualMachines" \
  --tier "Standard"
```

#### Onboard Arc Servers to Defender

```bash
# Install Defender extension (via Azure Portal or CLI)
# The extension name, publisher, and type below follow Microsoft's MDE.Linux
# extension as far as known; verify them against an auto-provisioned machine.
# Defender for Cloud normally deploys this extension itself once the plan is on.
az connectedmachine extension create \
  --machine-name "<machine-name>" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$LOCATION" \
  --name "MDE.Linux" \
  --publisher "Microsoft.Azure.AzureDefenderForServers" \
  --type "MDE.Linux"
```

### Update Management

#### Enable Update Management

```bash
# Enable Update Management via Azure Automation
# This is typically done through Azure Portal:
# 1. Create Automation Account
# 2. Enable Update Management solution
# 3. Add Arc servers to Update Management
#
# Note: Microsoft retired Automation Update Management in August 2024 in
# favor of Azure Update Manager; on current tenants, use Update Manager.
```

## Tagging Strategy

### Recommended Tags

```bash
# Tag machines during onboarding (azcmagent takes one comma-separated string)
sudo azcmagent connect \
  --subscription-id "$SUBSCRIPTION_ID" \
  --resource-group "$RESOURCE_GROUP" \
  --location "$LOCATION" \
  --tags "Environment=Production,Role=Router,Project=AzureStackHCI,ManagedBy=Arc"
```

### Update Tags

```bash
# Update tags after onboarding (the Azure CLI takes space-separated
# key=value pairs, not one comma-separated string)
az connectedmachine update \
  --name "<machine-name>" \
  --resource-group "$RESOURCE_GROUP" \
  --tags Environment=Production Role=Router Updated=2024-01-01
```

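The two tools expect tags in different shapes: `azcmagent connect --tags` takes a single comma-separated string, while the Azure CLI takes separate `key=value` arguments. A small sketch of keeping one tag list and rendering it for both follows; `tags_for_azcmagent` is a hypothetical helper name.

```bash
# Hypothetical helper: join key=value pairs with commas, the form that
# `azcmagent connect --tags` expects.
tags_for_azcmagent() {
  local IFS=,
  # "$*" joins all arguments using the first character of IFS
  printf '%s\n' "$*"
}

# Usage: keep the tags as positional parameters and render them per tool.
#   set -- Environment=Production Role=Router Project=AzureStackHCI ManagedBy=Arc
#   sudo azcmagent connect ... --tags "$(tags_for_azcmagent "$@")"
#   az connectedmachine update ... --tags "$@"
```

Tag values that themselves contain commas would not survive the comma-joined form, so this sketch assumes simple values like the ones used in this guide.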
## Verification

### Check Agent Status

```bash
# On each machine
azcmagent show

# Expected output:
# Agent Status: Connected
# Azure Resource ID: /subscriptions/.../resourceGroups/.../providers/Microsoft.HybridCompute/machines/...
```

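For scripted checks, the status line can be tested instead of read by eye. The sketch below assumes the text output contains a line of the form `Agent Status : Connected` (spacing around the colon may vary between agent versions); `agent_is_connected` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed if the `azcmagent show` text on stdin
# reports a connected agent.
# Assumes a line of the form "Agent Status : Connected"; spacing may vary.
agent_is_connected() {
  grep -Eq '^[[:space:]]*Agent Status[[:space:]]*:[[:space:]]*Connected[[:space:]]*$'
}

# Usage:
#   azcmagent show | agent_is_connected && echo "connected"
```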
### Verify in Azure Portal

1. Navigate to Azure Portal > Azure Arc > Servers
2. Verify all machines listed
3. Check machine status (Connected)
4. Review machine details and tags

### Test Policy Enforcement

```bash
# Check policy compliance for the resource group
az policy state list \
  --resource-group "$RESOURCE_GROUP" \
  --output table
```

## Troubleshooting

### Agent Not Connecting

**Problem:** Agent shows as disconnected
- **Solution:**
  - Check network connectivity (HTTPS 443)
  - Verify proxy configuration if needed
  - Collect agent logs: `azcmagent logs` (writes a .zip archive)
  - Verify Azure credentials

### Proxy Issues

**Problem:** Agent can't connect through proxy
- **Solution:**
  - Verify proxy URL and credentials
  - Check proxy bypass list
  - Test proxy connectivity manually
  - Review agent logs

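Manual connectivity testing, with or without a proxy, can be sketched with `curl`. The endpoint list below is partial and illustrative (consult Microsoft's network requirements for the full set for your region); `probe_endpoints`, `ARC_ENDPOINTS`, and the `PROXY_URL` variable are hypothetical names introduced for this example.

```bash
# Partial, illustrative endpoint list; not the complete set the agent needs.
ARC_ENDPOINTS="https://management.azure.com https://login.microsoftonline.com"

# Hypothetical helper: report whether each URL answers, optionally through
# the proxy named in PROXY_URL. Any HTTP response counts as reachable.
probe_endpoints() {
  local failed=0 url
  for url in "$@"; do
    # ${PROXY_URL:+...} adds the --proxy option only when PROXY_URL is set
    if curl --silent --output /dev/null --max-time 10 \
        ${PROXY_URL:+--proxy "$PROXY_URL"} "$url"; then
      echo "ok    $url"
    else
      echo "FAIL  $url"
      failed=1
    fi
  done
  return "$failed"
}

# Usage:
#   probe_endpoints $ARC_ENDPOINTS
#   PROXY_URL="http://proxy.example.com:8080" probe_endpoints $ARC_ENDPOINTS
```

The agent also ships its own check, `azcmagent check`, which tests the endpoints it actually needs; the sketch above is mainly useful for comparing direct and proxied paths from the same host.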
### Policy Not Applying

**Problem:** Azure Policy not enforcing
- **Solution:**
  - Verify policy assignment scope
  - Check policy evaluation status
  - Verify machine tags match policy conditions
  - Review policy compliance reports

### Monitoring Not Working

**Problem:** Azure Monitor not collecting data
- **Solution:**
  - Verify Log Analytics workspace configuration
  - Check data collection rules
  - Verify agent extension installed
  - Review Log Analytics workspace logs

## Best Practices

1. **Use Service Principals:**
   - Create dedicated service principal for Arc onboarding
   - Use least privilege permissions
   - Rotate credentials regularly

2. **Tagging:**
   - Use consistent tagging strategy
   - Include environment, role, project tags
   - Enable tag-based policy enforcement

3. **Monitoring:**
   - Enable Azure Monitor for all Arc servers
   - Configure alert rules
   - Set up log retention policies

4. **Security:**
   - Enable Azure Defender for all servers
   - Configure security policies
   - Review security recommendations regularly

5. **Updates:**
   - Enable Update Management
   - Schedule regular maintenance windows
   - Test updates in dev environment first

## Related Documentation

- [Complete Architecture](complete-architecture.md) - Full architecture overview
- [Bring-Up Checklist](bring-up-checklist.md) - Installation guide
- [Microsoft Azure Arc Documentation](https://docs.microsoft.com/azure/azure-arc/)

377
docs/deployment/bring-up-checklist.md
Normal file
@@ -0,0 +1,377 @@
# Bring-Up Checklist

## Day-One Installation Guide

This checklist provides a step-by-step guide for bringing up the complete Azure Stack HCI environment on installation day.

## Pre-Installation Preparation

### Hardware Verification

- [ ] Router server chassis received and inspected
- [ ] All PCIe cards received (NICs, HBAs, QAT)
- [ ] Memory modules received (8× 4GB DDR4 ECC RDIMM)
- [ ] Storage SSD received (256GB)
- [ ] All cables received (Ethernet, Mini-SAS HD)
- [ ] Storage shelves received and inspected
- [ ] Proxmox hosts (ML110, R630) verified operational

### Documentation Review

- [ ] Complete architecture reviewed
- [ ] PCIe slot allocation map reviewed
- [ ] Network topology and VLAN schema reviewed
- [ ] Driver matrix reviewed
- [ ] All configuration files prepared

### Environment Configuration

- [ ] Copy `.env.example` to `.env`
- [ ] Configure Azure credentials in `.env`:
  - [ ] `AZURE_SUBSCRIPTION_ID`
  - [ ] `AZURE_TENANT_ID`
  - [ ] `AZURE_RESOURCE_GROUP`
  - [ ] `AZURE_LOCATION`
- [ ] Configure Cloudflare credentials in `.env`:
  - [ ] `CLOUDFLARE_API_TOKEN`
  - [ ] `CLOUDFLARE_ACCOUNT_EMAIL`
- [ ] Configure Proxmox credentials in `.env`:
  - [ ] `PVE_ROOT_PASS` (shared root password for all instances)
  - [ ] `PROXMOX_ML110_URL`
  - [ ] `PROXMOX_R630_URL`
  - [ ] Note: Username `root@pam` is implied and should not be stored
  - [ ] For production: Create RBAC accounts and use API tokens instead of root
- [ ] Verify `.env` file is in `.gitignore` (should not be committed)

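The last two kinds of check in this list (keys present, `.env` ignored by Git) are easy to script. The following is a minimal sketch under simplifying assumptions: `env_is_ignored` and `missing_env_keys` are hypothetical helper names, the ignore check looks only for a literal `.env` line and does not evaluate glob patterns, and the key check only looks for `KEY=` at the start of a line.

```bash
# Hypothetical helper: succeed if .gitignore in directory $1 has a line that
# is exactly ".env". Simplification: glob patterns are not evaluated; inside
# a Git repository, `git check-ignore -q .env` is the authoritative test.
env_is_ignored() {
  grep -qxF '.env' "$1/.gitignore" 2>/dev/null
}

# Hypothetical helper: print each listed key that has no KEY=... line in
# the file given as the first argument.
missing_env_keys() {
  local file="$1" key
  shift
  for key in "$@"; do
    grep -q "^${key}=" "$file" || echo "$key"
  done
}

# Usage, from the repository root:
#   env_is_ignored . || echo ".env is not listed in .gitignore"
#   missing_env_keys .env AZURE_SUBSCRIPTION_ID AZURE_TENANT_ID \
#     AZURE_RESOURCE_GROUP AZURE_LOCATION CLOUDFLARE_API_TOKEN PVE_ROOT_PASS
```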
## Phase 1: Hardware Installation

### Router Server Assembly

- [ ] Install CPU and memory (8× 4GB DDR4 ECC RDIMM)
- [ ] Install boot SSD (256GB)
- [ ] Install Intel QAT 8970 in x16_1 slot
- [ ] Install Intel X550-T2 in x8_1 slot
- [ ] Install LSI 9207-8e #1 in x8_2 slot
- [ ] Install LSI 9207-8e #2 in x8_3 slot
- [ ] Install Intel i350-T4 in x4_1 slot
- [ ] Install Intel i350-T8 in x4_2 slot
- [ ] Install Intel i225 Quad-Port in x4_3 slot
- [ ] Verify all cards seated properly
- [ ] Connect power and verify POST

### BIOS/UEFI Configuration

- [ ] Enter BIOS/UEFI setup
- [ ] Verify all PCIe cards detected
- [ ] Configure boot order (SSD first)
- [ ] Enable virtualization (Intel VT-x, VT-d)
- [ ] Configure memory settings (ECC enabled)
- [ ] Set date/time
- [ ] Save and exit BIOS

### Storage Shelf Cabling

- [ ] Connect SFF-8644 cables from LSI HBA #1 to shelves 1-2
- [ ] Connect SFF-8644 cables from LSI HBA #2 to shelves 3-4
- [ ] Power on storage shelves
- [ ] Verify shelf power and status LEDs
- [ ] Label all cables

### Network Cabling

- [ ] Connect 4× Cat6 cables from i350-T4 to Spectrum modems/ONTs (WAN1-4)
- [ ] Connect 2× Cat6a cables to X550-T2 (reserved for future)
- [ ] Connect 4× Cat6 cables from i225 Quad to ML110, R630, and key services
- [ ] Connect 8× Cat6 cables from i350-T8 to remaining servers/appliances
- [ ] Label all cables at both ends
- [ ] Document cable mapping

## Phase 2: Operating System Installation

### Router Server OS

**Option A: Windows Server Core**

- [ ] Boot from Windows Server installation media
- [ ] Install Windows Server Core
- [ ] Configure initial administrator password
- [ ] Install Windows Updates
- [ ] Configure static IP on management interface
- [ ] Enable Remote Desktop (if needed)
- [ ] Install Windows Admin Center

**Option B: Proxmox VE**

- [ ] Boot from Proxmox VE installation media
- [ ] Install Proxmox VE
- [ ] Configure initial root password
- [ ] Configure network (management interface)
- [ ] Update Proxmox packages
- [ ] Verify Proxmox web interface accessible

### Proxmox Hosts (ML110, R630)

- [ ] Verify Proxmox VE installed and updated
- [ ] Configure network interfaces
- [ ] Verify cluster status (if clustered)
- [ ] Test VM creation

## Phase 3: Driver Installation

### Router Server Drivers

- [ ] Install Intel PROSet drivers for all NICs
  - [ ] i350-T4 (WAN)
  - [ ] i350-T8 (LAN 1GbE)
  - [ ] X550-T2 (10GbE)
  - [ ] i225 Quad-Port (LAN 2.5GbE)
- [ ] Verify all NICs detected and functional
- [ ] Install LSI mpt3sas driver
- [ ] Flash LSI HBAs to IT mode
- [ ] Verify storage shelves detected
- [ ] Install Intel QAT drivers (qatlib)
- [ ] Install OpenSSL QAT engine
- [ ] Verify QAT acceleration working

### Driver Verification

- [ ] Run driver verification script
- [ ] Test all network ports
- [ ] Test storage connectivity
- [ ] Test QAT acceleration
- [ ] Document any issues

## Phase 4: Network Configuration

### OpenWrt VM Setup

- [ ] Create OpenWrt VM on Router server
- [ ] Configure OpenWrt network interfaces
- [ ] Configure VLANs (10, 20, 30, 40, 50, 60, 99)
- [ ] Configure mwan3 for 4× Spectrum WAN
- [ ] Configure firewall zones
- [ ] Test multi-WAN failover
- [ ] Configure inter-VLAN routing

### Proxmox VLAN Configuration

- [ ] Configure VLAN bridges on ML110
- [ ] Configure VLAN bridges on R630
- [ ] Test VLAN connectivity
- [ ] Verify VM network isolation

### IP Address Configuration

- [ ] Configure IP addresses per VLAN schema
- [ ] Configure DNS settings
- [ ] Test network connectivity
- [ ] Verify routing between VLANs

## Phase 5: Storage Configuration

### Storage Spaces Direct Setup

- [ ] Verify all shelves detected
- [ ] Create Storage Spaces Direct pools
- [ ] Create volumes for VMs
- [ ] Create volumes for applications
- [ ] Configure storage exports (NFS/iSCSI)

### Proxmox Storage Mounts

- [ ] Configure NFS mounts on ML110
- [ ] Configure NFS mounts on R630
- [ ] Test storage connectivity
- [ ] Verify VM storage access

## Phase 6: Azure Arc Onboarding

### Arc Agent Installation

- [ ] Install Azure Arc agent on Router server (if Linux)
- [ ] Install Azure Arc agent on ML110
- [ ] Install Azure Arc agent on R630
- [ ] Install Azure Arc agent on Windows management VM (if applicable)

### Arc Onboarding

- [ ] Load environment variables from `.env`: `set -a; . ./.env; set +a`
- [ ] Configure Azure subscription and resource group (from `.env`)
- [ ] Onboard Router server to Azure Arc
- [ ] Onboard ML110 to Azure Arc
- [ ] Onboard R630 to Azure Arc
- [ ] Verify all resources visible in Azure Portal

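The final item, verifying that every onboarded machine is visible and connected, can also be checked from the command line. The sketch below filters a name/status listing for anything that is not `Connected`. `not_connected` is a hypothetical helper name, and the usage line assumes the CLI exposes the machine state as a top-level `status` field.

```bash
# Hypothetical helper: read tab-separated "name<TAB>status" lines on stdin
# and print the names whose status is not "Connected".
# Returns non-zero if any such machine was found.
not_connected() {
  awk -F '\t' '$2 != "Connected" { print $1; bad = 1 } END { exit bad }'
}

# Usage (assumes `status` is a top-level field in the CLI output):
#   az connectedmachine list -g "$RESOURCE_GROUP" \
#     --query "[].[name, status]" -o tsv | not_connected \
#     && echo "all machines connected"
```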
### Arc Governance

- [ ] Configure Azure Policy
- [ ] Enable Azure Monitor
- [ ] Enable Azure Defender
- [ ] Configure Update Management
- [ ] Test policy enforcement

## Phase 7: Cloudflare Integration

### Cloudflare Tunnel Setup

- [ ] Create Cloudflare account (if not exists)
- [ ] Create Zero Trust organization
- [ ] Configure Cloudflare API token in `.env` file
- [ ] Install cloudflared on Ubuntu VM
- [ ] Authenticate cloudflared (interactive or using API token from `.env`)
- [ ] Configure Tunnel for WAC
- [ ] Configure Tunnel for Proxmox UI
- [ ] Configure Tunnel for dashboards
- [ ] Configure Tunnel for Git/CI services

### Zero Trust Policies

- [ ] Configure SSO (Azure AD/Okta)
- [ ] Configure MFA requirements
- [ ] Configure device posture checks
- [ ] Configure access policies
- [ ] Test external access

### WAF Configuration

- [ ] Configure WAF rules
- [ ] Test WAF protection
- [ ] Verify no inbound ports required

## Phase 8: Service VM Deployment

### Ubuntu VM Templates

- [ ] Create Ubuntu LTS template on Proxmox
- [ ] Install Azure Arc agent in template
- [ ] Configure base packages
- [ ] Create VM snapshots

### Service VM Deployment

- [ ] Deploy Cloudflare Tunnel VM (VLAN 99)
- [ ] Deploy Reverse Proxy VM (VLAN 30/99)
- [ ] Deploy Observability VM (VLAN 40)
- [ ] Deploy CI/CD VM (VLAN 50)
- [ ] Install Azure Arc agents on all VMs

### Service Configuration

- [ ] Configure Cloudflare Tunnel
- [ ] Configure reverse proxy (NGINX/Traefik)
- [ ] Configure observability stack (Prometheus/Grafana)
- [ ] Configure CI/CD (GitLab Runner/Jenkins)

## Phase 9: Verification and Testing

### Network Testing

- [ ] Test all WAN connections
- [ ] Test multi-WAN failover
- [ ] Test VLAN isolation
- [ ] Test inter-VLAN routing
- [ ] Test firewall rules

### Storage Testing

- [ ] Test storage read/write performance
- [ ] Test storage redundancy
- [ ] Test VM storage access
- [ ] Test storage exports

### Service Testing

- [ ] Test Cloudflare Tunnel access
- [ ] Test Azure Arc connectivity
- [ ] Test observability dashboards
- [ ] Test CI/CD pipelines

### Performance Testing

- [ ] Test QAT acceleration
- [ ] Test network throughput
- [ ] Test storage I/O
- [ ] Document performance metrics

## Phase 10: Documentation and Handoff

### Documentation

- [ ] Document all IP addresses
- [ ] Verify `.env` file contains all credentials (stored securely, not in version control)
- [ ] Document cable mappings
- [ ] Document VLAN configurations
- [ ] Document storage allocations
- [ ] Create network diagrams
- [ ] Create runbooks
- [ ] Verify `.env` is in `.gitignore` and not committed to repository

### Monitoring Setup

- [ ] Configure Grafana dashboards
- [ ] Configure Prometheus alerts
- [ ] Configure Azure Monitor alerts
- [ ] Test alerting

### Security Hardening

- [ ] Review firewall rules
- [ ] Review access policies
- [ ] Create RBAC accounts for Proxmox (replace root usage)
- [ ] Create service accounts for automation
- [ ] Create operator accounts with appropriate roles
- [ ] Generate API tokens for service accounts
- [ ] Document RBAC account usage (see docs/security/proxmox-rbac.md)
- [ ] Review secret management
- [ ] Perform security scan

## Post-Installation Tasks

### Ongoing Maintenance

- [ ] Schedule regular backups
- [ ] Schedule firmware updates
- [ ] Schedule driver updates
- [ ] Schedule OS updates
- [ ] Schedule security patches

### Monitoring

- [ ] Review monitoring dashboards daily
- [ ] Review Azure Arc status
- [ ] Review Cloudflare Tunnel status
- [ ] Review storage health
- [ ] Review network performance

## Troubleshooting Reference

### Common Issues

**Issue:** NIC not detected
- Check PCIe slot connection
- Check BIOS settings
- Update driver

**Issue:** Storage shelves not detected
- Check cable connections
- Check HBA firmware
- Check shelf power

**Issue:** Azure Arc not connecting
- Check network connectivity
- Check proxy settings
- Check Azure credentials

**Issue:** Cloudflare Tunnel not working
- Check cloudflared service
- Check Tunnel configuration
- Check Zero Trust policies

## Related Documentation

- [Complete Architecture](complete-architecture.md) - Full architecture overview
- [Hardware BOM](hardware-bom.md) - Complete bill of materials
- [PCIe Allocation](pcie-allocation.md) - Slot allocation map
- [Network Topology](network-topology.md) - VLAN/IP schema
- [Driver Matrix](driver-matrix.md) - Driver versions

387
docs/deployment/cloudflare-integration.md
Normal file
@@ -0,0 +1,387 @@
# Cloudflare Integration Guide

## Overview

This document describes the Cloudflare Zero Trust and Tunnel integration for secure external access to the Azure Stack HCI environment without requiring inbound ports.

## Architecture

### Cloudflare Tunnel Architecture

```
┌───────────────────────────────────────────────────────────┐
│               Cloudflare Zero Trust Network               │
│  ┌───────────────┐  ┌───────────────┐  ┌───────────────┐  │
│  │ Zero Trust    │  │ WAF           │  │ Tunnel        │  │
│  │ Policies      │  │ Rules         │  │ Endpoints     │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
└───────────────────────────────────────────────────────────┘
                              │
                              │ Outbound HTTPS (443)
                              │
┌───────────────────────────────────────────────────────────┐
│                On-Premises Infrastructure                 │
│                                                           │
│  ┌─────────────────────────────────────────────────────┐  │
│  │           Cloudflare Tunnel VM (VLAN 99)            │  │
│  │                  ┌───────────────┐                  │  │
│  │                  │ cloudflared   │                  │  │
│  │                  │ daemon        │                  │  │
│  │                  └───────────────┘                  │  │
│  └─────────────────────────────────────────────────────┘  │
│         │                  │                  │           │
│  ┌──────▼────────┐  ┌──────▼────────┐  ┌──────▼────────┐  │
│  │ WAC           │  │ Proxmox       │  │ Dashboards    │  │
│  │ (VLAN 60)     │  │ UI            │  │ (VLAN 40)     │  │
│  └───────────────┘  └───────────────┘  └───────────────┘  │
└───────────────────────────────────────────────────────────┘
```

## Components

### Cloudflare Tunnel (cloudflared)

- **Purpose:** Secure outbound connection to Cloudflare network
- **Location:** Ubuntu VM in VLAN 99 (DMZ)
- **Protocol:** Outbound HTTPS (443) only
- **Benefits:** No inbound ports required, encrypted tunnel

### Zero Trust Policies

- **SSO Integration:** Azure AD, Okta, or other identity providers
- **MFA Requirements:** Multi-factor authentication enforcement
- **Device Posture:** Device health and compliance checks
- **Access Policies:** Least privilege access control

### WAF (Web Application Firewall)

- **Purpose:** Protect public ingress from attacks
- **Rules:** Custom WAF rules for application protection
- **Integration:** Works with Tunnel endpoints

## Installation

### Prerequisites

- Cloudflare account with Zero Trust enabled
- Ubuntu VM deployed in VLAN 99
- Network connectivity from Tunnel VM to services
- Azure AD or other SSO provider (optional)

### Environment Configuration

Before starting, ensure your `.env` file is configured with Cloudflare credentials:

```bash
# Copy template if not already done
cp .env.example .env

# Edit .env and set:
# - CLOUDFLARE_API_TOKEN (get from https://dash.cloudflare.com/profile/api-tokens)
# - CLOUDFLARE_ACCOUNT_EMAIL
# - CLOUDFLARE_ZONE_ID (optional)
```

### Step 1: Create Cloudflare Zero Trust Organization

1. Log in to [Cloudflare Dashboard](https://dash.cloudflare.com)
2. Navigate to Zero Trust
3. Create or select organization
4. Note your organization name

**Note**: If using automation scripts, ensure `CLOUDFLARE_API_TOKEN` is set in your `.env` file.

### Step 2: Install cloudflared

On the Ubuntu Tunnel VM:

```bash
# Download and install cloudflared (writing to /usr/local/bin requires root)
sudo curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
sudo chmod +x /usr/local/bin/cloudflared

# Verify installation
cloudflared --version
```

### Step 3: Authenticate cloudflared
|
||||
|
||||
```bash
|
||||
# Option 1: Interactive login (recommended for first-time setup)
|
||||
cloudflared tunnel login
|
||||
|
||||
# This will open a browser for authentication
|
||||
# Follow the prompts to authenticate
|
||||
|
||||
# Option 2: Using API token from .env (for automation)
|
||||
# Load environment variables if using .env
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
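# Note: This guide expects tunnel credentials at /etc/cloudflared/<tunnel-id>.json.
# `cloudflared tunnel create` (Step 4) writes them under ~/.cloudflared/ by default, so copy the file into place.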
|
||||
# This file should be secured (chmod 600) and not committed to version control
|
||||
```
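The note about securing the credentials file can be checked mechanically. This is a minimal sketch, assuming an exact mode of `600` is the target; the function name `creds_are_private` is hypothetical, and the demo uses a temporary file in place of the real credentials file.

```bash
#!/usr/bin/env bash
# creds_are_private FILE: succeed only if FILE exists with mode exactly 600.
# Hypothetical helper; `find -perm 600` matches the exact permission bits.
creds_are_private() {
  [ -f "$1" ] && [ -n "$(find "$1" -maxdepth 0 -perm 600 2>/dev/null)" ]
}

# Demo on a temporary file standing in for /etc/cloudflared/<tunnel-id>.json.
demo_cred="$(mktemp)"
chmod 644 "$demo_cred"
creds_are_private "$demo_cred" || echo "too permissive, fixing"
chmod 600 "$demo_cred"
creds_are_private "$demo_cred" && echo "credentials file is private"
```

On the Tunnel VM the same function would be called with the real credentials path.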
|
||||
|
||||
### Step 4: Create Tunnel
|
||||
|
||||
```bash
|
||||
# Create a new tunnel
|
||||
cloudflared tunnel create azure-stack-hci
|
||||
|
||||
# Note the tunnel ID for configuration
|
||||
```
|
||||
|
||||
## Configuration
|
||||
|
||||
### Tunnel Configuration File
|
||||
|
||||
Create `/etc/cloudflared/config.yml`:
|
||||
|
||||
```yaml
|
||||
tunnel: <tunnel-id>
credentials-file: /etc/cloudflared/<tunnel-id>.json

ingress:
  # Windows Admin Center
  - hostname: wac.yourdomain.com
    service: https://10.10.60.20:443
    originRequest:
      noHappyEyeballs: true
      tcpKeepAlive: 30s

  # Proxmox UI
  - hostname: proxmox.yourdomain.com
    service: https://10.10.60.10:8006
    originRequest:
      noHappyEyeballs: true
      tcpKeepAlive: 30s

  # Grafana Dashboard
  - hostname: grafana.yourdomain.com
    service: http://10.10.40.10:3000
    originRequest:
      noHappyEyeballs: true

  # Git Server
  - hostname: git.yourdomain.com
    service: https://10.10.30.10:443
    originRequest:
      noHappyEyeballs: true

  # CI/CD
  - hostname: ci.yourdomain.com
    service: https://10.10.50.10:443
    originRequest:
      noHappyEyeballs: true

  # Catch-all (must be last)
  - service: http_status:404
|
||||
```
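Because the catch-all rule must come last, it can help to generate the ingress section rather than edit it by hand. The following is a rough sketch, assuming each rule needs only a hostname and a service URL (no `originRequest` options); `render_ingress` is a hypothetical helper, not part of `cloudflared`.

```bash
#!/usr/bin/env bash
# render_ingress TUNNEL_ID "hostname=service" ...
# Hypothetical helper: prints a config.yml body with the catch-all rule always last.
render_ingress() {
  local tunnel_id="$1"; shift
  printf 'tunnel: %s\n' "$tunnel_id"
  printf 'credentials-file: /etc/cloudflared/%s.json\n\n' "$tunnel_id"
  printf 'ingress:\n'
  local pair
  for pair in "$@"; do
    # Split on the first "=": left side is the hostname, right side the origin URL.
    printf '  - hostname: %s\n    service: %s\n' "${pair%%=*}" "${pair#*=}"
  done
  printf '  - service: http_status:404\n'
}

render_ingress "<tunnel-id>" \
  "wac.yourdomain.com=https://10.10.60.20:443" \
  "grafana.yourdomain.com=http://10.10.40.10:3000"
```

After writing the result to `/etc/cloudflared/config.yml`, running `cloudflared tunnel ingress validate` should report any rule ordering or syntax problems before the service is started.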
|
||||
|
||||
### DNS Configuration
|
||||
|
||||
In Cloudflare Dashboard:
|
||||
|
||||
1. Navigate to Zero Trust > Access > Tunnels
|
||||
2. Select your tunnel
|
||||
3. Configure public hostnames:
|
||||
- `wac.yourdomain.com` → Tunnel
|
||||
- `proxmox.yourdomain.com` → Tunnel
|
||||
- `grafana.yourdomain.com` → Tunnel
|
||||
- `git.yourdomain.com` → Tunnel
|
||||
- `ci.yourdomain.com` → Tunnel
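The same hostname records can also be created from the Tunnel VM with `cloudflared tunnel route dns`. The sketch below only prints the commands unless `APPLY=1` is set; `APPLY`, `DOMAIN`, `HOSTNAMES`, and `route_dns` are names introduced here for illustration, and the tunnel name is the one created in Step 4.

```bash
#!/usr/bin/env bash
# Print (or, with APPLY=1, run) one DNS route command per public hostname.
# TUNNEL_NAME, DOMAIN, HOSTNAMES and APPLY are illustrative variables for this sketch.
TUNNEL_NAME="${TUNNEL_NAME:-azure-stack-hci}"
DOMAIN="${DOMAIN:-yourdomain.com}"
HOSTNAMES="wac proxmox grafana git ci"

route_dns() {
  local name
  for name in $HOSTNAMES; do
    if [ "${APPLY:-0}" = "1" ]; then
      cloudflared tunnel route dns "$TUNNEL_NAME" "${name}.${DOMAIN}"
    else
      echo "cloudflared tunnel route dns ${TUNNEL_NAME} ${name}.${DOMAIN}"
    fi
  done
}

route_dns
```

Running with `APPLY=1` requires that `cloudflared tunnel login` has already been completed for the zone.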
|
||||
|
||||
### Systemd Service
|
||||
|
||||
Create `/etc/systemd/system/cloudflared.service`:
|
||||
|
||||
```ini
|
||||
[Unit]
|
||||
Description=Cloudflare Tunnel
|
||||
After=network-online.target
Wants=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=simple
|
||||
User=cloudflared
|
||||
ExecStart=/usr/local/bin/cloudflared tunnel --config /etc/cloudflared/config.yml run
|
||||
Restart=on-failure
|
||||
RestartSec=5s
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
```
|
||||
|
||||
Enable and start:
|
||||
|
||||
```bash
|
||||
sudo systemctl enable cloudflared
|
||||
sudo systemctl start cloudflared
|
||||
sudo systemctl status cloudflared
|
||||
```
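The unit file runs as `User=cloudflared`, so that account and readable configuration files must exist before the service starts. A minimal sketch of the preparation follows, assuming the credentials file was written under the invoking user's `~/.cloudflared/`; `run`, `APPLY`, `CRED_SRC`, and `prepare_cloudflared_user` are hypothetical names, and commands are only printed unless `APPLY=1` is set.

```bash
#!/usr/bin/env bash
# Prepare the account and files that the unit's User=cloudflared line relies on.
# Commands are printed by default; set APPLY=1 and run as root to execute them.
TUNNEL_ID="${TUNNEL_ID:-<tunnel-id>}"
CRED_SRC="${CRED_SRC:-$HOME/.cloudflared/${TUNNEL_ID}.json}"   # assumed source location

run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

prepare_cloudflared_user() {
  # Dedicated system account with no login shell and no home directory.
  run useradd --system --user-group --no-create-home --shell /usr/sbin/nologin cloudflared
  # Config directory and credentials owned by the service account, credentials at mode 600.
  run install -d -o cloudflared -g cloudflared -m 750 /etc/cloudflared
  run install -o cloudflared -g cloudflared -m 600 "$CRED_SRC" "/etc/cloudflared/${TUNNEL_ID}.json"
  run chown cloudflared:cloudflared /etc/cloudflared/config.yml
}

prepare_cloudflared_user
```

Reviewing the printed commands first keeps the step auditable before anything is changed on the VM.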
|
||||
|
||||
## Zero Trust Policies
|
||||
|
||||
### SSO Configuration
|
||||
|
||||
1. Navigate to Zero Trust > Access > Authentication
|
||||
2. Add identity provider:
|
||||
- **Azure AD:** Configure Azure AD app registration
|
||||
- **Okta:** Configure Okta application
|
||||
- **Other:** Follow provider-specific instructions
|
||||
|
||||
### Access Policies
|
||||
|
||||
1. Navigate to Zero Trust > Access > Applications
|
||||
2. Create application:
|
||||
- **Application name:** WAC Access
|
||||
- **Application domain:** `wac.yourdomain.com`
|
||||
- **Session duration:** 24 hours
|
||||
3. Configure policy:
|
||||
- **Action:** Allow
|
||||
- **Include:**
|
||||
- Emails: `admin@yourdomain.com`
|
||||
- Groups: `IT-Admins`
|
||||
- **Require:**
|
||||
- MFA: Yes
|
||||
- Device posture: Optional
|
||||
|
||||
### Device Posture Checks
|
||||
|
||||
1. Navigate to Zero Trust > Settings > WARP
|
||||
2. Configure device posture:
|
||||
- **OS version:** Require minimum OS version
|
||||
- **Disk encryption:** Require disk encryption
|
||||
- **Firewall:** Require firewall enabled
|
||||
|
||||
## WAF Configuration
|
||||
|
||||
### WAF Rules
|
||||
|
||||
1. Navigate to Security > WAF
|
||||
2. Create custom rules:
|
||||
|
||||
**Rule 1: Block Common Attacks**
|
||||
- **Expression:** `(http.request.uri.path contains "/wp-admin" or http.request.uri.path contains "/phpmyadmin")`
|
||||
- **Action:** Block
|
||||
|
||||
**Rule 2: Rate Limiting**
|
||||
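- **Configuration:** Create this as a rate limiting rule rather than a custom rule expression (`rate(10m) > 100` is not valid rule syntax): match the tunnel hostnames and trigger above 100 requests per 10 minutes per client IP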
|
||||
- **Action:** Challenge
|
||||
|
||||
**Rule 3: Geographic Restrictions**
|
||||
- **Expression:** `(not ip.src.country in {"US" "CA"})`
|
||||
- **Action:** Block (if needed)
|
||||
|
||||
## Proxmox Tunnel Example
|
||||
|
||||
### Community Patterns
|
||||
|
||||
For exposing Proxmox UI through Cloudflare Tunnel:
|
||||
|
||||
```yaml
|
||||
# In config.yml
ingress:
  - hostname: proxmox.yourdomain.com
    service: https://10.10.60.10:8006
    originRequest:
      noHappyEyeballs: true
      tcpKeepAlive: 30s
      connectTimeout: 10s
      tlsTimeout: 10s
      keepAliveTimeout: 30s
      httpHostHeader: proxmox.yourdomain.com
|
||||
```
|
||||
|
||||
### Proxmox Certificate Considerations
|
||||
|
||||
- Proxmox uses self-signed certificates by default
|
||||
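- Cloudflare terminates public TLS at its edge; `cloudflared` then opens a separate HTTPS connection to Proxmox and verifies the origin certificate by default, so a self-signed certificate requires `noTLSVerify: true` or a trusted CA supplied through `caPool` under `originRequest`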
|
||||
- Consider using Cloudflare's SSL/TLS mode: "Full (strict)" if using valid certificates
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Tunnel Status
|
||||
|
||||
```bash
|
||||
# Check tunnel status
|
||||
sudo systemctl status cloudflared
|
||||
|
||||
# View tunnel logs
|
||||
sudo journalctl -u cloudflared -f
|
||||
|
||||
# Test tunnel connectivity
|
||||
cloudflared tunnel info <tunnel-id>
|
||||
```
|
||||
|
||||
### Cloudflare Dashboard
|
||||
|
||||
- Navigate to Zero Trust > Access > Tunnels
|
||||
- View tunnel status and metrics
|
||||
- Monitor connection health
|
||||
- Review access logs
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Tunnel Not Connecting
|
||||
|
||||
**Problem:** Tunnel shows as disconnected
|
||||
- **Solution:**
|
||||
- Check network connectivity from VM
|
||||
- Verify credentials file exists
|
||||
- Check cloudflared service status
|
||||
- Review logs: `journalctl -u cloudflared`
|
||||
|
||||
### Services Not Accessible
|
||||
|
||||
**Problem:** Can't access services through Tunnel
|
||||
- **Solution:**
|
||||
- Verify ingress rules in config.yml
|
||||
- Check service connectivity from Tunnel VM
|
||||
- Verify DNS configuration
|
||||
- Check Zero Trust policies
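Checking service connectivity from the Tunnel VM can be scripted. The following is a minimal sketch, assuming bash and the coreutils `timeout` command are available; `check_origin` is a hypothetical helper that only tests whether a TCP connection opens, not whether the service behind it is healthy.

```bash
#!/usr/bin/env bash
# check_origin HOST PORT: succeed if a TCP connection opens within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
check_origin() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Pass origins as host:port arguments, for example the ones from config.yml:
#   ./check-origins.sh 10.10.60.20:443 10.10.60.10:8006 10.10.40.10:3000
for origin in "$@"; do
  if check_origin "${origin%%:*}" "${origin##*:}"; then
    echo "reachable:   $origin"
  else
    echo "unreachable: $origin"
  fi
done
```

An origin reported as unreachable here points to routing or firewall rules between VLAN 99 and the service VLAN rather than to the tunnel itself.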
|
||||
|
||||
### Authentication Issues
|
||||
|
||||
**Problem:** SSO not working
|
||||
- **Solution:**
|
||||
- Verify identity provider configuration
|
||||
- Check application policies
|
||||
- Verify user email addresses
|
||||
- Check MFA configuration
|
||||
|
||||
### Performance Issues
|
||||
|
||||
**Problem:** Slow performance through Tunnel
|
||||
- **Solution:**
|
||||
- Check network latency
|
||||
- Verify originRequest settings
|
||||
- Consider using Cloudflare's Argo Smart Routing
|
||||
- Review WAF rules for false positives
|
||||
|
||||
## Security Best Practices
|
||||
|
||||
1. **Use Zero Trust Policies:**
|
||||
- Always require authentication
|
||||
- Enforce MFA for sensitive services
|
||||
- Use device posture checks
|
||||
|
||||
2. **WAF Rules:**
|
||||
- Enable WAF for all public endpoints
|
||||
- Configure rate limiting
|
||||
- Block known attack patterns
|
||||
|
||||
3. **Tunnel Security:**
|
||||
- Run cloudflared as non-root user
|
||||
- Secure credentials file (chmod 600)
|
||||
- Monitor tunnel logs for anomalies
|
||||
|
||||
4. **Network Isolation:**
|
||||
- Keep Tunnel VM in DMZ (VLAN 99)
|
||||
- Use firewall rules to restrict access
|
||||
- Only allow necessary ports
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Complete Architecture](complete-architecture.md) - Full architecture overview
|
||||
- [Network Topology](network-topology.md) - VLAN/IP schema
|
||||
- [Bring-Up Checklist](bring-up-checklist.md) - Installation guide
|
||||
|
||||
485
docs/deployment/deployment-guide.md
Normal file
@@ -0,0 +1,485 @@
|
||||
# Deployment Guide
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before starting the deployment, ensure you have:
|
||||
|
||||
1. **Two Proxmox VE hosts** with:
|
||||
- Proxmox VE 7.0+ installed
|
||||
- Static IP addresses configured
|
||||
- At least 8GB RAM per node
|
||||
- Network connectivity between nodes
|
||||
- Root or sudo access
|
||||
|
||||
2. **Azure Subscription** with:
|
||||
- Azure CLI installed and authenticated
|
||||
- Contributor role on subscription
|
||||
- Resource group creation permissions
|
||||
|
||||
3. **Network Requirements**:
|
||||
- Static IP addresses for all nodes
|
||||
- DNS resolution (or hosts file)
|
||||
- Internet access for Azure Arc connectivity
|
||||
- NFS server (optional, for shared storage)
|
||||
|
||||
4. **Tools Installed**:
|
||||
- SSH client
|
||||
- kubectl
|
||||
- helm (optional)
|
||||
- terraform (optional)
|
||||
|
||||
5. **Environment Configuration**:
|
||||
- Copy `.env.example` to `.env` and fill in all credentials
|
||||
- See [Configuration](#configuration) section for details
|
||||
|
||||
## Configuration
|
||||
|
||||
### Environment Variables Setup
|
||||
|
||||
Before starting deployment, configure your environment variables:
|
||||
|
||||
1. **Copy the template:**
|
||||
```bash
|
||||
cp .env.example .env
|
||||
```
|
||||
|
||||
2. **Edit `.env` with your credentials:**
|
||||
- Azure credentials: `AZURE_SUBSCRIPTION_ID`, `AZURE_TENANT_ID`
|
||||
- Cloudflare: `CLOUDFLARE_API_TOKEN`
|
||||
- Proxmox: `PVE_ROOT_PASS` (shared root password for all instances)
|
||||
- Proxmox ML110: `PROXMOX_ML110_URL`
|
||||
- Proxmox R630: `PROXMOX_R630_URL`
|
||||
|
||||
**Note**: The username `root@pam` is implied and should not be stored. For production operations, use RBAC accounts and API tokens instead of root credentials.
|
||||
|
||||
3. **Load environment variables:**
|
||||
```bash
|
||||
# Source the .env file
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
```
|
||||
|
||||
**Note**: All scripts in this guide will use environment variables from `.env` if available. You can also set them manually using `export` commands.
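The `export $(... | xargs)` one-liner splits on whitespace, so a value containing spaces would be broken apart. As a hedged alternative, assuming `.env` is trusted and written in shell-compatible `KEY=VALUE` syntax, the file can be sourced with auto-export enabled. `load_env` and the `DEMO_*` variables below are hypothetical names used only for this sketch.

```bash
#!/usr/bin/env bash
# load_env FILE: export every KEY=VALUE assignment in FILE (default: .env).
# Assumes FILE is trusted and uses shell-compatible syntax, because it is sourced.
load_env() {
  local file="${1:-.env}"
  [ -f "$file" ] || { echo "missing env file: $file" >&2; return 1; }
  # Prefix bare names with ./ so `.` does not search PATH for the file.
  case "$file" in */*) ;; *) file="./$file" ;; esac
  set -a          # auto-export every variable assigned while sourcing
  # shellcheck disable=SC1090
  . "$file"
  set +a
}

# Demo with a throwaway file; the variable names are illustrative only.
demo_env="$(mktemp)"
cat > "$demo_env" <<'EOF'
# comment lines are ignored by the shell
DEMO_LOCATION=eastus
DEMO_TAGS="type=proxmox,environment=hybrid with spaces"
EOF
load_env "$demo_env"
rm -f "$demo_env"
echo "$DEMO_TAGS"
```

Values that contain spaces or shell metacharacters need to be quoted in `.env` for this approach to work.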
|
||||
|
||||
## Deployment Phases
|
||||
|
||||
### Phase 1: Proxmox Cluster Setup
|
||||
|
||||
#### Step 1.1: Configure Network on Both Nodes
|
||||
|
||||
On each Proxmox node:
|
||||
|
||||
```bash
|
||||
# Option 1: Use .env file (recommended)
|
||||
# Load environment variables from .env
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
# Option 2: Set environment variables manually
|
||||
export NODE_IP=192.168.1.10 # Use appropriate IP for each node
|
||||
export NODE_GATEWAY=192.168.1.1
|
||||
export NODE_NETMASK=24
|
||||
export NODE_HOSTNAME=pve-node-1 # Use appropriate hostname
|
||||
|
||||
# Run network configuration script
|
||||
cd /path/to/loc_az_hci
|
||||
./infrastructure/proxmox/network-config.sh
|
||||
```
|
||||
|
||||
**For Node 2**, repeat with appropriate values:
|
||||
```bash
|
||||
export NODE_IP=192.168.1.11
|
||||
export NODE_HOSTNAME=pve-node-2
|
||||
./infrastructure/proxmox/network-config.sh
|
||||
```
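Since Node 2 reuses the gateway and netmask exported earlier, a missing variable is easy to overlook in a fresh shell. A small pre-flight check, sketched here with a hypothetical `require_vars` helper, can confirm all four values are set before `network-config.sh` runs.

```bash
#!/usr/bin/env bash
# require_vars NAME...: report every unset or empty variable, then fail if any were found.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example values from this step; replace them per node.
export NODE_IP=192.168.1.10 NODE_GATEWAY=192.168.1.1 NODE_NETMASK=24 NODE_HOSTNAME=pve-node-1
require_vars NODE_IP NODE_GATEWAY NODE_NETMASK NODE_HOSTNAME && echo "network variables OK"
```

The check reports all missing names in one pass, which is more convenient than failing on the first one.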
|
||||
|
||||
#### Step 1.2: Update Proxmox Repositories
|
||||
|
||||
On both nodes:
|
||||
|
||||
```bash
|
||||
# Update to subscription-free repos
|
||||
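# Disable the enterprise repository (a plain rename would also rewrite the repository hostname)
sed -i 's/^deb/# deb/' /etc/apt/sources.list.d/pve-enterprise.list
# Add the no-subscription repository for the installed Debian release
echo "deb http://download.proxmox.com/debian/pve $(. /etc/os-release && echo "$VERSION_CODENAME") pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list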
|
||||
apt update && apt dist-upgrade -y
|
||||
```
|
||||
|
||||
#### Step 1.3: Configure Shared Storage (NFS)
|
||||
|
||||
**Option A: Using existing NFS server**
|
||||
|
||||
On both Proxmox nodes:
|
||||
|
||||
```bash
|
||||
export NFS_SERVER=192.168.1.100
|
||||
export NFS_PATH=/mnt/proxmox-storage
|
||||
export STORAGE_NAME=nfs-shared
|
||||
|
||||
./infrastructure/proxmox/nfs-storage.sh
|
||||
```
|
||||
|
||||
**Option B: Set up NFS server**
|
||||
|
||||
If you need to set up an NFS server, install and configure it on a separate machine or VM.
|
||||
|
||||
#### Step 1.4: Create Proxmox Cluster
|
||||
|
||||
**On Node 1** (cluster creator):
|
||||
|
||||
```bash
|
||||
export NODE_ROLE=create
|
||||
export CLUSTER_NAME=hc-cluster
|
||||
|
||||
./infrastructure/proxmox/cluster-setup.sh
|
||||
```
|
||||
|
||||
**On Node 2** (join cluster):
|
||||
|
||||
```bash
|
||||
export NODE_ROLE=join
|
||||
export CLUSTER_NODE_IP=192.168.1.10 # IP of Node 1
|
||||
export ROOT_PASSWORD=your-root-password # Optional, will prompt if not set
|
||||
|
||||
./infrastructure/proxmox/cluster-setup.sh
|
||||
```
|
||||
|
||||
**Verify cluster**:
|
||||
|
||||
```bash
|
||||
pvecm status
|
||||
pvecm nodes
|
||||
```
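For scripted verification, the quorum state can be read from the `pvecm status` output. This is a minimal sketch that assumes the output contains a `Quorate:` line with `Yes` or `No`; `is_quorate` is a hypothetical helper, and the sample text is illustrative rather than captured from a real cluster.

```bash
#!/usr/bin/env bash
# is_quorate: read `pvecm status` output on stdin; succeed only if it reports quorum.
is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes[[:space:]]*$'
}

# Illustrative excerpt of `pvecm status` output (not captured from a real cluster).
sample_status='Quorum information
------------------
Nodes:            2
Quorate:          Yes'

if printf '%s\n' "$sample_status" | is_quorate; then
  echo "cluster is quorate"
else
  echo "cluster has lost quorum" >&2
fi
```

On a node, the helper would be used as `pvecm status | is_quorate`. Note that a two-node cluster loses quorum whenever one node is offline, so a failed check during maintenance is expected.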
|
||||
|
||||
### Phase 2: Azure Arc Integration
|
||||
|
||||
#### Step 2.1: Prepare Azure Environment
|
||||
|
||||
```bash
|
||||
# Load environment variables from .env (if using .env file)
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
# Login to Azure
|
||||
az login
|
||||
|
||||
# Set subscription (use from .env or set manually)
|
||||
az account set --subscription "${AZURE_SUBSCRIPTION_ID:-your-subscription-id}"
|
||||
|
||||
# Create resource group (if not exists)
|
||||
az group create --name "${AZURE_RESOURCE_GROUP:-HC-Stack}" --location "${AZURE_LOCATION:-eastus}"
|
||||
```
|
||||
|
||||
#### Step 2.2: Onboard Proxmox Hosts to Azure Arc
|
||||
|
||||
On each Proxmox node:
|
||||
|
||||
```bash
|
||||
# Load environment variables from .env (if using .env file)
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
# Set Azure variables (use from .env or get from Azure CLI)
|
||||
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
|
||||
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
|
||||
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}"
|
||||
export LOCATION="${AZURE_LOCATION:-eastus}"
|
||||
export TAGS="type=proxmox,environment=hybrid"
|
||||
|
||||
./scripts/azure-arc/onboard-proxmox-hosts.sh
|
||||
```
|
||||
|
||||
**Verify in Azure Portal**:
|
||||
- Navigate to: Azure Portal → Azure Arc → Servers
|
||||
- You should see both Proxmox nodes
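Newly onboarded machines may take a short while to appear, so a scripted check benefits from retrying. The sketch below assumes an authenticated Azure CLI with the `connectedmachine` extension; `retry` and `arc_machine_exists` are hypothetical helpers, and the machine name `pve-node-1` is the example hostname from Phase 1.

```bash
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND...: rerun COMMAND until it succeeds or attempts run out.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    echo "attempt ${i}/${attempts} failed: $*" >&2
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Hypothetical check: succeeds once the machine resource exists in the resource group.
arc_machine_exists() {
  az connectedmachine show --name "$1" --resource-group "${RESOURCE_GROUP:-HC-Stack}" --output none
}

# Example (needs an authenticated Azure CLI with the connectedmachine extension):
#   retry 10 30 arc_machine_exists pve-node-1
retry 3 0 true && echo "retry helper OK"
```

The delay and attempt count are arbitrary starting points and can be tuned to how quickly the agents register in a given environment.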
|
||||
|
||||
#### Step 2.3: Create VMs for Kubernetes and Git
|
||||
|
||||
Create VMs in Proxmox web UI or using Terraform:
|
||||
|
||||
```bash
|
||||
# Load environment variables from .env
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
cd terraform/proxmox
|
||||
# Create terraform.tfvars from environment variables or edit manually
|
||||
cat > terraform.tfvars <<EOF
|
||||
proxmox_host = "${PROXMOX_ML110_URL#https://}"
|
||||
proxmox_username = "root@pam" # Hardcoded, not from env (best practice)
|
||||
proxmox_password = "${PVE_ROOT_PASS}"
|
||||
proxmox_node = "pve-node-1"
|
||||
EOF
|
||||
|
||||
terraform init
|
||||
terraform plan
|
||||
terraform apply
|
||||
```
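Writing the root password into `terraform.tfvars` leaves it on disk in plain text. One alternative, sketched here under the assumption that the Terraform configuration declares a variable named `proxmox_password`, is to pass it through the `TF_VAR_proxmox_password` environment variable and keep it out of the file. `write_tfvars` is a hypothetical helper, and the placeholder URL and password are for a dry run only.

```bash
#!/usr/bin/env bash
# Write terraform.tfvars without the password and pass the secret through the environment.
# Terraform reads TF_VAR_<name> for a declared variable <name>; proxmox_password is assumed
# to be declared in the configuration under terraform/proxmox.
write_tfvars() {
  local out="$1"
  (
    umask 077   # create the file readable by the current user only
    cat > "$out" <<EOF
proxmox_host     = "${PROXMOX_ML110_URL#https://}"
proxmox_username = "root@pam"
proxmox_node     = "pve-node-1"
EOF
  )
}

# Placeholder values for a dry run; real values come from .env.
PROXMOX_ML110_URL="${PROXMOX_ML110_URL:-https://pve.example.invalid:8006}"
export TF_VAR_proxmox_password="${PVE_ROOT_PASS:-changeme}"

workdir="$(mktemp -d)"
write_tfvars "$workdir/terraform.tfvars"
cat "$workdir/terraform.tfvars"
```

With the variable exported, `terraform plan` and `terraform apply` pick up the password without it appearing in `terraform.tfvars` or in version control.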
|
||||
|
||||
#### Step 2.4: Onboard VMs to Azure Arc
|
||||
|
||||
For each VM:
|
||||
|
||||
```bash
|
||||
# Load environment variables from .env
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
export VM_IP=192.168.1.188
|
||||
export VM_USER=ubuntu
|
||||
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
|
||||
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
|
||||
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}"
|
||||
export LOCATION="${AZURE_LOCATION:-eastus}"
|
||||
|
||||
./scripts/azure-arc/onboard-vms.sh
|
||||
```
|
||||
|
||||
### Phase 3: Kubernetes Setup
|
||||
|
||||
#### Step 3.1: Install K3s
|
||||
|
||||
On the VM designated for Kubernetes:
|
||||
|
||||
```bash
|
||||
export INSTALL_MODE=local
|
||||
export K3S_VERSION=latest
|
||||
|
||||
./infrastructure/kubernetes/k3s-install.sh
|
||||
```
|
||||
|
||||
**Or install remotely**:
|
||||
|
||||
```bash
|
||||
export INSTALL_MODE=remote
|
||||
export REMOTE_IP=192.168.1.188
|
||||
export REMOTE_USER=ubuntu
|
||||
|
||||
./infrastructure/kubernetes/k3s-install.sh
|
||||
```
|
||||
|
||||
#### Step 3.2: Onboard Kubernetes to Azure Arc
|
||||
|
||||
```bash
|
||||
# Load environment variables from .env
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
export RESOURCE_GROUP="${AZURE_RESOURCE_GROUP:-HC-Stack}"
|
||||
export TENANT_ID="${AZURE_TENANT_ID:-$(az account show --query tenantId -o tsv)}"
|
||||
export SUBSCRIPTION_ID="${AZURE_SUBSCRIPTION_ID:-$(az account show --query id -o tsv)}"
|
||||
export LOCATION="${AZURE_LOCATION:-eastus}"
|
||||
export CLUSTER_NAME=proxmox-k3s-cluster
|
||||
|
||||
# Ensure kubeconfig is set
|
||||
export KUBECONFIG=~/.kube/config
|
||||
|
||||
./infrastructure/kubernetes/arc-onboard-k8s.sh
|
||||
```
|
||||
|
||||
**Verify in Azure Portal**:
|
||||
- Navigate to: Azure Portal → Azure Arc → Kubernetes
|
||||
- You should see your cluster
|
||||
|
||||
#### Step 3.3: Install Base Infrastructure
|
||||
|
||||
```bash
|
||||
# Apply namespace and base infrastructure
|
||||
kubectl apply -f gitops/infrastructure/namespace.yaml
|
||||
kubectl apply -f gitops/infrastructure/ingress-controller.yaml
|
||||
kubectl apply -f gitops/infrastructure/cert-manager.yaml
|
||||
```
|
||||
|
||||
### Phase 4: Git/DevOps Setup
|
||||
|
||||
#### Option A: Deploy Gitea (Recommended for small deployments)
|
||||
|
||||
```bash
|
||||
export GITEA_DOMAIN=git.local
|
||||
export GITEA_PORT=3000
|
||||
|
||||
./infrastructure/gitops/gitea-deploy.sh
|
||||
```
|
||||
|
||||
Access Gitea at `http://git.local:3000` and complete initial setup.
|
||||
|
||||
#### Option B: Deploy GitLab CE
|
||||
|
||||
```bash
|
||||
export GITLAB_DOMAIN=gitlab.local
|
||||
export GITLAB_PORT=8080
|
||||
|
||||
./infrastructure/gitops/gitlab-deploy.sh
|
||||
```
|
||||
|
||||
**Note**: GitLab requires at least 8GB RAM.
|
||||
|
||||
#### Option C: Azure DevOps Self-Hosted Agent
|
||||
|
||||
On a VM:
|
||||
|
||||
```bash
|
||||
# Load environment variables from .env
|
||||
export $(cat .env | grep -v '^#' | xargs)
|
||||
|
||||
export AZP_URL="${AZP_URL:-https://dev.azure.com/yourorg}"
|
||||
export AZP_TOKEN="${AZP_TOKEN:-your-personal-access-token}"
|
||||
export AZP_AGENT_NAME=proxmox-agent-1
|
||||
export AZP_POOL=Default
|
||||
|
||||
./infrastructure/gitops/azure-devops-agent.sh
|
||||
```
|
||||
|
||||
### Phase 5: Configure GitOps
|
||||
|
||||
#### Step 5.1: Create Git Repository
|
||||
|
||||
1. Create a new repository in your Git server (Gitea/GitLab)
|
||||
2. Clone the repository locally
|
||||
3. Copy the `gitops/` directory to your repository
|
||||
4. Commit and push:
|
||||
|
||||
```bash
|
||||
git clone http://git.local:3000/user/gitops-repo.git
|
||||
cd gitops-repo
|
||||
cp -r /path/to/loc_az_hci/gitops .
|
||||
git add .
|
||||
git commit -m "Initial GitOps configuration"
|
||||
git push
|
||||
```
|
||||
|
||||
#### Step 5.2: Connect GitOps to Azure Arc
|
||||
|
||||
In Azure Portal:
|
||||
|
||||
1. Navigate to: Azure Arc → Kubernetes → Your cluster
|
||||
2. Go to "GitOps" section
|
||||
3. Click "Add configuration"
|
||||
4. Configure:
|
||||
- Repository URL: `http://git.local:3000/user/gitops-repo.git`
|
||||
- Branch: `main`
|
||||
- Path: `gitops/`
|
||||
- Authentication: Configure as needed
|
||||
|
||||
### Phase 6: Deploy HC Stack Services
|
||||
|
||||
#### Option A: Deploy via GitOps (Recommended)
|
||||
|
||||
1. Update Helm chart values in your Git repository
|
||||
2. Commit and push changes
|
||||
3. Flux will automatically deploy updates
|
||||
|
||||
#### Option B: Deploy Manually with Helm
|
||||
|
||||
```bash
|
||||
# Add Helm charts
|
||||
helm install besu ./gitops/apps/besu -n blockchain
|
||||
helm install firefly ./gitops/apps/firefly -n blockchain
|
||||
helm install chainlink-ccip ./gitops/apps/chainlink-ccip -n blockchain
|
||||
helm install blockscout ./gitops/apps/blockscout -n blockchain
|
||||
helm install cacti ./gitops/apps/cacti -n monitoring
|
||||
helm install nginx-proxy ./gitops/apps/nginx-proxy -n hc-stack
|
||||
```
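The six installs can be driven from a single list so that reruns are idempotent. This sketch assumes `helm upgrade --install` with `--create-namespace` is acceptable in place of `helm install`; `CHARTS`, `APPLY`, and `deploy_charts` are names introduced for illustration, and commands are only printed unless `APPLY=1` is set.

```bash
#!/usr/bin/env bash
# Install each chart into its namespace; release names and paths follow the commands above.
# Prints the commands by default; set APPLY=1 to run them against the current kube context.
CHARTS="besu:blockchain firefly:blockchain chainlink-ccip:blockchain blockscout:blockchain cacti:monitoring nginx-proxy:hc-stack"

deploy_charts() {
  local entry release namespace
  for entry in $CHARTS; do
    release="${entry%%:*}"      # text before the colon
    namespace="${entry##*:}"    # text after the colon
    set -- helm upgrade --install "$release" "./gitops/apps/${release}" \
      --namespace "$namespace" --create-namespace
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
  done
}

deploy_charts
```

Using `upgrade --install` means the same script can be rerun after chart values change, instead of failing because a release already exists.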
|
||||
|
||||
#### Option C: Deploy with Terraform
|
||||
|
||||
```bash
|
||||
cd terraform/kubernetes
|
||||
terraform init
|
||||
terraform plan
|
||||
terraform apply
|
||||
```
|
||||
|
||||
### Phase 7: Verify Deployment
|
||||
|
||||
#### Check Proxmox Cluster
|
||||
|
||||
```bash
|
||||
pvecm status
|
||||
pvesm status
|
||||
```
|
||||
|
||||
#### Check Azure Arc
|
||||
|
||||
```bash
|
||||
# List Arc-enabled servers
|
||||
az connectedmachine list --resource-group HC-Stack -o table
|
||||
|
||||
# List Arc-enabled Kubernetes clusters
|
||||
az connectedk8s list --resource-group HC-Stack -o table
|
||||
```
|
||||
|
||||
#### Check Kubernetes
|
||||
|
||||
```bash
|
||||
kubectl get nodes
|
||||
kubectl get pods --all-namespaces
|
||||
kubectl get services --all-namespaces
|
||||
```
|
||||
|
||||
#### Check Applications
|
||||
|
||||
```bash
|
||||
# Check Besu
|
||||
kubectl get pods -n blockchain -l app=besu
|
||||
|
||||
# Check Firefly
|
||||
kubectl get pods -n blockchain -l app=firefly
|
||||
|
||||
# Check all services
|
||||
kubectl get all --all-namespaces
|
||||
```
|
||||
|
||||
## Post-Deployment Configuration
|
||||
|
||||
### 1. Configure Ingress
|
||||
|
||||
Update ingress configurations for external access:
|
||||
|
||||
```bash
|
||||
# Edit ingress resources
|
||||
kubectl edit ingress -n blockchain
|
||||
```
|
||||
|
||||
### 2. Set Up Monitoring
|
||||
|
||||
- Configure Cacti to monitor your infrastructure
|
||||
- Set up Azure Monitor alerts
|
||||
- Configure log aggregation
|
||||
|
||||
### 3. Configure Backup
|
||||
|
||||
- Set up Proxmox backup schedules
|
||||
- Configure Kubernetes backup (Velero)
|
||||
- Set up Azure Backup for Arc resources
|
||||
|
||||
### 4. Security Hardening
|
||||
|
||||
- Enable Azure Policy for compliance
|
||||
- Configure network policies
|
||||
- Set up RBAC
|
||||
- Enable Defender for Cloud
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
1. **Cluster creation fails**:
|
||||
- Check network connectivity between nodes
|
||||
- Verify firewall rules
|
||||
- Check Corosync configuration
|
||||
|
||||
2. **Azure Arc connection fails**:
|
||||
- Verify internet connectivity
|
||||
- Check Azure credentials
|
||||
- Review agent status and logs: `azcmagent show` and `journalctl -u himdsd`
|
||||
|
||||
3. **Kubernetes pods not starting**:
|
||||
- Check resource limits
|
||||
- Verify storage classes
|
||||
- Review pod logs: `kubectl logs <pod-name>`
|
||||
|
||||
4. **GitOps not syncing**:
|
||||
- Check Flux controller logs: `kubectl logs -n flux-system deploy/source-controller` and `kubectl logs -n flux-system deploy/kustomize-controller`
|
||||
- Verify repository access
|
||||
- Check GitOps configuration in Azure Portal
|
||||
|
||||
## Next Steps
|
||||
|
||||
1. Review architecture documentation
|
||||
2. Set up monitoring and alerting
|
||||
3. Configure backup and disaster recovery
|
||||
4. Implement security policies
|
||||
5. Plan for scaling and expansion
|
||||
|
||||