Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)

Co-authored-by: Cursor <cursoragent@cursor.com>
defiQUG
2026-02-08 09:04:46 -08:00
commit c39465c2bd
386 changed files with 50649 additions and 0 deletions

@@ -0,0 +1,309 @@
# Proxmox VE RBAC and Security Best Practices
## Overview
This document provides guidelines for implementing Role-Based Access Control (RBAC) and security best practices for Proxmox VE instances. The goal is to minimize root account usage and implement least-privilege access for all operational tasks.
## Root Account Usage
### When to Use Root
The `root@pam` account should **only** be used for:
- Initial system provisioning and setup
- Granting and adjusting permissions
- Emergency system recovery
- Security patches or updates that explicitly require superuser privileges
### Root Account Restrictions
- **Never** use root for daily operations
- **Never** create API tokens for root (bypasses RBAC and auditing)
- **Never** store root credentials in code repositories
- Store the root password only in a secure vault (a local `.env` file is acceptable for development only)
## Credential Management
### Environment Variables
Store only the minimal required secret:
```bash
PVE_ROOT_PASS="<secure, unique, strong-password>"
```
**Important:**
- Do not store the username (`root@pam`) in environment variables - it is implied
- Never commit `.env` files to version control
- Use `.env.example` for documentation templates only
- In production, use proper secret management (HashiCorp Vault, Azure Key Vault, etc.)
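A local `.env` file is only as safe as its file permissions. The sketch below is an illustration, not part of this repository: it rejects a `.env` file that other users can read. The function name `check_env_perms` and the 600/400 policy are assumptions, and `stat -c` is the GNU coreutils form.

```bash
#!/usr/bin/env bash
# Sketch: refuse a .env file with permissive mode bits.
# check_env_perms and the accepted modes are illustrative assumptions.

check_env_perms() {
    local env_file="$1" mode
    [ -f "$env_file" ] || { echo "missing: $env_file" >&2; return 2; }
    # GNU stat: %a prints the octal permission bits (for example 600)
    mode="$(stat -c '%a' "$env_file")"
    case "$mode" in
        600|400) return 0 ;;
        *) echo "insecure mode $mode on $env_file (expected 600)" >&2; return 1 ;;
    esac
}
```

A wrapper script could call `check_env_perms .env || exit 1` before sourcing the file.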
## RBAC Implementation
### Create Non-Root Operational Accounts
Create dedicated accounts for different operational roles:
**Service Accounts:**
- `svc-pve-automation@pve` - For automation scripts and CI/CD
- `svc-pve-monitoring@pve` - For monitoring and alerting systems
**Operator Accounts:**
- `devops-admin@pve` - For DevOps team members
- `readonly-monitor@pve` - For read-only monitoring and dashboards
### Standard PVE Roles
| Role Type | PVE Role Name | Purpose |
|------------------|-----------------|-------------------------------------|
| Read-only | `PVEAuditor` | Monitoring, dashboards, API polling |
| Limited VM admin | `PVEVMAdmin` | Manage VMs only (no host access) |
| Storage admin | `PVEDatastoreAdmin` | Manage storage (allocate space and templates) |
| Node admin | `PVESysAdmin` | Node audit, console, and syslog access without root |
### Creating Custom Roles
Example: Create a role that allows only start/stop/reset of VMs:
```bash
pveum role add VMControl --privs "VM.PowerMgmt"
```
Then assign to a user:
```bash
pveum acl modify /vms --users svc-pve-automation@pve --roles VMControl
```
### Assigning Roles
```bash
# Assign PVEAuditor role (read-only) to monitoring account
pveum acl modify / --users readonly-monitor@pve --roles PVEAuditor
# Assign PVEVMAdmin role to DevOps account
pveum acl modify /vms --users devops-admin@pve --roles PVEVMAdmin
# Assign custom role to service account
pveum acl modify /vms --users svc-pve-automation@pve --roles VMControl
```
## API Token Management
### Creating API Tokens
Create API tokens tied to RBAC accounts (not root):
```bash
# Create token for service account with expiration
# (--expire takes seconds since the epoch, not a calendar date)
pveum user token add svc-pve-automation@pve automation-token \
  --expire "$(date -d '+180 days' +%s)" --privsep 1
```
**Best Practices:**
- Always set expiration dates for tokens
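- Use `--privsep 1` to enable privilege separation, then grant the token its own ACL entries: a privilege-separated token starts with no permissions and can never exceed those of its user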
- Create separate tokens for different services/environments
- Document token purpose and rotation schedule
### Using API Tokens
In your `.env` file (for service accounts):
```bash
# Service account API token (not root)
PROXMOX_ML110_TOKEN_ID=svc-pve-automation@pve!automation-token
PROXMOX_ML110_TOKEN_SECRET=your-token-secret
```
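As a minimal sketch of how a script might consume these variables, the helper below builds the `Authorization` header that the PVE API expects for token authentication and issues a GET request. `PROXMOX_ML110_HOST` is a hypothetical variable not defined in this document; the function names are illustrative as well.

```bash
#!/usr/bin/env bash
# Sketch: query the PVE API with a token instead of a password.
# PROXMOX_ML110_HOST is a hypothetical variable; the token variables
# are the .env entries shown above.

# Header format: PVEAPIToken=<user>@<realm>!<tokenid>=<secret>
pve_auth_header() {
    printf 'Authorization: PVEAPIToken=%s=%s' \
        "$PROXMOX_ML110_TOKEN_ID" "$PROXMOX_ML110_TOKEN_SECRET"
}

# GET an API path such as /version; 8006 is the default PVE API port.
pve_api_get() {
    local path="$1"
    curl --silent --show-error --fail \
        --header "$(pve_auth_header)" \
        "https://${PROXMOX_ML110_HOST}:8006/api2/json${path}"
}
```

With a valid token, `pve_api_get /version` should return the PVE version as JSON. Against a self-signed certificate, curl will fail verification; passing `--cacert` with the cluster CA (typically `/etc/pve/pve-root-ca.pem`) is a safer option than `-k`.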
### Token Rotation
- Rotate tokens every 90-180 days
- Create new token before deleting old one
- Update all systems using the token
- Monitor for failed authentications during rotation
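The rotation steps above can be sketched as a helper that prints the commands for review instead of running them. The function names, the `/vms` ACL path, and the 90-day default are assumptions for illustration; `date -d` is GNU date syntax.

```bash
#!/usr/bin/env bash
# Sketch: print (not run) the commands for one token rotation.

# Epoch timestamp N days from now; --expire expects epoch seconds.
token_expiry_epoch() {
    date -d "+${1:-90} days" +%s
}

rotation_plan() {
    local user="$1" old="$2" new="$3" role="$4" days="${5:-90}"
    local token_id
    printf -v token_id '%s!%s' "$user" "$new"
    echo "pveum user token add ${user} ${new} --expire $(token_expiry_epoch "$days") --privsep 1"
    # A privilege-separated token needs its own ACL entry.
    echo "pveum acl modify /vms --tokens '${token_id}' --roles ${role}"
    echo "# switch consumers to ${token_id} and watch for failed authentications"
    echo "pveum user token remove ${user} ${old}"
}
```

For example, `rotation_plan svc-pve-automation@pve automation-token automation-token-2 VMControl` prints four lines that an operator can review and then run in order.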
## Access Workflow
### Normal Operations
All routine operations should use:
- RBAC accounts (DevOps, automation, monitoring)
- Service accounts with scoped privileges
- API tokens with expiration enabled
### Temporary Administrative Access
When privileged operations are required:
1. Log in as `root@pam` (only when necessary)
2. Make the configuration change or assign the needed permissions
3. Log out of root immediately
4. Revert elevated permissions when no longer needed
## Password and Secret Management
### Password Rules
- Use 20-32 character random passwords
- Rotate root password every 90-180 days
- Store secrets only in approved secure vaults
- Do not reuse passwords across systems
- Use password managers for human accounts
### SSH Key Policy
- Root SSH login should be **disabled**
- Only RBAC admin accounts should have SSH keys
- Use SSH certificates where possible
- Rotate SSH keys regularly
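To check whether root SSH login is actually disabled, a script can read the `PermitRootLogin` keyword from the sshd configuration. The sketch below is a simplified illustration: the function name is an assumption, and it does not follow `Include` directives or `Match` blocks.

```bash
#!/usr/bin/env bash
# Sketch: report the PermitRootLogin value from an sshd_config file.

root_login_setting() {
    local config="${1:-/etc/ssh/sshd_config}"
    # sshd uses the first value it reads for a keyword;
    # keywords are case-insensitive and commented lines do not match.
    awk '
        tolower($1) == "permitrootlogin" { print tolower($2); found = 1; exit }
        END { if (!found) print "unset" }
    ' "$config"
}
```

`sshd -T` prints the effective configuration and is the more reliable check when run as root. On clustered nodes, confirm that disabling root SSH does not break inter-node operations before applying the change.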
## Hardening Recommendations
### Disable Root Web UI Access (Optional)
You may restrict root login via PVE web UI to emergency use only by:
- Configuring firewall rules
- Using Cloudflare Zero Trust policies
- Implementing IP allowlists
### Limit API Exposure
- Restrict PVE API access to VPN/IP-allowed ranges
- Avoid exposing PVE API ports publicly
- Use Cloudflare Tunnel for secure external access
- Implement rate limiting
### SSL/TLS Certificate Management
**Self-Signed Certificates (Default):**
- Proxmox VE uses self-signed SSL certificates by default
- Browser security warnings are expected and normal
- For local/internal access, this is acceptable
- Scripts use `-k` flag with curl to bypass certificate validation
**Production Certificates:**
- For production, consider using proper SSL certificates:
  - Let's Encrypt certificates (via ACME)
  - Internal CA certificates
  - Commercial SSL certificates
- Register the ACME account under Datacenter > ACME, then order certificates per node under System > Certificates
- Cloudflare Tunnel handles SSL termination for external access (recommended)
### Two-Factor Authentication
Implement 2FA for all non-automation accounts:
- TOTP (Time-based One-Time Password)
- WebAuthn
- Hardware tokens (YubiKey recommended)
## Logging, Audit, and Monitoring
### Enable Audit Logs
- Enable PVE audit logs
- Send logs to a centralized platform (ELK, Loki, Azure Monitor); Prometheus covers metrics rather than logs
- Configure log retention policies
### Monitor For
- Login attempts (successful and failed)
- Token creation/deletion
- Permission escalations
- VM or node-level API operations
- Root account usage
### Alerting
Implement alerts for:
- Root login events
- Failed login spikes
- Unexpected token creations
- Permission changes
- Unusual API activity patterns
## Compliance and Governance
### Access Control Matrix
Maintain a documented access-control matrix showing:
- User accounts and their roles
- Service accounts and their purposes
- API tokens and their scopes
- Permission assignments
### Regular Reviews
Perform periodic reviews (monthly or quarterly):
- Review user accounts (remove inactive)
- Verify token validity and expiration
- Audit role assignments
- Review audit logs for anomalies
- Update access-control matrix
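For the token part of the review, a small helper can classify each token by its expiry timestamp. This is a sketch under assumptions: the function name, the status labels, and the 30-day warning window are illustrative, and the input is the epoch-seconds expiry value PVE stores per token (0 means no expiration).

```bash
#!/usr/bin/env bash
# Sketch: classify a token expiry timestamp for the periodic review.
# Arguments: expiry (epoch seconds), warning window in days, current time.

token_status() {
    local expire="$1" warn_days="${2:-30}" now="${3:-$(date +%s)}"
    if [ "$expire" -eq 0 ]; then
        echo "never-expires"
    elif [ "$expire" -le "$now" ]; then
        echo "expired"
    elif [ $((expire - now)) -le $((warn_days * 86400)) ]; then
        echo "expiring-soon"
    else
        echo "ok"
    fi
}
```

The expiry values can be read from `pveum user token list <userid>`; tokens reported as `never-expires` contradict the expiration policy above and are worth flagging.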
### Change Control
Create change-control procedures for:
- Root-level actions
- Permission changes
- Token creation/deletion
- Role modifications
## Implementation Checklist
- [ ] Create service accounts for automation
- [ ] Create operator accounts for team members
- [ ] Assign appropriate roles to each account
- [ ] Create API tokens for service accounts (with expiration)
- [ ] Update automation scripts to use service accounts
- [ ] Disable root SSH access
- [ ] Enable audit logging
- [ ] Configure centralized log collection
- [ ] Set up alerting for security events
- [ ] Document access-control matrix
- [ ] Schedule regular access reviews
- [ ] Implement 2FA for human accounts
## Example: Complete Service Account Setup
```bash
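# 1. Create service account
pveum user add svc-pve-automation@pve
# 2. Set password (optional if the account only uses API tokens)
pveum passwd svc-pve-automation@pve
# 3. Create custom role for automation
pveum role add AutomationRole --privs "VM.PowerMgmt VM.Config.Network Datastore.AllocateSpace"
# 4. Assign role to service account
#    (Datastore.* privileges only take effect on /storage paths)
pveum acl modify /vms --users svc-pve-automation@pve --roles AutomationRole
pveum acl modify /storage --users svc-pve-automation@pve --roles AutomationRole
# 5. Create API token (--expire takes seconds since the epoch)
pveum user token add svc-pve-automation@pve automation-token \
  --expire "$(date -d '+180 days' +%s)" --privsep 1
# 6. Grant the privilege-separated token its own ACLs (it has none by default)
pveum acl modify /vms --tokens 'svc-pve-automation@pve!automation-token' --roles AutomationRole
pveum acl modify /storage --tokens 'svc-pve-automation@pve!automation-token' --roles AutomationRole
# 7. Record the token ID; store the secret (displayed only once) in your vault
# Token ID: svc-pve-automation@pve!automation-token
# Token Secret: <generated-secret>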
```
## Related Documentation
- [Azure Arc Onboarding](azure-arc-onboarding.md) - Agent installation and governance
- [Cloudflare Integration](cloudflare-integration.md) - Secure external access
- [Bring-Up Checklist](../bring-up-checklist.md) - Initial setup procedures
- [Proxmox VE Documentation](https://pve.proxmox.com/pve-docs/)
## Summary
To secure a PVE environment properly:
1. Store only `PVE_ROOT_PASS` in `.env` (username implied)
2. Use root strictly for permission grants and essential admin tasks
3. Create and enforce RBAC accounts for all operational workflows
4. Use API tokens with expiration and role separation
5. Audit, log, and monitor all authentication and permission changes
6. Use strong secrets, vaults, 2FA, and SSH hardening
7. Review access regularly and maintain governance standards

@@ -0,0 +1,155 @@
# Security Guide
Security best practices and configuration for the Azure Stack HCI infrastructure.
## Overview
This guide covers network segmentation, access control, secrets management, TLS, monitoring, and compliance for the Azure Stack HCI infrastructure.
## Network Security
### VLAN Segmentation
- **VLAN 10**: Storage (isolated)
- **VLAN 20**: Compute (isolated)
- **VLAN 30**: App Tier (isolated)
- **VLAN 40**: Observability (isolated)
- **VLAN 50**: Dev/Test (isolated)
- **VLAN 60**: Management (restricted access)
- **VLAN 99**: DMZ (public-facing)
### Firewall Rules
- Default deny between VLANs
- Explicit allow rules for required communication
- Management VLAN access restricted to authorized IPs
- DMZ isolated from internal networks
## Access Control
### Proxmox RBAC
- Use role-based access control (RBAC)
- Create dedicated users instead of using root
- Use API tokens instead of passwords
- Limit permissions to minimum required
See [Proxmox RBAC Guide](proxmox-rbac.md) for detailed configuration.
### Azure Arc Security
- Use managed identities where possible
- Implement Azure Policy for compliance
- Enable Microsoft Defender for Cloud (formerly Azure Defender)
- Use Azure Key Vault for secrets
### Kubernetes RBAC
- Use Role-Based Access Control (RBAC)
- Create service accounts for applications
- Limit cluster-admin access
- Use network policies for pod isolation
## Secrets Management
### Environment Variables
- Store secrets in `.env` file (not committed to git)
- Use `.env.example` as template
- Never commit `.env` to version control
- Rotate secrets regularly
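A pre-commit or CI step can verify that `.env` cannot slip into a commit. The sketch below is illustrative: the function name is an assumption, and it relies only on `git ls-files --error-unmatch` and `git check-ignore`.

```bash
#!/usr/bin/env bash
# Sketch: fail fast if .env could end up in a commit.

env_is_protected() {
    local repo_dir="${1:-.}"
    # A tracked .env is already in the index, regardless of .gitignore.
    if git -C "$repo_dir" ls-files --error-unmatch .env >/dev/null 2>&1; then
        echo ".env is tracked by git" >&2
        return 1
    fi
    # check-ignore exits 0 only when an ignore rule matches the path.
    if ! git -C "$repo_dir" check-ignore -q .env; then
        echo ".env is not covered by an ignore rule" >&2
        return 1
    fi
}
```

Calling `env_is_protected . || exit 1` at the start of a deployment script stops the run before any secret can be committed by accident.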
### Azure Key Vault
For production deployments, consider using Azure Key Vault:
```bash
# Store secret
az keyvault secret set \
  --vault-name <vault-name> \
  --name <secret-name> \
  --value <secret-value>
# Retrieve secret
az keyvault secret show \
  --vault-name <vault-name> \
  --name <secret-name> \
  --query value -o tsv
```
### Kubernetes Secrets
- Use Kubernetes secrets for application credentials
- Consider external secret management (e.g., Sealed Secrets)
- Encrypt secrets at rest
- Rotate secrets regularly
## SSL/TLS
### Certificates
- Use valid SSL/TLS certificates for all services
- Configure certificate auto-renewal (Cert-Manager)
- Use Let's Encrypt for public services
- Use internal CA for private services
### Cloudflare Tunnel
- Cloudflare Tunnel handles SSL termination
- No inbound ports required
- WAF protection enabled
- DDoS protection enabled
## Monitoring and Auditing
### Logging
- Enable audit logging for all components
- Centralize logs (Azure Log Analytics, syslog)
- Retain logs for compliance
- Monitor for suspicious activity
### Azure Monitor
- Enable Azure Monitor for all resources
- Set up alerting for security events
- Monitor for policy violations
- Track access and changes
### Microsoft Defender for Cloud
- Enable Microsoft Defender for Cloud (formerly Azure Defender)
- Configure threat detection
- Set up security alerts
- Review security recommendations
## Compliance
### Azure Policy
- Apply security baseline policies
- Enforce compliance requirements
- Monitor policy compliance
- Remediate non-compliant resources
### Updates
- Keep all systems updated
- Use Azure Update Manager (the successor to Azure Update Management)
- Schedule regular maintenance windows
- Test updates in non-production first
## Best Practices
1. **Principle of Least Privilege**: Grant minimum required permissions
2. **Defense in Depth**: Multiple layers of security
3. **Regular Audits**: Review access and permissions regularly
4. **Incident Response**: Have a plan for security incidents
5. **Backup and Recovery**: Regular backups and tested recovery procedures
## Additional Resources
- [Proxmox RBAC Guide](proxmox-rbac.md)
- [Azure Security Documentation](https://docs.microsoft.com/azure/security/)
- [Kubernetes Security](https://kubernetes.io/docs/concepts/security/)