Initial commit: loc_az_hci (smom-dbis-138 excluded via .gitignore)
Some checks failed
Test / test (push) Has been cancelled
Co-authored-by: Cursor <cursoragent@cursor.com>
309
docs/security/proxmox-rbac.md
Normal file
@@ -0,0 +1,309 @@
# Proxmox VE RBAC and Security Best Practices

## Overview

This document provides guidelines for implementing Role-Based Access Control (RBAC) and security best practices for Proxmox VE instances. The goal is to minimize root account usage and implement least-privilege access for all operational tasks.

## Root Account Usage

### When to Use Root

The `root@pam` account should **only** be used for:

- Initial system provisioning and setup
- Granting and adjusting permissions
- Emergency system recovery
- Security patches or updates that explicitly require superuser privileges

### Root Account Restrictions

- **Never** use root for daily operations
- **Never** create API tokens for root (bypasses RBAC and auditing)
- **Never** store root credentials in code repositories
- Store the root password only in a secure vault; an uncommitted `.env` file is acceptable for local development only

## Credential Management

### Environment Variables

Store only the minimal required secret:

```bash
PVE_ROOT_PASS="<secure, unique, strong-password>"
```

**Important:**

- Do not store the username (`root@pam`) in environment variables; it is implied
- Never commit `.env` files to version control
- Use `.env.example` for documentation templates only
- In production, use proper secret management (HashiCorp Vault, Azure Key Vault, etc.)

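Since a local `.env` file holds the root password, it is worth checking that only its owner can read it. The helper below is a minimal sketch of such a check; the function name `env_file_is_private` is hypothetical, and it assumes GNU `stat` (as shipped on Debian-based Proxmox hosts).

```bash
# Hypothetical helper: succeeds only if FILE is accessible by its owner alone.
# Assumes GNU stat; BSD/macOS stat would need `stat -f %Lp` instead.
env_file_is_private() {
  mode="$(stat -c '%a' "$1")" || return 2
  # Accept owner read/write (600) or owner read-only (400).
  [ "$mode" = "600" ] || [ "$mode" = "400" ]
}
```

A possible use is `env_file_is_private .env || chmod 600 .env` before a script sources the file.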
## RBAC Implementation

### Create Non-Root Operational Accounts

Create dedicated accounts for different operational roles:

**Service Accounts:**

- `svc-pve-automation@pve` - For automation scripts and CI/CD
- `svc-pve-monitoring@pve` - For monitoring and alerting systems

**Operator Accounts:**

- `devops-admin@pve` - For DevOps team members
- `readonly-monitor@pve` - For read-only monitoring and dashboards

### Standard PVE Roles

| Role Type        | PVE Role Name       | Purpose                             |
|------------------|---------------------|-------------------------------------|
| Read-only        | `PVEAuditor`        | Monitoring, dashboards, API polling |
| Limited VM admin | `PVEVMAdmin`        | Manage VMs only (no host access)    |
| Storage admin    | `PVEDatastoreAdmin` | Manage storage systems              |
| Node admin       | `PVESysAdmin`       | Manage node services without root   |

### Creating Custom Roles

Example: Create a role that allows only start/stop/reset of VMs:

```bash
pveum roleadd VMControl -privs "VM.PowerMgmt"
```

Then assign to a user:

```bash
pveum aclmod /vms -user svc-pve-automation@pve -role VMControl
```

### Assigning Roles

```bash
# Assign PVEAuditor role (read-only) to monitoring account
pveum aclmod / -user readonly-monitor@pve -role PVEAuditor

# Assign PVEVMAdmin role to DevOps account
pveum aclmod /vms -user devops-admin@pve -role PVEVMAdmin

# Assign custom role to service account
pveum aclmod /vms -user svc-pve-automation@pve -role VMControl
```

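The assignments above can be wrapped so they are previewable before they touch a host. This is a sketch only: `grant_role` and `show_permissions` are hypothetical names, the `PVEUM` override is an illustration device, and the wrappers assume a release where `pveum acl modify` (the longer spelling of `aclmod`) and `pveum user permissions` are available.

```bash
# PVEUM can be overridden (for example PVEUM=echo) to print commands instead of running them.
PVEUM="${PVEUM:-pveum}"

# grant_role PATH USER ROLE: assign ROLE to USER on the given ACL path.
grant_role() {
  "$PVEUM" acl modify "$1" -user "$2" -role "$3"
}

# show_permissions USER: list the effective permissions of USER for review.
show_permissions() {
  "$PVEUM" user permissions "$1"
}
```

Running with `PVEUM=echo` first shows exactly which command would be issued; running `show_permissions devops-admin@pve` afterwards is one way to confirm the grant took effect.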
## API Token Management

### Creating API Tokens

Create API tokens tied to RBAC accounts (not root):

```bash
# Create token for service account with expiration
# (--expire takes seconds since the Unix epoch; GNU date does the conversion)
pveum user token add svc-pve-automation@pve automation-token \
  --expire "$(date -d 2025-12-31 +%s)" --privsep 1
```

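With `--privsep 1`, a new token has no permissions of its own until an ACL entry names the token itself, and its effective rights stay limited to what the owning user has. The sketch below shows one way to grant a role to the token; `token_id` and `grant_token_role` are hypothetical helper names, and the `PVEUM` override exists only so the command can be previewed.

```bash
# PVEUM can be overridden (for example PVEUM=echo) to print commands instead of running them.
PVEUM="${PVEUM:-pveum}"

# token_id USER TOKEN_NAME: build the full token ID, user@realm!name.
token_id() {
  printf '%s!%s\n' "$1" "$2"
}

# grant_token_role PATH USER TOKEN_NAME ROLE: add an ACL entry for the token itself.
grant_token_role() {
  "$PVEUM" acl modify "$1" -token "$(token_id "$2" "$3")" -role "$4"
}
```

For the token created above, the call would be `grant_token_role /vms svc-pve-automation@pve automation-token VMControl`.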
**Best Practices:**

- Always set expiration dates for tokens
- Use `--privsep 1` to enable privilege separation; the token then holds only the permissions granted to the token itself, and never more than its user has
- Create separate tokens for different services/environments
- Document token purpose and rotation schedule

### Using API Tokens

In your `.env` file (for service accounts):

```bash
# Service account API token (not root)
PROXMOX_ML110_TOKEN_ID=svc-pve-automation@pve!automation-token
PROXMOX_ML110_TOKEN_SECRET=your-token-secret
```

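A script can turn these two variables into the `Authorization` header that the PVE API expects for token authentication. The following is a sketch: `pve_auth_header` and `pve_api_get` are hypothetical helpers, the host name passed in is an assumption, and 8006 is the default PVE API port.

```bash
# pve_auth_header TOKEN_ID TOKEN_SECRET: format the token authentication header.
pve_auth_header() {
  printf 'Authorization: PVEAPIToken=%s=%s\n' "$1" "$2"
}

# pve_api_get HOST API_PATH: GET an API path using the token from the environment.
# Certificate validation is left on; add --cacert for a private CA if needed.
pve_api_get() {
  curl --silent --fail \
    --header "$(pve_auth_header "$PROXMOX_ML110_TOKEN_ID" "$PROXMOX_ML110_TOKEN_SECRET")" \
    "https://$1:8006/api2/json$2"
}
```

For example, `pve_api_get pve.example.internal /version` (with a placeholder host) is a low-risk call for checking that the token works.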
### Token Rotation

- Rotate tokens every 90-180 days
- Create new token before deleting old one
- Update all systems using the token
- Monitor for failed authentications during rotation

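To act on the rotation schedule above, a scheduled job can compare each token's expiry (an epoch timestamp) with the current time. The sketch below uses hypothetical helper names and takes the expiry as an argument; how the value is obtained (for example from `pveum user token list`) is left out.

```bash
# days_until_expiry EXPIRE_EPOCH [NOW_EPOCH]: whole days left before the token expires.
days_until_expiry() {
  expire="$1"
  now="${2:-$(date +%s)}"
  echo $(( ($expire - $now) / 86400 ))
}

# needs_rotation EXPIRE_EPOCH THRESHOLD_DAYS [NOW_EPOCH]:
# succeeds when the token expires within THRESHOLD_DAYS.
needs_rotation() {
  [ "$(days_until_expiry "$1" "${3:-}")" -le "$2" ]
}
```

A cron job could call `needs_rotation "$expire" 14` and raise an alert, leaving enough time to create the replacement token before the old one stops working.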
## Access Workflow

### Normal Operations

All routine operations should use:

- RBAC accounts (DevOps, automation, monitoring)
- Service accounts with scoped privileges
- API tokens with expiration enabled

### Temporary Administrative Access

When privileged operations are required:

1. Log in as `root@pam` (only when necessary)
2. Make the configuration change or assign the needed permissions
3. Log out of root immediately
4. Revert elevated permissions when no longer needed

## Password and Secret Management

### Password Rules

- Use 20-32 character random passwords
- Rotate root password every 90-180 days
- Store secrets only in approved secure vaults
- Do not reuse passwords across systems
- Use password managers for human accounts

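A random password in the recommended length range can be generated from the kernel's random source. This is a minimal sketch; `gen_password` is a hypothetical helper, and the alphanumeric character set is an assumption chosen to avoid shell-quoting problems.

```bash
# gen_password [LENGTH]: print LENGTH (default 32) random alphanumeric characters.
gen_password() {
  # tr keeps only the allowed characters; head stops after LENGTH bytes.
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "${1:-32}"
}
```

The output has no trailing newline, so it can be piped straight into a vault CLI or password manager.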
### SSH Key Policy

- Root SSH login should be **disabled**
- Only RBAC admin accounts should have SSH keys
- Use SSH certificates where possible
- Rotate SSH keys regularly

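Whether root SSH login is actually disabled can be checked from the `sshd` configuration. The sketch below reads only the main config file and ignores `Include` and `Match` blocks, so treat its answer as a first indication; `root_login_setting` is a hypothetical helper. Clustered Proxmox nodes have traditionally used key-based root SSH between nodes, so it is worth verifying cluster behaviour before enforcing `no`.

```bash
# root_login_setting [SSHD_CONFIG]: print the first effective PermitRootLogin
# value (sshd uses the first value it finds), or "unset" if none is present.
root_login_setting() {
  awk 'tolower($1) == "permitrootlogin" { print tolower($2); found = 1; exit }
       END { if (!found) print "unset" }' "${1:-/etc/ssh/sshd_config}"
}
```

A compliance check might accept `no` (or `prohibit-password` where key-based root access is still required) and flag anything else.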
## Hardening Recommendations

### Disable Root Web UI Access (Optional)

You may restrict root login via the PVE web UI to emergency use only by:

- Configuring firewall rules
- Using Cloudflare Zero Trust policies
- Implementing IP allowlists

### Limit API Exposure

- Restrict PVE API access to VPN/IP-allowed ranges
- Avoid exposing PVE API ports publicly
- Use Cloudflare Tunnel for secure external access
- Implement rate limiting

### SSL/TLS Certificate Management

**Self-Signed Certificates (Default):**

- Proxmox VE uses self-signed SSL certificates by default
- Browser security warnings are expected and normal
- For local/internal access, this is acceptable
- Scripts use the `-k` flag with curl to bypass certificate validation

**Production Certificates:**

- For production, consider using proper SSL certificates:
  - Let's Encrypt certificates (via ACME)
  - Internal CA certificates
  - Commercial SSL certificates
- Configure certificates in Proxmox: register the ACME account under Datacenter > ACME, then order the certificate on each node under System > Certificates
- Cloudflare Tunnel handles SSL termination for external access (recommended)

### Two-Factor Authentication

Implement 2FA for all non-automation accounts:

- TOTP (Time-based One-Time Password)
- WebAuthn
- Hardware tokens (YubiKey recommended)

## Logging, Audit, and Monitoring

### Enable Audit Logs

- Enable PVE audit logs
- Send logs to centralized logging (ELK, Prometheus, Loki, Azure Monitor)
- Configure log retention policies

### Monitor For

- Login attempts (successful and failed)
- Token creation/deletion
- Permission escalations
- VM or node-level API operations
- Root account usage

### Alerting

Implement alerts for:

- Root login events
- Failed login spikes
- Unexpected token creations
- Permission changes
- Unusual API activity patterns

## Compliance and Governance

### Access Control Matrix

Maintain a documented access-control matrix showing:

- User accounts and their roles
- Service accounts and their purposes
- API tokens and their scopes
- Permission assignments

### Regular Reviews

Perform periodic reviews (monthly or quarterly):

- Review user accounts (remove inactive)
- Verify token validity and expiration
- Audit role assignments
- Review audit logs for anomalies
- Update access-control matrix

### Change Control

Create change-control procedures for:

- Root-level actions
- Permission changes
- Token creation/deletion
- Role modifications

## Implementation Checklist

- [ ] Create service accounts for automation
- [ ] Create operator accounts for team members
- [ ] Assign appropriate roles to each account
- [ ] Create API tokens for service accounts (with expiration)
- [ ] Update automation scripts to use service accounts
- [ ] Disable root SSH access
- [ ] Enable audit logging
- [ ] Configure centralized log collection
- [ ] Set up alerting for security events
- [ ] Document access-control matrix
- [ ] Schedule regular access reviews
- [ ] Implement 2FA for human accounts

## Example: Complete Service Account Setup

```bash
# 1. Create service account
pveum user add svc-pve-automation@pve

# 2. Set password (or use API token only)
pveum passwd svc-pve-automation@pve

# 3. Create custom role for automation
pveum roleadd AutomationRole -privs "VM.PowerMgmt VM.Config.Network Datastore.AllocateSpace"

# 4. Assign role to service account
pveum aclmod /vms -user svc-pve-automation@pve -role AutomationRole

# 5. Create API token (--expire takes seconds since the Unix epoch)
pveum user token add svc-pve-automation@pve automation-token \
  --expire "$(date -d 2025-12-31 +%s)" --privsep 1

# 6. Grant the role to the token itself (required because of --privsep 1)
pveum aclmod /vms -token 'svc-pve-automation@pve!automation-token' -role AutomationRole

# 7. Record the token ID; store the secret (shown only once) in your vault
# Token ID: svc-pve-automation@pve!automation-token
# Token Secret: <generated-secret>
```

## Related Documentation

- [Azure Arc Onboarding](azure-arc-onboarding.md) - Agent installation and governance
- [Cloudflare Integration](cloudflare-integration.md) - Secure external access
- [Bring-Up Checklist](../bring-up-checklist.md) - Initial setup procedures
- [Proxmox VE Documentation](https://pve.proxmox.com/pve-docs/)

## Summary

To secure a PVE environment properly:

1. Store only `PVE_ROOT_PASS` in `.env` (username implied)
2. Use root strictly for permission grants and essential admin tasks
3. Create and enforce RBAC accounts for all operational workflows
4. Use API tokens with expiration and role separation
5. Audit, log, and monitor all authentication and permission changes
6. Use strong secrets, vaults, 2FA, and SSH hardening
7. Review access regularly and maintain governance standards

155
docs/security/security-guide.md
Normal file
@@ -0,0 +1,155 @@
# Security Guide

Security best practices and configuration for the Azure Stack HCI infrastructure.

## Overview

This guide covers security considerations and best practices for securing the Azure Stack HCI infrastructure.

## Network Security

### VLAN Segmentation

- **VLAN 10**: Storage (isolated)
- **VLAN 20**: Compute (isolated)
- **VLAN 30**: App Tier (isolated)
- **VLAN 40**: Observability (isolated)
- **VLAN 50**: Dev/Test (isolated)
- **VLAN 60**: Management (restricted access)
- **VLAN 99**: DMZ (public-facing)

### Firewall Rules

- Default deny between VLANs
- Explicit allow rules for required communication
- Management VLAN access restricted to authorized IPs
- DMZ isolated from internal networks

## Access Control

### Proxmox RBAC

- Use role-based access control (RBAC)
- Create dedicated users instead of using root
- Use API tokens instead of passwords
- Limit permissions to minimum required

See [Proxmox RBAC Guide](proxmox-rbac.md) for detailed configuration.

### Azure Arc Security

- Use managed identities where possible
- Implement Azure Policy for compliance
- Enable Microsoft Defender for Cloud (formerly Azure Defender)
- Use Azure Key Vault for secrets

### Kubernetes RBAC

- Use Role-Based Access Control (RBAC)
- Create service accounts for applications
- Limit cluster-admin access
- Use network policies for pod isolation

## Secrets Management

### Environment Variables

- Store secrets in `.env` file (not committed to git)
- Use `.env.example` as template
- Never commit `.env` to version control
- Rotate secrets regularly

### Azure Key Vault

For production deployments, consider using Azure Key Vault:

```bash
# Store secret
az keyvault secret set \
  --vault-name <vault-name> \
  --name <secret-name> \
  --value <secret-value>

# Retrieve secret
az keyvault secret show \
  --vault-name <vault-name> \
  --name <secret-name> \
  --query value -o tsv
```

### Kubernetes Secrets

- Use Kubernetes secrets for application credentials
- Consider external secret management (e.g., Sealed Secrets)
- Encrypt secrets at rest
- Rotate secrets regularly

## SSL/TLS

### Certificates

- Use valid SSL/TLS certificates for all services
- Configure certificate auto-renewal (Cert-Manager)
- Use Let's Encrypt for public services
- Use internal CA for private services

### Cloudflare Tunnel

- Cloudflare Tunnel handles SSL termination
- No inbound ports required
- WAF protection enabled
- DDoS protection enabled

## Monitoring and Auditing

### Logging

- Enable audit logging for all components
- Centralize logs (Azure Log Analytics, syslog)
- Retain logs for compliance
- Monitor for suspicious activity

### Azure Monitor

- Enable Azure Monitor for all resources
- Set up alerting for security events
- Monitor for policy violations
- Track access and changes

### Azure Defender

- Enable Microsoft Defender for Cloud (formerly Azure Defender)
- Configure threat detection
- Set up security alerts
- Review security recommendations

## Compliance

### Azure Policy

- Apply security baseline policies
- Enforce compliance requirements
- Monitor policy compliance
- Remediate non-compliant resources

### Updates

- Keep all systems updated
- Use Azure Update Management
- Schedule regular maintenance windows
- Test updates in non-production first

## Best Practices

1. **Principle of Least Privilege**: Grant minimum required permissions
2. **Defense in Depth**: Multiple layers of security
3. **Regular Audits**: Review access and permissions regularly
4. **Incident Response**: Have a plan for security incidents
5. **Backup and Recovery**: Regular backups and tested recovery procedures

## Additional Resources

- [Proxmox RBAC Guide](proxmox-rbac.md)
- [Azure Security Documentation](https://docs.microsoft.com/azure/security/)
- [Kubernetes Security](https://kubernetes.io/docs/concepts/security/)