# Deployment Monitoring Guide

## Overview

This guide describes the monitoring tools for the Chain-138 multi-region deployment, which provide real-time status tracking.

## Monitoring Tools

### 1. Deployment Dashboard

```bash
./scripts/deployment/deployment-dashboard.sh
```

- **Purpose**: Comprehensive one-time status view
- **Updates**: Static (run manually)
- **Shows**: Infrastructure, clusters, resource groups, progress

### 2. Continuous Monitoring

```bash
./scripts/deployment/monitor-continuous.sh
```

- **Purpose**: Continuous real-time monitoring
- **Updates**: Every 15 seconds
- **Shows**: Full dashboard + Terraform log tail

### 3. Live Monitoring

```bash
./scripts/deployment/monitor-deployment-live.sh
```

- **Purpose**: Live updates with full details
- **Updates**: Every 15 seconds
- **Shows**: Complete status with log tail

### 4. Detailed Monitoring

```bash
./scripts/deployment/monitor-deployment.sh
```

- **Purpose**: Detailed per-region monitoring
- **Updates**: Every 30 seconds
- **Shows**: Individual cluster status per region

## Current Deployment Status

### Infrastructure

- **Terraform**: Running (PID varies)
- **Resource Groups**: 175 created
- **Expected**: 144 (6 per region × 24 regions)
- **Status**: Over-provisioned (the count includes managed resource groups)

### AKS Clusters

- **Total Regions**: 24
- **Ready**: 0-1 (varies)
- **Failed**: 8
- **Canceled**: 16
- **Creating**: 0
- **Not Found**: Varies

### Issues

1. **State Lock**: Terraform state is locked (another process is running)
2. **Failed Clusters**: 8 clusters in Failed state
3. **Canceled Clusters**: 16 clusters in Canceled state
4. **Deletion Issues**: Clusters cannot be deleted easily (Azure limitation)

## Monitoring Commands

### Quick Status

```bash
./scripts/deployment/deployment-dashboard.sh
```

### Continuous Monitoring

```bash
./scripts/deployment/monitor-continuous.sh
```

### Terraform Log

```bash
tail -f /tmp/terraform-apply-retry.log
# OR
tail -f /tmp/terraform-apply-final-clean.log
```

### Cluster Status

```bash
az aks list --subscription fc08d829-4f14-413d-ab27-ce024425db0b \
  --query "[?contains(name, 'az-p-')].{name:name, state:provisioningState, power:powerState.code}" \
  -o table
```

## Troubleshooting

### Issue: State Lock

**Symptom**: `Error acquiring the state lock`

**Solution**: Wait for the current Terraform process to complete. If you are sure no other Terraform process is still running, force-unlock the state using the lock ID printed in the error message:

```bash
cd terraform/well-architected/cloud-sovereignty
# Replace LOCK_ID with the ID shown in the "Error acquiring the state lock" output
terraform force-unlock LOCK_ID
```

### Issue: Failed/Canceled Clusters

**Symptom**: Clusters in Failed or Canceled state

**Solution**:

1. Wait for the clusters to be deleted automatically
2. Or delete them manually via the Azure Portal
3. Re-run the Terraform deployment

### Issue: Clusters Not Deleting

**Symptom**: Clusters stuck in deletion

**Solution**: Check for dependencies, wait longer, or delete via the Azure Portal

## Next Steps

1. **Monitor Deployment**: Use continuous monitoring
2. **Wait for Completion**: Let Terraform finish
3. **Verify Clusters**: Check cluster status
4. **Continue Deployment**: Proceed to the following deployment steps once the clusters are ready

## Files

- **Dashboard**: `scripts/deployment/deployment-dashboard.sh`
- **Continuous**: `scripts/deployment/monitor-continuous.sh`
- **Live**: `scripts/deployment/monitor-deployment-live.sh`
- **Detailed**: `scripts/deployment/monitor-deployment.sh`
- **Terraform Log**: `/tmp/terraform-apply-retry.log`
- **Final Log**: `/tmp/terraform-apply-final-clean.log`
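
To reproduce the per-state counts listed under AKS Clusters (Ready, Failed, Canceled), the output of the Cluster Status query can be tallied by provisioning state. The following is a minimal sketch and not one of the repository's scripts: `summarize_states` is a hypothetical helper, and it assumes the query is changed to emit tab-separated `name` and `provisioningState` columns (`-o tsv`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of scripts/deployment): reads tab-separated
# "name<TAB>provisioningState" lines on stdin and prints one count per state.
summarize_states() {
  awk -F'\t' '
    NF >= 2 { count[$2]++ }   # tally each provisioning state
    END { for (s in count) printf "%-10s %d\n", s, count[s] }
  ' | sort                    # sort for a stable, readable order
}

# Example usage (assumes a logged-in Azure CLI and the same name filter as the
# Cluster Status command; substitute your own subscription ID):
#   az aks list --subscription <subscription-id> \
#     --query "[?contains(name, 'az-p-')].[name, provisioningState]" -o tsv \
#     | summarize_states
```

Keeping the tally in a function that reads stdin means it can be checked against saved or sample output without calling Azure.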