smom-dbis-138/docs/deployment/DEPLOYMENT_MONITORING_GUIDE.md
defiQUG 1fb7266469 Add Oracle Aggregator and CCIP Integration
- Introduced Aggregator.sol for Chainlink-compatible oracle functionality, including round-based updates and access control.
- Added OracleWithCCIP.sol to extend Aggregator with CCIP cross-chain messaging capabilities.
- Created .gitmodules to include OpenZeppelin contracts as a submodule.
- Developed a comprehensive deployment guide in NEXT_STEPS_COMPLETE_GUIDE.md for Phase 2 and smart contract deployment.
- Implemented Vite configuration for the orchestration portal, supporting both Vue and React frameworks.
- Added server-side logic for the Multi-Cloud Orchestration Portal, including API endpoints for environment management and monitoring.
- Created scripts for resource import and usage validation across non-US regions.
- Added tests for CCIP error handling and integration to ensure robust functionality.
- Included various new files and directories for the orchestration portal and deployment scripts.
2025-12-12 14:57:48 -08:00

# Deployment Monitoring Guide
## Overview
This guide covers the monitoring tools for the Chain-138 multi-region deployment, which track deployment status in real time.
## Monitoring Tools
### 1. Deployment Dashboard
```bash
./scripts/deployment/deployment-dashboard.sh
```
- **Purpose**: Comprehensive one-time status view
- **Updates**: Static (run manually)
- **Shows**: Infrastructure, clusters, resource groups, progress
### 2. Continuous Monitoring
```bash
./scripts/deployment/monitor-continuous.sh
```
- **Purpose**: Continuous real-time monitoring
- **Updates**: Every 15 seconds
- **Shows**: Full dashboard + Terraform log tail
### 3. Live Monitoring
```bash
./scripts/deployment/monitor-deployment-live.sh
```
- **Purpose**: Live updates with full details
- **Updates**: Every 15 seconds
- **Shows**: Complete status with log tail
### 4. Detailed Monitoring
```bash
./scripts/deployment/monitor-deployment.sh
```
- **Purpose**: Detailed per-region monitoring
- **Updates**: Every 30 seconds
- **Shows**: Individual cluster status per region
## Current Deployment Status
### Infrastructure
- **Terraform**: Running (PID varies)
- **Resource Groups**: 175 created
- **Expected**: 144 (6 per region × 24 regions)
- **Status**: Over-provisioned by 31 (the count includes managed resource groups)
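The comparison above can be reproduced with a few lines of shell. This is a minimal sketch: the function name `rg_surplus` is hypothetical, the region and per-region figures are taken from the list above, and the commented `az` call assumes the Azure CLI is installed and logged in to the right subscription.

```bash
# Figures from the Infrastructure list above.
REGIONS=24
RG_PER_REGION=6

# Hypothetical helper: prints how many resource groups exceed the expected count.
# $1 = observed resource group count
rg_surplus() {
  local expected=$(( REGIONS * RG_PER_REGION ))
  echo $(( $1 - expected ))
}

rg_surplus 175   # prints 31

# Assumption: counts every resource group in the active subscription.
# observed=$(az group list --query "length(@)" -o tsv)
# rg_surplus "$observed"
```

A positive result does not necessarily indicate a problem, because the managed resource groups that AKS creates for its node resources are included in the count.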
### AKS Clusters
- **Total Regions**: 24
- **Ready**: 0-1 (varies)
- **Failed**: 8
- **Canceled**: 16
- **Creating**: 0
- **Not Found**: Varies
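Counts like the ones above can be derived by grouping clusters by provisioning state. The sketch below uses a hypothetical helper, `summarize_states`, that reads one state per line; the commented `az` pipeline is an assumption about how the states would be fetched, not necessarily what the dashboard scripts do.

```bash
# Hypothetical helper: reads one provisioning state per line on stdin
# and prints "<state> <count>" for each distinct state.
summarize_states() {
  sort | uniq -c | awk '{print $2, $1}'
}

# Example with sample input:
printf '%s\n' Failed Canceled Failed | summarize_states
# Canceled 1
# Failed 2

# Assumption: az CLI is installed and logged in to the deployment subscription.
# az aks list --query "[].provisioningState" -o tsv | summarize_states
```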
### Issues
1. **State Lock**: Terraform state locked (another process running)
2. **Failed Clusters**: 8 clusters in Failed state
3. **Canceled Clusters**: 16 clusters in Canceled state
4. **Deletion Issues**: Failed and Canceled clusters are difficult to delete (Azure limitation)
## Monitoring Commands
### Quick Status
```bash
./scripts/deployment/deployment-dashboard.sh
```
### Continuous Monitoring
```bash
./scripts/deployment/monitor-continuous.sh
```
### Terraform Log
```bash
tail -f /tmp/terraform-apply-retry.log
# OR
tail -f /tmp/terraform-apply-final-clean.log
```
### Cluster Status
```bash
az aks list \
  --subscription fc08d829-4f14-413d-ab27-ce024425db0b \
  --query "[?contains(name, 'az-p-')].{name:name, state:provisioningState, power:powerState.code}" \
  -o table
```
## Troubleshooting
### Issue: State Lock
**Symptom**: `Error acquiring the state lock`
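**Solution**: Wait for the current Terraform process to complete. Force unlock only if no other process still holds the lock, since unlocking while another apply is running can corrupt the state: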
```bash
cd terraform/well-architected/cloud-sovereignty
terraform force-unlock <LOCK_ID>
```
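Before forcing an unlock, it can help to confirm that no Terraform process is still running. The following is a rough sketch using a hypothetical helper, `process_running`, built on `pgrep`; it only sees processes on the local machine, so it cannot rule out a lock held from another host or a CI job.

```bash
# Hypothetical helper: succeeds if a process with the exact name $1
# is running on this machine.
process_running() {
  pgrep -x "$1" >/dev/null 2>&1
}

if process_running terraform; then
  echo "terraform is still running locally; wait before unlocking"
else
  echo "no local terraform process found"
fi
```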
### Issue: Failed/Canceled Clusters
**Symptom**: Clusters in Failed or Canceled state
**Solution**:
1. Wait for clusters to be deleted automatically
2. Or manually delete via Azure Portal
3. Re-run Terraform deployment
### Issue: Clusters Not Deleting
**Symptom**: Clusters stuck in deletion
**Solution**: Check for dependencies, wait longer, or delete via Azure Portal
## Next Steps
1. **Monitor Deployment**: Use continuous monitoring
2. **Wait for Completion**: Let Terraform finish
3. **Verify Clusters**: Check cluster status
4. **Run Next Steps**: Once clusters are ready
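One way to tell whether Terraform has finished is to inspect the apply log. The sketch below is a heuristic and assumes the log contains unmodified `terraform apply` output, where a successful run ends with an `Apply complete!` summary and failures are reported with an `Error:` prefix. The function name `apply_status` is hypothetical.

```bash
# Hypothetical helper: prints "missing", "complete", "error", or "running"
# based on the contents of a Terraform apply log.
apply_status() {
  local log="$1"
  if [ ! -f "$log" ]; then
    echo missing
  elif grep -q 'Apply complete!' "$log"; then
    echo complete
  elif grep -q 'Error:' "$log"; then
    echo error
  else
    echo running
  fi
}

apply_status /tmp/terraform-apply-retry.log
```

A result of `running` only means that neither marker has appeared yet, so it is worth pairing this check with the cluster status command before moving on.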
## Files
- **Dashboard**: `scripts/deployment/deployment-dashboard.sh`
- **Continuous**: `scripts/deployment/monitor-continuous.sh`
- **Live**: `scripts/deployment/monitor-deployment-live.sh`
- **Detailed**: `scripts/deployment/monitor-deployment.sh`
- **Terraform Log**: `/tmp/terraform-apply-retry.log`
- **Final Log**: `/tmp/terraform-apply-final-clean.log`