

# Entra VerifiedID Operational Runbook
This runbook provides operational procedures for managing the Entra VerifiedID integration.
## Table of Contents
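1. [Daily Operations](#daily-operations)
2. [Monitoring](#monitoring)
3. [Troubleshooting](#troubleshooting)
4. [Common Operations](#common-operations)
5. [Emergency Procedures](#emergency-procedures)
6. [Diagnostic Commands](#diagnostic-commands)
7. [Support Escalation](#support-escalation)
8. [Contact Information](#contact-information)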
## Daily Operations
### Health Checks
**Check Service Health**
```bash
curl https://api.theorder.org/health
```
**Check Entra Client Status**
```bash
# Check logs for Entra client initialization
kubectl logs -n the-order-prod deployment/identity-service | grep -i entra
```
**Verify Metrics Collection**
```bash
curl https://api.theorder.org/metrics | grep entra
```
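The HTTP checks above can be folded into a single pass/fail probe for use in a cron job or a pre-deploy step. This is a minimal sketch rather than existing tooling: the function names `classify_status` and `probe` are hypothetical, and it assumes both endpoints answer with a 2xx status when healthy.

```bash
# Hypothetical helper: map an HTTP status code to a verdict.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    000) echo "unreachable" ;;   # curl reports 000 when no HTTP response was received
    *)   echo "unhealthy" ;;
  esac
}

# Probe one URL, print "<url> <code> <verdict>", and fail unless the verdict is "ok".
probe() {
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1")
  verdict=$(classify_status "$code")
  echo "$1 $code $verdict"
  [ "$verdict" = "ok" ]
}

# Usage (performs real requests):
#   probe https://api.theorder.org/health && probe https://api.theorder.org/metrics
```

The probe only confirms that the endpoints respond; the `grep entra` checks on logs and metrics are still needed to confirm the Entra client itself initialized.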
### Key Metrics to Monitor
1. **Issuance Success Rate**: Should be >95%
```promql
sum(rate(entra_credentials_issued_total{status="success"}[5m])) /
sum(rate(entra_credentials_issued_total[5m]))
```
2. **API Latency**: p95 should be <5 seconds
```promql
histogram_quantile(0.95, sum by (le) (rate(entra_api_request_duration_seconds_bucket{operation="issueCredential"}[5m])))
```
3. **Error Rate**: Should be <5%
```promql
sum(rate(entra_api_errors_total[5m])) / sum(rate(entra_api_requests_total[5m]))
```
4. **Webhook Processing**: The received rate should track the issuance request rate; a sustained drop to zero while credentials are being issued suggests missed webhooks
```promql
rate(entra_webhooks_received_total[5m])
```
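For a quick spot check outside Grafana, a value read from any of the queries above can be compared against its target in the shell. The sketch below is illustrative only: `meets_threshold` and the three wrapper functions are hypothetical names, and the targets are the ones listed above expressed as ratios and seconds.

```bash
# Hypothetical helper: compare a metric value against a threshold.
# Usage: meets_threshold <value> <gt|lt> <limit>; exit status 0 means "within target".
meets_threshold() {
  awk -v v="$1" -v op="$2" -v lim="$3" 'BEGIN {
    if (op == "gt") exit !(v + 0 > lim + 0)
    if (op == "lt") exit !(v + 0 < lim + 0)
    exit 2   # unknown operator
  }'
}

# Targets from the Key Metrics list.
check_issuance_success() { meets_threshold "$1" gt 0.95; }   # success ratio > 95%
check_p95_latency()      { meets_threshold "$1" lt 5; }      # p95 < 5 seconds
check_error_rate()       { meets_threshold "$1" lt 0.05; }   # error ratio < 5%

# Usage:
#   check_issuance_success 0.97 && echo "issuance OK"
```

The value itself could come from the Prometheus HTTP API (`/api/v1/query`); the Prometheus address for this environment is not given in this runbook, so that step is left out.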
## Monitoring
### Grafana Dashboard
Access the Entra VerifiedID dashboard at: `https://grafana.theorder.org/d/entra-verifiedid`
**Key Panels:**
- Issuance Success Rate (gauge)
- API Request Rate (graph)
- Error Rate by Operation (graph)
- Issuance Duration (histogram)
- Webhook Events (graph)
- Active Requests (gauge)
### Alerts
**Critical Alerts:**
- `EntraIssuanceErrorRateHigh`: Error rate >10%
- `EntraIssuanceLatencyHigh`: p95 latency >10 seconds
- `EntraWebhookProcessingFailed`: Webhook processing failures
- `EntraAPIDown`: No successful API requests in 5 minutes
**Warning Alerts:**
- `EntraIssuanceErrorRateWarning`: Error rate >5%
- `EntraIssuanceLatencyWarning`: p95 latency >5 seconds
- `EntraRateLimitApproaching`: Rate limit usage >80%
## Troubleshooting
### Issue: Credential Issuance Failing
**Symptoms:**
- High error rate in metrics
- 500 errors in logs
- No credentials being issued
**Diagnosis:**
```bash
# Check recent errors
kubectl logs -n the-order-prod deployment/identity-service --tail=100 | grep -i error
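# Check Entra API connectivity (any HTTP response, including 400 or 401, shows the endpoint is reachable)
curl -X POST https://verifiedid.did.msidentity.com/v1.0/<tenant-id>/verifiableCredentials/createIssuanceRequest \
-H "Authorization: Bearer <token>"
# Inspect the credentials secret (values are base64-encoded, not encrypted; do not paste the output into tickets or chat)
kubectl get secret -n the-order-prod entra-credentials -o yaml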
```
**Solutions:**
1. Verify Entra credentials are correct
2. Check API permissions are granted
3. Verify credential manifest exists
4. Check network connectivity to Entra API
5. Review Entra service status in Azure Portal
### Issue: Webhooks Not Received
**Symptoms:**
- No webhook events in metrics
- Credentials stuck in "pending" status
- Database not updated
**Diagnosis:**
```bash
# Check webhook endpoint
curl -X POST https://api.theorder.org/vc/entra/webhook \
-H "Content-Type: application/json" \
-d '{"requestId":"test","requestStatus":"issuance_successful"}'
# Check webhook logs
kubectl logs -n the-order-prod deployment/identity-service | grep webhook
# Verify webhook URL in Entra
# Go to Azure Portal → Verified ID → Settings → Webhooks
```
**Solutions:**
1. Verify webhook URL is configured in Entra VerifiedID
2. Check webhook endpoint is accessible (firewall, ingress rules)
3. Verify webhook payload format matches expected schema
4. Check database connectivity
5. Review webhook processing logs
### Issue: High Latency
**Symptoms:**
- Slow credential issuance (>10 seconds)
- High p95/p99 latency metrics
- Timeout errors
**Diagnosis:**
```bash
# Check API request duration
kubectl logs -n the-order-prod deployment/identity-service | grep "duration"
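# Check network latency to Entra (ICMP ping is often blocked on Azure endpoints, so time an HTTPS request instead)
curl -s -o /dev/null -w 'connect=%{time_connect}s tls=%{time_appconnect}s total=%{time_total}s\n' https://verifiedid.did.msidentity.com/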
# Check retry attempts
kubectl logs -n the-order-prod deployment/identity-service | grep retry
```
**Solutions:**
1. Check network connectivity and latency
2. Verify Entra API is not experiencing issues
3. Review retry configuration (may be retrying too many times)
4. Check if rate limiting is causing delays
5. Consider increasing timeout values
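
To separate network latency from application latency, it can help to sample the round-trip time to the Entra endpoint several times and look at the 95th percentile rather than a single reading. The following is a rough sketch; `sample_latency` and `p95` are hypothetical helpers, and the measurement covers only an unauthenticated HTTPS round trip, not a full issuance request.

```bash
# Print the total request time in seconds for one request to the given URL.
sample_latency() {
  curl -s -o /dev/null --max-time 30 -w '%{time_total}\n' "$1"
}

# Read one number per line on stdin and print the nearest-rank 95th percentile.
p95() {
  sort -n | awk '{ a[NR] = $1 }
    END {
      if (NR == 0) exit 1
      i = int((NR * 95 + 99) / 100)   # ceil(0.95 * NR) using integer arithmetic
      print a[i]
    }'
}

# Usage (performs 20 real requests):
#   for n in $(seq 1 20); do sample_latency https://verifiedid.did.msidentity.com/; done | p95
```

If this figure is low while `entra_api_request_duration_seconds` is high, the delay is more likely in retries, rate limiting, or the service itself than in the network path.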
### Issue: Rate Limit Errors
**Symptoms:**
- 429 errors in logs
- Rate limit metrics showing violations
- Requests being rejected
**Diagnosis:**
```bash
# Check rate limit violations
kubectl logs -n the-order-prod deployment/identity-service | grep "429"
# Check current rate limit settings
kubectl get configmap -n the-order-prod identity-service-config -o yaml | grep ENTRA_RATE_LIMIT
```
**Solutions:**
1. Review current rate limit configuration
2. Check Entra API quota limits
3. Adjust rate limits if needed
4. Implement request queuing if necessary
5. Contact Entra support if quota needs increase
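
When scripting calls against a rate-limited endpoint, retrying with exponential backoff is a common way to avoid making 429s worse. This is a generic sketch, not the identity service's own retry logic: `backoff_delay` and `retry_on_429` are hypothetical helpers, the 1-second base and 60-second cap are arbitrary, and the sketch does not read a `Retry-After` header.

```bash
# Delay in seconds before retry number <attempt> (1-based),
# doubling from <base> (default 1) and capped at <cap> (default 60).
backoff_delay() {
  attempt=$1; base=${2:-1}; cap=${3:-60}
  delay=$base
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then delay=$cap; fi
  echo "$delay"
}

# Run a command that prints an HTTP status code; retry up to <max> times while it prints 429.
# Prints the last status code; returns 1 if every attempt was rate limited.
retry_on_429() {
  max=$1; shift
  n=1
  while :; do
    code=$("$@")
    if [ "$code" != "429" ]; then echo "$code"; return 0; fi
    if [ "$n" -ge "$max" ]; then echo "$code"; return 1; fi
    sleep "$(backoff_delay "$n")"
    n=$((n + 1))
  done
}

# Usage (performs real requests; <token> as elsewhere in this runbook):
#   retry_on_429 5 curl -s -o /dev/null -w '%{http_code}' \
#     https://api.theorder.org/vc/entra/status/<requestId> -H "Authorization: Bearer <token>"
```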
### Issue: Token Refresh Failures
**Symptoms:**
- "Failed to get access token" errors
- Authentication failures
- 401 errors
**Diagnosis:**
```bash
# Check token refresh logs
kubectl logs -n the-order-prod deployment/identity-service | grep "token"
# Verify credentials (this prints the client secret in plain text; run it only in a trusted terminal)
kubectl get secret -n the-order-prod entra-credentials -o jsonpath='{.data.ENTRA_CLIENT_SECRET}' | base64 -d
```
**Solutions:**
1. Verify client secret is correct and not expired
2. Check API permissions are granted
3. Verify tenant ID and client ID are correct
4. Check if client secret needs rotation
5. Review Azure AD app registration status
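
One way to confirm that the tenant ID, client ID, and client secret are valid together is to request a token directly with the client-credentials grant. The sketch below makes several assumptions: `ENTRA_CLIENT_ID` is an assumed variable name (only `ENTRA_TENANT_ID` and `ENTRA_CLIENT_SECRET` appear in this runbook), `token_url` and `check_token` are hypothetical helpers, and the scope value is the Verified ID Request Service application ID, which should be confirmed against Microsoft's documentation before relying on it.

```bash
# Build the Microsoft identity platform v2.0 token endpoint for a tenant.
token_url() {
  echo "https://login.microsoftonline.com/$1/oauth2/v2.0/token"
}

# Request a token and print only the HTTP status code, so the token is never echoed.
# Note: the secret is passed as a curl argument and may be visible in the process list.
check_token() {
  curl -s -o /dev/null -w '%{http_code}\n' -X POST "$(token_url "$ENTRA_TENANT_ID")" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=$ENTRA_CLIENT_ID" \
    --data-urlencode "client_secret=$ENTRA_CLIENT_SECRET" \
    --data-urlencode "scope=3db474b9-6a0c-4840-96ac-1fceb342124f/.default"   # assumed scope
}

# Usage (performs a real request; export the three ENTRA_* variables first):
#   check_token
```

A 200 suggests the credentials are accepted; a 4xx response points to a wrong or expired secret, a wrong client or tenant ID, or missing permissions, and the response body (drop `-o /dev/null` to see it) carries the specific error code.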
## Common Operations
### Issue a Credential Manually
```bash
curl -X POST https://api.theorder.org/vc/issue/entra \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"claims": {
"email": "user@example.com",
"name": "John Doe",
"role": "member"
},
"manifestName": "default"
}'
```
### Check Credential Status
```bash
curl https://api.theorder.org/vc/entra/status/<requestId> \
-H "Authorization: Bearer <token>"
```
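For a request that is still in progress, the status call can be repeated until it reaches a final state. The sketch below is an assumption-heavy illustration: the response shape of the status endpoint is not documented here, so it assumes a JSON body with a `requestStatus` field as in the webhook payload; `issuance_error` is assumed as the failure status; `TOKEN`, `extract_status`, `is_terminal_status`, and `poll_status` are hypothetical names.

```bash
# Succeed when a request status is final. "issuance_successful" appears in this runbook;
# "issuance_error" is an assumed failure status. Anything else counts as in progress.
is_terminal_status() {
  case "$1" in
    issuance_successful|issuance_error) return 0 ;;
    *) return 1 ;;
  esac
}

# Extract the "requestStatus" string from a JSON body on stdin (assumed field name).
extract_status() {
  sed -n 's/.*"requestStatus"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll the status endpoint every 5 seconds, up to 30 times.
poll_status() {
  tries=0
  while [ "$tries" -lt 30 ]; do
    status=$(curl -s "https://api.theorder.org/vc/entra/status/$1" \
      -H "Authorization: Bearer $TOKEN" | extract_status)
    echo "status: ${status:-unknown}"
    if is_terminal_status "$status"; then return 0; fi
    tries=$((tries + 1))
    sleep 5
  done
  return 1
}

# Usage (performs real requests):
#   TOKEN=<token> poll_status <requestId>
```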
### Verify a Credential
```bash
curl -X POST https://api.theorder.org/vc/verify/entra \
-H "Content-Type: application/json" \
-d '{
"credential": {
"id": "vc:123",
"type": ["VerifiableCredential"],
"issuer": "did:web:...",
"credentialSubject": {...},
"proof": {...}
}
}'
```
### View Recent Issuances
```bash
# Query database (sh -c makes $DATABASE_URL expand inside the pod, not in the local shell)
kubectl exec -n the-order-prod deployment/identity-service -- \
sh -c 'psql "$DATABASE_URL" -c "SELECT * FROM verifiable_credentials ORDER BY created_at DESC LIMIT 10;"'
```
### Check Metrics
```bash
# Get all Entra metrics
curl https://api.theorder.org/metrics | grep entra_
# Get specific metric
curl https://api.theorder.org/metrics | grep entra_credentials_issued_total
```
### Rotate Client Secret
1. Create new client secret in Azure Portal
2. Update secret in Key Vault:
```bash
az keyvault secret set --vault-name <keyvault> --name "entra-client-secret" --value "<new-secret>"
```
3. Restart identity service to pick up new secret
4. Verify service starts correctly
5. Test credential issuance
6. Delete old secret after verification
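
The restart and verification steps can be sketched as a rolling restart followed by a wait for the rollout to finish. This assumes the identity service reads the Key Vault secret at startup, which this runbook implies but does not state; `run` and `restart_identity_service` are hypothetical helpers, and `DRY_RUN=1` prints the commands instead of executing them.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Restart the deployment and wait until the new pods are ready.
restart_identity_service() {
  run kubectl rollout restart deployment/identity-service -n the-order-prod &&
  run kubectl rollout status deployment/identity-service -n the-order-prod --timeout=5m
}

# Usage:
#   DRY_RUN=1; restart_identity_service   # preview the commands
#   DRY_RUN=0; restart_identity_service   # execute against the cluster
```

A rolling restart keeps existing pods serving until replacements are ready, so issuance should be tested only after `rollout status` reports success.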
### Add New Credential Manifest
1. Create manifest in Azure Portal → Verified ID
2. Note the Manifest ID
3. Update `ENTRA_MANIFESTS` environment variable:
```bash
ENTRA_MANIFESTS='{"default":"id1","new-manifest":"new-id"}'
```
4. Restart identity service
5. Test issuance with new manifest:
```bash
curl -X POST .../vc/issue/entra -d '{"claims": {...}, "manifestName": "new-manifest"}'
```
## Emergency Procedures
### Disable Entra Integration
If critical issues occur:
1. **Scale down identity service** (if using separate deployment):
```bash
kubectl scale deployment identity-service -n the-order-prod --replicas=0
```
2. **Or disable Entra routes** by setting the following in the identity service environment and restarting the service:
```bash
ENTRA_TENANT_ID=""
```
3. **Verify routes are disabled**:
```bash
curl https://api.theorder.org/vc/issue/entra
# Should return 503 or route not found
```
4. **Monitor for stability**
### Rollback Deployment
1. Identify previous working version
2. Rollback deployment:
```bash
kubectl rollout undo deployment/identity-service -n the-order-prod
```
3. Verify rollback:
```bash
kubectl rollout status deployment/identity-service -n the-order-prod
```
4. Test critical functionality
5. Monitor metrics
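
When the last good version is not the immediately previous one, the rollback can target a specific revision. This is a sketch under the same naming used above (`deployment/identity-service` in `the-order-prod`); `run` and `rollback_identity_service` are hypothetical helpers, and `DRY_RUN=1` prints the commands instead of executing them.

```bash
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# With no argument, roll back to the previous revision; otherwise to the given revision number.
rollback_identity_service() {
  if [ -n "${1:-}" ]; then
    run kubectl rollout undo deployment/identity-service -n the-order-prod --to-revision="$1"
  else
    run kubectl rollout undo deployment/identity-service -n the-order-prod
  fi &&
  run kubectl rollout status deployment/identity-service -n the-order-prod
}

# Usage:
#   kubectl rollout history deployment/identity-service -n the-order-prod   # list revisions
#   DRY_RUN=1; rollback_identity_service 7                                  # preview
```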
### Emergency Credential Issuance
If automated issuance fails, use manual process:
1. Access Entra VerifiedID portal directly
2. Issue credential manually
3. Export credential data
4. Import into database if needed
5. Notify affected users
## Diagnostic Commands
### Check Service Status
```bash
kubectl get pods -n the-order-prod -l app=identity-service
kubectl describe pod <pod-name> -n the-order-prod
```
### View Logs
```bash
# Recent logs
kubectl logs -n the-order-prod deployment/identity-service --tail=100
# Follow logs
kubectl logs -n the-order-prod deployment/identity-service -f
# Filter logs for Entra entries
kubectl logs -n the-order-prod deployment/identity-service | grep -i entra
```
### Check Configuration
```bash
# Environment variables
kubectl exec -n the-order-prod deployment/identity-service -- env | grep ENTRA
# ConfigMap
kubectl get configmap -n the-order-prod identity-service-config -o yaml
# Secrets (base64 encoded)
kubectl get secret -n the-order-prod entra-credentials -o yaml
```
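To confirm that the expected variables are present without printing their values, the `env` output can be piped through a small filter. This is a sketch only: `missing_vars` is a hypothetical helper, and `ENTRA_CLIENT_ID` is an assumed variable name (the others appear elsewhere in this runbook).

```bash
# Read KEY=VALUE lines on stdin and print each required name that is absent or empty.
missing_vars() {
  input=$(cat)
  for name in "$@"; do
    echo "$input" | grep -q "^${name}=." || echo "$name"
  done
}

# Usage (runs against the cluster):
#   kubectl exec -n the-order-prod deployment/identity-service -- env |
#     missing_vars ENTRA_TENANT_ID ENTRA_CLIENT_ID ENTRA_CLIENT_SECRET ENTRA_MANIFESTS
```

Empty output means every listed variable is set to a non-empty value; note that an empty `ENTRA_TENANT_ID` is also how the Entra routes are deliberately disabled in the emergency procedure.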
### Test Connectivity
```bash
# Test Entra API
curl -v https://verifiedid.did.msidentity.com/v1.0/
# Test webhook endpoint
curl -X POST https://api.theorder.org/vc/entra/webhook \
-H "Content-Type: application/json" \
-d '{"requestId":"test","requestStatus":"issuance_successful"}'
```
## Support Escalation
1. **Level 1**: Check logs, metrics, and run diagnostic commands
2. **Level 2**: Review configuration and test connectivity
3. **Level 3**: Contact Azure support for Entra VerifiedID issues
4. **Level 4**: Escalate to engineering team for code issues
## Contact Information
- **On-Call Engineer**: [Contact Info]
- **Azure Support**: [Azure Portal](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade)
- **Entra Documentation**: [Microsoft Learn](https://learn.microsoft.com/en-us/azure/active-directory/verifiable-credentials/)
---
**Last Updated**: [Current Date]
**Version**: 1.0