# Entra VerifiedID Operational Runbook

This runbook provides operational procedures for managing the Entra VerifiedID integration.

## Table of Contents

1. [Daily Operations](#daily-operations)
2. [Monitoring](#monitoring)
3. [Troubleshooting](#troubleshooting)
4. [Common Operations](#common-operations)
5. [Emergency Procedures](#emergency-procedures)
6. [Diagnostic Commands](#diagnostic-commands)
7. [Support Escalation](#support-escalation)
8. [Contact Information](#contact-information)

## Daily Operations

### Health Checks

**Check Service Health**

```bash
curl https://api.theorder.org/health
```

**Check Entra Client Status**

```bash
# Check logs for Entra client initialization
kubectl logs -n the-order-prod deployment/identity-service | grep -i entra
```

**Verify Metrics Collection**

```bash
curl https://api.theorder.org/metrics | grep entra
```

### Key Metrics to Monitor

1. **Issuance Success Rate**: Should be >95%

   ```promql
   sum(rate(entra_credentials_issued_total{status="success"}[5m]))
     / sum(rate(entra_credentials_issued_total[5m]))
   ```

2. **API Latency**: p95 should be <5 seconds

   ```promql
   histogram_quantile(
     0.95,
     sum by (le) (rate(entra_api_request_duration_seconds_bucket{operation="issueCredential"}[5m]))
   )
   ```

3. **Error Rate**: Should be <5%

   ```promql
   sum(rate(entra_api_errors_total[5m]))
     / sum(rate(entra_api_requests_total[5m]))
   ```

4. **Webhook Processing**: All received webhooks should be processed. The query below shows the receive rate; a rate of zero while credentials are being issued points to the "Webhooks Not Received" issue under Troubleshooting.

   ```promql
   rate(entra_webhooks_received_total[5m])
   ```

## Monitoring

### Grafana Dashboard

Access the Entra VerifiedID dashboard at: `https://grafana.theorder.org/d/entra-verifiedid`

**Key Panels:**

- Issuance Success Rate (gauge)
- API Request Rate (graph)
- Error Rate by Operation (graph)
- Issuance Duration (histogram)
- Webhook Events (graph)
- Active Requests (gauge)

### Alerts

**Critical Alerts:**

- `EntraIssuanceErrorRateHigh`: Error rate >10%
- `EntraIssuanceLatencyHigh`: p95 latency >10 seconds
- `EntraWebhookProcessingFailed`: Webhook processing failures
- `EntraAPIDown`: No successful API requests in 5 minutes

**Warning Alerts:**

- `EntraIssuanceErrorRateWarning`: Error rate >5%
- `EntraIssuanceLatencyWarning`: p95 latency >5 seconds
- `EntraRateLimitApproaching`: Rate limit usage >80%

## Troubleshooting

### Issue: Credential Issuance Failing

**Symptoms:**

- High error rate in metrics
- 500 errors in logs
- No credentials being issued

**Diagnosis:**

```bash
# Check recent errors
kubectl logs -n the-order-prod deployment/identity-service --tail=100 | grep -i error

# Check Entra API connectivity
curl -X POST https://verifiedid.did.msidentity.com/v1.0/verifiableCredentials/createIssuanceRequest \
  -H "Authorization: Bearer <token>"

# Verify credentials
kubectl get secret -n the-order-prod entra-credentials -o yaml
```

**Solutions:**

1. Verify Entra credentials are correct
2. Check API permissions are granted
3. Verify credential manifest exists
4. Check network connectivity to Entra API
5. Review Entra service status in Azure Portal

### Issue: Webhooks Not Received

**Symptoms:**

- No webhook events in metrics
- Credentials stuck in "pending" status
- Database not updated

**Diagnosis:**

```bash
# Check webhook endpoint
curl -X POST https://api.theorder.org/vc/entra/webhook \
  -H "Content-Type: application/json" \
  -d '{"requestId":"test","requestStatus":"issuance_successful"}'

# Check webhook logs
kubectl logs -n the-order-prod deployment/identity-service | grep webhook

# Verify webhook URL in Entra
# Go to Azure Portal → Verified ID → Settings → Webhooks
```

**Solutions:**

1. Verify webhook URL is configured in Entra VerifiedID
2. Check webhook endpoint is accessible (firewall, ingress rules)
3. Verify webhook payload format matches expected schema
4. Check database connectivity
5. Review webhook processing logs

### Issue: High Latency

**Symptoms:**

- Slow credential issuance (>10 seconds)
- High p95/p99 latency metrics
- Timeout errors

**Diagnosis:**

```bash
# Check API request duration
kubectl logs -n the-order-prod deployment/identity-service | grep "duration"

# Check network latency to Entra
ping verifiedid.did.msidentity.com
# If ICMP is blocked, time an HTTPS request instead
curl -s -o /dev/null -w 'connect=%{time_connect}s total=%{time_total}s\n' https://verifiedid.did.msidentity.com/

# Check retry attempts
kubectl logs -n the-order-prod deployment/identity-service | grep retry
```

**Solutions:**

1. Check network connectivity and latency
2. Verify Entra API is not experiencing issues
3. Review retry configuration (may be retrying too many times)
4. Check if rate limiting is causing delays
5. Consider increasing timeout values

### Issue: Rate Limit Errors

**Symptoms:**

- 429 errors in logs
- Rate limit metrics showing violations
- Requests being rejected

**Diagnosis:**

```bash
# Check rate limit violations
kubectl logs -n the-order-prod deployment/identity-service | grep "429"

# Check current rate limit settings
kubectl get configmap -n the-order-prod identity-service-config -o yaml | grep ENTRA_RATE_LIMIT
```

**Solutions:**

1. Review current rate limit configuration
2. Check Entra API quota limits
3. Adjust rate limits if needed
4. Implement request queuing if necessary
5. Contact Entra support if the quota needs to be increased

### Issue: Token Refresh Failures

**Symptoms:**

- "Failed to get access token" errors
- Authentication failures
- 401 errors

**Diagnosis:**

```bash
# Check token refresh logs
kubectl logs -n the-order-prod deployment/identity-service | grep "token"

# Verify credentials
# Note: this prints the client secret in plain text; run it only in a trusted terminal
kubectl get secret -n the-order-prod entra-credentials -o jsonpath='{.data.ENTRA_CLIENT_SECRET}' | base64 -d
```

**Solutions:**

1. Verify client secret is correct and not expired
2. Check API permissions are granted
3. Verify tenant ID and client ID are correct
4. Check if client secret needs rotation
5. Review Azure AD app registration status

## Common Operations

### Issue a Credential Manually

```bash
curl -X POST https://api.theorder.org/vc/issue/entra \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <token>" \
  -d '{
    "claims": {
      "email": "user@example.com",
      "name": "John Doe",
      "role": "member"
    },
    "manifestName": "default"
  }'
```

### Check Credential Status

```bash
curl "https://api.theorder.org/vc/entra/status/<request-id>" \
  -H "Authorization: Bearer <token>"
```

### Verify a Credential

```bash
curl -X POST https://api.theorder.org/vc/verify/entra \
  -H "Content-Type: application/json" \
  -d '{
    "credential": {
      "id": "vc:123",
      "type": ["VerifiableCredential"],
      "issuer": "did:web:...",
      "credentialSubject": {...},
      "proof": {...}
    }
  }'
```

### View Recent Issuances

```bash
# Query database
# The inner shell makes sure $DATABASE_URL is expanded inside the pod, not on your workstation
kubectl exec -n the-order-prod deployment/identity-service -- \
  sh -c 'psql "$DATABASE_URL" -c "SELECT * FROM verifiable_credentials ORDER BY created_at DESC LIMIT 10;"'
```

### Check Metrics

```bash
# Get all Entra metrics
curl https://api.theorder.org/metrics | grep entra_

# Get specific metric
curl https://api.theorder.org/metrics | grep entra_credentials_issued_total
```

### Rotate Client Secret

1. Create new client secret in Azure Portal
2. Update secret in Key Vault:

   ```bash
   az keyvault secret set --vault-name "<vault-name>" --name "entra-client-secret" --value "<new-secret>"
   ```

3. Restart identity service to pick up new secret
4. Verify service starts correctly
5. Test credential issuance
6. Delete old secret after verification

### Add New Credential Manifest

1. Create manifest in Azure Portal → Verified ID
2. Note the Manifest ID
3. Update `ENTRA_MANIFESTS` environment variable:

   ```bash
   ENTRA_MANIFESTS='{"default":"id1","new-manifest":"new-id"}'
   ```

4. Restart identity service
5. Test issuance with new manifest:

   ```bash
   curl -X POST .../vc/issue/entra -d '{"claims": {...}, "manifestName": "new-manifest"}'
   ```

## Emergency Procedures

### Disable Entra Integration

If critical issues occur:

1. **Scale down identity service** (if using separate deployment):

   ```bash
   kubectl scale deployment identity-service -n the-order-prod --replicas=0
   ```

2. **Or disable Entra routes** by setting:

   ```bash
   ENTRA_TENANT_ID=""
   ```

3. **Verify routes are disabled**:

   ```bash
   # Should return 503 or route not found
   curl https://api.theorder.org/vc/issue/entra
   ```

4. **Monitor for stability**

### Rollback Deployment

1. Identify previous working version
2. Rollback deployment:

   ```bash
   kubectl rollout undo deployment/identity-service -n the-order-prod
   ```

3. Verify rollback:

   ```bash
   kubectl rollout status deployment/identity-service -n the-order-prod
   ```

4. Test critical functionality
5. Monitor metrics

### Emergency Credential Issuance

If automated issuance fails, use the manual process:

1. Access Entra VerifiedID portal directly
2. Issue credential manually
3. Export credential data
4. Import into database if needed
5. Notify affected users

## Diagnostic Commands

### Check Service Status

```bash
kubectl get pods -n the-order-prod -l app=identity-service
kubectl describe pod -n the-order-prod "<pod-name>"
```

### View Logs

```bash
# Recent logs
kubectl logs -n the-order-prod deployment/identity-service --tail=100

# Follow logs
kubectl logs -n the-order-prod deployment/identity-service -f

# Logs with grep
kubectl logs -n the-order-prod deployment/identity-service | grep -i entra
```

### Check Configuration

```bash
# Environment variables
kubectl exec -n the-order-prod deployment/identity-service -- env | grep ENTRA

# ConfigMap
kubectl get configmap -n the-order-prod identity-service-config -o yaml

# Secrets (base64 encoded)
kubectl get secret -n the-order-prod entra-credentials -o yaml
```

### Test Connectivity

```bash
# Test Entra API
curl -v https://verifiedid.did.msidentity.com/v1.0/

# Test webhook endpoint
curl -X POST https://api.theorder.org/vc/entra/webhook \
  -H "Content-Type: application/json" \
  -d '{"requestId":"test","requestStatus":"issuance_successful"}'
```

## Support Escalation

1. **Level 1**: Check logs, metrics, and run diagnostic commands
2. **Level 2**: Review configuration and test connectivity
3. **Level 3**: Contact Azure support for Entra VerifiedID issues
4. **Level 4**: Escalate to engineering team for code issues

## Contact Information

- **On-Call Engineer**: [Contact Info]
- **Azure Support**: [Azure Portal](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade)
- **Entra Documentation**: [Microsoft Learn](https://learn.microsoft.com/en-us/azure/active-directory/verifiable-credentials/)

---

**Last Updated**: [Current Date]
**Version**: 1.0