# Fairness Audit Orchestration - Design Document

## Design Philosophy

> **We design from the end result backwards.**

First we list every output you want (reports, files, metrics). Then we calculate how much input processing (validation, enrichment, and fairness evaluation) is needed, roughly **2× your input effort**, and size the total job at about **3.2× the input**.

You only choose:
1. **What goes in** (Input)
2. **What comes out** (Output)
3. **When it needs to be ready** (Timeline)

The orchestration engine does everything in between.

## Mathematical Model

### Core Formula

```
Total Process Load = O + 2I ≈ 3.2I   (when O ≈ 1.2I)
```

Where:
- **I** = Input size/effort (units)
- **O** = Total output effort (sum of output weights)
- **2I** = Two input processing passes (ingestion and enrichment, then fairness evaluation)

### Design Target

```
O ≈ 1.2 × I
```

This means for a typical input of 100 units:
- Output should be around 120 units
- Input passes = 200 units (2 × 100)
- Total load ≈ 320 units (120 + 200)

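
The model above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the engine's actual code: the names `estimateTotalLoad`, `targetOutputLoad`, and `LoadEstimate` are hypothetical, and only the formulas `O + 2I` and `O ≈ 1.2 × I` come from this document.

```typescript
// Hypothetical names; only the formulas come from the model above.
const INPUT_PASSES = 2;          // ingestion/enrichment + fairness evaluation
const TARGET_OUTPUT_RATIO = 1.2; // design target: O ≈ 1.2 × I

interface LoadEstimate {
  inputPassLoad: number; // 2I
  outputLoad: number;    // O
  totalLoad: number;     // O + 2I
}

// Total process load for a given input load I and output load O.
function estimateTotalLoad(inputLoad: number, outputLoad: number): LoadEstimate {
  const inputPassLoad = INPUT_PASSES * inputLoad;
  return { inputPassLoad, outputLoad, totalLoad: outputLoad + inputPassLoad };
}

// Output load the design target suggests for a given input load.
function targetOutputLoad(inputLoad: number): number {
  return TARGET_OUTPUT_RATIO * inputLoad;
}

// Typical case from the text: I = 100, O = 120.
const typical = estimateTotalLoad(100, 120);
console.log(typical.inputPassLoad, typical.totalLoad); // 200 320
```

Keeping the two constants separate from the functions makes it easy to see that the 3.2× figure is a consequence of the design target, not an independent parameter.
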
### Why 2× Input?

The input requires two full passes:
1. **Ingestion & Enrichment**: Load data, validate, enrich with metadata
2. **Fairness Evaluation**: Run fairness algorithms, calculate metrics

Each pass processes the full input, hence 2×.

## Output Weight Guidelines

### Weight Calculation Factors

1. **Complexity**: How complex is the output to generate?
2. **Data Volume**: How much data does it contain?
3. **Processing Time**: How long does generation take?
4. **Dependencies**: Does it depend on other outputs?

### Recommended Weights

| Complexity | Typical Weight Range | Examples |
|-----------|---------------------|----------|
| Simple | 0.5 - 1.0 | Metrics export, Alert config |
| Medium | 1.0 - 2.0 | CSV exports, JSON reports |
| Complex | 2.0 - 3.0 | PDF reports, Dashboards, Compliance docs |

### Weight Examples

- **Metrics Export (1.0)**: Simple calculation, small output
- **Flagged Cases CSV (1.5)**: Medium complexity, moderate data
- **Fairness Audit PDF (2.5)**: Complex formatting, large output
- **Compliance Report (2.2)**: Complex structure, regulatory requirements

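
Since **O** is defined as the sum of output weights, selecting outputs amounts to a simple reduction. The sketch below uses the four example weights listed above. The `OutputSpec` type and `totalOutputWeight` function are hypothetical names, and how a weight sum is scaled into load units comparable with **I** is not specified in this document, so the sketch only sums the raw weights.

```typescript
// Hypothetical shape for a selectable output; weights are the example values above.
interface OutputSpec {
  name: string;
  weight: number;
}

const exampleOutputs: OutputSpec[] = [
  { name: "Metrics Export", weight: 1.0 },
  { name: "Flagged Cases CSV", weight: 1.5 },
  { name: "Fairness Audit PDF", weight: 2.5 },
  { name: "Compliance Report", weight: 2.2 },
];

// O is the sum of the weights of the selected outputs.
function totalOutputWeight(outputs: OutputSpec[]): number {
  return outputs.reduce((sum, output) => sum + output.weight, 0);
}

console.log(totalOutputWeight(exampleOutputs).toFixed(1)); // "7.2"
```
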
## Input Load Estimation

### Base Calculation

```
Base = 100 units

+ Sensitive Attributes: 20 units each
+ Date Range: 5 units per day
+ Filters: 10 units each
+ Estimated Size: Use if provided
```

### Example Calculations

**Small Dataset**:
- 2 sensitive attributes
- 7-day range
- 1 filter
- Load = 100 + (2×20) + (7×5) + (1×10) = 185 units

**Large Dataset**:
- 5 sensitive attributes
- 90-day range
- 5 filters
- Load = 100 + (5×20) + (90×5) + (5×10) = 700 units

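
The base calculation can be written as a small function. This is a sketch under stated assumptions: `InputSpec` and `estimateInputLoad` are hypothetical names, the unit costs are the ones listed in the base calculation, and the "Estimated Size" term is left out because the text does not say how it combines with the other terms. For the small dataset the terms are 100 + 40 + 35 + 10, which sum to 185 units.

```typescript
// Hypothetical request shape; the unit costs are the ones listed above.
interface InputSpec {
  sensitiveAttributes: number; // count of sensitive attributes
  dateRangeDays: number;       // length of the date range in days
  filters: number;             // count of filters
}

const BASE_UNITS = 100;
const UNITS_PER_ATTRIBUTE = 20;
const UNITS_PER_DAY = 5;
const UNITS_PER_FILTER = 10;

// Input load I in units: base cost plus per-attribute, per-day, and per-filter costs.
function estimateInputLoad(spec: InputSpec): number {
  return (
    BASE_UNITS +
    spec.sensitiveAttributes * UNITS_PER_ATTRIBUTE +
    spec.dateRangeDays * UNITS_PER_DAY +
    spec.filters * UNITS_PER_FILTER
  );
}

console.log(estimateInputLoad({ sensitiveAttributes: 2, dateRangeDays: 7, filters: 1 }));  // 185
console.log(estimateInputLoad({ sensitiveAttributes: 5, dateRangeDays: 90, filters: 5 })); // 700
```
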
## Timeline Validation

### SLA Parsing

Supported formats:
- "2 hours"
- "1 day"
- "30 minutes"
- "45 seconds"

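
One way to parse these SLA strings into seconds is a single regular expression plus a unit table. The function name `parseSla`, the `null` return for unrecognized input, and the acceptance of both singular and plural unit spellings are assumptions made for this sketch. The document only lists the four example formats.

```typescript
// Seconds per unit. Accepting singular and plural spellings is an assumption.
const SECONDS_PER_UNIT: Record<string, number> = {
  second: 1,
  minute: 60,
  hour: 3600,
  day: 86400,
};

// Returns the SLA in seconds, or null if the string is not "<integer> <unit>".
function parseSla(sla: string): number | null {
  const match = /^(\d+)\s+(second|minute|hour|day)s?$/i.exec(sla.trim());
  if (match === null) return null;
  const amount = Number(match[1]);
  const unit = match[2].toLowerCase();
  return amount * SECONDS_PER_UNIT[unit];
}

console.log(parseSla("2 hours"));    // 7200
console.log(parseSla("1 day"));      // 86400
console.log(parseSla("30 minutes")); // 1800
console.log(parseSla("45 seconds")); // 45
console.log(parseSla("soon"));       // null
```

Returning `null` instead of throwing lets the caller show a format hint without a try/catch.
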
### Feasibility Checks

1. **Time Check**: `estimatedTime ≤ maxTimeSeconds`
2. **Output Check**: `outputLoad ≤ 1.5 × (inputLoad × 1.2)`
3. **Total Load Check**: `totalLoad ≤ 1.3 × (inputLoad × 3.2)`

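
The three checks translate directly into boolean expressions. In the sketch below, the interface and function names are hypothetical, `totalLoad` is derived as `O + 2I` from the core formula, and treating a job as feasible only when all three checks pass is an assumption.

```typescript
interface FeasibilityInput {
  estimatedTime: number;  // seconds, from the engine's time estimate
  maxTimeSeconds: number; // parsed SLA, in seconds
  inputLoad: number;      // I
  outputLoad: number;     // O
}

interface FeasibilityResult {
  timeOk: boolean;
  outputOk: boolean;
  totalLoadOk: boolean;
  feasible: boolean; // assumption: all three checks must pass
}

function checkFeasibility(job: FeasibilityInput): FeasibilityResult {
  const totalLoad = job.outputLoad + 2 * job.inputLoad; // O + 2I
  const timeOk = job.estimatedTime <= job.maxTimeSeconds;
  const outputOk = job.outputLoad <= 1.5 * (job.inputLoad * 1.2);
  const totalLoadOk = totalLoad <= 1.3 * (job.inputLoad * 3.2);
  return { timeOk, outputOk, totalLoadOk, feasible: timeOk && outputOk && totalLoadOk };
}

// I = 100, O = 200: the output limit is about 180 units, so the output check fails.
const heavyOutput = checkFeasibility({
  estimatedTime: 3000,
  maxTimeSeconds: 3600,
  inputLoad: 100,
  outputLoad: 200,
});
console.log(heavyOutput.outputOk, heavyOutput.feasible); // false false
```
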
### Warning Thresholds

- **Critical**: Estimated time exceeds timeline
- **Warning**: Estimated time > 80% of timeline
- **Info**: Output load > 1.5× target

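
The thresholds can be turned into a function that collects notices for a job. This is a sketch only: `Severity`, `Notice`, and `collectNotices` are hypothetical names, the output target is taken to be `1.2 × I` from the design target, and reporting only the critical notice when the estimate also exceeds 80% of the timeline is an assumption.

```typescript
type Severity = "critical" | "warning" | "info";

interface Notice {
  severity: Severity;
  message: string;
}

function collectNotices(
  estimatedTime: number,  // seconds
  maxTimeSeconds: number, // timeline in seconds
  outputLoad: number,     // O
  inputLoad: number,      // I
): Notice[] {
  const notices: Notice[] = [];
  if (estimatedTime > maxTimeSeconds) {
    notices.push({ severity: "critical", message: "Estimated time exceeds timeline" });
  } else if (estimatedTime > 0.8 * maxTimeSeconds) {
    notices.push({ severity: "warning", message: "Estimated time is over 80% of timeline" });
  }
  const targetOutput = 1.2 * inputLoad; // design target O ≈ 1.2 × I
  if (outputLoad > 1.5 * targetOutput) {
    notices.push({ severity: "info", message: "Output load is over 1.5× target" });
  }
  return notices;
}

// 3000 s against a 3600 s timeline is above the 2880 s (80%) mark.
console.log(collectNotices(3000, 3600, 120, 100).map((n) => n.severity)); // [ 'warning' ]
```
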
## User Experience Flow

### Step 1: Select Outputs
- User checks desired outputs
- Engine calculates O in real-time
- Shows total output load

### Step 2: Specify Input
- User enters dataset, attributes, range
- Engine calculates I in real-time
- Shows estimated input load

### Step 3: Set Timeline
- User selects mode and SLA
- Engine validates feasibility
- Shows estimated time and warnings

### Step 4: Review & Run
- Engine shows complete analysis
- User reviews warnings/suggestions
- User confirms and runs

## Error Handling

### Invalid Configurations

1. **No Outputs Selected**: Disable run button
2. **No Dataset**: Disable run button
3. **Invalid SLA Format**: Show format hint
4. **Infeasible Timeline**: Show suggestions

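
The four cases above map naturally onto a pure function from the current form state to UI flags. The sketch below is illustrative: `JobDraft`, `UiState`, and `deriveUiState` are hypothetical names. It disables the run button only for the two cases the list names. Whether an invalid SLA or an infeasible timeline should also block the run is not stated here, so the sketch leaves the button enabled in those cases.

```typescript
// Hypothetical draft of what the user has entered so far.
interface JobDraft {
  selectedOutputs: string[];
  dataset: string | null;
  slaValid: boolean;         // result of SLA parsing
  timelineFeasible: boolean; // result of the feasibility checks
}

interface UiState {
  runEnabled: boolean;
  showSlaFormatHint: boolean;
  showSuggestions: boolean;
}

function deriveUiState(draft: JobDraft): UiState {
  const hasOutputs = draft.selectedOutputs.length > 0;
  const hasDataset = draft.dataset !== null && draft.dataset.trim() !== "";
  return {
    runEnabled: hasOutputs && hasDataset,
    showSlaFormatHint: !draft.slaValid,
    // Suggestions only make sense once the SLA itself can be parsed.
    showSuggestions: draft.slaValid && !draft.timelineFeasible,
  };
}

const emptyDraft = deriveUiState({
  selectedOutputs: [],
  dataset: null,
  slaValid: true,
  timelineFeasible: true,
});
console.log(emptyDraft.runEnabled); // false
```
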
### Suggestions

The engine provides actionable suggestions:
- "Consider reducing outputs"
- "Consider extending timeline"
- "Consider simplifying input filters"

## Performance Considerations

### Processing Rates

Rates are configurable and can be tuned based on:
- Hardware capabilities
- Network bandwidth
- Concurrent job limits
- Historical performance data

### Optimization Strategies

1. **Parallel Processing**: Process outputs in parallel when possible
2. **Caching**: Cache intermediate results
3. **Batch Processing**: Batch similar operations
4. **Resource Allocation**: Allocate resources based on load

## Future Enhancements

1. **Machine Learning**: Learn from historical runs to improve estimates
2. **Dynamic Weights**: Adjust weights based on actual performance
3. **Resource Scaling**: Automatically scale resources based on load
4. **Cost Estimation**: Add cost estimates alongside time estimates
5. **Multi-Tenant**: Support multiple concurrent orchestrations

## Related Documentation

- [Orchestration Engine](./ORCHESTRATION_ENGINE.md)
- [Output Weight Guidelines](./OUTPUT_WEIGHTS.md)
- [API Reference](./API_REFERENCE.md)