# Fairness Audit Orchestration - Design Document
## Design Philosophy
> **We design from the end result backwards.**

First we list every output you want (reports, files, metrics), then we add the cost of processing the input (two full passes, roughly 2× the input effort) and size the total job at about 3.2× the input.

You only choose:
1. **What goes in** (Input)
2. **What comes out** (Output)
3. **When it needs to be ready** (Timeline)

The orchestration engine does everything in between.
## Mathematical Model
### Core Formula
```
Total Process Load = O + 2I ≈ 3.2I
```
Where:
- **I** = Input size/effort (units)
- **O** = Total output effort (sum of output weights)
- **2I** = Two full input passes (ingestion & enrichment, then fairness evaluation)
### Design Target
```
O ≈ 1.2 × I
```
This means for a typical input of 100 units:
- Output should be around 120 units
- Total load ≈ 320 units
- Input passes = 200 units (2 × 100)
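The arithmetic above can be sketched directly; the constant and function names here are illustrative, not the engine's actual API:

```typescript
// Core sizing formulas (unit-free). OUTPUT_RATIO encodes the O ≈ 1.2 × I design target.
const OUTPUT_RATIO = 1.2;
const INPUT_PASSES = 2; // ingestion & enrichment, then fairness evaluation

function targetOutputLoad(inputLoad: number): number {
  return OUTPUT_RATIO * inputLoad;
}

function totalProcessLoad(inputLoad: number, outputLoad: number): number {
  return outputLoad + INPUT_PASSES * inputLoad; // O + 2I, ≈ 3.2I at target
}
```

For `inputLoad = 100` this yields an output target of 120 units and a total load of 320 units, matching the worked example above.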
### Why 2× Input?
The input requires two full passes:
1. **Ingestion & Enrichment**: Load data, validate, enrich with metadata
2. **Fairness Evaluation**: Run fairness algorithms, calculate metrics

Each pass processes the full input, hence 2×.
## Output Weight Guidelines
### Weight Calculation Factors
1. **Complexity**: How complex is the output to generate?
2. **Data Volume**: How much data does it contain?
3. **Processing Time**: How long does generation take?
4. **Dependencies**: Does it depend on other outputs?
### Recommended Weights
| Complexity | Typical Weight Range | Examples |
|-----------|---------------------|----------|
| Simple | 0.5 - 1.0 | Metrics export, Alert config |
| Medium | 1.0 - 2.0 | CSV exports, JSON reports |
| Complex | 2.0 - 3.0 | PDF reports, Dashboards, Compliance docs |
### Weight Examples
- **Metrics Export (1.0)**: Simple calculation, small output
- **Flagged Cases CSV (1.5)**: Medium complexity, moderate data
- **Fairness Audit PDF (2.5)**: Complex formatting, large output
- **Compliance Report (2.2)**: Complex structure, regulatory requirements
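The total output effort O is simply the sum of the selected outputs' weights. A minimal sketch, with a hypothetical catalogue built from the example weights above:

```typescript
// Hypothetical output catalogue; weights taken from the examples above.
const OUTPUT_WEIGHTS: Record<string, number> = {
  metricsExport: 1.0,
  flaggedCasesCsv: 1.5,
  fairnessAuditPdf: 2.5,
  complianceReport: 2.2,
};

// O = sum of the selected outputs' weights; unknown names contribute nothing.
function totalOutputLoad(selected: string[]): number {
  return selected.reduce((sum, name) => sum + (OUTPUT_WEIGHTS[name] ?? 0), 0);
}
```

Selecting the metrics export and the audit PDF, for example, gives O = 3.5.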
## Input Load Estimation
### Base Calculation
```
Base = 100 units
+ Sensitive Attributes: 20 units each
+ Date Range: 5 units per day
+ Filters: 10 units each
+ Estimated Size: Use if provided
```
### Example Calculations
**Small Dataset**:
- 2 sensitive attributes
- 7-day range
- 1 filter
- Load = 100 + (2×20) + (7×5) + (1×10) = 185 units

**Large Dataset**:
- 5 sensitive attributes
- 90-day range
- 5 filters
- Load = 100 + (5×20) + (90×5) + (5×10) = 700 units
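The base calculation translates directly into code. The interface and field names here are illustrative, and `estimatedSize` is assumed to override the formula when provided:

```typescript
interface InputSpec {
  sensitiveAttributes: number; // count of sensitive attributes
  rangeDays: number;           // date range length, in days
  filters: number;             // count of filters
  estimatedSize?: number;      // explicit estimate; assumed to override the formula
}

function estimateInputLoad(spec: InputSpec): number {
  if (spec.estimatedSize !== undefined) return spec.estimatedSize;
  return 100 + spec.sensitiveAttributes * 20 + spec.rangeDays * 5 + spec.filters * 10;
}
```

The small and large examples above come out to 185 and 700 units respectively.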
## Timeline Validation
### SLA Parsing
Supported formats:
- "2 hours"
- "1 day"
- "30 minutes"
- "45 seconds"
### Feasibility Checks
1. **Time Check**: `estimatedTime ≤ maxTimeSeconds`
2. **Output Check**: `outputLoad ≤ 1.5 × (inputLoad × 1.2)`
3. **Total Load Check**: `totalLoad ≤ 1.3 × (inputLoad × 3.2)`
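The three checks can be expressed as a single predicate; the names are illustrative:

```typescript
interface FeasibilityResult {
  time: boolean;   // estimatedTime ≤ maxTimeSeconds
  output: boolean; // outputLoad ≤ 1.5 × (inputLoad × 1.2)
  total: boolean;  // totalLoad ≤ 1.3 × (inputLoad × 3.2)
}

function checkFeasibility(
  estimatedTime: number,
  maxTimeSeconds: number,
  inputLoad: number,
  outputLoad: number,
  totalLoad: number,
): FeasibilityResult {
  return {
    time: estimatedTime <= maxTimeSeconds,
    output: outputLoad <= 1.5 * (inputLoad * 1.2),
    total: totalLoad <= 1.3 * (inputLoad * 3.2),
  };
}
```

At the design target (I = 100, O = 120, total = 320) all three checks pass with headroom.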
### Warning Thresholds
- **Critical**: Estimated time exceeds timeline
- **Warning**: Estimated time > 80% of timeline
- **Info**: Output load > 1.5× target
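These thresholds map onto a small severity classifier; the precedence below (critical over warning over info) is an assumption:

```typescript
type Severity = "critical" | "warning" | "info" | "ok";

function classifySeverity(
  estimatedTime: number,
  timeline: number,
  outputLoad: number,
  targetOutputLoad: number,
): Severity {
  if (estimatedTime > timeline) return "critical";        // exceeds timeline
  if (estimatedTime > 0.8 * timeline) return "warning";   // > 80% of timeline
  if (outputLoad > 1.5 * targetOutputLoad) return "info"; // > 1.5× output target
  return "ok";
}
```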
## User Experience Flow
### Step 1: Select Outputs
- User checks desired outputs
- Engine calculates O in real-time
- Shows total output load
### Step 2: Specify Input
- User enters dataset, attributes, range
- Engine calculates I in real-time
- Shows estimated input load
### Step 3: Set Timeline
- User selects mode and SLA
- Engine validates feasibility
- Shows estimated time and warnings
### Step 4: Review & Run
- Engine shows complete analysis
- User reviews warnings/suggestions
- User confirms and runs
## Error Handling
### Invalid Configurations
1. **No Outputs Selected**: Disable run button
2. **No Dataset**: Disable run button
3. **Invalid SLA Format**: Show format hint
4. **Infeasible Timeline**: Show suggestions
### Suggestions
Engine provides actionable suggestions:
- "Consider reducing outputs"
- "Consider extending timeline"
- "Consider simplifying input filters"
## Performance Considerations
### Processing Rates
Rates are configurable and can be tuned based on:
- Hardware capabilities
- Network bandwidth
- Concurrent job limits
- Historical performance data
### Optimization Strategies
1. **Parallel Processing**: Process outputs in parallel when possible
2. **Caching**: Cache intermediate results
3. **Batch Processing**: Batch similar operations
4. **Resource Allocation**: Allocate resources based on load
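Strategy 1 can be sketched with `Promise.all`; `generate` is a stand-in for a real output producer:

```typescript
// Runs independent output generators concurrently instead of one at a time.
async function generateOutputs(
  names: string[],
  generate: (name: string) => Promise<string>,
): Promise<string[]> {
  return Promise.all(names.map((name) => generate(name)));
}
```

Outputs with dependencies on other outputs would need an ordering step first; this sketch assumes they are independent.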
## Future Enhancements
1. **Machine Learning**: Learn from historical runs to improve estimates
2. **Dynamic Weights**: Adjust weights based on actual performance
3. **Resource Scaling**: Automatically scale resources based on load
4. **Cost Estimation**: Add cost estimates alongside time estimates
5. **Multi-Tenant**: Support multiple concurrent orchestrations
## Related Documentation
- [Orchestration Engine](./ORCHESTRATION_ENGINE.md)
- [Output Weight Guidelines](./OUTPUT_WEIGHTS.md)
- [API Reference](./API_REFERENCE.md)