# Fairness Audit Orchestration - Design Document

## Design Philosophy

> **We design from the end result backwards.**

First we list every output you want (reports, files, metrics). Then we calculate how much input processing is needed (validation, enrichment, and fairness evaluation), roughly **2× your input effort**, and size the total job at about **3.2× the input**.

You only choose:

1. **What goes in** (Input)
2. **What comes out** (Output)
3. **When it needs to be ready** (Timeline)

The orchestration engine does everything in between.

## Mathematical Model

### Core Formula

```
Total Process Load = O + 2I ≈ 3.2I
```

Where:

- **I** = Input size/effort (units)
- **O** = Total output effort (sum of output weights)
- **2I** = Two input processing passes (ingestion and enrichment, then fairness evaluation)

The approximation `≈ 3.2I` holds when the output effort meets the design target of `O ≈ 1.2 × I`, since `1.2I + 2I = 3.2I`.

### Design Target

```
O ≈ 1.2 × I
```

This means for a typical input of 100 units:

- Output should be around 120 units
- Input passes = 200 units (2 × 100)
- Total load ≈ 320 units (120 + 200)

### Why 2× Input?

The input requires two full passes:

1. **Ingestion & Enrichment**: Load data, validate, enrich with metadata
2. **Fairness Evaluation**: Run fairness algorithms, calculate metrics

Each pass processes the full input, hence 2×.

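The model above can be sketched in a few lines of TypeScript. This is only an illustration: the function and constant names are hypothetical, and the numbers are taken directly from the formulas in this section.

```typescript
// Hypothetical names; the constants mirror the formulas in this section.
const INPUT_PASSES = 2;          // ingestion & enrichment, then fairness evaluation
const TARGET_OUTPUT_RATIO = 1.2; // design target: O ≈ 1.2 × I

/** Total Process Load = O + 2I. */
function totalProcessLoad(inputLoad: number, outputLoad: number): number {
  return outputLoad + INPUT_PASSES * inputLoad;
}

/** Output load that meets the design target for a given input load. */
function targetOutputLoad(inputLoad: number): number {
  return TARGET_OUTPUT_RATIO * inputLoad;
}

// Worked example from the Design Target section: I = 100 units.
const exampleInputLoad = 100;
const exampleOutputLoad = targetOutputLoad(exampleInputLoad);
const exampleTotalLoad = totalProcessLoad(exampleInputLoad, exampleOutputLoad);
console.log({ exampleOutputLoad, exampleTotalLoad });
```

With an input of 100 units this reproduces the worked example: about 120 units of output and about 320 units of total load.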
## Output Weight Guidelines

### Weight Calculation Factors

1. **Complexity**: How complex is the output to generate?
2. **Data Volume**: How much data does it contain?
3. **Processing Time**: How long does generation take?
4. **Dependencies**: Does it depend on other outputs?

### Recommended Weights

| Complexity | Typical Weight Range | Examples |
|------------|----------------------|----------|
| Simple | 0.5 - 1.0 | Metrics export, Alert config |
| Medium | 1.0 - 2.0 | CSV exports, JSON reports |
| Complex | 2.0 - 3.0 | PDF reports, Dashboards, Compliance docs |

### Weight Examples

- **Metrics Export (1.0)**: Simple calculation, small output
- **Flagged Cases CSV (1.5)**: Medium complexity, moderate data
- **Fairness Audit PDF (2.5)**: Complex formatting, large output
- **Compliance Report (2.2)**: Complex structure, regulatory requirements

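Since O is defined as the sum of output weights, the selection step can be sketched as a simple lookup and sum. The identifiers below are illustrative, and the sketch only adds up the raw weights listed above; how those weights are scaled to the load units used for I is not specified here.

```typescript
// Weights copied from the examples above; the keys are hypothetical ids.
const OUTPUT_WEIGHTS: Record<string, number> = {
  metricsExport: 1.0,
  flaggedCasesCsv: 1.5,
  fairnessAuditPdf: 2.5,
  complianceReport: 2.2,
};

/** O = sum of the weights of the selected outputs. Unknown ids throw. */
function outputEffort(selectedOutputs: string[]): number {
  return selectedOutputs.reduce((sum, id) => {
    const weight = OUTPUT_WEIGHTS[id];
    if (weight === undefined) throw new Error(`Unknown output: ${id}`);
    return sum + weight;
  }, 0);
}

console.log(outputEffort(["metricsExport", "fairnessAuditPdf"])); // 3.5
```

Throwing on an unknown id is a design choice for the sketch: a silently ignored output would make O look smaller than it is.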
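## Input Load Estimation

### Base Calculation

The estimate starts from a base of 100 units and adds 20 units per sensitive attribute, 5 units per day of date range, and 10 units per filter. An estimated size is used if provided.

```typescript
// Estimated input load in units. Parameter names are illustrative.
function estimateInputLoad(
  sensitiveAttributes: number,
  dateRangeDays: number,
  filters: number,
  estimatedSize?: number, // assumption: replaces the computed value when provided
): number {
  if (estimatedSize !== undefined) return estimatedSize;
  // Base 100 + 20 per sensitive attribute + 5 per day + 10 per filter
  return 100 + 20 * sensitiveAttributes + 5 * dateRangeDays + 10 * filters;
}
```
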
### Example Calculations

**Small Dataset**:
- 2 sensitive attributes
- 7-day range
- 1 filter
- Load = 100 + (2×20) + (7×5) + (1×10) = 185 units

**Large Dataset**:
- 5 sensitive attributes
- 90-day range
- 5 filters
- Load = 100 + (5×20) + (90×5) + (5×10) = 700 units

## Timeline Validation

### SLA Parsing

Supported formats:
- "2 hours"
- "1 day"
- "30 minutes"
- "45 seconds"

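A minimal parser for the formats listed above might look like the following. The function name, the `null` return for unrecognized input, and the acceptance of singular and plural unit names are assumptions of this sketch, not documented engine behavior.

```typescript
// Seconds per unit for the four units shown in the supported formats.
const SECONDS_PER_UNIT: Record<string, number> = {
  second: 1,
  minute: 60,
  hour: 3600,
  day: 86400,
};

/**
 * Parses strings such as "2 hours" or "1 day" into seconds.
 * Returns null when the text does not match "<integer> <unit>[s]".
 */
function parseSlaSeconds(sla: string): number | null {
  const match = /^(\d+)\s+(second|minute|hour|day)s?$/i.exec(sla.trim());
  if (match === null) return null;
  const amount = Number(match[1]);
  const unit = match[2].toLowerCase();
  return amount * SECONDS_PER_UNIT[unit];
}

console.log(parseSlaSeconds("2 hours")); // 7200
console.log(parseSlaSeconds("soon"));    // null
```

Returning `null` rather than throwing lets the caller decide how to surface the problem, for example by showing a format hint.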
### Feasibility Checks

1. **Time Check**: `estimatedTime ≤ maxTimeSeconds`
2. **Output Check**: `outputLoad ≤ 1.5 × (inputLoad × 1.2)`
3. **Total Load Check**: `totalLoad ≤ 1.3 × (inputLoad × 3.2)`

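The three checks can be sketched as one function. The interface and field names are hypothetical, and treating a configuration as feasible only when all three checks pass is an assumption of this sketch.

```typescript
interface LoadEstimate {
  inputLoad: number;      // I, in units
  outputLoad: number;     // O, in units
  estimatedTime: number;  // seconds
  maxTimeSeconds: number; // timeline, e.g. a parsed SLA
}

interface FeasibilityResult {
  timeOk: boolean;
  outputOk: boolean;
  totalLoadOk: boolean;
  feasible: boolean; // assumption: all three checks must pass
}

function checkFeasibility(e: LoadEstimate): FeasibilityResult {
  const totalLoad = e.outputLoad + 2 * e.inputLoad; // O + 2I
  const timeOk = e.estimatedTime <= e.maxTimeSeconds;
  const outputOk = e.outputLoad <= 1.5 * (e.inputLoad * 1.2);
  const totalLoadOk = totalLoad <= 1.3 * (e.inputLoad * 3.2);
  return { timeOk, outputOk, totalLoadOk, feasible: timeOk && outputOk && totalLoadOk };
}

// On-target job: I = 100, O = 120, ten minutes against a two-hour timeline.
console.log(
  checkFeasibility({ inputLoad: 100, outputLoad: 120, estimatedTime: 600, maxTimeSeconds: 7200 }),
);
```

For I = 100 the output limit is 180 units and the total load limit is about 416 units, so an output load of 200 fails only the output check, while 220 fails both load checks.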
### Warning Thresholds

- **Critical**: Estimated time exceeds timeline
- **Warning**: Estimated time > 80% of timeline
- **Info**: Output load > 1.5× the output target (`1.2 × inputLoad`)

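One way to turn these thresholds into messages is sketched below. The type and function names are hypothetical, and the sketch assumes that a critical time warning replaces the 80% warning instead of being reported alongside it.

```typescript
type Severity = "critical" | "warning" | "info";

interface TimelineWarning {
  severity: Severity;
  message: string;
}

function collectWarnings(
  estimatedTime: number,  // seconds
  maxTimeSeconds: number, // timeline in seconds
  inputLoad: number,      // I
  outputLoad: number,     // O
): TimelineWarning[] {
  const warnings: TimelineWarning[] = [];
  if (estimatedTime > maxTimeSeconds) {
    warnings.push({ severity: "critical", message: "Estimated time exceeds timeline" });
  } else if (estimatedTime > 0.8 * maxTimeSeconds) {
    warnings.push({ severity: "warning", message: "Estimated time is over 80% of timeline" });
  }
  const outputTarget = 1.2 * inputLoad; // design target for O
  if (outputLoad > 1.5 * outputTarget) {
    warnings.push({ severity: "info", message: "Output load is over 1.5× target" });
  }
  return warnings;
}

console.log(collectWarnings(6000, 7200, 100, 120)); // one "warning" entry
```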
## User Experience Flow

### Step 1: Select Outputs
- User checks desired outputs
- Engine calculates O in real time
- Shows total output load

### Step 2: Specify Input
- User enters dataset, attributes, range
- Engine calculates I in real time
- Shows estimated input load

### Step 3: Set Timeline
- User selects mode and SLA
- Engine validates feasibility
- Shows estimated time and warnings

### Step 4: Review & Run
- Engine shows complete analysis
- User reviews warnings/suggestions
- User confirms and runs

## Error Handling

### Invalid Configurations

1. **No Outputs Selected**: Disable run button
2. **No Dataset**: Disable run button
3. **Invalid SLA Format**: Show format hint
4. **Infeasible Timeline**: Show suggestions

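The first three cases can be sketched as a single validation step. Everything named here (`AuditConfig`, `validateConfig`, the hint text) is hypothetical. The sketch disables the run button only for the two cases where this document says so; whether an invalid SLA should also block the run is left open.

```typescript
interface AuditConfig {
  outputs: string[];      // selected output ids
  dataset: string | null; // chosen dataset, if any
  sla: string;            // e.g. "2 hours"
}

interface ConfigValidation {
  canRun: boolean;        // false disables the run button
  slaHint: string | null; // format hint shown for an invalid SLA
}

// Same shape as the supported SLA formats: "<integer> <unit>[s]".
const SLA_PATTERN = /^\d+\s+(second|minute|hour|day)s?$/i;

function validateConfig(config: AuditConfig): ConfigValidation {
  const hasOutputs = config.outputs.length > 0;
  const hasDataset = config.dataset !== null && config.dataset.trim() !== "";
  const slaValid = SLA_PATTERN.test(config.sla.trim());
  return {
    canRun: hasOutputs && hasDataset,
    slaHint: slaValid ? null : 'Use a format like "2 hours" or "30 minutes"',
  };
}

console.log(validateConfig({ outputs: [], dataset: "loans", sla: "2 hours" }));
// canRun is false because no outputs are selected
```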
### Suggestions

Engine provides actionable suggestions:
- "Consider reducing outputs"
- "Consider extending timeline"
- "Consider simplifying input filters"

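A possible way to pick suggestions is to map each failed check to one message. The mapping below is an assumption made for illustration; this document lists the suggestion texts but does not say which condition triggers which one.

```typescript
interface FailedChecks {
  timeExceeded: boolean;      // estimated time is over the timeline
  outputTooLarge: boolean;    // output check failed
  totalLoadTooLarge: boolean; // total load check failed
}

// Assumed mapping from failed check to suggestion text.
function suggestionsFor(failed: FailedChecks): string[] {
  const suggestions: string[] = [];
  if (failed.outputTooLarge) suggestions.push("Consider reducing outputs");
  if (failed.timeExceeded) suggestions.push("Consider extending timeline");
  if (failed.totalLoadTooLarge) suggestions.push("Consider simplifying input filters");
  return suggestions;
}

console.log(suggestionsFor({ timeExceeded: true, outputTooLarge: false, totalLoadTooLarge: false }));
// [ 'Consider extending timeline' ]
```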
## Performance Considerations

### Processing Rates

Rates are configurable and can be tuned based on:
- Hardware capabilities
- Network bandwidth
- Concurrent job limits
- Historical performance data

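As a rough sketch of how a configurable rate could feed the time estimate, the code below divides total load by a single throughput figure. The `ProcessingRates` shape and the `unitsPerSecond` field are assumptions; the actual rate model is not described here and may well be per pass or per output type.

```typescript
interface ProcessingRates {
  unitsPerSecond: number; // hypothetical tunable throughput
}

/** Estimated run time in seconds for a given total load. */
function estimateTimeSeconds(totalLoad: number, rates: ProcessingRates): number {
  if (rates.unitsPerSecond <= 0) {
    throw new RangeError("unitsPerSecond must be positive");
  }
  return totalLoad / rates.unitsPerSecond;
}

// 320 units of total load at 4 units per second.
console.log(estimateTimeSeconds(320, { unitsPerSecond: 4 })); // 80
```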
### Optimization Strategies

1. **Parallel Processing**: Process outputs in parallel when possible
2. **Caching**: Cache intermediate results
3. **Batch Processing**: Batch similar operations
4. **Resource Allocation**: Allocate resources based on load

## Future Enhancements

1. **Machine Learning**: Learn from historical runs to improve estimates
2. **Dynamic Weights**: Adjust weights based on actual performance
3. **Resource Scaling**: Automatically scale resources based on load
4. **Cost Estimation**: Add cost estimates alongside time estimates
5. **Multi-Tenant**: Support multiple concurrent orchestrations

## Related Documentation

- [Orchestration Engine](./ORCHESTRATION_ENGINE.md)
- [Output Weight Guidelines](./OUTPUT_WEIGHTS.md)
- [API Reference](./API_REFERENCE.md)