# Fairness Audit Orchestration - Design Document
## Design Philosophy
> **We design from the end result backwards.**

First we list every output you want (reports, files, metrics), then we add the cost of processing the input (two full passes, roughly 2× the input effort) and size the total job at about 3.2× the input.

You only choose:
1. **What goes in** (Input)
2. **What comes out** (Output)
3. **When it needs to be ready** (Timeline)

The orchestration engine does everything in between.
## Mathematical Model
### Core Formula
```
Total Process Load = O + 2I ≈ 3.2I
```
Where:
- **I** = Input size/effort (units)
- **O** = Total output effort (sum of output weights)
- **2I** = Two full input passes (ingestion & enrichment, then fairness evaluation)
### Design Target
```
O ≈ 1.2 × I
```
This means for a typical input of 100 units:
- Output should be around 120 units
- Total load ≈ 320 units
- Input passes = 200 units (2 × 100)
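The arithmetic above can be sketched directly; the constant and function names here are illustrative, not the engine's actual API:

```typescript
// Core sizing formulas (unit-free). OUTPUT_RATIO encodes the O ≈ 1.2 × I design target.
const OUTPUT_RATIO = 1.2;
const INPUT_PASSES = 2; // ingestion & enrichment, then fairness evaluation

function targetOutputLoad(inputLoad: number): number {
  return OUTPUT_RATIO * inputLoad;
}

function totalProcessLoad(inputLoad: number, outputLoad: number): number {
  return outputLoad + INPUT_PASSES * inputLoad; // O + 2I, ≈ 3.2I at target
}
```

For `inputLoad = 100` this yields an output target of 120 units and a total load of 320 units, matching the worked example above.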
### Why 2× Input?
The input requires two full passes:
1. **Ingestion & Enrichment**: Load data, validate, enrich with metadata
2. **Fairness Evaluation**: Run fairness algorithms, calculate metrics

Each pass processes the full input, hence 2×.
## Output Weight Guidelines
### Weight Calculation Factors
1. **Complexity**: How complex is the output to generate?
2. **Data Volume**: How much data does it contain?
3. **Processing Time**: How long does generation take?
4. **Dependencies**: Does it depend on other outputs?
### Recommended Weights
| Complexity | Typical Weight Range | Examples |
|-----------|---------------------|----------|
| Simple | 0.5 - 1.0 | Metrics export, Alert config |
| Medium | 1.0 - 2.0 | CSV exports, JSON reports |
| Complex | 2.0 - 3.0 | PDF reports, Dashboards, Compliance docs |
### Weight Examples
- **Metrics Export (1.0)**: Simple calculation, small output
- **Flagged Cases CSV (1.5)**: Medium complexity, moderate data
- **Fairness Audit PDF (2.5)**: Complex formatting, large output
- **Compliance Report (2.2)**: Complex structure, regulatory requirements
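The total output effort O is simply the sum of the selected outputs' weights. A minimal sketch, with a hypothetical catalogue built from the example weights above:

```typescript
// Hypothetical output catalogue; weights taken from the examples above.
const OUTPUT_WEIGHTS: Record<string, number> = {
  metricsExport: 1.0,
  flaggedCasesCsv: 1.5,
  fairnessAuditPdf: 2.5,
  complianceReport: 2.2,
};

// O = sum of the selected outputs' weights; unknown names contribute nothing.
function totalOutputLoad(selected: string[]): number {
  return selected.reduce((sum, name) => sum + (OUTPUT_WEIGHTS[name] ?? 0), 0);
}
```

Selecting the metrics export and the audit PDF, for example, gives O = 3.5.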
## Input Load Estimation
### Base Calculation
```
Base = 100 units
+ Sensitive Attributes: 20 units each
+ Date Range: 5 units per day
+ Filters: 10 units each
+ Estimated Size: Use if provided
```
### Example Calculations
**Small Dataset**:
- 2 sensitive attributes
- 7-day range
- 1 filter
- Load = 100 + (2×20) + (7×5) + (1×10) = 185 units

**Large Dataset**:
- 5 sensitive attributes
- 90-day range
- 5 filters
- Load = 100 + (5×20) + (90×5) + (5×10) = 700 units
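The base calculation translates directly into code. The interface and field names here are illustrative, and `estimatedSize` is assumed to override the formula when provided:

```typescript
interface InputSpec {
  sensitiveAttributes: number; // count of sensitive attributes
  rangeDays: number;           // date range length, in days
  filters: number;             // count of filters
  estimatedSize?: number;      // explicit estimate; assumed to override the formula
}

function estimateInputLoad(spec: InputSpec): number {
  if (spec.estimatedSize !== undefined) return spec.estimatedSize;
  return 100 + spec.sensitiveAttributes * 20 + spec.rangeDays * 5 + spec.filters * 10;
}
```

The small and large examples above come out to 185 and 700 units respectively.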
## Timeline Validation
### SLA Parsing
Supported formats:
- "2 hours"
- "1 day"
- "30 minutes"
- "45 seconds"
### Feasibility Checks
1. **Time Check**: `estimatedTime ≤ maxTimeSeconds`
2. **Output Check**: `outputLoad ≤ 1.5 × (inputLoad × 1.2)`
3. **Total Load Check**: `totalLoad ≤ 1.3 × (inputLoad × 3.2)`
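The three checks can be expressed as a single predicate; the names are illustrative:

```typescript
interface FeasibilityResult {
  time: boolean;   // estimatedTime ≤ maxTimeSeconds
  output: boolean; // outputLoad ≤ 1.5 × (inputLoad × 1.2)
  total: boolean;  // totalLoad ≤ 1.3 × (inputLoad × 3.2)
}

function checkFeasibility(
  estimatedTime: number,
  maxTimeSeconds: number,
  inputLoad: number,
  outputLoad: number,
  totalLoad: number,
): FeasibilityResult {
  return {
    time: estimatedTime <= maxTimeSeconds,
    output: outputLoad <= 1.5 * (inputLoad * 1.2),
    total: totalLoad <= 1.3 * (inputLoad * 3.2),
  };
}
```

At the design target (I = 100, O = 120, total = 320) all three checks pass with headroom.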
### Warning Thresholds
- **Critical**: Estimated time exceeds timeline
- **Warning**: Estimated time > 80% of timeline
- **Info**: Output load > 1.5× target
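These thresholds map onto a small severity classifier; the precedence below (critical over warning over info) is an assumption:

```typescript
type Severity = "critical" | "warning" | "info" | "ok";

function classifySeverity(
  estimatedTime: number,
  timeline: number,
  outputLoad: number,
  targetOutputLoad: number,
): Severity {
  if (estimatedTime > timeline) return "critical";        // exceeds timeline
  if (estimatedTime > 0.8 * timeline) return "warning";   // > 80% of timeline
  if (outputLoad > 1.5 * targetOutputLoad) return "info"; // > 1.5× output target
  return "ok";
}
```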
## User Experience Flow
### Step 1: Select Outputs
- User checks desired outputs
- Engine calculates O in real-time
- Shows total output load
### Step 2: Specify Input
- User enters dataset, attributes, range
- Engine calculates I in real-time
- Shows estimated input load
### Step 3: Set Timeline
- User selects mode and SLA
- Engine validates feasibility
- Shows estimated time and warnings
### Step 4: Review & Run
- Engine shows complete analysis
- User reviews warnings/suggestions
- User confirms and runs
## Error Handling
### Invalid Configurations
1. **No Outputs Selected**: Disable run button
2. **No Dataset**: Disable run button
3. **Invalid SLA Format**: Show format hint
4. **Infeasible Timeline**: Show suggestions
### Suggestions
Engine provides actionable suggestions:
- "Consider reducing outputs"
- "Consider extending timeline"
- "Consider simplifying input filters"
## Performance Considerations
### Processing Rates
Rates are configurable and can be tuned based on:
- Hardware capabilities
- Network bandwidth
- Concurrent job limits
- Historical performance data
### Optimization Strategies
1. **Parallel Processing**: Process outputs in parallel when possible
2. **Caching**: Cache intermediate results
3. **Batch Processing**: Batch similar operations
4. **Resource Allocation**: Allocate resources based on load
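Strategy 1 can be sketched with `Promise.all`; `generate` is a stand-in for a real output producer:

```typescript
// Runs independent output generators concurrently instead of one at a time.
async function generateOutputs(
  names: string[],
  generate: (name: string) => Promise<string>,
): Promise<string[]> {
  return Promise.all(names.map((name) => generate(name)));
}
```

Outputs with dependencies on other outputs would need an ordering step first; this sketch assumes they are independent.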
## Future Enhancements
1. **Machine Learning**: Learn from historical runs to improve estimates
2. **Dynamic Weights**: Adjust weights based on actual performance
3. **Resource Scaling**: Automatically scale resources based on load
4. **Cost Estimation**: Add cost estimates alongside time estimates
5. **Multi-Tenant**: Support multiple concurrent orchestrations
## Related Documentation
- [Orchestration Engine](./ORCHESTRATION_ENGINE.md)
- [Output Weight Guidelines](./OUTPUT_WEIGHTS.md)
- [API Reference](./API_REFERENCE.md)