# GPU / TensorCore Integration — Architecture Spec

## Overview

FusionAGI integrates GPU-accelerated compute through TensorFlow, CUDA TensorCores, and JAX.
The goal is to move reasoning, similarity scoring, consensus, and training from CPU-bound
symbolic operations to batched, massively parallel tensor operations.

## Design Principles

1. **Optional dependency:** GPU support is an extra (`pip install fusionagi[gpu]`).
   Every GPU-accelerated code path has a CPU fallback.
2. **Module boundary:** GPU compute lives in `fusionagi/gpu/` (new module). Other modules
   import from `fusionagi.gpu` only when GPU acceleration is needed.
3. **Backend abstraction:** The `TensorBackend` protocol abstracts the TensorFlow, JAX, and
   pure-NumPy backends. The system auto-selects the best available backend.

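
The backend abstraction in principle 3 could be expressed as a structural protocol, so that any object with the right methods counts as a backend. The sketch below is only an illustration: the method name `cosine_similarity_matrix` and the `PurePythonBackend` class are assumptions, since this spec does not list the actual members of `TensorBackend`.

```python
import math
from typing import Protocol, Sequence, runtime_checkable

Vector = Sequence[float]


@runtime_checkable
class TensorBackend(Protocol):
    """Hypothetical shape of the protocol; the member names are assumptions."""

    name: str

    def cosine_similarity_matrix(self, vectors: Sequence[Vector]) -> list[list[float]]:
        ...


class PurePythonBackend:
    """Dependency-free CPU fallback (illustrative, not part of the spec)."""

    name = "pure-python"

    def cosine_similarity_matrix(self, vectors):
        # Pre-compute each vector's L2 norm once, then fill the all-pairs matrix.
        norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
        matrix = []
        for v, nv in zip(vectors, norms):
            row = []
            for w, nw in zip(vectors, norms):
                dot = sum(a * b for a, b in zip(v, w))
                # Zero vectors have no direction; report similarity 0.0 for them.
                row.append(dot / (nv * nw) if nv and nw else 0.0)
            matrix.append(row)
        return matrix


backend = PurePythonBackend()
sims = backend.cosine_similarity_matrix([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because the protocol is structural, a GPU-backed class would not need to inherit from anything; it only has to provide the same members, which keeps the CPU fallback and the accelerated path interchangeable for callers.
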
## Module: `fusionagi/gpu/`

```
fusionagi/gpu/
├── __init__.py           # Public API, auto-detection
├── backend.py            # TensorBackend protocol + backend registry
├── tensorflow_ops.py     # TF/TensorCore similarity, attention, scoring
├── tensor_similarity.py  # GPU-accelerated embedding similarity
├── tensor_attention.py   # Multi-head attention for consensus
├── tensor_scoring.py     # Batch hypothesis scoring on GPU
└── training.py           # GPU-accelerated training loop for self-improvement
```

## Integration Points

### 1. Reasoning Pipeline (`reasoning/`)

**Current:** `multi_path.py` scores hypotheses sequentially with word-overlap heuristics.
**GPU:** Batch-embed the hypotheses → cosine similarity matrix on GPU → parallel scoring.

**Current:** `consensus_engine.py` uses Jaccard word overlap for similarity.
**GPU:** Dense embedding vectors + GPU cosine similarity for semantic matching.

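
To make the contrast between Jaccard word overlap and embedding cosine similarity concrete, here is a small dependency-free sketch. The helper names and the three-dimensional vectors are invented for illustration; a real pipeline would take its vectors from an embedding model, and on a GPU backend the all-pairs step would be one matrix product of the row-normalised embedding matrix with its transpose.

```python
import math


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity: |A ∩ B| / |A ∪ B| over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine_matrix(embeddings):
    """All-pairs cosine similarity: normalise each row, then take every dot product."""
    unit = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v] if norm else [0.0] * len(v))
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


hypotheses = ["the car is fast", "the automobile is quick", "bananas are yellow"]
# Hand-made vectors standing in for the output of a real embedding model.
toy_embeddings = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]

word_overlap = jaccard(hypotheses[0], hypotheses[1])  # shares only "the" and "is"
semantic = cosine_matrix(toy_embeddings)              # semantic[i][j] for every pair
```

With these toy inputs the two paraphrases share only two of six distinct words, while their stand-in embeddings are nearly parallel, which is the kind of match word overlap cannot see.
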
### 2. Super Big Brain (`core/super_big_brain.py`)

**Current:** `generate_and_score_parallel` uses a `ThreadPoolExecutor`.
**GPU:** Tensor-parallel scoring with batched dot products on TensorCore.

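
The difference between per-hypothesis tasks and batched dot products can be sketched as follows. Both functions are hypothetical stand-ins, not the body of `generate_and_score_parallel`; the scoring rule (a dot product against a query vector) is also an assumption chosen to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def score_with_threads(query, hypotheses, max_workers=4):
    """Thread-pool style: one small task per hypothesis."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so scores line up with hypotheses.
        return list(pool.map(lambda h: dot(query, h), hypotheses))


def score_batched(query, hypotheses):
    """Batched style: treat the hypotheses as a matrix H and compute H @ q.

    On a GPU backend this loop would collapse into a single
    matrix-vector product instead of many small tasks.
    """
    return [dot(row, query) for row in hypotheses]


query = [1.0, 2.0, 0.5]
hypothesis_vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 2.0]]

threaded_scores = score_with_threads(query, hypothesis_vectors)
batched_scores = score_batched(query, hypothesis_vectors)
```

Both paths produce the same scores; the batched form is the one that maps onto a tensor operation, because the whole workload is expressed as one matrix shape rather than as independent Python calls.
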
### 3. Memory Subsystem (`memory/`)

**Current:** `semantic_graph.py` is a pure-Python dict/adjacency-list structure.
**GPU:** Vector similarity search via GPU-accelerated embedding lookup.

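
Vector similarity search amounts to ranking stored embeddings by their similarity to a query and keeping the best few. The following CPU reference shows that logic; `top_k`, the node ids, and the vectors are illustrative assumptions, and nothing here reflects the actual layout of `semantic_graph.py`.

```python
import heapq
import math


def normalise(v):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def top_k(query, index, k=2):
    """Return the k (node_id, cosine score) pairs most similar to the query."""
    q = normalise(query)
    scored = (
        (node_id, sum(a * b for a, b in zip(q, normalise(vec))))
        for node_id, vec in index.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy index: node id -> embedding (made-up values).
index = {
    "gpu": [0.9, 0.1, 0.0],
    "tensor": [0.7, 0.3, 0.1],
    "banana": [0.0, 0.1, 0.9],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
```

A GPU version would store the index as one pre-normalised matrix, so a query becomes a single matrix-vector product followed by a top-k selection.
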
### 4. Self-Improvement (`self_improvement/`)

**Current:** `AutoTrainer` only suggests heuristic updates; it performs no actual neural training.
**GPU:** GPU-backed fine-tuning loops and gradient-based heuristic optimization.

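
As a minimal picture of gradient-based heuristic optimization, the sketch below fits the weights of a linear scoring heuristic by full-batch gradient descent on mean squared error. The loss, the learning rate, and the toy data are all assumptions; the spec does not say what `AutoTrainer` would optimise, and a GPU training loop would compute the same gradients with automatic differentiation over whole batches.

```python
def predict(weights, features):
    """Linear heuristic: weighted sum of feature values."""
    return sum(w * x for w, x in zip(weights, features))


def train(samples, lr=0.1, epochs=500):
    """Fit heuristic weights by full-batch gradient descent on mean squared error."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    for _ in range(epochs):
        grads = [0.0] * n_features
        for features, target in samples:
            error = predict(weights, features) - target
            for i, x in enumerate(features):
                # d/dw_i of mean((w . x - t)^2) is the mean of 2 * error * x_i.
                grads[i] += 2.0 * error * x / len(samples)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights


# Toy data generated from target = 0.7 * f0 + 0.3 * f1 (a made-up ground truth).
samples = [
    ([1.0, 0.0], 0.7),
    ([0.0, 1.0], 0.3),
    ([1.0, 1.0], 1.0),
    ([0.5, 0.5], 0.5),
]
learned = train(samples)
```

On this noise-free data the learned weights converge towards the generating weights, which gives a simple check that the update rule has the right sign and scale.
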
### 5. Adapter Layer (`adapters/`)

**New:** `TensorFlowAdapter` for local model inference via TF/Keras with TensorCore acceleration.

## Data Flow

```
User Prompt
    │
    ▼
Decomposition (CPU: symbolic)
    │
    ▼
Embedding (GPU: TF/TensorCore)
    │
    ├──► Similarity Matrix (GPU: batched cosine)
    │         │
    │         ▼
    │    Consensus Scoring (GPU: attention)
    │
    ├──► Hypothesis Scoring (GPU: batched inference)
    │
    ▼
Recomposition (CPU: symbolic + GPU scores)
    │
    ▼
Final Response
```

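
The attention-based consensus stage in the diagram can be illustrated with a stripped-down attention pass. This sketch is a deliberate simplification: it uses a single head, no learned projections, and masks each hypothesis from attending to itself, none of which is specified by this document. The function name and the toy vectors are likewise assumptions.

```python
import math


def softmax(row):
    """Numerically stable softmax; entries of -inf receive weight 0."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def consensus_scores(embeddings):
    """Mean attention each hypothesis receives from the others.

    Assumes at least two hypotheses, since self-attention is masked out.
    """
    n = len(embeddings)
    scale = 1.0 / math.sqrt(len(embeddings[0]))
    attention = []
    for i, q in enumerate(embeddings):
        logits = [
            scale * sum(a * b for a, b in zip(q, k)) if j != i else -math.inf
            for j, k in enumerate(embeddings)
        ]
        attention.append(softmax(logits))  # each row sums to 1
    # Column means: how much attention hypothesis j receives on average.
    return [sum(attention[i][j] for i in range(n)) / n for j in range(n)]


# Two agreeing hypotheses and one outlier (toy vectors, not real embeddings).
scores = consensus_scores([[2.0, 0.0], [1.9, 0.1], [0.0, 2.0]])
```

The two agreeing hypotheses attend mostly to each other, so the outlier ends up with a much lower score; the scores sum to 1, which makes them usable as consensus weights during recomposition.
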
## Backend Selection

```python
from fusionagi.gpu import get_backend, TensorBackend

backend: TensorBackend = get_backend()  # Auto-selects the best available backend
# Priority: TensorFlowBackend > NumPyBackend (fallback)
```

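
Auto-selection of this kind is often implemented as a priority list that is walked until an installed module is found. The sketch below shows that pattern under stated assumptions: `BACKEND_PRIORITY`, `module_available`, and `select_backend_name` are hypothetical names, and the real `get_backend()` may work differently. Note that an installed TensorFlow does not guarantee a usable GPU; checking for a device is a separate, backend-specific step.

```python
import importlib.util
from typing import Callable, Sequence

# (backend name, module that must be importable), highest priority first.
# The ordering follows the priority comment in the snippet above; the
# registry structure itself is an assumption made for illustration.
BACKEND_PRIORITY: Sequence[tuple[str, str]] = (
    ("tensorflow", "tensorflow"),
    ("numpy", "numpy"),
)


def module_available(module_name: str) -> bool:
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def select_backend_name(
    is_available: Callable[[str], bool] = module_available,
    priority: Sequence[tuple[str, str]] = BACKEND_PRIORITY,
) -> str:
    """Return the name of the first backend whose module is installed."""
    for backend_name, module_name in priority:
        if is_available(module_name):
            return backend_name
    raise ImportError("no tensor backend module could be found")


# Simulate a machine where only NumPy is installed.
selected = select_backend_name(is_available=lambda module: module == "numpy")
```

Passing the availability check in as a parameter keeps the selection logic testable without installing or uninstalling any packages.
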
## Dependencies

```toml
[project.optional-dependencies]
gpu = ["tensorflow>=2.16", "numpy>=1.26"]
```

TensorFlow 2.16+ includes:
- TensorCore (FP16/BF16 mixed-precision) support via `tf.keras.mixed_precision`
- XLA compilation for GPU kernel fusion
- `tf.linalg` for batched linear algebra
- TensorRT integration for inference optimization, where the installed build supports it