virtual-banker/backend/tts/README.md
defiQUG 9839401d1d
TTS: configurable auth, Health check, Phoenix options; .env.example; Gitea CI workflow
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-10 16:54:10 -08:00

# TTS package — ElevenLabs-compatible, Phoenix endpoint swap
This package provides a **text-to-speech client** that matches the [ElevenLabs TTS API](https://elevenlabs.io/docs/api-reference/text-to-speech) contract. You can point it at **ElevenLabs** or at a **Phoenix-hosted** TTS service that implements the same API shape; switching is a config change (base URL), no code change.
**Note:** The repo [eleven-labs/api-service](https://github.com/eleven-labs/api-service) on GitHub is a PHP OpenAPI consumer library, not the voice TTS API. This client targets the **REST TTS API** at `api.elevenlabs.io` (and compatible backends).
---
## Parity with ElevenLabs TTS API
| Feature | ElevenLabs API | This client |
|--------|----------------|-------------|
| **Sync** `POST /v1/text-to-speech/:voice_id` | ✅ | ✅ `Synthesize` |
| **Stream** `POST /v1/text-to-speech/:voice_id/stream` | ✅ | ✅ `SynthesizeStream` |
| **Voice settings** (stability, similarity_boost, style, speaker_boost) | ✅ | ✅ `VoiceConfig` |
| **Model** (`model_id`) | ✅ | ✅ `SetModelID` / default `eleven_multilingual_v2` |
| **Auth** `xi-api-key` header | ✅ | ✅ |
| **Output** `Accept: audio/mpeg` (mp3) | ✅ | ✅ |
| **Retries** (5xx, backoff) | — | ✅ on sync |
| **Visemes** (lip sync) | ❌ (no phoneme API) | ✅ client-side approximation |
Optional ElevenLabs features not used here: the `output_format` query parameter, `optimize_streaming_latency`, and WebSocket streaming. To swap in Phoenix by changing only the endpoint, the host needs to accept the same JSON body on the **sync and stream** routes and return **audio/mpeg**.
---
## Which TTS backend? (decision table)
| Env / condition | Backend used |
|----------------|--------------|
| `TTS_VOICE_ID` unset (or no auth) | **Mock** (no real synthesis) |
| `TTS_VOICE_ID` + `TTS_API_KEY` or `ELEVENLABS_*` set, `TTS_BASE_URL` unset | **ElevenLabs** (api.elevenlabs.io) |
| `TTS_BASE_URL` set (e.g. Phoenix) + auth + voice | **Phoenix** (or other compatible host) |
| `USE_PHOENIX_TTS=true` | Prefer Phoenix; use `TTS_BASE_URL` or `PHOENIX_TTS_BASE_URL` |
Auth: the default header is `xi-api-key` (ElevenLabs). For Phoenix with a Bearer token, set `TTS_AUTH_HEADER_NAME=Authorization` and `TTS_AUTH_HEADER_VALUE=Bearer <token>`.
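The decision table can be read as a small selection function. The sketch below is one possible reading, not the package's actual wiring: `selectBackend` and the `backend` type are hypothetical, the `ELEVENLABS_*` variables are left out because their exact names are not spelled out here, and falling back to ElevenLabs when `USE_PHOENIX_TTS=true` is set without any base URL is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// backend names the three outcomes of the decision table (hypothetical type).
type backend string

const (
	backendMock       backend = "mock"
	backendElevenLabs backend = "elevenlabs"
	backendPhoenix    backend = "phoenix"
)

// selectBackend applies the table rows in order and returns the backend
// together with the base URL to use. getenv is injected so the logic can be
// exercised without touching the real environment.
func selectBackend(getenv func(string) string) (backend, string) {
	voice := getenv("TTS_VOICE_ID")
	// Assumption: either an API key or an explicit auth header value counts as auth.
	hasAuth := getenv("TTS_API_KEY") != "" || getenv("TTS_AUTH_HEADER_VALUE") != ""
	if voice == "" || !hasAuth {
		return backendMock, ""
	}

	baseURL := getenv("TTS_BASE_URL")
	if baseURL == "" && strings.EqualFold(getenv("USE_PHOENIX_TTS"), "true") {
		baseURL = getenv("PHOENIX_TTS_BASE_URL")
	}
	if baseURL != "" {
		// Any explicit base URL is treated as Phoenix or another compatible host.
		return backendPhoenix, baseURL
	}
	return backendElevenLabs, "https://api.elevenlabs.io/v1"
}

func main() {
	env := map[string]string{
		"TTS_VOICE_ID":          "default-voice-id",
		"TTS_AUTH_HEADER_VALUE": "Bearer your-token",
		"USE_PHOENIX_TTS":       "true",
		"PHOENIX_TTS_BASE_URL":  "https://phoenix.example.com/tts/v1",
	}
	b, u := selectBackend(func(k string) string { return env[k] })
	fmt.Println(b, u)
}
```

Passing `os.Getenv` as the `getenv` argument would apply the same rules to the real process environment.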
---
## Using with Phoenix (swap endpoint)
1. **Phoenix TTS service** must expose the same contract:
- `POST /v1/text-to-speech/:voice_id` — body: `{"text","model_id","voice_settings"}` → response: raw mp3
- `POST /v1/text-to-speech/:voice_id/stream` — same body → response: streaming mp3
- **Health:** `GET /health` at the same origin (e.g. `{baseURL}/../health`) returning 2xx so `tts.Service.Health(ctx)` can be used for readiness.
2. **Configure the app** with the Phoenix base URL (and optional auth):
```bash
export TTS_BASE_URL="https://phoenix.example.com/tts/v1"
export TTS_VOICE_ID="default-voice-id"
# Optional: Phoenix uses Bearer token
export TTS_AUTH_HEADER_NAME="Authorization"
export TTS_AUTH_HEADER_VALUE="Bearer your-token"
# Or feature flag to force Phoenix
export USE_PHOENIX_TTS=true
export PHOENIX_TTS_BASE_URL="https://phoenix.example.com/tts/v1"
```
3. **Health check:** The client's `Health(ctx)` calls `GET {baseURL}/../health` when the base URL is not ElevenLabs. Wire this into your readiness probe or a `/ready` endpoint if you need TTS to be up before accepting traffic.
4. **In code** (e.g. for reuse in another project):
```go
opts := tts.TTSOptions{
	BaseURL:         "https://phoenix.example.com/tts/v1",
	AuthHeaderName:  "Authorization",
	AuthHeaderValue: "Bearer token",
}
svc := tts.NewElevenLabsTTSServiceWithOptionsFull(apiKey, voiceID, opts)
// Fail fast if the backend is not reachable yet.
if err := svc.Health(ctx); err != nil {
	return err // not ready
}
// audio holds the synthesized mp3 on success.
audio, err := svc.Synthesize(ctx, "Hello world")
if err != nil {
	return err
}
```
No code change beyond config: same interface, different base URL and optional auth header.
---
## Reuse across projects
This package lives in **virtual-banker** and can be imported by its Go module path (e.g. `github.com/your-org/virtual-banker/backend/tts`) or moved into a shared repo. Any project that needs TTS can:
- Depend on this package.
- Use `tts.Service` and either `NewMockTTSService()` or `NewElevenLabsTTSServiceWithOptions(apiKey, voiceID, baseURL)` / `NewElevenLabsTTSServiceWithOptionsFull(apiKey, voiceID, opts)` for custom auth.
- Set `baseURL` to ElevenLabs (`""` or `https://api.elevenlabs.io/v1`) or to the Phoenix TTS base URL.
The **interface** (`Synthesize`, `SynthesizeStream`, `GetVisemes`) stays the same regardless of backend.