# TTS package: ElevenLabs-compatible, Phoenix endpoint swap

This package provides a **text-to-speech client** that matches the [ElevenLabs TTS API](https://elevenlabs.io/docs/api-reference/text-to-speech) contract. You can point it at **ElevenLabs** or at a **Phoenix-hosted** TTS service that implements the same API shape; switching between them is a configuration change (the base URL), not a code change.

**Note:** The repo [eleven-labs/api-service](https://github.com/eleven-labs/api-service) on GitHub is a PHP OpenAPI consumer library, not the voice TTS API. This client targets the **REST TTS API** at `api.elevenlabs.io` (and compatible backends).

---

## Parity with ElevenLabs TTS API

| Feature | ElevenLabs API | This client |
|--------|----------------|-------------|
| **Sync** `POST /v1/text-to-speech/:voice_id` | ✅ | ✅ `Synthesize` |
| **Stream** `POST /v1/text-to-speech/:voice_id/stream` | ✅ | ✅ `SynthesizeStream` |
| **Voice settings** (stability, similarity_boost, style, speaker_boost) | ✅ | ✅ `VoiceConfig` |
| **Model** (`model_id`) | ✅ | ✅ `SetModelID` / default `eleven_multilingual_v2` |
| **Auth** `xi-api-key` header | ✅ | ✅ |
| **Output** `Accept: audio/mpeg` (mp3) | ✅ | ✅ |
| **Retries** (5xx, backoff) | n/a | ✅ on sync |
| **Visemes** (lip sync) | ❌ (no phoneme API) | ✅ client-side approximation |

Optional ElevenLabs features not used here: the `output_format` query parameter, `optimize_streaming_latency`, and WebSocket streaming. For the endpoint swap to Phoenix to work, the host only needs to accept the same JSON body on the **sync** and **stream** endpoints and return **audio/mpeg**.


---

## Which TTS backend? (decision table)

| Env / condition | Backend used |
|----------------|--------------|
| `TTS_VOICE_ID` unset (or no auth) | **Mock** (no real synthesis) |
| `TTS_VOICE_ID` + `TTS_API_KEY` or `ELEVENLABS_*` set, `TTS_BASE_URL` unset | **ElevenLabs** (api.elevenlabs.io) |
| `TTS_BASE_URL` set (e.g. Phoenix) + auth + voice | **Phoenix** (or other compatible host) |
| `USE_PHOENIX_TTS=true` | Prefer Phoenix; use `TTS_BASE_URL` or `PHOENIX_TTS_BASE_URL` |

Auth: the default header is `xi-api-key` (ElevenLabs). For a Phoenix host that expects a Bearer token, set `TTS_AUTH_HEADER_NAME=Authorization` and `TTS_AUTH_HEADER_VALUE=Bearer <token>`.

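The decision table can be read as a small selection function. The sketch below is one possible reading, not the package's actual logic: `backend` and `selectBackend` are hypothetical names, the `ELEVENLABS_*` variables are left out because their exact names are not given here, and checking `TTS_BASE_URL` before `PHOENIX_TTS_BASE_URL` is an assumption. Passing the environment lookup as a function keeps the logic testable without touching the real process environment.

```go
package main

import (
	"fmt"
	"os"
)

// backend names the three outcomes of the decision table.
type backend string

const (
	backendMock       backend = "mock"
	backendElevenLabs backend = "elevenlabs"
	backendPhoenix    backend = "phoenix"
)

// selectBackend returns the backend and the base URL it would use.
// getenv is injected so callers can pass os.Getenv or a test double.
func selectBackend(getenv func(string) string) (backend, string) {
	voice := getenv("TTS_VOICE_ID")
	// ELEVENLABS_* variables are intentionally not handled: their names are unknown here.
	hasAuth := getenv("TTS_API_KEY") != "" || getenv("TTS_AUTH_HEADER_VALUE") != ""
	if voice == "" || !hasAuth {
		return backendMock, ""
	}

	baseURL := getenv("TTS_BASE_URL")
	if baseURL == "" && getenv("USE_PHOENIX_TTS") == "true" {
		// Assumed precedence: TTS_BASE_URL first, then PHOENIX_TTS_BASE_URL.
		baseURL = getenv("PHOENIX_TTS_BASE_URL")
	}
	if baseURL != "" {
		return backendPhoenix, baseURL
	}
	return backendElevenLabs, "https://api.elevenlabs.io/v1"
}

func main() {
	b, u := selectBackend(os.Getenv)
	fmt.Printf("backend=%s baseURL=%q\n", b, u)
}
```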

---

## Using with Phoenix (swap endpoint)

1. **Phoenix TTS service** must expose the same contract:
   - `POST /v1/text-to-speech/:voice_id` with body `{"text","model_id","voice_settings"}` → response: raw mp3
   - `POST /v1/text-to-speech/:voice_id/stream` with the same body → response: streaming mp3
   - **Health:** `GET /health` at the same origin (e.g. `{baseURL}/../health`) returning 2xx so `tts.Service.Health(ctx)` can be used for readiness.

2. **Configure the app** with the Phoenix base URL (and optional auth):

```bash
export TTS_BASE_URL="https://phoenix.example.com/tts/v1"
export TTS_VOICE_ID="default-voice-id"
# Optional: Phoenix uses Bearer token
export TTS_AUTH_HEADER_NAME="Authorization"
export TTS_AUTH_HEADER_VALUE="Bearer your-token"
# Or feature flag to force Phoenix
export USE_PHOENIX_TTS=true
export PHOENIX_TTS_BASE_URL="https://phoenix.example.com/tts/v1"
```

3. **Health check:** The client's `Health(ctx)` calls `GET {baseURL}/../health` when the base URL is not ElevenLabs. Wire this into your readiness probe or a `/ready` endpoint if you need TTS to be up before accepting traffic.

4. **In code** (e.g. for reuse in another project):

```go
// apiKey, voiceID, and ctx come from the calling code.
opts := tts.TTSOptions{
	BaseURL:         "https://phoenix.example.com/tts/v1",
	AuthHeaderName:  "Authorization",
	AuthHeaderValue: "Bearer token",
}
svc := tts.NewElevenLabsTTSServiceWithOptionsFull(apiKey, voiceID, opts)
if err := svc.Health(ctx); err != nil {
	// Not ready: the backend did not pass the health check.
}
audio, err := svc.Synthesize(ctx, "Hello world")
```

No code change beyond config: same interface, different base URL and optional auth header.


---

## Reuse across projects

This package lives in **virtual-banker** and can be depended on via its Go module path (e.g. `github.com/your-org/virtual-banker/backend/tts`, or via a shared repo). Any project that needs TTS can:

- Depend on this package.
- Use `tts.Service` with either `NewMockTTSService()` or `NewElevenLabsTTSServiceWithOptions(apiKey, voiceID, baseURL)`; use `NewElevenLabsTTSServiceWithOptionsFull(apiKey, voiceID, opts)` when custom auth is needed.
- Set `baseURL` to ElevenLabs (`""` or `https://api.elevenlabs.io/v1`) or to the Phoenix TTS base URL.

The **interface** (`Synthesize`, `SynthesizeStream`, `GetVisemes`) stays the same regardless of backend.