# Phoenix TTS API contract (ElevenLabs-compatible)
**Last Updated:** 2026-02-10
**Purpose:** Let virtual-banker (and other apps) switch from ElevenLabs to a Phoenix-hosted TTS service by changing only the endpoint.

---
## Required endpoints
The Phoenix TTS service **must** implement the same HTTP contract as ElevenLabs for these paths. The base path is the app's `/tts` or similar; the paths below use the prefix `/v1`.
### 1. Sync text-to-speech
- **Method:** `POST`
- **Path:** `/v1/text-to-speech/:voice_id`
- **Headers:**
  - `Content-Type: application/json`
  - `Accept: audio/mpeg`
  - Auth: either `xi-api-key: <key>` or `Authorization: Bearer <token>` (configurable in client)
- **Body (JSON):**
```json
{
  "text": "Hello world",
  "model_id": "eleven_multilingual_v2",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.75,
    "style": 0,
    "use_speaker_boost": true
  }
}
```
- **Response:** `200 OK`, body = raw **mp3** bytes (`audio/mpeg`).
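
As an illustration of the sync contract above, here is a minimal Python sketch that builds the request with the standard library only. The function names `build_tts_request` and `synthesize` are hypothetical, the `xi-api-key` variant of the auth header is chosen arbitrarily, and the base URL is assumed to already include the `/v1` prefix. This is not the virtual-banker client, only one way a caller could satisfy the contract.

```python
import json
import urllib.request
from urllib.parse import quote


def build_tts_request(base_url: str, voice_id: str, text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a sync text-to-speech request.

    base_url is assumed to end with the /v1 prefix, e.g. ".../tts/v1".
    """
    body = {
        "text": text,
        "model_id": "eleven_multilingual_v2",
        "voice_settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
            "style": 0,
            "use_speaker_boost": True,
        },
    }
    url = f"{base_url.rstrip('/')}/text-to-speech/{quote(voice_id, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",
            "xi-api-key": api_key,  # assumption: key-style auth rather than Bearer
        },
    )


def synthesize(req: urllib.request.Request, timeout: float = 30.0) -> bytes:
    """Send the request and return the raw mp3 bytes of the response body."""
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Keeping request construction separate from sending makes the contract easy to check in unit tests without a running TTS service.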
### 2. Streaming text-to-speech
- **Method:** `POST`
- **Path:** `/v1/text-to-speech/:voice_id/stream`
- **Headers:** Same as sync.
- **Body:** Same JSON as sync.
- **Response:** `200 OK`, body = **streaming** mp3 (same format).
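
A client consuming the streaming endpoint would typically read the body in chunks instead of buffering it whole. The sketch below shows one way to do that in Python; `stream_url` and `iter_audio_chunks` are hypothetical helper names, and the chunk size of 4096 bytes is an arbitrary assumption, not something the contract specifies.

```python
from typing import BinaryIO, Iterator
from urllib.parse import quote


def stream_url(base_url: str, voice_id: str) -> str:
    """Return the streaming path for a voice; base_url is assumed to end in /v1."""
    return f"{base_url.rstrip('/')}/text-to-speech/{quote(voice_id, safe='')}/stream"


def iter_audio_chunks(resp: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield mp3 chunks from a file-like response until the stream ends."""
    while True:
        chunk = resp.read(chunk_size)
        if not chunk:  # empty bytes means the server closed the stream
            break
        yield chunk
```

The object returned by `urllib.request.urlopen` is file-like, so it can be passed to `iter_audio_chunks` directly and each chunk forwarded to an audio player or to the caller's own HTTP response as it arrives.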
### 3. Health (recommended)
- **Method:** `GET`
- **Path:** `/health`, served one level above the `/v1` prefix of the TTS base URL (e.g. `https://phoenix.example.com/tts/health` if base is `.../tts/v1`)
- **Response:** `200 OK` (body optional; used for readiness).
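
The relationship between the base URL and the health path can be sketched as follows. `health_url` and `is_ready` are hypothetical names, and stripping a trailing `/v1` is an assumption drawn from the example URL above; a deployment with a different prefix would need a different rule.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Derive the health URL from a TTS base URL that ends in /v1."""
    base = base_url.rstrip("/")
    if base.endswith("/v1"):
        base = base[: -len("/v1")]
    return f"{base}/health"


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True only if GET on the health URL answers 200 OK."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection errors, timeouts, and non-2xx responses all count as not ready.
        return False
```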
---
## Optional
- **Auth:** If Phoenix uses a different scheme (e.g. Bearer only), clients set `TTS_AUTH_HEADER_NAME` / `TTS_AUTH_HEADER_VALUE`; no API change.
- **Visemes:** For better lip-sync, a future endpoint could return phoneme/viseme timings; client would call it when available.
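
For the auth option above, a client might assemble its auth header from the two settings like this. The sketch assumes `TTS_AUTH_HEADER_NAME` and `TTS_AUTH_HEADER_VALUE` are read from environment variables and that the header name falls back to `xi-api-key` when unset; both are assumptions for illustration, and `auth_headers` is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping


def auth_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the auth header to attach to TTS requests, or {} if no value is set."""
    # Assumption: default to the ElevenLabs-style header name when none is configured.
    name = env.get("TTS_AUTH_HEADER_NAME", "xi-api-key")
    value = env.get("TTS_AUTH_HEADER_VALUE", "")
    return {name: value} if value else {}
```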
---
## Reference
- Virtual-banker TTS client: `virtual-banker/backend/tts` (see `backend/tts/README.md`).
- ElevenLabs TTS API: [Text-to-speech](https://elevenlabs.io/docs/api-reference/text-to-speech), [Stream](https://elevenlabs.io/docs/api-reference/text-to-speech/stream).