Chatterbox API - AI Text to Speech APIs
by Resemble-ai
Chatterbox API, developers can convert text into lifelike audio with customizable voice characteristics. The API supports multiple languages and speaking styles, making it ideal for voiceovers, audiobooks, virtual assistants, and accessibility applications requiring human-quality speech output.

Models Version
Get $5 Free Credit on First Payment
No strings attached — add funds and get $5 bonus instantly
Chatterbox v1 Text to Speech API Documentation
https://gateway.pixazo.ai/chatterbox-text-to-speech/v1
Authentication
All requests require an API key passed via header.
| Header | Type | Required | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key |
Chatterbox Text to Speech generate request - Chatterbox Text to Speech API
Request Code
POST https://gateway.pixazo.ai/chatterbox-text-to-speech/v1/chatterbox-text-to-speech-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
{
"text": "Hello world, this is a test of the Chatterbox text to speech model.",
"audio_url": "https://storage.googleapis.com/chatterbox-demo-samples/prompts/male_rickmorty.mp3",
"exaggeration": 0.25,
"temperature": 0.7,
"cfg": 0.5
}
import requests
url = "https://gateway.pixazo.ai/chatterbox-text-to-speech/v1/chatterbox-text-to-speech-request"
headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
"text": "Hello world, this is a test of the Chatterbox text to speech model.",
"audio_url": "https://storage.googleapis.com/chatterbox-demo-samples/prompts/male_rickmorty.mp3",
"exaggeration": 0.25,
"temperature": 0.7,
"cfg": 0.5
}
response = requests.post(url, json=data, headers=headers)
print(response.json())
const url = 'https://gateway.pixazo.ai/chatterbox-text-to-speech/v1/chatterbox-text-to-speech-request';
const data = {
text: 'Hello world, this is a test of the Chatterbox text to speech model.',
audio_url: 'https://storage.googleapis.com/chatterbox-demo-samples/prompts/male_rickmorty.mp3',
exaggeration: 0.25,
temperature: 0.7,
cfg: 0.5
};
fetch(url, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Cache-Control': 'no-cache',
'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'
},
body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error('Error:', error));
curl -X POST "https://gateway.pixazo.ai/chatterbox-text-to-speech/v1/chatterbox-text-to-speech-request" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
"text": "Hello world, this is a test of the Chatterbox text to speech model.",
"audio_url": "https://storage.googleapis.com/chatterbox-demo-samples/prompts/male_rickmorty.mp3",
"exaggeration": 0.25,
"temperature": 0.7,
"cfg": 0.5
}'
Output
{
"request_id": "chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Webhook (Optional)
Add the X-Webhook-URL header to your submit request to receive a POST callback when the job completes — no polling required.
Webhook Headers
| Header | Required | Default | Description |
|---|---|---|---|
X-Webhook-URL | Yes (to enable) | — | HTTPS endpoint on your server that will receive the POST callback. Must respond 2xx within a few seconds (process async if needed). |
X-Webhook-Mode | No | terminal | terminal — fires once at the final status (COMPLETED/FAILED/ERROR). sync — fires on every poll cycle plus the terminal event, and caps the queue’s polling delay at 15s for tighter progress updates. |
Example: enable webhook
X-Webhook-URL: https://your-server.com/webhook/callback
X-Webhook-Mode: terminal
Callback Payload
Your endpoint receives a POST application/json with the same shape as the GET /v2/requests/status/{request_id} response. Example terminal callback (mode terminal):
{
"request_id": "chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "chatterbox-text-to-speech",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/output.wav"
],
"media_type": "audio/wav"
},
"created_at": "2026-05-22T13:17:32.110Z",
"updated_at": "2026-05-22 13:19:23",
"completed_at": "2026-05-22 13:19:23"
}
Failure callback shape
{
"request_id": "chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "chatterbox-text-to-speech",
"error": "Description of the error",
"output": null,
"created_at": "...",
"updated_at": "...",
"completed_at": "..."
}
Delivery semantics
- terminal mode (default) — exactly one
POSTwhen the request reaches a terminal status. No callback duringPROCESSING. - sync mode —
POSTon every status poll (with delay capped at ~15s) plus a finalPOSTat terminal status. Use when you want progress updates. - Idempotency — use
request_idas your idempotency key. Network retries can deliver the same callback more than once; your handler must tolerate duplicates. - Response — respond
200 OKwithin a few seconds. The queue does not block on slow handlers, but persistent failures may stop further deliveries. - HTTPS required — plain
http://URLs are rejected.
Request Parameters - Chatterbox Text to Speech generate request
| Parameter | Required | Type | Default | Allowed values / range | Description |
|---|---|---|---|---|---|
| text | Yes | string | — | — | The textual content to convert into speech. Must be a valid string of readable language. |
| audio_url | No | string | — | — | A URL pointing to an audio file (e.g., MP3) to serve as a voice reference. Used to clone or adapt the speaking style. |
| exaggeration | No | number | — | 0.0–1.0. | Controls the degree of expressive emphasis in the generated speech. Higher values increase modulation (e.g., intonation, stress). Range: 0.0 to 1.0. |
| temperature | No | number | — | 0.1–1.0. | Controls randomness in voice generation. Higher values increase variability in pitch and timing; lower values produce more consistent, predictable speech. Range: 0.1 to 1.0. |
| cfg | No | number | — | 0.0–2.0. | Classifier-Free Guidance strength. Influences how closely the output adheres to the input prompt and reference audio. Higher values increase fidelity. Range: 0.0 to 2.0. |
Example Request
{
"text": "Hello world, this is a test of the Chatterbox text to speech model.",
"audio_url": "https://storage.googleapis.com/chatterbox-demo-samples/prompts/male_rickmorty.mp3",
"exaggeration": 0.25,
"temperature": 0.7,
"cfg": 0.5
}
Response
{
"request_id": "chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Request Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Cache-Control | no-cache |
| Ocp-Apim-Subscription-Key | YOUR_SUBSCRIPTION_KEY |
Response Handling
Common status codes.
| Code | Meaning |
|---|---|
| 202 | Accepted — Request queued |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
Error Responses
Queue system errors and model validation errors.
Queue System Errors
// 402 — Insufficient balance
{
"error": "Insufficient Balance",
"message": "Your wallet does not have enough balance. Required: $0.01"
}
// 400 — Model not found
{
"error": "Model not found",
"message": "Model 'chatterbox-text-to-speech' not found or is disabled"
}
Error via Status/Webhook
{
"request_id": "chatterbox-text-to-speech_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "chatterbox-text-to-speech",
"error": "Description of the error",
"output": null
}
Retrieving Results
Poll the universal status endpoint to check progress and retrieve results.
Endpoint
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY
cURL Example
curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
"https://gateway.pixazo.ai/v2/requests/status/chatterbox-text-to-speech_019d42ce-bc92-7f98-8181-b42db433b9f2e"
Response (Completed)
{
"request_id": "chatterbox-text-to-speech_019d42ce-bc92-7f98-8181-b42db433b9f2e",
"status": "COMPLETED",
"model_id": "chatterbox-text-to-speech",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/chatterbox-text-to-speech_019d42ce-bc92-7f98-8181-b42db433b9f2e/output.wav"
],
"media_type": "audio/wav"
},
"created_at": "2026-03-31T07:32:03.749Z",
"updated_at": "2026-03-31T07:32:20.000Z",
"completed_at": "2026-03-31T07:32:20.000Z"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR |
| model_id | string | Model that processed the request |
| error | string|null | Error message if failed |
| output.media_url | array | URLs to generated media (R2 CDN) |
| output.media_type | string | MIME type (audio/wav) |
| created_at | string | When request was created |
| completed_at | string|null | When request completed |
| polling_url | string | Status URL (initial response only) |
Status Values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting to be processed |
| PROCESSING | Being processed by the model |
| COMPLETED | Done — output contains the result |
| FAILED | Failed — check error field |
| ERROR | System error — not charged |
Status Flow
QUEUED → PROCESSING → COMPLETED
→ FAILED
→ ERROR
Typical Workflow
- Send a generate request to the API endpoint
- Save the
request_idfrom the response - Poll every 5-10 seconds:
GET /v2/requests/status/{request_id} - When
statusis"COMPLETED", download fromoutput.media_url
Tip: Use X-Webhook-URL header to get a callback instead of polling.
Chatterbox v1 Text to Speech API Pricing
| Resolution | Price (USD) |
|---|---|
| All Resolution | $0.03 |