Zonos 2 API: Pricing, Documentation
by Zyphra
Zonos 2 API is a text-to-speech and voice synthesis API that enables developers to generate natural-sounding speech from text with support for voice cloning, multilingual speech generation, and expressive audio controls. It is designed to create high-quality voice outputs using short reference audio samples while offering customization over speaking style, pitch, speed, and emotional tone. The API can be integrated into applications such as virtual assistants, content creation tools, audiobooks, customer support systems, and accessibility solutions, delivering realistic and responsive speech generation through a scalable cloud-based interface.

Zonos 2 API Documentation
https://gateway.pixazo.ai/zonos-2/v1/zonos-2-request
Authentication
All requests require an API key passed via header.
| Header | Type | Required | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key |
Zonos 2 generate request
Request Code
POST https://gateway.pixazo.ai/zonos-2/v1/zonos-2-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_API_KEY

{
"reference_audio_url": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/Reference.wav",
"text": "Hello, this is a sample of my cloned voice speaking naturally.",
"language": "en_us",
"accurate_mode": true,
"clean_speaker_background": false,
"temperature": 1.15,
"top_p": 0,
"min_p": 0.18,
"top_k": 106
}
Python Example
import requests
url = "https://gateway.pixazo.ai/zonos-2/v1/zonos-2-request"
headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_API_KEY"
}
# Request body; note that Python booleans are True/False, not true/false.
data = {
    "reference_audio_url": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/Reference.wav",
    "text": "Hello, this is a sample of my cloned voice speaking naturally.",
    "language": "en_us",
    "accurate_mode": True,
    "clean_speaker_background": False,
    "temperature": 1.15,
    "top_p": 0,
    "min_p": 0.18,
    "top_k": 106
}
response = requests.post(url, json=data, headers=headers)
print(response.json())
JavaScript Example
const url = "https://gateway.pixazo.ai/zonos-2/v1/zonos-2-request";
const headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_API_KEY"
};
const data = {
"reference_audio_url": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/Reference.wav",
"text": "Hello, this is a sample of my cloned voice speaking naturally.",
"language": "en_us",
"accurate_mode": true,
"clean_speaker_background": false,
"temperature": 1.15,
"top_p": 0,
"min_p": 0.18,
"top_k": 106
};
fetch(url, {
method: "POST",
headers: headers,
body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data));
cURL Example
curl -X POST "https://gateway.pixazo.ai/zonos-2/v1/zonos-2-request" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
--data-raw '{
"reference_audio_url": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/Reference.wav",
"text": "Hello, this is a sample of my cloned voice speaking naturally.",
"language": "en_us",
"accurate_mode": true,
"clean_speaker_background": false,
"temperature": 1.15,
"top_p": 0,
"min_p": 0.18,
"top_k": 106
}'
Output
{
"request_id": "zonos-2_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/zonos-2_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Webhook (Optional)
Add the X-Webhook-URL header to your generate request to receive a POST callback instead of polling.
X-Webhook-URL: https://your-server.com/webhook/callback
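A minimal sketch of a callback receiver, using only Python's standard library. The callback body is not specified here; the sketch assumes it carries the same fields as the status object shown under "Error via Status/Webhook" (request_id, status, error, output). The port, the `WebhookHandler` class, and the `summarize_callback` helper are hypothetical names chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_callback(payload: dict) -> str:
    """Turn a callback payload into a one-line summary.

    Assumption: the payload has the same fields as the status object
    (request_id, status, error, output).
    """
    request_id = payload.get("request_id", "<unknown>")
    status = payload.get("status", "<unknown>")
    if status == "COMPLETED":
        urls = (payload.get("output") or {}).get("media_url") or []
        return f"{request_id}: completed, {len(urls)} file(s)"
    if status in ("FAILED", "ERROR"):
        return f"{request_id}: {status.lower()}: {payload.get('error')}"
    return f"{request_id}: {status}"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)  # body was not a JSON object
            self.end_headers()
            return
        print(summarize_callback(payload))
        self.send_response(200)  # acknowledge receipt
        self.end_headers()


if __name__ == "__main__":
    # Hypothetical port; expose this server at the URL sent in X-Webhook-URL.
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```

Keeping the summary logic in a plain function makes it easy to test without starting a server.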
Request Parameters - Zonos 2 generate request
| Parameter | Required | Type | Default | Allowed values / range | Description |
|---|---|---|---|---|---|
| reference_audio_url | Yes | string | — | — | URL of the reference audio to clone the voice from. Supported formats: MP3, OGG, WAV, M4A, AAC. |
| text | No | string | "Hello, this is a sample of my cloned voice speaking naturally." | — | Text to synthesize in the cloned voice. If omitted, a built-in example is used. |
| language | No | enum | "en_us" | "en_us", "en_gb", "fr_fr", "de", "es", "it", "pt_br", "ja", "cmn", "ko" | Text-normalization language code. |
| accurate_mode | No | boolean | true | — | True = closer voice match; false = more expressive delivery. |
| clean_speaker_background | No | boolean | false | — | Set to true when the reference audio already has a clean background, which disables ambient-noise processing. |
| temperature | No | float | 1.15 | 0–2 | Sampling temperature (0–2). Higher values increase randomness. |
| top_p | No | float | 0 | 0–1 | Nucleus sampling probability (0–1). Value of 0 disables nucleus sampling. |
| min_p | No | float | 0.18 | 0–1 | Minimum-probability sampling threshold (0–1). Filters out tokens below this probability. |
| top_k | No | integer | 106 | 0–2048 | Top-k sampling cutoff (0–2048). Value of 0 disables top-k sampling. |
| max_tokens | No | integer | — | 1–6144 | Maximum audio frames to generate (1–6144). Limits output duration. |
| seed | No | integer | — | 0–2147483647 | Random seed for reproducibility (0–2147483647). |
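The ranges in the table can be checked on the client before a request is sent, which avoids spending a round trip on a 400. The following is a sketch under assumptions: the ranges are treated as inclusive, and `validate_payload` is a hypothetical helper, not part of the API.

```python
ALLOWED_LANGUAGES = {
    "en_us", "en_gb", "fr_fr", "de", "es", "it", "pt_br", "ja", "cmn", "ko",
}

# (min, max, expected type) per numeric parameter, copied from the table.
# Assumption: both ends of each range are inclusive.
NUMERIC_RANGES = {
    "temperature": (0, 2, float),
    "top_p": (0, 1, float),
    "min_p": (0, 1, float),
    "top_k": (0, 2048, int),
    "max_tokens": (1, 6144, int),
    "seed": (0, 2147483647, int),
}


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    if not payload.get("reference_audio_url"):
        problems.append("reference_audio_url is required")
    language = payload.get("language", "en_us")
    if language not in ALLOWED_LANGUAGES:
        problems.append(f"language {language!r} is not an allowed value")
    for name, (low, high, kind) in NUMERIC_RANGES.items():
        if name not in payload:
            continue  # optional parameter left at its server-side default
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif kind is int and not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

A caller would send the request only when `validate_payload(data)` returns an empty list. The server remains the authority on what is valid.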
Example Request
{
"reference_audio_url": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/Reference.wav",
"text": "Hello, this is a sample of my cloned voice speaking naturally.",
"language": "en_us",
"accurate_mode": true,
"clean_speaker_background": false,
"temperature": 1.15,
"top_p": 0,
"min_p": 0.18,
"top_k": 106
}
Response
{
"request_id": "zonos-2_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/zonos-2_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Request Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Cache-Control | no-cache |
| Ocp-Apim-Subscription-Key | YOUR_API_KEY |
Response Handling
Common status codes.
| Code | Meaning |
|---|---|
| 202 | Accepted — Request queued |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
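One way to act on these codes is to map each one to a next step. The sketch below treats 429 and 500 as transient and retries them with exponential backoff; that choice is an assumption, since nothing here states which codes are safe to retry. `next_action` and `backoff_delay` are hypothetical helper names.

```python
import random

# Assumption: 429 and 500 are transient and worth retrying.
RETRYABLE = {429, 500}


def next_action(status_code: int) -> str:
    """Map an HTTP status code from the generate request to a next step."""
    if status_code == 202:
        return "poll"   # request was queued; start polling or wait for the webhook
    if status_code in RETRYABLE:
        return "retry"  # try again after a delay
    return "fail"       # 400, 401, 402, 403: fix the request, key, or balance first


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter.

    Returns a random delay in [0, min(cap, base * 2**attempt)] seconds.
    """
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

The jitter spreads retries out so that many clients hitting a rate limit at once do not all retry at the same moment.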
Error Responses
Queue system errors and model validation errors.
Queue System Errors
// 402 — Insufficient balance
{
"error": "Insufficient Balance",
"message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
"error": "Model not found",
"message": "Model 'zonos-2' not found or is disabled"
}
Error via Status/Webhook
{
"request_id": "zonos-2_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "zonos-2",
"error": "Description of the error",
"output": null
}
Retrieving Results
Poll the universal status endpoint to check progress and retrieve results.
Endpoint
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY
cURL Example
curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
"https://gateway.pixazo.ai/v2/requests/status/zonos-2_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Response (Completed)
{
"request_id": "zonos-2_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "zonos-2",
"error": null,
"output": {
"media_url": ["https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/zonos-2_019dxxxx/output.wav"],
"media_type": "audio/wav",
"file_name": "output.wav",
"file_size": 1245678
},
"created_at": "2026-03-31T10:00:00.000Z",
"updated_at": "2026-03-31T10:00:15.000Z",
"completed_at": "2026-03-31T10:00:15.000Z"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR |
| model_id | string | Model that processed the request |
| error | string or null | Error message if failed |
| output.media_url | array | URLs to generated media (R2 CDN) |
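| output.media_type | string | MIME type of the output |
| output.file_name | string | File name of the output |
| output.file_size | integer | Size of the output file |
| created_at | string | When request was created |
| updated_at | string | When request was last updated |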
| completed_at | string | When request completed |
| polling_url | string | Status URL (initial response only) |
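The fields above can be read defensively, since output is null on errors. The following sketch extracts the media URLs from a status response and saves one to disk. `media_urls` and `download` are hypothetical helpers, and the tolerance for a bare string in media_url is a precaution, not documented behavior.

```python
import shutil
import urllib.request


def media_urls(status_response: dict) -> list[str]:
    """Return the output URLs of a completed request, or raise if not completed."""
    status = status_response.get("status")
    if status != "COMPLETED":
        raise ValueError(
            f"request is {status}, error={status_response.get('error')!r}"
        )
    output = status_response.get("output") or {}
    urls = output.get("media_url") or []
    # The table lists media_url as an array; accept a bare string as a precaution.
    return [urls] if isinstance(urls, str) else list(urls)


def download(url: str, dest: str) -> None:
    """Stream the file at url to the local path dest."""
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest, "wb") as fh:
        shutil.copyfileobj(resp, fh)
```

For the completed response shown earlier, `media_urls(result)[0]` would be the output.wav URL, which can then be passed to `download`.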
Status Values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting to be processed |
| PROCESSING | Being processed by the model |
| COMPLETED | Done — output contains the result |
| FAILED | Failed — check error field |
| ERROR | System error — not charged |
Status Flow
QUEUED → PROCESSING → COMPLETED, FAILED, or ERROR
Typical Workflow
- Send a generate request to the API endpoint
- Save the request_id from the response
- Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
- When status is "COMPLETED", download from output.media_url
Tip: Use X-Webhook-URL header to get a callback instead of polling.
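The polling part of this workflow can be sketched as a small loop, using only Python's standard library. The 5-second interval follows the guidance above; the 300-second timeout and the names `fetch_status` and `wait_for_result` are assumptions made for illustration.

```python
import json
import time
import urllib.request

STATUS_URL = "https://gateway.pixazo.ai/v2/requests/status/{request_id}"
TERMINAL = {"COMPLETED", "FAILED", "ERROR"}


def fetch_status(request_id: str, api_key: str) -> dict:
    """GET the status endpoint and decode the JSON body."""
    req = urllib.request.Request(
        STATUS_URL.format(request_id=request_id),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_result(request_id, fetch, interval=5.0, timeout=300.0, sleep=time.sleep):
    """Call fetch(request_id) until the status is terminal or the timeout passes.

    fetch and sleep are injectable so the loop can be tested without a network.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(request_id)
        if result.get("status") in TERMINAL:
            return result
        if time.monotonic() + interval > deadline:
            raise TimeoutError(
                f"{request_id} still {result.get('status')} after {timeout}s"
            )
        sleep(interval)
```

A typical call would be `wait_for_result(request_id, lambda rid: fetch_status(rid, "YOUR_API_KEY"))`, after which the caller checks whether the returned status is COMPLETED before reading output.media_url.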