XTTS API - AI Voice Cloning & Text to Speech APIs
by Xtts
With the XTTS API, developers can clone voices and generate speech in multiple languages while preserving the characteristics of the cloned voice. The API is suited to content localization, personalized voice experiences, and applications that need custom voice generation across languages.

v2 Text to Speech API Documentation
https://gateway.pixazo.ai/voice-clone/v1
Authentication
All requests require an API key passed via header.
| Header | Type | Required | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key |
Text to Speech Request - XTTS V2 API
Request Code
POST https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
  "text": "Hello! Welcome to our voice cloning service.",
  "language": "en"
}
Python Example
import requests

url = "https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
    "text": "Hello! Welcome to our voice cloning service.",
    "language": "en"
}
response = requests.post(url, json=data, headers=headers)
print(response.json())
JavaScript Example
const url = 'https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate';
const data = {
  speaker: 'https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav',
  text: 'Hello! Welcome to our voice cloning service.',
  language: 'en'
};
fetch(url, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Cache-Control': 'no-cache',
    'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'
  },
  body: JSON.stringify(data)
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error('Error:', error));
cURL Example
curl -v -X POST "https://gateway.pixazo.ai/voice-clone/v1/xtts-v2/generate" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  --data-raw '{
    "speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
    "text": "Hello! Welcome to our voice cloning service.",
    "language": "en"
  }'
Output
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Webhook (Optional)
Add the X-Webhook-URL header to your submit request to receive a POST callback when the job completes — no polling required.
Webhook Headers
| Header | Required | Default | Description |
|---|---|---|---|
| X-Webhook-URL | Yes (to enable) | — | HTTPS endpoint on your server that will receive the POST callback. Must respond 2xx within a few seconds (process async if needed). |
| X-Webhook-Mode | No | terminal | terminal: fires once at the final status (COMPLETED/FAILED/ERROR). sync: fires on every poll cycle plus the terminal event, and caps the queue's polling delay at 15s for tighter progress updates. |
Example: enable webhook
X-Webhook-URL: https://your-server.com/webhook/callback
X-Webhook-Mode: terminal
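A minimal Python sketch of attaching these headers to a submit request. The helper name `build_headers` is hypothetical and not part of the API; the header names and modes are the ones listed above. It also rejects non-HTTPS callback URLs on the client side, since the delivery rules state that plain http:// URLs are rejected.

```python
from typing import Dict, Optional
from urllib.parse import urlparse

# Modes listed in the Webhook Headers table.
VALID_WEBHOOK_MODES = {"terminal", "sync"}


def build_headers(api_key: str,
                  webhook_url: Optional[str] = None,
                  webhook_mode: str = "terminal") -> Dict[str, str]:
    """Return submit headers; webhook headers are added only when a URL is given."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    if webhook_url is not None:
        # Client-side check mirroring the documented HTTPS requirement.
        if urlparse(webhook_url).scheme != "https":
            raise ValueError("webhook URL must use https://")
        if webhook_mode not in VALID_WEBHOOK_MODES:
            raise ValueError(f"unknown webhook mode: {webhook_mode!r}")
        headers["X-Webhook-URL"] = webhook_url
        headers["X-Webhook-Mode"] = webhook_mode
    return headers
```

Calling `build_headers("YOUR_SUBSCRIPTION_KEY")` without a URL yields the plain submit headers, so the same helper can serve both polling and webhook flows.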
Callback Payload
Your endpoint receives a POST application/json with the same shape as the GET /v2/requests/status/{request_id} response. Example terminal callback (mode terminal):
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "xtts-v2-api",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/output.wav"
],
"media_type": "audio/wav"
},
"created_at": "2026-05-22T13:17:32.110Z",
"updated_at": "2026-05-22 13:19:23",
"completed_at": "2026-05-22 13:19:23"
}
Failure callback shape
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "xtts-v2-api",
"error": "Description of the error",
"output": null,
"created_at": "...",
"updated_at": "...",
"completed_at": "..."
}
Delivery semantics
- terminal mode (default): exactly one POST when the request reaches a terminal status. No callback during PROCESSING.
- sync mode: POST on every status poll (with delay capped at ~15s) plus a final POST at terminal status. Use when you want progress updates.
- Idempotency: use request_id as your idempotency key. Network retries can deliver the same callback more than once; your handler must tolerate duplicates.
- Response: respond 200 OK within a few seconds. The queue does not block on slow handlers, but persistent failures may stop further deliveries.
- HTTPS required: plain http:// URLs are rejected.
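The duplicate-tolerance rule above can be sketched as a small framework-independent handler. Everything here is illustrative: `handle_callback`, the in-memory `seen` set, and the `results` dict are hypothetical stand-ins for whatever web framework and storage you use. The function takes the raw POST body and returns the HTTP status code your endpoint would send back.

```python
import json
from typing import Dict, Set

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ERROR"}


def handle_callback(body: bytes, seen: Set[str], results: Dict[str, dict]) -> int:
    """Process one callback body and return the HTTP status code to reply with.

    `seen` records request_ids whose terminal callback was already handled
    (assumption: a real service would persist this, e.g. in a database).
    """
    try:
        payload = json.loads(body)
        request_id = payload["request_id"]
        status = payload["status"]
    except (ValueError, KeyError, TypeError):
        return 400  # malformed payload
    if status not in TERMINAL_STATUSES:
        return 200  # progress update (sync mode): acknowledge, store nothing
    if request_id in seen:
        return 200  # duplicate delivery: already handled, still acknowledge
    seen.add(request_id)
    results[request_id] = payload
    return 200
```

Keeping the handler to a lookup and a write, and deferring heavier work such as downloading the audio to a background job, helps stay within the "few seconds" response window.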
Request Parameters - Text to Speech Request
| Parameter | Required | Type | Default | Allowed values / range | Description |
|---|---|---|---|---|---|
| speaker | Yes | string | — | — | URL to speaker audio file (wav, mp3, m4a, ogg, or flv). 3-10 seconds of clear speech recommended |
| text | No | string | "Hi there, I'm your new voice clone. Try your best to upload quality audio" | — | Text to synthesize (max 500 characters recommended) |
| language | No | string | en | en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, hu, ko, hi | Output language code |
| cleanup_voice | No | boolean | false | true, false | Apply denoising to the speaker audio. Use for microphone recordings with background noise |
| webhook | No | string | null | — | Callback URL for completion notification. A POST request with the results is sent when the job completes |
| webhook_events_filter | No | array | ["*"] | ["*"] (all), ["completed"] (success/failure only) | Events that trigger the webhook |
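As an illustration of the table above, the following sketch validates the main parameters before sending a request. The function `build_payload` is hypothetical, and treating the recommended 500-character maximum as a hard limit is an assumption made here, not documented API behavior.

```python
from typing import Any, Dict

# Language codes listed in the parameter table.
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr",
    "ru", "nl", "cs", "ar", "zh", "hu", "ko", "hi",
}
# Recommended maximum; enforced strictly here as an assumption.
MAX_TEXT_CHARS = 500


def build_payload(speaker: str, text: str, language: str = "en",
                  cleanup_voice: bool = False) -> Dict[str, Any]:
    """Return a JSON-serializable request body, or raise ValueError."""
    if not speaker.startswith(("http://", "https://")):
        raise ValueError("speaker must be a URL to an audio file")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text is {len(text)} characters; keep it under {MAX_TEXT_CHARS}")
    return {
        "speaker": speaker,
        "text": text,
        "language": language,
        "cleanup_voice": cleanup_voice,
    }
```

Catching these cases locally avoids spending a round trip on a request the service would likely answer with 400.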
Example Request
{
"speaker": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/male.wav",
"text": "Hello! Welcome to our voice cloning service.",
"language": "en"
}
Response
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Request Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Cache-Control | no-cache |
| Ocp-Apim-Subscription-Key | YOUR_SUBSCRIPTION_KEY |
Response Handling
Common status codes.
| Code | Meaning |
|---|---|
| 202 | Accepted — Request queued |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
Error Responses
Queue system errors and model validation errors.
Queue System Errors
// 402 — Insufficient balance
{
"error": "Insufficient Balance",
"message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
"error": "Model not found",
"message": "Model 'xtts-v2-api' not found or is disabled"
}
Error via Status/Webhook
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "xtts-v2-api",
"error": "Description of the error",
"output": null
}
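One way to turn the status codes and error bodies above into readable messages is sketched below. The helper names are hypothetical, and the choice to retry only 429 and 500 is an assumption about sensible client behavior, not a documented policy.

```python
import json

# Meanings copied from the status code table.
STATUS_MEANINGS = {
    202: "Accepted",
    400: "Bad Request",
    401: "Unauthorized",
    402: "Insufficient Balance",
    403: "Forbidden",
    429: "Too Many Requests",
    500: "Internal Server Error",
}
# Assumption: only rate limiting and server errors are worth retrying.
RETRYABLE_CODES = {429, 500}


def describe_response(status_code: int, body: str) -> str:
    """Combine the table meaning with the 'message' field, when one is present."""
    meaning = STATUS_MEANINGS.get(status_code, "Unknown status")
    try:
        data = json.loads(body)
    except ValueError:
        data = None  # body was empty or not JSON
    if isinstance(data, dict) and data.get("message"):
        return f"{status_code} {meaning}: {data['message']}"
    return f"{status_code} {meaning}"


def should_retry(status_code: int) -> bool:
    """True when a later retry might succeed without changing the request."""
    return status_code in RETRYABLE_CODES
```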
Retrieving Results
Poll the universal status endpoint to check progress and retrieve results.
Endpoint
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY
cURL Example
curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
"https://gateway.pixazo.ai/v2/requests/status/xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Response (Completed)
{
"request_id": "xtts-v2-api_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "xtts-v2-api",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/xtts-v2-api_019dxxxx-xxxx/output.ext"
],
"media_type": "application/octet-stream"
},
"created_at": "2026-03-31T10:00:00.000Z",
"updated_at": "2026-03-31T10:00:15.000Z",
"completed_at": "2026-03-31T10:00:15.000Z"
}
Response Fields
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR |
| model_id | string | Model that processed the request |
| error | string or null | Error message if failed |
| output.media_url | array | URLs to generated media (R2 CDN) |
| output.media_type | string | MIME type of the output |
| created_at | string | When request was created |
| completed_at | string or null | When request completed |
| polling_url | string | Status URL (initial response only) |
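A short sketch of reading these fields from a decoded status response and saving the audio. Both function names are hypothetical, and the download step uses only the Python standard library.

```python
import urllib.request
from typing import List


def extract_media_urls(status_response: dict) -> List[str]:
    """Return output.media_url for a COMPLETED request, else raise RuntimeError."""
    status = status_response.get("status")
    if status != "COMPLETED":
        raise RuntimeError(
            f"request not completed (status={status}, error={status_response.get('error')})"
        )
    output = status_response.get("output") or {}  # output is null on failure
    return list(output.get("media_url") or [])


def download(url: str, dest_path: str) -> None:
    """Fetch one media URL and write the bytes to dest_path."""
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as fh:
        fh.write(resp.read())
```

For example, `download(extract_media_urls(resp)[0], "output.wav")` would save the first generated file, assuming `resp` holds a completed status response.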
Status Values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting to be processed |
| PROCESSING | Being processed by the model |
| COMPLETED | Done — output contains the result |
| FAILED | Failed — check error field |
| ERROR | System error — not charged |
Status Flow
QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR
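The status values can be modeled as an enumeration so that client code checks for a terminal state in one place. This is an illustrative sketch; the class name `RequestStatus` and its `is_terminal` property are hypothetical.

```python
from enum import Enum


class RequestStatus(str, Enum):
    """Status strings from the Status Values table."""
    QUEUED = "QUEUED"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    ERROR = "ERROR"

    @property
    def is_terminal(self) -> bool:
        # COMPLETED, FAILED, and ERROR end the flow; no further updates follow.
        return self in (
            RequestStatus.COMPLETED,
            RequestStatus.FAILED,
            RequestStatus.ERROR,
        )
```

Constructing the enum from a response value, as in `RequestStatus(resp["status"])`, raises `ValueError` on an unrecognized string, which surfaces unexpected statuses early.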
Typical Workflow
- Send a generate request to the API endpoint
- Save the request_id from the response
- Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
- When status is "COMPLETED", download from output.media_url
Tip: Use X-Webhook-URL header to get a callback instead of polling.
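The polling steps of the workflow above can be sketched with the Python standard library as follows. The function names, the 5-second default interval (the low end of the suggested range), and the 300-second timeout are assumptions for illustration. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Callable

STATUS_URL = "https://gateway.pixazo.ai/v2/requests/status/{request_id}"
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ERROR"}


def fetch_status(request_id: str, api_key: str) -> dict:
    """GET the status endpoint and decode its JSON body."""
    req = urllib.request.Request(
        STATUS_URL.format(request_id=request_id),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(request_id: str, api_key: str,
                    fetch: Callable[[str, str], dict] = fetch_status,
                    interval: float = 5.0,
                    timeout: float = 300.0,  # assumed upper bound, not documented
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the request reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(request_id, api_key)
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{request_id} still {status.get('status')} after {timeout}s")
        sleep(interval)
```

The returned dict has the shape shown under Response (Completed), so the caller still needs to check whether `status` is COMPLETED before reading `output.media_url`.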
v2 Text to Speech API Pricing
| Resolution | Price (USD) |
|---|---|
| All | $0.015 |