VoxCPM API - Text to Speech Voice Design API: Pricing, Documentation

The VoxCPM API lets developers integrate continuous, tokenizer-free speech synthesis into their applications. Because the model operates in a continuous latent space rather than on discrete audio tokens, it avoids the quantization artifacts of token-based synthesis while supporting context-aware expressiveness, zero-shot voice cloning, and custom voice design.

OpenBMB VoxCPM 2.0 API Documentation

OpenBMB VoxCPM2 v1.0 Text to Speech API Documentation

POST https://gateway.pixazo.ai/voxcpm/v1/text-to-speech

Authentication

All requests require an API key, passed in the `Ocp-Apim-Subscription-Key` request header.

Header	Type	Required	Description
Ocp-Apim-Subscription-Key	string	Yes	Your API subscription key
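
As a minimal sketch, the subscription key can be read from the environment instead of being hard-coded in source files. The helper name `build_headers` and the environment variable `PIXAZO_API_KEY` are assumptions made for this example; neither is defined by the API.

```python
import os


def build_headers(api_key=None):
    """Return the request headers for the text-to-speech endpoint.

    PIXAZO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = api_key or os.environ.get("PIXAZO_API_KEY")
    if not key:
        raise ValueError("No API key given and PIXAZO_API_KEY is not set")
    return {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
```

The returned dictionary can be passed as the `headers` argument of an HTTP client call.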

Text to Speech - OpenBMB VoxCPM2

Request Code

POST https://gateway.pixazo.ai/voxcpm/v1/text-to-speech
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "text": "Hello, from Pixazo.",
  "cfg_value": 2.0,
  "dit_steps": 10
}

import requests

url = "https://gateway.pixazo.ai/voxcpm/v1/text-to-speech"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "text": "Hello, from Pixazo.",
    "cfg_value": 2.0,
    "dit_steps": 10
}

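# Send the request and fail fast on HTTP errors instead of parsing an error body.
response = requests.post(url, json=data, headers=headers, timeout=60)
response.raise_for_status()
print(response.json())  # JSON object whose "output" field is the audio URL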

const url = 'https://gateway.pixazo.ai/voxcpm/v1/text-to-speech';
const headers = {
  'Content-Type': 'application/json',
  'Cache-Control': 'no-cache',
  'Ocp-Apim-Subscription-Key': 'YOUR_SUBSCRIPTION_KEY'
};
const data = {
  text: 'Hello, from Pixazo.',
  cfg_value: 2.0,
  dit_steps: 10
};

fetch(url, {
  method: 'POST',
  headers: headers,
  body: JSON.stringify(data)
})
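.then(response => {
  // fetch() resolves on HTTP error statuses too, so check the status explicitly.
  if (!response.ok) throw new Error(`HTTP ${response.status}`);
  return response.json();
})
.then(result => console.log(result))
.catch(error => console.error('Error:', error));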

curl -v -X POST "https://gateway.pixazo.ai/voxcpm/v1/text-to-speech" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  --data-raw '{
    "text": "Hello, from Pixazo.",
    "cfg_value": 2.0,
    "dit_steps": 10
  }'

Output

{
  "output": "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/openbmb-voxcpm2/1768578707564-851083.wav"
}
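
The response contains only a URL, so the audio has to be fetched in a second step. The sketch below shows one possible way to do that with the Python standard library. The function names are hypothetical, and how long the URL stays valid is not stated here, so downloading promptly is assumed to be the safe choice.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def audio_url_from_response(body):
    """Extract the audio URL from the JSON response body (str or bytes)."""
    payload = json.loads(body)
    url = payload.get("output") if isinstance(payload, dict) else None
    if not isinstance(url, str) or not url:
        raise ValueError("response has no 'output' URL")
    return url


def filename_from_url(url, default="speech.wav"):
    """Use the last path segment of the URL as the local file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def download_audio(url, directory="."):
    """Download the audio file and return the local path it was saved to."""
    path = os.path.join(directory, filename_from_url(url))
    with urllib.request.urlopen(url, timeout=60) as resp, open(path, "wb") as fh:
        fh.write(resp.read())
    return path
```

A caller would pass the raw response body to `audio_url_from_response` and the result to `download_audio`.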

Request Parameters - Text to Speech

Parameter	Required	Type	Default	Allowed values / range	Description
text	Yes	string	—	—	The text to convert to speech. Supports natural-language sentences and punctuation for prosody control. VoxCPM 2 also supports voice design: prepend a style instruction in parentheses — e.g. `(calm, whispering)` — to steer the speaking style.
cfg_value	No	number	2.0	1.0 – 3.0 (recommended)	Classifier-free guidance scale — how strictly the generated speech follows the text and voice conditioning. Higher values adhere more closely to the prompt; lower values allow more natural variation. Values far outside the recommended range can degrade audio quality.
dit_steps	No	integer	10	4 – 30 (recommended)	Number of diffusion steps used to synthesize the audio (upstream `inference_timesteps`). More steps improve detail and stability at the cost of speed; fewer steps generate faster.
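
The parameters above can be checked on the client before a request is sent. The following is a sketch under stated assumptions: `build_payload` is a hypothetical helper, it treats the recommended ranges as hard limits (the API itself may accept values outside them), and it joins the parenthesized style instruction directly to the text because the exact separator is not specified in the table.

```python
def build_payload(text, style=None, cfg_value=2.0, dit_steps=10):
    """Build the JSON body for a text-to-speech request.

    Assumption: values outside the recommended ranges are rejected here,
    although the table only describes those ranges as recommendations.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if not 1.0 <= cfg_value <= 3.0:
        raise ValueError("cfg_value should be between 1.0 and 3.0")
    if isinstance(dit_steps, bool) or not isinstance(dit_steps, int):
        raise ValueError("dit_steps must be an integer")
    if not 4 <= dit_steps <= 30:
        raise ValueError("dit_steps should be between 4 and 30")
    if style:
        # Voice design: prepend the style instruction in parentheses.
        # Joining without a space is an assumption of this sketch.
        text = f"({style}){text}"
    return {"text": text, "cfg_value": float(cfg_value), "dit_steps": dit_steps}
```

For example, `build_payload("Good night.", style="calm, whispering", dit_steps=20)` yields a body whose `text` begins with the style instruction.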

Request Headers

Header	Value
Content-Type	application/json
Cache-Control	no-cache
Ocp-Apim-Subscription-Key	YOUR_SUBSCRIPTION_KEY

Response Handling

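The API returns standard HTTP status codes. A 200 response carries a JSON body whose `output` field is the URL of the generated WAV file.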

Code	Meaning
200	Success — audio generated
400	Bad Request
401	Unauthorized
402	Insufficient Balance
403	Forbidden
429	Too Many Requests
500	Internal Server Error
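
One way a client might act on these codes is sketched below. Treating 429 and 500 as retryable and every other error as final is a common convention and an assumption of this example, not documented behavior of the API. The `send` callable and all function names are hypothetical.

```python
import time

# Assumption: only these codes are worth retrying.
RETRYABLE = {429, 500}


def classify_status(code):
    """Map an HTTP status code to 'ok', 'retry', or 'fail'."""
    if code == 200:
        return "ok"
    if code in RETRYABLE:
        return "retry"
    return "fail"  # 400, 401, 402, 403: retrying will not help


def backoff_delays(max_attempts=4, base=1.0, cap=30.0):
    """Exponential delays (in seconds) to wait between attempts."""
    return [min(cap, base * 2 ** i) for i in range(max_attempts - 1)]


def call_with_retry(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it succeeds; send() returns (status_code, body)."""
    delays = backoff_delays(max_attempts)
    for attempt in range(max_attempts):
        status, body = send()
        action = classify_status(status)
        if action == "ok":
            return body
        if action == "fail" or attempt == max_attempts - 1:
            raise RuntimeError(f"request failed with HTTP {status}")
        sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the retry loop easy to exercise without real waiting.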

OpenBMB VoxCPM 2.0 API Pricing

Requests are free during the preview period. A fair-use rate limit of 60 requests per minute applies. See terms.
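
To stay under the 60 requests/minute limit, a client can space its calls out itself. The sketch below enforces a minimum interval of one second between requests. How the server counts its window (fixed or sliding) is not stated, so even spacing is assumed here as a conservative choice, and the class name is hypothetical.

```python
import time


class MinIntervalThrottle:
    """Block so that calls to wait() are at least 60/max_per_minute seconds apart."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self):
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `throttle.wait()` immediately before each request keeps a single-process client within the limit; several processes sharing one key would need shared coordination.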

Voice Cloning