MMAudio V2 API - AI Audio Generation APIs
by Sony AI
MMAudio V2 API is an audio synthesis interface for generating sound effects and soundtracks from text descriptions or in sync with video content. It uses temporal alignment to match generated sound to visual motion, which makes it useful for automating sound design. Supported features include prompt-based audio generation, negative prompting for finer control, and adjustable sample rates for matching output quality to production requirements.

MMAudio v2 Text to Audio API Documentation
Base URL: https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1
Authentication
All requests require an API key, passed in the Ocp-Apim-Subscription-Key header.
| Header | Type | Required | Description |
|---|---|---|---|
| Ocp-Apim-Subscription-Key | string | Yes | Your API subscription key |
MMAudio V2 Text to Audio: generate request
Request Code
POST https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
{
"prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
"negative_prompt": "",
"num_steps": 25,
"duration": 8,
"cfg_strength": 4.5,
"mask_away_clip": false
}
import requests

url = "https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request"
headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
"prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
"negative_prompt": "",
"num_steps": 25,
"duration": 8,
"cfg_strength": 4.5,
"mask_away_clip": False  # Python boolean; serialized as JSON false
}
# Submit the job; a 202 response carries request_id and polling_url
response = requests.post(url, json=data, headers=headers)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(response.json())
const url = "https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request";
const headers = {
"Content-Type": "application/json",
"Cache-Control": "no-cache",
"Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
"prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
"negative_prompt": "",
"num_steps": 25,
"duration": 8,
"cfg_strength": 4.5,
"mask_away_clip": false
};
fetch(url, {
method: "POST",
headers: headers,
body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));
curl -X POST "https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request" \
-H "Content-Type: application/json" \
-H "Cache-Control: no-cache" \
-H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
--data-raw '{
"prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
"negative_prompt": "",
"num_steps": 25,
"duration": 8,
"cfg_strength": 4.5,
"mask_away_clip": false
}'
Output
{
"request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Webhook (Optional)
Add the X-Webhook-URL header to your submit request to receive a POST callback when the job completes; no polling is required.
Webhook Headers
| Header | Required | Default | Description |
|---|---|---|---|
| X-Webhook-URL | Yes (to enable) | none | HTTPS endpoint on your server that will receive the POST callback. Must respond 2xx within a few seconds (process asynchronously if needed). |
| X-Webhook-Mode | No | terminal | terminal: fires once at the final status (COMPLETED, FAILED, or ERROR). sync: fires on every poll cycle plus the terminal event, and caps the queue's polling delay at 15 s for tighter progress updates. |
Example: enable webhook
X-Webhook-URL: https://your-server.com/webhook/callback
X-Webhook-Mode: terminal
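A minimal Python sketch of assembling the submit headers with the optional webhook headers. The helper name build_submit_headers and its early validation are illustrative assumptions, not part of the API; only the header names, the terminal/sync modes, and the HTTPS requirement come from this page.

```python
from typing import Optional


def build_submit_headers(
    subscription_key: str,
    webhook_url: Optional[str] = None,
    webhook_mode: str = "terminal",
) -> dict:
    """Hypothetical helper: headers for the submit request, with optional webhook."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache",
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    if webhook_url is None:
        return headers  # no webhook: the caller polls the status endpoint instead
    # Plain http:// callback URLs are rejected, so fail before sending anything.
    if not webhook_url.startswith("https://"):
        raise ValueError("X-Webhook-URL must be an https:// URL")
    if webhook_mode not in ("terminal", "sync"):
        raise ValueError("X-Webhook-Mode must be 'terminal' or 'sync'")
    headers["X-Webhook-URL"] = webhook_url
    headers["X-Webhook-Mode"] = webhook_mode
    return headers


headers = build_submit_headers(
    "YOUR_SUBSCRIPTION_KEY",
    webhook_url="https://your-server.com/webhook/callback",
)
```

The returned dict can be passed as the headers of the submit POST in the request examples above; omitting webhook_url yields the plain headers used there.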
Callback Payload
Your endpoint receives a POST request with a JSON (application/json) body that has the same shape as the GET /v2/requests/status/{request_id} response. Example callback in terminal mode:
{
"request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "mmaudio-v2-text-to-audio",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/output.wav"
],
"media_type": "audio/wav"
},
"created_at": "2026-05-22T13:17:32.110Z",
"updated_at": "2026-05-22 13:19:23",
"completed_at": "2026-05-22 13:19:23"
}
Failure callback shape
{
"request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "mmaudio-v2-text-to-audio",
"error": "Description of the error",
"output": null,
"created_at": "...",
"updated_at": "...",
"completed_at": "..."
}
Delivery semantics
- terminal mode (default): exactly one POST when the request reaches a terminal status. No callback is sent during PROCESSING.
- sync mode: a POST on every status poll (with the polling delay capped at about 15 s) plus a final POST at the terminal status. Use this mode when you want progress updates.
- Idempotency: use request_id as your idempotency key. Network retries can deliver the same callback more than once, so your handler must tolerate duplicates.
- Response: respond with 200 OK within a few seconds. The queue does not block on slow handlers, but persistent failures may stop further deliveries.
- HTTPS required: plain http:// URLs are rejected.
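The delivery rules above (duplicates possible, a quick 200 OK, progress events in sync mode) suggest a receiver shaped like the following Python sketch. It uses only the standard library; the function and class names, the in-memory handled_ids set, and the default port are hypothetical choices for illustration, not a documented reference implementation.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ERROR"}
# Assumption: a single process with in-memory state. A real service would
# persist handled request_ids in a database or cache instead.
handled_ids = set()


def handle_callback(payload: dict) -> str:
    """Classify one callback; terminal events are handled at most once per request_id."""
    status = payload.get("status")
    request_id = payload.get("request_id")
    if status not in TERMINAL_STATUSES:
        return "progress"  # sync mode sends intermediate updates
    if request_id in handled_ids:
        return "duplicate"  # redelivery of a callback already handled
    handled_ids.add(request_id)
    return "completed" if status == "COMPLETED" else "failed"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:  # covers JSONDecodeError and undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            self.send_response(400)
            self.end_headers()
            return
        # Keep this step fast; hand long work (downloads etc.) to a queue.
        handle_callback(payload)
        self.send_response(200)  # acknowledge quickly with 200 OK
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Blocking server; terminate TLS in front of it, since only https:// URLs are accepted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Deduplicating only terminal events keeps repeated sync-mode progress updates harmless while ensuring each final result is processed once.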
Request Parameters - MMAudio V2 Text to Audio generate request
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | Yes | — | A detailed text description of the desired audio. Example: "Gentle ocean waves crashing on a sandy beach with seagulls". |
| negative_prompt | string | No | "" | Describes sounds to avoid in the generated audio. Leave empty for no exclusion. |
| num_steps | integer | No | 25 | Number of denoising steps. Higher values improve quality but increase generation time. Range: 10–100. |
| duration | integer | No | 8 | Duration of the generated audio in seconds. Range: 2–30. |
| cfg_strength | number | No | 4.5 | Classifier-Free Guidance strength. Controls how closely the output follows the prompt. Higher values increase prompt adherence. Range: 1.0–10.0. |
| mask_away_clip | boolean | No | false | If true, masks out the beginning and end of the audio to avoid abrupt cuts. Recommended for seamless loops. |
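As a sketch, the documented defaults and ranges can be checked client-side before submitting, which avoids spending a request on an obvious 400. The helper build_payload is hypothetical, the ranges are copied from the table above, and the gateway remains the authority on what it accepts; sending the defaults explicitly should be equivalent to omitting them (as in the Minimum Request), but that is an assumption.

```python
# Defaults and ranges transcribed from the Request Parameters table.
DEFAULTS = {
    "negative_prompt": "",
    "num_steps": 25,
    "duration": 8,
    "cfg_strength": 4.5,
    "mask_away_clip": False,
}
RANGES = {"num_steps": (10, 100), "duration": (2, 30), "cfg_strength": (1.0, 10.0)}


def _is_number(value, kinds) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, kinds) and not isinstance(value, bool)


def build_payload(prompt: str, **overrides) -> dict:
    """Hypothetical helper: fill in documented defaults, then validate locally."""
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {"prompt": prompt, **DEFAULTS, **overrides}
    for name in ("num_steps", "duration"):
        if not _is_number(payload[name], int):
            raise ValueError(f"{name} must be an integer")
    if not _is_number(payload["cfg_strength"], (int, float)):
        raise ValueError("cfg_strength must be a number")
    if not isinstance(payload["mask_away_clip"], bool):
        raise ValueError("mask_away_clip must be a boolean")
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name}={payload[name]} is outside {low}-{high}")
    return payload
```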
Minimum Request
{
"prompt": "Gentle ocean waves crashing on a sandy beach with seagulls"
}
Full Request (all options)
{
"prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
"negative_prompt": "",
"num_steps": 25,
"duration": 8,
"cfg_strength": 4.5,
"mask_away_clip": false
}
Response
{
"request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "QUEUED",
"polling_url": "https://gateway.pixazo.ai/v2/requests/status/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}
Request Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Cache-Control | no-cache |
| Ocp-Apim-Subscription-Key | Your API subscription key |
Response Handling
Common status codes for the MMAudio V2 Text to Audio generate request.
| Code | Meaning |
|---|---|
| 202 | Accepted: request queued |
| 400 | Bad Request |
| 401 | Unauthorized |
| 402 | Insufficient Balance |
| 403 | Forbidden |
| 404 | Not Found |
| 429 | Too Many Requests |
| 500 | Internal Server Error |
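One way to act on these codes, sketched in Python under stated assumptions: 429 and 500 are treated as transient and retried with jittered exponential backoff, while the other codes are raised immediately. The retry set, delays, and function names are hypothetical; send stands in for whatever performs the actual POST and returns (status_code, json_body). Whether the gateway deduplicates a resubmitted request after a 500 is not documented here, so retrying a submit may risk a duplicate job.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: 429 and 500 are transient. The other codes (400, 401, 402,
# 403, 404) need a fix on the caller's side, so retrying will not help.
RETRYABLE = {429, 500}


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter for 0-based retry `attempt`."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def submit_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() -> (status_code, body) until a 202 or a non-retryable code."""
    for attempt in range(max_attempts):
        status_code, body = send()
        if status_code == 202:
            return body  # contains request_id and polling_url
        if status_code not in RETRYABLE or attempt == max_attempts - 1:
            raise RuntimeError(f"submit failed with HTTP {status_code}: {body}")
        sleep(backoff_delay(attempt))
    raise RuntimeError("max_attempts must be at least 1")
```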
Error Responses
Queue system errors and model validation errors.
Queue System Errors
// 402 — Insufficient balance
{
"error": "Insufficient Balance",
"message": "Your wallet does not have enough balance."
}
// 400 — Model not found
{
"error": "Model not found",
"message": "Model 'mmaudio-v2-text-to-audio' not found or is disabled"
}
Error via Status/Webhook
{
"request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "ERROR",
"model_id": "mmaudio-v2-text-to-audio",
"error": "Description of the error",
"output": null
}
Retrieving Results
Poll the universal status endpoint to check progress and retrieve results.
Endpoint
GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY
cURL Example
curl -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
"https://gateway.pixazo.ai/v2/requests/status/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Response (Completed)
{
"request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
"status": "COMPLETED",
"model_id": "mmaudio-v2-text-to-audio",
"error": null,
"output": {
"media_url": [
"https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/mmaudio-v2-text-to-audio_019dxxxx-xxxx/output.ext"
],
"media_type": "application/octet-stream"
},
"created_at": "2026-03-31T10:00:00.000Z",
"updated_at": "2026-03-31T10:00:15.000Z",
"completed_at": "2026-03-31T10:00:15.000Z"
}
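The status call above translates to the following Python sketch, which uses only the standard library (urllib) rather than requests. The helper names are illustrative; the URL pattern and header come from this page. The polling_url returned by the submit call can also be requested directly.

```python
import json
import urllib.parse
import urllib.request

STATUS_BASE = "https://gateway.pixazo.ai/v2/requests/status/"


def status_url(request_id: str) -> str:
    # Percent-encode defensively; the documented ids are already URL-safe.
    return STATUS_BASE + urllib.parse.quote(request_id, safe="")


def get_status(request_id: str, subscription_key: str) -> dict:
    """GET the universal status endpoint and return the decoded JSON body."""
    req = urllib.request.Request(
        status_url(request_id),
        headers={"Ocp-Apim-Subscription-Key": subscription_key},
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)
```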
Response Fields
| Field | Type | Description |
|---|---|---|
| request_id | string | Unique request identifier |
| status | string | QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR |
| model_id | string | Model that processed the request |
| error | string or null | Error message if the request failed |
| output.media_url | array | URLs of the generated media (R2 CDN) |
| output.media_type | string | MIME type of the output |
| created_at | string | When the request was created |
| updated_at | string | When the request was last updated |
| completed_at | string or null | When the request completed |
| polling_url | string | Status URL (initial response only) |
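A sketch of mapping a status or callback body onto a typed Python object. The class and function names are hypothetical. Timestamps are kept as strings because the examples on this page mix ISO 8601 (2026-05-22T13:17:32.110Z) and space-separated (2026-05-22 13:19:23) formats.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StatusResult:
    """Typed view of a status or callback body (field names from the table above)."""
    request_id: str
    status: str
    model_id: Optional[str] = None
    error: Optional[str] = None
    media_urls: List[str] = field(default_factory=list)
    media_type: Optional[str] = None
    created_at: Optional[str] = None
    completed_at: Optional[str] = None


def parse_status(body: dict) -> StatusResult:
    output = body.get("output") or {}  # output is null on failure
    return StatusResult(
        request_id=body["request_id"],
        status=body["status"],
        model_id=body.get("model_id"),
        error=body.get("error"),
        media_urls=list(output.get("media_url") or []),
        media_type=output.get("media_type"),
        created_at=body.get("created_at"),
        completed_at=body.get("completed_at"),
    )
```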
Status Values
| Status | Description |
|---|---|
| QUEUED | Request accepted, waiting to be processed |
| PROCESSING | Being processed by the model |
| COMPLETED | Done — output contains the result |
| FAILED | Failed — check error field |
| ERROR | System error — not charged |
Status Flow
QUEUED → PROCESSING → COMPLETED, FAILED, or ERROR
Typical Workflow
- Send a generate request to the API endpoint.
- Save the request_id from the response.
- Poll GET /v2/requests/status/{request_id} every 5-10 seconds.
- When status is "COMPLETED", download the result from output.media_url.
Tip: Use the X-Webhook-URL header to get a callback instead of polling.
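The polling steps can be sketched in Python as a loop around an injected fetch_status callable (hypothetical; it should return the decoded status JSON). The 5-second interval follows the 5-10 second guidance above; the 600-second timeout is an arbitrary assumption. sleep and clock are parameters only so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_result(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until COMPLETED; raise on FAILED/ERROR or timeout."""
    deadline = clock() + timeout
    while True:
        body = fetch_status()
        status = body.get("status")
        if status == "COMPLETED":
            return body  # output.media_url holds the download links
        if status in ("FAILED", "ERROR"):
            raise RuntimeError(f"request ended with {status}: {body.get('error')}")
        # QUEUED or PROCESSING: wait, unless the next poll would pass the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"no terminal status after {timeout} s")
        sleep(interval)
```

In practice, fetch_status could be lambda: requests.get(polling_url, headers={"Ocp-Apim-Subscription-Key": key}).json(), reusing the polling_url from the submit response.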