MMAudio V2 API - AI Audio Generation API: Pricing, Documentation

MMAudio V2 API is an audio synthesis interface that generates sound effects and soundtracks from text descriptions or synchronized to video content. It uses temporal alignment to match the generated audio to visual motion, which makes it useful for creators who want to automate sound design. Features include prompt-based audio generation, negative prompting for finer control, and adjustable sample rates.

MMAudio v2 API Documentation

https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request

Authentication

All requests require an API key passed via header.

Header	Type	Required	Description
Ocp-Apim-Subscription-Key	string	Yes	Your API subscription key

MMAudio V2 Text to Audio generate request

Request Code

POST https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request
Content-Type: application/json
Cache-Control: no-cache
Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY

{
  "prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
  "negative_prompt": "",
  "num_steps": 25,
  "duration": 8,
  "cfg_strength": 4.5,
  "mask_away_clip": false
}

import requests

url = "https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request"
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
    "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
}
data = {
    "prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
    "negative_prompt": "",
    "num_steps": 25,
    "duration": 8,
    "cfg_strength": 4.5,
    "mask_away_clip": False
}

response = requests.post(url, json=data, headers=headers)
print(response.json())

const url = "https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request";
const headers = {
  "Content-Type": "application/json",
  "Cache-Control": "no-cache",
  "Ocp-Apim-Subscription-Key": "YOUR_SUBSCRIPTION_KEY"
};
const data = {
  "prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
  "negative_prompt": "",
  "num_steps": 25,
  "duration": 8,
  "cfg_strength": 4.5,
  "mask_away_clip": false
};

fetch(url, {
  method: "POST",
  headers: headers,
  body: JSON.stringify(data)
})
.then(response => response.json())
.then(data => console.log(data))
.catch(error => console.error("Error:", error));

curl -X POST "https://gateway.pixazo.ai/mmaudio-v2-text-to-audio/v1/mmaudio-v2-text-to-audio-request" \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -H "Ocp-Apim-Subscription-Key: YOUR_SUBSCRIPTION_KEY" \
  --data-raw '{
    "prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
    "negative_prompt": "",
    "num_steps": 25,
    "duration": 8,
    "cfg_strength": 4.5,
    "mask_away_clip": false
  }'

Output

{
  "request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Webhook (Optional)

Add the X-Webhook-URL header to your submit request to receive a POST callback when the job completes — no polling required.

Using curl? These are HTTP request headers — pass each with -H, e.g. -H "X-Webhook-URL: https://your-server.com/webhook/callback". Do not paste them as bare lines, and end every line of a multi-line command with \.

Webhook Headers

Header	Required	Default	Description
`X-Webhook-URL`	Yes (to enable)	—	HTTPS endpoint on your server that will receive the `POST` callback. Must respond `2xx` within a few seconds (process async if needed).
`X-Webhook-Mode`	No	`terminal`	`terminal` — fires once at the final status (`COMPLETED`/`FAILED`/`ERROR`). `sync` — fires on every poll cycle plus the terminal event, and caps the queue’s polling delay at 15s for tighter progress updates.

Example: enable webhook

X-Webhook-URL: https://your-server.com/webhook/callback
X-Webhook-Mode: terminal
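
The webhook headers can also be assembled in code. The following is a minimal sketch, not an official client: `build_submit_headers` and `ALLOWED_WEBHOOK_MODES` are hypothetical names, and the client-side checks simply mirror the header table above (an HTTPS URL and a mode of `terminal` or `sync`).

```python
from urllib.parse import urlparse

ALLOWED_WEBHOOK_MODES = {"terminal", "sync"}


def build_submit_headers(api_key, webhook_url=None, webhook_mode="terminal"):
    """Return headers for a submit request, optionally enabling a webhook."""
    headers = {
        "Content-Type": "application/json",
        "Cache-Control": "no-cache",
        "Ocp-Apim-Subscription-Key": api_key,
    }
    if webhook_url is None:
        return headers  # polling only, no callback headers

    parsed = urlparse(webhook_url)
    # The callback endpoint must be HTTPS, so fail early on the client side.
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError("webhook_url must be an absolute https:// URL")
    if webhook_mode not in ALLOWED_WEBHOOK_MODES:
        raise ValueError(
            f"webhook_mode must be one of {sorted(ALLOWED_WEBHOOK_MODES)}"
        )

    headers["X-Webhook-URL"] = webhook_url
    headers["X-Webhook-Mode"] = webhook_mode
    return headers
```

The returned dictionary can be passed as the `headers` argument of `requests.post` in the Python request example.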

Callback Payload

Your endpoint receives a POST application/json with the same shape as the GET /v2/requests/status/{request_id} response. Example terminal callback (mode terminal):

{
  "request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "mmaudio-v2-text-to-audio",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/output.wav"
    ],
    "media_type": "audio/wav"
  },
  "created_at": "2026-05-22T13:17:32.110Z",
  "updated_at": "2026-05-22 13:19:23",
  "completed_at": "2026-05-22 13:19:23"
}

Failure callback shape

{
  "request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "mmaudio-v2-text-to-audio",
  "error": "Description of the error",
  "output": null,
  "created_at": "...",
  "updated_at": "...",
  "completed_at": "..."
}

Delivery semantics

terminal mode (default) — exactly one POST when the request reaches a terminal status. No callback during PROCESSING.
sync mode — POST on every status poll (with delay capped at ~15s) plus a final POST at terminal status. Use when you want progress updates.
Idempotency — use request_id as your idempotency key. Network retries can deliver the same callback more than once; your handler must tolerate duplicates.
Response — respond 200 OK within a few seconds. The queue does not block on slow handlers, but persistent failures may stop further deliveries.
HTTPS required — plain http:// URLs are rejected.
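
A receiving endpoint that follows these rules might look like the sketch below. It uses only the Python standard library. `handle_callback`, `WebhookHandler`, and `serve` are hypothetical names, the in-memory set stands in for a durable store, and TLS termination is assumed to happen in front of this process.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ERROR"}
_processed_ids = set()  # in-memory only; a real service needs a durable store


def handle_callback(payload, processed=_processed_ids):
    """Return 'progress', 'duplicate', or 'processed' for one callback."""
    request_id = payload["request_id"]
    if payload["status"] not in TERMINAL_STATUSES:
        return "progress"   # sync-mode update, nothing final yet
    if request_id in processed:
        return "duplicate"  # retried delivery that was already handled
    processed.add(request_id)
    return "processed"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            outcome = handle_callback(payload)
        except (ValueError, KeyError, TypeError):
            self.send_response(400)  # malformed or unexpected body
            self.end_headers()
            return
        # Acknowledge quickly; downloads and other slow work belong elsewhere.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(outcome.encode())


def serve(port=8080):
    # Hypothetical local entry point, assumed to sit behind an HTTPS proxy.
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Keying the duplicate check on `request_id` only for terminal statuses lets `sync` mode progress callbacks pass through without being mistaken for retries.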

Request Parameters - MMAudio V2 Text to Audio generate request

Parameter	Required	Type	Default	Allowed values / range	Description
prompt	Yes	string	—	—	A detailed text description of the desired audio. Example: "Gentle ocean waves crashing on a sandy beach with seagulls".
negative_prompt	No	string	""	—	Describes sounds to exclude from the generated audio. Leave empty for no exclusion.
seed	No	integer	—	0–65535	Seed for the random number generator. Use the same seed with the same prompt to reproduce a previous result. Omit for a random seed on every request.
num_steps	No	integer	25	4–50	Number of refinement steps the model runs while generating. Higher values refine detail and quality but increase processing time; lower values are faster.
duration	No	number	8	1–30	Duration of the generated audio in seconds. Accepts fractional values (for example 7.5).
cfg_strength	No	number	4.5	0–20	Strength of Classifier Free Guidance. Controls how closely the output follows your prompt: higher values stick more strictly to the prompt, lower values allow more variation.
mask_away_clip	No	boolean	false	true, false	Advanced. When true, masks away the CLIP conditioning embedding so the model relies less on it. Leave as false for normal text-to-audio generation.
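
The ranges in the table can be checked on the client before a request is sent. This is a rough sketch under stated assumptions: `validate_request` is a hypothetical helper, the bounds are treated as inclusive, and an empty prompt is treated as invalid. The API itself may apply different or additional rules.

```python
def validate_request(params):
    """Return a list of problems found in a text-to-audio request body."""
    problems = []
    prompt = params.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        problems.append("prompt is required and must be a non-empty string")

    # (name, accepted types, min, max), taken from the parameter table.
    numeric_rules = [
        ("seed", (int,), 0, 65535),
        ("num_steps", (int,), 4, 50),
        ("duration", (int, float), 1, 30),
        ("cfg_strength", (int, float), 0, 20),
    ]
    for name, types, low, high in numeric_rules:
        if name not in params:
            continue  # optional parameter, server default applies
        value = params[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"{name} has the wrong type")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    if "negative_prompt" in params and not isinstance(params["negative_prompt"], str):
        problems.append("negative_prompt must be a string")
    if "mask_away_clip" in params and not isinstance(params["mask_away_clip"], bool):
        problems.append("mask_away_clip must be a boolean")
    return problems
```

An empty list means the body passed every local check; a non-empty list can be surfaced to the caller without spending a request.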

Minimum Request

{
  "prompt": "Gentle ocean waves crashing on a sandy beach with seagulls"
}

Full Request (all options)

{
  "prompt": "Gentle ocean waves crashing on a sandy beach with seagulls",
  "negative_prompt": "",
  "num_steps": 25,
  "duration": 8,
  "cfg_strength": 4.5,
  "mask_away_clip": false
}

Response

{
  "request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "QUEUED",
  "polling_url": "https://gateway.pixazo.ai/v2/requests/status/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Request Headers

Header	Value
Content-Type	application/json
Cache-Control	no-cache
Ocp-Apim-Subscription-Key	Your API subscription key

Response Handling

Common status codes for MMAudio V2 Text to Audio generate request.

Code	Meaning
202	Accepted — Request queued
400	Bad Request
401	Unauthorized
402	Payment Required (insufficient wallet balance)
403	Forbidden
404	Not Found
429	Too Many Requests
500	Internal Server Error

Error Responses

Queue system errors and model validation errors.

Queue System Errors

// 402 — Insufficient balance
{
  "error": "Insufficient Balance",
  "message": "Your wallet does not have enough balance."
}

// 400 — Model not found
{
  "error": "Model not found",
  "message": "Model 'mmaudio-v2-text-to-audio' not found or is disabled"
}

Error via Status/Webhook

{
  "request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "ERROR",
  "model_id": "mmaudio-v2-text-to-audio",
  "error": "Description of the error",
  "output": null
}
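
One way to act on these responses is to map the HTTP status code and JSON body of a submit call to a small set of outcomes. The sketch below is illustrative only: `classify_submit_response` and the outcome labels are hypothetical, and treating 429 and 500 as retryable is an assumption rather than documented behavior.

```python
RETRYABLE_CODES = {429, 500}  # assumption: worth retrying after a backoff


def classify_submit_response(status_code, body):
    """Map an HTTP status code and parsed JSON body to (outcome, detail)."""
    if status_code == 202:
        return "queued", body.get("request_id")
    # Error bodies shown above carry "error" and "message"; prefer the message.
    message = body.get("message") or body.get("error") or "no detail provided"
    if status_code == 402:
        return "insufficient_balance", message
    if status_code in (401, 403):
        return "auth_error", message
    if status_code in RETRYABLE_CODES:
        return "retry_later", message
    return "rejected", message
```

With the `requests` example, this could be called as `classify_submit_response(response.status_code, response.json())`.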

Retrieving Results

Poll the universal status endpoint to check progress and retrieve results.

Endpoint

GET https://gateway.pixazo.ai/v2/requests/status/{request_id}
Ocp-Apim-Subscription-Key: YOUR_API_KEY

cURL Example

curl -H "Ocp-Apim-Subscription-Key: YOUR_API_KEY" \
  "https://gateway.pixazo.ai/v2/requests/status/mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

Response (Completed)

{
  "request_id": "mmaudio-v2-text-to-audio_019dxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
  "status": "COMPLETED",
  "model_id": "mmaudio-v2-text-to-audio",
  "error": null,
  "output": {
    "media_url": [
      "https://pub-582b7213209642b9b995c96c95a30381.r2.dev/v1/mmaudio-v2-text-to-audio_019dxxxx-xxxx/output.ext"
    ],
    "media_type": "application/octet-stream"
  },
  "created_at": "2026-03-31T10:00:00.000Z",
  "updated_at": "2026-03-31T10:00:15.000Z",
  "completed_at": "2026-03-31T10:00:15.000Z"
}

Response Fields

Field	Type	Description
request_id	string	Unique request identifier
status	string	QUEUED, PROCESSING, COMPLETED, FAILED, or ERROR
model_id	string	Model that processed the request
error	string|null	Error message if failed
output.media_url	array	URLs to generated media (R2 CDN)
output.media_type	string	MIME type of the output
created_at	string	When request was created
updated_at	string	When request was last updated
completed_at	string|null	When request completed
polling_url	string	Status URL (initial response only)
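
These fields can be loaded into a typed object for easier handling. The sketch below assumes the payload shape shown in the examples; `StatusResponse` and its attribute names are hypothetical and not part of the API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StatusResponse:
    request_id: str
    status: str
    model_id: Optional[str] = None
    error: Optional[str] = None
    media_urls: List[str] = field(default_factory=list)
    media_type: Optional[str] = None
    completed_at: Optional[str] = None

    @classmethod
    def from_dict(cls, data):
        # "output" is null on failures, so fall back to an empty dict.
        output = data.get("output") or {}
        return cls(
            request_id=data["request_id"],
            status=data["status"],
            model_id=data.get("model_id"),
            error=data.get("error"),
            media_urls=list(output.get("media_url") or []),
            media_type=output.get("media_type"),
            completed_at=data.get("completed_at"),
        )
```

Because missing or null `output` collapses to an empty `media_urls` list, the same parser works for the initial `QUEUED` response, the completed response, and the error shape.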

Status Values

Status	Description
QUEUED	Request accepted, waiting to be processed
PROCESSING	Being processed by the model
COMPLETED	Done — output contains the result
FAILED	Failed — check error field
ERROR	System error — not charged

Status Flow

QUEUED → PROCESSING → COMPLETED
                    → FAILED
                    → ERROR

Typical Workflow

1. Send a generate request to the API endpoint
2. Save the request_id from the response
3. Poll every 5-10 seconds: GET /v2/requests/status/{request_id}
4. When status is "COMPLETED", download from output.media_url

Tip: Use X-Webhook-URL header to get a callback instead of polling.
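
The polling part of this workflow can be sketched as a small loop. The code below uses only the standard library; `fetch_status` and `wait_for_result` are hypothetical helper names, and the 5-second interval and 300-second timeout are assumed defaults chosen for illustration. The `fetch` and `sleep` arguments exist so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

STATUS_URL = "https://gateway.pixazo.ai/v2/requests/status/{request_id}"
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ERROR"}


def fetch_status(request_id, api_key):
    """GET the status endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        STATUS_URL.format(request_id=request_id),
        headers={"Ocp-Apim-Subscription-Key": api_key},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_result(request_id, api_key, interval=5.0, timeout=300.0,
                    fetch=fetch_status, sleep=time.sleep):
    """Poll until a terminal status; return the media URLs or raise."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch(request_id, api_key)
        status = body["status"]
        if status == "COMPLETED":
            return body["output"]["media_url"]
        if status in TERMINAL_STATUSES:
            # FAILED or ERROR: surface the error field to the caller.
            raise RuntimeError(f"{status}: {body.get('error')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{request_id} still {status} after {timeout}s")
        sleep(interval)
```

A call such as `wait_for_result(request_id, "YOUR_SUBSCRIPTION_KEY")` blocks until the job finishes, then returns the list from `output.media_url`.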

MMAudio v2 API Pricing

Your request will cost $0.001 per second of generated audio.

30-second clip ≈ $0.03
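
The per-second rate makes cost estimates a single multiplication. The sketch below uses `Decimal` to avoid floating-point rounding; `estimate_cost` is a hypothetical helper, and whether billing rounds fractional seconds is not stated here, so the result should be read as an estimate.

```python
from decimal import Decimal

PRICE_PER_SECOND = Decimal("0.001")  # USD per second, from the pricing above


def estimate_cost(duration_seconds):
    """Estimated USD cost for one clip of the given duration in seconds."""
    # Convert via str so a float such as 7.5 becomes an exact Decimal.
    return PRICE_PER_SECOND * Decimal(str(duration_seconds))


print(estimate_cost(30))   # 0.030
print(estimate_cost(7.5))  # 0.0075
```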