Audio API Documentation
This guide covers the full technical flow: models, voices, languages, request formats, sync/async/stream examples, SSE events, signed webhooks, and compatibility notes.
Get started in 5 minutes
1. Create an API key
From /settings/api.
2. Test sync
Run a minimal request and validate the JSON response.
3. Scale to stream/async
Real-time SSE or asynchronous jobs with a signed webhook.
curl -X POST "https://studio.dropabyte.com/v1/tts" \
-H "Content-Type: application/json" \
-H "x-api-key: sk_live_..." \
-d '{
"text": "Hello, this is a synchronous synthesis request.",
"modelId": "google_gemini_2_5_flash_tts",
"voiceName": "Zephyr",
"languageCode": "en-US",
"audioEncoding": "OGG_OPUS",
"speakingRate": 1.0
}'
POST /api/v1/tts
POST /api/v1/tts/stream
POST /api/v1/audio/generate-async
GET /api/v1/audio/status?jobId=:jobId
GET /api/v1/audio/download?jobId=:jobId
x-api-key: sk_live_...
Authorization: Bearer sk_live_...
Diagnostic sandbox: admin panel only.
Webhook with HMAC SHA-256 signature.
Safe comparison with timingSafeEqual.
Timestamp tolerance control.
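The endpoint list and the two authentication header styles above can be combined into a small request builder. This is a minimal sketch, not the official client: the helper names (`authHeaders`, `preparePost`, `prepareJobGet`) are hypothetical, and the base URL is passed in by the caller. Note that the curl example uses the path /v1/tts while the endpoint list shows /api/v1/tts; the sketch takes whatever path it is given, so confirm which one applies to your deployment.

```typescript
// Hypothetical helper names; only the paths and header names come from the lists above.
type AuthStyle = "x-api-key" | "bearer";

interface PreparedRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Build the auth header in either of the two documented styles.
function authHeaders(apiKey: string, style: AuthStyle = "x-api-key"): Record<string, string> {
  if (style === "bearer") return { Authorization: `Bearer ${apiKey}` };
  return { "x-api-key": apiKey };
}

// Prepare a JSON POST (sync, stream, or async generation).
function preparePost(
  baseUrl: string,
  path: string,
  apiKey: string,
  payload: unknown,
  style?: AuthStyle,
): PreparedRequest {
  return {
    url: new URL(path, baseUrl).toString(),
    method: "POST",
    headers: { "Content-Type": "application/json", ...authHeaders(apiKey, style) },
    body: JSON.stringify(payload),
  };
}

// Prepare a status or download GET for an async job.
function prepareJobGet(
  baseUrl: string,
  kind: "status" | "download",
  jobId: string,
  apiKey: string,
): PreparedRequest {
  const url = new URL(`/api/v1/audio/${kind}`, baseUrl);
  url.searchParams.set("jobId", jobId);
  return { url: url.toString(), method: "GET", headers: authHeaders(apiKey) };
}
```

A prepared request can then be sent with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Keeping construction separate from sending makes the URL and headers easy to inspect before any network call.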
Available models
Internally mapped to compatible providers (Cloud TTS by default; Vertex optional by feature flag).
| Model | API ID | Category | Multi-speaker | Latency | API cost / 100 chars |
|---|---|---|---|---|---|
| Gemini 3.1 Pro | google_gemini_3_1_pro_tts | Gemini | Yes | < 500ms | 15 |
| Gemini 3.1 Flash | google_gemini_3_1_flash_tts | Gemini | Yes | < 500ms | 8 |
| Gemini 3.1 Lite (Preview) | google_gemini_3_1_lite_preview_tts | Gemini | Yes | < 300ms | 6 |
| Gemini 2.5 Pro | google_gemini_2_5_pro_tts | Gemini | Yes | < 500ms | 15 |
| Gemini 2.5 Flash | google_gemini_2_5_flash_tts | Gemini | Yes | < 500ms | 8 |
| Gemini 2.5 Flash Lite (Preview) | google_gemini_2_5_flash_lite_tts | Gemini | Yes | < 300ms | 6 |
| Chirp 3 HD | chirp-3-hd | Chirp | No | < 150ms | 12 |
| Neural2 Standard | neural2-standard | Neural | No | 300-500ms | 10 |
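The "API cost / 100 chars" column can be turned into a rough per-request estimate. The sketch below copies the rates from the table. It assumes cost scales linearly with character count and that characters are counted as JavaScript string length. The table states neither the billing unit nor any rounding rule, so treat the result as an approximation. `estimateCost` is a hypothetical helper name.

```typescript
// Per-100-character rates copied from the table above; the unit is not stated there.
const COST_PER_100_CHARS: Record<string, number> = {
  google_gemini_3_1_pro_tts: 15,
  google_gemini_3_1_flash_tts: 8,
  google_gemini_3_1_lite_preview_tts: 6,
  google_gemini_2_5_pro_tts: 15,
  google_gemini_2_5_flash_tts: 8,
  google_gemini_2_5_flash_lite_tts: 6,
  "chirp-3-hd": 12,
  "neural2-standard": 10,
};

// Assumption: linear scaling with text.length, no rounding up to 100-char blocks.
function estimateCost(modelId: string, text: string): number {
  const rate = COST_PER_100_CHARS[modelId];
  if (rate === undefined) throw new Error(`Unknown modelId: ${modelId}`);
  return (text.length / 100) * rate;
}
```

For example, a 250-character text on `google_gemini_2_5_flash_tts` comes to 2.5 × 8 = 20 under these assumptions.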
Official Google TTS families
Reference guide based on official Google docs: Standard, WaveNet, Neural2, Studio, and Chirp 3 HD.
| Family | Recommended use case | Control | Multi-speaker | Name example |
|---|---|---|---|---|
| Standard | Cost-efficient and general-purpose scenarios | SSML | No | en-US-Standard-A |
| WaveNet | Better naturalness for premium general usage | SSML | No | en-US-Wavenet-D |
| Neural2 | Modern neural voices for production apps | SSML | No | en-US-Neural2-F |
| Studio | Narration, news, media, and long-form content | SSML (with some restrictions) | Single + Studio multi-speaker variants | en-US-Studio-O |
| Chirp 3 HD | Low-latency natural conversation | Advanced controls; no full SSML support | Depends on flow/model | en-US-Chirp3-HD-Zephyr |
Gemini TTS and 2.5 variants
Reference for Gemini audio models. In Google TTS, gemini-2.5-flash-tts and gemini-2.5-pro-tts are officially listed.
| Model | ID / Catalog | Best for | I/O | Multi-speaker | Formats |
|---|---|---|---|---|---|
| Gemini 2.5 Flash TTS | gemini-2.5-flash-tts | Low latency, high volume, better cost-performance | Text -> Audio | Yes (single and multi-speaker) | LINEAR16, ALAW, MULAW, MP3, OGG_OPUS, PCM |
| Gemini 2.5 Pro TTS | gemini-2.5-pro-tts | Higher expressiveness and style control | Text -> Audio | Yes (single and multi-speaker) | LINEAR16, ALAW, MULAW, MP3, OGG_OPUS, PCM |
| Gemini 2.5 Flash-Lite (Vertex catalog) | Gemini 2.5 Flash-Lite | Massive inference, very cost/latency-sensitive usage | General Gemini model in Vertex | Verify TTS availability by region/project | Depends on active endpoint/release |
Note: Gemini 2.5 Flash-Lite appears in the general Vertex AI Gemini catalog. Exact TTS availability may vary by endpoint, region, and release stage.
Validate effective availability with project-level tests and official release notes.
For full up-to-date inventory by region/language, use GET https://texttospeech.googleapis.com/v1/voices.
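The voices endpoint returns a JSON object with a `voices` array whose entries carry a `name` and a `languageCodes` list. As a sketch, those entries can be grouped by family using the naming pattern seen in the examples above (for instance en-US-Chirp3-HD-Zephyr). The family extraction is a heuristic inferred from those name examples, not a documented rule, and `voiceFamily` and `groupByFamily` are hypothetical helper names.

```typescript
// Shape of one entry in the voices list response (fields used here only).
interface Voice {
  name: string;
  languageCodes: string[];
  ssmlGender?: string;
  naturalSampleRateHertz?: number;
}

// Heuristic (assumption): strip the language prefix and the trailing voice
// identifier, and treat whatever remains in between as the family.
function voiceFamily(voice: Voice): string {
  const lang = voice.languageCodes[0] ?? "";
  const rest =
    lang && voice.name.startsWith(`${lang}-`)
      ? voice.name.slice(lang.length + 1)
      : voice.name;
  const parts = rest.split("-");
  return parts.length > 1 ? parts.slice(0, -1).join("-") : "unknown";
}

// Group voice names by the family derived above.
function groupByFamily(voices: Voice[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const v of voices) {
    const family = voiceFamily(v);
    groups.set(family, [...(groups.get(family) ?? []), v.name]);
  }
  return groups;
}
```

Names that do not follow the pattern fall into the "unknown" group rather than being guessed at.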
Languages
Common languages enabled in the app (BCP-47). Google Cloud supports a wider catalog by voice type.
Official real-time voices/languages lookup:
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://texttospeech.googleapis.com/v1/voices"
Request formats
curl -X POST "https://studio.dropabyte.com/v1/tts" \
-H "Content-Type: application/json" \
-H "x-api-key: sk_live_..." \
-d '{
"text": "Hello, this is a synchronous synthesis request.",
"modelId": "google_gemini_2_5_flash_tts",
"voiceName": "Zephyr",
"languageCode": "en-US",
"audioEncoding": "OGG_OPUS",
"speakingRate": 1.0
}'
Supported parameters
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Conditional | Plain text to synthesize (use text or turns). |
| turns | array<{speaker,text}> | Conditional | Structured multi-speaker dialogue. |
| speakers | array<{speaker,voiceName}> | No | Voice mapping by speaker in dialogue mode. |
| modelId | string | Yes | TTS model ID. |
| voiceName | string | No | Primary voice when not using multi-speaker. |
| languageCode | string | No | BCP-47 code such as en-US or es-ES. |
| audioEncoding | MP3 \| OGG_OPUS \| LINEAR16 \| PCM \| WAV | No | Audio output format. |
| sampleRateHertz | number | No | Target sample rate in Hz. |
| speakingRate | number | No | Speech rate. |
| pitch | number | No | Pitch adjustment. |
| volumeGainDb | number | No | Volume gain in dB. |
| editing.normalize | boolean | No (async) | Normalize post-processed volume. |
| editing.trimSilence | boolean | No (async) | Trim leading/trailing silence. |
| editing.backgroundAudioUrl | url | No (async) | Background layer for FFmpeg mix. |
| webhookUrl | url | No (async) | Callback URL after job completion. |
| webhookSecret | string | No (async) | Secret for HMAC SHA-256 signature. |
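The parameter table can be mirrored as a type with a light client-side check before sending. This is a sketch under stated assumptions: the table says to use text or turns, and the check treats sending both as an error, which the API itself may or may not do. The check that every speaker in `turns` has a voice mapping is also an assumption. The async-only fields (`editing.*`, `webhookUrl`, `webhookSecret`) are left out for brevity, and `validateTtsRequest` is a hypothetical name.

```typescript
interface Turn { speaker: string; text: string }
interface SpeakerVoice { speaker: string; voiceName: string }

// Fields and types follow the table above (async-only fields omitted).
interface TtsRequest {
  text?: string;
  turns?: Turn[];
  speakers?: SpeakerVoice[];
  modelId: string;
  voiceName?: string;
  languageCode?: string;
  audioEncoding?: "MP3" | "OGG_OPUS" | "LINEAR16" | "PCM" | "WAV";
  sampleRateHertz?: number;
  speakingRate?: number;
  pitch?: number;
  volumeGainDb?: number;
}

// Returns a list of problems; an empty list means the body passes these local checks.
function validateTtsRequest(req: Partial<TtsRequest>): string[] {
  const errors: string[] = [];
  if (!req.modelId) errors.push("modelId is required");
  const hasText = typeof req.text === "string" && req.text.length > 0;
  const hasTurns = Array.isArray(req.turns) && req.turns.length > 0;
  // Assumption: exactly one of text/turns should be present.
  if (hasText === hasTurns) errors.push("provide either text or turns");
  // Assumption: when speakers is sent, each speaker used in turns needs a mapping.
  if (hasTurns && req.speakers) {
    const mapped = new Set(req.speakers.map((s) => s.speaker));
    for (const t of req.turns ?? []) {
      if (!mapped.has(t.speaker)) errors.push(`no voice mapped for speaker "${t.speaker}"`);
    }
  }
  return errors;
}
```

Local validation only catches shape problems early; the server remains the authority on which combinations are accepted.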
Audio encodings
| audioEncoding | Typical MIME type | Notes |
|---|---|---|
| MP3 | audio/mpeg | Small size, ideal for web/mobile delivery. |
| OGG_OPUS | audio/ogg | Very efficient for voice and streaming. |
| LINEAR16 | audio/wav | PCM with a WAV header in sync outputs. |
| PCM | audio/wav | Uncompressed, useful for DSP pipelines. |
| WAV | audio/wav | Direct WAV output for compatibility. |
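When saving or serving the returned audio, the encoding table can be expressed as a lookup. The MIME types below are copied from the table. The file extensions are an assumption added for illustration, and `outputFileName` is a hypothetical helper.

```typescript
type AudioEncoding = "MP3" | "OGG_OPUS" | "LINEAR16" | "PCM" | "WAV";

// MIME types from the table above; file extensions are assumed, not documented.
const ENCODING_INFO: Record<AudioEncoding, { mime: string; ext: string }> = {
  MP3: { mime: "audio/mpeg", ext: "mp3" },
  OGG_OPUS: { mime: "audio/ogg", ext: "ogg" },
  LINEAR16: { mime: "audio/wav", ext: "wav" },
  PCM: { mime: "audio/wav", ext: "wav" },
  WAV: { mime: "audio/wav", ext: "wav" },
};

// Derive a file name for a synthesized clip from the requested encoding.
function outputFileName(baseName: string, encoding: AudioEncoding): string {
  return `${baseName}.${ENCODING_INFO[encoding].ext}`;
}
```

Typing the map as `Record<AudioEncoding, ...>` makes the compiler flag any encoding that is added to the union but not to the lookup.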
Responses and events
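The stream endpoint (POST /api/v1/tts/stream) delivers Server-Sent Events. The event names and payload shapes are not listed in this section, so the sketch below handles only the generic SSE framing: `event:` and `data:` fields, with a blank line ending each event. `parseSse` is a hypothetical helper, and any event names you pass through it in testing are placeholders rather than the API's documented events.

```typescript
interface SseEvent { event: string; data: string }

// Parse complete SSE frames out of a text buffer. Returns the parsed events
// plus the unparsed remainder (an incomplete trailing frame to keep buffering).
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? "";
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // default event type in the SSE format
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Frames without any data field are not dispatched.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```

In a streaming loop, append each decoded chunk to the previous `rest`, call `parseSse` again, and carry the new `rest` forward so that frames split across chunks are reassembled.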
Errors and HTTP statuses
Signed webhooks (HMAC)
When a request includes webhookSecret (or you configure WEBHOOK_SIGNING_SECRET), signature headers are attached:
x-dropabyte-signature: hex HMAC SHA-256
x-dropabyte-timestamp: unix ms
x-dropabyte-signature-alg: hmac-sha256
import crypto from "crypto";

// Verifies a signed webhook: rejects stale timestamps, then compares the
// received signature against HMAC SHA-256 over "<timestamp>.<rawBody>".
export function verifyDropabyteWebhook({
  rawBody,
  signature,
  timestamp,
  secret,
  toleranceMs = 5 * 60 * 1000, // default tolerance: 5 minutes
}: {
  rawBody: string;
  signature: string;
  timestamp: string;
  secret: string;
  toleranceMs?: number;
}) {
  const now = Date.now();
  const ts = Number(timestamp); // x-dropabyte-timestamp is in unix milliseconds
  if (!Number.isFinite(ts)) return false;
  // Reject deliveries outside the tolerance window (limits replay attacks).
  if (Math.abs(now - ts) > toleranceMs) return false;
  const payloadToSign = `${timestamp}.${rawBody}`;
  const expected = crypto.createHmac("sha256", secret).update(payloadToSign).digest("hex");
  const expectedBuf = Buffer.from(expected, "utf8");
  const receivedBuf = Buffer.from(signature || "", "utf8");
  // timingSafeEqual throws on unequal lengths, so check length first.
  if (expectedBuf.length !== receivedBuf.length) return false;
  return crypto.timingSafeEqual(expectedBuf, receivedBuf);
}
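To exercise a verifier locally, a test signature can be produced the same way the function above derives its expected value: HMAC SHA-256 over `<timestamp>.<rawBody>`, hex-encoded. The sketch below is self-contained, so it restates the comparison in a compact `matches` helper. `signWebhook`, `matches`, the secret value, and the sample payload fields are all hypothetical and not taken from the API.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute a signature over "<timestamp>.<rawBody>" as hex HMAC SHA-256.
function signWebhook(secret: string, timestamp: string, rawBody: string): string {
  return createHmac("sha256", secret).update(`${timestamp}.${rawBody}`).digest("hex");
}

// Compact restatement of the constant-time comparison so this sketch runs on its own.
function matches(secret: string, timestamp: string, rawBody: string, signature: string): boolean {
  const expected = Buffer.from(signWebhook(secret, timestamp, rawBody), "utf8");
  const received = Buffer.from(signature, "utf8");
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Hypothetical payload and secret, for local testing only.
const sampleTimestamp = String(Date.now());
const sampleBody = JSON.stringify({ jobId: "job_123", status: "completed" });
const sampleSignature = signWebhook("whsec_test", sampleTimestamp, sampleBody);
```

In a real handler, verify against the raw request body exactly as received. Parsing the JSON and serializing it again can change whitespace or key order and make a valid signature fail.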