Audio

BVE Gateway exposes two audio endpoints: text-to-speech (TTS) and audio transcription (Whisper and GPT-4o-based models).

Text-to-Speech

POST https://api.bve.me/v1/audio/speech

Requires Authorization: Bearer sk-bve-YOUR_KEY.

Converts text to audio. Returns a binary audio stream (audio/mpeg for the default mp3 format).

Request body

{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy"
}

Field	Type	Required	Description
`model`	string	Yes	`tts-1`, `tts-1-hd`, or `gpt-4o-mini-tts`
`input`	string	Yes	Text to synthesize (max 4096 characters)
`voice`	string or object	Yes	Built-in voice ID or a custom voice object like `{ "id": "voice_1234" }`
`instructions`	string	No	Voice/style instructions (max 4096 characters). Not supported by `tts-1` or `tts-1-hd`.
`response_format`	string	No	Audio format: `mp3` (default), `opus`, `aac`, `flac`, `wav`, `pcm`
`speed`	number	No	Speed multiplier 0.25–4.0 (default 1.0)
`stream_format`	string	No	Stream mode: `audio` or `sse`. `sse` is not supported by `tts-1` or `tts-1-hd`.

Supported voices:

Voice	Description
`alloy`	Neutral, balanced
`ash`	Warm, conversational
`ballad`	Rich, lyrical
`cedar`	Grounded, resonant
`coral`	Bright, expressive
`echo`	Clear, precise
`fable`	Storytelling cadence
`marin`	Natural, polished
`nova`	Energetic, upbeat
`onyx`	Deep, authoritative
`sage`	Calm, measured
`shimmer`	Soft, gentle
`verse`	Crisp, articulate

Custom voices created upstream can also be referenced by passing a voice object with an id field, for example:

{
  "model": "gpt-4o-mini-tts",
  "input": "Hello from a custom voice.",
  "voice": { "id": "voice_1234" }
}

cURL example

curl https://api.bve.me/v1/audio/speech \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from BVE Gateway!",
    "voice": "alloy"
  }' \
  --output speech.mp3

TypeScript

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me/v1',
});

const mp3 = await client.audio.speech.create({
  model: 'tts-1',
  voice: 'alloy',
  input: 'Hello from BVE Gateway!',
});

const buffer = Buffer.from(await mp3.arrayBuffer());
fs.writeFileSync('speech.mp3', buffer);

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello from BVE Gateway!",
)

# Non-streaming call: the full audio is buffered, then written to disk.
response.write_to_file("speech.mp3")

Streaming TTS

For lower latency, stream audio bytes as they are generated rather than waiting for the full audio file:

TypeScript

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me/v1',
});

const response = await client.audio.speech.create({
  model: 'gpt-4o-mini-tts',
  voice: 'alloy',
  input: 'Streaming audio from BVE Gateway.',
  response_format: 'mp3',
});

const writeStream = fs.createWriteStream('speech.mp3');
for await (const chunk of response.body as AsyncIterable<Uint8Array>) {
  writeStream.write(chunk);
}
writeStream.end();

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me/v1",
)

with client.audio.speech.with_streaming_response.create(
    model="gpt-4o-mini-tts",
    voice="alloy",
    input="Streaming audio from BVE Gateway.",
    response_format="mp3",
) as response:
    response.stream_to_file("speech.mp3")
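
When the audio should be processed as it arrives (for example, counted, hashed, or forwarded to a player) rather than only saved, the chunks can be consumed one at a time. The following is a minimal sketch; `save_audio_chunks` is a hypothetical helper, not part of the SDK, and it accepts any iterable of byte chunks.

```python
from typing import Iterable


def save_audio_chunks(chunks: Iterable[bytes], path: str) -> int:
    """Write audio chunks to `path` as they arrive; return total bytes written.

    Hypothetical helper for illustration. `chunks` can be any iterable of
    bytes objects, such as a streamed HTTP response body.
    """
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if not chunk:
                # Skip empty keep-alive chunks, if any.
                continue
            out.write(chunk)
            total += len(chunk)
    return total
```

Inside the `with_streaming_response` block shown above, `response.iter_bytes()` is one plausible source for `chunks`, assuming the installed SDK version provides it.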

Notes

Response is a binary audio stream, not JSON.
tts-1-hd produces higher-quality audio at higher latency and cost.
gpt-4o-mini-tts is a GPT-4o-mini-based TTS model with improved naturalness.
All three fields (model, input, voice) are required. Missing fields return 400 missing_required_parameter; invalid values return 400 invalid_value.
input is limited to 4096 characters. Oversized TTS inputs are rejected at the gateway with 400 invalid_value.
voice may be either a built-in string or a custom voice object with an id field.
instructions must be a string of 4096 characters or fewer when provided.
Invalid response_format values and out-of-range speed values are rejected at the gateway with 400 invalid_value before the request is proxied upstream.
stream_format must be either audio or sse.
Sending a TTS model to any other endpoint (e.g., /v1/chat/completions) returns 400 model_endpoint_mismatch.
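
The constraints in the notes above can be checked on the client before a request is sent, which avoids a round trip for obviously invalid bodies. This is a minimal sketch under stated assumptions: `validate_tts_request` is a hypothetical helper, the limits are copied from the field table, and the gateway's own validation remains authoritative (its model list may also change).

```python
from typing import Any, Dict, List

# Values copied from the request-body table above.
TTS_MODELS = {"tts-1", "tts-1-hd", "gpt-4o-mini-tts"}
LEGACY_TTS_MODELS = {"tts-1", "tts-1-hd"}  # no `instructions`, no `sse`
RESPONSE_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
STREAM_FORMATS = {"audio", "sse"}
MAX_CHARS = 4096


def validate_tts_request(body: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the body looks valid."""
    problems = [
        f"missing required field: {name}"
        for name in ("model", "input", "voice")
        if name not in body
    ]
    if problems:
        return problems

    model = body["model"]
    if not isinstance(model, str) or model not in TTS_MODELS:
        problems.append(f"unsupported TTS model: {model!r}")
    legacy = isinstance(model, str) and model in LEGACY_TTS_MODELS

    text = body["input"]
    if not isinstance(text, str) or len(text) > MAX_CHARS:
        problems.append(f"input must be a string of at most {MAX_CHARS} characters")

    voice = body["voice"]
    is_custom = isinstance(voice, dict) and isinstance(voice.get("id"), str)
    if not (isinstance(voice, str) or is_custom):
        problems.append("voice must be a string or an object with a string id")

    if "instructions" in body:
        instructions = body["instructions"]
        if not isinstance(instructions, str) or len(instructions) > MAX_CHARS:
            problems.append(f"instructions must be a string of at most {MAX_CHARS} characters")
        elif legacy:
            problems.append(f"instructions is not supported by {model}")

    if "response_format" in body:
        fmt = body["response_format"]
        if not isinstance(fmt, str) or fmt not in RESPONSE_FORMATS:
            problems.append(f"invalid response_format: {fmt!r}")

    if "speed" in body:
        speed = body["speed"]
        is_number = isinstance(speed, (int, float)) and not isinstance(speed, bool)
        if not is_number or not 0.25 <= speed <= 4.0:
            problems.append("speed must be a number between 0.25 and 4.0")

    if "stream_format" in body:
        stream_format = body["stream_format"]
        if not isinstance(stream_format, str) or stream_format not in STREAM_FORMATS:
            problems.append(f"invalid stream_format: {stream_format!r}")
        elif stream_format == "sse" and legacy:
            problems.append(f"stream_format sse is not supported by {model}")

    return problems
```

For example, `validate_tts_request({"model": "tts-1", "input": "Hi", "voice": "alloy"})` returns an empty list, while adding `"instructions"` to the same body reports that `tts-1` does not support it.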

Audio Transcriptions

POST https://api.bve.me/v1/audio/transcriptions

Requires Authorization: Bearer sk-bve-YOUR_KEY.

Transcribes audio to text using Whisper or GPT-4o-based models (see the model table below). Accepts multipart/form-data.

Request (multipart/form-data)

Field	Type	Required	Description
`file`	binary	Yes	Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm)
`model`	string	Yes	See model table below
`language`	string	No	ISO-639-1 language code (e.g. `en`)
`prompt`	string	No	Optional context/style prompt
`response_format`	string	No	`json` (default), `text`, `srt`, `verbose_json`, `vtt`
`temperature`	number	No	Sampling temperature 0–1
`timestamp_granularities[]`	string[]	No	Timestamp granularities to include: `word`, `segment` (repeated form field)

cURL example

curl https://api.bve.me/v1/audio/transcriptions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -F file="@audio.mp3" \
  -F model="whisper-1"

Response:

{
  "text": "Hello, this is the transcribed text."
}

SDK example

TypeScript

import OpenAI from 'openai';
import fs from 'fs';

const client = new OpenAI({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me/v1',
});

const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
});

console.log(transcription.text);

Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me/v1",
)

with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        file=f,
        model="whisper-1",
    )

print(transcription.text)

Verbose JSON with word timestamps

TypeScript

// Reuses `client` and the `fs` import from the SDK example above.
const transcription = await client.audio.transcriptions.create({
  file: fs.createReadStream('audio.mp3'),
  model: 'whisper-1',
  response_format: 'verbose_json',
  timestamp_granularities: ['word'],
});

for (const word of transcription.words ?? []) {
  console.log(`${word.word}: ${word.start}s – ${word.end}s`);
}

Python

# Reuses `client` from the SDK example above.
with open("audio.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        file=f,
        model="whisper-1",
        response_format="verbose_json",
        timestamp_granularities=["word"],
    )

for word in transcription.words or []:
    print(f"{word.word}: {word.start}s – {word.end}s")
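
Word timestamps are often grouped into short caption lines before display. The sketch below shows one simple way to do that: start a new caption whenever the pause between words exceeds a threshold or the line gets too long. The `Word` class, `group_words`, and the default thresholds are hypothetical and only mirror the `word`, `start`, and `end` attributes used above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Word:
    """Stand-in for one entry of `transcription.words` (illustrative only)."""
    word: str
    start: float
    end: float


def group_words(
    words: Iterable[Word],
    max_gap: float = 0.6,   # assumed pause (seconds) that splits captions
    max_words: int = 8,     # assumed maximum words per caption
) -> List[Tuple[float, float, str]]:
    """Group words into (start, end, text) captions."""
    captions: List[Tuple[float, float, str]] = []
    current: List[Word] = []

    def flush() -> None:
        if current:
            text = " ".join(w.word for w in current)
            captions.append((current[0].start, current[-1].end, text))
            current.clear()

    for w in words:
        too_long = len(current) >= max_words
        paused = bool(current) and (w.start - current[-1].end) > max_gap
        if too_long or paused:
            flush()
        current.append(w)
    flush()
    return captions
```

Any objects exposing the same three attributes can be passed in, so the SDK's word entries should work without conversion.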

Available transcription models

Model	Provider	Notes
`whisper-1`	OpenAI	Standard Whisper transcription
`gpt-4o-transcribe`	OpenAI	GPT-4o-based transcription (auto-updates to latest)
`gpt-4o-transcribe-2025-03-20`	OpenAI	Pinned March 2025 snapshot of `gpt-4o-transcribe`
`gpt-4o-mini-transcribe`	OpenAI	Lighter, faster GPT-4o Mini-based transcription
`gpt-4o-transcribe-diarize`	OpenAI	GPT-4o transcription with speaker diarization (auto-updates)
`gpt-4o-transcribe-diarize-2025-03-20`	OpenAI	Pinned March 2025 snapshot of `gpt-4o-transcribe-diarize`
`whisper-large-v3`	Groq	Groq-hosted Whisper large v3
`whisper-large-v3-turbo`	Groq	Faster Groq Whisper large v3
`distil-whisper-large-v3-en`	Groq	Groq distilled Whisper (English only, fastest)
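
Because sending a model to the wrong endpoint returns 400 model_endpoint_mismatch, a client can look up the matching endpoint before building a request. This is a minimal client-side sketch: `endpoint_for` is a hypothetical helper, and the model sets are copied from the tables on this page, so they may lag behind the gateway's actual catalog.

```python
from typing import Optional

# Model IDs copied from the tables on this page.
TTS_MODELS = frozenset({"tts-1", "tts-1-hd", "gpt-4o-mini-tts"})
TRANSCRIPTION_MODELS = frozenset({
    "whisper-1",
    "gpt-4o-transcribe",
    "gpt-4o-transcribe-2025-03-20",
    "gpt-4o-mini-transcribe",
    "gpt-4o-transcribe-diarize",
    "gpt-4o-transcribe-diarize-2025-03-20",
    "whisper-large-v3",
    "whisper-large-v3-turbo",
    "distil-whisper-large-v3-en",
})


def endpoint_for(model: str) -> Optional[str]:
    """Return the audio endpoint path for `model`, or None if it is not an audio model."""
    if model in TTS_MODELS:
        return "/v1/audio/speech"
    if model in TRANSCRIPTION_MODELS:
        return "/v1/audio/transcriptions"
    return None
```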

Notes

Both model and file fields are required. Missing either returns 400 missing_required_parameter.
Sending a TTS, embedding, or image-generation model to this endpoint returns 400 model_endpoint_mismatch.
Audio translations (POST /v1/audio/translations) are not supported by Fuelix and return 404.
Maximum file size is 25 MB (OpenAI limit; Fuelix may enforce lower).
The multipart request is forwarded directly — no re-encoding.
response_format must be one of json, text, srt, verbose_json, vtt. Invalid values return 400 invalid_value at the gateway.
temperature must be a number in [0, 1]. Non-numeric strings return 400 invalid_type; out-of-range values return 400 invalid_value.
timestamp_granularities[] is sent as repeated form fields (e.g. -F 'timestamp_granularities[]=word' -F 'timestamp_granularities[]=segment'). Each value must be word or segment; unknown values return 400 invalid_value with param: "timestamp_granularities".
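
The form-field rules above can be sketched as a small builder that validates the optional parameters and emits `timestamp_granularities[]` as repeated fields. `build_transcription_fields` is a hypothetical helper, not an SDK function; it covers only the non-file fields and mirrors the documented checks rather than replacing them.

```python
from typing import Iterable, List, Optional, Tuple

# Allowed values copied from the request table above.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def build_transcription_fields(
    model: str,
    *,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: Optional[str] = None,
    temperature: Optional[float] = None,
    timestamp_granularities: Iterable[str] = (),
) -> List[Tuple[str, str]]:
    """Return the non-file form fields as (name, value) pairs."""
    if not model:
        raise ValueError("model is required")
    fields: List[Tuple[str, str]] = [("model", model)]

    if language is not None:
        fields.append(("language", language))
    if prompt is not None:
        fields.append(("prompt", prompt))

    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"invalid response_format: {response_format!r}")
        fields.append(("response_format", response_format))

    if temperature is not None:
        if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
            raise TypeError("temperature must be a number")
        if not 0 <= temperature <= 1:
            raise ValueError("temperature must be in [0, 1]")
        fields.append(("temperature", str(temperature)))

    # One form field per granularity, all sharing the same field name.
    for granularity in timestamp_granularities:
        if granularity not in GRANULARITIES:
            raise ValueError(f"invalid timestamp granularity: {granularity!r}")
        fields.append(("timestamp_granularities[]", granularity))

    return fields
```

The returned list keeps duplicate field names intact, which a plain dictionary would not, so it suits HTTP clients that accept form data as a sequence of pairs.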

Response headers

BVE Gateway adds the following headers to every audio endpoint response:

Header	Example	Description
`X-Request-Id`	`550e8400-…`	Server-generated UUID for this request
`X-BVE-Client-Id`	`my-trace-123`	Echo of the client-supplied `X-Request-Id` (when present and valid). Absent when not sent or invalid.
`X-BVE-Latency`	`312`	Total gateway latency in milliseconds
`X-BVE-Model`	`tts-1`	Model ID used for this request
`X-BVE-Key-Name`	`prod-key`	Display name of the authenticated key

The standard X-RateLimit-* per-key headers (requests per minute, per day, and optional monthly caps) are also present. See Rate Limits & Quotas for the full table.
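
For logging or tracing, the gateway headers listed above can be collected into one structure. The sketch below is illustrative: `GatewayMeta` and `parse_gateway_headers` are hypothetical names, header lookup is made case-insensitive, and the rate-limit headers are left out because their exact names are documented elsewhere.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GatewayMeta:
    request_id: Optional[str]   # X-Request-Id
    client_id: Optional[str]    # X-BVE-Client-Id (absent when not echoed)
    latency_ms: Optional[int]   # X-BVE-Latency
    model: Optional[str]        # X-BVE-Model
    key_name: Optional[str]     # X-BVE-Key-Name


def parse_gateway_headers(headers: Mapping[str, str]) -> GatewayMeta:
    """Extract the BVE Gateway headers from a response header mapping."""
    lowered = {name.lower(): value for name, value in headers.items()}
    latency = lowered.get("x-bve-latency")
    return GatewayMeta(
        request_id=lowered.get("x-request-id"),
        client_id=lowered.get("x-bve-client-id"),
        # Keep None when the value is missing or not a plain integer.
        latency_ms=int(latency) if latency is not None and latency.isdigit() else None,
        model=lowered.get("x-bve-model"),
        key_name=lowered.get("x-bve-key-name"),
    )
```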

Next steps

Models: Audio model catalog (tts-1, tts-1-hd, whisper-1, whisper-large-v3, and more)

SDK Usage: TypeScript and Python SDK examples for text-to-speech and audio transcription

Rate Limits & Quotas: Per-key RPM/RPD limits, monthly caps, and Retry-After handling

Errors: Full error code reference, including model_endpoint_mismatch for audio models