Chat Completions

Endpoint

POST https://api.bve.me/v1/chat/completions

Requires Authorization: Bearer sk-bve-YOUR_KEY.

This endpoint proxies directly to Fuelix /chat/completions. The request body and response shape follow the OpenAI Chat Completions API. Streaming (SSE) is supported.

Request body

{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is 2 + 2?" }
  ],
  "stream": false
}

Field	Type	Required	Description
`model`	string	Yes	Model ID (e.g. `gpt-4o`)
`messages`	array	Yes	Array of message objects (`{role, content}`)
`stream`	boolean	No	Enable SSE streaming
`temperature`	number	No	Sampling temperature — `[0, 2]`
`top_p`	number	No	Nucleus sampling — `[0, 1]`
`max_tokens`	integer	No	Maximum tokens to generate (positive integer)
`max_completion_tokens`	integer	No	Newer alias for `max_tokens`; preferred for o-series models (positive integer)
`n`	integer	No	Number of completions to return (positive integer)
`stop`	string \| array	No	Stop sequences — string or array of ≤ 4 strings
`presence_penalty`	number	No	Presence penalty — `[-2, 2]`
`frequency_penalty`	number	No	Frequency penalty — `[-2, 2]`
`logit_bias`	object	No	Token bias map — object of token ID strings to numbers in `[-100, 100]`
`tools`	array	No	Function tool definitions for function calling (non-empty; see Function calling)
`tool_choice`	string \| object	No	Tool selection — `"none"`, `"auto"`, `"required"`, or `{type:"function", function:{name}}`
`response_format`	object	No	Output format — `{type: "text"}`, `{type: "json_object"}`, or `{type: "json_schema", json_schema: {name, schema}}`
`stream_options`	object	No	SSE streaming options — `{include_usage: boolean}`; requires `stream: true`
`logprobs`	boolean	No	Return log probabilities of output tokens (OpenAI, Groq, OpenRouter; use with `top_logprobs`)
`top_logprobs`	integer	No	Number of top-token log probabilities per output token — `[0, 20]`; requires `logprobs: true`
`parallel_tool_calls`	boolean	No	Allow the model to call multiple tools simultaneously (default `true` for OpenAI, Groq, OpenRouter)
`user`	string	No	End-user identifier — forwarded to Fuelix for abuse detection (≤ 256 chars)
`seed`	integer	No	Deterministic seed — must be an integer (not a float)
`store`	boolean	No	Persist the response for OpenAI model distillation or evals — must be a boolean (`true` or `false`); silently ignored by non-OpenAI providers
`service_tier`	string	No	Compute tier for the request: `"auto"`, `"default"`, `"flex"`, or `"scale"` (OpenAI and OpenRouter; forwarded unchanged)
`reasoning_effort`	string	No	Reasoning budget for o-series models: `"low"`, `"medium"`, `"high"`, or `"auto"` (see Reasoning models)
`max_reasoning_tokens`	integer	No	o-series reasoning token budget — sets an upper bound on internal reasoning tokens; `0` disables extended thinking, positive values cap it (o3, o3-mini, o4-mini)
`thinking`	object	No	Claude extended thinking configuration — `{ "type": "enabled", "budget_tokens": N }` (N ≥ 1024) or `{ "type": "disabled" }`. Enables extended reasoning traces for claude-3-7-sonnet and later. See Claude extended thinking
`modalities`	array	No	Output modalities to request — non-empty array of `"text"` and/or `"audio"`. Omit for text-only output. Include `"audio"` with an `audio` config object to receive spoken audio (OpenAI audio models only)
`audio`	object	No	Audio output configuration for `gpt-4o-audio-preview` and similar audio-capable models. Required when `modalities` includes `"audio"`. Fields: `voice` (required — one of `alloy`, `ash`, `ballad`, `cedar`, `coral`, `echo`, `fable`, `marin`, `nova`, `onyx`, `sage`, `shimmer`, `verse`) and `format` (optional — one of `aac`, `flac`, `mp3`, `opus`, `pcm16`, `pcm24`, `wav`)
`top_k`	integer	No	Top-k sampling — restricts the token pool to the `k` most likely next tokens; must be ≥ 1 (Cohere `command-*`, Groq, OpenRouter)
`min_p`	number	No	Minimum probability threshold for token sampling — `[0, 1]`; a token is only sampled if its probability is ≥ `min_p` × (max token probability) (OpenRouter extension)
`top_a`	number	No	Top-a sampling — `[0, 1]`; a token is only considered if its probability is ≥ `top_a` × (max token probability)²; complements `top_p` and `min_p` (OpenRouter extension)
`repetition_penalty`	number	No	Multiplicative penalty applied to already-seen tokens — must be > 0 (values > 1 reduce repetition, values < 1 encourage it); distinct from the additive `presence_penalty`/`frequency_penalty` (Cohere, Mistral, OpenRouter)
`thinking_config`	object	No	Gemini 2.5 thinking budget configuration — `{ "thinking_budget": N }` where N is a non-negative integer; `0` disables thinking, positive values set the token budget (Gemini 2.5 Flash and Pro via OpenRouter)
`transforms`	array	No	OpenRouter prompt-transformation pipeline — array of transformation name strings (e.g. `["middle-out"]`). Applied by OpenRouter before inference; forwarded unchanged and silently ignored by non-OpenRouter providers. See OpenRouter prompt transforms
`provider`	object	No	OpenRouter provider routing object — controls upstream provider selection, fallback behavior, and data collection. See OpenRouter provider routing
`prediction`	object	No	Predicted Outputs — supply expected output text to accelerate generation via speculative decoding. Object with `type: "content"` (required) and `content` (required): a string or array of `{type: "text", text: string}` blocks. See Predicted Outputs
`functions`	array	No	Deprecated — legacy function definitions from the pre-`tools` API (OpenAI SDK v0.x). Each entry must have a `name` matching `[a-zA-Z0-9_-]{1,64}`. Prefer `tools` for new integrations.
`function_call`	string \| object	No	Deprecated — legacy function routing from the pre-`tool_choice` API. `"none"`, `"auto"`, or `{name: "<function-name>"}`. Prefer `tool_choice` for new integrations.
`web_search_options`	object	No	Web search configuration for search-capable OpenAI models (`gpt-4o-search-preview`, `gpt-4.1`, etc.). See Web search

Any additional fields supported by Fuelix are forwarded as-is.

Response headers

BVE Gateway adds the following headers to every allowed (non-429) response:

Header	Example	Description
`X-Request-Id`	`550e8400-…`	UUID for this request (generated per request)
`X-BVE-Client-Id`	`my-trace-123`	Echo of the client-supplied `X-Request-Id` (when present and valid: alphanumeric + `-_.`, ≤ 128 chars). Absent when the client did not send `X-Request-Id` or the value failed validation.
`X-BVE-Latency`	`143`	Total gateway latency in milliseconds
`X-BVE-Model`	`gpt-4o`	Model ID resolved for this request (from response JSON when buffered, from request body for streaming)
`X-BVE-Key-Name`	`prod-key`	Name of the API key used for this request (redacted if it matches a provider credential pattern)
`X-RateLimit-Limit-Requests`	`60`	Per-minute request cap for this key
`X-RateLimit-Remaining-Requests`	`57`	Requests remaining in the current minute window
`X-RateLimit-Reset-Requests`	`42s`	Seconds until the minute window resets
`X-RateLimit-Limit-Day`	`10000`	Per-day request cap for this key
`X-RateLimit-Remaining-Day`	`9843`	Requests remaining until UTC midnight
`X-RateLimit-Reset-Day`	`38412s`	Seconds until the next UTC midnight
`X-RateLimit-Limit-Month`	`1000`	Monthly request cap (only set when `monthly_limit` is configured)
`X-RateLimit-Remaining-Month`	`748`	Requests remaining before the monthly request cap (only set when configured)
`X-RateLimit-Reset-Month`	`604800s`	Seconds until the start of the next UTC calendar month (only set when configured)
`X-RateLimit-Limit-Tokens`	`50000`	Monthly token cap (only set when `monthly_token_limit` is configured)
`X-RateLimit-Remaining-Tokens`	`47832`	Tokens remaining before the monthly token cap (only set when configured)
`X-RateLimit-Reset-Tokens`	`691200s`	Seconds until the monthly token window resets (only set when configured)

All X-RateLimit-* headers reflect the per-key limits configured in BVE Gateway and are exposed via CORS. See Rate Limits & Quotas for full details and example output.
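A client can read these headers into a per-window summary before deciding whether to send more requests. The sketch below is illustrative only. parse_rate_limits and seconds_until_allowed are hypothetical helper names, and the code assumes reset values use the integer-seconds-plus-"s" format shown in the table (e.g. 42s). HTTP header names are case-insensitive, and the upstream x-ratelimit-*-requests headers listed below share names with the gateway's requests-window headers. This sketch simply uses whichever value the HTTP library reports.

```python
def _seconds(value: str) -> int:
    """Convert a reset value such as '42s' into an integer number of seconds."""
    return int(value.rstrip("s"))


def parse_rate_limits(headers: dict[str, str]) -> dict[str, dict[str, int]]:
    """Group X-RateLimit-* headers by window (requests, day, month, tokens).

    Windows whose headers are absent (e.g. month/tokens when no monthly cap
    is configured) are omitted from the result.
    """
    # Header names are case-insensitive, so normalise keys to lower case.
    lower = {k.lower(): v for k, v in headers.items()}
    result: dict[str, dict[str, int]] = {}
    for window in ("requests", "day", "month", "tokens"):
        limit = lower.get(f"x-ratelimit-limit-{window}")
        remaining = lower.get(f"x-ratelimit-remaining-{window}")
        reset = lower.get(f"x-ratelimit-reset-{window}")
        if limit is None or remaining is None or reset is None:
            continue
        result[window] = {
            "limit": int(limit),
            "remaining": int(remaining),
            "reset_seconds": _seconds(reset),
        }
    return result


def seconds_until_allowed(limits: dict[str, dict[str, int]]) -> int:
    """Return 0 if every window has capacity, else the longest reset among exhausted windows."""
    exhausted = [w["reset_seconds"] for w in limits.values() if w["remaining"] <= 0]
    return max(exhausted, default=0)
```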

The following Fuelix upstream headers are forwarded to the client when present:

Header	Description
`content-type`	Response content type
`content-length`	Response body size
`cache-control`	Cache directives
`x-request-id`	Fuelix’s own request ID
`x-quota-allowed`, `x-quota-available`, `x-quota-reset`	Fuelix quota headers
`x-ratelimit-limit-requests`, `x-ratelimit-limit-tokens`	Upstream request/token limits
`x-ratelimit-remaining-requests`, `x-ratelimit-remaining-tokens`	Upstream remaining capacity
`x-ratelimit-reset-requests`, `x-ratelimit-reset-tokens`	Upstream window reset times
`retry-after`	Seconds until the rate-limit window resets (present on 429/503)
`anthropic-ratelimit-requests-limit`, `anthropic-ratelimit-requests-remaining`, `anthropic-ratelimit-requests-reset`	Anthropic Messages API request limits (present on `/v1/messages` responses)
`anthropic-ratelimit-tokens-limit`, `anthropic-ratelimit-tokens-remaining`, `anthropic-ratelimit-tokens-reset`	Anthropic Messages API token limits (present on `/v1/messages` responses)
`x-groq-request-id`	Groq internal request ID for log correlation
`openai-processing-ms`	OpenAI server-side processing time
`x-openrouter-model`	Actual model ID selected by OpenRouter after provider routing (e.g. `openai/gpt-4o`)
`x-or-cache-status`	OpenRouter semantic cache result: `HIT` or `MISS`
`x-or-remaining-tokens`	Token budget remaining in the OpenRouter rate-limit window

All other Fuelix upstream headers are stripped. See Security Notes for the full list.

Example — non-streaming

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "What is 2 + 2?" }]
  }'

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716288000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "4" },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 1,
    "total_tokens": 15
  }
}
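The curl request above can also be made with only Python's standard library. This is a minimal sketch rather than an official client. build_chat_request and extract_reply are hypothetical helper names. The request is only constructed here, and the commented lines show how it could be sent with urllib.request.urlopen.

```python
import json
import urllib.request

BASE_URL = "https://api.bve.me/v1"


def build_chat_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_json: dict) -> tuple[str, int]:
    """Return the first choice's text and the total token count from usage."""
    message = response_json["choices"][0]["message"]
    return message["content"], response_json["usage"]["total_tokens"]


req = build_chat_request(
    "sk-bve-YOUR_KEY",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "What is 2 + 2?"}]},
)
# To send it (requires network access):
# with urllib.request.urlopen(req) as resp:
#     reply, total_tokens = extract_reply(json.load(resp))
```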

Example — streaming

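See Streaming for full streaming documentation. The -N (--no-buffer) flag makes curl print each event as it arrives instead of buffering the output.

curl -N https://api.bve.me/v1/chat/completions \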
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "Count to 3." }],
    "stream": true
  }'
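The streamed text can be reassembled on the client, assuming the stream uses the OpenAI chunk format: data: lines carrying chat.completion.chunk objects, terminated by data: [DONE]. collect_stream_text is a hypothetical helper, and Streaming remains the authoritative description of the event format. With urllib, pass the decoded response lines, e.g. collect_stream_text(line.decode("utf-8") for line in resp).

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Concatenate delta.content from SSE 'data:' lines until [DONE]."""
    parts: list[str] = []
    for raw in lines:
        line = raw.strip()
        # Skip blank event separators, comments (': ...'), and other fields.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # A final usage-only chunk (stream_options.include_usage) has no choices.
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only collects the first choice
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```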

Body size limit

Request bodies larger than 10 MB are rejected with 413 Request Entity Too Large:

{
  "error": {
    "message": "Request body too large",
    "type": "invalid_request_error",
    "param": null,
    "code": "request_too_large"
  }
}
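To fail fast instead of uploading a body the gateway will reject, a client can measure the serialised JSON first. This sketch assumes the limit applies to the raw request body in bytes. Whether "10 MB" means 10,000,000 or 10,485,760 bytes is not stated here, so the sketch uses the smaller value to stay conservative. encode_body is a hypothetical helper. Inline base64 images and input_audio payloads are the most likely way to reach this limit.

```python
import json

# Assumption: 10 MB = 10,000,000 bytes (the more conservative reading).
MAX_BODY_BYTES = 10_000_000


def encode_body(body: dict) -> bytes:
    """Serialise a request body compactly and reject it if it exceeds the limit."""
    data = json.dumps(body, separators=(",", ":")).encode("utf-8")
    if len(data) > MAX_BODY_BYTES:
        raise ValueError(f"request body is {len(data)} bytes; limit is {MAX_BODY_BYTES}")
    return data
```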

Gateway validation

The gateway validates the request body before forwarding to Fuelix. Missing or invalid fields return a 400 with a standard OpenAI-compatible error shape instead of an opaque Fuelix Pydantic error.

Required fields

Missing / invalid	Code	`param`
`model` absent	`missing_required_parameter`	`"model"`
`model` not a string	`invalid_type`	`"model"`
`model` longer than 200 characters	`invalid_value`	`"model"`
`messages` absent	`missing_required_parameter`	`"messages"`
`messages` not an array	`invalid_type`	`"messages"`
`messages` is empty	`invalid_value`	`"messages"`
`messages[N]` not an object	`invalid_type`	`"messages[N]"`
`messages[N].role` absent	`missing_required_parameter`	`"messages[N].role"`
`messages[N].role` not a string	`invalid_type`	`"messages[N].role"`
`messages[N].role` not one of `system`, `user`, `assistant`, `tool`, `function`, `developer`	`invalid_value`	`"messages[N].role"`
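The required-field rules above can be mirrored client-side to catch mistakes before a network round trip. This is a sketch, and the gateway remains authoritative. check_required_fields is a hypothetical name. It checks fields in table order, which may not match the order the gateway uses when several fields are wrong at once.

```python
from typing import Optional

ALLOWED_ROLES = {"system", "user", "assistant", "tool", "function", "developer"}


def check_required_fields(body: dict) -> Optional[tuple[str, str]]:
    """Return (code, param) for the first required-field problem, or None."""
    if "model" not in body:
        return ("missing_required_parameter", "model")
    if not isinstance(body["model"], str):
        return ("invalid_type", "model")
    if len(body["model"]) > 200:
        return ("invalid_value", "model")
    if "messages" not in body:
        return ("missing_required_parameter", "messages")
    messages = body["messages"]
    if not isinstance(messages, list):
        return ("invalid_type", "messages")
    if not messages:
        return ("invalid_value", "messages")
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            return ("invalid_type", f"messages[{i}]")
        if "role" not in msg:
            return ("missing_required_parameter", f"messages[{i}].role")
        if not isinstance(msg["role"], str):
            return ("invalid_type", f"messages[{i}].role")
        if msg["role"] not in ALLOWED_ROLES:
            return ("invalid_value", f"messages[{i}].role")
    return None
```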

Optional field constraints

Parameter	Invalid condition	Code
`temperature`	Not a number	`invalid_type`
`temperature`	Outside `[0, 2]`	`invalid_value`
`top_p`	Not a number	`invalid_type`
`top_p`	Outside `[0, 1]`	`invalid_value`
`max_tokens`	Not a positive integer	`invalid_value`
`max_completion_tokens`	Not a positive integer	`invalid_value`
`n`	Not a positive integer	`invalid_value`
`stream`	Not a boolean	`invalid_type`
`stream_options`	Not an object	`invalid_type`
`stream_options`	Present when `stream` is not `true`	`invalid_value`
`stream_options.include_usage`	Not a boolean	`invalid_type`
`logprobs`	Not a boolean (e.g. `5`, `"true"`)	`invalid_type`
`top_logprobs`	Not an integer or outside `[0, 20]`	`invalid_value`
`parallel_tool_calls`	Not a boolean	`invalid_type`
`frequency_penalty`	Not a number	`invalid_type`
`frequency_penalty`	Outside `[-2, 2]`	`invalid_value`
`presence_penalty`	Not a number	`invalid_type`
`presence_penalty`	Outside `[-2, 2]`	`invalid_value`
`logit_bias`	Not an object (e.g. array or string)	`invalid_type`
`logit_bias["KEY"]`	Not a number	`invalid_type`
`logit_bias["KEY"]`	Outside `[-100, 100]`	`invalid_value`
`stop`	Not a string or array	`invalid_type`
`stop`	Array with more than 4 elements	`invalid_value`
`stop[N]`	Not a string	`invalid_type`
`tools`	Not an array	`invalid_type`
`tools`	Empty array	`invalid_value`
`tools[N]`	Not an object	`invalid_type`
`tools[N].type`	Absent	`missing_required_parameter`
`tools[N].type`	Not a string	`invalid_type`
`tools[N].type`	Not `"function"`	`invalid_value`
`tools[N].function`	Absent	`missing_required_parameter`
`tools[N].function`	Not an object	`invalid_type`
`tools[N].function.name`	Absent or not a string	`invalid_type`
`tools[N].function.name`	Empty string	`invalid_value`
`tools[N].function.name`	Fails `[a-zA-Z0-9_-]{1,64}` regex	`invalid_value`
`tools[N].function.description`	Not a string	`invalid_type`
`tools[N].function.parameters`	Not an object	`invalid_type`
`tools[N].function.strict`	Not a boolean (e.g. `"true"`, `1`)	`invalid_type`
`tool_choice`	Not a string or object	`invalid_type`
`tool_choice` (string)	Not `"none"`, `"auto"`, or `"required"`	`invalid_value`
`tool_choice.type`	Absent	`missing_required_parameter`
`tool_choice.type`	Not a string	`invalid_type`
`tool_choice.type`	Not `"function"`	`invalid_value`
`tool_choice.function`	Absent	`missing_required_parameter`
`tool_choice.function`	Not an object	`invalid_type`
`tool_choice.function.name`	Absent or not a string	`invalid_type`
`tool_choice.function.name`	Empty string	`invalid_value`
`response_format`	Not an object	`invalid_type`
`response_format.type`	Missing	`missing_required_parameter`
`response_format.type`	Not `"text"`, `"json_object"`, or `"json_schema"`	`invalid_value`
`response_format.json_schema`	Missing when `type` is `"json_schema"`	`missing_required_parameter`
`response_format.json_schema.name`	Missing or not a non-empty string	`missing_required_parameter` / `invalid_value`
`user`	Not a string	`invalid_type`
`user`	Longer than 256 characters	`invalid_value`
`seed`	Not an integer (e.g. `1.5` or a string)	`invalid_type`
`service_tier`	Not a string	`invalid_type`
`service_tier`	Not `"auto"`, `"default"`, `"flex"`, or `"scale"`	`invalid_value`
`store`	Not a boolean (e.g. `1`, `"yes"`)	`invalid_type`
`reasoning_effort`	Not `"low"`, `"medium"`, `"high"`, or `"auto"` (non-string or unrecognised string)	`invalid_value`
`max_reasoning_tokens`	Negative, a float, or not a number	`invalid_value`
`top_k`	Not a positive integer (includes non-number types, floats like `10.5`, or values ≤ 0)	`invalid_value`
`min_p`	Not a number (e.g. a string)	`invalid_type`
`min_p`	Outside `[0, 1]`	`invalid_value`
`top_a`	Not a number (e.g. a string or boolean)	`invalid_type`
`top_a`	Outside `[0, 1]`	`invalid_value`
`repetition_penalty`	Not a number (e.g. a string)	`invalid_type`
`repetition_penalty`	Not > 0 (zero or a negative number)	`invalid_value`
`modalities`	Not an array (e.g. a string or object)	`invalid_type`
`modalities`	Empty array	`invalid_value`
`modalities[N]`	Not a string	`invalid_type`
`modalities[N]`	Not `"text"` or `"audio"`	`invalid_value`
`audio`	Not an object	`invalid_type`
`audio.voice`	Absent	`missing_required_parameter`
`audio.voice`	Not a string	`invalid_type`
`audio.voice`	Not a recognised voice ID	`invalid_value`
`audio.format`	Not a string	`invalid_type`
`audio.format`	Not one of `aac`, `flac`, `mp3`, `opus`, `pcm16`, `pcm24`, `wav`	`invalid_value`
`thinking_config`	Not an object (e.g. a number or string)	`invalid_type`
`thinking_config.thinking_budget`	Not a non-negative integer (negative, float, or non-number)	`invalid_value`
`thinking`	Not an object (e.g. a number or string)	`invalid_type`
`thinking.type`	Absent	`missing_required_parameter`
`thinking.type`	Not `"enabled"` or `"disabled"`	`invalid_value`
`thinking.budget_tokens`	Absent when `thinking.type` is `"enabled"`	`missing_required_parameter`
`thinking.budget_tokens`	Not a number	`invalid_type`
`thinking.budget_tokens`	Not an integer ≥ 1024 (negative, zero, or fractional)	`invalid_value`
`transforms`	Not an array (e.g. a string or object)	`invalid_type`
`transforms[N]`	Not a string	`invalid_type`
`transforms[N]`	Empty string	`invalid_value`
`prediction`	Not an object (e.g. a number or string)	`invalid_type`
`prediction.type`	Absent	`missing_required_parameter`
`prediction.type`	Not a string	`invalid_type`
`prediction.type`	Not `"content"`	`invalid_value`
`prediction.content`	Absent	`missing_required_parameter`
`prediction.content`	Not a string or array	`invalid_type`
`prediction.content`	Empty string	`invalid_value`
`prediction.content`	Empty array	`invalid_value`
`prediction.content[N]`	Not an object	`invalid_type`
`prediction.content[N].type`	Absent	`missing_required_parameter`
`prediction.content[N].type`	Not a string	`invalid_type`
`prediction.content[N].text`	Absent when `type` is `"text"`	`missing_required_parameter`
`prediction.content[N].text`	Not a string when `type` is `"text"`	`invalid_type`
`functions`	Not an array	`invalid_type`
`functions`	Empty array	`invalid_value`
`functions[N]`	Not an object	`invalid_type`
`functions[N].name`	Absent or not a string	`invalid_type`
`functions[N].name`	Empty string	`invalid_value`
`functions[N].name`	Fails `[a-zA-Z0-9_-]{1,64}` regex (invalid chars or too long)	`invalid_value`
`functions[N].description`	Present but not a string	`invalid_type`
`functions[N].parameters`	Present but not an object	`invalid_type`
`function_call`	Not a string or object	`invalid_type`
`function_call` (string)	Not `"none"` or `"auto"` (e.g. `"required"`)	`invalid_value`
`function_call.name`	Absent or not a non-empty string	`invalid_value`
`web_search_options`	Not an object (e.g. a string or number)	`invalid_type`
`web_search_options.search_context_size`	Not a string	`invalid_type`
`web_search_options.search_context_size`	Not `"low"`, `"medium"`, or `"high"`	`invalid_value`
`web_search_options.user_location`	Not an object	`invalid_type`
`web_search_options.user_location.type`	Not a string	`invalid_type`
`web_search_options.user_location.type`	Not `"approximate"`	`invalid_value`
`web_search_options.user_location.approximate`	Not an object	`invalid_type`
`web_search_options.user_location.approximate.country` (and `region`, `city`, `timezone`)	Present but not a string	`invalid_type`
`provider`	Not an object (e.g. a string or number)	`invalid_type`
`provider.allow_fallbacks`	Not a boolean	`invalid_type`
`provider.require_parameters`	Not a boolean	`invalid_type`
`provider.data_collection`	Not a string	`invalid_type`
`provider.data_collection`	Not `"allow"` or `"deny"`	`invalid_value`
`provider.order` (and `only`, `ignore`, `quantizations`)	Not an array	`invalid_type`
`provider.order[N]` (and `only[N]`, `ignore[N]`, `quantizations[N]`)	Not a string	`invalid_type`
`provider.order[N]` (and `only[N]`, `ignore[N]`, `quantizations[N]`)	Empty string	`invalid_value`
`provider.sort`	Not a string	`invalid_type`
`provider.sort`	Empty string	`invalid_value`

All 400 responses use the standard error envelope:

{
  "error": {
    "message": "model is required",
    "type": "invalid_request_error",
    "param": "model",
    "code": "missing_required_parameter"
  }
}
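Because every 400 shares this envelope, a client can turn it into a single exception type. This is a minimal sketch. GatewayRequestError and raise_for_error are hypothetical names. The sketch assumes other 4xx/5xx responses may use the same envelope and falls back to the raw body text when they do not.

```python
import json
from typing import Optional


class GatewayRequestError(Exception):
    """Raised for an error response; carries the envelope's code and param."""

    def __init__(self, status: int, message: str, code: Optional[str], param: Optional[str]):
        super().__init__(f"{status} {code}: {message} (param={param})")
        self.status = status
        self.message = message
        self.code = code
        self.param = param


def raise_for_error(status: int, body: bytes) -> None:
    """Raise GatewayRequestError for status >= 400; do nothing otherwise."""
    if status < 400:
        return
    try:
        err = json.loads(body)["error"]
        message, code, param = err["message"], err.get("code"), err.get("param")
    except (ValueError, KeyError, TypeError, AttributeError):
        # Not the standard envelope (e.g. an HTML error page): keep the raw text.
        raise GatewayRequestError(status, body.decode("utf-8", "replace"), None, None) from None
    raise GatewayRequestError(status, message, code, param)
```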

Reasoning model parameter restrictions

o-series reasoning models have additional parameter constraints, enforced at the gateway before the request reaches Fuelix. Violations return 400 with code "unsupported_value" or "unsupported_parameter".

All reasoning models (the o1, o3, and o4-mini families, including dated variants):

Parameter	Constraint	Code
`n`	Must be `1` — parallel completions are not supported	`unsupported_value`
`logprobs`	Must not be `true`	`unsupported_parameter`
`top_logprobs`	Must not be a positive integer	`unsupported_parameter`

o1 family only (o1, o1-mini, o1-preview, and dated variants like o1-2024-12-17):

Parameter	Required value	Code
`temperature`	`1` (the default) or absent	`unsupported_value`
`top_p`	`1` (the default) or absent	`unsupported_value`
`presence_penalty`	`0` (the default) or absent	`unsupported_value`
`frequency_penalty`	`0` (the default) or absent	`unsupported_value`

The o3/o4 family (o3, o3-mini, o4-mini, and dated variants like o4-mini-2025-04-16) does not restrict temperature, top_p, presence_penalty, or frequency_penalty — pass any valid value and it is forwarded to Fuelix unchanged.
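The two tables above can be expressed as a pre-flight check. This is a sketch under stated assumptions: the model-family regexes are hypothetical and based only on the model IDs listed above, so they may not match every ID the gateway recognises (for example, vendor-prefixed IDs). check_reasoning_params is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical matchers: "o1", "o3-mini", "o4-mini-2025-04-16", etc.
REASONING_RE = re.compile(r"^o[134](?:-|$)")
O1_RE = re.compile(r"^o1(?:-|$)")

# o1 family: these parameters must be absent or equal to their default.
O1_DEFAULTS = {"temperature": 1, "top_p": 1, "presence_penalty": 0, "frequency_penalty": 0}


def check_reasoning_params(body: dict) -> Optional[tuple[str, str]]:
    """Return (code, param) for the first reasoning-model restriction violated, or None."""
    model = body.get("model", "")
    if not isinstance(model, str) or not REASONING_RE.match(model):
        return None  # not a reasoning model: no extra restrictions
    if body.get("n") not in (None, 1):
        return ("unsupported_value", "n")
    if body.get("logprobs") is True:
        return ("unsupported_parameter", "logprobs")
    top_logprobs = body.get("top_logprobs")
    if isinstance(top_logprobs, int) and top_logprobs > 0:
        return ("unsupported_parameter", "top_logprobs")
    if O1_RE.match(model):
        for name, default in O1_DEFAULTS.items():
            if name in body and body[name] != default:
                return ("unsupported_value", name)
    return None
```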

Function calling

Use tools and tool_choice to define functions the model can call:

{
  "model": "gpt-4o",
  "messages": [{ "role": "user", "content": "What's the weather in London?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Tool name validation: names must match [a-zA-Z0-9_-]{1,64} (an OpenAI API constraint). The description and the parameters schema are optional; when present, they are forwarded as-is.
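When the model decides to call a tool, the caller runs the function and sends the result back in a follow-up request. The sketch below assumes the OpenAI response shape, in which message.tool_calls[*].function.arguments is a JSON-encoded string and each result is returned as a tool message with a matching tool_call_id. get_weather is a hypothetical stub, and run_tool_calls and is_valid_tool_name are hypothetical helpers. To continue the conversation, append the assistant message and the returned tool messages to messages, then call the endpoint again.

```python
import json
import re
from typing import Callable

TOOL_NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def is_valid_tool_name(name: str) -> bool:
    """Check a tool name against the gateway's naming rule."""
    return TOOL_NAME_RE.fullmatch(name) is not None


def get_weather(city: str) -> dict:
    """Hypothetical stub standing in for a real weather lookup."""
    return {"city": city, "condition": "cloudy", "temp_c": 14}


TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and build the follow-up tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = call["function"]
        args = json.loads(fn["arguments"] or "{}")  # arguments arrive as a JSON string
        output = TOOLS[fn["name"]](**args)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(output),
        })
    return results
```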

JSON mode and structured output

Use response_format to constrain the model’s output format:

{ "response_format": { "type": "json_object" } }

For structured output with a schema:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "my_schema",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": { "answer": { "type": "string" } },
        "required": ["answer"]
      }
    }
  }
}

When type is "json_schema", both the json_schema sub-object and json_schema.name (non-empty string) are required by the gateway. Extra fields like strict and schema are forwarded to Fuelix unchanged.
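The json_schema.name requirement can be enforced when building the request, and the reply decoded afterwards. This sketch assumes, as with OpenAI, that the structured reply arrives as a JSON string in choices[0].message.content. json_schema_format and parse_structured are hypothetical helpers, and the required-key check is only a light sanity check, not full JSON Schema validation.

```python
import json


def json_schema_format(name: str, schema: dict, strict: bool = True) -> dict:
    """Build a response_format object that satisfies the gateway's checks."""
    if not isinstance(name, str) or not name:
        raise ValueError("json_schema.name must be a non-empty string")
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": strict, "schema": schema},
    }


def parse_structured(content: str, required: list[str]) -> dict:
    """Decode the model's JSON reply and confirm the required keys are present."""
    data = json.loads(content)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data
```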

Vision and multimodal content

Vision-capable models (gpt-4o, gpt-4.1, Claude Sonnet/Haiku, Gemini, and others) accept image content alongside text. Audio-input-capable models such as gpt-4o-audio-preview additionally accept base64-encoded audio. Pass an array of typed content blocks in the content field of a user message instead of a plain string:

Content block type	Required fields	Description
`{ "type": "text", "text": "..." }`	`text`	Plain-text segment
`{ "type": "image_url", "image_url": { "url": "..." } }`	`url`	Publicly accessible HTTPS URL, or `data:image/jpeg;base64,...` inline
`{ "type": "input_audio", "input_audio": { "data": "...", "format": "..." } }`	`data`, `format`	Base64-encoded audio clip; see Audio input (`input_audio`) below

Example — image question

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What does this chart show?" },
          { "type": "image_url", "image_url": { "url": "https://example.com/sales-chart.png" } }
        ]
      }
    ]
  }'
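When an image is not available at a public URL, it can be sent inline as a data: URL instead. This is a minimal sketch. image_block_from_bytes is a hypothetical helper, and the caller is assumed to know the correct MIME type. Base64 output is about 4/3 the size of the raw bytes, which counts toward the 10 MB request body limit.

```python
import base64
from typing import Optional


def image_block_from_bytes(data: bytes, mime_type: str = "image/png", detail: Optional[str] = None) -> dict:
    """Wrap raw image bytes as an image_url content block with an inline data: URL."""
    encoded = base64.b64encode(data).decode("ascii")
    image_url = {"url": f"data:{mime_type};base64,{encoded}"}
    if detail is not None:
        # Same values the gateway accepts for image_url.detail.
        if detail not in ("auto", "low", "high"):
            raise ValueError("detail must be 'auto', 'low', or 'high'")
        image_url["detail"] = detail
    return {"type": "image_url", "image_url": image_url}
```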

Image content block validation

When content is an array (multimodal format), the gateway validates each block. Malformed blocks return a 400 with the standard error envelope instead of an opaque upstream 422.

All blocks:

Condition	Code	`param`
Block not an object	`invalid_type`	`"messages[N].content[M]"`
`type` absent	`missing_required_parameter`	`"messages[N].content[M].type"`
`type` not a string	`invalid_type`	`"messages[N].content[M].type"`

Unknown block types pass through without validation (forward-compatible with future OpenAI block types).

type: "text" blocks:

Condition	Code	`param`
`text` absent	`missing_required_parameter`	`"messages[N].content[M].text"`
`text` not a string	`invalid_type`	`"messages[N].content[M].text"`

type: "image_url" blocks:

Condition	Code	`param`
`image_url` absent	`missing_required_parameter`	`"messages[N].content[M].image_url"`
`image_url` not an object	`invalid_type`	`"messages[N].content[M].image_url"`
`image_url.url` absent	`missing_required_parameter`	`"messages[N].content[M].image_url.url"`
`image_url.url` empty or not a string	`invalid_value`	`"messages[N].content[M].image_url.url"`
`image_url.detail` not `"auto"`, `"low"`, or `"high"` (when present)	`invalid_value`	`"messages[N].content[M].image_url.detail"`

Example error:

{
  "error": {
    "message": "messages[0].content[1].image_url.url is required",
    "type": "invalid_request_error",
    "param": "messages[0].content[1].image_url.url",
    "code": "missing_required_parameter"
  }
}
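The text and image_url rules in the tables above can be mirrored client-side to produce the same param paths. This is a sketch, and the gateway remains authoritative. check_content_block is a hypothetical name. Like the gateway, it lets unknown block types through, and it also skips input_audio blocks, whose checks are listed under Audio input.

```python
from typing import Any, Optional


def check_content_block(block: Any, n: int, m: int) -> Optional[tuple[str, str]]:
    """Return (code, param) for the first problem in messages[n].content[m], or None."""
    p = f"messages[{n}].content[{m}]"
    if not isinstance(block, dict):
        return ("invalid_type", p)
    if "type" not in block:
        return ("missing_required_parameter", f"{p}.type")
    if not isinstance(block["type"], str):
        return ("invalid_type", f"{p}.type")
    if block["type"] == "text":
        if "text" not in block:
            return ("missing_required_parameter", f"{p}.text")
        if not isinstance(block["text"], str):
            return ("invalid_type", f"{p}.text")
    elif block["type"] == "image_url":
        if "image_url" not in block:
            return ("missing_required_parameter", f"{p}.image_url")
        img = block["image_url"]
        if not isinstance(img, dict):
            return ("invalid_type", f"{p}.image_url")
        if "url" not in img:
            return ("missing_required_parameter", f"{p}.image_url.url")
        if not isinstance(img["url"], str) or not img["url"]:
            return ("invalid_value", f"{p}.image_url.url")
        if "detail" in img and img["detail"] not in ("auto", "low", "high"):
            return ("invalid_value", f"{p}.image_url.detail")
    # Other block types (including input_audio here) pass through unchecked.
    return None
```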

Example — OCR with mistral-ocr

mistral-ocr uses the same multimodal content format. Pass a document or image URL and the model returns the extracted text:

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-ocr",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Extract all line items and totals from this invoice." },
          { "type": "image_url", "image_url": { "url": "https://example.com/invoice.pdf" } }
        ]
      }
    ]
  }'

TypeScript (OpenAI SDK; client is configured as shown in OpenAI SDK below):

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'user',
      content: [
        { type: 'text', text: 'Describe this image.' },
        { type: 'image_url', image_url: { url: 'https://example.com/photo.jpg' } },
      ],
    },
  ],
});
console.log(response.choices[0].message.content);

Python (OpenAI SDK; client is configured as shown in OpenAI SDK below):

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)

Audio input (`input_audio`)

Audio-input-capable models such as gpt-4o-audio-preview accept base64-encoded audio clips directly in user message content via the input_audio block type. Use this to transcribe, analyse, or respond to audio without a separate transcription step.

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-audio-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "What is being said in this audio clip?" },
          {
            "type": "input_audio",
            "input_audio": {
              "data": "<base64-encoded-audio>",
              "format": "mp3"
            }
          }
        ]
      }
    ]
  }'

`input_audio` block fields

Field	Type	Required	Description
`type`	string	Yes	Must be `"input_audio"`
`input_audio`	object	Yes	Audio payload object (see below)
`input_audio.data`	string	Yes	Base64-encoded audio — must be a non-empty string
`input_audio.format`	string	Yes	Audio encoding format — one of `flac`, `m4a`, `mp3`, `ogg`, `wav`, `webm`
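An input_audio block can be built from a local file. This sketch assumes the file extension matches the audio encoding, which is not guaranteed, and input_audio_block is a hypothetical helper. Unlike image data URLs, data carries bare base64 with no data: prefix, as in the example above. Typical use: with open("clip.mp3", "rb") as f: block = input_audio_block(f.read(), "clip.mp3").

```python
import base64
import os

# Formats the gateway accepts for input_audio.format (from the table above).
AUDIO_FORMATS = {"flac", "m4a", "mp3", "ogg", "wav", "webm"}


def input_audio_block(data: bytes, filename: str) -> dict:
    """Build an input_audio content block, inferring format from the file extension."""
    fmt = os.path.splitext(filename)[1].lstrip(".").lower()
    if fmt not in AUDIO_FORMATS:
        raise ValueError(f"unsupported audio format: {fmt!r}")
    if not data:
        raise ValueError("audio data must not be empty")  # gateway rejects empty data
    return {
        "type": "input_audio",
        "input_audio": {"data": base64.b64encode(data).decode("ascii"), "format": fmt},
    }
```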

`input_audio` content block validation

When a user message contains an input_audio block, the gateway validates its shape before forwarding to Fuelix. Malformed blocks return a descriptive 400 rather than an opaque upstream 422.

Condition	Code	`param`
`input_audio` field absent	`missing_required_parameter`	`"messages[N].content[M].input_audio"`
`input_audio` not an object	`invalid_type`	`"messages[N].content[M].input_audio"`
`input_audio.data` absent	`missing_required_parameter`	`"messages[N].content[M].input_audio.data"`
`input_audio.data` not a string	`invalid_type`	`"messages[N].content[M].input_audio.data"`
`input_audio.data` empty string	`invalid_value`	`"messages[N].content[M].input_audio.data"`
`input_audio.format` absent	`missing_required_parameter`	`"messages[N].content[M].input_audio.format"`
`input_audio.format` not a string	`invalid_type`	`"messages[N].content[M].input_audio.format"`
`input_audio.format` not a recognised format	`invalid_value`	`"messages[N].content[M].input_audio.format"`

OpenAI SDK

TypeScript:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me/v1',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'What is 2 + 2?' }],
});

console.log(response.choices[0].message.content);

Python:

from openai import OpenAI

client = OpenAI(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me/v1",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)

print(response.choices[0].message.content)

Reasoning models (o-series)

The o1, o1-mini, o1-preview, o3, o3-mini, and o4-mini models — and their dated variants such as o1-2024-12-17 and o4-mini-2025-04-16 — support the reasoning_effort parameter, which controls how much compute the model spends on internal reasoning before producing a response.

Value	Description
`"low"`	Fastest, least reasoning compute
`"medium"`	Balanced (default on most o-series models)
`"high"`	Slowest, maximum reasoning compute
`"auto"`	Let the model pick the most appropriate reasoning level automatically

Passing any other value (e.g. "extreme" or an integer) returns 400 invalid_value immediately, before the request reaches Fuelix:

{
  "error": {
    "message": "reasoning_effort must be one of: low, medium, high, auto",
    "type": "invalid_request_error",
    "param": "reasoning_effort",
    "code": "invalid_value"
  }
}

Example — o3-mini with reasoning_effort

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "o3-mini",
    "messages": [{ "role": "user", "content": "Prove that sqrt(2) is irrational." }],
    "reasoning_effort": "high"
  }'

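TypeScript (OpenAI SDK, using the client configured in OpenAI SDK above):

// Recent openai-node releases include reasoning_effort in their request types.
// On an older 4.x release that lacks it, add a // @ts-expect-error line above the property.
const response = await client.chat.completions.create({
  model: 'o3-mini',
  messages: [{ role: 'user', content: 'Prove that sqrt(2) is irrational.' }],
  reasoning_effort: 'high',
});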

Provider-specific extensions

BVE Gateway validates and forwards several non-OpenAI parameters that specific upstream providers support. These are accepted by the gateway, validated for type/range, and passed through to Fuelix unchanged.

Gemini thinking budget (`thinking_config`)

Gemini 2.5 Flash and 2.5 Pro support a thinking budget via the thinking_config parameter. The thinking budget controls how many tokens the model may spend on internal chain-of-thought before generating the response.

`thinking_budget` value	Effect
`0`	Disable thinking — model responds like a standard non-thinking model
`1`–`N`	Allow up to N thinking tokens before generating
Omitted	Provider default

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-2.5-flash",
    "messages": [{ "role": "user", "content": "Prove the Pythagorean theorem." }],
    "thinking_config": { "thinking_budget": 8000 }
  }'

Passing a non-object value (e.g. "thinking_config": 5000) returns 400 invalid_type; a negative, fractional, or non-numeric thinking_budget returns 400 invalid_value. Both are rejected before the request reaches Fuelix.
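The same rules can be applied when building the request body. This is a sketch. gemini_thinking_config and gemini_request are hypothetical helpers, and the default model ID is taken from the example above. Python's bool is a subclass of int, so it is excluded explicitly; a JSON true is not a number and would be rejected by the gateway.

```python
def gemini_thinking_config(budget: int) -> dict:
    """Build thinking_config, rejecting values the gateway documents as invalid."""
    if isinstance(budget, bool) or not isinstance(budget, int) or budget < 0:
        raise ValueError("thinking_budget must be a non-negative integer")
    return {"thinking_budget": budget}  # 0 disables thinking


def gemini_request(prompt: str, budget: int, model: str = "gemini-2.5-flash") -> dict:
    """Build a chat completions body with a Gemini thinking budget."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "thinking_config": gemini_thinking_config(budget),
    }
```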

Claude extended thinking

Claude 3.7 Sonnet (claude-3-7-sonnet) and later Claude models support extended thinking via the thinking parameter. When enabled, the model produces internal reasoning before generating its final answer. This reasoning appears as thinking-type content blocks in the response, ahead of the normal text block.

Field	Type	Required	Description
`type`	string	Yes	`"enabled"` to turn on thinking; `"disabled"` to turn it off
`budget_tokens`	integer	When `type` is `"enabled"`	Maximum thinking tokens — must be an integer ≥ 1024

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{ "role": "user", "content": "Step by step: a train leaves Chicago at 60 mph, another leaves NYC at 80 mph, 800 miles apart. When do they meet?" }],
    "thinking": { "type": "enabled", "budget_tokens": 5000 }
  }'

When thinking.type is "enabled", the response may include thinking content blocks before the text answer:

{
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": [
        { "type": "thinking", "thinking": "Let me set up the equation. Combined speed = 60 + 80 = 140 mph. Time = 800 / 140 ≈ 5.71 hours." },
        { "type": "text", "text": "The trains meet after approximately 5 hours 43 minutes." }
      ]
    },
    "finish_reason": "end_turn"
  }]
}
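Because `message.content` becomes an array of blocks when thinking is enabled, client code that expects a plain string needs to handle both forms. The sketch below assumes the block shapes shown in the example response and that `content` is an ordinary string when thinking is off; `splitThinking` is a hypothetical helper.

```typescript
// Content block shapes as they appear in the example response above.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };

// Separates reasoning from the final answer. A plain string is treated as the
// answer with no reasoning; block types other than these two are skipped.
function splitThinking(content: string | ContentBlock[]): { reasoning: string; answer: string } {
  if (typeof content === "string") return { reasoning: "", answer: content };
  const reasoning: string[] = [];
  const answer: string[] = [];
  for (const block of content) {
    if (block.type === "thinking") reasoning.push(block.thinking);
    else if (block.type === "text") answer.push(block.text);
  }
  return { reasoning: reasoning.join("\n"), answer: answer.join("") };
}
```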

Constraint violations return 400 before reaching Fuelix:

thinking not an object → invalid_type
thinking.type absent → missing_required_parameter
thinking.type not "enabled" or "disabled" → invalid_value
thinking.budget_tokens absent when type is "enabled" → missing_required_parameter
thinking.budget_tokens < 1024 or not an integer → invalid_value
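These rules can be mirrored client-side to catch mistakes before sending. The hypothetical `validateThinking` below returns the documented error code for each case, checked in the order listed; treating an array as "not an object" is an assumption.

```typescript
type ThinkingErrorCode = "invalid_type" | "missing_required_parameter" | "invalid_value";

// Hypothetical client-side mirror of the documented constraints on `thinking`.
// Returns null when the value should pass, otherwise the documented error code.
function validateThinking(thinking: unknown): ThinkingErrorCode | null {
  if (typeof thinking !== "object" || thinking === null || Array.isArray(thinking)) {
    return "invalid_type";
  }
  const t = thinking as { type?: unknown; budget_tokens?: unknown };
  if (t.type === undefined) return "missing_required_parameter";
  if (t.type !== "enabled" && t.type !== "disabled") return "invalid_value";
  if (t.type === "enabled") {
    if (t.budget_tokens === undefined) return "missing_required_parameter";
    // budget_tokens must be an integer of at least 1024
    if (typeof t.budget_tokens !== "number" || !Number.isInteger(t.budget_tokens) || t.budget_tokens < 1024) {
      return "invalid_value";
    }
  }
  return null;
}
```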

OpenRouter top-a sampling (`top_a`)

top_a is an OpenRouter-native sampling parameter. A token is only considered if its probability satisfies:

P(token) ≥ top_a × P(max_token)²

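Valid range: [0, 1]. A value of 0 disables the filter (the threshold becomes 0); higher values raise the threshold and prune more low-probability tokens. top_a can be combined with temperature and top_p.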
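To make the threshold concrete: with top_a set to 0.3 and a most-likely token at probability 0.6, the cutoff is 0.3 × 0.6² = 0.108, so a token at 0.1 is dropped while one at 0.3 survives. The sketch below applies that rule to a toy distribution. It illustrates the formula only and is not OpenRouter’s sampler; a real sampler would also renormalize the surviving probabilities before sampling.

```typescript
// Illustrative top-a filter: keep a token only if P(token) >= topA * P(max)^2.
function topAFilter(probs: Map<string, number>, topA: number): Map<string, number> {
  if (topA < 0 || topA > 1) throw new RangeError("top_a must be in [0, 1]");

  // Probability of the most likely token.
  let pMax = 0;
  probs.forEach((p) => {
    if (p > pMax) pMax = p;
  });

  const threshold = topA * pMax * pMax;
  const kept = new Map<string, number>();
  probs.forEach((p, token) => {
    if (p >= threshold) kept.set(token, p);
  });
  return kept; // not renormalized
}
```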

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{ "role": "user", "content": "Write a haiku." }],
    "top_a": 0.3
  }'

OpenRouter prompt transforms (`transforms`)

transforms is an OpenRouter-native prompt-transformation pipeline applied before inference. Pass an array of transformation name strings:

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{ "role": "user", "content": "Summarise this document: ..." }],
    "transforms": ["middle-out"]
  }'

The only documented transform value is "middle-out", OpenRouter’s context-window compression. When a prompt exceeds a model’s context limit, OpenRouter removes or truncates messages from the middle of the prompt until it fits. The request then succeeds instead of failing with a context-length error, but the model never sees the removed content, so use it only when losing some middle context is acceptable.

transforms is forwarded unchanged to Fuelix. Providers that do not support it silently ignore the field.

Validation rules:

transforms must be an array — a string or object returns 400 invalid_type.
Each element must be a non-empty string — non-strings return 400 invalid_type; empty strings return 400 invalid_value.
An empty array ([]) is accepted (no transforms applied).
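The transforms rules map directly onto a few checks. The sketch below is a hypothetical client-side mirror (`validateTransforms` is not a gateway or SDK function) that returns the documented error code, or `null` when the gateway should accept the value.

```typescript
type TransformsErrorCode = "invalid_type" | "invalid_value";

// Hypothetical mirror of the documented transforms validation rules.
function validateTransforms(transforms: unknown): TransformsErrorCode | null {
  if (!Array.isArray(transforms)) return "invalid_type"; // e.g. a string or object
  for (const item of transforms) {
    if (typeof item !== "string") return "invalid_type";
    if (item.length === 0) return "invalid_value";
  }
  return null; // includes [] (no transforms applied)
}
```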

OpenRouter provider routing (`provider`)

The provider object controls which upstream providers OpenRouter may route a request to, along with fallback and data-collection settings. When omitted, OpenRouter applies its default routing logic.

Field	Type	Description
`order`	string[]	Preferred provider order — OpenRouter tries each in sequence (e.g. `["Anthropic", "AWS Bedrock"]`)
`only`	string[]	Restrict routing to only these providers — any not listed are excluded
`ignore`	string[]	Exclude these providers from routing — opposite of `only`
`allow_fallbacks`	boolean	Whether to try other providers if the preferred provider fails (`true` by default)
`require_parameters`	boolean	Only route to providers that support all parameters in the request
`data_collection`	string	Either `"allow"` (default) or `"deny"` to opt out of provider training data collection
`quantizations`	string[]	Restrict to providers offering specific quantization levels (e.g. `["fp8", "int4"]`)
`sort`	string	Sort providers by a criterion before routing (e.g. `"throughput"`, `"price"`, `"latency"`)

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{ "role": "user", "content": "Hello" }],
    "provider": {
      "order": ["OpenAI", "Azure"],
      "allow_fallbacks": true,
      "data_collection": "deny",
      "require_parameters": false
    }
  }'

provider is forwarded unchanged to Fuelix. Non-OpenRouter providers silently ignore the field.

Validation rules:

provider must be an object — a string, number, or array returns 400 invalid_type.
allow_fallbacks and require_parameters must be booleans when present.
data_collection must be exactly "allow" or "deny" — any other string returns 400 invalid_value.
order, only, ignore, and quantizations must be string arrays. Non-arrays return 400 invalid_type; non-string elements return 400 invalid_type, and empty strings return 400 invalid_value.
sort must be a non-empty string when present.
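For TypeScript callers, the routing table translates into an interface and the validation rules into a pre-flight check. In this sketch `ProviderRouting` and `checkProvider` are hypothetical names. The check returns readable messages rather than error codes, because the rules above do not state a code for every case (for example, a non-boolean `allow_fallbacks`). Arrays are treated as non-objects, matching the rule that an array `provider` returns invalid_type.

```typescript
// Fields documented in the provider routing table above.
interface ProviderRouting {
  order?: string[];
  only?: string[];
  ignore?: string[];
  allow_fallbacks?: boolean;
  require_parameters?: boolean;
  data_collection?: "allow" | "deny";
  quantizations?: string[];
  sort?: string;
}

const STRING_ARRAY_FIELDS = ["order", "only", "ignore", "quantizations"] as const;
const BOOLEAN_FIELDS = ["allow_fallbacks", "require_parameters"] as const;

// Hypothetical pre-flight check mirroring the documented rules.
// An empty result means the gateway should accept the object.
function checkProvider(value: unknown): string[] {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return ["provider must be an object"];
  }
  const p = value as Record<string, unknown>;
  const problems: string[] = [];
  for (const key of STRING_ARRAY_FIELDS) {
    const v = p[key];
    if (v === undefined) continue;
    if (!Array.isArray(v)) problems.push(`${key} must be an array`);
    else if (v.some((s) => typeof s !== "string" || s.length === 0)) {
      problems.push(`${key} must contain only non-empty strings`);
    }
  }
  for (const key of BOOLEAN_FIELDS) {
    if (p[key] !== undefined && typeof p[key] !== "boolean") problems.push(`${key} must be a boolean`);
  }
  const dc = p.data_collection;
  if (dc !== undefined && dc !== "allow" && dc !== "deny") {
    problems.push('data_collection must be "allow" or "deny"');
  }
  const sort = p.sort;
  if (sort !== undefined && (typeof sort !== "string" || sort.length === 0)) {
    problems.push("sort must be a non-empty string");
  }
  return problems;
}

// Typed equivalent of the provider object in the curl example above.
const routing: ProviderRouting = {
  order: ["OpenAI", "Azure"],
  allow_fallbacks: true,
  data_collection: "deny",
  require_parameters: false,
};
```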

Web search (OpenAI)

OpenAI’s Chat Completions search models, such as gpt-4o-search-preview and gpt-4o-mini-search-preview, can perform live web searches before generating a response. Enable web search by including the web_search_options object in the request body.

Field	Type	Required	Description
`search_context_size`	string	No	Amount of web search context to retrieve: `"low"`, `"medium"` (default), or `"high"`. Higher values improve answer quality at the cost of more tokens.
`user_location`	object	No	Hint the model’s search toward a geographic region. See below.

user_location shape:

{
  "type": "approximate",
  "approximate": {
    "country": "US",
    "region": "California",
    "city": "San Francisco",
    "timezone": "America/Los_Angeles"
  }
}

All user_location.approximate fields are optional strings. Only type: "approximate" is accepted.

curl https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-search-preview",
    "messages": [{ "role": "user", "content": "What is the latest news about Cloudflare?" }],
    "web_search_options": {
      "search_context_size": "high",
      "user_location": {
        "type": "approximate",
        "approximate": { "country": "US", "timezone": "America/New_York" }
      }
    }
  }'

Passing a non-object web_search_options (e.g. "web_search_options": "medium") returns 400 invalid_type, and an invalid search_context_size returns 400 invalid_value. Both are rejected before the request reaches Fuelix.
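The option shapes above can be typed, and the two documented failure modes checked, before a request is sent. This is a hypothetical sketch: `WebSearchOptions` and `checkWebSearchOptions` are illustrative names, and it assumes a non-object (including an array) maps to invalid_type and an unknown search_context_size maps to invalid_value.

```typescript
type SearchContextSize = "low" | "medium" | "high";

// Shapes documented above; every approximate-location field is optional.
interface WebSearchOptions {
  search_context_size?: SearchContextSize;
  user_location?: {
    type: "approximate";
    approximate: { country?: string; region?: string; city?: string; timezone?: string };
  };
}

// Hypothetical pre-check for the two documented failure modes.
function checkWebSearchOptions(value: unknown): "invalid_type" | "invalid_value" | null {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return "invalid_type";
  const size = (value as { search_context_size?: unknown }).search_context_size;
  if (size !== undefined && size !== "low" && size !== "medium" && size !== "high") {
    return "invalid_value";
  }
  return null;
}

// Typed equivalent of web_search_options in the curl example above.
const searchOptions: WebSearchOptions = {
  search_context_size: "high",
  user_location: { type: "approximate", approximate: { country: "US", timezone: "America/New_York" } },
};
```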

Audio output (`modalities` and `audio`)

Audio-capable models such as gpt-4o-audio-preview can return spoken audio alongside (or instead of) text. Use modalities to declare which output types you want, and audio to configure voice and encoding format.

{
  "model": "gpt-4o-audio-preview",
  "modalities": ["text", "audio"],
  "audio": { "voice": "alloy", "format": "mp3" },
  "messages": [{ "role": "user", "content": "Say hello in a cheerful tone." }]
}

`modalities`

A non-empty array of output modality strings. Accepted values:

Value	Description
`"text"`	Return a text completion in `choices[N].message.content`
`"audio"`	Return audio data in `choices[N].message.audio` (requires `audio` config)

Omit modalities entirely for standard text-only output.

`audio`

Required when modalities includes "audio". An object with two fields:

Field	Type	Required	Description
`voice`	string	Yes	Voice ID for the generated speech. One of: `alloy`, `ash`, `ballad`, `cedar`, `coral`, `echo`, `fable`, `marin`, `nova`, `onyx`, `sage`, `shimmer`, `verse`
`format`	string	No	Audio encoding format. One of: `aac`, `flac`, `mp3`, `opus`, `pcm16`, `pcm24`, `wav`. When omitted, the provider’s default format is used.
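Audio comes back base64-encoded. The sketch below assumes the OpenAI response shape, where `choices[0].message.audio` carries `id`, `data` (base64), `expires_at`, and `transcript`; `extractAudio` is a hypothetical helper that decodes the payload into a Node.js `Buffer`.

```typescript
// Subset of the OpenAI audio response fields used here.
interface AudioMessage {
  audio?: { id: string; data: string; expires_at: number; transcript: string };
}

// Decodes the base64 audio payload from the first choice. Returns null when the
// response carries no audio (e.g. modalities did not include "audio").
function extractAudio(
  response: { choices: { message: AudioMessage }[] },
): { bytes: Buffer; transcript: string } | null {
  const audio = response.choices[0]?.message.audio;
  if (!audio) return null;
  return { bytes: Buffer.from(audio.data, "base64"), transcript: audio.transcript };
}
```

From there, `writeFileSync` from `node:fs` can save `bytes` to a file whose extension matches the requested `audio.format`, for example `hello.mp3`.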

Predicted Outputs (OpenAI)

OpenAI’s Predicted Outputs feature uses speculative decoding to significantly reduce latency when you know most of the response text in advance — for example, code editing tasks where most lines remain unchanged.

Supply the expected output text via the prediction parameter:

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Replace 'Hello' with 'Hi' in this function:\n\ndef greet():\n    return 'Hello, world!'"
    }
  ],
  "prediction": {
    "type": "content",
    "content": "def greet():\n    return 'Hello, world!'"
  }
}

The content field accepts either a plain string or an array of {type: "text", text: string} content blocks:

{
  "prediction": {
    "type": "content",
    "content": [
      { "type": "text", "text": "def greet():\n    return 'Hello, world!'" }
    ]
  }
}
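For editing tasks, the usual pattern is to send the current file both in the prompt and as the prediction, since most of it is expected to come back unchanged. The sketch below builds the first request body shown above from its parts. `editRequest` and `predictionText` are hypothetical helpers, and the second one assumes, without confirmation from this page, that an array of text blocks is equivalent to its concatenated text.

```typescript
type TextBlock = { type: "text"; text: string };
interface Prediction {
  type: "content";
  content: string | TextBlock[];
}

// Builds a Predicted Outputs request: the current code is quoted in the prompt
// and also supplied as the prediction.
function editRequest(instruction: string, currentCode: string) {
  const prediction: Prediction = { type: "content", content: currentCode };
  return {
    model: "gpt-4o",
    messages: [{ role: "user" as const, content: `${instruction}\n\n${currentCode}` }],
    prediction,
  };
}

// Flattens either accepted content form into one string (assumed concatenation).
function predictionText(p: Prediction): string {
  return typeof p.content === "string" ? p.content : p.content.map((b) => b.text).join("");
}
```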

Next steps

Streaming: SSE format, token usage tracking, and streaming examples in Python and TypeScript

Models: Full model catalog including o-series, vision, and OCR models

cURL Examples: Quick-reference cURL for every endpoint including vision and OCR

Rate Limits & Quotas: Per-key RPM/RPD limits, monthly caps, and 429 Retry-After handling

Errors: Full error code reference with envelope shape and handling examples