Responses API

POST https://api.bve.me/v1/responses

Requires Authorization: Bearer sk-bve-YOUR_KEY.

The Responses API is OpenAI’s newer, stateful generation interface. BVE Gateway proxies this endpoint directly to Fuelix.

Request body

{
  "model": "gpt-4o",
  "input": "What is 2 + 2?",
  "max_output_tokens": 100
}

Field	Type	Required	Description
`model`	string	Yes	GPT model ID (e.g. `gpt-4o`, `gpt-4.1`)
`input`	string or array	Yes	Text prompt or message array
`instructions`	string	No	System-level instructions (equivalent to a `system` message)
`max_output_tokens`	integer	No	Maximum number of output tokens to generate (minimum 16)
`temperature`	number	No	Sampling temperature — `[0, 2]`
`top_p`	number	No	Nucleus sampling — `[0, 1]`
`stream`	boolean	No	Enable SSE streaming
`store`	boolean	No	Persist the response in OpenAI’s response store
`parallel_tool_calls`	boolean	No	Allow the model to call multiple tools simultaneously
`tools`	array	No	Tool definitions for function calling
`tool_choice`	string or object	No	Tool selection — `"none"`, `"auto"`, `"required"`, or `{type:"function", function:{name}}`
`response_format`	object	No	Output format — `{type:"text"}`, `{type:"json_object"}`, or `{type:"json_schema", json_schema:{name, schema}}`
`reasoning`	object	No	Reasoning budget for o-series models: `{effort: "low" | "medium" | "high", summary?: string}`. Note: `"auto"` is not accepted here; use the top-level `reasoning_effort` field with `"auto"` in `/v1/chat/completions` instead
`user`	string	No	End-user identifier forwarded to Fuelix for abuse detection (≤ 256 chars)
`previous_response_id`	string	No	Response ID to continue — enables multi-turn conversations
`truncation`	string	No	Context-window truncation strategy when the conversation exceeds the model limit: `"auto"` or `"disabled"`
`metadata`	object	No	String-to-string map attached to the response for caller tracking (max 16 pairs; keys ≤ 64 chars; values ≤ 512 chars)
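
As a rough illustration of how the fields in the table fit together, the sketch below assembles a request body and leaves out any optional field that was not set. The helper name `build_request_body` and the `OPTIONAL_FIELDS` set are hypothetical conveniences, not part of the gateway or the OpenAI SDK; the field names themselves come from the table above.

```python
from typing import Any, Union

# Optional field names copied from the request-body table.
OPTIONAL_FIELDS = {
    "instructions", "max_output_tokens", "temperature", "top_p", "stream",
    "store", "parallel_tool_calls", "tools", "tool_choice", "response_format",
    "reasoning", "user", "previous_response_id", "truncation", "metadata",
}


def build_request_body(model: str, input_value: Union[str, list], **optional: Any) -> dict:
    """Assemble a /v1/responses body (hypothetical helper).

    Optional fields whose value is None are omitted, and names that are
    not in the table are rejected early to catch typos.
    """
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown field(s): {sorted(unknown)}")
    body = {"model": model, "input": input_value}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body


body = build_request_body("gpt-4o", "What is 2 + 2?", max_output_tokens=100)
print(body)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the POST body.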

Response

{
  "id": "resp_abc123",
  "object": "response",
  "created_at": 1716288000,
  "model": "gpt-4o-2024-11-20",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "4" }
      ]
    }
  ],
  "usage": {
    "input_tokens": 7,
    "output_tokens": 1,
    "total_tokens": 8
  }
}
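
When working with the raw JSON rather than an SDK, the generated text has to be pulled out of the nested `output` array. The following is a minimal sketch based only on the response shape shown above; `extract_output_text` is a hypothetical helper name, and other output item types (for example tool calls) are simply skipped.

```python
def extract_output_text(response: dict) -> str:
    """Concatenate every output_text block found in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # ignore non-message output items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "".join(parts)


sample = {
    "id": "resp_abc123",
    "object": "response",
    "output": [
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "4"}],
        }
    ],
    "usage": {"input_tokens": 7, "output_tokens": 1, "total_tokens": 8},
}
print(extract_output_text(sample))  # prints 4
```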

cURL example

curl https://api.bve.me/v1/responses \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is 2 + 2?",
    "max_output_tokens": 100
  }'

Multi-turn with previous_response_id

# First turn
curl https://api.bve.me/v1/responses \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "My name is Alice.",
    "max_output_tokens": 100
  }'

# Second turn: pass the id returned by the first turn as previous_response_id
curl https://api.bve.me/v1/responses \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "What is my name?",
    "previous_response_id": "resp_abc123",
    "max_output_tokens": 100
  }'

Basic request with the OpenAI Node.js SDK:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me/v1',
});

const response = await client.responses.create({
  model: 'gpt-4o',
  input: 'What is 2 + 2?',
});

console.log(response.output_text);

Basic request with the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me/v1",
)

response = client.responses.create(
    model="gpt-4o",
    input="What is 2 + 2?",
)

print(response.output_text)
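
The multi-turn flow shown in the cURL example above amounts to copying the `id` of one response into the `previous_response_id` of the next request. A minimal sketch of that step follows; `follow_up_body` is a hypothetical helper, and the default model is only an example value.

```python
def follow_up_body(previous_response: dict, user_input: str, model: str = "gpt-4o") -> dict:
    """Build the next-turn request body from a prior response (hypothetical helper)."""
    response_id = previous_response.get("id")
    if not isinstance(response_id, str) or not response_id:
        raise ValueError("previous response has no usable id")
    return {
        "model": model,
        "input": user_input,
        "previous_response_id": response_id,
    }


first_turn = {"id": "resp_abc123", "object": "response"}
print(follow_up_body(first_turn, "What is my name?"))
```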

Streaming

Pass "stream": true (or use client.responses.stream()) to receive a Server-Sent Events stream. See Streaming — Responses API for the event format and a full code example.

const stream = client.responses.stream({
  model: 'gpt-4o',
  input: 'Count from 1 to 5.',
});

for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta);
  }
}

const finalResponse = await stream.finalResponse();
console.log('Usage:', finalResponse.usage);

with client.responses.stream(
    model="gpt-4o",
    input="Count from 1 to 5.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)
    print()
    # Fetch the final response while the stream context is still open
    final_response = stream.get_final_response()

print("Usage:", final_response.usage)

curl https://api.bve.me/v1/responses \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Count from 1 to 5.",
    "stream": true
  }'
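
For clients that read the raw stream without an SDK, the sketch below shows one way to collect the text deltas. It assumes each event arrives as a single `data:` line carrying a JSON object with a `type` field and, for `response.output_text.delta` events, a `delta` string (the same fields the SDK examples above read). The function names are hypothetical, and multi-line `data:` fields are not handled.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the decoded JSON payload of each data: line."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip event: lines, comments, and blank separators
        payload = line[len("data:"):].strip()
        if not payload or payload == "[DONE]":
            continue
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Join the delta of every response.output_text.delta event."""
    return "".join(
        event.get("delta", "")
        for event in iter_sse_events(lines)
        if event.get("type") == "response.output_text.delta"
    )


sample_lines = [
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "1, "}',
    "",
    "event: response.output_text.delta",
    'data: {"type": "response.output_text.delta", "delta": "2"}',
    "",
    'data: {"type": "response.completed"}',
]
print(collect_text(sample_lines))  # prints 1, 2
```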

Retrieve a response

GET https://api.bve.me/v1/responses/:id

Retrieve a previously created response by its ID. Proxied directly to Fuelix.

curl https://api.bve.me/v1/responses/resp_abc123 \
  -H "Authorization: Bearer sk-bve-YOUR_KEY"

Delete a response

DELETE https://api.bve.me/v1/responses/:id

Delete a stored response. Proxied directly to Fuelix. Returns 200 with a deletion confirmation object on success.

curl -X DELETE https://api.bve.me/v1/responses/resp_abc123 \
  -H "Authorization: Bearer sk-bve-YOUR_KEY"
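
The retrieve and delete calls differ only in HTTP method, so both can be built by one function. This is a standard-library sketch under that assumption; `response_request` is a hypothetical helper that builds the request without sending it.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.bve.me/v1"


def response_request(method: str, response_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET or DELETE request for /v1/responses/:id (hypothetical helper)."""
    if method not in ("GET", "DELETE"):
        raise ValueError("method must be GET or DELETE")
    # Percent-encode the id so it cannot alter the URL path.
    safe_id = urllib.parse.quote(response_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/responses/{safe_id}",
        method=method,
        headers={"Authorization": f"Bearer {api_key}"},
    )


request = response_request("DELETE", "resp_abc123", "sk-bve-YOUR_KEY")
print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` sends the request.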

Gateway validation

The gateway validates required fields before forwarding to Fuelix. Missing or invalid fields return 400 with a standard error shape instead of a Fuelix-specific Pydantic error.

Required fields

Missing / invalid	Code	`param`
`model` absent	`missing_required_parameter`	`"model"`
`model` not a string	`invalid_type`	`"model"`
`input` absent	`missing_required_parameter`	`"input"`
`input` not a string or array	`invalid_type`	`"input"`

Optional field constraints

Field	Constraint	Code	`param`
`instructions`	must be a string	`invalid_type`	`"instructions"`
`temperature`	must be a number	`invalid_type`	`"temperature"`
`temperature`	must be in `[0, 2]`	`invalid_value`	`"temperature"`
`top_p`	must be a number	`invalid_type`	`"top_p"`
`top_p`	must be in `[0, 1]`	`invalid_value`	`"top_p"`
`max_output_tokens`	must be an integer ≥ 16	`invalid_value`	`"max_output_tokens"`
`stream`	must be a boolean	`invalid_type`	`"stream"`
`store`	must be a boolean	`invalid_type`	`"store"`
`parallel_tool_calls`	must be a boolean	`invalid_type`	`"parallel_tool_calls"`
`previous_response_id`	must be a string	`invalid_type`	`"previous_response_id"`
`truncation`	must be a string	`invalid_type`	`"truncation"`
`truncation`	must be `"auto"` or `"disabled"`	`invalid_value`	`"truncation"`
`metadata`	must be an object (not an array)	`invalid_type`	`"metadata"`
`metadata`	must not have more than 16 key-value pairs	`invalid_value`	`"metadata"`
`metadata` (keys)	key must not exceed 64 characters	`invalid_value`	`"metadata"`
`metadata["KEY"]`	value must be a string	`invalid_type`	`"metadata.KEY"`
`metadata["KEY"]`	value must not exceed 512 characters	`invalid_value`	`"metadata.KEY"`
`user`	must be a string	`invalid_type`	`"user"`
`user`	must be ≤ 256 characters	`invalid_value`	`"user"`
`reasoning`	must be an object (not an array)	`invalid_type`	`"reasoning"`
`reasoning.effort`	not a string	`invalid_type`	`"reasoning.effort"`
`reasoning.effort`	not `"low"`, `"medium"`, or `"high"` (including `"auto"`, which is only valid in `/v1/chat/completions`)	`invalid_value`	`"reasoning.effort"`
`reasoning.summary`	must be a string	`invalid_type`	`"reasoning.summary"`
`response_format`	must be an object with a `type` field	`invalid_type`	`"response_format"`
`response_format.type`	must be `"text"`, `"json_object"`, or `"json_schema"`	`invalid_value`	`"response_format.type"`
`response_format.json_schema`	required when `type` is `"json_schema"`	`missing_required_parameter`	`"response_format.json_schema"`
`response_format.json_schema.name`	required non-empty string	`missing_required_parameter`	`"response_format.json_schema.name"`
`tools`	not an array	`invalid_type`	`"tools"`
`tools`	empty array	`invalid_value`	`"tools"`
`tools[N]`	not an object	`invalid_type`	`"tools[N]"`
`tools[N].type`	absent	`missing_required_parameter`	`"tools[N].type"`
`tools[N].type`	not a string	`invalid_type`	`"tools[N].type"`
`tools[N].type`	not `"function"`	`invalid_value`	`"tools[N].type"`
`tools[N].function`	absent	`missing_required_parameter`	`"tools[N].function"`
`tools[N].function`	not an object	`invalid_type`	`"tools[N].function"`
`tools[N].function.name`	absent or not a string	`invalid_type`	`"tools[N].function.name"`
`tools[N].function.name`	empty string	`invalid_value`	`"tools[N].function.name"`
`tools[N].function.name`	fails `[a-zA-Z0-9_-]{1,64}` regex	`invalid_value`	`"tools[N].function.name"`
`tool_choice`	not a string or object	`invalid_type`	`"tool_choice"`
`tool_choice` (string)	not `"none"`, `"auto"`, or `"required"`	`invalid_value`	`"tool_choice"`
`tool_choice.type`	absent	`missing_required_parameter`	`"tool_choice.type"`
`tool_choice.type`	not a string	`invalid_type`	`"tool_choice.type"`
`tool_choice.type`	not `"function"`	`invalid_value`	`"tool_choice.type"`
`tool_choice.function`	absent	`missing_required_parameter`	`"tool_choice.function"`
`tool_choice.function`	not an object	`invalid_type`	`"tool_choice.function"`
`tool_choice.function.name`	absent or not a string	`invalid_type`	`"tool_choice.function.name"`
`tool_choice.function.name`	empty string	`invalid_value`	`"tool_choice.function.name"`
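
To make a few rows of the table concrete, here is a partial client-side sketch covering `temperature`, `top_p`, `max_output_tokens`, `metadata`, and tool function names. The function names are hypothetical, and the sketch collects every problem it finds, whereas the gateway may stop at the first one; that behavior is not stated here.

```python
import re

# Regex taken from the tools[N].function.name row of the table.
TOOL_NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def check_optional(body: dict) -> list:
    """Return (param, code) pairs for a subset of the optional-field rules."""
    problems = []
    for field, low, high in (("temperature", 0, 2), ("top_p", 0, 1)):
        if field in body:
            value = body[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append((field, "invalid_type"))
            elif not low <= value <= high:
                problems.append((field, "invalid_value"))

    if "max_output_tokens" in body:
        value = body["max_output_tokens"]
        if isinstance(value, bool) or not isinstance(value, int) or value < 16:
            problems.append(("max_output_tokens", "invalid_value"))

    if "metadata" in body:
        metadata = body["metadata"]
        if not isinstance(metadata, dict):
            problems.append(("metadata", "invalid_type"))
        elif len(metadata) > 16:
            problems.append(("metadata", "invalid_value"))
        else:
            for key, value in metadata.items():
                if len(key) > 64:
                    problems.append(("metadata", "invalid_value"))
                elif not isinstance(value, str):
                    problems.append((f"metadata.{key}", "invalid_type"))
                elif len(value) > 512:
                    problems.append((f"metadata.{key}", "invalid_value"))
    return problems


def tool_name_code(name) -> "str | None":
    """Return the error code a tool function name would trigger, or None."""
    if not isinstance(name, str):
        return "invalid_type"
    if not TOOL_NAME_RE.fullmatch(name):
        return "invalid_value"  # covers both empty names and regex failures
    return None


print(check_optional({"temperature": 2.5, "metadata": {"trace": 1}}))
```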

Example 400 response:

{
  "error": {
    "message": "model is required",
    "type": "invalid_request_error",
    "param": "model",
    "code": "missing_required_parameter"
  }
}
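
Because every validation failure uses this shape, a client can turn it into a typed exception and branch on `code` or `param`. The sketch below is one possible approach; `GatewayError` and `parse_error` are hypothetical names, and bodies that are not in the standard shape fall back to the raw text.

```python
import json
from typing import Optional


class GatewayError(Exception):
    """Hypothetical client-side wrapper for a gateway error body."""

    def __init__(self, status: int, message: str,
                 code: Optional[str] = None, param: Optional[str] = None):
        super().__init__(message)
        self.status = status
        self.code = code
        self.param = param


def parse_error(status: int, body: str) -> GatewayError:
    """Convert an error response body into a GatewayError."""
    try:
        parsed = json.loads(body)
    except ValueError:
        parsed = None
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        # Not the standard shape: keep the raw body as the message.
        return GatewayError(status, body or f"HTTP {status}")
    return GatewayError(
        status,
        err.get("message", f"HTTP {status}"),
        err.get("code"),
        err.get("param"),
    )


sample_body = (
    '{"error": {"message": "model is required", "type": "invalid_request_error", '
    '"param": "model", "code": "missing_required_parameter"}}'
)
error = parse_error(400, sample_body)
print(error.status, error.code, error.param, error)
```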

Response headers

BVE Gateway adds the following headers to every authenticated response:

Header	Example	Description
`X-Request-Id`	`550e8400-…`	UUID for this request (generated per request)
`X-BVE-Client-Id`	`my-trace-123`	Echo of the client-supplied `X-Request-Id`, sent only when that header is present and valid (alphanumeric plus `-`, `_`, `.`; ≤ 128 chars). Omitted otherwise.
`X-BVE-Latency`	`143`	Total gateway latency in milliseconds
`X-BVE-Model`	`gpt-4o`	Model ID resolved for this request
`X-BVE-Key-Name`	`prod-key`	Name of the API key used for this request (redacted if it matches a provider credential pattern)

The full X-RateLimit-* header set (RPM, RPD, monthly) is also included. See Rate Limits & Quotas for details and example output.
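
The sketch below shows how a client might pre-check the trace ID it sends in `X-Request-Id` and read the gateway headers back. It assumes "alphanumeric" means ASCII letters and digits and that an empty value is invalid; neither detail is stated above. Both function names are hypothetical.

```python
import re
from typing import Mapping

# Assumed reading of the rule: ASCII letters, digits, "-", "_", ".", 1 to 128 chars.
CLIENT_ID_RE = re.compile(r"[A-Za-z0-9._-]{1,128}")


def is_valid_client_request_id(value: str) -> bool:
    """Check a client-supplied X-Request-Id against the assumed rule."""
    return CLIENT_ID_RE.fullmatch(value) is not None


def summarize_headers(headers: Mapping[str, str]) -> dict:
    """Pick the gateway headers out of a response, case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    latency = lowered.get("x-bve-latency")
    return {
        "request_id": lowered.get("x-request-id"),
        "client_id": lowered.get("x-bve-client-id"),  # None when not echoed
        "latency_ms": int(latency) if latency is not None and latency.isdigit() else None,
        "model": lowered.get("x-bve-model"),
        "key_name": lowered.get("x-bve-key-name"),
    }


print(is_valid_client_request_id("my-trace-123"))
print(summarize_headers({"X-BVE-Latency": "143", "X-BVE-Model": "gpt-4o"}))
```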

Notes

Only OpenAI models are supported upstream (e.g. gpt-4o, gpt-4.1, gpt-5, o3, o4-mini).
For Claude or Gemini models, use /v1/chat/completions or /v1/messages instead.
The gateway enforces max_output_tokens >= 16 for the Responses API.
Token usage (input_tokens, output_tokens) is extracted and recorded in D1 for both streaming and non-streaming requests.
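
A client that talks to several model families may want to choose the endpoint up front. The sketch below uses a naming heuristic that is purely an assumption: it treats IDs starting with `gpt-` or with `o` followed by a digit as OpenAI models, which matches the examples listed above but is not the gateway's actual routing logic. `endpoint_for` is a hypothetical helper.

```python
import re

# Heuristic only: matches the OpenAI model IDs named in the notes above.
OPENAI_MODEL_RE = re.compile(r"^(gpt-|o\d)")


def endpoint_for(model: str) -> str:
    """Pick a gateway path for a model ID using the assumed naming rule."""
    if OPENAI_MODEL_RE.match(model):
        return "/v1/responses"
    # Everything else goes to the endpoint that serves all providers.
    return "/v1/chat/completions"


for name in ("gpt-4o", "gpt-4.1", "gpt-5", "o3", "o4-mini"):
    print(name, endpoint_for(name))
```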

Next steps

Chat Completions: For all model providers (OpenAI, Claude, Gemini) via POST /v1/chat/completions

Streaming — Responses API: Full code examples for SSE streaming with the Responses API

Rate Limits & Quotas: Per-key RPM/RPD/monthly enforcement and Retry-After headers

Errors: Full error code reference, including model_endpoint_mismatch for non-OpenAI models