
Streaming

BVE Gateway supports streaming responses via Server-Sent Events (SSE) for /v1/chat/completions.

When "stream": true is set in the request body, the Fuelix response is piped directly to the client without buffering. BVE Gateway:

  • Does not call response.json() or response.text() on streaming responses
  • Does not buffer the response body
  • Preserves the upstream Content-Type (typically text/event-stream)
  • Preserves the upstream status code
  • Preserves SSE format exactly as Fuelix sends it

The gateway adds only one header of its own, X-BVE-Latency. Every other response header is forwarded from Fuelix only if it appears on the safe header allowlist.
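
The pass-through behavior described above can be sketched with the standard Fetch API. This is a minimal illustration, not BVE Gateway's actual source: the function name passThroughStream and the latencyMs parameter are hypothetical, and the millisecond unit for X-BVE-Latency is an assumption. Header allowlisting is left out here to keep the focus on the body.

```javascript
// Hypothetical helper, for illustration only.
// `upstream` is a Fetch API Response received from the upstream provider.
function passThroughStream(upstream, latencyMs) {
  const headers = new Headers();

  // Preserve the upstream Content-Type (typically text/event-stream).
  const contentType = upstream.headers.get('content-type');
  if (contentType !== null) {
    headers.set('content-type', contentType);
  }

  // Assumption: the latency value is reported in milliseconds.
  headers.set('X-BVE-Latency', String(latencyMs));

  // Hand the upstream ReadableStream to the new Response as-is.
  // Calling upstream.json() or upstream.text() here would read the
  // whole body into memory before anything reaches the client.
  return new Response(upstream.body, {
    status: upstream.status, // preserve the upstream status code
    headers,
  });
}
```

Because the body is passed along as a stream rather than read, each SSE event can reach the client as soon as the upstream emits it.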

Each streamed chunk is a data-only SSE event:

    data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1716288000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

    data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1716288000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

    data: [DONE]
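
For clients that read the raw stream instead of using an SDK, the event format above can be decoded roughly as follows. This is a sketch under stated assumptions: the helper name parseSseEvents is hypothetical, each event is assumed to carry a single data line, and the input is assumed to contain only complete lines. A real client would buffer partial lines between network reads before calling it.

```javascript
// Hypothetical helper, for illustration only.
// Extracts the JSON payloads from a block of SSE text.
function parseSseEvents(text) {
  const chunks = [];
  let done = false;

  for (const line of text.split(/\r?\n/)) {
    // Skip the blank lines that separate events and anything that is not data.
    if (!line.startsWith('data:')) continue;

    const payload = line.slice('data:'.length).trim();

    // The literal [DONE] sentinel marks the end of the stream; it is not JSON.
    if (payload === '[DONE]') {
      done = true;
      break;
    }

    chunks.push(JSON.parse(payload));
  }

  return { chunks, done };
}
```

Joining chunk.choices[0]?.delta?.content across the returned chunks reconstructs the assistant message, and done tells the caller whether the [DONE] sentinel has been seen.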

Example request with curl:

    curl https://api.bve.me/v1/chat/completions \
      -H "Authorization: Bearer sk-bve-YOUR_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gpt-4o",
        "messages": [{ "role": "user", "content": "Count from 1 to 5." }],
        "stream": true
      }'

Example with the OpenAI Node.js SDK:

    import OpenAI from 'openai';

    const client = new OpenAI({
      apiKey: 'sk-bve-YOUR_KEY',
      baseURL: 'https://api.bve.me/v1',
    });

    // stream: true makes create() return an async iterable of chunks.
    const stream = await client.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Count from 1 to 5.' }],
      stream: true,
    });

    // Print each content delta as it arrives.
    for await (const chunk of stream) {
      const text = chunk.choices[0]?.delta?.content ?? '';
      process.stdout.write(text);
    }

The same header allowlist applies to streaming responses:

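| Header            | Forwarded        |
|-------------------|------------------|
| content-type      | Yes              |
| cache-control     | Yes              |
| x-request-id      | Yes              |
| x-quota-allowed   | Yes              |
| x-quota-available | Yes              |
| x-quota-reset     | Yes              |
| X-BVE-Latency     | Added by gateway |
| All others        | Stripped         |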
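
The allowlist in the table can be expressed as a small filter. This is a sketch, not the gateway's actual code: the names FORWARDED_HEADERS and filterResponseHeaders are hypothetical, and the millisecond unit for X-BVE-Latency is an assumption.

```javascript
// Header names taken from the table above. Lookups on the Fetch API
// Headers class are case-insensitive, so lowercase names are sufficient.
const FORWARDED_HEADERS = [
  'content-type',
  'cache-control',
  'x-request-id',
  'x-quota-allowed',
  'x-quota-available',
  'x-quota-reset',
];

// Hypothetical helper, for illustration only.
// Builds the client-facing headers from the upstream response headers.
function filterResponseHeaders(upstreamHeaders, latencyMs) {
  const out = new Headers();

  // Copy only allowlisted headers that the upstream actually sent.
  for (const name of FORWARDED_HEADERS) {
    const value = upstreamHeaders.get(name);
    if (value !== null) {
      out.set(name, value);
    }
  }

  // The one header the gateway adds itself (unit assumed to be ms).
  out.set('X-BVE-Latency', String(latencyMs));
  return out;
}
```

Starting from an empty Headers object and copying in only known names means any header the upstream introduces later is stripped by default rather than leaked.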