Streaming

BVE Gateway supports streaming responses for three endpoints:

Endpoint	Format	Content-Type
`POST /v1/chat/completions`	OpenAI SSE (data-only events, `[DONE]` terminator) or Gemini NDJSON	`text/event-stream` or `application/x-ndjson`
`POST /v1/messages`	Anthropic SSE (named events: `message_start`, `content_block_delta`, `message_delta`) or Gemini NDJSON	`text/event-stream` or `application/x-ndjson`
`POST /v1/responses`	Responses API SSE (`response.created`, `response.in_progress`, `response.completed`)	`text/event-stream`

How streaming works

When "stream": true is set in the request body, the gateway splits the upstream response body into two branches with a tee:

One branch passes the raw upstream stream (SSE or NDJSON) directly to the caller with no buffering
The other branch extracts token usage (prompt and completion tokens) from the stream in the background via ctx.waitUntil, so it never blocks the response
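
The tee can be illustrated with a short Python analogue. This is a sketch under stated assumptions, not the gateway's own code (the gateway uses a stream tee and ctx.waitUntil). It assumes the upstream body arrives as an async iterator of byte chunks. An unbounded asyncio.Queue stands in for the second branch, and a background task stands in for ctx.waitUntil. The names relay_with_tap, collect_tap, and fake_upstream are hypothetical.

```python
import asyncio
from typing import AsyncIterator, Optional


async def relay_with_tap(
    upstream: AsyncIterator[bytes], tap: "asyncio.Queue[Optional[bytes]]"
) -> AsyncIterator[bytes]:
    """Client branch: yield each upstream chunk unchanged, copying it to the tap."""
    try:
        async for chunk in upstream:
            tap.put_nowait(chunk)  # unbounded queue, so this never waits on the tap
            yield chunk
    finally:
        tap.put_nowait(None)  # sentinel: upstream finished or the consumer closed early


async def collect_tap(tap: "asyncio.Queue[Optional[bytes]]") -> bytes:
    """Background branch: reassemble the full body for usage extraction."""
    parts = []
    while (chunk := await tap.get()) is not None:
        parts.append(chunk)
    return b"".join(parts)


async def fake_upstream() -> AsyncIterator[bytes]:
    # Hypothetical upstream: SSE frames split across arbitrary chunk boundaries.
    for piece in (b'data: {"a":', b"1}\n\n", b"data: [DONE]\n\n"):
        yield piece


async def demo() -> tuple[list[bytes], bytes]:
    tap: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    background = asyncio.create_task(collect_tap(tap))  # plays the role of ctx.waitUntil
    sent = [chunk async for chunk in relay_with_tap(fake_upstream(), tap)]
    captured = await background
    return sent, captured


sent, captured = asyncio.run(demo())
print(b"".join(sent) == captured)  # True: both branches saw the same bytes
```

The client branch forwards each chunk as soon as it arrives. The background branch only sees the complete body after the stream ends, which is why usage extraction cannot delay the response.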

BVE Gateway:

Preserves the upstream Content-Type (text/event-stream for SSE, application/x-ndjson for Gemini NDJSON)
Preserves the upstream status code
Preserves the streamed byte sequence exactly as Fuelix sends it
Adds gateway headers such as X-Request-Id (a per-request UUID) and X-BVE-Latency (gateway latency in ms) to the response; the full list is under Response headers on streaming responses

Chat completions SSE format

Each streamed chunk follows the OpenAI SSE format: a data: line containing a JSON delta. The stream ends with a data: [DONE] line:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1716288000,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1716288000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1716288000,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}

data: [DONE]
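
Without an SDK, a client can read this format line by line. It keeps the data: lines, stops at [DONE], and concatenates each delta.content. The Python sketch below does that over an in-memory list of lines, abbreviated from the example above. The function name read_chat_stream is hypothetical. The sketch also keeps the last usage object it sees. It iterates over choices rather than indexing choices[0], because a usage-only chunk can arrive with an empty choices array.

```python
import json
from typing import Iterable, Optional


def read_chat_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Return (assembled text, last usage object) from OpenAI-style SSE lines."""
    text_parts: list[str] = []
    usage: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators, ": comment" lines, or other fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminator: no JSON follows
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                text_parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]  # keep the most recent usage object
    return "".join(text_parts), usage


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}',
    "",
    "data: [DONE]",
]
text, usage = read_chat_stream(sample)
print(text, usage["completion_tokens"])  # Hello 5
```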

Anthropic Messages API SSE format

POST /v1/messages with "stream": true uses Anthropic’s named-event SSE format:

event: message_start
data: {"type":"message_start","message":{"id":"msg_abc","type":"message","role":"assistant","model":"claude-opus-4-5","usage":{"input_tokens":25,"output_tokens":1},"content":[]}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}

event: message_stop
data: {"type":"message_stop"}

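Requests to this endpoint require the Anthropic-Version: 2023-06-01 header. The official Anthropic SDKs send it automatically. When calling from a browser, make sure Anthropic-Version is included in the CORS allowlist.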
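
Because each event is named, a client can dispatch on the event: line. The sketch below pairs each event: line with the data: line that follows it. It assembles text from text_delta blocks. It reads input_tokens from message_start and output_tokens from message_delta, which are the same events the gateway uses for usage on this endpoint. It assumes one data: line per event, as in the example above, and it skips unrecognized events (such as a ping). The function name read_messages_stream is hypothetical.

```python
import json
from typing import Iterable, Optional


def read_messages_stream(lines: Iterable[str]) -> tuple[str, int, int]:
    """Return (text, input_tokens, output_tokens) from Anthropic named-event SSE lines."""
    text_parts: list[str] = []
    input_tokens = output_tokens = 0
    event_name: Optional[str] = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:") and event_name:
            data = json.loads(line[len("data:"):].strip())
            if event_name == "message_start":
                input_tokens = data["message"]["usage"]["input_tokens"]
            elif event_name == "content_block_delta" and data["delta"]["type"] == "text_delta":
                text_parts.append(data["delta"]["text"])
            elif event_name == "message_delta":
                # output_tokens in message_delta is cumulative, so the last value wins
                output_tokens = data["usage"]["output_tokens"]
            event_name = None  # assumes exactly one data line per event
    return "".join(text_parts), input_tokens, output_tokens


sample = [
    "event: message_start",
    'data: {"type":"message_start","message":{"usage":{"input_tokens":25,"output_tokens":1}}}',
    "event: content_block_delta",
    'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}',
    "event: message_delta",
    'data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":12}}',
    "event: message_stop",
    'data: {"type":"message_stop"}',
]
print(read_messages_stream(sample))  # ('Hello', 25, 12)
```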

Responses API SSE format

POST /v1/responses with "stream": true uses the Responses API event format:

event: response.created
data: {"type":"response.created","response":{"id":"resp_abc","model":"gpt-4o","status":"in_progress"}}

event: response.in_progress
data: {"type":"response.in_progress","response":{"id":"resp_abc","status":"in_progress"}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello"}

event: response.completed
data: {"type":"response.completed","response":{"id":"resp_abc","status":"completed","usage":{"input_tokens":10,"output_tokens":5}}}

event: response.done
data: [DONE]
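
Here every payload repeats its event name in a type field, so a client can dispatch on the JSON alone. One detail needs handling: in the example above, the stream ends with data: [DONE], which is not JSON. The sketch below collects response.output_text.delta text, reads usage from response.completed, and stops at the [DONE] line. The function name read_responses_stream is hypothetical.

```python
import json
from typing import Iterable, Optional


def read_responses_stream(lines: Iterable[str]) -> tuple[str, Optional[dict]]:
    """Return (text, usage) from Responses API SSE lines."""
    text_parts: list[str] = []
    usage: Optional[dict] = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # event: lines duplicate the payload's "type" field
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # non-JSON terminator
        event = json.loads(payload)
        if event["type"] == "response.output_text.delta":
            text_parts.append(event["delta"])
        elif event["type"] == "response.completed":
            usage = event["response"].get("usage")
    return "".join(text_parts), usage


sample = [
    "event: response.created",
    'data: {"type":"response.created","response":{"id":"resp_abc","status":"in_progress"}}',
    "",
    "event: response.output_text.delta",
    'data: {"type":"response.output_text.delta","delta":"Hello"}',
    "",
    "event: response.completed",
    'data: {"type":"response.completed","response":{"id":"resp_abc","status":"completed",'
    '"usage":{"input_tokens":10,"output_tokens":5}}}',
    "",
    "event: response.done",
    "data: [DONE]",
]
text, usage = read_responses_stream(sample)
print(text, usage)  # Hello {'input_tokens': 10, 'output_tokens': 5}
```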

Code examples

Chat completions

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me/v1',
});

const stream = await client.chat.completions.stream({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Count from 1 to 5.' }],
});

for await (const chunk of stream) {
  const text = chunk.choices[0]?.delta?.content ?? '';
  process.stdout.write(text);
}

// Access the final completion (including usage, if the upstream reported it) once streaming is complete
const finalCompletion = await stream.finalChatCompletion();
console.log('Usage:', finalCompletion.usage);

from openai import OpenAI

client = OpenAI(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me/v1",
)

with client.chat.completions.stream(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
) as stream:
    # The stream helper yields typed events; "content.delta" events carry new text
    for event in stream:
        if event.type == "content.delta":
            print(event.delta, end="", flush=True)

print()  # newline

curl -N https://api.bve.me/v1/chat/completions \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{ "role": "user", "content": "Count from 1 to 5." }],
    "stream": true
  }'

Anthropic Messages API

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me',
});

const stream = await client.messages.stream({
  model: 'claude-opus-4-5',
  max_tokens: 256,
  messages: [{ role: 'user', content: 'Count from 1 to 5.' }],
});

for await (const chunk of stream) {
  if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
    process.stdout.write(chunk.delta.text);
  }
}

import anthropic

client = anthropic.Anthropic(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me",
)

with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Count from 1 to 5."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline

curl -N https://api.bve.me/v1/messages \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -H "Anthropic-Version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4-5",
    "max_tokens": 256,
    "messages": [{ "role": "user", "content": "Count from 1 to 5." }],
    "stream": true
  }'

Responses API

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-bve-YOUR_KEY',
  baseURL: 'https://api.bve.me/v1',
});

const stream = client.responses.stream({
  model: 'gpt-4o',
  input: 'Count from 1 to 5.',
});

for await (const event of stream) {
  if (event.type === 'response.output_text.delta') {
    process.stdout.write(event.delta);
  }
}

// Access usage once streaming is complete
const finalResponse = await stream.finalResponse();
console.log('Usage:', finalResponse.usage);

from openai import OpenAI

client = OpenAI(
    api_key="sk-bve-YOUR_KEY",
    base_url="https://api.bve.me/v1",
)

with client.responses.stream(
    model="gpt-4o",
    input="Count from 1 to 5.",
) as stream:
    for event in stream:
        if event.type == "response.output_text.delta":
            print(event.delta, end="", flush=True)

    # Access usage once streaming is complete
    final_response = stream.get_final_response()

print()  # newline
print("Usage:", final_response.usage)

curl -N https://api.bve.me/v1/responses \
  -H "Authorization: Bearer sk-bve-YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "input": "Count from 1 to 5.",
    "stream": true
  }'

Token usage tracking

For streaming responses, BVE Gateway extracts token counts from the SSE stream in the background:

Endpoint	Where usage is extracted
`/v1/chat/completions` (SSE)	Last delta chunk containing `usage` field (`prompt_tokens` / `completion_tokens`)
`/v1/chat/completions` (Gemini NDJSON)	Last NDJSON object containing `usageMetadata` (`promptTokenCount` / `candidatesTokenCount`)
`/v1/messages` (Anthropic SSE)	`message_start` event (`input_tokens`) + `message_delta` event (`output_tokens`)
`/v1/messages` (Gemini NDJSON)	Last NDJSON object containing `usageMetadata` (`promptTokenCount` / `candidatesTokenCount`)
`/v1/responses`	`response.completed` event (`input_tokens` + `output_tokens`)

Token counts are recorded in D1 usage tables and visible via GET /admin/usage. The extraction runs in ctx.waitUntil and never delays the streamed response.
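
The Gemini NDJSON rows differ from the SSE rows. Each line is a complete JSON object, with no data: prefix and no terminator. Below is a minimal sketch of the "last object containing usageMetadata" rule. It assumes the stream body has already been collected as text. The function name gemini_ndjson_usage is hypothetical, and the two sample objects are illustrative rather than captured gateway output.

```python
import json
from typing import Optional


def gemini_ndjson_usage(body: str) -> Optional[tuple[int, int]]:
    """Return (prompt_tokens, completion_tokens) from the last object with usageMetadata."""
    found: Optional[tuple[int, int]] = None
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between objects
        obj = json.loads(line)
        meta = obj.get("usageMetadata")
        if meta:
            # Later objects overwrite earlier ones, so the last one wins.
            found = (meta.get("promptTokenCount", 0), meta.get("candidatesTokenCount", 0))
    return found


body = "\n".join([
    '{"candidates":[{"content":{"parts":[{"text":"Hel"}]}}],"usageMetadata":{"promptTokenCount":8}}',
    '{"candidates":[{"content":{"parts":[{"text":"lo"}]}}],'
    '"usageMetadata":{"promptTokenCount":8,"candidatesTokenCount":2}}',
])
print(gemini_ndjson_usage(body))  # (8, 2)
```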

Response headers on streaming responses

BVE Gateway adds the following headers to streaming responses. Some are conditional, as noted:

Header	Description
`X-Request-Id`	Per-request UUID for log correlation
`X-BVE-Client-Id`	Echo of the client-supplied `X-Request-Id` request header. Absent when the client did not send one or the value failed validation.
`X-BVE-Latency`	Gateway-side latency in milliseconds
`X-BVE-Model`	Model ID for this request, taken from the request body. Buffered non-streaming responses also carry this header, populated from the upstream response JSON.
`X-BVE-Key-Name`	Operator-assigned display name of the key that authenticated the request. Absent on 401/403 error responses.
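
A client can collect these headers with any HTTP library. The sketch below assumes only a plain mapping of header names to values, and it assumes that X-BVE-Latency is a bare number. The names GatewayMeta and read_gateway_headers are hypothetical. The sketch lowercases names first because HTTP header names are case-insensitive. It maps the conditional headers to None when they are absent.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GatewayMeta:
    request_id: str           # X-Request-Id: gateway-generated UUID
    latency_ms: float         # X-BVE-Latency (assumed to be a bare number)
    model: Optional[str]      # X-BVE-Model
    client_id: Optional[str]  # X-BVE-Client-Id: None unless a valid X-Request-Id was sent
    key_name: Optional[str]   # X-BVE-Key-Name: None on 401/403 responses


def read_gateway_headers(headers: Mapping[str, str]) -> GatewayMeta:
    """Collect BVE Gateway headers from a response header mapping."""
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return GatewayMeta(
        request_id=h["x-request-id"],
        latency_ms=float(h["x-bve-latency"]),
        model=h.get("x-bve-model"),
        client_id=h.get("x-bve-client-id"),
        key_name=h.get("x-bve-key-name"),
    )


meta = read_gateway_headers({
    "X-Request-Id": "123e4567-e89b-12d3-a456-426614174000",
    "X-BVE-Latency": "42",
    "X-BVE-Model": "gpt-4o",
})
print(meta.latency_ms, meta.client_id, meta.key_name)  # 42.0 None None
```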

Streaming responses use the same upstream header allowlist as non-streaming responses. Notable upstream headers forwarded when present:

Header	Description
`content-type`	`text/event-stream` for SSE; `application/x-ndjson` for Gemini NDJSON
`x-request-id` / `request-id`	Upstream request ID
`x-quota-allowed`, `x-quota-available`, `x-quota-reset`	Fuelix quota headers
`x-ratelimit-*`	Upstream provider rate-limit headers (aggregate account limits)
`retry-after`	Seconds to wait on 429/503
`anthropic-ratelimit-*`	Anthropic Messages API rate-limit headers
`x-groq-request-id`	Groq request tracking ID
`openai-processing-ms`	OpenAI server-side processing time

See Security Notes for the complete forwarded-header list.
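
Of the forwarded headers, retry-after is the one a client most often acts on. HTTP allows its value to be either a number of seconds or an HTTP-date. The sketch below handles both forms using only the standard library. The function name retry_after_seconds and its now parameter (there for testing) are hypothetical and not part of BVE Gateway or any SDK.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: str, now: Optional[datetime] = None) -> float:
    """Convert a Retry-After value (delay-seconds or HTTP-date) into seconds to wait."""
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "30"
    when = parsedate_to_datetime(value)  # HTTP-date form; raises on malformed input
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always GMT
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())  # never a negative wait


print(retry_after_seconds("120"))  # 120.0
```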

Next steps

Chat Completions: full request/response reference for POST /v1/chat/completions

Anthropic Messages API: Anthropic SSE streaming format (message_start, message_delta, and more)

Responses API: OpenAI Responses API streaming with server-sent events

Rate Limits & Quotas: per-key RPM/RPD limits, monthly caps, and quota window definitions