Changelog

2026-06-03 (docs correction)

feat(dashboard): add optional admin-only public catalog drift probe to the Models page

The Models page could already detect stale public deploys from https://api.bve.me/health, but it still had no first-class way to prove that the public GET /v1/models contract itself was drifting. That gap is now closed when the dashboard Worker has an optional server-side BVE_API_KEY binding available.

What changed

Added a new admin-only dashboard route: GET /api/public-api-catalog-probe
The route uses the configured BVE_API_KEY to query public https://api.bve.me/v1/models
The response is normalized against the same key’s allowed_models restrictions, so per-key filtering is not misreported as deploy drift
The Models page now shows a dedicated public contract drift warning card when the public list is missing live Fuelix rows, returns unexpected rows, still omits current bve_available / bve_availability annotations, or serves stale /v1/models/:id behavior for representative callable and blocked models
When the optional binding is absent, the page degrades cleanly and explains that only /health-level public checks are available
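
The normalization step described above can be sketched roughly as follows. This is a minimal illustration, not the dashboard's actual code: the name `diffPublicCatalog`, the argument shapes, and the reading of a null `allowed_models` as "unrestricted key" are all assumptions.

```typescript
// Hypothetical helper: names and shapes are illustrative only.
interface CatalogDrift {
  missing: string[]; // expected for this key but absent from the public list
  unexpected: string[]; // served publicly but not expected for this key
}

function diffPublicCatalog(
  publicIds: string[],
  liveIds: string[],
  allowedModels: string[] | null, // assumption: null means the key is unrestricted
): CatalogDrift {
  const allowed = allowedModels === null ? null : new Set(allowedModels);
  // Restrict the expectation to what the probing key may see, so that
  // per-key filtering is never counted as deploy drift.
  const expected = new Set(
    liveIds.filter((id) => allowed === null || allowed.has(id)),
  );
  const served = new Set(publicIds);
  return {
    missing: Array.from(expected).filter((id) => !served.has(id)).sort(),
    unexpected: Array.from(served).filter((id) => !expected.has(id)).sort(),
  };
}
```

A non-empty `missing` or `unexpected` list would be the signal for the drift warning card; an empty result for a restricted key means the public list matches what that key should see.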

docs(dashboard): align Models page docs with the split live catalog / policy row / snapshot maintenance contract

The Admin Dashboard docs for /models were still describing a single blended allowlist table, a stale snapshotModels collection that the API no longer returns, and outdated status: "enabled" / status: "disabled" mutation payloads. They now match the current dashboard contract:

GET /api/models returns catalogAvailable, models, and policyModels
models contains the live Fuelix catalog rows for the current account
policyModels contains preserved D1 policy rows that are either stale (allowlist_only) or temporarily unverified while Fuelix is unavailable (allowlist_unverified)
standalone snapshot-only registry rows are no longer emitted by this endpoint; if a stale D1 row exists for one of those models it instead surfaces through policyModels
The row-level available flag means effective BVE routability after D1 policy overrides are applied
Global model policy mutations use enabled: true / enabled: false, and stale policy rows are removed via DELETE /admin/model-allowlist/:model

The historical degraded-mode note below was also clarified so it no longer reads like the pre-split response shape is still the current one.

2026-05-31 (improvement loop, iteration 828)

docs(errors): fix `invalid_json` description + add `invalid_content_type` error code

errors.mdx incorrectly documented invalid_json as handling two distinct failure modes: body not valid JSON, and wrong Content-Type. Since iteration 826, wrong Content-Type returns a separate invalid_content_type code.

Changes:

api-reference/errors.mdx — updated invalid_json row (body-only, not Content-Type); added invalid_content_type row to the Request errors table; added an example JSON block for the new code.
guides/troubleshooting.md — added a ### invalid_content_type section under Validation errors (400) with cause, fix, cURL example, and a note on which endpoints are exempt (multipart form-data endpoints).
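
The split between the two codes can be illustrated with a small sketch. It assumes the Content-Type check runs before JSON parsing and that only the media type matters; `classifyBodyError` and its parameters are hypothetical names, not the gateway's real function.

```typescript
type BodyErrorCode = "invalid_content_type" | "invalid_json" | null;

// Hypothetical classifier for JSON-body routes.
function classifyBodyError(
  contentType: string | null,
  rawBody: string,
  isMultipartEndpoint: boolean,
): BodyErrorCode {
  // Multipart form-data endpoints are exempt from both checks.
  if (isMultipartEndpoint) return null;
  // Assumption: parameters such as "; charset=utf-8" are ignored.
  const header = contentType ?? "";
  const semi = header.indexOf(";");
  const mediaType = (semi === -1 ? header : header.slice(0, semi)).trim().toLowerCase();
  if (mediaType !== "application/json") return "invalid_content_type";
  try {
    JSON.parse(rawBody);
    return null; // well-formed JSON with the right Content-Type
  } catch {
    return "invalid_json"; // body-only failure
  }
}
```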

2026-05-30 (improvement loop, iteration 823)

docs(admin/api-keys): document `GET /admin/api-keys/:id/usage` per-key usage history endpoint

The GET /admin/api-keys/:id/usage endpoint (added in commit be0800c) was missing from the Admin API: API Keys reference page and from the endpoint tables in Introduction and Admin API Overview.

A full Get key usage history section has been added to the API Keys reference page covering:

Endpoint, authentication, and path parameters
All query parameters: from, to, daily_limit, monthly_limit
Full example response with annotated daily[] and monthly[] row fields
Examples: cURL for full history, a date-range slice, and a 90-day billing report, plus a TypeScript token aggregation snippet
Common error table
Tip callout explaining when to use this endpoint vs GET /admin/usage?key_id= (key difference: this endpoint 404s on unknown IDs)

The endpoint also appears in the Admin API Overview endpoint table and the Introduction supported-endpoints table.

2026-05-30 (improvement loop, iteration 822)

feat(models): implement `?sort_by=`, `?sort_dir=`, and `?limit=` on `GET /v1/models`

GET /v1/models now supports three additional query parameters that were already documented in the OpenAPI spec (/openapi.json) but not yet implemented:

?sort_by=id|created — Sort the returned model list. id sorts alphabetically by model ID; created sorts by Unix creation timestamp ascending. Without ?sort_by, the upstream registration order is preserved.
?sort_dir=asc|desc — Sort direction. asc (ascending, default); desc (descending). Accepted but ignored when ?sort_by is omitted. Enables newest-first ordering with ?sort_by=created&sort_dir=desc.
?limit=1–200 — Maximum number of models to return after all other filters and sorting are applied. Useful for “top N” queries such as ?sort_by=created&sort_dir=desc&limit=5 to get the five newest models.

All three parameters return 400 invalid_value for out-of-range or unrecognised values. Invalid parameters are validated before the upstream models fetch, so no Fuelix round-trip is wasted on bad requests. sort_by and limit compose with all existing filters (?category=, ?provider=, ?reasoning=, ?vision=, ?endpoint=, ?search=, ?web_search=, ?audio_input=, ?tool_use=).
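
A rough sketch of the validate-then-apply order described above. The function names, the plain-object query shape, and the code-unit string comparison used for "alphabetical" ordering are assumptions made for illustration; the gateway's real implementation is not shown in this changelog.

```typescript
interface ModelEntry { id: string; created: number }
interface ListOptions {
  sortBy: "id" | "created" | null;
  sortDir: "asc" | "desc";
  limit: number | null;
}
type ParseResult = ListOptions | { invalidParam: string };

// Validate first, so a bad request can be answered with 400 invalid_value
// before any upstream fetch happens.
function parseListOptions(q: Record<string, string | undefined>): ParseResult {
  const rawSortBy = q["sort_by"];
  let sortBy: ListOptions["sortBy"] = null;
  if (rawSortBy === "id" || rawSortBy === "created") sortBy = rawSortBy;
  else if (rawSortBy !== undefined) return { invalidParam: "sort_by" };

  const rawSortDir = q["sort_dir"];
  let sortDir: ListOptions["sortDir"] = "asc"; // default direction
  if (rawSortDir === "asc" || rawSortDir === "desc") sortDir = rawSortDir;
  else if (rawSortDir !== undefined) return { invalidParam: "sort_dir" };

  const rawLimit = q["limit"];
  let limit: number | null = null;
  if (rawLimit !== undefined) {
    // Digits only: rejects "abc", "1.5", and negative values.
    if (!/^\d+$/.test(rawLimit)) return { invalidParam: "limit" };
    limit = Number(rawLimit);
    if (limit < 1 || limit > 200) return { invalidParam: "limit" };
  }
  return { sortBy, sortDir, limit };
}

// Sort (if requested), then truncate; other filters would run before this.
function applyListOptions(models: ModelEntry[], o: ListOptions): ModelEntry[] {
  const out = models.slice(); // upstream order is kept when sortBy is null
  const key = o.sortBy;
  if (key !== null) {
    const dir = o.sortDir === "desc" ? -1 : 1;
    out.sort((a, b) =>
      dir * (key === "created" ? a.created - b.created : a.id < b.id ? -1 : a.id > b.id ? 1 : 0),
    );
  }
  return o.limit === null ? out : out.slice(0, o.limit);
}
```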

New tests (22 tests across validation and functional coverage):

Validation: sort_by=name, sort_by=created_at, sort_dir=ascending, limit=0, limit=201, limit=abc, limit=1.5 all return 400
sort_dir=asc without sort_by returns 200 (accepted but ignored per spec)
sort_by=id / sort_by=id&sort_dir=desc: assert alphabetical ordering
sort_by=created / sort_by=created&sort_dir=desc: assert timestamp ordering
limit=1 / limit=2 / limit=200: assert correct truncation
sort_by=id&sort_dir=desc&limit=2: top-2 reverse-alphabetical
sort_by=created&sort_dir=desc&limit=1: single newest model

2026-05-30 (improvement loop, iteration 810)

fix(scheduled): add `monthly_usage` cleanup to daily cron handler (24-month retention)

The monthly_usage table was never cleaned up by the daily cron, causing it to grow indefinitely — one row accumulates per API key per calendar month, resulting in ~12 rows/key/year without bound.

The handleScheduled batch now includes a sixth DELETE statement:

DELETE FROM monthly_usage WHERE year_month < ?

The cutoff is approximately 24 months ago (720 days), computed as a YYYY-MM string. Lexicographic comparison is correct for ISO 8601 year-month strings. The 30-day-per-month approximation may vary ±1–2 weeks at month boundaries, which is acceptable for a periodic cleanup job.
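
The cutoff arithmetic can be sketched as below. The helper name `monthlyUsageCutoff` and the choice of UTC are assumptions; only the 720-day window and the YYYY-MM format come from the entry above.

```typescript
// Hypothetical helper: "now minus 720 days", formatted as YYYY-MM in UTC.
function monthlyUsageCutoff(nowMs: number, retentionDays: number = 720): string {
  const cutoff = new Date(nowMs - retentionDays * 24 * 60 * 60 * 1000);
  return cutoff.toISOString().slice(0, 7); // keep only "YYYY-MM"
}

// 2026-05-30 minus 720 days is 2024-06-09, so rows before "2024-06" are pruned.
monthlyUsageCutoff(Date.UTC(2026, 4, 30)); // "2024-06"

// Zero-padded year-month strings sort the same way as the dates they represent.
"2023-12" < "2024-01"; // true
```

Because 720 days is about ten days short of 24 calendar months, the computed month can be one month later than a strict 24-month window when the job runs late in a month, as in the example.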

The scheduled_cleanup structured log now includes monthly_usage in the pruned object alongside the existing five tables. Three new unit tests cover: delete-past-cutoff, retain-within-cutoff, and pruned-count-in-log.

Why 24 months? Daily usage keeps 12 months (enough for operational monitoring); monthly totals are more useful for year-over-year comparisons, so a 2-year window is appropriate.

2026-05-30 (improvement loop, iteration 808)

smoke(?tool_use=): add `?tool_use=` filter smoke test coverage + `bve_tool_use` admin allowlist assertion

scripts/smoke-test.sh gained a new --- Model list ?tool_use= filter --- section (10 checks, ~14 curl invocations) following the same pattern established for ?reasoning=, ?vision=, ?web_search=, and ?audio_input=:

?tool_use=maybe → 400 with code=invalid_value, param=tool_use
?tool_use=1 (non-boolean) → 400
?tool_use=true → 200; confirms bve_tool_use field present; asserts gpt-4o is in the result set (canonical chat/function-calling model)
?tool_use=false → 200; asserts text-embedding-ada-002 is in the result set (embedding model, no function calling)

The GET /admin/model-allowlist admin section gains a bve_tool_use field present assertion — the annotation was shipped in iteration 807 but was not yet guarded in the smoke test.

2026-05-30 (improvement loop, iteration 807)

feat(admin): add `bve_tool_use` annotation to `GET /admin/model-allowlist` and `GET /admin/model-allowlist/:model`

Both admin model-allowlist endpoints now include a bve_tool_use boolean field alongside the existing bve_reasoning, bve_vision, bve_web_search, and bve_audio_input annotations. This brings the admin allowlist responses into parity with GET /v1/models, which has returned bve_tool_use since the initial tool-use annotation batch.

bve_tool_use: true means the model accepts the tools array in POST /v1/chat/completions for structured function calling. Embedding, TTS, STT, image-generation, and legacy completion-only models return false.

Also documents bve_web_search, bve_audio_input, and bve_tool_use in the Model Allowlist reference (previously undocumented despite being returned by the API), and adds bve_tool_use + the ?tool_use= filter to the Models reference page.

2026-05-30 (improvement loop, iteration 805)

docs(admin): document `GET /admin/api-keys/:id/audit` in the API Keys reference

The GET /admin/api-keys/:id/audit endpoint (added in iteration 800) was missing from the Admin API: API Keys reference page. A full Get key audit log section has been added covering:

Endpoint, authentication, and path parameters
All query parameters: since, until, limit, offset
Full example response with total, has_more, and annotated logs entries
Response field reference including the has_more boolean (not present in the global endpoint)
Explanation of how this differs from GET /admin/audit-logs?target_id= (404 on unknown keys)
cURL examples for full history, date-bounded queries, pagination, and the bun run key:audit CLI shortcut
Common errors table

A cross-link card for the per-key audit section has also been added to the Audit Logs page “Next steps” footer.

2026-05-30 (improvement loop, iteration 802)

refactor(scheduled): extract `pushAlert` helper + add audit_log and daily_usage cleanup

Three improvements bundled in this iteration:

audit_logs and daily_usage cleanup in the daily cron — The handleScheduled cron handler now prunes two additional tables on every run:
- audit_logs rows older than 365 days (one year of key-lifecycle events)
- daily_usage rows with date older than 365 days (one year of daily token/request totals)
Without this cleanup, both tables grew permanently: every key create/update/revoke/rotate adds an audit_logs row, and daily_usage accumulates ~365 rows/key/year.
pushAlert helper eliminates duplicated .catch() pattern — The four alert loops in dispatchScheduledAlerts each had an identical 11-line .catch() block that logged scheduled_alert_send_failed. The shared helper centralises this so a change to the warn-log shape only needs to happen once and eliminates ~44 lines of copy-pasted error handling.
fuelix.ts allocation micro-optimisations — Hoisted _defaultSSESkipFilter to module scope (was an inline anonymous closure created on every streaming response) and replaced conditional spread operators ...(x ? { k: x } : {}) with explicit property assignment in makeSSEExtractor and parseUsageFromJson to reduce GC pressure on the streaming hot path.
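
The idea behind the pushAlert helper can be sketched as follows. The signature, the injected `warn` callback, and the log entry fields are hypothetical; the changelog only states that the helper centralises the .catch() block that logs scheduled_alert_send_failed.

```typescript
type WarnFn = (entry: Record<string, unknown>) => void;

// Hypothetical signature; the real helper is not shown in this changelog.
function pushAlert(
  pending: Promise<void>[],
  send: () => Promise<void>, // assumed to be async, so failures arrive as rejections
  context: Record<string, unknown>,
  warn: WarnFn,
): void {
  pending.push(
    send().catch((err: unknown) => {
      // The single place that defines the warn-log shape for failed sends.
      warn({
        ...context,
        level: "warn",
        type: "scheduled_alert_send_failed", // assumption: carried in a `type` field
        error: err instanceof Error ? err.message : String(err),
      });
    }),
  );
}
```

Each alert loop would then call the helper once per alert and await the collected `pending` promises at the end, instead of repeating the error handling inline.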

2026-05-30 (improvement loop, iteration 800)

feat(admin): add `GET /admin/api-keys/:id/audit` per-key audit log endpoint

A new convenience endpoint scopes audit log retrieval to a single API key and verifies the key exists before returning results. Unlike GET /admin/audit-logs?target_id=<id>, which silently returns an empty list for unknown IDs, this endpoint returns 404 not_found when the key does not exist.

Endpoint: GET /admin/api-keys/:id/audit

Auth: Admin Bearer token required.

Query parameters: limit (1–500, default 100), offset (default 0), since (ISO 8601), until (ISO 8601).

Response shape (same fields as GET /admin/audit-logs, plus has_more):

{
  "total": 3,
  "has_more": false,
  "logs": [
    {
      "id": "...",
      "action": "api_key.unsuspended",
      "actor_type": "admin",
      "target_type": "api_key",
      "target_id": "<key-id>",
      "metadata": { "source": "legacy_admin_api" },
      "created_at": "2026-05-30T16:00:00.000Z"
    }
  ]
}

Errors:

404 not_found — key ID does not exist
400 validation_error — invalid since/until or inverted range
401 — missing or invalid admin bearer token
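
A sketch of the since/until check behind the 400 validation_error case. The accepted timestamp format, the helper names, the messages, and the decision to allow since equal to until are assumptions; only the rule "invalid since/until or inverted range is rejected" comes from the list above.

```typescript
// Assumption: only full ISO 8601 timestamps such as 2026-05-30T16:00:00.000Z pass.
const ISO_TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function parseIsoMs(value: string): number | null {
  if (!ISO_TIMESTAMP.test(value)) return null;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? null : ms; // e.g. month 13 parses to NaN
}

// Returns null when the range is acceptable, otherwise a message suitable
// for a 400 validation_error response.
function validateAuditRange(since?: string, until?: string): string | null {
  const sinceMs = since === undefined ? null : parseIsoMs(since);
  const untilMs = until === undefined ? null : parseIsoMs(until);
  if (since !== undefined && sinceMs === null) return "invalid since";
  if (until !== undefined && untilMs === null) return "invalid until";
  if (sinceMs !== null && untilMs !== null && sinceMs > untilMs) {
    return "since must not be later than until";
  }
  return null;
}
```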

2026-05-30 (improvement loop, iteration 798)

docs(key-stats): document `top_models` array on `GET /admin/key-stats`

The GET /admin/key-stats API reference page (admin-api/key-stats) was missing the top_models field that was added to every key row in iteration 797, even though it was already covered in the changelog entry. Updated to match the live API:

Response JSON example — both key rows now include a top_models array showing model ID + sampled request count entries, sorted descending
Response fields table — new top_models row describing the field type, sort order, and empty-array behavior for keys with no model-bearing log rows
Use-cases section — new “See which models each top key is using” jq recipe that pulls top_models alongside key_name and total_tokens

2026-05-30 (improvement loop, iteration 792)

docs(models): document `bve_audio_input` annotation and `?audio_input=` filter

docs/src/content/docs/api-reference/models.mdx updated to cover the bve_audio_input boolean annotation and the ?audio_input= query parameter added to GET /v1/models in iterations 788–790:

Query parameters table — new audio_input row (after web_search): true returns only models with inline audio-input capability (gpt-4o-audio-preview / gpt-4o-mini-audio-preview families); false returns only models without; any other value returns 400 invalid_value
cURL examples — three new examples: ?audio_input=true, ?audio_input=false, and ?audio_input=true&provider=openai
Error mention — audio_input added to the invalid-value error description
Response JSON — "bve_audio_input": false added to the list-response and single-model-response examples
BVE-injected fields — count updated from six to seven; bve_audio_input description added, distinguishing it from TTS (generate audio) and STT (transcribe files) models
GET /v1/models/:id — bve_audio_input added to the annotation list in the description paragraph and response JSON example

2026-05-30 (improvement loop, iteration 791)

obs(logger): add quota limit fields to 429 rate_limit_exceeded log entries

Four new structured log fields are now emitted only on 429 rate_limit_exceeded responses where an authenticated key is present. This makes quota rejection log entries completely self-contained — operators can see the configured limit alongside the rejection reason in a single wrangler tail line without querying the admin API:

| Field | Description |
| --- | --- |
| `keyRpmLimit` | Configured requests-per-minute limit for the key |
| `keyRpdLimit` | Configured requests-per-day limit for the key |
| `keyMonthlyReqLimit` | Configured monthly request cap (absent for unlimited keys) |
| `keyMonthlyTokenLimit` | Configured monthly token cap (absent for unlimited keys) |

Example 429 log entry:

{
  "level": "warn",
  "type": "request",
  "errorCode": "rate_limit_exceeded",
  "quotaReason": "Rate limit exceeded: 30 requests per minute",
  "keyId": "550e8400-e29b-41d4-a716-446655440000",
  "keyName": "ci-bot",
  "keyRpmLimit": 30,
  "keyRpdLimit": 500,
  "keyMonthlyReqLimit": 5000
}

The fields appear in the log only on 429 rate_limit_exceeded — they do not pollute every authenticated request entry. New wrangler tail filter recipe added to the Observability guide:

# Quota rejections with configured limits
bun run tail | jq 'select(.errorCode == "rate_limit_exceeded") | {keyId, keyName, quotaReason, keyRpmLimit, keyRpdLimit}'
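
For log pipelines that are not jq-based, the same selection can be sketched in TypeScript. The function name and the line-per-entry input are assumptions; the field names are the ones listed above.

```typescript
const QUOTA_FIELDS = [
  "keyId", "keyName", "quotaReason",
  "keyRpmLimit", "keyRpdLimit", "keyMonthlyReqLimit", "keyMonthlyTokenLimit",
] as const;
type QuotaRejection = Partial<Record<(typeof QUOTA_FIELDS)[number], unknown>>;

// Hypothetical helper: keep only 429 quota rejections and project the
// key identity plus the configured limits.
function quotaRejections(lines: string[]): QuotaRejection[] {
  const out: QuotaRejection[] = [];
  for (const line of lines) {
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // skip lines that are not JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;
    const entry = parsed as Record<string, unknown>;
    if (entry["errorCode"] !== "rate_limit_exceeded") continue;
    const picked: QuotaRejection = {};
    for (const field of QUOTA_FIELDS) {
      // Absent fields stay absent, e.g. monthly caps on unlimited keys.
      if (field in entry) picked[field] = entry[field];
    }
    out.push(picked);
  }
  return out;
}
```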

2026-05-30 (improvement loop, iterations 788–790)

feat(models): add `bve_audio_input` annotation and `?audio_input=` filter to `GET /v1/models`

A seventh BVE-injected boolean field, bve_audio_input, is now present on every model entry returned by GET /v1/models and GET /v1/models/:id:

true — model accepts input_audio content blocks in POST /v1/chat/completions messages (gpt-4o-audio-preview, gpt-4o-mini-audio-preview, and dated variants)
false — model does not support inline audio input

A matching ?audio_input= query parameter filters the model list:

# Only models that accept audio input in chat completions
curl "https://api.bve.me/v1/models?audio_input=true" \
  -H "Authorization: Bearer sk-bve-YOUR_KEY"

This is distinct from TTS models (bve_category: tts, /v1/audio/speech) and STT models (bve_category: transcription, /v1/audio/transcriptions). The OpenAPI spec (GET /openapi.json) and the OpenAI-compatible validation layer were updated to cover this parameter.

feat(admin): add `top_models` field to `GET /admin/key-stats`

Each key row returned by GET /admin/key-stats now includes a top_models array, mirroring the top_endpoints breakdown on GET /admin/model-stats:

{
  "key_id": "550e...",
  "top_models": [
    { "model": "gpt-4o", "request_count": 142 },
    { "model": "claude-sonnet-4", "request_count": 38 }
  ]
}

The list is sorted descending by request_count within each key. Models are populated from request_logs_sampled using the same since/until/model/endpoint filters already supported by the endpoint. Keys with no model-bearing log rows return an empty array.

The query (getKeyTopModels) runs in parallel with getKeyUsageSummaryWithCount via Promise.all, adding no extra sequential D1 round-trip.
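
The shape of the per-key breakdown can be illustrated with an in-memory aggregation. The real getKeyTopModels query most likely aggregates in SQL; the row type, the function name, and the tie-break by model ID below are assumptions used only to show the resulting structure.

```typescript
interface SampledRow { key_id: string; model: string | null }
interface TopModel { model: string; request_count: number }

// Hypothetical in-memory equivalent of the per-key top_models grouping.
function topModelsByKey(rows: SampledRow[]): Map<string, TopModel[]> {
  const counts = new Map<string, Map<string, number>>();
  for (const row of rows) {
    let perKey = counts.get(row.key_id);
    if (perKey === undefined) {
      perKey = new Map<string, number>();
      counts.set(row.key_id, perKey);
    }
    // Rows without a model still register the key, so it ends up with [].
    if (row.model !== null) perKey.set(row.model, (perKey.get(row.model) ?? 0) + 1);
  }
  const result = new Map<string, TopModel[]>();
  counts.forEach((perKey, keyId) => {
    const list: TopModel[] = [];
    perKey.forEach((request_count, model) => {
      list.push({ model, request_count });
    });
    // Descending by request_count; ties broken by model ID (an assumption).
    list.sort((a, b) => b.request_count - a.request_count || (a.model < b.model ? -1 : 1));
    result.set(keyId, list);
  });
  return result;
}
```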

fix(models): add category-specific hints to `model_endpoint_mismatch` errors

model_endpoint_mismatch errors from POST /v1/audio/transcriptions and POST /v1/images/edits (multipart routes) now include a parenthetical describing the correct endpoint:

| Sent model | Endpoint | Error message suffix |
| --- | --- | --- |
| `tts-1` | `/v1/audio/transcriptions` | `(text-to-speech model, use POST /v1/audio/speech instead)` |
| `gpt-4o` | `/v1/audio/transcriptions` | `(chat model — this endpoint requires a speech-to-text model such as whisper-1)` |
| `imagen-3` | `/v1/audio/transcriptions` | `(image generation model, use POST /v1/images/generations instead)` |

This matches the hints already present on JSON-body routes and makes the correction actionable without consulting the docs.

2026-05-30 (improvement loop, iteration 787)

Two commonly-needed pages were absent from the home page discovery surface:

Observability (guides/observability) — structured logs reference added in iteration 777 — now appears in the “Popular topics” card grid with a description covering log fields, wrangler tail recipes, and Cloudflare Logpush.
Troubleshooting (guides/troubleshooting) — the primary error-diagnosis page — now appears in “Popular topics” alongside Observability.

The Popular topics section grows from 6 to 8 cards (3 → 4 rows), keeping all existing cards intact.

Additionally, the sidebar entry for “Observability” now carries a { text: "New", variant: "tip" } badge — consistent with the “Responses API” and “Anthropic Messages API” sidebar badges — so users browsing the Guides section see it was recently added.

2026-05-30 (improvement loop, iteration 779)

smoke(models): add `?web_search=` filter coverage + admin allowlist `bve_vision`/`bve_web_search` checks

scripts/smoke-test.sh was missing regression coverage for the ?web_search= query param filter added in iteration 773 and the bve_vision/bve_web_search annotations on the admin model-allowlist endpoint. Added:

--- Model list ?web_search= filter --- section (14 new checks) — mirrors the existing ?reasoning= and ?vision= pattern:
- ?web_search=maybe → 400 with code=invalid_value, param=web_search
- ?web_search=True (uppercase) → 400 (case-sensitive guard)
- ?web_search=true → 200 with bve_web_search field present and gpt-4o-search-preview in the result
- ?web_search=false → 200 with gpt-4o present (non-web-search model guard)
Admin model-allowlist section — added two missing annotation presence checks:
- bve_vision field present in GET /admin/model-allowlist response
- bve_web_search field present in GET /admin/model-allowlist response

2026-05-30 (improvement loop, iteration 777)

docs(observability): add Observability & Structured Logs guide

Added a new “Observability & Structured Logs” guide page (guides/observability) documenting all structured JSON log fields emitted by the BVE Gateway per-request logger. Covers:

Full field reference table for all ~35 log fields (always-present, routing, latency, key, model/provider, upstream correlation, cache status, token usage, error/quota, Cloudflare metadata, worker version)
Provider-specific notes for OpenAI, Anthropic Claude, Google Gemini, Cohere, Groq/Meta Llama, and OpenRouter — including which header each provider uses for upstreamRequestId, Anthropic-only cacheWriteTokens, Gemini NDJSON streaming behavior, and Groq’s x-groq-processing-time
wrangler tail | jq filter recipes for common observability tasks (slow requests, provider filtering, cache hit rate, quota rejections, web-search traffic share)
Cloudflare Logpush integration steps (Workers Trace Events dataset)
A complete example log entry showing all fields in context

2026-05-30 (improvement loop, iteration 776)

docs(curl-examples): add streaming web search cURL example

guides/curl-examples.md had web search examples but no streaming variant. Added a new “Chat completion with web search (streaming SSE)” subsection immediately after the existing web search section with two ready-to-paste examples:

Basic streaming — gpt-4o-search-preview with stream: true and an empty web_search_options: {}
Streaming with context size and user location — gpt-4.1 with stream: true, search_context_size: "high", and a full user_location.approximate object

Includes a note on the SSE stream format (chat.completion.chunk deltas, terminated with data: [DONE]) and on the "webSearch": true structured log field emitted by the gateway for all requests that carry web_search_options.

2026-05-30 (improvement loop, iteration 769)

docs(curl-examples): add web search section

guides/curl-examples.md had no example for web_search_options even though chat-completions.mdx already documented the full parameter shape. Added a “Chat completion with web search” section with two ready-to-paste cURL examples:

Basic — gpt-4o-search-preview with an empty web_search_options: {} to trigger search with default settings
With context size and user location — gpt-4.1 with search_context_size: "high" and a full user_location.approximate object (city, region, country, timezone)

Includes an inline note on valid search_context_size values ("low" / "medium" / "high"), which location sub-fields are optional, and the 400 errors returned for invalid shapes. Links to the full Web search section in chat-completions.mdx.

2026-05-30 (improvement loop, iteration 762)

docs: add Claude extended thinking documentation + reasoning cURL examples

chat-completions.mdx — the thinking parameter (Claude extended thinking for claude-3-7-sonnet and later) was validated by the gateway but entirely absent from the docs.

Added:

thinking row to the parameters table with link to new dedicated section
6 new validation error rows covering all thinking.* constraint violations
### Claude extended thinking section with a full budget_tokens table, bash example, annotated JSON response shape, and constraint-violation reference

curl-examples.md — added a “Reasoning and extended thinking” section grouping all four provider-specific reasoning mechanisms into one place:

o-series reasoning_effort (low / medium / high / auto)
o-series max_reasoning_tokens (explicit token cap)
Claude thinking with budget_tokens ≥ 1024
Gemini 2.5 thinking_config.thinking_budget

Each example includes a brief note on valid values and links to the full API reference.

2026-05-30 (improvement loop, iteration 759)

docs(curl-examples): add function calling and audio output sections

Two commonly-used OpenAI-compatible features were undocumented in guides/curl-examples.md.

Function calling (tools) — added a two-step example showing how to send tool definitions, handle a tool_calls response, and return a tool-role message with the function result. Covers tool_choice: "auto", forcing a specific function, and disabling tool calling with "none".

Audio output (modalities + audio) — added two examples for gpt-4o-audio-preview:

Basic request with modalities: ["text", "audio"], voice, and format
Pipe-to-file example using jq to decode the base64 audio.data field directly to hello.mp3

Includes the full voice list (alloy, ash, ballad, cedar, coral, echo, fable, marin, nova, onyx, sage, shimmer, verse) and valid formats (mp3, opus, aac, flac, wav, pcm16, pcm24).

2026-05-30 (improvement loop, iterations 752–758)

docs(curl-examples): add vision chat completion section

guides/curl-examples.md had ?vision=true model filter examples but no working example of sending an image to a vision-capable model. Added a “Chat completion with image input (vision)” section with three ready-to-paste cURL examples:

HTTPS image URL: send a publicly accessible image alongside a text prompt
With detail: "high": control resolution for detailed analysis (e.g. screenshot OCR)
Base64 inline image: embed a data:image/jpeg;base64,... URL for direct image upload

The section links to ?vision=true for model discovery and to the full vision documentation in chat-completions.mdx for validation rules and SDK examples.

feat(observability): log `httpVersion` from `cf.httpProtocol`

The structured request log now includes an httpVersion field (e.g. "HTTP/1.1", "HTTP/2", "HTTP/3") sourced from request.cf.httpProtocol. It is positioned after colo in the field order and is absent in local dev and on requests without CF metadata. This enables Logpush queries like WHERE httpVersion = "HTTP/2" without changing existing filters.

feat(validation): validate `input_audio` content blocks in chat completions

validateChatCompletionsBody now validates input_audio blocks in multimodal user message content. When a block has type: "input_audio", the gateway checks that input_audio is a non-null object with: data (required non-empty string), and format (required string from the set flac, m4a, mp3, ogg, wav, webm). Without this guard, malformed audio blocks reached Fuelix and produced opaque 422 errors. Nine unit tests cover the new validation path.
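
The rules above can be sketched as a small standalone check. The function name and the returned parameter paths are hypothetical; the required fields and the format set are taken from the description.

```typescript
const INPUT_AUDIO_FORMATS = ["flac", "m4a", "mp3", "ogg", "wav", "webm"];

// Hypothetical validator: returns the offending parameter path, or null
// when the block is valid or is not an input_audio block at all.
function validateInputAudioBlock(block: { type: string; input_audio?: unknown }): string | null {
  if (block.type !== "input_audio") return null; // other block types are out of scope
  const audio = block.input_audio;
  if (typeof audio !== "object" || audio === null) return "input_audio";
  const fields = audio as Record<string, unknown>;
  const data = fields["data"];
  const format = fields["format"];
  // data: required non-empty string.
  if (typeof data !== "string" || data.length === 0) return "input_audio.data";
  // format: required string from the allowed set.
  if (typeof format !== "string" || INPUT_AUDIO_FORMATS.indexOf(format) === -1) {
    return "input_audio.format";
  }
  return null;
}
```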

2026-05-30 (improvement loop, iterations 741–751)

docs(curl-examples): add model filter examples and fix allowlist response shape

guides/curl-examples.md — the quick-reference cURL page — had only a single bare “List models” example. It was also missing the bve_* fields that the GET /admin/model-allowlist/:model endpoint has returned since iteration 722.

New “List models — filtered” section with eight ready-to-paste examples: ?category=chat, ?vision=true, ?reasoning=true, ?provider=anthropic, ?endpoint=/v1/embeddings, ?search=gpt-4o, and two combined-filter examples (?vision=true&category=chat&provider=openai, ?reasoning=true&provider=openai).

Updated model allowlist inspect response examples now match the actual API shape: both the working-model and unregistered-model examples include bve_category, bve_provider, bve_reasoning, bve_vision, and the allowlist object (or null), consistent with admin-api/model-allowlist.md.

feat(model-filter): mark Llama 4 Maverick and Scout as vision-capable

llama-4-maverick-17b-128e and llama-4-scout-17b-16e — Meta’s April 2025 natively multimodal models — were incorrectly annotated bve_vision: false. Added 'llama-4-' to VISION_MODEL_PREFIXES in src/middleware/modelFilter.ts; both models and any future llama-4-* variants now return bve_vision: true and appear in ?vision=true results.

feat(scripts): add `--expired`, `--since`, `--until` flags to `key:list`

bun run key:list now exposes the ?expired=true, ?since=, and ?until= query parameters that the admin API already supported. Operators who saw an expiring_soon count in bun run stats can now list those specific keys without curling the API directly. The new --expired flag is mutually exclusive with --status. Filter labels and pagination hints carry the new flags forward.

refactor(usage): consolidate streaming usage recorder; remove dead code

recordUsageWhenReady and sampledLogCallback were the old streaming usage path. They were replaced on all four streaming route handlers by a unified recordStreamingUsage() in src/services/usage.ts (iter 746), then the old functions were removed entirely (iter 748), cutting ~100 lines of dead code and simplifying the usage.ts module. Before either step, ten dedicated unit tests were added for the new function.

fix(security): validate `openai-processing-ms` header + `thinking_config.include_thoughts`

openai-processing-ms from OpenAI upstream is now validated as a numeric millisecond value before being forwarded to clients, preventing a potentially attacker-controlled header value from reaching end-users unchanged. thinking_config.include_thoughts in the chat completions body is validated as a boolean.
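
A minimal sketch of such a header check. The helper name, the accepted numeric pattern, and the choice to drop (rather than rewrite) a bad value are assumptions; the entry above only says the value is validated as numeric milliseconds before forwarding.

```typescript
// Hypothetical helper: returns the value to forward, or null to drop the header.
function sanitizeProcessingMs(value: string | null): string | null {
  if (value === null) return null;
  const trimmed = value.trim();
  // Assumption: plain non-negative decimals only, e.g. "123" or "123.5".
  // Anything else, including embedded CR/LF, is rejected.
  return /^\d{1,9}(\.\d{1,3})?$/.test(trimmed) ? trimmed : null;
}

// The second check mentioned above is a plain type test.
function isValidIncludeThoughts(value: unknown): boolean {
  return typeof value === "boolean";
}
```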

docs(model-allowlist): add `bve_vision` to allowlist response examples and field tables

admin-api/model-allowlist.md was missing bve_vision from its list/inspect response examples and field description table. Updated in iter 745.

2026-05-29 (improvement loop, iteration 740)

docs(audit-logs): add `before`/`after` fields to `api_key.updated` metadata

The audit-logs.md documentation listed the api_key.updated audit entry’s metadata as having only a changes array (e.g. ["rpm_limit", "name"]). In practice, src/services/keys.ts also records before and after objects containing the old and new field values keyed by snake_case field name — this was added in an earlier iteration but never reflected in the docs.

Updated response example now shows the full metadata shape:

"metadata": {
  "changes": ["name", "rpm_limit"],
  "before": { "name": "old-name", "rpm_limit": 60 },
  "after": { "name": "new-name", "rpm_limit": 120 },
  "source": "legacy_admin_api"
}

Updated action types table now describes all three metadata fields for api_key.updated:

changes — array of changed field names in snake_case
before — object mapping each changed field to its value before the update
after — object mapping each changed field to its new value

The before/after fields enable audit reviewers to reconstruct the exact change without querying the key record separately. For allowed_models, the value is a parsed JSON array (not a string). For expires_at, the value is an ISO 8601 string.
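
How the three metadata fields relate can be sketched as below. This is not the code in src/services/keys.ts: the function name, the rule that unchanged fields are skipped, and the JSON-based comparison are assumptions chosen to reproduce the documented shape.

```typescript
type KeyFields = Record<string, unknown>;
interface UpdateMetadata {
  changes: string[];
  before: KeyFields;
  after: KeyFields;
  source: string;
}

// Hypothetical builder for the api_key.updated metadata object.
function buildUpdateMetadata(current: KeyFields, patch: KeyFields): UpdateMetadata {
  const meta: UpdateMetadata = { changes: [], before: {}, after: {}, source: "legacy_admin_api" };
  for (const field of Object.keys(patch)) {
    // Compare by JSON value so arrays such as allowed_models compare by content.
    if (JSON.stringify(current[field]) === JSON.stringify(patch[field])) continue;
    meta.changes.push(field);
    meta.before[field] = current[field];
    meta.after[field] = patch[field];
  }
  return meta;
}
```

Called with a stored key of `{ name: "old-name", rpm_limit: 60 }` and a patch of `{ name: "new-name", rpm_limit: 120 }`, this yields the metadata object shown in the example above.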

Also corrected the api_key.created response example: it was missing source: "legacy_admin_api" from the metadata, which is always added by the LEGACY_ADMIN_ACTOR.

2026-05-29 (improvement loop, iterations 729–737)

fix(images): set `X-BVE-Model` response header on `POST /v1/images/generations` success path

/v1/images/generations was the only authenticated endpoint that did not set the X-BVE-Model response header on a 200 success response. Every other endpoint sets this header via handlePassthroughResponse, but the images/generations 200 path had a bespoke success branch that only set X-BVE-Latency. Two lines were added to match the pattern used by all other authenticated endpoints.

Browser clients calling POST /v1/images/generations can now read X-BVE-Model from the response header to confirm which model served the request.

perf(usage): combine `recordUsage` + `logRequestSampled` into a single D1 batch

Non-streaming route handlers previously registered two separate ctx.waitUntil calls for background D1 writes — one for daily/monthly usage counters and one for the 1%-sampled request log insert. Each was a separate HTTP round-trip to Cloudflare D1.

Both writes are now batched into a single d1.batch() call, reducing background D1 requests from 2 to 1 for sampled requests and eliminating the second waitUntil entirely for the 99% non-sampled case.

A new buildUsageStmts() shared helper eliminates the duplicated SQL between the two code paths, and incrementUsageAndMaybeLog() in src/db/queries.ts appends the sampled log insert to the same batch when it applies.

docs: add Response headers sections to `images.mdx`, `audio.mdx`, and `legacy-completions.mdx`

Three API reference pages were missing the standard Response headers table that all other authenticated endpoint pages include.

Each page now documents: X-Request-Id, X-BVE-Client-Id, X-BVE-Latency, X-BVE-Model, X-BVE-Key-Name, and a note linking to the Rate Limits page for X-RateLimit-* headers. The audio TTS section includes a callout that TTS returns binary audio so BVE headers are read from HTTP response headers independently of the response body.

docs: add structured Gateway validation tables to `images.mdx`

images.mdx was the last API reference page with prose-only validation notes rather than the structured ### Gateway validation tables present on every other endpoint page.

Both the /v1/images/generations and /v1/images/edits sections now have:

A Required fields table (param / code / source field columns)
An Optional field constraints table covering all 12+ validated fields with their specific invalid_type or invalid_value constraint and exact error code

Validation is sourced directly from validateImagesGenerationsBody() and validateImagesEditsFormData() in src/validation/openai.ts.

2026-05-29 (improvement loop, iteration 722)

fix(models): correct `bve_vision` detection for Claude 3.5 Haiku

claude-3-5-haiku and claude-3-5-haiku-20241022 were in CHAT_MODELS and support image inputs in chat completions, but isVisionModel() returned false for them. The VISION_MODEL_PREFIXES covered claude-haiku-4 (4th-generation Haiku) but missed the 3.5-generation family. Added 'claude-3-5-haiku' prefix to fix the annotation — both models now receive bve_vision: true in GET /v1/models and GET /v1/models/:id responses, and are included in GET /v1/models?vision=true results.

Two regression tests added to isVisionModel — unit tests.
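
A minimal sketch of the prefix check, assuming isVisionModel() matches with startsWith. The list below is an illustrative subset under a hypothetical name; the real VISION_MODEL_PREFIXES contains more entries.

```typescript
// Illustrative subset only; not the full list from the gateway source.
const VISION_MODEL_PREFIXES_SKETCH: readonly string[] = [
  "claude-haiku-4",   // 4th-generation Haiku (already covered)
  "claude-3-5-haiku", // 3.5-generation Haiku (the prefix added by this fix)
];

// Assumption: a model is vision-capable when its ID starts with a listed prefix.
function isVisionModelSketch(modelId: string): boolean {
  return VISION_MODEL_PREFIXES_SKETCH.some((prefix) => modelId.startsWith(prefix));
}
```

Because the dated variant claude-3-5-haiku-20241022 shares the same prefix, one new entry covers both model IDs.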

fix(logger): correct `msToIsoString` midnight-crossing timestamp bug

Replaced the getDateStrings() date-cache dependency with new Date(ms).toISOString(). The cache is keyed on the current time, not on the argument, so a concurrent request crossing UTC midnight could leave a stale D+1 date in the cache, producing incorrect timestamps for requests that started on day D. Regression test added.
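
The replacement is small enough to show in full. The function body follows the description above; the sample timestamp is only an illustration.

```typescript
// Derive the timestamp from the argument itself, never from a shared "current date" cache.
function msToIsoString(ms: number): string {
  return new Date(ms).toISOString();
}

// A request that started one millisecond before UTC midnight keeps its own day.
const startedAt = Date.UTC(2026, 4, 28, 23, 59, 59, 999); // month is 0-based: 4 = May
const iso = msToIsoString(startedAt); // "2026-05-28T23:59:59.999Z"
```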

docs: document `bve_vision`/`?vision=` filter and `is_expired` field

Three documentation pages updated to reflect features shipped in iterations 718–722:

api-reference/models.mdx

Added ?vision= to the query parameters table: true returns only vision-capable models (those accepting image inputs in message content); false returns non-vision models
Added three new cURL examples: ?vision=true, ?vision=true&category=chat, ?vision=true&provider=anthropic
Updated the error validation note to include vision in the invalid-value param list
Updated the response JSON example to include "bve_vision": true
Extended the BVE fields description from four to five entries with a clear explanation of bve_vision (image-accepting models; image-generation models are false)
Updated GET /v1/models/:id notes and response example to include bve_reasoning and bve_vision

admin-api/api-keys.md

Added "is_expired": false to the Create (201), Get (200), and List (200) response JSON examples
Added is_expired row to the Get API key field table: true when expires_at is non-null and in the past, false otherwise — lets clients detect expired-but-status=active keys without comparing date strings

api-reference/usage.mdx

Added "is_expired": false to the GET /v1/usage response JSON example
Added is_expired row to the Response fields table with the same semantics as the admin key response
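
The is_expired semantics described above can be sketched as a small pure function. The name isExpired and the injectable nowMs parameter are assumptions for this example, as is the strict comparison at the exact expiry instant.

```typescript
// true only when expires_at is set and already in the past.
function isExpired(expiresAt: string | null, nowMs: number = Date.now()): boolean {
  if (expiresAt === null) return false; // keys without an expiry never expire
  return Date.parse(expiresAt) < nowMs;
}
```

A client can then treat status: "active" together with is_expired: true as an unusable key without parsing dates itself.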

2026-05-29 (improvement loop, iteration 708)

docs(response-headers): document `X-BVE-Client-Id` response header

The X-BVE-Client-Id response header was added in commit ba5b825 but was entirely absent from the public documentation. Three pages were updated; they are listed under “Pages updated” below.

What it does:

When a client sends X-Request-Id with a valid value (alphanumeric + -_., ≤ 128 chars), BVE Gateway echoes it back in a separate X-BVE-Client-Id response header. This lets clients confirm their trace ID was received and logged — without querying the log stream.

The server-generated UUID always stays authoritative in X-Request-Id. X-BVE-Client-Id is the echo-only header; it is absent when the client did not supply X-Request-Id or the value failed the safe-character validation.

The header is exposed via CORS so browser-based SDK clients can read it after preflight.
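
A hedged sketch of the echo rule: it assumes the safe character set is letters, digits, hyphen, underscore, and dot (reading the comma in the description as a separator) and that an empty value is rejected. The names SAFE_CLIENT_ID and clientIdToEcho are hypothetical.

```typescript
// Assumed safe-character rule: letters, digits, '-', '_', '.', at most 128 characters.
const SAFE_CLIENT_ID = /^[A-Za-z0-9._-]{1,128}$/;

// Returns the value to echo in X-BVE-Client-Id, or null when the header must be omitted.
function clientIdToEcho(requestIdHeader: string | null): string | null {
  if (requestIdHeader === null) return null; // client sent no X-Request-Id
  return SAFE_CLIENT_ID.test(requestIdHeader) ? requestIdHeader : null;
}
```

The server-generated UUID in X-Request-Id is unaffected by the result; only the presence of the echo header changes.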

Pages updated:

api-reference/chat-completions.mdx — added X-BVE-Client-Id row to the gateway response headers table
api-reference/streaming.mdx — added X-BVE-Client-Id row to the streaming response headers table
guides/security.md — added X-BVE-Client-Id row to the CORS exposed headers table

2026-05-29 (improvement loop, iteration 697)

docs(admin): LinkCard grids for five admin API “Next steps” sections

Five admin API reference pages had plain Markdown bullet lists for their “Next steps” sections, inconsistent with the rest of the documentation, which uses Starlight LinkCard / CardGrid components for visual navigation. Updated:

admin-api/api-keys.md — 7 link cards (Usage Statistics, Key Stats, Model Stats, Endpoint Stats, Audit Logs, Model Allowlist, Rate Limits & Quotas)
admin-api/endpoint-stats.md — 4 link cards (Key Stats, Model Stats, Request Logs, Admin API Overview)
admin-api/key-stats.md — 4 link cards (Per-Key Stats, Endpoint Stats, Usage Statistics, Admin API Overview)
admin-api/model-allowlist.md — 3 link cards (API Keys, Models, Errors)
admin-api/quota.md — 4 link cards (API Keys, Rate Limits & Quotas, Usage Statistics, Admin API Overview)

Each page also received the required import { LinkCard, CardGrid } from '@astrojs/starlight/components'; import. Build: 38 pages, 0 errors.

2026-05-29 (improvement loop, iteration 690)

docs(chat-completions): document `provider` OpenRouter routing object

The provider field was added to validateChatCompletionsBody in commit f79da16 (iteration 683) but was not documented anywhere in the public Chat Completions reference. Callers using OpenRouter’s provider routing (e.g. provider: { order: ["OpenAI", "Azure"] }) had no docs to cross-reference for field names, allowed values, or gateway validation rules.

Three additions to docs/src/content/docs/api-reference/chat-completions.mdx:

Request body table: new provider row linking to the new subsection
Optional field constraints table: 10 new validation rows covering the provider object and all sub-fields
New “OpenRouter provider routing (provider)” subsection under Provider-specific extensions with field table, cURL example, and validation rules

provider field reference:

Field	Type	Description
`order`	string[]	Preferred provider routing order
`only`	string[]	Restrict routing to listed providers only
`ignore`	string[]	Exclude listed providers from routing
`allow_fallbacks`	boolean	Whether to fall back to other providers on failure
`require_parameters`	boolean	Only route to providers that support all request parameters
`data_collection`	string	`"allow"` or `"deny"` for training data collection
`quantizations`	string[]	Restrict to providers with specific quantization levels
`sort`	string	Sort providers by criterion before routing
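
For TypeScript callers, the table above maps onto a type along these lines. This is a sketch: treating every field as optional is an assumption, and the ProviderRouting name and the sample request body are illustrative.

```typescript
// Field names and types follow the table above; optionality is assumed.
interface ProviderRouting {
  order?: string[];
  only?: string[];
  ignore?: string[];
  allow_fallbacks?: boolean;
  require_parameters?: boolean;
  data_collection?: "allow" | "deny";
  quantizations?: string[];
  sort?: string;
}

interface ChatRequestSketch {
  model: string;
  messages: { role: string; content: string }[];
  provider?: ProviderRouting;
}

// Example body: prefer OpenAI, then Azure, with no fallback to other providers.
const requestBody: ChatRequestSketch = {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
  provider: { order: ["OpenAI", "Azure"], allow_fallbacks: false, data_collection: "deny" },
};
```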

2026-05-29 (improvement loop, iterations 644–669)

docs(chat-completions): document `transforms` OpenRouter parameter

The transforms request body field has been validated by the gateway since early iterations but was absent from the Chat Completions reference. It is now documented in the parameter table, the validation error table, and a dedicated section.

transforms (array of strings)

OpenRouter’s prompt-transformation pipeline applied before inference. The only currently documented value is "middle-out" — OpenRouter’s context-window compression algorithm that removes less-important middle tokens when a prompt exceeds the model’s context limit.

Non-array type → 400 invalid_type
Element not a string → 400 invalid_type
Element is empty string → 400 invalid_value
Empty array [] is accepted (no transforms applied)

Forwarded unchanged to Fuelix. Non-OpenRouter providers silently ignore the field.
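
The four validation rules above can be sketched as a single check. The function name and return shape are hypothetical, and the sketch assumes the caller only invokes it when the field is present in the request body.

```typescript
type TransformsError = { status: 400; code: "invalid_type" | "invalid_value" } | null;

// Returns null when the value is acceptable, otherwise the 400 error code to report.
function validateTransformsSketch(value: unknown): TransformsError {
  if (!Array.isArray(value)) return { status: 400, code: "invalid_type" };
  for (const item of value) {
    if (typeof item !== "string") return { status: 400, code: "invalid_type" };
    if (item === "") return { status: 400, code: "invalid_value" };
  }
  return null; // includes the empty array: no transforms applied
}
```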

feat(observability): extract `cachedTokens` and `reasoningTokens` from OpenAI usage

The structured JSON request log now includes two additional usage sub-fields extracted from OpenAI usage detail objects:

cachedTokens — from usage.prompt_tokens_details.cached_tokens — the number of prompt tokens served from the OpenAI prompt cache. Non-zero values indicate a cache hit and correspond to lower per-token cost.
reasoningTokens — from usage.completion_tokens_details.reasoning_tokens — the number of completion tokens consumed by internal chain-of-thought for o-series reasoning models.

Both fields are logged alongside the existing usage.total_tokens, usage.prompt_tokens, and usage.completion_tokens. They are 0 for models that do not emit these fields.
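
A minimal sketch of the extraction, using the two usage paths named above. The UsageLike type and the extractTokenDetails name are assumptions for the example.

```typescript
// Partial shape of an OpenAI-style usage object; only the fields read here.
interface UsageLike {
  prompt_tokens_details?: { cached_tokens?: number } | null;
  completion_tokens_details?: { reasoning_tokens?: number } | null;
}

// Missing detail objects or fields fall back to 0, matching the logged default.
function extractTokenDetails(
  usage: UsageLike | null | undefined,
): { cachedTokens: number; reasoningTokens: number } {
  return {
    cachedTokens: usage?.prompt_tokens_details?.cached_tokens ?? 0,
    reasoningTokens: usage?.completion_tokens_details?.reasoning_tokens ?? 0,
  };
}
```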

feat(models): 9 new OpenAI audio-preview and search-preview model IDs

The following models were previously absent from the gateway’s internal model registry, causing 403 model_not_available for valid API requests despite being documented in both the gateway validation and reference pages:

Added model ID	Category
`gpt-4o-audio-preview`	Chat
`gpt-4o-audio-preview-2024-10-01`	Chat
`gpt-4o-audio-preview-2024-12-17`	Chat
`gpt-4o-mini-audio-preview`	Chat
`gpt-4o-mini-audio-preview-2024-12-17`	Chat
`gpt-4o-search-preview`	Chat
`gpt-4o-search-preview-2025-03-11`	Chat
`gpt-4o-mini-search-preview`	Chat
`gpt-4o-mini-search-preview-2025-03-01`	Chat

All 9 models are categorised as chat, provider openai, and allowed on /v1/chat/completions, /v1/completions, and /v1/responses.

feat(validation): reject `n > 1` with `stream: true` for chat and legacy completions

A /v1/chat/completions or /v1/completions request that combined n > 1 with stream: true was previously forwarded to Fuelix, which returned an opaque 422 upstream error. The gateway now validates this combination and returns:

{
  "error": {
    "message": "n > 1 is not supported with stream: true",
    "type": "invalid_request_error",
    "param": "n",
    "code": "invalid_value"
  }
}

The 400 is returned before any upstream call, giving callers a clear, actionable error instead of a proxied 422.
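
The check itself can be sketched as below. The error body is copied from the example above; the checkStreamN name and the loose body type are assumptions, and the sketch does not cover the other n validations the gateway performs.

```typescript
interface GatewayError {
  error: { message: string; type: string; param: string; code: string };
}

// Returns the 400 error body for the rejected combination, or null when the request may proceed.
function checkStreamN(body: { n?: unknown; stream?: unknown }): GatewayError | null {
  if (body.stream === true && typeof body.n === "number" && body.n > 1) {
    return {
      error: {
        message: "n > 1 is not supported with stream: true",
        type: "invalid_request_error",
        param: "n",
        code: "invalid_value",
      },
    };
  }
  return null;
}
```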

feat(admin): `has_more` pagination field on `GET /admin/api-keys`

The GET /admin/api-keys response now includes a top-level has_more boolean alongside data and total. When has_more is true, there are additional keys beyond the current page that can be retrieved by incrementing the ?page= parameter. This matches the pagination shape of the /admin/request-logs and stats endpoints.
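
A client-side sketch of walking every page with has_more. The page fetcher is injected so the example stays independent of any HTTP client; starting at page 1 is an assumption, and the KeyPage and listAllKeys names are hypothetical.

```typescript
interface KeyPage<T> { data: T[]; total: number; has_more: boolean }

// Keep requesting the next page until the server reports has_more: false.
async function listAllKeys<T>(
  fetchPage: (page: number) => Promise<KeyPage<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let page = 1; // assumption: pages are 1-based
  for (;;) {
    const res = await fetchPage(page);
    all.push(...res.data);
    if (!res.has_more) return all;
    page += 1;
  }
}
```

In practice fetchPage would call GET /admin/api-keys?page=N with the admin bearer token and parse the JSON response.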

security(redact): Stripe and SendGrid key patterns added to `redactSecrets`

The redactSecrets() utility in src/services/fuelix.ts now recognises the following additional provider key patterns, preventing accidental log leakage:

Pattern	Description
`sk_live_` / `sk_test_`	Stripe secret keys
`rk_live_` / `rk_test_`	Stripe restricted keys
`whsec_*`	Stripe webhook signing secrets
`SG.*` (base64 after dot)	SendGrid v3 API keys

These join the existing patterns for OpenAI (sk-…), Groq (gsk_…), Google (AIzaSy…), Anthropic (sk-ant-…), and Fuelix keys.

docs: LinkCard grids for errors, troubleshooting, and SDK guide “Next steps”

The “Next steps” / “See also” sections at the bottom of three reference pages were plain markdown bullet lists. They now render as visual Starlight LinkCard grids, consistent with the Getting Started onboarding pages updated in iteration 661.

Pages updated: errors.mdx, troubleshooting.md, sdk.mdx.

2026-05-28 (improvement loop, iteration 643)

docs(admin-stats): document cross-dimensional filters on key-stats, model-stats, endpoint-stats

Three admin stats reference pages were updated to document query parameters that were implemented in prior iterations but not yet reflected in the docs:

GET /admin/key-stats (key-stats.md):

Added ?model= — restrict the key leaderboard to a single model ID (added in iteration 628)
Added ?endpoint= — restrict to a specific endpoint path, e.g. /v1/responses (added in iteration 638)
total field now shown in the response example and response-fields table (added in iteration 632)
New cURL examples: model filter, endpoint filter, combined model + endpoint cross-filter
TypeScript example updated to destructure total from the response

GET /admin/model-stats (model-stats.md):

Added ?endpoint= — restrict model aggregates to a specific endpoint path (added in iteration 639)
New cURL examples: endpoint-only filter, endpoint + provider combined filter
New use case: “Find which models are used on a specific endpoint”

GET /admin/endpoint-stats (endpoint-stats.md):

Added ?model= — restrict endpoint aggregates to a single model ID
New cURL examples: model filter, combined key_id + model filter
New use case: “Find which endpoints a specific model is called through”

All three filters compose with each other and with the existing date-range (since/until), key_id, and provider filters documented on each page.

2026-05-28 (improvement loop, iteration 639)

feat(admin): `?endpoint=` filter for `GET /admin/model-stats`

Operators can now filter the model-stats leaderboard to a specific endpoint path (e.g. ?endpoint=/v1/responses) to find which models are used on that exact endpoint. When supplied, only request_logs_sampled rows for that endpoint are aggregated; models with no rows for the endpoint are excluded entirely.

Combines with the existing ?key_id= and ?provider= filters — for example ?endpoint=/v1/responses&provider=openai returns only OpenAI models seen on the Responses API endpoint.

The ?endpoint= filter uses the indexed endpoint column in request_logs_sampled and is evaluated inside the existing batched D1 call — no additional round-trip.

2026-05-28 (improvement loop, iteration 638)

feat(admin): `?endpoint=` filter for `GET /admin/key-stats`

Operators can now filter the key-stats leaderboard to a specific endpoint path (e.g. ?endpoint=/v1/responses) to find the top consumers of that exact endpoint. When supplied, only request_logs_sampled rows for that endpoint are aggregated; keys with no rows for the endpoint are excluded entirely.

Combines with the existing ?model= filter — ?model=gpt-4o&endpoint=/v1/responses returns only keys using gpt-4o through the Responses API.

The ?endpoint= filter uses the indexed endpoint column in request_logs_sampled and is evaluated inside the existing batched D1 call — no additional round-trip.

2026-05-28 (improvement loop, iteration 636)

docs(security): complete upstream header sanitization reference

The “Upstream header sanitization” section in Security Notes now documents every header category stripped by proxyToFuelix, including groups added in recent iterations that were missing from the reference:

api-key — Azure OpenAI SDK credential header stripped to prevent accidental Azure credential exposure
openai-beta — Stripped to prevent clients from enabling experimental Fuelix features; CORS note clarifies it is accepted by preflight but not forwarded
anthropic-dangerous-direct-browser-only — Stripped to prevent request routing under direct Anthropic billing
host — Stripped so the Workers runtime derives the correct Host from the upstream URL
Extended x-forwarded-* variants — x-forwarded-scheme, x-forwarded-ssl, x-forwarded-server, x-forwarded-port, x-forwarded-prefix now documented alongside the existing x-forwarded-host/x-forwarded-proto entries
Browser metadata headers (sec-*) — sec-fetch-site, sec-fetch-mode, sec-fetch-dest, sec-fetch-user, sec-purpose, sec-gpc and all other sec-* prefixed headers
SDK telemetry headers (x-stainless-*) — OS, language, SDK version, runtime headers emitted by Anthropic/OpenAI/Stainless SDKs
Gateway-internal headers (x-bve-*) — X-BVE-Worker, X-BVE-Model, X-BVE-Latency, X-BVE-Cache, X-BVE-Signature; these are response-only annotations that must never appear on upstream requests

The “Response header filtering” table now includes X-BVE-Worker (added in a prior iteration as a gateway-added response header).

2026-05-28 (improvement loop, iterations 617–618)

fix(security): redact model field in `modelFilter` warn logs

When a client sends a provider-key-format string as the model field (e.g. sk-proj-abc… or gsk_xyz…) the gateway rejects the request, but the raw credential string was being emitted verbatim in every model_blocked and param_validation warn log entry visible in wrangler tail / Logpush. A logSafeModel variable (the result of redactSecrets() applied once after the MAX_MODEL_ID_LENGTH guard) now replaces the raw model string at all seven console.warn() sites in modelFilter.ts. redactSecrets() is a no-op for legitimate model IDs so there is zero overhead on the non-adversarial path.

Five regression tests cover three warn log paths:

globally_blocked / not_available — sk-proj-*, gsk_*, sk-bve-* used as model names produce [REDACTED] in warn log fields
key_not_allowed — sk-ant-api03-* key used as a model name (explicitly allowed in D1) but blocked by per-key allowlist produces [REDACTED]
Positive regression — a legitimate unknown model ID (e.g. gpt-4o-future-model-…) is NOT redacted

docs(api-reference): `service_tier` — add `"flex"` as a valid value

The service_tier field in POST /v1/chat/completions accepts four values but the docs only listed three. The source (src/validation/openai.ts) has accepted "auto", "default", "flex", and "scale" since iteration 607. Both the request body table and the validation error table in api-reference/chat-completions.mdx now include "flex".

docs(api-reference): add `unsupported_value` to errors reference

The error code unsupported_value is returned by the gateway for o1-family reasoning model parameter restrictions (e.g. temperature ≠ 1, n > 1) but was not listed in the Errors reference page. Added to the Request errors table with a description that distinguishes unsupported_parameter (the parameter is not allowed at all) from unsupported_value (the parameter is supported but its value is not allowed for the specific model).

2026-05-28 (improvement loop, iterations 605, 607, 610)

fix(validation): `response_format.json_schema.strict` and `.schema` type guards

Two new guards in the validateResponseFormatField() function prevent malformed response_format.json_schema objects from reaching Fuelix and producing opaque 422 errors:

json_schema.strict — when present and non-null, must be a boolean (true or false). Sending "true" (string) or 1 (integer) now returns 400 invalid_type with param: "response_format.json_schema.strict" instead of a confusing Fuelix 422.
json_schema.schema — when present and non-null, must be a plain object (not an array, number, or string). An array or primitive schema value now returns 400 invalid_type with param: "response_format.json_schema.schema".

Both guards apply to POST /v1/chat/completions and POST /v1/responses. Nine unit tests and three integration tests cover the new behavior.
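
The two guards can be sketched as follows. The param strings are the ones quoted above; the function names and the return shape are assumptions for the example rather than the actual validateResponseFormatField() code.

```typescript
type GuardError = { status: 400; code: "invalid_type"; param: string } | null;

// A plain object here means: an object that is neither null nor an array.
function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function checkJsonSchemaField(jsonSchema: { strict?: unknown; schema?: unknown }): GuardError {
  const { strict, schema } = jsonSchema;
  // strict: optional, but must be a real boolean when present and non-null.
  if (strict !== undefined && strict !== null && typeof strict !== "boolean") {
    return { status: 400, code: "invalid_type", param: "response_format.json_schema.strict" };
  }
  // schema: optional, but must be a plain object when present and non-null.
  if (schema !== undefined && schema !== null && !isPlainObject(schema)) {
    return { status: 400, code: "invalid_type", param: "response_format.json_schema.schema" };
  }
  return null;
}
```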

fix(scripts): `parseDevVars` deduplicated; quote-stripping for `.dev.vars` values

scripts/check-env.ts contained a verbatim duplicate of the parseDevVars function already exported from scripts/lib/dev-vars.ts. The duplicate was removed and replaced with an import from the shared module.

During this audit, a related bug was found: neither implementation stripped surrounding quotes from .dev.vars values. Wrangler strips matching "..." / '...' pairs when it loads .dev.vars for the Worker, but the CLI scripts did not — so a developer who writes ADMIN_API_KEY="actual-key" would get the value "actual-key" (with quotes) in every script call, causing all admin HTTP requests to fail with 401.

scripts/lib/dev-vars.ts now calls a stripQuotes() helper on every parsed value, matching Wrangler’s behavior.
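
One way such a helper might look, as a sketch under a hypothetical name. It assumes only a single matching pair of surrounding quotes is removed; the actual stripQuotes() in scripts/lib/dev-vars.ts may differ.

```typescript
// Remove one matching pair of surrounding quotes ("..." or '...'); leave everything else untouched.
function stripQuotesSketch(value: string): string {
  if (value.length >= 2) {
    const first = value[0];
    const last = value[value.length - 1];
    if ((first === '"' || first === "'") && first === last) {
      return value.slice(1, -1);
    }
  }
  return value;
}
```

With this in place, ADMIN_API_KEY="actual-key" and ADMIN_API_KEY=actual-key parse to the same value.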

feat(scripts): `--help` / `-h` flags for quota and key-management CLI scripts

bun run key:quota, bun run key:update (manage-api-key.ts), and bun run key:new now accept --help / -h to print usage and exit, consistent with bun run key:list and other scripts. Running any of these commands with no arguments or with --help now shows a concise usage summary instead of hitting the API with a missing parameter.

docs(intro+sdk): fix `/v1/completions` streaming claim; expand SDK provider table

Two accuracy fixes in the documentation:

getting-started/introduction.md — the endpoint table row for POST /v1/completions incorrectly said “no streaming”. The legacy completions emulation layer (emulateCompletions()) has a full SSE streaming path that transforms chat completion chunks into text_completion SSE events. Updated to “emulated; streaming supported”.
guides/sdk.mdx — the overview provider table listed four rows (OpenAI, Anthropic, Groq, Cohere) but the page also covered Gemini and Mistral/Llama in dedicated sections. A developer scanning the table would see no mention of Gemini and might conclude it is unsupported. Gemini and Mistral/Llama rows added; column header updated from “SDK” to “SDK / Provider”.

2026-05-28 (improvement loop, iteration 603)

fix(dashboard): `overflow-x-auto` added to model stats table card

The “Top Models by Usage” table in the admin dashboard usage page has 8 sortable columns (Model, Requests, Tokens, Avg, p50, p95, Max, Err%). On narrow viewports (tablets, small laptops) the table was silently overflowing its card container because the card content wrapper had no horizontal scroll. The endpoint stats and key stats tables on the same page already had overflow-x-auto, so this was a missing consistency fix. Both the loading skeleton and the live data table are now wrapped.

docs(claude): add `account.ts` to CLAUDE.md source layout

src/routes/account.ts (GET /v1/usage) was missing from the Worker source layout section in CLAUDE.md. The route has been present and tested since its introduction but was omitted from the documentation, which could mislead future agents or developers browsing the layout.

2026-05-28 (improvement loop, iteration 601)

docs(admin): `sort_by` and `sort_dir` documented for `GET /admin/endpoint-stats` and `GET /admin/key-stats`

Both endpoints have supported sort_by and sort_dir query parameters since they were implemented, but the documentation never reflected this. The docs now list both parameters in the query parameters table, include three sorting cURL examples each, replace client-side jq sorts in the Use cases sections with native ?sort_by=… params, and add sort validation error rows to the Common errors tables.

GET /admin/endpoint-stats — sort_by accepts: request_count (default), error_count, avg_latency_ms, max_latency_ms, p95_latency_ms
GET /admin/key-stats — sort_by accepts: request_count (default), error_count, total_tokens, avg_latency_ms, max_latency_ms, p95_latency_ms

Both accept sort_dir: asc | desc (default desc).

2026-05-28 (improvement loop, iterations 583–600)

feat(admin): `sort_by` and `sort_dir` params added to `GET /admin/model-stats`

GET /admin/model-stats now accepts two additional query parameters:

sort_by — column to order results by. One of: request_count (default), error_count, total_tokens, avg_latency_ms, max_latency_ms, p95_latency_ms. Returns 400 validation_error for unrecognized values.
sort_dir — direction: asc or desc (default desc). Returns 400 validation_error for other values.

Previously the only way to get token-sorted or latency-sorted results was client-side jq manipulation. Examples:

# Top 5 models by total tokens
curl "https://api.bve.me/admin/model-stats?sort_by=total_tokens&limit=5" \
  -H "Authorization: Bearer $ADMIN_KEY"

# Highest p95 latency
curl "https://api.bve.me/admin/model-stats?sort_by=p95_latency_ms" \
  -H "Authorization: Bearer $ADMIN_KEY"
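
Server-side, the sort_by / sort_dir validation described above could be sketched like this. The accepted values and defaults are the documented ones; the parseSort name, the result type, and the param field on the error are assumptions.

```typescript
const MODEL_STATS_SORT_COLUMNS = [
  "request_count", "error_count", "total_tokens",
  "avg_latency_ms", "max_latency_ms", "p95_latency_ms",
] as const;
type SortColumn = (typeof MODEL_STATS_SORT_COLUMNS)[number];
type SortDir = "asc" | "desc";

type SortResult =
  | { ok: true; sortBy: SortColumn; sortDir: SortDir }
  | { ok: false; status: 400; code: "validation_error"; param: "sort_by" | "sort_dir" };

// null means the query parameter was not supplied, so the documented default applies.
function parseSort(sortBy: string | null, sortDir: string | null): SortResult {
  const by = sortBy ?? "request_count";
  const dir = sortDir ?? "desc";
  if (!(MODEL_STATS_SORT_COLUMNS as readonly string[]).includes(by)) {
    return { ok: false, status: 400, code: "validation_error", param: "sort_by" };
  }
  if (dir !== "asc" && dir !== "desc") {
    return { ok: false, status: 400, code: "validation_error", param: "sort_dir" };
  }
  return { ok: true, sortBy: by as SortColumn, sortDir: dir as SortDir };
}
```

Checking against a fixed list before the value reaches the ORDER BY clause also keeps the column name out of reach of arbitrary user input.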

The admin dashboard Model Stats table also gained a sortable Max column (max_latency_ms) to match the existing Avg, p95, and Err% sortable columns.

fix(security): `redactSecrets` applied to `queue_send_failed` error messages in key provisioning

When EVENTS_QUEUE.send() fails during key lifecycle events (provision, rotate, revoke, suspend, unsuspend), the error message is now passed through redactSecrets() before being written to the structured log. Previously, a queue send failure whose Error.message contained a provider key pattern (e.g. from the runtime environment or a malformed binding) would be logged unredacted. The fix mirrors the same pattern already applied in middleware/quota.ts and handlers/queue.ts.

fix(models): D1-allowlisted models excluded from `models_unregistered_upstream` log

GET /v1/models emits a models_unregistered_upstream info log on every cache MISS listing model IDs returned by Fuelix that are not in the static registry — intended to help operators discover new upstream models. However, models that had already been registered via POST /admin/model-allowlist (stored in D1 with enabled=true) still appeared in this log. The filter now also excludes D1-allowlisted models, so the log only surfaces genuinely undiscovered models.

2026-05-28 (improvement loop, iterations 572–582)

fix(security): align query-param credential stripping with logger redaction

BLOCKED_QUERY_PARAMS (which strips params before forwarding to the upstream Fuelix account) and the proxyToFuelix slow-path strip list were missing apikey (no-separator variant used by some SDKs) and password (HTTP basic-auth param). The structured request logger already redacted both via CREDENTIAL_QUERY_REPLACE, but they were still being forwarded upstream. Both params are now covered in all three locations: redacted by the logger and stripped by both BLOCKED_QUERY_PARAMS and the proxyToFuelix slow path.

Impact: A misconfigured client passing ?password=… or ?apikey=sk-bve-… in the URL would previously have the value forwarded to Fuelix. This is now blocked.

16 regression tests cover the fix (logger redaction + proxyToFuelix forwarding + BLOCKED_QUERY_PARAMS in openai.ts).

fix(security): Cerebras `csk-` key pattern added to `redactSecrets`

Cerebras Cloud API keys use a csk- prefix. The PROVIDER_KEY_PATTERN regex in fuelix.ts (used by redactSecrets()) now matches csk-[A-Za-z0-9_-]{8,} and replaces matched substrings with [REDACTED] in error bodies, log fields, and 404 messages. Without this fix, a Cerebras key embedded in a Fuelix error response body would be forwarded to the API caller and appear in wrangler tail / Logpush records unredacted.
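
The effect of the new pattern can be illustrated in isolation. The regex is the one quoted above; the CEREBRAS_KEY constant, the helper name, and the sample message are hypothetical, and the real PROVIDER_KEY_PATTERN covers other providers as well.

```typescript
// Pattern quoted in the entry above, applied on its own for illustration.
const CEREBRAS_KEY = /csk-[A-Za-z0-9_-]{8,}/g;

function redactCerebrasKeys(text: string): string {
  return text.replace(CEREBRAS_KEY, "[REDACTED]");
}

const sample = redactCerebrasKeys("upstream error: invalid key csk-abc123XYZ_789 for model");
// sample === "upstream error: invalid key [REDACTED] for model"
```

The {8,} minimum length keeps short strings such as csk-test from being redacted.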

fix(queries): renamed keys no longer appear as duplicate rows in `/admin/key-stats`

getKeyUsageSummary previously grouped by both key_id AND key_name. Because request_logs_sampled snapshots the key_name at request time, a key that was renamed appeared as two separate rows in the key-stats leaderboard — same key ID, different labels, both with partial counts. The outer GROUP BY is now key_id only, with MAX(key_name) to produce a single canonical name. All aggregates (request count, token sums, latency percentiles) are unaffected. One regression test verifies that a key with two different snapshot names collapses to a single result row.
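
The grouping change can be illustrated with an in-memory analogue of GROUP BY key_id with MAX(key_name). This is a sketch, not the SQL in getKeyUsageSummary; the row shapes and the function name are assumptions, and the string comparison only approximates SQLite's MAX() on text for ASCII names.

```typescript
interface SampledRow { key_id: string; key_name: string; tokens: number }
interface KeySummary { key_id: string; key_name: string; request_count: number; total_tokens: number }

// Group by key_id only; key_name is reduced with a lexicographic maximum.
function summarizeByKeyId(rows: SampledRow[]): KeySummary[] {
  const byId = new Map<string, KeySummary>();
  for (const row of rows) {
    const summary = byId.get(row.key_id);
    if (!summary) {
      byId.set(row.key_id, {
        key_id: row.key_id,
        key_name: row.key_name,
        request_count: 1,
        total_tokens: row.tokens,
      });
    } else {
      summary.request_count += 1;
      summary.total_tokens += row.tokens;
      if (row.key_name > summary.key_name) summary.key_name = row.key_name; // MAX(key_name)
    }
  }
  return Array.from(byId.values());
}
```

Note that MAX(key_name) picks the lexicographically greatest snapshot name, which is not necessarily the most recent one.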

feat(dashboard/usage): `maxLatencyMs` surfaced in admin dashboard Key Stats and Endpoint Stats

MAX(latency_ms) was already computed by the getKeyUsageSummary and getEndpointBreakdown SQL queries but was silently discarded before the dashboard API response. maxLatencyMs: number | null is now included in:

DashboardKeyStatRow and DashboardEndpointStat contract types
getDashboardKeyStats and getDashboardEndpointStats API mappings
Key Stats and Endpoint Stats CSV export columns in the admin dashboard Usage page

The public GET /admin/key-stats and GET /admin/endpoint-stats endpoints already returned max_latency_ms — this change brings the admin dashboard’s internal API into parity.

feat(logger): `X-BVE-Worker` response header

Every API response now includes an X-BVE-Worker header containing the Worker version UUID from CF_VERSION_METADATA when running in production (absent in local dev). This lets operators instantly identify the exact deployed version from curl -I or browser dev tools without tailing logs. The header is exposed via CORS so browser-based clients can read it. The version UUID is also logged as workerId in the structured request log for cross-correlation with the /health endpoint’s version.id field.

feat(dashboard/usage): Key Stats and Endpoint Stats CSV exports

The admin dashboard Usage page now supports CSV export on the Key Stats and Endpoint Stats tabs. Clicking the Export button downloads a CSV file containing all columns shown in the table, including maxLatencyMs. Previously there was no way to get this data out of the dashboard without using the Admin API directly.

2026-05-28 (improvement loop, iterations 552–572)

feat(dashboard): surface model upstream availability + fix provider display

Model upstream availability — the available field (whether a model is currently returned by the Fuelix upstream catalog) is now surfaced throughout the Models admin page:

Table row indicator: a small dot is shown to the left of each model ID. Green = available in the upstream Fuelix catalog; gray = not returned by upstream (may have been removed).
Detail panel warning: when a model is not in the upstream catalog, an amber warning banner appears in the inspector panel explaining that requests may fail.
“Upstream Offline” mini-card: replaces the former “Reasoning Nodes” mini-card and shows how many models are not currently returned by Fuelix. The card is clickable — clicking it activates the new Offline category filter tab to show only those models.
“Offline” filter tab: added to the category tab bar. When selected, the table shows only models where available === false. The tab badge uses amber to distinguish it from other category filters.

Previously, models in the D1 allowlist that had been removed from the Fuelix upstream catalog were invisible to admins — they appeared in the table but were indistinguishable from live models. This change makes it immediately obvious when an upstream model is gone.

Provider display fix — the provider column previously used CSS capitalize which rendered ‘openai’ as ‘Openai’ instead of ‘OpenAI’. This is now fixed with a providerLabel() helper that maps provider keys to their correct display names (OpenAI, Anthropic, Google, Meta, Mistral, Cohere, Cursor). The model detail panel also now uses the server-authoritative model.provider field instead of a client-side inference function.
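
A helper like providerLabel() might look roughly like this. The display names come from the list above; the lowercase keys other than openai and the fallback for unmapped providers are assumptions, and the Sketch suffix marks the name as hypothetical.

```typescript
// Provider key -> display name. Keys other than "openai" are assumed.
const PROVIDER_LABELS = new Map<string, string>([
  ["openai", "OpenAI"],
  ["anthropic", "Anthropic"],
  ["google", "Google"],
  ["meta", "Meta"],
  ["mistral", "Mistral"],
  ["cohere", "Cohere"],
  ["cursor", "Cursor"],
]);

function providerLabelSketch(provider: string): string {
  // Assumed fallback for unmapped keys: capitalize the first letter, as CSS capitalize did.
  return PROVIDER_LABELS.get(provider) ?? provider.charAt(0).toUpperCase() + provider.slice(1);
}
```

An explicit map is what lets mixed-case names such as OpenAI render correctly, which a pure CSS transform cannot do.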

Three new backend tests verify:

All models returned by the upstream catalog have available: true
All models have available typed as a boolean (never undefined)
The existing D1-only model test already covered available: false

2026-05-28 (improvement loop, iterations 552–571)

feat(dashboard): `bveReasoning` badge in Models page

Reasoning models (o3, o3-mini, o4-mini, and dated variants) now display a distinct Reasoning badge — a violet pill with a sparkle icon — in two places in the admin dashboard Models page:

Model detail panel — shown alongside the category badge in the panel header when the selected model is a reasoning model.
Models table — a small sparkle icon appears next to the category text in the Category column for each reasoning model row.

The “Reasoning Nodes” counter in the overview stats now uses the server-provided bveReasoning field from getDashboardModels (backed by isReasoningModel()) instead of the frontend inferring it from capabilities strings — making the count authoritative and consistent with the model registry.

Three new tests in test/admin-dashboard.test.ts verify:

bveReasoning is true for o4-mini
bveReasoning is false for gpt-4o and claude-sonnet-4-6
Every model in the response has bveReasoning typed as a boolean (never undefined)

feat(dashboard): expandable metadata rows in Audit Logs page

Clicking any audit log entry now expands an inline detail panel showing the full metadata JSON with syntax-highlighted formatting and a Copy JSON button. Previously, the Metadata column showed only a truncated preview, with no way to view the full payload without a separate API call. The preview remains but is now 80 characters (reduced from 120 to leave room for the expand chevron).

How it works:

Each row in the Audit Logs table is now clickable. The active row highlights with a teal accent border.
Clicking an expanded row collapses it; clicking a different row closes the previous detail and opens the new one.
The expand/collapse chevron rotates 180° when the row is open.
The metadata JSON is rendered in a <pre> block with whitespace-pre-wrap and break-all for long key values. A Copy JSON button appears on hover in the top-right corner.
The copy button shows a checkmark for 1.5 s after a successful clipboard write.

No new endpoints or schema changes — the metadata payload is already present in the GET /admin/audit-logs response.
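
The row-toggle behaviour described above reduces to a small piece of state logic. This is a sketch only: the function names are hypothetical, and the trailing "..." on the preview is an assumption.

```typescript
// Returns the row that should be expanded after a click: clicking the open
// row collapses it, clicking any other row switches the detail to that row.
function nextExpandedRow(current: string | null, clicked: string): string | null {
  return current === clicked ? null : clicked;
}

// Builds the truncated single-line preview for the Metadata column.
function metadataPreview(metadata: Record<string, unknown>, maxChars: number = 80): string {
  const json = JSON.stringify(metadata);
  return json.length > maxChars ? json.slice(0, maxChars) + "..." : json;
}

// Builds the pretty-printed text shown inside the <pre> block.
function metadataDetail(metadata: Record<string, unknown>): string {
  return JSON.stringify(metadata, null, 2);
}
```

Keeping a single expanded-row ID (rather than a set) is what makes opening a new row close the previous one.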

feat(openapi): `bve_reasoning` field in `/v1/models` response schemas

bve_reasoning is now listed in the required[] array for both the list response (GET /v1/models data items) and the single-model response (GET /v1/models/:id). Without required[], code generators (openapi-generator, orval) emit bve_reasoning?: boolean — optional despite always being present. Two new contract tests lock in the field’s required status.
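
To illustrate why required[] matters, the sketch below mimics how a generator decides between `field: T` and `field?: T`. It is a simplified stand-in, not the logic of openapi-generator or orval, and the schema subset is an assumption.

```typescript
// Minimal subset of an OpenAPI object schema, enough for this illustration.
interface ObjectSchema {
  properties: Record<string, { type: string }>;
  required?: string[];
}

// Renders an interface the way a typical generator would: any property that
// is not listed in required[] gets a `?` marker.
function renderInterface(name: string, schema: ObjectSchema): string {
  const required = new Set(schema.required ?? []);
  const lines = Object.keys(schema.properties).map((key) => {
    const prop = schema.properties[key];
    const type = prop ? prop.type : "unknown";
    return `  ${key}${required.has(key) ? "" : "?"}: ${type};`;
  });
  return `interface ${name} {\n${lines.join("\n")}\n}`;
}
```

With `required: ["id"]` the output contains `bve_reasoning?: boolean;`; adding `"bve_reasoning"` to the array yields `bve_reasoning: boolean;`.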

feat(openapi): complete `required[]` sweep across all admin API endpoints

All admin API response schemas now have required[] arrays covering every non-optional field. Code generators will no longer emit T | undefined for fields that are always present in the response.

Schemas updated across iterations 563–567:

Endpoint	Schema(s)
`POST /admin/api-keys/{id}/reset-quota`	outer response
`GET /admin/api-keys/{id}/stats`	outer + inner `stats` object
`GET /admin/usage`	outer + `daily[]` items + `monthly[]` items
`GET /admin/audit-logs`	outer + `logs[]` items
`GET /admin/request-logs`	outer + `stats` aggregate + `logs[]` items
`GET /admin/model-stats`	outer + `models[]` items + `top_endpoints[]` items
`GET /admin/endpoint-stats`	outer + `endpoints[]` items
`GET /admin/key-stats`	outer + `keys[]` items
`GET /admin/stats`	outer + `keys` + `current_month` + `previous_month` + `isolate` sub-objects
`POST /admin/api-keys/bulk-revoke`	response
`POST /admin/api-keys/bulk-suspend`	response
`POST /admin/api-keys/bulk-unsuspend`	response
`GET /admin/model-allowlist/{model}`	outer + `registry` + `allowlist` sub-objects
`GET /v1/models`	`data[]` items (including `bve_reasoning`)
`GET /v1/models/{model}`	schema (including `bve_reasoning`)

15+ new contract tests (spread across iterations) verify the required[] arrays are present and enumerate the correct field names.

perf(quota/keys): batch D1 reset for `window=all`; hoist `Date.now()` in `checkQuota`

Batch quota reset: resetQuota(window=all) previously issued two sequential D1 HTTP round-trips (one for daily usage, one for monthly usage). Both are now batched into a single d1.batch() call using a new resetUsageForKeyBatch() helper — saving one round-trip per full-reset operation.

Date.now() hoisting: checkQuota() called Date.now() twice: once for the SWR cache age check and once as a nowSec fallback on the monthly-token-limit path. A single Date.now() call is now hoisted to function entry and reused in both places, eliminating the redundant call on the monthly-limit fast path.

fix(security): `password=` and `apikey=` query param redaction; Cerebras `csk-` key pattern

Two credential-scrubbing gaps in the structured request logger were closed:

password= / apikey= query params: The logger’s CREDENTIAL_QUERY_TEST / CREDENTIAL_QUERY_REPLACE regexes already sanitized api_key, api-key, access_token, secret, token, and key query params. password= (HTTP basic auth param) and apikey= (no separator) were missing. A client accidentally passing ?password=… or ?apikey=sk-bve-… would expose the raw value in every wrangler tail / Logpush record. Both are now redacted.

Cerebras csk- key pattern: Cerebras Cloud API keys use a csk- prefix. Added csk-[A-Za-z0-9_-]{8,} to PROVIDER_KEY_PATTERN in fuelix.ts. Without this, a Cerebras key embedded in a Fuelix error body would reach the API consumer and log stream unredacted.

16 new regression tests cover both fixes.
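
A sketch of the two redaction passes, assuming simplified patterns. The query-parameter list and the csk- pattern come from the description above; the combined regex, the `[REDACTED]` replacement text, and the function name are assumptions, and the real CREDENTIAL_QUERY_REPLACE and PROVIDER_KEY_PATTERN are likely broader.

```typescript
// Matches credential-bearing query params; the parameter name is kept and
// only the value is replaced.
const CREDENTIAL_QUERY =
  /([?&](?:api_key|api-key|apikey|access_token|password|secret|token|key)=)[^&#\s]*/gi;

// Cerebras Cloud keys: csk- prefix followed by at least 8 key characters.
const CEREBRAS_KEY = /csk-[A-Za-z0-9_-]{8,}/g;

// Applies both passes to a log line, URL, or upstream error body.
function redactForLog(text: string): string {
  return text
    .replace(CREDENTIAL_QUERY, "$1[REDACTED]")
    .replace(CEREBRAS_KEY, "[REDACTED]");
}
```

Anchoring on `?` or `&` before the name keeps unrelated parameters such as `monkey=` from matching the `key` alternative.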

fix(validation): align numeric type-error messages to `'must be a number'`

Eight validators produced invalid_type errors whose message described the value constraint (“must be a positive integer”, “must be an integer >= 1024”) rather than the type mismatch — misleading for callers passing a string where a number was expected. All numeric type-error branches now say "${param} must be a number", while value-error branches retain their constraint-specific message. The code field (invalid_type) was always correct — only the human-readable message was fixed.

Fields corrected: seed, top_k (Messages API), thinking.budget_tokens, logprobs (completions), max_output_tokens, image generation n, output_compression, partial_images.

14 new regression tests verify the corrected messages and confirm that value-error branches still produce constraint-specific text.
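
The split between type errors and value errors can be sketched like this. The `invalid_value` code on the value branch and the function name are assumptions; only `invalid_type` and the message texts are taken from this entry.

```typescript
interface ValidationError {
  code: "invalid_type" | "invalid_value";
  message: string;
}

// Validates an integer parameter with a lower bound. Returns null when valid.
function validateMinInteger(param: string, value: unknown, min: number): ValidationError | null {
  // Type branch: the caller sent something that is not a number at all.
  if (typeof value !== "number" || Number.isNaN(value)) {
    return { code: "invalid_type", message: `${param} must be a number` };
  }
  // Value branch: a number, but outside the allowed range or not an integer.
  if (!Number.isInteger(value) || value < min) {
    return { code: "invalid_value", message: `${param} must be an integer >= ${min}` };
  }
  return null;
}
```

Passing the string `"2048"` now produces the type message, while passing `512` against a minimum of 1024 still produces the constraint message.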

docs: model-stats `?provider=` and `top_endpoints`; endpoint-stats `max_latency_ms`; key-stats `max_latency_ms`

Three admin API docs pages updated to document previously undocumented response fields and query parameters:

model-stats.md — added ?provider= query param (one of anthropic|cohere|cursor|google|meta|mistral|openai, returns 400 for unknown values), two cURL examples, and top_endpoints field documentation (the top-10 × limit endpoint paths for each model).
endpoint-stats.md — added max_latency_ms (was silently omitted from the response-fields table despite being present in the handler and OpenAPI spec since earlier iterations).
key-stats.md — added max_latency_ms to the response JSON example and response-fields table (same gap as endpoint-stats.md).

2026-05-27 (improvement loop, iterations 542–551)

fix(admin): normalize GET /admin/usage response fields to snake_case

GET /admin/usage previously returned Drizzle ORM objects with camelCase field names (keyId, requestCount, yearMonth, updatedAt, promptTokens, completionTokens, totalTokens), while every other Admin API endpoint already uses snake_case. External consumers got inconsistent field names depending on which endpoint they called.

Both the daily and monthly arrays are now normalized to snake_case:

Old (camelCase)	New (snake_case)
`keyId`	`key_id`
`requestCount`	`request_count`
`yearMonth`	`year_month`
`updatedAt`	`updated_at`
`promptTokens`	`prompt_tokens`
`completionTokens`	`completion_tokens`
`totalTokens`	`total_tokens`

Migration: Update any code that reads fields from GET /admin/usage to use the new snake_case names.

The admin dashboard uses a separate internal API (GET /api/dashboard/usage) that reads from Drizzle ORM directly and is unaffected.
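
For consumers holding payloads in the old shape (for example recorded test fixtures), a small converter along these lines may ease the migration. It is a sketch; both function names are hypothetical and it only renames top-level keys.

```typescript
// Converts one camelCase field name to snake_case, e.g. "keyId" -> "key_id".
function camelToSnake(name: string): string {
  return name.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

// Returns a copy of a usage row with every top-level key renamed.
function toSnakeCaseRow(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(row)) {
    out[camelToSnake(key)] = row[key];
  }
  return out;
}
```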

feat(admin): expose `previous_month` in GET /admin/stats response

GET /admin/stats now includes a previous_month field alongside current_month, enabling month-over-month trend comparisons in a single request:

{
  "previous_month": {
    "period": "2026-04",
    "total_requests": 41200,
    "total_tokens": 7830400
  }
}

The data was already fetched by getGatewayStatsBatch() but was previously discarded in the handler. No additional D1 query is required.
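
A client-side trend calculation might look like the sketch below. It assumes current_month has the same shape as the previous_month example above, which this entry does not state explicitly; the function names are hypothetical.

```typescript
interface MonthStats {
  period: string;
  total_requests: number;
  total_tokens: number;
}

// Percentage change from previous to current; null when there is no baseline.
function percentChange(current: number, previous: number): number | null {
  if (previous === 0) return null;
  return ((current - previous) / previous) * 100;
}

// Summarizes the month-over-month movement for requests and tokens.
function monthOverMonth(current: MonthStats, previous: MonthStats) {
  return {
    from: previous.period,
    to: current.period,
    requests_change_pct: percentChange(current.total_requests, previous.total_requests),
    tokens_change_pct: percentChange(current.total_tokens, previous.total_tokens),
  };
}
```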

feat(scripts): key:quota CLI for real-time quota status

bun run key:quota -- --id <uuid>
bun run key:quota -- --id <uuid> --remote

Calls GET /admin/api-keys/:id/quota and displays RPM/RPD/monthly request and token counters from the Durable Object and D1, with ASCII usage bars colour-coded by utilisation percentage (teal < 80%, amber 80–94%, rose ≥ 95%) and human-readable reset times. Follows the same pattern as key:new, key:list, key:rotate, key:revoke, key:suspend, and key:unsuspend — reads .dev.vars for local, accepts --remote for production.
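
The colour thresholds and bar rendering can be sketched as below. The function names, the `#`/`-` bar characters, and the treatment of values between 94% and 95% as amber are assumptions; the script's actual output format may differ.

```typescript
type BarColour = "teal" | "amber" | "rose";

// Maps a utilisation percentage to the colour bands described above.
function barColour(percent: number): BarColour {
  if (percent >= 95) return "rose";
  if (percent >= 80) return "amber";
  return "teal";
}

// Renders a fixed-width ASCII bar; usage above the limit is clamped to full.
function usageBar(used: number, limit: number, width: number = 20): string {
  const ratio = limit > 0 ? Math.min(used / limit, 1) : 0;
  const filled = Math.round(ratio * width);
  return "[" + "#".repeat(filled) + "-".repeat(width - filled) + "]";
}
```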

fix(openapi): correct error_rate schema type and description

error_rate in the OpenAPI spec was declared as oneOf: [number, null] with a description saying it is a fraction in 0.0–1.0 and returns null when there are no requests. Both were wrong: errorRate() always returns a number (returning 0 for zero requests, never null), and the value is a percentage in the range 0.0–100.0 (e.g. 66.67 for two out of three requests erroring).

Fixed across all five stat endpoints (/admin/api-keys/{id}/stats, /admin/request-logs, /admin/model-stats, /admin/endpoint-stats, /admin/key-stats). No behavior change — only the spec contract is corrected.
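
The corrected contract corresponds to a function of roughly this shape. Rounding to two decimals is an assumption inferred from the 66.67 example; the actual errorRate() implementation is not shown in this changelog.

```typescript
// Returns the error rate as a percentage in the range 0-100.
// Always returns a number: zero requests yields 0, never null.
function errorRate(errorCount: number, total: number): number {
  if (total <= 0) return 0;
  return Math.round((errorCount / total) * 10000) / 100;
}
```

Clients that previously multiplied the value by 100, or guarded against null, can drop both steps.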

2026-05-27 (improvement loop, iterations 487–527)

fix(security): 404 response path no longer reflects credentials

GET /some/route/sk-bve-abc123 previously included the raw URL path in the 404 error message body ("Route GET /some/route/sk-bve-abc123 not found"). A client that accidentally embedded a provider key in the URL path would see it reflected verbatim in the HTTP response, exposing it to response-logging proxies and API clients. redactSecrets() is now applied to the path before inclusion in the 404 message, consistent with the existing guards in the request logger and auth middleware.

feat(v1/models): `?endpoint=` and `?provider=` query parameters

GET /v1/models now accepts two additional filters:

?endpoint= — returns only models that support a specific gateway path. Accepted values: /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/responses, /v1/messages, /v1/images/generations, /v1/images/edits, /v1/audio/speech, /v1/audio/transcriptions. D1-only custom models are excluded when this filter is active.
?provider= — returns only models from a specific creator. Accepted values: anthropic, cohere, cursor, deepseek, google, meta, mistral, openai, wasikan. D1-only custom models are excluded.

Both parameters compose cleanly with the existing ?search= and ?category= filters and with each other.

feat(v1/models): `bve_endpoints` and `bve_provider` BVE annotation fields

Every model object in the GET /v1/models and GET /v1/models/:id responses now includes three BVE-injected fields:

Field	Description
`bve_category`	Capability class: `chat`, `embedding`, `image`, `tts`, `transcription`, `ocr`
`bve_endpoints`	Exact API paths the model supports — use to route calls without guessing
`bve_provider`	Model creator: `anthropic`, `cohere`, `cursor`, `deepseek`, `google`, `meta`, `mistral`, `openai`, `wasikan`

GET /v1/models/:id now returns all three fields in parity with the list endpoint. D1-only custom models that don’t match a known prefix omit bve_provider, bve_category, and bve_endpoints.

# All Anthropic models
curl "https://api.bve.me/v1/models?provider=anthropic" \
  -H "Authorization: Bearer sk-bve-YOUR_KEY"

# Models that support the Embeddings endpoint
curl "https://api.bve.me/v1/models?endpoint=/v1/embeddings" \
  -H "Authorization: Bearer sk-bve-YOUR_KEY"

# Single-model lookup (includes bve_provider)
curl "https://api.bve.me/v1/models/gpt-4o" \
  -H "Authorization: Bearer sk-bve-YOUR_KEY"
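
On the client side, bve_endpoints can drive routing directly. The sketch below is one possible approach; the function name and the preference-order strategy are assumptions, and D1-only custom models are treated as having no known endpoints because they omit the field.

```typescript
// A model object reduced to the fields used here.
interface ModelInfo {
  id: string;
  bve_endpoints?: string[];
}

// Picks the first endpoint, in the caller's order of preference, that the
// model reports as supported. Returns null when none match.
function pickEndpoint(model: ModelInfo, preferred: string[]): string | null {
  const supported = model.bve_endpoints ?? [];
  for (const path of preferred) {
    if (supported.indexOf(path) !== -1) return path;
  }
  return null;
}
```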

feat(v1/usage): `key_name`, `status`, `expires_at`, and `allowed_models` fields

GET /v1/usage now includes four additional fields beyond the quota counters:

Field	Type	Description
`key_name`	`string`	Human-readable display name set at key creation
`status`	`"active" | "suspended" | "revoked"`	Current key lifecycle status
`expires_at`	ISO 8601 | `null`	Expiry timestamp, or `null` for non-expiring keys
`allowed_models`	`string[] | null`	Per-key model allow-list; `null` means no restriction

These fields let callers confirm their key identity and restrictions without a separate admin call.
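
A caller could use these fields for a pre-flight check before sending a request. This is a sketch under the field semantics in the table above; the function name is hypothetical, and the gateway remains the authority on whether a call is accepted.

```typescript
// The identity-related subset of the GET /v1/usage response.
interface UsageIdentity {
  key_name: string;
  status: "active" | "suspended" | "revoked";
  expires_at: string | null;
  allowed_models: string[] | null;
}

// Returns true when the key is active, not expired, and permitted to use the
// model. A null allowed_models list means no per-key restriction.
function canCallModel(usage: UsageIdentity, model: string, now: Date = new Date()): boolean {
  if (usage.status !== "active") return false;
  if (usage.expires_at !== null && Date.parse(usage.expires_at) <= now.getTime()) return false;
  return usage.allowed_models === null || usage.allowed_models.indexOf(model) !== -1;
}
```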

feat(logger): `provider` field in structured request log

The per-request structured JSON log now emits a provider field immediately after model. The value is derived from getModelProvider() — the same function that populates bve_provider on the models list — and uses the same taxonomy: openai, anthropic, google, cohere, meta, mistral, cursor, deepseek, wasikan. The field is absent when model is not set (e.g. unauthenticated rejections, GET /v1/models) or when the model prefix is unrecognized.

This enables wrangler tail and Logpush filtering by provider without maintaining a separate model-prefix lookup table.

feat(admin): model allowlist operations now produce audit log entries

POST /admin/model-allowlist and DELETE /admin/model-allowlist/:model previously left no audit trail — blocking or enabling a model had zero trace in GET /admin/audit-logs. Both mutation handlers now write audit entries:

Action	Trigger
`model_allowlist.added`	New model added to the allowlist via Admin API
`model_allowlist.updated`	Existing model entry toggled via Admin API
`model_allowlist.removed`	Model entry deleted via Admin API

These entries use actor_type: "admin" (Bearer-token path) and are returned under ?action_category=models. The dashboard-initiated equivalents (model_allowlist.update, model_allowlist.delete with actor_type: "admin_user") were already recorded.

2026-05-27 (improvement loop, iterations 478–486)

feat(admin-api): date range filtering for GET /admin/api-keys

GET /admin/api-keys now accepts ?since= and ?until= ISO 8601 timestamps to filter keys by creation date, and ?expired=true to list only keys whose expires_at has passed. All parameters are combinable with the existing ?status=, ?name=, ?limit=, and ?offset= filters.

# Keys created in May 2026
curl "https://api.bve.me/admin/api-keys?since=2026-05-01T00:00:00Z&until=2026-05-31T23:59:59Z" \
  -H "Authorization: Bearer admin_bve_YOUR_ADMIN_KEY"

# Only expired keys
curl "https://api.bve.me/admin/api-keys?expired=true" \
  -H "Authorization: Bearer admin_bve_YOUR_ADMIN_KEY"

The list response now includes expires_at and last_used_at fields on each key object.

feat(scripts): key:revoke, key:suspend, key:unsuspend CLI scripts

Three new local scripts complement the existing key:new and key:list commands:

bun run key:revoke   -- --id <uuid>          # Permanently revoke a key (prompts for confirmation)
bun run key:suspend  -- --id <uuid>          # Suspend a key (reversible)
bun run key:unsuspend -- --id <uuid>         # Re-activate a suspended key

All three accept --remote to target production instead of local dev. key:revoke requires an explicit confirmation prompt before executing.

feat(admin-dashboard): Load More pagination in Transactions page

The Transactions page previously used previous/next page navigation that lost scroll position on each page turn. It now uses a Load More button that appends rows incrementally. “N of M entries loaded” replaces “Page X of Y”. Changing any filter resets to the first page automatically.

feat(admin-dashboard): monthly token quota notifications

The notification bell now alerts when a key is approaching or has exceeded its monthly_token_limit. Keys at 80–94% of their token budget trigger a warning notification; keys at ≥ 95% trigger an error. The daily cron handler also dispatches key_token_quota_warning and key_token_quota_exceeded queue events for webhook delivery.

feat(admin-dashboard): Models page sortable columns + p50/p95 latency

The Models stats table now supports sortable columns (Requests, Avg Latency, Error %) with a chevron indicator and persistent sort direction. p50 (median) and p95 latency columns were added — p95 is colour-coded (green / amber / rose) to highlight tail-latency outliers at a glance.
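
For reference, p50 and p95 can be computed with the nearest-rank method, and column sorting with a direction-aware comparator. Both functions are illustrative assumptions; the changelog does not say which percentile method the dashboard or its D1 queries use.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of samples are less than or equal to it. Returns null for an empty input.
function percentile(values: number[], p: number): number | null {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  const value = sorted[rank - 1];
  return value === undefined ? null : value;
}

// Returns a sorted copy of the rows for a numeric column and direction.
function sortRows<T>(rows: T[], key: (row: T) => number, dir: "asc" | "desc"): T[] {
  const sign = dir === "asc" ? 1 : -1;
  return rows.slice().sort((a, b) => sign * (key(a) - key(b)));
}
```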

feat(admin-dashboard): per-key filter on Usage page

The Usage page now includes a key selector dropdown. Selecting a key shows daily and monthly usage for that key only, making it easy to track consumption per client without navigating to each key’s quota view.

2026-05-26 (improvement loop, iterations 471–474)

fix(csv-export): latency range-inversion guard for export endpoints

GET /admin/request-logs/export previously accepted inverted latency ranges (?min_latency_ms=500&max_latency_ms=100) silently — the main JSON endpoint correctly returned 400 for this, but the export route did not. Both endpoints now enforce the same validation.
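
The shared guard amounts to a check like the following. The error object shape, the `validation_error` code, and the message text are assumptions for illustration.

```typescript
interface LatencyRangeError {
  status: 400;
  code: "validation_error";
  message: string;
}

// Rejects inverted ranges. Either bound may be omitted, in which case the
// range is open on that side and cannot be inverted.
function checkLatencyRange(min?: number, max?: number): LatencyRangeError | null {
  if (min !== undefined && max !== undefined && min > max) {
    return {
      status: 400,
      code: "validation_error",
      message: "min_latency_ms must be <= max_latency_ms",
    };
  }
  return null;
}
```

Calling the same function from both the JSON route and the export route is what keeps the two in step.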

feat(csv-export): monthly usage columns in API keys CSV

The API keys CSV export (GET /admin/api-keys/export) now includes Monthly Requests Used and Monthly Tokens Used columns, showing current-month consumption from D1 alongside the configured limits. Keys with no usage in the current month show 0.

fix(admin-dashboard): revoked keys visible in Transactions page filter

The key selector dropdown on the Transactions page previously excluded revoked keys, making it impossible to filter the request log by a key that had since been revoked. Revoked keys now appear in a separate Revoked optgroup, labeled in rose to distinguish them from active/suspended keys.

2026-05-26 (improvement loop, iteration 465)

feat(admin): bulk suspend and unsuspend API keys

Two new Admin API endpoints and a corresponding admin dashboard UI allow operators to suspend or unsuspend multiple API keys in a single operation — the natural complement to the bulk revoke capability added in iteration 459.

Admin API — new endpoints:

POST https://api.bve.me/admin/api-keys/bulk-suspend
POST https://api.bve.me/admin/api-keys/bulk-unsuspend

Both endpoints follow the same request shape as POST /admin/api-keys/bulk-revoke:

Field	Type	Required	Description
`ids`	string[]	Yes	Array of API key UUIDs. Min 1, max 100.

POST /admin/api-keys/bulk-suspend response:

{
  "suspended_count": 2,
  "suspended_ids": ["uuid1", "uuid2"],
  "total": 3
}

Only transitions keys from active → suspended; already-suspended or revoked keys are silently skipped.
Suspended keys are evicted from the module-scope auth cache immediately — subsequent API calls using those tokens receive 403 api_key_suspended without waiting for the 30-second cache TTL.
suspended_count reflects only keys that actually transitioned (idempotent).

POST /admin/api-keys/bulk-unsuspend response:

{
  "unsuspended_count": 2,
  "unsuspended_ids": ["uuid1", "uuid2"],
  "total": 3
}

Only transitions keys from suspended → active; already-active or revoked keys are silently skipped.
Reactivated keys are evicted from the auth cache so the updated active status is read from D1 on the next request.
unsuspended_count reflects only keys that actually transitioned.
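
The transition semantics shared by both endpoints can be modelled in memory as follows. This is a sketch, not the D1 query: the Map-based store and function name are hypothetical, and `total` is assumed to be the number of IDs submitted.

```typescript
type KeyStatus = "active" | "suspended" | "revoked";

// Moves every listed key that is currently in `from` to `to`, skipping keys
// in any other state and IDs that do not exist.
function bulkTransition(
  store: Map<string, KeyStatus>,
  ids: string[],
  from: KeyStatus,
  to: KeyStatus,
): { count: number; ids: string[]; total: number } {
  if (ids.length < 1 || ids.length > 100) {
    throw new Error("ids must contain between 1 and 100 entries");
  }
  const changed: string[] = [];
  for (const id of ids) {
    if (store.get(id) === from) {
      store.set(id, to);
      changed.push(id);
    }
  }
  return { count: changed.length, ids: changed, total: ids.length };
}
```

Running the same suspend request twice returns a count of 2 and then 0, which is the idempotency the response fields describe.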

Dashboard UI — bulk actions toolbar:

The API Keys table’s bulk-selection toolbar (shown when ≥ 1 key is selected) now exposes three bulk actions:

Button	Color	Behaviour
Suspend N	Amber	Opens a confirmation dialog; submits to `bulk-suspend`. Active keys in the selection are suspended; others skipped.
Unsuspend N	Teal	Opens a confirmation dialog; submits to `bulk-unsuspend`. Suspended keys in the selection are reactivated; others skipped.
Revoke N	Rose	Existing bulk-revoke (unchanged).

Each dialog describes the exact state transition and what gets skipped, and shows a spinner while the mutation is in-flight. All three buttons are disabled while any mutation is pending to prevent concurrent submissions.

Tests: 16 new tests in test/admin.test.ts — 8 for bulk-suspend (happy path, idempotency, skips-revoked, unknown-ids, auth-rejection, validation × 2, 401) and 8 for bulk-unsuspend (same structure). 3508 → 3524 tests.

2026-05-26 (improvement loop, iteration 464)

fix(stats): aggregate stats and error breakdown correctly apply upstream latency filter

Bug fixed: GET /admin/request-logs?min_upstream_latency_ms=X (and max_upstream_latency_ms=Y) correctly filtered the logs[] array to rows whose upstream_latency_ms fell within the requested range. However, the stats aggregate object (returned alongside the log rows) was still computed over all rows, ignoring the upstream-latency filter — so the total, avg_latency_ms, error_rate, and other aggregate fields described a different dataset than the rows shown.

The same discrepancy affected the stats.topErrorCodes breakdown.

Fix: aggregateRequestLogStats and getErrorCodeBreakdown now accept minUpstreamLatencyMs / maxUpstreamLatencyMs parameters and add the same AND upstream_latency_ms BETWEEN ? AND ? clauses already present in listRequestLogsSampled. Both the admin API route handler and the dashboard server layer thread these parameters through consistently.

Two new regression tests in test/admin.test.ts verify:

When filtering by min_upstream_latency_ms, the returned stats.total equals the number of rows in logs[] — not the unfiltered row count.
When filtering by max_upstream_latency_ms, the aggregate stats.avg_latency_ms reflects only the filtered rows.
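
One way to keep the rows query and the aggregate queries aligned is to build the filter fragment in a single place. The sketch below uses `>=` / `<=` conditions so that one-sided filters work; that choice and the function name are assumptions, not the project's actual SQL.

```typescript
interface UpstreamLatencyFilter {
  minUpstreamLatencyMs?: number;
  maxUpstreamLatencyMs?: number;
}

interface SqlFragment {
  clause: string;
  params: number[];
}

// Builds the extra WHERE conditions and their bound parameters. The same
// fragment is appended to the list, aggregate, and error-breakdown queries.
function upstreamLatencyClause(filter: UpstreamLatencyFilter): SqlFragment {
  const parts: string[] = [];
  const params: number[] = [];
  if (filter.minUpstreamLatencyMs !== undefined) {
    parts.push("upstream_latency_ms >= ?");
    params.push(filter.minUpstreamLatencyMs);
  }
  if (filter.maxUpstreamLatencyMs !== undefined) {
    parts.push("upstream_latency_ms <= ?");
    params.push(filter.maxUpstreamLatencyMs);
  }
  return { clause: parts.map((part) => " AND " + part).join(""), params };
}
```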

2026-05-26 (improvement loop, iteration 459)

feat(admin): bulk API key revocation

Admin API — POST /admin/api-keys/bulk-revoke:

POST https://api.bve.me/admin/api-keys/bulk-revoke
Authorization: Bearer admin_bve_YOUR_ADMIN_KEY
Content-Type: application/json

{ "ids": ["uuid1", "uuid2", ..., "uuidN"] }

Field	Type	Required	Constraints
`ids`	string[]	Yes	1–100 UUIDs per request

Response:

{
  "revoked_count": 2,
  "revoked_ids": ["uuid1", "uuid2"],
  "total": 3
}

Idempotent: already-revoked keys in the ids list are silently skipped; revoked_count counts only keys that actually transitioned to revoked.
Revoked keys are evicted from the module-scope auth cache immediately — subsequent API calls receive 403 api_key_revoked without waiting for the 30-second cache TTL.
Returns 400 validation_error when ids is empty or exceeds 100 entries.

Dashboard UI — bulk selection toolbar:

The API Keys page gained a multi-select mode:

A checkbox column appears in the table header and each key row (owner and admin roles only).
A select-all checkbox in the header selects all non-revoked keys visible on the current page.
When ≥ 1 key is checked, a bulk actions toolbar slides in above the table showing the selection count, a Clear button, and a Revoke N selected button.
Clicking Revoke N selected opens a confirmation dialog with a warning; confirming the dialog calls POST /api/api-keys/bulk-revoke (CSRF-protected) and invalidates the keys list on success.
The toolbar shows a spinner while the mutation is in-flight; clearing the selection closes the toolbar.

Dashboard API — POST /api/api-keys/bulk-revoke (internal):

Requires an active owner or admin session and a CSRF token. Calls the new DB query, writes a single api_key.bulk_revoked audit log entry (with actor email and revoked IDs), and returns the same shape as the admin API endpoint.

Tests: 21 new integration tests covering happy path (multi-key), idempotency (skip already-revoked), suspended-key revoke, all-already-revoked (count 0), non-existent IDs (count 0), validation errors (empty array, >100, missing field, wrong type), auth failures (no key, wrong key), and auth-cache eviction (key is rejected immediately after bulk-revoke).

2026-05-26 (improvement loop, iteration 463)

docs(request-logs): add sort and upstream latency filter parameters

The Request Logs reference page now documents all query parameters and response fields added in iteration 458:

sort_by — sort field: created_at (default), latency_ms, upstream_latency_ms, status
sort_dir — sort direction: asc or desc (default desc)
min_latency_ms / max_latency_ms — filter by total gateway round-trip latency range
min_upstream_latency_ms / max_upstream_latency_ms — filter by Fuelix upstream latency range
upstream_latency_ms — response field on each log entry (null for gateway-rejected requests)

New cURL examples for sorting by slowest request, filtering by latency range, and isolating slow upstream responses.
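
The sort parameters and their defaults can be parsed as in this sketch. The accepted values and defaults come from the list above; returning an error object for unknown values is an assumption about how invalid input is handled.

```typescript
const SORT_FIELDS = ["created_at", "latency_ms", "upstream_latency_ms", "status"] as const;
type SortField = (typeof SORT_FIELDS)[number];
type SortDir = "asc" | "desc";
type SortResult = { sortBy: SortField; sortDir: SortDir } | { error: string };

// Applies the documented defaults (created_at, desc) and rejects unknown values.
function parseSort(sortBy?: string, sortDir?: string): SortResult {
  const field = sortBy === undefined ? "created_at" : SORT_FIELDS.find((f) => f === sortBy);
  if (field === undefined) {
    return { error: `sort_by must be one of: ${SORT_FIELDS.join(", ")}` };
  }
  let dir: SortDir = "desc";
  if (sortDir === "asc" || sortDir === "desc") {
    dir = sortDir;
  } else if (sortDir !== undefined) {
    return { error: "sort_dir must be asc or desc" };
  }
  return { sortBy: field, sortDir: dir };
}
```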

docs(guides): add Troubleshooting page

New Troubleshooting guide in the sidebar with concrete remediation steps for the most common failure modes:

Auth errors: missing_api_key, invalid_api_key, api_key_expired, api_key_suspended, api_key_revoked
Model errors: model_not_allowed, model_not_available, model_endpoint_mismatch with per-endpoint hint table
Rate limit errors: rate_limit_exceeded with Retry-After usage and TypeScript/Python retry examples
Validation errors: missing_required_parameter with required-fields-by-endpoint table, request_too_large
Server errors: upstream_error (502), capacity_exceeded (503), internal_error (500)
Debugging tips: capturing X-Request-Id, checking quota via /v1/usage, looking up traces in request logs

2026-05-26 (improvement loop, iterations 451–455)

fix(admin): `?action_category=admins` accepted on `GET /admin/audit-logs`

GET /admin/audit-logs?action_category=admins previously returned 400 validation_error even though the query layer and admin dashboard both handled the admins category correctly. The public Admin API route now accepts all four categories: auth, keys, models, and admins. The OpenAPI spec action_category description was updated to list all four values.

The Audit Logs reference page now documents:

All four action_category values with their descriptions
Complete action type tables for each category (api_key.*, admin.login/logout, model_allowlist.*, admin_user.*)
A cURL example for ?action_category=admins
Accurate actor_type and target_type field descriptions reflecting all event classes

feat(logger): `routePattern` field in structured request log

The structured JSON request log now includes routePattern — the normalized Hono route template (e.g. /admin/api-keys/:id/usage) sourced from c.req.matchedRoutes. This enables grouping parameterized routes by pattern in Logpush / wrangler tail without the raw parameter values obscuring the endpoint shape. The field is absent when no route was matched (404s).

feat(security): provider key patterns redacted from logged request path

Requests whose URL path contains a provider API key pattern (e.g. sk-..., AIza...) now have that portion redacted to [REDACTED] in the structured log path field. This prevents accidental key leakage when a client embeds a key in a path segment rather than the Authorization header.

test(logger): unit tests for middleware rejection sampling path

Added 10 unit tests covering the logger middleware’s rejection sampling branch (the path that calls logMiddlewareRejectionSampled when quota or model-filter middleware blocks an authenticated request). The path now has full guard coverage: all-guards-met triggers waitUntil; each of 5 missing guards prevents the call; sampling skips when Math.random >= 0.01; error-swallow resolves cleanly.

2026-05-26 (improvement loop, iteration 447)

feat(dashboard): endpoint pills in Usage model stats + OpenAPI top_endpoints field

Usage page — endpoint pills in model stats table

The “Top Models by Usage” table in the Usage page now shows which API endpoint paths a model is being called through, displayed as compact sky-colored pills (e.g., CHAT, MESSAGES, EMBEDDINGS) below the usage bar in the Model column. Up to 3 endpoints are shown; if there are more, a +N overflow label appears. This data comes from the topEndpoints field added to model stats in iteration 446 — no additional API calls.

OpenAPI spec — top_endpoints field added to /admin/model-stats response schema

The GET /admin/model-stats 200 response schema now documents the top_endpoints array on each model stat item:

"top_endpoints": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "endpoint": { "type": "string", "description": "Gateway endpoint path" },
      "request_count": { "type": "integer" }
    }
  }
}

Contract test updated to assert top_endpoints is present in the spec.

2026-05-26 (improvement loop, iteration 446)

feat(dashboard): per-model endpoint usage breakdown in model stats

The GET /api/model-stats response now includes a topEndpoints array on each model stat entry, showing which BVE Gateway endpoint paths the model was called through and how many sampled requests went through each.

Example:

{
  "model": "gpt-4o-mini",
  "requestCount": 500,
  "topEndpoints": [
    { "endpoint": "/v1/chat/completions", "requestCount": 420 },
    { "endpoint": "/v1/messages", "requestCount": 80 }
  ]
}

Entries are ordered by request count descending within each model. The breakdown is computed by a second getModelTopEndpoints D1 query that runs in parallel with the existing model aggregation, so it adds no serial latency.

The admin dashboard Models page renders this data in the model detail slide-over panel as a compact bar chart showing relative endpoint distribution for the selected stats period. This lets operators see at a glance whether a model is used primarily through the Chat Completions API, the Messages API (Anthropic-compatible), Embeddings, etc.
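
The relative distribution behind such a bar chart can be derived from topEndpoints as sketched below. Rounding to whole percentages and the function name are assumptions about the rendering.

```typescript
interface EndpointCount {
  endpoint: string;
  requestCount: number;
}

// Converts raw per-endpoint counts into percentage shares, largest first.
function endpointShares(topEndpoints: EndpointCount[]): Array<{ endpoint: string; percent: number }> {
  const total = topEndpoints.reduce((sum, entry) => sum + entry.requestCount, 0);
  return topEndpoints
    .slice()
    .sort((a, b) => b.requestCount - a.requestCount)
    .map((entry) => ({
      endpoint: entry.endpoint,
      percent: total === 0 ? 0 : Math.round((entry.requestCount / total) * 100),
    }));
}
```

For the gpt-4o-mini example above, this yields 84% for /v1/chat/completions and 16% for /v1/messages.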

Type fix: Added errorCode?: string to the DashboardRequestLogsParams shared contract type. The field was already handled by the server route and sent by the frontend query function, but was missing from the canonical shared type — creating a stale contract definition.

2026-05-26 (improvement loop, iteration 445)

docs: add `?endpoint=` param to Models page; add `stats` object to Request Logs response

Models page — ?endpoint= query parameter

The GET /v1/models endpoint has accepted an ?endpoint= filter since the model-registry work in earlier iterations, and the OpenAPI spec was updated in iteration 444. The Models reference page now documents the parameter in the query parameters table with its full enum of accepted paths (/v1/chat/completions, /v1/embeddings, /v1/responses, etc.), two new cURL examples (filter by endpoint, combined endpoint + search), and the 400 invalid_value error shape for invalid endpoint values.

Request Logs page — stats aggregate object

The GET /admin/request-logs response has always included a stats aggregate object computed across all matching rows (not just the current page), but the Request Logs docs showed only total and logs. The full object is now documented:

Field	Meaning
`avg_latency_ms` / `max_latency_ms`	Mean and peak gateway-to-upstream latency
`p50_latency_ms` / `p95_latency_ms`	Median and tail-latency percentiles
`total_tokens`	Sum of `prompt_tokens + completion_tokens` across all matching rows
`error_count`	4xx + 5xx row count
`client_error_count` / `server_error_count`	Split of 4xx vs 5xx
`error_rate`	`error_count / total`, expressed as a percentage (0.0–100.0)

The example JSON response in the docs now includes the stats block alongside total and logs.

2026-05-26 (improvement loop, iteration 438)

feat(dashboard): `clientErrorCount`, `serverErrorCount`, and `topErrorCodes` in transaction stats

The admin dashboard Transactions page now exposes richer error analytics in the aggregate stats returned by GET /api/request-logs:

clientErrorCount / serverErrorCount — The Error Rate KPI card now shows a secondary line breaking down errors by class (e.g., 12 4xx / 3 5xx), giving operators an at-a-glance picture of whether failures are client-side (bad API key, quota, model) or gateway/upstream (5xx).

topErrorCodes breakdown bar — When the current filtered result set contains any gateway-level error codes (e.g., rate_limit_exceeded, model_not_allowed), a Top Error Codes section appears between the KPI row and the log table. Each code is rendered as a clickable chip labelled with its count. Clicking a chip applies that error_code as a sidebar filter instantly; clicking the active chip deselects it. This lets operators quickly identify the dominant failure modes without hunting through individual log rows.

The breakdown is powered by a new getErrorCodeBreakdown() D1 query (parallel to the existing aggregateRequestLogStats call, so no extra latency) and is exposed as stats.topErrorCodes: Array<{ errorCode, count }> in the GET /api/request-logs JSON response.
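
The shape of stats.topErrorCodes can be reproduced in memory as follows. The real breakdown is a D1 query; this sketch only mirrors its output shape, and the default limit of 5 and the tie-breaking order are assumptions.

```typescript
interface ErrorCodeCount {
  errorCode: string;
  count: number;
}

// Counts non-null error codes and returns the most frequent ones first.
function errorCodeBreakdown(codes: Array<string | null>, limit: number = 5): ErrorCodeCount[] {
  const counts = new Map<string, number>();
  for (const code of codes) {
    if (code === null) continue; // rows without a gateway-level error code
    counts.set(code, (counts.get(code) ?? 0) + 1);
  }
  const result: ErrorCodeCount[] = [];
  counts.forEach((count, errorCode) => result.push({ errorCode, count }));
  return result
    .sort((a, b) => b.count - a.count || a.errorCode.localeCompare(b.errorCode))
    .slice(0, limit);
}
```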

refactor(proxy): extract `checkMultipartModel` helper in openai.ts

Identical 30-line model-validation blocks were duplicated across the /v1/audio/transcriptions and /v1/images/edits multipart handlers. They are now extracted into a shared checkMultipartModel(c, model, endpoint) helper: no behavior change, ~60 fewer lines in the route file.

2026-05-26 (improvement loop, iterations 435–437)

feat(transactions): `error_code` in request log + middleware-rejection visibility

Two gaps in the sampled request log closed in one batch:

1. error_code column in request_logs_sampled

The sampled request log (request_logs_sampled) previously stored no information about why a request was blocked at the gateway level. The errorCode context variable set by auth, quota, model-filter, and validation middleware was written to the structured console log but never to D1 — making it impossible to filter or browse error types in the Transactions page.

Added error_code TEXT to the table via migration 0010_request_logs_error_code.sql. All insertRequestLogSampled calls now pass the error code; GET /admin/request-logs?error_code=<value> filters accordingly.

Common error_code values:

`error_code`	Trigger
`rate_limit_exceeded`	RPM/RPD/monthly quota hit
`model_not_allowed`	Key’s `allowed_models` list or global allowlist block
`model_not_available`	Model disabled in registry
`invalid_api_key`	Key not found or invalid format
`api_key_expired`	Key past its `expires_at` timestamp
`api_key_suspended`	Key has status `suspended`
`request_too_large`	Body exceeds 10 MB limit

2. Middleware-rejection log visibility

Quota 429s and model-filter 403s are authenticated (the API key is resolved) but never reach a route handler, so the existing logRequestSampled path was never triggered for them. They were entirely invisible in the Transactions page.

Added logMiddlewareRejectionSampled() in services/usage.ts. The logger middleware calls it after await next() when all of the following hold:

apiKey is set (auth succeeded)
errorCode is set (middleware rejected the request)
upstreamLatencyMs is undefined (no upstream call was made)
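
The three conditions above can be read as a single predicate. The following is a minimal sketch, not the gateway's actual code: the RequestState interface and the shouldLogMiddlewareRejection name are hypothetical stand-ins for the values the logger reads from the Hono context.

```typescript
// Hypothetical snapshot of the context variables the logger inspects.
interface RequestState {
  apiKey?: { id: string };
  errorCode?: string;
  upstreamLatencyMs?: number;
}

// True only for authenticated requests that a middleware rejected
// before any upstream call was made.
function shouldLogMiddlewareRejection(state: RequestState): boolean {
  return (
    state.apiKey !== undefined &&
    state.errorCode !== undefined &&
    state.upstreamLatencyMs === undefined
  );
}
```

Under this reading, a quota 429 with a resolved key is logged, while an invalid_api_key rejection (no apiKey) and a proxied request (upstreamLatencyMs set) are left to the existing paths.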

security: strip `x-stainless-*` SDK metadata headers + `anthropic-dangerous-direct-browser-only`

x-stainless-* stripping — Anthropic, OpenAI, and other Stainless-generated SDKs attach x-stainless-lang, x-stainless-version, x-stainless-os, x-stainless-runtime, x-stainless-runtime-version, x-stainless-arch, and x-stainless-async to every request. These headers reveal client environment metadata (OS, language, SDK version) to the upstream provider without the caller’s explicit intent. Added to the combined sec-*/x-basicllm-* strip pass in proxyToFuelix.

anthropic-dangerous-direct-browser-only stripping — The Anthropic SDK sets this header to allow browser-to-Anthropic direct calls (CORS bypass). If forwarded through the gateway to Fuelix and then to Anthropic proper, it could route the request under Anthropic’s direct billing rather than the Fuelix account. Stripped unconditionally before forwarding.

obs(auth): structured error log for D1 failures

The lookupKey catch block in auth.ts previously swallowed D1 errors silently. A D1 outage caused all authenticated requests to return 500 with no structured trace, making the root cause indistinguishable from application bugs. Now emits a console.error with { level: 'error', type: 'auth_key_lookup_failed', requestId, path } — operators can alert on this type in Logpush or wrangler tail.

2026-05-26 (improvement loop, iteration 431)

obs(logger): `quotaReason` field in structured request log on rate-limit rejections

The quota middleware already emitted a console.warn with the specific reason string when a request was rejected (RPM, RPD, monthly requests, or monthly tokens). The main structured request log had no equivalent field — operators needed to cross-reference two separate log entries to see which limit was hit for a given request.

quotaReason is now included as a first-class field in the structured JSON request log line whenever a 429 is returned for a rate-limited request. No duplicate log entries are added; the existing quota warn log is preserved for backwards compatibility.

Example log line (rate-limited request):

{
  "level": "warn",
  "type": "request",
  "requestId": "...",
  "method": "POST",
  "path": "/v1/chat/completions",
  "status": 429,
  "latencyMs": 3,
  "errorCode": "rate_limit_exceeded",
  "quotaReason": "Rate limit exceeded: 60 requests per minute",
  "retryAfter": 42
}

Possible quotaReason values mirror the message field of the 429 response body:

`quotaReason`	Limit hit
`Rate limit exceeded: 60 requests per minute`	RPM cap
`Rate limit exceeded: requests per day`	RPD cap
`Monthly request limit exceeded`	Monthly request cap
`Monthly token limit exceeded`	Monthly token cap

2026-05-26 (improvement loop, iteration 429)

feat(admin-dashboard): current-month usage per API key in keys list with colour-coded progress bars

The API Keys list in the admin dashboard now shows current-month request and token consumption inline for every key — no need to open individual quota dialogs to spot keys approaching their limits.

Each key row gains a Monthly Usage cell with two compact rows:

Requests: used/limit (N%) with a thin progress bar
Tokens: used/limit (N%) with a thin progress bar

Colour coding:

Usage	Colour
< 80%	Teal
80–94%	Amber
≥ 95%	Rose

Keys without a monthly_limit show an unlimited indicator instead of a bar. Keys with no usage this month (for example, freshly created keys) show nothing in the count slot. Single-key responses (key creation, rotation, revocation) do not include the monthly fields.

Data is fetched with a single extra D1 query (monthly_usage table) scoped to the current YYYY-MM period and joined to the keys list server-side — one round-trip per page load.

2026-05-26 (improvement loop, iteration 428)

fix(validation): enforce enum for `reasoning.summary` in Responses API

POST /v1/responses accepts a nested reasoning.summary string that controls whether and how reasoning summaries are returned. Previously this field was only type-checked (must be a string); any value — including undocumented ones like "verbose" or "full" — would pass and be forwarded to Fuelix, producing opaque 422 errors.

OpenAI’s Responses API accepts only three values:

Value	Behaviour
`"auto"`	Include a summary only when reasoning is present
`"concise"`	Always include a brief summary
`"detailed"`	Always include a detailed summary

Any other string now returns a clean 400 invalid_value with param: "reasoning.summary" instead of a Fuelix 422. This follows the same pattern as reasoning_effort, service_tier, and truncation enums.
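
A minimal sketch of this kind of enum check is shown below. It is an illustration only: the ValidationError shape, the validateReasoningSummary name, the message strings, and the treatment of an absent value as valid are all assumptions. Only the three allowed values, the 400 status, the invalid_value code, and the param value come from the entry above.

```typescript
// Allowed values, as listed in the table above.
const REASONING_SUMMARY_VALUES = ["auto", "concise", "detailed"];

// Hypothetical error shape; only code and param are taken from the changelog.
interface ValidationError {
  status: number;
  code: "invalid_type" | "invalid_value";
  param: string;
  message: string;
}

// Returns null when the value is acceptable, otherwise a 400-style error object.
function validateReasoningSummary(value: unknown): ValidationError | null {
  // Assumption: the field is optional, so an absent value passes.
  if (value === undefined) return null;
  if (typeof value !== "string") {
    return {
      status: 400,
      code: "invalid_type",
      param: "reasoning.summary",
      message: "reasoning.summary must be a string",
    };
  }
  if (REASONING_SUMMARY_VALUES.indexOf(value) === -1) {
    return {
      status: 400,
      code: "invalid_value",
      param: "reasoning.summary",
      message: "reasoning.summary must be one of: " + REASONING_SUMMARY_VALUES.join(", "),
    };
  }
  return null;
}
```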

2026-05-26 (improvement loop, iteration 427)

docs: fix api_key_expired error code gap + update introduction admin table

Documentation accuracy gaps fixed:

api_key_expired error code was undocumented:

The gateway returns 401 api_key_expired when a key’s expires_at date has passed. This code was implemented but absent from both the Errors and Authentication reference pages — developers who encountered this response had no documentation to explain it.

Added to both pages:

api_key_expired row in the auth error code tables
Expired-key response body example in Errors
expired status row in the Authentication Key Statuses table, with a note that the key’s status field remains active and the expiry is controlled by expires_at
Explanation of how to reactivate (PATCH expires_at to a future date or null)

Introduction page admin API table was missing two routes:

GET /admin/endpoint-stats and GET /admin/key-stats (added in iteration 422, documented in iteration 425) were present in admin-api/overview.md but not in the Introduction page’s admin endpoint table. Both rows added.

2026-05-26 (improvement loop, iteration 425)

docs(admin-api): add Endpoint Stats and Key Stats reference pages

Two admin API endpoints added in iteration 422 had no public documentation. New pages created:

Endpoint Stats — GET /admin/endpoint-stats: query parameters, response fields, error-rate formula, curl and TypeScript examples, use-case snippets
Key Stats — GET /admin/key-stats: per-key leaderboard with token totals and latency percentiles, query parameters, examples

The Admin API Overview endpoint table was also updated with three previously missing rows (/admin/api-keys/:id/stats, /admin/endpoint-stats, /admin/key-stats). Both new pages are linked from the sidebar and the API Keys Next Steps section.

2026-05-26 (improvement loop, iteration 424)

fix(admin-api): normalize `error_rate` across all stats endpoints

Five admin stats endpoints computed error_rate with two different formulas — some returned values like 33.333..., others always returned 2 decimal places. Extracted a single errorRate() helper applied at all five call sites. All error_rate fields are now consistently rounded to 2 decimal places across every stats endpoint.
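
For illustration, a helper of this kind might look like the sketch below. The signature and the zero-request behaviour are assumptions; the entry above only states that the result is a percentage rounded to 2 decimal places.

```typescript
// Percentage of failed requests, rounded to 2 decimal places.
// Assumption: returns 0 when there were no requests, to avoid dividing by zero.
function errorRate(errorCount: number, totalCount: number): number {
  if (totalCount <= 0) return 0;
  return Math.round((errorCount / totalCount) * 10000) / 100;
}
```

With this version, one error in three requests yields 33.33 rather than 33.333..., which is the inconsistency the fix removes.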

2026-05-26 (improvement loop, iteration 423)

feat(smoke): add 401 auth coverage for three missing admin routes

Three admin routes (GET /admin/api-keys/:id/stats, GET /admin/endpoint-stats, GET /admin/key-stats) were missing from the smoke-test’s auth-rejection section. All three now have corresponding 401-rejection checks in scripts/smoke-test.sh.

2026-05-26 (improvement loop, iteration 422)

feat: quota-approaching notifications in the admin bell + Weaviate credential redaction

Dashboard — key quota notifications:

The notification bell in the admin dashboard now surfaces proactive warnings when API keys are approaching or have reached their monthly request limit. Previously the bell only tracked key expiry (expiring soon / already expired).

New notification types:

Type	Trigger	Severity
`key_quota_warning`	Key has consumed 80–94% of its `monthly_limit`	warning
`key_quota_warning`	Key has consumed 95–99% of its `monthly_limit`	error
`key_quota_exceeded`	Key has consumed ≥ 100% of its `monthly_limit`	error

Each quota notification shows the utilisation percentage (X% used) in the badge and links directly to the key’s quota dialog. The feature is implemented via a D1 LEFT JOIN between api_keys and monthly_usage, so no new endpoints or schema changes were required. Keys without a monthly_limit are excluded, as are suspended and revoked keys.
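
The thresholds in the table can be expressed as a small classifier. This is a sketch under stated assumptions: the function name, the QuotaNotification shape, and the choice to floor the percentage are hypothetical, and the real implementation computes usage inside a D1 query rather than in application code.

```typescript
interface QuotaNotification {
  type: "key_quota_warning" | "key_quota_exceeded";
  severity: "warning" | "error";
  usedPct: number;
}

// Returns null when no notification should be raised for the key.
function classifyQuotaUsage(
  used: number,
  monthlyLimit: number | null,
): QuotaNotification | null {
  // Keys without a monthly_limit are excluded.
  if (monthlyLimit === null || monthlyLimit <= 0) return null;
  // Assumption: the percentage is floored to a whole number.
  const usedPct = Math.floor((used * 100) / monthlyLimit);
  if (usedPct >= 100) return { type: "key_quota_exceeded", severity: "error", usedPct };
  if (usedPct >= 95) return { type: "key_quota_warning", severity: "error", usedPct };
  if (usedPct >= 80) return { type: "key_quota_warning", severity: "warning", usedPct };
  return null; // below 80%: no notification
}
```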

Eight new integration tests: 85% → warning, 95% → error severity, 100% → exceeded, 79% → excluded, no-monthly-limit → excluded, suspended key → excluded, expiresAt absent on quota notifications, usedPct present as number.

Security — Weaviate Cloud Service credential redaction:

wcs_[A-Za-z0-9_]{8,} added to PROVIDER_KEY_PATTERN in redactSecrets(). Weaviate Cloud Service API keys use the wcs_ prefix and can appear in Fuelix error bodies when a vector-DB retrieval or hybrid-search call is rejected by the upstream. Two new regression tests and one new row in the security guide table.

2026-05-26 (improvement loop, iteration 420)

fix(security): add Cohere and Voyage AI key patterns; fix NVIDIA doc entry

Two fixes and one new pattern in redactSecrets():

Bug fix — Cohere keys were not redacted: co_[A-Za-z0-9_]{8,} was documented in the security guide as a detected pattern but was missing from PROVIDER_KEY_PATTERN. Cohere v2 API keys use the co_ prefix and can appear in Fuelix error messages when a Cohere chat or embedding tool-call is rejected. Without this fix, Cohere credentials leaked verbatim into wrangler tail and Logpush records.

New pattern — Voyage AI: pa-[A-Za-z0-9_-]{32,} matches Voyage AI embedding API keys (pa- prefix + 32+ base64url chars). Voyage AI is a leading embedding provider used in RAG pipelines; its keys surface in Fuelix error messages when an embedding retrieval step is rejected. The 32-char minimum body length guards against false positives on unrelated short pa- strings (URL path segments, locale codes, etc.).

Docs fix — NVIDIA NIM prefix: The security guide incorrectly listed nv_ as the NVIDIA NIM key prefix. The actual PROVIDER_KEY_PATTERN regex uses nvapi- (the NVIDIA NIM / NGC API key format). The security guide table is now corrected.

Four new regression tests: positive match and negative (too-short) for co_ and pa- respectively.
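
To make the minimum-length behaviour concrete, here is a reduced sketch that uses only the two patterns quoted above. The SAMPLE_KEY_PATTERN and redactSample names, the leading \b word boundary, and the [REDACTED] placeholder are assumptions for illustration; the real PROVIDER_KEY_PATTERN contains more alternatives and may differ in detail.

```typescript
// Reduced stand-in for PROVIDER_KEY_PATTERN with only the Cohere and
// Voyage AI alternatives. The \b anchor is an assumption of this sketch.
const SAMPLE_KEY_PATTERN = /\b(?:co_[A-Za-z0-9_]{8,}|pa-[A-Za-z0-9_-]{32,})/g;

// Replaces every matching key with a fixed placeholder.
function redactSample(text: string): string {
  return text.replace(SAMPLE_KEY_PATTERN, "[REDACTED]");
}
```

Strings such as co_short or pa-IN fall below the minimum body length and pass through unchanged, which is the false-positive guard the negative tests exercise.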

2026-05-26 (improvement loop, iteration 419)

fix+feat(security): add Tavily AI and Pinecone key patterns to secret redaction

PROVIDER_KEY_PATTERN in redactSecrets() now matches two additional key prefixes:

tvly- (Tavily AI web-search API keys) — appears in Fuelix error messages when agent/RAG tool-calling flows forward auth to a Tavily search backend.
pcsk_ (Pinecone serverless API keys) — appears in Fuelix error messages when RAG pipelines include a vector-DB retrieval step using Pinecone.

Without this fix, keys from these providers would appear verbatim in every wrangler tail and Logpush record wherever redactSecrets() is invoked (logger, error handlers, queue consumer, admin dashboard routes).

Four new regression tests: positive match and negative (too-short) for each prefix.

2026-05-26 (improvement loop, iteration 418)

fix(security): redact provider key patterns from User-Agent field in structured logger

The structured request logger was recording the client’s User-Agent header verbatim (after slicing to 200 chars) without running redactSecrets(). Some SDK versions and debug HTTP clients embed API key material in their User-Agent string (e.g. openai-python/1.0.0 (api-key=sk-proj-...)). Without this fix those credentials appeared in every wrangler tail and Logpush record visible to anyone with log access.

Fix: redactSecrets is now applied to the sliced User-Agent before emitting the ua log field.

Six new regression tests: positive match (key present → redacted), negative (no key → unchanged), boundary-length (exactly 200 chars preserved), overlong (sliced before redact), and combined (two key patterns in one User-Agent).

2026-05-26 (improvement loop, iteration 413)

feat(logger): `gatewayOverheadMs` field in structured request log

The structured request log now emits gatewayOverheadMs = max(0, latencyMs - upstreamLatencyMs) on every proxied request. This makes it straightforward to distinguish gateway-side processing overhead (D1 auth queries, quota Durable Object checks, model-filter, request validation, response header filtering) from Fuelix upstream latency — without manual arithmetic in Logpush or wrangler tail.

Field semantics:

Present only when upstreamLatencyMs is set (a proxy call was made)
Absent for rejected requests (401/403/400 from middleware — no upstream call made)
Clamped at 0 when sub-millisecond imprecision causes upstreamLatencyMs to slightly exceed latencyMs
Positioned after slowUpstream in log field order

Example:

{
  "latencyMs": 342,
  "upstreamLatencyMs": 298,
  "gatewayOverheadMs": 44
}
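
The field semantics can be sketched as a small function. The function name and signature are hypothetical; the formula and the absent-when-no-upstream rule come from the description above.

```typescript
// Returns undefined when no upstream call was made, so the field can be
// omitted from the log line; otherwise the overhead clamped at 0.
function gatewayOverheadMs(
  latencyMs: number,
  upstreamLatencyMs?: number,
): number | undefined {
  if (upstreamLatencyMs === undefined) return undefined;
  return Math.max(0, latencyMs - upstreamLatencyMs);
}
```

For the example values above, gatewayOverheadMs(342, 298) gives 44.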

2026-05-26 (improvement loop, iteration 412)

fix(proxy): streaming responses no longer truncated at 25 seconds

The 25-second hang-detection timer added in iteration 410 (AbortSignal.timeout(25_000)) was applied to the entire upstream fetch() lifecycle — including the body stream. For streaming chat completions, the Responses API, and the Messages API, this meant any response whose body took longer than 25 seconds to fully drain was silently aborted mid-stream: clients received a partial SSE stream or truncated NDJSON, with no error indication.

The fix replaces AbortSignal.timeout() with a manual AbortController + setTimeout. The hang-detection timer fires (returning 504 upstream_timeout) only if response headers do not arrive within 25 seconds. Once fetch() resolves with response headers, clearTimeout() cancels the pending timer — the body stream then continues until the upstream closes or the client disconnects, with no artificial limit.

Non-streaming responses (embeddings, images, TTS, audio transcriptions, completions, model list) are unaffected: their bodies are read in full before the function returns, so the timer was always cancelled before the stream issue could arise.

Before (broken): A 40-second streaming chat completion was cut off at 25 seconds.
After (fixed): The stream completes fully regardless of duration; only pre-header hangs trigger the 504.
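
The header-only timeout pattern can be sketched as follows. This is not the gateway's proxyToFuelix code: fetchWithHeaderTimeout, the doFetch callback, and the UpstreamTimeoutError name are hypothetical, and the callback stands in for the real fetch() call so the sketch stays self-contained.

```typescript
// Builds the error a caller could map to the 504 upstream_timeout response.
function upstreamTimeoutError(timeoutMs: number): Error {
  const err = new Error("no response headers within " + timeoutMs + " ms");
  err.name = "UpstreamTimeoutError";
  return err;
}

// doFetch stands in for the upstream fetch() call and must honour the signal.
async function fetchWithHeaderTimeout<T>(
  doFetch: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timedOut = false;
  const timer = setTimeout(() => {
    timedOut = true;
    controller.abort();
  }, timeoutMs);
  try {
    // Resolves once headers are available; the body is not consumed here.
    return await doFetch(controller.signal);
  } catch (err) {
    if (timedOut) throw upstreamTimeoutError(timeoutMs);
    throw err;
  } finally {
    // Cancel the timer so it cannot abort the body stream later.
    clearTimeout(timer);
  }
}
```

A caller might use it as fetchWithHeaderTimeout((signal) => fetch(url, { signal }), 25000). Because the timer is cleared as soon as the promise settles, a long-running body stream is no longer tied to the 25-second window.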

feat(dashboard): manual refresh button on overview hourly chart

The 24-Hour Traffic Pattern chart previously showed “refreshes every 60s” with no way to trigger an immediate update. A Refresh button is now shown in the chart card header. Clicking it re-fetches both the overview KPI row and the hourly traffic data in parallel. The button shows an animated spinner while fetching and disables itself during in-flight requests to prevent double-submit. The chart footer also shows the exact UTC timestamp of the last data fetch.

2026-05-26 (improvement loop, iteration 411)

fix(type): add modelAllowlistCacheStatus to ContextVariableMap

modelAllowlistCacheStatus was set and read across modelFilter.ts, openai.ts, and logger.ts but was not declared in Hono’s ContextVariableMap interface, causing five TypeScript errors. The field is now declared in auth.ts with an optional string type, consistent with modelsCacheStatus.

feat(observability): modelAllowlistCacheStatus in structured request logs

isModelGloballyBlocked now returns { blocked, cacheStatus } so callers can propagate the D1 cache hit-rate to the request log. Routes that perform a per-model allowlist check (modelFilter, GET /v1/models/:id, POST /v1/audio/transcriptions, POST /v1/images/edits) set modelAllowlistCacheStatus on the Hono context; the logger writes it to the structured JSON log line. Values: HIT (< 20 s), STALE (20–30 s with background D1 refresh), MISS (> 30 s, synchronous D1 query). Use in Logpush/Tail to monitor D1 query rates and cache effectiveness.
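
The three values map to cache-entry age roughly as in the sketch below. The function and constant names are hypothetical, and the real isModelGloballyBlocked also performs the D1 lookups and background refresh, which are omitted here.

```typescript
type CacheStatus = "HIT" | "STALE" | "MISS";

const FRESH_MS = 20000; // below this age the cached allowlist is served as-is
const STALE_MS = 30000; // up to this age it is served while D1 refreshes in the background

// Classifies a cached entry by its age in milliseconds.
function classifyCacheAge(ageMs: number): CacheStatus {
  if (ageMs < FRESH_MS) return "HIT";
  if (ageMs <= STALE_MS) return "STALE";
  return "MISS";
}
```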

fix(proxy): log sampled request for upstream_empty_image_result

When /v1/images/generations returned HTTP 200 with an empty data: [] array, the gateway converted the response to a 400 error without writing a sampled log entry. The early return path now calls logRequestSampled and recordUsage, making image-generation errors visible in the Transactions page and usage stats.

feat(api): add errorRate to model and endpoint stats responses

getDashboardModelStats and getDashboardEndpointStats now include a precomputed errorRate (percentage, rounded to 2 decimal places). This matches the pattern of getDashboardKeyStats / DashboardKeyStatRow. DashboardModelStat and DashboardEndpointStat types are updated to include the field.

fix(dashboard): use precomputed errorRate in usage-page tables

Usage-page model and endpoint sort-by-error-rate now reads m.errorRate / ep.errorRate from the server instead of recomputing inline. The model stats CSV export also uses the server value, eliminating potential rounding divergence.

fix(dashboard): buildCurlCommand uses settings.apiUrl instead of hardcoded domain

The cURL replay command in the Transactions inspector hardcoded https://api.bve.me. It now reads settings.apiUrl from settingsQueryOptions and strips /v1 to reconstruct the gateway origin. Non-production deployments now generate correct replay commands.
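
Stripping the /v1 suffix could look like the sketch below. The gatewayOrigin name and the tolerance for a trailing slash are assumptions, and the staging hostname in the note is only an example.

```typescript
// Derives the gateway origin from a configured API URL by removing
// a trailing /v1 (with or without a final slash).
function gatewayOrigin(apiUrl: string): string {
  return apiUrl.replace(/\/v1\/?$/, "");
}
```

For instance, gatewayOrigin("https://staging.example.com/v1") yields "https://staging.example.com", to which the request path can then be appended when building the replay command.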

2026-05-26 (improvement loop, iteration 410)

fix(dashboard): React Hooks violation in overview page

useState and useEffect for the clock widget were placed after an early return <OverviewSkeleton /> in OverviewPage. This violated React’s Rules of Hooks (hooks must be called unconditionally). The clock state/effect are now hoisted above the early return, so hook call order is always consistent.

fix(dashboard): hardcoded API URL in overview Copy action

The “Copy API URL” action in the overview’s Quick Controls card hardcoded https://api.bve.me/v1. It now reads apiUrl from the settings API so non-production deployments (staging, custom domains) copy the correct URL.

fix(dashboard): dead Cell coloring in hourly chart

The successCount bar in the 24-hour traffic chart used a Cell array where both branches of the ternary evaluated to the same teal color ('#0d9488'), making the code a no-op. Hours with errors now render the success portion in dimmer teal ('#0f766e') to visually distinguish mixed-quality hours from clean ones.

fix(api): NaN guard for monthly cap parseInt in dashboard

getDashboardOverview and getDashboardSettings used parseInt(env.MONTHLY_WORKER_REQUEST_SOFT_CAP, 10) without a NaN fallback. If the environment variable was missing or malformed, both functions returned NaN in the caps JSON, which JSON-serializes to null and broke the frontend cap-utilization calculation (division by null). Both now use || 0 as a safe fallback.
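
The failure mode and the guard can be reproduced in isolation. In this sketch the parseCap name is hypothetical; the || 0 fallback is the one described above.

```typescript
// parseInt returns NaN for missing or malformed input, and
// JSON.stringify({ cap: NaN }) serializes that as {"cap":null}.
// The || 0 fallback turns NaN into a usable number.
function parseCap(raw: string | undefined): number {
  return parseInt(raw ?? "", 10) || 0;
}
```

One consequence of this guard is that an unset variable and an explicit cap of 0 are indistinguishable, so the frontend still needs to treat 0 as "no cap" if it divides by this value.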

fix(proxy): upstream fetch timeout

proxyToFuelix now passes signal: AbortSignal.timeout(25_000) to the upstream fetch() call. Previously, a hanging Fuelix connection would silently consume the Worker’s 30-second wall-clock limit and surface as an opaque timeout with no structured error. The gateway now returns 504 with code: upstream_timeout when Fuelix doesn’t respond within 25 seconds, leaving 5 seconds for the response path.

fix(validation): remove n=1 restriction for o3/o4-mini reasoning models

The gateway was incorrectly rejecting n > 1 for o3, o3-mini, o4-mini, and future o-series models that are not in the o1 family. Unlike o1, these newer models support multiple completions. The o1 family’s REASONING_FIXED_PARAMS check (which enforces n=1) is now applied only to o1, o1-mini, and o1-preview; other reasoning models pass n through to Fuelix as-is.

fix(dashboard): Blob double-read in API keys export

The CSV export in api-keys-page.tsx read res.blob() for the download URL, revoked the URL, then called blob.text() to count rows. The text is now read from the response directly before creating a Blob, guaranteeing the data is available for both the download and the row count toast.

fix(dashboard): search debounce cleanup on unmount

ApiKeysPage now cleans up any pending debounce timer when the component unmounts, preventing a setState call on an unmounted component.

fix(dashboard): graceful degradation when Fuelix models API is unavailable

getDashboardModels previously threw an unhandled error when the upstream models endpoint was unreachable, causing the dashboard Models page to crash with a 502. It now catches upstream failures, logs the error, and preserves the D1 policy state so operators can still manage the allowlist even when Fuelix is down.

At the time of this fix, the degraded response was effectively sourced from D1 only. The current dashboard contract has since been refined to return catalogAvailable, keep live catalog rows in models, and place preserved D1 rows in policyModels as allowlist_unverified while Fuelix is unavailable.

fix(tests): correct invalid_type vs invalid_value for type-mismatch validation errors

Six tests incorrectly asserted invalid_value for cases where a number field receives a string argument (type mismatch). Per the established convention (invalid_type = wrong type, invalid_value = wrong value of the right type), these tests now assert invalid_type. Affected: validateCompletionsBody unit tests for n/best_of; integration tests for top_k and top_logprobs.

2026-05-26 (improvement loop, iteration 409)

docs(sdk): add Gemini and Mistral/Llama SDK sections to the SDK guide

Two new provider sections added to SDK Usage:

Gemini models via OpenAI SDK — shows that Google’s Gemini family (including gemini-2.5-flash, gemini-2.5-pro, and Gemini 3.x) is accessible with the standard OpenAI SDK by pointing baseURL at https://api.bve.me/v1. Includes chat completions, streaming, and a note explaining the transparent NDJSON→SSE streaming adaptation that BVE Gateway performs for Gemini models.

Mistral / Llama models via OpenAI SDK — shows that Mistral (mistral-large-24.02, mixtral-8x7b-32768) and Groq-hosted Llama models (llama-4-maverick-17b-128e, llama-3.3-70b-versatile, etc.) are also accessible with the standard OpenAI SDK. Includes a provider/model table and a note explaining that Groq-hosted models are interchangeable between the OpenAI and Groq SDKs.

Both sections follow the same pattern as the existing Cohere section — demonstrating BVE Gateway’s multi-provider support through a single unified SDK interface.

2026-05-26 (improvement loop, iteration 407)

refactor(dashboard): migrate all remaining `useSuspenseQuery` to `useQuery` with placeholderData

All five dashboard pages that still used useSuspenseQuery for their primary data and role queries have been migrated to useQuery with placeholderData: (prev) => prev.

Effect: Data re-fetches triggered by mutations (e.g., blocking a model, creating/deleting a user, or toggling model allowlist entries) no longer cause a full-page <PageLoader /> flash. The previous data remains visible while the background re-fetch completes, matching the pattern used by Usage, Overview, and Settings pages since iteration 392.

Changes per page:

transactions-page.tsx — dead code removal: me was loaded via useSuspenseQuery(meQueryOptions()) but never referenced anywhere in the component. Removed the query and its two now-unused imports (useSuspenseQuery, meQueryOptions).
api-keys-page.tsx — migrated me; role check uses me?.user?.role ?? 'viewer' as a safe write-disabled default while data loads.
models-page.tsx — migrated both modelsQueryOptions and meQueryOptions; introduced allModels = data?.models ?? [] to guard all downstream array operations.
users-page.tsx — migrated both adminUsersQueryOptions and meQueryOptions; introduced allUsers = data?.users ?? [] for JSX counts and filtering; isSelf guard uses me?.user?.email ?? ''.
settings-page.tsx — migrated meQueryOptions and settingsQueryOptions; all me?.user?.* and settings?. uses updated with optional-chain null guards and sensible fallbacks ('—' for display strings, 0 for numeric caps).

test(models): add `getBveEndpoints` coverage for 6 model families

New unit tests in test/openai-compat.test.ts for edge cases in the model endpoint registry:

cursor-c-* and c-* short-form Claude aliases → [/v1/chat/completions, /v1/completions, /v1/messages]
Mistral chat models → [/v1/chat/completions, /v1/completions] (no Responses or Messages API)
Gemini 3.x chat models → [/v1/chat/completions, /v1/completions, /v1/messages]
GPT-5 family → [/v1/chat/completions, /v1/completions, /v1/responses]
gemini-3.1-flash-image → [/v1/images/generations, /v1/images/edits]

2026-05-26 (improvement loop, iteration 404)

fix(dashboard): notification bell deep-link — auto-open quota dialog for linked key

Symptom: The notification bell links to /api-keys?keyId=<uuid> when an API key is expiring or expired. Clicking the notification navigated to the API Keys page but nothing highlighted the relevant key — the keyId query param was silently discarded.

Root cause: apiKeysRoute in router.tsx had no validateSearch handler. TanStack Router drops search params on routes that don’t declare them, so keyId never reached the component.

Fix:

router.tsx: Added validateSearch to apiKeysRoute that parses keyId?: string (matches the existing transactionsRoute pattern). Route is now exported.
api-keys-page.tsx: Reads deepLinkedKeyId from apiKeysRoute.useSearch() via a one-shot useEffect (guarded by a useRef flag). On first trigger it sets selectedKeyId and opens the Real-time Quota Inspector dialog for that key, then navigates to /api-keys without the param so reloading does not re-open the dialog.
notification-bell.tsx: Removed the as Record<string, string> cast — no longer needed since validateSearch gives the route typed search params.

Result: Clicking an expiry notification now immediately opens the quota dialog showing real-time RPM/RPD/monthly usage for the affected key, without requiring the user to scroll through pages of keys to find it.

refactor(crypto): consolidate hex encoding — remove duplicate `bytesToHex` from passwords.ts

Iteration 403 added bufToHex in src/services/crypto.ts with a pre-computed 256-entry HEX_TABLE for O(1)-per-byte hex encoding (replaces two array accesses + bit-shift per byte). passwords.ts had its own unoptimized duplicate (HEX = '0123456789abcdef' + bytesToHex using for...of with byte >> 4 / byte & 0x0f).

crypto.ts: Changed bufToHex parameter type from ArrayBuffer to Uint8Array (more ergonomic — callers no longer need new Uint8Array(digest) wrapper for subtle.digest results).
passwords.ts: Deleted HEX constant and bytesToHex function; imported bufToHex from crypto.ts. Both generateRandomToken and sha256Hex now use the optimized table lookup.

test: quota reset window=minute unblock integration test

New it() inside describe('Admin API: quota reset'):

Creates a key with rpm_limit: 1.
Fires two requests — confirms the second is 429.
Calls POST /admin/api-keys/:id/reset-quota with { window: "minute" }.
Confirms a third request is no longer 429.

This is the first end-to-end test that verifies the minute-window DO counter reset actually unblocks real traffic (previous tests only checked the HTTP response shape).

test(openai-compat): tighten reasoning_effort enum assertion

Changed from expect.arrayContaining(['low', 'medium', 'high', 'auto']) to expect(reasoningEnum?.slice().sort()).toEqual(['auto', 'high', 'low', 'medium']). The new assertion is still order-independent, and it also catches extra undocumented enum values accidentally added to the spec.

2026-05-26 (improvement loop, iteration 403)

perf: Worker CPU optimizations — hex lookup table + OpenAPI spec JSON cache

bufToHex — full-byte lookup table (hot auth path):

hashApiKey is called on every authenticated request to hash the raw key for cache lookup. Internally, bufToHex converted each SHA-256 output byte using two single-char string lookups (HEX[b >> 4] and HEX[b & 0xf]) plus a bit-shift, resulting in 64 array accesses and 32 bit-shift operations per call.

Replaced with a pre-computed HEX_TABLE (256 entries, one per possible byte value), built once at module init. Each byte now requires a single indexed read. Halves array accesses per hashApiKey call on the auth hot path.
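
A lookup table of this kind can be sketched as follows. The names match those in the entry, but the body is an illustration written from the description and not copied from src/services/crypto.ts.

```typescript
// 256 two-character hex strings, one per possible byte value,
// built once when the module is loaded.
const HEX_TABLE: string[] = [];
for (let i = 0; i < 256; i++) {
  HEX_TABLE.push((i < 16 ? "0" : "") + i.toString(16));
}

// One table read per byte instead of two nibble lookups plus a shift.
function bufToHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += HEX_TABLE[bytes[i]];
  }
  return out;
}
```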

OpenAPI spec JSON caching:

GET /openapi.json previously constructed a large static spec object (~200 nested JS objects) and called JSON.stringify on every request. The spec never changes at runtime.

Added module-scope _specJson cache using the same lazy-init pattern as _fuelixBaseUrl in fuelix.ts. First request builds and serializes the spec; all subsequent requests in the same isolate return the pre-serialized string with zero object allocation. Response returned via new Response(_specJson, ...) — CORS and security header middleware continue to apply normally.
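
The lazy-init pattern can be sketched as below. The buildSpec body is a placeholder and buildCount exists only to show that the build runs once; neither reflects the gateway's real spec.

```typescript
// Module-scope cache: undefined until the first request in this isolate.
let _specJson: string | undefined;

// Demonstration-only counter; not part of the pattern itself.
let buildCount = 0;

// Placeholder for the function that assembles the full spec object.
function buildSpec(): object {
  buildCount++;
  return { openapi: "3.1.0", info: { title: "example", version: "0.0.0" }, paths: {} };
}

// Builds and serializes on first use, then returns the cached string.
function getSpecJson(): string {
  if (_specJson === undefined) {
    _specJson = JSON.stringify(buildSpec());
  }
  return _specJson;
}
```

A route handler can then pass getSpecJson() straight to new Response(...) with a JSON content type, so repeat requests skip both object construction and serialization.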

OverviewPage — useSuspenseQuery → useQuery migration:

The admin dashboard OverviewPage used useSuspenseQuery for its primary data fetches, which blocked rendering until data arrived. Migrated to useQuery with placeholderData: (prev) => prev (same pattern applied to SettingsPage in iteration 401). Added an OverviewSkeleton loading state with matching layout pulse animations so the page renders immediately on navigation. Guards added for !data || !me to prevent render before first fetch completes.

2026-05-26 (improvement loop, iteration 402)

docs: fix `reasoning_effort` accuracy — add `"auto"` and document `modalities`/`audio`

reasoning_effort — "auto" was missing from docs (4 places):

The gateway accepts "auto" as a fourth valid reasoning_effort value alongside "low", "medium", and "high". This has been validated in code and unit-tested since it was added, but was never documented. The docs reflected only three values, causing confusion for clients that probed the actual allowed set and found four.

Fixed in api-reference/chat-completions.mdx:

Parameter table description: added "auto" to the valid values list
Gateway validation table: corrected the invalid-condition wording to include "auto"
Reasoning models value table: added "auto" row (“Let the model pick the most appropriate reasoning level automatically”)
Example 400 error message: fixed from "reasoning_effort must be one of: high, low, medium" to "reasoning_effort must be one of: low, medium, high, auto" (matches the actual gateway error message)

modalities and audio — parameters validated but undocumented:

Both parameters are fully validated by the gateway (with unit tests covering 10+ cases each) but were entirely absent from the chat completions API reference. Clients who encountered gateway 400 errors for malformed modalities or audio fields had no documentation to explain the expected shape.

Added:

Parameter table: modalities and audio rows with full type and constraint descriptions
Gateway validation table: 10 new rows covering all validated conditions for both parameters
New “Audio output” section with request example, modalities values table, audio fields table, and a note on the format difference between audio.format and /v1/audio/speech format (pcm16/pcm24 vs pcm)

The OpenAPI spec at GET /openapi.json was already correct (already listed "auto" in the reasoning_effort enum and had modalities/audio schema entries).

2026-05-26 (improvement loop, iteration 401)

feat(dashboard): per-key 24-hour hourly activity chart in quota dialog

New: The API Key quota dialog now shows a 24-Hour Hourly Activity chart for the selected key, displayed as a stacked bar chart (teal = success, rose = errors). This gives operators an immediate “is this key active right now?” view alongside the existing 7-day daily history and performance analytics panels.

Backend — GET /api/hourly-usage gains ?key_id= filter:

The hourly-usage endpoint now accepts an optional key_id query parameter that scopes the aggregate query to a single API key. The getHourlyRequestBreakdown D1 query adds AND key_id = ? when the parameter is present, leveraging the existing request_logs_key_created_idx composite index on (key_id, created_at). Validation rejects empty strings and values exceeding 36 characters with a 400 validation_error. Viewer-readable (same auth level as the existing endpoint).
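The validation rule described above can be sketched as follows. This is a minimal illustration, not the route's real code: parseKeyIdParam and the result type are hypothetical names, and only the empty-string and 36-character limits and the 400 validation_error outcome come from this entry.

```typescript
type KeyIdResult =
  | { ok: true; keyId: string | undefined }
  | { ok: false; status: 400; code: "validation_error" };

// raw is the value of ?key_id=, or null when the parameter is absent.
function parseKeyIdParam(raw: string | null): KeyIdResult {
  if (raw === null) {
    // No filter requested: the aggregate query covers all keys.
    return { ok: true, keyId: undefined };
  }
  if (raw.length === 0 || raw.length > 36) {
    return { ok: false, status: 400, code: "validation_error" };
  }
  return { ok: true, keyId: raw };
}
```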

Frontend:

hourlyUsageQueryOptions() accepts an optional keyId parameter and encodes it as ?key_id= in the request URL; the query key includes keyId so per-key caches are isolated
A new hourlyQuery fires alongside quotaQuery and statsQuery when the quota dialog opens, scoped to the selected key
The stacked bar chart uses the same teal/rose color scheme as the overview page hourly chart; empty state shows “No sampled requests in the last 24 hours”

13 new integration tests:

?key_id= validation: empty string → 400, >36 chars → 400
?key_id= semantics: no matching logs → empty array, filters to only specified key’s rows (other keys excluded), error count accuracy per key, viewer access

fix(dashboard): settings-page `useSuspenseQuery` → `useQuery` migration

The overview and adminUsers queries in SettingsPage were called with useSuspenseQuery, so the page blocked on a fresh <Suspense> load whenever those queries weren’t already cached. Both now use useQuery with placeholderData: prev => prev plus null guards (overview?.health.database, adminUsers?.users ?? []), so the page renders immediately with skeleton states while data loads. The me and settings queries remain on useSuspenseQuery: me is always cached from the layout, and settings loads quickly.

fix(cors): add `X-Groq-Processing-Time` and `X-Cohere-Request-Id` to CORS `exposeHeaders`

Gap closed: Both headers were added to SAFE_UPSTREAM_HEADERS in iteration 399 and are correctly forwarded to downstream clients, but were missing from exposeHeaders in cors.ts. Without this, browser JavaScript (e.g. the Groq SDK or Cohere SDK in a SPA) could not read these headers from the Response object, because browsers hide any cross-origin response header that is neither CORS-safelisted nor listed in Access-Control-Expose-Headers.

X-Groq-Processing-Time: Groq server-side inference duration in decimal seconds. Browser Groq SDK clients use this to distinguish network latency from inference latency.
X-Cohere-Request-Id: Cohere request correlation ID. Browser Cohere SDK clients need this for support ticket correlation.

Added 1 regression test (openai-compat.test.ts) verifying both headers appear in Access-Control-Expose-Headers. Added 2 smoke test checks (scripts/smoke-test.sh) for post-deploy validation.

2026-05-26 (improvement loop, iteration 400)

fix(proxy): strip `api-key` header from upstream requests (Azure OpenAI credential exposure)

Security hardening: proxyToFuelix already stripped x-api-key (Anthropic SDK auth header) and x-goog-api-key (Google Gemini) from upstream requests. The Azure OpenAI SDK authenticates using a different header — api-key (no x- prefix). Without this fix, a client that passed api-key: <azure-key> alongside their BVE Authorization: Bearer sk-bve-xxx header would have their Azure API key forwarded verbatim to Fuelix, exposing it to the upstream provider.

The fix adds upstreamHeaders.delete('api-key') to the credential-stripping block in proxyToFuelix, closing the last auth-header leak path for the Azure OpenAI SDK credential format.
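The credential-stripping step might look roughly like this. The sketch uses a plain Map with lowercased names in place of the Worker's Headers object, and the function name buildUpstreamHeaders and the Bearer format of the injected gateway credential are assumptions for illustration.

```typescript
// Client-supplied credential headers that must never reach the upstream.
const CLIENT_CREDENTIAL_HEADERS = ["x-api-key", "x-goog-api-key", "api-key"];

// Hypothetical helper: copies incoming headers, strips client credentials,
// then injects the gateway's own upstream credential.
function buildUpstreamHeaders(
  incoming: Record<string, string>,
  gatewayKey: string
): Map<string, string> {
  const upstream = new Map<string, string>();
  for (const name of Object.keys(incoming)) {
    // HTTP header names are case-insensitive, so normalise before comparing.
    upstream.set(name.toLowerCase(), incoming[name]);
  }
  for (const name of CLIENT_CREDENTIAL_HEADERS) {
    upstream.delete(name);
  }
  // Assumption: the upstream credential is sent as a Bearer token.
  upstream.set("authorization", "Bearer " + gatewayKey);
  return upstream;
}
```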

2 new regression tests in test/security.test.ts:

api-key is stripped from the upstream request — Azure credential never reaches Fuelix
Belt-and-suspenders: api-key is stripped while the gateway’s FUELIX_API_KEY Authorization header is still injected correctly

2026-05-26 (improvement loop, iteration 399)

feat(proxy): forward Groq inference time and Cohere request-ID headers

Groq compatibility — x-groq-processing-time: Groq returns server-side inference duration as a float header (e.g. "0.4823" seconds). The gateway now forwards this header so SDK clients can separate network round-trip time from inference latency. Values are validated (non-numeric and negative values are dropped) and capped at 300 seconds to prevent unbounded values from upstream misconfiguration.

Cohere compatibility — x-cohere-request-id: Cohere includes a request correlation header for support ticket workflows, analogous to x-groq-request-id (Groq) and request-id (Anthropic). The gateway now forwards it so Cohere SDK clients can correlate gateway logs with Cohere’s internal records.

Both headers were previously stripped by the SAFE_UPSTREAM_HEADERS allowlist in filterResponseHeaders, making them invisible to SDK clients. Adding them to the allowlist closes the gap without relaxing any security boundary — all headers not in the allowlist continue to be filtered.

5 new tests in test/security.test.ts: x-groq-processing-time passthrough, 300 s cap enforcement, non-numeric drop, negative value drop, and x-cohere-request-id passthrough.
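The x-groq-processing-time validation can be sketched as below. The helper name is hypothetical, and the sketch assumes that "capped" means values above 300 are clamped to 300 rather than dropped; the entry does not spell that out.

```typescript
const MAX_PROCESSING_SECONDS = 300;

// Hypothetical helper: returns the header value to forward, or null when the
// upstream value should be dropped.
function sanitizeProcessingTime(raw: string): string | null {
  const trimmed = raw.trim();
  if (trimmed === "") return null; // Number("") would otherwise yield 0
  const seconds = Number(trimmed);
  if (!Number.isFinite(seconds) || seconds < 0) return null; // non-numeric or negative
  // Assumption: oversized values are clamped to the cap, not dropped.
  return String(Math.min(seconds, MAX_PROCESSING_SECONDS));
}
```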

2026-05-26 (improvement loop, iteration 398)

fix(proxy): redact credentials in non-JSON 4xx upstream responses

Security hardening: When Fuelix returned a 4xx response with a non-JSON Content-Type (e.g. text/plain, text/html, or missing/empty content-type), proxyToFuelix previously passed the body through to the API consumer without applying redactSecrets. Provider key material can appear in plain-text or HTML error bodies from upstream auth layers or CDN error pages.

The fix adds an else branch in the 4xx handling block that buffers the non-JSON response body, applies redactSecrets, and returns it with the original status code. Since 4xx responses are never streamed (SSE only applies to 2xx success paths), buffering is always safe.

This closes the last remaining credential-leak path in the 4xx pipeline — all 4xx branches (JSON object, JSON array/null/primitive, invalid JSON, and now non-JSON) now apply redactSecrets before returning.

4 new regression tests in test/security.test.ts cover: text/plain, text/html, missing content-type, and safe non-credential body passthrough.

2026-05-26 (improvement loop, iteration 394)

feat(dashboard): Model detail inspector panel

Clicking any row in the Model Catalog & Controls table now opens a slide-in detail panel on the right side of the screen, giving operators a focused view of a model’s configuration and live usage data without leaving the page.

Panel contents:

Header: model ID (monospace), provider label, category badge, enabled/blocked status indicator
Capabilities section: all capability chips (chat, streaming, embeddings, audio.speech, etc.)
BVE Endpoints section: full /v1/... endpoint paths this model accepts, shown as indigo mono badges
Usage Stats section (respects the selected 7d / 30d / 90d period from the table header):
- Request count and total tokens
- Average latency, p50 (indigo), p95 (rose)
- Error rate %, 4xx count, 5xx count
- Prompt tokens vs. completion tokens split
Policy status card: green “Globally Active” or rose “Globally Blocked” with an explanation
Actions: “View Transactions for this model” deep-link (closes panel, navigates to /transactions?model=<id>); Block / Enable button (owner/admin only) with spinner while the mutation is in-flight

Interaction details:

Click a row to open the panel; click the same row again or press Escape to close it
The selected row gets a teal left-border highlight and subtle background tint
The Block/Enable button in the row’s Action column still works independently (click propagation stopped) — the panel reflects the result via optimistic update + query invalidation
Stats shown in the panel come from the same modelStatsQueryOptions query already loaded by the table — no extra network request

2026-05-26 (improvement loop, iteration 397)

fix(proxy): preserve body when 4xx upstream response has non-parseable JSON body

Bug fixed: When Fuelix returned a 4xx response with Content-Type: application/json but a body that could not be parsed as a plain JSON object — invalid/truncated JSON, or valid JSON that resolves to null, an array, or a primitive — proxyToFuelix consumed the body via response.text() but then fell through to new Response(response.body, ...). Since response.body is null after response.text(), the client received a 4xx with an empty body instead of the original error payload.

The fix adds a catch-all return at the end of the application/json branch: if JSON.parse fails or produces a non-object value, the consumed body text is returned with redactSecrets applied (consistent with all other 4xx paths). This prevents empty-body 4xx responses and also closes the secondary gap where provider key material embedded in invalid/array JSON upstream responses would bypass redactSecrets.
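A simplified sketch of the catch-all follows. The redactSecrets stand-in and its sk- pattern are hypothetical (the real redaction rules are not shown in this changelog), and the plain-object path is reduced to a re-serialisation; the point is only that every branch returns a non-empty, redacted body.

```typescript
// Hypothetical stand-in for the gateway's redactSecrets.
function redactSecrets(text: string): string {
  return text.replace(/sk-[A-Za-z0-9_-]{8,}/g, "[REDACTED]");
}

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// bodyText is the already-consumed body of a 4xx application/json response.
function bodyFor4xxJson(bodyText: string): string {
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (isPlainObject(parsed)) {
      // Normal path (simplified): a plain JSON error object.
      return redactSecrets(JSON.stringify(parsed));
    }
  } catch {
    // Invalid or truncated JSON: fall through to the catch-all.
  }
  // Catch-all: null, arrays, primitives, and unparseable text keep their
  // original bytes (after redaction) instead of becoming an empty body.
  return redactSecrets(bodyText);
}
```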

4 new regression tests in test/security.test.ts cover: invalid JSON (truncated), JSON null, JSON array, and credential redaction in invalid-JSON bodies.

2026-05-26 (improvement loop, iteration 393)

feat(dashboard): Model picker for API key allowed-models + streaming completions test

Model picker for API key allowed-models — The “Allowed Models” field in both the Create API Key dialog and the Edit Key form now shows a searchable, checkbox-driven model picker instead of a plain comma-separated text input. Operators can browse and select models from the live model catalog, see selected models as removable chips, use the search box to filter by ID/category/provider, and use “Select shown” / “Deselect shown” for bulk selection. This eliminates model-ID typos and makes per-key model restrictions much easier to configure correctly.

Streaming /v1/completions integration test — Added a full HTTP-path integration test (via worker.fetch) for the streaming completions emulation path (stream=true). Mocks the upstream SSE stream and verifies that the gateway correctly transforms it to text_completion SSE chunks with the cmpl- ID prefix, correct text content, logprobs: null, and data: [DONE] terminator.
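To make the asserted chunk shape concrete, here is a minimal sketch of a per-chunk transform. The interfaces are trimmed to the fields mentioned above, and the way the cmpl- ID is derived (swapping a chatcmpl- prefix) is an assumption, not the gateway's confirmed behavior.

```typescript
interface ChatChunk {
  id: string;
  created: number;
  model: string;
  choices: { index: number; delta: { content?: string }; finish_reason: string | null }[];
}

interface TextCompletionChunk {
  id: string;
  object: "text_completion";
  created: number;
  model: string;
  choices: { index: number; text: string; logprobs: null; finish_reason: string | null }[];
}

// Hypothetical transform from one upstream chat chunk to one emulated chunk.
function toTextCompletionChunk(chunk: ChatChunk): TextCompletionChunk {
  return {
    id: "cmpl-" + chunk.id.replace(/^chatcmpl-/, ""), // assumed ID derivation
    object: "text_completion",
    created: chunk.created,
    model: chunk.model,
    choices: chunk.choices.map((c) => ({
      index: c.index,
      text: c.delta.content ?? "",
      logprobs: null,
      finish_reason: c.finish_reason,
    })),
  };
}

// SSE framing: one data line per chunk, then the terminator.
function toSseLine(chunk: TextCompletionChunk): string {
  return "data: " + JSON.stringify(chunk) + "\n\n";
}
const SSE_DONE = "data: [DONE]\n\n";
```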

Logger: requestSize field — The structured request log now includes requestSize (integer bytes from Content-Length) when the client includes this header. Helps operators correlate large request payloads with latency or body-limit rejections in wrangler tail / Logpush.

2026-05-26 (improvement loop, iteration 395)

obs(logger): add `requestSize` field + workerId/workerTag test coverage

New log field: requestSize — The structured request log now includes requestSize (integer bytes) when the client sends a Content-Length header. Absent for requests without a Content-Length (streaming uploads, GET, OPTIONS). Lets operators correlate request sizes with latency spikes or body-limit rejections in wrangler tail / Logpush without needing to capture raw request bodies.

Closed test gap: workerId / workerTag — The logger has emitted workerId (version UUID) and workerTag (deploy tag from --tag) since the CF_VERSION_METADATA binding was introduced, but these code paths had zero test coverage. Added 6 new logger unit tests covering: workerId present when binding is active, workerTag present when tag is non-empty, workerTag absent when tag is empty string, workerTag absent when no tag property, both absent in local dev (no binding), and correct field order (ua → workerId → workerTag).

Added 4 new requestSize tests: present with Content-Length, absent without, Content-Length: 0 is valid (maps to requestSize: 0), large value matches exactly.
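The requestSize derivation can be sketched as below. The helper name is hypothetical, and treating a malformed Content-Length as "absent" is an assumption; the entry only covers the present, absent, zero, and large-value cases.

```typescript
// Hypothetical helper: returns the integer byte count, or undefined when the
// field should be omitted from the structured log.
function parseRequestSize(contentLength: string | null): number | undefined {
  if (contentLength === null) return undefined; // header not sent
  const trimmed = contentLength.trim();
  // Assumption: anything other than plain digits is ignored.
  if (!/^\d+$/.test(trimmed)) return undefined;
  return Number(trimmed); // "0" is valid and maps to 0
}
```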

17 new unit tests added to test/logger.test.ts. 3172/3172 total tests pass.

2026-05-26 (improvement loop, iteration 392)

feat(dashboard): Duplicate API Key

Operators can now clone an existing API key from the API Keys workspace without manually re-entering all configuration. The Duplicate Key action appears in each row’s action menu (owner/admin only).

What gets copied: name (with " (copy)" suffix), RPM limit, RPD limit, monthly request limit, monthly token limit, and allowed-models list.

What is NOT copied: expiry date (the clone is always a fresh, non-expiring credential until the operator sets one), usage counters, and quota state.

API: POST /api/api-keys/:id/clone — owner/admin + CSRF required. Returns HTTP 201 with the same { rawKey, apiKey } shape as POST /api/api-keys. Returns 404 if the source key does not exist.

The new raw key is shown immediately in the API Key Created dialog (same one-time-view flow as key creation and rotation).
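The copy rules can be summarised in a short sketch. The ApiKeyConfig shape and field names are hypothetical; only the list of copied and non-copied attributes comes from this entry.

```typescript
interface ApiKeyConfig {
  name: string;
  rpmLimit: number | null;
  rpdLimit: number | null;
  monthlyRequestLimit: number | null;
  monthlyTokenLimit: number | null;
  allowedModels: string[];
  expiresAt: number | null;
}

// Hypothetical helper: derives the clone's configuration from the source key.
function cloneKeyConfig(source: ApiKeyConfig): ApiKeyConfig {
  return {
    name: source.name + " (copy)",
    rpmLimit: source.rpmLimit,
    rpdLimit: source.rpdLimit,
    monthlyRequestLimit: source.monthlyRequestLimit,
    monthlyTokenLimit: source.monthlyTokenLimit,
    allowedModels: source.allowedModels.slice(), // copy, do not share the array
    expiresAt: null, // expiry is never copied; the clone starts non-expiring
  };
}
```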

2026-05-26 (improvement loop, iteration 391)

docs(models): add `bve_endpoints` to list response example; fix gpt-4o endpoints; add mapping table

Three accuracy and UX improvements to the Models reference page:

GET /v1/models list response example — The response JSON snippet previously showed only bve_category. It now also shows bve_endpoints, matching what the real API returns. Example:

{
  "id": "gpt-4o",
  "bve_category": "chat",
  "bve_endpoints": ["/v1/chat/completions", "/v1/completions", "/v1/responses"]
}

GET /v1/models/:id response example — The gpt-4o example showed bve_endpoints: ["/v1/chat/completions", "/v1/responses"]. Since iteration 390 added /v1/completions to all chat models via the completions emulation layer, the correct value is now ["/v1/chat/completions", "/v1/completions", "/v1/responses"].

New bve_endpoints mapping table — Replaces the old 6-row bve_category → endpoint table with a comprehensive table showing the exact bve_endpoints array per model family. Includes a note callout explaining the /v1/completions emulation for chat models.

2026-05-25 (improvement loop, iteration 382)

feat(dashboard): server-side search + pagination for API keys list

The API Keys page now fetches keys server-side with full search, status filter, and pagination support. Previously all keys were loaded at once (up to 100) and filtered client-side, which broke for large deployments.

Backend changes (GET /api/api-keys):

New query params: ?search= (LIKE match on key name), ?status=active|suspended|revoked|expired, ?limit= (default 50, max 200), ?offset=
Response now includes total: number and statusCounts: { active, suspended, revoked, expired } alongside keys[]
statusCounts always reflects the global count (no filter applied) so tab badges stay accurate regardless of active search
?status=expired is a special pseudo-status that queries WHERE status = 'active' AND expires_at <= unixepoch() — consistent with the existing overview count logic
?status=invalid returns a clean 400 validation error
DB layer: listApiKeys and countApiKeys both gain an expired?: boolean parameter
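The expired pseudo-status can be illustrated with a small WHERE-clause builder. This is a sketch under assumptions: the function name and return shape are hypothetical, and whether ?status=active additionally excludes expired keys is not stated here, so the sketch does not do so.

```typescript
const STATUS_FILTERS: readonly string[] = ["active", "suspended", "revoked", "expired"];

interface WhereClause {
  sql: string;
  params: string[];
}

// Hypothetical helper: maps a ?status= value to a SQL fragment plus bindings.
function statusWhere(status: string): WhereClause {
  if (STATUS_FILTERS.indexOf(status) === -1) {
    // The route would translate this into a 400 validation error.
    throw new Error("validation_error: invalid status");
  }
  if (status === "expired") {
    // Pseudo-status: there is no 'expired' value in the status column.
    return { sql: "status = 'active' AND expires_at <= unixepoch()", params: [] };
  }
  return { sql: "status = ?", params: [status] };
}
```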

Frontend changes:

Search input debounces 500ms before issuing a server request (no more useDeferredValue client-side filter)
Status tab badges use data.statusCounts from the server response (accurate even when paginating)
Pagination: Prev/Next buttons with “Showing X–Y of N” and “Page N/M” indicator appear when totalPages > 1 (page size: 50)
Migrated from useSuspenseQuery to useQuery so the page renders immediately with a loading skeleton instead of blocking
Client-side sort still operates on the current page (for quick column re-ordering without an extra round-trip)
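The pagination labels follow from total and offset alone, as in this sketch. The helper name and return shape are hypothetical; the page size of 50 and the label formats come from the list above.

```typescript
const PAGE_SIZE = 50;

// Hypothetical helper: computes the pager labels for the current page.
function paginationLabels(total: number, offset: number, pageSize: number = PAGE_SIZE) {
  const totalPages = Math.max(1, Math.ceil(total / pageSize));
  const page = Math.floor(offset / pageSize) + 1;
  const first = total === 0 ? 0 : offset + 1;
  const last = Math.min(offset + pageSize, total);
  return {
    showPager: totalPages > 1, // controls are hidden for a single page
    range: "Showing " + first + "–" + last + " of " + total,
    page: "Page " + page + "/" + totalPages,
  };
}
```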

11 new integration tests in test/admin-dashboard.test.ts:

401 without session
Response includes keys[], total, statusCounts with correct types
?search= LIKE filter — exact match and no-match empty
?status=active|suspended|expired filter accuracy
?status=invalid → 400
?limit=2&offset=0 pagination
statusCounts reflects all keys regardless of current ?search= filter
Viewer role can read the list

2026-05-25 (improvement loop, iteration 381)

feat(audit-logs): server-side CSV export for audit log entries

The Audit Log page now exports all matching entries server-side (up to 5,000 rows), matching the Transactions page behavior.

What changed:

Backend (new endpoint):

GET /api/audit-logs/export — accepts all the same filters as GET /api/audit-logs (?category=, ?search=, ?since=, ?until=, ?key_id=, ?action=). Returns a UTF-8 BOM CSV file with Content-Disposition: attachment for reliable browser download. Limit: 5,000 rows per export.
CSV columns: ID, Timestamp, Action, Actor Type, Target Type, Target ID, Actor Email, Actor IP, Metadata
Actor Email and Actor IP are extracted from the metadata JSON blob (always present in login/action events)
Invalid ?category= values return a clean 400
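A minimal sketch of the CSV serialisation is shown below. The column order comes from the list above; the quoting rules (RFC 4180 style) and CRLF line endings are assumptions, since the entry does not describe how cells are escaped.

```typescript
const UTF8_BOM = "\uFEFF"; // lets spreadsheet apps detect UTF-8
const COLUMNS = [
  "ID", "Timestamp", "Action", "Actor Type", "Target Type",
  "Target ID", "Actor Email", "Actor IP", "Metadata",
];

// Quote a cell only when it contains a quote, comma, or line break.
function csvCell(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

// rows: one string array per audit log entry, in COLUMNS order.
function toCsv(rows: string[][]): string {
  const lines = [COLUMNS].concat(rows).map((row) => row.map(csvCell).join(","));
  return UTF8_BOM + lines.join("\r\n") + "\r\n";
}
```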

Frontend:

The old client-side export (Blob + createObjectURL) only included the current page (≤ 100 rows) with 6 columns and no headers
Replaced with a fetch call to the new endpoint — exports all matching rows across all pages
Export button shows a Loader2 spinner and “Exporting…” label while in-flight; disabled to prevent double-click
Toast shows the exact row count on success (e.g. “Exported 1,234 audit log entries to bve-audit-logs-2026-05-25.csv”)

7 new integration tests in test/admin-dashboard.test.ts:

401 without session
200 with text/csv content-type
Header row + Content-Disposition: attachment shape
Data rows appear for seeded entries (incl. Actor Email and IP)
?category=keys filter selects only key events
Invalid ?category=invalid → 400
Viewer role can export
Security headers (x-content-type-options, cache-control)

2026-05-25 (improvement loop, iteration 378)

feat(dashboard): 24-hour hourly traffic chart + remove fake metrics

New: hourly traffic chart on Overview

The “Real-Time Traffic Flow” 7-day daily area chart on the Overview page has been replaced with a 24-hour stacked bar chart showing actual sampled request traffic per clock hour.

Backend:

GET /api/hourly-usage — new endpoint grouping request_logs_sampled by clock hour (SQL strftime('%Y-%m-%d %H:00:00', created_at / 1000, 'unixepoch')). Returns hours[] (hourTs, hourLabel in HH:MM format, requestCount, errorCount, successCount, avgLatencyMs) and windowHours. Default: 24 hours; max: 72 hours; configurable via ?hours=. 10 new integration tests.
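For readers who think in application code rather than SQL, the clock-hour grouping is roughly equivalent to the sketch below. It assumes created_at is in epoch milliseconds (consistent with the / 1000 in the strftime call), that labels are UTC, that hourTs is in milliseconds, and that an "error" is any status of 400 or above; the last three are assumptions.

```typescript
interface LogRow { createdAt: number; status: number; latencyMs: number }
interface HourBucket {
  hourTs: number; hourLabel: string; requestCount: number;
  errorCount: number; successCount: number; avgLatencyMs: number;
}

const HOUR_MS = 3600000;

// Hypothetical in-memory equivalent of the GROUP BY clock hour query.
function bucketByHour(rows: LogRow[]): HourBucket[] {
  const acc = new Map<number, { count: number; errors: number; latencySum: number }>();
  for (const row of rows) {
    const hourTs = Math.floor(row.createdAt / HOUR_MS) * HOUR_MS; // truncate to the hour
    const b = acc.get(hourTs) ?? { count: 0, errors: 0, latencySum: 0 };
    b.count += 1;
    if (row.status >= 400) b.errors += 1;
    b.latencySum += row.latencyMs;
    acc.set(hourTs, b);
  }
  return Array.from(acc.entries())
    .sort((x, y) => x[0] - y[0])
    .map(([hourTs, b]) => ({
      hourTs,
      hourLabel: new Date(hourTs).toISOString().slice(11, 16), // HH:MM in UTC
      requestCount: b.count,
      errorCount: b.errors,
      successCount: b.count - b.errors,
      avgLatencyMs: Math.round(b.latencySum / b.count),
    }));
}
```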

Frontend:

Stacked BarChart (Recharts): teal bars = successful requests, rose bars = errors; refreshes every 60 seconds via refetchInterval
Summary stats below chart: peak hour (by request count), overall error rate, total sampled in window
Non-suspense useQuery so the rest of the Overview page renders immediately while the chart data loads
Empty state when no sampled logs exist yet

Remove fake metrics from app shell and command menu:

App shell “Sync Online” dropdown no longer shows hardcoded fake metrics (“14 ms latency”, “0.02s D1 replication lag”, “Webhook Queue: online”) — replaced with real Worker/D1 status labels and a link to Diagnostics
Fake “Maintenance Mode” checkbox and its handleMaintenanceToggle toast removed
Command menu: “Trigger D1 Database Backup” (setTimeout pretending to write a backup) removed; “Simulate Maintenance Mode Toggle” removed; “Create New API Key” now navigates to /api-keys instead of showing a toast; “Go to Support Desk” renamed “Go to Diagnostics”

OpenAPI spec improvements for POST /v1/chat/completions:

Better min/max constraints and description fields for all params
Added: parallel_tool_calls, max_reasoning_tokens, service_tier, modalities, audio, top_k, min_p, repetition_penalty, top_a, thinking_config, deprecated functions/function_call

2026-05-25 (improvement loop, iteration 380)

feat(validation): `store` + `max_reasoning_tokens` params for chat completions

Two OpenAI parameters previously forwarded unvalidated to Fuelix now produce clean 400 invalid_request_error responses for malformed values instead of opaque 422s from Fuelix.

store (boolean)

Controls whether the response is persisted for OpenAI model distillation or evals
A non-boolean value (e.g. store: 1, store: "yes") now returns 400 invalid_type with param: 'store'

max_reasoning_tokens (non-negative integer)

Sets an upper bound on how many tokens an o-series model (o3, o3-mini, o4-mini) may spend on internal reasoning before emitting the visible response
0 disables extended thinking; any positive value caps it
A fractional, negative, or non-integer value now returns 400 invalid_value with param: 'max_reasoning_tokens'

17 new unit tests in test/validation.test.ts: 7 for store, 10 for max_reasoning_tokens.
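The two rules can be sketched together as follows. The function and type names are hypothetical, the placement of invalid_request_error in a type field follows the OpenAI error convention and is an assumption, and the sketch treats only undefined as "parameter absent".

```typescript
interface ValidationFailure {
  status: 400;
  type: "invalid_request_error";
  code: "invalid_type" | "invalid_value";
  param: string;
}

function fail(code: "invalid_type" | "invalid_value", param: string): ValidationFailure {
  return { status: 400, type: "invalid_request_error", code, param };
}

// Hypothetical validator: returns null when both parameters are acceptable.
function validateStoreAndReasoning(body: Record<string, unknown>): ValidationFailure | null {
  const store = body.store;
  const budget = body.max_reasoning_tokens;
  if (store !== undefined && typeof store !== "boolean") {
    return fail("invalid_type", "store");
  }
  const budgetOk = typeof budget === "number" && Number.isInteger(budget) && budget >= 0;
  if (budget !== undefined && !budgetOk) {
    return fail("invalid_value", "max_reasoning_tokens");
  }
  return null;
}
```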

2026-05-25 (improvement loop, iteration 377)

feat(security): Active Sessions management in dashboard settings

Operators can now view and revoke their active dashboard sessions from the Settings → Operator Profile tab.

What changed:

Backend (3 new API routes):

GET /api/sessions — lists all active (non-revoked, non-expired) sessions for the authenticated user, newest first. Each session includes: id, createdAt, expiresAt, lastSeenAt, ipAddress, userAgent, isCurrent
DELETE /api/sessions/:id — revokes a specific session by ID. Scoped to the current user — cannot revoke another user’s session. Returns 404 if not found or already revoked. Requires CSRF token
POST /api/sessions/revoke-others — revokes all active sessions except the current one. Returns { revoked: N }. Requires CSRF token

Sessions service (sessions.ts):

listUserActiveSessions() — D1 query for non-revoked, non-expired sessions by userId
revokeUserSession() — scoped single-session revoke (userId guard prevents cross-user access)
revokeAllOtherSessions() — bulk revoke excluding current session ID

Frontend (Settings page):

New “Active Sessions” card in the Profile tab, full-width below Change Password
Displays each session with browser icon (Monitor/Smartphone), parsed browser name and OS from User-Agent string, IP address, signed-in time, last active time, expires time
“Current” badge for the active session
Per-row “Revoke” button (disabled for the current session) with loading spinner and query invalidation on success
“Sign out other devices” button (shown only when >1 session exists) to bulk-revoke with count toast
refetchInterval: 60_000 auto-refresh

10 new integration tests across 3 describe blocks in test/admin-dashboard.test.ts:

GET /api/sessions: 401 without session, correct shape, isCurrent flag, viewer access
DELETE /api/sessions/:id: 401 without session, 404 for non-existent, cross-session revoke + verification, cross-user rejection
POST /api/sessions/revoke-others: bulk revoke count, 401 without session

2026-05-25 (improvement loop, iteration 376)

fix(smoke): correct model-stats field names + add per-key stats smoke coverage

Two production verification gaps fixed in scripts/smoke-test.sh:

Model-stats field name correction

The smoke test was checking for camelCase "p50LatencyMs" / "p95LatencyMs" but GET /admin/model-stats returns snake_case p50_latency_ms / p95_latency_ms. The camelCase variants exist only in the internal TypeScript return type; the JSON response has always used snake_case. Added checks for error_count, error_rate, client_error_count, and server_error_count which were added in iterations 361–362 but had no smoke test.

New: GET /admin/api-keys/:id/stats smoke coverage

This endpoint was added in iteration 360 and documented in iteration 373 but had zero production smoke test verification. New checks verify HTTP 200, all stats fields present (avg_latency_ms, p50_latency_ms, p95_latency_ms, max_latency_ms, total_tokens, error_count, client_error_count, server_error_count, error_rate), and HTTP 404 for unknown UUID.

2026-05-25 (improvement loop, iteration 372)

feat(dashboard): per-key model breakdown in quota dialog stats panel

When an operator opens the quota/stats dialog for an API key, the stats panel now shows the top 5 models that key has used during the selected time period (7d / 30d / 90d / All).

What changed:

DashboardApiKeyStats.topModels — array of { model, requestCount, totalTokens, errorRate } (empty when no logs)
getDashboardApiKeyStats now runs getModelUsageSummary(limit=5) in parallel with aggregateRequestLogStats — no extra latency
Frontend: compact “Top Models” section in the stats panel with proportional teal bars; hidden when empty
3 new integration tests: empty topModels, sort/accuracy, limit-to-5

2026-05-25 (improvement loop, iteration 373)

docs(admin-api): document missing fields and endpoints across 5 pages

Five documentation accuracy gaps fixed — all correspond to features added in iterations 320–362 that were never reflected in the public docs:

admin-api/api-keys.md — new GET /admin/api-keys/:id/stats section

The endpoint was added in iteration 360 and never documented. The new section covers:

Query parameters (?since=, ?until=)
Full response shape with key_id, total_requests, and stats object
Field table: avg/p50/p95/max_latency_ms, total_tokens, error_count, client_error_count, server_error_count, error_rate
Sampling caveat note
cURL examples for all-time and date-windowed queries
TypeScript alert example using error_rate
“Next steps” updated to include Model Stats

admin-api/stats.md — new keys.expired and keys.expiring_soon fields

Added in iteration 320 but absent from the docs. The response example and fields table now include:

keys.expired — active-status keys whose expires_at is in the past (return 401 api_key_expired)
keys.expiring_soon — active keys expiring within 7 days
keys.total description corrected to reflect the new 4-way sum (active + suspended + revoked + expired)
keys.active description updated: now only counts truly functional keys (non-expired)

admin-api/model-stats.md — new client_error_count, server_error_count, error_rate fields

Added in iterations 361–362 but absent from the docs. The response example and fields table now include:

client_error_count — HTTP 4xx count per model
server_error_count — HTTP 5xx count per model
error_rate — (error_count / request_count) × 100, rounded to 2 decimal places
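The error_rate formula above works out as in this sketch. Returning 0 when request_count is 0 is an assumption; the docs entry does not say how the empty case is reported.

```typescript
// (error_count / request_count) x 100, rounded to 2 decimal places.
function errorRate(errorCount: number, requestCount: number): number {
  if (requestCount === 0) return 0; // assumption: no traffic means 0, not NaN
  return Math.round((errorCount / requestCount) * 100 * 100) / 100;
}
```

For example, 2 errors out of 5 requests gives 40, and 1 out of 3 gives 33.33.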

admin-api/request-logs.md — new ?request_id= server-side filter

Added in iteration 341 but missing from the query parameter table. Added as a new row. The “Look up a request” cURL example was upgraded from a client-side jq filter to a direct ?request_id=<uuid> server-side query — much faster for high-volume deployments.

getting-started/introduction.md — endpoint table now includes /admin/api-keys/:id/stats

The Admin API endpoint table was missing this route entirely. Added as a new row with a description linking to the api-keys reference.

2026-05-25 (improvement loop, iteration 371)

feat(dashboard): usage analytics columns in the model catalog

The Model Catalog & Controls page now shows live usage data alongside the existing policy management controls, turning it into a full model performance dashboard.

New columns — Req (7d/30d/90d), Avg Lat., and Err% added to the table. Each row shows:

Requests — total sampled request count for the selected period (formatted as compact number, e.g. “1.2k”).
Avg Latency — end-to-end average latency, formatted as Xms or X.Xs.
Err% — error rate from 4xx + 5xx responses; color-coded amber ≥ 2%, rose ≥ 10%.
Models with no usage in the period show — in all three stats columns.

Stats period selector — A compact 7d / 30d / 90d tab bar in the table card header lets operators switch the stats window without navigating to the Usage page. A spinner shows while stats are loading in the background.

Fourth stat mini-card — “Active (Nd)” card alongside the existing three (Active Catalog, Globally Blocked, Reasoning Nodes) shows total request count and number of models with usage in the selected period.

Non-blocking — Model registry loads immediately via useSuspenseQuery; stats use useQuery (non-suspense) so the page renders instantly and stats populate progressively.

Backend — GET /api/model-stats limit cap raised from 100 → 200 to cover the full Fuelix catalog (103 models confirmed). Error message updated to match.

Tests: 5 new integration tests — empty array for future since, limit=200 accepted, since param filters to correct window, error rate computable from errorCount/requestCount (5 seeded logs: 3×200 + 422 + 503 → 40% error rate), viewer access.

2026-05-25 (improvement loop, iteration 368)

feat(dashboard): period-over-period trend indicators on the overview stat cards

The Monthly Requests and Monthly Tokens stat cards on the Overview page now display period-over-period trend badges showing how the current month compares to the same metric last month.

Trend badge — Appears inline with the card value. Positive deltas show a green +N.N% ↑ badge; negative deltas show a rose -N.N% ↓ badge; when no prior-period data exists (first billing cycle) the badge shows — no prior data.

Monthly Tokens detail — The “Aggregate consumption logged” detail line is replaced with vs X last month, giving operators a concrete prior-month baseline at a glance without navigating to the Usage page.

Server changes — getGatewayStats() now queries getGatewayMonthlyTotals for both the current and prior month in a single Promise.all. DashboardOverviewResponse gains a previousMonth field ({ period, totalRequests, totalTokens }). getDashboardOverview() maps it through.

StatCard component — Gains an optional trend: { deltaPct: number | null } prop. The new TrendBadgeEl helper renders the colored badge using TrendingUp/TrendingDown/Minus icons from lucide-react.

Tests: 3 new tests — previousMonth field types, period is prior calendar month, previousMonth.totalRequests reflects seeded monthly_usage rows.
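The badge value can be sketched as below. The percentage-change formula, the treatment of a zero prior month as "no prior data", and showing a zero delta with the upward style are all assumptions; the entry only describes the rendered badges. The label for the null case is abbreviated here.

```typescript
// Hypothetical helper: percentage change versus the prior month, or null
// when there is no usable baseline.
function deltaPct(current: number, previous: number): number | null {
  if (previous <= 0) return null;
  return ((current - previous) / previous) * 100;
}

// Hypothetical formatter for the badge text.
function trendLabel(delta: number | null): string {
  if (delta === null) return "no prior data";
  const sign = delta >= 0 ? "+" : "-";
  const arrow = delta >= 0 ? "↑" : "↓";
  return sign + Math.abs(delta).toFixed(1) + "% " + arrow;
}
```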

2026-05-25 (improvement loop, iteration 367)

feat(dashboard): 4xx/5xx breakdown, date range filter, and deep link in per-key stats panel

The Key Monitor quota dialog on the API Keys page now shows significantly more useful performance detail:

4xx/5xx error breakdown — The “Error Rate” cell is joined by two new cells: 4xx Client Errs (amber when non-zero) and 5xx Server Errs (rose when non-zero). These come from clientErrorCount/serverErrorCount that aggregateRequestLogStats already computed; the fields were promoted from DB query through the DashboardApiKeyStats contract and wired into the UI.

Date range selector — A compact tab bar (7d / 30d / 90d / All) appears in the panel header. Selecting a range re-queries /api/api-keys/:id/stats?since=<ISO> so latency and error stats reflect only the chosen window (default: 30 days). The “No logs” empty state echoes the active window (“No sampled request logs yet for this key in the last 30 days”).

“View all transactions →” deep link — A teal text button at the bottom of the stats section closes the dialog and navigates directly to /transactions?keyId=<id>, opening the Transactions page pre-filtered to that key.

perf(fuelix): SSE chunk pre-filter. makeSSEExtractor gains a modelCaptured state. Once the model string is captured from the first parseable SSE chunk, subsequent chunks that contain none of "usage", "type", or "event_type" are skipped without calling JSON.parse. For long OpenAI/Groq streams this reduces the number of JSON.parse calls from O(N) chunks to O(1 + usage-chunks).
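The pre-filter idea can be sketched as follows. This is a simplified stand-in, not the real makeSSEExtractor: the function name, the returned shape, and the parse counter (added only to make the saving observable) are hypothetical.

```typescript
interface Extracted { model?: string; usage?: unknown }

// Hypothetical extractor: feed it the data payload of each SSE event.
function makeSketchExtractor() {
  let modelCaptured = false;
  let parseCount = 0;
  const out: Extracted = {};
  return {
    push(data: string): void {
      // Cheap substring test: once the model is known, only chunks that may
      // carry usage or typed events are worth parsing.
      if (
        modelCaptured &&
        data.indexOf('"usage"') === -1 &&
        data.indexOf('"type"') === -1 &&
        data.indexOf('"event_type"') === -1
      ) {
        return;
      }
      parseCount += 1;
      let parsed: unknown;
      try {
        parsed = JSON.parse(data);
      } catch {
        return; // e.g. the [DONE] sentinel
      }
      if (typeof parsed !== "object" || parsed === null) return;
      const obj = parsed as Record<string, unknown>;
      if (!modelCaptured && typeof obj.model === "string") {
        out.model = obj.model;
        modelCaptured = true;
      }
      if (obj.usage !== undefined && obj.usage !== null) out.usage = obj.usage;
    },
    result: (): Extracted => out,
    parses: (): number => parseCount,
  };
}
```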

Tests: 2 new integration tests — clientErrorCount and serverErrorCount are accurate independently (4 seeded logs: 200, 400, 422, 502 → clientErrorCount=2, serverErrorCount=1) and since param filters out logs before the cutoff (old log 7 days ago + recent log, since=1 day ago → requestCount=1). Existing shape test updated to assert both new fields.

2026-05-25 (improvement loop, iteration 366)

docs(chat-completions): document `top_a` and `thinking_config` extension parameters

Two validated request parameters were missing from the Chat Completions API reference:

top_a — OpenRouter top-a sampling extension (range [0, 1]). A token is only sampled if its probability is ≥ top_a × P(max_token)². Added to the request body table and the gateway validation constraints table alongside the existing min_p entry. Gateway returns 400 invalid_type for non-numbers and 400 invalid_value for out-of-range values.
thinking_config — Gemini 2.5 thinking budget configuration ({ thinking_budget: N }). Allows callers to set or disable the model’s internal chain-of-thought token budget. Added to the request body table and validation constraints table. Gateway returns 400 invalid_type if the value is not an object, and 400 invalid_value if thinking_budget is negative, a float, or a non-number.
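The top_a rule stated above can be illustrated on a toy distribution. This sketch only shows the threshold arithmetic; it is not the provider's sampler, and topAFilter is a hypothetical name.

```typescript
// Keeps the probabilities that survive the top-a cut-off:
// p >= top_a * (max probability)^2.
function topAFilter(probs: number[], topA: number): number[] {
  if (probs.length === 0) return [];
  const pMax = probs.reduce((m, p) => (p > m ? p : m), probs[0]);
  const threshold = topA * pMax * pMax;
  return probs.filter((p) => p >= threshold);
}
```

With probabilities [0.5, 0.3, 0.15, 0.05] and top_a = 0.5, the threshold is 0.5 × 0.5² = 0.125, so the 0.05 token is excluded. With top_a = 0 every token remains eligible.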

A new Provider-specific extensions section was added to the page with cURL examples for both parameters and a table explaining the thinking_budget values (0 = disable, positive = budget in tokens).

Both parameters have been validated by the gateway since iteration 363b but were not reflected in the public documentation.
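The validation rules for the two parameters can be sketched as follows. The function names and the error object shape are hypothetical, and two behaviours are assumptions rather than documented facts: null and arrays are rejected as non-objects, and a missing thinking_budget is treated like a non-number.

```typescript
// Hypothetical sketch; not the gateway's validator.
type ValidationError = {
  status: 400;
  code: "invalid_type" | "invalid_value";
  param: string;
};

function validateTopA(value: unknown): ValidationError | null {
  // Non-numbers (and NaN, by assumption) are a type error.
  if (typeof value !== "number" || Number.isNaN(value)) {
    return { status: 400, code: "invalid_type", param: "top_a" };
  }
  // The accepted range is [0, 1].
  if (value < 0 || value > 1) {
    return { status: 400, code: "invalid_value", param: "top_a" };
  }
  return null;
}

function validateThinkingConfig(value: unknown): ValidationError | null {
  // Assumption: null and arrays count as "not an object".
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { status: 400, code: "invalid_type", param: "thinking_config" };
  }
  const budget = (value as { thinking_budget?: unknown }).thinking_budget;
  // Negative, fractional, and non-numeric budgets all map to invalid_value.
  if (typeof budget !== "number" || !Number.isInteger(budget) || budget < 0) {
    return { status: 400, code: "invalid_value", param: "thinking_config.thinking_budget" };
  }
  return null;
}
```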

2026-05-25 (improvement loop, iteration 356)

feat(api): `GET /v1/models/:id` annotated with `bve_category` and `bve_endpoints`

GET /v1/models/:id previously passed the Fuelix response through unmodified. It now annotates successful responses with the same bve_category and bve_endpoints fields that GET /v1/models (the list endpoint) injects into every model entry.

Before:

{ "id": "gpt-4o", "object": "model", "created": 1715367049, "owned_by": "openai" }

After:

{
  "id": "gpt-4o",
  "object": "model",
  "created": 1715367049,
  "owned_by": "openai",
  "bve_category": "chat",
  "bve_endpoints": ["/v1/chat/completions", "/v1/responses"]
}

This removes an inconsistency where clients needed to call GET /v1/models and find the target entry just to discover what endpoints a specific model supports. Now a direct GET /v1/models/:id call returns the complete picture.

D1-only models (registered via POST /admin/model-allowlist without a matching static registry entry) are returned without these fields when their capability type cannot be determined. Error responses (4xx/5xx from Fuelix) are passed through unchanged.

3 new integration tests added in test/openai-compat.test.ts:

gpt-4o → bve_category: 'chat', bve_endpoints contains /v1/chat/completions
text-embedding-3-large → bve_category: 'embedding', no /v1/chat/completions in endpoints
ml-detail-no-cat (test fixture) → no bve_category or bve_endpoints fields

Docs updated: docs/src/content/docs/api-reference/models.mdx now documents the GET /v1/models/:id response shape with a bve_category/bve_endpoints example. Three API reference pages (Files, Assistants, Vector Stores) also now include “Next steps” navigation sections.

2026-05-25 (improvement loop, iteration 348)

docs(authentication): complete CORS header tables — add `Anthropic-Beta`, `x-api-key`, and 6 missing Anthropic rate-limit headers

The Authentication page CORS section was missing entries that src/middleware/cors.ts already allows:

Allowed request headers (newly documented):

Anthropic-Beta — Anthropic beta feature flags (e.g. interleaved-thinking-2025-05-14)
x-api-key — Alternative key header accepted by the Anthropic SDK in browser contexts; the gateway maps it to Authorization: Bearer and strips it before forwarding to Fuelix

Exposed response headers (newly documented):

Anthropic-RateLimit-Input-Tokens-Limit/Remaining/Reset — Anthropic per-request input-token rate limits (lets browser clients distinguish prompt vs. output budget exhaustion)
Anthropic-RateLimit-Output-Tokens-Limit/Remaining/Reset — Anthropic per-request output-token rate limits

Previously only the combined Anthropic-RateLimit-Tokens-* and Anthropic-RateLimit-Requests-* groups were listed; the input/output-direction headers (added to the allowlist alongside the rest) were silently omitted from the docs.

2026-05-25 (improvement loop, iteration 345)

feat(models): `bve_endpoints` field in `GET /v1/models` response

Every model entry returned by GET /v1/models now includes a bve_endpoints string array that lists the BVE API paths the model can be called at. This makes it straightforward for multi-provider SDK authors and API consumers to route requests without needing to infer the correct endpoint from the bve_category field alone.

Example model object (Claude):

{
  "id": "claude-sonnet-4",
  "bve_category": "chat",
  "bve_endpoints": ["/v1/chat/completions", "/v1/messages"]
}

Endpoint mapping by provider:

| Provider | Model example | `bve_endpoints` |
| --- | --- | --- |
| OpenAI GPT | `gpt-4o`, `gpt-4.1` | `["/v1/chat/completions", "/v1/responses"]` |
| OpenAI O-series | `o3`, `o4-mini` | `["/v1/chat/completions", "/v1/responses"]` |
| Anthropic Claude | `claude-sonnet-4` | `["/v1/chat/completions", "/v1/messages"]` |
| Google Gemini | `gemini-2.5-pro`, `gemma-4-31b-it` | `["/v1/chat/completions", "/v1/messages"]` |
| Groq/Llama | `llama-4-maverick-17b-128e` | `["/v1/chat/completions"]` |
| Cohere | `command-r-plus` | `["/v1/chat/completions"]` |
| Mistral | `mistral-large-24.02` | `["/v1/chat/completions"]` |
| OpenAI Embeddings | `text-embedding-3-large` | `["/v1/embeddings"]` |
| Cohere Embeddings | `embed-english-v3.0` | `["/v1/embeddings"]` |
| OpenAI TTS | `tts-1`, `tts-1-hd` | `["/v1/audio/speech"]` |
| Whisper / STT | `whisper-1`, `gpt-4o-transcribe` | `["/v1/audio/transcriptions"]` |
| Image generation | `imagen-4`, `imagen-4-ultra` | `["/v1/images/generations", "/v1/images/edits"]` |
| OCR | `mistral-ocr` | `["/v1/chat/completions"]` |
| Legacy completions | `gpt-3.5-turbo-instruct` | `["/v1/completions"]` |

D1-only custom models (registered via POST /admin/model-allowlist) may omit bve_endpoints when their capability type cannot be determined from the static registry.

The GET /v1/models OpenAPI description is also updated to mention both bve_category and bve_endpoints.
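A client could use the field for routing roughly as sketched below. The ModelEntry type and the pickEndpoint helper are hypothetical names for this example; only the bve_category and bve_endpoints field names come from the entry above.

```typescript
// Hypothetical client-side helper; not part of any BVE SDK.
interface ModelEntry {
  id: string;
  bve_category?: string;
  bve_endpoints?: string[];
}

// Returns the first preferred path the model supports, or null when none
// matches or when the model carries no bve_endpoints annotation.
function pickEndpoint(model: ModelEntry, preferred: string[]): string | null {
  const endpoints = model.bve_endpoints;
  if (!endpoints) return null; // D1-only custom models may omit the field
  for (const path of preferred) {
    if (endpoints.includes(path)) return path;
  }
  return null;
}
```

Passing the preference list in priority order lets a caller prefer a provider-native path such as /v1/messages and fall back to /v1/chat/completions.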

2026-05-25 (improvement loop, iteration 344)

fix(routes): uniform `MAX_MODEL_ID_LENGTH` guard and `errorCode` logging for `POST /v1/audio/transcriptions`

The multipart handler for /v1/audio/transcriptions now enforces the same 200-character model ID cap that /v1/images/edits (added in iteration 343) and the JSON-body modelFilter middleware apply to all other endpoints.

Changes:

A 201+ character model ID in the model FormData field now returns 400 invalid_value immediately, before any D1 or Durable Object calls — identical to the guard on the images/edits multipart path.
All four model rejection branches (invalid_value, model_not_available, model_not_allowed, model_endpoint_mismatch) now set c.set('errorCode', ...) so the error code is captured in sampled request logs — matching the pattern used in images/edits.
2 integration tests added: oversized model ID (201 chars) → 400; boundary model ID (200 chars) → not rejected.

Before: A 201-character model field bypassed the length guard, reached isModelGloballyBlocked, and produced an opaque model_not_available response (because the oversized ID is not in the registry). Sampled logs also omitted the errorCode field for all transcription model rejections.

After: All model rejections at the transcription endpoint are logged with their error code and the length guard fires first for oversized IDs.
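The length guard amounts to a check like the one below. This is a sketch rather than the handler's code: the GuardResult type and checkModelIdLength name are hypothetical, and counting length in UTF-16 code units (JavaScript's string length) is an assumption.

```typescript
// Hypothetical sketch of the guard; the constant name matches the entry title.
const MAX_MODEL_ID_LENGTH = 200;

type GuardResult =
  | { ok: true }
  | { ok: false; status: 400; code: "invalid_value" };

// Runs before any registry, D1, or Durable Object lookup.
function checkModelIdLength(modelId: string): GuardResult {
  if (modelId.length > MAX_MODEL_ID_LENGTH) {
    return { ok: false, status: 400, code: "invalid_value" };
  }
  return { ok: true };
}
```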

2026-05-25 (improvement loop, iteration 343 — originally listed here, now moved below)

Dashboard — Webhook test button in Gateway Config settings

Operators can now verify their webhook configuration directly from the admin dashboard without modifying API keys or waiting for a real event.

New: “Send Test Event” button (Gateway Config → Webhook Integration)

The Webhook Integration card in Settings → Gateway Config now includes a Send Test Event button (visible to owners and admins):

Sends a signed gateway.test JSON payload to the configured WEBHOOK_URL.
If WEBHOOK_SECRET is configured, the payload is signed with X-BVE-Signature: sha256=<hmac> — identical to production webhook signing.
Follows the same HTTPS-only, no-redirect policy as production webhook delivery.
Shows inline result: delivery status (success/failure), HTTP status code, and round-trip latency in milliseconds.
Error messages (network failures, redirects, misconfigured URL) are displayed inline below the button.
The button is disabled when WEBHOOK_URL is not configured and hidden from viewer-role users.

New: POST /api/webhook-test

Requires active session + CSRF token.
Requires admin or owner role (viewers are blocked with 403).
Returns 400 webhook_not_configured when WEBHOOK_URL is absent.
Response shape: { success, status?, durationMs, error? }.
Enforces 10-second timeout via AbortSignal.timeout.
Payload: { gateway, event: "gateway.test", test: true, message, timestamp }.
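A receiver could check the X-BVE-Signature header along these lines. The entry only states the sha256=<hmac> format, so this sketch assumes HMAC-SHA256 computed over the raw request body and hex-encoded; the signBody and verifySignature names are hypothetical.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: HMAC-SHA256 over the raw body bytes, hex-encoded.
function signBody(secret: string, rawBody: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return `sha256=${digest}`;
}

// Verifies the header value against the body using a constant-time compare.
function verifySignature(secret: string, rawBody: string, header: string | null): boolean {
  if (!header) return false;
  const expected = Buffer.from(signBody(secret, rawBody), "utf8");
  const received = Buffer.from(header, "utf8");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (expected.length !== received.length) return false;
  return timingSafeEqual(expected, received);
}
```

Verification should run against the body exactly as received, before any JSON parsing or re-serialization changes its bytes.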

Also: Permissions matrix fix

The Team & Access tab permissions matrix now correctly shows “Create Admin Users” as owner-only (not CLI-only). The POST /api/admin-users endpoint (added in iteration 333) is owner-restricted; the matrix was outdated.

2026-05-25 (improvement loop, iteration 343)

fix(routes): model availability and allowlist checks for `POST /v1/images/edits` multipart requests

Multipart POST /v1/images/edits requests previously bypassed all model-level gateway checks. The modelFilter middleware only processes application/json bodies; multipart requests fall through to the route handler. The audio transcriptions handler (POST /v1/audio/transcriptions) already handled this correctly, but the image edits handler did not.

What was skipped for multipart image-edit requests:

Global model block — a model disabled via POST /admin/model-allowlist was not enforced.
Per-key allowlist — allowed_models restrictions on API keys were silently ignored.
Endpoint compatibility — sending a non-image model (e.g. whisper-1, gpt-4o) passed through without error.

Fix: After validating the multipart form-data structure, the route handler now extracts the model field and runs the same three-step check as /v1/audio/transcriptions:

If the model is globally blocked or not in the registry (and not explicitly allowed via D1), returns 403 model_not_available.
If the model is not in the API key’s allowed_models, returns 403 model_not_allowed.
If the model is not an image-generation model, returns 400 model_endpoint_mismatch.

The model is also set in context so it appears in sampled request logs and the X-BVE-Model response header.

5 integration tests added covering all three rejection paths, the allowlist-pass path, and the no-model (omitted field) path.
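The three-step check can be sketched as a pure function. The ModelFacts shape and the checkImageEditModel name are hypothetical. The first condition is read here as "blocked, or absent from the registry without an explicit D1 allow", and a null allowedModels is assumed to mean the key is unrestricted.

```typescript
// Hypothetical sketch of the check order; not the route handler's code.
type Rejection =
  | { status: 403; code: "model_not_available" }
  | { status: 403; code: "model_not_allowed" }
  | { status: 400; code: "model_endpoint_mismatch" };

interface ModelFacts {
  globallyBlocked: boolean;
  inRegistry: boolean;
  explicitlyAllowed: boolean; // allowed via a D1 row
  isImageModel: boolean;
}

function checkImageEditModel(
  model: string,
  facts: ModelFacts,
  allowedModels: string[] | null, // assumption: null means no per-key restriction
): Rejection | null {
  // Step 1: global availability.
  if (facts.globallyBlocked || (!facts.inRegistry && !facts.explicitlyAllowed)) {
    return { status: 403, code: "model_not_available" };
  }
  // Step 2: per-key allowlist.
  if (allowedModels !== null && !allowedModels.includes(model)) {
    return { status: 403, code: "model_not_allowed" };
  }
  // Step 3: endpoint compatibility.
  if (!facts.isImageModel) {
    return { status: 400, code: "model_endpoint_mismatch" };
  }
  return null;
}
```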

2026-05-25 (improvement loop, iteration 341)

Admin API — `?request_id=` filter on `GET /admin/request-logs`

GET /admin/request-logs now accepts a ?request_id= query parameter that performs an exact lookup by BVE request UUID — the same UUID returned to callers in the X-Request-Id response header and stored in the request_id field of each sampled log entry.

Use case: a caller includes their X-Request-Id when reporting a support issue. The operator can now run:

GET /admin/request-logs?request_id=<uuid>

and receive the exact sampled log entry for that request (status, model, latency, tokens, timestamp) without needing to know the key, model, or endpoint in advance.

The filter is exact-match (not a substring search) and may return zero entries when the request was not sampled (only ~1% of requests are written to request_logs_sampled) or when the UUID is not recognized. The total field in the response reflects the filtered count. The filter can be combined with other params (?key_id=, ?since=, etc.).

The new parameter is documented in GET /openapi.json.

2026-05-25 (improvement loop, iteration 340)

Dashboard — `?keyId=` deep-link for Transactions page; “View Transactions” action on API Keys page

Operators can now navigate directly from an API key to its sampled transaction log:

API Keys page → “View Transactions”: A new View Transactions entry in each key’s action dropdown navigates to the Transactions page pre-filtered by that key’s UUID (/transactions?keyId=<id>).
?keyId= URL param on /transactions: The Transactions route now accepts a keyId search param. When present, the “Filter by Key” dropdown is pre-selected and the server-side query is applied immediately on page load — no manual selection required.

This is symmetric to the existing ?model= deep-link (used by the Model Stats → Transactions navigation in the Usage page).

OpenAPI spec — model `enum` for `POST /v1/audio/transcriptions`

The model field in the /v1/audio/transcriptions multipart schema now carries an explicit enum listing all 9 supported STT models:

whisper-1, whisper-large-v3, whisper-large-v3-turbo, distil-whisper-large-v3-en
gpt-4o-transcribe, gpt-4o-transcribe-2025-03-20
gpt-4o-transcribe-diarize, gpt-4o-transcribe-diarize-2025-03-20
gpt-4o-mini-transcribe

Previously the field was typed as a plain string with a short description. The enum makes the contract machine-readable for SDK generators and Swagger UI.

2026-05-25 (improvement loop, iteration 339)

Dashboard — 3-color HTTP status badges in Transactions page

The Status column in the Gateway Request Transactions page now uses three distinct colors instead of two:

Emerald (green) — 2xx success responses
Amber (orange) — 4xx client-side errors (bad request, auth failure, rate limit, validation)
Rose (red) — 5xx gateway or upstream errors

Previously both 4xx and 5xx were shown in the same rose/red color, making it impossible to distinguish a client auth error from an upstream failure at a glance. The Log Inspector slide-over panel also applies the same 3-color treatment to the HTTP <status> badge.
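The mapping from status class to badge color is small enough to sketch directly. The statusBadgeColor name and the "neutral" fallback for statuses outside 2xx, 4xx, and 5xx are assumptions; the entry does not describe how other classes are rendered.

```typescript
// Hypothetical helper mirroring the three status classes described above.
type BadgeColor = "emerald" | "amber" | "rose" | "neutral";

function statusBadgeColor(status: number): BadgeColor {
  if (status >= 200 && status < 300) return "emerald"; // success
  if (status >= 400 && status < 500) return "amber"; // client-side error
  if (status >= 500 && status < 600) return "rose"; // gateway or upstream error
  return "neutral"; // assumption: other classes are not covered by this entry
}
```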

2026-05-25 (improvement loop, iteration 335)

Admin API — `GET /admin/request-logs` now supports `?status_range=` and `?search=`

Two additional filter parameters are now available on the sampled request log endpoint:

?status_range= — accepts 2xx, 4xx, or 5xx; returns only rows in that HTTP status class. Any other value returns 400 validation_error. Applies to both the row set and the total count so pagination arithmetic remains correct.
?search= — free-text case-insensitive LIKE match across endpoint, model, and key_name (truncated to 200 chars). Forwarded to both the row query and the aggregate count query.

Nine new integration tests verify each status class, invalid input rejection, and count consistency for both filters.

Dashboard — Transactions search now queries all logs (server-side)

The Transactions page search field previously filtered only the 50 rows loaded on the current page. It now sends the committed search term to the server as a ?search= query parameter, matching across all sampled log entries regardless of which page is loaded. Press Enter to commit the search, or click the clear × button to cancel it.

Docs — `?status_range=` and `?search=` documented in Request Logs reference

docs/admin-api/request-logs.md updated with the two new parameters in the query parameter table and cURL examples.

2026-05-25 (improvement loop, iteration 337)

Docs — Audio transcription model table: 3 missing models added + model-selection guide

docs/api-reference/audio.mdx transcription models table was updated to match the gateway’s TRANSCRIPTION_MODELS registry. Three models present in the registry were missing from the page:

gpt-4o-mini-transcribe — lighter, faster GPT-4o Mini-based transcription (added to models.mdx in iteration 336 but not to audio.mdx)
gpt-4o-transcribe-2025-03-20 — pinned March 2025 snapshot of gpt-4o-transcribe; use this when you need a stable, unchanging model version
gpt-4o-transcribe-diarize-2025-03-20 — pinned March 2025 snapshot of gpt-4o-transcribe-diarize; same diarization output but guaranteed not to auto-update

A model-selection tip callout is now included below the table, summarising when to prefer each model: whisper-1 for compatibility, gpt-4o-transcribe for accuracy, dated snapshot variants for stability, gpt-4o-mini-transcribe for high-volume or latency-sensitive workloads, and the Groq models for ultra-low latency inference.

2026-05-25 (improvement loop, iteration 336)

Docs — `gpt-4o-mini-transcribe` added to transcription model list; working model count corrected

gpt-4o-mini-transcribe was present in the gateway’s TRANSCRIPTION_MODELS registry and exposed via GET /v1/models with bve_category: "transcription", but was missing from the docs transcription model list in the Models reference page. The working model count on the same page was also wrong (listed as 108, actual is 109). Both are now corrected.

2026-05-25 (improvement loop, iterations 330–333)

Admin API — `GET /admin/audit-logs` now forwards `?action_category=` and `?search=`

Two query parameters supported by the underlying D1 query were never extracted or passed through by the route handler. Both are now fully wired:

?action_category= — accepts auth, keys, or models; validates the value (unknown category → 400 validation_error) and filters both the row set and the total count.
?search= — free-text LIKE match across action, target_id, and actor_type; forwarded to both the row query and count query so total reflects the filtered scope.

Eight new integration tests verify filtering, empty-result handling, 400 on an invalid category, and count consistency.

Model Stats API — `p50_latency_ms`, `p95_latency_ms`, `error_count` now included in response

GET /admin/model-stats silently dropped three fields that the SQL query already computed. The route handler’s .map() only serialized avg_latency_ms and max_latency_ms; p50_latency_ms, p95_latency_ms, and error_count are now included in every model row.

Proxy performance — merged duplicate header scan loops

proxyToFuelix ran two separate [...headers.keys()] loops: one to strip x-basicllm-* headers and one to strip sec-* headers. Both are now merged into a single loop with an || condition — one array allocation and one iteration pass per request instead of two.
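A single-pass version of the strip could look like the sketch below. It is not the proxyToFuelix implementation: the stripInternalHeaders name is hypothetical, and the sketch uses Headers.forEach over a copy rather than the spread-based key snapshot mentioned above.

```typescript
// Hypothetical sketch: one pass removes both header families.
function stripInternalHeaders(headers: Headers): Headers {
  const out = new Headers(headers);
  // Header names are exposed in lowercase by the Headers API.
  headers.forEach((_value, name) => {
    if (name.startsWith("x-basicllm-") || name.startsWith("sec-")) {
      out.delete(name);
    }
  });
  return out;
}
```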

Docs — Model Stats, Chat Completions, Changelog updated

admin-api/model-stats.md — Response JSON example and fields table updated with p50_latency_ms, p95_latency_ms, and error_count; jq snippets extended.
api-reference/chat-completions.mdx — Deprecated functions/function_call parameters documented with 13 new validation rows.
admin-api/audit-logs.md — ?action_category= and ?search= parameters added to the query parameter table and cURL examples section.

2026-05-25 (improvement loop, iterations 326–329)

Validation — Deprecated `functions`/`function_call` params validated in chat completions

Older OpenAI SDK clients (v0.x, early v1.x) and many third-party tools still send the deprecated functions array and function_call field. The gateway now validates these before forwarding to Fuelix, replacing opaque Pydantic 422 responses with structured 400 errors:

functions[N].name must match [a-zA-Z0-9_-]{1,64}; missing or empty → invalid_type/invalid_value
function_call string must be "none" or "auto" (not "required"); object form must have a non-empty name
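The two rules above can be sketched as follows. The function names are hypothetical, and the split between the two error codes is an assumption, since the entry only says "invalid_type/invalid_value": here a missing or non-string value is a type error, and an empty or pattern-violating string is a value error.

```typescript
// Hypothetical sketch; the exact code per failure mode is assumed.
type Code = "invalid_type" | "invalid_value";

const FUNCTION_NAME = /^[a-zA-Z0-9_-]{1,64}$/;

// Validates functions[N].name.
function validateFunctionName(name: unknown): Code | null {
  if (typeof name !== "string") return "invalid_type"; // missing or non-string
  if (!FUNCTION_NAME.test(name)) return "invalid_value"; // empty, too long, bad characters
  return null;
}

// Validates the deprecated function_call field (string or object form).
function validateFunctionCall(value: unknown): Code | null {
  if (typeof value === "string") {
    // "required" belongs to tool_choice and is not accepted here.
    return value === "none" || value === "auto" ? null : "invalid_value";
  }
  if (typeof value === "object" && value !== null && !Array.isArray(value)) {
    const name = (value as { name?: unknown }).name;
    if (typeof name !== "string") return "invalid_type";
    return name.length > 0 ? null : "invalid_value";
  }
  return "invalid_type";
}
```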

Dashboard — Sortable model stats table; p50/p95 latency percentiles in Transactions

Sortable model stats table — The Usage page Top Models table headers are now clickable for sort (Requests, Tokens, Avg latency, p50, p95, Err%). Click once to sort descending, again to flip.
Error Rate % column — Per-model error rate column added to the Usage page Top Models table (rose > 10%, amber > 2%, green/neutral at 0%).
p50/p95 in Transactions KPI — The Transactions page Avg Latency KPI card now shows a sub-line p50 Xms / p95 Xms sourced from a CTE window-function query across all matching rows (not just the 50-row page).

Model Stats API — `p50_latency_ms`, `p95_latency_ms`, `error_count` in GET /admin/model-stats

GET /admin/model-stats now returns three additional fields per model row:

p50_latency_ms — Median (50th percentile) end-to-end latency computed via ROW_NUMBER() OVER (PARTITION BY model ORDER BY latency_ms) window function. More representative than avg_latency_ms for skewed latency distributions.
p95_latency_ms — 95th percentile latency for SLO alerting and tail-latency analysis.
error_count — Number of sampled requests with HTTP status ≥ 400 for this model. Divide by request_count for the sampled error rate.
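To show why a median can be more representative than a mean for skewed data, here is a small rank-based percentile sketch. The endpoint computes its percentiles in SQL; this TypeScript version uses the nearest-rank method, which is an assumption about the rank formula, and the helper names are hypothetical.

```typescript
// Hypothetical illustration of rank-based percentiles (nearest-rank method).
function percentile(latencies: number[], p: number): number | null {
  if (latencies.length === 0) return null;
  const sorted = latencies.slice().sort((a, b) => a - b);
  // 1-based rank of the smallest value covering p percent of the samples.
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1] ?? null;
}

// Sampled error rate: error_count divided by request_count.
function sampledErrorRate(errorCount: number, requestCount: number): number {
  return requestCount === 0 ? 0 : errorCount / requestCount;
}
```

For the latencies 80, 90, 100, 120, and 2000 ms, the mean is 478 ms while the nearest-rank median is 100 ms, so a single slow request dominates the average but not the p50.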

Other fixes in this group

Responses API EasyInputMessage content validation — POST /v1/responses input items now validate the content field (must be string or array; array entries must be valid input-block objects with a string type). Previously malformed content blocks passed through silently.
Transcription endpoint respects D1 allowlist — /v1/audio/transcriptions now checks getCachedModelExplicitlyAllowed() before applying the registry block, matching the behavior of all other endpoints.
since == until accepted in /admin/model-stats — equal timestamps (point-in-time queries) now return 200 with an empty-or-filtered result instead of 400.
Security — sec-* header strip in proxy — proxyToFuelix now removes any upstream request header starting with sec- (e.g. sec-fetch-site, sec-fetch-mode) to prevent browser context leakage and spoofing-based upstream routing abuse.
Security — unrecognized 4xx body redaction — Non-standard JSON 4xx upstream responses (not the FastAPI {detail} or OpenAI {error} shape) now have redactSecrets() applied before being forwarded to API consumers.
Transactions search clarification — Input placeholder changed from “Search logs…” to “Search this page…” with a tooltip clarifying it searches the current page only.

2026-05-25 (improvement loop, iteration 320)

Dashboard + DB — Expiry-aware key stats in overview; expired/expiring-soon alerts; `expires_at` index

The GET /admin/stats endpoint (and the admin dashboard overview) now returns expiry-aware credential counts. Previously, keys.active included all status='active' keys even if their expires_at had already passed — those keys return 401 api_key_expired on every request, so showing them as “active” was misleading.

Changes:

keys.active now counts only keys with status='active' and no past expiry date (truly functional keys).
New keys.expired field: count of status='active' keys whose expires_at is in the past.
New keys.expiringSoon field: count of status='active' keys whose expires_at is within the next 7 days.
keys.total is now active + suspended + revoked + expired.
A single-scan aggregate SQL query replaces the old GROUP BY status query, so all five counts are computed in one round-trip.
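The counting rules can be illustrated in TypeScript, although the gateway computes them in SQL. The ApiKeyRow and KeyStats shapes, the epoch-millisecond timestamps, and the boundary handling (an expiry exactly at "now" counts as expired) are assumptions. Keys expiring soon are counted inside active, since they still work.

```typescript
// Hypothetical in-memory version of the expiry-aware aggregate.
interface ApiKeyRow {
  status: "active" | "suspended" | "revoked";
  expiresAt: number | null; // epoch milliseconds; null means no expiry
}

interface KeyStats {
  active: number;
  expired: number;
  expiringSoon: number;
  suspended: number;
  revoked: number;
  total: number;
}

const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function computeKeyStats(rows: ApiKeyRow[], now: number): KeyStats {
  const s: KeyStats = { active: 0, expired: 0, expiringSoon: 0, suspended: 0, revoked: 0, total: 0 };
  for (const row of rows) {
    if (row.status === "suspended") {
      s.suspended += 1;
    } else if (row.status === "revoked") {
      s.revoked += 1;
    } else if (row.expiresAt !== null && row.expiresAt <= now) {
      s.expired += 1; // status is 'active' but the key no longer works
    } else {
      s.active += 1;
      if (row.expiresAt !== null && row.expiresAt - now <= SEVEN_DAYS_MS) {
        s.expiringSoon += 1;
      }
    }
  }
  s.total = s.active + s.suspended + s.revoked + s.expired;
  return s;
}
```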

Dashboard alerts: The overview page now surfaces two new operational alerts:

“N API keys expired” (critical) — when expired > 0, prompting operators to revoke or update the expiry.
“N API keys expire within 7 days” (warning) — when expiringSoon > 0.

The “Active API Keys” stat card detail text also shows the expired count when non-zero (e.g. “12 total · 2 expired”) so the problem is visible at a glance.

Performance: A new D1 index (api_keys_expires_at_idx) on the expires_at column is added via migration 0008_api_key_expiry_idx.sql to support the WHERE expires_at ... filter in the aggregate query.

Docs: expires_at field is now documented in the API Keys reference including create/update examples and the key lifecycle section.

Tests: 6 new tests in admin-dashboard.test.ts verify the expiry-aware counts (shape, expired delta, expiringSoon delta, far-future not in expiringSoon, total arithmetic, 401 without session).

2026-05-25 (improvement loop, iteration 319)

Three admin dashboard improvements:

Models page — category filter tabs — A row of tabs (All, Chat, Embedding, Image, TTS, Transcription, OCR) now appears in the “Policy Allowlists” table header. Each tab shows a live count badge for that category. Selecting a tab filters the table instantly (client-side) without clearing the text search. The two filters combine: e.g. “Chat” tab + search “gpt-4” shows only chat-category models whose ID contains “gpt-4”.
Transactions page — API key filter — A “Filter by Key” selector in the filter sidebar lets operators scope the request log table to a single API key. The key list is loaded from the existing /api/api-keys endpoint (cached) and filtered to non-revoked keys only. Selecting a key passes key_id server-side (already supported by the backend). A “Clear” button removes the filter. The active key name is echoed below the selector.
Models page — reasoning node count fix — The “Reasoning Nodes” stat card previously always showed 0 because it tested model.category === 'reasoning', but 'reasoning' is not a valid ModelCategory (valid values: chat, embedding, image, tts, transcription, ocr). The correct check is model.capabilities.includes('reasoning'), which matches models whose capability array includes the reasoning tag (e.g. o3, o4-mini, claude-3-7-sonnet). Fixed.

2026-05-25 (improvement loop, iteration 318)

Models — `bve_category` field on every `/v1/models` response entry; `?search=` and `?category=` query params

GET /v1/models now annotates each model object with a bve_category field (chat, embedding, image, tts, transcription, ocr) so API consumers can programmatically pick the correct endpoint for a model without needing knowledge of the internal capability registry. Models registered only via the D1 allowlist without a registry entry are returned without the field.

Two query parameters are now accepted:

?search= — case-insensitive substring filter on model ID. Returns only models whose ID contains the given term (length 1–200). Example: ?search=gpt-4 returns only models whose IDs include gpt-4.
?category= — filter by capability type. One of: chat, embedding, image, tts, transcription, ocr. Both parameters can be combined.

Both parameters validate their input before hitting Fuelix. Invalid values return 400 invalid_value immediately. The OpenAPI spec at /openapi.json now documents both parameters and the bve_category field.
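The query validation can be sketched like this. The parseModelsQuery and matchesSearch names and the result shape are hypothetical; the category list and the 1–200 length bound come from the entry above. Treating an empty ?search= as invalid follows from the stated minimum length of 1.

```typescript
// Hypothetical sketch of the query validation; not the gateway's code.
const CATEGORIES = ["chat", "embedding", "image", "tts", "transcription", "ocr"] as const;
type Category = (typeof CATEGORIES)[number];

type ParsedQuery = { ok: true; search?: string; category?: Category };
type QueryError = { ok: false; status: 400; code: "invalid_value"; param: "search" | "category" };

function parseModelsQuery(params: URLSearchParams): ParsedQuery | QueryError {
  const search = params.get("search");
  const category = params.get("category");
  if (search !== null && (search.length < 1 || search.length > 200)) {
    return { ok: false, status: 400, code: "invalid_value", param: "search" };
  }
  if (category !== null && !(CATEGORIES as readonly string[]).includes(category)) {
    return { ok: false, status: 400, code: "invalid_value", param: "category" };
  }
  const result: ParsedQuery = { ok: true };
  if (search !== null) result.search = search;
  if (category !== null) result.category = category as Category;
  return result;
}

// Case-insensitive substring match on the model ID.
function matchesSearch(modelId: string, search: string): boolean {
  return modelId.toLowerCase().includes(search.toLowerCase());
}
```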

2026-05-25 (improvement loop, iteration 317)

DX — CLAUDE.md accuracy, debug-request.sh Anthropic headers, .gitignore cleanup

Three developer-experience fixes:

CLAUDE.md test count updated — ~2383 → ~2632 to reflect test suite growth through iterations 309–315.
debug-request.sh Section 4 — Added Anthropic per-direction rate-limit headers (anthropic-ratelimit-{input,output}-tokens-{limit,remaining,reset}) to the authenticated debug output. These headers were added to SAFE_UPSTREAM_HEADERS and exposeHeaders in iteration 309 but were absent from the debug script.
.gitignore — Added cy-loop-log.md alongside the existing cy-test-loop-log.md and cy-simple-loop-log.md patterns.

2026-05-25 (improvement loop, iteration 315)

Dashboard — model stats `errorCount`, D1 indexes, session display fix, search wiring

Four improvements to the admin dashboard and data layer:

errorCount in model stats — GET /admin/model-stats now returns error_count (requests with HTTP status ≥ 400) per model. The Usage page Status Distribution and the Transactions page KPI cards now use this field instead of computing error rate from the 50-row page window. Integration tests verify a mixed 1-success + 2-error seed produces errorCount === 2.
D1 indexes — Migration 0006 adds request_logs_endpoint_idx and request_logs_model_idx on request_logs_sampled, eliminating full-table scans for common Transactions page filters.
Session display fix — The Overview page “Session Integrity” stat card previously displayed the raw cookie name (bve_admin_session). It now shows a “DB Health” card backed by a real database probe result.
Search wiring — The ?search= query param is now threaded end-to-end from the Transactions page search input through the server API layer to the D1 listRequestLogsSampled query.

2026-05-25 (improvement loop, iteration 314)

Dashboard — server-side aggregate KPI stats for Transactions page

KPI cards (Avg Latency, Token Volume, Error Rate) on the Transactions page were previously computed by reducing the 50 visible rows for the current page — meaningless when paginating.

A new aggregateRequestLogStats() SQL query runs COUNT / AVG / SUM across all matching rows (not just the page), respecting the same model, endpoint, status_range, search, since, and until filters as the listing query.
The DashboardRequestLogsResponse contract now includes a stats field with avgLatencyMs, totalTokens, errorRate, errorCount, and totalCount.
The separate countRequestLogsSampled call is eliminated; total is derived from stats.totalCount in a single D1 round-trip.
Five integration tests verify stats shape, avgLatencyMs accuracy, totalTokens summation, errorCount + errorRate, and filter isolation.

2026-05-25 (improvement loop, iteration 311)

Dashboard — server-side HTTP status range filter for request logs

The Transactions page status filter (2xx / 4xx / 5xx) was previously applied client-side on the 50 records loaded for the current page only. This caused two correctness problems: the “Total Logged” count was always the unfiltered total regardless of status selection, and paginating through errors was impossible because the pagination window was not scoped to the selected status class.

The filter is now server-side across all layers:

DB layer — listRequestLogsSampled and countRequestLogsSampled in src/db/queries.ts accept statusRange: '2xx' | '4xx' | '5xx' and apply status >= 200 AND status < 300, status >= 400 AND status < 500, or status >= 500 SQL conditions respectively.
Service layer — getRequestLogs in src/services/usage.ts forwards statusRange to the DB query.
Server API — getDashboardRequestLogs accepts statusRange and passes it to both the listing query and the count query so totals are accurate.
Server route — GET /api/request-logs parses ?status_range= and rejects invalid values with a 400.
Frontend queries — requestLogsQueryOptions includes status_range in the URL when set.
Frontend page — filterStatus is no longer applied client-side. Changing the status filter resets pagination to page 1 via handleStatusFilter. The “Total Logged” KPI card and page count now accurately reflect the filtered result set.

4 new integration tests verify the behavior end-to-end: 2xx returns only 2xx logs with correct total, 4xx isolates 4xx from mixed seeded data, 5xx isolates 5xx, and an invalid value (3xx) is rejected with 400 validation_error.
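The route-level validation and the DB-layer condition can be sketched together. The isStatusRange and statusRangeCondition names are hypothetical; the three SQL fragments are the ones listed in the DB layer item above. Because the fragments are fixed strings chosen by a validated enum, no user input is interpolated into SQL.

```typescript
// Hypothetical sketch of the status_range handling.
type StatusRange = "2xx" | "4xx" | "5xx";

// Anything else (for example "3xx") should be rejected with a 400.
function isStatusRange(raw: string): raw is StatusRange {
  return raw === "2xx" || raw === "4xx" || raw === "5xx";
}

// Maps a validated range to its fixed SQL condition.
function statusRangeCondition(range: StatusRange): string {
  switch (range) {
    case "2xx":
      return "status >= 200 AND status < 300";
    case "4xx":
      return "status >= 400 AND status < 500";
    case "5xx":
      return "status >= 500";
  }
}
```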

2026-05-25 (improvement loop, iteration 310)

Dashboard — expose `request_id` in transactions panel; endpoint-specific cURL templates; live UTC clock

Three admin dashboard improvements:

requestId in transaction inspector — The request_id field stored in request_logs_sampled (added in migration 0005) is now exposed end-to-end through the API contract (DashboardRequestLog.requestId: string | null), the /api/request-logs response, and the transactions inspector panel. Operators can copy the Request ID directly from the side panel to correlate a sampled log entry with the corresponding entry in wrangler tail or Cloudflare Logpush. The copy button appears only when a requestId is present (older logs without it show nothing).
Endpoint-specific cURL replay templates — The “Replay Command” cURL snippet in the transaction inspector previously hardcoded the chat completions request body ({"messages": [...]}) for every endpoint. It now generates the correct body format per endpoint: embeddings uses {"input": "..."}, TTS uses {"input": "...", "voice": "alloy"}, image generation uses {"prompt": "..."}, Anthropic Messages API uses {"max_tokens": 16, "messages": [...]}, Responses API uses {"input": "..."}, legacy completions uses {"prompt": "..."}, and multipart endpoints (transcriptions, image edits) use -F flags instead of -d.
Live UTC clock on overview — The “ISOLATE TIME” clock in the overview page welcome banner now updates every second via a setInterval effect. Previously it was frozen at the component mount time and never updated.

2026-05-25 (improvement loop, iteration 301)

Two UI/UX improvements to the docs site:

quickstart.mdx — The four onboarding sections (Install SDK, Make your first request, Health check, List models) are now wrapped in Starlight’s Steps component. This renders them as visually numbered steps (1, 2, 3, 4) with a vertical connector line, making the page read as a guided walkthrough rather than a flat list of H2 sections. All tab-based code examples are preserved inside each step.
index.mdx — The four Card components on the home page are replaced with LinkCard components. LinkCard renders the entire card as a clickable block (including the title and description), versus Card which requires an inline markdown link for navigation. The description attribute on each LinkCard is shown as subtext below the title, and the whole card surface routes to the target page. This matches standard docs-site best practice for feature navigation grids.

2026-05-25 (improvement loop, iteration 294)

Docs — Add top_k / min_p / repetition_penalty to chat completions reference; fix health response example

Two accuracy gaps closed:

chat-completions.mdx request body table — Added top_k, min_p, and repetition_penalty to the request parameters table. These three parameters were added to gateway validation in iteration 291 but were missing from the docs reference. Each entry includes the accepted value range and the provider context (Cohere, Groq, Mistral, OpenRouter).
chat-completions.mdx validation table — Added the corresponding validation error rows: top_k returns invalid_value for anything that is not a positive integer (strings, floats, and ≤ 0 values); min_p returns invalid_type for non-numbers and invalid_value for values outside [0, 1]; repetition_penalty returns invalid_type for non-numbers and invalid_value for values ≤ 0.
quickstart.mdx health response — The GET /health example response was missing the request_id field added in iteration 288. Updated to include request_id alongside status, service, and timestamp.
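
The validation rows described above for top_k, min_p, and repetition_penalty can be sketched roughly as below. This is not the gateway's code: the function name, the error type, and the order of checks are assumptions; only the code and param pairs come from the entry above.

```typescript
interface SamplingError {
  code: "invalid_type" | "invalid_value";
  param: string;
}

// Returns the first validation error, or null when all three fields pass.
function validateSamplingParams(body: Record<string, unknown>): SamplingError | null {
  const topK = body["top_k"];
  if (topK !== undefined) {
    // Strings, floats, and values <= 0 all map to invalid_value for top_k.
    if (typeof topK !== "number" || !Number.isInteger(topK) || topK <= 0) {
      return { code: "invalid_value", param: "top_k" };
    }
  }

  const minP = body["min_p"];
  if (minP !== undefined) {
    if (typeof minP !== "number") return { code: "invalid_type", param: "min_p" };
    // Accepted range is the closed interval [0, 1].
    if (minP < 0 || minP > 1) return { code: "invalid_value", param: "min_p" };
  }

  const penalty = body["repetition_penalty"];
  if (penalty !== undefined) {
    if (typeof penalty !== "number") return { code: "invalid_type", param: "repetition_penalty" };
    if (penalty <= 0) return { code: "invalid_value", param: "repetition_penalty" };
  }

  return null;
}
```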

2026-05-24 (improvement loop, iteration 276)

Docs — Fix wrong error code in embeddings and audio pages; add missing admin endpoint

Three accuracy bugs fixed:

embeddings.mdx — The embedding-only-models note incorrectly cited model_not_supported_for_endpoint. The actual error code emitted by the gateway (modelFilter.ts) is model_endpoint_mismatch.
audio.mdx (TTS notes) — Same stale error code in the TTS notes section (“Sending a TTS model to any other endpoint…”).
audio.mdx (transcriptions notes) — Same stale error code in the transcriptions notes section (“Sending a TTS, embedding, or image-generation model…”).

One completeness fix:

introduction.md admin table — The GET /admin/model-stats endpoint was implemented and documented in admin-api/model-stats.md and the admin API overview, but omitted from the endpoint table in the Introduction page.

2026-05-24 (improvement loop, iteration 114)

Docs — Files, Assistants, Vector Stores converted to MDX with tabbed TypeScript/Python SDK examples

The three remaining .md API reference stubs have been converted to .mdx files, each with <Tabs syncKey="lang"> sections containing TypeScript and Python SDK examples:

Files (files.mdx) — two tab sections: Upload a file and List and delete files
Assistants (assistants.mdx) — two tab sections: Create an assistant and Thread, messages, and run
Vector Stores (vector-stores.mdx) — three tab sections: Create a vector store, Upload a file and attach it, and Use a vector store with an assistant

All existing cURL examples, response shapes, endpoint tables, and caution admonitions are preserved. All .md stubs deleted. No remaining .md files in api-reference/.

2026-05-24 (improvement loop, iteration 111)

Docs — Models converted to MDX with tabbed TypeScript/Python SDK examples

docs/src/content/docs/api-reference/models.md has been converted to models.mdx with two <Tabs syncKey="lang"> sections:

List all models — client.models.list() iterating over models.data in both TypeScript and Python
Retrieve a single model — client.models.retrieve("gpt-4o") logging id and owned_by

The “OpenAI SDK” section heading was renamed to “SDK examples” to reflect that both TypeScript and Python tabs are now included. All existing content is preserved.

2026-05-24 (improvement loop, iteration 110)

Docs — Moderations converted to MDX with tabbed SDK examples

docs/src/content/docs/api-reference/moderations.md has been converted to moderations.mdx with three tabbed TypeScript/Python SDK example sections:

Basic single input — client.moderations.create() with a single string; logs flagged status and the list of flagged categories
Batch input — passes an array of strings and iterates over the results array
Explicit model — shows model: 'text-moderation-stable'; demonstrates that model is optional and defaults to omni-moderation-latest when omitted

All existing content is preserved: request body table, moderation models table, model restriction note, response JSON, gateway validation table, cURL examples, and notes. Docs build: 34 pages.

2026-05-24 (improvement loop, iteration 258)

Docs — Legacy Completions converted to MDX with tabbed SDK examples

docs/src/content/docs/api-reference/legacy-completions.md has been converted to legacy-completions.mdx with three tabbed TypeScript/Python SDK examples:

Basic completion — client.completions.create() with model, prompt, max_tokens
Array prompts — shows how ["Translate to French:", "Hello, world!"] is joined with \n\n before being forwarded as a single user message
Echo mode — demonstrates echo: true prepending the prompt to each choice.text

All existing validation tables, cURL examples, limitation notes, and the emulated-endpoint callout are preserved unchanged. Docs build: 34 pages.
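
The array-prompt and echo behavior documented by those examples can be illustrated with a small sketch. Both helper names are hypothetical and the gateway's emulation code is not reproduced here; the sketch only shows the joining and prepending rules stated above.

```typescript
// Joins array prompts with a blank line and wraps them as one user message.
function promptToUserMessage(prompt: string | string[]): { role: "user"; content: string } {
  const content = Array.isArray(prompt) ? prompt.join("\n\n") : prompt;
  return { role: "user", content };
}

// With echo enabled, the prompt is prepended to the completion text.
function applyEcho(promptText: string, completionText: string, echo: boolean): string {
  return echo ? promptText + completionText : completionText;
}

const joined = promptToUserMessage(["Translate to French:", "Hello, world!"]);
// joined.content is "Translate to French:\n\nHello, world!"
```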

2026-05-24 (improvement loop, iteration 106)

Smoke test — production verification enhancements

scripts/smoke-test.sh now performs 6 additional production checks per run, without any extra curl calls (reuses already-fetched header/body variables):

GET /health timestamp validation — verifies timestamp field is present and is ISO 8601 format ("timestamp":"20YY-..."). Catches workers serving stale or static cached responses.
Content-Type: application/json on 401 errors — confirms error responses carry the correct content type so OpenAI SDK clients can parse error bodies.
X-Content-Type-Options: nosniff on 401 errors — verifies the securityHeaders() middleware fires on all request paths, not just 200 OK. A missing header on error responses would indicate a middleware ordering bug.
request_id in v1 error body — end-to-end check that the X-Request-Id generated by requestId() middleware propagates into the error.request_id field of 401 responses on /v1/* routes (previously only verified for /admin/* routes).
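
The checks above live in a shell script, but their logic can be sketched in TypeScript for readers who want to reproduce them elsewhere. The function names are hypothetical, and the ISO 8601 pattern is a stricter assumption than the prefix match the smoke test is described as using.

```typescript
// Assumed ISO 8601 shape, e.g. 2026-05-24T12:00:00.000Z
const ISO_8601 = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

// True when the /health body carries a string timestamp in ISO 8601 form.
function healthTimestampOk(bodyText: string): boolean {
  const parsed = JSON.parse(bodyText) as { timestamp?: unknown };
  return typeof parsed.timestamp === "string" && ISO_8601.test(parsed.timestamp);
}

// Returns the names of failed checks for a 401 response on a /v1/* route.
function checkUnauthorized(headers: Record<string, string>, bodyText: string): string[] {
  const lower: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) lower[name.toLowerCase()] = value;

  const problems: string[] = [];
  if (!(lower["content-type"] ?? "").startsWith("application/json")) problems.push("content-type");
  if (lower["x-content-type-options"] !== "nosniff") problems.push("nosniff");

  // The request ID in the header should propagate into error.request_id.
  const parsed = JSON.parse(bodyText) as { error?: { request_id?: unknown } };
  const requestId = parsed.error?.request_id;
  if (typeof requestId !== "string" || requestId !== lower["x-request-id"]) {
    problems.push("request_id");
  }
  return problems;
}
```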

2026-05-24 (improvement loop, iteration 255)

Tests — 11 HTTP integration tests for responses/completions/messages validation

Added 11 end-to-end integration tests (via SELF.fetch) that exercise the full middleware stack for existing field validators, complementing the existing pure unit tests:

/v1/responses — truncation (invalid string, non-string), metadata (non-string value, >16 pairs), include (string instead of array, empty element), store (non-boolean), parallel_tool_calls (non-boolean)
/v1/completions — frequency_penalty (out of range, wrong type)
/v1/messages — stop_sequences element type (non-string element)

All return correct 400 OpenAI error shapes with code, param, and request_id. Test count: 2344 → 2355 (+11).

Docs — Commit images.md → images.mdx conversion (iteration 105 work)

Committed the uncommitted images.mdx file (Python + TypeScript tabbed SDK examples for image generation, disk save, image edits, and multi-image edit) alongside the removal of images.md.

2026-05-24 (improvement loop, iteration 105)

Docs — Images page converted to tabbed TypeScript/Python examples

api-reference/images.md converted to images.mdx with tabbed SDK examples (TypeScript + Python via syncKey="lang") matching the pattern established in audio, chat-completions, responses, messages, and embeddings pages.

Image Generation section

“### OpenAI SDK” section replaced with <Tabs> block: TypeScript example preserved, Python example added using client.images.generate().
New Saving to disk subsection added with tabs — shows gpt-image-1 with output_format: "webp" and quality: "high", saving b64_json data to disk.

Image Edits section

New “### SDK example” <Tabs> block added: TypeScript uses client.images.edit() with fs.createReadStream(); Python uses an open() context manager.
New Multi-image edit subsection added with tabs — shows array input for multi-image edit/generation workflows.

Docs build: 34 pages, no errors.

2026-05-24 (improvement loop, iteration 103)

Docs — Audio page converted to tabbed TypeScript/Python examples

api-reference/audio.md converted to audio.mdx with tabbed SDK examples (TypeScript + Python via syncKey="lang") matching the pattern established in chat-completions, responses, messages, and embeddings pages.

Text-to-Speech section

“### OpenAI SDK” section replaced with <Tabs> block: TypeScript example preserved, Python example added using client.audio.speech.create() + response.stream_to_file("speech.mp3").
New Streaming TTS subsection added with tabs — TypeScript uses an async iterator over response.body; Python uses client.audio.speech.with_streaming_response.create() context manager.

Transcriptions section

“### OpenAI SDK” section replaced with <Tabs> block: TypeScript example preserved, Python example added using open("audio.mp3", "rb") context manager + client.audio.transcriptions.create().
New Verbose JSON with word timestamps subsection added with tabs — shows response_format: "verbose_json" and timestamp_granularities: ["word"] with iteration over transcription.words.

Next steps block updated to also link to SDK Usage.

2026-05-24 (improvement loop, iteration 99)

Smoke tests — `service_tier`, `truncation`, `metadata`, `include` validation coverage

Four new authenticated sections added to scripts/smoke-test.sh, covering validation that was implemented in earlier iterations but had no end-to-end smoke test coverage.

service_tier (chat completions)

Sends service_tier: "flex" to POST /v1/chat/completions. The only accepted values are "auto", "default", and "scale". Expects 400 with code=invalid_value, param=service_tier.

truncation (Responses API)

Sends truncation: "always" to POST /v1/responses. The only accepted values are "auto" and "disabled". Expects 400 with code=invalid_value, param=truncation.

metadata type check (Responses API)

Sends metadata: "not-an-object" to POST /v1/responses. The field must be a plain object (string-to-string map). Expects 400 with code=invalid_type, param=metadata.

include type check (Responses API)

Sends include: "output_text.logprobs" to POST /v1/responses. The field must be an array of strings, not a bare string. Expects 400 with code=invalid_type, param=include.

Each section performs two check_body assertions (error code and param) so production regressions in these validation paths will surface immediately when bun run smoke is run after deploy.
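
The four rejections exercised by these smoke sections can be sketched as below. The helper names and the error type are hypothetical, and the gateway's real validators may be structured differently; only the accepted values and the expected code and param pairs come from the sections above.

```typescript
type FieldError = { code: "invalid_type" | "invalid_value"; param: string };

const SERVICE_TIERS = ["auto", "default", "scale"];
const TRUNCATION_MODES = ["auto", "disabled"];

// Optional string field restricted to a fixed set of values.
function validateEnum(value: unknown, allowed: string[], param: string): FieldError | null {
  if (value === undefined) return null;
  if (typeof value !== "string") return { code: "invalid_type", param };
  return allowed.includes(value) ? null : { code: "invalid_value", param };
}

// metadata must be a plain object when present.
function validateMetadataType(value: unknown): FieldError | null {
  if (value === undefined) return null;
  const isPlainObject = typeof value === "object" && value !== null && !Array.isArray(value);
  return isPlainObject ? null : { code: "invalid_type", param: "metadata" };
}

// include must be an array of strings, not a bare string.
function validateIncludeType(value: unknown): FieldError | null {
  if (value === undefined) return null;
  const ok = Array.isArray(value) && value.every((item) => typeof item === "string");
  return ok ? null : { code: "invalid_type", param: "include" };
}

// The four payloads the smoke test sends, and what each should produce.
const flexTier = validateEnum("flex", SERVICE_TIERS, "service_tier");
const alwaysTruncation = validateEnum("always", TRUNCATION_MODES, "truncation");
const stringMetadata = validateMetadataType("not-an-object");
const bareInclude = validateIncludeType("output_text.logprobs");
```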

2026-05-24 (improvement loop, iteration 98)

Tests — quota endpoint lifecycle and PATCH-consistency coverage

Three new tests added to test/admin.test.ts for GET /admin/api-keys/:id/quota:

Suspended key: verifies the endpoint returns 200 (not 404) with status: "suspended" when called for a suspended key. Confirms the quota endpoint is informational and works across all key lifecycle states.
Revoked key: same behavior for a revoked key — 200 with status: "revoked". Useful for admins auditing why requests are failing.
Post-PATCH consistency: after PATCH /admin/api-keys/:id changes rpm_limit, the quota endpoint immediately reflects the new limits.rpm value. Confirms the quota route reads limits fresh from D1 on every call.

2026-05-24 (improvement loop, iterations 248–249)

Validation — `tool_call_id` required on `tool`-role messages; `tool_calls` array validated on `assistant`-role messages

Two gaps in chat completions body validation were closed. Both previously caused upstream Fuelix to return opaque Pydantic 422 errors that clients could not distinguish from gateway failures.

tool_call_id validation (iteration 248)

Every message with role: "tool" must include a tool_call_id string that links it back to the originating tool_calls entry in the preceding assistant message. The gateway now returns a structured 400 before forwarding to Fuelix:

Condition	`code`	`param`
Field absent or null	`missing_required_parameter`	`messages[N].tool_call_id`
Field is not a string	`invalid_type`	`messages[N].tool_call_id`
Field is an empty string	`invalid_value`	`messages[N].tool_call_id`
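
A minimal sketch of the three rows above, assuming a hypothetical validateToolCallId helper that receives one message and its index N. The gateway's real function name and signature are not shown in this changelog.

```typescript
type ToolIdError = {
  code: "missing_required_parameter" | "invalid_type" | "invalid_value";
  param: string;
};

// Applies only to role "tool" messages; other roles are not checked here.
function validateToolCallId(message: Record<string, unknown>, n: number): ToolIdError | null {
  if (message["role"] !== "tool") return null;

  const param = `messages[${n}].tool_call_id`;
  const value = message["tool_call_id"];

  if (value === undefined || value === null) return { code: "missing_required_parameter", param };
  if (typeof value !== "string") return { code: "invalid_type", param };
  if (value === "") return { code: "invalid_value", param };
  return null;
}
```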

tool_calls array validation (iteration 249)

When an assistant message includes a tool_calls array the gateway now validates each entry:

Condition	`code`	`param`
`tool_calls` is not an array	`invalid_type`	`messages[N].tool_calls`
`tool_calls` is empty	`invalid_value`	`messages[N].tool_calls`
entry is not an object	`invalid_type`	`messages[N].tool_calls[M]`
entry missing `id`	`missing_required_parameter`	`messages[N].tool_calls[M].id`
entry `id` not a string	`invalid_type`	`messages[N].tool_calls[M].id`
entry `id` empty	`invalid_value`	`messages[N].tool_calls[M].id`
entry missing `type`	`missing_required_parameter`	`messages[N].tool_calls[M].type`
entry `type` not a string	`invalid_type`	`messages[N].tool_calls[M].type`
entry `type` ≠ `"function"`	`invalid_value`	`messages[N].tool_calls[M].type`
entry missing `function`	`missing_required_parameter`	`messages[N].tool_calls[M].function`
entry `function` not an object	`invalid_type`	`messages[N].tool_calls[M].function`
entry `function.name` missing	`missing_required_parameter`	`messages[N].tool_calls[M].function.name`
entry `function.name` not a string	`invalid_type`	`messages[N].tool_calls[M].function.name`
entry `function.name` empty	`invalid_value`	`messages[N].tool_calls[M].function.name`
entry `function.arguments` not a string (when present)	`invalid_type`	`messages[N].tool_calls[M].function.arguments`
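
The table above can be read as a sequence of checks per entry. The sketch below follows that reading; the function names, the order of checks, and the treatment of null as missing are assumptions rather than the gateway's confirmed behavior.

```typescript
type ToolCallError = {
  code: "missing_required_parameter" | "invalid_type" | "invalid_value";
  param: string;
};

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Required non-empty string: missing, then wrong type, then empty.
function checkRequiredString(value: unknown, param: string): ToolCallError | null {
  if (value === undefined || value === null) return { code: "missing_required_parameter", param };
  if (typeof value !== "string") return { code: "invalid_type", param };
  if (value === "") return { code: "invalid_value", param };
  return null;
}

// Call this only when the assistant message actually includes tool_calls.
function validateToolCalls(toolCalls: unknown, n: number): ToolCallError | null {
  const base = `messages[${n}].tool_calls`;
  if (!Array.isArray(toolCalls)) return { code: "invalid_type", param: base };
  if (toolCalls.length === 0) return { code: "invalid_value", param: base };

  for (let m = 0; m < toolCalls.length; m++) {
    const entry: unknown = toolCalls[m];
    const p = `${base}[${m}]`;
    if (!isPlainObject(entry)) return { code: "invalid_type", param: p };

    const idError = checkRequiredString(entry["id"], `${p}.id`);
    if (idError) return idError;

    const typeError = checkRequiredString(entry["type"], `${p}.type`);
    if (typeError) return typeError;
    if (entry["type"] !== "function") return { code: "invalid_value", param: `${p}.type` };

    const fn = entry["function"];
    if (fn === undefined || fn === null) {
      return { code: "missing_required_parameter", param: `${p}.function` };
    }
    if (!isPlainObject(fn)) return { code: "invalid_type", param: `${p}.function` };

    const nameError = checkRequiredString(fn["name"], `${p}.function.name`);
    if (nameError) return nameError;

    // arguments is optional, but must be a string when present.
    const args = fn["arguments"];
    if (args !== undefined && typeof args !== "string") {
      return { code: "invalid_type", param: `${p}.function.arguments` };
    }
  }
  return null;
}
```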

Smoke test — both validation paths are now covered in scripts/smoke-test.sh under the --- Tool-role message validation --- and --- Assistant tool_calls array validation --- sections (iteration 250).

2026-05-24 (improvement loop, iteration 247)

Docs — `service_tier`, `truncation`, and `metadata` documented in API reference

Three validated request fields were missing from the API reference pages after being added to the gateway validation layer in iteration 93.

api-reference/chat-completions.mdx — service_tier added

Added service_tier to the Request body table: optional string field accepted by OpenAI and OpenRouter to select a compute tier ("auto", "default", "scale"); forwarded unchanged to Fuelix.
Added service_tier to the Optional field constraints table: two new rows covering the invalid_type (non-string value) and invalid_value (unrecognized tier string) cases.

api-reference/responses.mdx — truncation and metadata added

Added truncation to the Request body table: optional string field controlling context-window truncation when the conversation exceeds the model limit ("auto" or "disabled").
Added metadata to the Request body table: optional string-to-string map attached to the response for caller tracking (max 16 pairs; keys ≤ 64 chars; values ≤ 512 chars).
Added both to the Optional field constraints table: eight new rows covering type, cardinality, key-length, and value-type/length validation rules for truncation and metadata.
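
The metadata limits quoted above (max 16 pairs, keys ≤ 64 chars, values ≤ 512 chars) can be sketched as a single check. The function name is hypothetical, and it returns a plain description instead of an error code because this entry does not state the code for each rule.

```typescript
const MAX_METADATA_PAIRS = 16;
const MAX_METADATA_KEY_CHARS = 64;
const MAX_METADATA_VALUE_CHARS = 512;

// Returns a description of the first violated rule, or null when metadata is valid.
function metadataProblem(metadata: unknown): string | null {
  if (typeof metadata !== "object" || metadata === null || Array.isArray(metadata)) {
    return "metadata must be a plain object";
  }
  const entries = Object.entries(metadata as Record<string, unknown>);
  if (entries.length > MAX_METADATA_PAIRS) return "metadata has more than 16 pairs";

  for (const [key, value] of entries) {
    if (key.length > MAX_METADATA_KEY_CHARS) return `key "${key}" is longer than 64 chars`;
    if (typeof value !== "string") return `value for "${key}" must be a string`;
    if (value.length > MAX_METADATA_VALUE_CHARS) return `value for "${key}" is longer than 512 chars`;
  }
  return null;
}
```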

2026-05-24 (improvement loop, iteration 87)

Docs — Errors page converted to tabbed language selector; /v1/responses mismatch hint documented

errors.md was using a TypeScript-only “Error handling example” with no Python or cURL tabs, inconsistent with the rest of the API reference (chat-completions, responses, messages, embeddings, streaming, quickstart all use <Tabs syncKey="lang">).

api-reference/errors.md → errors.mdx

Added import { Tabs, TabItem } from '@astrojs/starlight/components'
Error handling example section: wrapped the existing TypeScript block in <Tabs syncKey="lang"> and added:
- Python tab — from openai import OpenAI, APIError with e.status_code, e.message, e.code, e.param
- cURL tab — curl -o /tmp/bve_response.json -w "%{http_code}" with if [ "$HTTP_STATUS" -ge 400 ] guard
Chat or reasoning model sent to a specialized endpoint table: added missing POST /v1/responses row with the (non-OpenAI model — this endpoint only supports OpenAI GPT and O-series models such as gpt-4o or o3) hint (added to the gateway in iteration 237)
Full capability-mismatch hint catalog: added the non-OpenAI model on /v1/responses row to keep the catalog exhaustive

2026-05-24 (improvement loop, iteration 85)

Docs — Embeddings page converted to tabbed language selector with cURL and TypeScript batch example

embeddings.md was using the old flat **TypeScript / Node.js:** / **Python:** bold-label style instead of <Tabs syncKey="lang"> tab groups. The cURL and SDK examples were in separate sections, and the batch embedding section only showed Python.

api-reference/embeddings.md → embeddings.mdx

Added import { Tabs, TabItem } from '@astrojs/starlight/components'
Single input section: merged the cURL example block and the TypeScript/Python SDK blocks into a single <Tabs syncKey="lang"> group (TypeScript, Python, cURL)
Batch embedding section: wrapped the existing Python example in <Tabs syncKey="lang"> and added a new TypeScript tab, plus a cURL tab showing array input format
The page now shares syncKey="lang" with chat-completions, responses, messages, Quickstart, SDK guide, and Streaming pages — selecting a language on any page syncs across all tabbed pages site-wide

2026-05-24 (improvement loop, iteration 233)

Logger — `retryAfter` field added to rate-limit structured logs

The structured request log now includes a retryAfter integer field (seconds) when the response carries a Retry-After header. This field appears on:

429 responses from the per-key quota middleware (RPM, RPD, or monthly request cap exceeded)
503 responses from the global per-isolate hard cap

Previously, operators watching wrangler tail could see errorCode: "rate_limit_exceeded" but had to check the raw HTTP headers to learn how long until the client can retry. The new field surfaces that duration directly in the log line, making rate-limit events immediately actionable from the log stream.
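
One way such a field could be derived is sketched below. The RequestLogLine shape and the withRetryAfter helper are hypothetical, not the contents of the gateway's logger; the sketch assumes the header carries a whole number of seconds and leaves the log line untouched otherwise.

```typescript
// Hypothetical subset of the structured log line.
interface RequestLogLine {
  status: number;
  errorCode?: string;
  retryAfter?: number;
}

// Copies the Retry-After value (in seconds) onto the log line when it is a
// non-negative integer. HTTP-date values parse to NaN and are ignored.
function withRetryAfter(
  log: RequestLogLine,
  retryAfterHeader: string | null | undefined
): RequestLogLine {
  if (!retryAfterHeader) return log;
  const seconds = Number.parseInt(retryAfterHeader, 10);
  if (!Number.isInteger(seconds) || seconds < 0) return log;
  return { ...log, retryAfter: seconds };
}

const limited = withRetryAfter({ status: 429, errorCode: "rate_limit_exceeded" }, "30");
// limited.retryAfter is 30
```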

2026-05-24 (improvement loop, iteration 232)

Smoke test — added `GET /admin/api-keys/:id` and `GET /admin/api-keys/:id/quota` coverage

The ADMIN_KEY-authenticated section of scripts/smoke-test.sh covered eight admin endpoints but was missing the two single-key detail endpoints. Added:

GET /admin/api-keys/:id — verifies the single-key retrieval route returns 200 with id, name, status, and rpm_limit fields present
GET /admin/api-keys/:id/quota — verifies the Durable Object quota state endpoint returns 200 with key_id, limits, current, and checked_at fields; this confirms the DO getUsage() RPC pipeline is working in production

The key ID is extracted from the already-fetched GET /admin/api-keys response body using a UUID-pattern grep, so no additional listing request is needed. If no keys exist in production the checks are skipped with a clear SKIP message rather than failing.
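
The smoke test does this with grep in shell; the same idea in TypeScript might look like the sketch below. The exact pattern the script uses is not shown here, so this regular expression and the helper name are assumptions.

```typescript
// Matches the first UUID-shaped token in a response body.
const UUID_PATTERN = /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/i;

// Returns the first key ID found, or null so the caller can skip the checks.
function firstKeyId(listBody: string): string | null {
  const match = UUID_PATTERN.exec(listBody);
  return match?.[0] ?? null;
}

const exampleBody = '{"keys":[{"id":"123e4567-e89b-12d3-a456-426614174000","name":"demo"}]}';
const keyId = firstKeyId(exampleBody);
// keyId is "123e4567-e89b-12d3-a456-426614174000"
```

Returning null instead of throwing mirrors the SKIP behavior described above for an empty key list.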

2026-05-24 (improvement loop, iteration 231)

Docs — messages page converted to tabbed language selector with Python streaming example

messages.md was the last major API reference page showing multi-language SDK examples as flat sequential code blocks instead of <Tabs syncKey="lang"> tab groups. The Anthropic SDK section had TypeScript basic, TypeScript streaming, and Python basic examples but was missing a Python streaming example entirely.

api-reference/messages.md → messages.mdx

Added import { Tabs, TabItem } from '@astrojs/starlight/components'
Basic message section: merged TypeScript and Python blocks into a <Tabs syncKey="lang"> group
Streaming section (new heading): merged TypeScript streaming block and a new Python streaming example into a <Tabs syncKey="lang"> group
Python streaming uses client.messages.stream() context manager with stream.text_stream iterator — the idiomatic Anthropic Python SDK pattern

The page now shares syncKey="lang" with chat-completions, responses, Quickstart, SDK guide, and Streaming pages — selecting a language on any page syncs across all six tabbed pages site-wide.

2026-05-24 (improvement loop, iteration 229)

Docs UI/UX — chat-completions and responses pages converted to tabbed language selector

chat-completions.md and responses.md were the last major API reference pages showing multi-language SDK examples as separate flat sections instead of <Tabs syncKey="lang"> tab groups. Users who selected “Python” on the Quickstart, SDK guide, or Streaming pages had to manually scroll past the TypeScript examples on these pages to reach the Python version.

api-reference/chat-completions.md → chat-completions.mdx

Converted two sections from flat TypeScript/Python blocks to <Tabs syncKey="lang">:

Vision SDK examples — the ### TypeScript SDK (vision) and ### Python SDK (vision) H3 sub-headings are now a single unified <Tabs syncKey="lang"> group titled “SDK examples — vision”. Selecting TypeScript or Python once syncs both the vision and main SDK tabs simultaneously.
OpenAI SDK section — the **TypeScript:** / **Python:** bold-label blocks are now a <Tabs syncKey="lang"> group. The console.log output line was added to the TypeScript tab for completeness.

api-reference/responses.md → responses.mdx

Converted two sections from flat blocks to <Tabs syncKey="lang">:

OpenAI SDK section — previously TypeScript-only. Added a Python SDK example (client.responses.create(...)) and put both in a <Tabs syncKey="lang"> group.
Streaming section — the ### cURL, ### TypeScript (OpenAI SDK), and ### Python (OpenAI SDK) H3 sub-headings are now a unified three-tab group (TypeScript / Python / cURL). The cURL section was previously above the SDK sub-sections; it is now a third tab in the same group.

Both pages share syncKey="lang" with the Quickstart, SDK guide, and Streaming pages — selecting a language on any of those pages now syncs across all five pages site-wide.

2026-05-24 (improvement loop, iteration 78)

Docs — corrected validation error codes in chat-completions and responses API reference

api-reference/chat-completions.md — 7 rows corrected, 4 rows added

Following validation fixes in iterations 74–76 (undefined guards for tools[N].type, tools[N].function, tool_choice.type, tool_choice.function), the optional field constraints table had several wrong or merged rows that cited the wrong error code:

tools[N].type was listed as a single “Not "function"” → invalid_value row. Split into 3 rows: absent → missing_required_parameter, not a string → invalid_type, wrong value → invalid_value.
tools[N].function was listed as “Missing or not an object” → invalid_type. The absent case returns missing_required_parameter (not invalid_type). Split into 2 rows.
tools[N].function.name was listed as “Not a non-empty string” → invalid_value. Absent or non-string actually returns invalid_type; only the empty-string case returns invalid_value. Split into 2 rows (absent/non-string → invalid_type, empty → invalid_value); the regex row is unchanged.
tool_choice.type was listed as a single “Not "function"” → invalid_value row. Split into 3 rows matching the source (absent → missing_required_parameter, non-string → invalid_type, wrong value → invalid_value).
tool_choice.function was entirely absent from the table. Added 2 rows: absent → missing_required_parameter, not an object → invalid_type.
tool_choice.function.name was listed as “Missing or not a non-empty string” → invalid_value. Absent/non-string returns invalid_type; empty returns invalid_value. Split into 2 rows.

Also added the missing required-field row: messages[N].role not one of the valid set → invalid_value (the set is system, user, assistant, tool, function, developer; unrecognised values like "operator" were already rejected since iteration 140 but the row was absent from the table).

api-reference/responses.md — Optional field constraints table expanded

The same shared validators (validateToolsField, validateToolChoiceField) power the Responses API. The table previously had merged or missing rows for tools[N].type, tools[N].function, tools[N].function.name, and tool_choice; all are now corrected and expanded to match the source. Also fixed the reasoning.effort entry, which listed only invalid_value even though a non-string effort returns invalid_type.

2026-05-24 (improvement loop, iteration 222)

Docs — audio `timestamp_granularities[]` and structured log field reference

Two documentation gaps closed:

1. api-reference/audio.md — timestamp_granularities[] added

The audio transcriptions request parameter table and notes section were missing timestamp_granularities[], which was added to gateway validation in commit b45fb58. The field is sent as repeated multipart form fields (the HTTP array convention). The gateway validates each value against {word, segment} and returns 400 invalid_value with param: "timestamp_granularities" for unknown values. Also added explicit notes for response_format and temperature gateway validation (added in earlier iterations) which were similarly absent from the notes section.

2. guides/deployment.md — structured request log fields reference added

wrangler tail is mentioned in the deployment page but the structured log format was never documented. Added:

A concrete example log line for a streaming chat request (showing all typical fields)
A full “Structured log fields” reference table covering all 20 fields emitted by src/middleware/logger.ts
Explanation of the stream: true field (added in commit 10642fd) — clarifies that high latencyMs is expected for streaming requests and why token counts are absent
Note that level maps to console.log/warn/error for Logpush level filtering

2026-05-24 (improvement loop, iteration 60)

Docs — admin log response shapes and model ID length guard documented

Three documentation gaps corrected based on recent code changes:

admin-api/audit-logs.md — total count field added to response — GET /admin/audit-logs returns a total integer alongside logs since commit bebb4ac (iteration 206), but the response example and field table were missing it. total reflects the count of entries matching the current filters, independent of limit/offset, and is useful for building pagination UIs. Added a “Response fields” table distinguishing the top-level total and logs array, and updated the JSON example.
admin-api/request-logs.md — total and key_name fields added to response — Same total gap as audit-logs. Additionally, GET /admin/request-logs log entries include key_name (the human-readable API key name, or null) since the key_name column was added to request_logs_sampled — but the field was absent from the docs. Added both fields to the response example JSON and the log entry fields table.
api-reference/chat-completions.md: model ID length validation added. The gateway rejects model values longer than 200 characters with 400 invalid_value (guard added in commit f3005ee, iteration 57). This prevents oversized model IDs from being stored in the in-memory MODEL_CACHE or emitted in unbounded log lines. The gateway validation “Required fields” table now includes the length constraint row.
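
For the total field described in the audit-logs and request-logs entries above, a pagination UI could derive its page state as in this sketch. The function name and return shape are hypothetical; it assumes limit is a positive integer and offset is a multiple of limit.

```typescript
interface PageState {
  pageCount: number;
  currentPage: number; // 1-based
  hasNext: boolean;
}

// total counts all entries matching the filters, independent of limit/offset.
function pageState(total: number, limit: number, offset: number): PageState {
  return {
    pageCount: Math.ceil(total / limit),
    currentPage: Math.floor(offset / limit) + 1,
    hasNext: offset + limit < total,
  };
}

const third = pageState(95, 20, 40);
// third is { pageCount: 5, currentPage: 3, hasNext: true }
```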

2026-05-24 (improvement loop, iteration 54)

Docs — Cohere models section added to SDK Usage guide

docs/guides/sdk.mdx was missing documentation for Cohere Command models (command-r, command-r-plus, command-a-03-2025, etc.) even though they were added to the static CHAT_MODELS registry in iteration 201. Groq already had a dedicated section; Cohere did not.

Changes:

In docs/guides/sdk.mdx, added a “## Cohere models” section with:
- Compatibility table row (Cohere models via OpenAI SDK)
- Model reference table listing all 6 supported Command models
- TypeScript + Python chat completions examples using command-r-plus
- TypeScript + Python streaming examples using command-r
- Tip callout confirming OpenAI-compatible parameters work unchanged
Updated frontmatter description to mention Cohere alongside OpenAI, Anthropic, and Groq
Updated compatibility table at top of page to add Cohere row

2026-05-24 (improvement loop, iteration 49)

Docs — Monthly request headers documented across all rate-limit pages

X-RateLimit-Limit-Month, X-RateLimit-Remaining-Month, and X-RateLimit-Reset-Month were added to the gateway implementation in commit eadda14 (iteration 196) but were absent from the documentation. These headers are set on every allowed response for keys that have a monthly_limit (per-month request cap) configured.

Changes:

docs/api-reference/rate-limits.md: new “### Monthly request headers” subsection (between the RPD and Monthly token sections) describing all three headers, their units, and the condition under which they appear; updated the cURL example to include monthly_limit: 1000 in the scenario and show the three new header lines in the response block
docs/getting-started/authentication.md: added the three X-RateLimit-*-Month headers to the CORS exposed-headers table, between the RPD row and the tokens row
docs/api-reference/chat-completions.md: added three new rows to the “## Response headers” table (between RPD and token rows) documenting the monthly request limit headers
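
A client could read the three monthly headers roughly as follows. The helper name and return shape are hypothetical, and the reset value is kept as a raw string because its unit is defined in the rate-limits page, not in this entry.

```typescript
interface MonthlyQuota {
  limit: number;
  remaining: number;
  reset: string; // raw header value; see the rate-limits page for its unit
}

// Returns null when the key has no monthly_limit and the headers are absent.
function readMonthlyQuota(headers: Record<string, string>): MonthlyQuota | null {
  const lower = new Map<string, string>();
  for (const [name, value] of Object.entries(headers)) lower.set(name.toLowerCase(), value);

  const limit = lower.get("x-ratelimit-limit-month");
  const remaining = lower.get("x-ratelimit-remaining-month");
  const reset = lower.get("x-ratelimit-reset-month");
  if (limit === undefined || remaining === undefined || reset === undefined) return null;

  return { limit: Number(limit), remaining: Number(remaining), reset };
}
```

Header names are matched case-insensitively because HTTP clients differ in how they normalize them.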

2026-05-24 (improvement loop, iteration 193)

Docs UI/UX — Streaming page converted to MDX with tabbed language selector

docs/api-reference/streaming.md was the only API reference page with explicit ### TypeScript / Node.js and ### Python H3 sub-headings instead of <Tabs syncKey="lang"> tab groups. This meant selecting “Python” on the Quickstart or SDK guide page did not sync the Streaming page — users had to scroll past the TypeScript example to reach the Python version.

Changes:

Renamed streaming.md → streaming.mdx (required to use Astro Starlight components)
Added import { Tabs, TabItem } from '@astrojs/starlight/components'
Converted the split “## cURL examples” + “## OpenAI SDK streaming” sections into a unified “## Code examples” section with two <Tabs syncKey="lang"> groups:
- Chat completions — TypeScript (OpenAI SDK stream() with finalMessage()), Python (same pattern), cURL
- Anthropic Messages API — TypeScript (Anthropic SDK messages.stream() with text_stream), Python (same), cURL
The Anthropic tab group now includes TypeScript and Python SDK examples that were previously absent from the streaming page (only a cURL example existed)
Raw SSE format sections (Chat completions, Anthropic Messages, Responses API) are unchanged — they remain format references above the code examples
Sidebar slug (api-reference/streaming) unchanged; sidebar config needs no update

2026-05-24 (improvement loop, iteration 43)

Performance — Durable Object `getState()` allocation elimination and `hashApiKey` encoder reuse

src/durable-objects/ApiKeyLimiter.ts — hot-path allocation elimination

getState() is called on every authenticated request via checkAndIncrement(). Previously it always allocated a new State object and three Window sub-objects even when no rate-limit windows had expired — the common case for active keys. The new fast path returns the existing in-memory object directly when !minuteExpired && !dayExpired && !monthExpired, eliminating 4 object allocations per Durable Object call on the hot path.

When some windows have expired, the code now reuses unchanged Window sub-objects by reference instead of creating copies (stored.minute rather than { count: stored.minute.count, resetAt: stored.minute.resetAt }), reducing allocations in the partial-expiry case from 3 new objects to only those needed for expired windows.
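
The fast path and the by-reference reuse described above can be sketched as below. This is an illustration, not the code in ApiKeyLimiter.ts: the type names, the refreshState function, the nextReset parameter, and the now >= resetAt expiry test are all assumptions.

```typescript
interface RateWindow {
  count: number;
  resetAt: number; // epoch milliseconds
}

interface LimiterState {
  minute: RateWindow;
  day: RateWindow;
  month: RateWindow;
}

// nextReset supplies the new resetAt for any window that has expired.
function refreshState(
  stored: LimiterState,
  now: number,
  nextReset: { minute: number; day: number; month: number }
): LimiterState {
  const minuteExpired = now >= stored.minute.resetAt;
  const dayExpired = now >= stored.day.resetAt;
  const monthExpired = now >= stored.month.resetAt;

  // Fast path: nothing expired, so return the existing object with no allocation.
  if (!minuteExpired && !dayExpired && !monthExpired) return stored;

  // Partial expiry: allocate only for expired windows, reuse the rest by reference.
  return {
    minute: minuteExpired ? { count: 0, resetAt: nextReset.minute } : stored.minute,
    day: dayExpired ? { count: 0, resetAt: nextReset.day } : stored.day,
    month: monthExpired ? { count: 0, resetAt: nextReset.month } : stored.month,
  };
}
```

Reusing sub-objects by reference is only safe if callers replace windows instead of mutating them in place, or if that sharing is intended.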

src/services/crypto.ts — reuse module-scope TextEncoder in hashApiKey

hashApiKey is called on every authenticated request (before the auth cache lookup, since the hash is the cache key). It previously allocated new TextEncoder() on every invocation instead of using the module-scope encoder constant that already exists in crypto.ts. Using the shared instance avoids one object allocation per request.

2026-05-24 (improvement loop, iteration 42)

Docs — Responses API: missing optional parameters and full validation table

docs/api-reference/responses.md had two gaps sourced from validateResponsesBody in src/validation/openai.ts:

Request body table — five missing parameters added

The table previously listed 10 fields; the following were absent despite being validated by the gateway:

Field	Notes
`store`	boolean — persist the response in OpenAI’s store
`parallel_tool_calls`	boolean — allow simultaneous tool calls
`response_format`	object — `text` / `json_object` / `json_schema` (same shape as chat completions)
`reasoning`	object — `{effort: "low"\|"medium"\|"high", summary?: string}` for o-series reasoning models
`user`	string ≤ 256 chars — end-user identifier forwarded to Fuelix

The instructions row was moved above max_output_tokens to match parameter order in the source.

Gateway validation section — Optional field constraints table added

The previous validation section only documented the two required-field checks (model and input). A new “Optional field constraints” table now covers all 22 validation rules for optional parameters, including type checks, range checks (temperature [0,2], top_p [0,1], max_output_tokens ≥ 16), reasoning.effort enum, reasoning.summary type, the full response_format sub-object tree, and tools/tool_choice shape rules.

2026-05-24 (improvement loop, iteration 186)

Docs — `X-BVE-Cache` header documented in Models and Authentication pages

The X-BVE-Cache response header has been implemented since iteration 183 (via fetchModelsWithStatus() in src/services/fuelix.ts and src/routes/openai.ts) and is listed in the CORS Access-Control-Expose-Headers, but was entirely absent from the documentation.

api-reference/models.md — new “Response headers” and “Model list caching” sections

Added a Response headers table documenting X-Request-Id, X-BVE-Latency, and X-BVE-Cache on GET /v1/models responses.

Added a “Model list caching” subsection explaining the Stale-While-Revalidate (SWR) strategy:

Value	Meaning
`HIT`	Fresh cache entry (< 4 minutes old)
`STALE`	Stale cache entry (4–5 minutes old); background refresh triggered
`MISS`	Cache empty or hard-expired (> 5 min); fresh upstream fetch

Includes a curl -I example showing how to inspect the cache status header and a note that the cache is per-isolate (different isolates may report different values for concurrent requests).
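
The mapping from cache age to the X-BVE-Cache value in the table above can be sketched like this. The function name `classifyModelCache` and the handling of the exact 5-minute boundary are assumptions made for illustration; the actual logic lives in fetchModelsWithStatus() and may differ.

```typescript
type ModelCacheStatus = "HIT" | "STALE" | "MISS";

const FRESH_MS = 4 * 60 * 1000; // entries younger than this are fresh
const HARD_EXPIRY_MS = 5 * 60 * 1000; // entries older than this are discarded

// fetchedAt is null when this isolate has no cached model list yet.
function classifyModelCache(fetchedAt: number | null, now: number): ModelCacheStatus {
  if (fetchedAt === null) return "MISS";
  const ageMs = now - fetchedAt;
  if (ageMs < FRESH_MS) return "HIT";
  // Assumption: an entry exactly 5 minutes old still counts as stale.
  if (ageMs <= HARD_EXPIRY_MS) return "STALE";
  return "MISS";
}
```

Because the cache is per-isolate, two concurrent requests may pass different `fetchedAt` values into a check like this and therefore report different statuses.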

getting-started/authentication.md — four missing headers added to CORS exposed headers table

X-BVE-Cache — previously missing despite being in cors.ts exposeHeaders since iteration 183
X-OpenRouter-Model — actual model selected by OpenRouter after provider routing
X-Or-Cache-Status — OpenRouter semantic cache result (HIT/MISS)
X-Or-Remaining-Tokens — remaining token budget in the OpenRouter rate-limit window

Also removed the upstream lowercase x-ratelimit-* entries that were listed as CORS-exposed but are NOT in cors.ts exposeHeaders (those headers are forwarded in the response but not exposed to cross-origin JS — the correct per-key X-RateLimit-* uppercase headers remain documented).

2026-05-24 (improvement loop, iteration 31)

Docs UI/UX — SDK guide converted to tabbed language selector + X-BVE-Model header documented

SDK guide: sdk.md → sdk.mdx with <Tabs syncKey="lang">

The SDK Usage guide previously showed all TypeScript examples in one long section followed by all Python examples in another — requiring users to scroll past every TS example just to find the Python version of a feature. The page has been converted to use <Tabs syncKey="lang"> throughout, matching the Quickstart page’s UX pattern.

Every feature now shows both languages in a tab group sharing syncKey="lang". Selecting “Python” on the Quickstart page automatically switches all tab groups on the SDK page too (and vice versa) — users pick their language once, site-wide.

Tab groups added across all three SDK sections (OpenAI, Anthropic, Groq):

OpenAI SDK: install, chat completions, streaming, embeddings, list models, error handling, function calling, structured output (json_object + json_schema), retry-with-backoff
Anthropic SDK: install, chat (Messages API), streaming
Groq SDK: install, chat completions, streaming
Environment variable pattern (OpenAI) — tabbed

The function calling and structured output sections previously used **Python:** bold labels (not headings) to separate language examples. These are now proper tab items alongside the TypeScript equivalent.

X-BVE-Model response header documented in three pages

The X-BVE-Model header was implemented in the prior iteration (178/30) but was absent from the documentation. Added to:

api-reference/chat-completions.md — added to the BVE Gateway response headers table (buffered JSON responses prefer the upstream model field; streaming uses the request-body model)
api-reference/streaming.md — added to the streaming response headers table
getting-started/authentication.md — added to the CORS exposed response headers table

2026-05-24 (improvement loop, iteration 175)

Docs UI/UX — Quickstart tabbed language selector with syncKey

Converted getting-started/quickstart.md → quickstart.mdx to enable Starlight component usage.

The Quickstart page now uses <Tabs syncKey="lang"> with TabItem labels TypeScript, Python, and cURL. All three tab groups on the page share the same syncKey, so selecting a language in one section automatically switches all others — users pick their language once and see consistent examples throughout.

Tab groups added:

Install the SDK — bun add openai / pip install openai / no install needed
Make your first request — full client.chat.completions.create() examples in each language
List available models — client.models.list() / client.models.list() / curl /v1/models

Also added sidebar badges in astro.config.mjs:

Responses API → New badge (tip variant) — flags the newer stateful OpenAI API
Anthropic Messages API → Anthropic badge (note variant) — distinguishes this from OpenAI-format endpoints

2026-05-24 (improvement loop, iteration 24)

Docs — introduction endpoint table and legacy-completions validation completeness

Two documentation gaps corrected:

introduction.md — three endpoints missing from Public API table — GET /v1/responses/:id, DELETE /v1/responses/:id, and POST /v1/moderations were all implemented and documented in their own reference pages but never added to the gateway-wide endpoint summary in introduction.md. The table now lists all 29 public API endpoints.
legacy-completions.md — optional field constraints undocumented — validateCompletionsBody validates 10 optional parameters (temperature, top_p, max_tokens, n, echo, presence_penalty, frequency_penalty, stop, user, seed) with the same range and type rules as the chat completions endpoint, but none of these were documented in the gateway validation section. The page previously only listed the required model constraint and the two unsupported-parameter rejections (best_of, logprobs). Added a complete “Optional field constraints” table covering all 17 validation rules, plus the standard 400 error envelope example.

2026-05-23 (improvement loop, iterations 166–168)

Validation — Anthropic Messages API content block validation

POST /v1/messages now validates each content block when messages[N].content is an array. Previously, malformed Anthropic content blocks (e.g., missing source in an image block, missing tool_use_id in a tool_result block) passed through to Fuelix and returned opaque Pydantic 422 errors that clients could not parse.

All Anthropic content block types are now validated at the gateway:

text — text must be a non-empty string.
image — source must be an object with type: "base64" or "url". Base64 images require media_type (one of image/jpeg, image/png, image/gif, image/webp) and a non-empty data string. URL images require a non-empty url string.
tool_use — id, name, and input are all required. input must be a plain object (not an array).
tool_result — tool_use_id is required. content (when present) must be a string or array.
document — source must have type: "base64", "url", or "text". URL sources require url; base64 and text sources require data.
Unknown types (thinking, redacted_thinking, future types) pass through without validation.

All errors return 400 with the standard { error: { message, type, param, code } } envelope.
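
The per-type rules above could be expressed roughly as below. This is a sketch under stated assumptions, not the gateway's validator: the function and type names are hypothetical, the error is reduced to a message and param path (error codes are omitted), required string fields are treated as non-empty strings, and the document block type is left out for brevity.

```typescript
interface BlockError {
  message: string;
  param: string;
}

const IMAGE_MEDIA_TYPES = ["image/jpeg", "image/png", "image/gif", "image/webp"];

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function nonEmptyString(value: unknown): boolean {
  return typeof value === "string" && value.length > 0;
}

// Returns null when the block is acceptable, otherwise the first problem found.
function validateAnthropicBlock(block: unknown, path: string): BlockError | null {
  // Assumption: a block that is not an object with a string type is rejected.
  if (!isObject(block) || typeof block.type !== "string") {
    return { message: "content block must be an object with a string type", param: path };
  }
  switch (block.type) {
    case "text":
      return nonEmptyString(block.text)
        ? null
        : { message: "text must be a non-empty string", param: `${path}.text` };
    case "image": {
      const source = block.source;
      if (!isObject(source)) return { message: "source must be an object", param: `${path}.source` };
      if (source.type === "base64") {
        if (typeof source.media_type !== "string" || !IMAGE_MEDIA_TYPES.includes(source.media_type)) {
          return { message: "unsupported media_type", param: `${path}.source.media_type` };
        }
        return nonEmptyString(source.data)
          ? null
          : { message: "data must be a non-empty string", param: `${path}.source.data` };
      }
      if (source.type === "url") {
        return nonEmptyString(source.url)
          ? null
          : { message: "url must be a non-empty string", param: `${path}.source.url` };
      }
      return { message: "source.type must be base64 or url", param: `${path}.source.type` };
    }
    case "tool_use":
      if (!nonEmptyString(block.id)) return { message: "id is required", param: `${path}.id` };
      if (!nonEmptyString(block.name)) return { message: "name is required", param: `${path}.name` };
      return isObject(block.input)
        ? null
        : { message: "input must be a plain object", param: `${path}.input` };
    case "tool_result":
      if (!nonEmptyString(block.tool_use_id)) {
        return { message: "tool_use_id is required", param: `${path}.tool_use_id` };
      }
      if (block.content !== undefined && typeof block.content !== "string" && !Array.isArray(block.content)) {
        return { message: "content must be a string or array", param: `${path}.content` };
      }
      return null;
    default:
      // Unknown types (thinking, redacted_thinking, future types) pass through.
      return null;
  }
}
```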

Validation — multimodal content block validation for /v1/chat/completions

POST /v1/chat/completions now validates image_url content blocks in multimodal messages. Previously, a missing image_url.url or an invalid image_url.detail value passed through to Fuelix and produced an opaque error.

Validated conditions:

image_url must be an object.
image_url.url must be a non-empty string (required).
image_url.detail must be "auto", "low", or "high" (when present).
text blocks require a non-empty text string.
Each block must be an object with a string type field.
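
As an illustration of the conditions listed above, a check for a single multimodal content part might look like the following. The names `validateChatContentPart` and `PartError` are hypothetical, error codes are omitted, and letting other block types pass through is an assumption; only the conditions themselves come from the list.

```typescript
interface PartError {
  message: string;
  param: string;
}

const IMAGE_DETAILS = ["auto", "low", "high"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns null for an acceptable part, otherwise the first violated condition.
function validateChatContentPart(part: unknown, path: string): PartError | null {
  if (!isRecord(part) || typeof part.type !== "string") {
    return { message: "content part must be an object with a string type", param: path };
  }
  if (part.type === "text") {
    return typeof part.text === "string" && part.text.length > 0
      ? null
      : { message: "text must be a non-empty string", param: `${path}.text` };
  }
  if (part.type === "image_url") {
    const image = part.image_url;
    if (!isRecord(image)) {
      return { message: "image_url must be an object", param: `${path}.image_url` };
    }
    if (typeof image.url !== "string" || image.url.length === 0) {
      return { message: "image_url.url must be a non-empty string", param: `${path}.image_url.url` };
    }
    if (image.detail !== undefined && (typeof image.detail !== "string" || !IMAGE_DETAILS.includes(image.detail))) {
      return { message: "image_url.detail must be auto, low, or high", param: `${path}.image_url.detail` };
    }
  }
  // Assumption: other block types are not checked in this sketch.
  return null;
}
```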

Also added: inverted date-range rejection for admin endpoints. GET /admin/audit-logs, GET /admin/request-logs, and GET /admin/model-stats now return 400 validation_error when since > until (previously returned HTTP 200 with 0 rows — indistinguishable from a valid empty window).

The admin login endpoint (POST /api/login on admin.bve.me) now truncates the User-Agent header to 512 characters before storing it in D1. Previously the raw header was stored verbatim — an attacker could send multi-megabyte User-Agent values to exhaust D1 storage quota without authentication. The cap is generous for all legitimate user agents (browsers: 100–300 chars, SDK clients: 30–80 chars).

2026-05-23 (improvement loop, iteration 150)

Docs — corrected endpoint availability: four active endpoints mislabeled as disabled

docs/api-reference/models.md had an “Unsupported endpoints kept disabled” section that incorrectly listed four endpoints which are fully implemented and active in the gateway:

POST /v1/completions — emulated via emulateCompletions() since iteration 2; has gateway validation and real token usage tracking. The docs said it was disabled.
POST /v1/responses — responsesProxy added later; validates required fields, extracts token usage from both streaming (response.completed SSE event) and non-streaming responses. The docs said it was disabled.
POST /v1/messages and GET /v1/messages/:id — Anthropic Messages API proxied at /v1/messages with messagesProxy; full SSE and NDJSON streaming support, validates max_tokens required field. Listed as “native Anthropic /v1/messages” in the disabled list.
POST /v1/moderations — content moderation endpoint with its own validation and buffered JSON response handling. The docs said it was disabled.

These endpoints were active in src/routes/openai.ts and were correctly listed in getting-started/introduction.md but had never been removed from the disabled list in api-reference/models.md after each was re-enabled in successive iterations.

“Supported endpoints” section expanded — the section previously listed only 7 endpoints; it now lists all 16 current endpoint groups including the four above, plus GET /v1/models/:id, POST /v1/images/edits, and the full Assistants / Threads / Vector Stores / Files API surface.

“Unsupported endpoints” section corrected — the section retains only the genuinely unimplemented paths (GET /v1/batches, native Gemini/Cohere/OpenRouter/Mistral routes, /openai/v1/chat/completions). The section title changed from “Unsupported endpoints kept disabled” to “Unsupported endpoints” to avoid implying they were ever intentionally enabled then disabled.

Docs — `GET /admin/stats` added to admin endpoint tables

GET /admin/stats was implemented in iteration 143 and has its own reference page (admin-api/stats.md) and a changelog entry, but was never added to the two gateway-wide endpoint reference tables:

admin-api/overview.md — the admin endpoint table now includes GET /admin/stats
getting-started/introduction.md — the admin endpoint table in the Introduction now includes GET /admin/stats

2026-05-23 (improvement loop, iteration 143)

Admin API — GET /admin/stats: isolate diagnostics section

GET /admin/stats now returns an additional isolate object alongside the existing keys and current_month fields:

{
  "isolate": {
    "month": "2026-05",
    "request_count": 1042,
    "soft_cap": 8500000,
    "hard_cap": 9500000,
    "soft_cap_exceeded": false,
    "hard_cap_exceeded": false
  }
}

request_count — Number of requests this specific Worker isolate has handled this month. Resets when the month rolls over or the isolate is recycled by Cloudflare.
soft_cap / hard_cap — Values of the MONTHLY_WORKER_REQUEST_SOFT_CAP and MONTHLY_WORKER_REQUEST_HARD_CAP wrangler vars. 0 means disabled.
soft_cap_exceeded / hard_cap_exceeded — Boolean flags computed from request_count >= cap. When hard_cap_exceeded is true, subsequent requests from this isolate return 503 capacity_exceeded.

Operators can now monitor cap proximity via the admin API without streaming Worker logs via wrangler tail. The change is purely additive — existing clients that parse keys and current_month are unaffected.
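
A rough sketch of how the isolate block could be assembled from the counter and the two wrangler vars is shown below. The helper names are hypothetical, and the guard that treats a cap of 0 as never exceeded is an assumption inferred from "0 means disabled"; the source only states that the flags are computed from request_count >= cap.

```typescript
interface IsolateDiagnostics {
  month: string;
  request_count: number;
  soft_cap: number;
  hard_cap: number;
  soft_cap_exceeded: boolean;
  hard_cap_exceeded: boolean;
}

// Assumption: a cap of 0 (disabled) never counts as exceeded.
function capExceeded(requestCount: number, cap: number): boolean {
  return cap > 0 && requestCount >= cap;
}

function buildIsolateDiagnostics(
  month: string,
  requestCount: number,
  softCap: number,
  hardCap: number,
): IsolateDiagnostics {
  return {
    month,
    request_count: requestCount,
    soft_cap: softCap,
    hard_cap: hardCap,
    soft_cap_exceeded: capExceeded(requestCount, softCap),
    hard_cap_exceeded: capExceeded(requestCount, hardCap),
  };
}
```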

2026-05-23 (improvement loop, iteration 140)

Chat completions validation — message role value and content type checks

Two new validation rules for POST /v1/chat/completions message items:

Role value validation — messages[N].role must be one of: system, user, assistant, tool, function, developer. The gateway previously checked that role is a string, but accepted arbitrary values like "operator", "bot", or "agent". These were forwarded to Fuelix which returned opaque FastAPI/Pydantic 422 responses instead of the OpenAI-compatible error shape clients expect. Clients now receive 400 invalid_value with param: "messages[N].role".
Content type validation — messages[N].content, when present and non-null, must be a string or array. Numbers, booleans, and plain objects previously passed through to Fuelix unvalidated. Clients now receive 400 invalid_type with param: "messages[N].content". null is still accepted (assistant messages with tool_calls legitimately set content to null).

These checks run before the request reaches Fuelix, replacing opaque upstream validation errors with stable, descriptive gateway errors. Valid requests with correct types and known roles are completely unaffected.
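
The two rules can be sketched as a single per-message check. The names `validateRoleAndContent` and `MessageFailure` are hypothetical, the messages are illustrative, and the sketch assumes the existing "role is a string" check has already run; the codes and param paths follow the entries above.

```typescript
interface MessageFailure {
  code: "invalid_type" | "invalid_value";
  param: string;
  message: string;
}

const ALLOWED_ROLES = ["system", "user", "assistant", "tool", "function", "developer"];

// Assumes role has already been confirmed to be a string by an earlier check.
function validateRoleAndContent(role: string, content: unknown, index: number): MessageFailure | null {
  if (!ALLOWED_ROLES.includes(role)) {
    return {
      code: "invalid_value",
      param: `messages[${index}].role`,
      message: `role must be one of: ${ALLOWED_ROLES.join(", ")}`,
    };
  }
  // null stays valid: assistant messages with tool_calls set content to null.
  const contentOk =
    content === undefined || content === null || typeof content === "string" || Array.isArray(content);
  if (!contentOk) {
    return {
      code: "invalid_type",
      param: `messages[${index}].content`,
      message: "content must be a string or array",
    };
  }
  return null;
}
```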

2026-05-23 (improvement loop, iteration 139)

Docs — `?offset=` pagination documented for request-logs and audit-logs

GET /admin/request-logs — ?offset= parameter added to docs — The offset-based pagination parameter (implemented in iteration 135) was missing from the query parameter table in admin-api/request-logs.md. Added the offset row (integer, default 0, use with ?limit= for cursor-free pagination) and a new “Paginate through logs” cURL example showing page 1 and page 2 requests.
GET /admin/audit-logs — ?offset= parameter added to docs — Same gap as request-logs. Added the offset row to the query parameter table and a “Paginate through logs” cURL example.
admin-api/audit-logs.md — cURL section normalized to H3 format — The cURL examples in audit-logs.md each had their own top-level ## heading (## cURL — all logs, ## cURL — filter by action type, etc.), which created a cluttered table of contents and was inconsistent with request-logs.md (which uses a single ## cURL examples section with ### sub-headings). Reorganized to match the standard pattern used throughout the rest of the docs.

2026-05-23 (improvement loop, iterations 129–130)

Chat completions validation — three new parameters + stream_options semantic rule

logprobs (boolean) — The gateway now validates that logprobs is a boolean when present. The legacy /v1/completions API uses logprobs as an integer (number of top log probs); the chat completions API uses a boolean. Passing logprobs: 5 previously reached Fuelix and produced an opaque Pydantic 422; the gateway now returns 400 invalid_type immediately.
top_logprobs (integer [0, 20]) — Used alongside logprobs: true to specify how many top-token probabilities to return per output token. Supported by OpenAI and Groq. Values outside the [0, 20] range, non-integers, or wrong types now return 400 invalid_value instead of an opaque upstream error.
parallel_tool_calls (boolean) — Controls whether the model may call multiple tools simultaneously. Must be a boolean; passing parallel_tool_calls: "true" previously forwarded a non-boolean to Fuelix and produced a 422. Now returns 400 invalid_type.
stream_options semantic rule — The gateway already validated stream_options structurally (must be an object, include_usage must be boolean). It now also enforces the semantic requirement: stream_options is only valid when stream: true is also set. Sending stream_options without stream: true returns 400 invalid_value with param: "stream_options" and message "stream_options requires stream: true", instead of forwarding the request to Fuelix where it would be silently ignored or produce a confusing error.

These changes are documented in the Gateway validation section of the Chat Completions reference.
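
For illustration, the four rules above might be combined as follows. The function name `validateChatExtras` and the `ParamError` type are hypothetical and the message strings other than "stream_options requires stream: true" are invented for the example; the error codes and the semantic rule are taken from the entries above.

```typescript
interface ParamError {
  code: "invalid_type" | "invalid_value";
  param: string;
  message: string;
}

// Returns the first violated rule, or null when the body passes these checks.
function validateChatExtras(body: Record<string, unknown>): ParamError | null {
  if (body.logprobs !== undefined && typeof body.logprobs !== "boolean") {
    return { code: "invalid_type", param: "logprobs", message: "logprobs must be a boolean" };
  }
  const topLogprobs = body.top_logprobs;
  if (
    topLogprobs !== undefined &&
    (typeof topLogprobs !== "number" || !Number.isInteger(topLogprobs) || topLogprobs < 0 || topLogprobs > 20)
  ) {
    return { code: "invalid_value", param: "top_logprobs", message: "top_logprobs must be an integer in [0, 20]" };
  }
  if (body.parallel_tool_calls !== undefined && typeof body.parallel_tool_calls !== "boolean") {
    return { code: "invalid_type", param: "parallel_tool_calls", message: "parallel_tool_calls must be a boolean" };
  }
  // Semantic rule: stream_options is only meaningful on streaming requests.
  if (body.stream_options !== undefined && body.stream !== true) {
    return { code: "invalid_value", param: "stream_options", message: "stream_options requires stream: true" };
  }
  return null;
}
```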

2026-05-23 (improvement loop, iteration 116)

Docs — Corrected reset-quota D1 behavior

admin-api/quota.md — fixed inaccurate caution note for POST /admin/api-keys/:id/reset-quota — The previous docs stated “Resetting quota counters does not reset the D1 monthly_usage table. Monthly token totals are unaffected.” This was factually wrong. The actual resetQuota() implementation in src/services/keys.ts also zeroes D1 monthly_usage (and evicts the monthly token cache) when window === 'month' || window === 'all', and zeroes D1 daily_usage when window === 'day' || window === 'all'. Replaced the incorrect caution with a table showing exactly what each window value resets across the Durable Object counters, D1 daily_usage, D1 monthly_usage, and the monthly token cache.
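
The D1 side of that table follows directly from the two conditions quoted above and can be sketched as a small lookup. `planD1Reset` and `D1ResetPlan` are hypothetical names, and the sketch deliberately leaves out the Durable Object counters, whose per-window behavior is not spelled out in this entry.

```typescript
type QuotaWindow = "minute" | "day" | "month" | "all";

interface D1ResetPlan {
  dailyUsage: boolean; // zero the D1 daily_usage row
  monthlyUsage: boolean; // zero the D1 monthly_usage row
  monthlyTokenCache: boolean; // evict the monthly token cache
}

function planD1Reset(resetWindow: QuotaWindow): D1ResetPlan {
  const touchesMonth = resetWindow === "month" || resetWindow === "all";
  const touchesDay = resetWindow === "day" || resetWindow === "all";
  return {
    dailyUsage: touchesDay,
    monthlyUsage: touchesMonth,
    monthlyTokenCache: touchesMonth,
  };
}
```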

2026-05-23 (improvement loop, iteration 114)

Docs — Messages API gateway validation accuracy

messages.md — two corrected error codes — The Gateway validation table listed model not a string and messages not an array both as invalid_value. The actual gateway response for type mismatch errors is invalid_type (matching the OpenAI API convention: wrong type → invalid_type, wrong value → invalid_value). Both rows now correctly show invalid_type.
messages.md — split max_tokens into two rows — The previous table merged “max_tokens not a number” and “max_tokens not a positive integer” into a single invalid_value row. These are distinct cases: a non-number returns invalid_type; a non-integer or non-positive number returns invalid_value. The table now has separate rows for each.
messages.md — new Optional field constraints section — validateMessagesBody validates 13 optional parameters (temperature, top_p, top_k, stream, system, stop_sequences, metadata, metadata.user_id, thinking and sub-fields), but none of these were documented. Added a complete “Optional field constraints” table matching the source implementation, including Anthropic-specific constraints (temperature range is [0, 1] not [0, 2], top_k must be a positive integer, thinking.budget_tokens must be ≥ 1024). Also added the standard 400 error envelope example.

2026-05-23 (improvement loop, iteration 51)

Error compatibility

Proxy error type field: server_error → api_error — All gateway-generated 5xx and proxy error responses now use type: "api_error" to match the real OpenAI API specification. Previously, errors from proxyToFuelix (unreachable upstream, 5xx upstream responses, insecure/invalid FUELIX_BASE_URL, redirect blocks, malformed upstream JSON) and the global rate limiter (503 capacity exceeded) all returned type: "server_error" — a non-standard value that OpenAI’s own error-handling code does not recognize. Affected error codes: internal_error (500), upstream_error (502), and capacity_exceeded (503).

2026-05-23 (improvement loop, iteration 49)

Docs — SDK guide examples

Function calling examples — Added TypeScript and Python function/tool-calling examples to the SDK Usage guide. Both examples show defining a get_weather tool, sending tool_choice: "auto", and parsing tool_calls from the response. Linked to the Chat Completions — Function calling reference section.
Structured output examples — Added TypeScript and Python examples for response_format: { type: "json_object" } and response_format: { type: "json_schema", ... }. The json_schema example includes strict: true and an additionalProperties: false schema — the minimum required shape for structured output with schema validation.
Retry-with-backoff pattern — Added TypeScript and Python examples showing how to read the Retry-After response header on a RateLimitError and sleep that many seconds before retrying. Includes a note that the OpenAI SDK retries automatically by default, and how to disable that (maxRetries: 0 / max_retries=0) when implementing custom retry logic.

2026-05-23 (improvement loop, iterations 97–103)

Model compatibility

o1-family vs o3/o4-family parameter restrictions — The gateway previously applied the same strict parameter set to all o-series reasoning models. Restrictions are now split by family:
- o1 family (o1, o1-mini, o1-preview, and dated variants starting with o1-): temperature, top_p, n, presence_penalty, and frequency_penalty must equal their defaults (1, 1, 1, 0, 0). The gateway returns 400 unsupported_value for any other value.
- o3/o4 family (o3, o3-mini, o4-mini, and dated variants starting with o3- or o4-): only n must be 1. The other four parameters are accepted and forwarded to Fuelix unchanged.
- Clients using temperature: 0.7 with o3-mini previously received a gateway 400 even though the request is valid per the upstream API spec. This is now correctly allowed.
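
The family split can be sketched as below. The function names and the shape of the lookup tables are hypothetical; the model name matching and the fixed default values follow the description above. The returned string is the parameter that would trigger 400 unsupported_value.

```typescript
type ReasoningFamily = "o1" | "o3_o4" | "none";

function reasoningFamily(model: string): ReasoningFamily {
  // o1-mini, o1-preview, and dated variants all start with "o1-".
  if (model === "o1" || model.startsWith("o1-")) return "o1";
  // o3-mini, o4-mini, and dated variants start with "o3-" or "o4-".
  if (model === "o3" || model.startsWith("o3-") || model.startsWith("o4-")) return "o3_o4";
  return "none";
}

// Values each parameter must equal when it is present in the request body.
const O1_FIXED: Record<string, number> = {
  temperature: 1,
  top_p: 1,
  n: 1,
  presence_penalty: 0,
  frequency_penalty: 0,
};
const O3_O4_FIXED: Record<string, number> = { n: 1 };

// Returns the first parameter that violates the family's restrictions, or null.
function findUnsupportedParam(model: string, body: Record<string, unknown>): string | null {
  const family = reasoningFamily(model);
  if (family === "none") return null;
  const fixed = family === "o1" ? O1_FIXED : O3_O4_FIXED;
  for (const key of Object.keys(fixed)) {
    if (body[key] !== undefined && body[key] !== fixed[key]) return key;
  }
  return null;
}
```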

Admin API

Double-revoke guard — DELETE /admin/api-keys/:id (revoke) now returns 404 not_found when called on a key that is already revoked. Previously, revoking an already-revoked key would overwrite the original revoked_at timestamp with the current time. The guard uses a ne(status, 'revoked') condition in the D1 query.
GET /admin/api-keys — ?limit= and ?offset= in OpenAPI spec — The OpenAPI spec now documents limit and offset as integer query parameters for GET /admin/api-keys. Both parameters were already supported by the route handler; the spec entry was the missing piece.
validationError helper in admin routes — The 400 validation error response block repeated verbatim 13 times across admin.ts. Extracted into a validationError(c, message, param, code?) helper, eliminating 46 lines of duplication. Also fixed a code inconsistency: POST /admin/model-allowlist was returning code: 'validation_error' for JSON parse failures while all other POST routes use code: 'invalid_json'.
notFoundError helper in admin routes — The 404 not-found response block repeated verbatim 9 times across admin.ts. Extracted into a notFoundError(c, message) helper.
Double createDb in POST /admin/api-keys/:id/reset-quota — The reset-quota handler was calling createDb(env.DB) twice in sequence. Eliminated the redundant call.

Docs

chat-completions.md — Reasoning model parameter restrictions subsection — A new “Reasoning model parameter restrictions” subsection within “Gateway validation” lists exactly which parameters are blocked for all reasoning models (n, logprobs, top_logprobs) vs only for o1-family models (temperature, top_p, presence_penalty, frequency_penalty).
chat-completions.md — Corrected o-series note — The “Reasoning models (o-series)” note now correctly differentiates o1-family (strict fixed params) from o3/o4-family (only n=1 required). The previous note incorrectly stated that o3/o4 models do not support temperature, top_p, presence_penalty, or frequency_penalty.
README.md — ?offset= added to GET /admin/api-keys — The admin endpoint table row for GET /admin/api-keys now documents ?offset= alongside ?limit=. The offset parameter has been supported since iteration 36 and was fully documented in the Starlight docs but missing from the README.

2026-05-23 (improvement loop, iteration 42)

Docs

chat-completions.md — reasoning_effort added to optional field constraints table — The “Optional field constraints” table listed all 19 optional parameters except reasoning_effort. The parameter is validated by the modelFilter middleware: on o-series models (o1, o1-mini, o1-preview, o3, o3-mini, o4-mini, and dated variants), any value that is not "low", "medium", or "high" (non-string or unrecognised string) returns 400 invalid_value with param: "reasoning_effort". The table now includes a row documenting this constraint.
introduction.md — POST /admin/api-keys/:id/reset-quota added to Admin API endpoint table — The gateway-wide endpoint reference in the Introduction page listed all admin endpoints except POST /admin/api-keys/:id/reset-quota, which was already documented in admin-api/quota.md and admin-api/overview.md. The introduction table now includes the reset-quota endpoint, consistent with the other endpoint reference pages.

2026-05-23 (improvement loop, iteration 96)

Docs

chat-completions.md — complete request body reference — The request body table now documents all 19 validated parameters, including seven that were previously missing: max_completion_tokens, n, stream_options, logit_bias, tools, tool_choice, and response_format. Each entry includes its type, optionality, and constraint range.
chat-completions.md — comprehensive gateway validation section — The previous “Gateway validation” section only listed the model and messages required-field checks. Expanded into two tables: one for required-field rules (including per-message role validation) and one for all optional parameter constraints (temperature, top_p, max_tokens, max_completion_tokens, n, stream, stream_options, penalties, logit_bias, stop, tools, tool_choice, response_format, user, seed). Every row names the constraint, the error code, and the param field.
chat-completions.md — Function calling and JSON mode sections — Added two new sections documenting the tools/tool_choice function-calling pattern (with a full JSON example and name-regex constraint note) and the response_format JSON mode / structured-output pattern (with json_object and json_schema examples).

2026-05-23 (improvement loop, iterations 86–95)

Request validation

response_format validation — POST /v1/chat/completions now validates response_format before forwarding to Fuelix. Must be an object with a type field ("text", "json_object", or "json_schema"). When type is "json_schema", the json_schema sub-object and json_schema.name (non-empty string) are required. Invalid values return 400 invalid_type or 400 invalid_value with descriptive param paths (e.g. response_format.type, response_format.json_schema.name).
tools and tool_choice validation — Function-calling parameters are now validated. tools must be a non-empty array; each entry must have type: "function" and a function.name matching [a-zA-Z0-9_-]{1,64}. tool_choice must be "none", "auto", "required", or {type: "function", function: {name}}. Invalid values return descriptive 400s with per-index param paths (e.g. tools[0].function.name).
logit_bias and stop validation — logit_bias must be an object mapping token ID strings to numbers in [-100, 100]. stop must be a string or an array of ≤ 4 strings. Both validate type and value separately and return param paths that point to the specific offending key (e.g. logit_bias.12345, stop[2]).
user and seed validation — user must be a string of ≤ 256 characters (prevents oversized fields from reaching Fuelix). seed must be an integer — fractional numbers (e.g. 1.5) are rejected with invalid_type. Both validations apply to POST /v1/chat/completions.
max_completion_tokens validation — POST /v1/chat/completions validates max_completion_tokens with the same rules as max_tokens: must be a positive integer when present. max_completion_tokens is the preferred parameter for o-series models.
frequency_penalty and presence_penalty range validation — Both penalty parameters must be numbers in [-2, 2]. Out-of-range values now return 400 invalid_value immediately instead of reaching Fuelix as an opaque Pydantic error.
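
To make the param-path behavior concrete, the logit_bias and stop rules above can be sketched as follows. The function names and the `FieldError` type are hypothetical, and the split between invalid_type (wrong type) and invalid_value (wrong value) for each case is an assumption based on the convention used elsewhere in the gateway.

```typescript
interface FieldError {
  code: "invalid_type" | "invalid_value";
  param: string;
}

// stop must be a string or an array of at most 4 strings.
function validateStop(stop: unknown): FieldError | null {
  if (stop === undefined || typeof stop === "string") return null;
  if (!Array.isArray(stop)) return { code: "invalid_type", param: "stop" };
  if (stop.length > 4) return { code: "invalid_value", param: "stop" };
  for (let i = 0; i < stop.length; i++) {
    // Point at the specific offending element, e.g. stop[2].
    if (typeof stop[i] !== "string") return { code: "invalid_type", param: `stop[${i}]` };
  }
  return null;
}

// logit_bias must map token ID strings to numbers in [-100, 100].
function validateLogitBias(bias: unknown): FieldError | null {
  if (bias === undefined) return null;
  if (typeof bias !== "object" || bias === null || Array.isArray(bias)) {
    return { code: "invalid_type", param: "logit_bias" };
  }
  const entries = bias as Record<string, unknown>;
  for (const token of Object.keys(entries)) {
    const value = entries[token];
    if (typeof value !== "number") return { code: "invalid_type", param: `logit_bias.${token}` };
    // Written as a negated range check so that NaN is rejected as well.
    if (!(value >= -100 && value <= 100)) return { code: "invalid_value", param: `logit_bias.${token}` };
  }
  return null;
}
```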

Observability

keyName in request_logs_sampled and structured request logs — All 7 proxied routes in openai.ts now pass apiKey.name to sampledLogCallback and handleBufferedJsonResponse, so the human-readable key name is stored in the key_name column of the request_logs_sampled D1 table. Also added to per-request structured log lines (wrangler tail) alongside keyId. Requires migration 0003_breezy_cardiac.sql (ALTER TABLE request_logs_sampled ADD key_name TEXT) to be applied in production.
keyId + keyName in suspended/revoked auth failure warn logs — The key_suspended and key_revoked console.warn events now include both keyId and keyName, so security incidents are immediately identifiable without a separate admin API lookup.
requestId in EVENTS_QUEUE alert event payloads — Rate-limit alert events queued to bve-gateway-events now include the requestId UUID. This propagates to the handleQueue console.warn log line and to the webhook POST body. Operators can now cross-reference a specific X-Request-Id with its alert event without fuzzy time-window matching.

Admin API

POST /admin/api-keys/:id/reset-quota — Manually reset an API key’s Durable Object quota counters. Accepts optional { "window": "minute" | "day" | "month" | "all" } (defaults to "all"). Returns { key_id, window, reset: true }. Appends api_key.quota_reset to the audit log.

Security

X-Frame-Options: DENY — Added to the security headers middleware (securityHeaders.ts). Belt-and-suspenders clickjacking protection alongside the existing Content-Security-Policy: frame-ancestors 'none'.
Cross-Origin-Resource-Policy: cross-origin — Added to allow browser-based OpenAI SDK clients to read responses from cross-origin JS. Explicit CORP declaration required by browser isolation policies.
User-Agent truncated to 200 chars in logs — The structured request logger truncates the User-Agent header to 200 characters before writing to prevent large UA strings from bloating Logpush exports.

2026-05-23 (improvement loop, iterations 93–162)

Observability

upstreamLatencyMs in structured request logs — Every request log line now includes upstreamLatencyMs: the wall-clock time from just before the upstream Fuelix fetch until response headers arrive. For streaming responses this is first-byte time. Absent for middleware-level rejections (401/403/413/429) that never reach Fuelix. Lets operators distinguish “Fuelix is slow” from “gateway overhead” in wrangler tail or Logpush without extra tooling.
keyName in request logs and sampled request logs — Structured log lines and request_logs_sampled rows now include the key’s human-readable name alongside keyId. Makes log filtering by team or environment possible without cross-referencing the api_keys table.
requestId in queue alert payloads — Rate limit alert events enqueued to bve-gateway-events now include the requestId UUID, propagating to handleQueue logs and webhook POST bodies for cross-service correlation.
Isolate diagnostics in GET /admin/stats — The stats endpoint now returns an isolate block with per-isolate metrics: request_count, soft_cap, hard_cap, soft_cap_exceeded, hard_cap_exceeded, and the current month. The OpenAPI schema has been updated.

New endpoints and features

Full admin dashboard at admin.bve.me — A React-based admin UI at admin.bve.me provides a visual interface for key management, quota inspection, audit logs, request logs, model allowlist configuration, and real-time stats. The dashboard is served via the same Worker using the ASSETS binding.
POST /v1/moderations — Content moderation endpoint now fully implemented. Accepts input (string or array) and optional model (defaults to omni-moderation-latest upstream). Moderation models are endpoint-locked: sending them to any other endpoint returns 400 model_endpoint_mismatch. See Moderations for full documentation.
GET /v1/models/:id filtered per-key and global blocklist — The model detail endpoint now respects per-key allowed_models and the global model allowlist, returning 403 model_not_available for blocked models.
POST /admin/api-keys/:id/reset-quota — Manually reset an API key’s Durable Object quota counters for minute, day, month, or all windows. Returns { key_id, window, reset: true }.
?offset= pagination on admin log endpoints — GET /admin/request-logs and GET /admin/audit-logs now accept ?offset=N for offset-based (cursor-free) pagination in addition to ?limit=.

Model routing and validation

mistral-ocr locked to /v1/chat/completions — The OCR model is now strictly routed via the chat completions endpoint only. Attempts to use it on /v1/embeddings, /v1/images/generations, or any other endpoint return 400 model_endpoint_mismatch.
Moderation model guard — omni-moderation-* and text-moderation-* models can only be used with /v1/moderations. Cross-endpoint use returns 400 model_endpoint_mismatch.
Extended chat completions validation — Added validation for logprobs, top_logprobs, parallel_tool_calls, stream_options, max_completion_tokens, role values, and content type per message. stream_options is only allowed when stream: true.
Anthropic tools/tool_choice validation in /v1/messages — Validates tool definitions (name, description, input_schema) and tool_choice values accepted by Claude.

Security

Strip client-injected Host header — proxyToFuelix now strips the Host header before forwarding to prevent host-header injection attacks against Fuelix.
CF-Connecting-IP preferred over X-Forwarded-For — Admin login rate limiting now uses the Cloudflare-verified CF-Connecting-IP header instead of the client-supplied X-Forwarded-For, preventing rate-limit bypass via header spoofing.
Webhook redirect chain protection — HMAC-signed webhook notifications now reject redirect chains (301/302) that would downgrade HTTPS to HTTP.
Admin session expiry pruning — Expired adminSessions rows are pruned asynchronously via ctx.waitUntil on each request, keeping the D1 table from accumulating stale data.

Provider compatibility (OpenRouter)

x-or-cache-status, x-or-remaining-tokens, x-openrouter-model — OpenRouter-specific response headers are now forwarded to clients alongside the standard quota headers. Clients using the OpenRouter API can observe cache hits and remaining token budgets without additional API calls.

Performance

Parallel D1 + DO quota queries — The quota check now issues the D1 monthly token total query and the Durable Object checkAndIncrement call in parallel (where both are needed), removing the sequential D1 roundtrip from per-request latency.
Retry-After on 503 global rate limit responses — The globalRateLimiter middleware now includes a Retry-After header on 503 capacity_exceeded responses, enabling clients to back off correctly.
modelFilter per-call Set allocation eliminated — Endpoint guard logic refactored to avoid allocating transient Set objects on each request, reducing GC pressure at high RPS.

Bug fixes

Images/generations: return 502 (not probe image) on empty upstream data — If Fuelix returns an empty data: [] array from the image generation endpoint, the gateway returns a clean 502 upstream_error instead of silently returning the probe image used for retry detection.
reset-quota now zeroes D1 daily/monthly usage rows — POST /admin/api-keys/:id/reset-quota previously only reset the Durable Object counters; it now also zeroes the corresponding D1 daily_usage and monthly_usage rows for window=day, window=month, and window=all.
Error type field consistency — All gateway errors now consistently use "type": "api_error" for server-side errors and "type": "invalid_request_error" for client errors, matching the OpenAI API spec.

2026-05-23 (improvement loop, iteration 92)

Model compatibility

max_completion_tokens validation — POST /v1/chat/completions now validates max_completion_tokens with the same rules as max_tokens: must be a positive integer when present. Previously, invalid values (0, -1, 1.5, a string) were forwarded to Fuelix unvalidated, producing opaque upstream errors. max_completion_tokens is the preferred parameter for o-series models and is accepted by all chat completions targets; max_tokens remains valid and both may coexist in a request.
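
The "positive integer when present" rule can be sketched as follows. The helper name, the error object shape, and the `invalid_value` code are assumptions for illustration; the changelog does not state which error code this check returns.

```typescript
// Hypothetical error shape; the real gateway envelope may differ.
type TokenLimitError = { status: 400; code: string; param: string; message: string };

// Returns null when the value is acceptable, or an error description otherwise.
function validateTokenLimit(param: string, value: unknown): TokenLimitError | null {
  if (value === undefined) return null; // the field is optional
  if (typeof value === "number" && Number.isInteger(value) && value > 0) return null;
  return {
    status: 400,
    code: "invalid_value", // assumed code, not confirmed by the changelog
    param,
    message: `${param} must be a positive integer`,
  };
}
```

The same helper could be applied to both max_tokens and max_completion_tokens, since the entry says they share the same rules.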

2026-05-23 (improvement loop, iterations 86–91)

Observability

keyName in request_logs_sampled D1 table — All 7 call sites in openai.ts that write sampled request log rows now pass apiKey.name through to the key_name TEXT column added in migration 0003_breezy_cardiac.sql. Previously, 7 routes (responses proxy streaming/buffered, completions emulation, embeddings, audio/speech, audio/transcriptions, images/generations) stored null for key_name even though the schema column existed. Operators querying request_logs_sampled can now filter by key_name without cross-referencing api_keys.
requestId in EVENTS_QUEUE alert event payloads — When a rate limit is exceeded and the event is enqueued to bve-gateway-events, the requestId UUID is now included in the payload. This propagates to the console.warn log line in handleQueue and to the webhook POST body. Operators investigating a specific request by its X-Request-Id header (or requestId in request logs) can now find the corresponding alert event without a fuzzy keyId + time-window match.
keyName in structured request logs and auth failure audit logs — Structured log lines for proxied requests now include the key’s human-readable name alongside keyId. Auth failure audit events (emitted by adminAuth and auth middleware on rejected requests) also include keyName for operator context.

Admin API

POST /admin/api-keys/:id/reset-quota — Manually reset an API key’s Durable Object quota counters. Accepts an optional JSON body { "window": "minute" | "day" | "month" | "all" } (defaults to "all" if omitted). Resets the specified counter window(s) in the ApiKeyLimiter Durable Object and appends an api_key.quota_reset entry to the audit log. Returns { key_id, window, reset: true } on success, 404 not_found for unknown key IDs, and 400 validation_error for an unrecognized window value.
```
# Reset all windows
curl -X POST https://api.bve.me/admin/api-keys/k-abc123/reset-quota \
  -H "Authorization: Bearer admin_bve_YOUR_ADMIN_KEY"

# Reset only the per-minute RPM counter
curl -X POST https://api.bve.me/admin/api-keys/k-abc123/reset-quota \
  -H "Authorization: Bearer admin_bve_YOUR_ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{"window":"minute"}'
```

2026-05-23 (improvement loop, iteration 85)

chat-completions.md — gateway per-key X-RateLimit-* headers added to response headers table — The page listed upstream x-ratelimit-* forwarded headers but was missing the gateway’s own per-key rate-limit headers (X-RateLimit-Limit-Requests, X-RateLimit-Remaining-Requests, X-RateLimit-Reset-Requests, X-RateLimit-Limit-Day, X-RateLimit-Remaining-Day, X-RateLimit-Reset-Day, and the optional token variants). These are the headers clients should use for per-key backoff. The upstream vs gateway headers note is now explicit.
“Next steps” sections added to stats.md, quota.md, and deployment.md — All three pages ended abruptly with no navigation pointers to related content. Added “Next steps” sections with links to related admin and API reference pages. Also added a use-cases section to stats.md (dashboard summary and threshold alert pattern).

2026-05-23 (improvement loop, iterations 83–84)

Reasoning model support

reasoning_effort parameter validation — Requests to POST /v1/chat/completions using o-series models (o1, o1-mini, o1-preview, o3, o3-mini, o4-mini, and dated variants like o4-mini-2025-04-16) with an invalid reasoning_effort value now return 400 invalid_value immediately, before the request reaches Fuelix. Valid values are "low", "medium", and "high". Non-string values (e.g. an integer) and unknown strings (e.g. "extreme") are both rejected. Non-reasoning models with any reasoning_effort value are unaffected (the parameter is forwarded unchanged). See Chat Completions — Reasoning models.
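
A rough sketch of the rule above, under stated assumptions: the o-series pattern below is a guess that happens to match the listed model names and dated variants, and `checkReasoningEffort` is a hypothetical name rather than the gateway's function.

```typescript
const REASONING_EFFORTS: readonly string[] = ["low", "medium", "high"];

// Assumed pattern: "o" followed by digits, then end of string or a "-suffix"
// (matches o1, o1-mini, o1-preview, o3, o3-mini, o4-mini, o4-mini-2025-04-16).
const O_SERIES = /^o\d+(-|$)/;

// Returns an error code, or null when the request may proceed.
function checkReasoningEffort(model: string, effort: unknown): string | null {
  if (effort === undefined) return null;  // parameter not supplied
  if (!O_SERIES.test(model)) return null; // non-reasoning models: forwarded unchanged
  if (typeof effort === "string" && REASONING_EFFORTS.includes(effort)) return null;
  return "invalid_value";                 // non-string or unknown string
}
```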

Docs

admin-api/quota.md — new Key Quota reference page — Documents GET /admin/api-keys/:id/quota with the full response schema (minute/day/month window counts, remaining headroom, reset timestamps), data-source consistency notes (Durable Object vs D1), common errors, a bash batch-job pre-check example, and a TypeScript token burn-rate monitoring example. Listed in the Admin API sidebar between Gateway Stats and Audit Logs.

2026-05-23 (improvement loop, iteration 27)

Docs accuracy — security.md

Security response headers: added X-Permitted-Cross-Domain-Policies: none — This header was added to securityHeaders.ts in a recent code commit but was absent from the security response headers table. Prevents Adobe Flash Player, Adobe Reader, and similar plugin runtimes from making cross-domain policy requests to the API server.
Upstream header sanitization: expanded to document all stripped/rewritten headers — The section now covers four header categories:
- Credential and auth headers — x-api-key, openai-organization, openai-project, x-basicllm-*, x-goog-api-key, anthropic-auth-token, x-anthropic-auth-token (unchanged; already documented)
- Session and proxy credentials — cookie, proxy-authorization (unchanged; already documented)
- Cloudflare metadata headers — CF-Connecting-IP, CF-Ray, CF-Visitor, CF-IPCountry, True-Client-IP (newly documented; all stripped before forwarding to Fuelix)
- Forwarding and IP headers — X-Forwarded-For (overwritten with Cloudflare-verified client IP to prevent IP spoofing), x-forwarded-host, x-forwarded-proto, x-real-ip, forwarded (all stripped; newly documented; prevent host/IP spoofing in Fuelix logs)
Response header filtering table: added x-openrouter-model — This header is forwarded from Fuelix and was present in SAFE_UPSTREAM_HEADERS in src/services/fuelix.ts but was missing from the documentation table. It carries the actual model ID selected by OpenRouter after provider routing.

2026-05-23 (improvement loop, iteration 78)

Docs accuracy

images.md — added dall-e-2 and gpt-image-1 to image models table — Both models were already guarded by the image-only model filter (IMAGE_ONLY_MODELS) but were not listed in the docs. Updated the caution to note that DALL-E model availability depends on Fuelix subscription.
admin-api/model-allowlist.md — documented SWR cache behavior — Added a “Caching behavior” section explaining the two-tier Stale-While-Revalidate TTL strategy: per-model lookups (20 s soft / 30 s hard TTL), full allowlist for /v1/models (45 s soft / 60 s hard TTL). Operators now know that a newly blocked model may still be accessible for up to 30 s in warm isolates, and that a Worker redeploy clears all caches immediately.
curl-examples.md — added Groq Whisper STT examples — Added whisper-large-v3 (Groq Whisper, faster) and distil-whisper-large-v3-en (Groq distil-Whisper, fastest) transcription curl examples alongside the existing whisper-1 example. Renamed the whisper-1 heading to clarify it is the OpenAI variant.
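
To make the two-tier TTL described for model-allowlist.md concrete, here is a minimal Stale-While-Revalidate sketch. It is not the gateway's cache: the class name, the `Lookup` states, and the explicit `now` argument (used so the behaviour is easy to reason about) are all assumptions.

```typescript
type Entry<T> = { value: T; fetchedAt: number };
type Lookup<T> =
  | { state: "fresh"; value: T }  // younger than the soft TTL: serve as-is
  | { state: "stale"; value: T }  // between soft and hard TTL: serve, refresh in background
  | { state: "miss" };            // absent or past the hard TTL: must refetch first

class SwrCache<T> {
  private entries = new Map<string, Entry<T>>();
  private softTtlMs: number;
  private hardTtlMs: number;

  constructor(softTtlMs: number, hardTtlMs: number) {
    this.softTtlMs = softTtlMs;
    this.hardTtlMs = hardTtlMs;
  }

  set(key: string, value: T, now: number): void {
    this.entries.set(key, { value, fetchedAt: now });
  }

  get(key: string, now: number): Lookup<T> {
    const entry = this.entries.get(key);
    if (!entry) return { state: "miss" };
    const age = now - entry.fetchedAt;
    if (age < this.softTtlMs) return { state: "fresh", value: entry.value };
    if (age < this.hardTtlMs) return { state: "stale", value: entry.value };
    this.entries.delete(key); // past the hard TTL, the entry is unusable
    return { state: "miss" };
  }
}
```

With the per-model values quoted above (20 s soft, 30 s hard), a blocked model can keep being served from a warm isolate until the hard TTL passes, which is the up-to-30 s window the docs now call out.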

2026-05-23 (improvement loop, iteration 24)

Model compatibility

STT model endpoint guard — whisper-* and distil-whisper-* models are now blocked from non-transcription endpoints at the gateway level. Sending whisper-1, whisper-large-v3, whisper-large-v3-turbo, or distil-whisper-large-v3-en to /v1/chat/completions, /v1/embeddings, or any other endpoint returns 400 model_not_supported_for_endpoint with a clear “speech-to-text model” message. Groq-hosted Whisper variants are covered by the same whisper-* and distil-whisper-* prefix matching.
Image-generation-only model guard — dall-e-2, dall-e-3, gpt-image-1, imagen-3, imagen-3-fast, and the dall-e-* / imagen-* prefixes are now blocked from non-images endpoints. Sending them to /v1/chat/completions or /v1/embeddings returns 400 model_not_supported_for_endpoint with a clear “image generation model” message.
NDJSON passthrough in /v1/messages — When a Gemini model is routed through /v1/messages and Fuelix returns application/x-ndjson streaming (rather than Anthropic-format text/event-stream), the gateway now passes the NDJSON stream through unchanged and extracts token usage from the usageMetadata field, matching the existing NDJSON handling for /v1/chat/completions.

Request validation

POST /v1/audio/transcriptions pre-flight validation — The transcription endpoint now validates model and file before forwarding to Fuelix. Missing either field returns 400 missing_required_parameter. Sending a TTS, embedding, or image-generation model to this endpoint returns 400 model_not_supported_for_endpoint. Previously, missing fields produced opaque Fuelix errors.
POST /v1/audio/speech extended voice set — The validator now accepts 9 voices: alloy, ash, coral, echo, fable, nova, onyx, sage, shimmer. Previously, only the original 6 voices (alloy, echo, fable, nova, onyx, shimmer) were accepted; ash, coral, and sage were rejected with 400 invalid_value.
gpt-4o-mini-tts TTS model recognized — gpt-4o-mini-tts is now a recognized TTS-only model: it is accepted by POST /v1/audio/speech and blocked from all other endpoints via the TTS model guard.

Security

Strip cookie and proxy-authorization from upstream requests — proxyToFuelix now explicitly deletes cookie and proxy-authorization headers before forwarding to Fuelix. Browser clients may carry session cookies that have no meaning to Fuelix; proxy credentials from intermediate HTTP proxies must not reach the upstream provider. Both headers are now stripped alongside the existing x-api-key, openai-organization, x-goog-api-key, and x-basicllm-* blocks.

2026-05-23 (improvement loop, iteration 13)

Docs

cURL Examples quick-reference page added — A new guides/curl-examples.md page provides a single-page reference for every endpoint: public API (health, models, chat completions streaming + non-streaming, embeddings, legacy completions, Anthropic Messages, Responses API, TTS, audio transcription, image generation) and admin API (all key lifecycle operations, quota status, usage, audit logs, request logs with model/endpoint filters, model allowlist). Linked from the Guides sidebar between SDK Usage and Security Notes.

2026-05-23 (improvement loop, iterations 57–60)

Admin API

GET /admin/api-keys ?name= substring search — Pass ?name=production to return only keys whose display name contains the search string (case-insensitive). Combinable with ?status=: ?status=active&name=prod returns active keys matching the name. Empty or blank ?name= is ignored (returns all keys). The D1 LIKE %name% query has been implemented since early iterations; this change threads it through the service and route layers and exposes it via the admin API.

Bug fixes

D1 audit-log timestamp fix — Raw INSERT statements in the test suite were using Date.now() (milliseconds) for created_at columns that store Unix seconds. In the corrupt-data regression tests, this created rows with timestamps ~55,000 years in the future, which matched date-range queries they should not have matched. Fixed to Math.floor(Date.now() / 1000).

Performance

D1 indexes for audit_logs and request_logs_sampled — Three new indexes added via migration 0001_sticky_ogun.sql:
- audit_logs(created_at) — makes ?since=/?until= date-range queries O(log n) instead of full table scans
- audit_logs(target_id) — makes per-key audit history lookups O(log n)
- request_logs_sampled(key_id, created_at) — covers the combined key-filter + date-order path in GET /admin/request-logs

Docs

api-reference/responses.md — Added gateway validation section documenting model (required string) and input (required string or array) pre-flight checks; added “Next steps” navigation section.
api-reference/legacy-completions.md — Added gateway validation table; added “Next steps” navigation section.
admin-api/api-keys.md — List API keys: added ?name= to query parameters table with combinability note; added two cURL examples (name search, combined name+status).

2026-05-22 (improvement loop, current session)

Docs

errors.md — complete param field coverage — Every error response in BVE Gateway includes a param field (the OpenAI error schema requirement). The error reference page now shows param in the canonical envelope example, in every inline JSON example (rate limit, 500, 404), and explains that it is null for most errors and a field name (e.g., "model") for validation errors.
errors.md — 503 capacity_exceeded documented — The global per-isolate request cap guard (globalRateLimiter middleware) returns 503 capacity_exceeded when the hard cap is hit. This status code and error code were missing from the HTTP status codes table and error codes reference. Both are now documented, with a link to the Rate Limits page for full cap details.
errors.md — "Monthly token limit exceeded" rate limit message added — The rate limit “Possible messages” list was missing this message, which is returned when a key’s monthly_token_limit is reached. All four possible 429 messages are now documented with their conditions.

Admin API filtering

GET /admin/api-keys ?status= filter — Pass ?status=active, ?status=suspended, or ?status=revoked to list only keys in a specific lifecycle state. Previously the endpoint always returned all statuses regardless of the query. An unrecognized status value returns 400 validation_error.
GET /admin/usage ?from=/?to= date filters — Scope usage reports to a calendar window. Both parameters accept YYYY-MM-DD format and can be combined with ?key_id=. The daily and monthly tables are filtered independently (daily by date, monthly by year-month prefix). Invalid date strings return 400 validation_error.
GET /admin/audit-logs ?since= filter — Return only audit log entries at or after an ISO 8601 timestamp (e.g., ?since=2026-05-01T00:00:00Z). Combines with the existing ?target_id=, ?action=, and ?limit= params. Invalid timestamps return 400 validation_error.
GET /admin/request-logs ?since= filter — Same pattern as audit-logs: filter sampled request log rows to entries on or after the given timestamp.

Provider compatibility

Fuelix 4xx normalization — proxyToFuelix now converts FastAPI/Pydantic 4xx error shapes ({"detail":[{"loc":["body","model"],"msg":"..."}]} and {"detail":"..."}) into the standard OpenAI error envelope before returning to clients. Previously, these reached callers in a format OpenAI SDK error parsers cannot handle.
/v1/completions body validation — Pre-flight validation added for POST /v1/completions before the emulation path runs. A missing or wrong-type model field now returns 400 missing_required_parameter instead of the 400 invalid_json that resulted when Fuelix truncated a body with a mismatched Content-Length.
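
The Fuelix 4xx normalization above can be sketched roughly as below. The two input shapes come from the entry itself; everything else is an assumption: the function name, taking `param` from the last element of `loc`, using only the first `detail` item, and leaving `code` as null.

```typescript
type OpenAIError = {
  error: { message: string; type: string; param: string | null; code: string | null };
};

// Returns an OpenAI-style envelope for FastAPI/Pydantic error bodies, or null if the
// body is not in one of the two recognised shapes.
function normalizeFastApiError(body: unknown): OpenAIError | null {
  if (typeof body !== "object" || body === null || !("detail" in body)) return null;
  const detail = (body as { detail: unknown }).detail;

  // Shape 1: {"detail": "..."}
  if (typeof detail === "string") {
    return { error: { message: detail, type: "invalid_request_error", param: null, code: null } };
  }

  // Shape 2: {"detail": [{"loc": ["body", "model"], "msg": "..."}]}
  if (Array.isArray(detail) && detail.length > 0) {
    const first: unknown = detail[0];
    if (typeof first !== "object" || first === null) return null;
    const { loc, msg } = first as { loc?: unknown; msg?: unknown };
    const last = Array.isArray(loc) ? loc[loc.length - 1] : undefined;
    return {
      error: {
        message: typeof msg === "string" ? msg : "Invalid request",
        type: "invalid_request_error",
        param: typeof last === "string" ? last : null, // assumed mapping: last loc segment
        code: null,
      },
    };
  }
  return null;
}
```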

Security

X-Forwarded-For sanitization — proxyToFuelix now strips client-injected X-Forwarded-For and replaces it with the Cloudflare-verified CF-Connecting-IP (real client IP). CF-Visitor, CF-IPCountry, and True-Client-IP are deleted before forwarding. Prevents IP spoofing in Fuelix access logs.
Admin auth failure logging — adminAuth middleware now emits a structured console.warn with level: "warn", type: "admin_auth_failed" for every rejected Bearer token. Operators can filter Logpush for this event to detect brute-force attempts against the admin API.
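
For the X-Forwarded-For sanitization above, a minimal sketch using the standard `Headers` class might look like this. The function name is hypothetical, and dropping X-Forwarded-For entirely when CF-Connecting-IP is absent is an assumption, not documented behaviour.

```typescript
// Hypothetical helper: returns a sanitized copy of the incoming request headers.
function sanitizeForwardingHeaders(incoming: Headers): Headers {
  const out = new Headers(incoming);
  const clientIp = incoming.get("CF-Connecting-IP");

  // Never trust the client-supplied value; replace it with the Cloudflare-verified IP.
  out.delete("X-Forwarded-For");
  if (clientIp) out.set("X-Forwarded-For", clientIp);

  // Cloudflare metadata that the entry above says is deleted before forwarding.
  for (const name of ["CF-Visitor", "CF-IPCountry", "True-Client-IP"]) {
    out.delete(name);
  }
  return out;
}
```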

2026-05-22 (improvement loop, iterations 42–47+)

Security

Strict-Transport-Security: max-age=63072000 added to all API responses — prevents HTTP downgrade attacks for browser-based clients (TypeScript SDK in SPAs, fetch() from web apps). The header instructs browsers to always use HTTPS for api.bve.me for the next 2 years. Does not include includeSubDomains or preload (conservative scope).
Webhook HTTPS-only enforcement — the queue consumer (handleQueue) now validates WEBHOOK_URL before any fetch() call. Invalid URLs and http:// URLs are rejected with a structured error log; the message is still acked to prevent retry storms. Prevents alert event data (key IDs, lifecycle actions) from being transmitted in plaintext.
Webhook HMAC signing — optional WEBHOOK_SECRET Worker secret. When set, every webhook POST includes an X-BVE-Signature: sha256=<hmac-sha256-hex> header. Receivers can verify the header by computing HMAC-SHA256(WEBHOOK_SECRET, raw-body) and comparing with the sha256= value. Follows the GitHub webhook signing convention.
URL query parameter injection prevention — proxyToFuelix and fuelixPath now strip credential-injection query parameters (api_key, api-key, key, token, access_token, secret) from upstream Fuelix URLs. Closes the injection vector that was already blocked at the header level (iterations 4 and 40) but was not covered for query strings.
safeJsonParse in admin API — serializeKey and audit-log metadata handlers in admin.ts now use a safeJsonParse() wrapper instead of bare JSON.parse. Corrupt allowed_models or metadata values in D1 (e.g., from a manual DB edit or partial write) return null instead of causing a 500 response.
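
On the receiving side, the X-BVE-Signature check described above could be implemented along these lines. This is a sketch for a Node.js receiver, assuming the hex digest is lowercase; the function name is hypothetical, and the raw request body must be used exactly as received (before any JSON parsing or re-serialization).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verifies a header of the form "sha256=<hmac-sha256-hex>" against the raw body.
function verifyBveSignature(secret: string, rawBody: string, header: string): boolean {
  const prefix = "sha256=";
  if (!header.startsWith(prefix)) return false;

  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(header.slice(prefix.length), "utf8");

  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The constant-time comparison avoids leaking how many leading characters of the signature matched.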

Bug fixes

incrementCachedTokenTotal TTL bug — the monthly token cache in src/services/quota.ts was resetting cachedAt to Date.now() on every post-request write, defeating the 30-second TTL. For high-traffic keys the cache would never expire, causing each Worker isolate to accumulate stale totals indefinitely while the D1 DB accumulated tokens from all isolates. Fixed: increments now preserve the original cachedAt from the D1-read timestamp. The TTL now correctly measures from the last D1 read, not the last write.
revokeApiKey double-revocation guard — revokeApiKey had no status filter, so calling POST /admin/api-keys/:id/revoke on an already-revoked key would overwrite revokedAt (corrupting the audit trail) and insert a duplicate audit log entry. Now uses ne(status, 'revoked') in the WHERE clause, consistent with suspendApiKey/unsuspendApiKey. A second revocation returns 404 “API key not found or already revoked”.
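
The incrementCachedTokenTotal fix above is easiest to see in a reduced form. The sketch below is not the code in src/services/quota.ts: the map, the helper signatures, and the explicit `now` parameter are assumptions chosen to show why preserving cachedAt matters.

```typescript
type CachedTotal = { total: number; cachedAt: number };

const TTL_MS = 30_000;
const tokenCache = new Map<string, CachedTotal>();

// Called after a D1 read: this is the only place cachedAt is set.
function storeFromD1(key: string, total: number, now: number): void {
  tokenCache.set(key, { total, cachedAt: now });
}

// Called after each request: bumps the total but keeps the original cachedAt,
// so the TTL still measures time since the last D1 read.
function incrementCachedTokenTotal(key: string, tokens: number): void {
  const entry = tokenCache.get(key);
  if (!entry) return;
  tokenCache.set(key, { total: entry.total + tokens, cachedAt: entry.cachedAt });
}

// Returns undefined once the entry is older than the TTL, forcing a fresh D1 read.
function readCachedTotal(key: string, now: number): number | undefined {
  const entry = tokenCache.get(key);
  if (!entry || now - entry.cachedAt >= TTL_MS) return undefined;
  return entry.total;
}
```

Had the increment written a new timestamp instead, a key receiving traffic more often than every 30 s would never expire from the cache, which is the bug the entry describes.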

Provider compatibility

x-ratelimit-* headers forwarded to clients — the six standard OpenAI rate-limit headers (x-ratelimit-limit-requests, x-ratelimit-limit-tokens, x-ratelimit-remaining-requests, x-ratelimit-remaining-tokens, x-ratelimit-reset-requests, x-ratelimit-reset-tokens) and retry-after are now in SAFE_UPSTREAM_HEADERS. Previously stripped, they are now forwarded to clients and exposed via CORS so SDK-based clients (OpenAI Python/TypeScript SDK, Groq SDK) can implement pre-emptive rate-limit backoff.
Anthropic, Groq, and OpenAI provider-native headers forwarded — anthropic-ratelimit-requests-{limit,remaining,reset}, anthropic-ratelimit-tokens-{limit,remaining,reset}, x-groq-request-id, and openai-processing-ms added to SAFE_UPSTREAM_HEADERS and CORS expose headers. Anthropic SDK clients using /v1/messages now receive Anthropic-native rate-limit feedback; Groq clients can correlate requests via Groq’s request ID.

Observability

Model field in all structured request logs — previously, streaming requests (/v1/chat/completions, /v1/messages, /v1/responses) produced log lines with no model field because the model was only stored in context from the response body, which is unavailable synchronously for streamed responses. Now modelFilter stores the request model in Hono context before the route handler runs. Every structured log line includes model regardless of streaming/non-streaming.

2026-05-22 (improvement loop, iterations 24–41)

Token quota enforcement

monthly_token_limit now enforced — Previously stored in D1 but never checked at request time. Now enforced as a pre-flight check before the Durable Object call: if accumulated total_tokens for the current month meets or exceeds the key’s limit, the request is rejected with 429 rate_limit_exceeded and a Retry-After header pointing to the first second of next month.
Per-key in-memory TTL cache for monthly token totals — The D1 query for the monthly token total is now cached in-memory per key per month with a 30 s TTL. After the first request, subsequent requests from the same key skip the D1 roundtrip entirely (within the same Worker isolate and TTL window).

Model list filtering

GET /v1/models filtered by key and global allowlist — Previously the endpoint returned the full Fuelix model list to every authenticated caller regardless of restrictions. Now the response is filtered: globally blocked models (those with enabled: false in the model allowlist) and models not in the key’s per-key allowlist are removed before the response is returned. Clients that discover models via /v1/models now only see models they are permitted to call.

OpenAI API compliance

param field added to all error responses — The OpenAI error schema defines four fields: message, type, param, and code. Previously every gateway error was missing param. All errors now include param: null or a field-specific value (e.g., param: 'model' for model filter rejections, param: 'name' for admin validation errors). Strict OpenAI SDK clients that check error.param now receive the correct type.
Retry-After header on all 429 responses — All rate-limit rejections now include Retry-After: <seconds> so clients can back off correctly. The value is computed from the DO window’s resetAt timestamp for RPM/RPD/monthly-request limits, and from the start of next month for monthly-token-limit rejections. The header is always a positive integer (clamped to at least 1).
Validation errors use standard gateway error shape — Admin API schema validation failures now return { error: { message, type: "invalid_request_error", code: "validation_error", param } } instead of the validator library’s default shape.
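
The Retry-After computation described above reduces to a small amount of date arithmetic. Both function names are hypothetical, and rounding up to whole seconds is an assumption; the clamp to at least 1 and the "start of next month" target come from the entry.

```typescript
// Seconds until a window resets, always a positive integer (clamped to at least 1).
function retryAfterSeconds(resetAtMs: number, nowMs: number): number {
  return Math.max(1, Math.ceil((resetAtMs - nowMs) / 1000));
}

// First millisecond of the next UTC month. Date.UTC normalizes month 12 into
// January of the following year, so December needs no special case.
function startOfNextMonthUtc(nowMs: number): number {
  const d = new Date(nowMs);
  return Date.UTC(d.getUTCFullYear(), d.getUTCMonth() + 1, 1);
}
```

For a monthly-token-limit rejection the header value would be `retryAfterSeconds(startOfNextMonthUtc(now), now)`; for RPM/RPD rejections the first argument would be the Durable Object window's resetAt.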

Security

Strip additional provider auth injection headers — proxyToFuelix now strips x-goog-api-key (Google Gemini), anthropic-auth-token, and x-anthropic-auth-token from client requests, in addition to the previously stripped x-api-key, openai-organization, openai-project, and x-basicllm-*. Prevents malicious callers from injecting their own provider credentials.
Content-Security-Policy: default-src 'none' — Prevents browsers from executing any gateway response as HTML, JS, or any other resource type.
Permissions-Policy: interest-cohort=() — Opts the gateway out of Privacy Sandbox FLoC cohort calculation.
Cache-Control: no-store — Prevents intermediary proxies and corporate HTTP caches from storing authenticated API responses (auth errors, admin responses, quota rejections, model output).

Observability

Enriched structured request logs — The per-request log line now includes:
- model — the model name extracted from the upstream response body (non-streaming routes)
- upstreamStatus — the HTTP status code from Fuelix (distinguishes upstream failures from gateway rejections in logs)
- cfRay — the Cloudflare CF-Ray header for cross-referencing with edge infrastructure logs
Webhook notifications for rate-limit alerts — When WEBHOOK_URL is set as a Worker secret, the queue consumer POSTs a structured JSON notification ({ gateway, event, ...payload, timestamp }) to the URL on every rate_limit_exceeded alert event. Network and non-2xx failures are logged and do not block ack.

Performance

In-memory TTL cache for global model allowlist — modelFilter middleware previously called findModelInAllowlist(db, model) on every JSON API request. Now uses a module-scope Map with a 30 s TTL. D1 is only queried on cold miss or after TTL expiry.
ApiKeyLimiter state already cached — Confirmed as carried over from iteration 7: the Durable Object caches State in-memory after cold start, eliminating one storage roundtrip per request for warm DO instances.

New routes

GET /v1/responses/:id — Retrieve a stored response by ID (proxied to Fuelix via genericProxy).
DELETE /v1/responses/:id — Delete a stored response by ID (proxied to Fuelix via genericProxy).
Responses API token tracking — POST /v1/responses was promoted from genericProxy (always 0,0) to a dedicated handler that extracts input_tokens/output_tokens from both non-streaming JSON responses and the response.completed SSE event.

Provider compatibility

Gemini token format — parseUsageFromJson now extracts usageMetadata.promptTokenCount / usageMetadata.candidatesTokenCount from Gemini API responses. Model extracted from modelVersion field (Gemini-specific) as a fallback after the standard model field.
Cohere v2 token format — Extracts from usage.tokens.input_tokens / usage.tokens.output_tokens (preferred) and usage.billed_units.input_tokens / usage.billed_units.output_tokens (fallback).
Cohere v1 token format — Extracts from meta.billed_units.input_tokens / meta.billed_units.output_tokens.
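
The three provider formats above can be read with one path-based lookup. This sketch covers only the Gemini and Cohere shapes listed here, not the standard OpenAI `usage` shape that parseUsageFromJson presumably also handles; the helper names and the order in which formats are tried are assumptions.

```typescript
type Usage = { inputTokens: number; outputTokens: number };

// Safely walks a nested object; returns undefined if any segment is missing.
function getPath(obj: unknown, ...path: string[]): unknown {
  let cur: unknown = obj;
  for (const key of path) {
    if (typeof cur !== "object" || cur === null) return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}

function asNumber(v: unknown): number | undefined {
  return typeof v === "number" ? v : undefined;
}

// Hypothetical stand-in for the provider-specific part of usage parsing.
function parseProviderUsage(body: unknown): Usage | null {
  const candidates: Array<[string[], string[]]> = [
    // Gemini
    [["usageMetadata", "promptTokenCount"], ["usageMetadata", "candidatesTokenCount"]],
    // Cohere v2 (preferred, then fallback)
    [["usage", "tokens", "input_tokens"], ["usage", "tokens", "output_tokens"]],
    [["usage", "billed_units", "input_tokens"], ["usage", "billed_units", "output_tokens"]],
    // Cohere v1
    [["meta", "billed_units", "input_tokens"], ["meta", "billed_units", "output_tokens"]],
  ];
  for (const [inPath, outPath] of candidates) {
    const input = asNumber(getPath(body, ...inPath));
    const output = asNumber(getPath(body, ...outPath));
    if (input !== undefined && output !== undefined) {
      return { inputTokens: input, outputTokens: output };
    }
  }
  return null;
}
```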

Bug fixes

emulateCompletions — clean 502 on unexpected upstream shape — If Fuelix returned valid JSON that did not match the expected { id, choices[] } shape (e.g., an error object), accessing chat.choices.map(...) threw a TypeError caught by the global error handler as a generic 500. Now validated immediately after parsing; unexpected shapes return a structured 502 upstream_error.
GET /v1/models now excludes globally blocked models — Models with enabled: false in the model allowlist no longer appear in the model list for any caller.
Sampled request logs restored for non-binary generic passthrough routes — GET/DELETE /v1/responses/:id, Assistants, Threads, and Vector Stores now participate in the 1% request_logs_sampled flow again. File, audio, and image passthrough routes remain excluded.
/v1/models* now participates in async usage accounting and sampled request logging — Model list/detail requests once again increment D1 request_count, emit sampled request log rows, and include X-BVE-Latency on successful responses.
/openapi.json now matches the live route surface again — Added missing public/admin endpoints (Messages, Responses detail, Audio Transcriptions, Images, Vector Stores, and the newer Admin API routes), fixed the documented error schema to include param, and stopped advertising thread list/run-list methods that are not part of the published API contract.
Unsupported verbs no longer proxy through broad route matchers — /v1/messages, /v1/files*, and the base /v1/threads* / /v1/vector_stores* routes now reject undocumented methods with 404 route_not_found instead of forwarding them upstream. The documented wildcard thread/vector-store subresource tree still proxies normally.

Global request cap

Per-isolate monthly request cap guard — MONTHLY_WORKER_REQUEST_SOFT_CAP and MONTHLY_WORKER_REQUEST_HARD_CAP (configured in wrangler.jsonc) are now actively enforced by the globalRateLimiter middleware. Each Worker isolate maintains its own request counter; at the hard cap the middleware returns 503 capacity_exceeded and logs a structured error. At the soft cap it logs a warning and continues. The counter resets at UTC month change. Note: enforcement is per-isolate, not cluster-wide.
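
The per-isolate behaviour above can be illustrated with a small counter. This is a sketch under assumptions, not the globalRateLimiter middleware: the class name, the decision values, the exact boundary semantics (warn once the count exceeds the soft cap, reject once it has reached the hard cap), and not counting rejected requests are all illustrative choices.

```typescript
type CapDecision = "allow" | "warn" | "reject";

// UTC year-month key used to detect a month change.
function monthKey(d: Date): string {
  return `${d.getUTCFullYear()}-${d.getUTCMonth()}`;
}

class IsolateRequestCap {
  private count = 0;
  private month: string;
  private softCap: number;
  private hardCap: number;

  constructor(softCap: number, hardCap: number, now: Date) {
    this.softCap = softCap;
    this.hardCap = hardCap;
    this.month = monthKey(now);
  }

  check(now: Date): CapDecision {
    const current = monthKey(now);
    if (current !== this.month) {
      // Counter resets at UTC month change.
      this.month = current;
      this.count = 0;
    }
    if (this.count >= this.hardCap) return "reject"; // would map to 503 capacity_exceeded
    this.count += 1;
    return this.count > this.softCap ? "warn" : "allow";
  }
}
```

Because each isolate holds its own instance, the effective cluster-wide total can exceed the configured caps, which is the caveat the entry notes.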

2026-05-22 (improvement loop, iterations 12–13)

New admin API endpoints

POST /admin/api-keys/:id/rotate — Generates a new raw key value and immediately invalidates the old one. The key ID, status, rate limits, and model allowlist are preserved. Only active and suspended keys can be rotated (revoked keys return 404). The new key is returned once in the response; the old key stops working immediately.

Docs

GET /admin/audit-logs ?action= filter documented — The endpoint has accepted an ?action= query parameter since its introduction, but this was not documented. The audit-logs reference page now includes the ?action= parameter and an api_key.rotated entry in the action types table.

2026-05-22 (improvement loop, iterations 7–23)

New admin API endpoints

GET /admin/api-keys/:id — retrieve a single key by ID without listing all keys
PATCH /admin/api-keys/:id — partial update of name, rate limits, and allowed_models
POST /admin/api-keys/:id/suspend — suspend an active key (reversible; requests made with a suspended key return 403)
POST /admin/api-keys/:id/unsuspend — re-activate a suspended key
GET /admin/audit-logs — paginated audit log with optional ?target_id= filter
GET /admin/request-logs — sampled request log (1% sample rate) with ?key_id= filter
GET /admin/model-allowlist — list gateway-wide model registry entries
POST /admin/model-allowlist — add or update a model allowlist entry (idempotent upsert)
DELETE /admin/model-allowlist/:model — remove a model from the global registry

Model filtering

Per-key model allowlist enforced — the allowed_models array on API keys is now enforced at request time. Requests for disallowed models return 403 model_not_allowed. Previously this field was stored but never checked.
Global model allowlist — operators can now disable a model gateway-wide regardless of per-key settings. A model registered with enabled: false returns 403 model_not_available for all keys.

Token usage tracking

Streaming chat completions — the SSE stream is teed; the usage field in the final SSE chunk is parsed and recorded in D1 via ctx.waitUntil without blocking the response.
Anthropic Messages API (non-streaming) — usage.input_tokens / usage.output_tokens extracted and recorded.
Anthropic Messages API (streaming) — message_start and message_delta SSE events parsed for Anthropic-format token counts.
Legacy /v1/completions — token counts extracted from the synthetic chat completion response used for emulation.

Sampled request logging

All buffered (non-streaming) routes now log 1% of requests to the request_logs_sampled D1 table with model, endpoint, status, token counts, and latency.
Streaming /v1/chat/completions and /v1/messages extract the model name from the first SSE chunk and also participate in 1% sampled logging.
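
A 1% sample can be decided per request with a single random draw. The sketch below assumes uniform random sampling; the changelog does not say how the gateway picks requests, and shouldSampleRequest and REQUEST_LOG_SAMPLE_RATE are hypothetical names.

```typescript
// Assumption: the 1% rate is applied as an independent random draw per request.
const REQUEST_LOG_SAMPLE_RATE = 0.01;

function shouldSampleRequest(
  rate: number = REQUEST_LOG_SAMPLE_RATE,
  random: () => number = Math.random, // injectable so tests can be deterministic
): boolean {
  return random() < rate;
}
```

Injecting the random source keeps the decision testable; in a handler the call would simply be shouldSampleRequest() before writing to request_logs_sampled.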

Security

Upstream error suppression — Fuelix 5xx response bodies are no longer forwarded to callers. The client always receives "Upstream service error" while the full body is logged internally with requestId.
Security response headers — X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer are now set on all responses.
Validation error shape — admin API schema validation failures now return the standard { error: { message, type, code: "validation_error" } } envelope instead of the validator library’s default shape.
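
The upstream error suppression described above might look roughly like the following. Only the "Upstream service error" message and the { error: { message, type, code } } envelope come from this changelog; the function name, the upstream_error type and code values, and the choice to pass the upstream status through are assumptions.

```typescript
interface ErrorEnvelope {
  error: { message: string; type: string; code: string };
}

interface UpstreamResult {
  status: number;
  body: string;
}

type LogFn = (entry: Record<string, unknown>) => void;

// Hypothetical helper: hides Fuelix 5xx bodies from callers but keeps them in logs.
function suppressUpstreamError(
  upstream: UpstreamResult,
  requestId: string,
  log: LogFn,
): UpstreamResult {
  if (upstream.status < 500) return upstream; // non-5xx responses pass through

  // The full upstream body goes to the internal log only, keyed by requestId.
  log({
    level: "error",
    type: "upstream_error", // assumed log type
    requestId,
    status: upstream.status,
    body: upstream.body,
  });

  const envelope: ErrorEnvelope = {
    error: {
      message: "Upstream service error",
      type: "upstream_error", // assumed value
      code: "upstream_error", // assumed value
    },
  };
  return { status: upstream.status, body: JSON.stringify(envelope) };
}
```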

Observability

Structured request logger — every request emits a structured JSON log line ({ level, type, requestId, method, path, status, latencyMs, keyId? }) consumable by wrangler tail or Cloudflare Logpush.
Request ID forwarding — the gateway’s per-request UUID is forwarded as X-Request-Id to Fuelix on all routes for cross-service log correlation.
Structured error logs — unhandled exceptions are now logged as { level: "error", type: "unhandled_error", requestId, message, stack }.
request_id in 500 bodies — 500 responses include the request UUID in error.request_id for support reporting.
Queue consumer logging — the EVENTS_QUEUE consumer emits structured JSON logs; alert events route to console.error for higher-severity Logpush filtering.
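
For reference, a request log line with the fields listed above could be built as in this sketch. The field names come from the entry above; the function name formatRequestLog and the literal values "info" and "request" for level and type are assumptions.

```typescript
interface RequestLogFields {
  requestId: string;
  method: string;
  path: string;
  status: number;
  latencyMs: number;
  keyId?: string; // absent for unauthenticated requests
}

// Hypothetical helper: returns one JSON log line per request.
function formatRequestLog(fields: RequestLogFields): string {
  const line: Record<string, unknown> = {
    level: "info", // assumed value
    type: "request", // assumed value
    requestId: fields.requestId,
    method: fields.method,
    path: fields.path,
    status: fields.status,
    latencyMs: fields.latencyMs,
  };
  // keyId is optional, so it is only emitted when known.
  if (fields.keyId !== undefined) {
    line["keyId"] = fields.keyId;
  }
  return JSON.stringify(line);
}
```

Emitting exactly one JSON object per line keeps the output filterable in wrangler tail and Logpush without extra parsing rules.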

Performance

ApiKeyLimiter state caching — the Durable Object now caches its State in memory and skips a storage.get() call on all requests after cold start, eliminating one storage roundtrip per request for warm DO instances.
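
The caching pattern can be sketched independently of the Durable Objects runtime. The KvStorage interface below is a stand-in for the Durable Object storage API, and LimiterState, the "state" key, and the class name are hypothetical; the actual ApiKeyLimiter may structure this differently.

```typescript
// Hypothetical state shape for a per-key rate limiter.
interface LimiterState {
  windowStart: number;
  count: number;
}

// Stand-in for the subset of Durable Object storage used here.
interface KvStorage {
  get(key: string): Promise<LimiterState | undefined>;
  put(key: string, value: LimiterState): Promise<void>;
}

class CachedLimiterState {
  private readonly storage: KvStorage;
  private cached: LimiterState | undefined = undefined;

  constructor(storage: KvStorage) {
    this.storage = storage;
  }

  // Reads storage only on the first call after a cold start.
  async load(): Promise<LimiterState> {
    if (this.cached === undefined) {
      const stored = await this.storage.get("state");
      this.cached = stored ?? { windowStart: 0, count: 0 };
    }
    return this.cached;
  }

  // Write-through: update memory first, then persist.
  async save(next: LimiterState): Promise<void> {
    this.cached = next;
    await this.storage.put("state", next);
  }
}
```

The in-memory copy is only valid for the lifetime of the instance; after eviction the next request pays the single storage read again.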

Bug fixes

/v1/completions Content-Length mismatch — the emulated completions path was forwarding the original Content-Length to the synthetic chat request, causing Fuelix to truncate the body and return JSONDecodeError. Fixed by stripping the header before forwarding.
Request ID forwarding for /v1/completions and /v1/models — emulateCompletions and fetchModels now thread the gateway request ID through to proxyToFuelix.
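
A sketch of the header handling behind both fixes: when the gateway rewrites a /v1/completions body into a chat request, the original Content-Length no longer matches, so it is dropped and the request ID is attached. Headers are modeled as name/value pairs to keep the example runtime-independent; headersForSyntheticRequest is a hypothetical name, not the gateway's function.

```typescript
type HeaderList = ReadonlyArray<readonly [string, string]>;

// Hypothetical helper: prepares headers for the synthetic chat request.
function headersForSyntheticRequest(
  original: HeaderList,
  requestId: string,
): Array<[string, string]> {
  const out: Array<[string, string]> = [];
  for (const [name, value] of original) {
    const lower = name.toLowerCase();
    // The rewritten body has a different byte length, so the caller's
    // Content-Length must not be forwarded.
    if (lower === "content-length") continue;
    // Drop any caller-supplied request ID; the gateway sets its own below.
    if (lower === "x-request-id") continue;
    out.push([name, value]);
  }
  out.push(["X-Request-Id", requestId]);
  return out;
}
```

Leaving Content-Length unset lets the HTTP client compute it from the new body, which avoids the truncation described above.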

2026-05-22 (improvement loop)

Security fixes

Header injection prevention — proxyToFuelix now strips client-injected x-api-key, openai-organization, openai-project, and all x-basicllm-* headers before forwarding to Fuelix. Previously a malicious caller could inject an alternate Anthropic or OpenAI auth header and potentially route the request under their own account.
Queue log sanitization — The queue handler no longer serializes body.payload to console.log, reducing exposure of API key IDs and request metadata in Cloudflare log streams.
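
The header stripping described above amounts to a case-insensitive filter. The blocked names and the x-basicllm- prefix are taken from the entry above; the function name and the pair-list representation of headers are assumptions made to keep the sketch self-contained.

```typescript
type HeaderList = ReadonlyArray<readonly [string, string]>;

const BLOCKED_HEADERS = ["x-api-key", "openai-organization", "openai-project"];
const BLOCKED_PREFIX = "x-basicllm-";

// Hypothetical helper: removes client-injected auth and routing headers
// before a request is forwarded upstream.
function stripClientInjectedHeaders(headers: HeaderList): Array<[string, string]> {
  const kept: Array<[string, string]> = [];
  for (const [name, value] of headers) {
    const lower = name.toLowerCase(); // header names are case-insensitive
    if (BLOCKED_HEADERS.indexOf(lower) !== -1) continue;
    if (lower.indexOf(BLOCKED_PREFIX) === 0) continue;
    kept.push([name, value]);
  }
  return kept;
}
```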

Bug fixes

Real token usage tracking — Token counts are now extracted from Fuelix responses and recorded in D1 for all endpoints. Previously every request recorded 0, 0 for prompt_tokens and completion_tokens. Changes:
- Non-streaming /v1/chat/completions — response body buffered; usage.prompt_tokens / usage.completion_tokens extracted.
- Streaming /v1/chat/completions — response stream is teed; SSE data: lines are scanned for a usage field; ctx.waitUntil records counts without blocking the response.
- /v1/embeddings — response body buffered; usage extracted.
- /v1/completions — emulated response buffered; usage extracted from the synthetic chat completion.
- /v1/messages (Anthropic) — non-streaming response buffered; usage.input_tokens / usage.output_tokens extracted. Streaming responses still record 0, 0 (Anthropic SSE uses a different event schema).
Anthropic token format support — parseUsageFromJson now falls back to usage.input_tokens / usage.output_tokens when the OpenAI prompt_tokens / completion_tokens fields are absent, enabling correct tracking across both OpenAI-format and Anthropic-format responses.
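
The fallback in parseUsageFromJson can be illustrated as follows. Only the function name and the field names come from this changelog; the signature, the return shape, and the default of 0 when no usage is present are assumptions.

```typescript
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function toCount(value: unknown): number | undefined {
  return typeof value === "number" && isFinite(value) ? value : undefined;
}

// Sketch of the fallback: prefer OpenAI field names, then Anthropic ones.
function parseUsageFromJson(body: unknown): TokenUsage {
  const usage = isRecord(body) ? body["usage"] : undefined;
  if (!isRecord(usage)) {
    return { promptTokens: 0, completionTokens: 0 };
  }
  return {
    promptTokens:
      toCount(usage["prompt_tokens"]) ?? toCount(usage["input_tokens"]) ?? 0,
    completionTokens:
      toCount(usage["completion_tokens"]) ?? toCount(usage["output_tokens"]) ?? 0,
  };
}
```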

CORS accuracy

Anthropic-Version added to Access-Control-Allow-Headers — browser clients calling /v1/messages no longer receive a preflight rejection when sending this header.
X-Quota-Allowed, X-Quota-Available, and X-Quota-Reset added to Access-Control-Expose-Headers — browser JavaScript can now read upstream quota headers.

2026-05-22 (discovery session 2)

Confirmed surface (live probe against api.fuelix.ai/v1):

103 models available including GPT-5.4, Claude Sonnet 4.6, Gemini 3.x, Llama 4, imagen-4
All previously documented endpoints remain supported
POST /responses confirmed working (requires max_output_tokens >= 16)
Anthropic Messages API (POST /messages) confirmed returning Anthropic-format responses

Docs added:

Audio (TTS + transcriptions)
Images (generations + edits)
Files API
Assistants API (assistants + threads + messages + runs)
Vector Stores
Anthropic Messages API
Responses API
Deployment Notes

2026-05-22 (initial)

Initial discovery and implementation

Discovery: Live probing of https://api.fuelix.ai/v1 via scripts/fuelix-mega-discovery.ts.

Supported endpoints confirmed:

| Endpoint | Notes |
| --- | --- |
| `GET /models` | Full model list |
| `GET /models/:id` | Individual model details |
| `POST /chat/completions` | All models; streaming SSE supported |
| `POST /embeddings` | `float` and `base64` encoding, `dimensions` param |
| `POST /audio/speech` | TTS; binary audio stream |
| `POST /audio/transcriptions` | Whisper; multipart/form-data |
| `POST /images/generations` | imagen-3, imagen-3-fast (not dall-e-3) |
| `POST /images/edits` | multipart/form-data |
| `GET/POST/DELETE /files*` | Shared upstream account scope |
| `GET/POST/DELETE /assistants*` | Assistants v2; shared scope |
| `GET/POST/DELETE /threads*` | Full thread/message/run tree; shared scope |
| `GET/POST/DELETE /vector_stores*` | Vector knowledge bases; shared scope |
| `POST /responses` | GPT models only; `max_output_tokens >= 16` |
| `POST /messages` | Anthropic Messages API; Anthropic-format response |

Emulated:

| Endpoint | Notes |
| --- | --- |
| `POST /completions` | Emulated via `/chat/completions`; streaming rejected with 400 |

Unsupported (returns 404):

POST /audio/translations
POST /images/variations
POST /moderations
GET /batches
GET /fine_tuning/jobs
POST /rerank
GET /usage
GET /realtime

Chat completion accepted params:

temperature, top_p, stop, presence_penalty, frequency_penalty, user, seed, n, response_format (text/json_object/json_schema), logprobs, top_logprobs, logit_bias, tools, tool_choice, stream, stream_options, reasoning_effort, thinking, service_tier, store, modalities, max_tokens, max_completion_tokens

Architecture implemented:

Cloudflare Workers (Hono routing)
Cloudflare D1 (SQLite) for API key storage and usage tracking
Cloudflare Durable Objects (ApiKeyLimiter) for per-key rate limiting
Cloudflare Queues (bve-gateway-events) for async audit events
SHA-256 + pepper key hashing (raw keys never stored)
Header allowlist filtering (internal Fuelix headers stripped)
Body size limit (10 MB)
JSONDecodeError normalization (Fuelix 500 → gateway 400)
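
The last item, JSONDecodeError normalization, could be implemented along these lines. This is a sketch under assumptions: detection by substring match on the upstream body, the function name, the error message text, and the invalid_request_error type are all hypothetical; the invalid_json code follows the error code referenced in this project's error docs.

```typescript
interface GatewayResponse {
  status: number;
  body: string;
}

// Hypothetical helper: a Fuelix 500 caused by an unparsable request body is
// really a client error, so it is reported to the caller as a 400.
function normalizeJsonDecodeError(upstream: GatewayResponse): GatewayResponse {
  const isDecodeFailure =
    upstream.status === 500 && upstream.body.indexOf("JSONDecodeError") !== -1;
  if (!isDecodeFailure) return upstream;

  return {
    status: 400,
    body: JSON.stringify({
      error: {
        message: "Request body is not valid JSON", // assumed wording
        type: "invalid_request_error", // assumed value
        code: "invalid_json",
      },
    }),
  };
}
```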

Changelog

2026-06-03 (docs correction)

feat(dashboard): add optional admin-only public catalog drift probe to the Models page

docs(dashboard): align Models page docs with the split live catalog / policy row / snapshot maintenance contract

2026-05-31 (improvement loop, iteration 828)

docs(errors): fix invalid_json description + add invalid_content_type error code

2026-05-30 (improvement loop, iteration 823)

docs(admin/api-keys): document GET /admin/api-keys/:id/usage per-key usage history endpoint

2026-05-30 (improvement loop, iteration 822)

feat(models): implement ?sort_by=, ?sort_dir=, and ?limit= on GET /v1/models

2026-05-30 (improvement loop, iteration 810)

fix(scheduled): add monthly_usage cleanup to daily cron handler (24-month retention)

2026-05-30 (improvement loop, iteration 808)

smoke(?tool_use=): add ?tool_use= filter smoke test coverage + bve_tool_use admin allowlist assertion

2026-05-30 (improvement loop, iteration 807)

feat(admin): add bve_tool_use annotation to GET /admin/model-allowlist and GET /admin/model-allowlist/:model

2026-05-30 (improvement loop, iteration 805)

docs(admin): document GET /admin/api-keys/:id/audit in the API Keys reference

2026-05-30 (improvement loop, iteration 802)

refactor(scheduled): extract pushAlert helper + add audit_log and daily_usage cleanup

2026-05-30 (improvement loop, iteration 800)

feat(admin): add GET /admin/api-keys/:id/audit per-key audit log endpoint

2026-05-30 (improvement loop, iteration 798)

docs(key-stats): document top_models array on GET /admin/key-stats

2026-05-30 (improvement loop, iteration 792)

docs(models): document bve_audio_input annotation and ?audio_input= filter

2026-05-30 (improvement loop, iteration 791)

obs(logger): add quota limit fields to 429 rate_limit_exceeded log entries

2026-05-30 (improvement loop, iterations 788–790)

feat(models): add bve_audio_input annotation and ?audio_input= filter to GET /v1/models

feat(admin): add top_models field to GET /admin/key-stats

fix(models): add category-specific hints to model_endpoint_mismatch errors

2026-05-30 (improvement loop, iteration 787)

docs(home): add Observability and Troubleshooting to “Popular topics” + Observability sidebar badge

2026-05-30 (improvement loop, iteration 779)

smoke(models): add ?web_search= filter coverage + admin allowlist bve_vision/bve_web_search checks

2026-05-30 (improvement loop, iteration 777)

docs(observability): add Observability & Structured Logs guide

2026-05-30 (improvement loop, iteration 776)

docs(curl-examples): add streaming web search cURL example

2026-05-30 (improvement loop, iteration 769)

docs(curl-examples): add web search section

2026-05-30 (improvement loop, iteration 762)

docs: add Claude extended thinking documentation + reasoning cURL examples

2026-05-30 (improvement loop, iteration 759)

docs(curl-examples): add function calling and audio output sections

2026-05-30 (improvement loop, iterations 752–758)

docs(curl-examples): add vision chat completion section

feat(observability): log httpVersion from cf.httpProtocol

feat(validation): validate input_audio content blocks in chat completions

2026-05-30 (improvement loop, iterations 741–751)

docs(curl-examples): add model filter examples and fix allowlist response shape

feat(model-filter): mark Llama 4 Maverick and Scout as vision-capable

feat(scripts): add --expired, --since, --until flags to key:list

refactor(usage): consolidate streaming usage recorder; remove dead code

fix(security): validate openai-processing-ms header + thinking_config.include_thoughts

docs(model-allowlist): add bve_vision to allowlist response examples and field tables

2026-05-29 (improvement loop, iteration 740)

docs(audit-logs): add before/after fields to api_key.updated metadata

2026-05-29 (improvement loop, iterations 729–737)

fix(images): set X-BVE-Model response header on POST /v1/images/generations success path

perf(usage): combine recordUsage + logRequestSampled into a single D1 batch

docs: add Response headers sections to images.mdx, audio.mdx, and legacy-completions.mdx

docs: add structured Gateway validation tables to images.mdx

2026-05-29 (improvement loop, iteration 722)

fix(models): correct bve_vision detection for Claude 3.5 Haiku

fix(logger): correct msToIsoString midnight-crossing timestamp bug

docs: document bve_vision/?vision= filter and is_expired field

2026-05-29 (improvement loop, iteration 708)

docs(response-headers): document X-BVE-Client-Id response header

2026-05-29 (improvement loop, iteration 697)

docs(admin): LinkCard grids for five admin API “Next steps” sections

2026-05-29 (improvement loop, iteration 690)

docs(chat-completions): document provider OpenRouter routing object

2026-05-29 (improvement loop, iterations 644–669)

docs(chat-completions): document transforms OpenRouter parameter

feat(observability): extract cachedTokens and reasoningTokens from OpenAI usage

feat(models): 9 new OpenAI audio-preview and search-preview model IDs

feat(validation): reject n > 1 with stream: true for chat and legacy completions

feat(admin): has_more pagination field on GET /admin/api-keys

security(redact): Stripe and SendGrid key patterns added to `redactSecrets`

feat(admin): `?endpoint=` filter for `GET /admin/model-stats`

feat(admin): `?endpoint=` filter for `GET /admin/key-stats`

fix(security): redact model field in `modelFilter` warn logs

docs(api-reference): `service_tier` — add `"flex"` as a valid value

docs(api-reference): add `unsupported_value` to errors reference

fix(validation): `response_format.json_schema.strict` and `.schema` type guards

fix(scripts): `parseDevVars` deduplicated; quote-stripping for `.dev.vars` values

feat(scripts): `--help` / `-h` flags for quota and key-management CLI scripts

docs(intro+sdk): fix `/v1/completions` streaming claim; expand SDK provider table

fix(dashboard): `overflow-x-auto` added to model stats table card

docs(claude): add `account.ts` to CLAUDE.md source layout

docs(admin): `sort_by` and `sort_dir` documented for `GET /admin/endpoint-stats` and `GET /admin/key-stats`

feat(admin): `sort_by` and `sort_dir` params added to `GET /admin/model-stats`

fix(security): `redactSecrets` applied to `queue_send_failed` error messages in key provisioning

fix(models): D1-allowlisted models excluded from `models_unregistered_upstream` log

fix(security): Cerebras `csk-` key pattern added to `redactSecrets`

fix(queries): renamed keys no longer appear as duplicate rows in `/admin/key-stats`

feat(dashboard/usage): `maxLatencyMs` surfaced in admin dashboard Key Stats and Endpoint Stats

feat(logger): `X-BVE-Worker` response header

feat(dashboard): `bveReasoning` badge in Models page

feat(openapi): `bve_reasoning` field in `/v1/models` response schemas

feat(openapi): complete `required[]` sweep across all admin API endpoints

perf(quota/keys): batch D1 reset for `window=all`; hoist `Date.now()` in `checkQuota`

fix(security): `password=` and `apikey=` query param redaction; Cerebras `csk-` key pattern

fix(validation): align numeric type-error messages to `'must be a number'`

docs: model-stats `?provider=` and `top_endpoints`; endpoint-stats `max_latency_ms`; key-stats `max_latency_ms`

feat(admin): expose `previous_month` in GET /admin/stats response

feat(v1/models): `?endpoint=` and `?provider=` query parameters

feat(v1/models): `bve_endpoints` and `bve_provider` BVE annotation fields

feat(v1/usage): `key_name`, `status`, `expires_at`, and `allowed_models` fields

feat(logger): `provider` field in structured request log