v2026.4.4: Kimi K2.6 available + K2.5 capability fix + Groq Kimi K2 retired
Added Moonshot's new Kimi K2.6 flagship (image + video + reasoning, 262K context) with direct Moonshot pricing. Fixed Kimi K2.5 on Moonshot — reasoning capability was previously unflagged in the catalog even though the model supports it. Removed the Groq mapping for Kimi K2 — Groq no longer serves moonshotai/kimi-k2-instruct.

New model: Kimi K2.6
Moonshot released kimi-k2.6, the successor to K2.5. It supports image + video input, reasoning, and a 262,144-token context window.
Pricing (via Moonshot direct, per 1M tokens):
| Tier | Price |
|---|---|
| Input (cache hit) | $0.16 |
| Input (cache miss) | $0.95 |
| Output | $4.00 |
You can now select kimi-k2.6 directly through the gateway, or let auto-routing pick it.
K2.5 capability fix
The Moonshot provider mapping for kimi-k2.5 was previously flagged as vision: true only. Moonshot's /v1/models endpoint reports K2.5 as vision + reasoning (and video input), and the model has always supported thinking by default — so the catalog was under-reporting its capabilities. We've set reasoning: true on the Moonshot K2.5 mapping so auto-routing and feature filters pick it up correctly for reasoning workloads. Pricing and context size are unchanged.
Groq mapping for Kimi K2 removed
Groq's /v1/models endpoint no longer returns moonshotai/kimi-k2-instruct, so the Groq provider mapping for the kimi-k2 model has been removed. Kimi K2 remains available through Novita, Moonshot direct, Cloudrift, and Nebius. No action required unless you were explicitly pinning ?provider=groq for kimi-k2.
DeepSeek V4 available, V3.2 direct mapping retired
DeepSeek's /v1/models endpoint now returns only two models: deepseek-v4-flash and deepseek-v4-pro. Both have been added to the catalog with 1M token context, reasoning, tools, and JSON output support.
Pricing (per 1M tokens):
| Model | Input (cache hit) | Input (cache miss) | Output |
|---|---|---|---|
| deepseek-v4-flash | $0.028 | $0.14 | $0.28 |
| deepseek-v4-pro | $0.145 | $1.74 | $3.48 |
The old deepseek-v3.2 → DeepSeek direct mapping (deepseek-chat alias) has been removed since the API no longer lists it. deepseek-v3.2 remains available via Canopywave.
Reliability fix: DeepSeek V4 reasoning_content preservation
DeepSeek V4 Flash and V4 Pro have thinking mode enabled by default and reject requests where a prior assistant message is missing reasoning_content:
{"error":{"message":"The `reasoning_content` in the thinking mode must be passed back to the API.", ...}}
{"error":{"message":"The `reasoning_content` in the thinking mode must be passed back to the API.", ...}}
The gateway now preserves reasoning_content across multi-turn conversations for the deepseek provider (same treatment as Moonshot/Kimi). This fixes 400 errors seen from multi-turn tool-using clients (e.g., Claude Code) when routing to deepseek-v4-flash or deepseek-v4-pro. No caller changes required — the fix is centralized in the gateway's request preparation layer, so both the OpenAI-compatible (/v1/chat/completions) and Anthropic-compatible (/v1/messages) entry points are covered.
The same code path also normalises two additional DeepSeek V4 quirks so callers can keep using the standard OpenAI schema:
reasoning_effortvalues are mapped up to"high"(DeepSeek only accepts"high"or"max"; OpenAI-styleminimal/low/mediumwould 400).temperature,top_p,presence_penalty, andfrequency_penaltyare omitted from the upstream request because DeepSeek V4 thinking mode rejects them.max_tokensis still forwarded.