v2026.4.4: Kimi K2.6 available + K2.5 capability fix + Groq Kimi K2 retired

Added Moonshot's new Kimi K2.6 flagship (image + video + reasoning, 262K context) with direct Moonshot pricing. Fixed Kimi K2.5 on Moonshot — reasoning capability was previously unflagged in the catalog even though the model supports it. Removed the Groq mapping for Kimi K2 — Groq no longer serves moonshotai/kimi-k2-instruct.

osmAPI v2026.4.4 - Kimi K2.6

New model: Kimi K2.6

Moonshot released kimi-k2.6, the successor to K2.5. It supports image + video input, reasoning, and a 262,144-token context window.

Pricing (via Moonshot direct, per 1M tokens):

Tier Price
Input (cache hit) $0.16
Input (cache miss) $0.95
Output $4.00

You can now select kimi-k2.6 directly through the gateway, or let auto-routing pick it.

K2.5 capability fix

The Moonshot provider mapping for kimi-k2.5 was previously flagged as vision: true only. Moonshot's /v1/models endpoint reports K2.5 as vision + reasoning (and video input), and the model has always supported thinking by default — so the catalog was under-reporting its capabilities. We've set reasoning: true on the Moonshot K2.5 mapping so auto-routing and feature filters pick it up correctly for reasoning workloads. Pricing and context size are unchanged.

Groq mapping for Kimi K2 removed

Groq's /v1/models endpoint no longer returns moonshotai/kimi-k2-instruct, so the Groq provider mapping for the kimi-k2 model has been removed. Kimi K2 remains available through Novita, Moonshot direct, Cloudrift, and Nebius. No action required unless you were explicitly pinning ?provider=groq for kimi-k2.

DeepSeek V4 available, V3.2 direct mapping retired

DeepSeek's /v1/models endpoint now returns only two models: deepseek-v4-flash and deepseek-v4-pro. Both have been added to the catalog with 1M token context, reasoning, tools, and JSON output support.

Pricing (per 1M tokens):

Model Input (cache hit) Input (cache miss) Output
deepseek-v4-flash $0.028 $0.14 $0.28
deepseek-v4-pro $0.145 $1.74 $3.48

The old deepseek-v3.2 → DeepSeek direct mapping (deepseek-chat alias) has been removed since the API no longer lists it. deepseek-v3.2 remains available via Canopywave.

Reliability fix: DeepSeek V4 reasoning_content preservation

DeepSeek V4 Flash and V4 Pro have thinking mode enabled by default and reject requests where a prior assistant message is missing reasoning_content:

{"error":{"message":"The `reasoning_content` in the thinking mode must be passed back to the API.", ...}}

The gateway now preserves reasoning_content across multi-turn conversations for the deepseek provider (same treatment as Moonshot/Kimi). This fixes 400 errors seen from multi-turn tool-using clients (e.g., Claude Code) when routing to deepseek-v4-flash or deepseek-v4-pro. No caller changes required — the fix is centralized in the gateway's request preparation layer, so both the OpenAI-compatible (/v1/chat/completions) and Anthropic-compatible (/v1/messages) entry points are covered.

The same code path also normalises two additional DeepSeek V4 quirks so callers can keep using the standard OpenAI schema:

  • reasoning_effort values are mapped up to "high" (DeepSeek only accepts "high" or "max"; OpenAI-style minimal/low/medium would 400).
  • temperature, top_p, presence_penalty, and frequency_penalty are omitted from the upstream request because DeepSeek V4 thinking mode rejects them. max_tokens is still forwarded.