v2026.4.1: New Fireworks Models, Free Gemma 4 & Moonshot Fixes

New Fireworks models (Qwen 3P6 Plus, Kimi K2.5), free Gemma 4 models with reasoning, removal of deprecated Gemma 3 models, and critical fixes for Moonshot and Fireworks provider handling.

We're shipping osmAPI v2026.4.1 with new models on Fireworks, free Gemma 4 with reasoning, and important reliability fixes for Moonshot Kimi K2.5, Fireworks, and the Anthropic-compatible endpoint.

New Models

Fireworks: Qwen 3P6 Plus

A 396B MoE reasoning model with vision and tool-use support, now available via Fireworks.

  • Model ID: fireworks/qwen3p6-plus
  • Context: 262K tokens
  • Input: $0.50/M | Cached: $0.10/M | Output: $3.00/M
  • Capabilities: Reasoning, Streaming, Vision, Tools, JSON

curl -X POST https://api.osmapi.com/v1/chat/completions \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/qwen3p6-plus",
    "messages": [{"role": "user", "content": "Explain quantum entanglement"}]
  }'
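
To get a feel for the per-million-token prices listed above, here is a minimal Python sketch of the arithmetic. The function name `estimate_cost` and the token counts are illustrative, and it assumes cached tokens are billed at the cached rate instead of the regular input rate; check your invoice or usage data for how the gateway actually reports them.

```python
from decimal import Decimal

# Prices in USD per million tokens, copied from the Qwen 3P6 Plus listing.
QWEN3P6_PLUS_PRICES = {
    "input": Decimal("0.50"),
    "cached": Decimal("0.10"),
    "output": Decimal("3.00"),
}


def estimate_cost(prices, input_tokens, cached_tokens, output_tokens):
    """Return the estimated USD cost of one request.

    Assumption: cached_tokens are counted separately from input_tokens
    and billed at the cached rate rather than the input rate.
    """
    million = Decimal(1_000_000)
    total = (
        prices["input"] * input_tokens
        + prices["cached"] * cached_tokens
        + prices["output"] * output_tokens
    )
    return total / million


# Illustrative request: 20K fresh input, 100K cached input, 2K output tokens.
cost = estimate_cost(QWEN3P6_PLUS_PRICES, 20_000, 100_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0260
```

Decimal is used instead of float so that small per-request amounts add up without rounding drift.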

Fireworks: Kimi K2.5

Kimi K2.5 is now also served through Fireworks, alongside the existing Moonshot provider. Auto-routing selects the best available provider for each request.

  • Model ID: fireworks/kimi-k2.5
  • Input: $0.60/M | Cached: $0.10/M | Output: $3.00/M
  • Capabilities: Reasoning, Streaming, Vision, Tools, JSON
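
For readers who prefer Python to curl, the following is a rough sketch of the same kind of request targeting the Fireworks-hosted model ID. It reuses the endpoint and headers from the curl example in these notes and assumes Kimi K2.5 accepts the same request shape; `build_chat_request` is a hypothetical helper, not part of any SDK. The request is built but not sent.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example in these release notes.
API_URL = "https://api.osmapi.com/v1/chat/completions"


def build_chat_request(model, prompt, api_key):
    """Build (but do not send) a chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "fireworks/kimi-k2.5",  # pins the Fireworks provider via the model ID
    "Summarize the v2026.4.1 changes",
    os.environ.get("OSM_API_KEY", ""),
)
# To actually send it: urllib.request.urlopen(request)
```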

Google Gemma 4 (Free)

Google's Gemma 4 models are now marked as free with native reasoning support enabled.

  • Gemma 4 26B A4B IT — 26B params (3.8B active per token), 256K context
  • Gemma 4 31B IT — 31B params, 256K context

Both are confirmed free on Google AI Studio and are Apache 2.0 licensed.

Removed Models

Removed 6 Gemma 3 models no longer listed on Google's pricing page:

  • gemma-3-1b-it, gemma-3-4b-it, gemma-3-12b-it
  • gemma-3n-e2b-it, gemma-3n-e4b-it
  • gemma-3-27b (Nebius)
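
If you keep a list of model IDs in configuration, a quick check like the sketch below can flag entries affected by this removal. The helper name and the sample IDs with a provider prefix (such as `nebius/gemma-3-27b`) are assumptions for illustration; the notes do not state the exact ID format, so the check compares only the final path segment. Note that gemma-3-27b was removed for Nebius specifically, so a match on that name may need a manual look.

```python
# Model names listed as removed in v2026.4.1.
REMOVED_MODELS = {
    "gemma-3-1b-it", "gemma-3-4b-it", "gemma-3-12b-it",
    "gemma-3n-e2b-it", "gemma-3n-e4b-it",
    "gemma-3-27b",  # removed for the Nebius provider
}


def find_removed_models(configured_ids):
    """Return configured IDs whose final path segment is a removed model."""
    return [
        model_id
        for model_id in configured_ids
        if model_id.rsplit("/", 1)[-1] in REMOVED_MODELS
    ]


# Hypothetical configuration; the prefixed ID format is an assumption.
configured = ["fireworks/qwen3p6-plus", "gemma-3-4b-it", "nebius/gemma-3-27b"]
print(find_removed_models(configured))
# prints ['gemma-3-4b-it', 'nebius/gemma-3-27b']
```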

Bug Fixes & Improvements

  • Moonshot Kimi K2.5: Fixed the "reasoning_content is missing in assistant tool call message" error during multi-turn tool calls. The gateway now preserves reasoning context across turns, as required by Moonshot's API.
  • Fireworks max_tokens: Fixed the "max_tokens > 4096 must have stream=true" error by capping max_tokens at 4096 for non-streaming requests.
  • Kimi K2.5 parameters: The gateway now enforces the fixed parameters from the Moonshot docs (temperature: 1.0, top_p: 0.95, penalties: 0.0). Requests that send other values no longer fail.
  • Anthropic endpoint: Fixed the "reasoning_content: Extra inputs are not permitted" error when routing through the Anthropic-compatible endpoint.
  • Kimi K2.5 pricing: Corrected the Moonshot cached input price from $0.15/M to $0.10/M, per official pricing.
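
The gateway applies these fixes server-side, so no client changes should be needed. Purely as an illustration of the behavior described above, here is a client-side Python sketch; it is not the gateway's actual code. The helper names are hypothetical, mapping "penalties" to `presence_penalty` and `frequency_penalty` is an assumption, and applying the fixed Kimi K2.5 values to any model ID ending in `kimi-k2.5` is also an assumption.

```python
import copy

FIREWORKS_NON_STREAMING_MAX_TOKENS = 4096  # limit quoted in the fix above

# Fixed Kimi K2.5 values from the notes. The two penalty field names are
# an assumption (OpenAI-style naming).
KIMI_K25_FIXED_PARAMS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}


def normalize_request(payload):
    """Return a copy of payload adjusted the way the fixes describe."""
    out = copy.deepcopy(payload)
    model = out.get("model", "")

    # Fireworks: cap max_tokens at 4096 unless the request streams.
    max_tokens = out.get("max_tokens")
    if (
        model.startswith("fireworks/")
        and not out.get("stream", False)
        and isinstance(max_tokens, int)
        and max_tokens > FIREWORKS_NON_STREAMING_MAX_TOKENS
    ):
        out["max_tokens"] = FIREWORKS_NON_STREAMING_MAX_TOKENS

    # Kimi K2.5: overwrite sampling parameters with the fixed values.
    if model.endswith("kimi-k2.5"):
        out.update(KIMI_K25_FIXED_PARAMS)
    return out


def assistant_tool_call_message(response_message):
    """Build the assistant message to append to the history after a tool
    call, keeping reasoning_content when the response included it."""
    message = {
        "role": "assistant",
        "content": response_message.get("content"),
        "tool_calls": response_message["tool_calls"],
    }
    if "reasoning_content" in response_message:
        message["reasoning_content"] = response_message["reasoning_content"]
    return message


normalized = normalize_request({
    "model": "fireworks/kimi-k2.5",
    "max_tokens": 8192,
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
})
print(normalized["max_tokens"], normalized["temperature"])  # prints 4096 1.0
```

The second helper matters when you rebuild the conversation history yourself: dropping `reasoning_content` from the assistant's tool-call message is what triggered the Moonshot error in the first place.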