v2026.4.1: New Fireworks Models, Free Gemma 4 & Moonshot Fixes
New Fireworks models (Qwen 3P6 Plus, Kimi K2.5), free Gemma 4 models with reasoning, removed deprecated Gemma 3 models, and critical fixes for Moonshot and Fireworks provider handling.

We're shipping v2026.4.1 with new models on Fireworks, free Gemma 4 with reasoning, and important reliability fixes for Moonshot Kimi K2.5 and the Anthropic endpoint.
New Models
Fireworks: Qwen 3P6 Plus
A 396B MoE reasoning model with vision and tool use support, now available via Fireworks.
- Model ID: fireworks/qwen3p6-plus
- Context: 262K tokens
- Input: $0.50/M | Cached: $0.10/M | Output: $3.00/M
- Capabilities: Reasoning, Streaming, Vision, Tools, JSON
curl -X POST https://api.osmapi.com/v1/chat/completions \
  -H "Authorization: Bearer $OSM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fireworks/qwen3p6-plus",
    "messages": [{"role": "user", "content": "Explain quantum entanglement"}]
  }'
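To make the per-million pricing concrete, here is a minimal sketch of a cost estimate for a single request. The helper name `estimate_cost` is hypothetical, and the billing model is an assumption: cached prompt tokens are taken to be billed at the cached rate instead of the input rate, with `input_tokens` counting only uncached tokens.

```python
# Per-million-token prices for fireworks/qwen3p6-plus, copied from the list above.
PRICE_PER_M = {"input": 0.50, "cached": 0.10, "output": 3.00}


def estimate_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Assumption: `input_tokens` counts only uncached prompt tokens, and
    cached tokens are billed at the cached rate instead of the input rate.
    """
    total_per_million = (
        input_tokens * PRICE_PER_M["input"]
        + cached_tokens * PRICE_PER_M["cached"]
        + output_tokens * PRICE_PER_M["output"]
    )
    return total_per_million / 1_000_000


# 100K uncached input tokens, 50K cached tokens, 10K output tokens
print(f"${estimate_cost(100_000, 50_000, 10_000):.4f}")  # prints $0.0850
```

Under these assumptions the output tokens dominate quickly: 10K output tokens cost $0.03, more than half of what 100K uncached input tokens cost.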
Fireworks: Kimi K2.5
Kimi K2.5 is now available as an additional provider via Fireworks, alongside the existing Moonshot provider. Auto-routing will select the best available provider automatically.
- Model ID: fireworks/kimi-k2.5
- Input: $0.60/M | Cached: $0.10/M | Output: $3.00/M
- Capabilities: Reasoning, Streaming, Vision, Tools, JSON
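To pin a request to the Fireworks provider rather than rely on auto-routing, the model ID above can be passed explicitly. The sketch below builds the same kind of request as the curl example using only the Python standard library. The endpoint and headers are taken from that curl example; the helper name `build_kimi_request` is hypothetical, and the response shape is not assumed here.

```python
import json
import urllib.request

# Endpoint taken from the curl example earlier in these notes.
API_URL = "https://api.osmapi.com/v1/chat/completions"


def build_kimi_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request pinned to Fireworks."""
    payload = {
        "model": "fireworks/kimi-k2.5",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` with your key read from the `OSM_API_KEY` environment variable, and decode the JSON body of the response.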
Google Gemma 4 (Free)
Google's Gemma 4 models are now marked as free with native reasoning support enabled.
- Gemma 4 26B A4B IT — 26B params (3.8B active per token), 256K context
- Gemma 4 31B IT — 31B params, 256K context
Both are confirmed free on Google AI Studio and are Apache 2.0 licensed.
Removed Models
Removed 6 Gemma 3 models no longer listed on Google's pricing page:
- gemma-3-1b-it, gemma-3-4b-it, gemma-3-12b-it
- gemma-3n-e2b-it, gemma-3n-e4b-it
- gemma-3-27b (Nebius)
Bug Fixes & Improvements
- Moonshot Kimi K2.5: Fixed the "reasoning_content is missing in assistant tool call message" error during multi-turn tool calls. The gateway now correctly preserves reasoning context across turns as required by Moonshot's API.
- Fireworks max_tokens: Fixed the "max_tokens > 4096 must have stream=true" error by capping non-streaming requests to 4096.
- Kimi K2.5 parameters: Enforced fixed parameters (temperature: 1.0, top_p: 0.95, penalties: 0.0) per Moonshot docs. Invalid values no longer cause errors.
- Anthropic endpoint: Fixed the "reasoning_content: Extra inputs are not permitted" error when routing through the Anthropic-compatible endpoint.
- Kimi K2.5 pricing: Corrected Moonshot cached input price from $0.15/M to $0.10/M per official pricing.
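The two parameter rules above (the Fireworks max_tokens cap and the fixed Kimi K2.5 sampling parameters) can be illustrated with a small client-side sketch. This is not the gateway's implementation. The function name `normalize_request` is hypothetical, the mapping of "penalties" to `presence_penalty` and `frequency_penalty` is an assumption, and so is applying the Kimi rule to every model ID containing `kimi-k2.5`.

```python
FIREWORKS_NON_STREAMING_MAX_TOKENS = 4096

# Fixed sampling parameters from the release note. The note says only
# "penalties: 0.0"; mapping that to these two OpenAI-style field names
# is an assumption.
KIMI_FIXED_PARAMS = {
    "temperature": 1.0,
    "top_p": 0.95,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
}


def normalize_request(payload: dict) -> dict:
    """Return a shallow copy of `payload` with both parameter rules applied."""
    normalized = dict(payload)
    model = normalized.get("model", "")

    # Rule 1: Fireworks rejects max_tokens > 4096 unless stream is true,
    # so cap the value for non-streaming requests.
    if model.startswith("fireworks/") and not normalized.get("stream", False):
        max_tokens = normalized.get("max_tokens")
        if max_tokens is not None and max_tokens > FIREWORKS_NON_STREAMING_MAX_TOKENS:
            normalized["max_tokens"] = FIREWORKS_NON_STREAMING_MAX_TOKENS

    # Rule 2: Kimi K2.5 uses fixed sampling parameters, so overwrite
    # whatever the caller sent instead of letting the request fail.
    if "kimi-k2.5" in model:
        normalized.update(KIMI_FIXED_PARAMS)

    return normalized
```

For example, `normalize_request({"model": "fireworks/kimi-k2.5", "max_tokens": 8000, "temperature": 0.2})` would return a payload with `max_tokens` set to 4096 and `temperature` set to 1.0, while the same request with `"stream": True` would keep `max_tokens` at 8000.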
Try the new models in the Playground | Read the docs | Get started