
Kimi Releases K2 Thinking — Open-Source “Thinking” Model That Boosts Agent and Reasoning Abilities
Kimi launches K2 Thinking — its most advanced open-source "thinking" model yet, enabling agents that can reason, browse, and code autonomously through hundreds of tool calls.
Kimi today announced Kimi K2 Thinking, the company’s most capable open-source “thinking” model to date. Built around the “model-as-agent” concept, K2 Thinking natively combines prolonged multi-step reasoning with extensive tool use — enabling agents that can “think while using tools.”
What it does
Kimi says K2 Thinking can autonomously run up to 300 tool-call cycles in a single session and sustain long, stable multi-turn reasoning chains. That ability is powered by the team’s latest Test-Time Scaling techniques, which extend both the number of reasoning tokens and tool-call iterations at inference time to improve agentic and reasoning performance.
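The "think while using tools" pattern described above can be pictured as a loop that alternates reasoning steps with tool calls under a fixed budget. The sketch below is a hypothetical illustration of that loop, not Kimi's actual agent runtime; the `run_agent` function, the action schema, and the tool registry are all assumptions for demonstration.

```python
# Minimal sketch of an interleaved think/act agent loop with a tool-call
# budget, illustrating the "think while using tools" pattern. All names
# (run_agent, the step schema, the tools dict) are hypothetical.

MAX_TOOL_CALLS = 300  # the per-session ceiling reported for K2 Thinking

def run_agent(task, model, tools, max_tool_calls=MAX_TOOL_CALLS):
    """Alternate reasoning steps and tool calls until the model answers."""
    history = [("task", task)]
    calls = 0
    while calls < max_tool_calls:
        step = model(history)          # reasoning step: returns an action
        if step["type"] == "answer":   # model decided it is done
            return step["content"]
        tool = tools[step["tool"]]     # otherwise, dispatch a tool call
        result = tool(**step["args"])
        history.append(("tool_result", result))
        calls += 1
    return None  # budget exhausted without a final answer
```

The key property is that each tool result is appended to the reasoning history, so later reasoning steps condition on everything gathered so far.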
Benchmarks and capabilities
K2 Thinking achieves state-of-the-art (SOTA) results across several agent and reasoning benchmarks:
- Humanity’s Last Exam (a comprehensive closed-book academic test spanning 100+ disciplines): 44.9% (SOTA when tools are permitted).
- BrowseComp (OpenAI’s benchmark for web-browsing agents): 60.2% (new SOTA; human average is ~29.2%).
- SEAL-0 and other complex information-gathering and reasoning tests: SOTA-level performance.

Kimi highlights gains in agentic search, agentic programming, creative writing, and general multi-step reasoning. Example walkthroughs show the model chaining iterative search → browse → code → reasoning loops, decomposing open-ended problems into actionable subtasks and producing verified answers.
Agentic coding and creative tasks
K2 Thinking improves coding performance on software-engineering benchmarks such as SWE-bench, SWE-Multilingual, and terminal-based tasks. The model handles front-end work (HTML, React, component-level tasks) better and can operate inside software agents to manage multi-step development workflows, for example assembling a functioning Word-style editor or producing voxel-art creations. Creative and research capabilities are also stronger: the model produces more coherent long-form creative writing, deeper academic analysis, and more empathetic, practical responses to personal or emotional queries.
Efficiency: native INT4 quantization
To reduce latency and GPU memory usage during long reasoning runs, Kimi applied quantization-aware training with weight-only INT4 quantization to the model's MoE components. The result is native INT4 inference that roughly doubles generation speed and improves compatibility with domestic (Chinese) accelerator chips. Kimi notes that all reported benchmark scores were obtained under INT4 precision.
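Weight-only INT4 quantization, in its generic form, stores each group of weights as 4-bit integers plus one floating-point scale per group, and reconstructs approximate weights at inference time. The sketch below shows that general technique with symmetric per-group scaling; it is an illustration of the idea, not Kimi's exact quantization-aware-training recipe, and the function names and group size are assumptions.

```python
# Hedged sketch of generic weight-only symmetric INT4 quantization
# (not Kimi's exact recipe). Each group of weights shares one fp scale;
# values are rounded to integers in [-8, 7] and reconstructed at
# inference time as scale * code.

def quantize_int4(weights, group_size=32):
    """Quantize a flat list of floats to (int4 codes, per-group scales)."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / 7 or 1.0  # avoid zero scale
        scales.append(scale)
        codes.append([max(-8, min(7, round(w / scale))) for w in group])
    return codes, scales

def dequantize_int4(codes, scales):
    """Reconstruct approximate fp weights from codes and scales."""
    return [q * s for qs, s in zip(codes, scales) for q in qs]
```

Storing 4-bit codes instead of 16-bit weights cuts weight memory roughly 4x, which is why decoding speeds up on memory-bandwidth-bound hardware; the per-group scale keeps the reconstruction error bounded by about half a quantization step per weight.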
Availability
K2 Thinking is already live on kimi.com and in the latest Kimi mobile app under the standard chat mode. In a forthcoming update, it will also replace the base model in Kimi's Agent mode to enable full multi-turn thinking and tool use. Developers can access the model via the Kimi Open Platform or download it from public model hubs such as Hugging Face and ModelScope for self-hosting. The model supports a 256K-token context window.
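For developers, hosted open models are commonly exposed through an OpenAI-compatible chat-completions endpoint. The sketch below builds such a request with Python's standard library only; the base URL and model identifier are placeholders, not values confirmed by the announcement, so check the Kimi Open Platform documentation for the real ones.

```python
# Hedged sketch: building an OpenAI-style chat-completions request with
# the standard library. BASE_URL and MODEL_ID are placeholders, not
# confirmed values; consult the Kimi Open Platform docs before use.
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"   # placeholder endpoint
MODEL_ID = "kimi-k2-thinking"             # placeholder model name

def build_chat_request(prompt, api_key, temperature=0.6):
    """Build (but do not send) an OpenAI-style chat-completions request."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request is then a single `urllib.request.urlopen(req)` call; separating payload construction from transport keeps the sketch testable without network access.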
Notes on deployed experience
To keep the regular chat experience lightweight, Kimi deploys a restricted tool set and fewer tool-call rounds on kimi.com and in the app. As a result, on-site chat may not match benchmark scores; the full agentic capabilities will become visible when the Agent mode ("OK Computer") is updated to K2 Thinking.
Source: Kimi