
Kimi K2 Thinking Ranks No. 2 Globally, No. 1 Among Open-Source Models in Latest Artificial Analysis Report
Kimi K2 Thinking ranked No. 2 worldwide and No. 1 among open-source AI models, according to Artificial Analysis, showcasing exceptional reasoning and agentic capabilities just behind GPT-5.
A new report from leading AI analytics firm Artificial Analysis reveals that Kimi K2 Thinking has achieved the second-highest global ranking—and the top spot among open-source models—in its latest evaluation of intelligent and agentic AI systems.

Strong Agentic and Reasoning Capabilities
Kimi K2 Thinking scored 67 points on the AI Intelligence Index, outperforming every other open-source model, including MiniMax-M2 (61) and DeepSeek-V3.2-Exp (57). It trails only GPT-5, underscoring its strong reasoning and problem-solving capabilities.

On the Agentic Benchmark, which measures performance in AI tool-use and autonomy, Kimi K2 Thinking ranked second only to GPT-5, earning a remarkable 93% on the 𝜏²-Bench Telecom test—the highest independent score ever recorded by the firm.

On Humanity’s Last Exam, a challenging reasoning test taken without tools, Kimi K2 Thinking reached 22.3%, setting a new record for open-source models and ranking just behind GPT-5 and Grok 4.

New Leader in Open-Source Code Models
While not the top performer in every coding benchmark, Kimi K2 Thinking consistently placed among the highest, ranking 6th on Terminal-Bench Hard, 7th on SciCode, and 2nd on LiveCodeBench. These results crowned it as the new open-source leader in Artificial Analysis’s Code Index, overtaking DeepSeek V3.2.

Technical Specs: 1 Trillion Parameters, INT4 Precision
Kimi K2 Thinking features 1 trillion total parameters, with 32 billion activated per token, and supports a 256K context window with text-only input; the full weight set occupies roughly 594 GB.
It is a reasoning variant of Kimi K2 Instruct, keeping the same architecture but using native INT4 precision instead of FP8.
This quantization, achieved through quantization-aware training (QAT), cuts the model size roughly in half and significantly improves inference efficiency.
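For a sense of scale, here is a back-of-envelope sketch (not an official figure) of why halving the per-weight precision roughly halves the footprint. It assumes all 1 trillion parameters are stored at a uniform bit-width; the reported ~594 GB presumably also covers tensors kept at higher precision and other overhead.

```python
# Rough weight-memory estimate; assumes every parameter is stored at a uniform bit-width.
TOTAL_PARAMS = 1e12  # 1 trillion total parameters (as reported)

def weight_size_gb(params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

print(f"FP8 : ~{weight_size_gb(TOTAL_PARAMS, 8):,.0f} GB")  # ~1,000 GB
print(f"INT4: ~{weight_size_gb(TOTAL_PARAMS, 4):,.0f} GB")  # ~500 GB; reported ~594 GB includes overhead
```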
Tradeoffs: High Verbosity, Cost, and Latency
Kimi K2 Thinking proved extremely "talkative," generating about 140 million tokens during testing, roughly 2.5× DeepSeek V3.2 and 2× GPT-5.

While this verbosity raises inference cost and latency, the model still offers competitive pricing, as the rough cost check below illustrates:
- Base API: $2.50 per million output tokens, for a total evaluation cost of $356
- Turbo API: $8 per million output tokens, for a total evaluation cost of $1,172, second only to Grok 4 in expense
Processing speeds range from 8 tokens/sec (Base) to 50 tokens/sec (Turbo).
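As a sanity check on those totals, the following sketch multiplies the reported ~140 million generated tokens by the per-million output prices. It is an approximation under stated assumptions (input-token costs are ignored), which is why it slightly undershoots the reported figures.

```python
# Approximate evaluation cost from output tokens alone (input-token costs ignored).
OUTPUT_TOKENS_MILLIONS = 140  # ~140 million tokens generated during testing (as reported)

for endpoint, price_per_million, reported in [("Base", 2.50, 356), ("Turbo", 8.00, 1172)]:
    estimate = OUTPUT_TOKENS_MILLIONS * price_per_million
    print(f"{endpoint:5s}: estimated ${estimate:,.0f} vs reported ${reported:,}")

# Base : estimated $350 vs reported $356
# Turbo: estimated $1,120 vs reported $1,172
```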
The report concludes that post-training methods like reinforcement learning (RL) continue to drive significant performance gains in reasoning and long-horizon tool-use tasks.




