Kimi K2 Thinking Ranks No. 2 Globally, No. 1 Among Open-Source Models in Latest Artificial Analysis Report

Published: November 8, 2025
Reading Time: 2 min read

Kimi K2 Thinking ranked No. 2 worldwide and No. 1 among open-source AI models, according to Artificial Analysis, with reasoning and agentic capabilities second only to GPT-5.

A new report from leading AI analytics firm Artificial Analysis shows that Kimi K2 Thinking has achieved the second-highest global ranking, and the top spot among open-source models, in its latest evaluation of AI intelligence and agentic capability.

Strong Agentic and Reasoning Capabilities

Kimi K2 Thinking scored 67 points on the AI Intelligence Index, ahead of all other open-source models, including MiniMax-M2 (61) and DeepSeek-V3.2-Exp (57). It trails only GPT-5, underscoring its impressive reasoning and problem-solving capabilities.

On the Agentic Benchmark, which measures performance in AI tool use and autonomy, Kimi K2 Thinking ranked second only to GPT-5, earning a remarkable 93% on the τ²-Bench Telecom test, the highest score the firm has ever independently recorded.

On Humanity’s Last Exam, a challenging test of reasoning without tools, Kimi K2 Thinking reached 22.3%, setting a new record for open-source models and ranking just behind GPT-5 and Grok 4.

New Leader in Open-Source Code Models

While not the top performer in every coding benchmark, Kimi K2 Thinking consistently placed among the highest, ranking 6th on Terminal-Bench Hard, 7th on SciCode, and 2nd on LiveCodeBench. These results crowned it as the new open-source leader in Artificial Analysis’s Code Index, overtaking DeepSeek V3.2.

Technical Specs: 1 Trillion Parameters, INT4 Precision

Kimi K2 Thinking features 1 trillion total parameters with 32 billion active per token, and its weights occupy roughly 594 GB. It supports a 256K context window with text-only input.
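
As a rough sanity check on that footprint, the sketch below converts parameter count and bit width into weight storage. It is back-of-the-envelope arithmetic only; the bit widths are taken from the report, while the overhead noted in the comments is an assumption.

```python
# Back-of-the-envelope weight-storage arithmetic for a 1T-parameter model.
# Illustrative only: real checkpoints also store quantization scales and
# may keep some components at higher precision.

def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (decimal GB)."""
    return num_params * bits_per_param / 8 / 1e9

TOTAL_PARAMS = 1e12  # 1 trillion total parameters

print(f"FP8 : {weight_size_gb(TOTAL_PARAMS, 8):.0f} GB")  # -> ~1000 GB
print(f"INT4: {weight_size_gb(TOTAL_PARAMS, 4):.0f} GB")  # -> ~500 GB
```

Pure INT4 storage works out to roughly 500 GB; the reported ~594 GB sits plausibly above that once quantization scales and any higher-precision components are counted, though the report does not itemize the difference.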

It’s a reasoning variant of Kimi K2 Instruct, maintaining the same architecture but using native INT4 precision instead of FP8.

This quantization, achieved through quantization-aware training (QAT), cuts the model’s size nearly in half and significantly improves efficiency.
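
For readers unfamiliar with QAT, the sketch below shows the standard fake-quantization trick it relies on: weights are snapped to the INT4 grid in the forward pass while gradients flow through unchanged. This is a generic textbook illustration in PyTorch, not Moonshot’s actual training code, and the per-tensor scaling is an assumption (production systems typically quantize per channel or per group).

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric INT4 quantization inside a training step.

    The forward pass rounds weights to 4-bit integer levels in [-8, 7];
    the straight-through estimator lets gradients bypass the
    non-differentiable rounding, so training adapts the weights
    to survive quantization.
    """
    qmax = 7.0
    scale = w.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale (assumed)
    q = (w / scale).round().clamp(-8, 7)          # quantize to the INT4 grid
    dq = q * scale                                # dequantize back to float
    return w + (dq - w).detach()                  # straight-through estimator

# Toy usage: gradients still reach the underlying float weights.
w = torch.randn(8, 8, requires_grad=True)
loss = fake_quant_int4(w).pow(2).sum()
loss.backward()
print(w.grad.shape)  # torch.Size([8, 8])
```

At serving time the rounding is applied once and the weights are stored as actual 4-bit integers plus scales, which is where the near-halving of checkpoint size comes from.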

Tradeoffs: High Verbosity, Cost, and Latency

Kimi K2 Thinking was noted for being extremely “talkative,” generating 140 million tokens during testing, roughly 2.5× the output of DeepSeek V3.2 and 2× that of GPT-5.

While this verbosity raises inference cost and latency, the model still offers competitive pricing:

  • Base API: $2.50 per million output tokens, for a total cost of $356 across the evaluation
  • Turbo API: $8 per million output tokens, for a total cost of $1,172, second only to Grok 4 in expense

Processing speeds range from 8 tokens/sec (Base) to 50 tokens/sec (Turbo).
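
Those token, price, and speed figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted above and assumes input-token charges are negligible, which they are not in practice:

```python
# Cross-check of the quoted cost and throughput figures.
# Assumption: only output tokens are priced; real bills also include
# input tokens, which likely explains the small gaps from the report.

TOTAL_TOKENS = 140e6  # output tokens generated across the evaluation

for tier, usd_per_million, tokens_per_sec in [("Base", 2.5, 8), ("Turbo", 8.0, 50)]:
    cost = TOTAL_TOKENS / 1e6 * usd_per_million
    serial_hours = TOTAL_TOKENS / tokens_per_sec / 3600
    print(f"{tier:5s}: ~${cost:,.0f}  (~{serial_hours:,.0f} h if generated serially)")
```

The computed ~$350 (Base) and ~$1,120 (Turbo) land close to the reported $356 and $1,172, with the remainder consistent with input-token charges. The serial-hours figure is only an upper bound on wall-clock time, since benchmark requests run in parallel.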

The report concludes that post-training methods like reinforcement learning (RL) continue to drive significant performance gains in reasoning and long-horizon tool-use tasks.