
Kimi K2 Thinking Ranks No. 2 Globally, No. 1 Among Open-Source Models in Latest Artificial Analysis Report
Kimi K2 Thinking ranked No. 2 worldwide and No. 1 among open-source AI models, according to Artificial Analysis, showcasing exceptional reasoning and agentic capabilities just behind GPT-5.
A new report from leading AI analytics firm Artificial Analysis reveals that Kimi K2 Thinking has achieved the second-highest global ranking—and the top spot among open-source models—in its latest evaluation of intelligent and agentic AI systems.

Strong Agentic and Reasoning Capabilities
Kimi K2 Thinking scored 67 points on the AI Intelligence Index, outperforming every other open-source model, including MiniMax-M2 (61) and DeepSeek-V3.2-Exp (57). It trails only GPT-5, underscoring its strong reasoning and problem-solving capabilities.

On the Agentic Benchmark, which measures performance in AI tool-use and autonomy, Kimi K2 Thinking ranked second only to GPT-5, earning a remarkable 93% on the 𝜏²-Bench Telecom test—the highest independent score ever recorded by the firm.

On Humanity’s Last Exam, a challenging reasoning test taken without tools, Kimi K2 Thinking reached 22.3%, setting a new record for open-source models and ranking just behind GPT-5 and Grok 4.

New Leader in Open-Source Code Models
While not the top performer in every coding benchmark, Kimi K2 Thinking consistently placed among the highest, ranking 6th on Terminal-Bench Hard, 7th on SciCode, and 2nd on LiveCodeBench. These results crowned it as the new open-source leader in Artificial Analysis’s Code Index, overtaking DeepSeek V3.2.

Technical Specs: 1 Trillion Parameters, INT4 Precision
Kimi K2 Thinking features 1 trillion total parameters, with 32 billion activated per token, and supports a 256K context window with text-only input; the full weight set occupies roughly 594 GB.
It is a reasoning variant of Kimi K2 Instruct, keeping the same architecture but using native INT4 precision instead of FP8.
This quantization, achieved through quantization-aware training (QAT), cuts the model size roughly in half and significantly improves inference efficiency.
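For a sense of scale, here is a back-of-envelope sketch (not an official figure) of why halving the per-weight precision roughly halves the footprint. It assumes all 1 trillion parameters are stored at a uniform bit-width; the reported ~594 GB presumably also covers tensors kept at higher precision and other overhead.

```python
# Rough weight-memory estimate; assumes every parameter is stored at a uniform bit-width.
TOTAL_PARAMS = 1e12  # 1 trillion total parameters (as reported)

def weight_size_gb(params: float, bits_per_param: int) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9  # bits -> bytes -> GB

print(f"FP8 : ~{weight_size_gb(TOTAL_PARAMS, 8):,.0f} GB")  # ~1,000 GB
print(f"INT4: ~{weight_size_gb(TOTAL_PARAMS, 4):,.0f} GB")  # ~500 GB; reported ~594 GB includes overhead
```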
Tradeoffs: High Verbosity, Cost, and Latency
Kimi K2 Thinking proved extremely "talkative," generating about 140 million tokens during testing, roughly 2.5× DeepSeek V3.2 and 2× GPT-5.

While this verbosity raises inference cost and latency, the model still offers competitive pricing, as the rough cost check below illustrates:
- Base API: $2.50 per million output tokens, for a total evaluation cost of $356
- Turbo API: $8 per million output tokens, for a total evaluation cost of $1,172, second only to Grok 4 in expense
Processing speeds range from 8 tokens/sec (Base) to 50 tokens/sec (Turbo).
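As a sanity check on those totals, the following sketch multiplies the reported ~140 million generated tokens by the per-million output prices. It is an approximation under stated assumptions (input-token costs are ignored), which is why it slightly undershoots the reported figures.

```python
# Approximate evaluation cost from output tokens alone (input-token costs ignored).
OUTPUT_TOKENS_MILLIONS = 140  # ~140 million tokens generated during testing (as reported)

for endpoint, price_per_million, reported in [("Base", 2.50, 356), ("Turbo", 8.00, 1172)]:
    estimate = OUTPUT_TOKENS_MILLIONS * price_per_million
    print(f"{endpoint:5s}: estimated ${estimate:,.0f} vs reported ${reported:,}")

# Base : estimated $350 vs reported $356
# Turbo: estimated $1,120 vs reported $1,172
```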
The report concludes that post-training methods like reinforcement learning (RL) continue to drive significant performance gains in reasoning and long-horizon tool-use tasks.




