Kimi Open-Sources Moonlight: A 3B/16B-Parameter Mixture-of-Experts Model
Moonshot AI's Kimi team released a new technical report, "Muon is Scalable for LLM Training," yesterday and announced the launch of Moonlight: a 3B/16B-parameter Mixture-of-Experts (MoE) model trained with Muon. Trained on 5.7 trillion tokens, it achieves better performance with fewer floating-point operations (FLOPs), pushing out the Pareto efficiency frontier.
The team found that the Muon optimizer can be scaled up by adding weight decay, carefully adjusting the update magnitude of each parameter, and other techniques. The work has the following highlights:
These techniques allow Muon to be used out of the box in large-scale training without any hyperparameter tuning. Experimental results show that, compared with compute-optimal AdamW training, Muon achieves roughly twice the computational efficiency.
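As a rough illustration of what such an update looks like, below is a minimal PyTorch sketch of a Muon-style step combining momentum, Newton-Schulz orthogonalization, an RMS-matched rescaling, and decoupled weight decay. The Newton-Schulz coefficients follow the public Muon reference implementation; the function names and constants here are illustrative assumptions, not the paper's exact code.

```python
import torch

def newton_schulz_orthogonalize(M: torch.Tensor, steps: int = 5, eps: float = 1e-7) -> torch.Tensor:
    """Approximately orthogonalize a 2D momentum matrix with a quintic
    Newton-Schulz iteration (coefficients from the public Muon reference)."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = M / (M.norm() + eps)            # normalize so the iteration converges
    transposed = X.size(0) > X.size(1)
    if transposed:                      # iterate on the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

@torch.no_grad()
def muon_style_step(W, G, M, lr=2e-2, momentum=0.95, weight_decay=0.1):
    """One Muon-style update on a single 2D weight matrix W with gradient G
    and momentum buffer M. Hypothetical helper for illustration only."""
    M.mul_(momentum).add_(G)                        # SGD-style momentum
    O = newton_schulz_orthogonalize(M)
    # Rescale so the update's RMS is around 0.2, roughly matching a typical
    # AdamW update -- this is what lets AdamW-style learning rates carry over.
    O = O * (0.2 * max(W.shape) ** 0.5)
    W.mul_(1 - lr * weight_decay)                   # decoupled weight decay
    W.add_(O, alpha=-lr)

# Example: one step on a random 1024x4096 projection matrix.
W = torch.randn(1024, 4096) * 0.02
G = torch.randn_like(W)
M = torch.zeros_like(W)
muon_style_step(W, G, M)
```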
The model used in the report is Moonlight-16B-A3B, with 15.29 billion total parameters and 2.24 billion activated parameters. Trained with the Muon optimizer on 5.7 trillion tokens, it achieved the results described above.
"Our model not only pushes beyond the current Pareto frontier but also achieves better performance than prior models while requiring significantly fewer training FLOPs."
"We have open-sourced a distributed Muon implementation optimized for memory usage and communication efficiency. We have also released the pretrained model, instruction-tuned models, and intermediate training checkpoints to support future research."
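For readers who want to try the released checkpoints, the following sketch shows how they could be loaded with Hugging Face transformers, assuming the weights are published on the Hub under an ID like moonshotai/Moonlight-16B-A3B-Instruct (the exact repository name and the availability of a chat template are assumptions to verify against the official release).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo ID -- check the official release for the exact name.
model_id = "moonshotai/Moonlight-16B-A3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",       # load in the checkpoint's native precision
    device_map="auto",        # spread the MoE weights across available GPUs
    trust_remote_code=True,   # custom architecture code ships with the repo
)

messages = [{"role": "user", "content": "Briefly explain the Muon optimizer."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```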
SEE ALSO: Under DeepSeek’s Impact, Moonshot AI Significantly Reduces Marketing Budget