
DeepSeek’s New Release! API Costs Slashed by Over 50%
The team led by DeepSeek founder Wenfeng Liang has launched a new experimental model, DeepSeek-V3.2-Exp, on September 29, marking a significant step in exploring next-generation transformer architectures. Released as an open-source transitional product, its core upgrade lies in the introduction of DeepSeek’s proprietary Sparse Attention (DSA) mechanism, designed to optimize training and inference efficiency for long-text processing.
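DeepSeek has not published DSA's exact selection rule in this article, but the general idea of fine-grained sparse attention can be illustrated with a minimal sketch: instead of attending to every previous token, each query keeps only the top-k highest-scoring keys, so compute scales with k rather than with full sequence length. The function below is a hypothetical illustration of this pattern, not DeepSeek's actual implementation.

```python
import numpy as np

def topk_sparse_attention(q, K, V, k_top=64):
    """Illustrative top-k sparse attention for a single query vector.

    q: (d,) query; K, V: (n, d) keys and values.
    This is a generic sketch of fine-grained sparsity, NOT the DSA algorithm.
    """
    # Scaled dot-product scores against all keys.
    scores = K @ q / np.sqrt(q.shape[-1])
    k_top = min(k_top, scores.shape[0])
    # Keep only the k_top best-matching keys; softmax over that subset.
    idx = np.argpartition(scores, -k_top)[-k_top:]
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return w @ V[idx]
```

With `k_top` equal to the sequence length this reduces to ordinary dense attention; long-text efficiency comes from choosing `k_top` much smaller than the context length.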
Wu Chao, Chief TMT Analyst at CITIC Securities, commented that the new version “significantly enhances usability.” Technically, DSA is DeepSeek’s first mechanism to achieve fine-grained sparse attention. In public benchmark tests across various domains, its output quality matches that of the previous V3.1-Terminus while long-text processing efficiency improves significantly, a result validated by training configurations rigorously aligned with the earlier model.
The release’s highlight is its contribution to the open-source ecosystem. In addition to the standard NVIDIA CUDA version, DeepSeek has open-sourced a TileLang version of GPU operators. Developed by Peking University’s Yang Zhi team, this programming language compresses FlashAttention operator code from over 500 lines to just 80, maintaining performance while providing developers with a user-friendly debugging tool. Major platforms like Huawei Ascend and Cambricon have completed model adaptations, with open-source inference code also released simultaneously.
Excitingly for developers, API prices have been drastically reduced: the input cache-hit price dropped from 0.5 CNY per million tokens to 0.2 CNY, the cache-miss price fell from 4 CNY to 2 CNY, and the output price was cut from 12 CNY to 3 CNY (a 75% reduction), bringing the overall cost reduction to more than 50%. The official app, web platform, and mini-program have all been updated to reflect these changes.
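The headline figures can be checked with a quick calculation. Using the per-million-token prices quoted above and a hypothetical workload mix (the 10M/10M/5M token split below is an assumption for illustration, not from the article), the total bill drops by well over half:

```python
# Old vs. new DeepSeek API prices, in CNY per million tokens (from the article).
OLD = {"input_hit": 0.5, "input_miss": 4.0, "output": 12.0}
NEW = {"input_hit": 0.2, "input_miss": 2.0, "output": 3.0}

def monthly_cost(prices, hit_m, miss_m, out_m):
    """Total cost for token counts given in millions."""
    return (prices["input_hit"] * hit_m
            + prices["input_miss"] * miss_m
            + prices["output"] * out_m)

# Hypothetical workload: 10M cached input, 10M uncached input, 5M output tokens.
old_cost = monthly_cost(OLD, 10, 10, 5)   # 0.5*10 + 4*10 + 12*5 = 105 CNY
new_cost = monthly_cost(NEW, 10, 10, 5)   # 0.2*10 + 2*10 + 3*5  = 37 CNY
saving = 1 - new_cost / old_cost          # ≈ 0.65, i.e. about 65% cheaper
```

The exact saving depends on the cache-hit ratio and the input/output mix, but any mix of these rates lands at 50% or more off the old bill.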
This release coincides with a wave of domestic large-scale model iterations. At the 2025 Cloud Habitat Conference, Alibaba Cloud unveiled seven new products, with its flagship model Qwen3-Max drawing on 36 trillion tokens of training data and a trillion-scale parameter count to enhance programming and agent capabilities. Zhipu’s GLM-4.6 is set to debut soon, while Moonshot AI’s Kimi has begun beta testing its “OK Computer” Agent mode. Industry competition is increasingly centered on efficiency and ecosystem development.