In the early hours of May 29, Chinese AI startup DeepSeek quietly open-sourced the latest iteration of its R1 large language model, DeepSeek-R1-0528, on the Hugging Face platform. The release came just after DeepSeek announced in its user community that a “small version upgrade” of the R1 model was complete and available for testing via the official website, app, and mini-program. Despite the “minor update” label, initial tests by developers indicate significant improvements in the model’s capabilities, including coding, complex reasoning, and interactive performance. Early adopters are hailing the new model as a notable leap forward, with one calling it “a huge victory for open source.”
Near-OpenAI Level Coding Performance
On the Live CodeBench leaderboard (a competitive coding benchmark), as of May 29, 2025, DeepSeek-R1-0528 sits in fourth place with a Pass@1 score of 73.1, just behind OpenAI’s “O4-Mini (High)” at 80.2 and “O3 (High)” at 75.8. In other words, DeepSeek’s new model has nearly matched the coding proficiency of OpenAI’s advanced proprietary models on this test. Such performance is remarkable for an open-source model and has led developers to celebrate the release as a milestone for the community. Notably, Anthropic’s Claude-4, widely regarded as a top coding model, did not appear on the Live CodeBench leaderboard (likely due to API rate limits), so a direct comparison isn’t available yet.
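For context, Pass@1 reports the fraction of problems a model solves on its first sampled attempt. A common way to compute it is the unbiased pass@k estimator popularized by the HumanEval/Codex evaluation; whether Live CodeBench uses exactly this estimator is an assumption, but the formula gives a feel for what the leaderboard numbers mean:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples passes,
    given n total samples per problem of which c were correct."""
    if n - c < k:          # too few failures to fill k draws: guaranteed pass
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples per problem, 7 correct -> pass@1 = 0.7
print(round(pass_at_k(10, 7, 1), 3))
```

A leaderboard score is then the mean of this quantity over all benchmark problems.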
Under the hood, DeepSeek-R1-0528 continues to use a Mixture-of-Experts (MoE) architecture, now scaled up to an enormous size. According to early reports, the model has roughly 670–685 billion parameters in total, but only about 37 billion are active for any given token during inference thanks to the sparse MoE design. This approach lets the model scale massively while keeping inference efficient. The latest version also dramatically extends the context window, supporting inputs on the order of 128K tokens (and up to 164K in some tests). In practical terms, R1-0528 can ingest and reason over extremely large documents or codebases, far beyond the few-thousand-token limits of most earlier models.
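The sparse-routing idea behind those numbers can be sketched in a few lines: a learned router scores every expert for each token, and only the top-k experts actually run, so the active parameter count is a small fraction of the total. This toy single-token version in plain Python is illustrative only; the function and variable names are hypothetical, not DeepSeek’s code:

```python
import math, random

def moe_forward(x, gate_w, experts, k=2):
    """Sparse MoE layer for one token. x: vector of length d; gate_w:
    n_experts x d router matrix; experts: list of d x d weight matrices.
    Only k of n_experts run, which is why only ~37B of ~670B+ total
    parameters are active per token."""
    dot = lambda a, b: sum(ai * bi for ai, bi in zip(a, b))
    logits = [dot(row, x) for row in gate_w]              # one router score per expert
    top = sorted(range(len(logits)), key=logits.__getitem__)[-k:]
    exps = [math.exp(logits[i]) for i in top]
    weights = [e / sum(exps) for e in exps]               # softmax over selected experts only
    out = [0.0] * len(x)
    for w, i in zip(weights, top):                        # run just the chosen experts
        y = [dot(row, x) for row in experts[i]]
        out = [o + w * yi for o, yi in zip(out, y)]
    return out

random.seed(0)
d, n = 8, 4
rand_vec = lambda m: [random.gauss(0, 1) for _ in range(m)]
rand_mat = lambda r, c: [rand_vec(c) for _ in range(r)]
out = moe_forward(rand_vec(d), rand_mat(n, d), [rand_mat(d, d) for _ in range(n)])
print(len(out))  # 8
```

A production MoE adds batching, load-balancing losses, and expert parallelism across devices, but the routing principle is the same.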
Developer Tests and Reaction
Early user feedback suggests DeepSeek-R1-0528’s real-world performance is now on par with Anthropic’s Claude 4 in many tasks, and even ahead of it in certain scenarios. Developers wasted no time pitting the new model against both closed-source and previous open models. For instance, one AI blogger (and KCORES project co-founder) known as Karminski tested R1-0528 side by side with Claude-4 (Sonnet) by asking both to generate a 3D simulation of an orange ball colliding with a surface. The results were striking: DeepSeek’s output included realistic lighting effects such as orange-tinted diffuse reflections on a wall and a polished control-panel UI, whereas Claude-4’s version appeared plainer. The DeepSeek model also wrote 728 lines of code versus Claude’s 542 for the task, suggesting a more detailed implementation. The developer who ran the experiment highlighted the improved visuals (noting the subtle reflections and smoother motion) and praised R1-0528’s thoroughness in execution.
Another comparison shared on social media involved prompting different models to create a simple “airplane shooter” game. DeepSeek-R1-0528, Claude 4, and the older DeepSeek-V3-0324 were each tasked with generating the game code. The new R1-0528 not only produced a working game, but added extra features and gameplay elements on its own, resulting in a more complex and visually rich output than Claude 4’s version. Observers noted that R1-0528’s game included additional projectiles and power-up items absent from Claude’s more bare-bones result – a sign of the model’s enhanced creativity and understanding in coding tasks. Testers have also reported that R1-0528 can generate over a thousand lines of clean, bug-free code in one pass, and that it handles front-end web development with greater precision than before. In experiments, the model produced interactive web components (such as animated weather-forecast cards and data-visualization dashboards) with accurate functionality and styling, something earlier versions struggled with.
Despite these impressive anecdotes, some industry experts urge caution until more systematic evaluations are completed, noting that results can vary with the prompt and use case. In community discussions, a few users humorously rated R1-0528’s coding ability as “Claude 3.7” – implying it nearly reaches Claude-4 level while substantially improving on Claude-3.5. Many have also observed noticeably fewer hallucinations and more coherent language output from the new model. Overall, there is a consensus that DeepSeek-R1-0528 represents a significant step up in quality, even if it isn’t perfect in every scenario.
Key Enhancements and Long-Form Capabilities
Beyond programming prowess, DeepSeek’s latest update brings a range of other enhancements. A number of users have remarked that R1-0528’s writing and reasoning abilities are more refined in this iteration. One enthusiast summarized the model’s new strengths as follows:
Deep reasoning: Able to perform step-by-step logical reasoning as deeply as leading models (comparable to Google’s AI models), rather than jumping to conclusions.
Improved text generation: Outputs are more natural and better formatted, making essays and explanations read more fluently.
Rigorous yet efficient style: The model adopts a reasoning style that is thorough and methodical without sacrificing speed. It tends to show its work (a “chain-of-thought” approach), which increases transparency and reliability in its answers.
Extended focus (“long thinking”): Able to concentrate on a single complex task for an extended period (30–60 minutes), thanks to the expanded context window. This allows it to handle lengthy queries or multi-step problems in one go.
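The “show its work” behavior noted above is typically surfaced through chain-of-thought-style prompting. As a hedged sketch, the snippet below builds an OpenAI-compatible chat payload of the kind DeepSeek’s API accepts; the model id and instruction wording are illustrative assumptions, so check the official API documentation before relying on them:

```python
import json

def build_cot_request(question: str) -> dict:
    """Assemble a chat-completion payload that nudges the model to
    reason step by step before answering. Field values are illustrative."""
    return {
        "model": "deepseek-reasoner",   # assumed model id for the R1 series
        "messages": [
            {"role": "system",
             "content": "Think through the problem step by step, "
                        "then state the final answer on its own line."},
            {"role": "user", "content": question},
        ],
        "max_tokens": 4096,             # leave room for the reasoning trace
    }

payload = build_cot_request("How many primes are there below 30?")
print(json.dumps(payload, indent=2))
```

Reasoning-tuned models like R1 often emit the chain of thought unprompted; the system message simply makes the expected output shape explicit.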
Users testing the model’s long-context handling report promising results. In one experiment, R1-0528 was given a large document and asked detailed questions about it. The model retrieved and used information within a 32K-token context far more accurately than the previous R1 version. At extremely large contexts (e.g. 60K tokens) its accuracy did decline, but the 32K-context performance saw a notable boost. This suggests that for most practical purposes (tens of thousands of words of reference material) the new model can provide reliable answers where the prior model might have struggled. Testers also noted that R1-0528’s written outputs have become more grounded: an idiosyncrasy of earlier versions, which would inject bizarre “quantum mechanics” references into unrelated text, appears to have been fixed. Writing tasks now read more normally, with appropriate style and far less randomness, which will be a relief to users who employ it for drafting content.
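Evaluations of this kind are often run as “needle in a haystack” tests. The harness below is a hypothetical sketch of such a setup (not the testers’ actual methodology): it buries a fact at random depths in filler text of a chosen length and measures how often a query function recovers it. A trivial substring-search stub stands in for a real model call:

```python
import random

def needle_test(query_model, context_len, needle, question, answer, trials=5):
    """'Needle in a haystack' harness: insert `needle` at a random depth in
    ~context_len words of filler, then check whether query_model's reply
    contains `answer`. query_model(context, question) -> str is any callable;
    a real run would wrap the model's API."""
    filler = "The sky was clear and the market stayed quiet all afternoon. "
    base = (filler * (context_len // len(filler.split()) + 1)).split()[:context_len]
    hits = 0
    for seed in range(trials):
        random.seed(seed)                       # vary the needle's depth per trial
        words = list(base)
        words.insert(random.randrange(len(words)), needle)
        if answer in query_model(" ".join(words), question):
            hits += 1
    return hits / trials

# Trivial stub that "retrieves" by substring search, to exercise the harness.
stub = lambda ctx, q: "magenta" if "magenta" in ctx else "unknown"
acc = needle_test(stub, 2000, "The secret color is magenta.",
                  "What is the secret color?", "magenta")
print(acc)  # 1.0
```

Sweeping `context_len` from a few thousand words up toward the model’s limit produces the accuracy-versus-context-length curves behind claims like “strong at 32K, degrading at 60K.”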
Low-Key Release, Future Plans, and Context
The manner of this release underscores DeepSeek’s characteristically low-profile approach to development. The new model was dropped on Hugging Face under an open-source MIT license with little fanfare or official documentation – no formal press release or detailed model card was provided initially. Instead, the company hinted at the update in a community group and let word of mouth spread among developers. This “stealth launch” strategy is something DeepSeek has done before: back in March 2025, it quietly uploaded an updated DeepSeek-V3-0324 model to Hugging Face, which folded some of R1’s reinforcement-learning techniques into the V3 series to improve its reasoning and task performance. In both cases, enthusiastic users and independent testers quickly dissected the new models and shared findings while DeepSeek remained mostly silent publicly – a style one overseas observer described as “DeepSeek’s consistently low-key fashion.”
Interestingly, DeepSeek chose to brand this powerful new model as an incremental R1 update rather than “R2,” which has led to speculation about the company’s versioning strategy. Some industry insiders suspect that because the core architecture wasn’t overhauled (the model remains an improved DeepSeek-V3 backbone with MoE scaling and training tweaks), the team opted not to declare a full R2 release – possibly reserving the “R2” label for a future model with more fundamental changes. Others believe R1-0528 might be what R2 was intended to be, pushed out under the R1 name due to competitive pressures and to manage expectations. “If this is R1, how good will R2 be?” quipped one amazed user, reflecting the excitement and curiosity in the community. So far, DeepSeek has given no official timeline or details for an R2 model, leaving AI enthusiasts eagerly waiting for what the company will do next.
With DeepSeek-R1-0528 now freely accessible, developers around the world can experiment with one of the most advanced open-source large language models to date. Its ability to rival – and in some cases surpass – the outputs of top-tier proprietary models from OpenAI and Anthropic marks a significant moment in the AI landscape. This release not only demonstrates the rapid progress of China’s AI startups in the open-source domain, but also raises the bar for what upcoming models – including DeepSeek’s eventual R2 – might achieve. As the open-vs-closed AI race heats up, DeepSeek’s low-key yet high-impact strategy is one to watch, and R1-0528’s performance is likely to spark further innovation and collaboration in the global AI community.