
Inspur Launches New AI Server, Claims Inference Costs Cut to $0.14 per Million Tokens
Inspur Information has unveiled its YuanNao HC1000 hyperscale AI server, claiming it has reduced large-model inference costs to as low as ¥1 (about $0.14) per million tokens—a milestone the company says removes a key barrier to large-scale AI agent deployment.
According to Inspur’s Chief AI Strategy Officer Liu Jun, GPU utilization during inference typically reaches only 5–10%, far below the 50%+ utilization seen in training workloads. The HC1000 addresses this inefficiency through a fully symmetric DirectCom ultra-high-speed architecture and a hyperscale design that decomposes computing workflows and optimizes resource allocation.
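For readers unfamiliar with the metric, MFU compares the FLOPs a model actually consumes against what the hardware could theoretically deliver in the same wall-clock time. The sketch below is a rough illustration of why decode-heavy inference lands in the 5–10% range while training runs far hotter; the model size, throughput, and peak-FLOPS figures are assumptions chosen for illustration, not Inspur's numbers.

```python
# Rough MFU illustration -- every figure here is a hypothetical
# assumption, not Inspur data; it only shows why inference MFU is low.

def mfu(params: float, tokens_per_sec: float, peak_tflops: float) -> float:
    """Model FLOPs Utilization: achieved FLOPs/s over peak FLOPs/s.

    Uses the common ~2N FLOPs-per-token approximation for the forward
    pass of a decoder-only model with N parameters.
    """
    flops_per_token = 2 * params              # ~2N FLOPs per generated token
    achieved = flops_per_token * tokens_per_sec
    return achieved / (peak_tflops * 1e12)

# Assumed: a 70B-parameter model on a card with ~1000 TFLOPS peak compute.
# Batched decode at ~500 tok/s is memory-bandwidth-bound, so MFU is low:
print(f"inference MFU ~ {mfu(70e9, 500, 1000):.0%}")   # ~7%
# Training packs far more tokens per second onto the same card:
print(f"training  MFU ~ {mfu(70e9, 4000, 1000):.0%}")  # ~56%
```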
Liu said the new architecture can boost single-card MFU (Model FLOPs Utilization) by up to 5.7×, significantly lowering inference costs. He stressed that as token consumption grows exponentially, incremental cost optimizations will no longer suffice. Fundamental changes to computing architectures are required, and cost efficiency will become a “license to survive” for AI companies in the coming era.
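The link from MFU to cost is straightforward amortization: cost per token is hardware cost per second divided by tokens per second, so a 5.7× gain in effective throughput divides cost per token by 5.7. The sketch below makes that arithmetic explicit; the per-card-hour cost and baseline throughput are hypothetical values chosen only to show the scaling, not Inspur's published figures.

```python
# Hypothetical amortization arithmetic -- the hourly cost and baseline
# throughput below are illustrative assumptions, not Inspur's figures.

def cost_per_million_tokens(cost_per_card_hour: float, tokens_per_sec: float) -> float:
    """Amortized serving cost in currency units per million tokens."""
    return cost_per_card_hour / 3600 / tokens_per_sec * 1e6

card_hourly  = 2.0    # assumed all-in cost per card-hour (USD)
baseline_tps = 500    # assumed tokens/s per card at low MFU

print(f"baseline: ${cost_per_million_tokens(card_hourly, baseline_tps):.2f}/M tokens")
print(f"5.7x MFU: ${cost_per_million_tokens(card_hourly, baseline_tps * 5.7):.2f}/M tokens")
# -> baseline ~$1.11/M tokens; at 5.7x throughput ~$0.19/M tokens.
# Whatever the baseline, the same multiplier divides cost per token by 5.7.
```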
Source: Liangziwei