Zhipu AI Announces the Free Release of the GLM-4-Flash Large Model

On August 27, Zhipu AI announced that its GLM-4-Flash large model is now available free of charge through the company's open platform for large models.
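As a minimal sketch of what a call to the model looks like, assuming the platform's `zhipuai` Python SDK and the `glm-4-flash` model identifier (both based on the platform's public conventions rather than this announcement):

```python
# Minimal sketch: one chat completion via the zhipuai SDK.
# The API key is a placeholder; obtain a real one from the open platform.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")
response = client.chat.completions.create(
    model="glm-4-flash",
    messages=[{"role": "user", "content": "Summarize this announcement in one sentence."}],
)
print(response.choices[0].message.content)
```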

GLM-4-Flash is suited to simple vertical tasks that demand low cost and fast responses, achieving a generation speed of 72.14 tokens/s, roughly equivalent to 115 characters/s (about 1.6 characters per token).
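Throughput like this is easy to spot-check yourself. The sketch below assumes the SDK exposes an OpenAI-style streaming interface (`stream=True` with delta chunks); actual numbers will vary with server load and prompt length:

```python
# Rough client-side throughput check: count streamed characters over wall time.
import time
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")
start = time.time()
chars = 0
stream = client.chat.completions.create(
    model="glm-4-flash",
    messages=[{"role": "user", "content": "Write a 300-word story."}],
    stream=True,
)
for chunk in stream:
    piece = chunk.choices[0].delta.content or ""  # some chunks carry no text
    chars += len(piece)
elapsed = time.time() - start
print(f"{chars / elapsed:.1f} characters/s")
```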

GLM-4-Flash supports multi-turn dialogue, web browsing, function calling, and long-text reasoning with a context window of up to 128K tokens, and handles 26 languages including Chinese, English, Japanese, Korean, and German.
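Function calling typically follows the OpenAI-style tool schema. The tool definition below (`get_weather`, its parameters, and the field layout) is an illustrative assumption, not something specified in the announcement:

```python
# Hypothetical function-calling sketch: the model may return a structured
# tool call instead of plain text when a registered tool fits the request.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                       # hypothetical tool
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="glm-4-flash",
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=tools,
)

# If the model elects to call the tool, the call arrives here instead of text.
print(response.choices[0].message.tool_calls)
```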

According to Zhipu AI, the inference stack combines adaptive weight quantization, multiple parallelization schemes, batching strategies, and speculative sampling to reduce latency and increase generation speed. The higher concurrency and throughput this brings significantly lower inference costs, which is what makes the free offering possible.
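The announcement does not detail Zhipu AI's speculative-sampling setup, but the general technique is well documented: a cheap draft model proposes several tokens, the full model verifies them in one forward pass, and an accept/reject rule guarantees the output still follows the target model's distribution. A toy sketch with stand-in probability arrays:

```python
# Toy speculative-sampling verification step; all distributions are random
# stand-ins, not real model outputs.
import numpy as np

def speculative_step(target_probs, draft_probs, draft_tokens, rng):
    """One accept/reject round of speculative sampling.

    target_probs: (k+1, V) target-model distributions, one per draft position
                  plus an extra row for the "bonus" position.
    draft_probs:  (k, V) draft-model distributions at the k draft positions.
    draft_tokens: (k,) tokens proposed by the cheap draft model.
    Returns the tokens emitted this round; their distribution matches
    sampling from the target model alone.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i, tok], draft_probs[i, tok]
        if rng.random() < min(1.0, p / q):   # accept the draft token
            out.append(int(tok))
        else:                                # reject: resample from the residual
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            out.append(int(rng.choice(residual.size, p=residual / residual.sum())))
            return out
    # Every draft accepted: emit one extra token from the target model.
    out.append(int(rng.choice(target_probs.shape[1], p=target_probs[-1])))
    return out

rng = np.random.default_rng(0)
V, k = 8, 4
draft_probs = rng.dirichlet(np.ones(V), size=k)
target_probs = rng.dirichlet(np.ones(V), size=k + 1)
draft_tokens = np.array([rng.choice(V, p=draft_probs[i]) for i in range(k)])
print(speculative_step(target_probs, draft_probs, draft_tokens, rng))
```

The accept probability min(1, p/q) plus residual resampling is what lets the system emit several tokens per expensive forward pass without changing output quality.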

For pre-training, the team introduced large language models into the data-screening process, yielding 10T of high-quality multilingual data, more than three times the amount used for the ChatGLM3-6B model. FP8 was also adopted during pre-training to improve training efficiency and make better use of compute.
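Zhipu AI has not published its screening pipeline, but the idea of LLM-in-the-loop data filtering can be sketched as follows. Everything here (the prompt, the 0-10 scale, the threshold, and using GLM-4-Flash itself as the scorer) is an illustrative assumption:

```python
# Hypothetical LLM-based data screening: a scorer model grades each
# candidate document and only high-scoring ones are kept for training.
from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key")

SCORING_PROMPT = (
    "Rate the following text for training-data quality on a 0-10 scale "
    "(coherence, informativeness, low noise). Reply with the number only.\n\n{doc}"
)

def quality_score(doc: str) -> float:
    resp = client.chat.completions.create(
        model="glm-4-flash",
        messages=[{"role": "user", "content": SCORING_PROMPT.format(doc=doc[:2000])}],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # unparseable reply: treat as low quality

def screen(corpus, threshold=7.0):
    return [doc for doc in corpus if quality_score(doc) >= threshold]
```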

SEE ALSO: Zhipu AI Unveils GLM-4 Model with Advanced Performance Paralleling GPT-4