Meituan Open-Sources “LongCat-Video,” a 5-Minute Text-to-Video AI Model

Published: October 27, 2025
Reading Time: 1 min read

Meituan open-sources LongCat-Video, a breakthrough AI model that generates 5-minute HD videos from text or images, advancing China’s generative video tech.

Chinese tech giant Meituan has released its new LongCat-Video model, claiming a breakthrough in text-to-video generation by producing coherent, high-definition clips up to five minutes long. The company has also open-sourced the model on GitHub and Hugging Face to support broader research collaboration.

According to Meituan, LongCat-Video is built on a Diffusion Transformer (DiT) architecture and supports three modes — text-to-video, image-to-video, and video continuation. The model can transform a text prompt or a single reference image into a smooth 720p/30 fps sequence, or extend existing footage into longer scenes with consistent style, motion, and physics.
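The exact entry points live in Meituan's GitHub and Hugging Face repositories; purely as an illustration, the sketch below shows how the three advertised modes might sit behind a single interface. Only `snapshot_download` is a real `huggingface_hub` call; the repo id, the `LongCatVideoPipeline` class, and its method signatures are assumptions made for this sketch, not the official API.

```python
# Illustrative sketch only: snapshot_download is a real huggingface_hub
# function, but the repo id, pipeline class, and method names below are
# hypothetical stand-ins for the entry points in Meituan's repository.
from huggingface_hub import snapshot_download


class LongCatVideoPipeline:
    """Hypothetical wrapper over the released DiT checkpoint (not the official API)."""

    def __init__(self, weights_dir: str):
        # The DiT weights would be loaded from weights_dir here.
        self.weights_dir = weights_dir

    def text_to_video(self, prompt: str, seconds: int, fps: int = 30):
        ...  # mode 1: synthesize a 720p clip from a text prompt

    def image_to_video(self, image_path: str, seconds: int, fps: int = 30):
        ...  # mode 2: animate a single reference image

    def continue_video(self, video_path: str, extra_seconds: int):
        ...  # mode 3: extend existing footage with consistent style and motion


# Repo id assumed from the announcement; check Meituan's Hugging Face page.
weights = snapshot_download(repo_id="meituan-longcat/LongCat-Video")
pipe = LongCatVideoPipeline(weights)
clip = pipe.text_to_video("a timelapse of a city street at dusk", seconds=300)
```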

The team said the model addresses a persistent challenge in generative video — maintaining quality and temporal stability across extended durations. LongCat-Video can generate continuous, multi-minute content without the typical frame degradation that affects most diffusion-based systems.
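Meituan has not detailed the mechanism here, but a common pattern for reaching multi-minute lengths with diffusion models is chunked continuation: generate a short block of frames, then condition the next block on the tail of the previous one. The sketch below illustrates that general pattern under stated assumptions; `generate_chunk` is a hypothetical stand-in for the model call, not LongCat-Video's published algorithm.

```python
# General chunked-continuation pattern for long video generation.
# generate_chunk is hypothetical: given a few conditioning frames, it
# returns the next block of frames. Not LongCat-Video's actual code.
from typing import List

Frame = bytes  # placeholder type for one decoded video frame


def generate_chunk(context: List[Frame], prompt: str, n_frames: int) -> List[Frame]:
    """Hypothetical: run the diffusion model conditioned on context frames."""
    raise NotImplementedError


def generate_long_video(prompt: str, total_frames: int,
                        chunk: int = 90, overlap: int = 16) -> List[Frame]:
    frames: List[Frame] = []
    while len(frames) < total_frames:
        # Condition each new chunk on the tail of the video so far, so
        # style, motion, and scene layout stay consistent across chunks.
        context = frames[-overlap:]
        frames.extend(generate_chunk(context, prompt, n_frames=chunk))
    return frames[:total_frames]
```

At 720p and 30 fps, a five-minute clip is 9,000 frames, which is why limiting drift across chunk boundaries, the frame degradation the team says it avoids, is the hard part of the problem.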

Meituan described LongCat-Video as a step toward “world-model” AI, capable of learning real-world geometry, semantics, and motion to simulate physical environments. The model is publicly available through Meituan’s repositories on GitHub and Hugging Face.