Alibaba Develops AI Video Generation Model “EMO”

After US-based OpenAI’s Sora took the internet by storm, Chinese tech giant Alibaba Group Holding Limited is stepping up its efforts to catch up.

Alibaba Group’s Institute for Intelligent Computing recently released EMO, a new AI model that generates video from an image and audio, officially described as “an expressive audio-driven portrait video generation framework”.

Reportedly, given only a photo and an audio file, EMO can generate AI videos in which the subject speaks and sings, producing smooth, dynamic short videos up to about 1 minute and 30 seconds long. The facial expressions are highly accurate, and any voice, speaking pace, and image can be matched to one another.

For example, Gao Qiqiang, a character from the TV drama “The Knockout,” can be made to deliver one of law professor Luo Xiang’s popular legal lectures; a photo of Cai Xukun can be paired with other audio to “sing” a rap song, with lip movements that are almost perfectly in sync; and the AI-generated woman in sunglasses walking down a Japanese street, from one of the Sora demo videos recently released by OpenAI, can now not only speak but also sing.

The Alibaba research team stated that EMO can generate vocal avatar videos with rich facial expressions and various head poses, and that it can create videos of any duration depending on the length of the input audio.

EMO’s other features include audio-driven portrait video generation, dynamic rendering of rich expressions, support for a variety of head poses, more dynamic and realistic video, support for multiple languages and portrait styles, synchronization with fast-paced audio, and cross-actor performance transfer.

On the technical side, Alibaba’s researchers explained that the EMO framework uses an Audio2Video diffusion model to generate expressive portrait videos. The process consists of three main stages. First, in the frame-encoding stage, ReferenceNet extracts features from the reference image and motion frames. Second, in the diffusion process, a pretrained audio encoder processes the audio embedding, while a facial region mask combined with multi-frame noise governs the generation of the face. Third, a backbone network performs the denoising. Within the backbone, two attention mechanisms are applied, reference attention and audio attention, which are essential for preserving the character’s identity and modulating the character’s movements, respectively. In addition, EMO’s temporal modules operate along the time dimension to adjust the speed of motion.
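To make the attention mechanisms described above more concrete, here is a minimal, hypothetical Python sketch. It is not Alibaba’s implementation: the function names (`reference_attention`, `audio_attention`, `temporal_attention`, `backbone_block`) are illustrative, and learned projection weights, multiple heads, and the diffusion noise schedule are all omitted. It assumes that reference attention lets frame tokens attend over themselves plus ReferenceNet features, that audio attention is cross-attention from frame tokens to audio embeddings, and that the temporal module attends across frames at each spatial position. It only shows how information is routed, not how denoising works.

```python
import math
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention without learned projections (toy version)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))])
    return out


def residual(a: List[Vector], b: List[Vector]) -> List[Vector]:
    # Element-wise sum, used here as a residual connection.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def reference_attention(frame_tokens: List[Vector], reference_tokens: List[Vector]) -> List[Vector]:
    # Hypothetical: frame tokens attend over themselves plus ReferenceNet features,
    # so identity information from the reference photo can flow into each frame.
    context = frame_tokens + reference_tokens
    return attention(frame_tokens, context, context)


def audio_attention(frame_tokens: List[Vector], audio_tokens: List[Vector]) -> List[Vector]:
    # Hypothetical: cross-attention in which frame tokens query the audio embeddings,
    # letting the audio steer facial motion.
    return attention(frame_tokens, audio_tokens, audio_tokens)


def backbone_block(frame_tokens: List[Vector], reference_tokens: List[Vector],
                   audio_tokens: List[Vector]) -> List[Vector]:
    """One toy backbone block: reference attention, then audio attention."""
    h = residual(frame_tokens, reference_attention(frame_tokens, reference_tokens))
    h = residual(h, audio_attention(h, audio_tokens))
    return h


def temporal_attention(video: List[List[Vector]]) -> List[List[Vector]]:
    """Toy temporal module: video[t][p] is the feature vector at frame t, position p.

    For each spatial position, attention mixes features across frames.
    """
    num_frames, num_positions = len(video), len(video[0])
    out: List[List[Vector]] = [[[] for _ in range(num_positions)] for _ in range(num_frames)]
    for p in range(num_positions):
        sequence = [video[t][p] for t in range(num_frames)]
        mixed = attention(sequence, sequence, sequence)
        for t in range(num_frames):
            out[t][p] = [a + b for a, b in zip(sequence[t], mixed[t])]
    return out
```

For example, `backbone_block(frames, reference, audio)` with three 4-dimensional frame tokens returns three 4-dimensional tokens, each now mixed with reference and audio information. Keeping reference and audio in separate attention layers mirrors the division of labor described above: one path anchors identity, the other drives motion.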

In fact, over the past year Alibaba has continued to invest in AI, launching a range of large models on Alibaba Cloud, such as Qwen-VL, to compete with OpenAI, along with technologies such as Outfit Anyone, a virtual try-on model for changing outfits based on a dual-stream conditional diffusion model, and Animate Anyone, a character animation model. These have been applied in multiple scenarios.


Alibaba is also supporting the development of applications for robots, digital humans, and AI agents built on generative AI.

In addition, Alibaba is one of the major players in China’s open-source model field, having built and operated ModelScope, a Chinese open-source AI model community. Since its launch a year ago, ModelScope has recorded more than 100 million model downloads. Earlier, Alibaba also released Bailian, an all-in-one large-model service platform on Alibaba Cloud (Aliyun).

Beyond developing its own AI models, Alibaba is also investing in other companies building large AI models.

In February of this year, Alibaba led a $1 billion funding round in Moonshot AI, a Chinese startup developing large AI models, valuing the company at about $2.5 billion. It was the largest single funding round for a Chinese AI startup.

Chinese tech giants such as Alibaba and Tencent are backing early-stage AI startups in China through various forms of investment, fueling the development of large AI models in the country.

According to research firm CB Insights, China’s AI sector saw about 232 investment deals in 2023, down 38% year on year, with total funding of approximately $2 billion, down 70% from the previous year.
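As a quick sanity check on what these declines imply, the sketch below backs out the approximate 2022 figures. It assumes the reported percentages are simple year-on-year changes applied to the rounded 2023 numbers, so the results are rough estimates rather than CB Insights data; `implied_previous` is a hypothetical helper name.

```python
def implied_previous(current: float, pct_decrease: float) -> float:
    """Back out last year's value from this year's value and a year-on-year decrease.

    current = previous * (1 - pct_decrease / 100)
    =>  previous = current / (1 - pct_decrease / 100)
    """
    return current / (1 - pct_decrease / 100)


# Rounded 2023 figures from the article; outputs are approximations only.
deals_2022 = implied_previous(232, 38)    # roughly 374 deals
funding_2022 = implied_previous(2.0, 70)  # roughly 6.7 (billion USD)

print(f"Implied 2022 deals: {deals_2022:.0f}")
print(f"Implied 2022 funding: ${funding_2022:.1f} billion")
```

In other words, a 70% drop to about $2 billion implies that 2023 funding was under a third of the prior year’s level.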