CCTV and Baidu AI Cloud Jointly Launch AI Sign Language Anchor

[Image: Baidu's AI sign language anchor]
(Source: Baidu AI Cloud)

On November 24, CCTV and Baidu AI Cloud announced the debut of CCTV’s first AI sign language anchor, a major step forward in “overcoming the barrier of sound with technology”. The anchor will help cover the 2022 Beijing Winter Olympics.

With around 430 million people worldwide, including 27.8 million in China, experiencing disabling hearing loss, the launch of the AI sign language anchor will enable CCTV to provide a 24/7 real-time news service for the hearing impaired. The AI anchor will assist in delivering the latest updates on the Winter Olympics and allow viewers to better enjoy the excitement of winter sports.

AI-Driven Engines for Sign Language Translation

Baidu’s sign language translation engine and natural action engine enable the AI sign language anchor to deliver sign language that is highly intelligible, with accurate and coherent presentation.

  • Baidu AI Cloud uses self-developed intelligent technologies, such as speech recognition and machine translation, to build a comprehensive and accurate sign language translation engine that converts text, audio, and video into sign language. 
  • The natural action engine, optimized specifically for sign language, drives the virtual image, rendering the signs and accompanying facial expressions on the digital anchor in real time. 
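The two engines described above form a pipeline: source content is first translated into an ordered sequence of sign-language glosses, which then drive the virtual anchor's animation. The minimal sketch below illustrates only this data flow; all names (`GLOSS_LEXICON`, `translate_to_glosses`, `drive_avatar`) are hypothetical, and the toy word-to-gloss table stands in for Baidu's trained speech-recognition and machine-translation models.

```python
from dataclasses import dataclass

# Hypothetical gloss lexicon. The real translation engine uses trained
# models; this toy table only illustrates the shape of the output.
# Function words are mapped to None because sign languages often omit them.
GLOSS_LEXICON = {
    "welcome": "WELCOME",
    "to": None,
    "the": None,
    "winter": "WINTER",
    "olympics": "OLYMPICS",
}


def translate_to_glosses(text: str) -> list:
    """Stage 1 (translation engine): text -> ordered sign glosses."""
    glosses = []
    for word in text.lower().split():
        gloss = GLOSS_LEXICON.get(word.strip(",."))
        if gloss is not None:
            glosses.append(gloss)
    return glosses


@dataclass
class AnimationFrame:
    """One unit of avatar motion produced for a single gloss."""
    gloss: str
    hand_pose: str          # placeholder for skeletal pose data
    facial_expression: str  # placeholder for expression parameters


def drive_avatar(glosses: list) -> list:
    """Stage 2 (natural action engine): glosses -> animation frames."""
    return [
        AnimationFrame(g, hand_pose=f"pose:{g}", facial_expression="neutral")
        for g in glosses
    ]


if __name__ == "__main__":
    frames = drive_avatar(translate_to_glosses("Welcome to the Winter Olympics"))
    for frame in frames:
        print(frame.gloss, frame.hand_pose, frame.facial_expression)
```

In a production system, the second stage would emit continuous skeletal and facial animation rather than discrete per-gloss frames, but the split between translation and motion generation mirrors the two-engine architecture described here.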

AI-based Digital Star Operation Platform

The Digital Star Operation Platform is a platform-level product that integrates virtual human generation and content production. It provides a complete set of services for creating and operating virtual hosts, virtual idols, and virtual brand spokespersons for clients in broadcasting, interactive entertainment, and branding, lowering the barrier to deploying virtual humans.

SEE ALSO: Canalys Shows China’s Cloud Infrastructure Market Has Hit $6.6 Billion, Baidu AI Cloud Sees Fastest Growth

The platform can generate virtual humans in multiple styles, such as anime, 2D, and high-definition 3D. On the content production side, it supports multiple livestream formats, including human-driven, AI-driven, and fusion (combined human and AI) modes. Empowered by cross-modal technology, its lip-synchronization accuracy reaches about 98.5%.