Pika Founder Responds to Sora’s Release: ‘Very Excited, We Will Compete Head-On with Sora’
OpenAI’s recent release of Sora landed like a bombshell, once again causing a global sensation.
As an AI video model, Sora can create realistic and imaginative scenes from text instructions. It can generate high-definition videos up to one minute long, featuring complex scenes with multiple characters, specific types of motion, and accurate details of both subject and background.
Sora’s understanding of language has reached a new level, enabling it to interpret prompts accurately and generate videos that express vivid emotions. Building on past research on the DALL·E and GPT models, it points to a new possibility for such models. It understands not only what the user asks for in a prompt, but also how those things exist in the physical world.
Importantly, Sora is a diffusion transformer, and transformers have already shown outstanding scaling properties in fields such as language modeling, computer vision, and image generation.
As a diffusion model, Sora can not only generate videos from text instructions, but also take an existing still image and turn it into a video, animating the image’s contents accurately and with attention to small details. Sora can also take an existing video and extend it or fill in missing frames.
Sora draws inspiration from large language models, which acquire general capabilities through training on internet-scale data.
OpenAI’s technical report argues that Sora’s results show scaling up video generation models to be a highly promising path toward building a general-purpose simulator of the physical world, one that would let artificial intelligence understand and simulate the physical world in motion.
Sora is therefore seen as a significant milestone on the road to AGI, not just in video generation.
Before the release of Sora, Runway and Pika were considered the top players in the video generation track. Since the release, many people believe Sora has easily surpassed these two emerging unicorns and is poised to dominate the field on its own, and they have voiced concern about the fate of the startups.
However, the founders themselves seem more excited than fearful. Pika founder Demi Guo told TMTPost in an exclusive response, ‘We feel that this is very exciting news. We are already preparing to go head-on and compete directly with Sora.’
Demi Guo also revealed that Pika is currently hiring, but said that specific plans cannot be disclosed publicly for now.
Pika Labs was founded in April 2023 and released its first product, Pika 1.0, in November of the same year. Pika 1.0 can generate and edit 3D animations, cartoons, and movies. Ordinary users can operate it too, making it a zero-barrier ‘video creation tool’.
In the interview, Demi Guo also mentioned that a key limitation in the development of generative video is the maturity of algorithms, which is also Pika’s core focus.
‘I think videos are different from language models. With language models, people already have a general idea of the methods, and the algorithms are quite mature. However, there is currently no good algorithm for videos. It is not a problem of scale. It’s not that people don’t have enough GPUs now; often it’s actually because there isn’t a good approach in terms of algorithms,’ Demi Guo said.
Sora’s release also gives the industry a very good algorithmic idea, which may in turn offer leading startups like Pika a more mature algorithmic path.
In fact, Demi Guo has long been prepared to face a strong OpenAI. Months ago, this reporter asked Demi Guo which competitor in the video generation track concerned her most, and she immediately answered that it was OpenAI.
Raised in an East Asian culture, Demi Guo earned her undergraduate degree at Harvard University and later dropped out of Stanford’s doctoral program to start her own business. The demo video for Pika 1.0, the product of the company she founded, drew immediate attention when it was released. Pika 1.0 can generate and edit videos in a variety of styles, including 3D animation, anime, cartoon, and cinematic, and it is also very easy to use.
It also lets users upload their own video clips and use generative AI to edit and reconstruct scenes. With movie-like quality and animation-level special effects, the visuals of Pika 1.0 are striking, making it seem that ordinary people may soon be able to become film directors.
The team, which started out as four people under the name pika_labs, has raised over 55 million US dollars in funding, with almost all of the well-known early-stage investors in the AI field participating.
Less than four months after the release of Pika 1.0, Sora has emerged on the same AI video generation track, adding many variables and possibilities.