OpenAI intros text-to-video model Sora, challenges rivals
Sora can generate video from text. OpenAI is red teaming the technology and admits it has some weaknesses. The release is a natural progression of the vendor's GenAI technology.
ChatGPT creator OpenAI on Thursday introduced Sora, a generative AI model that can create video from text.
Sora generates videos up to about a minute long, depending on the user's prompt, according to OpenAI. It can produce complex scenes with multiple characters, specific types of motion and accurate details of the subject.
The model is available to red teamers, including visual artists, designers and filmmakers, who can examine and test it for harms and risks, OpenAI said.
OpenAI's release of the text-to-video model comes two days after it revealed that it was testing a memory feature in ChatGPT that lets the AI chatbot remember details from users' conversations.
And the AI vendor, Microsoft's exclusive partner, released Sora on the same day Google introduced an updated version of its Gemini multimodal model, ratcheting up the near-monthly generative AI competition between the two tech giants.
The new text-to-video model also comes three days after competitor Stability AI introduced a new image generation model, Stable Cascade, which can generate photos and produce variations of the same image. Meanwhile, another player in the generative AI image market, Midjourney, has also been working on video.
A natural progression
The release of Sora is a natural progression for OpenAI, Gartner analyst Arun Chandrasekaran said.
OpenAI started as a natural language company, but it has expanded into other modalities such as images, coding, speech and now video, he noted.
"They've been trying to expand the various modalities that reflect the real world that they could offer to enterprise clients," Chandrasekaran said. "The thing that was left for them in some sense was video, and this is a natural evolution in the multimodal progress that OpenAI has been making."
OpenAI is not the first vendor to venture into text-to-video.
AI startup Runway introduced its text-to-video AI model Gen-2 last March. Facebook parent company Meta introduced Make-A-Video in 2022, followed by Emu Video and Emu Edit last November. Emu Video generates video from text prompts, while Emu Edit enables image editing through text instructions.
However, for OpenAI, this is a way to show that it's also competitive in this market, Constellation Research founder R. "Ray" Wang said.
"The public announcement of Sora and red team availability to researchers is a stake in the ground," Wang said. "It's going to give Stability AI and Midjourney a run for their money."
Testing challenges
Meanwhile, OpenAI's early introduction of Sora and its red teaming efforts raise questions about how the vendor will ensure the model is safe and not subject to misuse, Chandrasekaran said.
"Given that the technology is really, really new, they have to sufficiently gate it to the point that it's not being abused and misused, or even customers using it without recognizing all of the limitations of a nascent technology," he said.
The guardrails OpenAI puts around the model and how the vendor qualifies who gets access are important, he added.
OpenAI has maintained a competitive edge in rolling out its generative AI products quickly. In doing so, the vendor continues to flaunt its ambitions in the generative AI arena, Chandrasekaran said.
"It looks like there's no company that's perhaps more ambitious in the space than OpenAI -- that seems really unabated," he said.
OpenAI acknowledged that Sora still has weaknesses and might struggle to accurately simulate a complex scene. It might also confuse spatial details of a prompt, such as mixing up left and right.
OpenAI said it is building tools to detect misleading content and to identify when a video was generated by Sora.
Esther Ajao is a TechTarget Editorial news writer covering artificial intelligence software and systems.