
Alibaba develops new video generation tool based on open-source Sora-like model

  • The move by Alibaba marked its latest effort to launch Sora-like video-generating tools, as Chinese companies rush to gain a foothold in the AI video field

Signage for Alibaba at one of the company’s offices in Beijing, Feb. 6, 2024. Photo: Bloomberg
Ann Cao in Shanghai

Alibaba Group Holding is working on a video-generating tool called Tora, built on OpenSora, an open-source model inspired by OpenAI’s Sora, marking the latest effort by the Chinese tech giant to develop video artificial intelligence (AI) tools.

Tora, a video-generation framework that adopts OpenSora as its foundational model, was described in a paper released last week by a group of five researchers from Alibaba. Alibaba owns the South China Morning Post.

The Tora framework achieved a breakthrough building on the Diffusion Transformer (DiT), the architecture that underpins Sora, the text-to-video model launched by OpenAI in February, according to the paper, which was published on the preprint repository arXiv.

The researchers claim to have developed the first “trajectory-oriented DiT framework for video generation”, meaning it ensures the generated movements precisely follow the specified trajectories while replicating the dynamics of the physical world.

“We adapted OpenSora’s workflow to transform raw videos into high-quality video-text pairs and leverage an optical flow estimator for trajectory extraction,” they said.

The Alibaba booth at the World Artificial Intelligence Conference (WAIC) in Shanghai, July 6, 2023. Photo: Reuters

The paper references a series of videos that show different objects – from a wooden sailing boat in a river to men cycling on the highway – moving in accordance with designated trajectories. Tora is capable of generating videos guided by trajectories, images, text, or a combination of the three, according to the researchers.
