
Alibaba Cloud open sources AI models for video creation
Alibaba Cloud has announced that it is open-sourcing its AI models for video generation, contributing the technology to the open-source community.
The company is offering four models from its Wan2.1 series, available in 14-billion- and 1.3-billion-parameter versions. The models, identified as T2V-14B, T2V-1.3B, I2V-14B-720P, and I2V-14B-480P, are designed to transform text and image inputs into high-quality videos and images. Academics, researchers, and commercial entities can access them on Alibaba Cloud's AI model community, ModelScope, and the collaborative AI platform Hugging Face.
According to Alibaba Cloud, Wan2.1 is the first video generation model series to support text effects in both Chinese and English. It is engineered to produce realistic visuals by handling complex movements and enhancing pixel quality. Its accuracy in following instructions has earned it the top position on VBench, a key benchmark for video generative models, where it is the only open-source model among the top five on Hugging Face's leaderboard.
VBench scored the Wan2.1 series at 86.22%, citing its strengths in key video generation dimensions such as dynamic degree, spatial relationships, color, and multi-object interactions.
Alibaba Cloud officials note that building video foundation models demands significant computational resources and vast quantities of high-quality training data. By making these models openly available, the company aims to lower barriers to access, enabling businesses of all sizes to utilise AI to develop high-quality visual content cost-effectively.
Amongst the released models, the T2V-14B is optimised for generating visuals with high motion dynamics. In contrast, the T2V-1.3B model balances generation quality against computational demand, making it suitable for secondary development and academic research. For instance, on a standard personal laptop the T2V-1.3B model can generate a five-second 480p video in approximately four minutes.
The I2V-14B-720P and I2V-14B-480P models support image-to-video generation: users can supply a single image and a short text prompt to create dynamic video content, with support for image inputs of varied dimensions.
Alibaba Cloud was among the first major global technology companies to release open-source large-scale AI models, beginning with its Qwen model (Qwen-7B) in 2023. The Qwen family consistently ranks highly on Hugging Face's Open LLM Leaderboard, performing on par with top global AI models across various benchmarks.
More than 100,000 derivative models based on the Qwen model family have been developed on Hugging Face, making it one of the largest AI model families in the world.