Alibaba launches new open-source tool that turns photos and audio into video

Alibaba has released a new open-source speech-to-video model that generates animated digital humans from a single portrait and an audio clip. The tool is aimed at content creators and researchers seeking lifelike avatars that can speak, sing, or perform.

The Wan2.2-S2V release builds on Alibaba's Wan2.2 video-generation series. By open-sourcing it, the company gives developers a system that can animate portraits across different framings, including close-up, bust, and full-body shots.


Wan2.2-S2V is powered by audio-driven animation technology that synchronizes speech with movement. It can handle complex multi-character scenes and adapt to prompts that specify particular gestures or environmental elements.

According to Alibaba, this will allow creators to make videos for uses ranging from social media content to longer-form, film-style projects.

The model also offers 480p and 720p output, producing quality results without requiring high-end computing power, which should appeal to independent creators as well as professional teams working on large-scale projects.

Alibaba research

Researchers behind the model developed a custom audio-visual dataset focused on film and television scenarios. They used multi-resolution training to ensure that the system could generate both vertical short-form videos and traditional widescreen outputs.

Wan2.2-S2V uses a frame-compression process that condenses long video histories into a single latent representation. This lowers computational overhead while maintaining consistency over extended clips, which Alibaba says is a challenge for many video-generation systems.
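Alibaba has not published the details of this compression in the article, but the core idea can be sketched with a toy example: collapse a growing stack of per-frame latents into one fixed-size summary vector, so the cost of conditioning on history no longer scales with clip length. The recency-weighted pooling below is purely an illustrative assumption, not the model's actual mechanism.

```python
import numpy as np

def compress_history(frame_latents: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Collapse a (T, D) stack of per-frame latents into one (D,) latent.

    Newer frames receive higher weight, so the summary tracks current
    motion while older context fades gradually. This weighting scheme is
    a placeholder for whatever learned compression the real model uses.
    """
    T = frame_latents.shape[0]
    # Weights decay^(T-1), ..., decay^1, decay^0: the newest frame gets 1.0.
    weights = decay ** np.arange(T - 1, -1, -1, dtype=np.float64)
    weights /= weights.sum()          # normalize to a weighted average
    return weights @ frame_latents    # (T,) @ (T, D) -> (D,)

rng = np.random.default_rng(0)
short_clip = rng.normal(size=(16, 128))   # 16 frames, 128-dim latents
long_clip = rng.normal(size=(512, 128))   # 512 frames

# Both histories collapse to the same fixed-size summary, so per-step
# compute stays constant no matter how long the clip grows.
assert compress_history(short_clip).shape == (128,)
assert compress_history(long_clip).shape == (128,)
```

Because the summary has constant size, each new frame can be generated against a bounded amount of context, which is what keeps long clips tractable.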

By stabilizing longer sequences, the model should be able to generate more ambitious animated productions.

The launch follows earlier open-source releases in the Wan series, including Wan2.1 in February and Wan2.2 in July. Downloads of the Wan models across Hugging Face and ModelScope have already exceeded 6.9 million.

Wan2.2-S2V is now available through Hugging Face, GitHub, and Alibaba’s ModelScope platform.

What do you think about Alibaba’s new speech-to-video model? Let us know in the comments.

© 1998-2025 BetaNews, Inc. All Rights Reserved.
