Hear ‘Mona Lisa’ recite a famous Shakespeare monologue — Chinese engineers manage to get a picture to sing and talk using an AI app called Emote Portrait Live

Chinese engineers at the Institute for Intelligent Computing, Alibaba Group, have developed an AI app called Emote Portrait Live (the research paper spells the name Emote Portrait Alive, or EMO for short) that can animate a still photo of a face and synchronize it to an audio track.

The technology relies on the generative capabilities of diffusion models (machine learning models trained to reverse a gradual noising process, so that they can turn random noise into new images or video frames), which can directly synthesize character head videos from a provided image and any audio clip. This bypasses the need for complex pre-processing or intermediate representations, simplifying the creation of talking head videos.
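To make the "noising process" idea concrete, here is a toy Python sketch of the forward half of a diffusion model: it blends an image with Gaussian noise according to a schedule, and a real model is then trained to undo that step by step. This is not the researchers' code. The linear schedule, the 1,000 steps, the function names, and the four-value "image" are all assumptions chosen for illustration.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, rising linearly (illustrative values, assumed)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_signal(betas):
    """Running product of (1 - beta): share of the original signal variance left at each step."""
    kept, out = 1.0, []
    for beta in betas:
        kept *= 1.0 - beta
        out.append(kept)
    return out

def add_noise(pixels, alpha_bar, rng):
    """Return sqrt(alpha_bar) * pixel + sqrt(1 - alpha_bar) * Gaussian noise, per pixel."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * p + noise * rng.gauss(0.0, 1.0) for p in pixels]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
alpha_bars = cumulative_signal(linear_beta_schedule(1000))

image = [0.2, 0.5, 0.9, 0.1]  # stand-in for pixel values, not a real picture
slightly_noisy = add_noise(image, alpha_bars[0], rng)   # early step: almost the original
mostly_noise = add_noise(image, alpha_bars[-1], rng)    # last step: almost pure noise
```

Generation runs this in reverse: the model starts from noise and removes a little of it at each step, which is one reason diffusion-based methods tend to be slower than single-pass approaches.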

The challenge lies in capturing the nuances and diversity of human facial movements during video synthesis. Traditional methods simplify this by imposing constraints on the final video output, such as using 3D models to constrain facial keypoints or extracting head movement sequences from base videos to guide overall motion. However, these constraints may limit the naturalness and richness of the resulting facial expressions.

Not without challenges

The research team’s objective is to develop a talking head framework that can capture a wide range of realistic facial expressions, including subtle micro-expressions, and allow for natural head movements.

However, the integration of audio with diffusion models presents its own challenges due to the ambiguous relationship between audio and facial expressions. This can result in instability in the videos produced by the model, including facial distortions or jittering between video frames. To overcome this, the researchers included stable control mechanisms in their model, specifically a speed controller and a face region controller, to improve stability during the generation process.
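Jitter of the kind described above can be quantified in a simple way. The Python sketch below is an illustrative diagnostic, not the researchers' speed controller or face region controller: it assumes a hypothetical list of (x, y) positions for one facial point, one entry per frame, and measures how sharply that point's motion changes from frame to frame.

```python
import math

def frame_to_frame_jitter(track):
    """Mean size of the change in per-frame displacement for one tracked point.

    `track` is a list of (x, y) positions, one per video frame (hypothetical input).
    Steady motion scores near zero; a point that flickers back and forth scores high.
    """
    if len(track) < 3:
        return 0.0
    total = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(track, track[1:], track[2:]):
        # Second difference: how much the displacement changed between frames.
        ax = x2 - 2 * x1 + x0
        ay = y2 - 2 * y1 + y0
        total += math.hypot(ax, ay)
    return total / (len(track) - 2)

smooth = [(float(i), 0.0) for i in range(6)]          # steady drift to the right
shaky = [(float(i), (-1.0) ** i) for i in range(6)]   # same drift plus up/down flicker

print(frame_to_frame_jitter(smooth))  # 0.0
print(frame_to_frame_jitter(shaky))   # 4.0
```

A score like this only flags instability after the fact; the controllers described in the article act during generation to keep it from appearing in the first place.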

Despite the potential of this technology, there are certain drawbacks. The process is more time-consuming than methods that don’t use diffusion models. Additionally, since there are no explicit control signals to guide the character’s motion, the model may unintentionally generate other body parts, like hands, resulting in artifacts in the video.

The group has published a paper on its work on the arXiv preprint server, and the project's website is home to a number of other videos showcasing the possibilities of Emote Portrait Live, including clips of Joaquin Phoenix (as The Joker), Leonardo DiCaprio, and Audrey Hepburn.

You can watch the Mona Lisa recite Rosalind’s monologue from Shakespeare’s As You Like It, Act 3, Scene 2, below.

