If you’re impressed by the recent spate of text-to-image generators, get ready for the next step in AI artistry: text-to-video.
While the huge compute costs and scarcity of text-to-video datasets have stunted the technique’s growth, recent research has brought the promise closer to reality.
A computer artist called Glenn Marshall has offered a glimpse of the potential.
The Belfast-based composer recently won the Jury Award at the Cannes Short Film Festival for his AI film The Crow.
Marshall had previously earned plaudits for an AI-generated Daft Punk video, but he applied a different approach to The Crow.
While his earlier technique turned text into random visual mutations, The Crow uses an underlying film as an image reference.
“I had been heavily getting into the idea of AI style transfer using video footage as a source,” Marshall told TNW.
“So every day I would be looking for something on YouTube or stock video sites, and trying to make an interesting video by abstracting it or transforming it into something different using my techniques.
“It was during this time I discovered Painted on YouTube — a short live action dance film — which would become the basis of The Crow.”
Marshall fed the video frames of Painted to CLIP, a neural network created by OpenAI.
He then prompted the system to generate a video of “a painting of a crow in a desolate landscape.”
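For readers curious about the mechanics: CLIP itself scores how well an image matches a piece of text, and CLIP-guided pipelines repeatedly adjust an image so that score rises. The article does not describe Marshall's exact pipeline, so the snippet below is only a toy sketch of that idea. The three-number vectors are made-up stand-ins for real CLIP embeddings, the function names are hypothetical, and a simple interpolation replaces the gradient step a real system would use.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nudge_toward(frame_vec, prompt_vec, step=0.25):
    """Move a frame vector a fraction of the way toward the prompt vector.

    Assumption: this interpolation is a stand-in for the gradient-based
    update a real CLIP-guided generator would apply to the image itself.
    """
    return [f + step * (p - f) for f, p in zip(frame_vec, prompt_vec)]


# Hypothetical embeddings (not real CLIP outputs):
# prompt_vec ~ "a painting of a crow in a desolate landscape"
# frame_vec  ~ one live-action frame from the source film
prompt_vec = [0.9, 0.1, 0.4]
frame_vec = [0.6, 0.5, 0.2]

before = cosine_similarity(frame_vec, prompt_vec)
after = cosine_similarity(nudge_toward(frame_vec, prompt_vec), prompt_vec)

print(f"similarity before: {before:.3f}, after one nudge: {after:.3f}")
```

In this toy setup, a source frame that already sits close to the prompt starts with a high score and needs only small nudges.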
My AI film ‘The Crow’ wins Jury Award at Cannes! https://t.co/WHDsI7UzJM pic.twitter.com/Ww1DGyBbxw
— Glenn Marshall (@GlennIsZen) August 24, 2022
Marshall says the outputs required little cherry-picking. He attributes this to the similarity between the prompt and underlying video, which depicts a dancer in a black shawl mimicking the movements of a crow.
“It’s this that makes the film work so well, as the AI is trying to make every live action frame look like a painting with a crow in it, so I’m meeting it half way, and the film becomes kind of a battle between the human and the AI — with all the suggestive symbolism.”
In the future, Marshall wants to add 3D animation to his AI creations. He’s also exploring CLIP-guided video generation, which can add detailed text-based directions, such as specific camera movements.
That could lead to entire feature films produced by text-to-video systems. Yet Marshall believes even his current techniques could attract mainstream recognition.
He says The Crow is now eligible for submission to the prestigious BAFTA Awards.
“I haven’t got a speech prepared, but I fantasize about collecting an award, in the role of a herald of AI, and proclaiming to the star-studded audience that [for] each and every one of you, actor, director, set designer, costume designer, artist, composer… AI is coming, and you’ll find yourself in a very different job soon — or out of a job all together.”