The update aims to transform Google's video-to-audio (V2A) technology.
A new update to Google’s video-to-audio technology enables users to add AI-generated scores, sound effects, and dialogue to their video clips.
According to a blog post by the tech giant, the technology “combines video pixels with natural language text prompts to generate rich soundscapes for the on-screen action.”
Google describes this update as a “major step towards bringing generated movies to life.”
DeepMind, Google’s AI research lab, explains that the technology can “understand raw pixels” on its own, so text prompts are optional but can improve accuracy.
The update also offers enhanced creative control, with “V2A able to generate unlimited soundtracks for any video input.”
A Google spokesperson said users can supply a ‘positive prompt’ to guide the generated output toward desired sounds, or a ‘negative prompt’ to steer it away from unwanted ones.
The update is not yet publicly available. The blog post notes: “There are still several limitations we're addressing, and further research is ongoing.”
The post adds: “Since the audio output quality depends on the video input quality, artifacts or distortions in the video, which fall outside the model’s training distribution, can cause a noticeable decline in audio quality.”