Google DeepMind unveils major generative AI updates for creators

Today

Google DeepMind has introduced a series of new generative media models designed to expand the capabilities available to creators across images, videos, and music, while underscoring its ongoing collaboration with professionals in creative industries.

The new offerings include the release of Veo 3, an upgraded video generation model which now features audio generation, as well as Imagen 4, the latest image generation model that aims to deliver higher quality and improved typography. Google DeepMind is also broadening access to its Lyria 2 music generation technology and unveiling Flow, an AI tool for filmmaking intended to give users more nuanced control over their visual storytelling.

According to a company statement, "These models create breathtaking images, videos and music, empowering artists to bring their creative vision to life. They also power amazing tools for everyone to express themselves." The firm notes that the technology is the result of extensive partnership with filmmakers, musicians, artists, and YouTube creators to ensure that these products are both effective and responsibly developed.

Veo 3 stands out as the first in its series to generate both video and audio, offering outputs such as city street scenes complete with traffic noise, parks enriched by bird songs, or dialogue between characters. "Veo 3, our new state-of-the-art video generation model, not only improves on the quality of Veo 2, but for the first time, can also generate videos with audio — traffic noises in the background of a city street scene, birds singing in a park, even dialogue between characters," the company stated.

The model is described as having strengths in text and image prompting, real-world physics, and accurate lip syncing. Google DeepMind said, "Across the board, Veo 3 excels from text and image prompting to real-world physics and accurate lip syncing. It's great at understanding; you can tell a short story in your prompt, and the model gives you back a clip that brings it to life." Veo 3 is currently available to Ultra subscribers in the United States through the Gemini app and Flow, and to enterprise users via Vertex AI.

Alongside the release of Veo 3, the company has introduced updates to Veo 2, informed by feedback from creators and filmmakers. The updates include reference-powered video generation for creative control and consistency, explicit camera controls for movement and positioning, outpainting to extend scenes between portrait and landscape aspect ratios, and the ability to add or remove objects from videos. The company explained, "Our state-of-the-art reference powered video capability allows you to give Veo images of characters, scenes, objects, and even styles for better creative control and consistency." These features are now available in Flow and will be added to the Vertex AI API and other products in the coming months.

Flow, the company's AI filmmaking tool, is aimed at enabling the creation of cinematic clips, scenes, and narratives using natural language, while managing story elements such as casting, locations, and styles from a single platform. The company described Flow: "Use natural language to describe your shots to Flow, manage the ingredients for your story — cast, locations, objects and styles — in a single convenient place, and use Flow to weave your narrative into beautiful scenes."

Turning to image generation, Imagen 4 is positioned as the company's latest advancement in speed and image clarity, including high-fidelity details in fabrics, water, and fur, and robust handling of typography for use cases like greeting cards or posters. The company commented, "Imagen 4 has remarkable clarity in fine details like intricate fabrics, water droplets, and animal fur, and excels in both photorealistic and abstract styles. Imagen 4 can create images in a range of aspect ratios and up to 2k resolution - even better for printing or presentations. It is also significantly better at spelling and typography, making it easier to create your own greeting cards, posters and even comics." A faster version of Imagen 4 is set to launch, promising up to ten times the speed of its predecessor, Imagen 3.

For music, access to Lyria 2 has been widened. It powers the Music AI Sandbox, which provides experimental tools for musicians, producers, and songwriters. "Music AI Sandbox offers musicians, producers and songwriters a set of experimental tools, which can spark new creative possibilities and help artists explore unique musical ideas," the company stated. Lyria 2 is available to creators through YouTube Shorts and to enterprises via Vertex AI. The Lyria RealTime model, which supports interactive, real-time music generation, is accessible via API and AI Studio.

Addressing concerns around authenticity in the age of generative AI, Google DeepMind highlighted that its SynthID watermarking has been applied to over 10 billion pieces of AI-generated content, such as images, video, audio, and text, to mitigate misinformation and misattribution. "Outputs generated by Veo 3, Imagen 4 and Lyria 2 will continue to have SynthID watermarks," the company said. The company has also launched SynthID Detector, a verification tool that lets users upload content and check whether it is AI-generated by detecting the presence of SynthID watermarks.

Google DeepMind summarised its approach, saying, "With all our generative AI models, we aim to unleash human creativity and enable artists and creators to bring their ideas to life faster and more easily than ever before."
