Google announces AI advances in text-to-video, language translation, more

November 3, 2022

At a Google AI event this morning at the company’s Pier 57 offices in New York City, Google announced a variety of artificial intelligence (AI) advances, including in generative AI, language translation, health AI and disaster management.

The event also focused heavily on Google’s efforts to build responsible AI, particularly related to control and safety, helping identify generative AI, and “building for everyone.”

“We see so much opportunity ahead and are committed to making sure the technology is built in service of helping people, like any transformational technology,” Google CEO Sundar Pichai said in a video shared with attendees at the event, which was meant to “reimagine how technology can be helpful in people’s lives.”

In addition, Pichai pointed out the risks and challenges that come with AI. “That’s why Google is focused on responsible AI from the beginning, publishing AI principles which prioritize the safety and privacy of people over anything else,” he said.

Google debuts Imagen Video — Phenaki combo

Douglas Eck, principal scientist at Google Research and research director for Google’s Brain Team, shared a variety of Google generative AI announcements, including its cautious, slow efforts (compared to DALL-E 2 or Stability AI) to release its text-to-image AI systems.

While Google Imagen is not yet available to the public, the company announced it will add a limited form of it to its AI Test Kitchen app (which this year showed off LaMDA) as a way to collect early feedback. The company showed off a demo called City Dreamer in which users can generate images of a city designed around a theme, such as, say, pumpkins.

In addition, building on its text-to-video work announced last month, Google shared the first rendering of a video that combines both of the company’s complementary text-to-video research approaches, Imagen Video and Phenaki. The result pairs Phenaki’s ability to generate video from a sequence of text prompts with Imagen’s high-resolution detail.

Google’s first rendering of video generated by Imagen Video and Phenaki

“I think it is amazing that we can talk about telling long-form stories like this with super-resolution video, not just from one prompt but a sequence of prompts, with a new way of storytelling,” said Eck, adding that he was excited about how filmmakers or video storytellers might make use of this technology.

Other generative AI advances

In the text space, Eck also discussed the LaMDA dialogue engine and the Wordcraft Writers Workshop, which challenged professional authors to write experimental fiction using LaMDA as a tool.

Google will soon release a research paper on this, Eck said.

“One clear finding is that using LaMDA to write full stories is a dead end,” he said. “It’s more useful to use LaMDA to add spice.” The user interface also has to be right, he added, serving as a “text editor with a purpose.”

Eck also highlighted Google’s efforts to use AI to generate code, as well as two recently introduced research projects: AudioLM, which extends the audio from any audio clip entered with no need for a musical score, and DreamFusion, the recently announced text-to-3D rendering that combines Imagen with NeRF’s 3D capabilities.

“I’ve never seen quite so many advances in the generative space, the pace is really incredible,” he said.

Google’s DreamFusion, designed to convert text into 3D-generated images

Google is building a universal speech translator

After reviewing a variety of Google advances in language AI research, Google Brain leader Zoubin Ghahramani announced the company’s effort to reflect the diversity of the world’s languages and an ambitious stab at building a model that supports the world’s top 1,000 languages.

In addition, Google says it is building a universal speech model trained on over 400 languages, with the claim that it is the “largest language model coverage seen in a speech model today.”

All of these efforts “will be a multi-year journey,” he said. “But this project will set a critical foundation for making language-based AI truly helpful for everyone.”

A strong focus on responsible AI

Following the AI announcements, Marian Croak, VP of engineering at Google, and James Manyika, SVP at Google-Alphabet, discussed Google’s focus on responsible AI.

“I think if we’re going to be leaders, it’s extremely important that we push the state of the art on responsible AI technology,” said Croak. “I’m passionate about wanting to discover ways to make things work in practice.”

Google does adversarial testing “constantly and continuously,” she said. “Then we also make sure that we’re setting benchmarks [that are] quantitative and can be measured and verified across all the dimensions of our AI. So, we also do that on a continuous basis.”

Author: Sharon Goldman
Source: Venturebeat
