
DeepMind’s XLand trains AI agents to complete complex tasks

July 28, 2021


DeepMind today detailed its latest efforts to create AI systems capable of completing a range of different, unique tasks. By designing a virtual environment called XLand, the Alphabet-backed lab says that it managed to train systems with the ability to succeed at problems and games including hide and seek, capture the flag, and finding objects, some of which they didn’t encounter during training.

The AI technique known as reinforcement learning has shown remarkable potential, enabling systems to learn to play games like chess, shogi, Go, and StarCraft II through a repetitive process of trial and error. But a lack of training data has been one of the major factors preventing the behavior of systems trained with reinforcement learning from being general enough to apply across diverse games. Without a vast enough set of tasks to train on, these systems have been unable to adapt their learned behaviors to new tasks.

To address this, DeepMind designed XLand, which includes multiplayer games within consistent, “human-relatable” digital worlds. The simulated space allows for procedurally generated tasks, enabling systems to train on, and generate experience from, tasks that are created programmatically.

XLand offers billions of tasks across varied worlds and players. AI controls players in an environment meant to simulate the physical world, training on a number of cooperative and competitive games. Each player’s objective is to maximize rewards, and each game defines the individual rewards for the players.
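To make the idea of programmatically created tasks concrete, here is a minimal Python sketch. It is not DeepMind's code: the `Task` record, the `GOALS` pool, the "hold an object" rewards, and the `player0`-style names are all hypothetical, chosen only to illustrate how a seed could yield a world, a set of players, and a separate reward for each player.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical world state: maps a player name to the objects that player holds.
State = Dict[str, Set[str]]


@dataclass(frozen=True)
class Task:
    """One procedurally generated task (illustrative structure, not XLand's)."""
    world_seed: int          # would drive world layout in a real simulator
    num_players: int
    goals: Tuple[str, ...]   # one goal name per player


def holds(obj: str) -> Callable[[State, str], float]:
    """Build a reward function that pays 1.0 while `player` holds `obj`."""
    def reward(state: State, player: str) -> float:
        return 1.0 if obj in state.get(player, set()) else 0.0
    return reward


# Hypothetical pool of per-player goals; the game assigns one to each player.
GOALS: Dict[str, Callable[[State, str], float]] = {
    "hold_cube": holds("cube"),
    "hold_sphere": holds("sphere"),
}


def generate_task(seed: int) -> Task:
    """Derive a task deterministically from a seed."""
    rng = random.Random(seed)
    num_players = rng.randint(1, 3)
    goals = tuple(rng.choice(sorted(GOALS)) for _ in range(num_players))
    return Task(world_seed=rng.randrange(10**6), num_players=num_players, goals=goals)


def rewards(task: Task, state: State) -> List[float]:
    """Evaluate each player's individual reward in the given state."""
    return [GOALS[goal](state, f"player{i}") for i, goal in enumerate(task.goals)]
```

Because each task is derived from a seed, the same seed always reproduces the same task, and different seeds give different combinations of world, player count, and goals. The same state can be rewarding for one player and not for another, which is what makes a game cooperative or competitive.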

“These complex, non-linear interactions create an ideal source of data to train on, since sometimes even small changes in the components of the environment can result in large changes in the challenges for the [systems],” DeepMind explains in a blog post.

XLand trains systems by dynamically generating tasks in response to the systems’ behavior. The systems’ task-generating functions evolve to match their relative performance and robustness, and the generations of systems bootstrap from each other — introducing ever-better players into the multiplayer environment.
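One simple way to picture task generation that responds to a system's behavior is to keep only candidate tasks the system currently finds neither trivial nor impossible. The sketch below assumes exactly that and should not be read as DeepMind's method: the thresholds, the `toy_success` model, and the use of an integer "difficulty" as a stand-in for a task are all illustrative.

```python
from typing import Callable, Iterable, List


def select_training_tasks(
    candidates: Iterable[int],
    estimate_success: Callable[[int], float],
    low: float = 0.1,
    high: float = 0.9,
) -> List[int]:
    """Keep tasks whose estimated success rate lies in [low, high].

    Tasks the agent almost always solves (above `high`) or almost never
    solves (below `low`) are dropped, since they carry little learning signal.
    """
    return [task for task in candidates if low <= estimate_success(task) <= high]


def toy_success(skill: int, difficulty: int) -> float:
    """Hypothetical success model: 50% when skill matches difficulty."""
    return max(0.0, min(1.0, 0.5 + 0.25 * (skill - difficulty)))


# As the agent's skill grows, the selected band of tasks moves with it.
weaker = select_training_tasks(range(8), lambda d: toy_success(2, d))
stronger = select_training_tasks(range(8), lambda d: toy_success(4, d))
print(weaker)    # [1, 2, 3]
print(stronger)  # [3, 4, 5]
```

In a real system the success estimate would come from running the current agent on each candidate task rather than from a closed-form formula.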

DeepMind says that after training systems for five generations (700,000 unique games in 4,000 worlds within XLand, with each system experiencing 200 billion training steps), it saw consistent improvements in both learning and performance. The lab found that the systems exhibited general behaviors such as experimentation, like changing the state of the world until they achieved a rewarding state. Moreover, the researchers observed that the systems were aware of the basics of their bodies, the passage of time, and the high-level structure of the games they encountered.

With just 30 minutes of focused training on a newly presented, complex task, the systems could quickly adapt, whereas agents trained with reinforcement learning from scratch couldn’t learn the tasks at all. “DeepMind’s mission of solving intelligence to advance science and humanity led us to explore how we could overcome this limitation to create AI [systems] with more general and adaptive behaviour,” DeepMind said. “Instead of learning one game at a time, these [systems] would be able to react to completely new conditions and play a whole universe of games and tasks, including ones never seen before.”

Author: Kyle Wiggers
Source: VentureBeat
