In a preprint paper, researchers at Alphabet’s DeepMind and the University of California, Berkeley propose a framework for comparing the way children learn about the world with the way AI learns. The work, which was motivated by research suggesting children’s learning supports behaviors later in life, could help close the gap between AI and humans when it comes to acquiring new abilities. For instance, it might lead to robots that can pick and pack millions of different kinds of products while avoiding various obstacles.
Exploration is a key feature of human behavior, and recent evidence suggests children explore their surroundings more often than adults. This is thought to translate into learning that enables powerful, abstract task generalization of a kind AI agents could tangibly benefit from. For instance, in one study, preschoolers who played with a toy developed a theory about how the toy functioned, such as determining whether its blocks worked based on their color, and they used this theory to make inferences about a new toy or block they hadn’t seen before. AI can approximate this kind of domain and task adaptation, but it struggles without a degree of human oversight and intervention.
The DeepMind approach incorporates an experimental setup built atop DeepMind Lab, DeepMind’s Quake-based learning environment comprising navigation and puzzle-solving tasks for learning agents. The tasks require physical or spatial navigation skills and are modeled after games children play. In the setup, children are allowed to interact with DeepMind Lab through a custom Arduino-based controller, which exposes the same four actions agents would use: move forward, move back, move left, and turn right.
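As an illustration, a shared interface of this kind might be wrapped as follows. The class and action names are hypothetical, and the translation into the environment’s native action format is an assumption for illustration rather than the study’s code.

```python
# Hypothetical wrapper exposing the same four discrete actions to a child's
# controller and to a learning agent. The action list mirrors the article;
# the mapping onto an underlying environment is assumed for illustration.

FORWARD, BACK, LEFT, RIGHT = "forward", "back", "left", "right"
ACTIONS = (FORWARD, BACK, LEFT, RIGHT)

class FourActionInterface:
    def __init__(self, env):
        self.env = env  # e.g. a DeepMind Lab-style environment

    def step(self, action):
        assert action in ACTIONS
        # A real implementation would translate the discrete action into the
        # environment's native action vector (e.g. by consulting its action spec).
        return self.env.step(action)
```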
During experiments approved by UC Berkeley’s institutional review board, the researchers attempted to determine two things:
- Whether differences in children’s exploration exist with respect to unknown environments.
- Whether children are less susceptible than AI agents to fitting too closely to a particular set of data (i.e., overfitting).
In one test, children were told to complete two mazes — one after another — each with the same layout. They explored freely in the first maze, but in the second, they were told to look for a “gummy.”
The researchers say that in the “no-goal condition” (the first maze), the children’s strategies closely resembled those of a depth-first search (DFS) AI agent, which pursues an unexplored path until it reaches a dead end and then turns around to explore the most recently seen unexplored path. The children made choices consistent with DFS 89.61% of the time in the no-goal condition, compared with 96.04% of the time in the goal condition (the second maze). Moreover, children who explored less than their peers took the longest to reach the goal (95 steps on average), while children who explored more found the gummy in the fewest steps (66).
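To make the comparison concrete, here is a rough sketch of DFS-style maze exploration in Python. The grid layout, coordinates, and step counting are purely illustrative and are not the study’s setup.

```python
# Minimal sketch of a depth-first-search (DFS) explorer on a grid maze.
# The maze, start cell, and goal below are hypothetical stand-ins for the
# DeepMind Lab mazes used in the study.

def dfs_explore(maze, start, goal=None):
    """Push unexplored neighbors onto a stack, follow a path until it
    dead-ends, then backtrack to the most recently seen unexplored branch."""
    rows, cols = len(maze), len(maze[0])
    stack, visited, order = [start], set(), []
    while stack:
        cell = stack.pop()
        if cell in visited:
            continue
        visited.add(cell)
        order.append(cell)  # order in which cells are first explored
        if goal is not None and cell == goal:
            return order    # goal condition: stop once the "gummy" is found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            is_open = 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 0
            if is_open and (nr, nc) not in visited:
                stack.append((nr, nc))
    return order            # no-goal condition: free, exhaustive exploration

# 0 = corridor, 1 = wall (toy layout, not the study's maze)
maze = [
    [0, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
print(dfs_explore(maze, start=(0, 0), goal=(3, 3)))
```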
The team notes that these behaviors contrast with the techniques used to train AI agents, which often depend on having the agent stumble upon an interesting area by chance and then encouraging it to revisit that area until it is no longer “interesting.” Unlike humans, who are prospective explorers, AI agents are retrospective.
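That retrospective pattern is often implemented with a count-based novelty bonus, in which a state becomes less rewarding the more it is visited. The sketch below is a generic illustration of that idea, not the specific method used to train the paper’s agents.

```python
import math
from collections import defaultdict

# Retrospective, novelty-driven exploration: the agent only learns a state was
# "interesting" after visiting it, via a bonus that decays with visit count.

visit_counts = defaultdict(int)

def intrinsic_bonus(state, scale=1.0):
    """Bonus ~ 1/sqrt(N(s)): large for rarely seen states, shrinking
    toward zero as the state stops being novel."""
    visit_counts[state] += 1
    return scale / math.sqrt(visit_counts[state])

def shaped_reward(state, extrinsic_reward):
    # The agent optimizes extrinsic reward plus the novelty bonus, so it is
    # drawn back to recently discovered (still-novel) areas.
    return extrinsic_reward + intrinsic_bonus(state)
```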
In another test, children aged four to six were told to complete two mazes in three phases. In the first phase, they explored the maze in a no-goal condition, a “sparse” condition with a goal and no immediate rewards, and a “dense” condition with both a goal and rewards leading up to it. In the second phase, the children were tasked with once again finding the goal item, which was in the same location as during exploration. In the final phase, they were asked to find the goal item but with the optimal route to it blocked.
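One way to picture the three conditions is as three reward functions over the same maze. The sketch below is a hypothetical rendering: the cell coordinates, reward values, and “breadcrumb” placement are illustrative assumptions, not the study’s design.

```python
# Hypothetical reward functions for the three exploration conditions.

def no_goal_reward(cell):
    return 0.0  # free exploration: nothing in the maze is rewarded

def sparse_reward(cell, goal):
    return 1.0 if cell == goal else 0.0  # only reaching the goal pays off

def dense_reward(cell, goal, breadcrumbs):
    """'breadcrumbs' is a set of cells on the way to the goal, each carrying a
    small reward, approximating the 'rewards leading up to it' condition."""
    if cell == goal:
        return 1.0
    return 0.1 if cell in breadcrumbs else 0.0

# Example: in the dense condition, stepping onto a breadcrumb cell pays 0.1.
print(dense_reward(cell=(2, 1), goal=(3, 3), breadcrumbs={(2, 1), (3, 2)}))
```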
Initial data suggests that children are less likely to explore an area in the dense rewards condition, according to the researchers. However, the lack of exploration doesn’t hurt their performance in the final phase. This isn’t true of AI agents: typically, dense rewards reduce agents’ incentive to explore and lead to poor generalization.
“Our proposed paradigm [allows] us to identify the areas where agents and children already act similarly and those in which they do not,” concluded the coauthors. “This work only begins to touch on a number of deep questions regarding how children and agents explore … In asking [new] questions, we will be able to acquire a deeper understanding of the way that children and agents explore novel environments, and how to close the gap between them.”
Author: Kyle Wiggers.
Source: VentureBeat