
In the same way that a dog can be trained to roll over in exchange for treats, reinforcement learning algorithms train AI systems to maximize rewards through trial and error.
Reinforcement learning methods enable AI systems to find strategies better than those used by humans, as demonstrated by the Go-playing algorithm AlphaGo, and to solve problems that no human knows how to solve, such as controlling nuclear fusion reactors. However, one of the key challenges in applying reinforcement learning, especially outside of games, is defining the rewards.
To address this challenge, Princeton University Assistant Professor of Computer Science Benjamin Eysenbach and his colleagues have been creating algorithms that learn to maximize rewards without constant human guidance. “They give us a way of going beyond mimicking strategies to find better solutions that no human has thought about, to problems no human knows how to solve,” said Eysenbach.
Learning with greater efficiency
On March 18, Eysenbach gave a seminar at the Center for Statistics and Machine Learning in which he discussed his research group’s findings on self-supervised reinforcement learning. The talk was a part of the center’s ongoing lunchtime faculty seminar series.
Eysenbach and his collaborators have been testing self-supervised learning methods through the use of robotic arms in setting a table with a plate and utensils. “The conventional way of having reinforcement learning systems work on a task like this is akin to playing the game of hotter and colder,” said Eysenbach. Essentially, a graduate student working with the robotic arm would be there to guide the process, telling the arm when it is closer to achieving the desired configuration of forks and knives or further away from it.
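The “hotter and colder” setup corresponds to a hand-designed dense reward: a person must define a notion of distance to the goal and report whether each move brings the robot closer. A minimal sketch of that idea, using a hypothetical 2-D position for an object on the table (the names `GOAL`, `shaped_reward`, and `hotter_or_colder` are illustrative, not from the research group's code):

```python
import math

# Hypothetical goal: desired (x, y) position of, say, a fork on the table.
GOAL = (3.0, 4.0)

def shaped_reward(position):
    """Hand-designed dense reward: closer to the goal means a higher
    (less negative) reward. Someone must choose this distance measure."""
    dx = position[0] - GOAL[0]
    dy = position[1] - GOAL[1]
    return -math.hypot(dx, dy)  # "hotter" as the reward rises toward 0

def hotter_or_colder(old_pos, new_pos):
    """The supervisor's feedback after each move, comparing two states."""
    return "hotter" if shaped_reward(new_pos) > shaped_reward(old_pos) else "colder"
```

For example, moving from `(0, 0)` to `(1, 1)` shortens the distance to `GOAL`, so `hotter_or_colder((0, 0), (1, 1))` returns `"hotter"`. The burden is that a human has to supply this feedback signal for every task.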
However, with self-supervised learning, Eysenbach and colleagues remove the need for a graduate student to play this role. “Instead, the grad student simply walks in and gives the robot an image of a desired configuration and the robot, by itself, figures out how to get there from the initial configuration.”
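One way to picture the goal-conditioned alternative: the agent is simply handed a goal state (standing in here for the goal image) and receives reward only when it reaches that state, discovering how to get there by trial and error. The sketch below is a toy illustration on a 1-D line, not the group's actual algorithm; `goal_reward` and `trial_and_error` are hypothetical names:

```python
import random

def goal_reward(state, goal, tol=0.5):
    """Sparse, self-supervised reward: 1.0 only when the state matches the goal."""
    return 1.0 if abs(state - goal) <= tol else 0.0

def trial_and_error(goal, steps=1000, seed=0):
    """Random trial and error on a 1-D line: no 'hotter/colder' hints,
    only the sparse reward tells the agent when it has arrived."""
    rng = random.Random(seed)
    state = 0.0
    for _ in range(steps):
        state += rng.choice([-1.0, 1.0])  # try a move in either direction
        if goal_reward(state, goal) == 1.0:
            return state  # goal reached
    return None  # goal not reached within the step budget
```

The design point is that nothing task-specific appears except the goal itself, which is what lets a grad student hand the robot a single target image rather than coaching it move by move.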
Surprisingly, Eysenbach found that when the robotic arm worked to complete the task on its own, it reached the goal state at least as fast as, and often faster than, it did with humans guiding the actions. “When grad students play the hot or cold approach, they are often unable to get the robot to solve the task at all,” said Eysenbach. “Reinforcement learning algorithms can learn as efficiently, if not more efficiently, when you just tell them to get to this desired configuration on their own.”
Researchers outside of Eysenbach’s group have even used his algorithms for applications in healthcare. A healthcare company applied self-supervised reinforcement learning methods to an ultrasound device that is used to take pictures of precise areas of the heart and lungs. “It’s hard for the human technicians to control these devices, because they have many knobs that you have to fiddle with,” said Eysenbach. “So, they’re using reinforcement learning algorithms that we’ve developed to better control the devices.”
“For a long time, when people thought about reinforcement learning, they thought about solving games, like chess,” said Eysenbach. Yet, similar sequential decision-making problems exist in fields like chemistry and biology. “I'm excited about how we can use these reinforcement learning tools and applications in science and engineering.”