A deep reinforcement learning–based framework for full-body control of humanoid robots was developed, enabling a game of one-versus-one soccer and the robots exhibited emergent behaviors in the form of dynamic motor skills such as the ability to recover from falls and also tactics like defending the ball against an opponent.
We investigated whether deep reinforcement learning (deep RL) is able to synthesize sophisticated and safe movement skills for a low-cost, miniature humanoid robot that can be composed into complex behavioral strategies. We used deep RL to train a humanoid robot to play a simplified one-versus-one soccer game. The resulting agent exhibits robust and dynamic movement skills, such as rapid fall recovery, walking, turning, and kicking, and it transitions between them in a smooth and efficient manner. It also learned to anticipate ball movements and block opponent shots. The agent’s tactical behavior adapts to specific game contexts in a way that would be impractical to manually design. Our agent was trained in simulation and transferred to real robots zero-shot. A combination of sufficiently high-frequency control, targeted dynamics randomization, and perturbations during training enabled good-quality transfer. In experiments, the agent walked 181% faster, turned 302% faster, took 63% less time to get up, and kicked a ball 34% faster than a scripted baseline. OP3 humanoid robots learned to play agile soccer using deep reinforcement learning. Editor’s summary Generating robust motor skills in bipedal robots in the real world is challenging because of the inability of current control methods to generalize to specific tasks. Haarnoja et al. developed a deep reinforcement learning–based framework for full-body control of humanoid robots, enabling a game of one-versus-one soccer. The robots exhibited emergent behaviors in the form of dynamic motor skills such as the ability to recover from falls and also tactics like defending the ball against an opponent. The robot movements were faster when using their framework than a scripted baseline controller and may have potential for more complex multirobot interactions. —Amos Matsiko