Google researchers improve reinforcement learning by having their AI play Pong

Deep reinforcement learning — an AI training technique that employs rewards to drive software policies toward goals — has been tapped to model the impact of social conventions and to create AI that's remarkably good at playing games.

Recently, researchers at Google introduced a new algorithm — Simulated Policy Learning, or SimPLe for short — which uses game models to learn quality policies for selecting actions.
“At a high-level, the idea behind SimPLe is to alternate between learning a world model of how the game behaves and using that model to optimize a policy (with model-free reinforcement learning) within the simulated game environment,” the Google AI researchers wrote. “The basic principles behind this algorithm are well established and have been employed in numerous recent model-based reinforcement learning methods.”
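To make that loop concrete, here is a minimal sketch of the alternation in plain Python. Every name in it (WorldModel, Policy, collect_real_transitions, simulated_rollout) is a hypothetical stand-in invented for illustration, not Google's implementation: SimPLe's real world model is a large video-prediction network, and its policy is optimized with a model-free learner inside that model.

```python
import random

class WorldModel:
    """Stand-in for SimPLe's learned simulator (really a video-prediction network)."""
    def fit(self, transitions):
        pass  # would train next-frame / reward prediction on real gameplay data

    def step(self, frame, action):
        # Predict (next_frame, reward); placeholder dynamics for illustration.
        return frame, random.random()

class Policy:
    """Stand-in for the policy, improved with model-free RL in simulation."""
    def act(self, frame):
        return random.randrange(4)  # e.g. left / right / forward / backward

    def improve(self, rollouts):
        pass  # would run a policy-gradient update on simulated experience

def collect_real_transitions(policy, steps=64):
    """A small batch of (expensive) real-game steps; dummy frames here."""
    frame = "frame-0"
    return [(frame, policy.act(frame), frame, 0.0) for _ in range(steps)]

def simulated_rollout(model, policy, frame, horizon=50):
    """A cheap trajectory generated inside the learned model."""
    trajectory = []
    for _ in range(horizon):
        action = policy.act(frame)
        frame, reward = model.step(frame, action)
        trajectory.append((frame, action, reward))
    return trajectory

model, policy = WorldModel(), Policy()
for _ in range(15):                              # the alternation from the quote:
    model.fit(collect_real_transitions(policy))  # 1) learn how the game behaves
    rollouts = [simulated_rollout(model, policy, "frame-0") for _ in range(16)]
    policy.improve(rollouts)                     # 2) optimize the policy in simulation
```

The key property of this structure is that only collect_real_transitions touches the real game; everything else runs inside the learned model.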

As the researchers further explain, training an AI system to play games requires predicting the target game’s next frame given a sequence of observed frames and commands (e.g., “left,” “right,” “forward,” “backward”). A successful model, they point out, can produce trajectories that could be used to train a gaming agent policy, which would obviate the need to rely on computationally costly in-game sequences.
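For intuition, the toy below frames that prediction task as plain supervised learning: a linear model is fit by gradient descent to map a stack of recent (flattened) frames plus a one-hot action to the next frame. The synthetic data, the tiny dimensions, and the linear model are all assumptions made for illustration; the actual system uses a far larger convolutional video model.

```python
import numpy as np

rng = np.random.default_rng(0)
FRAME_PIXELS, HISTORY, NUM_ACTIONS = 16, 4, 4  # flattened frame size, frame stack, action count

# Synthetic supervised data: [stacked frames ++ one-hot action] -> next frame.
inputs = rng.normal(size=(256, HISTORY * FRAME_PIXELS + NUM_ACTIONS))
true_dynamics = rng.normal(size=(HISTORY * FRAME_PIXELS + NUM_ACTIONS, FRAME_PIXELS))
next_frames = inputs @ true_dynamics           # "ground-truth" next frames

weights = np.zeros_like(true_dynamics)         # the model to be learned
for _ in range(500):                           # gradient descent on the L2 loss
    predictions = inputs @ weights
    grad = inputs.T @ (predictions - next_frames) / len(inputs)
    weights -= 0.1 * grad

print("mean squared prediction error:", np.mean((inputs @ weights - next_frames) ** 2))
```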

SimPLe does exactly this. It takes a series of frames as input to predict the next frame along with the reward, and after it's fully trained, it produces “rollouts” — sample sequences of actions, observations, and outcomes — which are used to improve policies.
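As a rough sketch of how such rollouts can improve a policy, the toy below samples trajectories from a stand-in world model and applies a REINFORCE-style update to a softmax policy. The reward structure, the state-independent policy, and the use of plain REINFORCE are illustrative assumptions rather than SimPLe's actual setup, which relies on a stronger model-free learner.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_ACTIONS = 4
logits = np.zeros(NUM_ACTIONS)                # a state-independent softmax policy

def model_step(action):
    """Stand-in learned model: returns (observation, reward); action 2 pays off."""
    return rng.normal(), 1.0 if action == 2 else 0.0

for _ in range(300):                          # many cheap rollouts inside the model
    probs = np.exp(logits) / np.exp(logits).sum()
    rollout = []
    for _ in range(20):                       # one rollout: actions, observations, rewards
        action = rng.choice(NUM_ACTIONS, p=probs)
        observation, reward = model_step(action)
        rollout.append((action, observation, reward))
    episode_return = sum(r for _, _, r in rollout)
    for action, _, _ in rollout:              # REINFORCE: reinforce actions taken
        grad_log_prob = -probs                # d log pi(a) / d logits for a softmax
        grad_log_prob[action] += 1.0
        logits += 0.01 * episode_return * grad_log_prob

probs = np.exp(logits) / np.exp(logits).sum()
print("learned action probabilities:", np.round(probs, 3))
```

Because every rollout here comes from the model rather than the real game, the policy gets many cheap updates per expensive real interaction, which is the economy SimPLe exploits.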

The main promise of model-based reinforcement learning methods lies in environments where interactions are either costly or slow, such as many robotics tasks. Judging by results like these, the future of reinforcement learning looks anything but bleak.