Discovering state-of-the-art reinforcement learning algorithms
Humans and other animals rely on powerful RL mechanisms that were shaped by billions of years of
evolution. Machines, on the other hand, learn through hand-crafted learning rules. The goal
is to discover an RL algorithm autonomously, so that machines acquire a better learning mechanism. This is achieved by meta-learning from the experience of many different agents across many challenging environments. Meta-learning is a subfield of ML in which agents
are trained to learn how to learn. The discovered rule outperforms the existing RL rules it was compared against.
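The two-level structure of meta-learning (agents learn with a rule in an inner loop, while an outer loop improves the rule itself across many environments) can be made concrete with a toy sketch. This is not the paper's method, and every name and choice here is a hypothetical illustration. The assumptions are as follows. Tasks are small random Markov reward processes whose true values are computable. The "learning rule" is a two-parameter family with target a·r + b·V(s'). The outer loop uses plain random search (with the same sampled experience for every candidate) instead of a learned rule network.

```python
import random

GAMMA = 0.9  # assumption: discount that defines the "true" values the rule should learn


def make_task(rng, n_states=5):
    """Hypothetical task: a random Markov reward process (per-state reward, transition probs)."""
    rewards = [rng.uniform(-1.0, 1.0) for _ in range(n_states)]
    probs = []
    for _ in range(n_states):
        weights = [rng.random() + 1e-3 for _ in range(n_states)]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return rewards, probs


def true_values(task, sweeps=300):
    """Ground-truth V(s) by iterating the Bellman expectation equation."""
    rewards, probs = task
    n = len(rewards)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [rewards[s] + GAMMA * sum(p * v[t] for t, p in enumerate(probs[s]))
             for s in range(n)]
    return v


def inner_learn(task, rule, rng, steps=1000, alpha=0.05):
    """Inner loop: one agent learns V with the parameterised target a*r + b*V(s')."""
    a, b = rule
    rewards, probs = task
    n = len(rewards)
    v = [0.0] * n
    s = rng.randrange(n)
    for _ in range(steps):
        s_next = rng.choices(range(n), weights=probs[s])[0]
        target = a * rewards[s] + b * v[s_next]
        v[s] += alpha * (target - v[s])
        s = s_next
    return v


def meta_loss(rule, tasks, seed=0):
    """Outer objective: mean squared value error over all tasks.
    A fixed seed gives every candidate rule the same sampled experience."""
    rng = random.Random(seed)
    total = 0.0
    for task, v_true in tasks:
        v = inner_learn(task, rule, rng)
        total += sum((x - y) ** 2 for x, y in zip(v, v_true)) / len(v)
    return total / len(tasks)


def meta_learn(n_tasks=6, iters=40, sigma=0.1, seed=1):
    """Outer loop: random search over the rule parameters (a, b)."""
    rng = random.Random(seed)
    tasks = []
    for _ in range(n_tasks):
        task = make_task(rng)
        tasks.append((task, true_values(task)))
    rule = (0.5, 0.5)  # deliberately poor starting rule
    init_loss = best_loss = meta_loss(rule, tasks)
    for _ in range(iters):
        a = rule[0] + rng.gauss(0.0, sigma)
        b = min(max(rule[1] + rng.gauss(0.0, sigma), 0.0), 0.99)  # keep the rule stable
        loss = meta_loss((a, b), tasks)
        if loss < best_loss:  # keep only candidates that improve the outer objective
            rule, best_loss = (a, b), loss
    return rule, init_loss, best_loss


if __name__ == "__main__":
    rule, init_loss, best_loss = meta_learn()
    print(f"rule a={rule[0]:.2f}, b={rule[1]:.2f}; loss {init_loss:.3f} -> {best_loss:.3f}")
```

Because γ = 0.9 defines the true values, one would expect the search to drift towards a ≈ 1, b ≈ 0.9, that is, towards rediscovering the TD(0) target. The rule discovered in the paper is far richer than this two-parameter family.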
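Standard RL rules update the predictions V(s) or Q(s,a), as well as the policy π, towards targets. The standard targets are estimates of the expected discounted return. For example:
TD learning target: $r + \gamma V(s')$
Q-learning target: $r + \gamma \max_{a'} Q(s', a')$
TD error (a one-step estimate of the advantage): $\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$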
The choice of target determines what the prediction represents: the state value V, the action value Q, or the advantage function.
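The TD target, the Q-learning target and the TD error can be written as a minimal Python sketch, together with tabular TD(0) and Q-learning updates that move a prediction a small step towards its target. Assumptions: predictions are stored in plain dicts, the step size alpha is fixed, and terminal states are not handled (at a terminal s' the bootstrap term would normally be dropped). The function names are illustrative, not from any library.

```python
def td_target(r, v_next, gamma):
    """TD(0) target for V(s): r + gamma * V(s')."""
    return r + gamma * v_next


def q_learning_target(r, q_next, gamma):
    """Q-learning target for Q(s, a): r + gamma * max_a' Q(s', a').
    q_next holds Q(s', a') for every action a'."""
    return r + gamma * max(q_next)


def td_error(r, v_s, v_next, gamma):
    """TD error delta = r + gamma * V(s') - V(s), a one-step advantage estimate."""
    return td_target(r, v_next, gamma) - v_s


def td0_update(V, s, r, s_next, alpha, gamma):
    """Tabular TD(0): move V[s] a step of size alpha towards the TD target."""
    V[s] += alpha * td_error(r, V[s], V[s_next], gamma)


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Tabular Q-learning: move Q[s][a] a step of size alpha towards the Q-learning target."""
    Q[s][a] += alpha * (q_learning_target(r, Q[s_next], gamma) - Q[s][a])


if __name__ == "__main__":
    gamma = 0.5
    print(td_target(1.0, 2.0, gamma))                      # 2.0
    print(q_learning_target(1.0, [0.0, 4.0, 2.0], gamma))  # 3.0
    print(td_error(1.0, 1.5, 2.0, gamma))                  # 0.5
    V = {"A": 1.5, "B": 2.0}
    td0_update(V, "A", r=1.0, s_next="B", alpha=0.5, gamma=gamma)
    print(V["A"])                                          # 1.75
```

Swapping the target is all it takes to change what is learned: the same update shape learns V with the TD target and Q with the Q-learning target.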