Kishan Chakraborty
Research Scholar

Discovering state-of-the-art reinforcement learning algorithms

Humans and other animals use powerful RL mechanisms that were learnt over billions of years of evolution. Machines, on the other hand, learn through hand-crafted learning rules. The goal is to autonomously discover an RL algorithm that provides a better learning mechanism. This is achieved by meta-learning from the diverse experience of various agents across many challenging environments. Meta-learning is a subfield of ML in which agents are trained to learn how to learn. The discovered rule is able to outperform the existing hand-crafted rules.
Standard RL rules update the predictions $V(s)$ or $Q(s, a)$, as well as the policy $\pi$, towards targets. The standard targets are some estimate of the expected discounted return. For example:

$$
\begin{aligned}
\text{TD learning target} &= r_t + \gamma V(s_{t+1}) \\
\text{Q-learning target} &= r_t + \gamma \max_{a'} Q(s_{t+1}, a') \\
\text{TD error} &= r_t + \gamma V(s_{t+1}) - V(s_t)
\end{aligned}
$$

The choice of target defines what the prediction represents: the state value $V$, the action value $Q$, or the advantage function.
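To make the role of the target concrete, here is a minimal sketch in plain Python of the three standard quantities and of a prediction being moved towards its target. The function names, the step size `lr`, and the numbers for the single transition are all hypothetical, chosen only for illustration. They are not taken from the discovered rule.

```python
def td_target(reward, next_value, gamma):
    """One-step TD target for V(s): r + gamma * V(s')."""
    return reward + gamma * next_value


def q_learning_target(reward, next_q_values, gamma):
    """One-step Q-learning target for Q(s, a): r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)


def td_error(reward, value, next_value, gamma):
    """TD error r + gamma * V(s') - V(s), a one-sample advantage estimate."""
    return td_target(reward, next_value, gamma) - value


def move_toward(prediction, target, lr):
    """Move a prediction a fraction lr of the way towards its target."""
    return prediction + lr * (target - prediction)


# Hypothetical numbers for a single transition (s, a, r, s').
gamma = 0.9               # discount factor
reward = 1.0              # r
value_s = 0.5             # current prediction V(s)
value_next = 2.0          # current prediction V(s')
q_next = [0.0, 3.0, 1.0]  # current predictions Q(s', a') for each action a'

v_target = td_target(reward, value_next, gamma)
q_target = q_learning_target(reward, q_next, gamma)
delta = td_error(reward, value_s, value_next, gamma)

# Updating V(s) towards the TD target with an assumed step size of 0.1.
new_value_s = move_toward(value_s, v_target, lr=0.1)
```

The only thing that differs between the three functions is the target. The update in `move_toward` is the same in every case, which is why the target alone decides whether the prediction ends up estimating $V$, $Q$, or an advantage.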