Kishan Chakraborty
Research Scholar

Glossary

  1. Bellman Optimality Equation: Represents the value function of a state with respect to an optimal policy (see the value-iteration sketch after this glossary).

    $$V^*(s) = \max_{a}\left[r(s, a) + \gamma\, \mathbb{E}_{s'}[V^*(s')]\right]$$
  2. Environment: An RL environment is the universe of an RL agent. It consists of a set of states describing where the agent can be, and a set of actions from which the agent chooses what to do. For each action taken in each state, the environment awards the agent a reward and moves it to a new state. Mathematically, an environment is given by an MDP $(S, A, P, R, \gamma)$ (see the sketch after this glossary):

    • $S$: Set of all possible states.
    • $A$: Set of all possible actions.
    • $P(s'|s,a)$: Transition probability from state $s$ to $s'$ under action $a$.
    • $R(s, a, s')$: Reward for taking action $a$ in state $s$ and landing in state $s'$.
    • $\gamma$: Discount factor.
  3. Value Function: A metric to evaluate a policy $\pi$. It tells what an agent could achieve in the long run, starting from a state $s_t$ and following policy $\pi$ (a Monte Carlo estimate is sketched after this glossary). Mathematically,

    $$V_{\pi}(s_t) = \sum_{i=0}^{\infty} \gamma^{i}\, r_{t+1+i}$$
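
Below is a minimal Python sketch of value iteration, which repeatedly applies the Bellman optimality backup from definition 1 until the state values stop changing. The two-state MDP (its transition probabilities and rewards) is invented purely for illustration and is not part of the definitions above.

```python
# Value iteration on a hypothetical two-state, two-action MDP.
# All numbers below are made up for illustration only.
import numpy as np

gamma = 0.9                      # discount factor
n_states, n_actions = 2, 2

# P[a][s][s'] : transition probability, R[a][s] : expected reward r(s, a)
P = np.array([[[0.8, 0.2],       # action 0 from state 0
               [0.1, 0.9]],      # action 0 from state 1
              [[0.5, 0.5],       # action 1 from state 0
               [0.0, 1.0]]])     # action 1 from state 1
R = np.array([[1.0, 0.0],        # r(s, a=0) for s = 0, 1
              [2.0, -1.0]])      # r(s, a=1) for s = 0, 1

V = np.zeros(n_states)
for _ in range(1000):
    # Bellman optimality backup: V(s) <- max_a [ r(s,a) + gamma * E_{s'}[V(s')] ]
    Q = R.T + gamma * np.einsum('ast,t->sa', P, V)   # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

print("Optimal state values:", V)
print("Greedy (optimal) policy:", Q.argmax(axis=1))
```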
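The next sketch writes an environment down as the MDP tuple $(S, A, P, R, \gamma)$ from definition 2. The three-state chain, its deterministic transitions, and the `step` helper are hypothetical choices made only to make each component concrete.

```python
# An environment expressed as the MDP tuple (S, A, P, R, gamma).
# The 1-D chain below is invented purely for illustration.
from dataclasses import dataclass
import numpy as np

@dataclass
class MDP:
    states: list          # S: set of all possible states
    actions: list         # A: set of all possible actions
    P: np.ndarray         # P[s, a, s']: transition probability from s to s'
    R: np.ndarray         # R[s, a, s']: reward for action a in state s, landing in s'
    gamma: float          # discount factor

# 3-state chain: 0 -- 1 -- 2, actions: 0 = left, 1 = right
n_s, n_a = 3, 2
P = np.zeros((n_s, n_a, n_s))
R = np.zeros((n_s, n_a, n_s))
for s in range(n_s):
    P[s, 0, max(s - 1, 0)] = 1.0        # moving left is deterministic
    P[s, 1, min(s + 1, n_s - 1)] = 1.0  # moving right is deterministic
R[1, 1, 2] = 1.0                        # reward only for entering the goal state 2

env = MDP(states=list(range(n_s)), actions=[0, 1], P=P, R=R, gamma=0.9)

def step(mdp, s, a, rng=np.random.default_rng(0)):
    """Sample s' ~ P(.|s, a) and return (s', reward), like one environment step."""
    s_next = rng.choice(mdp.states, p=mdp.P[s, a])
    return s_next, mdp.R[s, a, s_next]

print(step(env, 1, 1))  # from state 1, move right: lands in state 2 with reward 1.0
```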
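Finally, a sketch of the value function from definition 3: following a fixed policy $\pi$ from a state and summing discounted rewards gives one sample of $V_\pi(s_t)$; averaging many such rollouts gives a Monte Carlo estimate. The small MDP and the uniform-random policy here are assumptions for the example.

```python
# Monte Carlo estimate of V_pi(s) on a hypothetical two-state MDP.
import numpy as np

gamma = 0.9
# P[s, a, s'] and R[s, a]: a tiny made-up MDP with 2 states and 2 actions
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.0, 1.0]]])
R = np.array([[0.5, 1.0],
              [0.0, 2.0]])
pi = np.array([[0.5, 0.5],     # pi(a|s): uniform-random policy
               [0.5, 0.5]])

def discounted_return(rewards, gamma):
    """V-style return: sum_i gamma^i * r_{t+1+i} along one sampled trajectory."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

def rollout(s, horizon=200, rng=None):
    """Follow pi from state s, collecting rewards r_{t+1}, r_{t+2}, ..."""
    rng = rng or np.random.default_rng()
    rewards = []
    for _ in range(horizon):
        a = rng.choice(2, p=pi[s])
        rewards.append(R[s, a])
        s = rng.choice(2, p=P[s, a])
    return rewards

# Average the discounted return over many sampled trajectories starting at s = 0
returns = [discounted_return(rollout(0, rng=np.random.default_rng(i)), gamma)
           for i in range(500)]
print("Estimated V_pi(s=0):", np.mean(returns))
```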