Environment: The RL environment is the world the agent acts in. At any time it is in some state, which describes the current situation, such as the location of the agent.
It offers a set of actions from which the agent chooses what to do. For each action taken in a state, the environment gives the agent
a reward and moves to a new state.
Mathematically, an environment is given by an MDP (Markov decision process), a tuple (S,A,P,R,γ):
- S: Set of all possible states.
- A: Set of all possible actions.
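- P(s′∣s,a): Probability of moving from state s to state s′ when taking action a.
- R(s,a,s′): Reward for taking action a in state s and landing in state s′.
- γ: Discount factor, a number in [0,1] that sets how much future rewards count relative to immediate ones.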
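
To make the tuple concrete, here is a minimal Python sketch of a tabular MDP. Everything in it is an assumption made for illustration and not part of the definition above: the `MDP` class, its `step` method, the `discounted_return` helper, and the two-cell corridor with actions "left" and "right". Rewards missing from the table are assumed to be 0.

```python
import random
from dataclasses import dataclass


@dataclass
class MDP:
    states: list   # S
    actions: list  # A
    P: dict        # P[(s, a)] -> {s_next: probability}
    R: dict        # R[(s, a, s_next)] -> reward (missing entries count as 0)
    gamma: float   # discount factor

    def step(self, s, a, rng=random):
        """Sample s' from P(. | s, a) and return (s', R(s, a, s'))."""
        dist = self.P[(s, a)]
        s_next = rng.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]
        return s_next, self.R.get((s, a, s_next), 0.0)


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a sequence of rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Hypothetical toy environment: a corridor with two cells, 0 and 1.
# Moving succeeds with probability 0.9; reaching cell 1 from cell 0 pays 1.
mdp = MDP(
    states=[0, 1],
    actions=["left", "right"],
    P={
        (0, "left"): {0: 1.0},
        (0, "right"): {1: 0.9, 0: 0.1},
        (1, "left"): {0: 0.9, 1: 0.1},
        (1, "right"): {1: 1.0},
    },
    R={(0, "right", 1): 1.0},
    gamma=0.9,
)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the rollout is repeatable
    s, rewards = 0, []
    for _ in range(5):
        a = rng.choice(mdp.actions)  # uniformly random policy
        s, r = mdp.step(s, a, rng)
        rewards.append(r)
    print(rewards, discounted_return(rewards, mdp.gamma))
```

Storing P and R as dictionaries keeps the code close to the notation P(s′∣s,a) and R(s,a,s′); for large state spaces, arrays indexed by state and action would likely be a better fit.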