Policy Gradient Methods
- learn a state-to-action mapping directly, which is often simpler than first learning a value function and deriving a policy from it (a minimal REINFORCE sketch follows this list)
- need no model of the environment dynamics (model-free)
- naturally handle continuous action spaces
- allow stochastic policies, which can be a crucial advantage over deterministic policies (e.g., under partial observability)
- Actor-Critic RL methods: combine a policy-gradient actor with a learned value-function critic, which reduces the variance of the gradient estimate (see the second sketch below)
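
A minimal sketch of the simplest policy gradient method, REINFORCE, on a toy contextual-bandit task. The task, hyperparameters, and variable names here are illustrative assumptions, not from the notes; the point is only the update rule: sample an action from a stochastic softmax policy, then move the parameters along the gradient of log pi(a|s) scaled by the observed return.

```python
import numpy as np

# Toy task (assumption, for illustration): state s in {0, 1};
# reward is 1 if the action matches the state, else 0.
# Policy: pi(a|s) = softmax(theta[s]) over a table of logits.

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # policy parameters
lr = 0.1                                 # learning rate (assumption)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(2000):
    s = rng.integers(n_states)             # observe a state
    probs = softmax(theta[s])              # stochastic policy pi(.|s)
    a = rng.choice(n_actions, p=probs)     # sample an action
    r = 1.0 if a == s else 0.0             # observe the reward

    # For a softmax policy, grad of log pi(a|s) w.r.t. theta[s]
    # is one_hot(a) - probs.
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    # REINFORCE update: ascend the policy gradient, scaled by the return.
    theta[s] += lr * r * grad_log_pi

for s in range(n_states):
    print(f"pi(.|s={s}) =", np.round(softmax(theta[s]), 3))
```

Note how no value function or environment model appears anywhere: the policy is improved directly from sampled (state, action, reward) triples, which is exactly the first three bullets above.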
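
A matching one-step actor-critic sketch on the same toy task (again, names and hyperparameters are illustrative assumptions). The critic learns a state-value table V(s); the actor reuses the REINFORCE update but scales it by the TD error delta = r - V(s) instead of the raw return (episodes here are one step, so the bootstrapped term gamma * V(s') is zero), which lowers the variance of the gradient estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 2, 2
theta = np.zeros((n_states, n_actions))  # actor: policy logits
V = np.zeros(n_states)                   # critic: state-value estimates
lr_actor, lr_critic = 0.1, 0.1           # learning rates (assumption)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for episode in range(2000):
    s = rng.integers(n_states)
    probs = softmax(theta[s])
    a = rng.choice(n_actions, p=probs)
    r = 1.0 if a == s else 0.0           # same toy reward as above

    delta = r - V[s]                     # TD error, used as advantage
    V[s] += lr_critic * delta            # critic update

    grad_log_pi = -probs                 # grad log pi(a|s) for softmax
    grad_log_pi[a] += 1.0
    theta[s] += lr_actor * delta * grad_log_pi  # actor update

print("V =", np.round(V, 3))
```

The only change from the REINFORCE sketch is replacing the raw reward with the critic's TD error, which is the core idea of actor-critic methods.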