MAIDEAS
Cross Entropy (CE) RL Methods
Learns a policy by discarding the trajectory data from low-reward episodes and favoring the trajectory data from high-reward episodes.
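
The filtering idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper names `select_elite` and `fit_tabular_policy`, the percentile cutoff, the tabular (count-based) policy, and the toy episode data are all assumptions made for the example. Fitting the policy to the elite state-action pairs by normalized counts is the maximum-likelihood fit for a tabular policy, which is the same as minimizing cross-entropy against the elite actions.

```python
import numpy as np

def select_elite(episodes, percentile=70.0):
    """Keep only the (state, action) pairs from high-reward episodes.

    episodes: list of (total_reward, [(state, action), ...]) tuples.
    Returns the elite pairs and the reward threshold that was used.
    """
    rewards = np.array([reward for reward, _ in episodes], dtype=float)
    threshold = np.percentile(rewards, percentile)
    elite_pairs = [
        pair
        for reward, steps in episodes
        if reward >= threshold  # low-reward episodes are filtered out
        for pair in steps
    ]
    return elite_pairs, threshold

def fit_tabular_policy(elite_pairs, n_states, n_actions, smoothing=1e-3):
    """Fit action probabilities per state from elite pairs (hypothetical helper).

    Normalized counts are the maximum-likelihood estimate for a tabular
    policy; the small smoothing term keeps every probability above zero.
    """
    counts = np.full((n_states, n_actions), smoothing)
    for state, action in elite_pairs:
        counts[state, action] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

# Toy data (made up for illustration): 4 episodes, 2 states, 2 actions.
episodes = [
    (1.0, [(0, 0), (1, 0)]),
    (5.0, [(0, 1), (1, 1)]),
    (2.0, [(0, 0), (1, 1)]),
    (9.0, [(0, 1), (1, 0)]),
]
elite_pairs, threshold = select_elite(episodes, percentile=50.0)
policy = fit_tabular_policy(elite_pairs, n_states=2, n_actions=2)
```

In a full training loop, the updated policy would be used to sample a new batch of episodes and the filter-then-fit step would repeat. With a neural-network policy, the counting step would typically be replaced by a gradient step on a cross-entropy loss over the elite pairs.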