Global Convergence in Reinforcement Learning

This project investigates global properties of the objective function (the expected return) in various reinforcement learning (RL) settings. In particular, we focus on the gradient dominance property, which ensures that any first-order stationary point is a global optimum. We will study empirically whether this property holds for certain policy classes (such as softmax tabular policies) in environments with finite state and action spaces.
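The kind of empirical check involved can be sketched as follows: on a toy finite MDP with a softmax tabular policy, run exact gradient ascent on the expected return and compare the value it reaches against the best policy found by enumeration. The MDP below (2 states, 2 actions, all transition probabilities and rewards) is entirely hypothetical and chosen only for illustration; the gradient is approximated by finite differences rather than the policy gradient theorem.

```python
import numpy as np

# Hypothetical toy MDP (2 states, 2 actions) for illustration only.
P = np.zeros((2, 2, 2))          # P[s, a, s'] transition probabilities
P[0, 0, 0] = 1.0                 # action 0 in state 0: stay in state 0
P[0, 1, 1] = 1.0                 # action 1 in state 0: move to state 1
P[1, 0, 0] = 1.0                 # action 0 in state 1: move to state 0
P[1, 1, 1] = 1.0                 # action 1 in state 1: stay in state 1
R = np.array([[0.0, 1.0],        # R[s, a] immediate rewards
              [0.0, 2.0]])
gamma = 0.9                      # discount factor
mu = np.array([0.5, 0.5])        # initial state distribution

def softmax_policy(theta):
    """Row-wise softmax: logits theta[s, a] -> pi(a | s)."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def expected_return(theta):
    """Exact J(theta) = mu^T v_pi, via the Bellman linear system."""
    pi = softmax_policy(theta)
    P_pi = np.einsum('sap,sa->sp', P, pi)   # state kernel under pi
    r_pi = (pi * R).sum(axis=1)             # expected one-step reward
    v = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)
    return mu @ v

def grad(theta, eps=1e-5):
    """Finite-difference gradient of J (adequate at this tiny scale)."""
    base, g = expected_return(theta), np.zeros_like(theta)
    for idx in np.ndindex(theta.shape):
        t = theta.copy()
        t[idx] += eps
        g[idx] = (expected_return(t) - base) / eps
    return g

theta = np.zeros((2, 2))                    # uniform policy at init
J_init = expected_return(theta)
for _ in range(2000):                       # plain gradient ascent on J
    theta += 0.5 * grad(theta)
J_final = expected_return(theta)

# Global benchmark: best deterministic policy, by enumeration
# (large logits make the softmax effectively deterministic).
J_star = max(expected_return(1e3 * np.eye(2)[[a0, a1]])
             for a0 in range(2) for a1 in range(2))
print(f"J_init={J_init:.3f}  J_final={J_final:.3f}  J*={J_star:.3f}")
```

If gradient dominance holds, gradient ascent should close the gap to `J_star` from any initialization, which is what this script checks on one instance; the project's goal is to probe this behavior more systematically across policy classes and environments.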

Requirements

Good knowledge of reinforcement learning

Strong Python programming skills

(preferred) Experience with ML libraries such as PyTorch

If interested, please send your CV and a transcript of your grades to saber.salehkaleybar@epfl.ch