Contact: Saber Salehkaleybar
This project investigates global properties of the objective function (the expected return) in various reinforcement learning (RL) settings. In particular, we focus on the gradient dominance property, which ensures that every first-order stationary point is a global optimum. We will study empirically whether this property holds for a class of policies (such as tabular softmax policies) in environments with finite state and action spaces.
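To give a rough idea of what such an empirical check might look like, here is a minimal sketch and not the project's actual methodology. It assumes a small randomly generated MDP, a discount factor, a uniform initial-state distribution, and one common form of gradient dominance, J* - J(theta) <= C * ||grad J(theta)||. All function names and parameter values are hypothetical choices for illustration. The sketch computes the exact policy gradient of a tabular softmax policy and records the suboptimality gap next to the gradient norm along a plain gradient-ascent run.

```python
import numpy as np

def random_mdp(n_states, n_actions, rng):
    """Random finite MDP (illustrative): P[s, a, s'] and R[s, a] in [0, 1]."""
    P = rng.random((n_states, n_actions, n_states))
    P /= P.sum(axis=2, keepdims=True)  # each P[s, a, :] is a distribution
    R = rng.random((n_states, n_actions))
    return P, R

def softmax_policy(theta):
    """Tabular softmax: pi[s, a] is proportional to exp(theta[s, a])."""
    z = theta - theta.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def evaluate(pi, P, R, gamma, rho):
    """Exact V, Q and normalized discounted state visitation d under pi."""
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)            # expected one-step reward
    M = np.eye(n_states) - gamma * P_pi
    V = np.linalg.solve(M, r_pi)
    Q = R + gamma * P @ V
    d = (1.0 - gamma) * np.linalg.solve(M.T, rho)
    return V, Q, d

def policy_gradient(theta, P, R, gamma, rho):
    """Exact gradient of J(theta) = rho^T V for the tabular softmax policy."""
    pi = softmax_policy(theta)
    V, Q, d = evaluate(pi, P, R, gamma, rho)
    advantage = Q - V[:, None]
    grad = d[:, None] * pi * advantage / (1.0 - gamma)
    return grad, float(rho @ V)

def optimal_return(P, R, gamma, rho, iters=2000):
    """Optimal expected return J*, computed by value iteration."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return float(rho @ V)

def gradient_dominance_trace(n_states=5, n_actions=3, gamma=0.9,
                             steps=200, eta=0.5, seed=0):
    """Run gradient ascent; return a list of (J* - J, ||grad J||) pairs."""
    rng = np.random.default_rng(seed)
    P, R = random_mdp(n_states, n_actions, rng)
    rho = np.full(n_states, 1.0 / n_states)  # assumed uniform start states
    j_star = optimal_return(P, R, gamma, rho)
    theta = np.zeros((n_states, n_actions))
    trace = []
    for _ in range(steps):
        grad, j = policy_gradient(theta, P, R, gamma, rho)
        trace.append((j_star - j, float(np.linalg.norm(grad))))
        theta = theta + eta * grad  # plain gradient ascent step
    return trace

if __name__ == "__main__":
    for gap, norm in gradient_dominance_trace()[::50]:
        # A ratio that stays bounded along the run is consistent with
        # gradient dominance on this instance; it does not prove it.
        print(f"gap={gap:.4f}  grad_norm={norm:.4f}  ratio={gap / norm:.2f}")
```

A single run on one random instance is only a sanity check. A systematic study would presumably vary the environment, the policy class, and the initialization, and would look for stationary points that are not globally optimal.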
Good knowledge of reinforcement learning
Strong Python programming skills
(preferred) Experience with ML libraries such as PyTorch
If interested, please send your CV and a transcript of your grades to firstname.lastname@example.org