### Context: Recommendation Systems

Modern recommendation systems include an online learning component that continuously improves the quality of recommendations for users as the system collects more data. In this project, we seek to build a learning algorithm whose input data is in the form of comparisons and choices. This is in contrast to most algorithms, where the data is in the form of scores or ratings. The motivation for considering such data is that it is generally considered easier for a human to choose a favorite among some given items (say, food items) instead of giving each of them an absolute score.

### Bandit Algorithms

The *dueling bandits* framework is a mathematical model that captures an online learning setting where the platform must strategically select which items to recommend so that it eventually learns the users' preferences. A recent extension of this framework is the *contextual dueling bandits problem*. Although there are a few different algorithms proposed in the literature, there has not been extensive evaluation on real-world data. In this project, we would like to test whether contextual dueling bandits are indeed better than the regular version on real-world data.

The steps in this project are:

- Survey, understand, and implement some popular dueling bandit and contextual dueling bandit algorithms
- Run these algorithms on simulated data
- Evaluate these algorithms on a real-world dataset of comparison data.

The last step involves

```
-Processing the data to make it amenable for (contextual) dueling bandit algorithms.
-Understanding how to evaluate bandit algorithms on offline data.
```

Please reach out to suryanarayana.sankagiri@epfl.ch