Collecting a large dataset on human similarity and preference perception


We want to collect a large dataset where users are asked questions of the form:

  • "Is object A more similar to B or to C?"
  • "Which of A, B, C do you prefer?"

This dataset will be the backbone of research on choice models and recommender systems. While there is a variety of existing datasets, we propose to improve on them by:

  • asking both preference and similarity questions
  • keeping track of anonymized user IDs
  • using multiple datasets (for example pictures of food, books, movies, human faces)
  • (optional) mixing different types of feedback: ratings of individual items and comparisons between items
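
To make the two question types and the anonymized user IDs concrete, here is a minimal sketch of what a single collected answer could look like. It is not taken from the existing prototype: the names `TripletResponse` and `anonymize_user`, the field layout, and the salted-hash scheme are all assumptions made for illustration. A salted hash is strictly speaking pseudonymization, so the actual scheme would need to follow whatever the ethics approval requires.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


def anonymize_user(raw_user_id: str, salt: str) -> str:
    """Hypothetical helper: map a raw user identifier to a stable opaque ID."""
    digest = hashlib.sha256((salt + raw_user_id).encode("utf-8")).hexdigest()
    return digest[:16]


@dataclass(frozen=True)
class TripletResponse:
    """One answer to a question about three items (A, B, C). Hypothetical layout."""
    user_id: str                  # anonymized ID, never the raw identifier
    dataset: str                  # e.g. "food", "books", "movies", "faces"
    kind: str                     # "similarity" or "preference"
    items: Tuple[str, str, str]   # item IDs in the order (A, B, C)
    choice: str                   # item ID the user picked

    def __post_init__(self):
        if self.kind not in ("similarity", "preference"):
            raise ValueError(f"unknown question kind: {self.kind!r}")
        a, b, c = self.items
        # Similarity: "Is A more similar to B or to C?" so only B or C is valid.
        # Preference: "Which of A, B, C do you prefer?" so any of the three is valid.
        allowed = (b, c) if self.kind == "similarity" else (a, b, c)
        if self.choice not in allowed:
            raise ValueError(f"choice {self.choice!r} is not valid for a {self.kind} question")


# Example: the same user answers one question of each type.
uid = anonymize_user("alice@example.org", salt="per-study-secret")
similarity_answer = TripletResponse(uid, "food", "similarity", ("img1", "img2", "img3"), "img3")
preference_answer = TripletResponse(uid, "food", "preference", ("img1", "img2", "img3"), "img1")
```

Storing both question types in one record type, keyed by the same anonymized ID, is what would allow similarity and preference judgments from the same person to be analyzed together.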

We already have the approval of EPFL's human research ethics committee, and we have an existing prototype, implemented as a clean and principled web app. This project is ready to take off :)

The existing prototype was implemented by a PhD student at the lab. If there's trouble with the code, you can always drop by the office and ask questions. Don't worry, you're not taking over a messy codebase that's left over from a prior semester project.

In this semester project you'll extend the existing codebase to include more datasets and types of queries. Once this is done, you'll set up a machine learning pipeline to evaluate the data and help conduct the crowdsourced experiment. We're looking for a student with strong web development skills and enthusiasm for data science.