INDY Lab - Learning A Graph Representation From Comparison Data

Learning A Graph Representation From Comparison Data

Background: Search Through Similarity Queries

Consider a new, interactive search engine that works as follows. The user has a certain abstract item in mind that they want to search for. The user is repeatedly shown a small set of items. Each time, they must select the item most similar to their chosen target. You can explore and play with such an interactive search platform for movie actors developed by our lab here.

There are two important algorithmic components at play here: - An algorithm that carefully chooses the set of actors to be displayed in each step so that it narrows down to the user's target in a few steps. - An algorithm that learns a good representation of the set of actors, such that similarities among faces are accurately captured.

In this project, we are interested in exploring the representation learning aspect. We choose to learn this representation from the comparison data that is collected from the platform itself.

The Research Problem

The most common way of representing abstract objects, such as human faces, is to map each object to a point in a low-dimensional Euclidean space. This is the approach adopted by the current algorithm in the above interactive search portal (see this paper from our group for more details). A good representation would be one where similar objects are mapped two points with a small distance, while dissimilar objects have a larger distance between their corresponding points. However, not all kinds of data can be accurately captured in a low-dimensional Euclidean space. This is because points in a Euclidean space must satisfy some constraints. Distances between abstract objects may not accurately fit the distances between any set of points in the Euclidean space.

We postulate that representing points as a sparse graph would allow us to represent these objects more accurately. Objects that are close in the graph (e.g., neighbors, or two-hop neighbors) should be similar to each other, while objects that are further apart should be dissimilar items. This sparse graph representation must be learned from the data available to us. The data is a collection of comparison statements of the form ``object i is more similar to object j than to object k", for different triplets (i,j,k).

The goals of this project can be summarized as follows:

Develop efficient algorithms to learn sparse graphs from comparison data
Test these algorithms on real-world datasets with the aim of improving performance of previous benchmarks
For the theoretically-oriented student: develop a generative model of the data, based on which we can give performance guarantees of algorithms.

We're happy to discuss more details once you reach out (suryanarayana.sankagiri@epfl.ch).