INDY Lab - Projects

Recommender System with hierarchical structure

Daichi Kuroda

Matrix factorization is one of the most widely used techniques for recommender systems. In this method, the user-item interaction matrix is assumed to be low-rank. We aim to refine this assumption by exploring hierarchical structures within the user-item matrix and investigating whether incorporating hierarchical relationships can improve performance. Additionally, we aim to develop a system that is more resilient to the influence of malicious users.

Robustness and Performance of Spectral Clustering Methods for Community Detection

Daichi Kuroda

This project examines the robustness and performance of various spectral clustering methods, focusing on the effects of different Laplacian operators. It aims to clarify when and why performance differences emerge. The goal is to provide practitioners with principled guidelines for selecting spectral methods in graph clustering.
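
As a concrete reference for the operators being compared, here is a minimal pure-Python sketch (an illustration, not the project's code) that builds the three most common graph Laplacians from a symmetric weighted adjacency matrix, assuming every node has positive degree:

```python
import math


def laplacians(adj):
    """Return (L, L_sym, L_rw) for a symmetric weighted adjacency matrix.

    L     = D - A                      (unnormalized)
    L_sym = I - D^{-1/2} A D^{-1/2}    (symmetric normalized)
    L_rw  = I - D^{-1} A               (random-walk normalized)
    Assumes every node has positive degree.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]  # weighted degrees
    L = [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(n)]
         for i in range(n)]
    L_sym = [[(1.0 if i == j else 0.0) - adj[i][j] / math.sqrt(deg[i] * deg[j])
              for j in range(n)] for i in range(n)]
    L_rw = [[(1.0 if i == j else 0.0) - adj[i][j] / deg[i] for j in range(n)]
            for i in range(n)]
    return L, L_sym, L_rw
```

For the path graph 0-1-2, `laplacians([[0, 1, 0], [1, 0, 1], [0, 1, 0]])` gives L = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]. A spectral clustering method then embeds the nodes with eigenvectors of whichever operator is chosen.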

SAT‑Driven Exact and Heuristic Treewidth & Decomposition

Sepehr Elahi

The goal of this project is to leverage SAT/MaxSAT for exact and approximate tree decompositions, evaluated against PACE 2017 Treewidth benchmarks.
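
A common SAT encoding of treewidth works with elimination orderings. A cheap heuristic baseline for comparison is the greedy min-degree ordering. The sketch below (pure Python; `min_degree_width` is a hypothetical name) returns the width of that ordering, which is an upper bound on the treewidth:

```python
def min_degree_width(edges):
    """Upper bound on treewidth from the greedy min-degree elimination ordering.

    Repeatedly eliminate a vertex of minimum degree and turn its neighbourhood
    into a clique; the largest neighbourhood seen is the width of the ordering.
    Vertex labels must be comparable (used for deterministic tie-breaking).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    width, order = 0, []
    while adj:
        v = min(adj, key=lambda x: (len(adj[x]), x))
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            adj[a].discard(v)
            adj[a] |= nbrs - {a}  # fill-in edges: neighbours become a clique
        order.append(v)
    return width, order
```

On the 5-cycle it returns width 2, which equals the treewidth. In general the bound can be loose, and that gap is where exact SAT/MaxSAT solving comes in.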

Robust Collaborative Filtering

Oscar Villemaud

Recommender systems are at the heart of the modern internet. They are the ones deciding what people see online. However, most of them are vulnerable to manipulation. Your task will be to develop a personalized recommendation algorithm that limits the influence each user can have on the others.
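
The project targets a personalized algorithm, but the idea of bounded influence already shows up in plain score aggregation. In this illustrative sketch (not the project's method), a single malicious rating can move the mean arbitrarily far. The median can only shift to a value set by the honest ratings, however extreme the malicious rating is:

```python
from statistics import mean, median


def influence(aggregate, ratings, malicious_rating):
    """How far one extra (malicious) rating moves the aggregate score."""
    return abs(aggregate(ratings + [malicious_rating]) - aggregate(ratings))


honest = [1, 2, 3, 4, 5]
# influence(mean, honest, -1000) grows with the attacker's value;
# influence(median, honest, -1000) == influence(median, honest, -10**9)
```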

Web scraping for hierarchical clustering

Daichi Kuroda

Hierarchical clustering is the task of organizing data into a tree representation. Although the field has been studied intensively, the notion of hierarchy itself had not been clearly defined. Recently, we proposed a rigorous definition of hierarchy. In this project, you will collect text and metadata from Wikipedia and/or arXiv and evaluate how well our definition of hierarchy agrees with the hierarchy derived from the metadata.

Exploring good transformations of edge weights for spectral clustering

Daichi Kuroda

This project seeks to identify optimal edge weight transformations for a range of spectral clustering algorithms, with a focus on maximizing community separation in the embedded space.
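
A minimal sketch of the experimental loop, assuming edges are given as (u, v, weight) triples with positive weights. The transformation set and the `intra_fraction` score are hypothetical placeholders. The project's actual criterion is community separation in the spectral embedding, which would replace this crude proxy:

```python
import math

# Hypothetical candidate transformations f(w), applied to every edge weight
# before the Laplacian is built.
TRANSFORMS = {
    "identity": lambda w: w,
    "log1p": math.log1p,
    "sqrt": math.sqrt,
    "binary": lambda w: 1.0 if w > 0 else 0.0,
}


def transform_edges(edges, f):
    """Return a copy of the (u, v, w) edge list with every weight replaced by f(w)."""
    return [(u, v, f(w)) for u, v, w in edges]


def intra_fraction(edges, labels):
    """Share of total edge weight that stays inside a community.

    Placeholder score only; the project would measure separation of the
    communities in the spectral embedding instead.
    """
    total = sum(w for _, _, w in edges)
    inside = sum(w for u, v, w in edges if labels[u] == labels[v])
    return inside / total
```

Looping over `TRANSFORMS.items()`, transforming the edges, running each spectral method and scoring the result gives a table of transformation versus algorithm.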

Estimating the Number of Clusters in Mixture Models using Data Thinning

Maximilien Dreveton

Determining the optimal number of clusters K for a dataset of n data points (X_1, …, X_n) is a difficult task when K is not known in advance. A common approach is to run a clustering algorithm for several values of K and select the one that minimizes an objective function. However, this approach can be flawed: it fits and validates the model on the same data, which can lead to overfitting and biased results.
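
Data thinning avoids reusing the same data by splitting each observation into two independent parts. Below is a minimal sketch for the Gaussian case with known variance σ². That variance assumption matters, and other distributions need other thinning rules. The split draws X_i^(1) given X_i from N(εX_i, ε(1 - ε)σ²) and sets X_i^(2) = X_i - X_i^(1). One could then fit the clustering for each K on X^(1) and score it on X^(2), after rescaling the fitted means by (1 - ε)/ε:

```python
import random


def gaussian_thin(xs, sigma, eps=0.5, rng=random):
    """Split each X_i ~ N(mu, sigma^2) into X1_i + X2_i = X_i (sigma assumed known).

    X1_i | X_i ~ N(eps * X_i, eps * (1 - eps) * sigma^2), so marginally
    X1_i ~ N(eps * mu, eps * sigma^2), X2_i ~ N((1 - eps) * mu, (1 - eps) * sigma^2),
    and X1_i, X2_i are independent.
    """
    sd = (eps * (1 - eps)) ** 0.5 * sigma
    x1 = [rng.gauss(eps * x, sd) for x in xs]
    x2 = [x - a for x, a in zip(xs, x1)]
    return x1, x2
```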

Distinguishing Random Graphs via Distance Measures

Paula Murmann

This project uses First Passage Percolation (FPP) to distinguish between tree, Erdős–Rényi, and complete graphs based on shortest-path distances from a source node. By simulating FPP on random graphs, we aim to identify structural thresholds and accuracy bounds for graph classification.
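
In FPP, each edge receives an i.i.d. random passage time, and distances are shortest-path lengths under those times. This sketch assumes Exp(1) passage times, a standard choice. A minimal pure-Python version using Dijkstra's algorithm on an adjacency-list graph:

```python
import heapq
import random


def fpp_distances(adj, source, rng=random):
    """First passage percolation: i.i.d. Exp(1) edge times, then Dijkstra.

    adj maps each node to a list of neighbours (undirected graph, comparable
    node labels). Returns the passage time from `source` to every reachable node.
    """
    weights = {}
    for u in adj:
        for v in adj[u]:
            key = (min(u, v), max(u, v))
            if key not in weights:
                weights[key] = rng.expovariate(1.0)  # one draw per undirected edge
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weights[(min(u, v), max(u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist
```

Repeating this over many sampled graphs from each family (trees, Erdős–Rényi, complete graphs) gives the empirical distance distributions that a classifier would compare.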

Analyzing Network Cascades to Infer Graph Properties

Maximilien Dreveton

Given the infection times of vertices from one or multiple cascades (such as waves of an epidemic) spreading through an unknown graph, what can be inferred about the graph's structure?

Clustering Community Detection Methods and Characterizing Them

Daichi Kuroda

Grouping several community detection algorithms and investigating the properties of the communities each algorithm finds.

Generating random graphs with a given treewidth of k

Sepehr Elahi

The goal of this project is to design and implement methods for generating random graphs that have a given treewidth k, in either Python or Julia.
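
One standard construction with treewidth exactly k is the k-tree. The sketch below (pure Python; `random_k_tree` is a hypothetical name) starts from a (k+1)-clique and attaches each new vertex to a uniformly chosen existing k-clique. Randomly deleting edges afterwards yields partial k-trees, whose treewidth is at most k. This is not a uniform sampler over graphs of treewidth k:

```python
import itertools
import random


def random_k_tree(n, k, rng=random):
    """Edge set of a random k-tree on nodes 0..n-1; its treewidth is exactly k."""
    if k < 1 or n < k + 1:
        raise ValueError("need k >= 1 and n >= k + 1")
    edges = set(itertools.combinations(range(k + 1), 2))  # initial (k+1)-clique
    cliques = list(itertools.combinations(range(k + 1), k))  # its k-cliques
    for v in range(k + 1, n):
        base = rng.choice(cliques)
        edges.update((u, v) for u in base)  # v joins the chosen k-clique
        # every k-subset of base + (v,) that contains v is a new k-clique
        cliques.extend(c + (v,) for c in itertools.combinations(base, k - 1))
    return edges
```

A k-tree on n nodes has k(k+1)/2 + (n - k - 1)k edges, which gives a quick sanity check.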

Collaborative item scoring for recommender systems

Oscar Villemaud

The project consists of studying how best to collect and combine the feedback of different users, which is given in the form of ratings and comparisons.
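
For the comparison part of the feedback, a standard model is Bradley-Terry, where item i beats item j with probability w_i / (w_i + w_j). Below is a minimal sketch fitting it with the classical MM (minorize-maximize) iteration. It assumes comparisons are (winner, loser) index pairs and that the directed "who beat whom" graph is strongly connected; otherwise the maximum-likelihood estimate does not exist. Combining this with ratings is left out:

```python
from collections import defaultdict


def bradley_terry(comparisons, n_items, iters=200):
    """Fit Bradley-Terry strengths w (normalized to sum to n_items) with MM updates."""
    wins = [0] * n_items
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[(min(winner, loser), max(winner, loser))] += 1
    w = [1.0] * n_items
    for _ in range(iters):
        denom = [0.0] * n_items
        for (i, j), n_ij in pair_counts.items():
            s = n_ij / (w[i] + w[j])
            denom[i] += s
            denom[j] += s
        w = [wins[i] / denom[i] for i in range(n_items)]  # MM update
        total = sum(w)
        w = [x * n_items / total for x in w]  # fix the scale (w is scale-invariant)
    return w
```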

Routing trains in Julia

Guillaume Dalle

The Flatland challenge was developed by the Swiss, German and French railway companies. It consists in finding itineraries for a set of trains so that they reach their destination as fast as possible... without colliding. The only problem is that the simulator is coded in Python, which makes it very slow: your job will be to make it fast by translating it to Julia!

Graph algorithms in Julia

Guillaume Dalle

Contribute to a major part of the Julia ecosystem by implementing efficient graph algorithms. Knowledge of Julia or graph theory is not a prerequisite: consider this project as an opportunity to learn!

Collecting a large dataset on human similarity and preference perception

Are you enthusiastic about data science? Do you have strong web development skills? Then you'd be an ideal candidate to help us conduct a large online study on human similarity and preference choices. The dataset will be the backbone of research on probabilistic choice models and recommender systems.

Few-Shot Learning

Using prior knowledge, Few-Shot Learning aims to (rapidly) generalize to new tasks from only a few examples. We have multiple possible projects available around this topic.

Bayesian Optimization for Soft Clustering

Anthony Bardou

In this project, the student will participate in the design, the implementation and the evaluation of clustering algorithms that exploit Bayesian optimization.

Matrix Factorisation With Comparison Data

Surya Sankagiri

Matrix factorisation is a classical modelling framework in recommender systems. In this project, you will perform simulations to test the performance of gradient descent on a class of nonconvex problems arising from matrix factorisation.
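
Below is a minimal full-batch gradient descent sketch for the nonconvex loss, the sum over observed (i, j) of (M_ij - ⟨u_i, v_j⟩)². It is written in pure Python with small random initialization. The step size, epoch count and initialization scale are illustrative assumptions, not tuned values:

```python
import random


def factorize(observed, n_rows, n_cols, rank=2, lr=0.01, epochs=2000, seed=0):
    """Gradient descent on sum_{(i,j) observed} (<u_i, v_j> - M_ij)^2.

    observed is a dict {(i, j): value}. Returns U, V and the loss history.
    """
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_cols)]

    def predict(i, j):
        return sum(a * b for a, b in zip(U[i], V[j]))

    def loss():
        return sum((predict(i, j) - m) ** 2 for (i, j), m in observed.items())

    history = [loss()]
    for _ in range(epochs):
        gU = [[0.0] * rank for _ in range(n_rows)]
        gV = [[0.0] * rank for _ in range(n_cols)]
        for (i, j), m in observed.items():
            r = predict(i, j) - m  # residual on an observed entry
            for k in range(rank):
                gU[i][k] += 2 * r * V[j][k]
                gV[j][k] += 2 * r * U[i][k]
        for i in range(n_rows):
            for k in range(rank):
                U[i][k] -= lr * gU[i][k]
        for j in range(n_cols):
            for k in range(rank):
                V[j][k] -= lr * gV[j][k]
        history.append(loss())
    return U, V, history
```

On a fully observed rank-1 3×3 matrix with `rank=1`, the recorded loss decreases towards zero. Varying the rank, the observation pattern and the initialization scale is the kind of simulation the project calls for.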

SIS Model Simulator with Dynamic Vaccination and Visualization

Sepehr Elahi

Develop an SIS model simulator in Python or Julia that incorporates advanced vaccination strategies and dynamic adjustments of infection and recovery rates. Create interactive visualization tools to monitor and analyze the effects of vaccinations and track disease dynamics in real time.
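
A minimal discrete-time sketch of the simulation core (pure Python; visualization omitted). The callables `beta(t)` and `delta(t)` model time-varying infection and recovery rates. `vaccinate(t, infected)` is a hypothetical hook for a vaccination strategy, and vaccinated nodes are assumed to be permanently immune:

```python
import random


def simulate_sis(adj, beta, delta, initial, steps, vaccinate=None, rng=random):
    """Discrete-time SIS on a graph given as {node: [neighbours]}.

    Each step, every infected node recovers with probability delta(t) and
    infects each susceptible, unvaccinated neighbour with probability beta(t).
    Returns the number of infected nodes after each step.
    """
    infected = set(initial)
    vaccinated = set()
    counts = [len(infected)]
    for t in range(steps):
        if vaccinate is not None:
            new_vax = set(vaccinate(t, infected))
            vaccinated |= new_vax
            infected -= new_vax  # vaccinated nodes leave the process
        b, d = beta(t), delta(t)
        new_infected = set()
        for u in infected:
            if rng.random() >= d:  # u stays infected with probability 1 - d
                new_infected.add(u)
            for v in adj[u]:
                if v not in infected and v not in vaccinated and rng.random() < b:
                    new_infected.add(v)
        infected = new_infected
        counts.append(len(infected))
    return counts
```

For example, `vaccinate=lambda t, infected: [1, 9] if t == 0 else []` shields both neighbours of node 0 on a 10-cycle, so an outbreak seeded at node 0 cannot spread.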

Computational methods for Chance-Constrained Optimization

Saeed Masiha

Numerical experiments comparing several known methods for solving chance-constrained optimization problems.
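
One of the simplest known methods, useful as a baseline in such experiments, is sample average approximation (SAA). SAA replaces the probability in the constraint by an empirical frequency over samples. Below is a minimal sketch for a hypothetical toy problem: maximize x subject to P(ξx ≤ b) ≥ 1 - α and x ≥ 0. The problem is chosen so that the answer is a sample quantile:

```python
import math


def saa_max_x(samples, b, alpha):
    """Largest x >= 0 whose empirical probability of xi * x <= b is at least 1 - alpha.

    For x > 0 the constraint reads xi <= b / x, so at least
    ceil((1 - alpha) * N) samples must lie below b / x. Assumes that sample is > 0.
    """
    xs = sorted(samples)
    k = math.ceil((1 - alpha) * len(xs))  # samples that must satisfy the constraint
    q = xs[k - 1]  # empirical (1 - alpha)-quantile of xi
    return b / q
```

With ξ ~ N(1, 0.2²), b = 1 and α = 0.05, the exact answer is 1 / (1 + 0.2 z_0.95) ≈ 0.752, where z_0.95 ≈ 1.645 is the standard normal quantile. The SAA solution approaches this value as the number of samples grows.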

Stochastic proximal first-order algorithm

Saeed Masiha

In this project, we aim to design a batch-free stochastic proximal first-order algorithm that achieves O(1/ε) gradient oracle complexity when the objective function is smooth and satisfies 2-PL (or quadratic growth).
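
For reference, here are the standard forms of the conditions involved. They are written for a composite objective F = f + h, with f smooth and accessed through stochastic gradients and h convex. That setup is an assumption, and so is reading "2-PL" as the usual PL inequality with a squared gradient norm; proximal settings often use a proximal variant of PL instead:

```latex
% PL inequality for a smooth f with minimum value f^\star:
\tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr) \qquad \forall x
% Quadratic growth, with \mathcal{X}^\star the set of minimizers of F = f + h:
F(x) - F^\star \;\ge\; \tfrac{\mu}{2}\,\operatorname{dist}(x, \mathcal{X}^\star)^2 \qquad \forall x
% Batch-free stochastic proximal gradient step: one sample \xi_t per iteration,
% where \nabla f(x; \xi) denotes an unbiased stochastic gradient (assumed)
x_{t+1} = \operatorname{prox}_{\eta_t h}\bigl(x_t - \eta_t \nabla f(x_t; \xi_t)\bigr),
\qquad
\operatorname{prox}_{\eta h}(y) = \arg\min_{z} \Bigl\{ h(z) + \tfrac{1}{2\eta}\|z - y\|^2 \Bigr\}
```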

Bounding the Regret of a Dynamic Bayesian Optimization Algorithm

Anthony Bardou

The goal of this project is to contribute to the theoretical understanding of a dynamic Bayesian optimization algorithm.

Enhancing Unsupervised Learning Through Data Thinning: An Exploration of Sample Splitting

Maximilien Dreveton

Clustering a set of n data points (X_1, …, X_n) into an optimal number of clusters K is challenging when K is unknown. A common approach is to run a clustering algorithm for different values of K and select the K that minimizes the objective function. This approach is flawed because it uses the same dataset for both model fitting and validation.

Designing and implementing an object to handle mixed-graphs

Sepehr Elahi

The aim of this project is to design and implement a class for handling mixed graphs, in either Python or Julia.

Simulation of an Infectious Process on Graphs

Paula Murmann

Simulate a mathematical model of an infectious process on graphs and compare the simulator with state-of-the-art simulators on real-world data.

Addressing Data Staleness in Dynamic Bayesian Optimization

Anthony Bardou

The goal of this project is to implement and study the performance of a criterion measuring data staleness in a dynamic Bayesian optimization context.

Does degree heterogeneity help or handicap graph clustering?

Maximilien Dreveton

Real graphs tend to show a lot of heterogeneity in the node degrees (few hubs with large degrees and a lot of low-degree nodes). Does this heterogeneity help or degrade the performance of community detection algorithms?

Sparsifying a graph by keeping only the shortest paths

Maximilien Dreveton

The goal of this project is to study different applications of graph sparsification.
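
One concrete form of "keeping only the shortest paths" is to drop every edge whose weight exceeds the shortest-path distance between its endpoints. This preserves all pairwise distances and is sometimes called the metric backbone. A minimal pure-Python sketch with Floyd-Warshall, assuming an undirected graph on nodes 0..n-1 with positive weights:

```python
def metric_backbone(n, edges):
    """Keep the (u, v, w) edges with w equal to the u-v shortest-path distance.

    Every removed edge is strictly longer than some alternative path, so all
    pairwise shortest-path distances are preserved. Ties are kept.
    """
    INF = float("inf")
    d = [[0.0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):  # Floyd-Warshall all-pairs shortest paths
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return [(u, v, w) for u, v, w in edges if w <= d[u][v]]
```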

Generalization Performance of Stochastic Gradient Methods

Saeed Masiha

The first goal of this project is to use different notions of algorithmic stability to study the generalization behavior of stochastic first-order optimization methods and the effect of variance reduction and adaptation on generalization error. The second goal is to investigate whether SGD can be seen as performing regularized empirical risk minimization, i.e., to study implicit regularization, a popular explanation for why SGD generalizes so well.

Recommendation Systems that Learn from Comparisons

Surya Sankagiri

The goal of this project is to design and implement recommendation system algorithms that can learn users' preferences based on data where users select which items they prefer when given a choice of items. In particular, we focus on multi-armed bandit algorithms.

Second-Order Methods in Deep RL

The project seeks to apply second-order methods in complex environments (such as Atari games) and to compare them empirically with first-order methods in terms of sample complexity and robustness to the choice of initialization.

Clustering temporal networks

Maximilien Dreveton

This project seeks to propose and study methods for clustering networks in which the interactions between the nodes vary with time.

How does the presence of communities affect the source localisation?

Maximilien Dreveton

This project seeks to observe how the community structure of a network affects the identification of the source of a spreading process (disease spreading, etc.).

Full-stack Web Developer for Climpact

We are looking for a full-stack Web developer (Python, JavaScript, HTML, CSS) to continue the development of a platform for Climpact, a research project about machine learning and climate. You will be paid CHF 26.- / hour during the Fall semester, on campus or remotely.

How does the second-order derivative information affect generalization error or test error?

Saeed Masiha

The goal of the project is to empirically compare the generalization error of two stochastic optimization algorithms used in reinforcement learning (SCRN and momentum-based SGD).

Global Convergence in Reinforcement Learning

The project seeks to check global properties of the objective function (the expected return) in various reinforcement learning (RL) settings.

Framing in Wikipedia

How does Wikipedia frame controversial topics and how does it change over time?

Predicting Swiss Votes Through Machine Learning

Can we predict the outcome of Swiss popular votes from information available on the web before the vote?

Early Stopping for Time-series Applications

For time-series applications, standard cross-validation, where data points are randomly split into train and test partitions, is problematic because the model can be trained on observations that come after the ones it is tested on. In this project, we would like to explore possible replacements for cross-validation in such settings.
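
A common replacement is forward chaining (rolling-origin evaluation), where every test block lies strictly after the data used for training. Below is a minimal sketch that generates such expanding-window splits (pure Python; the function name is hypothetical). scikit-learn's `TimeSeriesSplit` implements a similar scheme:

```python
def forward_chaining_splits(n, n_folds, min_train):
    """Yield (train, test) index ranges over a series 0..n-1.

    After an initial window of `min_train` points, the rest is cut into
    `n_folds` consecutive test blocks of equal size (a leftover tail is
    dropped); fold k trains on every point before its test block.
    """
    block = (n - min_train) // n_folds
    if block < 1:
        raise ValueError("not enough points for the requested folds")
    for k in range(n_folds):
        start = min_train + k * block
        yield range(0, start), range(start, start + block)
```

For n = 10, 3 folds and 4 initial training points, the folds train on indices 0-3, 0-5 and 0-7 and test on 4-5, 6-7 and 8-9. For early stopping, the last block could serve as the validation set.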

Algorithms for epidemic contact tracing on networks

The goal of the project is to develop heuristics for a toy model that captures many of the challenges of COVID-19 contact tracing.

Neural Architecture Search without Training using Sensitivity

In this project, we aim to study output sensitivity in the NAS-Bench-201 search space. The project requires a literature review of NAS algorithms. The sensitivity metric should then be compared with the state-of-the-art NAS benchmarks in terms of cost and performance.

Who Makes Law? Understanding the Structure of Lobbying in Brussels

Lobbying is sometimes considered the hidden part of the iceberg in political decisions. In this project, we'll try to understand to what extent this statement is true using quantitative methods from computer science and machine learning.

LawGit: Visualize the Evolution of European Laws

The goal of this data-mining project is to visualize the evolution of European Laws. Computer scientists are lucky to benefit from version-control systems. Let's make the rest of the world do so!

Mining International Climate Negotiations

The United Nations climate negotiations (COP) started in 1995 in Berlin, and the 25th COP was held in December 2019 in Madrid. Interactions between the countries taking part in these negotiations provide a rich environment for studying the global competitive dynamics of our world. In this project, we aim to collect and analyze a new dataset of collaborations and conflicts between countries, and to develop models of such dynamics.

Few-shot Semi-Supervised Learning with Meta-Learning

In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Meta-learning is a successful approach for few-shot learning. In this project we'll investigate, implement and compare state-of-the-art meta-learning algorithms in the related but more natural setting of semi-supervised learning.

A Closer Look at Meta-Learning: Fast Adaptation of Deep Neural Networks

Meta-learning, also known as learning-to-learn, is a paradigm that exploits cross-task information and training experience to perform well on a new unseen task. The goals of this project are (1) constructing a unifying framework to compare meta-learning algorithms for classification and regression, and (2) investigating algorithmic and theoretical improvements related to meta-learning.

Implementing source localization algorithms for benchmarking

The goal of this project is to implement existing source localization algorithms and add them to the benchmark software recently developed in the INDY lab.
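
The sketch below shows the simplest kind of estimator such a benchmark would include (pure Python; the names are hypothetical). It returns the infected node with the smallest total hop distance to all infected nodes. This distance-centrality heuristic ignores infection times and assumes the infected nodes lie in one connected component:

```python
from collections import deque


def bfs_distances(adj, src):
    """Hop distances from src in a graph given as {node: [neighbours]}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_center(adj, infected):
    """Estimated source: infected node minimizing total distance to all infected nodes."""
    def score(s):
        d = bfs_distances(adj, s)
        return sum(d[v] for v in infected)
    return min(sorted(infected), key=score)  # sorted() makes tie-breaking deterministic
```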

Reinforcement learning for active comparison-based search

Climpact: understanding people's perception of their carbon footprint

Flying to New York is worse than taking a long shower. But is it 10 times worse or 1000 times worse? In this project, we aim at understanding the perception that people have of their actions. Your task will be to i) develop an application to collect relevant data and ii) implement a statistical model of people's perception.

Implementing variants of sparsified back-propagation (MeProp) in deep neural networks

The goal of this project is, first, to compare MeProp with other regularization techniques such as Dropout and, second, to experiment with improvements or extensions of MeProp (for instance, a non-constant k over the training epochs, top-k selection vs. random-k selection, or applying MeProp to CNNs).
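
meProp keeps only the k largest-magnitude entries of a layer's output gradient during back-propagation. Below is a minimal sketch of that selection step and of the random-k alternative listed above, operating on a plain Python list. Integration into a deep learning framework is omitted:

```python
import random


def top_k_sparsify(grad, k):
    """Keep the k entries of grad with largest magnitude, zero the rest."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def random_k_sparsify(grad, k, rng=random):
    """Comparison baseline: keep k entries chosen uniformly at random."""
    keep = set(rng.sample(range(len(grad)), k))
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]
```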

Robustness analysis of the Metric Dimension

The goal of the project is to analyse how the Metric Dimension of a graph changes with random perturbations. This project is aimed at students with strong theoretical background.
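
Experimenting with perturbations requires a way to compute the metric dimension of small graphs. Below is a minimal brute-force sketch (pure Python, exponential time, so small graphs only). It checks whether a landmark set is resolving, that is, whether all nodes have distinct vectors of distances to the landmarks. A connected graph is assumed:

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, src):
    """Hop distances from src in a graph given as {node: [neighbours]}."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_resolving(adj, landmarks):
    """True if every node has a distinct distance vector to the landmarks."""
    dists = [bfs_distances(adj, s) for s in landmarks]
    vectors = {tuple(d[v] for d in dists) for v in adj}
    return len(vectors) == len(adj)


def metric_dimension(adj):
    """Size of the smallest resolving set, by exhaustive search."""
    nodes = sorted(adj)
    for size in range(1, len(nodes) + 1):
        for landmarks in combinations(nodes, size):
            if is_resolving(adj, landmarks):
                return size
```

Recomputing `metric_dimension` after randomly removing or adding edges gives the kind of empirical sensitivity the project would analyse.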

EPFL people search via profile photo comparisons using machine learning techniques

In this project, we are interested in developing a brand new search framework for navigating through the database of EPFL people via comparing their profile photos.

Mining of Political Texts

This project is about scraping, mining, understanding, and modelling political texts. The goal is to gain new insights from an existing large dataset, extending it with new data when needed. It is an open project where all ideas are encouraged.

Detecting epidemic sources in presence of misinformation

How to deal with misinformation about node states when we want to detect the source of an epidemic?

Machine Learning for Networks

The project is a mix of Machine Learning and Networking. In a hybrid network that consists of Wi-Fi and Power Line Communication links, the idea is to find the optimal end-to-end route between access points when the pairwise link capacities are known. The main task would be to predict the end-to-end throughput (regression problem). No background in Networking is required (but it is of course a plus).
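
A natural non-learned baseline assumes that a route's throughput equals its weakest link's capacity and picks the widest (maximum-bottleneck) path. Below is a minimal sketch with a modified Dijkstra, assuming capacities are given per directed link. It ignores interference between Wi-Fi and PLC links, so it is only a reference point for the regression model:

```python
import heapq


def widest_path_capacity(adj, src, dst):
    """Largest bottleneck capacity over all src -> dst routes.

    adj maps node -> list of (neighbour, link_capacity); node labels must be
    comparable. Returns 0.0 if dst is unreachable.
    """
    best = {src: float("inf")}
    heap = [(-float("inf"), src)]  # max-heap on bottleneck via negation
    while heap:
        neg_cap, u = heapq.heappop(heap)
        cap = -neg_cap
        if u == dst:
            return cap
        if cap < best.get(u, 0.0):
            continue  # stale heap entry
        for v, c in adj[u]:
            new_cap = min(cap, c)  # bottleneck of the extended route
            if new_cap > best.get(v, 0.0):
                best[v] = new_cap
                heapq.heappush(heap, (-new_cap, v))
    return 0.0
```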