Hierarchical clustering is the task of organizing data into a tree representation. Although the field has been intensively studied, the notion of hierarchy itself long lacked a precise definition. Recently, we proposed a rigorous one. In this project, you will collect text and metadata from Wikipedia and/or arXiv and evaluate how much agreement our definition of hierarchy shows with the hierarchy derived from the metadata.
In this project, the student will participate in the design, implementation, and evaluation of clustering algorithms that exploit Bayesian optimization.
In this project, the student will use the dynamic Bayesian optimization framework for power control in cellular networks.
The Bradley-Terry-Luce model (aka Multinomial Logit model) is a classical tool for modelling how humans make choices when presented with a set of alternatives. This project involves exploring generalisations of this model that take into account the effects of 'human irrationality'.
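As a concrete reference point, the BTL/MNL choice probabilities can be sketched in a few lines; the utilities below are hypothetical, purely for illustration:

```python
import math

# Bradley-Terry-Luce / multinomial logit: each item i has a latent utility
# u_i, and the probability of choosing i from a presented set S is
# exp(u_i) / sum_{j in S} exp(u_j).

def btl_choice_probs(utilities):
    """Return BTL choice probabilities over a set of alternatives."""
    exps = [math.exp(u) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical utilities for three alternatives.
probs = btl_choice_probs([1.0, 0.0, -1.0])
print(probs)  # the highest-utility item is chosen most often
```

The 'irrationality' generalisations the project mentions would replace this fixed softmax rule, e.g. by making the probabilities depend on which other alternatives are shown.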
Matrix Factorisation is a classical modelling framework in recommender systems. In this project, you will perform simulations to test the performance of gradient descent on a class of nonconvex problems arising from matrix factorisation.
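As a starting point, the kind of experiment involved can be sketched as plain gradient descent on the factorisation objective ||M - U Vᵀ||²_F / 2; the matrix sizes, step size, and iteration count below are illustrative assumptions, not tuned values:

```python
import numpy as np

# Gradient descent on the nonconvex objective f(U, V) = ||U V^T - M||_F^2 / 2,
# where M is an (unobserved) exactly rank-r matrix.
rng = np.random.default_rng(0)
n, m, r = 20, 15, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # rank-r target

U = 0.1 * rng.standard_normal((n, r))  # small random initialization
V = 0.1 * rng.standard_normal((m, r))
lr = 0.01

for _ in range(2000):
    R = U @ V.T - M                                  # residual
    U, V = U - lr * (R @ V), V - lr * (R.T @ U)      # gradients of f w.r.t. U, V

print(np.linalg.norm(U @ V.T - M))  # reconstruction error should be tiny
```

Despite nonconvexity, gradient descent from small random initialization typically converges here; quantifying when and how fast is the point of the simulations.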
This project uses First Passage Percolation (FPP) to distinguish between tree, Erdős–Rényi, and complete graphs based on shortest-path distances from a source node. By simulating FPP on random graphs, we aim to identify structural thresholds and accuracy bounds for graph classification.
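A minimal version of such a simulation, assuming i.i.d. Exponential(1) edge weights and Dijkstra's algorithm for the passage times, might look like the following (the complete graph here is just one of the three families to compare):

```python
import heapq
import random

# First Passage Percolation: assign i.i.d. Exponential(1) weights to the
# edges, then compute shortest-path (passage) times from a source node.
def fpp_times(n_nodes, edges, source=0):
    weights = {e: random.expovariate(1.0) for e in edges}
    adj = {v: [] for v in range(n_nodes)}
    for (u, v), w in weights.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:                                   # Dijkstra
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist

n = 8
complete_edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
times = fpp_times(n, complete_edges)
print(sorted(times.values()))  # passage-time profile from the source
```

Repeating this over many weight draws, the empirical distribution of passage times is the statistic on which tree / Erdős–Rényi / complete graphs would be distinguished.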
Develop an SIS model simulator in Python or Julia that incorporates advanced vaccination strategies and dynamic adjustments of infection and recovery rates. Create interactive visualization tools to monitor and analyze the effects of vaccinations and track disease dynamics in real-time.
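A bare-bones discrete-time SIS step, with a toy ring graph and a hypothetical fixed set of vaccinated nodes (all parameters are illustrative assumptions), could serve as the starting skeleton:

```python
import random

# Discrete-time SIS on a contact graph: each infected node transmits along
# each edge w.p. beta per step, and recovers back to susceptible w.p. gamma.
# Vaccinated nodes cannot be infected in this simple sketch.
def simulate_sis(adj, beta, gamma, steps, infected, vaccinated):
    infected = set(infected)
    history = [len(infected)]
    for _ in range(steps):
        new_infected = set(infected)
        for u in infected:
            for v in adj[u]:
                if v not in vaccinated and random.random() < beta:
                    new_infected.add(v)
            if random.random() < gamma:
                new_infected.discard(u)
        infected = new_infected
        history.append(len(infected))
    return history

random.seed(1)
n = 50
adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}  # toy ring graph
hist = simulate_sis(adj, beta=0.3, gamma=0.1, steps=30,
                    infected={0}, vaccinated={10, 20, 30, 40})
print(hist)  # infected count over time
```

The project's advanced strategies would replace the static `vaccinated` set and constant `beta`/`gamma` with dynamic rules, and feed `history` into the interactive visualization.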
Numerical experiments with several known methods for solving chance-constrained optimization problems.
In this project, we are going to find a batch-free stochastic proximal first-order algorithm that achieves O(1/ε) gradient oracle complexity when the objective function is smooth and satisfies the 2-PL (or quadratic growth) condition.
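For reference, the two growth conditions named above are commonly stated as follows (μ > 0 is the corresponding constant; this is standard notation, not taken from the project text):

```latex
% Polyak--Lojasiewicz (PL) condition, with f^* = \min_x f(x) and \mu > 0:
\frac{1}{2}\,\lVert \nabla f(x) \rVert^2 \;\ge\; \mu \left( f(x) - f^* \right)
\qquad \text{for all } x,
% and the quadratic-growth condition, with X^* the set of minimizers:
f(x) - f^* \;\ge\; \frac{\mu}{2}\, \mathrm{dist}(x, X^*)^2 .
```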
The project consists of studying how best to collect and combine feedback from different users, gathered in the form of ratings and comparisons.
This project seeks to identify optimal edge weight transformations for a range of spectral clustering algorithms, with a focus on maximizing community separation in the embedded space.
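To make the setting concrete, here is a minimal sketch: embed a toy two-community weighted graph with the unnormalised Laplacian, once with the raw weights and once after a candidate transformation (g(w) = w² is an arbitrary illustration, not a proposed answer):

```python
import numpy as np

# Spectral embedding via the unnormalised graph Laplacian L = D - W;
# the coordinates come from the eigenvectors after the constant one.
def spectral_embedding(W, dim=1):
    L = np.diag(W.sum(axis=1)) - W
    vals, vecs = np.linalg.eigh(L)
    return vecs[:, 1:1 + dim]   # skip the constant (eigenvalue-0) eigenvector

# Toy graph: communities {0,1,2} and {3,4,5}, heavy edges within (2.0),
# one light bridge across (0.5).
W = np.zeros((6, 6))
for i, j, w in [(0, 1, 2.0), (0, 2, 2.0), (1, 2, 2.0),
                (3, 4, 2.0), (3, 5, 2.0), (4, 5, 2.0), (2, 3, 0.5)]:
    W[i, j] = W[j, i] = w

emb_raw = spectral_embedding(W)
emb_sq = spectral_embedding(W ** 2)   # candidate edge-weight transformation
print(emb_raw.ravel())
print(emb_sq.ravel())
```

The project would then quantify, over families of transformations and clustering algorithms, which choice maximises the gap between the two communities in this embedded space.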
Determining the optimal number of clusters, K, for a dataset of n data points (X1,…,Xn) is a complex task, especially when K is unknown. A common approach involves running a clustering algorithm for various K values and selecting the one that minimizes an objective function. However, this method can be flawed, as it simultaneously fits and validates the model on the same data, potentially leading to overfitting and biased results.
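The overfitting issue can be seen directly in a toy experiment: the in-sample k-means objective keeps decreasing as K grows, so minimising it over K cannot by itself select the right K. The two-cluster Gaussian data and the tiny k-means implementation below are illustrative only:

```python
import numpy as np

# Minimal Lloyd-style k-means for illustration.
def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def quantization_error(X, centers):
    return ((X[:, None] - centers) ** 2).sum(-1).min(axis=1).mean()

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # true K = 2
               rng.normal(5, 1, (100, 2))])

for k in (1, 2, 4, 8):
    centers = kmeans(X, k)
    print(k, quantization_error(X, centers))  # decreases as k grows
```

Since the fitted objective improves monotonically with K even past the true K = 2, a sound selection rule needs something beyond in-sample fit, which is the point of the project.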
Given the infection times of vertices from one or multiple cascades (such as waves of an epidemic) spreading through an unknown graph, what can be inferred about the graph's structure?
Group several community detection algorithms and investigate the properties of the communities each algorithm finds.
The goal of this project is to design and implement methods for generating random graphs that have a given treewidth k, in either Python or Julia.
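One standard construction that could serve as a baseline (an assumption on our part, not necessarily the intended method) is the random k-tree, which has treewidth exactly k: start from a (k+1)-clique and repeatedly attach each new vertex to a uniformly chosen existing k-clique.

```python
import random
from itertools import combinations

# Random k-tree generator; the result has treewidth exactly k.
# (Graphs of treewidth <= k are exactly the subgraphs of k-trees, so
# deleting edges afterwards is one way to diversify the output.)
def random_k_tree(n, k, seed=0):
    rng = random.Random(seed)
    # start from a (k+1)-clique on vertices 0..k
    edges = {(u, v) for u, v in combinations(range(k + 1), 2)}
    # all k-element sub-cliques available as attachment points
    cliques = list(combinations(range(k + 1), k))
    for new in range(k + 1, n):
        base = rng.choice(cliques)                    # pick a k-clique
        edges.update((u, new) for u in base)          # join new vertex to it
        cliques.extend(tuple(sorted(sub + (new,)))    # record new k-cliques
                       for sub in combinations(base, k - 1))
    return edges

edges = random_k_tree(n=10, k=3)
print(len(edges))  # a k-tree on n vertices has k*n - k*(k+1)/2 edges -> 24
```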
The Flatland challenge was developed by the Swiss, German and French railway companies. It consists of finding itineraries for a set of trains so that they reach their destinations as fast as possible... without colliding. The only problem is that the simulator is coded in Python, which makes it very slow: your job will be to make it fast by translating it to Julia!
Contribute to a major part of the Julia ecosystem by implementing efficient graph algorithms. Knowledge of Julia or graph theory is not a prerequisite: consider this project as an opportunity to learn!
The first goal of this project is to use the different notions of algorithmic stability to study the generalization behavior of stochastic first-order optimization and the effect of variance reduction and adaptation on generalization error. The second goal is to study whether SGD can be seen as performing regularized empirical risk minimization, i.e., to study implicit regularization, a popular theory of why SGD generalizes so well.
Are you enthusiastic about data science? Do you have strong web development skills? Then you'd be an ideal candidate to help us conduct a large online study on human similarity and preference choices. The dataset will be the backbone of research on probabilistic choice models and recommender systems.
Using prior knowledge, Few-Shot Learning aims to (rapidly) generalize to new tasks containing only a few samples of information. We have multiple possible projects available around this topic.
The goal of this project is to contribute to the theoretical understanding of a dynamic Bayesian optimization algorithm.
Clustering a set of n data points (X1,…,Xn) into an optimal number of clusters K is challenging when K is unknown. A common approach involves running a clustering algorithm for different K values and selecting the K that minimizes the objective function. This method is flawed as it uses the same dataset for both model fitting and validation.
The aim of this project is to design and implement a class for handling mixed graphs, in either Python or Julia.
Simulate a mathematical model of an infectious process on a graph and compare the simulator to state-of-the-art simulators on real-world data.
The goal of this project is to implement and study the performance of a criterion measuring data staleness in a dynamic Bayesian optimization context.
Real graphs tend to show a lot of heterogeneity in the node degrees (few hubs with large degrees and a lot of low-degree nodes). Does this heterogeneity help or degrade the performance of community detection algorithms?
The goal of this project is to study different applications of graph sparsification.
The goal of this project is to design and implement recommendation system algorithms that can learn users' preferences based on data where users select which items they prefer when given a choice of items. In particular, we focus on multi-armed bandit algorithms.
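To fix ideas, here is a deliberately naive sketch of the setting: an epsilon-greedy "dueling" scheme learning from a simulated user's pairwise choices. The user model (Bradley-Terry with hypothetical utilities) and all parameters are illustrative assumptions, not the project's intended algorithm:

```python
import math
import random

def user_picks(i, j, utils, rng):
    """Simulated user choosing between items i and j (Bradley-Terry model)."""
    p_i = 1.0 / (1.0 + math.exp(utils[j] - utils[i]))
    return i if rng.random() < p_i else j

def best_item(utils, rounds=3000, eps=0.1, seed=0):
    rng = random.Random(seed)
    n = len(utils)
    wins, plays = [0] * n, [0] * n
    rate = lambda a: (wins[a] + 1) / (plays[a] + 2)  # smoothed win rate
    for _ in range(rounds):
        if rng.random() < eps:                       # explore: random duel
            i, j = rng.sample(range(n), 2)
        else:                                        # exploit: leader vs challenger
            i = max(range(n), key=rate)
            j = rng.choice([b for b in range(n) if b != i])
        winner = user_picks(i, j, utils, rng)
        plays[i] += 1
        plays[j] += 1
        wins[winner] += 1
    return max(range(n), key=rate)

print(best_item([0.0, 0.5, 2.0, 1.0]))  # should recover the best item
```

Proper dueling-bandit algorithms replace this heuristic with ones that come with regret guarantees; designing and analysing such algorithms is the substance of the project.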
The project seeks to apply second-order methods in complex environments (such as Atari games) and compare their performance with first-order methods empirically, in terms of sample complexity and robustness to changes in initialization.
This project seeks to propose and study methods for clustering networks in which the interactions between the nodes vary with time.
This project seeks to observe how the community structure of a network affects the identification of the source of a spreading process (disease spreading, etc.).
We are looking for a full-stack Web developer (Python, JavaScript, HTML, CSS) to continue the development of a platform for Climpact, a research project about machine learning and climate. You will be paid CHF 26.- / hour during the Fall semester, on campus or remotely.
The goal of the project is to compare empirically the generalization error of two stochastic optimization algorithms used in Reinforcement Learning (SCRN and momentum-based SGD).
The project seeks to check global properties of the objective function (the expected return) in various reinforcement learning (RL) settings.
How does Wikipedia frame controversial topics and how does it change over time?
Can we predict the outcome of Swiss popular votes from information available on the web before the vote?
For time-series applications, standard cross-validation, where data is randomly sampled into train and test partitions, is problematic. In this project, we would like to explore possible replacements for cross-validation in such settings.
The goal of the project is to develop heuristics for a toy model that captures many of the challenges of COVID-19 contact tracing.
In this project, we aim to study the output sensitivity in the NAS-Bench-201 search space. The project requires a literature review of NAS algorithms. The sensitivity metric should then be compared to state-of-the-art NAS benchmarks in terms of cost and performance.
Lobbying is sometimes considered the hidden part of the iceberg in political decisions. In this project, we'll try to understand to what extent this statement is true, using quantitative methods from computer science and machine learning.
The goal of this data-mining project is to visualize the evolution of European laws. Computer scientists are lucky to benefit from version-control systems. Let's bring that benefit to the rest of the world!
The United Nations climate negotiations (COP) started in 1995 in Berlin, and the 25th COP will be held in December 2019 in Madrid. Interactions between the countries taking part in these negotiations provide a rich environment to study the global competitive dynamics of our world. In this project, we aim at collecting and analyzing a new dataset of collaborations and conflicts between countries, as well as developing models of such dynamics.
In few-shot classification, we are interested in learning algorithms that train a classifier from only a handful of labeled examples. Meta-learning is a successful approach for few-shot learning. In this project we'll investigate, implement and compare state-of-the-art meta-learning algorithms in the related but more natural setting of semi-supervised learning.
Meta-learning, also known as learning-to-learn, is a paradigm that exploits cross-task information and training experience to perform well on a new unseen task. The goals of this project are (1) constructing a unifying framework to compare meta-learning algorithms for classification and regression, and (2) investigating algorithmic and theoretical improvements related to meta-learning.
The goal of this project is to implement existing source localization algorithms and add them to the benchmark software recently developed in the INDY lab.
Reinforcement learning for active comparison-based search.
Flying to New York is worse than taking a long shower. But is it 10 times worse or 1000 times worse? In this project, we aim at understanding the perception that people have of their actions. Your task will be to i) develop an application to collect relevant data and ii) implement a statistical model of people's perception.
The goal of this project is, first, to compare MeProp with other regularization techniques such as Dropout, and second, to experiment with improvements or extensions of MeProp (for instance, a non-constant k over the training epochs, top-k vs. random-k selection, extending MeProp to CNNs, etc.).
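For reference, the core MeProp operation, keeping only the top-k gradient components by magnitude in the backward pass, can be sketched in isolation (the layer and shapes are illustrative; ties at the threshold may keep slightly more than k entries in this simple version):

```python
import numpy as np

def meprop_backward(grad_out, k):
    """Zero all but the top-k entries (by magnitude) of an output gradient."""
    flat = np.abs(grad_out).ravel()
    if k >= flat.size:
        return grad_out
    thresh = np.partition(flat, -k)[-k]   # k-th largest magnitude
    mask = np.abs(grad_out) >= thresh
    return grad_out * mask

# Toy gradient vector: only the two largest-magnitude entries survive.
g = np.array([0.1, -2.0, 0.05, 1.5, -0.3])
print(meprop_backward(g, k=2))  # keeps -2.0 and 1.5, zeroes the rest
```

The project variants (random-k selection, a schedule for k over epochs) would swap out or parameterise the selection rule in this function.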
The goal of the project is to analyse how the Metric Dimension of a graph changes under random perturbations. This project is aimed at students with a strong theoretical background.
In this project, we are interested in developing a brand new search framework for navigating the database of EPFL people by comparing their profile photos.
This project is about scraping, mining, understanding, and modelling political texts. The goal will be to extract new insights from an existing large dataset, extending it with new data when needed. It is an open project where all ideas are encouraged.
How to deal with misinformation about node states when we want to detect the source of an epidemic?
The project is a mix of Machine Learning and Networking. In a hybrid network that consists of Wi-Fi and Power Line Communication, the idea is to find the optimal end-to-end route between access points when the pairwise link capacities are known. The main task would be to predict the end-to-end throughput (a regression problem). No background in Networking is required (but it is of course a plus).