Information and Network Dynamics group

Our research group is part of the School of Computer and Communication Sciences at EPFL in Lausanne, Switzerland. The group is led by Matthias Grossglauser and Patrick Thiran. Our research focuses broadly on the statistical modeling of large dynamical systems involving both human and technical agents. Examples include social and information networks, epidemic processes, human mobility and transportation, and recommender systems. Our work lies at the intersection of machine learning, probabilistic modeling, large-scale data analytics, and performance analysis. Here are the research areas we work on:

Graph Mining

Network alignment, network assembly, and network inference

Mobility Mining

Prediction and transfer learning in populations

Epidemics

Monitoring, prediction, and source inference

Distributed Processes on Graphs

Gossiping, voting, and optimization

Discrete-Choice Models

Large-scale inference and ranking

Active Learning

Multi-armed bandits, online optimization, active learning

Wireless and Hybrid Networking

Wireless networking, power-line communication, hybrid networking

Applications

In computational biology, data privacy, medical data analytics, etc.

Recent publications

Recovering Static and Time-Varying Communities Using Persistent Edges
K. Avrachenkov, M. Dreveton and L. Leskelä
IEEE Transactions on Network Science and Engineering, 2024.

This article focuses on spectral methods for recovering communities in temporal networks. In the case of fixed communities, spectral clustering on the simple time-aggregated graph (i.e., the weighted graph formed by the sum of the interactions over all temporal snapshots) does not always produce satisfactory results. To utilise the information carried by temporal correlations, we propose to employ different weights on freshly appearing and persistent edges. We show that spectral clustering on such weighted graphs can be explained as a relaxation of the maximum likelihood estimator of an extension of the degree-corrected stochastic block model with Markov interactions. We also study the setting of evolving communities, for which we use the prediction at time t-1 as an oracle for inferring the community labels at time t. We demonstrate the accuracy of the proposed methods on synthetic and real data sets.
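As a rough sketch of this weighting idea (illustrative only; the weight values, input format, and estimator details below are assumptions, not the paper's exact method), one can aggregate binary snapshots while up-weighting edges that persist across consecutive snapshots, then run standard spectral clustering on the weighted graph:

```python
import numpy as np
from sklearn.cluster import KMeans

def weighted_aggregate(snapshots, w_fresh=1.0, w_persistent=2.0):
    """Sum temporal snapshots, up-weighting persistent edges.

    snapshots: list of symmetric (n, n) 0/1 adjacency matrices, one per
    time step. An edge present at both t-1 and t counts as persistent;
    an edge present at t but not at t-1 counts as fresh. The weight
    values here are arbitrary illustrations.
    """
    W = w_fresh * snapshots[0].astype(float)  # edges at t=0 count as fresh
    for prev, curr in zip(snapshots, snapshots[1:]):
        persistent = prev * curr       # present in both consecutive snapshots
        fresh = curr * (1 - prev)      # newly appearing at this time step
        W += w_persistent * persistent + w_fresh * fresh
    return W

def spectral_communities(W, k):
    """Vanilla spectral clustering on a weighted adjacency matrix."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0                                  # guard isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ W @ D_inv_sqrt             # normalized adjacency
    _, vecs = np.linalg.eigh(A_norm)                 # eigenpairs, ascending
    X = vecs[:, -k:]                                 # top-k eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```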

It’s All Relative: Learning Interpretable Models for Scoring Subjective Bias in Documents from Pairwise Comparisons
A. Suresh, C. H. Wu and M. Grossglauser
18th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2024), Malta, March 17-22, 2024.

We propose an interpretable model to score the subjective bias present in documents, based only on their textual content. Our model is trained on pairs of revisions of the same Wikipedia article, where one version is more biased than the other. Although prior approaches based on bias classification have struggled to obtain a high accuracy for the task, we are able to develop a useful model for scoring bias by learning to accurately perform pairwise comparisons. We show that we can interpret the parameters of the trained model to discover the words most indicative of bias. We also apply our model in three different settings by studying the temporal evolution of bias in Wikipedia articles, comparing news sources based on bias, and scoring bias in law amendments. In each case, we demonstrate that the outputs of the model can be explained and validated, even for the two domains that are outside the training-data domain. We also use the model to compare the general level of bias between domains, where we see that legal texts are the least biased and news media are the most biased, with Wikipedia articles in between.
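The pairwise-comparison scheme can be sketched as a Bradley-Terry-style logistic model on feature differences: the probability that revision a is the more biased one is modeled as sigma(w·(x_a - x_b)), and the learned weight vector doubles as a per-word bias indicator. A minimal sketch on synthetic stand-in data (all names and data are illustrative, not the paper's code):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_pairs, n_words = 1000, 500
x_a = rng.poisson(0.1, (n_pairs, n_words)).astype(float)  # revision-a features
x_b = rng.poisson(0.1, (n_pairs, n_words)).astype(float)  # revision-b features
y = rng.integers(0, 2, n_pairs)  # 1 if revision a is the more biased one

# Bradley-Terry as logistic regression on feature differences;
# no intercept, since only relative bias scores are identifiable.
model = LogisticRegression(fit_intercept=False).fit(x_a - x_b, y)
w = model.coef_.ravel()

def bias_score(x):
    """Score a single document's bias from its word-count features."""
    return float(x @ w)

top_bias_words = np.argsort(w)[-10:]  # indices of most bias-indicative words
```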

Leveraging Unlabeled Data to Track Memorization
M. Forouzesh, H. Sedghi and P. Thiran
11th International Conference on Learning Representations (ICLR 2023), Kigali, Rwanda, May 1-5, 2023.

Deep neural networks may easily memorize noisy labels present in real-world data, which degrades their ability to generalize. It is therefore important to track and evaluate the robustness of models against noisy-label memorization. We propose a metric, called susceptibility, to gauge such memorization for neural networks. Susceptibility is simple and easy to compute during training. Moreover, it requires no access to ground-truth labels and uses only unlabeled data. We empirically show the effectiveness of our metric in tracking memorization on various architectures and datasets and provide theoretical insights into the design of the susceptibility metric. Finally, we show through extensive experiments on datasets with synthetic and real-world label noise that one can utilize susceptibility and the overall training accuracy to distinguish models that maintain low memorization of the training set and generalize well to unseen clean data.
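The paper defines susceptibility precisely; the snippet below is only a loose proxy for the underlying intuition, namely that a memorization-prone model changes its predictions on unlabeled data when random labels are injected into training. Everything here (model, noise level, disagreement measure) is an assumption for illustration, not the paper's metric:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_unlab, y_train, _ = train_test_split(X, y, test_size=0.5,
                                                random_state=0)

# Inject random labels into a fraction of the training set.
rng = np.random.default_rng(0)
flip = rng.random(len(y_train)) < 0.2
y_noisy = y_train.copy()
y_noisy[flip] = rng.integers(0, 2, flip.sum())

clean = MLPClassifier(max_iter=300, random_state=0).fit(X_train, y_train)
noisy = MLPClassifier(max_iter=300, random_state=0).fit(X_train, y_noisy)

# Disagreement on *unlabeled* data as a crude susceptibility proxy:
# a model robust to label noise should barely change its predictions.
proxy = np.mean(clean.predict(X_unlab) != noisy.predict(X_unlab))
print(f"susceptibility proxy: {proxy:.3f}")
```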

Mining Effective Strategies for Climate Change Communication
A. Suresh, L. Milikic, F. Murray, Y. Zhu and M. Grossglauser
ICLR 2023 Workshop on Tackling Climate Change with Machine Learning, Kigali, Rwanda, May 4, 2023.

With the goal of understanding effective strategies to communicate about climate change, we build interpretable models to rank tweets related to climate change with respect to the engagement they generate. Our models are based on the Bradley-Terry model of pairwise comparison outcomes and use a combination of the tweets’ topic and metadata features to do the ranking. To remove confounding factors related to author popularity and minimise noise, they are trained on pairs of tweets that are from the same author and around the same time period and have a sufficiently large difference in engagement. The models achieve good accuracy on a held-out set of pairs. We show that we can interpret the parameters of the trained model to identify the topic and metadata features that contribute to high engagement. Among other observations, we see that topics related to climate projections, human cost and deaths tend to have low engagement while those related to mitigation and adaptation strategies have high engagement. We hope the insights gained from this study will help craft effective climate communication to promote engagement, thereby lending strength to efforts to tackle climate change.
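A minimal sketch of the confounder-removing pair construction described above, pairing tweets by the same author that are close in time but far apart in engagement; the column names and thresholds are hypothetical, not taken from the paper:

```python
import pandas as pd

def make_pairs(tweets: pd.DataFrame, max_gap_days=7, min_ratio=2.0):
    """Build comparison pairs from a DataFrame with (hypothetical) columns
    'author', 'timestamp' (pd.Timestamp), and 'engagement'."""
    pairs = []
    for _, group in tweets.groupby("author"):
        group = group.sort_values("timestamp")
        for i in range(len(group) - 1):
            a, b = group.iloc[i], group.iloc[i + 1]
            close = (b["timestamp"] - a["timestamp"]).days <= max_gap_days
            hi = max(a["engagement"], b["engagement"])
            lo = min(a["engagement"], b["engagement"])
            # keep same-author pairs that are close in time but have a
            # sufficiently large engagement gap
            if close and lo > 0 and hi / lo >= min_ratio:
                label = int(a["engagement"] > b["engagement"])
                pairs.append((a.name, b.name, label))
    return pairs
```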

Stochastic Second-Order Methods Improve Best-Known Sample Complexity of SGD for Gradient-Dominated Functions
S. Masiha, S. Salehkaleybar, N. He, N. Kiyavash and P. Thiran
2022.

We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of functions satisfying the gradient dominance property with exponent $1\le\alpha\le 2$, which holds in a wide range of applications in machine learning and signal processing. This condition ensures that any first-order stationary point is a global optimum. We prove that the total sample complexity of SCRN in achieving an $\epsilon$-global optimum is $\mathcal{O}(\epsilon^{-7/(2\alpha)+1})$ for $1\le\alpha< 3/2$ and $\tilde{\mathcal{O}}(\epsilon^{-2/\alpha})$ for $3/2\le\alpha\le 2$. SCRN improves the best-known sample complexity of stochastic gradient descent. Even under a weak version of the gradient dominance property, which is applicable to policy-based reinforcement learning (RL), SCRN achieves the same improvement over stochastic policy gradient methods. Additionally, we show that the average sample complexity of SCRN can be reduced to $\mathcal{O}(\epsilon^{-2})$ for $\alpha=1$ using a variance reduction method with time-varying batch sizes. Experimental results in various RL settings showcase the remarkable performance of SCRN compared to first-order methods.
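For reference, the gradient dominance condition with exponent $\alpha$ is commonly written as below (the constant $c$ and the notation $f^{*}$ are generic, not copied from the paper). Driving $\|\nabla f(x)\|$ below $(\epsilon/c)^{1/\alpha}$ then guarantees an $\epsilon$-global optimum, which is how stationarity guarantees translate into the global rates above:

```latex
% Gradient dominance with exponent \alpha:
% a small gradient norm forces near-optimality in function value.
% \alpha = 2 recovers the Polyak-Lojasiewicz condition (up to constants).
f(x) - f^{*} \;\le\; c \,\|\nabla f(x)\|^{\alpha}
\quad \text{for all } x,
\qquad 1 \le \alpha \le 2, \; c > 0, \; f^{*} = \min_{x'} f(x').
```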

We have open positions!

We are hiring postdocs and PhD students in all our research areas.