Our research group is part of the School of Computer and Communication Sciences at EPFL in Lausanne, Switzerland. The group is lead by Matthias Grossglauser and Patrick Thiran. Our research focuses broadly on the statistical modeling of large dynamical systems involving both human and technical agents. Examples include social and information networks, epidemic processes, human mobility and transportation, and recommender systems. Our work lies at the intersection of machine learning, probabilistic modeling, large-scale data analytics, and performance analysis. Here are the research areas we work on:

**
TB Hackathon: Development and Comparison of Five Models to Predict Subnational Tuberculosis Prevalence in Pakistan
**

S. Alba, E. Rood, F. Mecatti, J. M. Ross, P. J. Dodd, S. Chang, M. Potgieter, G. Bertarelli, N. J. Henry, K. E. LeGrand, W. Trouleau, D. Shaweno, P. MacPherson, Z. Z. Qin, C. Mergenthaler, F. Giardina, E. Augustijn, A. Q. Baloch and A. Latif

Tropical Medicine And Infectious Disease, 2022.

Pakistan's national tuberculosis control programme (NTP) is among the many programmes worldwide that value the importance of subnational tuberculosis (TB) burden estimates to support disease control efforts, but do not have reliable estimates. A hackathon was thus organised to solicit the development and comparison of several models for small area estimation of TB. The TB hackathon was launched in April 2019. Participating teams were requested to produce district-level estimates of bacteriologically positive TB prevalence among adults (over 15 years of age) for 2018. The NTP provided case-based data from their 2010-2011 TB prevalence survey, along with data relating to TB screening, testing and treatment for the period between 2010-2011 and 2018. Five teams submitted district-level TB prevalence estimates, methodological details and programming code. Although the geographical distribution of TB prevalence varied considerably across models, we identified several districts with consistently low notification-to-prevalence ratios. The hackathon highlighted the challenges of generating granular spatiotemporal TB prevalence forecasts based on a cross-sectional prevalence survey data and other data sources. Nevertheless, it provided a range of approaches to subnational disease modelling. The NTP's use and plans for these outputs shows that, limitations notwithstanding, they can be valuable for programme planning.

**
On the robustness of the metric dimension of grid graphs to adding a single edge
**

S. Mashkaria, G. Odor and P. Thiran

Discrete Applied Mathematics, 2022-07-31.

The metric dimension (MD) of a graph is a combinatorial notion capturing the minimum number of landmark nodes needed to distinguish every pair of nodes in the graph based on graph distance. We study how much the MD can increase if we add a single edge to the graph. The extra edge can either be selected adversarially, in which case we are interested in the largest possible value that the MD can take, or uniformly at random, in which case we are interested in the distribution of the MD. The adversarial setting has already been studied by [Eroh et. al., 2015] for general graphs, who found an example where the MD doubles on adding a single edge. By constructing a different example, we show that this increase can be as large as exponential. However, we believe that such a large increase can occur only in specially constructed graphs, and that in most interesting graph families, the MD at most doubles on adding a single edge. We prove this for $d$-dimensional grid graphs, by showing that $2d$ appropriately chosen corners and the endpoints of the extra edge can distinguish every pair of nodes, no matter where the edge is added. For the special case of $d=2$, we show that it suffices to choose the four corners as landmarks. Finally, when the extra edge is sampled uniformly at random, we conjecture that the MD of 2-dimensional grids converges in probability to $3+\mathrm{Ber}(8/27)$, and we give an almost complete proof.

**
The power of adaptivity in source identification with time queries on the path
**

V. Lecomte, G. Odor and P. Thiran

Theoretical Computer Science, 2022-04-08.

We study the problem of identifying the source of a stochastic diffusion process spreading on a graph based on the arrival times of the diffusion at a few queried nodes. In a graph $G=(V,E)$, an unknown source node $v^* \in V$ is drawn uniformly at random, and unknown edge weights $w(e)$ for $e\in E$, representing the propagation delays along the edges, are drawn independently from a Gaussian distribution of mean $1$ and variance $\sigma^2$. An algorithm then attempts to identify $v^*$ by querying nodes $q \in V$ and being told the length of the shortest path between $q$ and $v^*$ in graph $G$ weighted by $w$. We consider two settings: \emph{non-adaptive}, in which all query nodes must be decided in advance, and \emph{adaptive}, in which each query can depend on the results of the previous ones. Both settings are motivated by an application of the problem to epidemic processes (where the source is called patient zero), which we discuss in detail. We characterize the query complexity when $G$ is an $n$-node path. In the non-adaptive setting, $\Theta(n\sigma^2)$ queries are needed for $\sigma^2 \leq 1$, and $\Theta(n)$ for $\sigma^2 \geq 1$. In the adaptive setting, somewhat surprisingly, only $\Theta(\log\log_{1/\sigma}n)$ are needed when $\sigma^2 \leq 1/2$, and $\Theta(\log \log n)+O_\sigma(1)$ when $\sigma^2 \geq 1/2$. This is the first mathematical study of source identification with time queries in a non-deterministic diffusion process.

**
Augmenting and Tuning Knowledge Graph Embeddings
**

R. Bamler, F. Salehi and S. Mandt

35th Uncertainty in Artificial Intelligence (UAI) Conference, Tel Aviv, ISRAEL, Jul 22-25, 2019.

Knowledge graph embeddings rank among the most successful methods for link prediction in knowledge graphs, i.e., the task of completing an incomplete collection of relational facts. A downside of these models is their strong sensitivity to model hyperparameters, in particular regularizers, which have to be extensively tuned to reach good performance [Kadlec el al, 2017]. We propose an efficient method for large scale hyperparameter tuning by interpreting these models in a probabilistic framework. After a model augmentation that introduces perentity hyperparameters, we use a variational expectation-maximization approach to tune thousands of such hyperparameters with minimal additional cost. Our approach is agnostic to details of the model and results in a new state of the art in link prediction on standard benchmark data.

**
Sequential metric dimension for random graphs
**

G. Odor and P. Thiran

Journal of Applied Probability, 2021.

In the localization game on a graph, the goal is to find a fixed but unknown target node v* with the least number of distance queries possible. In the j-th step of the game, the player queries a single node v_j and receives, as an answer to their query, the distance between the nodes v_j and v* . The sequential metric dimension (SMD) is the minimal number of queries that the player needs to guess the target with absolute certainty, no matter where the target is. The term SMD originates from the related notion of metric dimension (MD), which can be defined the same way as the SMD except that the player’s queries are non-adaptive. In this work we extend the results of Bollobás, Mitsche, and Prałat on the MD of Erdős–Rényi graphs to the SMD. We find that, in connected Erdős–Rényi graphs, the MD and the SMD are a constant factor apart. For the lower bound we present a clean analysis by combining tools developed for the MD and a novel coupling argument. For the upper bound we show that a strategy that greedily minimizes the number of candidate targets in each step uses asymptotically optimal queries in Erdős–Rényi graphs. Connections with source localization, binary search on graphs, and the birthday problem are discussed.

We are hiring postdocs and PhD students in all our research areas.