Our research group is part of the School of Computer and Communication Sciences at EPFL in Lausanne, Switzerland. The group is lead by Matthias Grossglauser and Patrick Thiran. Our research focuses broadly on the statistical modeling of large dynamical systems involving both human and technical agents. Examples include social and information networks, epidemic processes, human mobility and transportation, and recommender systems. Our work lies at the intersection of machine learning, probabilistic modeling, large-scale data analytics, and performance analysis. Here are the research areas we work on:

**
On the robustness of the metric dimension of grid graphs to adding a single edge
**

S. Mashkaria, G. Odor and P. Thiran

Discrete Applied Mathematics, 2022-07-31.

The metric dimension (MD) of a graph is a combinatorial notion capturing the minimum number of landmark nodes needed to distinguish every pair of nodes in the graph based on graph distance. We study how much the MD can increase if we add a single edge to the graph. The extra edge can either be selected adversarially, in which case we are interested in the largest possible value that the MD can take, or uniformly at random, in which case we are interested in the distribution of the MD. The adversarial setting has already been studied by [Eroh et. al., 2015] for general graphs, who found an example where the MD doubles on adding a single edge. By constructing a different example, we show that this increase can be as large as exponential. However, we believe that such a large increase can occur only in specially constructed graphs, and that in most interesting graph families, the MD at most doubles on adding a single edge. We prove this for $d$-dimensional grid graphs, by showing that $2d$ appropriately chosen corners and the endpoints of the extra edge can distinguish every pair of nodes, no matter where the edge is added. For the special case of $d=2$, we show that it suffices to choose the four corners as landmarks. Finally, when the extra edge is sampled uniformly at random, we conjecture that the MD of 2-dimensional grids converges in probability to $3+\mathrm{Ber}(8/27)$, and we give an almost complete proof.

**
The power of adaptivity in source identification with time queries on the path
**

V. Lecomte, G. Odor and P. Thiran

Theoretical Computer Science, 2022-04-08.

We study the problem of identifying the source of a stochastic diffusion process spreading on a graph based on the arrival times of the diffusion at a few queried nodes. In a graph $G=(V,E)$, an unknown source node $v^* \in V$ is drawn uniformly at random, and unknown edge weights $w(e)$ for $e\in E$, representing the propagation delays along the edges, are drawn independently from a Gaussian distribution of mean $1$ and variance $\sigma^2$. An algorithm then attempts to identify $v^*$ by querying nodes $q \in V$ and being told the length of the shortest path between $q$ and $v^*$ in graph $G$ weighted by $w$. We consider two settings: \emph{non-adaptive}, in which all query nodes must be decided in advance, and \emph{adaptive}, in which each query can depend on the results of the previous ones. Both settings are motivated by an application of the problem to epidemic processes (where the source is called patient zero), which we discuss in detail. We characterize the query complexity when $G$ is an $n$-node path. In the non-adaptive setting, $\Theta(n\sigma^2)$ queries are needed for $\sigma^2 \leq 1$, and $\Theta(n)$ for $\sigma^2 \geq 1$. In the adaptive setting, somewhat surprisingly, only $\Theta(\log\log_{1/\sigma}n)$ are needed when $\sigma^2 \leq 1/2$, and $\Theta(\log \log n)+O_\sigma(1)$ when $\sigma^2 \geq 1/2$. This is the first mathematical study of source identification with time queries in a non-deterministic diffusion process.

**
Augmenting and Tuning Knowledge Graph Embeddings
**

R. Bamler, F. Salehi and S. Mandt

35th Uncertainty in Artificial Intelligence (UAI) Conference, Tel Aviv, ISRAEL, Jul 22-25, 2019.

Knowledge graph embeddings rank among the most successful methods for link prediction in knowledge graphs, i.e., the task of completing an incomplete collection of relational facts. A downside of these models is their strong sensitivity to model hyperparameters, in particular regularizers, which have to be extensively tuned to reach good performance [Kadlec el al, 2017]. We propose an efficient method for large scale hyperparameter tuning by interpreting these models in a probabilistic framework. After a model augmentation that introduces perentity hyperparameters, we use a variational expectation-maximization approach to tune thousands of such hyperparameters with minimal additional cost. Our approach is agnostic to details of the model and results in a new state of the art in link prediction on standard benchmark data.

**
Sequential metric dimension for random graphs
**

G. Odor and P. Thiran

Journal of Applied Probability, 2021.

In the localization game on a graph, the goal is to find a fixed but unknown target node v* with the least number of distance queries possible. In the j-th step of the game, the player queries a single node v_j and receives, as an answer to their query, the distance between the nodes v_j and v* . The sequential metric dimension (SMD) is the minimal number of queries that the player needs to guess the target with absolute certainty, no matter where the target is. The term SMD originates from the related notion of metric dimension (MD), which can be defined the same way as the SMD except that the player’s queries are non-adaptive. In this work we extend the results of Bollobás, Mitsche, and Prałat on the MD of Erdős–Rényi graphs to the SMD. We find that, in connected Erdős–Rényi graphs, the MD and the SMD are a constant factor apart. For the lower bound we present a clean analysis by combining tools developed for the MD and a novel coupling argument. For the upper bound we show that a strategy that greedily minimizes the number of candidate targets in each step uses asymptotically optimal queries in Erdős–Rényi graphs. Connections with source localization, binary search on graphs, and the birthday problem are discussed.

**
Switchover phenomenon induced by epidemic seeding on geometric networks
**

G. Odor, D. Czifra, J. Komjáthy, L. Lovász and M. Karsai

Proceedings of the National Academy of Sciences, 2021.

It is a fundamental question in disease modeling how the initial seeding of an epidemic, spreading over a network, determines its final outcome. One important goal has been to find the seed configuration, which infects the most individuals. Although the identified optimal configurations give insight into how the initial state affects the outcome of an epidemic, they are unlikely to occur in real life. In this paper we identify two important seeding scenarios, both motivated by historical data, that reveal a complex phenomenon. In one scenario, the seeds are concentrated on the central nodes of a network, while in the second one, they are spread uniformly in the population. Comparing the final size of the epidemic started from these two initial conditions through data-driven and synthetic simulations on real and modeled geometric metapopulation networks, we find evidence for a switchover phenomenon: When the basic reproduction number R0 is close to its critical value, more individuals become infected in the first seeding scenario, but for larger values of R0, the second scenario is more dangerous. We find that the switchover phenomenon is amplified by the geometric nature of the underlying network and confirm our results via mathematically rigorous proofs, by mapping the network epidemic processes to bond percolation. Our results expand on the previous finding that, in the case of a single seed, the first scenario is always more dangerous and further our understanding of why the sizes of consecutive waves of a pandemic can differ even if their epidemic characters are similar.

We are hiring postdocs and PhD students in all our research areas.