Prof. Dr. Tobias Friedrich

# Research Seminar (Summer Term 2021)

## COVID-19 and virtual seminars

Due to the current situation we are not hosting seminars at the HPI anymore. Instead, we host virtual seminars via zoom. If you want to join the seminar on short notice or if you do not want to subscribe to the mailing list, please send an email to Katrin.Casel(at)hpi.de or to Andreas.Goebel(at)hpi.de.

## Description

A series of talks on current research and interesting topics in algorithm engineering and theoretical computer science. Everyone is welcome to attend talks.

If you want to give a talk about a topic in algorithm engineering or theoretical computer science, please write an e-mail to Dr. Katrin Casel or to Dr. Andreas Göbel.

## Announcements

For announcements of talks, subscribe to our mailing list.

## Talks (click on title to show abstract)

DateSpeakerTitle
07.04. 10:15Michael Vaichenker
As fair clustering is quite a popular topic many questions regarding it are already answered. However, the problem of finding a Fair Ratio-Cut remains mostly a blank slate. Fair Ratio-Cut is a cut in a colored graph that minimizes the ratio between edges cut and the balance of the cut, while containing each color to equal amounts within each resulting cluster. The goal of my thesis is to contribute first theoretical results for this problem and I will present to you some of the results I already found, including the hardness of the "many colors case".
16.04. 11:00Nicolas Klodt, Lars Seifert, Arthur Zahn

Chromatic Correlation Clustering (CCC) models clustering of objects with categorical pairwise relationships. The model can be viewed as clustering the vertices of a graph with edge-labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well studied problem Correlation Clustering (CC), also known as Cluster Editing.

Our main theoretical contribution is an alternative analysis of the famous Pivot algorithm for CC. We show that, when simply run color-blind, Pivot is also a linear time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear time 11-approximation, both by Anava et al. [WWW 2015].

While this theoretical result justifies Pivot as a baseline comparison for other heuristics, its blunt color-blindness performs poorly in practice. We develop a color-sensitive, practical heuristic we call Greedy Expansion that empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances.

Further, we propose a novel generalization of CCC allowing for multi-labelled edges. We argue that it is more suitable for many of the real-world applications and show that running the color blind algorithm does still result in a 3-approximation.

23.04. 12:00John Sylvester

We consider two types of controlled random walks on graphs. In the choice random walk, the controller chooses between two random neighbours at each step; in the epsilon-biased random walk the controller instead has a small probability at each step of a free choice of neighbour. We consider the problem of finding optimal strategies for the controller minimising the expected time to hit a given vertex or visit (cover) all vertices.

Using a general framework for boosting the probabilities of rare events, we show a significant speed-up over the simple random walk for graphs with good expansion properties. We also establish a complexity dichotomy for making optimal choices, which are tractable for hitting times but NP-hard for cover times on undirected graphs (PSPACE-complete for directed graphs). We shall also discuss some recent new results.

This is joint work with Agelos Georgakopoulos, John Haslegrave, Sam Olesker-Taylor and Thomas Sauerwald.

03.05. 17:00Thomas Bläsius,
Philipp Fischbeck
A theoretical running time analysis is a great tool for predicting the performance of algorithms, saving programmers from implementing and evaluating alternatives that are "obviously" worse. Though this works very well in many cases, there are situations where the predictions do not match practical observations. This often comes from the fact that practical problem instances have certain properties that make them well-behaved. In this talk, I formalize these properties using probabilistic models and discuss theoretical results based on these formalizations. In the course of this, I will discuss advantages and limitations of this model-based approach.

Bootstrap percolation is a classical model for the spread of information in a network. In the round-based version, nodes of an undirected graph become active once at least r neighbors were active in the previous round. We propose the perturbed percolation process, a superposition of two percolation processes on the same node set. One process acts on a base graph with activation threshold 1, while the other acts on a random graph with threshold r (representing local and global contacts, respectively). We consider a class of grid-like base graphs, in which the bootstrap percolation process only spreads polynomially fast. As random graph, we consider Erdős–Rényi graphs. For r=1, the perturbed percolation process percolates exponentially fast, as the diameter is known to be low. For r>=2 however, we prove that the process initially spreads polynomially fast, with no activations via the random graph, and eventually the activation rate becomes exponential, driven only by the random graph. We complement our theoretical results by empirical analyses of this process. In addition, we discuss future work on the inclusion of recovery into this process.
07.05. 11:00 Haris Angelidakis

There has been a recent surge of interest in incorporating fairness aspects into classical clustering problems. Two recently introduced variants of the k-Center problem in this spirit are Colorful k-Center, introduced by Bandyapadhyay, Inamdar, Pai, and Varadarajan (2019), and lottery models, such as the Fair Robust k-Center problem introduced by Harris, Pensyl, Srinivasan, and Trinh (2017). To address fairness aspects, these models, compared to traditional k-Center, include additional covering constraints. Prior approximation results for these models require to relax some of the normally hard constraints, like the number of centers to be opened or the involved covering constraints, and therefore, only obtain constant-factor pseudo-approximations. In this work, we introduce a new approach to deal with such covering constraints that leads to (true) approximations, including a 4-approximation for Colorful k-Center with constantly many colors---settling an open question raised by Bandyapadhyay, Inamdar, Pai, and Varadarajan---and a 4-approximation for Fair Robust k-Center, for which the existence of a (true) constant-factor approximation was also open. We complement our results by showing that if one allows an unbounded number of colors, then Colorful k-Center admits no approximation algorithm with finite approximation guarantee, assuming that P is not equal to NP. Moreover, under the Exponential Time Hypothesis, the problem is inapproximable if the number of colors grows faster than logarithmic in the size of the ground set.

This is joint work with Georg Anegg, Adam Kurpisz, and Rico Zenklusen.

21.05. 11:00 Marcus Pappik

The hard-sphere model is one of the most extensively studied models in statistical physics. It describes the continuous distribution of spherical particles, governed by hard-core interactions. An important quantity of this model is the normalizing factor of this distribution, called the partition function. We propose a Markov chain Monte Carlo algorithm for approximating the grand-canonical partition function of the hard-sphere model in d dimensions. The runtime of our algorithm is polynomial in the volume of the system for the entire known real-valued regime for the uniqueness of the Gibbs measure.

Key to our approach is to define a discretization that closely approximates the partition function of the continuous model. This results in a discrete hard-core instance that is exponential in the size of the initial hard-sphere model. Our approximation bound follows directly from the correlation decay threshold of an infinite regular tree with degree equal to the maximum degree of our discretization. To cope with the exponential blow-up of the discrete instance we use clique dynamics, a Markov chain that was recently introduced in the setting of abstract polymer models. We prove rapid mixing of clique dynamics up to the tree threshold of the univariate hard-core model. This is achieved by relating clique dynamics to block dynamics and adapting the spectral expansion method, which was recently used to bound the mixing time of Glauber dynamics within the same parameter regime.

28.05. 11:00 Moshik Hershcovitch

In this talk, I will share the new trends in the storage/memory field, focusing on Persistent Memory. We will overview the storage stack and the gap between volatile Memory (DRAM) and persistent storage (SSD). After more than a decade of research, in 2015 Intel closed this gap by announcing a new persistent memory media called 3D Xpoint. This media is closing the gap by being persistent like SSD, fast almost like DRAM, and costs slightly less than DRAM. The first commercial product called Optane DC was out in 2019. In this talk, we will deep dive into the challenges of adopting this technology and explore the research opportunities in this area.

04.06. 11:00 Maximilian Katzmann

Force-directed drawing algorithms are the most commonly used approach to visualize networks. While they are usually very robust, the performance of Euclidean spring embedders decreases if the graph exhibits the high level of heterogeneity that typically occurs in scale-free real-world networks. As heterogeneity naturally emerges from hyperbolic geometry (in fact, scale-free networks are often perceived to have an underlying hyperbolic geometry), it is natural to embed them into the hyperbolic plane instead. Previous techniques that produce hyperbolic embeddings usually make assumptions about the given network, which (if not met) impairs the quality of the embedding. It is still an open problem to adapt force-directed embedding algorithms to make use of the heterogeneity of the hyperbolic plane, while also preserving their robustness. We identify fundamental differences between the behavior of spring embedders in Euclidean and hyperbolic space, and adapt the technique to take advantage of the heterogeneity of the hyperbolic plane.

11.06. 11:00 Nicolas Klodt
Katrin Casel

The contact process models the spread of an infection in a network. If the infection rate is high enough, the infection can survive exponentially long in the size of the network. This effect even occurs on simple sub structures such as big stars, that can be found in many real world networks. We analyze thresholds for the infection rate above which the infection becomes epidemic on these structures.

We all know how to efficiently find shortest paths, and we have (mostly) accepted that finding longest paths is difficult, simply from the NP-hardness of the Hamiltonian path problem. Curiously, there seems to be a whole world of problems about finding detours that are in-between shortest and longest paths. Recent work threw the full arsenal of graph minor theory on these and still did not resolve a seemingly simple question. I am curious to see if one of you has a smart idea for this!
18.06. 11:00Simon Wietheger

Fault tolerant algorithms reflect the error-proneness of many real-world systems. If the system is modelled as a graph, errors might represent failing edges or vertices, depending on the problem. We usually assume that only up to $$k\in\mathbb N$$ vertices/edges will fail at the same time. Given $$k$$, the task is to preprocess the original graph into a data structure that efficiently answers queries that include a set $$F, |F|\le k$$ of failing vertices/edges.

For the Fault Tolerant Reachability problem, we are given a directed graph $$G=(V,E), s\in V$$ and $$k\in\mathbb N$$. The computed data structure has to answer given $$F\subseteq E, v\in V$$, whether $$v$$ is reachable from $$s$$ in $$G-F$$. The presented algorithm uses a useful graph structure called important separators to build a subgraph of $$G$$, which is then used to answer the queries.

02.07. 11:00 Maximilian Boether

Traffic congestion is a major issue that can be solved by suggesting drivers alternative routes they are willing to take. This concept has been formalized as a strategic routing problem in which a single alternative route is suggested to an existing one. We extend this formalization and introduce the Multiple-Routes problem, which is given a start and destination and aims at finding up to n different routes that the drivers strategically disperse over, minimizing the overall travel time of the system.

Due to the NP-hard nature of the problem, we introduce the Multiple-Routes evolutionary algorithm (MREA) as a heuristic solver. We study several mutation and crossover operators and evaluate them on real-world data of Berlin, Germany. We find that a combination of all operators yields the best result, improving the overall travel time by a factor between 1.8 and 3, in the median, compared to all drivers taking the fastest route. For the base case n=2, we compare our MREA to the highly tailored optimal solver by Bläsius et al. [ATMOS 2020] and show that, in the median, our approach finds solutions of quality at least 99.69% of an optimal solution while only requiring 40% of the time.

09.07. 11:00 Karen Seidel
This talk is about learning from informant, a formal model for incremental binary classification by E. M. Gold. Illustrating examples are linear separators and other uniformly decidable sets of formal languages. We sketch why a full-information learning algorithm can be assumed not to withdraw an optimal classifier with growing training data size, even with respect to a more general hypothesis space. Moreover, we discuss hypothesis-based and state-based models for memory-efficient algorithms. We observe that in both cases the above mentioned U-shapes can not be prevented for every concept class, in case they are syntactic.
16.07. 11:00Leon Schiller
Hyperbolic Random Graphs have become a well established model for scale-free networks. We used a similar approach for generating trees that resemble the properties of infection chains, and could show that, in contrast to trees generated in the Euclidean plane, they model a heterogeneous infection spread where a small fraction of individuals is responsible for a vast majority of infections. We showed that the trees from our model are a subset of the hyperbolic Delaunay Triangulation of the underlying set of points, and that we can make their computation more efficient by precalculating this triangulation or its dual, the hyperbolic Voronoi diagram. Although there is an efficient algorithm for this, it suffers from numerical inaccuracies when working with points that have a large hyperbolic distance from the origin. That is because it uses the Poincaré model of hyperbolic space where points distant from the center are all represented with a radial coordinate very close to one. Due to the limited precision of floating point numbers, these points can then not be distinguished anymore. In my talk, I want to propose a different method for calculating hyperbolic Voronoi diagrams which does not suffer from these problems. It is an adaptation of Fortune’s Algorithm - a commonly used algorithm for computing Voronoi diagrams in Euclidean space - to a model of hyperbolic space where the coordinates of points are not bounded: the hyperboloid model.
30.07. 11:00 Louise Molitor
Schelling’s classical segregation model gives a coherent explanation for the wide-spread phenomenon of residential segregation. We introduce an agent-based saturated open-city variant, the Flip Schelling Process (FSP), in which agents, placed on a graph, have one out of two types and, based on the predominant type in their neighborhood, decide whether to change their types; similar to a new agent arriving as soon as another agent leaves the vertex. We investigate the probability that an edge {u, v} is monochrome, i.e., that both vertices u and v have the same type in the FSP, and we provide a general framework for analyzing the influence of the underlying graph topology on residential segregation. In particular, for two adjacent vertices, we show that a highly decisive common neighborhood, i.e., a common neighborhood where the absolute value of the difference between the number of vertices with different types is high, supports segregation and, moreover, that large common neighborhoods are more decisive. As an application, we study the expected behavior of the FSP on two common random graph models with and without geometry: (1) For random geometric graphs, we show that the existence of an edge {u, v} makes a highly decisive common neighborhood for u and v more likely. Based on this, we prove the existence of a constant c > 0 such that the expected fraction of monochrome edges after the FSP is at least 1/2 + c. (2) For Erdős–Rényi graphs we show that large common neighborhoods are unlikely and that the expected fraction of monochrome edges after the FSP is at most 1/2 + o (1). Our results indicate that the cluster structure of the underlying graph has a significant impact on the obtained segregation strength.
In gene co-expression networks, weighted gene-to-gene connections give the probability of co-existence in some module (i.e. pathway), leading to a change or product in a cell. Gene modules carry out vital processes such as damage repair or drug processing. In these networks, each module organizes its corresponding genes into dense overlapping, subgraphs---often forming cliques. Detecting all module participating genes is beneficial for both disease understanding and drug discovery. If a particular disease-associated gene is known to be undruggable, the gene module provides a set of additional drug targets and lends insights into the disease itself. In this talk, I will present our recent work on detecting modules in gene co-expression networks by decomposing into weighted cliques. We improve upon prior clique decomposition work by proposing a new problem called Weighted Clique Decomposition and give two fixed-parameter tractable algorithms for solving a simplified variant. Finally, we show improved scalability results on a corpus of biologically inspired synthetic graphs.
16.08. 14:00 Martin Schirneck

An $$f$$-edge fault-tolerant diameter oracle ($$f$$-FDO) preprocesses a given graph $$G$$ and, when queried with a set $$F$$ of at most $$f$$ edges, returns the approximate diameter of the graph $$G - F$$. Such data structures need to strike a balance between the time they take for preprocessing and answering queries, the occupied space, and the stretch of the approximation.

For the single-failure case ($$f = 1$$) on directed $$n$$-vertex, $$m$$-edge graphs with diameter $$D$$, Henzinger et al. [ITCS 2017] presented an $$(1+\varepsilon)$$-approximate 1-FDO with $$\widetilde{O}(mn + n^{1.5} \sqrt{Dm/\varepsilon})$$ preprocessing time, constant query time, and $$O(m)$$ space. They also showed that any combinatorial 1-FDO requires $$n^{3-o(1)}$$ preprocessing time, unless the Boolean Matrix Multiplication conjecture fails. We improve the preprocessing to $$\widetilde{O}(mn + n^2/\varepsilon)$$ keeping the same stretch, query time, and space. When using fast matrix multiplication, this reduces to $$\widetilde{O}(n^{2.5794} + n^2/\varepsilon)$$, breaking the cubic barrier. We further give an unconditional space lower bound, showing that any 1-FDO with stretch better than $$3/2$$ requires $$\Omega(m)$$ bits of space.

For multiple failures ($$f > 1$$) on undirected graphs, we present an $$(f+2)$$-approximate $$f$$-FDO, with preprocessing time $$\widetilde{O}(fm)$$, query time $$\widetilde{O}(f^2)$$, taking $$\widetilde{O}(fn)$$ space. This refines an analysis by Bilò et al. [STACS 2016]. We also show that the space requirement is tight up to polylogarithmic factors, provided that the oracle can also be queried with non-edges. Finally, we discuss how we can swap the need of approximation for a slightly larger query time in networks with a polylogarithmic diameter.

This is joint work with Davide Bilò, Sarel Cohen, and Tobias Friedrich.

20.08. 11:00Bashini Mahaarachchi
Availability of parking spaces map is important in traffic planning and related infrastructure planning. The information needed in such applications include the number of parking spaces, their positions, directions, etc. One way to obtain this information is by analysing aerial images, and extracting information from them through intelligent machine learning models. Thus, in this research we use a two-stage object detection model for finding parking spaces in satellite images using the Detecron2 architecture and the digital orthophotos from Berlin.
26.09. 13:30Janosch Ruff
Kolmogorov Complexity, the shortest algorithmic description, finds a lot of different applications from the Machine Learning concept of Minimal Description Length to defining Algorithmic Randomness among others. Universal Probability, the probability of an output using a random computer program, is one of its most prominent notions with interesting properties for algorithm analysis. The topic of this talk is to find an adequate definition for Length-Conditional Universal Probability based on desired properties within the scope of Recursion Theory and to give an outlook for algorithm design choices and compression implications based on the theory.
27.08. 11:00 Maximilian Katzmann

Finding a minimum vertex cover in a network is a fundamental NP-complete graph problem. One way to deal with its computational hardness, is to trade the qualitative performance of an algorithm (allowing non-optimal outputs) for an improved running time. For the vertex cover problem, there is a gap between theory and practice when it comes to understanding this tradeoff. On the one hand, it is known that it is NP-hard to approximate a minimum vertex cover within a factor of √2. On the other hand, a simple greedy algorithm yields close to optimal approximations in practice.

A promising approach towards understanding this discrepancy is to recognize the differences between theoretical worst-case instances and real-world networks. Following this direction, we close the gap between theory and practice by providing an algorithm that efficiently computes nearly optimal vertex cover approximations on hyperbolic random graphs; a network model that closely resembles real-world networks in terms of degree distribution, clustering, and the small-world property. More precisely, our algorithm computes a (1 + o(1))-approximation, asymptotically almost surely, and has a running time of O(m log(n)).

The proposed algorithm is an adaption of the successful greedy approach, enhanced with a procedure that improves on parts of the graph where greedy is not optimal. This makes it possible to introduce a parameter that can be used to tune the tradeoff between approximation performance and running time. Our empirical evaluation on real-world networks shows that this allows for improving over the near-optimal results of the greedy approach.

02.09. 17:00Richard Spence

An additive +β spanner of a graph G is a subgraph which preserves all distances in G up to an additive +β error. Additive spanners are well-studied in unweighted graphs but have only recently received attention in weighted graphs [Elkin et al. 2019 and 2020, Ahmed et al. 2020]. In this talk, we discuss algorithms which construct sparse additive spanners with error +cW(.,.), where W(u,v) is the maximum edge weight of a shortest u-v path in G. These include a pairwise +(6+ε)W(.,.) spanner over G and vertex pairs P on $$O_ε(n|P|^{1/4})$$ edges and a pairwise +(2+ε)W(.,.) spanner on $$O_ε(n|P|^{1/3})$$ edges, nearly matching the unweighted case.

Besides sparsity, another common way to measure the quality of a spanner in weighted graphs is by its lightness, defined as the total edge weight of the spanner divided by the weight of an MST of G. We provide the first known lightness results for additive spanners: an all-pairs +εW(.,.) spanner with $$O_ε(n)$$ lightness, and a +(4+ε) W(.,.) spanner with $$O_ε(n^{2/3})$$ lightness. All of the above spanners can be constructed in polynomial time.

03.09. 11:00 Martin Schirneck ,
Ziena Zeif

A single-source distance sensitivity oracle (single-source DSO) preprocesses a given graph $$G$$ with a distinguished source vertex $$s$$ into a data structure. When queried with a target vertex $$t$$ and an edge $$e$$, the structure reports the replacement distance $$d_{G-e}(s,t)$$ in case $$e$$ fails. In the closely related single-source replacement paths (SSRP) problem, we are given $$G$$ and $$s$$ and have to enumerate all distances $$d_{G-e}(s,t)$$.

We present deterministic single-source DSOs for undirected graphs with $$\widetilde{O}(1)$$ query time. For unweighted graphs, we give a combinatorial preprocessing algorithm running in time $$\widetilde{O}(m \sqrt{n} + n^2)$$, resulting in a data structure taking $$O(n^{3/2})$$ space. For graphs with integer edge weights in the range $$[1,M]$$ and when using fast matrix multiplication, we get a preprocessing time of $$\widetilde{O}(Mn^{\omega})$$, and the oracle takes $$O(M^{1/2}n^{3/2})$$ space. Here, $$\omega < 2.373$$ denotes the matrix multiplication exponent. Both preprocessing times improve over previous constructions by polynomial factors. We further show that the space requirement is optimal up to the size of a machine word. Finally, we give a randomized preprocessing algorithm that breaks the quadratic barrier on sufficiently sparse graphs.

We obtain our oracles by derandomizing known SSRP algorithms by Chechik & Cohen [SODA'19] as well as Grandoni & Vassilevska Williams [FOCS'12] and combining them with a novel compression scheme for replacement distances.

This is joint work with Davide Bilò, Sarel Cohen, and Tobias Friedrich.

We introduce the balanced crown decomposition that captures the structure imposed on graphs by their connected induced subgraphs of a given size. Such subgraphs are a popular modeling tool in various application areas, where the non-local nature of the connectivity condition usually results in very challenging algorithmic tasks. The balanced crown decomposition is a combination of a crown decomposition and a balanced partition which makes it applicable to graph editing as well as graph packing and partitioning problems. We illustrate this by deriving improved approximation algorithms and kernelization for a variety of such problems.
10.09. 11:00Daniel Stephan

Routing is a method of sending messages in a network by successively forwarding them to a neighbor until they reach the target. This is done by assigning to every vertex a coordinate that stores some structural information. Most routing schemes show a trade-off between length of routed paths and memory required per coordinate. In this talk, we present a routing scheme such that in the worst case, every path it chooses is only five times longer than the shortest path. Further, we prove that in a hyperbolic random graph $$G$$, the coordinates of our routing scheme are encoded using $$O(log^3 n·diam(n))$$ bits. These graphs are of special interest as they are assumed to be a model which approximates real-world graphs. Our findings greatly improve upon the previous best known results for hyperbolic random graphs, which guarantee on average short paths only asymptotically almost surely and make no statements about coordinate size.

We evaluate our routing scheme on multiple generated and real-world networks. Our experiments show, that the coordinates in these networks all require little memory, while most paths used for routing match the shortest paths in length. In networks that are assumed to have an underlying hyperbolic geometry, for example the Internet graph, the resulting paths of our scheme are on average only 5% longer than the shortest paths.

24.09. 11:00Martin Bullinger

The formal study of coalition formation in multiagent systems is typically realized using so-called hedonic games, which originate from economic theory. The main focus of this branch of research has been on the existence and the computational complexity of deciding the existence of coalition structures that satisfy various stability criteria. The actual process of forming coalitions based on individual behavior has received considerably less attention. In this talk, we study the convergence of simple dynamics based on single-agent deviations in hedonic games. We consider various strategies for proving convergence of the dynamics based on potential functions. In particular, we showcase methods for dealing with non-monotonic potential functions. On the other hand, it is a challenging task to pinpoint the boundary of tractability of stable states. We show how to construct complicated counterexamples with the aid of linear programs. These counterexamples can usually be used to prove computational intractabilities.