
Bringmann, Karl; Friedrich, Tobias; Krohmer, Anton Deanonymization of Heterogeneous Random Graphs in Quasilinear Time. Algorithmica 2018: 131
There are hundreds of online social networks with altogether billions of users. Many such networks publicly release structural information, with all personal information removed. Empirical studies have shown, however, that this provides a false sense of privacy  it is possible to identify almost all users that appear in two such anonymized network as long as a few initial mappings are known. We analyze this problem theoretically by reconciling two versions of an artificial powerlaw network arising from independent subsampling of vertices and edges. We present a new algorithm that identifies most vertices and makes no wrong identifications with high probability. The number of vertices matched is shown to be asymptotically optimal. For an nvertex graph, our algorithm uses \(n^\varepsilon\) seed nodes (for an arbitrarily small \(\varepsilon\)) and runs in quasilinear time. This improves previous theoretical results which need \(\Theta(n)\) seed nodes and have runtimes of order \(n^{1+\Omega(1)}\). Additionally, the applicability of our algorithm is studied experimentally on different networks.

Bläsius, Thomas; Friedrich, Tobias; Krohmer, Anton Cliques in Hyperbolic Random Graphs. Algorithmica 2018: 23242344
Most complex real world networks display scalefree features. This characteristic motivated the study of numerous random graph models with a powerlaw degree distribution. There is, however, no established and simple model which also has a high clustering of vertices as typically observed in real data. Hyperbolic random graphs bridge this gap. This natural model has recently been introduced by hyprg and has shown theoretically and empirically to fulfill all typical properties of real world networks, including powerlaw degree distribution and high clustering. We study cliques in hyperbolic random graphs \(G\) and present new results on the expected number of \(k\)cliques \(E(K_k)\) and the size of the largest clique \(\omega(G)\). We observe that there is a phase transition at powerlaw exponent \(\beta = 3\). More precisely, for \(beta in (2,3)\) we prove \(E(K_k)=n^k (3\beta)/2} \Theta(k)^{k}\) and \(\omega(G)=\Theta(n^(3\beta)/2})\), while for \(\beta\geq3\) we prove \(E(K_k)=n \Theta(k)^{k}\) and \(\omega(G)=\Theta(\log(n)/ \log \log n)\). Furthermore, we show that for \(\beta \geq 3\), cliques in hyperbolic random graphs can be computed in time \(O(n)\). If the underlying geometry is known, cliques can be found with worstcase runtime \(O(m \cdot n^{2.5})\) for all values of \(\beta\).

Bläsius, Thomas; Friedrich, Tobias; Krohmer, Anton; Laue, Sören Efficient Embedding of ScaleFree Graphs in the Hyperbolic Plane. IEEE/ACM Transactions on Networking 2018: 920933
Hyperbolic geometry appears to be intrinsic in many large real networks. We construct and implement a new maximum likelihood estimation algorithm that embeds scalefree graphs in the hyperbolic space. All previous approaches of similar embedding algorithms require at least a quadratic runtime. Our algorithm achieves quasilinear runtime, which makes it the first algorithm that can embed networks with hundreds of thousands of nodes in less than one hour. We demonstrate the performance of our algorithm on artificial and real networks. In all typical metrics, like Loglikelihood and greedy routing, our algorithm discovers embeddings that are very close to the ground truth.

Friedrich, Tobias; Katzmann, Maximilian; Krohmer, Anton Unbounded Discrepancy of Deterministic Random Walks on Grids. SIAM Journal on Discrete Mathematics 2018
Random walks are frequently used in randomized algorithms. We study a derandomized variant of a random walk on graphs, called rotorrouter model. In this model, instead of distributing tokens randomly, each vertex serves its neighbors in a fixed deterministic order. For most setups, both processes behave remarkably similar: Starting with the same initial configuration, the number of tokens in the rotorrouter model deviates only slightly from the expected number of tokens on the corresponding vertex in the random walk model. The maximal difference over all vertices and all times is called single vertex discrepancy. Cooper and Spencer (2006) showed that on \(\mathbb{Z}^{d}\) the single vertex discrepancy is only a constant \(c_d\). Other authors also determined the precise value of \(c_d\) for \(d=1,2\). All these results, however, assume that initially all tokens are only placed on one partition of the bipartite graph \(\mathbb{Z}^{d}\). We show that this assumption is crucial by proving that otherwise the single vertex discrepancy can become arbitrarily large. For all dimensions \(d\geq1\) and arbitrary discrepancies~\(\ell \geq 0\), we construct configurations that reach a discrepancy of at least \(\ell\).

Friedrich, Tobias; Krohmer, Anton On the diameter of hyperbolic random graphs. SIAM Journal on Discrete Mathematics 2018: 13141334
Large realworld networks are typically scalefree. Recent research has shown that such graphs are described best in a geometric space. More precisely, the internet can be mapped to a hyperbolic space such that geometric greedy routing is close to optimal (Bogu\~n}{á}, Papadopoulos, and Krioukov. Nature Communications, 1:62, 2010). This observation has pushed the interest in hyperbolic networks as a natural model for scalefree networks. Hyperbolic random graphs follow a power law degree distribution with controllable exponent \(\beta)\) and show high clustering (Gugelmann, Panagiotou, and Peter. ICALP, pp. 573–585, 2012). For understanding the structure of the resulting graphs and for analyzing the behavior of network algorithms, the next question is bounding the size of the diameter. The only known explicit bound is \(O((\log n)^32/((3  \beta)(5  \beta))+1})\)(Kiwi and Mitsche. ANALCO, pp. 26–39, 2015). We present two much simpler proofs for an improved upper bound of \(O((\log n)^2/(3  \beta)})\) and a lower bound of \(\Omega(\log n)\). If \(\beta > 3\), we show that the latter bound is tight by proving an upper bound of \(O(\log n)\) for the diameter.

Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian; Krohmer, Anton Hyperbolic Embeddings for NearOptimal Greedy Routing. Algorithm Engineering and Experiments (ALENEX) 2018: 199208
Greedy routing computes paths between nodes in a network by successively moving to the neighbor closest to the target with respect to coordinates given by an embedding into some metric space. Its advantage is that only local information is used for routing decisions. We present different algorithms for generating graph embeddings into the hyperbolic plane that are well suited for greedy routing. In particular our embeddings guarantee that greedy routing always succeeds in reaching the target and we try to minimize the lengths of the resulting greedy paths. We evaluate our algorithm on multiple generated and real wold networks. For networks that are generally assumed to have a hidden underlying hyperbolic geometry, such as the Internet graph, we achieve nearoptimal results, i.e., the resulting greedy paths are only slightly longer than the corresponding shortest paths. In the case of the Internet graph, they are only \(6\%\) longer when using our best algorithm, which greatly improves upon the previous best known embedding, whose creation required substantial manual intervention.

Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian; Krohmer, Anton; Striebel, Jonathan Towards a Systematic Evaluation of Generative Network Models. Workshop on Algorithms and Models for the Web Graph (WAW) 2018: 99114
Generative graph models play an important role in network science. Unlike realworld networks, they are accessible for mathematical analysis and the number of available networks is not limited. The explanatory power of results on generative models, however, heavily depends on how realistic they are. We present a framework that allows for a systematic evaluation of generative network models. It is based on the question whether realworld networks can be distinguished from generated graphs with respect to certain graph parameters. As a proof of concept, we apply our framework to four popular random graph models (ErdősRényi, BarabásiAlbert, ChungLu, and hyperbolic random graphs). Our experiments for example show that all four models are bad representations for Facebook's social networks, while ChungLu and hyperbolic random graphs are good representations for other networks, with different strengths and weaknesses.

Friedrich, Tobias; Krohmer, Anton; Rothenberger, Ralf; Sutton, Andrew M. Phase Transitions for ScaleFree SAT Formulas. Association for the Advancement of Artificial Intelligence (AAAI) 2017: 38933899
Recently, a number of nonuniform random satisfiability models have been proposed that are closer to practical satisfiability problems in some characteristics. In contrast to uniform random Boolean formulas, scalefree formulas have a variable occurrence distribution that follows a power law. It has been conjectured that such a distribution is a more accurate model for some industrial instances than the uniform random model. Though it seems that there is already an awareness of a threshold phenomenon in such models, there is still a complete picture lacking. In contrast to the uniform model, the critical density threshold does not lie at a single point, but instead exhibits a functional dependency on the powerlaw exponent. For scalefree formulas with clauses of length \(k = 2\), we give a lower bound on the phase transition threshold as a function of the scaling parameter. We also perform computational studies that suggest our bound is tight and investigate the critical density for formulas with higher clause lengths. Similar to the uniform model, on formulas with \(k \geq 3\), we find that the phase transition regime corresponds to a set of formulas that are difficult to solve by backtracking search.

Friedrich, Tobias; Krohmer, Anton; Rothenberger, Ralf; Sauerwald, Thomas; Sutton, Andrew M. Bounds on the Satisfiability Threshold for Power Law Distributed Random SAT. European Symposium on Algorithms (ESA) 2017: 37:137:15
Propositional satisfiability (SAT) is one of the most fundamental problems in computer science. The worstcase hardness of SAT lies at the core of computational complexity theory. The averagecase analysis of SAT has triggered the development of sophisticated rigorous and nonrigorous techniques for analyzing random structures. Despite a long line of research and substantial progress, nearly all theoretical work on random SAT assumes a uniform distribution on the variables. In contrast, realworld instances often exhibit large fluctuations in variable occurrence. This can be modeled by a scalefree distribution of the variables, which results in distributions closer to industrial SAT instances. We study random \(k\)SAT on \(n\) variables, \(m=\Theta(n)\) clauses, and a power law distribution on the variable occurrences with exponent \(\beta\). We observe a satisfiability threshold at \(\beta=(2k1)/(k1)\). This threshold is tight in the sense that instances with \(beta < (2k1)/(k1)\varepsilon\) for any constant \(\varepsilon>0\) are unsatisfiable with high probability (w.h.p.). For \(\beta\ge(2k1)/(k1)+\varepsilon\), the picture is reminiscent of the uniform case: instances are satisfiable w.h.p. for sufficiently small constant clausevariable ratios \(m/n\); they are unsatisfiable above a ratio \(m/n\) that depends on \(\beta\).

Bläsius, Thomas; Friedrich, Tobias; Krohmer, Anton Hyperbolic Random Graphs: Separators and Treewidth. European Symposium on Algorithms (ESA) 2016: 15:115:16
When designing and analyzing algorithms, one can obtain better and more realistic results for practical instances by assuming a certain probability distribution on the input. The worstcase runtime is then replaced by the expected runtime or by bounds that hold with high probability (whp), i.e., with probability \(1  O(1/n)\), on the random input. Hyperbolic random graphs can be used to model complex realworld networks as they share many important properties such as a small diameter, a large clustering coefficient, and a powerlaw degreedistribution. Divide and conquer is an important algorithmic design principle that works particularly well if the instance admits small separators. We show that hyperbolic random graphs in fact have comparatively small separators. More precisely, we show that a hyperbolic random graph can be expected to have a balanced separator hierarchy with separators of size \(O(\sqrt{n^{(3\beta)}})\), \(O(\log n)\), and \(O(1)\) if \(2 < \beta < 3\), \(\beta = 3\) and \(3 < \beta\), respectively (\(\beta\) is the powerlaw exponent). We infer that these graphs have whp a treewidth of \(O(\sqrt{n^{(3  \beta)}})\), \(O(\log^{2}n)\), and \(O(\log n)\), respectively. For \(2 < \beta < 3\), this matches a known lower bound. For the more realistic (but harder to analyze) binomial model, we still prove a sublinear bound on the treewidth. To demonstrate the usefulness of our results, we apply them to obtain fast matching algorithms and an approximation scheme for Independent Set.

Bläsius, Thomas; Friedrich, Tobias; Krohmer, Anton; Laue, Sören Efficient Embedding of ScaleFree Graphs in the Hyperbolic Plane. European Symposium on Algorithms (ESA) 2016: 16:116:18
EATCS Best Paper Award
Hyperbolic geometry appears to be intrinsic in many large real networks. We construct and implement a new maximum likelihood estimation algorithm that embeds scalefree graphs in the hyperbolic space. All previous approaches of similar embedding algorithms require a runtime of \(\Omega(n^{2})\). Our algorithm achieves quasilinear runtime, which makes it the first algorithm that can embed networks with hundreds of thousands of nodes in less than one hour. We demonstrate the performance of our algorithm on artificial and real networks. In all typical metrics like Loglikelihood and greedy routing our algorithm discovers embeddings that are very close to the ground truth.

Friedrich, Tobias; Krohmer, Anton Parameterized clique on inhomogeneous random graphs. Discrete Applied Mathematics 2015: 130138
Finding cliques in graphs is a classical problem which is in general NPhard and parameterized intractable. In typical applications like social networks or biological networks, however, the considered graphs are scalefree, i.e., their degree sequence follows a power law. Their specific structure can be algorithmically exploited and makes it possible to solve clique much more efficiently. We prove that on inhomogeneous random graphs with \(n\) nodes and power law exponent \(\beta\), cliques of size \(k\) can be found in time \(O(n)\) for \(\beta \ge 3\) and in time \(O(ne^{k^4})\) for \(2 < \beta < 3\).

Friedrich, Tobias; Krohmer, Anton On the Diameter of Hyperbolic Random Graphs. International Colloquium on Automata, Languages and Programming (ICALP) 2015: 614625
Large realworld networks are typically scalefree. Recent research has shown that such graphs are described best in a geometric space. More precisely, the internet can be mapped to a hyperbolic space such that geometric greedy routing performs close to optimal (Boguna, Papadopoulos, and Krioukov. Nature Communications, 1:62, 2010). This observation pushed the interest in hyperbolic networks as a natural model for scalefree networks. Hyperbolic random graphs follow a powerlaw degree distribution with controllable exponent \(\beta\) and show high clustering (Gugelmann, Panagiotou, and Peter. ICALP, pp. 573585, 2012). For understanding the structure of the resulting graphs and for analyzing the behavior of network algorithms, the next question is bounding the size of the diameter. The only known explicit bound is \(O((\log n)^32/((3\beta)(5\beta)}))\) (Kiwi and Mitsche. ANALCO, pp. 2639, 2015). We present two much simpler proofs for an improved upper bound of \(O((\log n)^2/(3\beta)})\) and a lower bound of \(\Omega(\log n)\).

Friedrich, Tobias; Krohmer, Anton Cliques in Hyperbolic Random Graphs. International Conference on Computer Communications (INFOCOM) 2015: 15441552
Most complex realworld networks display scalefree features. This motivated the study of numerous random graph models with a powerlaw degree distribution. There is, however, no established and simple model which also has a high clustering of vertices as typically observed in real data. Hyperbolic random graphs bridge this gap. This natural model has recently been introduced by Papadopoulos, Krioukov, Boguna, Vahdat (INFOCOM, pp. 29732981, 2010) and has shown theoretically and empirically to fulfill all typical properties of realworld networks, including powerlaw degree distribution and high clustering. We study cliques in hyperbolic random graphs \(G\) and present new results on the expected number of \(k\)cliques \(E[K_k]\) and the size of the largest clique \(\omega(G)\). We observe that there is a phase transition at powerlaw exponent \(\gamma = 3\). More precisely, for \(gamma in (2,3)\) we prove \(E[K_k] = n^k(3\gamma)/2 \Theta(k)^{k}\) and \(\omega(G) = \Theta(n^(3\gamma)/2})\) while for \(\gamma \ge 3\) we prove \(E[K_k] = n \Theta(k)^{k}\) and \(\omega(G) = \Theta(\log(n)/\log \log n)\). We empirically compare the \(\omega(G)\) value of several scalefree random graph models with realworld networks. Our experiments show that the \(\omega(G)\)predictions by hyperbolic random graphs are much closer to the data than other scalefree random graph models.

Friedrich, Tobias; Katzmann, Maximilian; Krohmer, Anton Unbounded Discrepancy of Deterministic Random Walks on Grids. International Symposium on Algorithms and Computation (ISAAC) 2015: 212222
Random walks are frequently used in randomized algorithms. We study a derandomized variant of a random walk on graphs, called rotorrouter model. In this model, instead of distributing tokens randomly, each vertex serves its neighbors in a fixed deterministic order. For most setups, both processes behave remarkably similar: Starting with the same initial configuration, the number of tokens in the rotorrouter model deviates only slightly from the expected number of tokens on the corresponding vertex in the random walk model. The maximal difference over all vertices and all times is called single vertex discrepancy. Cooper and Spencer (2006) showed that on \(\mathbb{Z}^d\) the single vertex discrepancy is only a constant \(c_d\). Other authors also determined the precise value of \(c_d\) for \(d=1,2\). All these results, however, assume that initially all tokens are only placed on one partition of the bipartite graph \(\mathbb{Z}^d\). We show that this assumption is crucial by proving that otherwise the single vertex discrepancy can become arbitrarily large. For all dimensions \(d \ge 1\) and arbitrary discrepancies \(\ell \ge 0\), we construct configurations that reach a discrepancy of at least \(\ell\).

Alcaraz, Nicolas; Friedrich, Tobias; Kötzing, Timo; Krohmer, Anton; Müller, Joachim; Pauling, Josch; Baumbach, Jan Efficient Key Pathway Mining: Combining Networks and OMICS Data. Integrative Biology 2012: 756764
Systems biology has emerged over the last decade. Driven by the advances in sophisticated measurement technology the research community generated huge molecular biology data sets. This comprises rather static data on the interplay of biological entities, for instance proteinprotein interaction network data, as well as quite dynamic data collected for studying the behavior of individual cells or tissues in accordance to changing environmental conditions, such as DNA microarrays or RNA sequencing. Here we bring the two different data types together in order to gain higher level knowledge. We introduce a significantly improved version of the KeyPathwayMiner software framework. Given a biological network modelled as graph and a set of expression studies, KeyPathwayMiner efficiently finds and visualizes connected subnetworks where most components are expressed in most cases. It finds all maximal connected subnetworks where all nodes but k exceptions are expressed in all experimental studies but at least l exceptions. We demonstrate the power of the new approach by comparing it to similar approaches with gene expression data previously used to study Huntington's disease. In addition, we demonstrate KeyPathwayMiner's flexibility and applicability to nonarray data by analyzing genomescale DNA methylation profiles from colorectal tumor cancer patients. KeyPathwayMiner release 2 is available as a Cytoscape plugin and online at http://keypathwayminer.mpiinf.mpg.de.

Baumbach, Jan; Friedrich, Tobias; Kötzing, Timo; Krohmer, Anton; Müller, Joachim; Pauling, Josch Efficient Algorithms for Extracting Biological Key Pathways with Global Constraints. Genetic and Evolutionary Computation Conference (GECCO) 2012: 169176
The integrated analysis of data of different types and with various interdependencies is one of the major challenges in computational biology. Recently, we developed KeyPathwayMiner, a method that combines biological networks modeled as graphs with diseasespecific genetic expression data gained from a set of cases (patients, cell lines, tissues, etc.). We aimed for finding all maximal connected subgraphs where all nodes but K are expressed in all cases but at most L, i.e. key pathways. Thereby, we combined biological networks with OMICS data, instead of analyzing these data sets in isolation. Here we present an alternative approach that avoids a certain bias towards hub nodes: We now aim for extracting all maximal connected subnetworks where all but at most K nodes are expressed in all cases but in total (!) at most L, i.e. accumulated over all cases and all nodes in a solution. We call this strategy GLONE (global node exceptions); the previous problem we call INES (individual node exceptions). Since finding GLONEcomponents is computationally hard, we developed an Ant Colony Optimization algorithm and implemented it with the KeyPathwayMiner Cytoscape framework as an alternative to the INES algorithms. KeyPathwayMiner 3.0 now offers both the INES and the GLONE algorithms. It is available as plugin from Cytoscape and online at http://keypathwayminer.mpiinf.mpg.de.

Friedrich, Tobias; Krohmer, Anton Parameterized Clique on ScaleFree Networks. International Symposium on Algorithms and Computation (ISAAC) 2012: 659668
Finding cliques in graphs is a classical problem which is in general NPhard and parameterized intractable. However, in typical applications like social networks or proteinprotein interaction networks, the considered graphs are scalefree, i.e., their degree sequence follows a power law. Their specific structure can be algorithmically exploited and makes it possible to solve clique much more efficiently. We prove that on inhomogeneous random graphs with \(n\) nodes and power law exponent \(\gamma\), cliques of size \(k\) can be found in time \(O(n^2)\) for \(\gamma \ge 3\) and in time \(O(n, exp(k^4))\) for \(2<\gamma <3\).