Bilò, Davide; Friedrich, Tobias; Lenzner, Pascal; Lowski, Stefanie; Melnichenko, Anna Selfish Creation of Social NetworksConference on Artificial Intelligence (AAAI) 2021: 5185–5193
Understanding real-world networks has been a core research endeavor throughout the last two decades. Network Creation Games are a promising approach for this from a game-theoretic perspective. In these games, selfish agents corresponding to nodes in a network strategically decide which links to form to optimize their centrality. Many versions have been introduced and analyzed, but none of them fits to modeling the evolution of social networks. In real-world social networks, connections are often established by recommendations from common acquaintances or by a chain of such recommendations. Thus establishing and maintaining a contact with a friend of a friend is easier than connecting to complete strangers. This explains the high clustering, i.e., the abundance of triangles, in real-world social networks. We propose and analyze a network creation model inspired by real-world social networks. Edges are formed in our model via bilateral consent of both endpoints and the cost for establishing and maintaining an edge is proportional to the distance of the endpoints before establishing the connection. We provide results for generic cost functions, which essentially only must be convex functions in the distance of the endpoints without the respective edge. For this broad class of cost functions, we provide many structural properties of equilibrium networks and prove (almost) tight bounds on the diameter, the Price of Anarchy and the Price of Stability. Moreover, as a proof-of-concept we show via experiments that the created equilibrium networks of our model indeed closely mimics real-world social networks. We observe degree distributions that seem to follow a power-law, high clustering, and low diameters. This can be seen as a promising first step towards game-theoretic network creation models that predict networks featuring all core real-world properties.
Quinzan, Francesco; Doskoč, Vanja; Göbel, Andreas; Friedrich, Tobias Adaptive Sampling for Fast Constrained Maximization of Submodular FunctionsArtificial Intelligence and Statistics (AISTATS) 2021: 964–972
Several large-scale machine learning tasks, such as data summarization, can be approached by maximizing functions that satisfy submodularity. These optimization problems often involve complex side constraints, imposed by the underlying application. In this paper, we develop an algorithm with poly-logarithmic adaptivity for non-monotone submodular maximization under general side constraints. The adaptive complexity of a problem is the minimal number of sequential rounds required to achieve the objective. Our algorithm is suitable to maximize a non-monotone submodular function under a \(p\)-system side constraint, and it achieves a \((p + O({\sqrt{p}}))\)-approximation for this problem, after only poly-logarithmic adaptive rounds and polynomial queries to the valuation oracle function. Furthermore, our algorithm achieves a \(p + O(1))\)-approximation when the given side constraint is a \(p\)-extendible system. This algorithm yields an exponential speed-up, with respect to the adaptivity, over any other known constant-factor approximation algorithm for this problem. It also competes with previous known results in terms of the query complexity. We perform various experiments on various real-world applications. We find that, in comparison with commonly used heuristics, our algorithm performs better on these instances.
Cohen, Sarel; Hershcovitch, Moshik; Taraz, Martin; Kißig, Otto; Wood, Andrew; Waddington, Daniel; Chin, Peter; Friedrich, Tobias Drug Repurposing using Link Prediction on Knowledge Graphs with Applications to Non-Volatile MemoryComplex Networks and their Applications (ComplexNetworks) 2021: 742–753
The active global SARS-CoV-2 pandemic caused more than 167 million cases and 3.4 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time intensive process and despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of \emph{drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on \emph{knowledge graphs, that condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of candidate drugs, 32 of which currently being in COVID-19-related clinical trials. Moreover, we present an alternative application for the model, the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model by significantly shortening the inference and pre-processing time by exploiting data-parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using a new emerging hardware --- we propose a new approach to leverage the use of high-capacity Non-Volatile Memory for aggregate drug ranking.
Bilò, Davide; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin Near-Optimal Deterministic Single-Source Distance Sensitivity OraclesEuropean Symposium on Algorithms (ESA) 2021: 18:1–18:17
Given a graph with a distinguished source vertex \(s\), the Single Source Replacement Paths (SSRP) problem is to compute and output, for any target vertex \(t\) and edge \(e\), the length \(d(s,t,e)\) of a shortest path from \(s\) to \(t\) that avoids a failing edge \(e\). A Single-Source Distance Sensitivity Oracle (Single-Source DSO) is a compact data structure that answers queries of the form \((t,e)\) by returning the distance \(d(s,t,e)\). We show how to compress the output of the SSRP problem on \(n\)-vertex, \(m\)-edge graphs with integer edge weights in the range \([1,M]\) into a deterministic Single-Source DSO that has size \(O(M^{1/2 n^{3/2})\) and query time \(\widetildeO(1)\). We prove that the space requirement is optimal (up to the word size). Our techniques can also handle vertex failures within the same bounds. Chechik and Cohen [SODA 2019] presented a combinatorial randomized \(\widetildeO(m\sqrt{n}+n^2)\) time SSRP algorithm for undirected and unweighted graphs. We derandomize their algorithm with the same asymptotic running time and apply our compression to obtain a deterministic Single-Source DSO with \(\widetildeO(m\sqrt{n}+n^2)\) preprocessing time, \(O(n^{3/2})\) space, and \(\widetildeO(1)\) query time. Our combinatorial Single-Source DSO has near-optimal space, preprocessing and query time for dense unweighted graphs, improving the preprocessing time by a \(\sqrt{n}\)-factor compared to previous results. Grandoni and Vassilevska Williams [FOCS 2012, TALG 2020] gave an algebraic randomized \(\widetildeO(Mn^\omega)\) time SSRP algorithm for (undirected and directed) graphs with integer edge weights in the range \([1,M]\), where \(\omega < 2.373\) is the matrix multiplication exponent. We derandomize their algorithm for undirected graphs and apply our compression to obtain an algebraic Single-Source DSO with \(\widetildeO(Mn^\omega)\) preprocessing time, \(O(M^{1/2 n^{3/2})\) space, and \(\widetildeO(1)\) query time. This improves the preprocessing time of algebraic Single-Source DSOs by polynomial factors compared to previous results. We also present further improvements of our Single-Source DSOs. We show that the query time can be reduced to a constant at the cost of increasing the size of the oracle to \(O(M^{1/3 n^{5/3})\) and that all our oracles can be made path-reporting. On sparse graphs with \(m=O(\frac{n^{5/4-\varepsilon}}{M^{7/4}})\) edges, for any constant \(\varepsilon > 0\), we reduce the preprocessing to randomized \(\widetildeO(M^7/8 m^1/2 n^{11/8}) = O(n^{2-\varepsilon/2})\) time. To the best of our knowledge, this is the first truly subquadratic time algorithm for building Single-Source DSOs on sparse graphs.
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian Efficiently Approximating Vertex Cover on Scale-Free Networks with Underlying Hyperbolic GeometryEuropean Symposium on Algorithms (ESA) 2021: 20:1–20:15
Finding a minimum vertex cover in a network is a fundamental NP-complete graph problem. One way to deal with its computational hardness, is to trade the qualitative performance of an algorithm (allowing non-optimal outputs) for an improved running time. For the vertex cover problem, there is a gap between theory and practice when it comes to understanding this tradeoff. On the one hand, it is known that it is NP-hard to approximate a minimum vertex cover within a factor of \(\sqrt{2}\). On the other hand, a simple greedy algorithm yields close to optimal approximations in practice. A promising approach towards understanding this discrepancy is to recognize the differences between theoretical worst-case instances and real-world networks. Following this direction, we close the gap between theory and practice by providing an algorithm that efficiently computes nearly optimal vertex cover approximations on hyperbolic random graphs; a network model that closely resembles real-world networks in terms of degree distribution, clustering, and the small-world property. More precisely, our algorithm computes a \((1 + o(1))\)-approximation, asymptotically almost surely, and has a running time of \(\mathcal{O(m \log(n))\). The proposed algorithm is an adaption of the successful greedy approach, enhanced with a procedure that improves on parts of the graph where greedy is not optimal. This makes it possible to introduce a parameter that can be used to tune the tradeoff between approximation performance and running time. Our empirical evaluation on real-world networks shows that this allows for improving over the near-optimal results of the greedy approach.
Bläsius, Thomas; Friedrich, Tobias; Weyand, Christopher Efficiently Computing Maximum Flows in Scale-Free NetworksEuropean Symposium on Algorithms (ESA) 2021: 21:1–21:14
We study the maximum-flow/minimum-cut problem on scale-free networks, i.e., graphs whose degree distribution follows a power-law. We propose a simple algorithm that capitalizes on the fact that often only a small fraction of such a network is relevant for the flow. At its core, our algorithm augments Dinitz's algorithm with a balanced bidirectional search. Our experiments on a scale-free random network model indicate sublinear run time. On scale-free real-world networks, we outperform the commonly used highest-label Push-Relabel implementation by up to two orders of magnitude. Compared to Dinitz's original algorithm, our modifications reduce the search space, e.g., by a factor of 275 on an autonomous systems graph. Beyond these good run times, our algorithm has an additional advantage compared to Push-Relabel. The latter computes a preflow, which makes the extraction of a minimum cut potentially more difficult. This is relevant, for example, for the computation of Gomory-Hu trees. On a social network with 70000 nodes, our algorithm computes the Gomory-Hu tree in 3 seconds compared to 12 minutes when using Push-Relabel.
Casel, Katrin; Friedrich, Tobias; Issac, Davis; Niklanovits, Aikaterini; Zeif, Ziena Balanced Crown Decomposition for Connectivity ConstraintsEuropean Symposium on Algorithms (ESA) 2021: 26:1–26:15
We introduce the balanced crown decomposition that captures the structure imposed on graphs by their connected induced subgraphs of a given size. Such subgraphs are a popular modeling tool in various application areas, where the non-local nature of the connectivity condition usually results in very challenging algorithmic tasks. The balanced crown decomposition is a combination of a crown decomposition and a balanced partition which makes it applicable to graph editing as well as graph packing and partitioning problems. We illustrate this by deriving improved approximation algorithms and kernelization for a variety of such problems. In particular, through this structure, we obtain the first constant-factor approximation for the Balanced Connected Partition (BCP) problem, where the task is to partition a vertex-weighted graph into k connected components of approximately equal weight. We derive a \(3\)-approximation for the two most commonly used objectives of maximizing the weight of the lightest component or minimizing the weight of the heaviest component.
Behrendt, Lukas; Casel, Katrin; Friedrich, Tobias; Lagodzinski, J. A. Gregor; Löser, Alexander; Wilhelm, Marcus From Symmetry to Asymmetry: Generalizing TSP Approximations by ParametrizationFundamentals of Computation Theory (FCT) 2021: 53–66
We generalize the tree doubling and Christofides algorithm to parameterized approximations for ATSP. The parameters we consider for the respective generalizations are upper bounded by the number of asymmetric distances, which yields algorithms to efficiently compute good approximations also for moderately asymmetric TSP instances. As generalization of the Christofides algorithm, we derive a parameterized 2.5-approximation, where the parameter is the size of a vertex cover for the subgraph induced by the asymmetric distances. Our generalization of the tree doubling algorithm gives a parameterized 3-approximation, where the parameter is the minimum number of asymmetric distances in a minimum spanning arborescence. Further, we combine these with a notion of symmetry relaxation which allows to trade approximation guarantee for runtime. Since the two parameters we consider are theoretically incomparable, we present experimental results which show that generalized tree doubling frequently outperforms generalized Christofides with respect to parameter size.
Berger, Julian; Bleidt, Tibor; Büßemeyer, Martin; Ding, Marcus; Feldmann, Moritz; Feuerpfeil, Moritz; Jacoby, Janusch; Schröter, Valentin; Sievers, Bjarne; Spranger, Moritz; Stadlinger, Simon; Wullenweber, Paul; Cohen, Sarel; Doskoč, Vanja; Friedrich, Tobias Fine-Grained Localization, Classification and Segmentation of Lungs with Various DiseasesCVPR Workshop on Fine-Grained Visual Categorization (FGVC@CVPR) 2021
The fine-grained localization and classification of various lung abnormalities is a challenging yet important task for combating diseases and, also, pandemics. In this paper, we present one way to detect and classify abnormalities within chest X-ray scans. In particular, we investigate the use of binary image classification (to distinguish between healthy and infected chests) and the weighted box fusion (which constructs a detection box using the proposed boxes within range). We observe that both methods increase the performance of a base model significantly. Furthermore, we improve state of the art on lung segmentation, even in the presence of abnormalities. We do so using transfer learning to fine-tune a UNet model on the Montgomery and Shenzhen datasets. In our experiments, we compare standard augmentations (like crop, pad, rotate, warp, zoom, brightness, and contrast variations) to more complex ones (for example, block masking and diffused noise augmentations). This way, we obtain a state-of-the-art model with a dice score of 97.9%. In particular, we show that simple augmentations outperform complex ones in our setting.
Böther, Maximilian; Schiller, Leon; Fischbeck, Philipp; Molitor, Louise; Krejca, Martin S.; Friedrich, Tobias Evolutionary Minimization of Traffic CongestionGenetic and Evolutionary Computation Conference (GECCO) 2021: 937–945
Best-Paper Award (RWA Track)
Traffic congestion is a major issue that can be solved by suggesting drivers alternative routes they are willing to take. This concept has been formalized as a strategic routing problem in which a single alternative route is suggested to an existing one. We extend this formalization and introduce the Multiple-Routes problem, which is given a start and a destination and then aims at finding up to \(n\) different routes that the drivers strategically disperse over, minimizing the overall travel time of the system. Due to the NP-hard nature of the problem, we introduce the Multiple-Routes evolutionary algorithm (MREA) as a heuristic solver. We study several mutation and crossover operators and evaluate them on real-world data of the city of Berlin, Germany. We find that a combination of all operators yields the best result, improving the overall travel time by a factor between 1.8 and 3, in the median, compared to all drivers taking the fastest route. For the base case \(n=2\), we compare our MREA to the highly tailored optimal solver by Bläsius etal. [ATMOS 2020] and show that, in the median, our approach finds solutions of quality at least \(99.69\%\) of an optimal solution while only requiring \(40\%\) of the time.
Friedrich, Tobias; Göbel, Andreas; Krejca, Martin S.; Pappik, Marcus A spectral independence view on hard spheres via block dynamicsInternational Colloquium on Automata, Languages and Programming (ICALP) 2021: 66:1–66:15
The hard-sphere model is one of the most extensively studied models in statistical physics. It describes the continuous distribution of spherical particles, governed by hard-core interactions. An important quantity of this model is the normalizing factor of this distribution, called the partition function. We propose a Markov chain Monte Carlo algorithm for approximating the grand-canonical partition function of the hard-sphere model in \(d\) dimensions. Up to a fugacity of \( \lambda < \text{e}/2^d\), the runtime of our algorithm is polynomial in the volume of the system. This covers the entire known real-valued regime for the uniqueness of the Gibbs measure. Key to our approach is to define a discretization that closely approximates the partition function of the continuous model. This results in a discrete hard-core instance that is exponential in the size of the initial hard-sphere model. Our approximation bound follows directly from the correlation decay threshold of an infinite regular tree with degree equal to the maximum degree of our discretization. To cope with the exponential blow-up of the discrete instance we use clique dynamics, a Markov chain that was recently introduced in the setting of abstract polymer models. We prove rapid mixing of clique dynamics up to the tree threshold of the univariate hard-core model. This is achieved by relating clique dynamics to block dynamics and adapting the spectral expansion method, which was recently used to bound the mixing time of Glauber dynamics within the same parameter regime.
Lagodzinski, J. A. Gregor; Göbel, Andreas; Casel, Katrin; Friedrich, Tobias On Counting (Quantum-)Graph Homomorphisms in Finite Fields of Prime OrderInternational Colloquium on Automata, Languages and Programming (ICALP) 2021: 91:1–91:15
We study the problem of counting the number of homomorphisms from an input graph \(G\) to a fixed (quantum) graph \(\bar{H}\) in any finite field of prime order \(\mathbb{Z}_p\). The subproblem with graph \(H\) was introduced by Faben and Jerrum [ToC'15] and its complexity is still uncharacterised despite active research, e.g. the very recent work of Focke, Goldberg, Roth, and Zivný [SODA'21]. Our contribution is threefold. First, we introduce the study of quantum graphs to the study of modular counting homomorphisms. We show that the complexity for a quantum graph \(\bar{H}\) collapses to the complexity criteria found at dimension 1: graphs. Second, in order to prove cases of intractability we establish a further reduction to the study of bipartite graphs. Lastly, we establish a dichotomy for all bipartite \( (K_3,3 \ e, domino) \)-free graphs by a thorough structural study incorporating both local and global arguments. This result subsumes all results on bipartite graphs known for all prime moduli and extends them significantly. Even for the subproblem with \(p=2\) this establishes new results.
Bläsius, Thomas; Friedrich, Tobias; Krejca, Martin S.; Molitor, Louise The Impact of Geometry on Monochrome Regions in the Flip Schelling ProcessInternational Symposium on Algorithms and Computation, (ISAAC) 2021 2021: 29:1–29:17
Schelling’s classical segregation model gives a coherent explanation for the wide-spread phenomenon of residential segregation. We consider an agent-based saturated open-city variant, the Flip-Schelling-Process (FSP), in which agents, placed on a graph, have one out of two types and, based on the predominant type in their neighborhood, decide whether to changes their types; similar to a new agent arriving as soon as another agent leaves the vertex. We investigate the probability that an edge u,vis monochrome, i.e., that both vertices u and v have the same type in the FSP, and we provide a general framework for analyzing the influence of the underlying graph topology on residential segregation. In particular, for two adjacent vertices, we show that a highly decisive common neighborhood, i.e., a common neighborhood where the absolute value of the difference between the number of vertices with different types is high, supports segregation and moreover, that large common neighborhoods are more decisive. As an application, we study the expected behavior of the FSP on two common random graph models with and without geometry: (1) For random geometric graphs, we show that the existence of an edge u,v makes a highly decisive common neighborhood for u and v more likely. Based on this, we prove the existence of a constant c > 0 such that the expected fraction of monochrome edges after the FSP is at least 1/2 + c. (2) For Erdős–Rényi graphs we show that large common neighborhoods are unlikely and that the expected fraction of monochrome edges after the FSP is at most 1/2 + o (1). Our results indicate that the cluster structure of the underlying graph has a significant impact on the obtained segregation strength.
Casel, Katrin; Friedrich, Tobias; Issac, Davis; Klodt, Nicolas; Seifert, Lars; Zahn, Arthur A Color-blind 3-Approximation for Chromatic Correlation Clustering and Improved HeuristicsKnowledge Discovery and Data Mining (KDD) 2021: 882–891
Chromatic Correlation Clustering (CCC) models clustering of objects with categorical pairwise relationships. The model can be viewed as clustering the vertices of a graph with edge-labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well studied problem Correlation Clustering (CC), motivated by real-world applications from data-mining, social networks and bioinformatics. We give theoretical as well as practical contributions to the study of CCC. Our main theoretical contribution is an alternative analysis of the famous Pivot algorithm for CC. We show that, when simply run color-blind, Pivot is also a linear time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear time 11-approximation, both by Anava et al. [WWW 2015]. While this theoretical result justifies Pivot as a baseline comparison for other heuristics, its blunt color-blindness performs poorly in practice. We develop a color-sensitive, practical heuristic we call Greedy Expansion that empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances. Further, we propose a novel generalization of CCC allowing for multi-labelled edges. We argue that it is more suitable for many of the real-world applications and extend our results to this model.
Bilò, Davide; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin Space-Efficient Fault-Tolerant Diameter OraclesMathematical Foundations of Computer Science (MFCS) 2021: 18:1–18:16
We design \(f\)-edge fault-tolerant diameter oracles (\(f\)-FDO, or simply FDO if \(f=1\)). For a given directed or undirected and possibly edge-weighted graph \(G\) with \(n\) vertices and \(m\) edges and a positive integer \(f\), we preprocess the graph and construct a data structure that, when queried with a set \(F\) of edges, where \(|F| \leq f\), returns the diameter of \(G - F\). An \(f\)-FDO has stretch \(\sigma \geq 1\) if the returned value \(\widehat D\) satisfies \(\operatornamediam(G - F) leq widehat D leq sigma \operatornamediam(G - F)\). For the case of a single edge failure (\(f=1\)) in an unweighted directed graph, there exists an approximate FDO by Henzinger et al. [ITCS 2017] with stretch \((1+\varepsilon)\), constant query time, space \(O(m)\), and a combinatorial preprocessing time of \(\widetildeO(mn + n^1.5 \sqrt{Dm/\varepsilon})\), where \(D\) is the diameter. We present a near-optimal FDO with the same stretch, query time, and space. It has a preprocessing time of \(\widetildeO(mn + n^2/\varepsilon)\), which is better for any constant \(\varepsilon > 0\). The preprocessing time nearly matches a conditional lower bound for combinatorial algorithms, also by Henzinger et al. When using fast matrix multiplication instead, we achieve a preprocessing time of \(\widetildeO(n^2.5794 + n^2/\varepsilon)\). We further prove an information-theoretic lower bound showing that any FDO with stretch better than \(3/2\) requires \(\Omega(m)\) bits of space. Thus, for constant \(0 < \varepsilon < 3/2\), our combinatorial \((1+ \varepsilon)\)-approximate FDO is near-optimal in all the parameters. In the case of multiple edge failures (\(f>1\)) in undirected graphs with non-negative edge weights, we give an \(f\)-FDO with stretch \((f+2)\), query time \(O(f^2\log^2{n})\), \(\widetildeO(fn)\) space, and preprocessing time \(\widetildeO(fm)\). We complement this with a lower bound excluding any finite stretch in \(o(fn)\) space. Many real-world networks have polylogarithmic diameter. We show that for those graphs and up to \(f = o(\log n/ \log\log n)\) failures one can swap approximation for query time and space. We present an exact combinatorial \(f\)-FDO with preprocessing time \(mn^{1+o(1)}\), query time \(n^{o(1)}\), and space \(n^{2+o(1)}\). With fast matrix multiplication, the preprocessing time can be improved to \(n^{\omega+o(1)}\), where \(\omega < 2.373\) is the matrix multiplication exponent.
Kißig, Otto; Taraz, Martin; Cohen, Sarel; Doskoč, Vanja; Friedrich, Tobias Drug Repurposing for Multiple COVID Strains using Collaborative FilteringICLR Workshop on Machine Learning for Preventing and Combating Pandemics (MLPCP@ICLR) 2021
The ongoing COVID-19 pandemic demands for a swift discovery of suitable treatments. The development of completely new compounds for such a novel disease is a challenging, time intensive process. This amplifies the relevance of drug repurposing, a technique where existing drugs are used to treat other diseases. A common bioinformatical approach to this is based on knowledge graphs, which compile relationships between drugs, diseases, genes and other biomedical entities. Then, graph neural networks (GNNs) are used for the drug repurposing task as they provide a good link prediction performance on such knowledge graphs. Building on state-of-the-art GNN research, Doshi & Chepuri (2020) construct the remarkable model DR-COVID. We re-implement their model and extend the approach to perform significantly better. We propose and evaluate several strategies for the aggregation of link predictions into drug recommendation rankings. With the help of clustering of similar target diseases we improve the model by a substantial margin, compiling a top-100 ranking of candidates including 32 currently being in COVID-19-related clinical trials. Regarding the re-implementation, we offer more flexibility in the selection of the graph neighborhood sizes fed into the model and reduce the training time significantly by making use of data parallelism.
Friedrich, Tobias; Neumann, Frank; Rothenberger, Ralf; Sutton, Andrew M. Solving Non-Uniform Planted and Filtered Random SAT Formulas GreedilyTheory and Applications of Satisfiability Testing (SAT) 2021: 188–206
Recently, there has been an interest in studying non-uniform random \(k\)-satisfiability (\(k\)-SAT) models in order to address the non-uniformity of formulas arising from real-world applications. While uniform random \(k\)-SAT has been extensively studied from both a theoretical and experimental perspective, understanding the algorithmic complexity of heterogeneous distributions is still an open challenge. When a sufficiently dense formula is guaranteed to be satisfiable by conditioning or a planted assignment, it is well-known that uniform random \(k\)-SAT is easy on average. We generalize this result to the broad class of non-uniform random \(k\)-SAT models that are characterized only by an ensemble of distributions over variables with a mild balancing condition. This balancing condition rules out extremely skewed distributions in which nearly half the variables occur less frequently than a small constant fraction of the most frequent variables, but generalizes recently studied non-uniform \(k\)-SAT distributions such as power-law and geometric formulas. We show that for all formulas generated from this model of at least logarithmic densities, a simple greedy algorithm can find a solution with high probability. As a side result we show that the total variation distance between planted and filtered (conditioned on satisfiability) models is \(o(1)\) once the planted model produces formulas with a unique solution with probability \(1-o(1)\). This holds for all random \(k\)-SAT models where the signs of variables are drawn uniformly and independently at random.
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian Force-Directed Embedding of Scale-Free Networks in the Hyperbolic PlaneSymposium on Experimental Algorithms (SEA) 2021: 22:1–22:18
Force-directed drawing algorithms are the most commonly used approach to visualize networks. While they are usually very robust, the performance of Euclidean spring embedders decreases if the graph exhibits the high level of heterogeneity that typically occurs in scale-free real-world networks. As heterogeneity naturally emerges from hyperbolic geometry (in fact, scale-free networks are often perceived to have an underlying hyperbolic geometry), it is natural to embed them into the hyperbolic plane instead. Previous techniques that produce hyperbolic embeddings usually make assumptions about the given network, which (if not met) impairs the quality of the embedding. It is still an open problem to adapt force-directed embedding algorithms to make use of the heterogeneity of the hyperbolic plane, while also preserving their robustness. We identify fundamental differences between the behavior of spring embedders in Euclidean and hyperbolic space, and adapt the technique to take advantage of the heterogeneity of the hyperbolic plane.
Bläsius, Thomas; Friedrich, Tobias; Göbel, Andreas; Levy, Jordi; Rothenberger, Ralf The Impact of Heterogeneity and Geometry on the Proof Complexity of Random SatisfiabilitySymposium on Discrete Algorithms (SODA) 2021: 42–53
Satisfiability is considered the canonical NP-complete problem and is used as a starting point for hardness reductions in theory, while in practice heuristic SAT solving algorithms can solve large-scale industrial SAT instances very efficiently. This disparity between theory and practice is believed to be a result of inherent properties of industrial SAT instances that make them tractable. Two characteristic properties seem to be prevalent in the majority of real-world SAT instances, heterogeneous degree distribution and locality. To understand the impact of these two properties on SAT, we study the proof complexity of random k-SAT models that allow to control heterogeneity and locality. Our findings show that heterogeneity alone does not make SAT easy as heterogeneous random k-SAT instances have superpolynomial resolution size. This implies intractability of these instances for modern SAT-solvers. On the other hand, modeling locality with an underlying geometry leads to small unsatisfiable subformulas, which can be found within polynomial time. A key ingredient for the result on geometric random k-SAT can be found in the complexity of higher-order Voronoi diagrams. As an additional technical contribution, we show an upper bound on the number of non-empty Voronoi regions, that holds for points with random positions in a very general setting. In particular, it covers arbitrary p-norms, higher dimensions, and weights affecting the area of influence of each point multiplicatively. Our bound is linear in the total weight. This is in stark contrast to quadratic lower bounds for the worst case.
Friedemann, Wilhelm; Friedrich, Tobias; Gawendowicz, Hans; Lenzner, Pascal; Melnichenko, Anna; Peters, Jannik; Stephan, Daniel; Vaichenker, Michael Efficiency and Stability in Euclidean Network DesignSymposium on Parallelism in Algorithms and Architectures (SPAA) 2021: 232–242
Network Design problems typically ask for a minimum cost sub-network from a given host network. This classical point-of-view assumes a central authority enforcing the optimum solution. But how should networks be designed to cope with selfish agents that own parts of the network? In this setting, minimum cost networks may be very unstable in that agents will deviate from a proposed solution if this decreases their individual cost. Hence, designed networks should be both efficient in terms of total cost and stable in terms of the agents' willingness to accept the network. We study this novel type of Network Design problem by investigating the creation of \($(\beta,\gamma)$\)-networks, that are in \($\beta$\)-approximate Nash equilibrium and have a total cost of at most \($\gamma$\) times the optimal cost, for the recently proposed Euclidean Generalized Network Creation Game by Bilò et al.SPAA2019. There, \($n$\) agents corresponding to points in Euclidean space create costly edges among themselves to optimize their centrality in the created network. Our main result is a simple \($\mathcal{O(n^2)$\)-time algorithm that computes a \($(\beta,\beta)$\)-network with low \($\beta$\) for any given set of points. Moreover, on integer grid point sets or random point sets our algorithm achieves a low constant \($\beta$\). Besides these results for the Euclidean model, we discuss a generalization of our algorithm to instances with arbitrary, even non-metric, edge lengths. Moreover, in contrast to these algorithmic results, we show that no such positive results are possible when focusing on either optimal networks, i.e., \($(\beta,1)$\)-networks, or perfectly stable networks, i.e., \($(1,\gamma)$\)-networks, as in both cases NP-hard problems arise, there exist instances with very unstable optimal networks, and there are instances for perfectly stable networks with high total cost. Along the way, we significantly improve several results from Bilò et al. and we asymptotically resolve their conjecture about the Price of Anarchy by providing a tight bound.
Kißig, Otto; Taraz, Martin; Cohen, Sarel; Friedrich, Tobias Drug Repurposing Using Link Prediction on Knowledge GraphsICML Workshop on Computational Biology (WCB@ICML) 2021: 742–753
The active global SARS-CoV-2 pandemic caused more than 167 million cases and 3.4 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time intensive process and despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on knowledge graphs, that condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of candidate drugs, 32 of which currently being in COVID-19-related clinical trials. Moreover, we present an alternative application for the model, the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model by significantly shortening the inference and pre-processing time by exploiting data-parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using a new emerging hardware—we propose a new approach to leverage the use of high-capacity Non-Volatile Memory for aggregate drug ranking.