Göbel, Andreas; Lagodzinski, J. A. Gregor; Seidel, Karen Counting Homomorphisms to Trees Modulo a Prime ACM Transactions on Computation Theory 2021
Many important graph theoretic notions can be encoded as counting graph homomorphism problems, such as partition functions in statistical physics, in particular independent sets and colourings. In this article we study the complexity of \(\#_p\textsc{HomsTo}H\), the problem of counting graph homomorphisms from an input graph to a graph \(H\) modulo a prime number \(p\). Dyer and Greenhill proved a dichotomy stating that the tractability of non-modular counting graph homomorphisms depends on the structure of the target graph. Many intractable cases in non-modular counting become tractable in modular counting due to the common phenomenon of cancellation. In subsequent studies on counting modulo \(2\), however, the influence of the structure of \(H\) on the tractability was shown to persist, which yields similar dichotomies. Our main result states that for every tree \(H\) and every prime \(p\) the problem \(\#_p\textsc{HomsTo}H\) is either polynomial time computable or \(\#_p\mathsf{P}\)-complete. This relates to the conjecture of Faben and Jerrum stating that this dichotomy holds for every graph \(H\) when counting modulo \(2\). In contrast to previous results on modular counting, the tractable cases of \(\#_p\textsc{HomsTo}H\) are essentially the same for all values of the modulo when \(H\) is a tree. To prove this result, we study the structural properties of a homomorphism. As an important interim result, our study yields a dichotomy for the problem of counting weighted independent sets in a bipartite graph modulo some prime \(p\). These results are the first suggesting that such dichotomies hold not only for the one-bit functions of the modulo \(2\) case but also for the modular counting functions of all primes \(p\).
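For readers who want to pin down the object being counted, here is a minimal brute-force sketch (ours, not the paper's method; all names are ours). It is exponential in \(|V(G)|\), whereas the dichotomy above concerns polynomial-time solvability.

```python
# Illustrative brute-force counter for homomorphisms from G to H modulo p.
from itertools import product

def count_homs_mod_p(G_vertices, G_edges, H_vertices, H_edges, p):
    """Count maps f: V(G) -> V(H) such that f(u)f(v) is an edge of H
    for every edge uv of G, modulo the prime p."""
    H_adj = {frozenset(e) for e in H_edges}
    count = 0
    for values in product(H_vertices, repeat=len(G_vertices)):
        f = dict(zip(G_vertices, values))
        if all(frozenset((f[u], f[v])) in H_adj for u, v in G_edges):
            count += 1
    return count % p

# Homomorphisms from a triangle to itself are its 6 automorphisms:
print(count_homs_mod_p([0, 1, 2], [(0, 1), (1, 2), (0, 2)],
                       [0, 1, 2], [(0, 1), (1, 2), (0, 2)], 5))  # 6 mod 5 = 1
```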
Cseh, Ágnes; Juhos, Attila Pairwise Preferences in the Stable Marriage Problem ACM Transactions on Economics and Computation (TEAC) 2021: 1–28
We study the classical, two-sided stable marriage problem under pairwise preferences. In the most general setting, agents are allowed to express their preferences as comparisons of any two of their edges and they also have the right to declare a draw or even withdraw from such a comparison. This freedom is then gradually restricted as we specify six stages of orderedness in the preferences, ending with the classical case of strictly ordered lists. We study all cases occurring when combining the three known notions of stability—weak, strong and super-stability—under the assumption that each side of the bipartite market obtains one of the six degrees of orderedness. By designing three polynomial algorithms and two NP-completeness proofs we determine the complexity of all cases not yet known, and thus give an exact boundary in terms of preference structure between tractable and intractable cases.
Cseh, Ágnes; Kavitha, Telikepalli Popular matchings in complete graphs Algorithmica 2021: 1–31
Our input is a complete graph \(G = (V,E)\) on \(n\) vertices where each vertex has a strict ranking of all other vertices in \(G\). The goal is to construct a matching in \(G\) that is "globally stable" or popular. A matching \(M\) is popular if \(M\) does not lose a head-to-head election against any matching \(M'\): here each vertex casts a vote for the matching in \(\{M,M'\}\) where it gets a better assignment. Popular matchings need not exist in the given instance \(G\) and the popular matching problem is to decide whether one exists or not. The popular matching problem in \(G\) is easy to solve for odd \(n\). Surprisingly, the problem becomes NP-hard for even \(n\), as we show here.
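To make the popularity notion concrete, here is a minimal sketch (ours; the data layout and names are assumptions) of the head-to-head election between two matchings.

```python
# rank[v][u] gives v's position for u (lower is better); None = unmatched.
def prefers(v, a, b, rank):
    """Does vertex v strictly prefer partner a over partner b?
    Being unmatched (None) is worse than any partner."""
    if a is None:
        return False
    if b is None:
        return True
    return rank[v][a] < rank[v][b]

def more_popular_or_tied(M, M_prime, vertices, rank):
    """True iff M does not lose the election against M_prime."""
    votes = 0
    for v in vertices:
        a, b = M.get(v), M_prime.get(v)
        if prefers(v, a, b, rank):
            votes += 1
        elif prefers(v, b, a, rank):
            votes -= 1
    return votes >= 0

# Tiny example: vertex 3 is unmatched in M, vertex 2 in M_prime.
rank = {1: {2: 0, 3: 1}, 2: {1: 0, 3: 1}, 3: {1: 0, 2: 1}}
M, M_prime = {1: 2, 2: 1}, {1: 3, 3: 1}
print(more_popular_or_tied(M, M_prime, [1, 2, 3], rank))  # True
```

A matching \(M\) is popular exactly when this check succeeds against every other matching \(M'\).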
Andersson, Tommy; Cseh, Ágnes; Ehlers, Lars; Erlanson, Albin Organizing time exchanges: Lessons from matching markets American Economic Journal: Microeconomics 2021: 338–373
This paper considers time exchanges via a common platform (e.g., markets for exchanging time units, positions at education institutions, and tuition waivers). There are several problems associated with such markets, e.g., imbalanced outcomes, coordination problems, and inefficiencies. We model time exchanges as matching markets and construct a non-manipulable mechanism that selects an individually rational and balanced allocation that maximizes exchanges among the participating agents (and those allocations are efficient). This mechanism works on a preference domain whereby agents classify the goods provided by other participating agents as either unacceptable or acceptable, and for goods classified as acceptable, agents have specific upper quotas representing their maximum needs.
Haemers, Willem H; Parsaei Majd, Leila Spectral symmetry in conference matrices Designs, Codes and Cryptography 2021: 1–8
A conference matrix of order \(n\) is an \(n \times n\) matrix \(C\) with diagonal entries \(0\) and off-diagonal entries \(\pm 1\) satisfying \(CC^T = (n - 1) I\). If \(C\) is symmetric, then \(C\) has a symmetric spectrum \(\Sigma\) (that is, \(\Sigma = -\Sigma\)) and eigenvalues \(\pm \sqrt{n-1}\). We show that many principal submatrices of \(C\) also have symmetric spectrum, which leads to examples of Seidel matrices of graphs (or, equivalently, adjacency matrices of complete signed graphs) with a symmetric spectrum. In addition, we show that some Seidel matrices with symmetric spectrum can be characterized by this construction.
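As a numerical illustration (our construction choice, not taken from the paper), the symmetric Paley conference matrix of order 6 exhibits exactly this symmetric spectrum.

```python
import numpy as np

q = 5  # prime congruent to 1 mod 4; yields a symmetric conference matrix of order q+1
squares = {(x * x) % q for x in range(1, q)}

def chi(x):  # quadratic residue character of GF(q)
    x %= q
    return 0 if x == 0 else (1 if x in squares else -1)

n = q + 1
C = np.zeros((n, n), dtype=int)
C[0, 1:] = C[1:, 0] = 1                  # row/column of the point at infinity
for x in range(q):
    for y in range(q):
        C[1 + x, 1 + y] = chi(y - x)     # symmetric, since chi(-1) = 1 for q = 5

assert np.array_equal(C @ C.T, (n - 1) * np.eye(n, dtype=int))  # CC^T = (n-1)I
print(np.linalg.eigvalsh(C))  # three times -sqrt(5), three times +sqrt(5)
```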
Michail, Othon; Skretas, George; Spirakis, Paul G. Distributed computation and reconfiguration in actively dynamic networks Distributed Computing 2021: 185–206
Doerr, Benjamin; Krejca, Martin S. The Univariate Marginal Distribution Algorithm Copes Well with Deception and Epistasis Evolutionary Computation 2021: 543–563
In their recent work, Lehre and Nguyen (FOGA 2019) show that the univariate marginal distribution algorithm (UMDA) needs time exponential in the parent population size to optimize the DeceivingLeadingBlocks (DLB) problem. They conclude from this result that univariate EDAs have difficulties with deception and epistasis. In this work, we show that this negative finding is caused by an unfortunate choice of the parameters of the UMDA. When the population sizes are chosen large enough to prevent genetic drift, then the UMDA optimizes the DLB problem with high probability with at most \(\lambda(\frac{n}{2} + 2 e \ln n)\) fitness evaluations. Since an offspring population size \(\lambda\) of order \(n \log n\) can prevent genetic drift, the UMDA can solve the DLB problem with \(O(n^2 \log n)\) fitness evaluations. In contrast, for classic evolutionary algorithms no better run time guarantee than \(O(n^3)\) is known (which we prove to be tight for the \((1+1)\) EA), so our result rather suggests that the UMDA can cope well with deception and epistasis. From a broader perspective, our result shows that the UMDA can cope better with local optima than many classic evolutionary algorithms; such a result was previously known only for the compact genetic algorithm. Together with the lower bound of Lehre and Nguyen, our result for the first time rigorously proves that running EDAs in the regime with genetic drift can lead to drastic performance losses.
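A compact, illustrative UMDA sketch on DLB (the DLB encoding is our reading of Lehre and Nguyen's definition; the parameter choices merely gesture at the low-genetic-drift regime and are not the paper's):

```python
import random

def dlb(x):
    """Leading 11-blocks score 2 each; the first other block scores
    1 if it is 00 (the deceptive value) and 0 otherwise."""
    score = 0
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        if a == b == 1:
            score += 2
        else:
            return score + (1 if a == b == 0 else 0)
    return score

def umda(n, lam, mu, max_evals):
    freq = [0.5] * n
    evals = 0
    while evals < max_evals:
        pop = [[int(random.random() < f) for f in freq] for _ in range(lam)]
        evals += lam
        pop.sort(key=dlb, reverse=True)
        if dlb(pop[0]) == n:
            return evals
        for i in range(n):  # new frequencies = marginals of the mu best
            m = sum(x[i] for x in pop[:mu]) / mu
            freq[i] = min(max(m, 1 / n), 1 - 1 / n)  # margins against drift
    return None  # budget exhausted

n = 20
print(umda(n, lam=8 * n, mu=2 * n, max_evals=10**6))
```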
Casel, Katrin; Fernau, Henning; Khosravian Ghadikolaei, Mehdi; Monnot, Jérôme; Sikora, Florian On the Complexity of Solution Extension of Optimization Problems Theoretical Computer Science 2021
The question whether a given partial solution to a problem can be extended reasonably occurs in many algorithmic approaches for optimization problems. For instance, when enumerating minimal vertex covers of a graph \(G=(V,E)\), one usually arrives at the problem of deciding for a vertex set \(U \subseteq V\) (pre-solution), whether there exists a minimal vertex cover \(S\) (i.e., a vertex cover \(S \subseteq V\) such that no proper subset of \(S\) is a vertex cover) with \(U \subseteq S\) (minimal extension of \(U\)). We propose a general, partial-order based formulation of such extension problems which allows us to model parameterization and approximation aspects of extension, and also highlights relationships between extension tasks for different specific problems. As examples, we study a number of specific problems which can be expressed and related in this framework. In particular, we discuss extension variants of the problems dominating set and feedback vertex/edge set. All these problems are shown to be NP-complete even when restricted to bipartite graphs of bounded degree, with the exception of our extension version of feedback edge set on undirected graphs, which is shown to be solvable in polynomial time. For the extension variants of dominating and feedback vertex set, we also show NP-completeness for the restriction to planar graphs of bounded degree. As a non-graph problem, we also study an extension version of the bin packing problem. We further consider the parameterized complexity of all these extension variants, where the parameter is a measure of the pre-solution as defined by our framework.
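The minimal-extension question is very concrete; this brute-force checker (ours, exponential, in line with the NP-completeness shown in the paper) also illustrates that extensibility can fail.

```python
from itertools import combinations

def is_vc(S, edges):
    return all(u in S or v in S for u, v in edges)

def is_minimal_vc(S, edges):
    return is_vc(S, edges) and all(not is_vc(S - {v}, edges) for v in S)

def extends_to_minimal_vc(U, vertices, edges):
    """Does some minimal vertex cover S satisfy U <= S?"""
    others = [v for v in vertices if v not in U]
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            if is_minimal_vc(set(U) | set(extra), edges):
                return True
    return False

# On the path a-b-c-d, no minimal vertex cover contains both a and b:
edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(extends_to_minimal_vc({"a", "b"}, "abcd", edges))  # False
print(extends_to_minimal_vc({"b"}, "abcd", edges))       # True: {b, c} or {b, d}
```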
Doerr, Benjamin; Krejca, Martin S. A Simplified Run Time Analysis of the Univariate Marginal Distribution Algorithm on LeadingOnes Theoretical Computer Science 2021: 121–128
With elementary means, we prove in this note a stronger run time guarantee for the univariate marginal distribution algorithm (UMDA) optimizing the LeadingOnes benchmark function in the desirable regime with low genetic drift. If the population size \(\mu\) is at least \(128 n \ln(n)\) and the selection rate \(\mu/\lambda\) is at most some constant less than \(\frac{1}{4e} \approx 0.092\), then the UMDA samples the optimum within \(O\big(n\frac{\lambda}{\log(\lambda/\mu)}\big)\) fitness evaluations with high probability. This improves over the previous \(O(n \lambda \log\lambda)\) guarantee that was obtained by Dang and Lehre (2015) via the deep level-based population method. With similar arguments as in our upper-bound analysis, we also obtain the first lower bound for this problem. Under similar assumptions, we prove that a bound of \(\Omega\big(\lambda \frac{n - \log(n\lambda)}{\log(\lambda/\mu)}\big)\) fitness evaluations holds with high probability.
Casel, Katrin; Fernau, Henning; Gaspers, Serge; Gras, Benjamin; Schmid, Markus L. On the Complexity of the Smallest Grammar Problem over Fixed Alphabets Theory of Computing Systems 2021: 344–409
In the smallest grammar problem, we are given a word \(w\) and we want to compute a preferably small context-free grammar \(G\) for the singleton language \(\{w\}\) (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is \(\mathsf{NP}\)-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless \(\mathsf{P}\) = \(\mathsf{NP}\). We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains \(\mathsf{NP}\)-complete (and its optimisation version is \(\mathsf{APX}\)-hard), even if the alphabet is fixed and has size at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size-measure of grammars that is commonly used in the literature (i.e., a size measure also taking the number of rules into account), and it also allows us to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields \(\mathsf{W}[1]\)-hardness. Furthermore, we present an \(O(3^{|w|})\) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i.e., grammars for which only the start rule contains nonterminals on the right side; thus, investigating the impact of the ``hierarchical depth'' of grammars on the complexity of the smallest grammar problem. In this regard, we obtain for 1-level grammars similar, but slightly stronger results.
Neubert, Stefan Grundkurs Theoretische Informatik Rheinwerk Computing 2021
Boockmeyer, Arne; Fischbeck, Philipp; Neubert, Stefan Fit fürs Studium - Informatik Rheinwerk Computing 2021
Aziz, Haris; Cseh, Agnes; Dickerson, John; McElfresh, Duncan Optimal Kidney Exchange with Immunosuppressants Conference on Artificial Intelligence (AAAI) 2021: 21–29
Algorithms for the exchange of kidneys are one of the key successful applications in market design, artificial intelligence, and operations research. Potent immunosuppressant drugs suppress the body's ability to reject a transplanted organ up to the point that a transplant across blood- or tissue-type incompatibility becomes possible. In contrast to the standard kidney exchange problem, we consider a setting that also involves the decision about which recipients receive from the limited supply of immunosuppressants that make them compatible with originally incompatible kidneys. We first present a general computational framework to model this problem. Our main contribution is a range of efficient algorithms that provide flexibility in terms of meeting meaningful objectives. Motivated by the current reality of kidney exchanges using sophisticated mathematical-programming-based clearing algorithms, we then present a general but scalable approach to optimal clearing with immunosuppression; we validate our approach on realistic data from a large fielded exchange.
Bilò, Davide; Friedrich, Tobias; Lenzner, Pascal; Lowski, Stefanie; Melnichenko, Anna Selfish Creation of Social Networks Conference on Artificial Intelligence (AAAI) 2021: 5185–5193
Understanding real-world networks has been a core research endeavor throughout the last two decades. Network Creation Games are a promising approach for this from a game-theoretic perspective. In these games, selfish agents corresponding to nodes in a network strategically decide which links to form to optimize their centrality. Many versions have been introduced and analyzed, but none of them is suited to modeling the evolution of social networks. In real-world social networks, connections are often established by recommendations from common acquaintances or by a chain of such recommendations. Thus, establishing and maintaining contact with a friend of a friend is easier than connecting to complete strangers. This explains the high clustering, i.e., the abundance of triangles, in real-world social networks. We propose and analyze a network creation model inspired by real-world social networks. Edges are formed in our model via bilateral consent of both endpoints and the cost for establishing and maintaining an edge is proportional to the distance of the endpoints before establishing the connection. We provide results for generic cost functions, which essentially only need to be convex in the distance of the endpoints without the respective edge. For this broad class of cost functions, we provide many structural properties of equilibrium networks and prove (almost) tight bounds on the diameter, the Price of Anarchy and the Price of Stability. Moreover, as a proof of concept we show via experiments that the created equilibrium networks of our model indeed closely mimic real-world social networks. We observe degree distributions that seem to follow a power law, high clustering, and low diameters. This can be seen as a promising first step towards game-theoretic network creation models that predict networks featuring all core real-world properties.
Aziz, Haris; Chan, Hau; Cseh, Ágnes; Li, Bo; Ramezani, Fahimeh; Wang, Chenhao Multi-Robot Task Allocation—Complexity and Approximation Autonomous Agents and Multiagent Systems (AAMAS) 2021: 133–141
Multi-robot task allocation is one of the most fundamental classes of problems in robotics and is crucial for various real-world robotic applications such as search, rescue and area exploration. We consider the Single-Task robots and Multi-Robot tasks Instantaneous Assignment (ST-MR-IA) setting where each task requires at least a certain number of robots and each robot can work on at most one task and incurs an operational cost for each task. Our aim is to consider a natural computational problem of allocating robots to complete the maximum number of tasks subject to budget constraints. We consider budget constraints of three different kinds: (1) total budget, (2) task budget, and (3) robot budget. We provide a detailed complexity analysis including results on approximations as well as polynomial-time algorithms for the general setting and important restricted settings.
Kraiczy, Sonja; Cseh, Ágnes; Manlove, David On Weakly and Strongly Popular Rankings Autonomous Agents and Multiagent Systems (AAMAS) 2021: 1563–1565
Van Zuylen et al. introduced the notion of a popular ranking in a voting context, where each voter submits a strictly-ordered list of all candidates. A popular ranking \(\pi\) of the candidates is at least as good as any other ranking \(\sigma\) in the following sense: if we compare \(\pi\) to \(\sigma\), at least half of all voters will always weakly prefer \(\pi\). Whether a voter prefers one ranking to another is calculated based on the Kendall distance. A more traditional definition of popularity---as applied to popular matchings, a well-established topic in computational social choice---is stricter, because it requires at least half of the voters who are not indifferent between \(\pi\) and \(\sigma\) to prefer \(\pi\). In this paper, we derive structural and algorithmic results in both settings, also improving upon the results by van Zuylen et al. We also point out connections to the famous open problem of finding a Kemeny consensus with 3 voters.
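A small sketch (ours) of how the two popularity notions compare two rankings via the Kendall distance:

```python
from itertools import combinations

def kendall(r1, r2):
    """Kendall distance between two rankings: number of discordant pairs."""
    pos2 = {c: i for i, c in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2) if pos2[a] > pos2[b])

def vote_tally(pi, sigma, voters):
    """pro voters are closer (in Kendall distance) to pi, con voters to sigma."""
    pro = sum(1 for v in voters if kendall(pi, v) < kendall(sigma, v))
    con = sum(1 for v in voters if kendall(sigma, v) < kendall(pi, v))
    return pro, con

# pi is at least as popular as sigma: weakly if con <= len(voters)/2,
# and in the stricter matching-style sense if pro >= con.
voters = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
print(vote_tally(["a", "b", "c"], ["b", "a", "c"], voters))  # (2, 1)
```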
Cooley, Madison; Greene, Casey; Issac, Davis; Pividori, Milton; Sullivan, Blair Parameterized Algorithms for Identifying Gene Co-Expression Modules via Weighted Clique Decomposition Applied and Computational Discrete Algorithms (ACDA) 2021: 111–122
We present a new combinatorial model for identifying regulatory modules in gene co-expression data using a decomposition into weighted cliques. To capture complex interaction effects, we generalize the previously-studied weighted edge clique partition problem. As a first step, we restrict ourselves to the noise-free setting, and show that the problem is fixed parameter tractable when parameterized by the number of modules (cliques). We present two new algorithms for finding these decompositions, using linear programming and integer partitioning to determine the clique weights. Further, we implement these algorithms in Python and test them on a biologically-inspired synthetic corpus generated using real-world data from transcription factors and a latent variable analysis of co-expression in varying cell types.
Quinzan, Francesco; Doskoč, Vanja; Göbel, Andreas; Friedrich, Tobias Adaptive Sampling for Fast Constrained Maximization of Submodular Functions Artificial Intelligence and Statistics (AISTATS) 2021: 964–972
Several large-scale machine learning tasks, such as data summarization, can be approached by maximizing functions that satisfy submodularity. These optimization problems often involve complex side constraints, imposed by the underlying application. In this paper, we develop an algorithm with poly-logarithmic adaptivity for non-monotone submodular maximization under general side constraints. The adaptive complexity of a problem is the minimal number of sequential rounds required to achieve the objective. Our algorithm is suitable to maximize a non-monotone submodular function under a \(p\)-system side constraint, and it achieves a \((p + O(\sqrt{p}))\)-approximation for this problem, after only poly-logarithmic adaptive rounds and polynomial queries to the valuation oracle function. Furthermore, our algorithm achieves a \((p + O(1))\)-approximation when the given side constraint is a \(p\)-extendible system. This algorithm yields an exponential speed-up, with respect to the adaptivity, over any other known constant-factor approximation algorithm for this problem. It also competes with previously known results in terms of the query complexity. We perform experiments on various real-world applications and find that, in comparison with commonly used heuristics, our algorithm performs better on these instances.
Borndörfer, Ralf; Casel, Katrin; Issac, Davis; Niklanovits, Aikaterini; Schwartz, Stephan; Zeif, Ziena Connected k-Partition of k-Connected Graphs and c-Claw-Free Graphs Approximation Algorithms for Combinatorial Optimization Problems (APPROX) 2021: 27:1–27:14
A connected partition is a partition of the vertices of a graph into sets that induce connected subgraphs. Such partitions naturally occur in many application areas such as road networks and image processing. In these settings, it is often desirable to partition into a fixed number of parts of roughly the same size or weight. The resulting computational problem is called Balanced Connected Partition (BCP). The two classical objectives for BCP are to maximize the weight of the smallest, or minimize the weight of the largest component. We study BCP on \(c\)-claw-free graphs, the class of graphs that do not have \(K_{1,c}\) as an induced subgraph, and present efficient \((c-1)\)-approximation algorithms for both objectives. In particular, for \(3\)-claw-free graphs, also simply known as claw-free graphs, we obtain a \(2\)-approximation. Due to the claw-freeness of line graphs, this also implies a \(2\)-approximation for the edge-partition version of BCP in general graphs. A harder connected partition problem arises from demanding a connected partition into \(k\) parts that have (possibly) heterogeneous target weights \(w_1, \dots, w_k\). In the 1970s, Győri and Lovász showed that if \(G\) is \(k\)-connected and the target weights sum to the total size of \(G\), such a partition exists. However, to this day no polynomial algorithm to compute such partitions is known for \(k > 4\). Towards finding such a partition \(T_1, \dots, T_k\) in \(k\)-connected graphs for general \(k\), we show how to efficiently compute connected partitions that at least approximately meet the target weights, subject to the mild assumption that each \(w_i\) is greater than the weight of the heaviest vertex. In particular, we give a \(3\)-approximation for both the lower and the upper bounded version, i.e., we guarantee that each \(T_i\) has weight at least \(\frac{w_i}{3}\) or that each \(T_i\) has weight at most \(3 w_i\), respectively. Also, we present a both-side bounded version that produces a connected partition where each \(T_i\) has weight at least \(\frac{w_i}{3}\) and at most \(\max\{r, 3\} \cdot w_i\), where \(r \geq 1\) is the ratio between the largest and smallest value in \(w_1, \dots, w_k\). In particular, for the balanced version, i.e., \(w_1 = w_2 = \dots = w_k\), this gives a partition with \(\frac{w_i}{3} \leq w(T_i) \leq 3 w_i\).
Tabatabaei, Azadeh; Soheil, Farehe; Aletaha, Mohammad; Ghodsi, Mohammad Integer Cow-path Problem and Simple Robot Street Search Canadian Conference on Computational Geometry (CCCG) 2021: 388–398
In this paper, we revisit the well-known cow-path problem and introduce a new variation called the Integer Cow-path Problem (ICP). In the general cow-path problem, \(w\) rays with one common end-point and a robot standing on the end-point are given. A target point is placed along one of the rays and can be detected only when it is reached by the robot. The robot has to find the target by traversing the rays starting from the end-point. In the ICP, the robot is restricted to take an integer number of steps. The goal is to design a strategy for the robot to find the target such that the length of the traveled path is as small as possible. We present a randomized strategy that gives an upper bound on the competitive ratio for the ICP. Furthermore, as an application of this variation, we study the Simple Robot Street Search problem and give a randomized strategy that is inspired by the strategy for the ICP.
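For intuition, a sketch (ours) of the classical deterministic doubling strategy on \(w = 2\) rays under the integer restriction of the ICP; the paper's randomized strategy improves on this baseline.

```python
def doubling_search(target_ray, d, base=2.0):
    """Walk to turning points base**0, base**1, ... alternating between the
    two rays; return total distance walked until the target (ray, d) is hit."""
    walked, i = 0.0, 0
    while True:
        reach = max(1, round(base ** i))   # integer turning point (ICP)
        if i % 2 == target_ray and reach >= d:
            return walked + d              # final leg out to the target
        walked += 2 * reach                # out and back to the origin
        i += 1

# Empirical worst case over integer target distances, per ray:
worst = max(doubling_search(r, d) / d for r in (0, 1) for d in range(1, 10**4))
print(worst)  # close to the classical competitive ratio of 9 for two rays
```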
Berger, Julian; Böther, Maximilian; Doskoč, Vanja; Gadea Harder, Jonathan; Klodt, Nicolas; Kötzing, Timo; Lötzsch, Winfried; Peters, Jannik; Schiller, Leon; Seifert, Lars; Wells, Armin; Wietheger, Simon Learning Languages with Decidable Hypotheses Computability in Europe (CiE) 2021: 25–37
In \emph{language learning in the limit}, the most common type of hypothesis is to give an enumerator for a language, a \(W\)-index. These hypotheses have the drawback that even the membership problem is undecidable. In this paper, we use a different system which allows for naming arbitrary decidable languages, namely \emph{programs for characteristic functions} (called \(C\)-indices). These indices have the drawback that it is now not decidable whether a given hypothesis is even a legal \(C\)-index. In this first analysis of learning with \(C\)-indices, we give a structured account of the learning power of various restrictions employing \(C\)-indices, also when compared with \(W\)-indices. We establish a hierarchy of learning power depending on whether \(C\)-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned or (c) only in the limit as final, correct hypotheses. We analyze all these questions also in relation to the mode of data presentation. Finally, we also ask about the relation of semantic versus syntactic convergence and derive the map of pairwise relations for these two kinds of convergence coupled with various forms of data presentation.
Doskoč, Vanja; Kötzing, Timo Mapping Monotonic Restrictions in Inductive Inference Computability in Europe (CiE) 2021: 146–157
In \emph{inductive inference} we investigate computable devices (learners) learning formal languages. In this work, we focus on \emph{monotonic} learners which, despite their natural motivation, exhibit peculiar behaviour. A recent study analysed the learning capabilities of \emph{strongly monotone} learners in various settings. The therein unveiled differences between \emph{explanatory} (syntactically converging) and \emph{behaviourally correct} (semantically converging) such learners motivate our studies of \emph{monotone} learners in the same settings. While the structure of the pairwise relations for monotone explanatory learning is similar to the strongly monotone case (and for similar reasons), for behaviourally correct learning a very different picture emerges. In the latter setup, we provide a \emph{self-learning} class of languages showing that monotone learners, as opposed to their strongly monotone counterpart, do heavily rely on the order in which the information is given, an unusual result for behaviourally correct learners.
Doskoč, Vanja; Kötzing, Timo Normal Forms for Semantically Witness-Based Learners in Inductive Inference Computability in Europe (CiE) 2021: 158–168
In \emph{inductive inference}, we study learners (computable devices) inferring formal languages. In particular, we consider \emph{semantically witness-based} learners, that is, learners which are required to \emph{justify} each of their semantic mind changes. This natural requirement deserves special attention as it is a specialization of various important learning paradigms. As such, it has already proven to be fruitful for gaining knowledge about other types of restrictions. In this paper, we provide a thorough analysis of semantically converging, semantically witness-based learners, obtaining \emph{normal forms} for them. Most notably, we show that \emph{set-driven} globally semantically witness-based learners are equally powerful as their \emph{Gold-style} semantically conservative counterpart. Such results are key to understanding the, yet undiscovered, mutual relation between various important learning paradigms of semantically converging learners.
Khazraei, Ardalan; Kötzing, Timo; Seidel, Karen Towards a Map for Incremental Learning in the Limit from Positive and Negative Information Computability in Europe (CiE) 2021: 273–284
In order to model an efficient learning paradigm, iterative learning algorithms access data one by one, updating the current hypothesis without recourse to past data. Prior research investigating the impact of additional requirements on iterative learners left many questions open, especially in learning from informant, where the input is binary labeled.
Kötzing, Timo; Seidel, Karen Learning Languages in the Limit from Positive Information with Finitely Many Memory Changes Computability in Europe (CiE) 2021: 318–329
We investigate learning collections of languages from texts by an inductive inference machine with access to the current datum and a bounded memory in form of states. Such a bounded memory states (\(\mathbf{BMS}\)) learner is considered successful in case it eventually settles on a correct hypothesis while exploiting only finitely many different states.
Cohen, Sarel; Hershcovitch, Moshik; Taraz, Martin; Kißig, Otto; Wood, Andrew; Waddington, Daniel; Chin, Peter; Friedrich, Tobias Drug Repurposing using Link Prediction on Knowledge Graphs with Applications to Non-Volatile Memory Complex Networks and their Applications (ComplexNetworks) 2021: 742–753
The active global SARS-CoV-2 pandemic has caused more than 167 million cases and 3.4 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time intensive process and despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of \emph{drug repurposing}, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on \emph{knowledge graphs}, which condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of candidate drugs, 32 of which are currently in COVID-19-related clinical trials. Moreover, we present an alternative application for the model, the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model by significantly shortening the inference and pre-processing time by exploiting data-parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using new emerging hardware --- we propose a new approach to leverage the use of high-capacity Non-Volatile Memory for aggregate drug ranking.
Bilò, Davide; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin Near-Optimal Deterministic Single-Source Distance Sensitivity Oracles European Symposium on Algorithms (ESA) 2021: 18:1–18:17
Given a graph with a distinguished source vertex \(s\), the Single Source Replacement Paths (SSRP) problem is to compute and output, for any target vertex \(t\) and edge \(e\), the length \(d(s,t,e)\) of a shortest path from \(s\) to \(t\) that avoids a failing edge \(e\). A Single-Source Distance Sensitivity Oracle (Single-Source DSO) is a compact data structure that answers queries of the form \((t,e)\) by returning the distance \(d(s,t,e)\). We show how to compress the output of the SSRP problem on \(n\)-vertex, \(m\)-edge graphs with integer edge weights in the range \([1,M]\) into a deterministic Single-Source DSO that has size \(O(M^{1/2} n^{3/2})\) and query time \(\widetilde{O}(1)\). We prove that the space requirement is optimal (up to the word size). Our techniques can also handle vertex failures within the same bounds. Chechik and Cohen [SODA 2019] presented a combinatorial randomized \(\widetilde{O}(m\sqrt{n}+n^2)\) time SSRP algorithm for undirected and unweighted graphs. We derandomize their algorithm with the same asymptotic running time and apply our compression to obtain a deterministic Single-Source DSO with \(\widetilde{O}(m\sqrt{n}+n^2)\) preprocessing time, \(O(n^{3/2})\) space, and \(\widetilde{O}(1)\) query time. Our combinatorial Single-Source DSO has near-optimal space, preprocessing and query time for dense unweighted graphs, improving the preprocessing time by a \(\sqrt{n}\)-factor compared to previous results. Grandoni and Vassilevska Williams [FOCS 2012, TALG 2020] gave an algebraic randomized \(\widetilde{O}(Mn^\omega)\) time SSRP algorithm for (undirected and directed) graphs with integer edge weights in the range \([1,M]\), where \(\omega < 2.373\) is the matrix multiplication exponent. We derandomize their algorithm for undirected graphs and apply our compression to obtain an algebraic Single-Source DSO with \(\widetilde{O}(Mn^\omega)\) preprocessing time, \(O(M^{1/2} n^{3/2})\) space, and \(\widetilde{O}(1)\) query time. This improves the preprocessing time of algebraic Single-Source DSOs by polynomial factors compared to previous results. We also present further improvements of our Single-Source DSOs. We show that the query time can be reduced to a constant at the cost of increasing the size of the oracle to \(O(M^{1/3} n^{5/3})\) and that all our oracles can be made path-reporting. On sparse graphs with \(m=O(\frac{n^{5/4-\varepsilon}}{M^{7/4}})\) edges, for any constant \(\varepsilon > 0\), we reduce the preprocessing to randomized \(\widetilde{O}(M^{7/8} m^{1/2} n^{11/8}) = O(n^{2-\varepsilon/2})\) time. To the best of our knowledge, this is the first truly subquadratic time algorithm for building Single-Source DSOs on sparse graphs.
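As a baseline for what such an oracle compresses, here is a naive per-query SSRP computation (ours, for intuition): rerun Dijkstra with the failing edge removed, i.e., one shortest-path computation per query instead of the oracle's near-constant query time.

```python
import heapq

def dijkstra(n, adj, s, banned=None):
    """Shortest distances from s; adj[u] = [(v, w), ...]; 'banned' is an
    undirected edge (as a frozenset) that must not be used."""
    dist = [float("inf")] * n
    dist[s] = 0
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if banned and frozenset((u, v)) == banned:
                continue
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

def d_avoiding(n, adj, s, t, e):
    """d(s, t, e): shortest s-t distance avoiding the failing edge e."""
    return dijkstra(n, adj, s, banned=frozenset(e))[t]

# Undirected triangle: failing edge (0, 2) forces the detour via vertex 1.
adj = {0: [(1, 1), (2, 1)], 1: [(0, 1), (2, 1)], 2: [(0, 1), (1, 1)]}
print(d_avoiding(3, adj, 0, 2, (0, 2)))  # 2 instead of 1
```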
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian Efficiently Approximating Vertex Cover on Scale-Free Networks with Underlying Hyperbolic Geometry European Symposium on Algorithms (ESA) 2021: 20:1–20:15
Finding a minimum vertex cover in a network is a fundamental NP-complete graph problem. One way to deal with its computational hardness is to trade the qualitative performance of an algorithm (allowing non-optimal outputs) for an improved running time. For the vertex cover problem, there is a gap between theory and practice when it comes to understanding this tradeoff. On the one hand, it is known that it is NP-hard to approximate a minimum vertex cover within a factor of \(\sqrt{2}\). On the other hand, a simple greedy algorithm yields close to optimal approximations in practice. A promising approach towards understanding this discrepancy is to recognize the differences between theoretical worst-case instances and real-world networks. Following this direction, we close the gap between theory and practice by providing an algorithm that efficiently computes nearly optimal vertex cover approximations on hyperbolic random graphs; a network model that closely resembles real-world networks in terms of degree distribution, clustering, and the small-world property. More precisely, our algorithm computes a \((1 + o(1))\)-approximation, asymptotically almost surely, and has a running time of \(\mathcal{O}(m \log(n))\). The proposed algorithm is an adaptation of the successful greedy approach, enhanced with a procedure that improves on parts of the graph where greedy is not optimal. This makes it possible to introduce a parameter that can be used to tune the tradeoff between approximation performance and running time. Our empirical evaluation on real-world networks shows that this allows for improving over the near-optimal results of the greedy approach.
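The greedy baseline referred to above, in a minimal sketch (ours); the paper's algorithm augments this idea with a local improvement step on parts of the graph where greedy is not optimal.

```python
def greedy_vertex_cover(adj):
    """Repeatedly take a vertex of maximum remaining degree."""
    adj = {u: set(vs) for u, vs in adj.items()}  # local copy
    cover = set()
    while any(adj.values()):                     # while edges remain
        u = max(adj, key=lambda v: len(adj[v]))  # highest remaining degree
        cover.add(u)
        for v in adj.pop(u):                     # remove u and its edges
            adj[v].discard(u)
    return cover

star_plus_edge = {0: {1, 2, 3}, 1: {0}, 2: {0, 3}, 3: {0, 2}}
print(greedy_vertex_cover(star_plus_edge))  # e.g. {0, 2}
```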
Bläsius, Thomas; Friedrich, Tobias; Weyand, Christopher Efficiently Computing Maximum Flows in Scale-Free Networks European Symposium on Algorithms (ESA) 2021: 21:1–21:14
We study the maximum-flow/minimum-cut problem on scale-free networks, i.e., graphs whose degree distribution follows a power-law. We propose a simple algorithm that capitalizes on the fact that often only a small fraction of such a network is relevant for the flow. At its core, our algorithm augments Dinitz's algorithm with a balanced bidirectional search. Our experiments on a scale-free random network model indicate sublinear run time. On scale-free real-world networks, we outperform the commonly used highest-label Push-Relabel implementation by up to two orders of magnitude. Compared to Dinitz's original algorithm, our modifications reduce the search space, e.g., by a factor of 275 on an autonomous systems graph. Beyond these good run times, our algorithm has an additional advantage compared to Push-Relabel. The latter computes a preflow, which makes the extraction of a minimum cut potentially more difficult. This is relevant, for example, for the computation of Gomory-Hu trees. On a social network with 70000 nodes, our algorithm computes the Gomory-Hu tree in 3 seconds compared to 12 minutes when using Push-Relabel.
Casel, Katrin; Friedrich, Tobias; Issac, Davis; Niklanovits, Aikaterini; Zeif, Ziena Balanced Crown Decomposition for Connectivity Constraints European Symposium on Algorithms (ESA) 2021: 26:1–26:15
We introduce the balanced crown decomposition that captures the structure imposed on graphs by their connected induced subgraphs of a given size. Such subgraphs are a popular modeling tool in various application areas, where the non-local nature of the connectivity condition usually results in very challenging algorithmic tasks. The balanced crown decomposition is a combination of a crown decomposition and a balanced partition which makes it applicable to graph editing as well as graph packing and partitioning problems. We illustrate this by deriving improved approximation algorithms and kernelization for a variety of such problems. In particular, through this structure, we obtain the first constant-factor approximation for the Balanced Connected Partition (BCP) problem, where the task is to partition a vertex-weighted graph into \(k\) connected components of approximately equal weight. We derive a \(3\)-approximation for the two most commonly used objectives of maximizing the weight of the lightest component or minimizing the weight of the heaviest component.
Behrendt, Lukas; Casel, Katrin; Friedrich, Tobias; Lagodzinski, J. A. Gregor; Löser, Alexander; Wilhelm, Marcus From Symmetry to Asymmetry: Generalizing TSP Approximations by Parametrization Fundamentals of Computation Theory (FCT) 2021: 53–66
We generalize the tree doubling and Christofides algorithm to parameterized approximations for ATSP. The parameters we consider for the respective generalizations are upper bounded by the number of asymmetric distances, which yields algorithms to efficiently compute good approximations also for moderately asymmetric TSP instances. As a generalization of the Christofides algorithm, we derive a parameterized 2.5-approximation, where the parameter is the size of a vertex cover for the subgraph induced by the asymmetric distances. Our generalization of the tree doubling algorithm gives a parameterized 3-approximation, where the parameter is the minimum number of asymmetric distances in a minimum spanning arborescence. Further, we combine these with a notion of symmetry relaxation which allows trading approximation guarantee for runtime. Since the two parameters we consider are theoretically incomparable, we present experimental results showing that generalized tree doubling frequently outperforms generalized Christofides with respect to parameter size.
Berger, Julian; Bleidt, Tibor; Büßemeyer, Martin; Ding, Marcus; Feldmann, Moritz; Feuerpfeil, Moritz; Jacoby, Janusch; Schröter, Valentin; Sievers, Bjarne; Spranger, Moritz; Stadlinger, Simon; Wullenweber, Paul; Cohen, Sarel; Doskoč, Vanja; Friedrich, Tobias Fine-Grained Localization, Classification and Segmentation of Lungs with Various Diseases CVPR Workshop on Fine-Grained Visual Categorization (FGVC@CVPR) 2021
The fine-grained localization and classification of various lung abnormalities is a challenging yet important task for combating diseases and, also, pandemics. In this paper, we present one way to detect and classify abnormalities within chest X-ray scans. In particular, we investigate the use of binary image classification (to distinguish between healthy and infected chests) and the weighted box fusion (which constructs a detection box using the proposed boxes within range). We observe that both methods increase the performance of a base model significantly. Furthermore, we improve the state of the art in lung segmentation, even in the presence of abnormalities. We do so using transfer learning to fine-tune a UNet model on the Montgomery and Shenzhen datasets. In our experiments, we compare standard augmentations (like crop, pad, rotate, warp, zoom, brightness, and contrast variations) to more complex ones (for example, block masking and diffused noise augmentations). This way, we obtain a state-of-the-art model with a Dice score of 97.9%. In particular, we show that simple augmentations outperform complex ones in our setting.
Böther, Maximilian; Schiller, Leon; Fischbeck, Philipp; Molitor, Louise; Krejca, Martin S.; Friedrich, Tobias Evolutionary Minimization of Traffic Congestion Genetic and Evolutionary Computation Conference (GECCO) 2021: 937–945
Best-Paper Award (RWA Track)
Traffic congestion is a major issue that can be solved by suggesting drivers alternative routes they are willing to take. This concept has been formalized as a strategic routing problem in which a single alternative route is suggested to an existing one. We extend this formalization and introduce the Multiple-Routes problem, which is given a start and a destination and then aims at finding up to \(n\) different routes that the drivers strategically disperse over, minimizing the overall travel time of the system. Due to the NP-hard nature of the problem, we introduce the Multiple-Routes evolutionary algorithm (MREA) as a heuristic solver. We study several mutation and crossover operators and evaluate them on real-world data of the city of Berlin, Germany. We find that a combination of all operators yields the best result, improving the overall travel time by a factor between 1.8 and 3, in the median, compared to all drivers taking the fastest route. For the base case \(n=2\), we compare our MREA to the highly tailored optimal solver by Bläsius et al. [ATMOS 2020] and show that, in the median, our approach finds solutions of quality at least \(99.69\%\) of an optimal solution while only requiring \(40\%\) of the time.
Doerr, Benjamin; Kötzing, Timo Lower Bounds from Fitness Levels Made Easy Genetic and Evolutionary Computation Conference (GECCO) 2021: 1142–1150
One of the first and easy to use techniques for proving run time bounds for evolutionary algorithms is the so-called method of fitness levels by Wegener. It uses a partition of the search space into a sequence of levels which are traversed by the algorithm in increasing order, possibly skipping levels. An easy, but often strong upper bound for the run time can then be derived by adding the reciprocals of the probabilities to leave the levels (or upper bounds for these). Unfortunately, a similarly effective method for proving lower bounds has not yet been established. The strongest such method, proposed by Sudholt (2013), requires a careful choice of the viscosity parameters \(\gamma\). In this paper we present two new variants of the method, one for upper and one for lower bounds. Besides the level leaving probabilities, they only rely on the probabilities that levels are visited at all. We show that these can be computed or estimated without greater difficulties and apply our method to reprove the following known results in an easy and natural way. (i) The precise run time of the (1+1) EA on LeadingOnes. (ii) A lower bound for the run time of the (1+1) EA on OneMax, tight apart from an \(O(n)\) term. (iii) A lower bound for the run time of the (1+1) EA on long \(k\)-paths.
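A worked instance (ours) of the classical fitness-level upper bound for the (1+1) EA on OneMax: with levels given by fitness values, the probability to leave level \(i\) is at least \((n-i)/(en)\) (flip one specific zero bit and keep the rest), and summing the reciprocals gives the familiar \(e n H_n\) bound.

```python
import math

def fitness_level_upper_bound(n):
    """Sum of 1/p_i over the levels i = 0, ..., n-1, with p_i >= (n-i)/(e*n)."""
    return sum(math.e * n / (n - i) for i in range(n))

n = 100
print(fitness_level_upper_bound(n))    # e * n * H_n, roughly 1410 for n = 100
print(math.e * n * (math.log(n) + 1))  # crude closed form, for comparison
```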
Wood, Andrew; Hershcovitch, Moshik; Waddington, Daniel; Cohen, Sarel; Chin, Peter Non-Volatile Memory Accelerated Posterior Estimation High Performance and Embedded Computing (HPEC) 2021
Bayesian inference allows machine learning models to express uncertainty. Current machine learning models use only a single learnable parameter combination when making predictions, and as a result are highly overconfident when their predictions are wrong. To use more learnable parameter combinations efficiently, these samples must be drawn from the posterior distribution. Unfortunately, computing the posterior directly is infeasible, so researchers often approximate it with a well known distribution such as a Gaussian. In this paper, we show that through the use of high-capacity persistent storage, models whose posterior distribution was too big to approximate are now feasible, leading to improved predictions in downstream tasks.
Wood, Andrew; Hershcovitch, Moshik; Waddington, Daniel; Cohen, Sarel; Wolf, Meredith; Suh, Hongjun; Zong, Weiyu; Chin, Peter Non-Volatile Memory Accelerated Geometric Multi-Scale Resolution Analysis High Performance and Embedded Computing (HPEC) 2021: 1–7
Dimensionality reduction algorithms are standard tools in a researcher's toolbox. They are frequently used to augment downstream tasks such as machine learning and data science, and also serve as exploratory methods for understanding complex phenomena. For instance, dimensionality reduction is commonly used in biology as well as neuroscience to understand data collected from biological subjects. However, dimensionality reduction techniques are limited by the von Neumann architectures that they execute on. Specifically, data intensive algorithms such as dimensionality reduction techniques often require fast, high capacity, persistent memory, which hardware has historically been unable to provide at the same time. In this paper, we present a re-implementation of an existing dimensionality reduction technique called Geometric Multi-Scale Resolution Analysis (GMRA) which has been accelerated via a novel persistent memory technology called Memory Centric Active Storage (MCAS). Our implementation uses a specialized version of MCAS called PyMM that provides native support for Python datatypes including NumPy arrays and PyTorch tensors. We compare our PyMM implementation against a DRAM implementation, and show that when data fits in DRAM, PyMM offers competitive runtimes. When data does not fit in DRAM, our PyMM implementation is still able to process the data.
Friedrich, Tobias; Göbel, Andreas; Krejca, Martin S.; Pappik, Marcus A spectral independence view on hard spheres via block dynamics International Colloquium on Automata, Languages and Programming (ICALP) 2021: 66:1–66:15
The hard-sphere model is one of the most extensively studied models in statistical physics. It describes the continuous distribution of spherical particles, governed by hard-core interactions. An important quantity of this model is the normalizing factor of this distribution, called the partition function. We propose a Markov chain Monte Carlo algorithm for approximating the grand-canonical partition function of the hard-sphere model in \(d\) dimensions. Up to a fugacity of \( \lambda < \text{e}/2^d\), the runtime of our algorithm is polynomial in the volume of the system. This covers the entire known real-valued regime for the uniqueness of the Gibbs measure. Key to our approach is to define a discretization that closely approximates the partition function of the continuous model. This results in a discrete hard-core instance that is exponential in the size of the initial hard-sphere model. Our approximation bound follows directly from the correlation decay threshold of an infinite regular tree with degree equal to the maximum degree of our discretization. To cope with the exponential blow-up of the discrete instance we use clique dynamics, a Markov chain that was recently introduced in the setting of abstract polymer models. We prove rapid mixing of clique dynamics up to the tree threshold of the univariate hard-core model. This is achieved by relating clique dynamics to block dynamics and adapting the spectral expansion method, which was recently used to bound the mixing time of Glauber dynamics within the same parameter regime.
Lagodzinski, J. A. Gregor; Göbel, Andreas; Casel, Katrin; Friedrich, Tobias On Counting (Quantum-)Graph Homomorphisms in Finite Fields of Prime Order International Colloquium on Automata, Languages and Programming (ICALP) 2021: 91:1–91:15
We study the problem of counting the number of homomorphisms from an input graph \(G\) to a fixed (quantum) graph \(\bar{H}\) in any finite field of prime order \(\mathbb{Z}_p\). The subproblem with graph \(H\) was introduced by Faben and Jerrum [ToC'15] and its complexity is still uncharacterised despite active research, e.g., the very recent work of Focke, Goldberg, Roth, and Živný [SODA'21]. Our contribution is threefold. First, we introduce the study of quantum graphs to the study of modular counting homomorphisms. We show that the complexity for a quantum graph \(\bar{H}\) collapses to the complexity criteria found at dimension 1: graphs. Second, in order to prove cases of intractability we establish a further reduction to the study of bipartite graphs. Lastly, we establish a dichotomy for all bipartite \((K_{3,3} \setminus e, \textit{domino})\)-free graphs by a thorough structural study incorporating both local and global arguments. This result subsumes all results on bipartite graphs known for all prime moduli and extends them significantly. Even for the subproblem with \(p=2\) this establishes new results.
Casel, Katrin; Schmid, Markus L. Fine-Grained Complexity of Regular Path Queries International Conference on Database Theory (ICDT) 2021: 19:1–19:20
A regular path query (RPQ) is a regular expression \(q\) that returns all node pairs \((u, v)\) from a graph database that are connected by an arbitrary path labelled with a word from \(L(q)\). The obvious algorithmic approach to RPQ-evaluation (called PG-approach), i.e., constructing the product graph between an NFA for \(q\) and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.
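A minimal sketch (ours; the encoding and names are assumptions) of the PG-approach described above: pair NFA states with database nodes and run a BFS over the product.

```python
from collections import deque

def evaluate_rpq(db_edges, nfa_delta, start, finals, nodes):
    """db_edges: set of (u, label, v); nfa_delta: set of (q, label, q').
    Returns all (u, v) connected by a path labelled with a word of L(q)."""
    out = {}
    for u, a, v in db_edges:
        out.setdefault(u, []).append((a, v))
    answers = set()
    for u in nodes:                              # one BFS per source node
        seen, queue = {(start, u)}, deque([(start, u)])
        while queue:
            q, x = queue.popleft()
            if q in finals:
                answers.add((u, x))
            for a, y in out.get(x, []):          # step in the product graph
                for p, b, r in nfa_delta:
                    if p == q and b == a and (r, y) not in seen:
                        seen.add((r, y))
                        queue.append((r, y))
    return answers

# Query a b* over a small database; NFA: 0 --a--> 1, 1 --b--> 1, final = 1.
db = {(1, "a", 2), (2, "b", 3), (3, "b", 3)}
delta = {(0, "a", 1), (1, "b", 1)}
print(sorted(evaluate_rpq(db, delta, 0, {1}, {1, 2, 3})))  # [(1, 2), (1, 3)]
```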
Krogmann, Simon; Lenzner, Pascal; Molitor, Louise; Skopalik, Alexander Two-Stage Facility Location Games with Strategic Clients and Facilities International Joint Conference on Artificial Intelligence (IJCAI) 2021: 292–298
We consider non-cooperative facility location games where both facilities and clients act strategically and heavily influence each other. This contrasts established game-theoretic facility location models with non-strategic clients that simply select the closest opened facility. In our model, every facility location has a set of attracted clients and each client has a set of shopping locations and a weight that corresponds to her spending capacity. Facility agents selfishly select a location for opening their facility to maximize the attracted total spending capacity, whereas clients strategically decide how to distribute their spending capacity among the opened facilities in their shopping range. We focus on a natural client behavior similar to classical load balancing: our selfish clients aim for a distribution that minimizes their maximum waiting times for getting serviced, where a facility’s waiting time corresponds to its total attracted client weight. We show that subgame perfect equilibria exist and give almost tight constant bounds on the Price of Anarchy and the Price of Stability, which even hold for a broader class of games with arbitrary client behavior. Since facilities and clients influence each other, it is crucial for the facilities to anticipate the selfish clients’ behavior when selecting their location. For this, we provide an efficient algorithm that also implies an efficient check for equilibrium. Finally, we show that computing a socially optimal facility placement is NP-hard and that this result holds for all feasible client weight distributions.
Bläsius, Thomas; Fischbeck, Philipp; Gottesbüren, Lars; Hamann, Michael; Heuer, Tobias; Spinner, Jonas; Weyand, Christopher; Wilhelm, Marcus PACE Solver Description: The KaPoCE Exact Cluster Editing Algorithm International Symposium on Parameterized and Exact Computation (IPEC) 2021: 27:1–27:3
The cluster editing problem is to transform an input graph into a cluster graph by performing a minimum number of edge editing operations. A cluster graph is a graph where each connected component is a clique. An edit operation can be either adding a new edge or removing an existing edge. In this write-up we outline the core techniques used in the exact cluster editing algorithm of the KaPoCE framework (which also contains a heuristic solver), submitted to the exact track of the 2021 PACE challenge.
Bläsius, Thomas; Fischbeck, Philipp; Gottesbüren, Lars; Hamann, Michael; Heuer, Tobias; Spinner, Jonas; Weyand, Christopher; Wilhelm, Marcus PACE Solver Description: KaPoCE: A Heuristic Cluster Editing Algorithm International Symposium on Parameterized and Exact Computation (IPEC) 2021: 31:1–31:4
The cluster editing problem is to transform an input graph into a cluster graph by performing a minimum number of edge editing operations. A cluster graph is a graph where each connected component is a clique. An edit operation can be either adding a new edge or removing an existing edge. In this write-up we outline the core techniques used in the heuristic cluster editing algorithm of the Karlsruhe and Potsdam Cluster Editing (KaPoCE) framework, submitted to the heuristic track of the 2021 PACE challenge.
Bläsius, Thomas; Friedrich, Tobias; Krejca, Martin S.; Molitor, Louise The Impact of Geometry on Monochrome Regions in the Flip Schelling Process International Symposium on Algorithms and Computation (ISAAC) 2021: 29:1–29:17
Schelling’s classical segregation model gives a coherent explanation for the widespread phenomenon of residential segregation. We consider an agent-based saturated open-city variant, the Flip Schelling Process (FSP), in which agents, placed on a graph, have one out of two types and, based on the predominant type in their neighborhood, decide whether to change their type; similar to a new agent arriving as soon as another agent leaves the vertex. We investigate the probability that an edge \(\{u,v\}\) is monochrome, i.e., that both vertices \(u\) and \(v\) have the same type in the FSP, and we provide a general framework for analyzing the influence of the underlying graph topology on residential segregation. In particular, for two adjacent vertices, we show that a highly decisive common neighborhood, i.e., a common neighborhood where the absolute value of the difference between the number of vertices with different types is high, supports segregation and, moreover, that large common neighborhoods are more decisive. As an application, we study the expected behavior of the FSP on two common random graph models with and without geometry: (1) For random geometric graphs, we show that the existence of an edge \(\{u,v\}\) makes a highly decisive common neighborhood for \(u\) and \(v\) more likely. Based on this, we prove the existence of a constant \(c > 0\) such that the expected fraction of monochrome edges after the FSP is at least \(1/2 + c\). (2) For Erdős–Rényi graphs we show that large common neighborhoods are unlikely and that the expected fraction of monochrome edges after the FSP is at most \(1/2 + o(1)\). Our results indicate that the cluster structure of the underlying graph has a significant impact on the obtained segregation strength.
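A hedged simulation sketch (ours; the paper's process and tie-breaking differ in details, and this synchronous one-round variant is a simplification):

```python
import random

def flip_schelling_round(adj, types):
    """One synchronous round: a vertex flips if its type is in strict
    minority among its neighbors."""
    new = dict(types)
    for v, nbrs in adj.items():
        same = sum(types[v] == types[u] for u in nbrs)
        if nbrs and same < len(nbrs) - same:
            new[v] = 1 - types[v]
    return new

def monochrome_fraction(edges, types):
    return sum(types[u] == types[v] for u, v in edges) / len(edges)

# Cycle of length 10 with uniformly random initial types:
n = 10
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
edges = [(v, (v + 1) % n) for v in range(n)]
types = {v: random.randint(0, 1) for v in range(n)}
print(monochrome_fraction(edges, flip_schelling_round(adj, types)))
```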
Antoniadis, Antonios; Kumar, Gunjan; Kumar, Nikhil Skeletons and Minimum Energy Scheduling International Symposium on Algorithms and Computation (ISAAC) 2021: 51:1–51:16
Consider the problem where \(n\) jobs, each with a release time, a deadline, and a required processing time, are to be feasibly scheduled in a single- or multi-processor setting so as to minimize the total energy consumption of the schedule. A processor has two available states: a sleep state, where no energy is consumed but also no processing can take place, and an active state, which consumes energy at a rate of one and in which jobs can be processed. Transitioning from the active to the sleep state does not incur any further energy cost, but transitioning from the sleep to the active state requires \(q\) energy units. Jobs may be preempted and (in the multi-processor case) migrated. The single-processor case of the problem is known to be solvable in polynomial time via an involved dynamic program, whereas the only known approximation algorithm for the multi-processor case attains an approximation factor of 3 and is based on rounding the solution to a linear programming relaxation of the problem. In this work, we present efficient and combinatorial approximation algorithms for both the single- and the multi-processor setting; previously, only an algorithm based on linear programming was known for the multi-processor case. Our algorithms build upon the concept of a skeleton, a basic (and not necessarily feasible) schedule that captures the fact that some processor(s) must be active at some time point during an interval. Finally, we further demonstrate the power of skeletons by providing a 2-approximation algorithm for the multi-processor case, thus improving upon the recent breakthrough 3-approximation result. This algorithm is based on a novel rounding scheme for a linear-programming relaxation of the problem which incorporates skeletons.
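The cost model admits a compact worked example. The sketch below (a simplification assuming a single processor whose busy intervals are already fixed) computes the energy of a schedule: active time plus, for each idle gap, the cheaper of staying active for the gap or sleeping and paying the wake-up cost \(q\).

def schedule_energy(busy_intervals, q):
    """busy_intervals: sorted, disjoint (start, end) pairs; q: wake-up cost.
    Assumes the processor starts and ends in the sleep state."""
    energy = q  # first wake-up
    for i, (start, end) in enumerate(busy_intervals):
        energy += end - start  # active state consumes energy at rate one
        if i + 1 < len(busy_intervals):
            gap = busy_intervals[i + 1][0] - end
            energy += min(gap, q)  # stay active vs. sleep and wake up again
    return energy

# With q = 3: the gap of length 2 is bridged by idling (cost 2), the gap of
# length 5 by sleeping (cost 3), so the total is 3 + 4 + 2 + 2 + 3 + 2 = 16.
print(schedule_energy([(0, 4), (6, 8), (13, 15)], q=3))  # 16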
Casel, Katrin; Friedrich, Tobias; Issac, Davis; Klodt, Nicolas; Seifert, Lars; Zahn, Arthur A Color-blind 3-Approximation for Chromatic Correlation Clustering and Improved Heuristics. Knowledge Discovery and Data Mining (KDD) 2021: 882–891
Chromatic Correlation Clustering (CCC) models clustering of objects with categorical pairwise relationships. The model can be viewed as clustering the vertices of a graph with edge-labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well studied problem Correlation Clustering (CC), motivated by real-world applications from data-mining, social networks and bioinformatics. We give theoretical as well as practical contributions to the study of CCC. Our main theoretical contribution is an alternative analysis of the famous Pivot algorithm for CC. We show that, when simply run color-blind, Pivot is also a linear time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear time 11-approximation, both by Anava et al. [WWW 2015]. While this theoretical result justifies Pivot as a baseline comparison for other heuristics, its blunt color-blindness performs poorly in practice. We develop a color-sensitive, practical heuristic we call Greedy Expansion that empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances. Further, we propose a novel generalization of CCC allowing for multi-labelled edges. We argue that it is more suitable for many of the real-world applications and extend our results to this model.
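The color-blind Pivot baseline is short enough to state in full. The sketch below is written for clarity rather than the linear running time achieved in the paper: repeatedly pick a random remaining vertex as pivot and cluster it with all of its remaining neighbors, ignoring edge colors entirely; the 3-approximation guarantee is in expectation over the pivot choices.

import random

def pivot_colorblind(vertices, adj, rng=random):
    """adj: dict vertex -> set of neighbors; edge colors are ignored."""
    remaining = set(vertices)
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        cluster = {pivot} | (adj[pivot] & remaining)
        clusters.append(cluster)
        remaining -= cluster
    return clusters

# Two triangles joined by the edge {2, 3}.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(pivot_colorblind(adj.keys(), adj, random.Random(1)))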
Bilò, Davide; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin Space-Efficient Fault-Tolerant Diameter Oracles. Mathematical Foundations of Computer Science (MFCS) 2021: 18:1–18:16
We design \(f\)-edge fault-tolerant diameter oracles (\(f\)-FDO, or simply FDO if \(f=1\)). For a given directed or undirected and possibly edge-weighted graph \(G\) with \(n\) vertices and \(m\) edges and a positive integer \(f\), we preprocess the graph and construct a data structure that, when queried with a set \(F\) of edges, where \(|F| \leq f\), returns the diameter of \(G - F\). An \(f\)-FDO has stretch \(\sigma \geq 1\) if the returned value \(\widehat{D}\) satisfies \(\operatorname{diam}(G - F) \leq \widehat{D} \leq \sigma \operatorname{diam}(G - F)\). For the case of a single edge failure (\(f=1\)) in an unweighted directed graph, there exists an approximate FDO by Henzinger et al. [ITCS 2017] with stretch \((1+\varepsilon)\), constant query time, space \(O(m)\), and a combinatorial preprocessing time of \(\widetilde{O}(mn + n^{1.5} \sqrt{Dm/\varepsilon})\), where \(D\) is the diameter. We present a near-optimal FDO with the same stretch, query time, and space. It has a preprocessing time of \(\widetilde{O}(mn + n^2/\varepsilon)\), which is better for any constant \(\varepsilon > 0\). The preprocessing time nearly matches a conditional lower bound for combinatorial algorithms, also by Henzinger et al. When using fast matrix multiplication instead, we achieve a preprocessing time of \(\widetilde{O}(n^{2.5794} + n^2/\varepsilon)\). We further prove an information-theoretic lower bound showing that any FDO with stretch better than \(3/2\) requires \(\Omega(m)\) bits of space. Thus, for constant \(0 < \varepsilon < 3/2\), our combinatorial \((1+\varepsilon)\)-approximate FDO is near-optimal in all the parameters. In the case of multiple edge failures (\(f>1\)) in undirected graphs with non-negative edge weights, we give an \(f\)-FDO with stretch \((f+2)\), query time \(O(f^2\log^2{n})\), \(\widetilde{O}(fn)\) space, and preprocessing time \(\widetilde{O}(fm)\). We complement this with a lower bound excluding any finite stretch in \(o(fn)\) space. Many real-world networks have polylogarithmic diameter. We show that for those graphs and up to \(f = o(\log n/ \log\log n)\) failures one can swap approximation for query time and space. We present an exact combinatorial \(f\)-FDO with preprocessing time \(mn^{1+o(1)}\), query time \(n^{o(1)}\), and space \(n^{2+o(1)}\). With fast matrix multiplication, the preprocessing time can be improved to \(n^{\omega+o(1)}\), where \(\omega < 2.373\) is the matrix multiplication exponent.
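For contrast with the data structures above, a naive exact baseline makes the query interface concrete: it answers a query \(F\) by rebuilding \(G - F\) and recomputing the diameter with one BFS per vertex (unweighted, undirected case), which is precisely the per-query cost an FDO is designed to avoid.

from collections import deque

def diameter_after_failures(n, edges, failed):
    """Unweighted undirected graph on vertices 0..n-1; edges, failed: sets of frozensets."""
    adj = [[] for _ in range(n)]
    for e in edges - failed:
        u, v = tuple(e)
        adj[u].append(v)
        adj[v].append(u)
    diameter = 0
    for s in range(n):
        dist = [-1] * n
        dist[s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            return float("inf")  # G - F is disconnected
        diameter = max(diameter, max(dist))
    return diameter

# Deleting one edge of a 4-cycle leaves a path of diameter 3.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(diameter_after_failures(4, edges, {frozenset((0, 1))}))  # 3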
Kißig, Otto; Taraz, Martin; Cohen, Sarel; Doskoč, Vanja; Friedrich, Tobias Drug Repurposing for Multiple COVID Strains using Collaborative Filtering. ICLR Workshop on Machine Learning for Preventing and Combating Pandemics (MLPCP@ICLR) 2021
The ongoing COVID-19 pandemic demands the swift discovery of suitable treatments. The development of completely new compounds for such a novel disease is a challenging, time-intensive process. This amplifies the relevance of drug repurposing, a technique where existing drugs are used to treat other diseases. A common bioinformatics approach to this is based on knowledge graphs, which compile relationships between drugs, diseases, genes and other biomedical entities. Then, graph neural networks (GNNs) are used for the drug repurposing task, as they provide good link prediction performance on such knowledge graphs. Building on state-of-the-art GNN research, Doshi & Chepuri (2020) construct the remarkable model DR-COVID. We re-implement their model and extend the approach to perform significantly better. We propose and evaluate several strategies for the aggregation of link predictions into drug recommendation rankings. By clustering similar target diseases we improve the model by a substantial margin, compiling a top-100 ranking of candidates, 32 of which are currently in COVID-19-related clinical trials. Regarding the re-implementation, we offer more flexibility in the selection of the graph neighborhood sizes fed into the model and reduce the training time significantly by making use of data parallelism.
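One aggregation strategy of the kind evaluated here can be sketched as follows; the clustering of target diseases and the mean aggregation are illustrative choices, not necessarily the paper's best-performing variant.

import numpy as np

def rank_drugs(link_scores, disease_cluster, top_k=5):
    """link_scores: (n_drugs, n_diseases) array of GNN link-prediction scores;
    disease_cluster: indices of a cluster of similar target diseases."""
    mean_scores = link_scores[:, disease_cluster].mean(axis=1)
    return np.argsort(-mean_scores)[:top_k]  # drug indices, best first

rng = np.random.default_rng(42)
link_scores = rng.random((20, 8))  # toy scores instead of real model output
print(rank_drugs(link_scores, disease_cluster=[0, 2, 3]))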
Atzberger, Daniel; Cech, Tim; de la Haye, Merlin; Söchting, Maximilian; Scheibel, Willy; Limberger, Daniel; Döllner, Jürgen Software Forest: A Visualization of Semantic Similarities in Source Code using a Tree Metaphor. Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications -- Volume 3: IVAPP 2021: 112–122
Software visualization techniques provide effective means for program comprehension tasks, as they allow developers to interactively explore large code bases. A frequently encountered task during software development is the detection of source code files with similar semantics. To assist in this task we present Software Forest, a novel 2.5D software visualization that enables interactive exploration of semantic similarities within a software system, illustrated as a forest. The underlying layout results from the analysis of the vocabulary of the software documents using Latent Dirichlet Allocation and Multidimensional Scaling and therefore reflects the semantic similarity between source code files. By mapping properties of a software entity, e.g., size metrics or trend data, to visual variables encoded by various figurative tree meshes, aspects of a software system can be displayed. We complement this concept with implementation details as well as a discussion of applications.
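The layout pipeline can be sketched with scikit-learn; this is an illustration under assumed parameters, not the paper's implementation: LDA turns each source file's vocabulary into a topic vector, and metric MDS places the files in 2D so that distances reflect topic dissimilarity.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

documents = [  # toy stand-ins for tokenized source code files
    "parser token grammar syntax tree",
    "token lexer grammar parse input",
    "socket network packet protocol send",
    "network connection packet receive socket",
]
counts = CountVectorizer().fit_transform(documents)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
layout = mds.fit_transform(pairwise_distances(topics))
print(layout)  # one 2D position per source file, the spot for its tree mesh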
Friedrich, Tobias; Neumann, Frank; Rothenberger, Ralf; Sutton, Andrew M. Solving Non-Uniform Planted and Filtered Random SAT Formulas Greedily. Theory and Applications of Satisfiability Testing (SAT) 2021: 188–206
Recently, there has been an interest in studying non-uniform random \(k\)-satisfiability (\(k\)-SAT) models in order to address the non-uniformity of formulas arising from real-world applications. While uniform random \(k\)-SAT has been extensively studied from both a theoretical and experimental perspective, understanding the algorithmic complexity of heterogeneous distributions is still an open challenge. When a sufficiently dense formula is guaranteed to be satisfiable by conditioning or a planted assignment, it is well-known that uniform random \(k\)-SAT is easy on average. We generalize this result to the broad class of non-uniform random \(k\)-SAT models that are characterized only by an ensemble of distributions over variables with a mild balancing condition. This balancing condition rules out extremely skewed distributions in which nearly half the variables occur less frequently than a small constant fraction of the most frequent variables, but generalizes recently studied non-uniform \(k\)-SAT distributions such as power-law and geometric formulas. We show that, for all formulas generated from this model with at least logarithmic density, a simple greedy algorithm finds a solution with high probability. As a side result we show that the total variation distance between planted and filtered (conditioned on satisfiability) models is \(o(1)\) once the planted model produces formulas with a unique solution with probability \(1-o(1)\). This holds for all random \(k\)-SAT models where the signs of variables are drawn uniformly and independently at random.
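One natural instantiation of a simple greedy algorithm for such formulas is a per-variable majority vote; this is an illustrative assumption, the paper analyzes its own variant. Each variable is assigned the sign under which it occurs most often.

from collections import Counter

def majority_assignment(clauses):
    """clauses: list of lists of non-zero ints (DIMACS-style literals)."""
    occurrences = Counter(lit for clause in clauses for lit in clause)
    variables = {abs(lit) for clause in clauses for lit in clause}
    return {v: occurrences[v] >= occurrences[-v] for v in variables}

def satisfies(clauses, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in clauses)

clauses = [[1, 2, -3], [1, -2, 3], [1, 2, 3], [-1, 2, 3]]
alpha = majority_assignment(clauses)
print(alpha, satisfies(clauses, alpha))  # the all-true assignment satisfies here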
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian Force-Directed Embedding of Scale-Free Networks in the Hyperbolic Plane. Symposium on Experimental Algorithms (SEA) 2021: 22:1–22:18
Force-directed drawing algorithms are the most commonly used approach to visualize networks. While they are usually very robust, the performance of Euclidean spring embedders decreases if the graph exhibits the high level of heterogeneity that typically occurs in scale-free real-world networks. As heterogeneity naturally emerges from hyperbolic geometry (in fact, scale-free networks are often perceived to have an underlying hyperbolic geometry), it is natural to embed them into the hyperbolic plane instead. Previous techniques that produce hyperbolic embeddings usually make assumptions about the given network, which (if not met) impairs the quality of the embedding. It is still an open problem to adapt force-directed embedding algorithms to make use of the heterogeneity of the hyperbolic plane, while also preserving their robustness. We identify fundamental differences between the behavior of spring embedders in Euclidean and hyperbolic space, and adapt the technique to take advantage of the heterogeneity of the hyperbolic plane.
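A basic primitive any hyperbolic spring embedder needs is the distance between two points. In native polar coordinates \((r, \varphi)\) it follows from the hyperbolic law of cosines, as sketched below; the force computation of the embedder itself is omitted.

import math

def hyperbolic_distance(p, q):
    """p, q: points as (radius, angle) in native polar coordinates."""
    (r1, phi1), (r2, phi2) = p, q
    delta = math.pi - abs(math.pi - abs(phi1 - phi2))  # angular difference
    x = (math.cosh(r1) * math.cosh(r2)
         - math.sinh(r1) * math.sinh(r2) * math.cos(delta))
    return math.acosh(max(x, 1.0))  # clamp guards against rounding below 1

# Two points at radius 5 on opposite sides of the origin: distance is close
# to 5 + 5 = 10, vastly more room than in the Euclidean plane.
print(hyperbolic_distance((5.0, 0.0), (5.0, math.pi)))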
Bläsius, Thomas; Friedrich, Tobias; Göbel, Andreas; Levy, Jordi; Rothenberger, Ralf The Impact of Heterogeneity and Geometry on the Proof Complexity of Random Satisfiability. Symposium on Discrete Algorithms (SODA) 2021: 42–53
Satisfiability is considered the canonical NP-complete problem and is used as a starting point for hardness reductions in theory, while in practice heuristic SAT solving algorithms can solve large-scale industrial SAT instances very efficiently. This disparity between theory and practice is believed to be a result of inherent properties of industrial SAT instances that make them tractable. Two characteristic properties seem to be prevalent in the majority of real-world SAT instances, heterogeneous degree distribution and locality. To understand the impact of these two properties on SAT, we study the proof complexity of random \(k\)-SAT models that allow us to control heterogeneity and locality. Our findings show that heterogeneity alone does not make SAT easy, as heterogeneous random \(k\)-SAT instances have superpolynomial resolution size. This implies intractability of these instances for modern SAT solvers. On the other hand, modeling locality with an underlying geometry leads to small unsatisfiable subformulas, which can be found within polynomial time. A key ingredient for the result on geometric random \(k\)-SAT can be found in the complexity of higher-order Voronoi diagrams. As an additional technical contribution, we show an upper bound on the number of non-empty Voronoi regions that holds for points with random positions in a very general setting. In particular, it covers arbitrary \(p\)-norms, higher dimensions, and weights affecting the area of influence of each point multiplicatively. Our bound is linear in the total weight. This is in stark contrast to quadratic lower bounds for the worst case.
Friedemann, Wilhelm; Friedrich, Tobias; Gawendowicz, Hans; Lenzner, Pascal; Melnichenko, Anna; Peters, Jannik; Stephan, Daniel; Vaichenker, Michael Efficiency and Stability in Euclidean Network Design. Symposium on Parallelism in Algorithms and Architectures (SPAA) 2021: 232–242
Network Design problems typically ask for a minimum cost sub-network from a given host network. This classical point of view assumes a central authority enforcing the optimum solution. But how should networks be designed to cope with selfish agents that own parts of the network? In this setting, minimum cost networks may be very unstable in that agents will deviate from a proposed solution if this decreases their individual cost. Hence, designed networks should be both efficient in terms of total cost and stable in terms of the agents' willingness to accept the network. We study this novel type of Network Design problem by investigating the creation of \($(\beta,\gamma)$\)-networks, which are in \($\beta$\)-approximate Nash equilibrium and have a total cost of at most \($\gamma$\) times the optimal cost, for the recently proposed Euclidean Generalized Network Creation Game by Bilò et al. [SPAA 2019]. There, \($n$\) agents corresponding to points in Euclidean space create costly edges among themselves to optimize their centrality in the created network. Our main result is a simple \($\mathcal{O}(n^2)$\)-time algorithm that computes a \($(\beta,\beta)$\)-network with low \($\beta$\) for any given set of points. Moreover, on integer grid point sets or random point sets our algorithm achieves a low constant \($\beta$\). Besides these results for the Euclidean model, we discuss a generalization of our algorithm to instances with arbitrary, even non-metric, edge lengths. In contrast to these algorithmic results, we show that no such positive results are possible when focusing on either optimal networks, i.e., \($(\beta,1)$\)-networks, or perfectly stable networks, i.e., \($(1,\gamma)$\)-networks: in both cases NP-hard problems arise, there exist instances with very unstable optimal networks, and there are instances for perfectly stable networks with high total cost. Along the way, we significantly improve several results from Bilò et al. and we asymptotically resolve their conjecture about the Price of Anarchy by providing a tight bound.
Kißig, Otto; Taraz, Martin; Cohen, Sarel; Friedrich, Tobias Drug Repurposing Using Link Prediction on Knowledge Graphs. ICML Workshop on Computational Biology (WCB@ICML) 2021: 742–753
The ongoing global SARS-CoV-2 pandemic has caused more than 167 million cases and 3.4 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time-intensive process, and despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on knowledge graphs, which condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of candidate drugs, 32 of which are currently in COVID-19-related clinical trials. Moreover, we present an alternative application for the model: the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improve the implementation of the Dr-COVID model, significantly shortening the inference and pre-processing time by exploiting data parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using emerging hardware: we propose a new approach to leverage high-capacity Non-Volatile Memory for aggregate drug ranking.
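The collaborative filtering step can be sketched as follows; the cosine-similarity ranking over a drug-disease score matrix is an illustrative formulation, not the paper's exact pipeline.

import numpy as np

def recommend(scores, seed_rows, top_k=3):
    """scores: (n_drugs, n_diseases) model scores; seed_rows: pre-selected drugs."""
    norms = np.linalg.norm(scores, axis=1, keepdims=True)
    unit = scores / np.maximum(norms, 1e-12)       # cosine-normalize each drug
    similarity = unit @ unit[seed_rows].mean(axis=0)
    similarity[seed_rows] = -np.inf                # do not re-recommend seeds
    return np.argsort(-similarity)[:top_k]         # candidate drugs, best first

rng = np.random.default_rng(0)
scores = rng.random((10, 5))  # toy stand-in for model output
print(recommend(scores, seed_rows=[0, 1]))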