Cohen, Sarel; Hershcovitch, Moshik; Taraz, Martin; Kißig, Otto; Issac, Davis; Wood, Andrew; Waddington, Daniel; Chin, Peter; Friedrich, Tobias Improved And Optimized Drug Repurposing For The SARS-CoV-2 PandemicPlos One 2023
The active global SARS-CoV-2 pandemic caused more than 426 million cases and 5.8 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time intensive process. Despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on knowledge graphs, that condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of 8,070 candidate drugs, 32 of which are currently being tested in COVID-19-related clinical trials. Moreover, we present an alternative application for the model, the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model by significantly shortening the inference and pre-processing time by exploiting data-parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using a new emerging hardware - we propose a new approach to leverage the use of high-capacity Non-Volatile Memory for aggregate drug ranking.
Quinzan, Francesco; Khanna, Rajiv; Hershcovitch, Moshik; Cohen, Sarel; Waddington, Daniel G.; Friedrich, Tobias; Mahoney, Michael W. Fast Feature Selection with Fairness ConstraintsArtificial Intelligence and Statistics (AISTATS) 2023: 7800–7823
We study the fundamental problem of selecting optimal features for model construction. This problem is computationally challenging on large datasets, even with the use of greedy algorithm variants. To address this challenge, we extend the adaptive query model, recently proposed for the greedy forward selection for submodular functions, to the faster paradigm of Orthogonal Matching Pursuit for non-submodular functions. The proposed algorithm achieves exponentially fast parallel run time in the adaptive query model, scaling much better than prior work. Furthermore, our extension allows the use of downward-closed constraints, which can be used to encode certain fairness criteria into the feature selection process. We prove strong approximation guarantees for the algorithm based on standard assumptions. These guarantees are applicable to many parametric models, including Generalized Linear Models. Finally, we demonstrate empirically that the proposed algorithm competes favorably with state-of-the-art techniques for feature selection, on real-world and synthetic datasets.
Bilò, Davide; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin Fault-Tolerant ST-Diameter OraclesInternational Colloquium on Automata, Languages and Programming (ICALP) 2023: 24:1–24:20
We study the problem of estimating the \(ST\)-diameter of a graph that is subject to a bounded number of edge failures. An \(f\)-edge fault-tolerant \(ST\)-diameter oracle (\(f\)-FDO-\(ST\)) is a data structure that preprocesses a given graph \(G\), two sets of vertices \(S,T\), and positive integer \(f\). When queried with a set \(F\) of at most \(f\) edges, the oracle returns an estimate \(\widehat{D}\) of the \(ST\)-diameter \(\mathrm{diam(G-F,S,T)\), the maximum distance between vertices in \(S\) and \(T\) in \(G-F\). The oracle has stretch \(\sigma \geq 1\) if \(\mathrm{diam(G-F,S,T) leq \widehat{D leq sigma \mathrm{diam(G-F,S,T)\). If \(S\) and \(T\) both contain all vertices, the data structure is called an \(f\)-edge fault-tolerant diameter oracle (\(f\)-FDO). An \(f\)-edge fault-tolerant distance sensitivity oracles (\(f\)-DSO) estimates the pairwise graph distances under up to \(f\) failures. We design new \(f\)-FDOs and \(f\)-FDO-\(ST\)s by reducing their construction to that of all-pairs and single-source \(f\)-DSOs. We obtain several new tradeoffs between the size of the data structure, stretch guarantee, query and preprocessing times for diameter oracles by combining our black-box reductions with known results from the literature. We also provide an information-theoretic lower bound on the space requirement of approximate \(f\)-FDOs. We show that there exists a family of graphs for which any \(f\)-FDO with sensitivity \(f \ge 2\) and stretch less than \(5/3\) requires \(\Omega(n^{3/2})\) bits of space, regardless of the query time.
Bilò, Davide; Cohen, Sarel; Friedrich, Tobias; Gawendowicz, Hans; Klodt, Nicolas; Lenzner, Pascal; Skretas, George Temporal Network Creation GamesInternational Joint Conference on Artificial Intelligence (IJCAI) 2023: 2511–2519
Most networks are not static objects, but instead they change over time. This observation has sparked rigorous research on temporal graphs within the last years. In temporal graphs, we have a fixed set of nodes and the connections between them are only available at certain time steps. This gives rise to a plethora of algorithmic problems on such graphs, most prominently the problem of finding temporal spanners, i.e., the computation of subgraphs that guarantee all pairs reachability via temporal paths. To the best of our knowledge, only centralized approaches for the solution of this problem are known. However, many real-world networks are not shaped by a central designer but instead they emerge and evolve by the interaction of many strategic agents. This observation is the driving force of the recent intensive research on game-theoretic network formation models. In this work we bring together these two recent research directions: temporal graphs and game-theoretic network formation. As a first step into this new realm, we focus on a simplified setting where a complete temporal host graph is given and the agents, corresponding to its nodes, selfishly create incident edges to ensure that they can reach all other nodes via temporal paths in the created network. This yields temporal spanners as equilibria of our game. We prove results on the convergence to and the existence of equilibrium networks, on the complexity of finding best agent strategies, and on the quality of the equilibria. By taking these first important steps, we uncover challenging open problems that call for an in-depth exploration of the creation of temporal graphs by strategic agents.
Cohen, Sarel; Fischbeck, Philipp; Friedrich, Tobias; Krejca, Martin S. The Common-Neighbors Metric is Noise-Robust and Reveals Substructures of Real-World NetworksPacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2023: 67–79
Real-world networks typically display a complex structure that is hard to explain by a single model. A common approach is to partition the edges of the network into disjoint simpler structures. An important property in this context is locality—incident vertices usually have many common neighbors. This allows to classify edges into two groups, based on the number of the common neighbors of their incident vertices. Formally, this is captured by the common-neighbors (CN) metric, which forms the basis of many metrics for detecting outlier edges. Such outliers can be interpreted as noise or as a substructure. We aim to understand how useful the metric is, and empirically analyze several scenarios. We randomly insert outlier edges into real-world and generated graphs with high locality, and measure the metric accuracy for partitioning the combined edges. In addition, we use the metric to decompose real-world networks, and measure properties of the partitions. Our results show that the CN metric is a very good classifier that can reliably detect noise up to extreme levels (\(83\%\) noisy edges). We also provide mathematically rigorous analyses on special random-graph models. Last, we find the CN metric consistently decomposes real-world networks into two graphs with very different structures.
Angrick, Sebastian; Bals, Ben; Casel, Katrin; Cohen, Sarel; Friedrich, Tobias; Hastrich, Niko; Hradilak, Theresa; Issac, Davis; Kißig, Otto; Schmidt, Jonas; Wendt, Leo Solving Directed Feedback Vertex Set by Iterative Reduction to Vertex CoverSymposium on Experimental Algorithms (SEA) 2023: 10:1–10:14
In the Directed Feedback Vertex Set (DFVS) problem, one is given a directed graph \(G = (V,E)\) and wants to find a minimum cardinality set \(S \subseteq V\) such that \(G-S\) is acyclic. DFVS is a fundamental problem in computer science and finds applications in areas such as deadlock detection. The problem was the subject of the 2022 PACE coding challenge. We develop a novel exact algorithm for the problem that is tailored to perform well on instances that are mostly bi-directed. For such instances, we adapt techniques from the well-researched vertex cover problem. Our core idea is an iterative reduction to vertex cover. To this end, we also develop a new reduction rule that reduces the number of not bi-directed edges. With the resulting algorithm, we were able to win third place in the exact track of the PACE challenge. We perform computational experiments and compare the running time to other exact algorithms, in particular to the winning algorithm in PACE. Our experiments show that we outpace the other algorithms on instances that have a low density of uni-directed edges.
Bilò, Davide; Chechik, Shiri; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin Approximate Distance Sensitivity Oracles in Subquadratic SpaceSymposium on Theory of Computing (STOC) 2023: 1396–1409
An \(f\)-edge fault-tolerant distance sensitive oracle (\(f\)-DSO) with stretch \(\sigma\ge1\) is a data structure that preprocesses a given undirected, unweighted graph \(G\) with \(n\) vertices and \(m\) edges, and a positive integer \(f\). When queried with a pair of vertices \(s,t\) and a set \(F\) of at most \(f\) edges, it returns a \(\sigma\)-approximation of the \(s\)-\(t\)-distance in \(G-F\). We study \(f\)-DSOs that take subquadratic space. Thorup and Zwick [JACM 2015] showed that this is only possible for \(\sigma\geq3\). We present, for any constant \(f\geq1\) and \(\alpha\in(0,\frac{1}{2})\), and any \(\varepsilon>0\), an \(f\)-DSO with stretch \(3+\varepsilon\) that takes \(\tilde{O}(n^{2-\frac{\alpha}{f+1}}/\varepsilon)\cdot{}O(\log n/\varepsilon)^{f+1}\) space and has an \(O(n^\alpha/\varepsilon^2)\) query time. We also give an improved construction for graphs with diameter at most \(D\). For any constant \(k\), we devise an \(f\)-DSO with stretch \(2k-1\) that takes \(O(D^{f+o(1)}n^{1+1/k})\) space and has \(\tilde{O}(D^o(1)})\) query time, with a preprocessing time of \(O(D^{f+o(1)}mn^{1/k})\). Chechik, Cohen, Fiat, and Kaplan [SODA 2017] presented an \(f\)-DSO with stretch \(1{+}\varepsilon\) and preprocessing time \(\tilde{O}_\varepsilon(n^5)\), albeit with a super-quadratic space requirement. We show how to reduce their preprocessing time to \(O(mn^{2})\cdot{}O(\log n/\varepsilon)^{f}\).
Bilò, Davide; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Krogmann, Simon; Schirneck, Martin Compact Distance Oracles with Large Sensitivity and Low StretchAlgorithms and Data Structures Symposium (WADS) 2023: 149–163
An \(f\)-edge fault-tolerant distance sensitive oracle (\(f\)-DSO) with stretch \(\sigma \geq 1\) is a data-structure that preprocesses an input graph \(G = (V,E)\). When queried with the triple \((s,t,F)\), where \(s, t \in V\) and \(F \subseteq E\) contains at most \(f\) edges of \(G\), the oracle returns an estimate \(\widehat{d_G-F(s,t)\) of the distance \(d_{G-F}(s,t)\) between \(s\) and \(t\) in the graph \(G-F\) such that \(d_{G-F}(s,t) leq \widehat{d_G-F(s,t) leq sigma cdot d_G-F(s,t)\). For any positive integer \(k \ge 2\) and any \(0 < \alpha < 1\), we present an \(f\)-DSO with sensitivity \(f = o(\log n/\log\log n)\), stretch \(2k-1\), space \(O(n^{1+\frac{1}{k+\alpha+o(1)})\), and an \(\widetildeO(n^1+\frac{1}{k - \frac{\alpha}{k(f+1)}})\) query time. Prior to our work, there were only three known \(f\)-DSOs with subquadratic space. The first one by Chechik et al. [Algorithmica 2012] has a stretch of \((8k-2)(f+1)\), depending on \(f\). Another approach is storing an \(f\)-edge fault-tolerant \((2k-1)\)-spanner of \(G\). The bottleneck is the large query time due to the size of any such spanner, which is \(\Omega(n^{1+1/k})\) under the Erdős girth conjecture. Bilò et al. [STOC 2023] gave a solution with stretch \(3+\varepsilon\), query time \(O(n^{\alpha})\) but space \(O(n^{2-\frac{\alpha}{f+1}})\), approaching the quadratic barrier for large sensitivity. In the realm of subquadratic space, our \(f\)-DSOs are the first ones that guarantee, at the same time, large sensitivity, low stretch, and non-trivial query time. To obtain our results, we use the approximate distance oracles of Thorup and Zwick [JACM 2005], and the derandomization of the \(f\)-DSO of Weimann and Yuster [TALG 2013] that was recently given by Karthik and Parter [SODA 2021].