Bilò, Davide; Friedrich, Tobias; Lenzner, Pascal; Lowski, Stefanie; Melnichenko, Anna Selfish Creation of Social NetworksConference on Artificial Intelligence (AAAI) 2021: 5185–5193
Understanding real-world networks has been a core research endeavor throughout the last two decades. Network Creation Games are a promising approach for this from a game-theoretic perspective. In these games, selfish agents corresponding to nodes in a network strategically decide which links to form to optimize their centrality. Many versions have been introduced and analyzed, but none of them fits to modeling the evolution of social networks. In real-world social networks, connections are often established by recommendations from common acquaintances or by a chain of such recommendations. Thus establishing and maintaining a contact with a friend of a friend is easier than connecting to complete strangers. This explains the high clustering, i.e., the abundance of triangles, in real-world social networks. We propose and analyze a network creation model inspired by real-world social networks. Edges are formed in our model via bilateral consent of both endpoints and the cost for establishing and maintaining an edge is proportional to the distance of the endpoints before establishing the connection. We provide results for generic cost functions, which essentially only must be convex functions in the distance of the endpoints without the respective edge. For this broad class of cost functions, we provide many structural properties of equilibrium networks and prove (almost) tight bounds on the diameter, the Price of Anarchy and the Price of Stability. Moreover, as a proof-of-concept we show via experiments that the created equilibrium networks of our model indeed closely mimics real-world social networks. We observe degree distributions that seem to follow a power-law, high clustering, and low diameters. This can be seen as a promising first step towards game-theoretic network creation models that predict networks featuring all core real-world properties.
Berger, Julian; Böther, Maximilian; Doskoč, Vanja; Gadea Harder, Jonathan; Klodt, Nicolas; Kötzing, Timo; Lötzsch, Winfried; Peters, Jannik; Schiller, Leon; Seifert, Lars; Wells, Armin; Wietheger, Simon Learning Languages with Decidable HypothesesComputability in Europe (CiE) 2021: 25–37
In \emph{language learning in the limit, the most common type of hypothesis is to give an enumerator for a language, a \(W\)-index. These hypotheses have the drawback that even the membership problem is undecidable. In this paper, we use a different system which allows for naming arbitrary decidable languages, namely \emph{programs for characteristic functions (called \(C\)-indices). These indices have the drawback that it is now not decidable whether a given hypothesis is even a legal \(C\)-index. In this first analysis of learning with \(C\)-indices, we give a structured account of the learning power of various restrictions employing \(C\)-indices, also when compared with \(W\)-indices. We establish a hierarchy of learning power depending on whether \(C\)-indices are required (a) on all outputs; (b) only on outputs relevant for the class to be learned or (c) only in the limit as final, correct hypotheses. We analyze all these questions also in relation to the mode of data presentation. Finally, we also ask about the relation of semantic versus syntactic convergence and derive the map of pairwise relations for these two kinds of convergence coupled with various forms of data presentation.
Cohen, Sarel; Hershcovitch, Moshik; Taraz, Martin; Kißig, Otto; Wood, Andrew; Waddington, Daniel; Chin, Peter; Friedrich, Tobias Drug Repurposing using Link Prediction on Knowledge Graphs with Applications to Non-Volatile MemoryComplex Networks and their Applications (ComplexNetworks) 2021: 742–753
The active global SARS-CoV-2 pandemic caused more than 167 million cases and 3.4 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time intensive process and despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of \emph{drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on \emph{knowledge graphs, that condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of candidate drugs, 32 of which currently being in COVID-19-related clinical trials. Moreover, we present an alternative application for the model, the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model by significantly shortening the inference and pre-processing time by exploiting data-parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using a new emerging hardware --- we propose a new approach to leverage the use of high-capacity Non-Volatile Memory for aggregate drug ranking.
Berger, Julian; Bleidt, Tibor; Büßemeyer, Martin; Ding, Marcus; Feldmann, Moritz; Feuerpfeil, Moritz; Jacoby, Janusch; Schröter, Valentin; Sievers, Bjarne; Spranger, Moritz; Stadlinger, Simon; Wullenweber, Paul; Cohen, Sarel; Doskoč, Vanja; Friedrich, Tobias Fine-Grained Localization, Classification and Segmentation of Lungs with Various DiseasesCVPR Workshop on Fine-Grained Visual Categorization (FGVC@CVPR) 2021
The fine-grained localization and classification of various lung abnormalities is a challenging yet important task for combating diseases and, also, pandemics. In this paper, we present one way to detect and classify abnormalities within chest X-ray scans. In particular, we investigate the use of binary image classification (to distinguish between healthy and infected chests) and the weighted box fusion (which constructs a detection box using the proposed boxes within range). We observe that both methods increase the performance of a base model significantly. Furthermore, we improve state of the art on lung segmentation, even in the presence of abnormalities. We do so using transfer learning to fine-tune a UNet model on the Montgomery and Shenzhen datasets. In our experiments, we compare standard augmentations (like crop, pad, rotate, warp, zoom, brightness, and contrast variations) to more complex ones (for example, block masking and diffused noise augmentations). This way, we obtain a state-of-the-art model with a dice score of 97.9%. In particular, we show that simple augmentations outperform complex ones in our setting.
Böther, Maximilian; Schiller, Leon; Fischbeck, Philipp; Molitor, Louise; Krejca, Martin S.; Friedrich, Tobias Evolutionary Minimization of Traffic CongestionGenetic and Evolutionary Computation Conference (GECCO) 2021: 937–945
Best-Paper Award (RWA Track)
Traffic congestion is a major issue that can be solved by suggesting drivers alternative routes they are willing to take. This concept has been formalized as a strategic routing problem in which a single alternative route is suggested to an existing one. We extend this formalization and introduce the Multiple-Routes problem, which is given a start and a destination and then aims at finding up to \(n\) different routes that the drivers strategically disperse over, minimizing the overall travel time of the system. Due to the NP-hard nature of the problem, we introduce the Multiple-Routes evolutionary algorithm (MREA) as a heuristic solver. We study several mutation and crossover operators and evaluate them on real-world data of the city of Berlin, Germany. We find that a combination of all operators yields the best result, improving the overall travel time by a factor between 1.8 and 3, in the median, compared to all drivers taking the fastest route. For the base case \(n=2\), we compare our MREA to the highly tailored optimal solver by Bläsius etal. [ATMOS 2020] and show that, in the median, our approach finds solutions of quality at least \(99.69\%\) of an optimal solution while only requiring \(40\%\) of the time.
Krogmann, Simon; Lenzner, Pascal; Molitor, Louise; Skopalik, Alexander Two-Stage Facility Location Games with Strategic Clients and FacilitiesInternational Joint Conference on Artificial Intelligence (IJCAI) 2021: 292–298
We consider non-cooperative facility location games where both facilities and clients act strategically and heavily influence each other. This contrasts established game-theoretic facility location models with non-strategic clients that simply select the closest opened facility. In our model, every facility location has a set of attracted clients and each client has a set of shopping locations and a weight that corresponds to her spending capacity. Facility agents selfishly select a location for opening their facility to maximize the attracted total spending capacity, whereas clients strategically decide how to distribute their spending capacity among the opened facilities in their shopping range. We focus on a natural client behavior similar to classical load balancing: our selfish clients aim for a distribution that minimizes their maximum waiting times for getting serviced, where a facility’s waiting time corresponds to its total attracted client weight. We show that subgame perfect equilibria exist and give almost tight constant bounds on the Price of Anarchy and the Price of Stability, which even hold for a broader class of games with arbitrary client behavior. Since facilities and clients influence each other, it is crucial for the facilities to anticipate the selfish clients’ behavior when selecting their location. For this, we provide an efficient algorithm that also implies an efficient check for equilibrium. Finally, we show that computing a socially optimal facility placement is NP-hard and that this result holds for all feasible client weight distributions.
Casel, Katrin; Friedrich, Tobias; Issac, Davis; Klodt, Nicolas; Seifert, Lars; Zahn, Arthur A Color-blind 3-Approximation for Chromatic Correlation Clustering and Improved HeuristicsKnowledge Discovery and Data Mining (KDD) 2021: 882–891
Chromatic Correlation Clustering (CCC) models clustering of objects with categorical pairwise relationships. The model can be viewed as clustering the vertices of a graph with edge-labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well studied problem Correlation Clustering (CC), motivated by real-world applications from data-mining, social networks and bioinformatics. We give theoretical as well as practical contributions to the study of CCC. Our main theoretical contribution is an alternative analysis of the famous Pivot algorithm for CC. We show that, when simply run color-blind, Pivot is also a linear time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear time 11-approximation, both by Anava et al. [WWW 2015]. While this theoretical result justifies Pivot as a baseline comparison for other heuristics, its blunt color-blindness performs poorly in practice. We develop a color-sensitive, practical heuristic we call Greedy Expansion that empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances. Further, we propose a novel generalization of CCC allowing for multi-labelled edges. We argue that it is more suitable for many of the real-world applications and extend our results to this model.
Kißig, Otto; Taraz, Martin; Cohen, Sarel; Doskoč, Vanja; Friedrich, Tobias Drug Repurposing for Multiple COVID Strains using Collaborative FilteringICLR Workshop on Machine Learning for Preventing and Combating Pandemics (MLPCP@ICLR) 2021
The ongoing COVID-19 pandemic demands for a swift discovery of suitable treatments. The development of completely new compounds for such a novel disease is a challenging, time intensive process. This amplifies the relevance of drug repurposing, a technique where existing drugs are used to treat other diseases. A common bioinformatical approach to this is based on knowledge graphs, which compile relationships between drugs, diseases, genes and other biomedical entities. Then, graph neural networks (GNNs) are used for the drug repurposing task as they provide a good link prediction performance on such knowledge graphs. Building on state-of-the-art GNN research, Doshi & Chepuri (2020) construct the remarkable model DR-COVID. We re-implement their model and extend the approach to perform significantly better. We propose and evaluate several strategies for the aggregation of link predictions into drug recommendation rankings. With the help of clustering of similar target diseases we improve the model by a substantial margin, compiling a top-100 ranking of candidates including 32 currently being in COVID-19-related clinical trials. Regarding the re-implementation, we offer more flexibility in the selection of the graph neighborhood sizes fed into the model and reduce the training time significantly by making use of data parallelism.
Atzberger, Daniel; Cech, Tim; de la Haye, Merlin; Söchting, Maximilian; Scheibel, Willy; Limberger, Daniel; Döllner, Jürgen Software Forest: A Visualization of Semantic Similarities in Source Code using a Tree MetaphorProceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications -- Volume 3 IVAPP 2021: 112–122
Software visualization techniques provide effective means for program comprehension tasks as they allow developers to interactively explore large code bases. A frequently encountered task during software development is the detection of source code files of similar semantic. To assist this task we present Software Forest, a novel 2.5D software visualization that enables interactive exploration of semantic similarities within a software system, illustrated as a forest. The underlying layout results from the analysis of the vocabulary of the software documents using Latent Dirichlet Allocation and Multidimensional Scaling and therefore reflects the semantic similarity between source code files. By mapping properties of a software entity, e.g., size metrics or trend data, to visual variables encoded by various, figurative tree meshes, aspects of a software system can be displayed. This concept is complemented with implementation details as well as a discussion on applications.
Friedemann, Wilhelm; Friedrich, Tobias; Gawendowicz, Hans; Lenzner, Pascal; Melnichenko, Anna; Peters, Jannik; Stephan, Daniel; Vaichenker, Michael Efficiency and Stability in Euclidean Network DesignSymposium on Parallelism in Algorithms and Architectures (SPAA) 2021: 232–242
Network Design problems typically ask for a minimum cost sub-network from a given host network. This classical point-of-view assumes a central authority enforcing the optimum solution. But how should networks be designed to cope with selfish agents that own parts of the network? In this setting, minimum cost networks may be very unstable in that agents will deviate from a proposed solution if this decreases their individual cost. Hence, designed networks should be both efficient in terms of total cost and stable in terms of the agents' willingness to accept the network. We study this novel type of Network Design problem by investigating the creation of \($(\beta,\gamma)$\)-networks, that are in \($\beta$\)-approximate Nash equilibrium and have a total cost of at most \($\gamma$\) times the optimal cost, for the recently proposed Euclidean Generalized Network Creation Game by Bilò et al.SPAA2019. There, \($n$\) agents corresponding to points in Euclidean space create costly edges among themselves to optimize their centrality in the created network. Our main result is a simple \($\mathcal{O(n^2)$\)-time algorithm that computes a \($(\beta,\beta)$\)-network with low \($\beta$\) for any given set of points. Moreover, on integer grid point sets or random point sets our algorithm achieves a low constant \($\beta$\). Besides these results for the Euclidean model, we discuss a generalization of our algorithm to instances with arbitrary, even non-metric, edge lengths. Moreover, in contrast to these algorithmic results, we show that no such positive results are possible when focusing on either optimal networks, i.e., \($(\beta,1)$\)-networks, or perfectly stable networks, i.e., \($(1,\gamma)$\)-networks, as in both cases NP-hard problems arise, there exist instances with very unstable optimal networks, and there are instances for perfectly stable networks with high total cost. Along the way, we significantly improve several results from Bilò et al. and we asymptotically resolve their conjecture about the Price of Anarchy by providing a tight bound.
Kißig, Otto; Taraz, Martin; Cohen, Sarel; Friedrich, Tobias Drug Repurposing Using Link Prediction on Knowledge GraphsICML Workshop on Computational Biology (WCB@ICML) 2021: 742–753
The active global SARS-CoV-2 pandemic caused more than 167 million cases and 3.4 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time intensive process and despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on knowledge graphs, that condense relationships between entities like drugs, diseases and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of candidate drugs, 32 of which currently being in COVID-19-related clinical trials. Moreover, we present an alternative application for the model, the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model by significantly shortening the inference and pre-processing time by exploiting data-parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using a new emerging hardware—we propose a new approach to leverage the use of high-capacity Non-Volatile Memory for aggregate drug ranking.