Friedrich, Tobias; Kötzing, Timo; Radhakrishnan, Aishwarya; Schiller, Leon; Schirneck, Martin; Tennigkeit, Georg; Wietheger, Simon. Crossover for Cardinality Constrained Optimization. ACM Transactions on Evolutionary Learning and Optimization 2023: 1–32
To understand better how and why crossover can benefit constrained optimization, we consider pseudo-Boolean functions with an upper bound \(B\) on the number of 1-bits allowed in the length-\(n\) bit string (i.e., a cardinality constraint). We investigate the natural translation of the OneMax test function to this setting, a linear function where \(B\) bits have a weight of \(1+ 1/n\) and the remaining bits have a weight of \(1\). Friedrich et al. [TCS 2020] gave a bound of \(\Theta(n^2)\) for the expected running time of the (1+1) EA on this function. Part of the difficulty when optimizing this problem lies in having to improve individuals meeting the cardinality constraint by flipping a \(1\) and a \(0\) simultaneously. The experimental literature proposes balanced operators, preserving the number of 1-bits, as a remedy. We show that a balanced mutation operator optimizes the problem in \(O(n \log n)\) if \(n-B = O(1)\). However, if \(n-B = \Theta(n)\), we show a bound of \(\Omega(n^2)\), just as for classic bit mutation. Crossover together with a simple island model gives running times of \(O(n^2 / \log n)\) (uniform crossover) and \(O(n\sqrt{n})\) (3-ary majority vote crossover). For balanced uniform crossover with Hamming-distance maximization for diversity we show a bound of \(O(n \log n)\). As an additional contribution, we present an extensive analysis of different balanced crossover operators from the literature.
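To make the setting concrete, here is a minimal Python sketch (not the authors' code) of the constrained OneMax variant and of one possible balanced mutation operator; assigning the heavier weight to the first \(B\) positions and using a single random swap are illustrative assumptions.

```python
import random

def constrained_onemax(x, B):
    """Linear function from the abstract: B designated bits have weight
    1 + 1/n, the remaining bits weight 1; strings violating the cardinality
    constraint are rejected. Placing the heavy weights on the first B
    positions is an arbitrary illustrative choice."""
    n = len(x)
    if sum(x) > B:                       # cardinality constraint violated
        return float("-inf")
    return sum(x[:B]) * (1 + 1 / n) + sum(x[B:])

def balanced_mutation(x):
    """One flavour of a balanced operator: swap a random 1 with a random 0,
    so the number of 1-bits, and hence feasibility, is preserved."""
    ones = [i for i, b in enumerate(x) if b == 1]
    zeros = [i for i, b in enumerate(x) if b == 0]
    if not ones or not zeros:
        return x[:]
    y = x[:]
    i, j = random.choice(ones), random.choice(zeros)
    y[i], y[j] = 0, 1
    return y
```

By construction, the operator exchanges a 1 and a 0 in a single step, which is exactly the kind of move a mutation needs to improve individuals that already meet the cardinality constraint.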
Friedrich, Tobias; Gawendowicz, Hans; Lenzner, Pascal; Melnichenko, Anna. Social Distancing Network Creation. Algorithmica 2023
During a pandemic, people have to find a trade-off between meeting others and staying safely at home. While meeting others is pleasant, it also increases the risk of infection. We consider this dilemma by introducing a game-theoretic network creation model in which selfish agents can form bilateral connections. They benefit from network neighbors, but at the same time, they want to maximize their distance to all other agents. This models the inherent conflict that social distancing rules impose on the behavior of selfish agents in a social network. Besides addressing this familiar issue, our model can be seen as the inverse to the well-studied Network Creation Game by Fabrikant et al. (in: PODC 2003, pp 347–351, 2003. https://doi.org/10.1145/872035.872088), where agents aim at being as central as possible in the created network. We look at two variants of network creation governed by social distancing. The first variant has no connection restrictions; here we characterize optimal and equilibrium networks and derive asymptotically tight bounds on the Price of Anarchy and Price of Stability. The second variant allows connection restrictions. As our main result, we prove that Swap-Maximal Routing-Cost Spanning Trees, an efficiently computable weaker variant of Maximum Routing-Cost Spanning Trees, actually resemble equilibria for a significant range of the parameter space. Moreover, we give almost tight bounds on the Price of Anarchy and Price of Stability. These results imply that under social distancing the agents’ selfishness has a strong impact on the quality of the equilibria.
Böther, Maximilian; Schiller, Leon; Fischbeck, Philipp; Molitor, Louise; Krejca, Martin S.; Friedrich, Tobias. Evolutionary Minimization of Traffic Congestion. IEEE Transactions on Evolutionary Computation 2023: 1809–1821
Traffic congestion is a major issue that can be mitigated by suggesting to drivers alternative routes they are willing to take. This concept has been formalized as a strategic routing problem in which a single alternative route is suggested in addition to an existing one. We extend this formalization and introduce the Multiple-Routes problem, which, given a start and a destination, aims at finding up to \(n\) different routes that the drivers strategically disperse over, minimizing the overall travel time of the system. Due to the NP-hard nature of the problem, we introduce the Multiple-Routes evolutionary algorithm (MREA) as a heuristic solver. We study several mutation and crossover operators and evaluate them on real-world data of Berlin, Germany. We find that a combination of all operators yields the best result, reducing the overall travel time by a factor between \(1.8\) and \(3\), in the median, compared to all drivers taking the fastest route. For the base case \(n = 2\), we compare our MREA to the highly tailored optimal solver by Bläsius et al. (ATMOS 2020) and show that, in the median, our approach finds solutions whose quality is at least \(99.69 \%\) of an optimal solution's while requiring only \(40 \%\) of the time.
Krämer, Bastian; Stang, Moritz; Doskoč, Vanja; Schäfers, Wolfgang; Friedrich, Tobias. Automated valuation models: improving model performance by choosing the optimal spatial training level. Journal of Property Research 2023: 365–390
The academic community has discussed using Automated Valuation Models (AVMs) in the context of traditional real estate valuations and their performance for several decades. Most studies focus on finding the best method for estimating property values. One aspect that has not yet been studied scientifically is the appropriate choice of the spatial training level. The published research on AVMs usually deals with a manually defined region and fails to test the methods used on different spatial levels. Our research aims to investigate the impact of training AVM algorithms at different spatial levels on valuation accuracy. We use a dataset with 1.2 million residential properties from Germany and test four methods: Ordinary Least Square, Generalised Additive Models, eXtreme Gradient Boosting and Deep Neural Network. Our results show that the right choice of spatial training level can significantly impact the model performance, and that this impact varies across the different methods.
Cohen, Sarel; Hershcovitch, Moshik; Taraz, Martin; Kißig, Otto; Issac, Davis; Wood, Andrew; Waddington, Daniel; Chin, Peter; Friedrich, Tobias. Improved and Optimized Drug Repurposing for the SARS-CoV-2 Pandemic. PLOS ONE 2023
The active global SARS-CoV-2 pandemic has caused more than 426 million cases and 5.8 million deaths worldwide. The development of completely new drugs for such a novel disease is a challenging, time-intensive process. Despite researchers around the world working on this task, no effective treatments have been developed yet. This emphasizes the importance of drug repurposing, where treatments are found among existing drugs that are meant for different diseases. A common approach to this is based on knowledge graphs, which condense relationships between entities such as drugs, diseases, and genes. Graph neural networks (GNNs) can then be used for the task at hand by predicting links in such knowledge graphs. Expanding on state-of-the-art GNN research, Doshi et al. recently developed the Dr-COVID model. We further extend their work using additional output interpretation strategies. The best aggregation strategy derives a top-100 ranking of 8,070 candidate drugs, 32 of which are currently being tested in COVID-19-related clinical trials. Moreover, we present an alternative application for the model: the generation of additional candidates based on a given pre-selection of drug candidates using collaborative filtering. In addition, we improved the implementation of the Dr-COVID model, significantly shortening the inference and pre-processing time by exploiting data parallelism. As drug repurposing is a task that requires high computation and memory resources, we further accelerate the post-processing phase using emerging hardware: we propose a new approach to leverage high-capacity Non-Volatile Memory for aggregate drug ranking.
Friedrich, Tobias; Lenzner, Pascal; Molitor, Louise; Seifert, Lars. Single-Peaked Jump Schelling Games. International Symposium on Algorithmic Game Theory (SAGT) 2023
Schelling games model the widespread phenomenon of residential segregation in metropolitan areas from a game-theoretic point of view. In these games, agents of different types each strategically select a node on a given graph that models the residential area, in order to maximize their individual utility. The latter solely depends on the types of the agents on neighboring nodes, and it has been a standard assumption to consider utility functions that are monotone in the number of same-type neighbors. This simplifying assumption has recently been challenged, since sociological poll results suggest that real-world agents actually favor diverse neighborhoods. We contribute to the recent endeavor of investigating residential segregation models with realistic agent behavior by studying Jump Schelling Games with agents having a single-peaked utility function. In such games, there are empty nodes in the graph and agents can strategically jump to such nodes to improve their utility. We investigate the existence of equilibria and show that they exist under specific conditions. Contrasting this, we prove that even on simple topologies like paths or rings such stable states are not guaranteed to exist. Regarding the game dynamics, we show that improving response cycles exist independently of the position of the peak in the utility function. Moreover, we show high, almost tight bounds on the Price of Anarchy and the Price of Stability with respect to the recently proposed degree of integration, which counts the number of agents with a diverse neighborhood and serves as a proxy for measuring the segregation strength. Last but not least, we show that computing a beneficial state with high integration is NP-complete and, as a novel conceptual contribution, we also show that it is NP-hard to decide whether an equilibrium state can be reached via improving response dynamics starting from a given initial state.
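The abstract fixes only that utilities are single-peaked in the fraction of same-type neighbors; as an illustration (not the paper's exact definition), a piecewise-linear "tent" utility with a peak at a given fraction could look as follows in Python.

```python
def single_peaked_utility(same_type_fraction, peak):
    """Piecewise-linear single-peaked utility: rises linearly up to the
    peak and falls linearly afterwards. This concrete tent shape is an
    illustrative assumption; the paper only requires single-peakedness."""
    f, p = same_type_fraction, peak
    if p in (0.0, 1.0):                  # degenerate peaks at the boundary
        return 1.0 - abs(f - p)
    return f / p if f <= p else (1.0 - f) / (1.0 - p)
```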
Cseh, Ágnes; Führlich, Pascal; Lenzner, Pascal. The Swiss Gambit. Autonomous Agents and Multi-Agent Systems (AAMAS) 2023
In each round of a Swiss-system tournament, players of similar score are paired against each other. An intentional early loss therefore might lead to weaker opponents in later rounds and thus to a better final tournament result, a phenomenon known as the Swiss Gambit. To the best of our knowledge, it is an open question whether this strategy can actually work. This paper provides answers based on an empirical agent-based analysis for the most prominent application area of the Swiss-system format, namely chess tournaments. We simulate realistic tournaments by employing the official FIDE pairing system for computing the player pairings in each round. We show that even though gambits are widely possible in Swiss-system chess tournaments, profiting from them requires a high degree of predictability of match results. Moreover, even if a Swiss Gambit succeeds, the obtained improvement in the final ranking is limited. Our experiments show that counting on a Swiss Gambit is indeed far more of a risky gamble than a reliable strategy to improve the final rank.
Friedrich, Tobias; Lenzner, Pascal; Molitor, Louise; Seifert, Lars. Single-Peaked Jump Schelling Games. Autonomous Agents and Multiagent Systems (AAMAS) 2023: 2899–2901
Schelling games model the widespread phenomenon of residential segregation in metropolitan areas from a game-theoretic point of view. In these games, agents of different types each strategically select a node on a given graph that models the residential area, in order to maximize their individual utility. The latter solely depends on the types of the agents on neighboring nodes, and it has been a standard assumption to consider utility functions that are monotone in the number of same-type neighbors. This simplifying assumption has recently been challenged, since sociological poll results suggest that real-world agents actually favor diverse neighborhoods. We contribute to the recent endeavor of investigating residential segregation models with realistic agent behavior by studying Jump Schelling Games with agents having a single-peaked utility function. In such games, there are empty nodes in the graph and agents can strategically jump to such nodes to improve their utility. We investigate the existence of equilibria and show that they exist under specific conditions. Contrasting this, we prove that even on simple topologies like paths or rings such stable states are not guaranteed to exist. Regarding the game dynamics, we show that improving response cycles exist independently of the position of the peak in the utility function. Moreover, we show high, almost tight bounds on the Price of Anarchy and the Price of Stability with respect to the recently proposed degree of integration, which counts the number of agents with a diverse neighborhood and serves as a proxy for measuring the segregation strength. Last but not least, we show that computing a beneficial state with high integration is NP-complete and, as a novel conceptual contribution, we also show that it is NP-hard to decide whether an equilibrium state can be reached via improving response dynamics starting from a given initial state.
Böther, Maximilian; Kißig, Otto; Weyand, Christopher. Efficiently Computing Directed Minimum Spanning Trees. SIAM Symposium on Algorithm Engineering and Experiments (ALENEX) 2023: 86–95
Computing a directed minimum spanning tree, called an arborescence, is a fundamental algorithmic problem, although not as common as its undirected counterpart. In 1967, Edmonds discussed an elegant solution. It was refined by Tarjan to run in \(O(\min(n^2, m \log n))\), which is optimal for very dense and very sparse graphs. Gabow et al. gave a version of Edmonds’ algorithm that runs in \(O(n \log n + m)\), thus asymptotically beating the Tarjan variant in the regime between sparse and dense. Despite the attention the problem received theoretically, there exists, to the best of our knowledge, no empirical evaluation of either of these algorithms. In fact, the version by Gabow et al. has never been implemented and, aside from coding competitions, all readily available Tarjan implementations run in \(O(n^2)\). In this paper, we provide the first implementation of the version by Gabow et al. as well as five variants of Tarjan’s version with different underlying data structures. We evaluate these algorithms and existing solvers on a large set of real-world and random graphs.
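For readers who only need to compute an arborescence rather than benchmark implementations, the following hedged example relies on networkx, whose built-in `minimum_spanning_arborescence` routine implements an Edmonds-style algorithm; it is unrelated to the implementations evaluated in the paper, and the example graph is made up.

```python
import networkx as nx

# Small weighted digraph that admits a spanning arborescence rooted at 0.
G = nx.DiGraph()
G.add_weighted_edges_from([
    (0, 1, 4), (0, 2, 1), (2, 1, 1), (1, 3, 2), (2, 3, 5),
])

# networkx's Edmonds-style routine; raises if no arborescence exists.
arb = nx.minimum_spanning_arborescence(G, attr="weight")
print(sorted(arb.edges(data="weight")))   # e.g. [(0, 2, 1), (1, 3, 2), (2, 1, 1)]
```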
Khomutovskiy, Ivan; Dunker, Rebekka; Dierking, Jessica; Egbert, Julian; Helms, Christian; Schöllkopf, Finn; Casel, Katrin; Fischbeck, Philipp; Friedrich, Tobias; Issac, Davis; Krogmann, Simon; Lenzner, Pascal. Applying Skeletons to Speed Up the Arc-Flags Routing Algorithm. SIAM Symposium on Algorithm Engineering and Experiments (ALENEX) 2023: 110–122
The Single-Source Shortest Path problem is classically solved by applying Dijkstra's algorithm. However, the plain version of this algorithm is far too slow for real-world applications such as routing in large road networks. To remedy this, many speed-up techniques have been developed that build on the idea of computing auxiliary data in a preprocessing phase, which is then used to speed up queries. One well-known example is the Arc-Flags algorithm, which is based on the idea of precomputing edge flags to make the search more goal-directed. To explain the strong practical performance of such speed-up techniques, several graph parameters have been introduced. The skeleton dimension is one such parameter that has already been used to derive runtime bounds for some speed-up techniques. Moreover, it was experimentally shown to be low in real-world road networks. We introduce a method to incorporate skeletons, the underlying structure behind the skeleton dimension, to improve routing speed-up techniques even further. As a proof of concept, we develop new algorithms called SKARF and SKARF+ that combine skeletons with Arc-Flags, and demonstrate via extensive experiments on large real-world road networks that SKARF+ reduces the search space and the query time by about 30% to 40% compared to Arc-Flags. We also prove theoretical bounds on the query time of SKARF, which is the first time an Arc-Flags variant has been analyzed in terms of skeleton dimension.
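As background, the core Arc-Flags idea that SKARF builds on can be sketched as follows in Python; the data layout (adjacency lists, a region partition, and a precomputed flag set per edge) is an illustrative assumption, not the paper's implementation.

```python
import heapq

def arcflags_dijkstra(graph, flags, region_of, source, target):
    """Dijkstra with Arc-Flags pruning (a sketch of the baseline technique,
    not of SKARF). graph[u] is a list of (v, weight) pairs, flags[(u, v)]
    is the set of regions for which edge (u, v) may lie on a shortest path,
    and region_of[v] is the precomputed region of vertex v."""
    target_region = region_of[target]
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u]:
            if target_region not in flags[(u, v)]:
                continue                  # pruned: flag for the target's region is not set
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return float("inf")
```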
Krämer, Bastian; Stang, Moritz; Doskoč, Vanja; Schäfers, Wolfgang; Friedrich, Tobias. Automated Valuation Models: Improving Model Performance by Choosing the Optimal Spatial Training Level. American Real Estate Society (ARES) 2023: 1–26
The use of Automated Valuation Models (AVMs) in the context of traditional real estate valuations and their performance has been discussed in the academic community for several decades. Most studies focus on finding which method is best suited for estimating property values. One aspect that has not yet been studied scientifically is the appropriate choice of the spatial training level. The published research on AVMs usually deals with a manually defined region and fails to test the methods used on different spatial levels. The aim of our research is thus to investigate the impact of training AVM algorithms at different spatial levels in terms of valuation accuracy. We use a dataset with about 1.2 million residential properties from Germany and test four different methods, namely Ordinary Least Square, Generalized Additive Models, eXtreme Gradient Boosting and Deep Neural Network. Our results show that the right choice of spatial training level can have a major impact on the model performance, and that this impact varies across the different methods.
Casel, Katrin; Friedrich, Tobias; Schirneck, Martin; Wietheger, Simon. Fair Correlation Clustering in Forests. Foundations of Responsible Computing (FORC) 2023: 9:1–9:12
The study of algorithmic fairness has received growing attention recently. This stems from the awareness that bias in the input data for machine learning systems may result in discriminatory outputs. For clustering tasks, one of the most central notions of fairness is the formalization by Chierichetti, Kumar, Lattanzi, and Vassilvitskii [NeurIPS 2017]. A clustering is said to be fair if each cluster has the same distribution of manifestations of a sensitive attribute as the whole input set. This is motivated by various applications where the objects to be clustered have sensitive attributes that should not be over- or underrepresented. Most research on this version of fair clustering has focused on centroid-based objectives. In contrast, we discuss the applicability of this fairness notion to Correlation Clustering. The existing literature on the resulting Fair Correlation Clustering problem either presents approximation algorithms with poor approximation guarantees or severely limits the possible distributions of the sensitive attribute (often only two manifestations with a 1:1 ratio are considered). Our goal is to understand whether there is hope for better results in between these two extremes. To this end, we consider restricted graph classes, which allow us to characterize the distributions of sensitive attributes for which this form of fairness is tractable from a complexity point of view. While existing work on Fair Correlation Clustering gives approximation algorithms, we focus on exact solutions and investigate whether there are efficiently solvable instances. The unfair version of Correlation Clustering is trivial on forests, but adding fairness creates a surprisingly rich picture of complexities. We give an overview of the distributions and types of forests where Fair Correlation Clustering turns from tractable to intractable. As the most surprising insight, we find that the cause of the hardness is not the strictness of the fairness condition: we lift most of our results to also hold for the relaxed version of that condition. Instead, the source of hardness seems to be the distribution of the sensitive attribute. On the positive side, we identify some reasonable distributions that are indeed tractable. While this tractability is only shown for forests, it may open an avenue to design reasonable approximations for larger graph classes.
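For concreteness, the strict fairness condition can be checked directly; the sketch below (with exact ratio comparison as an assumption) tests whether every cluster reproduces the attribute distribution of the whole input.

```python
from collections import Counter
from fractions import Fraction

def is_fair(clusters, attribute):
    """clusters: iterable of vertex lists; attribute: dict vertex -> value.
    A clustering is fair if every cluster has the same distribution of the
    sensitive attribute as the whole input set (exact ratios assumed)."""
    vertices = [v for cluster in clusters for v in cluster]
    total = Counter(attribute[v] for v in vertices)
    global_dist = {a: Fraction(k, len(vertices)) for a, k in total.items()}
    for cluster in clusters:
        local = Counter(attribute[v] for v in cluster)
        local_dist = {a: Fraction(local.get(a, 0), len(cluster)) for a in total}
        if local_dist != global_dist:
            return False
    return True
```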
Friedrich, Tobias; Göbel, Andreas; Katzmann, Maximilian; Schiller, Leon. Cliques in High-Dimensional Geometric Inhomogeneous Random Graphs. International Colloquium on Automata, Languages and Programming (ICALP) 2023: 62:1–62:13
A recent trend in graph theory is to bring theoretical analyses closer to empirical observations by focusing on random graph models that are used to represent practical instances. There, it was observed that geometric inhomogeneous random graphs (GIRGs) yield good representations of complex real-world networks by expressing edge probabilities as a function that depends on (heterogeneous) vertex weights and on distances in some underlying geometric space that the vertices are distributed in. While most of the parameters of the model are well understood, it was unclear how the dimensionality of the ground space affects the structure of the graphs. In this paper, we complement existing research into the dimension of geometric random graph models and the ongoing study of determining the dimensionality of real-world networks by studying how the structure of GIRGs changes as the number of dimensions increases. We prove that, in the limit, GIRGs approach non-geometric inhomogeneous random graphs and present insights on how quickly the decay of the geometry impacts important graph structures. In particular, we study the expected number of cliques of a given size as well as the clique number and characterize phase transitions at which their behavior changes fundamentally. Finally, our insights help in better understanding previous results about the impact of the dimensionality on geometric random graphs.
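For orientation, a commonly used parametrization of the GIRG edge probability (exact constants and normalization vary between papers, so this is only a sketch) is

\[
\Pr[u \sim v] \;=\; \min\Bigl\{1,\; c\,\Bigl(\tfrac{w_u w_v / W}{\lVert x_u - x_v\rVert^{d}}\Bigr)^{\alpha}\Bigr\},
\]

where \(w_u, w_v\) are the vertex weights with sum \(W\), \(x_u, x_v\) are positions in the \(d\)-dimensional ground space, and \(\alpha > 1\) controls how strongly the geometry is felt; as \(d\) grows, the influence of the distance term weakens, which is the decay of geometry studied here.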
Gadea Harder, Jonathan; Krogmann, Simon; Lenzner, Pascal; Skopalik, Alexander. Strategic Resource Selection with Homophilic Agents. International Joint Conference on Artificial Intelligence (IJCAI) 2023: 2701–2709
The strategic selection of resources by selfish agents is a classic research direction, with Resource Selection Games and Congestion Games as prominent examples. In these games, agents select available resources and their utility then depends on the number of agents using the same resources. This implies that there is no distinction between the agents, i.e., they are anonymous. We depart from this very general setting by proposing Resource Selection Games with heterogeneous agents that strive for joint resource usage with similar agents. So, instead of the number of other users of a given resource, our model considers agents with different types, and the decisive feature is the fraction of same-type agents among the users. More precisely, similarly to Schelling Games, there is a tolerance threshold \(\tau \in [0,1]\) which specifies the agents' desired minimum fraction of same-type agents on a resource. Agents strive to select resources where at least a \(\tau\)-fraction of those resources' users have the same type as themselves. For \(\tau=1\), our model generalizes Hedonic Diversity Games with a peak at \(1\). For our general model, we consider the existence and quality of equilibria and the complexity of maximizing social welfare. Additionally, we consider a bounded rationality model, where agents can only estimate the utility of a resource, since they only know the fraction of same-type agents on a given resource, but not the exact numbers. Thus, they cannot know the impact a strategy change would have on a target resource. Interestingly, we show that this type of bounded rationality yields favorable game-theoretic properties and specific equilibria closely approximate equilibria of the full knowledge setting.
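To illustrate the threshold mechanism, here is a simple Python sketch; only the threshold \(\tau\) is taken from the abstract, while the concrete utility shape (a linear ramp below the threshold, capped at full utility above it) is an illustrative assumption.

```python
def resource_utility(same_type_users, total_users, tau):
    """Utility of an agent on a resource with tolerance threshold tau:
    full utility once at least a tau-fraction of the users share the
    agent's type, linearly scaled below the threshold. The linear ramp
    is an assumption for illustration, not the paper's definition."""
    if total_users == 0:
        return 0.0
    fraction = same_type_users / total_users
    if tau == 0:
        return 1.0                        # no homophily requirement at all
    return min(1.0, fraction / tau)
```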
Friedrich, Tobias; Gawendowicz, Hans; Lenzner, Pascal; Zahn, Arthur. The Impact of Cooperation in Bilateral Network Creation. ACM Symposium on Principles of Distributed Computing (PODC) 2023
Many real-world networks, like the Internet or social networks, are not the result of central design but instead the outcome of the interaction of local agents that selfishly optimize their individual utility. The well-known Network Creation Game by Fabrikant, Luthra, Maneva, Papadimitriou, and Shenker [PODC 2003] models this. There, agents corresponding to network nodes buy incident edges towards other agents for a price of \(\alpha > 0\) and simultaneously try to minimize their buying cost and their total hop distance. Since in many real-world networks, e.g., social networks, consent from both sides is required to establish and maintain a connection, Corbo and Parkes [PODC 2005] proposed a bilateral version of the Network Creation Game, in which mutual consent and payment are required in order to create edges. It is known that this cooperative version has a significantly higher Price of Anarchy compared to the unilateral version. At first glance, this is counterintuitive, since cooperation should help to avoid socially bad states. However, in the bilateral version only a very restrictive form of cooperation is considered. We investigate this trade-off between the amount of cooperation and the Price of Anarchy by analyzing the bilateral version with respect to various degrees of cooperation among the agents. With this, we provide insights into what kind of cooperation is needed to ensure that socially good networks are created. As a first step in this direction, we focus on tree networks and present a collection of asymptotically tight bounds on the Price of Anarchy that precisely map the impact of cooperation. Most strikingly, we find that weak forms of cooperation already yield a significantly improved Price of Anarchy. In particular, the cooperation of coalitions of size 3 is enough to achieve constant bounds. Moreover, for general networks we show that enhanced cooperation yields close to optimal networks for a wide range of edge prices. Along the way, we disprove an old conjecture by Corbo and Parkes [PODC 2005].
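For reference, the cost an agent minimizes in these network creation games has the familiar form (stated here as a generic sketch; the exact accounting of who pays for a mutual edge differs between the unilateral and the bilateral variant):

\[
\mathrm{cost}_v(s) \;=\; \alpha \cdot \bigl|\,\text{edges bought by } v\,\bigr| \;+\; \sum_{u \neq v} \mathrm{dist}_{G(s)}(v, u),
\]

where \(G(s)\) is the network formed by the strategy profile \(s\); in the bilateral version, an edge exists only if both endpoints agree to pay for it.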
Angrick, Sebastian; Bals, Ben; Casel, Katrin; Cohen, Sarel; Friedrich, Tobias; Hastrich, Niko; Hradilak, Theresa; Issac, Davis; Kißig, Otto; Schmidt, Jonas; Wendt, Leo. Solving Directed Feedback Vertex Set by Iterative Reduction to Vertex Cover. Symposium on Experimental Algorithms (SEA) 2023: 10:1–10:14
In the Directed Feedback Vertex Set (DFVS) problem, one is given a directed graph \(G = (V,E)\) and wants to find a minimum cardinality set \(S \subseteq V\) such that \(G-S\) is acyclic. DFVS is a fundamental problem in computer science and finds applications in areas such as deadlock detection. The problem was the subject of the 2022 PACE coding challenge. We develop a novel exact algorithm for the problem that is tailored to perform well on instances that are mostly bi-directed. For such instances, we adapt techniques from the well-researched vertex cover problem. Our core idea is an iterative reduction to vertex cover. To this end, we also develop a new reduction rule that reduces the number of not bi-directed edges. With the resulting algorithm, we were able to win third place in the exact track of the PACE challenge. We perform computational experiments and compare the running time to other exact algorithms, in particular to the winning algorithm in PACE. Our experiments show that we outpace the other algorithms on instances that have a low density of uni-directed edges.
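The entry point of the reduction can be illustrated with a small Python sketch (using networkx for convenience); this only extracts the bi-directed part of the input, not the full iterative algorithm from the paper.

```python
import networkx as nx

def bidirected_part(D):
    """Return the undirected graph formed by the bi-directed edges of D,
    i.e. pairs of anti-parallel arcs forming directed 2-cycles. Any
    directed feedback vertex set of D must in particular be a vertex
    cover of this graph, which is where the iterative reduction starts;
    the paper adds further reduction rules on top of this observation."""
    U = nx.Graph()
    U.add_nodes_from(D)
    U.add_edges_from((u, v) for u, v in D.edges if D.has_edge(v, u))
    return U

# Tiny example: arcs 1<->2 and 2<->3 are bi-directed, 3->1 is not.
D = nx.DiGraph([(1, 2), (2, 1), (2, 3), (3, 2), (3, 1)])
print(sorted(map(sorted, bidirected_part(D).edges())))   # [[1, 2], [2, 3]]
```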
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian; Stephan, Daniel. Strongly Hyperbolic Unit Disk Graphs. Symposium on Theoretical Aspects of Computer Science (STACS) 2023: 13:1–13:17
The class of Euclidean unit disk graphs is one of the most fundamental and well-studied graph classes with underlying geometry. In this paper, we identify this class as a special case in the broader class of hyperbolic unit disk graphs and introduce strongly hyperbolic unit disk graphs as a natural counterpart to the Euclidean variant. In contrast to the grid-like structures exhibited by Euclidean unit disk graphs, strongly hyperbolic networks feature hierarchical structures, which are also observed in complex real-world networks. We investigate basic properties of strongly hyperbolic unit disk graphs, including adjacencies and the formation of cliques, and utilize the derived insights to demonstrate that the class is useful for the development and analysis of graph algorithms. Specifically, we develop a simple greedy routing scheme and analyze its performance on strongly hyperbolic unit disk graphs in order to prove that routing can be performed more efficiently on such networks than in general.
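The greedy routing scheme analyzed here follows the standard pattern of always forwarding to the neighbor geometrically closest to the target; a generic Python sketch (the graph and distance representation are assumptions for illustration) is given below.

```python
def greedy_route(graph, dist, source, target, max_hops=None):
    """Greedy routing: repeatedly forward the packet to the neighbour that
    is closest to the target in the underlying (here: hyperbolic) metric.
    graph[v] is the list of neighbours of v and dist(u, v) the geometric
    distance; the routine reports failure when it gets stuck in a local
    minimum or exceeds the hop budget. A generic sketch, not the paper's
    analysis."""
    if max_hops is None:
        max_hops = len(graph)
    path = [source]
    current = source
    while current != target and len(path) <= max_hops:
        nxt = min(graph[current], key=lambda u: dist(u, target), default=None)
        if nxt is None or dist(nxt, target) >= dist(current, target):
            return None                   # stuck: no neighbour is closer to the target
        path.append(nxt)
        current = nxt
    return path if current == target else None
```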