Hacker, Philipp; Naumann, Felix; Friedrich, Tobias; Grundmann, Stefan; Lehmann, Anja. AI Compliance - Challenges of Bridging Data Science and Law. ACM Journal of Data and Information Quality 2022
This vision paper outlines the main building blocks of what we term AI Compliance, an effort to bridge two complementary research areas: computer science and the law. The goal of such research is to model, measure, and affect the quality of AI artifacts, such as data, models, and applications, and thereby facilitate adherence to legal standards.
Bläsius, Thomas; Freiberger, Cedric; Friedrich, Tobias; Katzmann, Maximilian; Montenegro-Retana, Felix; Thieffry, Marianne. Efficient Shortest Paths in Scale-Free Networks with Underlying Hyperbolic Geometry. ACM Transactions on Algorithms 2022: 1–32
A standard approach to accelerating shortest path algorithms on networks is the bidirectional search, which explores the graph from the start and the destination simultaneously. In practice this strategy performs particularly well on scale-free real-world networks. Such networks typically have a heterogeneous degree distribution (e.g., a power-law distribution) and high clustering (i.e., vertices with a common neighbor are likely to be connected themselves). These two properties can be obtained by assuming an underlying hyperbolic geometry. To explain the observed behavior of the bidirectional search, we analyze its running time on hyperbolic random graphs and prove that it is \(\tilde{O}(n^{2 - 1/\alpha} + n^{1/(2\alpha)} + \delta_{\max})\) with high probability, where \(\alpha \in (0.5, 1)\) controls the power-law exponent of the degree distribution, and \(\delta_{\max}\) is the maximum degree. This bound is sublinear, improving the obvious worst-case linear bound. Although our analysis depends on the underlying geometry, the algorithm itself is oblivious to it.
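The bidirectional strategy analyzed here is oblivious to the geometry. A minimal sketch on unweighted adjacency lists (the graph representation and function name are illustrative, not taken from the paper):

```python
from collections import deque

def bidirectional_bfs(adj, s, t):
    """Length of a shortest s-t path in an unweighted graph, found by
    growing BFS layers from both endpoints until the frontiers meet."""
    if s == t:
        return 0
    dist_s, dist_t = {s: 0}, {t: 0}
    q_s, q_t = deque([s]), deque([t])
    while q_s and q_t:
        # Always expand the smaller frontier by one full BFS layer.
        if len(q_s) <= len(q_t):
            q, dist, other = q_s, dist_s, dist_t
        else:
            q, dist, other = q_t, dist_t, dist_s
        best = None
        for _ in range(len(q)):
            u = q.popleft()
            for v in adj[u]:
                if v in other:  # frontiers meet via the edge (u, v)
                    cand = dist[u] + 1 + other[v]
                    if best is None or cand < best:
                        best = cand
                elif v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        if best is not None:
            return best
    return None  # s and t lie in different components
```

Expanding the smaller frontier first is the common heuristic that lets the search profit from heterogeneous degree distributions.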
Bilò, Davide; Bilò, Vittorio; Lenzner, Pascal; Molitor, Louise. Topological Influence and Locality in Swap Schelling Games. Autonomous Agents and Multi-Agent Systems (AGNT) 2022
Residential segregation is a widespread phenomenon that can be observed in almost every major city. In these urban areas, residents with different racial or socioeconomic backgrounds tend to form homogeneous clusters. Schelling's famous agent-based model for residential segregation explains how such clusters can form even if all agents are tolerant, i.e., if they agree to live in mixed neighborhoods. For segregation to occur, all it needs is a slight bias towards agents preferring similar neighbors. Very recently, Schelling's model has been investigated from a game-theoretic point of view with selfish agents that strategically select their residential location. In these games, agents can improve on their current location by performing a location swap with another agent who is willing to swap. We significantly deepen these investigations by studying the influence of the underlying topology modeling the residential area on the existence of equilibria, the Price of Anarchy and on the dynamic properties of the resulting strategic multi-agent system. Moreover, as a new conceptual contribution, we also consider the influence of locality, i.e., if the location swaps are restricted to swaps of neighboring agents. We give improved almost tight bounds on the Price of Anarchy for arbitrary underlying graphs and we present (almost) tight bounds for regular graphs, paths and cycles. Moreover, we give almost tight bounds for grids, which are commonly used in empirical studies. For grids we also show that locality has a severe impact on the game dynamics.
Casel, Katrin; Fernau, Henning; Grigoriev, Alexander; Schmid, Markus L.; Whitesides, Sue. Combinatorial Properties and Recognition of Unit Square Visibility Graphs. Discrete & Computational Geometry 2022
Unit square visibility graphs (USV) are described by axis-parallel visibility between unit squares placed in the plane. If the squares are required to be placed on integer grid coordinates, then USV become unit square grid visibility graphs (USGV), an alternative characterisation of the well-known rectilinear graphs. We extend known combinatorial results for USGV and we show that, in the weak case (i.e., visibilities do not necessarily translate into edges of the represented combinatorial graph), the area minimisation variant of their recognition problem is NP-hard. We also provide combinatorial insights with respect to USV, and as our main result, we prove their recognition problem to be NP-hard, which settles an open question.
Bläsius, Thomas; Friedrich, Tobias; Lischeid, Julius; Meeks, Kitty; Schirneck, Martin. Efficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling. Journal of Computer and System Sciences 2022: 192–213
The transversal hypergraph problem asks to enumerate the minimal hitting sets of a hypergraph. If the solutions have bounded size, Eiter and Gottlob [SICOMP’95] gave an algorithm running in output-polynomial time, but whose space requirement also scales with the output. We improve this to polynomial delay and space. Central to our approach is the extension problem, deciding for a set \(X\) of vertices whether it is contained in any minimal hitting set. We show that this is one of the first natural problems to be W[3]-complete. We give an algorithm for the extension problem running in time \(O(m^{|X|+1} n)\) and prove a SETH lower bound showing that this is close to optimal. We apply our enumeration method to the discovery problem of minimal unique column combinations from data profiling. Our empirical evaluation suggests that the algorithm outperforms its worst-case guarantees on hypergraphs stemming from real-world databases.
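For intuition, the objects being enumerated can be generated by brute force over subsets in order of increasing size, checking minimality against previously found solutions. This exponential-time sketch only illustrates the problem, not the polynomial-delay algorithm of the paper:

```python
from itertools import combinations

def minimal_hitting_sets(universe, sets):
    """Enumerate all inclusion-wise minimal hitting sets by exhaustive
    search in order of increasing size. Exponential in |universe|."""
    universe = sorted(universe)
    minimal = []
    for size in range(len(universe) + 1):
        for cand in combinations(universe, size):
            c = set(cand)
            if all(c & s for s in sets):               # c hits every set
                if not any(m <= c for m in minimal):   # no smaller solution inside c
                    minimal.append(c)
    return minimal
```

Enumerating by increasing size means a candidate is minimal exactly when it contains no previously emitted solution.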
Bläsius, Thomas; Friedrich, Tobias; Schirneck, Martin. The Complexity of Dependency Detection and Discovery in Relational Databases. Theoretical Computer Science 2022: 79–96
Multi-column dependencies in relational databases come associated with two different computational tasks. The detection problem is to decide whether a dependency of a certain type and size holds in a given database, the discovery problem asks to enumerate all valid dependencies of that type. We settle the complexity of both of these problems for unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We show that the detection of UCCs and FDs is W[2]-complete when parameterized by the solution size. The discovery of inclusion-wise minimal UCCs is proven to be equivalent under parsimonious reductions to the transversal hypergraph problem of enumerating the minimal hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. We further identify the detection of INDs as one of the first natural W[3]-complete problems. The discovery of maximal INDs is shown to be equivalent to enumerating the maximal satisfying assignments of antimonotone, 3-normalized Boolean formulas.
Cseh, Ágnes; Peters, Jannik. Three-Dimensional Popular Matching with Cyclic Preferences. Autonomous Agents and Multi-Agent Systems (AAMAS) 2022: 77–87
Two actively researched problem settings in matchings under preferences are popular matchings and the three-dimensional stable matching problem with cyclic preferences. In this paper, we apply the optimality notion of the first topic to the input characteristics of the second one. We investigate the connection between stability, popularity, and their strict variants, strong stability and strong popularity in three-dimensional instances with cyclic preferences. Furthermore, we also derive results on the complexity of these problems when the preferences are derived from master lists.
Berenbrink, Petra; Hoefer, Martin; Kaaser, Dominik; Lenzner, Pascal; Rau, Malin; Schmand, Daniel. Asynchronous Opinion Dynamics in Social Networks. Autonomous Agents and Multi-Agent Systems (AAMAS) 2022: 109–117
Opinion spreading in a society decides the fate of elections, the success of products, and the impact of political or social movements. The model by Hegselmann and Krause is a well-known theoretical model to study such opinion formation processes in social networks. In contrast to many other theoretical models, it does not converge towards a situation where all agents agree on the same opinion. Instead, it assumes that people find an opinion reasonable if and only if it is close to their own. The system converges towards a stable situation where agents sharing the same opinion form a cluster, and agents in different clusters do not influence each other. We focus on the social variant of the Hegselmann-Krause model where agents are connected by a social network and their opinions evolve in an iterative process. When activated, an agent adopts the average of the opinions of its neighbors having a similar opinion. By this, the set of influencing neighbors of an agent may change over time. To the best of our knowledge, social Hegselmann-Krause systems with asynchronous opinion updates have only been studied with the complete graph as social network. We show that such opinion dynamics with random agent activation are guaranteed to converge for any social network. We provide an upper bound of \(\mathcal{O}(n |E|^2 (\varepsilon/\delta)^2)\) on the expected number of opinion updates until convergence, where |E| is the number of edges of the social network. For the complete social network we show a bound of \( \mathcal{O}(n^3(n^2 + (\varepsilon/\delta)^2)) \) that represents a major improvement over the previously best upper bound of \( \mathcal{O}(n^9(\varepsilon/\delta)^2) \). Our bounds are complemented by simulations that indicate asymptotically matching lower bounds.
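A single asynchronous update of the social Hegselmann-Krause model can be sketched as follows, assuming opinions are stored in a list and the activated agent averages over itself and its eps-close network neighbors (a common convention; the paper's exact model details may differ):

```python
def hk_update(opinions, neighbors, agent, eps):
    """One asynchronous Hegselmann-Krause step: the activated agent adopts
    the average opinion of itself and its eps-close network neighbors."""
    close = [v for v in neighbors[agent]
             if abs(opinions[v] - opinions[agent]) <= eps]
    close.append(agent)  # the agent's own opinion enters the average
    opinions[agent] = sum(opinions[v] for v in close) / len(close)
    return opinions
```

Note that the set of influencing neighbors is recomputed from the current opinions at every activation, which is exactly why it may change over time.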
Cseh, Ágnes; Friedrich, Tobias; Peters, Jannik. Pareto Optimal and Popular House Allocation with Lower and Upper Quotas. Autonomous Agents and Multi-Agent Systems (AAMAS) 2022: 300–308
In the house allocation problem with lower and upper quotas, we are given a set of applicants and a set of projects. Each applicant has a strictly ordered preference list over the projects, while the projects are equipped with a lower and an upper quota. A feasible matching assigns the applicants to the projects in such a way that a project is either matched to no applicant or to a number of applicants between its lower and upper quota. In this model we study two classic optimality concepts: Pareto optimality and popularity. We show that finding a popular matching is hard even if the maximum lower quota is 2 and that finding a perfect Pareto optimal matching, verifying Pareto optimality, and verifying popularity are all NP-complete even if the maximum lower quota is 3. We complement the last three negative results by showing that the problems become polynomial-time solvable when the maximum lower quota is 2, thereby answering two open questions of Cechlárová and Fleiner. Finally, we also study the parameterized complexity of all four mentioned problems.
Bläsius, Thomas; Friedrich, Tobias; Stangl, David; Weyand, Christopher. An Efficient Branch-and-Bound Solver for Hitting Set. Algorithm Engineering and Experiments (ALENEX) 2022
The hitting set problem asks, for a collection of sets over a universe \(U\), to find a minimum subset of \(U\) that intersects each of the given sets. It is NP-hard and equivalent to the set cover problem. We give a branch-and-bound algorithm to solve hitting set. Though it requires exponential time in the worst case, it can solve many practical instances from different domains in reasonable time. Our algorithm outperforms a modern ILP solver, the state-of-the-art for hitting set, by at least an order of magnitude on most instances.
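The basic branch-and-bound scheme, picking an unhit set and branching on which of its elements enters the solution, with a simple lower bound from pairwise disjoint unhit sets, can be sketched as follows (a toy version; the actual solver uses considerably stronger bounds and reductions):

```python
def min_hitting_set(sets):
    """Exact branch-and-bound for minimum hitting set: every solution must
    contain an element of each unhit set, so branch on one such set."""
    best = [set().union(*sets) if sets else set()]  # trivial feasible incumbent

    def lower_bound(unhit, chosen):
        # Greedily collect pairwise disjoint unhit sets: each of them
        # requires its own additional element in any solution.
        used, lb = set(), 0
        for s in unhit:
            if not (s & used):
                used |= s
                lb += 1
        return len(chosen) + lb

    def branch(chosen):
        unhit = [s for s in sets if not (s & chosen)]
        if not unhit:
            if len(chosen) < len(best[0]):
                best[0] = set(chosen)
            return
        if lower_bound(unhit, chosen) >= len(best[0]):
            return  # cannot beat the incumbent: prune
        for v in min(unhit, key=len):  # branch on a smallest unhit set
            branch(chosen | {v})

    branch(set())
    return best[0]
```

Even this toy version shows the two ingredients the paper engineers: a good incumbent to prune against and a cheap combinatorial lower bound.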
Friedrich, Tobias; Issac, Davis; Kumar, Nikhil; Mallek, Nadym; Zeif, Ziena. A Primal-Dual Algorithm for Multicommodity Flows and Multicuts in Treewidth-2 Graphs. Approximation Algorithms for Combinatorial Optimization Problems (APPROX) 2022
We study the problem of multicommodity flow and multicut in treewidth-2 graphs and prove bounds on the multiflow-multicut gap. In particular, we give a primal-dual algorithm for computing multicommodity flow and multicut in treewidth-2 graphs and prove the following approximate max-flow min-cut theorem: given a treewidth-2 graph, there exists a multicommodity flow of value \(f\) with congestion \(4\), and a multicut of capacity \(c\) such that \(c \leq 20f\). This implies a multiflow-multicut gap of \(80\) and improves upon the previous best known bounds for such graphs. Our algorithm runs in polynomial time when all the edges have capacity one. Our algorithm is completely combinatorial and builds upon the primal-dual algorithm of Garg, Vazirani and Yannakakis for multicut in trees and the augmenting paths framework of Ford and Fulkerson.
Doskoč, Vanja; Kötzing, Timo. Maps of Restrictions for Behaviourally Correct Learning. Computability in Europe (CiE) 2022
In language learning in the limit, we study computable devices (learners) learning formal languages. We consider learning tasks paired with restrictions regarding, for example, the hypotheses made by the learners. We compare such restrictions with each other in order to study their impact and depict the results in overviews, the so-called maps. In the case of explanatory learning, the literature already provides various maps. On the other hand, in the case of behaviourally correct learning, only partial results are known. In this work, we complete these results and provide full behaviourally correct maps for different types of data presentation. In particular, in all studied settings, we observe that monotone learning implies non-U-shaped learning and that cautiousness, semantic conservativeness and weak monotonicity are equally powerful.
Führlich, Pascal; Cseh, Ágnes; Lenzner, Pascal. Improving Ranking Quality and Fairness in Swiss-System Chess Tournaments. ACM Conference on Economics and Computation (EC) 2022
The International Chess Federation (FIDE) imposes a voluminous and complex set of player pairing criteria in Swiss-system chess tournaments and endorses computer programs that are able to calculate the prescribed pairings. The purpose of these formalities is to ensure that players are paired fairly during the tournament and that the final ranking corresponds to the players' true strength order. We contest the official FIDE player pairing routine by presenting alternative pairing rules. These can be enforced by computing maximum weight matchings in a carefully designed graph. We demonstrate by extensive experiments that a tournament format using our mechanism (1) yields fairer pairings in the rounds of the tournament and (2) produces a final ranking that reflects the players' true strengths better than the state-of-the-art FIDE pairing system.
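At the heart of the mechanism is a maximum-weight matching on a graph of possible pairings. The sketch below finds such a matching by brute force on a toy instance; both the exhaustive search and the caller-supplied weight function are illustrative stand-ins for the paper's carefully designed weights and the efficient matching computation:

```python
def score(pairing, weight):
    """Total weight of a pairing, i.e., the matching objective."""
    return sum(weight(a, b) for a, b in pairing)

def best_pairing(players, weight):
    """Maximum-weight perfect matching on an even number of players,
    by exhaustive search: pair the first player with every candidate
    partner and recurse on the remaining players."""
    players = list(players)
    if not players:
        return []
    first, rest = players[0], players[1:]
    best = None
    for i, partner in enumerate(rest):
        sub = best_pairing(rest[:i] + rest[i + 1:], weight)
        if sub is None:
            continue  # no perfect matching on the remainder
        pairing = [(first, partner)] + sub
        if best is None or score(pairing, weight) > score(best, weight):
            best = pairing
    return best
```

For example, with the weight `-|a - b|` on ratings `[1, 2, 3, 4]`, the optimum pairs players of similar strength: `(1, 2)` and `(3, 4)`.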
Bläsius, Thomas; Fischbeck, Philipp. On the External Validity of Average-Case Analyses of Graph Algorithms. European Symposium on Algorithms (ESA) 2022
The number one criticism of average-case analysis is that we do not actually know the probability distribution of real-world inputs. Thus, analyzing an algorithm on some random model has no implications for practical performance. At its core, this criticism doubts the existence of external validity, i.e., it assumes that algorithmic behavior on the somewhat simple and clean models does not translate to practical performance on real-world inputs. With this paper, we provide a first step towards studying the question of external validity systematically. To this end, we evaluate the performance of six graph algorithms on a collection of 2751 sparse real-world networks depending on two properties: the heterogeneity (variance in the degree distribution) and the locality (tendency of edges to connect vertices that are already close). We compare this with the performance on generated networks with varying locality and heterogeneity. We find that the performance in the idealized setting of network models translates surprisingly well to real-world networks. Moreover, heterogeneity and locality appear to be the core properties impacting the performance of many graph algorithms.
Baguley, Samuel; Friedrich, Tobias; Kötzing, Timo; Li, Xiaoyue; Pappik, Marcus; Zeif, Ziena. Analysis of a Gray-Box Operator for Vertex Cover. Genetic and Evolutionary Computation Conference (GECCO) 2022
Combinatorial optimization problems are a prominent application area of evolutionary algorithms, where the (1+1) EA is one of the most investigated. We extend this algorithm by introducing some problem knowledge with a specialized mutation operator which works under the assumption that the number of 1s of a solution is critical, as frequently happens in combinatorial optimization. This slight modification increases the chance to correct wrongly placed bits while preserving the simplicity and problem independence of the (1+1) EA. As an application of our algorithm we examine the vertex cover problem on certain instances, where we show that it leads to asymptotically better runtimes and even finds optimal solutions with higher probability than the usual (1+1) EA. More precisely, we compare the performance of both algorithms on paths and on complete bipartite graphs of size \(n\). Regarding the path we prove that, for a particular initial configuration, the (1+1) EA takes in expectation \(\Theta(n^4)\) iterations while the modification reduces this to \(\Theta(n^3)\), and present experimental evidence that such a configuration is reached. Concerning the complete bipartite graph, our modification finds the optimum in polynomial time with probability \(1-1/2^{\Omega(n^\xi)}\) for every positive constant \(\xi < 1\), which improves the known probability of \(1-1/\)poly\((n)\) for the (1+1) EA.
Angrick, Sebastian; Bals, Ben; Hastrich, Niko; Kleissl, Maximilian; Schmidt, Jonas; Doskoč, Vanja; Katzmann, Maximilian; Molitor, Louise; Friedrich, Tobias. Towards Explainable Real Estate Valuation via Evolutionary Algorithms. Genetic and Evolutionary Computation Conference (GECCO) 2022
Human lives are increasingly influenced by algorithms, which therefore need to meet higher standards not only in accuracy but also with respect to explainability. This is especially true for high-stakes areas such as real estate valuation. Unfortunately, the methods applied there often exhibit a trade-off between accuracy and explainability. One explainable approach is case-based reasoning (CBR), where each decision is supported by specific previous cases. However, such methods can be wanting in accuracy. The unexplainable machine learning approaches are often observed to provide higher accuracy but are not scrutable in their decision-making. In this paper, we apply evolutionary algorithms (EAs) to CBR predictors in order to improve their performance. In particular, we deploy EAs to the similarity functions (used in CBR to find comparable cases), which are fitted to the data set at hand. As a consequence, we achieve higher accuracy than state-of-the-art deep neural networks (DNNs), while keeping interpretability and explainability. These results stem from our empirical evaluation on a large data set of real estate offers where we compare known similarity functions, their EA-improved counterparts, and DNNs. Surprisingly, DNNs are only on par with standard CBR techniques. However, using EA-learned similarity functions does yield an improved performance.
Friedrich, Tobias; Kötzing, Timo; Radhakrishnan, Aishwarya; Schiller, Leon; Schirneck, Martin; Tennigkeit, Georg; Wietheger, Simon. Crossover for Cardinality Constrained Optimization. Genetic and Evolutionary Computation Conference (GECCO) 2022
Best Paper Award (Theory Track) Nomination
In order to understand better how and why crossover can benefit optimization, we consider pseudo-Boolean functions with an upper bound \(B\) on the number of 1s allowed in the bit string (cardinality constraint). We consider the natural translation of the OneMax test function, a linear function where \(B\) bits have a weight of \(1+\varepsilon\) and the remaining bits have a weight of \(1\). The literature gives a bound of \(\Theta(n^2)\) for the (1+1) EA on this function. Part of the difficulty when optimizing this problem lies in having to improve individuals meeting the cardinality constraint by flipping both a 1 and a 0. The experimental literature proposes balanced operators, preserving the number of 1s, as a remedy. We show that a balanced mutation operator optimizes the problem in \(O(n \log n)\) if \(n-B = O(1)\). However, if \(n-B = \Theta(n)\), we show a bound of \(\Omega(n^2)\), just as for classic bit flip mutation. Crossover and a simple island model give \(O(n^2 / \log n)\) (uniform crossover) and \(O(n\sqrt{n})\) (3-ary majority vote crossover). For balanced uniform crossover with Hamming distance maximization for diversity we show a bound of \(O(n \log n)\). As an additional contribution we analyze and discuss different balanced crossover operators from the literature.
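A balanced mutation operator in the spirit discussed here swaps a random 1 with a random 0 and thus never changes the number of 1s. The sketch below is a simplified illustration, not necessarily the exact operator analyzed in the paper:

```python
import random

def balanced_mutation(bits, rng=random):
    """Swap a uniformly random 1 with a uniformly random 0; the number
    of 1s, and hence a cardinality constraint, is preserved."""
    ones = [i for i, b in enumerate(bits) if b == 1]
    zeros = [i for i, b in enumerate(bits) if b == 0]
    if not ones or not zeros:
        return bits[:]          # all-0 or all-1 strings admit no swap
    child = bits[:]
    child[rng.choice(ones)] = 0
    child[rng.choice(zeros)] = 1
    return child
```

Because a swap always flips exactly one 1 and one 0, an offspring of a feasible parent is feasible again, which is the point of balanced operators.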
Friedrich, Tobias; Gawendowicz, Hans; Lenzner, Pascal; Melnichenko, Anna. Social Distancing Network Creation. International Colloquium on Automata, Languages and Programming (ICALP) 2022
During a pandemic, people have to find a trade-off between meeting others and staying safely at home. While meeting others is pleasant, it also increases the risk of infection. We consider this dilemma by introducing a game-theoretic network creation model in which selfish agents can form bilateral connections. They benefit from network neighbors, but at the same time, they want to maximize their distance to all other agents. This models the inherent conflict that social distancing rules impose on the behavior of selfish agents in a social network. Besides addressing this familiar issue, our model can be seen as the inverse to the well-studied Network Creation Game by Fabrikant et al. [PODC 2003], where agents aim at being as central as possible in the created network. Thus, our work is in line with studies that compare minimization problems with their maximization versions. We look at two variants of network creation governed by social distancing. In the first variant, there are no restrictions on the connections being formed. We characterize optimal and equilibrium networks, and we derive asymptotically tight bounds on the Price of Anarchy and Price of Stability. The second variant is the model's generalization that allows restrictions on the connections that can be formed. As our main result, we prove that Swap-Maximal Routing-Cost Spanning Trees, an efficiently computable weaker variant of Maximum Routing-Cost Spanning Trees, actually resemble equilibria for a significant range of the parameter space. Moreover, we give almost tight bounds on the Price of Anarchy and Price of Stability. These results imply that, compared to the well-studied inverse models, under social distancing the agents' selfish behavior has a significantly stronger impact on the quality of the equilibria, i.e., it allows socially much worse stable states.
Bilò, Davide; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Schirneck, Martin. Deterministic Sensitivity Oracles for Diameter, Eccentricities and All Pairs Distances. International Colloquium on Automata, Languages and Programming (ICALP) 2022: 68:1–68:19
We construct data structures for extremal and pairwise distances in directed graphs in the presence of transient edge failures. Henzinger et al. (ITCS 2017) initiated the study of fault-tolerant (sensitivity) oracles for the diameter and vertex eccentricities. We extend this with a special focus on space efficiency. We present several new data structures, among them the first fault-tolerant eccentricity oracle for dual failures in subcubic space. We further prove lower bounds that show limits to approximation vs. space and diameter vs. space trade-offs for fault-tolerant oracles. They highlight key differences between data structures for undirected and directed graphs. Initially, our oracles are randomized leaning on a sampling technique frequently used in sensitivity analysis. Building on the work of Alon, Chechik, and Cohen (ICALP 2019) as well as Karthik and Parter (SODA 2021), we develop a hierarchical framework to derandomize fault-tolerant data structures. We first apply it to our own diameter/eccentricity oracles and then show its usefulness by derandomizing algorithms from the literature: the distance sensitivity oracle of Ren (JCSS 2022) and the Single-Source Replacement Path algorithm of Chechik and Magen (ICALP 2020).
Böther, Maximilian; Kißig, Otto; Taraz, Martin; Cohen, Sarel; Seidel, Karen; Friedrich, Tobias. What’s Wrong with Deep Learning in Tree Search for Combinatorial Optimization. International Conference on Learning Representations (ICLR) 2022
Combinatorial optimization lies at the core of many real-world problems. Especially since the rise of graph neural networks (GNNs), the deep learning community has been developing solvers that derive solutions to NP-hard problems by learning the problem-specific solution structure. However, reproducing the results of these publications proves to be difficult. We make three contributions. First, we present an open-source benchmark suite for the NP-hard Maximum Independent Set problem, in both its weighted and unweighted variants. The suite offers a unified interface to various state-of-the-art traditional and machine learning-based solvers. Second, using our benchmark suite, we conduct an in-depth analysis of the popular guided tree search algorithm by Li et al. [NeurIPS 2018], testing various configurations on small and large synthetic and real-world graphs. By re-implementing their algorithm with a focus on code quality and extensibility, we show that the graph convolution network used in the tree search does not learn a meaningful representation of the solution structure, and can in fact be replaced by random values. Instead, the tree search relies on algorithmic techniques like graph kernelization to find good solutions. Thus, the results from the original publication are not reproducible. Third, we extend the analysis to compare the tree search implementations to other solvers, showing that the classical algorithmic solvers often are faster, while providing solutions of similar quality. Additionally, we analyze a recent solver based on reinforcement learning and observe that for this solver, the GNN is responsible for the competitive solution quality.
Bilò, Davide; Bilò, Vittorio; Lenzner, Pascal; Molitor, Louise. Tolerance is Necessary for Stability: Single-Peaked Swap Schelling Games. International Joint Conference on Artificial Intelligence (IJCAI) 2022
Residential segregation in metropolitan areas is a phenomenon that can be observed all over the world. Recently, this was investigated via game-theoretic models. There, selfish agents of two types are equipped with a monotone utility function that ensures higher utility if an agent has more same-type neighbors. The agents strategically choose their location on a given graph that serves as residential area to maximize their utility. However, sociological polls suggest that real-world agents are actually favoring mixed-type neighborhoods, and hence should be modeled via non-monotone utility functions. To address this, we study Swap Schelling Games with single-peaked utility functions. Our main finding is that tolerance, i.e., agents favoring fifty-fifty neighborhoods or being in the minority, is necessary for equilibrium existence on almost regular or bipartite graphs. Regarding the quality of equilibria, we derive (almost) tight bounds on the Price of Anarchy and the Price of Stability. In particular, we show that the latter is constant on bipartite and almost regular graphs.
Bullinger, Martin; Lenzner, Pascal; Melnichenko, Anna. Network Creation with Homophilic Agents. International Joint Conference on Artificial Intelligence (IJCAI) 2022
Network Creation Games are an important framework for understanding the formation of real-world networks such as social networks. These games usually assume a set of indistinguishable agents strategically buying edges at a uniform price leading to a network among them. However, in real life, agents are heterogeneous and their relationships often display a bias towards similar agents, say of the same ethnic group. This homophilic behavior on the agent level can then lead to the emergent global phenomenon of social segregation. We initiate the study of Network Creation Games with multiple types of homophilic agents and non-uniform edge cost. Specifically, we introduce and compare two models, focusing on the perception of same-type and different-type neighboring agents, respectively. Despite their different initial conditions, both our theoretical and experimental analysis show that the resulting equilibrium networks are almost identical in the two models, indicating a robust structure of social networks under homophily. Moreover, we investigate the segregation strength of the formed networks and thereby offer new insights on understanding segregation.
Bilò, Davide; Casel, Katrin; Choudhary, Keerti; Cohen, Sarel; Friedrich, Tobias; Lagodzinski, J.A. Gregor; Schirneck, Martin; Wietheger, Simon. Fixed-Parameter Sensitivity Oracles. Innovations in Theoretical Computer Science (ITCS) 2022: 23:1–23:18
We combine ideas from distance sensitivity oracles (DSOs) and fixed-parameter tractability (FPT) to design sensitivity oracles for FPT graph problems. An oracle with sensitivity \(f\) for an FPT problem \(\Pi\) on a graph \(G\) with parameter \(k\) preprocesses \(G\) in time \(O(g(f,k) \cdot \mathrm{poly}(n))\). When queried with a set \(F\) of at most \(f\) edges of \(G\), the oracle reports the answer to \(\Pi\), with the same parameter \(k\), on the graph \(G-F\), i.e., \(G\) deprived of \(F\). The oracle should answer queries in a time that is significantly faster than merely running the best-known FPT algorithm on \(G-F\) from scratch. We design sensitivity oracles for the \(k\)-Path and the \(k\)-Vertex Cover problem. Our first oracle for \(k\)-Path has size \(O(k^{f+1})\) and query time \(O(f \min\{f, \log (f) + k\})\). We use a technique inspired by the work of Weimann and Yuster [FOCS 2010, TALG 2013] on distance sensitivity problems to reduce the space to \(O\big(\big(\frac{f+k}{f}\big)^f \big(\frac{f+k}{k}\big)^k fk \cdot \log n\big)\) at the expense of increasing the query time to \(O\big(\big(\frac{f+k}{f}\big)^f \big(\frac{f+k}{k}\big)^k f \min\{f,k\} \cdot \log n \big)\). Both oracles can be modified to handle vertex failures, but we need to replace \(k\) with \(2k\) in all the claimed bounds. Regarding \(k\)-Vertex Cover, we design three oracles offering different trade-offs between the size and the query time. The first oracle takes \(O(3^{f+k})\) space and has \(O(2^f)\) query time, the second one has a size of \(O(2^{f+k^2+k})\) and a query time of \(O(f{+}k^2)\); finally, the third one takes \(O(fk+k^2)\) space and can be queried in time \(O(1.2738^k + fk^2)\). All our oracles are computable in time (at most) proportional to their size and the time needed to detect a \(k\)-path or \(k\)-vertex cover, respectively.
We also provide an interesting connection between \(k\)-Vertex Cover and the fault-tolerant shortest path problem, by giving a DSO of size \(O(\mathrm{poly}(f,k) \cdot n)\) with query time in \(O(\mathrm{poly}(f,k))\), where \(k\) is the size of a vertex cover. Following our line of research connecting fault-tolerant FPT and shortest paths problems, we introduce parameterization to the computation of distance preservers. Given a graph with a fixed source \(s\) and parameters \(f\), \(k\), we study the problem of constructing polynomial-sized oracles that efficiently report, for any target vertex \(v\) and set \(F\) of at most \(f\) edge failures, whether the distance from \(s\) to \(v\) increases by at most an additive term of \(k\) in \(G-F\). The oracle size is \(O(2^k k^2 \cdot n)\), while the time needed to answer a query is \(O(2^k f^\omega k^\omega)\), where \(\omega<2.373\) is the matrix multiplication exponent. The second problem we study is the construction of bounded-stretch fault-tolerant preservers. We construct a subgraph with \(O(2^{fk+f+k} k \cdot n)\) edges that preserves those \(s\)-\(v\)-distances that do not increase by more than \(k\) upon failure of \(F\). This improves significantly over the \( \tilde{O} (f n^{2-\frac{1}{2^f}}) \) bound in the unparameterized case by Bodwin et al. [ICALP 2017].
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Rajabi, Amirhossein Escaping Local Optima With Local Search: A Theory-Driven Discussion. Parallel Problem Solving from Nature (PPSN) 2022
Local search is the most basic strategy in optimization settings when no specific problem knowledge is employed. While this strategy finds good solutions for certain optimization problems, it generally suffers from getting stuck in local optima. This stagnation can be avoided if local search is modified. Depending on the optimization landscape, different modifications vary in their success. We discuss several features of optimization landscapes and give analyses as examples for how they affect the performance of modifications of local search. We consider modifying random local search by restarting it and by considering larger search radii. The landscape features we analyze include the number of local optima, the distance between different optima, as well as the local landscape around a local optimum. For each feature, we show which modifications of local search handle them well and which do not.
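The two modifications discussed above can be illustrated in code. The following is a minimal sketch (our own illustration, not the paper's formal framework): random local search on bit strings with a configurable search radius and a restart wrapper. The Jump landscape, a standard theory benchmark with a fitness gap of width \(k\) in front of the global optimum, serves as an example where radius-1 search stagnates but a larger radius can escape.

```python
import random

def jump(x, k):
    """Jump_k benchmark: like counting one-bits, but with a fitness gap of
    width k just below the all-ones global optimum."""
    n, ones = len(x), sum(x)
    if ones == n or ones <= n - k:
        return k + ones
    return n - ones  # inside the gap: worse the closer to all-ones

def rls(f, n, radius, iters, rng):
    """Random local search: flip `radius` random bits, keep non-worsening moves."""
    x = [rng.randrange(2) for _ in range(n)]
    fx = f(x)
    for _ in range(iters):
        y = x[:]
        for i in rng.sample(range(n), radius):
            y[i] ^= 1
        fy = f(y)
        if fy >= fx:
            x, fx = y, fy
    return x, fx

def rls_with_restarts(f, n, radius, restarts, iters, rng):
    """Restart modification: independent runs, return the best result found."""
    return max((rls(f, n, radius, iters, rng) for _ in range(restarts)),
               key=lambda pair: pair[1])

# Example: on Jump_3 with n = 12, radius-1 search typically stalls at a
# local optimum with n-3 one-bits, while radius-3 search can flip the
# three remaining zero-bits in a single move.
rng = random.Random(0)
_, f_radius1 = rls(lambda b: jump(b, 3), 12, 1, 3000, rng)
_, f_radius3 = rls(lambda b: jump(b, 3), 12, 3, 3000, rng)
```

With this budget the larger radius usually reaches the global optimum on Jump, matching the intuition that the right modification depends on the landscape feature at hand (here, the gap width).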
Bläsius, Thomas; Fischbeck, Philipp; Gottesbüren, Lars; Hamann, Michael; Heuer, Tobias; Spinner, Jonas; Weyand, Christopher; Wilhelm, Marcus A Branch-and-Bound Algorithm for Cluster Editing. Symposium on Experimental Algorithms (SEA) 2022
The cluster editing problem asks to transform a given graph into a disjoint union of cliques by inserting and deleting as few edges as possible. We describe and evaluate an exact branch-and-bound algorithm for cluster editing. For this, we introduce new reduction rules and adapt existing ones. Moreover, we generalize a known packing technique to obtain lower bounds and experimentally show that it contributes significantly to the performance of the solver. Our experiments further evaluate the effectiveness of the different reduction rules and examine the effects of structural properties of the input graph on solver performance. Our solver won the exact track of the 2021 PACE challenge.
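The branching core underlying such solvers can be sketched compactly. This is not the paper's solver (which adds reduction rules and packing lower bounds on top), but the textbook \(O(3^k)\) branching it refines: a graph is a cluster graph iff it has no induced path on three vertices (a P3), and any solution must edit one of the three vertex pairs of each P3.

```python
from itertools import combinations

def find_p3(vertices, edges):
    """Find an induced P3: u-v-w with edges uv, vw and non-edge uw, or None.
    Edges are represented as a set of frozenset pairs."""
    for v in vertices:
        nbrs = [u for u in vertices if frozenset((u, v)) in edges]
        for u, w in combinations(nbrs, 2):
            if frozenset((u, w)) not in edges:
                return u, v, w
    return None

def cluster_editing(vertices, edges, k):
    """Decide whether at most k edge edits turn the graph into a disjoint
    union of cliques. Classic 3-way branching on an induced P3."""
    p3 = find_p3(vertices, edges)
    if p3 is None:          # already a cluster graph
        return True
    if k == 0:
        return False
    u, v, w = p3
    return (cluster_editing(vertices, edges - {frozenset((u, v))}, k - 1)
            or cluster_editing(vertices, edges - {frozenset((v, w))}, k - 1)
            or cluster_editing(vertices, edges | {frozenset((u, w))}, k - 1))
```

Each branch destroys the found P3 by deleting one of its two edges or inserting the missing one, giving a search tree of size at most \(3^k\); reduction rules and lower bounds, as in the paper, prune this tree further.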
Cohen, Sarel; Fischbeck, Philipp; Friedrich, Tobias; Krejca, Martin S.; Sauerwald, Thomas Accelerated Information Dissemination on Networks with Local and Global Edges. Structural Information and Communication Complexity (SIROCCO) 2022: 79–97
Bootstrap percolation is a classical model for the spread of information in a network. In the round-based version, nodes of an undirected graph become active once at least \(r\) neighbors were active in the previous round. We propose the perturbed percolation process: a superposition of two percolation processes on the same node set. One process acts on a local graph with activation threshold \(1\), the other acts on a global graph with threshold \(r\), representing local and global edges, respectively. We consider grid-like local graphs and expanders as global graphs on \(n\) nodes. For the extreme case \(r = 1\), all nodes are active after \(O(\log n)\) rounds, while the process spreads only polynomially fast for the other extreme case \(r \geq n\). For a range of suitable values of \(r\), we prove that the process exhibits both phases of the above extremes: it starts with polynomial growth and eventually transitions from at most \(cn\) to \(n\) active nodes, for some constant \(c \in (0, 1)\), in \(O(\log n)\) rounds. We observe this behavior also empirically, considering additional global-graph models.
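The round-based process is easy to simulate. The sketch below is our own illustration of the model, not the paper's analysis: a cycle stands in for the grid-like local graph and a sparse Erdős–Rényi graph for the expander-like global graph.

```python
import random

def perturbed_percolation(local_adj, global_adj, seeds, r, max_rounds=10**6):
    """Round-based perturbed percolation: an inactive node becomes active if,
    in the previous round, it had at least 1 active local neighbor or at
    least r active global neighbors. Returns (active set, rounds needed)."""
    active = set(seeds)
    for rounds in range(1, max_rounds + 1):
        newly = {v for v in range(len(local_adj))
                 if v not in active
                 and (any(u in active for u in local_adj[v])
                      or sum(u in active for u in global_adj[v]) >= r)}
        if not newly:
            return active, rounds - 1
        active |= newly
    return active, max_rounds

# Hypothetical instance: cycle as local graph, sparse random global graph.
n, rng = 200, random.Random(7)
local = [{(v - 1) % n, (v + 1) % n} for v in range(n)]
global_g = [set() for _ in range(n)]
for u in range(n):
    for w in range(u + 1, n):
        if rng.random() < 8 / n:
            global_g[u].add(w)
            global_g[w].add(u)
active, rounds = perturbed_percolation(local, global_g, {0}, r=3)
# The local cycle alone would need n/2 rounds; global edges only accelerate.
```

Varying \(r\) in such simulations reproduces the two phases described above: slow polynomial growth driven by local edges, followed by a rapid takeover once enough nodes meet the global threshold.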
Bazgan, Cristina; Casel, Katrin; Cazals, Pierre Dense Graph Partitioning on sparse and dense graphs. Scandinavian Workshop Algorithm Theory (SWAT) 2022
We consider the problem of partitioning a graph into a non-fixed number of non-overlapping subgraphs of maximum density. The density of a partition is the sum of the densities of the subgraphs, where the density of a subgraph is its average degree, that is, the ratio of its number of edges and its number of vertices. This problem, called Dense Graph Partition, is known to be NP-hard on general graphs and polynomial-time solvable on trees, and polynomial-time 2-approximable. In this paper we study the restriction of Dense Graph Partition to particular sparse and dense graph classes. In particular, we prove that it is NP-hard on dense bipartite graphs as well as on cubic graphs. On dense graphs on \(n\) vertices, it is polynomial-time solvable on graphs with minimum degree \(n-3\) and NP-hard on \((n-4)\)-regular graphs. We prove that it is polynomial-time 4/3-approximable on cubic graphs and admits an efficient polynomial-time approximation scheme on graphs of minimum degree \(n-t\) for any constant \(t\geq 4\).
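To make the objective concrete, here is a small sketch (our own toy example, not from the paper) computing the density of a partition under the definition above, i.e., summing edges-per-vertices over the parts.

```python
def density(edges, partition):
    """Density of a partition: sum over parts of (#edges inside the part)
    divided by (#vertices in the part)."""
    total = 0.0
    for part in partition:
        part = set(part)
        inner = sum(1 for u, v in edges if u in part and v in part)
        total += inner / len(part)
    return total

# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the
# bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_parts = density(edges, [{0, 1, 2}, {3, 4, 5}])  # 3/3 + 3/3 = 2.0
one_part = density(edges, [set(range(6))])          # 7/6, about 1.17
```

Splitting into the two triangles beats keeping the graph whole (2.0 versus 7/6), illustrating why the number of parts is not fixed in advance: the objective itself rewards carving out dense pieces.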
Schirneck, Martin Enumeration Algorithms in Data Profiling. PhD Thesis, Hasso Plattner Institute, University of Potsdam 2022
Data profiling is the extraction of metadata from relational databases. Multi-column dependencies are an important class of metadata. They are associated with two computational tasks. The detection problem is to decide whether a dependency of a given type and size holds in a database. The discovery problem instead asks to enumerate all valid dependencies of that type. We investigate the two problems for three types of dependencies: unique column combinations (UCCs), functional dependencies (FDs), and inclusion dependencies (INDs). We first treat the parameterized complexity of the detection variants. We prove that the detection of UCCs and FDs, respectively, is W[2]-complete when parameterized by the size of the dependency. The detection of INDs is shown to be one of the first natural W[3]-complete problems. We further settle the enumeration complexity of the three discovery problems by presenting parsimonious equivalences with well-known enumeration problems. Namely, the discovery of UCCs is equivalent to the famous transversal hypergraph problem of enumerating the hitting sets of a hypergraph. The discovery of FDs is equivalent to the simultaneous enumeration of the hitting sets of multiple input hypergraphs. Finally, the discovery of INDs is shown to be equivalent to enumerating the satisfying assignments of antimonotone, 3-normalized Boolean formulas. In the remainder of the thesis, we design and analyze discovery algorithms for unique column combinations. Since this is as hard as the general transversal hypergraph problem, it is an open question whether the UCCs of a database can be computed in output-polynomial time in the worst case. For the analysis, we therefore focus on instances that are structurally close to databases in practice, most notably, inputs that have small solutions. The equivalence between UCCs and hitting sets transfers the computational hardness, but also allows us to apply ideas from hypergraph theory to data profiling.
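The equivalence between UCCs and hitting sets can be illustrated with a small sketch (our own toy example, not the thesis's algorithms): for every pair of rows, collect the columns on which they differ; a column set is a UCC exactly when it hits all of these difference sets, so the minimal UCCs are the minimal hitting sets of this hypergraph. The brute-force enumeration below is exponential in the number of columns, unlike the output-sensitive methods developed in the thesis.

```python
from itertools import combinations

def difference_hypergraph(rows):
    """For every pair of rows, the set of columns on which they differ.
    A column set is a UCC iff it intersects (hits) all of these sets."""
    ncols = len(rows[0])
    return {frozenset(c for c in range(ncols) if r1[c] != r2[c])
            for r1, r2 in combinations(rows, 2)}

def minimal_hitting_sets(edges, ncols):
    """Brute-force enumeration of all minimal hitting sets; by the
    equivalence above, these are exactly the minimal UCCs."""
    hitting = [set(cand) for size in range(ncols + 1)
               for cand in combinations(range(ncols), size)
               if all(set(cand) & e for e in edges)]
    # keep only sets with no proper subset that is itself hitting
    return [h for h in hitting if not any(g < h for g in hitting)]

# Hypothetical toy relation with columns (name, city, id).
rows = [("alice", "berlin", 1),
        ("bob", "berlin", 1),
        ("alice", "potsdam", 2)]
uccs = minimal_hitting_sets(difference_hypergraph(rows), 3)
# -> the minimal UCCs are {name, city} and {name, id}.
```

Note that two identical rows produce an empty difference set, which no column set can hit, correctly reflecting that such a relation has no UCC at all.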
We devise a discovery algorithm that runs in polynomial space on arbitrary inputs and achieves polynomial delay whenever the maximum size of any minimal UCC is bounded. Central to our approach is the extension problem for minimal hitting sets, that is, to decide for a set of vertices whether they are contained in any minimal solution. We prove that this is yet another problem that is complete for the complexity class W[3], when parameterized by the size of the set that is to be extended. We also give several conditional lower bounds under popular hardness conjectures such as the Strong Exponential Time Hypothesis (SETH). The lower bounds suggest that the running time of our algorithm for the extension problem is close to optimal. We further conduct an empirical analysis of our discovery algorithm on real-world databases to confirm that the hitting set perspective on data profiling also has merits in practice. We show that the resulting enumeration times undercut their theoretical worst-case bounds on practical data, and that the memory consumption of our method is much smaller than that of previous solutions. During the analysis we make two observations about the connection between databases and their corresponding hypergraphs. On the one hand, the hypergraph representations containing all relevant information are usually significantly smaller than the original inputs. On the other hand, obtaining those hypergraphs is the actual bottleneck of any practical application. The latter often takes much longer than enumerating the solutions, which is in stark contrast to the fact that the preprocessing is guaranteed to be polynomial while the enumeration may take exponential time. To make the first observation rigorous, we introduce a maximum-entropy model for non-uniform random hypergraphs and prove that their expected number of minimal hyperedges undergoes a phase transition with respect to the total number of edges.
The result also explains why larger databases may have smaller hypergraphs. Motivated by the second observation, we present a new kind of UCC discovery algorithm called Hitting Set Enumeration with Partial Information and Validation (HPIValid). It exploits the fast enumeration times observed in practice to speed up the computation of the corresponding hypergraph. This way, we sidestep the bottleneck while maintaining the advantages of the hitting set perspective. An exhaustive empirical evaluation shows that HPIValid outperforms the current state of the art in UCC discovery. It is capable of processing databases that were previously out of reach for data profiling.