Cseh, Ágnes; Fleiner, TamásThe Complexity of Cake Cutting with Unequal Shares. ACM Transactions on Algorithms 2020: 1-21
An unceasing problem of our prevailing society is the fair division of goods. The problem of proportional cake cutting focuses on dividing a heterogeneous and divisible resource, the cake, among n players who value pieces according to their own measure function. The goal is to assign each player a not necessarily connected part of the cake that the player evaluates at least as much as her proportional share. In this article, we investigate the problem of proportional division with unequal shares, where each player is entitled to receive a predetermined portion of the cake. Our main contribution is threefold. First we present a protocol for integer demands, which delivers a proportional solution in fewer queries than all known proto- cols. By giving a matching lower bound, we then show that our protocol is asymptotically the fastest possible. Finally, we turn to irrational demands and solve the proportional cake cutting problem by reducing it to the same problem with integer demands only. All results remain valid in a highly general cake cutting model, which can be of independent interest.
Shi, Feng; Schirneck, Martin; Friedrich, Tobias; Kötzing, Timo; Neumann, FrankCorrection to: Reoptimization Time Analysis of Evolutionary Algorithms on Linear Functions Under Dynamic Uniform Constraints. Algorithmica 2020: 3117-3123
In the article "Reoptimization Time Analysis of Evolutionary Algorithms on Linear Functions Under Dynamic Uniform Constraints", we claimed a worst-case runtime of \(O(nD \log D)\) and \(O(nD)\) for the Multi-Objective Evolutionary Algorithm and the Multi-Objective \((\mu+(\lambda, \lambda))\) Genetic Algorithm, respectively, on linear profit functions under dynamic uniform constraint, where \(D = |B − B^*|\) denotes the difference between the original constraint bound \(B\) and the new one \(B^*\) . The technique used to prove these results contained an error. We correct this mistake and show a weaker bound of \(O(nD^2)\) for both algorithms instead.
Chauhan, Ankit; Friedrich, Tobias; Rothenberger, RalfGreed is Good for Deterministic Scale-Free Networks. Algorithmica 2020: 3338–3389
Large real-world networks typically follow a power-law degree distribution. To study such networks, numerous random graph models have been proposed. However, real-world networks are not drawn at random. Therefore, Brach, Cygan, Lacki, and Sankowski [SODA 2016] introduced two natural deterministic conditions: (1) a power-law upper bound on the degree distribution (PLB-U) and (2) power-law neighborhoods, that is, the degree distribution of neighbors of each vertex is also upper bounded by a power law (PLB-N). They showed that many real-world networks satisfy both deterministic properties and exploit them to design faster algorithms for a number of classical graph problems. We complement the work of Brach et al. by showing that some well-studied random graph models exhibit both the mentioned PLB properties and additionally also a power-law lower bound on the degree distribution (PLB-L). All three properties hold with high probability for Chung-Lu Random Graphs and Geometric Inhomogeneous Random Graphs and almost surely for Hyperbolic Random Graphs. As a consequence, all results of Brach et al. also hold with high probability or almost surely for those random graph classes. In the second part of this work we study three classical NP-hard combinatorial optimization problems on PLB networks. It is known that on general graphs with maximum degree \(\Delta\), a greedy algorithm, which chooses nodes in the order of their degree, only achieves a \(\Omega(\ln \Delta)\)-approximation for Minimum Vertex Cover and Minimum Dominating Set, and a \(\Omega(\Delta)\)-approximation for Maximum Independent Set. We prove that the PLB-U property suffices for the greedy approach to achieve a constant-factor approximation for all three problems. We also show that all three combinatorial optimization problems are APX-complete even if all PLB-properties holds hence, PTAS cannot be expected unless P=NP.
Casel, Katrin; Dreier, Jan; Fernau, Henning; Gobbert, Moritz; Kuinke, Philipp; Sanchez Villaamil, Fernando; Schmid, Markus L.; van Leeuwen, Erik JanComplexity of independency and cliquy trees. Discrete Applied Mathematics 2020: 2-15
An independency (cliquy) tree of an \(n\)-vertex graph \(G\) is a spanning tree of \(G\) in which the set of leaves induces an independent set (clique). We study the problems of minimizing or maximizing the number of leaves of such trees, and fully characterize their parameterized complexity. We show that all four variants of deciding if an independency/cliquy tree with at least/most \(\ell\) leaves exists parameterized by \(\ell\) are either para-NP- or W[1]-hard. We prove that minimizing the number of leaves of a cliquy tree parameterized by the number of internal vertices is para-NP-hard too. However, we show that minimizing the number of leaves of an independency tree parameterized by the number \(k\) of internal vertices has an \(O^*(4^k)\)-time algorithm and a \(2k\) vertex kernel. Moreover, we prove that maximizing the number of leaves of an independency/cliquy tree parameterized by the number \(k\) of internal vertices both have an \(O^*(18^k)\)-time algorithm and an \(O(k\, 2^k)\) vertex kernel, but no polynomial kernel unless the polynomial hierarchy collapses to the third level. Finally, we present an \(O(3^n f(n))\)-time algorithm to find a spanning tree where the leaf set has a property that can be decided in \(f(n)\) time and has minimum or maximum size.
Bazgan, Cristina; Brankovic, Ljiljana; Casel, Katrin; Fernau, HenningDomination chain: Characterisation, classical complexity, parameterised complexity and approximability. Discrete Applied Mathematics 2020: 23-42
We survey the most important results regarding the domination chain parameters, including the characterisation of the domination sequence, complexity of exact and parameterised algorithms, and approximation and inapproximability ratios. We provide a number of new results for the upper and lower irredundance and their complements, and a few results for other domination chain problems. In particular, we analyse the structure of maximum irredundant sets and we provide new bounds on the upper irredundance number. We show several approximability results for upper and lower irredundance and their complements on general graphs; all four problems remain NP-hard even on planar cubic graphs and APX-hard on cubic graphs. Finally, we give some results on everywhere dense graphs, and study some related extension problems.
Cseh, Ágnes; Heeger, KlausThe stable marriage problem with ties and restricted edges. Discrete Optimization 2020: 100571
In the stable marriage problem, a set of men and a set of women are given, each of whom has a strictly ordered preference list over the acceptable agents in the opposite class. A matching is called stable if it is not blocked by any pair of agents, who mutually prefer each other to their respective partner. Ties in the preferences allow for three different definitions for a stable matching: weak, strong and super-stability. Besides this, acceptable pairs in the instance can be restricted in their ability of blocking a matching or being part of it, which again generates three categories of restrictions on acceptable pairs. Forced pairs must be in a stable matching, forbidden pairs must not appear in it, and lastly, free pairs cannot block any matching. Our computational complexity study targets the existence of a stable solution for each of the three stability definitions, in the presence of each of the three types of restricted pairs. We solve all cases that were still open. As a byproduct, we also derive that the maximum size weakly stable matching problem is hard even in very dense graphs, which may be of independent interest.
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian; Krohmer, AntonHyperbolic Embeddings for Near-Optimal Greedy Routing. Journal of Experimental Algorithmics (JEA) 2020
Greedy routing computes paths between nodes in a network by successively moving to the neighbor closest to the target with respect to coordinates given by an embedding into some metric space. Its advantage is that only local information is used for routing decisions. We present different algorithms for generating graph embeddings into the hyperbolic plane that are well suited for greedy routing. In particular, our embeddings guarantee that greedy routing always succeeds in reaching the target, and we try to minimize the lengths of the resulting greedy paths. We evaluate our algorithm on multiple generated and real-world networks. For networks that are generally assumed to have a hidden underlying hyperbolic geometry, such as the Internet graph, we achieve near-optimal results (i.e., the resulting greedy paths are only slightly longer than the corresponding shortest paths). In the case of the Internet graph, they are only \(6\%\) longer when using our best algorithm, which greatly improves upon the previous best known embedding, whose creation required substantial manual intervention. In addition to measuring the stretch, we empirically evaluate our algorithms regarding the size of the coordinates of the resulting embeddings and observe how it impacts the success rate when coordinates are not represented with very high precision. Since numerical difficulties are a major issue when performing computations in the hyperbolic plane, we consider variations of our algorithm that improve the success rate when using coordinates with lower precision.
Birnick, Johann; Bläsius, Thomas; Friedrich, Tobias; Naumann, Felix; Papenbrock, Thorsten; Schirneck, MartinHitting Set Enumeration with Partial Information for Unique Column Combination Discovery. Proceedings of the VLDB Endowment 2020: 2270 - 2283
Unique column combinations (UCCs) are a fundamental concept in relational databases. They identify entities in the data and support various data management activities. Still, UCCs are usually not explicitly defined and need to be discovered. State-of-the-art data profiling algorithms are able to efficiently discover UCCs in moderately sized datasets, but they tend to fail on large and, in particular, on wide datasets due to run time and memory limitations. In this paper, we introduce HPIValid, a novel UCC discovery algorithm that implements a faster and more resource-saving search strategy. HPIValid models the metadata discovery as a hitting set enumeration problem in hypergraphs. In this way, it combines efficient discovery techniques from data profiling research with the most recent theoretical insights into enumeration algorithms. Our evaluation shows that HPIValid is not only orders of magnitude faster than related work, it also has a much smaller memory footprint.
Battaglia, Francesco; Cucina, Domenico; Rizzo, ManuelParsimonious periodic autoregressive models for time series with evolving trend and seasonality. Statistics and Computing 2020: 77-91
This paper proposes an extension of Periodic AutoRegressive (PAR) modelling for time series with evolving features. The large scale of modern datasets, in fact, implies that the time span may subtend several evolving patterns of the underlying series, affecting also seasonality. The proposed model allows several regimes in time and a possibly different PAR process with a trend term in each regime. The means, autocorrelations and residual variances may change both with the regime and the season, resulting in a very large number of parameters. Therefore as a second step we propose a grouping procedure on the PAR parameters, in order to obtain a more parsimonious and concise model. The model selection procedure is a complex combinatorial problem, and it is solved basing on Genetic Algorithms that optimize an information criterion. The model is tested in both simulation studies and real data analysis from different fields, proving to be effective for a wide range of series with evolving features, and competitive with respect to more specific models.
Friedrich, Tobias; Kötzing, Timo; Lagodzinski, J. A. Gregor; Neumann, Frank; Schirneck, MartinAnalysis of the (1+1) EA on Subclasses of Linear Functions under Uniform and Linear Constraints. Theoretical Computer Science 2020: 3-19
Linear functions have gained great attention in the run time analysis of evolutionary computation methods. The corresponding investigations have provided many effective tools for analyzing more complex problems. So far, the runtime analysis of evolutionary algorithms has mainly focused on unconstrained problems, but problems occurring in applications frequently involve constraints. Therefore, there is a strong need to extend the methods for analyzing unconstrained problems to a setting involving constraints. In this paper, we consider the behavior of the classical (1+1) evolutionary algorithm on linear functions under linear constraint. We show tight bounds in the case where the constraint is given by the OneMax function and the objective function is given by either the OneMax or the BinVal function. For the general case we present upper and lower bounds.
Kötzing, Timo; Lagodzinski, J. A. Gregor; Lengler, Johannes; Melnichenko, AnnaDestructiveness of lexicographic parsimony pressure and alleviation by a concatenation crossover in genetic programming. Theoretical Computer Science 2020: 96--113
For theoretical analyses there are two specifics distinguishing GP from many other areas of evolutionary computation: the variable size representations, in particular yielding a possible bloat (i.e. the growth of individuals with redundant parts); and also the role and the realization of crossover, which is particularly central in GP due to the tree-based representation. Whereas some theoretical work on GP has studied the effects of bloat, crossover had surprisingly little share in this work. We analyze a simple crossover operator in combination with randomized local search, where a preference for small solutions minimizes bloat (lexicographic parsimony pressure); we denote the resulting algorithm Concatenation Crossover GP. We consider three variants of the well-studied Majority test function, adding large plateaus in different ways to the fitness landscape and thus giving a test bed for analyzing the interplay of variation operators and bloat control mechanisms in a setting with local optima. We show that the Concatenation Crossover GP can efficiently optimize these test functions, while local search cannot be efficient for all three variants independent of employing bloat control.
Krejca, Martin S.; Witt, CarstenLower Bounds on the Run Time of the Univariate Marginal Distribution Algorithm on OneMax. Theoretical Computer Science 2020: 143-165
The Univariate Marginal Distribution Algorithm (UMDA) -- a popular estimation-of-distribution algorithm -- is studied from a run time perspective. On the classical OneMax benchmark function on bit strings of length \(n\), a lower bound of \(\Omega(\lambda + \mu \sqrt{n} + n\log n)\), where \(\mu\) and \(\lambda\) are algorithm-specific parameters, on its expected run time is proved. This is the first direct lower bound on the run time of UMDA. It is stronger than the bounds that follow from general black-box complexity theory and is matched by the run time of many evolutionary algorithms. The results are obtained through advanced analyses of the stochastic change of the frequencies of bit values maintained by the algorithm, including carefully designed potential functions. These techniques may prove useful in advancing the field of run time analysis for estimation-of-distribution algorithms in general.
Doerr, Benjamin; Kötzing, Timo; Lagodzinski, J. A. Gregor; Lengler, JohannesThe impact of lexicographic parsimony pressure for ORDER/MAJORITY on the run time. Theoretical Computer Science 2020: 144--168
While many optimization problems work with a fixed number of decision variables and thus a fixed-length representation of possible solutions, genetic programming (GP) works on variable-length representations. A naturally occurring problem is that of bloat, that is, the unnecessary growth of solution lengths, which may slow down the optimization process. So far, the mathematical runtime analysis could not deal well with bloat and required explicit assumptions limiting bloat. In this paper, we provide the first mathematical runtime analysis of a GP algorithm that does not require any assumptions on the bloat. Previous performance guarantees were only proven conditionally for runs in which no strong bloat occurs. Together with improved analyses for the case with bloat restrictions our results show that such assumptions on the bloat are not necessary and that the algorithm is efficient without explicit bloat control mechanism. More specifically, we analyzed the performance of the \((1+1)\) GP on the two benchmark functions Order and Majority. When using lexicographic parsimony pressure as bloat control, we show a tight runtime estimate of \(O(T_{init+n \log n)\) iterations both for Order and Majority. For the case without bloat control, the bounds \(O(T_{init log T_init + n (\log n)^3)\) and \(\Omega(T_init + n \log n)\) (and \(Omega (T_init + \log T_{init})\) for \(n=1\)) hold for Majority.
Casel, Katrin; Fernau, Henning; Gaspers, Serge; Gras, Benjamin; Schmid, Markus L.On the Complexity of the Smallest Grammar Problem over Fixed Alphabets. Theory of Computing Systems 2020
In the smallest grammar problem, we are given a word \(w\)and we want to compute a preferably small context-free grammar \(G\) for the singleton language \(\{w\}\) (where the size of a grammar is the sum of the sizes of its rules, and the size of a rule is measured by the length of its right side). It is known that, for unbounded alphabets, the decision variant of this problem is \(\mathsf{NP}\)-hard and the optimisation variant does not allow a polynomial-time approximation scheme, unless \(\mathsf{P}\) = \(\mathsf{NP}\). We settle the long-standing open problem whether these hardness results also hold for the more realistic case of a constant-size alphabet. More precisely, it is shown that the smallest grammar problem remains \(\mathsf{NP}\)-complete (and its optimisation version is \(\mathsf{APX}\)-hard), even if the alphabet is fixed and has size of at least 17. The corresponding reduction is robust in the sense that it also works for an alternative size-measure of grammars that is commonly used in the literature (i.e., a size measure also taking the number of rules into account), and it also allows to conclude that even computing the number of rules required by a smallest grammar is a hard problem. On the other hand, if the number of nonterminals (or, equivalently, the number of rules) is bounded by a constant, then the smallest grammar problem can be solved in polynomial time, which is shown by encoding it as a problem on graphs with interval structure. However, treating the number of rules as a parameter (in terms of parameterised complexity) yields \(\mathsf{W}\)[1]-hardness. Furthermore, we present an \(O(3^{|w|})\) exact exponential-time algorithm, based on dynamic programming. These three main questions are also investigated for 1-level grammars, i.e., grammars for which only the start rule contains nonterminals on the right side; thus, investigating the impact of the ``hierarchical depth'' of grammars on the complexity of the smallest grammar problem. In this regard, we obtain for 1-level grammars similar, but slightly stronger results.
Bilò, Davide; Lenzner, PascalOn the Tree Conjecture for the Network Creation Game. Theory of Computing Systems 2020: 422--443
Selfish Network Creation focuses on modeling real world networks from a game-theoretic point of view. One of the classic models by Fabrikant et al. (2003) is the network creation game, where agents correspond to nodes in a network which buy incident edges for the price of \($\alpha$\) per edge to minimize their total distance to all other nodes. The model is well-studied but still has intriguing open problems. The most famous conjectures state that the price of anarchy is constant for all \($\alpha$\) and that for \($\alpha \geq n$\) all equilibrium networks are trees. We introduce a novel technique for analyzing stable networks for high edge-price \($\alpha$\) and employ it to improve on the best known bound for the latter conjecture. In particular we show that for \($\alpha>4n −13$\) all equilibrium networks must be trees, which implies a constant price of anarchy for this range of \($\alpha$\). Moreover, we also improve the constant upper bound on the price of anarchy for equilibrium trees.
Doerr, Benjamin; Krejca, Martin S.Significance-based Estimation-of-Distribution Algorithms. Transactions on Evolutionary Computation 2020: 1025-1034
Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that create a probabilistic model of the solution space, which is updated iteratively, based on the quality of the solutions sampled according to the model. As previous works show, this iteration-based perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. In order to overcome this problem, we propose a new EDA based on the classic compact genetic algorithm (cGA) that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based compact genetic algorithm (sig-cGA) optimizes the commonly regarded benchmark functions OneMax, LeadingOnes, and BinVal all in quasilinear time, a result shown for no other EDA or evolutionary algorithm so far. For the recently proposed scGA – an EDA that tries to prevent erratic model updates by imposing a bias to the uniformly distributed model – we prove that it optimizes OneMax only in a time exponential in its hypothetical population size. Similarly, we show that the convex search algorithm cannot optimize OneMax in polynomial time.
Krejca, Martin; Witt, CarstenTheory of Estimation-of-Distribution Algorithms. Theory of Evolutionary Computation: Recent Developments in Discrete Optimization 2020: 405-442
Estimation-of-distribution algorithms (EDAs) are general metaheuristics used in optimization that represent a more recent alternative to classical approaches such as evolutionary algorithms. In a nutshell, EDAs typically do not directly evolve populations of search points but build probabilistic models of promising solutions by repeatedly sampling and selecting points from the underlying search space. Recently, significant progress has been made in the theoretical understanding of EDAs. This chapter provides an up-to-date overview of the most commonly analyzed EDAs and the most recent theoretical results in this area. In particular, emphasis is put on the runtime analysis of simple univariate EDAs, including a description of typical benchmark functions and tools for the analysis. Along the way, open problems and directions for future research are described.
Doskoč, Vanja; Kötzing, TimoCautious Limit Learning. Algorithmic Learning Theory (ALT) 2020: 251-276
We investigate language learning in the limit from text with various \(\mathit{cautious}\) learning restrictions. Learning is \(\mathit{cautious}\) if no hypothesis is a proper subset of a previous guess. While dealing with a seemingly natural learning behaviour, cautious learning does severely restrict explanatory (syntactic) learning power. To further understand why exactly this loss of learning power arises, Kötzing and Palenta (2016) introduced weakened versions of cautious learning and gave first partial results on their relation. In this paper, we aim to understand the restriction of cautious learning more fully. To this end we compare the known variants in a number of different settings, namely full-information and (partially) set-driven learning, paired either with the syntactic convergence restriction (explanatory learning) or the semantic convergence restriction (behaviourally correct learning). To do so, we make use of normal forms presented in Kötzing et al. (2017), most notably strongly locking and consistent learning. While strongly locking learners have been exploited when dealing with a variety of syntactic learning restrictions, we show how they can be beneficial in the semantic case as well. Furthermore, we expand the normal forms to a broader range of learning restrictions, including an answer to the open question of whether cautious learners can be assumed to be consistent, as stated in Kötzing et al. (2017).
Das, Syamantak; Jain, Lavina; Kumar, NikhilA Constant Factor Approximation for Capacitated Min-Max Tree Cover. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM) 2020: 55:1-55:13
Given a graph \(G=(V,E)\) with non-negative real edge lengths and an integer parameter \(k\), the (uncapacitated) Min-Max Tree Cover problem seeks to find a set of at most \(k\) trees which together span \(V\) and each tree is a subgraph of \(G\). The objective is to minimize the maximum length among all the trees. In this paper, we consider a capacitated generalization of the above and give the first constant factor approximation algorithm. In the capacitated version, there is a hard uniform capacity (\(\lambda\)) on the number of vertices a tree can cover. Our result extends to the rooted version of the problem, where we are given a set of \(k\) root vertices, \(R\) and each of the covering trees is required to include a distinct vertex in \(R\) as the root. Prior to our work, the only result known was a \((2k-1)\)-approximation algorithm for the special case when the total number of vertices in the graph is \(k\lambda\) [Guttmann-Beck and Hassin, J. of Algorithms, 1997]. Our technique circumvents the difficulty of using the minimum spanning tree of the graph as a lower bound, which is standard for the uncapacitated version of the problem [Even et al., OR Letters 2004],[Khani et al., Algorithmica 2010]. Instead, we use Steiner trees that cover \(\lambda\) vertices along with an iterative refinement procedure that ensures that the output trees have low cost and the vertices are well distributed among the trees.
Bläsius, Thomas; Böther, Maximilian; Fischbeck, Philipp; Friedrich, Tobias; Gries, Alina; Hüffner, Falk; Kißig, Otto; Lenzner, Pascal; Molitor, Louise; Schiller, Leon; Wells, Armin; Witheger, SimonA Strategic Routing Framework and Algorithms for Computing Alternative Paths. Algorithmic Approaches for Transportation Modelling, Optimization, and Systems (ATMOS) 2020: 10:1--10:14
Traditional navigation services find the fastest route for a single driver. Though always using the fastest route seems desirable for every individual, selfish behavior can have undesirable effects such as higher energy consumption and avoidable congestion, even leading to higher overall and individual travel times. In contrast, strategic routing aims at optimizing the traffic for all agents regarding a global optimization goal. We introduce a framework to formalize real-world strategic routing scenarios as algorithmic problems and study one of them, which we call Single Alternative Path (SAP), in detail. There, we are given an original route between a single origin–destination pair. The goal is to suggest an alternative route to all agents that optimizes the overall travel time under the assumption that the agents distribute among both routes according to a psychological model, for which we introduce the concept of Pareto-conformity. We show that the SAP problem is NP-complete, even for such models. Nonetheless, assuming Pareto-conformity, we give multiple algorithms for different variants of SAP, using multi-criteria shortest path algorithms as subroutines.Moreover, we prove that several natural models are in fact Pareto-conform. The implementation and evaluation of our algorithms serve as a proof of concept, showing that SAP can be solved in reasonable time even though the algorithms have exponential running time in the worst case.
Fogel, Sharon; Averbuch-Elor, Hadar; Cohen, Sarel; Mazor, Shai; Litman, RoeeScrabbleGAN: Semi-Supervised Varying Length Handwritten Text Generation. Conference on Computer Vision and Pattern Recognition (CVPR) 2020: 4323-4332
Optical character recognition (OCR) systems performance have improved significantly in the deep learning era. This is especially true for handwritten text recognition (HTR), where each author has a unique style, unlike printed text, where the variation is smaller by design. That said, deep learning based HTR is limited, as in every other task, by the number of training examples. Gathering data is a challenging and costly task, and even more so, the labeling task that follows, of which we focus here. One possible approach to reduce the burden of data annotation is semi-supervised learning. Semi supervised methods use, in addition to labeled data, some unlabeled samples to improve performance, compared to fully supervised ones. Consequently, such methods may adapt to unseen images during test time. We present ScrabbleGAN, a semi-supervised approach to synthesize handwritten text images that are versatile both in style and lexicon. ScrabbleGAN relies on a novel generative model which can generate images of words with an arbitrary length. We show how to operate our approach in a semi-supervised manner, enjoying the aforementioned benefits such as performance boost over state of the art supervised HTR. Furthermore, our generator can manipulate the resulting text style. This allows us to change, for instance, whether the text is cursive, or how thin is the pen stroke.
Doskoč, Vanja; Friedrich, Tobias; Göbel, Andreas; Neumann, Aneta; Neumann, Frank; Quinzan, FrancescoNon-Monotone Submodular Maximization with Multiple Knapsacks in Static and Dynamic Settings. European Conference on Artificial Intelligence (ECAI) 2020: 435-442
We study the problem of maximizing a non-monotone submodular function under multiple knapsack constraints. We propose a simple discrete greedy algorithm to approach this problem, and prove that it yields strong approximation guarantees for functions with bounded curvature. In contrast to other heuristics, this does not require problem relaxation to continuous domains and it maintains a constant-factor approximation guarantee in the problem size. In the case of a single knapsack, our analysis suggests that the standard greedy can be used in non-monotone settings. Additionally, we study this problem in a dynamic setting, in which knapsacks change during the optimization process. We modify our greedy algorithm to avoid a complete restart at each constraint update. This modification retains the approximation guarantees of the static case. We evaluate our results experimentally on a video summarization and sensor placement task. We show that our proposed algorithm competes with the state-of-the-art in static settings. Furthermore, we show that in dynamic settings with tight computational time budget, our modified greedy yields significant improvements over starting the greedy from scratch, in terms of the solution quality achieved.
Bläsius, Thomas; Friedrich, Tobias; Schirneck, MartinThe Minimization of Random Hypergraphs. European Symposium on Algorithms (ESA) 2020: 21:1-21:15
We investigate the maximum-entropy model \(\mathcal{B}_{n,m,p}\) for random \(n\)-vertex, \(m\)-edge multi-hypergraphs with expected edge size \(pn\). We show that the expected size of the minimization \(\min(\mathcal{B}_{n,m,p})\), i.e., the number of inclusion-wise minimal edges of \(\mathcal{B}_{n,m,p}\), undergoes a phase transition with respect to \(m\). If \(m\) is at most \(1/(1-p)^(1-p)n}\), then \(\mathrm{E[|\min(\mathcal{B}_{n,m,p})|]\) is of order \(\Theta(m)\), while for \(m ge 1/(1-p)^(1-p+\varepsilon)n}\) for any \(\varepsilon > 0\), it is \(\Theta( 2^(\mathrm{H(\alpha) + (1-\alpha) \log_2 p) n/ \sqrt{n})\). Here, \(\mathrm{H}\) denotes the binary entropy function and \(alpha = - (\log_{1-p m)/n\). The result implies that the maximum expected number of minimal edges over all \(m\) is \(\Theta((1+p)^n/\sqrt{n})\). Our structural findings have algorithmic implications for minimizing an input hypergraph. This has applications in the profiling of relational databases as well as for the Orthogonal Vectors problem studied in fine-grained complexity. We make several technical contributions that are of independent interest in probability. First, we improve the Chernoff--Hoeffding theorem on the tail of the binomial distribution. In detail, we show that for a binomial variable \(Y sim \mathrm{Bin(n,p)\) and any \(0 < x < p\), it holds that \(\mathrm{P[Y le xn] = \Theta( 2^-\!\mathrm{D(x \,\|}\, p) n}/\sqrt{n})\), where \(\mathrm{D}\) is the binary Kullback--Leibler divergence between Bernoulli distributions. We give explicit upper and lower bounds on the constants hidden in the big-O notation that hold for all \(n\). Secondly, we establish the fact that the probability of a set of cardinality \(i\) being minimal after \(m\) i.i.d. maximum-entropy trials exhibits a sharp threshold behavior at \(i^* = n + \log_{1-p m\).
Garg, Naveen; Kumar, NikhilDual Half-Integrality for Uncrossable Cut Cover and Its Application to Maximum Half-Integral Flow. European Symposium on Algorithms (ESA) 2020: 55:1-55:13
Given an edge weighted graph and a forest \(F\), the em 2-edge connectivity augmentation problem is to pick a minimum weighted set of edges, \(E'\), such that every connected component of \(E'\cup F\) is 2-edge connected. Williamson et al. gave a 2-approximation algorithm (WGMV) for this problem using the primal-dual schema. We show that when edge weights are integral, the WGMV procedure can be modified to obtain a half-integral dual. The 2-edge connectivity augmentation problem has an interesting connection to routing flow in graphs where the union of supply and demand is planar. The half-integrality of the dual leads to a tight 2-approximate max-half-integral-flow min-multicut theorem.
Doerr, Benjamin; Krejca, Martin S.The Univariate Marginal Distribution Algorithm Copes Well With Deception and Epistasis. Evolutionary Computation in Combinatorial Optimisation (EvoCOP) 2020: 51-66
Best-Paper Award
In their recent work, Lehre and Nguyen (FOGA 2019) show that the univariate marginal distribution algorithm (UMDA) needs time exponential in the parent populations size to optimize the DeceivingLeadingBlocks (DLB) problem. They conclude from this result that univariate EDAs have difficulties with deception and epistasis. In this work, we show that this negative finding is caused by an unfortunate choice of the parameters of the UMDA. When the population sizes are chosen large enough to prevent genetic drift, then the UMDA optimizes the DLB problem with high probability with at most \(\lambda(\frac{n}{2 + 2 e \ln n)\) fitness evaluations. Since an offspring population size \(\lambda\) of order \(n \log n\) can prevent genetic drift, the UMDA can solve the DLB problem with \(O(n^2 \log n)\) fitness evaluations. In contrast, for classic evolutionary algorithms no better run time guarantee than \(O(n^3)\) is known, so our result rather suggests that the UMDA can cope well with deception and epistatis. Together with the result of Lehre and Nguyen, our result for the first time rigorously proves that running EDAs in the regime with genetic drift can lead to drastic performance losses.
Bilò, Davide; Friedrich, Tobias; Lenzner, Pascal; Melnichenko, Anna; Molitor, LouiseFair Tree Connection Games with Topology-Dependent Edge Cost. Foundations of Software Technology and Theoretical Computer Science (FSTTCS) 2020: 15:1--15:15
How do rational agents self-organize when trying to connect to a common target? We study this question with a simple tree formation game which is related to the well-known fair single-source connection game by Anshelevich et al.(FOCS'04) and selfish spanning tree games by Gourvès and Monnot (WINE'08). In our game agents correspond to nodes in a network that activate a single outgoing edge to connect to the common target node (possibly via other nodes). Agents pay for their path to the common target, and edge costs are shared fairly among all agents using an edge. The main novelty of our model is dynamic edge costs that depend on the in-degree of the respective endpoint. This reflects that connecting to popular nodes that have increased internal coordination costs is more expensive since they can charge higher prices for their routing service. In contrast to related models, we show that equilibria are not guaranteed to exist, but we prove the existence for infinitely many numbers of agents. Moreover, we analyze the structure of equilibrium trees and employ these insights to prove a constant upper bound on the Price of Anarchy as well as non-trivial lower bounds on both the Price of Anarchy and the Price of Stability. We also show that in comparison with the social optimum tree the overall cost of an equilibrium tree is more fairly shared among the agents. Thus, we prove that self-organization of rational agents yields on average only slightly higher cost per agent compared to the centralized optimum, and at the same time, it induces a more fair cost distribution. Moreover, equilibrium trees achieve a beneficial trade-off between a low height and low maximum degree, and hence these trees might be of independent interest from a combinatorics point-of-view. We conclude with a discussion of promising extensions of our model.
Doerr, Benjamin; Krejca, Martin S.Bivariate Estimation-of-Distribution Algorithms Can Find an Exponential Number of Optima. Genetic and Evolutionary Computation Conference (GECCO) 2020: 796–804
Finding a large set of optima in a multimodal optimization landscape is a challenging task. Classical population-based evolutionary algorithms (EAs) typically converge only to a single solution. While this can be counteracted by applying niching strategies, the number of optima is nonetheless trivially bounded by the population size. Estimation-of-distribution algorithms (EDAs) provide an alternative approach by maintaining a probabilistic model of the solution space instead of an explicit population. Such a model has the benefit of being able to implicitly represent a solution set that is far larger than any realistic population size. To support the study of how optimization algorithms handle large sets of optima, we propose the test function EqualBlocksOneMax (EBOM). It has an easy to optimize fitness landscape, however, with an exponential number of optima. We show that the bivariate EDA mutual-information-maximizing input clustering (MIMIC), without any problem-specific modification, quickly generates a model that behaves very similarly to a theoretically ideal model for that function, which samples each of the exponentially many optima with the same maximal probability.
Bossek, Jakob; Casel, Katrin; Kerschke, Pascal; Neumann, FrankThe Node Weight Dependent Traveling Salesperson Problem: Approximation Algorithms and Randomized Search Heuristics. Genetic and Evolutionary Computation Conference (GECCO) 2020: 1286-1294
Several important optimization problems in the area of vehicle routing can be seen as a variant of the classical Traveling Salesperson Problem (TSP). In the area of evolutionary computation, the traveling thief problem (TTP) has gained increasing interest over the last 5 years. In this paper, we investigate the effect of weights on such problems, in the sense that the cost of traveling increases with respect to the weights of nodes already visited during a tour. This provides abstractions of important TSP variants such as the Traveling Thief Problem and time dependent TSP variants, and allows to study precisely the increase in difficulty caused by weight dependence. We provide a 3.59-approximation for this weight dependent version of TSP with metric distances and bounded positive weights. Furthermore, we conduct experimental investigations for simple randomized local search with classical mutation operators and two variants of the state-of-the-art evolutionary algorithm EAX adapted to the weighted TSP. Our results show the impact of the node weights on the position of the nodes in the resulting tour.
Friedrich, Tobias; Krejca, Martin S.; Lagodzinski, J. A. Gregor; Rizzo, Manuel; Zahn, ArthurMemetic Genetic Algorithms for Time Series Compression by Piecewise Linear Approximation. International Conference on Neural Information Processing (ICONIP) 2020: 592-604
Time series are sequences of data indexed by time. Such data are collected in various domains, often in massive amounts, such that storing them proves challenging. Thus, time series are commonly stored in a compressed format. An important compression approach is piecewise linear approximation (PLA), which only keeps a small set of time points and interpolates the remainder linearly. Picking a subset of time points such that the PLA minimizes the mean squared error to the original time series is a challenging task, naturally lending itself to heuristics. We propose the piecewise linear approximation genetic algorithm (PLA-GA) for compressing time series by PLA. The PLA-GA is a memetic \((\mu + \lambda)\) GA that makes use of two distinct operators tailored to time series compression. First, we add special individuals to the initial population that are derived using established PLA heuristics. Second, we propose a novel local search operator that greedily improves a compressed time series. We compare the PLA-GA empirically with existing evolutionary approaches and with a deterministic PLA algorithm, known as Bellman's algorithm, that is optimal for the restricted setting of sampling. In both cases, the PLA-GA approximates the original time series better and quicker. Further, it drastically outperforms Bellman's algorithm with increasing instance size with respect to run time until finding a solution of equal or better quality -- we observe speed-up factors between 7 and 100 for instances of 90,000 to 100,000 data points.
Echzell, Hagen; Friedrich, Tobias; Lenzner, Pascal; Melnichenko, AnnaFlow-Based Network Creation Games. International Joint Conference on Artificial Intelligence (IJCAI) 2020: 139-145
Network Creation Games (NCGs) model the creation of decentralized communication networks like the Internet. In such games strategic agents corresponding to network nodes selfishly decide with whom to connect to optimize some objective function. Past research intensively analyzed models where the agents strive for a central position in the network. This models agents optimizing the network for low-latency applications like VoIP. However, with today's abundance of streaming services it is important to ensure that the created network can satisfy the increased bandwidth demand. To the best of our knowledge, this natural problem of the decentralized strategic creation of networks with sufficient bandwidth has not yet been studied. We introduce Flow-Based NCGs where the selfish agents focus on bandwidth instead of latency. In essence, budget-constrained agents create network links to maximize their minimum or average network flow value to all other network nodes. Equivalently, this can also be understood as agents who create links to increase their connectivity and thus also the robustness of the network. For this novel type of NCG we prove that pure Nash equilibria exist, we give a simple algorithm for computing optimal networks, we show that the Price of Stability is~1 and we prove an (almost) tight bound of 2 on the Price of Anarchy. Last but not least, we show that our models do not admit a potential function.
Garg, Naveen; Kumar, Nikhil; Sebö, AndrásInteger Plane Multiflow Maximisation: Flow-Cut Gap and One-Quarter-Approximation. Integer Programming and Combinatorial Optimization (IPCO) 2020: 144-157
Abstract In this paper, we bound the integrality gap and the approximation ratio for maximum plane multiflow problems and deduce bounds on the flow-cut-gap. We consider instances where the union of the supply and demand graphs is planar and prove that there exists a multiflow of value at least half the capacity of a minimum multicut. We then show how to convert any multiflow into a half-integer flow of value at least half the original multiflow. Finally, we round any half-integer multiflow into an integer multiflow, losing at most half the value thus providing a 1/4-approximation algorithm and integrality gap for maximum integer multiflows in the plane.
Feldmann, Andreas; Issac, Davis; Rai, AshutoshFixed-Parameter Tractability of the Weighted Edge Clique Partition Problem. International Symposium on Parameterized and Exact Computation (IPEC) 2020
We develop an FPT algorithm and a bi-kernel for the Weighted Edge Clique Partition (WECP) problem, where a graph with \(n\) vertices and integer edge weights is given together with an integer \(k\), and the aim is to find \(k\) cliques, such that every edge appears in exactly as many cliques as its weight. The problem has been previously only studied in the unweighted version called Edge Clique Partition (ECP), where the edges need to be partitioned into \(k\) cliques. It was shown that ECP admits a kernel with \(k^2\) vertices [Mujuni and Rosamond, 2008], but this kernel does not extend to WECP. The previously fastest algorithm known for ECP has a runtime of \( 2^O(k^2)}\) \(n^{O(1)}\) [Issac, 2019]. For WECP we develop a bi-kernel with \(4^k\) vertices, and an algorithm with runtime \(2^{O(k^3/2w^1/2\log(k/w)) n^O(1)}\), where \(w\) is the maximum edge weight. The latter in particular improves the runtime for ECP to \(2^{O(k^3/2\log k) n^O(1)}\).
Kumar, NikhilMulticommodity Flows in Planar Graphs with Demands on Faces. International Symposium on Algorithms and Computation (ISAAC) 2020
We consider the problem of multicommodity flows in planar graphs. Seymour showed that if the union of supply and demand graphs is planar, then the cut condition is also sufficient for routing demands. Okamura-Seymour showed that if the supply graph is planar and all demands are incident on one face, then again the cut condition is sufficient for routing demands. We consider a common generalization of these settings where the end points of each demand are on the same face of the planar graph. We show that if the source sink pairs on each face of the graph are such that sources and sinks appear contiguously on the cycle bounding the face, then the flow cut gap is at most 3. We come up with a notion of approximating demands on a face by convex combination of laminar demands to prove this result.
Bilò, Davide; Bilò, Vittorio; Lenzner, Pascal; Molitor, LouiseTopological Influence and Locality in Swap Schelling Games. International Symposium on Mathematical Foundations of Computer Science (MFCS) 2020: 15:1--15:15
Kötzing, Timo; Witt, CarstenImproved Fixed-Budget Results via Drift Analysis. Parallel Problem Solving From Nature (PPSN) 2020: 648-660
Fixed-budget theory is concerned with computing or bounding the fitness value achievable by randomized search heuristics within a given budget of fitness function evaluations. Despite recent progress in fixed-budget theory, there is a lack of general tools to derive such results. We transfer drift theory, the key tool to derive expected optimization times, to the fixed-budged perspective. A first and easy-to-use statement concerned with iterating drift in so-called greed-admitting scenarios immediately translates into bounds on the expected function value. Afterwards, we consider a more general tool based on the well-known variable drift theorem. Applications of this technique to the LeadingOnes benchmark function yield statements that are more precise than the previous state of the art.
Antoniadis, Antonios; Garg, Naveen; Kumar, Gunjan; Kumar, NikhilParallel Machine Scheduling to Minimize Energy Consumption. Symposium on Discrete Algorithms (SODA) 2020: 2758-2769
Given \(n\) jobs with release dates, deadlines and processing times we consider the problem of scheduling them on \(m\) parallel machines so as to minimize the total energy consumed. Machines can enter a sleep state and they consume no energy in this state. Each machine requires \(L\) units of energy to awaken from the sleep state and in its active state the machine can process jobs and consumes a unit of energy per unit time. We allow for preemption and migration of jobs and provide the first constant approximation algorithm for this problem.
Feldmann, Michael; Khazraei, Ardalan; Scheideler, ChristianTime- and Space-Optimal Clock Synchronization in the Beeping Model. Symposium Parallelism in Algorithms and Architectures (SPAA) 2020
We consider the clock synchronization problem in the (discrete) beeping model: Given a network of \(n\) nodes with each node having a clock value \(\delta(v) \in\) \(\{ 0,\ldots , T-1\}\), the goal is to synchronize the clock values of all nodes such that they have the same value in any round. As is standard in clock synchronization, we assume arbitrary activations for all nodes, i.e., the nodes start their protocol at an arbitrary round (not limited to \(\{0,\ldots,T-1\}\)). We give an asymptotically optimal algorithm that runs in \( 4D + \lfloor D/\lfloor T/4 \rfloor \rfloor \cdot\ \)\((T \mod 4) = O(D)\) rounds, where \(D\) is the diameter of the network. Once all nodes are in sync, they beep at the same round every \(T\) rounds. The algorithm drastically improves on the \(O(T D)\)-bound of [Alistarh et al. 2013] (where \(T\) is required to be at least \(4n\), so the bound is no better than \(O(nD)\)). Our algorithm is very simple as nodes only have to maintain \(3\) bits in addition to the \(\lceil \log T \rceil\) bits needed to maintain the clock. Furthermore we investigate the complexity of self-stabilizing solutions for the clock synchronization problem: We first show lower bounds of \(\Omega(\max\{T,n\})\) rounds on the runtime and \(\Omega(\log(\max\{T,n\}))\) bits of memory required for any such protocol. Afterwards we present a protocol that runs in \(O(\max\{T,n\})\) rounds using at most \(O(\log(\max\{T,n\}))\) bits at each node, which is asymptotically optimal with regards to both, runtime and memory requirements.
Bläsius, Thomas; Fischbeck, Philipp; Friedrich, Tobias; Katzmann, MaximilianSolving Vertex Cover in Polynomial Time on Hyperbolic Random Graphs. Symposium on the Theoretical Aspects of Computer Science (STACS) 2020: 25:1-25:14
The VertexCover problem is proven to be computationally hard in different ways: It is NP-complete to find an optimal solution and even NP-hard to find an approximation with reasonable factors. In contrast, recent experiments suggest that on many real-world networks the run time to solve VertexCover is way smaller than even the best known FPT-approaches can explain. Similarly, greedy algorithms deliver very good approximations to the optimal solution in practice. We link these observations to two properties that are observed in many real-world networks, namely a heterogeneous degree distribution and high clustering. To formalize these properties and explain the observed behavior, we analyze how a branch-and-reduce algorithm performs on hyperbolic random graphs, which have become increasingly popular for modeling real-world networks. In fact, we are able to show that the VertexCover problem on hyperbolic random graphs can be solved in polynomial time, with high probability. The proof relies on interesting structural properties of hyperbolic random graphs. Since these predictions of the model are interesting in their own right, we conducted experiments on real-world networks showing that these properties are also observed in practice. When utilizing the same structural properties in an adaptive greedy algorithm, further experiments suggest that, on real instances, this leads to better approximations than the standard greedy approach within reasonable time. We link these observations to two properties that are observed in many real-world networks, namely a heterogeneous degree distribution and high clustering. To formalize these properties and explain the observed behavior, we analyze how a branch-and-reduce algorithm performs on hyperbolic random graphs, which have become increasingly popular for modeling real-world networks. In fact, we are able to show that the VertexCover problem on hyperbolic random graphs can be solved in polynomial time, with high probability. The proof relies on interesting structural properties of hyperbolic random graphs. Since these predictions of the model are interesting in their own right, we conducted experiments on real-world networks showing that these properties are also observed in practice. When utilizing the same structural properties in an adaptive greedy algorithm, further experiments suggest that this leads to even better approximations than the standard greedy approach on real instances.
Chechik, Shiri; Cohen, SarelDistance sensitivity oracles with subcubic preprocessing time and fast query time. Symposium Theory of Computing (STOC) 2020: 1375-1388
We present the first distance sensitivity oracle (DSO) with subcubic preprocessing time and poly-logarithmic query time for directed graphs with integer weights in the range \([−M,M]\). Weimann and Yuster [FOCS 10] presented a distance sensitivity oracle for a single vertex/edge failure with subcubic preprocessing time of \(\mathcal{O(Mn^\omega+1− \alpha})\) and subquadratic query time of \(\tilde{\mathcal{O(n^{1+\alpha})\), where \(\alpha\) is any parameter in \([0,1]\), \(n\) is the number of vertices, \(m\) is the number of edges, the \(\tilde{\mathcal{O(·)\) notation hides poly-logarithmic factors in \(n\) and \(\omega<2.373\) is the matrix multiplication exponent.Later, Grandoni and Vassilevska Williams [FOCS 12] substantially improved the query time to sublinear in \(n\). In particular, they presented a distance sensitivity oracle for a single vertex/edge failure with \(\tilde{\mathcal{O}}\)\( (Mn^\omega +1/2}\) \(+ Mn^\omega+\alpha(4−\omega)})\) preprocessing time and \(\tilde{\mathcal{O(n^{1−\alpha})\) query time. Despite the substantial improvement in the query time, it still remains polynomial in the size of the graph, which may be undesirable in many settings where the graph is of large scale. A natural question is whether one can hope for a distance sensitivity oracle with subcubic preprocessing time and very fast query time (of poly-logarithmic in \(n\)). In this paper we answer this question affirmatively by presenting a distance sensitive oracle supporting a single vertex/edge failure in subcubic \(\tilde{\mathcal{O(Mn^{2.873})\) preprocessing time for \(\omega=2.373\), \(\tilde{\mathcal{O(n^{2.5})\) space and near optimal query time of \(\tilde{\mathcal{O(1)\). For comparison, with the same \(\tilde{\mathcal{O(Mn^{2.873})\) preprocessing time the DSO of Grandoni and Vassilevska Williams has \(\tilde{\mathcal{O(n^{0.693})\) query time. In fact, the best query time their algorithm can obtain is \(\tilde{\mathcal{O(Mn^{0.385})\) (with \(\tilde{\mathcal{O(Mn^3)\) preprocessing time).
Becher, Kilian; Lagodzinski, J. A. Gregor; Strufe, ThorstenPrivacy-Preserving Public Verification of Ethical Cobalt Sourcing. Trust, Security and Privacy in Computing and Communications (TrustCom) 2020
Cobalt is a key ingredient of lithium-ion batteries and therefore is crucial for many modern devices. To ensure ethical sourcing, consumers need a way to verify provenance of their cobalt-based products, including the percentage of artisanally mined (ASM) cobalt. Existing frameworks for provenance and supply chain traceability rely on distributed ledgers. Providing public verifiability via permissionless distributed ledgers is trivial. However, offering public verifiability based on confidential production details seems contradictory. Hence, existing frameworks lack public verifiability of ratios between commodities while ensuring confidentiality of supply chain details. We propose a protocol that allows end consumers to verify the percentage of ASM cobalt in their products. Unlike previous solutions, production details are published and processed entirely in encrypted form by employing homomorphic encryption and proxy re-encryption. Thus, it ensures a high level of confidentiality of supply chain data. It has constant consumer-side complexity, making it suitable for mobile devices.
Held, Stephan; Khazraei, ArdalanAn Improved Approximation Algorithm for the Uniform Cost-Distance Steiner Tree Problem. Workshop on Approximation and Online Algorithms (WAOA) 2020
The cost-distance Steiner tree problem asks for a Steiner tree in a graph that minimizes the total cost plus a weighted sum of path delays from the root to the sinks. We present an improved approximation for the uniform cost-distance Steiner tree problem, where the delay of a path corresponds to the sum of edge costs along that path. Previous approaches deploy a bicriteria approximation algorithm for the length-bounded variant that does not take the actual delay weights into account. Our algorithm modifies a similar algorithm for the single-sink buy-at-bulk problem by Guha et al. [2009], allowing a better approximation factor for our problem. In contrast to the bicriteria algorithms it considers delay weights explicitly. Thereby, we achieve an approximation factor of (1+ \(\beta\) ), where \(\beta\) is the approximation factor for the Steiner tree problem. This improves the previously best known approximation factor for the uniform cost-distance Steiner tree problem from 2:87 to 2:39. This algorithm can be extended to the problem where the ratio of edge costs to edge delays throughout the graph is bounded from above and below. In particular, this shows that a previous inapproximability result (Chuzhoy et al. [2008]) requires large variations between edge delays and costs. Finally, we present an important application of our new algorithm in chip design. The cost-distance Steiner tree problem occurs as a Lagrangean subproblem when optimizing millions of Steiner trees with mutually depending path length bounds. We show how to quickly approximate a continuous relaxation of this problem with our new algorithm.
Pappik, MarcusNew Conditions via Markov Chains: Approximating Partition Functions of Abstract Polymer Models without Cluster Expansion. master's thesis, Hasso Plattner Institute 2020
Abstract polymer models are combinatorial structures that consist of a set of weighted objects, called polymers, together with a reflexive and symmetric incompatibility relation that describes which polymers cannot occur together. In this thesis we present the first Markov chain approach for sampling from the Gibbs distribution of abstract polymer models. Known Markov chains for polymer models from vertex and edge spin systems can be seen as special cases of this polymer Markov chain. We introduce the concept of polymer cliques and propose a generalized polymer mixing condition as a way to guarantee rapid mixing of our chain. The form of this condition is similar to conditions from cluster expansion approaches, such as the Kotecký-Preiss condition and the Fernández-Procacci condition, but it is less restrictive. To obtain an efficient sampling scheme for the Gibbs distribution from our polymer Markov chain, we prove that it suffices to draw each step of the chain approximately according to its transition probabilities. As one way to approximate each transition of the chain, we suggest to truncate each polymer clique based on some notion of size. We introduce the clique truncation condition as a general tool to determine the maximum size of polymers that we have to consider for the steps of the chain. We prove that our sampling scheme can be used to construct an FPRAS for the partition function. By this, we obtain the first Markov chain Monte Carlo method that works for abstract polymer models in a similar regime as cluster expansion approaches and beyond, while avoiding their complicated analytical assumptions and arguments. Further, we illustrate how our approach can be applied to algorithmic problems like the hard-core model on bipartite \(\alpha\)-expander graphs and the perfect matching polynomial to obtain new trade-offs between runtime and weight bounds. We emphasize that similar results can be obtained for a variety of other applications.
Kötzing, Timo; Seidel, KarenLearning Languages in the Limit from Positive Information with Finitely Many Memory Changes. CoRR 2020
ArXiv preprint
We investigate learning collections of languages from texts by an inductive inference machine with access to the current datum and its memory in form of states. The bounded memory states (\(BMS\)) learner is considered successful in case it eventually settles on a correct hypothesis while exploiting only finitely many different states. We give the complete map of all pairwise relations for an established collection of learning success restrictions. Most prominently, we show that non-U-shapedness is not restrictive, while conservativeness and (strong) monotonicity are. Some results carry over from iterative learning by a general lemma showing that, for a wealth of restrictions (the semantic restrictions), iterative and bounded memory states learning are equivalent. We also give an example of a non-semantic restriction (strongly non-U-shapedness) where the two settings differ.
Bläsius, Thomas; Friedrich, Tobias; Lischeid, Julius; Meeks, Kitty; Schirneck, MartinEfficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling. CoRR 2020
ArXiv preprint
We devise a method to enumerate the inclusion-wise minimal hitting sets of a hypergraph. The algorithm has delay \(O( m^k^*+1 n^2)\) on \(n\)-vertex, \(m\)-edge hypergraphs, where \(k^*\) is the rank of the transversal hypergraph, i.e., the cardinality of the largest minimal solution. In particular, on classes of hypergraphs for which \(k^*\) is bounded, the delay is polynomial. The algorithm uses space linear in the input size only. The enumeration methods solves the extension problem for minimal hitting sets as a subroutine. We show that this problem, parameterised by the cardinality of the set which is to be extended, is one of the first natural W[3]-complete problems. We give an algorithm for the subroutine that is optimal under the assumption that \(W[2] \neq \mathrm{FPT}\) or the exponential time hypothesis (ETH), respectively. Despite the hardness of the extension problem, we provide empirical evidence indicating that the enumeration outperforms its theoretical worst-case guarantee on hypergraphs arising in the profiling of relational databases, namely, in the detection of unique column combinations. Our analysis suggest that these hypergraphs exhibit structure that allows the subroutine to be fast on average.
Khazraei, Ardalan; Kötzing, Timo; Seidel, KarenLearning Half-Spaces and other Concept Classes in the Limit with Iterative Learners. CoRR 2020
ArXiv preprint
In order to model an efficient learning paradigm, iterative learning algorithms access data one by one, updating the current hypothesis without regress to past data. Past research on iterative learning analyzed for example many important additional requirements and their impact on iterative learners. In this paper, our results are twofold. First, we analyze the relative learning power of various settings of iterative learning, including learning from text and from informant, as well as various further restrictions, for example we show that strongly non-U-shaped learning is restrictive for iterative learning from informant. Second, we investigate the learnability of the concept class of half-spaces and provide a constructive iterative algorithm to learn the set of half-spaces from informant.
Issac, Davis; Chandran, L. Sunil; Zhou, SanmingHadwiger's conjecture for squares of 2-trees. European Journal of Combinatorics 2019
Hadwiger's conjecture asserts that any graph contains a clique minor with order no less than the chromatic number of the graph. We prove that this well-known conjecture is true for all graphs if and only if it is true for squares of split graphs. This observation implies that Hadwiger's conjecture for squares of chordal graphs is as difficult as the general case, since chordal graphs are a superclass of split graphs. Then we consider 2-trees which are a subclass of each of planar graphs, 2-degenerate graphs and chordal graphs. We prove that Hadwiger's conjecture is true for squares of 2-trees. We achieve this by proving the following stronger result: If G is the square of a 2 tree, then G has a clique minor of size \(\chi(G)\), where each branch set is a path.
Friedrich, Tobias; Krejca, Martin S.; Rothenberger, Ralf; Arndt, Tobias; Hafner, Danijar; Kellermeier, Thomas; Krogmann, Simon; Razmjou, ArminRouting for On-Street Parking Search using Probabilistic Data. AI Communications 2019: 113-124
A significant percentage of urban traffic is caused by the search for parking spots. One possible approach to improve this situation is to guide drivers along routes which are likely to have free parking spots. The task of finding such a route can be modeled as a probabilistic graph problem which is NP-complete. Thus, we propose heuristic approaches for solving this problem and evaluate them experimentally. For this, we use probabilities of finding a parking spot, which are based on publicly available empirical data from TomTom International B.V. Additionally, we propose a heuristic that relies exclusively on conventional road attributes. Our experiments show that this algorithm comes close to the baseline by a factor of 1.3 in our cost measure. Last, we complement our experiments with results from a field study, comparing the success rates of our algorithms against real human drivers.
Doerr, Benjamin; Doerr, Carola; Kötzing, TimoSolving Problems with Unknown Solution Length at Almost No Extra Cost. Algorithmica 2019: 703-748
Following up on previous work of Cathabard et al. (in: Proceedings of foundations of genetic algorithms (FOGA’11), ACM, 2011) we analyze variants of the (1 + 1) evolutionary algorithm (EA) for problems with unknown solution length. For their setting, in which the solution length is sampled from a geometric distribution, we provide mutation rates that yield for both benchmark functions ONEMAX and LEADINGONES an expected optimization time that is of the same order as that of the (1 + 1) EA knowing the solution length. More than this, we show that almost the same run times can be achieved even if no a priori information on the solution length is available. We also regard the situation in which neither the number nor the positions of the bits with an influence on the fitness function are known. Solving an open problem from Cathabard et al. we show that, for arbitrary natural numbers s, such ONEMAX and LEADINGONES instances can be solved, simultaneously for all natural numbers \(n\), in expected time \(O(n(\log(n))^2 \log\log(n) ... \log^{(s−1)(n)(\log^{(s)(n))^(1+\varepsilon)})\) and \(O(n^2 \log(n) \log\log(n) ... \log^{(s−1)(n)(\log^{(s)(n))^(1+\varepsilon)})\), respectively; that is, in almost the same time as if \(n\) and the relevant bit positions were known. For the LEADINGONES case, we prove lower bounds of same asymptotic order of magnitude apart from the \((\log^{(s)(n))^\varepsilon\) factor. Aiming at closing this arbitrarily small remaining gap, we realize that there is no asymptotically best performance for this problem. For any algorithm solving, for all \(n\), all instances of size \(n\) in expected time at most \(T(n)\), there is an algorithm doing the same in time \(T'(n)\) with \(T' = o(T)\). For ONEMAX we show results of similar flavor.
Shi, Feng; Schirneck, Martin; Friedrich, Tobias; Kötzing, Timo; Neumann, FrankReoptimization Time Analysis of Evolutionary Algorithms on Linear Functions Under Dynamic Uniform Constraints. Algorithmica 2019: 828-857
Rigorous runtime analysis is a major approach towards understanding evolutionary computing techniques, and in this area linear pseudo-Boolean objective functions play a central role. Having an additional linear constraint is then equivalent to the NP-hard Knapsack problem, certain classes thereof have been studied in recent works. In this article, we present a dynamic model of optimizing linear functions under uniform constraints. Starting from an optimal solution with respect to a given constraint bound, we investigate the runtimes that different evolutionary algorithms need to recompute an optimal solution when the constraint bound changes by a certain amount. The classical \((1+1)\) EA and several population-based algorithms are designed for that purpose, and are shown to recompute efficiently. Furthermore, a variant of the \((1+(\lambda,\lambda))\) GA for the dynamic optimization problem is studied, whose performance is better when the change of the constraint bound is small.
Doerr, Benjamin; Fischbeck, Philipp; Frahnow, Clemens; Friedrich, Tobias; Kötzing, Timo; Schirneck, MartinIsland Models Meet Rumor Spreading. Algorithmica 2019: 886-915
Island models in evolutionary computation solve problems by a careful interplay of independently running evolutionary algorithms on the island and an exchange of good solutions between the islands. In this work, we conduct rigorous run time analyses for such island models trying to simultaneously obtain good run times and low communication effort. We improve the existing upper bounds for both measures (i) by improving the run time bounds via a careful analysis, (ii) by balancing individual computation and communication in a more appropriate manner, and (iii) by replacing the usual communicate-with-all approach with randomized rumor spreading. In the latter, each island contacts a randomly chosen neighbor. This epidemic communication paradigm is known to lead to very fast and robust information dissemination in many applications. Our results concern island models running simple (1+1) evolutionary algorithms to optimize the classic test functions OneMax and LeadingOnes. We investigate binary trees, d-dimensional tori, and complete graphs as communication topologies.
Cseh, Ágnes; Matuschke, JannikNew and Simple Algorithms for Stable Flow Problems. Algorithmica 2019: 2557-2591
Stable flows generalize the well-known concept of stable matchings to markets in which transactions may involve several agents, forwarding flow from one to another. An instance of the problem consists of a capacitated directed network in which vertices express their preferences over their incident edges. A network flow is stable if there is no group of vertices that all could benefit from rerouting the flow along a walk. Fleiner (Algorithms 7:1-14, 2014) established that a stable flow always exists by reducing it to the stable allocation problem. We present an augmenting path algorithm for computing a stable flow, the first algorithm that achieves polynomial running time for this problem without using stable allocations as a black-box subroutine. We further consider the problem of finding a stable flow such that the flow value on every edge is within a given interval. For this problem, we present an elegant graph transformation and based on this, we devise a simple and fast algorithm, which also can be used to find a solution to the stable marriage problem with forced and forbidden edges. Finally, we study the stable multicommodity flow model introduced by Király and Pap (Algorithms 6:161-168, 2013). The original model is highly involved and allows for commodity-dependent preference lists at the vertices and commodity-specific edge capacities. We present several graph-based reductions that show equivalence to a significantly simpler model. We further show that it is NP-complete to decide whether an integral solution exists.
Neumann, Aneta; Neumann, Frank; Friedrich, TobiasQuasi-random Image Transition and Animation. Australian Journal of Intelligent Information Processing Systems 2019: 10-18
Evolutionary algorithms have been widely used in the area of creativity. Recently, evolutionary processes have been used to create artistic image transition processes using random walks. In this paper, we explore the use of quasi-random walks for evolutionary image transition and animation. Quasi-random walks show similar features as standard random walks, but with much less randomness. We utilize this established model from discrete mathematics and show how agents carrying out quasi-random walks can be used for evolutionary image transition and animation. The key idea is to generalize the notion of quasi-random walks and let a set of autonomous agents perform quasi-random walks painting an image. Each agent has one particular target image that they paint when following a sequence of directions for their quasi-random walk.The sequence can easily be chosen by a user and allows them to produce a wide range of different transition patterns and animations.
Cechlárová, Katarína; Cseh, Ágnes; Manlove, DavidSelected open problems in Matching Under Preferences. Bulletin of the European Association for Theoretical Computer Science 2019: 14-38
The House Allocation problem, the Stable Marriage problem and the Stable Roommates problem are three fundamental problems in the area of matching under preferences. These problems have been studied for decades under a range of optimality criteria, but despite much progress, some challenging questions remain open. The purpose of this article is to present a range of key open questions for each of these problems, which will hopefully stimulate further research activity in this area.
Battaglia, Francesco; Cucina, Domenico; Rizzo, ManuelDetection and estimation of additive outliers in seasonal time series. Computational Statistics 2019: 1-17
The detection of outliers in a time series is an important issue because their presence may have serious negative effects on the analysis in many different ways. Moreover the presence of a complex seasonal pattern in the series could affect the properties of the usual outlier detection procedures. Therefore modelling the appropriate form of seasonality is a very important step when outliers are present in a seasonal time series. In this paper we present some procedures for detection and estimation of additive outliers when parametric seasonal models, in particular periodic autoregressive, are specified to fit the data. A simulation study is presented to evaluate the benefits and the drawbacks of the proposed procedure on a selection of seasonal time series. An application to three real time series is also examined.
Trubenova, Barbora; Kötzing, Timo; Krejca, Martin S.; Lehre, Per KristianSurfing on the seascape: Adaptation in a changing environment. Evolution: International Journal of Organic Evolution 2019: 1356-1374
The environment changes constantly at various time scales and, in order to survive, species need to keep adapting. Whether these species succeed in avoiding extinction is a major evolutionary question. Using a multilocus evolutionary model of a mutation‐limited population adapting under strong selection, we investigate the effects of the frequency of environmental fluctuations on adaptation. Our results rely on an "adaptive‐walk" approximation and use mathematical methods from evolutionary computation theory to investigate the interplay between fluctuation frequency, the similarity of environments, and the number of loci contributing to adaptation. First, we assume a linear additive fitness function, but later generalize our results to include several types of epistasis. We show that frequent environmental changes prevent populations from reaching a fitness peak, but they may also prevent the large fitness loss that occurs after a single environmental change. Thus, the population can survive, although not thrive, in a wide range of conditions. Furthermore, we show that in a frequently changing environment, the similarity of threats that a population faces affects the level of adaptation that it is able to achieve. We check and supplement our analytical results with simulations.
Bläsius, Thomas; Rademacher, Marcel; Rutter, IgnazHow to Draw a Planarization. Graph Algorithms and Applications 2019: 653-682
We study the problem of computing straight-line drawings of non-planar graphs with few crossings. We assume that a crossing-minimization algorithm is applied first, yielding a planarization, i.e., a planar graph with a dummy vertex for each crossing, that fixes the topology of the resulting drawing. We present and evaluate two different approaches for drawing a planarization in such a way that the edges of the input graph are as straight as possible. The first approach is based on the planarity-preserving force-directed algorithm IMPRED [Simonetto et al. Computer Graphics Forum 2011], the second approach, which we call Geometric Planarization Drawing, iteratively moves vertices to their locally optimal positions in the given initial drawing. Our evaluation shows that both approaches significantly improve the initial drawing and that our geometric approach outperforms the force-directed approach. To the best of our knowledge, this is the first paper concerned with the generation of a straight-line drawing that respects an arbitrary planarization.
Cseh, Ágnes; Skutella, MartinPaths to stable allocations. International Journal of Game Theory 2019: 835-862
The stable allocation problem is one of the broadest extensions of the well-known stable marriage problem. In an allocation problem, edges of a bipartite graph have capacities and vertices have quotas to fill. Here we investigate the case of uncoordinated processes in stable allocation instances. In this setting, a feasible allocation is given and the aim is to reach a stable allocation by raising the value of the allocation along blocking edges and reducing it on worse edges if needed. Do such myopic changes lead to a stable solution? In our present work, we analyze both better and best response dynamics from an algorithmic point of view. With the help of two deterministic algorithms we show that random procedures reach a stable solution with probability one for all rational input data in both cases. Surprisingly, while there is a polynomial path to stability when better response strategies are played (even for irrational input data), the more intuitive best response steps may require exponential time. We also study the special case of correlated markets. There, random best response strategies lead to a stable allocation in expected polynomial time.
Cucina, Domenico; Rizzo, Manuel; Ursu, EugenMultiple changepoint detection for periodic autoregressive models with an application to river flow analysis. Stochastic Environmental Research and Risk Assessment 2019: 1137-1157
River flow data are usually subject to several sources of discontinuity and inhomogeneity. An example is seasonality, because climatic oscillations occurring on inter-annual timescale have an obvious impact on the river flow. Further sources of alteration can be caused by changes in reservoir management, instrumentation or even unexpected shifts in climatic conditions. When such changes are ignored the results of a statistical analysis can be strongly misleading, so flexible procedures are needed for building the appropriate models, which may be very complex. This paper develops an automatic procedure to estimate the number and locations of changepoints in Periodic AutoRegressive (PAR) models, which have been extensively used to account for seasonality in hydrology. We aim at filling the literature gap on multiple changepoint detection by allowing several time segments to be detected, inside of which a different PAR structure is specified, with the resulting model being employed to successfully capture the discontinuities of river flow data. The model estimation is performed by optimization of an objective function based on an information criterion using genetic algorithms. The proposed methodology is evaluated by means of simulation studies and it is then employed in the analysis of two river flows: the South Saskatchewan, measured at Saskatoon, Canada, and the Colorado, measured at Lees Ferry, Arizona. For these river flows we build changepoint models, discussing the possible events that caused discontinuity, and evaluate their forecasting accuracy. Comparisons with the literature on river flow analysis and on existing methods for changepoint detection confirm the efficiency of our proposal.
Batra, Sanjit Singh; Kumar, Nikhil; Tripathi, AmitabhaSome Problems Concerning the Frobenius Number for Extensions of an Arithmetic Progression. The Ramanujan Journal 2019: 545-565
For positive and relative prime set of integers \(A=\{a_1,\ldots,a_k\}\), let \(\Gamma(A)\) denote the set of integers of the form \(a_1 x_1+ \ldots +a_k x_k\) with each \(x_i \geq 0\). It is well known that \(\Gamma^c(A)=\mathbb{N setminus \Gamma(A)\) is a finite set, so that \(\texttt{g(A)\), which denotes the largest integer in \(\Gamma^{c(A)\), is well defined. Let \(A=AP(a,d,k)\) denote the set \(\{ a,a+d, dots, a+(k-1)d \}\) of integers in arithmetic progression, and let \(\texttt{gcd(a,d)=1\). We (i) determine the set \(A^+=\{ b in \Gamma^c(A):\texttt{g(A \cup \ b \})= \texttt{g(A) \}\); (ii) determine a subset \(\overline{A^{+}}\) of \(\Gamma^{c(A)\) of largest cardinality such that \(A \cup \overline{A^{+}}\) is an independent set and \(\texttt{g(A \cup \overline{A^{+}})=\texttt{g(A)\); and (iii) determine \(\texttt{g(A \cup \{b\})\) for some class of values of \(b\) that includes results of some recent work.
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.Unbiasedness of Estimation-of-Distribution Algorithms. Theoretical Computer Science 2019: 46-59
In the context of black-box optimization, black-box complexity is used for understanding the inherent difficulty of a given optimization problem. Central to our understanding of nature-inspired search heuristics in this context is the notion of unbiasedness. Specialized black-box complexities have been developed in order to better understand the limitations of these heuristics – especially of (population-based) evolutionary algorithms (EAs). In contrast to this, we focus on a model for algorithms explicitly maintaining a probability distribution over the search space: so-called estimation-of-distribution algorithms (EDAs). We consider the recently introduced \(n\)-Bernoulli-\(\lambda\)-EDA framework, which subsumes, for example, the commonly known EDAs PBIL, UMDA, \(\lambda\)-MMAS\(_\textrm{IB}\), and cGA. We show that an \(n\)-Bernoulli-\(\lambda\)-EDA is unbiased if and only if its probability distribution satisfies a certain invariance property under isometric automorphisms of \([0, 1]^n\). By restricting how an \(n\)-Bernoulli-\(\lambda\)-EDA can perform an update, in a way common to many examples, we derive conciser characterizations, which are easy to verify. We demonstrate this by showing that our examples above are all unbiased.
Kötzing, Timo; Krejca, Martin S.First-hitting times under drift. Theoretical Computer Science 2019: 51-69
For the last ten years, almost every theoretical result concerning the expected run time of a randomized search heuristic used drift theory, making it the arguably most important tool in this domain. Its success is due to its ease of use and its powerful result: drift theory allows the user to derive bounds on the expected first-hitting time of a random process by bounding expected local changes of the process – the drift. This is usually far easier than bounding the expected first-hitting time directly. Due to the widespread use of drift theory, it is of utmost importance to have the best drift theorems possible. We improve the fundamental additive, multiplicative, and variable drift theorems by stating them in a form as general as possible and providing examples of why the restrictions we keep are still necessary. Our additive drift theorem for upper bounds only requires the process to be lower-bounded, that is, we remove unnecessary restrictions like a finite, discrete, or bounded state space. As corollaries, the same is true for our upper bounds in the case of variable and multiplicative drift. By bounding the step size of the process, we derive new lower-bounding multiplicative and variable drift theorems. Last, we also state theorems that are applicable when the process has a drift of 0, by using a drift on the variance of the process.
Cseh, Ágnes; Irving, Robert W.; Manlove, David F.The Stable Roommates Problem with Short Lists. Theory of Computing Systems 2019: 128-149
We consider two variants of the classical Stable Roommates problem with Incomplete (but strictly ordered) preference lists (sri) that are degree constrained, i.e., preference lists are of bounded length. The first variant, egald-sri, involves finding an egalitarian stable matching in solvable instances of sri with preference lists of length at most \(d\). We show that this problem is NP-hard even if \(d = 3\). On the positive side we give a \(\frac{2d+3}{7}\)-approximation algorithm for \(d \in \{3,4,5\}\) which improves on the known bound of 2 for the unbounded preference list case. In the second variant of sri, called d-srti, preference lists can include ties and are of length at most d. We show that the problem of deciding whether an instance of d-srti admits a stable matching is NP-complete even if \(d = 3\). We also consider the "most stable" version of this problem and prove a strong inapproximability bound for the \(d = 3\) case. However for \(d = 2\) we show that the latter problem can be solved in polynomial time.
Friedrich, Tobias; Göbel, Andreas; Neumann, Frank; Quinzan, Francesco; Rothenberger, RalfGreedy Maximization of Functions with Bounded Curvature Under Partition Matroid Constraints. Conference on Artificial Intelligence (AAAI) 2019: 2272-2279
We investigate the performance of a deterministic GREEDY algorithm for the problem of maximizing functions under a partition matroid constraint. We consider non-monotone submodular functions and monotone subadditive functions. Even though constrained maximization problems of monotone submodular functions have been extensively studied, little is known about greedy maximization of non-monotone submodular functions or monotone subadditive functions. We give approximation guarantees for GREEDY on these problems, in terms of the curvature. We find that this simple heuristic yields a strong approximation guarantee on a broad class of functions. We discuss the applicability of our results to three real-world problems: Maximizing the determinant function of a positive semidefinite matrix, and related problems such as the maximum entropy sampling problem, the constrained maximum cut problem on directed graphs, and combinatorial auction games. We conclude that GREEDY is well-suited to approach these problems. Overall, we present evidence to support the idea that, when dealing with constrained maximization problems with bounded curvature, one needs not search for (approximate) monotonicity to get good approximate solutions.
Roostapour, Vahid; Neumann, Aneta; Neumann, Frank; Friedrich, TobiasPareto Optimization for Subset Selection with Dynamic Cost Constraints. Conference on Artificial Intelligence (AAAI) 2019: 2354-2361
In this paper, we consider subset selection problems for functions \(f\) with constraints where the constraint bound \(B\) changes over time. We point out that adaptive variants of greedy approaches commonly used in the area of submodular optimization are not able to maintain their approximation quality. Investigating the recently introduced POMC Pareto optimization approach, we show that this algorithm efficiently computes a \( phi= (\alpha_f/2)(1-\frac{1}{e^{\alpha_f}})\)-approximation, where \(\alpha_f\) is the submodularity ratio, for each possible constraint bound \(b \leq B\). Furthermore, we show that POMC is able to adapt its set of solutions quickly in the case that \(B\) increases. Our experimental investigations for the influence maximization in social networks show the advantage of POMC over generalized greedy algorithms.
Feldotto, Matthias; Lenzner, Pascal; Molitor, Louise; Skopalik, AlexanderFrom Hotelling to Load Balancing: Approximation and the Principle of Minimum Differentiation. Autonomous Agents and Multiagent Systems (AAMAS) 2019: 1949-1951
Competing firms tend to select similar locations for their stores. This phenomenon, called the principle of minimum differentiation, was captured by Hotelling with a landmark model of spatial competition but is still the object of an ongoing scientific debate. Although consistently observed in practice, many more realistic variants of Hotelling's model fail to support minimum differentiation or do not have pure equilibria at all. In particular, it was recently proven for a generalized model which incorporates negative network externalities and which contains Hotelling's model and classical selfish load balancing as special cases, that the unique equilibria do not adhere to minimum differentiation. Furthermore, it was shown that for a significant parameter range pure equilibria do not exist. We derive a sharp contrast to these previous results by investigating Hotelling's model with negative network externalities from an entirely new angle: approximate pure subgame perfect equilibria. This approach allows us to prove analytically and via agent-based simulations that approximate equilibria having good approximation guarantees and that adhere to minimum differentiation exist for the full parameter range of the model. Moreover, we show that the obtained approximate equilibria have high social welfare.
Bläsius, Thomas; Friedrich, Tobias; Lischeid, Julius; Meeks, Kitty; Schirneck, MartinEfficiently Enumerating Hitting Sets of Hypergraphs Arising in Data Profiling. Algorithm Engineering and Experiments (ALENEX) 2019: 130-143
We devise an enumeration method for inclusion-wise minimal hitting sets in hypergraphs. It has delay \(O(m^{k^\ast+1} \cdot n^2)\) and uses linear space. Hereby, \(n\) is the number of vertices, \(m\) the number of hyperedges, and \(k^\ast\) the rank of the transversal hypergraph. In particular, on classes of hypergraphs for which the cardinality \(k^\ast\) of the largest minimal hitting set is bounded, the delay is polynomial. The algorithm solves the extension problem for minimal hitting sets as a subroutine. We show that the extension problem is W[3]-complete when parameterised by the cardinality of the set which is to be extended. For the subroutine, we give an algorithm that is optimal under the exponential time hypothesis. Despite these lower bounds, we provide empirical evidence showing that the enumeration outperforms the theoretical worst-case guarantee on hypergraphs arising in the profiling of relational databases, namely, in the detection of unique column combinations.
Fokina, Ekaterina; Kötzing, Timo; San Mauro, LucaLimit Learning Equivalence Structures. Algorithmic Learning Theory (ALT) 2019: 383-403
While most research in Gold-style learning focuses on learning formal languages, we consider the identification of computable structures, specifically equivalence structures. In our core model the learner gets more and more information about which pairs of elements of a structure are related and which are not. The aim of the learner is to find (an effective description of) the isomorphism type of the structure presented in the limit. In accordance with language learning we call this learning criterion InfEx-learning (explanatory learning from informant). We start with a discussion and separations of different variants of this learning criterion, including learning from text (where the only information provided is which elements are related, and not which elements are not related) and finite learning (where the first actual conjecture of the learner has to be correct). This gives first intuitions and examples for what (classes of) structures are learnable and which are not. Our main contribution is a complete characterization of the learnable classes of structures in terms of a combinatorial property of the classes. This property allows us to derive a bound of \(\mathbf{0''}\) on the computational complexity required to learn uniformly enumerable classes of structures. Finally, we show how learning classes of structures relates to learning classes of languages by mapping learning tasks for structures to equivalent learning tasks for languages.
Casel, Katrin; Fernau, Henning; Khosravian Ghadikolaei, Mehdi; Monnot, Jerome; Sikora, FlorianExtension of vertex cover and independent set in some classes of graphs and generalizations. International Conference on Algorithms and Complexity (CIAC) 2019: 124-136
We study extension variants of the classical problems Vertex Cover and Independent Set. Given a graph \(G = (V, E)\) and a vertex set \(U \subseteq V\), it is asked if there exists a minimal vertex cover (resp. maximal independent set) \(S\) with \(U \subseteq S\) (resp. \(U \supseteq S\)). Possibly contradicting intuition, these problems tend to be NP-complete, even in graph classes where the classical problem can be solved efficiently. Yet, we exhibit some graph classes where the extension variant remains polynomial-time solvable. We also study the parameterized complexity of theses problems, with parameter \(|U|\), as well as the optimality of simple exact algorithms under ETH. All these complexity considerations are also carried out in very restricted scenarios, be it degree or topological restrictions (bipartite, planar or chordal graphs). This also motivates presenting some explicit branching algorithms for degree-bounded instances. We further discuss the price of extension, measuring the distance of \(U\) to the closest set that can be extended, which results in natural optimization problems related to extension problems for which we discuss polynomial-time approximability.
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian; Meyer, Ulrich; Penschuck, Manuel; Weyand, ChristopherEfficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs. European Symposium on Algorithms (ESA) 2019: 21:2-21:14
EATCS Best Paper Award
Hyperbolic random graphs (HRG) and geometric inhomogeneous random graphs (GIRG) are two similar generative network models that were designed to resemble complex real world networks. In particular, they have a power-law degree distribution with controllable exponent \(\beta\), and high clustering that can be controlled via the temperature \(T\). We present the first implementation of an efficient GIRG generator running in expected linear time. Besides varying temperatures, it also supports underlying geometries of higher dimensions. It is capable of generating graphs with ten million edges in under a second on commodity hardware. The algorithm can be adapted to HRGs. Our resulting implementation is the fastest sequential HRG generator, despite the fact that we support non-zero temperatures. Though non-zero temperatures are crucial for many applications, most existing generators are restricted to \(T = 0\). We also support parallelization, although this is not the focus of this paper. Moreover, we note that our generators draw from the correct probability distribution, i.e., they involve no approximation. Besides the generators themselves, we also provide an efficient algorithm to determine the non-trivial dependency between the average degree of the resulting graph and the input parameters of the GIRG model. This makes it possible to specify \(\bar{d}\) as input and obtain a graph with expected average degree \(\bar{d}\). Moreover, we investigate the differences between HRGs and GIRGs, shedding new light on the nature of the relation between the two models. Although HRGs represent, in a certain sense, a special case of the GIRG model, we find that a straight-forward inclusion does not hold in practice. However, the difference is negligible for most use cases.
Casel, Katrin; Fernau, Henning; Khosravian Ghadikolaei, Mehdi; Monnot, Jerome; Sikora, FlorianExtension of some edge graph problems: standard and parameterized complexity. Fundamentals of Computation Theory (FCT) 2019: 185-200
We consider extension variants of some edge optimization problems in graphs containing the classical Edge Cover, Matching, and Edge Dominating Set problems. Given a graph \(G=(V,E)\) and an edge set \(U \subseteq E\), it is asked whether there exists an inclusion-wise minimal (resp., maximal) feasible solution \(E'\) which satisfies a given property, for instance, being an edge dominating set (resp., a matching) and containing the forced edge set \(U\) (resp., avoiding any edges from the forbidden edge set \(E U\)). We present hardness results for these problems, for restricted instances such as bipartite or planar graphs. We counter-balance these negative results with parameterized complexity results. We also consider the price of extension, a natural optimization problem variant of extension problems, leading to some approximation results.
Doerr, Benjamin; Kötzing, TimoMultiplicative Up-Drift. Genetic and Evolutionary Computation Conference (GECCO) 2019
Drift analysis aims at translating the expected progress of an evo- lutionary algorithm (or more generally, a random process) into a probabilistic guarantee on its run time (hitting time). So far, drift arguments have been successfully employed in the rigorous analy- sis of evolutionary algorithms, however, only for the situation that the progress is constant or becomes weaker when approaching the target. Motivated by questions like how fast fit individuals take over a population, we analyze random processes exhibiting a multiplica- tive growth in expectation. We prove a drift theorem translating this expected progress into a hitting time. This drift theorem gives a sim- ple and insightful proof of the level-based theorem first proposed by Lehre (2011). Our version of this theorem has, for the first time, the best-possible linear dependence on the growth parameter \(\delta\) (the previous-best was quadratic). This gives immediately stronger run time guarantees for a number of applications.
Alon, Noga; Chechik, Shiri; Cohen, SarelDeterministic Combinatorial Replacement Paths and Distance Sensitivity Oracles. International Colloquium on Automata, Languages and Programming (ICALP) 2019: 12:1-12:14
In this work we derandomize two central results in graph algorithms, replacement paths and distance sensitivity oracles (DSOs) matching in both cases the running time of the randomized algorithms. For the replacement paths problem, let \(G = (V,E)\) be a directed unweighted graph with \(n\) vertices and m edges and let \(P\) be a shortest path from \(s\) to \(t\) in \(G\). The replacement paths problem is to find for every edge \(e\) in \(P\) the shortest path from \(s\) to \(t\) avoiding \(e\). Roditty and Zwick [ICALP 2005] obtained a randomized algorithm with running time of \(\mathcal{O(m \sqrt{n})\). Here we provide the first deterministic algorithm for this problem, with the same \(\mathcal{O(m \sqrt{n})\) time. Due to matching conditional lower bounds of Williams et al. [FOCS 2010], our deterministic combinatorial algorithm for the replacement paths problem is optimal up to polylogarithmic factors (unless the long standing bound of \(\mathcal{O(mn)\) for the combinatorial boolean matrix multiplication can be improved). This also implies a deterministic algorithm for the second simple shortest path problem in \(\mathcal{O(m \sqrt{n})\) time, and a deterministic algorithm for the \(k\)-simple shortest paths problem in \(\mathcal{O(k m sqrt{n})\) time (for any integer constant \(k > 0\)). For the problem of distance sensitivity oracles, let \(G = (V,E)\) be a directed graph with real-edge weights. An \(f\)-Sensitivity Distance Oracle (\(f\)-DSO) gets as input the graph \(G=(V,E)\) and a parameter \(f\), preprocesses it into a data-structure, such that given a query \((s,t,F)\) with \(s,t \in V\) and \(F \subseteq E \cup V\), \(|F| <=f\) being a set of at most \(f\) edges or vertices (failures), the query algorithm efficiently computes the distance from \(s\) to \(t\) in the graph \(G \setminus F\) (i.e., the distance from \(s\) to \(t\) in the graph \(G\) after removing from it the failing edges and vertices \(F\)). For weighted graphs with real edge weights, Weimann and Yuster [FOCS 2010] presented several randomized \(f\)-DSOs. In particular, they presented a combinatorial \(f\)-DSO with \(\mathcal{O(mn^{4-\alpha})\) preprocessing time and subquadratic \(\mathcal{O(n^2-2(1-\alpha)/f})\) query time, giving a tradeoff between preprocessing and query time for every value of \(0 < \alpha < 1\). We derandomize this result and present a combinatorial deterministic \(f\)-DSO with the same asymptotic preprocessing and query time.
Friedrich, Tobias; Rothenberger, RalfThe Satisfiability Threshold for Non-Uniform Random 2-SAT. International Colloquium on Automata, Languages and Programming (ICALP) 2019: 61:1-61:14
Propositional satisfiability (SAT) is one of the most fundamental problems in computer science. Its worst-case hardness lies at the core of computational complexity theory, for example in the form of NP-hardness and the (Strong) Exponential Time Hypothesis. In practice however, SAT instances can often be solved efficiently. This contradicting behavior has spawned interest in the average-case analysis of SAT and has triggered the development of sophisticated rigorous and non-rigorous techniques for analyzing random structures. Despite a long line of research and substantial progress, most theoretical work on random SAT assumes a uniform distribution on the variables. In contrast, real-world instances often exhibit large fluctuations in variable occurrence. This can be modeled by a non-uniform distribution of the variables, which can result in distributions closer to industrial SAT instances. We study satisfiability thresholds of non-uniform random 2-SAT with n variables and m clauses and with an arbitrary probability distribution \((p_i)_{i \in [n]}\) with \(p_1 \ge p_2 \ge \dots \ge p_n > 0\) over the n variables. We show for \(p_1^2 = \Theta(\sum_{i=1^n p_i^2)\) that the asymptotic satisfiability threshold is at \(m = \Theta((1− \sum_{i=1^n p_i^2) / \)\((p_1 (\sum_{i=2^n p_i^2)^{1/2}))\) and that it is coarse. For \(p_1^2 = o( \sum_{i=1^n p_i^2)\) we show that there is a sharp satisfiability threshold at \(m = (\sum_{i=1^n p_i^2)^{−1}\). This result generalizes the seminal works by Chvatal and Reed [FOCS 1992] and by Goerdt [JCSS 1996].
Casel, Katrin; Day, Joel D.; Fleischmann, Pamela; Kociumaka, Tomasz; Manea, Florin; Schmid, Markus L.Graph and String Parameters: Connections Between Pathwidth, Cutwidth and the Locality Number. International Colloquium on Automata, Languages and Programming (ICALP) 2019: 109:1-109:16
We investigate the locality number, a recently introduced structural parameter for strings (with applications in pattern matching with variables), and its connection to two important graph-parameters, cutwidth and pathwidth. These connections allow us to show that computing the locality number is NP-hard but fixed parameter tractable (when the locality number or the alphabet size is treated as a parameter), and can be approximated with ratio \(O(\sqrt{\log opt \log n)\). As a by-product, we also relate cutwidth via the locality number to pathwidth, which is of independent interest, since it improves the currently best known approximation algorithm for cutwidth. In addition to these main results, we also consider the possibility of greedy-based approximation algorithms for the locality number.
Peters, Jannik; Stephan, Daniel; Amon, Isabel; Gawendowicz, Hans; Lischeid, Julius; Salabarria, Julius; Umland, Jonas; Werner, Felix; Krejca, Martin S.; Rothenberger, Ralf; Kötzing, Timo; Friedrich, TobiasMixed Integer Programming versus Evolutionary Computation for Optimizing a Hard Real-World Staff Assignment Problem. International Conference on Automated Planning and Scheduling (ICAPS) 2019: 541-554
Assigning staff to engagements according to hard constraints while optimizing several objectives is a task encountered by many companies on a regular basis. Simplified versions of such assignment problems are NP-hard. Despite this, a typical approach to solving them consists of formulating them as mixed integer programming (MIP) problems and using a state-of-the-art solver to get solutions that closely approximate the optimum. In this paper, we consider a complex real-world staff assignment problem encountered by the professional service company KPMG, with the goal of finding an algorithm that solves it faster and with a better solution than a commercial MIP solver. We follow the evolutionary algorithm (EA) metaheuristic and design a search heuristic which iteratively improves a solution using domain-specific mutation operators. Furthermore, we use a flow algorithm to optimally solve a subproblem, which tremendously reduces the search space for the EA. For our real-world instance of the assignment problem, given the same total time budget of \(100\) hours, a parallel EA approach finds a solution that is only \(1.7\)% away from an upper bound for the (unknown) optimum within under five hours, while the MIP solver Gurobi still has a gap of \(10.5\) %.
Friedrich, Tobias; Rothenberger, RalfSharpness of the Satisfiability Threshold for Non-Uniform Random \(k\)-SAT.International Joint Conference on Artificial Intelligence (IJCAI) 2019: 6151-6155
We study non-uniform random k-SAT on n variables with an arbitrary probability distribution p on the variable occurrences. The number \(t = t(n)\) of randomly drawn clauses at which random formulas go from asymptotically almost surely (a.a.s.) satisfiable to a.a.s. unsatisfiable is called the satisfiability threshold. Such a threshold is called sharp if it approaches a step function as n increases. We show that a threshold t(n) for random k-SAT with an ensemble \((p_n)_{n\in\mathbb{N}}\) of arbitrary probability distributions on the variable occurrences is sharp if \(\|p\|_2^2 = O_n(t^{-2/k})\) and \(\|p\|_\infty\) \(= o_n(t^-k/(2k-1) \log^{-(k-1)/(2k-1)(t))\). This result generalizes Friedgut’s sharpness result from uniform to non-uniform random k-SAT and implies sharpness for thresholds of a wide range of random k-SAT models with heterogeneous probability distributions, for example such models where the variable probabilities follow a power-law distribution.
Gao, Ziyuan; Jain, Sanjay; Khoussainov, Bakhadyr; Li, Wei; Melnikov, Alexander; Seidel, Karen; Stephan, FrankRandom Subgroups of Rationals. Mathematical Foundations of Computer Science (MFCS) 2019: 25:1-25:14
This paper introduces and studies a notion of algorithmic randomness for subgroups of rationals. Given a randomly generated additive subgroup \((G,+)\) of rationals, two main questions are addressed: first, what are the model-theoretic and recursion-theoretic properties of \((G,+)\); second, what learnability properties can one extract from \(G\) and its subclass of finitely generated subgroups? For the first question, it is shown that the theory of \((G,+)\) coincides with that of the additive group of integers and is therefore decidable; furthermore, while the word problem for \(G\) with respect to any generating sequence for \(G\) is not even semi-decidable, one can build a generating sequence \(\beta\) such that the word problem for \(G\) with respect to \(\beta\) is co-recursively enumerable (assuming that the set of generators of \(G\) is limit-recursive). In regard to the second question, it is proven that there is a generating sequence \(\beta\) for \(G\) such that every non-trivial finitely generated subgroup of \(G\) is recursively enumerable and the class of all such subgroups of \(G\) is behaviourally correctly learnable, that is, every non-trivial finitely generated subgroup can be semantically identified in the limit (again assuming that the set of generators of \(G\) is limit-recursive). On the other hand, the class of non-trivial finitely generated subgroups of \(G\) cannot be syntactically identified in the limit with respect to any generating sequence for \(G\). The present work thus contributes to a recent line of research studying algorithmically random infinite structures and uncovers an interesting connection between the arithmetical complexity of the set of generators of a randomly generated subgroup of rationals and the learnability of its finitely generated subgroups.
Chechik, Shiri; Cohen, SarelNear Optimal Algorithms For The Single Source Replacement Paths Problem. Symposium on Discrete Algorithms (SODA) 2019: 2090-2109
The Single Source Replacement Paths (SSRP) problem is as follows; Given a graph \(G = (V, E)\), a source vertex \(s\) and a shortest paths tree \(T_s\) rooted in \(s\), output for every vertex \(t \in V\) and for every edge \(e\) in \(T_s\) the length of the shortest path from \(s\) to \(t\) avoiding \(e\). We present near optimal upper bounds, by providing \(\tilde{\mathcal{O(m \sqrt{n} + n^2) \) time randomized combinatorial algorithm for unweighted undirected graphs, and matching conditional lower bounds for the SSRP problem.
Bilò, Davide; Friedrich, Tobias; Lenzner, Pascal; Melnichenko, AnnaGeometric Network Creation Games. Symposium on Parallelism in Algorithms and Architectures (SPAA) 2019: 323-332
Network Creation Games are a well-known approach for explaining and analyzing the structure, quality and dynamics of real-world networks like the Internet and other infrastructure networks which evolved via the interaction of selfish agents without a central authority. In these games selfish agents which correspond to nodes in a network strategically buy incident edges to improve their centrality. However, past research on these games has only considered the creation of networks with unit-weight edges. In practice, e.g. when constructing a fiber-optic network, the choice of which nodes to connect and also the induced price for a link crucially depends on the distance between the involved nodes and such settings can be modeled via edge-weighted graphs. We incorporate arbitrary edge weights by generalizing the well-known model by Fabrikant et al.~[PODC'03] to edge-weighted host graphs and focus on the geometric setting where the weights are induced by the distances in some metric space. In stark contrast to the state-of-the-art for the unit-weight version, where the Price of Anarchy is conjectured to be constant and where resolving this is a major open problem, we prove a tight non-constant bound on the Price of Anarchy for the metric version and a slightly weaker upper bound for the non-metric case. Moreover, we analyze the existence of equilibria, the computational hardness and the game dynamics for several natural metrics. The model we propose can be seen as the game-theoretic analogue of a variant of the classical Network Design Problem. Thus, low-cost equilibria of our game correspond to decentralized and stable approximations of the optimum network design.
Friedrich, TobiasFrom Graph Theory to Network Science: The Natural Emergence of Hyperbolicity. Symposium Theoretical Aspects of Computer Science (STACS) 2019: 5:1–5:9
Network science is driven by the question which properties large real-world networks have and how we can exploit them algorithmically. In the past few years, hyperbolic graphs have emerged as a very promising model for scale-free networks. The connection between hyperbolic geometry and complex networks gives insights in both directions: (1) Hyperbolic geometry forms the basis of a natural and explanatory model for real-world networks. Hyperbolic random graphs are obtained by choosing random points in the hyperbolic plane and connecting pairs of points that are geometrically close. The resulting networks share many structural properties for example with online social networks like Facebook or Twitter. They are thus well suited for algorithmic analyses in a more realistic setting. (2) Starting with a real-world network, hyperbolic geometry is well-suited for metric embeddings. The vertices of a network can be mapped to points in this geometry, such that geometric distances are similar to graph distances. Such embeddings have a variety of algorithmic applications ranging from approximations based on efficient geometric algorithms to greedy routing solely using hyperbolic coordinates for navigation decisions.
Cseh, Ágnes; Juhos, AttilaPairwise Preferences in the Stable Marriage Problem. Symposium Theoretical Aspects of Computer Science (STACS) 2019: 21:1-21:16
We study the classical, two-sided stable marriage problem under pairwise preferences. In the most general setting, agents are allowed to express their preferences as comparisons of any two of their edges and they also have the right to declare a draw or even withdraw from such a comparison. This freedom is then gradually restricted as we specify six stages of orderedness in the preferences, ending with the classical case of strictly ordered lists. We study all cases occurring when combining the three known notions of stability - weak, strong and super-stability - under the assumption that each side of the bipartite market obtains one of the six degrees of orderedness. By designing three polynomial algorithms and two NP-completeness proofs we determine the complexity of all cases not yet known, and thus give an exact boundary in terms of preference structure between tractable and intractable cases.
Bläsius, Thomas; Friedrich, Tobias; Sutton, Andrew M.On the Empirical Time Complexity of Scale-Free 3-SAT at the Phase Transition. Tools and Algorithms for the Construction and Analysis of Systems (TACAS) 2019: 117-134
The hardness of formulas at the solubility phase transition of random propositional satisfiability (SAT) has been intensely studied for decades both empirically and theoretically. Solvers based on stochastic local search (SLS) appear to scale very well at the critical threshold, while complete backtracking solvers exhibit exponential scaling. On industrial SAT instances, this phenomenon is inverted: backtracking solvers can tackle large industrial problems, where SLS-based solvers appear to stall. Industrial instances exhibit sharply different structure than uniform random instances. Among many other properties, they are often heterogeneous in the sense that some variables appear in many while others appear in only few clauses. We conjecture that the heterogeneity of SAT formulas alone already contributes to the trade-off in performance between SLS solvers and complete backtracking solvers. We empirically determine how the run time of SLS vs. backtracking solvers depends on the heterogeneity of the input, which is controlled by drawing variables according to a scale-free distribution. Our experiments reveal that the efficiency of complete solvers at the phase transition is strongly related to the heterogeneity of the degree distribution. We report results that suggest the depth of satisfying assignments in complete search trees is influenced by the level of heterogeneity as measured by a power-law exponent. We also find that incomplete SLS solvers, which scale well on uniform instances, are not affected by heterogeneity. The main contribution of this paper utilizes the scale-free random 3-SAT model to isolate heterogeneity as an important factor in the scaling discrepancy between complete and SLS solvers at the uniform phase transition found in previous works.
Bläsius, Thomas; Fischbeck, Philipp; Friedrich, Tobias; Schirneck, MartinUnderstanding the Effectiveness of Data Reduction in Public Transportation Networks. Workshop on Algorithms and Models for the Web Graph (WAW) 2019: 87-101
Given a public transportation network of stations and connections, we want to find a minimum subset of stations such that each connection runs through a selected station. Although this problem is NP-hard in general, real-world instances are regularly solved almost completely by a set of simple reduction rules. To explain this behavior, we view transportation networks as hitting set instances and identify two characteristic properties, locality and heterogeneity. We then devise a randomized model to generate hitting set instances with adjustable properties. While the heterogeneity does influence the effectiveness of the reduction rules, the generated instances show that locality is the significant factor. Beyond that, we prove that the effectiveness of the reduction rules is independent of the underlying graph structure. Finally, we show that high locality is also prevalent in instances from other domains, facilitating a fast computation of minimum hitting sets.
Echzell, Hagen; Friedrich, Tobias; Lenzner, Pascal; Molitor, Louise; Pappik, Marcus; Schöne, Friedrich; Sommer, Fabian; Stangl, DavidConvergence and Hardness of Strategic Schelling Segregation. Web and Internet Economics (WINE) 2019: 156-170
The phenomenon of residential segregation was captured by Schelling's famous segregation model where two types of agents are placed on a grid and an agent is content with her location if the fraction of her neighbors which have the same type as her is at least \(\tau\), for some \(0<\tau<1\). Discontent agents simply swap their location with a randomly chosen other discontent agent or jump to a random empty cell. We analyze a generalized game-theoretic model of Schelling segregation which allows more than two agent types and more general underlying graphs modeling the residential area. For this we show that both aspects heavily influence the dynamic properties and the tractability of finding an optimal placement. We map the boundary of when improving response dynamics (IRD), i.e., the natural approach for finding equilibrium states, are guaranteed to converge. For this we prove several sharp threshold results where guaranteed IRD convergence suddenly turns into the strongest possible non-convergence result: a violation of weak acyclicity. In particular, we show such threshold results also for Schelling's original model, which is in contrast to the standard assumption in many empirical papers. Furthermore, we show that in case of convergence, IRD find an equilibrium in \(O(m)\) steps, where \(m\) is the number of edges in the underlying graph and show that this bound is met in empirical simulations starting from random initial agent placements.
Krejca, Martin S.Theoretical Analyses of Univariate Estimation-of-Distribution Algorithms. PhD thesis, Hasso Plattner Institute, University of Potsdam 2019
ACM SIGEVO Dissertation Award, Honorable Mention
Optimization is a core part of technological advancement and is usually heavily aided by computers. However, since many optimization problems are hard, it is unrealistic to expect an optimal solution within reasonable time. Hence, heuristics are employed, that is, computer programs that try to produce solutions of high quality quickly. One special class are estimation-of-distribution algorithms (EDAs), which are characterized by maintaining a probabilistic model over the problem domain, which they evolve over time. In an iterative fashion, an EDA uses its model in order to generate a set of solutions, which it then uses to refine the model such that the probability of producing good solutions is increased. In this thesis, we theoretically analyze the class of univariate EDAs over the Boolean domain, that is, over the space of all length-\(n\) bit strings. In this setting, the probabilistic model of a univariate EDA consists of an \(n\)-dimensional probability vector where each component denotes the probability to sample a \(1\) for that position in order to generate a bit string. My contribution follows two main directions: first, we analyze general inherent properties of univariate EDAs. Second, we determine the expected run times of specific EDAs on benchmark functions from theory. In the first part, we characterize when EDAs are unbiased with respect to the problem encoding. We then consider a setting where all solutions look equally good to an EDA, and we show that the probabilistic model of an EDA quickly evolves into an incorrect model if it is always updated such that it does not change in expectation. In the second part, we first show that the algorithms cGA and MMAS-fp are able to efficiently optimize a noisy version of the classical benchmark function OneMax. We perturb the function by adding Gaussian noise with a variance of \(\sigma^2\), and we prove that the algorithms are able to generate the true optimum in a time polynomial in \(\sigma^2\) and the problem size \(n\). For the MMAS-fp, we generalize this result to linear functions. Further, we prove a run time of \(\Omega\big(n \log(n)\big)\) for the algorithm UMDA on (unnoisy) OneMax. Last, we introduce a new algorithm that is able to optimize the benchmark functions OneMax and LeadingOnes both in \(O\big(n \log(n)\big)\), which is a novelty for heuristics in the domain we consider.
FOGA ’19: Proceedings of the 15th ACM/SIGEVO Conference on Foundations of Genetic Algorithms. 2019ACM.
Editorship
The FOGA workshop series aims at advancing our understanding of the working principles behind evolutionary algorithms and related randomized search heuristics. FOGA is the premier event to discuss advances in the theoretical foundations of these algorithms, tools needed to analyze them, and different aspects of comparing algorithms' performance. FOGA XV has welcomed submissions covering the entire spectrum of work, ranging from rigorously derived mathematical results to carefully crafted empirical studies.
Theory of Randomized Optimization Heuristics. Dagstuhl Reports 2019: 61-94
This report documents the activities of Dagstuhl Seminar 19431 on Theory of Randomized Optimization Heuristics. 46 researchers from Europe, Australia, Asia, and North America have come together to discuss ongoing research. This tenth edition of the seminar series had three focus topics: (1) relation between optimal control and heuristic optimization, (2) benchmarking optimization heuristics, and (3) the interfaces between continuous and discrete optimization. Several breakout sessions have provided ample opportunity to brainstorm on recent developments in the research landscape, to discuss and solve open problems, and to kick-start new research initiatives.
Bläsius, Thomas; Friedrich, Tobias; Schirneck, MartinThe Minimization of Random Hypergraphs. CoRR 2019
ArXiv preprint
We investigate the maximum-entropy model \(\mathcal{B}_{n,m,p}\) for random \(n\)-vertex, \(m\)-edge multi-hypergraphs with expected edge size \(pn\). We show that the expected size of the minimization of \(\mathcal{B}_{n,m,p}\), i.e., the number of its inclusion-wise minimal edges, undergoes a phase transition with respect to \(m\). If \(m\) is at most \(1/(1-p)^(1-p)n}\), then the minimization is of size \(\Theta(m)\). Beyond that point, for \(\alpha\) such that \(m = 1/(1-p)^\alpha n}\) and \(\mathrm{H}\) being the entropy function, it is \(\Theta(1) cdot \min\left(1, , \frac{1}{(\alpha\,{-}\,(1-p)) \sqrt{(1\,{-}\,\alpha) n \right) cdot 2^(\mathrm{H(\alpha) + (1-\alpha) \log_2 p) n}\). This implies that the maximum expected size over all \(m\) is \(\Theta((1+p)^n/\sqrt{n})\). Our structural findings have algorithmic implications for minimizing an input hypergraph, which in turn has applications in the profiling of relational databases as well as for the Orthogonal Vectors problem studied in fine-grained complexity. The main technical tool is an improvement of the Chernoff--Hoeffding inequality, which we make tight up to constant factors. We show that for a binomial variable \(X sim \mathrm{Bin(n,p)\) and real number \(0 < x \le p\), it holds that \(\mathrm{P[X leq xn] = \Theta(1) cdot \min\left(1, , \frac{1}{(p-x) \sqrt{xn} \right) cdot 2^-\mathrm{D(x \,\|}\, p) n}\), where \(\mathrm{D}\) denotes the Kullback--Leibler divergence between Bernoulli distributions. The result remains true if \(x\) depends on \(n\) as long as it is bounded away from \(0\).
Issac, Davis; Bhattacharya, Anup; Kumar, Amit; Jaiswal, RageshSampling in space restricted settings. Algorithmica 2018: 1439-1458
Space efficient algorithms play an important role in dealing with large amount of data. In such settings, one would like to analyze the large data using small amount of “working space”. One of the key steps in many algorithms for analyzing large data is to maintain a (or a small number) random sample from the data points. In this paper, we consider two space restricted settings-(i) the streaming model, where data arrives over time and one can use only a small amount of storage, and (ii) the query model, where we can structure the data in low space and answer sampling queries. In this paper, we prove the following results in the above two settings: \begin{itemize item In the streaming setting, we would like to maintain a random sample from the elements seen so far. We prove that one can maintain a random sample using \( \mathcal{O(\log n) \) random bits and \( \mathcal{O(\log n) \) bits of space, where n is the number of elements seen so far. We can extend this to the case when elements have weights as well. item In the query model, there are n elements with weights \(w_1, \dots, w_n \) (which are w-bit integers) and one would like to sample a random element with probability proportional to its weight. Bringmann and Larsen (STOC 2013) showed how to sample such an element using \(n \omega + 1\) bits of space (whereas, the information theoretic lower bound is \(n \omega \)). We consider the approximate sampling problem, where we are given an error parameter \( \varepsilon \), and the sampling probability of an element can be off by an \( \varepsilon \) factor. We give matching upper and lower bounds for this problem. \end{itemize
Arulselvan, Ashwin; Cseh, Ágnes; Groß, Martin; Manlove, David F.; Matuschke, JannikMatchings with Lower Quotas: Algorithms and Complexity. Algorithmica 2018: 185-208
We study a natural generalization of the maximum weight many-to-one matching problem. We are given an undirected bipartite graph \(G = (A \cup P, E)\) with weights on the edges in E, and with lower and upper quotas on the vertices in P. We seek a maximum weight many-to-one matching satisfying two sets of constraints: vertices in A are incident to at most one matching edge, while vertices in \(P\) are either unmatched or they are incident to a number of matching edges between their lower and upper quota. This problem, which we call maximum weight many-to-one matching with lower and upper quotas (WMLQ), has applications to the assignment of students to projects within university courses, where there are constraints on the minimum and maximum numbers of students that must be assigned to each project. In this paper, we provide a comprehensive analysis of the complexity of WMLQ from the viewpoints of classical polynomial time algorithms, fixed-parameter tractability, as well as approximability. We draw the line between NP-hard and polynomially tractable instances in terms of degree and quota constraints and provide efficient algorithms to solve the tractable ones. We further show that the problem can be solved in polynomial time for instances with bounded treewidth; however, the corresponding runtime is exponential in the treewidth with the maximum upper quota \(u_{\max}\) as basis, and we prove that this dependence is necessary unless FPT=W[1]. The approximability of WMLQ is also discussed: we present an approximation algorithm for the general case with performance guarantee \(u_{\max}+1\), which is asymptotically best possible unless \(P=NP\). Finally, we elaborate on how most of our positive results carry over to matchings in arbitrary graphs with lower quotas.
Bläsius, Thomas; Karrer, Annette; Rutter, IgnazSimultaneous Embedding: Edge Orderings, Relative Positions, Cutvertices. Algorithmica 2018: 1214-1277
A simultaneous embedding (with fixed edges) of two graphs \(G^1\) and \(G^2\) with common graph \(G = G^1 ∩ G^2\) is a pair of planar drawings of \(G^1\) and \(G^2\) that coincide on \(G\). It is an open question whether there is a polynomial-time algorithm that decides whether two graphs admit a simultaneous embedding (problem SEFE). In this paper, we present two results. First, a set of three linear-time preprocessing algorithms that remove certain substructures from a given SEFE instance, producing a set of equivalent Sefe instances without such substructures. The structures we can remove are (1) cutvertices of the union graph \(G^∪ = G^1 ∪ G^2\) , (2) most separating pairs of \(G^∪\), and (3) connected components of G that are biconnected but not a cycle. Second, we give an \(O(n³)\)-time algorithm solving Sefe for instances with the following restriction. Let u be a pole of a P-node \(µ\) in the SPQR-tree of a block of \(G^1\) or \(G^2\). Then at most three virtual edges of \(µ\) may contain common edges incident to u. All algorithms extend to the sunflower case, i.e., to the case of more than two graphs pairwise intersecting in the same common graph.
Kötzing, Timo; Sudholt, DirkPreface to the Special Issue on Theory of Genetic and Evolutionary Computation. Algorithmica 2018: 1575-1578
Evolutionary algorithms (EAs) are randomized search heuristics that can be employed to solve complex optimization problems, including multimodal or highly constrained problems. EAs work by mimicking principles from natural evolution: maintaining a collection of possible solutions (a population) and iteratively creating variants of the individuals (the offspring) and then choosing a new set of individuals for the next iteration (selection). EAs are popular because they represent general-purpose optimizers that can be easily applied to various problems, even in cases where little or no in-depth knowledge about the problem is available. In order to guide practitioners devising new and effective algorithms, theoretical computer scientists employ methods from the field of randomized algorithms to analyze the working principles of EAs with mathematical rigor. Key questions concern the impact of parameter choices (such as, for example, the offspring size or the choice of variation operators) as well as foundational work on developing powerful analysis methods. The theory track of the annual ACM Genetic and Evolutionary Computation Conference (GECCO) is the first tier event for advances in this direction. In this special issue six selected papers from the 2016 edition of the GECCO theory track are collected, each one of them carefully revised and extended to meet the high quality standards of Algorithmica.
Doerr, Benjamin; Doerr, Carola; Kötzing, TimoStatic and Self-Adjusting Mutation Strengths for Multi-valued Decision Variables. Algorithmica 2018: 1732-1768
The most common representation in evolutionary computation are bit strings. With very little theoretical work existing on how to use evolutionary algorithms for decision variables taking more than two values, we study the run time of simple evolutionary algorithms on some OneMax-like functions defined over \(\Omega={0,1,\dots,r−1}^n\). We observe a crucial difference in how we extend the one-bit-flip and standard-bit mutation operators to the multi-valued domain. While it is natural to modify a random position of the string or select each position of the solution vector for modification independently with probability \(1/n\), there are various ways to then change such a position. If we change each selected position to a random value different from the original one, we obtain an expected run time of \(\Theta(n r \log n)\). If we change each selected position by \(+1\) or \(−1\) (random choice), the optimization time reduces to \(\Theta(n r + n \log n)\). If we use a random mutation strength \(i \in {0,1,\dots,r−1}\) with probability inversely proportional to \(i\) and change the selected position by \(+i\) or \(−i\) (random choice), then the optimization time becomes \(\Theta(n \log(r)(\log n + \log r))\), which is asymptotically faster than the previous if \(r = \omega(\log(n)\log\log(n))\). Interestingly, a better expected performance can be achieved with a self-adjusting mutation strength that is based on the success of previous iterations. For the mutation operator that modifies a randomly chosen position, we show that the self-adjusting mutation strength yields an expected optimization time of \(\Theta(n(\log n + \log r))\), which is best possible among all dynamic mutation strengths. In our proofs, we use a new multiplicative drift theorem for computing lower bounds, which is not restricted to processes that move only towards the target.
Bläsius, Thomas; Friedrich, Tobias; Krohmer, AntonCliques in Hyperbolic Random Graphs. Algorithmica 2018: 2324-2344
Most complex real world networks display scale-free features. This characteristic motivated the study of numerous random graph models with a power-law degree distribution. There is, however, no established and simple model which also has a high clustering of vertices as typically observed in real data. Hyperbolic random graphs bridge this gap. This natural model has recently been introduced by Krioukov et al. and has shown theoretically and empirically to fulfill all typical properties of real world networks, including power-law degree distribution and high clustering. We study cliques in hyperbolic random graphs \(G\) and present new results on the expected number of \(k\)-cliques \(E[K_k]\) and the size of the largest clique \(\omega(G)\). We observe that there is a phase transition at power-law exponent \(\beta = 3\). More precisely, for \(\beta\)\(\in\)\((2,3)\) we prove \(E[K_k] = \) \(n^{k(3-\beta)/2} \Theta(k)^{-k}\) and \(\omega(G) = \) \(\Theta\)\((n^{(3-\beta)/2})\), while for \(\beta \geq 3\) we prove \(E[K_k]=n \Theta(k)^{-k}\) and \(\omega(G)=\Theta(\log(n)/ \log\log n)\). Furthermore, we show that for \(\beta \geq 3\), cliques in hyperbolic random graphs can be computed in time \(O(n)\). If the underlying geometry is known, cliques can be found with worst-case runtime \(O(m n^{2.5})\) for all values of \(\beta\).
Abu-Khzam, Faisal N.; Bazgan, Cristina; Casel, Katrin; Fernau, HenningClustering with Lower-Bounded Sizes - A General Graph-Theoretic Framework. Algorithmica 2018: 2517-2550
Classical clustering problems search for a partition of objects into a fixed number of clusters. In many scenarios, however, the number of clusters is not known or necessarily fixed. Further, clusters are sometimes only considered to be of significance if they have a certain size. We discuss clustering into sets of minimum cardinality \(k\) without a fixed number of sets and present a general model for these types of problems. This general framework allows the comparison of different measures to assess the quality of a clustering. We specifically consider nine quality-measures and classify the complexity of the resulting problems with respect to \(k\). Further, we derive some polynomial-time solvable cases for \(k=2\) with connections to matching-type problems which, among other graph problems, then are used to compute approximations for larger values of \(k\).
Bringmann, Karl; Friedrich, Tobias; Krohmer, AntonDe-anonymization of Heterogeneous Random Graphs in Quasilinear Time. Algorithmica 2018: 3397–3427
There are hundreds of online social networks with altogether billions of users. Many such networks publicly release structural information, with all personal information removed. Empirical studies have shown, however, that this provides a false sense of privacy - it is possible to identify almost all users that appear in two such anonymized network as long as a few initial mappings are known. We analyze this problem theoretically by reconciling two versions of an artificial power-law network arising from independent subsampling of vertices and edges. We present a new algorithm that identifies most vertices and makes no wrong identifications with high probability. The number of vertices matched is shown to be asymptotically optimal. For an n-vertex graph, our algorithm uses \(n^\varepsilon\) seed nodes (for an arbitrarily small \(\varepsilon\)) and runs in quasilinear time. This improves previous theoretical results which need \(\Theta(n)\) seed nodes and have runtimes of order \(n^{1+\Omega(1)}\). Additionally, the applicability of our algorithm is studied experimentally on different networks.
Bläsius, Thomas; Stumpf, Peter; Ueckerdt, TorstenLocal and Union Boxicity. Discrete Mathematics 2018: 1307 - 1315
The boxicity \(box(H)\) of a graph \(H\) is the smallest integer \(d\) such that \(H\) is the intersection of \(d\) interval graphs, or equivalently, that \(H\) is the intersection graph of axis-aligned boxes in \(R^d\). These intersection representations can be interpreted as covering representations of the complement \(H^c\) of \(H\) with co-interval graphs, that is, complements of interval graphs. We follow the recent framework of global, local and folded covering numbers (Knauer and Ueckerdt, 2016) to define two new parameters: the local boxicity \(box_l(H)\) and the union boxicity \(box_u(H)\) of \(H\). The union boxicity of \(H\) is the smallest \(d\) such that \(H^c\) can be covered with \(d\) vertex-disjoint unions of co-interval graphs, while the local boxicity of \(H\) is the smallest \(d\) such that \(H^c\) can be covered with co-interval graphs, at most \(d\) at every vertex. We show that for every graph \(H\) we have \(box_l(H) leq box_u(H) leq box(H) \) and that each of these inequalities can be arbitrarily far apart. Moreover, we show that local and union boxicity are also characterized by intersection representations of appropriate axis-aligned boxes in \(R^d\) . We demonstrate with a few striking examples, that in a sense, the local boxicity is a better indication for the complexity of a graph, than the classical boxicity.
Baum, Moritz; Bläsius, Thomas; Gemsa, Andreas; Rutter, Ignaz; Wegner, FranziskaScalable Exact Visualization of Isocontours in Road Networks via Minimum-Link Paths. Journal of Computational Geometry 2018: 27-73
Isocontours in road networks represent the area that is reachable from a source within a given resource limit. We study the problem of computing accurate isocontours in realistic, large-scale networks. We propose isocontours represented by polygons with minimum number of segments that separate reachable and unreachable components of the network. Since the resulting problem is not known to be solvable in polynomial time, we introduce several heuristics that run in (almost) linear time and are simple enough to be implemented in practice. A key ingredient is a new practical linear-time algorithm for minimum-link paths in simple polygons. Experiments in a challenging realistic setting show excellent performance of our algorithms in practice, computing near-optimal solutions in a few milliseconds on average, even for long ranges.
Rizzo, Manuel; Battaglia, FrancescoStatistical and Computational Tradeoff in Genetic Algorithm-Based Estimation. Journal of Statistical Computation and Simulation 2018: 3081-3097
When a genetic algorithm (GA) is employed in a statistical problem, the result is affected by both variability due to sampling and the stochastic elements of algorithm. Both of these components should be controlled in order to obtain reliable results. In the present work we analyze parametric estimation problems tackled by GAs, and pursue two objectives: the first one is related to a formal variability analysis of final estimates, showing that it can be easily decomposed in the two sources of variability. In the second one we introduce a framework of GA estimation with fixed computational resources, which is a form of statistical and the computational tradeoff question, crucial in recent problems. In this situation the result should be optimal from both the statistical and computational point of view, considering the two sources of variability and the constraints on resources. Simulation studies will be presented for illustrating the proposed method and the statistical and computational tradeoff question.
Cseh, Ágnes; Kavitha, TelikepalliPopular edges and dominant matchings. Mathematical Programming 2018: 209-229
Given a bipartite graph \(G=(A \cup B, E)\) with strict preference lists and given an edge \(e^* \in E\), we ask if there exists a popular matching in \(G\) that contains \(e^*\). We call this the popular edge problem. A matching \(M\) is popular if there is no matching \(M'\) such that the vertices that prefer \(M'\) to \(M\) outnumber those that prefer \(M\) to \(M'\). It is known that every stable matching is popular; however \(G\) may have no stable matching with the edge \(e^*\). In this paper we identify another natural subclass of popular matchings called "dominant matchings" and show that if there is a popular matching that contains the edge \(e^*\), then there is either a stable matching that contains \(e^*\) or a dominant matching that contains \(e^*\). This allows us to design a linear time algorithm for identifying the set of popular edges. When preference lists are complete, we show an \(\mathcal{O(n^3)\) algorithm to find a popular matching containing a given set of edges or report that none exists, where \(n=|A|+|B|\).
Azar, Yossi; Cohen, SarelAn improved algorithm for online machine minimization. Operations Research Letters 2018: 128-133
The online machine minimization problem seeks to design a preemptive scheduling algorithm on multiple machines — each job \(j\) arrives at its release time \(r_j\), has to be processed for \(p_j\) time units, and must be completed by its deadline \(d_j\). The goal is to minimize the number of machines the algorithm uses. We improve the \(\mathcal{O(\log m)\)-competitive algorithm by Chen, Megow and Schewior (SODA 2016) and provide an \(\mathcal{O(\frac{\log m\log \log m})\)-competitive algorithm.
Friedrich, Tobias; Krohmer, AntonOn the diameter of hyperbolic random graphs. SIAM Journal on Discrete Mathematics 2018: 1314-1334
Large real-world networks are typically scale-free. Recent research has shown that such graphs are described best in a geometric space. More precisely, the internet can be mapped to a hyperbolic space such that geometric greedy routing is close to optimal (Boguñá, Papadopoulos, and Krioukov. Nature Communications, 1:62, 2010). This observation has pushed the interest in hyperbolic networks as a natural model for scale-free networks. Hyperbolic random graphs follow a power law degree distribution with controllable exponent \(\beta\) and show high clustering (Gugelmann, Panagiotou, and Peter. ICALP, pp. 573–585, 2012). For understanding the structure of the resulting graphs and for analyzing the behavior of network algorithms, the next question is bounding the size of the diameter. The only known explicit bound is \(O(\)\((\log n)\)\(^{32/((3 - \beta)(5 - \beta))+1})\)(Kiwi and Mitsche. ANALCO, pp. 26–39, 2015). We present two much simpler proofs for an improved upper bound of \(O((\log n)\)\(^{2/(3 - \beta)})\) and a lower bound of \(\Omega(\log n)\). If \(\beta > 3\), we show that the latter bound is tight by proving an upper bound of \(O(\log n)\) for the diameter.
Friedrich, Tobias; Katzmann, Maximilian; Krohmer, AntonUnbounded Discrepancy of Deterministic Random Walks on Grids. SIAM Journal on Discrete Mathematics 2018: 2441-2452
Random walks are frequently used in randomized algorithms. We study a derandomized variant of a random walk on graphs, called rotor-router model. In this model, instead of distributing tokens randomly, each vertex serves its neighbors in a fixed deterministic order. For most setups, both processes behave remarkably similar: Starting with the same initial configuration, the number of tokens in the rotor-router model deviates only slightly from the expected number of tokens on the corresponding vertex in the random walk model. The maximal difference over all vertices and all times is called single vertex discrepancy. Cooper and Spencer (2006) showed that on \(\mathbb{Z}^{d}\) the single vertex discrepancy is only a constant \(c_d\). Other authors also determined the precise value of \(c_d\) for \(d=1,2\). All these results, however, assume that initially all tokens are only placed on one partition of the bipartite graph \(\mathbb{Z}^{d}\). We show that this assumption is crucial by proving that otherwise the single vertex discrepancy can become arbitrarily large. For all dimensions \(d\geq1\) and arbitrary discrepancies~\(\ell \geq 0\), we construct configurations that reach a discrepancy of at least \(\ell\).
Bazgan, Cristina; Brankovic, Ljiljana; Casel, Katrin; Fernau, Henning; Jansen, Klaus; Klein, Kim-Manuel; Lampis, Michael; Liedloff, Mathieu; Monnot, Jérôme; Paschos, Vangelis Th.The many facets of upper domination. Theoretical Computer Science 2018: 2-25
This paper studies Upper Domination, i.e., the problem of computing the maximum cardinality of a minimal dominating set in a graph with respect to classical and parameterised complexity as well as approximability.
Dang, Duc-Cuong; Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Lehre, Per Kristian; Oliveto, Pietro S.; Sudholt, Dirk; Sutton, Andrew M.Escaping Local Optima Using Crossover with Emergent Diversity. Transactions on Evolutionary Computation 2018: 484-497
Population diversity is essential for the effective use of any crossover operator. We compare seven commonly used diversity mechanisms and prove rigorous run time bounds for the \((\mu+1)\) GA using uniform crossover on the fitness function \(Jump_k\). All previous results in this context only hold for unrealistically low crossover probability \(p_c=O(k/n)\), while we give analyses for the setting of constant \(p_c < 1\) in all but one case. Our bounds show a dependence on the problem size \(n\), the jump length \(k\), the population size \(\mu\), and the crossover probability \(p_c\). For the typical case of constant \(k > 2\) and constant \(p_c\), we can compare the resulting expected optimisation times for different diversity mechanisms assuming an optimal choice of \(\mu\): \(O(n^{k-1})\) for duplicate elimination/minimisation, \(O(n^2 \log n)\) for maximising the convex hull, \(O(n \log n)\) for det. crowding (assuming \(p_c = k/n\)), \(O(n \log n)\) for maximising the Hamming distance, \(O(n \log n)\) for fitness sharing, \(O(n \log n)\) for the single-receiver island model. This proves a sizeable advantage of all variants of the \((\mu+1)\) GA compared to the (1+1) EA, which requires \(\Theta(n^k)\). In a short empirical study we confirm that the asymptotic differences can also be observed experimentally.
Bläsius, Thomas; Friedrich, Tobias; Krohmer, Anton; Laue, SörenEfficient Embedding of Scale-Free Graphs in the Hyperbolic Plane. Transactions on Networking 2018: 920-933
Hyperbolic geometry appears to be intrinsic in many large real networks. We construct and implement a new maximum likelihood estimation algorithm that embeds scale-free graphs in the hyperbolic space. All previous approaches of similar embedding algorithms require at least a quadratic runtime. Our algorithm achieves quasilinear runtime, which makes it the first algorithm that can embed networks with hundreds of thousands of nodes in less than one hour. We demonstrate the performance of our algorithm on artificial and real networks. In all typical metrics, like Log-likelihood and greedy routing, our algorithm discovers embeddings that are very close to the ground truth.
Doskoč, VanjaConfident Iterative Learning in Computational Learning Theory. Current Trends in Theory and Practice of Computer Science (SOFSEM) 2018: 30-42
In inductive inference various types of learning have emerged. The main aim of this paper is to investigate a new type of learning, the confident iterative learning. Given a class to be learnt, the idea here is to merge the following two concepts. For confidence, we require the learner to converge on any set, however, it only needs to be correct on the sets in the class. To be iterative, we restrict the learner’s memory on previ- ous inputs and calculations to its last hypothesis. Investigating the new learner, we will provide negative and positive examples, as well as some properties the confident iterative learner possesses. This will peak at a classification theorem for certain types of classes. Next, we will introduce and compare different types of confidence, focus- ing on the learner’s behaviour on sets outside of the class. Lastly, we will focus on the possible hypotheses. Introducing learning with respect to hypothesis spaces, we will provide examples witnessing that exact, class preserving and class comprising learning are different.
Battaglia, Francesco; Cucina, Domenico; Rizzo, ManuelPeriodic Autoregressive Models with Multiple Structural Changes by Genetic Algorithms. Mathematical and Statistical Methods for Actuarial Sciences and Finance (MAF) 2018: 107-110
We present a model and a computational procedure for dealing with seasonality and regime changes in time series. In this work we are interested in time series which in addition to trend display seasonality in mean, in autocorrelation and in variance. These type of series appears in many areas, including hydrology, meteorology, economics and finance. The seasonality is accounted for by subset PAR modelling, for which each season follows a possibly different Autoregressive model. Levels, trend, autoregressive parameters and residual variances are allowed to change their values at fixed unknown times. The identification of number and location of structural changes, as well as PAR lags indicators, is based on Genetic Algorithms, which are suitable because of high dimensionality of the discrete search space. An application to Italian industrial production index time series is also proposed.
Bläsius, Thomas; Friedrich, Tobias; Katzmann, Maximilian; Krohmer, AntonHyperbolic Embeddings for Near-Optimal Greedy Routing. Algorithm Engineering and Experiments (ALENEX) 2018: 199-208
Greedy routing computes paths between nodes in a network by successively moving to the neighbor closest to the target with respect to coordinates given by an embedding into some metric space. Its advantage is that only local information is used for routing decisions. We present different algorithms for generating graph embeddings into the hyperbolic plane that are well suited for greedy routing. In particular our embeddings guarantee that greedy routing always succeeds in reaching the target and we try to minimize the lengths of the resulting greedy paths. We evaluate our algorithm on multiple generated and real wold networks. For networks that are generally assumed to have a hidden underlying hyperbolic geometry, such as the Internet graph, we achieve near-optimal results, i.e., the resulting greedy paths are only slightly longer than the corresponding shortest paths. In the case of the Internet graph, they are only \(6\%\) longer when using our best algorithm, which greatly improves upon the previous best known embedding, whose creation required substantial manual intervention.
Kumar Shekar, Arvind; Pappik, Marcus; Iglesias Sánchez, Patricia; Müller, EmmanuelSelection of Relevant and Non-Redundant Multivariate Ordinal Patterns for Time Series Classification. Discovery Science (DS) 2018: 224-240
Transformation of multivariate time series into feature spaces are common for data mining tasks like classification. Ordinality is one important property in time series that provides a qualitative representation of the underlying dynamic regime. In a multivariate time series, ordinalities from multiple dimensions combine together to be discriminative for the classification problem. However, existing works on ordinality do not address the multivariate nature of the time series. For multivariate ordinal patterns, there is a computational challenge with an explosion of pattern combinations, while not all patterns are relevant and provide novel information for the classification. In this work, we propose a technique for the extraction and selection of relevant and non-redundant multivariate ordinal patterns from the high-dimensional combinatorial search space. Our proposed approach Ordinal feature extraction (ordex), simultaneously extracts and scores the relevance and redundancy of ordinal patterns without training a classifier. As a filter-based approach, ordex aims to select a set of relevant patterns with complementary information. Hence, using our scoring function based on the principles of Chebyshev’s inequality, we maximize the relevance of the patterns and minimize the correlation between them. Our experiments on real world datasets show that ordinality in time series contains valuable information for classification in several applications.
Cseh, Ágnes; Kavitha, TelikepalliPopular Matchings in Complete Graphs. Foundations of Software Technology and Theoretical Computer Science (FSTTCS) 2018: 17:1-17:14
Our input is a complete graph \(G = (V,E)\) on n vertices where each vertex has a strict ranking of all other vertices in \(G\). The goal is to construct a matching in \(G\) that is "globally stable" or popular. A matching \(M\) is popular if \(M\) does not lose a head-to-head election against any matching \(M'\): here each vertex casts a vote for the matching in \(\{M,M'\}\) where it gets a better assignment. Popular matchings need not exist in the given instance \(G\) and the popular matching problem is to decide whether one exists or not. The popular matching problem in \(G\) is easy to solve for odd \(n\). Surprisingly, the problem becomes NP-hard for even n, as we show here.
Friedrich, Tobias; Quinzan, Francesco; Wagner, MarkusEscaping Large Deceptive Basins of Attraction with Heavy Mutation Operators. Genetic and Evolutionary Computation Conference (GECCO) 2018: 293-300
In many Evolutionary Algorithms (EAs), a parameter that needs to be tuned is that of the mutation rate, which determines the probability for each decision variable to be mutated. Typically, this rate is set to 1/n for the duration of the optimization, where n is the number of decision variables. This setting has the appeal that the expected number of mutated variables per iteration is one. In a recent theoretical study, it was proposed to sample the number of mutated variables from a power-law distribution. This results into a significantly higher probability on larger numbers of mutations, so that escaping local optima becomes more probable. In this paper, we propose another class of non-uniform mutation rates. We study the benefits of this operator in terms of average-case black-box complexity analysis and experimental comparison. We consider both pseudo-Boolean artificial landscapes and combinatorial problems (the Minimum Vertex Cover and the Maximum Cut). We observe that our non-uniform mutation rates significantly outperform the standard choices, when dealing with landscapes that exhibit large deceptive basins of attraction.
Friedrich, Tobias; Kötzing, Timo; Quinzan, Francesco; Sutton, Andrew M.Improving the Run Time of the (1+1) Evolutionary Algorithm with Luby Sequences. Genetic and Evolutionary Computation Conference (GECCO) 2018: 301-308
In the context of black box optimization, the most common way to handle deceptive attractors is to periodically restart the algorithm. In this paper, we explore the benefits of combining the simple \((1+1)\) Evolutionary Algorithm (EA) with the Luby Universal Strategy - the \((1+1)~EA_{\mathcal{U}}\), a meta-heuristic that does not require parameter tuning. We first consider two artificial pseudo-Boolean landscapes, on which the \((1+1)~EA\) exhibits exponential run time. We prove that the \((1+1)~EA_{\mathcal{U}}\) has polynomial run time on both instances. We then consider the Minimum Vertex Cover on two classes of graphs. Again, the \((1+1)~EA\) yields exponential run time on those instances, and the \((1+1)~EA_{\mathcal{U}}\) finds the global optimum in polynomial time. We conclude by studying the Makespan Scheduling. We consider an instance on which the \((1+1)~EA\) does not find a \((4/3-\epsilon)\)-approximation in polynomial time, and we show that the \((1+1)~EA_{\mathcal{U}}\) reaches a \((4/3-\epsilon)\)-approximation in polynomial time. We then prove that the \((1+1)~EA_{\mathcal{U}}\) serves as an Efficient Polynomial-time Approximation Scheme (EPTAS) for the Partition Problem, for a \((1+\epsilon)\)-approximation with \(\epsilon > 4/n\).
Gao, Wanru; Friedrich, Tobias; Neumann, Frank; Hercher, ChristianRandomized Greedy Algorithms for Covering Problems. Genetic and Evolutionary Computation Conference (GECCO) 2018: 309-315
Greedy algorithms provide a fast and often also effective solution to many combinatorial optimization problems. However, it is well known that they sometimes lead to low quality solutions on certain instances. In this paper, we explore the use of randomness in greedy algorithms for the minimum vertex cover and dominating set problem and compare the resulting performance against their deterministic counterpart. Our algorithms are based on a parameter \(\gamma\) which allows to explore the spectrum between uniform and deterministic greedy selection in the steps of the algorithm and our theoretical and experimental investigations point out the benefits of incorporating randomness into greedy algorithms for the two considered combinatorial optimization problems.
Doerr, Benjamin; Krejca, Martin S.Significance-based Estimation-of-Distribution Algorithms. Genetic and Evolutionary Computation Conference (GECCO) 2018: 1483-1490
Estimation-of-distribution algorithms (EDAs) are randomized search heuristics that maintain a stochastic model of the solution space. This model is updated from iteration to iteration based on the quality of the solutions sampled according to the model. As previous works show, this short-term perspective can lead to erratic updates of the model, in particular, to bit-frequencies approaching a random boundary value. This can lead to significant performance losses. In order to overcome this problem, we propose a new EDA that takes into account a longer history of samples and updates its model only with respect to information which it classifies as statistically significant. We prove that this significance-based compact genetic algorithm (sig-cGA) optimizes the common benchmark functions OneMax and LeadingOnes both in \(O(n\log n)\) time, a result shown for no other EDA or evolutionary algorithm so far. For the recently proposed scGA – an EDA that tries to prevent erratic model updates by imposing a bias to the uniformly distributed model – we prove that it optimizes OneMax only in a time exponential in the hypothetical population size \(1/\rho\).
Issac, Davis; Chandran, L. Sunil; Cheung, Yuen KuengSpanning tree congestion and computation of gyori lovasz partition. International Colloquium on Automata, Languages, and Programming (ICALP) 2018
We study a natural problem in graph sparsification, the Spanning Tree Congestion (STC) problem. Informally, it seeks a spanning tree with no tree-edge routing too many of the original edges. For any general connected graph with n vertices and m edges, we show that its STC is at most \( \mathcal{O(\sqrt{mn}) \), which is asymptotically optimal since we also demonstrate graphs with STC at least \( \Omega(\sqrt{mn}) \). We present a polynomial-time algorithm which computes a spanning tree with congestion \( \mathcal{O(\sqrt{mn}) \log n \) \( \mathcal{O(\sqrt{mn} \log n ) \). We also present another algorithm for computing a spanning tree with congestion \( \mathcal{O(\sqrt{mn}) \); this algorithm runs in sub-exponential time when \(m = \omega(n \log^2 n ) \). For achieving the above results, an important intermediate theorem is generalized Györi-Lovász theorem. Chen et al. [Jiangzhuo Chen et al., 2007] gave a non-constructive proof. We give the first elementary and constructive proof with a local search algorithm of running time \( \mathcal{O^*( 4^n ) \). We discuss some consequences of the theorem concerning graph partitioning, which might be of independent interest. We also show that for any graph which satisfies certain expanding properties, its STC is at most \( \mathcal{O^*(n) \), and a corresponding spanning tree can be computed in polynomial time. We then use this to show that a random graph has STC \( \Theta(n) \) with high probability.
Arar, Moab; Chechik, Shiri; Cohen, Sarel; Stein, Cliff; Wajc, DavidDynamic Matching: Reducing Integral Algorithms to Approximately-Maximal Fractional Algorithms. International Colloquium on Automata, Languages and Programming (ICALP) 2018: 7:1-7:16
We present a simple randomized reduction from fully-dynamic integral matching algorithms to fully-dynamic "approximately-maximal" fractional matching algorithms. Applying this reduction to the recent fractional matching algorithm of Bhattacharya, Henzinger, and Nanongkai (SODA 2017), we obtain a novel result for the integral problem. Specifically, our main result is a randomized fully-dynamic \((2+\varepsilon)\)-approximate integral matching algorithm with small polylog worst-case update time. For the \((2+\varepsilon)\)-approximation regime only a fractional fully-dynamic \((2+\varepsilon)\)-matching algorithm with worst-case polylog update time was previously known, due to Bhattacharya et al. (SODA 2017). Our algorithm is the first algorithm that maintains approximate matchings with worst-case update time better than polynomial, for any constant approximation ratio. As a consequence, we also obtain the first constant-approximate worst-case polylogarithmic update time maximum weight matching algorithm.
Bläsius, Thomas; Freiberger, Cedric; Friedrich, Tobias; Katzmann, Maximilian; Montenegro-Retana, Felix; Thieffry, MarianneEfficient Shortest Paths in Scale-Free Networks with Underlying Hyperbolic Geometry. International Colloquium on Automata, Languages, and Programming (ICALP) 2018: 20:1-20:14
A common way to accelerate shortest path algorithms on graphs is the use of a bidirectional search, which simultaneously explores the graph from the start and the destination. It has been observed recently that this strategy performs particularly well on scale-free real-world networks. Such networks typically have a heterogeneous degree distribution (e.g., a power-law distribution) and high clustering (i.e., vertices with a common neighbor are likely to be connected themselves). These two properties can be obtained by assuming an underlying hyperbolic geometry. To explain the observed behavior of the bidirectional search, we analyze its running time on hyperbolic random graphs and prove that it is \(\tilde{O}(n\)\(^{2 - 1/ \alpha}+\)\(n^{1/(2\alpha)}\)\(+ \delta_{\max})\) with high probability, where \(\alpha\)\(\in\)\((0.5, 1)\) controls the power-law exponent of the degree distribution, and \(\delta_{\max}\) is the maximum degree. This bound is sublinear, improving the obvious worst-case linear bound. Although our analysis depends on the underlying geometry, the algorithm itself is oblivious to it.
Casel, KatrinResolving Conflicts for Lower-Bounded Clustering. International Symposium on Parameterized and Exact Computation (IPEC) 2018: 23:1-23:14
This paper considers the effect of non-metric distances for lower-bounded clustering, i.e., the problem of computing a partition for a given set of objects with pairwise distance, such that each set has a certain minimum cardinality (as required for anonymisation or balanced facility location problems). We discuss lower-bounded clustering with the objective to minimise the maximum radius or diameter of the clusters. For these problems there exists a 2-approximation but only if the pairwise distance on the objects satisfies the triangle inequality, without this property no polynomial-time constant factor approximation is possible. We try to resolve or at least soften this effect of non-metric distances by devising particular strategies to deal with violations of the triangle inequality ("conflicts"). With parameterised algorithmics, we find that if the number of such conflicts is not too large, constant factor approximations can still be computed efficiently. In particular, we introduce parameterised approximations with respect to not just the number of conflicts but also for the vertex cover number of the "conflict graph" (graph induced by conflicts). Interestingly, we salvage the approximation ratio of 2 for diameter while for radius it is only possible to show a ratio of 3. For the parameter vertex cover number of the conflict graph this worsening in ratio is shown to be unavoidable, unless FPT=W[2]. We further discuss improvements for diameter by choosing the (induced) \(P_3\)-cover number of the conflict graph as parameter and complement these by showing that, unless FPT=W[1], there exists no constant factor parameterised approximation with respect to the parameter split vertex deletion set.
Cucina, Domenico; Rizzo, Manuel; Ursu, EugenIdentification of Multiregime Periodic Autoregressive Models by Genetic Algorithms. International Conference on Time Series and Forecasting (ITISE) 2018: 396-407