Kötzing, Timo; Palenta, Raphaela A Map of Update Constraints in Inductive InferenceAlgorithmic Learning Theory (ALT) 2014: 40–54
We investigate how different learning restrictions reduce learning power and how the different restrictions relate to one another. We give a complete map for nine different restrictions both for the cases of complete information learning and set-driven learning. This completes the picture for these well-studied delayable learning restrictions. A further insight is gained by different characterizations of conservative learning in terms of variants of cautious learning. Our analyses greatly benefit from general theorems we give, for example showing that learners with exclusively delayable restrictions can always be assumed total.
Jain, Sanjay; Kötzing, Timo; Ma, Junqi; Stephan, Frank On the Role of Update Constraints and Text-Types in Iterative LearningAlgorithmic Learning Theory (ALT) 2014: 55–69
The present work investigates the relationship of iterative learning with other learning criteria such as decisiveness, caution, reliability, non-U-shapedness, monotonicity, strong monotonicity and conservativeness. Building on the result of Case and Moelius that iterative learners can be made non-U-shaped, we show that they also can be made cautious and decisive. Furthermore, we obtain various special results with respect to one-one texts, fat texts and one-one hypothesis spaces.
Göbel, Andreas; Goldberg, Leslie Ann; McQuillan, Colin; Richerby, David; Yamakami, Tomoyuki Counting List Matrix Partitions of GraphsConference on Computational Complexity (CCC) 2014: 56–65
Given a symmetric \(D \times D\) matrix \(M\) over \(0, 1, *}\), a list \(M\)-partition of a graph \(G\) is a partition of the vertices of \(G\) into \(D\) parts which are associated with the rows of \(M\). The part of each vertex is chosen from a given list in such a way that no edge of \(G\) is mapped to a 0 in \(M\) and no non-edge of \(G\) is mapped to a 1 in \(M\). Many important graph-theoretic structures can be represented as list \(M\)-partitions including graph colourings, split graphs and homogeneous sets, which arise in the proofs of the weak and strong perfect graph conjectures. Thus, there has been quite a bit of work on determining for which matrices \(M\) computations involving list \(M\)-partitions are tractable. This paper focuses on the problem of counting list \(M\)-partitions, given a graph \(G\) and given lists for each vertex of \(G\). We give an algorithm that solves this problem in polynomial time for every (fixed) matrix \(M\) for which the problem is tractable. The algorithm relies on data structures such as sparse-dense partitions and sub cube decompositions to reduce each problem instance to a sequence of problem instances in which the lists have a certain useful structure that restricts access to portions of \(M\) in which the interactions of 0s and 1s is controlled. We show how to solve the resulting restricted instances by converting them into particular counting constraint satisfaction problems (\#CSPs) which we show how to solve using a constraint satisfaction technique known as "arc-consistency". For every matrix \(M\) for which our algorithm fails, we show that the problem of counting list \(M\)-partitions is #P-complete. Furthermore, we give an explicit characterisation of the dichotomy theorem - counting list \(M\)-partitions is tractable (in FP) if and only if the matrix \(M\) has a structure called a derectangularising sequence. Finally, we show that the meta-problem of determining whether a given matrix has a derectangularising sequence is NP-complete.
Bläsius, Thomas; Brückner, Guido; Rutter, Ignaz Complexity of Higher-Degree Orthogonal Graph Embedding in the Kandinsky ModelEuropean Symposium on Algorithms (ESA) 2014: 161–172
We show that finding orthogonal grid-embeddings of plane graphs (planar with fixed combinatorial embedding) with the minimum number of bends in the so-called Kandinsky model (which allows vertices of degree >4) is NP-complete, thus solving a long-standing open problem. On the positive side, we give an efficient algorithm for several restricted variants, such as graphs of bounded branch width and a subexponential exact algorithm for general plane graphs.
Bringmann, Karl; Friedrich, Tobias; Krohmer, Anton De-anonymization of Heterogeneous Random Graphs in Quasilinear TimeEuropean Symposium on Algorithms (ESA) 2014: 197–208
There are hundreds of online social networks with billions of users in total. Many such networks publicly release structural information, with all personal information removed. Empirical studies have shown, however, that this provides a false sense of privacy - it is possible to identify almost all users that appear in two such anonymized network as long as a few initial mappings are known. We analyze this problem theoretically by reconciling two versions of an artificial power-law network arising from independent subsampling of vertices and edges. We present a new algorithm that identifies most vertices and makes no wrong identifications with high probability. The number of vertices matched is shown to be asymptotically optimal. For an \(n\)-vertex graph, our algorithm uses \(n^\epsilon\) seed nodes (for an arbitrarily small \(\epsilon\)) and runs in quasilinear time. This improves previous theoretical results which need \(\Theta(n)\) seed nodes and have runtimes of order \(n^{1 + \Omega(1)}\). Additionally, the applicability of our algorithm is studied experimentally on different networks.
Casel, Katrin A Fixed-Parameter Approach for Privacy-Protection with Global RecodingFrontiers in Algorithmics (FAW) 2014: 25–35
This paper discusses a problem arising in the field of privacy-protection in statistical databases: Given a \(n \times m\) \(\{0,1\}\)-matrix \(M\), is there a set of mergings which transforms \(M\) into a zero matrix and only affects a bounded number of rows/columns. “Merging” here refers to combining adjacent lines with a component-wise logical AND. This kind transformation models a generalization on OLAP-cubes also called global recoding. Counting the number of affected lines presents a new measure of information-loss for this method. Parameterized by the number of affected lines \(k\) we introduce reduction rules and an \(O^*(2.618^k)\)-algorithm for the new abstract combinatorial problem LMAL.
Bläsius, Thomas; Rutter, Ignaz A New Perspective on Clustered Planarity as a Combinatorial Embedding ProblemGraph Drawing (GD) 2014: 440–451
The clustered planarity problem (\(c\)-planarity) asks whether a hierarchically clustered graph admits a planar drawing such that the clusters can be nicely represented by regions. We introduce the \(cd\)-tree data structure and give a new characterization of \(c\)-planarity. It leads to efficient algorithms for \(c\)-planarity testing in the following cases. (i) Every cluster and every co-cluster has at most two connected components. (ii) Every cluster has at most five outgoing edges. Moreover, the cd-tree reveals interesting connections between \(c\)-planarity and planarity with constraints on the order of edges around vertices. On one hand, this gives rise to a bunch of new open problems related to \(c\)-planarity, on the other hand it provides a new perspective on previous results.
Chicano, Francisco; Whitley, Darrell; Sutton, Andrew M. Efficient identification of improving moves in a ball for pseudo-boolean problemsGenetic and Evolutionary Computation Conference (GECCO) 2014: 437–444
Hill climbing algorithms are at the core of many approaches to solve optimization problems. Such algorithms usually require the complete enumeration of a neighborhood of the current solution. In the case of problems defined over binary strings of length \(n\), we define the \(r\)-ball neighborhood as the set of solutions at Hamming distance \(r\) or less from the current solution. For \(r \ll n\) this neighborhood contains \(\Theta(n^r)\) solutions. In this paper efficient methods areintroduced to locate improving moves in the \(r\)-ball neighborhood for problems that can be written as a sum of a linear number of subfunctions depending on a bounded number of variables. NK-landscapes and MAX-\(k\)-SAT are examples of these problems. If the number of subfunctions depending on any given variable is also bounded, then we prove that the method can explore the neighborhood in constant time, despite the fact that the number of solutions in the neighborhood is polynomial in \(n\). We develop a hill climber based on our exploration method and we analyze its efficiency and efficacy using experiments with NKq-landscapes instances.
Bringmann, Karl; Friedrich, Tobias; Klitzke, Patrick Two-dimensional subset selection for hypervolume and epsilon-indicatorGenetic and Evolutionary Computation Conference (GECCO) 2014: 589–596
The goal of bi-objective optimization is to find a small set of good compromise solutions. A common problem for bi-objective evolutionary algorithms is the following subset selection problem (SSP): Given \(n\) solutions \(P \subset \mathbb{R}^2\) in the objective space, select \(k\) solutions \(P^*\) from \(P\) that optimize an indicator function. In the hypervolume SSP we want to select \(k\) points \(P^*\) that maximize the hypervolume indicator \(I_{HYP}(P^*, r)\) for some reference point \(r \in R^2\). Similarly, the \(\epsilon\)-indicator SSP aims at selecting \(\tilde k\) points \(P^*\) that minimize the \(\epsilon\)-indicator \(I_{\epsilon}(P^*,\mathbb{R})\) for some reference set \(R \subset R^2\) of size \(m\) (which can be \(\mathbb{R}=P\)). We first present a new algorithm for the hypervolume SSP with runtime \(O(n (k + \log n))\). Our second main result is a new algorithm for the \(\epsilon\)-indicator SSP with runtime \(O(n \log n + m \log m)\). Both results improve the current state of the art runtimes by a factor of (nearly) \(n\) and make the problems tractable for new applications. Preliminary experiments confirm that the theoretical results translate into substantial empirical runtime improvements.
Doerr, Benjamin; Doerr, Carola; Kötzing, Timo Unbiased black-box complexities of jump functions: how to cross large plateausGenetic and Evolutionary Computation Conference (GECCO) 2014: 769–776
We analyze the unbiased black-box complexity of jump functions with large jump sizes. Among other results, we show that when the jump size is \((1/2 - \epsilon)n\), that is, only a small constant fraction of the fitness values is visible, then the unbiased black-box complexities for arities 3 and higher are of the same order as those for the simple OneMax function. Even for the extreme jump function, in which all but the two fitness values \(n/2\) and \(n\) are blanked out, polynomial-time mutation-based (i.e., unary unbiased) black-box optimization algorithms exist. This is quite surprising given that for the extreme jump function almost the whole search space (all but a \(\Theta(n^{-1/2})\) fraction) is a plateau of constant fitness.To prove these results, we introduce new tools for the analysis of unbiased black-box complexities, for example, selecting the new parent individual not by comparing the fitnesses of the competing search points, but also by taking into account the (empirical) expected fitnesses of their offspring.
Gießen, Christian; Kötzing, Timo Robustness of populations in stochastic environmentsGenetic and Evolutionary Computation Conference (GECCO) 2014: 1383–1390
We consider stochastic versions of OneMax and LeadingOnes and analyze the performance of evolutionary algorithms with and without populations on these problems. It is known that the (1+1) EA on OneMax performs well in the presence of very small noise, but poorly for higher noise levels. We extend these results to LeadingOnes and to many different noise models, showing how the application of drift theory can significantly simplify and generalize previous analyses. Most surprisingly, even small populations (of size \(\Theta(\log n)\)) can make evolutionary algorithms perform well for high noise levels, well outside the abilities of the (1+1) EA. Larger population sizes are even more beneficial; we consider both parent and offspring populations. In this sense, populations are robust in these stochastic settings.
Kötzing, Timo Concentration of first hitting times under additive driftGenetic and Evolutionary Computation Conference (GECCO) 2014: 1391–1398
Recent advances in drift analysis have given us better and better tools for understanding random processes, including the run time of randomized search heuristics. In the setting of multiplicative drift we do not only have excellent bounds on the expected run time, but also more general results showing the concentration of the run time. In this paper we investigate the setting of additive drift under the assumption of strong concentration of the "step size" of the process. Under sufficiently strong drift towards the goal we show a strong concentration of the hitting time. In contrast to this, we show that in the presence of small drift a Gambler's-Ruin-like behavior of the process overrides the influence of the drift. Finally, in the presence of sufficiently strong negative drift the hitting time is superpolynomial with high probability; this corresponds to the so-called negative drift theorem, for which we give new variants.
Bringmann, Karl; Friedrich, Tobias; Klitzke, Patrick Generic Postprocessing via Subset Selection for Hypervolume and Epsilon-IndicatorParallel Problem Solving from Nature (PPSN) 2014: 518–527
Most biobjective evolutionary algorithms maintain a population of fixed size \(\mu\) and return the final population at termination. During the optimization process many solutions are considered, but most are discarded. We present two generic postprocessing algorithms which utilize the archive of all non-dominated solutions evaluated during the search. We choose the best \(\mu\) solutions from the archive such that the hypervolume or \(\epsilon\)-indicator is maximized. This postprocessing costs no additional fitness function evaluations and has negligible runtime compared to most EMOAs. We experimentally examine our postprocessing for four standard algorithms (NSGA-II, SPEA2, SMS-EMOA, IBEA) on ten standard test functions (DTLZ 1-2,7, ZDT 1-3, WFG 3-6) and measure the average quality improvement. The median decrease of the distance to the optimal \(\epsilon\)-indicator is \(95\%\), the median decrease of the distance to the optimal hypervolume value is \(86\%\). We observe similar performance on a real-world problem (wind turbine placement).
Friedrich, Tobias; Neumann, Frank Maximizing Submodular Functions under Matroid Constraints by Multi-objective Evolutionary AlgorithmsParallel Problem Solving from Nature (PPSN) 2014: 922–931
Nominated for Best Paper Award
Many combinatorial optimization problems have underlying goal functions that are submodular. The classical goal is to find a good solution for a given submodular function f under a given set of constraints. In this paper, we investigate the runtime of a multi-objective evolutionary algorithm called GSEMO until it has obtained a good approximation for submodular functions. For the case of monotone submodular functions and uniform cardinality constraints we show that GSEMO achieves a \((1 - 1/e)\)-approximation in expected time \(O(n^2(\log n+k))\), where \(k\) is the value of the given constraint. For the case of non-monotone submodular functions with \(k\) matroid intersection constraints, we show that GSEMO achieves a \((1/(k + 2 + 1/k + \epsilon)\)-approximation in expected time \(O(n^{k+5\log(n)/\epsilon)\).
Sutton, Andrew M.; Neumann, Frank Runtime Analysis of Evolutionary Algorithms on Randomly Constructed High-Density Satisfiable 3-CNF FormulasParallel Problem Solving from Nature (PPSN) 2014: 942–951
We show that simple mutation-only evolutionary algorithms find a satisfying assignment on two similar models of random planted 3-CNF Boolean formulas in polynomial time with high probability in the high constraint density regime. We extend the analysis to random formulas conditioned on satisfiability (i.e., the so-called filtered distribution) and conclude that most high-density satisfiable formulas are easy for simple evolutionary algorithms. With this paper, we contribute the first rigorous study of randomized search heuristics from the evolutionary computation community on well-studied distributions of random satisfiability problems.
Cseh, Ágnes; Skutella, Martin Paths to Stable AllocationsSymposium Algorithmic Game Theory (SAGT) 2014: 61–73
The stable allocation problem is one of the broadest extensions of the well-known stable marriage problem. In an allocation problem, edges of a bipartite graph have capacities and vertices have quotas to fill. Here we investigate the case of uncoordinated processes in stable allocation instances. In this setting, a feasible allocation is given and the aim is to reach a stable allocation by raising the value of the allocation along blocking edges and reducing it on worse edges if needed. Do such myopic changes lead to a stable solution? In our present work, we analyze both better and best response dynamics from an algorithmic point of view. With the help of two deterministic algorithms we show that random procedures reach a stable solution with probability one for all rational input data in both cases. Surprisingly, while there is a polynomial path to stability when better response strategies are played (even for irrational input data), the more intuitive best response steps may require exponential time. We also study the special case of correlated markets. There, random best response strategies lead to a stable allocation in expected polynomial time.
Göbel, Andreas; Goldberg, Leslie Ann; Richerby, David Counting Homomorphisms to Cactus Graphs Modulo 2Symposium on Theoretical Aspects of Computer Science (STACS) 2014: 350–361
A homomorphism from a graph \(G\) to a graph \(H\) is a function from \(V(G)\) to \(V(H)\) that preserves edges. Many combinatorial structures that arise in mathematics and in computer science can be represented naturally as graph homomorphisms and as weighted sums of graph homomorphisms. In this article, we study the complexity of counting homomorphisms modulo 2. The complexity of modular counting was introduced by Papadimitriou and Zachos and it has been pioneered by Valiant who famously introduced a problem for which counting modulo 7 is easy but counting modulo 2 is intractable. Modular counting provides a rich setting in which to study the structure of homomorphism problems. In this case, the structure of the graph \(H\) has a big influence on the complexity of the problem. Thus, our approach is graph-theoretic. We give a complete solution for the class of cactus graphs, which are connected graphs in which every edge belongs to at most one cycle. Cactus graphs arise in many applications such as the modelling of wireless sensor networks and the comparison of genomes. We show that, for some cactus graphs \(H\), counting homomorphisms to \(H\) modulo 2 can be done in polynomial time. For every other fixed cactus graph \(H\), the problem is complete in the complexity class \(\oplus P\), which is a wide complexity class to which every problem in the polynomial hierarchy can be reduced (using randomised reductions). Determining which \(H\) lead to tractable problems can be done in polynomial time. Our result builds upon the work of Faben and Jerrum, who gave a dichotomy for the case in which \(H\) is a tree.
Kötzing, Timo A Solution to Wiehagen’s ThesisSymposium on Theoretical Aspects of Computer Science (STACS) 2014: 494–505
Wiehagen's Thesis in Inductive Inference (1991) essentially states that, for each learning criterion, learning can be done in a normalized, enumerative way. The thesis was not a formal statement and thus did not allow for a formal proof, but support was given by examples of a number of different learning criteria that can be learned by enumeration. Building on recent formalizations of learning criteria, we are now able to formalize Wiehagen's Thesis. We prove the thesis for a wide range of learning criteria, including many popular criteria from the literature. We also show the limitations of the thesis by giving four learning criteria for which the thesis does not hold (and, in two cases, was probably not meant to hold). Beyond the original formulation of the thesis, we also prove stronger versions which allow for many corollaries relating to strongly decisive and conservative learning.