Gießen, Christian; Kötzing, Timo Robustness of Populations in Stochastic EnvironmentsAlgorithmica 2016: 462–489
We consider stochastic versions of OneMax and LeadingOnes and analyze the performance of evolutionary algorithms with and without populations on these problems. It is known that the (1+1) EA on OneMax performs well in the presence of very small noise, but poorly for higher noise levels. We extend these results to LeadingOnes and to many different noise models, showing how the application of drift theory can significantly simplify and generalize previous analyses. Most surprisingly, even small populations (of size \(\Theta(\log n))\) can make evolutionary algorithms perform well for high noise levels, well outside the abilities of the (1+1) EA. Larger population sizes are even more beneficial; we consider both parent and offspring populations. In this sense, populations are robust in these stochastic settings.
Kötzing, Timo Concentration of First Hitting Times Under Additive DriftAlgorithmica 2016: 490–506
Recent advances in drift analysis have given us better and better tools for understanding random processes, including the run time of randomized search heuristics. In the setting of multiplicative drift we do not only have excellent bounds on the expected run time, but also more general results showing the concentration of the run time. In this paper we investigate the setting of additive drift under the assumption of strong concentration of the "step size" of the process. Under sufficiently strong drift towards the goal we show a strong concentration of the hitting time. In contrast to this, we show that in the presence of small drift a Gambler's-Ruin-like behavior of the process overrides the influence of the drift. Finally, in the presence of sufficiently strong negative drift the hitting time is superpolynomial with high probability; this corresponds to the so-called negative drift theorem, for which we give new variants.
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Sutton, Andrew M. Robustness of Ant Colony Optimization to NoiseEvolutionary Computation 2016: 237–254
Recently Ant Colony Optimization (ACO) algorithms have been proven to be efficient in uncertain environments, such as noisy or dynamically changing fitness functions. Most of these analyses focus on combinatorial problems, such as path finding. We analyze an ACO algorithm in a setting where we try to optimize the simple OneMax test function, but with additive posterior noise sampled from a Gaussian distribution. Without noise the classical \((\mu+1)\)-EA outperforms any ACO algorithm, with smaller \(\mu\) being better; however, with large noise, the \((\mu+1)\)-EA fails, even for high values of \(\mu\) (which are known to help against small noise). In this paper we show that ACO is able to deal with arbitrarily large noise in a graceful manner, that is, as long as the evaporation factor \(p\) is small enough dependent on the parameter \(\delta^2\) of the noise and the dimension \(n\) of the search space \((p = o(1/(n(n + \delta \log n)^2 \log n)))\), optimization will be successful.
Case, John; Kötzing, Timo Strongly non-U-shaped language learning results by general techniquesInformation and Computation 2016: 1–15
In learning, a semantic or behavioral U-shape occurs when a learner first learns, then unlearns, and, finally, relearns, some target concept. This paper introduces two general techniques and applies them especially to syntactic U-shapes in learning: one technique to show when they are necessary and one to show when they are unnecessary. The technique for the former is very general and applicable to a much wider range of learning criteria. It employs so-called self-learning classes of languages which are shown to characterize completely one criterion learning more than another. We apply these techniques to show that, for set-driven and rearrangement-independent learning, any kind of U-shapes is unnecessary. Furthermore, we show that U-shapes are necessary in a strong way for iterative learning, contrasting with an earlier result by Case and Moelius that semantic U-shapes are unnecessary for iterative learning.
Jain, Sanjay; Kötzing, Timo; Ma, Junqi; Stephan, Frank On the Role of Update Constraints and Text-Types in Iterative LearningInformation and Computation 2016: 152–168
The present work investigates the relationship of iterative learning with other learning criteria such as decisiveness, caution, reliability, non-U-shapedness, monotonicity, strong monotonicity and conservativeness. Building on the result of Case and Moelius that iterative learners can be made non-U-shaped, we show that they also can be made cautious and decisive. Furthermore, we obtain various special results with respect to one-one texts, fat texts and one-one hypothesis spaces.
Jain, Sanjay; Kötzing, Timo; Stephan, Frank Enlarging learnable classesInformation and Computation 2016: 194–207
We study which classes of recursive functions satisfy that their union with any other explanatorily learnable class of recursive functions is again explanatorily learnable. We provide sufficient criteria for classes of recursive functions to satisfy this property and also investigate its effective variants. Furthermore, we study the question which learners can be effectively extended to learn a larger class of functions. We solve an open problem by showing that there is no effective procedure which does this task on all learners which do not learn a dense class of recursive functions. However, we show that there are two effective extension procedures such that each learner is extended by one of them.
Kötzing, Timo; Palenta, Raphaela A map of update constraints in inductive inferenceTheoretical Computer Science 2016: 4–24
We investigate how different learning restrictions reduce learning power and how the different restrictions relate to one another. We give a complete map for nine different restrictions both for the cases of complete information learning and set-driven learning. This completes the picture for these well-studied delayable learning restrictions. A further insight is gained by different characterizations of conservative learning in terms of variants of cautious learning. Our analyses greatly benefit from general theorems we give, for example showing that learners with exclusively delayable restrictions can always be assumed total.
Case, John; Kötzing, Timo Topological Separations in Inductive InferenceTheoretical Computer Science 2016: 33–45
Re learning in the limit from positive data, a major concern is which classes of languages are learnable with respect to a given learning criterion. We are particularly interested herein in the reasons for a class of languages to be unlearnable. We consider two types of reasons. One type is called topological where it does not help if the learners are allowed to be uncomputable (an example of Gold's is that no class containing an infinite language and all its finite sub-languages is learnable - even by an uncomputable learner). Another reason is called computational (where the learners are required to be algorithmic). In particular, two learning criteria might allow for learning different classes of languages from one another - but with dependence on whether the unlearnability is of type topological or computational. In this paper we formalize the idea of two learning criteria separating topologically in learning power. This allows us to study more closely why two learning criteria separate in learning power. For a variety of learning criteria, concerning vacillatory, monotone, (several kinds of) iterative and feedback learning, we show that certain learning criteria separate topologically, and certain others, which are known to separate, are shown not to separate topologically. Showing that learning criteria do not separate topologically implies that any known separation must necessarily exploit algorithmicity of the learner.
Friedrich, Tobias; Kötzing, Timo; Quinzan, Francesco; Sutton, Andrew M. Ant Colony Optimization Beats Resampling on Noisy FunctionsGenetic and Evolutionary Computation Conference (GECCO) 2016: 3–4
Despite the pervasiveness of noise in real-world optimization, there is little understanding of the interplay between the operators of randomized search heuristics and explicit noise-handling techniques such as statistical resampling. Ant Colony Optimization (ACO) algorithms are claimed to be particularly well-suited to dynamic and noisy problems, even without explicit noise-handling techniques. In this work, we empirically investigate the trade-offs between resampling an the noise-handling abilities of ACO algorithms. Our main focus is to locate the point where resampling costs more than it is worth.
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Sutton, Andrew M. The Benefit of Recombination in Noisy Evolutionary SearchGenetic and Evolutionary Computation Conference (GECCO) 2016: 161–162
Practical optimization problems frequently include uncertainty about the quality measure, for example due to noisy evaluations. Thus, they do not allow for a straightforward application of traditional optimization techniques. In these settings, randomized search heuristics such as evolutionary algorithms are a popular choice because they are often assumed to exhibit some kind of resistance to noise. Empirical evidence suggests that some algorithms, such as estimation of distribution algorithms (EDAs) are robust against a scaling of the noise intensity, even without resorting to explicit noise-handling techniques such as resampling. In this paper, we want to support such claims with mathematical rigor. We introduce the concept of graceful scaling in which the run time of an algorithm scales polynomially with noise intensity. We study a monotone fitness function over binary strings with additive noise taken from a Gaussian distribution. We show that myopic heuristics cannot efficiently optimize the function under arbitrarily intense noise without any explicit noise-handling. Furthermore, we prove that using a population does not help. Finally we show that a simple EDA called the Compact Genetic Algorithm can overcome the shortsightedness of mutation-only heuristics to scale gracefully with noise. We conjecture that recombinative genetic algorithms also have this property.
Dang, Duc-Cuong; Friedrich, Tobias; Krejca, Martin S.; Kötzing, Timo; Lehre, Per Kristian; Oliveto, Pietro S.; Sudholt, Dirk; Sutton, Andrew Michael Escaping Local Optima with Diversity Mechanisms and CrossoverGenetic and Evolutionary Computation Conference (GECCO) 2016: 645–652
Population diversity is essential for the effective use of any crossover operator. We compare seven commonly used diversity mechanisms and prove rigorous run time bounds for the \((\mu+1)\) GA using uniform crossover on the fitness function \(Jump_k\). All previous results in this context only hold for unrealistically low crossover probability \(p_c=O(k/n)\), while we give analyses for the setting of constant \(p_c < 1\) in all but one case. Our bounds show a dependence on the problem size \(n\), the jump length \(k\), the population size \(\mu\), and the crossover probability \(p_c\). For the typical case of constant \(k > 2\) and constant \(p_c\), we can compare the resulting expected optimisation times for different diversity mechanisms assuming an optimal choice of \(\mu\): \(O(n^{k-1})\) for duplicate elimination/minimisation, \(O(n^2 \log n)\) for maximising the convex hull, \(O(n \log n)\) for det. crowding (assuming \(p_c = k/n\)), \(O(n \log n)\) for maximising the Hamming distance, \(O(n \log n)\) for fitness sharing, \(O(n \log n)\) for the single-receiver island model. This proves a sizeable advantage of all variants of the \((\mu+1)\) GA compared to the (1+1) EA, which requires \(\Theta(n^k)\). In a short empirical study we confirm that the asymptotic differences can also be observed experimentally.
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Nallaperuma, Samadhi; Neumann, Frank; Schirneck, Martin Fast Building Block Assembly by Majority Vote CrossoverGenetic and Evolutionary Computation Conference (GECCO) 2016: 661–668
Different works have shown how crossover can help with building block assembly. Typically, crossover might get lucky to select good building blocks from each parent, but these lucky choices are usually rare. In this work we consider a crossover operator which works on three parent individuals. In each component, the offspring inherits the value present in the majority of the parents; thus, we call this crossover operator majority vote. We show that, if good components are sufficiently prevalent in the individuals, majority vote creates an optimal individual with high probability. Furthermore, we show that this process can be amplified: as long as components are good independently and with probability at least \(1/2+\delta\), we require only \(O(\log 1/\delta + \log \log n)\) successive stages of majority vote to create an optimal individual with high probability! We show how this applies in two scenarios. The first scenario is the Jump test function. With sufficient diversity, we get an optimization time of \(O(n \log n)\) even for jump sizes as large as \(O(n^{(1/2-\epsilon)})\). Our second scenario is a family of vertex cover instances. Majority vote optimizes this family efficiently, while local searches fail and only highly specialized two-parent crossovers are successful.
Doerr, Benjamin; Doerr, Carola; Kötzing, Timo The Right Mutation Strength for Multi-Valued Decision VariablesGenetic and Evolutionary Computation Conference (GECCO) 2016: 1115–1122
The most common representation in evolutionary computation are bit strings. This is ideal to model binary decision variables, but less useful for variables taking more values. With very little theoretical work existing on how to use evolutionary algorithms for such optimization problems, we study the run time of simple evolutionary algorithms on some OneMax-like functions defined over \(\Omega=\{0,1,\dots,r-1\}n\). More precisely, we regard a variety of problem classes requesting the component-wise minimization of the distance to an unknown target vector \(z \in \Omega\). For such problems we see a crucial difference in how we extend the standard-bit mutation operator to these multi-valued domains. While it is natural to select each position of the solution vector to be changed independently with probability \(1/n\), there are various ways to then change such a position. If we change each selected position to a random value different from the original one, we obtain an expected run time of \(\Theta(nr\log n)\). If we change each selected position by either +1 or -1 (random choice), the optimization time reduces to \(\Theta(nr+n\log n)\). If we use a random mutation strength \(i \in \{0,1,\dots,r-1\}n\) with probability inversely proportional to \(i\) and change the selected position by either +\(i\) or -\(i\) (random choice), then the optimization time becomes \(\Theta(n\log(r)(\log(n)+\log(r)))\), bringing down the dependence on \(r\) from linear to polylogarithmic. One of our results depends on a new variant of the lower bounding multiplicative drift theorem.
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S. {EDA}s cannot be Balanced and StableGenetic and Evolutionary Computation Conference (GECCO) 2016: 1139–1146
Estimation of Distribution Algorithms (EDAs) work by iteratively updating a distribution over the search space with the help of samples from each iteration. Up to now, theoretical analyses of EDAs are scarce and present run time results for specific EDAs. We propose a new framework for EDAs that captures the idea of several known optimizers, including PBIL, UMDA, \(\lambda\)-MMASIB, cGA, and \((1,\lambda)\)-EA. Our focus is on analyzing two core features of EDAs: a balanced EDA is sensitive to signals in the fitness; a stable EDA remains uncommitted under a biasless fitness function. We prove that no EDA can be both balanced and stable. The LeadingOnes function is a prime example where, at the beginning of the optimization, the fitness function shows no bias for many bits. Since many well-known EDAs are balanced and thus not stable, they are not well-suited to optimize LeadingOnes. We give a stable EDA which optimizes LeadingOnes within a time of \(O(n\,\log n)\).
Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Sutton, Andrew M. Graceful Scaling on Uniform versus Steep-Tailed NoiseParallel Problem Solving From Nature (PPSN) 2016: 761–770
Recently, different evolutionary algorithms (EAs) have been analyzed in noisy environments. The most frequently used noise model for this was additive posterior noise (noise added after the fitness evaluation) taken from a Gaussian distribution. In particular, for this setting it was shown that the \((\mu + 1)\)-EA on OneMax does not scale gracefully (higher noise cannot efficiently be compensated by higher \(\mu\)). In this paper we want to understand whether there is anything special about the Gaussian distribution which makes the \((\mu + 1)\)-EA not scale gracefully. We keep the setting of posterior noise, but we look at other distributions. We see that for exponential tails the \((\mu + 1)\)-EA on OneMax does also not scale gracefully, for similar reasons as in the case of Gaussian noise. On the other hand, for uniform distributions (as well as other, similar distributions) we see that the \((\mu + 1)\)-EA on OneMax does scale gracefully, indicating the importance of the noise model.
Friedrich, Tobias; Kötzing, Timo; Sutton, Andrew M. On the Robustness of Evolving PopulationsParallel Problem Solving From Nature (PPSN) 2016: 771–781
Most theoretical work that studies the benefit of recombination focuses on the ability of crossover to speed up optimization time on specific search problems. In this paper, we take a slightly different perspective and investigate recombination in the context of evolving solutions that exhibit \(\emph{mutational}\) robustness, i.e., they display insensitivity to small perturbations. Various models in population genetics have demonstrated that increasing the effective recombination rate promotes the evolution of robustness. We show this result also holds in the context of evolutionary computation by proving crossover promotes the evolution of robust solutions in the standard \((\mu+1)\) GA. Surprisingly, our results show that the effect is present even when robust solutions are at a selective disadvantage due to lower fitness values.
Doerr, Benjamin; Doerr, Carola; Kötzing, Timo Provably Optimal Self-Adjusting Step Sizes for Multi-Valued Decision VariablesParallel Problem Solving From Nature (PPSN) 2016: 782–791
We regard the problem of maximizing a OneMax-like function defined over an alphabet of size \(r\). In previous work [GECCO 2016] we have investigated how three different mutation operators influence the performance of Randomized Local Search (RLS) and the (1+1) Evolutionary Algorithm. This work revealed that among these natural mutation operators none is superior to the other two for any choice of \(r\). We have also given in [GECCO 2016] some indication that the best achievable run time for large \(r\) is \(\Theta(n \log r(\log n + \log r))\), regardless of how the mutation operator is chosen, as long as it is a static choice (i.e., the distribution used for variation of the current individual does not change over time). Here in this work we show that we can achieve a better performance if we allow for adaptive mutation operators. More precisely, we analyze the performance of RLS using a self-adjusting mutation strength. In this algorithm the size of the steps taken in each iteration depends on the success of previous iterations. That is, the mutation strength is increased after a successful iteration and it is decreased otherwise. We show that this idea yields an expected optimization time of \(\Theta(n(\log n + \log r))\), which is optimal among all comparison-based search heuristics. This is the first time that self-adjusting parameter choices are shown to outperform static choices on a discrete multi-valued optimization problem.
Dang, Duc-Cuong; Lehre, Per Kristian; Friedrich, Tobias; Kötzing, Timo; Krejca, Martin S.; Oliveto, Pietro S.; Sudholt, Dirk; Sutton, Andrew M. Emergence of Diversity and its Benefits for Crossover in Genetic AlgorithmsParallel Problem Solving From Nature (PPSN) 2016: 890–900
Population diversity is essential for avoiding premature convergence in Genetic Algorithms (GAs) and for the effective use of crossover. Yet the dynamics of how diversity emerges in populations are not well understood. We use rigorous runtime analysis to gain insight into population dynamics and GA performance for a standard \((\mu+1)\) GA and the \(Jump_k\) test function. By studying the stochastic process underlying the size of the largest collection of identical genotypes we show that the interplay of crossover followed by mutation may serve as a catalyst leading to a sudden burst of diversity. This leads to improvements of the expected optimisation time of order \(\Omega(n/ \log n)\) compared to mutation-only algorithms like the \((1+1)\) EA.
Kötzing, Timo; Schirneck, Martin Towards an Atlas of Computational Learning TheorySymposium on Theoretical Aspects of Computer Science (STACS) 2016: 47:1–47:13
A major part of our knowledge about Computational Learning stems from comparisons of the learning power of different learning criteria. These comparisons inform about trade-offs between learning restrictions and, more generally, learning settings; furthermore, they inform about what restrictions can be observed without losing learning power. With this paper we propose that one main focus of future research in Computational Learning should be on a structured approach to determine the relations of different learning criteria. In particular, we propose that, for small sets of learning criteria, all pairwise relations should be determined; these relations can then be easily depicted as a map, a diagram detailing the relations. Once we have maps for many relevant sets of learning criteria, the collection of these maps is an Atlas of Computational Learning Theory, informing at a glance about the landscape of computational learning just as a geographical atlas informs about the earth. In this paper we work toward this goal by providing three example maps, one pertaining to partially set-driven learning, and two pertaining to strongly monotone learning. These maps can serve as blueprints for future maps of similar base structure.