Fourquet, Océane; Krejca, Martin S.; Doerr, Carola; Schwikowski, Benno Towards the genome-scale discovery of bivariate monotonic classifiersBMC Bioinformatics 2025: 228:1–228:32
Background Bivariate monotonic classifiers (BMCs) are based on pairs of input features. Like many other models used for machine learning, they can capture nonlinear patterns in high-dimensional data. At the same time, they are simple and easy to interpret. Until now, the use of BMCs on a genome scale was hampered by the high computational complexity of the search for pairs of features with a high leave-one-out performance estimate. Results We introduce the fastBMC algorithm, which drastically speeds up the identification of BMCs. The algorithm is based on a mathematical bound for the BMC performance estimate while maintaining optimality. We show empirically that fastBMC speeds up the computation by a factor of at least 15 already for a small number of features, compared to the traditional approach. For two of the three smaller biomedical datasets that we consider here, the resulting possibility of considering much larger sets of features translates into significantly improved classification performance. As an example of the high degree of interpretability of BMCs, we discuss a straightforward interpretation of a BMC glioblastoma survival predictor, an immediate novel biomedical hypothesis, options for biomedical validation, and treatment implications. In addition, we study the performance of fastBMC on a larger and well-known breast cancer dataset, validating the benefits of the BMCs for biomarker identification and biomedical hypothesis generation. Conclusion fastBMC enables the rapid construction of robust and interpretable ensemble models using BMC, facilitating the discovery of gene pairs predictive of relevant phenotypes and their interaction in that context. Availability We provide the first open-source implementation for learning BMCs, a Python implementation of fastBMC in particular, and Python code to reproduce the fastBMC results on real and simulated data in this paper, at https://github.com/oceanefrqt/fastBMC.
Chwiałkowski, Marcel; Doerr, Benjamin; Krejca, Martin S. Runtime Analysis of the Compact Genetic Algorithm on the LeadingOnes BenchmarkIEEE Transactions on Evolutionary Computation 2025: 311–320
The compact genetic algorithm (cGA) is one of the simplest estimation-of-distribution algorithms (EDAs). Next to the univariate marginal distribution algorithm (UMDA)---another simple EDA---, the cGA has been subject to extensive mathematical runtime analyses, often showcasing a similar or even superior performance to competing approaches. Surprisingly though, up to date and in contrast to the UMDA and many other heuristics, we lack a rigorous runtime analysis of the cGA on the LeadingOnes benchmark---one of the most studied theory benchmarks in the domain of evolutionary computation. We fill this gap in the literature by conducting a formal runtime analysis of the cGA on LeadingOnes. For the cGA's single parameter---called the hypothetical population size---at least polylogarithmically larger than the problem size, we prove that the cGA samples the optimum of LeadingOnes with high probability within a number of function evaluations quasi-linear in the problem size and linear in the hypothetical population size. For the best hypothetical population size, our result matches, up to polylogarithmic factors, the typical quadratic runtime that many randomized search heuristics exhibit on LeadingOnes. Our analysis exhibits some noteworthy differences in the working principles of the two algorithms which were not visible in previous works.
Doerr, Benjamin; Korkotashvili, Dimitri; Krejca, Martin S. Difficulties of the NSGA-II with the Many-Objective LeadingOnes ProblemIEEE Transactions on Evolutionary Computation 2025: 773–782
The NSGA-II is the most prominent multi-objective evolutionary algorithm (cited more than 50,000 times). Very recently, a mathematical runtime analysis has proven that this algorithm can have enormous difficulties when the number of objectives is larger than two (Zheng, Doerr. IEEE Transactions on Evolutionary Computation (2024)). However, this result was shown only for the OneMinMax benchmark problem, which has the particularity that all solutions are on the Pareto front, a fact heavily exploited in the proof of this result. In this work, we show a comparable result for the LeadingOnesTrailingZeroes benchmark. This popular benchmark problem appears more natural in that most of its solutions are not on the Pareto front. With a careful analysis of the population dynamics of the NGSA-II optimizing this benchmark, we manage to show that when the population grows on the Pareto front, then it does so much faster by creating known Pareto optima than by spreading out on the Pareto front. Consequently, already when still a constant fraction of the Pareto front is unexplored, the crowding distance becomes the crucial selection mechanism, and thus the same problems arise as in the optimization of OneMinMax. With these and some further arguments, we show that the NSGA-II, with a population size by at most a constant factor larger than the Pareto front, cannot compute the Pareto front in less than exponential time.
Doerr, Benjamin; Krejca, Martin S.; Rudolph, Günter Runtime Analysis for Multi-Objective Evolutionary Algorithms in Unbounded Integer SpacesAnnual AAAI Conference on Artificial Intelligence (AAAI) 2025: 26955–26963
Randomized search heuristics have been applied successfully to a plethora of problems. This success is complemented by a large body of theoretical results. Unfortunately, the vast majority of these results regard problems with binary or continuous decision variables -- the theoretical analysis of randomized search heuristics for unbounded integer domains is almost nonexistent. To resolve this shortcoming, we start the runtime analysis of multi-objective evolutionary algorithms, which are among the most successful randomized search heuristics, for unbounded integer search spaces. We analyze single- and full-dimensional mutation operators with three different mutation strengths, namely changes by plus/minus one (unit strength), random changes following a law with exponential tails, and random changes following a power-law. The performance guarantees we prove on a recently proposed natural benchmark problem suggest that unit mutation strengths can be slow when the initial solutions are far from the Pareto front. When setting the expected change right (depending on the benchmark parameter and the distance of the initial solutions), the mutation strength with exponential tails yields the best runtime guarantees in our results -- however, with a wrong choice of this expectation, the performance guarantees quickly become highly uninteresting. With power-law mutation, which is an essentially parameter-less mutation operator, we obtain good results uniformly over all problem parameters and starting points. We complement our mathematical findings with experimental results that suggest that our bounds are not always tight. Most prominently, our experiments indicate that power-law mutation outperforms the one with exponential tails even when the latter uses a near-optimal parametrization. Hence, we suggest to favor power-law mutation for unknown problems in integer spaces.
Doerr, Benjamin; Ivan, Tudor; Krejca, Martin S. Speeding Up the NSGA-II With a Simple Tie-Breaking RuleAnnual AAAI Conference on Artificial Intelligence (AAAI) 2025: 26964–26972
The non-dominated sorting genetic algorithm II (NSGA-II) is the most popular multi-objective optimization heuristic. Recent mathematical runtime analyses have detected two shortcomings in discrete search spaces, namely, that the NSGA-II has difficulties with more than two objectives and that it is very sensitive to the choice of the population size. To overcome these difficulties, we analyze a simple tie-breaking rule in the selection of the next population. Similar rules have been proposed before, but have found only little acceptance. We prove the effectiveness of our tie-breaking rule via mathematical runtime analyses on the classic OneMinMax, LeadingOnesTrailingZeros, and OneJumpZeroJump benchmarks. We prove that this modified NSGA-II can optimize the three benchmarks efficiently also for many objectives, in contrast to the exponential lower runtime bound previously shown for OneMinMax with three or more objectives. For the bi-objective problems, we show runtime guarantees that do not increase when moderately increasing the population size over the minimum admissible size. For example, for the OneJumpZeroJump problem with representation length \($n$\) and gap parameter \($k$\), we show a runtime guarantee of \($O(\max\{n^{k+1},Nn\})$\) function evaluations when the population size is at least four times the size of the Pareto front. For population sizes larger than the minimal choice \($N = \Theta(n)$\), this result improves considerably over the \($\Theta(Nn^k)$\) runtime of the classic NSGA-II.
Krejca, Martin S.; Neumann, Frank; Witt, Carsten Population Dynamics and Improved Runtime Guarantees for the (µ+1) EA on BinValFoundations of Genetic Algorithms (FOGA) 2025: 142–153
Populations play a key role in the area of evolutionary computation to tackle complex optimization problems. Nevertheless, it is hard to understand the underlying population dynamics from a theoretical perspective, and only a limited number of theoretical results for population-based algorithms are available even for simple benchmark functions. In this paper, we study the classic \((1+1)\) EA on the benchmark problem BinVal, which allows for exponentially many function values. Previous methods for the analysis, based on fitness levels and multiplicative drift analysis, lead to runtime bounds for this function of size \(n\) that include an additive term of \(\Theta(n^2)\). We provide new insights into how this standard algorithm optimizes BinVal, and we provide runtime bounds that are polynomial in the population size \(\mu\) and do not include this additive term. In particular, we prove bounds on the expected runtime that are \(O(mu^5 n\log(n/\mu^4))\) for standard bit mutation, which is \(O(n \log n)\) for constant \(\mu\). Our analysis considers the population dynamics of the \((1+1)\) EA more closely, proving that copies created by mutation lead to a low diversity in short blocks of bits across all individuals. We extend this method to mutation operators that cannot create duplicates, and prove bounds similar to standard bit mutation.
Alghouass, Yasser; Doerr, Benjamin; Krejca, Martin S.; Lagmah, Mohammed Proven Approximation Guarantees in Multi-Objective Optimization: SPEA2 Beats NSGA-IIInternational Joint Conferences on Artifical Intelligence (IJCAI) 2025: 8833–8841
Together with the NSGA-II and SMS-EMOA, the strength Pareto evolutionary algorithm 2 (SPEA2) is one of the most prominent dominance-based multi-objective evolutionary algorithms (MOEAs). Different from the NSGA-II, it does not employ the crowding distance (essentially the distance to neighboring solutions) to compare pairwise non-dominating solutions but a complex system of \(\sigma\)-distances that builds on the distances to all other solutions. In this work, we give a first mathematical proof showing that this more complex system of distances can be superior. More specifically, we prove that a simple steady-state SPEA2 can compute optimal approximations of the Pareto front of the OneMinMax benchmark in polynomial time. The best proven guarantee for a comparable variant of the NSGA-II only assures approximation ratios of roughly a factor of two, and both mathematical analyses and experiments indicate that optimal approximations are not found efficiently.
Doerr, Benjamin; Krejca, Martin S.; Opris, Andre Tight Runtime Guarantees From Understanding the Population Dynamics of the GSEMO Multi-Objective Evolutionary AlgorithmInternational Joint Conferences on Artifical Intelligence (IJCAI) 2025: 8876–8884
The global simple evolutionary multi-objective optimizer (GSEMO) is a simple, yet often effective multi-objective evolutionary algorithm (MOEA). By only maintaining non-dominated solutions, it has a variable population size that automatically adjusts to the needs of the optimization process. The downside of the dynamic population size is that the population dynamics of this algorithm are harder to understand, resulting, e.g., in the fact that only sporadic tight runtime analyses exist. In this work, we significantly enhance our understanding of the dynamics of the GSEMO, in particular, for the classic CountingOnesCountingZeros (COCZ) benchmark. From this, we prove a lower bound of order \(\Omega(n^2 \log n)\), for the first time matching the seminal upper bounds known for over twenty years. We also show that the GSEMO finds any constant fraction of the Pareto front in time \(O(n^2)\), improving over the previous estimate of \(O(n^2 \log n)\) for the time to find the first Pareto optimum. Our methods extend to other classic benchmarks and yield, e.g., the first \(\Omega(n^{k+1})\) lower bound for the OJZJ benchmark in the case that the gap parameter is \(k \in \{2,3\}\). We are therefore optimistic that our new methods will be useful in future mathematical analyses of MOEAs.