2 A Gentle Introduction to Classic Drift Theorems
In this section we present two classic drift theorems, the additive and the multiplicative drift theorem (see
Section 2.1 [The Additive and the Multiplicative Drift Theorems]). These two theorems are the basis for most analyses via drift theory, and many more advanced drift theorems are variations of these two core results. We give a number of instructive examples of how and when the theorems are helpful in
Section 2.2 [Some Simple Applications], followed by two more complex examples in
Section 2.3 [More Complex Problems]; at the end of this section, in
Section 2.4 [Classic Results for Evolutionary Algorithms], we provide two classic applications from the theory of randomized search heuristics. Note that, for this section, we refrain from diving into the technically most powerful statements and present simpler versions of these theorems; stronger versions can be found in
Section 5 [The Zoo: A Tour of Drift Theorems].
2.1 The Additive and the Multiplicative Drift Theorems
We state the two most commonly used drift theorems. The first drift theorem is the additive drift theorem, which requires a uniform bound on the expected change of a random process. It is due to [
HY01Drift analysis and average time complexity of evolutionary algorithms. Jun He, Xin Yao. Artif. Intell. 2001.,
HY04A study of drift analysis for estimating computation time of evolutionary algorithms. Jun He, Xin Yao. Nat. Comput. 2004.]. The very general version given here is due to [
KK19First-hitting times under drift. Timo Kötzing, Martin S. Krejca. Theor. Comput. Sci. 2019.], where an instructive proof can also be found. We give another proof in
Theorem 5.1 [Additive Drift, Upper Bound, Time Condition].
Theorem 2.1 [Additive Drift, Upper Bound].
Let $(X_t)_{t \in \natnum}$ be an
integrable process over $\realnum$, and let $T = \inf\set{t \in \natnum}{X_t \leq 0}$.
Furthermore, suppose the following two conditions hold (non-negativity, drift).
- (NN)
- For all $t \leq T$, $X_t \geq 0$.
- (D)
- There is a $\delta > 0$ such that, for all $t \lt T$, it holds that $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} \geq \delta$.
Then
$$
\Ew{T} \leq \frac{\Ew{X_0}}{\delta}.
$$
The condition
(D) is the
drift condition; this is where we require the additive progress towards the target state $0$. Note that we require the process to have drift for all possible histories $X_0,…, X_t$. In many applications, we have a
Markov chain, which implies that conditioning on the history is equivalent to conditioning on $X_t$ only. See
Section 8 [Drift as an Average: A Closer Look on the Conditioning of Drift] for a detailed discussion on what to condition on.
The condition
(NN) requires
non-negativity of the process. We cannot allow the process to assume values smaller than the target $0$, as demonstrated by the following example.
Example 2.2 [Additive Drift and Processes Reaching Negative Numbers].
Suppose our process starts with $X_0 = 5$ and, in each iteration, the process deterministically decreases by $2$. Then the expected time (in fact, the deterministic time) until $X_t \leq 0$ is exactly $3$. If we want to apply
Theorem 2.1 [Additive Drift, Upper Bound], we use that the expected gain is $2$, so the conclusion suggests an expected time of $2.5$. This incorrect conclusion comes from disregarding
(NN).
We can amplify this effect with the following example. Let $(X_t)_{t \in \natnum}$ be a random process with $X_0 = 1$ and, for all $t \in \natnum$, with probability $1-1/n$, $X_{t+1} = X_t$ and otherwise $X_{t+1} = -n+1$; the expected time until $X_t \leq 0$ is $n$ (since it follows a geometric distribution with probability $1/n$), while the expected gain is $1$, for which the additive drift theorem would suggest an expected time of $1$.
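To illustrate this gap concretely, the following small simulation (a sketch; the function name and the values of $n$ and of the number of runs are our arbitrary choices) estimates the expected hitting time of the second process empirically and compares it with the true value $n$ and with the prediction of $1$ one would obtain by incorrectly ignoring (NN).

```python
import random

def hitting_time(n: int) -> int:
    """First time t with X_t <= 0 for the process with X_0 = 1 that, in each
    step, stays unchanged with probability 1 - 1/n and jumps to -n + 1 otherwise."""
    x, t = 1, 0
    while x > 0:
        t += 1
        if random.random() < 1 / n:  # the jump to -n + 1 happens with probability 1/n
            x = -n + 1
    return t

if __name__ == "__main__":
    n, runs = 100, 10_000
    avg = sum(hitting_time(n) for _ in range(runs)) / runs
    # The drift is 1, so ignoring (NN) suggests E[T] = 1; the true value is n.
    print(f"empirical E[T] ~ {avg:.1f} (true value {n}, naive prediction 1)")
```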
Note that there are also additive drift theorems that remove the condition
(NN) and instead incorporate an additional term in the conclusion, see
Theorem 5.3 [Additive Drift, Upper Bound with Overshooting].
The additive drift theorem also allows for a corresponding lower bound as follows [
HY01Drift analysis and average time complexity of evolutionary algorithms. Jun He, Xin Yao. Artif. Intell. 2001.,
HY04A study of drift analysis for estimating computation time of evolutionary algorithms. Jun He, Xin Yao. Nat. Comput. 2004.,
KK19First-hitting times under drift. Timo Kötzing, Martin S. Krejca. Theor. Comput. Sci. 2019.].
In Theorem 3 of [
KST11How crossover helps in pseudo-boolean optimization. Timo Kötzing, Dirk Sudholt, Madeleine Theile. GECCO 2011.], this theorem was used to show a lower bound, yielding an asymptotically tight run time analysis of an evolutionary algorithm. Another application can be found in Theorem 4 of [
FKL+17Analysis of the (1+1) EA on Subclasses of Linear Functions under Uniform and Linear Constraints. Tobias Friedrich, Timo Kötzing, Gregor Lagodzinski, Frank Neumann, Martin Schirneck. FOGA 2017.,
FKL+20Analysis of the (1 + 1) EA on subclasses of linear functions under uniform and linear constraints. Tobias Friedrich, Timo Kötzing, J. A. Gregor Lagodzinski, Frank Neumann, Martin Schirneck. Theor. Comput. Sci. 2020.].
Theorem 2.3 [Additive Drift, Lower Bound].
Let $(X_t)_{t \in \natnum}$ be an
integrable process over $\realnum$, and let $T = \inf\set{t \in \natnum}{X_t \leq 0}$.
Furthermore, suppose the following two conditions hold (bounded step size, drift).
- (B)
- There is a $c \gt 0$ such that, for all $t \lt T$, it holds that $\Ew{|X_t - X_{t+1}| \mid X_0,…, X_t} \leq c$.
- (D)
- There is a $\delta \gt 0$ such that, for all $t \lt T$, it holds that $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} \leq \delta$.
Then
$$
\Ew{T} \geq \frac{\Ew{X_0}}{\delta}.
$$
For this lower bound we need to require (B), a bounded expected step size. This is to avoid counterexamples like the following process.
Example 2.4 [Additive Drift and Unbounded Step Size].
Let $(X_t)_{t \in \natnum}$ with $X_0 = 1$ and, for all $t$, with probability $1/2$, $X_{t+1} = 0$ and otherwise $X_{t+1} = 2X_t - 2\delta$. This process exhibits a drift of $\delta$, suggesting an expected time of $1/\delta$, but the true time until $X_t \leq 0$ is again geometrically distributed, this time with probability $1/2$, giving an expected time of $2$.
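The following sketch (with arbitrarily chosen parameters) simulates this process and contrasts the empirical hitting time with the value $1/\delta$ that a lower bound without condition (B) would incorrectly suggest.

```python
import random

def hitting_time(delta: float) -> int:
    """First time t with X_t <= 0 for the process with X_0 = 1 where, in each
    step, X drops to 0 with probability 1/2 and becomes 2*X - 2*delta otherwise."""
    x, t = 1.0, 0
    while x > 0:
        t += 1
        x = 0.0 if random.random() < 0.5 else 2 * x - 2 * delta
    return t

if __name__ == "__main__":
    delta, runs = 0.01, 10_000
    avg = sum(hitting_time(delta) for _ in range(runs)) / runs
    # The drift is delta, so without condition (B) one would expect
    # E[T] >= 1/delta = 100; the true expectation is 2.
    print(f"empirical E[T] ~ {avg:.2f} (true value 2, naive prediction {1 / delta:.0f})")
```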
In order to apply an additive drift theorem, one has to find a single constant $\delta$ bounding the drift uniformly. However, for processes where large parts of the state space exhibit a drift very different from this uniform bound, stronger results can be obtained by using a drift theorem which allows for a different drift in different states of the process.
The multiplicative drift theorem covers the case where the drift is proportional to the current value of the process. It is due to [
DJW10Multiplicative drift analysis. Benjamin Doerr, Daniel Johannsen, Carola Winzen. GECCO 2010.], with tail bounds given in [
DG10Drift Analysis with Tail Bounds. Benjamin Doerr, Leslie Ann Goldberg. PPSN (1) 2010.].
Theorem 2.5 [Multiplicative Drift].
Let $(X_t)_{t \in \natnum}$ be an
integrable process over $\{0, 1\} \cup S$, where $S \subset \realnum_{\gt 1}$, and let $T = \inf\set{t \in \natnum}{X_t \leq 0}$.
Assume that there is a $\delta \in \realnum_{+}$ such that, for all $t \lt T$, it holds that
$$\Ew{X_t - X_{t+1} \mid X_0,…, X_t} \geq \delta X_t.$$
Then
$$\Ew{T} \leq \frac{1 + \ln \Ew{X_0}}{\delta}.$$
Further, for all $k \gt 0$ and $s \in S \cup \{1\}$ with $\Pr{X_0 \leq s} \gt 0$, it holds that
$$
\Pr{T \gt \frac{k + \ln s}{\delta} \mid X_0 \leq s} \leq \mathrm{e}^{-k}\,.
$$
The condition
(D) gives a bound dependent on the history, specifically dependent on the “current” value of the process. Intuitively, it requires that, if the process has a current value of $X_t = s$, then the drift is at least $\delta s$. In fact, the multiplicative drift theorem is frequently stated with conditioning on $X_t = s$ instead of on $X_0,…, X_t$, and the bound on the drift is written as $\delta s$ instead of $\delta X_t$. See
Section 8 [Drift as an Average: A Closer Look on the Conditioning of Drift] for a detailed discussion on the different ways to write a drift theorem.
These drift theorems cover a lot of applications; the remainder of this section gives a range of use cases. Typically, one considers the drift theorems stated above first before turning to other drift theorems (see
Section 5 [The Zoo: A Tour of Drift Theorems] for a list and discussion of such alternatives). An incomplete list of applications of these basic theorems in the analysis of evolutionary algorithms is as follows.
- Theorem 15 of [KSNO12The max problem revisited: the importance of mutation in genetic programming. Timo Kötzing, Andrew M. Sutton, Frank Neumann, Una-May O'Reilly. GECCO 2012.] uses it for a simple $O(n \log n)$ bound.
- Similarly easy arguments are given in Theorems 14 and 17 of [DDK15Solving Problems with Unknown Solution Length at (Almost) No Extra Cost. Benjamin Doerr, Carola Doerr, Timo Kötzing. GECCO 2015.].
- A number of applications are given in [FKL+17Analysis of the (1+1) EA on Subclasses of Linear Functions under Uniform and Linear Constraints. Tobias Friedrich, Timo Kötzing, Gregor Lagodzinski, Frank Neumann, Martin Schirneck. FOGA 2017., FKL+20Analysis of the (1 + 1) EA on subclasses of linear functions under uniform and linear constraints. Tobias Friedrich, Timo Kötzing, J. A. Gregor Lagodzinski, Frank Neumann, Martin Schirneck. Theor. Comput. Sci. 2020.].
- Lemma 2 of [KM12ACO Beats EA on a Dynamic Pseudo-Boolean Function. Timo Kötzing, Hendrik Molter. PPSN (1) 2012.] uses the concentration bound of Theorem 2.5 [Multiplicative Drift].
- So does Theorem 9 of [FKKS15The Benefit of Recombination in Noisy Evolutionary Search. Tobias Friedrich, Timo Kötzing, Martin S. Krejca, Andrew M. Sutton. ISAAC 2015.] (see also Theorem 9 of [FKKS17The Compact Genetic Algorithm is Efficient Under Extreme Gaussian Noise. Tobias Friedrich, Timo Kötzing, Martin S. Krejca, Andrew M. Sutton. IEEE Trans. Evol. Comput. 2017.]) for the analysis of the cGA.
- The application in Theorem 9 of [FK13Optimizing expected path lengths with ant colony optimization using fitness proportional update. Matthias Feldmann, Timo Kötzing. FOGA 2013.] uses a somewhat more intricate argument for an upper bound via multiplicative drift.
- Similar arguments are used in Theorems 8 and 15 of [FKKS15Robustness of Ant Colony Optimization to Noise. Tobias Friedrich, Timo Kötzing, Martin S. Krejca, Andrew M. Sutton. GECCO 2015.] for the analysis of an ant colony optimization (ACO) algorithm optimizing noisy OneMax.
We note that, in our applications of the drift theorems in the following, we do not show that the random processes under consideration are
integrable, since this is easily observed from the context in which they are defined.
2.2 Some Simple Applications
We will start by looking at the process from
Section 1 [What is Stochastic Drift?] about collecting coupons, a classic process analyzed in many textbooks on random processes. We begin with a (suboptimal) analysis via additive drift.
Theorem 2.6 [Coupon Collector with Additive Drift].
Suppose we want to collect at least one of each color of $n \in \natnum_{\geq 1}$ coupons. Each round, we are given one coupon chosen uniformly at random from the $n$ colors. Then, in expectation, we have to collect for at most $n^2$ rounds.
Proof.
Let $X_t$ be the number of coupons missing after $t$ rounds and let $T$ be the random variable describing the first time such that $X_t = 0$. Furthermore, let $t \lt T$. The probability of making progress (of $1$) in round $t+1$ is exactly $X_t/n$. In the worst case, when only one color is missing, this is still $1/n$. Thus, $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} = X_t/n \geq 1/n$. Since we start with $X_0=n$ missing colors, an application of
Theorem 2.1 [Additive Drift, Upper Bound] gives the desired upper bound of $n^2$ rounds, using $\delta = 1/n$.
The analysis with additive drift completely disregards the very high probability of finding new colors while a lot of colors are still missing. Thus, the analysis with multiplicative drift gives a much better bound, as the following theorem shows. In a sense, the multiplicative drift theorem is a generalization of the classic analysis of the coupon collector process; or, vice versa, the analysis of the coupon collector process follows directly from the multiplicative drift theorem.
Theorem 2.7 [Coupon Collector with Multiplicative Drift].
Suppose we want to collect at least one of each color of $n \in \natnum_{\geq 1}$ coupons. Each round, we are given one coupon chosen uniformly at random from the $n$ colors. Then, in expectation, we have to collect for at most $n(1+ \ln n)$ rounds. Furthermore, for all $k \in \realnum_{\gt 0}$, overshooting this time by $kn$ has a probability of at most $\mathrm{e}^{-(k+1)}$.
Proof.
Let $X_t$ be the number of coupons missing after $t$ rounds and let $T$ be the random variable describing the first time such that $X_t = 0$. Furthermore, let $t \lt T$. The probability of making progress (of $1$) in round $t+1$ is $X_t/n$. Thus, $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} = X_t/n$. An application of
Theorem 2.5 [Multiplicative Drift] gives the desired result.
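As a quick empirical check (a sketch; the parameters are our arbitrary choices), the following simulation estimates the expected collection time and compares it with the additive bound $n^2$ from Theorem 2.6 and the multiplicative bound $n(1 + \ln n)$ from Theorem 2.7.

```python
import math
import random

def coupon_collector(n: int) -> int:
    """Number of rounds until each of the n colors has been drawn at least once."""
    seen = [False] * n
    missing, rounds = n, 0
    while missing > 0:
        rounds += 1
        color = random.randrange(n)  # one uniformly random color per round
        if not seen[color]:
            seen[color] = True
            missing -= 1
    return rounds

if __name__ == "__main__":
    n, runs = 100, 2_000
    avg = sum(coupon_collector(n) for _ in range(runs)) / runs
    print(f"empirical E[T]                   ~ {avg:.0f}")
    print(f"additive drift bound (Thm 2.6)   = {n * n}")
    print(f"multiplicative bound (Thm 2.7)   ~ {n * (1 + math.log(n)):.0f}")
```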
A lower bound can be derived with an appropriate lower bounding multiplicative drift theorem (see
Theorem 5.12 [Coupon Collector, Lower Bound]). Since the process is monotone, both an upper and a lower bound can be derived with the
fitness level method, see
Theorem 6.6 [Coupon Collector, Lower Bound via Fitness Levels].
Using a proof analogous to that of
Theorem 2.7 [Coupon Collector with Multiplicative Drift], one can directly analyze a
generalized version of the coupon collector process as follows.
Theorem 2.8 [Generalized Coupon Collector].
Suppose we want to collect at least one of each color of $n \in \natnum_{\geq 1}$ coupons. For each color of coupon and each round, we get this color of coupon with probability at least $p \in (0, 1]$. Then, in expectation, we have to wait for at most $(1+ \ln n)/p$ rounds. Furthermore, for all $k \in \realnum_{\gt 0}$, overshooting this time by $k/p$ has a probability of at most $\mathrm{e}^{-(k+1)}$.
Proof.
Let $X_t$ be the number of coupons missing after $t$ rounds and let $T$ be the random variable describing the first time such that $X_t = 0$. Furthermore, let $t \lt T$. The expected progress is $\Ew{X_t - X_{t+1} \mid X_0,…,X_t} \geq p X_t$, since the expected number of missing coupons that we get in the next iteration is $p X_t$. An application of
Theorem 2.5 [Multiplicative Drift] gives the desired result.
Note that this generalized version does not make any assumptions on how many coupons we get per iteration, or whether these indicator random variables are in any way correlated.
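For a concrete instance of the generalized process, the following sketch assumes, as one admissible special case (the theorem makes no independence assumptions), that each missing color arrives independently with probability exactly $p$ in every round; the parameters are chosen arbitrarily.

```python
import math
import random

def generalized_coupon_collector(n: int, p: float) -> int:
    """Rounds until every color has arrived when, in each round, every still
    missing color arrives independently with probability p (possibly several
    colors arrive in the same round)."""
    missing = set(range(n))
    rounds = 0
    while missing:
        rounds += 1
        missing = {c for c in missing if random.random() >= p}
    return rounds

if __name__ == "__main__":
    n, p, runs = 100, 0.02, 2_000
    avg = sum(generalized_coupon_collector(n, p) for _ in range(runs)) / runs
    print(f"empirical E[T] ~ {avg:.0f}, bound (1 + ln n)/p = {(1 + math.log(n)) / p:.0f}")
```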
We now turn to the well-known geometric distribution. The typical computation of its expectation involves manipulating infinite sums. Using drift, the computation is rather simple. Furthermore, our analysis allows for processes where the probability of success changes over time and depends on the history, as long as a uniform bound on this probability is known.
Theorem 2.9 [Geometric Distribution].
Let $(X_t)_{t \in \natnum}$ be some random process where, in each iteration, a
success event happens with some probability, possibly dependent on the history of the process; we let $S(X_0,…,X_t)$ denote the success event. Then the following estimates hold for all $p \in (0,1]$.
- If, for each $t$, $\Pr{S(X_0,…,X_t) \mid X_0,…,X_t} \leq p$, then the expected time until the first success event happens is at least $1/p$.
- If, for each $t$, $\Pr{S(X_0,…,X_t) \mid X_0,…,X_t} \geq p$, then the expected time until the first success event happens is at most $1/p$.
- If, for each $t$, $\Pr{S(X_0,…,X_t) \mid X_0,…,X_t} = p$, then the expected time until the first success event happens is exactly $1/p$.
Proof.
We start with the first statement. For all $t \in \natnum$, let $Y_t$ be $0$ if a success event has happened within the first $t$ iterations, and $1$ otherwise; note that $Y_t$ is determined by $X_0,…,X_t$. We let $T$ be the random variable describing the first time that a success event happened. Furthermore, let $t \lt T$. Then $\Ew{Y_t - Y_{t + 1} \mid X_0,…,X_t} \leq p$, and hence also $\Ew{Y_t - Y_{t + 1} \mid Y_0,…,Y_t} \leq p$. Thus,
Theorem 2.3 [Additive Drift, Lower Bound] gives us the corresponding bound of at least $1/p$ iterations until the first success event. Analogously, we get the second statement from
Theorem 2.1 [Additive Drift, Upper Bound]. The third statement is the conjunction of the first two.
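As an illustration of the second statement, the following sketch simulates a process whose success probability depends on the history (here, somewhat arbitrarily, it grows with the number of past failures) but is always at least $p$, and compares the empirical expectation with the bound $1/p$; all names and parameters are our choices.

```python
import random

def time_to_first_success(p: float) -> int:
    """Rounds until the first success when the success probability in each round
    is at least p but may depend on the history; here it grows with the number
    of past failures, which is just one arbitrary example of such a process."""
    failures, t = 0, 0
    while True:
        t += 1
        prob = min(1.0, p * (1 + failures / 10))  # always at least p
        if random.random() < prob:
            return t
        failures += 1

if __name__ == "__main__":
    p, runs = 0.05, 20_000
    avg = sum(time_to_first_success(p) for _ in range(runs)) / runs
    print(f"empirical E[T] ~ {avg:.1f}, upper bound 1/p = {1 / p:.0f}")
```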
Next we consider a sequence of fair coin tosses. Known as the
gambler's fallacy is the belief that a sequence of “heads” makes the occurrence of “tails” more likely. Quite in contrast to this, for any given $k \in \natnum$, there will be an occurrence of $k$ “heads” in a row if the coin is tossed sufficiently often. In the following theorem we derive exactly how long we have to wait in expectation for such an event to happen. In the proof we apply the additive drift theorem not going down towards $0$, but going up to a value of $k$. Since additive drift is symmetric, we can use it in either direction equally well.
Theorem 2.10 [Winning Streaks].
Let $k \in \natnum$ be given. Consider flipping a fair coin indefinitely. Then the expected number of coin flips until the first time that
heads comes up
$k$ times in a row is (exactly) $f(k) = 2^{k+1} - 2$.
Proof.
For all $t \in \natnum$, let $R_t$ be the length of the current streak of heads after $t$ iterations ($R_t = 0$ if in iteration $t$ we got tails, as well as before any coin flip at $t=0$). In the following computation, we will condition on a value of the current state, which is equivalent to conditioning on the history since our process is a
discrete Markov chain (see
Section 8 [Drift as an Average: A Closer Look on the Conditioning of Drift] for details).
Let $X_t = f(R_t)$ be our process for which we aim to show drift. Let $i \in \natnum$ be given. If our current streak of heads is $i$, then in the next iteration one of two things happens: either we lose all progress, falling back to a streak of $0$ heads, or we now have a streak of $i+1$ heads. Each happens with probability $1/2$, so we have
\begin{align*}
\Ew{X_{t+1} - X_t \mid X_t = f(i)}
&= \Ew{X_{t+1} \mid X_t = f(i)} - f(i)\\
&= \frac{1}{2}f(i+1) + \frac{1}{2}f(0) - f(i)\\
&= \left(2^{i+2}/2 - 2/2\right) + 0/2 - (2^{i+1} - 2)\\
&= 1.
\end{align*}
Thus, using
Theorem 2.1 [Additive Drift, Upper Bound] and
Theorem 2.3 [Additive Drift, Lower Bound] together, going up instead of down, we get an expected number of iterations of $f(k) = 2^{k+1}-2$ to reach a streak of $k$ heads.
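A short simulation (a sketch; $k$ and the number of runs are chosen arbitrarily) confirms the formula $f(k) = 2^{k+1} - 2$ empirically.

```python
import random

def time_until_streak(k: int) -> int:
    """Number of fair coin flips until heads has come up k times in a row."""
    streak, flips = 0, 0
    while streak < k:
        flips += 1
        streak = streak + 1 if random.random() < 0.5 else 0  # heads extends, tails resets
    return flips

if __name__ == "__main__":
    k, runs = 10, 5_000
    avg = sum(time_until_streak(k) for _ in range(runs)) / runs
    print(f"empirical E[T] ~ {avg:.0f}, exact value 2^(k+1) - 2 = {2 ** (k + 1) - 2}")
```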
Note that the potential function in the last proof, as in many places where potential functions are used, is not intuitive, so let us discuss where this potential function comes from. We decide we want to set up for additive drift, since the additive drift theorem gives both lower and upper bounds. Since any potential function that gives an additive drift can be normalized to give an additive drift of $1$, we search for a potential function that gives a drift of exactly $1$. From the two possible outcomes of the coin flipping process in each iteration, we get the condition $\frac{1}{2}f(i+1) + \frac{1}{2}f(0) - f(i) = 1$ for the potential $f$. With the (arbitrary) setting of $f(0) = 0$, this is a straightforward and easy-to-solve recurrence relation, giving the desired formula for $f$.
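For completeness, the recurrence can be solved explicitly: with $f(0) = 0$, the condition becomes $f(i+1) = 2f(i) + 2$, that is,
$$
f(i+1) + 2 = 2\bigl(f(i) + 2\bigr),
$$
so $f(i) + 2 = 2^{i}\bigl(f(0) + 2\bigr) = 2^{i+1}$ and thus $f(i) = 2^{i+1} - 2$, matching the potential used in the proof.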
For a more in-depth discussion of potential functions and their use for the application of drift theorem, see
Section 3 [The Art of Potential Functions].
2.3 More Complex Problems
In contrast to the previous applications of drift theorems, the following examples consider processes that are not Markovian. This is no problem for the drift theorems, but the user now has to make sure that all bounds hold regardless of the history, not just with respect to the current value of the process.
Our next example is a randomized algorithm for finding, in expectation, a $2$-approximation of the classical vertex cover problem. For an undirected graph $(V, E)$, a subset $C \subseteq V$ such that, for all $\{u, v\} \in E$, $u$ or $v$ is in $C$ is called a
vertex cover. By
Theorem 2.1 [Additive Drift, Upper Bound], we easily bound the expected size of the vertex cover that the algorithm constructs.
Theorem 2.11 [Vertex Cover Approximation].
Given an undirected graph, iteratively choose an uncovered edge and add one of its endpoints, chosen uniformly at random, to the cover. Then, in expectation, the resulting cover is a $2$-approximation of an optimal vertex cover of the given graph.
Proof.
Let a graph $G$ be given. Furthermore, fix a minimum vertex cover $C$. For all $t$, let $D_t$ be the set of vertices chosen by the algorithm after $t$ iterations. Let $X_t$ be $0$ if $D_t$ is a vertex cover, and otherwise let $X_t$ be the number of vertices of $C$ that are not in $D_t$. Clearly, the algorithm terminates exactly when $X_t = 0$. Furthermore, in each step and regardless of the history, the algorithm selects a vertex from $C$ with probability at least $1/2$, since, for every edge of $G$, at least one of the endpoints is in $C$; as the chosen edge is uncovered, this vertex is not yet in $D_t$. We get $\Ew{X_t - X_{t + 1} \mid X_0,…,X_t} \geq \frac{1}{2}$. Hence, using
Theorem 2.1 [Additive Drift, Upper Bound], we get that the algorithm terminates after choosing at most $2|C|$ vertices in expectation.
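The algorithm itself is only a few lines; the following sketch (the edge representation, the rule for picking an uncovered edge, and the small test instance are our arbitrary choices) implements it and estimates the expected cover size on a star graph, for which an optimal cover has size $1$.

```python
import random

def random_vertex_cover(edges):
    """While some edge is uncovered, pick an uncovered edge and add one of its
    two endpoints, chosen uniformly at random, to the cover."""
    cover = set()
    uncovered = list(edges)
    while uncovered:
        u, v = random.choice(uncovered)  # any rule for picking an uncovered edge works
        cover.add(random.choice((u, v)))
        uncovered = [(a, b) for (a, b) in uncovered if a not in cover and b not in cover]
    return cover

if __name__ == "__main__":
    # A star with center 0: an optimal cover is {0}, so the expected size of the
    # computed cover is at most 2 by Theorem 2.11.
    star = [(0, i) for i in range(1, 11)]
    sizes = [len(random_vertex_cover(star)) for _ in range(10_000)]
    print(f"average cover size ~ {sum(sizes) / len(sizes):.2f} (optimum 1, bound 2)")
```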
The next example considers a simple randomized sorting algorithm. This and similar sorting algorithms were considered by Scharnow, Tinnefeld, and Wegener [
STW04The analysis of evolutionary algorithms on sorting and shortest paths problems. Jens Scharnow, Karsten Tinnefeld, Ingo Wegener. J. Math. Model. Algorithms 2004.] (before the advent of drift theory). The analysis via the multiplicative drift theorem is short, easy and intuitive.
Theorem 2.12 [Random Sorting].
Consider the sorting algorithm which, given an input array $A$ of length $n \in \natnum_{\geq 1}$, iteratively chooses two different positions of the array uniformly at random and swaps them if and only if they are out of order. Then the algorithm obtains a sorted array after $\Theta(n^2 \log n)$ iterations in expectation.
Proof.
For all $i, j \in [n]$ with $i \lt j$, an ordered pair $(i,j)$ is called an
inversion if and only if $A[i] \gt A[j]$. Note that the maximum number of inversions is $\binom{n}{2}$. Let $X_t$ be the number of inversions after $t \in \natnum$ iterations, and let $A_t$ denote the array after that iteration. If the algorithm chooses a pair which is not an inversion, nothing changes. If the algorithm chooses an inversion $(i,j)$, then this inversion is removed; for any other pair, only indices $k \in [i..j]$ are relevant, since, for $k$ outside of $[i..j]$, swapping $A_t[i]$ and $A_t[j]$ merely exchanges the inversion statuses of the pairs that $k$ forms with $i$ and with $j$, leaving their total contribution unchanged.
If $A_t[k] \lt A_t[j]$ ($\lt A_t[i]$), then $(i,k)$ is an inversion before and after the swap, while $(k,j)$ is neither an inversion before nor after the swap; similarly for $A_t[k] \gt A_t[i]$ ($\gt A_t[j]$). Finally, if $A_t[j] \lt A_t[k] \lt A_t[i]$, then $(i,k)$ and $(k,j)$ are inversions before the swap but are not afterwards. Overall, this shows that the number of inversions goes down by at least $1$ whenever the algorithm chooses an inversion for swapping, regardless of the history.
Let $t$ be such that $A_t$ is not sorted. Since the probability of the algorithm choosing an inversion is $X_t/\binom{n}{2}$, we get $\Ew{X_t - X_{t+1} \mid X_0,…,X_t} \geq X_t/\binom{n}{2}$. An application of
Theorem 2.5 [Multiplicative Drift] gives the desired upper bound.
Regarding the lower bound, consider the array $A$ that is almost sorted, except that the first and second elements are swapped, the third and fourth, and so on. Then the algorithm effectively performs a coupon collector process on $n/2$ coupons, where each has a probability of $1/\binom{n}{2}$ to be collected. This takes an expected time of $\Omega(n^2 \log n)$ with a proof analogous to that of
Theorem 5.12 [Coupon Collector, Lower Bound].
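The following sketch (starting from a uniformly random permutation and with arbitrarily chosen parameters) simulates the random swap sorter and compares the empirical number of iterations with the bound $\left(1 + \ln \binom{n}{2}\right)\binom{n}{2}$ obtained in the proof above.

```python
import math
import random

def random_sort_iterations(n: int) -> int:
    """Iterations of the random swap sorter on a random permutation: pick two
    distinct positions uniformly at random and swap them if they are out of
    order; stop as soon as the array is sorted."""
    a = list(range(n))
    random.shuffle(a)
    iterations = 0
    while any(a[i] > a[i + 1] for i in range(n - 1)):
        iterations += 1
        i, j = sorted(random.sample(range(n), 2))
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return iterations

if __name__ == "__main__":
    n, runs = 30, 200
    pairs = n * (n - 1) / 2
    avg = sum(random_sort_iterations(n) for _ in range(runs)) / runs
    print(f"empirical E[T] ~ {avg:.0f}, multiplicative drift bound ~ {(1 + math.log(pairs)) * pairs:.0f}")
```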
2.4 Classic Results for Evolutionary Algorithms
The basic evolutionary algorithm (EA) we want to analyze is the $(1+1)$ EA; it proceeds as follows (see also
Section 9.1 [Algorithms]).
The $(1+1)$ EA maximizing a function $f\colon \{0,1\}^n \rightarrow \realnum$. Mutation flips each bit independently with probability $1/n$.
The algorithm is set up to maximize the given function $f$; by turning around the inequality in line 4, we get the analogous algorithm for minimization.
The two cardinal test functions that are used to analyze the performance of evolutionary algorithms are ${\mathrm O{\scriptsize NE}M{\scriptsize AX}}$ and ${\mathrm L{\scriptsize EADING}O{\scriptsize NES}}$:
- ${\mathrm O{\scriptsize NE}M{\scriptsize AX}}$ is a function $\{0,1\}^n \rightarrow \realnum$ mapping any bit string to the number of $1$s in the bit string.
- ${\mathrm L{\scriptsize EADING}O{\scriptsize NES}}$ is a function $\{0,1\}^n \rightarrow \realnum$ mapping any bit string to the number of $1$s before the first $0$ (if any) in the bit string (the number of leading $1$s).
See also
Definition 9.1 [Test Functions].
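Since the pseudocode listing itself is not reproduced here, the following is a minimal sketch of the $(1+1)$ EA (maximization variant) together with the two test functions, in Python; all names, the iteration budget, and the stopping criterion are our choices for illustration.

```python
import random

def one_max(x):
    """OneMax: the number of 1s in the bit string."""
    return sum(x)

def leading_ones(x):
    """LeadingOnes: the number of 1s before the first 0."""
    count = 0
    for bit in x:
        if bit == 0:
            break
        count += 1
    return count

def one_plus_one_ea(f, n, optimum, max_iters=10**7):
    """(1+1) EA maximizing f over {0,1}^n: standard bit mutation (each bit is
    flipped independently with probability 1/n); the offspring replaces the
    parent if it is at least as good. Returns the iteration in which a search
    point of fitness `optimum` is first reached."""
    x = [random.randint(0, 1) for _ in range(n)]
    for t in range(1, max_iters + 1):
        y = [1 - bit if random.random() < 1 / n else bit for bit in x]
        if f(y) >= f(x):  # turn this inequality around to minimize instead
            x = y
        if f(x) == optimum:
            return t
    return max_iters

if __name__ == "__main__":
    n = 100
    print("OneMax:     ", one_plus_one_ea(one_max, n, n), "iterations")
    print("LeadingOnes:", one_plus_one_ea(leading_ones, n, n), "iterations")
```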
Theorem 2.13 [$(1+1)$ EA on OneMax].
Consider the $(1+1)$ EA maximizing the fitness function
OneMax. Then the expected time for the algorithm to find the global optimum $1^n$ is $\bigO{n \log n}$ iterations.
Proof.
For all $t$, let $X_t$ be the Hamming distance of the current individual to the optimum after $t$ iterations. We want to use the multiplicative drift theorem and estimate the drift as follows. If the current search point has a Hamming distance of $s$ to the optimum, then, for each of the $s$ positions in which it differs from the optimum, the event of flipping exactly this position and no other when producing the offspring results in an accepted offspring with a distance to the optimum smaller by $1$. These events are disjoint (since exactly one bit flips) and each has a probability of $1/n \cdot (1-1/n)^{n-1} \geq 1/(en)$. Since the $(1+1)$ EA does not accept worsenings, no other event can contribute negatively to the drift, so we can pessimistically assume a contribution of $0$ to the drift in all other cases. Thus, we get
\begin{align*}
\Ew{X_{t} - X_{t+1} \mid X_0,…,X_t} \geq X_t / (en).
\end{align*}
An application of
Theorem 2.5 [Multiplicative Drift] with $\delta = 1/(en)$ gives the desired upper bound of $en(1 + \ln n) = \bigO{n \log n}$ since $X_0 \leq n$.
Theorem 2.14 [$(1+1)$ EA on LeadingOnes].
Consider the $(1+1)$ EA maximizing the fitness function
LeadingOnes. Then the expected time for the algorithm to find the optimum is $\bigO{n^2}$ iterations.
Proof.
For all $t$, let $X_t$ be the number of leading ones of the current individual after $t$ iterations (i.e. its fitness). We want to use the additive drift theorem and estimate the drift as follows. Improving the fitness of the current individual requires flipping its first $0$ and none of the previous positions. There are at most $n-1$ previous positions, so the probability of this event is at least $1/n \cdot (1-1/n)^{n-1} \geq 1/(en)$. Any such improvement increases the fitness by at least $1$. Since the $(1+1)$ EA does not accept worsenings, no event can contribute negatively to the drift, so we can pessimistically assume a contribution of $0$. Thus, we get
\begin{align*}
\Ew{X_{t+1} - X_{t} \mid X_0,…,X_t} \geq 1 / (en).
\end{align*}
An application of
Theorem 2.1 [Additive Drift, Upper Bound], applied to the process $n - X_t$, gives the desired upper bound of $en^2 = \bigO{n^2}$ since $X_0 \geq 0$ and the target is at $n$.
Note that much more precise bounds are known for the optimization time of the $(1+1)$ EA on
LeadingOnes, see
Theorem 6.7 [Run Time of $(1+1)$ EA on LeadingOnes].