5 The Zoo: A Tour of Drift Theorems
We have seen the two basic drift theorems, the additive drift theorem and the multiplicative drift theorem, in
Section 2 [A Gentle Introduction to Classic Drift Theorems]. In this section we provide a list of more advanced drift theorems with applications.
- In Section 5.1 [Additive Drift] we start by extending the additive drift theorem; we see how to avoid the requirement of non-negativity (allowing overshooting of the target) and explore different conditions for the drift.
- Section 5.2 [Additive Drift: Concentration] provides a different view on additive drift by considering concentration.
- In Section 5.3 [Multiplicative Drift] we give lower bounds for the case of multiplicative drift.
- While the additive drift theorem requires constant drift and the multiplicative drift theorem requires drift proportional to the current state, in Section 5.4 [Variable Drift] we give theorems allowing for an arbitrary monotone dependence of the drift on the current state.
- Somewhat different in flavor is Section 5.5 [Negative Drift]. Here we discuss drift theorems providing exponential lower bounds given drift away from the target.
- In Section 5.6 [Finite State Spaces] we consider the special case of random processes on finite search spaces.
- Some settings allow drift only when far away from the target, but in the proximity of the target the drift is negative. In this case, the theorem of Section 5.7 [Headwind Drift] can offer an upper bound on the run time nonetheless.
- In order to derive good upper bounds even when the drift gets stronger when getting closer to the optimum, Section 5.8 [Multiplicative Up-Drift] provides a drift theorem for the case of proportionally increasing drift.
- A completely different approach to understanding drift is given by Wormald and briefly discussed in Section 5.9 [Wormald's Method].
Note that there are a few novel approaches to analyzing multi-dimensional potential functions [
Row18Linear multi-objective drift analysis. Jonathan E. Rowe. Theor. Comput. Sci. 2018.,
JL22Two-Dimensional Drift Analysis: - Optimizing Two Functions Simultaneously Can Be Hard. Duri Janett, Johannes Lengler. PPSN (2) 2022.]; while the initial works are promising, they have not gained traction yet and we will not discuss them here.
5.1 Additive Drift
We want to start with an illustrative proof for a strong version of the additive drift theorem; the proof is adapted from the proof of Theorem 2.3.1 in [
Len20Drift Analysis. Johannes Lengler. In: Theory of Evolutionary Computation 2020.].
Theorem 5.1 [Additive Drift, Upper Bound, Time Condition].
Let $(X_t)_{t \in \natnum}$ be a stochastic process on $\realnum$ with deterministic $X_0$, and let $T = \inf\set{t \in \natnum}{X_t \leq 0}$. Suppose that there is a $\delta \gt 0$ so that we have the following conditions (drift, non-negativity).
- (D)
- For all $t$ with $\Pr{t \lt T} \gt 0$, $\Ew{X_t - X_{t+1} \mid t \lt T} \geq \delta$.
- (NN)
- For all $t \leq T$, $X_t \geq 0$.
We have
$$
\Ew{T} \leq X_0/\delta.
$$
Proof.
We first replace the process $(X_t)_{t \in \natnum}$ with a process $(X'_t)_{t \in \natnum}$ such that, for all $t \leq T$, we have $X'_t = X_t$, and for all $t \gt T$ we have $X'_t = X'_{t-1}$. Both processes first reach a value $\leq 0$ at the same time $T$, but $(X'_t)_{t \in \natnum}$ does not change after that.
Condition (NN) gives $X_T = 0$ and, thus, for the modified process, $X'_t = 0$ for all $t \geq T$. We will from now on assume that $(X_t)_{t \in \natnum}$ is exactly this modified process. Together with
(NN) we thus have
\begin{equation}
\forall t \in \natnum: X_t \geq 0.\tag{NN'}
\end{equation}
Furthermore, for all $t \in \natnum$ with $\Pr{t \geq T} \gt 0$, we have $\Ew{X_t - X_{t+1} \mid t \geq T} = 0$.
We now have, for all $t \in \natnum$ with $\Pr{t \lt T} \gt 0$,
\begin{align*}
\Ew{X_t - X_{t+1}} & = \Ew{X_t - X_{t+1} \mid t \lt T}\Pr{t \lt T} + \Ew{X_t - X_{t+1} \mid t \geq T}\Pr{t \geq T} \\
& = \Pr{t \lt T} \Ew{X_t - X_{t+1} \mid t \lt T}\\
& \eqnComment{(D)}{\geq} \Pr{t \lt T} \delta \\
& = \delta\Pr{T \gt t}.
\end{align*}
The first equality is the
law of total expectation; the second follows from $X_{t} = X_{t+1}$ for $t \geq T$. Note that the overall inequality holds trivially for all $t \in \natnum$ such that $\Pr{t \lt T} = 0$, so it holds for all $t$. Explicitly, for all $t \in \natnum$ we have
\begin{equation}
\Pr{T \gt t} \leq \frac{1}{\delta} \; \Ew{X_t - X_{t+1}}.\tag{$\ast$}
\end{equation}
Since $T$ takes only values in $\natnum \cup\{\infty\}$, we have
$$
\Ew{T} = \sum_{i=0}^\infty\Pr{T \gt i}.
$$
We use this to bound $\Ew{T}$. For all $n \in \natnum$ we have
$$
\sum_{t=0}^n \Pr{T \gt t} \eqnComment{($\ast$)}{\leq} \frac{1}{\delta} \sum_{t=0}^n \left( \Ew{X_t} - \Ew{X_{t+1}}\right) = \frac{1}{\delta} \left(X_0 - \Ew{X_{n+1}}\right) \eqnComment{(NN')}{\leq} \frac{X_0}{\delta}.
$$
Since all partial sums are upper bounded by $X_0/\delta$, so is the infinite sum.
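To see the theorem in action, here is a minimal simulation sketch (Python; the walk and all parameters are hypothetical choices for illustration). A random walk that moves down by $1$ with probability $0.6$ and up by $1$ otherwise has drift exactly $\delta = 0.2$ towards $0$, so the empirical mean of $T$ should come out close to the bound $X_0/\delta$ (and, by the corresponding lower bounds for bounded step size, essentially equal to it).
```python
import random

def hitting_time(x0: int, p_down: float = 0.6) -> int:
    """Simulate a walk moving -1 w.p. p_down and +1 otherwise; return the
    first point in time at which it reaches 0 (drift delta = 2*p_down - 1)."""
    x, t = x0, 0
    while x > 0:
        x += -1 if random.random() < p_down else 1
        t += 1
    return t

x0, p_down = 20, 0.6
delta = 2 * p_down - 1  # expected decrease per step: 0.2
runs = 10_000
avg = sum(hitting_time(x0, p_down) for _ in range(runs)) / runs
print(f"empirical E[T] ~ {avg:.1f}; additive drift bound X0/delta = {x0 / delta:.1f}")
```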
Note that we can turn the proof around to get the analogous version for a lower bound. Again the proof is essentially taken from the proof of Theorem 1 in [
Len20Drift Analysis. Johannes Lengler. In: Theory of Evolutionary Computation 2020.]. Note that it uses the somewhat strong assumption of a bounded search space, whereas
Theorem 2.3 [Additive Drift, Lower Bound] only requires a bound on the size of each step.
Theorem 5.2 [Additive Drift, Lower Bound, Time Condition].
Let $(X_t)_{t \in \natnum}$ be a stochastic process on $\realnum$ with deterministic $X_0$, and let $T = \inf\set{t \in \natnum}{X_t \leq 0}$. Suppose that there is a $\delta \in \realnum_{+}$ so that we have the following conditions (drift, upper bounded search space).
- (D)
- For all $t$ with $\Pr{t \lt T} \gt 0$, $\Ew{X_t - X_{t+1} \mid t \lt T} \leq \delta$.
- (UB)
- There is a $c \gt 0$ such that, for all $t \lt T$, $X_t \leq c$.
We have
$$
\Ew{T} \geq X_0/\delta.
$$
Proof.
We first replace the process $(X_t)_{t \in \natnum}$ with a process $(X'_t)_{t \in \natnum}$ such that, for all $t \leq T$, we have $X'_t = X_t$, and for all $t \gt T$ we have $X'_t = X'_{t-1}$. Both processes first reach a value $\leq 0$ at the same time $T$, but $(X'_t)_{t \in \natnum}$ does not change after that. We will from now on assume that $(X_t)_{t \in \natnum}$ is exactly this modified process. Thus, for all $t \in \natnum$ with $\Pr{t \geq T} \gt 0$, we have $\Ew{X_t - X_{t+1} \mid t \geq T} = 0$.
We now have, for all $t \in \natnum$ with $\Pr{t \lt T} \gt 0$,
\begin{align*}
\Ew{X_t - X_{t+1}} & = \Ew{X_t - X_{t+1} \mid t \lt T}\Pr{t \lt T} + \Ew{X_t - X_{t+1} \mid t \geq T}\Pr{t \geq T} \\
& = \Pr{t \lt T} \Ew{X_t - X_{t+1} \mid t \lt T}\\
& \eqnComment{(D)}{\leq} \Pr{t \lt T} \delta \\
& = \delta\Pr{T \gt t}.
\end{align*}
The first equality is the
law of total expectation; the second follows from $X_{t} = X_{t+1}$ for $t \geq T$. Note that the overall inequality holds trivially for all $t \in \natnum$ such that $\Pr{t \lt T} = 0$, so it holds for all $t$. Explicitly, for all $t \in \natnum$ we have
\begin{equation}
\Pr{T \gt t} \geq \frac{1}{\delta} \; \Ew{X_t - X_{t+1}}.\tag{$\ast$}
\end{equation}
Since $T$ takes only values in $\natnum \cup\{\infty\}$, we have
$$
\Ew{T} = \sum_{i=0}^\infty\Pr{T \gt i}.
$$
We use this to bound $\Ew{T}$. For all $n \in \natnum$, we have
$$
\sum_{t=0}^n \Pr{T \gt t} \eqnComment{($\ast$)}{\geq} \frac{1}{\delta} \sum_{t=0}^n \left( \Ew{X_t} - \Ew{X_{t+1}}\right) = \frac{1}{\delta} \left(X_0 - \Ew{X_{n+1}}\right).
$$
It remains to be shown that $\Ew{X_{n+1}}$ converges to a value $\leq 0$ for $n$ going to infinity. Using the $c$ from
(UB), we have for all $n \in \natnum$ with $\Pr{n \lt T} \gt 0$ that
\begin{align*}
\Ew{X_{n}} & = \Ew{X_{n}\mid n \lt T}\Pr{n \lt T} + \Ew{X_{n}\mid n \geq T}\Pr{n \geq T} \leq c \cdot \Pr{n \lt T} + 0 \cdot \Pr{n \geq T}\\
& = c \Pr{n \lt T}.
\end{align*}
We distinguish two cases. If $\Pr{n \lt T}$ converges to $0$ for $n$ going to $\infty$, then $\Ew{X_{n}}$ converges to $0$ as desired. Otherwise, there is a non-zero probability of $T=\infty$; then $\Ew{T} = \infty$ and the theorem holds trivially.
Both in
Theorem 2.1 [Additive Drift, Upper Bound], the classic version of the additive drift theorem, and in the version just above, it is required that the target of $0$ is hit exactly and not overshot
(NN). From [
Kre19Theoretical analyses of univariate estimation-of-distribution algorithms. Martin S. Krejca. Universität Potsdam 2019.] we have a stronger version that allows for overshooting. This is frequently helpful, for example when searching for approximations, where much better values than required can potentially be achieved.
Theorem 5.3 [Additive Drift, Upper Bound with Overshooting].
Let $(X_t)_{t \in \natnum}$ be an
integrable process over $\realnum$, and let $T = \inf\set{t \in \natnum }{ X_t \leq 0}$.
Furthermore, suppose the following (drift).
- (D)
- There is a $\delta > 0$ such that, for all $t \lt T$, it holds that $\Ew{X_t - X_{t+1} \mid X_0,…,X_t} \geq \delta$.
Then
$$
\Ew{T} \leq \frac{\Ew{X_0} - \Ew{X_T}}{\delta}.
$$
Proof.
Consider the process $(X'_t)_{t \in \natnum}$ such that, for all $t \in \natnum$, $X'_t = X_t - X_T$. For all $t \lt T$ we have $X_t \gt 0 \geq X_T$ and thus $X'_t \geq 0$; furthermore, $X'_T = 0$, and the differences of $(X'_t)_{t \in \natnum}$ are exactly those of $(X_t)_{t \in \natnum}$, so (D) carries over. Applying (a suitably generalized version of) Theorem 5.1 [Additive Drift, Upper Bound, Time Condition] to this process yields $\Ew{T} \leq \Ew{X'_0}/\delta = (\Ew{X_0} - \Ew{X_T})/\delta$.
In a sense, this drift theorem is simpler than
Theorem 2.1 [Additive Drift, Upper Bound]: the requirement (NN) is dropped, and the expected time increases corresponding to the expected additional distance the process travels (note that $\Ew{X_T} \leq 0$, since $T$ is the first point in time $t$ with $X_t \leq 0$). From the condition (NN) we could derive $X_T = 0$ and thus immediately recover
Theorem 2.1 [Additive Drift, Upper Bound].
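As a small illustration of the overshooting version (Python; the process is a hypothetical example), consider a process that decreases by an amount chosen uniformly from $\{1,2,3\}$ in each step, giving $\delta = 2$; it may end anywhere in $\{-2,-1,0\}$, and the bound of Theorem 5.3 is tight for it.
```python
import random

def run(x0: int):
    """Process decreasing by uniform({1,2,3}) per step, so drift delta = 2;
    it may overshoot 0, ending at X_T in {-2, -1, 0}. Returns (T, X_T)."""
    x, t = x0, 0
    while x > 0:
        x -= random.randint(1, 3)
        t += 1
    return t, x

x0, runs = 50, 10_000
samples = [run(x0) for _ in range(runs)]
avg_t = sum(t for t, _ in samples) / runs
avg_xt = sum(x for _, x in samples) / runs
print(f"empirical E[T] ~ {avg_t:.2f}; "
      f"bound (E[X0] - E[X_T])/delta ~ {(x0 - avg_xt) / 2:.2f}")
```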
5.2 Additive Drift: Concentration
One of the reasons why the additive drift theorem is so general (we only have a requirement on the expected change, the first moment, but not on the higher moments) is that we only get a conclusion about the expectation of the first hitting time of the target. With requirements on the higher moments we can derive concentration bounds on the first hitting time. This is provided by [
Köt16Concentration of First Hitting Times Under Additive Drift. Timo Kötzing. Algorithmica 2016.], from which we give two different variants, one using an absolute bound on the step size
(B), Theorem 2 in the cited work, and one requiring concentrated step size
(C), combining Theorems 10 and 15 from the cited work. Each time we get that there is only a very small probability of arriving significantly later than the expected time of $n/\delta$. An example application of such a concentration result for additive drift is in [
KLW15(1+1) EA on Generalized Dynamic OneMax. Timo Kötzing, Andrei Lissovoi, Carsten Witt. FOGA 2015.] regarding an analysis of the $(1+1)$ EA on a dynamic version of ${\mathrm O{\scriptsize NE}M{\scriptsize AX}}$ (see Theorem 10 in the cited work). Another application is given in Theorem 5 of [
FKL+17Analysis of the (1+1) EA on Subclasses of Linear Functions under Uniform and Linear Constraints. Tobias Friedrich, Timo Kötzing, Gregor Lagodzinski, Frank Neumann, Martin Schirneck. FOGA 2017.,
FKL+20Analysis of the (1 + 1) EA on subclasses of linear functions under uniform and linear constraints. Tobias Friedrich, Timo Kötzing, J. A. Gregor Lagodzinski, Frank Neumann, Martin Schirneck. Theor. Comput. Sci. 2020.].
Theorem 5.4 [Additive Drift, Upper Concentration, Bounded Step Size].
Let $(X_t)_{t \in \natnum}$ be an
integrable process over $\realnum$, and let $T = \inf\set{t \in \natnum }{ X_t \leq 0}$.
Furthermore, suppose that there is $c \gt 0$ such that we have the following conditions (drift, bounded steps).
- (D)
- There is a $\delta > 0$ such that, for all $t \lt T$, it holds that $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} \geq \delta$.
- (B)
- For all $t \in \natnum$, $|X_{t+1}-X_t| \leq c$.
Let $n \in \natnum$ such that $X_0 \leq n$. Then, for all $s \geq 2n / \delta$,
$$
\Pr{T \geq s} \leq \exp\left( - \frac{s\delta^2}{8c^2} \right).
$$
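The following sketch (Python) checks the tail bound empirically for the hypothetical $\pm 1$ walk from the sketch in Section 5.1, which has $\delta = 0.2$ and step size bound $c = 1$; as is typical, the empirical tail is much smaller than the (valid) bound.
```python
import math
import random

x0, p_down, c = 20, 0.6, 1.0  # hypothetical walk: -1 w.p. 0.6, +1 otherwise
delta = 2 * p_down - 1

def hitting_time(x: int) -> int:
    t = 0
    while x > 0:
        x += -1 if random.random() < p_down else 1
        t += 1
    return t

s = math.ceil(2 * x0 / delta)  # smallest s covered by Theorem 5.4
runs = 10_000
tail = sum(hitting_time(x0) >= s for _ in range(runs)) / runs
bound = math.exp(-s * delta**2 / (8 * c**2))
print(f"empirical Pr[T >= {s}] ~ {tail:.4f}; Theorem 5.4 bound: {bound:.4f}")
```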
Theorem 5.5 [Additive Drift, Upper Concentration, Concentrated Step Size].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$, and let $T = \inf\set{t \in \natnum }{ X_t \leq 0 }$.
Furthermore, suppose that there are $\varepsilon \gt 0$ and $c \gt 0$ such that we have the following conditions (drift, concentration).
- (D)
- There is a $\delta > 0$ such that, for all $t \lt T$, it holds that $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} \geq \delta$.
- (C)
- For all $t \in \natnum$ and all $x \geq 0$, $\Pr{|X_{t+1}-X_t| \geq x \mid X_t} \leq \frac{c}{(1+\varepsilon)^x}$.
Let $n \in \natnum$ such that $X_0 \leq n$. Then, for all $s \geq 2n / \delta$,
$$
\Pr{T \geq s} \leq \exp\left( - \frac{s\delta}{4} \min\left(\frac{\varepsilon}{4}, \frac{\delta\varepsilon^3}{256c} \right) \right).
$$
Also in [
Köt16Concentration of First Hitting Times Under Additive Drift. Timo Kötzing. Algorithmica 2016.] are analogous
lower bounds. Again we give two different variants, one using an absolute bound on the step size
(B), Theorem 1 in the cited work, and one requiring concentrated step size
(C), combining Theorems 10 and 14 from the cited work. Each time we get that there is only a very small probability of arriving significantly before the expected time of $n/\delta$.
Theorem 5.6 [Additive Drift, Lower Concentration, Bounded Step Size].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$, and let $T = \inf\set{t \in \natnum}{X_t \leq 0}$.
Furthermore, suppose that there is $c \gt 0$ such that we have the following conditions (drift, bounded steps).
- (D)
- There is a $\delta > 0$ such that, for all $t \lt T$, it holds that $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} \leq \delta$.
- (B)
- For all $t \in \natnum$, $|X_{t+1}-X_t| \leq c$.
Let $n \in \natnum$ such that $X_0 \geq n$. Then, for all $s \leq n / (2\delta)$,
$$
\Pr{T \lt s} \leq \exp\left( - \frac{n^2}{8c^2s} \right).
$$
Theorem 5.7 [Additive Drift, Lower Concentration, Concentrated Step Size].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$, and let $T = \inf\set{t \in \natnum}{X_t \leq 0}$.
Furthermore, suppose that there are $\varepsilon \gt 0$ and $c \gt 0$ such that (drift, concentration)
- (D)
- there is a $\delta > 0$ such that, for all $t \lt T$, it holds that $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} \leq \delta$;
- (C)
- for all $t \in \natnum$ and all $x \geq 0$, $\Pr{|X_{t+1}-X_t| \geq x \mid X_t} \leq \frac{c}{(1+\varepsilon)^x}$.
Let $n \in \natnum$ such that $X_0 \geq n$. Then, for all $s \leq n / (2\delta)$,
$$
\Pr{T \lt s} \leq \exp\left( - \frac{n}{4} \min\left(\frac{\varepsilon}{4}, \frac{n\varepsilon^3}{256cs} \right) \right).
$$
The overall situation depending on the strength of the drift is depicted in detail in [
Köt16Concentration of First Hitting Times Under Additive Drift. Timo Kötzing. Algorithmica 2016.]. In particular, there are three main regimes:
- If the drift is $\delta \geq 1/n$, then we get high concentration of the first hitting time.
- If the drift is $\delta \in [-1/n, 1/n]$ but the variance is significant, then we get to hit the optimum with constant chance within $\bigO{n^2}$ steps, see Theorem 4.2 [Unbiased Random Walk on the Line, One Barrier].
- If the drift is much smaller than $-1/n$, then we have negative drift and only a superpolynomially small chance to reach the optimum in polynomial time, see Theorem 5.15 [Negative Drift, Bounded Step Size].
The literature also contains the following theorem, which complements additive drift with a guarantee relying only on the variance, given by Semenov and Terkel [
ST03Analysis of convergence of an evolutionary algorithm with self-adaptation using a stochastic Lyapunov function. Mikhail A. Semenov, Dmitri A. Terkel. Evol. Comput. 2003.].
Theorem 5.8 [Additive Drift, Upper Concentration, Bounded Variance].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$ with $X_0=0$.
Furthermore, suppose the following (drift, variance).
- (D)
- There is a $\delta > 0$ such that, for all $t \in \natnum$, it holds that $\Ew{X_{t+1} - X_t \mid X_0,…, X_t} \geq \delta$.
- (Var)
- There is a $c \gt 0$ such that, for all $t \in \natnum$, $\Var{X_{t+1} \mid X_0,…, X_t} \leq c$.
Then, for all $\varepsilon \gt 0$, the following holds with probability $1$.
$$
X_t \geq t \delta - o\left(t^{0.5+\varepsilon}\right).
$$
5.3 Multiplicative Drift
The plain multiplicative drift theorem (see
Theorem 2.5 [Multiplicative Drift]) is already very strong, in that it requires few conditions on the search space and even gives a concentration (in one direction). What it does not provide is a lower bound. One possible such bound can be found in [
Wit13Tight Bounds on the Optimization Time of a Randomized Search Heuristic on Linear Functions. Carsten Witt. Comb. Probab. Comput. 2013.] which we state here.
Theorem 5.9 [Multiplicative Drift, Lower Bound, Monotone].
Let $(X_t)_{t \in \natnum}$ be a
discrete,
integrable process over $\{0, 1\} \cup S$, where $S \subset \realnum_{> 1}$ is finite, and let $T = \inf\set{t \in \natnum }{ X_t \leq 1}$.
We assume that there are $\beta,\delta \in (0,1)$ such that the following conditions (drift, monotonicity, concentration) hold for all $s \gt 1$ and $t \in \natnum$ with $\Pr{X_t = s} \gt 0$.
- (D)
- $\Ew{X_t - X_{t+1} \mid X_t = s} \leq \delta s$.
- (M)
- $X_{t+1} \leq X_t$.
- (C)
- $\Pr{X_t - X_{t+1} \geq \beta s \mid X_t = s} \leq \beta\delta /\ln(s)$.
Then
$$\Ew{T \mid X_0} \geq \frac{\ln (X_0)}{\delta} \cdot \frac{1-\beta}{1+\beta} \geq \frac{\ln (X_0)}{\delta} \cdot (1-2\beta).$$
From [
DDK18Static and Self-Adjusting Mutation Strengths for Multi-valued Decision Variables. Benjamin Doerr, Carola Doerr, Timo Kötzing. Algorithmica 2018.] we have a variant which allows for non-monotone drift. It substitutes the monotonicity with the requirement that the process cannot expect more progress by first returning to bigger values. Turned around, the progress from any state $s$, measured relative to a reference value $s' \lt s$, cannot be bigger than the progress allowed from state $s'$ itself. We use the notation $(x)_+ := \max(0,x)$.
Theorem 5.10 [Multiplicative Drift, Lower Bound].
Let $(X_t)_{t \in \natnum}$ be a
discrete random process over $\{0, 1\} \cup S$, where $S \subset \realnum_{> 1}$ is finite, and let $T = \inf\set{t \in \natnum}{X_t \leq 1}$.
We assume that there are $\beta,\delta \in (0,1)$ such that the following conditions (drift, concentration) hold for all $s \gt 1$ and $t \in \natnum$ with $\Pr{X_t = s} \gt 0$.
- (D)
- For all $s'$ with $1 \lt s' \leq s$: $\Ew{(s' - X_{t+1})_+ \mid X_0,…, X_t, X_t = s} \leq \delta s'$.
- (C)
- For all $s'$ with $1 \lt s' \leq s$: $\Pr{s' - X_{t+1} \geq \beta s' \mid X_0,…, X_t, X_t = s} \leq \beta\delta /\ln(s')$.
Then
$$\Ew{T \mid X_0} \geq \frac{\ln (X_0)}{\delta} \cdot \frac{1-\beta}{1+\beta} \geq \frac{\ln (X_0)}{\delta} \cdot (1-2\beta).$$
As an alternative, we can find a lower bound when the step size is bounded. The following theorem is given in [
DKLL20The impact of lexicographic parsimony pressure for ORDER/MAJORITY on the run time. Benjamin Doerr, Timo Kötzing, J. A. Gregor Lagodzinski, Johannes Lengler. Theor. Comput. Sci. 2020.]. A further version can be found in [
KK19First-hitting times under drift. Timo Kötzing, Martin S. Krejca. Theor. Comput. Sci. 2019.].
Theorem 5.11 [Multiplicative Drift, Lower Bound, Bounded Step Size].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum_+$, let $x_{\mathrm{min}} \gt 0$, and let $T = \inf\set{t \in \natnum}{X_t \leq x_{\mathrm{min}}}$.
We assume that there are $c,\delta \in \realnum_+$ with $x_{\mathrm{min}} \geq \sqrt{2}c$ such that the following conditions (drift, bounded step size) hold for all $t \lt T$.
- (D)
- $\Ew{X_t - X_{t+1} \mid X_0,…,X_t} \leq \delta X_t$.
- (B)
- $|X_t-X_{t+1}| \leq c$.
Then
$$\Ew{T \mid X_0} \geq \frac{1+\ln (X_0)-\ln(x_{\mathrm{min}})}{2\delta + \frac{c^2}{x_{\mathrm{min}}^2-c^2}}.$$
Note that, for typical applications, $\delta$ is small; yet the term $2\delta$ should dominate the term $\frac{c^2}{x_{\mathrm{min}}^2-c^2}$ to give a tight bound. But this is typically not a problem: consider the setting of $\delta = \Theta(1/n)$ and $X_0 = \Theta(n)$. We can let $x_{\mathrm{min}} = \Theta(\sqrt{n})$ and suppose we can bound $c = o(\sqrt{n})$ with sufficiently high probability (which would be typical). Then the theorem lets us derive the asymptotically optimal bound of $\Omega(n \log n)$.
As an example application we provide a lower bound for the coupon collector process (see the upper bound proven in
Theorem 2.7 [Coupon Collector with Multiplicative Drift]).
Theorem 5.12 [Coupon Collector, Lower Bound].
Suppose we want to collect at least one of each color of $n \in \natnum_{\geq 1}$ coupons. Each round, we are given one coupon with a color chosen uniformly at random from the $n$ colors. Then, in expectation, we have to collect for at least $\Omega(n \ln n)$ iterations.
Proof.
Let $X_t$ be the number of coupons missing after $t$ iterations. We want to apply
Theorem 5.11 [Multiplicative Drift, Lower Bound, Bounded Step Size] and note that, since in each iteration at most one coupon is gained and none is lost, we can use $c=1$ to satisfy
(B). Furthermore, regarding
(D), the probability of making progress (of $1$) with coupon $t+1$ is $X_t/n$. Thus, $\Ew{X_t - X_{t+1} \mid X_0,…, X_t} = X_t/n$ and we can use $\delta=1/n$. We set $x_{\mathrm{min}}=\sqrt{n}$ and an application of
Theorem 5.11 [Multiplicative Drift, Lower Bound, Bounded Step Size] gives a lower bound of
$$
\frac{1+\ln (n)-\ln(\sqrt{n})}{\frac{2}{n} + \frac{1}{n-1}} \geq \frac{1+\ln(n)/2}{\frac{3}{n-1}} \geq \frac{1}{6} \cdot (n-1)\ln(n) = \Omega(n \ln n).
$$
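This lower bound is easy to check empirically. The following sketch (Python; $n$ and the number of runs are arbitrary choices) compares the empirical mean with the $\frac{1}{6}(n-1)\ln n$ bound just derived and with the exact value $n H_n \approx n \ln n$ known for the coupon collector.
```python
import math
import random

def coupon_rounds(n: int) -> int:
    """Draw uniformly random colors until all n colors were seen at least once."""
    seen, t = set(), 0
    while len(seen) < n:
        seen.add(random.randrange(n))
        t += 1
    return t

n, runs = 100, 1_000
avg = sum(coupon_rounds(n) for _ in range(runs)) / runs
print(f"empirical E[T] ~ {avg:.0f}")
print(f"lower bound (n-1)ln(n)/6 ~ {(n - 1) * math.log(n) / 6:.0f}")
print(f"exact value n*H_n = {n * sum(1 / k for k in range(1, n + 1)):.0f}")
```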
5.4 Variable Drift
A more general version of
Theorem 2.1 [Additive Drift, Upper Bound] and
Theorem 2.5 [Multiplicative Drift] is the variable drift theorem, allowing for any
monotone dependency of the drift on the current state (meaning that a larger distance to the target has to imply a larger drift). It is due to [
MRC09Theoretical analysis of local search strategies to optimize network communication subject to preserving the total number of links. Boris Mitavskiy, Jonathan E. Rowe, Chris Cannings. Int. J. Intell. Comput. Cybern. 2009.,
Joh10Random Combinatorial Structures and Randomized Search Heuristics. Daniel Johannsen. Universität des Saarlandes 2010.] and was improved in [
RS14The choice of the offspring population size in the (1, λ) evolutionary algorithm. Jonathan E. Rowe, Dirk Sudholt. Theor. Comput. Sci. 2014.]. We give here the version from [
KK19First-hitting times under drift. Timo Kötzing, Martin S. Krejca. Theor. Comput. Sci. 2019.], where the random process is not assumed to be
discrete or Markovian.
Theorem 5.13 [Variable Drift].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$, $x_{\mathrm{min}} \in \realnum_{+}$, and let $T = \inf\set{t \in \natnum }{ X_t \lt x_{\mathrm{min}}}$. Additionally, let $I$ denote the smallest real interval that contains all values $x \geq x_{\mathrm{min}}$ that any $X_t$ with $t \leq T$ can take. Furthermore, suppose that there is a function $h\colon I \rightarrow \realnum_+$ such that the following conditions (drift, monotonicity, start, non-negativity) hold for all $t \leq T$.
- (D)
- $\Ew{X_t - X_{t+1} \mid X_0,…,X_t} \geq h(X_t)$.
- (M)
- The function $h$ is monotonically non-decreasing.
- (S)
- $X_0 \geq x_{\mathrm{min}}$.
- (NN)
- $X_t \geq 0$.
Then
$$
\Ew{T \mid X_0} \leq \frac{1}{h(x_{\mathrm{min}})} + \int_{x_{\mathrm{min}}}^{X_0} \frac{1}{h(z)}\,\mathrm{d}z.
$$
Note that the additive drift theorem is the special case of constant $h$ and the multiplicative drift theorem is the special case of linear $h$. It is perhaps surprising that these two special cases suffice for many applications; even there, however, the variable drift theorem can sometimes give tighter bounds.
Concentration bounds for variable drift are also available [
LW14Concentrated Hitting Times of Randomized Search Heuristics with Variable Drift. Per Kristian Lehre, Carsten Witt. ISAAC 2014.]. The same paper also gives a lower bounding variable drift theorem, which requires $h$ to be monotonically non-increasing, the opposite of the condition for the upper bound. Further variants can be found in [
KK19First-hitting times under drift. Timo Kötzing, Martin S. Krejca. Theor. Comput. Sci. 2019.], including lower bounds for step-size bounded settings.
An example application of
Theorem 5.13 [Variable Drift] is the optimization of
LeadingOnes by the $(1+1)$ EA. It is known [
DJW02On the analysis of the (1+1) evolutionary algorithm. Stefan Droste, Thomas Jansen, Ingo Wegener. Theor. Comput. Sci. 2002.] that the expected gain in fitness value per iteration, given that the current fitness value is $n-s$ (and thus $s$ away from the optimum), is (essentially) at least
$$
h(s) = 2 \cdot \left(1-1/n\right)^{n-s} \cdot \frac{1}{n}.
$$
The middle term is the probability to not lose a bit already gained; the $1/n$ is the probability to flip the left-most $0$ and the $2$ is the expected fitness gained when the two just mentioned events happen (one bit flipped, plus an expected one more bit that happens to be correctly set). The middle term can be lower-bounded by $1/e$, which allows for applying
Theorem 2.1 [Additive Drift, Upper Bound], giving a total run time of at most
$$
\frac{e}{2} \cdot n^2.
$$
Using the variable drift theorem directly on the bound given by $h$, a simple integration gives an upper bound on the optimization time of
$$
\frac{e-1}{2} \cdot n^2.
$$
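To see where the constant comes from, here is a sketch of that integration, using the approximation $(1-1/n)^{n-z} \approx e^{-(n-z)/n}$, extending the integral down to $0$ and ignoring the lower-order term $1/h(x_{\mathrm{min}})$; substituting $w = (n-z)/n$ gives
$$
\int_{0}^{n} \frac{1}{h(z)}\,\mathrm{d}z \approx \int_{0}^{n} \frac{n}{2}\, e^{(n-z)/n}\,\mathrm{d}z = \frac{n^2}{2} \int_{0}^{1} e^{w}\,\mathrm{d}w = \frac{e-1}{2} \cdot n^2.
$$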
This optimization time was first established in [
BDN10Optimal Fixed and Adaptive Mutation Rates for the LeadingOnes Problem. Süntje Böttcher, Benjamin Doerr, Frank Neumann. PPSN (1) 2010.]. In this example, the use of the variable drift theorem improved the leading constant. Note that the bound can be derived also by other means, for example by the fitness level method, which can also be used to show tightness of this bound, see
Theorem 6.7 [Run Time of $(1+1)$ EA on LeadingOnes].
An essential application of a variable drift is given in the proof of Theorem 17 in [
DFF+19Island Models Meet Rumor Spreading. Benjamin Doerr, Philipp Fischbeck, Clemens Frahnow, Tobias Friedrich, Timo Kötzing, Martin Schirneck. Algorithmica 2019.], considering the optimization of
OneMax by an islands-based evolutionary algorithm, employing $\lambda$ islands and an exchange of individuals every $\tau$ rounds. In particular, the considered drift function is $h\colon \realnum \rightarrow \realnum$ such that, for all $s > 0$,
$$
h(s) = \frac{\ln(\lambda)}{\ln\!\left(\frac{n \, \ln(\lambda)}{\tau s}\right)}.
$$
The final bound on the run time is shown to be asymptotically tight, thanks to using both upper and lower bounding variable drift theorems.
Further uses of the variable drift theorem are given in Theorem 7 of [
FKL+17Analysis of the (1+1) EA on Subclasses of Linear Functions under Uniform and Linear Constraints. Tobias Friedrich, Timo Kötzing, Gregor Lagodzinski, Frank Neumann, Martin Schirneck. FOGA 2017.,
FKL+20Analysis of the (1 + 1) EA on subclasses of linear functions under uniform and linear constraints. Tobias Friedrich, Timo Kötzing, J. A. Gregor Lagodzinski, Frank Neumann, Martin Schirneck. Theor. Comput. Sci. 2020.] and in Theorem 6 of [
DDK16The Right Mutation Strength for Multi-Valued Decision Variables. Benjamin Doerr, Carola Doerr, Timo Kötzing. GECCO 2016.,
DDK18Static and Self-Adjusting Mutation Strengths for Multi-valued Decision Variables. Benjamin Doerr, Carola Doerr, Timo Kötzing. Algorithmica 2018.].
5.5 Negative Drift
When the drift goes away from the target, we speak of
negative drift. The negative drift theorem [
OW11Simplified Drift Analysis for Proving Lower Bounds in Evolutionary Computation. Pietro S. Oliveto, Carsten Witt. Algorithmica 2011.,
OW12Erratum: Simplified Drift Analysis for Proving Lower Bounds in Evolutionary Computation. Pietro S. Oliveto, Carsten Witt. CoRR 2012.] gives an exponential lower bound in this setting.
Theorem 5.14 [Negative Drift].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$. Suppose there is an interval $[a,b] \subseteq \realnum$, two constants $\delta, \varepsilon \gt 0$ and, possibly depending on $\ell = b-a$, a function $r(\ell)$ satisfying $1 \leq r(\ell) = o(\ell / \log \ell)$ such that, for all $t \in \natnum$, the following conditions (drift, concentration) hold.
- (D)
- $\Ew{X_{t+1} - X_t \mid X_0,…, X_t; a \lt X_t \lt b} \geq \delta$.
- (C)
- For all $j \in \natnum$, $\Pr{|X_{t+1} - X_t| \geq j \mid X_0,…, X_t; a \lt X_t} \leq \frac{r(\ell)}{(1+\varepsilon)^{j}}$.
Then there is a constant $c$ such that, for $T = \min\set{t \in \natnum}{X_t \leq a}$, we have
$$
\Pr{T \leq 2^{c \ell/ r(\ell)} \; \Big| \; X_0 \geq b} = 2^{-\Omega(\ell/ r(\ell))}.
$$
Note that the drift leads away from the target $a$ with a strength
independent of the width $\ell = b-a$ of the interval (while the process is inside the interval). A version with scaling, which allows for more flexibility in this dependence, is given in [
OW14On the runtime analysis of the Simple Genetic Algorithm. Pietro S. Oliveto, Carsten Witt. Theor. Comput. Sci. 2014.].
A variant that allows for arbitrary $\varepsilon$ (with decaying guarantees) is given in [
Köt16Concentration of First Hitting Times Under Additive Drift. Timo Kötzing. Algorithmica 2016.] as follows. It requires a bounded step size, but in return gives a very simple and easy-to-apply bound.
Theorem 5.15 [Negative Drift, Bounded Step Size].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$, and let $n \gt 0$. Let $T = \min\set{t \in \natnum}{X_t \geq n}$ and suppose there are $0 \lt c \lt n$ and $\varepsilon \lt 0$ such that, for all $t \in \natnum$, the following conditions hold (drift, boundedness).
- (D)
- $\Ew{X_{t+1} - X_t \mid X_0,…, X_t} \leq \varepsilon$.
- (B)
- $|X_{t+1} - X_t| \lt c$.
Then, for all $s \in \natnum$, we have
$$
\Pr{T \leq s} \leq s\exp\left( - \frac{n |\varepsilon|}{2c^2} \right).
$$
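As an illustration (Python; walk and parameters are hypothetical), consider a $\pm 1$ walk starting at $0$ that moves up with probability $0.4$, so that (D) holds with $\varepsilon = -0.2$ and (B) with $c = 1$; reaching $n = 100$ within $s = 1000$ steps is then very unlikely, in line with the theorem.
```python
import math
import random

def reaches(n: int, s: int, p_up: float = 0.4) -> bool:
    """Run a +/-1 walk from 0 for at most s steps (drift 2*p_up - 1 = -0.2);
    report whether it ever reaches n."""
    x = 0
    for _ in range(s):
        x += 1 if random.random() < p_up else -1
        if x >= n:
            return True
    return False

n, s, c, eps = 100, 1_000, 1.0, -0.2
runs = 10_000
freq = sum(reaches(n, s) for _ in range(runs)) / runs
bound = s * math.exp(-n * abs(eps) / (2 * c**2))
print(f"empirical Pr[T <= {s}] ~ {freq:.4f}; Theorem 5.15 bound: {bound:.4f}")
```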
Given as Corollary 22 in [
Köt16Concentration of First Hitting Times Under Additive Drift. Timo Kötzing. Algorithmica 2016.] is a second variant of the negative drift theorem. It allows for very large $r$ while still giving a super-polynomial bound for finding the target in polynomial time.
Theorem 5.16 [Negative Drift II].
Let $(X_t)_{t \in \natnum}$ be an
integrable random process over $\realnum$. Suppose there is an interval $[a,b] \subseteq \realnum$, two constants $\delta, \varepsilon > 0$ and, possibly depending on $\ell = b-a$, a function $r(\ell)$ satisfying $1 \leq r(\ell) = \exp(o(\sqrt[4]{\ell}))$ such that, for all $t \in \natnum$, the following conditions hold (drift, concentration).
- (D)
- $\Ew{X_{t+1} - X_t \mid X_0,…, X_t; a \lt X_t \lt b} \geq \varepsilon$.
- (C)
- For all $j \in \natnum$, $\Pr{|X_{t+1} - X_t - \varepsilon| \geq j \mid X_0,…, X_t; a \lt X_t} \leq \frac{r(\ell)}{(1+\delta)^{j}}$.
Then there is a constant $c$ such that, for $T = \min\set{t \in \natnum}{X_t \leq a}$, we have
$$
\Pr{T \leq 2^{c \sqrt{\ell}} \; \Big| \; X_0 \geq b} = 2^{-\Omega(\sqrt[4]{\ell})}.
$$
Example applications of negative drift theorems for the analysis of evolutionary algorithms are given in the proofs of the following statements: Lemma 3 of [
KM12ACO Beats EA on a Dynamic Pseudo-Boolean Function. Timo Kötzing, Hendrik Molter. PPSN (1) 2012.]; Lemma 8 of [
FKKS15The Benefit of Recombination in Noisy Evolutionary Search. Tobias Friedrich, Timo Kötzing, Martin S. Krejca, Andrew M. Sutton. ISAAC 2015.] (see also Lemma 5 of [
FKKS17The Compact Genetic Algorithm is Efficient Under Extreme Gaussian Noise. Tobias Friedrich, Timo Kötzing, Martin S. Krejca, Andrew M. Sutton. IEEE Trans. Evol. Comput. 2017.]); Theorem 3 of [
FKKS17The Compact Genetic Algorithm is Efficient Under Extreme Gaussian Noise. Tobias Friedrich, Timo Kötzing, Martin S. Krejca, Andrew M. Sutton. IEEE Trans. Evol. Comput. 2017.]; Lemma 6 of [
FKS16On the Robustness of Evolving Populations. Tobias Friedrich, Timo Kötzing, Andrew M. Sutton. PPSN 2016.]; and Lemma 13 of [
FKK16EDAs cannot be Balanced and Stable. Tobias Friedrich, Timo Kötzing, Martin S. Krejca. GECCO 2016.].
5.6 Finite State Spaces
Most drift theorems consider a random walk on the real numbers, sometimes restricted to non-negative numbers. For the analysis of
discrete algorithms, frequently the state space is even more restricted, in particular finite. By numbering the successive states, we can assume the state space to be $[0..n]$. For this setup we have the following drift theorem from [
KK18First-Hitting Times for Finite State Spaces. Timo Kötzing, Martin S. Krejca. PPSN (2) 2018.]. Note that the proof given in the paper is derived by the method of step-wise differences, see
Section 3.4 [Designing a Potential Function via Step-Wise Differences]. The theorem generalizes a theorem from [
DJW00Dynamic Parameter Control in Simple Evolutionary Algorithms. Stefan Droste, Thomas Jansen, Ingo Wegener. FOGA 2000.].
Theorem 5.17 [Finite State Spaces, Upper Bound].
Let $(X_t)_{t \in \natnum}$ be a
time-homogeneous Markov chain on $[0..n]$ and let $T$ be the first time $t$ such that $X_t = 0$. Suppose there are two functions $p^{\scriptscriptstyle\leftarrow}\colon [1..n] \to [0, 1]$ and $p^{\scriptscriptstyle\rightarrow}\colon [0..n-1] \to [0, 1]$ such that, for all $t \lt T$ and all $s \in [1..n]$,
- $p^{\scriptscriptstyle\leftarrow}(s) > 0$,
- $\Pr{X_t - X_{t+1} \geq 1 \mid X_t = s} \geq p^{\scriptscriptstyle\leftarrow}(s)$,
- $\Pr{X_t - X_{t+1} = -1 \mid X_t = s} \leq p^{\scriptscriptstyle\rightarrow}(s)$ (for $s \neq n$), and
- $\Pr{X_t - X_{t+1} \lt -1 \mid X_t = s} = 0$ (for $s \neq n$).
Then
$$
\Ew{T \mid X_0} \leq \sum_{s = 1}^{X_0}\sum_{i = s}^{n}\frac{1}{p^{\scriptscriptstyle\leftarrow}(i)}\prod_{j = s}^{i - 1}\frac{p^{\scriptscriptstyle\rightarrow}(j)}{p^{\scriptscriptstyle\leftarrow}(j)}.
$$
A special case of this theorem is the fitness level method (see
Theorem 6.1 [Fitness Level Method (FLM)]), where the process is monotone; we can recover this setting by setting $p^{\scriptscriptstyle\rightarrow}$ to be constantly $0$, significantly simplifying the above formula.
We also have the corresponding lower bound.
Theorem 5.18 [Finite State Spaces, Lower Bound].
Let $(X_t)_{t \in \natnum}$ be a
time-homogeneous Markov chain on $[0..n]$ and let $T$ be the first time $t$ such that $X_t = 0$. Suppose there are two functions $p^{\scriptscriptstyle\leftarrow}\colon [1..n] \to [0, 1]$ and $p^{\scriptscriptstyle\rightarrow}\colon [0..n-1] \to [0, 1]$ such that, for all $t \lt T$ and all $s \in [1..n]$,
- $p^{\scriptscriptstyle\leftarrow}(s) > 0$,
- $\Pr{X_t - X_{t+1} = 1 \mid X_t = s} \leq p^{\scriptscriptstyle\leftarrow}(s)$,
- $\Pr{X_t - X_{t+1} \gt 1 \mid X_t = s} = 0$, and
- $\Pr{X_t - X_{t+1} \leq -1 \mid X_t = s} \geq p^{\scriptscriptstyle\rightarrow}(s)$ (for $s \neq n$).
Then
$$
\Ew{T \mid X_0} \geq \sum_{s = 1}^{X_0}\sum_{i = s}^{n}\frac{1}{p^{\scriptscriptstyle\leftarrow}(i)}\prod_{j = s}^{i - 1}\frac{p^{\scriptscriptstyle\rightarrow}(j)}{p^{\scriptscriptstyle\leftarrow}(j)}.
$$
Note that, for processes making steps of at most $1$ in either direction and for which $p^{\scriptscriptstyle\rightarrow}$ and $p^{\scriptscriptstyle\leftarrow}$ are the exact transition probabilities, the upper and the lower bound coincide.
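The double sum is straightforward to evaluate programmatically. The following sketch (Python) does so for a hypothetical lazy random walk on $[0..n]$, chosen such that the conditions of both theorems hold with equality, so the formula gives the exact expected hitting time; a direct simulation is included for comparison.
```python
import random

def drift_bound(x0, n, p_left, p_right):
    """Evaluate the bound of Theorems 5.17/5.18:
    sum_{s=1}^{x0} sum_{i=s}^{n} 1/p_left(i) * prod_{j=s}^{i-1} p_right(j)/p_left(j)."""
    total = 0.0
    for s in range(1, x0 + 1):
        for i in range(s, n + 1):
            prod = 1.0
            for j in range(s, i):
                prod *= p_right(j) / p_left(j)
            total += prod / p_left(i)
    return total

# Hypothetical lazy walk on [0..n]: left w.p. 0.4; right w.p. 0.3 (if s < n).
n, x0 = 10, 5
p_left = lambda s: 0.4
p_right = lambda s: 0.3 if s < n else 0.0
print(f"formula: {drift_bound(x0, n, p_left, p_right):.2f}")

def simulate(x: int) -> int:
    t = 0
    while x > 0:
        u = random.random()
        if u < 0.4:
            x -= 1
        elif u < 0.7 and x < n:
            x += 1
        t += 1
    return t

runs = 20_000
print(f"simulated: {sum(simulate(x0) for _ in range(runs)) / runs:.2f}")
```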
5.7 Headwind Drift
Sometimes drift only carries until shortly before the target, but then, close to the target, turns negative. In case only a small remaining distance needs to be bridged, and the probability of going the right way is still sufficiently high, the following
headwind drift theorem can be used to directly get a decent bound without relying on hand-crafted potential functions. The theorem was developed and applied in [
KLW15(1+1) EA on Generalized Dynamic OneMax. Timo Kötzing, Andrei Lissovoi, Carsten Witt. FOGA 2015.].
Theorem 5.19 [Headwind Drift].
Let $(X_t)_{t \in \natnum}$ be a
time-homogeneous Markov chain on $[0..n]$. Let bounds
$$
p^{-}(i)\leq \Pr{X_{t+1} \leq i-1 \mid X_t = i}
$$
and
$$
p^{+}(i) \geq \Pr{X_{t+1} \geq i+1 \mid X_t = i},
$$
where $0 \leq i \leq n$, be given, and define
$$
\delta(i):= p^{-}(i)-\Ew{(X_{t+1}-i) \cdot \mathbb{1}[X_{t+1} > i] \mid X_t=i}.
$$
Assume that $\delta(i)$ is monotone increasing with respect to $i$ and let $\kappa \geq \max\set{i \ge 0 }{ \delta(i) \leq 0 }$ (noting that $\delta(0) \leq 0$).
The function $g\colon [0..n+1]\to \realnum_+$ is defined by
$$
g(i) := \sum_{k=i+1}^n \frac{1}{\delta(k)}
$$
for $i \geq \kappa$ (in particular, $g(n)=g(n+1)=0$), and inductively by
$$
g(i) := \frac{1+(p^{+}(i+1)+p^{-}(i+1))g(i+1)}{p^{-}(i+1)}
$$
for $i \lt \kappa$.
Then it holds for the first hitting time $T := \min\set{t \in \natnum }{ X_t=0 }$ of state $0$ that
$$
\Ew{T \mid X_0} \leq g(0)-g(X_0).
$$
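The recursive definition of $g$ is easy to evaluate numerically. The following sketch (Python; the chain and all transition probabilities are hypothetical) does so for a chain with steps of $\pm 1$, where the expectation term in $\delta(i)$ is just the probability of a step to the right, so $\delta(i) = p^{-}(i) - p^{+}(i)$.
```python
def headwind_bound(n, x0, kappa, p_minus, p_plus):
    """Evaluate g from Theorem 5.19 for a +/-1 chain, where
    delta(i) = p_minus(i) - p_plus(i), and return the bound g(0) - g(x0)."""
    delta = lambda i: p_minus(i) - p_plus(i)
    g = [0.0] * (n + 2)  # g(n) = g(n+1) = 0
    for i in range(n - 1, kappa - 1, -1):  # g(i) = sum_{k=i+1}^n 1/delta(k)
        g[i] = g[i + 1] + 1 / delta(i + 1)
    for i in range(kappa - 1, -1, -1):  # inductive definition for i < kappa
        g[i] = (1 + (p_plus(i + 1) + p_minus(i + 1)) * g[i + 1]) / p_minus(i + 1)
    return g[0] - g[x0]

# Hypothetical chain on [0..20]: headwind (drift away from 0) on states 1..2,
# drift towards 0 on all higher states; delta is monotone with kappa = 2.
n, x0, kappa = 20, 10, 2
p_minus = lambda i: 0.5
p_plus = lambda i: 0.6 if i <= kappa else 0.2
print(f"headwind bound on E[T]: {headwind_bound(n, x0, kappa, p_minus, p_plus):.1f}")
```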
We can also get a closed-form upper bound on the expected first hitting time $\Ew{T \mid X_0}$. This expression involves the factor $\sum_{k=\kappa+1}^n \frac{1}{\delta(k)}$, which is reminiscent of the formula for the expected first hitting time of state $\kappa$ under variable drift towards the target (see
Theorem 5.13 [Variable Drift]). For the states less than $\kappa$, where drift away from the target holds, the product $\prod_{k=1}^{\kappa} \frac{p^{+}(k)+p^{-}(k)}{p^{-}(k)}$ comes into play. Intuitively, it represents the waiting time for the event of taking $\kappa$ consecutive steps against the drift. Since the product involves probabilities conditioned on leaving the states, which effectively removes self-loops, another sum of products must be added. This sum, represented by the second line of the expression for $\Ew{T \mid X_0}$, intuitively accounts for the self-loops.
Corollary 5.20 [Headwind Drift, Closed Form].
Let the assumptions of
Theorem 5.19 [Headwind Drift] hold. Then
\begin{align*}
\Ew{T \mid X_0} & \leq \left(\left(\sum_{k=\kappa+1}^n \frac{1}{\delta(k)}\right) \left(\prod_{k=1}^{\kappa} \frac{p^{+}(k)+p^{-}(k)}{p^{-}(k)} \right)\right)
\\
& \qquad + \left(\sum_{k=1}^{\kappa} \frac{1}{p^{-}(k)}\prod_{j=1}^{k-1} \frac{p^{+}(j)+p^{-}(j)}{p^{-}(j)}\right).
\end{align*}
Note that there is a similarity between the theorems for headwind drift and those from
Section 5.6 [Finite State Spaces]. This is because the analysis for the last steps of headwind drift is essentially an analysis brute-forcing the small interval of negative drift, which also happens in
Section 5.6 [Finite State Spaces].
5.8 Multiplicative Up-Drift
The idea of
Theorem 2.5 [Multiplicative Drift] was to have multiplicative drift going
down towards $0$. While this has many applications (owing to the fact that progress in optimization typically gets harder as better and better solutions are found), there are also a number of processes that gain in speed over time, typically making progress proportional to the current state of the process, such as rumor spreading, epidemics and population take-over. This is known as multiplicative up-drift and was studied in depth in [
DK21Multiplicative Up-Drift. Benjamin Doerr, Timo Kötzing. Algorithmica 2021.]. The main theorem is the following.
Theorem 5.21 [Multiplicative Up-Drift].
Let $(X_{t})_{t \in \natnum}$ be an
integrable random process over $\integers_{\geq 0}$. Let $n,k \in \integers_{\geq 1}$, $E_0 \gt 0$, $\gamma_0 \lt 1$, and $\delta \gt 0$ such that $n - 1 \leq \min\{\gamma_0 k, (1+\delta)^{-1} k\}$. Let $D_0 = \min (\lceil 100/\delta \rceil,n)$ when $\delta \le 1$ and $D_0 = \min (32,n)$ otherwise. Assume that, for all $t \in \natnum$ and all $x \in [0..n-1]$ with $\Pr{X_{t} = x} \gt 0$, the following two conditions hold (binomial distribution, gain at $0$); note that we use the concept of
stochastic dominance.
- (Bin)
- If $x \geq 1$, then $X_{t+1} \succeq \mathrm{Bin}(k,(1+\delta) X_t/k)$.
- (0)
- $\Ew{ \min(X_{t+1}, D_0) \mid X_{t} = 0} \geq E_0$.
Let $T := \min\set{t \in \natnum}{X_{t} \geq n}$.
Then,
if $\delta \leq 1$,
\begin{align*}
\Ew{T}
& \leq \frac{4D_0 }{0.4088 E_0} + \frac{15}{1-\gamma_0} D_0 \ln(2 D_0)+ 2.5 \log_2(n) \lceil 3 / \delta \rceil.
\end{align*}
In particular, when $\gamma_0$ is bounded away from $1$ by a constant, then $\Ew{T} = \bigO{\frac{1}{E_0\delta} + \frac{\log (n)}{\delta}}$, where the asymptotic notation refers to $n$ tending to infinity and where $\delta=\delta(n)$ may be a function of $n$.
Furthermore, if $n \gt 100/\delta$, then we also have that once the process has reached a state of at least $100/\delta$, the probability to ever return to a state of at most $50/\delta$ is at most $0.5912$.
If $\delta \gt 1$, then we have
\begin{align*}
\Ew{T}
& \leq \frac{128}{0.78 E_0} + 2.6 \log_{1+\delta}(n) + 81\\
& = \bigO{\frac{1}{E_0} + \frac{\log (n)}{\log(\delta)}}.
\end{align*}
In addition, once the process has reached state $32$ or higher, the probability to ever return to a state lower than $32$ is at most $\tfrac{1}{e(e-1)} \lt 0.22$.
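A minimal simulation sketch of such a process follows (Python; all parameters are hypothetical, chosen so that $n - 1 \leq k/(1+\delta)$ holds). From state $0$ the process jumps to $1$ with probability $1/2$, so condition (0) holds with $E_0 = 1/2$; otherwise the next state is sampled exactly from the binomial distribution of condition (Bin), and we start at $X_0 = 1$.
```python
import random

def updrift_time(n: int, k: int, delta: float, p0: float = 0.5) -> int:
    """First point in time t with X_t >= n for X_{t+1} ~ Bin(k, (1+delta) X_t / k),
    where state 0 jumps to 1 w.p. p0 (so condition (0) holds with E_0 = p0)."""
    x, t = 1, 0
    while x < n:
        if x == 0:
            x = 1 if random.random() < p0 else 0
        else:
            p = min(1.0, (1 + delta) * x / k)
            x = sum(random.random() < p for _ in range(k))  # sample Bin(k, p)
        t += 1
    return t

n, k, delta = 100, 1000, 0.5  # satisfies n - 1 <= k / (1 + delta)
runs = 200
avg = sum(updrift_time(n, k, delta) for _ in range(runs)) / runs
print(f"empirical E[T] ~ {avg:.1f}; "
      f"Theorem 5.21 predicts O(1/(E_0 delta) + log(n)/delta)")
```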
Note that this drift theorem is essentially restricted to processes based on the binomial distribution. For many applications this restriction is satisfied, particularly for the level-based theorem introduced in [
Leh11Fitness-levels for non-elitist populations. Per Kristian Lehre. GECCO 2011.] and refined in [
DL16Runtime Analysis of Non-elitist Populations: From Classical Optimisation to Partial Information. Duc-Cuong Dang, Per Kristian Lehre. Algorithmica 2016.,
CDEL18Level-Based Analysis of Genetic Algorithms and Other Search Processes. Dogan Corus, Duc-Cuong Dang, Anton V. Eremeev, Per Kristian Lehre. IEEE Trans. Evol. Comput. 2018.]. We now discuss the currently strongest version in terms of the asymptotics in $\delta$, given in [
DK21Multiplicative Up-Drift. Benjamin Doerr, Timo Kötzing. Algorithmica 2021.] as a consequence to the multiplicative up-drift theorem.
The general setup of level-based theorems is as follows. There is a ground set $\mathcal{X}$, which in typical applications is the search space of an optimization problem. On this ground set, a
Markov chain $(P_t)$ induced by a population-based EA is defined. We consider populations of fixed size $\lambda$, which may contain elements several times (multi-sets). We write $\mathcal{X}^\lambda$ to denote the set of all such populations. We only consider
Markov chains where each element of the next population is sampled independently with repetition. That is, for each population $P \in \mathcal{X}^\lambda$, there is a distribution $D(P)$ on $\mathcal{X}$ such that given $P_t$, the next population $P_{t+1}$ consists of $\lambda$ elements of $\mathcal{X}$, each chosen independently according to the distribution $D(P_t)$. As all our results hold for any initial population $P_0$, we do not make any assumptions on $P_0$.
In the level-based setting, we assume that there is a partition of $\mathcal{X}$ into levels $A_1, \dots, A_m$ (leading to the name of a
level-based theorem). Based on information in particular on how individuals in higher levels are generated, we aim for an upper bound on the first time such that the population contains an element of the highest level $A_m$.
Theorem 5.22 [Level-Based Theorem].
Consider a population-based process as described above.
Let $(A_1,…,A_m)$ be a partition of $\mathcal{X}$. Let $A_{\ge j} := \bigcup_{i=j}^m A_i$ for all $j \in [1..m]$. Let $z_1,…,z_{m-1},\delta \in (0,1]$, and let $\gamma_0 \in (0,\frac{1}{1+\delta}]$ with $\gamma_0 \lambda \in \integers$. Let $D_0 = \min\{\lceil 100/\delta \rceil,\gamma_0 \lambda\}$ and $c_1 = 56 \, 000$. Let
$$
t_0 = \frac{7000}{\delta} \left(m + \frac{1}{1-\gamma_0} \sum_{j=1}^{m-1} \log^0_2\left(\frac{2\gamma_0\lambda}{1+\frac{z_j \lambda}{D_0}}\right) + \frac{1}{\lambda} \sum_{j=1}^{m-1}\frac{1}{z_j} \right),
$$
where $\log^0_2(x) := \max(0,\log_2(x))$ for all $x \in \realnum_+$. Assume that, for any population $P \in \mathcal{X}^\lambda$, the following three conditions are satisfied (drift, zero condition, population size).
- (D)
- For each level $j \in [1..m-2]$ and all $\gamma \in (0,\gamma_0]$, if $|P \cap A_{\geq j}| \geq \gamma_0 \lambda /4$ and $|P \cap A_{\geq j+1}| \geq \gamma \lambda$, then
$$
\operatorname{Pr}_{y \sim D(P)} \left[y \in A_{\geq j+1}\right] \geq (1+\delta)\gamma.
$$
- (0)
- For each level $j \in [1..m-1]$, if $|P \cap A_{\geq j}| \geq \gamma_0 \lambda /4$, then
$$
\operatorname{Pr}_{y \sim D(P)} \left[y \in A_{\geq j+1}\right] \geq z_j.
$$
- (PS)
- The population size $\lambda$ satisfies
$$
\lambda \geq \frac{256}{\gamma_0 \delta} \ln \left(8 t_0 \right).
$$
Then $T := \min \set{\lambda t}{P_t \cap A_m \neq \emptyset}$ satisfies
\begin{align*}
\Ew{T} \leq 8\lambda t_0 = c_1 \frac{\lambda}{\delta} \left(m + \frac{1}{1-\gamma_0} \sum_{j=1}^{m-1} \log^0_2\left(\frac{2\gamma_0\lambda}{1+\frac{z_j \lambda}{D_0}}\right) + \frac{1}{\lambda} \sum_{j=1}^{m-1}\frac{1}{z_j} \right).
\end{align*}
Note that, with $z^* = \min_{j \in [1..m-1]} z_j$ and $\gamma_0$ a constant, (PS) in the previous theorem is satisfied for some $\lambda$ with
$$
\lambda = \Omega\left(\frac{1}{\delta}\log\left( \frac{m}{\delta z^*} \right)\right)
$$
as well as for all larger $\lambda$.
5.9 Wormald's Method
A very different approach to understanding random processes via their step-wise changes is given by Wormald [
Wor99The differential equation method for random graph processes and greedy algorithms. Nicholas C. Wormald. Lectures on approximation and randomized algorithms 1999.], tracking the processes via solutions of a system of differential equations. We briefly state a version of this theorem here.
Consider a stochastic process $(Y^{(t)})_{t\in \natnum}$, where each random variable $Y^{(t)}$ takes values in some set $S$. We use $H_t$ to denote the history of the process up to time $t$, i.e., $H_t=(Y^{(0)},…, Y^{(t)})$; furthermore, $S^+$ denotes the set of all sequences $(y^{(0)},…, y^{(t)})$ such that, for all $i \leq t$, $y^{(i)}\in S$.
We say that a function $f\colon \realnum^k \rightarrow \realnum$ satisfies a Lipschitz condition on $D \subseteq \realnum^k$ if there is an $L \gt 0$ such that, for all $u=(u_1,…,u_k), v=(v_1,…,v_k)\in D$, $$|f(u)-f(v)| \leq L \max_{1\leq i\leq k} |u_i-v_i|.$$
Theorem 5.23 [Wormald's Method].
For some $a \in \natnum$, let $(Y^{(t)}_i)_{1 \leq i \leq a, t \in \natnum}$ be a stochastic process such that there is $C \in \realnum_+$ so that, for all $m \in \natnum_+$, all $t \in \natnum$ and all $H_t\in S^{+}$, we have $|Y_i^{(t)}| \leq Cm$. Let $D$ be some bounded connected open set containing the closure of $$\Big\{(0,z_1,…,z_a) \Big| \Pr{Y_i^{(0)}=z_im, 1\leq i \leq a} \neq 0 \mbox{ for some } m\Big\}.$$
Assume that the following two conditions hold, where, for each $1 \leq i \leq a$, the function $f_i\colon \realnum_{+} \times \realnum^{a} \rightarrow \realnum$ is continuous and satisfies a Lipschitz condition on $D$ with the same Lipschitz constant $L$ for all $i$ (drift, boundedness).
- (D)
- $\Ew{Y_i^{(t+1)}-Y_i^{(t)} \mid H_t} = f_i(t/m,Y^{(t)}_1/m,…,Y^{(t)}_a/m)$.
- (B)
- For all $t \in \natnum$, $\max_{1\leq i\leq a} |Y_i^{(t+1)}-Y_i^{(t)}| \leq 1$.
Then the following are true.
- For any $(0,\hat{z}_1,…,\hat{z}_a)\in D$, the system of differential equations $$\frac{dz_i}{dx}=f_i(x,z_1,…,z_a),\; i=1,…,a$$ has a unique solution in $D$ for $z_i:\realnum \rightarrow \realnum$ passing through $z_i(0)=\hat{z}_i, 1\leq i\leq a$, and which extends to points arbitrarily close to the boundary of $D$;
- Let $\lambda = \lambda(m) = o(1)$. For some constant $C > 0$, with probability $1-\bigO{\frac{1}{\lambda}\exp(-m\lambda^3)}$,
$$Y_i^{(t)}=mz_i(t/m)+\bigO{\lambda m}$$
uniformly for $0\leq t\leq \sigma m$ and for each $i$, where $z_i(x)$ is the solution given above with $\hat{z}_i=\frac{1}{m}Y_i^{(0)}$, and $\sigma=\sigma(m)$ is the supremum of those $x$ to which the solution can be extended before reaching within $L_{\infty}$-distance $C\lambda$ of the boundary of $D$.
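As a minimal illustration (Python; the coupon-collecting setup is a hypothetical choice of example), track the number $Y^{(t)}$ of colors still missing when drawing uniformly from $m$ colors. Condition (D) holds with $f(x,z) = -z$, the differential equation $\frac{\mathrm{d}z}{\mathrm{d}x} = -z$ with $z(0) = 1$ has the solution $z(x) = e^{-x}$, and the method predicts $Y^{(t)} \approx m e^{-t/m}$.
```python
import math
import random

m = 100_000  # number of colors; Y^(t) = number of colors still missing
missing, seen = m, [False] * m
for t in range(1, 2 * m + 1):
    i = random.randrange(m)  # E[Y^(t+1) - Y^(t) | H_t] = -Y^(t)/m, i.e. f(x,z) = -z
    if not seen[i]:
        seen[i] = True
        missing -= 1
    if t % m == 0:
        print(f"t = {t}: Y = {missing}; "
              f"prediction m*exp(-t/m) = {m * math.exp(-t / m):.0f}")
```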
A few analyses of randomized search heuristics with Wormald's theorem exist [
LS15Fixed Budget Performance of the (1+1) EA on Linear Functions. Johannes Lengler, Nicholas Spooner. FOGA 2015.,
FKM17Analyzing search heuristics with differential equations. Tobias Friedrich, Timo Kötzing, Anna Melnichenko. GECCO (Companion) 2017.,
Her18Modelling Evolutionary Algorithms with Stochastic Differential Equations. Jorge Pérez Heredia. Evol. Comput. 2018.].
It has the advantage that it allows tracking multiple interacting random variables (which, for other drift theorems, would have to be combined into a single potential). On the other hand, it requires solving a system of differential equations (well known to be a hard task in general), and the guarantee typically deteriorates over time, since the variance is not averaged out but accumulates.
Note that there are also theorems closer to the classic drift theorems for tracking multiple random variables in restricted settings [
Row18Linear multi-objective drift analysis. Jonathan E. Rowe. Theor. Comput. Sci. 2018.,
JL22Two-Dimensional Drift Analysis: - Optimizing Two Functions Simultaneously Can Be Hard. Duri Janett, Johannes Lengler. PPSN (2) 2022.].
This work is licensed as Creative Commons Attribution - ShareAlike 4.0 International.
All mathematical statements in this script are freely usable, of course. As a collection and with comments they are licensed as stated above.