9 Notation

In this section we collect some algorithms, notation and lemmas used in this work.

9.1 Algorithms

The simplest search heuristic is Random Local Search (RLS). It starts with a random bit string and iteratively tries to improve it by changing the currently best search point in exactly one position (we use this as the “flipOne” function below). The pseudo code for RLS maximizing a given function $f\colon \{0,1\}^n \rightarrow \realnum$ is given as follows.

RLS
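The pseudo code box is not reproduced here, so the following Python sketch illustrates the procedure described above; the function name `rls` and the explicit step budget are our illustrative choices, not part of the original pseudo code.

```python
import random

def rls(f, n, steps):
    """Random Local Search maximizing f: {0,1}^n -> R.

    Starts from a uniformly random bit string and, in each iteration,
    flips exactly one uniformly chosen bit ("flipOne"), keeping the
    offspring if it is at least as good as the current search point.
    """
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        y = list(x)
        i = random.randrange(n)
        y[i] = 1 - y[i]  # change exactly one position
        if f(y) >= f(x):  # maximization: accept if not worse
            x = y
    return x
```

For example, `rls(sum, 8, 5000)` maximizes the number of ones in a bit string of length $8$ and, with very high probability, returns the all-ones string.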

The algorithm is set up to maximize the given function $f$; by turning the inequality around, we get the analogous algorithm for minimization.
RLS constitutes a simple and straightforward hill climber. A slightly more advanced algorithm allows for larger jumps, that is, it also considers changing the currently best search point in more than one position. The most common way to achieve this is by flipping not exactly one bit, but each bit independently with some predefined probability $p$. This independently random flipping of bits is called mutation, and the resulting algorithm is called the $(1+1)$ EA; its pseudo code is given as follows.

OneOneEA

Note that the standard bit flip probability is $p=1/n$, implying that, on average, exactly one bit changes.
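Analogously to the sketch of RLS above, the following Python sketch illustrates the $(1+1)$ EA with standard bit mutation; the function name `one_one_ea` and the step budget are our illustrative choices.

```python
import random

def one_one_ea(f, n, steps, p=None):
    """(1+1) EA maximizing f: {0,1}^n -> R.

    Standard bit mutation: flip each bit independently with
    probability p. The default p = 1/n flips, on average,
    exactly one bit per iteration.
    """
    if p is None:
        p = 1 / n
    x = [random.randint(0, 1) for _ in range(n)]
    for _ in range(steps):
        # mutation: each bit is flipped independently with probability p
        y = [1 - b if random.random() < p else b for b in x]
        if f(y) >= f(x):  # maximization: accept if not worse
            x = y
    return x
```

Note that, in contrast to RLS, a single iteration may change several positions at once, which allows the algorithm to leave local optima that RLS cannot escape.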
In this document, we consider two concrete test functions and one function class, given in the following definition.
Definition 9.1 [Test Functions].

9.2 Notation

Next we give some non-standard notation.
Definition 9.2 [Discrete Intervals]. For any $n,m \in \natnum$ with $n \leq m$, we use $[n..m]$ to denote the set $\set{i \in \natnum}{n \leq i \leq m}$. Furthermore, for any $n \in \natnum_+$, we will write $[n]$ for $[1..n]$.
Definition 9.3 [Function Self-Composition]. For any function $f\colon X \rightarrow X$ and $i \geq 0$, we let $f^i$ denote the $i$-times self-composition of $f$ (with $f^0$ being the identity on $X$).
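Definition 9.3 can be phrased directly as code; the following small Python sketch (the name `self_compose` is ours) returns $f^i$ for a given $f$ and $i \geq 0$.

```python
def self_compose(f, i):
    """Return f^i, the i-times self-composition of f.

    By convention, f^0 is the identity on the domain of f.
    """
    def composed(x):
        for _ in range(i):
            x = f(x)
        return x
    return composed
```

For example, with `double = lambda x: 2 * x`, the function `self_compose(double, 3)` maps $1$ to $2^3 = 8$, and `self_compose(double, 0)` is the identity.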
We use the following notation regarding probabilities.
Definition 9.4 [Indicator Function]. For any event $A$, let $\ind{A}$ denote the indicator function for the event $A$.
Definition 9.5 [Conditional Expectation]. For any discrete random variable $X$ and any event $A$ such that $\Pr{A} \gt 0$, we have $$\Ew{X \mid A} = \sum_{x} x \frac{\Pr{\{X=x\} \cap A}}{\Pr{A}}.$$
Note that we can only condition on the event $X=s$ if $\Pr{X = s} \gt 0$.
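As a concrete check of Definition 9.5, consider a fair six-sided die and the event that the outcome is even; the following throwaway Python snippet evaluates the defining sum exactly (all names are illustrative).

```python
from fractions import Fraction

# Fair six-sided die: X(omega) = omega, uniform on {1, ..., 6}.
omega = range(1, 7)
p = Fraction(1, 6)  # probability of each atomic event

# Event A = "X is even", with Pr[A] = 1/2.
A = {w for w in omega if w % 2 == 0}
pr_A = sum(p for w in omega if w in A)

# E[X | A] = sum_x x * Pr[{X = x} and A] / Pr[A]
cond_exp = sum(w * p / pr_A for w in A)
print(cond_exp)  # 4
```

The result $(2 + 4 + 6) \cdot \frac{1/6}{1/2} = 4$ matches the intuition that, among the even outcomes, each of $2$, $4$, $6$ is equally likely.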
Definition 9.6 [Integrability]. A random variable $X$ is integrable if and only if $\Ew{|X|} \lt \infty$. In general, a random process $(X_t)_{t \in \natnum}$ is integrable if and only if, for all $t \in \natnum$, it holds that $X_t$ is integrable.
Definition 9.7 [Discrete Random Variable, Discrete Random Process]. A random variable $X$ is discrete if and only if it has a countable range. We call a random process $(X_t)_{t \in \natnum}$ discrete if each $X_t$ is discrete.
Definition 9.8 [Monotone Process]. A random process $(X_t)_{t \in \natnum}$ is monotone (sometimes called monotone non-decreasing) if and only if, for all $t$, $X_t \leq X_{t+1}$ (point-wise, i.e., for all atomic events $\omega \in \Omega$ we have $X_t(\omega) \leq X_{t+1}(\omega)$).
Definition 9.9 [Stochastic Dominance]. Given two random variables $X,Y$ over $\realnum$, we say that $Y$ stochastically dominates $X$ if and only if, for all $x \in \realnum$, $\Pr{X \leq x} \geq \Pr{Y \leq x}$; we write $X \preceq Y$.
Definition 9.10 [Markov Chain]. A discrete random process $(X_t)_{t \in \natnum}$ is a Markov chain if and only if the outcome of each next step only depends on the current state and time point. Formally, for all $t \in \natnum$ as well as all $s \in \realnum$ and all $v \in \realnum^{t+1}$ such that the conditioning events have positive probability, it holds that $\Pr{X_{t+1} = s \mid X_t = v_t} = \Pr{X_{t+1} = s \mid \forall t' \in [0..t]\colon X_{t'} = v_{t'}}$. A Markov chain $(X_t)_{t \in \natnum}$ is time-homogeneous if and only if the outcome of each next step only depends on the current state but not the current time point. Formally, for all $t,k \in \natnum$ as well as all $s, u \in \realnum$, it holds that $\Pr{X_{t+1} = s \mid X_t = u} = \Pr{X_{t+k+1} = s \mid X_{t + k} = u}$.

9.3 Lemmas

We will make use of the following lemmas. The first two pertain to bounding certain sums.
Lemma 9.11 [Upper Bound on the Harmonic Sum]. For all $n \in \natnum_{+}$ we have $$\sum_{i=1}^n \frac{1}{i} \leq \ln(n)+1.$$
Proof. See (1.4.12) in [Doe20] (Benjamin Doerr: Probabilistic Tools for the Analysis of Randomized Optimization Heuristics. In: Theory of Evolutionary Computation, 2020).
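As a quick numerical sanity check of Lemma 9.11 (a throwaway Python snippet, not part of the script), one can compare the harmonic sum against the bound $\ln(n) + 1$ for a few values of $n$.

```python
import math

def harmonic(n):
    """Return the n-th harmonic number H_n = sum_{i=1}^n 1/i."""
    return sum(1 / i for i in range(1, n + 1))

# Lemma 9.11: H_n <= ln(n) + 1 for all n >= 1.
for n in (1, 10, 100, 1000):
    assert harmonic(n) <= math.log(n) + 1
```

For instance, $H_{1000} \approx 7.485$, while $\ln(1000) + 1 \approx 7.908$, so the bound is reasonably tight.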
Lemma 9.12 [Upper Bounding Sum by Integral]. Let $a,b \in \realnum$ with $a \leq b$ and let $f\colon [a,b] \rightarrow \realnum$ be integrable. If $f$ is monotone increasing, then $$\sum_{i=a}^{b-1} f(i) \leq \int_a^b f(x) \; \mathrm{d}x.$$ If $f$ is monotone decreasing, then $$\sum_{i=a+1}^{b} f(i) \leq \int_a^b f(x) \; \mathrm{d}x.$$
The remaining lemmas pertain to random variables.
Lemma 9.13 [Conditional and Indicator]. Let $Y$ be an integrable random variable. For all events $A$ with $\Pr{A} \gt 0$, we have $$\Ew{Y \mid A} \cdot \Pr{A} = \Ew{Y \ind{A}}.$$
Definition 9.14 [Filtration Conditional]. Let a sub-$\sigma$-algebra $F$ of the underlying probability space be given and let $Y$ be an integrable random variable. Then $\Ew{Y \mid F}$ refers to any $F$-measurable random variable such that, for all $A \in F$, $$\Ew{Y \cdot \ind{A}} = \Ew{\Ew{Y \mid F} \cdot \ind{A} }.$$ Such a random variable exists and is unique up to changes on events of probability $0$.
Lemma 9.15 [Tower Property for Sub-$\sigma$-Algebra]. Let two sub-$\sigma$-algebras $F_1 \subseteq F_2$ of the underlying probability space be given and let $X$ be an integrable random variable. Then, with probability $1$, $$\Ew{\Ew{X \mid F_2} \mid F_1} = \Ew{X \mid F_1}.$$
Very much related to the preceding lemma is the law of total expectation (also known under other names and with different formulations).
Lemma 9.16 [Law of Total Expectation]. Let $X$ be a random variable and $A_1, …, A_n$ disjoint measurable events with positive probability that partition the probability space. Then we have $$\Ew{X} = \sum_{i=1}^n \Ew{X \mid A_i}\Pr{A_i}.$$ We will also have cause to use a different way to state this. Let $X,Y$ be two random variables. Then $$\Ew{X} = \Ew{\Ew{X \mid Y}}.$$
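Continuing the fair-die example from above, the following throwaway Python snippet verifies Lemma 9.16 for the partition of the sample space into even and odd outcomes (all names are illustrative).

```python
from fractions import Fraction

# Fair six-sided die: X(omega) = omega, uniform on {1, ..., 6}.
omega = range(1, 7)
p = Fraction(1, 6)

def cond_exp(event):
    """E[X | event] for the identity random variable on the die."""
    pr = sum(p for w in omega if w in event)
    return sum(w * p / pr for w in event)

even = {w for w in omega if w % 2 == 0}
odd = {w for w in omega if w % 2 == 1}

# Law of total expectation over the partition {even, odd}:
# E[X] = E[X | even] Pr[even] + E[X | odd] Pr[odd]
total = cond_exp(even) * Fraction(1, 2) + cond_exp(odd) * Fraction(1, 2)
assert total == sum(w * p for w in omega)  # both equal 7/2
```

Here $\Ew{X \mid \text{even}} = 4$ and $\Ew{X \mid \text{odd}} = 3$, and the weighted sum $4 \cdot \frac{1}{2} + 3 \cdot \frac{1}{2} = \frac{7}{2}$ recovers the unconditional expectation.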
We need the following lemma to make a specific proof in this document rigorous. The proof is due to Marcus Pappik (private communication).
Lemma 9.17. Let $Y, X_0, …, X_t$ be random variables and let $Z$ be any version of the conditional expectation $\Ew{Y \mid X_0, …, X_t}$. For all $s_0, …, s_t \in \realnum$ such that $\Pr{X_0 = s_0, …, X_t = s_t} > 0$ and all $\omega \in \{X_0 = s_0, …, X_t = s_t\}$ it holds that $$ Z(\omega) = \Ew{Y \mid X_0 = s_0, …, X_t = s_t}. $$
Proof. We start by proving the following claim.
Claim. Consider two measurable spaces $(\Omega, \mathcal{F})$ and $(S, \mathcal{S})$ such that, for all $s \in S$, $\{s\} \in \mathcal{S}$. Let $X, Z\colon \Omega \to S$ be measurable. If $Z$ is $\sigma(X)$-measurable then $Z$ is constant on $\{X = s\}$ for every $s \in S$.
Proof of claim. For the sake of contradiction, assume the statement is false. That is, suppose $Z$ is $\sigma(X)$-measurable and there is some $s_0 \in S$ such that $Z$ is not constant on $A_0 := \{X=s_0\}$. Let $s_1, s_2 \in S$ be distinct values of $Z$ on $A_0$. Since $Z$ is $\sigma(X)$-measurable, it holds that $A_1 := \{Z = s_1\} \in \sigma(X)$. Consequently, we have that $A_0 \cap A_1 \in \sigma(X)$ and, by construction, $\emptyset \subset A_0 \cap A_1 \subset A_0$. We now show that these two properties of $A_0 \cap A_1$ pose a contradiction. To this end, suppose there exists any set $A \subseteq \Omega$ with $\emptyset \subset A \subset A_0$ and $A \in \sigma(X)$. Note that $$ \sigma(X) = \{X^{-1}(B) \mid B \in \mathcal{S}\} . $$ Hence, if $A \in \sigma(X)$, then $A = X^{-1}(B)$ for some $B \in \mathcal{S}$. Since $A$ is non-empty and $A \subseteq A_0$, there is an $\omega \in A$ with $X(\omega) = s_0$, so $s_0 \in B$. But then $A_0 = X^{-1}(\{s_0\}) \subseteq X^{-1}(B) = A$, contradicting $A \subset A_0$. Hence, no such set $A$ exists. End of proof of claim.
To prove the lemma, suppose now that $Z$ is any version of $\Ew{Y \mid X_0, …, X_t}$ and let $A = \{X_0 = s_0, …, X_t = s_t\}$ for any $s_0, …, s_t \in \realnum$ with $\Pr{A} \gt 0$. Since $Z$ is $\sigma(X_0, …, X_t)$-measurable, the claim above yields that $Z$ is constant on $A$. Let $z$ be the value of $Z$ on $A$ and note that, for all $\omega \in A$, we have $$ Z(\omega) \cdot \Pr{A} = \Ew{z \cdot \ind{A}} = \Ew{Z \cdot \ind{A}} = \Ew{Y \cdot \ind{A}} , $$ where the first equality is due to $Z(\omega) = z$ and $\Pr{A} = \Ew{\ind{A}}$, the second equality follows from the fact that $z \cdot \ind{A} = Z \cdot \ind{A}$ point-wise, and the last equality is due to the fact that $Z$ is a version of $\Ew{Y \mid X_0, …, X_t}$. Further, by Lemma 9.13 [Conditional and Indicator], we have, using $\Pr{A} \gt 0$, $$ \Ew{Y \cdot \ind{A}} = \Ew{Y \mid A} \cdot \Pr{A} . $$ Combining both steps yields $$ Z(\omega) \cdot \Pr{A} = \Ew{Y \mid A} \cdot \Pr{A}. $$ Since $\Pr{A} \gt 0$, this implies $Z(\omega) = \Ew{Y \mid A}$ for all $\omega \in A$, which proves the lemma.

Creative Commons License Agreement
This work is licensed as Creative Commons Attribution - ShareAlike 4.0 International.
All mathematical statements in this script are freely usable, of course. As a collection and with comments, they are licensed as stated above.
