1 What is Stochastic Drift?

Suppose that you win a million dollars in a lottery and that you start spending your winnings. You observe that you spend on average 10.000 dollars per day. How long will your lottery winnings last? Intuitively, you would divide the million you won by 10.000 and estimate that your winnings would last for 100 days. But that feels like confusing a random process with a deterministic one. Well, yes, but the good news are: There is a theorem that tells us that 100 days is the mathematically precise answer, even when the process is randomized. Even better, if you gain money on some days (say, by playing in a casino) but still, in expectation, your balance goes down by 10.000 per day, the conclusion still holds. There can even be dependencies between the earnings and spendings of different days (say, on a day after a big earning, you gamble higher amounts than otherwise). The theorem which shows that this is the case is called the additive drift theorem (see Theorem 2.1 [Additive Drift, Upper Bound]). The term drift refers to the difference between two successive values of the process, and the term additive refers to the requirement that the drift is, in expectation, bounded by an additive constant.
A similar setting to that of the process described above is the well-known coupon collector process, defined as follows. Suppose you want to collect coupons until you have one of each of $n$ different colors. Each day you get one coupon the color which is chosen uniformly at random and independent of the other days; in particular, you may gain coupons of a color which you already received before. How long does it take until you have a complete set of at least one coupon of each color? For the analysis, note that in the first iteration you get a new color of coupon with certainty (since you do not have any coupons yet). This changes over time: once you already collected exactly half of the colors, the probability of gaining a new one is only one in two. Once you have already gained 90 percent, it is down to one in ten, and so on. Or, flipped around: if you are only half the way from your goal, you only have half the chance of making progress, and if you are 10 percent away from the goal, you make 10 percent of the progress. This is a multiplicative expected progress (the progress is a multiple of the current state of the process) and the multiplicative drift theorem (see Theorem 2.5 [Multiplicative Drift]) can be used to analyze exactly this setting. The theorem also holds when the number of coupons gained in each iteration is random (for example, if I gain every color of coupon with a random chance of some value $p$ in each iteration), and it even holds when there is a possibility of losing coupons. Furthermore, it also gives an upper concentration bound.
As these two examples show, we analyze the expected progress of a single step of the given random process in order to find the first time the random process reaches a target state (the so-called “first-hitting time”). This brings us to the following description of drift theory:

Drift theory is a collection of theorems to turn iteration-wise expected gains into expected first-hitting times.

The first drift theorem, the additive drift theorem, was introduced by He and Yao [HY01Drift analysis and average time complexity of evolutionary algorithms. Jun He, Xin Yao. Artif. Intell. 2001.], based on an intricate theorem by Hajek [Haj82Hitting-time and occupation-time bounds implied by drift analysis with applications. Bruce Hajek. Advances in Applied Probability 1982.]. He and Yao applied their theorem in the context of analyzing randomized search heuristics (RSHs), such as evolutionary algorithms (EAs), which work by the principle of variation (mutating solutions by random changes) and selection (accepting improvements and rejecting worsenings). Drift theory gained a lot of traction in the EA theory community after the multiplicative drift theorem was introduced by Doerr, Johannsen, and Winzen [DJW10Multiplicative drift analysis. Benjamin Doerr, Daniel Johannsen, Carola Winzen. GECCO 2010.]. Their proof used additive drift, but a proof not relying on Hajek's result was given shortly after by Doerr and Goldberg [DG10Drift Analysis with Tail Bounds. Benjamin Doerr, Leslie Ann Goldberg. PPSN (1) 2010.]. Since then, drift theory has been the dominant method for formally analyzing RSHs, easing their analysis significantly over analyses not arguing via drift. For example, the main result of Droste [Dro04Analysis of the (1+1) EA for a Noisy OneMax. Stefan Droste. GECCO (1) 2004.] on noisy optimization, spanning an entire paper, was reproven in a more general form by Giessen and Kötzing [GK16Robustness of Populations in Stochastic Environments. Christian Gießen, Timo Kötzing. Algorithmica 2016.] on a single page.
Drift theorems find application in the analysis of a plethora of different settings, ranging from randomized optimization over approximation algorithms to further stochastic processes (see Section 2 [A Gentle Introduction to Classic Drift Theorems] and the second part of Section 4 [Going Nowhere: Drift Without Drift]). Key to the applicability is to model the problem as a search for the time until a random process reaches a target state. However, in spite of the versatility of drift theorems, there are only very few results outside of the theory of RSHs applying drift theory [BLM+20An Optimal Decentralized (Δ + 1)-Coloring Algorithm. Daniel Bertschinger, Johannes Lengler, Anders Martinsson, Robert Meier, Angelika Steger, Milos Trujic, Emo Welzl. ESA 2020., GLR20Phase transitions of the Moran process and algorithmic consequences. Leslie Ann Goldberg, John Lapinskas, David Richerby. Random Struct. Algorithms 2020., KU18Brief Announcement: Population Protocols Are Fast. Adrian Kosowski, Przemyslaw Uznanski. PODC 2018.]. Given the versatility of the approach within the area of randomized optimization, it is likely that a higher visibility of these theorems could benefit further research communities.

1.1 A Guide to this Document

This document presents an overview over drift theory. What theorems are available? How can they be applied? What pitfalls abound when using drift theory? Concretely, the contents of this document are as follows.
Section 2 [A Gentle Introduction to Classic Drift Theorems] presents the two already mentioned drift theorems (additive and multiplicative) formally. These are by far the two most important drift theorems and the section includes many examples of their use from a diverse range of settings.
Section 3 [The Art of Potential Functions] discusses the most important technique of making drift theorems applicable: With potential functions, random processes can be mapped to fulfill the requirements of drift theorems, and this section discusses heuristics of how to do this.
One main example for how potential functions can be used to make a process exhibit drift is the analysis of unbiased random walks. These walks have, by definition, a drift of $0$, but nonetheless drift theory can be used to analyze such processes. This is detailed in Section 4 [Going Nowhere: Drift Without Drift].

For researchers new to the area, Sections 2 to 4 give a brief but well-rounded introduction to the field. Further sections provide deeper material, extending the applicability and discussing the finer points of drift theory.

Section 5 [The Zoo: A Tour of Drift Theorems] provides a long list of available drift theorems, including the famous variable drift theorem, as well as many other drift theorems tailored to various settings of drift. Some example applications and discussions on the relation between the different theorems give an overview of the currently available drift theorems.
A special case of drift is exhibited by monotone processes (processes that cannot go back). This is a specific branch of analysis which was developed independently of the other drift theorems; we discuss the corresponding theorems in Section 6 [No Going Back: The Fitness Level Method (FLM)].
While classic drift theorems give statements about how long it takes for a process to reach a certain state, the dual question is to ask what state to expect after a given number of iterations. Also this area has theorems similar to drift theorems, and we discuss them in Section 7 [A Different Perspective: Fixed Budget Optimization].
In Section 8 [Drift as an Average: A Closer Look on the Conditioning of Drift] we consider the very technical side of drift theorems. We contrast and discuss different ways in stating the drift theorems and point to pitfalls in applying drift theorems without checking all conditions.
Finally, Section 9 [Notation] introduces some notation used in this work, before the author gives acknowledgments in Section 10 [Acknowledgments].

This work is licensed as Creative Commons Attribution - ShareAlike 4.0 International.
All mathematical statement in this script are freely usable, of course. As a collection and with comments they are licensed as stated above.