This post works through highlights of Aaditya Ramdas' 2018 minicourse on martingales at Carnegie Mellon University [1], with some supplemental information taken from Durrett [2] and some definitions from Wikipedia. Additional definitions are from Shafer et al. [3] and Ramdas et al. [4].
This is mostly for my own benefit because I have trouble conceptualizing (and then remembering) many topics in stochastic processes. The main topic of this post will be the martingale (to be more rigorously defined later). We can think of a martingale as the amount of wealth a player has at a given round (time point) in a betting game.
Definitions
We now continue with several “building block” definitions covered in Ramdas et al. [4]. We start with the $p$-process, which is sometimes called the anytime-valid $p$-value in other works.
Let $H_0$ be some hypothesis. A $p$-process is any sequence of $p$-values $(p_t)_{t \in T}$ (with index set $T$) such that $\mathbb{P}_{H_0}(p_\tau \leq a) \leq a$ for all $a \in [0, 1]$ for any arbitrary stopping time, $\tau$.
Intuitively, a $p$-process is simply a sequence of $p$-values satisfying the condition that, if we were to stop our sequence at any time (and the stopping time can be whatever we want — even data-dependent!), then the probability that the $p$-value is, at most, any value in $[0, 1]$ cannot exceed that value (kinda like calibration).
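To see why this is a genuinely strong requirement, here's a minimal simulation (my own illustration; the Gaussian null and all parameters are arbitrary choices). Under the null, the sequence of fixed-sample $z$-test $p$-values is *not* a $p$-process: if we stop at the first time $p_t \leq a$, we reject far more often than $a$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_sims, horizon, alpha = 2000, 500, 0.05

rejections = 0
for _ in range(n_sims):
    x = rng.standard_normal(horizon)   # data generated under H0: mean zero
    z = np.cumsum(x) / np.sqrt(np.arange(1, horizon + 1))  # running z-statistic
    p = 2 * norm.sf(np.abs(z))         # fixed-sample two-sided p-values
    # data-dependent stopping: stop the first time p_t <= alpha (if ever)
    rejections += np.any(p <= alpha)

print(f"P(p_tau <= {alpha}) is roughly {rejections / n_sims:.3f}, far above {alpha}")
```

A genuine $p$-process, by contrast, keeps this stopped rejection rate at or below $a$ no matter which stopping rule we use.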
Let $H_0$ be some hypothesis. An $e$-process is any non-negative sequence $(e_t)_{t \in T}$ (with index set $T$) such that $\mathbb{E}_{H_0}[e_\tau] \leq 1$ for any arbitrary stopping time, $\tau$.
Examples of $e$-processes include test martingales and test supermartingales.
Related to the $e$-process is the $e$-value.
Let $H_0$ denote a null hypothesis about the distribution of our data $X = (X_1, \dots, X_\tau)$, which is a sequence of observations with stopping time/sample size $\tau$. An $e$-variable or $e$-statistic is a non-negative random variable $E = E(X)$ such that, for all $P \in H_0$, $\mathbb{E}_P[E] \leq 1$. An $e$-value is a value taken on by $E$, but is commonly used to refer to the $e$-statistic itself.
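The canonical example of an $e$-variable is a likelihood ratio: for any alternative density $q$ and null density $p$, $\mathbb{E}_P[q(X)/p(X)] = \int q = 1$, so the constraint holds with equality. Here's a quick Monte Carlo sketch (the Gaussian null and alternative are arbitrary choices of mine):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_sims, n_obs = 100_000, 10

# H0: X_i ~ N(0, 1); illustrative alternative density q: N(0.5, 1)
x = rng.standard_normal((n_sims, n_obs))        # data drawn under the null P
e = (norm.pdf(x, loc=0.5) / norm.pdf(x, loc=0.0)).prod(axis=1)

# E(X) = prod_i q(X_i)/p(X_i) is an e-variable: its mean under P is exactly 1
print(f"Monte Carlo estimate of E_P[E]: {e.mean():.3f}")
```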
We now turn to ideas in stochastic processes.
Let $X_1, X_2, \dots$ denote independent and identically distributed random variables in $\mathbb{R}$. The sum of the first $n$, $S_n = X_1 + X_2 + \dots + X_n$, is a random walk.
A random walk is a very basic stochastic process. If the walk satisfies the condition that $\mathbb{P}(X_i = 1) = \mathbb{P}(X_i = -1) = \frac{1}{2}$ (i.e. the random variables take values in $\{ -1, 1 \}$ equiprobably), then $S_n$ is a simple random walk.
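A simple random walk takes only a couple of lines to simulate (a minimal sketch; the seed and number of steps are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n_steps = 1000

# increments X_i taking values in {-1, +1} with probability 1/2 each
x = rng.choice([-1, 1], size=n_steps)
s = np.cumsum(x)   # S_n = X_1 + ... + X_n: a simple random walk

print(f"S_{n_steps} = {s[-1]} (mean 0, variance n = {n_steps})")
```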
Suppose we want to conceptualize the information we have at time $n$. One way to do so would be to consider the $\sigma$-field generated by $X_1, \dots, X_n$. This leads to the filtration.
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $I$ be an index set equipped with a total order denoted by $\leq$. For every $i \in I$, let $\mathcal{F}_i$ be a sub-$\sigma$-field of $\mathcal{F}$. A filtration, denoted by $\mathbb{F} = (\mathcal{F}_i)_{i \in I}$, is the family of all such $\mathcal{F}_i$ such that $\mathcal{F}_k \subseteq \mathcal{F}_l$ for all $k \leq l$.
Intuitively, a filtration is a collection of sub-$\sigma$-fields of $\mathcal{F}$, non-decreasing in their ordering, that captures the information we have up to a given point (usually in time). We can kind of think about filtrations as keeping track of all of the questions we can answer about our process.
At timepoint $t=0$, we know very little about our stochastic process. We know only that each $X_t$ takes some value in the state space $S$, so the only questions we can answer are trivial ones. Thus, \(\mathcal{F}_0 = \{ \emptyset, \Omega \}\), where we can think of $\Omega = S^{\lvert T \rvert}$, the $\lvert T \rvert$-ary Cartesian power of $S$.
At timepoint $t = 1$, we gain the knowledge of the outcome of $X_1$, and we can therefore answer any question that is only about $X_1$. Some simple examples include “Is $X_1 = 0, 1$ or $2$?” or “Is $X_1$ odd?”. Thus, \(\mathcal{F}_1 = \{ A \times S^{\lvert T \rvert - 1} \mid A \subseteq S \}\) (in the discrete case; in general, $A$ ranges over the measurable subsets of $S$). In words, $\mathcal{F}_1$ allows us to answer any question about $X_1$ but nothing about the rest of the process.
This sort of relationship continues for all of the sub-$\sigma$-fields in our filtration. Each successive sub-$\sigma$-field \(\mathcal{F}_{t + 1}\) contains all of the information about our stochastic process's development up to time $t + 1$. From our explanation for $t=1$, it is clear that \(\mathcal{F}_t \subseteq \mathcal{F}_{t+1}\). However, $\mathcal{F}_{t+1}$ contains finer-grained subsets, since we gain knowledge about an additional timepoint in our process.
Suppose we are flipping a coin two times. Let the tuple $(\Omega, \mathcal{F}, P)$ denote the probability space upon which the random variables representing these flips are defined. The sample space, $\Omega$, is the set $\{ HH, HT, TH, TT \}$.
At $t = 0$, $\mathcal{F}_0 = \{ \emptyset, \Omega \}$, since we don't know anything about what the outcomes will be.
At $t = 1$, we observe one coin flip, which has to either come out as heads or tails. Thus, $\mathcal{F}_1 = \{ \emptyset, \Omega, \{ HH, HT \}, \{ TH, TT \} \}$. We can interpret the additional two sets as showing us the possible outcomes once we have observed the first flip. $\{ HH, HT \}$ states that, once we see an $H$, the only possible final results are $HH$ or $HT$.
At $t = 2$, we observe the second coin flip, so we know everything about this process. Similar to the case with $\mathcal{F}_1$, after we see both flips, only the outcome we observed remains possible, so we add all singleton sets; closure under unions and complements then gives every subset of $\Omega$. Thus, $\mathcal{F}_2 = \mathcal{P}(\Omega)$, the power set of the sample space, containing all $2^4 = 16$ subsets.
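Since everything here is finite, we can compute these $\sigma$-fields explicitly: in the finite case, the $\sigma$-field generated by a partition is just the collection of all unions of its blocks. Here's a small sketch (the `sigma_field` helper is my own, for illustration):

```python
from itertools import combinations

def sigma_field(partition):
    """All unions of blocks of a partition of a finite sample space."""
    events = set()
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            events.add(frozenset().union(*blocks))  # r = 0 gives the empty set
    return events

omega = ["HH", "HT", "TH", "TT"]
# partitions induced by the information available at t = 0, 1, 2
F0 = sigma_field([frozenset(omega)])                                  # know nothing
F1 = sigma_field([frozenset({"HH", "HT"}), frozenset({"TH", "TT"})])  # saw flip 1
F2 = sigma_field([frozenset({w}) for w in omega])                     # saw both flips

print(len(F0), len(F1), len(F2))  # 2 4 16
```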
The natural filtration is the filtration in which each $\mathcal{F}_t$ is the $\sigma$-field generated by $(X_s)_{s \leq t}$.
Let $X: T \times \Omega \rightarrow S$ be a stochastic process on probability space $(\Omega, \mathcal{F}, P)$ with (measurable) state space $(S, \Sigma)$. The natural filtration of $\mathcal{F}$ with respect to $X$ is the filtration $\mathbb{F}^X = (\mathcal{F}_t^X)_{t \in T}$ where: $$ \mathcal{F}_t^X = \sigma\left(X^{-1}_j(A) \mid j \in T,\ j \leq t,\ A \in \Sigma\right) \nonumber $$
A stochastic process \((X_t)_{t \in T}\) is said to be adapted to the filtration $\mathbb{F}$ if $X_t: \Omega \rightarrow S$ is an $(\mathcal{F}_t, \Sigma)$-measurable function for all $t \in T$.
For a filtration $\mathcal{F}_n$ with $n \geq 0$, we call a sequence $H_n$ for $n \geq 1$ predictable if $H_n$ is \(\mathcal{F}_{n-1}\)-measurable for all $n \geq 1$.
Properties
Sometimes we need additional assumptions to hold on a given filtration. One is right-continuity, and another is completeness.
Let $\mathbb{F} = (\mathcal{F}_t)_{t \in T}$ be a filtration. If, for all $t \in T$, $\mathcal{F}_t = \bigcap_{s > t} \mathcal{F}_s$, then we call $\mathbb{F}$ right-continuous.
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $N_P = \{ A \subseteq \Omega \mid A \subseteq B \text{ for some } B \in \mathcal{F} \text{ with } P(B) = 0 \}$ be the collection of all subsets of $P$-null sets. We call a filtration $(\mathcal{F}_t)_{t \in T}$ complete if $N_P \subseteq \mathcal{F}_t$ for all $t$.
Equivalently, $(\mathcal{F}_t)_{t \in T}$ is complete if $(\Omega, \mathcal{F}_t, P)$ is a complete measure space for all $t$.
We can define the $P$-completion of a $\sigma$-field $\mathcal{F}$ as the $\sigma$-field generated by $\mathcal{F}$ together with all sets $E \subseteq \Omega$ contained in some $B \in \mathcal{F}$ with $P(B) = 0$.
A probability space $(\Omega, \mathcal{F}, P)$ equipped with filtration \(\mathbb{F} = (\mathcal{F}_t)_{t \geq 0}\) where $\mathcal{F}_t$ is a sub-$\sigma$-field of $\mathcal{F}$ is called a filtered probability space.
Stochastic processes can also be described by a stopping time, which is another random variable that describes when a stochastic process will display some phenomenon or behavior.
Let $(\Omega, \mathcal{F}, \mathbb{F}, P)$ be a filtered probability space. Let $\tau$ be a random variable defined on this space and taking values in index set $T$. $\tau$ is a stopping time (with respect to $\mathbb{F} = (\mathcal{F}_t)_{t \in T}$) if $\{ \tau \leq t \} \in \mathcal{F}_t$ for all $t \in T$.
In simpler terms, $\tau \in \{ 0, 1, 2, \dots \} \cup \{ \infty \}$ is a stopping time if, for any $n \in \mathbb{N}$, the event $\{ \tau = n \}$ is entirely known just from the information up until time $n$ (i.e. determined by $X_1, \dots, X_n$). This can be written as $\{ \tau = n \} \in \mathcal{F}_n$.
Let $X_1, X_2, \dots$ be random variables in $\mathbb{R}$. Let $S_n = X_1 + \dots + X_n$. The random variable defined by: $$ \tau = \inf\{ t \mid S_t \geq c \} \nonumber $$ for $c \in \mathbb{R}$ is a stopping time.
Constant times (e.g. $\tau = c$ for some $c \in \{ 0, 1, 2, \dots \} \cup \{ \infty \}$) are stopping times, and the minimum and maximum of two stopping times are also valid stopping times.
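Here's a quick simulation of the first-passage example above, estimating how often (and when) a simple random walk reaches the threshold (the values of $c$, the horizon, and the number of paths are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
c, horizon, n_paths = 10, 5000, 1000

taus = []
for _ in range(n_paths):
    s = np.cumsum(rng.choice([-1, 1], size=horizon))
    hits = np.nonzero(s >= c)[0]
    # tau = inf{t | S_t >= c}; the infimum of the empty set is +infinity
    taus.append(hits[0] + 1 if hits.size else np.inf)

taus = np.array(taus)
finite = np.isfinite(taus)
print(f"P(tau <= {horizon}) is roughly {finite.mean():.2f}; "
      f"median finite tau = {np.median(taus[finite]):.0f}")
```

Note that deciding whether $\{ \tau = n \}$ occurred only requires $S_1, \dots, S_n$, which is exactly the stopping-time condition.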
Martingales
Related to stochastic processes and filtrations is the martingale. Here I’ll cover some basic definitions, but interested readers can see my post on martingales for more details.
Let $S$ be a Banach space with norm $\lVert \cdot \rVert_S$. A stochastic process $X: T \times \Omega \rightarrow S$ on $(\Omega, \mathcal{F}, P)$ with state space $(S, \Sigma)$ is called a martingale with respect to the filtration $\mathbb{F} = \{ \mathcal{F}_t: t \in T \}$ and probability measure $P$ if:
- $\mathbb{F}$ is a filtration of $(\Omega, \mathcal{F}, P)$
- $X$ is adapted to $\mathbb{F}$ (i.e. $X_t$ is $\mathcal{F}_t$-measurable for all $t \in T$)
- For all $t \in T$, $\mathbb{E}_P[ \lVert X_t \rVert_S ] < +\infty$
- For all $s, t \in T$ with $s < t$, and for all $F \in \mathcal{F}_s$, $\mathbb{E}_P[(X_t - X_s)\mathbf{1}_F] = 0$

For real-valued processes in discrete time, the last two conditions reduce to the familiar pair:

- $\mathbb{E}[\lvert X_t \rvert] < \infty$
- $\mathbb{E}[X_{t+1} \mid X_1, \dots, X_t] = X_t$
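As a sanity check on this definition, the simple random walk from earlier is a martingale with respect to its natural filtration, since its increments are i.i.d. with mean zero. A minimal empirical check (conditioning only on the current value $S_t$, which suffices here by the Markov property; the timepoint and levels are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, t = 200_000, 20

x = rng.choice([-1, 1], size=(n_paths, t + 1))
s = np.cumsum(x, axis=1)   # s[:, k] is S_{k+1}

# average S_{t+1} over paths sharing the same current value S_t
for level in (-4, 0, 4):
    mask = s[:, t - 1] == level
    print(f"E[S_{t + 1} | S_{t} = {level:+d}] is roughly {s[mask, t].mean():+.3f}")
```

Each conditional average lands near the conditioning value itself, which is exactly the martingale property.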
A relaxation of the condition of a martingale leads to the following object used frequently in game theoretic statistics.
Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $( X_t )_{t \in T}$ be a stochastic process (with index set $T = \{ 0, 1, \dots \}$ or $T = [0, \infty)$) defined on this space, and let $\mathbb{F} = ( \mathcal{F}_t )_{t \in T}$ be a filtration of sub-$\sigma$-fields of $\mathcal{F}$ (i.e. $\mathcal{F}_t \subseteq \mathcal{F}$ for all $t \in T$). $(X_t, \mathcal{F}_t)$ is a supermartingale if:
- $X_t$ is $\mathcal{F}_t$-measurable for all $t \in T$ (i.e. $(X_t)_{t \in T}$ is adapted to $\mathbb{F}$)
- $X_t$ is integrable for all $t \in T$
- $\mathbb{E}[X_t \mid \mathcal{F}_s] \leq X_s$ almost surely, for all $s, t \in T$ with $s < t$
More specifically, game theoretic statistics makes use of the test supermartingale.
A supermartingale $(X_t, \mathcal{F}_t)_{t \in T}$ is called a test supermartingale if $X_t \geq 0$ for all $t$ and $\mathbb{E}[X_0] \leq 1$ (some authors require $X_0 = 1$ exactly; the two conventions are essentially interchangeable).
Similarly, a test martingale is a non-negative martingale such that $\mathbb{E}[X_0] = 1$.
Intuitively, a test supermartingale describes the cumulative gain or loss of a player in a game where they bet against some hypothesis described by $P$. The initial condition specifies the amount of capital the player begins with (1 dollar), and the non-negativity condition ensures the player never loses (cumulatively) more than the initial dollar they started with.
If the player has a lot of money at round $t$ (i.e. large $X_t$), then they have evidence against $P$ being true. This interpretation makes sense when you think about real betting. Winning big is rare at a casino, so having a large payout is associated with low probability events in a game.
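Here's a sketch of this betting story (the fair-coin null and the bet fraction $\lambda$ are my own illustrative choices): start with $1$ dollar and, at each round, bet a fixed fraction $\lambda$ of current wealth on heads, so the wealth after $t$ rounds is $X_t = \prod_{i \leq t}(1 + \lambda \varepsilon_i)$ with $\varepsilon_i = \pm 1$. Under $H_0$ (a fair coin) this is a test martingale; if the coin is actually biased toward heads, the wealth tends to grow.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n_rounds, n_paths = 0.1, 200, 20_000

def wealth(p_heads):
    # eps_i = +1 on heads, -1 on tails; X_t = prod_{i <= t} (1 + lam * eps_i)
    eps = np.where(rng.random((n_paths, n_rounds)) < p_heads, 1.0, -1.0)
    return np.cumprod(1 + lam * eps, axis=1)

x_null = wealth(0.5)  # H0 true (fair coin): a test martingale, E[X_t] = 1
x_alt = wealth(0.6)   # H0 false (biased coin): wealth grows exponentially

print(f"under H0:  E[X_T] is roughly {x_null[:, -1].mean():.2f}")
print(f"under alt: median X_T is roughly {np.median(x_alt[:, -1]):.1f}")
```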
One property of test supermartingales is the maximal inequality:
\[P\left( \underset{t \geq 0}{\sup}\, X_t \geq c \right) \leq \frac{1}{c} \hspace{5mm} \forall c \geq 1 \nonumber\]

This inequality states that the probability (with respect to the probability measure, $P$, of our probability space) that our test supermartingale ever reaches some value $c \geq 1$ is inversely related to the magnitude of $c$. This implies that the probability that a player earns an infinite amount of money in the game is zero.
A very important result in probability theory is Ville's inequality [5], which upper bounds the probability that a supermartingale ever gets at least as big as some chosen value.
Let $(X_n)_{n = 0}^\infty$ be a non-negative supermartingale. For any positive $a \in \mathbb{R}$: $$ \mathbb{P}\left( \underset{n \geq 0}{\sup} X_n \geq a \right) \leq \frac{\mathbb{E}[X_0]}{a} \nonumber $$
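As a quick Monte Carlo sanity check of Ville's inequality, we can reuse the fair-coin wealth process from the betting sketch above (a non-negative martingale with $\mathbb{E}[X_0] = 1$) and compare hitting frequencies to the bound:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, n_rounds, n_paths = 0.1, 500, 20_000

eps = np.where(rng.random((n_paths, n_rounds)) < 0.5, 1.0, -1.0)  # fair coin
x = np.cumprod(1 + lam * eps, axis=1)      # non-negative martingale, X_0 = 1
sup_x = np.maximum(1.0, x.max(axis=1))     # running supremum, including X_0

for a in (2, 5, 10, 20):
    print(f"P(sup X_n >= {a:>2}) is roughly {(sup_x >= a).mean():.4f} "
          f"vs Ville bound {1 / a:.4f}")
```

The empirical frequencies sit at or below $\mathbb{E}[X_0]/a = 1/a$, as the inequality requires.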
References
1. Ramdas, A. (2018). Minicourse on martingales. Carnegie Mellon University. https://www.stat.cmu.edu/~aramdas/martingales18/
2. Durrett, R. (n.d.). Probability: Theory and Examples.
3. Shafer, G., Shen, A., Vereshchagin, N., & Vovk, V. (2011). Test Martingales, Bayes Factors and $p$-Values. Statistical Science, 26(1). https://doi.org/10.1214/10-STS347
4. Ramdas, A., Ruf, J., Larsson, M., & Koolen, W. (2022). Admissible anytime-valid sequential inference must rely on nonnegative martingales (arXiv:2009.03167). arXiv. https://doi.org/10.48550/arXiv.2009.03167
5. Wikipedia contributors. (2024, March 13). Ville's inequality. In Wikipedia, The Free Encyclopedia. Retrieved May 22, 2025, from https://en.wikipedia.org/w/index.php?title=Ville%27s_inequality&oldid=1213448789