Measures
In this post, I’ll be going through Chapter 2 in Axler.
As before, we’ll let $\mathbb{R}$ denote the complete ordered field of real numbers.
First, we begin with the idea of length for open intervals in $\mathbb{R}$.
We can use this (pretty intuitive!) definition of length to define the outer measure, which gives us a way to describe how big a set is.
The outer measure satisfies some properties that make it nice to work with.
Every countable subset of $\mathbb{R}$ has outer measure $0$.
Let $A \subseteq \mathbb{R}$ be a countable subset with $A = \{ a_1, a_2, \dots \}$. Let $\epsilon > 0$ be an arbitrary positive real number. For any positive integer \(k \in \mathbb{Z}_+\), we define:
\[I_k = \left(a_k - \frac{\epsilon}{2^k}, a_k + \frac{\epsilon}{2^k} \right)\]
which is the open interval centered at $a_k$ with length $\frac{\epsilon}{2^{k-1}}$. Clearly, the union of $I_1, I_2, \dots$ contains $A$. Furthermore, notice that:
\[\sum_{k = 1}^\infty \ell(I_k) = \sum_{k = 1}^\infty \frac{\epsilon}{2^{k-1}} = 2\epsilon \sum_{k = 1}^\infty \frac{1}{2^k} = 2 \epsilon\]
Thus $\rvert A \rvert \leq 2 \epsilon$, because the outer measure is the infimum of such sums over all sequences of open intervals covering $A$. Since $\epsilon > 0$ was arbitrary, this implies $\rvert A \rvert = 0$.
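As a quick illustration of this result (a standard example, not taken from the text above), the rationals $\mathbb{Q}$ are countable, so the same covering argument applies to them even though they are dense in $\mathbb{R}$. Writing $\mathbb{Q} = \{ q_1, q_2, \dots \}$ (the enumeration is an assumption of this sketch), we get:
\[\rvert \mathbb{Q} \rvert \leq \sum_{k = 1}^\infty \ell\left( \left( q_k - \frac{\epsilon}{2^k}, q_k + \frac{\epsilon}{2^k} \right) \right) = 2 \epsilon \hspace{5mm} \text{for every } \epsilon > 0 \implies \rvert \mathbb{Q} \rvert = 0\]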
Order Preservation
Let $I_1, I_2, \dots$ be a sequence of open intervals in $\mathbb{R}$ whose union contains $B$. Since $A \subseteq B$, this union also contains $A$. Thus:
\[\rvert A \rvert \leq \sum_{k = 1}^\infty \ell(I_k)\]Since the above holds for all such $I_1, I_2, \dots$, it will also hold for the infimum. Thus:
\[\rvert A \rvert \leq \underset{I_1, I_2, \dots}{\inf} \left\{ \sum_{k = 1}^{\infty} \ell(I_k) \right\} = \rvert B \rvert\]
Translation Invariance
Let $I_1, I_2, \dots$ be a sequence of open intervals whose union contains $A$. Clearly, the union of the sequence of translated open intervals $t + I_1, t + I_2, \dots$ will contain $t + A$. Thus:
\[\rvert t + A \rvert \leq \sum_{k = 1}^\infty \ell(t + I_k) = \sum_{k = 1}^\infty \ell(I_k)\]
since translation will not affect the length of the intervals. We can then take the infimum over the right-hand side to see that:
\[\rvert t + A \rvert \leq \underset{I_1, I_2, \dots}{\inf} \left\{ \sum_{k = 1}^{\infty} \ell(I_k) \right\} = \rvert A \rvert\]Now, we notice that $A = t + (-t + A)$. We can repeat the above proof with $(-t + A)$ as $A$ and $-t$ as $t$ to see that:
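\[\rvert t + (-t + A) \rvert \leq \underset{I_1, I_2, \dots}{\inf} \left\{ \sum_{k = 1}^{\infty} \ell(I_k) \right\} = \rvert -t + A \rvert\]
Since $t + (-t + A) = A$, this says $\rvert A \rvert \leq \rvert -t + A \rvert$ for every $t \in \mathbb{R}$; replacing $t$ with $-t$ gives $\rvert A \rvert \leq \rvert t + A \rvert$. This shows the other direction, and so we conclude that $\rvert t + A \rvert = \rvert A \rvert$.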
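Countable Subadditivity
Assume $\rvert A_k \rvert < \infty$ for all \(k \in \mathbb{Z}_+\), and let $\epsilon > 0$. For each \(k \in \mathbb{Z}_+\), choose a sequence of open intervals $I_{1,k}, I_{2,k}, \dots$ whose union contains $A_k$ and that satisfies $\sum_{j = 1}^\infty \ell(I_{j,k}) \leq \rvert A_k \rvert + \frac{\epsilon}{2^k}$ (this is possible because $\rvert A_k \rvert$ is an infimum). Whether or not some of the $A_k$ overlap, we have the inclusion:
\[\cup_{k = 1}^\infty A_k \subseteq \cup_{k = 1}^\infty \cup_{j = 1}^\infty I_{j, k}\]
The doubly indexed collection of intervals $I_{j,k}$ is countable, so it can be arranged into a single sequence of open intervals whose union contains $\cup_{k = 1}^\infty A_k$. By the definition of the outer measure, we have
\[\rvert \cup_{k = 1}^\infty A_k \rvert \leq \sum_{k = 1}^\infty \sum_{j = 1}^\infty \ell(I_{j, k}) \leq \sum_{k = 1}^\infty \left( \rvert A_k \rvert + \frac{\epsilon}{2^k} \right) = \epsilon + \sum_{k = 1}^\infty \rvert A_k \rvert\]
Since $\epsilon > 0$ was arbitrary, this yields the result. Note that if the outer measure of any $A_k$ is not finite, then the right-hand side is infinite and the claim clearly holds.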
Note that, since finite sets are also countable, the countable subadditivity property implies finite subadditivity of the outer measure.
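To make this reduction explicit, here is a short sketch: given finitely many sets $A_1, \dots, A_n \subseteq \mathbb{R}$, we can pad the list with empty sets (setting $A_k = \emptyset$ for $k > n$ is our choice for the illustration) and use that $\rvert \emptyset \rvert = 0$, since the empty set is countable:
\[\rvert A_1 \cup \dots \cup A_n \rvert = \rvert \cup_{k = 1}^\infty A_k \rvert \leq \sum_{k = 1}^\infty \rvert A_k \rvert = \sum_{k = 1}^n \rvert A_k \rvert\]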
Before we can discuss the outer measure of closed (bounded) intervals, we need to introduce a few more tools.
As an example, consider $A = [2, \infty)$. The collection $\mathcal{C} = \{ (k, k + 2) \rvert k \in \mathbb{Z}_+ \}$ is an open cover of $A$; however, $\mathcal{C}$ does not have a finite subcover of $A$.
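One way to see why no finite subcover exists (a sketch, with $k_1, \dots, k_m$ and $K$ introduced here just for illustration): take any finite subcollection $(k_1, k_1 + 2), \dots, (k_m, k_m + 2)$ of the cover and let $K = \max \{ k_1, \dots, k_m \}$. Then:
\[\bigcup_{i = 1}^m (k_i, k_i + 2) \subseteq (1, K + 2) \hspace{5mm} \text{but} \hspace{5mm} K + 2 \in A \text{ and } K + 2 \notin (1, K + 2)\]
so the point $K + 2$ is always left uncovered.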
We now come to an important theorem in measure theory: the Heine-Borel Theorem.
Every open cover of a closed, bounded subset of $\mathbb{R}$ has a finite subcover. In other words, any closed and bounded subset of $\mathbb{R}$ is compact.
This proof follows the Wikipedia article. Let $F \subseteq \mathbb{R}$ be a closed, bounded subset of $\mathbb{R}$, and let $\mathcal{C}$ be an open cover of $F$.
Compact sets are closed.
Let $S \subseteq \mathbb{R}$ be a compact set, and let $h$ be a limit point of $S$ (i.e. every neighborhood of $h$ contains at least one $x \in S$ such that $x \neq h$).
Suppose $S$ is not closed. This implies that there exists a limit point $h \notin S$. For each $x \in S$, we will consider an open neighborhood of $x$, denoted by $N_x$, that is disjoint from some neighborhood of $h$, denoted by $V_x$ (not necessarily the same one for every $x$!). Such neighborhoods exist because $x \neq h$. Let \(\mathcal{C}'\) be the collection of such $N_x$. For any $x \in S$, we can find $C \in \mathcal{C}'$ such that $x \in C$ (namely, $C = N_x$). Since each $N_x$ is open, this implies that $\mathcal{C}'$ is an open cover of $S$.
Consider an arbitrary, finite subcollection of $\mathcal{C}'$, denoted by $\mathcal{K}$. Let \(X_\mathcal{K}\) denote the set of all $x$ corresponding to $N_x \in \mathcal{K}$. For every $N_x \in \mathcal{K}$, there exists a $V_x$ that $N_x$ does not intersect. Thus, \(\cup_{x \in X_{\mathcal{K}}} N_x\) will be disjoint from \(\cap_{x \in X_{\mathcal{K}}} V_x\).
Since each $V_x$ is a neighborhood of $h$, there exist open sets $U_x$ such that $h \in U_x \subseteq V_x$ (by definition of neighborhood). It follows that:
\[h \in \cap_{x \in X_{\mathcal{K}}} U_x \subseteq \cap_{x \in X_{\mathcal{K}}} V_x\]
which implies that the intersection of such $V_x$ is a neighborhood of $h$. Note that the intersection will not be the set containing only $h$: we can construct an open ball centered at $h$ inside each $U_x$ (and therefore each $V_x$), and because $X_{\mathcal{K}}$ is finite, the intersection contains the smallest of these balls.
Since the intersection is a neighborhood of $h$ and $h$ is a limit point of $S$, there exists $y \in \cap_{x \in X_{\mathcal{K}}} V_x$ such that $y \in S$. However, because of how we constructed $\mathcal{C}'$, $y \notin \cup_{x \in X_{\mathcal{K}}} N_x$. Thus, $\mathcal{K}$ does not cover $S$. Since $\mathcal{K}$ was an arbitrary finite subcollection, $\mathcal{C}'$ has no finite subcover, so $S$ is not compact, which is a contradiction. Thus, all limit points must be in $S$, so $S$ is closed.
Compact sets are bounded.
Let $S \subseteq \mathbb{R}$ be a compact set, and for each $x \in S$, let $U_x$ be the open ball centered at $x$ with radius $1$. Clearly, $S \subseteq \cup_{x \in S} U_x$, so the collection $\{ U_x \rvert x \in S \}$ is an open cover of $S$.
Since $S$ is compact, this cover has a finite subcover of $S$. Let $M$ be the maximum distance between the centers of any two balls in the subcover. Take arbitrary points $p, q \in S$, and let $C_p$ and $C_q$ denote the centers of balls in the subcover that contain $p$ and $q$, respectively. Then, by the triangle inequality:
\[d(p, q) \leq d(p, C_p) + d(C_p, C_q) + d(C_q, q) \leq 1 + M + 1 = 2 + M\]
So the diameter of $S$ is bounded by $2 + M$.
Closed subsets of compact sets are compact.
Let $K$ be a closed subset of $T \subseteq \mathbb{R}$ where $T$ is compact. Let $\mathcal{C}_K$ be an open cover of $K$. Since $K$ is closed, $U = \mathbb{R} \setminus K$ is an open set, and $\mathcal{C}_T = \mathcal{C}_K \cup \left\{ U \right\}$ is an open cover of $T$.
Since $T$ is compact, there exists a finite subcover of $T$ of $\mathcal{C}_T$, which we denote by \(\mathcal{C}_T'\), which also covers $K$. Moreover, $K$ is also covered by \(\mathcal{C}_{K}' = \mathcal{C}_{T}' \setminus \{ U \}\) since there are no points in $U$ that are in $K$.
Since \(\mathcal{C}_T'\) is a finite subcover, \(\mathcal{C}_{K}'\) is also finite, and every \(C \in \mathcal{C}_{K}'\) is in \(\mathcal{C}_T'\), which implies \(C \in \mathcal{C}_T\). By the definition of $\mathcal{C}_T$ and the fact that $C \neq U$, we have that $C \in \mathcal{C}_K$. Thus, \(\mathcal{C}_K'\) is a finite subcollection of $\mathcal{C}_K$ and therefore a finite subcover of $K$ (i.e. $K$ is compact).
Closed, bounded sets are compact.
Let $S \subseteq \mathbb{R}$ be closed and bounded. Let $T_0$ be a closed and bounded interval enclosing $S$. WLOG, we will say that $T_0 = [-a, a]$ for some $a > 0$.
Suppose $T_0$ is not compact. Let $\mathcal{C}$ be an (infinite) open cover of $T_0$ with no finite subcover of $T_0$. We can divide $T_0$ into two closed halves, at least one of which cannot be covered by finitely many sets in $\mathcal{C}$ (if both could, then the union of the two finite subcovers would be a finite subcover of $T_0$). Denote this half $T_1$, and note that $T_1 \subset T_0$.
We can repeat the same argument for $T_1$ to get $T_2$ where $T_2 \subset T_1$. We can continue this to construct a sequence of nested intervals:
\[T_0 \supset T_1 \supset T_2 \supset \dots \supset T_k \supset \dots\]The length of interval $T_k$ is the length of $T_0$ divided by $2^k$. Thus:
\[\underset{k \rightarrow \infty}{\lim} \frac{2a}{2^k} = 0\]
Let \((x_k)_k\) be a sequence such that $x_k \in T_k$. This sequence is Cauchy: for any $m, n \geq k$, both $x_m$ and $x_n$ lie in $T_k$, so they are at most $\frac{2a}{2^k}$ apart. Since the sequence is Cauchy, it must converge (let $L$ denote this limit). Since the intervals are nested, for any $k$, the sequence will eventually always be inside $T_k$, and each $T_k$ is closed, so $L \in T_k$.
Recall that $\mathcal{C}$ covers $T_0$, so since $L \in T_k$ and $T_k \subset T_0$, there must exist some open set $U \in \mathcal{C}$ such that $L \in U$. Because $U$ is open, there is an open ball centered at $L$, denoted by $B_L$, that is contained in $U$.
If we pick a big enough value of $k$, then $T_k \subseteq B_L \subseteq U$ (since $L \in T_k$ and the length of $T_k$ goes to zero). However, this implies that the single set $U$ covers $T_k$, which means $\{ U \}$ is a finite subcover of $T_k$. This contradicts the fact that none of the $T_k$ has a finite subcover. Thus, $T_0$ must be compact.
By the previous result, since $S$ is closed and a subset of $T_0$, $S$ is also compact.
With the Heine-Borel Theorem, we can show that the outer measure of a closed interval in $\mathbb{R}$ is simply the difference between the endpoints.
Let $a, b \in \mathbb{R}$ with $a < b$. The outer measure of $[a, b]$ is given by:
\[\rvert [a, b] \rvert = b - a\]First, consider $\epsilon > 0$. We can define $(a-\epsilon, b + \epsilon), \emptyset, \emptyset, \dots$, which is a sequence of open intervals whose union contains $[a, b]$. Let $u_i$ denote the $i$-th interval in this sequence. Then:
\[\begin{aligned} \rvert \cup_{i = 1}^\infty u_i \rvert &\geq \rvert [a, b] \rvert \\ \implies \rvert (a - \epsilon, b + \epsilon) \rvert &\geq \rvert [a, b] \rvert \\ \implies (b - a) + 2 \epsilon &\geq \rvert [a, b] \rvert \end{aligned}\]Taking $\epsilon \rightarrow 0$ yields one direction of the result:
\(\rvert [a, b] \rvert \leq b - a\)
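Now, let $I_1, I_2, \dots$ be a sequence of open intervals satisfying $[a, b] \subseteq \cup_{k = 1}^\infty I_k$. By the Heine-Borel theorem, there exists a finite subcover of $[a, b]$ from this open cover. That is, there exists some \(n \in \mathbb{Z}_+\) such that (after relabeling the intervals):
\[[a, b] \subseteq I_1 \cup \dots \cup I_n\]
We will show by induction on $n$ that this inclusion implies $\sum_{k = 1}^n \ell(I_k) \geq b - a$. Since $\sum_{k = 1}^\infty \ell(I_k) \geq \sum_{k = 1}^n \ell(I_k)$, taking the infimum over all such sequences then gives $\rvert [a, b] \rvert \geq b - a$.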
Base case. Suppose that $n = 1$. Clearly:
\[b - a \leq \ell(I_1) \leq \sum_{k = 1}^\infty \ell(I_k)\]
Inductive hypothesis. Now, suppose $n \geq 1$ and that the following holds for all $a, b \in \mathbb{R}$ with $a < b$:
\[[a, b] \subseteq I_1 \cup \dots \cup I_n \implies \sum_{k = 1}^n \ell(I_k) \geq b - a\]
Consider appending one more open interval such that:
\[[a, b] \subseteq I_1 \cup \dots \cup I_n \cup I_{n+1}\]
WLOG, assume $b \in I_{n+1} = (c, d)$ (we can relabel the intervals to make this true). If $c \leq a$, then the length of $I_{n+1}$ must be at least as large as $b - a$, and it follows that:
\[b - a \leq \ell(I_{n+1}) \leq \sum_{k = 1}^{n+1} \ell(I_k)\]
The other case is that $c > a$, so we have $a < c < b < d$, which implies:
\[[a, c] \subseteq I_1 \cup \dots \cup I_n\]By assumption, we have that:
\[c - a \leq \sum_{k = 1}^n \ell(I_k)\]We then have:
\[\begin{aligned} b - a &\leq d - a \\ &= (c - a) + (d - c) \\ &= (c - a) + \ell(I_{n+1}) \\ &\leq \sum_{k = 1}^{n+1} \ell(I_k) \end{aligned}\]which completes the proof.
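As a sanity check on the inductive step, here is a small worked example with numbers chosen purely for illustration. Take $[a, b] = [0, 1]$, $I_1 = (-0.1, 0.6)$, and $I_2 = (c, d) = (0.5, 1.1)$, so that $b \in I_2$ and $c > a$. Then $[a, c] = [0, 0.5] \subseteq I_1$, and the chain of inequalities reads:
\[\begin{aligned} b - a = 1 &\leq d - a = 1.1 \\ &= (c - a) + (d - c) = 0.5 + 0.6 \\ &\leq \ell(I_1) + \ell(I_2) = 0.7 + 0.6 = 1.3 \end{aligned}\]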
Unfortunately, the outer measure does not satisfy additivity, which is a property that would make it behave more intuitively and lend itself naturally to more advanced usage. More specifically, there exist $A, B \subset \mathbb{R}$ such that $A \cap B = \emptyset$ and:
\[\rvert A \cup B \rvert \neq \rvert A \rvert + \rvert B \rvert\]
Perhaps somewhat surprisingly, there does not exist a way to describe the size of a subset of $\mathbb{R}$ that satisfies all of the nice properties we desire.
There does not exist a function, $\mu$, that satisfies all of the following:
Let $\mu$ be a function that satisfies all four properties.
Some consequences. Notice that $\mu(\emptyset) = 0$ by Property 2. Let $A \subseteq B \subseteq \mathbb{R}$. Notice that $B = \cup_{i = 1}^\infty C_i$ where $C_1 = A$, $C_2 = B \setminus A$, and $C_i = \emptyset$ for $i \geq 3$ (i.e. $B$ can be written as a sequence of disjoint sets). By Property 3, we see:
\[\begin{equation} \label{eq:order-preserving} \mu(B) = \mu(A) + \mu(B \setminus A) + \mu(\emptyset) + \dots = \mu(A) + \mu(B \setminus A) \geq \mu(A) \end{equation}\]
Let $a, b \in \mathbb{R}$ with $a < b$. For any choice of $\epsilon > 0$, we have $(a, b) \subseteq [a, b] \subseteq (a - \epsilon, b + \epsilon)$. Combined with Property 2 and Eq. \eqref{eq:order-preserving}, this implies:
\[\begin{equation} \label{eq:closed-size} b - a \leq \mu([a, b]) \leq b - a + 2 \epsilon \implies \mu([a, b]) = b - a \end{equation}\]Let $A_1, A_2, \dots \subseteq \mathbb{R}$. We consider the sequence of disjoint subsets constructed as $A_1, A_2 \setminus A_1, A_3 \setminus (A_2 \cup A_1), \dots$. We then have:
\[\begin{aligned} \mu\left(\cup_{k = 1}^\infty A_k \right) &= \mu\left( A_1 \cup (A_2 \setminus A_1) \cup (A_3 \setminus (A_1 \cup A_2)) \cup \dots\right) \\ &= \mu(A_1) + \mu(A_2 \setminus A_1) + \mu(A_3 \setminus (A_1 \cup A_2)) + \dots & \left(\text{Property } 3 \right)\\ &\leq \sum_{k = 1}^\infty \mu(A_k) \end{aligned}\]
where the last step uses Eq. \eqref{eq:order-preserving}.
A Contradiction. We now show that there exist disjoint $A, B \subseteq \mathbb{R}$ such that $\mu(A \cup B) \neq \mu(A) + \mu(B)$, which contradicts Property 3.
Let $a \in [-1, 1]$, and define the set of numbers in $[-1, 1]$ that differ from $a$ by a rational number:
\[\tilde{a} = \left\{ c \in [-1, 1] \rvert a - c \in \mathbb{Q} \right\}\]where $\mathbb{Q}$ is the set of rational numbers.
Suppose $a, b \in [-1, 1]$ and $\tilde{a} \cap \tilde{b} \neq \emptyset$. Then it must be the case that $\tilde{a} = \tilde{b}$. This follows from the fact that, if there exists a $d \in \tilde{a} \cap \tilde{b}$, then $a - d, b - d \in \mathbb{Q}$. If we subtract the two, we see that $(a - d) - (b - d) = a - b \in \mathbb{Q}$ since the difference of rational numbers is rational. Consider now the equation $a - c = (a - b) + (b - c)$, which we obtain by adding and subtracting $b$. If $c \in [-1, 1]$, then $a - c$ is rational if, and only if, $b - c$ is also rational. Thus, $\tilde{a} = \tilde{b}$.
For every $a \in [-1, 1]$, $a \in \tilde{a}$ (since $a$ differs from $a$ by $0$, which is rational). Thus:
\[[-1, 1] = \bigcup_{a \in [-1, 1]} \tilde{a}\]
We define the set, $V$, which is constructed by taking a single element from each of the distinct sets in $\{ \tilde{a} \rvert a \in [-1, 1] \}$. For any $a \in [-1, 1]$, let $v$ be the element of $V$ chosen from $\tilde{a}$. Then $a - v \in \mathbb{Q}$ because $v \in \tilde{a}$, and all elements in $\tilde{a}$ differ from $a$ by a rational number.
We then define $r_1, r_2, \dots$, which is a sequence of distinct rational numbers that satisfy:
\[[-2, 2] \cap \mathbb{Q} = \{ r_1, r_2, \dots \}\]
Since $a - v \in \mathbb{Q}$ and $a, v \in [-1, 1]$, we have $a - v \in [-2, 2] \cap \mathbb{Q}$, so it follows that $a = r_k + v \in r_k + V$ for some \(k \in \mathbb{Z}_+\). Thus:
\[[-1, 1] \subseteq \bigcup_{k = 1}^\infty (r_k + V)\]
By Eq. \eqref{eq:order-preserving} and the countable subadditivity derived above, we have that:
\[\mu([-1, 1]) \leq \sum_{k = 1}^\infty \mu(r_k + V)\]And by Eq. \eqref{eq:closed-size}:
\[\mu([-1, 1]) = 2\]Then, by Property 4, we can translate the subsets on the right-hand side to get:
\[2 \leq \sum_{k = 1}^\infty \mu(V) \implies \mu(V) > 0\]
Now, if there exists $t \in (r_j + V) \cap (r_k + V)$, then it must be the case that $t = r_j + v_1 = r_k + v_2$ for some $v_1, v_2 \in V$. Subtracting $v_2$ and $r_j$ from both sides yields the equality $v_1 - v_2 = r_k - r_j$. Clearly, $v_1 - v_2 = r_k - r_j \in \mathbb{Q}$, since $r_k, r_j \in \mathbb{Q}$. This means $v_1$ and $v_2$ belong to the same set $\tilde{a}$, and since $V$ contains exactly one element from each such set, $v_1 = v_2$, so $r_k = r_j$, and thus $j = k$ (the $r_k$ are distinct). This implies $(r_1 + V), (r_2 + V), \dots$ are disjoint sets.
For arbitrary $n \in \mathbb{Z}_+$, we have:
\[\bigcup_{k = 1}^n (r_k + V) \subseteq [-3, 3]\]
since $V \subseteq [-1, 1]$ and $r_k \in [-2, 2]$ for all $k$. By Eq. \eqref{eq:order-preserving} and Eq. \eqref{eq:closed-size}, we have:
\[\mu\left( \bigcup_{k = 1}^n (r_k + V) \right) \leq 6\]
Let’s pick \(n \in \mathbb{Z}_+\) such that $n \mu(V) > 6$, which is possible because $\mu(V) > 0$. By Property 4:
\[\begin{equation} \label{eq:2-22} \mu\left( \bigcup_{k = 1}^n (r_k + V) \right) \leq 6 < \sum_{k = 1}^n \mu(r_k + V) = \sum_{k = 1}^n \mu(V) = n \mu(V) \end{equation}\]This is basically the end of the proof, since we have constructed an example of such disjoint subsets. We can do a little more explanation, though.
If it were the case that $\mu(A \cup B) = \mu(A) + \mu(B)$ for any disjoint subsets $A, B \subseteq \mathbb{R}$, then (by induction), we would have that:
\[\mu\left(\bigcup_{k = 1}^n A_k \right) = \sum_{k = 1}^n \mu(A_k)\]for any disjoint subsets $A_1, \dots, A_n \subseteq \mathbb{R}$. However, Eq. \eqref{eq:2-22} shows this is not the case. Thus, Property 3 is violated.
The failure to find a function that satisfies all of the properties listed in Theorem 2.22 means we should probably drop one so that we can continue with analysis… Because Properties 2-4 are so important to our intuition and the utility of this idea of size, we must be satisfied with only these three. We now come to some fundamental building blocks of measure theory.
We’ll begin with a $\sigma$-algebra. This is simply a collection of subsets of some other set.
An example may make it a bit more concrete in one’s mind…
Notice that the first and second properties in the above definition imply that $\emptyset \in \mathcal{S}$ as well. The properties also imply that a $\sigma$-algebra must be closed under countable intersection. That is, $\cap_{i = 1}^\infty A_i \in \mathcal{S}$ for any sequence of $A_1, A_2, \dots \in \mathcal{S}$.
A $\sigma$-field (another name for a $\sigma$-algebra) is a special case of the concept of an algebra (also called a field): an algebra only needs to be closed under finite unions, while a $\sigma$-field must be closed under countable unions.
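The closure under countable intersections can be sketched with De Morgan's laws, assuming that $\mathcal{S}$ is a $\sigma$-algebra on a set $X$ and so is closed under complements and countable unions. For $A_1, A_2, \dots \in \mathcal{S}$:
\[\bigcap_{i = 1}^\infty A_i = X \setminus \left( \bigcup_{i = 1}^\infty (X \setminus A_i) \right)\]
Each $X \setminus A_i$ is in $\mathcal{S}$, so their countable union is in $\mathcal{S}$, and so is its complement.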
Now we can define measurable spaces!
We’ll come back to this definition later when we discuss measures, but a measurable space is just a space that could be assigned a measure.
Let’s finish up this sub-section by introducing topological spaces and Borel sets.
A lot of proofs I’ve come across implicitly define random variables on probability spaces (to be covered later!) where the $\sigma$-algebra is the Borel $\sigma$-algebra on $\mathbb{R}$.
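This definition is a bit tricky to develop intuition for. The Borel $\sigma$-algebra is the smallest $\sigma$-algebra containing all of the open sets in a given space, $X$ (usually $\mathbb{R}$). It includes every open set, but also everything that can be built from open sets through complements and countable unions and intersections.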
An important Borel $\sigma$-algebra that will come up again when we discuss measures and probability is the Borel $\sigma$-algebra on the real line. Several examples follow from our definition:
Borel sets on $\mathbb{R}$ can also be extended to $[-\infty, \infty]$. However, there are many subsets of $\mathbb{R}$ that are not Borel sets. In addition, the Borel sets cannot be described in just a couple of steps: for example, the set of all countable unions of countable intersections of open subsets of $\mathbb{R}$ is not the set of Borel sets.
Along with the Borel set and the $\sigma$-algebra is the semialgebra.
This concept will not be as useful in later discussions, but we include it for completeness. An example of a semialgebra is the union of $\{ \emptyset \}$ and the collection of sets that can be written as:
\[(a_1, b_1] \times \dots \times (a_d, b_d] \subset \mathbb{R}^d \hspace{5mm} \text{for } -\infty \leq a_i < b_i \leq \infty\]Given a semialgebra, $\mathcal{S}$, the collection of finite disjoint unions of sets in $\mathcal{S}$ forms an algebra called the algebra generated by $\mathcal{S}$.
We now need to define a concept that is at the crux of our discussions of mappings: the inverse image.
In words, the inverse image of a subset $A$ of $Y$ under function $f$ is the subset of elements in the domain $X$ that map to elements in $A$. It’s important to note that the inverse image is defined even when $f$ is not invertible, and that, while the inverse image of the whole of $Y$ is always the whole of $X$, the image of $X$ does not necessarily have to be the whole of $Y$!
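A small example may help here (the function is chosen purely for illustration). Let $f: \mathbb{R} \rightarrow \mathbb{R}$ be given by $f(x) = x^2$. Then:
\[f^{-1}([1, 4]) = [-2, -1] \cup [1, 2] \hspace{5mm} f^{-1}((-\infty, 0)) = \emptyset\]
Note that $f$ has no inverse function, but inverse images of sets are still well defined, and they can be empty when nothing in the domain maps into the set.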
We now introduce a definition that describes what it means for functions of a certain type to be “nice” with respect to a $\sigma$-field.
The basic idea behind an $\mathcal{S}$-measurable function is that the inverse image of every Borel set should belong to $\mathcal{S}$, the $\sigma$-field on its domain (since $\text{dom}(f) = X$). It is important to remember that measurability is with respect to the $\sigma$-fields of the two measurable spaces of interest.
To put it intuitively, a measurable function $f$ needs to take on values that “make sense” with respect to the $\sigma$-field of interest. For example, only constant functions are measurable with respect to the trivial $\sigma$-field $\{ \emptyset, \Omega \}$ for some $\Omega$. In addition, we have the following claim:
Constant functions are measurable with respect to any $\sigma$-field.
Suppose we have measurable spaces $(\Omega, \mathcal{S})$ and $(Y, \mathcal{S}')$. Let $\mathcal{S} = \{ \emptyset, \Omega \}$, and suppose we have a non-constant function $f: \Omega \rightarrow Y$. That is, there exist $a, b \in \Omega$ such that $f(a) \neq f(b)$. Assume also that $\{ f(a) \} \in \mathcal{S}'$ (as is the case when $\mathcal{S}'$ is the Borel $\sigma$-algebra on $\mathbb{R}$).
Consider the pre-image of one of these points. We know that $f^{-1}(\{ f(a) \}) \notin \mathcal{S}$ since it contains $a$ (so it is not the empty set) but does not contain $b$ (so it is not the entirety of $\Omega$). Thus, $f$ is not measurable.
To prove the second claim, let $\mathcal{S}$ be any $\sigma$-field on $\Omega$ and let $\mathcal{S}'$ be arbitrary in the previous set-up. Since $f$ is constant, it must be the case that $f(x) = a$ for all $x \in \Omega$ and some $a \in Y$. Pick any $s \in \mathcal{S}'$. If $a \in s$, then $f^{-1}(s) = \Omega$, since any input value maps to $a$ ($f$ is constant). If $a \notin s$, then $f^{-1}(s) = \Omega^c = \emptyset$ by the same argument.
Thus, for any $s \in \mathcal{S}'$, $f^{-1}(s) \in \{ \emptyset, \Omega \} \subseteq \mathcal{S}$, implying that $f$ is $(\mathcal{S}, \mathcal{S}')$-measurable for any $\mathcal{S}$ and $\mathcal{S}'$.
To check whether a function is $\mathcal{S}$-measurable, it is sufficient to check whether \(f^{-1}((a, \infty]) = \{ x \in X \rvert f(x) > a \} \in \mathcal{S}\) for all $a \in \mathbb{R}$.
Furthermore, in the special case that $X \subseteq \mathbb{R}$ and $\mathcal{S}$ is the set of Borel subsets of $\mathbb{R}$ that are contained in $X$, then a function $f: X \rightarrow \mathbb{R}$ is called Borel measurable if $f^{-1}(B)$ is a Borel set for all Borel sets $B \subseteq \mathbb{R}$. It can be shown that any continuous or increasing function $f: X \rightarrow \mathbb{R}$ where $X$ is a Borel subset of $\mathbb{R}$ is Borel measurable (see pg. 33 for a proof).
Later, it will be useful to have a sense of limits of measurable functions.
A similar result holds if $f_1, f_2, \dots$ are $\mathcal{S}$-measurable functions from $X$ to $[-\infty, \infty]$. Then $g, h: X \rightarrow [-\infty, \infty]$ defined as:
\[g(x) = \inf\left\{ f_k(x) \rvert k \in \mathbb{Z}_+ \right\} \hspace{5mm} h(x) = \sup \left\{ f_k(x) \rvert k \in \mathbb{Z}_+ \right\}\]are also $\mathcal{S}$-measurable.
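To sketch why this holds for the supremum, we can use the criterion above that it is enough to check sets of the form $\{ x \in X \rvert h(x) > a \}$. A supremum exceeds $a$ exactly when at least one term does, so for any $a \in \mathbb{R}$:
\[h^{-1}((a, \infty]) = \bigcup_{k = 1}^\infty f_k^{-1}((a, \infty])\]
which is a countable union of sets in $\mathcal{S}$ and hence is in $\mathcal{S}$. The infimum can be handled in the same way after writing $g(x) = -\sup \{ -f_k(x) \rvert k \in \mathbb{Z}_+ \}$, assuming (as is standard) that the negative of an $\mathcal{S}$-measurable function is $\mathcal{S}$-measurable.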
Simple functions are ones that take on only finitely many values. For example, a constant function is simple because it takes on only one value. We have the following result about approximating measurable functions with simple functions.
In other words, for a measurable space $(X, \mathcal{S})$, a simple function $f: X \rightarrow \mathbb{R}$ can be written as:
\[f = c_1 \chi_{E_1} + \dots + c_n \chi_{E_n}\]
where $c_1, \dots, c_n \in \mathbb{R}$ are the distinct values that $f$ can take on, and $E_k = f^{-1}(\{ c_k \})$. We also have the following result that will help us prove other things later.
We have finally come to the star of our discussion: the measure. A measure is a function that assigns a “size” to sets (it is similar to the idea of length for intervals or area for two dimensional regions).
With this definition, we define a measure space, which is the tuple $(X, \mathcal{S}, \mu)$.
For measure space $(X, \mathcal{S}, \mu)$ and $A, B \in \mathcal{S}$ such that $A \subseteq B$, we have that $\mu(A) \leq \mu(B)$ (order preserving) and $\mu(B \setminus A) = \mu(B) - \mu(A)$ (assuming that $\mu(A)$ is finite).
We also have the additional property of countable subadditivity, which is basically a generalization of Boole’s inequality:
\[\mu\left(\bigcup_{i = 1}^\infty A_i \right) \leq \sum_{i = 1}^\infty \mu(A_i)\]for any sequence of sets $A_1, A_2, \dots \in \mathcal{S}$. Measures also satisfy:
\[\mu(A \cup B) = \mu(A) + \mu(B) - \mu(A \cap B)\]when $\mu(A \cap B)$ is finite.
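This identity can be sketched from the properties above. Write $A \cup B$ as the disjoint union of $A$ and $B \setminus (A \cap B)$, then use the additivity of a measure on disjoint sets together with the subtraction rule for $A \cap B \subseteq B$ (which needs $\mu(A \cap B) < \infty$):
\[\begin{aligned} \mu(A \cup B) &= \mu(A) + \mu(B \setminus (A \cap B)) \\ &= \mu(A) + \mu(B) - \mu(A \cap B) \end{aligned}\]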
We also have that the increasing union of countably many sets in a measure space has measure equal to the limit of the measures of the sets in the sequence. The same holds for a decreasing intersection, provided the first set has finite measure. More formally, for measure space, $(X, \mathcal{S}, \mu)$, an increasing sequence of sets in $\mathcal{S}$, $E_1 \subseteq E_2 \subseteq \dots$, and a decreasing sequence of sets in $\mathcal{S}$, $F_1 \supseteq F_2 \supseteq \dots$, with $\mu(F_1) < \infty$, we have:
\[\begin{equation} \label{eq:increase-decrease} \mu\left(\bigcup_{k = 1}^\infty E_k \right) = \underset{k \rightarrow \infty}{\lim} \mu(E_k) \hspace{5mm} \mu\left(\bigcap_{k = 1}^\infty F_k \right) = \underset{k \rightarrow \infty}{\lim} \mu(F_k) \end{equation}\]
If we have two $\sigma$-finite (see below) measure spaces, $(X, \mathcal{S}, \mu_1)$ and $(Y, \mathcal{T}, \mu_2)$, we can define two additional sets:
\[\begin{aligned} \Omega &= X \times Y = \{ (x, y): x \in X, y \in Y\} \\ \mathcal{U} &= \{ S \times T: S \in \mathcal{S}, T \in \mathcal{T}\} \end{aligned}\]
Sets $U \in \mathcal{U}$ are rectangles. Let $\mathcal{F} = \mathcal{S} \times \mathcal{T}$ be the $\sigma$-algebra generated by $\mathcal{U}$. The unique measure $\mu = \mu_1 \times \mu_2$ on $\mathcal{F}$ defined as $\mu(S \times T) = \mu_1(S) \mu_2(T)$ is called a product measure. This result can be extended to finitely many $\sigma$-finite measure spaces.
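For a concrete instance (a standard example, with the intervals chosen here just for illustration), take both $\mu_1$ and $\mu_2$ to be the Lebesgue measure on $\mathbb{R}$, which is discussed later in this post. For $a < b$ and $c < d$, the product measure of an open rectangle is then its area:
\[\mu\left( (a, b) \times (c, d) \right) = \mu_1((a, b)) \, \mu_2((c, d)) = (b - a)(d - c)\]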
Measures can be characterized in a variety of ways. First, consider the $\sigma$-finite measure.
We can also define a sense of continuity for measures.
Measures can also be “coarsened” by restricting the $\sigma$-field on which they operate.
A restricted measure is basically the original measure but its domain is shrunken to whatever sub-$\sigma$-field it is restricted to. Measures also satisfy several properties.
Let $\mu$ be a measure on $(\Omega, \mathcal{F})$, and let $A_i \uparrow A$ denote $A_1 \subset A_2 \subset \dots$ with $\cup_i A_i = A$. The measure $\mu$ satisfies the following:
Proof to be completed.
A sense of “convergence” with respect to a measure can be defined for measurable functions.
Before we can move on to some of the core concepts in probability theory, we need one more definition.
An example of a measure is the Lebesgue measure, which is the outer measure on $(\mathbb{R}, \mathcal{B})$, where $\mathcal{B}$ is the $\sigma$-algebra of Borel subsets of $\mathbb{R}$. The outer measure is not a measure in general, but it is when it is applied to Borel sets.
A set $A \subseteq \mathbb{R}$ is called a Lebesgue measurable set if there exists a Borel set $B \subset A$ satisfying $\rvert A \setminus B \rvert = 0$. All Borel sets are Lebesgue measurable (set $B = A$). One can also define a Lebesgue measurable set in many equivalent ways.
In words, a Lebesgue measurable set is the union of a Borel set and a set with outer measure zero.
We can use the above definition to define Lebesgue measurable functions.
The above definition means that, if $A$ is a Lebesgue measurable subset of $\mathbb{R}$, then a function $f: A \rightarrow \mathbb{R}$ is Lebesgue measurable exactly when it is $\mathcal{L}$-measurable, where $\mathcal{L}$ is the $\sigma$-algebra of all Lebesgue measurable subsets of $A$.
We have the following result that will allow us to just consider Borel measurable functions if we think it’s okay to ignore the sets of outer measure zero. First, we’ll define the characteristic function. Let $E \subseteq X$. The characteristic function of $E$, denoted by $\chi_E: X \rightarrow \mathbb{R}$, is defined as:
\[\chi_E(x) = \begin{cases} 1 & \text{if } x \in E \\ 0 & \text{if } x \notin E \end{cases}\]Let $f: \mathbb{R} \rightarrow \mathbb{R}$ be a Lebesgue measurable function. There exists a Borel measurable function $g: \mathbb{R} \rightarrow \mathbb{R}$ satisfying:
\[\rvert \left\{ x \in \mathbb{R} \rvert g(x) \neq f(x) \right\} \rvert = 0\]
TO DO.
<!-- By the equivalences given here, there exists a sequence, $f_1, f_2, \dots$, of Lebesgue measurable, simple functions $f_i: \mathbb{R} \rightarrow \mathbb{R}$ that converges pointwise on $\mathbb{R}$ to $f$.
Let $k \in \mathbb{Z}_+$. Since the $f_i$ functions are simple, there exist $c_1, \dots, c_n \in \mathbb{R}$ and disjoint, Lebesgue measurable sets $A_1, \dots, A_n \subseteq \mathbb{R}$ such that:
\[f_k = c_1 \chi_{A_1} + \dots + c_n \chi_{A_n}\]
By the first and fourth equivalences given here for Lebesgue measurable sets, we have that, for each $j \in \{ 1, \dots, n \}$, there exists a Borel set $B_j \subseteq A_j$ satisfying $\rvert A_j \setminus B_j \rvert = 0$. We define:
\[g_k = c_1 \chi_{B_1} + \dots + c_n \chi_{B_n}\]
Clearly, $\rvert \{ x \in \mathbb{R} \rvert g_k(x) \neq f_k(x) \} \rvert = 0$ because $g_k$ takes on the exact same values as $f_k$ on subsets of the corresponding sets in the domain of $f$. Furthermore, $g_k$ is Borel measurable (proof?).
Suppose that $x \notin \bigcup_{i = 1}^\infty \left\{ x \in \mathbb{R} \rvert g_i(x) \neq f_i(x) \right\}$. This implies that -->
Convergence gives us a way to describe the behavior of a sequence of functions. In what follows, we will let $(X, \mathcal{S})$ be our measurable space with set $X$ and $\sigma$-algebra $\mathcal{S}$ on $X$.
Another way to state the definition of pointwise convergence is that for each $x \in X$ and every $\epsilon > 0$, there exists $n \in \mathbb{Z}_+$ such that:
\[\rvert f_k(x) - f(x) \rvert < \epsilon \hspace{5mm} \forall k \geq n\]Similarly, we have uniform convergence.
The difference between the definitions means that uniform convergence implies pointwise convergence (on the same set). The converse is not always true; however, it’s kind of true for sequences of functions on measure spaces with finite total measure.
Let $(X, \mathcal{S}, \mu)$ be a measure space with finite total measure (i.e. $\mu(X) < \infty$). Let $f_1, f_2, \dots$ be a sequence of $\mathcal{S}$-measurable functions where $f_i: X \rightarrow \mathbb{R}$ that converges pointwise on $X$ to $f: X \rightarrow \mathbb{R}$. Then, for every $\epsilon > 0$, there exists $E \in \mathcal{S}$ such that $\mu(X \setminus E) < \epsilon$ and $f_1, f_2, \dots$ converges uniformly to $f$ on $E$.
Let $\epsilon > 0$, and fix some \(n \in \mathbb{Z}_+\). We define, for \(m \in \mathbb{Z}_+\):
\[A_{m,n} = \bigcap_{k = m}^\infty \left\{ x \in X \rvert \frac{1}{n} > \rvert f_k(x) - f(x) \rvert \right\}\]Note that each $A_{m,n} \in \mathcal{S}$ since $f_k - f$ is an $\mathcal{S}$-measurable function by the properties of measurable functions covered here and here. Furthermore, we see that $A_{1,n} \subseteq A_{2,n} \subseteq \dots$ is an increasing sequence of sets (since we are doing fewer and fewer intersections as $m$ increases). Thus, by the definition of pointwise convergence, it must be the case that:
\[\bigcup_{m = 1}^\infty A_{m,n} = X\]
To see why, fix $x \in X$ and apply the definition of pointwise convergence with tolerance $\frac{1}{n}$: there must be some \(m \in \mathbb{Z}_+\) (which may depend on $x$) such that $\rvert f_k(x) - f(x) \rvert < \frac{1}{n}$ for all $k \geq m$, which means $x \in A_{m,n}$. Since this holds for every $x \in X$, taking the union of the $A_{m,n}$ over all choices of $m$ yields the original set $X$.
By Eq. \eqref{eq:increase-decrease}, we have that:
\[\underset{m \rightarrow \infty}{\lim} \mu(A_{m,n}) = \mu(X)\]This implies that there must be an index \(m_n \in \mathbb{Z}_+\) such that the lefthand and righthand sides of the prior equation are arbitrarily close. That is:
\[\mu(X) - \mu(A_{m_n, n}) < \frac{\epsilon}{2^n}\]We will define:
\[E = \bigcap_{n=1}^\infty A_{m_n, n}\]It follows that:
\[\begin{aligned} \mu(X \setminus E) &= \mu\left(X \setminus \bigcap_{n = 1}^\infty A_{m_n, n} \right) \\ &= \mu \left(\bigcup_{n = 1}^\infty \left(X \setminus A_{m_n, n} \right) \right) & \left(\text{DeMorgan's Laws}\right) \\ &\leq \sum_{n = 1}^\infty \mu\left(X \setminus A_{m_n, n}\right) & \left(\text{countable subadditivity}\right) \\ &< \epsilon & \left( \mu(X) - \mu(A_{m_n, n}) < \frac{\epsilon}{2^n} \right) \end{aligned}\]
This shows that $\mu(X \setminus E) < \epsilon$, where the last step uses $\mu(X \setminus A_{m_n, n}) = \mu(X) - \mu(A_{m_n, n})$ (valid because $\mu(X) < \infty$). Now we need to show that $f_1, f_2, \dots$ converges uniformly to $f$ on $E$.
Let $\epsilon' > 0$, and choose \(n \in \mathbb{Z}_+\) such that $\frac{1}{n} < \epsilon'$. By construction, $E \subseteq A_{m_n, n}$, and thus, for any $x \in E$:
\[\rvert f_k(x) - f(x) \rvert < \frac{1}{n} < \epsilon'\]for all $k \geq m_n$, which concludes the proof.
By kind of true, we mean that $f_1, f_2, \dots$ converges uniformly to $f$ except on sets of arbitrarily small measure.
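A standard example (not from the text above) shows what this looks like in practice. Take $X = [0, 1]$ with the Lebesgue measure, and let $f_k(x) = x^k$. This sequence converges pointwise to $f = \chi_{\{ 1 \}}$, but not uniformly, since $\sup_{x \in [0, 1)} \rvert f_k(x) - f(x) \rvert = 1$ for every $k$. However, for any $\epsilon \in (0, 1)$, the set $E = [0, 1 - \frac{\epsilon}{2}]$ (one possible choice) satisfies:
\[\mu(X \setminus E) = \frac{\epsilon}{2} < \epsilon \hspace{5mm} \text{and} \hspace{5mm} \sup_{x \in E} \rvert f_k(x) - f(x) \rvert = \left(1 - \frac{\epsilon}{2}\right)^k \rightarrow 0 \text{ as } k \rightarrow \infty\]
so the convergence is uniform on $E$ even though it is not uniform on all of $X$.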