A Primer
Recently, I’ve spent a lot of time working in parametric hypothesis testing settings. However, in real life, it isn’t always reasonable to assume that we know the distribution of our data or to rely upon asymptotic results. In this post, I’ll provide some background and theory behind permutation testing, a way to conduct distribution-free hypothesis tests.
I’ll be using Good’s *Permutation Testing* as a reference throughout.
We will have the same kind of set-up as for general testing procedures. We will assume that we have some observation of random variable(s), $X$, and we will assume that $X$ has distribution $P_{\theta} \in \mathcal{P} = \{ P_\theta \mid \theta \in \Theta \}$. We also have a set of possible decisions, $\mathcal{D} = \{ d \}$, that can result from a decision rule, $\delta$, and some chosen loss function, $\mathcal{L}(\theta, \delta)$, that is a measure of the performance of $\delta$ when $\theta$ is the true parameter.
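As a concrete (and entirely optional) choice of loss for the testing problem, we could use the 0–1 loss, which charges one unit for a wrong decision and nothing for a correct one:

$$
\mathcal{L}(\theta, d) =
\begin{cases}
0 & \text{if } d \text{ is the correct decision under } \theta, \\
1 & \text{otherwise.}
\end{cases}
$$

Under this loss, the risk of a test is simply its probability of making an error, which is why Type I and Type II error rates play such a central role in what follows.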
A test, denoted by $\phi$, is a decision rule with values in $[0, 1]$. If $\phi(x) = 1$, we reject our null hypothesis in favor of our alternative hypothesis. If $\phi(x) = 0$, we fail to reject the null hypothesis. If $\phi(x) \in (0, 1)$, then we reject the null with probability $\phi(x)$.
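To make the randomized case concrete, here is a minimal sketch of a test $\phi$ as a function returning a rejection probability. The statistic, the two cutoffs, and the linear interpolation between them are all hypothetical choices of mine, not anything prescribed above:

```python
import random


def phi(t, lower=1.64, upper=1.96):
    """Hypothetical randomized test: phi(t) is the probability of
    rejecting the null when the test statistic takes the value t.
    The cutoffs 1.64 and 1.96 are illustrative, not canonical."""
    if t >= upper:
        return 1.0  # reject outright
    if t <= lower:
        return 0.0  # fail to reject
    # In between, randomize: reject with a probability that
    # interpolates linearly between the two cutoffs.
    return (t - lower) / (upper - lower)


def decide(t, rng=random.Random(0)):
    """Carry out the test: return True (reject) with probability phi(t)."""
    return rng.random() < phi(t)
```

Randomized tests like this matter mainly for discrete test statistics, where no deterministic cutoff achieves an exact significance level.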
Suppose we observe a sample, $\mathbf{x} = \{ x_1, \dots, x_n \}$, where each $x_i$ is an observation of the random variable $X \in \mathcal{X}$. We adopt a null hypothesis, $H_0$, under which we assume that $X \sim P \in \mathcal{P}$ for some family of distributions, $\mathcal{P}$. Now, suppose we conduct two experiments that yield samples from distributions $P_1$ and $P_2$, respectively, with $P_1, P_2 \in \mathcal{P}$.
The permutation testing principle states that if the two experiments generate samples $\mathbf{x}_1$ and $\mathbf{x}_2$, respectively, such that $\mathbf{x}_1 = \mathbf{x}_2$, then the inferences made conditional on the dataset and using the same test statistic must be the same (given that samples are exchangeable under the null hypothesis).
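To see the exchangeability idea in action, here is a minimal sketch of a two-sample permutation test. The difference-in-means statistic, the Monte Carlo sampling of permutations, and the add-one p-value correction are my choices for illustration, not requirements of the principle itself:

```python
import numpy as np


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Monte Carlo two-sample permutation test using the absolute
    difference in means as the test statistic.

    Under H0 the pooled observations are exchangeable, so we may
    re-randomize the group labels and compare the observed statistic
    to the resulting permutation distribution.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    n = len(x)
    observed = abs(np.mean(x) - np.mean(y))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffled copy of the pooled data
        stat = abs(perm[:n].mean() - perm[n:].mean())
        if stat >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive and counts
    # the observed arrangement itself as one permutation.
    return (count + 1) / (n_perm + 1)
```

Note that the p-value depends only on the pooled data and the chosen statistic, which is exactly the conditionality in the principle above: two experiments producing the same combined sample yield the same inference.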
We can define a permutation test mathematically as follows.