This post is just a catch-all for my derivations for my score test project. Our set-up is as follows. We have $n$ observations coming from $k$ different clusters, each of size $n_t$ for $t \in [k]$. The full data will be denoted by $\mathbf{y}$. Though $\mathbf{y}$ is a vector, we’ll denote the $j$-th observation from cluster $i$ with $\mathbf{y}_{i,j}$. For example, \(\mathbf{y}_{i,j}\) denotes element \(\sum_{l = 1}^{i - 1} n_l + j\) of $\mathbf{y}$. We’ll also denote the $n_i$-dimensional vector of responses for cluster $i$ with $\mathbf{y}_i$.
For each observation, we will have $p$ fixed effect covariates arranged in a $p$-dimensional vector, \(\mathbf{x}_{i, j}\), and $q$ random effects covariates in a $q$-dimensional vector, \(\mathbf{z}_{i,j}\). We’ll assume that observations from different clusters are independent; observations within the same cluster share a random effect and are independent only conditionally on it.
Our model comes in the form of a specification of the conditional mean, $\mu_{i,j} = \mathbb{E}[\mathbf{y}_{i,j} \rvert \beta_i]$ (where we suppress the additional conditioning on the covariates themselves). For a monotonic and differentiable link function $g(\cdot)$ (e.g. $\log(\cdot)$ or $\text{logit}(\cdot)$), the conditional mean of the $j$-th observation in group $i$ is assumed to be given by:
\[\mu_{i,j} = g^{-1}\left(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j} \right) \label{eq:glmm}\]
We then assume that the observations themselves follow some exponential family distribution with measurement errors, $\epsilon_{i,j} = \mathbf{y}_{i,j} - \mu_{i,j}$, which are the deviations of the responses from their (unit-specific) conditional means. These errors are assumed to have mean zero and be independent of each other and of the random effects. We further assume the responses, $\mathbf{y}_{i,j}$, conditional on the random effects (and the covariates), are independent with variances equal to some function of the conditional mean.
In general, we will assume that:
\[\beta_i \overset{iid}{\sim} \mathcal{N}\left(\mathbf{0}_q, D(\tau^2) \right)\]for some variance component, $\tau^2$. We’ll use $[\cdot] \rvert_{H_0}$ to denote evaluation of the function in brackets when setting $\beta$ equal to $\beta_0$. We’ll also use a superscript $0$ (e.g. $\mu^0$, $\eta^0$, etc.) to denote the quantity under the null hypothesis (i.e. $\tau^2 = \mathbf{0} \implies \beta = \mathbf{0}$).
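To make the set-up concrete, here is a minimal simulation sketch of the model above in Python with NumPy. It assumes a logit link, Bernoulli responses, and the particular choice $D(\tau^2) = \tau^2 \mathbb{I}_{q \times q}$; none of these are fixed by the derivation, and the function name `simulate_glmm` and its arguments are purely illustrative.

```python
import numpy as np

def simulate_glmm(cluster_sizes, alpha, tau2, q, rng):
    """Draw Bernoulli responses from a logit-link random effects model.

    Assumption (not from the derivation): D(tau2) = tau2 * I_q.
    Returns the stacked responses y and their conditional means mu.
    """
    p = alpha.shape[0]
    ys, mus = [], []
    for n_i in cluster_sizes:
        x = rng.normal(size=(n_i, p))   # fixed effect covariates x_{i,j}
        z = rng.normal(size=(n_i, q))   # random effect covariates z_{i,j}
        beta_i = rng.normal(scale=np.sqrt(tau2), size=q)  # beta_i ~ N(0, tau2 * I_q)
        eta = x @ alpha + z @ beta_i    # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta)) # inverse logit link, g^{-1}(eta)
        ys.append(rng.binomial(1, mu))  # responses are independent given beta_i
        mus.append(mu)
    return np.concatenate(ys), np.concatenate(mus)

rng = np.random.default_rng(0)
cluster_sizes = [3, 5, 4]               # n_1, n_2, n_3, so k = 3
y, mu = simulate_glmm(cluster_sizes, alpha=np.array([0.5, -1.0]),
                      tau2=0.5, q=2, rng=rng)
```

The stacked vector `y` follows the indexing convention from the start of the post: cluster 1 comes first, then cluster 2, and so on.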
Gaussian Case
In this example, we’ll have the simple setting of a Gaussian response, which means $g(\cdot)$ is the identity function. We will have a fixed (but cluster-specific) intercept and a single random slope. We will have $k$ clusters and $n$ observations per cluster, so in this section $n$ denotes the common cluster size rather than the total number of observations. We assume:
\[\mathbf{y}_{i, j} = \alpha_i + \beta_i \mathbf{z}_{i,j} + \epsilon_{i,j}, \hspace{8mm} \epsilon_{i,j} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2), \hspace{5mm} \beta_i \overset{iid}{\sim} \mathcal{N}(0, \tau^2)\]
where we also assume the random effects and errors are independent. \(\mathbf{z}_i \in \mathbb{R}^n\) is the vector of covariate values for the $n$ samples in cluster $i$. We’ll denote the vector of responses for cluster $i$ with $\mathbf{y}_i$ so that \(\mathbf{y}_{i,j}\) denotes the $j$-th component of said vector. Marginally, the response vector $\mathbf{y}_i$ has mean \(\alpha_i \mathbf{1}_n\) and variance-covariance matrix:
\[\Sigma_{y_i} = \sigma^2 \mathbb{I}_{n \times n} + \tau^2 \mathbf{z}_i \mathbf{z}_i^\top\]Proof.
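Before the algebra, a quick Monte Carlo sanity check of this covariance formula may be useful. The sketch below (Python with NumPy) draws many independent copies of a single cluster and compares the empirical covariance with $\sigma^2 \mathbb{I}_{n \times n} + \tau^2 \mathbf{z}_i \mathbf{z}_i^\top$. The values of `n`, `sigma2`, `tau2`, `alpha_i`, and `z_i` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, tau2, alpha_i = 4, 1.0, 2.0, 0.3
z_i = np.array([0.5, -1.0, 1.5, 2.0])   # covariates for one cluster

n_draws = 200_000
beta = rng.normal(scale=np.sqrt(tau2), size=(n_draws, 1))   # one random slope per draw
eps = rng.normal(scale=np.sqrt(sigma2), size=(n_draws, n))  # iid measurement errors
y = alpha_i + beta * z_i + eps          # each row is one draw of y_i

empirical = np.cov(y, rowvar=False)     # n x n sample covariance
theoretical = sigma2 * np.eye(n) + tau2 * np.outer(z_i, z_i)
max_abs_err = np.max(np.abs(empirical - theoretical))
print(max_abs_err)                      # small, up to Monte Carlo error
```

The off-diagonal entries are nonzero whenever $\tau^2 > 0$, which is the within-cluster dependence induced by the shared random slope.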
For a single cluster:
$$ \begin{aligned} \mathbb{E}\left[ (\mathbf{y}_{i,j} - \alpha_i)^2\right] &= \mathbb{E}\left[ (\beta_i \mathbf{z}_{i,j} + \epsilon_{i,j})^2 \right] \\ &= \mathbb{E}\left[\beta_i^2 \mathbf{z}_{i,j}^2 \right] + 2 \mathbb{E}\left[ \beta_i \mathbf{z}_{i,j} \epsilon_{i,j} \right] + \mathbb{E}\left[ \epsilon_{i,j}^2 \right] \\ &= \tau^2 \mathbf{z}_{i,j}^2 + \sigma^2 \\ \mathbb{E}\left[ (\mathbf{y}_{i,j} - \alpha_i)(\mathbf{y}_{i,j'} - \alpha_i) \right] &= \mathbb{E}\left[ (\beta_i \mathbf{z}_{i,j} + \epsilon_{i,j})(\beta_i \mathbf{z}_{i,j'} + \epsilon_{i,j'})\right] \\ &= \mathbb{E}\left[ \beta_i^2 \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \right] + \mathbb{E}\left[ \beta_i \mathbf{z}_{i,j} \epsilon_{i,j'}\right] + \mathbb{E}\left[ \beta_i \mathbf{z}_{i,j'} \epsilon_{i,j}\right] + \mathbb{E}\left[ \epsilon_{i,j} \epsilon_{i,j'}\right] \\ &= \tau^2 \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \nonumber \end{aligned} $$
Here $j \neq j'$ in the second computation, and the cross terms vanish because $\beta_i$ and the errors are independent with mean zero. Thus, the variance-covariance matrix for $\mathbf{y}_i$ is:
$$ \Sigma_{y_i} = \begin{bmatrix} \sigma^2 + \tau^2 \mathbf{z}_{i,1}^2 & \dots & \tau^2 \mathbf{z}_{i,1} \mathbf{z}_{i,n} \\ \vdots & \ddots & \vdots \\ \tau^2 \mathbf{z}_{i,n} \mathbf{z}_{i, 1} & \dots & \sigma^2 + \tau^2 \mathbf{z}_{i, n}^2 \end{bmatrix} = \sigma^2 \mathbb{I}_{n \times n} + \tau^2 \mathbf{z}_i \mathbf{z}_i^\top \nonumber $$
Since the $\beta_i$ are independent, observations from different clusters have covariance zero. Let $\mathbf{y} = (\mathbf{y}_1, \dots, \mathbf{y}_k)$ denote the full data, $\alpha = \begin{bmatrix} \alpha_1 & \dots & \alpha_k\end{bmatrix}^\top$, $\beta = \begin{bmatrix} \beta_1 & \dots & \beta_k\end{bmatrix}^\top$, and $\theta = (\alpha, \sigma^2, \tau^2)$ (the random effects $\beta$ are integrated out of the marginal likelihood, so they do not appear in $\theta$). The complete, marginal likelihood and log-likelihood are:
\[\begin{aligned} \mathcal{L}(\theta; \mathbf{y}) &= \prod_{i = 1}^k (2 \pi)^{-\frac{n}{2}} \rvert \Sigma_{y_i} \rvert^{-\frac{1}{2}} \exp\left(- \frac{1}{2} (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \Sigma_{y_i}^{-1} (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right) \\ \ell(\theta; \mathbf{y}) &= \sum_{i = 1}^k \left[ -\frac{n}{2} \log(2 \pi) - \frac{1}{2}\log(\rvert \Sigma_{y_i} \rvert) - \frac{1}{2} (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \Sigma_{y_i}^{-1} (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \end{aligned}\]
Score and Information
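Before deriving the score, here is a minimal sketch of the marginal log-likelihood above in Python with NumPy. The function name `marginal_loglik` and the layout of its inputs (arrays `y` and `z` of shape $(k, n)$, one row per cluster) are assumptions made for illustration only.

```python
import numpy as np

def marginal_loglik(y, z, alpha, sigma2, tau2):
    """Marginal Gaussian log-likelihood; y and z have shape (k, n)."""
    k, n = y.shape
    total = 0.0
    for i in range(k):
        # Sigma_{y_i} = sigma2 * I + tau2 * z_i z_i^T
        Sigma = sigma2 * np.eye(n) + tau2 * np.outer(z[i], z[i])
        r = y[i] - alpha[i]                    # y_i - alpha_i * 1_n
        _, logdet = np.linalg.slogdet(Sigma)   # log |Sigma_{y_i}|
        quad = r @ np.linalg.solve(Sigma, r)   # r^T Sigma^{-1} r
        total += -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
    return total

rng = np.random.default_rng(0)
y = rng.normal(size=(3, 4))                    # k = 3 clusters, n = 4 each
z = rng.normal(size=(3, 4))
alpha = y.mean(axis=1)
ll_null = marginal_loglik(y, z, alpha, sigma2=1.5, tau2=0.0)
ll_alt = marginal_loglik(y, z, alpha, sigma2=1.5, tau2=0.3)
```

With `tau2=0.0` each $\Sigma_{y_i}$ is $\sigma^2 \mathbb{I}_{n \times n}$, so the value reduces to a sum of independent Gaussian log-densities, which gives a simple way to check the function.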
It is easiest to write the score after evaluating it at the MLE of the parameter vector under $H_0$, which we denote with $\hat{\theta}$. The MLE is given by:
\[\hat{\theta} = \begin{bmatrix} \frac{1}{n} \sum_{j = 1}^n \mathbf{y}_{1,j} \\ \vdots \\ \frac{1}{n} \sum_{j = 1}^n \mathbf{y}_{k,j} \\ \frac{1}{nk} \sum_{i = 1}^k (\mathbf{y}_i - \hat{\alpha}_i \mathbf{1}_n)^\top (\mathbf{y}_i - \hat{\alpha}_i \mathbf{1}_n) \\ 0 \end{bmatrix}\]
The entries are, in order, $\hat{\alpha}_1, \dots, \hat{\alpha}_k$, $\hat{\sigma}^2$, and $\hat{\tau}^2 = 0$. Thus, the score evaluated at $\theta = \hat{\theta}$ is:
\[\begin{aligned} U_\theta (\hat{\theta}) &= \begin{bmatrix} \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha} \bigg\rvert_{\theta = \hat{\theta}} \\ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} \bigg\rvert_{\theta = \hat{\theta}} \\ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \bigg\rvert_{\theta = \hat{\theta}} \end{bmatrix} = \begin{bmatrix} \frac{1}{\hat{\sigma}^2} (\mathbf{y}_1 - \hat{\alpha}_1 \mathbf{1}_n)^\top \mathbf{1}_n \\ \vdots \\ \frac{1}{\hat{\sigma}^2} (\mathbf{y}_k - \hat{\alpha}_k \mathbf{1}_n)^\top \mathbf{1}_n \\ - \frac{1}{2} \sum_{i = 1}^k \left[ \frac{n}{\hat{\sigma}^2} - \frac{1}{(\hat{\sigma}^2)^2} (\mathbf{y}_i - \hat{\alpha}_i \mathbf{1}_n)^\top (\mathbf{y}_i - \hat{\alpha}_i \mathbf{1}_n) \right] \\ -\frac{1}{2} \sum_{i = 1}^k \left[ \frac{\text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right]}{\hat{\sigma}^2} - \frac{1}{(\hat{\sigma}^2)^2}(\mathbf{y}_i - \hat{\alpha}_i \mathbf{1}_n)^\top \mathbf{z}_i \mathbf{z}_i^\top (\mathbf{y}_i - \hat{\alpha}_i \mathbf{1}_n) \right] \end{bmatrix} \end{aligned}\]
Proof.
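Before the algebra, the $\tau^2$ component of the score can be sanity-checked against central finite differences of the marginal log-likelihood. The sketch below (Python with NumPy) uses the closed form obtained by setting $\tau^2 = 0$ in the general $\tau^2$ derivative, namely $-\frac{1}{2} \sum_i \left[ \mathbf{z}_i^\top \mathbf{z}_i / \hat{\sigma}^2 - (\mathbf{z}_i^\top (\mathbf{y}_i - \hat{\alpha}_i \mathbf{1}_n))^2 / (\hat{\sigma}^2)^2 \right]$. The helper `loglik`, the random data, and the step size `h` are illustrative assumptions.

```python
import numpy as np

def loglik(y, z, alpha, sigma2, tau2):
    """Marginal Gaussian log-likelihood; y and z have shape (k, n)."""
    k, n = y.shape
    out = 0.0
    for i in range(k):
        Sigma = sigma2 * np.eye(n) + tau2 * np.outer(z[i], z[i])
        r = y[i] - alpha[i]
        out += -0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(Sigma)[1]
                       + r @ np.linalg.solve(Sigma, r))
    return out

rng = np.random.default_rng(4)
k, n = 3, 5
z = rng.normal(size=(k, n))
y = rng.normal(size=(k, n))

# MLE under H0: cluster means and pooled residual variance
alpha_hat = y.mean(axis=1)
resid = y - alpha_hat[:, None]
sigma2_hat = np.sum(resid ** 2) / (n * k)

# Closed-form tau^2 score at the null MLE
zr = np.sum(z * resid, axis=1)            # z_i^T (y_i - alpha_hat_i 1_n)
zz = np.sum(z ** 2, axis=1)               # z_i^T z_i = tr[z_i z_i^T]
score_tau2 = -0.5 * np.sum(zz / sigma2_hat - zr ** 2 / sigma2_hat ** 2)

# Central differences; a slightly negative tau2 is only a numerical device here
h = 1e-5
fd_tau2 = (loglik(y, z, alpha_hat, sigma2_hat, h)
           - loglik(y, z, alpha_hat, sigma2_hat, -h)) / (2 * h)
fd_sigma2 = (loglik(y, z, alpha_hat, sigma2_hat + h, 0.0)
             - loglik(y, z, alpha_hat, sigma2_hat - h, 0.0)) / (2 * h)
```

Here `fd_tau2` should agree with `score_tau2`, while `fd_sigma2` should be numerically zero because $\hat{\sigma}^2$ maximizes the null log-likelihood.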
We first find the gradient of the log-likelihood with respect to $\theta$ parameter-wise. Using the Sherman-Morrison formula, we can find $\Sigma_{y_i}^{-1}$ to be: $$ \Sigma_{y_i}^{-1} = \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \nonumber $$Proof.
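Before the algebra, the closed-form inverse can be checked numerically. This is a minimal sketch in Python with NumPy; the dimension and the values of `sigma2`, `tau2`, and `z_i` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2, tau2 = 5, 1.3, 0.7
z_i = rng.normal(size=n)

# Sigma_{y_i} = sigma2 * I + tau2 * z_i z_i^T
Sigma = sigma2 * np.eye(n) + tau2 * np.outer(z_i, z_i)

# Closed form from the Sherman-Morrison formula
closed_form = (np.eye(n) / sigma2
               - tau2 / (sigma2 * (sigma2 + tau2 * (z_i @ z_i))) * np.outer(z_i, z_i))

direct = np.linalg.inv(Sigma)             # brute-force inverse for comparison
print(np.allclose(closed_form, direct))
```

The closed form only needs the scalar $\mathbf{z}_i^\top \mathbf{z}_i$, so no $n \times n$ matrix inversion is required in practice.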
$$ \begin{aligned} \Sigma_{y_i}^{-1} &= \left[ \sigma^2 \mathbb{I}_{n \times n} + \mathbf{z}_i [\tau^2] \mathbf{z}_i^\top \right]^{-1} \\ &= \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \left(1 + \tau^2 \mathbf{z}_i^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} \right]\mathbf{z}_i \right)^{-1} \left( \left(\frac{1}{\sigma^2} \mathbb{I}_{n \times n}\right) \left(\tau^2 \mathbf{z}_{i} \mathbf{z}_i^\top \right) \left(\frac{1}{\sigma^2} \mathbb{I}_{n \times n}\right) \right) \\ &= \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \left(1 + \frac{\tau^2}{\sigma^2} \mathbf{z}_i^\top \mathbf{z}_i \right)^{-1}\left(\frac{\tau^2}{(\sigma^2)^2} \mathbf{z}_i \mathbf{z}_i^\top \right) \\ &= \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \end{aligned} \nonumber $$
Proof.
$$ \begin{aligned} \frac{\partial}{\partial \sigma^2} \left[ \log(\rvert \Sigma_y \rvert) \right] &= \text{tr}\left[ \Sigma_y^{-1} \frac{\partial}{\partial \sigma^2} \left[\Sigma_y\right] \right] \\ &= \text{tr}\left[ \Sigma_y^{-1} \mathbb{I}_{n \times n} \right] \\ &= \text{tr}\left[ \Sigma_y^{-1} \right] \\ &= \text{tr}\left[\frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \right] \\ &=\text{tr}\left[ \frac{1}{\sigma^2}\mathbb{I}_{n \times n} \right] + \text{tr}\left[- \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top\right] \\ &=\frac{n}{\sigma^2} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[\mathbf{z}_i \mathbf{z}_i^\top\right] \end{aligned} \nonumber $$
$$ \begin{aligned} \frac{\partial}{\partial \sigma^2} \left[ \Sigma_y^{-1} \right] &= - \Sigma_y^{-1} \frac{\partial}{\partial \sigma^2} \left[ \Sigma_y\right] \Sigma_y^{-1} \\ &= -\left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbb{I}_{n \times n} \left[\frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \right] \\ &= - \left[ \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} - \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top + \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] \\ &= - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i 
\mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \end{aligned} \nonumber $$ The above imply: $$ \begin{aligned} \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} &= \frac{\partial}{\partial \sigma^2} \left[ \sum_{i = 1}^k - \frac{1}{2} \log(\rvert \Sigma_y \rvert) - \frac{1}{2}(\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \Sigma_y^{-1} (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ &= -\frac{1}{2}\sum_{i = 1}^k \left[ \frac{n}{\sigma^2} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[\mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \end{aligned} \nonumber $$Proof.
$$ \begin{aligned} \frac{\partial}{\partial \tau^2} \left[ \log(\rvert \Sigma_y \rvert) \right] &= \text{tr}\left[ \Sigma_y^{-1} \frac{\partial}{\partial \tau^2} \left[\Sigma_y\right] \right] \\ &= \text{tr}\left[ \Sigma_y^{-1} \mathbf{z}_i \mathbf{z}_i^\top \right] \\ &= \text{tr} \left[ \left( \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top\right) \mathbf{z}_i \mathbf{z}_i^\top\right] \\ &= \frac{1}{\sigma^2} \text{tr}[\mathbf{z}_i \mathbf{z}_i^\top] - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] \end{aligned} \nonumber $$ $$ \begin{aligned} \frac{\partial}{\partial \tau^2} \left[ \Sigma_y^{-1} \right] &= - \Sigma_y^{-1} \frac{\partial}{\partial \tau^2} \left[ \Sigma_y\right] \Sigma_y^{-1} \\ &= -\left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \right] \left[ \mathbf{z}_i \mathbf{z}_i^\top \right] \left[\frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \right] \\ &= - \frac{1}{(\sigma^2)^2} \mathbf{z}_i \mathbf{z}_i^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top\mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}^\top \mathbf{z}_i \mathbf{z}_i^\top \end{aligned} \nonumber $$ $$ \begin{aligned} \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} &= \frac{\partial}{\partial \tau^2} \left[ \sum_{i = 1}^k - \frac{1}{2} \log(\rvert \Sigma_y \rvert) - \frac{1}{2}(\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \Sigma_y^{-1} (\mathbf{y}_i - \alpha_i \mathbf{1}_n) 
\right] \\ &= - \frac{1}{2} \sum_{i = 1}^k \left[ \frac{1}{\sigma^2} \text{tr}[\mathbf{z}_i \mathbf{z}_i^\top] - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i\mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbf{z}_i \mathbf{z}_i^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top\mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}^\top \mathbf{z}_i \mathbf{z}_i^\top \right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \end{aligned} \nonumber $$Proof.
We do the computations component-wise: $$ \begin{aligned} \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} &= \sum_{i = 1}^k - \frac{1}{2} \frac{\partial}{\partial \alpha_j} \left[ (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \Sigma^{-1}_y (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ &= - \frac{1}{2} \left(2 (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \Sigma_y^{-1}(- \mathbf{1}_n) \right) \\ &= (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \Sigma_y^{-1} \mathbf{1}_n \end{aligned} $$ So then: $$ \begin{aligned} \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha} &= \begin{bmatrix} (\mathbf{y}_1 - \alpha_1 \mathbf{1}_n)^\top \Sigma_y^{-1} \mathbf{1}_n \\ \vdots \\ (\mathbf{y}_k - \alpha_k \mathbf{1}_n)^\top \Sigma_y^{-1} \mathbf{1}_n \end{bmatrix} \end{aligned} $$Proof.
We set the derivative with respect to $\sigma^2$ equal to zero, substitute $\tau^2 = 0$ (under $H_0$), and solve for $\sigma^2$:
$$ \begin{aligned} 0 &= \frac{\partial}{\partial \sigma^2} \left[ \ell(\theta; \mathbf{y}) \right] \bigg\rvert_{\theta = \theta_0}\\ 0 &= -\frac{1}{2}\sum_{i = 1}^k \left[ \frac{n}{\sigma^2} - \frac{0}{\sigma^2(\sigma^2 + 0 \cdot\mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[\mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\cdot 0}{(\sigma^2)^2(\sigma^2 + 0 \cdot \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top - \frac{(0)^2}{(\sigma^2)^2(\sigma^2 + 0 \cdot \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ 0 &= -\frac{1}{2}\sum_{i = 1}^k \left[ \frac{n}{\sigma^2} + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} \right](\mathbf{y}_i - \alpha_i \mathbf{1}_n)\right] \\ 0 &= - \frac{nk}{2 \sigma^2} - \frac{1}{2} \sum_{i = 1}^k \left[ - \frac{1}{(\sigma^2)^2} (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ 0 &= - \frac{nk}{2\sigma^2} + \frac{1}{2 (\sigma^2)^2} \sum_{i = 1}^k (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \frac{nk}{2 \sigma^2} &= \frac{1}{2(\sigma^2)^2} \sum_{i = 1}^k (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \sigma^2 n k &= \sum_{i = 1}^k (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \sigma^2 &= \frac{1}{nk} \sum_{i = 1}^k (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \end{aligned} \nonumber $$
This expression still depends on $\alpha$; plugging in the maximizer found next gives $\hat{\sigma}^2$. We set the gradient w.r.t. $\alpha$ equal to zero, substitute $\tau^2 = 0$ (under $H_0$), and solve for $\alpha$. 
$$ \begin{aligned} \mathbf{0} &= \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha} \bigg\rvert_{\theta = \theta_0} \\ \mathbf{0} &= \begin{bmatrix} (\mathbf{y}_1 - \alpha_1 \mathbf{1}_n)^\top \left[\frac{1}{\sigma^2}\mathbb{I}_{n \times n} \right] \mathbf{1}_n \\ \vdots \\ (\mathbf{y}_k - \alpha_k \mathbf{1}_n)^\top \left[\frac{1}{\sigma^2}\mathbb{I}_{n \times n} \right] \mathbf{1}_n \end{bmatrix} \end{aligned} \nonumber $$
Since each entry of the gradient involves only one component of $\alpha$, we can solve them all separately:
$$ \begin{aligned} 0 &= (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} \right] \mathbf{1}_n \\ 0 &= \frac{1}{\sigma^2} (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \mathbf{1}_n \\ 0 &= \frac{1}{\sigma^2} \sum_{j = 1}^n (\mathbf{y}_{i,j} - \alpha_i) \\ 0 &= \frac{1}{\sigma^2} \left( \sum_{j =1 }^n \mathbf{y}_{i,j} - n \alpha_i \right) \\ n \alpha_i &= \sum_{j =1 }^n \mathbf{y}_{i,j} \\ \alpha_i &= \frac{1}{n} \sum_{j = 1}^n \mathbf{y}_{i,j} \end{aligned} $$
To find the information, we need to compute the second-order derivatives of the log-likelihood, take the expectation under $H_0$ of minus those quantities, and evaluate them by plugging in $\hat{\theta}$:
\[\begin{aligned} \mathcal{I}_{\theta, \theta} (\hat{\theta}) &= -\mathbb{E}\left[ \frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \theta \partial \theta^\top}\right]\bigg\rvert_{\theta = \hat{\theta}} = \begin{bmatrix} \frac{n}{\hat{\sigma}^2} & \dots & 0 & 0 & 0\\ \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & \dots & \frac{n}{\hat{\sigma}^2} & 0 & 0 \\ 0 & \dots & 0 & \frac{nk}{2(\hat{\sigma}^2)^2} & \frac{1}{2(\hat{\sigma}^2)^2} \sum_{i = 1}^k \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right]\\ 0 & \dots & 0 & \frac{1}{2(\hat{\sigma}^2)^2} \sum_{i = 1}^k \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right] & \frac{1}{2(\hat{\sigma}^2)^2} \sum_{i = 1}^k \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] \end{bmatrix} \end{aligned}\]
Proof.
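Before the algebra, the variance-component block can be cross-checked with the standard Gaussian identity $\mathcal{I}_{ab} = \frac{1}{2} \text{tr}\left[ \Sigma^{-1} \frac{\partial \Sigma}{\partial a} \Sigma^{-1} \frac{\partial \Sigma}{\partial b} \right]$ for covariance parameters, together with $\mathbf{1}_n^\top \Sigma^{-1} \mathbf{1}_n$ for each intercept. That identity is brought in here as an outside fact and is not the route taken in the derivation that follows. The sketch below (Python with NumPy) evaluates it under $H_0$ and compares it with the closed-form entries $n / \sigma^2$, $nk / (2 (\sigma^2)^2)$, $\sum_i \mathbf{z}_i^\top \mathbf{z}_i / (2 (\sigma^2)^2)$, and $\sum_i (\mathbf{z}_i^\top \mathbf{z}_i)^2 / (2 (\sigma^2)^2)$; all numeric values and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
k, n, sigma2 = 3, 4, 1.7
z = rng.normal(size=(k, n))               # one row of covariates per cluster

def info_entry(d_a, d_b, sigma_inv):
    """0.5 * tr(Sigma^{-1} dSigma_a Sigma^{-1} dSigma_b) for one cluster."""
    return 0.5 * np.trace(sigma_inv @ d_a @ sigma_inv @ d_b)

sigma_inv = np.eye(n) / sigma2            # under H0, Sigma_{y_i} = sigma2 * I
d_sigma2 = np.eye(n)                      # dSigma / d sigma^2
ones = np.ones(n)

I_ss = I_st = I_tt = 0.0
for i in range(k):
    d_tau2 = np.outer(z[i], z[i])         # dSigma / d tau^2 = z_i z_i^T
    I_ss += info_entry(d_sigma2, d_sigma2, sigma_inv)
    I_st += info_entry(d_sigma2, d_tau2, sigma_inv)
    I_tt += info_entry(d_tau2, d_tau2, sigma_inv)
I_aa = ones @ sigma_inv @ ones            # information for each alpha_i

# Closed-form entries, using tr[z z^T] = z^T z and tr[z z^T z z^T] = (z^T z)^2
zz = np.sum(z ** 2, axis=1)
closed = {
    "alpha": n / sigma2,
    "sigma2_sigma2": n * k / (2 * sigma2 ** 2),
    "sigma2_tau2": np.sum(zz) / (2 * sigma2 ** 2),
    "tau2_tau2": np.sum(zz ** 2) / (2 * sigma2 ** 2),
}
```

The trace simplifications in the last block are also what make the information cheap to evaluate: only the scalars $\mathbf{z}_i^\top \mathbf{z}_i$ are needed.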
We start by taking the derivative with respect to $\theta$ (component-wise) of the first derivative with respect to $\sigma^2$: $$ \begin{aligned} \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} \right] &= - \frac{1}{2}\sum_{i = 1}^k -\frac{n}{(\sigma^2)^2} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{(\sigma^2)^3} \mathbb{I}_{n \times n} - \frac{2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top + \frac{2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z})^3}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right](\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \right] &= -\frac{1}{2}\sum_{i = 1}^k -\frac{1}{(\sigma^2)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right] - \frac{-\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{(\sigma^2)^3} \mathbf{z}_i \mathbf{z}_i^\top + \frac{-2\tau^2(2\sigma^2 + 3\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{-2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] 
(\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] &= (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ -\frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbf{1}_n \end{aligned} \nonumber $$Proof.
$$ \begin{aligned} \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} \right] &= -\frac{1}{2}\sum_{i = 1}^k \frac{\partial}{\partial \sigma^2}\left[ \frac{n}{\sigma^2} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[\mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ &= - \frac{1}{2}\sum_{i = 1}^k -\frac{n}{(\sigma^2)^2} - \frac{-\tau^2(2\sigma^2+\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{(\sigma^2)^3} \mathbb{I}_{n \times n} + \frac{-2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top - \frac{-2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z})^3}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right](\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ &= - \frac{1}{2}\sum_{i = 1}^k -\frac{n}{(\sigma^2)^2} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{(\sigma^2)^3} \mathbb{I}_{n \times n} - \frac{2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top 
\mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top + \frac{2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z})^3}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right](\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \right] &= - \frac{1}{2} \sum_{i = 1}^k \frac{\partial}{\partial \sigma^2} \left[ \frac{1}{\sigma^2} \text{tr}[\mathbf{z}_i \mathbf{z}_i^\top] - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i\mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbf{z}_i \mathbf{z}_i^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top\mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}^\top \mathbf{z}_i \mathbf{z}_i^\top \right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ &= -\frac{1}{2}\sum_{i = 1}^k -\frac{1}{(\sigma^2)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right] - \frac{-\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{(\sigma^2)^3} \mathbf{z}_i \mathbf{z}_i^\top + \frac{-2\tau^2(2\sigma^2 + 3\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{-2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3} \mathbf{z}_i \mathbf{z}_i^\top 
\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] &= \frac{\partial}{\partial \sigma^2} \left[ (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbf{1}_n \right] \\ &= (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ -\frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbf{1}_n \end{aligned} \nonumber $$Proof.
$$ \begin{aligned} \frac{\partial}{\partial \tau^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} \right] &= -\frac{1}{2}\sum_{i = 1}^k \frac{\partial}{\partial \tau^2}\left[ \frac{n}{\sigma^2} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[\mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ &= -\frac{1}{2}\sum_{i = 1}^k 0 - \frac{1}{(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[0 + \frac{2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top - \frac{2\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ &= -\frac{1}{2}\sum_{i = 1}^k - \frac{1}{(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[\frac{2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top - \frac{2\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \frac{\partial}{\partial \tau^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \right] &= - \frac{1}{2} \sum_{i = 1}^k \frac{\partial}{\partial \tau^2} \left[ \frac{1}{\sigma^2} \text{tr}[\mathbf{z}_i \mathbf{z}_i^\top] - 
\frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i\mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbf{z}_i \mathbf{z}_i^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top\mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}^\top \mathbf{z}_i \mathbf{z}_i^\top \right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\ &= -\frac{1}{2}\sum_{i = 1}^k 0 - \frac{1}{(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ 0 + \frac{2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{2\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3} \mathbf{z}_i \mathbf{z}_i^\top\mathbf{z}_i \mathbf{z}_i^\top\mathbf{z}_i \mathbf{z}_i^\top\right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ &= -\frac{1}{2}\sum_{i = 1}^k - \frac{1}{(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{2\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3} \mathbf{z}_i \mathbf{z}_i^\top\mathbf{z}_i \mathbf{z}_i^\top\mathbf{z}_i \mathbf{z}_i^\top\right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \\ \frac{\partial}{\partial \tau^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] &= \frac{\partial}{\partial \tau^2} \left[ (\mathbf{y}_j - \alpha_j 
\mathbf{1}_n)^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbf{1}_n \right] \\ &= (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ 0 - \frac{1}{(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbf{1}_n \\ &= (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbf{1}_n \end{aligned} \nonumber $$Proof.
$$ \begin{aligned}
\frac{\partial}{\partial \alpha_j} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} \right] &= -\frac{1}{2}\sum_{i = 1}^k \frac{\partial}{\partial \alpha_j} \left[ \frac{n}{\sigma^2} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[\mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\
&= -\frac{1}{2}\left[0 - 0 - 2(\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)^2} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \right]\mathbf{1}_n \right] \\
&= (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)^2} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \right]\mathbf{1}_n \\
\frac{\partial}{\partial \alpha_j} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \right] &= - \frac{1}{2} \sum_{i = 1}^k \frac{\partial}{\partial \alpha_j} \left[ \frac{1}{\sigma^2} \text{tr}[\mathbf{z}_i \mathbf{z}_i^\top] - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i\mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbf{z}_i \mathbf{z}_i^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top\mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] (\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \\
&= -\frac{1}{2} \left[0 - 0 - 2(\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbf{z}_j \mathbf{z}_j^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top\mathbf{z}_j)^2} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \right] \mathbf{1}_n \right] \\
&= (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbf{z}_j \mathbf{z}_j^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top\mathbf{z}_j)^2} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \right] \mathbf{1}_n \\
\frac{\partial}{\partial \alpha_j} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] &= \frac{\partial}{\partial \alpha_j} \left[ (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \right] \mathbf{1}_n \right] \\
&= \frac{\partial}{\partial \alpha_j} \left[ \sum_{h = 1}^n (\mathbf{y}_{j,h} - \alpha_j) \sum_{l = 1}^n \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \right]_{h, l} \right] \\
&= - \sum_{h = 1}^n \sum_{l = 1}^n \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \right]_{h, l} \\
&= - \mathbf{1}_n^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \right] \mathbf{1}_n \\
\frac{\partial}{\partial \alpha_{j'}} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] &= \frac{\partial}{\partial \alpha_{j'}} \left[ (\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \right] \mathbf{1}_n \right] \\
&= 0 \qquad (j' \neq j)
\end{aligned} \nonumber $$
Proof.
$$ \begin{aligned}
\mathbb{E}\left[ \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} \right] \right] &= \mathbb{E}\left[ - \frac{1}{2}\sum_{i = 1}^k \left[ -\frac{n}{(\sigma^2)^2} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top\right] + (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{(\sigma^2)^3} \mathbb{I}_{n \times n} - \frac{2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top + \frac{2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right](\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \right] \\
&= - \frac{1}{2}\sum_{i = 1}^k \left[ -\frac{n}{(\sigma^2)^2} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top\right] + \mathbb{E}\left[ (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \left[ \frac{2}{(\sigma^2)^3} \mathbb{I}_{n \times n} - \frac{2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top + \frac{2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right](\mathbf{y}_i - \alpha_i \mathbf{1}_n) \right] \right] \\
&= - \frac{1}{2}\sum_{i = 1}^k \left[ -\frac{n}{(\sigma^2)^2} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top\right] + \text{tr}\left[ \left[ \frac{2}{(\sigma^2)^3} \mathbb{I}_{n \times n} - \frac{2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top + \frac{2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] \mathbb{E}\left[ (\mathbf{y}_i - \alpha_i \mathbf{1}_n) (\mathbf{y}_i - \alpha_i \mathbf{1}_n)^\top \right] \right] \right] \\
&= - \frac{1}{2}\sum_{i = 1}^k \left[ -\frac{n}{(\sigma^2)^2} + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top\right] + \text{tr}\left[ \left[ \frac{2}{(\sigma^2)^3} \mathbb{I}_{n \times n} - \frac{2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2}\mathbf{z}_i \mathbf{z}_i^\top + \frac{2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3}\mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] \Sigma_{y_i} \right] \right] \\
\mathbb{E}\left[ \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \right] \right] &= -\frac{1}{2}\sum_{i = 1}^k \left[ -\frac{1}{(\sigma^2)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \right] + \frac{\tau^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \text{tr}\left[ \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] + \text{tr}\left[ \left[ \frac{2}{(\sigma^2)^3} \mathbf{z}_i \mathbf{z}_i^\top - \frac{2\tau^2(3\sigma^2 + 2\tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3(\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^2} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top + \frac{2(\tau^2)^2(2\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)}{(\sigma^2)^3 (\sigma^2 + \tau^2 \mathbf{z}_i^\top \mathbf{z}_i)^3} \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \mathbf{z}_i \mathbf{z}_i^\top \right] \Sigma_{y_i} \right] \right] \\
\mathbb{E}\left[ \frac{\partial}{\partial \sigma^2} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] \right] &= 0
\end{aligned} \nonumber $$
Proof.
$$ \begin{aligned}
\mathbb{E}\left[ \frac{\partial}{\partial \alpha_j} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \sigma^2} \right] \right] &= \mathbb{E}\left[(\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbb{I}_{n \times n} + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)^2} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \right]\mathbf{1}_n \right] \\
&= 0 \\
\mathbb{E}\left[ \frac{\partial}{\partial \alpha_j} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \right] \right] &= \mathbb{E}\left[(\mathbf{y}_j - \alpha_j \mathbf{1}_n)^\top \left[ - \frac{1}{(\sigma^2)^2} \mathbf{z}_j \mathbf{z}_j^\top + \frac{2\tau^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top - \frac{(\tau^2)^2}{(\sigma^2)^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top\mathbf{z}_j)^2} \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \mathbf{z}_j \mathbf{z}_j^\top \right] \mathbf{1}_n \right] \\
&= 0 \\
\mathbb{E}\left[ \frac{\partial}{\partial \alpha_j} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] \right] &= - \mathbf{1}_n^\top \left[ \frac{1}{\sigma^2} \mathbb{I}_{n \times n} - \frac{\tau^2}{\sigma^2(\sigma^2 + \tau^2 \mathbf{z}_j^\top \mathbf{z}_j)} \mathbf{z}_j \mathbf{z}_j^\top \right] \mathbf{1}_n \\
\mathbb{E}\left[\frac{\partial}{\partial \alpha_{j'}} \left[ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_j} \right] \right] &= 0 \qquad (j' \neq j)
\end{aligned} \nonumber $$
Negative Binomial Case
In this example, we’ll let the responses be negative binomial. To keep things simple, we’ll use a single fixed intercept per cluster, $\alpha_i$, and a single random effect, $\beta_i$. We let $\phi > 0$ denote the known dispersion parameter and assume the conditional mean is given by:
\[\mu_{i,j} = \exp\left( \alpha_i + \beta_i \mathbf{z}_{i,j} \right) \label{eq:neg-bin-mean}\]The likelihood based on a single observation, $\mathbf{y}_{i,j}$, is given by:
\[\mathcal{L}(\mathbf{y}_{i,j}; \alpha_i, \tau^2 \rvert \beta_i) = \frac{\Gamma\left(\mathbf{y}_{i,j} + \frac{1}{\phi}\right)}{\Gamma(\mathbf{y}_{i,j} + 1) \Gamma\left(\frac{1}{\phi} \right)}\left(\frac{1}{1 + \phi \mu_{i,j}}\right)^{\frac{1}{\phi}} \left( \frac{\phi \mu_{i,j}}{1 + \phi \mu_{i,j}} \right)^{\mathbf{y}_{i,j}} \label{eq:neg-bin-single-lik}\]where $\Gamma(\cdot)$ is the gamma function:
\[\Gamma(x) = \int_0^\infty t^{x - 1} \exp(-t) dt\]The above parametrization of the likelihood implies that the conditional variance of the responses is given by:
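\[V(\mu_{i,j}) = \mu_{i,j} + \phi \mu_{i,j}^2\]This is the usual negative binomial with size $\frac{1}{\phi}$ and mean $\mu_{i,j}$, so larger values of $\phi$ mean more overdispersion relative to the Poisson. The conditional log-likelihood based on cluster $i$ is: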
\[\ell(\mathbf{y}_i; \alpha_i, \tau^2 \rvert \beta_i) = \sum_{j = 1}^{n_i} \left[ \log \Gamma \left( \mathbf{y}_{i,j} + \frac{1}{\phi} \right) - \log \Gamma\left(\mathbf{y}_{i,j} + 1\right) - \log\Gamma\left(\frac{1}{\phi} \right) - \frac{1}{\phi} \log\left(1 + \phi \mu_{i,j} \right) + \mathbf{y}_{i,j} \left( \log(\phi \mu_{i,j}) - \log(1 + \phi \mu_{i,j}) \right) \right] \label{eq:neg-bin-full-cond-ll}\]
Pseudo-Likelihood Approach
We follow a pseudo-likelihood approach (see here). We assume the following generalized linear mixed model:
\[\mathbf{y}_{i,j} \rvert \beta_i \sim \text{NegBin}(\mu_{i,j}, \phi); \hspace{10mm} \mu_{i,j} = \exp\left(\eta_{i,j}\right) = \exp\left(\alpha_i + \beta_i \mathbf{z}_{i,j}\right) \label{eq:glmm-y}\]We’ll use a superscript $\star$ to denote a quantity evaluated at the parameter estimates made under $H_0$ (i.e. $\tau^2 = \mathbf{0}$). Our working responses and errors are:
\[\mathbf{y}^\star_{i,j} = \alpha_i + \beta_i \mathbf{z}_{i,j} + \epsilon^\star_{i,j}; \hspace{10mm} \epsilon^\star_{i,j} \sim \mathcal{N}\left(0, \frac{V(\hat{\mu}_{i,j})}{\delta^2(\hat{\eta}_{i,j})}\right)\]where \(\delta(\hat{\eta}_{i,j}) = \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}}\). We can then apply the results from the previous section to this case, replacing the common error variance $\sigma^2$ with an observation-specific one, \(\hat{\sigma}^2_{i,j} = \text{Var}(\epsilon^\star_{i,j}) = \frac{V(\hat{\mu}_{i,j})}{\delta^2(\hat{\eta}_{i,j})}\).
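To make the working variance concrete for the log link used here (a sketch that relies only on $g^{-1}(\eta) = \exp(\eta)$): the derivative is \(\delta(\hat{\eta}_{i,j}) = \exp(\hat{\eta}_{i,j}) = \hat{\mu}_{i,j}\), so
\[\hat{\sigma}^2_{i,j} = \frac{V(\hat{\mu}_{i,j})}{\hat{\mu}_{i,j}^2}\]Under $H_0$ the linear predictor is \(\hat{\eta}_{i,j} = \hat{\alpha}_i\) for every $j$, so in this intercept-only case the fitted means, and hence the working variances, are constant within a cluster.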
To apply these results, we need an estimate of $\alpha_i$ under $H_0$. In this case, the model for cluster $i$ reduces to:
\[\mathbf{y}^\star_{i} = \alpha_i \mathbf{1}_n + \epsilon^\star_{i}; \hspace{10mm} \mathbf{y}^\star_{i} = \begin{bmatrix} \mathbf{y}^\star_{i, 1} \\ \vdots \\ \mathbf{y}^\star_{i, n} \end{bmatrix}; \hspace{2mm} \epsilon^\star_{i} = \begin{bmatrix} \epsilon^\star_{i, 1} \\ \vdots \\ \epsilon^\star_{i,n} \end{bmatrix}\]Thus, a solution can be found in closed form via weighted least squares as:
\[\hat{\alpha}_i = (\mathbf{1}_n^\top \mathbf{W}_i \mathbf{1}_n)^{-1} \mathbf{1}_n^\top \mathbf{W}_i \mathbf{y}^\star_i; \hspace{8mm} \mathbf{W}_i = \begin{bmatrix} \frac{1}{\hat{\sigma}_{i,1}^2} & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \frac{1}{\hat{\sigma}_{i,n}^2} \end{bmatrix}\]
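Because the design is just $\mathbf{1}_n$ and $\mathbf{W}_i$ is diagonal, the weighted least squares solution collapses to a precision-weighted mean of the working responses. Here is a minimal Python sketch of that computation. The function name `wls_intercept` and the numbers in the example are hypothetical and only for illustration; the working responses and variances are assumed to have been computed already.

```python
from typing import Sequence


def wls_intercept(y_star: Sequence[float], sigma2: Sequence[float]) -> float:
    """Closed-form WLS estimate of one cluster's intercept.

    With design vector 1_n and W = diag(1 / sigma2), the estimator
    (1' W 1)^{-1} 1' W y_star is a precision-weighted mean of y_star.
    """
    if len(y_star) != len(sigma2):
        raise ValueError("y_star and sigma2 must have the same length")
    if any(s <= 0 for s in sigma2):
        raise ValueError("working variances must be positive")
    weights = [1.0 / s for s in sigma2]                   # diagonal of W_i
    denom = sum(weights)                                  # 1' W 1
    numer = sum(w * y for w, y in zip(weights, y_star))   # 1' W y_star
    return numer / denom


# Hypothetical working responses and working variances for one cluster.
y_star = [0.9, 1.2, 1.0, 1.4]
sigma2 = [0.5, 1.0, 0.5, 2.0]
alpha_hat = wls_intercept(y_star, sigma2)
print(alpha_hat)
```

When all working variances in a cluster are equal, the weights cancel and the estimate is just the plain average of the working responses.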