Generalized Linear Mixed Models

These are my own notes, so errors and typos are bound to appear!

This post is a primer on generalized linear mixed models (GLMM) and their associated estimation procedures. Throughout this post, we take derivatives involving matrices, vectors, and scalars. We denote all of them with $\partial$ even though they may not technically be partial derivatives. The meaning should be clear from the context if you keep track of the dimensions of the variables.

We’ll use the notation $\text{vec}(\mathbf{M})$ to denote the vectorization of matrix $\mathbf{M}$, where we flatten the matrix row-by-row. Also, any function $f: \mathbb{R} \rightarrow \mathbb{R}$ will be applied element-wise to vectors and matrices.


Background

Let’s first explain why generalized linear mixed models are of interest and helpful in data analysis. Suppose we have a dataset in which the observations can be grouped in some way (i.e. we have repeated observations from the same unit). One example is longitudinal studies where we obtain measurements from the same individuals over time. Another example is in education where we may observe students from the same classes or schools. In these settings, it seems reasonable to assume that observations from the same unit would be more similar than those from different ones. This can be realized in our model by including a way for the effects of covariates to differ across units.

As in generalized linear models, we assume the mean response is some function of a linear combination of covariates. However, GLMMs are called mixed because the effects of the covariates are of two types: fixed and random. The fixed effects represent population-level relationships, while the random effects represent unit-specific ones. In this way, we can use GLMMs to analyze both population-level effects (via the fixed effects) and between-unit variation (via the random effects): the fixed effects determine the relationship between the covariates and the mean response for the population (i.e. overall), and the random effects describe how each group’s mean response deviates from that. Later on, we introduce measurement errors, which account for the deviation of each observation from its group’s mean (i.e. within-unit variation).

Set-Up

We have $n$ observations coming from $k$ different clusters, each of size $n_i$ for $i \in [k]$. The full data will be denoted by $\mathbf{y}$. Though $\mathbf{y}$ is a vector, we’ll denote the $j$-th observation from cluster $i$ with $\mathbf{y}_{i,j}$. That is, \(\mathbf{y}_{i,j}\) denotes element \(\sum_{l = 1}^{i - 1} n_l + j\) of $\mathbf{y}$. We’ll also denote the $n_i$-dimensional vector of responses for cluster $i$ with $\mathbf{y}_i$.

For each observation, we will have $p$ fixed effect covariates arranged in a $p$-dimensional vector, \(\mathbf{x}_{i, j}\), and $q$ random effects covariates in a $q$-dimensional vector, \(\mathbf{z}_{i,j}\). We’ll assume that observations from different clusters are independent; observations within the same cluster are only conditionally independent given the random effects, as discussed below.

We’ll let $\alpha$ denote the $p$-dimensional fixed effect coefficient vector, and $\beta_t$ will denote the $q$-dimensional random effect coefficient vector corresponding to group $t$. We assume $\beta_t \overset{iid}{\sim} \mathcal{F}$ for all $t \in [k]$, where $\mathcal{F}$ is an exponential family distribution with $q \times q$ covariance matrix $D(\tau^2)$ depending on an $m$-dimensional variance component vector, $\tau^2$. These random effects are assumed to be independent of the covariates. When they are not, we can run into issues with bias when estimating the fixed effects.1

Additional Assumptions. A few auxiliary assumptions must also be made for the analysis later, which we list here:
  • The third and higher moments of $\beta$ are of order $o(\rvert \rvert \tau^2 \rvert \rvert)$.
  • The entries of $D(\tau^2)$ are linear in $\tau^2$.
  • $D(\tau^2) = \mathbf{0}$ (the $q \times q$ zero matrix) when $\tau^2 = \mathbf{0}$.

Our model comes in the form of a specification of the conditional mean, $\mu_{i,j} = \mathbb{E}[\mathbf{y}_{i,j} \rvert \beta_i]$ (where we suppress the additional conditioning on the covariates themselves). For a monotonic and differentiable link function $g$ (e.g. $\log(\cdot)$ or $\text{logit}(\cdot)$), the conditional mean of the $j$-th observation in group $i$ is assumed to be given by:

\[\mu_{i,j} = g^{-1}\left(\mathbf{x}_{i,j}^\top \alpha + \mathbf{z}_{i,j}^\top \beta_i \right) \label{eq:glmm}\]

We then assume that the observations themselves follow some exponential family distribution with measurement errors, $\epsilon_{i,j}$, which is the deviation of the response from its (unit-specific) conditional mean. These errors are assumed to have mean zero and be independent of each other and of the random effects. We further assume the responses, $\mathbf{y}_{i,j}$, conditional on the random effects (and the covariates), are independent with variances equal to some function of the conditional mean.

To write Eq. \eqref{eq:glmm} in matrix notation, we assume that the observations are ordered by group, so the first $n_1$ observations are all from group $1$. We can then define the following vectors/matrices:

\[\mu = \begin{bmatrix} \mu_{1, 1} \\ \vdots \\ \mu_{k, n_k} \end{bmatrix}, \hspace{2mm} \mathbf{X} = \begin{bmatrix} — & \mathbf{x}_{1,1} & — \\ & \vdots & \\ — & \mathbf{x}_{k, n_k} & — \\ \end{bmatrix}, \hspace{2mm} \tilde{\mathbf{Z}}_i = \begin{bmatrix} — & \mathbf{z}_{i, 1} & — \\ & \vdots & \\ — & \mathbf{z}_{i, n_i} & — \\ \end{bmatrix}, \hspace{2mm} \mathbf{Z} = \begin{bmatrix} \tilde{\mathbf{Z}}_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \tilde{\mathbf{Z}}_k \end{bmatrix}, \hspace{2mm} \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \nonumber\]

where $\tilde{\mathbf{Z}}_i$ is constructed by stacking the \(\mathbf{z}_{i,j}\) vectors for group $i$ into a matrix, and where $\beta$ is the vector created by stacking together all of the $\beta_i$ vectors vertically. As a reminder, \(\mu \in \mathbb{R}^{n \times 1}\), \(\mathbf{X} \in \mathbb{R}^{n \times p}\), \(\tilde{\mathbf{Z}}_i \in \mathbb{R}^{n_i \times q}\), \(\mathbf{Z} \in \mathbb{R}^{n \times qk}\), $\alpha \in \mathbb{R}^p$, and \(\beta \in \mathbb{R}^{qk \times 1}\).

The model is then:

\[\mu = g^{-1}(\mathbf{X} \alpha + \mathbf{Z} \beta) \label{eq:glmm-matrix}\]
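To make the bookkeeping above concrete, here is a minimal numpy sketch (all variable names are my own) that assembles $\mathbf{Z}$ as a block-diagonal matrix from per-cluster $\tilde{\mathbf{Z}}_i$ blocks and checks the dimensions:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
k, q, p = 3, 2, 4                    # clusters, random-effect dim, fixed-effect dim
n_i = [5, 7, 4]                      # cluster sizes
n = sum(n_i)

X = rng.normal(size=(n, p))          # stacked fixed-effect covariates
Z_tilde = [rng.normal(size=(m, q)) for m in n_i]  # per-cluster random-effect covariates
Z = block_diag(*Z_tilde)             # n x (qk) block-diagonal matrix
assert Z.shape == (n, q * k)

alpha = rng.normal(size=p)           # fixed effects
beta = rng.normal(size=q * k)        # stacked beta_1, ..., beta_k
eta = X @ alpha + Z @ beta           # linear predictor, one entry per observation
```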

We’ll use $[\cdot] \rvert_{\beta = \beta_0}$ to denote evaluation of the function in brackets when setting $\beta$ equal to $\beta_0$. Similarly, we’ll use $[\cdot] \rvert_{H_0}$ to denote evaluation under the null hypothesis. We’ll also use a superscript $0$ (e.g. $\mu^0$, $\eta^0$, etc.) to denote the quantity under the null hypothesis (i.e. $\tau^2 = \mathbf{0} \implies \beta = \mathbf{0}$).

Example.
To keep things simple, we'll assume a cluster-specific fixed intercept and a random slope. Thus, $\alpha, \beta \in \mathbb{R}^{k}$, and each $\mathbf{z}_{i, j} \in \mathbb{R}$.
In this simple case, $\tilde{\mathbf{Z}}_i \in \mathbb{R}^{n_i}$, so $\mathbf{Z}$ is $n \times k$. Our model is then: $$ \mathbf{y}_{i,j} \rvert \beta_i \sim \text{Poi}\left(\mu_{i,j}\right), \hspace{5mm} \mu_{i,j} = \exp\left(\alpha_i + \beta_i \mathbf{z}_{i,j} \right) $$ We'll have a scalar-valued variance component that we call $\tau^2$. The random effects will have distribution: $$ \beta_i \overset{iid}{\sim} \mathcal{N}(0, \tau^2), \hspace{5mm} \forall i \in [k] $$ If we let $\mathbf{A}$ denote the $n$-dimensional vector of intercepts in which the $i$-th entry of $\alpha$ is repeated $n_i$ times, then we can write our model in vector form as: $$ \mu = \exp(\mathbf{A} + \mathbf{Z} \beta) $$
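Here is a small simulation from this example model, just to make the set-up concrete (the constants are arbitrary, and the names are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 50                                          # number of clusters
n_i = rng.integers(3, 8, size=k)                # cluster sizes
tau2 = 0.25                                     # variance component
alpha = 0.3 * rng.normal(size=k)                # cluster-specific fixed intercepts
beta = rng.normal(scale=np.sqrt(tau2), size=k)  # random slopes, beta_i ~ N(0, tau^2)

y, z, cluster = [], [], []
for i in range(k):
    z_i = rng.uniform(-1, 1, size=n_i[i])       # scalar random-effect covariate
    mu_i = np.exp(alpha[i] + beta[i] * z_i)     # conditional mean under the log link
    y.append(rng.poisson(mu_i))
    z.append(z_i)
    cluster.append(np.full(n_i[i], i))

y, z, cluster = map(np.concatenate, (y, z, cluster))
```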

Likelihood

We can write the conditional log-likelihood using the exponential family form (see my generalized linear models post).

\[\ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \sum_{i = 1}^{k} \sum_{j = 1}^{n_i} \left[ \frac{\zeta_{i,j} \mathbf{y}_{i,j} - A(\zeta_{i,j})}{d(\phi, \omega_{i,j})} + \log(h(\mathbf{y}_{i,j}, \phi, \omega_{i,j}))\right], \hspace{10mm} \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \exp \left( \sum_{i = 1}^{k} \sum_{j = 1}^{n_i} \left[ \frac{\zeta_{i,j} \mathbf{y}_{i,j} - A(\zeta_{i,j})}{d(\phi, \omega_{i,j})} + \log(h(\mathbf{y}_{i,j}, \phi, \omega_{i,j}))\right] \right) \label{eq:condition-log-lik}\]

where $\phi > 0$ is a dispersion/scale parameter; $\omega_{i,j}$ is a (prior) dispersion weight; $\zeta_{i,j}$ is a distribution parameter; and $h(\cdot)$ and $A(\cdot)$ are known functions. We assume that $d(\phi, \omega_{i,j}) = \phi \omega_{i,j}^{-1}$.

As we found in my generalized linear models post, the conditional variance of the responses is given by:

\[\text{Cov}(\mathbf{y} \rvert \beta) = \text{diag}(d(\phi, \omega)) \underbrace{\frac{\partial^2 A(\zeta)}{\partial \zeta \partial \zeta^\top}}_{=V(\mu)} = \text{diag}^{-1}\left( \frac{\omega}{\phi}\right) V(\mu)\]

Quasi-Likelihood

We don’t actually need to specify the true likelihood, even though (in practice) we will usually be able to do so. Using quasi-likelihood methods can often be less computationally expensive than maximum likelihood for count data.

In the quasi-likelihood scheme, we keep the conditional mean assumption that we discussed above and also assume that the conditional variance can be expressed as a function of the mean in the form $\text{Cov}(\mathbf{y} \rvert \beta) = \text{diag}\left(d(\phi, \omega)\right)V(\mu)$ for some function $V(\cdot)$. Later, we’ll denote the diagonal elements of $\text{Cov}(\mathbf{y} \rvert \beta)$ with $v(\mu_{i,j})$.

The conditional log quasi-likelihood and quasi-likelihood are given by:

\[\ell_q(\mathbf{y}_{i,j}; \alpha, \tau^2 \rvert \beta_i) = \int_{\mathbf{y}_{i,j}}^{\mu_{i,j}} \frac{\omega_{i,j}(\mathbf{y}_{i,j} - u)}{\phi V(u)} du, \hspace{10mm} \mathcal{L}_q(\mathbf{y}_{i,j}; \alpha, \tau^2 \rvert \beta_i) = \exp\left(\int_{\mathbf{y}_{i,j}}^{\mu_{i,j}} \frac{\omega_{i,j}(\mathbf{y}_{i,j} - u)}{\phi V(u)} du \right) \label{eq:quasi-lik}\]

When the responses follow an exponential family distribution, the conditional log-likelihood and log quasi-likelihood agree (up to terms that do not involve $\mu$). See my post on quasi-likelihood for a more in-depth discussion.
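We can sanity-check this claim numerically in the Poisson case, where $V(u) = u$ and $\phi = \omega_{i,j} = 1$: the quasi-likelihood integral should match the Poisson log-likelihood up to a term that does not involve $\mu$. A quick scipy sketch:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

y, mu = 4.0, 2.5
V = lambda u: u                                # Poisson variance function

ql, _ = quad(lambda u: (y - u) / V(u), y, mu)  # log quasi-likelihood, omega = phi = 1
loglik = y * np.log(mu) - mu - gammaln(y + 1)  # exact Poisson log-likelihood

# The two differ only by a term that does not involve mu:
const = y * np.log(y) - y - gammaln(y + 1)
assert np.isclose(ql, loglik - const)
```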

Let $f$ denote the density associated with $\mathcal{F}$, the distribution of the random effects. The unconditional quasi-likelihood is then given by integrating out the random effects from the joint quasi-likelihood:

\[\begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \prod_{i = 1}^k \mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2) = \prod_{i = 1}^k \int\prod_{j = 1}^{n_i} \mathcal{L}_q(\mathbf{y}_{i,j}; \alpha, \tau^2 \rvert \beta_i) f(\beta_i) d\beta_i \end{aligned} \label{eq:uncondition-log-lik}\]

Remember that the above integral is multi-dimensional if $\beta_i$ has dimension greater than $1$!

We usually do maximum (quasi-)likelihood estimation by taking the derivative of the log of the above with respect to the parameters $(\alpha, \tau^2)$, setting it equal to zero, and solving. However, depending upon the form of $f(\cdot)$ and the number of random effects we have, Eq. \eqref{eq:uncondition-log-lik} may be a multi-dimensional integral that is difficult to evaluate, let alone differentiate. One option is to use quadrature to approximate the integral as a summation. There are also two other, computationally more feasible, methods that provide a work-around.
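To make the quadrature option concrete, here is a sketch (using the Poisson example above, with arbitrary constants) that approximates a single cluster's contribution to Eq. \eqref{eq:uncondition-log-lik} with Gauss-Hermite quadrature:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
tau2, alpha_i = 0.3, 0.5
z_i = rng.uniform(-1, 1, size=6)                # one cluster's covariates
y_i = rng.poisson(np.exp(alpha_i + 0.4 * z_i))  # one cluster's responses

def cond_lik(b):
    """Conditional likelihood of cluster i given the random slope b."""
    mu = np.exp(alpha_i + b * z_i)
    return np.prod(poisson.pmf(y_i, mu))

# int f(b) N(b; 0, tau2) db  ~  (1/sqrt(pi)) sum_m w_m f(sqrt(2 tau2) x_m)
x, w = np.polynomial.hermite.hermgauss(30)
L_i = sum(w_m * cond_lik(np.sqrt(2 * tau2) * x_m)
          for x_m, w_m in zip(x, w)) / np.sqrt(np.pi)
print(L_i)  # this cluster's marginal (quasi-)likelihood contribution
```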

There are two common methods of approximate inference for GLMMs: penalized quasi-likelihood (PQL) and marginal quasi-likelihood (MQL). They are very similar in that both perform a Taylor approximation of the conditional log quasi-likelihood to evaluate the integral in Eq. \eqref{eq:uncondition-log-lik}. However, PQL uses the maximum a posteriori estimates of the random effects as the operating point for the expansion, while MQL uses the random effects (prior) mean. See below for a discussion of the approaches.

Penalized Quasi-Likelihood. Penalized quasi-likelihood (PQL) essentially uses Laplace's method to approximate the integral above. The general idea behind Laplace's method is to approximate integrals of a certain form as: $$ \int \exp(M f(x)) dx \approx \exp(M f(x_0)) \int \exp\left( - \frac{1}{2} M \rvert f''(x_0) \rvert (x - x_0)^2 \right) dx $$ where $M$ is a large scalar, $f$ is a twice-differentiable function, and $x_0$ is a global maximum of $f$. If $f''(x_0) < 0$ and $M \rightarrow \infty$, then the integrand above is basically a Gaussian kernel and the approximation becomes: $$ \int \exp(M f(x)) dx \approx \left(\frac{2 \pi}{M \rvert f''(x_0) \rvert} \right)^{\frac{1}{2}} \exp(M f(x_0)) \label{eq:ql-gauss} $$ Letting $c = (2 \pi)^{-\frac{qk}{2}}$ and $d_{i,j}(y_{i,j}, \mu_{i,j}) = -2 \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{a_{i,j} V(z)} dz$ (with $a_{i,j} = \omega_{i,j}^{-1}$, and where, with a slight abuse of notation, $D(\tau^2)$ below denotes the covariance matrix of the full stacked vector $\beta$), we can rewrite the integral as: $$ \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) = c \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left( -\kappa(\beta) \right) d \beta; \hspace{5mm} \kappa(\beta) = \frac{1}{2\phi}\sum_{i = 1}^k \sum_{j = 1}^{n_i} d_{i,j}(y_{i,j}, \mu_{i,j}) + \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta $$
Proof. $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \int \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathcal{L}(\beta) d \beta \\ &= \int \left[ \prod_{i = 1}^k \prod_{j = 1}^{n_i} \exp\left( \ell_q(y_{i,j}; \mu_{i,j} \rvert \beta) \right) \right] (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \exp\left(- \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta \\ &= (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left( \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{\phi a_{i,j} V(z)} dz \right) - \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta \\ &= (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left(- \frac{1}{2 \phi} \sum_{i = 1}^k \sum_{j = 1}^{n_i} \left( -2 \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{a_{i,j} V(z)} dz \right) - \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta \\ &= (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left(- \frac{1}{2 \phi} \sum_{i = 1}^k \sum_{j = 1}^{n_i} d_{i,j}(y_{i,j}, \mu_{i,j}) - \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta & \left( d_{i,j}(y_{i,j}, \mu_{i,j}) = -2 \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{a_{i,j} V(z)} dz \right) \end{aligned} \nonumber $$
Note that the first term in $\kappa(\beta)$ does involve $\beta$ since $\mu_{i,j} = g^{-1}(\mathbf{x}_{i,j}^\top \alpha + \mathbf{z}_{i,j}^\top \beta_i)$. Let $\kappa'$ and $\kappa''$ denote the vector of first-order partial derivatives and the matrix of second-order partial derivatives, respectively, of $\kappa$ with respect to $\beta$: $$ \begin{aligned} \kappa'_k(\beta) &= -\sum_{j = 1}^{n_k} \left(\frac{y_{k,j} - \mu_{k,j}}{\phi a_{k,j} V(\mu_{k,j})\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}}\right)\mathbf{z}_{k,j} + D^{-1}(\tau^2) \beta_k \\ \kappa''(\beta) &\approx \tilde{\mathbf{Z}}^\top \mathbf{W} \tilde{\mathbf{Z}} + D^{-1}(\tau^2) \end{aligned} $$ where $\mathbf{W}$ is the $n \times n$ diagonal matrix with elements $\frac{1}{\phi a_{i,j} V(\mu_{i,j}) \left(\frac{\partial g(\mu_{i,j})}{\partial \mu_{i,j}}\right)^2}$, and $\tilde{\mathbf{Z}}$ is the $n \times q$ matrix formed by stacking the $\tilde{\mathbf{Z}}_i$ matrices vertically.
Proof. $$ \begin{aligned} \kappa'_k(\beta) &= \frac{\partial}{\partial \beta_k} \left[\frac{1}{2 \phi} \sum_{i = 1}^k \sum_{j = 1}^{n_i} d_{i,j}(y_{i,j}, \mu_{i,j}) + \frac{1}{2} \beta^\top D^{-1}(\tau^2) \beta \right] \\ &= \frac{1}{2 \phi} \sum_{i = 1}^k \sum_{j = 1}^{n_i} \frac{\partial}{\partial \beta_k} \left[d_{i,j}(y_{i,j}, \mu_{i,j}) \right] + D^{-1}(\tau^2) \beta_k \\ &= \frac{1}{2 \phi} \sum_{j = 1}^{n_k} \frac{\partial \mu_{k,j}}{\partial \beta_k} \frac{\partial}{\partial \mu_{k,j}}\left[ -2 \int_{y_{k,j}}^{\mu_{k,j}} \frac{y_{k,j} - z}{a_{k,j} V(z)}dz \right] + D^{-1}(\tau^2) \beta_k \\ &= \frac{1}{2 \phi} \sum_{j = 1}^{n_k} \frac{\partial \mu_{k,j}}{\partial \beta_k} \left(-2 \frac{y_{k,j} - \mu_{k,j}}{a_{k,j} V(\mu_{k,j})}\right) + D^{-1}(\tau^2) \beta_k \\ &= -\sum_{j = 1}^{n_k} \frac{\partial \mu_{k,j}}{\partial \beta_k} \left(\frac{y_{k,j} - \mu_{k,j}}{\phi a_{k,j} V(\mu_{k,j})}\right) + D^{-1}(\tau^2) \beta_k \\ \end{aligned} $$ Assuming $g(\cdot)$ is strictly monotone and continuous, we have: $$ \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}} = \frac{\partial \mu_{i,j}}{\partial \eta_{i,j}} = \left(\frac{\partial \eta_{i,j}}{\partial \mu_{i,j}}\right)^{-1} = \left(\frac{\partial g(\mu_{i,j})}{\partial \mu_{i,j}}\right)^{-1} $$ This implies: $$ \frac{\partial \mu_{k,j}}{\partial \beta_k} = \frac{\partial \eta_{k,j}}{\partial \beta_k} \frac{\partial \mu_{k, j}}{\partial \eta_{k,j}} = \mathbf{z}_{k,j} \left(\frac{\partial \eta_{k,j}}{\partial \mu_{k,j}}\right)^{-1} = \frac{\mathbf{z}_{k,j}}{\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} $$ Thus, $\kappa'_k(\beta)$ is: $$ \kappa'_k(\beta) = -\sum_{j = 1}^{n_k} \left(\frac{y_{k,j} - \mu_{k,j}}{\phi a_{k,j} V(\mu_{k,j})\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}}\right)\mathbf{z}_{k,j} + D^{-1}(\tau^2) \beta_k $$ Looking at the second-order partial derivatives: $$ \begin{aligned} \kappa''_{k,k}(\beta) &= \frac{\partial}{\partial \beta_k^\top} \left[ -\sum_{j = 1}^{n_k} \left(\frac{y_{k,j} - \mu_{k,j}}{\phi a_{k,j} V(\mu_{k,j})\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}}\right)\mathbf{z}_{k,j} + D^{-1}(\tau^2) \beta_k \right] \\ &= - \sum_{j = 1}^{n_k} \left( \frac{\mathbf{z}_{k,j}}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \frac{\partial}{\partial \beta_k^\top} \left[ y_{k,j} - \mu_{k,j} \right] + \mathbf{z}_{k,j}(y_{k,j} - \mu_{k,j})\frac{\partial}{\partial \beta_k^\top} \left[ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \right] \right) + D^{-1}(\tau^2) \\ &= - \sum_{j = 1}^{n_k} \left( \frac{-\mathbf{z}_{k,j}\mathbf{z}_{k, j}^\top}{\phi a_{k,j} V(\mu_{k,j}) \left(\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}\right)^2} + \mathbf{z}_{k,j}(y_{k,j} - \mu_{k,j})\frac{\partial}{\partial \beta_k^\top} \left[ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \right] \right) + D^{-1}(\tau^2) \\ &\overset{(i)}{=} \sum_{j = 1}^{n_k} \frac{\mathbf{z}_{k,j}\mathbf{z}_{k, j}^\top}{\phi a_{k,j} V(\mu_{k,j}) \left(\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}\right)^2} + D^{-1}(\tau^2) + R_k \\ &\overset{(ii)}{=} \tilde{\mathbf{Z}}_k^\top \mathbf{W}_k \tilde{\mathbf{Z}}_k + D^{-1}(\tau^2) + R_k \end{aligned} $$ where in $(i)$, we set: $$ R_k = -\sum_{j = 1}^{n_k} (y_{k,j} - \mu_{k,j}) \mathbf{z}_{k,j} \frac{\partial}{\partial \beta_k^\top} \left[ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \right] $$ and in $(ii)$, $\mathbf{W}_k$ is the diagonal matrix with elements: $$ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \left(\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}\right)^2}$$ Since $\kappa'_k(\beta)$ only involves $\beta_k$, $\kappa''_{k, k'}(\beta) = \mathbf{0}$ for $k \neq k'$. Stacking all of the $\mathbf{W}_k$ and $R_k$ together, we get: $$ \kappa''(\beta) = \tilde{\mathbf{Z}}^\top \mathbf{W} \tilde{\mathbf{Z}} + D^{-1}(\tau^2) + R \approx \tilde{\mathbf{Z}}^\top \mathbf{W} \tilde{\mathbf{Z}} + D^{-1}(\tau^2) $$ where we drop $R$ for the approximation.
Letting $\hat{\beta}$ be a solution to the equation $\kappa'(\beta) = \mathbf{0}$, and writing $\hat{\mu}_{i,j}$ for the conditional means evaluated at $\hat{\beta}$, we can ignore constants and remainders (by assuming they are negligible in the long run...) and apply the approximation in Eq. \eqref{eq:ql-gauss}: $$ \begin{aligned} \ell_q(\mathbf{y}; \alpha, \tau^2) &\approx \log\left( \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left( \frac{2 \pi}{\rvert \kappa''(\hat{\beta}) \rvert}\right)^{\frac{1}{2}} \exp(-\kappa(\hat{\beta})) \right) \\ &= \frac{1}{2}\log(2 \pi) - \frac{1}{2} \log\left( \rvert D(\tau^2) \rvert \right) - \frac{1}{2} \log\left( \rvert \kappa''(\hat{\beta}) \rvert \right) - \kappa(\hat{\beta}) \\ &\approx -\frac{1}{2} \log\left( \rvert D(\tau^2) \rvert \right) -\frac{1}{2} \log\left( \rvert \tilde{\mathbf{Z}}^\top \mathbf{W} \mathbf{\tilde{Z}} + D^{-1}(\tau^2) \rvert \right) - \frac{1}{2\phi} \sum_{i = 1}^k \sum_{j = 1}^{n_i} d_{i,j}(y_{i,j}, \hat{\mu}_{i,j}) - \frac{1}{2}\hat{\beta}^\top D^{-1}(\tau^2) \hat{\beta} \end{aligned} $$
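Putting the pieces together for our Poisson example (where $\phi = a_{i,j} = 1$, $V(\mu) = \mu$, and $\partial g(\mu)/\partial \mu = 1/\mu$, so the elements of $\mathbf{W}$ are just $\mu_{i,j}$), here is a rough numerical sketch of the PQL approximation; for simplicity I treat the fixed intercept as known, and all names are my own:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import xlogy

rng = np.random.default_rng(3)
k, tau2, alpha0 = 20, 0.3, 0.5                  # alpha0: known fixed intercept
z = [rng.uniform(-1, 1, size=5) for _ in range(k)]
beta_true = rng.normal(scale=np.sqrt(tau2), size=k)
y = [rng.poisson(np.exp(alpha0 + b * zi)) for b, zi in zip(beta_true, z)]

def kappa(beta):
    """kappa(beta) = (1/(2 phi)) sum_ij d_ij + (1/2) beta' D^{-1} beta."""
    val = 0.0
    for bi, zi, yi in zip(beta, z, y):
        mu = np.exp(alpha0 + bi * zi)
        dev = 2 * (xlogy(yi, yi / mu) - (yi - mu))  # Poisson deviance d_ij
        val += 0.5 * dev.sum()
    return val + beta @ beta / (2 * tau2)

beta_hat = minimize(kappa, np.zeros(k)).x           # mode of the integrand

# -1/2 log|D| - 1/2 log|Z'WZ + D^{-1}| = -1/2 sum_i log(1 + tau2 * sum_j z_ij^2 mu_ij)
logdet = sum(np.log1p(tau2 * np.sum(zi**2 * np.exp(alpha0 + bi * zi)))
             for bi, zi in zip(beta_hat, z))
ell_pql = -0.5 * logdet - kappa(beta_hat)           # approximate log quasi-likelihood
```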
Marginal Quasi-Likelihood. An alternative way to think about a generalized linear mixed model is from a marginal perspective, where we instead focus on the unconditional expectation of the response: $$ \mathbb{E}[y_{i,j}] = \mu_{i,j} = g^{-1}(\mathbf{x}_{i,j}^\top \alpha) $$ In contrast to PQL, marginal quasi-likelihood (MQL) performs a Taylor approximation of the log quasi-likelihood about the random effects (prior) mean $\beta = \mathbf{0}$ to get a closed-form approximation of the multi-dimensional integral. For the $i$-th cluster, this approximation is: $$ \begin{aligned} \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \approx \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)\rvert_{\beta_i = \mathbf{0}} + \beta_i^\top \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} + \frac{1}{2} \beta_i^\top \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \beta_i \end{aligned} $$ Thus: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left( \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) - \frac{1}{2}\beta_i^\top D^{-1}(\tau^2) \beta_i \right) d \beta_i \\ &\approx \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left( [\ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)]_{\beta_i = \mathbf{0}} + \beta_i^\top \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} + \frac{1}{2} \beta_i^\top \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \beta_i - \frac{1}{2}\beta_i^\top D^{-1}(\tau^2) \beta_i \right) d \beta_i \\ &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} \int \exp\left( \beta_i^\top \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} - \frac{1}{2} \beta_i^\top \left[ D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \right] \beta_i\right) d \beta_i \end{aligned} \label{eq:big-integral} $$ We can rewrite Eq.
\eqref{eq:big-integral} in canonical form by defining: $$ \mathbf{K}_i = D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}; \hspace{5mm} \mathbf{h}_i = \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} $$ And: $$ \bar{\Sigma}_i^{-1} = \mathbf{K}_i; \hspace{5mm} \bar{\mu}_i = \bar{\Sigma}_i \mathbf{h}_i; \hspace{5mm} g_i = - \frac{1}{2} \bar{\mu}_i^\top \bar{\Sigma}_i^{-1} \bar{\mu}_i - \log\left((2 \pi)^{\frac{q}{2}} \rvert \bar{\Sigma}_i \rvert^{\frac{1}{2}} \right) $$ Then: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &\approx \prod_{i = 1}^k \left\rvert \mathbb{I}_{q \times q} - D(\tau^2) \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right\rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \mathbf{h}_i^\top \mathbf{K}_i^{-1} \mathbf{h}_i \right) \\ \implies \ell_q(\mathbf{y}; \alpha, \tau^2) &\approx \sum_{i = 1}^k \left[ -\frac{1}{2} \log\left( \left\rvert \mathbb{I}_{q \times q} - D(\tau^2) \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right\rvert \right) + \left[ \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} + \frac{1}{2} \mathbf{h}_i^\top \mathbf{K}_i^{-1} \mathbf{h}_i \right] \end{aligned} \label{eq:marg-qll} $$
Proof. $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &\approx \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} \underbrace{\int \exp\left( \beta_i^\top \mathbf{h}_i - \frac{1}{2} \beta_i^\top \mathbf{K}_i \beta_i\right) d \beta_i}_{= \exp(-g_i)} \\ &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2}\bar{\mu}_i^\top \bar{\Sigma}_i^{-1}\bar{\mu}_i + \log\left((2 \pi)^{\frac{q}{2}} \rvert \bar{\Sigma}_i \rvert^{\frac{1}{2}}\right) \right) \\ &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2}\bar{\mu}_i^\top \bar{\Sigma}_i^{-1}\bar{\mu}_i\right) (2 \pi)^{\frac{q}{2}} \rvert \bar{\Sigma}_i \rvert^{\frac{1}{2}} \\ &= \prod_{i = 1}^k \rvert D(\tau^2)\rvert^{-\frac{1}{2}} \rvert \bar{\Sigma}_i^{-1} \rvert^{-\frac{1}{2}}\left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2}\bar{\mu}_i^\top \bar{\Sigma}_i^{-1}\bar{\mu}_i\right) \\ &= \prod_{i = 1}^k \rvert D(\tau^2) \mathbf{K}_i \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \mathbf{h}_i^\top (\mathbf{K}_i^{-1})^\top \mathbf{K}_i \mathbf{K}_i^{-1} \mathbf{h}_i \right) \\ &= \prod_{i = 1}^k \left\rvert D(\tau^2) \left[ D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right] \right\rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \mathbf{h}_i^\top \mathbf{K}_i^{-1} \mathbf{h}_i \right) \\ &= \prod_{i = 1}^k \left\rvert \mathbb{I}_{q \times q} - D(\tau^2) \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \right\rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i^\top } \right] \bigg\rvert_{\beta_i = \mathbf{0}} \left[ D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \right]^{-1} \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right) \end{aligned} $$
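And here is a matching sketch of the MQL approximation in Eq. \eqref{eq:marg-qll} for the same Poisson example (again treating the fixed intercept as known; with a scalar random slope, $\mathbf{h}_i$ and $\mathbf{K}_i$ are scalars):

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(3)
k, tau2, alpha0 = 20, 0.3, 0.5
z = [rng.uniform(-1, 1, size=5) for _ in range(k)]
beta_true = rng.normal(scale=np.sqrt(tau2), size=k)
y = [rng.poisson(np.exp(alpha0 + b * zi)) for b, zi in zip(beta_true, z)]

ell_mql = 0.0
for zi, yi in zip(z, y):
    mu0 = np.exp(alpha0 + 0.0 * zi)          # conditional mean at beta_i = 0
    ell0 = poisson.logpmf(yi, mu0).sum()     # cluster log-likelihood at beta_i = 0
    h_i = np.sum(zi * (yi - mu0))            # first derivative at beta_i = 0
    K_i = 1.0 / tau2 + np.sum(zi**2 * mu0)   # D^{-1} minus the second derivative at 0
    # |I - D * (second derivative)| = 1 + tau2 * sum_j z_ij^2 * mu0_ij here
    ell_mql += -0.5 * np.log1p(tau2 * np.sum(zi**2 * mu0)) + ell0 + 0.5 * h_i**2 / K_i
```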

An Aside

Penalized quasi-likelihood and marginal quasi-likelihood are quite similar, but they do have notable differences. For one, MQL will not result in predictions for the random effects. Thus, it is not suitable for inference at the group level. In addition, the estimates of the fixed effects are essentially for the marginal model. Fitzmaurice et al.1 note that MQL estimates are highly biased unless the random effects variance is near zero, regardless of the number of repeated measurements.

In contrast, though PQL can be used to estimate the fixed effects and predict the random effects, there are some complications. For small counts or low numbers of repeated measurements, PQL underestimates the fixed effects as well as the variance components. PQL and MQL can also be reframed as approximations of a GLMM with an adjusted LMM (see below).

Pseudo-Likelihood. We can reframe penalized and marginal quasi-likelihood as arising from a Gaussian approximation of the GLMM at hand. We'll follow Chapter 15 in Fitzmaurice et al.1 Suppose we have the following generalized linear mixed model: $$ \mathbf{y}_{i,j} \rvert \beta_i \sim \mathcal{F}(\mu_{i,j}); \hspace{10mm} \mu_{i,j} = g^{-1}\left(\eta_{i,j}\right) = g^{-1}\left(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}\right) \label{eq:glmm-y} $$ We first approximate the response by assuming it can be represented as its conditional mean plus some Gaussian error: $$ \mathbf{y}_{i,j} \approx \mu_{i,j} + \epsilon_{i,j} $$ where, for some variance function $V(\cdot)$ of the conditional mean (e.g. $V(\mu_{i,j}) = \mu_{i,j}$ in the Poisson case), the errors satisfy: $$ \mathbb{E}\left[ \epsilon_{i,j} \right] = 0; \hspace{5mm} \text{Var}(\epsilon_{i,j} ) = V(\mu_{i,j}); \hspace{5mm} \text{Cov}(\epsilon_{i,j}, \epsilon_{i', j'} ) = 0 $$ Since the link function is non-linear, we will linearize the mean with a first-order Taylor approximation about $\hat{\eta}_{i,j}$, the linear predictor estimated under $H_0$: $$ \mathbf{y}_{i,j} \approx g^{-1}(\hat{\eta}_{i,j}) + \delta(\hat{\eta}_{i,j}) (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} $$ where $\delta(\hat{\eta}_{i,j}) = \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}}$, the derivative of the inverse link function with respect to $\eta_{i,j}$, evaluated at the estimate under $H_0$.
Proof. $$ \begin{aligned} \mathbf{y}_{i,j} &\approx \left[g^{-1}(\eta_{i,j})\right]\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}} + \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}} (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} \\ &= g^{-1}(\hat{\eta}_{i,j}) + \delta(\hat{\eta}_{i,j}) (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} & \left(\delta(\hat{\eta}_{i,j}) = \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}}\right) \end{aligned} \nonumber $$
We can rearrange the above as: $$ \mathbf{y}_{i,j}^\star \approx \alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j} + \epsilon^\star_{i,j} $$ where $\mathbf{y}_{i,j}^\star = \frac{1}{\delta(\hat{\eta}_{i,j})}\left(\mathbf{y}_{i,j} - \hat{\mu}_{i,j}\right) + \hat{\eta}_{i,j}$ and $\epsilon^\star_{i,j} = \frac{1}{\delta(\hat{\eta}_{i,j})}\epsilon_{i,j}$.
Proof. $$ \begin{aligned} \implies &\mathbf{y}_{i,j} \approx g^{-1}(\hat{\eta}_{i,j}) + \delta(\hat{\eta}_{i,j}) (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} \\ \implies &\mathbf{y}_{i,j} \approx \hat{\mu}_{i,j} + \delta(\hat{\eta}_{i,j})(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}) - \delta(\hat{\eta}_{i,j})(\hat{\alpha}^\top \mathbf{x}_{i,j} + \mathbf{0}^\top \mathbf{z}_{i,j}) + \epsilon_{i,j} \\ \implies &\mathbf{y}_{i,j} + \delta(\hat{\eta}_{i,j})(\hat{\alpha}^\top \mathbf{x}_{i,j}) \approx \hat{\mu}_{i,j} + \delta(\hat{\eta}_{i,j})(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}) + \epsilon_{i,j} \\ \implies &\mathbf{y}_{i,j} - \hat{\mu}_{i,j} + \delta(\hat{\eta}_{i,j})(\hat{\alpha}^\top \mathbf{x}_{i,j}) \approx \delta(\hat{\eta}_{i,j})(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}) + \epsilon_{i,j} \\ \implies &\frac{1}{\delta(\hat{\eta}_{i,j})}\left(\mathbf{y}_{i,j} - \hat{\mu}_{i,j}\right) + \hat{\alpha}^\top \mathbf{x}_{i,j} \approx \alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j} + \frac{1}{\delta(\hat{\eta}_{i,j})}\epsilon_{i,j} \\ \implies &\frac{1}{\delta(\hat{\eta}_{i,j})}\left(\mathbf{y}_{i,j} - \hat{\mu}_{i,j}\right) + \hat{\eta}_{i,j} \approx \alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j} + \frac{1}{\delta(\hat{\eta}_{i,j})}\epsilon_{i,j} \end{aligned} \nonumber $$
Notice that since $\epsilon^\star_{i,j}$ is just a scaled version of $\epsilon_{i,j}$, it is also Gaussian with: $$ \mathbb{E}\left[ \epsilon^\star_{i,j} \right] = \mathbb{E}\left[ \frac{1}{\delta(\hat{\eta}_{i,j})} \epsilon_{i,j} \right] = 0; \hspace{8mm} \text{Var}\left(\epsilon^\star_{i,j} \right) = \text{Var}\left( \frac{1}{\delta(\hat{\eta}_{i,j})} \epsilon_{i,j} \right) = \frac{1}{\delta^2(\hat{\eta}_{i,j})} \text{Var}(\epsilon_{i,j}) = \frac{V(\hat{\mu}_{i,j})}{\delta^2(\hat{\eta}_{i,j})} \nonumber $$ The above essentially specifies a linear mixed model: $$ \mathbf{y}^\star_{i,j} \sim \mathcal{N}\left(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}, \frac{V(\hat{\mu}_{i,j})}{\delta^2(\hat{\eta}_{i,j})}\right) \nonumber $$
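For our Poisson example with the log link, $\delta(\hat{\eta}_{i,j}) = \exp(\hat{\eta}_{i,j}) = \hat{\mu}_{i,j}$, so the working responses are cheap to form. A minimal sketch, assuming `eta_hat` holds the current (e.g. null) estimates of the linear predictor:

```python
import numpy as np

rng = np.random.default_rng(4)
eta_hat = rng.normal(size=10)             # current estimates of the linear predictor
mu_hat = np.exp(eta_hat)                  # g^{-1}(eta); also delta(eta) for the log link
y = rng.poisson(mu_hat)

y_star = (y - mu_hat) / mu_hat + eta_hat  # working response y*
var_star = mu_hat / mu_hat**2             # Var(eps*) = V(mu) / delta(eta)^2 = 1 / mu
# One would now fit a linear mixed model to y_star with these variances and iterate.
```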

Alternative Approximation

Unfortunately, the approximate inference methods introduced above require a particular distribution for the random effects. If we want to relax this as Lin2 does, we have to do something slightly more involved.

Taylor Approximation

First, let’s rewrite the conditional quasi-likelihood of $(\alpha, \tau^2)$ given $\beta$ using a Taylor expansion. We’ll use a second-order expansion since the remainder will be relatively simple due to the assumptions made on the higher order moments of the random effects.

Denote the first- and second-order partial derivatives of $\mathcal{L}_{q}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta))$ with respect to $\beta$ by:

\[\begin{aligned} \frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta} &= \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\\ \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta \partial \beta^\top} &= \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\mathbf{Z}^\top \left(\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \right) \mathbf{Z} \end{aligned} \label{eq:likelihood-derivs}\]
First Order Derivations. $$ \begin{aligned} \frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta} &= \frac{\partial}{\partial \beta} \left[ \exp\left(\ell_{q}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \right] \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \frac{\partial}{\partial \beta} \left[ \sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \left(\frac{\partial \eta}{\partial \beta}\right)^\top \frac{\partial}{\partial \eta} \left[ \sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] & \left( \eta \in \mathbb{R}^{n \times 1}, \beta \in \mathbb{R}^{qk \times 1} \implies \frac{\partial \eta}{\partial \beta} \in \mathbb{R}^{n \times qk} \right) \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right)\mathbf{Z}^\top \frac{\partial}{\partial \eta} \left[ \sum_{i = 1}^n\ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] & \left(\eta = \mathbf{X} \alpha + \mathbf{Z} \beta \right) \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top & \left( \frac{\partial \ell_q}{\partial \eta} \in \mathbb{R}^{n \times 1}, \mathbf{Z}^\top \in \mathbb{R}^{qk \times n} \right) \end{aligned} \nonumber $$ where $\mathbf{Z}_{i, \cdot}$ denotes the $i$-th row of $\mathbf{Z}$ and the index $i$ here runs over all $n$ observations. In matrix notation: $$ \frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta} = \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} = \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \nonumber $$
Second Order Derivations. $$ \begin{aligned} \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta \partial \beta^\top} &= \frac{\partial}{\partial \beta} \left[\exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right] \\ &= \underbrace{\frac{\partial}{\partial \beta}[\exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta))] \left( \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} \mathbf{Z}_{i, \cdot}\right)}_{(a)} + \underbrace{\exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \frac{\partial}{\partial \beta^\top} \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right]}_{(b)} & \left( \text{product rule}\right) \\ &= \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \left[\left( \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right) \left( \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} \mathbf{Z}_{i, \cdot} \right) + \sum_{i = 1}^n \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i^2} (\mathbf{Z}_{i, \cdot})^\top \mathbf{Z}_{i, \cdot} \right] \end{aligned} \nonumber $$ We show $(a)$ and $(b)$ separately: $$ \begin{aligned} (a) &=\frac{\partial}{\partial \beta} \left[ \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \right] \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right]^\top \\ &= \exp\left(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right)\left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right)^\top\\ &= \exp\left(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right)\left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} \mathbf{Z}_{i, \cdot} \right) \\ (b) &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right)\frac{\partial}{\partial \beta^\top} \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right] \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \sum_{i = 1}^n (\mathbf{Z}_{i, \cdot})^\top \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i^2} \frac{\partial \eta_i}{\partial \beta^\top} & \left( \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \text{ depends on } \beta \text{ only through } \eta_i \right) \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right)\left( \sum_{i = 1}^n \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i^2} (\mathbf{Z}_{i, \cdot})^\top \mathbf{Z}_{i, \cdot} \right) & \left( \frac{\partial \eta_i}{\partial \beta^\top} = \mathbf{Z}_{i, \cdot} \right) \\ \end{aligned} \nonumber $$ In matrix notation: $$ \begin{aligned} \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta \partial \beta^\top} &= \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \left( \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} \mathbf{Z} + \mathbf{Z}^\top \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \mathbf{Z} \right)\\ &= \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \left( \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top}+ \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right) \mathbf{Z} \end{aligned} \nonumber $$
Example.
We can check the above derivations with our example. We have: $$ \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} = \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y} - \mu \right), \hspace{5mm} \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} = \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left[ (\mathbf{y} - \mu)(\mathbf{y} - \mu)^\top - \text{diag}(\mu)\right] \label{eq:derivs-eta} $$
Proof. We first find the first-order derivatives of the log quasi-likelihood with respect to the linear predictor, $\eta$: $$ \begin{aligned} \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t, r}} &= \frac{\partial}{\partial \eta_{t,r}} \left[ \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right)\right] \\ &= \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right) \sum_{i = 1}^k \sum_{j = 1}^{n_i} \frac{\partial}{\partial \eta_{t,r}} \left[ \mathbf{y}_{i,j} \eta_{i,j} - \exp(\eta_{i,j}) - \log(\mathbf{y}_{i,j}!) \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \exp(\eta_{t,r})\right) \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \mu_{t,r}\right) \\ \implies \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y} - \mu \right) \end{aligned} \nonumber $$ Next, the second-order derivatives: $$ \begin{aligned} \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t, r}^2} &= \frac{\partial}{\partial \eta_{t,r}} \left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \exp(\eta_{t,r}) \right) \right] \\ &= \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t,r}}(\mathbf{y}_{t,r} - \mu_{t,r}) + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (-\exp(\eta_{t,r})) \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \mu_{t,r}\right)^2 - \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mu_{t,r} \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left( \left(\mathbf{y}_{t,r} - \mu_{t,r}\right)^2 - \mu_{t,r} \right) \\ \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t,r} \partial \eta_{s, q}} &= \frac{\partial}{\partial \eta_{s, q}} \left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \exp(\eta_{t,r}) \right) \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{s,q} - \mu_{s,q}\right)\left(\mathbf{y}_{t,r} - \mu_{t,r} \right) \\ \implies \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left[ (\mathbf{y} - \mu)(\mathbf{y} - \mu)^\top - \text{diag}(\mu)\right] \end{aligned} \nonumber $$
And also: $$ \frac{\partial \mathcal{L}(\mathbf{y};\alpha, \tau^2 \rvert \beta)}{\partial \beta} = \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top (\mathbf{y} - \mu), \hspace{5mm} \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta \partial \beta^\top} = \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \left[ (\mathbf{y} - \mu) (\mathbf{y} - \mu)^\top - \text{diag}(\mu) \right] \mathbf{Z} \label{eq:derivs-beta} $$
Proof. Now we take the derivatives with respect to $\beta$: $$ \begin{aligned} \frac{\partial \mathcal{L}(\mathbf{y};\alpha, \tau^2 \rvert \beta)}{\partial \beta_t} &= \frac{\partial}{\partial \beta_t} \left[ \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right) \right] \\ &= \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right) \sum_{i = 1}^k \sum_{j = 1}^{n_i} \frac{\partial}{\partial \beta_t} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!)\right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \sum_{j = 1}^{n_t} \left[\mathbf{y}_{t,j}\frac{\partial}{\partial \beta_t} \left[\eta_{t,j} \right] - \frac{\partial}{\partial \beta_t} \left[ \exp(\eta_{t,j})\right] \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \sum_{j =1}^{n_t} \left[ \mathbf{y}_{t,j} \mathbf{z}_{t,j} - \exp(\eta_{t,j})\mathbf{z}_{t,j}\right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \sum_{j =1}^{n_t} \left[ \mathbf{z}_{t,j}( \mathbf{y}_{t,j} - \mu_{t,j}) \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t) \\ \implies \frac{\partial \mathcal{L}(\mathbf{y};\alpha, \tau^2 \rvert \beta)}{\partial \beta} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top (\mathbf{y} - \mu) \end{aligned} \nonumber $$ where we let $\mathbf{y}_t$ and $\mu_t$ denote the response vector and mean vector of the $t$-th cluster, respectively. The second-order partial derivatives are: $$ \begin{aligned} \frac{\partial^2 \mathcal{L}(\mathbf{y};\alpha, \tau^2 \rvert \beta)}{\partial \beta_t^2} &= \frac{\partial}{\partial \beta_t}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t) \right] \\ &= \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta_t} \tilde{\mathbf{Z}}_t^\top(\mathbf{y}_t - \mu_t) + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\frac{\partial}{\partial \beta_t}\left[ \tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \exp(\eta_t))\right] \\ &= \mathcal{L}(\mathbf{y}; \alpha,\tau^2 \rvert \beta) \left(\tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t)\right)^2 + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \frac{\partial}{\partial \beta_t}\left[ \sum_{j = 1}^{n_t} \mathbf{z}_{t,j} (\mathbf{y}_{t,j} - \exp(\eta_{t,j})) \right]\\ &= \mathcal{L}(\mathbf{y}; \alpha,\tau^2 \rvert \beta) \left(\tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t)\right)^2 + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left( \sum_{j = 1}^{n_t} \mathbf{z}_{t,j} (- \exp(\eta_{t,j})) \mathbf{z}_{t,j} \right)\\ &= \mathcal{L}(\mathbf{y}; \alpha,\tau^2 \rvert \beta) \left(\tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t)\right)^2 + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \tilde{\mathbf{Z}}_t^\top \text{diag}\left(-\mu_t \right)\tilde{\mathbf{Z}}_t \\ \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta_t \partial \beta_s} &= \frac{\partial}{\partial \beta_s}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t) \right] \\ &= \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta_s} \tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t) \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \tilde{\mathbf{Z}}_s^\top (\mathbf{y}_s - \mu_s) \tilde{\mathbf{Z}}_t^\top (\mathbf{y}_t - \mu_t) \\ \implies \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta \partial \beta^\top} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \left[ (\mathbf{y} - \mu) (\mathbf{y} - \mu)^\top - \text{diag}(\mu) \right] \mathbf{Z} \end{aligned} \nonumber $$
which matches what we derived in Eq. \eqref{eq:likelihood-derivs}.
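Because it is easy to drop a factor in these calculations, here is a quick finite-difference check of the first-order formula in Eq. \eqref{eq:derivs-beta} on a tiny instance of the Poisson example:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(5)
k, n_i = 3, [4, 5, 3]
alpha = 0.2 * rng.normal(size=k)
z = [rng.uniform(-1, 1, size=m) for m in n_i]
y = [rng.poisson(np.exp(a + 0.3 * zi)) for a, zi in zip(alpha, z)]

def lik(beta):
    """Conditional likelihood L(y | beta) for the Poisson example."""
    return np.prod([poisson.pmf(yi, np.exp(a + b * zi)).prod()
                    for a, b, zi, yi in zip(alpha, beta, z, y)])

beta = 0.1 * rng.normal(size=k)
mu = [np.exp(a + b * zi) for a, b, zi in zip(alpha, beta, z)]
grad_analytic = lik(beta) * np.array([np.sum(zi * (yi - mi))     # L * Z'(y - mu)
                                      for zi, yi, mi in zip(z, y, mu)])

eps = 1e-6
grad_numeric = np.array([(lik(beta + eps * e) - lik(beta - eps * e)) / (2 * eps)
                         for e in np.eye(k)])
assert np.allclose(grad_analytic, grad_numeric, rtol=1e-4, atol=1e-12)
```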

Let \(\mathcal{L}'_q(\beta_0)\) and \(\mathcal{L}''_q(\beta_0)\) denote the partial derivatives evaluated at $\beta = \beta_0$. The Taylor expansion of $\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)$ at $\beta_0$ is:

\[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \left[ \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \beta_0} + (\beta - \beta_0)^\top \left[\frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta} \right]\bigg\rvert_{\beta = \beta_0} + \frac{1}{2}(\beta - \beta_0)^\top \left[ \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta \partial \beta^\top}\right] \bigg\rvert_{\beta = \beta_0} (\beta - \beta_0) + [R]\rvert_{\beta = \beta_0} \label{eq:taylor-expansion}\]

where $R$ is a remainder term containing the higher order partial derivatives. Let $\bar{R}$ denote the remainder terms divided by \(\left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]\rvert_{\beta = \beta_0}\). Factoring out \(\left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]\rvert_{\beta = \mathbf{0}}\) and letting $\beta_0 = \mathbf{0}$ (the random effects mean), we get:

\[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \beta^\top \mathbf{Z}^\top \left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert\beta)}{\partial \eta} \right]\bigg\rvert_{\beta = \mathbf{0}} + \frac{1}{2}\beta^\top \left(\mathbf{Z}^\top \left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert\beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} \right)\beta + [\bar{R}]\rvert_{\beta = \mathbf{0}} \right) \label{eq:taylor-exp-0}\]
Example.
The Taylor expansion of the likelihood is fairly straightforward given Eq. \eqref{eq:derivs-beta}: $$ \begin{aligned} \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} + \beta^\top \left[\frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta} \right]_{\beta = \mathbf{0}} + \frac{1}{2} \beta^\top \left[ \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta \partial \beta^\top} \right]_{\beta = \mathbf{0}} \beta + \left[R\right]_{\beta = \mathbf{0}} \\ &\approx [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \beta^\top \left[ \mathbf{Z}^\top (\mathbf{y} - \mu^0) \right] + \frac{1}{2} \beta^\top \left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \beta \right) \end{aligned} \label{eq:ex-taylor} $$
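We can also check Eq. \eqref{eq:ex-taylor} numerically: for a small $\beta$, the quadratic expansion should agree closely with the exact conditional likelihood. A sketch:

```python
import numpy as np
from scipy.linalg import block_diag
from scipy.stats import poisson

rng = np.random.default_rng(6)
k, n_i = 3, [4, 3, 5]
alpha = 0.2 * rng.normal(size=k)
z = [rng.uniform(-1, 1, size=m) for m in n_i]
Z = block_diag(*[zi[:, None] for zi in z])       # n x k for scalar slopes
A = np.concatenate([np.full(m, a) for m, a in zip(n_i, alpha)])
y = rng.poisson(np.exp(A))                       # data generated with beta = 0

def lik(beta):
    return poisson.pmf(y, np.exp(A + Z @ beta)).prod()

mu0 = np.exp(A)
r = y - mu0
beta = 0.01 * rng.normal(size=k)                 # small beta: the expansion is local
quad = lik(np.zeros(k)) * (1 + beta @ Z.T @ r
                           + 0.5 * beta @ (Z.T @ (np.outer(r, r) - np.diag(mu0)) @ Z) @ beta)
assert np.isclose(lik(beta), quad, rtol=1e-3, atol=0.0)
```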

Taking The Expectation

Notice that Eq. \eqref{eq:uncondition-log-lik} is just the expectation of $\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)$ (respectively, $\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)$) with respect to the probability measure specified by $\mathcal{F}$. We can therefore write the unconditional quasi-likelihood (or likelihood) as the expectation of our Taylor expansion:

\[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) = \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\text{tr} \left[\mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert \rvert) \right) \label{eq:taylor-expansion-expectation}\]
Proof. Let: $$ \bar{\mathcal{L}}''_q(\mathbf{0}) = \left(\mathbf{Z}^\top\left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \right]\bigg\rvert_{\beta = \mathbf{0}}\mathbf{Z} \right) \nonumber $$ Then: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \int \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) f(\beta) d\beta \\ &= \mathbb{E}_{\beta} \left[ \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] \\ &= \mathbb{E}_{\beta} \left[ \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \beta^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\right]\bigg\rvert_{\beta = \mathbf{0}} + \frac{1}{2}\beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta + \bar{R} \right)\right] \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \mathbb{E}_{\beta} \left[ \beta^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\right]\bigg\rvert_{\beta = \mathbf{0}}\right] + \frac{1}{2}\mathbb{E}_{\beta} \left[ \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\mathbb{E}_{\beta} \left[ \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left(\mathbb{E}_\beta \left[ \beta^\top \right] = \mathbf{0} \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\mathbb{E}_{\beta} \left[ \text{tr}\left[ \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \right] \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \in \mathbb{R} \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\mathbb{E}_{\beta} \left[ \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \beta^\top \right] \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \text{tr}\left[ \cdot \right] \text{ commutes}\right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\text{tr}\left[ \mathbb{E}_{\beta} \left[ \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \beta^\top \right] \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \text{tr}\left[ \cdot \right] \text{ is linear} \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2} \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) \text{Cov}(\beta) \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \mathbb{E}_\beta \left[ \beta \right] = \mathbf{0} \implies \mathbb{E}_\beta \left[ \beta \beta^\top \right] = \text{Cov}(\beta)\right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}}\left( 1 + \frac{1}{2}\text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) D(\tau^2) \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \text{Cov}(\beta) = D(\tau^2) 
\right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2} \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert\rvert)\right) & \left( \text{higher order assumption} \right) \end{aligned} \nonumber $$ Plugging in the expression for $\bar{\mathcal{L}}''_q(\mathbf{0})$ implied by Eq. \eqref{eq:likelihood-derivs}: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2} \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert\rvert)\right) \\ &=\left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\text{tr} \left[\mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert \rvert) \right) \end{aligned} \nonumber $$

Taking the natural logarithm of both sides gives us the marginal log quasi-likelihood:

\[\begin{aligned} \ell_q(\mathbf{y}; \alpha, \tau^2) &= \log(\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2)) \\ &= \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} + \log\left(1 + \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert \rvert)\right) \\ &\approx \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}}+ \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] \end{aligned} \label{eq:log-quasi-lik}\]

The approximation follows from the fact that $\log(1 + x) \approx x$ for small $x$, and we drop the remainder term since we assume it is small.
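To make this concrete, here's a small numerical sketch of the correction term in Eq. \eqref{eq:log-quasi-lik}, using the Poisson random-intercept example from this post (the design and parameter values below are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Poisson/log-link setup: k clusters of size n_per, random intercepts.
k, n_per = 4, 5
Z = np.kron(np.eye(k), np.ones((n_per, 1)))  # n x k cluster-indicator design
mu0 = np.exp(0.3) * np.ones(k * n_per)       # mu evaluated at beta = 0
y = rng.poisson(mu0)

# For Poisson/log at beta = 0: dl/deta = y - mu0 and d^2 l/deta^2 = -diag(mu0),
# so the bracketed matrix is (y - mu0)(y - mu0)^T - diag(mu0).
C0 = np.outer(y - mu0, y - mu0) - np.diag(mu0)

tau2 = 0.1
D = tau2 * np.eye(k)                         # Cov(beta) = tau^2 I_k

# Second-order correction to the marginal log quasi-likelihood.
correction = 0.5 * np.trace(Z.T @ C0 @ Z @ D)
print(correction)
```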

Example.
Taking the expectation of Eq. \eqref{eq:ex-taylor} gives us: $$ \mathbb{E}_{H_0}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] \approx [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{\tau^2}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \right) $$
Proof. $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &\approx [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \mathbb{E}_{H_0}\left[1 + \beta^\top \left[ \mathbf{Z}^\top(\mathbf{y} - \mu^0) \right] + \frac{1}{2} \beta^\top \left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \beta \right] \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\mathbb{E}_{H_0}\left[ \text{tr}\left[ \beta^\top \left( \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right) \beta \right] \right] \right) & \left( \mathbb{E}_{H_0}\left[ \beta \right] = \mathbf{0} \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\text{tr}\left[ \mathbb{E}_{H_0}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \beta \beta^\top \right] \right] \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \mathbb{E}_{H_0}\left[ \beta \beta^\top \right] \right] \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} [\tau^2 \mathbb{I}_{k \times k}] \right] \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{\tau^2}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \right) \end{aligned} \nonumber $$
And taking the natural logarithm of this gives us the approximate marginal log-likelihood: $$ \begin{aligned} \ell(\mathbf{y}; \alpha, \tau^2) &\approx \left[ \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]_{\beta = \mathbf{0}} + \log\left( 1 + \frac{\tau^2}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right]\right) \\ &\approx \left[ \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]_{\beta = \mathbf{0}} + \frac{\tau^2}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \end{aligned} $$
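As a quick sanity check of this approximation (a simulation of mine, not part of the derivation), we can compare a Monte Carlo estimate of the expected likelihood against the second-order formula for a small $\tau^2$:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)

k, n_per = 3, 4
Z = np.kron(np.eye(k), np.ones((n_per, 1)))  # cluster-indicator design
eta0 = 0.2 * np.ones(k * n_per)              # linear predictor at beta = 0
mu0 = np.exp(eta0)
y = rng.poisson(mu0)
tau2 = 0.01                                  # small variance component

def pois_lik(eta):
    # Poisson likelihood (product of pmfs) at linear predictor eta
    return np.exp(np.sum(y * eta - np.exp(eta) - gammaln(y + 1)))

# Monte Carlo estimate of E_beta[L(y | beta)] with beta ~ N(0, tau^2 I_k)
betas = rng.normal(scale=np.sqrt(tau2), size=(100_000, k))
mc = np.mean([pois_lik(eta0 + Z @ b) for b in betas])

# Second-order approximation from above
C0 = np.outer(y - mu0, y - mu0) - np.diag(mu0)
approx = pois_lik(eta0) * (1 + 0.5 * tau2 * np.trace(Z.T @ C0 @ Z))

print(mc, approx)  # should agree closely for small tau^2
```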

Of course, these derivations include a bunch of approximations chained together, which raises the question of how well they’ll work in practice. We’ll have to just wait and find out I guess…


Inference

Okay, so now we have an approximation of the likelihood or quasi-likelihood, which we can use for inference. We’ll continue with the Taylor approximation. We should note that we further assume that:

\[\frac{\partial g^{-1}(\eta_i)}{\partial \eta_i} = \frac{\partial \mu_i}{\partial \eta_i} = \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \label{eq:deriv-assumption}\]

which holds if you assume that $g(\cdot)$ is strictly monotone and differentiable with a derivative that is never $0$. Then $g(\cdot)$ is invertible, and the inverse function theorem gives the identity above. These conditions hold for many standard choices of link function (e.g. $\log$, $\text{logit}$, the identity).
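For instance, here's a quick finite-difference check of this identity for the logit link (my own throwaway check, not anything official):

```python
import numpy as np

# Check d mu/d eta = (d eta/d mu)^{-1} numerically for the logit link,
# g(mu) = log(mu / (1 - mu)), so g^{-1}(eta) = 1 / (1 + exp(-eta)).
eta = 0.7
h = 1e-6
mu = 1 / (1 + np.exp(-eta))

dmu_deta = (1 / (1 + np.exp(-(eta + h))) - 1 / (1 + np.exp(-(eta - h)))) / (2 * h)
deta_dmu = (np.log((mu + h) / (1 - mu - h)) - np.log((mu - h) / (1 - mu + h))) / (2 * h)

print(dmu_deta, 1 / deta_dmu)  # both approximately mu * (1 - mu)
```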

Since the score function involves a derivative, there is a separate one for each parameter. Because we assume that the random effects are Gaussian with mean zero and covariance matrix parametrized by $\tau^2$, our main parameters are just $\alpha$, $\tau^2$, and $\phi$ (the $\omega$ are prior weights).

Before we derive the scores, let’s first define the following terms:

\[\delta_i = \frac{1}{\frac{\partial \eta_i}{\partial \mu_i}}, \hspace{5mm} \rho_i = \frac{\frac{\partial v(\mu_i)}{\partial \mu_i} \frac{\partial \eta_i}{\partial \mu_i} + v(\mu_i) \frac{\partial^2 \eta_i}{\partial \mu_i^2}}{v^2(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3}, \hspace{5mm} \gamma_i = \frac{1}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2}, \hspace{5mm} \gamma_{0, i} = \gamma_i + \rho_i(\mathbf{y}_i - \mu_i) \label{eq:intermed-terms}\]

where we recall that we let $v(\mu_i) = d(\phi, \omega_i) V(\mu_i) = \frac{\phi}{\omega_i}V(\mu_i)$. We’ll also define the following matrices:

\[\Delta = \begin{bmatrix} \delta_1 & \dots & 0 \\ \vdots & \ddots &\vdots \\ 0 & \dots & \delta_n \end{bmatrix}, \hspace{5mm} \Gamma = \begin{bmatrix} \gamma_1 & \dots & 0 \\ \vdots & \ddots &\vdots \\ 0 & \dots & \gamma_n \end{bmatrix}, \hspace{5mm} \Gamma_0 = \begin{bmatrix} \gamma_{0, 1} & \dots & 0 \\ \vdots & \ddots &\vdots \\ 0 & \dots & \gamma_{0, n} \end{bmatrix} \label{eq:intermed-mats}\]

To save space, we’ll also define:

\[\mathbf{C} = \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\]

and we’ll let $\mathbf{C}^0 = \mathbf{C} \rvert_{\beta = \mathbf{0}}$ as before.

To streamline our analysis later, let’s do some initial derivations. We have:

\[\begin{aligned} \frac{\partial}{\partial \eta_i} \left[ \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \right] &= -\frac{\partial^2 \eta_i}{\partial \mu_i^2}\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \\ \frac{\partial}{\partial \mu_i} \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \\ \frac{\partial}{\partial \eta_i}\left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} = \gamma_i \delta^{-1}_i (\mathbf{y}_i - \mu_i) \\ -\frac{\partial^2}{\partial \eta_i^2}[ \ell_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)] &= - \frac{\partial}{\partial \eta_i} \left[ \frac{\mathbf{y}_i - \mu_i}{v(\mu_i) \frac{\partial \eta_i}{\partial \mu_i}} \right] = \gamma_{0, i} \end{aligned} \label{eq:intermed-derivs}\]
Proof. $$ \begin{aligned} \frac{\partial}{\partial \eta_i} \left[ \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1}\right] &= - \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-2} \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i}\left[ \frac{\partial \eta_i}{\partial \mu_i} \right] = -\frac{\partial^2 \eta_i}{\partial \mu_i^2}\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \\ \frac{\partial}{\partial \mu_i} \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \frac{\partial}{\partial \mu_i} \left[ \int_{y_i}^{\mu_i} \frac{\omega_i(\mathbf{y}_i - u)}{\phi V(u)} du \right] = \frac{\omega_i(\mathbf{y}_i - \mu_i)}{\phi V(\mu_i)} = \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \\ \frac{\partial}{\partial \eta_i}\left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i} \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] = \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} = \frac{\frac{\partial \eta_i}{\partial \mu_i}}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2} (\mathbf{y}_i - \mu_i) = \gamma_i \delta^{-1}_i (\mathbf{y}_i - \mu_i) \nonumber \end{aligned} $$ To show the last equality, we note that: $$ \begin{aligned} \frac{\partial^2 \mu_i}{\partial \eta_i^2} &= \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i}\left[ \frac{\partial \mu_i}{\partial \eta_i}\right] = - \frac{\partial \mu_i}{\partial \eta_i} \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-2} \frac{\partial^2 \eta_i}{\partial \mu_i^2} = - \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \frac{\partial^2 \eta_i}{\partial \mu_i^2} \hspace{5mm} \implies \hspace{5mm} \frac{\partial^2 \eta_i}{\partial \mu_i^2} = -\frac{\partial^2 \mu_i}{\partial \eta_i^2} \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{3} \end{aligned} \label{eq:temp} $$ Then: $$ \begin{aligned} \frac{\partial^2}{\partial \eta_i^2}[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)] &= \frac{\partial}{\partial \eta_i} \left[ \frac{\partial \mu_i}{\partial \eta_i} \right] \left(\frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \right)+ \left(\frac{\partial \mu_i}{\partial \eta_i} \right)\frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i} \left[ \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \right] & \left(\text{product and chain rule}\right) \\ &= \frac{\partial^2 \mu_i}{\partial \eta_i^2} \left(\frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \right) + \left( \frac{\partial \mu_i}{\partial \eta_i}\right)^2 \left(\frac{-v(\mu_i) - \frac{\partial v(\mu_i)}{\partial \mu_i}(\mathbf{y}_i - \mu_i)}{v^2(\mu_i)}\right) \\ &= - \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \frac{\partial^2 \eta_i}{\partial \mu_i^2} \left( \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)}\right) + \left( \frac{\partial \mu_i}{\partial \eta_i}\right)^2 \left(\frac{-v(\mu_i) - \frac{\partial v(\mu_i)}{\partial \mu_i}(\mathbf{y}_i - \mu_i)}{v^2(\mu_i)}\right) & \left(\text{Eq. } \eqref{eq:temp}\right) \\ &= - \frac{\frac{\partial^2 \eta_i}{\partial \mu_i^2} \left( \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)}\right)}{\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{3}} + \frac{-v(\mu_i) - \frac{\partial v(\mu_i)}{\partial \mu_i}(\mathbf{y}_i - \mu_i)}{v^2(\mu_i) \left( \frac{\partial \eta_i}{\partial \mu_i}\right)^2 } & \left( \frac{\partial \mu_i}{\partial \eta_i} = \left(\frac{\partial \eta_i}{\partial \mu_i} \right)^{-1} \text{ by Eq. 
\eqref{eq:deriv-assumption}}\right) \\ &= - \frac{1}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2} - (\mathbf{y}_i - \mu_i)\left(\frac{\frac{\partial^2 \eta_i}{\partial \mu_i^2}}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3} + \frac{\frac{\partial v(\mu_i)}{\partial \mu_i}}{v^2(\mu_i)\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2}\right) \\ &= - \gamma_i - (\mathbf{y}_i - \mu_i)\left(\frac{v(\mu_i) \frac{\partial^2 \eta_i}{\partial \mu_i^2}}{v^2(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3} + \frac{\frac{\partial v(\mu_i)}{\partial \mu_i} \frac{\partial \eta_i}{\partial \mu_i}}{v^2(\mu_i)\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3}\right) \\ &= - \gamma_i - \rho_i (\mathbf{y}_i - \mu_i) \\ &= - \gamma_{0, i} \end{aligned} \nonumber $$

If we consider all components of $\eta$ at once, the last two identities can be written in vector and matrix form:

\[\frac{\partial}{\partial \eta}[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)] = \Gamma \Delta^{-1}(\mathbf{y} - \mu), \hspace{10mm} \frac{\partial^2}{\partial \eta \partial \eta^\top}[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)] = -\Gamma_0\]
Example.
For our example, the mean equals the variance, so we have: $$ \delta_{i,j} = \mu_{i,j}, \hspace{5mm} \rho_{i,j} = 0, \hspace{5mm} \gamma_{i,j} = \mu_{i,j}, \hspace{5mm} \gamma_{0,i,j} = \mu_{i,j} $$
Proof. For our example, we use $\eta_{i,j} = \log(\mu_{i,j})$ and $v(\mu_{i,j}) = \mu_{i,j}$. Thus: $$ \frac{\partial \eta_{i,j}}{\partial \mu_{i,j}} = \frac{1}{\mu_{i,j}}, \hspace{5mm} \frac{\partial^2 \eta_{i,j}}{\partial \mu_{i,j}^2} = -\frac{1}{\mu_{i,j}^2}, \hspace{5mm} \frac{\partial v(\mu_{i,j})}{\partial \mu_{i,j}} = 1 \nonumber $$ Plugging these into Eq. \eqref{eq:intermed-terms} gives the desired result.
In matrix form, where we index the observations overall rather than by cluster: $$ \Delta = \begin{bmatrix} \mu_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \mu_{n} \end{bmatrix}, \hspace{5mm} \Gamma = \begin{bmatrix} \mu_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \mu_{n} \end{bmatrix}, \hspace{5mm} \Gamma_0 = \begin{bmatrix} \mu_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \mu_{n} \end{bmatrix} \label{eq:matrix-versions-ex} $$ Comparing this to Eq. \eqref{eq:derivs-eta}: $$ \begin{aligned} \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} = \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (\mathbf{y} - \mu) \hspace{5mm} &\implies \hspace{5mm} \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} = \mathbf{y} - \mu = \Gamma \Delta^{-1}(\mathbf{y} - \mu) \\ \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} = \mathbf{y} - \mu \hspace{5mm} &\implies \hspace{5mm} \frac{\partial^2 \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} = -\text{diag}(\mu) = -\Gamma_0 \end{aligned} $$ We also have: $$ \mathbf{C} = (\mathbf{y} - \mu)(\mathbf{y} - \mu)^\top - \text{diag}(\mu) $$
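We can sanity-check these Poisson formulas with finite differences (again, just my own quick check):

```python
import numpy as np
from scipy.special import gammaln

# Finite-difference check of the eta-derivatives for the Poisson example:
# dl/deta should equal Gamma Delta^{-1} (y - mu) = y - mu for the log link,
# and -d^2 l/deta^2 should equal gamma_0 = mu.
y = np.array([2.0, 0.0, 1.0])

def loglik(eta):
    return np.sum(y * eta - np.exp(eta) - gammaln(y + 1))

eta = np.array([0.1, -0.3, 0.5])
h = 1e-5
for i in range(len(eta)):
    e_plus, e_minus = eta.copy(), eta.copy()
    e_plus[i] += h
    e_minus[i] -= h
    d1 = (loglik(e_plus) - loglik(e_minus)) / (2 * h)
    d2 = (loglik(e_plus) - 2 * loglik(eta) + loglik(e_minus)) / h**2
    print(d1, y[i] - np.exp(eta[i]))   # first derivative vs y - mu
    print(-d2, np.exp(eta[i]))         # negative second derivative vs mu
```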

Scores

Common approaches to parameter estimation and hypothesis testing require us to take the gradient of the likelihood with respect to the parameters of interest. That is, we need to find the score function (see my likelihood post).

Score For $\alpha$

The score for $\alpha$ has elements:

\[U_{\alpha_j}\rvert_{H_0} = \frac{\partial}{\partial \alpha_j} [\ell_q(\mathbf{y}; \alpha, \tau^2) ]\bigg\rvert_{H_0} \approx \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) + \frac{1}{2} \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \label{eq:alpha-score}\]
Proof. $$ \begin{aligned} U_{\alpha_j} \rvert_{H_0} &= \frac{\partial}{\partial \alpha_j} \left[ \ell_q(\mathbf{y}; \alpha, \tau^2) \right] \bigg\rvert_{H_0}\\ &\approx \frac{\partial}{\partial \alpha_j} \left[ \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]_{\beta = \mathbf{0}} + \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] \right] & \left(\text{Eq. } \eqref{eq:log-quasi-lik} \right) \\ &= \frac{\partial}{\partial \alpha_j} \left[ \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{H_0} \right] + \frac{1}{2} \frac{\partial}{\partial \alpha_j} \left[\text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} D(\tau^2) \right] \right] \\ &= \frac{\partial}{\partial \alpha_j} \left[ \left[ \sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{\beta= \mathbf{0}}\right] + \frac{1}{2} \frac{\partial}{\partial \alpha_j} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] & \left(\text{tr}\left[ \cdot \right] \text{ commutes} \right) \\ &= \frac{\partial}{\partial \alpha_j} \left[ \left[\sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{\beta= \mathbf{0}} \right] + \frac{1}{2} \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] & \left(\text{by } (a)\right) \\ &= \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) + \frac{1}{2} \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] & \left(\text{by } (b)\right) \end{aligned} \nonumber $$ $$ \begin{aligned} (a): \frac{\partial}{\partial \alpha_j} \left[ \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \right] &= \sum_{i = 1}^n \sum_{i' = 1}^n \frac{\partial}{\partial \alpha_j} \left[ \mathbf{C}^0_{i, i'} (\mathbf{Z} D(\tau^2) \mathbf{Z}^\top)_{i, i'} \right] = \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \\ (b):\frac{\partial}{\partial \alpha_j} \left[ \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right] \bigg\rvert_{\beta= \mathbf{0}} \right] &= \sum_{i = 1}^n \frac{\partial}{\partial \alpha_j} \left[ \left[\ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] \bigg\rvert_{\beta= \mathbf{0}} \right]\\ &= \sum_{i = 1}^n \frac{\partial \eta_i}{\partial \alpha_j} \frac{\partial}{\partial \eta_i} \left[ \left[ \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{\beta= \mathbf{0}} \right] \\ &= \sum_{i = 1}^n \mathbf{x}_{i,j} \gamma^0_i (\delta^0_i)^{-1}\left(\mathbf{y}_i - \mu^0_i\right) & \left( \eta_{i,t} = \mathbf{x}_i^\top \alpha + \mathbf{z}_i^\top \beta_t \right) \\ &= \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \end{aligned} \nonumber $$
Example.
Letting $\tilde{\mu}_t$ denote an $n$-dimensional vector that is equal to $\mu_t$ for the entries corresponding to cluster $t$ and zero everywhere else, we have: $$ \begin{aligned} \frac{\partial \mathbf{C}}{\partial \alpha_t} &= - \mathbf{y}\tilde{\mu}_t^\top - \tilde{\mu}_t \mathbf{y}^\top + \left( \tilde{\mu}_t \mu^\top + \mu \tilde{\mu}_t^\top \right) - \text{diag}\left(\tilde{\mu}_{t}\right) \in \mathbb{R}^{n \times n} \\ \frac{\partial}{\partial \alpha_t} \left[ \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu_{t,j} \right] \end{aligned} $$ Thus, $\frac{\partial \mathbf{C}}{\partial \alpha}$ is the $n \times n \times k$ tensor made from stacking together all of the above matrices.
Proof. We have the following identities: $$ \frac{\partial \mu}{\partial \alpha} = \begin{bmatrix} \mu_1 & \mathbf{0}_{n_1} & \dots & \mathbf{0}_{n_1} \\ \mathbf{0}_{n_2} & \mu_2 & \dots & \mathbf{0}_{n_2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{n_k} & \mathbf{0}_{n_k} & \dots & \mu_k \end{bmatrix} \in \mathbb{R}^{n \times k}, \hspace{5mm} \frac{\partial }{\partial \alpha_t} \left[ \mu \mu^\top \right] = \tilde{\mu}_t \mu^\top + \mu \tilde{\mu}_t^\top \in \mathbb{R}^{n \times n} \text{ for } t \in [k] $$ so that $\frac{\partial}{\partial \alpha} \left[ \mu \mu^\top \right] \in \mathbb{R}^{n \times n \times k}$ stacks these slices together.
Proof. $$ \begin{aligned} \frac{\partial \mu_{t,j}}{\partial \alpha_t} &= \frac{\partial}{\partial \alpha_t} \left[ \exp(\alpha_t + \beta_t \mathbf{z}_{t,j})\right] = \exp(\alpha_t + \beta_t \mathbf{z}_{t,j}) = \mu_{t,j} \\ \frac{\partial \mu_{i,j}}{\partial \alpha_t} &= 0 & \left( i \neq t \right) \\ \implies \frac{\partial \mu}{\partial \alpha_t} &= \begin{bmatrix} 0 & \dots & 0 & \mu_{t,1} & \dots & \mu_{t, n_t} & 0 & \dots & 0 \\ \end{bmatrix}^\top \\ \implies \frac{\partial \mu}{\partial \alpha} &= \begin{bmatrix} \frac{\partial \mu}{\partial \alpha_1} & \frac{\partial \mu}{\partial \alpha_2} & \dots & \frac{\partial \mu}{\partial \alpha_k} \end{bmatrix} = \begin{bmatrix} \mu_1 & \mathbf{0}_{n_1} & \dots & \mathbf{0}_{n_1} \\ \mathbf{0}_{n_2} & \mu_2 & \dots & \mathbf{0}_{n_2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{n_k} & \mathbf{0}_{n_k} & \dots & \mu_k \end{bmatrix} \in \mathbb{R}^{n \times k} \end{aligned} \nonumber $$ $$ \begin{aligned} \frac{\partial}{\partial \alpha_t} \left[ \mu_{t,j} \mu_{t, l} \right] &= \frac{\partial}{\partial \alpha_t} \left[ \exp(\alpha_t + \beta_t \mathbf{z}_{t,j}) \exp(\alpha_t + \beta_t \mathbf{z}_{t, l}) \right] = 2 \mu_{t,j} \mu_{t,l} \\ \frac{\partial}{\partial \alpha_t} \left[ \mu_{t,j} \mu_{s, l} \right] &= \frac{\partial}{\partial \alpha_t} \left[ \exp(\alpha_t + \beta_t \mathbf{z}_{t,j}) \exp(\alpha_s + \beta_s \mathbf{z}_{s, l}) \right] = \mu_{t,j} \mu_{s,l} & \left( s \neq t \right) \\ \frac{\partial}{\partial \alpha_t} \left[ \mu_{i, j} \mu_{s, l} \right] &= 0 & \left( i, s \neq t \right) \end{aligned} \nonumber $$ These are exactly the entries of $\tilde{\mu}_t \mu^\top + \mu \tilde{\mu}_t^\top$: the first term fills row block $t$ with entries $\mu_{t,j} \mu_{s,l}$, the second term fills column block $t$ with entries $\mu_{i,j} \mu_{t,l}$, the two overlap on the $(t, t)$ block (which is where the factor of $2$ comes from), and every other entry is zero. Thus: $$ \frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] = \tilde{\mu}_t \mu^\top + \mu \tilde{\mu}_t^\top \nonumber $$
This allows us to find: $$ \begin{aligned} \frac{\partial}{\partial \alpha_t} \left[ \mathbf{y} \mu^\top \right] &= \mathbf{y} \begin{bmatrix} \mathbf{0}_{\sum_{i < t} n_i} \\ \mu_{t} \\ \mathbf{0}_{\sum_{i > t} n_i} \end{bmatrix}^\top = \mathbf{y} \tilde{\mu}_t^\top \in \mathbb{R}^{n \times n} \\ \implies \frac{\partial}{\partial \alpha} \left[ \mathbf{y} \mu^\top \right] &= \begin{bmatrix} [\mathbf{y} \tilde{\mu}_1^\top] & \dots & [\mathbf{y} \tilde{\mu}_k^\top] \end{bmatrix} \in \mathbb{R}^{n \times n \times k} \end{aligned} \nonumber $$ And then: $$ \begin{aligned} \frac{\partial \mathbf{C}}{\partial \alpha_t} &= \frac{\partial}{\partial \alpha_t} \left[ (\mathbf{y} - \mu)(\mathbf{y} - \mu)^\top - \text{diag}(\mu) \right] \\ &= \frac{\partial}{\partial \alpha_t} \left[ \mathbf{y}\mathbf{y}^\top - \mathbf{y}\mu^\top - \mu \mathbf{y}^\top + \mu \mu^\top - \text{diag}(\mu) \right] \\ &= - \mathbf{y}\tilde{\mu}_t^\top - \tilde{\mu}_t \mathbf{y}^\top + \left( \tilde{\mu}_t \mu^\top + \mu \tilde{\mu}_t^\top \right) - \text{diag}\left(\tilde{\mu}_{t}\right) \in \mathbb{R}^{n \times n} \end{aligned} \nonumber $$ For the derivative of the log-likelihood, we have: $$ \begin{aligned} \frac{\partial}{\partial \alpha_t} \left[ \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \sum_{i = 1}^k \sum_{j = 1}^{n_i} \frac{\partial}{\partial \alpha_t} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right] \\ &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} \frac{1}{\mu_{t,j}} \frac{\partial \mu_{t, j}}{\partial \alpha_t} - \frac{\partial \mu_{t, j}}{\partial \alpha_t} \right] \\ &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} \frac{\exp(\eta_{t,j})}{\mu_{t,j}} \frac{\partial \eta_{t,j}}{\partial \alpha_t} - \exp(\eta_{t,j}) \frac{\partial \eta_{t,j}}{\partial \alpha_t} \right] \\ &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu_{t,j} \right] \end{aligned} \nonumber $$
Since the matrix trace is a linear operator, we can find the trace of all of the inner parts separately: $$ \text{tr}\left[ \frac{\partial}{\partial \alpha_t} \left[ \mathbf{Z}^\top \mathbf{y} \mu^\top \mathbf{Z} \right] \right] = \text{tr}\left[ \frac{\partial}{\partial \alpha_t} \left[ \mathbf{Z}^\top \mu \mathbf{y}^\top \mathbf{Z} \right] \right] = \left(\tilde{\mathbf{Z}}_t^\top \mathbf{y}_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu_t \right) $$
Proof. $$ \begin{aligned} \mathbf{y} \tilde{\mu}_t^\top &= \begin{bmatrix} 0 & \dots & 0 & \mathbf{y}_{1, 1} \mu_{t, 1} & \dots & \mathbf{y}_{1, 1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mathbf{y}_{k, n_k} \mu_{t, 1} & \dots & \mathbf{y}_{k, n_k} \mu_{t, n_t} & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{n \times n} \\ \mathbf{Z}^\top &= \begin{bmatrix} \mathbf{Z}_{1,1} & \dots & \mathbf{Z}_{1, n_1} & 0 & \dots & 0 & \dots & 0 & \dots & 0 \\ 0 & \dots & 0 & \mathbf{Z}_{2, 1} & \dots & \mathbf{Z}_{2, n_2} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & 0 & \mathbf{Z}_{k, 1} & \dots & \mathbf{Z}_{k, n_k} \end{bmatrix} \in \mathbb{R}^{k \times n} \\ \implies \mathbf{Z}^\top \mathbf{y}\tilde{\mu}_t^\top &= \begin{bmatrix} 0 & \dots & 0 & \mu_{t, 1} \sum_{i = 1}^{n_1} \mathbf{Z}_{1, i} \mathbf{y}_{1, i} & \dots & \mu_{t, n_t} \sum_{i =1}^{n_1} \mathbf{Z}_{1, i} \mathbf{y}_{1, i} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{t, 1} \sum_{i = 1}^{n_k} \mathbf{Z}_{k, i} \mathbf{y}_{k, i} & \dots & \mu_{t, n_t} \sum_{i = 1}^{n_k} \mathbf{Z}_{k, i} \mathbf{y}_{k, i} & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{k \times n}\\ \implies \mathbf{Z}^\top \mathbf{y} \tilde{\mu}_t^\top \mathbf{Z} &= \begin{bmatrix} 0 & \dots & 0 & \sum_{i = 1}^{n_1} \sum_{j = 1}^{n_t} \mathbf{Z}_{1, i} \mathbf{Z}_{t, j} \mathbf{y}_{1, i} \mu_{t, j} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \sum_{i = 1}^{n_k} \sum_{j = 1}^{n_t} \mathbf{Z}_{k, i} \mathbf{Z}_{t, j} \mathbf{y}_{k, i} \mu_{t, j} & 0 & \dots & 0 \\ \end{bmatrix} \\ &= \begin{bmatrix} 0 & \dots & 0 & \left( \tilde{\mathbf{Z}}_1^\top \mathbf{y}_{1}\right) \left(\tilde{\mathbf{Z}}^\top_{t} \mu_{t} \right) & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \left( \tilde{\mathbf{Z}}_k^\top \mathbf{y}_{k}\right) \left(\tilde{\mathbf{Z}}^\top_{t} \mu_{t} \right) & 0 & \dots & 0 \\ \end{bmatrix} \\ \implies \text{tr} \left[ \mathbf{Z}^\top \mathbf{y} \tilde{\mu}_t^\top \mathbf{Z} \right] &= \left(\tilde{\mathbf{Z}}_t^\top \mathbf{y}_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu_t \right) \end{aligned} \nonumber $$
$$ \text{tr}\left[ \mathbf{Z}^\top \frac{\partial}{\partial\alpha_t} \left[ \mu \mu^\top \right] \mathbf{Z}\right] = 2\left(\tilde{\mathbf{Z}}_t^\top \mu_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu_t \right) $$
Proof. Using $\frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] = \tilde{\mu}_t \mu^\top + \mu \tilde{\mu}_t^\top$: $$ \text{tr}\left[ \mathbf{Z}^\top \frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] \mathbf{Z} \right] = \text{tr}\left[ \mathbf{Z}^\top \tilde{\mu}_t \mu^\top \mathbf{Z} \right] + \text{tr}\left[ \mathbf{Z}^\top \mu \tilde{\mu}_t^\top \mathbf{Z} \right] \nonumber $$ The second trace has exactly the same block structure as $\text{tr}\left[ \mathbf{Z}^\top \mathbf{y} \tilde{\mu}_t^\top \mathbf{Z} \right]$ from the previous proof, with $\mu$ in place of $\mathbf{y}$, so it equals $\left(\tilde{\mathbf{Z}}_t^\top \mu_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu_t\right)$. The first trace is the trace of the transpose of the second, so it takes the same value. Altogether: $$ \text{tr}\left[ \mathbf{Z}^\top \frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] \mathbf{Z} \right] = 2\left(\tilde{\mathbf{Z}}_t^\top \mu_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu_t \right) \nonumber $$
$$ \text{tr}\left[ \mathbf{Z}^\top \text{diag}(\tilde{\mu}_t) \mathbf{Z} \right] = \sum_{i = 1}^{n_t} \mathbf{Z}_{t, i}^2 \mu_{t, i} $$
Proof. $$ \begin{aligned} \mathbf{Z}^\top \text{diag}(\tilde{\mu}_t) &= \begin{bmatrix} 0 & \dots & 0 & 0 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mathbf{Z}_{t, 1} \mu_{t, 1} & \dots & 0 & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & \mathbf{Z}_{t, n_t} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & 0 & 0 & \dots & 0 \end{bmatrix} \\ \implies \mathbf{Z}^\top \text{diag}(\tilde{\mu}_t) \mathbf{Z} &= \begin{bmatrix} 0 & \dots & 0 & 0 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mathbf{Z}_{t, 1}^2 \mu_{t, 1} & \dots & 0 & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & \mathbf{Z}_{t, n_t}^2 \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & 0 & 0 & \dots & 0 \end{bmatrix} \\ \implies \text{tr}\left[ \mathbf{Z}^\top \text{diag}(\tilde{\mu}_t) \mathbf{Z} \right] &= \sum_{i = 1}^{n_t} \mathbf{Z}_{t, i}^2 \mu_{t, i} \end{aligned} \nonumber $$
To find $U_{\alpha}\rvert_{H_0}$, we just plug in: $$ \begin{aligned} U_{\alpha_t}\rvert_{H_0} &\approx \frac{\partial}{\partial \alpha_t} \left[ \left[ \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]_{\beta = \mathbf{0}} + \frac{\tau^2}{2} \text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \right] \\ &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu^0_{t,j} \right] + \frac{\tau^2}{2} \text{tr} \left[ \frac{\partial}{\partial \alpha_t} \left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right] \right] \\ &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu^0_{t,j} \right] + \frac{\tau^2}{2} \text{tr}\left[ \mathbf{Z}^\top \left( - \mathbf{y}\tilde{\mu}_t^\top - \tilde{\mu}_t \mathbf{y}^\top + \left( \tilde{\mu}_t (\mu^0)^\top + \mu^0 \tilde{\mu}_t^\top \right) - \text{diag}\left(\tilde{\mu}_{t}\right) \right) \mathbf{Z} \right] \\ &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu^0_{t,j} \right] + \frac{\tau^2}{2} \left[ -2\left(\tilde{\mathbf{Z}}_t^\top \mathbf{y}_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu^0_t\right) + 2\left(\tilde{\mathbf{Z}}_t^\top \mu^0_t\right)^2 - \sum_{i = 1}^{n_t}\tilde{\mathbf{Z}}_{t,i}^2 \mu^0_{t,i} \right] \end{aligned} \nonumber $$
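Putting the pieces together, here's a sketch (my own code) that evaluates this closed form on a made-up dataset with cluster-specific intercepts $\alpha_t$ and scalar random-effect covariates $\mathbf{z}_{t,j}$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Evaluate U_{alpha_t} | H0 for the Poisson example above.
k, n_per = 3, 4
cluster = np.repeat(np.arange(k), n_per)
z = rng.uniform(0.5, 1.5, size=k * n_per)  # random-effect covariates z_{t,j}
alpha = rng.normal(size=k)                 # cluster-specific fixed intercepts
mu0 = np.exp(alpha[cluster])               # mu under beta = 0
y = rng.poisson(mu0).astype(float)
tau2 = 0.05

U_alpha = np.empty(k)
for t in range(k):
    m = cluster == t
    zt, yt, mut = z[m], y[m], mu0[m]
    trace_term = (-2 * (zt @ yt) * (zt @ mut)
                  + 2 * (zt @ mut) ** 2
                  - np.sum(zt**2 * mut))
    U_alpha[t] = np.sum(yt - mut) + 0.5 * tau2 * trace_term
print(U_alpha)
```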

Score For $\tau^2$

The score for $\tau^2$ has elements:

\[U_{\tau^2_j}\rvert_{H_0} = \frac{\partial}{\partial \tau^2_j} [\ell_q(\mathbf{y}; \alpha, \tau^2) ]\bigg\rvert_{H_0} \approx \frac{1}{2} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] = \frac{1}{2} \left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr} \left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right)\]
Proof. $$ \begin{aligned} U_{\tau^2_j} &= \frac{\partial}{\partial \tau^2_j} [\ell_q(\mathbf{y}; \alpha, \tau^2) ] \\ &\approx \frac{\partial}{\partial \tau^2_j} \left[ \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]_{\beta = \mathbf{0}} + \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} D(\tau^2) \right] \right] & \left(\text{Eq. } \eqref{eq:log-quasi-lik} \right) \\ &= \frac{1}{2} \frac{\partial}{\partial \tau^2_j} \left[ \text{tr} \left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} D(\tau^2) \right] \right] & \left(\text{no } \tau^2 \text{ in first term} \right) \\ &= \frac{1}{2} \frac{\partial}{\partial \tau^2_j} \left[ \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \right] \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{i' = 1}^n \frac{\partial}{\partial \tau^2_j} \left[ \mathbf{C}^0_{i, i'} \left(\mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right)_{i, i'}\right] \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{i' = 1}^n \mathbf{C}^0_{i, i'} \frac{\partial}{\partial \tau^2_j} \left[ \left(\mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right)_{i, i'}\right] & \left(\text{no } \tau^2 \text{ in } \mathbf{C}^0 \right) \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{i' = 1}^n \mathbf{C}^0_{i, i'} \left(\mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \right)_{i, i'} \\ &= \frac{1}{2} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \end{aligned} \nonumber $$ We can rewrite the score in terms of the quantities we defined in Eqs. \eqref{eq:intermed-terms} and \eqref{eq:intermed-mats}: $$ \begin{aligned} U_{\tau^2_j} &= \frac{1}{2} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \text{tr} \left[ \left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \right]\bigg\rvert_{H_0} \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \text{tr} \left[ \left(\Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 - \Gamma^0_0\right)\mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \left( \text{tr} \left[ \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \right] - \text{tr} \left[ \Gamma_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right)\\ &= \frac{1}{2} \left( \text{tr} \left[(\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0)\right] - \text{tr} \left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right)\\ &= \frac{1}{2} \left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr} \left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right) & \left(\text{first term is a scalar}\right)\\ \end{aligned} \nonumber $$ where the first term in the parentheses and $\Gamma_0$ are all evaluated at $\beta = \mathbf{0}$.
Example.
This is very straightforward: $$ \begin{aligned} U_{\tau^2}\rvert_{H_0} &\approx \frac{\partial}{\partial \tau^2} \left[ [\ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} + \frac{\tau^2}{2} \text{tr}\left[ \mathbf{Z}^\top\left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0)\right] \mathbf{Z} \right] \right] \\ &= \frac{1}{2}\text{tr}\left[ \mathbf{Z}^\top\left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0)\right] \mathbf{Z} \right] \end{aligned} $$
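In code, for a plain cluster-indicator $\mathbf{Z}$ (a simplifying assumption on my part), this score also reduces to a per-cluster sum, which is easy to verify:

```python
import numpy as np

rng = np.random.default_rng(3)

k, n_per = 3, 4
Z = np.kron(np.eye(k), np.ones((n_per, 1)))  # cluster-indicator design
mu0 = np.exp(0.1) * np.ones(k * n_per)
y = rng.poisson(mu0)

C0 = np.outer(y - mu0, y - mu0) - np.diag(mu0)
U_tau2 = 0.5 * np.trace(Z.T @ C0 @ Z)

# Equivalent per-cluster form: 0.5 * sum_t [ (sum_j (y_tj - mu_tj))^2 - sum_j mu_tj ]
resid = (y - mu0).reshape(k, n_per)
per_cluster = 0.5 * np.sum(resid.sum(axis=1) ** 2 - mu0.reshape(k, n_per).sum(axis=1))

print(U_tau2, per_cluster)  # identical
```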

Score For $\phi$

Technically there is also the scale parameter, but we will just assume it is known for now 🤫.


Information

A Bit Of Discussion

We can think of $U_{\tau^2}$ and $U_\alpha$ as components of one big score:

\[U = \begin{bmatrix} U_{\tau^2} \\ U_\alpha \end{bmatrix} \label{eq:full-score}\]

The variance of the score evaluated at the true value of the parameter is the Fisher information:

\[\mathcal{I} = \mathbb{E}\left[ (U - \mathbb{E}[U])(U - \mathbb{E}[U])^\top \right] = \mathbb{E}[UU^\top] \label{eq:full-info}\]

where the second equality follows from the fact that the score has expectation zero when the null hypothesis is true (i.e. the null hypothesis value for the parameter is correct).

To perform a score test for just $\tau^2$, we need the variance of just $U_{\tau^2}$ (the Fisher information). We can partition $\mathcal{I}$ in the following way:

\[\mathcal{I} = \begin{bmatrix} \mathcal{I}_{\tau^2, \tau^2} & \mathcal{I}_{\tau^2, \alpha} \\ \mathcal{I}_{\alpha, \tau^2} & \mathcal{I}_{\alpha, \alpha} \end{bmatrix} \label{eq:partition-info}\]

where:

\[\mathcal{I}_{\tau^2, \tau^2} = \mathbb{E}\left[ \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \tau^2} \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial (\tau^2)^\top}\right], \hspace{5mm} \mathcal{I}_{\alpha, \tau^2} = \mathbb{E}\left[ \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha} \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial (\tau^2)^\top}\right], \hspace{5mm} \mathcal{I}_{\alpha, \alpha} = \mathbb{E}\left[ \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha} \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha^\top}\right] \label{eq:info-blocks}\]

The parameter of interest for our test is $\tau^2$, so we only really care about the block of \(\mathcal{I}^{-1}\) corresponding to $\tau^2$. However, this block is not equal to the inverse of the corresponding block in $\mathcal{I}$ due to the presence and estimation of $\alpha$. That is:

\[\left(\mathcal{I}^{-1}\right)_{\tau^2, \tau^2} \neq \left(\mathcal{I}_{\tau^2, \tau^2} \right)^{-1} \label{eq:tricky-neq}\]

To overcome this, we can use the Schur complement for inverting a partitioned matrix. For a general block matrix $M$, we have:

\[M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \implies M^{-1} = \begin{bmatrix} (A - BD^{-1} C)^{-1} & -(A - B D^{-1} C)^{-1} BD^{-1} \\ -D^{-1} C (A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - B D^{-1} C)^{-1} B D^{-1} \end{bmatrix} \nonumber\]

Thus, the relevant block of the inverse Fisher information matrix can be computed as:

\[\left(\mathcal{I}^{-1}\right)_{\tau^2, \tau^2} = \left( \mathcal{I}_{\tau^2, \tau^2} - \mathcal{I}_{\tau^2, \alpha} \mathcal{I}^{-1}_{\alpha, \alpha} \mathcal{I}_{\alpha, \tau^2} \right)^{-1} \label{eq:relevant-fisher-info}\]

A more detailed discussion of this partitioning idea can be found in my post on estimation theory.
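Numerically, the difference between Eq. \eqref{eq:tricky-neq} and Eq. \eqref{eq:relevant-fisher-info} is easy to see on a random positive definite matrix (a synthetic stand-in for $\mathcal{I}$, with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(4)

# The (tau^2, tau^2) block of I^{-1} equals the inverse of the
# Schur complement I_tt - I_ta I_aa^{-1} I_at.
m, p = 2, 3                              # dim(tau^2), dim(alpha)
A = rng.normal(size=(m + p, m + p))
I = A @ A.T + (m + p) * np.eye(m + p)    # symmetric positive definite "information"

Itt, Ita = I[:m, :m], I[:m, m:]
Iat, Iaa = I[m:, :m], I[m:, m:]

block_of_inverse = np.linalg.inv(I)[:m, :m]
schur = np.linalg.inv(Itt - Ita @ np.linalg.inv(Iaa) @ Iat)
print(np.allclose(block_of_inverse, schur))                  # True

# inv(Itt) alone does NOT match, which is exactly the inequality above.
print(np.allclose(block_of_inverse, np.linalg.inv(Itt)))     # False (generically)
```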

In the remaining sections, we’ll slightly modify our notation. We’ll use a superscript of $0$ to denote a quantity (e.g. $\mu$, $\eta$, $\delta$, etc.) evaluated by setting $\beta = \mathbf{0}$ and $\alpha = \hat{\alpha}_{MLE}$, the maximum likelihood estimate of $\alpha$ under the null hypothesis.

Recall that we assumed that $D(\tau^2) = \mathbf{0}$ if $\tau^2 = \mathbf{0}$. This simplifies the score for $\alpha$ a lot when we evaluate it under the null hypothesis:

\[\begin{aligned} U_{\alpha_j}\rvert_{H_0} &\approx \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \\ U_{\tau^2_j}\rvert_{H_0} &\approx \frac{1}{2}\left((\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr}\left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right]\right) \end{aligned}\]

Information For $(\alpha, \alpha)$

\[\begin{aligned} \mathcal{I}_{\alpha_i, \alpha_j}\rvert_{H_0} &= \mathbb{E}_{H_0}\left[ U_{\alpha_i} U_{\alpha_j}\right] \approx (\mathbf{X}_{\cdot, i})^\top \Gamma^0 \mathbf{X}_{\cdot, j} \end{aligned} \label{eq:info-alpha-alpha}\]
Proof. $$ \begin{aligned} \mathcal{I}_{\alpha_i, \alpha_j}\rvert_{H_0} &= \mathbb{E}_{H_0}\left[ U_{\alpha_i} U_{\alpha_j}\right] \\ &\approx \mathbb{E}_{H_0}\left[ \left(\mathbf{X}_{\cdot, i}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right] \\ &= \mathbb{E}_{H_0}\left[ \left(\sum_{h = 1}^{n} \frac{\gamma^0_h}{\delta^0_h} \mathbf{X}_{h, i}(\mathbf{y}_h - \mu^0_h) \right) \left(\sum_{h = 1}^{n} \frac{\gamma^0_h}{\delta^0_h} \mathbf{X}_{h, j}(\mathbf{y}_h - \mu^0_h) \right) \right] \\ &= \sum_{h = 1}^{n} \left( \mathbb{E}_{H_0}\left[ \frac{(\gamma^0_h)^2}{(\delta^0_h)^2}\mathbf{X}_{h, i}\mathbf{X}_{h, j} (\mathbf{y}_h - \mu^0_h)^2 \right] + \sum_{l \neq h} \mathbb{E}_{H_0}\left[ \frac{\gamma^0_h \gamma^0_l}{\delta^0_h \delta^0_l} \mathbf{X}_{h, i}\mathbf{X}_{l, j}(\mathbf{y}_h - \mu^0_h)(\mathbf{y}_l - \mu^0_l) \right] \right) \\ &= \sum_{h = 1}^{n} \left( \frac{(\gamma^0_h)^2}{(\delta^0_h)^2}\mathbf{X}_{h, i}\mathbf{X}_{h, j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_h - \mu^0_h)^2 \right] + \sum_{l \neq h} \frac{\gamma^0_h \gamma^0_l}{\delta^0_h \delta^0_l} \mathbf{X}_{h, i}\mathbf{X}_{l, j} \mathbb{E}_{H_0}\left[ \mathbf{y}_h - \mu^0_h \right]\mathbb{E}\left[ \mathbf{y}_l - \mu^0_l \right] \right) & \left(\mathbf{y}_i\text{'s independent}\right) \\ &= \sum_{h = 1}^{n} \frac{(\gamma^0_h)^2}{(\delta^0_h)^2} \mathbf{X}_{h, i}\mathbf{X}_{h, j} \text{Var}_{H_0}\left( \mathbf{y}_h \right) & \left(\mathbb{E}_{H_0}\left[ \mathbf{y}_h - \mu^0_h \right] = 0\right) \\ &= \sum_{h = 1}^{n} \frac{\left(\frac{\partial \eta^0_h}{\partial \mu_h}\right)^2}{v^2(\mu^0_h) \left(\frac{\partial \eta^0_h}{\partial \mu_h}\right)^4}\mathbf{X}_{h, i}\mathbf{X}_{h, j} \text{Var}_{H_0}\left( \mathbf{y}_h \right) & \left(\text{Eq. } \eqref{eq:intermed-terms}\right)\\ &= \sum_{h = 1}^{n} \frac{1}{v^2(\mu^0_h) \left(\frac{\partial \eta^0_h}{\partial \mu_h}\right)^2}\mathbf{X}_{h, i}\mathbf{X}_{h, j} v(\mu^0_h) & \left(\text{Eq. } \eqref{eq:quasi-lik} \right)\\ &= \sum_{h = 1}^n \gamma^0_h \mathbf{X}_{h, i}\mathbf{X}_{h, j} \\ &= (\mathbf{X}_{\cdot, i})^\top \Gamma^0 \mathbf{X}_{\cdot, j} \end{aligned} \nonumber $$
Example.
In this special case, there is no $\mathbf{X}$ (i.e. it is all ones) but the fixed intercepts are cluster-specific, and the general form is confusing me. I think it's easier to just derive it again. Since we will be evaluating everything under the null hypothesis at the end, we can ignore the second term in the score for $\alpha$ (the leading $\tau^2$ will make it equal to zero). Thus: $$ \mathcal{I}_{\alpha, \alpha}\rvert_{H_0} \approx \begin{bmatrix} n_1 \exp(\alpha_1) & 0 & \dots & 0\\ 0 & n_2 \exp(\alpha_2) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & n_k \exp(\alpha_k) \end{bmatrix} $$
Proof. $$ \begin{aligned} \mathbb{E}_{H_0}\left[ U_{\alpha_t}(\mathbf{0}) U_{\alpha_s}(\mathbf{0}) \right] &\approx \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu_{t, j}^0\right] \right) \left(\sum_{j = 1}^{n_s} \left[ \mathbf{y}_{s, j} - \mu_{s, j}^0\right] \right)\right] \\ &= \left( \sum_{j = 1}^{n_t} \mathbb{E}_{H_0}\left[\mathbf{y}_{t, j} - \mu_{t, j}^0\right] \right)\left( \sum_{j = 1}^{n_s} \mathbb{E}_{H_0}\left[\mathbf{y}_{s, j} - \mu_{s, j}^0\right] \right) \\ &= 0 \\ \mathbb{E}_{H_0}\left[ U_{\alpha_t}^2(\mathbf{0}) \right] &\approx \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu_{t, j}^0\right] \right)^2\right] \\ &= \sum_{j = 1}^{n_t} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)^2\right] + \sum_{j = 1}^{n_t} \sum_{j' = 1 \\ j' \neq j}^{n_t} \mathbb{E}_{H_0}\left[\mathbf{y}_{t,j} - \mu_{t,j}^0 \right]\mathbb{E}\left[\mathbf{y}_{t,j'} - \mu_{t,j'}^0 \right] \\ &= \sum_{j = 1}^{n_t} \mu_{t,j}^0 \\ &= n_t \exp(\alpha_t) \end{aligned} \nonumber $$ where in the third to last equality, we assume that observations within the same cluster are independent.
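A quick simulation of mine under $H_0$ confirms that the covariance of the per-cluster scores matches this diagonal matrix:

```python
import numpy as np

rng = np.random.default_rng(5)

# Monte Carlo check that Var(U_alpha) ~ diag(n_t * exp(alpha_t)) under H0
# for the Poisson example (score = per-cluster sums of residuals).
k, n_per, reps = 3, 5, 50_000
alpha = np.array([0.0, 0.4, -0.3])
mu0 = np.exp(alpha)

scores = np.empty((reps, k))
for r in range(reps):
    y = rng.poisson(mu0, size=(n_per, k))  # n_per observations per cluster
    scores[r] = (y - mu0).sum(axis=0)      # U_{alpha_t} under H0

print(np.cov(scores.T))   # approximately diagonal
print(n_per * mu0)        # predicted diagonal entries
```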

Information For $(\alpha, \tau^2)$

\[\begin{aligned} \mathcal{I}_{\alpha_j, \tau^2_l}\rvert_{H_0} &\approx \frac{1}{2} \left[ \sum_{i = 1}^n \mathbf{X}_{i, j} \kappa_{i}^3 \left( \gamma_{i}^0 (\delta_{i}^0)^{-1} \right)^3 \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] - \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \kappa_{i}^2 \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] \right] \\ &= \frac{1}{2} \sum_{i = 1}^n \mathbf{X}_{i, j} \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] \left(\kappa_{i}^3 \left( \gamma_{i}^0 (\delta_{i}^0)^{-1} \right)^3 - \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \kappa_{i}^2 \right) \end{aligned}\]

where \(\kappa^2_{i} = \mathbb{E}\left[(\mathbf{y}_{i} - \mu_{i}^0)^2 \right]\) and \(\kappa^3_{i} = \mathbb{E}\left[(\mathbf{y}_{i} - \mu_{i}^0)^3 \right]\) are the second and third central moments of $\mathbf{y}_{i}$ under the null (see this page on the equivalence between central moments and cumulants).
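In the Poisson example, all cumulants equal the mean, so $\kappa^2_{i} = \kappa^3_{i} = \mu^0_{i}$. A quick simulated check of that fact:

```python
import numpy as np

rng = np.random.default_rng(6)

# For Poisson(lam), the 2nd and 3rd central moments both equal lam
# (all cumulants of a Poisson distribution equal its mean).
lam = 1.7
y = rng.poisson(lam, size=2_000_000)
print(np.mean((y - lam) ** 2), np.mean((y - lam) ** 3), lam)
```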

Proof. $$ \begin{aligned} \mathcal{I}_{\alpha_j, \tau^2_l} \rvert_{H_0} &= \mathbb{E}_{H_0}\left[ U_{\alpha_j}(0) U_{\tau^2_l}(0)\right] \\ &\approx \mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right) \left( \frac{1}{2}\left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr}\left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right]\right) \right) \right] \\ &= \frac{1}{2} \underbrace{\mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right) \left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) \right) \right]}_{(\star)} - \frac{1}{2} \underbrace{\mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0)\right) \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] \right]}_{(\star \star)} \\ &= \frac{1}{2} \left[ \sum_{i' = 1}^n \mathbf{X}_{i', j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i'} - \mu_{i'}^0)^3 \right] \left( \gamma_{i'}^0 (\delta_{i'}^0)^{-1} \right)^3 \left[ \mathbf{Z}_{i', \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i', \cdot})^\top \right] - \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0)^2 \right] \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] \right] \end{aligned} \nonumber $$
Proof of $(\star)$. We can rewrite the first term in $(\star)$ as the summation: $$ \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) = \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} (\mathbf{y}_i - \mu_i^0) \nonumber $$ We can also write the second term in $(\star)$ as a summation: $$ (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) = \sum_{i' = 1}^n \left( \sum_{i = 1}^n \left[(\mathbf{y}_{i} - \mu_i^0)\gamma_i^0 (\delta_i^0)^{-1}\right]\left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i', \cdot})^\top \right] \right) (\mathbf{y}_{i'} - \mu_{i'}^0)\gamma_{i'}^0 (\delta_{i'}^0)^{-1} \nonumber $$
Proof. The large vector/matrix product can be broken down into the following pieces: $$ \begin{aligned} (\mathbf{y}- \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix}^\top \\ \Gamma^0 (\Delta^0)^{-1}(\mathbf{y}- \mu^0) &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix} \\ \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top &= \begin{bmatrix} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \end{bmatrix} \end{aligned} \nonumber $$
Proof. $$ \begin{aligned} (\mathbf{y}- \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 &= \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix}^\top \begin{bmatrix} 1/ \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & 1/ \delta^0_n \end{bmatrix} \begin{bmatrix} \gamma_1^0 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \gamma_n^0 \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix}^\top \begin{bmatrix} \gamma_1^0 / \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \gamma_n^0 / \delta^0_n \end{bmatrix} \\ &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix}^\top \end{aligned} \nonumber $$ $$ \begin{aligned} \Gamma^0 (\Delta^0)^{-1} (\mathbf{y}- \mu^0) &= \begin{bmatrix} \gamma_1^0 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \gamma_n^0 \end{bmatrix} \begin{bmatrix} 1/ \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & 1/ \delta^0_n \end{bmatrix} \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix} \\ &= \begin{bmatrix} \gamma_1^0 / \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \gamma_n^0 / \delta^0_n \end{bmatrix} \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix} \\ &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix} \end{aligned} \nonumber $$ $$ \begin{aligned} \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top &= \begin{bmatrix} \mathbf{Z}_{1, 1} & \dots & \mathbf{Z}_{1, (q \times k)} \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{n, 1} & \dots & \mathbf{Z}_{n, (q \times k)} \end{bmatrix} \begin{bmatrix} \frac{\partial D(\tau^2)_{1,1}}{\partial \tau^2_l} & \dots & \frac{\partial D(\tau^2)_{1,(q \times k)}}{\partial \tau^2_l} \\ \vdots & \ddots & \vdots \\ \frac{\partial D(\tau^2)_{(q \times k),1}}{\partial \tau^2_l} & \dots & \frac{\partial D(\tau^2)_{(q \times k), (q \times k)}}{\partial \tau^2_l} \end{bmatrix} \begin{bmatrix} \mathbf{Z}_{1, 1} & \dots & \mathbf{Z}_{1, n} \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{(q \times k), 1} & \dots & \mathbf{Z}_{(q \times k), n} \end{bmatrix} \\ &= \begin{bmatrix} \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, 1}}{\partial \tau^2_l} & \dots & \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, (q \times k)}}{\partial \tau^2_l} \\ \vdots & \ddots & \vdots \\ \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, 1}}{\partial \tau^2_l} & \dots & \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, (q \times k)}}{\partial \tau^2_l} \\ \end{bmatrix} \begin{bmatrix} \mathbf{Z}_{1, 1} & \dots & \mathbf{Z}_{1, n} \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{(q \times k), 1} & \dots & \mathbf{Z}_{(q \times k), n} \end{bmatrix} \\ &= \begin{bmatrix} \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right) \mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right) \mathbf{Z}_{i', n} \\ \vdots & \ddots & \vdots \\ \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right)
\mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right) \mathbf{Z}_{i', n} \end{bmatrix}\\ &= \begin{bmatrix} \sum_{i' = 1}^{q \times k} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', n} \\ \vdots & \ddots & \vdots \\ \sum_{i' = 1}^{q \times k} \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', n} \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \end{bmatrix} \end{aligned} \nonumber $$
So that we can rewrite the entire thing as: $$ \begin{aligned} (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix}^\top \begin{bmatrix} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \end{bmatrix} \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix} \\ &= \begin{bmatrix} \sum_{i = 1}^n \left[(\mathbf{y}_{i} - \mu_i^0)(\gamma_i^0 / \delta_i^0)\right]\left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{1, \cdot})^\top \right] \\ \vdots \\ \sum_{i = 1}^n \left[(\mathbf{y}_{i} - \mu_i^0)(\gamma_i^0 / \delta_i^0)\right]\left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{n, \cdot})^\top \right] \end{bmatrix}^\top \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix} \\ &= \sum_{i' = 1}^n \left( \sum_{i = 1}^n \left[(\mathbf{y}_{i} - \mu_i^0)(\gamma_i^0 / \delta_i^0)\right]\left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i', \cdot})^\top \right] \right) (\mathbf{y}_{i'} - \mu_{i'}^0)(\gamma_{i'}^0 / \delta_{i'}^0) \end{aligned} \nonumber $$
Let $\xi_{i} = (\mathbf{y}_i - \mu_i^0)\gamma_i^0 (\delta_i^0)^{-1}$ and $H_{i, j, l} = \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{j, \cdot})^\top$. The product of the two terms is: $$ \begin{aligned} \left( \sum_{j' = 1}^n \mathbf{X}_{j',j} \xi_{j'} \right) \left( \sum_{i' = 1}^n \left( \sum_{i = 1}^n \xi_i H_{i, i', l} \right) \xi_{i'} \right) &= \underbrace{\sum_{i' = 1}^n \mathbf{X}_{i',j} \xi_{i'} \left( \sum_{i = 1}^n \xi_i H_{i, i', l} \right) \xi_{i'}}_{(a)} + \underbrace{\sum_{i' = 1}^n \sum_{j' \neq i'} \left( \mathbf{X}_{j',j} \xi_{j'} \right) \left( \sum_{i = 1}^n \xi_i H_{i, i', l} \right) \xi_{i'}}_{(b)} \end{aligned} \nonumber $$ Keeping in mind that we assume the $\mathbf{y}_i$'s are conditionally independent and have mean $\mu^0_i$ under the null, the expectations of $(a)$ and $(b)$ simplify greatly: $$ \begin{aligned} \mathbb{E}_{H_0}[(a)] &= \mathbb{E}_{H_0}\left[ \sum_{i' = 1}^n \mathbf{X}_{i',j} \xi_{i'} \left( \sum_{i = 1}^n \xi_i H_{i, i', l} \right) \xi_{i'} \right] \\ &= \mathbb{E}_{H_0}\left[ \sum_{i' = 1}^n \mathbf{X}_{i',j} \xi_{i'}^2 \left( \sum_{i = 1}^n \xi_i H_{i, i', l} \right) \right] \\ &= \mathbb{E}_{H_0}\left[ \sum_{i' = 1}^n \mathbf{X}_{i', j} \xi_{i'}^3 H_{i', i', l} + \sum_{i' = 1}^n \sum_{i \neq i'} \mathbf{X}_{i', j} \xi_{i'}^2 \xi_i H_{i, i', l} \right] \\ &= \sum_{i' = 1}^n \mathbf{X}_{i', j} \mathbb{E}_{H_0}\left[ \xi_{i'}^3 \right] H_{i', i', l} + \sum_{i' = 1}^n \sum_{i \neq i'} \mathbf{X}_{i', j} \mathbb{E}_{H_0}\left[ \xi_{i'}^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \xi_i \right]}_{=0} H_{i, i', l} \\ &= \sum_{i' = 1}^n \mathbf{X}_{i', j} \mathbb{E}_{H_0}\left[ \xi_{i'}^3 \right] H_{i', i', l} \\ &= \sum_{i' = 1}^n \mathbf{X}_{i', j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i'} - \mu_{i'}^0)^3 \right] \left( \gamma_{i'}^0 (\delta_{i'}^0)^{-1} \right)^3 \left[ \mathbf{Z}_{i', \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i', \cdot})^\top \right] \end{aligned} \nonumber $$ $$ \begin{aligned} \mathbb{E}_{H_0}[(b)] &= \mathbb{E}_{H_0}\left[ \sum_{i' = 1}^n \sum_{j' \neq i'} \left( \mathbf{X}_{j',j} \xi_{j'} \right) \left( \sum_{i = 1}^n \xi_i H_{i, i', l} \right) \xi_{i'} \right] \\ &= \mathbb{E}_{H_0}\left[ \underbrace{\sum_{i' = 1}^n \sum_{j' \neq i'} \left( \mathbf{X}_{j',j} \xi_{j'} \right) \xi_{i'} H_{i', i', l} \xi_{i'}}_{i = i'} + \underbrace{\sum_{i' = 1}^n \sum_{j' \neq i'} \left( \mathbf{X}_{j',j} \xi_{j'} \right) \xi_{j'} H_{j', i', l} \xi_{i'}}_{i = j'} + \sum_{i' = 1}^n \sum_{j' \neq i'} \sum_{i \neq i', j'} \left( \mathbf{X}_{j',j} \xi_{j'} \right) \xi_i H_{i, i', l} \xi_{i'} \right] \\ &= \mathbb{E}_{H_0}\left[ \sum_{i' = 1}^n \sum_{j' \neq i'} \left( \mathbf{X}_{j',j} \xi_{j'} \right) \xi_{i'}^2 H_{i', i', l} + \sum_{i' = 1}^n \sum_{j' \neq i'} \mathbf{X}_{j',j}\xi_{j'}^2 H_{j', i', l} \xi_{i'} + \sum_{i' = 1}^n \sum_{j' \neq i'} \sum_{i \neq i', j'} \left( \mathbf{X}_{j',j} \xi_{j'} \right) \xi_i H_{i, i', l} \xi_{i'} \right] \\ &= \sum_{i' = 1}^n \sum_{j' \neq i'} \left( \mathbf{X}_{j',j} \underbrace{\mathbb{E}_{H_0}[\xi_{j'}]}_{=0} \right) \mathbb{E}_{H_0}\left[ \xi_{i'}^2 \right] H_{i', i', l} + \sum_{i' = 1}^n \sum_{j' \neq i'} \mathbf{X}_{j',j} \mathbb{E}_{H_0}\left[ \xi_{j'}^2\right] H_{j', i', l} \underbrace{\mathbb{E}_{H_0}[\xi_{i'}]}_{=0} + \sum_{i' = 1}^n \sum_{j' \neq i'} \sum_{i \neq i', j'} \left( \mathbf{X}_{j',j} \underbrace{\mathbb{E}_{H_0}[\xi_{j'}]}_{=0} \right) \underbrace{\mathbb{E}_{H_0}[\xi_i]}_{=0} H_{i, i', l} \underbrace{\mathbb{E}_{H_0}[\xi_{i'}]}_{=0}\\
&= 0 \end{aligned} \nonumber $$ The result is then: $$ \begin{aligned} (\star) &= \frac{1}{2}\mathbb{E}_{H_0}\left[ (a) + (b) \right] \\ &= \frac{1}{2} \sum_{i' = 1}^n \mathbf{X}_{i', j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i'} - \mu_{i'}^0)^3 \right] \left( \gamma_{i'}^0 (\delta_{i'}^0)^{-1} \right)^3 \left[ \mathbf{Z}_{i', \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i', \cdot})^\top \right] \end{aligned} \nonumber $$
Proof of $(\star \star)$. We can rewrite the trace term as a double summation: $$ \begin{aligned} \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] &= \sum_{i = 1}^n \sum_{j = 1}^n \left(\Gamma_0^0 \right)_{i,j} \left[ \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right]_{j, i} \\ &= \sum_{i = 1}^n \gamma_{0, i}^0 \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \end{aligned} \nonumber $$ Then $(\star \star)$ becomes: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0)\right) \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] \right] &= \mathbb{E}_{H_0}\left[ \left(\sum_{i = 1}^n \mathbf{X}_{i, j} \gamma_i^0 (\delta_i^0)^{-1} (\mathbf{y}_i - \mu_i^0) \right) \left( \sum_{i = 1}^n \gamma_{0, i}^0 \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top\right) \right] \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \mathbb{E}_{H_0}\left[ \xi_i \gamma_{0, i}^0 \right] H_{i, i, l} + \sum_{i = 1}^n \sum_{i' \neq i} \mathbf{X}_{i,j} \underbrace{\mathbb{E}_{H_0}[\xi_i]}_{=0} \mathbb{E}_{H_0}[\gamma_{0, i'}^0] H_{i', i', l} \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0) (\gamma_i^0 + \rho^0_i (\mathbf{y}_i - \mu_i^0)) \right] H_{i, i, l} \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \left (\gamma_i^0 \underbrace{\mathbb{E}_{H_0}\left[\mathbf{y}_i - \mu_i^0\right]}_{=0} + \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0)^2 \right]\right) H_{i, i, l} \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0)^2 \right] H_{i, i, l} \end{aligned} \nonumber $$
Example.
We have to re-derive this as well since the intercept is cluster-specific: $$ \mathcal{I}_{\alpha, \tau^2} \approx \begin{bmatrix} \frac{\exp(\alpha_1)}{2} \sum_{j = 1}^{n_1} \tilde{\mathbf{Z}}_{j}^2 \\ \vdots \\ \frac{\exp(\alpha_k)}{2} \sum_{j = 1}^{n_k} \tilde{\mathbf{Z}}_{j}^2 \end{bmatrix} $$
Proof. Let $f(\tau^2) = \frac{\tau^2}{2} \frac{\partial}{\partial \alpha_t} \text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z}\right]$. We have: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ U_{\alpha_t}(\mathbf{0})U_{\tau^2}(\mathbf{0}) \right] &\approx \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_t}(\mathbf{y}_{t,j} - \mu_{t,j}^0) + f(\tau^2) \right) \left(\frac{1}{2} \text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right]\right) \right] \\ &= \frac{1}{2} \mathbb{E}_{H_0}\left[ \left( \sum_{j = 1}^{n_t} (\mathbf{y}_{t,j} - \mu_{t,j}^0) \right) \text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right] \right] + \underbrace{\frac{1}{2} f(\tau^2) \mathbb{E}_{H_0}\left[ \text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right] \right]}_{=0 \text{ under } H_0} \\ &= \frac{1}{2} \mathbb{E}_{H_0}\left[ \left( \sum_{j = 1}^{n_t} (\mathbf{y}_{t,j} - \mu_{t,j}^0) \right) \sum_{i = 1}^{k} (\mathbf{Z}_{\cdot, i})^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i} \right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) (\mathbf{Z}_{\cdot, i})^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i}\right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) \sum_{h = 1}^n \sum_{h' = 1}^n \mathbf{Z}_{h, i} \mathbf{C}^0_{h, h'} \mathbf{Z}_{h', i} \right] \\ &\overset{(*)}{=} \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) \sum_{h = 1}^n \left(\mathbf{Z}_{h,i}^2\left((\mathbf{y}_h - \mu_h^0)^2 - \mu_h^0 \right) + \sum_{h' = 1 \\ h' \neq h}^n \mathbf{Z}_{h, i} \mathbf{Z}_{h', i} (\mathbf{y}_h - \mu_h^0)(\mathbf{y}_{h'} - \mu_{h'}^0)\right)\right] \end{aligned} \nonumber $$ Note that in $(*)$ we have mixed the cluster-specific and full indexing into the response and mean vectors.
We then split the sums of expectations into those that correspond to cluster $t$ and those that don't: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ U_{\alpha_t} U_{\tau^2} \right] &\approx \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) \sum_{h = 1}^n \left(\mathbf{Z}_{h,i}^2\left((\mathbf{y}_h - \mu_h^0)^2 - \mu_h^0 \right) + \sum_{h' = 1 \\ h' \neq h}^n \mathbf{Z}_{h, i} \mathbf{Z}_{h', i} (\mathbf{y}_h - \mu_h^0)(\mathbf{y}_{h'} - \mu_{h'}^0)\right)\right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \left( \tilde{\mathbf{Z}}_{j}^2 \left(\mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)^3\right] - \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)\mu_{t,j}^0\right]}_{=0} \right)+ \sum_{h' = 1 \\ h' \neq j}^n \tilde{\mathbf{Z}}_{j} \mathbf{Z}_{h', t} \mathbb{E}_{H_0}\left[(\mathbf{y}_{t,j} - \mu_{t,j}^0)^2\right]\underbrace{\mathbb{E}_{H_0}\left[\mathbf{y}_{h'} - \mu_{h'}^0\right]}_{=0} \right) \\ &\hspace{5mm} + \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \sum_{h = 1 \\ h \neq j}^n \left( \mathbf{Z}_{h,i}^2 \underbrace{\mathbb{E}_{H_0}\left[\mathbf{y}_{t,j} - \mu_{t,j}^0 \right]}_{=0}\mathbb{E}_{H_0}\left[ (\mathbf{y}_{h} - \mu_h^0)^2 - \mu_h^0 \right] + \sum_{h' = 1 \\ h' \neq h}^n \mathbf{Z}_{h,i} \mathbf{Z}_{h', i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{h} - \mu_h^0 \right]}_{=0} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)(\mathbf{y}_{h'} - \mu_{h'}^0) \right] \right) \\ &= \frac{1}{2} \sum_{j = 1}^{n_t}\tilde{\mathbf{Z}}_{j}^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)^3 \right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \tilde{\mathbf{Z}}_{j}^2 \mu_{t,j}^0 \\ &= \frac{\exp(\alpha_t)}{2} \sum_{j = 1}^{n_t} \tilde{\mathbf{Z}}_{j}^2 \end{aligned} \nonumber $$
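To make the cluster-specific pattern concrete, here is a small Monte Carlo sketch (with made-up cluster sizes and covariates; `C` is formed entrywise as in $(*)$, i.e. as $(\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0)$). The simulated cross-moment should land near $\frac{\exp(\alpha_t)}{2} \sum_{j} \tilde{\mathbf{Z}}_{j}^2$:

```python
import numpy as np

rng = np.random.default_rng(7)
ns = [3, 4, 5]                             # hypothetical cluster sizes
k, n = len(ns), sum(ns)
alpha = rng.normal(scale=0.3, size=k)
zvals = [rng.normal(size=m) for m in ns]   # cluster-specific covariates

# Column t of Z holds cluster t's covariates (zero elsewhere).
Z = np.zeros((n, k))
mu = np.zeros(n)
start = 0
for t, (m, z) in enumerate(zip(ns, zvals)):
    Z[start:start + m, t] = z
    mu[start:start + m] = np.exp(alpha[t])
    start += m

t, reps, acc = 1, 100_000, 0.0             # check cluster t = 1
lo, hi = ns[0], ns[0] + ns[1]              # rows belonging to cluster 1
for _ in range(reps):
    y = rng.poisson(mu)
    r = y - mu
    C = np.outer(r, r) - np.diag(mu)       # C^0 as in (*)
    u_alpha = r[lo:hi].sum()               # alpha_t-score under H_0
    u_tau2 = 0.5 * np.trace(Z.T @ C @ Z)   # tau^2-score under H_0
    acc += u_alpha * u_tau2
print(acc / reps, 0.5 * np.exp(alpha[t]) * np.sum(zvals[t] ** 2))
```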

Information For $(\tau^2, \tau^2)$

First, let’s rewrite the score for $\tau^2$:

\[\begin{aligned} U_{\tau^2_l}\rvert_{H_0} &= \frac{1}{2} \sum_{i = 1}^n \left( \left[ (\gamma^0_i)^2 (\delta_i^0)^{-2} (\mathbf{y}_i - \mu^0_i)^2 - \gamma_{0, i}^0\right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i,i} + 2 \sum_{i' < i} (\mathbf{y}_i - \mu_i^0) (\mathbf{y}_{i'} - \mu_{i'}^0) \gamma_i^0 \gamma_{i'}^0 (\delta_i^0)^{-1} (\delta_{i'}^0)^{-1} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i, i'} \right) \end{aligned} \label{eq:rewrite-u-theta-j}\]
Rewriting $U_{\tau^2_l}$. $$ \begin{aligned} U_{\tau^2_l}\rvert_{H_0} &= \frac{1}{2}\left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) - \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] \right) \\ &= \frac{1}{2} \left( \text{tr}\left[ (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right] - \text{tr} \left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] \right) &\left(\text{first term is a scalar}\right) \\ &= \frac{1}{2} \left( \text{tr}\left[ \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] - \text{tr} \left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] \right) &\left(\text{cyclic property}\right) \\ &= \frac{1}{2} \text{tr}\left[ \left( \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 - \Gamma_0^0 \right) \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{j = 1}^n \left( \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 - \Gamma_0^0 \right)_{i,j} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{j,i} \\ &= \frac{1}{2} \sum_{i = 1}^n \left( \left[ (\gamma^0_i)^2 (\delta_i^0)^{-2} (\mathbf{y}_i - \mu^0_i)^2 - \gamma_{0, i}^0\right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i,i} + 2 \sum_{i' < i} (\mathbf{y}_i - \mu_i^0) (\mathbf{y}_{i'} - \mu_{i'}^0) \gamma_i^0 \gamma_{i'}^0 (\delta_i^0)^{-1} (\delta_{i'}^0)^{-1} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i, i'} \right) \end{aligned} $$
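As a quick numerical check of this rewrite (a sketch with arbitrary positive diagonals standing in for $\Gamma^0$, $\Delta^0$, $\Gamma_0^0$, a random residual vector, and a random symmetric derivative matrix), the matrix form and the element-wise double sum should agree exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, qk = 6, 4                                    # illustrative sizes

resid = rng.normal(size=n)                      # stands in for y - mu^0
gamma = rng.uniform(0.5, 1.5, size=n)           # diagonal of Gamma^0
delta = rng.uniform(0.5, 1.5, size=n)           # diagonal of Delta^0
gamma0 = rng.uniform(0.5, 1.5, size=n)          # diagonal of Gamma_0^0
Z = rng.normal(size=(n, qk))
dD = rng.normal(size=(qk, qk)); dD = dD + dD.T  # symmetric dD/dtau^2_l

M = Z @ dD @ Z.T                                # Z (dD/dtau^2_l) Z^T
w = gamma * resid / delta                       # Gamma^0 (Delta^0)^{-1} (y - mu^0)

# Matrix form of the score.
u_matrix = 0.5 * (w @ M @ w - np.trace(np.diag(gamma0) @ M))

# Element-wise double-sum form from the rewrite.
diag_part = ((gamma / delta) ** 2 * resid ** 2 - gamma0) @ np.diag(M)
off_part = 0.0
for i in range(n):
    for ip in range(i):
        off_part += 2 * resid[i] * resid[ip] * gamma[i] * gamma[ip] \
                    / (delta[i] * delta[ip]) * M[i, ip]
u_sum = 0.5 * (diag_part + off_part)

assert np.isclose(u_matrix, u_sum)
```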
\[\begin{aligned} \mathcal{I}_{\tau^2_l, \tau^2_{l'}}\rvert_{H_0} &= \frac{1}{4}\left( \sum_{i = 1}^n \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i}\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} \left( (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 2(\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 - 2 (\gamma_i^0)^2 (\delta_i^{-1})^2 \rho^0_i \kappa_{i}^3 \right) \right.\\ &\hspace{5mm} \left. + 4 \left( \sum_{i = 1}^n \sum_{i' < i} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \gamma^0_i \gamma^0_{i'} \right) \right) \end{aligned} \nonumber\]
Proof. $$ \begin{aligned} \mathcal{I}_{\tau^2_l, \tau^2_{l'}} \rvert_{H_0} &= \mathbb{E}_{H_0}\left[U_{\tau^2_l} U_{\tau^2_{l'}}\right] \\ &= \frac{1}{4}\mathbb{E}_{H_0}\left[ \left( \sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i} + 2\sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma^0_{i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'}\right) \right. \\ &\hspace{15mm}\left. \times \left( \sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} + 2\sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma^0_{i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'}\right) \right] \\ &= \frac{1}{4} \mathbb{E}_{H_0}\left[ \left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i} \right)\left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} \right) \right. \\ &\hspace{15mm} + \left. \left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i} \right) \left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \right) \right. \\ &\hspace{15mm} + \left. \left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} \right) \left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \right) \right. \\ &\hspace{15mm} + \left. 
\left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \right)\left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \right) \right] \\ &\overset{(i)}{=} \frac{1}{4} \mathbb{E}_{H_0}\left[ \left(\sum_{i = 1}^n \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i}\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i}\left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]^2 \right) + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) (\delta_i^0)^{-1} (\delta^0_{i'})^{-1} \gamma_{i}^0 \gamma_{i'}^0 \right)^2 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \right) \right] \\ &= \frac{1}{4}\left(\sum_{i = 1}^n \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i}\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i}\mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0)^2\right] + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'}\mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0)^2 \right] \right) \right) \end{aligned} \nonumber $$
Proof of $(i)$. In $(i)$, we use the fact that the $\mathbf{y}_i$'s are independent and have mean $\mu^0_i$ under the null. Thus, any summand containing a factor $(\mathbf{y}_i - \mu^0_i)$ raised only to the first power has expectation $0$ and drops out of the sum entirely.
We can then simplify: $$ \begin{aligned} \mathcal{I}_{\tau^2_l, \tau^2_{l'}} \rvert_{H_0} &\overset{(ii)}{=} \frac{1}{4}\left( \sum_{i = 1}^n \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i}\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} \left( (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 2(\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 - 2 (\gamma_i^0)^2 (\delta_i^{-1})^2 \rho^0_i \kappa_{i}^3 \right)\right. \\ &\hspace{5mm} \left. + 4 \left( \sum_{i = 1}^n \sum_{i' < i} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \mathbb{E}_{H_0}\left[ ( (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) (\delta^0_i)^{-1} (\delta^0_{i'})^{-1} \gamma_{i}^0 \gamma_{i'}^0 )^2 \right] \right) \right) \\ &\overset{(iii)}{=} \frac{1}{4}\left( \sum_{i = 1}^n \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i}\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} \left( (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 2(\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 - 2 (\gamma_i^0)^2 (\delta_i^{-1})^2 \rho^0_i \kappa_{i}^3 \right) \right.\\ &\hspace{5mm} \left. + 4 \left( \sum_{i = 1}^n \sum_{i' < i} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \gamma^0_i \gamma^0_{i'} \right) \right) \\ \end{aligned} \nonumber $$
Proof of $(ii)$. To simplify this further, notice the following: $$ \begin{aligned} \mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2 - \gamma_{0,i}^0)^2 \right] &= \mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2)^2 - 2\gamma_{0,i}^0(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2 + (\gamma_{0,i}^0)^2\right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 \mathbb{E}_{H_0}\left[(\mathbf{y}_i - \mu^0_i)^4\right] - 2 \mathbb{E}_{H_0}\left[\gamma_{0,i}^0(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2\right] + \mathbb{E}_{H_0}\left[(\gamma_{0,i}^0)^2\right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 (\kappa_{i}^4 + 3(\kappa_{i}^2)^2) - 2(\delta_i^{-1})^2(\gamma_i^0)^2 \mathbb{E}_{H_0}\left[ (\gamma_i^0 + \rho^0_i(\mathbf{y}_i - \mu^0_i))(\mathbf{y}_i-\mu^0_i)^2 \right] + \mathbb{E}_{H_0}\left[ (\gamma_i^0 + \rho^0_i(\mathbf{y}_i - \mu^0_i))^2 \right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 (\kappa_{i}^4 + 3(\kappa_{i}^2)^2) - 2(\delta_i^{-1})^2(\gamma_i^0)^2 \left( \gamma_i^0\mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2 \right] + \rho^0_i \mathbb{E}_{H_0}\left[(\mathbf{y}_i - \mu^0_i)^3 \right]\right) + (\gamma_i^0)^2 + 2 \gamma_i^0 \rho^0_i \mathbb{E}_{H_0}\left[ \mathbf{y}_i - \mu^0_i \right] + (\rho_i^0)^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2 \right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 3(\delta_i^{-1})^4 (\gamma_i^0)^4 (\kappa_{i}^2)^2 - 2(\delta_i^{-1})^2(\gamma_i^0)^3 \kappa_{i}^2 - 2(\delta_i^{-1})^2(\gamma_i^0)^2 \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^3 \right] + (\gamma_i^0)^2 + (\rho^0_i)^2 \kappa_{i}^2 \\ &\overset{(a)}{=} (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 3(\delta_i^{-1})^2 (\gamma_i^0)^2 (\delta_i)^2 - 2\delta_i^{-1}(\gamma_i^0)^2 \delta_i - 2(\delta_i^{-1})^2 (\gamma_i^0)^2 \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^3 \right] + (\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 3(\gamma_i^0)^2 - 2(\gamma_i^0)^2 + (\gamma_i^0)^2 - 2(\delta_i^{-1})^2 (\gamma_i^0)^2 \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^3 \right] + (\rho_i^0)^2 \kappa_{i}^2 \\ &\overset{(b)}{=} (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 2(\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 - 2 (\gamma_i^0)^2 (\delta_i^{-1})^2 \rho^0_i \kappa_{i}^3 \end{aligned} \nonumber $$ In $(a)$, we have that: $$ (\delta_i^0)^{-1} \gamma_i^0 = \left( \frac{1}{\frac{\partial \eta^0_i}{\partial \mu_i}} \right)^{-1} \frac{1}{v(\mu^0_i) (\frac{\partial \eta^0_i}{\partial \mu_i})^2} = \frac{1}{v(\mu^0_i) \frac{\partial \eta^0_i}{\partial \mu_i}} \implies (\delta^0_i)^{-1} \gamma^0_i \kappa_{i}^2 = \frac{1}{v(\mu^0_i) \frac{\partial \eta^0_i}{\partial \mu_i}} v(\mu^0_i) = \delta^0_i \nonumber $$ In $(b)$, we use the fact that the third cumulant is equal to the third central moment.
Proof of $(iii)$. In $(iii)$, we see that: $$ \begin{aligned} \mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma^0_{i'})^2 \right] &= (\delta_i^{-1}\delta_{i'}^{-1})^2 (\gamma_i^0 \gamma^0_{i'})^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2(\mathbf{y}_{i'} - \mu^0_{i'})^2\right] \\ &= (\delta_i^{-1}\delta_{i'}^{-1})^2 (\gamma_i^0 \gamma^0_{i'})^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2\right] \mathbb{E}_{H_0}\left[(\mathbf{y}_{i'} - \mu^0_{i'})^2\right] \\ &= (\delta_i^{-1})^2 (\gamma^0_i)^2 v(\mu^0_i) (\delta_{i'}^{-1})^2 (\gamma^0_{i'})^2 v(\mu^0_{i'}) \\ &\overset{(a')}{=} \gamma^0_i \gamma^0_{i'} \end{aligned} \nonumber $$ In $(a')$, we use the fact that: $$ ((\delta_i^0)^{-1})^2 v(\mu^0_i) = \left(\frac{\partial \eta^0_i}{\partial \mu_i} \right)^2 v(\mu^0_i) = (\gamma_i^0)^{-1} \nonumber $$

This matches the form in Lin (1997) if we let \(r_{i,i} = (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 2(\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 - 2 (\gamma_i^0)^2 (\delta_i^{-1})^2 \rho^0_i \kappa_{i}^3\) and \(r_{i, i'} = 2 \gamma^0_i\gamma^0_{i'}\).

Example.
$$ \mathcal{I}_{\tau^2, \tau^2} \bigg\rvert_{H_0} \approx \frac{1}{4} \sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{Z}_{i,j}^4 (\mu_{i,j}^0 + 2(\mu_{i,j}^0)^2) $$
Proof. $$ \begin{aligned} U_{\tau^2}^2 \bigg\rvert_{H_0} &\approx \frac{1}{4}\left(\text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right] \right)^2 \\ &= \frac{1}{4}\left(\sum_{i = 1}^k\mathbf{Z}_{\cdot, i}^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i} \right)^2 \\ &= \frac{1}{4}\sum_{i = 1}^k \sum_{i' = 1}^k (\mathbf{Z}_{\cdot, i}^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i})(\mathbf{Z}_{\cdot, i'}^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i'}) \\ &= \frac{1}{4}\sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \sum_{j' = 1}^n \sum_{j'' = 1}^n \sum_{j''' = 1}^n \mathbf{Z}_{j,i} \mathbf{Z}_{j',i} \mathbf{Z}_{j'', i'} \mathbf{Z}_{j''', i'} \mathbf{C}_{j,j'}^0 \mathbf{C}_{j'', j'''}^0 \end{aligned} \nonumber $$ The summands in the above are equal to zero in expectation if there is ever an index that is different from all of the others due to the independence assumption. Thus, we can restrict the summations over $j$, $j'$, $j''$, and $j'''$ to only settings where all of the indices are the same or there are two pairs of matching indices ($j = j'$ and $j'' = j'''$ with $j \neq j''$): $$ \begin{aligned} U_{\tau^2}(\mathbf{0})^2 &\approx \frac{1}{4}\sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \left[ \mathbf{Z}_{j,i}^2 \mathbf{Z}_{j, i'}^2 (\mathbf{C}_{j,j}^0)^2 + \sum_{j' = 1 \\ j' \neq j}^n \mathbf{Z}_{j, i}^2 \mathbf{Z}_{j', i'}^2 \mathbf{C}_{j,j}^0 \mathbf{C}_{j',j'}^0 \right] \end{aligned} \nonumber $$ We only need to deal with $\mathbf{C}^0$ terms when taking the expectation, and we have: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ (\mathbf{C}_{j,j}^0)^2 \right] &= \mathbb{E}_{H_0}\left[ ((\mathbf{y}_j - \mu_j^0)^2 - \mu_j^0)^2 \right] \\ &= \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^4\right] - 2 \mu_j^0 \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^2 \right] + (\mu_j^0)^2 \\ &= (\mu_j^0 + 3 (\mu_j^0)^2) - 2 (\mu_j^0)^2 + (\mu_j^0)^2 \\ &= \mu_j^0 + 2(\mu_j^0)^2 \\ \mathbb{E}_{H_0}\left[ \mathbf{C}_{j,j}^0 \mathbf{C}_{j',j'}^0\right] &= \mathbb{E}_{H_0}\left[ ((\mathbf{y}_j - \mu_j^0)^2 - \mu_j^0)((\mathbf{y}_{j'} - \mu_{j'}^0)^2 - \mu_{j'}^0) \right] \\ &= \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^2 \right]\mathbb{E}_{H_0}\left[ (\mathbf{y}_{j'} - \mu_{j' }^0)^2\right] - \mu_j^0 \mathbb{E}_{H_0}\left[ (\mathbf{y}_{j'} - \mu_{j'}^0)^2 \right] - \mu_{j'}^0 \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^2 \right] + \mu_j^0 \mu_{j'}^0 \\ &= \mu_j^0 \mu_{j'}^0 - \mu_{j'}^0 \mu_j^0 - \mu_{j}^0 \mu_{j'}^0 + \mu_j^0 \mu_{j'}^0 \\ &= 0 \end{aligned} \nonumber $$ where we use the fact that the fourth central moment for the Poisson distribution is equal to the mean plus three times the mean squared (via the relationship between central moments and cumulants). Putting the above all together, we get: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ U_{\tau^2}^2(\mathbf{0})\right] &\approx \frac{1}{4} \sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \left[ \mathbf{Z}_{j,i}^2 \mathbf{Z}_{j, i'}^2 \mathbb{E}_{H_0}\left[(\mathbf{C}_{j,j}^0)^2\right] + \sum_{j' = 1 \\ j' \neq j}^n \mathbf{Z}_{j, i}^2 \mathbf{Z}_{j', i'}^2 \underbrace{\mathbb{E}_{H_0}\left[\mathbf{C}_{j,j}^0 \mathbf{C}_{j',j'}^0 \right]}_{=0} \right] \\ &= \frac{1}{4} \sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \mathbf{Z}_{j,i}^2 \mathbf{Z}_{j, i'}^2 (\mu_j^0 + 2(\mu_j^0)^2 ) \end{aligned} \nonumber $$ Noticing that $\mathbf{Z}_{j,i} = 0$ if the index $j$ does not fall into those corresponding to cluster $i$, we can simplify to: $$ \mathbb{E}_{H_0}\left[ U_{\tau^2}^2(\mathbf{0})\right] \approx \frac{1}{4} \sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{Z}_{i,j}^4 (\mu_{i,j}^0 + 2(\mu_{i,j}^0)^2) $$
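In code, evaluating this approximate information for the Poisson example is one line per cluster (a minimal sketch with hypothetical cluster sizes, intercepts, and covariates):

```python
import numpy as np

rng = np.random.default_rng(1)
n_per = [5, 3, 4]                        # hypothetical cluster sizes n_1..n_k
alpha = rng.normal(size=len(n_per))      # cluster-specific intercepts

info = 0.0
for a, n_i in zip(alpha, n_per):
    z = rng.normal(size=n_i)             # random-slope covariates for this cluster
    mu = np.exp(a)                       # mu_{i,j}^0 = exp(alpha_i) under the null
    info += 0.25 * np.sum(z ** 4 * (mu + 2 * mu ** 2))
print(info)
```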

Test

The test statistic for the global test is given by:

\[\chi_G^2 = U_{\tau^2}^\top(\hat{\alpha}_0)\, \tilde{\mathcal{I}}_{\tau^2, \tau^2}^{-1}(\hat{\alpha}_0)\, U_{\tau^2}(\hat{\alpha}_0)\]

Under the null hypothesis, it should asymptotically have a $\chi^2_m$ distribution, where $m$ is the number of variance components (so $\chi^2_1$ in our running example).
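A minimal sketch of carrying out the test, assuming the score vector and the efficient information have already been computed (the helper name `global_score_test` is ours):

```python
import numpy as np
from scipy import stats

def global_score_test(u_tau2, info_eff):
    """Global variance-component score test.

    u_tau2:   m-vector score U_{tau^2} evaluated at (alpha_hat_0, 0)
    info_eff: m x m efficient information I-tilde_{tau^2,tau^2}(alpha_hat_0)
    Returns the chi-square statistic and its asymptotic p-value.
    """
    u = np.atleast_1d(np.asarray(u_tau2, dtype=float))
    info = np.atleast_2d(np.asarray(info_eff, dtype=float))
    stat = float(u @ np.linalg.solve(info, u))
    pval = stats.chi2.sf(stat, df=u.size)  # df = number of variance components
    return stat, pval

# Hypothetical one-component example:
print(global_score_test(2.1, 3.4))
```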


Example With Marginal Likelihood

Instead of doing a Taylor approximation of the quasi-likelihood like in Lin (1997), we can use the marginal quasi-likelihood method for our example since we assumed a particular distributional form for the random effects.

We can write the log-likelihood for cluster $i$ in vector notation as:

\[\ell(\mathbf{y}_i; \alpha_i, \tau^2 \rvert \beta_i) = \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right] = \mathbf{y}_i^\top \log(\mu_i) - \mu_i^\top \mathbf{1}_{n_i} - \sum_{j = 1}^{n_i} \log( \mathbf{y}_{i,j}!)\]
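As a quick check that the vector form matches the element-wise sum (using `scipy.special.gammaln` for $\log(\mathbf{y}_{i,j}!)$; the cluster parameters below are made up):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)
n_i = 7
alpha_i, beta_i = 0.3, -0.2           # hypothetical cluster parameters
z = rng.normal(size=n_i)
mu = np.exp(alpha_i + beta_i * z)     # mu_{i,j} = exp(alpha_i + beta_i z_{i,j})
y = rng.poisson(mu)

# Element-wise sum form.
ll_sum = np.sum(y * np.log(mu) - mu - gammaln(y + 1))
# Vector form: y^T log(mu) - mu^T 1 - sum(log(y!)).
ll_vec = y @ np.log(mu) - mu.sum() - gammaln(y + 1).sum()

assert np.isclose(ll_sum, ll_vec)
```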

We also have:

\[\frac{d \ell(\mathbf{y}_i; \alpha_i, \tau^2 \rvert \beta_i)}{d \beta_i} = \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \mathbf{z}_{i,j} - \exp(\alpha_i + \beta_i \mathbf{z}_{i,j})\mathbf{z}_{i,j} \right] \hspace{5mm} \frac{d^2 \ell(\mathbf{y}_i;\alpha_i, \tau^2 \rvert \beta_i)}{d \beta_i^2} = - \sum_{j = 1}^{n_i} \exp(\alpha_i + \beta_i \mathbf{z}_{i,j}) \mathbf{z}_{i,j}^2\]

In our case, we only have a random slope with $D(\tau^2) = \tau^2$, so $\mathbf{K}_i$ and $\mathbf{h}_i$ are fairly simple:

\[\mathbf{K}_i = \frac{1}{\tau^2} - \left(- \sum_{j = 1}^{n_i} \mu_{i,j}^0 \mathbf{z}_{i,j}^2\right); \hspace{5mm} \mathbf{h}_i = \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\]

Applying Eq. \eqref{eq:marg-qll} to this case, we get the marginal log-likelihood:

\[\begin{aligned} \ell(\mathbf{y}; \alpha, \tau^2) &\approx \sum_{i = 1}^k \left[-\frac{1}{2}\log\left(1 + \tau^2 \exp(\alpha_i) \sum_{j= 1}^{n_i} \mathbf{z}_{i,j}^2\right) + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j}\alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) + \frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \right)^{-1}\right] \end{aligned}\]
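Here is a sketch of evaluating this approximate marginal log-likelihood numerically (the function name and data layout are our own):

```python
import numpy as np
from scipy.special import gammaln

def marginal_loglik(y_by_cluster, z_by_cluster, alpha, tau2):
    """Approximate marginal log-likelihood from the display above.

    y_by_cluster, z_by_cluster: lists of per-cluster arrays
    alpha: k-vector of cluster intercepts; tau2: scalar variance component
    """
    ll = 0.0
    for y, z, a in zip(y_by_cluster, z_by_cluster, alpha):
        mu = np.exp(a)
        sz2 = np.sum(z ** 2)
        h = np.sum((y - mu) * z)            # h_i
        K = 1.0 / tau2 + mu * sz2           # K_i
        ll += (-0.5 * np.log(1.0 + tau2 * mu * sz2)
               + np.sum(y * a - mu - gammaln(y + 1))
               + 0.5 * h ** 2 / K)
    return ll

# Tiny made-up example: one cluster of three observations.
print(marginal_loglik([np.array([1, 2, 0])], [np.array([0.5, -1.0, 0.2])], [0.1], 0.3))
```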

Let \(\dot{\mathbf{z}}_i = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i))\) and define:

\[\begin{aligned} \upsilon_i &= \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} & \zeta_i &= \frac{\exp(\alpha_i)}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} = \frac{\tau^2 \exp(\alpha_i)}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} \\ \xi_i &= \frac{\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} = \frac{\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} & \psi_i &= \frac{1}{1 + 2 \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2} \end{aligned}\]
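These four scalars come up repeatedly below, so here is a small helper (a sketch) that also makes their limits easy to verify: as $\tau^2 \rightarrow 0$, we get $\upsilon_i, \zeta_i, \xi_i \rightarrow 0$ and $\psi_i \rightarrow 1$.

```python
import numpy as np

def upsilon_zeta_xi_psi(alpha_i, z, tau2):
    """The four per-cluster scalars defined above (sketch)."""
    mu = np.exp(alpha_i)
    s2 = np.sum(z ** 2)
    t = tau2 * mu * s2
    upsilon = t / (2.0 * (1.0 + t))
    zeta = tau2 * mu / (1.0 + t)
    xi = tau2 * s2 / (2.0 * (1.0 + t))
    psi = 1.0 / (1.0 + 2.0 * t + t ** 2)   # = (1 + t)^{-2}
    return upsilon, zeta, xi, psi

# As tau^2 -> 0: upsilon, zeta, xi -> 0 and psi -> 1.
print(upsilon_zeta_xi_psi(0.2, np.array([0.5, -1.0, 0.3]), 1e-10))
```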

Score Functions

We have:

\[\begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{H_0} &\approx -\upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \dot{\mathbf{z}}_i \left[\xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right] \end{aligned}\]
Proof. We have: $$ \begin{aligned} \frac{d}{d \alpha_i} \left[ - \frac{1}{2}\log\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)\right] \bigg\rvert_{H_0} &= -\frac{1}{2}\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \left(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \\ \frac{d}{d \alpha_i} \left[ \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} \alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) \right] &= \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \\ \frac{d}{d \alpha_i} \left[ \frac{1}{2} \left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2 \right] &= -\left(\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right) \left(\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \right)\\ \frac{d}{d \alpha_i} \left[ \left( \frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \right] &= - \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-2} \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \nonumber \end{aligned} $$ Thus, the derivative is: $$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{H_0} &\approx \frac{d}{d \alpha_i} \left[ -\frac{1}{2}\log\left(1 + \tau^2 \exp(\alpha_i) \sum_{j= 1}^{n_i} \mathbf{z}_{i,j}^2\right) + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j}\alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) + \frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \right)^{-1} \right] \\ &= -\frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) + \frac{-\left(\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right) \left(\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \right)}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} + \frac{\frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2\left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)}{-\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{2}} \\ &= -\frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \frac{\left(\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right) \left(\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \right)}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} - \frac{ \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2\left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)}{2\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{2}} \\ &= -\frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) -\frac{\exp(\alpha_i)\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} \left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}+ \frac{\left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} -
\exp(\alpha_i))\mathbf{z}_{i,j} \right)\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)}{2\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)}\right] \\ &= -\upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \left[ \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right]\left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} (\mathbf{y}_{i,j} - \exp(\alpha_i))\right] \end{aligned} \nonumber $$
\[\begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} &\approx \frac{1}{2}\sum_{i = 1}^k \left[-\frac{2 \upsilon_i}{\tau^2} +\dot{\mathbf{z}}^2_i \psi_i \right] \end{aligned}\]
Proof. We have: $$ \begin{aligned} \frac{d}{d \tau^2} \left[ - \frac{1}{2}\log\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)\right] &= -\frac{1}{2}\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \\ \frac{d}{d \tau^2} \left[ \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} \alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) \right] &= 0\\ \frac{d}{d \tau^2} \left[ \frac{1}{2} \left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2 \right] &= 0 \\ \frac{d}{d \tau^2} \left[ \left( \frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \right] &= \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-2} (\tau^2)^{-2} \end{aligned} \nonumber $$ Thus, the derivative is: $$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} &\approx \sum_{i = 1}^k \frac{d}{d\tau^2} \left[ -\frac{1}{2}\log\left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j}\alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) + \frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \right)^{-1} \right] \\ &= \sum_{i = 1}^k \left[-\frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \frac{\left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2}{2 (\tau^2)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2}\right] \\ &= \sum_{i = 1}^k \left[-\frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \frac{\left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2}{2 (\tau^2)^2 \left(\frac{1}{(\tau^2)^2} + 2\frac{1}{\tau^2} \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2 \right) }\right] \\ &= \frac{1}{2}\sum_{i = 1}^k \left[-\frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 } + \frac{\left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2}{1 + 2\tau^2\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2}\right] \\ &= \frac{1}{2}\sum_{i = 1}^k \left[-\frac{2 \upsilon_i}{\tau^2} + \psi_i \left( \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right)^2 \right] \end{aligned} $$
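Both score formulas can be checked against finite differences of the approximate marginal log-likelihood; here is a sketch for the $\tau^2$-score on simulated data (a small central-difference step, tolerances chosen loosely):

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(3)
k, tau2 = 3, 0.4
ns = [4, 6, 5]
alpha = rng.normal(size=k)
zs = [rng.normal(size=n) for n in ns]
ys = [rng.poisson(np.exp(a), size=n) for a, n in zip(alpha, ns)]

def ell(alpha, tau2):
    # Approximate marginal log-likelihood (same display as above).
    out = 0.0
    for y, z, a in zip(ys, zs, alpha):
        mu, sz2 = np.exp(a), np.sum(z ** 2)
        h = np.sum((y - mu) * z)
        out += (-0.5 * np.log(1 + tau2 * mu * sz2)
                + np.sum(y * a - mu - gammaln(y + 1))
                + 0.5 * h ** 2 / (1.0 / tau2 + mu * sz2))
    return out

def score_tau2(alpha, tau2):
    # (1/2) sum_i [ -2 upsilon_i / tau^2 + zdot_i^2 psi_i ]
    out = 0.0
    for y, z, a in zip(ys, zs, alpha):
        mu, sz2 = np.exp(a), np.sum(z ** 2)
        t = tau2 * mu * sz2
        upsilon = t / (2 * (1 + t))
        psi = 1.0 / (1 + t) ** 2
        zdot = np.sum(z * (y - mu))
        out += 0.5 * (-2 * upsilon / tau2 + zdot ** 2 * psi)
    return out

eps = 1e-6
fd = (ell(alpha, tau2 + eps) - ell(alpha, tau2 - eps)) / (2 * eps)
assert np.isclose(fd, score_tau2(alpha, tau2), rtol=1e-4)
```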

Notice that under the null hypothesis, the gradient of the log-likelihood with respect to $\alpha$ becomes:

\[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha} \approx \begin{bmatrix} \sum_{j = 1}^{n_1} (\mathbf{y}_{1, j} - \exp(\alpha_1)) \\ \vdots \\ \sum_{j = 1}^{n_k} (\mathbf{y}_{k, j} - \exp(\alpha_k)) \end{bmatrix}\]

Setting the above equal to $\mathbf{0}_{k}$ and solving for $\alpha$ gives us the MLE under the null:

\[\hat{\alpha} = \begin{bmatrix} \log\left(\frac{1}{n_1} \sum_{j = 1}^{n_1} \mathbf{y}_{1, j}\right) \\ \vdots \\ \log\left(\frac{1}{n_k} \sum_{j = 1}^{n_k} \mathbf{y}_{k, j}\right) \end{bmatrix}\]
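A two-line check that the cluster log-means do zero out the score (simulated Poisson clusters):

```python
import numpy as np

rng = np.random.default_rng(4)
clusters = [rng.poisson(3.0, size=n) for n in (5, 8, 6)]

# Under H_0 the score for alpha_t is sum_j (y_{t,j} - exp(alpha_t)),
# which vanishes exactly at alpha_t = log(mean of cluster t).
alpha_hat = np.array([np.log(y.mean()) for y in clusters])
for y, a in zip(clusters, alpha_hat):
    assert np.isclose(np.sum(y - np.exp(a)), 0.0)
```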

A Note On The MLE

Suppose we let $\alpha^* = \alpha + \epsilon$ where $\alpha$ is the true value of $\alpha$ and $\epsilon$ is a little bit of jitter (negative or positive). Consider evaluating the gradient of the marginal log-likelihood w.r.t. $\alpha$ at $(\alpha, \tau^2) = (\alpha^*, 0)$. We know that $\upsilon_i = \zeta_i = \xi_i = 0$ for $\tau^2 = 0$, so the component of the score corresponding to $\alpha_i$ simplifies to:

\[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} = \left(\sum_{j = 1}^{n_i} \mathbf{y}_{i,j}\right) - n_i \exp(\epsilon)\exp(\alpha_i)\]
Proof. $$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} &= \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon)) \\ &= \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i) \exp(\epsilon)) \\ &= \left(\sum_{j = 1}^{n_i} \mathbf{y}_{i,j}\right) - n_i \exp(\epsilon)\exp(\alpha_i) \end{aligned} \nonumber $$

Since $e^x > 0$ and $n_i > 0$, we see that adding any jitter to $\alpha$ corresponds to scaling the amount we are subtracting off the cluster sum by a factor that is exponential in the amount of jitter.

Now let’s consider the gradient of the marginal log-likelihood w.r.t. $\tau^2$:

\[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} = \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left( \exp(\epsilon)\exp(\alpha_i) - \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right)^2 + c\]
Proof. $$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} &= \frac{1}{2} \sum_{i = 1}^k \left[ -\exp(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon))\right)^2 \right] \\ &= \frac{1}{2} \sum_{i = 1}^k \left[ - \exp(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon))^2 + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon))(\mathbf{y}_{i,j'} - \exp(\alpha_i + \epsilon)) \right] \\ &= \frac{1}{2} \sum_{i = 1}^k \left[ - \exp(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{y}_{i,j}^2 - 2\exp(\alpha_i + \epsilon)\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \mathbf{y}_{i,j} + \exp^2(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} (\mathbf{y}_{i,j}\mathbf{y}_{i,j'} - \exp(\alpha_i + \epsilon) \mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon)\mathbf{y}_{i,j'} + \exp^2(\alpha_i + \epsilon)) \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \left[ - \exp(\alpha_i + \epsilon) \sum_{j =1 }^{n_i} \mathbf{z}_{i,j}^2 + \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mathbf{y}_{i,j}\right)^2 - 2\exp(\alpha_i + \epsilon)\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mathbf{y}_{i,j}\right)\left(\sum_{j =1 }^{n_i} \mathbf{z}_{i,j} \right) + \exp^2(\alpha_i + \epsilon) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)^2 \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ - \exp(\alpha_i + \epsilon) \mathbf{z}^2_{i,j} + \mathbf{z}_{i,j} \mathbf{y}_{i,j}\left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}\right) - 2\exp(\alpha_i + \epsilon)\mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}\right)+ \exp^2(\alpha_i + \epsilon) \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left[ - \exp(\alpha_i + \epsilon) \frac{\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} + \frac{\mathbf{y}_{i,j}\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} - 2\exp(\alpha_i + \epsilon)\frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}}+ \exp^2(\alpha_i + \epsilon) \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left[ \exp^2(\alpha_i + \epsilon) - 2 \exp(\alpha_i + \epsilon) \left( \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right) + \frac{\mathbf{y}_{i,j}\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left[ \left( \exp(\alpha_i + \epsilon) - \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right)^2 - \left(\frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}}\right)^2 + \frac{\mathbf{y}_{i,j}\sum_{j' = 1}^{n_i}
\mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left( \exp(\alpha_i + \epsilon) - \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right)^2 + c \end{aligned} \nonumber $$ where $c$ is defined appropriately.

where $c$ is a term that does not depend upon $\epsilon$. For similar reasons as before, all the jitter does is scale the leading term of the quadratic inside the sum. Thus, as $\epsilon$ grows in magnitude, the score also grows in magnitude like $\exp(2\epsilon)$.


Information

We’ll derive the information matrix in chunks. First, we have:

\[\mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i \right] = 0; \hspace{5mm} \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^2 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0; \hspace{5mm} \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^3 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \mu_{i,j}^0; \hspace{5mm} \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^4 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 (3(\mu_{i,j}^0)^2 + \mu_{i,j}^0) + 3\sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \mu_{i,j}^0 \mu_{i,j'}^0\]
Proof. $$ \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i\right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right] = 0 \nonumber $$ $$ \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right]}_{=\text{Var}(\mathbf{y}_{i,j})} + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0}\underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{=0} = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 \nonumber $$ $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^3 \right] &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^3 \right]}_{= \text{ 3rd central moment } \\ = \mu_{i,j}^0} + 3 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'} \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right]}_{= \text{Var}(\mathbf{y}_{i,j})} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{=0} \\ &\hspace{5mm} + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \sum_{j'' \neq j, j'} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \mathbf{z}_{i,j''} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{=0}\underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j''} - \exp(\alpha_i) \right]}_{=0} \\ &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \mu_{i,j}^0 \end{aligned} \nonumber $$ $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^4 \right] &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^4 \right]}_{= \text{ 4th central moment } \\ = 3(\mu_{i,j}^0)^2 + \mu_{i,j}^0} + 4 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^3 \mathbf{z}_{i,j'}\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^3 \right] \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{= 0} \\ &\hspace{5mm} + 3 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right]}_{= \text{Var}(\mathbf{y}_{i,j})}\underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j'} - \exp(\alpha_i))^2 \right]}_{= \text{Var}(\mathbf{y}_{i,j'})} + 6 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \sum_{j'' \neq j, j'} \mathbf{z}^2_{i,j} \mathbf{z}_{i,j'} \mathbf{z}_{i,j''} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{= 0}\underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j''} - \exp(\alpha_i) \right]}_{= 0} \\ &\hspace{5mm} + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \sum_{j'' \neq j, j'} \sum_{j''' \neq j, j', j''} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \mathbf{z}_{i,j''} \mathbf{z}_{i,j'''} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{= 0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{= 0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j''} - \exp(\alpha_i) \right]}_{= 0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'''} - \exp(\alpha_i) \right]}_{= 0} \\ &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 (3(\mu_{i,j}^0)^2 + \mu_{i,j}^0) + 3\sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \mu_{i,j}^0 \mu_{i,j'}^0
\end{aligned} \nonumber $$
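These moment identities are easy to spot-check by simulation (a sketch assuming Poisson draws with a common cluster mean; the Monte Carlo averages should be close to the closed forms):

```python
import numpy as np

rng = np.random.default_rng(5)
n_i, alpha_i, reps = 4, 0.1, 400_000
z = rng.normal(size=n_i)
mu = np.exp(alpha_i)

y = rng.poisson(mu, size=(reps, n_i))
zdot = (y - mu) @ z                      # simulated zdot_i

m2 = np.sum(z ** 2 * mu)                 # claimed E[zdot^2]
m3 = np.sum(z ** 3 * mu)                 # claimed E[zdot^3]
# Pair term: 3 * sum_{j != j'} z_j^2 z_{j'}^2 mu^2 (common mu here).
m4 = np.sum(z ** 4 * (3 * mu ** 2 + mu)) \
     + 3 * (np.sum(z ** 2 * mu) ** 2 - np.sum(z ** 4 * mu ** 2))

print(np.mean(zdot ** 2), m2)
print(np.mean(zdot ** 3), m3)
print(np.mean(zdot ** 4), m4)
```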

We can show that:

\[\mathbb{E}_{H_0}\left[ \left(\frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha_i}\right)^2 \right] \approx n_i \exp(\alpha_i)\]
Proof. All expectations are taken under the null hypothesis, so $\mathbb{E}_{H_0}[\mathbf{y}_{i,j}] = \exp(\alpha_i)$. Furthermore, we have that $\upsilon_i = 0$ and $\psi_i = 1$ under the null (also $\zeta_i$ and $\xi_i$ go to $0$ as we take $\tau^2 \rightarrow 0$). The cross terms come out to: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i \zeta_i \dot{\mathbf{z}}_i \left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right] \right] &= \upsilon_i \zeta_i \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \underbrace{\mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i\right]}_{=0} + \xi_i \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \right) = \upsilon_i \zeta_i \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 = \upsilon_i \zeta_i \xi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \\ \mathbb{E}_{H_0}\left[ \upsilon_i \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] &= \upsilon_i \sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} = 0 \\ \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right)\left( \zeta_i \dot{\mathbf{z}}_i \left[ \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right]\right) \right] &= \zeta_i \left(\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}\right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\right] + \zeta_i \xi_i \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}^2_i \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \\ &= \zeta_i \left(\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}\right) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mu_{i,j}^0 \right) + \zeta_i \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 \\ &= \zeta_i \exp(\alpha_i) \left[ \left(\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}\right)^2 + \xi_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)\right] \end{aligned} \nonumber $$ All of the above terms go to $0$ under the null (as we take $\tau^2 \rightarrow 0$), so their sum also goes to $0$.
Looking at the squared terms: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i^2 \right] &= \upsilon_i^2 \\ \mathbb{E}_{H_0}\left[ \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right)^2 \right] &= \sum_{j = 1}^{n_i} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right] + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))(\mathbf{y}_{i, j'} - \exp(\alpha_i)) \right] \\ &= \sum_{j = 1}^{n_i} \text{Var}(\mathbf{y}_{i,j}) + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right]}_{=0} \underbrace{\mathbb{E}_{H_0}\left[(\mathbf{y}_{i, j'} - \exp(\alpha_i)) \right]}_{=0} \\ &= \sum_{j = 1}^{n_i} \mu_{i,j}^0 \\ &= n_i \exp(\alpha_i) \\ \mathbb{E}_{H_0}\left[ \left( \zeta_i \dot{\mathbf{z}}_i \left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right]\right)^2\right] &= \zeta_i^2 \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right)^2 \right] \\ &= \zeta_i^2 \left[\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)^2 \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right]+ 2 \xi_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}^3_i \right] + \xi_i^2 \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^4 \right] \right] \\ &= \zeta_i^2 \left[ \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right)^2\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0\right) + 2 \xi_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \mu_{i,j}^0 \right) + \xi_i^2 \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 (3(\mu_{i,j}^0)^2 + \mu_{i,j}^0) + 3 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \mu_{i,j}^0 \mu_{i,j'}^0 \right)\right] \\ &= \zeta_i^2 \exp(\alpha_i) \left[ \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right)^2 \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) + 2 \xi_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \right) + \xi_i^2 \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 \left(3\exp(\alpha_i) + 1\right) + 3 \exp(\alpha_i) \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \right) \right] \\ \end{aligned} \nonumber $$ Of the three squared terms, only the second is non-zero under the null hypothesis (the first and third vanish as $\tau^2 \rightarrow 0$), so: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left(\frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha_i}\right)^2 \right] \approx n_i \exp(\alpha_i) \end{aligned} $$
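And a corresponding spot-check for this $(\alpha_i, \alpha_i)$ entry, using the fact (shown just below) that the score collapses to the centered cluster sum under the null:

```python
import numpy as np

rng = np.random.default_rng(6)
n_i, alpha_i, reps = 6, -0.3, 200_000
mu = np.exp(alpha_i)

# Under the null the alpha_i-score reduces to sum_j (y_{i,j} - exp(alpha_i)).
scores = rng.poisson(mu, size=(reps, n_i)).sum(axis=1) - n_i * mu
print(np.mean(scores ** 2), n_i * mu)   # should be close to n_i exp(alpha_i)
```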
\[\begin{aligned} \mathbb{E}_{H_0}\left[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_l} \right] &\approx 0 \hspace{5mm} \text{for } i \neq l \end{aligned}\]
Proof. $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i \upsilon_l \right] &= \upsilon_i \upsilon_l \\ \mathbb{E}_{H_0}\left[ \upsilon_i \sum_{j = 1}^{n_l} (\mathbf{y}_{l, j} - \exp(\alpha_l)) \right] &= \upsilon_i \sum_{j = 1}^{n_l} \underbrace{\mathbb{E}_{H_0} \left[ \mathbf{y}_{l,j} - \exp(\alpha_l) \right]}_{=0} = 0 \\ \mathbb{E}_{H_0}\left[ \upsilon_i \zeta_l \dot{\mathbf{z}}_l \left(\xi_l \dot{\mathbf{z}}_l + \sum_{j = 1}^{n_l} \mathbf{z}_{l, j} \right) \right] &= \upsilon_i \zeta_l \xi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] + \upsilon_i \zeta_l \left( \sum_{j = 1}^{n_l} \mathbf{z}_{l,j} \right) \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0} \\ &= \upsilon_i \zeta_l \xi_l \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2 \mu_{l, j}^0 \\ &= \upsilon_i \zeta_l \xi_l \exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l, j}^2 \\ \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right) \left(\sum_{j' = 1}^{n_l} (\mathbf{y}_{l,j'} - \exp(\alpha_l)) \right)\right] &= \sum_{j = 1}^{n_i} \sum_{j' = 1}^{n_l} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{l,j'} - \exp(\alpha_l) \right]}_{=0} \\ &= 0 \\ \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right) \zeta_l \dot{\mathbf{z}}_l\left( \xi_l \dot{\mathbf{z}}_l + \sum_{j' = 1}^{n_l} \mathbf{z}_{l,j'} \right)\right] &= \zeta_l \xi_l \sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i)\right]}_{=0} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2\right] + \zeta_l \left(\sum_{j' = 1}^{n_l} \mathbf{z}_{l,j'}\right) \sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0} \\ &= 0 \\ \mathbb{E}_{H_0}\left[ \zeta_i \zeta_l \dot{\mathbf{z}}_i \dot{\mathbf{z}}_l \left(\xi_i \dot{\mathbf{z}}_i + \sum_{j =1}^{n_i} \mathbf{z}_{i,j}\right)\left(\xi_l \dot{\mathbf{z}}_l + \sum_{j = 1}^{n_l} \mathbf{z}_{l,j} \right)\right] &= \zeta_i \zeta_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \dot{\mathbf{z}}_l\left(\xi_i \xi_l \dot{\mathbf{z}}_i \dot{\mathbf{z}}_l + \xi_i \dot{\mathbf{z}}_i \sum_{j = 1}^{n_l} \mathbf{z}_{l,j} + \xi_l \dot{\mathbf{z}}_l \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)\left( \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}\right)\right) \right] \\ &= \zeta_i \zeta_l \left[ \xi_i \xi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] + \xi_i \left(\sum_{j = 1}^{n_l} \mathbf{z}_{l,j} \right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0} + \xi_l \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \right]}_{=0} + \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)\left( \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}\right)\underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \right]}_{=0}\underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0}\right] \\ &= \zeta_i \zeta_l \xi_i \xi_l \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 \right) \left(\sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2 \mu_{l,j}^0 \right) \\ &= \zeta_i \zeta_l \xi_i \xi_l \exp(\alpha_i) \exp(\alpha_l) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \left(\sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2 \right) \end{aligned} \nonumber $$ As $\tau^2 \rightarrow 0$, all of the above terms go to zero, so we get: $$
\mathbb{E}_{H_0}\left[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_l} \right] \approx 0 \hspace{5mm} (i \neq l) $$ under the null hypothesis.
\[\mathbb{E}_{H_0}\left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right] \approx \frac{1}{2} \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\]
Proof. Recall $\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \approx - \upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \dot{\mathbf{z}}_i \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right)$, and consider the three terms separately. For the $\upsilon_i$ term: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right] &= \upsilon_i \mathbb{E}_{H_0}\left[ \frac{1}{2} \sum_{l = 1}^k \left( - \frac{2\upsilon_l}{\tau^2} +\dot{\mathbf{z}}_l^2 \psi_l \right)\right] \\ &= \frac{1}{2} \upsilon_i \sum_{l = 1}^{k} \left( - \frac{2\upsilon_l}{\tau^2} + \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] \right) \\ &= \frac{1}{2} \upsilon_i \sum_{l = 1}^{k} \left( - \frac{2\upsilon_l}{\tau^2} + \psi_l \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2 \mu_{l,j}^0 \right) \\ &= \frac{1}{2} \upsilon_i \sum_{l = 1}^{k} \left(- \frac{\exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2}{1 + \tau^2 \exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2}+ \psi_l \exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2\right)\\ &= \frac{1}{2} \upsilon_i \sum_{l = 1}^{k} \left(\exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2\right) \left(\psi_l - \frac{1}{1 + \tau^2 \exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2} \right) \end{aligned} \nonumber $$ For the middle term: $$ \begin{aligned} \mathbb{E}_{H_0}\left[\left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right) \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \right] &= \frac{1}{2} \sum_{l = 1}^k \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right) \left(- \frac{2\upsilon_l}{\tau^2} + \dot{\mathbf{z}}_l^2 \psi_l \right) \right] \\ &= \frac{1}{2}\sum_{l = 1}^k \left[ - \frac{2 \upsilon_l}{\tau^2} \sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} + \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \right] \end{aligned} \nonumber $$ For $l \neq i$, the factor $\dot{\mathbf{z}}_l^2$ is independent of the observations in cluster $i$, so the remaining expectation factors and vanishes. For $l = i$, expanding $\dot{\mathbf{z}}_i^2$, only the matched terms survive, and the third central moment of a Poisson random variable equals its mean: $$ \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^3 \right] = \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \nonumber $$ so the middle term contributes $\frac{1}{2} \psi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2$. For the third term: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \zeta_i \dot{\mathbf{z}}_i \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right] &\approx \frac{1}{2}\zeta_i \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left(\sum_{l = 1}^k \left[-\frac{2 \upsilon_l}{\tau^2} +\dot{\mathbf{z}}^2_l \psi_l \right]\right) \right] \\ &= \frac{1}{2}\zeta_i \sum_{l = 1}^k \mathbb{E}_{H_0}\left[ \left( \xi_i \dot{\mathbf{z}}_i^2 + \dot{\mathbf{z}}_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right)\left(-\frac{2 \upsilon_l}{\tau^2} +\dot{\mathbf{z}}^2_l \psi_l \right)\right] \\ &= \frac{1}{2}\zeta_i \sum_{l = 1}^k \left[ -\frac{2 \upsilon_l \xi_i}{\tau^2} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] - \frac{2 \upsilon_l}{\tau^2} \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \underbrace{\mathbb{E}_{H_0} \left[ \dot{\mathbf{z}}_i \right]}_{=0} + \xi_i \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \dot{\mathbf{z}}_l \right] + \psi_l \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \dot{\mathbf{z}}_i \right] \right] \\ &= \frac{1}{2}\zeta_i \left[ - \frac{2\xi_i}{\tau^2} \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 \right) \sum_{l = 1}^k \upsilon_l + \xi_i \sum_{\substack{l = 1 \\ l \neq i}}^k \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0} + \xi_i \psi_i \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^3 \right] + \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left( \sum_{\substack{l = 1 \\ l \neq i}}^k \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_{l}^2\right] \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \right]}_{=0} + \psi_i \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^3 \right] \right) \right] \\ &= \frac{1}{2}\zeta_i \left[ - \frac{2\xi_i \exp(\alpha_i)}{\tau^2} \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \sum_{l = 1}^k \upsilon_l + \left( \xi_i \psi_i + \psi_i\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right)\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \mu_{i,j}^0 \right) \right] \\ &= \frac{1}{2}\zeta_i \exp(\alpha_i) \left[ - \frac{2\xi_i}{\tau^2} \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \sum_{l = 1}^k \upsilon_l + \psi_i \left( \xi_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \right) \right] \end{aligned} \nonumber $$ Since $\upsilon_i \rightarrow 0$, $\zeta_i \rightarrow 0$, and $\psi_i \rightarrow 1$ as $\tau^2 \rightarrow 0$, the first and third terms vanish in the limit and only the middle term survives. Thus: $$ \mathbb{E}_{H_0}\left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right] \approx \frac{1}{2} \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 $$ under the null hypothesis.
\[\mathbb{E}_{H_0}\left[ \left( \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right)^2 \right] \approx \frac{1}{4}\sum_{i = 1}^{k}\left((\exp(\alpha_i) + 2\exp^2(\alpha_i)) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 + 2\exp^2(\alpha_i)\sum_{j = 1}^{n_i} \sum_{\substack{j' = 1 \\ j' \neq j}}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \right)\]
Proof. To lighten notation, let $c_i = \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2$, so that $\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2] = c_i$ and $\frac{2 \upsilon_i}{\tau^2} = \frac{c_i}{1 + \tau^2 c_i}$. $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left( \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right)^2 \right] &= \frac{1}{4} \sum_{i = 1}^{k} \sum_{l = 1}^k \mathbb{E}_{H_0}\left[ \left(- \frac{2 \upsilon_i}{\tau^2} + \dot{\mathbf{z}}_i^2 \psi_i\right) \left(- \frac{2 \upsilon_l}{\tau^2} + \dot{\mathbf{z}}_l^2 \psi_l \right)\right] \\ &= \frac{1}{4} \sum_{i = 1}^k \sum_{l = 1}^k \left[ \frac{4 \upsilon_i \upsilon_l}{(\tau^2)^2} - \frac{2 \upsilon_i \psi_l }{\tau^2} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2\right] - \frac{2 \upsilon_l \psi_i}{\tau^2} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \right] + \frac{1}{4} \sum_{i = 1}^k \sum_{\substack{l = 1 \\ l \neq i}}^k \psi_i \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2\right] \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] + \frac{1}{4}\sum_{i = 1}^{k} \psi_i^2 \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^4 \right] \end{aligned} \nonumber $$ using the independence of $\dot{\mathbf{z}}_i$ and $\dot{\mathbf{z}}_l$ for $i \neq l$. The second and fourth central moments of a Poisson random variable with mean $\mu$ are $\mu$ and $\mu + 3\mu^2$, so expanding $\dot{\mathbf{z}}_i^4$ (only matched pairs of factors survive): $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 = c_i \\ \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^4 \right] &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 \left(\mu_{i,j}^0 + 3 (\mu_{i,j}^0)^2\right) + 3 \sum_{j = 1}^{n_i} \sum_{\substack{j' = 1 \\ j' \neq j}}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \mu_{i,j}^0 \mu_{i,j'}^0 \\ &= (\exp(\alpha_i) + 3\exp^2(\alpha_i)) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 + 3\exp^2(\alpha_i) \sum_{j = 1}^{n_i} \sum_{\substack{j' = 1 \\ j' \neq j}}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \end{aligned} \nonumber $$ Now evaluate under the null hypothesis. As $\tau^2 \rightarrow 0$ we have $\psi_i \rightarrow 1$, but note that $\frac{2 \upsilon_i}{\tau^2} = \frac{c_i}{1 + \tau^2 c_i} \rightarrow c_i$ rather than zero. The first double sum therefore tends to $\frac{1}{4} \sum_{i} \sum_{l} (c_i c_l - c_i c_l - c_l c_i) = -\frac{1}{4} \sum_{i} c_i^2 - \frac{1}{4} \sum_{i} \sum_{l \neq i} c_i c_l$, and its off-diagonal part cancels exactly against the $i \neq l$ sum above. We are left with: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left( \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right)^2 \right] &\approx \frac{1}{4} \sum_{i = 1}^{k} \left( \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^4 \right] - c_i^2 \right) \\ &= \frac{1}{4}\sum_{i = 1}^{k}\left((\exp(\alpha_i) + 2\exp^2(\alpha_i)) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 + 2\exp^2(\alpha_i)\sum_{j = 1}^{n_i} \sum_{\substack{j' = 1 \\ j' \neq j}}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \right) \end{aligned} \nonumber $$ where the last step uses $c_i^2 = \exp^2(\alpha_i) \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 + \sum_{j = 1}^{n_i} \sum_{\substack{j' = 1 \\ j' \neq j}}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \right)$.
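
Before assembling these pieces into a matrix, it is worth sanity-checking them numerically. At $\tau^2 = 0$ the scores reduce to $U_{\alpha_i} = \sum_j (\mathbf{y}_{i,j} - \exp(\alpha_i))$ and $U_{\tau^2} = \frac{1}{2}\sum_i \left(\dot{\mathbf{z}}_i^2 - \exp(\alpha_i)\sum_j \mathbf{z}_{i,j}^2\right)$, since $\upsilon_i, \zeta_i \rightarrow 0$, $\psi_i \rightarrow 1$, and $2\upsilon_i/\tau^2 \rightarrow \exp(\alpha_i)\sum_j \mathbf{z}_{i,j}^2$. Below is a minimal Monte Carlo sketch (assuming NumPy is available; the design values are made up) comparing the empirical moments of these limiting scores to the closed forms above. With enough replications, the three printouts should roughly agree.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: k = 3 clusters with made-up alpha_i, n_i, and z_{i,j}.
alpha = np.array([0.2, -0.5, 1.0])
sizes = [5, 8, 6]
z = [rng.uniform(0.5, 1.5, size=n) for n in sizes]
mu = np.exp(alpha)                       # exp(alpha_i): the null Poisson means
k, reps = len(alpha), 200_000

# Limiting scores at tau^2 = 0:
#   U_alpha[i] = sum_j (y_ij - mu_i)
#   U_tau      = (1/2) sum_i ( zdot_i^2 - mu_i * sum_j z_ij^2 )
U_alpha = np.empty((reps, k))
U_tau = np.zeros(reps)
for i in range(k):
    resid = rng.poisson(mu[i], size=(reps, sizes[i])) - mu[i]
    U_alpha[:, i] = resid.sum(axis=1)
    zdot = resid @ z[i]                  # zdot_i = sum_j z_ij (y_ij - mu_i)
    U_tau += 0.5 * (zdot**2 - mu[i] * np.sum(z[i] ** 2))

# Closed forms derived above.
c = np.array([mu[i] * np.sum(z[i] ** 2) for i in range(k)])
I_at = 0.5 * c                           # E[U_alpha_i U_tau]
I_tt = 0.25 * sum(                       # E[U_tau^2]
    (mu[i] + 2 * mu[i] ** 2) * np.sum(z[i] ** 4)
    + 2 * mu[i] ** 2 * (np.sum(z[i] ** 2) ** 2 - np.sum(z[i] ** 4))
    for i in range(k)
)

print("E[U_a1 U_a2]: mc =", np.mean(U_alpha[:, 0] * U_alpha[:, 1]), "vs 0")
print("E[U_a U_t]:   mc =", np.mean(U_alpha * U_tau[:, None], axis=0), "vs", I_at)
print("E[U_t^2]:     mc =", np.mean(U_tau**2), "vs", I_tt)
```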

Putting the above together into a big matrix, we get:

\[\mathcal{I} \rvert_{H_0} = \begin{bmatrix} \mathcal{I}_{\alpha, \alpha} \rvert_{H_0} & \mathcal{I}_{\alpha, \tau^2}\rvert_{H_0} \\ \mathcal{I}^\top_{\alpha, \tau^2}\rvert_{H_0} & \mathcal{I}_{\tau^2, \tau^2}\rvert_{H_0} \end{bmatrix}\]

where:

\[\begin{aligned} \mathcal{I}_{\alpha, \alpha}\rvert_{H_0} &= \begin{bmatrix} n_1 \exp(\alpha_1) & 0 & \dots & 0 \\ 0 & n_2 \exp(\alpha_2) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & n_k \exp(\alpha_k) \end{bmatrix} \hspace{10mm} \mathcal{I}_{\alpha, \tau^2}\rvert_{H_0} = \begin{bmatrix} \frac{1}{2} \exp(\alpha_1) \sum_{j = 1}^{n_1} \mathbf{z}_{1,j}^2 \\ \vdots \\ \frac{1}{2} \exp(\alpha_k) \sum_{j = 1}^{n_k} \mathbf{z}_{k,j}^2 \end{bmatrix} \\ \mathcal{I}_{\tau^2, \tau^2} \rvert_{H_0} &= \frac{1}{4}\sum_{i = 1}^{k}\left((\exp(\alpha_i) + 2\exp^2(\alpha_i)) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 + 2\exp^2(\alpha_i)\sum_{j = 1}^{n_i} \sum_{\substack{j' = 1 \\ j' \neq j}}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \right) \end{aligned}\]

Alternatively, we can obtain the information as the negative expected Hessian. To do so, we first record the derivatives of $\upsilon_i$, $\xi_i$, $\zeta_i$, and $\dot{\mathbf{z}}_i$ with respect to $\alpha_i$:

\[\begin{aligned} \frac{d \upsilon_i}{d \alpha_i} &= \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} & \frac{d \xi_i}{d \alpha_i} &= - \frac{\exp(\alpha_i)(\tau^2 \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 )^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \zeta_i}{d \alpha_i} &= \frac{\tau^2 \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} & \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} &=-\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \end{aligned}\]
Proof. $$ \begin{aligned} \frac{d \upsilon_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[ \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)}\right] \\ &= \frac{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) - 2\left( \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2}{(2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2))^2} \\ &= \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \xi_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[ \frac{\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)} \right] \\ &= \frac{-2\exp(\alpha_i) (\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{(2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2))^2} \\ &= - \frac{\exp(\alpha_i)(\tau^2 \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 )^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \zeta_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[ \frac{\tau^2 \exp(\alpha_i)}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}} \right] \\ &= \frac{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})(\tau^2 \exp(\alpha_i)) - (\tau^2 \exp(\alpha_i))^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} \\ &= \frac{\tau^2 \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} \\ \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[\sum_{j =1 }^{n_i} \mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \\ &= - \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \end{aligned} \nonumber $$
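
Quotient-rule computations like these are easy to fumble, so here is a quick symbolic spot-check. This is just a sketch using SymPy (my tooling choice, not part of the derivation), with the shorthands $S = \sum_j \mathbf{z}_{i,j}^2$ and $T = \sum_j \mathbf{z}_{i,j}$ treated as constants in $\alpha_i$:

```python
import sympy as sp

# S = sum_j z_ij^2 and T = sum_j z_ij are constants with respect to alpha.
a, t2, S, T = sp.symbols("alpha tau2 S T", positive=True)
upsilon = t2 * sp.exp(a) * S / (2 * (1 + t2 * sp.exp(a) * S))
xi = t2 * S / (2 * (1 + t2 * sp.exp(a) * S))
zeta = t2 * sp.exp(a) / (1 + t2 * sp.exp(a) * T)

# Each difference should simplify to zero if the identities above are right.
assert sp.simplify(sp.diff(upsilon, a) - t2 * sp.exp(a) * S / (2 * (1 + t2 * sp.exp(a) * S) ** 2)) == 0
assert sp.simplify(sp.diff(xi, a) + sp.exp(a) * (t2 * S) ** 2 / (2 * (1 + t2 * sp.exp(a) * S) ** 2)) == 0
assert sp.simplify(sp.diff(zeta, a) - t2 * sp.exp(a) / (1 + t2 * sp.exp(a) * T) ** 2) == 0
print("alpha-derivative identities check out")
```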

From these identities, we can derive (component-by-component) the information as the negative expected Hessian:

\[- \mathbb{E}_{H_0}\left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} \right] = n_i \exp(\alpha_i)\]
Proof. First, two auxiliary derivatives: $$ \begin{aligned} \frac{d}{d \alpha_i} \left[ \zeta_i \dot{\mathbf{z}}_i \right] &= \dot{\mathbf{z}}_i \frac{d \zeta_i}{d \alpha_i} + \zeta_i \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} \\ &= \frac{\tau^2 \exp(\alpha_i) \dot{\mathbf{z}}_i}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j})^2} - \zeta_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \\ \frac{d}{d \alpha_i} \left[ \xi_i \dot{\mathbf{z}}_i \right] &= \dot{\mathbf{z}}_i \frac{d \xi_i}{d \alpha_i} + \xi_i \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} \\ &= -\frac{\dot{\mathbf{z}}_i \exp(\alpha_i) (\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - \xi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \end{aligned} \nonumber $$ $$ \begin{aligned} \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} &= \frac{d}{d \alpha_i} \left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \right] \\ &\approx \frac{d}{d \alpha_i} \left[ - \upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \dot{\mathbf{z}}_i \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \right] \\ &= - \frac{d \upsilon_i}{d \alpha_i} - n_i\exp(\alpha_i) - \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \frac{d}{d \alpha_i} \left[ \zeta_i \dot{\mathbf{z}}_i \right] - \zeta_i \dot{\mathbf{z}}_i \frac{d}{d \alpha_i} \left[ \xi_i \dot{\mathbf{z}}_i \right] \\ &= - \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - n_i \exp(\alpha_i) - \left(\xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left[\frac{\tau^2 \exp(\alpha_i) \dot{\mathbf{z}}_i}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} - \zeta_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right] + \zeta_i \dot{\mathbf{z}}_i \left[ \frac{\dot{\mathbf{z}}_i \exp(\alpha_i) (\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + \xi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right] \\ &= - \frac{\tau^2\exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)\left(1 - \dot{\mathbf{z}}_i^2 \zeta_i \tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2)^2} - \frac{\left(\xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \tau^2 \exp(\alpha_i) \dot{\mathbf{z}}_i}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} + \left( \zeta_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left( 2\xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right) - n_i \exp(\alpha_i) \end{aligned} \nonumber $$ Recall that $$\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i] = 0$$ and $$\mathbb{E}_{H_0} \left[ \dot{\mathbf{z}}_i^2 \right] = \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2$$, so taking the negative expectation of the above yields: $$ \begin{aligned} - \mathbb{E}_{H_0}\left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} \right] &= \frac{\tau^2\exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)\left(1 - \zeta_i \tau^2 \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2)^2} + \frac{\tau^2 \exp^2(\alpha_i) \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} - \zeta_i \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)^2 + n_i \exp(\alpha_i) \\ \implies - \mathbb{E}_{H_0}\left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} \right] \bigg\rvert_{H_0} &= n_i \exp(\alpha_i) \end{aligned} \nonumber $$ since $\xi_i, \zeta_i \rightarrow 0$ (and the explicit $\tau^2$ factors vanish) as $\tau^2 \rightarrow 0$.
\[- \mathbb{E}_{H_0} \left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d (\tau^2)^2} \right] \bigg\rvert_{H_0} = \frac{1}{2} \sum_{i = 1}^{k} \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2\]
Proof. $$ \begin{aligned} \frac{d}{d \tau^2} \left[ \frac{2 \upsilon_i}{\tau^2} \right] &= \frac{d}{d \tau^2} \left[ \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}\right] \\ &= -\frac{(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \psi_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[ \left(1 + 2 \tau^2 \exp(\alpha_i) \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 + (\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2 \right)^{-1} \right] \\ &= -\left(1 + 2 \tau^2 \exp(\alpha_i) \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 + (\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2\right)^{-2}\left(2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + 2 \tau^2 \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2 \right) \\ &= - 2\psi_i^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \end{aligned} \nonumber $$ It follows that: $$ \begin{aligned} \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d (\tau^2)^2} &= \frac{d}{d \tau^2} \left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \right] \\ &= \frac{d}{d \tau^2} \left[ \frac{1}{2} \sum_{i = 1}^k \left( - \frac{2 \upsilon_i}{\tau^2} + \dot{\mathbf{z}}_i^2 \psi_i \right) \right] \\ &= \frac{1}{2} \sum_{i = 1}^k \left( \frac{(\exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{( 1+ \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - 2 \dot{\mathbf{z}}_i^2 \psi_i^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \right) \end{aligned} \nonumber $$ And thus: $$ \begin{aligned} - \mathbb{E}_{H_0} \left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d (\tau^2)^2} \right] &= \frac{1}{2} \sum_{i = 1}^k \left[ - \frac{(\exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{( 1+ \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + 2 \mathbb{E}\left[ \dot{\mathbf{z}}_i^2 \right] \psi_i^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \right] \\ &=\frac{1}{2} \sum_{i = 1}^k \left[ - \frac{(\exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{( 1+ \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + 2 \psi_i^2 \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2 \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \right] \end{aligned} \nonumber $$ Evaluating this under $H_0$, we get: $$ \begin{aligned} - \mathbb{E}_{H_0} \left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d (\tau^2)^2} \right] \bigg\rvert_{H_0} &= \frac{1}{2} \sum_{i = 1}^{k} \left[ -\left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2 + 2 \left( \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2\right] \\ &= \frac{1}{2} \sum_{i = 1}^{k} \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2 \\ \end{aligned} \nonumber $$
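
The two derivatives used in this proof admit the same kind of symbolic spot-check (again a SymPy sketch, writing $c$ for $\exp(\alpha_i) \sum_j \mathbf{z}_{i,j}^2$, which is constant in $\tau^2$):

```python
import sympy as sp

t2, c = sp.symbols("tau2 c", positive=True)  # c = exp(alpha_i) * sum_j z_ij^2
psi = (1 + t2 * c) ** -2

# d/dtau2 [ 2*upsilon_i / tau2 ], where 2*upsilon_i / tau2 = c / (1 + tau2*c)
assert sp.simplify(sp.diff(c / (1 + t2 * c), t2) + c**2 / (1 + t2 * c) ** 2) == 0
# d psi_i / d tau2 = -2 * psi_i^2 * c * (1 + tau2*c)
assert sp.simplify(sp.diff(psi, t2) + 2 * psi**2 * c * (1 + t2 * c)) == 0
print("tau2 derivatives in this proof check out")
```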
\[\begin{aligned} \frac{d \upsilon_i}{d\tau^2} &= \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j =1 }^{n_i} \mathbf{z}_{i,j}^2)^2} & \frac{d \xi_i}{d \tau^2} &= \frac{\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \zeta_i}{d \tau^2} &= \frac{\exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} & \frac{d \dot{\mathbf{z}}_i}{d \tau^2} &= 0 \end{aligned}\]
Proof. $$ \begin{aligned} \frac{d \upsilon_i}{d\tau^2} &= \frac{d}{d \tau^2} \left[ \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)} \right] \\ &= \frac{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) - 2\tau^2 \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2}{(2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2))^2} \\ &= \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j =1 }^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \xi_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[\frac{\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)} \right] \\ &= \frac{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) - 2 \tau^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2}{\left(2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) \right)^2} \\ &= \frac{\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \zeta_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[ \frac{\tau^2 \exp(\alpha_i)}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}}\right] \\ &= \frac{\exp(\alpha_i)(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}) - \exp^2(\alpha_i) \tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}}{(1 + \tau^2 \exp(\alpha_i) \sum_{j =1 }^{n_i} \mathbf{z}_{i,j})^2} \\ &= \frac{\exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} \\ \frac{d \dot{\mathbf{z}}_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[ \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \\ &= 0 \end{aligned} \nonumber $$
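
The same symbolic check works for these $\tau^2$ derivatives (a SymPy sketch with the shorthands $S = \sum_j \mathbf{z}_{i,j}^2$ and $T = \sum_j \mathbf{z}_{i,j}$, as before):

```python
import sympy as sp

a, t2, S, T = sp.symbols("alpha tau2 S T", positive=True)
upsilon = t2 * sp.exp(a) * S / (2 * (1 + t2 * sp.exp(a) * S))
xi = t2 * S / (2 * (1 + t2 * sp.exp(a) * S))
zeta = t2 * sp.exp(a) / (1 + t2 * sp.exp(a) * T)

# Each difference should simplify to zero if the identities above are right.
assert sp.simplify(sp.diff(upsilon, t2) - sp.exp(a) * S / (2 * (1 + t2 * sp.exp(a) * S) ** 2)) == 0
assert sp.simplify(sp.diff(xi, t2) - S / (2 * (1 + t2 * sp.exp(a) * S) ** 2)) == 0
assert sp.simplify(sp.diff(zeta, t2) - sp.exp(a) / (1 + t2 * sp.exp(a) * T) ** 2) == 0
print("tau2-derivative identities check out")
```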
\[-\mathbb{E}_{H_0}\left[\frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i d \tau^2} \right] \bigg\rvert_{H_0} = \frac{1}{2}\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\]
Proof. $$ \begin{aligned} \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i d \tau^2} &= \frac{d}{d \tau^2} \left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \right] \\ &= \frac{d}{d \tau^2} \left[ - \upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \dot{\mathbf{z}}_i \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \right] \\ &= - \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - \dot{\mathbf{z}}_i^2 \left( \xi_i \frac{d \zeta_i}{d \tau^2} + \zeta_i \frac{d \xi_i}{d \tau^2}\right) - \dot{\mathbf{z}}_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right) \frac{d \zeta_i}{d \tau^2} \\ &= - \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - \dot{\mathbf{z}}_i^2 \left( \frac{\xi_i \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}+\frac{\zeta_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \right) - \dot{\mathbf{z}}_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right) \left(\frac{\exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}\right)\\ \end{aligned} \nonumber $$ Taking the negative expectation of the above under $H_0$: $$ \begin{aligned} -\mathbb{E}_{H_0}\left[\frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i d \tau^2} \right] &= \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + \mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2] \left( \frac{\xi_i \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}+\frac{\zeta_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \right) \\ &= \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left( \frac{\xi_i \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}+\frac{\zeta_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \right) \end{aligned} \nonumber $$ And evaluating under $H_0$: $$ -\mathbb{E}_{H_0}\left[\frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i d \tau^2} \right] \bigg\rvert_{H_0} = \frac{1}{2}\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \nonumber $$
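
To make the end result concrete, here is a small sketch that assembles $\mathcal{I}\rvert_{H_0}$ from the closed-form entries derived above (assuming NumPy; the helper name and design values are hypothetical). It uses the score-based $(\tau^2, \tau^2)$ entry. Note that the Hessian-based entry, $\frac{1}{2}\sum_i \left(\exp(\alpha_i) \sum_j \mathbf{z}_{i,j}^2\right)^2$, is smaller by $\frac{1}{4}\sum_i \exp(\alpha_i) \sum_j \mathbf{z}_{i,j}^4$; the two need not agree here, presumably because the log-likelihood and its derivatives are only computed approximately.

```python
import numpy as np

def information_under_null(alpha, z):
    """Assemble the (k+1) x (k+1) information matrix at tau^2 = 0.

    alpha : length-k array of cluster intercepts alpha_i
    z     : list of k arrays with z[i][j] = z_{i,j}
    """
    k = len(alpha)
    mu = np.exp(alpha)
    info = np.zeros((k + 1, k + 1))
    for i in range(k):
        S2, S4 = np.sum(z[i] ** 2), np.sum(z[i] ** 4)
        info[i, i] = len(z[i]) * mu[i]               # n_i exp(alpha_i)
        info[i, k] = info[k, i] = 0.5 * mu[i] * S2   # (1/2) exp(alpha_i) sum_j z_ij^2
        # (1/4) [ (mu + 2 mu^2) sum_j z^4 + 2 mu^2 sum_{j != j'} z_j^2 z_j'^2 ]
        info[k, k] += 0.25 * ((mu[i] + 2 * mu[i] ** 2) * S4
                              + 2 * mu[i] ** 2 * (S2**2 - S4))
    return info

# Hypothetical design matching the earlier simulation sketch.
rng = np.random.default_rng(0)
alpha = np.array([0.2, -0.5, 1.0])
z = [rng.uniform(0.5, 1.5, size=n) for n in (5, 8, 6)]
print(information_under_null(alpha, z))
```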

References

  1. Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2011). Applied longitudinal analysis (2nd ed.). Wiley.

  2. Lin, X. (1997). Variance component testing in generalised linear models with random effects. Biometrika, 84(2), 309–326. doi:10.1093/biomet/84.2.309
