This post is a primer on generalized linear mixed models (GLMMs) and their associated estimation procedures. Throughout this post, we take derivatives of matrices, vectors, and scalars. We denote all of them with $\partial$ even though they may not technically be partial derivatives. The meaning should be clear from the context if you keep track of the dimensions of the variables.
We’ll use the notation $\text{vec}(\mathbf{M})$ to denote the vectorization of matrix $\mathbf{M}$, where we flatten the matrix row-by-row. Also, any function $f: \mathbb{R} \rightarrow \mathbb{R}$ will be applied element-wise to vectors and matrices.
Background
Let’s first explain why generalized linear mixed models are of interest and helpful in data analysis. Suppose we have a dataset in which the observations can be grouped in some way (i.e. we have repeated observations from the same unit). One example is longitudinal studies, where we obtain measurements from the same individuals over time. Another example is in education, where we may observe students from the same classes or schools. In these settings, it seems reasonable to assume that observations from the same unit would be more similar than those from different ones. This can be realized in our model by including a way for the effects of covariates to differ across units.
As in generalized linear models, we assume the mean response is some function of a linear combination of covariates. However, GLMMs are called mixed because the effects of the covariates are of two types: fixed and random. The fixed effects represent population-level relationships, while the random effects represent unit-specific ones. In this way, we can use GLMMs to capture both the population-level relationships (via the fixed effects) and the between-group variation (via the random effects): the fixed effects determine the relationship between the covariates and the mean response for the population (i.e. overall), and the random effects describe how each group’s mean response deviates from that. Later on, we introduce measurement errors, which account for the deviation of each observation from its group’s mean.
Set-Up
We have $n$ observations coming from $k$ different clusters, each of size $n_t$ for $t \in [k]$. The full data will be denoted by $\mathbf{y}$. Though $\mathbf{y}$ is a vector, we’ll denote the $j$-th observation from cluster $i$ with $\mathbf{y}_{i,j}$. That is, \(\mathbf{y}_{i,j}\) denotes element \(\sum_{l = 1}^{i - 1} n_l + j\) of $\mathbf{y}$. We’ll also denote the $n_i$-dimensional vector of responses for cluster $i$ with $\mathbf{y}_i$.
For each observation, we will have $p$ fixed effect covariates arranged in a $p$-dimensional vector, \(\mathbf{x}_{i, j}\), and $q$ random effect covariates in a $q$-dimensional vector, \(\mathbf{z}_{i,j}\). We’ll assume that observations from different clusters are independent; observations within the same cluster are dependent only through their shared random effects.
We’ll let $\alpha$ denote the $p$-dimensional fixed effect coefficient vector, and $\beta_t$ will denote the $q$-dimensional random effect coefficient vector corresponding to group $t$. We assume $\beta_t \overset{iid}{\sim} \mathcal{F}$ for all $t \in [k]$, where $\mathcal{F}$ is an exponential family distribution with mean zero and $q \times q$ covariance matrix, $D(\tau^2)$, depending on an $m$-dimensional variance component vector, $\tau^2$. These random effects are assumed to be independent of the covariates. When they are not, we can run into issues with bias when estimating the fixed effects.1
Additional Assumptions.
A few auxiliary assumptions must also be made for the analysis later, which we list here:
- The third and higher moments of $\beta$ are of order $o(\rvert \rvert \tau^2 \rvert \rvert)$.
- The entries of $D(\tau^2)$ are linear in $\tau^2$.
- $D(\tau^2) = \mathbf{0}$ if $\tau^2 = \mathbf{0}$.
Our model comes in the form of a specification of the conditional mean, $\mu_{i,j} = \mathbb{E}[\mathbf{y}_{i,j} \rvert \beta_i]$ (where we suppress the additional conditioning on the covariates themselves). For a monotonic and differentiable link function $g(\cdot)$ (e.g. $\log(\cdot)$ or $\text{logit}(\cdot)$), the conditional mean of the $j$-th observation in group $i$ is assumed to be given by:
\[\mu_{i,j} = g^{-1}\left(\mathbf{x}_{i,j}^\top \alpha + \mathbf{z}_{i,j}^\top \beta_i \right) \label{eq:glmm}\]We then assume that the observations themselves follow some exponential family distribution with measurement errors, $\epsilon_{i,j}$, which are the deviations of the responses from their (unit-specific) conditional means. These errors are assumed to have mean zero and be independent of each other and of the random effects. We further assume the responses, $\mathbf{y}_{i,j}$, conditional on the random effects (and the covariates), are independent with variances equal to some function of the conditional mean.
To write Eq. \eqref{eq:glmm} in matrix notation, we assume that the observations are ordered by group, so the first $n_1$ observations are all from group $1$. We can then define the following vectors/matrices:
\[\mu = \begin{bmatrix} \mu_{1, 1} \\ \vdots \\ \mu_{k, n_k} \end{bmatrix}, \hspace{2mm} \mathbf{X} = \begin{bmatrix} — & \mathbf{x}_{1,1} & — \\ & \dots & \\ — & \mathbf{x}_{k, n_k} & — \\ \end{bmatrix}, \hspace{2mm} \tilde{\mathbf{Z}}_i = \begin{bmatrix} — & \mathbf{z}_{i, 1} & — \\ & \dots & \\ — & \mathbf{z}_{i, n_i} & — \\ \end{bmatrix}, \hspace{2mm} \mathbf{Z} = \begin{bmatrix} \tilde{\mathbf{Z}}_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \tilde{\mathbf{Z}}_k \end{bmatrix}, \hspace{2mm} \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \nonumber\]where $\tilde{\mathbf{Z}}_i$ is constructed by stacking the \(\mathbf{z}_{i,j}\) vectors for group $i$ into a matrix, and where $\beta$ is the vector created by stacking together all of the $\beta_i$ vectors vertically. As a reminder, \(\mu \in \mathbb{R}^{n \times 1}\), \(\mathbf{X} \in \mathbb{R}^{n \times p}\), \(\tilde{\mathbf{Z}}_i \in \mathbb{R}^{n_i \times q}\), \(\mathbf{Z} \in \mathbb{R}^{n \times qk}\), $\alpha \in \mathbb{R}^p$, and \(\beta \in \mathbb{R}^{qk \times 1}\).
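To make the bookkeeping concrete, here's a minimal sketch (in Python, with hypothetical toy dimensions) of how these stacked matrices can be assembled:

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: k clusters of sizes n_i, p fixed and q random covariates.
k, p, q = 3, 2, 1
n_sizes = [4, 5, 3]

# Per-cluster blocks: the x_{i,j} and z_{i,j} vectors stacked row-wise.
X_blocks = [rng.normal(size=(n_i, p)) for n_i in n_sizes]
Z_blocks = [rng.normal(size=(n_i, q)) for n_i in n_sizes]  # the Z-tilde_i matrices

X = np.vstack(X_blocks)    # shape (n, p): all clusters stacked vertically
Z = block_diag(*Z_blocks)  # shape (n, q*k): block diagonal in the Z-tilde_i

print(X.shape, Z.shape)    # (12, 2) (12, 3)
```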
The model is then:
\[\mu = g^{-1}(\mathbf{X} \alpha + \mathbf{Z} \beta) \label{eq:glmm-matrix}\]We’ll use $[\cdot] \rvert_{\beta = \beta_0}$ to denote evaluation of the function in brackets when setting $\beta$ equal to $\beta_0$. Similarly, we’ll use $[\cdot] \rvert_{H_0}$ to denote evaluation under the null hypothesis. We’ll also use a superscript $0$ (e.g. $\mu^0$, $\eta^0$, etc.) to denote the quantity under the null hypothesis (i.e. $\tau^2 = \mathbf{0} \implies \beta = \mathbf{0}$).
To keep things simple, we'll assume a cluster-specific fixed intercept and a random slope. Thus, $\alpha, \beta \in \mathbb{R}^{k}$, and $\mathbf{z}_{i, j} \in \mathbb{R}$ as well.
In this simple case, $\tilde{\mathbf{Z}}_i \in \mathbb{R}^{n_i}$, so $\mathbf{Z}$ is $n \times k$. Our model is then: $$ \mathbf{y}_{i,j} \rvert \beta_i \sim \text{Poi}\left(\mu_{i,j}\right), \hspace{5mm} \mu_{i,j} = \exp\left(\alpha_i + \beta_i \mathbf{z}_{i,j} \right) $$ We'll have a scalar-valued variance component that we call $\tau^2$. The random effects will have distribution: $$ \beta_i \overset{iid}{\sim} \mathcal{N}(0, \tau^2), \hspace{5mm} \forall i \in [k] $$ If we let $\mathbf{A}$ denote the $n$-dimensional vector of intercepts where the $i$-th entry of $\alpha$ is repeated $n_i$ times, then we can write our model in vector form as: $$ \mu = \exp(\mathbf{A} + \mathbf{Z} \beta) $$
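To ground this, here's a short simulation sketch of this exact model (the specific values of $k$, $n_i$, $\alpha$, and $\tau^2$ below are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 50                          # number of clusters
n_i = 10                        # observations per cluster (balanced, for simplicity)
tau2 = 0.5                      # variance component
alpha = rng.normal(size=k)      # cluster-specific fixed intercepts

# Random slopes: beta_i ~ N(0, tau2), iid across clusters.
beta = rng.normal(scale=np.sqrt(tau2), size=k)

# Scalar random-effect covariate z_{i,j} and conditional mean mu_{i,j}.
z = rng.uniform(-1, 1, size=(k, n_i))
mu = np.exp(alpha[:, None] + beta[:, None] * z)

# Conditional on beta_i, responses are independent Poissons.
y = rng.poisson(mu)
print(y.shape)  # (50, 10)
```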
Likelihood
We can write the conditional log-likelihood using the exponential family form (see my generalized linear models post).
\[\ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \sum_{i = 1}^{k} \sum_{j = 1}^{n_i} \left[ \frac{\zeta_{i,j} \mathbf{y}_{i,j} - A(\zeta_{i,j})}{d(\phi, \omega_{i,j})} + \log(h(\mathbf{y}_{i,j}, \phi, \omega_{i,j}))\right], \hspace{10mm} \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \exp \left( \sum_{i = 1}^{k} \sum_{j = 1}^{n_i} \left[ \frac{\zeta_{i,j} \mathbf{y}_{i,j} - A(\zeta_{i,j})}{d(\phi, \omega_{i,j})} + \log(h(\mathbf{y}_{i,j}, \phi, \omega_{i,j}))\right] \right) \label{eq:condition-log-lik}\]where $\phi > 0$ is a dispersion/scale parameter; $\omega_{i,j}$ is a (prior) dispersion weight; $\zeta_{i,j}$ is a distribution parameter; and $h(\cdot)$ and $A(\cdot)$ are known functions. We assume that $d(\phi, \omega_{i,j}) = \phi \omega_{i,j}^{-1}$.
As we found in my generalized linear models post, the conditional variance of the responses is given by:
\[\text{Cov}(\mathbf{y} \rvert \beta) = \text{diag}(d(\phi, \omega)) \underbrace{\frac{\partial^2 A(\zeta)}{\partial \zeta \partial \zeta^\top}}_{=V(\mu)} = \text{diag}^{-1}\left( \frac{\omega}{\phi}\right) V(\mu)\]
Quasi-Likelihood
We don’t actually need to specify the true likelihood, even though (in practice) we will usually be able to do so. Using quasi-likelihood methods can often be less computationally expensive than maximum likelihood for count data.
In the quasi-likelihood scheme, we keep the conditional mean assumption that we discussed above and also assume that the conditional variance can be expressed as a function of the mean in the form $\text{Cov}(\mathbf{y} \rvert \beta) = \text{diag}\left(d(\phi, \omega)\right)V(\mu)$ for some function $V(\cdot)$. Later, we’ll denote the diagonal elements of $\text{Cov}(\mathbf{y} \rvert \beta)$ with $v(\mu_{i,j})$.
The conditional log quasi-likelihood and quasi-likelihood are given by:
\[\ell_q(\mathbf{y}_{i,j}; \alpha, \tau^2 \rvert \beta_i) = \int_{\mathbf{y}_{i,j}}^{\mu_{i,j}} \frac{\omega_{i,j}(\mathbf{y}_{i,j} - u)}{\phi V(u)} du, \hspace{10mm} \mathcal{L}_q(\mathbf{y}_{i,j}; \alpha, \tau^2 \rvert \beta_i) = \exp\left(\int_{\mathbf{y}_{i,j}}^{\mu_{i,j}} \frac{\omega_{i,j}(\mathbf{y}_{i,j} - u)}{\phi V(u)} du \right) \label{eq:quasi-lik}\]When the responses follow an exponential family distribution, the log-likelihood and log quasi-likelihood are equal. See my post on quasi-likelihood for a more in-depth discussion.
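As a quick check, take the Poisson case: $V(u) = u$ with $\phi = \omega_{i,j} = 1$. The integral then recovers the Poisson log-likelihood up to terms that don't involve $\mu_{i,j}$:

$$ \int_{\mathbf{y}_{i,j}}^{\mu_{i,j}} \frac{\mathbf{y}_{i,j} - u}{u} du = \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \left( \mathbf{y}_{i,j} \log(\mathbf{y}_{i,j}) - \mathbf{y}_{i,j} \right) $$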
Let $f$ denote the density associated with $\mathcal{F}$, the distribution of the random effects. The unconditional quasi-likelihood is then given by integrating out the random effects from the joint quasi-likelihood:
\[\begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \prod_{i = 1}^k \mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2) = \prod_{i = 1}^k \int\prod_{j = 1}^{n_i} \mathcal{L}_q(\mathbf{y}_{i,j}; \alpha, \tau^2 \rvert \beta_i) f(\beta_i) d\beta_i \end{aligned} \label{eq:uncondition-log-lik}\]Remember that the above integral is multi-dimensional if $\beta_i$ has dimension greater than $1$!
We usually do maximum likelihood estimation for the parameters $(\alpha, \tau^2)$ by taking the derivative of the log of the above, setting it equal to zero, and solving. However, depending upon the form of $f(\cdot)$ and the number of random effects we have, Eq. \eqref{eq:uncondition-log-lik} may be a multi-dimensional integral that is difficult to evaluate, let alone differentiate. One way around this is to use quadrature to approximate the integral as a summation. However, there are two other, computationally more feasible, methods that provide a work-around.
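For our Poisson example, the cluster-level integral is one-dimensional, so Gauss–Hermite quadrature works well. Here's a sketch (function and variable names are my own, and `n_nodes=30` is an arbitrary choice):

```python
import numpy as np
from scipy.stats import poisson

def cluster_loglik(y_i, z_i, a_i, tau2, n_nodes=30):
    """Gauss-Hermite approximation to the log of one cluster's marginal likelihood:
    the integral over beta_i ~ N(0, tau2) of prod_j Poi(y_ij; exp(a_i + beta_i z_ij))."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    beta_nodes = np.sqrt(2.0 * tau2) * x                  # change of variables
    eta = a_i + np.outer(beta_nodes, z_i)                 # (n_nodes, n_i)
    cond_ll = poisson.logpmf(y_i, np.exp(eta)).sum(axis=1)
    # Stable log of sum_m (w_m / sqrt(pi)) * exp(cond_ll_m).
    m = cond_ll.max()
    return m + np.log(np.sum(w * np.exp(cond_ll - m)) / np.sqrt(np.pi))

# With the simulated data from before, the full log-likelihood is the sum over clusters:
# sum(cluster_loglik(y[i], z[i], alpha[i], tau2) for i in range(k))
```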
There are two common methods of approximate inference for GLMMs: penalized quasi-likelihood (PQL) and marginal quasi-likelihood (MQL). They are very similar in that they both perform a Taylor approximation of the conditional log quasi-likelihood to evaluate the integral in Eq. \eqref{eq:uncondition-log-lik}. However, PQL uses the maximum a posteriori (posterior mode) estimates of the random effects as the operating point for the expansion, while MQL uses the random effects mean. See below for a discussion of the approaches.
Penalized Quasi-Likelihood.
Penalized quasi-likelihood (PQL) essentially uses Laplace's method to approximate the integral above. The general idea behind Laplace's method is to approximate integrals of a certain form as: $$ \int \exp(M f(x)) dx \approx \exp(M f(x_0)) \int \exp\left( - \frac{1}{2} M \rvert f''(x_0) \rvert (x - x_0)^2 \right) dx $$ where $M$ is a large scalar, $f$ is a twice-differentiable function, and $x_0$ is a global maximum of $f$. If $f''(x_0) < 0$ and $M \rightarrow \infty$, then the integrand above is basically a Gaussian kernel and the approximation becomes: $$ \int \exp(M f(x)) dx \approx \left(\frac{2 \pi}{M \rvert f''(x_0) \rvert} \right)^{\frac{1}{2}} \exp(M f(x_0)) \label{eq:ql-gauss} $$ Letting $c = (2 \pi)^{-\frac{qk}{2}}$ and $d_{i,j}(y_{i,j}, \mu_{i,j}) = -2 \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{a_{i,j} V(z)} dz$ (where $a_{i,j} = \omega_{i,j}^{-1}$), we can rewrite the integral as: $$ \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) = c \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left( -\kappa(\beta) \right) d \beta; \hspace{5mm} \kappa(\beta) = \frac{1}{2\phi}\sum_{i = 1}^{k} \sum_{j = 1}^{n_i} d_{i,j}(y_{i,j}, \mu_{i,j}) + \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta $$
Proof.
$$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \int \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) f(\beta) d \beta \\ &= \int \left[ \prod_{i = 1}^{k} \prod_{j = 1}^{n_i} \exp\left( \ell_q(y_{i,j}; \mu_{i,j} \rvert \beta) \right) \right] (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \exp\left(- \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta \\ &= (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left(\sum_{i = 1}^{k} \sum_{j = 1}^{n_i} \left( \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{\phi a_{i,j} V(z)} dz \right) - \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta \\ &= (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left(- \frac{1}{2 \phi} \sum_{i = 1}^{k} \sum_{j = 1}^{n_i} \left( -2 \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{a_{i,j} V(z)} dz \right) - \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta \\ &= (2 \pi)^{-\frac{qk}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left(- \frac{1}{2 \phi} \sum_{i = 1}^{k} \sum_{j = 1}^{n_i} d_{i,j}(y_{i,j}, \mu_{i,j}) - \frac{1}{2}\beta^\top D^{-1}(\tau^2) \beta \right) d\beta & \left( d_{i,j}(y_{i,j}, \mu_{i,j}) = -2 \int_{y_{i,j}}^{\mu_{i,j}} \frac{y_{i,j} - z}{a_{i,j} V(z)} dz \right) \end{aligned} \nonumber $$
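As a quick numerical sanity check on Laplace's method itself (a sketch with an arbitrarily chosen $f$, not part of the derivation), we can compare Eq. \eqref{eq:ql-gauss} against direct quadrature as $M$ grows:

```python
import numpy as np
from scipy.integrate import quad

# f has a global maximum at x0 = 0 with f(0) = 0 and f''(0) = -1.
f = lambda x: -np.log(np.cosh(x))
x0, fpp = 0.0, -1.0

for M in [1, 5, 25, 125]:
    exact, _ = quad(lambda x: np.exp(M * f(x)), -np.inf, np.inf)
    laplace = np.sqrt(2 * np.pi / (M * abs(fpp))) * np.exp(M * f(x0))
    print(M, exact, laplace)  # the two columns converge as M grows
```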
Proof.
$$ \begin{aligned} \kappa'_k(\beta) &= \frac{\partial}{\partial \beta_k} \left[\frac{1}{2 \phi} \sum_{i = 1}^k \sum_{j = 1}^{n_i} d_{i,j}(y_{i,j}, \mu_{i,j}) + \frac{1}{2} \beta^\top D^{-1}(\tau^2) \beta \right] \\ &= \frac{1}{2 \phi} \sum_{i = 1}^k \sum_{j = 1}^{n_i} \frac{\partial}{\partial \beta_k} \left[d_{i,j}(y_{i,j}, \mu_{i,j}) \right] + D^{-1}(\tau^2) \beta_k \\ &= \frac{1}{2 \phi} \sum_{j = 1}^{n_k} \frac{\partial \mu_{k,j}}{\partial \beta_k} \frac{\partial}{\partial \mu_{k,j}}\left[ -2 \int_{y_{k,j}}^{\mu_{k,j}} \frac{y_{k,j} - z}{a_{k,j} V(z)}dz \right] + D^{-1}(\tau^2) \beta_k \\ &= \frac{1}{2 \phi} \sum_{j = 1}^{n_k} \frac{\partial \mu_{k,j}}{\partial \beta_k} \left(-2 \frac{y_{k,j} - \mu_{k,j}}{a_{k,j} V(\mu_{k,j})}\right) + D^{-1}(\tau^2) \beta_k \\ &= -\sum_{j = 1}^{n_k} \frac{\partial \mu_{k,j}}{\partial \beta_k} \left(\frac{y_{k,j} - \mu_{k,j}}{\phi a_{k,j} V(\mu_{k,j})}\right) + D^{-1}(\tau^2) \beta_k \\ \end{aligned} $$ Assuming $g(\cdot)$ is strictly monotone and continuous, then we have: $$ \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}} = \frac{\partial \mu_{i,j}}{\partial \eta_{i,j}} = \left(\frac{\partial \eta_{i,j}}{\partial \mu_{i,j}}\right)^{-1} = \left(\frac{\partial g(\mu_{i,j})}{\partial \mu_{i,j}}\right)^{-1} $$ This implies: $$ \frac{\partial \mu_{k,j}}{\partial \beta_k} = \frac{\partial \eta_{k,j}}{\partial \beta_k} \frac{\partial \mu_{k, j}}{\partial \eta_{k,j}} = \mathbf{z}_{k,j} \left(\frac{\partial \eta_{k,j}}{\partial \mu_{k,j}}\right)^{-1} = \frac{\mathbf{z}_{k,j}}{\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} $$ Thus, $\kappa'_k(\beta)$ is: $$ \kappa'_k(\beta) = -\sum_{j = 1}^{n_k} \left(\frac{y_{k,j} - \mu_{k,j}}{\phi a_{k,j} V(\mu_{k,j})\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}}\right)\mathbf{z}_{k,j} + D^{-1}(\tau^2) \beta_k $$ Looking at the second-order partial derivatives: $$ \begin{aligned} \kappa''_{k,k}(\beta) &= \frac{\partial}{\partial \beta_k^\top} \left[ -\sum_{j = 1}^{n_k} \left(\frac{y_{k,j} - \mu_{k,j}}{\phi a_{k,j} V(\mu_{k,j})\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}}\right)\mathbf{z}_{k,j} + D^{-1}(\tau^2) \beta_k \right] \\ &= - \sum_{j = 1}^{n_k} \left( \frac{\mathbf{z}_{k,j}}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \frac{\partial}{\partial \beta_k^\top} \left[ y_{k,j} - \mu_{k,j} \right] + \mathbf{z}_{k,j}(y_{k,j} - \mu_{k,j})\frac{\partial}{\partial \beta_k^\top} \left[ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \right] \right) + D^{-1}(\tau^2) \\ &= - \sum_{j = 1}^{n_k} \left( \frac{-\mathbf{z}_{k,j}\mathbf{z}_{k, j}^\top}{\phi a_{k,j} V(\mu_{k,j}) \left(\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}\right)^2} + \mathbf{z}_{k,j}(y_{k,j} - \mu_{k,j})\frac{\partial}{\partial \beta_k^\top} \left[ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \right] \right) + D^{-1}(\tau^2) \\ &= \sum_{j = 1}^{n_k} \frac{\mathbf{z}_{k,j}\mathbf{z}_{k, j}^\top}{\phi a_{k,j} V(\mu_{k,j}) \left(\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}\right)^2} + D^{-1}(\tau^2) + R_k \\ &=(\tilde{\mathbf{Z}}^k)^\top \mathbf{W}_k \tilde{\mathbf{Z}}^k + D^{-1}(\tau^2) + R_k \end{aligned} $$ where, in the second-to-last line, we set: $$ R_k = -\sum_{j = 1}^{n_k} (y_{k,j} - \mu_{k,j}) \mathbf{z}_{k,j} \frac{\partial}{\partial \beta_k^\top} \left[ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}} \right] $$ and, in the last line, $\mathbf{W}_k$ is the diagonal matrix with elements: $$ \frac{1}{\phi a_{k,j} V(\mu_{k,j}) \left(\frac{\partial g(\mu_{k,j})}{\partial \mu_{k,j}}\right)^2} $$ Since $\kappa'_k(\beta)$ only involves $\beta_k$, $\kappa''_{k, k'}(\beta) = \mathbf{0}$ for $k \neq k'$. Stacking all of the $\mathbf{W}_k$ and $R_k$ together, we get: $$ \kappa''(\beta) = \mathbf{Z}^\top \mathbf{W} \mathbf{Z} + D^{-1}(\tau^2) + R \approx \mathbf{Z}^\top \mathbf{W} \mathbf{Z} + D^{-1}(\tau^2) $$ where we drop $R$ for the approximation.
Marginal Quasi-Likelihood.
An alternative way to think about a generalized linear mixed model is from a marginal perspective, where we instead focus on the unconditional expectation of the response: $$ \mathbb{E}[y_{i,j}] = \mu_{i,j} = g^{-1}(\mathbf{x}_{i,j}^\top \alpha) $$ In contrast to PQL, marginal quasi-likelihood (MQL) performs a Taylor approximation of the log quasi-likelihood about the random effects (prior) mean $\beta = \mathbf{0}$ to get a closed form solution to the multi-dimensional integral. For the $i$-th cluster, this approximation is: $$ \begin{aligned} \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \approx \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)\rvert_{\beta_i = \mathbf{0}} + \beta_i^\top \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} + \frac{1}{2} \beta_i^\top \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \beta_i \end{aligned} $$ Thus: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left( \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) - \frac{1}{2}\beta_i^\top D^{-1}(\tau^2) \beta_i \right) d \beta_i \\ &\approx \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \int \exp\left( [\ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)]_{\beta_i = \mathbf{0}} + \beta_i^\top \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} + \frac{1}{2} \beta_i^\top \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \beta_i - \frac{1}{2}\beta_i^\top D^{-1}(\tau^2) \beta_i \right) d \beta_i \\ &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} \int \exp\left( \beta_i^\top \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} - \frac{1}{2} \beta_i^\top \left[ D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \right] \beta_i\right) d \beta_i \end{aligned} \label{eq:big-integral} $$ We can rewrite Eq. \eqref{eq:big-integral} in canonical form by defining: $$ \mathbf{K}_i = D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}; \hspace{5mm} \mathbf{h}_i = \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}} $$ And: $$ \bar{\Sigma}_i^{-1} = \mathbf{K}_i; \hspace{5mm} \bar{\mu}_i = \bar{\Sigma}_i \mathbf{h}_i; \hspace{5mm} g_i = - \frac{1}{2} \bar{\mu}_i^\top \bar{\Sigma}_i^{-1} \bar{\mu}_i - \log\left((2 \pi)^{\frac{q}{2}} \rvert \bar{\Sigma}_i \rvert^{\frac{1}{2}} \right) $$ Then: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &\approx \prod_{i = 1}^k \left\rvert \mathbb{I}_{q \times q} - D(\tau^2) \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right\rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \left[\mathbf{h}_i^\top \mathbf{K}_i^{-1} \mathbf{h}_i \right] \right) \\ \implies \ell_q(\mathbf{y}; \alpha, \tau^2) &\approx \sum_{i = 1}^k \left[ -\frac{1}{2} \log\left( \left\rvert \mathbb{I}_{q \times q} - D(\tau^2) \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right\rvert \right) + \left[ \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} + \frac{1}{2} \mathbf{h}_i^\top \mathbf{K}_i^{-1} \mathbf{h}_i \right] \end{aligned} \label{eq:marg-qll} $$
Proof.
$$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &\approx \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} \underbrace{\int \exp\left( \beta_i^\top \mathbf{h}_i - \frac{1}{2} \beta_i^\top \mathbf{K}_i \beta_i\right) d \beta_i}_{= \exp(-g_i)} \\ &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right] \bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2}\bar{\mu}_i^\top \bar{\Sigma}_i^{-1}\bar{\mu}_i + \log\left((2 \pi)^{\frac{q}{2}} \rvert \bar{\Sigma}_i \rvert^{\frac{1}{2}}\right) \right) \\ &= \prod_{i = 1}^k (2 \pi)^{-\frac{q}{2}} \rvert D(\tau^2) \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2}\bar{\mu}_i^\top \bar{\Sigma}_i^{-1}\bar{\mu}_i\right) (2 \pi)^{\frac{q}{2}} \rvert \bar{\Sigma}_i \rvert^{\frac{1}{2}} \\ &= \prod_{i = 1}^k \rvert D(\tau^2)\rvert^{-\frac{1}{2}} \rvert \bar{\Sigma}_i^{-1} \rvert^{-\frac{1}{2}}\left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2}\bar{\mu}_i^\top \bar{\Sigma}_i^{-1}\bar{\mu}_i\right) \\ &= \prod_{i = 1}^k \rvert D(\tau^2) \mathbf{K}_i \rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \left[\mathbf{h}_i^\top (\mathbf{K}_i^{-1})^\top \mathbf{K}_i \mathbf{K}_i^{-1} \mathbf{h}_i \right] \right) \\ &= \prod_{i = 1}^k \left\rvert D(\tau^2) \left[ D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right] \right\rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \left[\mathbf{h}_i^\top \mathbf{K}_i^{-1} \mathbf{h}_i \right] \right) \\ &= \prod_{i = 1}^k \left\rvert \mathbb{I}_{q \times q} - D(\tau^2) \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \right\rvert^{-\frac{1}{2}} \left[\mathcal{L}_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i) \right]\bigg\rvert_{\beta_i = \mathbf{0}} \exp\left(\frac{1}{2} \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i^\top } \right] \bigg\rvert_{\beta_i = \mathbf{0}} \left[ D^{-1}(\tau^2) - \left[ \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i \partial \beta_i^\top} \right] \bigg\rvert_{\beta_i = \mathbf{0}} \right]^{-1} \left[ \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta_i)}{\partial \beta_i} \right] \bigg\rvert_{\beta_i = \mathbf{0}}\right) \end{aligned} $$
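For our Poisson example, every piece of Eq. \eqref{eq:marg-qll} is available in closed form, since $\left[\partial \ell_q / \partial \beta_i\right]\rvert_{\beta_i = \mathbf{0}} = \sum_j \mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \mu^0_{i,j})$ and $\left[\partial^2 \ell_q / \partial \beta_i^2\right]\rvert_{\beta_i = \mathbf{0}} = -\sum_j \mathbf{z}_{i,j}^2 \mu^0_{i,j}$. A minimal sketch (my own function name, assuming the simulated data from earlier):

```python
import numpy as np
from scipy.stats import poisson

def mql_loglik(y, z, alpha, tau2):
    """MQL-approximate marginal log quasi-likelihood for the Poisson
    random-slope example (beta_i is scalar, so K_i and h_i are scalars)."""
    total = 0.0
    for y_i, z_i, a_i in zip(y, z, alpha):
        mu0 = np.exp(np.full_like(z_i, a_i))   # conditional mean at beta_i = 0
        ll0 = poisson.logpmf(y_i, mu0).sum()   # l_q(y_i | beta_i = 0)
        h = np.sum(z_i * (y_i - mu0))          # first derivative at beta_i = 0
        l2 = -np.sum(z_i**2 * mu0)             # second derivative at beta_i = 0
        K = 1.0 / tau2 - l2                    # K_i > 0 since l2 < 0
        total += -0.5 * np.log(1.0 - tau2 * l2) + ll0 + 0.5 * h**2 / K
    return total
```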
An Aside
Penalized quasi-likelihood and marginal quasi-likelihood are quite similar, but they do have notable differences. For one, MQL will not result in predictions for the random effects. Thus, it is not suitable for inference at the group level. In addition, the estimates of the fixed effects are essentially for the marginal model. Fitzmaurice et al.1 note that MQL estimates are highly biased unless the random effects variance is near zero, regardless of the number of repeated measurements.
In contrast, though PQL can be used to estimate the fixed effects and predict the random effects, there are some complications. For small counts or low numbers of repeated measurements, PQL underestimates the fixed effects as well as the variance components. PQL and MQL can also be reframed as approximations of a GLMM with an adjusted LMM (see below).
Pseudo-Likelihood.
We can reframe penalized and marginal quasi-likelihood as arising from a Gaussian approximation of the GLMM at hand. We'll follow Chapter 15 in Fitzmaurice et al.1 We assume we have the following generalized linear mixed model: $$ \mathbf{y}_{i,j} \rvert \beta_i \sim \mathcal{F}(\mu_{i,j}); \hspace{10mm} \mu_{i,j} = g^{-1}\left(\eta_{i,j}\right) = g^{-1}\left(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}\right) \label{eq:glmm-y} $$ We first approximate the response by assuming it can be represented as its conditional mean plus some Gaussian error: $$ \mathbf{y}_{i,j} \approx \mu_{i,j} + \epsilon_{i,j} $$ where, for some variance function $V(\cdot)$ of the conditional mean (e.g. $V(\mu_{i,j}) = \mu_{i,j}$ in the Poisson case), the errors satisfy: $$ \mathbb{E}\left[ \epsilon_{i,j} \right] = 0; \hspace{5mm} \text{Var}(\epsilon_{i,j} ) = V(\mu_{i,j}); \hspace{5mm} \text{Cov}(\epsilon_{i,j}, \epsilon_{i', j'} ) = 0 $$ Since the link function is non-linear, we will linearize the mean with a first-order Taylor approximation about $\hat{\eta}_{i,j}$, the linear predictor estimated under $H_0$: $$ \mathbf{y}_{i,j} \approx g^{-1}(\hat{\eta}_{i,j}) + \delta(\hat{\eta}_{i,j}) (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} $$ where $\delta(\hat{\eta}_{i,j}) = \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}}$, the derivative of the inverse link function w.r.t. $\eta_{i,j}$ evaluated at the estimate under $H_0$.
Proof.
$$ \begin{aligned} \mathbf{y}_{i,j} &\approx \left[g^{-1}(\eta_{i,j})\right]\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}} + \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}} (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} \\ &= g^{-1}(\hat{\eta}_{i,j}) + \delta(\hat{\eta}_{i,j}) (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} & \left(\delta(\hat{\eta}_{i,j}) = \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \hat{\eta}_{i,j}}\right) \end{aligned} \nonumber $$
Proof.
$$ \begin{aligned} \implies &\mathbf{y}_{i,j} \approx g^{-1}(\hat{\eta}_{i,j}) + \delta(\hat{\eta}_{i,j}) (\eta_{i,j} - \hat{\eta}_{i,j}) + \epsilon_{i,j} \\ \implies &\mathbf{y}_{i,j} \approx \hat{\mu}_{i,j} + \delta(\hat{\eta}_{i,j})(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}) - \delta(\hat{\eta}_{i,j})(\hat{\alpha}^\top \mathbf{x}_{i,j} + \mathbf{0}^\top \mathbf{z}_{i,j}) + \epsilon_{i,j} \\ \implies &\mathbf{y}_{i,j} + \delta(\hat{\eta}_{i,j})(\hat{\alpha}^\top \mathbf{x}_{i,j}) \approx \hat{\mu}_{i,j} + \delta(\hat{\eta}_{i,j})(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}) + \epsilon_{i,j} \\ \implies &\mathbf{y}_{i,j} - \hat{\mu}_{i,j} + \delta(\hat{\eta}_{i,j})(\hat{\alpha}^\top \mathbf{x}_{i,j}) \approx \delta(\hat{\eta}_{i,j})(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j}) + \epsilon_{i,j} \\ \implies &\frac{1}{\delta(\hat{\eta}_{i,j})}\left(\mathbf{y}_{i,j} - \hat{\mu}_{i,j}\right) + \hat{\alpha}^\top \mathbf{x}_{i,j} \approx \alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j} + \frac{1}{\delta(\hat{\eta}_{i,j})}\epsilon_{i,j} \\ \implies &\frac{1}{\delta(\hat{\eta}_{i,j})}\left(\mathbf{y}_{i,j} - \hat{\mu}_{i,j}\right) + \hat{\eta}_{i,j} \approx \alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j} + \frac{1}{\delta(\hat{\eta}_{i,j})}\epsilon_{i,j} \end{aligned} \nonumber $$
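The left-hand side of the final line is the so-called working response: a linearized pseudo-observation to which we can fit a linear mixed model, re-update $\hat{\eta}_{i,j}$, and iterate. For the Poisson/log-link example, where $\delta(\hat{\eta}_{i,j}) = \exp(\hat{\eta}_{i,j}) = \hat{\mu}_{i,j}$, a sketch of the transformation looks like this (my own function name):

```python
import numpy as np

def working_response(y, eta_hat):
    """Working response y* = (y - mu_hat)/delta(eta_hat) + eta_hat for the
    Poisson/log-link case, where delta(eta) = d exp(eta)/d eta = exp(eta)."""
    mu_hat = np.exp(eta_hat)
    y_star = (y - mu_hat) / mu_hat + eta_hat
    # Working error variance: Var(eps/delta) = V(mu_hat)/mu_hat^2 = 1/mu_hat here.
    var_star = 1.0 / mu_hat
    return y_star, var_star
```

Fitting a weighted LMM to $y^*$ and iterating gives the pseudo-likelihood algorithm; whether the random effect predictions enter $\hat{\eta}_{i,j}$ is what distinguishes the PQL-style update from the MQL-style one.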
Alternative Approximation
Unfortunately, the approximate inference methods introduced above require a particular distribution for the random effects. If we want to relax this, as Lin2 does, we have to do something slightly more involved.
Taylor Approximation
First, let’s rewrite the conditional quasi-likelihood of $(\alpha, \tau^2)$ given $\beta$ using a Taylor expansion. We’ll use a second-order expansion since the remainder will be relatively simple due to the assumptions made on the higher order moments of the random effects.
Denote the first- and second-order partial derivatives of $\mathcal{L}_{q}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta))$ with:
\[\begin{aligned} \frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta} &= \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\\ \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta \partial \beta^\top} &= \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\mathbf{Z}^\top \left(\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \right) \mathbf{Z} \end{aligned} \label{eq:likelihood-derivs}\]
First Order Derivations.
$$ \begin{aligned} \frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta} &= \frac{\partial}{\partial \beta} \left[ \exp\left(\ell_{q}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \right] \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \frac{\partial}{\partial \beta} \left[ \sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \left(\frac{\partial \eta}{\partial \beta}\right)^\top \frac{\partial}{\partial \eta} \left[ \sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] & \left( \eta \in \mathbb{R}^{n \times 1}, \beta \in \mathbb{R}^{qk \times 1} \implies \frac{\partial \eta}{\partial \beta} \in \mathbb{R}^{n \times qk} \right) \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right)\mathbf{Z}^\top \frac{\partial}{\partial \eta} \left[ \sum_{i = 1}^n\ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] & \left(\eta = \mathbf{X} \alpha + \mathbf{Z} \beta \right) \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top & \left( \frac{\partial \ell_q}{\partial \eta} \in \mathbb{R}^{n \times 1}, \mathbf{Z}^\top \in \mathbb{R}^{qk \times n} \right) \end{aligned} \nonumber $$ In matrix notation: $$ \frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta} = \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} = \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \nonumber $$
Second Order Derivations.
$$ \begin{aligned} \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta \partial \beta^\top} &= \frac{\partial}{\partial \beta} \left[\exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right] \\ &= \underbrace{\frac{\partial}{\partial \beta}[\exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta))] \left( \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right)^\top}_{(a)} + \underbrace{\exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \frac{\partial}{\partial \beta} \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right]}_{(b)} & \left( \text{product rule}\right) \\ &= \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \left[\left( \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right) \left( \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot}) \right) + \sum_{i = 1}^n \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i^2} (\mathbf{Z}_{i, \cdot})^\top \mathbf{Z}_{i, \cdot} \right] \end{aligned} \nonumber $$ We show $(a)$ and $(b)$ separately: $$ \begin{aligned} (a) &=\frac{\partial}{\partial \beta} \left[ \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \right] \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right]^\top \\ &= \exp\left(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right)\left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right)^\top\\ &= \exp\left(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top\right)\left(\sum_{i = 1}^n\frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} \mathbf{Z}_{i, \cdot} \right) \\ (b) &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right)\frac{\partial}{\partial \beta} \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right] \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \frac{\partial}{\partial \eta} \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right]\frac{\partial \eta}{\partial \beta} \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right) \frac{\partial}{\partial \eta} \left[ \sum_{i = 1}^n \frac{\partial \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i} (\mathbf{Z}_{i, \cdot})^\top \right] \mathbf{Z} & \left(\eta = \mathbf{X} \alpha + \mathbf{Z} \beta \right) \\ &= \exp\left( \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right)\left( \sum_{i = 1}^n \frac{\partial^2 \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta)}{\partial \eta_i^2} (\mathbf{Z}_{i, \cdot})^\top \mathbf{Z}_{i, \cdot} \right) & \left( \frac{\partial \ell_q}{\partial \eta} \in \mathbb{R}^{n \times 1}, \mathbf{Z}^\top \in \mathbb{R}^{qk \times n}, (\mathbf{Z}_{i, \cdot})^\top \in \mathbb{R}^{qk \times 1} \right) \\ \end{aligned} \nonumber $$ In matrix notation: $$ \begin{aligned} \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)}{\partial \beta \partial \beta^\top} &= \exp(\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)) \left( \mathbf{Z}^\top \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} \mathbf{Z} + \mathbf{Z}^\top \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \mathbf{Z} \right)\\ &= \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \left( \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top}+ \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right) \mathbf{Z} \end{aligned} \nonumber $$We can check the above derivations with our example. We have: $$ \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} = \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y} - \mu \right), \hspace{5mm} \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} = \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left[ (\mathbf{y} - \mu)(\mathbf{y} - \mu)^\top - \text{diag}(\mu)\right] \label{eq:derivs-eta} $$
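These formulas are easy to spot-check numerically before verifying them by hand. Here's a quick finite-difference comparison for the Poisson case (toy values; `eps` is an arbitrary step size):

```python
import numpy as np
from scipy.stats import poisson

def lik(eta, y):
    """Conditional Poisson likelihood as a function of the linear predictor eta."""
    return np.exp(poisson.logpmf(y, np.exp(eta)).sum())

rng = np.random.default_rng(2)
y = rng.poisson(2.0, size=4).astype(float)
eta = rng.normal(size=4)

analytic = lik(eta, y) * (y - np.exp(eta))  # L * (y - mu)

eps = 1e-6
numeric = np.array([
    (lik(eta + eps * np.eye(4)[i], y) - lik(eta - eps * np.eye(4)[i], y)) / (2 * eps)
    for i in range(4)
])
print(np.allclose(analytic, numeric, rtol=1e-4))  # True
```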
Proof.
We first find the first-order derivatives of the conditional likelihood with respect to the linear predictor, $\eta$: $$ \begin{aligned} \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t, r}} &= \frac{\partial}{\partial \eta_{t,r}} \left[ \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right)\right] \\ &= \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right) \sum_{i = 1}^k \sum_{j = 1}^{n_i} \frac{\partial}{\partial \eta_{t,r}} \left[ \mathbf{y}_{i,j} \eta_{i,j} - \exp(\eta_{i,j}) - \log(\mathbf{y}_{i,j}!) \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \exp(\eta_{t,r})\right) \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \mu_{t,r}\right) \\ \implies \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y} - \mu \right) \end{aligned} \nonumber $$ Next, the second-order derivatives: $$ \begin{aligned} \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t, r}^2} &= \frac{\partial}{\partial \eta_{t,r}} \left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \exp(\eta_{t,r}) \right) \right] \\ &= \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t,r}}(\mathbf{y}_{t,r} - \mu_{t,r}) + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (-\exp(\eta_{t,r})) \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \mu_{t,r}\right)^2 - \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mu_{t,r} \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left( \left(\mathbf{y}_{t,r} - \mu_{t,r}\right)^2 - \mu_{t,r} \right) \\ \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta_{t,r} \partial \eta_{s, q}} &= \frac{\partial}{\partial \eta_{s, q}} \left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{t,r} - \exp(\eta_{t,r}) \right) \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left(\mathbf{y}_{s,q} - \mu_{s,q}\right)\left(\mathbf{y}_{t,r} - \mu_{t,r} \right) \\ \implies \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left[ (\mathbf{y} - \mu)(\mathbf{y} - \mu)^\top - \text{diag}(\mu)\right] \end{aligned} \nonumber $$
Proof.
Now we take the derivatives with respect to $\beta$: $$ \begin{aligned} \frac{\partial \mathcal{L}(\mathbf{y};\alpha, \tau^2 \rvert \beta)}{\partial \beta_t} &= \frac{\partial}{\partial \beta_t} \left[ \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right) \right] \\ &= \exp\left(\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right]\right) \sum_{i = 1}^k \sum_{j = 1}^{n_i} \frac{\partial}{\partial \beta_t} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!)\right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \sum_{j = 1}^{n_t} \left[\mathbf{y}_{t,j}\frac{\partial}{\partial \beta_t} \left[\eta_{t,j} \right] - \frac{\partial}{\partial \beta_t} \left[ \exp(\eta_{t,j})\right] \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \sum_{j =1}^{n_t} \left[ \mathbf{y}_{t,j} \mathbf{z}_{t,j} - \exp(\eta_{t,j})\mathbf{z}_{t,j}\right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \sum_{j =1}^{n_t} \left[ \mathbf{z}_{t,j}( \mathbf{y}_{t,j} - \mu_{t,j}) \right] \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t) \\ \implies \frac{\partial \mathcal{L}(\mathbf{y};\alpha, \tau^2 \rvert \beta)}{\partial \beta} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top (\mathbf{y} - \mu) \end{aligned} \nonumber $$ where we let $\mathbf{y}^t$ and $\mu^t$ denote the response vector and mean vector of the $t$-th cluster, respectively. The second-order partial derivatives are: $$ \begin{aligned} \frac{\partial^2 \mathcal{L}(\mathbf{y};\alpha, \tau^2 \rvert \beta)}{\partial \beta_t^2} &= \frac{\partial}{\partial \beta_t}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t) \right] \\ &= \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta_t} (\tilde{\mathbf{Z}}^t)^\top(\mathbf{y}^t - \mu^t) + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\frac{\partial}{\partial \beta_t}\left[ (\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \exp(\eta^t))\right] \\ &= \mathcal{L}(\mathbf{y}; \alpha,\tau^2 \rvert \beta) \left((\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t)\right)^2 + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \frac{\partial}{\partial \beta_t}\left[ \sum_{j = 1}^{n_t} \mathbf{z}_{t,j} (\mathbf{y}_{t,j} - \exp(\eta_{t,j})) \right]\\ &= \mathcal{L}(\mathbf{y}; \alpha,\tau^2 \rvert \beta) \left((\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t)\right)^2 + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \left( \sum_{j = 1}^{n_t} \mathbf{z}_{t,j} (- \exp(\eta_{t,j})) \mathbf{z}_{t,j} \right)\\ &= \mathcal{L}(\mathbf{y}; \alpha,\tau^2 \rvert \beta) \left((\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t)\right)^2 + \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (\tilde{\mathbf{Z}}^t)^\top \text{diag}\left(-\mu^t \right)\tilde{\mathbf{Z}}^t \\ \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta_t \partial \beta_s} &= \frac{\partial}{\partial \beta_s}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t) \right] \\ &= \frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta_s} (\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t) \\ &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) (\tilde{\mathbf{Z}}^s)^\top (\mathbf{y}^s - \mu^s) (\tilde{\mathbf{Z}}^t)^\top (\mathbf{y}^t - \mu^t) \\ \implies \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta \partial \beta^\top} &= \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \mathbf{Z}^\top \left[ (\mathbf{y} - \mu) (\mathbf{y} - \mu)^\top - \text{diag}(\mu) \right] \mathbf{Z} \end{aligned} \nonumber $$Let \(\mathcal{L}'_q(\beta_0)\) and \(\mathcal{L}''_q(\beta_0)\) denote the partial derivatives evaluated at $\beta = \beta_0$. The Taylor expansion of $\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)$ at $\beta_0$ is:
\[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \left[ \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \beta_0} + (\beta - \beta_0)^\top \left[\frac{\partial \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta} \right]\bigg\rvert_{\beta = \beta_0} + \frac{1}{2}(\beta - \beta_0)^\top \left[ \frac{\partial^2 \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta \partial \beta^\top}\right] \bigg\rvert_{\beta = \beta_0} (\beta - \beta_0) + [R]\rvert_{\beta = \beta_0} \label{eq:taylor-expansion}\]where $R$ is a remainder term containing the higher order partial derivatives. Let $\bar{R}$ denote the remainder terms divided by \(\left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]\rvert_{\beta = \beta_0}\). Factoring out \(\left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\rvert_{\beta = \mathbf{0}}\) and letting $\beta_0 = \mathbf{0}$ (the random effects mean) we get:
\[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) = \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \beta^\top \mathbf{Z}^\top \left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert\beta)}{\partial \eta} \right]\bigg\rvert_{\beta = \mathbf{0}} + \frac{1}{2}\beta^\top \left(\mathbf{Z}^\top \left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert\beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} \right)\beta + [\bar{R}]\rvert_{\beta = \mathbf{0}} \right) \label{eq:taylor-exp-0}\]The Taylor expansion of the likelihood in our example is fairly straightforward given the derivatives with respect to $\beta$ that we just computed: $$ \begin{aligned} \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} + \beta^\top \left[\frac{\partial \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta} \right]_{\beta = \mathbf{0}} + \frac{1}{2} \beta^\top \left[ \frac{\partial^2 \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \beta \partial \beta^\top} \right]_{\beta = \mathbf{0}} \beta + \left[R\right]_{\beta = \mathbf{0}} \\ &\approx [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \beta^\top \left[ \mathbf{Z}^\top (\mathbf{y} - \mu^0) \right] + \frac{1}{2} \beta^\top \left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \beta \right) \end{aligned} \label{eq:ex-taylor} $$
Taking The Expectation
Notice that Eq. \eqref{eq:uncondition-log-lik} is just the expectation of $\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)$ ($\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)$, respectively) with respect to the probability measure specified by $\mathcal{F}$. We can therefore write the unconditional quasi-likelihood (or likelihood) with the expectation of our Taylor expansion:
\[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) = \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\text{tr} \left[\mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert \rvert) \right) \label{eq:taylor-expansion-expectation}\]Proof.
Let: $$ \bar{\mathcal{L}}''_q(\mathbf{0}) = \left(\mathbf{Z}^\top\left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \right]\bigg\rvert_{\beta = \mathbf{0}}\mathbf{Z} \right) \nonumber $$ Then: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \int \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) f(\beta) d\beta \\ &= \mathbb{E}_{\beta} \left[ \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] \\ &= \mathbb{E}_{\beta} \left[ \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \beta^\top \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\right]\bigg\rvert_{\beta = \mathbf{0}} + \frac{1}{2}\beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta + \bar{R} \right)\right] \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \mathbb{E}_{\beta} \left[ \beta^\top \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\right]\bigg\rvert_{\beta = \mathbf{0}}\right] + \frac{1}{2}\mathbb{E}_{\beta} \left[ \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\mathbb{E}_{\beta} \left[ \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left(\mathbb{E}_\beta \left[ \beta^\top \right] = \mathbf{0} \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\mathbb{E}_{\beta} \left[ \text{tr}\left[ \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \right] \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \beta^\top \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \in \mathbb{R} \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\mathbb{E}_{\beta} \left[ \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \beta^\top \right] \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \text{cyclic property of } \text{tr}\left[ \cdot \right] \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\text{tr}\left[ \mathbb{E}_{\beta} \left[ \bar{\mathcal{L}}''_q(\mathbf{0}) \beta \beta^\top \right] \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \text{tr}\left[ \cdot \right] \text{ is linear} \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2} \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) \text{Cov}(\beta) \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \mathbb{E}_\beta \left[ \beta \right] = \mathbf{0} \implies \mathbb{E}_\beta \left[ \beta \beta^\top \right] = \text{Cov}(\beta)\right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}}\left( 1 + \frac{1}{2}\text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) D(\tau^2) \right] + \mathbb{E}_{\beta} \left[ \bar{R} \right]\right) & \left( \text{Cov}(\beta) = D(\tau^2) \right) \\ &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2} \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert\rvert)\right) & \left( \text{higher order assumption} \right) \end{aligned} \nonumber $$ Plugging in the expression for $\bar{\mathcal{L}}''_q(\mathbf{0})$ from Eq. \eqref{eq:likelihood-derivs}: $$ \begin{aligned} \mathcal{L}_q(\mathbf{y}; \alpha, \tau^2) &= \left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2} \text{tr}\left[ \bar{\mathcal{L}}''_q(\mathbf{0}) D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert\rvert)\right) \\ &=\left[\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} \left( 1 + \frac{1}{2}\text{tr} \left[\mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert \rvert) \right) \end{aligned} \nonumber $$Taking the natural logarithm of both sides gives us the marginal log quasi-likelihood:
\[\begin{aligned} \ell_q(\mathbf{y}; \alpha, \tau^2) &= \log(\mathcal{L}_q(\mathbf{y}; \alpha, \tau^2)) \\ &= \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}} + \log\left(1 + \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] + o(\rvert \rvert \tau^2 \rvert \rvert)\right) \\ &\approx \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]\bigg\rvert_{\beta = \mathbf{0}}+ \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]\bigg\rvert_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] \end{aligned} \label{eq:log-quasi-lik}\]The approximation follows from the fact that $\log(1 + x) \approx x$ for small $x$, and we drop the remainder term since we assume it is small.
Taking the expectation of Eq. \eqref{eq:ex-taylor} gives us: $$ \mathbb{E}_{H_0}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] \approx [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{\tau^2}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \right) $$
Proof.
$$ \begin{aligned} \mathbb{E}_{H_0}\left[ \mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &\approx [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \mathbb{E}_{H_0}\left[1 + \beta^\top \left[ \mathbf{Z}^\top(\mathbf{y} - \mu^0) \right] + \frac{1}{2} \beta^\top \left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \beta \right] \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\mathbb{E}_{H_0}\left[ \text{tr}\left[ \beta^\top \left( \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right) \beta \right] \right] \right) & \left( \mathbb{E}_{H_0}\left[ \beta \right] = \mathbf{0} \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\text{tr}\left[ \mathbb{E}_{H_0}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \beta \beta^\top \right] \right] \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \mathbb{E}_{H_0}\left[ \beta \beta^\top \right] \right] \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{1}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} [\tau^2 \mathbb{I}_{k \times k}] \right] \right) \\ &= [\mathcal{L}(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} \left(1 + \frac{\tau^2}{2}\text{tr}\left[ \mathbf{Z}^\top \left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0) \right] \mathbf{Z} \right] \right) \end{aligned} \nonumber $$Of course, these derivations include a bunch of approximations chained together, which raises the question of how well they’ll work in practice. We’ll have to just wait and find out I guess…
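One cheap way to probe that question for our Poisson example is to compare the Taylor-approximate marginal log-likelihood against the quadrature version from earlier. A sketch (my own function name; note that the $\log$ of the correction factor can fail if the trace term is very negative and $\tau^2$ is not small):

```python
import numpy as np
from scipy.stats import poisson

def taylor_marginal_loglik(y, z, alpha, tau2):
    """Log of the Taylor-approximate marginal likelihood for the Poisson example."""
    total = 0.0
    for y_i, z_i, a_i in zip(y, z, alpha):
        mu0 = np.exp(np.full_like(z_i, a_i))   # conditional mean under beta_i = 0
        ll0 = poisson.logpmf(y_i, mu0).sum()
        score = np.sum(z_i * (y_i - mu0))      # cluster score: z_i' (y_i - mu0)
        trace_term = score**2 - np.sum(z_i**2 * mu0)
        total += ll0 + np.log1p(0.5 * tau2 * trace_term)
    return total

# For small tau2 this should track the quadrature answer:
# taylor_marginal_loglik(y, z, alpha, 0.05) vs
# sum(cluster_loglik(y[i], z[i], alpha[i], 0.05) for i in range(k))
```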
Inference
Okay, so now we have an approximation of the likelihood or quasi-likelihood, which we can use for inference. We’ll continue with the Taylor approximation. We should note that we further assume that:
\[\frac{\partial g^{-1}(\eta_i)}{\partial \eta_i} = \frac{\partial \mu_i}{\partial \eta_i} = \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \label{eq:deriv-assumption}\]which holds if you assume that $g(\cdot)$ is differentiable and strictly monotone with a derivative that is never $0$. This makes $g(\cdot)$ invertible and lets us apply the inverse function theorem. These conditions hold for many standard choices of link function (e.g. $\log$, $\text{logit}$, the identity).
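As a quick numerical sanity check of Eq. \eqref{eq:deriv-assumption}, here's a finite-difference sketch using the logit link (the link and test point are my own arbitrary choices):

```python
import numpy as np

# Finite-difference check of d(mu)/d(eta) = (d(eta)/d(mu))^{-1} for the
# logit link, g(mu) = log(mu / (1 - mu)); the test point is arbitrary.
mu, eps = 0.37, 1e-6
g = lambda m: np.log(m / (1 - m))          # link function
ginv = lambda e: 1 / (1 + np.exp(-e))      # inverse link
eta = g(mu)

deta_dmu = (g(mu + eps) - g(mu)) / eps
dmu_deta = (ginv(eta + eps) - ginv(eta)) / eps
print(dmu_deta, 1 / deta_dmu)              # both approximately mu * (1 - mu)
```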
Since the score function involves a derivative, there is a separate one for each parameter. Because we assume that the random effects are Gaussian with mean zero and covariance matrix parametrized by $\tau^2$, our main parameters are just $\alpha$, $\tau^2$, and $\phi$ (the $\omega_i$ are known prior weights).
Before we derive the scores, let’s first define the following terms:
\[\delta_i = \frac{1}{\frac{\partial \eta_i}{\partial \mu_i}}, \hspace{5mm} \rho_i = \frac{\frac{\partial v(\mu_i)}{\partial \mu_i} \frac{\partial \eta_i}{\partial \mu_i} + v(\mu_i) \frac{\partial^2 \eta_i}{\partial \mu_i^2}}{v^2(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3}, \hspace{5mm} \gamma_i = \frac{1}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2}, \hspace{5mm} \gamma_{0, i} = \gamma_i + \rho_i(\mathbf{y}_i - \mu_i) \label{eq:intermed-terms}\]where we recall that we let $v(\mu_i) = d(\phi, \omega_i) V(\mu_i) = \frac{\phi}{\omega_i}V(\mu_i)$. We’ll also define the following matrices:
\[\Delta = \begin{bmatrix} \delta_1 & \dots & 0 \\ \vdots & \ddots &\vdots \\ 0 & \dots & \delta_n \end{bmatrix}, \hspace{5mm} \Gamma = \begin{bmatrix} \gamma_1 & \dots & 0 \\ \vdots & \ddots &\vdots \\ 0 & \dots & \gamma_n \end{bmatrix}, \hspace{5mm} \Gamma_0 = \begin{bmatrix} \gamma_{0, 1} & \dots & 0 \\ \vdots & \ddots &\vdots \\ 0 & \dots & \gamma_{0, n} \end{bmatrix} \label{eq:intermed-mats}\]To save space, we’ll also define:
\[\mathbf{C} = \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\]and we’ll let $\mathbf{C}^0 = \mathbf{C} \rvert_{\beta = \mathbf{0}}$ as before.
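To make these quantities concrete, here's a small sketch that evaluates $\delta_i$, $\rho_i$, $\gamma_i$, and $\gamma_{0, i}$ directly from Eq. \eqref{eq:intermed-terms} for the Poisson/log-link case (the toy values are mine); the printed output anticipates the closed forms we derive for the example below.

```python
import numpy as np

# Intermediate quantities from Eq. (intermed-terms) for the Poisson / log-link
# example: eta = log(mu), v(mu) = mu (taking phi = omega_i = 1). Toy values.
mu = np.array([0.5, 1.0, 2.0])       # fitted means
y = np.array([1.0, 0.0, 3.0])        # observations

deta_dmu = 1.0 / mu                  # d eta / d mu for the log link
d2eta_dmu2 = -1.0 / mu**2            # second derivative of the link
v, dv_dmu = mu, np.ones_like(mu)     # variance function and its derivative

delta = 1.0 / deta_dmu
rho = (dv_dmu * deta_dmu + v * d2eta_dmu2) / (v**2 * deta_dmu**3)
gamma = 1.0 / (v * deta_dmu**2)
gamma0 = gamma + rho * (y - mu)

Delta, Gamma, Gamma0 = np.diag(delta), np.diag(gamma), np.diag(gamma0)
print(delta, rho, gamma, gamma0)     # delta = gamma = gamma0 = mu, rho = 0
```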
To streamline our analysis later, let’s do some initial derivations. We have:
\[\begin{aligned} \frac{\partial}{\partial \eta_i} \left[ \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \right] &= -\frac{\partial^2 \eta_i}{\partial \mu_i^2}\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \\ \frac{\partial}{\partial \mu_i} \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \\ \frac{\partial}{\partial \eta_i}\left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} = \gamma_i \delta^{-1}_i (\mathbf{y}_i - \mu_i) \\ -\frac{\partial^2}{\partial \eta_i^2}[ \ell_q(\mathbf{y}; \alpha, \tau^2\rvert \beta)] &= - \frac{\partial}{\partial \eta_i} \left[ \frac{\mathbf{y}_i - \mu_i}{v(\mu_i) \frac{\partial \eta_i}{\partial \mu_i}} \right] = \gamma_{0, i} \end{aligned} \label{eq:intermed-derivs}\]
Proof.
$$ \begin{aligned} \frac{\partial}{\partial \eta_i} \left[ \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1}\right] &= - \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-2} \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i}\left[ \frac{\partial \eta_i}{\partial \mu_i} \right] = -\frac{\partial^2 \eta_i}{\partial \mu_i^2}\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \\ \frac{\partial}{\partial \mu_i} \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \frac{\partial}{\partial \mu_i} \left[ \int_{y_i}^{\mu_i} \frac{\omega_i(\mathbf{y}_i - u)}{\phi V(u)} du \right] = \frac{\omega_i(\mathbf{y}_i - \mu_i)}{\phi V(\mu_i)} = \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \\ \frac{\partial}{\partial \eta_i}\left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i} \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] = \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-1} \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} = \frac{\frac{\partial \eta_i}{\partial \mu_i}}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2} (\mathbf{y}_i - \mu_i) = \gamma_i \delta^{-1}_i (\mathbf{y}_i - \mu_i) \nonumber \end{aligned} $$ To show the last equality, we note that: $$ \begin{aligned} \frac{\partial^2 \mu_i}{\partial \eta_i^2} &= \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i}\left[ \frac{\partial \mu_i}{\partial \eta_i}\right] = - \frac{\partial \mu_i}{\partial \eta_i} \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-2} \frac{\partial^2 \eta_i}{\partial \mu_i^2} = - \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \frac{\partial^2 \eta_i}{\partial \mu_i^2} \hspace{5mm} \implies \hspace{5mm} \frac{\partial^2 \eta_i}{\partial \mu_i^2} = -\frac{\partial^2 \mu_i}{\partial \eta_i^2} \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{3} \end{aligned} \label{eq:temp} $$ Then: $$ \begin{aligned} \frac{\partial^2}{\partial \eta_i^2}[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)] &= \frac{\partial}{\partial \eta_i} \left[ \frac{\partial \mu_i}{\partial \eta_i} \right] \left(\frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \right)+ \left(\frac{\partial \mu_i}{\partial \eta_i} \right)\frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i} \left[ \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \right] & \left(\text{chain rule}\right) \\ &= \frac{\partial^2 \mu_i}{\partial \eta_i^2} \left(\frac{\mathbf{y}_i - \mu_i}{v(\mu_i)} \right) + \left( \frac{\partial \mu_i}{\partial \eta_i}\right)^2 \left(\frac{-v(\mu_i) - \frac{\partial v(\mu_i)}{\partial \mu_i}(\mathbf{y}_i - \mu_i)}{v^2(\mu_i)}\right) \\ &= - \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{-3} \frac{\partial^2 \eta_i}{\partial \mu_i^2} \left( \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)}\right) + \left( \frac{\partial \mu_i}{\partial \eta_i}\right)^2 \left(\frac{-v(\mu_i) - \frac{\partial v(\mu_i)}{\partial \mu_i}(\mathbf{y}_i - \mu_i)}{v^2(\mu_i)}\right) & \left(\text{Eq. } \eqref{eq:temp}\right) \\ &= - \frac{\frac{\partial^2 \eta_i}{\partial \mu_i^2} \left( \frac{\mathbf{y}_i - \mu_i}{v(\mu_i)}\right)}{\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^{3}} + \frac{-v(\mu_i) - \frac{\partial v(\mu_i)}{\partial \mu_i}(\mathbf{y}_i - \mu_i)}{v^2(\mu_i) \left( \frac{\partial \eta_i}{\partial \mu_i}\right)^2 } & \left( \frac{\partial \mu_i}{\partial \eta_i} = \left(\frac{\partial \eta_i}{\partial \mu_i} \right)^{-1} \text{ by Eq. 
\eqref{eq:deriv-assumption}}\right) \\ &= - \frac{1}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2} - (\mathbf{y}_i - \mu_i)\left(\frac{\frac{\partial^2 \eta_i}{\partial \mu_i^2}}{v(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3} + \frac{\frac{\partial v(\mu_i)}{\partial \mu_i}}{v^2(\mu_i)\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^2}\right) \\ &= - \gamma_i - (\mathbf{y}_i - \mu_i)\left(\frac{v(\mu_i) \frac{\partial^2 \eta_i}{\partial \mu_i^2}}{v^2(\mu_i) \left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3} + \frac{\frac{\partial v(\mu_i)}{\partial \mu_i} \frac{\partial \eta_i}{\partial \mu_i}}{v^2(\mu_i)\left(\frac{\partial \eta_i}{\partial \mu_i}\right)^3}\right) \\ &= - \gamma_i - \rho_i (\mathbf{y}_i - \mu_i) \\ &= - \gamma_{0, i} \end{aligned} \nonumber $$If we consider all components of $\eta$ at the same time, the last two equalities become matrices:
\[\frac{\partial}{\partial \eta}[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)] = \Gamma \Delta^{-1}(\mathbf{y} - \mu), \hspace{10mm} \frac{\partial^2}{\partial \eta \partial \eta^\top}[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)] = -\Gamma_0\]For our example, the mean equals the variance, so we have: $$ \delta_{i,j} = \mu_{i,j}, \hspace{5mm} \rho_{i,j} = 0, \hspace{5mm} \gamma_{i,j} = \mu_{i,j}, \hspace{5mm} \gamma_{0,i,j} = \mu_{i,j} $$
Proof.
For our example, we use $\eta_{i,j} = \log(\mu_{i,j})$ and $v(\mu_{i,j}) = \mu_{i,j}$. Thus: $$ \frac{\partial \eta_{i,j}}{\partial \mu_{i,j}} = \frac{1}{\mu_{i,j}}, \hspace{5mm} \frac{\partial^2 \eta_{i,j}}{\partial \mu_{i,j}^2} = -\frac{1}{\mu_{i,j}^2}, \hspace{5mm} \frac{\partial v(\mu_{i,j})}{\partial \mu_{i,j}} = 1 \nonumber $$ Plugging these into Eq. \eqref{eq:intermed-terms} gives the desired result.
Scores
Common approaches to parameter estimation and hypothesis testing require the gradient of the (quasi-)likelihood with respect to the parameters of interest. That is, we need to find the score function (see my likelihood post).
Score For $\alpha$
The score for $\alpha$ has elements:
\[U_{\alpha_j}\rvert_{H_0} = \frac{\partial}{\partial \alpha_j} [\ell_q(\mathbf{y}; \alpha, \tau^2) ]\bigg\rvert_{H_0} \approx \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) + \frac{1}{2} \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \label{eq:alpha-score}\]
Proof.
$$ \begin{aligned} U_{\alpha_j} \rvert_{H_0} &= \frac{\partial}{\partial \alpha_j} \left[ \ell_q(\mathbf{y}; \alpha, \tau^2) \right] \bigg\rvert_{H_0}\\ &\approx \frac{\partial}{\partial \alpha_j} \left[ \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]_{\beta = \mathbf{0}} + \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \left[\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta} \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top}\right]_{\beta = \mathbf{0}} \mathbf{Z} D(\tau^2) \right] \right] & \left(\text{Eq. } \eqref{eq:log-quasi-lik} \right) \\ &= \frac{\partial}{\partial \alpha_j} \left[ \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{H_0} \right] + \frac{1}{2} \frac{\partial}{\partial \alpha_j} \left[\text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} D(\tau^2) \right] \right] \\ &= \frac{\partial}{\partial \alpha_j} \left[ \left[ \sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{\beta= \mathbf{0}}\right] + \frac{1}{2} \frac{\partial}{\partial \alpha_j} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] & \left(\text{cyclic property of } \text{tr}\left[ \cdot \right] \right) \\ &= \frac{\partial}{\partial \alpha_j} \left[ \left[\sum_{i = 1}^n \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{\beta= \mathbf{0}} \right] + \frac{1}{2} \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] & \left(\text{by } (a)\right) \\ &= \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) + \frac{1}{2} \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] & \left(\text{by } (b)\right) \end{aligned} \nonumber $$ $$ \begin{aligned} (a): \frac{\partial}{\partial \alpha_j} \left[ \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \right] &= \sum_{i = 1}^n \sum_{i' = 1}^n \frac{\partial}{\partial \alpha_j} \left[ \mathbf{C}^0_{i, i'} (\mathbf{Z} D(\tau^2) \mathbf{Z}^\top)_{i, i'} \right] = \text{tr}\left[ \frac{\partial \mathbf{C}^0}{\partial \alpha_j} \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \\ (b):\frac{\partial}{\partial \alpha_j} \left[ \left[\ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right] \bigg\rvert_{\beta= \mathbf{0}} \right] &= \sum_{i = 1}^n \frac{\partial}{\partial \alpha_j} \left[ \left[\ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right] \bigg\rvert_{\beta= \mathbf{0}} \right]\\ &= \sum_{i = 1}^n \frac{\partial \eta_i}{\partial \alpha_j} \frac{\partial}{\partial \eta_i} \left[ \left[ \ell_q(\mathbf{y}_i; \alpha, \tau^2 \rvert \beta) \right]\bigg\rvert_{\beta= \mathbf{0}} \right] \\ &= \sum_{i = 1}^n \mathbf{x}_{i,j} \gamma^0_i (\delta^0_i)^{-1}\left(\mathbf{y}_i - \mu^0_i\right) & \left( \eta_{i,t} = \mathbf{x}_i^\top \alpha + \mathbf{z}_i^\top \beta_t \right) \\ &= \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \end{aligned} \nonumber $$
Letting $\tilde{\mu}_t$ denote an $n$-dimensional vector that is equal to $\mu_t$ for the entries corresponding to cluster $t$ and zero everywhere else, we have: $$ \begin{aligned} \frac{\partial \mathbf{C}}{\partial \alpha_t} &= - \mathbf{y}\tilde{\mu}_t^\top - \tilde{\mu}_t \mathbf{y}^\top + \begin{bmatrix} \mathbf{0}_{n \times \sum_{i < t}n_i} & \mu \mu_t^\top & \mathbf{0}_{n \times \sum_{i > t} n_i} \end{bmatrix} - \text{diag}\left(\tilde{\mu}_{t}\right) \in \mathbb{R}^{n \times n} \\ \frac{\partial}{\partial \alpha_t} \left[ \ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta) \right] &= \sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu_{t,j} \right] \end{aligned} $$ Thus, $\frac{\partial \mathbf{C}}{\partial \alpha}$ is the $n \times n \times k$ tensor made from stacking together all of the above matrices.
Proof.
We have the following identities: $$ \frac{\partial \mu}{\partial \alpha} = \begin{bmatrix} \mu_1 & \mathbf{0}_{n_1} & \dots & \mathbf{0}_{n_1} \\ \mathbf{0}_{n_2} & \mu_2 & \dots & \mathbf{0}_{n_2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{n_k} & \mathbf{0}_{n_k} & \dots & \mu_k \end{bmatrix} \in \mathbb{R}^{n \times k}, \hspace{5mm} \frac{\partial }{\partial \alpha} \left[ \mu \mu^\top \right] = \begin{bmatrix} \begin{bmatrix} \mu \mu_1^\top & \mathbf{0}_{n \times \sum_{i > 1} n_i} \end{bmatrix} & \dots & \begin{bmatrix} \mathbf{0}_{n \times \sum_{i < k} n_i} & \mu \mu_k^\top \end{bmatrix} \end{bmatrix} \in \mathbb{R}^{n \times n \times k} $$
Proof.
$$ \begin{aligned} \frac{\partial \mu_{t,j}}{\partial \alpha_t} &= \frac{\partial}{\partial \alpha_t} \left[ \exp(\alpha_t + \beta_t \mathbf{z}_{t,j})\right] = \exp(\alpha_t + \beta_t \mathbf{z}_{t,j}) = \mu_{t,j} \\ \frac{\partial \mu_{i,j}}{\partial \alpha_t} &= 0 \\ \implies \frac{\partial \mu}{\partial \alpha_t} &= \begin{bmatrix} 0 & \dots & 0 & \mu_{t,1} & \dots & \mu_{t, n_t} & 0 & \dots & 0 \\ \end{bmatrix}^\top \\ \implies \frac{\partial \mu}{\partial \alpha} &= \begin{bmatrix} \frac{\partial \mu}{\partial \alpha_1} & \frac{\partial \mu}{\partial \alpha_2} & \dots & \frac{\partial \mu}{\partial \alpha_k} \end{bmatrix}^\top = \begin{bmatrix} \mu_1 & \mathbf{0}_{n_1} & \dots & \mathbf{0}_{n_1} \\ \mathbf{0}_{n_2} & \mu_2 & \dots & \mathbf{0}_{n_2} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0}_{n_k} & \mathbf{0}_{n_k} & \dots & \mu_k \end{bmatrix} \in \mathbb{R}^{n \times k} \end{aligned} \nonumber $$ $$ \begin{aligned} \frac{\partial}{\partial \alpha_t} \left[ \mu_{t,j} \mu_{t, l} \right] &= \frac{\partial}{\partial \alpha_t} \left[ \exp(\alpha_t + \beta_t \mathbf{z}_{t,j}) \exp(\alpha_t + \beta_t \mathbf{z}_{t, l}) \right] = \exp(\alpha_t + \beta_t \mathbf{z}_{t, l})\exp(\alpha_t + \beta_t \mathbf{z}_{t, j}) + \exp(\alpha_t + \beta_t \mathbf{z}_{t, j})\exp(\alpha_t + \beta_t \mathbf{z}_{t, l}) = 2 \mu_{t,j} \mu_{t,l} \\ \frac{\partial}{\partial \alpha_t} \left[ \mu_{t,j} \mu_{s, l} \right] &= \frac{\partial}{\partial \alpha_t} \left[ \exp(\alpha_t + \beta_t \mathbf{z}_{t,j}) \exp(\alpha_s + \beta_s \mathbf{z}_{s, l}) \right] = \exp(\alpha_t + \beta_t \mathbf{z}_{t,j})\exp(\alpha_s + \beta_s \mathbf{z}_{s, l}) = \mu_{t,j} \mu_{s,l} \\ \frac{\partial}{\partial \alpha_t} \left[ \mu_{i, j} \mu_{s, l} \right] &= 0 \\ \implies \frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] &= \begin{bmatrix} 0 & \dots & 0 & \mu_{1, 1} \mu_{t, 1} & \dots & \mu_{1, 1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{1, n_1} \mu_{t, 1} & \dots & \mu_{1, n_1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 2\mu_{t, 1}^2 & \dots & 2\mu_{t, 1}\mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 2\mu_{t, n_t} \mu_{t, 1} & \dots & 2\mu_{t, n_t}^2 & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{k, 1} \mu_{t, 1} & \dots & \mu_{k, 1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{k, n_k} \mu_{t, 1} & \dots & \mu_{k, n_k} \mu_{t, n_t} & 0 & \dots & 0 \end{bmatrix} = \begin{bmatrix} \mathbf{0}_{n_1 \times \sum_{i < t} n_i} & \mu_1 \mu_t^\top & \mathbf{0}_{n_1 \times \sum_{i > t} n_i} \\ \vdots & \vdots & \vdots \\ \mathbf{0}_{n_k \times \sum_{i < t} n_i} & \mu_k \mu_t^\top & \mathbf{0}_{n_k \times \sum_{i > t} n_i} \end{bmatrix} = \begin{bmatrix} \mathbf{0}_{n \times \sum_{i < t} n_i} & \mu \mu_t^\top & \mathbf{0}_{n \times \sum_{i > t} n_i} \end{bmatrix} \end{aligned} \nonumber $$Proof.
$$ \begin{aligned} \mathbf{y} \tilde{\mu}_t^\top &= \begin{bmatrix} 0 & \dots & 0 & \mathbf{y}_{1, 1} \mu_{t, 1} & \dots & \mathbf{y}_{1, 1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mathbf{y}_{k, n_k} \mu_{t, 1} & \dots & \mathbf{y}_{k, n_k} \mu_{t, n_t} & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{n \times n} \\ \mathbf{Z}^\top &= \begin{bmatrix} \mathbf{Z}_{1,1} & \dots & \mathbf{Z}_{1, n_1} & 0 & \dots & 0 & \dots & 0 & \dots & 0 \\ 0 & \dots & 0 & \mathbf{Z}_{2, 1} & \dots & \mathbf{Z}_{2, n_2} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & 0 & \mathbf{Z}_{k, 1} & \dots & \mathbf{Z}_{k, n_k} \end{bmatrix} \in \mathbb{R}^{k \times n} \\ \implies \mathbf{Z}^\top \mathbf{y}\tilde{\mu}_t^\top &= \begin{bmatrix} 0 & \dots & 0 & \mu_{t, 1} \sum_{i = 1}^{n_1} \mathbf{Z}_{1, i} \mathbf{y}_{1, i} & \dots & \mu_{t, n_t} \sum_{i =1}^{n_1} \mathbf{Z}_{1, i} \mathbf{y}_{1, i} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{t, 1} \sum_{i = 1}^{n_k} \mathbf{Z}_{k, i} \mathbf{y}_{k, i} & \dots & \mu_{t, n_t} \sum_{i = 1}^{n_k} \mathbf{Z}_{k, i} \mathbf{y}_{k, i} & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{k \times n}\\ \implies \mathbf{Z}^\top \mathbf{y} \tilde{\mu}_t^\top \mathbf{Z} &= \begin{bmatrix} 0 & \dots & 0 & \sum_{i = 1}^{n_1} \sum_{j = 1}^{n_t} \mathbf{Z}_{1, i} \mathbf{Z}_{t, j} \mathbf{y}_{1, i} \mu_{t, j} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \sum_{i = 1}^{n_k} \sum_{j = 1}^{n_t} \mathbf{Z}_{k, i} \mathbf{Z}_{t, j} \mathbf{y}_{k, i} \mu_{t, j} & 0 & \dots & 0 \\ \end{bmatrix} \\ &= \begin{bmatrix} 0 & \dots & 0 & \left( \tilde{\mathbf{Z}}_1^\top \mathbf{y}_{1}\right) \left(\tilde{\mathbf{Z}}^\top_{t} \mu_{t} \right) & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \left( \tilde{\mathbf{Z}}_k^\top \mathbf{y}_{k}\right) \left(\tilde{\mathbf{Z}}^\top_{t} \mu_{t} \right) & 0 & \dots & 0 \\ \end{bmatrix} \\ \implies \text{tr} \left[ \mathbf{Z}^\top \mathbf{y} \tilde{\mu}_t^\top \mathbf{Z} \right] &= \left(\tilde{\mathbf{Z}}_t^\top \mathbf{y}_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu_t \right) \end{aligned} \nonumber $$Proof.
$$ \begin{aligned} \frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] &= \begin{bmatrix} \mathbf{0}_{n \times \sum_{i < t}n_i} & \mu \mu_t^\top & \mathbf{0}_{n \times \sum_{i > t} n_i} \end{bmatrix} = \begin{bmatrix} 0 & \dots & 0 & \mu_{1, 1} \mu_{t, 1} & \dots & \mu_{1, 1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{1, n_1} \mu_{t, 1} & \dots & \mu_{1, n_1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 2\mu_{t, 1}^2 & \dots & 2\mu_{t, 1}\mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 2\mu_{t, n_t} \mu_{t, 1} & \dots & 2\mu_{t, n_t}^2 & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{k, 1} \mu_{t, 1} & \dots & \mu_{k, 1} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{k, n_k} \mu_{t, 1} & \dots & \mu_{k, n_k} \mu_{t, n_t} & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{n \times n} \\ \mathbf{Z}^\top &= \begin{bmatrix} \mathbf{Z}_{1,1} & \dots & \mathbf{Z}_{1, n_1} & 0 & \dots & 0 & \dots & 0 & \dots & 0 \\ 0 & \dots & 0 & \mathbf{Z}_{2, 1} & \dots & \mathbf{Z}_{2, n_2} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & 0 & \mathbf{Z}_{k, 1} & \dots & \mathbf{Z}_{k, n_k} \end{bmatrix} \in \mathbb{R}^{k \times n} \\ \implies \mathbf{Z}^\top \frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] &= \begin{bmatrix} 0 & \dots & 0 & \mu_{t, 1} \sum_{i = 1}^{n_1} \mathbf{Z}_{1, i} \mu_{1, i} & \dots & \mu_{t, n_t} \sum_{i =1}^{n_1} \mathbf{Z}_{1, i} \mu_{1, i} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mu_{t, 1} \sum_{i = 1}^{n_k} \mathbf{Z}_{k, i} \mu_{k, i} & \dots & \mu_{t, n_t} \sum_{i = 1}^{n_k} \mathbf{Z}_{k, i} \mu_{k, i} & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{k \times n}\\ \implies \mathbf{Z}^\top \frac{\partial}{\partial \alpha_t} \left[ \mu \mu^\top \right] \mathbf{Z} &= \begin{bmatrix} 0 & \dots & 0 & \sum_{i = 1}^{n_1} \sum_{j = 1}^{n_t} \mathbf{Z}_{1, i} \mathbf{Z}_{t, j} \mu_{1, i} \mu_{t, j} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \sum_{i = 1}^{n_k} \sum_{j = 1}^{n_t} \mathbf{Z}_{k, i} \mathbf{Z}_{t, j} \mu_{k, i} \mu_{t, j} & 0 & \dots & 0 \\ \end{bmatrix} \\ &= \begin{bmatrix} 0 & \dots & 0 & \left( \tilde{\mathbf{Z}}_1^\top \mu_{1}\right) \left(\tilde{\mathbf{Z}}^\top_{t} \mu_{t} \right) & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \left( \tilde{\mathbf{Z}}_k^\top \mu_{k}\right) \left(\tilde{\mathbf{Z}}^\top_{t} \mu_{t} \right) & 0 & \dots & 0 \\ \end{bmatrix} \\ \implies \text{tr} \left[ \mathbf{Z}^\top \mu \tilde{\mu}_t^\top \mathbf{Z} \right] &= \left(\tilde{\mathbf{Z}}_t^\top \mu_t\right)\left(\tilde{\mathbf{Z}}_t^\top \mu_t \right) \end{aligned} \nonumber $$Proof.
$$ \begin{aligned} \mathbf{Z}^\top \text{diag}(\tilde{\mu}_t) &= \begin{bmatrix} 0 & \dots & 0 & 0 & \dots & 0 & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \mathbf{Z}_{t, 1} \mu_{t, 1} & \dots & \mathbf{Z}_{t, n_t} \mu_{t, n_t} & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & 0 & \dots & 0 & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{k \times n} \\ \implies \mathbf{Z}^\top \text{diag}(\tilde{\mu}_t) \mathbf{Z} &= \begin{bmatrix} 0 & \dots & 0 & \dots & 0 \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ 0 & \dots & \sum_{i = 1}^{n_t} \mathbf{Z}_{t, i}^2 \mu_{t, i} & \dots & 0 \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ 0 & \dots & 0 & \dots & 0 \end{bmatrix} \in \mathbb{R}^{k \times k} \\ \implies \text{tr}\left[ \mathbf{Z}^\top \text{diag}(\tilde{\mu}_t) \mathbf{Z} \right] &= \sum_{i = 1}^{n_t} \mathbf{Z}_{t, i}^2 \mu_{t, i} \end{aligned} \nonumber $$
Score For $\tau^2$
The score for $\tau^2$ has elements:
\[U_{\tau^2_j}\rvert_{H_0} = \frac{\partial}{\partial \tau^2_j} [\ell_q(\mathbf{y}; \alpha, \tau^2) ]\bigg\rvert_{H_0} \approx \frac{1}{2} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] = \frac{1}{2} \left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr} \left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right)\]
Proof.
$$ \begin{aligned} U_{\tau^2_j} &= \frac{\partial}{\partial \tau^2_j} [\ell_q(\mathbf{y}; \alpha, \tau^2) ] \\ &\approx \frac{\partial}{\partial \tau^2_j} \left[ \left[ \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)\right]_{\beta = \mathbf{0}} + \frac{1}{2} \text{tr} \left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} D(\tau^2) \right] \right] & \left(\text{Eq. } \eqref{eq:log-quasi-lik} \right) \\ &= \frac{1}{2} \frac{\partial}{\partial \tau^2_j} \left[ \text{tr} \left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} D(\tau^2) \right] \right] & \left(\text{no } \tau^2 \text{ in first term} \right) \\ &= \frac{1}{2} \frac{\partial}{\partial \tau^2_j} \left[ \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right] \right] \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{i' = 1}^n \frac{\partial}{\partial \tau^2_j} \left[ \mathbf{C}^0_{i, i'} \left(\mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right)_{i, i'}\right] \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{i' = 1}^n \mathbf{C}^0_{i, i'} \frac{\partial}{\partial \tau^2_j} \left[ \left(\mathbf{Z} D(\tau^2) \mathbf{Z}^\top \right)_{i, i'}\right] & \left(\text{no } \tau^2 \text{ in } \mathbf{C}^0 \right) \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{i' = 1}^n \mathbf{C}^0_{i, i'} \left(\mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \right)_{i, i'} \\ &= \frac{1}{2} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \end{aligned} \nonumber $$ We can rewrite the score in terms of the quantities we defined in Eqs. \eqref{eq:intermed-terms} and \eqref{eq:intermed-mats}: $$ \begin{aligned} U_{\tau^2_j} &= \frac{1}{2} \text{tr} \left[ \mathbf{C}^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \text{tr} \left[ \left[ \frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta}\frac{\partial \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta^\top} + \frac{\partial^2 \ell_q(\mathbf{y}; \alpha, \tau^2 \rvert \beta)}{\partial \eta \partial \eta^\top} \right]\bigg\rvert_{H_0} \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \text{tr} \left[ \left(\Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 - \Gamma^0_0\right)\mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \left( \text{tr} \left[ \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \right] - \text{tr} \left[ \Gamma_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right)\\ &= \frac{1}{2} \left( \text{tr} \left[(\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0)\right] - \text{tr} \left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right)\\ &= \frac{1}{2} \left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr} \left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right] \right) & \left(\text{first term is a scalar}\right)\\ \end{aligned} \nonumber $$ where the first term in the parentheses and $\Gamma_0$ are all evaluated at $\beta = \mathbf{0}$.This is very 
straightforward for our Poisson example: $$ \begin{aligned} U_{\tau^2}\rvert_{H_0} &\approx \frac{\partial}{\partial \tau^2} \left[ [\ell(\mathbf{y}; \alpha, \tau^2 \rvert \beta)]_{\beta = \mathbf{0}} + \frac{\tau^2}{2} \text{tr}\left[ \mathbf{Z}^\top\left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0)\right] \mathbf{Z} \right] \right] \\ &= \frac{1}{2}\text{tr}\left[ \mathbf{Z}^\top\left[ (\mathbf{y} - \mu^0)(\mathbf{y} - \mu^0)^\top - \text{diag}(\mu^0)\right] \mathbf{Z} \right] \end{aligned} $$
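Here's a sketch of this score computation for simulated data (the cluster sizes and intercepts are my own toy choices, and we evaluate at the true $\alpha$ rather than an estimate for simplicity):

```python
import numpy as np

# Sketch: the tau^2 score under H_0 for the Poisson example above.
# Toy data; we evaluate at given cluster intercepts alpha_t for simplicity.
rng = np.random.default_rng(1)
k, m = 3, 6
n = k * m
alpha = np.array([0.2, 0.0, -0.3])         # cluster-specific intercepts
Z = np.kron(np.eye(k), np.ones((m, 1)))    # n x k cluster-indicator design

mu0 = np.exp(Z @ alpha)                    # means under beta = 0
y = rng.poisson(mu0).astype(float)

r = y - mu0
U_tau2 = 0.5 * np.trace(Z.T @ (np.outer(r, r) - np.diag(mu0)) @ Z)
print(U_tau2)   # fluctuates around 0 when H_0 (tau^2 = 0) is true
```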
Score For $\phi$
Technically, there is also a score for the scale parameter $\phi$, but we will just assume it is known for now 🤫.
Information
A Bit Of Discussion
We can think of $U_{\tau^2}$ and $U_\alpha$ as components of one big score:
\[U = \begin{bmatrix} U_{\tau^2} \\ U_\alpha \end{bmatrix} \label{eq:full-score}\]The variance of the score evaluated at the true value of the parameter is the Fisher information:
\[\mathcal{I} = \mathbb{E}\left[ (U - \mathbb{E}[U])(U - \mathbb{E}[U])^\top \right] = \mathbb{E}[UU^\top] \label{eq:full-info}\]where the second equality follows from the fact that the score has expectation zero when the null hypothesis is true (i.e. the null hypothesis value for the parameter is correct).
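For intuition, here's a quick simulation of these two facts (mean-zero score, variance equal to the information) in the simplest possible setting, a Poisson GLM with a single known intercept; the model and constants are my own toy choices:

```python
import numpy as np

# Simulation: at the true parameter, the score has mean ~0 and variance ~I.
# Model: y_i ~ Poisson(exp(alpha)); the score is U_alpha = sum_i (y_i - mu),
# and the Fisher information is n * mu.
rng = np.random.default_rng(2)
alpha, n, reps = 0.5, 50, 100_000
mu = np.exp(alpha)

y = rng.poisson(mu, size=(reps, n))
U = (y - mu).sum(axis=1)          # one score value per simulated dataset

print(U.mean())                   # approximately 0
print(U.var(), n * mu)            # empirical variance vs. Fisher information
```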
To perform a score test for just $\tau^2$, we need the variance of just $U_{\tau^2}$ (its Fisher information). We can partition $\mathcal{I}$ in the following way:
\[\mathcal{I} = \begin{bmatrix} \mathcal{I}_{\tau^2, \tau^2} & \mathcal{I}_{\tau^2, \alpha} \\ \mathcal{I}_{\alpha, \tau^2} & \mathcal{I}_{\alpha, \alpha} \end{bmatrix} \label{eq:partition-info}\]where:
\[\mathcal{I}_{\tau^2, \tau^2} = \mathbb{E}\left[ \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \tau^2} \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial (\tau^2)^\top}\right], \hspace{5mm} \mathcal{I}_{\alpha, \tau^2} = \mathbb{E}\left[ \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha} \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial (\tau^2)^\top}\right], \hspace{5mm} \mathcal{I}_{\alpha, \alpha} = \mathbb{E}\left[ \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha} \frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha^\top}\right] \label{eq:info-blocks}\]The parameter of interest for our test is $\tau^2$, so we only really care about the block of \(\mathcal{I}^{-1}\) corresponding to $\tau^2$. However, this block is not equal to the inverse of the corresponding block in $\mathcal{I}$ due to the presence and estimation of $\alpha$. That is:
\[\left(\mathcal{I}^{-1}\right)_{\tau^2, \tau^2} \neq \left(\mathcal{I}_{\tau^2, \tau^2} \right)^{-1} \label{eq:tricky-neq}\]To overcome this, we can use the Schur complement for inverting a partitioned matrix. For a general block matrix $M$, we have:
\[M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \implies M^{-1} = \begin{bmatrix} (A - BD^{-1} C)^{-1} & -(A - B D^{-1} C)^{-1} BD^{-1} \\ -D^{-1} C (A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - B D^{-1} C)^{-1} B D^{-1} \end{bmatrix} \nonumber\]Matching blocks ($A = \mathcal{I}_{\tau^2, \tau^2}$, $B = \mathcal{I}_{\tau^2, \alpha}$, $C = \mathcal{I}_{\alpha, \tau^2}$, $D = \mathcal{I}_{\alpha, \alpha}$), the relevant block of the inverse Fisher information can be computed as:
\[\left(\mathcal{I}^{-1}\right)_{\tau^2, \tau^2} = \left( \mathcal{I}_{\tau^2, \tau^2} - \mathcal{I}_{\tau^2, \alpha} \mathcal{I}^{-1}_{\alpha, \alpha} \mathcal{I}_{\alpha, \tau^2} \right)^{-1} \label{eq:relevant-fisher-info}\]A more detailed discussion of this partitioning idea can be found in my post on estimation theory.
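Here's a small numerical sketch of this computation. The "information" matrix below is just a random symmetric positive-definite stand-in; the point is only that the Schur complement route agrees with inverting the full matrix and re-inverting the $\tau^2$ block:

```python
import numpy as np

# Efficient information for tau^2 via the Schur complement, checked against
# inverting the full matrix directly. The SPD matrix below is a stand-in.
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
I_full = A @ A.T + 5 * np.eye(5)    # random SPD "information" matrix
t = 2                               # pretend the first t coordinates are tau^2

I_tt, I_ta = I_full[:t, :t], I_full[:t, t:]
I_at, I_aa = I_full[t:, :t], I_full[t:, t:]

eff = I_tt - I_ta @ np.linalg.solve(I_aa, I_at)        # Schur complement
direct = np.linalg.inv(np.linalg.inv(I_full)[:t, :t])  # ((I^{-1})_{tt})^{-1}
print(np.allclose(eff, direct))                        # True
```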
In the remaining sections, we’ll slightly modify our notation. We’ll use a superscript of $0$ to denote a quantity (e.g. $\mu$, $\eta$, $\delta$, etc.) evaluated by setting $\beta = \mathbf{0}$ and $\alpha = \hat{\alpha}_{MLE}$, the maximum likelihood estimate of $\alpha$ under the null hypothesis.
Recall that we assumed that $D(\tau^2) = \mathbf{0}$ if $\tau^2 = \mathbf{0}$. This simplifies the scores a lot when we evaluate them under the null hypothesis:
\[\begin{aligned} U_{\alpha_j}\rvert_{H_0} &\approx \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \\ U_{\tau^2_j}\rvert_{H_0} &\approx \frac{1}{2}\left((\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr}\left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_j} \mathbf{Z}^\top\right]\right) \end{aligned}\]
Information For $(\alpha, \alpha)$
\[\begin{aligned} \mathcal{I}_{\alpha_i, \alpha_j}\rvert_{H_0} &= \mathbb{E}_{H_0}\left[ U_{\alpha_i} U_{\alpha_j}\right] \approx (\mathbf{X}_{\cdot, i})^\top \Gamma^0 \mathbf{X}_{\cdot, j} \end{aligned} \label{eq:info-alpha-alpha}\]
Proof.
$$ \begin{aligned} \mathcal{I}_{\alpha_i, \alpha_j}\rvert_{H_0} &= \mathbb{E}_{H_0}\left[ U_{\alpha_i} U_{\alpha_j}\right] \\ &\approx \mathbb{E}_{H_0}\left[ \left(\mathbf{X}_{\cdot, i}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right] \\ &= \mathbb{E}_{H_0}\left[ \left(\sum_{h = 1}^{n} \frac{\gamma^0_h}{\delta^0_h} \mathbf{X}_{h, i}(\mathbf{y}_h - \mu^0_h) \right) \left(\sum_{h = 1}^{n} \frac{\gamma^0_h}{\delta^0_h} \mathbf{X}_{h, j}(\mathbf{y}_h - \mu^0_h) \right) \right] \\ &= \sum_{h = 1}^{n} \left( \mathbb{E}_{H_0}\left[ \frac{(\gamma^0_h)^2}{(\delta^0_h)^2}\mathbf{X}_{h, i}\mathbf{X}_{h, j} (\mathbf{y}_h - \mu^0_h)^2 \right] + \sum_{l \neq h} \mathbb{E}_{H_0}\left[ \frac{\gamma^0_h \gamma^0_l}{\delta^0_h \delta^0_l} \mathbf{X}_{h, i}\mathbf{X}_{l, j}(\mathbf{y}_h - \mu^0_h)(\mathbf{y}_l - \mu^0_l) \right] \right) \\ &= \sum_{h = 1}^{n} \left( \frac{(\gamma^0_h)^2}{(\delta^0_h)^2}\mathbf{X}_{h, i}\mathbf{X}_{h, j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_h - \mu^0_h)^2 \right] + \sum_{l \neq h} \frac{\gamma^0_h \gamma^0_l}{\delta^0_h \delta^0_l} \mathbf{X}_{h, i}\mathbf{X}_{l, j} \mathbb{E}_{H_0}\left[ \mathbf{y}_h - \mu^0_h \right]\mathbb{E}_{H_0}\left[ \mathbf{y}_l - \mu^0_l \right] \right) & \left(\mathbf{y}_i\text{'s independent}\right) \\ &= \sum_{h = 1}^{n} \frac{(\gamma^0_h)^2}{(\delta^0_h)^2} \mathbf{X}_{h, i}\mathbf{X}_{h, j} \text{Var}_{H_0}\left( \mathbf{y}_h \right) & \left(\mathbb{E}_{H_0}\left[ \mathbf{y}_h - \mu^0_h \right] = 0\right) \\ &= \sum_{h = 1}^{n} \frac{\left(\frac{\partial \eta^0_h}{\partial \mu_h}\right)^2}{v^2(\mu^0_h) \left(\frac{\partial \eta^0_h}{\partial \mu_h}\right)^4}\mathbf{X}_{h, i}\mathbf{X}_{h, j} \text{Var}_{H_0}\left( \mathbf{y}_h \right) & \left(\text{Eq. } \eqref{eq:intermed-terms}\right)\\ &= \sum_{h = 1}^{n} \frac{1}{v^2(\mu^0_h) \left(\frac{\partial \eta^0_h}{\partial \mu_h}\right)^2}\mathbf{X}_{h, i}\mathbf{X}_{h, j} v(\mu^0_h) & \left(\text{Eq. } \eqref{eq:quasi-lik} \right)\\ &= \sum_{h = 1}^n \gamma^0_h \mathbf{X}_{h, i}\mathbf{X}_{h, j} \\ &= (\mathbf{X}_{\cdot, i})^\top \Gamma^0 \mathbf{X}_{\cdot, j} \end{aligned} \nonumber $$
In this special case, there is no $\mathbf{X}$ (i.e. it is all ones) but the fixed intercepts are cluster-specific, and the general form is confusing me. I think it's easier to just derive it again. Since we will be evaluating everything under the null hypothesis at the end, we can ignore the second term in the score for $\alpha$ (the leading $\tau^2$ will make it equal to zero). Thus: $$ \mathcal{I}_{\alpha, \alpha}\rvert_{H_0} \approx \begin{bmatrix} n_1 \exp(\alpha_1) & 0 & \dots & 0\\ 0 & n_2 \exp(\alpha_2) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & n_k \exp(\alpha_k) \end{bmatrix} $$
Proof.
$$ \begin{aligned} \mathbb{E}_{H_0}\left[ U_{\alpha_t}(\mathbf{0}) U_{\alpha_s}(\mathbf{0}) \right] &\approx \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu_{t, j}^0\right] \right) \left(\sum_{j = 1}^{n_s} \left[ \mathbf{y}_{s, j} - \mu_{s, j}^0\right] \right)\right] \\ &= \left( \sum_{j = 1}^{n_t} \mathbb{E}_{H_0}\left[\mathbf{y}_{t, j} - \mu_{t, j}^0\right] \right)\left( \sum_{j = 1}^{n_s} \mathbb{E}_{H_0}\left[\mathbf{y}_{s, j} - \mu_{s, j}^0\right] \right) \\ &= 0 \\ \mathbb{E}_{H_0}\left[ U_{\alpha_t}^2(\mathbf{0}) \right] &\approx \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_t} \left[ \mathbf{y}_{t, j} - \mu_{t, j}^0\right] \right)^2\right] \\ &= \sum_{j = 1}^{n_t} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)^2\right] + \sum_{j = 1}^{n_t} \sum_{j' = 1 \\ j' \neq j}^{n_t} \mathbb{E}_{H_0}\left[\mathbf{y}_{t,j} - \mu_{t,j}^0 \right]\mathbb{E}_{H_0}\left[\mathbf{y}_{t,j'} - \mu_{t,j'}^0 \right] \\ &= \sum_{j = 1}^{n_t} \mu_{t,j}^0 \\ &= n_t \exp(\alpha_t) \end{aligned} \nonumber $$ where in the third-to-last equality, we assume that observations within the same cluster are independent.
Information For $(\alpha, \tau^2)$
\[\begin{aligned} \mathcal{I}_{\alpha_j, \tau^2_l}\rvert_{H_0} &= \frac{1}{2} \left[ \sum_{i = 1}^n \mathbf{X}_{i, j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i} - \mu_{i}^0)^3 \right] \left( \gamma_{i}^0 (\delta_{i}^0)^{-1} \right)^3 \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] - \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i} - \mu_{i}^0)^2 \right] \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] \right] \\ &= \frac{1}{2} \left[ \sum_{i = 1}^n \mathbf{X}_{i, j} \kappa_{i}^3 \left( \gamma_{i}^0 (\delta_{i}^0)^{-1} \right)^3 \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] - \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \kappa_{i}^2 \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] \right] \\ &= \frac{1}{2} \sum_{i = 1}^n \mathbf{X}_{i, j} \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] \left(\kappa_{i}^3 \left( \gamma_{i}^0 (\delta_{i}^0)^{-1} \right)^3 - \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \kappa_{i}^2 \right) \end{aligned}\]where \(\kappa^2_{i} = \mathbb{E}_{H_0}\left[(\mathbf{y}_{i} - \mu_{i}^0)^2 \right]\) and \(\kappa^3_{i} = \mathbb{E}_{H_0}\left[(\mathbf{y}_{i} - \mu_{i}^0)^3 \right]\) are the second and third central moments of $\mathbf{y}_{i}$ under the null (see this page on the equivalence between central moments and cumulants).
Proof.
$$ \begin{aligned} \mathcal{I}_{\alpha_j, \tau^2_l} \rvert_{H_0} &= \mathbb{E}_{H_0}\left[ U_{\alpha_j}(0) U_{\tau^2_l}(0)\right] \\ &\approx \mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right) \left( \frac{1}{2}\left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) - \text{tr}\left[ \Gamma^0_0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right]\right) \right) \right] \\ &= \frac{1}{2} \underbrace{\mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right) \left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) \right) \right]}_{(\star)} - \underbrace{\mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0)\right) \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] \right]}_{(\star \star)} \\ &= \frac{1}{2} \left[ \sum_{i' = 1}^n \mathbf{X}_{i', j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i'} - \mu_{i'}^0)^3 \right] \left( \gamma_{i'}^0 (\delta_{i'}^0)^{-1} \right)^3 \left[ \mathbf{Z}_{i', \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i', \cdot})^\top \right] - \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0)^2 \right] \left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \right] \right] \end{aligned} \nonumber $$Proof of $(\star)$.
We can rewrite the first term in $(\star)$ as the summation: $$ \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) = \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} (\mathbf{y}_i - \mu_i^0) \nonumber $$ We can also write the second term in $(\star)$ as a summation: $$ (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1}\Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1}(\mathbf{y} - \mu^0) = \sum_{i' = 1}^n \left( \sum_{i = 1}^n \left[(\mathbf{y}_{i} - \mu_i^0)\gamma_i^0 (\delta_i^0)^{-1}\right]\left[ \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i', \cdot})^\top \right] \right) (\mathbf{y}_{i'} - \mu_{i'}^0)\gamma_{i'}^0 (\delta_{i'}^0)^{-1} \nonumber $$
Proof.
The large vector/matrix product can be broken down into the following pieces: $$ \begin{aligned} (\mathbf{y}- \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix}^\top \\ \Gamma^0 (\Delta^0)^{-1}(\mathbf{y}- \mu^0) &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix} \\ \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top &= \begin{bmatrix} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \end{bmatrix} \end{aligned} \nonumber $$Proof.
$$ \begin{aligned} (\mathbf{y}- \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 &= \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix}^\top \begin{bmatrix} 1/ \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \vdots & 1/ \delta^0_n \end{bmatrix} \begin{bmatrix} \gamma_1^0 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \vdots & \gamma_n^0 \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix}^\top \begin{bmatrix} \gamma_1^0 / \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \vdots & \gamma_n^0 / \delta^0_n \end{bmatrix} \\ &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix}^\top \end{aligned} \nonumber $$ $$ \begin{aligned} \Gamma^0 (\Delta^0)^{-1} (\mathbf{y}- \mu^0) &= \begin{bmatrix} \gamma_1^0 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \vdots & \gamma_n^0 \end{bmatrix} \begin{bmatrix} 1/ \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \vdots & 1/ \delta^0_n \end{bmatrix} \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix} \\ &= \begin{bmatrix} \gamma_1^0 / \delta^0_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \vdots & \gamma_n^0 / \delta^0_n \end{bmatrix} \begin{bmatrix} \mathbf{y}_1 - \mu_1^0 \\ \vdots \\ \mathbf{y}_n - \mu_n^0 \end{bmatrix} \\ &= \begin{bmatrix} (\mathbf{y}_1 - \mu_1^0)(\gamma_1^0 / \delta_1^0) \\ \vdots \\ (\mathbf{y}_n - \mu_n^0)(\gamma_n^0 / \delta_n^0) \end{bmatrix} \end{aligned} \nonumber $$ $$ \begin{aligned} \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top &= \begin{bmatrix} \mathbf{Z}_{1, 1} & \dots & \mathbf{Z}_{1, (q \times k)} \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{n, 1} & \dots & \mathbf{Z}_{n, (q \times k)} \end{bmatrix} \begin{bmatrix} \frac{\partial D(\tau^2)_{1,1}}{\partial \tau^2_l} & \dots & \frac{\partial D(\tau^2)_{1,(q \times k)}}{\partial \tau^2_l} \\ \vdots & \ddots & \vdots \\ \frac{\partial D(\tau^2)_{(q \times k),1}}{\partial \tau^2_l} & \dots & \frac{\partial D(\tau^2)_{(q \times k), (q \times k)}}{\partial \tau^2_l} \end{bmatrix} \begin{bmatrix} \mathbf{Z}_{1, 1} & \dots & \mathbf{Z}_{1, n} \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{(q \times k), 1} & \dots & \mathbf{Z}_{(q \times k), n} \end{bmatrix} \\ &= \begin{bmatrix} \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, 1}}{\partial \tau^2_l} & \dots & \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, (q \times k)}}{\partial \tau^2_l} \\ \vdots & \ddots & \vdots \\ \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, 1}}{\partial \tau^2_l} & \dots & \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, (q \times k)}}{\partial \tau^2_l} \\ \end{bmatrix} \begin{bmatrix} \mathbf{Z}_{1, 1} & \dots & \mathbf{Z}_{1, n} \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{(q \times k), 1} & \dots & \mathbf{Z}_{(q \times k), n} \end{bmatrix} \\ &= \begin{bmatrix} \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right) \mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{1, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right) \mathbf{Z}_{i', n} \\ \vdots & \ddots & \vdots \\ \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right) 
\mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \left( \sum_{i = 1}^{q \times k} \mathbf{Z}_{n, i} \frac{\partial D(\tau^2)_{i, i'}}{\partial \tau^2_l}\right) \mathbf{Z}_{i', n} \end{bmatrix}\\ &= \begin{bmatrix} \sum_{i' = 1}^{q \times k} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', n} \\ \vdots & \ddots & \vdots \\ \sum_{i' = 1}^{q \times k} \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', 1} & \dots & \sum_{i' = 1}^{q \times k} \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)_{\cdot, i'}}{\partial \tau^2_l} \mathbf{Z}_{i', n} \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{1, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \\ \vdots & \ddots & \vdots \\ \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{1, \cdot} \right)^\top & \dots & \mathbf{Z}_{n, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \left( \mathbf{Z}_{n, \cdot} \right)^\top \end{bmatrix} \end{aligned} \nonumber $$Proof of $(\star \star)$.
We can rewrite the trace term as a double summation: $$ \begin{aligned} \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] &= \sum_{i = 1}^n \sum_{j = 1}^n \left(\Gamma_0^0 \right)_{i,j} \left[ \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right]_{j, i} \\ &= \sum_{i = 1}^n \gamma_{0, i}^0 \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top \end{aligned} \nonumber $$ Then $(\star \star)$ becomes: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left( \left(\mathbf{X}_{\cdot, j}\right)^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0)\right) \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] \right] &= \mathbb{E}_{H_0}\left[ \left(\sum_{i = 1}^n \mathbf{X}_{i, j} \gamma_i^0 (\delta_i^0)^{-1} (\mathbf{y}_i - \mu_i^0) \right) \left( \sum_{i = 1}^n \gamma_{0, i}^0 \mathbf{Z}_{i, \cdot} \frac{\partial D(\tau^2)}{\partial \tau^2_l} (\mathbf{Z}_{i, \cdot})^\top\right) \right] \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \mathbb{E}_{H_0}\left[ \xi_i \gamma_{0, i}^0 \right] H_{i, i, l} + \sum_{i = 1}^n \sum_{i' \neq i} \mathbf{X}_{i,j} \underbrace{\mathbb{E}_{H_0}[\xi_i]}_{=0} \mathbb{E}_{H_0}[\gamma_{0, i'}^0] H_{i', i', l} \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0) (\gamma_i^0 + \rho^0_i (\mathbf{y}_i - \mu_i^0)) \right] H_{i, i, l} \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \left (\gamma_i^0 \underbrace{\mathbb{E}_{H_0}\left[\mathbf{y}_i - \mu_i^0\right]}_{=0} + \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0)^2 \right]\right) H_{i, i, l} \\ &= \sum_{i = 1}^n \mathbf{X}_{i,j} \gamma_i^0 (\delta_i^0)^{-1} \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu_i^0)^2 \right] H_{i, i, l} \end{aligned} \nonumber $$We have to re-derive this as well since the intercept is cluster-specific: $$ \mathcal{I}_{\alpha, \tau^2} \approx \begin{bmatrix} \frac{\exp(\alpha_1)}{2} \sum_{j = 1}^{n_1} \tilde{\mathbf{Z}}_{j}^2 \\ \vdots \\ \frac{\exp(\alpha_k)}{2} \sum_{j = 1}^{n_k} \tilde{\mathbf{Z}}_{j}^2 \end{bmatrix} $$
Proof.
Let $f(\tau^2) = \frac{\tau^2}{2} \left[ \frac{\partial}{\partial \alpha_t} \left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z}\right]\right]$. We have: $$ \begin{aligned} \mathbb{E}\left[ U_{\alpha_t}(\mathbf{0})U_{\tau^2}(\mathbf{0}) \right] &\approx \mathbb{E}\left[ \left(\sum_{j = 1}^{n_t}(\mathbf{y}_{t,j} - \mu_{t,j}^0) + f(\tau^2) \right) \left(\frac{1}{2} \text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right]\right) \right] \\ &= \frac{1}{2} \mathbb{E}\left[ \left( \sum_{j = 1}^{n_t} (\mathbf{y}_{t,j} - \mu_{t,j}^0) \right) \text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right] \right] + \underbrace{\frac{1}{2} f(\tau^2) \mathbb{E}\left[ \text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right] \right]}_{=0 \text{ under } H_0} \\ &= \frac{1}{2} \mathbb{E}\left[ \left( \sum_{j = 1}^{n_t} (\mathbf{y}_{t,j} - \mu_{t,j}^0) \right) \sum_{i = 1}^{k} (\mathbf{Z}_{\cdot, i})^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i} \right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) (\mathbf{Z}_{\cdot, i})^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i}\right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) \sum_{h = 1}^n \sum_{h' = 1}^n \mathbf{Z}_{h, i} \mathbf{C}^0_{h, h'} \mathbf{Z}_{h', i} \right] \\ &\overset{(*)}{=} \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) \sum_{h = 1}^n \left(\mathbf{Z}_{h,i}^2\left((\mathbf{y}_h - \mu_h^0)^2 - \mu_h^0 \right) + \sum_{h' = 1 \\ h' \neq h}^n \mathbf{Z}_{h, i} \mathbf{Z}_{h', i} (\mathbf{y}_h - \mu_h^0)(\mathbf{y}_{h'} - \mu_{h'}^0)\right)\right] \end{aligned} \nonumber $$ Note that in $(*)$ we have mixed the cluster-specific and full indexing into the response and mean vectors. 
We then split the sums of expectations into those that correspond to cluster $t$ and those that don't: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ U_{\alpha_t} U_{\tau^2} \right] &\approx \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0) \sum_{h = 1}^n \left(\mathbf{Z}_{h,i}^2\left((\mathbf{y}_h - \mu_h^0)^2 - \mu_h^0 \right) + \sum_{h' = 1 \\ h' \neq h}^n \mathbf{Z}_{h, i} \mathbf{Z}_{h', i} (\mathbf{y}_h - \mu_h^0)(\mathbf{y}_{h'} - \mu_{h'}^0)\right)\right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \left( \tilde{\mathbf{Z}}_{j}^2 \left(\mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)^3\right] - \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)\mu_{t,j}^0\right]}_{=0} \right)+ \sum_{h' = 1 \\ h' \neq j}^n \tilde{\mathbf{Z}}_{j} \mathbf{Z}_{h', t} \mathbb{E}_{H_0}\left[(\mathbf{y}_{t,j} - \mu_{t,j}^0)^2\right]\underbrace{\mathbb{E}_{H_0}\left[\mathbf{y}_{h'} - \mu_{h'}^0\right]}_{=0} \right) \\ &\hspace{5mm} + \frac{1}{2} \sum_{j = 1}^{n_t} \sum_{i = 1}^k \sum_{h = 1 \\ h \neq j}^n \left( \mathbf{Z}_{h,i}^2 \underbrace{\mathbb{E}_{H_0}\left[\mathbf{y}_{t,j} - \mu_{t,j}^0 \right]}_{=0}\mathbb{E}_{H_0}\left[ (\mathbf{y}_{h} - \mu_h^0)^2 - \mu_h^0 \right] + \sum_{h' = 1 \\ h' \neq h}^n \mathbf{Z}_{h,i} \mathbf{Z}_{h', i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{h} - \mu_h^0 \right]}_{=0} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)(\mathbf{y}_{h'} - \mu_{h'}^0) \right] \right) \\ &= \frac{1}{2} \sum_{j = 1}^{n_t}\tilde{\mathbf{Z}}_{j}^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_{t,j} - \mu_{t,j}^0)^3 \right] \\ &= \frac{1}{2} \sum_{j = 1}^{n_t} \tilde{\mathbf{Z}}_{j}^2 \mu_{t,j}^0 \\ &= \frac{\exp(\alpha_t)}{2} \sum_{j = 1}^{n_t} \tilde{\mathbf{Z}}_{j}^2 \end{aligned} \nonumber $$
Information For $(\tau^2, \tau^2)$
First, let’s rewrite the score for $\tau^2$:
\[\begin{aligned} U_{\tau^2_l}\rvert_{H_0} &= \frac{1}{2} \sum_{i = 1}^n \left( \left[ (\gamma^0_i)^2 (\delta_i^0)^{-2} (\mathbf{y}_i - \mu^0_i)^2 - \gamma_{0, i}^0\right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i,i} + 2 \sum_{i' < i} (\mathbf{y}_i - \mu_i^0) (\mathbf{y}_{i'} - \mu_{i'}^0) \gamma_i^0 \gamma_{i'}^0 (\delta_i^0)^{-1} (\delta_{i'}^0)^{-1} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i, i'} \right) \end{aligned} \label{eq:rewrite-u-theta-j}\]
Rewriting $U_{\tau^2_l}$.
$$ \begin{aligned} U_{\tau^2_l}\rvert_{H_0} &= \frac{1}{2}\left( (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) - \text{tr}\left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right] \right) \\ &= \frac{1}{2} \left( \text{tr}\left[ (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) \right] - \text{tr} \left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] \right) &\left(\text{first term is a scalar}\right) \\ &= \frac{1}{2} \left( \text{tr}\left[ \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] - \text{tr} \left[ \Gamma_0^0 \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] \right) &\left(\text{cyclic property}\right) \\ &= \frac{1}{2} \text{tr}\left[ \left( \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 - \Gamma_0^0 \right) \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top\right] \\ &= \frac{1}{2} \sum_{i = 1}^n \sum_{j = 1}^n \left( \Gamma^0 (\Delta^0)^{-1} (\mathbf{y} - \mu^0) (\mathbf{y} - \mu^0)^\top (\Delta^0)^{-1} \Gamma^0 - \Gamma_0^0 \right)_{i,j} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{j,i} \\ &= \frac{1}{2} \sum_{i = 1}^n \left( \left[ (\gamma^0_i)^2 (\delta_i^0)^{-2} (\mathbf{y}_i - \mu^0_i)^2 - \gamma_{0, i}^0\right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i,i} + 2 \sum_{i' < i} (\mathbf{y}_i - \mu_i^0) (\mathbf{y}_{i'} - \mu_{i'}^0) \gamma_i^0 \gamma_{i'}^0 (\delta_i^0)^{-1} (\delta_{i'}^0)^{-1} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_l} \mathbf{Z}^\top \right)_{i, i'} \right) \end{aligned} $$
Proof.
$$ \begin{aligned} \mathcal{I}_{\tau^2_l, \tau^2_{l'}} \rvert_{H_0} &= \mathbb{E}_{H_0}\left[U_{\tau^2_l} U_{\tau^2_{l'}}\right] \\ &= \frac{1}{4}\mathbb{E}_{H_0}\left[ \left( \sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i} + 2\sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma^0_{i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'}\right) \right. \\ &\hspace{15mm}\left. \times \left( \sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} + 2\sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma^0_{i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'}\right) \right] \\ &= \frac{1}{4} \mathbb{E}_{H_0}\left[ \left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i} \right)\left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} \right) \right. \\ &\hspace{15mm} + \left. \left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i} \right) \left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \right) \right. \\ &\hspace{15mm} + \left. \left(\sum_{i = 1}^n \left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i} \right) \left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \right) \right. \\ &\hspace{15mm} + \left. 
\left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \right)\left(2 \sum_{i = 1}^n \sum_{i' < i} (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \right) \right] \\ &\overset{(i)}{=} \frac{1}{4} \mathbb{E}_{H_0}\left[ \left(\sum_{i = 1}^n \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i}\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i}\left[(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0 \right]^2 \right) + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( (\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) (\delta_i^0)^{-1} (\delta^0_{i'})^{-1} \gamma_{i}^0 \gamma_{i'}^0 \right)^2 \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'} \right) \right] \\ &= \frac{1}{4}\left(\sum_{i = 1}^n \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i}\left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i}\mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i)^2 (\delta_i^0)^{-2} (\gamma_{i}^0)^2 - \gamma_{0,i}^0)^2\right] + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l}} \mathbf{Z}^\top \right)_{i,i'} \left( \mathbf{Z} \frac{\partial D(\tau^2)}{\partial \tau^2_{l'}} \mathbf{Z}^\top \right)_{i,i'}\mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma_{i'}^0)^2 \right] \right) \right) \end{aligned} \nonumber $$Proof of $(i)$.
In $(i)$, we use the fact that the $\mathbf{y}_i$'s are independent and have mean $\mu^0_i$ under the null. Thus, any summand containing $(\mathbf{y}_i - \mu^0_i)$ raised only to the first power has expectation $0$ and drops out of the sum.
Proof of $(ii)$.
To simplify this further, notice the following: $$ \begin{aligned} \mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2 - \gamma_{0,i}^0)^2 \right] &= \mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2)^2 - 2\gamma_{0,i}^0(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2 + (\gamma_{0,i}^0)^2\right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 \mathbb{E}_{H_0}\left[(\mathbf{y}_i - \mu^0_i)^4\right] - 2 \mathbb{E}_{H_0}\left[\gamma_{0,i}^0(\mathbf{y}_i - \mu^0_i)^2 (\delta_i^{-1})^2 (\gamma_i^0)^2\right] + \mathbb{E}_{H_0}\left[(\gamma_{0,i}^0)^2\right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 (\kappa_{i}^4 + 3(\kappa_{i}^2)^2) - 2(\delta_i^{-1})^2(\gamma_i^0)^2 \mathbb{E}_{H_0}\left[ (\gamma_i^0 + \rho^0_i(\mathbf{y}_i - \mu^0_i))(\mathbf{y}_i-\mu^0_i)^2 \right] + \mathbb{E}_{H_0}\left[ (\gamma_i^0 + \rho^0_i(\mathbf{y}_i - \mu^0_i))^2 \right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 (\kappa_{i}^4 + 3(\kappa_{i}^2)^2) - 2(\delta_i^{-1})^2(\gamma_i^0)^2 \left( \gamma_i^0\mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2 \right] + \rho^0_i \mathbb{E}_{H_0}\left[(\mathbf{y}_i - \mu^0_i)^3 \right]\right) + (\gamma_i^0)^2 + 2 \gamma_i^0 \rho^0_i \mathbb{E}_{H_0}\left[ \mathbf{y}_i - \mu^0_i \right] + (\rho_i^0)^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2 \right] \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 3(\delta_i^{-1})^4 (\gamma_i^0)^4 (\kappa_{i}^2)^2 - 2(\delta_i^{-1})^2(\gamma_i^0)^3 \kappa_{i}^2 - 2(\delta_i^{-1})^2(\gamma_i^0)^2 \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^3 \right] + (\gamma_i^0)^2 + (\rho^0_i)^2 \kappa_{i}^2 \\ &\overset{(a)}{=} (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 3(\delta_i^{-1})^2 (\gamma_i^0)^2 (\delta_i)^2 - 2\delta_i^{-1}(\gamma_i^0)^2 \delta_i - 2(\delta_i^{-1})^2 (\gamma_i^0)^2 \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^3 \right] + (\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 \\ &= (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 3(\gamma_i^0)^2 - 2(\gamma_i^0)^2 + (\gamma_i^0)^2 - 2(\delta_i^{-1})^2 (\gamma_i^0)^2 \rho^0_i \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^3 \right] + (\rho_i^0)^2 \kappa_{i}^2 \\ &\overset{(b)}{=} (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 2(\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 - 2 (\gamma_i^0)^2 (\delta_i^{-1})^2 \rho^0_i \kappa_{i}^3 \end{aligned} \nonumber $$ In $(a)$, we have that: $$ (\delta_i^0)^{-1} \gamma_i^0 = \left( \frac{1}{\frac{\partial \eta^0_i}{\partial \mu_i}} \right)^{-1} \frac{1}{v(\mu^0_i) (\frac{\partial \eta^0_i}{\partial \mu_i})^2} = \frac{1}{v(\mu^0_i) \frac{\partial \eta^0_i}{\partial \mu_i}} \implies (\delta^0_i)^{-1} \gamma^0_i \kappa_{i}^2 = \frac{1}{v(\mu^0_i) \frac{\partial \eta^0_i}{\partial \mu_i}} v(\mu^0_i) = \delta^0_i \nonumber $$ In $(b)$, we use the fact that the third cumulant is equal to the third central moment.
Proof of $(iii)$.
In $(iii)$, we see that: $$ \begin{aligned} \mathbb{E}_{H_0}\left[((\mathbf{y}_i - \mu^0_i) (\mathbf{y}_{i'} - \mu^0_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} \gamma_{i}^0 \gamma^0_{i'})^2 \right] &= (\delta_i^{-1}\delta_{i'}^{-1})^2 (\gamma_i^0 \gamma^0_{i'})^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2(\mathbf{y}_{i'} - \mu^0_{i'})^2\right] \\ &= (\delta_i^{-1}\delta_{i'}^{-1})^2 (\gamma_i^0 \gamma^0_{i'})^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_i - \mu^0_i)^2\right] \mathbb{E}_{H_0}\left[(\mathbf{y}_{i'} - \mu^0_{i'})^2\right] \\ &= (\delta_i^{-1})^2 (\gamma^0_i)^2 v(\mu^0_i) (\delta_{i'}^{-1})^2 (\gamma^0_{i'})^2 v(\mu^0_{i'}) \\ &\overset{(a')}{=} \gamma^0_i \gamma^0_{i'} \end{aligned} \nonumber $$ In $(a')$, we use the fact that: $$ ((\delta_i^0)^{-1})^2 v(\mu^0_i) = \left(\frac{\partial \eta^0_i}{\partial \mu_i} \right)^2 v(\mu^0_i) = (\gamma_i^0)^{-1} \nonumber $$This matches the form in Lin (1997) if we let \(r_{i,i} = (\delta_i^{-1})^4 (\gamma_i^0)^4 \kappa_{i}^4 + 2(\gamma_i^0)^2 + (\rho_i^0)^2 \kappa_{i}^2 - 2 (\gamma_i^0)^2 (\delta_i^{-1})^2 \rho^0_i \kappa_{i}^3\) and \(r_{i, i'} = 2 \gamma^0_i\gamma^0_{i'}\).
$$ \mathcal{I}_{\tau^2, \tau^2} \bigg\rvert_{H_0} \approx \frac{1}{4} \sum_{i = 1}^k \sum_{j = 1}^n \mathbf{Z}_{i,j}^4 (\mu_{i,j}^0 + 2(\mu_{i,j}^0)^2) $$
Proof.
$$ \begin{aligned} U_{\tau^2}^2 \bigg\rvert_{H_0} &\approx \frac{1}{4}\left(\text{tr}\left[ \mathbf{Z}^\top \mathbf{C}^0 \mathbf{Z} \right] \right)^2 \\ &= \frac{1}{4}\left(\sum_{i = 1}^k\mathbf{Z}_{\cdot, i}^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i} \right)^2 \\ &= \frac{1}{4}\sum_{i = 1}^k \sum_{i' = 1}^k (\mathbf{Z}_{\cdot, i}^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i})(\mathbf{Z}_{\cdot, i'}^\top \mathbf{C}^0 \mathbf{Z}_{\cdot, i'}) \\ &= \frac{1}{4}\sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \sum_{j' = 1}^n \sum_{j'' = 1}^n \sum_{j''' = 1}^n \mathbf{Z}_{j,i} \mathbf{Z}_{j',i} \mathbf{Z}_{j'', i'} \mathbf{Z}_{j''', i'} \mathbf{C}_{j,j'}^0 \mathbf{C}_{j'', j'''}^0 \end{aligned} \nonumber $$ The summands in the above are equal to zero in expectation if there is ever an index that is different from the others, due to the independence assumption. Thus, we can restrict the summations over $j$, $j'$, $j''$, and $j'''$ to settings where all of the indices are the same or there are two pairs of matching indices ($j = j'$ and $j'' = j'''$ with $j \neq j''$): $$ \begin{aligned} U_{\tau^2}(\mathbf{0})^2 &\approx \frac{1}{4}\sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \left[ \mathbf{Z}_{j,i}^2 \mathbf{Z}_{j, i'}^2 (\mathbf{C}_{j,j}^0)^2 + \sum_{j' = 1 \\ j' \neq j}^n \mathbf{Z}_{j, i}^2 \mathbf{Z}_{j', i'}^2 \mathbf{C}_{j,j}^0 \mathbf{C}_{j',j'}^0 \right] \end{aligned} \nonumber $$ We only need to deal with $\mathbf{C}^0$ terms when taking the expectation, and we have: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ (\mathbf{C}_{j,j}^0)^2 \right] &= \mathbb{E}_{H_0}\left[ ((\mathbf{y}_j - \mu_j^0)^2 - \mu_j^0)^2 \right] \\ &= \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^4\right] - 2 \mu_j^0 \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^2 \right] + (\mu_j^0)^2 \\ &= (\mu_j^0 + 3 (\mu_j^0)^2) - 2 (\mu_j^0)^2 + (\mu_j^0)^2 \\ &= \mu_j^0 + 2(\mu_j^0)^2 \\ \mathbb{E}_{H_0}\left[ \mathbf{C}_{j,j}^0 \mathbf{C}_{j',j'}^0\right] &= \mathbb{E}_{H_0}\left[ ((\mathbf{y}_j - \mu_j^0)^2 - \mu_j^0)((\mathbf{y}_{j'} - \mu_{j'}^0)^2 - \mu_{j'}^0) \right] \\ &= \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^2 \right]\mathbb{E}_{H_0}\left[ (\mathbf{y}_{j'} - \mu_{j'}^0)^2\right] - \mu_j^0 \mathbb{E}_{H_0}\left[ (\mathbf{y}_{j'} - \mu_{j'}^0)^2 \right] - \mu_{j'}^0 \mathbb{E}_{H_0}\left[ (\mathbf{y}_j - \mu_j^0)^2 \right] + \mu_j^0 \mu_{j'}^0 \\ &= \mu_j^0 \mu_{j'}^0 - \mu_{j'}^0 \mu_j^0 - \mu_{j}^0 \mu_{j'}^0 + \mu_j^0 \mu_{j'}^0 \\ &= 0 \end{aligned} \nonumber $$ where we use the fact that the fourth central moment for the Poisson distribution is equal to the mean plus three times the mean squared (via the relationship between central moments and cumulants). Putting the above all together, we get: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ U_{\tau^2}^2(\mathbf{0})\right] &\approx \frac{1}{4} \sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \left[ \mathbf{Z}_{j,i}^2 \mathbf{Z}_{j, i'}^2 \mathbb{E}_{H_0}\left[(\mathbf{C}_{j,j}^0)^2\right] + \sum_{j' = 1 \\ j' \neq j}^n \mathbf{Z}_{j, i}^2 \mathbf{Z}_{j', i'}^2 \underbrace{\mathbb{E}_{H_0}\left[\mathbf{C}_{j,j}^0 \mathbf{C}_{j',j'}^0 \right]}_{=0} \right] \\ &= \frac{1}{4} \sum_{i = 1}^k \sum_{i' = 1}^k \sum_{j = 1}^n \mathbf{Z}_{j,i}^2 \mathbf{Z}_{j, i'}^2 (\mu_j^0 + 2(\mu_j^0)^2 ) \end{aligned} \nonumber $$ Noticing that $\mathbf{Z}_{j,i} = 0$ if the index $j$ does not fall into those corresponding to cluster $i$, we can simplify to: $$ \mathbb{E}_{H_0}\left[ U_{\tau^2}^2(\mathbf{0})\right] \approx \frac{1}{4} \sum_{i = 1}^k \sum_{j = 1}^n \mathbf{Z}_{i,j}^4 (\mu_{i,j}^0 + 2(\mu_{i,j}^0)^2) $$
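Numerically, this final expression is cheap to evaluate. Below is a minimal numpy sketch; the function name and the per-cluster list layout (`z_by_cluster`, `mu0_by_cluster` holding each cluster's covariates and null fitted means) are assumptions for illustration, not anything prescribed by Lin (1997).

```python
import numpy as np

def approx_info_tau2(z_by_cluster, mu0_by_cluster):
    """(1/4) * sum_i sum_j z_{i,j}^4 * (mu0_{i,j} + 2 * mu0_{i,j}^2)."""
    total = 0.0
    for z, mu in zip(z_by_cluster, mu0_by_cluster):
        z, mu = np.asarray(z, float), np.asarray(mu, float)
        total += np.sum(z**4 * (mu + 2.0 * mu**2))  # per-observation contributions
    return 0.25 * total
```
Test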
The test statistic for the global test is given by:
\[\chi_G^2 = U_{\tau^2}^\top(\hat{\alpha}_0) \tilde{\mathcal{I}}_{\tau^2, \tau^2}^{-1}(\hat{\alpha}_0) U_{\tau^2}(\hat{\alpha}_0)\]Under the null hypothesis, it is asymptotically $\chi^2_m$ distributed, where $m$ is the dimension of $\tau^2$ (so $\chi^2_1$ in our single-variance-component example).
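Assuming the score vector and efficient information have already been computed, the mechanics of the test are just a quadratic form and a chi-square tail probability. A minimal sketch (`global_score_test` is a hypothetical helper name):

```python
import numpy as np
from scipy.stats import chi2

def global_score_test(U, I_eff, df):
    """chi2_G = U^T I_eff^{-1} U with an asymptotic chi-square reference.

    U     : score for tau^2 evaluated at the null MLE alpha_hat_0
    I_eff : efficient information for tau^2 at alpha_hat_0
    df    : degrees of freedom (the dimension m of tau^2)
    """
    U = np.atleast_1d(np.asarray(U, dtype=float))
    I_eff = np.atleast_2d(np.asarray(I_eff, dtype=float))
    stat = float(U @ np.linalg.solve(I_eff, U))
    return stat, chi2.sf(stat, df)  # statistic and asymptotic p-value
```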
Example With Marginal Likelihood
Instead of Taylor-approximating the quasi-likelihood as in Lin (1997), we can use the marginal quasi-likelihood method for our example, since we assumed a particular distributional form for the random effects.
We can write the log-likelihood for cluster $i$ in vector notation as:
\[\ell(\mathbf{y}_i; \alpha_i, \tau^2 \rvert \beta_i) = \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - \log(\mathbf{y}_{i,j}!) \right] = \mathbf{y}_i^\top \log(\mu_i) - \mu_i^\top \mathbf{1}_{n_i} - \sum_{j = 1}^{n_i} \log( \mathbf{y}_{i,j}!)\]We also have:
\[\frac{d \ell(\mathbf{y}_i; \alpha_i, \tau^2 \rvert \beta_i)}{d \beta_i} = \sum_{j = 1}^{n_i} \left[ \mathbf{y}_{i,j} \mathbf{z}_{i,j} - \exp(\alpha_i + \beta_i \mathbf{z}_{i,j})\mathbf{z}_{i,j} \right] \hspace{5mm} \frac{d^2 \ell(\mathbf{y}_i;\alpha_i, \tau^2 \rvert \beta_i)}{d \beta_i^2} = - \sum_{j = 1}^{n_i} \exp(\alpha_i + \beta_i \mathbf{z}_{i,j}) \mathbf{z}_{i,j}^2\]In our case, we only have a random slope with $D(\tau^2) = \tau^2$, so $\mathbf{K}_i$ and $\mathbf{h}_i$ are fairly simple:
\[\mathbf{K}_i = \frac{1}{\tau^2} - \left(- \sum_{j = 1}^{n_i} \mu_{i,j}^0 \mathbf{z}_{i,j}^2\right); \hspace{5mm} \mathbf{h}_i = \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\]Applying Eq. \eqref{eq:marg-qll} to this case, we get the marginal log-likelihood:
\[\begin{aligned} \ell(\mathbf{y}; \alpha, \tau^2) &\approx \sum_{i = 1}^k \left[-\frac{1}{2}\log\left(1 + \tau^2 \exp(\alpha_i) \sum_{j= 1}^{n_i} \mathbf{z}_{i,j}^2\right) + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j}\alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) + \frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \right)^{-1}\right] \end{aligned}\]Let \(\dot{\mathbf{z}}_i = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i))\) and define:
\[\begin{aligned} \upsilon_i &= \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} & \zeta_i &= \frac{\exp(\alpha_i)}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} = \frac{\tau^2 \exp(\alpha_i)}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} \\ \xi_i &= \frac{\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} = \frac{\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} & \psi_i &= \frac{1}{1 + 2 \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2} \end{aligned}\]
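To make the approximation concrete, here is a minimal sketch that evaluates this marginal log-likelihood for the Poisson random-slope example. The function name and the per-cluster list layout are assumptions for illustration; the last term is rewritten with $\tau^2$ in the numerator so that $\tau^2 = 0$ is allowed.

```python
import numpy as np
from scipy.special import gammaln

def marginal_loglik(alpha, tau2, y_by_cluster, z_by_cluster):
    """Laplace-approximate marginal log-likelihood for the Poisson
    random-slope example (one alpha_i per cluster, D(tau^2) = tau^2)."""
    total = 0.0
    for a_i, y, z in zip(alpha, y_by_cluster, z_by_cluster):
        y, z = np.asarray(y, float), np.asarray(z, float)
        mu = np.exp(a_i)                 # cluster mean under the null model
        S = np.sum(z**2)                 # sum_j z_{i,j}^2
        zdot = np.sum(z * (y - mu))      # the z-dot_i quantity from the text
        denom = 1.0 + tau2 * mu * S
        total += (-0.5 * np.log(denom)
                  + np.sum(y * a_i - mu - gammaln(y + 1.0))  # Poisson log-pmf terms
                  + 0.5 * tau2 * zdot**2 / denom)            # = zdot^2 / (2 (1/tau2 + mu S))
    return total
```
Score Functions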
We have:
\[\begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{H_0} &\approx -\upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \dot{\mathbf{z}}_i \left[\xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right] \end{aligned}\]Proof.
We have: $$ \begin{aligned} \frac{d}{d \alpha_i} \left[ - \frac{1}{2}\log\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)\right] \bigg\rvert_{H_0} &= -\frac{1}{2}\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \left(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \\ \frac{d}{d \alpha_i} \left[ \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} \alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) \right] &= \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \\ \frac{d}{d \alpha_i} \left[ \frac{1}{2} \left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2 \right] &= -\left(\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right) \left(\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \right)\\ \frac{d}{d \alpha_i} \left[ \left( \frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \right] &= - \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-2} \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \nonumber \end{aligned} $$ Thus, the derivative is: $$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{H_0} &\approx \frac{d}{\alpha_i} \left[ -\frac{1}{2}\log\left(1 + \tau^2 \exp(\alpha_i) \sum_{j= 1}^{n_i} \mathbf{z}_{i,j}^2\right) + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j}\alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) + \frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \right)^{-1} \right] \\ &= -\frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) + \frac{-\left(\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right) \left(\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \right)}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} + \frac{\frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2\left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)}{-\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{2}} \\ &= -\frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \frac{\left(\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right) \left(\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \right)}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} - \frac{ \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2\left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)}{2\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{2}} \\ &= -\frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) -\frac{\exp(\alpha_i)\sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}}{\frac{1}{\tau^2} + \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}} \left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}+ \frac{\left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - 
\exp(\alpha_i))\mathbf{z}_{i,j} \right)\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)}{2\left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)}\right] \\ &= -\upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \left[ \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right]\left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} (\mathbf{y}_{i,j} - \exp(\alpha_i))\right] \end{aligned} \nonumber $$Proof.
We have: $$ \begin{aligned} \frac{d}{d \tau^2} \left[ - \frac{1}{2}\log\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)\right] &= -\frac{1}{2}\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \\ \frac{d}{d \tau^2} \left[ \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} \alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) \right] &= 0\\ \frac{d}{d \tau^2} \left[ \frac{1}{2} \left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2 \right] &= 0 \\ \frac{d}{d \tau^2} \left[ \left( \frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-1} \right] &= \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^{-2} (\tau^2)^{-2} \end{aligned} \nonumber $$ Thus, the derivative is: $$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} &\approx \sum_{i = 1}^k \frac{d}{d\tau^2} \left[ -\frac{1}{2}\log\left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j}\alpha_i - \exp(\alpha_i) - \log(\mathbf{y}_{i,j}!)) + \frac{1}{2} \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j}\right)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \right)^{-1} \right] \\ &= \sum_{i = 1}^k \left[-\frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \frac{\left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2}{2 (\tau^2)^2 \left(\frac{1}{\tau^2} + \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2}\right] \\ &= \sum_{i = 1}^k \left[-\frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2\left( 1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)} + \frac{\left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2}{2 (\tau^2)^2 \left(\frac{1}{(\tau^2)^2} + 2\frac{1}{\tau^2} \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2 \right) }\right] \\ &= \frac{1}{2}\sum_{i = 1}^k \left[-\frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 } + \frac{\left( \sum_{j = 1}^{n_i}(\mathbf{y}_{i,j} - \exp(\alpha_i))\mathbf{z}_{i,j} \right)^2}{1 + 2\tau^2\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2}\right] \\ &= \frac{1}{2}\sum_{i = 1}^k \left[-\frac{2 \upsilon_i}{\tau^2} + \psi_i \left( \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right)^2 \right] \end{aligned} $$Notice that under the null hypothesis, the gradient of the log-likelihood with respect to $\alpha$ becomes:
\[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha} \approx \begin{bmatrix} \sum_{j = 1}^{n_1} (\mathbf{y}_{1, j} - \exp(\alpha_1)) \\ \vdots \\ \sum_{j = 1}^{n_k} (\mathbf{y}_{k, j} - \exp(\alpha_k)) \end{bmatrix}\]Setting the above equal to $\mathbf{0}_{k}$ and solving for $\alpha$ gives us the MLE under the null:
\[\hat{\alpha} = \begin{bmatrix} \log\left(\frac{1}{n_1} \sum_{j = 1}^{n_1} \mathbf{y}_{1, j}\right) \\ \vdots \\ \log\left(\frac{1}{n_k} \sum_{j = 1}^{n_k} \mathbf{y}_{k, j}\right) \end{bmatrix}\]
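In code, this null MLE is one line per cluster (a hypothetical helper, assuming the same per-cluster list layout as above):

```python
import numpy as np

def alpha_hat_null(y_by_cluster):
    """MLE of alpha under H0 (tau^2 = 0): log of each cluster's sample mean."""
    return np.array([np.log(np.mean(y)) for y in y_by_cluster])
```
A Note On The MLE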
Suppose we let $\alpha^* = \alpha + \epsilon$ where $\alpha$ is the true value of $\alpha$ and $\epsilon$ is a little bit of jitter (negative or positive). Consider evaluating the gradient of the marginal log-likelihood w.r.t. $\alpha$ at $(\alpha, \tau^2) = (\alpha^*, 0)$. We know that $\upsilon_i = \zeta_i = \xi_i = 0$ for $\tau^2 = 0$, so the component of the score corresponding to $\alpha_i$ simplifies to:
\[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} = \left(\sum_{j = 1}^{n_i} \mathbf{y}_{i,j}\right) - n_i \exp(\epsilon)\exp(\alpha_i)\]Proof.
$$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} &= \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon)) \\ &= \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i) \exp(\epsilon)) \\ &= \left(\sum_{j = 1}^{n_i} \mathbf{y}_{i,j}\right) - n_i \exp(\epsilon)\exp(\alpha_i) \end{aligned} \nonumber $$Since $e^x > 0$ and $n_i > 0$, we see that adding any jitter to $\alpha$ corresponds to scaling the amount we are subtracting off the cluster sum by a factor that is exponential in the amount of jitter.
Now let’s consider the gradient of the marginal log-likelihood w.r.t. $\tau^2$:
\[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} = \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left( \exp(\epsilon)\exp(\alpha_i) - \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right)^2 + c\]Proof.
$$ \begin{aligned} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \bigg\rvert_{(\alpha, \tau^2) = (\alpha^*, 0)} &= \frac{1}{2} \sum_{i = 1}^k \left[ -\exp(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon))\right)^2 \right] \\ &= \frac{1}{2} \sum_{i = 1}^k \left[ - \exp(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon))^2 + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} (\mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon))(\mathbf{y}_{i,j'} - \exp(\alpha_i + \epsilon)) \right] \\ &= \frac{1}{2} \sum_{i = 1}^k \left[ - \exp(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{y}_{i,j}^2 - 2\exp(\alpha_i + \epsilon)\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2 \mathbf{y}_{i,j} + \exp^2(\alpha_i + \epsilon) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} (\mathbf{y}_{i,j}\mathbf{y}_{i,j'} - \exp(\alpha_i + \epsilon) \mathbf{y}_{i,j} - \exp(\alpha_i + \epsilon)\mathbf{y}_{i,j'} + \exp^2(\alpha_i + \epsilon)) \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \left[ - \exp(\alpha_i + \epsilon) \sum_{j =1 }^{n_i} \mathbf{z}_{i,j}^2 + \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mathbf{y}_{i,j}\right)^2 - 2\exp(\alpha_i + \epsilon)\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mathbf{y}_{i,j}\right)\left(\sum_{j =1 }^{n_i} \mathbf{z}_{i,j} \right) + \exp^2(\alpha_i + \epsilon) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)^2 \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \left[ - \exp(\alpha_i + \epsilon) \mathbf{z}^2_{i,j} + \mathbf{z}_{i,j} \mathbf{y}_{i,j}\left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}\right) - 2\exp(\alpha_i + \epsilon)\mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}\right)+ \exp^2(\alpha_i + \epsilon) \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left[ - \exp(\alpha_i + \epsilon) \frac{\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} + \frac{\mathbf{y}_{i,j}\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} - 2\exp(\alpha_i + \epsilon)\frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}}+ \exp^2(\alpha_i + \epsilon) \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left[ \exp^2(\alpha_i + \epsilon) - 2 \exp(\alpha_i + \epsilon) \left( \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right) + \frac{\mathbf{y}_{i,j}\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left[ \left( \exp(\alpha_i + \epsilon) - \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right)^2 - \left(\frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}}\right)^2 + \frac{\mathbf{y}_{i,j}\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right] \\ &= \frac{1}{2}\sum_{i = 1}^k \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \left(\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}\right) \left( \exp(\alpha_i + \epsilon) - \frac{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'} \mathbf{y}_{i,j'} + \frac{1}{2}\mathbf{z}_{i,j}}{\sum_{j' = 1}^{n_i} \mathbf{z}_{i,j'}} \right)^2 + c \end{aligned} \nonumber $$ where $c$ is a term that does not depend upon $\epsilon$. For similar reasons as before, we see that all the jitter does is scale the leading term in the quadratic within the sum. Thus, as $\epsilon$ grows in magnitude, the score will also grow in magnitude exponentially in $2\epsilon$.
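Before moving on to the information, here is a small sketch collecting one cluster's contributions to both score functions using the quantities above (a hypothetical helper; checking the jitter behaviour numerically amounts to calling it at `alpha_i + eps` over a grid of `eps`). Note that $2\upsilon_i/\tau^2$ simplifies so that $\tau^2 = 0$ causes no division by zero.

```python
import numpy as np

def score_components(alpha_i, tau2, y, z):
    """One cluster's contribution to d ell / d alpha_i and d ell / d tau^2,
    using the upsilon / zeta / xi / psi quantities defined above."""
    y, z = np.asarray(y, float), np.asarray(z, float)
    mu = np.exp(alpha_i)
    S = np.sum(z**2)
    zdot = np.sum(z * (y - mu))
    denom = 1.0 + tau2 * mu * S
    upsilon = tau2 * mu * S / (2.0 * denom)
    zeta = tau2 * mu / denom
    xi = tau2 * S / (2.0 * denom)
    psi = 1.0 / denom**2
    d_alpha_i = -upsilon + np.sum(y - mu) - zeta * zdot * (xi * zdot + np.sum(z))
    d_tau2_i = 0.5 * (-mu * S / denom + psi * zdot**2)  # -2*upsilon/tau2 = -mu*S/denom
    return d_alpha_i, d_tau2_i
```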
Information
We’ll derive the information matrix in chunks. First, we have:
\[\mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i \right] = 0; \hspace{5mm} \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^2 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0; \hspace{5mm} \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^3 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \mu_{i,j}^0; \hspace{5mm} \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^4 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 (3(\mu_{i,j}^0)^2 + \mu_{i,j}^0) + 3\sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \mu_{i,j}^0 \mu_{i,j'}^0\]
Proof.
$$ \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i\right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right] = 0 \nonumber $$ $$ \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right]}_{=\text{Var}(\mathbf{y}_{i,j})} + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0}\underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{=0} = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 \nonumber $$ $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^3 \right] &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^3 \right]}_{= \text{ 3rd central moment } \\ = \mu_{i,j}^0} + 3 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'} \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right]}_{= \text{Var}(\mathbf{y}_{i,j})} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{=0} \\ &\hspace{5mm} + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \sum_{j'' \neq j, j'} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \mathbf{z}_{i,j''} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{=0}\underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j''} - \exp(\alpha_i) \right]}_{=0} \\ &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^3 \mu_{i,j}^0 \end{aligned} \nonumber $$ $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^4 \right] &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^4 \right]}_{= \text{ 4th central moment } \\ = 3(\mu_{i,j}^0)^2 + \mu_{i,j}^0} + 4 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^3 \mathbf{z}_{i,j'}\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^3 \right] \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{= 0} \\ &\hspace{5mm} + 3 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right]}_{= \text{Var}(\mathbf{y}_{i,j})}\underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j'} - \exp(\alpha_i))^2 \right]}_{= \text{Var}(\mathbf{y}_{i,j'})} + 6 \sum_{j = 1}^{n_i} \sum_{j' \neq j} \sum_{j'' \neq j, j'} \mathbf{z}^2_{i,j} \mathbf{z}_{i,j'} \mathbf{z}_{i,j''} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{= 0}\underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j''} - \exp(\alpha_i) \right]}_{= 0} \\ &\hspace{5mm} + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \sum_{j'' \neq j, j'} \sum_{j''' \neq j, j', j''} \mathbf{z}_{i,j} \mathbf{z}_{i,j'} \mathbf{z}_{i,j''} \mathbf{z}_{i,j'''} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{= 0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'} - \exp(\alpha_i) \right]}_{= 0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j''} - \exp(\alpha_i) \right]}_{= 0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j'''} - \exp(\alpha_i) \right]}_{= 0} \\ &= \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 (3(\mu_{i,j}^0)^2 + \mu_{i,j}^0) + 3\sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \mu_{i,j}^0 \mu_{i,j'}^0 \end{aligned} \nonumber $$
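These identities are easy to sanity-check by simulation. A quick Monte Carlo sketch (all names illustrative), using the fact that $\sum_{j' \neq j} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 = (\sum_j \mathbf{z}_{i,j}^2)^2 - \sum_j \mathbf{z}_{i,j}^4$:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_i, z = 0.3, np.array([0.5, -1.0, 2.0, 1.5])
mu = np.exp(alpha_i)
y = rng.poisson(mu, size=(1_000_000, z.size))   # draws under the null
zdot = ((y - mu) * z).sum(axis=1)               # z-dot_i for each replicate

S2, S3, S4 = np.sum(z**2), np.sum(z**3), np.sum(z**4)
print(np.mean(zdot**2), S2 * mu)                # second moment vs. closed form
print(np.mean(zdot**3), S3 * mu)                # third moment vs. closed form
print(np.mean(zdot**4), S4 * (3 * mu**2 + mu) + 3 * mu**2 * (S2**2 - S4))
```
We can show that: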
\[\mathbb{E}_{H_0}\left[ \left(\frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha_i}\right)^2 \right] \approx n_i \exp(\alpha_i)\]Proof.
All expectations are taken under the null hypothesis, so $\mathbb{E}_{H_0}[\mathbf{y}_{i,j}] = \exp(\alpha_i)$. Furthermore, we have that $\upsilon_i = 0$ and $\psi_i = 1$ under the null (also $\zeta_i$ and $\xi_i$ go to $0$ as we take $\tau^2 \rightarrow 0$). The cross terms come out to: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i \zeta_i \dot{\mathbf{z}}_i \left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right] \right] &= \upsilon_i \zeta_i \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \underbrace{\mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i\right]}_{=0} + \xi_i \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \right) = \upsilon_i \zeta_i \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 = \upsilon_i \zeta_i \xi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \\ \mathbb{E}_{H_0}\left[ \upsilon_i \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] &= \upsilon_i \sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} = 0 \\ \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right)\left( \zeta_i \dot{\mathbf{z}}_i \left[ \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right]\right) \right] &= \zeta_i \left(\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}\right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i))\right] + \zeta_i \xi_i \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}^2_i \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \\ &= \zeta_i \left(\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}\right) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \mu_{i,j}^0 \right) + \zeta_i \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 \\ &= \zeta_i \exp(\alpha_i) \left[ \left(\sum_{j = 1}^{n_i}\mathbf{z}_{i,j}\right)^2 + \xi_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)\right] \end{aligned} \nonumber $$ All of the above terms go to $0$ under the null (as we take $\tau^2 \rightarrow 0$), so their sum also goes to $0$. 
Looking at the squared terms: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i^2 \right] &= \upsilon_i^2 \\ \mathbb{E}_{H_0}\left[ \left( \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right)^2 \right] &= \sum_{j = 1}^{n_i} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^2 \right] + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))(\mathbf{y}_{i, j'} - \exp(\alpha_i)) \right] \\ &= \sum_{j = 1}^{n_i} \text{Var}(\mathbf{y}_{i,j}) + \sum_{j = 1}^{n_i} \sum_{j' \neq j} \underbrace{\mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right]}_{=0} \underbrace{\mathbb{E}_{H_0}\left[(\mathbf{y}_{i, j'} - \exp(\alpha_i)) \right]}_{=0} \\ &= \sum_{j = 1}^{n_i} \mu_{i,j}^0 \\ &= n_i \exp(\alpha_i) \\ \mathbb{E}_{H_0}\left[ \left( \zeta_i \dot{\mathbf{z}}_i \left[\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right]\right)^2\right] &= \zeta_i^2 \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \xi_i \dot{\mathbf{z}}_i \right)^2 \right] \\ &= \zeta_i^2 \left[\left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)^2 \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] + 2 \xi_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}^3_i \right] + \xi_i^2 \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^4 \right] \right] \end{aligned} \nonumber $$ Every term in the last expression carries a factor of $\zeta_i^2$ multiplying finite moments, and $\zeta_i \rightarrow 0$ as $\tau^2 \rightarrow 0$. Only the second squared term is non-zero under the null hypothesis, so: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left(\frac{\partial \ell(\mathbf{y}; \alpha, \tau^2)}{\partial \alpha_i}\right)^2 \right] \approx n_i \exp(\alpha_i) \end{aligned} $$
Proof.
$$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i \upsilon_l \right] &= \upsilon_i \upsilon_l \\ \mathbb{E}_{H_0}\left[ \upsilon_i \sum_{j = 1}^{n_l} (\mathbf{y}_{l, j} - \exp(\alpha_l)) \right] &= \upsilon_i \sum_{j = 1}^{n_l} \underbrace{\mathbb{E}_{H_0} \left[ \mathbf{y}_{l,j} - \exp(\alpha_l) \right]}_{=0} = 0 \\ \mathbb{E}_{H_0}\left[ \upsilon_i \zeta_l \dot{\mathbf{z}}_l \left(\xi_l \dot{\mathbf{z}}_l + \sum_{j = 1}^{n_l} \mathbf{z}_{l, j} \right) \right] &= \upsilon_i \zeta_l \xi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] + \upsilon_i \zeta_l \left( \sum_{j = 1}^{n_l} \mathbf{z}_{l,j} \right) \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0} \\ &= \upsilon_i \zeta_l \xi_l \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2 \mu_{l, j}^0 \\ &= \upsilon_i \zeta_l \xi_l \exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l, j}^2 \\ \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right) \left(\sum_{j' = 1}^{n_l} (\mathbf{y}_{l,j'} - \exp(\alpha_l)) \right)\right] &= \sum_{j = 1}^{n_i} \sum_{j' = 1}^{n_l} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{l,j'} - \exp(\alpha_l) \right]}_{=0} \\ &= 0 \\ \mathbb{E}_{H_0}\left[ \left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right) \zeta_l \dot{\mathbf{z}}_l\left( \xi_l \dot{\mathbf{z}}_l + \sum_{j = 1}^{n_l} \mathbf{z}_{l,j} \right)\right] &= \zeta_l \xi_l \left(\sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i)\right]}_{=0}\right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2\right] + \zeta_l \left(\sum_{j' = 1}^{n_l} \mathbf{z}_{l,j'}\right) \left(\sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0}\right) \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0} \\ &= 0 \\ \mathbb{E}_{H_0}\left[ \zeta_i \zeta_l \dot{\mathbf{z}}_i \dot{\mathbf{z}}_l \left(\xi_i \dot{\mathbf{z}}_i + \sum_{j =1}^{n_i} \mathbf{z}_{i,j}\right)\left(\xi_l \dot{\mathbf{z}}_l + \sum_{j = 1}^{n_l} \mathbf{z}_{l,j} \right)\right] &= \zeta_i \zeta_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \dot{\mathbf{z}}_l\left(\xi_i \xi_l \dot{\mathbf{z}}_i \dot{\mathbf{z}}_l + \xi_i \dot{\mathbf{z}}_i \sum_{j = 1}^{n_l} \mathbf{z}_{l,j} + \xi_l \dot{\mathbf{z}}_l \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} + \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)\left( \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}\right)\right) \right] \\ &= \zeta_i \zeta_l \left[ \xi_i \xi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] + \xi_i \left(\sum_{j = 1}^{n_l} \mathbf{z}_{l,j} \right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0} + \xi_l \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] \underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \right]}_{=0} + \left( \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right)\left( \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}\right)\underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i \right]}_{=0}\underbrace{\mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l \right]}_{=0}\right] \\ &= \zeta_i \zeta_l \xi_i \xi_l \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mu_{i,j}^0 \right) \left(\sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2 \mu_{l,j}^0 \right) \\ &= \zeta_i \zeta_l \xi_i \xi_l \exp(\alpha_i) \exp(\alpha_l) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \left(\sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2 \right) \end{aligned} \nonumber $$ As $\tau^2 \rightarrow 0$, all of the above terms go to zero, so we get: $$ \mathbb{E}_{H_0}\left[\frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_l} \right] \approx 0 $$ under the null hypothesis.
Proof.
$$ \begin{aligned} \mathbb{E}_{H_0}\left[ \upsilon_i \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right] &\approx \frac{1}{2} \upsilon_i \sum_{l = 1}^{k} \left( - \frac{2\upsilon_l}{\tau^2} + \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \right] \right) \\ &= \frac{1}{2} \upsilon_i \sum_{l = 1}^{k} \left(\exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2\right) \left(\psi_l - \frac{1}{1 + \tau^2 \exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2} \right) \end{aligned} \nonumber $$ using $\frac{2\upsilon_l}{\tau^2} = \frac{\exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2}{1 + \tau^2 \exp(\alpha_l) \sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2}$ and $\mathbb{E}_{H_0}[\dot{\mathbf{z}}_l^2] = \exp(\alpha_l)\sum_{j = 1}^{n_l} \mathbf{z}_{l,j}^2$. Since $\upsilon_i \rightarrow 0$ while the sum stays finite, this term vanishes as $\tau^2 \rightarrow 0$. Next: $$ \begin{aligned} \mathbb{E}_{H_0}\left[\left(\sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right) \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \right] &= \frac{1}{2}\sum_{l = 1}^k \left[ - \frac{2 \upsilon_l}{\tau^2} \sum_{j = 1}^{n_i} \underbrace{\mathbb{E}_{H_0}\left[ \mathbf{y}_{i,j} - \exp(\alpha_i) \right]}_{=0} + \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2 \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \right] \end{aligned} \nonumber $$ For $l \neq i$, the remaining expectation factors across independent clusters into $\mathbb{E}_{H_0}[\dot{\mathbf{z}}_l^2] \cdot 0 = 0$. For $l = i$, only the matched-index terms survive: $$ \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] = \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \mathbb{E}_{H_0}\left[ (\mathbf{y}_{i,j} - \exp(\alpha_i))^3 \right] = \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \nonumber $$ (the third central moment of a Poisson equals its mean), so this piece equals $\frac{1}{2}\psi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \rightarrow \frac{1}{2}\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2$ as $\tau^2 \rightarrow 0$. Finally, every term of $\mathbb{E}_{H_0}\left[ \zeta_i \dot{\mathbf{z}}_i \left[ \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right] \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right]$ carries the factor $\zeta_i$ multiplying finite moments, and $\zeta_i \rightarrow 0$ as $\tau^2 \rightarrow 0$, so this piece vanishes too. Thus: $$ \mathbb{E}_{H_0}\left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right] \approx \frac{1}{2} \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 $$ under the null hypothesis, which agrees with the negative expected Hessian computed below.
Proof.
$$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left( \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right)^2 \right] &= \frac{1}{4} \sum_{i = 1}^{k} \sum_{l = 1}^k \mathbb{E}_{H_0}\left[ \left(- \frac{2 \upsilon_i}{\tau^2} + \dot{\mathbf{z}}_i^2 \psi_i\right) \left(- \frac{2 \upsilon_l}{\tau^2} + \dot{\mathbf{z}}_l^2 \psi_l \right)\right] \\ &= \frac{1}{4} \sum_{i = 1}^k \sum_{l = 1}^k \left[ \frac{4 \upsilon_i \upsilon_l}{(\tau^2)^2} - \frac{2 \upsilon_i \psi_l }{\tau^2} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_l^2\right] - \frac{2 \upsilon_l \psi_i}{\tau^2} \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \right] + \psi_i \psi_l \mathbb{E}_{H_0}\left[ \dot{\mathbf{z}}_i^2 \dot{\mathbf{z}}_l^2 \right] \right] \end{aligned} \nonumber $$ For $i \neq l$, independence gives $\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2 \dot{\mathbf{z}}_l^2] = \mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2] \mathbb{E}_{H_0}[\dot{\mathbf{z}}_l^2]$. Now take $\tau^2 \rightarrow 0$: we have $\psi_i \rightarrow 1$ and $$ \frac{2\upsilon_i}{\tau^2} = \frac{\exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{1 + \tau^2 \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} \rightarrow \exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 = \mathbb{E}_{H_0}\left[\dot{\mathbf{z}}_i^2\right] \nonumber $$ (note that $\upsilon_i \rightarrow 0$, but $\upsilon_i / \tau^2$ does not). Each $(i, l)$ summand therefore converges to $\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2 \dot{\mathbf{z}}_l^2] - \mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2]\mathbb{E}_{H_0}[\dot{\mathbf{z}}_l^2]$, which is $0$ for $i \neq l$ and $\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^4] - \left(\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2]\right)^2$ for $i = l$. Plugging in the moments from above: $$ \begin{aligned} \mathbb{E}_{H_0}\left[ \left( \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2}\right)^2 \right] &\approx \frac{1}{4}\sum_{i = 1}^{k}\left[\left(\exp(\alpha_i) + 2\exp^2(\alpha_i)\right) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 + 2\exp^2(\alpha_i)\sum_{j = 1}^{n_i} \sum_{j' = 1 \\ j' \neq j}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \right] \end{aligned} \nonumber $$Putting the above together into a big matrix, we get:
\[\mathcal{I} \rvert_{H_0} = \begin{bmatrix} \mathcal{I}_{\alpha, \alpha} \rvert_{H_0} & \mathcal{I}_{\alpha, \tau^2}\rvert_{H_0} \\ \mathcal{I}^\top_{\alpha, \tau^2}\rvert_{H_0} & \mathcal{I}_{\tau^2, \tau^2}\rvert_{H_0} \end{bmatrix}\]where:
\[\begin{aligned} \mathcal{I}_{\alpha, \alpha}\rvert_{H_0} &= \begin{bmatrix} n_1 \exp(\alpha_1) & 0 & \dots & 0 \\ 0 & n_2 \exp(\alpha_2) & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & n_k \exp(\alpha_k) \end{bmatrix} \hspace{10mm} \mathcal{I}_{\alpha, \tau^2}\rvert_{H_0} = \begin{bmatrix} \frac{1}{2} \exp(\alpha_1) \left(\sum_{j = 1}^{n_1} \mathbf{z}_{1,j}^2 \right) \\ \vdots \\ \frac{1}{2} \exp(\alpha_k) \left(\sum_{j = 1}^{n_k} \mathbf{z}_{k,j}^2 \right) \end{bmatrix} \\ \mathcal{I}_{\tau^2, \tau^2} \rvert_{H_0} &= \begin{bmatrix} \frac{1}{4}\sum_{i = 1}^{k}\left[\left(\exp(\alpha_i) + 2\exp^2(\alpha_i)\right) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^4 + 2\exp^2(\alpha_i)\sum_{j = 1}^{n_i} \sum_{j' = 1 \\ j' \neq j}^{n_i} \mathbf{z}_{i,j}^2 \mathbf{z}_{i,j'}^2 \right] \end{bmatrix} \end{aligned}\] As a check, we can also obtain the information from the second derivatives. We'll need the following derivatives of the building blocks: \[\begin{aligned} \frac{d \upsilon_i}{d \alpha_i} &= \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} & \frac{d \xi_i}{d \alpha_i} &= - \frac{\exp(\alpha_i)(\tau^2 \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 )^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \zeta_i}{d \alpha_i} &= \frac{\tau^2 \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} & \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} &=-\exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \end{aligned}\]
Proof.
$$ \begin{aligned} \frac{d \upsilon_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[ \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)}\right] \\ &= \frac{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)(\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) - 2\left( \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2}{(2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2))^2} \\ &= \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \xi_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[ \frac{\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)} \right] \\ &= \frac{-2\exp(\alpha_i) (\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{(2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2))^2} \\ &= - \frac{\exp(\alpha_i)(\tau^2 \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 )^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \zeta_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[ \frac{\tau^2 \exp(\alpha_i)}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2} \right] \\ &= \frac{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)(\tau^2 \exp(\alpha_i)) - (\tau^2 \exp(\alpha_i))^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ &= \frac{\tau^2 \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} &= \frac{d}{d \alpha_i} \left[\sum_{j =1 }^{n_i} \mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \\ &= - \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j} \end{aligned} \nonumber $$From these identities, we can derive (component-by-component) the information as the negative expected Hessian:
\[- \mathbb{E}_{H_0}\left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} \right] = n_i \exp(\alpha_i)\]Proof.
$$ \begin{aligned} \frac{d}{d \alpha_i} \left[ \zeta_i \dot{\mathbf{z}}_i \right] &= \dot{\mathbf{z}}_i \frac{d \zeta_i}{\partial \alpha_i} + \zeta_i \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} \\ &= \frac{\tau^2 \exp(\alpha_i) \dot{\mathbf{z}}_i}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j})^2} - \zeta_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \\ \frac{d}{d \alpha_i} \left[ \xi_i \dot{\mathbf{z}}_i \right] &= \dot{\mathbf{z}}_i \frac{d \xi_i}{d \alpha_i} + \xi_i \frac{d \dot{\mathbf{z}}_i}{d \alpha_i} \\ &= \frac{\dot{\mathbf{z}}_i \exp(\alpha_i) (\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - \xi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \end{aligned} $$ $$ \begin{aligned} \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} &= \frac{d}{d \alpha_i} \left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \right] \\ &\approx \frac{d}{d \alpha_i} \left[ - \upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \dot{\mathbf{z}}_i \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \right] \\ &= - \frac{d \upsilon_i}{d \alpha_i} - n_i\exp(\alpha_i) - \xi_i \dot{\mathbf{z}}_i \frac{d}{d \alpha_i} \left[ \zeta_i \dot{\mathbf{z}}_i \right] - \zeta_i \dot{\mathbf{z}}_i \frac{d}{d \alpha_i} \left[ \xi_i \dot{\mathbf{z}}_i \right] - \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \frac{d}{d \alpha_i} \left[ \zeta_i \dot{\mathbf{z}}_i \right] \\ &= - \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - n_i \exp(\alpha_i) - \left(\xi_i \dot{\mathbf{z}}_i - \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left[\frac{\tau^2 \exp(\alpha_i) \dot{\mathbf{z}}_i}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} - \zeta_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right] - \zeta_i \dot{\mathbf{z}}_i \left[ \frac{\dot{\mathbf{z}}_i \exp(\alpha_i) (\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - \xi_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right] \\ &= - \frac{\tau^2\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\left(1 + \dot{\mathbf{z}}_i^2 \zeta_i \tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2)^2} - \left(\xi_i \dot{\mathbf{z}}_i - \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left[\frac{\tau^2 \exp(\alpha_i) \dot{\mathbf{z}}_i}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} - \zeta_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right] + \zeta_i \xi_i \dot{\mathbf{z}}_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} - n_i \exp(\alpha_i) \\ &= - \frac{\tau^2\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\left(1 + \dot{\mathbf{z}}_i^2 \zeta_i \tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2)^2} - \frac{\left(\xi_i \dot{\mathbf{z}}_i - \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \tau^2 \exp(\alpha_i) \dot{\mathbf{z}}_i}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} + \left( \zeta_i \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \left( 2\xi_i \dot{\mathbf{z}}_i - \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right) - n_i \exp(\alpha_i) \end{aligned} $$ Recall that $$\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i] = 0$$ and $$\mathbb{E}_{H_0} \left[ 
\dot{\mathbf{z}}_i^2 \right] = \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2$$, so taking the negative expectation of the above yields: $$ \begin{aligned} - \mathbb{E}_{H_0}\left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} \right] &= \frac{\tau^2\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\left(1 + \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \zeta_i \tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i}\mathbf{z}_{i,j}^2)^2} + \frac{\tau^2 \exp(\alpha_i)^2 \xi_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} + n_i \exp(\alpha_i) \\ \implies - \mathbb{E}_{H_0}\left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i^2} \right] \bigg\rvert_{H_0} &= n_i \exp(\alpha_i) \end{aligned} $$Proof.
$$ \begin{aligned} \frac{d}{d \tau^2} \left[ \frac{2 \upsilon_i}{\tau^2} \right] &= \frac{d}{d \tau^2} \left[ \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}\right] \\ &= -\frac{(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \psi_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[ \left(1 + 2 \tau^2 \exp(\alpha_i) \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 + (\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2 \right)^{-1} \right] \\ &= -\left(1 + 2 \tau^2 \exp(\alpha_i) \sum_{j =1}^{n_i} \mathbf{z}_{i,j}^2 + (\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2\right)^{-2}\left(2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 + 2 \tau^2 \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2 \right) \\ &= - 2\psi_i^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \end{aligned} \nonumber $$ It follows that: $$ \begin{aligned} \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d (\tau^2)^2} &= \frac{d}{d \tau^2} \left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \tau^2} \right] \\ &= \frac{d}{d \tau^2} \left[ \frac{1}{2} \sum_{i = 1}^k \left( - \frac{2 \upsilon_i}{\tau^2} + \dot{\mathbf{z}}_i^2 \psi_i \right) \right] \\ &= \frac{1}{2} \sum_{i = 1}^k \left( \frac{(\exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{( 1+ \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - 2 \dot{\mathbf{z}}_i^2 \psi_i^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \right) \end{aligned} \nonumber $$ And thus: $$ \begin{aligned} - \mathbb{E}_{H_0} \left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d (\tau^2)^2} \right] &= \frac{1}{2} \sum_{i = 1}^k \left[ - \frac{(\exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{( 1+ \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + 2 \mathbb{E}\left[ \dot{\mathbf{z}}_i^2 \right] \psi_i^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \right] \\ &=\frac{1}{2} \sum_{i = 1}^k \left[ - \frac{(\exp(\alpha_i)\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2}{( 1+ \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + 2 \psi_i^2 \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2 \left(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right) \right] \end{aligned} \nonumber $$ Evaluating this under $H_0$, we get: $$ \begin{aligned} - \mathbb{E}_{H_0} \left[ \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d (\tau^2)^2} \right] \bigg\rvert_{H_0} &= \frac{1}{2} \sum_{i = 1}^{k} \left[ -\left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2 + 2 \left( \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2\right] \\ &= \frac{1}{2} \sum_{i = 1}^{k} \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \right)^2 \\ \end{aligned} \nonumber $$Proof.
Proof.
$$ \begin{aligned} \frac{d \upsilon_i}{d\tau^2} &= \frac{d}{d \tau^2} \left[ \frac{\tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)} \right] \\ &= \frac{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) - 2\tau^2 \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2}{(2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2))^2} \\ &= \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \xi_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[\frac{\tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)} \right] \\ &= \frac{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) - 2 \tau^2 \exp(\alpha_i) \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right)^2}{\left(2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2) \right)^2} \\ &= \frac{\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2 (1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \\ \frac{d \zeta_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[ \frac{\tau^2 \exp(\alpha_i)}{1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}}\right] \\ &= \frac{\exp(\alpha_i)(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}) - \exp(\alpha_i)^2 \tau^2 \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} \\ &= \frac{\exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2} \\ \frac{d \dot{\mathbf{z}}_i}{d \tau^2} &= \frac{d}{d \tau^2} \left[ \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}(\mathbf{y}_{i,j} - \exp(\alpha_i)) \right] \\ &= 0 \end{aligned} \nonumber $$
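The same style of symbolic check covers all three quotient-rule derivatives in this lemma. In the sketch below, `E`, `S`, and `Sz` abbreviate $\exp(\alpha_i)$, $\sum_j \mathbf{z}_{i,j}^2$, and $\sum_j \mathbf{z}_{i,j}$ respectively, with `t` again standing in for $\tau^2$ (shorthand chosen here for readability only):

```python
# Symbolic confirmation of d(upsilon_i)/d(tau^2), d(xi_i)/d(tau^2),
# and d(zeta_i)/d(tau^2). Shorthand: E = exp(alpha_i),
# S = sum_j z_{i,j}^2, Sz = sum_j z_{i,j}, t = tau^2.
import sympy as sp

t, E, S, Sz = sp.symbols("t E S Sz", positive=True)

upsilon = t * E * S / (2 * (1 + t * E * S))
assert sp.simplify(sp.diff(upsilon, t) - E * S / (2 * (1 + t * E * S) ** 2)) == 0

xi = t * S / (2 * (1 + t * E * S))
assert sp.simplify(sp.diff(xi, t) - S / (2 * (1 + t * E * S) ** 2)) == 0

zeta = t * E / (1 + t * E * Sz)
assert sp.simplify(sp.diff(zeta, t) - E / (1 + t * E * Sz) ** 2) == 0

print("all three derivative identities verified")
```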
Proof.
$$ \begin{aligned} \frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i d \tau^2} &= \frac{d}{d \tau^2} \left[ \frac{d \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i} \right] \\ &= \frac{d}{d \tau^2} \left[ - \upsilon_i + \sum_{j = 1}^{n_i} (\mathbf{y}_{i,j} - \exp(\alpha_i)) - \zeta_i \dot{\mathbf{z}}_i \left( \xi_i \dot{\mathbf{z}}_i + \sum_{j = 1}^{n_i} \mathbf{z}_{i,j} \right) \right] \\ &= - \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - \dot{\mathbf{z}}_i^2 \left( \xi_i \frac{d \zeta_i}{d \tau^2} + \zeta_i \frac{d \xi_i}{d \tau^2}\right) - \dot{\mathbf{z}}_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right) \frac{d \zeta_i}{d \tau^2} \\ &= - \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} - \dot{\mathbf{z}}_i^2 \left( \frac{\xi_i \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}+\frac{\zeta_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \right) - \dot{\mathbf{z}}_i \left(\sum_{j = 1}^{n_i} \mathbf{z}_{i,j}\right) \left(\frac{\exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}\right)\\ \end{aligned} \nonumber $$ Taking the negative expectation of the above under $H_0$, the final term vanishes because $\mathbb{E}_{H_0}[\dot{\mathbf{z}}_i] = 0$: $$ \begin{aligned} -\mathbb{E}_{H_0}\left[\frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i d \tau^2} \right] &= \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + \mathbb{E}_{H_0}[\dot{\mathbf{z}}_i^2] \left( \frac{\xi_i \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}+\frac{\zeta_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \right) \\ &= \frac{\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} + \left(\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2\right) \left( \frac{\xi_i \exp(\alpha_i)}{(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j})^2}+\frac{\zeta_i \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2}{2(1 + \tau^2 \exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2)^2} \right) \end{aligned} \nonumber $$ And evaluating under $H_0$, where $\xi_i = \zeta_i = 0$ since both carry a factor of $\tau^2$: $$ -\mathbb{E}_{H_0}\left[\frac{d^2 \ell(\mathbf{y}; \alpha, \tau^2)}{d \alpha_i d \tau^2} \right] \bigg\rvert_{H_0} = \frac{1}{2}\exp(\alpha_i) \sum_{j = 1}^{n_i} \mathbf{z}_{i,j}^2 \nonumber $$
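Putting the three expected second derivatives together, the Fisher information at $\tau^2 = 0$ has a simple closed form: a diagonal $\alpha$-block with entries $n_i \exp(\alpha_i)$, cross terms $\frac{1}{2}\exp(\alpha_i)\sum_j \mathbf{z}_{i,j}^2$, and the $\tau^2$ entry $\frac{1}{2}\sum_i \big(\exp(\alpha_i)\sum_j \mathbf{z}_{i,j}^2\big)^2$. A standard next step for a score test is to profile out $\alpha$ via the efficient information $I_{\tau\tau} - I_{\tau\alpha} I_{\alpha\alpha}^{-1} I_{\alpha\tau}$. The sketch below assembles these blocks numerically; the function names are made up for illustration, and the efficient-information step is the generic construction rather than necessarily the exact statistic used in this post:

```python
# Assemble the (k+1) x (k+1) Fisher information at tau^2 = 0 from the
# closed-form entries derived above. Function names are illustrative.
import numpy as np

def fisher_info_under_h0(alpha, z_sq_sums, n_sizes):
    # alpha:     length-k array of intercepts alpha_i
    # z_sq_sums: length-k array of sum_j z_{i,j}^2
    # n_sizes:   length-k array of cluster sizes n_i
    alpha, z_sq_sums, n_sizes = map(np.asarray, (alpha, z_sq_sums, n_sizes))
    k = len(alpha)
    e = np.exp(alpha)
    info = np.zeros((k + 1, k + 1))
    info[:k, :k] = np.diag(n_sizes * e)              # -E[d^2 l / d alpha_i^2]
    info[:k, k] = 0.5 * e * z_sq_sums                # -E[d^2 l / d alpha_i d tau^2]
    info[k, :k] = info[:k, k]
    info[k, k] = 0.5 * np.sum((e * z_sq_sums) ** 2)  # -E[d^2 l / d (tau^2)^2]
    return info

def efficient_info_tau2(info):
    # I_tt - I_ta @ I_aa^{-1} @ I_at; since I_aa is diagonal under H_0,
    # this equals I_tt - sum_i I_at[i]^2 / (n_i * exp(alpha_i)).
    k = info.shape[0] - 1
    I_aa, I_at, I_tt = info[:k, :k], info[:k, k], info[k, k]
    return I_tt - I_at @ np.linalg.solve(I_aa, I_at)

# Two-cluster example with made-up numbers, purely for illustration:
info = fisher_info_under_h0(alpha=[0.1, -0.3], z_sq_sums=[4.0, 6.0], n_sizes=[5, 8])
print(efficient_info_tau2(info))
```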
References

- Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2011). Applied longitudinal analysis (2nd ed.). Wiley.
- Lin, X. (1997). Variance component testing in generalised linear models with random effects. Biometrika, 84(2), 309–326. doi:10.1093/biomet/84.2.309