This post serves as a general summary of work related to testing and estimation for generalized linear mixed models. We’ll use the $\log(\cdot)$ link function as an example (a Poisson response) and extend the setting to multiple responses.
There is a little bit of abuse of notation. If we write $\text{diag}(c)$ for a scalar $c$, we mean the diagonal matrix, of whatever dimension the context requires, with every diagonal element equal to $c$.
Set-Up - Generalized Linear Mixed Model
We have $n$ observations, and for the $i$-th observation we have a response, $y_i$ (let $y$ denote the full data). We will have $p$ fixed effect covariates arranged in a $p$-dimensional vector, $x_{i}$, and $q$ random effects covariates in a $q$-dimensional vector, $z_{i}$.
Let $\text{vec}(\gamma)$ denote the vectorization of a matrix $\gamma$, formed by concatenating its columns into one long vector. We assume that we have a $p$-dimensional fixed effect coefficient vector, $\alpha$, and a $q$-dimensional random effect coefficient vector, $\beta$. In addition, we assume $\beta \sim F(\text{vec}(0), D(\theta))$ for a distribution, $F$, and covariance matrix, $D(\theta)$, depending on an $m$-dimensional variance component vector, $\theta$.
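If it helps to see the vectorization convention concretely, column-stacking is what numpy calls Fortran ("F") order; a two-line check:

```python
import numpy as np

# vec() stacks the columns of a matrix into one long vector;
# in numpy this is column-major ("F", Fortran-order) raveling.
G = np.array([[1, 2],
              [3, 4]])
vec_G = G.ravel(order="F")  # columns [1, 3] then [2, 4]
```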
Furthermore, we assume that, conditional on the random effects, the responses are independent with conditional means $\mathbb{E}[y_i \rvert \beta] = \mu_i$ and variances $\text{Var}(y_i \rvert \beta) = \phi \omega_i^{-1}v(\mu_i)$ for scale parameter, $\phi > 0$, weight $\omega_i$, and variance function $v(\cdot)$. Our model comes in the form of a specification of the conditional mean. For a monotonic and differentiable link function (e.g. $\log(\cdot)$ or $\text{logit}(\cdot)$), we assume:
\[g(\mu_i) = x_i^\top \alpha + z_i^\top \beta \label{eq:glmm}\]In matrix notation, where $\mu$ is the $n$-dimensional vector of conditional means, $X$ is the $n \times p$ fixed effects covariate matrix ($x_i$ are its rows), and $Z$ is the $n \times q$ random effects covariate matrix ($z_i$ are its rows), Eq. \eqref{eq:glmm} becomes:
\[g(\mu) = X \alpha + Z \beta \label{eq:glmm-mat}\]Additional Assumptions.
- The third and higher moments of $\beta$ are of order $o(\rvert \rvert \theta \rvert \rvert)$
- WLOG, the entries of $D(\theta)$ are linear in $\theta$
- WLOG, $D(\theta) = 0$ (the zero matrix) when $\theta = \text{vec}(0)$
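To make the set-up concrete, here is a minimal numpy sketch of simulating from Model \eqref{eq:glmm-mat} with a $\log(\cdot)$ link. The dimensions, coefficient values, and the choice of a Gaussian $F$ are illustrative assumptions, not anything prescribed above:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, q = 200, 2, 3                 # observations, fixed effects, random effects
alpha = np.array([0.5, -0.25])      # fixed effect coefficients (illustrative)
theta = 0.3                         # variance component
D = theta * np.eye(q)               # D(theta): linear in theta, 0 at theta = 0

X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
beta = rng.multivariate_normal(np.zeros(q), D)  # random effects draw

eta = X @ alpha + Z @ beta          # linear predictor: g(mu) = X alpha + Z beta
mu = np.exp(eta)                    # log link, so mu = exp(eta)
y = rng.poisson(mu)                 # conditionally independent Poisson responses
```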
Instead of just $n$ observations, we'll have $n$ samples of $k$ targets (or phenotypes). Our data will then be a count table, $Y$, that is $n \times k$, some fixed covariates, $X$, that is $n \times p$, and some random covariates, $Z$, that is $n \times q$. We'll also have a $p \times k$ fixed effects matrix $\alpha$ and $q \times k$ random effects matrix $\beta$.
We'll have a scalar- rather than vector-valued variance component that we call $\tau$. We'll assume $\text{vec}(\beta) \sim \mathcal{N}(\text{vec}(0), \tau^2 \Sigma_Z \otimes \Sigma_T)$ for the $qk$-dimensional vector of zeroes, $\text{vec}(0)$, a $q \times q$ random effects covariance matrix, $\Sigma_Z$, and a $k \times k$ target covariance matrix, $\Sigma_T$.
To simplify things, we'll assume that $p = q = 1$, so $\Sigma_Z = 1$. We'll further assume that $\Sigma_T = \mathbb{I}_{k \times k}$, the $k \times k$ identity matrix. Let $\Sigma = \Sigma_Z \otimes \Sigma_T = \Sigma_T$.
Assume that functions defined on scalars are applied element-wise to vectors and matrices. If we let $\mu$ denote the $n \times k$ matrix where the $(i,j)$ entry is $\mu_{i,j}$, we can write our vectorized model as: $$ g(\text{vec}(\mu)) = \text{vec}(X \alpha) + \text{vec}(Z \beta) \nonumber $$ Since we assume $p = q = 1$, our data matrices, $X$ and $Z$, have only one column each. We'll use the notation $X_i$ and $Z_i$ to denote the $i$-th entry of these.
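Here is the analogous simulation sketch for the multi-target set-up under the simplifications above ($p = q = 1$, $\Sigma = \mathbb{I}_{k \times k}$); the numeric values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 5                             # samples and targets; p = q = 1

alpha = rng.normal(size=(1, k))           # 1 x k fixed effects
tau = 0.2
beta = rng.normal(0.0, tau, size=(1, k))  # vec(beta) ~ N(vec(0), tau^2 I_k)

X = rng.normal(size=(n, 1))               # single fixed covariate
Z = rng.normal(size=(n, 1))               # single random covariate

eta = X @ alpha + Z @ beta                # n x k matrix of linear predictors
mu = np.exp(eta)                          # log link applied element-wise
Y = rng.poisson(mu)                       # n x k count table
```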
Quasi-Likelihood
Define $\ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta) = \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du$, the log quasi-likelihood of $(\alpha, \theta)$ given $y_i$, conditional on $\beta$. We define the quasi-likelihood of $\alpha$ and $\theta$ given the complete data, $y$, conditional on the random effects, $\beta$, as:
\[\begin{aligned} \mathcal{L}_\text{quasi}(\alpha, \theta; y \rvert \beta) &= \prod_{i = 1}^n \mathcal{L}_\text{quasi}(\alpha, \theta; y_i \rvert \beta) \\ &= \prod_{i = 1}^n \exp \left( \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta) \right) \\ &= \prod_{i = 1}^n \exp \left( \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du \right) \\ &= \exp \left( \sum_{i = 1}^n \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du \right) \end{aligned} \label{eq:c-q-lik}\]An Aside On Quasi-Likelihood.
In the following, the conditioning on $\beta$ has been suppressed. The quasi-likelihood is motivated by the so-called quasi-score, $U(\mu_i; y_i) = \frac{\omega_i(y_i - \mu_i)}{\phi v(\mu_i)}$.
Notice that the following properties hold for $U$: $$ \begin{aligned} \mathbb{E}[U(\mu_i; y_i)] &= \mathbb{E}\left[ \frac{\omega_i(y_i - \mu_i)}{\phi v(\mu_i)} \right] \\ &= \frac{\omega_i(\mathbb{E}[y_i] - \mu_i)}{\phi v(\mu_i)} \\ &= \frac{\omega_i(\mu_i - \mu_i)}{\phi v(\mu_i)} \\ &= 0 \end{aligned} \nonumber $$ $$ \begin{aligned} \text{Var}(U(\mu_i; y_i)) &= \text{Var}\left( \frac{\omega_i(y_i - \mu_i)}{\phi v(\mu_i)} \right) \\ &= \frac{\omega_i^2}{\phi^2 v^2(\mu_i)} \text{Var}(y_i - \mu_i) \\ &= \frac{\omega_i^2}{\phi^2 v^2(\mu_i)} \text{Var}(y_i) \\ &= \frac{\omega_i^2}{\phi^2 v^2(\mu_i)} (\phi \omega_i^{-1} v(\mu_i)) \\ &= \frac{\omega_i}{\phi v(\mu_i)} \end{aligned} \nonumber $$ $$ \begin{aligned} - \mathbb{E}\left[ \frac{\partial}{\partial \mu_i} U(\mu_i; y_i) \right] &= -\mathbb{E}\left[ \frac{\partial}{\partial \mu_i} \frac{\omega_i(y_i - \mu_i)}{\phi v(\mu_i)} \right] \\ &= -\mathbb{E}\left[ \frac{-\omega_i \phi v(\mu_i) - \omega_i(y_i - \mu_i)\phi v'(\mu_i)}{\phi^2 v^2(\mu_i)} \right] \\ &= -\frac{-\omega_i \phi v(\mu_i) - \omega_i\mathbb{E}\left[ (y_i - \mu_i) \right]\phi v'(\mu_i)}{\phi^2 v^2(\mu_i)} \\ &= -\frac{-\omega_i \phi v(\mu_i) - \omega_i (\mu_i - \mu_i) \phi v'(\mu_i)}{\phi^2 v^2(\mu_i)} \\ &= -\frac{-\omega_i \phi v(\mu_i)}{\phi^2 v^2(\mu_i)} \\ &= \frac{\omega_i}{\phi v(\mu_i)} \end{aligned} \nonumber $$ These are exactly the properties enjoyed by the score function, which is very useful in situations where a likelihood function can be constructed. The basic idea of quasi-likelihood is to use the quasi-score to build a log quasi-likelihood, and to use that in cases where a likelihood function cannot be constructed or identified: $$ \ell_{\text{quasi}}(\mu_i; y_i) = \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du \nonumber $$
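In the Poisson case we care about ($\phi = \omega_i = 1$, $v(\mu_i) = \mu_i$), all three properties are easy to check by simulation; the particular $\mu$ and sample size below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 4.0
y = rng.poisson(mu, size=1_000_000)

U = (y - mu) / mu         # quasi-score with phi = omega_i = 1, v(mu) = mu
dU = -y / mu**2           # derivative of U with respect to mu

mean_U = U.mean()         # should land near 0
var_U = U.var()           # should land near 1/mu = 0.25
neg_mean_dU = -dU.mean()  # should land near 1/mu = 0.25
```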
Let $f$ denote the density associated with $F$, the distribution of the random effects. The integrated quasi-likelihood of $(\alpha, \theta)$ is given by integrating $\beta$ out of the log quasi-likelihood of $(\alpha, \theta, \beta)$, which itself is the product of the conditional quasi-likelihood and the likelihood of $\beta$:
\[\begin{aligned} \mathcal{L}_\text{quasi}(\alpha, \theta; y) &= \int \mathcal{L}_\text{quasi}(\alpha, \theta; y \rvert \beta) f(\beta) d\beta \\ &= \int \exp \left( \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right) f(\beta) d\beta \\ &= \int \exp \left( \sum_{i = 1}^n \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du \right) f(\beta) d\beta \end{aligned} \label{eq:comp-q-lik}\]

Since we have assumed a Poisson model, $v(\mu_{i,j}) = \mu_{i,j}$ and there is no overdispersion, which implies that $\phi = 1$. We'll just say that $\omega_{i,j} = 1$ 🤷♂️. This means the conditional log quasi-likelihood for a single observation (sample $i$, target $j$) is given by: $$ \begin{aligned} \ell_\text{quasi}(\alpha_{\cdot, j}, \tau; Y_{i,j}, X_{i, \cdot}, Z_{i, \cdot} \rvert \beta_{\cdot, j}) &= Y_{i,j} \left( \log(\mu_{i,j}) - \log(Y_{i,j}) \right) - \left( \mu_{i,j} - Y_{i,j} \right) \end{aligned} \label{eq:example-quasi-log-lik} $$ where $M_{\cdot, i}$ and $M_{i, \cdot}$ denote the $i$-th column and row of matrix $M$, respectively.
Details.
$$ \begin{aligned} \ell_\text{quasi}(\alpha_{\cdot, j}, \tau; Y_{i,j}, X_{i, \cdot}, Z_{i, \cdot} \rvert \beta_{\cdot, j}) &= \int_{Y_{i, j}}^{\mu_{i,j}} \frac{(Y_{i,j} - u)}{u} du \\ &= \int_{Y_{i,j}}^{\mu_{i,j}} \left( \frac{Y_{i,j}}{u} - \frac{u}{u} \right) du \\ &= Y_{i,j} \int_{Y_{i,j}}^{\mu_{i,j}} \frac{1}{u} du - \int_{Y_{i,j}}^{\mu_{i,j}} 1 du \\ &= Y_{i,j} \left( \log(\mu_{i,j}) - \log(Y_{i,j}) \right) - \left( \mu_{i,j} - Y_{i,j} \right) \end{aligned} \nonumber $$Details
$$ \begin{aligned} \mathcal{L}_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta) &= \exp \left( \sum_{i = 1}^n \sum_{j = 1}^k \ell_\text{quasi}(\alpha_{\cdot, j}, \tau; Y_{i, j}, X_{i, \cdot}, Z_{i, \cdot} \rvert \beta_{\cdot, j}) \right) \\ &= \exp \left( \sum_{i = 1}^n \sum_{j = 1}^k \left[ Y_{i,j} \left( \log(\mu_{i,j}) - \log(Y_{i,j}) \right) - \left( \mu_{i,j} - Y_{i,j} \right) \right] \right) \\ &= \exp \left( \sum_{i = 1}^n \sum_{j = 1}^k \left[ Y_{i,j} \log(\mu_{i,j}) - \mu_{i,j} - Y_{i,j}\log(Y_{i,j}) + Y_{i,j} \right] \right) \end{aligned} \nonumber $$

Taylor Expansion
The complexity of Eq. \eqref{eq:comp-q-lik} can lead to difficulty in parameter estimation for Model \eqref{eq:glmm-mat}. (That is, I do not know how to integrate that monster.)
First, let’s rewrite the conditional quasi-likelihood of $(\alpha, \theta)$ given $\beta$ using a Taylor expansion. We’ll use a second-order expansion since the remainder will be relatively simple due to the assumptions made on the higher order moments of the random effects.
Denote the first- and second-order partial derivatives of $\mathcal{L}_{\text{quasi}}(\alpha, \theta; y \rvert \beta)$ evaluated at $\beta = \beta_0$ with:
\[\begin{aligned} \mathcal{L}'_\text{quasi}(\beta_0) &= \exp\left( \ell_\text{quasi}(\alpha, \theta; y \rvert \beta_0) \right) \left(\sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta_0)}{\partial \eta_i} z_i \right)\\ \mathcal{L}''_\text{quasi}(\beta_0) &= \exp\left( \ell_\text{quasi}(\alpha, \theta; y \rvert \beta_0) \right) \left[ \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta_0)}{\partial \eta_i} z_i \right) \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta_0)}{\partial \eta_i} z_i^\top \right) + \sum_{i = 1}^n \frac{\partial^2 \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta_0)}{\partial \eta_i^2} z_i z_i^\top \right] \end{aligned} \label{eq:likelihood-derivs}\]First Order Derivations.
We use the fact that $g(\mu_i) = \eta_i = x_i^\top \alpha + z_i^\top \beta$. $$ \begin{aligned} \mathcal{L}'_\text{quasi} &= \frac{\partial}{\partial \beta} \left[ \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \right] \\ &= \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \frac{\partial}{\partial \beta} \left[ \sum_{i = 1}^n \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta) \right] \\ &= \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \left(\sum_{i = 1}^n \frac{\partial}{\partial \beta} \left[ \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du \right] \right)\\ &= \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \left( \sum_{i = 1}^n \frac{\partial}{\partial \eta_i} \left[ \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du \right] \frac{\partial \eta_i}{\partial \beta} \right) \\ &= \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \left( \sum_{i = 1}^n \frac{\partial}{\partial \eta_i} \left[ \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta) \right] z_i\right) \\ &= \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta)}{\partial \eta_i} z_i\right) \end{aligned} \nonumber $$Second Order Derivations.
Differentiating $\mathcal{L}'_\text{quasi}$ once more with the product rule gives the second derivative stated in Eq. \eqref{eq:likelihood-derivs}: $$ \begin{aligned} \mathcal{L}''_\text{quasi} &= \frac{\partial}{\partial \beta^\top} \left[ \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta)}{\partial \eta_i} z_i \right) \right] \\ &= \exp\left(\ell_{\text{quasi}}(\alpha, \theta; y \rvert \beta) \right) \left[ \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta)}{\partial \eta_i} z_i \right) \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta)}{\partial \eta_i} z_i^\top \right) + \sum_{i = 1}^n \frac{\partial^2 \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \beta)}{\partial \eta_i^2} z_i z_i^\top \right] \end{aligned} \nonumber $$Due to our assumptions, we can adjust Eqs. \eqref{eq:likelihood-derivs} to accommodate the multiple targets. $$ \begin{aligned} \mathcal{L}'_\text{quasi}(\beta_0) &= \exp(\ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)) \left( \sum_{i = 1}^n Z_i \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)}{\partial \eta_{i, \cdot}}\right) \\ \mathcal{L}''_\text{quasi}(\beta_0) &= \exp(\ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)) \left[ \left(\sum_{i = 1}^n Z_i \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)}{\partial \eta_{i, \cdot}}\right) \left(\sum_{i = 1}^n Z_i \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)}{\partial \eta^\top_{i, \cdot}}\right) + \sum_{i = 1}^n Z_i^2 \frac{\partial^2 \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)}{\partial \eta_{i, \cdot} \partial \eta_{i, \cdot}^\top}\right] \end{aligned} \label{eq:ex-likelihood-derivs} $$ In our example, we have that $\log(\mu_{i,j}) = \eta_{i,j}$. Thus: $$ \begin{aligned} \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta)}{\partial \eta_{i,\cdot}} &= Y_{i,\cdot} - \mu_{i,\cdot} & \frac{\partial^2 \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta)}{\partial \eta_{i,\cdot} \partial \eta^\top_{i, \cdot}} &= \text{diag}\left( - \mu_{i, \cdot} \right) \end{aligned} \nonumber $$
Details.
$$ \begin{aligned} \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)}{\partial \eta_{i,j}} &= \frac{\partial}{\partial \eta_{i,j}} \left[ Y_{i,j}(\log(\mu_{i,j}) - \log(Y_{i,j})) - (\mu_{i,j} - Y_{i,j}) \right] \\ &= \frac{\partial}{\partial \eta_{i,j}} \left[ Y_{i,j}(\eta_{i,j} - \log(Y_{i,j})) - (\exp(\eta_{i,j}) - Y_{i,j}) \right] \\ &= Y_{i,j} - \exp(\eta_{i,j}) \\ &= Y_{i,j} - \mu_{i,j} \\ \implies \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)}{\partial \eta_{i, \cdot}} &= Y_{i, \cdot} - \mu_{i, \cdot} \end{aligned} \nonumber $$ and also: $$ \begin{aligned} \frac{\partial^2 \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \beta_0)}{\partial \eta_{i,j}^2} &= \frac{\partial}{\partial \eta_{i,j}} \left[ Y_{i,j} - \mu_{i,j} \right] \\ &= - \exp(\eta_{i,j}) \\ &= - \mu_{i,j} \end{aligned} \nonumber $$ All of the other (cross) second-order partial derivatives are $0$.

The Taylor expansion of $\exp \left( \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right)$ at $\beta_0$ is:
\[\begin{aligned} \exp \left( \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right) &= \mathcal{L}_\text{quasi}(\beta_0) + (\beta - \beta_0)^\top \mathcal{L}'_\text{quasi}(\beta_0) + \frac{1}{2}(\beta - \beta_0)^\top \mathcal{L}''_\text{quasi}(\beta_0) (\beta - \beta_0) + R \end{aligned} \label{eq:taylor-expansion}\]where $R$ is a remainder term containing the higher order partial derivatives. Let $\bar{R}$ denote the same terms divided by $\mathcal{L}(\beta_0)$ (i.e. we just factor out the leading term of $\exp(\ell_\text{quasi}(\alpha, \theta; y \rvert \beta_0))$). We’ll do the expansion about $\beta_0 = \text{vec}(0)$ (the mean of the random effects), so Eq. \eqref{eq:taylor-expansion} simplifies to:
\[\begin{aligned} \exp \left( \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right) &= \exp \left( \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) \right) \left[ 1 + \beta^\top \left(\sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \text{vec}(0))}{\partial \eta_i} z_i \right) \right. \\ &\hspace{6mm} \left. + \frac{1}{2}\beta^\top \left[ \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \text{vec}(0))}{\partial \eta_i} z_i \right) \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \text{vec}(0))}{\partial \eta_i} z_i^\top \right) + \sum_{i = 1}^n \frac{\partial^2 \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \text{vec}(0))}{\partial \eta_i^2} z_i z_i^\top \right] \beta + \bar{R} \right] \end{aligned} \nonumber\]We can approximate the unconditional quasi-likelihood in Eq. \eqref{eq:comp-q-lik} as the expectation of our approximation of $\exp \left( \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \beta) \right)$ with respect to the probability measure specified by $F$:
\[\begin{aligned} \mathcal{L}(\alpha, \theta; y) &= \int \exp \left( \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \beta) \right) f(\beta) d\beta \\ &= \mathbb{E} \left[ \exp \left( \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \beta) \right) \right] \\ &= \exp \left( \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) \right) \left(1 + \frac{1}{2} \text{tr}\left[ \left( \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \text{vec}(0))}{\partial \eta_i} z_i \right) \left( \sum_{i = 1}^n \frac{\partial \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \text{vec}(0))}{\partial \eta_i} z_i^\top \right) + \sum_{i = 1}^n \frac{\partial^2 \ell_{\text{quasi}}(\alpha, \theta; y_i \rvert \text{vec}(0))}{\partial \eta_i^2} z_i z_i^\top \right) D(\theta) \right] + o(\rvert \rvert \theta \rvert \rvert) \right) \end{aligned} \label{eq:taylor-expansion-2}\]Derivation Of Quasi-Likelihood.
Please be aware that I am no expert on vector calculus, so the following is a bit rough.

Let $\frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \beta)}{\partial \eta}$ denote the $n$-dimensional vector of first-order partial derivatives. Similarly, let $\frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \beta)}{\partial \eta \partial \eta^\top}$ be the $n \times n$ matrix of second-order partial derivatives.
We can write our Taylor expansion of the log quasi-likelihood about $\beta = \text{vec}(0)$ in matrix notation as:
\[\begin{aligned} \ell_\text{quasi}(\alpha, \theta; y) &= \log \left(\mathcal{L}_\text{quasi}(\alpha, \theta; y) \right) \\ &= \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) + \log\left( 1 + \frac{1}{2}\text{tr}\left[ Z^\top \left( \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top} \right) Z D(\theta) \right] + o(\rvert \rvert \theta \rvert \rvert) \right) \\ &\overset{(i)}{\approx} \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) +\frac{1}{2}\text{tr}\left[ Z^\top \left( \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top} \right) Z D(\theta) \right] \end{aligned} \label{eq:log-quasi-lik-exp}\]where the approximation in $(i)$ uses the fact that $\log(1 + x) \approx x$ for small enough $x$ and that $o(\rvert \rvert \theta \rvert \rvert)$ is small enough…
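As a sanity check that the final approximation is actually computable, here is a sketch that evaluates Eq. \eqref{eq:log-quasi-lik-exp} for simulated single-response Poisson data, where (as derived later for the log link) the gradient at $\beta = \text{vec}(0)$ is $y - \mu$ and the Hessian is $\text{diag}(-\mu)$; dimensions and parameter values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, q = 40, 2, 3
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
alpha = np.array([0.3, -0.2])       # illustrative values
theta = 0.05
D = theta * np.eye(q)               # D(theta), linear in theta

mu0 = np.exp(X @ alpha)             # conditional means at beta = vec(0)
y = rng.poisson(mu0).astype(float)

# Poisson pieces at beta = vec(0): ell_i = y(log mu - log y) - (mu - y),
# with the y*log(y) term taken to be 0 at y = 0
with np.errstate(divide="ignore", invalid="ignore"):
    ylogy = np.where(y > 0, y * np.log(y), 0.0)
ell0 = np.sum(y * np.log(mu0) - ylogy - (mu0 - y))

d1 = y - mu0                        # gradient wrt eta at vec(0)
H = np.diag(-mu0)                   # Hessian wrt eta at vec(0)
approx_loglik = ell0 + 0.5 * np.trace(Z.T @ (np.outer(d1, d1) + H) @ Z @ D)
```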
In our example, we let the covariance of the random effects be $\tau^2 \Sigma$ for $k \times k$ identity matrix $\Sigma$.
Let $\ell^{(i,j)}_\text{quasi}(\alpha_{\cdot, j}, \tau \rvert \mathbf{0}) = \ell_\text{quasi}(\alpha_{\cdot, j}, \tau; Y_{i,j}, X_{i, \cdot}, Z_{i, \cdot} \rvert \text{vec}(0))$, where $\text{vec}(0) = \beta_{0; \cdot, j}$, the $j$-th column of $\beta_0$ under the null hypothesis. We can use the forms we found in Eq. \eqref{eq:ex-likelihood-derivs} and plug into the approximations found above: $$ \begin{aligned} \ell_\text{quasi}(\alpha, \tau; Y, X, Z) &\approx \sum_{i = 1}^n \sum_{j = 1}^k \ell^{(i,j)}_\text{quasi}(\alpha, \tau \rvert \mathbf{0}) + \frac{\tau^2}{2} \text{tr}\left[ \left[ \left(\sum_{i = 1}^n Z_i \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \mathbf{0})}{\partial \eta_{i, \cdot}} \right) \left(\sum_{i = 1}^n Z_i \frac{\partial \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \mathbf{0})}{\partial \eta^\top_{i, \cdot}} \right) + \left(\sum_{i = 1}^n Z_i^2 \frac{\partial^2 \ell_\text{quasi}(\alpha, \tau; Y, X, Z \rvert \mathbf{0})}{\partial \eta_{i, \cdot} \partial \eta_{i, \cdot}^\top} \right) \right] \Sigma \right]\\ &= \sum_{i = 1}^n \sum_{j = 1}^k \ell^{(i,j)}_\text{quasi}(\alpha, \tau \rvert \mathbf{0}) + \frac{\tau^2}{2} \text{tr} \left[ \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i, \cdot})\right) \right] \end{aligned} $$
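The trace in the last line above reduces to a simple sum over targets, which makes a quick numerical check possible (simulated values; the leading $\tau^2$ factor is omitted since it does not affect the identity):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 50, 4
Z = rng.normal(size=n)              # single random covariate (q = 1)
mu = rng.uniform(1.0, 5.0, size=(n, k))
Y = rng.poisson(mu)

a = (Z[:, None] * (Y - mu)).sum(axis=0)  # k-vector: sum_i Z_i (Y_{i,.} - mu_{i,.})
M = np.outer(a, a) - np.diag(Z**2 @ mu)  # matrix inside the trace (Sigma = I)

trace_form = 0.5 * np.trace(M)
sum_form = 0.5 * ((a**2).sum() - (Z**2 @ mu).sum())  # target-by-target sum
```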
Global Test
In (Lin, 1997), I believe they use the identity $[g^{-1}(\eta_i)]' = \frac{1}{g'(\mu_i)}$. This holds if you assume that $g(\cdot)$ is strictly monotone and differentiable with a derivative that is never $0$, so that $g(\cdot)$ is invertible and the inverse function rule applies. I will continue under this assumption.
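For the $\log(\cdot)$ link this identity is easy to spot-check numerically (the evaluation point is arbitrary):

```python
import numpy as np

# check [g^{-1}(eta)]' = 1 / g'(mu) for g = log, via a central difference
eta, h = 0.7, 1e-6
mu = np.exp(eta)                                     # mu = g^{-1}(eta)
lhs = (np.exp(eta + h) - np.exp(eta - h)) / (2 * h)  # [g^{-1}]'(eta)
rhs = 1.0 / (1.0 / mu)                               # 1 / g'(mu), g'(mu) = 1/mu
```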
First, let’s do some intermediate work that will make things easier later on. Let $f'(\mu_i)$ denote the first derivative of $f$ with respect to $\mu_i$. Define the following:
\[\begin{array}{ll} \eta_i = g(\mu_i) & \delta_i = \frac{1}{g'(\mu_i)} \\ V(\mu_i) = \phi (\omega_i)^{-1} v(\mu_i) & w_i = \frac{1}{V(\mu_i) (g'(\mu_i))^2} \\ e_i = \frac{V'(\mu_i)g'(\mu_i) + V(\mu_i)g''(\mu_i)}{V^2(\mu_i)(g'(\mu_i))^3} & w_{0, i} = w_i + e_i(y_i - \mu_i) \end{array} \label{eq:intermed-terms}\]Let $\Delta$ and $W$ be the diagonal matrices with elements $\delta_i$ and $w_i$, respectively. Similarly, let $W_0$ be the diagonal matrix with $w_{0, i}$ as its elements.
\[\begin{array}{ll} \frac{\partial \mu_i}{\partial \eta_i} = \frac{1}{g'(\mu_i)} & \frac{\partial}{\partial \eta_i} \left[ \frac{1}{g'(\mu_i)} \right] = -\frac{g''(\mu_i)}{(g'(\mu_i))^3} \\ \frac{\partial}{\partial \mu_i} \left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right] = \frac{y_i - \mu_i}{V(\mu_i)} & \frac{\partial}{\partial \eta_i} \left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right] = \frac{y_i - \mu_i}{V(\mu_i) g'(\mu_i)} \end{array} \label{eq:helper-1}\]Details.
$$ \begin{aligned} \frac{\partial}{\partial \eta_i} \left[ \mu_i \right] = \frac{\partial}{\partial \eta_i} \left[ g^{-1}(\eta_i) \right] = \left[ g^{-1}(\eta_i) \right]' = \frac{1}{g'(\mu_i)} \end{aligned} \label{eq:deriv-mu-eta} \nonumber $$ $$ \begin{aligned} \frac{\partial}{\partial \eta_i} \left[\frac{1}{g'(\mu_i)} \right] = -\frac{1}{(g'(\mu_i))^2} \cdot \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i} \left[ g'(\mu_i) \right] = -\frac{g''(\mu_i)}{(g'(\mu_i))^3} \end{aligned} \label{eq:deriv-mu-eta-2} \nonumber $$ $$ \begin{aligned} \frac{\partial}{\partial \mu_i} \left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right] = \frac{\partial}{\partial \mu_i} \left[ \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du\right] = \frac{\omega_i(y_i - \mu_i)}{\phi v(\mu_i)} = \frac{y_i - \mu_i}{V(\mu_i)} \end{aligned} \label{eq:deriv-ell-mu} \nonumber $$ $$ \begin{aligned} \frac{\partial}{\partial \eta_i} \left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right] = \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial}{\partial \mu_i} \left[ \int_{y_i}^{\mu_i} \frac{\omega_i(y_i - u)}{\phi v(u)} du\right] = \frac{\omega_i (y_i - \mu_i)}{\phi v(\mu_i)} \cdot \frac{1}{g'(\mu_i)} = \frac{y_i - \mu_i}{V(\mu_i) g'(\mu_i)} \end{aligned} \label{eq:deriv-ell-eta} \nonumber $$We can use the above results to show that:
\[\begin{array}{ll} \frac{\partial}{\partial \eta_i} \left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0)) \right] = w_i \delta_i^{-1} (y_i - \mu_i) & \frac{\partial}{\partial \eta}\left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0)) \right] = W \Delta^{-1}(y - \mu) \\ -\frac{\partial^2}{\partial \eta_i^2}\left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0)) \right] = w_i + e_i(y_i - \mu_i) & -\frac{\partial^2}{\partial \eta \partial \eta^\top}\left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0)) \right] = W_0 \end{array} \label{eq:helper-2}\]Details.
$$ \begin{aligned} \frac{\partial^2}{\partial \eta_j \partial \eta_i} \left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right] &= \frac{\partial}{\partial \eta_j} \left[ \frac{\omega_i(y_i - \mu_i)}{\phi v(\mu_i)} \cdot \frac{1}{g'(\mu_i)} \right] \\ &= 0 \end{aligned} \nonumber $$ $$ \begin{aligned} \frac{\partial^2}{\partial \eta_i^2} \left[ \ell_\text{quasi}(\alpha, \theta; y \rvert \beta) \right] &= \frac{\partial}{\partial \eta_i} \left[ \frac{y_i - \mu_i}{V(\mu_i)} \cdot \frac{1}{g'(\mu_i)} \right] \\ &= \frac{1}{g'(\mu_i)} \frac{\partial}{\partial \eta_i} \left[ \frac{y_i - \mu_i}{V(\mu_i)} \right] + \left[ \frac{y_i - \mu_i}{V(\mu_i)} \right] \frac{\partial}{\partial \eta_i} \left[ \frac{1}{g'(\mu_i)} \right] \\ &= \frac{1}{g'(\mu_i)} \left( \frac{-V(\mu_i)\left(\frac{1}{g'(\mu_i)}\right) - (y_i - \mu_i)V'(\mu_i)\left(\frac{1}{g'(\mu_i)}\right)}{V^2(\mu_i)}\right) - \frac{g''(\mu_i)}{(g'(\mu_i))^3} \left( \frac{y_i - \mu_i}{V(\mu_i)}\right) \\ &= -\frac{1}{V(\mu_i) (g'(\mu_i))^2} - \frac{(y_i - \mu_i)V'(\mu_i)}{(g'(\mu_i))^2 V^2(\mu_i)} - \frac{g''(\mu_i)}{(g'(\mu_i))^3} \left( \frac{y_i - \mu_i}{V(\mu_i)}\right) \\ &= -\frac{1}{V(\mu_i)(g'(\mu_i))^2} - \left( \frac{V'(\mu_i)g'(\mu_i) + g''(\mu_i)V(\mu_i)}{V^2(\mu_i)(g'(\mu_i))^3} \right) (y_i - \mu_i) \\ &= -w_{0, i} \end{aligned} \nonumber $$In our example, we have $g(\cdot) = \log(\cdot)$. Thus: $$ \begin{array}{ll} \mu_{i,j} = \exp(\eta_{i,j}) & \delta_{i,j} = \mu_{i,j} \\ V(\mu_{i,j}) = \mu_{i,j} & w_{i,j} = \mu_{i,j} \\ e_{i,j} = 0 & w_{0, i,j} = \mu_{i,j} \end{array} \nonumber $$
Details.
$$ \begin{array}{l} \mu_{i,j} = g^{-1}(\eta_{i,j}) = \exp(\eta_{i,j}) \\ \delta_{i,j} = \frac{1}{g'(\mu_{i,j})} = \frac{1}{\frac{1}{\mu_{i,j}}} = \mu_{i,j} \\ V(\mu_{i,j}) = v(\mu_{i,j}) = \mu_{i,j} \\ w_{i,j} = \frac{1}{V(\mu_{i,j}) (g'(\mu_{i,j}))^2} = \frac{1}{\mu_{i,j}(\frac{1}{\mu^2_{i,j}})} = \mu_{i,j} \\ e_{i,j} = \frac{V'(\mu_{i,j})g'(\mu_{i,j}) + V(\mu_{i,j})g''(\mu_{i,j})}{V^2(\mu_{i,j})(g'(\mu_{i,j}))^3} = \frac{\frac{1}{\mu_{i,j}} - \mu_{i,j}\frac{1}{\mu^2_{i,j}}}{V^2(\mu_{i,j}) (g'(\mu_{i,j}))^3} = 0 \\ w_{0,i,j} = w_{i,j} + e_{i,j}(Y_{i,j} - \mu_{i,j}) = w_{i,j} = \mu_{i,j} \end{array} \nonumber $$As in (Lin, 1997), we’ll assume that the cumulants of $y_i$ of order greater than $2$ satisfy $\kappa_{(r + 1),i} = \kappa_{2,i} \frac{\partial \kappa_{r, i}}{\partial \mu_i}$ for $r = 2, 3$. Recall that the second cumulant is the variance, so $\kappa_{2, i} = V(\mu_i) = \phi(\omega_i)^{-1}v(\mu_i)$:
\[\begin{array}{l} \kappa_{3,i} = (\phi(\omega_i)^{-1})^2v(\mu_i) v'(\mu_i) \\ \kappa_{4, i} = (\phi(\omega_i)^{-1})^3v(\mu_i)((v'(\mu_i))^2 + v(\mu_i)v''(\mu_i)) \end{array} \label{eq:cumulant-conditions}\]Details.
$$ \begin{aligned} \kappa_{3,i} &= \kappa_{2,i} \frac{\partial \kappa_{2,i}}{\partial \mu_i} \\ &= \phi(\omega_i)^{-1}v(\mu_i) \frac{\partial}{\partial \mu_i} \left[ \phi(\omega_i)^{-1}v(\mu_i)\right] \\ &= (\phi(\omega_i)^{-1})^2v(\mu_i) v'(\mu_i) \end{aligned} \nonumber $$ $$ \begin{aligned} \kappa_{4,i} &= \kappa_{2,i} \frac{\partial \kappa_{3,i}}{\partial \mu_i} \\ &= \phi(\omega_i)^{-1}v(\mu_i) \frac{\partial}{\partial \mu_i} \left[ (\phi(\omega_i)^{-1})^2v(\mu_i) v'(\mu_i) \right] \\ &= (\phi(\omega_i)^{-1})^3v(\mu_i)((v'(\mu_i))^2 + v(\mu_i)v''(\mu_i)) \end{aligned} \nonumber $$

Scores
This is where things get messy. I’ve included some additional work for the case where $k = 1$ as a sanity check.
So that our example is not so trivial, we’ll let $p$ be an arbitrary positive integer but keep $q = 1$. This implies that $\beta \sim \mathcal{N}(0, \tau^2)$ is scalar.
This leaves us with outcome vector $Y$, which is $n \times 1$; fixed covariate matrix $X$, which is $n \times p$; random covariate matrix $Z$, which is $n \times 1$; fixed effects vector $\alpha$, which is $p \times 1$; and random effect $\beta$, which is a scalar. Similar to above, we’ll let $\mu$ denote the $n \times 1$ vector of means.
Score For $\theta$
The score for $\theta$ has elements:
\[\begin{aligned} U_{\theta_j} &= \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \theta_j} \\ &= \frac{1}{2} \text{tr}\left[ \left( \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top} \right) ZD'_j(\theta)Z^\top\right] \end{aligned} \nonumber\]where we use $D’_{j}(\theta)$ to denote the matrix of partial derivatives with respect to $\theta_j$.
Derivation Of $U_{\theta_j}$.
We can calculate the quasi-score vector for $\theta$ by finding the vector of first-order partial derivatives of $\ell_\text{quasi}(\alpha, \theta; y)$ with respect to the components of $\theta$ and evaluating it under the null hypothesis. Let $B = \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top}$. Then the quasi-score is the $m$-dimensional vector with $j$-th component: $$ \begin{aligned} U_{\theta_j} &= \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \theta_j} \\ &= \frac{\partial}{\partial \theta_j} \left[ \underbrace{\sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0))}_{(a)} + \log\left( 1 + \frac{1}{2}\text{tr}\left[ Z^\top \left( \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top} \right) Z D(\theta) \right] + o(\rvert \rvert \theta \rvert \rvert) \right)\right] \\ &\overset{(i)}{\approx} \frac{\partial}{\partial \theta_j}\left[\log \left(1 + \frac{1}{2} \text{tr}\left[ Z^\top B Z D(\theta) \right] + o(\rvert \rvert \theta \rvert \rvert) \right) \right] \\ &\overset{(ii)}{\approx} \frac{\partial}{\partial \theta_j}\left[ \frac{1}{2} \text{tr}\left[ Z^\top B Z D(\theta) \right] \right] \\ &\overset{(iii)}{=} \frac{1}{2} \frac{\partial}{\partial \theta_j}\left[ \text{tr}\left[ B Z D(\theta) Z^\top \right] \right] \\ &\overset{(iv)}{=} \frac{1}{2} \text{tr}\left[ B Z D'_j(\theta) Z^\top \right] \end{aligned} \nonumber $$ where we have used the expansion from Eq. \eqref{eq:log-quasi-lik-exp}.

$(i)$ is due to the fact that $(a)$ has no $\theta$, so the derivative is $0$.
$(ii)$ is due to the fact that we assume $o(\rvert \rvert \theta \rvert \rvert)$ is small enough and use the fact that $\log(1 + x) \approx x$ for small enough $x$.
$(iii)$ is due to the cyclic property of the matrix trace.
$(iv)$ follows from the following argument: $$ \begin{aligned} \frac{\partial}{\partial \theta_j}\left[ \text{tr}\left[ B Z D(\theta) Z^\top \right] \right] &= \frac{\partial}{\partial \theta_j} \left[ \sum_{i = 1}^{n} \sum_{l = 1}^n B_{i,l} (Z D(\theta) Z^\top)_{i,l} \right] \\ &= \sum_{i = 1}^n \sum_{l = 1}^n \left( (ZD(\theta)Z^\top)_{i,l}\frac{\partial}{\partial \theta_j} B_{i,l} + B_{i,l} \frac{\partial}{\partial \theta_j} (ZD(\theta)Z^\top)_{i,l} \right) \\ &= \sum_{i = 1}^n \sum_{l = 1}^n \left( B_{i,l} (ZD'_j(\theta)Z^\top)_{i,l} \right) \\ &= \text{tr}\left[ BZ D'_j(\theta) Z^\top \right] \end{aligned} \nonumber $$ where in the second to last line we use the fact that there is no $\theta$ in $B$.
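Because $D(\theta)$ is linear in $\theta$, the trace is linear in $\theta$ too, and step $(iv)$ can be verified with a finite difference; $B$, $Z$, and the basis matrices $D_1$, $D_2$ below are arbitrary stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)
n, q = 6, 3
B = rng.normal(size=(n, n))
Z = rng.normal(size=(n, q))
D1, D2 = np.eye(q), np.ones((q, q))   # D(theta) = theta_1 D1 + theta_2 D2
theta = np.array([0.7, 0.2])

def trace_term(t):
    D = t[0] * D1 + t[1] * D2         # entries of D linear in theta
    return np.trace(B @ Z @ D @ Z.T)

h = 1e-6
e1 = np.array([h, 0.0])
fd = (trace_term(theta + e1) - trace_term(theta - e1)) / (2 * h)
exact = np.trace(B @ Z @ D1 @ Z.T)    # tr[B Z D'_1(theta) Z^T]
```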
We can rewrite the score using the terms we defined in Eqs. \eqref{eq:intermed-terms}, \eqref{eq:helper-1}, \eqref{eq:helper-2}:
\[\begin{aligned} U_{\theta_j} &= \frac{1}{2}\left( (y - \mu)^\top \Delta^{-1} W Z D'_j(\theta) Z^\top W \Delta^{-1} (y - \mu) - \text{tr}\left[W_0 Z D'_j(\theta) Z^\top\right] \right)\\ &= \frac{1}{2}\left( \sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_j(\theta) Z^\top \right)_{i,i} + 2\sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \right) \end{aligned} \label{eq:rewrite-u-theta-j}\]Rewriting $U_{\theta_j}$.
$$ \begin{aligned} U_{\theta_j} &= \frac{1}{2} \text{tr}\left[ \left( \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top} \right) ZD'_j(\theta)Z^\top\right] \\ &= \frac{1}{2}\text{tr}\left[(W \Delta^{-1}(y - \mu)(y - \mu)^\top \Delta^{-1}W - W_0)Z D'_j(\theta) Z^\top\right] \\ &= \frac{1}{2}\left( \text{tr}\left[W \Delta^{-1}(y - \mu)(y - \mu)^\top \Delta^{-1}WZ D'_j(\theta) Z^\top \right] - \text{tr}\left[W_0 Z D'_j(\theta) Z^\top\right] \right)\\ &= \frac{1}{2}\left( \text{tr}\left[ (y - \mu)^\top \Delta^{-1} W Z D'_j(\theta) Z^\top W \Delta^{-1}(y - \mu) \right] - \text{tr}\left[W_0 Z D'_j(\theta) Z^\top\right] \right) \\ &\overset{(i)}{=} \frac{1}{2}\left( (y - \mu)^\top \Delta^{-1} W Z D'_j(\theta) Z^\top W \Delta^{-1} (y - \mu) - \text{tr}\left[W_0 Z D'_j(\theta) Z^\top\right] \right) \\ \end{aligned} $$ For $(i)$, we use the fact that the matrix product inside the first trace is a scalar.
We can expand out the matrix/vector multiplication to get: $$ \begin{aligned} U_{\theta_j}(\hat{\alpha}_0) &= \frac{1}{2}\left( (y - \mu)^\top \Delta^{-1} W Z D'_j(\theta) Z^\top W \Delta^{-1} (y - \mu) - \text{tr}\left[W_0 Z D'_j(\theta) Z^\top\right] \right)\\ &= \frac{1}{2}\left( \sum_{i = 1}^n \sum_{l = 1}^n \left[ (y_i - \mu_i) (y_l - \mu_l) \left(\Delta^{-1} W Z D'_j(\theta) Z^\top W \Delta^{-1}\right)_{i,l} \right] - \sum_{i = 1}^n \sum_{l = 1}^n \left[(W_0)_{i,l} (Z D'_j(\theta) Z^\top)_{i,l} \right] \right) \\ &\overset{(i)}{=} \frac{1}{2}\left( \sum_{i = 1}^n \sum_{l = 1}^n \left[ (y_i - \mu_i) (y_l - \mu_l) \delta^{-1}_{i}\delta^{-1}_{l} w_{i} w_{l} \left( Z D'_j(\theta) Z^\top \right)_{i,l} \right] - \sum_{i = 1}^n \left[w_{0,i} (Z D'_j(\theta) Z^\top)_{i,i} \right] \right) \\ &= \frac{1}{2}\left( \sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_j(\theta) Z^\top \right)_{i,i} + 2\sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \right) \\ \end{aligned} \nonumber $$ In $(i)$, we use the fact that $W$, $W_0$, and $\Delta$ are diagonal matrices. We then evaluate the score by plugging in $\alpha = \hat{\alpha}_0$ (where $\hat{\alpha}_0$ is the MLE under the null) and $\theta = \text{vec}(0)$.
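The equivalence between the matrix form and the elementwise expansion can be checked numerically. Everything below is an arbitrary stand-in (random values for $y - \mu$ and for the diagonals of $W$, $\Delta^{-1}$, $W_0$; a symmetric stand-in for $D'_j(\theta)$), not a fitted model:

```python
# Check: quadratic form minus trace equals the diagonal-plus-twice-off-diagonal expansion.
import numpy as np

rng = np.random.default_rng(1)
n, q = 5, 3
Z = rng.normal(size=(n, q))
Dp = rng.normal(size=(q, q)); Dp = (Dp + Dp.T) / 2   # stand-in for D'_j(theta), symmetric
A = Z @ Dp @ Z.T                                     # Z D'_j Z^T, symmetric
resid = rng.normal(size=n)                           # stand-in for y - mu
w = rng.uniform(1, 2, size=n)                        # diagonal of W
d_inv = rng.uniform(1, 2, size=n)                    # diagonal of Delta^{-1}
w0 = rng.uniform(1, 2, size=n)                       # diagonal of W_0

v = d_inv * w * resid                                # Delta^{-1} W (y - mu), elementwise
matrix_form = v @ A @ v - np.sum(w0 * np.diag(A))

# elementwise expansion: squared-diagonal terms plus twice the i' < i terms
expanded = np.sum((resid**2 * d_inv**2 * w**2 - w0) * np.diag(A))
for i in range(n):
    for ip in range(i):
        expanded += 2 * resid[i] * resid[ip] * d_inv[i] * d_inv[ip] * w[i] * w[ip] * A[i, ip]

assert np.isclose(matrix_form, expanded)
```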
I think it's actually easier to just differentiate the log quasi-likelihood directly, since $\tau$ is scalar valued: $$ \begin{aligned} U_\tau &\approx \tau \text{tr} \left[ \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i, \cdot})\right) \right] \end{aligned} \nonumber $$
Details.
$$ \begin{aligned} U_\tau &\approx \frac{\partial}{\partial \tau} \left[ \ell_\text{quasi}(\alpha, \tau; Y, X, Z) \right] \\ &= \frac{\partial}{\partial \tau} \left[ \sum_{i = 1}^n \sum_{j = 1}^k \ell^{(i,j)}_\text{quasi}(\alpha, \tau \rvert \mathbf{0}) + \frac{\tau^2}{2} \text{tr} \left[ \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i, \cdot})\right) \right] \right] \\ &= \tau \text{tr} \left[ \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i, \cdot})\right) \right] \end{aligned} \nonumber $$ This can be further simplified: $$ \begin{aligned} U_\tau &\approx \tau \sum_{j = 1}^k \left[ \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i, \cdot})\right) \right]_{j, j} \\ &= \tau \sum_{j = 1}^k \left[ \left(\sum_{i = 1}^n Z_i (Y_{i, j} - \mu_{i, j}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, j} - \mu_{i, j}) \right) - \left( \sum_{i = 1}^n Z_i^2 \mu_{i,j} \right) \right]\\ &= \tau \left( \sum_{j = 1}^k Z^\top (Y_{\cdot, j} - \mu_{\cdot, j}) \left( Z^\top (Y_{\cdot, j} - \mu_{\cdot, j}) \right) - \sum_{j = 1}^k \left( \sum_{i = 1}^n Z_i^2 \mu_{i,j}\right) \right) \\ &= \tau \left( \sum_{j = 1}^k \left(Z^\top (Y_{\cdot, j} - \mu_{\cdot, j})\right)^2 - \sum_{j = 1}^k \left( \sum_{i = 1}^n Z_i^2 \mu_{i, j} \right) \right) \\ &= \tau \left( \left(Z^\top \sum_{j = 1}^k (Y_{\cdot, j} - \mu_{\cdot, j}) \right)^2 - \sum_{i = 1}^n Z_i^2 \sum_{j = 1}^k \mu_{i, j}\right) \\ &= \tau \left( \left(Z^\top (Y - \mu) \mathbf{1}_{k} \right)^2 - \sum_{i = 1}^n Z_i^2 \left( \mu_{i, \cdot} \mathbf{1}_k \right) \right) \\ \end{aligned} \nonumber $$ where $\mathbf{1}_k$ is a $k$-dimensional vector of 
$1$s. Multiplying by $\mathbf{1}_k$ produces the vector of row sums.

Score For $\alpha$
Recall that we assumed that $D(\theta) = 0$ if $\theta = 0$ (see additional assumptions from set-up). This simplifies the score for $\alpha$ tremendously. As in the case of $U_{\theta_j}$, let $B = \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top}$.
\[\begin{aligned} U_{\alpha_j} &= \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \alpha_j} \\ &= X_{\cdot, j}^\top W \Delta^{-1} (y - \mu) + \frac{1}{2} \text{tr} \left[ B'_j Z D(\theta) Z^\top \right] \end{aligned} \label{eq:u-alpha-j}\]where $X_{\cdot, j}$ denotes the $j$-th column of $X$ (the vector with elements $x_{i,j}$), and $B'_j$ denotes the matrix of elementwise partial derivatives of $B$ with respect to $\alpha_j$.
Derivation Of $U_{\alpha_j}$.
$$ \begin{aligned} U_{\alpha_j} &= \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \alpha_j} \\ &= \frac{\partial}{\partial \alpha_j} \left[ \sum_{i = 1}^n \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) + \log \left(1 + \frac{1}{2}\text{tr} \left[ Z^\top \left( \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta^\top} + \frac{\partial^2 \ell_\text{quasi}(\alpha, \theta; y \rvert \text{vec}(0))}{\partial \eta \partial \eta^\top} \right) Z D(\theta) \right] + o(\rvert \rvert \theta \rvert \rvert) \right) \right] \\ &\overset{(i)}{\approx} \sum_{i = 1}^n \frac{\partial}{\partial \alpha_j} \left[ \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) \right] + \frac{1}{2} \frac{\partial}{\partial \alpha_j} \left[ \text{tr}\left[ Z^\top B Z D(\theta)\right] \right] \\ &= \sum_{i = 1}^n \frac{\partial}{\partial \alpha_j} \left[ \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) \right] + \frac{1}{2} \frac{\partial}{\partial \alpha_j} \left[ \text{tr}\left[ B Z D(\theta) Z^\top\right] \right] \\ &\overset{(ii)}{=} \sum_{i = 1}^n \frac{\partial}{\partial \alpha_j} \left[ \ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) \right] + \frac{1}{2} \text{tr}\left[ B'_j Z D(\theta) Z^\top \right] \\ &\overset{(iii)}{=} \sum_{i = 1}^n w_i \delta_i^{-1} (y_i - \mu_i) x_{i,j} + \frac{1}{2} \text{tr}\left[ B'_j Z D(\theta) Z^\top \right] \\ &= W \Delta^{-1} \text{diag}(X_{\cdot, j}) (y - \mu) + \frac{1}{2} \text{tr} \left[ B'_j Z D(\theta) Z^\top \right] \end{aligned} \nonumber $$ $(i)$ is due to the assumption that $o(\rvert \rvert \theta \rvert \rvert)$ is small enough and that $\log(1 + x) \approx x$ for small enough $x$.$(ii)$ is due to the following: $$ \begin{aligned} \frac{\partial}{\partial \alpha_j} \left[\text{tr}\left[ B Z D(\theta) Z^\top\right] \right] &= \text{tr}\left[ B'_j Z D(\theta) Z^\top 
\right] \\ &= \frac{\partial}{\partial \alpha_j} \left[ \sum_{i = 1}^n \sum_{i' = 1}^n B_{i,i'} (Z D(\theta) Z^\top)_{i,i'} \right] \\ &= \sum_{i = 1}^n \sum_{i' = 1}^n \frac{\partial}{\partial \alpha_j}\left[ B_{i,i'} \right] (Z D(\theta) Z^\top)_{i,i'} \\ &= \text{tr} \left[ B'_j Z D(\theta) Z^\top \right] \end{aligned} \nonumber $$ $(iii)$ is due to the following: $$ \begin{aligned} \frac{\partial}{\partial \alpha_j} \left[\ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) \right] &= \frac{\partial \mu_i}{\partial \alpha_j} \frac{\partial}{\partial \mu_i} \left[\ell_\text{quasi}(\alpha, \theta; y_i \rvert \text{vec}(0)) \right] \\ &= \left( \frac{x_{i,j}}{g'(\mu_i)}\right) \left(\frac{y_i - \mu_i}{V(\mu_i)} \right) \\ &= \frac{(y_i - \mu_i)(x_{i,j})}{V(\mu_i) g'(\mu_i)} \\ &= w_i \delta_i^{-1} (y_i - \mu_i) x_{i,j} \end{aligned} \nonumber $$
As before, we evaluate the score by plugging in $\alpha = \hat{\alpha}_0$ (where $\hat{\alpha}_0$ is the MLE under the null) and $\theta = \text{vec}(0)$. Since this means we set $D(\theta) = 0$, the score for $\alpha$ simplifies to:
\[U_{\alpha_j} = X_{\cdot, j}^\top W \Delta^{-1} (y - \mu)\]In general, $\alpha$ is a matrix, so we have to do some vectorization. $U_\alpha$ should be a $(p \times k)$-dimensional vector. Since $p = 1$ in our case, it will just have $k$ dimensions. $$ U_{\alpha_j} = \sum_{i = 1}^n X_i (Y_{i,j} - \exp(X_i \alpha_j)) \implies U_\alpha \approx \sum_{i = 1}^n X_i (Y_{i,\cdot} - \mu_{i,\cdot})^\top \label{eq:alpha-j-example} $$
Details.
$$ \begin{aligned} U_{\alpha_j} &\approx \frac{\partial}{\partial \alpha_{j}} \left[ \sum_{i = 1}^n \sum_{j' = 1}^k \ell^{(i,j')}_\text{quasi}(\alpha, \tau \rvert \mathbf{0}) + \frac{\tau^2}{2} \text{tr}\left[ \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i, \cdot})\right) \right] \right] \\ &\overset{(i)}{\approx} \frac{\partial}{\partial \alpha_{j}} \left[ \sum_{i = 1}^n \sum_{j' = 1}^k \ell^{(i,j')}_\text{quasi}(\alpha, \tau \rvert \mathbf{0}) \right] \\ &= \sum_{i = 1}^n \sum_{j' = 1}^k \frac{\partial}{\partial \alpha_{j}} \left[ Y_{i, j'}(X_i \alpha_{j'}) - \exp(X_i \alpha_{j'}) - \log(Y_{i,j'}) + Y_{i,j'} \right] \\ &= \sum_{i = 1}^n X_i Y_{i,j} - X_i \exp(X_i \alpha_j) \\ &= \sum_{i = 1}^n X_i (Y_{i,j} - \exp(X_i \alpha_j)) \\ \implies U_\alpha &\overset{(ii)}{\approx} \sum_{i = 1}^n X_i(Y_{i, \cdot} - \mu_{i, \cdot})^\top \end{aligned} $$ In the third line, only the $j' = j$ term depends on $\alpha_j$, which collapses the inner sum. For $(i)$, when we plug in $\tau = 0$ (under the null), the second term disappears. For $(ii)$, we note that $\mu$ is evaluated under the null ($\beta = \mathbf{0}$).
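Since the example is fully specified (Poisson response, log link, $p = 1$), Eq. \eqref{eq:alpha-j-example} can be spot-checked against a finite-difference gradient of the Poisson log quasi-likelihood. This is my own check with simulated data; the covariates, coefficients, and sizes are made up:

```python
# Check the analytic score U_alpha = sum_i X_i (Y_{i,.} - mu_{i,.}) against
# a central-difference gradient of the Poisson log-likelihood in alpha.
import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 3
X = rng.normal(size=n)                    # p = 1: one scalar covariate per observation
alpha = rng.normal(scale=0.3, size=k)     # hypothetical coefficients, one per response column
mu = np.exp(np.outer(X, alpha))           # mu_{i,j} = exp(X_i alpha_j) under the null
Y = rng.poisson(mu).astype(float)

def loglik(a):
    # Poisson log quasi-likelihood, dropping terms that do not involve alpha
    eta = np.outer(X, a)
    return np.sum(Y * eta - np.exp(eta))

U = X @ (Y - mu)                          # analytic score, one entry per response column

eps = 1e-6
fd_grad = np.array([
    (loglik(alpha + eps * np.eye(k)[j]) - loglik(alpha - eps * np.eye(k)[j])) / (2 * eps)
    for j in range(k)
])
assert np.allclose(U, fd_grad, rtol=1e-4, atol=1e-5)
```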
Test Statistic
Let $\tilde{\mathcal{I}}$ denote the efficient information matrix of $\theta$ evaluated under the null hypothesis:
\[\tilde{\mathcal{I}} = \mathcal{I}_{\theta, \theta} - \mathcal{I}_{\alpha, \theta}^\top \mathcal{I}_{\alpha, \alpha}^{-1} \mathcal{I}_{\alpha, \theta}\]
An Aside On $\tilde{\mathcal{I}}$.
As I understand it, the formula $\tilde{\mathcal{I}} = \mathcal{I}_{\theta, \theta} - \mathcal{I}_{\alpha, \theta}^\top \mathcal{I}_{\alpha, \alpha}^{-1} \mathcal{I}_{\alpha, \theta}$ arises from standard likelihood theory. We can think of $U_\theta$ and $U_\alpha$ as components of one big score: $$ U = \begin{bmatrix} U_\theta \\ U_\alpha \end{bmatrix} \nonumber $$ Then we can partition the information matrix into the parts that correspond to the parameters of interest, $\theta$, and the nuisance parameters, $\alpha$: $$ \mathcal{I} = \begin{bmatrix} \mathcal{I}_{\theta, \theta} & \mathcal{I}_{\theta, \alpha} \\ \mathcal{I}_{\alpha, \theta} & \mathcal{I}_{\alpha, \alpha} \end{bmatrix} \nonumber $$
We are interested in $\theta$, which means we only really care about the portion of $\mathcal{I}^{-1}$ corresponding to $\theta$ (denote this by $\left(\mathcal{I}^{-1}\right)_{\theta, \theta}$). However: $\left(\mathcal{I}^{-1}\right)_{\theta, \theta} \neq \left(\mathcal{I}_{\theta, \theta} \right)^{-1}$ because we need to account for $\alpha$!
We then make use of the Schur complement for inverting a partitioned matrix. For a partitioned matrix, $M$, we have: $$ M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \implies M^{-1} = \begin{bmatrix} (A - BD^{-1} C)^{-1} & -(A - B D^{-1} C)^{-1} BD^{-1} \\ -D^{-1} C (A - BD^{-1}C)^{-1} & D^{-1} + D^{-1}C(A - B D^{-1} C)^{-1} B D^{-1} \end{bmatrix} \nonumber $$ We can think of this as a loss of information or an adjustment of the information matrix that stems from the estimation of $\alpha$.
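A small numeric sketch of this fact (my own check, using a random positive-definite matrix as a stand-in for the information matrix): the $\theta$-block of the full inverse equals the inverse of the Schur complement.

```python
# Check: (I^{-1})_{theta,theta} = (I_tt - I_ta I_aa^{-1} I_at)^{-1} for a
# partitioned symmetric positive-definite matrix.
import numpy as np

rng = np.random.default_rng(3)
m_t, m_a = 2, 3                                # dims of the theta and alpha blocks
G = rng.normal(size=(m_t + m_a, m_t + m_a))
I = G @ G.T + np.eye(m_t + m_a)                # symmetric positive-definite "information" stand-in

Itt = I[:m_t, :m_t]
Ita = I[:m_t, m_t:]                            # I_{theta, alpha} = I_{alpha, theta}^T by symmetry
Iaa = I[m_t:, m_t:]

# efficient information: Schur complement of the alpha block
I_tilde = Itt - Ita @ np.linalg.inv(Iaa) @ Ita.T

top_left = np.linalg.inv(I)[:m_t, :m_t]        # theta-block of the full inverse
assert np.allclose(top_left, np.linalg.inv(I_tilde))
```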
with terms:
\[\begin{array}{ll} \mathcal{I}_{\theta, \theta} = \mathbb{E}\left[ \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \theta} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \theta^\top}\right] & \mathcal{I}_{\theta_j, \theta_k} = \frac{1}{4} \sum_{i = 1}^n \left( \left( Z D'_j(\theta) Z^\top \right)_{i,i}\left( Z D'_k(\theta) Z^\top \right)_{i,i}r_{i,i} + 2 \sum_{i' < i} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} r_{i, i'} \right) \\ \mathcal{I}_{\alpha, \theta} = \mathbb{E}\left[ \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \alpha} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \theta^\top}\right] & \mathcal{I}_{\alpha_j, \theta_k} = \frac{1}{2} \sum_{i = 1}^n (Z D'_k(\theta) Z^\top)_{i,i} x_{i,j} c_i \\ \mathcal{I}_{\alpha, \alpha} = \mathbb{E}\left[ \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \alpha} \frac{\partial \ell_\text{quasi}(\alpha, \theta; y)}{\partial \alpha^\top}\right] & \mathcal{I}_{\alpha_j, \alpha_k} = \sum_{i = 1}^n w_{i} x_{i,j} x_{i,k}\\ \end{array}\]where:
\[\begin{array}{l} r_{i,i} = w_i^4 \delta_i^{-4} \kappa_{4,i} + 2 w_i^2 + e_i^2 \kappa_{2,i} - 2 w_i^2 \delta_i^{-2} e_i \kappa_{3,i} \\ r_{i, i'} = 2 w_i w_{i'} \\ c_{i} = w_i^3 \delta_i^{-3} \kappa_{3,i} - w_i \delta_i^{-1} e_i \kappa_{2,i} \end{array} \nonumber\]
Details Of $\mathcal{I}_{\theta_j, \theta_k}$.
$$ \begin{aligned} \mathcal{I}_{\theta_j, \theta_k} &= \mathbb{E}\left[U_{\theta_j} U_{\theta_k}\right] \\ &= \frac{1}{4}\mathbb{E}\left[ \left( \sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_j(\theta) Z^\top \right)_{i,i} + 2\sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_j(\theta) Z^\top \right)_{i,i'}\right) \right. \\ &\hspace{15mm}\left. \times \left( \sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_k(\theta) Z^\top \right)_{i,i} + 2\sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'}\right) \right] \\ &= \frac{1}{4} \mathbb{E}\left[ \left(\sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_j(\theta) Z^\top \right)_{i,i} \right)\left(\sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_k(\theta) Z^\top \right)_{i,i} \right) \right. \\ &\hspace{15mm} + \left. \left(\sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_j(\theta) Z^\top \right)_{i,i} \right) \left(2 \sum_{i = 1}^n \sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} \right) \right. \\ &\hspace{15mm} + \left. \left(\sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]\left( Z D'_k(\theta) Z^\top \right)_{i,i} \right) \left(2 \sum_{i = 1}^n \sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \right) \right. \\ &\hspace{15mm} + \left. 
\left(2 \sum_{i = 1}^n \sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \right)\left(2 \sum_{i = 1}^n \sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} \right) \right]\\ &\overset{(i)}{=} \frac{1}{4} \mathbb{E}\left[ \left(\sum_{i = 1}^n \left( Z D'_j(\theta) Z^\top \right)_{i,i}\left( Z D'_k(\theta) Z^\top \right)_{i,i}\left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right]^2 \right) + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'}\right)^2 \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} \right) \right] \\ &= \frac{1}{4}\left(\sum_{i = 1}^n \left( Z D'_j(\theta) Z^\top \right)_{i,i}\left( Z D'_k(\theta) Z^\top \right)_{i,i}\mathbb{E}\left[((y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i})^2\right] + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'}\mathbb{E}\left[((y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'})^2 \right] \right) \right) \\ &\overset{(ii)}{=}\frac{1}{4}\left(\sum_{i = 1}^n \left( Z D'_j(\theta) Z^\top \right)_{i,i}\left( Z D'_k(\theta) Z^\top \right)_{i,i}r_{i,i} + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'}\mathbb{E}\left[((y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'})^2 \right] \right) \right) \\ &\overset{(iii)}{=} \frac{1}{4}\left(\sum_{i = 1}^n \left( Z D'_j(\theta) Z^\top \right)_{i,i}\left( Z D'_k(\theta) Z^\top \right)_{i,i}r_{i,i} + 4\left(\sum_{i = 1}^n \sum_{i' < i} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} w_i w_{i'}\right)\right) \\ &\overset{(iv)}{=} \frac{1}{4} \sum_{i = 1}^n \left( \left( Z D'_j(\theta) Z^\top \right)_{i,i}\left( Z D'_k(\theta) Z^\top \right)_{i,i}r_{i,i} + 2 \sum_{i' < i} \left( Z D'_j(\theta) Z^\top \right)_{i,i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} r_{i, i'} \right) \end{aligned} \nonumber $$ In $(i)$, we use the fact that the $y_i$'s are independent and have mean $\mu_i$ under the null. Thus, any term that has $(y_i - \mu_i)$ (to only a power of $1$) will be $0$ in expectation and remove the term entirely from the sum. In $(ii)$, we use the fact that $\mathbb{E}\left[ (y_i - \mu_i)^4 \right] = \kappa_{4, i} + 3\kappa_{2, i}^2$ and $\kappa_{2, i} = \text{Var}(y_i)$. Substituting in for $w_{0,i}$, we see that: $$ \begin{aligned} \mathbb{E}\left[((y_i - \mu_i)^2 (\delta_i^{-1})^2 w_i^2 - w_{0,i})^2 \right] &= (\delta_i^{-1})^4 w_i^4 \kappa_{4,i} + 2w_i^2 + e_i^2 \kappa_{2,i} - 2 w_i^2 (\delta_i^{-1})^2 e_i \kappa_{3,i} \\ &= r_{i,i} \end{aligned} $$ where in the last line we let $r_{i,i} = w_i^4 \delta_i^{-4} \kappa_{4,i} + 2 w_i^2 + e_i^2 \kappa_{2,i} - 2 w_i^2 \delta_i^{-2} e_i \kappa_{3,i}$ as in (Lin, 1997).
Derivation Of $(ii)$.
$$ \begin{aligned} \mathbb{E}\left[((y_i - \mu_i)^2 (\delta_i^{-1})^2 w_i^2 - w_{0,i})^2 \right] &= \mathbb{E}\left[((y_i - \mu_i)^2 (\delta_i^{-1})^2 w_i^2)^2 - 2w_{0,i}(y_i - \mu_i)^2 (\delta_i^{-1})^2 w_i^2 + w_{0,i}^2\right] \\ &= (\delta_i^{-1})^4 w_i^4 \mathbb{E}\left[(y_i - \mu_i)^4\right] - 2 \mathbb{E}\left[w_{0,i}(y_i - \mu_i)^2 (\delta_i^{-1})^2 w_i^2\right] + \mathbb{E}\left[w_{0,i}^2\right] \\ &= (\delta_i^{-1})^4 w_i^4 (\kappa_{4,i} + 3\kappa_{2,i}^2) - 2(\delta_i^{-1})^2w_i^2 \mathbb{E}\left[ (w_i + e_i(y_i - \mu_i))(y_i-\mu_i)^2 \right] + \mathbb{E}\left[ (w_i + e_i(y_i - \mu_i))^2 \right] \\ &= (\delta_i^{-1})^4 w_i^4 (\kappa_{4,i} + 3\kappa_{2,i}^2) - 2(\delta_i^{-1})^2w_i^2 \left( w_i\mathbb{E}\left[ (y_i - \mu_i)^2 \right] + e_i \mathbb{E}\left[(y_i - \mu_i)^3 \right]\right) + w_i^2 + 2w_i e_i \mathbb{E}\left[ y_i - \mu_i\right] + e_i^2 \mathbb{E}\left[ (y_i - \mu_i)^2 \right] \\ &= (\delta_i^{-1})^4 w_i^4 \kappa_{4,i} + 3(\delta_i^{-1})^4 w_i^4 \kappa_{2,i}^2 - 2(\delta_i^{-1})^2w_i^3 \kappa_{2,i} - 2(\delta_i^{-1})^2w_i^2 e_i \mathbb{E}\left[ (y_i - \mu_i)^3 \right] + w_i^2 + e_i^2 \kappa_{2,i} \\ &\overset{(a)}{=} (\delta_i^{-1})^4 w_i^4 \kappa_{4,i} + 3(\delta_i^{-1})^2 w_i^2 (\delta_i)^2 - 2\delta_i^{-1}w_i^2 \delta_i - 2(\delta_i^{-1})^2 w_i^2 e_i \mathbb{E}\left[ (y_i - \mu_i)^3 \right] + w_i^2 + e_i^2 \kappa_{2,i} \\ &= (\delta_i^{-1})^4 w_i^4 \kappa_{4,i} + 3w_i^2 - 2w_i^2 + w_i^2 - 2(\delta_i^{-1})^2 w_i^2 e_i \mathbb{E}\left[ (y_i - \mu_i)^3 \right] + e_i^2 \kappa_{2,i} \\ &\overset{(b)}{=} (\delta_i^{-1})^4 w_i^4 \kappa_{4,i} + 2w_i^2 + e_i^2 \kappa_{2,i} - 2 w_i^2 (\delta_i^{-1})^2 e_i \kappa_{3,i} \end{aligned} \nonumber $$ In $(a)$, we have that: $$ \delta_i^{-1} w_i = \left( \frac{1}{g'(\mu_i)} \right)^{-1} \frac{1}{V(\mu_i) (g'(\mu_i))^2} = \frac{1}{V(\mu_i) g'(\mu_i)} \implies \delta_i^{-1} w_i \kappa_{2,i} = \frac{1}{V(\mu_i) g'(\mu_i)} V(\mu_i) = \delta_i $$ In $(b)$, we use the fact that the third cumulant is equal to the 
third central moment.
Derivation Of $(iii)$.
$$ \begin{aligned} \mathbb{E}\left[((y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'})^2 \right] &= (\delta_i^{-1}\delta_{i'}^{-1})^2w_i^2 w_{i'}^2 \mathbb{E}\left[ (y_i - \mu_i)^2(y_{i'} - \mu_{i'})^2\right] \\ &= (\delta_i^{-1}\delta_{i'}^{-1})^2w_i^2 w_{i'}^2 \mathbb{E}\left[ (y_i - \mu_i)^2\right] \mathbb{E}\left[(y_{i'} - \mu_{i'})^2\right] \\ &= (\delta_i^{-1})^2 w_i^2 V(\mu_i) (\delta_{i'}^{-1})^2 w_{i'}^2 V(\mu_{i'}) \\ &\overset{(a)}{=} w_i w_{i'} \nonumber \end{aligned} $$ In $(a)$, we use the fact that $(\delta_i^{-1})^2 V(\mu_i) = (g'(\mu_i))^2 V(\mu_i) = w_i^{-1}$, so $(\delta_i^{-1})^2 w_i^2 V(\mu_i) = w_i$.
Details Of $\mathcal{I}_{\alpha_j, \theta_k}$.
$$ \begin{aligned} \mathcal{I}_{\alpha_j, \theta_k} &= \mathbb{E}\left[ U_{\alpha_j} U_{\theta_k} \right] \\ &= \frac{1}{2} \mathbb{E}\left[ W \Delta^{-1} \text{diag}(X_{\cdot, j}) (y - \mu) \left( \sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right] \left( Z D'_k(\theta) Z^\top \right)_{i,i} + 2\sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'}\right) \right] \\ &= \frac{1}{2} \mathbb{E}\left[ \left(\sum_{i = 1}^n w_i \delta_i^{-1} x_{i,j} (y_i - \mu_i) \right) \left( \sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right] \left( Z D'_k(\theta) Z^\top \right)_{i,i} + 2\sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'}\right) \right] \\ &= \frac{1}{2} \left( \mathbb{E}\left[ \left(\sum_{i = 1}^n w_i \delta_i^{-1} x_{i,j} (y_i - \mu_i) \right) \left( \sum_{i = 1}^n \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right] \left( Z D'_k(\theta) Z^\top \right)_{i,i} \right) \right] + \mathbb{E}\left[ \left( \sum_{i = 1}^n w_i \delta_i^{-1} x_{i,j} (y_i - \mu_i) \right) \left(2 \sum_{i = 1}^n \sum_{i' < i} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \delta^{-1}_{i}\delta^{-1}_{i'} w_{i} w_{i'} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} \right) \right] \right) \\ &= \frac{1}{2} \left( \sum_{i = 1}^n \mathbb{E}\left[ w_i \delta_i^{-1} x_{i,j} (y_i - \mu_i) \left[(y_i - \mu_i)^2 (\delta_{i}^{-1})^2 w_{i}^2 - w_{0, i} \right] \left( Z D'_k(\theta) Z^\top \right)_{i,i} \right] \right. \\ &\hspace{15mm} + \left. 
2 \sum_{i = 1}^n \sum_{i' < i} \mathbb{E}\left[ w_i \delta_i^{-1} x_{i,j} (y_i - \mu_i) \left[(y_{i'} - \mu_{i'})^2 (\delta_{i'}^{-1})^2 w_{i'}^2 - w_{0, i'} \right] \left( Z D'_k(\theta) Z^\top \right)_{i',i'} \right] + 2 \sum_{i = 1}^n \sum_{i' < i} \mathbb{E}\left[ w_i^2 \delta_i^{-2}(y_i - \mu_i)^2 \delta_{i'}^{-1} w_{i'} (y_{i'} - \mu_{i'}) \left( Z D_k'(\theta) Z^\top \right)_{i',i'} \right] \right. \\ &\hspace{15mm} + \left. 2 \sum_{i = 1}^n \sum_{i' < i} \mathbb{E}\left[ (y_i - \mu_i) (y_{i'} - \mu_{i'})^2 \delta^{-1}_{i}\delta^{-2}_{i'} w_{i} w_{i'}^2 \left( Z D'_k(\theta) Z^\top \right)_{i,i'} \right] + 2 \sum_{i = 1}^n \sum_{i' < i} \sum_{i'' \neq i, i'} (y_i - \mu_i) (y_{i'} - \mu_{i'}) (y_{i''} - \mu_{i''}) \delta^{-1}_{i}\delta^{-1}_{i'} \delta^{-1}_{i''} w_{i} w_{i'} w_{i''} x_{i'', j} \left( Z D'_k(\theta) Z^\top \right)_{i,i'} \right) \\ &\overset{(i)}{=} \frac{1}{2} \sum_{i = 1}^n \left( \mathbb{E}\left[ (y_i - \mu_i)^3 \right] w_i^3 \delta_i^{-3} x_{i,j} (Z D'_k(\theta) Z^\top)_{i,i} - \mathbb{E}\left[ (y_i - \mu_i) w_{0,i} \right] w_i \delta_i^{-1}x_{i,j} (Z D'_k(\theta) Z^\top)_{i,i} \right) \\ &\overset{(ii)}{=} \frac{1}{2} \sum_{i = 1}^n \left( \mathbb{E}\left[ (y_i - \mu_i)^3 \right] w_i^3 \delta_i^{-3} x_{i,j} (Z D'_k(\theta) Z^\top)_{i,i} - \mathbb{E}\left[ (y_i - \mu_i)^2\right] e_i w_i \delta_i^{-1}x_{i,j} (Z D'_k(\theta) Z^\top)_{i,i} \right) \\ &= \frac{1}{2}\sum_{i = 1}^n (Z D'_k(\theta) Z^\top)_{i,i} x_{i,j} \left( w_i^3 \delta_i^{-3} \kappa_{3, i} - w_i \delta_i^{-1} e_i \kappa_{2, i} \right) \\ &= \frac{1}{2} \sum_{i = 1}^n (Z D'_k(\theta) Z^\top)_{i,i} x_{i,j} c_i \end{aligned} $$ where in the last line we let $c_{i} = w_i^3 \delta_i^{-3} \kappa_{3,i} - w_i \delta_i^{-1} e_i \kappa_{2,i}$. For $(i)$, we have assumed that the $y_i$'s are independent, so we can split apart the expectations of products into the products of expectations. This means that most of the terms become $0$ since $\mathbb{E}[y_i - \mu_i] = 0$.
For $(ii)$, we use the fact that: $$ \begin{aligned} \mathbb{E}\left[(y_i - \mu_i)w_{0,i} \right] &= \mathbb{E}\left[ (y_i - \mu_i)(w_i + e_i(y_i - \mu_i)) \right] \\ &= \mathbb{E}\left[ (y_i - \mu_i)w_i\right] + \mathbb{E}\left[e_i(y_i - \mu_i)^2 \right] \\ &= e_i \mathbb{E}[(y_i - \mu_i)^2] \end{aligned} $$
Details Of $\mathcal{I}_{\alpha_j, \alpha_k}$.
$$ \begin{aligned} \mathcal{I}_{\alpha_j, \alpha_k} &= \mathbb{E}\left[ U_{\alpha_j} U_{\alpha_k} \right] \\ &= \mathbb{E}\left[ \left( \sum_{i = 1}^n w_i \delta_i^{-1} x_{i,j} (y_i - \mu_i) \right) \left( \sum_{i = 1}^n w_i \delta_i^{-1} x_{i,k} (y_i - \mu_i) \right) \right] \\ &= \sum_{i = 1}^n \mathbb{E}\left[ w_i^2 \delta_i^{-2} x_{i,j} x_{i,k} (y_i - \mu_i)^2 \right] + \sum_{i = 1}^n \sum_{i' < i} \mathbb{E}\left[ w_i w_{i'}\delta_i^{-1} \delta_{i'}^{-1} x_{i,j} x_{i', k} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \right] + \sum_{i = 1}^n \sum_{i' < i} \mathbb{E}\left[ w_i w_{i'}\delta_i^{-1} \delta_{i'}^{-1} x_{i,k} x_{i', j} (y_i - \mu_i) (y_{i'} - \mu_{i'}) \right] \\ &\overset{(i)}{=} \sum_{i = 1}^n \mathbb{E}\left[ (y_i - \mu_i)^2 \right] w_i^2 \delta_i^{-2} x_{i,j} x_{i,k} \\ &= \sum_{i = 1}^n \frac{1}{(V(\mu_i) (g'(\mu_i))^2)^2} (g'(\mu_i))^2 x_{i,j} x_{i,k} \kappa_{2,i} \\ &= \sum_{i = 1}^n \frac{1}{V(\mu_i) (g'(\mu_i))^2} x_{i,j} x_{i, k} \\ &= \sum_{i = 1}^n w_{i} x_{i,j} x_{i,k} \end{aligned} $$ For $(i)$, we have assumed that the $y_i$'s are independent, so we can split apart the expectations of products into the products of expectations. This means all but the first term become $0$ since $\mathbb{E}[y_i - \mu_i] = 0$.If we let \(A^j = Z D'_j(\theta) Z^\top\), $A^j_\text{diag}$ denote the $n$-dimensional vector of the diagonal elements of $A^j$, $\mathbf{1}$ denote a vector of ones of the appropriate dimensions, $R$ denote the matrix with diagonal elements $r_{i,i}$ and off-diagonal elements $r_{i, i’}$, $C$ denote the diagonal matrix with elements $c_i$, and $\circ$ denote the Hadamard (element-wise) product, then we can write the pieces of the information matrix as:
\[\begin{array}{l} \mathcal{I}_{\theta_j, \theta_k} = \frac{1}{4} \mathbf{1}^\top \left( A^j \circ R \circ A^k \right) \mathbf{1} \\ \mathcal{I}_{\alpha, \theta_j} = \frac{1}{2} X^\top C A^j_\text{diag} \\ \mathcal{I}_{\alpha, \alpha} = X^\top W X \end{array} \label{eq:general-Is}\]
The fact that we have multiple targets ($Y$ is a matrix) makes it a bit tricky to use the equations we derived previously. Unfortunately, I think the easiest way is just to redo the derivations for this example by hand 😞.
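The vectorized forms in Eq. \eqref{eq:general-Is} can be checked against the elementwise double sums they encode. Everything below is a random stand-in ($R$, $C$, $W$, and the $A^j$ matrices are not computed from a fitted model):

```python
# Check the Hadamard/trace forms of the information pieces against explicit sums.
import numpy as np

rng = np.random.default_rng(4)
n, p, q = 6, 2, 3
X = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
w = rng.uniform(1, 2, size=n)                       # diagonal of W
c = rng.normal(size=n)                              # diagonal of C
R = rng.normal(size=(n, n)); R = (R + R.T) / 2      # stand-in for the r_{i,i'} matrix

def make_A():
    Dp = rng.normal(size=(q, q))
    Dp = (Dp + Dp.T) / 2                            # stand-in for a symmetric D'_j(theta)
    return Z @ Dp @ Z.T

Aj, Ak = make_A(), make_A()
ones = np.ones(n)

# vectorized forms
I_tt = ones @ (Aj * R * Ak) @ ones / 4              # (1/4) 1^T (A^j o R o A^k) 1
I_at = X.T @ np.diag(c) @ np.diag(Aj) / 2           # (1/2) X^T C A^j_diag
I_aa = X.T @ np.diag(w) @ X                         # X^T W X

# elementwise sums
I_tt_sum = sum(Aj[i, i] * Ak[i, i] * R[i, i]
               + 2 * sum(Aj[i, ip] * Ak[i, ip] * R[i, ip] for ip in range(i))
               for i in range(n)) / 4
I_at_sum = np.array([sum(Aj[i, i] * X[i, j] * c[i] for i in range(n)) for j in range(p)]) / 2
I_aa_sum = np.array([[sum(w[i] * X[i, j] * X[i, l] for i in range(n)) for l in range(p)]
                     for j in range(p)])

assert np.isclose(I_tt, I_tt_sum)
assert np.allclose(I_at, I_at_sum)
assert np.allclose(I_aa, I_aa_sum)
```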
Let's keep in mind our dimensions. $\mathcal{I}_{\tau, \tau}$ should be a scalar because $\tau$ is a scalar. $\mathcal{I}_{\alpha, \tau}$ should be a vector with $p \times k = k$ dimensions. $\mathcal{I}_{\alpha, \alpha}$ should be a matrix that is $(p \times k) \times (p \times k) = k \times k$. $$ \begin{aligned} \mathcal{I}_{\tau, \tau} &\approx \tau^2 \left[ \sum_{i = 1}^n Z_i^4 \left( \sum_{j = 1}^k \kappa_{4; i, j} + 3 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right)^2 \right) - \sum_{i = 1}^n Z_i^4 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right)^2 + 4 \sum_{i = 1}^nZ_i^2 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right) \sum_{i' < i} Z_{i'}^2 \left(\mu_{i',\cdot} \mathbf{1}_{k} \right) \right] \end{aligned} \label{eq:example-I-tau-tau} $$
Details Of $\mathcal{I}_{\tau, \tau}$.
Let $\gamma_i := \sum_{j = 1}^k (Y_{i,j} - \mu_{i,j})$, with powers $\gamma_i^t$ denoting ordinary powers of this sum. Note that $\mathbb{E}\left[ \gamma_i \right] = 0$ and $\mathbb{E}\left[\gamma^2_i \right] = \sum_{j = 1}^k \mu_{i,j}$. Then: $$ \begin{aligned} \mathcal{I}_{\tau, \tau} &= \mathbb{E}\left[ U_\tau^2 \right] \\ &\approx \tau^2 \mathbb{E}\left[ \left( \text{tr}\left[ \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right) \left(\sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i, \cdot})\right) \right]\right)^2 \right] \\ &= \tau^2 \mathbb{E}\left[ \left( \sum_{j = 1}^k \left[ \sum_{j' = 1}^k \left( \sum_{i = 1}^n Z_i (Y_{i, j} - \mu_{i, j}) \right) \left( \sum_{i = 1}^n Z_i (Y_{i, j'} - \mu_{i, j'})\right) - \sum_{i = 1}^n Z_i^2\mu_{i, j} \right] \right)^2 \right] \\ &= \tau^2 \mathbb{E}\left[ \left( \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j} - \mu_{i,j}) \right)\left( \sum_{j' = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j'} - \mu_{i,j'}) \right) - \sum_{j = 1}^k \sum_{i = 1}^n Z_i^2 \mu_{i,j} \right)^2\right] \\ &= \tau^2 \underbrace{\mathbb{E}\left[ \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j} - \mu_{i,j}) \right)^2 \left( \sum_{j' = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j'} - \mu_{i,j'}) \right)^2 \right]}_{(a)} \\ &\hspace{15mm} - 2 \tau^2 \underbrace{\mathbb{E}\left[ \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j} - \mu_{i,j}) \right)\left( \sum_{j' = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j'} - \mu_{i,j'}) \right) \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i^2 \mu_{i,j} \right)\right]}_{(b)} \\ &\hspace{15mm} + \tau^2 \mathbb{E}\left[ \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i^2 \mu_{i,j} \right)^2 \right] \\ &= \tau^2 \sum_{i = 1}^n Z_i^4 \mathbb{E}\left[ \gamma_i^4 \right] + 6 \tau^2 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left( \sum_{j = 1}^k \mu_{i,j} \right) \left( \sum_{j = 1}^k \mu_{i',j} \right) - 2\tau^2 \left( \sum_{i = 1}^n Z_i^2 \sum_{j = 1}^k \mu_{i,j} \right)^2 + \tau^2 \left( \sum_{i = 1}^n Z_i^2 \sum_{j = 1}^k \mu_{i,j} \right)^2 \\ 
&= \tau^2 \left[ \sum_{i = 1}^n Z_i^4 \mathbb{E}\left[ \gamma_i^4 \right] + 6 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left( \sum_{j = 1}^k \mu_{i,j} \right) \left( \sum_{j = 1}^k \mu_{i',j} \right) - \left( \sum_{i = 1}^n Z_i^2 \sum_{j = 1}^k \mu_{i,j} \right)^2 \right] \\ &= \tau^2 \left[ \sum_{i = 1}^n Z_i^4 \mathbb{E}\left[ \gamma_i^4 \right] + 6 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left( \sum_{j = 1}^k \mu_{i,j} \right) \left( \sum_{j = 1}^k \mu_{i',j} \right) - \sum_{i = 1}^n Z_i^4 \left(\sum_{j = 1}^k \mu_{i,j}\right)^2 - 2 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left(\sum_{j = 1}^k \mu_{i,j}\right)\left(\sum_{j = 1}^k \mu_{i',j}\right) \right] \\ &= \tau^2 \left[ \sum_{i = 1}^n Z_i^4 \underbrace{\mathbb{E}\left[ \gamma_i^4 \right]}_{(c)} - \sum_{i = 1}^n Z_i^4 \left(\sum_{j = 1}^k \mu_{i,j}\right)^2 + 4 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left(\sum_{j = 1}^k \mu_{i,j}\right)\left(\sum_{j = 1}^k \mu_{i',j}\right) \right] \\ &= \tau^2 \left[ \sum_{i = 1}^n Z_i^4 \left( \sum_{j = 1}^k \kappa_{4; i, j} + 3 \left(\sum_{j = 1}^k \mu_{i,j}\right)^2 \right) - \sum_{i = 1}^n Z_i^4 \left(\sum_{j = 1}^k \mu_{i,j}\right)^2 + 4 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left(\sum_{j = 1}^k \mu_{i,j}\right)\left(\sum_{j = 1}^k \mu_{i',j}\right) \right] \\ &= \tau^2 \left[ \sum_{i = 1}^n Z_i^4 \left( \sum_{j = 1}^k \kappa_{4; i, j} + 3 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right)^2 \right) - \sum_{i = 1}^n Z_i^4 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right)^2 + 4 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right)\left(\mu_{i',\cdot} \mathbf{1}_{k} \right) \right] \\ &= \tau^2 \left[ \sum_{i = 1}^n Z_i^4 \left( \sum_{j = 1}^k \kappa_{4; i, j} + 3 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right)^2 \right) - \sum_{i = 1}^n Z_i^4 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right)^2 + 4 \sum_{i = 1}^nZ_i^2 \left(\mu_{i,\cdot} \mathbf{1}_{k} \right) \sum_{i' < i} Z_{i'}^2 \left(\mu_{i',\cdot} \mathbf{1}_{k} \right) \right] \end{aligned} \nonumber $$
Proof Of $(a)$.
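The fourth-moment step can be spot-checked by simulation: $\gamma_i$ is a centered Poisson with rate $s = \sum_{j} \mu_{i,j}$ (a sum of independent Poissons is Poisson), so $\mathbb{E}[\gamma_i^4] = \kappa_4 + 3\kappa_2^2 = s + 3s^2$. A quick Monte Carlo sketch with a made-up row of means:

```python
# Monte Carlo check of E[gamma^4] = s + 3 s^2 for a centered sum of
# independent Poisson counts with total rate s. The means are made up.
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.7, 1.3, 2.0])                # one row mu_{i,.} with k = 3 responses
s = mu.sum()

reps = 400_000
Y = rng.poisson(mu, size=(reps, mu.size))     # independent Poisson draws per column
gamma = (Y - mu).sum(axis=1)                  # row sums of the centered counts
emp = np.mean(gamma**4)

assert np.isclose(emp, s + 3 * s**2, rtol=0.05)
```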
$$ \begin{aligned} \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j} - \mu_{i,j}) \right)^2 &= \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j} - \mu_{i,j}) \right) \left( \sum_{j' = 1}^k \sum_{i' = 1}^n Z_{i'} (Y_{i',j'} - \mu_{i',j'}) \right) \\ &= \sum_{i = 1}^n \sum_{i' = 1}^n \sum_{j = 1}^k \sum_{j' = 1}^k Z_i Z_{i'} (Y_{i,j} - \mu_{i,j}) (Y_{i',j'} - \mu_{i',j'}) \\ &= \sum_{i = 1}^n \left[ \sum_{j = 1}^k \sum_{j' = 1}^k Z_i^2 (Y_{i,j} - \mu_{i,j}) (Y_{i,j'} - \mu_{i,j'}) + 2 \sum_{i' < i} \sum_{j = 1}^k \sum_{j' = 1}^k Z_i Z_{i'} (Y_{i,j} - \mu_{i,j}) (Y_{i',j'} - \mu_{i',j'}) \right] \\ &= \sum_{i = 1}^n \left[ Z_i^2 \left(\sum_{j = 1}^k (Y_{i,j} - \mu_{i,j}) \right) \left( \sum_{j' = 1}^k (Y_{i,j'} - \mu_{i,j'})\right) + 2 \sum_{i' < i} Z_i Z_{i'} \left( \sum_{j = 1}^k(Y_{i,j} - \mu_{i,j})\right)\left( \sum_{j' = 1}^k (Y_{i',j'} - \mu_{i',j'}) \right)\right] \\ &= \sum_{i = 1}^n \left[ Z_i^2 \gamma_i^2 + 2 \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right] \end{aligned} \nonumber $$ $$ \begin{aligned} \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j} - \mu_{i,j}) \right)^4 &= \left( \sum_{i = 1}^n \left[ Z_i^2 \gamma_i^2 + 2 \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right] \right)^2 \\ &= \left( \sum_{i = 1}^n \left[ Z_i^2 \gamma_i^2 + 2 \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right] \right) \left( \sum_{m = 1}^n \left[ Z_m^2 \gamma_m^2 + 2 \sum_{m' < m} Z_m Z_{m'} \gamma_m \gamma_{m'} \right] \right) \\ &= \underbrace{\sum_{i = 1}^n \sum_{m = 1}^n Z_i^2 \gamma_i^2 Z_m^2 \gamma_m^2}_{(i)} \\ &\hspace{10mm} + \underbrace{2 \sum_{i = 1}^n \sum_{m = 1}^n Z_m^2 \gamma_m^2 \left( \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right)}_{(ii)} \\ &\hspace{10mm} + \underbrace{2 \sum_{i = 1}^n \sum_{m = 1}^n Z_i^2 \gamma_i^2 \left( \sum_{m' < m} Z_m Z_{m'} \gamma_m \gamma_{m'} \right)}_{(iii)} \\ &\hspace{10mm} + \underbrace{4 \sum_{i = 1}^n \sum_{m = 1}^n \left( \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right)\left( \sum_{m' < m} Z_m Z_{m'} 
\gamma_m \gamma_{m'} \right)}_{(iv)} \end{aligned} \nonumber $$ $(i)$ becomes: $$ \begin{aligned} \sum_{i = 1}^n \sum_{m = 1}^n Z_i^2 \gamma_i^2 Z_m^2 \gamma_m^2 &= \sum_{i = 1}^n \left[ Z_i^4 \gamma_i^4 + 2\sum_{i' < i} Z_i^2 \gamma_i^2 Z_{i'}^2 \gamma_{i'}^2 \right] \end{aligned} \nonumber $$ $(ii)$ and $(iii)$ become: $$ \begin{aligned} 2 \sum_{i = 1}^n \sum_{m = 1}^n Z_m^2 \gamma_m^2 \left( \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right) &= 2 \sum_{i = 1}^n \sum_{m = 1}^n \left( \sum_{i' < i} Z_m^2 \gamma_m^2 Z_i Z_{i'} \gamma_i \gamma_{i'} \right) \\ &= 2 \sum_{i = 1}^n \left[ \left( \sum_{i' < i} Z_i^3 Z_{i'} \gamma_i^3 \gamma_{i'} \right) + \left( \sum_{i' < i} Z_i Z_{i'}^3 \gamma_i \gamma_{i'}^3 \right) + 2 \sum_{i' < i} \sum_{m < i'} Z_m^2 \gamma_m^2 Z_i \gamma_i Z_{i'} \gamma_{i'} \right] \end{aligned} \nonumber $$ $(iv)$ becomes: $$ \begin{aligned} 4 \sum_{i = 1}^n \sum_{m = 1}^n \left( \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right)\left( \sum_{m' < m} Z_m Z_{m'} \gamma_m \gamma_{m'} \right) &= 4 \sum_{i = 1}^n \left[ \left( \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right)^2 + 2 \sum_{m < i} \left( \sum_{i' < i} Z_i Z_{i'} \gamma_i \gamma_{i'} \right)\left( \sum_{m' < m} Z_m Z_{m'} \gamma_m \gamma_{m'} \right) \right] \\ &= 4 \sum_{i = 1}^n \left[ \sum_{i' < i} \left( Z_i^2 \gamma_i^2 Z_{i'}^2 \gamma_{i'}^2 + 2 \sum_{i'' < i'} Z_i^2 \gamma_i^2 Z_{i'} \gamma_{i'} Z_{i''} \gamma_{i''} \right) + 2 \sum_{m < i} \sum_{i' < i} \sum_{m' < m} Z_i Z_{i'} \gamma_i \gamma_{i'} Z_m Z_{m'} \gamma_m \gamma_{m'} \right] \\ &= 4 \sum_{i = 1}^n \left[ \sum_{i' < i} \left( Z_i^2 \gamma_i^2 Z_{i'}^2 \gamma_{i'}^2 + 2 \sum_{i'' < i'} Z_i^2 \gamma_i^2 Z_{i'} \gamma_{i'} Z_{i''} \gamma_{i''} \right) \right. \\ &\hspace{10mm} + 2 \left. 
\left( \sum_{i' < i} \sum_{m' < i'} Z_i Z_{i'}^2 \gamma_i \gamma_{i'}^2 Z_{m'}\gamma_{m'} + 2 \sum_{i' < i} \sum_{m < i'} \sum_{m' < m} Z_i Z_{i'} \gamma_i \gamma_{i'} Z_{m} Z_{m'} \gamma_{m} \gamma_{m'} \right) \right] \\ \end{aligned} \nonumber $$ In the above, any term in which some $\gamma$ appears with power $1$ has expectation $0$ (by independence across samples and $\mathbb{E}\left[ \gamma_i \right] = 0$), so we have the following: $$ \begin{aligned} \mathbb{E}\left[ (i) \right] &= \sum_{i = 1}^n \left( Z_i^4 \mathbb{E}\left[ \gamma_i^4 \right] + 2 \sum_{i' < i} Z_i^2 Z_{i'}^2 \mathbb{E}\left[ \gamma_i^2 \right] \mathbb{E}\left[ \gamma_{i'}^2 \right] \right) \\ \mathbb{E}\left[ (ii) \right] &= \mathbb{E}\left[ (iii) \right] = 0 \\ \mathbb{E}\left[ (iv) \right] &= 4 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \mathbb{E}\left[ \gamma_i^2 \right] \mathbb{E}\left[ \gamma_{i'}^2 \right] \end{aligned} \nonumber $$ Putting all of the above together: $$ \begin{aligned} (a) &= \mathbb{E}\left[ \left( \sum_{j = 1}^k \sum_{i = 1}^n Z_i (Y_{i,j} - \mu_{i,j}) \right)^4 \right] \\ &= \sum_{i = 1}^n Z_i^4 \mathbb{E}\left[ \gamma_i^4\right] + 2 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \mathbb{E}\left[ \gamma_i^2 \right] \mathbb{E}\left[ \gamma_{i'}^2 \right] + 4 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \mathbb{E}\left[ \gamma_i^2 \right] \mathbb{E}\left[ \gamma_{i'}^2 \right] \\ &= \sum_{i = 1}^n Z_i^4 \mathbb{E}\left[ \gamma_i^4\right] + 6 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \mathbb{E}\left[ \gamma_i^2 \right] \mathbb{E}\left[ \gamma_{i'}^2 \right] \\ &= \sum_{i = 1}^n Z_i^4 \mathbb{E}\left[ \gamma_i^4\right] + 6 \sum_{i = 1}^n \sum_{i' < i} Z_i^2 Z_{i'}^2 \left( \sum_{j = 1}^k \mu_{i,j} \right) \left( \sum_{j = 1}^k \mu_{i',j} \right) \\ \end{aligned} \nonumber $$
Proof Of $(b)$.
$$ \begin{aligned} (b) &= \left(\sum_{j = 1}^k \sum_{i = 1}^n Z_i^2 \mu_{i,j} \right)\mathbb{E}\left[ \left( \sum_{i = 1}^n Z_i \gamma_i \right)^2 \right] \\ &= \left(\sum_{j = 1}^k \sum_{i = 1}^n Z_i^2 \mu_{i,j} \right) \sum_{i = 1}^n \left( Z_i^2 \mathbb{E}\left[ \gamma_i^2 \right] + 2 \sum_{i' < i} Z_i Z_{i'} \underbrace{\mathbb{E}\left[ \gamma_i \right]}_{=0} \underbrace{\mathbb{E}\left[ \gamma_{i'} \right]}_{=0} \right) \\ &= \left( \sum_{i = 1}^n Z_i^2 \sum_{j = 1}^k \mu_{i,j} \right) \left( \sum_{i = 1}^n Z_i^2 \sum_{j = 1}^k \mu_{i, j} \right) \\ &= \left( \sum_{i = 1}^n Z_i^2 \sum_{j = 1}^k \mu_{i,j} \right)^2 \end{aligned} \nonumber $$
Proof Of $(c)$.
$$ \begin{aligned} \mathbb{E}\left[ \gamma_i^4 \right] &= \mathbb{E}\left[\left(\sum_{j = 1}^k (Y_{i,j} - \mu_{i,j})\right)^4 \right] \\ &\overset{(i)}{=} \sum_{j = 1}^k \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j})^4 \right] + 3 \sum_{j \neq j'} \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j})^2 \right] \mathbb{E}\left[ (Y_{i,j'} - \mu_{i,j'})^2 \right] \\ &= \sum_{j = 1}^k \left( \kappa_{4; i, j} + 3 \mathbb{E}\left[(Y_{i,j} - \mu_{i,j})^2 \right]^2 \right) + 3 \sum_{j \neq j'} \mu_{i,j} \mu_{i,j'} \\ &= \sum_{j = 1}^k \kappa_{4; i, j} + 3 \sum_{j = 1}^k \mu_{i,j}^2 + 3 \sum_{j \neq j'} \mu_{i,j} \mu_{i,j'} \\ &= \sum_{j = 1}^k \kappa_{4; i, j} + 3 \left( \sum_{j = 1}^k \mu_{i,j} \right)^2 \end{aligned} \nonumber $$ For $(i)$, we expand the fourth power and assume the outcomes within a sample are independent: any term in which some outcome appears an odd number of times has expectation $0$, leaving the pure fourth powers and the mixed-square terms, each of which carries multinomial coefficient $\frac{4!}{2! \, 2!} = 6$ per unordered pair (i.e., $3$ per ordered pair $j \neq j'$).
Details Of $\mathcal{I}_{\alpha, \tau}$.
$$ \begin{aligned} \mathcal{I}_{\alpha_j, \tau} &= \mathbb{E}\left[ U_{\alpha_j} U_\tau \right] \\ &= \mathbb{E}\left[ \left( \sum_{i = 1}^n X_i (Y_{i,j} - \mu_{i,j}) \right) \left( \tau \text{tr}\left[\left( \sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right)\left( \sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i,\cdot}) \right) \right] \right) \right] \\ &= \tau \mathbb{E}\left[ \left( \sum_{i = 1}^n X_i (Y_{i,j} - \mu_{i,j}) \right) \left(\sum_{l = 1}^k \sum_{l' = 1}^k \left[\left( \sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot}) \right)\left( \sum_{i = 1}^n Z_i (Y_{i, \cdot} - \mu_{i, \cdot})^\top \right) - \left(\sum_{i = 1}^n Z_i^2 \text{diag}(\mu_{i,\cdot}) \right) \right]_{l, l'} \right) \right] \\ &= \tau \mathbb{E}\left[ \left( \sum_{i = 1}^n X_i (Y_{i,j} - \mu_{i,j}) \right) \left( \left(\sum_{l = 1}^k \sum_{i = 1}^n Z_i (Y_{i, l} - \mu_{i, l}) \right)\left(\sum_{l' = 1}^k \sum_{i = 1}^n Z_i (Y_{i, l'} - \mu_{i, l'}) \right) - \sum_{l = 1}^k \sum_{i = 1}^n Z_i^2 \mu_{i,l} \right) \right] \\ &= \tau \mathbb{E}\left[ \left( \sum_{i = 1}^n X_i (Y_{i,j} - \mu_{i,j}) \right) \left( \left(\sum_{i = 1}^n Z_i \gamma_i \right)^2 - \sum_{l = 1}^k \sum_{i = 1}^n Z_i^2 \mu_{i,l} \right) \right] \\ &= \tau \sum_{i = 1}^n X_i \mathbb{E} \left[(Y_{i,j} - \mu_{i,j})\left( \sum_{i' = 1}^n Z_{i'} \gamma_{i'} \right)^2 \right] - \tau \left( \sum_{l = 1}^k \sum_{i' = 1}^n Z_{i'}^2 \mu_{i',l} \right) \sum_{i = 1}^n X_i \underbrace{\mathbb{E}\left[ Y_{i,j} - \mu_{i,j} \right]}_{=0} \\ &= \tau \sum_{i = 1}^n X_i \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j}) \sum_{i' = 1}^n \left( Z_{i'}^2 \gamma_{i'}^2 + 2 \sum_{i'' < i'} Z_{i'} Z_{i''} \gamma_{i'} \gamma_{i''} \right)\right] \\ &= \tau \sum_{i = 1}^n X_i \left( Z_i^2 \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j}) \gamma_i^2 \right] + \sum_{i' \neq i} Z_{i'}^2 \underbrace{\mathbb{E}\left[ Y_{i,j} - \mu_{i,j} \right]}_{=0} \mathbb{E}\left[ \gamma_{i'}^2 \right] + 2 \sum_{i' \neq i} Z_i Z_{i'} \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j}) \gamma_i \right] \underbrace{\mathbb{E}\left[ \gamma_{i'}\right]}_{=0} + 2 \sum_{\substack{i'' < i' \\ i', i'' \neq i}} Z_{i'} Z_{i''} \underbrace{\mathbb{E}\left[ Y_{i,j} - \mu_{i,j} \right]}_{=0} \underbrace{\mathbb{E}\left[ \gamma_{i'} \right]}_{=0} \underbrace{\mathbb{E}\left[ \gamma_{i''} \right]}_{=0}\right) \\ &= \tau \sum_{i = 1}^n X_i Z_i^2 \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j}) \gamma_i^2 \right] \\ &= \tau \sum_{i = 1}^n X_i Z_i^2 \sum_{l = 1}^k \sum_{l' = 1}^k \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j}) (Y_{i,l} - \mu_{i, l}) (Y_{i,l'} - \mu_{i, l'}) \right] \\ &\overset{(i)}{=} \tau \sum_{i = 1}^n X_i Z_i^2 \mathbb{E}\left[ (Y_{i,j} - \mu_{i,j})^3 \right] \\ &\overset{(ii)}{=} \tau \sum_{i = 1}^n X_i Z_i^2 \mu_{i,j} \end{aligned} \nonumber $$ For $(i)$, we assume the outcomes are independent, even within the same sample: any term in which some outcome appears an odd number of times has expectation $0$, so only the $l = l' = j$ term survives. For $(ii)$, we use the fact that the third central moment of a Poisson random variable equals its mean.
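The Poisson moment facts used here are easy to confirm numerically. Below is a small stand-alone sketch (the means are arbitrary illustrative values) checking that the third central moment equals the mean and that the fourth central moment is $\mu + 3\mu^2$ (i.e. $\kappa_4 = \mu$), using a truncated Poisson series:

```python
import math

# Check the Poisson moment facts used above, with illustrative means,
# by truncating the Poisson series at a large cutoff.

def poisson_central_moment(mu, t, trunc=120):
    """E[(Y - mu)^t] for Y ~ Poisson(mu), truncating the series at `trunc`."""
    return sum(math.exp(-mu) * mu**y / math.factorial(y) * (y - mu)**t
               for y in range(trunc))

for mu in [0.3, 1.7, 4.0]:
    # Third central moment equals the mean (fact (ii))...
    assert abs(poisson_central_moment(mu, 3) - mu) < 1e-9
    # ...and the fourth central moment is mu + 3*mu^2, i.e. kappa_4 = mu.
    assert abs(poisson_central_moment(mu, 4) - (mu + 3 * mu**2)) < 1e-9
```

The fourth-moment identity is the per-outcome ingredient behind the $\kappa_{4; i, j}$ terms in the $\mathcal{I}_{\tau, \tau}$ expression.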
The above implies that: $$ \mathcal{I}_{\alpha, \tau} = \tau \sum_{i = 1}^n Z_i^2 X_i \mu_{i,\cdot} = \tau X^\top \text{diag}(Z)^2 \mu \nonumber $$
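As a sanity check on this identity, we can compute $\mathbb{E}[U_{\alpha_j} U_\tau]$ exactly (up to truncation of each Poisson support) on a tiny hypothetical design and compare it against $\tau \sum_i X_i Z_i^2 \mu_{i,j}$. The design values below are arbitrary, and $U_\tau$ is taken to be the bracketed score quantity from the derivation above:

```python
import itertools
import math

# Exact check (up to truncation of each Poisson support) of
#   E[U_{alpha_j} U_tau] = tau * sum_i X_i Z_i^2 mu_{i,j}
# on a tiny hypothetical design: n = 2 samples, k = 2 outcomes,
# all outcomes independent.

n, k, tau, T = 2, 2, 1.0, 18
X = [0.8, -1.3]                  # fixed-effect covariates (illustrative)
Z = [1.1, 0.6]                   # random-effect covariates (illustrative)
mu = [[0.7, 0.4], [1.0, 0.3]]    # conditional means mu_{i, j}

pmf = {m: [math.exp(-m) * m**y / math.factorial(y) for y in range(T)]
       for row in mu for m in row}
const = sum(Z[i] ** 2 * mu[i][l] for i in range(n) for l in range(k))

j = 0                            # check the first column of I_{alpha, tau}
lhs = 0.0
for ys in itertools.product(range(T), repeat=n * k):
    # Joint probability of this outcome configuration:
    p = math.prod(pmf[mu[i][l]][ys[i * k + l]] for i in range(n) for l in range(k))
    u_alpha = sum(X[i] * (ys[i * k + j] - mu[i][j]) for i in range(n))
    s = sum(Z[i] * (ys[i * k + l] - mu[i][l]) for i in range(n) for l in range(k))
    lhs += p * u_alpha * tau * (s * s - const)

rhs = tau * sum(X[i] * Z[i] ** 2 * mu[i][j] for i in range(n))
assert abs(lhs - rhs) < 1e-7
```

Changing `j` to `1` checks the second column; the agreement holds for any positive means, which is a useful guard against index slips in derivations like the one above.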
Details Of $\mathcal{I}_{\alpha, \alpha}$.
$$ \begin{aligned} \mathcal{I}_{\alpha_j, \alpha_{j'}} &= \mathbb{E}\left[ \left( \sum_{i = 1}^n X_i(Y_{i,j} - \mu_{i,j}) \right) \left( \sum_{i = 1}^n X_i(Y_{i,j'} - \mu_{i,j'}) \right) \right] \\ &\overset{(i)}{=} \sum_{i = 1}^n \sum_{i' = 1}^n X_i X_{i'} \underbrace{\mathbb{E}\left[ Y_{i,j}- \mu_{i,j} \right]}_{=0} \underbrace{\mathbb{E}\left[ Y_{i',j'} - \mu_{i',j'} \right]}_{=0} = 0 \qquad (j \neq j') \\ \mathcal{I}_{\alpha_j, \alpha_{j}} &= \mathbb{E}\left[ \left( \sum_{i = 1}^n X_i(Y_{i,j} - \mu_{i,j}) \right)^2 \right] \\ &= \sum_{i = 1}^n \left( X_i^2 \underbrace{\mathbb{E}\left[ (Y_{i,j} - \mu_{i,j})^2 \right]}_{=\mu_{i,j}} + 2\sum_{i' < i} X_i X_{i'} \underbrace{\mathbb{E}\left[ Y_{i,j} - \mu_{i,j} \right]}_{=0} \underbrace{\mathbb{E}\left[ Y_{i',j} - \mu_{i',j} \right]}_{=0} \right) \\ &= \sum_{i = 1}^n X_i^2 \mu_{i,j} = X^\top \text{diag}(X) \mu_{\cdot, j} \\ \implies \mathcal{I}_{\alpha, \alpha} &= \text{diag}\left( X^\top \text{diag}(X) \mu \right) \end{aligned} \nonumber $$ For $(i)$, we again assume the outcomes are independent, even within the same sample, so the cross expectations factor. $\mathcal{I}_{\alpha, \alpha}$ is not only $k \times k$ but also diagonal, so its inverse is simply the diagonal matrix of the reciprocals of its diagonal elements.
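To see how the pieces fit together, here is a hypothetical end-to-end sketch for a single outcome per sample ($k = 1$), where all of the expressions above simplify. It simulates Poisson data under the null, assembles the score for $\tau$ and the three information blocks, and forms the score statistic $U_\tau^2 / \tilde{\mathcal{I}}$ with $\tilde{\mathcal{I}} = \mathcal{I}_{\tau, \tau} - \mathcal{I}_{\alpha, \tau}^2 / \mathcal{I}_{\alpha, \alpha}$, the efficient information. All design values are illustrative assumptions:

```python
import math
import random

# End-to-end sketch for k = 1: simulate Poisson data under the null
# (no random effect), assemble the score for tau and the information
# blocks derived above, and form the score statistic.

random.seed(7)

def rpois(m):
    # Poisson sampler by inversion of the CDF.
    u, y, p = random.random(), 0, math.exp(-m)
    c = p
    while u > c:
        y += 1
        p *= m / y
        c += p
    return y

n, tau = 50, 1.0
X = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]  # fixed-effect covariate
Z = [math.sin(i) for i in range(n)]                  # random-effect covariate
mu = [math.exp(0.1 * x) for x in X]                  # null means, log link
Y = [rpois(m) for m in mu]

# Score for tau (the bracketed quantity from the derivation above):
s = sum(Z[i] * (Y[i] - mu[i]) for i in range(n))
U_tau = tau * (s * s - sum(Z[i] ** 2 * mu[i] for i in range(n)))

# Information blocks with k = 1 and kappa_4 = mu (Poisson):
I_tt = tau ** 2 * (
    sum(Z[i] ** 4 * (mu[i] + 2 * mu[i] ** 2) for i in range(n))
    + 4 * sum(Z[i] ** 2 * mu[i] * sum(Z[ip] ** 2 * mu[ip] for ip in range(i))
              for i in range(n))
)
I_at = tau * sum(X[i] * Z[i] ** 2 * mu[i] for i in range(n))
I_aa = sum(X[i] ** 2 * mu[i] for i in range(n))

# Efficient information and the score statistic:
I_eff = I_tt - I_at ** 2 / I_aa
chi2 = U_tau ** 2 / I_eff
assert I_eff > 0 and chi2 >= 0
```

Repeating the simulation many times and comparing `chi2` against its asymptotic null distribution would give an empirical check of the test's size.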
The test statistic for the global test is given by:
\[\chi_G^2 = U_\theta^\top(\hat{\alpha}_0) \tilde{\mathcal{I}}^{-1}(\hat{\alpha}_0) U_\theta(\hat{\alpha}_0)\]where $\hat{\alpha}_0$ is the maximum likelihood estimate of $\alpha$ under the null and $\tilde{\mathcal{I}} = \mathcal{I}_{\theta, \theta} - \mathcal{I}_{\theta, \alpha} \mathcal{I}_{\alpha, \alpha}^{-1} \mathcal{I}_{\alpha, \theta}$ is the efficient information for $\theta$, assembled from blocks like those derived above.
References
Breslow, N. E., and D. G. Clayton. “Approximate Inference in Generalized Linear Mixed Models.” Journal of the American Statistical Association 88, no. 421 (1993): 9–25. doi:10.2307/2290687.
Lin, X. “Variance Component Testing in Generalised Linear Models with Random Effects.” Biometrika 84, no. 2 (1997): 309–326. doi:10.1093/biomet/84.2.309.
Zhang, D., and X. Lin. “Variance Component Testing in Generalized Linear Mixed Models for Longitudinal/Clustered Data and other Related Topics.” In Random Effect and Latent Variable Model Selection, edited by D. B. Dunson. Lecture Notes in Statistics 192. New York: Springer, 2008. doi:10.1007/978-0-387-76721-5_2.