Score Calculations

The Negative Binomial Case

Introduction

This post is just a catch-all for the derivations behind my score test project with negative binomial outcomes. Our set-up is as follows. We have $n$ observations coming from $k$ different clusters, with cluster $i$ having size $n_i$ for $i \in [k]$. The full data will be denoted by $\mathbf{y}$. Though $\mathbf{y}$ is a vector, we’ll denote the $j$-th observation from cluster $i$ with $\mathbf{y}_{i,j}$. That is, \(\mathbf{y}_{i,j}\) denotes element \(\sum_{l = 1}^{i - 1} n_l + j\) of $\mathbf{y}$. We’ll also denote the $n_i$-dimensional vector of responses for cluster $i$ with $\mathbf{y}_i$.

For each observation, we will have $p$ fixed effect covariates arranged in a $p$-dimensional vector, \(\mathbf{x}_{i, j}\), and $q$ random effects covariates in a $q$-dimensional vector, \(\mathbf{z}_{i,j}\). We’ll assume that observations from different clusters are independent, and that observations within the same cluster are conditionally independent given that cluster’s random effect.

Our model comes in the form of a specification of the conditional mean, $\mu_{i,j} = \mathbb{E}[\mathbf{y}_{i,j} \rvert \beta_i]$ (where we suppress the additional conditioning on the covariates themselves). For a monotonic and differentiable link function, $g(\cdot)$ (e.g. $\log(\cdot)$ or $\text{logit}(\cdot)$), the conditional mean of the $j$-th observation in group $i$ is assumed to be given by:

\[\mu_{i,j} = g^{-1}\left(\alpha^\top \mathbf{x}_{i,j} + \beta_i^\top \mathbf{z}_{i,j} \right) \label{eq:glmm}\]

We then assume that the observations themselves follow some exponential family distribution with measurement errors, $\epsilon_{i,j}$, defined as the deviations of the responses from their (unit-specific) conditional means. These errors are assumed to have mean zero and be independent of each other and of the random effects. We further assume the responses, $\mathbf{y}_{i,j}$, conditional on the random effects (and the covariates), are independent with variances equal to some function of the conditional mean.

In general, we will assume that:

\[\beta_i \overset{iid}{\sim} \mathcal{N}\left(\mathbf{0}_q, D(\tau^2) \right)\]

for some variance component, $\tau^2$. We’ll use $[\cdot] \rvert_{H_0}$ to denote evaluation of the function in brackets when setting $\beta$ equal to $\beta_0$. We’ll also use a superscript $0$ (e.g. $\mu^0$, $\eta^0$, etc.) to denote the quantity under the null hypothesis (i.e. $\tau^2 = \mathbf{0} \implies \beta = \mathbf{0}$).


Negative Binomial Case

Set-Up

In this example, we’ll let the responses be negative binomial. To keep things simple, we’ll say we only have a fixed intercept for each cluster and a single random slope. We let $\phi > 0$ denote the unknown dispersion parameter and assume the conditional mean to be given by:

\[\mu_{i,j} = \exp\left( \alpha_i + \beta_i \mathbf{z}_{i,j} \right) \label{eq:neg-bin-mean}\]

The likelihood based on a single observation, $\mathbf{y}_{i,j}$, is given by:

\[\mathcal{L}(\mathbf{y}_{i,j}; \alpha_i, \phi \rvert \beta_i) = \frac{\Gamma\left(\mathbf{y}_{i,j} + \frac{1}{\phi}\right)}{\Gamma(\mathbf{y}_{i,j} + 1) \Gamma\left(\frac{1}{\phi} \right)}\left(\frac{1}{1 + \phi \mu_{i,j}}\right)^{\frac{1}{\phi}} \left( \frac{\phi \mu_{i,j}}{1 + \phi \mu_{i,j}} \right)^{\mathbf{y}_{i,j}} \label{eq:neg-bin-single-lik}\]

where $\Gamma(\cdot)$ is the gamma function:

\[\Gamma(x) = \int_0^\infty t^{x - 1} \exp(-t) dt\]

The above parametrization of the likelihood implies that the conditional variance of the responses is given by:

\[V(\mu_{i,j}) = \mu_{i,j} + \phi \mu_{i,j}^2\]
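As a quick numerical sanity check of this parametrization (a sketch using scipy; the values of $\mu$ and $\phi$ below are arbitrary), the likelihood above is the pmf of a negative binomial distribution with size $\frac{1}{\phi}$ and success probability $\frac{1}{1 + \phi \mu_{i,j}}$:

```python
import numpy as np
from scipy.stats import nbinom

# Arbitrary illustrative values.
mu, phi = 3.0, 0.5

# Size r = 1 / phi and success probability p = 1 / (1 + phi * mu)
# match the parametrization of the likelihood above.
r, p = 1.0 / phi, 1.0 / (1.0 + phi * mu)

mean, var = nbinom.stats(r, p, moments="mv")
assert np.isclose(mean, mu)               # E[y | beta] = mu
assert np.isclose(var, mu + phi * mu**2)  # V(mu) = mu + phi * mu^2
```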

The conditional log-likelihood based on cluster $i$ is:

\[\ell(\mathbf{y}_i; \alpha_i, \phi \rvert \beta_i) = \sum_{j = 1}^{n_i} \left[ \log \Gamma \left( \mathbf{y}_{i,j} + \frac{1}{\phi} \right) - \log \Gamma\left(\mathbf{y}_{i,j} + 1\right) - \log\Gamma\left(\frac{1}{\phi} \right) - \frac{1}{\phi} \log\left(1 + \phi \mu_{i,j} \right) + \mathbf{y}_{i,j} \left( \log(\phi \mu_{i,j}) - \log(1 + \phi \mu_{i,j}) \right) \right] \label{eq:neg-bin-full-cond-ll}\]

We assume the following generalized linear mixed model:

\[\begin{equation} \label{eq:glmm-y} \begin{aligned} \mathbf{y}_{i,j} \rvert \beta_i &\sim \text{NegBin}(\mu_{i,j}, \phi) \\ \mu_{i,j} &= \exp\left(\eta_{i,j}\right) = \exp\left(\alpha_i + \beta_i \mathbf{z}_{i,j}\right) \end{aligned} \end{equation}\]
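To make the data-generating process concrete, here is a minimal simulation sketch. All dimensions and parameter values are hypothetical, and numpy’s negative binomial generator is called with size $\frac{1}{\phi}$ and success probability $\frac{1}{1 + \phi \mu_{i,j}}$, matching the likelihood above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameter values, purely for illustration.
k, n_i = 10, 25                                # clusters and (equal) cluster sizes
alpha = rng.normal(1.0, 0.3, size=k)           # cluster-specific fixed intercepts
tau2, phi = 0.4, 0.5                           # variance component and dispersion

z = rng.uniform(-1.0, 1.0, size=(k, n_i))      # random-effect covariate z_{i,j}
beta = rng.normal(0.0, np.sqrt(tau2), size=k)  # random slopes beta_i ~ N(0, tau^2)

eta = alpha[:, None] + beta[:, None] * z       # linear predictor eta_{i,j}
mu = np.exp(eta)                               # conditional mean mu_{i,j}

# y_{i,j} | beta_i ~ NegBin(mu_{i,j}, phi)
y = rng.negative_binomial(1.0 / phi, 1.0 / (1.0 + phi * mu))
```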

Approximation

We follow a pseudo-likelihood approach (see here) which permits a Gaussian approximation of the outcome distribution via a linearization. Suppose we find the MLEs of the $\alpha_i$ and $\phi$ terms under $H_0$, which we can do with iteratively reweighted least squares or some comparable algorithm (basically just fitting the null model). Denote these estimates with $\tilde{\alpha}_i$ and $\tilde{\phi}$, respectively. We can compute our working responses and errors as:

\[\begin{equation} \label{eq:working-responses} \mathbf{y}^\star_{i,j} = \alpha_i + \beta_i \mathbf{z}_{i,j} + \epsilon^\star_{i,j}; \hspace{10mm} \epsilon^\star_{i,j} \sim \mathcal{N}\left(0, \frac{V(\tilde{\mu}_{i,j})}{\delta^2(\tilde{\eta}_{i,j})}\right) \end{equation}\]

where

\[\tilde{\eta}_{i,j} = \tilde{\alpha}_i; \hspace{10mm} \delta(\tilde{\eta}_{i,j}) = \frac{\partial g^{-1}(\eta_{i,j})}{\partial \eta_{i,j}}\bigg\rvert_{\eta_{i,j} = \tilde{\eta}_{i,j}}\]

Since $g(\cdot) = \log(\cdot)$, we have that \(\delta(\tilde{\eta}_{i,j}) = \exp(\tilde{\eta}_{i,j})\), implying that the working error variances are:

\[\frac{V(\tilde{\mu}_{i,j})}{\delta^2(\tilde{\eta}_{i,j})} = \frac{\exp(\tilde{\eta}_{i,j}) + \tilde{\phi}\exp(2\tilde{\eta}_{i,j})}{\exp(2\tilde{\eta}_{i,j})} = \frac{1}{\exp(\tilde{\eta}_{i,j})}\left(1 + \tilde{\phi} \exp(\tilde{\eta}_{i,j})\right)\]

where we recall that $\tilde{\eta}_{i,j} = \tilde{\alpha}_i$ under $H_0$. Eq. \eqref{eq:working-responses} describes a linear mixed model (i.e. Gaussian outcomes), defined in terms of the working responses and errors. The only real difference from a standard linear mixed model is that the error variances are observation-specific and held fixed at:

\[\sigma_{i,j}^2 = \frac{V(\tilde{\mu}_{i,j})}{\delta^2(\tilde{\eta}_{i,j})} = \frac{1}{\exp(\tilde{\alpha}_i)} \left(1 + \tilde{\phi} \exp(\tilde{\alpha}_i) \right) = \exp(-\tilde{\alpha}_i) + \tilde{\phi}\]
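As a small sketch in code, the working responses can be built from the null fit with the standard first-order linearization $\mathbf{y}^\star_{i,j} = \tilde{\eta}_{i,j} + (\mathbf{y}_{i,j} - \tilde{\mu}_{i,j}) / \delta(\tilde{\eta}_{i,j})$ (this explicit construction is an assumption on my part, since the post only writes down the working model itself), together with the error variances just derived:

```python
import numpy as np

def working_quantities(y, alpha_tilde, phi_tilde):
    """Working responses and error variances for a single cluster.

    y           : (n_i,) array of raw counts for cluster i
    alpha_tilde : null-model estimate of alpha_i
    phi_tilde   : null-model estimate of phi

    The linearization y* = eta + (y - mu) / delta(eta) is the usual
    pseudo-likelihood construction and is assumed here.
    """
    eta = alpha_tilde                  # under H0 the linear predictor is just alpha_i
    mu = np.exp(eta)                   # g^{-1}(eta) with a log link
    delta = mu                         # d g^{-1}(eta) / d eta = exp(eta)

    y_star = eta + (y - mu) / delta    # working responses
    sigma2 = (mu + phi_tilde * mu**2) / delta**2   # = exp(-alpha_i) + phi
    return y_star, np.full(y.shape, sigma2)
```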

This simplifies the calculations, since we can now work with the Gaussian likelihood instead:

\[\begin{aligned} \mathcal{L}(\theta; \mathbf{y}^\star) &= \prod_{i = 1}^k (2 \pi)^{-\frac{n_i}{2}} \lvert \Sigma_{y^\star_i} \rvert^{-\frac{1}{2}} \exp\left(- \frac{1}{2} (\mathbf{y}^\star_i - \alpha_i \mathbf{1}_{n_i})^\top \Sigma_{y^\star_i}^{-1} (\mathbf{y}^\star_i - \alpha_i \mathbf{1}_{n_i}) \right) \\ \ell(\theta; \mathbf{y}^\star) &= \sum_{i = 1}^k \left[ -\frac{n_i}{2} \log(2 \pi) - \frac{1}{2}\log\left(\lvert \Sigma_{y^\star_i} \rvert\right) - \frac{1}{2} (\mathbf{y}^\star_i - \alpha_i \mathbf{1}_{n_i})^\top \Sigma_{y^\star_i}^{-1} (\mathbf{y}^\star_i - \alpha_i \mathbf{1}_{n_i}) \right] \end{aligned}\]

In the rest of the post, I’ll drop the $\star$ superscript for readability. Just be sure to remember that we are not dealing with the original observations but with their transformations.

Score

The marginal covariance matrix is very similar to the Gaussian outcome model above. The only thing that has changed is that each error has its own variance:

\[\Sigma_{y_i} = \text{diag}\left(\begin{bmatrix} \sigma^2_{i,1} & \dots & \sigma^2_{i, n_i}\end{bmatrix}\right) + \tau^2 \mathbf{z}_i \mathbf{z}_i^\top\]

We’ll also need its inverse, $\Sigma^{-1}_{y_i}$.
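Since $\Sigma_{y_i}$ is a diagonal matrix plus a rank-one update, a sketch via the Sherman–Morrison identity (writing $W_i = \text{diag}\left(1/\sigma^2_{i,1}, \dots, 1/\sigma^2_{i,n_i}\right)$ for the inverse of the diagonal part) gives:

\[\Sigma^{-1}_{y_i} = W_i - \frac{\tau^2 W_i \mathbf{z}_i \mathbf{z}_i^\top W_i}{1 + \tau^2 \mathbf{z}_i^\top W_i \mathbf{z}_i}\]

In particular, under $H_0$ (i.e. $\tau^2 = 0$) this reduces to $\Sigma^{-1}_{y_i} = W_i$.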

Derivatives

We first find the derivative with respect to $\tau^2$:
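Under the working Gaussian likelihood above, and using $\frac{\partial \Sigma_{y_i}}{\partial \tau^2} = \mathbf{z}_i \mathbf{z}_i^\top$, a sketch of this derivative is:

\[\frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} = -\frac{1}{2}\sum_{i = 1}^k \left[ \text{tr}\left(\Sigma^{-1}_{y_i} \mathbf{z}_i \mathbf{z}_i^\top\right) - (\mathbf{y}_i - \alpha_i \mathbf{1}_{n_i})^\top \Sigma^{-1}_{y_i} \mathbf{z}_i \mathbf{z}_i^\top \Sigma^{-1}_{y_i} (\mathbf{y}_i - \alpha_i \mathbf{1}_{n_i}) \right]\]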

Next, we find the derivative with respect to $\phi$:
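The same identity applies to $\phi$, which enters $\Sigma_{y_i}$ only through the working variances, so (a sketch, leaving the inner derivative generic):

\[\frac{\partial \ell(\theta; \mathbf{y})}{\partial \phi} = -\frac{1}{2}\sum_{i = 1}^k \left[ \text{tr}\left(\Sigma^{-1}_{y_i} \frac{\partial \Sigma_{y_i}}{\partial \phi}\right) - (\mathbf{y}_i - \alpha_i \mathbf{1}_{n_i})^\top \Sigma^{-1}_{y_i} \frac{\partial \Sigma_{y_i}}{\partial \phi} \Sigma^{-1}_{y_i} (\mathbf{y}_i - \alpha_i \mathbf{1}_{n_i}) \right]; \hspace{10mm} \frac{\partial \Sigma_{y_i}}{\partial \phi} = \text{diag}\left(\frac{\partial \sigma^2_{i,1}}{\partial \phi}, \dots, \frac{\partial \sigma^2_{i,n_i}}{\partial \phi}\right)\]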

And now we find the gradient with respect to $\alpha$:
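Since $\alpha_i$ only enters through the mean of cluster $i$, a sketch of the $i$-th component of the gradient is:

\[\frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha_i} = \mathbf{1}_{n_i}^\top \Sigma^{-1}_{y_i} (\mathbf{y}_i - \alpha_i \mathbf{1}_{n_i})\]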

MLEs

We can find $\hat{\theta}$ by setting the above equal to zero and substituting $\tau^2 = 0$. We get:
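For the intercepts, setting the $\alpha$-gradient to zero with $\Sigma^{-1}_{y_i}$ evaluated at $\tau^2 = 0$ gives the precision-weighted means:

\[\hat{\alpha}_i = \left(\sum_{l = 1}^{n_i} \frac{1}{\sigma^2_{i,l}} \right)^{-1} \sum_{l = 1}^{n_i} \frac{\mathbf{y}_{i,l}}{\sigma^2_{i,l}}\]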

Thus, the score evaluated at $\theta = \hat{\theta}$ is then (the $\alpha$-components vanish because each $\hat{\alpha}_i$ solves its own score equation):

\[U_{\theta}(\hat{\theta}) = \begin{bmatrix} \frac{\partial \ell(\theta; \mathbf{y})}{\partial \alpha} \bigg\rvert_{\theta = \hat{\theta}} \\ \frac{\partial \ell(\theta; \mathbf{y})}{\partial \tau^2} \bigg\rvert_{\theta = \hat{\theta}} \end{bmatrix} = \begin{bmatrix} \mathbf{0}_k \\ - \frac{1}{2}\sum_{i = 1}^k \left[ \sum_{l = 1}^{n_i} \frac{\mathbf{z}_{i,l}^2}{\sigma_{i,l}^2} - \sum_{l = 1}^{n_i} \sum_{l' = 1}^{n_i} \frac{\mathbf{z}_{i,l}\mathbf{z}_{i,l'}(\mathbf{y}_{i,l} - \hat{\alpha}_i)(\mathbf{y}_{i,l'} - \hat{\alpha}_i)}{\sigma^2_{i,l} \sigma^2_{i,l'}}\right] \end{bmatrix}\]

Information

As before, to find the information, we need to compute the second-order derivatives of the log-likelihood, take the expectation under $H_0$ of minus those quantities, and evaluate them by plugging in $\hat{\theta}$.

Derivatives

We’ll take all of the derivatives component-wise. We’ll start with those with respect to $\tau^2$.
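A sketch, differentiating the $\tau^2$-gradient above once more (writing $\mathbf{r}_i = \mathbf{y}_i - \alpha_i \mathbf{1}_{n_i}$ and again using $\frac{\partial \Sigma_{y_i}}{\partial \tau^2} = \mathbf{z}_i \mathbf{z}_i^\top$):

\[\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial (\tau^2)^2} = \sum_{i = 1}^k \left[ \frac{1}{2}\text{tr}\left(\Sigma^{-1}_{y_i} \mathbf{z}_i \mathbf{z}_i^\top \Sigma^{-1}_{y_i} \mathbf{z}_i \mathbf{z}_i^\top\right) - \mathbf{r}_i^\top \Sigma^{-1}_{y_i} \mathbf{z}_i \mathbf{z}_i^\top \Sigma^{-1}_{y_i} \mathbf{z}_i \mathbf{z}_i^\top \Sigma^{-1}_{y_i} \mathbf{r}_i \right]; \hspace{10mm} \frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \tau^2 \partial \alpha_i} = - \mathbf{1}_{n_i}^\top \Sigma^{-1}_{y_i} \mathbf{z}_i \mathbf{z}_i^\top \Sigma^{-1}_{y_i} \mathbf{r}_i\]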

We then do the same but with respect to $\alpha$.
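A sketch of the second-order derivatives involving only the intercepts:

\[\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \alpha_i^2} = - \mathbf{1}_{n_i}^\top \Sigma^{-1}_{y_i} \mathbf{1}_{n_i}; \hspace{10mm} \frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \alpha_i \partial \alpha_{i'}} = 0 \text{ for } i \neq i'\]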

Expectations

We now take the expectation of the above vectors. We’ll evaluate the second order partial derivatives with respect to $\tau^2$ first.
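A sketch: under $H_0$ we have $\tau^2 = 0$, so $\Sigma^{-1}_{y_i} = W_i$ and $\mathbb{E}\left[\mathbf{r}_i \mathbf{r}_i^\top\right] = W_i^{-1}$, which gives:

\[-\mathbb{E}\left[\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial (\tau^2)^2}\right]\bigg\rvert_{H_0} = \frac{1}{2}\sum_{i = 1}^k \left(\sum_{l = 1}^{n_i} \frac{\mathbf{z}_{i,l}^2}{\sigma^2_{i,l}}\right)^2; \hspace{10mm} -\mathbb{E}\left[\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \tau^2 \partial \alpha_i}\right]\bigg\rvert_{H_0} = 0\]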

And then we do the same for $\alpha_j$:
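Since the second-order derivatives in $\alpha$ do not involve the data, their expectations are immediate (a sketch, again with $\Sigma^{-1}_{y_j} = W_j$ under $H_0$):

\[-\mathbb{E}\left[\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \alpha_j^2}\right]\bigg\rvert_{H_0} = \sum_{l = 1}^{n_j} \frac{1}{\sigma^2_{j,l}}; \hspace{10mm} -\mathbb{E}\left[\frac{\partial^2 \ell(\theta; \mathbf{y})}{\partial \alpha_j \partial \alpha_{j'}}\right]\bigg\rvert_{H_0} = 0 \text{ for } j \neq j'\]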

We then evaluate the Fisher information at the MLE:
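Collecting the blocks above (a sketch; the ordering matches the score vector, with the $\alpha$ block first and $\tau^2$ last), the information is diagonal:

\[\mathcal{I}(\hat{\theta}) = \text{diag}\left( \sum_{l = 1}^{n_1} \frac{1}{\sigma^2_{1,l}}, \; \dots, \; \sum_{l = 1}^{n_k} \frac{1}{\sigma^2_{k,l}}, \; \frac{1}{2}\sum_{i = 1}^k \left(\sum_{l = 1}^{n_i} \frac{\mathbf{z}_{i,l}^2}{\sigma^2_{i,l}}\right)^2 \right)\]

Because the cross terms vanish, estimating the $\alpha_i$ does not reduce the information for $\tau^2$, so the score statistic for $H_0$ can be formed from the last entry of $U_\theta(\hat{\theta})$ and the last diagonal entry of $\mathcal{I}(\hat{\theta})$ alone.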

