GLMM FAQ


an informal GLMM FAQ list for the r-sig-mixed-models mailing list

Ben Bolker https://github.com/Bokola
04-28-2021

Introduction

This is an informal FAQ list for the r-sig-mixed-models mailing list.

The most commonly used functions for mixed modeling in R are

- linear mixed models: aov(), nlme::lme, lme4::lmer, brms::brm
- generalized linear mixed models (GLMMs): MASS::glmmPQL, lme4::glmer, glmmTMB::glmmTMB, MCMCglmm::MCMCglmm, brms::brm
- nonlinear mixed models: nlme::nlme, brms::brm

Another quick-and-dirty way to search for mixed-model related packages on CRAN:

grep("l.?m[me][^t]",rownames(available.packages(repos = 'https://cran.us.r-project.org')),value=TRUE)
 [1] "blmeco"         "buildmer"       "cellVolumeDist"
 [4] "climextRemes"   "elementR"       "glmertree"     
 [7] "glmmboot"       "glmmEP"         "glmmfields"    
[10] "glmmLasso"      "glmmML"         "glmmSeq"       
[13] "glmmsr"         "glmmTMB"        "lamme"         
[16] "lme4"           "lmec"           "lmeInfo"       
[19] "lmem.qtler"     "lmeNB"          "lmeNBBayes"    
[22] "lmeresampler"   "lmerTest"       "lmeSplines"    
[25] "lmeVarComp"     "lmmot"          "lmmpar"        
[28] "lrmest"         "lsmeans"        "mailmerge"     
[31] "mlmm.gwas"      "mlmmm"          "mvglmmRank"    
[34] "nlmeODE"        "nlmeU"          "palmerpenguins"
[37] "tlmec"          "vagalumeR"     

There are some false positives here (e.g. palmerpenguins); see here if you’re interested in “regex golf”.

Other sources of help

DISCLAIMERS:

References

linear mixed models

web/open

books (dead-tree/closed)

Model definition

Model specification

The following formula extensions for specifying random-effects structures in R are used by lme4 and by packages with lme4-style interfaces (e.g. glmmADMB, glmmTMB).

MCMCglmm uses a different specification, inherited from AS-REML.

(Modified from Robin Jeffries, UCLA:)

formula: meaning

(1|group): random group intercept
(x|group) = (1+x|group): random slope of x within group with correlated intercept
(0+x|group) = (-1+x|group): random slope of x within group; no variation in intercept
(1|group) + (0+x|group): uncorrelated random intercept and random slope within group
(1|site/block) = (1|site)+(1|site:block): intercept varying among sites and among blocks within sites (nested random effects)
site+(1|site:block): fixed effect of sites plus random variation in intercept among blocks within sites
(x|site/block) = (x|site)+(x|site:block) = (1+x|site)+(1+x|site:block): slope and intercept varying among sites and among blocks within sites
(x1|site)+(x2|block): two different effects, varying at different levels
x*site+(x|site:block): fixed effect variation of slope and intercept varying among sites and random variation of slope and intercept among blocks within sites
(1|group1)+(1|group2): intercept varying among crossed random effects (e.g. site, year)

Or in a little more detail:

equation: formula

\(\beta_0 + \beta_{1}X_{i} + e_{si}\): n/a (not a mixed-effects model)
\((\beta_0 + b_{S,0s}) + \beta_{1}X_i + e_{si}\): ~ X + (1|Subject)
\((\beta_0 + b_{S,0s}) + (\beta_{1} + b_{S,1s}) X_i + e_{si}\): ~ X + (1 + X|Subject)
\((\beta_0 + b_{S,0s} + b_{I,0i}) + (\beta_{1} + b_{S,1s}) X_i + e_{si}\): ~ X + (1 + X|Subject) + (1|Item)
As above, but \(S_{0s}\), \(S_{1s}\) independent: ~ X + (1|Subject) + (0 + X|Subject) + (1|Item)
\((\beta_0 + b_{S,0s} + b_{I,0i}) + \beta_{1}X_i + e_{si}\): ~ X + (1|Subject) + (1|Item)
\((\beta_0 + b_{I,0i}) + (\beta_{1} + b_{S,1s})X_i + e_{si}\): ~ X + (0 + X|Subject) + (1|Item)

Modified from: http://stats.stackexchange.com/questions/13166/rs-lmer-cheat-sheet?lq=1 (Livius)

The magic development version of the equatiomatic package can handle mixed models (remotes::install_github("datalorax/equatiomatic")), e.g.

library(lme4)
library(equatiomatic)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
equatiomatic::extract_eq(fm1)

\[ \begin{aligned} \operatorname{Reaction}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1j[i]}(\operatorname{Days}), \sigma^2 \right) \\ \left( \begin{array}{c} \begin{aligned} &\alpha_{j} \\ &\beta_{1j} \end{aligned} \end{array} \right) &\sim N \left( \left( \begin{array}{c} \begin{aligned} &\mu_{\alpha_{j}} \\ &\mu_{\beta_{1j}} \end{aligned} \end{array} \right) , \left( \begin{array}{cc} \sigma^2_{\alpha_{j}} & \rho_{\alpha_{j}\beta_{1j}} \\ \rho_{\beta_{1j}\alpha_{j}} & \sigma^2_{\beta_{1j}} \end{array} \right) \right) \text{, for Subject j = 1,} \dots \text{,J} \end{aligned} \]

It doesn’t handle GLMMs (yet), but you could fit two fake models — one LMM like your GLMM but with a Gaussian response, and one GLM with the same family/link function as your GLMM but without the random effects — and put the pieces together.
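For example, a sketch of that workaround (the data dd and variables y, x, g are hypothetical; suppose the real model is glmer(y ~ x + (1|g), family=poisson, data=dd)):

fake_lmm <- lmer(y ~ x + (1|g), data = dd)          ## same RE structure, Gaussian response
fake_glm <- glm(y ~ x, family = poisson, data = dd) ## same family/link, no random effects
extract_eq(fake_lmm)  ## supplies the multilevel/random-effects structure
extract_eq(fake_glm)  ## supplies the family and link pieces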

More possibly useful links:

Should I treat factor xxx as fixed or random?

This is in general a far more difficult question than it seems on the surface. There are many competing philosophies and definitions. For example, from @gelman_analysis_2005:

Before discussing the technical issues, we briefly review what is meant by fixed and random effects. It turns out that different—in fact, incompatible—definitions are used in different contexts. [See also Kreft and de Leeuw (1998), Section 1.3.3, for a discussion of the multiplicity of definitions of fixed and random effects and coefficients, and Robinson (1998) for a historical overview.] Here we outline five definitions that we have seen:

1. Fixed effects are constant across individuals, and random effects vary. For example, in a growth study, a model with random intercepts \(\alpha_i\) and fixed slope \(\beta\) corresponds to parallel lines for different individuals \(i\), or the model \(y_{it} = \alpha_i + \beta t\). Kreft and de Leeuw [(1998), page 12] thus distinguish between fixed and random coefficients.

2. Effects are fixed if they are interesting in themselves or random if there is interest in the underlying population. Searle, Casella and McCulloch [(1992), Section 1.4] explore this distinction in depth.

3. “When a sample exhausts the population, the corresponding variable is fixed; when the sample is a small (i.e., negligible) part of the population the corresponding variable is random” [Green and Tukey (1960)].

4. “If an effect is assumed to be a realized value of a random variable, it is called a random effect” [LaMotte (1983)].

5. Fixed effects are estimated using least squares (or, more generally, maximum likelihood) and random effects are estimated with shrinkage [“linear unbiased prediction” in the terminology of Robinson (1991)]. This definition is standard in the multilevel modeling literature [see, e.g., Snijders and Bosker (1999), Section 4.2] and in econometrics.

Another useful comment (via Kevin Wright) reinforcing the idea that “random vs. fixed” is not a simple, cut-and-dried decision: from @schabenberger_contemporary_2001, p. 627:

Before proceeding further with random field linear models we need to remind the reader of the adage that one modeler’s random effect is another modeler’s fixed effect.

@clark2015should address this question from a mostly econometric perspective, focusing mostly on practical variance/bias/RMSE criteria.

One point of particular relevance to ‘modern’ mixed model estimation (rather than ‘classical’ method-of-moments estimation) is that, for practical purposes, there must be a reasonable number of random-effects levels (e.g. blocks) – more than 5 or 6 at a minimum. This is not surprising if you consider that random effects estimation is trying to estimate an among-block variance. For example, from @Crawley2002 p. 670:

Are there enough levels of the factor in the data on which to base an estimate of the variance of the population of effects? No, means [you should probably treat the variable as] fixed effects.

Some researchers (who treat fixed vs random as a philosophical rather than a pragmatic decision) object to this approach.

Also see a very thoughtful chapter in @hodges_richly_2016.

Treating factors with small numbers of levels as random will in the best case lead to very small and/or imprecise estimates of random effects; in the worst case it will lead to various numerical difficulties such as lack of convergence, zero variance estimates, etc. (A small simulation exercise shows that at least the estimates of the standard deviation are downwardly biased in this case; it’s not clear whether/how this bias would affect the point estimates of fixed effects or their estimated confidence intervals.) In the classical method-of-moments approach these problems may not arise (because the sums of squares are always well defined as long as there are at least two units), but the underlying problems of lack of power are there nevertheless.

Also see this thread on the r-sig-mixed-models mailing list.

Nested or crossed?

(When) can I include a predictor as both fixed and random?

See blog post by Thierry Onkelinx

Model extensions

Overdispersion

Testing for overdispersion/computing overdispersion factor

The following function should work for a variety of model types (at least glmmADMB, glmmTMB, lme4, …).

overdisp_fun <- function(model) {
    rdf <- df.residual(model)                ## residual degrees of freedom
    rp <- residuals(model,type="pearson")    ## Pearson residuals
    Pearson.chisq <- sum(rp^2)
    prat <- Pearson.chisq/rdf                ## overdispersion ratio (should be near 1)
    pval <- pchisq(Pearson.chisq, df=rdf, lower.tail=FALSE)
    c(chisq=Pearson.chisq,ratio=prat,rdf=rdf,p=pval)
}
Example:
set.seed(101)  
d <- data.frame(x=runif(1000),
                f=factor(sample(1:10,size=1000,replace=TRUE)))
suppressMessages(d$y <- simulate(~x+(1|f), family=poisson,
                          newdata=d,
                          newparams=list(theta=1,beta=c(0,2)))[[1]])
m1 <- glmer(y~x+(1|f),data=d,family=poisson)
overdisp_fun(m1)
       chisq        ratio          rdf            p 
1035.9966325    1.0391140  997.0000000    0.1902294 
library(glmmTMB)
m2 <- glmmTMB(y~x+(1|f),data=d,family="poisson")
overdisp_fun(m2)
       chisq        ratio          rdf            p 
1035.9961394    1.0391135  997.0000000    0.1902323 

The gof function in the aods3 package provides similar functionality (it reports both deviance- and \(\chi^2\)-based estimates of overdispersion and tests).

Fitting models with overdispersion?

## extract summary table; you may also be able to do this via
##  broom::tidy or broom.mixed::tidy
quasi_table <- function(model,ctab=coef(summary(model)),
                           phi=overdisp_fun(model)["ratio"]) {
    qctab <- within(as.data.frame(ctab),
    {   `Std. Error` <- `Std. Error`*sqrt(phi)
        `z value` <- Estimate/`Std. Error`
        `Pr(>|z|)` <- 2*pnorm(abs(`z value`), lower.tail=FALSE)
    })
    return(qctab)
}
printCoefmat(quasi_table(m1),digits=3)
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.2277     0.2700    0.84      0.4    
x             2.0640     0.0528   39.11   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## to use this with glmmTMB, we need to separate out the
##  conditional component of the summary
printCoefmat(quasi_table(m2,
                         ctab=coef(summary(m2))[["cond"]]),
             digits=3)
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   0.2277     0.2700    0.84      0.4    
x             2.0640     0.0528   39.09   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Another version, this one tidyverse-centric:

library(broom.mixed)
library(dplyr)
tidy_quasi <- function(model, phi=overdisp_fun(model)["ratio"],
                       conf.level=0.95) {
    tt <- (tidy(model, effects="fixed")
        %>% mutate(std.error=std.error*sqrt(phi),
                   statistic=estimate/std.error,
                   p.value=2*pnorm(abs(statistic), lower.tail=FALSE))
    )
    return(tt)
}
tidy_quasi(m1)
# A tibble: 2 x 6
  effect term        estimate std.error statistic p.value
  <chr>  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 fixed  (Intercept)    0.228    0.270      0.843   0.399
2 fixed  x              2.06     0.0528    39.1     0    
tidy_quasi(m2)
# A tibble: 2 x 7
  effect component term        estimate std.error statistic p.value
  <chr>  <chr>     <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 fixed  cond      (Intercept)    0.228    0.270      0.843   0.399
2 fixed  cond      x              2.06     0.0528    39.1     0    

These functions make some simplifying assumptions: (1) this overdispersion computation is approximate; (2) scaling the standard errors by \(\sqrt{\phi}\) is a post hoc shortcut rather than a full quasi-likelihood fit.

(In this case using quasi-likelihood doesn’t make much difference, since the data we simulated in the first place were Poisson.) Keep in mind that once you switch to quasi-likelihood you will either have to eschew inferential methods such as the likelihood ratio test, profile confidence intervals, AIC, etc., or make more heroic assumptions to compute “quasi-” analogs of all of the above (such as QAIC).

Negative binomial models in glmmTMB and lognormal-Poisson models in glmer (or MCMCglmm) are probably the best quick alternatives for overdispersed count data. If you need to explore alternatives (different variance-mean relationships, different distributions), then ADMB, TMB, WinBUGS, Stan, NIMBLE are the most flexible alternatives.
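For example, sketches of both alternatives, reusing the simulated data d from the overdispersion example above:

## negative binomial (quadratic parameterization) in glmmTMB
m_nb <- glmmTMB(y ~ x + (1|f), family = nbinom2, data = d)
## lognormal-Poisson in glmer, via an observation-level random effect
d$obs <- factor(seq(nrow(d)))
m_lnp <- glmer(y ~ x + (1|f) + (1|obs), data = d, family = poisson)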

Underdispersion

Underdispersion (much less variability than expected) is a less common problem than overdispersion.

Gamma GLMMs

While one (well, OK I) would naively think that GLMMs with Gamma distributions would be just as easy (or hard) as any other sort of GLMMs, it seems that they are in fact harder to implement. Basic simulated examples of Gamma GLMMs can fail in lme4 despite analogous problems with Poisson, binomial, etc. distributions. Solutions:

- the default inverse link seems particularly problematic; try other links (especially family=Gamma(link="log")) if that is possible/makes sense (see the sketch after this list)
- consider whether a lognormal model (i.e. a regular LMM on logged data) would work/make sense
- @lo_transform_2015 argue that the Gamma family with an identity link is superior to lognormal models for reaction-time data. I (BMB) don’t find their argument particularly convincing, but lots of people want to do this. Unfortunately this is technically challenging (see here), because it is likely that some “illegal” values (predicted responses \(\le 0\)) will occur while fitting the model, even if the final fitted model makes no impossible predictions. Thus something has to be done to make the model-fitting machinery tolerant of such values (i.e. returning NA for these model evaluations, or clamping illegal values to the constrained space with an appropriate smooth penalty function).
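A minimal sketch of the log-link option (dd, y, x, and g are hypothetical):

glmer(y ~ x + (1|g), family = Gamma(link = "log"), data = dd)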

Gamma models can be fitted by a wide variety of platforms (lme4::glmer, MASS::glmmPQL, glmmADMB, glmmTMB, MixedModels.jl, MCMCglmm, brms, …); not sure about others.

Beta GLMMs

Proportion data where the denominator (e.g. maximum possible number of successes for a given observation) is not known can be modeled using a Beta distribution. @smithson_better_2006 is a good introduction for non-statisticians (not in the mixed-model case), and the betareg package [@cribari-neto_beta_2009] handles non-mixed Beta regressions. The glmmTMB and brms packages handle Beta mixed models (brms also handles zero-inflated and zero-one inflated models).
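For example, a sketch of a Beta mixed model in glmmTMB (prop is a hypothetical proportion strictly between 0 and 1; x, g, and dd are also hypothetical):

library(glmmTMB)
glmmTMB(prop ~ x + (1|g), family = beta_family(), data = dd)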

Zero-inflation

See e.g. @martin_zero_2005 or @warton_many_2005 (“many zeros does not mean zero inflation”) or @zuur_zero-truncated_2009 for general information on zero-inflation.

Count data

Continuous data

Continuous data are a special case where the mixture model for zero-inflated data is less relevant, because observations that are exactly zero occur with probability (but not necessarily probability density) zero. There are two cases of interest:

Probability density of \(x\) zero or infinite

In this case zero is a problematic observation for the distribution; it’s either impossible or infinitely (locally) likely. Some examples:

The best solution depends very much on the data-generating mechanism.

Probability density of \(x\) positive and finite

In this case (e.g. a spike of zeros in the center of an otherwise continuous distribution), the hurdle model probably makes the most sense.

Tests for zero-inflation

Spatial and temporal correlation models, heteroscedasticity (“R-side” models)

In nlme these so-called R-side (R for “residual”) structures are accessible via the weights/VarStruct (heteroscedasticity) and correlation/corStruct (spatial or temporal correlation) arguments and data structures. This extension is a bit harder than it might seem. In LMMs it is a natural extension to allow the residual error terms to be components of a single multivariate normal draw; if that MVN distribution is uncorrelated and homoscedastic (i.e. proportional to an identity matrix) we get the classic model, but we can in principle allow it to be correlated and/or heteroscedastic.

It is not too hard to define marginal correlation structures that don’t make sense. One class of reasonably sensible models is to always assume an observation-level random effect (as MCMCglmm does for computational reasons) and to allow that random effect to be MVN on the link scale (so that the full model is lognormal-Poisson, logit-normal binomial, etc., depending on the link function and family).

For example, a relatively simple Poisson model with spatially correlated errors might look like this:

\[ \begin{split} \eta & \sim \textrm{MVN}(a + b x, \Sigma) \\ \Sigma_{ij} & = \sigma^2 \exp(-d_{ij}/s) \\ y_i & \sim \textrm{Poisson}(\lambda=\exp(\eta_i)) \end{split} \]

That is, the marginal distributions of the response values are Poisson-lognormal, but on the link (log) scale the latent Normal variables underlying the response are multivariate normal, with a variance-covariance matrix described by an exponential spatial correlation function with scale parameter \(s\).

How can one achieve this?
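One route (a sketch using glmmTMB's covariance structures, which are relatively experimental; the coordinates x and y, response z, and data dd are hypothetical):

library(glmmTMB)
dd$pos <- numFactor(dd$x, dd$y)     ## encode coordinates as factor levels
dd$grp <- factor(rep(1, nrow(dd)))  ## one grouping level for the whole spatial field
glmmTMB(z ~ 1 + exp(pos + 0 | grp), family = poisson, data = dd)

Here exp(pos + 0 | grp) requests an exponential spatial correlation among the latent values, matching the \(\Sigma_{ij}\) defined above.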

A search of CRAN finds additional possibilities in the ramps (extended geostatistical) and ape (phylogenetic) packages.

Penalization/handling complete separation

Complete separation occurs in a binary-response model when there is some linear combination of the parameters that perfectly separates failures from successes - for example, when all of the observations are zero for some particular combination of categories. The symptoms of this problem are unrealistically large parameter estimates; ridiculously large Wald standard errors (the Hauck-Donner effect); and various warnings.

In particular, binomial glmer() models with complete separation can lead to “Downdated VtV is not positive definite” (e.g. see here) or “PIRLS step-halvings failed to reduce deviance in pwrssUpdate” errors (e.g. see here). Roughly speaking, the complete separation is likely to appear even if one considers only the fixed effects part of the model (counterarguments or counterexamples welcome!), suggesting two quick-and-dirty diagnostic methods. If fixed_form is the formula including only the fixed effects:
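For example, one such check runs the fixed-effects part alone through the detectseparation package (a sketch; fixed_form and dd stand in for your formula and data); it yields output like the following:

library(detectseparation)
glm(fixed_form, family = binomial, data = dd,
    method = "detect_separation")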

Separation: TRUE 
Existence of maximum likelihood estimates
(Intercept)      height 
        Inf         Inf 
0: finite value, Inf: infinity, -Inf: -infinity

If complete separation is occurring between categories of a single categorical fixed-effect predictor with a large number of levels, one option would be to treat this fixed effect as a random effect, which will allow some degree of shrinkage to the mean. (It might be reasonable to specify the variance of this term a priori to a large value [minimal shrinkage], rather than trying to estimate it from the data.)

(TODO: worked example)

The general approach to handling complete separation in logistic regression is called penalized regression; it’s available in the brglm, brglm2, logistf, and rms packages. However, these packages don’t handle mixed models, so the best available general approach is to use a Bayesian method that allows you to set a prior on the fixed effects, e.g. a Gaussian with standard deviation of 3; this can be done in any of the Bayesian GLMM packages (e.g. blme, MCMCglmm, brms, …) (See supplementary material for Fox et al. 2016 for a worked example.)
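A hedged sketch of the Bayesian route in brms (y, x, g, and dd are hypothetical):

library(brms)
brm(y ~ x + (1 | g), family = bernoulli, data = dd,
    prior = set_prior("normal(0, 3)", class = "b"))  ## Normal(0, 3) prior on fixed effects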

Non-Gaussian random effects

I’m not aware of easy ways to fit mixed models with non-Gaussian random effects distributions in R (i.e., convenient, flexible, well-tested implementations). @mcculloch_misspecifying_2011 discusses when this misspecification may be important. This presentation discusses various approaches to solving the problem (e.g. using a Gamma rather than a Normal distribution of REs in log-link models). The spaMM package implements H-likelihood models [@lee_generalized_2017], and claims to allow a range of random-effects distributions (perhaps not well tested though …)

In principle you can implement any random-effects distribution you want in a fully capable Bayesian modeling language (e.g. JAGS/Stan/PyMC/etc.); see e.g. this StackOverflow answer, which uses the rethinking package’s interface to Stan.

Estimation

What methods are available to fit (estimate) GLMMs?

(adapted from Bolker et al. TREE 2009)

Penalized quasi-likelihood. Advantages: flexible, widely implemented. Disadvantages: likelihood inference may be inappropriate; biased for large variances or small means. Packages: PROC GLIMMIX (SAS), GLMM (GenStat), glmmPQL (R:MASS), ASREML-R.

Laplace approximation. Advantages: more accurate than PQL. Disadvantages: slower and less flexible than PQL. Packages: glmer (R:lme4, lme4a), glmm.admb (R:glmmADMB), INLA, glmmTMB, AD Model Builder, HLM.

Gauss-Hermite quadrature. Advantages: more accurate than Laplace. Disadvantages: slower than Laplace; limited to 2-3 random effects. Packages: PROC NLMIXED (SAS), glmer (R:lme4, lme4a), glmmML (R:glmmML), xtlogit (Stata).

Markov chain Monte Carlo. Advantages: highly flexible, arbitrary number of random effects; accurate. Disadvantages: slow, technically challenging, Bayesian framework. Packages: MCMCglmm (R:MCMCglmm), rstanarm (R), brms (R), MCMCpack (R), WinBUGS/OpenBUGS (R interface: BRugs/R2WinBUGS), JAGS (R interface: rjags/R2jags), AD Model Builder (R interface: R2admb), glmm.admb (post hoc MCMC after Laplace fit) (R:glmmADMB).

Troubleshooting

Convergence warnings

Most of the current advice about troubleshooting lme4 convergence problems can be found in the help page ?convergence. That page explains that the convergence tests in the current version of lme4 (1.1-11, February 2016) generate lots of false positives. We are considering raising the gradient warning threshold to 0.01 in future releases of lme4. In addition to the general troubleshooting tips above:

Singular models: random effect variances estimated as zero, or correlations estimated as +/- 1

It is very common for overfitted mixed models to result in singular fits. Technically, singularity means that some of the \(\boldsymbol \theta\) (variance-covariance Cholesky decomposition) parameters corresponding to diagonal elements of the Cholesky factor are exactly zero, which is the edge of the feasible space, or equivalently that the variance-covariance matrix has some zero eigenvalues (i.e. is positive semidefinite rather than positive definite), or (almost equivalently) that some of the variances are estimated as zero or some of the correlations are estimated as +/-1. This commonly occurs in two scenarios:

- In lme4 (version 1.1-19 and later), you can test for singularity with isSingular(model).
- In MCMCglmm, singular or near-singular models will provoke an error and a requirement to specify a stronger prior.

At present there are a variety of strong opinions about how to resolve such problems. Briefly:

Setting residual variances to a fixed value (zero or other)

For some problems it would be convenient to be able to set the residual variance term to zero, or a fixed value. This is difficult in lme4, because the model is parameterized internally in such a way that the residual variance is profiled out (i.e., calculated directly from a residual deviance term) and the random-effects variances are scaled by the residual variance.

Searching the r-sig-mixed-models list for “fix residual variance” turns up several options:

library(blme)
blmer(formula = y ~ 1 + (1 | group), weights = V,
      resid.prior = point(1.0), cov.prior = NULL)

This sets the residual variance to 1.0. You cannot use this to make it exactly zero, but you can make it very small (and experiment with setting it to different small values, e.g. 0.001 vs 0.0001, to see how sensitive the results are).

Similarly, you can fix the residual variance to a small positive value in [n]lme via the control() argument [@heisterkamp_update_2017]:

nlme::lme(Reaction~Days,random=~1|Subject,
          data=lme4::sleepstudy,
          control=list(sigma=1e-8))

The resulting function is not useful for general nonlinear optimization — one can easily wander into parameter regimes corresponding to infeasible (non-positive semidefinite) variance-covariance matrices — but it serves for likelihood profiling, where one focal parameter is varied at a time and the optimization over the other parameters is likely to start close to an optimum.

Other problems/lme4 error messages

Most of the following error messages are relatively unusual, and happen mostly with complex/large/unstable models. There is often no simple fix; the standard suggestions for troubleshooting are (1) try rescaling and/or centering predictors; (2) see if a simpler model can be made to work; (3) look for severe lack of balance and/or complete separation in the data set.

REML for GLMMs

Model diagnostics

Inference and confidence intervals

Testing hypotheses

What are the p-values listed by summary(glmerfit) etc.? Are they reliable?

By default, in keeping with the tradition in analysis of generalized linear models, lme4 and similar packages display the Wald Z-statistics for each parameter in the model summary. These have one big advantage: they’re convenient to compute. However, they are asymptotic approximations, assuming both that (1) the sampling distributions of the parameters are multivariate normal (or equivalently that the log-likelihood surface is quadratic) and that (2) the sampling distribution of the log-likelihood is (proportional to) \(\chi^2\). The second approximation is discussed further under “Degrees of freedom”. The first assumption usually requires an even greater leap of faith, and is known to cause problems in some contexts (for binomial models failures of this assumption are called the Hauck-Donner effect), especially with extreme-valued parameters.

Methods for testing single parameters

From worst to best:

Tests of effects (i.e. testing that several parameters are simultaneously zero)

From worst to best:

Is the likelihood ratio test reliable for mixed models?

Why doesn’t lme4 display denominator degrees of freedom/p values? What other options do I have?

There is an R FAQ entry on this topic, which links to a mailing list post by Doug Bates (there is also a voluminous mailing list thread reproduced on the R wiki). The bottom line is

Df alternatives:

These conditional tests for fixed-effects terms require denominator degrees of freedom. In the case of the conditional \(F\)-tests, the numerator degrees of freedom are also required, being determined by the term itself. The denominator degrees of freedom are determined by the grouping level at which the term is estimated. A term is called inner relative to a factor if its value can change within a given level of the grouping factor. A term is outer to a grouping factor if its value does not change within levels of the grouping factor. A term is said to be estimated at level \(i\), if it is inner to the \(i-1\)st grouping factor and outer to the \(i\)th grouping factor. For example, the term Machine in the fm2Machine model is outer to Machine %in% Worker and inner to Worker, so it is estimated at level 2 (Machine %in% Worker). If a term is inner to all \(Q\) grouping factors in a model, it is estimated at the level of the within-group errors, which we denote as the \(Q+1\)st level.

The intercept, which is the parameter corresponding to the column of all 1’s in the model matrices \(X_i\), is treated differently from all the other parameters, when it is present. As a parameter it is regarded as being estimated at level 0 because it is outer to all the grouping factors. However, its denominator degrees of freedom are calculated as if it were estimated at level \(Q+1\). This is because the intercept is the one parameter that pools information from all the observations at a level even when the corresponding column in \(X_i\) doesn’t change with the level.

Letting \(m_i\) denote the total number of groups in level \(i\) (with the convention that \(m_0=1\) when the fixed effects model includes an intercept and 0 otherwise, and \(m_{Q+1}=N\)) and \(p_i\) denote the sum of the degrees of freedom corresponding to the terms estimated at level \(i\), the \(i\)th level denominator degrees of freedom is defined as

\[ \mathrm{denDF}_i = m_i - (m_{i-1} + p_i), i = 1, \dots, Q \]

This definition coincides with the classical decomposition of degrees of freedom in balanced, multilevel ANOVA designs and gives a reasonable approximation for more general mixed-effects models.

Note that the implementation used in lme gets the wrong answer for random-slopes models:
library(nlme)
lmeDF <- function(formula=distance~age,random=~1|Subject) {
    mod <- lme(formula,random,data=Orthodont)
    aa <- anova(mod)
    return(setNames(aa[,"denDF"],rownames(aa)))
}
lmeDF()
(Intercept)         age 
         80          80 
lmeDF(random=~age|Subject) ## wrong!
(Intercept)         age 
         80          80 

I (BB) have re-implemented this algorithm in a way that does slightly better for random-slopes models (but may still get confused!), see here.

## calcDenDF.R is available from Ben Bolker's mixedmodels-misc repository (linked above)
source("calcDenDF.R")
calcDenDF(~age,"Subject",nlme::Orthodont)
(Intercept)         age 
         80          80 
calcDenDF(~age,data=nlme::Orthodont,random=~1|Subject)
(Intercept)         age 
         80          80 
calcDenDF(~age,data=nlme::Orthodont,random=~age|Subject) ## off by 1
(Intercept)         age 
         81          25 

Testing significance of random effects
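The comparison below shows the kind of output anova() produces; a sketch of code that would generate it (using lme4 and the sleepstudy data, with ML fits so that deviances are comparable):

library(lme4)
m0 <- lm(Reaction ~ Days, sleepstudy)
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
m2 <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
           sleepstudy, REML = FALSE)
anova(m2, m1, m0)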

Data: sleepstudy
Models:
m0: Reaction ~ Days
m1: Reaction ~ Days + (1 | Subject)
m2: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
   npar    AIC    BIC  logLik deviance   Chisq Df Pr(>Chisq)    
m0    3 1906.3 1915.9 -950.15   1900.3                          
m1    4 1802.1 1814.8 -897.04   1794.1 106.214  1  < 2.2e-16 ***
m2    5 1762.0 1778.0 -876.00   1752.0  42.075  1  8.782e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With recent versions of lme4, goodness-of-fit (deviance) can be compared between (g)lmer and (g)lm models, although anova() must be called with the mixed ((g)lmer) model listed first. Keep in mind that LRT-based null hypothesis tests are conservative when the null value (such as \(\sigma^2=0\)) is on the boundary of the feasible space [@self_asymptotic_1987;@stram_variance_1994;@GoldmanWhelan2000]; in the simplest case (single random effect variance), the p-value is approximately twice as large as it should be [@pinheiro_mixed-effects_2000].
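For a single variance parameter, a quick correction (a sketch, reusing m0 and m1 from above) is simply to halve the p-value:

lrt <- as.numeric(2 * (logLik(m1) - logLik(m0)))  ## likelihood ratio statistic
0.5 * pchisq(lrt, df = 1, lower.tail = FALSE)     ## boundary-corrected p-value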

library(RLRsim)
## compare m0 and m1
exactLRT(m1,m0)

    simulated finite sample distribution of LRT. (p-value based
    on 10000 simulated values)

data:  
LRT = 106.21, p-value < 2.2e-16
## compare m1 and m2
mA <- update(m2,REML=TRUE)
m0B <- update(mA, . ~ . - (0 + Days|Subject))
m.slope  <- update(mA, . ~ . - (1|Subject))
exactRLRT(m0=m0B,m=m.slope,mA=mA)

    simulated finite sample distribution of RLRT.
    
    (p-value based on 10000 simulated values)

data:  
RLRT = 42.796, p-value < 2.2e-16
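The output below is the kind produced by a parametric bootstrap comparison with pbkrtest::PBmodcomp; a sketch (exact timings and p-values depend on the machine and random seed):

library(pbkrtest)
PBmodcomp(m2, m1, nsim = 1000)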
Bootstrap test; time: 53.98 sec; samples: 1000; extremes: 0;
Requested samples: 1000 Used samples: 500 Extremes: 0
large : Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
Reaction ~ Days + (1 | Subject)
         stat df   p.value    
LRT    42.075  1 8.782e-11 ***
PBtest 42.075     0.001996 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Standard errors of variance estimates

P-values: MCMC and parametric bootstrap

Abandoning the approximate \(F\)/\(t\)-statistic route, one ends up with the more general problem of estimating \(p\)-values. There is a wider range of options here, although many of them are computationally intensive …

Markov chain Monte Carlo sampling:

Status of mcmcsamp

mcmcsamp is a function for lme4 that is supposed to sample from the posterior distribution of the parameters, based on flat/improper priors for the parameters [ed: I believe, but am not sure, that these priors are flat on the scale of the theta (Cholesky-factor) parameters]. At present, in the CRAN version (lme4 0.999999-0) and the R-forge “stable” version (lme4.0 0.999999-1), this covers only linear mixed models with uncorrelated random effects.

As has been discussed in a variety of places (e.g. on r-sig-mixed-models, and on the r-forge bug tracker), it is challenging to come up with a sampler that accounts properly for the possibility that the posterior distributions for some of the variance components may be mixtures of point masses at zero and continuous distributions. Naive samplers are likely to get stuck at or near zero. Doug Bates has always been a bit unsure that mcmcsamp is really performing as intended, even in the limited cases it now handles.

Given this uncertainty about how even the basic version works, the lme4 developers have been reluctant to make the effort to extend it to GLMMs or more complex LMMs, or to implement it for the development version of lme4 … so unless something miraculous happens, it will not be implemented for the new version of lme4. As always, users are encouraged to write and share their own code that implements these capabilities …

Parametric bootstrap

The idea here is that in order to do inference on the effect of (a) predictor(s), you (1) fit the reduced model (without the predictors) to the data; (2) many times, (2a) simulate data from the reduced model; (2b) fit both the reduced and the full model to the simulated (null) data; (2c) compute some statistic(s) [e.g. the t-statistic of the focal parameter, or the log-likelihood or deviance difference between the models]; (3) compare the observed values of the statistic from fitting your full model to the data to the null distribution generated in step 2. Implementations (see also the sketch after this list):

- PBmodcomp in the pbkrtest package
- the example in help("simulate-mer") in the lme4 package shows how to roll your own, using a combination of simulate() and refit()
- bootMer in lme4 version >1.0.0
- a presentation at UseR! 2009 (abstract, slides) went into detail about a proposed bootMer package and suggested it could work for GLMMs too, but it does not seem to be active.
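A minimal sketch of the simulate()/refit() recipe, using the m1 and m2 sleepstudy fits from above:

obsLRT <- as.numeric(2 * (logLik(m2) - logLik(m1)))   ## observed statistic
nulldist <- replicate(200, {
    ysim <- simulate(m1)[[1]]                  ## (2a) simulate from the reduced model
    s1 <- refit(m1, ysim)                      ## (2b) refit the reduced ...
    s2 <- refit(m2, ysim)                      ##      ... and full models to null data
    as.numeric(2 * (logLik(s2) - logLik(s1)))  ## (2c) LRT statistic
})
mean(nulldist >= obsLRT)                       ## (3) parametric bootstrap p-value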

Predictions and/or confidence (or prediction) intervals on predictions

Note that none of the following approaches takes the uncertainty of the random effects parameters into account … if you want to take RE parameter uncertainty into account, a Bayesian approach is probably the easiest way to do it.

The general recipe for computing predictions from a linear or generalized linear (mixed) model is to:

- construct the model matrix \(X\) for the new data;
- multiply it by the fixed-effect coefficient vector \(\beta\) to get the predictions (the linear predictor, in the case of GLM(M)s);
- compute the variance-covariance matrix of the predictions as \(X V X^\top\), where \(V\) is the variance-covariance matrix of \(\beta\); its diagonal gives the variances of the predictions;
- for prediction (rather than confidence) intervals, add the residual variance;
- take square roots to get standard errors and construct Normal-approximation intervals;
- for GLM(M)s, back-transform the interval endpoints with the inverse link function.

The lme, lme4, and glmmTMB examples below all follow these steps.

lme

library(nlme) 
fm1 <- lme(distance ~ age*Sex, random = ~ 1 + age | Subject,
           data = Orthodont) 
plot(Orthodont,asp="fill") ## plot responses by individual
## note that expand.grid() orders factor levels by *order of
## appearance* -- must match levels(Orthodont$Sex)
newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Female","Male")) 
newdat$pred <- predict(fm1, newdat, level = 0)

## [-2] drops response from formula
Designmat <- model.matrix(formula(fm1)[-2], newdat)
predvar <- diag(Designmat %*% vcov(fm1) %*% t(Designmat)) 
newdat$SE <- sqrt(predvar) 
newdat$SE2 <- sqrt(predvar+fm1$sigma^2)

library(ggplot2) 
pd <- position_dodge(width=0.4) 
g0 <- ggplot(newdat,aes(x=age,y=pred,colour=Sex))+ 
   geom_point(position=pd)
cmult <- 2  ## could use 1.96 instead
g0 + geom_linerange(aes(ymin=pred-cmult*SE,ymax=pred+cmult*SE), position=pd)
## prediction intervals 
g0 + geom_linerange(aes(ymin=pred-cmult*SE2,ymax=pred+cmult*SE2), position=pd) 

A similar answer is laid out in the responses to this StackOverflow question.

lme4

Current versions of lme4 have a predict method.

library(lme4)
library(ggplot2)
data("Orthodont",package="MEMSS")
fm1 <- lmer(
  formula = distance ~ age*Sex + (age|Subject)
  , data = Orthodont
)
newdat <- expand.grid(
  age=c(8,10,12,14)
  , Sex=c("Female","Male")
  , distance = 0
)
newdat$distance <- predict(fm1,newdat,re.form=NA)
mm <- model.matrix(terms(fm1),newdat)
## or newdat$distance <- mm %*% fixef(fm1)
pvar1 <- diag(mm %*% tcrossprod(vcov(fm1),mm))
tvar1 <- pvar1+VarCorr(fm1)$Subject[1]  ## must be adapted for more complex models
cmult <- 2 ## could use 1.96
newdat <- data.frame(
  newdat
  , plo = newdat$distance-cmult*sqrt(pvar1)
  , phi = newdat$distance+cmult*sqrt(pvar1)
  , tlo = newdat$distance-cmult*sqrt(tvar1)
  , thi = newdat$distance+cmult*sqrt(tvar1)
)
#plot confidence
g0 <- ggplot(newdat, aes(x=age, y=distance, colour=Sex))+geom_point()
g0 + geom_pointrange(aes(ymin = plo, ymax = phi))+
    labs(title="CI based on fixed-effects uncertainty ONLY")
#plot prediction
g0 + geom_pointrange(aes(ymin = tlo, ymax = thi))+
    labs(title="CI based on FE uncertainty + RE variance")
rm("Orthodont") ## clean up

glmmTMB

library(glmmTMB)
data(Orthodont,package="nlme")
fm2 <- glmmTMB(distance ~ age*Sex + (age | Subject),
                data = Orthodont,
                family="gaussian")

## make prediction data frame
newdat <- expand.grid(age=c(8,10,12,14), Sex=c("Female","Male"))
## design matrix (fixed effects)
mm <- model.matrix(delete.response(terms(fm2)),newdat)
## linear predictor (for GLMMs, back-transform this with the
##  inverse link function: e.g. plogis() for binomial/beta,
##  exp() for Poisson/negative binomial)
newdat$distance <- drop(mm %*% fixef(fm2)[["cond"]])
predvar <- diag(mm %*% vcov(fm2)[["cond"]] %*% t(mm))
newdat$SE <- sqrt(predvar) 
newdat$SE2 <- sqrt(predvar+sigma(fm2)^2)
(Probably overly complicated) ggplot code:
library(ggplot2);  theme_set(theme_bw())
pd <- position_dodge(width=0.4)
g0 <- ggplot(Orthodont,aes(x=age,y=distance,colour=Sex))+
    stat_sum(alpha=0.2,aes(size=..n..))+
    scale_size_continuous(breaks=1:4,range=c(2,5))
g1 <- g0+geom_line(data=newdat,position=pd)+
    geom_point(data=newdat,shape=17,size=3,position=pd)
## confidence intervals
g2 <- g1 + geom_linerange(data=newdat,
                          aes(ymin=distance-2*SE,ymax=distance+2*SE),
                          lwd=2, position=pd)
## prediction intervals 
g2 + geom_linerange(data=newdat,
                    aes(ymin=distance-2*SE2,ymax=distance+2*SE2), position=pd)

The effects, emmeans, and sjPlot packages are also useful here.

Confidence intervals on conditional means/BLUPs/random effects

lme4

(From Harold Doran, updated by Assaf Oron Nov. 2013:)

If you want the standard errors of the conditional means, you can extract them as follows:
library(lme4)
fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy)
cV <- ranef(fm1, condVar = TRUE)   

cV is a list; each element is a data frame containing the conditional modes for a particular grouping factor. If you use scalar random effects (typically random intercepts), then specifying ranef(...,drop=TRUE) will return the conditional modes as a single named vector instead.

The conditional variances are returned as an attribute of the conditional modes. It may be easiest to use as.data.frame(cV), or broom.mixed::tidy(fm1, effects="ran_vals"), to extract all of the conditional means and standard deviations.

Or, digging in to the data structure by hand: if we set
ranvar <- attr(cV[[1]], "postVar")
then ranvar is a 3-D array (the attribute is still called postVar, rather than condVar, for historical reasons/backward compatibility). Individual-level covariance matrices of the conditional modes will sit on the [,,i] faces. For example, ranvar[,,1] is the variance-covariance matrix of the conditional distribution for the first group, so
sqrt(diag(ranvar[,,1]))
[1] 12.070857  2.304839
will provide the intercept and slope standard deviations for the first group’s conditional modes. If you have a scalar random effect and used drop=TRUE in ranef(), then you will (mercifully) receive only a vector of individual variances here (one for each level of the grouping factor). The following incantation will give a matrix of conditional variances with one row for each group and one column for each parameter:
ng <- dim(ranvar)[3]  ## number of groups
np <- dim(ranvar)[2]  ## number of random-effect terms
mm <- matrix(ranvar[cbind(rep(seq(np),ng),          ## row indices
                          rep(seq(np),ng),          ## column indices (diagonal elements)
                          rep(seq(ng),each=np))],   ## group indices
             byrow=TRUE,
             nrow=ng)

Getting the uncertainty of the coefficients (i.e., what’s returned by coef(): the sum of the fixed-effect and random-effect predictions for a particular level) is not (alas) currently easy with lme4. If the fixed and random effects were independent then we could simply add the conditional variance and the variance of the fixed-effect predictions, but they aren’t in general. There is a long r-sig-mixed-models mailing list thread that discusses the issues, focusing on (1) how to extract the covariance between fixed-effect estimate and the random-effect prediction; (2) whether this value (the covariance between an estimated parameter and a predicted mode of a conditional distribution of a random variable) even makes sense in a frequentist framework. If you’re willing to assume independence of the conditional variance and the fixed-effect sampling variance, then (e.g.) the variance of the intercepts for each group would be the sum of the fixed-effect intercept variance and the conditional variance of the intercept for each group:

vcov(fm1)[1,1]+mm[,1]
 [1] 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807
 [8] 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807 192.2807
[15] 192.2807 192.2807 192.2807 192.2807

Power analysis

Power analysis for (G)LMMs is mostly done by simulation, although there are some closed-form solutions and approximations, e.g. @snijders_standard_1993 (Snijders has links to programs and other resources on his web page). There are resources and bits of code examples spread all over the internet, e.g. here.

@kain_practical_2015 and @johnson_power_2015 are peer-reviewed papers that discuss power analysis via simulation in more detail.

library(sos); findFn("{power analysis} mixed simulation")

locates the fullfact, pamm, simr, and simglm packages. Depending on the goal, one of these packages may have sufficient flexibility to do what you want.
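For example, a minimal simr sketch (reusing the Poisson fit m1 from the overdispersion section; the defaults will usually need tailoring to your design):

library(simr)
powerSim(m1, nsim = 100)                  ## power for the fixed effect of x
m_big <- extend(m1, along = "f", n = 20)  ## pretend we had 20 groups instead of 10
powerSim(m_big, nsim = 100)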

Model selection and averaging

Can I use AIC for mixed models? How do I count the number of degrees of freedom for a random effect?

Model summaries (goodness-of-fit, decomposition of variance, etc.)

How do I compute a coefficient of determination (\(R^2\)), or an analogue, for (G)LMMs?

Problem

(This topic applies to both LMMs and GLMMs, perhaps more so to LMMs, because the issues are even harder for GLMMs.)

Researchers often want to know if there is a simple (or at least implemented-in-R) way to get an analogue of \(R^2\) or another simple goodness-of-fit metric for LMMs or GLMMs. This is a challenging question in both the GLM and LMM worlds (and therefore doubly so for GLMMs), because it turns out that the wonderful simplicity of \(R^2\) breaks down in the extension to GLMs or LMMs. If you’re trying to quantify “fraction of variance explained” in the GLM context, should you include or exclude sampling variation (e.g., Poisson variation around the expected mean)? [According to an sos::findFn search for “Nagelkerke”, one of the common solutions to this problem, the LogRegR2 function in the descr package computes several different “pseudo-\(R^2\)” measures for logistic regression.] If you’re trying to quantify it in the LMM context, should you include or exclude variation of different random-effects terms?

The same questions apply more generally to decomposition of variance (i.e. trying to assess the contribution of various model components to the overall fit, not just trying to assess the overall goodness-of-fit of the model); there is unlikely to be a single recipe that does everything you want.

This has been discussed at various times on the mailing lists. This thread and this thread on the r-sig-mixed-models mailing list are good starting points, and this post is useful too.

In one of those threads, Doug Bates said:

Assuming that one wants to define an R^2 measure, I think an argument could be made for treating the penalized residual sum of squares from a linear mixed model in the same way that we consider the residual sum of squares from a linear model. Or one could use just the residual sum of squares without the penalty or the minimum residual sum of squares obtainable from a given set of terms, which corresponds to an infinite precision matrix. I don’t know, really. It depends on what you are trying to characterize.

Simple/crude solutions

In one of those threads, Jarrett Byrnes contributed the following code:
r2.corr.mer <- function(m) {
   lmfit <-  lm(model.response(model.frame(m)) ~ fitted(m))
   summary(lmfit)$r.squared
}

\(\Omega^2_0\) [@xu_measuring_2003], which is almost the same, is based on comparing the residual variance of the full model against the residual variance of a (fixed) intercept-only null model:

1-var(residuals(m))/var(model.response(model.frame(m)))

Another possibility is the squared correlation between the response variable and the predicted values:

cor(model.response(model.frame(m)),predict(m,type="response"))^2

Sophisticated solutions

@gelman_bayesian_2006 propose/discuss Bayesian measures of \(R^2\) (I don’t know if anyone has created a canned implementation of these measures in R). @nakagawa_general_2013 and @johnson_extension_2014 have also proposed a general methodology for computing \(R^2\); J. Lefcheck gives examples here and here, based on his implementation in the piecewiseSEM package (CRAN, Github). See also @jaeger_r2_2017, @rights_quantifying_2018

A related question is how to quantify “repeatability” (i.e., ratios of variance at different levels) in GLMMs, especially how to compute the “residual error” term for GLMMs: see @nakagawa_repeatability_2010 and the rptR package.

The bottom line is that there are some simple recipes (and some more complex recipes that may or may not have been coded up by someone), but that you have to think carefully about what information you want to get out of the coefficient of determination, because no recipe will have all of the properties of \(R^2\) in the simple linear model case.

Packages/functions: See performance::r2(), MuMIn::r.squaredGLMM(), the r2glmm package, the standalone r2MLM function, stuff in the piecewiseSEM package, psycho::R2_nakagawa, partR2 package … (try e.g. sos::findFn("Nakagawa Schielzeth") for an up-to-date list …)

Variable importance

Do I have to specify the levels of fixed effects in lmer?

No. See Doug Bates's reply to this question here.

Miscellaneous/procedural

Pronunciation of lmer/glmer/etc.

Storing information

Recent versions of lme4 output contain an @optinfo slot that stores warnings.

Copied from https://stat.ethz.ch/pipermail/r-help/2012-February/302767.html :

There’s a somewhat hack-ish solution, which is to use options(warn=2) to ‘upgrade’ warnings to errors, and then use try() or tryCatch() to catch them.

More fancily, I used code that looked something like this to save warnings as I went along (sorry about the <<- ) in a recent simulation study. You could also check w$message to do different things in the case of different warnings.

## n.b. have to set up a 3-D warn array first ...
withCallingHandlers(
    tryCatch(fun(n = nvec[j], tau = tauvec[i], ...),
             error = function(e) {
                 warn[k, i, j] <<- paste("ERROR:", e$message)
                 NA_ans
             }),
    warning = function(w) {
        warn[k, i, j] <<- w$message
        invokeRestart("muffleWarning")
    })

Mixed modeling packages

Which R packages (functions) fit GLMMs?

Should I use aov(), nlme, or lme4, or some other package?

The following is modified from a contribution by Kingsford Jones, found 2010-03-16

linear and nonlinear mixed models

GLMMs

Additive and generalized-additive mixed models

Hierarchical GLMs

diagnostic and modeling frameworks

data and examples

extensions

Interfaces to other systems

modeling based on LMMs

Off-CRAN mixed modeling packages:

R-forge and Github:

Other open source:

Commercial:

Package versions used

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] ggplot2_3.3.3      RLRsim_3.1-6       nlme_3.1-148      
 [4] dplyr_1.0.3        broom.mixed_0.2.6  glmmTMB_1.0.2.1   
 [7] equatiomatic_0.2.0 lme4_1.1-26        Matrix_1.2-18     
[10] Cairo_1.5-12.2     pander_0.6.3       knitr_1.31        

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        mvtnorm_1.1-1     lattice_0.20-41  
 [4] tidyr_1.1.2       zoo_1.8-8         assertthat_0.2.1 
 [7] digest_0.6.27     utf8_1.1.4        R6_2.5.0         
[10] plyr_1.8.6        backports_1.2.1   evaluate_0.14    
[13] coda_0.19-4       highr_0.8         pillar_1.5.1     
[16] rlang_0.4.10      multcomp_1.4-15   minqa_1.2.4      
[19] rstudioapi_0.13   nloptr_1.2.2.2    jquerylib_0.1.3  
[22] rmarkdown_2.7     labeling_0.4.2    splines_4.0.2    
[25] statmod_1.4.35    stringr_1.4.0     TMB_1.7.18       
[28] munsell_0.5.0     broom_0.7.5       compiler_4.0.2   
[31] xfun_0.20         pkgconfig_2.0.3   mgcv_1.8-31      
[34] htmltools_0.5.1.1 downlit_0.2.1     tidyselect_1.1.0 
[37] tibble_3.1.0      codetools_0.2-16  fansi_0.4.2      
[40] withr_2.4.0       crayon_1.3.4      MASS_7.3-53.1    
[43] grid_4.0.2        gtable_0.3.0      jsonlite_1.7.2   
[46] xtable_1.8-4      lifecycle_0.2.0   DBI_1.1.1        
[49] magrittr_2.0.1    scales_1.1.1      estimability_1.3 
[52] cli_2.2.0         stringi_1.5.3     farver_2.0.3     
[55] reshape2_1.4.4    bslib_0.2.4       ellipsis_0.3.1   
[58] generics_0.1.0    vctrs_0.3.6       boot_1.3-25      
[61] sandwich_3.0-0    distill_1.2       TH.data_1.0-10   
[64] tools_4.0.2       glue_1.4.2        purrr_0.3.4      
[67] emmeans_1.5.3     survival_3.1-12   yaml_2.2.1       
[70] colorspace_2.0-0  sass_0.3.1       

To do

Credits

This is original work by Ben Bolker and colleagues; kindly credit them if you find it useful. Original post at https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#



  1. in R, foo::bar (or foo::bar()) denotes “function bar in package foo”.

  2. the dispersion factor is estimated on the variance scale, so we need to take the square root to apply it to the standard error.

Citation

For attribution, please cite this work as

Bolker (2021, April 28). Basil Okola: GLMM FAQ. Retrieved from https://bokola214.netlify.app/posts/2021-04-28-glmm-faq/

BibTeX citation

@misc{bolker2021glmm,
  author = {Bolker, Ben},
  title = {Basil Okola: GLMM FAQ},
  url = {https://bokola214.netlify.app/posts/2021-04-28-glmm-faq/},
  year = {2021}
}