A simple guide to sample size and power calculation, with a Poisson distribution as a case study.
We were recently tasked with determining sample size and power for a project in one of the courses in the Master of Statistics program. Sample size and power calculations are well documented and involve the following five steps:
Specify the parameter, hypothesis and test
Specify the significance level
Specify the effect size
Obtain values or estimates of any other parameters needed
Specify a target value for power
The project involved determining which of 14 compounds guaranteed a longer vase life for roses compared with plain water. We chose to model the count of days a flower remained fresh, adjusting for flower species and including some random effects as well.
We set \(\alpha=0.05\), an effect size of 1, and a target power of \(1 - \beta = 0.85\), with a right-tailed alternative hypothesis. Because 14 compounds were compared, we applied a Bonferroni correction for multiple testing, giving a per-comparison significance level of \(0.05/14\). A pilot study of flowers preserved in water provided the estimate of mean vase life used in the simulations.
We sampled from a Poisson distribution (\(Y \sim \text{Poisson}(\lambda)\)) to simulate power across a grid of sample sizes. We did not, however, include the random intercepts in the simulation, so it leaves out that extra source of variability and the resulting power estimates are likely somewhat optimistic.
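As a rough sanity check on the simulation (this is not part of the original analysis), the standard normal approximation for comparing two independent Poisson means with a one-sided test gives a per-group sample size of \(n \approx (z_{1-\alpha} + z_{1-\beta})^2 (\lambda_0 + \lambda_1) / (\lambda_1 - \lambda_0)^2\). Plugging in the values above suggests a little over 700 flowers per group, which is consistent with the wide grid of sample sizes searched below.
# rough closed-form check, assuming a simple two-sample Poisson comparison
lambda0 <- 26          # pilot estimate of mean vase life in water
lambda1 <- 27          # mean under the alternative (effect size of 1 day)
alpha   <- 0.05 / 14   # Bonferroni-corrected significance level
power   <- 0.85
n_per_group <- (qnorm(1 - alpha) + qnorm(power))^2 *
  (lambda0 + lambda1) / (lambda1 - lambda0)^2
ceiling(n_per_group)   # roughly 740 flowers per group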
Poisson_sims <-
  function(n_grid = seq(50, 1500, 50),
           lambda0 = 26,
           lambda1 = 27,
           alpha = 0.05 / 14,
           test = "two_sided",
           n_sims = 10000,
           seed_nr = 1234) {
    power_vec <- numeric(length(n_grid))
    for (j in seq_along(n_grid)) {
      # 1. Choose the sample size per group
      N <- n_grid[j]
      # 2. Select the parameters
      lambda.control <- lambda0
      lambda.treated <- lambda1
      # 3. Simulate a large number of experiments and test each one
      pval <- numeric(n_sims)
      zval <- numeric(n_sims)
      # reset the seed for every sample size so the runs are comparable
      set.seed(seed_nr)
      for (i in 1:n_sims) {
        # simulate both groups from a Poisson distribution
        controlGroup <- rpois(N, lambda = lambda.control)
        treatedGroup <- rpois(N, lambda = lambda.treated)
        simData <- data.frame(
          response = c(controlGroup, treatedGroup),
          treatment = rep(c(0, 1), each = N)
        )
        # Poisson GLM to test the treatment effect (Wald test)
        glm_fit <- summary(glm(response ~ treatment,
                               data = simData,
                               family = poisson()))
        pval[i] <- glm_fit$coefficients["treatment", "Pr(>|z|)"]
        zval[i] <- glm_fit$coefficients["treatment", "z value"]
        # convert the two-sided Wald p-value to a one-sided one if requested
        if (test == "greater" && zval[i] > 0) pval[i] <- pval[i] / 2
        if (test == "greater" && zval[i] < 0) pval[i] <- 1 - pval[i] / 2
        if (test == "less" && zval[i] < 0) pval[i] <- pval[i] / 2
        if (test == "less" && zval[i] > 0) pval[i] <- 1 - pval[i] / 2
      }
      # 4. Estimate power as the proportion of rejections
      power_vec[j] <- sum(pval < alpha) / n_sims
    }
    return(list(n_grid = n_grid, power_vec = power_vec))
  }
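Before farming this out to a cluster, a quick serial call with a much smaller number of simulations is a cheap way to confirm the function runs and returns the expected list (the arguments here are purely illustrative):
# serial smoke test with a reduced number of simulations
chk <- Poisson_sims(n_grid = c(100, 200), lambda1 = 28,
                    test = "greater", n_sims = 500)
chk$power_vec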
library(parallel)
library(dplyr)    # for the pipe used when assembling results
library(ggplot2)  # for the power curve plot
# detect the available cores and start a cluster (works on Linux and Windows)
numcores <- detectCores()
cl <- makeCluster(numcores)
# serial alternative, for comparison:
# system.time({
#   out_serial <- Poisson_sims(test = "greater")
# })
# clusterMap lets us pass additional named arguments to Poisson_sims
system.time({
  out_list <- clusterMap(cl, Poisson_sims, test = "greater")
})
user system elapsed
0.19 0.45 6207.24
# remember to stop cluster
stopCluster(cl)
out <- do.call(cbind, out_list$greater) %>% as.data.frame()
names(out) <- c('n', 'powr')
# split `out` into chunks of 10 rows so the power table prints side by side
outb <- split(out, (seq(nrow(out)) - 1) %/% 10)
out_tab <- do.call(cbind, outb)
# strip the chunk prefix ("0.", "1.", ...) that cbind adds to the column names
names(out_tab) <- sub('^[0-9]+\\.', "", names(out_tab))
# knitr::kable(out_tab, caption = "Power as simulated for different sample sizes")
# zoom in on a finer grid of sample sizes, this time with lambda1 = 28
cl <- makeCluster(detectCores())
system.time({
  out_list <- clusterMap(cl, Poisson_sims,
                         n_grid = seq(101, 250, 10),
                         test = "greater",
                         lambda1 = 28)
})
user system elapsed
0.03 0.00 185.35
# remember to stop cluster
stopCluster(cl)
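The clusterMap() call above relies on recycling to repeat test and lambda1 across the fifteen n_grid values. An equivalent and arguably more explicit form passes the constant arguments through MoreArgs (a sketch only; it needs a running cluster):
# constant arguments go in MoreArgs; only n_grid is mapped across workers
clusterMap(cl, Poisson_sims,
           n_grid = seq(101, 250, 10),
           MoreArgs = list(test = "greater", lambda1 = 28))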
out <- do.call(rbind.data.frame, out_list)
names(out) <- c('n', 'powr')
# split into chunks of 30 rows for display (here everything fits in one chunk)
outb <- split(out, (seq(nrow(out)) - 1) %/% 30)
out_tab <- do.call(cbind, outb)
names(out_tab) <- sub('^[0-9]+\\.', "", names(out_tab))
# knitr::kable(out_tab, caption = "Power as simulated for different sample sizes")
p2 <- ggplot(out, aes(x = n, y = powr)) +
  geom_line(color = 'red', size = 1) +
  # dashed reference lines at 80%, 85% and 90% power
  geom_hline(yintercept = c(0.80, 0.85, 0.90), linetype = 'dashed') +
  theme_minimal() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(101, 250, 10)) +
  labs(x = "Sample size", y = "Power")
print(p2)
ggsave(p2, filename = file.path(data_dir, "sample size zoom.png"))
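From the zoomed-in results, the smallest simulated sample size per group that reaches the 85% target can be read straight off the out data frame built above:
# smallest n per group whose simulated power meets the 0.85 target
min(out$n[out$powr >= 0.85])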
Additional links on sample size determination:
https://nickch-k.github.io/EconometricsSlides/Week_08/Power_Simulations.html
https://cran.r-project.org/web/packages/paramtest/vignettes/Simulating-Power.html