Every year I suggest a student use Terry Shaffer’s log exposure models for nest survival (Shaffer 2004). And every year I spend hours trying to figure out why the code in the help section of ?family doesn’t work. So this year I’m writing it down.

Irregular data collection intervals are an annoying feature of much ecological data. Nest survival data in particular. Try monitoring the fate of dozens or hundreds of nests scattered across a large landscape and make sure you always visit every one every 3 days. There is always going to be slop[1]. Ignoring that slop leads to biased estimates of survival. What we really want is the daily survival rate p, but when the intervals vary what we are observing is pd where d is the number of days in the interval.

Harold Mayfield recognized this problem (Mayfield 1975, and references therein), and his estimator is the classic way to deal with this problem. Coming up with alternatives and improvements to the Mayfield estimator is a cottage industry among statisticians! Terry Schaffer recognized that a simple change to the link function in a binomial glm model achieved the same goal, and gives us access to many powerful modeling tools. For example, Max Post van der Burg (2010) used the modified link function together with generalized additive models to identify periods of increased risk of nest parasitism.

Terry used SAS in his paper, but the good R fairies provided code for a log exposure day link function in the examples for family(). Except the code doesn’t work.

logexp_bad <- function(days = 1)
{
  linkfun <- function(mu) qlogis(mu^(1/days))
  linkinv <- function(eta) plogis(eta)^days
  mu.eta <- function(eta) days * plogis(eta)^(days-1) * binomial()$mu_eta # this is the error
  valideta <- function(eta) TRUE
  link <- paste0("logexp(", days, ")")
  structure(list(linkfun = linkfun, linkinv = linkinv,
                 mu.eta = mu.eta, valideta = valideta, name = link),
            class = "link-glm")
}


# simulate some data with different exposure days
library(tidyverse)
set.seed(30894879)
abc <- data.frame(x=rnorm(100),
           days=sample(1:5, 100, replace=TRUE))
abc <- mutate(abc,
              p=plogis(x),
              p_days=p^days,
              y = rbinom(100, 1, p_days))

glm(y~x, data=abc, family=binomial(link=logexp_bad(days=abc$days))) # fails

## Error in glm.fit(x = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, : NAs in d(mu)/d(eta)

The problem is subtle, because it doesn’t cause anything to fail in an obvious way, even when you try to use it. The definition of the function mu.eta() includes a call to binomial() to extract the mu_eta component, except it doesn’t exist so mu.eta() ends up NULL. The correct component name is mu.eta, so:

logexp <- function(days = 1)
{
  linkfun <- function(mu) qlogis(mu^(1/days))
  linkinv <- function(eta) plogis(eta)^days
  mu.eta <- function(eta) days * plogis(eta)^(days-1) * binomial()$mu.eta(eta)
  valideta <- function(eta) TRUE
  link <- paste0("logexp(", days, ")")
  structure(list(linkfun = linkfun, linkinv = linkinv,
                 mu.eta = mu.eta, valideta = valideta, name = link),
            class = "link-glm")
}
summary(glm(y~x, data=abc, family=binomial(link=logexp(days=abc$days)))) # works

## 
## Call:
## glm(formula = y ~ x, family = binomial(link = logexp(days = abc$days)), 
##     data = abc)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.67072  -0.67665  -0.26660   0.09403   2.40338  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.1210     0.2505  -0.483    0.629    
## x             1.1601     0.2608   4.449 8.63e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 112.467  on 99  degrees of freedom
## Residual deviance:  76.386  on 98  degrees of freedom
## AIC: 80.386
## 
## Number of Fisher Scoring iterations: 6

and that works. The estimated slope should not be different from 1 on the log - odds scale, and it is, so that makes me happy too.

References

Burg, Max Post Van Der, Larkin A Powell, and Andrew J Tyre. 2010. “Finding the Smoothest Path to Success: Model Complexity and the Consideration of Nonlinear Patterns in Nest-Survival Data.” The Condor 112 (3). University of California Press: 421–31.

Mayfield, Harold F. 1975. “Suggestions for Calculating Nest Success.” The Wilson Bulletin. JSTOR, 456–66.

Shaffer, Terry L. 2004. “A Unified Approach to Analyzing Nest Success.” The Auk 121 (2): 526–40.

[1] All the code for this post, including that not shown, can be found here.