
Tutorial: Interval censored data analysis

Michael P. Fay, National Institute of Allergy and Infectious Diseases
(NIAID), USA
Abstract
Interval censored data analysis is important in biomedical statistics for any
type of time-to-event response where the time of response is not known exactly
but is known only to lie between two assessment times, one before the event
occurred and one after it. Some examples are:
1. time until first positive HIV blood sample in an HIV vaccine trial,
2. time until first negative sputum culture in a TB treatment trial, and
3. time until cancer progression or death in a cancer treatment trial.
Standard survival methods (e.g., Kaplan-Meier curves, logrank tests,
accelerated failure time regression models) must be modified to properly account
for the interval censoring. For example, naively imputing the failure time as
the midpoint of the interval and performing the usual logrank test for
right-censored data can lead to inflated type I error rates. This topic is
relevant for the R users conference because, for some important methods for this
type of data, the only readily available software is implemented in R packages.
The goal of this tutorial is to show why these interval censored data methods
are needed and useful, and to show that some of the methods are easily performed
in R.
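As a rough illustration of how little code such an analysis can take, the
following sketch uses the interval package listed in the outline. It assumes
that the package and its Icens dependency are installed, and it uses the bcos
(breast cosmesis) data set that ships with the package. The choice of data set
and the default test settings are illustrative assumptions, not taken from the
tutorial materials.

```r
# Sketch only: assumes the 'interval' package (and its Icens dependency)
# is installed. 'bcos' is the example data set shipped with 'interval',
# with columns left, right (Inf = right-censored) and treatment.
library(survival)
library(interval)

data(bcos)

# NPMLE of the survival curve, one curve per treatment group
fit <- icfit(Surv(left, right, type = "interval2") ~ treatment, data = bcos)
summary(fit)
plot(fit)

# Weighted logrank (score) test comparing the two groups,
# using the package defaults for scores and method
test_result <- ictest(Surv(left, right, type = "interval2") ~ treatment,
                      data = bcos)
print(test_result)
```

The `type = "interval2"` coding passes the two assessment times directly as the
left and right endpoints of each interval.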
Outline
Topics will include:
• Types of interval censoring (noninformative vs. informative; Case 1,
  Case 2, Case k)
• Nonparametric maximum likelihood estimation (NPMLE) of the survival curve
    • Right-censored case (Kaplan-Meier). Graphical description of Efron's
      redistribution-to-the-right algorithm.
    • Interval censored case. Graphical description of Turnbull's
      self-consistent algorithm (to give intuition on the NPMLE).
    • Calculation of NPMLE in R
        • survfit in survival package, including review of Surv function
          and different types of censoring
        • interval package
        • Icens package and its algorithms
• Testing the difference between two groups
    • Why we usually use rank tests for time-to-event responses
    • Basic permutation tests
    • Generalizing the Wilcoxon-Mann-Whitney test for survival data
    • Likelihoods for interval censored data
        • Marginal likelihood of the ranks
        • Grouped continuous model
    • Weighted logrank tests as score tests on semiparametric models
        • Logrank test (two versions) / Proportional hazards
        • Wilcoxon test / Proportional odds
    • Multiple imputation
    • Why midpoint imputation can give bad type I errors
    • What if the inspection process is different between treatment groups
    • Overview of type I error problems and different rank tests
    • Weighted logrank tests in R using interval package
        • Choosing model/score
        • Choosing method
• Regression
    • Parametric models (accelerated failure time models)
    • Examples using survival R package
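
To give a feel for the self-consistency idea behind the interval censored
NPMLE, here is a minimal base R sketch on made-up toy data. It is a plain
EM-style iteration over the intervals between consecutive observed endpoints,
not the implementation used by the survival, interval, or Icens packages. The
function name turnbull_em and the data are hypothetical.

```r
# Toy data: each event is known only to lie in (left, right];
# right = Inf marks a right-censored observation.
left  <- c(0, 2, 2, 4, 6)
right <- c(2, 4, 6, Inf, Inf)

turnbull_em <- function(left, right, tol = 1e-8, max_iter = 10000) {
  # Candidate intervals: gaps between consecutive unique endpoints
  s  <- sort(unique(c(left, right)))
  lo <- head(s, -1)
  hi <- tail(s, -1)

  # A[i, j] is TRUE when candidate interval j lies inside observation i
  A <- outer(left, lo, "<=") & outer(right, hi, ">=")

  # Start with equal probability mass on every candidate interval
  p <- rep(1 / length(lo), length(lo))

  for (iter in seq_len(max_iter)) {
    w     <- sweep(A, 2, p, "*")  # mass of interval j that is compatible with i
    w     <- w / rowSums(w)       # share of observation i assigned to interval j
    p_new <- colMeans(w)          # self-consistency update
    converged <- max(abs(p_new - p)) < tol
    p <- p_new
    if (converged) break
  }

  data.frame(lower = lo, upper = hi, mass = p,
             surv_at_upper = 1 - cumsum(p))
}

est <- turnbull_em(left, right)
print(est)
```

Each pass redistributes every observation over the candidate intervals it
could fall in, in proportion to the current mass estimates, and then averages
those shares across observations.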
Potential attendees
Potential attendees are those who analyze interval
censored data or plan clinical trials with endpoints of that type.
Required knowledge
Minimal knowledge of R is required. The tutorial will assume that participants
have been exposed previously to standard right-censored data analysis methods
(Kaplan-Meier curves, logrank tests, etc.), although an in-depth knowledge of
those methods is not necessary.
Tutorial Materials
Slides are here.