Tutorial: Survey Analysis in R

presented by Thomas Lumley

Background knowledge:

Either experience in analyzing complex surveys or experience in regression modelling with S.

Intended audience:

The primary audience is biostatisticians and social or economic statisticians interested in secondary analysis of complex national surveys. A secondary audience is people involved in teaching survey analysis or design-based inference who want to explore using R as a tool.


Until recently, survey statistics was dominated by special-purpose software. This is now changing, with many general-purpose statistical systems offering some survey analysis tools. There appears to be interest in the R survey package from both government agencies and academics; the JSS paper about the survey package, at least, gets more hits than average for the journal.


After the course, participants should be able to create R objects describing stratified multistage surveys, create calibrated or post-stratified weights, extract tables of summary statistics, and perform regression modelling.


  1. Describing survey designs: svydesign()
    • Different meanings of "weight" in statistical software
    • strata
    • clusters
    • finite population sizes
    • multistage sampling
    • designs specified by replicate weights
  2. Summary statistics: mean, total, quantiles, design effect
  3. Tables of summary statistics, domain estimation
  4. Graphics: histograms, hexbin scatterplots
  5. Regression modelling: svyglm()
  6. Calibration of weights: postStratify(), calibrate()
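
To give a flavour of the functions named in the outline, here is a minimal sketch of the workflow. It is not taken from the tutorial materials: it assumes the survey package is installed and uses the `api` example data sets bundled with that package (`apistrat`, `apiclus1`, `apiclus2`). The object names, the regression formula, and the population counts for school type are illustrative choices that follow the package's own documentation examples.

```r
# Illustrative sketch only; the data sets come from the survey package,
# not from the tutorial itself.
library(survey)
data(api)  # loads apistrat, apiclus1, apiclus2, apipop, ...

# 1. Describing survey designs with svydesign()
# Stratified sample: strata are school types, pw holds the sampling
# weights, and fpc holds the finite population size of each stratum.
dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# One-stage cluster sample of school districts.
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc,
                    data = apiclus1)

# Two-stage sample: districts, then schools within districts.
dclus2 <- svydesign(ids = ~dnum + snum, fpc = ~fpc1 + fpc2,
                    data = apiclus2)

# A design specified by replicate weights, derived here from dclus1.
rclus1 <- as.svrepdesign(dclus1)

# 2. Summary statistics: mean (with design effect), total, quantiles
mean_api     <- svymean(~api00, dstrat, deff = TRUE)
total_enroll <- svytotal(~enroll, dstrat)
median_api   <- svyquantile(~api00, dstrat, quantiles = 0.5)

# 3. Domain estimation: mean API score within each school type
by_type <- svyby(~api00, ~stype, dstrat, svymean)

# 5. Regression modelling with svyglm()
fit <- svyglm(api00 ~ ell + meals + mobility, design = dstrat)

# 6. Post-stratification to known population counts of school types
pop_types <- data.frame(stype = c("E", "H", "M"),
                        Freq  = c(4421, 755, 1018))
dps <- postStratify(dclus1, ~stype, pop_types)

print(mean_api)
print(by_type)
print(summary(fit))
```

Graphics are left out of the sketch to keep it free of extra dependencies. After post-stratification the weights in `dps` sum to the supplied population counts within each school type, which is a quick way to check that the adjustment did what was intended.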
