Tutorial: Survey Analysis in R


presented by Thomas Lumley

Background knowledge:

Either experience in analyzing complex surveys or experience in regression modelling with S.

Intended audience:

The primary audience is biostatisticians and social or economic statisticians interested in secondary analysis of complex national surveys. A secondary audience is people involved in teaching survey analysis or design-based inference who want to explore using R as a tool.

Justification:

Until recently, survey statistics was dominated by special-purpose software. This is now changing, as many general-purpose statistical systems offer some survey analysis tools. The R survey package appears to attract interest from both government agencies and academics (the JSS paper describing the package, at least, gets more hits than average for the journal).

Goals:

After the course, participants should be able to create R objects describing stratified multistage surveys, create calibrated or post-stratified weights, extract tables of summary statistics, and perform regression modelling.

Outline:

  1. Describing survey designs: svydesign()
    • Different meanings of "weight" in statistical software
    • strata
    • clusters
    • finite population sizes
    • multistage sampling
    • designs specified by replicate weights
  2. Summary statistics: mean, total, quantiles, design effect
  3. Tables of summary statistics, domain estimation
  4. Graphics: histograms, hexbin scatterplots.
  5. Regression modelling: svyglm()
  6. Calibration of weights: postStratify(), calibrate()
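As a rough illustration of several outline items, the sketch below uses the survey package and the California Academic Performance Index example data (`data(api)`) that ships with it. It is a minimal sketch, not the tutorial's own material. The choice of data set, of the one-stage cluster design `dclus1`, and of the variables (`api00`, `enroll`, `ell`, `meals`, `mobility`, `stype`) follows the package's documented examples and is an assumption made here for illustration.

```r
# Minimal sketch (assumption: uses the survey package's bundled 'api' data,
# not material from the tutorial itself)
library(survey)
data(api)  # loads apipop, apisrs, apistrat, apiclus1, apiclus2

# Item 1: one-stage cluster sample of school districts (dnum), with
# sampling weights (pw) and a finite population correction (fpc)
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Item 2: mean with design effect, total, and quantiles
svymean(~api00, dclus1, deff = TRUE)
svytotal(~enroll, dclus1)
svyquantile(~api00, dclus1, quantiles = c(0.25, 0.5, 0.75))

# Item 3: domain estimation, the mean of api00 within each school type
svyby(~api00, ~stype, dclus1, svymean)

# Item 5: design-based linear regression
fit <- svyglm(api00 ~ ell + meals + mobility, design = dclus1)
summary(fit)

# Item 6: post-stratify on school type, taking the population counts
# from the full population file apipop (columns: stype, Freq)
pop.types <- as.data.frame(table(stype = apipop$stype))
dclus1p <- postStratify(dclus1, ~stype, pop.types)
svymean(~api00, dclus1p)
```

Computing the post-stratification counts from `apipop` rather than typing them in keeps them consistent with the data. Calibration to several margins at once uses `calibrate()`, which is not shown here.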



useR-2006@R-project.org