Tutorial: High-dimensional Data Methods with R

Andrea S. Foulkes, Division of Biostatistics and Epidemiology, University of Massachusetts, Amherst, USA

Course Description

The primary goal of this tutorial is to introduce fundamental statistical concepts and R tools for high-dimensional data analysis.

Recent technological advancements, coupled with extensive genetic sequencing efforts, have led to an explosion in the availability of molecular- and cellular-level data for the study of complex diseases.
At the same time, novel statistical methods have been developed to address the unique analytic challenges that arise in these data settings.
This tutorial introduces the theory and practical application of these new methodologies.

The first portion of the tutorial focuses on several procedures that address the multiplicity problem inherent in high-dimensional data analysis, including single-step and step-down adjustments. The second portion introduces tree-based approaches, including tree-growth and tree-pruning algorithms and variable importance scoring.
Finally, a range of alternative methods is presented and illustrated, including logic regression and conditional inference trees, with specific attention given to the scientific questions each approach is tailored to address.
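
As a rough illustration of the difference between single-step and step-down multiplicity adjustments, the following R sketch applies both to simulated p-values. The simulated data, the choice of Bonferroni as the single-step procedure, and the choice of Holm as the step-down procedure are assumptions made for this example, not necessarily the procedures covered in the tutorial.

```r
# Simulated example (not the tutorial's data): 1000 tests, 50 with a true effect.
set.seed(1)
m <- 1000
is_signal <- seq_len(m) <= 50
z <- rnorm(m, mean = ifelse(is_signal, 4, 0))
p_raw <- 2 * pnorm(-abs(z))  # two-sided p-values

# Single-step adjustment: Bonferroni applies the same multiplier (m) to every test.
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Step-down adjustment: Holm orders the p-values and relaxes the multiplier
# (m, m - 1, m - 2, ...) as it moves down the ordered list.
p_holm <- p.adjust(p_raw, method = "holm")

# The same Holm adjustment written out by hand, to make the step-down logic visible.
ord <- order(p_raw)
holm_manual <- numeric(m)
holm_manual[ord] <- pmin(1, cummax((m - seq_len(m) + 1) * p_raw[ord]))
stopifnot(isTRUE(all.equal(holm_manual, p_holm)))

# Number of tests declared significant at the 0.05 level under each approach.
n_rejected <- c(unadjusted = sum(p_raw < 0.05),
                bonferroni = sum(p_bonf < 0.05),
                holm       = sum(p_holm < 0.05))
print(n_rejected)
```

Because the Holm multiplier never exceeds the Bonferroni multiplier, a Holm-adjusted p-value is never larger than its Bonferroni counterpart, so the step-down procedure rejects at least as many hypotheses while still controlling the family-wise error rate.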


Topics will include:

Publicly available data are used to aid in the illustration of analytic tools.
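
The datasets used in the tutorial are not named here. As a stand-in, the sketch below uses the kyphosis data that ships with the rpart package to illustrate the tree-growth, tree-pruning, and variable importance steps described above. The choice of rpart, the dataset, and the control settings are assumptions made for this example.

```r
library(rpart)  # recommended package included with standard R installations

# kyphosis: outcome is presence/absence of kyphosis after corrective spinal surgery.
data(kyphosis, package = "rpart")

set.seed(2)  # the cross-validation performed inside rpart() is random

# Tree growth: set cp = 0 so the tree is grown large before pruning.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class",
             control = rpart.control(cp = 0, minsplit = 5, xval = 10))

# Tree pruning: pick the complexity parameter with the smallest
# cross-validated error and prune the large tree back to it.
cp_table <- fit$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
pruned <- prune(fit, cp = best_cp)

# Variable importance as reported by rpart for the full (unpruned) tree.
print(fit$variable.importance)
print(pruned)
```

Choosing the complexity parameter by minimum cross-validated error is only one convention; a common alternative is the one-standard-error rule, which favors a smaller tree.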


Dr. Foulkes is Associate Professor of Biostatistics in the Division of Biostatistics and Epidemiology, University of Massachusetts Amherst, where she has been recognized for teaching excellence.

Her active research program includes the development of methods for characterizing the relationships among high-dimensional molecular- and cellular-level data and measures of disease progression. She has authored numerous technical manuscripts in this field, as well as the graduate-level text Applied Statistical Genetics with R (Springer, Use R! series, 2009).

She also serves as the principal investigator of an individual research award (R01) from the National Institute of Allergy and Infectious Diseases, a division of the National Institutes of Health, on methods for high-dimensional data in HIV/AIDS research.


Elementary knowledge of statistical concepts at the level of a first course in statistics is assumed.

Please check here for up-to-date tutorial resources.