Andrea S. Foulkes, Division of Biostatistics and Epidemiology, University of Massachusetts, Amherst, USA
The primary goal of this tutorial is to introduce fundamental statistical concepts and R tools for high-dimensional data analysis.
Recent technological advancements, coupled with extensive
genetic sequencing efforts, have led to an explosion in the
availability of molecular- and cellular-level data for the
study of complex diseases.
At the same time, novel statistical methods have been developed to address the unique analytic challenges that arise in these data settings.
This tutorial introduces the theory and practical application of these new methodologies.
The first portion of the tutorial focuses on several
procedures that address the multiplicity problem inherent in
high-dimensional data analysis, including single-step and
step-down adjustments. Second, tree-based approaches are
introduced, including tree-growth and tree-pruning algorithms
and variable importance scoring.
Finally, a range of alternative methods is presented and illustrated, including logic regression and conditional inference trees, with specific attention given to the scientific questions each approach is tailored to address.
Publicly available data are used to aid in the illustration of analytic tools.
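To give a flavor of the multiplicity adjustments covered in the first portion, the following is a minimal sketch of one single-step procedure (Bonferroni) and one step-down procedure (Holm). It is written in Python purely for illustration and is not taken from the tutorial materials; the function names and the example p-values are hypothetical. In R, the base function `p.adjust` offers the same two adjustments through its `"bonferroni"` and `"holm"` methods.

```python
def bonferroni(pvalues):
    """Single-step adjustment: multiply every p-value by the number of tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def holm(pvalues):
    """Step-down adjustment: the multiplier shrinks as we move from the
    smallest p-value to the largest, and monotonicity is enforced."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # An adjusted p-value may never be smaller than one that precedes it.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from four tests.
    pvals = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(pvals))
    print("Holm:      ", holm(pvals))
```

Both procedures control the family-wise error rate. The step-down version is never more conservative than the single-step one, because its multipliers are at most the number of tests.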
Dr. Foulkes is Associate Professor of Biostatistics, Division of Biostatistics and Epidemiology, University of Massachusetts Amherst, where she has been recognized for teaching excellence.
Her active research program includes the development of methods for characterizing the relationships among high-dimensional molecular- and cellular-level data and measures of disease progression. She has authored numerous technical manuscripts in this field as well as the graduate-level text Applied Statistical Genetics with R (Springer, Use R! series, 2009).
She also serves as principal investigator of an individual research award (R01) from the National Institute of Allergy and Infectious Diseases, a division of the National Institutes of Health, on methods for high-dimensional data in HIV/AIDS research.
Elementary knowledge of statistical concepts at the level of a first course in statistics is assumed.