Tutorial: Analysis of Complex Traits Using R: Case studies

Jing Hua Zhao, MRC Epidemiology Unit, Cambridge, UK.


Owing to recent advances in genotyping and sequencing technologies and the successes of several international collaborative projects, there is considerable interest in the genetic analysis of complex traits, which include common diseases [1] and quantitative measurements [2]. The analysis customarily involves a large number of single-nucleotide polymorphisms (SNPs), the most abundant genetic variants in the human genome. Much work has been done using a variety of computer software, including R, but greater awareness of R, a synthesis of current work, and contributions from a broader community are still needed [3].

This tutorial gives an overview of approaches to the genetic analysis of complex traits, including heritability, segregation, linkage and association studies, paying attention to the statistical models, some success stories and an indication of their limitations. A particular focus is on the design and analysis of genomewide association studies (GWAS), and the instructor’s own involvement in such analyses will be described [2,4]. A complementary part of the tutorial concerns computer software used in these analyses, including R, which reflects but is not limited to the instructor’s own work [3,5-7]. Both parts are expected to expose statistical and computational challenges.

Topics motivated by the case studies range from fundamental concepts, such as measurement of risk and heritability, through analysis of genomic data, such as Hardy-Weinberg equilibrium and linkage disequilibrium, to more sophisticated modelling, such as prospective and retrospective models of haplotypes, gene-environment interactions [8] and pathways [9]. Related aspects include haplotype analysis, genotype imputation, meta-analysis, and the merits of some frequentist and Bayesian methods as used in our GWAS of obesity.

The instructor is an investigator scientist in genetics. He obtained degrees in medicine, statistics and genetics, and has worked on a broad range of problems in statistical genetics and genetic epidemiology over the last ten years. He and his colleagues have recently been involved in study design and analysis in several large epidemiological cohorts and in collaborative work such as the Genomic Investigation of Anthropometric Traits (GIANT) consortium. Besides the material to be covered in the tutorial, he has also worked on genetic data analysis with R [10,11] and written programs in other computing environments and languages such as C and SAS [12-14].

The intended audience is researchers with basic knowledge of statistics and computing who wish to get involved with, or improve their understanding of, genetic data analysis. The tutorial will also be useful to professionals and researchers actively engaged in the analysis of genetic data and/or the development of computational tools in R or other environments. The course materials are expected to refresh and engage with attendees' views on the design and analysis of genomic data in humans, while also generating interest among researchers in plant and animal sciences.


  1. The WTCCC: Genomewide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 2007, 447: 661-678
  2. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, Lindgren CM, Perry JR, Elliott KS, Lango H, Rayner NW et al.: A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 2007, 316: 889-894
  3. Zhao JH: Use of R in genetic association studies. useR 2007, Ames, Iowa, USA
  4. Loos R, Lindgren CM, Li S, Wheeler E, Zhao JH et al.: Association studies involving over 90,000 samples demonstrate that common variants near to MC4R influence fat mass, weight and risk of obesity. Submitted, 2007
  5. Zhao JH, Tan Q: Integrated analysis of genetic data with R. Hum Genomics 2006, 2: 258-265
  6. Zhao JH, Tan Q: Genetic dissection of complex traits in silico: approaches, problems and solutions. Curr Bioinformatics 2006, 1: 359-369
  7. Zhao JH: gap: genetic analysis package. J Stat Soft 2007, in press
  8. Tan Q, Christiansen L, Andersen CB, Zhao JH, Li S, Torben AK, Christensen K: Retrospective analysis of main and interaction effects in genetic association studies of human complex traits. BMC Genet 2007, 8: 70
  9. Zhao JH, Luan JA, Baksh F, Tan Q: Mining gene networks with application to GAW15 problem 1. BMC Proceedings 2007, 1 (Suppl 1): S52
  10. Zhao JH: Mixed-effects Cox models of alcohol dependence in extended families. BMC Genet 2005, 6(Suppl 1): S127
  11. Zhao JH: Pedigree-drawing with R and graphviz. Bioinformatics 2006, 22: 1013-1014
  12. Zhao JH, Curtis D, Sham PC: Model-free and permutation tests for allelic associations. Hum Hered 2000, 50: 133-139
  13. Zhao JH: 2LD, GENECOUNTING and HAP: Computer programs for linkage disequilibrium analysis. Bioinformatics 2004, 20: 1325-1326
  14. Zhao JH, Luan JA, Tan Q, Loos R, Wareham N: Analysis of large genomic data in silico: the EPIC-Norfolk study of obesity. In Huang D-S, Heutte L, Loog M (eds) [2]. Springer-Verlag Berlin Heidelberg 2007: 781-790