Tutorial: An Introduction to High-Performance R


Dirk Eddelbuettel, Debian Project, Chicago, USA.

Slides available

Follow this link.

Brief Description

Computing resources are more abundant than ever in absolute terms thanks to the continued improvements in processing power that were predicted decades ago by Moore's law.

However, despite these advances, work in applied and computational statistics is constrained by the seemingly parallel growth in data sets. As ever-larger amounts of data offset the faster computers, any 'relative' improvement in computing power is difficult to appreciate. Hence, R users still feel limited by their available computing resources, be it memory, CPU power or both, and complain that 'it still takes too long' or 'it still uses all my memory'.

The tutorial will introduce a number of available options to address these computing constraints in order to enhance or accelerate R processing.

Outline

The following topics will be covered in the tutorial:

  1. Measuring performance and profiling programs in R
  2. Improvements through better coding: vectorisation and other examples
  3. Extending R using compiled code:
  4. Running R using 'explicit parallelism':
  5. Running R using 'implicit parallelism' via Luke Tierney's new pnmath and pnmath0 packages for multithreaded math functions
  6. Extending R using out-of-memory processing using the biglm and ff packages
  7. Automating R processing using the Rscript and littler wrappers

Tutorial participants will be provided with a live CD-ROM containing R and all packages so that they can run all examples on their own laptops.
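
As a rough illustration of topics 1 and 2 in the outline, the sketch below times a loop-based computation against a vectorised equivalent using base R's system.time(). The function names sum_squares_loop and sum_squares_vec are hypothetical and chosen only for this example; this is not taken from the tutorial materials, and actual timings will vary by machine and R version.

```r
# Hypothetical helper: sum of squares using an explicit loop.
sum_squares_loop <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value^2
  }
  total
}

# Hypothetical helper: the same computation, vectorised.
sum_squares_vec <- function(x) {
  sum(x^2)
}

# Test data: one million doubles.
x <- as.numeric(1:1e6)

# system.time() reports user, system and elapsed time in seconds.
loop_time <- system.time(loop_result <- sum_squares_loop(x))
vec_time  <- system.time(vec_result <- sum_squares_vec(x))

print(loop_time["elapsed"])
print(vec_time["elapsed"])

# Both approaches should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(loop_result, vec_result)))
```

For a finer-grained view than total elapsed time, base R's Rprof() and summaryRprof() can be used to sample where a longer-running script spends its time.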

Potential attendees

The tutorial is aimed at R users wishing to learn about different methods of extending R for more efficient data processing.

Required knowledge

Basic R and programming knowledge will be required to take advantage of all examples. Likewise, some understanding of general computing concepts will be helpful. For the examples involving C and C++, some familiarity with these languages is required.

Instructor

Dr. Dirk Eddelbuettel has been using S+ and R for quantitative analysis for over a decade. He is the author of several CRAN packages, the maintainer of R and numerous other packages for the Debian Linux distribution, and the author of the Quantian computing environment.