Tutorial: Regression on large data sets: big(g)lm and
other approaches

Everyone knows that R can handle only small data sets. This
tutorial will look at ways to show that `everyone' is wrong.
There are three main approaches. For data sets with up to a few
hundred thousand rows the regressions can be performed in R directly,
provided only the necessary variables are loaded. For larger data
sets we can use incremental updates of bounded-memory computations,
as in the biglm package, or perform the large-data computations
directly in a database.
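The bounded-memory idea can be illustrated in base R: accumulate the cross-products X'X and X'y chunk by chunk, so that only one chunk is ever in memory. This is a simplified sketch with simulated chunks (biglm itself uses a numerically more stable incremental QR decomposition rather than the normal equations):

```r
## Bounded-memory regression sketch: accumulate X'X and X'y over chunks.
## The ten simulated chunks stand in for chunked reads from a large file.
set.seed(1)
XtX <- matrix(0, 2, 2)
Xty <- matrix(0, 2, 1)
for (i in 1:10) {
  x <- rnorm(1000)
  y <- 3 + 2 * x + rnorm(1000)    # true intercept 3, slope 2
  X <- cbind(1, x)                # design matrix: intercept + predictor
  XtX <- XtX + crossprod(X)       # running X'X
  Xty <- Xty + crossprod(X, y)    # running X'y
}
beta <- solve(XtX, Xty)           # coefficients from the accumulated sums
beta                              # approximately c(3, 2)
```

Only the 2x2 and 2x1 accumulators persist between chunks, so the memory needed is independent of the number of rows.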

1) Why does lm() use a lot of memory?
2) Data examples
3) A little SQL: load-on-demand regression
4) Bounded-memory algorithms
5) One pass: biglm
6) Iterative: bigglm
7) More SQL: pushing computations to the database
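As a taste of the biglm interface, here is a hedged sketch of chunked fitting: fit on a first chunk, then fold in further chunks with update(). The chunks are simulated data frames here; in practice they would come from read.csv(nrows = ...) or a database cursor.

```r
## Chunked fitting with biglm: the model is fitted on one chunk and
## updated in bounded memory as further chunks arrive.
library(biglm)

set.seed(2)
make_chunk <- function(n) {       # stand-in for reading one chunk of data
  x <- rnorm(n)
  data.frame(x = x, y = 1 + 0.5 * x + rnorm(n))
}

fit <- biglm(y ~ x, data = make_chunk(1000))   # first chunk
for (i in 1:9) {
  fit <- update(fit, make_chunk(1000))         # fold in the next chunk
}
coef(fit)                         # approximately c(1, 0.5)
```

Whatever the chunks' source, only one chunk plus the small fitted-model object needs to be in memory at any time.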
The intended audience is users of R who want to analyse data sets
that do not fit conveniently into memory. The focus will be on linear
and generalized linear models, but the techniques are relevant to
other computations.
