Torsten Hothorn
Bundling Predictors in R
************************
Combining different classifiers, for example tree-based and linear
classifiers, nearest neighbors, or the logistic regression model, promises to
reduce the misclassification error compared with any of the single
competitors. Hothorn & Lausen (2003) suggest a combination of linear and
tree-based classifiers called "double-bagging": a linear discriminant analysis
(LDA) is fitted to the out-of-bag observations of a bootstrap sample, and a
classification tree is grown on the bootstrap sample using both its original
variables and the values of the linear discriminant functions for those
observations. The procedure is repeated sufficiently often, and a new
observation is classified by majority voting, analogous to bagging
(Breiman, 1996).
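
The double-bagging procedure can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the ipred implementation: it uses lda() from MASS and rpart() from rpart (both recommended packages shipped with R), it assumes a data frame in which every column except the factor response named by `response` is a numeric predictor, and the function names double_bagging() and predict_double_bagging() are hypothetical.

```r
library("MASS")   # lda(), predict.lda()
library("rpart")  # rpart(), predict.rpart()

# Hypothetical helper (not part of ipred): fit B (LDA, tree) pairs.
# Assumes all columns except `response` are numeric predictors.
double_bagging <- function(data, response, B = 25) {
  fml <- as.formula(paste(response, "~ ."))
  n <- nrow(data)
  lapply(seq_len(B), function(b) {
    boot <- sample(n, replace = TRUE)       # bootstrap sample
    oob  <- setdiff(seq_len(n), boot)       # out-of-bag observations
    # the LDA is fitted to the out-of-bag observations only
    ld <- lda(fml, data = data[oob, , drop = FALSE])
    # the tree sees the original variables plus the discriminant values
    bs <- data[boot, , drop = FALSE]
    bs <- cbind(bs, predict(ld, bs)$x)
    tree <- rpart(fml, data = bs, method = "class")
    list(lda = ld, tree = tree)
  })
}

# Hypothetical helper: classify new observations by majority voting.
predict_double_bagging <- function(fits, newdata) {
  votes <- do.call(cbind, lapply(fits, function(f) {
    nd <- cbind(newdata, predict(f$lda, newdata)$x)
    as.character(predict(f$tree, newdata = nd, type = "class"))
  }))
  # one row per observation, one column per bootstrap replication
  apply(votes, 1, function(v) names(which.max(table(v))))
}

set.seed(29)
idx  <- sample(nrow(iris), 100)
fits <- double_bagging(iris[idx, ], "Species", B = 25)
pred <- predict_double_bagging(fits, iris[-idx, ])
mean(pred == as.character(iris$Species[-idx]))  # test-set accuracy
```

Storing the fitted LDA together with each tree matters: a new observation must be projected with the same discriminant functions that the corresponding tree was grown on.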
Simulation experiments show that this combined classifier performs at least
as well as the better of the two procedures and sometimes even improves on it.
The methodology can be extended to a combination of arbitrary classifiers,
called "bundling". Moreover, the same procedure can be used to combine
regression models and to handle problems with censored responses.
The functionality is implemented in the ipred package. A rather general
interface allows users to combine most of the classifiers or regression models
currently available in R, as long as they provide a formula-based interface.
Fitting each of the single models to each out-of-bag sample and computing
their predictions can easily be implemented using lexical scoping. We will
introduce the user interface and illustrate its application to classification
and regression problems as well as to survival data.
Some aspects of the implementation will also be discussed. The major problem
with the current design is a tradeoff between generality and performance,
since the formula for each of the single models needs to be evaluated for
every out-of-bag sample. A modification to the rpart routine leads to a
significant reduction in computing time.
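
The use of lexical scoping, and the re-evaluation of a model formula for every out-of-bag sample, can be illustrated with a hypothetical helper oob_model() (not ipred's interface). It fits an arbitrary formula-based model to the out-of-bag rows and returns a closure that computes the model's predictions, so the fitted object lives on in the closure's environment. The sketch bundles a linear model with an rpart regression tree on the swiss data shipped with R; the helper names, the choice of lm(), and averaging as the aggregation rule for regression are assumptions made for illustration.

```r
library("rpart")  # rpart(), predict.rpart()

# Hypothetical helper: fit any formula-based model to the out-of-bag rows and
# return a closure; `fit` stays alive in the closure's environment.
oob_model <- function(fit_fun, pred_fun, formula, data, oob) {
  fit <- fit_fun(formula, data = data[oob, , drop = FALSE])
  function(newdata) pred_fun(fit, newdata)
}

# Hypothetical bundling step for a numeric response: the out-of-bag linear
# model supplies an extra predictor `lin_pred` for a regression tree grown on
# the bootstrap sample.
bundle_step <- function(data, response) {
  fml  <- as.formula(paste(response, "~ ."))
  n    <- nrow(data)
  boot <- sample(n, replace = TRUE)
  oob  <- setdiff(seq_len(n), boot)
  # the formula is evaluated anew for this out-of-bag sample
  lin  <- oob_model(lm, function(obj, nd) predict(obj, newdata = nd),
                    fml, data, oob)
  bs <- data[boot, , drop = FALSE]
  bs$lin_pred <- lin(bs)
  tree <- rpart(fml, data = bs, method = "anova")
  # returned predictor: augment the new data, then query the tree
  function(newdata) {
    newdata$lin_pred <- lin(newdata)
    predict(tree, newdata = newdata)
  }
}

set.seed(1)
steps <- replicate(25, bundle_step(swiss, "Fertility"), simplify = FALSE)
# aggregate the 25 regression predictions by averaging
pred <- rowMeans(sapply(steps, function(f) f(swiss)))
```

Because each closure carries its own fitted model, any function with a formula interface and a matching prediction function can be plugged into oob_model() without changing the bundling loop; the price is the per-sample formula evaluation noted above.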