Classification trees are a powerful alternative to more traditional statistical models. They can detect non-linear relationships and perform well in the presence of qualitative information, as happens in many real problems. As a result, they are widely used as base classifiers for ensemble methods. AdaBoost constructs its base classifiers sequentially, updating a distribution over the training examples to build each one. Bagging combines the individual classifiers built on bootstrap replicates of the training set. Random Forest combines tree predictors such that each tree depends on the values of a random vector sampled independently and with the same distribution for all trees in the forest. In this tutorial, we compare the prediction accuracy of these techniques on several UCI data sets.
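A minimal sketch of this kind of comparison, assuming the rpart, adabag and randomForest packages are installed from CRAN; iris stands in for a UCI data set, and mfinal is kept small only to keep the example fast.

```r
library(rpart)
library(adabag)
library(randomForest)

set.seed(1)
train <- sample(nrow(iris), 100)  # random train/test split

# Single classification tree (CART)
tree <- rpart(Species ~ ., data = iris[train, ])
tree.pred <- predict(tree, iris[-train, ], type = "class")

# AdaBoost.M1 and bagging, both from adabag
boost <- boosting(Species ~ ., data = iris[train, ], mfinal = 10)
bag   <- bagging(Species ~ ., data = iris[train, ], mfinal = 10)

# Random forest
rf <- randomForest(Species ~ ., data = iris[train, ])

# Test-set accuracy of each method
acc.tree  <- mean(tree.pred == iris$Species[-train])
acc.boost <- 1 - predict(boost, iris[-train, ])$error
acc.bag   <- 1 - predict(bag, iris[-train, ])$error
acc.rf    <- mean(predict(rf, iris[-train, ]) == iris$Species[-train])
c(tree = acc.tree, boosting = acc.boost, bagging = acc.bag, rf = acc.rf)
```

On larger or noisier data sets the ensembles typically improve on the single tree; on an easy problem such as iris the four accuracies are often very close.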
The first goal of this tutorial is to introduce a less expert audience to classification with individual or ensemble trees through several R packages, such as rpart, adabag and randomForest. The second goal is for participants to bring their own data in order to apply these methods to it.
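As a sketch of the workflow participants could apply to their own data, the following uses adabag's cross-validation helper; iris again stands in for a participant's data frame, and v and mfinal are illustrative values, not recommendations from the tutorial.

```r
library(adabag)  # provides boosting.cv(); builds on rpart

# 10-fold cross-validated AdaBoost.M1 error estimate
cv <- boosting.cv(Species ~ ., data = iris, v = 10, mfinal = 10)
cv$error      # cross-validated error rate
cv$confusion  # cross-validated confusion matrix
```

The response variable must be a factor; for a participant's own data frame the formula would name that factor column in place of Species.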
Participants should have some basic knowledge of data manipulation and standard functions in R.
Researchers, students and professionals interested in classification.
The slides used in the tutorial will be available to participants. Participants are welcome to bring their own laptops and datasets, whose application can be discussed in the last part of the tutorial.
ALFARO, E., GAMEZ, M., GARCIA, N. (2012). adabag: Applies multiclass AdaBoost.M1, AdaBoost-SAMME and Bagging. R Package version 3.1. http://CRAN.R-project.org/package=adabag.
FAN, Y., MURPHY, T.B., WATSON, R.W.G. (2012). digeR: GUI tool for analyzing 2D DIGE data. R package version 1.3. http://CRAN.R-project.org/package=digeR
KUHN, M. (2012). caret: Classification and Regression Training. R package version 5.15-023. Contributions from WING, J., WESTON, S., WILLIAMS, A., KEEFER, C. and ENGELHARDT, A. http://CRAN.R-project.org/package=caret.
LIAW, A., WIENER, M. (2002). Classification and Regression by randomForest. R News 2(3), 18-22. http://cran.r-project.org/web/packages/randomForest/.
THERNEAU, T.M., ATKINSON, B., RIPLEY, B. (2012). rpart: Recursive Partitioning. R package version 3.1-55. http://CRAN.R-project.org/package=rpart.
ALFARO, E., GARCIA, N., GAMEZ, M. AND ELIZONDO, D. (2008): “Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks”. Decision Support Systems, 45, 110–122.
BREIMAN, L. (1996): "Bagging predictors". Machine Learning, 24(2), 123-140.
BREIMAN, L. (2001): "Random Forests". Machine Learning, 45(1), 5-32.
BREIMAN, L., FRIEDMAN, J.H., OLSHEN, R. AND STONE, C.J. (1984): Classification and Regression Trees. Belmont: Wadsworth International Group.
FREUND, Y. AND SCHAPIRE, R.E. (1996): ``Experiments with a new boosting algorithm''. In Proceedings of the Thirteenth International Conference on Machine Learning, 148-156, Morgan Kaufmann.
ZHU, J., ZOU, H., ROSSET, S. AND HASTIE, T. (2009): ``Multi-class AdaBoost''. Statistics and Its Interface, 2, 349-360.