Axel Benner
Survival Analysis Using Multiple Additive Regression Trees
**********************************************************

The method of classification and regression trees (CART; Breiman et al., 1984) models the relationship between a response variable y and a set of explanatory or input variables x by recursive partitioning. Because the input variables must be split into binary variables, the resulting trees are often noisy and therefore unstable predictors. Several approaches have been developed to reduce the prediction error, and two procedures in particular, bagging and boosting of trees, have become popular in recent years. Both methods generate multiple versions of classification or regression trees and combine them into an aggregated predictor. Bagging (an acronym for bootstrap aggregating; Breiman, 1996) fits many trees to bootstrap-resampled versions of the training data and builds an aggregated predictor by averaging them. Boosting (Freund & Schapire, 1996) sequentially fits trees to reweighted versions of the training data. The final predictor is then a linear combination of the trees from each stage.

Boosting was originally developed in an algorithmic manner, but Friedman et al. (2000) showed that boosting procedures can be seen as stagewise algorithms for fitting additive regression models. Friedman (2001) extended the adaptive boosting algorithms into a theoretically grounded function estimation procedure. The goal is to estimate the function F*(x), mapping x to y, that minimizes the expected value of a loss function L(y, F(x)). Numerical optimization is carried out by steepest descent, with the negative gradient of the loss defining the descent direction.

Gradient boosting of regression trees for fitting survival time data using the Brier score loss function will be presented and applied to clinical data of patients with chronic lymphocytic leukemia.
Using the integrated Brier score (Graf et al., 1999), it is shown that this procedure reduces the prediction error compared to single regression trees for survival time data.
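
The stagewise steepest-descent idea described above can be illustrated with a minimal sketch. It is not the procedure presented in this work: it assumes a single input variable, one-split trees (stumps), a squared-error (Brier-type) loss at one fixed time point, and no censoring. The toy data and all names (fit_stump, gradient_boost, shrinkage, n_stages) are hypothetical and chosen for illustration only.

```python
# Sketch only: gradient boosting with regression stumps under squared-error
# (Brier-type) loss at one fixed time point. Censoring is ignored, which is
# an assumption of this example and not part of the abstract's method.

def fit_stump(x, r):
    """Least-squares fit of a one-split tree to the working response r."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best = None
    for k in range(1, len(x)):
        lo, hi = order[:k], order[k:]
        if x[lo[-1]] == x[hi[0]]:
            continue  # equal values cannot be separated by a threshold
        mean_lo = sum(r[i] for i in lo) / len(lo)
        mean_hi = sum(r[i] for i in hi) / len(hi)
        sse = (sum((r[i] - mean_lo) ** 2 for i in lo)
               + sum((r[i] - mean_hi) ** 2 for i in hi))
        if best is None or sse < best[0]:
            threshold = (x[lo[-1]] + x[hi[0]]) / 2
            best = (sse, threshold, mean_lo, mean_hi)
    if best is None:
        # Degenerate case (all x equal): fall back to a constant fit.
        mean_r = sum(r) / len(r)
        return (float("inf"), mean_r, mean_r)
    return best[1:]


def predict_stump(stump, xi):
    threshold, left, right = stump
    return left if xi <= threshold else right


def gradient_boost(x, y, n_stages=50, shrinkage=0.1):
    """Stagewise additive fit: each stump is fitted to the negative gradient."""
    f0 = sum(y) / len(y)  # constant that minimizes squared error
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_stages):
        # For L(y, F) = (y - F)^2 / 2 the negative gradient is y - F.
        neg_gradient = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, neg_gradient)
        stumps.append(stump)
        pred = [pi + shrinkage * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return f0, stumps, shrinkage


def predict(model, xi):
    f0, stumps, shrinkage = model
    return f0 + shrinkage * sum(predict_stump(s, xi) for s in stumps)


def brier(y, p):
    """Mean squared difference between observed status and prediction."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Hypothetical toy data: x is a risk marker, y = 1 if the patient is still
# alive at the chosen time point, 0 otherwise.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, 1, 0, 1, 0, 0, 0]

model = gradient_boost(x, y)
fitted = [predict(model, xi) for xi in x]
baseline = [sum(y) / len(y)] * len(y)
print("Brier score, constant predictor:", brier(y, baseline))
print("Brier score, boosted stumps:   ", brier(y, fitted))
```

The comparison printed at the end is made on the training data, so it shows only how the stagewise fit lowers the loss. It says nothing about prediction error on new patients, and a survival application would additionally need to account for censoring, for example through the weighting used in the Brier score of Graf et al. (1999).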