Open-Source Machine Learning: R Meets Weka
Kurt Hornik, Christian Buchta, Achim Zeileis
********************************************

Weka (http://www.cs.waikato.ac.nz/~ml/weka/) is the leading open-source project in machine learning. It is a comprehensive collection of machine-learning algorithms for data mining tasks, written in Java, with tools for data pre-processing, classification, regression, clustering, association rules, and visualization. For many of these algorithms Weka provides de facto reference implementations, and a variety of additional projects are therefore based on it. To extend the statistics and statistical learning toolbox already available in R with popular machine-learning techniques, an interface from R to Weka is clearly desirable.

The R extension package RWeka provides such an interface. At the highest level, its main features are R functions for a variety of machine-learning algorithms, such as tree learners (C4.5/J4.8, M5', logistic model trees) and popular meta and rule-based learners. These learners are in fact obtained via interface generators, i.e., functions which return functions providing interfaces to Weka's classes. Such generators are available for Weka's "classifiers" (i.e., regression and classification learners), clusterers, association learners, and filters. The generated interface functions carry enough meta information to allow for dynamic documentation, in particular for listing the available Weka control options via WOW, the Weka option wizard. The R objects obtained by calling the interface functions are suitably (S3) classed, making it possible to provide general-purpose prediction methods for Weka's classifiers and clusterers, and more specialized methods for visualization. Users can easily add interfaces to additional Weka learners and filters, and add R classes and methods for the results of applying these interfaces.
The low-level interaction between R and Java is based on package rJava, with only minimal amounts of Java glue code added for performance enhancements. We also discuss possible enhancements, which relate to general issues arising when interfacing R with other systems. First, too many of the internals of Weka's objects are private and hence basically unavailable for interfacing. Second, data would ideally be shared between R and Weka in such a way that conversion between the native formats happens only when needed. Finally, it would be very valuable to have Weka interfaces to some of R's functionality.

Keywords: machine learning, statistical learning, R, Weka, Java, interface
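
The interface-generator idea described in the abstract (a function that returns an interface function, attaches meta information for option listing, and yields classed results for a generic prediction method) can be illustrated with a minimal, language-neutral sketch. The sketch below is written in Python rather than R, uses a toy in-memory learner in place of a Weka class, and does not reproduce RWeka's actual API: the names `MajorityLearner`, `BACKEND`, `make_classifier`, `option_help`, `Fitted`, and `predict` are all hypothetical and chosen only for illustration.

```python
# Toy stand-in for a backend learner class (hypothetical; not a Weka class).
class MajorityLearner:
    # Option names and descriptions exposed to the interface layer.
    OPTIONS = {"min_count": "minimum number of training labels required"}

    def __init__(self, min_count=1):
        self.min_count = min_count
        self.label = None

    def build(self, labels):
        if len(labels) < self.min_count:
            raise ValueError("too few training labels")
        # Most frequent label; ties are broken by first occurrence.
        self.label = max(dict.fromkeys(labels), key=labels.count)

    def classify(self, n):
        return [self.label] * n


# Hypothetical registry mapping backend class names to classes.
BACKEND = {"toy/MajorityLearner": MajorityLearner}


class Fitted:
    """Classed result object, so that a generic predict() can dispatch on it."""

    def __init__(self, backend_name, learner):
        self.backend_name = backend_name
        self.learner = learner


def make_classifier(backend_name):
    """Interface generator: return a fitting function for the named backend."""
    cls = BACKEND[backend_name]

    def interface(labels, **control):
        # Reject control options the backend does not declare.
        unknown = set(control) - set(cls.OPTIONS)
        if unknown:
            raise TypeError(f"unknown control options: {sorted(unknown)}")
        learner = cls(**control)
        learner.build(list(labels))
        return Fitted(backend_name, learner)

    # Meta information attached to the generated function.
    interface.meta = {"backend": backend_name, "options": dict(cls.OPTIONS)}
    return interface


def option_help(interface):
    """List the control options recorded in the interface's meta information."""
    options = interface.meta["options"]
    return [f"-{name}: {text}" for name, text in sorted(options.items())]


def predict(fitted, n):
    """General-purpose prediction for any object returned by an interface."""
    return fitted.learner.classify(n)


majority = make_classifier("toy/MajorityLearner")  # generated interface
fit = majority(["a", "b", "a"], min_count=2)       # fitted, classed object
print(option_help(majority))
print(predict(fit, 2))
```

The point of the design is that adding a new learner only requires registering its class name and calling the generator once; option listing and prediction then work without any learner-specific code.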