
R Projects for the GSoC 2011 Initiative

As in 2008, 2009, and 2010, the R Project is again participating in the Google Summer of Code during 2011.

Based on ideas collected and discussed on the R Wiki, the projects and students listed below were selected for participation and are being sponsored by Google during the summer of 2011.

Manipulating RStudio Graphics Towards Creating Intuitive Mathematical Comprehension

Andrew Rich

Mentor: Daniel Kaplan assisted by J J Allaire

Short description: The manipulate package in RStudio can be used to demonstrate mathematical ideas intuitively through interacting with sliders and watching a corresponding graph shift. I will create a variety of applets written in R to show basic calculus and statistical concepts for Professor Daniel Kaplan and J.J. Allaire’s contribution to Project MOSAIC.
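As a rough illustration of the slider-driven idea described above, the following is a minimal sketch and not one of the project's applets. The helper names `tangent_line` and `plot_tangent` are hypothetical; only `manipulate()` and `slider()` come from the manipulate package, and the interactive part is guarded because it only works inside RStudio.

```r
# Slope and intercept of the tangent to f at x0, using a central difference.
tangent_line <- function(f, x0, h = 1e-5) {
  slope <- (f(x0 + h) - f(x0 - h)) / (2 * h)
  c(slope = slope, intercept = f(x0) - slope * x0)
}

# Draw f on [-pi, pi] together with its tangent at x0 (hypothetical helper).
plot_tangent <- function(x0, f = sin) {
  tl <- tangent_line(f, x0)
  xs <- seq(-pi, pi, length.out = 200)
  plot(xs, f(xs), type = "l", xlab = "x", ylab = "f(x)")
  abline(a = tl[["intercept"]], b = tl[["slope"]], col = "red")
  points(x0, f(x0), pch = 19)
}

# The slider only exists inside RStudio, so the call is guarded.
if (interactive() && Sys.getenv("RSTUDIO") == "1" &&
    requireNamespace("manipulate", quietly = TRUE)) {
  manipulate::manipulate(
    plot_tangent(x0),
    x0 = manipulate::slider(-3, 3, initial = 0, step = 0.1)
  )
}
```

Moving the slider redraws the plot with the tangent at the new point, which is the kind of interaction the description has in mind for teaching the derivative.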

HUGE: High-dimensional Undirected Graph Estimation

Tuo Zhao

Mentor: Kathryn Roeder assisted by Han Liu

Short description: Modern data acquisition routinely produces massive amounts of complex data. Despite the high dimensionality and complexity, many problems have hidden structure that makes efficient statistical inference possible. One important hidden structure is the sparse conditional independence graph (or undirected graphical model). Our HUGE project aims to provide a fast and scalable toolkit for nonparametric graphical models in ultrahigh-dimensional data analysis.

A GUI based package to assist optimization problems in R

Yixuan Qui

Mentor: John Nash assisted by Ben Bolker

Short description: This project aims to build a GUI-based R package to assist in the preparation and solution of optimization problems. It is expected to improve the usability of optimization tools in R by giving users meaningful suggestions on the choice of optimizer and parameters in a visual and interactive way. The program will also provide a mechanism to auto-generate code that can be run in R to solve a specific optimization problem.

Dyadic data analysis in R

Jesse Wang

Mentor: Felix Schönbrodt assisted by Stefan Schmukle

Short description: The Actor-Partner Interdependence Model (APIM; Kashy & Kenny, 1999; Kenny, Kashy, & Cook, 2006) is a model of dyadic relationships that integrates a conceptual view of interdependence in two-person relationships with the appropriate statistical techniques for measuring and testing it. There is only one R package (“dyad”) that helps researchers conduct dyadic analysis, and it has limitations: for example, it cannot handle complex interaction effects. To overcome its functional deficiency,

Image Analysis in R

Sunil Kumar

Mentor: Ian Fellows

Short description: To fully integrate ImageJ with R and to expand RImageJ into a fully functional R image analysis engine.

Developing a hyperSpec GUI

Sebastian Mellor

Mentor: C. Beleites assisted by Colin Gillespie

Short description: Currently hyperSpec provides only limited interactivity, via the `locator()` function for basic graphics. This proposal will develop a Graphical User Interface for the hyperSpec package. The GUI will be made up of smaller widgets that can be chained, synchronised, and included in batch scripts.

optile: Category order optimization for graphical displays of categorical data

Alexander Pilhöfer

Mentor: Antony Unwin

Short description: The project goal is to implement an interface in R which provides category order optimization for different types of input (such as tables, data frames or matrices) and 2- as well as k-dimensional categorical data.

DClusterm: Model-based detection of disease clusters

Paula Moraga

Mentor: Virgilio Gómez-Rubio assisted by Barry Rowlingson

Short description: Analysis of disease data is important in order to detect disease outbreaks and risk factors. Some methods for cluster detection have been implemented in the DCluster package. However, a model-based approach would be of interest in order to relate disease incidence to potential risk factors. Model-based clustering will be implemented using Generalized Linear Models: many possible clusters will be proposed, and the most likely cluster will be selected using model selection techniques.
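The propose-and-select idea can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not the DClusterm interface: the data are made up, the candidate clusters are simply runs of up to three neighbouring areas along a line, and AIC stands in for whichever model selection criterion the project adopts.

```r
# Made-up data: 10 areas with 100 expected cases each; areas 4-6 have
# three times the expected number of observed cases.
expected <- rep(100, 10)
observed <- expected
observed[4:6] <- 300

# Candidate clusters: every run of 1 to 3 neighbouring areas (an assumption).
candidates <- list()
for (start in 1:10) {
  for (len in 1:3) {
    end <- start + len - 1
    if (end <= 10) candidates[[length(candidates) + 1]] <- start:end
  }
}

# Fit a Poisson GLM with a cluster indicator as covariate and return its AIC.
fit_candidate <- function(areas) {
  in_cluster <- as.integer(seq_along(observed) %in% areas)
  fit <- glm(observed ~ in_cluster + offset(log(expected)), family = poisson)
  AIC(fit)
}

aic <- vapply(candidates, fit_candidate, numeric(1))
null_aic <- AIC(glm(observed ~ 1 + offset(log(expected)), family = poisson))

# The most likely cluster is the candidate whose model has the lowest AIC.
best <- candidates[[which.min(aic)]]
```

Because every candidate model has the same number of parameters, ranking by AIC here is equivalent to ranking by deviance. Further risk factors could be added to the formula as ordinary covariates.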

OpenMP parallel framework for R

Lei Jiang

Mentor: George Ostrouchov

Short description: This project, taken from the existing ideas list, aims to use multi-threaded programming to provide parallelism on multicore, shared-memory architectures. Because OpenMP is a well-known specification for parallel programming, this can be done cleanly, without the hassle of message passing or load balancing, and it also supports hybrid programming with MPI. The expected result is a usable R-OpenMP package hosted on CRAN, with good performance, compatibility and user experience.

Exploratory visualization of dynamic stochastic processes

Kåre Jacobsen

Mentor: Niels Richard Hansen

Short description: To contribute with functions to help explore, visualize and analyze data from multivariate stochastic dynamic systems.

SMART: Sparse Multivariate Adaptive Regression Toolkit

Juemin Yang

Mentor: Han Liu

Short description: The project aims at providing the “fastest and most scalable” implementations of three modern nonparametric predictive methods (SpAM, MT-SpAM and G-SpAM). This package has the potential to become a general-purpose exploratory data analysis toolbox for a wide range of data analysis practitioners. The targeted applications include large-scale scientific data analysis (e.g. genomics/proteomics/bio-imaging), social media data analysis (e.g. image/audio/video/text modeling) and financial time-series

Convergence acceleration of the Expectation-Maximization (EM) algorithms in computational statistics: A suite of cutting-edge acceleration schemes

Jennifer Feder Bobb

Mentor: Ravi Varadhan

Short description: The Expectation-Maximization (EM) algorithm is a useful and popular optimization approach that arises in a wide range of scientific applications. Adaptations of the original EM approach have been proposed that provide faster convergence rates without compromising its global convergence property. We propose to develop an R package which will provide a unified implementation of the diverse set of acceleration schemes for the EM algorithm in an open source, user-friendly environment.
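To make the idea of accelerating EM concrete, here is a minimal sketch, not the proposed package. It treats one EM update as a fixed-point map and wraps it in a squared-extrapolation step of the kind used by SQUAREM-style schemes. The example problem (estimating the mixing proportion of two known normal components), the function names, and the simple fallback rule are all assumptions made for illustration.

```r
set.seed(1)
x  <- c(rnorm(150, 0, 1), rnorm(50, 3, 1))
f1 <- dnorm(x, 0, 1)   # density under component 1
f2 <- dnorm(x, 3, 1)   # density under component 2

# One EM update for the mixing proportion p of component 1.
em_step <- function(p) {
  resp <- p * f1 / (p * f1 + (1 - p) * f2)  # E-step: responsibilities
  mean(resp)                                # M-step: new proportion
}

# Plain EM: iterate the map until successive values agree.
run_em <- function(p, tol = 1e-10, max_iter = 1000) {
  for (i in seq_len(max_iter)) {
    p_new <- em_step(p)
    if (abs(p_new - p) < tol) return(list(p = p_new, steps = i))
    p <- p_new
  }
  list(p = p, steps = max_iter)
}

# Squared extrapolation: two EM steps, one extrapolation, one stabilising step.
run_squared <- function(p, tol = 1e-10, max_iter = 1000) {
  steps <- 0
  for (i in seq_len(max_iter)) {
    p1 <- em_step(p)
    p2 <- em_step(p1)
    steps <- steps + 2
    r <- p1 - p
    v <- (p2 - p1) - r
    if (abs(r) < tol) return(list(p = p1, steps = steps))
    candidate <- p2
    if (v != 0) {
      alpha <- -abs(r) / abs(v)
      candidate <- p - 2 * alpha * r + alpha^2 * v
    }
    # Fall back to the plain EM iterate if the extrapolation leaves (0, 1).
    if (!is.finite(candidate) || candidate <= 0 || candidate >= 1) candidate <- p2
    p <- em_step(candidate)
    steps <- steps + 1
  }
  list(p = p, steps = steps)
}

plain <- run_em(0.5)
fast  <- run_squared(0.5)
```

Both runs should settle on the same fixed point of `em_step`. Comparing `plain$steps` with `fast$steps` shows how many EM updates each approach needed on this toy problem.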

R-EM-Accelerator—Smarter Iterative Schemes Save Your Time

Hui Zhao

Mentor: Roger Peng with assistance of Ravi Varadhan

Short description: This project aims to develop an R package that offers several of the latest acceleration schemes under a single call and can be used to accelerate any EM algorithm. In the proposal, I will show how flexible and convenient the package will be for any R user; a reasonable timeline, the result of prior learning, is also included. In addition, I would like the R Project as the mentoring organization and Professor Ravi Varadhan as my mentor.

Proposal for Components in TradeAnalytics Toolchain enhancements

Y. Chen

Mentor: Brian G. Peterson

Short description: The existing packages already include the tools and functions needed to construct and apply trading strategies. More functions related to trading a portfolio, testing parameters and evaluating strategies can be added. This proposal focuses on some of the targets related to these new developments.

Cranvastime: Interactive longitudinal and temporal data plots

Xiaoyue Cheng

Mentor: Di Cook with assistance of Heike Hofmann

Short description: The project involves developing interactive time series and longitudinal data plots, in association with a new interactive graphics package for R called cranvas, which is based on Qt and can handle large amounts of data. The purpose is to improve R’s capabilities for exploring temporal data. The time series plot will enable exploring slightly irregular seasonality and associations between multiple series. The longitudinal plot will enable the study of individual variation.

Last modified: May 19, 2011 by John C. Nash