Tutorial: Exploratory data analysis with a special focus on clustering and multiway methods


François Husson, Agrocampus Rennes, Rennes Centre for Higher Education and Research in Agronomy, Rennes Cedex, France
Julie Josse, Agrocampus Rennes, Rennes Centre for Higher Education and Research in Agronomy, Rennes Cedex, France

Motivation

Nowadays, researchers have to handle complex data, hence the need to summarize them and to visualize the information in a clear and convenient way. This course first presents some classical data mining methods for exploring and visualizing data, such as Principal Component Analysis and Multiple Correspondence Analysis. We will then focus on Multiple Factor Analysis, which takes into account data sets structured in groups of variables. We will see how these methods can handle heterogeneous data (continuous and categorical). Finally, we will present clustering methods (hierarchical clustering and k-means algorithms), with emphasis on the complementarity between clustering and exploratory methods. We illustrate these methods with data sets from fields such as genomics (mouse and human tumor data), ecology, and sensometrics (wine data).

Outline

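Topics will include Principal Component Analysis, Multiple Correspondence Analysis, Multiple Factor Analysis, and clustering (hierarchical clustering and k-means). The methods will be illustrated with numerous examples, using one or more packages such as FactoMineR.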

Intended audience

Researchers in applied fields, teachers of data mining and data analysis, and statisticians who are interested in multivariate analysis and multiway analysis.

Background knowledge

No prior knowledge is required. Basic knowledge of PCA is welcome.

Related link

More information (notes, scripts, and data sets) will be available on our website, http://factominer.free.fr.

Tutorial Materials

Slides are here.