Overview

The first version of this package was created in 2012 for an introductory PhD course on chemometrics given at the Department of Chemistry and Bioscience, Aalborg University. I quickly found out that using R for this course, with all the advantages it gives, requires a lot of routine work from students, since most of them were also beginners in R. Of course, it helps understanding when students learn how to, e.g., calculate explained variance or residuals in PCA manually, or make the corresponding plots. But in an introductory course these things (as well as numerous typos and small mistakes in code) take too much time, which could be spent on explaining the methods and the proper interpretation of results.

The same is true for everyday use of these methods: most of the routines can be written once and simply reused with various options. So it was decided to write a package that implements the most widely used chemometric methods for multivariate data analysis and gives quick and easy access to the results these methods produce, first of all via numerous plots.

Here is how it works. Say we need to make a PCA model for a data matrix x with autoscaling, then get an overview of the most important plots and investigate scores and loadings for the first and third components. The mdatools solution will be:

# load the package
library(mdatools)

# make a model for autoscaled data with maximum possible number of components
m = pca(x, scale = TRUE)

# show explained variance plot
plotVariance(m)

# select optimal number of components (say, 4) for correct calculation of residuals
m = selectCompNum(m, 4)

# show plots for model overview
plot(m)

# show scores plot for PC1 and PC3
plotScores(m, c(1, 3))

# show loadings plot for the same components
plotLoadings(m, c(1, 3))

# show the loadings as a set of bar plots
plotLoadings(m, c(1, 3), type = "h")

Fairly simple, isn't it? The other “routine” that has been taken into account is validation: any model can be cross-validated or validated with a test set. The model object will contain the validation results, which will also appear on all model plots, etc. See the next chapters for details.
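
For comparison, the manual route mentioned at the beginning (calculating explained variance and residuals by hand) can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the data matrix is simulated, the variable names are made up for this example, and the code is not meant to show how mdatools computes these values internally.

```r
# Base R only; simulated data stand in for a real data matrix x (assumption)
set.seed(42)
x <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)

# autoscale: center every column and scale it to unit standard deviation
xs <- scale(x, center = TRUE, scale = TRUE)

# PCA via singular value decomposition of the autoscaled data
s <- svd(xs)
loadings <- s$v
scores <- xs %*% loadings

# explained variance (in percent): individual and cumulative
expvar <- 100 * s$d^2 / sum(s$d^2)
cumexpvar <- cumsum(expvar)

# residuals for a model with a chosen number of components (hypothetical choice)
ncomp <- 3
res <- xs - scores[, 1:ncomp, drop = FALSE] %*% t(loadings[, 1:ncomp, drop = FALSE])

# explained variance as a bar plot
barplot(expvar, names.arg = seq_along(expvar),
        xlab = "Components", ylab = "Explained variance, %")
```

Even this small sketch needs a dozen lines and leaves out validation, labels and outlier detection, which is the kind of routine work the package is meant to take over.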