Calibration of PLS-DA model
Calibration of PLS-DA model is very similar to conventional PLS with one difference — you need to provide information about class membership of each object instead of a matrix or a vector with response values. This can be done in two different ways. If you have multiple classes, it is always recommended to provide your class membership data as a factor with predefined labels or a vector with class names as text values. The labels/values in this case will be used as class names. It is also acceptable to use numbers as labels but it will make interpretation of results less readable and can possible cause problems with performance statistics calculations. So use names!
It is very important that you use the same labels/names for e.g. calibration and test set, because this is the way model will identify which class an object came from. And if you have e.g. a typo in a label value, model will assume that the corresponding object is a stranger.
So let’s prepare our data.
data(iris)
cal.ind = c(1:25, 51:75, 101:125)
val.ind = c(26:50, 76:100, 126:150)
Xc = iris[cal.ind, 1:4]
Xv = iris[val.ind, 1:4]
cc.all = iris[cal.ind, 5]
cv.all = iris[val.ind, 5]
In this case, the fifth column of dataset Iris is already factor, otherwise we have to make it as a factor explicitely. Lets check if it is indeed correct.
## [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
## [16] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa versicolor versicolor versicolor versicolor versicolor
## [31] versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor versicolor
## [46] versicolor versicolor versicolor versicolor versicolor virginica virginica virginica virginica virginica virginica virginica virginica virginica virginica
## [61] virginica virginica virginica virginica virginica virginica virginica virginica virginica virginica virginica virginica virginica virginica virginica
## Levels: setosa versicolor virginica
However, for the model with just one class, virginica, we need to prepare the class variable in a different way. In this case it is enought to provide a vector with logical values, where TRUE
will correspond to a member and FALSE
to a non-member of the class. Here is an example how to do it (we will make two — one for calibration and one for validation subsets).
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [29] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [57] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
Now we can calibrate the models:
You could notice one important difference. In case when parameter c
is a vector with logical values you also need to provide a name of the class. If you do not do this, a default name will be used, but it may cause problems when you e.g. validate your model using test set where class membership is a factor as we have in this example.
Let’s look at the summary for each of the model. As you can see below, summary for multi class PLS-DA simply shows one set of results for each class. The performance statistics include explained X and Y variance (cumulative), values for confusion matrix (true positives, false positives, true negatives, false negatives) as well as specificity, sensitivity and accuracy values.
##
## PLS-DA model (class plsda) summary
## ------------------------------------
## Info:
## Number of selected components: 1
## Cross-validation: full (leave one out)
##
## Class #1 (setosa)
## X cumexpvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Cal 91.97 46.36 25 0 50 0 1 1 1
## Cv NA NA 25 0 50 0 1 1 1
##
## Class #2 (versicolor)
## X cumexpvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Cal 91.97 46.36 0 0 50 25 1.00 0 0.667
## Cv NA NA 0 3 47 25 0.94 0 0.627
##
## Class #3 (virginica)
## X cumexpvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Cal 91.97 46.36 24 5 45 1 0.90 0.96 0.920
## Cv NA NA 24 6 44 1 0.88 0.96 0.907
Dealing with the multi-class PLS-DA model is similar to dealing with PLS2 models, when you have several y-variables. Every time you want to show a plot or results for a particular class, just provide the class number using parameter nc
. For example this is how to show summary only for the third class (virginica).
##
## PLS-DA model (class plsda) summary
## ------------------------------------
## Info:
## Number of selected components: 1
## Cross-validation: full (leave one out)
##
## Class #3 (virginica)
## X cumexpvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Cal 91.97 46.36 24 5 45 1 0.90 0.96 0.920
## Cv NA NA 24 6 44 1 0.88 0.96 0.907
You can also show statistics only for calibration or only for cross-validation parts, in this case you will see details about contribution of every component to the model.
##
## PLS-DA results (class plsdares) summary:
## Number of selected components: 1
##
## Class #3 (virginica):
## X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1 91.969 91.969 46.356 46.356 24 5 45 1 0.90 0.96 0.920
## Comp 2 5.500 97.468 6.878 53.233 21 5 45 4 0.90 0.84 0.880
## Comp 3 2.186 99.654 4.816 58.049 24 3 47 1 0.94 0.96 0.947
For one class models, the behaviour is actually similar, but there will be always one set of results — for the corresponding class. Here is the summary.
##
## PLS-DA model (class plsda) summary
## ------------------------------------
## Info:
## Number of selected components: 3
## Cross-validation: full (leave one out)
##
## Class #1 (virginica)
## X cumexpvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Cal 98.53 61.31 24 3 47 1 0.94 0.96 0.947
## Cv NA NA 24 3 47 1 0.94 0.96 0.947
Like in SIMCA you can also get a confusion matrix for particular result. Here is an example for multiple classes model.
## setosa versicolor virginica None
## setosa 25 0 0 0
## versicolor 0 0 5 20
## virginica 0 0 24 1
And for the one-class model.
## virginica None
## virginica 24 1
## None 3 47