Predictions for new data

As with PLS regression, use the predict() method and provide at least a matrix or data frame with predictors, which must contain the same number of variables (columns) as the calibration set. For test set validation you can also provide reference class values, in the same form as those used for calibration of the PLS-DA model.

For a multiclass model, the reference values should be provided as a factor or as a vector with class names as text values. Here is an example.

res = predict(m.all, Xv, cv.all)
summary(res)
## 
## PLS-DA results (class plsdares) summary:
## Number of selected components: 1
## 
## Class #1 (setosa):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.925      92.925   47.013      47.013 25  1 49  0  0.98     1    0.987
## Comp 2    4.560      97.484   10.373      57.386 25  0 50  0  1.00     1    1.000
## Comp 3    1.789      99.274    1.588      58.974 25  0 50  0  1.00     1    1.000
## 
## 
## Class #2 (versicolor):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.925      92.925   47.013      47.013  0  0 50 25  1.00   0.0    0.667
## Comp 2    4.560      97.484   10.373      57.386 10  4 46 15  0.92   0.4    0.747
## Comp 3    1.789      99.274    1.588      58.974 10  6 44 15  0.88   0.4    0.720
## 
## 
## Class #3 (virginica):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.925      92.925   47.013      47.013 25  4 46  0  0.92  1.00    0.947
## Comp 2    4.560      97.484   10.373      57.386 25  4 46  0  0.92  1.00    0.947
## Comp 3    1.789      99.274    1.588      58.974 24  4 46  1  0.92  0.96    0.933
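The Spec., Sens. and Accuracy columns follow from the four counts next to them. As a minimal sketch in base R, the snippet below recomputes them for the "Comp 1" rows of the summary above using the standard definitions of specificity, sensitivity and accuracy. The data frame `counts` is a hypothetical helper built by hand from the printed table; it is not an object returned by the package.

```r
# Confusion-matrix counts copied by hand from the "Comp 1" rows above.
# `counts` is a hypothetical helper, not part of the package output.
counts <- data.frame(
  class = c("setosa", "versicolor", "virginica"),
  TP = c(25, 0, 25),
  FP = c(1, 0, 4),
  TN = c(49, 50, 46),
  FN = c(0, 25, 0)
)

# Standard definitions of the three figures of merit.
counts$Spec <- counts$TN / (counts$TN + counts$FP)
counts$Sens <- counts$TP / (counts$TP + counts$FN)
counts$Accuracy <- (counts$TP + counts$TN) /
  (counts$TP + counts$FP + counts$TN + counts$FN)

# Rounded values agree with the Spec., Sens. and Accuracy columns.
print(round(counts[, c("Spec", "Sens", "Accuracy")], 3))
```

For versicolor with one component all 25 class members are rejected (FN = 25), so sensitivity is 0 while accuracy is still 0.667, because the 50 objects from the other two classes are correctly rejected.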

And the corresponding plot with predictions.

par(mfrow = c(1, 1))
plotPredictions(res)

If the vector with reference class values contains names of classes the model knows nothing about, the corresponding objects are simply treated as members of none of the known classes (“None”).
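The effect of this rule on a reference vector can be illustrated in base R. This is only a sketch of the behaviour described above, not the package's internal code; the vector `ref` and the label "hybrid" are made up for the illustration.

```r
# Classes the model was calibrated for.
known_classes <- c("setosa", "versicolor", "virginica")

# Hypothetical reference vector with two labels the model has never seen.
ref <- c("setosa", "virginica", "hybrid", "versicolor", "hybrid")

# Any label outside the known classes is mapped to "None".
ref_mapped <- ifelse(ref %in% known_classes, ref, "None")
ref_mapped <- factor(ref_mapped, levels = c(known_classes, "None"))

print(table(ref_mapped))
```

Objects labelled this way can only count as true negatives or false positives for each known class, since they are not members of any of them.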

For a one-class model, the reference values can be either a factor or vector with class names, or logical values like the ones used for calibration of the model. Here is an example for each case.

res21 = predict(m.vir, Xv, cv.all)
summary(res21)
## 
## PLS-DA results (class plsdares) summary:
## Number of selected components: 3
## 
## Class #1 (virginica):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.949      92.949   53.963      53.963 25  4 46  0  0.92  1.00    0.947
## Comp 2    1.624      94.573    6.097      60.059 24  4 46  1  0.92  0.96    0.933
## Comp 3    2.702      97.275   -0.151      59.908 22  4 46  3  0.92  0.88    0.907
res22 = predict(m.vir, Xv, cv.vir)
summary(res22)
## 
## PLS-DA results (class plsdares) summary:
## Number of selected components: 3
## 
## Class #1 (virginica):
##        X expvar X cumexpvar Y expvar Y cumexpvar TP FP TN FN Spec. Sens. Accuracy
## Comp 1   92.949      92.949   53.963      53.963 25  4 46  0  0.92  1.00    0.947
## Comp 2    1.624      94.573    6.097      60.059 24  4 46  1  0.92  0.96    0.933
## Comp 3    2.702      97.275   -0.151      59.908 22  4 46  3  0.92  0.88    0.907

As you can see, the performance statistics are identical in both cases. However, the predictions plot looks a bit different for the two cases, as you can see below.

par(mfrow = c(2, 1))
plotPredictions(res21)
plotPredictions(res22)
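The statistics match because, for a one-class model, both reference forms carry the same membership information. The base R sketch below shows this under the assumption that the validation set consists of every second row of the built-in iris data; the names `cv_all` and `cv_vir` are hypothetical stand-ins for the cv.all and cv.vir objects used above, whose definitions are not shown in this section.

```r
# Assumption: the validation set is every second row of iris.
# cv_all and cv_vir are stand-ins for cv.all and cv.vir from the text.
cv_all <- iris[seq(2, 150, by = 2), "Species"]  # factor with three class names
cv_vir <- cv_all == "virginica"                 # logical: member of the class or not

# Both forms mark the same objects as members of the target class.
print(table(reference = cv_all, is_virginica = cv_vir))
```

The factor form additionally keeps the names of the non-member classes, which is presumably why the two predictions plots differ while the counts stay the same.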

Because predict() returns an object with results, you can also use most of the plots available for PLS regression results. The last example below shows plots for X-distance and Y-variance.

par(mfrow = c(1, 2))
plotXResiduals(res21)
plotYVariance(res22)