Preprocessing as part of the model
Starting from v. 0.15.0, preprocessing methods can be combined into a list and provided as an additional argument to the models, including PLS. Once the list is provided, the model takes care of estimating the preprocessing parameters (e.g. the reference spectrum for EMSC) from the calibration data and then automatically applies all preprocessing methods to new data when the user calls predict().
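The idea of estimating a preprocessing parameter once and reusing it for new data can be sketched in base R. The block below is only an illustration under stated assumptions: `fit_msc()` and `apply_msc()` are hypothetical helpers, not mdatools functions, the correction is a simple MSC-like one rather than full EMSC, and the spectra are synthetic.

```r
# Hypothetical helpers (not part of mdatools): an MSC-like correction whose
# reference spectrum is estimated from the calibration set only.
fit_msc <- function(X) {
  list(ref = colMeans(X))  # reference spectrum = mean calibration spectrum
}

apply_msc <- function(params, X) {
  # regress every spectrum on the stored reference, then remove offset and slope
  t(apply(X, 1, function(x) {
    b <- unname(coef(lm(x ~ params$ref)))
    (x - b[1]) / b[2]
  }))
}

# synthetic data: one Gaussian peak with random offset, scaling and small noise
set.seed(42)
wl <- seq(0, 1, length.out = 50)
peak <- exp(-(wl - 0.5)^2 / 0.02)
make_set <- function(n) {
  t(replicate(n,
    runif(1, 0, 0.2) + runif(1, 0.5, 1.5) * peak + rnorm(length(wl), sd = 0.001)
  ))
}
Xcal <- make_set(20)
Xnew <- make_set(5)

params <- fit_msc(Xcal)            # estimated once, on the calibration data
Xcal_p <- apply_msc(params, Xcal)
Xnew_p <- apply_msc(params, Xnew)  # the same reference is reused for new data
```

The point of the split into a "fit" and an "apply" step is that new spectra are corrected against the calibration reference and never against their own mean, which is the behaviour the text above describes for models that receive a preprocessing list.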
Here is an example. We will create a PLS model for the Simdata UV/Vis spectra, but first we would like to compute the second derivative of the spectra with a Savitzky-Golay filter and then keep only a subset of the variables (columns 20 to 120). Here is how to do this:
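# load the package and the Simdata dataset (skip if already loaded)
library(mdatools)
data(simdata)
# load calibration and test set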
Xc = simdata$spectra.c
yc = simdata$conc.c[, 2, drop = FALSE]
Xt = simdata$spectra.t
yt = simdata$conc.t[, 2, drop = FALSE]
# define chain of preprocessing methods
p = list(
   prep("savgol", width = 11, porder = 2, dorder = 2),
   prep("varsel", var.ind = 20:120)
)
# create two PLS models - with and without preprocessing
m1 = pls(Xc, yc, 6, cv = list("ven", 10))
m2 = pls(Xc, yc, 6, cv = list("ven", 10), prep = p)
# apply the models to the test set
r1 = predict(m1, Xt, yt)
r2 = predict(m2, Xt, yt)
# show regression coefficients and predictions for both models
par(mfrow = c(2, 2))
plotRegcoeffs(m1)
plotRegcoeffs(m2)
plotPredictions(m1, res = list("cal" = m1$res$cal, "test" = r1), main = "Predictions (no preprocessing)")
plotPredictions(m2, res = list("cal" = m2$res$cal, "test" = r2), main = "Predictions (with preprocessing)")
As you can see, there is no need to apply the preprocessing methods to the test set manually: the predict() method takes care of everything.
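To make the notion of a preprocessing chain more concrete, here is a minimal base R sketch of applying a list of steps in order to both a calibration and a test matrix. It is an assumption-laden illustration, not the mdatools implementation: `second_diff()`, `select_vars()` and `apply_chain()` are hypothetical names, the derivative is a plain finite difference instead of a Savitzky-Golay filter, and the data are random numbers.

```r
# Hypothetical stand-ins for preprocessing steps (not mdatools code)
second_diff <- function(X) {
  # crude second derivative along the columns; the result has two columns fewer
  t(apply(X, 1, diff, differences = 2))
}

select_vars <- function(ind) {
  force(ind)
  # returns a step that keeps only the requested columns
  function(X) X[, ind, drop = FALSE]
}

# the chain is an ordered list of functions
chain <- list(second_diff, select_vars(5:20))

apply_chain <- function(chain, X) {
  # feed the output of each step into the next one, in list order
  Reduce(function(data, step) step(data), chain, X)
}

set.seed(1)
X_cal <- matrix(rnorm(10 * 30), nrow = 10)
X_test <- matrix(rnorm(4 * 30), nrow = 4)

P_cal <- apply_chain(chain, X_cal)    # 10 x 16 matrix
P_test <- apply_chain(chain, X_test)  # 4 x 16 matrix, same steps, same order
```

Because the steps run in list order, the column indices in the second step refer to the columns produced by the first step, so the order of the methods in a chain matters.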
Information about the preprocessing methods is also shown in the model summary (e.g. the output of summary(m2)):
##
## PLS model (class pls) summary
## -------------------------------
## Info:
## Number of selected components: 2
## Cross-validation: venetian blinds with 10 segments
##
## Preprocessing methods:
## - savgol: width = 11, porder = 2, dorder = 2
## - varsel: var.ind = 20:120
##
## Response variable: C2
##     X cumexpvar Y cumexpvar    R2  RMSE Slope  Bias  RPD
## Cal    99.04458    96.48095 0.965 0.028 0.965 0e+00 5.36
## Cv           NA          NA 0.962 0.029 0.963 2e-04 5.18