SIMCA classification

This chapter covers the more general simca method, which supports both Data Driven SIMCA (DD-SIMCA) and other SIMCA implementations. For DD-SIMCA, the new ddsimca method is recommended; its full description can be found here.

However, if you want to use one of the other SIMCA implementations, you may find this text useful.

SIMCA (Soft Independent Modelling of Class Analogy) is a simple but efficient one-class classification method based mainly on PCA. The general idea is to create a PCA model using only the samples/objects that belong to a class and to classify new objects based on how well the model fits them. The decision is made using the two distances discussed in detail in the PCA chapter, the orthogonal distance and the score distance, together with their corresponding critical limits.
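
As a reminder, the two distances are commonly written as shown below. The notation is an assumption made for this summary, since the symbols are defined in the PCA chapter and not here: \(t_{ia}\) is the score of object \(i\) on component \(a\), \(\lambda_a\) is the variance of the scores on that component, \(e_{ij}\) is the residual of object \(i\) for variable \(j\), \(A\) is the number of components and \(J\) is the number of variables. Some formulations use the sum of squared scores in place of \(\lambda_a\), which only rescales \(h_i\).

\[
q_i = \sum_{j=1}^{J} e_{ij}^2, \qquad h_i = \sum_{a=1}^{A} \frac{t_{ia}^2}{\lambda_a}
\]

Here \(q_i\) is the orthogonal distance (how far the object lies from the PC space) and \(h_i\) is the score distance (how far its projection lies from the model centre).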

Critical limits computed for both distances (or for their combination) are used to cut off strangers (extreme objects) and to accept class members with a predefined expected fraction of false negatives (\(\alpha\)). If a data-driven approach (either classic/moments or robust) is used to compute the critical limits, the method is called DD-SIMCA (Data Driven SIMCA), which is implemented separately.
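
The simplest form of such a decision rule can be sketched in a few lines of Python. This is only an illustration and not the package's implementation: the function name `is_member`, the distance values and the limits are all made up, and the sketch assumes that separate limits are applied to each distance (a rectangular acceptance area). How the limits themselves are derived from \(\alpha\) is not shown.

```python
def is_member(h, q, h_crit, q_crit):
    """Accept an object only if both distances are within their limits.

    h, q           -- score and orthogonal distances of the object
    h_crit, q_crit -- critical limits for a chosen alpha (assumed given)
    """
    return h <= h_crit and q <= q_crit


# Hypothetical (score, orthogonal) distances for four new objects
distances = [(0.8, 1.1), (2.9, 0.4), (0.5, 3.7), (1.9, 1.9)]

# Made-up critical limits, for illustration only
h_crit, q_crit = 2.5, 3.0

decisions = [is_member(h, q, h_crit, q_crit) for h, q in distances]
print(decisions)  # [True, False, False, True]
```

The second object is rejected because of its score distance and the third because of its orthogonal distance. When a combination of the distances is used instead, the two comparisons are replaced by a single comparison of the combined value with one limit.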

Classification performance can be assessed using the number of true/false positives and negatives together with two statistics: sensitivity (true positive rate), which shows how well the model recognizes class members, and specificity (true negative rate), which shows how well it identifies strangers. In addition, the model calculates the percentage of misclassified objects. All statistics are calculated for calibration and validation (if any) results. Note, however, that specificity cannot be computed without objects that do not belong to the class; therefore, calibration and cross-validation results in SIMCA have no specificity values.
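
The following Python sketch shows how these statistics relate to the counts, and why specificity is undefined when the set contains class members only. It is a minimal illustration with hypothetical names (`classification_stats`, `is_member`, `accepted`) and made-up data, not the code used by the package; the misclassification value is returned as a fraction, not as a percentage.

```python
def classification_stats(is_member, accepted):
    """Compute basic one-class classification statistics.

    is_member -- list of bools, True if the object really belongs to the class
    accepted  -- list of bools, True if the model accepted the object
    """
    pairs = list(zip(is_member, accepted))
    tp = sum(m and a for m, a in pairs)          # members accepted
    fn = sum(m and not a for m, a in pairs)      # members rejected
    fp = sum(not m and a for m, a in pairs)      # strangers accepted
    tn = sum(not m and not a for m, a in pairs)  # strangers rejected

    n_members, n_strangers = tp + fn, fp + tn
    total = n_members + n_strangers
    return {
        "TP": tp, "FN": fn, "FP": fp, "TN": tn,
        # None marks a statistic that cannot be computed for this set
        "sensitivity": tp / n_members if n_members else None,
        "specificity": tn / n_strangers if n_strangers else None,
        "misclassified": (fp + fn) / total if total else None,
    }


# Calibration-like set: 20 class members, one of them rejected
cal = classification_stats([True] * 20, [True] * 19 + [False])

# Test-like set: 5 class members followed by 5 strangers
test = classification_stats(
    [True] * 5 + [False] * 5,
    [True, True, True, True, False, False, False, False, True, False],
)

print(cal["sensitivity"], cal["specificity"])
print(test["sensitivity"], test["specificity"], test["misclassified"])
```

For the first set specificity is `None`, because there are no strangers to reject. For the second set all three statistics are available.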

Note also that any SIMCA model is also a PCA model object, and any SIMCA result is also a PCA result object. Therefore all plots, methods, and statistics available for PCA can be used for SIMCA model and result objects as well.