SIMCA/DD-SIMCA classification

SIMCA (Soft Independent Modelling of Class Analogy) is a simple but efficient one-class classification method based on PCA. The general idea is to create a PCA model using only samples/objects that belong to one class, and to classify new objects based on how well this model fits them. The decision is made using the two distances discussed in detail in the PCA chapter, the orthogonal distance and the score distance, together with their critical limits.
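The two distances can be illustrated with a minimal sketch in Python with NumPy. This is not the package's own code, and the function names are hypothetical. The sketch assumes mean-centering only (no scaling). It defines the score distance h as the sum of squared scores divided by the score variances, and the orthogonal distance q as the sum of squared residuals.

```python
import numpy as np


def fit_class_pca(X, ncomp):
    """Fit a mean-centered PCA model to objects of ONE class (rows = objects)."""
    center = X.mean(axis=0)
    Xc = X - center
    # Right singular vectors of the centered data are the loadings
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:ncomp].T                               # loadings, (variables x ncomp)
    T = Xc @ P                                     # calibration scores
    eigenvalues = (T ** 2).sum(axis=0) / (X.shape[0] - 1)  # score variances
    return center, P, eigenvalues


def distances(X, center, P, eigenvalues):
    """Score distance h and orthogonal distance q for every row of X."""
    Xc = X - center
    T = Xc @ P                                     # project onto the class model
    E = Xc - T @ P.T                               # part the model cannot explain
    h = ((T ** 2) / eigenvalues).sum(axis=1)       # distance inside the PC space
    q = (E ** 2).sum(axis=1)                       # distance to the PC space
    return h, q
```

New objects are projected using the center and loadings from calibration. A large h means the object is unusual within the model, and a large q means the model fits it poorly.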

Critical limits computed for both distances (or for their combination) are used to cut off strangers (extreme objects) and to accept class members with a predefined expected rate of false negatives (\(\alpha\)). If a data-driven approach (either classic/moments-based or robust) is used to compute the critical limits, the method is called DD-SIMCA (Data Driven SIMCA). You can find more details about the method in this paper.
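A minimal sketch of the classic (moments-based) data-driven limits follows. It covers the classic approach only, not the robust one. It assumes the common DD-SIMCA formulation in which both scaled distances follow chi-square distributions. The scales h0 and q0 are estimated as means, and the degrees of freedom as Nh = 2·h0²/var(h) (similarly Nq), rounded to an integer. An object is accepted if the full distance f = Nh·h/h0 + Nq·q/q0 does not exceed the (1 - alpha) quantile of the chi-square distribution with Nh + Nq degrees of freedom. To keep the sketch dependency-free, the quantile is found by bisection on a series for the incomplete gamma function; in practice `scipy.stats.chi2.ppf` does the same job. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def chi2_cdf(x, dof):
    """Chi-square CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    # P(a, z) = z^a e^-z / Gamma(a + 1) * sum_n z^n / ((a + 1)...(a + n))
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    log_p = a * math.log(z) - z - math.lgamma(a + 1) + math.log(total)
    return min(1.0, math.exp(log_p))


def chi2_ppf(p, dof):
    """Chi-square quantile found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(dof))
    while chi2_cdf(hi, dof) < p:   # expand the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def moments_params(d):
    """Classic (method of moments) scale d0 and degrees of freedom Nd of a distance."""
    d0 = mean(d)
    nd = max(1, round(2 * d0 ** 2 / variance(d)))  # integer DoF, at least 1 (assumption)
    return d0, nd


def dd_simca_limits(h_cal, q_cal, alpha=0.05):
    """Estimate parameters from calibration distances and the critical full distance."""
    h0, nh = moments_params(h_cal)
    q0, nq = moments_params(q_cal)
    f_crit = chi2_ppf(1 - alpha, nh + nq)
    return h0, nh, q0, nq, f_crit


def is_member(h, q, limits):
    """Accept an object if its full distance does not exceed the critical limit."""
    h0, nh, q0, nq, f_crit = limits
    f = nh * h / h0 + nq * q / q0
    return f <= f_crit
```

Lowering alpha raises `f_crit`, so fewer class members are rejected. The price is that more strangers may be accepted.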

The classification performance can be assessed using the numbers of true/false positives and negatives. It can also be assessed using statistics that show how well a classification model recognizes class members (sensitivity, or true positive rate) and how well it identifies strangers (specificity, or true negative rate). In addition, the model calculates the percentage of misclassified objects. All statistics are calculated for calibration and validation (if any) results. Be aware, however, that specificity cannot be computed without objects that do not belong to the class; therefore, calibration and cross-validation results in SIMCA do not have specificity values.
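These statistics are simple ratios of the four counts, as the plain-Python sketch below shows (the function name is hypothetical). It assumes that the misclassified objects are counted over all objects in the set. It returns a fraction rather than a percent. It also returns `None` for specificity when the set contains no strangers, which is exactly the situation for calibration and cross-validation results.

```python
def classification_stats(tp, fn, tn, fp):
    """Sensitivity, specificity and fraction of misclassified objects."""
    n_members = tp + fn        # objects that truly belong to the class
    n_strangers = tn + fp      # objects that truly do not belong to it
    sensitivity = tp / n_members if n_members > 0 else None
    # Without strangers there is nothing to compute specificity from
    specificity = tn / n_strangers if n_strangers > 0 else None
    misclassified = (fn + fp) / (n_members + n_strangers)  # multiply by 100 for percent
    return sensitivity, specificity, misclassified
```

For example, with 45 true positives, 5 false negatives, 18 true negatives and 2 false positives, sensitivity and specificity are both 0.9 and 10% of the objects are misclassified. A calibration set such as `classification_stats(28, 2, 0, 0)` yields `None` for specificity.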

You can think of SIMCA as a PCA model where the function categorize() is used to make a decision: if an object is categorized as regular, it is considered a member of the class; otherwise, it is a stranger. Therefore, to understand how SIMCA works, read carefully how PCA works in general and how critical limits for the distances are computed in particular.
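The decision rule can be sketched in plain Python as below. The helper only mimics the idea of categorize() and is not its implementation. It assumes that the full distance of every object has already been computed and that two critical values are given: one for extreme objects and a larger one for outliers. The labels other than "regular" and both function names are assumptions of this sketch.

```python
def categorize_sketch(f_values, f_crit_extreme, f_crit_outlier):
    """Label each object by its full distance (hypothetical helper)."""
    labels = []
    for f in f_values:
        if f > f_crit_outlier:
            labels.append("outlier")
        elif f > f_crit_extreme:
            labels.append("extreme")
        else:
            labels.append("regular")
    return labels


def simca_decision(f_values, f_crit_extreme, f_crit_outlier):
    """Only objects labelled 'regular' are accepted as class members."""
    labels = categorize_sketch(f_values, f_crit_extreme, f_crit_outlier)
    return ["member" if label == "regular" else "stranger" for label in labels]
```

Both extreme objects and outliers end up as strangers. The distinction between them matters for data cleaning, not for the membership decision itself.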

It should also be noted that any SIMCA model is also a PCA model object, and any SIMCA result is also a PCA result object. Therefore, all plots, methods and statistics available for PCA can be used for SIMCA model and result objects as well.