DD-SIMCA classification

Data-Driven SIMCA (Soft Independent Modelling of Class Analogy) is a simple but efficient one-class classification method based mainly on PCA. The general idea is to create a PCA model using only samples/objects that belong to a target class, and then to classify new objects according to how well the model fits them. The decision is made using the two distances discussed in the PCA chapter, the orthogonal distance and the score distance, together with their corresponding critical limits.
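To make the idea concrete, the two distances can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the package's implementation: the function name `pca_distances` is hypothetical, the data are synthetic, the data are mean-centered but not scaled, and the score distance is assumed to be the sum of squared scores normalized by the score variances from the calibration set.

```python
import numpy as np

def pca_distances(X_train, X_new, n_components):
    """Score (h) and orthogonal (q) distances of X_new to a PCA model of X_train.

    Hypothetical helper for illustration only.
    """
    # Center with the mean of the target class (no scaling in this sketch)
    mean = X_train.mean(axis=0)
    Xc = X_train - mean

    # PCA via SVD: rows of Vt are the loading vectors
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings, (n_vars, n_components)
    eigvals = s[:n_components] ** 2 / (X_train.shape[0] - 1)  # score variances

    # Project the new objects and compute the residuals
    Y = X_new - mean
    T = Y @ P                                    # scores
    E = Y - T @ P.T                              # part not explained by the model

    h = np.sum(T ** 2 / eigvals, axis=1)         # score distance (assumed normalization)
    q = np.sum(E ** 2, axis=1)                   # orthogonal distance
    return h, q

rng = np.random.default_rng(0)

# Synthetic target class: two latent factors in ten variables plus small noise
latent = rng.normal(size=(50, 2))
mixing = rng.normal(size=(2, 10))
X_target = latent @ mixing + 0.05 * rng.normal(size=(50, 10))

# Synthetic "strangers" that do not follow the latent structure
X_stranger = 3.0 * rng.normal(size=(5, 10))

h_t, q_t = pca_distances(X_target, X_target, 2)
h_s, q_s = pca_distances(X_target, X_stranger, 2)
```

In this sketch the strangers end up with much larger orthogonal distances than the class members, because the two-component model cannot reproduce them. Turning such distances into an accept/reject decision requires the critical limits.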

Critical limits computed for both distances (or for their combination) are used to cut off the strangers (extreme objects) and to accept class members with a pre-defined expected ratio of false negatives (\(\alpha\)). In DD-SIMCA the critical limits and other outcomes are computed by fitting the distance values with a scaled chi-square distribution.
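As a sketch of how such a limit can be written, assume each distance follows a scaled chi-square distribution. The symbols below are introduced here for illustration and do not appear in the text above: \(h\) and \(q\) are the score and orthogonal distances, \(h_0\) and \(q_0\) their scale factors, and \(N_h\) and \(N_q\) their degrees of freedom, all estimated from the target-class data.

\[
N_h \frac{h}{h_0} \sim \chi^2(N_h), \qquad N_q \frac{q}{q_0} \sim \chi^2(N_q)
\]

Under the same assumption, the two can be combined into a single full distance \(f\), whose critical limit is a chi-square quantile:

\[
f = N_h \frac{h}{h_0} + N_q \frac{q}{q_0} \sim \chi^2(N_h + N_q), \qquad f_{\mathrm{crit}} = \chi^{-2}(1 - \alpha,\; N_h + N_q)
\]

Here \(\chi^{-2}(p, N)\) denotes the \(p\)-quantile of the chi-square distribution with \(N\) degrees of freedom. An object is accepted as a class member when \(f \le f_{\mathrm{crit}}\), so about a fraction \(\alpha\) of true class members is expected to be rejected.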

The theoretical background, as well as some practical aspects of DD-SIMCA, can be found in this paper. The paper is open access and freely available to everyone. It is advisable to read it first, because all examples below are based on similar examples in the paper and use the same Oregano dataset.

Note also that any DD-SIMCA model is a PCA model object, and any DD-SIMCA result is a PCA result object. Therefore all plots, methods, and statistics available for PCA can be used for SIMCA model and result objects as well.