Multivariate Curve Resolution

Multivariate Curve Resolution (MCR) is a group of methods for solving the curve resolution problem in spectroscopy, which, in its general form, can be defined as follows. Suppose we have a mixture of $A$ chemical components (e.g. ribose, fructose and lactose). Each individual component is usually called a pure component. Every pure component $i$ has a spectrum (IR, NIR, Raman, etc.), which can be represented as a column vector $\mathbf{s}_i$ of size $J \times 1$, where $J$ is the number of values in each spectrum (corresponding to the number of wavelengths, wavenumbers, chemical shifts, etc.).

According to the Beer-Lambert law, if we mix the components and acquire a spectrum of the mixture, this spectrum is a linear combination of the spectra of the pure components, weighted by their concentrations. This can be written as follows:

\[\mathbf{d} = c_1 \mathbf{s}^T_1 + c_2 \mathbf{s}^T_2 + \dots + c_A \mathbf{s}^T_A \]

In this equation, $\mathbf{d}$ is a vector of spectral values representing the spectrum of the mixture ($1 \times J$), $c_1, c_2, \dots, c_A$ are the concentrations of the pure components in the mixture and $\mathbf{s}_1, \mathbf{s}_2, \dots, \mathbf{s}_A$ are the spectra of the pure components. If we combine the concentration values into a $1 \times A$ row vector $\mathbf{c} = [c_1, c_2, \dots, c_A]$, the equation can be written in a more compact form:

\[\mathbf{d} = \mathbf{c} \mathbf{S}^T\]

Here, $\mathbf{S}$ is a $J \times A$ matrix containing the spectra of all pure components as columns.
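As a small numerical illustration of the compact form, the following Python sketch computes a mixture spectrum for two made-up pure components measured at four channels. The numbers and the helper name `mixture_spectrum` are assumptions chosen for this example; they come neither from mdatools nor from real spectra.

```python
# Toy pure-component spectra: J = 4 channels (rows), A = 2 components (columns).
# All values are invented for illustration.
S = [
    [1.0, 0.0],
    [2.0, 1.0],
    [0.5, 3.0],
    [0.0, 1.0],
]

# Concentrations of the two components in one mixture (1 x A row vector).
c = [0.25, 0.75]


def mixture_spectrum(c, S):
    """Return d = c S^T: one value per spectral channel (1 x J)."""
    return [sum(c_a * s_ja for c_a, s_ja in zip(c, row)) for row in S]


d = mixture_spectrum(c, S)
print(d)  # [0.25, 1.25, 2.375, 0.75]
```

Each element of `d` is the concentration-weighted sum of the pure-component intensities at that channel, which is exactly what the product $\mathbf{c} \mathbf{S}^T$ expresses.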

Naturally, if we have several mixtures in which the concentrations of the pure components vary, we can combine all concentration values into a matrix $\mathbf{C}$, where every row corresponds to a particular mixture, and all mixture spectra into a matrix $\mathbf{D}$, with one spectrum per row. In this case we can write the equation as follows:

\[\mathbf{D} = \mathbf{C} \mathbf{S}^T\]

The task of the MCR methods is to estimate $\mathbf{C}$ and $\mathbf{S}$ from $\mathbf{D}$ alone, that is, to resolve the mixtures into the spectra of the individual components and their concentrations. This is not a trivial task, because the expression above does not have a unique solution. For example, PCA gives one of the solutions, but its scores do not correspond to the real concentration values, nor do its loadings represent the spectra of the pure components.
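The lack of uniqueness can be checked numerically. For any invertible $A \times A$ matrix $\mathbf{T}$, the pair $\mathbf{C}\mathbf{T}$ and $\mathbf{T}^{-1}\mathbf{S}^T$ reproduces the same $\mathbf{D}$. The Python sketch below demonstrates this with invented numbers; the matrix `T` and the helpers `matmul` and `transpose` are hypothetical and serve only this illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


# Invented example: 3 mixtures, A = 2 components, J = 4 channels.
C = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]            # mixtures x components
S = [[1.0, 0.0], [2.0, 1.0], [0.5, 3.0], [0.0, 1.0]]  # channels x components
D = matmul(C, transpose(S))                           # mixtures x channels

# An arbitrary invertible A x A matrix and its inverse (det(T) = 1).
T = [[2.0, 1.0], [1.0, 1.0]]
T_inv = [[1.0, -1.0], [-1.0, 2.0]]

C_alt = matmul(C, T)                   # alternative "concentrations"
S_alt_T = matmul(T_inv, transpose(S))  # alternative "spectra", transposed (A x J)
D_alt = matmul(C_alt, S_alt_T)

# Both factorisations reproduce the same data matrix.
same = all(
    abs(a - b) < 1e-12
    for row_a, row_b in zip(D, D_alt)
    for a, b in zip(row_a, row_b)
)
print(same)  # True

# The alternative spectra contain negative intensities.
has_negative = any(v < 0 for row in S_alt_T for v in row)
print(has_negative)  # True
```

In this toy case the alternative spectra contain negative values, which a real spectrum of this kind would not have. This hints at why constraints such as non-negativity can help to narrow down the set of acceptable solutions.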

In fact, it is impossible to recover $\mathbf{C}$ and $\mathbf{S}$ exactly. What we get are estimates, which can be denoted as $\mathbf{\hat{C}}$ and $\mathbf{\hat{S}}$. In this case we can rewrite the equation as:

\[\mathbf{D} = \mathbf{\hat{C}} \mathbf{\hat{S}}^T + \mathbf{E}\]

Here, $\mathbf{E}$ is the matrix of residuals.
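To make the role of the residuals concrete, the Python sketch below computes $\mathbf{E} = \mathbf{D} - \mathbf{\hat{C}} \mathbf{\hat{S}}^T$ for a toy data matrix and summarises it as a relative residual in percent, a summary often reported as lack of fit in the MCR literature. All numbers, variable names, and helper functions are assumptions made for this example and are not part of mdatools.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


# Invented "measured" data: 3 mixtures x 4 channels, slightly noisy.
D = [
    [1.02, 1.98, 0.51, 0.01],
    [0.49, 1.52, 1.74, 0.50],
    [0.26, 1.24, 2.38, 0.76],
]

# Invented estimates of concentrations (3 x 2) and spectra (4 x 2).
C_hat = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]
S_hat = [[1.0, 0.0], [2.0, 1.0], [0.5, 3.0], [0.0, 1.0]]

# Reconstructed data and residuals: E = D - C_hat S_hat^T.
D_hat = matmul(C_hat, transpose(S_hat))
E = [[d - dh for d, dh in zip(row_d, row_dh)] for row_d, row_dh in zip(D, D_hat)]

# Relative size of the residuals compared to the data, in percent.
ss_res = sum(e * e for row in E for e in row)
ss_tot = sum(d * d for row in D for d in row)
lack_of_fit = 100 * (ss_res / ss_tot) ** 0.5
print(round(lack_of_fit, 2))
```

A small value means that the estimated concentrations and spectra reproduce the data well. It does not by itself guarantee that they are close to the true $\mathbf{C}$ and $\mathbf{S}$, because different factorisations can fit the data equally well.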

There are many different methods and tricks that help to get a decent solution in this case. In mdatools, starting from v. 0.11.0, two MCR methods are available: one based on the purity approach (mcrpure()), also known as SIMPLISMA, and one based on constrained alternating least squares (mcrals()). This chapter explains how to use both for practical tasks.

More information about the MCR methods in general can be found in this book.

Starting from v. 0.11.0, the mdatools package contains an additional dataset, carbs, which consists of three objects: carbs$S is a matrix ($1401 \times 3$) of Raman spectra of three carbohydrates (fructose, lactose, and ribose), carbs$D contains 21 simulated spectra of their mixtures, and carbs$C contains the concentrations used to create the mixtures. The mixture spectra also contain some random noise, which is uniformly distributed between 0 and 3% of the maximum intensity. The spectra of the pure components were taken from the publicly available SPECARB library created by S.B. Engelsen. This dataset is used throughout this chapter to show how the implemented MCR methods work.
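The way such a simulated dataset can be put together is sketched below in Python with toy numbers. This is one possible reading of the description above, not the actual procedure used to create carbs: it assumes that the noise is added to every value and that the maximum intensity is taken over the whole noise-free matrix. The spectra, concentrations, seed, and helper names are all invented for the example.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Invented pure spectra (4 channels x 2 components) and
# concentrations (3 mixtures x 2 components).
S = [[1.0, 0.0], [2.0, 1.0], [0.5, 3.0], [0.0, 1.0]]
C = [[1.0, 0.0], [0.5, 0.5], [0.25, 0.75]]

# Noise-free mixture spectra: D = C S^T.
D_clean = matmul(C, transpose(S))

# Assumption: noise is uniform between 0 and 3% of the largest
# intensity in the noise-free data and is added to every value.
noise_level = 0.03
max_intensity = max(v for row in D_clean for v in row)
noise = [
    [rng.uniform(0, noise_level * max_intensity) for _ in row]
    for row in D_clean
]
D = [[v + n for v, n in zip(row_v, row_n)] for row_v, row_n in zip(D_clean, noise)]

print(len(D), len(D[0]))  # 3 4
```

Because the noise-free `C` and `S` are known in a simulation like this, the estimates produced by an MCR method can be compared directly with the values used to generate the data.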