Chemometrics United

Chemometrics can be thought of as signal processing for measurements made on chemical systems, and the tools available range from simple to dizzyingly complex. The best tool for a given task depends both on the objective and on how the measured signal manifests. If the signal is reasonably described by the linear mixture model, it’s common to rely on multivariate linear regression tools, such as partial least squares and classical least squares (CLS) for quantification. Partial least squares is one member of a broad class of inverse least squares (ILS) methods and CLS is often referred to as ‘forward least squares’. In the recent past, chemometricians have favored ILS methods, dwelling on the disadvantages of CLS while ignoring the downside of ILS. I believe that a solid understanding of the pros and cons of both methods eliminates the apparent conflict between ILS and CLS, and instead allows them to be used in synergy.
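For readers who prefer symbols, the two regression directions can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the article: X holds N measured signals of n channels each as rows, C holds the concentrations of K components, the columns of S are the pure-component signals, y holds the reference values of the target, and x is a new measurement.

```latex
\begin{align*}
% Linear mixture model: the forward direction on which CLS is built
\mathbf{X} &= \mathbf{C}\,\mathbf{S}^{\mathsf{T}} + \mathbf{E} \\
% CLS: estimate the pure-component signals, then project a new measurement onto them
\hat{\mathbf{S}}^{\mathsf{T}} &= (\mathbf{C}^{\mathsf{T}}\mathbf{C})^{-1}\mathbf{C}^{\mathsf{T}}\mathbf{X},
\qquad
\hat{\mathbf{c}}^{\mathsf{T}} = \mathbf{x}^{\mathsf{T}}\hat{\mathbf{S}}\,(\hat{\mathbf{S}}^{\mathsf{T}}\hat{\mathbf{S}})^{-1} \\
% ILS: regress the reference values directly on the measured signals
\mathbf{y} &= \mathbf{X}\,\mathbf{b} + \mathbf{e},
\qquad
\hat{y} = \mathbf{x}^{\mathsf{T}}\hat{\mathbf{b}}
\end{align*}
```

In the ILS case the regression vector is typically estimated with a regularized inverse of X, of which partial least squares is one choice.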

Suppose that a data set for a predictor (for example, a set of measured absorbance spectra) includes signals from both a target of interest and various interferences. Also available are the corresponding measured reference values for the target (the predictand). The goal is to use an easy-to-measure predictor to predict a hard-to-measure predictand using a linear regression model. For example, in the process environment it might be of interest to replace an expensive, time-consuming, off-line wet chemistry analysis (the predictand) with a fast, inexpensive, online spectroscopic measurement (the predictor). The result is an online ‘inferential sensor’ that may also enable proactive control of the process.

The often-stated primary advantage of ILS is that the signals of all analytes contributing to the measurement need not be known. This is true, but it’s important to note that the interferences must vary in the calibration set if the model is to account for them. Unless the interferences are varied independently of the target (ideally in an orthogonal design), there is a chance for coincidental correlation between target and interference. If the correlation persists, the ILS model can take advantage of it, but if the correlation breaks, the model will typically perform poorly in the future.
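The effect of coincidental correlation can be sketched with a small simulation. Everything in it is an assumption made for illustration: two made-up Gaussian bands stand in for the target and interferent spectra, the noise level is arbitrary, and principal component regression (a simple member of the ILS family) is used in place of partial least squares to keep the code short. The correlation is deliberately extreme: in one calibration set the interferent tracks the target exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
channels = np.arange(50)

def peak(center, width=4.0):
    """Gaussian band used as a made-up pure-component spectrum."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

s_target, s_interf = peak(20), peak(30)

def spectra(c_target, c_interf, noise=0.05):
    """Linear mixture model: weighted sum of pure spectra plus white noise."""
    clean = np.outer(c_target, s_target) + np.outer(c_interf, s_interf)
    return clean + noise * rng.standard_normal(clean.shape)

def fit_pcr(X, y, k=2):
    """Principal component regression; returns (b, b0) with y ~ X @ b + b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    b = Vt[:k].T @ ((U[:, :k].T @ (y - y_mean)) / s[:k])
    return b, y_mean - x_mean @ b

def rmse(model, X, y):
    b, b0 = model
    return float(np.sqrt(np.mean((X @ b + b0 - y) ** 2)))

n = 200
c_cal = rng.uniform(0, 1, n)

# Calibration set A: the interferent happens to track the target.
X_corr = spectra(c_cal, c_cal)
# Calibration set B: the interferent is varied independently of the target.
X_doe = spectra(c_cal, rng.uniform(0, 1, n))

# Future samples in which the correlation no longer holds.
c_new = rng.uniform(0, 1, n)
X_new = spectra(c_new, rng.uniform(0, 1, n))

model_corr = fit_pcr(X_corr, c_cal)
model_doe = fit_pcr(X_doe, c_cal)

rmse_corr_cal = rmse(model_corr, X_corr, c_cal)  # looks fine on its own calibration data
rmse_corr_new = rmse(model_corr, X_new, c_new)   # degrades once the correlation breaks
rmse_doe_new = rmse(model_doe, X_new, c_new)     # designed calibration holds up

print(f"correlated calibration, calibration error: {rmse_corr_cal:.3f}")
print(f"correlated calibration, future error:      {rmse_corr_new:.3f}")
print(f"designed calibration, future error:        {rmse_doe_new:.3f}")
```

The model built on the correlated set cannot tell target from interferent, so its calibration error gives no warning of how it will behave on future samples.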

In contrast, CLS will attempt to use only the target signal and thus avoid coincidental correlation. However, without utilizing external information, CLS requires a good design of experiments (DoE). Astute readers will note that this is exactly the same DoE that would keep the ILS model from relying on coincidental correlation. So, right off the bat, understanding the two modeling approaches has provided a synergistic perspective.
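The dependence of CLS on the design can be sketched in a few lines. The simulated spectra, noise level and variable names below are assumptions made for illustration only. When the interferent tracks the target exactly, the concentration matrix is rank deficient and the pure-component spectra cannot be separated; with an independently varied interferent, they can be estimated and then used for prediction.

```python
import numpy as np

rng = np.random.default_rng(1)
channels = np.arange(50)

def peak(center, width=4.0):
    """Gaussian band used as a made-up pure-component spectrum."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

# Columns: target, interferent (unknown to the analyst in this scenario).
S_true = np.column_stack([peak(20), peak(30)])

def measure(C, noise=0.05):
    """Simulate spectra from the linear mixture model X = C S^T + E."""
    return C @ S_true.T + noise * rng.standard_normal((C.shape[0], channels.size))

n = 200
c_target = rng.uniform(0, 1, n)
C_corr = np.column_stack([c_target, c_target])            # interferent tracks target
C_doe = np.column_stack([c_target, rng.uniform(0, 1, n)])  # independent variation

# A rank-deficient design cannot resolve the two pure spectra.
rank_corr = int(np.linalg.matrix_rank(C_corr))
rank_doe = int(np.linalg.matrix_rank(C_doe))

# CLS calibration on the designed set: least squares estimate of S in X ~ C S^T.
X_doe = measure(C_doe)
S_hat = np.linalg.lstsq(C_doe, X_doe, rcond=None)[0].T    # shape (channels, 2)

# CLS prediction: project each new spectrum onto the estimated pure spectra.
C_new = rng.uniform(0, 1, (n, 2))
X_new = measure(C_new)
C_pred = np.linalg.lstsq(S_hat, X_new.T, rcond=None)[0].T  # shape (n, 2)

rmse_target = float(np.sqrt(np.mean((C_pred[:, 0] - C_new[:, 0]) ** 2)))
print(f"rank of correlated design: {rank_corr}, rank of designed set: {rank_doe}")
print(f"CLS prediction error for the target: {rmse_target:.3f}")
```

External information, such as a library spectrum for each component, would remove the need to estimate the pure spectra from the calibration set in the first place.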

A second point is the misconception, usually justified by citing multi-component Beer’s law, that CLS is only useful with spectra as the predictor; in fact, CLS can be applied to other systems.

In general, ILS algorithms are fast, and many tools are available to help in model identification (for example, cross-validation). Additionally, the statistics are well defined. CLS models, by contrast, can be difficult to identify, depending on the available measurements. Unfortunately, except for the simplest problems, interpretation of ILS models can be difficult and misleading, whereas CLS models tend to be the most interpretable, and interpretability may well be the primary objective. During identification of a CLS model there are several useful constraints (such as non-negativity) that are not applicable during ILS model identification. The wonderful upshot is that ILS models can be used to guide CLS modeling so that both ILS and CLS can be used to their best advantage during model identification.
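As a minimal sketch of a non-negativity constraint in CLS prediction, the example below fits a spectrum with and without the constraint. The pure spectra, the sample and the small negative artefact are all made up for illustration, and the projected-gradient solver is a simple stand-in chosen to keep the example dependency-free; in practice a dedicated solver such as `scipy.optimize.nnls` would be the more usual choice.

```python
import numpy as np

channels = np.arange(50)

def peak(center, width=4.0):
    """Gaussian band used as a made-up pure-component spectrum."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

# Columns: target, interferent (assumed known, e.g. from a calibration or a library).
S = np.column_stack([peak(20), peak(30)])

# Hypothetical sample: target only, plus an unmodelled negative artefact
# close to the interferent band.
x = 0.5 * peak(20) - 0.05 * peak(32)

# Unconstrained CLS prediction: ordinary least squares on the pure spectra.
c_free = np.linalg.lstsq(S, x, rcond=None)[0]

def nnls_projected_gradient(S, x, n_iter=500):
    """Minimize ||S c - x||^2 subject to c >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.eigvalsh(S.T @ S).max()  # 1 / Lipschitz constant
    c = np.zeros(S.shape[1])
    for _ in range(n_iter):
        gradient = S.T @ (S @ c - x)
        c = np.maximum(0.0, c - step * gradient)     # step, then clip at zero
    return c

c_nonneg = nnls_projected_gradient(S, x)

print("unconstrained estimate:", np.round(c_free, 3))
print("non-negative estimate: ", np.round(c_nonneg, 3))
```

The unconstrained fit compensates for the artefact with a physically meaningless negative interferent concentration, while the constrained fit sets it to zero.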

Because CLS allows useful constraints, provides greater interpretability and is easy to update, I anticipate expanded use of CLS in chemometrics applications in the future. However, it is the synergistic use of ILS and CLS that will enable high-quality regression solutions.

About the Author
Neal Gallagher

Neal Gallagher received B.S. degrees in Chemical Engineering and Engineering Physics from the University of Colorado in 1985, an M.S. in Chemical Engineering from the University of Washington in 1987, and a Ph.D. in Chemical Engineering with a mathematics minor from the University of Arizona in 1992. Neal is Vice President and co-founder of Eigenvector Research, Inc. Neal has worked on an extremely wide variety of chemometrics projects for a number of companies, national laboratories and academic institutions. He specializes in chemometrics consulting, algorithm development for detection, classification and quantification, chemometrics research, short courses and software. He is a co-author of PLS_Toolbox for use with MATLAB, its companion stand-alone product Solo, and other advanced chemometrics software packages. He has also worked extensively on developing algorithms for hyperspectral image analysis, with an emphasis on anomaly and target detection.
