Why Isn’t Chemometrics Center Stage?
There is a great need for more accurate planning of measurements and more efficient extraction of relevant information from complex analytical data. The solution is at hand; let’s start using it properly.
Today’s chemical analytical technologies provide unprecedented quality and quantity of measurements. Datasets are invariably vast and multivariate, incorporating millions of individual measurements on great numbers of variables (analytes). Unfortunately, our ability to interpret this data has fallen behind our ability to generate it, especially in multidisciplinary projects at the interface of chemistry and biology, chemistry and technology, and chemistry and the real world.
Analytical scientists understand and use simple statistical tools in their daily work, but these tools are rarely suited to large datasets. Chemometrics has historically provided a valuable toolbox for mining information from complex data. However, despite 40 years of accumulated experience with chemometrics, both the current level of expertise and the popularity of the topic among analytical scientists are shamefully low.
Why is this? After all, chemometrics tools enable and facilitate not only extraction of information from data obtained from hyphenated techniques but also integration of data obtained by many different analytical platforms. This is crucial to generating a full picture of the system being analyzed, for example, in advanced pharmaceutical and biomedical applications.
Actually, chemometric tools can be put to multiple uses: in the design of experiments; in selection and optimization of analytical conditions; in quality control of series of measurements; and in data processing, including sample correction and compression, calibration, pattern recognition and classification. Given all this, chemometrics should occupy a position at center stage for analytical chemists.
It’s not that the tools are not available. Most analytical chemistry labs are equipped with chemometric/statistical software. These user-friendly packages provide standard chemometric tools and a selection of result-visualization tools. However, slick packages also present a danger: misuse and abuse. I know of examples that range from the disquieting to the truly appalling. Suffice it to say, it is simply not acceptable to query data with every available software tool at the click of a ‘do all’ button, and then choose the ‘best’ result (according to the user!). At best this is a pure waste of time; at worst it is malpractice.
The real problem is that chemometric software has become a ‘black box’. Users are not familiar with the theoretical aspects of the tools, their assumptions or the requirements they place on the datasets. They don’t understand which parameters to optimize or how to go about it. In short, they are selecting blindly. Applying chemometrics is not magic: ‘garbage in, garbage out’ holds, and poor data simply means poor results. Chemometrics is not ‘a cleaning lady’ at the Analytical Chemistry department; indeed, it should ‘take a corner office,’ because it has tools to reveal trends that are buried within the data.
While chemometrics applies mathematical and statistical methods, the development of new methods is usually motivated by the pull of solving real chemical problems, rather than the push of mathematical and statistical sophistication. ‘Fit-for-use’ is a common approach in chemometrics, which tries to adapt statistics to chemistry instead of vice versa. In general, the approach to analyzing data is a holistic one based on multivariate modeling. This can reveal unexpected patterns because the combined effect of all variables is taken into account, in contrast to traditional chemical and physical relationships, which usually consider just one or a few variables at a time. The incorporation of wider effects can be especially advantageous for studying complex systems, such as in metabolomics and systems biology. On the other hand, it can also be a disadvantage because complex analyses can be more difficult to interpret and to translate back to the scientific question at hand. This brings us back to the greatest challenge for chemometricians, which is to make the subject understandable and palatable to every analytical chemist, whether they have a mathematical/statistical background or not. Our goal must be to position chemometrics center stage.
Ewa Szymańska, currently a post-doctoral researcher in the Analytical Chemistry department, IMM, at Radboud University Nijmegen in the Netherlands, began her adventure in chemometrics during her PhD. “At that time, I was using chemometric tools to evaluate whether the separation method developed at the Medical University of Gdansk, Poland, could be used to discriminate healthy patients from cancer patients,” she says. For the last four years, Ewa has been involved in many multidisciplinary metabolomics projects, where she has gained extensive experience in applying chemometric tools to complex analytical data. “Explaining chemometric findings to analytical scientists with different backgrounds is what I find most challenging and rewarding – surprisingly, being a pharmacist without a strong mathematical background helps with that!”