When AI Meets Physics: A Microcosmic Revolution in Liquid Chromatography

We are, I believe, living through the early stages of a technological revolution whose scale and impact will surpass those of past industrial, semiconductor, and internet revolutions. What artificial intelligence (AI), machine learning (ML), and deep learning (DL) have accomplished in just four years is nothing short of remarkable. AI’s long-term potential is clearly vast, but it is always difficult at the early stages of a revolution to predict what the end-stage will look like. (In the early days of the internet, how many foresaw the eventual transformation of global communication and data exchange?)

However, it is already clear that analytical chemistry will benefit from AI’s ability to process vast amounts of data with exceptional speed. For example, ML algorithms have the potential to enhance tasks such as analyzing and interpreting mass spectrometry data across various omics fields, counting and characterizing cells and nanoparticles observed under a microscope, developing methods in LC, or performing diagnostic checks on instruments before a run. In particular, AI has the potential to help separation scientists more accurately predict retention time in liquid chromatography – which is what I will focus on here.

This advance would represent a microcosmic AI revolution for the chromatography field – and for the pharmaceutical industry. In pharma, there is a need to improve the efficiency of the life cycle of new medicines – to deliver safe and effective treatments to patients far more quickly. More accurate prediction of retention times in LC would streamline method development, improve the identification of unknown compounds by MS, help track drug impurities more efficiently, reduce chromatographic workloads through in silico chromatography experiments, and reduce human error by facilitating both the automation of the whole drug life cycle and the integration of AI technologies.

However, predicting LC retention time is extremely challenging. Retention depends on the distribution and accumulation of analyte molecules across a three-dimensional interfacial region, highly heterogeneous in both composition and structure, between an impermeable solid (the stationary phase) and a liquid (the mobile phase).

Regardless of the separation mode in LC (whether RPLC, HILIC, AEX, or even mixed-mode systems such as AEX-RP), the elegant peak we observe, or the simple retention time we measure from it, is only the tip of the iceberg. Beneath it lies a complex and “ugly monster”: the fundamentals of liquid-to-solid adsorption in LC and its microscopic description, which forms the keel of the iceberg and provides an underlying clue about the observed macroscopic retention time. This transition from the microscopic to the macroscopic scale in liquid chromatography is what makes accurate prediction of retention times so difficult.

An indomitable monster?

Since the very beginning of liquid chromatography in the 1960s, various approaches have been proposed to predict retention times. These include physics-based or purely theoretical models, descriptor-based models, and statistical or empirical design-of-experiments (DoE) interpolation models.

However, none of these approaches has been able to fully account for the observed retention phenomena because:

● Physics-based models rely strictly on theoretical principles, but are often too idealized to properly capture the complexity and specificity, both in composition and structure, of real HPLC adsorbents, eluents, and analytes. The advantage of such models, however, is that they allow extrapolation and generalization across different analytes and LC conditions.

● Descriptor-based models are physics-agnostic and rely on training sets of analytes with known molecular descriptors (for instance, descriptors of their molecular size, ability to accept or donate hydrogen-bonds, dipole moments, etc.), aiming for a high-quality fit between model predictions and experimental retention data. Extrapolation to other external analytes is possible, but it remains risky.

● Design of experiments and statistical/empirical models are also physics-agnostic and rely on a selected set of experiments to calibrate the model. They only allow for data interpolation; they cannot predict anything accurately beyond the training space.
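The gap between interpolation and extrapolation in such fitted models can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a single descriptor (labeled log P here) related linearly to the logarithm of the retention factor, and the data values and the function names `fit_line` and `predict_log_k` are hypothetical, not taken from any real column or published model.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def predict_log_k(model, descriptor, train_range):
    """Return (predicted log k, True if the descriptor lies inside the training range)."""
    a, b = model
    low, high = train_range
    inside = low <= descriptor <= high
    return a + b * descriptor, inside

# Hypothetical training set: one descriptor per analyte and the measured log10(k).
log_p = [1.0, 1.5, 2.0, 2.5, 3.0]
log_k = [0.10, 0.35, 0.62, 0.85, 1.12]

model = fit_line(log_p, log_k)
train_range = (min(log_p), max(log_p))

for descriptor in (2.2, 5.0):
    value, inside = predict_log_k(model, descriptor, train_range)
    label = "interpolation" if inside else "extrapolation (treat with caution)"
    print(f"descriptor={descriptor}: log k = {value:.3f} [{label}]")
```

The fit itself says nothing about why retention follows the descriptor, which is why a prediction flagged as extrapolation carries no physical guarantee.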

This is why AI, with its speed in analyzing data, and its ability to recognize otherwise undetectable patterns, represents a promising alternative to overcome the weaknesses of these three classes of methods used to predict retention time in LC.

Before we move on to the seeds of a solution, let us consider what we mean when we refer to AI-based approaches for retention prediction, and why they have so far fallen short. In essence, we are describing ML methods that involve rapidly training complex algorithms (essentially large sets of coefficients determined by regression during training) on vast amounts of collected input data. Examples of such inputs include SMILES strings for analyte structures, pKa values, temperature, column chemistry, pH, salt concentration, and other parameters, all paired with a known experimental outcome: the retention time.

In other words, ML approaches can be viewed as a glorified yet extremely powerful and versatile fitting technique. The advantage of AI-based methods is their ability to handle highly complex problems, such as the intricate liquid-to-solid composition and structure between the stationary and mobile phases, that conventional models cannot adequately capture. Moreover, they can identify retention patterns that classical models fail to recognize, thanks to their capacity to process massive datasets. Finally, AI possesses the remarkable ability to predict properties relatively well – even beyond the boundaries of the training space.

However, future AI-based approaches to LC retention prediction have two main, intrinsic limitations. The first, obvious limitation of any ML tool is “garbage in, garbage out.” Both the input and output data mentioned above inevitably contain experimental error and noise, which the AI algorithm may overinterpret rather than recognize and filter out. This is a critical source of hallucination, or irreducible error, in retention prediction.

The second and most fundamental limitation of ML approaches is their inevitable failure to recognize the essence of retention in LC. As I said, ML is nothing more than a glorified fitting tool. Even with a very large amount of accurate input and output data, ML cannot currently predict the subtleties at the fundamental, microscopic level that fully determine the macroscopic observed retention time.

When AI meets physics

To address the current lack of accuracy and robustness in ML techniques for predicting retention times in LC, where the mean relative error in 2026 is typically around 10 percent (unacceptable for the purposes of streamlining method development, identifying compounds, and tracking drug impurities), I believe we must combine these agnostic ML approaches with deterministic methods grounded in the physics of adsorption at the solid-liquid interface. Such deterministic models can provide the most relevant data about the stationary phase-mobile phase-analyte ternary system.
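For readers who want the error metric spelled out, the following Python sketch computes a mean relative error in the usual sense: the average of |predicted − measured| / measured over a set of analytes. The function name and the retention times are hypothetical, chosen only so that the result lands on the 10 percent level discussed above.

```python
def mean_relative_error(predicted, measured):
    """Average of |predicted - measured| / |measured| over paired retention times."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - m) / abs(m) for p, m in zip(predicted, measured))
    return total / len(measured)

# Hypothetical retention times in minutes (not real data).
measured_rt = [5.0, 10.0, 20.0]
predicted_rt = [5.5, 9.0, 22.0]

mre = mean_relative_error(predicted_rt, measured_rt)
print(f"mean relative error: {mre:.1%}")
```

At this level, a compound eluting at 10 minutes would be predicted with an uncertainty of about one minute.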

The objective is to harness the strengths of both approaches: leveraging the speed and pattern recognition capabilities of ML while benefiting from the realism, accuracy, and robustness of physics-based methods. This is why I am a firm believer that molecular dynamics simulations (MDSs) represent the most suitable deterministic technique, since they describe the governing microscopic events happening in LC systems. By solving the complex motions of individual atoms in a solid-liquid-analyte system in accordance with Newton’s laws of mechanics, molecular dynamics can generate highly accurate equilibrium data that relates to the retention time. This, in turn, provides ML tools with the most relevant input for effective learning, requiring only minimal datasets and delivering the most accurate and robust predictions.

At Waters we are focused on addressing the MDS part of the solution. For example, Waters is sponsoring Daniel Frerichs’ full PhD program at Philipps University Marburg, Germany, under the supervision of Professor Ulrich Tallarek, who has expertise in MDS of retention in both RPLC and HILIC separation modes. And over the past two years, we have pursued an even more challenging objective: determining the retention mechanism in RP-AEX mixed-mode chromatography (MMC).

The latest simulations provided a striking revelation, confirming the existence of the untamable “ugly monster” hidden beneath the seemingly simple observation of a Gaussian peak and measurement of its retention time in MMC. Specifically, in the case of a simple singly charged mono-acid (e.g., phenylacetate), we discovered that the surface of a C18 (RP)-tertiary alkyl amine (AEX) MMC column is highly heterogeneous, with islands of C18 chains surrounding AEX groups and bordered by large regions of freely accessible silica surface.

Overall, MDS predicted four distinct adsorption sites: classical HILIC adsorption, HILIC partition, AEX adsorption, and RP adsorption. This level of mechanistic insight could not be achieved by the three traditional approaches mentioned previously or by purely AI-based methods relying only on macroscopic retention times and poorly relevant input data. Our early findings therefore strongly emphasize the need to integrate deterministic MDS tools into ML approaches to develop new, more robust hybrid AI models.

At this stage of our research, we have exclusively focused on establishing the most relevant data for AI-driven retention prediction of any analyte in MMC. These data include the solid-to-liquid phase ratio of the MMC column, the specific surface area of the stationary phase (measured by the standard BET protocol), and, most importantly, a unique parameter extracted from MDS calculations: the analyte density differential (ADD). ADD measures the excess amount of analyte present in the column compared to that of an inert compound (zero excess). Its strength lies in encapsulating and averaging all local heterogeneities of analyte composition across the entire three-dimensional interfacial region between the solid and the liquid.
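One way to picture how an excess quantity of this kind connects to retention is the standard surface-excess view of adsorption. The Python sketch below is an illustration under stated assumptions, not the definition of ADD used in the simulations, which is not given here. It assumes a one-dimensional analyte density profile normal to the surface, expressed relative to the bulk density, and integrates the deviation from 1 to obtain an excess length. Multiplying by the accessible surface area and dividing by the hold-up volume then gives a retention factor k. All numbers and function names are hypothetical.

```python
def excess_length(z_nm, relative_density):
    """Trapezoidal integral of (rho(z)/rho_bulk - 1) over z; result in nm."""
    total = 0.0
    for i in range(1, len(z_nm)):
        f_prev = relative_density[i - 1] - 1.0
        f_curr = relative_density[i] - 1.0
        total += 0.5 * (f_prev + f_curr) * (z_nm[i] - z_nm[i - 1])
    return total

def retention_factor(excess_nm, surface_area_m2, holdup_volume_m3):
    """k = (excess volume) / (hold-up volume), with the excess length converted to m."""
    return excess_nm * 1.0e-9 * surface_area_m2 / holdup_volume_m3

# Hypothetical profiles: distance from the surface (nm) and density relative to bulk.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
retained = [1.0, 5.0, 3.0, 1.0, 1.0]   # analyte enriched near the surface
inert = [1.0, 1.0, 1.0, 1.0, 1.0]      # inert compound: zero excess by construction

surface_area = 100.0      # m^2 of accessible surface in the column (assumed)
holdup_volume = 1.0e-6    # m^3, i.e. 1 mL (assumed)
t0 = 1.0                  # hold-up time in minutes (assumed)

for name, profile in (("retained", retained), ("inert", inert)):
    k = retention_factor(excess_length(z, profile), surface_area, holdup_volume)
    print(f"{name}: k = {k:.2f}, retention time = {t0 * (1.0 + k):.2f} min")
```

A real interfacial region is heterogeneous in three dimensions, so a single profile like this stands in for an average over all local environments.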

The next step is for computer scientists specializing in MDS and ML to incorporate MDS as a physics-based equation that transforms classical input data, such as solute SMILES strings, mobile phase composition, surface chemistry of the stationary phase, and temperature, into ADD input data. This parameter correlates strongly, and indeed physically, with the observed retention time.

The end stage

We can’t predict the exact pace at which AI algorithms and techniques will evolve over the next five, 10, or 20 years. Nevertheless, given that the current generation of AI tools is already demonstrating remarkably low rates of hallucination, I anticipate that practical AI-based retention-time prediction will have a meaningful impact on the drug life cycle in pharmaceutical laboratories within the next five years. By this, I mean that the relative errors achieved by this next generation of AI tools will likely remain within a few percent – well inside the bounds of experimental error.

Moreover, there are many potential applications of physics-informed AI in liquid chromatography. For example, by extending retention-time prediction (in linear chromatography) to the prediction of adsorption isotherms (in non-linear chromatography), one could predict distorted band profiles in preparative liquid chromatography and subsequently optimize purification processes. Although our present discussion has focused on equilibrium problems between the stationary and mobile phases, the same line of reasoning can be extended to predicting peak widths in LC and the resolving power of LC columns. The distinction lies in the underlying physics, which involves complex fluid dynamics and advection-diffusion phenomena in randomly packed bed structures. Likewise, detection techniques may also benefit: multi-reaction MS-based compound identification could be significantly improved by incorporating the physics of molecular fragmentation into AI tools that generate candidate structures. Many other application areas could be cited as well.

More broadly, the next generation of AI tools will replace more and more tasks currently performed by analytical chemists – from the routine to the increasingly elaborate. The world of tomorrow will belong to those who create and master AI technologies and teach others how to use and prompt them.

This revolution extends far beyond our field, shaping the success rate of startup companies, the productivity of large multi-national companies, and even the power of entire countries. Of course, many questions remain. Do we want a world governed primarily by AI-programmed machines? Can we justify the additional greenhouse gas emissions required to power AI systems? But it seems we have already crossed the point of no return given the rapid adoption of AI technologies over the past four years. Indeed, any individual, company, or state that rejects AI risks rapid obsolescence, supplanted by competitors equipped with the latest systems.

My hope is that AI will ultimately serve separation science, pharma, and humankind at large – without undermining the values and human connections that underpin our society.

About the Author(s)

Fabrice Gritti

Fabrice Gritti is based at the Waters Corporation, Milford, USA.
