Analytical science is generating molecular data at unprecedented scale, raising new questions about how best to make sense of it all. Artificial intelligence is increasingly part of that conversation, reshaping how researchers process, integrate, and interpret molecular information. Mo Jain, Founder and CSO of Sapient, believes these advances could signal a new era for analytical science – one in which interpretation, rather than data generation, becomes the defining challenge.
His team is developing large-scale mass spectrometry and multi-omics platforms that harness AI to extract actionable biological signals and accelerate translational discovery. Here, he reflects on how mass spectrometry workflows are evolving and how AI is beginning to shape the next generation of multi-omics research.
When we talk about “AI” in an analytical science context, what are we actually referring to?
There is certainly some confusion, especially with the explosion of generative AI tools that dominate headlines today. At its core, AI is about training computer systems to perform complex, repetitive tasks with high fidelity – and the analytical sciences are full of exactly those kinds of tasks. With today’s technologies, we're generating data at an unprecedented scale – tens, hundreds, even a thousand times more than just a few years ago. The challenge isn’t about generating data anymore; it’s about making sense of it.
Take mass spectrometry as an example. With newer high-throughput systems we can now measure tens of thousands of molecules, from proteins to metabolites to lipids, in a single biosample – and do so across thousands of samples at a time. But after that data is collected, you need to process enormous spectral files, remove noise, align peaks, perform quality control, and format everything for analysis. This kind of pipeline is exactly where AI is already having a huge impact. Thanks to advances in cloud computing, distributed systems, and now generative AI, we can process and prepare complex datasets orders of magnitude faster than ever before. It may not be the flashiest application of AI – no robots, no rocket launches – but it is an incredibly practical, impactful use case that is transforming how we approach discovery.
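To make the idea concrete, here is a minimal sketch of that kind of post-acquisition cleanup, written in Python against a hypothetical feature-by-sample intensity table with simulated values – not Sapient’s actual pipeline. It drops features that vary too much across pooled QC injections, then log-transforms and median-normalizes what remains:

```python
# Minimal sketch of a post-acquisition cleanup step (simulated data, not a real pipeline):
# start from a feature-by-sample intensity matrix, drop noisy features using
# pooled-QC variability, then log-transform and median-normalize each sample.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical feature table: 5,000 MS features x 200 samples (first 20 are pooled QCs).
features = pd.DataFrame(
    rng.lognormal(mean=10, sigma=1, size=(5000, 200)),
    index=[f"feature_{i}" for i in range(5000)],
    columns=[f"QC_{j}" if j < 20 else f"sample_{j}" for j in range(200)],
)

qc_cols = [c for c in features.columns if c.startswith("QC_")]

# Quality control: keep features whose coefficient of variation across the
# pooled QC injections is below 30% (a common, but arbitrary, threshold).
qc_cv = features[qc_cols].std(axis=1) / features[qc_cols].mean(axis=1)
clean = features.loc[qc_cv < 0.30]

# Log-transform and median-center each sample to reduce loading/batch effects.
log_intensities = np.log2(clean + 1)
normalized = log_intensities - log_intensities.median(axis=0)

print(f"{clean.shape[0]} of {features.shape[0]} features pass QC")
```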
And once the data is cleaned and structured, then comes the fun part: the interpretive layer. We can use AI to start asking deep biological questions, such as: Who is likely to develop a specific disease over the next decade? Who will or won’t respond to a given therapy? Or what is the best drug target? Finding answers to these kinds of questions requires analysis of multi-dimensional datasets, encompassing thousands of measurements across thousands of individuals and many timepoints. Generative AI can now help us uncover patterns in this sea of information and translate it into actionable biological insight.
What do you see as the most promising areas where AI can transform how MS data are analyzed and interpreted?
One of the most exciting opportunities for AI in mass spectrometry is its ability to help us move beyond overly simplistic models of disease. For a long time, we’ve operated under the “one gene, one protein, one disease” framework – but that is more mythology than science. In reality, even so-called “single-gene disorders” are influenced by numerous modifier genes and environmental factors.
When we approach disease by measuring a single biomarker and attempting to develop a drug around it, we leave an enormous amount of information untapped. It’s like trying to solve a jigsaw puzzle with only a fraction of the pieces; it doesn’t matter how good you are at putting the pieces you do have together – you’ll never achieve the full picture.
AI changes that. With it, we can analyze the full complexity of MS data and uncover the broader biological context, identifying not just proteins or metabolites in isolation, but entire networks and pathways involved in disease. AI can help us map the effects of targeting one protein and how it might influence others, offering a more complete view of the biological system.
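As an illustration of the network idea (a generic, assumed approach rather than Sapient’s method), one simple way to move beyond single analytes is to correlate molecular features across samples and group strongly connected features into candidate modules:

```python
# Illustrative sketch: build a co-abundance network from simulated measurements
# by correlating features across samples and grouping connected features.
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
n_features, n_samples = 50, 300
data = rng.normal(size=(n_features, n_samples))  # hypothetical abundances

corr = np.corrcoef(data)  # feature-by-feature correlation matrix

# Connect features whose absolute correlation exceeds a chosen threshold.
graph = nx.Graph()
graph.add_nodes_from(range(n_features))
for i in range(n_features):
    for j in range(i + 1, n_features):
        if abs(corr[i, j]) > 0.3:
            graph.add_edge(i, j, weight=corr[i, j])

# Each connected component is a candidate "module" of co-regulated molecules.
modules = list(nx.connected_components(graph))
print(f"{len(modules)} candidate modules found")
```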
This shift has the potential to transform both drug development and precision medicine. It’s not only about designing better drugs, it’s about knowing which patients will benefit from which treatments – and why.
Could you explain, in a nutshell, how Sapient’s platform integrates mass spectrometry with AI, and how this compares with more traditional biomarker discovery approaches?
Traditional biomarker discovery over the past few decades has largely relied on genomics, inferring protein activity from DNA or RNA data. While genomics has driven significant biomarker advances, we’re now aware of its limitations. First, genomic measures are largely static: their value lies in predicting what could happen, but they don’t necessarily reflect what is actually happening biologically. Second, many drugs target proteins, so obtaining accurate information on protein levels and modulation is critical – and studies have shown that DNA and RNA, used as surrogates for protein levels, often correlate poorly with quantitative protein measures, particularly in disease or treated states.
Sapient’s platform measures proteins and cytokines, as well as metabolites, lipids, and other key molecules, at scale using mass spectrometry. This “multi-omics 2.0” approach goes beyond sequencing to capture dynamic biomarkers, which reflect both endogenous biology and environmental factors that can influence health and disease over time. And with the high-throughput capabilities of our mass spec instrumentation, we can probe this broad molecular landscape in humans across thousands of samples at a time, providing high-specificity measures that reveal robust signals reflective of dynamic biological processes.
Sapient integrates AI-driven analysis as part of our platform to extract insights from these complex datasets, helping us identify therapeutic targets, predict disease progression, and understand drug responses more accurately.
What was the greatest analytical or computational challenge you faced during development – and how did you overcome it?
Scientific progress, much like biology itself, rarely comes down to a single “eureka” moment. Just as disease isn’t typically caused by one gene or one protein alone, the development of our platform has involved solving a series of interlinked technical, analytical, and computational challenges, each one building on the last.
One of the earliest and most fundamental hurdles we faced in that process was scaling mass spectrometry. Traditionally, MS instruments are designed to process hundreds of samples, and we wanted to scale that to hundreds of thousands. This meant re-engineering both the hardware, to run faster and detect more molecules, and the software, to handle data extraction and quality control at an entirely new scale.
We recently published a preprint paper detailing our rapid liquid chromatography-mass spectrometry (rLC-MS) system, which uses a novel “mix mode” chromatography to capture diverse classes of chemicals in a single run. This innovation significantly cuts the run time needed to cover those diverse compounds in a complex sample, meaning many more samples can be processed per day. In parallel, there have been massive improvements in MS instrumentation, which is now 10–20 times faster and more sensitive than just a few years ago. That progress, while external, unlocked new possibilities on our end.
The next major hurdle was data interpretation. With this massive influx of high-resolution data, we had to figure out how to extract meaningful insights. This is where cloud computing, distributed systems, and AI came into play – allowing us to rapidly manage, analyze, and interpret these vast datasets for actionable discovery.
How do you guard against bias, overfitting, or spurious correlations when dealing with such vast, complex datasets?
When you think about where AI truly excels, it’s not necessarily in de novo discovery but in making sense of unstructured data – identifying clusters, relationships, and structures that humans can’t easily detect. This is why AI has become so powerful in fields like intelligence analysis, finance, and complex systems modeling.
Biological data is similar in that it’s vast, messy, and deeply interconnected, but our traditional approach to understanding it has been very reductionist. We’ve built simplified frameworks – “pathways” and “processes” – that are really just human attempts to make sense of complexity. They’re useful, but inherently limited. AI allows us to move beyond that by revealing non-obvious structures in biological systems, showing how molecules and pathways relate in ways we could never have defined manually.
Accounting for bias and overfitting is critical because if you feed biased or noisy data into any model, you’ll get flawed results. There are two main ways to mitigate this. The first is scale: with enormous datasets, random bias tends to average out, as long as it isn’t systematic – this is partly how large language models such as ChatGPT achieve their predictive accuracy. But in the bioanalytical world, we can’t generate data at that scale; it’s too costly and too slow.
The second – and more practical – approach is orthogonality: you validate discoveries across independent datasets, using independent technologies. For instance, if we identify a potential biomarker in one cohort via mass spectrometry, we’ll verify it in a completely separate population or using an entirely different platform like ELISA or RNA sequencing. That cross-validation helps expose systematic bias and build confidence in the findings.
The bigger danger, however, is overfitting – especially when you have millions of measurements across a relatively small number of samples. The way we counter this is by scaling data generation and validation – increasing both the breadth and diversity of datasets so the algorithms can generalize rather than memorize.
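A hedged sketch of that discovery-versus-validation discipline, using simulated cohorts and standard scikit-learn tools rather than any real study data, might look like this: a regularized model is fit on one cohort and then scored, untouched, on an independent one.

```python
# Sketch of cross-cohort validation with simulated placeholder data:
# fit a biomarker model on a discovery cohort, then evaluate it unchanged
# on an independent validation cohort to expose overfitting.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def simulate_cohort(n_samples, n_markers=200, signal_idx=(0, 1, 2)):
    """Simulate a cohort where a few markers carry a weak disease signal."""
    X = rng.normal(size=(n_samples, n_markers))
    logits = X[:, list(signal_idx)].sum(axis=1)
    y = (logits + rng.normal(scale=1.5, size=n_samples)) > 0
    return X, y.astype(int)

X_disc, y_disc = simulate_cohort(500)   # discovery cohort
X_val, y_val = simulate_cohort(300)     # independent validation cohort

# Regularization limits overfitting when markers far outnumber true signals.
model = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
model.fit(X_disc, y_disc)

auc_disc = roc_auc_score(y_disc, model.predict_proba(X_disc)[:, 1])
auc_val = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"discovery AUC {auc_disc:.2f} vs independent-cohort AUC {auc_val:.2f}")
```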
The key lies in maintaining balance – having enough diverse, high-quality data, verifying it orthogonally, and remaining conscious of the traps of bias and overfitting. This is exactly why the advances we’ve seen in analytical technologies lately are so important. With these innovations, we’re finally achieving the throughput needed to build datasets large enough and diverse enough for true AI-driven discovery.
Once mass spectrometry reaches this scale, AI can operate on the kind of dense, high-dimensional biochemical information it needs – and suddenly, whole classes of biological questions become answerable in ways they simply weren’t before.
What do you see as the greatest challenges in applying AI to mass spectrometry?
We’re in the midst of a massive shift at the moment. Over the past year alone, the volume of data generated through platforms like ours has exploded. But data only becomes valuable when it leads to biological insight. On its own, it is just filling hard drives with zeros and ones.
The next challenge, and opportunity, is learning how to make use of that data. We're seeing this evolution already play out in therapeutics: instead of targeting a single protein, drug developers are now designing therapies aimed at two, three, even up to six targets simultaneously. The same trend is happening in diagnostics, where we’ve moved beyond single-marker tests to multi-marker panels that measure a dozen or more signals. AI and machine learning models have been critical here in selecting biomarker signatures with the specificity and selectivity needed to better predict disease or drug response.
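As a generic illustration of panel selection (not any specific product), an L1-penalized model applied to simulated marker data shrinks most coefficients to zero and leaves behind a compact candidate signature:

```python
# Illustrative multi-marker panel selection on simulated data: an L1 penalty
# zeroes out most marker weights, leaving a small candidate signature.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n_samples, n_markers = 400, 500
X = StandardScaler().fit_transform(rng.normal(size=(n_samples, n_markers)))
y = (X[:, :5].sum(axis=1) + rng.normal(size=n_samples)) > 0  # 5 informative markers

sparse_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
sparse_model.fit(X, y.astype(int))

# Non-zero coefficients define the selected panel.
panel = np.flatnonzero(sparse_model.coef_[0])
print(f"selected a {panel.size}-marker panel: {panel.tolist()}")
```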
One area where this is already making an impact is in understanding biological age. Chronological age doesn’t necessarily reflect how “old” or healthy a person truly is. Two 50-year-olds can look and function very differently, as different factors such as a person’s lifestyle or disease onset can dramatically shift their health profile. We recently developed a machine learning-based metabolic aging clock model to derive a single metric of biological age from dozens of molecular markers in blood. This metric was then validated across different disease states to understand how conditions accelerate aging, and how interventions might reverse that process.
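The sketch below shows the general shape of an aging-clock model – regress chronological age on molecular features, then treat the gap between predicted and actual age as a biological-age signal. It uses simulated metabolite data and a standard elastic-net regressor, and is not the published Sapient model:

```python
# Minimal aging-clock sketch on simulated data: predict chronological age from
# blood metabolite levels, then read predicted-minus-actual age as an "age gap".
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_people, n_metabolites = 1000, 60
age = rng.uniform(20, 80, size=n_people)

# Simulated metabolite matrix in which some features drift with age.
metabolites = rng.normal(size=(n_people, n_metabolites))
metabolites[:, :10] += 0.05 * age[:, None]

X_train, X_test, age_train, age_test = train_test_split(
    metabolites, age, test_size=0.25, random_state=0
)

clock = ElasticNetCV(cv=5, random_state=0)
clock.fit(X_train, age_train)

predicted_age = clock.predict(X_test)
age_gap = predicted_age - age_test  # positive gap ~ "older" molecular profile
print(f"mean absolute error: {np.mean(np.abs(age_gap)):.1f} years")
```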
Achieving that kind of insight wouldn’t be possible without AI – it requires integrating complex, multi-dimensional data into a single, actionable output. Soon, we’ll be integrating hundreds or even thousands of molecular features to better understand biological systems. And as that happens, our ability to truly grasp the complexity of the human body – and the patient in front of us – will dramatically improve.
That’s a big part of where analytical science is heading – not just generating data, but becoming stewards of AI-driven interpretation. The analytical scientist’s role is evolving from technician to strategic contributor in discovery and drug development.
As AI takes on more of the data handling and interpretation burden, how do you see the role of human expertise evolving?
Human expertise is arguably even more important in an AI-enabled analytical workflow than it was before. If you look across history, every major technological shift – from microscopy to sequencing to modern computing – only became transformative when it was paired with deep human insight. AI is no different; its true value appears only when it’s combined with a scientist who knows how to interpret, challenge, and refine what the system is doing.
In other words, the human is still the driver; the AI is the amplifier. If the person behind the wheel has strong intuition and deep expertise, AI will magnify that in a powerful, positive way – accelerating interpretation and revealing patterns you’d never have time to find manually. But if the user lacks that grounding, then AI simply pushes you faster in the wrong direction. So the most important contribution from the human side is scientific intuition, critical thinking, and the ability to discern signal from noise. That’s where the real value emerges – not from replacing scientists, but from elevating the judgment and experience that only a human can bring.
Looking ahead, what do you think still needs to change for AI-derived biomarkers to gain widespread acceptance in pharma and clinical settings?
The pharma industry is, in many ways, paradoxical. We think of it as highly innovative – and scientifically it absolutely is – but culturally and operationally, it’s extremely conservative. That’s largely because everything ultimately touches human health. The cost of error is enormous, so the system has evolved to minimize risk, rather than maximize innovation. As a result, the way we develop, test, approve, and prescribe drugs hasn’t fundamentally changed in 60, 70, even 80 years, despite dramatic technological advances.
The world around pharma, however, is changing very quickly. Regulatory bodies, especially the FDA, are starting to recognize that the system is, as it stands, unsustainable. If we continue doing things the way we always have, healthcare will simply become financially untenable. Drug development is too slow, too expensive, and too inefficient. So there’s huge external pressure – from patients, policymakers, and healthcare systems – to rethink how we discover and validate biomarkers, how we design trials, and how we ultimately approve diagnostics and interventions.
This shift is already underway. The FDA has begun accepting multimarker diagnostic panels, and AI-based tools are increasingly used for imaging and pathology. Over the next few years, that will extend naturally into molecular diagnostics, including AI-derived biomarkers. There will be stumbles – there always are with major technological shifts – but if we’re transparent and willing to learn quickly, it’s a very healthy evolution.
On the analytical science side, the cultural shift has to be just as significant. Historically, scientific training has rewarded going very deep into an extremely narrow topic. But what’s needed now is breadth, not just depth: the ability to understand multiple layers of biology, multiple technological platforms, and multiple analytical approaches. Ironically, those who have been criticized as a “jack-of-all-trades, master of none” are exactly the people who will thrive in an AI-first world.
For AI-derived biomarkers to achieve broad acceptance, two things must change. First, regulators must continue evolving – embracing the reality that multimodal, multianalyte, algorithmic diagnostics are not only scientifically valid but economically necessary. Second, our culture of scientific training must evolve – shifting from producing ultra-specialists to producing versatile thinkers who understand technology, biology, computation, and translation.
When those two shifts come together, AI-enabled biomarker discovery will stop being the outlier and will become the norm.
