From Spectra to Structure: An Interview with Novogaia’s Tess Bevers

Mass spectrometry has long supported natural product discovery, helping researchers map the chemical diversity of organisms ranging from bacteria to plants. Yet even with modern instruments capable of rapidly generating vast numbers of high-quality tandem mass spectra, connecting those signals to the correct molecular structures that produced them remains challenging.

A new generation of machine-learning models is beginning to tackle that challenge directly. Among them is Gaia-01, a transformer-based foundation model developed by Novogaia that aims to predict molecular structures directly from mass spectrometry data. Trained on large collections of spectra, the system converts fragmentation patterns into structural hypotheses that can feed directly into workflows such as dereplication and compound prioritization. More broadly, it reflects a growing shift in computational metabolomics toward predictive models of chemical structure.

Here, Tess Bevers, Novogaia’s CEO, discusses how advances in AI, fungal natural product chemistry, and mass spectrometry are converging to reshape how researchers search for new bioactive molecules.

How are advances in mass spectrometry changing the way researchers approach natural product drug discovery today?

Traditionally, drug discovery from natural sources has involved culturing microorganisms or sourcing materials from plants and other organisms, then screening extracts against a specific bioassay. Structure elucidation, purification, and NMR follow – only to find that the molecule identified may be too complex to optimize. Many natural products are not immediately suitable as drug candidates.

Now we are able to culture the fungus, run the natural sample through a mass spectrometer, and gain an early view of the types of molecules it produces. At the same time, we can collect activity data for the biomarker of interest and determine which active extracts also contain molecules with properties aligned with drug development.

This ensures that we only proceed with purification and structural characterization for compounds with genuine potential for further preclinical development, effectively serving as a prioritization tool. It’s also relatively low-cost, opening up the possibility of screening nature at scale and identifying new molecules for drug development.

This is now possible not just because of better hardware – mass spectrometers have improved significantly in recent years – but also because of better software. With the advent of AI and machine learning models, we can now parse all that data to extract meaningful information.

I’ve previously spoken to scientists at a crossroads, faced with large volumes of mass spec data they didn’t know how to use – there was simply too much for one chemist to analyze. If that data can be processed through a model that interprets mass spectra, it becomes usable. For us, this shifts mass spectrometry from a detection tool for known metabolites to a screening tool for novel chemistry.

That’s the core of our pipeline. Our foundation model, Gaia-01, is designed to infer molecular structure from mass spectrometry – something we are steadily improving. If we can reliably predict structure from a single spectrum, it can plug directly into traditional drug discovery pipelines, including toxicity prediction, property prediction, and dereplication.

What motivated your team to focus on fungi specifically as a source for drug discovery?

The rationale for focusing on fungi, rather than bacteria or plants, comes down to two key factors. First, like humans, fungi are eukaryotic organisms, so they are genetically closer to us than bacteria. As a result, the molecules they produce are often better suited to interacting with other eukaryotic systems, making their chemistry particularly interesting for drug discovery.

Second, fungi represent one of the most chemically diverse biological kingdoms, yet they remain largely unexplored. Screening efforts have historically focused on bacteria and plants for practical reasons, as fungi have been more difficult to work with. However, advances in culturing, handling, and analysis have made them far more accessible.

Could you explain, in a nutshell, how your foundation model works?

In essence, Gaia-01 is a transformer-based foundation model trained at scale on molecular datasets, with additional fine-tuning on paired mass spectrometry and structure data. The data are processed using a two-step architecture, with an encoder and a decoder. At a high level, it takes untargeted tandem mass spectrometry data and, by drawing on large numbers of examples, translates this into the most likely molecular structure for a given spectrum.
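To make the input side of that description concrete, here is a minimal sketch of how a tandem mass spectrum (a list of m/z and intensity pairs) might be turned into integer tokens before reaching an encoder. The interview does not describe Gaia-01's actual preprocessing, so the function name `peaks_to_tokens`, the bin width, and the peak cutoff are all illustrative assumptions.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def peaks_to_tokens(peaks: List[Peak], bin_width: float = 0.1, top_n: int = 3) -> List[int]:
    """Hypothetical preprocessing: keep the top_n most intense peaks and
    map each m/z value to an integer bin index usable as a token."""
    if not peaks:
        return []
    # Keep only the strongest peaks; weak peaks are often noise.
    strongest = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    # Order by m/z so the token sequence is deterministic.
    strongest.sort(key=lambda p: p[0])
    return [int(round(mz / bin_width)) for mz, _ in strongest]


spectrum = [(77.04, 20.0), (105.07, 100.0), (51.02, 5.0), (122.06, 60.0)]
tokens = peaks_to_tokens(spectrum)
print(tokens)
```

A real pipeline would likely also encode intensities and precursor mass; this sketch only shows the general idea of converting peaks into a sequence a transformer can read.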

We benchmark performance using MassSpecGym, a widely used framework in this field. It defines standard inputs and outputs, allowing direct comparison between models, and was developed by a leading academic who is also one of our advisors, Tomas Pluskal.

Based on these benchmarks, Gaia-01 is currently the top-performing model, at least since its publication at the end of October. We remain in close contact with other leading groups, including teams at MIT, and the field is evolving rapidly.

Was there a key breakthrough or “eureka” moment during development?

I’d say one moment stands out. We built the model in two parts – an encoder and a decoder. For the decoder specifically, we developed an autoregressive transformer model – an approach that differs from those previously used in this space and is inspired by advances in the field of language modeling. Applying this kind of architecture to mass spectrometry was unexplored, and the initial results quickly suggested we were seeing something meaningful.
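For readers unfamiliar with autoregressive decoding, the sketch below shows the general idea using greedy selection: the structure is emitted one token at a time, and each choice depends on the tokens already produced. This is not Novogaia's implementation. `toy_scores` is a hypothetical stand-in for a trained decoder (in a real system the scores would also be conditioned on the encoded spectrum), and the SMILES-like token vocabulary is an assumption.

```python
from typing import Callable, Dict, List

END = "<eos>"  # end-of-sequence marker (assumed token name)


def greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    max_len: int = 20,
) -> List[str]:
    """Build a sequence token by token, always taking the highest-scoring
    next token given the prefix generated so far."""
    tokens: List[str] = []
    for _ in range(max_len):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        if best == END:
            break
        tokens.append(best)
    return tokens


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder: always spells 'CCO'."""
    target = ["C", "C", "O"]
    wanted = target[len(prefix)] if len(prefix) < len(target) else END
    return {tok: (1.0 if tok == wanted else 0.0) for tok in ["C", "O", "N", END]}


print("".join(greedy_decode(toy_scores)))
```

Practical systems usually sample or beam-search several candidate structures rather than taking a single greedy path, which is what makes ranked, top-k style evaluation possible.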

With machine learning, you’re constantly building different pipelines and approaches to analyze data, but in the end everything comes down to benchmarking – checking how your model performs against established standards. The key question is whether a model can match or outperform the best existing results.

When we applied this transformer-based decoder, we saw a substantial jump in performance. At the time, the best published model (Diffusion) achieved around 4 percent, whereas our results were 41 percent. That was the moment we thought: this is a huge jump – how is that possible? It suggested that the architecture and large-scale training were fundamentally more effective.
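The interview does not say which metric these percentages refer to. One common way to score structure prediction is top-k accuracy: the fraction of spectra for which the true structure appears among a model's k highest-ranked candidates. The sketch below assumes structures are compared as already-canonicalized strings; the function name and example data are illustrative only.

```python
from typing import Sequence


def top_k_accuracy(
    predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of examples whose true structure is among the first k
    ranked candidates. Assumes identifiers are already canonicalized,
    so plain string equality is a valid comparison."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(predictions, truths) if truth in ranked[:k])
    return hits / len(truths)


# Made-up ranked candidates for four spectra (not real benchmark data).
preds = [["CCO", "CCN"], ["CCC", "CC=O"], ["c1ccccc1"], ["CO"]]
truths = ["CCO", "CC=O", "c1ccccc1", "CCO"]
print(top_k_accuracy(preds, truths, k=1), top_k_accuracy(preds, truths, k=2))
```

Exact-match scoring is strict: a candidate that differs from the truth by one atom counts as a miss, which is why benchmarks often report similarity-based scores alongside it.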

Just before we published, we learnt that another group had reported a model reaching around 36 percent, narrowing the gap slightly. Even so, moving from around 4 percent to around 40 percent represented a major shift, and that was very much a “eureka” moment for us.

It reflects a common pattern in machine learning: progress is often incremental until a new architecture leads to a step change in performance, as seen with systems such as AlphaFold. That said, there is still considerable work to do before this becomes fully reliable – although we are getting closer.

What do you see as the main challenges or limitations – both for your approach and for computational metabolomics more broadly?

I’d say the main challenge is data. The most valuable input for these models is high-resolution mass spec data paired with known molecular structures – where each spectrum can be linked to a defined structure. The more of these paired datasets we have, the better we can train our models.

There has already been a push within the field to generate more of this data, and we’re doing the same at Novogaia, particularly in fungal chemistry. Building these datasets is a key part of improving model performance.

The second area is benchmarking. Most groups rely on MassSpecGym, but the dataset used for evaluation is relatively small. While it provides a strong and stringent benchmark, there is a clear need to expand both the frameworks and the datasets as the field advances. We are therefore developing internal benchmarks and exploring how evaluation can be improved more broadly.

What do you see as the most promising applications of this approach – in drug discovery and beyond?

Mass spectrometry has been around for decades, and there’s a huge amount of data, but using it in this way – as a data-driven, predictive tool – is still an emerging field. What’s exciting is that we’ve been able to build a state-of-the-art model as a relatively small team – just four people working on this, with the kind of computing resources you’d expect from a startup.

That’s also why I find metabolomics such an exciting space. If you look at genomics or proteomics, those areas are extremely crowded, and pushing the state of the art is very difficult. But in metabolomics, applying these AI approaches is still relatively new, which gives smaller teams the opportunity to make meaningful advances.

With technologies like this, progress tends to move fastest in areas where there’s the most funding. So I think we’ll see the biggest advances in pharma. That said, the potential goes far beyond that. We’re already seeing interest from other industries – for example, agriculture, where companies are looking for new biopesticides by screening microbes for molecules with specific activity. There’s also interest from companies working on flavours and fragrances, looking for natural alternatives.

All of these areas benefit from tools that accelerate the discovery of functional molecules from nature. From a business perspective, however, drug discovery remains the strongest model. That is how we apply the technology internally – identifying fungal molecules and screening them against targets to advance candidates into preclinical development.

At its core, this technology is about translating the chemistry of life. Every part of nature contains small molecules, yet much of this space remains poorly understood, even with advances in sequencing. As structure prediction improves, this could help us better understand why evolution has produced certain molecules and what their functions are.

Looking forward, what are the immediate next steps for your team?

There are three main priorities. First, we’re already working on Gaia-02, using more data and a new architecture that allows for end-to-end training. We’ve built a state-of-the-art model with Gaia-01, but there is still significant room for improvement. Our aim is to generate stronger results and publish them, ideally at major AI conferences such as NeurIPS, where presentation is an important marker of robustness and competitiveness.

Second, on the experimental side, we’re focused on generating validated hits – identifying molecules from nature, in our case fungi, that show activity against innate immunity targets. A validated hit is a molecule that binds to the target and is selective, potent, and non-toxic.

And finally, we’re building what we think of as a “lab-in-the-loop” process. We’re generating large amounts of data, but the goal is to make sure that every experiment feeds back into the model. So it becomes a continuous cycle – each screen makes the system better. That feedback loop is something we’re actively working to establish and demonstrate.

We’re still at an early stage with all of this. Predicting molecular structure directly from a mass spectrum remains a difficult problem, but performance continues to improve as these models are able to analyze far more data than a single individual ever could.

This is an active and competitive area, which is a good thing – it creates momentum and a sense of urgency. What the field needs now is better datasets, stronger benchmarks, and more robust evaluation across chemical space to ensure models generalise and performance metrics are reliable.

About the Author(s)

Henry Thomas

Deputy Editor of The Analytical Scientist
