The Dark Metabolome: No Mere Figment?

In the spring of 2024, Martin Giera, Aries Aisporna, Winnie Uritboonthai, and Gary Siuzdak published a paper arguing that in-source fragmentation (ISF) – the fragmentation of analytes during the initial ionization process within the ESI source – accounts for over 70 percent of the peaks observed in typical LC-MS/MS metabolomic datasets.

“This finding disrupts the prevailing assumption that the majority of peaks in mass spectra correspond to unique metabolites,” the authors wrote.

The Analytical Scientist reported on the research in an article titled “The Dark Metabolome: A Figment of Our Fragmentation?” This article, and its provocative title, was cited by a group of researchers, including Yasin El Abiead and Pieter C. Dorrestein, as having “amplified the sensationalism” – as part of their response, published in Nature Metabolism.

In this first installment of our series on the ongoing dark metabolome debate, we speak with El Abiead and Dorrestein about the scientific and professional stakes at play. They explain why they disagree with the conclusions drawn by Giera et al. – and how those conclusions have been amplified in the media – and offer their perspective on what can and cannot be said about the size and significance of the dark metabolome.

Can you comment on the stakes involved in this debate?

Pieter Dorrestein: The Nature Metabolism response paper was motivated by what I heard from junior colleagues, both in the US and internationally: grants and manuscripts focused on the discovery of new metabolites were being rejected, with the rejections citing the discussion in the original ISF paper and follow-up (social) media coverage, including the article in The Analytical Scientist. In other words, this discourse has already had tangible consequences – not only for scientific funding and publication, but also for the career trajectories of emerging researchers in the metabolomics field.

From a scientific standpoint, the framing of that discussion underestimated the scale of molecular discovery still ahead, even in well-studied sample types. This has had the effect of downplaying the importance of untargeted metabolomics as a discovery tool, at a time when expanding the known metabolome is critical for advancing our understanding of biology, health, and disease.

What did the original paper say about the dark metabolome and how were those findings covered in the media?

Dorrestein: Both the idea that the dark metabolome is not real and the overemphasis on in-source fragmentation (ISF) became central talking points in follow-up blogs, article highlights (with titles like "Phantom Metabolites" or "Figment of Our Fragmentation"), and social media discussions. A core issue is that while the original paper may have observed a certain number of ISFs in their 0V, 950K compound library, it led to broad overgeneralizations about all untargeted metabolomics experiments.

ISF itself is not a new phenomenon. It has been well-documented since molecules were first introduced into mass spectrometers. We know that certain molecules fragment more easily than others, and the extent of ISF depends on multiple factors – including chemical structure, instrument design, and source parameters. The 0V 950K library used in that study may be enriched for compounds more prone to ISF – but we can't assess that fully because neither the raw data nor the compound structures are publicly available.

In contrast, studies that examined ISF in real biological samples have reported much lower rates, typically between 2 percent and 36 percent, depending on experimental conditions. Furthermore, many instrument manufacturers offer "soft" ionization settings specifically designed to minimize ISF, which are routinely used in metabolomics workflows.

One of my favorite studies on this topic is: “Evaluation of Lipid In-Source Fragmentation on Different Orbitrap-based Mass Spectrometers,” which clearly shows how source voltages and instrument design affect the extent of ISF. If someone brought me data claiming 70 percent of signals were ISFs, my first recommendation would be to optimize the instrument settings, not to conclude that all metabolomics data are similarly affected.

Also, it's worth noting that intentional fragmentation of all ions – as done in DIA (Data-Independent Acquisition) – can yield 100 percent fragmentation by design. So, the mere presence of fragment ions doesn't reduce confidence in data, nor does it imply misinterpretation unless deconvolution is improperly handled.

Lastly, the suggestion in the paper that the metabolome is less complex than previously thought cannot be supported by the presented results. Without comprehensive structural validation and biological context, such a conclusion is premature and risks misleading the broader interpretation of untargeted metabolomics data.

Yasin El Abiead: I would like to add that although comments and framing in the media certainly emphasized the dismissal of the “dark metabolome” as an analytical artifact, Giera et al. did state: “[..] the ‘dark metabolome’ is largely made up of fragment ions generated during ESI.” They repeated the same statements in another article, based on the same data, in which they showed the coelution of a selection of ISF ions associated with the same standard. These points were amplified rather uncritically in the media – but the claim was indeed made in the original manuscript.

We agree that ISFs exist and that it is important to pay attention to them. The main problem with some of the statements made is that they derive far-reaching biological conclusions from highly concentrated synthetic chemicals measured under specific instrument conditions. Metabolic extracts are highly complex with concentrations spanning more than 10 orders of magnitude. As such, even if we were to ignore the effect of instrument settings on ISFs – which we should not – many of these ISFs would simply be too low to detect in real-world scenarios.
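The dilution argument above can be illustrated with a small sketch. Every number in it is a hypothetical assumption chosen for illustration (the fraction of precursor signal that ends up in a fragment, the detection limit, and the precursor intensities); none is a measured value from the studies discussed.

```python
def fragment_detectable(precursor_intensity, isf_fraction, detection_limit):
    """Return True if the in-source fragment signal would reach the detection limit."""
    return precursor_intensity * isf_fraction >= detection_limit


# Assumption: 1% of the precursor signal appears as a given in-source fragment.
ISF_FRACTION = 0.01
# Assumption: detection limit in arbitrary intensity units.
DETECTION_LIMIT = 5e2

# Hypothetical precursor intensities spanning ten orders of magnitude (1e1 .. 1e10).
precursors = [10.0 ** k for k in range(1, 11)]

# Keep only the precursors whose fragment would be visible at all.
detectable = [
    p for p in precursors
    if fragment_detectable(p, ISF_FRACTION, DETECTION_LIMIT)
]

print(f"{len(detectable)} of {len(precursors)} precursors yield a detectable fragment")
```

Under these assumed values, only the most abundant precursors produce a fragment above the detection limit, which is the sense in which fragments seen for concentrated standards need not appear in a complex extract.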

Finally, it should not be forgotten that while any given molecule will produce multiple ion species, biological samples also contain isobaric molecules that are too similar to resolve with current routine methodologies and species that are too diluted to detect, which should also be counted towards the dark metabolome.

What do the findings from the recently published preprint from Chi et al. mean for the debate?

Dorrestein: Based on my reading, the paper reports that approximately 10 percent of features arise from in-source fragmentation (ISF) – a figure that aligns with the 2–36 percent range reported in other metabolomics studies conducted under properly optimized instrument settings. This is notably different from the 70 percent ISF rate cited in the Giera et al. paper, which was based on a pure compound library.

Importantly, the phrase "our results are consistent with that by Giera et al. (2024)" is preceded by a clear caveat: "If we assume ESI always fragments a small percentage of all analytes..." – indicating that the authors are acknowledging the Giera findings in context, not equating them directly to typical metabolomics workflows. Chi et al. suggest that the discrepancy could stem from the fundamental differences between analyzing complex biological samples and pure standards. In complex samples, many low-abundance ions may go undetected due to limited sensitivity, a constraint that is far less severe with pure compound libraries. This explanation is in addition to differences in instrument settings and optimization.

El Abiead: I agree. However, I want to add that Chi et al. have since published a new version of their preprint in which they have removed the statement on their data being consistent with Giera et al. and clarified that in-source fragments observed for highly concentrated analytical standards in clean solutions cannot be assumed to also be detectable in complex biological samples – an argument they emphasized through reanalysis of 61 public datasets. This goes to show the importance of testing hypotheses on biological data; even educated guesses remain guesses.

Do you think Giera et al.’s work has value as a “wake-up call” to the field to better understand and manage ISF?

Dorrestein: I welcome the ongoing debate around in-source fragmentation (ISF) – awareness of ISF is important – but it's worth noting that this is not a new discussion. The phenomenon has been well documented since the early days of mass spectrometry, and many papers have explored its implications in metabolomics and beyond.

What’s equally important, however, is to focus not only on the problem but also on the solutions. Computational strategies to identify and manage ISFs have been available for at least 15 years. Several robust approaches exist today, including algorithms that leverage peak shape and retention time matching to detect ISFs and, in some cases, even use them to enhance molecular annotations.
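As a rough illustration of the retention time and peak shape idea, here is a minimal sketch. It is not the algorithm of any particular tool: the feature table, the field names (`name`, `mz`, `rt`, `eic`), and the thresholds `rt_tol` and `min_corr` are all hypothetical. It flags a lower-m/z feature as a candidate fragment of a higher-m/z feature when the two co-elute and their extracted ion chromatograms are strongly correlated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def isf_candidates(features, rt_tol=0.02, min_corr=0.95):
    """Return (parent, child) name pairs that co-elute and share peak shape."""
    pairs = []
    for parent in features:
        for child in features:
            if child["mz"] >= parent["mz"]:
                continue  # a fragment must be lighter than its precursor
            if abs(child["rt"] - parent["rt"]) > rt_tol:
                continue  # retention times (minutes) must match
            if pearson(parent["eic"], child["eic"]) >= min_corr:
                pairs.append((parent["name"], child["name"]))
    return pairs


# Hypothetical features; "eic" is an intensity trace sampled across the peak.
features = [
    {"name": "A", "mz": 300.10, "rt": 5.00, "eic": [0, 10, 50, 100, 50, 10, 0]},
    {"name": "B", "mz": 282.09, "rt": 5.01, "eic": [0, 1, 5, 10, 5, 1, 0]},
    {"name": "C", "mz": 150.05, "rt": 5.01, "eic": [0, 100, 60, 30, 10, 5, 0]},
    {"name": "D", "mz": 120.00, "rt": 7.50, "eic": [0, 10, 50, 100, 50, 10, 0]},
]

pairs = isf_candidates(features)
print(pairs)
```

A flagged pair is only a candidate: two distinct molecules can co-elute with similar peak shapes, so real workflows typically add further evidence such as plausible neutral losses or MS/MS support.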

For example, one of my favorite studies on this topic comes from the Siuzdak lab, in which ISFs are intentionally enhanced to improve annotation. In their Analytical Chemistry paper, the authors describe two key aspects of ISF:

That standard metabolomics data often contain insufficient ISFs to significantly aid annotation with algorithms that can find them ("these in-source annotation algorithms are limited by ESI sources that are generally designed to minimize ISF."); and
That instrument settings can be deliberately adjusted to induce ISFs in a controlled way, making them useful for structural annotation.

This work underscores an important point: ISFs can be beneficial when harnessed appropriately. More importantly, simply counting ISF-related ions does not provide insight into how many distinct molecules are present in a sample, nor does it inform us about how many of those molecules are known versus unknown to say anything about the dark metabolome. That distinction is critical when interpreting the complexity of untargeted metabolomics data.

El Abiead: I agree with Pieter. Also, I think there’s value in reminding the metabolomics community that MS/MS scans can be used creatively to learn more about the measured sample than just spectral library matching. And Giera et al. certainly did that.

Do you consider this debate “closed” – or do we need further research? Do we need a “consensus study?”

Dorrestein: While in one sense the topic of in-source fragmentation (ISF) is "closed" – we know ISFs are present in metabolomics data – it must remain an "open" discussion, especially for those learning the field. It's essential to understand how ISFs affect data structure and downstream analysis.

ISFs are not inherently problematic; in fact, they can be beneficial. As demonstrated in the wonderful Analytical Chemistry paper from the Siuzdak lab mentioned earlier, ISFs can aid in annotating substructures – improving annotation confidence. They also offer value in MS1-only datasets, including imaging mass spectrometry, and fragments are intentionally generated in data-independent acquisition (DIA) methods, which fragment all ions entering the mass spectrometer.

That said, many open questions remain. For example:

Which molecular structures are more prone to ISF?
Can solvent additives be used to suppress or enhance ISF, depending on analytical goals?
Can ionization sources be designed to better control or reduce ISF?
How do different molecules behave? Some fragment easily, while others are remarkably stable. What governs this distinction?

One could even imagine estimating bond strengths based on the energy required to generate specific ISFs across a series of compounds. A "consensus study" across labs – investigating how instrument types and settings influence ISF behavior – would be a valuable contribution to the field.

El Abiead: Further studies, such as those mentioned by Pieter, would indeed be valuable. We as a community also have to be aware that in-source fragments will always remain a subset of the unknown ions in our data that may or may not belong to unknown molecules – something that has been known for decades. While we and others will continue to develop methodologies to control and utilize them, they must be considered, especially when drawing conclusions about things like the number of detected metabolites based on observed signals without accounting for co-migrating ions first.

Given the current evidence, what can we say about the existence, size, and significance of the dark metabolome?

Dorrestein: I love this question – it’s both scientific and deeply philosophical. What is the scale of the metabolome, and how much of it do we really know?

To start with what we do know:

KEGG (covering many organisms) and EcoCyc (specific to E. coli) together contain around 20,000 structures.
The Human Metabolome Database (HMDB) lists ~30,000 detected metabolites in humans, plus ~220,000 predicted ones.
ChEBI (Chemical Entities of Biological Interest) contains about 62,000 entries.
Natural products databases collectively hold around 400,000 structures, representing molecules derived from nature's biochemical diversity.

These databases give a sense of the current known metabolome – spanning humans, microbes, and broader ecosystems.

In untargeted metabolomics, especially using LC-MS/MS, the average annotation rate is about 14 percent. However, this percentage only refers to the molecules that are detectable under specific conditions and matchable to known references. In other words, 86 percent of signals in a typical dataset remain unannotated – what we often refer to as the "dark metabolome."

Even within the human metabolome alone, there's immense complexity.

There are common metabolites found in nearly everyone – core components of primary metabolism.
But beyond that are less frequently observed molecules that vary across populations, environments, diets, and sample types (e.g., tissues vs. biofluids).
While most primary metabolites are broadly detectable, molecules derived from the microbiome, diet, medications, environmental exposures, and their co-metabolism are often more transient or individual-specific.

Critically, a person's metabolism is shaped not just by the ~20,000–25,000 genes encoded in the human genome, but also by the 1–3 million microbial genes they carry. Across the global population, the collective microbial gene pool exceeds 100 million genes, massively expanding the biosynthetic potential and metabolic diversity we can observe.

Untargeted metabolomics, when performed under optimized conditions, detects all ionizable molecules present in sufficient quantity – but that’s just the tip of the iceberg. With advancing instrumentation sensitivity, resolution, and computational annotation tools, I believe we are on the verge of a dramatic expansion.

In the next 5–10 years, I predict that the known human metabolome will increase by 10–20x. And when considering global diversity, including different diets, environments, microbiomes, and chemistries, the number of molecules detected through untargeted metabolomics in humans may well grow orders of magnitude beyond that.

We’re only beginning to map the true scope of biochemical diversity – and the exciting part is that much of it still awaits discovery.

El Abiead: I would like to add that the answer to this question depends on the definition of the dark metabolome. The one that I find most interesting deals with the question of how many metabolites remain to be discovered rather than just ions. What I like about this framing is that it includes metabolites that we might not even see as distinct signals in our current analysis.

Even the chemicals in our blood (which is considered a rather simple sample type in metabolomics) span ~10 orders of magnitude, possibly more, and our current instrumentation is not capable of detecting such a range. We can see this, for example, in targeted metabolomics experiments performed in the exposomics field, where exposure chemicals are routinely observed via highly optimized analytical methods. However, we less frequently pick them up in our untargeted discovery experiments. Other metabolites, such as those produced by our microbiome, cannot be targeted as easily, because the majority are yet to be discovered and will require further development of analytical methodology, which would again increase the size of our dark metabolome.

What do you think newcomers to metabolomics should understand about ISF and the dark metabolome when entering the field?

Dorrestein: Newcomers to metabolomics should understand a few key points about in-source fragmentation (ISF) and the dark metabolome:

ISF is tunable: instrument settings can be optimized to either minimize or enhance ISF, depending on the goals of the experiment. It’s not a fixed property of the system but something that can – and should – be controlled.
Counting ISFs does not reveal the number of unknowns (dark metabolome): The presence of ISF-related ions does not tell you how many molecules in a sample are known versus unknown. Before making such assessments, one must first deconvolute all ion forms – including isotopes, adducts, multimers, and fragments – to estimate the actual number of distinct molecular entities detected.
Context is everything: the degree of ISF in a dataset cannot be assumed universally. It is highly experiment- and instrument-dependent, and should be evaluated specifically for each dataset using appropriate methods.
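The second point can be made concrete with a small sketch. It assumes a hypothetical feature table in which an upstream grouping step (not shown) has already assigned each feature to an ion family and labelled its ion form; the field names and values are illustrative only. The sketch shows that the ISF count and the known-versus-unknown split are separate quantities.

```python
from collections import defaultdict

# Hypothetical feature table after ion-form grouping (assumed, not real data).
features = [
    {"id": 1, "family": "F1", "form": "[M+H]+", "annotated": True},
    {"id": 2, "family": "F1", "form": "isotope", "annotated": False},
    {"id": 3, "family": "F1", "form": "[M+Na]+", "annotated": False},
    {"id": 4, "family": "F1", "form": "in-source fragment", "annotated": False},
    {"id": 5, "family": "F2", "form": "[M+H]+", "annotated": False},
    {"id": 6, "family": "F2", "form": "[2M+H]+", "annotated": False},
    {"id": 7, "family": "F3", "form": "[M+H]+", "annotated": False},
]


def summarize(features):
    """Collapse ion forms into families and count known vs unknown entities."""
    families = defaultdict(list)
    for feature in features:
        families[feature["family"]].append(feature)

    # Assumption: a family counts as known if any member matched a reference.
    known = sum(
        1 for members in families.values()
        if any(m["annotated"] for m in members)
    )
    isf = sum(1 for f in features if f["form"] == "in-source fragment")
    return {
        "features": len(features),
        "isf_features": isf,
        "entities": len(families),
        "known_entities": known,
        "unknown_entities": len(families) - known,
    }


summary = summarize(features)
print(summary)
```

In this toy table, one of seven features is an in-source fragment, yet two of the three distinct molecular entities remain unannotated. The fragment count alone says nothing about that split.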

Avoiding both underestimation and overestimation of ISF’s role requires a nuanced understanding of experimental design, source settings, and analytical context. Rather than applying generalizations, each dataset should be critically evaluated within the framework of its acquisition conditions and biological goals.

Pieter C. Dorrestein is a Professor at Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, USA; Director, Collaborative Mass Spectrometry Innovation Center; and Co-Director, Institute for Metabolomics Medicine. Yasin El Abiead is a Postdoctoral Researcher, also at Skaggs School of Pharmacy and Pharmaceutical Sciences.

About the Author(s)

James Strachan

Over the course of my Biomedical Sciences degree it dawned on me that my goal of becoming a scientist didn’t quite mesh with my lack of affinity for lab work. Thinking on my decision to pursue biology rather than English at age 15 – despite an aptitude for the latter – I realized that science writing was a way to combine what I loved with what I was good at. From there I set out to gather as much freelancing experience as I could, spending 2 years developing scientific content for International Innovation, before completing an MSc in Science Communication. After gaining invaluable experience in supporting the communications efforts of CERN and IN-PART, I joined Texere – where I am focused on producing consistently engaging, cutting-edge and innovative content for our specialist audiences around the world.
