Following The Analytical Scientist’s interviews with Pieter Dorrestein and Yasin El Abiead, and responses from Martin Giera and Gary Siuzdak, Shuzhao Li offers a clarifying perspective on the role of in-source fragmentation in untargeted metabolomics. For Li – whose recent preprint surveys LC-MS data from human plasma and serum – the real story is less about conflict and more about context.
In his view, there is no fundamental disagreement between the two “camps”; rather, a miscommunication has arisen from conflating datasets derived from synthetic chemical standards with those from complex biological samples.
Here, Li shares his perspective on how in-source fragmentation (ISF) fits into the broader metabolomics landscape, why newcomers should be excited rather than discouraged, and how thoughtful methodology will illuminate the path forward.
Could you share your perspective on how this disagreement arose and what the stakes are?
I don't see a fundamental disagreement between the two papers, Giera et al. (2024) and El Abiead et al. (2025). The latter paper found a lot of ISFs too – perhaps not as dramatic as the 70 percent from the first paper, but El Abiead et al. showed in their Figure 1a about 5,000 protonated compounds (out of 24,419) with two fragments and about 6,000 compounds with one fragment. We can say that ISFs are common in these data from chemical standards.
As someone who pays little attention to media (I have no Twitter, X or Bluesky account), I suspect that the misinterpretation occurred outside the Giera et al. paper. The conclusion in the Giera et al. paper was based on chemical standards in the METLIN database – a cautionary note on data interpretation, but not directly on biological data.
A source of confusion is that the “dark matter” of the “dark metabolome” can refer to MS/MS data or MS (that is, MS1) data. Let's make a distinction here: both MS/MS and MS data can be generated either on chemical standards or on biological samples. The majority of metabolomics data today is LC-MS data from biological samples. The MS/MS spectra of chemical standards are mainly used for metabolite annotation.
The Giera et al. paper used MS/MS data at 0 eV to mimic MS1 data. The El Abiead et al. paper took an MSn dataset and analyzed the MS1 subset. But both focused on data from chemical standards, not biological data. The latter did include two biological datasets to illustrate their point about discovering new metabolites.
One cannot simply extrapolate statistics from chemical standards to biological data. The nature of the data differs greatly. For example, LC-MS data from one biological sample can have more than 1,000 spectra, and each spectrum can have more than 1,000 mass peaks – a very different level of complexity from a spectrum of a chemical standard. Feature detection in LC-MS typically requires signals in consecutive spectra that form an elution peak, and therefore filters out many mass peaks. Besides this filtering effect, the many molecular species in biological samples can suppress the low signals of ISFs. El Abiead et al. suggested this last point, but we show it directly on biological data.
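The filtering effect of feature detection can be illustrated with a minimal Python sketch. It is not the method used in Li's preprint or in either paper; the function name `persistent_mz_values`, the 5 ppm tolerance, and the three-scan minimum are assumptions chosen only to show why a mass peak that appears in a single spectrum rarely becomes an LC-MS feature.

```python
from typing import List, Tuple

# One spectrum (scan) is a list of (m/z, intensity) pairs.
Scan = List[Tuple[float, float]]


def persistent_mz_values(scans: List[Scan], ppm: float = 5.0,
                         min_scans: int = 3) -> List[float]:
    """Return m/z values observed in at least `min_scans` consecutive scans.

    Illustrative only: real feature detection also fits peak shapes,
    handles noise, and merges traces. Tolerances here are assumptions.
    """
    kept: List[float] = []
    active: List[Tuple[float, int]] = []  # (reference m/z, run length)
    for scan in scans:
        next_active: List[Tuple[float, int]] = []
        used = set()  # indices of peaks in this scan that extend a trace
        for ref_mz, run in active:
            tol = ref_mz * ppm * 1e-6
            hits = [i for i, (mz, _) in enumerate(scan)
                    if abs(mz - ref_mz) <= tol]
            if hits:
                used.update(hits)
                next_active.append((ref_mz, run + 1))
            elif run >= min_scans:
                kept.append(ref_mz)  # trace ended, but it was long enough
        # Peaks that did not extend any trace start new candidate traces.
        for i, (mz, _) in enumerate(scan):
            if i not in used:
                next_active.append((mz, 1))
        active = next_active
    kept.extend(ref_mz for ref_mz, run in active if run >= min_scans)
    return sorted(kept)


# Hypothetical data: only the peak near m/z 100 persists across scans.
example_scans = [
    [(100.0000, 1.0e4), (250.1000, 500.0)],
    [(100.0002, 2.0e4)],
    [(100.0001, 1.5e4), (300.2000, 400.0)],
    [(400.3000, 300.0)],
]
result = persistent_mz_values(example_scans)
```

In this toy input, the peaks at m/z 250.1, 300.2, and 400.3 each appear in one scan only and are discarded, while the trace near m/z 100 survives. Low-intensity fragment peaks that surface only sporadically would be removed in the same way.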
What do your findings suggest for the debate around the existence, size, and significance of the dark metabolome?
We only surveyed human plasma/serum metabolomic data, where a large number of compounds are not identified, although that number depends on detection frequency. The more abundant features are explainable by isotopic and adduct patterns. ISFs account for much less than 10 percent of features. This suggests that the overwhelming majority of LC-MS metabolomic features from human blood samples are real compounds.
The unidentified or unknown features are detected less frequently, and therefore reflect variation across human populations. Part of the “dark metabolome” is the exposome: not biological metabolites per se, but compounds from environmental factors. I believe these data are critical to fully understand human health and disease. They are part of the biochemistry that is not coded in the genome.
The size of the metabolome needs further analysis. Aligning data across labs is a difficult problem. We reported our work on a consensus serum metabolome at recent conferences. That preprint is now released and addresses this question.
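As a rough illustration of how features can be "explained" by isotopic and adduct patterns, the following Python sketch labels co-eluting features whose m/z difference matches a known mass shift. It is a simplified assumption-laden example, not the analysis released in Li's Jupyter notebooks: the function name `explain_features`, the tolerances, and the feature list are hypothetical, while the mass differences are standard monoisotopic values.

```python
from typing import Dict, List, Tuple

# Standard mass differences (Da) relative to a reference ion.
KNOWN_DELTAS = {
    "13C isotope": 1.003355,                 # 13C minus 12C
    "[M+Na]+ vs [M+H]+": 21.981944,          # sodium adduct vs protonated ion
    "water loss (candidate ISF)": -18.010565,  # neutral loss of H2O
}


def explain_features(features: List[Tuple[float, float]],
                     mz_tol: float = 0.002,
                     rt_tol: float = 2.0) -> Dict[int, Tuple[str, int]]:
    """Map feature index -> (relation label, index of the reference feature).

    `features` holds (m/z, retention time in seconds). Two features are
    related only if they co-elute within `rt_tol` and their m/z difference
    matches a known delta within `mz_tol`. Both tolerances are assumptions.
    """
    labels: Dict[int, Tuple[str, int]] = {}
    for i, (mz_i, rt_i) in enumerate(features):
        for j, (mz_j, rt_j) in enumerate(features):
            if i == j or abs(rt_i - rt_j) > rt_tol:
                continue
            delta = mz_j - mz_i
            for name, expected in KNOWN_DELTAS.items():
                if abs(delta - expected) <= mz_tol:
                    labels.setdefault(j, (name, i))
    return labels


# Hypothetical features: a protonated hexose-like ion and related signals.
example_features = [
    (181.0707, 60.0),   # reference [M+H]+
    (182.0741, 60.5),   # 13C isotope of the reference
    (203.0526, 60.2),   # sodium adduct of the same compound
    (163.0601, 60.1),   # water loss, a possible in-source fragment
    (300.1234, 200.0),  # unrelated feature at a different retention time
]
labels = explain_features(example_features)
isf_fraction = sum(
    1 for name, _ in labels.values() if "ISF" in name
) / len(example_features)
```

Counting labels of each type over a real feature table would give the kind of breakdown discussed above, in which isotopes and adducts explain many features and ISFs a small share. A matching mass difference is only a candidate assignment and would still need validation.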
Do you consider this debate “closed” – or do we need further research?
Our analysis is on LC-MS data, and we have a good explanation of most features. There is room for improvement, but in general, I think there is no more mystery. The data collection included common methods on Orbitrap and TOF instruments. Our full analysis is released as Jupyter notebooks, and people are welcome to replicate it.
The MS/MS data have different problems. But as El Abiead et al. pointed out, computational tools are getting better and scientists always need to validate their findings.
ISFs are an interesting phenomenon and Prof. Siuzdak's group have made some smart use of them to aid metabolite identification. I don't think they are a serious concern in biological data.
What do you think newcomers to metabolomics should understand about ISF and the dark metabolome when entering the field?
They should be excited! It is very rare to enter a field with such a vast amount still unknown!
We were all newcomers to this exciting field. Tools and standards are still evolving. We have made a few recommendations in our preprint. If I have to add something here, it would be: a) don't believe popular media, and b) interpret data in their specific context.