As part of our ongoing series of articles exploring the size and significance of the dark metabolome, the role of in-source fragmentation in LC–MS/MS data sets, and what it all means for untargeted metabolomics, we spoke with Oliver Jones, Professor of Analytical Chemistry at RMIT University, Melbourne – and the first to formally define the “dark metabolome.” In this conversation, he reflects on the origins and evolution of the term, considers recent debates over artifacts and data interpretation, and looks ahead to the future of metabolomics – as the field shifts from generating data to ensuring its insights deliver tangible benefits to society.
You played an important role in the development of the term “dark metabolome”...
A few people had used the term in papers before. For instance, “dark matter in metabolomics” appeared in 2015, closely followed by “metabolic dark matter” in a 2016 paper I was part of. A 2018 NMR paper used the term “dark metabolome” but did not define it. I believe I was the first to formally define the term in 2018, as: “All the metabolites present in a system that are either not extracted or not seen using standard analytical methods or are lost/transformed during extraction.”
Do you think the way the term has been used has evolved since 2018?
Yes, I think so. Early on, people mostly thought of compounds whose mass spectra were visible but which couldn’t be identified. But there are other possibilities: extraction procedures can transform one metabolite into another, some compounds may not be extracted at all, and light-sensitive metabolites might degrade before analysis. I think we’ve become more careful about referring to a peak as an “unidentified metabolite.”
What are your thoughts on the recent debates over in-source fragmentation in untargeted metabolomics?
When Gary Siuzdak and Martin Giera recently talked about artifacts in LC–MS data and in-source fragmentation, it got a lot of attention. Their core message, I think, is that you have to be very careful about what you call a biomarker and make sure your data quality and QA/QC are really sound – and I agree with that. But if they meant to say that this is a much bigger issue than people realize, I’m not sure I agree.
As I said, I think we’ve become more aware of this problem in recent years. We all understand that when you do mass spectrometry, you have to do a lot of quality control to check what you’ve got and confirm its identity. You can find thousands of features, but they don’t all equal metabolites – a lot of them are just dimers, adducts, breakdown products, or leftovers from your mobile phase.
So perhaps unidentified metabolites aren’t quite as numerous as some people suggest; but that doesn’t mean their numbers are negligible – or a mere figment of the imagination – either.
It is true that metabolomics as a field has promised a lot, but we still don’t have many robust biomarkers – diagnostic biomarkers of disease, or markers of environmental pollution, for instance. But we have to be careful not to conflate those frustrations with the issue of in-source fragmentation and so paint an overly negative picture of the metabolomics field as a whole.
Given what we currently know about the dark metabolome, what’s your view on its size and significance?
Analytical chemists love talking about “unknown unknowns” – I’ve seen Donald Rumsfeld’s line on many conference slides over the years. It’s intriguing that we detect features we can’t identify, but I don’t think there’s an enormous pool of truly missing metabolites. Some may simply be misclassified, and no single analytical method can capture everything – which is why multiple approaches are essential.
I’d say the brutally honest answer is that we still don’t know. The area hasn’t been explored in depth, and too often unidentified peaks have just been labelled “unknown 1” or “unknown 2” and then set aside. I’ve done that myself: if a peak looked important, I’d try to identify it; if not, I’d move on. We need to be stricter about what gets published, ensuring enough samples and replication to prove that any “unknown” is real and meaningful.
Of course, some metabolites may never be seen – they degrade before analysis or during extraction. In some cases, one compound may even be converted into another, leading to misleading results; for example, arginine can be converted into ornithine during chemical derivatization.
So, my view is that the dark metabolome deserves more attention. But we’re still far from knowing how big it really is.
Do you think this debate is a sign of healthy scrutiny in the field, or has it gone too far?
Overall, scrutiny is a good thing. People should be able to ask: Are you sure about this? How did you check it? Does it stand up to detailed analysis? If you’re confident in your work, you should welcome that kind of examination. After all, we want metabolomics research to be used for beneficial purposes – in medicine, environmental monitoring, food safety, and so on – so the data must be reliable.
What people object to is unfair scrutiny. Broad claims that all metabolomics data is questionable aren’t constructive. Debate that encourages careful thinking is healthy; undermining the credibility of an entire discipline is not.
Are there any big trends in the metabolomics field more broadly that you find especially exciting?
I’m not sure there’s one single new area, but there’s certainly a lot happening. There’s some really nice medical work coming out – particularly Darren Creek’s work on malaria metabolism, which I think is really valuable – as are David Beale’s efforts to link environmental metabolomics with policy.
What I find most interesting is the growing discussion about how we can get metabolomics out of the lab and deliver something useful to society. For a number of years at metabolomics conferences, people would present studies saying, “Look, we found XYZ,” and it was good science, a nice paper – but the question remained: how does that benefit the public?
My own area of interest is environmental metabolomics – can we use it as an early warning system of harm from contamination, or can we identify markers of specific pollutants? Mark Viant at the University of Birmingham, UK, has done a lot of work engaging with the environmental and chemical industries, asking: what do you need for metabolomics to be part of your safety assessments? How can we use environmental metabolomics to improve chemical safety, and what’s holding it back?
The same kinds of conversations are happening in medical metabolomics: why don’t we have more biomarkers of health conditions that can improve patient safety? I think the field is shifting from the data-generation phase to the “how do we ensure this benefits society?” phase. That’s encouraging – and it’s what will help metabolomics become more like proteomics or genomics, which are already more widely applied.
How far behind would you say metabolomics is compared to proteomics? Are there lessons metabolomics can take from the proteomics world?
I think proteomics is sometimes seen as metabolomics’ big brother – further along the track in terms of developing as a science and moving out of the lab and into society. There are definitely lessons we can take, especially around quality control and standardized methodologies.
Metabolomics probably also needs to pick up the pace in terms of sample numbers. In proteomics they established MIAPE – Minimum Information About a Proteomics Experiment – which set out community-agreed standards for what constitutes a quality experiment. There was a similar attempt in metabolomics – the Metabolomics Standards Initiative – that I was part of, but it never really took off.
Now, if you want to publish in Metabolomics, the journal requires that you deposit your data in a public repository, which is good practice. But I’m not sure whether this is checked, and there’s still a lot of variation in what counts as a metabolomics experiment – sample size, confidence in metabolite identification, whether you relied only on a library search or also used standards, whether you analyzed the raw spectra, and so on. There are different levels of identification, but you’re only required to report what level you achieved – you don’t have to go all the way. So, yes, there’s still a bit to learn and to do, but metabolomics is getting there, and I’m really looking forward to seeing what’s next.
Previous Articles in the Series
Pieter Dorrestein and Yasin El Abiead: The Dark Metabolome: No Mere Figment?
Martin Giera and Gary Siuzdak: The Dark Metabolome Debate Continues
Shuzhao Li: A Call for Context
Gary Patti: Metabolomics Is Not in Crisis
Jan-Christoph Wolf: Does In-Source Fragmentation Require a Soft Touch?