A Crisis in Metabolomics?

A recent perspective paper brings together leading figures in the metabolomics field to highlight a growing concern: poor metabolite identification is allowing implausible compound assignments, overconfident annotations, and insufficiently validated results to enter the literature – and be reused as if they were established fact.

The rapid rise of LC-MS-based metabolomics has enabled researchers to generate vast, complex datasets, opening new avenues for discovery. But it has also exposed weaknesses in how those data are interpreted. According to the authors, the issue lies not with the technology itself, but with how results are assigned, reported, and increasingly recycled through databases, software tools, and AI-driven analyses.

Here, Ian Wilson, Visiting Professor at Imperial College London and the University of Liverpool and a co-author of the paper, discusses the challenges of metabolite identification, data validation, and the clinical integration of metabolomics.

When did you first become aware that there might be a problem with MS and LC-MS-based metabolomics?

It became clear as metabolomics grew in popularity, particularly with the wider availability of LC-MS. The ease of use of this technology made the field much more accessible, which brought in many researchers without a strong background in metabolomics. While this has expanded the field, it has also led to challenges in how data are interpreted.

One of the main issues is that LC-MS can generate large amounts of data very quickly. Less experienced users may assume that a database match is sufficient to identify a metabolite, but this is not always the case. Databases such as the HMDB (Human Metabolome Database) contain a huge number of compounds – not all of which are true human metabolites – so a mass match alone does not confirm the identity of a molecule in a sample.
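To make the point about mass matches concrete, here is a minimal sketch of an accurate-mass lookup. The four-entry `TOY_DB`, the function name `candidates_for_mz`, and the 5 ppm tolerance are all assumptions chosen for illustration; they do not come from the interview or from any real database API. The monoisotopic masses are the standard values for C6H13NO2 (leucine, isoleucine, norleucine) and C4H9N3O2 (creatine).

```python
# Illustrative only: a toy "database" and a hypothetical helper function.
PROTON_MASS = 1.007276  # Da, used to convert an [M+H]+ m/z to a neutral mass

# Monoisotopic neutral masses in Da (toy subset, not a real database export)
TOY_DB = {
    "L-leucine": 131.09463,     # C6H13NO2
    "L-isoleucine": 131.09463,  # C6H13NO2, isomer of leucine
    "norleucine": 131.09463,    # C6H13NO2, another isomer
    "creatine": 131.06948,      # C4H9N3O2, same nominal mass, different formula
}


def candidates_for_mz(mz, tolerance_ppm=5.0, db=TOY_DB):
    """Return every database entry whose mass matches an [M+H]+ ion within tolerance."""
    neutral = mz - PROTON_MASS
    hits = []
    for name, mass in db.items():
        error_ppm = (mass - neutral) / neutral * 1e6
        if abs(error_ppm) <= tolerance_ppm:
            hits.append(name)
    return sorted(hits)


hits = candidates_for_mz(132.10191)
print(hits)  # three isomers match; the mass alone cannot tell them apart
```

In this sketch the mass tolerance is tight enough to exclude creatine, yet the lookup still returns three isomeric candidates for a single feature. Choosing between them would need further evidence, such as retention time and fragmentation data compared with reference standards.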

Unlike genomics or proteomics, where identification is more directly linked to known sequences, the metabolome is influenced by many factors, including environment, diet, microbiome, and medication. This makes interpretation more complex and increases the risk of misidentification.

Concerns grew further when studies began reporting very large numbers of identified metabolites without appropriate validation. In practice, while thousands of features may be detected, only a small proportion can be confidently identified, and this requires additional steps. Ideally, metabolite identities should be reported at one of the four levels described by the Metabolomics Standards Initiative (MSI) of the Metabolomics Society, the most confident of which requires comparison with reference standards, but this is often not done.
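As a rough illustration of reporting against the four MSI levels, the sketch below maps the evidence held for a feature to a level. It is a deliberate simplification: the `Evidence` fields and the `msi_level` function are hypothetical names invented here, and the published MSI criteria are more detailed than three yes/no flags.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the evidence held for one detected feature."""
    # Matched to an authentic reference standard on at least two orthogonal
    # properties (for example retention time and MS/MS) under the same conditions
    matched_to_authentic_standard: bool = False
    # Spectral or physicochemical match to a library, with no standard run in-house
    library_spectral_match: bool = False
    # Evidence supports a compound class only, not a specific structure
    class_level_evidence: bool = False


def msi_level(evidence: Evidence) -> int:
    """Return a simplified MSI level, where 1 is the most confident and 4 is unknown."""
    if evidence.matched_to_authentic_standard:
        return 1  # identified compound
    if evidence.library_spectral_match:
        return 2  # putatively annotated compound
    if evidence.class_level_evidence:
        return 3  # putatively characterized compound class
    return 4      # unknown compound


print(msi_level(Evidence(library_spectral_match=True)))
```

Recording the level alongside each reported metabolite makes it visible to readers which assignments were confirmed against standards and which are annotations only.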

Overall, the issue is not so much with the technology itself, but rather how it is used. As the field expands, there is a need for greater awareness of best practices, including careful validation of results, to ensure that findings are accurate and clinically meaningful.

What does this mean for the field – and for what comes downstream of these metabolomics studies?

It means there is a real risk of incorrect findings entering the scientific record and being treated as fact. Misidentified metabolites can lead to flawed biological interpretations, and once these are published, they can influence future studies, databases, and even clinical hypotheses.

The main concern is that errors are not always obvious. A metabolite may appear plausible, especially if similar findings have already been reported. But if those earlier studies were also incorrect, the problem becomes self-reinforcing. Over time, this can distort our understanding of biology and disease. You recently discussed a very good example of the problem with Jeremy Nicholson: the misidentification of two common urinary metabolites, where a few minutes' search of the biochemical literature by the authors would have highlighted the problem.

For downstream applications – such as biomarker discovery or clinical research – this has important implications. If a metabolite is wrongly identified, it may be used to build hypotheses, guide experiments, or even inform clinical decisions, despite not being biologically relevant.

A key issue is the gap between detecting signals and confirming their identity. While modern platforms can generate large datasets, turning that data into reliable information requires careful validation. This includes checking whether findings are biologically plausible and confirming identities using reference standards where possible. And if the standards are not available, and the metabolite is essential for the hypothesis, well, you are into isolation and full characterization territory.

The pressure to publish quickly can make this step less rigorous, increasing the risk of making one of these “never errors” as Jeremy Nicholson called them. As a result, the field faces a challenge not in generating data, but in ensuring that the conclusions drawn from that data are accurate.

Overall, this highlights the need for stronger validation practices and greater awareness across the field. Improving how results are checked and interpreted will be important to ensure that metabolomics findings are reliable and useful for research and clinical applications.

Is untargeted metabolomics in crisis? An existential threat?

It is not yet a crisis, but it could be heading there – particularly if errors continue to be propagated and amplified through, for example, AI and automated data analysis.

The main concern is how results are interpreted and reported. In some cases, studies make strong biological claims without providing the underlying data needed to confirm that the identified metabolites are actually present. Without proper validation, these findings can be misleading.

There is growing effort within the field to address this. Researchers are working with journals to establish clearer standards, including the requirement to provide supporting data for metabolite identification and to make these data accessible. This is especially important when conclusions and hypotheses are based on a small number of key metabolites.

The important lesson is that detecting a signal, and getting a database hit, is not the same as identifying a compound. Proper identification requires detailed analysis and, ideally, confirmation with reference standards. Without this, there is a risk that incorrect assignments become accepted facts, and are then reused in subsequent studies.

The situation is manageable, but it requires greater transparency, better validation practices, more careful interpretation of results, and much better reviewing and oversight by journal editors. If these steps are taken, the field can continue to grow without compromising the quality of its findings.

Do you think stronger adherence to best practices can act as a form of quality control and help ensure more reliable metabolomics findings?

That is certainly the aim, and strong peer review in reputable journals does provide an important level of quality control. Many flawed papers are filtered out through this process and never reach publication in well-established journals.

However, the challenge is that there are now many journals with varying standards. If researchers are under pressure to publish, work that is rejected by more rigorous journals can still be published elsewhere with less scrutiny. This means that peer review alone cannot fully prevent poor-quality data from entering the literature.

Science has always had this issue to some extent, but the scale has increased as the number of researchers and publications has grown. At the same time, the volume of data generated by modern technologies makes careful interpretation ever more difficult.

There are some positive developments. Improved standards, better data transparency, and the potential use of AI for preliminary screening could all help strengthen quality control, even though, as I have said, AI also poses a threat by facilitating the widespread dissemination of rubbish. But these tools still depend on expert input and proper validation.

Overall, strong journals and experienced reviewers remain essential, but ensuring data quality also depends on researchers applying best practices and critically evaluating their own findings before attempting publication.

It seems that genomics – and to some extent proteomics – are further ahead in terms of integration and standardization. Why do you think metabolomics hasn’t reached that same level yet?

It is progressing, but more slowly, partly because it is more complex and less integrated into, for example, clinical practice. Unlike genomics, metabolomics reflects a wide range of influences – including environment, diet, and medication – which makes the data harder to interpret and standardize, especially for use clinically.

I think that one of the key challenges is to bring the metabolomics and clinical chemistry communities together. My experience is that they still operate somewhat separately, with limited integration between the two. As a result, metabolomics has not yet been fully accepted as something that might be useful for routine clinical workflows.

There have been efforts to bridge this gap. For example, studies such as the HUSERMET (Human Serum Metabolome) project specifically included a workstream to compare metabolomics data with traditional clinical chemistry results. This showed that while metabolomics provided detailed metabolite information, this could also be aligned with conventional clinical chemistry. Recent re-analysis of the HUSERMET data has shown that clinical chemistry and metabolomics can each provide similar statistical outcomes, but that combining the two approaches gave the most useful results.

But, whilst there are obvious problems with metabolite identification, looking on the bright side, the technology is now capable of supporting a wide range of applications, from disease profiling to environmental exposure studies. The field is currently in transition from a specialized research area towards broader, real-world use – and such transitions are always difficult. How well this progresses will likely depend on better integration with existing clinical tools, improved standardization, and clearer demonstration of how metabolomics adds value in practice.

About the Author(s)

Jessica Allerton

Associate Editor, The Analytical Scientist

James Strachan

Over the course of my Biomedical Sciences degree it dawned on me that my goal of becoming a scientist didn’t quite mesh with my lack of affinity for lab work. Thinking on my decision to pursue biology rather than English at age 15 – despite an aptitude for the latter – I realized that science writing was a way to combine what I loved with what I was good at. From there I set out to gather as much freelancing experience as I could, spending 2 years developing scientific content for International Innovation, before completing an MSc in Science Communication. After gaining invaluable experience in supporting the communications efforts of CERN and IN-PART, I joined Texere – where I am focused on producing consistently engaging, cutting-edge and innovative content for our specialist audiences around the world.
