Black-Box Data Analysis for Spatial Metabolomics
Automated and reliable tools for spatially annotating metabolites from imaging mass spectrometry data are essential.
Andrew Palmer, Theodore Alexandrov
In recent years, metabolomics has been recognized as a field of major importance that promises to advance our understanding of cell biology, physiology, and medicine. Metabolites are the ‘small cogs’ in the cellular machinery: small molecules that are ingested, altered, and catalyzed within the cell, including not only those synthesized within cells but also those gained from the environment, such as vitamins and nutrients. Such molecules reflect cellular processes shaped by the underlying genetics, cell differentiation, and immediate environmental pressures, and they provide a real-time readout of the state of individual cells and cell populations. Cellular activity can be highly spatially localized, so being able to image markers of metabolic activity may provide researchers with new perspectives on biological problems. Traditional methods often treat samples as homogeneous bulk materials, but this risks missing important biological information, for example, the degree of penetration of an anti-cancer drug into a tumor or the secretion of an antibiotic in proximity to invading bacteria.
Imaging mass spectrometry (MS) is essentially a chemical camera that can map the distribution of chemicals across a sample with micrometer precision using highly accurate measurements of the molecules’ masses. Its unique feature is that effectively millions of images are recorded, showing the distribution of potentially thousands of molecules. Unfortunately, this makes the datasets very large: a single image can exceed 100 GB of data, and processing the data is currently the main bottleneck in gaining further biochemical and biological knowledge.
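To make the ‘chemical camera’ idea concrete, here is a minimal sketch of how one molecular image can be pulled out of an imaging MS dataset: for every pixel, sum the signal that falls within a narrow mass window around a target m/z. The function name `ion_image`, the dictionary layout of the data, the 5 ppm tolerance, and the toy values are all assumptions made for illustration, not part of any particular instrument format or of our tools.

```python
def ion_image(pixels, target_mz, ppm=5.0):
    """Sum the intensity near target_mz in every pixel's spectrum.

    pixels: dict mapping (row, col) -> list of (mz, intensity) peaks.
    Returns a dict mapping (row, col) -> summed intensity.
    """
    # Convert the relative tolerance (parts per million) to an absolute m/z window.
    tol = target_mz * ppm * 1e-6
    image = {}
    for xy, peaks in pixels.items():
        image[xy] = sum(i for mz, i in peaks if abs(mz - target_mz) <= tol)
    return image


# Hypothetical 2x2 toy dataset; the m/z and intensity values are invented.
toy = {
    (0, 0): [(760.5851, 120.0), (782.5670, 30.0)],
    (0, 1): [(782.5670, 45.0)],
    (1, 0): [(760.5853, 80.0)],
    (1, 1): [(760.6500, 99.0)],  # roughly 85 ppm away, outside the window
}
img = ion_image(toy, target_mz=760.5851, ppm=5.0)
```

Repeating this for every mass of interest is what turns one acquisition into a very large stack of images, which is where the data volume comes from.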
What we need to exploit the full potential of such an advanced technique are algorithms for high-throughput molecular annotation of our ‘big data’. Successful algorithms must incorporate existing molecular knowledge databases, efficiently exploit both the mass spectral and spatial information inherently present in imaging MS data, and, importantly, control annotation confidence.
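As an illustration of what controlling annotation confidence can mean in practice, the sketch below uses a target-decoy estimate of the false discovery rate (FDR), an approach that is well established in other areas of mass spectrometry such as proteomics. It is shown here only as one possible strategy, not as a description of our algorithms. The scores, the function name `fdr_at_threshold`, and the assumption that equally many real (‘target’) and implausible (‘decoy’) candidates were scored are all hypothetical.

```python
def fdr_at_threshold(target_scores, decoy_scores, threshold):
    """Estimate FDR as (#decoys passing) / (#targets passing).

    Assumes the same number of target and decoy candidates were scored,
    so decoys passing the threshold approximate the false targets passing it.
    """
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing is reported, so nothing can be false
    return min(1.0, n_decoy / n_target)


# Invented annotation scores between 0 and 1 (higher means more convincing).
targets = [0.95, 0.91, 0.88, 0.70, 0.42, 0.30]
decoys = [0.72, 0.40, 0.35, 0.20, 0.15, 0.05]

fdr_strict = fdr_at_threshold(targets, decoys, 0.8)  # 0 decoys vs 3 targets
fdr_loose = fdr_at_threshold(targets, decoys, 0.4)   # 2 decoys vs 5 targets
```

Lowering the threshold reports more molecules but at a higher estimated error rate; making that trade-off explicit is the point of confidence control.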
A large body of knowledge on metabolites and metabolic pathways has been accumulated for specific biological systems and recorded in curated databases (for example, HMDB, KEGG, LIPIDMAPS, ChEMBL). We are developing novel spectral and image analysis tools to assess whether these molecules are present in imaging MS data – and where. In fact, this approach is quite different to the usual methods for analyzing mass spectrometry data, which typically focus simply on individual spectra. Our tools will be wrapped up as an online ‘black box’ search engine to which researchers can directly submit their data. Users will receive molecular images corresponding to detected metabolites as an output, which shifts the perspective away from MS peak analysis of individual spectra to high-level analysis of metabolic images linked to molecular knowledge bases.
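The database-driven perspective described above can be sketched in a few lines: rather than starting from the peaks in individual spectra, start from a list of known molecules and ask, for each one, in which pixels its mass is observed. Everything in this example is an assumption made for illustration: the function name `annotate`, the placeholder database entries (not real records from HMDB, KEGG, or any other resource), the 5 ppm tolerance, and the crude rule that a molecule must appear in at least two pixels. A realistic engine would also need to consider ion adducts, isotope patterns, and the spatial structure of each image.

```python
def annotate(pixels, database, ppm=5.0, min_pixels=2):
    """Return {name: sorted list of pixels where the molecule's m/z was seen}.

    pixels:   dict mapping (row, col) -> list of (mz, intensity) peaks.
    database: dict mapping molecule name -> expected m/z.
    """
    hits = {}
    for name, mz in database.items():
        tol = mz * ppm * 1e-6  # absolute m/z window for this molecule
        where = sorted(
            xy for xy, peaks in pixels.items()
            if any(abs(p_mz - mz) <= tol for p_mz, _ in peaks)
        )
        # Report the molecule only if it is seen in enough pixels.
        if len(where) >= min_pixels:
            hits[name] = where
    return hits


# Placeholder database and toy data; all values are invented.
database = {"molecule_A": 760.5851, "molecule_B": 782.5670, "molecule_C": 500.0000}
pixels = {
    (0, 0): [(760.5851, 120.0), (782.5670, 30.0)],
    (0, 1): [(782.5671, 45.0)],
    (1, 0): [(760.5853, 80.0)],
    (1, 1): [(760.5850, 60.0)],
}
hits = annotate(pixels, database)
```

The output answers both questions at once: which database molecules appear to be present, and where in the sample they were found.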
Over the past few years, we have developed the algorithms that form the cornerstone of such a black box system and evaluated them within the biological analysis pipelines of several collaborators. The next step will be to provide the system to the community as an open source engine so that everyone can use it online or offline to turn the chemical pictures produced by imaging MS into functional maps of metabolic activity. This is the core aim of the European Horizon 2020 project METASPACE, which we have just launched and which unites eight partners from five countries.
For more information, visit: www.embl.de/research/units/scb/alexandrov. And in When Art Meets Science, enjoy an artistic representation of the future of 3D chemical mapping from Alexandrov.
Andrew Palmer is an early-career scientist who received his PhD from the University of Birmingham, UK, in 2014. He has undertaken postdoctoral work at the University of Bremen, Germany, and is currently a postdoctoral fellow in the Alexandrov Team at the European Molecular Biology Laboratory, Heidelberg, Germany. His research combines experimental chemical analysis and advanced bioinformatics in order to map the distributions of molecules within samples and translate the big chemical data produced into biologically meaningful information.
Theodore Alexandrov received a PhD in mathematics in Russia in 2007 and did his postdoc at the University of Bremen, Germany, where he became a group leader at the Center for Industrial Mathematics and the head of its MALDI Imaging Lab. Since 2010, he has been a visiting researcher at the University of California, San Diego. Alexandrov is also a co-founder and the scientific director of the company SCiLS. Since 2014, he has been a team leader at the European Molecular Biology Laboratory in Heidelberg with a research program on spatial metabolomics. The Alexandrov Team at EMBL develops novel computational biology tools that reveal the spatial organization of metabolic processes by exploiting high-throughput metabolic imaging and by translating the big data generated into biological knowledge.