In recent years, metabolomics has been recognized as a field of major importance that promises to advance our understanding of cell biology, physiology, and medicine. Metabolites are the ‘small cogs’ in the cellular machinery: small molecules that are ingested, altered, and metabolized within cells, including not only those synthesized within cells but also those gained from the environment, such as vitamins and nutrients. Such molecules reflect cellular processes shaped by the underlying genetics, cell differentiation, and immediate environmental pressures, and they provide a real-time readout of the state of individual cells and cell populations. Cellular activity can be highly spatially localized, so imaging markers of metabolic activity may give researchers new perspectives on biological problems. Traditional methods often treat samples as homogeneous bulk materials, but this risks missing important biological information, for example the degree to which an anti-cancer drug penetrates a tumor or the secretion of an antibiotic in proximity to invading bacteria.
Imaging mass spectrometry (MS) is essentially a chemical camera that can map the distribution of chemicals across a sample with micrometer precision using highly accurate measurements of the molecules’ masses. Its unique feature is that effectively millions of images are recorded, showing the distribution of potentially thousands of molecules. Unfortunately, this makes the datasets very large; a single imaging dataset can exceed 100 GB, and processing the data is currently the main bottleneck in gaining further biochemical and biological knowledge. What we need to exploit the full potential of such an advanced technique are algorithms for high-throughput molecular annotation of this ‘big data’. Successful algorithms must incorporate existing molecular knowledge databases, efficiently exploit both the mass spectral and spatial information inherently present in imaging MS data, and, importantly, control annotation confidence.
A large body of knowledge on metabolites and metabolic pathways has been accumulated for specific biological systems and recorded in curated databases (for example, HMDB, KEGG, LIPIDMAPS, ChEMBL). We are developing novel spectral and image analysis tools to assess whether these molecules are present in imaging MS data – and where. This approach is quite different from the usual methods for analyzing mass spectrometry data, which typically focus on individual spectra. Our tools will be wrapped up as an online ‘black box’ search engine to which researchers can directly submit their data. As output, users will receive molecular images corresponding to the detected metabolites, which shifts the perspective away from MS peak analysis of individual spectra to high-level analysis of metabolic images linked to molecular knowledge bases. Over the past few years, we have developed the algorithms that form the cornerstone of such a black box system and evaluated them within the biological analysis pipelines of several collaborators. The next step will be to provide the system to the community as an open-source engine so that everyone can use it online or offline to turn the chemical pictures produced by imaging MS into functional maps of metabolic activity. This is the core aim of METASPACE, the European Horizon 2020 project we have just launched, which unites eight partners from five countries.
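To make the idea of database-driven molecular images concrete, here is a minimal sketch in Python. It is not the METASPACE algorithm: it only illustrates the basic step of turning a database entry (an expected m/z value) into an image by summing, at every pixel, the intensity found within a small mass tolerance. The function name `ion_image`, the 5 ppm tolerance, the toy spectra, and the two database entries are all assumptions made up for illustration, and the sketch does nothing to control annotation confidence.

```python
from typing import Dict, List, Tuple

# One pixel's spectrum: a list of (m/z, intensity) peaks.
Spectrum = List[Tuple[float, float]]


def ion_image(spectra: Dict[Tuple[int, int], Spectrum],
              shape: Tuple[int, int],
              target_mz: float,
              ppm: float = 5.0) -> List[List[float]]:
    """Sum the intensity within +/- ppm of target_mz at every pixel."""
    rows, cols = shape
    tol = target_mz * ppm * 1e-6  # absolute m/z tolerance
    image = [[0.0] * cols for _ in range(rows)]
    for (r, c), peaks in spectra.items():
        image[r][c] = sum(
            (inten for mz, inten in peaks if abs(mz - target_mz) <= tol),
            0.0,
        )
    return image


# Toy 2x2 dataset keyed by (row, column); all values are invented.
toy_spectra = {
    (0, 0): [(180.0634, 10.0), (255.2330, 3.0)],
    (0, 1): [(180.0635, 4.0)],
    (1, 0): [(255.2329, 7.0)],
    (1, 1): [],
}

# Hypothetical database: molecule name -> expected m/z.
database = {"molecule_A": 180.0634, "molecule_B": 255.2330}

# One image per database entry: where, if anywhere, is each molecule seen?
images = {name: ion_image(toy_spectra, (2, 2), mz)
          for name, mz in database.items()}

for name, img in images.items():
    print(name, img)
```

A real system would also need to score each image, for instance by its spatial structure and by how well the measured peaks match the molecule's expected signal, before reporting a molecule as detected; that scoring is deliberately left out here.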
For more information, visit: www.embl.de/research/units/scb/alexandrov. And in When Art Meets Science, enjoy an artistic representation of the future of 3D chemical mapping from Alexandrov.