Small Molecule Discovery – and Make It Snappy!
An MS-based algorithm could transform natural product drug discovery
Researchers across the life sciences face the crucial challenge of correctly identifying small molecules in a sample. Historically, natural product drug discovery has been a low-throughput process that depends heavily on luck – just think of how penicillin was discovered! Though recent decades have seen significant advances in genomics and high-throughput mass spectrometry (MS)-based data collection, trawling databases for this information remains difficult and time-consuming. On top of this, existing approaches are based on chemical domain knowledge and often fail to explain many of the peaks found in mass spectra.
Now, a team of researchers from Pittsburgh’s Carnegie Mellon University and Russia’s St. Petersburg State University has created an MS-based algorithm that can quickly and accurately identify whether a particular molecule is truly new or has previously been discovered.
“When we started this study, efficient and accurate methods for identification of small molecules from their mass spectra were not available,” says Hosein Mohimani, part of the research team. “We had previously developed scalable methods (such as Dereplicator and Dereplicator+) for identifying small molecules, but they failed to correctly identify a large portion.” The new algorithm, molDiscovery, builds on these previous attempts by combining machine learning and expert knowledge to create theoretical MS fragmentation patterns from molecular structures, and then scoring these patterns against query mass spectra.
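To make the idea of scoring theoretical fragments against a query spectrum concrete, here is a minimal Python sketch. It is not molDiscovery's method: it swaps the learned scoring described above for a plain shared-peak count, and the function names, the 0.01 Da tolerance, the candidate names, and every m/z value are hypothetical, chosen purely for illustration.

```python
from bisect import bisect_left


def count_matched_peaks(theoretical_mz, query_mz, tol=0.01):
    """Count theoretical fragment m/z values with a query peak within +/- tol.

    A simple shared-peak count, used here as a stand-in for a real scoring model.
    """
    query = sorted(query_mz)
    matched = 0
    for mz in theoretical_mz:
        # Find the first query peak at or above the lower edge of the window.
        i = bisect_left(query, mz - tol)
        if i < len(query) and query[i] <= mz + tol:
            matched += 1
    return matched


def rank_candidates(candidates, query_mz, tol=0.01):
    """Rank candidate molecules by matched-peak count, best first."""
    scores = {
        name: count_matched_peaks(fragments, query_mz, tol)
        for name, fragments in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical fragment masses for two made-up candidates.
candidates = {
    "candidate_A": [100.05, 150.10, 200.20],
    "candidate_B": [100.05, 175.00, 260.30],
}
# Hypothetical query spectrum (peak m/z values only, intensities ignored).
query_spectrum = [100.051, 150.098, 200.205, 310.0]

ranking = rank_candidates(candidates, query_spectrum)
print(ranking)  # [('candidate_A', 3), ('candidate_B', 1)]
```

In this toy example, candidate_A explains three of the query peaks and candidate_B only one, so candidate_A ranks first. A real tool would also need to generate the theoretical fragments from each structure and weigh matches by how likely each fragment is to appear, which this sketch leaves out.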
“Our results showed that molDiscovery outperforms state-of-the-art methods in accuracy and efficiency. Additionally, unlike existing machine learning methods, molDiscovery generalizes well to unseen data,” says Mohimani. In fact, the paper reports that molDiscovery identified six times more unique small molecules than previous methods.
But the researchers don’t plan to stop there. They are already working on various extensions to molDiscovery and plan to incorporate expert knowledge from analytical chemistry literature into their model to further improve accuracy. “We are also working on more complex models that automatically learn unknown small molecule fragmentation rules. We also plan to integrate molDiscovery and its derivatives into our computational pipelines for high-throughput natural products discovery from multi-omic data, such as NRPminer and MetaMiner,” says Mohimani. “We believe molDiscovery and its derivatives will play a crucial role in shaping the future of data-driven natural product drug discovery.”
- L Cao et al., Nat Comms, 12, 3718 (2021). DOI: 10.1038/s41467-021-23986-0.
By the time I finished my degree in Microbiology I had come to one conclusion – I did not want to work in a lab. Instead, I decided to move to the south of Spain to teach English. After two brilliant years, I realized that I missed science, and what I really enjoyed was communicating scientific ideas – whether that be to four-year-olds or mature professionals. On returning to England I landed a role in science writing and found it combined my passions perfectly. Now at Texere, I get to hone these skills every day by writing about the latest research in an exciting, creative way.