
Machine Learning: A Powerful Weapon in Protein Analysis

The term machine learning (ML) appears frequently in the news, yet it remains poorly understood. In particular, few people appreciate the crucial role that ML approaches already play in the lab, especially in managing the large volumes of data generated by omics analyses.

In this article, we explore how ML is advancing mass spectrometry (MS)-based analyses, including those focused on proteins (such as single-cell proteomics) and their interactions with the immune system. Example applications in these areas of research include the streamlining of multi-level data integration and biomarker discovery (1). A promising use of ML lies in its ability to predict protein properties from primary amino acid sequences. When MS is coupled with trapped ion mobility spectrometry (TIMS), ML allows researchers to “speak the language of peptide sequences” by predicting ion mobilities before data are acquired. The result: simplified analysis and fewer false positives.

George Rosenberger

Jonathan Krieger

The power of TIMS
 

TIMS is a contemporary evolution of traditional ion mobility spectrometry (IMS) that holds ions stationary in a moving column of gas, rather than driving the ions through a stationary gas.

An electric field is used to trap the ions, and modification of this electric field allows researchers to elute ions in a mobility-dependent fashion (2). By separating and eluting ions in dense clusters based on their mobility, groups of molecules with shared structural elements are captured at specific times. These structural elements, unique to each analyte eluted, are represented by an acquired collisional cross section (CCS) value, which is highly reproducible across instruments and labs.
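As a rough illustration of how a measured mobility relates to a CCS value, the Mason-Schamp equation is commonly used for this conversion. The sketch below is not taken from the article: it assumes nitrogen as the drift gas and a gas temperature of 305 K, and the function name mobility_to_ccs and its parameters are hypothetical.

```python
import math

# Physical constants in SI units.
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
DALTON = 1.66053906660e-27           # kg
LOSCHMIDT = 2.6867811e25             # gas number density at 273.15 K and 101.325 kPa, m^-3

N2_MASS_DA = 28.0134                 # assumption: nitrogen drift gas (average mass)


def mobility_to_ccs(one_over_k0, charge, ion_mass_da,
                    gas_mass_da=N2_MASS_DA, temperature_k=305.0):
    """Convert inverse reduced mobility (V*s/cm^2) to CCS (square angstroms).

    ion_mass_da is the mass of the ion itself (roughly m/z times charge).
    The default temperature is an assumption, not a value from the article.
    """
    # Reduced mass of the ion and gas-molecule pair, in kg.
    reduced_mass = ion_mass_da * gas_mass_da / (ion_mass_da + gas_mass_da) * DALTON
    # Reduced mobility K0 converted from cm^2/(V*s) to m^2/(V*s).
    k0_si = (1.0 / one_over_k0) * 1e-4
    # Mason-Schamp equation.
    ccs_m2 = (
        (3.0 / 16.0)
        * math.sqrt(2.0 * math.pi / (reduced_mass * BOLTZMANN * temperature_k))
        * (charge * ELEMENTARY_CHARGE)
        / (LOSCHMIDT * k0_si)
    )
    return ccs_m2 * 1e20  # m^2 to square angstroms


# Example: a doubly charged ion of about 1,000 Da with 1/K0 = 0.9 V*s/cm^2.
example_ccs = mobility_to_ccs(one_over_k0=0.9, charge=2, ion_mass_da=1000.0)
```

For a fixed mobility and mass, the result scales linearly with charge, which is one reason charge states are handled separately when CCS values are compared.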

Combining TIMS with MS adds a crucial additional dimension to protein analysis and identification, allowing analysts to transition from 3D to 4D proteomics. The result is increased sensitivity, selectivity, and acquisition speed.

A principal challenge in protein analysis is matching acquired spectra to candidate peptides in existing databases; each such assignment is known as a peptide spectrum match (PSM). The process is complicated when many candidates have similar probability scores, and a false positive results when the wrong PSM is chosen. TIMS offers a partial solution: using the acquired CCS values increases the number of PSMs, peptides, and proteins identified in bottom-up proteomics analyses, and increases confidence in the assignments.

Deep learning predicts CCS
 

Approaches such as deep learning, a form of ML based on artificial neural networks inspired by the human brain, can be used to predict CCS values from peptide sequences prior to TIMS-MS analysis. The presence of histidine and proline amino acids, as well as general hydrophobicity, are the main drivers of these predicted CCS values (3). Comparing the predicted values with acquired measurements provides us with correlation scores, which can subsequently be used to identify the most likely PSM for a given spectrum. Using ML in this way improves the efficiency of data analysis and reduces the time needed to obtain reliable results.
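The comparison of predicted and acquired CCS values described above can be sketched as a simple rescoring step. This is a minimal illustration, not the TIMScore algorithm or any published method: it swaps the correlation score for a plain relative-error penalty, and the Candidate class, the rescore function, the 2 percent tolerance, and all peptide sequences and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    search_score: float   # hypothetical spectrum-match score, higher is better
    predicted_ccs: float  # hypothetical model prediction, square angstroms


def rescore(candidates, measured_ccs, tolerance=0.02, weight=1.0):
    """Rank candidates by search score minus a penalty for CCS disagreement."""
    ranked = []
    for cand in candidates:
        # Relative deviation between the predicted and the measured CCS.
        rel_error = abs(cand.predicted_ccs - measured_ccs) / measured_ccs
        # Quadratic penalty, with the error expressed in units of the tolerance.
        penalty = weight * (rel_error / tolerance) ** 2
        ranked.append((cand.search_score - penalty, cand))
    # Highest combined score first.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


# Two candidates with nearly identical search scores for the same spectrum.
candidates = [
    Candidate("PEPTIDEK", search_score=10.2, predicted_ccs=412.0),
    Candidate("PEPTLDEK", search_score=10.0, predicted_ccs=366.0),
]
best_score, best = rescore(candidates, measured_ccs=365.0)[0]
```

Here the candidate with the slightly lower search score is ranked first because its predicted CCS agrees with the measurement, which is the kind of tie-breaking that the extra dimension makes possible.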

Existing data already demonstrate the power of these machine-predicted CCS values. In 2021, Yasushi Ishihama and colleagues shared the systematic characterization of CCS values for 4,433 pairs of monophosphorylated peptides and their unphosphorylated counterparts using TIMS (3). Analysis with an ML approach (TIMScore) added over 110,000 PSMs to their published results and doubled the number of peptides observed (reaching almost 100,000).

A transformer model of peptide CCS values has also shown that predictions based on amino acid sequences can reach accuracies as high as 95 percent for tryptic peptides and 92 percent for phosphorylated tryptic peptides.
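The article does not say which metric underlies these accuracy figures. As one plausible way to quantify agreement between predicted and measured CCS values, the sketch below reports the median relative error and the fraction of peptides predicted within a chosen tolerance. The function name ccs_agreement, the 2 percent tolerance, and the example values are all assumptions for illustration.

```python
import statistics


def ccs_agreement(predicted, measured, tolerance=0.02):
    """Summarize how closely predicted CCS values track measured ones."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("predicted and measured must be non-empty and equal in length")
    # Relative error of each prediction against its measurement.
    rel_errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    # Share of peptides whose prediction falls within the tolerance.
    within = sum(err <= tolerance for err in rel_errors) / len(rel_errors)
    return {
        "median_rel_error": statistics.median(rel_errors),
        "fraction_within_tolerance": within,
    }


# Made-up values in square angstroms, for illustration only.
summary = ccs_agreement(
    predicted=[350.0, 402.0, 455.0, 510.0],
    measured=[352.0, 400.0, 470.0, 508.0],
)
```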

Applying TIMS-MS in immunopeptide analysis
 

TIMS-MS has demonstrated potential in various omics fields, including single-cell proteomics, where (traditionally) hundreds of thousands or millions of cells were needed to conduct in-depth protein analyses. A recent paper detailed the use of TIMS-MS to identify an average of 365 proteins from single primary T cells (these numbers increased to 804, 1,116, and 1,651 proteins for five-, 10-, and 40-cell samples) (4). Proteome coverage from these relatively small numbers of cells was therefore sufficient for the study of essential metabolic pathways. Post-translational modifications (PTMs), including phosphorylation and acetylation, were also recognizable from samples of just one cell. The approach has promise for the analysis of clinically relevant single-cell samples in the future.

The potential of TIMS-MS in immunopeptidomics, the analysis of peptides presented to T cells, looks set to be particularly important. In fact, MS-based immunopeptidomics represents the only unbiased method available for the identification and characterization of these peptides (5). Researchers at University Hospital Tübingen, Germany, recently used a TIMS-MS method with human leukocyte antigen (HLA) peptide prediction for the sensitive and high-throughput analysis of HLA-associated peptides. The team more than doubled the immunopeptidomic coverage obtained with previous methods, identifying 15,000 peptides from approximately 40 million cells. The hope is that increased knowledge of these peptides will drive the production of personalized cancer vaccines and cell therapies targeting HLA peptides. Notably, the method described could also be leveraged for immunopeptidomic profiling in large patient cohorts and to improve existing CCS prediction algorithms for HLA peptides (6).

What lies ahead?
 

TIMS-MS adds another dimension to protein analysis, improving speed and sensitivity. Combining this powerful technique with ML is giving researchers increasing insight into complex systems, which could transform key research areas. In the case of immunopeptidomics, this partnership holds promise for the delivery of new biomarkers and the development of improved therapeutics.

In the future, as computing power continues to advance, ML and its associated approaches will continue to develop from this already exciting base. In the field of protein analysis, this may facilitate the faster analysis of more analytes than previously thought possible. Elsewhere, areas such as data-independent acquisition and de novo sequencing also look set to reap the rewards – namely, simplified analysis, increased accuracy, and higher overall confidence in results.



  1. M Mann et al., “Artificial intelligence for proteomics and biomarker discovery,” Cell Syst, 12, 8, 759–770 (2021). DOI: 10.1016/j.cels.2021.06.006.
  2. ME Ridgeway et al., “Trapped ion mobility spectrometry: A short review,” Int J Mass Spectrom, 425, 22–35 (2018). DOI: 10.1016/j.ijms.2018.01.006.
  3. K Michelmann et al., “Fundamentals of trapped ion mobility spectrometry,” J Am Soc Mass Spectrom, 26, 1, 14–24 (2015). DOI: 10.1007/s13361-014-0999-4.
  4. K Ogata et al., “Effect of phosphorylation on the collisional cross sections of peptide ions in ion mobility spectrometry,” Mass Spectrom (Tokyo), 10, A0093 (2021). DOI: 10.5702/massspectrometry.A0093.
  5. D Mun et al., “Optimizing single cell proteomics using trapped ion mobility spectrometry for label-free experiments,” Analyst (2023). DOI: 10.1039/d3an00080j.
  6. KM Phulphagar et al., “Sensitive, high-throughput HLA-I and HLA-II immunopeptidomics using parallel accumulation-serial fragmentation mass spectrometry,” bioRxiv (2023). DOI: 10.1101/2023.03.10.532106.
About the Authors
Jonathan Krieger

Head of Research Bruker ProteoScape, Bruker Daltonics


George Rosenberger

Software Project Manager – Machine Learning for MS-based Omics, Bruker Switzerland AG
