Diving into Proper Proteomics
Let’s not boast of “complete proteomes” or “protein coverage” until we correct our ignorance of post-translational modifications. Proteome analysis is more than just identifying and counting proteins.
Marcus Macht
When the term “proteomics” was coined in 1994 at the Siena conference, the idea was to describe all proteins produced by a given organism in a given state. At that time, analytical technologies such as 2D gel electrophoresis already allowed us to visualize more than 1,000 individual spots, each representing one or sometimes more proteins. With the increasing use of mass spectrometry, it became obvious that these spots do not all belong to different proteins; rather, many of them share a common protein sequence and behave differently because of post-translational modifications. Thus, the number of actually distinct protein sequences was significantly lower than first thought. Recent successes in genome sequencing have revealed that the number of genes is also much lower than originally expected: in humans, for example, around 22,000 individual genes are currently accounted for; for yeast, it’s more like 6,500.
With our ability to identify and quantify several thousand proteins in a single LC-MS/MS experiment, people tend to claim that “we can identify a complete proteome,” and there are publications that report “the percentage of proteome covered.” However, I think we need to be extremely careful with such statements, for the following three reasons:
- Identification is a tricky task. In most experiments, bottom-up technologies are used, and these actually identify peptides, not proteins. Protein identity is inferred from the statistically most likely assembly of peptides into a protein sequence. Mutations or species-dependent sequence variants can only be detected if the particular peptide that carries them is observed during the experiment, which is often not the case.
- In quantitative analysis, it is essential to know what is actually being quantified. Quantifying a protein’s abundance in a multiple reaction monitoring (MRM) experiment using two peptides reveals absolutely nothing about the abundances of protein species that differ in their modifications, as long as those modifications do not lie within the quantified peptides.
- Post-translational modifications (PTMs) are usually neglected in large-scale studies. While phosphorylation is still quite often a focus of proteomic analyses, glycosylation is rarely investigated on a large scale. Other modifications, such as methylation or acetylation, have hardly been covered in large-scale studies at all and, save for a few individual proteins, remain completely unknown. There are numerous good reasons for this, such as a lack of enrichment methods and technical complications in acquiring and interpreting the data, but the consequence is that the number, location and regulation of a huge number of modifications remain opaque.
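The peptide-to-protein inference mentioned in the first point can be illustrated with a deliberately simplified sketch. Real search engines use statistical models; this version swaps in plain parsimony (a greedy minimum set cover), and all protein names and peptide labels are hypothetical placeholders. Proteins whose observed peptides are identical are reported as one group, because the data cannot tell them apart. That is exactly the situation for a sequence variant whose distinguishing peptide was not detected.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def infer_protein_groups(
    detected: Set[str], database: Dict[str, Set[str]]
) -> List[Tuple[str, ...]]:
    """Greedy parsimony: the fewest protein groups that explain all detected peptides.

    A simplified stand-in for statistical protein inference, for illustration only.
    """
    # Keep only the peptides that were actually observed for each candidate protein.
    evidence: Dict[str, FrozenSet[str]] = {
        prot: frozenset(peps & detected) for prot, peps in database.items()
    }
    # Proteins with identical observed evidence are indistinguishable: group them.
    groups: Dict[FrozenSet[str], List[str]] = {}
    for prot in sorted(evidence):
        if evidence[prot]:
            groups.setdefault(evidence[prot], []).append(prot)

    unexplained = set(detected)
    chosen: List[Tuple[str, ...]] = []
    while unexplained and groups:
        # Take the group that explains the most still-unexplained peptides.
        best = max(groups, key=lambda ev: len(ev & unexplained))
        if not best & unexplained:
            break  # remaining peptides match nothing in the database
        chosen.append(tuple(groups.pop(best)))
        unexplained -= best
    return chosen


# Hypothetical database: a wild-type protein and a variant differing in one peptide.
database = {
    "PROT_WT": {"PEPA", "PEPB", "PEPC_WT"},
    "PROT_VAR": {"PEPA", "PEPB", "PEPC_VAR"},
    "PROT_OTHER": {"PEPD"},
}

# Only shared peptides seen: wild type and variant cannot be distinguished.
print(infer_protein_groups({"PEPA", "PEPB", "PEPD"}, database))
# Variant-specific peptide seen: the variant is resolved.
print(infer_protein_groups({"PEPA", "PEPB", "PEPC_VAR"}, database))
```

In the first call, the wild-type and variant entries come back as a single group; only when the variant-specific peptide is detected does the sketch commit to one sequence.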
One could ask, “Is that so important?” The answer clearly has to be, “Absolutely!” We are currently using systems biology to try to understand the regulatory networks that exist between proteins and metabolites. Take, as an example, a patient on warfarin. This anticoagulant is still commonly used in the treatment of thrombosis or pulmonary embolism, although you may know it as a rat poison (its action there: critical internal bleeding). Patients taking warfarin have reduced blood coagulation and sometimes suffer severe side effects. Given its dramatic effect, we would surely expect to see the action of warfarin in the proteome, right? Actually, the answer is yes, but only if we look very carefully indeed. While the drug’s clinical effect is relatively easy to monitor, its mode of action is to suppress the formation of γ-carboxyglutamate residues on several blood coagulation factors; without these modifications, the proteins are not functional. Analysis of these PTMs is complicated, requiring at least a semiquantitative determination of modifications in peptides that are small and of low pI. Given that these peptides originate from proteins that are highly glycosylated and show a tremendous number of sequence variants, this is no small task. And yet, without this information, a comparison of treated and untreated patient proteomes would be utterly meaningless.
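The warfarin case, together with the MRM caveat raised earlier, can be made concrete with a toy calculation. Here, protein abundance is estimated as the mean intensity of two quantified peptides that carry no modification site. The carboxylation state of a hypothetical Gla-site peptide is expressed as a site occupancy, modified / (modified + unmodified). All peptide names and intensities are invented for illustration. The sketch also assumes a single site per peptide and equal ionization efficiency for both forms, which is why such a number is at best semiquantitative.

```python
from statistics import mean
from typing import Dict


def protein_abundance(peptide_intensities: Dict[str, float]) -> float:
    """Protein-level estimate: mean intensity of the quantified (unmodified) peptides."""
    return mean(peptide_intensities.values())


def occupancy(modified: float, unmodified: float) -> float:
    """Fraction of a single site carrying the modification.

    Assumes modified and unmodified forms ionize equally well, which is rarely
    exact, so the value is semiquantitative at best.
    """
    total = modified + unmodified
    if total <= 0:
        raise ValueError("no signal for either form")
    return modified / total


# Hypothetical intensities (arbitrary units), invented for illustration only.
samples = {
    "untreated": {
        "quant_peptides": {"QUANT1": 1.0e6, "QUANT2": 8.0e5},
        "gla_peptide": {"carboxylated": 9.0e5, "uncarboxylated": 1.0e5},
    },
    "treated": {
        "quant_peptides": {"QUANT1": 1.0e6, "QUANT2": 8.0e5},
        "gla_peptide": {"carboxylated": 2.0e5, "uncarboxylated": 8.0e5},
    },
}

for name, data in samples.items():
    abundance = protein_abundance(data["quant_peptides"])
    site = occupancy(data["gla_peptide"]["carboxylated"], data["gla_peptide"]["uncarboxylated"])
    print(f"{name}: protein abundance {abundance:.2e}, carboxylation occupancy {site:.0%}")
```

In this invented example, the protein-level value is 9.00e+05 for both samples, while the occupancy drops from 90% to 20%. Only the modification-level measurement reveals the difference.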
If we want to make claims about proteome analysis and coverage of “complete” proteomes, quantitative as well as comprehensive PTM information must be taken into consideration. In my view, there is still some distance to go before we can characterize a complete proteome, an endeavor that must be supported by both analytical instrument vendors and the scientific community. The temptation to pick low-hanging fruit such as high-throughput protein identification or protein-based quantitation is clearly there, but techniques complementary to bottom-up approaches, such as 2D gel-based methods, reveal post-translational differences relatively easily. Right now, only by applying these complementary techniques can we truly reveal the “complete proteome.”