Large scale cross-linking to discover, evaluate, and validate the human cell structural proteome and interactome
Tara K. Bartolec1,2, Xabier Vázquez-Campos1, Alexander Norman3, Clement Luong4, Marcus A. Johnson4, Richard J. Payne3,5, Marc R. Wilkins1, Joel P. Mackay4, Jason K. K. Low4
1 Systems Biology Initiative, School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Randwick, NSW, Australia
2 Genome Biology Unit, European Molecular Biology Laboratory, 69117, Heidelberg, Germany
3 School of Chemistry, University of Sydney, Sydney, NSW 2006, Australia
4 School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia
5 Australian Research Council Centre of Excellence for Innovations in Peptide and Protein Science, The University of Sydney, NSW 2006, Australia
Introduction: Recent advances in structural biology have expanded our ability to determine experimental structures for proteins and complexes. However, many proteins remain refractory to these approaches or have not yet been analysed. Machine-learning-based structure predictors now provide highly accurate protein structure models for entire proteomes. These predictors are trained on experimental structures available in the Protein Data Bank (PDB), which cover a relatively small subset of proteins, and many of those structures were solved under non-native conditions or with non-native sequences. A critical question is therefore whether predicted (and PDB) structures reflect the bona fide structures and complexes formed by proteins in their native environment. We investigate this question using a large-scale cross-linking mass spectrometry (XL-MS) resource generated for the human cell.
Methods: To generate a high-density, high-depth XL-MS dataset for human HEK293 cells, we used a multipronged approach. Briefly, we cross-linked four subcellular fractions (nucleus, endoplasmic reticulum, mitochondria and cytosol) using three cross-linkers with orthogonal chemistries (DHSO, DSSO and DMTMM). We then enriched cross-linked peptides using offline size-exclusion chromatography, followed by further fractionation by high-pH reversed-phase HPLC. Mass spectrometry was performed on concatenated fractions using hybrid MS2-MS3, or MS/MS with EThcD or HCD, fragmentation strategies. Cross-linked peptides were identified using XlinkX 2.3 or pLink2, with stringent search parameters and post-hoc filtering to control the false discovery rate to <2% at the unique residue pair (URP) or protein-protein interaction (PPI) level.
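As an illustration of post-hoc FDR filtering, the sketch below uses a target-decoy estimate that is common in cross-linking searches, FDR = (TD - DD) / TT, where TT, TD and DD count target-target, target-decoy and decoy-decoy matches. This abstract does not state how the FDR was computed, so the formula, the `Match` record and the function names are assumptions chosen for illustration, not the authors' pipeline or the XlinkX/pLink2 API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical record for one scored residue pair (not a real tool's format)."""
    score: float
    kind: str  # "TT" (target-target), "TD" (target-decoy) or "DD" (decoy-decoy)


def estimate_fdr(matches):
    """Estimate FDR as (TD - DD) / TT, floored at zero (assumed estimator)."""
    tt = sum(1 for m in matches if m.kind == "TT")
    td = sum(1 for m in matches if m.kind == "TD")
    dd = sum(1 for m in matches if m.kind == "DD")
    if tt == 0:
        return 1.0  # no target hits: treat the set as entirely unreliable
    return max(td - dd, 0) / tt


def score_cutoff(matches, max_fdr=0.02):
    """Return the lowest score s such that all matches with score >= s
    have an estimated FDR <= max_fdr, or None if no cutoff qualifies."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    counts = {"TT": 0, "TD": 0, "DD": 0}
    best = None
    for idx, m in enumerate(ranked):
        counts[m.kind] += 1
        # Only evaluate once every match sharing this score has been counted.
        is_last_of_tie = idx == len(ranked) - 1 or ranked[idx + 1].score != m.score
        if not is_last_of_tie or counts["TT"] == 0:
            continue
        fdr = max(counts["TD"] - counts["DD"], 0) / counts["TT"]
        if fdr <= max_fdr:
            best = m.score
    return best
```

Under these assumptions, the target-target matches scoring at or above `score_cutoff(matches, 0.02)` would be the ones retained, and the same procedure could be repeated after collapsing matches to the URP or PPI level.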
Preliminary data: Our study has generated the most comprehensive XL-MS dataset reported to date for any species, with 28,910 URPs representing 4,084 unique proteins and 2,110 unique putative PPIs. Subcellular fractionation before cross-linking significantly improved proteome coverage, whilst orthogonal reactivities (D/E-D/E, K-K and K-D/E) increased the density of cross-links per protein, especially for intra-protein links. We demonstrate that our resource of URPs confirms and rediscovers existing experimental structures, capturing proteoforms and complexes within their approximate subcellular niches and across a range of conformations. Remarkably, our intra-molecular URPs also largely corroborate thousands of new structures predicted by the next-generation modeller AlphaFold2, including those involving proteins (or regions of proteins) without existing experimental structures, and those lacking any structural precedent. Furthermore, our inter-protein cross-links recapture the topology of well-described complexes and PPIs, whilst supporting poorly characterised PPIs or discovering new ones. Critically, the inter-protein cross-links also help localise PPI interfaces, and we use this information to assess quaternary protein structures modelled with AlphaFold-Multimer.
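One common way to test whether cross-links corroborate a structure is to measure the Cα-Cα distance between the linked residues in the model and compare it with an upper bound set by the cross-linker. The minimal sketch below assumes that approach. The abstract does not give the distance criteria used, so the 30 Å threshold, the coordinate dictionary and the function name are hypothetical.

```python
import math


def classify_links(ca_coords, residue_pairs, max_dist):
    """Sort residue pairs into satisfied, violated and unmapped lists.

    ca_coords:     dict of residue number -> (x, y, z) C-alpha coordinates in angstroms
    residue_pairs: iterable of (residue_i, residue_j) cross-linked pairs
    max_dist:      assumed upper bound on C-alpha distance for the cross-linker
    """
    satisfied, violated, unmapped = [], [], []
    for i, j in residue_pairs:
        if i not in ca_coords or j not in ca_coords:
            # A residue is absent from the model, e.g. an unresolved region.
            unmapped.append((i, j))
            continue
        distance = math.dist(ca_coords[i], ca_coords[j])
        if distance <= max_dist:
            satisfied.append((i, j))
        else:
            violated.append((i, j))
    return satisfied, violated, unmapped


# Toy coordinates and links, for illustration only (not data from this study).
toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 40.0)}
toy_links = [(1, 2), (1, 3), (2, 9)]
ok, bad, missing = classify_links(toy_coords, toy_links, max_dist=30.0)
```

The fraction of mapped links that are satisfied gives a simple per-structure agreement score. Violated links are not necessarily errors, since they may reflect alternative conformations or oligomeric states.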
Novel aspect: We have experimentally corroborated thousands of experimental (in vitro) or predicted structures for proteins and PPIs in the human cell.