Protein–protein interactions are fundamental to many biological processes in living cells. Membrane proteins play an essential role in many of these processes. They act as gateways of cellular signaling pathways, as pumps, and in other roles, facilitating selective transport processes across membranes. Modeling the 3D structures of the associated protein complexes is a critical step toward understanding the detailed mechanisms of these processes. Protein complex structures are steadily being determined experimentally and deposited in the Protein Data Bank (PDB).1, 2 However, experiments remain expensive and demand substantial human and instrument time, even when the proteins at hand are compatible with the available methods.3 Moreover, structures of protein complexes are often extremely difficult to determine experimentally, even without the involvement of a membrane. Thus, when a protein complex structure has not yet been experimentally determined, computational tools can be used to construct atomic models.4 A protein docking program takes component proteins, called subunits, as input and assembles them into models of the protein complex. Many general protein docking methods and specialized versions thereof have been publicly released, such as ZDOCK,5 HADDOCK,6 ClusPro,7 RosettaDock,8 HEX,9 SwarmDock,10 and ATTRACT.11 Even protein structure prediction methods such as AlphaFold12 have been adapted to output multimeric structures.13 The rigid-body docking method LZerD14, 15, 16, 17 in particular has consistently ranked highly in the server category of CAPRI,18, 19 the blind community-wide assessment of protein docking methods.
In vivo, proteins generally do not interact in isolation in a uniform environment, and this often confounds computational modeling of complexes. Even with state-of-the-art modeling techniques, existing docking methods struggle to rigorously handle environments other than a uniform aqueous one.20, 21 In a membrane, burying hydrophobic surfaces in the protein–protein interface is less energetically favorable, because such surfaces may instead interact competitively with the membrane itself. For example, consider the transmembrane halorhodopsin protein family found in halobacteria, which is included in the Mem-LZerD benchmark. Its protein–protein interface is not especially rich in hydrophobic amino acids relative to the remainder of the molecular surface, much of which participates in the protein–lipid interface.22 This family also highlights the utility of docking methods: a halorhodopsin structure for Halobacterium salinarium was available in 2000, but no structure for Natronomonas pharaonis was available until 2009.23 In such cases, homology-based methods can be used to model the subunits, and a docking method can then be used to explore the space of interaction poses. Other proteins do not wholly embed in the membrane, but instead pass only partway through it or interact only with the membrane surface.
These peripheral membrane proteins are likewise important and are implicated, for example, in sensitivity to membrane composition.24 These broad categories of membrane proteins break down further into classes with substantial mechanistic differences. The classes range from the purely α-helical transmembrane regions most commonly considered by computational methods, through transmembrane β-barrels, to peripheral membrane proteins that attach to the membrane with amphipathic helices25 or with hydrophobic loops.26 The transience of certain interactions between these proteins can make them difficult to study directly in vitro,27 but more accessible to computational modeling.28 Techniques capable of modeling interactions involving membranes therefore have the potential to elucidate cellular processes in many biological contexts. Detailed mechanistic understanding of membrane protein complexes currently represents a major knowledge gap in molecular biology and is the subject of much active investigation.29 Recent studies have shown that protein structure prediction methods can enable combinatorial modeling of putative interactions among, for example, the cytochrome c maturation system I proteins of E. coli.28
Several computational techniques have been developed to predict how proteins interact in a membrane environment. For G-protein coupled receptors (GPCRs), highly specialized approaches have been developed and applied to model their oligomerization.30, 31, 32, 33 For more general proteins, MPDock, part of the RosettaMP software collection, takes a specification of membrane chemistry and dimensions to model the assembly of transmembrane complexes from bound experimental structures.34 In its benchmark, the bound subunits were pulled apart, repacked according to the Rosetta algorithm, and then pulled back together. Memdock carries out rigid-body docking of α-helical membrane proteins by constraining their orientation, followed by a finer-grained refinement procedure. It assumes that the center of mass of the whole input roughly coincides with the membrane midplane.35, 36 JabberDock for membrane proteins represents the subunits with a volume map. This map incorporates dynamics information from an expensive 70 nanosecond molecular dynamics simulation. JabberDock then uses particle swarm optimization to explore the pose space, which required strict constraints on that space.37 In summary, each method has limitations:
MPDock requires bound native structures as input and requires manual processing.
Memdock was only tested on α-helical proteins, requires assumptions about the membrane location, and cannot include soluble regions of proteins in the modeling.
JabberDock requires substantial molecular dynamics calculations and is limited to exploring a tight region of the search space that precludes tolerance for misorientation.
Moreover, these existing analyses use orientations taken from bound complex structures rather than orientations predicted separately for each subunit. In blind modeling, however, the precise orientations that individual subunits adopt upon binding are not known.
A method for blind membrane docking should therefore predict the membrane orientations of the input subunits. Its docking search space should be narrow enough to exclude infeasible poses, but broad enough to tolerate reasonable errors in the predicted orientations.
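The trade-off between a narrow and a tolerant search space can be made concrete with a short Python sketch. The sketch checks whether a candidate rigid-body pose keeps two subunits' predicted membrane frames compatible. It assumes each subunit comes with two pieces of information: a predicted membrane normal and a point on its predicted membrane midplane (the kind of information an orientation tool such as PPM provides). The function names and the tolerances `max_tilt_deg` and `max_shift` are hypothetical illustrations, not Mem-LZerD's actual parameters or code.

```python
import numpy as np


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def pose_is_membrane_compatible(rec_normal, rec_center, lig_normal, lig_center,
                                rotation, translation,
                                max_tilt_deg=20.0, max_shift=4.0):
    """Hypothetical filter: accept a ligand pose (rotation, translation) only if the
    ligand's predicted membrane frame stays close to the receptor's.
    max_tilt_deg and max_shift (Angstrom) are illustrative tolerances."""
    rotation = np.asarray(rotation, dtype=float)
    translation = np.asarray(translation, dtype=float)
    # Move the ligand's predicted membrane normal and midplane point with the pose.
    moved_normal = rotation @ np.asarray(lig_normal, dtype=float)
    moved_center = rotation @ np.asarray(lig_center, dtype=float) + translation
    # Tilt: angle between the two predicted normals (both subunits same side up).
    tilt = angle_deg(rec_normal, moved_normal)
    # Shift: offset between the two predicted midplanes along the receptor normal.
    n = np.asarray(rec_normal, dtype=float)
    n = n / np.linalg.norm(n)
    shift = abs(float(np.dot(moved_center - np.asarray(rec_center, dtype=float), n)))
    return tilt <= max_tilt_deg and shift <= max_shift


# Usage: sliding the ligand within the membrane plane is accepted,
# while lifting it out of the membrane or turning it sideways is rejected.
z_axis = [0.0, 0.0, 1.0]
origin = [0.0, 0.0, 0.0]
in_plane = pose_is_membrane_compatible(z_axis, origin, z_axis, origin,
                                       np.eye(3), [10.0, 0.0, 0.0])   # True
sideways = pose_is_membrane_compatible(z_axis, origin, z_axis, origin,
                                       [[1, 0, 0], [0, 0, -1], [0, 1, 0]],
                                       [0.0, 0.0, 0.0])               # False
```

Tightening both tolerances prunes more poses but rejects more near-native poses when the orientation predictions are slightly off. Loosening them does the reverse.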
Mem-LZerD, which we developed in this work, is based on the LZerD rigid-body docking method and its extensions.14, 15, 16, 17, 38, 39, 40 LZerD uses a soft surface representation of the protein subunits based on geometric hashing41 and 3D Zernike descriptors (3DZDs).42, 43, 44, 45, 46, 47 This representation allows docking poses to be generated quickly without considering side-chain repacking. The geometric hashing procedure used internally by LZerD also readily admits site-specific orientation and translation restrictions. Mem-LZerD targets transmembrane protein complexes in general, as well as peripheral membrane proteins. In Mem-LZerD, the geometric hashing data structure is augmented with the position of each sample point relative to the membrane, as generated by the Positioning of Proteins in Membranes (PPM) algorithm.48, 49 It is also augmented with the angular orientation of each sample point in the same membrane. Pruning the search space in this way, skipping infeasible poses, made Mem-LZerD 74 times faster on average than regular LZerD without search space or model constraints. This speedup shows that, for membrane proteins, most of the calculation time in regular LZerD is spent on infeasible poses. These include, e.g., poses that are sideways or entirely outside the membrane, and they also contaminate the output model ranking.
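Membrane-aware pruning of the search can be illustrated with a simplified Python sketch. It rests on three assumptions:

- Both subunits have already been placed in a shared membrane frame, with the membrane normal along z and the midplane at z = 0, as in OPM/PPM output.
- The search is restricted to rotations about z and translations within the membrane plane.
- As a result, every sample point keeps its depth.

Under these assumptions, a ligand surface point can only be matched to receptor points at a similar depth. A table keyed by depth bin therefore lets the search skip every other correspondence. The bin width, the tolerance, and the table layout are hypothetical. This is not LZerD's geometric hashing implementation, and it omits the angular-orientation term.

```python
import math
from collections import defaultdict


def depth_bin(z, bin_width):
    """Quantize a depth z (distance from the midplane along the normal) into a bin index."""
    return math.floor(z / bin_width)


def build_depth_table(points, bin_width=2.0):
    """Index receptor sample points (x, y, z) by depth bin (hypothetical layout).
    The same bin_width must be used when querying the table."""
    table = defaultdict(list)
    for idx, (_, _, z) in enumerate(points):
        table[depth_bin(z, bin_width)].append(idx)
    return table


def candidate_partners(table, ligand_point, bin_width=2.0, depth_tol=1):
    """Receptor point indices whose depth bin is within depth_tol bins of the ligand point's.
    All other receptor points are never examined for this ligand point."""
    b = depth_bin(ligand_point[2], bin_width)
    out = []
    for nb in range(b - depth_tol, b + depth_tol + 1):
        out.extend(table.get(nb, []))  # .get does not insert empty bins
    return sorted(out)


def pruned_fraction(table, receptor_count, ligand_points, bin_width=2.0, depth_tol=1):
    """Fraction of receptor-ligand point pairs skipped by the depth filter."""
    examined = sum(len(candidate_partners(table, p, bin_width, depth_tol))
                   for p in ligand_points)
    return 1.0 - examined / (receptor_count * len(ligand_points))


# Usage with four receptor points spread across the membrane depth.
receptor = [(0.0, 0.0, -10.0), (1.0, 1.0, 0.5), (2.0, 2.0, 1.5), (3.0, 3.0, 9.0)]
table = build_depth_table(receptor)
near_midplane = candidate_partners(table, (5.0, 5.0, 1.0))  # [1, 2]
```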
Mem-LZerD yielded models of at least acceptable quality by CAPRI criteria within the top 10 models for 13 of 21 (61.9%) unbound docking targets of the Memdock benchmark set. This is a greater fraction than achieved by the existing methods Memdock and JabberDock. Previous studies used the ground truth (i.e., correct) subunit orientations, taken from the OPM database, rather than predicting them as the Mem-LZerD protocol does. When given the ground truth orientations, Mem-LZerD successfully modeled 14 of 21 targets (66.7%). On our separate transmembrane protein benchmark set, Mem-LZerD successfully modeled 35 of 44 (79.5%) unbound docking targets. On a preoriented peripheral membrane protein benchmark set, it successfully modeled 54 of 92 (58.7%). Mem-LZerD has been incorporated into the LZerD webserver, available at https://lzerd.kiharalab.org. We further show that a pipeline can produce ready-for-simulation files of sampled binding poses in explicit lipid membranes. The pipeline combines the Orientations of Proteins in Membranes/Positioning of Proteins in Membranes (OPM/PPM) suite,48 Mem-LZerD, and CHARMM-GUI.50
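The success criterion behind these counts can be made explicit with a short Python sketch. It classifies a model using the commonly stated CAPRI thresholds on three quantities:

- fnat, the fraction of native contacts.
- Ligand RMSD.
- Interface RMSD.

A target counts as successful if any of its top 10 models is at least acceptable. The sketch also reproduces the reported percentages. The function names are illustrative, and computing fnat and the RMSD values themselves is not shown.

```python
from typing import Sequence, Tuple


def capri_class(fnat: float, lrmsd: float, irmsd: float) -> str:
    """CAPRI quality class from fnat, ligand RMSD and interface RMSD (both in Angstrom)."""
    if fnat >= 0.5 and (lrmsd <= 1.0 or irmsd <= 1.0):
        return "high"
    # Medium also covers high-fnat models that miss the high-quality RMSD cutoffs.
    if fnat >= 0.5 or (fnat >= 0.3 and (lrmsd <= 5.0 or irmsd <= 2.0)):
        return "medium"
    # Acceptable also covers fnat >= 0.3 models that miss the medium RMSD cutoffs.
    if fnat >= 0.3 or (fnat >= 0.1 and (lrmsd <= 10.0 or irmsd <= 4.0)):
        return "acceptable"
    return "incorrect"


def top_n_success(models: Sequence[Tuple[float, float, float]], n: int = 10) -> bool:
    """True if any of the first n ranked models (fnat, lrmsd, irmsd) is at least acceptable."""
    return any(capri_class(*m) != "incorrect" for m in models[:n])


def success_rate(successes: int, total: int) -> float:
    """Percentage of successful targets, rounded to one decimal place."""
    return round(100.0 * successes / total, 1)


# Reported counts: 13/21 -> 61.9, 14/21 -> 66.7, 35/44 -> 79.5, 54/92 -> 58.7
rates = [success_rate(s, t) for s, t in [(13, 21), (14, 21), (35, 44), (54, 92)]]
```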