Computational design of highly signalling-active membrane receptors through solvent-mediated allosteric networks

Gene construction and expression

Haemagglutinin (HA)-tagged A2AR gene in pcDNA3.1(+) was obtained from the complementary DNA library. Genes coding for designed variants were synthesized. Plasmids were transiently transfected into HEK293T cells (ATCC) using Genejuice (Novagen/EMD Millipore, 70967). HEK293T cells at 70–80% confluency in 15 cm tissue culture plates were transfected with 4 μg DNA per plate. After 24 h, the transfection medium was replaced with standard growth medium DMEM supplemented with 2 mM l-glutamine (Lonza Biosciences, BE12-604F), 100 μg ml−1 penicillin, 100 μg ml−1 streptomycin (Thermofisher, 15070063) and 10% fetal bovine serum (Corning, 35-010-CV), and the cells were grown for an additional 24 h before collection.

Membrane preparation

Membranes were prepared from transfected cells using sucrose gradient centrifugation as previously described17. Briefly, cells from 10 cm plates were collected with a cell scraper using PBS solution (VWR, 45000-448). The cells were then pelleted and resuspended in cold hypotonic buffer (1 mM Tris-HCl, pH 6.8 (VWR,IC816100), 10 mM EDTA (VWR, EM-4005) and protease inhibitor cocktail (Promega, PAG6521)) and forced through a 26-gauge needle three times. The cell lysate was layered onto a 38% sucrose solution in buffer A (150 mM NaCl (Merck, S9888), 1 mM MgCl (Merck 442611-M), 10 mM EDTA, 20 mM Tris-HCl, pH 6.8, and protease inhibitor cocktail) in SW-28 ultracentrifuge tubes. The cells were centrifuged at 40,000 g at 4 °C for 20 min, followed by collection of the interface band with an 18-gauge needle. The collected solution was transferred to Ti-45 ultracentrifuge tubes (Beckman) and the volume brought up to 50 ml with buffer A. The sample was then spun at 185,000 g at 4 °C for 30 min. The membrane pellets from each 10 cm plate were resuspended in 0.5 ml buffer A and stored at −80 °C in 100 μl aliquots.

Designed receptor partial purification

The designed variants were partially purified from thawed membrane preparations immediately before assaying via anti-HA agarose beads (Thermofisher, 26181). The membrane preparations were solubilized with 1% n-dodecylmaltoside (DDM; Merck, D4641) for 1 h at 4 °C and loaded onto anti-HA agarose beads for 1 h at 4 °C. The beads were washed with Tris-buffered saline (TBS) containing 0.1% DDM wash buffer three times and the HA-tagged receptor variants were eluted with HA peptide (1 mg ml−1 in TBS with 0.1% DDM). The purity of affinity-purified protein samples was assessed by Coomassie-stained SDS–PAGE and western blot analysis for all receptor variants.

To determine whether the bands observed in western blots resulted from glycosylation, HEK293T cells expressing WT A2AR were treated with 50 μg ml−1 tunicamycin (Sigma, T7765), a glycosylation inhibitor, by replacing the transfection media with culture media supplemented with tunicamycin. The cells were collected after treatment for 24 h (48 h post-transfection) as described above. Partially purified A2AR samples from untreated or treated cells were analysed by western blotting to determine the extent of glycosylation inhibition34 (Supplementary Fig. 1).

Expression and purification of the G protein Gs

G-alphaS (GαS) was cloned into pFastbacI (Thermofisher, 10359016) followed by transformation into DH10α cells (Agilent, 200231). Recombinant bacmid DNA was isolated and transfected into Sf9 insect cells (Sigma, 71104-M) with Cellfectin II (Thermofisher, 10362100). The transfected cells were grown at 28 °C for 72 h followed by centrifugation in 15 ml Falcon tubes to pellet cell debris. The supernatant was saved as P1 viral stock. The virus was amplified by infecting the Sf9 cells with P1 viral stock solution at a twofold multiplicity of infection and cultured for 72 h before collection. This amplification process was repeated to generate a high-titre P3, which was used for infection and protein expression. The P3 stock was used to infect the Sf9 cells at a multiplicity of infection greater than 4, which were collected 48–60 h post infection. The cells were then washed three times in ice-cold PBS and resuspended in homogenization buffer (10 mM Tris-HCl, 25 mM NaCl, 10 mM MgCl2, 1 mM EGTA (Sigma, 234626), 1 mM dithiothreitol (DTT; Sigma, 10197777001), protease inhibitor cocktail and 10 μM guanosine diphosphate (Sigma, G7127), pH 8.0). GαS was purified as described previously35. Antibodies were purchased from Santa Cruz Biotechnology.

G protein activation assays

Receptor variants were assayed for their ability to induce guanine nucleotide exchange in GαS. To measure constitutive activity, the reaction mixture consisted of 4 μM GαS, 20 μM [35S]GTPγS mix (guanosine 5′-(γ-thio)triphosphate tetralithium salt) (Revvity, NEG030H001MC), 50 mM Tris-HCl, pH 7.2, 100 mM NaCl, 4 mM MgCl2 and 1 mM DTT. The receptor concentrations were ~5–20 nM per sample, as estimated from the absorbance of the anti-HA agarose-purified samples at 280 nm. For all samples used in the reactions, western blotting was performed to normalize for receptor quantity using monoclonal anti-HA antibody (Thermo Scientific, 26183). The reactions were initiated by adding 150 μl partially purified receptor samples to 300 μl of the reaction mixture and incubating on ice for 1 h. To measure the ligand-induced receptor activities, a saturating concentration of adenosine (10 μM; Sigma, A9251) was added to the reaction mixture. The reactions were stopped by filter binding onto Millipore nitrocellulose filters (Merck, HAWP04700). The filters were washed three times with ice-cold TBS before incubation with scintillation fluid. Radioactivity counts were measured on a Beckman LS6000 scintillation counter. Mock transfected samples (using empty pcDNA vector) and WT receptor served as controls. Background binding, as measured in the mock transfected samples, was subtracted and the activities of the receptor variants were normalized relative to the WT receptor via densitometry analysis of western blots using ImageJ software. The statistical significance of the differences in the constitutive or ligand-induced activities was assessed by Student t-tests using GraphPad software. All receptor activities were obtained from at least two independent experiments and, in any given experiment, technical triplicates were recorded for each condition.

Apparent agonist-bound receptor stability

The apparent stability of the agonist-bound designed variants was determined by measuring either receptor binding to agonist (apparent melting temperature) or receptor activities (apparent receptor half-life) as a function of temperature. To measure the apparent melting temperature, anti-HA agarose affinity-purified receptor samples prepared as described above were first pre-incubated at increasing temperatures for 30 min. The samples were then incubated with a saturating amount of [3H]adenosine (10 μM; Revvity, NET1161250UC) in a total volume of 150 μl per sample for 1 h on ice, followed by loading onto Millipore nitrocellulose washed for scintillation counting as mentioned above. To measure the apparent receptor half-life, purified receptor samples were pre-incubated at 37 °C for 0–60 min before addition to the reaction mixture containing GαS, [35S]GTPγS and 10 μM adenosine, as described above for the measurement of G protein activation. The apparent melting temperatures and active-state half-life curves were fitted using GraphPad Prism software and analysed for statistical significance by Student t-tests. All receptor activities were obtained from at least two independent experiments and, in any given experiment, technical triplicates were recorded for each condition.

Modelling of receptor active-state conformation

The modelling was performed before the release of the A2AR active-state structure bound to mini-Gs (ref. 22). Hence, we modelled the receptor active conformation by homology to the fully active Gs-bound B2AR structure using the method IPHoLD14,18. As the sequences of B2AR and A2AR on the extracellular side diverge substantially, we took advantage of available partially active agonist-bound A2AR structures. We grafted the extracellular region of the UK-432097 agonist-bound A2AR structure36 onto the intracellular region of the fully active homologue Gs-bound (B2AR) X-ray structure to create a chimeric homologue template for the fully active conformation of the agonist and Gs-bound receptor template. Intracellular loop regions were rebuilt de novo in the presence of bound Gs as described previously37, and the entire structure was relaxed using SPaDES, a method developed for modeling membrane proteins with explicit solvent15. Around 20,000 independent simulations were performed and, from those, the 10% lowest-energy models were clustered, with the centres of the most populated clusters selected as fully active-state receptor templates for the design calculations of activating microswitches and solvent-mediated networks. The A2AR inactive-state X-ray structure bound to the high-affinity antagonist ZM241385 (ref. 38) was selected as a representative inactive-state model for the receptor design templates.

Computational design of activating microswitches and solvent-mediated networks

The inactive- and active-state templates described above were used as starting models in the design calculations of interaction networks bridging the receptor TMHs. Receptor residues in the extracellular and intracellular binding sites were taken from the native adenosine receptor A2AR to preserve adenosine ligand and G-protein binding. We focused our design calculations on the interface positions between the static TMHs 1, 2 and 3 and the switchable TMHs 6 and 7 and selected a total of 22 designable positions.

Owing to the substantial computational overhead associated with the side-chain and solvent rotamer repacking in SPaDES, we restricted the sequence search space to hydrophobic (that is, Leu, Ile, Met, Val and Ala), uncharged polar (that is, Ser, Thr, Asn and Gln) and small charged residues (that is, Asp and Glu). We included Tyr in the polar group for the design space at position 6.48 as the distances between TMHs 6 and 7 at that site are greater and could fit large hydrophobic residues.

We performed the design selection in two steps. In the first step, sequences were designed to either increase or decrease the energy of the active state. In the second step, selected sequences were then threaded and extensively relaxed in both the active- and inactive-state models with explicit hydration to calculate the impact of the designed mutations on that state. The energy difference between both states relative to A2AR defines the quantity ΔΔGapo, which is a measure of the impact of the design on the conformational equilibrium of the receptor and the relative conformational stability of each state. Specifically, the design protocol uses a simulated annealing Metropolis Monte Carlo minimization algorithm to sample sequence space. The energy of each sequence and solvent-mediated interaction network is calculated using a protein side-chain and solvent molecule rotamer repacking protocol, followed by minimization of the receptor structure over all the conformational degrees of freedom using the SPaDES energy function. In these calculations, sites interfacing the static and switchable TMH helices were defined as hydratable and repackable, while residues within 5 Å of hydratable sites were set as repackable. We used the SPaDES hydrate water12 scoring function with pre-Talaris2013 behaviour and initialized the membrane. In the first design step, the active-state energy of each design was assessed from 50 independent models and used to select designs for the second design step. In that stage, the active and inactive states of the selected designs underwent extensive relaxation using SPaDES to provide an accurate estimation of the free energies of the active (ΔGactive) and inactive (ΔGinactive) states as well as ΔΔGapo. Specifically, we generated 500 independent models for each identified design in each state. We found excellent convergence in the top ten scoring models (the average standard errors in ΔGinactive and ΔGactive were 0.32 and 0.21 REU, respectively) and subsequently selected the best scoring model for structural analysis in each case.

We found that the most frequent impact of mutations in the receptor core was the destabilization of both the inactive- and active-state structures (positive ΔGinactive and ΔGactive). As this would probably lead to protein unfolding, such sequences were systematically discarded during our calculations before we applied the selection criteria 1–3 described below. Typical ΔG thresholds from WT A2AR for unselected designs were >8 REU for ΔGinactive and ΔGactive.

Overall, nine single- or double-point mutants and five combinatorial designs, including thirteen distinct designed residues, were selected for experimental validation. These designs were selected because they sampled different levels of the following three criteria: (1) conformational stability of the active state compared with that of the inactive state, (2) water-mediated hydrogen bond connectivity between static and switchable TMHs, and (3) the strength of protein–ion interactions locking and stabilizing the protein into a specific conformation.

Unselected designs displayed the following trends in criteria values:

1.

Criterion 1: ΔΔGapo (that is, ΔGactive − ΔGinactive) was often higher than 5 REU and would likely correspond to a loss in both basal and ligand-induced signalling activities.

2.

Criterion 2: we often observed a loss of the water-mediated hydrogen bond connectivity between static and switchable TMHs that mirrors the properties of the Hyd_low designs selected to validate our design method. The typical threshold for criterion 2 (that is, the number of static–switchable water-mediated hydrogen bonds) for unselected designs was <18.

3.

Criterion 3: we did not observe any striking specific trend for the ion interaction energies that would be systematically orthogonal to criterion 2. If a mutation occluded the Na+ binding site preventing ion binding, it also often led to a loss in water-mediated interactions. As reflected by the distribution of the mutational effects on Na+ interaction energies, criterion 3 is a second-order effect that is not as discriminative as criteria 1 and 2 for selecting competent signalling receptors.

An example script to run SPaDES hydrate is provided in the Supplementary Methods. All designed final models, scripts to regenerate trajectories for the results presented in this report, documentation and detailed demos are available via GitHub at https://github.com/barth-lab/SPaDES_DESIGNS.

Relationships between basal activities and the equilibrium between ligand-free states

The allosteric two-state model (ATSM) provides a theoretical framework for relating the calculated thermodynamic properties of a receptor to its measured allosteric activity. According to the ATSM, we can expect a quasi-linear relationship between basal (that is, ligand-free) activity and free energy in our study. Without ligand stimulus, the basal activity (BA) is directly related to the fraction of ligand-free receptor R in the active state, which can be calculated from the equilibrium constant of the ligand-free receptor in the inactive and active state (KR) and therefore from the free-energy difference between the two states (ΔGapo) by solving the following equations:

$$_}}=-RT\,\mathrm[}/(1-})]$$

(2)

Reliably fitting the Boltzmann distribution described by equation (2), where R is the molar gas constant and T is the temperature, would require a dataset covering the complete range of basal activities (for example, from <5% to >95% of the maximal receptor activity). Nevertheless, the relationship between BA and ΔΔGapo is quasi-linear within the range of measured receptor basal activities in our study (Supplementary Fig. 3). To a first approximation, such linear relationships enable a direct quantitative comparison between experimental basal activity and ΔΔGapo energy that is independent of the energy units. The linear coefficients relating basal activities to Rosetta-based energy units or kilocalories per mole are different because the former differ from the physical free-energy units.

Topologies of water-mediated interaction networks

Protein–solvent interactions were analysed as a network through graph analysis using the software Cytoscape39. In this analysis, each pair of interacting residue and water molecule was assigned a pair of nodes, and each hydrogen bond interaction between them represented by an edge. For each designed protein in each state, solvent positions were extracted from the 10% lowest-energy structural models and clustered to identify positions common to multiple models. These cluster centres defined the input solvent nodes for Cytoscape. A water molecule mediating a static–switchable interaction was defined as a water node sharing at least one edge with a residue node on a static TMH and sharing at least one edge with a residue node on a switchable TMH. A water node connected to a residue node on a static TMH can also mediate a static–switchable interaction if it is sharing an edge with another water node connected to a residue node on a switchable TMH and vice versa. Each edge of these types defined a static–switchable water-mediated interaction in our analysis.

Modelling sodium binding

We expanded SPaDES to model buried ion molecules in protein structures. As SPaDES was developed for protein design calculations that require joined sampling of large structure and sequence spaces, it relies on a compromise between accuracy and speed. The approach approximates ion solvation energies by the interaction energy between the ion and a shell of water molecules (in the bulk) or a combination of water molecules and protein residues in direct contact with the ion (in the protein core). SPaDES energies are thus dominated by short-range interactions and neglect configurational entropic effects or long-range interactions. However, as the design goal was to identify amino acid substitutions that impact the direct contacts with ion molecules, the approximations underlying the model are valid and capture the primary enthalpic determinants of Na+ interactions in protein cores.

Specifically, the method models the effects of amino acid substitutions on protein–ion interactions in the vicinity of known ion binding sites and involves the following steps:

1.

An ion is placed at the origin of a voxelated mesh grid at 0.5 Å intervals in the x, y and z directions in a 5 Å box around the known ion binding site obtained from the input structure. All possible positions of the ion are sampled exhaustively in this grid by moving from −2.5 Å to 2.5 Å in each Cartesian direction, creating a total of 1,331 initial ‘ion placement’ states.

2.

The ion-placed protein structures are hydrated and repacked using SPaDES and clustered using a hierarchical tree based on the geometric position of the ion and its distance matrix from surrounding residues. The top ten scoring (by energy) clusters are selected for ion placement refinement.

3.

In each of the selected structures, the ion is replaced in a voxelated mesh grid at 0.05 Å intervals from −0.2 Å to 0.2 Å in the current ion binding site. This results in 7,290 starting structures for the next step.

4.

These ion-placed protein structures are hydrated and repacked using the SPaDES method and the top ten scoring structures selected as the final poses for analysis.

Our simulations converge to highly reproducible Na+ interaction energies as the average standard error lies within 0.016 REU (Supplementary Table 3).

Scripts to reproduce the ion sampling results, including a detailed demo, are available via GitHub at https://github.com/barth-lab/SPaDES_DESIGNS.

The ion occupancy in a specific state of the receptor was derived by calculating the Boltzmann distribution of an ion molecule buried in the receptor versus fully solvated in the bulk. Energies of Na+ in the bulk were calculated for a Na+ ion fully solvated in a 125 Å3 cubic grid. Briefly, Na+ was placed at the centre of a cubic grid, solvated and then repacked using SPaDES to identify the optimal network of ion–water interactions. Overall, 500 individual trajectories were run, with the lowest-energy being a consistent outcome with six water molecules packed around the ion. The ion–water interaction energy of the lowest-energy configuration was taken as the ion energy in the bulk and used to calculate the Boltzmann distribution. The ion occupancy was directly calculated from the Boltzmann factors of each state.

Deriving Boltzmann occupancies from SPaDES energies required the translation of SPaDES energy units into a thermodynamic scale (that is, kilocalories per mole). We found that the relative effects of the designed mutations on Na+ occupancy were similar for scaling factors translating SPaDES energies into thermodynamic quantities that ranged from 1 REU = 1 kcal mol−1 to 1 REU = 3 kcal mol−1, covering a range of Rosetta forcefields. Overall, these results indicate that, at the qualitative level, our conclusions on the impact of the designs on receptor–Na+ binding interactions are valid and robust to the scoring function.

Sequence conservation analysis

In assessing sequence conservation, we first performed a sequence alignment over all class A GPCRs over the structurally conserved TMHs 1–7 via the GPCRdb database40. We next performed a Basic Local Alignment Search Tool (BLAST) search over this multisequence alignment to calculate E values with respect to human A2AR (where an E value of 10 indicates 10 hits would be expected to be found by chance given the same size of a random sequence database). We selected all those sequences with an E value smaller than 10−2 to account for distant homologues when assessing the conservation of design and solvated sites. At each design site (1.42, 2.46, 2.50, 3.39, 3.43, 5.55, 6.40, 6.41, 6.45 and 7.49) and additional adjacent solvated sites (1.50, 2.43, 5.58, 6.48, 7.45, 7.51 and 7.53), we calculated both the residue conservation with respect to A2AR (or mOR in the case of solvation) and the occurrence of the designed amino acid. The motif analysis reports on how frequently the combinations of mutations selected in the designed receptors occur in natural receptors. Specifically, we searched through all class A GPCRs for occurrences where at least two mutations, such as Asn2.50 and Asp7.49, corresponding to Hyd_High2, naturally occur together. Combinations of mutations not reported in Supplementary Table 11 were not observed.

Structure prediction using AlphaFold

AlphaFold2 was used to predict the structures of A2AR and Hyd_high7 without bound G protein41. AlphaFold2 Multimer42,43 was also applied, wherein the auxiliary sequence of a bound G protein was used to steer template generation. The best Amber-refined output models had an RMSD of 0.44 Å and 0.85 Å for regions with a predicted local distance difference test score >80 for the inactive-state (PDB: 8GNE) and active-state (PDB: 6GDG) structures of A2AR.

Expression vector design of the T4L fusion construct

There are numerous, successful examples of T4L functioning as a stabilizing chaperone protein for numerous GPCR proteins deposited in the PDB. Our strategy was to follow those successful examples and fuse a cysteine-free T4L (ref. 44) to the N terminus of A2AR with one exception: we decided to use a T4Lcp designed by Cellitti et al. that relocates the N-terminal A helix to the carboxy terminus, creating subdomains that are contiguous in sequence and linked by a G4S Poly-Glycine-Serine sequence45. T4Lcp has been crystallized at a higher resolution (1.8 Å) than other T4L constructs and may therefore have a higher probability of positively impacting SPaDES crystallization. The T4Lcp SPaDES fusion sequence was optimized using the structure of the T4Lcp protein (PDB: 2O7A) as follows. To link the two proteins, we designed small linkers between the C terminus of T4Lcp and the N-terminal amino acid of the Hyd_high variants. Following in silico screening and experimental testing, the two-amino acid linker Ala-Pro was found to be the most successful construct that resulted in a stable, soluble protein and, foremost, the one that crystallized.

Cloning and Baculovirus expression of T4Lcp–Hyd_high variants

A T4Lcp 207A–Hyd_high7 fusion including A2AR encompassing residues 2–316 with designed mutations L48A, S91T, I238Y, L194M, V239L, A243L and L95M (numbered relative to reference sequence NP_000666) was PCR-amplified and topoisomerase-cloned into a custom topoisomerase-adapted pFastBac (KX) vector (Life Technologies). The cDNA from this clone was named 14022b11KXem1h1 and its sequence was verified. Expression of this vector generated MKTIIALSYI FCLVFADYKD DDDGAP/ T4L 207A/ AP/A2AR aa2–316 plus L48A, S91T, I238Y, L194M, V239L, A243L and L95M /GS/LVPRGS/HHHHHHHHHH. T4Lcp 207A corresponds to amino acids 2–124 (numbered relative to reference sequence NP_049736). Standard Baculovirus expression using a modified version of the Bac-to-Bac system protocol (Life Technologies, 10359016) in combination with the DH10EMBacY bacmid (Geneva Boiotech, DH10EMBacY, 12 × 100 µl) was used to generate the virus. T4Lcp–A2AR constructs were expressed in Sf9 cells, which were cultured for 48 h, collected by centrifugation and pelleted for storage at −80 °C before purification.

T4Lcp–Hyd_high7 purification

Protein purifications were carried out using cobalt affinity, thrombin cleavage, PNGase F deglycosylation and size-exclusion chromatography (SEC). Infected Sf9 cell pellets were lysed in 20 mM Tris, pH 7.5, 500 mM NaCl, 0.2 mM lauryl maltose neopentyl glycol (LMNG; Anatrace, NG310) and 1 mM CGS21680 (Sigma-Aldrich). Membrane samples were collected by spinning at 44,000 r.p.m. for 45 min. T4Lcp–Hyd_high7 was extracted from Sf9 membranes with a Dounce homogenizer in a solubilization buffer comprising 20 mM DDM (Anatrace, D310), 4 mM cholesteryl hemisuccinate Tris salt (CHS; Anatrace, CH210), 50 mM HEPES, pH 7.8, 500 mM NaCl and 25 μM Cmpd-1. Talon resin (Takara, 635653) was added and mixed continuously overnight at 4 °C. The Talon resin was collected by spinning and washed extensively with Ni washing buffer comprising 5 mM DDM, 1 mM CHS, 50 mM HEPES, pH 7.8, 500 mM NaCl and 10 mM imidazole. This was followed by a second washing step using 2 mM LMNG instead of DDM, 1 mM CHS, 50 mM HEPES, pH 7.8, 500 mM NaCl and 10 mM imidazole. The protein was eluted from the resin using elution buffer comprising 250 mM imidazole, 1 mM LMNG, 50 mM HEPES, pH 7.8, and 500 mM NaCl. A desalting column was then used to exchange buffer into 50 mM HEPES, pH 7.8, 500 mM NaCl, 0.5 mM LMNG and 0.1 mM CGS21680, followed by cleavage with 20 μl thrombin and 20 μl PNGase F overnight at 4 °C. The elution sample was concentrated and loaded onto a SEC column (Superdex 200) with a buffer comprising 0.2 mM LMNG, 20 mM HEPES, pH 7.5, 500 mM NaCl and 25 μM CGS21680. The major peak fractions were combined and concentrated for crystallization and the ligand was added to a final concentration of 500 μM.

Fluorescent thermal stability assay of T4Lcp–Hyd_high7 constructs

The receptors were purified as described for crystallization with the reducing agent removed in the final chromatography step. The receptors were diluted to 2 μM in PBS, 0.5 mM DDM, 0.1 mM CHS, 20 μM 7-diethylamino-3-(4′-maleimidylphenyl)-4-methylcoumarin (Invitrogen) and 100 μM ligand compound. Samples were incubated on ice for 1 h before measurement. Melts were completed in a Qiagen Rotor-Gene Q apparatus, monitoring the blue channel with a thermal ramp of 30–90 °C measuring the fluorescence three times per degree. The data were processed using Excel. The raw fluorescence was normalized and the melting temperatures were calculated from these data.

T4Lcp–Hyd_high7 crystallization

T4Lcp–Hyd_high7 protein (50 mg ml−1) with 0.5 mM of the ligand agonist CGS21680 was mixed with monoolein containing 10% cholesterol in a protein/lipid ratio of 1:1.5 (v/v) using the twin-syringe mixing method46. We performed extensive lipidic cubic phase (LCP) crystallization trials on T4Lcp–Hyd_high7 using a Mosquito robot (TTP Labtech) at room temperature. Initial crystals were obtained from 0.1 M HEPES, pH 7–8, 25–30% polyethylene glycol 300 (PEG300) and 50–200 mM sodium potassium tartrate 2 days after the LCP trays were set up. Optimization screens were performed with well buffer (0.1 M HEPES, pH 7, 27% PEG300 and 50 mM sodium potassium tartrate) with 10% detergent screen (Hampton Research). Final crystal datasets were collected at a resolution of 3.9 Å using 0.1 M HEPES, pH 7, 27% PEG300 and 50 mM sodium potassium tartrate with 5 mM CYMAL-5 (Hampton Research).

Data collection, processing and structure determination

A complete X-ray dataset was collected from a single rod-shaped crystal (~10 × 100 µm2) at 100 K in a single sweep of 900 × 0.2° oscillations and exposure time of 0.4 s at the DLS-I04-1 beamline at the Harwell Diamond Light Source, UK. The whole sample (that is, the LCP sample/blob in the MiTeGen loop that contained the crystal) was exposed to the beam during data collection. The diffraction data were integrated using autoPROC/XDS (refs. 47,48) and merged and scaled in SCALA (ref. 49) in the CCP4 suite50. The crystal structure of the T4Lcp–Hyd_high7–CGS21680 complex was determined by molecular replacement using the previously solved T4Lcp and A2AR structures, 2o79 (ref. 45) and modified 5IU4 (ref. 51), as separate search models. Clear density for the ligand was observed immediately. After numerous refinement cycles with REFMAC5 (ref. 52), the model was further refined using Phenix.SPaDES_refine as described below to reach reasonable R factors and MolProbity scores53,34. The final structure has been deposited in the PDB (8UGW). The solved molecular structure with explicit water molecules is available via GitHub at https://github.com/barth-lab/phenix_with_spades/tree/main/crystal_structure under the name 8UGW_with_SPaDES_waters.pdb.

Crystal packing contact analysis

Packing contacts were estimated by calculating the buried area of the unit cell with cell mates using the Molecular Operating Environment (version 2020.01) software (Chemical Computing Group). Residues from distinct molecules related by symmetry to atoms within 4.5 Å of each other were defined as crystal contacts.

Structure refinement with Phenix.SPaDES_refine

Structure refinement with Phenix alone or SPaDES alone resulted in either suboptimal structure geometry (as measured by MolProbity and all-atom clash scores) or suboptimal agreement with the experimental electron-density map (as measured by R-free), respectively. To address these issues, we considered the Phenix.rosetta_refine protocol53, which aims to improve the refinement of protein structures against low-resolution data by combining the strength of Phenix and Rosetta. Briefly, a RosettaScripts protocol calls Phenix routines to calculate electron-density maps such that the energy can be weighted against the crystallographic likelihood function by normalizing the gradients, while the Rosetta protocol optimizes the model geometry (repack and minimization routines). The method alternates between real-space and reciprocal-space refinement, where the Rosetta constrains reciprocal-space refinement to physically plausible conformations (which improves MolProbity scores), while the density maps restrain the Rosetta sampling in real space (which improves R-free).

Phenix.rosetta_refine was developed to refine water-soluble protein structures using an implicit solvent model. To apply this protocol to the refinement of membrane protein structures with explicitly modelled small solvent molecules, we implemented RosettaMembrane with SPaDES in Phenix.rosetta_refine. SPaDES builds de novo protein–solvent interaction networks and substantially improves the prediction of side-chain conformations and solvent-mediated interactions compared with standard Rosetta and other force fields. The implicit solvation model of Rosetta was replaced with the hybrid implicit–explicit solvation model of SPaDES and the protocol was modified to consider de novo modelled solvent molecules during the geometry optimization routines (repack and minimization). The weights for the real-space restraints were also optimized for the new protocol, which we refer to as Phenix.SPaDES_refine. Specifically, the Phenix.SPaDES_refine protocol performs ten separate cycles of geometry optimization: (1) each of the first three refinement cycles consists of a repack with the following scoring function weights: fa_rep = 0.22, fa_intra_rep = 0.001 and elec_dens_fast = 12.0, then a minimization with the default SPaDES scoring function weights; (2) each of the next five cycles consists of a repack with elec_dens_fast = 12.0 and a minimization with elec_dens_fast = 12.0, then a minimization with the default SPaDES scoring function weights; and (3) each of the final two cycles consists of a repack with elec_dens_fast = 12.0, then a minimization with default SPaDES scoring function weights.

We judged and reported the quality of the resulting models using various metrics, including R-free, MolProbity, all-atom clash score, rotamer geometry, Ramachandran plot and structural quality, as measured by Rosetta in Rosetta energy units. To assess the impact of the explicit solvent molecules in the refinement, we compared the structural models refined using the Phenix.SPaDES_refine and Phenix.rosetta_refine protocols (Supplementary Table 8). Overall, Phenix–SPaDES performed slightly better than Phenix–Rosetta. Unlike the Phenix–SPaDES model, the structure refined using Phenix–Rosetta displayed Ramachandran plot and rotamer outliers. The all-atom clash score and R-free were also considerably higher for the Phenix–Rosetta model, suggesting that Phenix–SPaDES was able to generate a higher model quality than Phenix–Rosetta.

Installation instructions for Phenix.rosetta_refine with SPaDES and detailed documentation and a demo for the refinement described above are available via GitHub at https://github.com/barth-lab/phenix_with_spades.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Comments (0)

No login
gif