Billions of years of natural evolution have created an astounding variety of biomolecules with diverse structures and functions. This has inspired much effort to harness the power of evolution in the laboratory to create engineered biomolecules with designer properties [1, 2, 3, 4]. Such directed evolution experiments typically entail the diversification of a gene to generate a library of mutants, followed by the identification of variants with the desired properties by either ‘screening’ or ‘selection’. Screening is conceptually simpler: mutants are individually assessed for the property of interest using a high-throughput assay. However, analyzing one mutant at a time significantly restricts the sequence space that can be explored. Selection, on the other hand, follows the principles of natural evolution: rare variants with a desired function are isolated from a large pool of mutants using an engineered selection scheme. Vital to the success of such a scheme is the ability to connect the desired function to a selectable phenotype, such that the rare mutants with such properties can be effectively enriched and identified.
The pace of natural evolution is slow. This is because the higher mutation frequency needed to support faster evolution is associated with increasingly detrimental effects that are not tolerated by most cells. Such a slow pace is incompatible with the demands of laboratory evolution, where outcomes are desired on a much faster timescale. Viruses are an interesting exception, and have the ability to evolve much more rapidly than cellular organisms, for a number of reasons. Viruses multiply quickly, with each infection event creating hundreds to thousands of progeny viruses. Additionally, viral genomes are much smaller than that of a typical cell, as viruses rely on the host cell to provide many functions necessary for their replication, which allows them to mutate at a much higher frequency. Important early work took advantage of this to study viral evolution and virus-host co-evolution [5,6]. In addition to their higher natural mutation frequency, the following advantages make viruses a highly attractive vehicle for driving laboratory evolution of biomolecules: 1) viruses enable efficient and controlled delivery of gene libraries into a variety of host cells; 2) they are capable of efficient self-amplification, creating hundreds to thousands of replicas within host cells upon successful replication; 3) it is possible to develop creative schemes connecting the function of diverse biomolecules to the replication of viruses; 4) successfully amplified genotypes are packaged safely into capsids, which can be isolated and characterized, or used to drive further rounds of evolution. In this review, we discuss recent advances in leveraging these unique capabilities of viruses to perform novel directed evolution experiments on various biomolecules.
One of the earliest and most prominent examples of virus-assisted directed evolution is the phage-display technology [7,8]. The phage particle is used to display a peptide or protein on its surface, while encoding the corresponding sequence in its genome, establishing a direct genotype-phenotype connection. The library of displayed protein or peptide mutants can be selected for binding to specific targets, and the selected sequences can be readily characterized through DNA sequencing. This technology has been widely used for the discovery of peptide- and protein-based affinity agents for both research and therapeutic applications, and these advances have already been captured in many excellent review articles. Here, we instead focus on other innovative uses of viruses to evolve biomolecules with novel functions, beyond selective binding affinity toward a desired target.
The phage-display technology is limited to the evolution of proteins or peptides outside of the cellular context, where affinity-based selection is typically used to identify binders. It is challenging to adapt this strategy for the evolution of intracellular targets, and those with other sophisticated functions, such as enzymatic activity. The Liu group has developed a platform for the continuous directed evolution of biomolecules in bacteria, called phage-assisted continuous evolution (PACE) [9∗∗, 10, 11, 12]. In PACE, the biomolecule of interest (BOI) is encoded in the phage genome, and the gene encoding the phage protein III (pIII; essential for phage infectivity) is removed from the phage genome, and supplied in trans from an accessory plasmid (AP) in the host E. coli cells (Figure 1) [9,10]. The expression of pIII is connected to the activity of the BOI, such that only phage particles encoding a BOI variant with desired function facilitate the expression of pIII to produce infectious progeny phage. A mutagenesis plasmid (MP) is also included in the host cell that facilitates high frequency of mutation within the host cell in an arabinose-inducible manner, enabling the creation of sequence diversity. The culture wherein the phage is continuously multiplying – also called the ‘lagoon’ – is constantly diluted with an influx of freshly cultured host cells, and a concomitant outflow to maintain steady volume, such that only those phage progeny encoding active BOI mutants survive in this pool through continuous replication, while inactive variants are eventually washed out. The stringency of the selection can be tuned by the rate of dilution of the lagoon, such that the phage variants containing only the most active BOIs, which enable the most efficient phage replication, can keep up with the continuous dilution. 
The emergence of ‘cheating phenotypes’, which typically arise from genetic variations within the host cells that allow the phage to bypass the selection scheme, is prevented through a continuous supply of fresh, genetically ‘clean’ host cells. Because of the fast replication cycle of phage, once optimized, PACE can enable up to 40-50 rounds of evolution per day.
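The relationship between lagoon dilution rate and selection stringency described above can be illustrated with a deliberately simplified washout model. This is a sketch under stated assumptions, not a description of any published PACE model: each phage variant is assumed to propagate exponentially at a fixed rate set by its BOI activity, host-cell availability is ignored, and the function name and all rate values are hypothetical.

```python
import math

def lagoon_titers(growth_rates, dilution_rate, hours, initial_titer=1.0):
    """Toy washout model (assumption: simple exponential kinetics).

    Each variant propagates at its own rate (per hour) while the lagoon is
    diluted at a common rate (per hour), so dN/dt = (growth - dilution) * N.
    """
    return {
        name: initial_titer * math.exp((rate - dilution_rate) * hours)
        for name, rate in growth_rates.items()
    }

# Hypothetical per-hour propagation rates for three BOI variants.
variants = {"inactive": 0.0, "weak": 1.0, "strong": 3.0}

for dilution in (0.5, 2.0):
    titers = lagoon_titers(variants, dilution_rate=dilution, hours=5)
    # A variant persists only if its titer has not dropped below the start.
    survivors = [name for name, titer in titers.items() if titer >= 1.0]
    print(f"dilution {dilution}/h -> persisting variants: {survivors}")
```

In this toy picture, raising the dilution rate raises the minimum propagation rate a variant needs to avoid washout, which is the sense in which faster dilution corresponds to more stringent selection.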
Central to engineering a novel biomolecular function using PACE is the ability to connect it to the selective expression of the pIII protein. Doing so is relatively straightforward for functions involved directly in transcription or translation. Indeed, one of the first applications of PACE involved engineering T7 RNA polymerase (T7RP) to recognize non-cognate promoter sequences such as T3 and SP6 [9,13,14]. Altering the sequence specificity of T7RP so drastically was challenging to achieve in a single step, so a series of designer stepping-stone promoter sequences (first a hybrid T7/T3 sequence, followed by a high-copy T3, and then a low-copy T3 promoter) was used to arrive at novel T7RP mutants that recognize the T3 promoter sequence with excellent efficiency [9]. A limitation of wild-type T7RP is its strong preference for initiating transcription with GTP, which was also overcome using PACE to generate variants that efficiently start transcription with additional nucleotides [9]. These evolved T7RP sequences harbor many mutations across the protein, highlighting the advantage of the many cycles of iterative evolution achievable through PACE, which provides access to a significantly broader mutational landscape relative to traditional evolution strategies [9].
Many additional biomolecular functions have also been successfully coupled to pIII production to facilitate their engineering through PACE. These include: DNA recombination, which was used to flip the pIII gene from a transcriptionally inactive to an active configuration [9]; protease activity that releases active T7RP from an inhibited fusion protein, which then drives pIII expression [15, 16, 17]; orthogonal aminoacyl-tRNA synthetases [18, 19, 20∗] and tRNAs [21, 22, 23] that decode a nonsense/frameshift codon either in T7RP, which then drives pIII expression, or directly in the pIII gene itself; base editors that correct an inactivating mutation to generate active T7RP, which then enables pIII expression [24∗, 25∗, 26, 27, 28∗, 29]; the biosynthetic machinery of bicyclomycin (an antibiotic), the product of which alleviates Rho-dependent transcriptional termination to facilitate pIII expression [30]; and methanol dehydrogenase activity, the product of which (formaldehyde) activates pIII expression by blocking the repressor protein FrmR [31]. In addition, PACE selection systems frequently rely on engineered biomolecular interactions to turn on the pIII gene. For example, engineering of sequence-specific DNA-binding proteins (TALEN, Cas9) through PACE was achieved by fusing them with the ω subunit of RNAP (RpoZ), which recruits RNAP to promote the transcription of pIII (Figure 1c) [24∗, 25∗, 26,32,33]. More complex two-hybrid systems have also been used. For example, to engineer new protein–protein interactions, the interacting partners are separately fused to a sequence-specific DNA-binding domain and RpoZ, respectively, such that their successful interaction leads to the recruitment of RNAP to drive pIII expression (Figure 1c) [34]. Additionally, split T7RPs have been engineered that are activated when the split halves bind each other, templated by molecular interactions between the domains fused to them, which then leads to pIII expression [35, 36, 37∗, 38, 39].
Such systems have been used for a wide range of applications, from engineering protein–protein interactions to developing sensors for small molecules [35, 36, 37∗, 38, 39]. A split T7RP system was also used to improve the solubility of target proteins using PACE, by fusing them to one of the T7RP fragments [40]; insoluble variants render the fused T7RP fragment unavailable, preventing the reconstitution of active T7RP, whereas soluble variants promote its soluble expression, enabling the formation of active T7RP that drives pIII expression. Protein–protein interactions in the periplasmic space were also connected to pIII expression in the cytoplasm using the transmembrane CadC sensor protein, enabling the evolution of antibody fragments that require disulfide bond formation to fold correctly [41].
When designing directed evolution schemes, it is sometimes necessary to select against certain potential outcomes to ensure the emergence of a specific function. Examples include variant RNAPs that selectively recognize one promoter sequence but not another, aaRS variants that charge a desired noncanonical amino acid but none of the canonical ones, etc. Negative selection schemes in PACE were enabled by a mutant pIII protein N–C83 that, if expressed, produces non-infective phage particles [14]. By connecting the undesired functions to this mutant pIII, it is possible to select against them during PACE [14,17,18,32,37]. Additionally, to increase the stringency of the selection scheme underlying PACE, sometimes the pIII gene is split into two fragments, which can be ligated together post-translationally using trans-splicing split-intein domains, and each fragment is separately subjected to selective expression [26,40]. A similar splitting strategy is also applied to enable PACE of very large target genes, such as base-editors, which are not tolerated well in the phage genome [27, 28∗, 29].
Although a key advantage of PACE is the ability to rapidly perform many rounds of iterative evolution, extensive optimization is necessary to identify the precise selection conditions with the right level of stringency. The selection scheme of PACE can also be used in a noncontinuous manner, where phage progeny from each round are serially transferred to fresh culture either manually or automatically, which avoids the need to develop and optimize an elaborate fluidics setup, and provides greater flexibility to adjust selection pressure between individual selection rounds [19,20,24∗, 25∗, 26, 27, 28∗,31]. Phage-assisted noncontinuous evolution (PANCE) has been used on its own, or in combination with PACE, to evolve a variety of targets, including aaRSs [19,20], base editors [24∗, 25∗, 26, 27, 28∗], and enzymes [31]. PANCE was also automated to develop strategies such as eVOLVER-enabled phage-assisted continuous evolution (ePACE) [25], and phage- and robotics-assisted near-continuous evolution (PRANCE) [22], which facilitate many parallel selections under different stringencies, long-term evolution experiments, and even automatic adjustment of selection stringency between rounds.
Over the last ten years, PACE and PANCE have been used to develop numerous engineered biomolecules, including RNA polymerases with altered promoter specificity [9,13,14], proteases (TEV, botulinum neurotoxin proteases) with novel substrate selectivity [15, 16, 17], enhanced base editors with novel PAM specificity or editing capabilities [24∗, 25∗, 26, 27, 28∗, 29], enzymes and biosynthetic pathways with enhanced activities [30,31], aaRSs with novel substrate specificities [18, 19, 20∗], orthogonal tRNAs for efficient genetic code expansion [21, 22, 23], ribosomal RNA evolved for improved translation [42], proteins with improved soluble expression [40], a variant of Bacillus thuringiensis (Bt) toxin that overcomes insect resistance [34], and antibody fragments evolved in the periplasm [41], which are summarized in Table 1. One limitation of PACE is its reliance on random mutagenesis to create sequence diversity, which makes it challenging to select for novel functions that may require multiple distinct synergistic mutations to appear simultaneously. However, this issue can be addressed by starting the selection with a focused mutation library of the target gene, where key residues are custom-randomized [16,17].
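To give a sense of scale for such focused libraries, the following sketch computes the theoretical diversity when a chosen number of residues is fully randomized. It is only an illustration: the use of NNK degenerate codons is an assumption made here (the randomization scheme is not specified above), and the function name is hypothetical.

```python
def library_sizes(num_sites):
    """Theoretical diversity when `num_sites` residues are fully randomized.

    Assumption: randomization with NNK degenerate codons (N = A/C/G/T,
    K = G/T), a common but not the only possible scheme.
    """
    protein_variants = 20 ** num_sites    # 20 canonical amino acids per site
    nnk_codon_variants = 32 ** num_sites  # NNK: 4 * 4 * 2 = 32 codons per site
    return protein_variants, nnk_codon_variants

for sites in (1, 2, 4, 6):
    proteins, codons = library_sizes(sites)
    print(f"{sites} sites: {proteins:,} protein variants, "
          f"{codons:,} NNK codon combinations")
```

Because the diversity grows exponentially with the number of randomized positions, a focused library can cover all combinations at a handful of key residues, whereas random mutagenesis would rarely produce several specific substitutions in the same clone.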
Historically, directed evolution of biomolecules has been carried out almost exclusively using unicellular microorganisms, such as E. coli and S. cerevisiae, as host cells. In addition to obvious factors, such as their robustness and our ability to readily grow and manipulate them, these cells offer a range of advantages that are ideally suited for performing directed evolution, such as the following: A) They can stably propagate small plasmids, which enables facile introduction, maintenance, and expression of mutant libraries therein, and subsequent characterization of specific mutants demonstrating a desired phenotype. B) The transformation efficiency of these cells is high enough that large gene libraries can be delivered while maintaining sequence diversity, but not so high as to allow frequent simultaneous uptake of multiple different plasmids into the same cell, which largely avoids generating cells carrying multiple different library variants. In contrast, directed evolution in mammalian cells has significantly lagged behind, even though such technology is needed to engineer biomolecular functions that must be optimized in the context of the unique biology of these cells [43,44]. Apart from the increased complexity of their biology and growth conditions, these cells in general do not stably maintain episomal DNA, which makes it significantly more complicated to deliver and maintain gene libraries in them. Additionally, common transfection-based plasmid delivery methods typically lead to the uptake of many copies of plasmids per cell [45]. Mammalian viruses have been used to overcome some of these limitations (Figure 2a), enabling controlled delivery of mutant genes, generation of sequence diversity through error-prone replication, and selection of a desired function by coupling it to an essential virus gene (Table 2).
Some of the earliest uses of viruses to aid directed evolution in mammalian cells came from Berkhout and coworkers (Figure 2b) [46, 47∗, 48, 49, 50]. They used the naturally error-prone nature of retrovirus replication to evolve both the virus itself and non-viral BOIs, such as components of the bacteria-derived tetracycline-regulated gene expression system (Tet) [47∗, 48, 49]. Components of the Tet system were used to functionally substitute for the endogenous Tat-TAR regulatory mechanism, which is essential for HIV replication. Long-term serial culture of this Tet-dependent virus led to the identification of mutants that show significantly improved performance. However, the low mutation frequency associated with HIV replication makes directed evolution using this system time-consuming and significantly limited in scope. The use of live, continuously replicating HIV also raises significant safety concerns.
The ability of retroviruses to integrate into the genome has also been exploited for delivering and stably expressing synthetic gene libraries in mammalian cells. This approach has been used with significant success to map potential mutations that confer drug resistance in various targets for cancer therapy, including BCR-ABL [51], farnesyl transferase [52], and MEK1 [53]. A similar approach was also recently used to deliver a mutant library of a bacterial phospholipase D into mammalian cells to select variants that efficiently incorporate non-natural alcohols into phospholipids for membrane labeling [54]. However, random integration of retro- or lentivirus into the host genome may lead to context-dependent variation in the expression levels of individual mutants. Additionally, to achieve a clear genotype-phenotype connection, this approach is restricted to the integration of a single copy of a mutant gene per cell. For certain gene products (e.g., tRNA) [44], such a low copy number may not provide sufficient expression levels to select for their activity.
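The single-copy constraint can be made concrete with a standard back-of-the-envelope calculation. The sketch below assumes that the number of integrated viral genomes per cell follows a Poisson distribution whose mean is the multiplicity of infection (MOI); this is a common approximation rather than something stated in the studies cited above, and the function names and MOI values are illustrative.

```python
import math

def poisson_fraction(moi, copies):
    """Probability that a cell receives exactly `copies` viral genomes,
    assuming a Poisson distribution with mean `moi` (an approximation)."""
    return math.exp(-moi) * moi ** copies / math.factorial(copies)

def single_copy_share(moi):
    """Among transduced cells (at least one copy), the fraction that
    carries exactly one copy."""
    transduced = 1.0 - poisson_fraction(moi, 0)
    return poisson_fraction(moi, 1) / transduced

for moi in (0.1, 0.3, 1.0):
    print(f"MOI {moi}: {single_copy_share(moi):.1%} of transduced cells "
          f"carry a single copy")
```

Under this approximation, keeping most transduced cells at a single copy requires a low MOI, which in turn means that most cells receive no library member at all.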
In 2018, the Shoulders group presented a novel strategy for continuous directed evolution in mammalian cells aided by adenovirus (Figure 2c) [55]. Using error-prone mutants of the adenovirus DNA polymerase (AdPol), developed earlier by Hoeben et al. [56], they were able to significantly enhance the mutation frequency of the viral genome without affecting the host genome. Introducing a BOI into this engineered adenovirus genome, which lacks AdPol and adenoviral protease (AdProt), allowed its continuous diversification to generate a mutant library in situ. To enable selection, the BOI activity was coupled to the expression of the essential AdProt gene in the host cell. As a proof of concept, AdProt expression was placed under the control of the Tet regulation system, and the TetR-based trans-activator (tTA) was evolved to gain resistance to its inhibitor, doxycycline (dox). The feasibility of connecting the activities of additional BOIs, such as a recombinase and an orthogonal aminoacyl-tRNA synthetase, to adenovirus replication was also demonstrated. The key promise of this system is the ability to perform PACE-like continuous diversification/selection in mammalian cells for the first time. The replication-deficient nature of the engineered adenovirus also makes it relatively safe. However, the large size of the adenovirus genome increases the chances of complications arising from off-target mutations. It also makes it difficult to introduce synthetic transgene libraries into the adenovirus genome.
An analogous selection system called VEGAS was recently reported by English et al. [57], which uses the alphavirus Sindbis for directed evolution in mammalian cells (Figure 2d). The BOI is introduced into the Sindbis virus genome, from which essential structural genes are removed and supplied in trans. The naturally high mutation rate of this virus (up to 10⁻³ per nucleotide) enables continuous diversification of the BOI, and its activity is coupled to the expression of the essential structural genes removed from the viral genome. Using VEGAS, it was possible to rapidly evolve tTA with doxycycline resistance, engineer the active form of GPCRs toward functional signaling endpoints (morphine), and evolve allosteric nanobodies that stabilize the active state of a GPCR. However, technical concerns about the VEGAS platform have recently been expressed by Denes et al. [58]. In their hands, continual replication led to a loss of the BOI-containing virus, as RNAs from the removed structural genes compete for packaging and are preferentially packaged. Safety concerns were also noted, since Sindbis virus is pathogenic to humans, and a single recombination event can lead to a replication-competent virus.
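For a rough sense of what such a mutation rate implies for a BOI, the sketch below computes the expected number of mutations per gene copy and the probability of at least one mutation. It assumes that mutations occur independently at every nucleotide at the upper-bound rate quoted above; the 1,000-nt gene length and the function names are hypothetical.

```python
def expected_mutations(rate_per_nt, gene_length_nt):
    """Mean number of mutations per gene copy, assuming independent sites."""
    return rate_per_nt * gene_length_nt

def prob_at_least_one(rate_per_nt, gene_length_nt):
    """Probability that a gene copy carries one or more mutations."""
    return 1.0 - (1.0 - rate_per_nt) ** gene_length_nt

rate = 1e-3         # upper-bound per-nucleotide rate quoted in the text
boi_length = 1000   # hypothetical BOI length in nucleotides

print(f"expected mutations per copy: {expected_mutations(rate, boi_length):.2f}")
print(f"P(at least one mutation):    {prob_at_least_one(rate, boi_length):.2f}")
```

Under these assumptions, a gene of about a kilobase would acquire on the order of one mutation per copy, so a replicating viral population samples new variants continuously without a separate library-construction step.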
Our group recently developed a directed evolution system in mammalian cells called VADER using adeno-associated virus (AAV) [44], which is a small human virus with no known pathogenicity. The small size of its genome, coupled with the ability to supply all of the native AAV genes in trans to support replication, makes it straightforward to introduce synthetic mutant libraries of a BOI into the AAV genome. The activity of the BOI can be coupled to the expression of an essential virus protein, such as the capsid proteins. VADER was used to evolve suppressor tRNAs for improved noncanonical amino acid (ncAA) mutagenesis in mammalian cells. VADER uses a two-step selection scheme (Figure 2e) to enable the enrichment of tRNAs that are active, but do not cross-react with host aaRSs. Introducing a TAG stop codon at a surface-exposed site of the capsid allows the proliferation only of AAV encoding active tRNA mutants. An azide-containing ncAA was used as the substrate of the orthogonal aaRS-tRNA pair; its incorporation enables bioorthogonal chemical labeling of the capsid with biotin, followed by avidin-mediated capture. Since cross-reactive tRNAs are not charged with this ncAA, they fail to undergo enrichment at this step. Some of the limitations of the AAV-mediated selection system are its moderate cargo capacity (∼4.5 kb), and the current inability to use it in a continuous manner if desired.