Chapter 11 - Springer Link

5 downloads 0 Views 2MB Size Report
infectious virus-based systems is the antiviral lectin Griffithsin. Using the Geneware® system, functional Griffithsin was expressed in N. benthamiana at a very ...
Chapter 11 Recombinant Protein Expression in Nicotiana Nobuyuki Matoba, Keith R. Davis, and Kenneth E. Palmer Abstract Recombinant protein pharmaceuticals are now widely used in treatment of chronic diseases, and several recombinant protein subunit vaccines are approved for human and veterinary use. With growing demand for complex protein pharmaceuticals, such as monoclonal antibodies, manufacturing capacity is becoming limited. There is increasing need for safe, scalable, and economical alternatives to mammalian cell culture-based manufacturing systems, which require substantial capital investment for new manufacturing facilities. Since a seminal paper reporting immunoglobulin expression in transgenic plants was published in 1989, there have been many technological advances in plant expression systems to the present time where production of proteins in leaf tissues of nonfood crops such as Nicotiana species is considered a viable alternative. In particular, transient expression systems derived from recombinant plant viral vectors offer opportunities for rapid expression screening, construct optimization, and expression scaleup. Extraction of recombinant proteins from Nicotiana leaf tissues can be achieved by collection of secreted protein fractions, or from a total protein extract after grinding the leaves with buffer. After separation from solids, the major purification challenge is contamination with elements of the photosynthetic complex, which can be solved by application of a variety of facile and proven strategies. In conclusion, the technologies required for safe, efficient, scalable manufacture of recombinant proteins in Nicotiana leaf tissues have matured to the point where several products have already been tested in phase I clinical trials and will soon be followed by a rich pipeline of recombinant vaccines, microbicides, and therapeutic proteins. Key words: Nicotiana benthamiana, Recombinant protein expression, Transient expression, Molecular farming, Plant-made pharmaceutical

1. Introduction to Plant-Made Pharmaceuticals

Since the ability to genetically engineer plants was established in the early 1980s (1), researchers have endeavored to utilize plantproduced proteins for a variety of applications, including the production of recombinant pharmaceutical proteins. The potential advantages of using plant-based expression systems include the

James A. Birchler (ed.), Plant Chromosome Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 701, DOI 10.1007/978-1-61737-957-4_11, © Springer Science+Business Media, LLC 2011

199

200

Matoba, Davis, and Palmer

ability to produce complex proteins that require posttranslational modifications, excluding the possibility of introducing human pathogens during the manufacturing process, and the ability to scale up production efficiently and cost-effectively (2–6). The field of plant-made pharmaceuticals (PMPs) has steadily evolved since the expression of a functional mouse IgG in tobacco was published in 1989 (7) to the point where several plant-produced proteins have been used in clinical trials (8–10). Having several PMP therapeutics in advanced clinical trials is a major breakthrough and has initiated more focused efforts on developing a regulatory process for approving plant-made biologics (11). A number of plant-based expression systems have been developed to express recombinant proteins. These expression systems include plant cell cultures and intact plants, and the use of both stable transformation and transient expression systems (12). Plant cell culture systems derived from plants that have been used to produce biopharmaceuticals include tobacco, tomato, soybean, rice, carrot cells, and Arabidopsis thaliana (13–15). Other cell-based systems include algae and moss bioreactors (16–18). Advantages that cell-based systems may provide compared to whole plants include the ability to produce proteins under more controlled conditions, which is important for good manufacturing practice (GMP) protein production, shorter turnaround times for production runs, and in some cases, protein purification can be less expensive due to streamlined downstream processing (15, 16). A wide variety of plant species have been tested for their ability to produce recombinant pharmaceutical proteins, including Nicotiana species, safflower, tomato, potato, soybean, alfalfa, spinach, A. thaliana, maize, and rice (4, 12). The major strategies for expressing proteins in whole plants are transient expression with viral vectors and stable transformation where transgenes are targeted to either the nuclear or chloroplast genome (3, 12, 19–23). Stable transformation offers the advantage that it is scalable to large field production methods. However, this can be offset in some cases by the lower expression level often obtained in transgenic plants compared to transient expression and the length of time required to create transgenic lines that have high expression levels that are stable across multiple generations. Nicotiana species have long been model systems in plant biology and became particularly prominent as the field of plant molecular biology and plant transformation began. Nicotiana benthamiana, in particular, has been a workhorse for studies of plant–virus interactions due to its susceptibility to a wide variety of viruses and ease of use in laboratory settings (24). Based on the detailed understanding of viral replication and protein expression obtained from decades of research from many groups, transient expression using viral vectors has become a major strategy for

Protein Expression in Plants

201

expressing proteins in plants, with the majority of current efforts focused on N. benthamiana and tobacco mosaic virus (TMV)based vector systems (19, 20, 25, 26). Transient expression in N. benthamiana offers the advantages of high expression levels, relatively short production times of days to several weeks, and ease of use in controlled growth conditions where optimal parameters for biomass production under GMP can be obtained. In addition, since N. benthamiana is not a food crop, there is less concern about contaminating food supplies. A limitation of the infectious TMV vector systems is that the expression of proteins larger than 50 kDa becomes difficult. This has been largely solved with the introduction of “deconstructed” viral vectors that can be delivered to plant cells by Agrobacterium tumefaciens, resulting in high level expression of larger proteins (20, 27). A number of different recombinant proteins have been expressed in these various plant systems including numerous antigens for vaccine production, hormones, growth factors, blood proteins, cytokines, enzymes, and antibodies (2–4, 28, 29). In many cases, these proteins retained biological activity, and at least, a few were expressed at levels sufficient to support commercial production (30). As has been observed in all prokaryotic and eukaryotic expression systems, there is significant variability in plant expression levels of specific proteins and not all proteins express well in plants. Moreover, within the arena of plant-based expression, a given protein may express much better in one system versus another (12). However, one clear conclusion that can be drawn from these studies is that plant-based expression is a viable alternative for the production of some pharmaceutical proteins. It is very likely that within the next several years the first plant-produced therapeutic protein will be approved and made available to patients. This milestone will usher in a new wave of development in PMP research and help drive further improvements in plant-based expression technologies and downstream processing of plant-derived proteins.

2. Protein Expression Methods in Nicotiana Species: Historical Overview and Progress 2.1. Stable Expression Systems

Nuclear genome transformation: Historically, transgenic plants obtained via nuclear genome transformation have been most widely used for the expression of recombinant proteins. Nuclear genome transformation of Nicotiana plants is easily accomplished by methods exploiting A. tumefaciens. A specific tissue culture procedure for this purpose has been well established, which usually takes about 3  months (31). Once transformants are established, the transferred gene is stably maintained and inherited as part of the plant chromosome in a Mendelian fashion. However,

202

Matoba, Davis, and Palmer

varying degrees of target gene expression are commonly observed for different transgenic lines (e.g., see (32)). This is due to ­random transgene integration into the chromosome in Agrobacteriummediated transformation and is commonly referred to as “position effect.” Therefore, screening multiple transgenic lines (a few dozen or more) is necessary to identify good expressors. Such interline expression variance can also occur within different progenies from the same parental line albeit with stable transgene heredity. While homozygous lines may show higher expression than heterozygous plants (33), suppressed expression is often observed due to transgene silencing (34). Furthermore, even within the same plant, there can be considerable degrees of diversity in transgene expression levels, depending on growth stage and physiological conditions; typically, young and growing tissue expresses higher levels of recombinant proteins. Therefore, even though the transgene stably exists in the system, the reality is that careful monitoring of expression along with good agricultural practice is essential to obtain a sustainable supply of the protein of interest. Typical expression levels in nuclear genome transformants usually do not exceed 1–2% of total proteins in leaf tissue (less than 100 mg/kg of fresh biomass) and are largely dependent on the transgene construct (promoter, 5¢ and 3¢ untranslated regions (UTRs), transcript stability, codon usage, etc.) and the nature of expressed proteins (size, solubility, stability, toxicity, accumulation sites, etc.). Transplastomic plants: The plastid genome can also be transformed via biolistic transgene delivery, generating a different type of transgenic plants called “transplastomic” plants (35). Although the procedure to develop transplastomic Nicotiana plants is not as straightforward as Agrobacterium-mediated nuclear genome transformation, as it takes 7–12 months (36), chloroplasts have been shown to be capable of expressing very high levels of recombinant proteins, often reaching an order of magnitude higher than those obtained with best expressors via nuclear transformation (23). The advantages of plastid-based recombinant protein expression are multifold. These include the high ploidy of the organelle, lack of gene silencing, and the availability of sitedirected transgene integration through homologous recombination. These features provide higher, more controlled, and uniform transgene expression than conventional transgenic plants based on nuclear transformation. Concurrent expression of multiple genes can be readily achieved by utilizing polycistronic operons (37–39). Furthermore, environmental concerns are alleviated by the strict maternal inheritance of the plastid in most species, including Nicotiana plants (23); thus, transgenes will not disseminate into nearby plants through pollen outflow. While many of the unique advantages of protein expression in this organelle listed above are associated with the prokaryotic

Protein Expression in Plants

203

nature of plastids, the same is also true for the system’s shortcomings. Among the major limitations is the plastid’s inability to perform posttranslational modifications such as N-glycosylation, which plays important roles in the bioactivity and pharmacokinetics of many therapeutic proteins, including monoclonal antibodies (40). Plant cell/tissue culture: Transgenic cell and tissue cultures derived from Nicotiana plants have also been explored as platforms for recombinant protein expression (41). Historically, plant cells and hairy roots have been developed for the production of various plant secondary metabolites, including food additives, nutraceuticals, and pharmaceuticals (42, 43). For cell culturebased recombinant protein production, tobacco-derived Bright Yellow-2 (BY-2) and Nicotiana tabacum-1 (NT-1) cells are frequently chosen for protein expression due to well-established transformation and propagation procedures and favorable growth characteristics (13). Meanwhile, hairy root culture represents a type of tissue culturebased protein expression (44, 45). Neoplastic, clonally expanding roots are readily developed by transformation of preestablished transgenic plants expressing a protein of interest with Agrobacterium rhizogenes. The advantages of these in vitro systems over whole-plant systems are mainly attributed to their greater amenability to controlled production conditioning for consistent product quality and yield, similar to that obtained in microbial fermentations and mammalian cell culture systems. From a product development standpoint, such features might make regulatory approval easier to obtain for plant culture systems than whole-plant systems. Indeed, the world’s first licensed, plantmade protein pharmaceutical is a vaccine antigen for poultry Newcastle disease expressed in tobacco cell culture; the first product for human use will likely be glucocerebrosidase, which is produced in carrot suspension cells for Gaucher’s disease (46). However, tight regulation in a closed system in turn limits production scalability, which is one of the most attractive advantages of whole-plant-based systems. Obvious challenges for plant culture systems would, therefore, be the efficiencies of protein expression and recovery. In this context, a critical factor determining the feasibility of the culture systems would be whether the protein of interest can be efficiently secreted in culture medium or not because this will significantly impact the complexity and cost of downstream processing (41). 2.2. Transient Expression Systems

Each of the above transgenic-based expression systems has its own unique advantages. Nevertheless, transgenic approaches require long development times, which have been a major bottleneck not only for their progress but also for technical improvements. As a result, a general concept for plant-based protein expression has been “low and slow,” a few successful cases aside.

204

Matoba, Davis, and Palmer

Meanwhile, the recent advent of highly efficient transient expression systems has completely changed the concept and revolutionized plant-made pharmaceutical research. Whole recombinant virus-based expression: Transient expression of recombinant proteins in Nicotiana plants is currently performed by the use of engineered infectious plant viruses or Agrobacteriummediated DNA transfer (agroinfiltration). Over the last decade, many plant virus vectors have been engineered for the expression of foreign proteins in Nicotiana plants. Most of these are based on RNA viruses such as tobacco mosaic virus (TMV) and potato virus X (PVX) (19). The first generation plant virus vectors utilize infection-competent viruses, ­represented by the modified TMV-based Geneware® system (Kentucky BioProcessing, LLC, Owensboro, KY). In essence, such a vector is comprised of the viral cDNA harboring a gene of interest either as a fusion to viral coat proteins (CPs), mainly for epitope presentation as vaccine antigen (47), or placed downstream of an additional subgenomic promoter (19, 48). Viruses are inoculated into the leaf initially as infectious RNA, which is created from the vector either through in  vitro transcription or agroinfiltration followed by in planta transcription (48). Thus, the protein of interest is coexpressed along with systemic viral spread and replication, with maximal expression usually obtained within 2–3  weeks postinfection. A recent notable example of recombinant proteins expressed by infectious virus-based systems is the antiviral lectin Griffithsin. Using the Geneware® system, functional Griffithsin was expressed in N. benthamiana at a very high level, reaching as high as 5 g of the protein per kg of leaf biomass (30). Such high levels of expression with this type of virus vectors are, however, usually limited to small proteins whose coding sequences are less than 1.5 kb. This is due to the increased genetic instability of recombinant viruses carrying a larger foreign sequence (20, 49). Despite this limitation, this method offers a viable option for the mass production of small proteins such as antiviral lectins and monoclonal antibody single-chain variable fragments (scFvs). Deconstructed virus-based expression: A major breakthrough in viral expression strategies was facilitated by the recent advent of deconstructed virus vectors, originally reported for the TMVbased magnICON® system, developed by ICON Genetics GmbH (Halle, Germany) (50). The essence of improvements in this system from the first generation viral vectors are: (1) deletion of the viral CP gene to enhance the stability and size compatibility of a transgene, (2) viral cDNA modifications facilitating in planta RNA replicon recovery upon Agrobacterium-mediated DNA transfer, and (3) efficient whole-plant vector delivery by vacuumbased agroinfiltration (“magnifection”) to compensate for defective systemic movement due to CP deletion (27, 51). These improvements allowed the uniform and high-level (gram per kg

Protein Expression in Plants

205

biomass) expression of larger proteins in N. benthamiana plants within 10  days. ICON Genetics further developed a similar deconstructed viral vector system based on PVX. Taking advantage of the fact that TMV and PVX do not compete during ­replication, fully assembled immunoglobulin (Ig)G molecules were expressed at up to 0.5 g per kg of leaf by codelivering deconstructed TMV and PVX vectors (each encoding a gene for Ab heavy or light chains) (52). This technology may provide the most rapid means among all currently available recombinant expression systems for the production of full length monoclonal antibodies from genes in various production scales ranging from bench to commercialization (53). A potential limitation of the magnifection method is that it is technically challenging to scale up; however, this impediment has recently been solved by ­development of a robotic magnifection system by Kentucky BioProcessing, LLC. Nonviral vector-based expression: Agroinfiltration with conventional nonviral binary vectors, on the other hand, had been primarily used for analytical purposes before constructing transgenic plants (e.g., see (54)). However, progress made in recent years now allows even these vectors to express proteins at higher levels with agroinfiltration compared to transgenic plants. One of the key factors for high expression with agroinfiltration-delivered nonviral vectors appears to be the coexpression of a posttranscriptional gene silencing (PTGS) inhibitor such as tomato busy stunt virus-derived p19 and potyviral helper component proteinase (HC-Pro) (55–60). Other important factors of successful Agrobacterium-based transient expression include the strain of Agrobacterium, density of the bacteria, infiltration media, infiltration condition, and the plant’s physiological condition (61, 62), along with vector and transgene design. Detailed protocols of agroinfiltration-based protein expression have been recently published (63, 64). A notable example of agroinfiltration-based nonviral transient protein expression can be seen in a recent report by Vezina et  al. (55). Up to 1.5  g of full-size, assembled IgG was expressed in 1 kg of N. benthamiana leaf in 4–6 days by coexpressing the heavy and light chains, HC-Pro, and a chimeric human b1,4-galactosyltransferase (GT) by infiltrating a mixture of four Agrobacterium strains, each delivering either of four constructs. Notably, GT coexpression led to not only the addition of the terminal b1,4-galactose residue but also the efficient reduction of the plant-specific glycotope structure (a1,3-fucose and b1,2-xylose) on the IgG N-glycans, making them more “humanlike” (55). Very recently, a series of new highly efficient agroinfiltration expression vectors (pEAQ vectors) has been constructed based on a conventional binary vector containing cauliflower mosaic virus 35S promoter, and modified 5¢-UTR and the 3¢-UTR from Cowpea mosaic virus RNA-2 within the T-DNA

206

Matoba, Davis, and Palmer

region (65). These vectors were shown to express multiple polypeptides along with P19 from a single plasmid at a high level within a few days. As exemplified above, agroinfiltration currently provides a key technique for efficient transient expression of recombinant proteins in Nicotiana plants. The above-mentioned magnifection method allows simple scale-up of protein production based on these transient systems even to a commercial scale. In theory, grams of recombinant proteins can be obtained within the shortest period of time (days) among all currently available recombinant production systems (53). Such a rapid turnaround from DNA to product may provide a powerful tool for those proteins that need to be produced in a very short time, such as influenza vaccines, biodefense agents, and individualized cancer vaccines (20). Combined with other unique advantages of plant expression systems (i.e., cost-effectiveness, safety, eukaryotic protein modification), transient plant expression can now offer a robust and efficient method to produce valuable proteins. Given all the marked progress in transient expression systems illustrated above, the main stream of recombinant protein expression in Nicotiana plants has almost entirely moved away from constructing transgenic plants. However, transgenic plants still provide important resources to recombinant protein production. One particularly important application would be the engineering of expression hosts, thereby providing optimal features for protein production in terms of quantity and quality. 2.3. Host Engineering for Optimal Protein Expression

There is no perfect biological system for recombinant protein expression. In other expression systems available today, it has been a common practice to engineer hosts for optimal protein production, along with expression vector design (66–69). Efforts in this regard have just begun in the field of plant-based protein expression. Below are some of the plant-host factors targeted for improvement to enable improved protein expression. Host species: An ideal host should support high-level foreign protein expression while providing large biomass with easy and rapid growth profiles in both the greenhouse and field. Currently, N. benthamiana is used as a primary host for recombinant virus and agroinfiltration-based transient expression. The plant exhibits high susceptibility to many plant pathogens (24), which may be contributing to efficient expression with transient systems. However, N. benthamiana has a relatively small biomass and does not grow well in the field compared to other Nicotiana species. Large Scale Biology Company created a hybrid line between Nicotiana excelsior and N. benthamiana (N. excelsiana), which was shown to support high-level expression with the Geneware® system while maintaining good field growth properties (47, 70). Sheludko et  al. recently tested several

Protein Expression in Plants

207

Nicotiana species (N. benthamiana, Nicotiana debneyi, N. excelsior, Nicotiana exigua, Nicotiana maritima, and Nicotiana simulans) for Agrobacterium-mediated transient expression of a model protein, i.e., green fluorescent protein (GFP) with the magnICON® deconstructed TMV vector and a nonviral construct based on cauliflower mosaic virus 35S promoter (71). The authors concluded that N. excelsior has the best characteristics in terms of biomass and GFP accumulation level for both types of the expression vectors. PTGS: Transgene-specific PTGS often leads to poor expression in plants. As described above, coexpression of a PTGS inhibitor often improves levels of protein expression. Creating a host line expressing such an inhibitor would theoretically circumvent the need of codelivering vectors for the PTGS inhibitor and a target protein upon transient protein expression. However, P19expressing transgenic plants exhibit altered morphology because PTGS is actively involved in plant development (72, 73). A potential solution may be the control of PTGS inhibitor expression, for example, by the use of an inducible promoter (74). Glycosylation: As described above, glycans often found on human endogenous bioactive proteins have important roles in their bioactivities and pharmacokinetics, constituting an important factor for their recombinant production in heterologous hosts. For example, in the case of N-glycans on the Fc fragment of IgG, the presence and absence of terminal sugars such as sialic acid, core fucose, bisecting N-acetylglucosamine, and mannose residues impact the longevity and effector functions such as complement-dependent cytotoxicity and Ab-dependent cell-mediated cytotoxicity (75). Glycoengineering is therefore a major focus of protein expression systems with virtually all organisms, as none of the currently available systems (including mammalian cells) can provide a uniform glycoform that is completely human-like that possesses desirable biological properties (40). For plant-expressed therapeutic proteins, a concern has been the presence of the plantspecific a1,3-fucose and b1,2-xylose structure and the lack of ­terminal b1,4-galactose and sialic acid on complex N-glycans (76). In particular, the plant-specific glycan structure can be immunogenic and reduce bioavailability upon human use (77,  78). Thus, many efforts have been centered on reducing plant-specific glycotopes by targeting a1,3-fucosyltransferase (FT) and b1,2-xylosyltransferase (XT) and further humanizing by adding the terminal b1,4-galactose via expressing GT (76). Recently, Strasser et  al. have used RNA interference (RNAi) ­technology to construct transgenic N. benthamiana that possesses downregulated FT and XT activities (DFT/XT). A monoclonal antibody transiently expressed in this plant with a nonviral expression vector via agroinfiltration was shown to contain an almost homogeneous N-glycan without detectable b1,2-xylose

208

Matoba, Davis, and Palmer

and a1,3-fucose residues (79). More recently, the same group has overexpressed GT in the DFT/XT either stably (via nuclear genome transformation) or transiently (via agroinfiltration). A monoclonal antibody expressed in these plants was shown to contain terminal b1,4-galactose and lack b1,2-xylose and a1, 3-fucose residues in the N-glycan structure (80). Other plant-derived factors influencing protein production: Plants contain numerous endogenous factors (e.g., phenolics and proteases) that potentially affect the yields, homogeneity, and quality of recombinant proteins before or after extraction (81). Particularly, proteolysis often accounts for low expression and recovery of recombinant proteins. As has been done with E. coli and yeast (66, 82, 83), constructing protease-deficient or proteasedeactivated hosts might be a potential approach to this issue. A built-in protease inhibitor system was reported with transgenic potato plants constitutively expressing tomato cathepsin D inhibitor or bovine aprotinin, which was shown to stabilize a foreign protein (neomycin phosphotransferase II) in leaf tissue (84). Protease activity might be downregulated by RNAi. However, such approaches may not be as simple as would be in monocellular organisms like E. coli and yeast, given the higher complexity of proteases and their roles in plant development, growth, and homeostasis (81). Alternatively, protease inhibitors or silencing constructs may be coexpressed with the protein of interest using transient expression methods described above. Such a strategy has been proposed (85). Other possible targets of host engineering for enhanced protein expression may be the coexpression of molecular chaperones (21, 86). These would thereby promote correct protein folding and help avoid the stress response associated with misfolded protein accumulation (87).

3. Tips for Successful Protein Expression in Nicotiana Plants

As described above, significant progress has been made over the last few years with respect to in planta expression vectors and methodologies. Plant expression technologies have reached a point where many proteins can be expressed at levels within days that approach ~5 g/kg leaf biomass. With such robust expression vectors becoming available, researchers will soon only need to clone a gene of interest into an appropriate expression vector and follow established procedures to obtain complex bioactive proteins of interest. However, even with the cutting-edge expression vectors, high-level expression is not guaranteed for all proteins, as is the case with any recombinant expression system. Protein expression levels would be reflected not only by transcription efficiency but also by transcript stability, translation efficiency as well

Protein Expression in Plants

209

as protein stability. The nature of a transgene can have a significant impact on transcript stability and translation efficiency. Meanwhile, poor expression may be due to the protein’s high susceptibility to proteolysis, potential toxicity, or physiological incompatibility with plant cells. Last but not the least, there are important factors to consider for the extraction and purification of proteins expressed in plants. These are discussed in this section. 3.1. Transgene Design

Recombinant protein expression in heterologous hosts is often inefficient due to nonoptimal transgene construction, which could compromise mRNA stability and/or protein translation efficiency (66, 88). Given the current availability of quick and cost-effective gene synthesis services, it is preferable whenever possible to redesign the entire coding sequence to maximize the likelihood of high protein expression. Several companies offer gene synthesis services using proprietary algorithms to optimize the factors described below toward efficient transgene expression in a given organism. One of the most important factors is codon usage. Different organisms tend to have a particular preference for one or a small set of many synonymous codons for a given amino acid (89). Codon usage reflects the availability of cognate amino-acylated tRNAs and thus can correlate with translation efficiency (89, 90). Codon frequencies in Nicotiana plants can be found at h t t p : / / w w w. k a z u s a . o r. j p / c o d o n / c g i - b i n / s p s e a r c h . cgi?species=nicotiana&c=i. For example, among the four alanine codons, GCG occurs at the lowest frequency (9%) in N. benthamiana, according to the database, and is therefore considered “nonpreferred.” Avoiding such rare codons would be theoretically important for codon optimization. However, a caveat for codon optimization across the whole coding sequence is that rare codons at certain positions may play an important role in proper protein folding by regulating local translational efficiency (91–93). Meanwhile, it is well known that the sequence surrounding the AUG start codon constitutes an important factor in the control of mRNA translation efficiency in eukaryotes. In addition to the plant-specific Kozak-like consensus sequence (94, 95), the sequence and secondary structure downstream of the initiation codon may also influence the transcript’s translational efficiency and stability (91, 96–98). Removing potential mRNA modification sequences constitutes another factor for consideration toward transcript stability. For example, bacterial genes tend to be AT-rich, whereas many plant genes have a higher GC content (88). AT-rich sequences may contain potential polyadenylation signals such as AATAAA and RNA destabilizing elements such as ATTTA (88). These could lead to poor accumulation of intact transcripts upon expression in plant cells, although they are not

210

Matoba, Davis, and Palmer

always functionally active (91). Information about some of these cryptic sequences is available (88). Usually, codon optimization helps remove many of such cryptic sites. However, the list is not comprehensive, and unidentified deleterious sequences may still exist, thus requiring a detailed analysis if transcript quality is suspected for poor protein expression. Finally, mRNA’s secondary structure may also need attention in terms of a potential target for silencing (99); however, this may be addressed by the coexpression of a silencing inhibitor. 3.2. Strategies to Improve Protein Accumulation

Poor in planta protein accumulation due to stability or toxicity issues can be sometimes addressed by targeting to a specific organelle such as the apoplast, endoplasmic reticulum (ER), chloroplast, or vacuole through specific signal peptides attached to the protein of interest. Detailed discussion on the effects of subcellular targeting is provided in recent review articles (100, 101). Each cellular compartment has a different influence on the folding, assembly, posttranscriptional modification, and metabolism for proteins. Therefore, an optimal accumulation site for a given protein has to be determined experimentally. However, proteins that require special posttranslational modification such as glycosylation must be targeted to the endomembrane systems, thus limiting potential options. For such proteins, the ER has been shown to be able to support higher accumulation levels (100, 101), although targeting expression to this cellular compartment will limit N-glycosylation primarily to high-mannose types (e.g., (32)). In conventional recombinant protein expression systems, a wide range of translational fusion partners has been developed to enhance expression, solubility, and stability of the passenger protein (66). Although not yet extensively investigated, such a strategy may also improve the protein accumulation in plants. A unique approach has been recently proposed, in which a recombinant protein is designed to be accumulated within a highly stable ER-derived protein body in leaf tissue by fusing a unique sequence repeat of elastin-like polypeptides (102) or a sequence derived from prolamin seed storage proteins (103, 104). A potential challenge for these fusion strategies is the requirement for the cleavage of the fusion partner upon purification, which could significantly increase the downstream processing costs.

3.3. Extraction and Purification

Extraction of recombinant proteins from Nicotiana leaves poses a number of challenges that originate in the physical and biochemical environment of the leaf. Plant cell walls and extensive endomembrane structures provide the first purification challenge. They encapsulate a cytoplasm rich in alkaloids and polyphenolic compounds such as flavonoids and tannins, and a wide array of protease activities that can complicate downstream

Protein Expression in Plants

211

purification of the desired recombinant protein products. Obviously, the level of accumulation of the protein of interest is critically important, with higher accumulation levels reducing the purification burden. Several groups have explored methods that avoid disruption of the cellular structure to harvest the secreted proteome that contains the target protein. Julian Ma and colleagues have recently described methods for recovery of monoclonal antibodies and an antiviral lectin (cyanovirin-N) from root exudates of plants grown hydroponically (105). This approach has significant appeal, since it allows continuous production and harvest from live plants, and given that the complexity of the root exudate proteome is low, a protein that is expressed at relatively high levels could be purified by conventional filtration and chromatography methods with relative ease. Proteins that are targeted into the secretory pathway in Nicotiana leaves usually accumulate in an appropriately folded form in the interstitial space between plant cells. The soluble protein complement in the interstitial space is comparatively simple, and the secreted recombinant protein can be one of the most abundant proteins present (e.g., see (10, 106–108)). Extraction of the protein component of the interstitial space relies on gentle treatment of the leaf material to limit cell breakage: intact leaves are harvested from plants expressing the secreted recombinant protein of interest and infiltrated with extraction buffer under vacuum. The secreted proteome, hopefully including the recombinant protein, is then recovered in the interstitial fluid (IF) by centrifugation under low gravitational forces that limit tissue damage. This method has been used to recover a wide array of secreted proteins from Nicotiana leaf tissues, from small proteins such as scFvs, lysozyme, and aprotinin, through molecules as large as monoclonal antibodies. Key examples are published by McCormick et al. (10, 106, 107) and by Du et al. (108). In our experience, the target protein may represent at least 10% and frequently more than half of the secreted protein component recovered from the IF extract, greatly reducing the purification burden and facilitating purification by conventional affinity, ion exchange, or hydrophobic interaction chromatography methods. The methods that describe the IF purification procedure are described in the United States Patent literature in several US Patents entitled “Method for recovering proteins from interstitial fluid of plant tissues”; these are available at http://www.uspto.gov. Kentucky Bioprocessing LLC has invented an apparatus that is capable of processing multikilogram amounts of Nicotiana leaf tissue for recovery of secreted proteins, which facilitates the IF process at industrial scale (see US Patent 6,284,875 entitled “Centrifuge for extracting interstitial fluid”).

212

Matoba, Davis, and Palmer

For proteins targeted to the cytoplasm or retained in the ER, and in instances where IF extraction of secreted proteins is economically unfeasible, it is necessary to extract the whole soluble protein component. When designing an optimal extraction and purification strategy of a target recombinant protein, it is important to investigate as thoroughly as possible the biochemical and biophysical properties of the native protein. Answers to questions such as (1) what is the protein’s isoelectric point (pI), (2) does it contain disulphide bonds, (3) does it prefer a certain ionic strength and pH of buffer to maintain solubility, and (4) does it tolerate heat will help guide the design of the initial extraction process. Some general rules about plant protein extraction should be considered. In the first instance, plant cells are replete with proteases and phenolics that will hamper downstream purification and product quality if not addressed early in the extraction procedure. It is generally unhelpful to use reagents that are costly, overly toxic, or which present a complicated disposal problem, in the initial laboratory extraction procedure, since this will render the protocol difficult to scale-up. The protein extract is derived by grinding leaf material with buffer and must be separated from the fibrous material by facile filtration processes (at bench scale) or by continuous flow centrifugation (at industrial scale). For process scale-up, it is important to keep extract volumes as low as possible, so we often strive to use buffer to leaf ratios as low as 0.5:1.0, and preferably not greater than 1:1. It is also desirable to keep the pH of the extraction buffer below 5.0 because the majority of plant proteins are insoluble at this pH, in particular the proteins involved in the photosynthetic complex such as the large and small subunits of RUBISCO, which can be challenging to eliminate during purification of the target proteins. Oxidized phenolics and RUBISCOassociated membranous structures can very rapidly foul filtration membranes and chromatography columns, so it is vital that green (RUBISCO-associated membranes) and brown (polyphenols) color is eliminated from the protein extract. A low pH buffer (