Expression and structure-function characterisation of herpesviral proteins

Sue-Li Dahlroth


Doctoral thesis at Stockholm University Department of Biochemistry and Biophysics ©Sue-Li Dahlroth, Stockholm 2008

ISBN 978-91-7155-755-1 pp 1-76 Printed in Sweden by Universitetsservice AB, Stockholm 2008 Distributor: Department of Biochemistry and Biophysics, Stockholm University Papers I-III are reprinted with permission from the publisher.


Abstract

Human viruses coexist with their hosts, sometimes silently and sometimes by causing a vast range of symptoms. To fully understand these seemingly simple particles, how they have evolved and their pathogenesis, and to be able to develop new drugs and potentially new vaccines and diagnostic tools, we need to study the individual viral proteins both functionally and structurally. In order to determine and study a protein structure, large amounts of the protein are needed. The easiest way to obtain a protein is to recombinantly overexpress it in the well-studied bacterium Escherichia coli. However, this expression host has one major disadvantage: overexpressed proteins might be misfolded or insoluble. Within the field of structural genomics, protein production has become one of the most challenging problems, and the recombinant overexpression of viral proteins in particular has proven to be very difficult.

The first part of the thesis concerns the recombinant overexpression of troublesome proteins in E. coli. A new method has been developed to screen for soluble overexpression in E. coli at the colony level, making it suitable for screening large gene collections. This method was used to successfully screen deletion libraries of troublesome mammalian proteins as well as complete ORFeomes from five herpesviruses. As a result, soluble expression of previously insoluble mammalian proteins was obtained, as well as crystals of three proteins from two oncogenic human herpesviruses, all linked to DNA synthesis of the viral genome.

The second part of the work presented here concerns the structural studies of three herpesviral proteins. SOX from Kaposi’s sarcoma associated herpesvirus is involved in processing and maturation of the viral genome. Recently SOX has also been implicated in host shutoff at the mRNA level. With this structure, we propose a substrate binding site and a likely exonucleolytic mechanism. The holoenzyme ribonucleotide reductase is solely responsible for the production of deoxyribonucleotides and regulates the nucleotide pool of the cell. The structure of the small subunit, R2, has been solved for both Epstein Barr virus and Kaposi’s sarcoma associated herpesvirus. Both structures show disordered secondary structure elements in their apo and mono-metal forms, located close to the iron binding sites. This resembles the p53-induced R2 and indicates that these two R2 proteins might play a similar and important role.


Table of Contents

INTRODUCTION
HUMAN HERPESVIRUSES
  HERPESVIRUS STRUCTURE
  HERPESVIRUS SUBFAMILIES
  ALPHA HERPESVIRUSES
  BETA HERPESVIRUSES
  GAMMA HERPESVIRUSES
STRUCTURAL GENOMICS
PROTEIN PRODUCTION
  PHYSICAL PARAMETERS
  FUSION PROTEINS AND PROTEIN TAGS
  CONSTRUCT DESIGN
SCREENING FOR RECOMBINANT SOLUBLE OVEREXPRESSION
  COLONY SCREENING METHODS
  THE COFI BLOT (PAPER I AND III)
LIBRARY METHODS
  DELETION LIBRARIES SCREENED WITH THE COFI BLOT (PAPER II)
EXPRESSION SCREENING COMPLETE GENOMES
STRUCTURAL GENOMICS AND HUMAN PATHOGENS
  THE DAILY SCOOP (PAPER IV)
  SCOOP AND OTHER HERPESVIRUS ORFEOMES
MOVING FURTHER DOWN THE PIPELINE
HOST SHUTOFF IN HHV
  SOX
  STRUCTURAL STUDIES OF THE SOX PROTEIN FROM KSHV (PAPER V)
  SUBSTRATE BINDING
  SOX AS AN RNASE?
NUCLEOTIDE SYNTHESIS
  RIBONUCLEOTIDE REDUCTASE
  STRUCTURAL STUDIES OF THE R2 SUBUNIT OF THE RIBONUCLEOTIDE REDUCTASE FROM EBV AND KSHV (MANUSCRIPT IN PREPARATION)
FUTURE PROSPECTS
ACKNOWLEDGEMENTS
REFERENCES


List of papers

This thesis is based on the following papers, referred to in the text by their roman numerals.

I. Cornvik, T., Dahlroth, S-L., Magnusdottir, A., Herman, M.D., Knaust, R., Ekberg, M. and Nordlund, P. Colony-Filtration blot, A new screening method for soluble protein expression in E. coli. Nature Methods. 2005, 2(7):507-9.

II. Cornvik, T., Dahlroth, S-L., Magnusdottir, A., Flodin, S., Engvall, B., Lieu, V., Ekberg, M. and Nordlund, P. An efficient and generic strategy for producing soluble human proteins and domains in E. coli by screening construct libraries. Proteins: Structure, Function and Bioinformatics. 2006, 65(2):266-73.

III. Dahlroth, S-L., Nordlund, P. and Cornvik, T. Colony filtration blot for screening soluble expression in Escherichia coli. Nature Protocols. 2006, 1(1):253-8.

IV. Dahlroth, S-L., Lieu, V., Haas, J. and Nordlund, P. Screening Colonies of Pooled ORFeomes, SCOOP: A rapid and efficient strategy for expression screening ORFeomes in E. coli. Submitted.

V. Dahlroth, S-L., Gurmu, D., Schmitzberger, F., Erlandsen, H. and Nordlund, P. Structure of the shutoff and exonuclease protein from the oncogenic Kaposi’s sarcoma associated herpesvirus. Manuscript.


Abbreviations

2D-gel      Two-dimensional gel
AE          Alkaline exonuclease
AIDS        Acquired immune deficiency syndrome
CAT         Chloramphenicol acetyltransferase
CMV         Cytomegalovirus
CoFi blot   Colony filtration blot
EBV         Epstein Barr virus
GFP         Green fluorescent protein
HHV         Human herpesvirus
HIV         Human immunodeficiency virus
HSV-1, 2    Herpes simplex 1 and 2
IMAC        Immobilised metal affinity chromatography
IPTG        Isopropyl β-D-1-thiogalactopyranoside
KS          Kaposi’s sarcoma
KSHV        Kaposi’s sarcoma associated herpesvirus
MCD         Multicentric Castleman’s disease
mCMV        Murine cytomegalovirus
MS          Mass spectrometry
NMR         Nuclear magnetic resonance
ORF         Open reading frame
ORFeome     The complete collection of ORFs from one organism
PEL         Pleural effusion lymphoma
RNR         Ribonucleotide reductase
SAD         Single wavelength anomalous dispersion
SCOOP       Screening colonies of ORFeome pools
SG          Structural genomics
SOX         Shut off and exonuclease
VZV         Varicella zoster virus
WHO         World Health Organisation
UNICEF      United Nations Children’s Fund


Introduction

Viruses are small biological entities, existing on the border of life as we define it. They cannot survive on their own and behave as molecular parasites, making use of their hosts’ cellular machinery to create infectious progeny. They come in many different and diverse forms and have been around since ancient times, evolving with their surroundings (1). Viruses are divided into families, subfamilies, genera and strains and can vary in both shape and size. The smallest known viruses belong to the Parvoviridae family, with a size of 20-25 nm, and the largest known virus is the Mimivirus at 400 nm (2). They carry their genetic material, from only a few genes to hundreds, as either DNA or RNA, which is encapsulated in a protective layer of proteins (the capsid) and sometimes a membrane consisting of lipids and proteins. Proteins in this outer shell determine their mode of infection and which hosts they infect, be it bacteria, plants, fungi, animals or humans (1, 2).

Figure 1 Pictures¹ of different viruses taken with electron microscopy. a) Adenovirus (~90-100 nm) b) Bacteriophages (~20-200 nm) c) Herpesvirus (~200 nm) d) Hepatitis C virus (50 nm) e) The Ebola virus (~80 nm).

¹ Pictures are part of the public domain and under no copyright restrictions. http://www.wikipedia.org


Certain types of viruses that infect plants can destroy entire harvests of certain crops, causing enormous economic damage each year (3). In humans, viral infections can be lifelong or temporary and can cause conditions ranging from the common cold, diarrhoea, the flu, chicken pox and measles to hepatitis, polio, cancer, AIDS (acquired immune deficiency syndrome), encephalitis and Ebola hemorrhagic fever. Viruses can roughly be divided into DNA or RNA viruses depending on how they carry their genetic material. For both classes, the genome can be double stranded (ds) or single stranded (ss), and circular or linear. The life cycle of a virus (Figure 2) can be divided into several stages: attachment to the target cell, entry (by endocytosis, fusion or genetic injection), replication and shedding (the process by which new viral particles leave the cell). Shedding occurs through lysis, budding, apoptosis or exocytosis (2). After host cell entry, DNA viruses must move their genome into the host cell’s nucleus, the site of DNA replication and transcription. The RNA of RNA viruses can either remain in the cytoplasm, which then becomes the scene of the viral life cycle, or be converted into DNA that moves into the nucleus and integrates into the host’s genome. These latter types of RNA viruses, of which the best known is HIV (human immunodeficiency virus), are called retroviruses and cause lifelong infections. An interesting fact is that up to 8% of the human genome is believed to consist of remnants of retroviral infections (4), and although we carry what are referred to as proviruses in our genome, they do not, as far as we know, cause disease. It is, however, not only retroviruses that can cause lifelong infections. Many additional elements determine whether or not a viral infection will persist, such as the target cell, the individual’s immune system and how the viral genome is maintained and replicated once inside its host. For instance, herpesviruses are relatively large dsDNA viruses infecting a wide range of host cells. Herpesviral infections are lifelong due to the target cell type, genome maintenance and a cunning strategy to evade the immune system (5).



Figure 2 The general life cycles of viruses from attachment to shedding. a) A DNA virus that enters through fusion. The DNA is transported to the nucleus, where it is replicated and transcribed. The mRNA is transported to the cytoplasm and translated into viral proteins. New virions are shed through exocytosis. b) An RNA virus enters through endocytosis and is uncoated. The RNA is either c) translated directly into viral proteins, from which new virions are made, or d) reverse transcribed into DNA that enters the nucleus and integrates into the host genome. This provirus is transcribed and translated to produce virions that are shed through budding. e) The RNA genome is injected into the host cell and is translated into viral proteins, which assemble into new infectious virions that are shed through host cell lysis.

Since viruses are the causative agents of numerous mild conditions like colds, but also of very severe diseases like cervical cancer, Burkitt’s lymphoma and liver cancer, they are intensely studied. Their mode of infection, pathogenesis and epidemiology, as well as their molecular structure and cellular interactions, are of huge interest, with the goal of developing diagnostic tools, vaccines and antiviral drugs. As momentous as the discovery and development of penicillin and other antibiotics as a treatment for bacterial infections, beginning in the late 1920s, was the discovery of vaccination (from the Latin word vacca, meaning cow) in the late 18th century. In 1796 Edward Jenner used the cowpox virus to vaccinate humans, which resulted in protection against the two smallpox viruses that can cause blister-like scarring of the face, blindness and even death. Smallpox was officially declared eradicated in 1979. Polio, caused by the poliovirus, was for a very long time a dreaded childhood disease that can cause paralysis, meningitis and even death. Vaccines against polio were developed by Jonas Salk in 1952 and Albert Sabin in 1962. Since the start of a global vaccination effort in 1988 by WHO, UNICEF and the Rotary Foundation, the number of reported cases has dropped from hundreds of thousands to only thousands each year². Several continents are today declared polio-free and global eradication has been proposed, although the disease still persists in some developing countries (6). Even though many viral infections can be prevented by vaccination, there are still many viruses, such as HIV, hepatitis C virus and dengue virus, for which vaccines do not exist.

Human herpesviruses

An immense number of books and scientific articles have been written about a wide range of topics concerning herpesviruses, from the overall virion structure, the mode of transmission and prevalence in ethnic and social groups to the regulation of specific proteins in an infected cell. The purpose of this thesis is not to give a complete introduction to herpesviruses, but a brief overview with enough information for the reader to understand the importance of this work and to get a general grasp of the complex relationship between these viruses and their hosts. To date, hundreds of herpesviruses have been identified, but only 8 are known human pathogens. For the remainder of this thesis the focus will be on the human herpesviruses, which will be referred to by their common names or abbreviations (Table 1). All herpesviruses are large dsDNA viruses that share a common overall structure and a similar life cycle. Depending on their mode of infection, the length of the life cycle and their target cells, they are divided into subfamilies (2, 5). These subfamilies are further divided into genera, although the genera will not be referred to in this text.

² http://www.who.int/en/


Table 1 A list of the 8 human herpesviruses, their subfamily, formal names, common names and abbreviations. In the text, they will henceforth be referred to either by their common name or by the abbreviations in column 5, except for HHV-6 and HHV-7.

Subfamily           Formal name           Abbrev   Common name                              Abbrev
Alpha herpesvirus   Human herpesvirus 1   HHV-1    Herpes simplex virus-1                   HSV-1
Alpha herpesvirus   Human herpesvirus 2   HHV-2    Herpes simplex virus-2                   HSV-2
Alpha herpesvirus   Human herpesvirus 3   HHV-3    Varicella-zoster virus                   VZV
Gamma herpesvirus   Human herpesvirus 4   HHV-4    Epstein-Barr virus                       EBV
Beta herpesvirus    Human herpesvirus 5   HHV-5    Human cytomegalovirus                    HCMV
Beta herpesvirus    Human herpesvirus 6   HHV-6    -                                        -
Beta herpesvirus    Human herpesvirus 7   HHV-7    -                                        -
Gamma herpesvirus   Human herpesvirus 8   HHV-8    Kaposi’s sarcoma-associated herpesvirus  KSHV

Most of what we know today about herpesviruses is based on studies of herpes simplex virus-1 (HSV-1), due to its early identification and easy cultivation in cell cultures. Recently, however, significant progress has been made in understanding the structure, biology and pathogenesis of the other human herpesviruses.

Herpesvirus structure

As already mentioned, the herpesvirus family is a group of dsDNA viruses, which are ~200 nm in diameter with a genome size of ~130-250 kbp (70-170 genes). What also unites herpesviruses is the common architecture of the infectious particles (Figure 3). In the infectious virion, the genome is linear and is wrapped around a core of proteins. This genome is contained within an icosahedral shell, the capsid, which is made up of two types of oligomeric proteins, hexon and penton capsomers. Surrounding the capsid is the poorly characterised tegument, an amorphous mass consisting of various essential and non-essential proteins that are delivered to the cell at the very initial stage of infection (5). Amino acid sequencing and MS analyses have been carried out to determine the protein content of the tegument. Apart from cellular proteins (that might be specifically or non-specifically packaged into the virions) (7-9), the tegument contains virus-encoded proteins, more than 20 for HSV-1 and more than 30 for HCMV, that aid in viral replication and immune evasion (8, 10, 11). The tegument is surrounded by a membrane of lipids and glycoproteins (each of the herpesviruses encodes a set of 20-80 glycoproteins) used for target cell recognition, attachment and entry (5, 12, 13).

Figure 3 Schematic picture of the overall structure of a herpesvirus. The DNA core is surrounded by the capsid, the tegument and a lipid envelope containing glycoproteins.

Herpesvirus subfamilies

All herpesviruses belong to one of three subfamilies, alpha, beta and gamma, depending on their host range, length of reproductive cycle and target cells. In addition to their overall structure, the three herpesvirus subfamilies share the mode of infection and life cycle (5, 14). A herpesvirus life cycle has two distinct phases, a latent and a lytic phase. After attachment to the target cell, the lipid membrane fuses with the cellular membrane and the capsid and tegument proteins are released into the cell. The capsid is disassembled and the DNA is transported into the nucleus, where it circularises into an episome. The viral episome is replicated and maintained alongside the host genome. Only a very small set of genes is expressed during the latent phase, and their products block apoptotic pathways and aid in immune evasion and genome maintenance. During the latent phase, the infected individual shows no symptoms of infection. At a given signal, for instance a weakened immune system due to a cold, the virus is reactivated and goes into the lytic phase. The molecular signals that cause reactivation of herpesviruses are still not entirely known, but HSV reactivation has been ascribed to physical damage, ultraviolet light, hormones, or even fever. In the lytic phase, the virus takes over host gene expression and shuts it down. It then starts replicating its own genome and subsequently produces viral proteins. New infectious virions assemble and leave the host cell (5). A common characteristic of herpesvirus infections is that they rarely pose any real threat to a healthy person, and although infection is lifelong, people can go through life without even knowing that they are infected. The real problem occurs in people with weakened immune systems, where infection can lead to organ failure and consequently death (15-18).

Alpha herpesviruses

HSV-1 is the prototype of the alpha herpesvirus subfamily. The alpha herpesviruses HSV-1 and HSV-2 cause cold sores and genital herpes, while VZV causes chicken pox and shingles (14). Symptoms of infection show in epithelial cells such as skin and mucosa, and these cells are consequently targeted by the immune system. However, the target destination is sensory neurons in the brain. Infection starts at the mucosal surface, where the virus undergoes lytic replication in the surrounding epithelial cells. After this it enters a nearby sensory neuron, where it establishes a lifelong, albeit latent, infection. The capsid travels up the axon on microtubules to the nucleus, where the genome enters and circularises into an episome (19, 20). Upon reactivation, the virus travels back down the axon to epithelial cells, where further lytic spread results in symptoms and, potentially, transmission to new hosts. Although about 90% of the general population is infected with HSV, it is only very rarely that this causes any symptoms other than cold sores (5). However, alpha herpesvirus infections can result in encephalitis, most commonly in children, the elderly and people with weakened immune systems (e.g. those with HIV/AIDS or cancer), although this is very rare and only occasionally fatal (21).


Beta herpesviruses

Beta herpesviruses, such as HCMV, HHV-6 and HHV-7, replicate more slowly than alpha herpesviruses and establish latency in progenitor cells of the bone marrow, monocytes and T-cells, which are all part of the immune system (5). The best characterised member of this subfamily is HCMV, which replicates in a vast number of cell types, e.g. macrophages, dendritic cells, colonic and retinal cells and endothelial cells (22). It has been estimated that >60% of the general population carries HCMV (5), and infection usually goes unnoticed in healthy adults. It is only in immunocompromised individuals (like HIV patients and organ transplant recipients) and in unborn babies that serious conditions such as pneumonia, encephalitis and retinitis arise (22, 23). HCMV infection is considered the major cause of these conditions and the subsequent mortality among immunosuppressed individuals (24, 25).

Gamma herpesviruses

There are two human herpesviruses that belong to the gamma herpesvirus subfamily, EBV and KSHV. A key feature of these viruses is their capacity to induce lymphoproliferation and cancers (5, 26). EBV’s major targets are epithelial cells and B-cells, where it also establishes latency. KSHV infects a wide range of cells in vivo and in vitro, for example endothelial cells, B-cells, epithelial cells and fibroblasts (27). After infection the linear genome circularises into an episome, which is tethered to the host chromosome by a specific protein and replicates in concert with the host’s genome (5, 28, 29). EBV causes infectious mononucleosis, better known as “kissing disease”, and it has been estimated that >90% of the world’s adult population carries EBV³. EBV was the first human tumour virus discovered and is associated with Burkitt’s lymphoma and nasopharyngeal carcinoma as well as several other malignancies, for instance several types of Hodgkin’s lymphoma and gastric carcinoma. In AIDS patients EBV causes a number of other lymphomas and tumours (5, 30). 
In addition, there is a growing body of evidence, although very controversial, that suggests a connection between EBV infection and liver and breast cancer as well as certain types of auto-immune diseases like multiple sclerosis, rheumatoid arthritis and diabetes (31, 32).

³ http://www.who.int/en/

Kaposi’s sarcoma (KS) was first described in 1872 as a rare purplish-pigmented sarcoma of the skin, typically found in elderly men of Jewish and Mediterranean descent. However, during the onset of AIDS in the 1980s there was a noticeable increase in KS, and in 1994 the cause was identified as a new human herpesvirus, subsequently called Kaposi’s sarcoma associated herpesvirus (KSHV) (33). Besides causing KS, KSHV has been associated with some rare but lethal lymphoproliferative disorders, pleural effusion lymphoma (PEL) and multicentric Castleman’s disease (MCD) (5, 34). In Europe and the US 50% are carriers (35), and in these countries KS has become one of the most clinically described neoplasms. Like EBV, KSHV has also been linked to other, more controversial conditions like sarcoidosis (5, 36, 37) and multiple myeloma (38, 39). What really sets KSHV apart from the rest of the human herpesviruses is the number of cellular genes that it has copied throughout evolution, in something called molecular mimicry or molecular piracy: more than a dozen cellular genes have been copied. Furthermore, several of these genes have potential tumour-related functions (so-called oncogenes), meaning that they can affect the cell cycle, apoptosis and other types of cell signalling (40-42). EBV, on the other hand, encodes several highly evolved transcription factors and signalling proteins that induce many of the same cellular genes that KSHV has pirated into its own genome (5).

Herpesviruses have coevolved with their hosts to establish lifelong infections in various cell types (43, 44). They are the major cause of several minor syndromes and major malignancies in humans. Whether the symptoms are manageable with today’s treatments or more severe is determined by the individual’s immune system and genetic predispositions. 
In immunocompromised patients, such as organ transplant recipients, cancer patients and AIDS patients, a herpesviral infection can cause major complications which could result in death. To fully understand these viruses, the individual proteins, such as virion proteins and proteins from various stages of the viral life cycle, can be studied. However, some of these proteins can be hard to come by, since they are not present or expressed in sufficient amounts in the virion particle or target cells. The easiest way to obtain these proteins would therefore be to recombinantly overexpress them. Recombinant proteins from herpesviruses could help in creating vaccines and could yield high-resolution protein structures that help in understanding the viral life cycle, evolution and pathogenicity and that might serve as potential drug targets.

Structural genomics

It is broadly accepted that the function of a protein is dependent on its amino acid sequence and on how this chain of amino acids is folded in three-dimensional space. It is also widely accepted that the information for the folding pattern of a protein resides in the linear DNA sequence that encodes it. However, this folding information is at present largely hidden from us, and in order to understand how a protein works at the molecular level we must study its three-dimensional structure, preferably at high resolution. In the wake of the genomics efforts, the effort to determine the complete genomic sequence of all organisms (45-48), new fields have emerged that aim to determine, on a full-organism scale, patterns, trends, functions and pathways among and within these genes and their corresponding gene products, be it at the transcriptional, translational or degradation level. Huge databases with massive amounts of information have become available for searches, which in turn have yielded vast amounts of new results and data. Structural genomics (SG), the effort to structurally determine a large number of proteins from one organism/genome, is such a field (49). Table 2 shows the approximate number of genes for some organisms and viruses and the number of unique protein structures for each of them⁴. To date, less than 5% of the human herpesviral proteins have been structurally characterised, and even fewer have been solved within the context of SG (50, 51).

⁴ http://www.rcsb.org/pdb/home/home.do


Table 2 Unique structures submitted to PDB for a few organisms/viruses. Also shown is the approximate number of genes⁵.

Organism                   ~No of genes   Unique structures in PDB
Homo sapiens               20,000         4080
Saccharomyces cerevisiae   7,700          533
Escherichia coli           4,400          1403
HIV                        9              32⁶
EBV (HHV-4)                80             12
KSHV (HHV-8)               88             9
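As a rough illustration of how the figures in Table 2 compare, the following Python sketch divides the number of unique structures by the approximate gene count for each entry. The values are copied from Table 2; the names TABLE_2 and structures_per_gene are hypothetical and introduced only for this example. The ratio is a crude indicator only and is not the same as the fraction of proteins that have been structurally characterised: several structures may cover the same gene product, and for HIV the ratio exceeds 1 for the reasons given in the table footnote.

```python
# Values copied from Table 2: (approximate gene count, unique PDB structures).
TABLE_2 = {
    "Homo sapiens": (20_000, 4080),
    "Saccharomyces cerevisiae": (7_700, 533),
    "Escherichia coli": (4_400, 1403),
    "HIV": (9, 32),
    "EBV (HHV-4)": (80, 12),
    "KSHV (HHV-8)": (88, 9),
}


def structures_per_gene(genes: int, structures: int) -> float:
    """Return unique structures divided by the approximate gene count."""
    if genes <= 0:
        raise ValueError("gene count must be positive")
    return structures / genes


# Crude ratio for every organism/virus in the table.
ratios = {name: structures_per_gene(*row) for name, row in TABLE_2.items()}

# Print the entries from the highest to the lowest ratio.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<26} {ratio:6.2f} structures per gene")
```

Sorting by the ratio makes it easy to see that the two gamma herpesviruses in the table lie well below E. coli and Homo sapiens by this measure.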

Within structural genomics projects there is usually a common approach to which all target proteins are subjected. This approach is usually referred to as a pipeline and has the general outline of target selection, cloning, overexpression, purification, crystallisation and structure determination. Even though these steps are common, their execution may vary, for instance in how the targets are chosen (what criteria they are based on), how the cloning is done (e.g. digestion followed by ligation, recombination cloning or ligase independent cloning) and which expression system is used (e.g. bacteria, yeast, insect cells) (52-60). The common denominator of all these steps within SG initiatives is the aim to achieve as high an output of target structures as possible. To be able to work with as many target proteins as possible and to increase success rates, new methods have been developed for all steps. In the early days of SG it became evident that one of the major obstacles was, and still is, the production of suitable protein samples of sufficient amount and purity (61). Since then, statistics for all the steps in the SG pipeline have been gathered from various worldwide SG efforts. SG initiatives have together cloned more than 100,000 targets from various organisms and produced protein samples suitable for downstream processing for about 1/3 of the cloned targets⁷.
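The attrition described above can be sketched with a few lines of Python. This is a minimal illustration, assuming that the overall figure of about 1/3 applies uniformly to every cloned target, which the text does not claim; the names PIPELINE_STAGES, expected_samples and clones_needed are hypothetical, and 100,000 is used as a round figure for "more than 100,000".

```python
import math
from fractions import Fraction

# Stage names taken from the pipeline outline in the text.
PIPELINE_STAGES = (
    "target selection",
    "cloning",
    "overexpression",
    "purification",
    "crystallisation",
    "structure determination",
)

CLONED_TARGETS = 100_000          # round figure for "more than 100,000" cloned targets
SAMPLE_FRACTION = Fraction(1, 3)  # "about 1/3 of the cloned targets"


def expected_samples(cloned: int, fraction: Fraction = SAMPLE_FRACTION) -> int:
    """Expected number of cloned targets yielding a suitable sample, rounded down."""
    return math.floor(cloned * fraction)


def clones_needed(samples_wanted: int, fraction: Fraction = SAMPLE_FRACTION) -> int:
    """Smallest number of clones whose expected yield reaches samples_wanted."""
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    return math.ceil(samples_wanted / fraction)


print(expected_samples(CLONED_TARGETS))  # suitable samples expected from all cloned targets
print(clones_needed(30))                 # clones to start from for 30 expected samples
```

Using Fraction rather than a float keeps the rounding exact, so the two helper functions are consistent with each other.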

⁵ http://www.genomesonline.org
⁶ For HIV, the number of structures exceeds the number of genes. The HIV genes are subject to splicing, leaky scanning and frame shifting, and products of translation are subjected to protease activity, hence creating more proteins than genes. In addition, structures from different strains of HIV have been solved.
⁷ http://targetdb.pdb.org/statistics/targetstatistics.html


Protein production

Since certain proteins are present only at very low levels and at certain time points in a living organism, the most ethical and effective way of obtaining a protein of interest is to recombinantly overexpress it. High-resolution structures of biological macromolecules, such as proteins, can be determined with several methods. The most convenient and efficient methods are X-ray crystallography and NMR. For X-ray crystallography, the dominating method, well diffracting protein crystals are an absolute requirement. Numerous crystallisation trials and optimisations might be necessary to produce a well diffracting crystal, and therefore large amounts of pure and soluble protein are needed. The most widely used expression host to date for recombinant overexpression is the well-studied bacterium E. coli. The major reason for using E. coli is that large amounts of biomass can be generated with great ease and at low cost (62-64). E. coli also possesses other advantages beneficial for structural biologists. For instance, it will produce a very homogeneous protein sample since it lacks the machinery to create certain covalent modifications like glycosylations. In principle, E. coli has the same protein production machinery as any other cell, although it differs in ways that might create problems. For instance, when a protein is overexpressed, it might form large insoluble aggregates called inclusion bodies (62, 65, 66). Inclusion bodies consist of unfolded or partly folded proteins and might take up a very large volume of the cell (67, 68), although there have been situations where correctly folded and active proteins have been found in inclusion bodies (69). The already mentioned lack of covalent modifications, which might be needed for proper protein folding and function, could be one reason for inclusion body formation. E. coli may also lack certain important tRNAs, which could lead to halted translation, as well as certain folding partners such as chaperones (67, 68, 70-73). Other problems when overexpressing proteins are in vivo degradation by indigenous proteases and toxicity of the target protein itself to the bacterium (74). The overexpression of eukaryotic proteins in E. coli is especially troublesome (75), although in vivo proteolysis and inclusion body formation occur even when overexpressing indigenous E. coli proteins (65, 66, 76).

In contrast to prokaryotic proteins, eukaryotic proteins tend to be, on average, larger and to consist of more domains that are connected by flexible linker regions (77), and it has been shown that the size of a protein correlates with the success rate of its soluble expression in E. coli (53). The official success rates from SG efforts for soluble production are 50% for prokaryotic proteins and 30% for eukaryotic proteins when a basic SG pipeline is used (53, 60). However, these statistics contain no information on the success rates with regard to full-length protein expression or protein size. To increase the likelihood of obtaining soluble protein when using E. coli, there are generally two approaches: i) change physical parameters of the experiment (like the bacterial strains, culture conditions, promoters or fusion partners) or ii) change the properties of the target protein. To ensure high success rates and low costs, soluble recombinant expression should be screened before proceeding with large-scale expression and purification.

Physical parameters

To compensate for rare codons, tRNAs can be co-expressed in E. coli (78, 79), and numerous strains that co-express these tRNAs are commercially available. Typically these strains also lack certain proteases that could lead to protein degradation (80). Strains that should be more resistant to toxic proteins, and strains that provide the right oxidising conditions to permit the formation of disulfide bonds, have also been created (81-83). Another parameter that can influence the recombinant expression of a target protein is the culture conditions (62). The growth medium can be changed, as well as the growth temperature. Although very few systematic studies have been reported (84, 85), it is still believed that the expression medium could influence soluble expression. 
In regard to expression temperature, more support exists for its influence on expression (62, 72) than for that of the expression medium. For several proteins it has been shown that decreasing the temperature can rescue target proteins from the fate of inclusion bodies (62, 85-87). How the solubility of a protein in a bacterium correlates with a decrease in expression temperature is not fully understood and might be due to a combination of factors involved in transcription and translation as well as folding of the protein. When the transcription and translation machineries slow down due to decreased temperature, the protein might have time to fold properly. The attractive forces between hydrophobic parts that could lead to protein aggregation are potentially weaker at low temperatures (88). It has also been shown that the expression of several indigenous chaperones is induced at lower temperatures (89). Although new bacterial strains have been created and culture conditions are varied, the problem of inclusion bodies and proteolysis still persists, especially for mammalian proteins.

Fusion proteins and protein tags
Fusion proteins are often large soluble proteins that are subcloned upstream or downstream of the target protein. It has been shown that adding a large soluble fusion protein can increase the folding propensity, and therefore the solubility, of the target protein itself (75, 90-94). The most widely used fusion proteins are glutathione S-transferase (GST), maltose binding protein (MBP) and thioredoxin (TrxA). Fusion proteins not only serve as solubilising factors but can also aid in purification and detection of the target protein. Although there are obvious benefits of adding a large soluble protein, there are also some clear limitations. For instance, the fusion protein should preferably be removed before crystallisation trials, either by laborious recloning or by digestion. Large fusion partners might also alter the solubility of the target protein in a negative way, and removal could therefore result in an unpleasant surprise, such as protein aggregation and precipitation (94-98). Instead of adding a large fusion protein for detection and purification, which could potentially alter the solubility of the target protein, small peptide tags can be used. The most commonly used peptide tag is the His-tag.
A stretch of six histidines, which has the ability to bind divalent cations, typically Ni2+ or Co2+, is added in the cloning step either upstream or downstream of the target protein. If these cations are immobilised on a gel resin, the target protein can be captured and separated from proteins without a histidine tag. This method, which we today refer to as IMAC (immobilised metal affinity chromatography), was first described in the late 1980s (99) and has since revolutionised recombinant protein purification. Antibodies and probes directed towards His-tags and conjugated with horseradish peroxidase (HRP) or alkaline phosphatase (AP) have since become commercially available, and a His-tag can therefore also be used for immunochemical detection of the target protein.

Construct design
The second approach to increase the chances of obtaining recombinant soluble expression is to change the characteristics of the target protein. As already mentioned, even the expression of prokaryotic proteins can be problematic, and the success rate for soluble expression of such proteins is approximately 50%. For eukaryotic proteins, however, the success rate drops significantly, to ~30% (footnote 8). This could be due to the larger size of these proteins, the number of domains and flexible linker regions (which might be protease sensitive), the requirement for specific chaperones for folding and the requirement for post-translational modifications. A natural reaction to these problems has been to clone and express the individual domains of the target protein, which has proven to be very useful (57, 75, 100-103). This strategy is based on the theory that if the full-length protein fails to express or crystallise, perhaps its individual domains will. Domain borders can be determined experimentally, by limited proteolysis coupled with MS analysis (101, 102) or deuterium exchange MS (104), or predicted with dedicated computer programs (105). The latter strategy, designing new constructs partly with the help of domain predictions, has been successfully employed within SG initiatives, where several expression constructs are generated for one target protein (Figure 4). It has been shown that this approach increases the probability of generating soluble protein two-fold (100).
Since these types of domain prediction programs have a fair degree of uncertainty, several expression constructs have to be designed that start close to the predicted domains.

8 http://targetdb.pdb.org/statistics/targetstatistics.html



Figure 4 Construct design. a) A schematic picture of a result from a domain prediction program of a multi-domain protein. b) New expression constructs are designed to define domain borders in hope of finding a better expressing construct.
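The idea of designing several constructs that start and end close to predicted domain borders can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the residue offsets, the minimum construct length and the example domain coordinates are all hypothetical and are not taken from any of the cited SG pipelines.

```python
def design_constructs(domains, protein_length, offsets=(-10, 0, 10), min_length=30):
    """Enumerate (start, end) residue ranges around predicted domain borders.

    domains        : list of (start, end) residue positions from a domain predictor
    protein_length : length of the full-length protein in residues
    offsets        : hypothetical shifts applied to each border to allow for
                     the uncertainty of the prediction
    min_length     : hypothetical lower limit for a construct worth cloning
    """
    constructs = set()
    for start, end in domains:
        for d_start in offsets:
            for d_end in offsets:
                s = max(1, start + d_start)            # stay inside the protein
                e = min(protein_length, end + d_end)
                if e - s + 1 >= min_length:
                    constructs.add((s, e))
    return sorted(constructs)


# Hypothetical two-domain protein of 300 residues.
example = design_constructs([(1, 120), (150, 300)], protein_length=300)
for start, end in example:
    print(f"construct {start}-{end} ({end - start + 1} residues)")
```

Borders that fall outside the protein are clipped to the termini, so shifted variants can collapse onto the same construct; using a set keeps each construct only once.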

The solubility of a protein can also be increased by making random or focused mutations or deletions. This approach is called in vitro evolution and will be described later in the text. Whether new expression constructs are generated by a focused or a random approach, all of them have to be screened for soluble expression.

Screening for recombinant soluble overexpression
The aim of a screen, no matter how it is executed, is to rapidly reduce a large number of clones/targets to a more easily handled number. Screening for soluble expression is traditionally done with liquid cultures in individual vials and, more recently, in a 96-well format. Cultures of 1 ml are grown and induced in parallel, cells are harvested and lysed, and soluble material is separated from insoluble material by centrifugation and/or purification. The soluble fraction is usually analysed by SDS-PAGE (53, 60, 85, 106-108). Robots can perform certain steps in this process while others still have to be done manually. A couple of years ago a genome-wide expression screen was attempted. Some 10,167 ORFs from the nematode Caenorhabditis elegans were screened for soluble expression in E. coli in a 96-well format. Soluble expression could be detected for 1,356 ORFs, corresponding to a success rate of 13% (109). This number is much lower than more recently reported success rates for eukaryotic proteins, and it was later shown that many ORFs were wrongly annotated, had mispredicted gene boundaries or were out of frame (110). Although only one vector and one expression strain were used and some steps were automated, the workload of this effort is likely to have been very large. In our lab, we have previously developed a method called FiDo (filtration dot blot) that uses filtration to separate soluble material from insoluble material (111). Liquid cultures are grown and induced in a 96-well plate. A small fraction of each culture is then transferred to another 96-well plate with a low protein-binding submicron filter in the bottom. The liquid medium is removed by vacuum and a bacterial pellet forms on top of the filter. The pellet is resuspended either in a denaturing lysis buffer (solubilising all proteins in the bacteria) or in a native lysis buffer (releasing only the soluble proteins). Vacuum is reapplied and the filtrate is collected in a collector plate. The filtrate is then used to make dots on a membrane with a high protein affinity, such as nitrocellulose. The nitrocellulose is then blocked, probed with an antibody or probe and developed like a Western blot. The FiDo screen has also been modified to include an affinity purification step, in order to determine how well the target protein can be purified (112). To secure a high output of structures in an SG pipeline, the best strategy would be to generate multiple constructs subcloned with different fusion proteins/tags, which would then be expressed in different strains and at different temperatures. A quick calculation shows that working with 96 targets, creating 10 variants of each (based on domain predictions), cloned with two different fusion proteins/tags and expressed in two different strains at two different temperatures, would generate 7,680 different experiments. Although most of the work would be done in a 96-well format, it would be quite labour intensive.
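The size of such a combinatorial screen follows directly from multiplying the number of options at each level. The short Python sketch below reproduces the figure of 7,680 experiments and, as an added illustration, converts it into the number of 96-well plates; the dictionary keys are descriptive labels chosen for this example only.

```python
import math

# Number of options at each level of the combinatorial screen described above.
design = {
    "targets": 96,
    "variants_per_target": 10,
    "fusion_proteins_or_tags": 2,
    "expression_strains": 2,
    "temperatures": 2,
}

n_experiments = math.prod(design.values())       # 96 * 10 * 2 * 2 * 2
plates_96_well = math.ceil(n_experiments / 96)    # one experiment per well

print(f"{n_experiments} experiments, i.e. {plates_96_well} full 96-well plates")
```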
Therefore these types of combinatorial experiments are currently not pursued within SG initiatives due to high costs and heavy workload.

Colony screening methods
As already mentioned, solubility can be increased by adding a large fusion protein. Solubility would then be screened based on the physical characteristics of a soluble protein, such as its ability to be separated from insoluble protein by centrifugation, filtration or purification of liquid cultures. However, the solubility of a protein is intimately connected to its folding and activity and could therefore be monitored by fusing the target protein to a reporter protein that, when folded correctly, gives rise to an easily monitored phenotype. The underlying theory is that if the reporter protein is well folded, and hence soluble, the target protein should also be soluble and well folded (Figure 5).

Figure 5 Schematic picture of a target protein (grey) fused to a reporter protein (black). a) If the target protein is well folded and soluble, the reporter protein will fold and the phenotype can be observed. b) If the target protein misfolds, the reporter protein will misfold and no phenotype will be seen.

Solubility would then not have to be screened in liquid cultures but could instead be screened at the colony level. This would lift the heavy burden of handling liquid cultures, since thousands of colonies, each potentially carrying a different construct, could be screened on one colony plate. In 1999 Waldo et al described a method to monitor folding, and therefore solubility, in colonies by fusing the target protein to GFP (green fluorescent protein). Only bacterial colonies with complete read-through that express a well-folded and soluble target protein would fluoresce (113). Another method relying on the same theory, presented by Maxwell et al, is to fuse CAT (chloramphenicol acetyltransferase) to the protein of interest. Only bacteria that express a soluble target protein would grow in the presence of the antibiotic chloramphenicol (114). Both these methods are easy to use and allow thousands of colonies to be screened in one experiment.


Nevertheless, these types of methods have potential drawbacks. Firstly, the reporter protein might affect the solubility of the target as already mentioned. Secondly, false positives are seen for example when GFP is used (115, 116), and thirdly, the reporter protein has to be removed before the target protein can be used for structural studies. In order to avoid the drawbacks of adding a large soluble protein, several methods only relying on a small reporter peptide have been developed. In practice, these methods rely on splitting a large reporter protein in two, where none of the parts are active on their own. The target protein is fused to a small peptide, corresponding to a vital part of the reporter protein. If the target protein is soluble and well folded the phenotype should be detectable when the “rest” of the reporter protein is added or co-expressed and vice versa (Figure 6).


Figure 6 The theory behind a split reporter protein. The target (grey) is fused with a part of the reporter protein (dark grey). If the target protein folds the tag will subsequently fold and when combined with the rest of the reporter protein (black), the phenotype will show. If the target protein is insoluble, the tag will be unfolded and no phenotype will be seen when adding the rest of the reporter protein.

Wigley et al (117) used β-galactosidase in this manner. β-galactosidase was split into a 52 amino acid α-fragment, which was fused to different control targets, and an ω-fragment corresponding to the rest of the protein. When the control targets were expressed, colonies would turn blue or white depending on the solubility. This method was effectively used to screen a hybrid gene library of the human P450 for proper folding and solubility. Even though this method seems to work well, especially for monitoring slow folding processes, a certain degree of false positives and negatives could be observed. Additionally, in a review by G.S. Waldo from 2003 it was claimed that the α-fragment could render proteins insoluble (118). The developers of the previously described GFP method have recently adapted it to use a smaller tag (119). Only a part of GFP, representing β-strand 11, is fused to the target protein via a flexible linker. This approach has successfully been used to screen mutation libraries of proteins from Mycobacterium tuberculosis (115). By splitting the reporter protein into two pieces, both these methods address two problems: that a reporter protein can remain active although it is fused to an insoluble target protein, and that the reporter might alter the solubility of the target. Both methods also have the advantage that they work in vivo and in vitro, making it easy to monitor the solubility of the protein in a cell lysate. A small complication, however, is that two plasmids have to be used and the rest of the reporter protein has to be co-expressed in order to be present in the cell. A method that can directly monitor folding has been described by DeLisa et al (120). It relies on the fact that the twin-arginine translocation (Tat) pathway only moves well-folded proteins across the inner membrane to the periplasmic space of the E. coli cell (121). By fusing the target protein to a Tat export signal at the N-terminus and to β-lactamase (which only confers antibiotic resistance in the periplasm) at the C-terminus, the folding of the target protein can be directly monitored. This method was tested on a number of well-characterised proteins known to be soluble in the cytoplasm (such as GST, MBP, GFP and TrxA) as well as on some proteins that are unstable in the cytoplasm, and a good correlation could be seen. Potentially, β-lactamase could influence the folding in some unforeseen way, and there were no reports of any upper size limit for proteins that could be exported with the Tat pathway.
In addition, before the target protein can be used, the tag and/or the β-lactamase have to be removed, either by recloning or by digestion. By fusing a gene to a C-terminal biotin acceptor peptide, and thereby enabling biotinylation in vivo by the E. coli biotin ligase BirA (122), detection and affinity purification can be based on the very strong binding of biotin to the protein avidin (123). Tarendeau et al (124) used this type of approach at the colony level. Colonies carrying an expression construct with a C-terminal biotin acceptor peptide (Avi-tag) are arrayed very closely, by a robot, on a nitrocellulose membrane and induced for expression. The Avi-tag is biotinylated in vivo and, after lysis, colonies expressing biotinylated proteins can be detected on the nitrocellulose by probing with a fluorescent streptavidin conjugate. A deletion library of almost 27,000 constructs (and with a seven-fold oversampling!) of the influenza virus polymerase PB2 was successfully screened in this manner, and the approach was called ESPRIT (expression of soluble proteins by incremental truncation). A major advantage is (at least in theory) that neither recloning nor digestion should be needed: if a plate replica is made, identified constructs can go directly into scale-up experiments, and subsequent affinity purification can be based on biotin's affinity for avidin. In addition, no reports have so far surfaced on potential negative side effects of adding an Avi-tag, such as whether it affects solubility or whether aggregated protein could be biotinylated and misinterpreted as soluble. Although this is a very powerful method, in its present implementation it still relies on expensive robotics, which are not part of standard laboratory equipment, and on the use of costly streptavidin-magnetic beads. Several colony-based fusion protein or fusion tag screens have been developed and have evolved into well-performing strategies. They are all elegant methods that allow for a swift and easy way to identify targets that could be suitable for large-scale protein production. Most importantly, however, colony-based screens are best put to use when the aim is to screen large collections of gene variants.

The CoFi blot (paper I and III)
We wanted to develop a method that works as a solubility screen at the colony level but neither relies on a reporter protein that could potentially affect the solubility of the target protein nor makes use of a tag that needs a reporter protein to be co-expressed.
Since we had very good experiences with the previously described FiDo screen, we adapted it so that it would allow us to screen soluble expression at the colony level. In this method colonies are grown on a plate, called the master plate, and are transferred to a submicron low protein-binding filter that can separate inclusion bodies from soluble protein. Colonies on the master plate are regrown, and colonies on the filter are induced for expression by placing the filter on a plate containing IPTG. After induction the filter is used to make a filter sandwich (Figure 7). Upon lysis, soluble protein will diffuse through the filter and attach to a high protein-binding membrane, such as nitrocellulose. Detection is done by incubating the nitrocellulose with probes or antibodies directed at the target protein and using standard immunochemicals.


Figure 7 A schematic picture of the CoFi-blot method. a) Colonies are grown on a master plate, which are b) transferred to a Durapore filter and c) expression is induced on a plate containing IPTG. d) The filter with the colonies is then used to make a filter sandwich consisting of the Durapore on top of a nitrocellulose and a Whatman paper with lysis buffer. e) Close up of sandwich. f) Upon lysis, by repeated freeze thawing, soluble protein will diffuse through the filter and bind to the nitrocellulose. g) The nitrocellulose is then blocked and incubated with probes or antibodies directed at the target protein. h) The signals are detected by chemiluminescence.

In our case we tend to use His-tag fusions, which efficiently can be probed with Ni2+ conjugates as well as be used for purification. Colonies that give rise to signals are picked from the master plate and can either go directly into scale-up experiments or be further analysed. We chose to name this method the CoFi blot (colony filtration blot).


In order to verify how well the CoFi blot works we decided to compare it to the traditional method of growing liquid cultures and separating soluble protein from insoluble by centrifugation. 32 eukaryotic and 24 E. coli proteins were subcloned in two different expression vectors yielding either an N-terminal His- or FLAG-tag. Targets were both grown and induced as colonies, which were subjected to the CoFi blot, and as liquid cultures. The bacteria in the liquid cultures were harvested, lysed and centrifuged. Dots were made on a nitrocellulose of both total and soluble protein content and developed in the same way as for the CoFi blot.

Figure 8 The performance of the CoFi blot was compared to the traditional way of screening for soluble expression, centrifugation. The picture shows the results and the correlation between the two methods for 32 eukaryotic proteins. Targets have been marked with (+), (-) or (0): (+) where the two methods are in agreement, (-) where they disagree and (0) where there is no total expression. Constructs with no total expression have been excluded from the statistics.
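The scoring scheme in the caption translates into a simple agreement statistic: targets marked (0) are dropped, and the percentage of (+) among the remaining targets is reported. The sketch below shows this calculation in Python. The list of scores is invented purely to exercise the function and does not reproduce the data in Figure 8; the function name is likewise hypothetical.

```python
def percent_agreement(scores):
    """Percentage of targets where two screening methods agree.

    scores: iterable of "+", "-" or "0" as defined in the figure caption.
    Targets scored "0" (no total expression) are excluded from the statistics.
    """
    counted = [s for s in scores if s in ("+", "-")]
    if not counted:
        raise ValueError("no targets with detectable total expression")
    return 100.0 * counted.count("+") / len(counted)


# Invented example: 8 agreements, 2 disagreements, 2 targets without expression.
made_up_scores = ["+"] * 8 + ["-"] * 2 + ["0"] * 2
agreement = percent_agreement(made_up_scores)
print(f"agreement between methods: {agreement:.0f}%")
```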

We have shown that the two methods are in 84% agreement, with a fairly good correlation between expression levels. We have also shown that the CoFi blot is reproducible by re-screening clones in quadruplicate. Differences between the two methods can be due to the different metabolic states of bacteria growing in a liquid culture as opposed to on a solid support. Another factor that could explain the deviations is the affinity of the filter for the proteins. Even though the filter is a low protein-binding filter, some proteins might still stick, and the CoFi blot would therefore not work as a detection method for these particular proteins. In summary, we have developed a colony expression screen, called the CoFi blot, with good reproducibility and good correlation to a more traditional expression screen. The CoFi blot can be applied to any type of protein to which antibodies have been generated or that contains a detectable tag, such as a His-tag. Since it uses standard molecular biology reagents and equipment, it is a method suitable for any laboratory. Another advantage is that the CoFi blot uses only a small affinity tag for detection and not a large fusion protein, so there is little risk that the tag will influence the solubility/folding of the target protein. In addition, colonies containing constructs that yield soluble protein can be picked directly from the master plate and subjected to scale-up experiments. One major advantage is that the CoFi blot detects solubility after lysis and separation of cell debris, meaning that the target protein has survived two additional steps required for scale-up purification. It is therefore, potentially, a better indicator of a useful protein than methods where solubility is detected in the cell before lysis. The CoFi blot also shares the advantage of other colony-based screens: the ability to screen thousands of colonies in a single experiment.

Library methods
A successful strategy to make a protein soluble is to introduce mutations, amino acid substitutions and deletions that could favour protein expression and/or folding (115, 125). These types of alterations can be made in a focused manner, for example by changing the design of the expression construct based on domain predictions (generating only a limited number of variants), or by randomly generating thousands of variants (by error-prone PCR, DNA shuffling, truncations etc.) to create a library. However, beneficial mutations can be quite rare, and one could potentially end up with a needle-in-a-haystack scenario. A colony screen is therefore an efficient means of lifting the heavy burden of screening such a library for overexpression with traditional methods.


As already mentioned, eukaryotic proteins are often larger than prokaryotic proteins and consist of more domains connected by flexible linker regions. As discussed, a very common approach to increase a protein's solubility is to change the expression construct on the basis of predicted domains. Domains are predicted by computer programs, and expression constructs close to the predicted domain borders are cloned and tested for soluble expression. Even though this approach has been shown to be very successful, domain predictions based on computer programs are not always accurate. Domain borders can also be determined experimentally by limited proteolysis coupled to MS, but this approach poses a kind of catch-22 situation since it relies on soluble expression from the start. We wanted to develop a strategy with which we could obtain or enhance soluble protein expression and potentially map domain borders at the same time.

Deletion libraries screened with the CoFi blot (paper II)
We decided to produce deletion libraries of 21 human proteins that previously, as determined in paper I, had been insoluble or had not expressed at all, in an effort to render them soluble and at the same time investigate whether we could experimentally map domain borders. Deletion libraries were generated at the DNA level from the 5’ end of the target gene by a method described by Henikoff et al (126), which is now available as a kit called Erase-a-Base® (Promega). The target gene has to be cloned into a plasmid, which is linearised and incubated with exonuclease III, which removes nucleotides from one strand of the DNA at a constant rate. By removing timed aliquots, a deletion library spanning the entire length of the gene can be created. For this method to work, two restriction sites have to be introduced in front of the target gene to ensure unidirectional deletion.
One restriction site creates a 5’-overhang that is susceptible to exonuclease III digestion, and the other generates a protecting 3’-overhang at the opposite end. The remaining single strand is removed by S1 nuclease and the ends are made blunt by Klenow polymerase. The linear blunt-ended fragments are then re-ligated and transformed into a cloning strain (Figure 9). Colonies are grown and harvested, and the library is extracted and transformed into an expression strain. The library inside the expression strain is then screened for soluble expression with the CoFi blot method.

Figure 9 Schematic picture of the Erase-a-Base procedure. A plasmid containing appropriate restriction sites in front of the target gene is linearised. A deletion library, spanning the entire gene, is generated by adding exonuclease III and removing timed aliquots. The remaining single strand is removed by S1 nuclease and blunt ends are generated by Klenow polymerase. The plasmids are religated and a cloning strain is transformed.
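Because exonuclease III digests at a roughly constant rate, the time at which an aliquot is removed determines how much of the gene has been deleted. The following Python sketch illustrates this relationship. The digestion rate, the gene length and the number of aliquots are assumed values chosen for illustration only; the actual rate depends on temperature and reaction conditions and should be taken from the kit documentation or calibrated experimentally.

```python
def aliquot_schedule(gene_length_bp, rate_bp_per_min, n_aliquots):
    """Return (time in min, expected deletion in bp) for evenly timed aliquots.

    Assumes a constant digestion rate, so the expected deletion grows
    linearly with time until the whole gene has been removed.
    """
    total_time = gene_length_bp / rate_bp_per_min
    step = total_time / n_aliquots
    return [
        (round(i * step, 2), round(i * step * rate_bp_per_min))
        for i in range(1, n_aliquots + 1)
    ]


# Assumed values: an 1,800 bp gene, 450 bp/min and 8 aliquots.
schedule = aliquot_schedule(gene_length_bp=1800, rate_bp_per_min=450, n_aliquots=8)
for minutes, deleted_bp in schedule:
    print(f"aliquot at {minutes:4.1f} min: about {deleted_bp} bp removed")
```

In practice each aliquot contains a distribution of deletion lengths around the expected value, which is what allows the pooled aliquots to cover the whole gene.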

Of the 21 human proteins chosen, 19 were successfully recloned into an expression vector with appropriate restriction sites in front of the gene. For detection and purification purposes we added a His-tag in front of the restriction sites. Deletion libraries were generated for all 19 genes and positive colonies could be seen for 17 (Figure 10).


Figure 10 CoFi blots of 6 targets. Positive colonies can be seen as black spots on a, b, c, d, and f and negative colonies are seen as very faint grey spots. Since this is a deletion library from the 5’end, only one third of the colonies can in theory be positive. e) Example of a completely negative CoFi blot. Positive colonies were picked for analysis and identification purposes.
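The statement that only one third of the colonies can be positive follows from the reading frame: after religation, the N-terminal tag is only in frame with the remaining gene when the number of deleted nucleotides is a multiple of three. The Python sketch below counts the in-frame deletions for a gene of hypothetical length, under the simplifying assumptions that every deletion length is equally likely and that the tag and the full-length gene start out in frame.

```python
def in_frame_deletions(gene_length_bp):
    """Deletion lengths (in bp) that keep the downstream codons in frame."""
    return [d for d in range(gene_length_bp) if d % 3 == 0]


gene_length_bp = 900  # hypothetical gene length
kept = in_frame_deletions(gene_length_bp)
fraction_in_frame = len(kept) / gene_length_bp

print(f"{len(kept)} of {gene_length_bp} possible deletions are in frame "
      f"({fraction_in_frame:.1%})")
```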

From each target, 24 positive colonies were picked and used to inoculate small liquid cultures that were induced for expression. The soluble fraction was purified by IMAC and the eluate was analysed as dot blots. At this stage 14 targets gave soluble expression and, out of these, 11 could be seen as a band on an SDS-PAGE gel (Figure 11). The remaining three proteins with no soluble protein detected were either extracellular or membrane associated.

Figure 11 SDS-PAGE of constructs identified with the CoFi blot method. a) The best expressing constructs were purified by small-scale IMAC and the eluate was run on SDS-PAGE. b) 6 different constructs from two proteins showing the difference in construct length and their level


Because we wanted to see if this approach could be used to experimentally determine domain borders, we ran two domain prediction programs, PFAM and BIOZON (footnote 9), and compared their predictions with our translational start points. As can be seen from Figure 12, our well-expressing constructs have translational starts that correlate well with predicted domain borders. Interestingly, some clones even have a translational start within a predicted domain, and these constructs would therefore not have been identified using standard domain predictions. It should also be noted that the two domain prediction programs determine domains based on different criteria, and a difference can therefore be seen in their predictions. This shows the importance of this type of experimental approach compared to rational construct design based on domain predictions.

Figure 12 Domains as predicted by PFAM and BIOZON and translational starts of constructs identified using the CoFi blot method.

9 http://www.biozon.org/


Another very intriguing observation is that for most targets there are clones that express soluble protein that is full length or close to full length. This could be the result of favourable changes in these regions that affect the transcription, translation and/or folding. This would be in concurrence with a report that regions downstream of the initiation codon can act as translational enhancers (127). From the initial set used in paper I, where only 30% of the mammalian proteins could be expressed we increased the success to 60%. In retrospect we have developed an experimental method, by generating random deletions to increase soluble expression and at the same time define domain borders. The CoFi blot is routinely used in our lab and has now also been adapted to allow the screening of recombinant overexpression of integral membrane proteins. It was verified by screening random mutation libraries of nine integral membrane proteins (128) and has now been used to screen a very large part of all human membrane proteins (Unpublished results).

Expression screening complete genomes
The main objective of an expression screen is to slim down the number of samples from a larger set. It could be applied to a library of different variants of the same gene, as described earlier, or to a collection of genes representing, for example, a protein family, a pathway or a whole genome. The handling of an entire genome and its cloning into an expression vector in particular could become quite troublesome, and new strategies are needed to handle very large gene collections for expression studies. A good starting point for such studies is Gateway® adapted entry clones, which allow easy subcloning of a gene into an expression vector through recombination (Figure 13). Today ORFeomes (collections of all ORFs from one organism/genome) for a number of genomes are available as Gateway clones (110, 129-132).


Figure 13 Schematic picture of Gateway® cloning from Invitrogen. A PCR product is generated flanked by specific recombination sites that are complementary with a vector, pENTR. The PCR product is moved through recombination into the pENTR creating an Entry clone with new recombination sites. The Entry clone is incubated with an expression vector containing recombination sites complementary to the new sites. An expression clone is created.

As mentioned earlier, a genome-wide expression screen was performed on the C. elegans ORFeome v 1.1 with very low success rates. This screen was done in the more traditional 96-well format. Even though all ORFs were available as Gateway clones and some steps were automated, this work was probably very labour intensive. Another attempt at handling and screening a part of the C. elegans ORFeome was reported by Gillette et al (133). This strategy, named POET (pooled ORFeome expression technology), was based on pooling 688 C. elegans ORFs and transferring them into an expression vector with a His-tag in a single recombination reaction. An expression strain was subsequently transformed with this pooled ORFeome library. The transformation reaction was then used to inoculate a larger volume to make a large expression culture. This culture was induced for expression, harvested and lysed. To separate soluble protein from insoluble, the lysate was purified by IMAC. To analyse expression and separate the different overexpressed proteins from each other, the eluate was run on a 2D-gel. Intense and deviant spots were picked for MS analysis. 165 spots were picked and 50 were identified as C. elegans proteins by MS. Out of these 50 proteins, 12 were chosen for small-scale expression and 6 of these for large-scale expression. This method acts as a rough sieve to identify strong expressers in a large gene collection. It also makes the handling, from cloning to expression screening, simpler. However, the individual DNA measurements and the pooling and subpooling of equal amounts must have been tiresome. In addition, this method might not be readily used in labs with more modest equipment, that is, without access to 2D-gel electrophoresis or MS equipment. POET is also haunted by some limiting factors, such as the step of converting a transformation reaction into a large culture and the isoelectric focusing range of the gel. In a culture containing several variants of a bacterial strain there is always a risk that one or a few variants take over the entire culture, and several targets might be lost in this step. With an isoelectric focusing range of pH 4-7, only proteins with a pI within this range will be separated, excluding a vast number of potentially overexpressed proteins. In the end, anything identified must be fished out from the original collection and recloned for any downstream processes.
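The numbers reported for POET can be read as a funnel from the pooled ORFs down to the targets taken into large-scale expression. The Python sketch below expresses each stage as a percentage of the 688 pooled ORFs. It is only a bookkeeping aid: the stage labels are paraphrased from the text, and since gel spots and ORFs are not strictly the same unit, the percentages for the spot stages should be read as rough indications.

```python
# Stage counts as reported in the text for the POET screen (133).
poet_funnel = [
    ("ORFs pooled", 688),
    ("2D-gel spots picked", 165),
    ("spots identified as C. elegans proteins", 50),
    ("chosen for small-scale expression", 12),
    ("chosen for large-scale expression", 6),
]


def funnel_report(stages):
    """Return (stage, count, percent of first stage) for each stage."""
    start = stages[0][1]
    return [(name, count, round(100.0 * count / start, 1)) for name, count in stages]


report = funnel_report(poet_funnel)
for name, count, percent in report:
    print(f"{name:42s} {count:4d}  {percent:5.1f}% of pool")
```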

Structural genomics and human pathogens

In the beginning of this thesis, a general introduction was given to human pathogens in the form of viruses. Although recombinant proteins and protein structures from viruses could aid the development of vaccines and drugs, relatively few structures of viral proteins have been solved (see footnote 10). Submissions of viral protein structures to the PDB from SG initiatives in particular have been modest (53). This could partly be explained by the focus within these initiatives being on other types of proteins. In two articles (50, 51) where all or part of the focus was on viral proteins, poor success rates were reported. The main problem was low expression and low solubility in E. coli. Success was predominantly achieved when each target was treated individually or expressed in insect cells.

Footnote 10: http://www.rcsb.org/pdb/home/home.do


The daily SCOOP (Paper IV)

We wanted to use the CoFi blot as a homing method to identify, within a genome, easily expressed targets and targets in need of special care. We designed a general strategy and applied it to the ORFeome of KSHV. KSHV consists of approximately 90 ORFs, which were supplied to us as 113 entry clones (132). ORFs coding for integral membrane proteins were cloned both as full-length constructs and as separate non-transmembrane domains. Like POET, our approach relied on pooling of Gateway® adapted entry clones and a single recombination reaction. However, instead of tedious DNA measurements we grew separate overnight cultures of bacteria containing the individual entry clones. OD600 measurements showed similar density for all of them, and equal amounts were pooled. Plasmids were extracted from this pool and the ORFs were moved, in one recombination reaction, into an expression vector with an N-terminal His-tag. A cloning strain was transformed with the reaction and plated. The library was harvested and used to transform an expression strain, which was plated. 4,000 colonies were subjected to the CoFi blot procedure. Positive colonies were analysed by small-scale purification to assess how well the expressed protein could be purified. Identification was done by regular Sanger sequencing. With this approach, 23 unique constructs were identified as expressing soluble protein, and 11 of these 23 were randomly chosen for large-scale expression and purification. To benchmark the method we compared it with the more traditional handling and screening: individual subcloning, expression screening and subsequent affinity purification, all in a 96-well format. With the more traditional setup, 25 constructs were identified as soluble expressers.
When comparing our new approach, now called SCOOP (screening colonies of ORFeome pools), with the more traditional approach, 23 and 25 constructs, respectively, were identified as yielding soluble protein in E. coli. Twenty constructs were common to the two experiments. To elucidate why 5 constructs were missed with SCOOP, their expression at the colony level was investigated using the CoFi blot. Soluble expression at the colony level could not be detected for 3 of the 5. The 3 additional constructs that were only identified with the CoFi blot were grown and induced for expression in liquid cultures. The soluble fraction was affinity purified and the eluate was run on SDS-PAGE gels. Very faint bands could be seen, indicating that these constructs were expressed at low levels. It has long been known that bacteria growing in a colony on a solid support behave differently from bacteria growing in liquid culture (134, 135). In this context we therefore propose either that the metabolic state of the colony, compared with that of bacteria growing in liquid, does not allow soluble expression of some ORFs and vice versa, or that the CoFi blot simply does not work for these proteins. In addition, when making pools of this kind some targets are inevitably lost along the way, which perhaps accounts for the additional two constructs not identified by the CoFi blot. Success rates will correlate with the overall coverage and normalisation of the library, and although we find some constructs more often than others, we conclude that our library is well dispersed since we cover almost all soluble expressers.

Although 113 constructs were screened by SCOOP, sequencing at a very late stage revealed empty vectors, mutations leading to frame shifts and early stop codons for 31 constructs. A TMHMM (see footnote 11) search of the remaining constructs predicted 8 ORFs to code for transmembrane proteins with at least 2 transmembrane regions. Of the 113 constructs used in the original experiment, only 74 (113 - 31 - 8) could therefore be expected to produce proteins of the expected size.

In a matter of days we have been able to screen the ORFeome of a human pathogen for soluble recombinant expression in E. coli using SCOOP (Figure 14). Once again we have shown the general applicability and advantages of the CoFi blot method, by which identified constructs can be moved directly further down the SG pipeline. A normalised library is created by pooling bacterial cultures instead of DNA, thus avoiding tedious DNA measurements. Subcloning into an expression vector is done in one recombination reaction and the subsequent screening is done with the CoFi blot.
Identification of positive constructs is done by sequencing. From this very small set of targets, soluble expression in one strain and at one temperature was roughly 30%, which was also verified by SCOOP. This number is a bit lower than the SG statistic of 37%, although that figure covers proteins from all types of viruses and comes with no information about full-length expression or expression conditions (53).

Footnote 11: http://www.cbs.dtu.dk/services/TMHMM-2.0/

Figure 14 Flow scheme of SCOOP.
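The benchmark figures for the KSHV screen can be cross-checked with a few lines of arithmetic. The sketch below uses only counts quoted in the text. Two things in it are assumptions rather than statements from the thesis: that the success rate of roughly 30% is taken relative to the 74 constructs expected to give protein of the right size, and that the library can be idealised as uniformly represented when estimating how likely a given construct is to be absent from 4,000 screened colonies. The text itself notes that some constructs were found more often than others, so the last estimate is an optimistic lower bound.

```python
# Counts quoted in the text for the KSHV screen.
SCOOP_HITS = 23          # soluble expressers identified by SCOOP
TRADITIONAL_HITS = 25    # soluble expressers identified in the 96-well screen
SHARED_HITS = 20         # constructs identified by both approaches
TOTAL_CONSTRUCTS = 113   # entry clones pooled
DEFECTIVE = 31           # empty vectors, frame shifts, early stop codons
TRANSMEMBRANE = 8        # predicted transmembrane ORFs among the remainder
COLONIES_SCREENED = 4000


def unique_hits(hits_a, hits_b, shared):
    """Constructs found only by approach A and only by approach B."""
    return hits_a - shared, hits_b - shared


def expressible_constructs(total, defective, transmembrane):
    """Constructs that could be expected to give protein of the expected size."""
    return total - defective - transmembrane


def miss_probability(colonies, constructs):
    """Chance that one given construct is absent from all screened colonies.

    ASSUMPTION (not from the thesis): each colony is an independent draw
    from a perfectly normalised library, so every construct has the same
    probability 1/constructs of being picked.
    """
    return (1.0 - 1.0 / constructs) ** colonies


if __name__ == "__main__":
    only_scoop, only_traditional = unique_hits(
        SCOOP_HITS, TRADITIONAL_HITS, SHARED_HITS
    )
    usable = expressible_constructs(TOTAL_CONSTRUCTS, DEFECTIVE, TRANSMEMBRANE)
    print("only SCOOP:", only_scoop, "only traditional:", only_traditional)
    print("expressible constructs:", usable)
    print(f"SCOOP success rate: {SCOOP_HITS / usable:.0%}")
    print(f"traditional success rate: {TRADITIONAL_HITS / usable:.0%}")
    print("miss probability:", miss_probability(COLONIES_SCREENED, TOTAL_CONSTRUCTS))
```

Under the uniform idealisation the oversampling (about 35 colonies per construct) makes sampling loss negligible, which suggests that constructs missed by SCOOP are more plausibly explained by uneven representation in the pool or by colony-level expression behaviour, as discussed in the text.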

SCOOP has also been applied to deletion libraries of the KSHV ORFeome. Pools were generated based on restriction enzyme compatibility for the Erase-a-base® protocol. Deletion libraries were generated and expression was screened using the CoFi blot. Initial results show that several other targets with widely different functions can be made to express in E. coli with this approach. We have also screened the overexpression of all integral membrane proteins from KSHV using the membrane protein CoFi blot.

SCOOP and other herpesvirus ORFeomes

As mentioned, structure determination of viral proteins has suffered, in part, from poor soluble expression in E. coli, and these types of proteins have therefore been considered “difficult”. As a result only 525 unique virus protein structures have been submitted to the PDB, a number that includes proteins from all types of viruses, from SARS and HIV to herpesviruses. Of these 525 structures only 31 have been submitted by SG initiatives worldwide. SCOOP was applied to four additional herpesvirus ORFeomes, from HSV-1, VZV, EBV and mCMV, available as Gateway® adapted entry clones. Using one expression strain and one temperature we were able to identify 75 unique constructs as expressing soluble protein. These constructs corresponded to proteins of various functions, such as DNA metabolism, immune evasion and tegument proteins, as well as proteins of unknown function. For 65 of them there was no protein structure in the PDB. Large-scale expression, purification and crystallisation trials have been attempted for all 65. So far 3 protein structures have been solved.

Moving further down the pipeline

One area that has been intensely studied for human herpesviruses is DNA metabolism, owing to the high degree of conservation among the proteins involved and the importance of DNA synthesis (5, 136). In fact, the most effective drug on the market today targets the viral infection at the level of DNA synthesis, and proteins involved in DNA metabolism would therefore be effective antiviral drug targets. Acyclovir is an acyclic nucleoside analogue used to treat alpha herpesvirus infections. It is phosphorylated to acyclovir monophosphate by the viral thymidine kinase. Downstream phosphorylations by cellular enzymes convert this monophosphate to a triphosphate which, when incorporated into the growing viral DNA chain by the viral DNA polymerase, causes chain termination. Acyclovir and its related compounds represent a success story in the general treatment of HSV, VZV and CMV infections (25, 137). They can be administered as a lotion on symptomatic tissue, orally or intravenously, depending on the severity of infection. However, resistant strains of these viruses have been identified, especially in HIV-infected persons, and almost all significant resistance has been seen in immunocompromised patients (25, 137, 138). For these patients more aggressive antiviral remedies have to be used.

In short, all herpesviruses encode a core set of six DNA synthesis enzymes (Table 4): a DNA polymerase, a DNA polymerase processivity subunit, three helicase-primase subunits and a single strand DNA binding protein (also known as ICP8). In addition to their importance in DNA synthesis they also mediate homologous recombination during viral replication (5, 139). A second set of proteins with links to DNA synthesis, more or less conserved in the herpesvirus family, includes the thymidine kinase, alkaline endo-exonuclease, ribonucleotide reductase, uracil N-glycosylase and a dUTPase (Table 4). These proteins are considered non-essential for replication in cultured cells but several appear essential for “normal” behaviour of the virus in animal models (5).

Table 4 The proteins involved in DNA replication and synthesis, their abbreviations and their gene names in all the human herpesviruses. The first six rows (white in the original) are proteins directly part of DNA synthesis and the remaining rows (grey in the original) are proteins with links to DNA synthesis.

| Enzyme | HSV-1/-2 | VZV | HCMV | HHV6/7 | EBV | KSHV |
|---|---|---|---|---|---|---|
| DNA polymerase, POL | UL30 | ORF28 | UL54 | U38 | BALF5 | ORF9 |
| DNA polymerase processivity subunit, PPS | UL42 | ORF16 | UL44 | U27 | BMRF1 | ORF59 |
| Helicase-primase ATPase subunit, HP1 | UL5 | ORF55 | UL105 | U77 | BBLF4 | ORF44 |
| Helicase-primase RNA pol subunit B, HP2 | UL52 | ORF6 | UL70 | U43 | BSLF1 | ORF56 |
| Helicase-primase subunit C, HP3 | UL8 | ORF52 | UL102 | U74 | BBLF2/3 | ORF40/41 |
| Single strand DNA binding protein, ICP8 | UL29 | ORF29 | UL57 | U41 | BALF2 | ORF6 |
| Thymidine kinase, TK | UL23 | ORF36 | | | BXLF1 | ORF21 |
| Alkaline exonuclease, AE | UL12 | ORF48 | UL98 | U70 | BGLF5 | ORF37 |
| Deoxyuridine triphosphatase, dUTPase | UL50 | ORF8 | UL72 | U45 | BLLF3 | ORF54 |
| Uracil-DNA glycosylase, UNG | UL2 | ORF59 | UL114 | U81 | BKRF3 | ORF46 |
| Ribonucleotide reductase, large subunit, R1 | UL39 | ORF19 | UL45 | U28 | BORF2 | ORF61 |
| Ribonucleotide reductase, small subunit, R2 | UL40 | ORF18 | - | - | BaRF1 | ORF60 |


The alkaline endo-exonuclease (AE) is conserved in all herpesviruses and is involved in the maturation and packaging of viral DNA (140, 141). In HSV this protein interacts with ICP8 (142). These two proteins can also work together to promote strand exchange similar to recombination events in bacteriophage λ (143). Recently it was shown that the AEs from both EBV and KSHV are involved in host shutoff at the mRNA level, an activity that has not been shown for the other HHV AEs (140, 144, 145). The well-known, allosterically regulated enzyme ribonucleotide reductase (RNR) is solely responsible for the de novo synthesis of all four deoxyribonucleotides. RNR is a heterotetramer of two subunits, R1 and R2 (146). All HHVs carry homologues of R1 and R2 except the beta herpesviruses, which lack R2 (5). Interestingly, the RNR of HSV is not negatively regulated by high TTP pools, unlike its cellular counterparts, and seems necessary for viral growth in “resting” cells (147, 148). There are reports that this viral enzyme has an accessory function in protecting HSV-1 infected cells against cytokine-induced apoptosis (149). Uracil N-glycosylase functions as a repair enzyme that excises U residues from DNA (150, 151). The virally encoded dUTPase breaks down dUTP, thus preventing its incorporation into viral DNA (152, 153). A better understanding of the proteins involved in DNA synthesis and replication, their mode of action and their interactions, both among themselves and with cellular proteins, could aid the development of new drugs targeting this important step of the viral life cycle.

The fundamental mission of all viruses is to replicate, spread and persist in the host environment to which they have become adapted. Viruses differ with respect to the mechanisms by which they accomplish their objectives.
The difference is reflected not only in the basic mechanisms of viral entry, protein synthesis, nucleic acid synthesis, virion assembly and egress, but also in the basic strategies by which they counteract the enormous resources of the host cell and of the multicellular organism, which aim at total viral extinction. In principle, a virus wants no reports of its presence to reach the immune system or the apoptotic machinery, and it will take the necessary precautions to achieve this. However, the immune system and the apoptotic machinery are not so easily tricked, having many backup systems to fend off and terminate any invader. To pass immune system control, the infected cell must show several signs of being “normal”; certain mechanisms therefore need to be upregulated while others need to be blocked. Anything remotely suspicious will result in an attack by the immune system. Immune evasion in the host is a major feature of herpesviruses. To persist and replicate in a host cell the virus has to launch several anti-host functions. For herpesviruses these functions tend to fall, more or less, into four categories: i) selectively blocking synthesis of new proteins, ii) blocking the function of pre-existing host proteins activated after infection, iii) selectively degrading cellular proteins and iv) blocking signalling to the host immune system that indicates that the cell is infected.

Host shutoff in HHV

Viruses such as polio, influenza and SARS-CoV frequently induce host shutoff to avoid competition from host cell transcripts during the mass production of their own proteins and to prevent expression of factors involved in stimulating an immune response (154). One way to take over a cell is to down-regulate cellular protein synthesis by degrading RNA (155, 156). Not only are host proteins not synthesised because of low mRNA levels, but pre-existing polyribosomes are also degraded and splicing is impaired. Host shutoff is a consequence of infection by alpha and gamma herpesviruses and occurs a few hours after infection (5, 157). One of the consequences of host shutoff is a block in the synthesis of HLA class I and II molecules, revealed by reduced levels of these antigen-presenting complexes at the surface of cells in the EBV lytic phase. This effect could lead to escape from T-cell recognition and elimination (157). In alpha herpesviruses, mRNA degradation is caused by the tegument protein vhs (virion host shutoff), encoded by UL41 (158). However, although vhs functions as an RNase it cannot achieve complete host shutoff by itself and collaborates with ICP27, which inhibits splicing of mRNA (5, 159). Intriguingly, it appears that some mRNAs are more resistant to degradation than others (160). Vhs homologues are found in all alpha herpesviruses but not in beta or gamma herpesviruses (157).


Host shutoff at the mRNA level has also been observed in gamma herpesviruses. However, this has been ascribed to another protein, the already mentioned alkaline exonuclease (AE) of these viruses (140, 144, 145). A screen of KSHV genes for the ability of their gene products to diminish expression of a GFP reporter protein in vivo led to the identification of the gene product of ORF37 as a host shutoff factor. Since this protein functions both as an exo- and endonuclease and as a shutoff protein, it was named SOX (shutoff and exonuclease) (140).

SOX

SOX is conserved in all herpesviruses as an alkaline exonuclease involved in DNA maturation and packaging, but only the gamma herpesvirus homologues display shutoff activity in vivo. SOX expression initiates 8 hours after lytic reactivation, which coincides precisely with the beginning of host shutoff (161). Unlike vhs, SOX is not a tegument protein (162). Co-expression of SOX with GFP resulted in loss of GFP fluorescence as a result of GFP mRNA degradation, a finding that was further verified by siRNA knockdowns (140, 144). In vitro, the herpesviral AEs exhibit a 5’-3’ and a 10-fold weaker 3’-5’ dsDNA exonuclease activity as well as an endonuclease activity (163). They work best at alkaline pH and have an absolute requirement for Mg2+ (although Mn2+ can work equally well for the HSV-1/2 AE at some concentrations) (164-166). Growth of the HSV UL12 null mutant is severely restricted, but the mutant is still able to replicate and encapsidate considerable amounts of viral DNA. The capsids produced by this mutant are impaired in egress from the nucleus to the cytoplasm (167). In addition the HSV AE, UL12, exhibits strand exchange activity in vitro and acts together with ICP8 as a recombinase (143, 168). In contrast to UL12, which localises only to the nucleus, SOX localises to both the nucleus and the cytoplasm (144).
Complex sequence alignments showed that all herpesviral AEs belong to the PD-(D/E)XK superfamily, which comprises endonucleases (e.g. EcoRI, EcoRV and BamHI), nicking enzymes and one exonuclease (169-172). Typically, sequence similarity between these proteins is so low that it is only upon structure determination that they are assigned to this superfamily. Despite the low sequence similarity, it has been proposed that all these proteins share a common core of a 4-5-stranded β-sheet that brings together the charged residues of the conserved motif PD-X(10-30)-(D/E)XK, in which 10-30 residues separate the PD and (D/E)XK elements, and which is involved in Mg2+ binding and catalysis (169). When all herpesviral AEs are aligned, seven conserved linear motifs can be identified (173). Both random and focused mutagenesis identified single and double function mutants. Single function mutants lacked either DNase activity in vitro or shutoff activity in vivo, whereas double function mutants had neither. Linear mapping of these mutations revealed that mutations that abolished shutoff activity are located outside the conserved motifs (except one, which affects a residue that is not conserved), whereas double function mutations and single function mutations affecting only DNase activity are located within the conserved linear motifs (Table 5, Figure 15) (144).

Table 5 Summary of locations of mutations that affect SOX activity (140, 144)

| Mutation | DNase activity | Shutoff activity | Conserved motif | Conserved residue in HHV AEs |
|---|---|---|---|---|
| T24I | Yes | No | No | No |
| A61T | Yes | No | No | No |
| Q129H | No | Yes | I | Yes |
| P176S | Yes | No | No | No |
| G217A | No | No | II | Yes |
| D221E | No | No | II | Yes |
| V369I | Yes | No | VI | No |
| D474N | Yes | No | No | No |
| Y477Stop | Yes | No | No | No |
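The single and double function categories used for Table 5 follow mechanically from the two activity columns. The sketch below transcribes the table into a small dictionary and applies that rule. The data structure, the function names and the category labels are illustrative choices and are not taken from the cited studies.

```python
# Table 5 transcribed as: mutation -> (DNase activity retained,
# shutoff activity retained, conserved motif or None).
TABLE_5 = {
    "T24I": (True, False, None),
    "A61T": (True, False, None),
    "Q129H": (False, True, "I"),
    "P176S": (True, False, None),
    "G217A": (False, False, "II"),
    "D221E": (False, False, "II"),
    "V369I": (True, False, "VI"),
    "D474N": (True, False, None),
    "Y477Stop": (True, False, None),
}


def classify(dnase_retained, shutoff_retained):
    """Name the mutant class implied by the two activity columns."""
    if not dnase_retained and not shutoff_retained:
        return "double function"
    if not dnase_retained:
        return "single function, DNase lost"
    if not shutoff_retained:
        return "single function, shutoff lost"
    return "both activities retained"


def group_by_class(table):
    """Map each mutant class to the mutations that fall into it."""
    groups = {}
    for mutation, (dnase, shutoff, _motif) in table.items():
        groups.setdefault(classify(dnase, shutoff), []).append(mutation)
    return groups


def shutoff_mutants_in_motifs(table):
    """Shutoff-only mutations that nevertheless map to a conserved motif."""
    return [
        mutation
        for mutation, (dnase, shutoff, motif) in table.items()
        if dnase and not shutoff and motif is not None
    ]


if __name__ == "__main__":
    for mutant_class, mutations in group_by_class(TABLE_5).items():
        print(f"{mutant_class}: {', '.join(mutations)}")
    print("shutoff-only mutations inside a motif:", shutoff_mutants_in_motifs(TABLE_5))
```

Applied to the table, the rule singles out G217A and D221E as the double function mutants and V369I as the one shutoff mutation that falls inside a conserved motif, in line with the exception noted in the text.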

Figure 15 Schematic figure of the seven (I-VII) conserved motifs of SOX and the mutations affecting the activity. Black box indicates a double function mutation. Grey box indicates single function mutations abolishing DNase activity and white boxes indicate single function mutations abolishing shutoff activity.
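The notation of the PD-(D/E)XK motif, in which the PD and (D/E)XK elements are separated by 10-30 residues, can be illustrated with a regular expression. The sketch below is only a reading aid for the notation. The test sequences are synthetic strings invented for the example and are not SOX or any real protein, and, as noted in the text, real members of the superfamily usually cannot be recognised from sequence alone.

```python
import re

# P-D, then a spacer of 10-30 residues of any kind, then D or E,
# one residue of any kind, and K. The spacer is matched lazily so the
# shortest admissible spacer is reported.
PD_DEXK = re.compile(r"PD(.{10,30}?)[DE].K")


def find_motifs(sequence):
    """Return (start index, spacer length, matched text) for each match.

    Matches are non-overlapping, as produced by re.finditer.
    """
    return [
        (match.start(), len(match.group(1)), match.group(0))
        for match in PD_DEXK.finditer(sequence)
    ]


if __name__ == "__main__":
    # Synthetic examples (hypothetical, for illustration only).
    with_motif = "MA" + "PD" + "A" * 12 + "EVK" + "GG"
    spacer_too_short = "MA" + "PD" + "A" * 5 + "EVK" + "GG"
    print(find_motifs(with_motif))
    print(find_motifs(spacer_too_short))
```

In the first synthetic sequence the match starts at index 2 with a 12-residue spacer. The second yields no match because only 5 residues separate the two elements.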


As a result it was proposed that SOX is a bifunctional protein that exhibits DNase activity in the nucleus and shutoff activity in the cytoplasm. Because mutations that abolished shutoff activity are located outside the conserved linear motifs, it was proposed that this activity was acquired later in evolution (140). However, since the shutoff activity could only be shown in vivo, it was suggested that SOX might need an additional cellular or viral binding partner to induce this action. The EBV homologue, BGLF5, has been shown by similar characterisations to possess the same features as SOX (145, 166, 174, 175). The two proteins share an amino acid sequence identity of 42%, and it has now been shown that both down-regulate expression of HLA molecules, which impairs the recognition of infected cells by T-cells, and that this is not a feature of the other herpesviral AEs (175).
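A sequence identity figure such as the 42% quoted for SOX and BGLF5 is computed from a pairwise alignment. The sketch below shows one common convention, counting identical residues over the aligned columns that contain no gap. The text does not state which convention or alignment program was used, so the denominator here is an assumption, and the two aligned strings are toy sequences rather than the real proteins.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identical residues over gap-free alignment columns.

    ASSUMPTION: the denominator is the number of columns in which neither
    sequence has a gap. Other conventions (alignment length, length of the
    shorter sequence) give different values for the same alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if residue_a == residue_b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, for illustration only).
    print(percent_identity("MKT-AYIAK", "MRTQAYLAK"))
```

In the toy alignment 8 columns are gap-free and 6 of them are identical, giving 75% identity under this convention.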

Structural studies of the SOX protein from KSHV (Paper V)

One of the proteins identified as positive with SCOOP was the KSHV SOX protein. Catalytically active SOX, both native and selenomethionine-labelled, was expressed with an N-terminal His-tag in E. coli BL21(DE3) and purified to homogeneity with our generic two-step purification scheme of IMAC and gel filtration. Well-diffracting crystals were obtained by vapour diffusion. Native crystals diffracted to 1.85 Å and phases were obtained by SAD. SOX consists of 487 residues and has a molecular mass of 55 kDa. It consists of two domains, giving it a rough bowl shape. The two domains, the N-terminal domain and the core domain, are connected by a partly unmodelled region called the bridge. The N-terminal domain consists of α-helices and the core domain consists of a 5-stranded β-sheet flanked by several helices, like the other members of the PD-(D/E)XK superfamily. The two domains form a crevice whose bottom is made up of a U-shaped loop, called the U-loop, stretching from the core domain into the N-terminal domain (Figure 16). This crevice is positively charged and is lined by amino acids that are highly conserved in all herpesviral AEs, including the residues of the already mentioned conserved sequence motif of the PD-(D/E)XK superfamily.


Figure 16 Overall model of SOX. The N-terminal domain in light and the core domain in dark. Three sulphates can be seen in the density and have therefore been modelled. Density for what we believe is a magnesium ion could be seen in the cleft and has therefore been modelled. The bridge connecting the N-terminal domain with the core domain is indicated by a dotted line

Densities can be seen for what we believe are three sulphates. Two are coordinated by totally or semi-conserved residues and are proposed to mark the binding site for the substrate backbone, and one is involved in crystal contacts. A magnesium ion has also been modelled in the crevice. This magnesium ion is partly coordinated by residues in the PD-(D/E)XK motif. A DALI search revealed a surprisingly similar fold to the only known exonuclease within the superfamily, λ-exonuclease (Figure 17) (176).

Figure 17 Superposition of a monomer of λ-exonuclease (dark) on SOX (light). λ-exonuclease is also a two-domain protein, and its two domains are connected by a completely modelled bridge. The partly modelled bridge in SOX corresponds to an ordered α-helix in λ-exonuclease, and the U-loop to a small β-sheet. A sulphate in SOX is bound in the same location as a phosphate in λ-exonuclease. In addition, the magnesium ion in SOX is located in the same position as a manganese ion in λ-exonuclease (not shown).


Substrate binding

In a superposition of SOX and λ-exonuclease, one of the sulphates is located and coordinated in the same way as a phosphate in the λ-exonuclease structure. The magnesium ion in SOX is also located and coordinated like a manganese ion in the λ-exonuclease structure, as well as like other divalent ions in other proteins belonging to the same superfamily. Since this area of SOX is highly conserved, and since both the tertiary structure and the positions of the ligands align with those of λ-exonuclease, we propose that this is the site for substrate binding. However, although the overall folds of these two enzymes are very similar, dissimilarities can be seen. For instance, SOX is larger than λ-exonuclease, and where SOX has partly unstructured and unmodelled regions (such as the bridge and the U-loop), λ-exonuclease has ordered secondary structure. We believe that the λ-exonuclease structure represents what for SOX would be a substrate-bound state, in which conformational changes lead to ordered secondary structure in the bridge and the U-loop. Another major difference between these two proteins is their quaternary structure: λ-exonuclease is active as a trimer (176), which could explain its processivity, whereas neither prior reports nor our results indicate that SOX functions as anything but a monomer (166). The second modelled sulphate in SOX, located at the beginning of the crevice, is coordinated by totally or semi-conserved residues, which would indicate that this is a true substrate backbone binding site. To elucidate whether this sulphate could represent part of the substrate backbone, we modelled a 4 bp oligo by aligning the phosphates in the backbone to the two sulphates in the model. As can be seen (Figure 18), this oligo fits nicely in the highly conserved crevice and aligns very well with our sulphates. The second phosphate in the oligo backbone is placed very close to the modelled magnesium ion.
It is possible that the phosphodiester bond is cleaved by nucleophilic attack by an activated water molecule close to the magnesium ion, which stabilises negative charges in this area, cleaving off one nucleotide at a time. Since the substrate of SOX can be both double-stranded and single-stranded, SOX must have a type of unwinding activity. We therefore believe that SOX separates the strands of the dsDNA, and that the single-stranded substrate destined for digestion is guided into the active site by the bridge and stabilised by binding to conserved and partly conserved aromatic residues in and around the crevice. The backbone is cleaved and the free nucleotide is released on the opposite side of the “tunnel” created under the bridge. The remaining free single strand of the substrate is directed away from the protein surface.

Figure 18 Conserved residues in SOX as determined by sequence alignments with all other HHV AEs. Totally conserved residues are depicted in red and least conserved residues are light blue. A 4bp oligo was modelled in the crevice based on the location of the two sulphates.

SOX as an RNase?

An intriguing question is whether SOX can mediate host shutoff through an intrinsic RNase activity. As already mentioned, functional characterisations of SOX in vivo show that it affects the total mRNA content of the host cell, and although host shutoff is observed in alpha herpesviruses, the other herpesviral AEs cannot induce this type of degradation.

Figure 19 Mapping mutations that affect the activity of SOX. Red deletes overall activity. Orange deletes DNase activity and pink the shutoff activity.


In addition, single function mutations abolishing the shutoff activity of SOX are located outside the conserved linear sequence motifs found in all herpesviral AEs. It has therefore been proposed that the host shutoff trait of SOX, and of BGLF5 from EBV, was acquired later in evolution. It has also been proposed that SOX might need an additional cellular binding partner, or that it does not have any intrinsic RNase activity and only activates some sort of RNA decay pathway (140, 144). With the structure in hand we can now map the known mutations that affect the shutoff function. As can be seen (Figure 19), mutations that abolish shutoff function are spread all over the protein, with a predominance in the N-terminal domain. The mutations are located both on the exposed surface and in buried regions, although most are found close to the surface. Since SOX has a nuclease-type active site, with an extended pocket that appears well suited for binding nucleic acids, and since no other obvious binding or active site can be seen in the structure, it could be argued that both the DNase activity and a potential shutoff RNase activity are carried out in the same active site. In support of this theory, both double function mutations (G217A and D221E), located in the U-loop, and one shutoff mutation (P176S), located at the end of the unmodelled bridge, affect residues that are either directly involved in metal binding or very close to the substrate binding and active site. As in most mutational studies carried out in vivo, it cannot be ruled out that these mutations only affect the folding and stability of SOX, cancelling out activity altogether. Even so, these mutations emphasise the importance of this region. However, since RNase activity cannot be detected for SOX in vitro, additional proteins are likely to be required to enable the nuclease activity for mRNA degradation. These could be viral or host factors mediating the interaction of SOX with conserved elements of mRNA.
What would support the hypothesis of an additional cellular binding partner for shutoff activity is the predominance of mutations affecting shutoff in the N-terminal domain, especially in parts that are not present in λ-exonuclease. Perhaps this domain, which also carries a more positive overall charge than the rest of the protein, plays a role in binding cellular proteins. In this scenario, it cannot presently be determined whether the RNA degradation is carried out by SOX or by the potential binding partner.


Nucleotide synthesis

For every living cell, cell division is a highly regulated process, and even the slightest deviation could lead to serious consequences, of which cancer is a classic example. In a multicellular organism there has to be a strict balance between cells that rest, divide and die. The cell cycle is a series of events that lead to cell division and is under strict control of key regulators called cyclins and cyclin-dependent kinases. The cell cycle is roughly divided into four phases: G1, S, G2 and M. During G1, cells synthesise the RNA and proteins needed for S-phase, which is characterised by DNA synthesis and chromosome replication. In G2, proteins needed for cell division are synthesised, and this is followed by the actual cell division during M-phase. An additional phase, called G0, is entered from G1 and is the state of cells that are non-dividing or fully differentiated. The time a cell spends in G0 varies with cell type; some cells, e.g. neurons, never leave it. The passage from one phase to another is under rigorous regulation and has certain checkpoints that need to be cleared for safe cell division. Deoxyribonucleotides are the building blocks of DNA, and their synthesis peaks during S-phase. The enzyme responsible for the production of deoxyribonucleotides is the extremely well-studied and strictly regulated ribonucleotide reductase (146). However, this enzyme is also important for the synthesis of deoxyribonucleotides in resting cells as a response to DNA damage. Both alpha and gamma herpesviruses encode their own RNR, while beta herpesviruses encode only one of the RNR subunits (R1) (5).

Ribonucleotide reductase



RNR is an enzyme that can be found in all kingdoms of life as well as in viruses. As already mentioned, RNR performs the chemically difficult reduction of all four ribonucleotides to their corresponding deoxyribonucleotides in a reaction involving a free radical (177). The allosterically regulated RNR is the only enzyme that can perform this reaction and it therefore controls the entire dNTP pool in the cell.


Fig 20 The reaction catalysed by RNR.

RNRs have been divided into three classes, I, II and III, based on their radical cofactors, and class I is further divided into subclasses Ia and Ib. A strictly balanced supply of deoxyribonucleotides is essential for accurate DNA replication and repair (146). RNR could therefore be a potential drug target for the treatment of different types of cancer and viral infections (178-182). RNR is a heterotetramer consisting of two homodimers, R1 and R2, and has an absolute requirement for iron in the form of a di-iron site located in the R2 subunit. A free radical is generated by reductive cleavage of molecular oxygen at the di-iron site of the R2 subunit and is transported to the R1 subunit to activate the nucleotide substrate for catalysis (177). RNR activity is under strict transcriptional and post-transcriptional control, and in replicating mammalian cells enzyme activity is regulated by control of R2 protein stability (146, 177). However, to produce deoxyribonucleotides as a response to DNA damage in resting cells, the expression of another cellular R2 is induced. This R2 is under the control of the transcription factor and tumour suppressor p53 and has hence been named p53R2 (183, 184). p53R2 also appears to play a role in providing deoxyribonucleotides for mitochondrial DNA synthesis (188).

Structural studies of the R2 subunit of the ribonucleotide reductase from EBV and KSHV (manuscript in preparation)

The EBV genome encodes a class I RNR R2 subunit with low amino acid sequence identity to bacterial and eukaryotic R2s. This R2, encoded by the BaRF1 gene, was identified as positive by SCOOP of the EBV ORFeome and differs from the R2s of other organisms in that it has a glutamate (Glu61) in the position of a metal-coordinating aspartate common to most R2s. The structure of EBV R2 was solved in its apo, mono-metal and di-metal forms (Figure 21) at 1.9, 1.6 and 2.0 Å, respectively, and their radical-generating abilities were investigated using electron paramagnetic resonance (EPR). The monomer of EBV R2 is an α-helix bundle, like other R2s, but with several interesting differences in the region around the metal site (Figure 22). In one of the monomers in the apo and mono-metal structures, residues 62-73 of α-helix B are disordered and Glu61 is either disordered or oriented away from the metal binding site. Additionally, there is no traceable density for residues 120-143 in 2 and D, indicating that these elements too are disordered. These stretches seem to order themselves only when the di-ferrous metal site is fully formed, with Glu61 participating in metal coordination. Such large conformational flexibility upon iron site formation has not previously been described for any R2 subunit. Recently, however, disorder in this particular region has been found in other R2 structures newly available in the PDB: H. sapiens p53R2 (2VUX), Bacillus halodurans (2RCC), Plasmodium vivax (2O1Z) and S. cerevisiae (2RCC). It is likely that the extra flexibility of helix B affects the rate of iron binding and therefore the timing of the radical generation reaction. EPR confirms that EBV R2 is slower in generating the radical and that the radical site is more accessible to radical scavengers. These observed deviations in the different metal-bound forms of EBV R2 become all the more interesting when put into a wider virus-host cell context. p53R2 also displays this disordered region. In non-proliferating cells, DNA damage induces expression of p53R2 and also of the R1 protein. Together they form an active RNR complex supplying the resting cell with dNTPs for DNA repair (183, 186). Since the host cell already encodes its own RNR, it is believed that the viral RNR supplies nucleotides in a resting host cell (187).
The mobility of these regions in both EBV R2 and p53R2 may be a relevant adaptation to particular chemical environments in a resting cell, and may have a functional role in modulating iron and oxygen access, respectively to the active site.


Figure 21 Model of the EBV R2 homodimer in its di-metal form. The subunits are coloured in light blue and dark blue. Iron atoms are coloured in orange.

Figure 22 Cartoon models of a) apo, b) mono-metal and c) di-metal EBV R2. The secondary structure elements described in the text, which are disordered for one of the monomers in the apo and mono-metal forms, are coloured in cyan (helix B) and light pink (helices 2 and D).

We have also solved the apo structure of KSHV R2, encoded by ORF60. Initial results show that the apo form of KSHV R2 also has the unstructured stretch in α-helix B, which further supports the hypothesised specific role of gammaherpesvirus R2s and p53R2. However, the chemical basis for this role remains to be elucidated.


Future prospects

Needless to say, viral biology is an interesting field. To thoroughly understand viral evolution and pathogenesis, and to develop new drugs and potentially new vaccines and diagnostic assays, we need to study the individual viral proteins both functionally and structurally. Unfortunately, the recombinant overexpression of viral proteins has so far proven to be very difficult, and therefore very few viral protein structures have been determined. We have developed methods and strategies to speed up the structural genomics pipeline and increase its throughput. Recombinant viral proteins are routinely pushed through our pipeline, and protein structures are coming out as a result. Three structures have so far been solved of proteins involved in DNA metabolism, an area of both scientific and medical interest. However, to really comprehend a virus' interaction with its host, structural studies of protein complexes are necessary, and these will be pursued in the future. The CoFi blot and SCOOP are routinely used in our lab to screen gene collections of various kinds. The CoFi blot is evolving and we are finding new applications for it.


Acknowledgements

When I set out to write the acknowledgements I realised that what I appreciate the most in all the people who have surrounded me during this journey is their fantastic patience and their never-fading trust and support. Many people have been part of it and I would especially like to thank a few of them.

Pär, for your trust. Not only for accepting me as a student but also for your conviction that my project would work. For letting me go all the way to China. You have created an inspiring environment and I have learned a lot, both about research and about myself.

Stefan Nordlund, for always listening patiently to a student's predicaments.

Everyone at DBB, especially Anki, Kicki, Lotta and Ann, who have helped me many times with administrative matters.

Past and present members of the PN group, especially: Martin A, Vickan, Susanne F, Helena, Susanne VdB, Deb, Tove, Karl-Magnus, Agnes, Martin M, Martin Hä, Amin, Benita, Anders T, Jessica A, Andreas, Herwig, Albert, Ken, Lina-Beth, Alberto, Marie, Pelle, Hanna, Henrik, Anna, Florian, Elisabeth and Marina.

Monica, for all the words of wisdom about life. Said, for your fantastic patience and for taking care of the lab!! Ulrika, U as in "ullig" (woolly), L for labrador!!!! For being a fantastic friend. Audur, for insanely fun times both in and outside the lab! And all the help!!!! Lola, for always sharing ranting experiences and listening to my ranting. Heidi, for talks and support. Martin, Stina, Pål and Emma, for uplifting discussions touching on all sorts of topics! Daniel M-M, for always being able to make me laugh. Karin, for friendship in the lab and outside it. Daniel G, for your humour. Damian, for patiently listening and helping me with my computer problems. Maria D, for all the smiles.

People at Beida, especially Xiao Dong, Erik, Nan Jie, Xiaoyan, Koh and Dr Gao! Xie xie. Sunny "shi pa" and family, your generosity and curiosity overwhelm me. I am certain we will meet again!

My China friends, Anders K, Kim, Tove, Anna, Sara, Anders L, Ola and Maria!!!!

Stockholms biovetenskapliga forskarskola (the Stockholm graduate school in biosciences), led by Eva and Camilla. My graduate school friends, especially Anton, Anna and Linda. Your stories make me realise that I am not alone!

Mia and Linda, for being completely outside lab life but always on my side! Jenny, for always being ready to have lunch. Crista and Jill, for your never-ceasing friendship although you are so far away. Jasmine and Stephanie, for great times and really being there for me!

The Nolvi, Jansson/Umana, Tronström and Sundesson families.

My entire family in Singapore, who mean the world to me, see you soon!

The Cornvik family, for always listening to me and my endless nagging about everything and nothing.

Anna-Karin, there are no words to describe our friendship. You are the best in the world and no one can replace you!

Kim, the world's best brother, who always makes sure to bring me back down to earth and who is always there for me.

Mum and Dad, for standing by me without any hesitation. You give me perspective and always try to understand what I am doing, both in the lab and with my life.

Tobias, for the fantastic person you are. For what you have meant, mean and always will mean to me. You make me complete in every respect. DOJ, DOJ!


References

1. Koonin, E. V., Senkevich, T. G., and Dolja, V. V. (2006) The ancient Virus World and evolution of cells, Biol Direct 1, 29.
2. Cann, A. J. (2005) Principles of Molecular Virology (Standard Edition), Fourth Edition.
3. Abo, M. E., and Sy, A. A. (1998) Rice Virus Diseases: Epidemiology and Management Strategies, J. Sust. Agri. 11, 113-134.
4. Khodosevich, K., Lebedev, Y., and Sverdlov, E. (2002) Endogenous retroviruses and human evolution, Comp. Funct. Genomics 3, 494-498.
5. Arvin, A., Campadelli-Fiume, G., Mocarski, E., Moore, P. S., Roizman, B., Whitley, R., and Yamanishi, K. (2007) Human Herpesviruses, Biology, Therapy and Immunoprophylaxis, First Edition.
6. Dutta, A. (2008) Epidemiology of poliomyelitis-Options and update, Vaccine.
7. Zhu, F. X., Chong, J. M., Wu, L., and Yuan, Y. (2005) Virion proteins of Kaposi's sarcoma-associated herpesvirus, J. Virol. 79, 800-811.
8. Baldick, C. J., Jr., and Shenk, T. (1996) Proteins associated with purified human cytomegalovirus particles, J. Virol. 70, 6097-6105.
9. Johannsen, E., Luftig, M., Chase, M. R., Weicksel, S., Cahir-McFarland, E., Illanes, D., Sarracino, D., and Kieff, E. (2004) Proteins of purified Epstein-Barr virus, Proc. Natl. Acad. Sci. U. S. A. 101, 16286-16291.
10. Loret, S., Guay, G., and Lippe, R. (2008) Comprehensive characterization of extracellular herpes simplex virus type 1 virions, J. Virol. 82, 8605-8618.
11. Kalejta, R. F. (2008) Tegument proteins of human cytomegalovirus, Microbiol. Mol. Biol. Rev. 72, 249-265.
12. Heldwein, E. E., and Krummenacher, C. (2008) Entry of herpesviruses into mammalian cells, Cell. Mol. Life Sci. 65, 1653-1668.
13. Spear, P. G. (2004) Herpes simplex virus: receptors and ligands for cell entry, Cell. Microbiol. 6, 401-410.
14. Mori, I., and Nishiyama, Y. (2005) Herpes simplex virus and varicella-zoster virus: why do these human alphaherpesviruses behave so differently from one another?, Rev. Med. Virol. 15, 393-406.
15. Lin, A., Xu, H., and Yan, W. (2007) Modulation of HLA expression in human cytomegalovirus immune evasion, Cell. Mol. Immunol. 4, 91-98.

16. Laurent, C., Meggetto, F., and Brousset, P. (2008) Human herpesvirus 8 infections in patients with immunodeficiencies, Hum. Pathol. 39, 983-993.
17. Klein, E., Kis, L. L., and Klein, G. (2007) Epstein-Barr virus infection in humans: from harmless to life endangering virus-lymphocyte interactions, Oncogene 26, 1297-1305.
18. Griffiths, P. D. (2006) CMV as a cofactor enhancing progression of AIDS, J. Clin. Virol. 35, 489-492.
19. Sandri-Goldin, R. M. (2003) Replication of the herpes simplex virus genome: does it really go around in circles?, Proc. Natl. Acad. Sci. U. S. A. 100, 7428-7429.
20. Deshmane, S. L., and Fraser, N. W. (1989) During latency, herpes simplex virus type 1 DNA is associated with nucleosomes in a chromatin structure, J. Virol. 63, 943-947.
21. Kennedy, P. G., and Chaudhuri, A. (2002) Herpes simplex encephalitis, J. Neurol. Neurosurg. Psychiatry 73, 237-238.
22. Landolfo, S., Gariglio, M., Gribaudo, G., and Lembo, D. (2003) The human cytomegalovirus, Pharmacol. Ther. 98, 269-297.
23. De Bolle, L., Naesens, L., and De Clercq, E. (2005) Update on human herpesvirus 6 biology, clinical features, and therapy, Clin. Microbiol. Rev. 18, 217-245.
24. Vancikova, Z., and Dvorak, P. (2001) Cytomegalovirus infection in immunocompetent and immunocompromised individuals--a review, Curr. Drug Targets Immune Endocr. Metabol. Disord. 1, 179-187.
25. Mercorelli, B., Sinigalia, E., Loregian, A., and Palu, G. (2008) Human cytomegalovirus DNA replication: antiviral targets and drugs, Rev. Med. Virol. 18, 177-210.
26. Du, M. Q., Bacon, C. M., and Isaacson, P. G. (2007) Kaposi sarcoma-associated herpesvirus/human herpesvirus 8 and lymphoproliferative disorders, J. Clin. Pathol. 60, 1350-1357.
27. Barozzi, P., Potenza, L., Riva, G., Vallerini, D., Quadrelli, C., Bosco, R., Forghieri, F., Torelli, G., and Luppi, M. (2007) B cells and herpesviruses: a model of lymphoproliferation, Autoimmun. Rev. 7, 132-136.
28. Cotter, M. A., 2nd, and Robertson, E. S. (1999) The latency-associated nuclear antigen tethers the Kaposi's sarcoma-associated herpesvirus genome to host chromosomes in body cavity-based lymphoma cells, Virology 264, 254-264.
29. Leight, E. R., and Sugden, B. (2000) EBNA-1: a protein pivotal to latent infection by Epstein-Barr virus, Rev Med Virol 10, 83-100.
30. Thorley-Lawson, D. A., Duca, K. A., and Shapiro, M. (2008) Epstein-Barr virus: a paradigm for persistent infection - for real and in virtual reality, Trends Immunol. 29, 195-201.
31. Lunemann, J. D., and Munz, C. (2007) Epstein-Barr virus and multiple sclerosis, Curr. Neurol. Neurosci. Rep. 7, 253-258.

32. Posnett, D. N. (2008) Herpesviruses and autoimmunity, Curr. Opin. Investig. Drugs 9, 505-514.
33. Chang, Y., Cesarman, E., Pessin, M. S., Lee, F., Culpepper, J., Knowles, D. M., and Moore, P. S. (1994) Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi's sarcoma, Science 266, 1865-1869.
34. Verma, S. C., and Robertson, E. S. (2003) Molecular biology and pathogenesis of Kaposi sarcoma-associated herpesvirus, FEMS Microbiol. Lett. 222, 155-163.
35. Dedicoat, M., and Newton, R. (2003) Review of the distribution of Kaposi's sarcoma-associated herpesvirus (KSHV) in Africa in relation to the incidence of Kaposi's sarcoma, Br. J. Cancer 88, 1-3.
36. Haburchak, D. R., Thomason, J. W., Edelman, D. C., and Constantine, N. T. (2001) Negative human herpesvirus 8 serology in sarcoidosis, J. Hum. Virol. 4, 111.
37. Di Alberti, L., Piattelli, A., Artese, L., Favia, G., Patel, S., Saunders, N., Porter, S. R., Scully, C. M., Ngui, S. L., and Teo, C. G. (1997) Human herpesvirus 8 variants in sarcoid tissues, Lancet 350, 1655-1661.
38. Sjak-Shie, N. N., Vescio, R. A., and Berenson, J. R. (1999) The role of human herpesvirus-8 in the pathogenesis of multiple myeloma, Hematol. Oncol. Clin. North Am. 13, 1159-1167.
39. Zong, J. C., Arav-Boger, R., Alcendor, D. J., and Hayward, G. S. (2007) Reflections on the interpretation of heterogeneity and strain differences based on very limited PCR sequence data from Kaposi's sarcoma-associated herpesvirus genomes, J. Clin. Virol. 40, 1-8.
40. Choi, J., Means, R. E., Damania, B., and Jung, J. U. (2001) Molecular piracy of Kaposi's sarcoma associated herpesvirus, Cytokine Growth Factor Rev. 12, 245-257.
41. Fujimuro, M., Hayward, S. D., and Yokosawa, H. (2007) Molecular piracy: manipulation of the ubiquitin system by Kaposi's sarcoma-associated herpesvirus, Rev. Med. Virol. 17, 405-422.
42. Neipel, F., Albrecht, J. C., and Fleckenstein, B. (1997) Cell-homologous genes in the Kaposi's sarcoma-associated rhadinovirus human herpesvirus 8: determinants of its pathogenicity, J. Virol. 71, 4187-4192.
43. McGeoch, D. J., and Cook, S. (1994) Molecular phylogeny of the alphaherpesvirinae subfamily and a proposed evolutionary timescale, J. Mol. Biol. 238, 9-22.
44. Davison, A. (2002) Comments on the phylogenetics and evolution of herpesviruses and other large DNA viruses, Virus Res. 82, 127-132.
45. Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M., McKenney, K., Sutton, G., FitzHugh, W., Fields, C., Gocayne, J.D., Scott, J., Shirley, R., Liu, L-I., Glodek, A., Kelley, J.M, Weidman, J.F., Phillips, C.A., Spriggs, T., Hedblom, E., Cotton, M.D., Herback, T., Hanna, M.C., Nguyen, D.T., Saudek, D.M., Brandon, R.C., Fine, L.D., Frichtman, J.L., Fuhrmann, J.L., Geoghagen, N.S.M., Gnehm, C.C., McDonald, L.A., Small, K.V., Fraser, C.M., Smith, H.O., Venter, J.C. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd, Science 269, 496-512.
46. (2001) Initial sequencing and analysis of the human genome, Nature 409, 860-921.
47. Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., Gocayne, J. D., Amanatides, P., Ballew, R. M., Huson, D. H., Wortman, J. R., Zhang, Q., Kodira, C. D., Zheng, X. H., Chen, L., Skupski, M., Subramanian, G., Thomas, P. D., Zhang, J., Gabor Miklos, G. L., Nelson, C., Broder, S., Clark, A. G., Nadeau, J., McKusick, V. A., Zinder, N., Levine, A. J., Roberts, R. J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Francesco, V. D., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A. E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T. J., Higgins, M. E., Ji, R.-R., Ke, Z., Ketchum, K. A., Lai, Z., Lei, Y., Li, Z., Li, J., Liang, Y., Lin, X., Lu, F., Merkulov, G. V., Milshina, N., Moore, H. M., Naik, A. K., Narayan, V. A., Neelam, B., Nusskern, D., Rusch, D. B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z. Y., Wang, A., Wang, X., Wang, J., Wei, M.-H., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S. C., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M. L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, M., Moy, L., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y.-H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N. N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, S., Williams, M., Windsor, S., Winn-Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J. F., Guigo, R., Campbell, M. J., Sjolander, K. V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes-Stine, J., Caulk, P., Chiang, Y.-H., Coyne, M., Dahlke, C., Mays, A. D., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W., McDaniel, J., Murphy, S., Newman, M., Nguyen, T., Nguyen, N., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. (2001) The Sequence of the Human Genome, Science 291, 1304-1351.
48. Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A., Wu, D. Y., Paulsen, I., Nelson, K. E., Nelson, W., Fouts, D. E., Levy, S., Knap, A. H., Lomas, M. W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y. H., and Smith, H. O. (2004) Environmental genome shotgun sequencing of the Sargasso Sea, Science 304, 66-74.
49. Kim, S. H. (1998) Shining a light on structural genomics, Nat. Struct. Biol. 5 Suppl, 643-645.
50. Tarbouriech, N., Buisson, M., Geoui, T., Daenke, S., Cusack, S., and Burmeister, W. P. (2006) Structural genomics of the Epstein-Barr virus, Acta Crystallogr. D. Biol. Crystallogr. 62, 1276-1285.
51. Fogg, M. J., Alzari, P., Bahar, M., Bertini, I., Betton, J. M., Burmeister, W. P., Cambillau, C., Canard, B., Corrondo, M. A., Coll, M., Daenke, S., Dym, O., Egloff, M. P., Enguita, F. J., Geerlof, A., Haouz, A., Jones, T. A., Ma, Q., Manicka, S. N., Migliardi, M., Nordlund, P., Owens, R. J., Peleg, Y., Schneider, G., Schnell, R., Stuart, D. I., Tarbouriech, N., Unge, T., Wilkinson, A. J., Wilmanns, M., Wilson, K. S., Zimhony, O., and Grimes, J. M. (2006) Application of the use of high-throughput technologies to the determination of protein structures of bacterial and viral pathogens, Acta Crystallogr. D. Biol. Crystallogr. 62, 1196-1207.
52. Marsden, R. L., and Orengo, C. A. (2008) Target selection for structural genomics: an overview, Meth. Mol. Biol. 426, 3-25.
53. Graslund, S., Nordlund, P., Weigelt, J., Hallberg, B. M., Bray, J., Gileadi, O., Knapp, S., Oppermann, U., Arrowsmith, C., Hui, R., Ming, J., dhe-Paganon, S., Park, H. W., Savchenko, A., Yee, A., Edwards, A., Vincentelli, R., Cambillau, C., Kim, R., Kim, S. H., Rao, Z., Shi, Y., Terwilliger, T. C., Kim, C. Y., Hung, L. W., Waldo, G. S., Peleg, Y., Albeck, S., Unger, T., Dym, O., Prilusky, J., Sussman, J. L., Stevens, R. C., Lesley, S. A., Wilson, I. A., Joachimiak, A., Collart, F., Dementieva, I., Donnelly, M. I., Eschenfeldt, W. H., Kim, Y., Stols, L., Wu, R., Zhou, M., Burley, S. K., Emtage, J. S., Sauder, J. M., Thompson, D., Bain, K., Luz, J., Gheyi, T., Zhang, F., Atwell, S., Almo, S. C., Bonanno, J. B., Fiser, A., Swaminathan, S., Studier, F. W., Chance, M. R., Sali, A., Acton, T. B., Xiao, R., Zhao, L., Ma, L. C., Hunt, J. F., Tong, L., Cunningham, K., Inouye, M., Anderson, S., Janjua, H., Shastry, R., Ho, C. K., Wang, D., Wang, H., Jiang, M., Montelione, G. T., Stuart, D. I., Owens, R. J., Daenke, S., Schutz, A., Heinemann, U., Yokoyama, S., Bussow, K., and Gunsalus, K. C. (2008) Protein production and purification, Nat. Methods 5, 135-146.
54. Lesley, S. A., Kuhn, P., Godzik, A., Deacon, A. M., Mathews, I., Kreusch, A., Spraggon, G., Klock, H. E., McMullan, D., Shin, T., Vincent, J., Robb, A., Brinen, L. S., Miller, M. D., McPhillips, T. M., Miller, M. A., Scheibe, D., Canaves, J. M., Guda, C., Jaroszewski, L., Selby, T. L., Elsliger, M. A., Wooley, J., Taylor, S. S., Hodgson, K. O., Wilson, I. A., Schultz, P. G., and Stevens, R. C. (2002) Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline, Proc. Natl. Acad. Sci. U. S. A. 99, 11664-11669.
55. Bussow, K., Scheich, C., Sievert, V., Harttig, U., Schultz, J., Simon, B., Bork, P., Lehrach, H., and Heinemann, U. (2005) Structural genomics of human proteins--target selection and generation of a public catalogue of expression clones, Microb. Cell. Fact. 4, 21.
56. Heinemann, U., Bussow, K., Mueller, U., and Umbach, P. (2003) Facilities and methods for the high-throughput crystal structural analysis of human proteins, Acc. Chem. Res. 36, 157-163.
57. Banci, L., Bertini, I., Cusack, S., de Jong, R. N., Heinemann, U., Jones, E. Y., Kozielski, F., Maskos, K., Messerschmidt, A., Owens, R., Perrakis, A., Poterszman, A., Schneider, G., Siebold, C., Silman, I., Sixma, T., Stewart-Jones, G., Sussman, J. L., Thierry, J. C., and Moras, D. (2006) First steps towards effective methods in exploiting high-throughput technologies for the determination of human protein structures of high biomedical value, Acta Crystallogr. D. Biol. Crystallogr. 62, 1208-1217.
58. Aricescu, A. R., Assenberg, R., Bill, R. M., Busso, D., Chang, V. T., Davis, S. J., Dubrovsky, A., Gustafsson, L., Hedfalk, K., Heinemann, U., Jones, I. M., Ksiazek, D., Lang, C., Maskos, K., Messerschmidt, A., Macieira, S., Peleg, Y., Perrakis, A., Poterszman, A., Schneider, G., Sixma, T. K., Sussman, J. L., Sutton, G., Tarboureich, N., Zeev-Ben-Mordehai, T., and Jones, E. Y. (2006) Eukaryotic expression: developments for structural proteomics, Acta Crystallogr. D. Biol. Crystallogr. 62, 1114-1124.
59. Gong, W. M., Liu, H. Y., Niu, L. W., Shi, Y. Y., Tang, Y. J., Teng, M. K., Wu, J. H., Liang, D. C., Wang, D. C., Wang, J. F., Ding, J. P., Hu, H. Y., Huang, Q. H., Zhang, Q. H., Lu, S. Y., An, J. L., Liang, Y. H., Zheng, X. F., Gu, X. C., and Su, X. D. (2003) Structural genomics efforts at the Chinese Academy of Sciences and Peking University, J. Struct. Funct. Genomics 4, 137-139.
60. Alzari, P. M., Berglund, H., Berrow, N. S., Blagova, E., Busso, D., Cambillau, C., Campanacci, V., Christodoulou, E., Eiler, S., Fogg, M. J., Folkers, G., Geerlof, A., Hart, D., Haouz, A., Herman, M. D., Macieira, S., Nordlund, P., Perrakis, A., Quevillon-Cheruel, S., Tarandeau, F., van Tilbeurgh, H., Unger, T., Luna-Vargas, M. P., Velarde, M., Willmanns, M., and Owens, R. J. (2006) Implementation of semi-automated cloning and prokaryotic expression screening: the impact of SPINE, Acta Crystallogr. D. Biol. Crystallogr. 62, 1103-1113.
61. Edwards, A. M., Arrowsmith, C. H., Christendat, D., Dharamsi, A., Friesen, J. D., Greenblatt, J. F., and Vedadi, M. (2000) Protein production: feeding the crystallographers and NMR spectroscopists, Nat. Struct. Biol. 7, 970-972.
62. Sorensen, H., and Mortensen, K. (2005) Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli, Microb. Cell Fact. 4, 1-8.
63. Makrides, S. C. (1996) Strategies for achieving high-level expression of genes in Escherichia coli, Microbiol. Rev. 60, 512-538.
64. Hannig, G., and Makrides, S. C. (1998) Strategies for optimizing heterologous protein expression in Escherichia coli, Trends Biotechnol. 16, 54-60.
65. Prouty, W. F., and Goldberg, A. L. (1972) Fate of abnormal proteins in E. coli accumulation in intracellular granules before catabolism, Nature New Biol. 240, 147-150.
66. Prouty, W. F., Karnovsky, M. J., and Goldberg, A. L. (1975) Degradation of abnormal proteins in Escherichia coli. Formation of protein inclusions in cells exposed to amino acid analogs, J. Biol. Chem. 250, 1112-1122.
67. Bowden, G. A., Paredes, A. M., and Georgiou, G. (1991) Structure and morphology of inclusion bodies in Escherichia coli, Biotechnology. (N. Y). 9, 725-730.
68. Taylor, G., Hoare, M., Gray, D. R., and Martson, F. A. O. (1986) Size and density of inclusion bodies, Biotechnology. (N. Y). 4, 553-557.
69. Garcia-Fruitos, E., Gonzalez-Montalban, N., Morell, M., Vera, A., Ferraz, R., Aris, A., Ventura, S., and Villaverde, A. (2005) Aggregation as bacterial inclusion bodies does not imply inactivation of enzymes and fluorescent proteins, Microb. Cell Fact. 4, 27.
70. Carrio, M. M., Corchero, J. L., and Villaverde, A. (1998) Dynamics of in vivo protein aggregation: building inclusion bodies in recombinant bacteria, FEMS Microb. Lett. 169, 9-15.
71. Carrio, M. M., and Villaverde, A. (2003) Role of molecular chaperones in inclusion body formation, FEBS Lett. 537, 215-221.
72. Baneyx, F., and Mujacic, M. (2004) Recombinant protein folding and misfolding in Escherichia coli, Nat. Biotechnol. 22, 1399-1408.
73. Baneyx, F., and Palumbo, J. L. (2003) Improving heterologous protein folding via molecular chaperone and foldase co-expression, Methods Mol. Biol. 205, 171-197.
74. Sorensen, H. P., and Mortensen, K. K. (2005) Advanced genetic strategies for recombinant protein expression in Escherichia coli, J. Biotechnol. 115, 113-128.
75. Dyson, M. R., Shadbolt, S. P., Vincent, K. J., Perera, R. L., and McCafferty, J. (2004) Production of soluble mammalian proteins in Escherichia coli: identification of protein features that correlate with successful expression, BMC Biotechnol 4, 32.
76. Vincentelli, R., Bignon, C., Gruez, A., Canaan, S., Sulzenbacher, G., Tegoni, M., Campanacci, V., and Cambillau, C. (2003) Medium-scale structural genomics: strategies for protein expression and crystallization, Acc. Chem. Res. 36, 165-172.
77. Brocchieri, L., and Karlin, S. (2005) Protein length in eukaryotic and prokaryotic proteomes, Nucleic Acids Res 33, 3390-3400.
78. Sorensen, H. P., Sperling-Petersen, H. U., and Mortensen, K. K. (2003) Production of recombinant thermostable proteins expressed in Escherichia coli: completion of protein synthesis is the bottleneck, J. Chromatogr. B Analyt. Technol. Biomed. Life. Sci. 786, 207-214.
79. Hua, Z., Wang, H., Chen, D., Chen, Y., and Zhu, D. (1994) Enhancement of expression of human granulocyte-macrophage colony stimulating factor by argU gene product in Escherichia coli, Biochem. Mol. Biol. Int. 32, 537-543.
80. Baneyx, F., and Georgiou, G. (1991) Construction and characterization of Escherichia coli strains deficient in multiple secreted proteases: protease III degrades high molecular weight substrates in vivo, J. Bacteriol. 173, 2696-2703.
81. Studier, F. W., and Moffatt, B. A. (1986) Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes, J. Mol. Biol. 189, 113-130.
82. Miroux, B., and Walker, J. E. (1996) Over-production of proteins in Escherichia coli: Mutant hosts that allow synthesis of some membrane proteins and globular proteins at high levels, J. Mol. Biol. 260, 289-298.
83. Bessette, P. H., Aslund, F., Beckwith, J., and Georgiou, G. (1999) Efficient folding of proteins with multiple disulfide bonds in the Escherichia coli cytoplasm, Proc. Natl. Acad. Sci. USA 96, 13703-13708.
84. Abergel, C., Coutard, B., Byrne, D., Chenivesse, S., Claude, J.-B., Deregnaucourt, C. l., Fricaux, T., Gianesini-Boutreux, C., Jeudy, S., Lebrun, R. g., Maza, C., Notredame, C. d., Poirot, O., Suhre, K., Varagnol, M., and Claverie, J.-M. (2003) Structural genomics of highly conserved microbial genes of unknown function in search of new antibacterial targets, J. Struct. Funct. Genom. 4, 141-157.
85. Berrow, N. S., Bussow, K., Coutard, B., Diprose, J., Ekberg, M., Folkers, G. E., Levy, N., Lieu, V., Owens, R. J., Peleg, Y., Pinaglia, C., Quevillon-Cheruel, S., Salim, L., Scheich, C., Vincentelli, R., and Busso, D. (2006) Recombinant protein expression and solubility screening in Escherichia coli: a comparative study, Acta Crystallogr. D. Biol. Crystallogr. 62, 1218-1226.
86. Vasina, J. A., and Baneyx, F. (1997) Expression of aggregation-prone proteins at low temperatures: a comparative study of the E. coli cspA and tac promoter systems, Protein Expr. Purif. 9, 211-218.
87. Vasina, J. A., and Baneyx, F. (1997) Expression of Aggregation-Prone Recombinant Proteins at Low Temperatures: A Comparative Study of the Escherichia coli cspA and tac Promoter Systems, Protein Expr. Purif. 9, 211-218.
88. Kiefhaber, T., Rudolph, R., Kohler, H. H., and Buchner, J. (1991) Protein aggregation in vitro and in vivo: a quantitative model of the kinetic competition between folding and aggregation, Biotechnology. (N. Y). 9, 825-829.
89. Ferrer, M., Chernikova, T. N., Yakimov, M. M., Golyshin, P. N., and Timmis, K. N. (2003) Chaperonins govern growth of Escherichia coli at low temperatures, Nat. Biotechnol. 21, 1266-1267.
90. Kapust, R. B., and Waugh, D. S. (1999) Escherichia coli maltose-binding protein is uncommonly effective at promoting the solubility of polypeptides to which it is fused, Protein Sci. 8, 1668-1674.
91. Hammarstrom, M., Hellgren, N., Van den Berg, S., Berglund, H., and Hard, T. (2002) Rapid screening for improved solubility of small human proteins produced as fusion proteins in Escherichia coli, Protein Sci. 11, 313-321.
92. Braun, P., Hu, Y., Shen, B., Halleck, A., Koundinya, M., Harlow, E., and LaBaer, J. (2002) Proteome-scale purification of human proteins from bacteria, Proc. Natl. Acad. Sci. USA 99, 2654-2659.
93. Braun, P., and LaBaer, J. (2003) High throughput protein production for functional proteomics, Trends Biotechnol. 21, 383-388.

94. Waugh, D. S. (2005) Making the most of affinity tags, Trends Biotechnol. 23, 316-320.
95. Nallamsetty, S., and Waugh, D. S. (2007) A generic protocol for the expression and purification of recombinant proteins in Escherichia coli using a combinatorial His6-maltose binding protein fusion tag, Nat. Protoc. 2, 383-391.
96. Jeon, W., Aceti, D., Bingman, C., Vojtik, F., Olson, A., Ellefson, J., McCombs, J., Sreenath, H., Blommel, P., Seder, K., Burns, B., Geetha, H., Harms, A., Sabat, G., Sussman, M., Fox, B., and Phillips, G. (2005) High-throughput Purification and Quality Assurance of Arabidopsis thaliana Proteins for Eukaryotic Structural Genomics, J. Struct. Funct. Genom. 6, 143-147.
97. Ashraf, S. S., Benson, R. E., Payne, E. S., Halbleib, C. M., and Gron, H. (2004) A novel multi-affinity tag system to produce high levels of soluble and biotinylated proteins in Escherichia coli, Protein Expr. Purif. 33, 238-245.
98. Nallamsetty, S., and Waugh, D. S. (2006) Solubility-enhancing proteins MBP and NusA play a passive role in the folding of their fusion partners, Protein Expr. Purif. 45, 175-182.
99. Hochuli, E., Bannwarth, W., Dobeli, H., Gentz, R., and Stuber, D. (1988) Genetic Approach to Facilitate Purification of Recombinant Proteins with a Novel Metal Chelate Adsorbent, Nat. Biotechnol. 6, 1321-1325.
100. Graslund, S., Sagemark, J., Berglund, H., Dahlgren, L. G., Flores, A., Hammarstrom, M., Johansson, I., Kotenyova, T., Nilsson, M., Nordlund, P., and Weigelt, J. (2008) The use of systematic N- and C-terminal deletions to promote production and structural studies of recombinant proteins, Protein Expr. Purif. 58, 210-221.
101. Cohen, S. L. (1996) Domain elucidation by mass spectrometry, Structure 4, 1013-1016.
102. Gao, X., Bain, K., Bonanno, J., Buchanan, M., Henderson, D., Lorimer, D., Marsh, C., Reynes, J., Sauder, J., Schwinn, K., Thai, C., and Burley, S. (2005) High-throughput Limited Proteolysis/Mass Spectrometry for Protein Domain Elucidation, J. Struct. Funct. Genom. 6, 129-134.
103. Bjornsson, A., Mottagui-Tabar, S., and Isaksson, L. A. (1996) Structure of the C-terminal end of the nascent peptide influences translation termination, EMBO J. 15, 1696-1704.
104. Pantazatos, D., Kim, J. S., Klock, H. E., Stevens, R. C., Wilson, I. A., Lesley, S. A., and Woods, V. L., Jr. (2004) Rapid refinement of crystallographic protein construct definition employing enhanced hydrogen/deuterium exchange MS, Proc. Natl. Acad. Sci. U. S. A. 101, 751-756.
105. Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004) The Pfam protein families database, Nucleic Acids Res. 32, D138-D141.
106. Sauder, M. J., Rutter, M. E., Bain, K., Rooney, I., Gheyi, T., Atwell, S., Thompson, D. A., Emtage, S., and Burley, S. K. (2008) High throughput protein production and crystallization at NYSGXRC, Meth. Mol. Biol. 426, 561-575.
107. Puri, M., Robin, G., Cowieson, N., Forwood, J. K., Listwan, P., Hu, S. H., Guncar, G., Huber, T., Kellie, S., Hume, D. A., Kobe, B., and Martin, J. L. (2006) Focusing in on structural genomics: the University of Queensland structural biology pipeline, Biomol. Eng. 23, 281-289.
108. Peti, W., and Page, R. (2007) Strategies to maximize heterologous protein expression in Escherichia coli with minimal cost, Protein Expr. Purif. 51, 1-10.
109. Luan, C. H., Qiu, S., Finley, J. B., Carson, M., Gray, R. J., Huang, W., Johnson, D., Tsao, J., Reboul, J., Vaglio, P., Hill, D. E., Vidal, M., Delucas, L. J., and Luo, M. (2004) High-throughput expression of C. elegans proteins, Genome Res. 14, 2102-2110.
110. Lamesch, P., Milstein, S., Hao, T., Rosenberg, J., Li, N., Sequerra, R., Bosak, S., Doucette-Stamm, L., Vandenhaute, J., Hill, D. E., and Vidal, M. (2004) C. elegans ORFeome version 3.1: increasing the coverage of ORFeome resources with improved gene predictions, Genome Res. 14, 2064-2069.
111. Knaust, R. K. C., and Nordlund, P. (2001) Screening for soluble expression of recombinant proteins in a 96-well format, Anal. Biochem. 297, 79-85.
112. Eshaghi, S., Hedren, M., Nasser, M. I. A., Hammarberg, T., Thornell, A., and Nordlund, P. (2005) An efficient strategy for high-throughput expression screening of recombinant integral membrane proteins, Protein Sci. 14, 676-683.
113. Waldo, G. S., Standish, B. M., Berendzen, J., and Terwilliger, T. C. (1999) Rapid protein-folding assay using green fluorescent protein, Nat. Biotechnol. 17, 691-695.
114. Maxwell, K. L., Mittermaier, A. K., Forman-Kay, J. D., and Davidson, A. R.
(1999) A simple in vivo assay for increased protein solubility, Protein Sci. 8, 1908-1911. Cabantous, S., Pédelacq, J.-D., Mark, B. L., Naranjo, C., Terwilliger, T., and Waldo, G. (2005) Recent Advances in GFP Folding Reporter and Split-GFP Solubility Reporter Technologies. Application to Improving the Folding and Solubility of Recalcitrant Proteins from Mycobacterium tuberculosis, J. Struct. Funct. Genom. 6, 113-119. Denis-Quanquin, S., Lamouroux, L., Lougarre, A., Maheo, S., Saves, I., Paquereau, L., Demange, P., and Fournier, D. (2007) 70

117.

118. 119. 120. 121.

122. 123.

124.

125. 126. 127. 128. 129.

Protein expression from synthetic genes: selection of clones using GFP, J. Biotechnol. 131, 223-230. Wigley, W. C., Stidham, R. D., Smith, N. M., Hunt, J. F., and Thomas, P. J. (2001) Protein solubility and folding monitored in vivo by structural complementation of a genetic marker protein, Nat. Biotechnol. 19, 131-136. Waldo, G. S. (2003) Genetic screens and directed evolution for protein solubility, Curr. Opin. Chem. Biol. 7, 33-38. Cabantous, S., Terwilliger, T. C., and Waldo, G. S. (2005) Protein tagging and detection with engineered self-assembling fragments of green fluorescent protein, Nat. Biotechnol. 23, 102-107. DeLisa, M. P., Samuelson, P., Palmer, T., and Georgiou, G. (2002) Genetic analysis of the twin arginine translocator secretion pathway in bacteria, J. Biol. Chem. 277, 29825-29831. DeLisa, M. P., Tullman, D., and Georgiou, G. (2003) Folding quality control in the export of proteins by the bacterial twinarginine translocation pathway, Proc. Natl. Acad. Sci. U. S. A. 100, 6115-6120. Beckett, D., Kovaleva, E., and Schatz, P. J. (1999) A minimal peptide substrate in biotin holoenzyme synthetase-catalyzed biotinylation, Protein Sci. 8, 921-929. Scott, C. J., Martin, S. L., Wallace, A., Curran, M. D., and Walker, B. (2000) Characterization of the affinity of streptavidin toward a peptide sequence previously identified as a target substrate for biotinylation by the Escherichia coli biotin holoenzyme synthetase, BirA, Anal. Biochem. 284, 416-417. Tarendeau, F., Boudet, J., Guilligay, D., Mas, P. J., Bougault, C. M., Boulo, S., Baudin, F., Ruigrok, R. W., Daigle, N., Ellenberg, J., Cusack, S., Simorre, J. P., and Hart, D. J. (2007) Structure and nuclear import function of the C-terminal domain of influenza virus polymerase PB2 subunit, Nat. Struct. Mol. Biol. 14, 229-233. van den Berg, S., Lofdahl, P. A., Hard, T., and Berglund, H. (2006) Improved solubility of TEV protease by directed evolution, J. Biotechnol. 121, 291-298. Henikoff, S. 
(1984) Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing, Gene 28, 351359. Etchegaray, J. P., and Inouye, M. (1999) Translational enhancement by an element downstream of the initiation codon in Escherichia coli, J. Biol. Chem. 274, 10079-10085. Martinez Molina, D., Cornvik, T., Eshaghi, S., Haeggstrom, J. Z., Nordlund, P., and Sabet, M. I. (2008) Engineering membrane protein overproduction in Escherichia coli, Protein Sci. 17, 673-680. Labaer, J., Qiu, Q., Anumanthan, A., Mar, W., Zuo, D., Murthy, T. V., Taycher, H., Halleck, A., Hainsworth, E., Lory, S., and Brizuela, 71

130.

131.

132.

133.

134. 135. 136. 137.

138. 139. 140.

L. (2004) The Pseudomonas aeruginosa PA01 gene collection, Genome Res. 14, 2190-2200. Lamesch, P., Li, N., Milstein, S., Fan, C., Hao, T., Szabo, G., Hu, Z., Venkatesan, K., Bethel, G., Martin, P., Rogers, J., Lawlor, S., McLaren, S., Dricot, A., Borick, H., Cusick, M. E., Vandenhaute, J., Dunham, I., Hill, D. E., and Vidal, M. (2007) hORFeome v3.1: a resource of human open reading frames representing over 10,000 human genes, Genomics 89, 307-315. Gong, W., Shen, Y. P., Ma, L. G., Pan, Y., Du, Y. L., Wang, D. H., Yang, J. Y., Hu, L. D., Liu, X. F., Dong, C. X., Ma, L., Chen, Y. H., Yang, X. Y., Gao, Y., Zhu, D., Tan, X., Mu, J. Y., Zhang, D. B., Liu, Y. L., Dinesh-Kumar, S. P., Li, Y., Wang, X. P., Gu, H. Y., Qu, L. J., Bai, S. N., Lu, Y. T., Li, J. Y., Zhao, J. D., Zuo, J., Huang, H., Deng, X. W., and Zhu, Y. X. (2004) Genome-wide ORFeome cloning and analysis of Arabidopsis transcription factor genes, Plant Physiol. 135, 773-782. Uetz, P., Dong, Y. A., Zeretzke, C., Atzler, C., Baiker, A., Berger, B., Rajagopala, S. V., Roupelieva, M., Rose, D., Fossum, E., and Haas, J. (2006) Herpesviral protein networks and their interaction with the human proteome, Science 311, 239-242. Gillette, W. K., Esposito, D., Frank, P. H., Zhou, M., Yu, L. R., Jozwik, C., Zhang, X., McGowan, B., Jacobowitz, D. M., Pollard, H. B., Hao, T., Hill, D. E., Vidal, M., Conrads, T. P., Veenstra, T. D., and Hartley, J. L. (2005) Pooled ORF expression technology (POET): using proteomics to screen pools of open reading frames for protein expression, Mol. Cell. Proteomics 4, 1647-1652. Shapiro, J. A. (1998) Thinking about bacterial populations as multicellular organisms, Annu. Rev. Microbiol. 52, 81-104. Vlamakis, H., Aguilar, C., Losick, R., and Kolter, R. (2008) Control of cell fate by the formation of an architecturally complex bacterial community, Genes Dev. 22, 945-953. Boehmer, P. E., and Lehman, I. R. (1997) Herpes simplex virus DNA replication, Annu. Rev. Biochem. 66, 347-384. Wu, J. 
J., Brentjens, M. H., Torres, G., Yeung-Yue, K., Lee, P., and Tyring, S. K. (2003) Valacyclovir in the treatment of herpes simplex, herpes zoster, and other viral infections, J. Cutan. Med. Surg. 7, 372-381. Morfin, F., and Thouvenot, D. (2003) Herpes simplex virus resistance to antiviral drugs, J. Clin. Virol. 26, 29-37. Nishiyama, Y. (2004) Herpes simplex virus gene products: the accessories reflect her lifestyle well, Rev. Med. Virol. 14, 33-46. Glaunsinger, B., and Ganem, D. (2004) Lytic KSHV infection inhibits host gene expression by accelerating global mRNA turnover, Mol. Cell 13, 713-723.

72

141. Martinez, R., Sarisky, R. T., Weber, P. C., and Weller, S. K. (1996) Herpes simplex virus type 1 alkaline nuclease is required for efficient processing of viral DNA replication intermediates, J. Virol. 70, 2075-2085.
142. Thomas, M. S., Gao, M., Knipe, D. M., and Powell, K. L. (1992) Association between the herpes simplex virus major DNA-binding protein and alkaline nuclease, J. Virol. 66, 1152-1161.
143. Reuven, N. B., Staire, A. E., Myers, R. S., and Weller, S. K. (2003) The herpes simplex virus type 1 alkaline nuclease and single-stranded DNA binding protein mediate strand exchange in vitro, J. Virol. 77, 7425-7433.
144. Glaunsinger, B., Chavez, L., and Ganem, D. (2005) The exonuclease and host shutoff functions of the SOX protein of Kaposi's sarcoma-associated herpesvirus are genetically separable, J. Virol. 79, 7396-7401.
145. Rowe, M., Glaunsinger, B., van Leeuwen, D., Zuo, J., Sweetman, D., Ganem, D., Middeldorp, J., Wiertz, E. J., and Ressing, M. E. (2007) Host shutoff during productive Epstein-Barr virus infection is mediated by BGLF5 and may contribute to immune evasion, Proc. Natl. Acad. Sci. U. S. A. 104, 3366-3371.
146. Nordlund, P., and Reichard, P. (2006) Ribonucleotide reductases, Annu. Rev. Biochem. 75, 681-706.
147. Cohen, G. H. (1972) Ribonucleotide reductase activity of synchronized KB cells infected with herpes simplex virus, J. Virol. 9, 408-418.
148. Daikoku, T., Yamamoto, N., Maeno, K., and Nishiyama, Y. (1991) Role of viral ribonucleotide reductase in the increase of dTTP pool size in herpes simplex virus-infected Vero cells, J. Gen. Virol. 72 (Pt 6), 1441-1444.
149. Langelier, Y., Bergeron, S., Chabaud, S., Lippens, J., Guilbault, C., Sasseville, A. M., Denis, S., Mosser, D. D., and Massie, B. (2002) The R1 subunit of herpes simplex virus ribonucleotide reductase protects cells against apoptosis at, or upstream of, caspase-8 activation, J. Gen. Virol. 83, 2779-2789.
150. Sekino, Y., Bruner, S. D., and Verdine, G. L. (2000) Selective inhibition of herpes simplex virus type-1 uracil-DNA glycosylase by designed substrate analogs, J. Biol. Chem. 275, 36506-36508.
151. Mullaney, J., Moss, H. W., and McGeoch, D. J. (1989) Gene UL2 of herpes simplex virus type 1 encodes a uracil-DNA glycosylase, J. Gen. Virol. 70 (Pt 2), 449-454.
152. Pyles, R. B., Sawtell, N. M., and Thompson, R. L. (1992) Herpes simplex virus type 1 dUTPase mutants are attenuated for neurovirulence, neuroinvasiveness, and reactivation from latency, J. Virol. 66, 6706-6713.
153. Preston, V. G., and Fisher, F. B. (1984) Identification of the herpes simplex virus type 1 gene encoding the dUTPase, Virology 138, 58-68.
154. Aranda, M., and Maule, A. (1998) Virus-induced host gene shutoff in animals and plants, Virology 243, 261-267.
155. Oroskar, A. A., and Read, G. S. (1989) Control of mRNA stability by the virion host shutoff function of herpes simplex virus, J. Virol. 63, 1897-1906.
156. Strom, T., and Frenkel, N. (1987) Effects of herpes simplex virus on mRNA stability, J. Virol. 61, 2198-2207.
157. Smiley, J. R. (2004) Herpes simplex virus virion host shutoff protein: immune evasion mediated by a viral RNase?, J. Virol. 78, 1063-1068.
158. Everly, D. N., Jr., Feng, P., Mian, I. S., and Read, G. S. (2002) mRNA degradation by the virion host shutoff (Vhs) protein of herpes simplex virus: genetic and biochemical evidence that Vhs is a nuclease, J. Virol. 76, 8560-8571.
159. Hardwicke, M. A., and Sandri-Goldin, R. M. (1994) The herpes simplex virus regulatory protein ICP27 contributes to the decrease in cellular mRNA levels during infection, J. Virol. 68, 4797-4810.
160. Corcoran, J. A., Hsu, W. L., and Smiley, J. R. (2006) Herpes simplex virus ICP27 is required for virus-induced stabilization of the ARE-containing IEX-1 mRNA encoded by the human IER3 gene, J. Virol. 80, 9720-9729.
161. Glaunsinger, B. A., and Ganem, D. E. (2006) Messenger RNA turnover and its regulation in herpesviral infection, Adv. Virus Res. 66, 337-394.
162. Jenner, R. G., Alba, M. M., Boshoff, C., and Kellam, P. (2001) Kaposi's sarcoma-associated herpesvirus latent and lytic gene expression as revealed by DNA arrays, J. Virol. 75, 891-902.
163. Hoffmann, P. J., and Cheng, Y. C. (1978) The deoxyribonuclease induced after infection of KB cells by herpes simplex virus type 1 or type 2. I. Purification and characterization of the enzyme, J. Biol. Chem. 253, 3557-3562.
164. Hoffmann, P. J., and Cheng, Y. C. (1979) DNase induced after infection of KB cells by herpes simplex virus type 1 or type 2. II. Characterization of an associated endonuclease activity, J. Virol. 32, 449-457.
165. Bronstein, J. C., and Weber, P. C. (1996) Purification and characterization of herpes simplex virus type 1 alkaline exonuclease expressed in Escherichia coli, J. Virol. 70, 2008-2013.
166. Stolzenberg, M. C., and Ooka, T. (1990) Purification and properties of Epstein-Barr virus DNase expressed in Escherichia coli, J. Virol. 64, 96-104.
167. Shao, L., Rapp, L. M., and Weller, S. K. (1993) Herpes simplex virus 1 alkaline nuclease is required for efficient egress of capsids from the nucleus, Virology 196, 146-162.
168. Reuven, N. B., and Weller, S. K. (2005) Herpes simplex virus type 1 single-strand DNA binding protein ICP8 enhances the nuclease activity of the UL12 alkaline nuclease by increasing its processivity, J. Virol. 79, 9356-9358.
169. Pingoud, A., Fuxreiter, M., Pingoud, V., and Wende, W. (2005) Type II restriction endonucleases: structure and mechanism, Cell. Mol. Life Sci. 62, 685-707.
170. Bujnicki, J. M., and Rychlewski, L. (2001) The herpesvirus alkaline exonuclease belongs to the restriction endonuclease PD-(D/E)XK superfamily: insight from molecular modeling and phylogenetic analysis, Virus Genes 22, 219-230.
171. Kosinski, J., Feder, M., and Bujnicki, J. M. (2005) The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function, BMC Bioinformatics 6, 172.
172. Pingoud, A., and Jeltsch, A. (2001) Structure and function of type II restriction endonucleases, Nucleic Acids Res. 29, 3705-3727.
173. Goldstein, J. N., and Weller, S. K. (1998) The exonuclease activity of HSV-1 UL12 is required for in vivo function, Virology 244, 442-457.
174. Liu, M. T., Hsu, T. Y., Lin, S. F., Seow, S. V., Liu, M. Y., Chen, J. Y., and Yang, C. S. (1998) Distinct regions of EBV DNase are required for nuclease and DNA binding activities, Virology 242, 6-13.
175. Zuo, J., Thomas, W., van Leeuwen, D., Middeldorp, J. M., Wiertz, E. J., Ressing, M. E., and Rowe, M. (2008) The DNase of gammaherpesviruses impairs recognition by virus-specific CD8+ T cells through an additional host shutoff function, J. Virol. 82, 2385-2393.
176. Kovall, R., and Matthews, B. W. (1997) Toroidal structure of lambda-exonuclease, Science 277, 1824-1827.
177. Eklund, H., Uhlin, U., Farnegardh, M., Logan, D. T., and Nordlund, P. (2001) Structure and function of the radical enzyme ribonucleotide reductase, Prog. Biophys. Mol. Biol. 77, 177-268.
178. Wnuk, S. F., and Robins, M. J. (2006) Ribonucleotide reductase inhibitors as anti-herpes agents, Antiviral Res. 71, 122-126.
179. Liuzzi, M., Deziel, R., Moss, N., Beaulieu, P., Bonneau, A. M., Bousquet, C., Chafouleas, J. G., Garneau, M., Jaramillo, J., Krogsrud, R. L., Lagacé, L., McCollum, R. S., Nawoot, S., and Guindon, Y. (1994) A potent peptidomimetic inhibitor of HSV ribonucleotide reductase with antiviral activity in vivo, Nature 372, 695-698.
180. Crute, J. J., Grygon, C. A., Hargrave, K. D., Simoneau, B., Faucher, A. M., Bolger, G., Kibler, P., Liuzzi, M., and Cordingley, M. G. (2002) Herpes simplex virus helicase-primase inhibitors are active in animal models of human disease, Nat. Med. 8, 386-391.
181. Cerqueira, N. M., Pereira, S., Fernandes, P. A., and Ramos, M. J. (2005) Overview of ribonucleotide reductase inhibitors: an appealing target in anti-tumour therapy, Curr. Med. Chem. 12, 1283-1294.
182. Wang, X., Zhenchuk, A., Wiman, K. G., and Albertioni, F. (2008) Regulation of p53R2 and its role as potential target for cancer therapy, Cancer Lett.
183. Tanaka, H., Arakawa, H., Yamaguchi, T., Shiraishi, K., Fukuda, S., Matsui, K., Takei, Y., and Nakamura, Y. (2000) A ribonucleotide reductase gene involved in a p53-dependent cell-cycle checkpoint for DNA damage, Nature 404, 42-49.
184. Nakano, K., Balint, E., Ashcroft, M., and Vousden, K. H. (2000) A ribonucleotide reductase gene is a transcriptional target of p53 and p73, Oncogene 19, 4283-4289.
185. Koshland, D. E., Jr. (1993) Molecule of the year, Science 262, 1953.
186. Guittet, O., Hakansson, P., Voevodskaya, N., Fridd, S., Graslund, A., Arakawa, H., Nakamura, Y., and Thelander, L. (2001) Mammalian p53R2 protein forms an active ribonucleotide reductase in vitro with the R1 protein, which is expressed both in resting cells in response to DNA damage and in proliferating cells, J. Biol. Chem. 276, 40647-40651.
187. Hakansson, P., Hofer, A., and Thelander, L. (2006) Regulation of mammalian ribonucleotide reduction and dNTP pools after DNA damage and in resting cells, J. Biol. Chem. 281, 7834-7841.
188. Bourdon, A., Minai, L., Serre, V., Jais, J. P., Sarzi, E., Aubert, S., Chrétien, D., de Lonlay, P., Paquis-Flucklinger, V., Arakawa, H., Nakamura, Y., Munnich, A., and Rötig, A. (2007) Mutation of RRM2B, encoding p53-controlled ribonucleotide reductase (p53R2), causes severe mitochondrial DNA depletion, Nat. Genet. 39, 703-704.
