enzyme chez la LgtC (Lairson et al., 2004), qui sert de système modèle pour ce
...... top sis o rie n ta lis. 1. IIR. (M u lich ak et al., 2. 0. 01. ) 1 β-v a n c o sa m in.
UNIVERSITE GRENOBLE I – JOSEPH FOURIER ECOLE DOCTORALE CHIMIE ET SCIENCES DU VIVANT
THESE Pour l’obtention du Diplôme de DOCTEUR DE L’UNIVERSITE JOSEPH FOURIER
Discipline : Biologie cellulaire présentée et soutenue publiquement par Lenka ŠNAJDROVÁ Le 5 juin 2006
MODELISATION MOLECULAIRE DES LECTINES ET DES GLYCOSYLTRANSFERASES Directeurs de thèse : Dr. Anne Imberty et Prof. RNDr. Jaroslav Koča, DrSc.
JURY Prof. Ing. Vladimír Křen, DrSc., Rapporteur Prof. Catherine Etchebest, Rapporteur RNDr. Josef Chmelík, CSc. Examinateur Dr. Serge Perez, Examinateur Dr. Anne Imberty, Examinateur Prof. RNDr. Jaroslav Koča, DrSc., Examinateur
1.
Introduction : glucides, lectines, glycosyltransférases .....................................................1 1.1. Les glucides et les glycoconjugués..........................................................................1 1.2. Les lectines .............................................................................................................1 1.2.1. Structures et propriétés des lectines .................................................................1 1.2.1.1. Les lectines dans le monde vivant ............................................................1 1.2.1.2. Spécificité ...............................................................................................3 1.2.1.3. Structure..................................................................................................4 1.2.1.4. Les sites de liaison...................................................................................5 1.2.1.5. La multivalence .......................................................................................7 1.2.2. Lectines et infections bactériennes ..................................................................8 1.2.2.1. Adhésion bactérienne aux surfaces cellulaires..........................................8 1.2.2.2. Pseudomonas aeruginosa ........................................................................9 1.3. Les glycosyltransférases........................................................................................14 1.3.1. La fonction et l’arrangement dans les cellules ...............................................14 1.3.2. Les mécanismes de réaction ..........................................................................15 1.3.3. Le classement par similarité de séquences : familles......................................17 1.3.4. Le classement structural : repliements et clans...............................................18 1.3.5. La base de données structurales des glycosyltransférases..............................16 ARTICLE I .......................................................................................................................... 23 2. Méthodes : Chimie computationnelle et modélisation moléculaire ................................35 2.1. Mécanique quantique ............................................................................................35 2.1.1. Les méthodes empiriques ..............................................................................37 2.1.2. Les méthodes semi-empiriques......................................................................37 2.1.3. Les méthodes ab initio...................................................................................37 2.1.4. Les méthodes de la théorie de la fonctionnelle de la densité (DFT)................30 2.1.5. Fonctions de base ..........................................................................................31 2.2. Mécanique moléculaire .........................................................................................31 2.3. Dynamique moléculaire ........................................................................................41 2.3.1. Principe de base ............................................................................................41 2.3.2. Conditions périodiques..................................................................................42 2.3.3. Algorithme SHAKE ......................................................................................43 2.3.4. Sommation d’Ewald (Particle Mesh Ewald, PME) ........................................43 2.3.5. Analyse d’une simulation de dynamique moléculaire ....................................44 2.3.5.1. Fonction de distribution radiale (radial distribution function, RDF) .......44 2.3.5.2. Dynamique essentielle ...........................................................................45 2.3.5.3. Méthodologie MM-PBSA/MM-GBSA ..................................................45 2.4. Les méthodes hybrides..........................................................................................46
2.5. Modélisation par homologie..................................................................................47 2.5.1. L’identification du repliement protéique et le choix du gabarit ......................47 2.5.2. L‘alignement cible-gabarit ............................................................................40 2.5.3. La construction du modèle ............................................................................41 2.5.4. L’évaluation de justesse du modèle ...............................................................50 3. Calculs quantiques sur la lectine PA-IIL .......................................................................51 3.1. Les lectines solubles de Pseudomonas aeruginosa ................................................51 3.2. La lectine PA-IIL..................................................................................................51 ARTICLE II......................................................................................................................... 53 4. Modélisation des glycosyltransférases par homologie ...................................................67 4.1. Les galactosyltransférases impliquées dans la biosynthèse des glycosaminoglycanes .......67 4.2. La modélisation par homologie de β3GalT6..........................................................68 4.3. Les galactosyltransférases associées aux chloroplastes dans les plantes.................70 4.4. La modélisation par homologie de MGDG synthase..............................................71 ARTICLE III........................................................................................................................ 73 5. Simulations de dynamique moléculaire sur la glycosyltransférase LgtC........................87 5.1. Neisseria meningitidis...........................................................................................87 5.2. Épidémiologie des méningites bactériennes ..........................................................87 5.3. La glycosyltransférase LgtC..................................................................................88 ARTICLE IV ....................................................................................................................... 91 ARTICLE V....................................................................................................................... 105 6. Conclusions et perspectives ........................................................................................ 139 REFERENCES .................................................................................................................. 143
1. INTRODUCTION : GLUCIDES, LECTINES, GLYCOSYLTRANSFERASES 1.1. Les glucides et les glycoconjugués Les glucides sont des constituants essentiels de tous les organismes vivants et sont les molécules biologiques les plus répandues sur la Terre. On peut se faire une idée approximative de leur importance quantitative en évaluant la portion de la surface terrestre qui est couverte par la végétation. Les arbres et les plantes sont en grande partie constitués de cellulose - un polysaccharide qui a une fonction structurale importante. En plus des polysaccharides structuraux (cellulose, chitine…), on trouve également sur Terre une grande quantité de polysaccharides de réserve. Le plus connu d’entre eux est l’amidon, qui non seulement représente une réserve d’énergie pour les plantes et pour les animaux, mais est aussi largement exploité par l’industrie et l’agroalimentaire. La cellulose et l’amidon sont des matières utilisées par les humains depuis la nuit des temps. Il existe cependant un éventail d’autres glucides, qui ne sont pas présents dans la nature en aussi grande quantité, mais dont les fonctions ne sont toutefois pas moins importantes. Comme ces sucres sont généralement fixés de manière covalente sur des protéines ou des lipides, les macromolécules résultantes sont appelées les glycoconjugués. Les glycoconjugués sont fréquemment trouvés sur les membranes cellulaires des organismes, la partie saccharidique dépassant de la surface comme une antenne. Ainsi, les glycoconjugués sont impliqués dans de nombreux phénomènes biologiques, notamment dans la communication intra et intercellulaire et dans les interactions entre pathogènes et cellules de l’hôte et beaucoup d’autres fonctions qui restent à découvrir.
1.2. Les lectines 1.2.1. Structures et propriétés des lectines 1.2.1.1.
Les lectines dans le monde vivant
Les lectines sont des protéines qui reconnaissent spécifiquement certains mono- ou oligosaccharides. Il ne s’agit pas d’enzymes ni d’anticorps, car les lectines ne présentent pas d’activité catalytique et ne sont pas impliquées dans la réponse immunitaire classique. Les lectines comportent généralement plusieurs sites de liaison. Par conséquent, leur interaction avec les glucides à la surface des érythrocytes cause l’agrégation d’un grand nombre de ces cellules. Ce phénomène, qui est appelé hémagglutination lorsqu’on utilise des globules rouges, est une caractéristique typique des lectines et est fréquemment utilisé pour leur
1
détection et leur caractérisation. Il est possible d’inhiber les réactions d’agglutination et de précipitation des lectines par les glucides spécifiques pour ces protéines. Les lectines sont présentes dans tous les organismes vivants. Elles sont impliquées dans la communication entre les cellules, dans l’interaction de certaines molécules avec la surface cellulaire, mais aussi facilitent le contact primaire des pathogènes (virus, bactérie, parasite) avec les cellules de hôte (Figure 1). Sur la surface cellulaire du parasite (jaune), différentes lectines sont représentées. La cellule hôte (orange) exprime une grande quantité de glycoprotéines sur sa surface, représentées par des symboles colorés. Quand une lectine parasitaire rencontre le saccharide correspondant de l’hôte, le parasite s'attache fortement à la surface de la cellule hôte (partie droite de l'image).
Figure 1: Représentation schématique de l'interaction d’un parasite avec une cellule hôte.
Il existe une si grande variété de lectines que plusieurs schémas de classification ont été proposés sans qu’aucun ne donne pleine satisfaction. On peut classer les lectines selon leur origine (lectines virales, bactériennes, végétales etc.), selon leur spécificité à l’égard de glucides, selon leur structure moléculaire ou selon d’autres critères. Grâce au séquençage d’un nombre toujours croissant de génomes, le nombre de lectines découvertes croît à une vitesse vertigineuse, tandis que les structures cristallographiques et RMN croissent à un rythme bien plus faible. Une banque de données rassemblant les données structurales de lectines est disponible sur Internet à l’adresse suivante : http://www.cermav.cnrs.fr/lectines/ Cette base de données comprend aujourd’hui plus de 500 structures tridimensionnelles de lectines dont certaines sont représentées par plusieurs structures obtenues avec différents ligands. La banque de lectines comprend ainsi 36 familles structurales différentes dont 5 familles d’origine virale, 8 d’origine bactérienne, 5 d’origine fongique, 5 d’origine végétale et 13 d’origine animale. 2
1.2.1.2.
Spécificité
Les lectines reconnaissent de manière spécifique des oligosaccharides. La plupart peuvent également se lier à des monosaccharides. L’affinité des lectines pour les monosaccharides est généralement assez faible (Kd de l’ordre de 1 mM) en comparaison avec leur affinité pour les oligosaccharides (de l’ordre du µM) (Dam and Brewer, 2002). Sharon (Lis and Sharon, 1998) divise les lectines en cinq classes en fonction de leur maximum d’affinité pour le mannose (Man), le galactose (Gal) ou N-acétylgalactosamine (GalNAc), la N-acétylglucosamine (GlcNAc), le fucose (Fuc) ou l’acide sialique (NeuAc, acide N-acétylneuraminique). Ces monosaccharides et leurs dérivés sont les unités typiques des structure d’oligosaccharides sur les surfaces cellulaires, les autres ne s’y retrouvent que rarement. La similitude topologique entre certains monosaccharides est déterminante pour la spécificité: par exemple, la plupart des lectines, qui reconnaissent le Gal, lient aussi le GalNAc. Certaines lectines présentent une affinité pour des monosaccharides qui ne semblent pas être structurellement proches, par exemple Fuc, Man et Fru (fructose). Pourtant, les trois fonctions hydroxyles, qui sont fondamentales pour la liaison du sucre à la lectine, ont une topologie très similaire, et pour cette raison, ces trois ligands s’attachent à la PA2L (Loris et al., 2003) de Pseudomonas aeruginosa (Figure 2).
Figure 2: Le D-fructopyranose (Fru), le L-fucopyranose (Fuc) et le D-mannopyranose (Man).
La classification des lectines en fonction de leur spécificité pour des monosaccharides n’est cependant pas tellement adaptée, si on considère que ces protéines présentent une affinité beaucoup plus forte pour les oligosaccharides. Pour la reconnaissance des structures oligosaccharidiques, la similitude topologique joue un rôle encore plus important. La surface d’une lectine peut porter plusieurs sites qui peuvent reconnaître des unités saccharidiques: le site principal reconnaît spécifiquement les unités monosaccharidiques, alors que les sites secondaires (appelés parfois "les sites hydrophobes") reconnaissent, par exemple, les aglycones hydrophobes, ou bien précisent la spécifité de la lectine envers les oligosaccharides particuliers (Loris et al., 1998). La protéine peut donc présenter une affinité pour plusieurs oligosaccharides, dont la topologie et l’arrangement spatial correspondent à la surface de la protéine. Par exemple, la galectine CGL2 du champignon Coprinopsis cinerea a été
3
cristallisée avec 5 ligands différents (Walser et al., 2004). La galectine-3 humaine présente une affinité envers plusieurs oligosaccharides (Bachhawat-Sikder et al., 2001). Les oligosaccharides sont reconnus par les lectines dans une conformation unique, qui ne doit pas forcément correspondre à la conformation de plus basse énergie que l’oligosaccharide adopterait en solution (Bush et al., 1999).
1.2.1.3.
Structure
Du point de vue structural, on classe les lectines en trois grandes classes selon leur topologie: A. Les lectines simples. Ces lectines sont formées de plusieurs monomères (pas forcément identiques), dont la masse moléculaire généralement n’excède pas 40 kDa. Cette classe comprend pratiquement toutes les lectines végétales, les lectines bactériennes solubles et les galectines, une famille de lectines animales spécifiques pour le galactose.
Figure 3: Représentation graphique d’un monomère de concanavaline A de Canavalia ensiformis en complexe avec le trimannoside (code PDB 1CVN). La protéine est représentée par un ruban jaune pour les brins β, un ruban rouge pour les hélices α et un fil pour les autres zones. Une représentation en bâtons est utilisée pour le sucre et en boule pour les cations.
B. Les lectines en mosaïque. Ce groupe comporte diverses protéines de différentes sources (virus, animaux). Il s’agit de molécules complexes qui sont composées de plusieurs types de modules ou domaines, dont un seul possède le site de liaison.
4
Figure 4: Réprésentation graphique d’un trimère d’hémagglutinine du virus Influenza en complexe avec l’acide sialique (code PDB 1HGF). La représentation de la protéine est analogue à la figure 3.
C. Les assemblages macromoléculaires. Les lectines de ce type sont fréquemment trouvées chez les bactéries, où elles forment des structures filamenteuses de 3 à 7 nm de diamètre et jusqu’à 100 nm de longueur, appelées fimbriae ou pili. La plus grande partie d’un filament fimbrial est formée par la polymérisation d’une unité prédominante, qui ne joue qu’un rôle structural. Seul un type d’unités, généralement une composante minoritaire, possède le site de liaison pour les glucides et donc est responsable de la capacité d’adhésion du fimbriae.
Figure 5: Représentation schématique de différents fimbriae de la bactérie Escherichia coli. (Lis and Sharon, 1998)
1.2.1.4.
Les sites de liaison
Le site de liaison d’une lectine est généralement constitué par un creux sur la surface de la protéine, dont la forme ne varie pas beaucoup après la liaison du ligand. La lectine interagit 5
avec le ligand par un réseau de liaisons hydrogène et d’interactions hydrophobes, comme montre l’exemple des galectines (Leonidas et al., 1998) (Figure 6). Seules les liaisons hydrogènes sont impliquées dans l'interaction du sucre avec la protéine. Ainsi, l'oxygène en position 4 établit trois liaisons hydrogène avec les acides aminés His49, Asn51 et Arg53 et l'oxygène O6 deux liaisons H avec Asn62 et Glu72. La coordination d’ions métalliques peut aussi contribuer à la liaison, comme cela est observé chez les lectines de type C, telles que les mannose-binding proteins (MBPs) (Ng et al., 2002) (Figure 7). Ici, on retrouve des liaisons hydrogène entre un oxygène et deux acides aminés (Asn192 et Glu190), mais également des liaisons de coordination entre l'atome de calcium et les groupements hydroxyles O3 et O4 du sucre. Les forces de van der Waals sont fortement impliquées dans l’interaction protéine-ligand. Bien que les glucides lient des molécules assez polaires, la disposition spatiale des fonctions hydroxyles créent parfois des régions hydrophobes sur les surfaces saccharidiques, qui peuvent former les contacts avec les régions hydrophobes sur les molécules de protéines. Un exemple est la lectine EcorL de Erythrina corallodendron (Elgavish and Shaanan, 1998) (Figure 8). Plusieurs liaisons hydrogène sont impliquées dans l'interaction du galactose avec la protéine (Ala218, Gln219), mais les interactions de van der Waals entre la face hydrophobe du galactose et le cycle de la Phe131 sont également importantes. Les interactions de type salin ne sont généralement pas impliquées dans les interactions lectine-sucre sauf pour des glucides chargés comme l’acide sialique (Grootenhuis and van Boeckel, 1991). Le contact entre le ligand et la protéine est parfois facilité par les molécules d’eau, interposées entre la lectine et le glucide (Bourne et al., 1990; Loris et al., 1994).
Figure 6: Site de reconnaissance de la galectine-7 humaine (Homo sapiens) en complexe avec le β-D-galactose (code PDB 2GAL2). Le sucre est représenté par des bâtons épais, les acides aminés par des bâtons fins, les liaisons hydrogène par des pointillés.
6
Figure 7: Mannose binding protein de rat (Rattus norvegicus) en complexe avec l’α-D-mannose (code PDB 1RDL). La représentation de la protéine est analogue à l’image précédente, l'atome de calcium est représenté par une sphère.
Figure 8: La lectine EcorL de Erythrina corallodendron en complexe avec le β-D-galactose (code PDB 1AXZ). Le sucre est représenté par des bâtons épais, les acides aminés par des bâtons fins. La surface accessible au solvant est représentée en bleu foncé.
1.2.1.5.
La multivalence
Les lectines sont des protéines multivalentes, ce qui se traduit par la liaison de plusieurs molécules de glucides à une molécule (ou à un assemblage de molécules) de lectine. Dans certains cas, un monomère de lectine peut porter un ou plusieurs sites de liaison pour le glucide. Dans d’autres cas, des monomères de lectines avec un seul site s’associent souvent sous la forme de dimères, trimères, tétramères etc., qui créent aussi une multivalence pour les glucides. Les interactions multiples entre la protéine et le glucide sont probablement impliquées dans le processus de reconnaissance (Lee and Lee, 1995). Dans la plupart des cas, les systèmes biologiques semblent utiliser des interactions faibles mais multivalentes plutôt qu’une seule interaction forte. La multivalence est donc employée non seulement pour atteindre l’affinité, ou l’avidité nécessaire, mais aussi pour assurer la haute spécificité pour les 7
interactions avec certains types de surfaces cellulaires.
1.2.2. Lectines et infections bactériennes 1.2.2.1.
Adhésion bactérienne aux surfaces cellulaires
La plupart des bactéries sont des organismes hétérotrophes qui gagnent de l’énergie par le saprophytisme, par la symbiose ou par la pathogénie. Les bactéries pathogènes emploient un répertoire flexible de mécanismes par lesquels elles attaquent leurs hôtes. Les études récentes ont mis en évidence des modèles communs. Dans un premier temps, la plupart des bactéries nécessitent la reconnaissance de l’hôte et l’adhésion à un tissu cible pour que l’infection soit possible (Pizarro-Cerda and Cossart, 2006). L’adhésion de certaines bactéries aux cellules de l’hôte est assurée par les lectines qui sont présentes sur la surface bactérienne et qui s’attachent aux glycoprotéines ou glycolipides sur les surfaces cellulaires de l’hôte (Sharon and Ofek, 2000). Ainsi certains mutants de bactéries pathogènes n’exprimant pas de lectines sont incapables d’initier l’infection (Roberts et al., 1994). Les bactéries qui s'attachent aux surfaces s'agglutinent dans une matrice polymère hydratée qu'elles produisent, et donc forment un biofilm (Costerton et al., 1999). La formation de ces communautés bactériennes concentrées et leur résistance inhérente aux agents antimicrobiens sont la cause de nombreuses infections bactériennes persistantes et chroniques. Sur des surfaces bactériennes, les lectines se présentent souvent sous la forme de fimbriae ou pili (voir le chapitre sur la structure des lectines). Elles sont généralement polymorphes et consistent en un assemblage de centaines d’unités de plusieurs types, dont une seule s’attache aux glucides (Ofek and Doyle, 1994). Les fimbriae sont très bien adaptées pour servir comme médiateurs de l’adhésion bactérienne au tissu d’arrivée. Parmi les nombreux pathogènes humains, deux exemples assez importants sont les bactéries Escherichia coli et Campylobacter jejuni. Bien que ces bactéries ne causent pas ou peu d’infections fatales, elles sont couramment répandues chez les humains dans le monde entier, présentant des infections multiples. Certains souches d'Escherichia coli sont uropathogènes, c'est-à-dire qu’elles causent les infections urinaires, qui comptent parmi les infections bactériennes les plus répandues chez les humains (Kunin, 1987). L'adhésion de la bactérie à l'épithélium de l'appareil urinaire se produit par l'intermédiaire des fimbriae P et des fimbriae du type 1, qui ont fait l'objet de nombreuses études (Bergsten et al., 2005; Ofek et al., 2000). Dans les années 1990, la bactérie Campylobacter jejuni a été reconnue parmi les pathogènes les plus fréquemment responsables de diarrhées aiguës dans le monde industrialisé (Tauxe, 1992). Les infections par Campylobacter jejuni deviennent préoccupantes à cause de la résistance croissante de la bactérie aux antibiotiques (Altekruse et al., 1999). Plusieurs facteurs de virulence de ces bactéries sont responsables de l’adhésion de la bactérie aux
8
cellules de hôte (Ketley, 1997), et les flagelles sont probablement les plus importants. Les études in vitro ont montré que l’adhérence de la bactérie aux cellules de la muqueuse intestinale est inhibé par des fucosyloligosaccharides (Ruiz-Palacios et al., 2003). L’affinité des lectines pour les glucides est d’habitude relativement basse (de l’ordre du millimolaire). Par modification chimique, on peut obtenir une augmentation d’affinité de plusieurs ordres. Par cette approche, on peut envisager de concevoir des glucides antiadhésifs qui s’attacheraient aux lectines bactériennes avec une affinité plus forte que leurs ligands naturels et empêcheraient la bactérie d’adhérer aux cellules de l’hôte, ce qui est généralement le premier pas de l’infection bactérienne (Sharon and Ofek, 2000).
1.2.2.2.
Pseudomonas aeruginosa
La bactérie Pseudomonas aeruginosa est une bactérie Gram négatif, qui appartient à la famille des Pseudomonadaceae. Cette famille comprend plusieurs genres qui constituent un groupe de bactéries généralement connues comme « pseudomonades ». Ces bactéries sont largement répandues dans le sol et dans l'eau. Elles se trouvent souvent sur la surface des plantes, et parfois aussi sur la surface des animaux. Pseudomonas aeruginosa est une bactérie pathogène opportuniste, c'est-à-dire qu'elle exploite un défaut dans le système immunitaire de l’hôte pour initier l'infection. Elle est au troisième rang des bactéries responsable d'infections nosocomiales (11%) derrière Escherichia coli (22,6%) et Staphylococcus aureus (20%) (Carlet et al., 2001). Dans les établissements de soins, elle peut causer une mortalité importante : dans les années 1997-1998, l’infection par P. aeruginosa s’est avérée fatale pour 35% des nouveaux-nés dans l’unité des soins intensifs d’un hôpital à Oklahoma City (Etats-Unis) (Moolenaar et al., 2000). La bactérie est réputée pour sa résistance multiple aux antibiotiques (Defez et al., 2004). Pseudomonas aeruginosa cause des infections de l'appareil urinaire, des infections du système respiratoire, des dermatites, des infections des tissus mous, des bactériémies, des infections des os et des articulations, des infections gastro-intestinales et une gamme d'infections systémiques, particulièrement chez les patients atteints de cancer ou du SIDA, qui sont immunodéprimés, chez les patients avec des brûlures sévères, ou chez les patients atteints de mucoviscidose. La mortalité chez ces patients atteint jusqu’à 50%. La population de race blanche, dite encore caucasienne, est affectée par une maladie héréditaire appelée mucoviscidose (ou fibrose kystique). La mucoviscidose se transmet sur le mode autosomique récessif, c’est-à-dire que seuls les homozygotes sont malades. La maladie est particulièrement fréquente en Europe et en Amérique du Nord : dans ces régions, son incidence est estimée à 1/2500 naissances (Ratjen and Doring, 2003). La mucoviscidose
9
frappe les glandes séreuses et les glandes à sécrétion muqueuse : dans le système respiratoire et digestif des individus affectés se produit un mucus anormalement dense. Ce phénomène est causé par une mutation du gène CF qui code pour une protéine de 1480 acides aminés appelée protéine CFTR (Cystic Fibrosis Transmembrane conductance Regulator) (Dork et al., 1991). Aujourd’hui encore, la fibrose cystique reste une maladie incurable et toujours mortelle. Il existe des thérapies palliatives qui ont permis de prolonger l’espérance de vie des malades, mais celle-ci ne dépasse guère l’âge de 30 ans. La greffe est le seul espoir pour certains enfants ou adolescents. Des essais de thérapie génique in vitro et chez l’homme, sont actuellement en cours. Des études ont montré que Pseudomonas aeruginosa forme un biofilm dans le système respiratoire des patients atteints de mucoviscidose (Singh et al., 2000). Un espoir pour les malades de fibrose cystique est donc la recherche de nouveaux antibiotiques, qui seront plus efficaces.
1.3. Les glycosyltransférases 1.3.1. La fonction et l’arrangement dans les cellules La biosynthèse des glucides, des polysaccharides et des glycoconjugés est dirigée par une famille contenant de nombreuses enzymes, qui sont appelées les glycosyltransférases. Les glycosyltransférases existent chez les procaryotes et chez les eucaryotes, et généralement présentent une spécificité pour le substrat donneur et pour le substrat accepteur. Chez les eucaryotes, la plupart de réactions de glycosylation, qui génèrent une vaste quantité de structures oligosaccharidiques, s’effectuent dans l’appareil de Golgi. Les glycosyltransférases présentes dans cet organite sont des protéines transmembranaires du type II qui consistent en un court domaine cytoplasmique N-terminal, suivi d'un domaine transmembranaire, d’une région tige et d’un grand domaine catalytique globulaire, qui se trouve à l'intérieur de l'appareil de Golgi (Figure 9).
10
Figure 9: L’arrangement typique d’une glycosyltransférase dans l'appareil Golgi. Adapté du Petit Larousse (Merlet et al., 2004).
1.3.2. Les mécanismes de réaction La plupart des glycosyltransférases (dites de Leloir) transfèrent le groupement saccharidique d’un nucléotide-sucre (le donneur) sur une molécule spécifique d’accepteur, donnant naissance à une liaison glycosidique. Durant le transfert du glucide, la configuration du carbone anomérique du sucre est soit retenue, soit inversée (Figure 10). De ce point de vue, on peut distinguer deux groupes de glycosyltransférases: les glycosyltransférases à mécanisme d’inversion ou à mécanisme de rétention.
Figure 10: Represéntation schématique des deux types de transfert de sucre à partir d’un nucléotide-sucre. R = nucléotide (uridine, cytidine etc.), R' = monosaccharide transféré.
11
En s'appuyant sur l'analogie entre les glycosylhydrolases (GHs) et les glycosyltransférases (GTs), on a d’abord supposé que les GTs utilisent des mécanismes de réactions similaires aux GHs, qui ont été largement étudiés (Davies et al., 1998; Lairson and Withers, 2004). Les GHs à mécanisme d'inversion utilisent la réaction de déplacement direct SN2, ce qui mène à une inversion de la configuration du carbone anomérique du sucre. Ce mécanisme des GHs à l'inversion de configuration a été appuyé par nombreuses études (Huang et al., 1997; Vocadlo et al., 2002). Il a été proposé que les GTs à mécanisme d'inversion suivent un schéma similaire (Figure 11): une base générale (B) déprotone le nucléophile de l'accepteur et facilite ainsi le déplacement du nucléoside diphosphate. Cette hypothèse est pour le moment appuyée par des données structurales et cinétiques, ainsi que par des résultats de mutagenèse dirigée (Chiu et al., 2004; Pedersen et al., 2000).
Figure 11: Mécanisme de réaction de glycosyltransférases à inversion de configuration: déplacement direct. (Lairson and Withers, 2004)
Le mécanisme des GHs à rétention de configuration est typiquement une double réaction de déplacement, qui comporte deux états d’ion oxocarbenium et un intermédiaire covalent glycosyl-enzyme (Zechel and Withers, 2000). Par contre, une hypothèse de mécanisme analogue chez les GTs à rétention de configuration n'a jamais été confirmée par l'expérience. Les techniques expérimentales utilisées n'ont pas permis de détecter clairement un intermédiaire covalent glycosyl-enzyme. Pour cela, un mécanisme de réaction alternatif a été proposé, de type SNi. Celui-ci comporte un seul état de transition, dans lequel l'attaque du nucléophile de l'accepteur et le départ de groupe sortant du donneur se produisent sur la même face (Figure 12). Le mécanisme SNi a été proposé notamment pour les glycosyltransférases qui ne possèdent pas de nucléophile convenablement placé dans leur site actif.
12
Figure 12: Mécanismes de réaction proposés pour les glycosyltransférases à rétention de configuration: mécanisme de déplacement double (a) et mécanisme SNi (b). (Lairson and Withers, 2004)
Le débat sur le mécanisme de double déplacement ou le mécanisme SNi chez les GTs à rétention de configuration a été relancé avec la détection d'un intermédiaire covalent glycosylenzyme chez la LgtC (Lairson et al., 2004), qui sert de système modèle pour ce groupe de GTs. De manière surprenante, le glucide est lié à Asp190 et non à Gln189 qui avait été identifié comme acide aminé catalytique Ces résultats restent donc sujets à controverse. A l’heure actuelle, aucun des deux mécanismes de réaction n’a été confirmé et il faudra plus d'expériences pour éclaircir le mécanisme des GTs à rétention de configuration.
1.3.3. Le classement par similarité de séquences : familles A partir des similitudes de séquence, les glycosyltrasférases sont plus précisément classées en familles, qui sont désignées par les lettres GT1, GT2, ... GT70 etc. En s’appuyant sur la comparaison des séquences primaires, le système CAZY a été proposé (Coutinho and Henrissat, 1999) qui classe les glycosyltransférases en 83 familles. Cette base de données regroupe plus de 18 000 entrées et presque 32 000 séquences de glycosyltransférases (Tableau 1) et se retrouve sur l'adresse internet suivante: http://www.cazy.org/CAZY/fam/acc_GT.html. Le nombre de séquences hypothétiques de GTs issues des séquençages systématiques de nombreux génomes étant toujours en perpétuelle croissance, la connaissance des familles individuelles de glycosyltransférases s’approfondit et se précise. Par ailleurs, de nouvelles familles sont régulièrement définies dès l’instant qu’une nouvelle fonction est déterminée. Les différences entre les familles peuvent être importantes. Certaines familles contiennent des centaines voire même des milliers de séquences qui représentent des protéines de différents organismes, ayant des fonctions très diverses. Il est extrêmement difficile de prédire la fonction biochimique précise d’une séquence hypothétique apparentée à ces grandes familles dites poly-spécifiques. Par contre, d’autres familles sont monospécifiques et ne comportent que peu de séquences.
13
Tableau 1: Résumé de nombre d’entrées de séquences, de données SwissProt et de structures tridimensionnelles pour chaque famille de GTs dans le système CAZY.
1.3.4. Le classement structural : repliements et clans De manière surprenante, il apparaît que les GTs adoptent une structure tridimensionnelle très conservée qui se résume aujourd’hui principalement à deux types de repliement protéique, nommés GT-A et GT-B (Figure 13). Le repliement GT-A est caractérisé par un domaine Rosmann α/β/α comprenant un feuille β central, créé par un minimum de huit brins β. Le point commun de ces protéines est un motif d’acides aminés Asp-X-Asp (DxD), où les groupes carboxylates de ces résidus aspartate coordonnent un cation divalent (Breton et al., 2006). Le motif DxD, via le cation divalent, est impliqué dans la liaison du substrat donneur. Le repliement GT-B est caractérisé par deux domaines Rossman α/β/α reliés par un domaine de liaison relativement flexible. La liaison du ligand est conditionnée par un changement
14
conformationnel de la protéine, qui se traduit par un changement de la position relative des deux domaines α/β/α (Bourne and Henrissat, 2001). L’existence d’un troisième repliement protéique a été proposée récemment (Breton et al., 2006). La sialyltransférase bactérienne CstII (Chiu et al., 2004), appartenant à la famille GT42, adopte un repliement protéique qui présente des similarités avec celui de GT-A, mais avec plusieurs différences notamment dans la connectivité des brins β et par l’absence de motif DxD, et donc il peut être considéré comme un nouveau repliement protéique (Figure 14). La structure de cette enzyme comporte un sandwich α/β/α et ne présente pas de motif DxD.
Figure 13: Représentation graphique des repliements GT-A et GT-B. La protéine est représentée par un ruban jaune pour les brins β, un ruban vert pour les hélices α et un fil pour les autres zones. Le sucre est représenté par des bâtons, l'atome de manganèse par une sphère. A gauche, structure de glycosyltransférase Extl2 de Mus musculus (code PDB 1OMZ) en complexe avec UDP-GalNAc. A droite, structure de glycosyltransférase MurG de Escherichia coli (code PDB 1NLM) en complexe avec UDP-GlcNAc.
Figure 14: La structure de sialyltransférase CstII de Campylobacter jejuni (code PDB 1RO7) en complexe avec CMP-3FNeuAc. La représentation de la protéine est analogue à l’image précédente.
Les deux repliements protéiques principaux mentionnés ci-dessus ne sont pas corrélés avec le
15
résultat stéréochimique de la réaction. Donc, les protéines appartenants à chacun de ces repliements protéiques peuvent être classifiées en glycosyltransférases qui conservent ou inversent la configuration anomérique du carbone du nucleotide-sucre. Ces quatre clans sont désignés par les chiffres romains I-IV (Coutinho et al., 2003) (Figure 15).
Figure 15: Quatre clans des glycosyltransférases. Les abréviations a et e indiquent la configuration du carbone anomère du glucide avant et après la réaction (Coutinho et al., 2003).
1.3.5. La base de données structurales des glycosyltransférases Les glycosyltransférases étant des protéines fréquemment associées aux membranes, leur cristallisation et leur caractérisation structurale restent des étapes très difficiles. Au laboratoire de Glycobiologie Moléculaire du CERMAV, une base de données des structures tridimensionnelles connues a été créée et mise à la disposition de la communauté scientifique sur Internet (Figure 16). Cette base de données regroupe les informations essentielles sur ces protéines, et offre des liens utiles vers d’autres bases de données. Au cours de ma thèse, une partie de mon travail a consisté à vérifier périodiquement la publication de nouvelles structures cristallographiques et à mettre à jour cette base de données, qui se trouve à l'adresse http://www.cermav.cnrs.fr/cgi-bin/rxgt/rxgt.cgi
16
Figure 16: La base de données est accessible sur l'adresse internet suivante : http://www.cermav.cnrs.fr/cgibin/rxgt/rxgt.cgi.
Les glycosyltransférases contenues dans cette base de données sont classées non seulement selon le système CAZY (Coutinho and Henrissat, 1999), mais aussi selon l'organisme d'origine, la liaison formée par la réaction enzymatique, ou le repliement protéique. Les liens utiles permettent d'acquérir l'information bibliographique, les coordonnées atomiques de la banque de données PDB (Berman et al., 2000), ou d'autres renseignements provenant de différentes sources, comme SwissProt (Bairoch et al., 2004)ou Pfam (Bateman et al., 2002). Chaque structure est accompagnée d'images bi- et tridimensionnelles, qui illustrent les détails des interactions entre la protéine et le substrat. En mars 2006, 151 structures cristallographiques de glycosyltransférases sont disponibles dans la banque (Figure 17). La plupart de ces protéines ont été cristallisées en complexe avec les substrats ou ses analogues, ce qui permet une analyse détaillée (au niveau atomique) des interactions entre les protéines et les substrats. Beaucoup de structures de glycosyltransférases ont été résolues plusieurs fois, avec des substrats et/ou leurs analogues différents (Tableau 2). La disproportion énorme entre le nombre de séquences connues et le nombre de structures résolues de glycosyltransférases est évidente dans le Tableau 1. Sur 83 familles de GTs dans le système CAZY, seulement 19 sont représentées dans la base de données CERMAV, et il est prévisible que cette disparité va s'accroître, car le séquençage des protéines progresse beaucoup plus vite que leur cristallisation et caractérisation structurale. Les représentants de certaines familles (6, 7 et 63) ont été caractérisés plusieurs fois avec différents substrats, alors que les structures de nombreuses autres familles restent un mystère. Le Tableau 3 donne une
17
idée sur le nombre de structures résolues, ainsi que sur les organismes d'origine et sur les activités de GTs. Le nombre souvent élevé de la dernière colonne indique que la glycosyltransférase correspondante a été résolue avec différents substrats ou sous la forme de différents mutants.
Figure 17: Evolution du nombre de structures de glycosyltransférases au cours des dernières années.
18
19
20
b4GalT I
b4GalT I
LgtC
Glycogénine
GnT I
GlcAT-I
GlcAT-P
Toxin B
Extl2
β-1,4-Galactosyltransférase I
β-1,4-Galactosyltransférase I
α-1,4-Galactosyltransférase
Protéine auto-glycosylante
β-1,2-N-Acetylglucosaminyltransférase I α-1,2-Mannosyl-transférase
Polypeptide α-GalNActransférase Polypeptide α-GalNActransférase β-1,3-Glucuronyltransférase
β-1,3-Glucuronyltransférase
α-Glucosyltransférase
α-1,4-N-Acetylhexosaminyltransférase Mannosylglycerate synthase
7
7
8
8
13
27
43
44
64
78
43
27
15
GTB
Glycosyltransférase B
6
MGS
GalNAc-T2
Kre2P/ Mnt1P GalNAc-T1
GTA
Glycosyltransférase A
6
a3GalT
rétention rétention
GDP-mannose α-mannosyltransférase
rétention
inversion
inversion
rétention
rétention
rétention
inversion
rétention
rétention
inversion
Ext-related protein 2
α-1,2-mannosyl-glycoprotein β-1,2N-acetylglucosaminyltransférase glycolipid 2-alphamannosyltransférase UDP-GalNAc:polypeptide α-Nacetylgalactosaminyl-transférase T1 UDP-GalNAc:polypeptide α-Nacetylgalactosaminyl-transférase T2 UDP-GlcUA:Gal β-1,3-Gal-R glucuronyltransférase galactosylgalactosylxylosylprotein β-1,3-glucuronyltransférase 1 UDP-glucose glucosyltransférase
lactose synthase / N-acetyllactosamine synthase lipopolysaccharide galactosyltransférase glycogenine-1
inversion
rétention
fucosylglycoprotein α1,3galactosyltransférase lactose synthase / N-acetyllactosamine synthase
rétention
rétention
inversion
Mécanisme
fucosylglycoprotein α1,3-Nacetylgalactosaminyltransférase
protéine impliquée dans la biosynthèse de la paroi bactérienne UDP-galactose:β-D-galactosyl-1,4-Nacetyl-D-glucosaminide α-1,3galactosyltransférase
SpsA
α-1,3-Galactosyltransférase
Nom de l'enzyme
Abréviation
6
Fonction Repliement GT-A 2 Putative glycosyltransférase
cazy
GlcNAc/GalNAc (α1,4)GlcUA Man(α1-O)glycerate
Glc(α1-O)-Thr37
GlcUA(β1,3)Gal
GalNAc(α1-O-) Ser/Thr GalNAc(α1-O-) Ser/Thr GlcUA(β1,3)Gal
Man(α1,2)Man
Glc(α1-O)-Tyr194 / Glc(α1,4)Glc GlcNAc(β1,2)Man
Gal(α1,4)Gal
Gal(β1,4)GlcNAc
Gal(β1,4)GlcNAc
Gal(α1,3)Gal
GalNAc(α1,3)Gal
Gal(α1,3)Gal
Liaison formée
Rhodothermus marinus
Clostridium difficile Mus musculus
Homo sapiens
Homo sapiens
2FFU, 2FFV
Homo sapiens
2BO4, 2BO6, 2BO7, 2BO8
1OMX, 1OMZ, 1ON6, 1ON8
2BVL, 2BVM
1V82, 1V83, 1V84
1FGG, 1KWS
1S4O, 1S4P, 1S4N
1FO8, 1FO9, 1FOA
1LL0, 1LL2, 1LL3
1XHB
2AE7, 2AEC, 2AES, 2AGD, 2AH9 1G9R, 1GA8, 1SS9
1FGX, 1FR8, 1NF5, 1NHE, 1NKH, 1NQI, 1O23, 1NMM, 1NWG, 1OQM, 1O0R, 1PZT, 1PZY, 1TW5, 1TW1, 1TVY, 2FYA, 2FYB, 2FYC, 2FYD
1QGQ, 1QGS, 1H7Q, 1H7L, 1QG8 1FG5, 1G8O, 1G93, 1K4V, 1GWW, 1GWV, 1GX0, 1GX4, 1O7O, 1O7Q, 1VZT, 1VZU, 1VZX 1LZ0, 1LZI, 1R7T, 1R7V, 1R7Y, 1R81, 1WSZ, 1WT0, 1WT1, 1WT2, 1WT3, 1XZ6, 1ZHJ, 1ZJO, 1ZI1, 1ZI3, 1ZI5,1ZI4, 2A8W 1LZ7, 1LZJ, 1R7U, 1R7X, 1R80, 1R82, 1ZIZ, 1ZJP, 1ZJ0,1ZJ1,1ZJ2,1ZJ3,2A8U
Code PDB
Neisseria meningitidis Oryctolagus cuniculus Oryctolagus cuniculus Saccharomyces cerevisiae Mus musculus
Homo sapiens
Bos taurus
Homo sapiens
Homo sapiens
Bos taurus
Bacillus subtilis
Origine
(Flint et al., 2005)
(Pedersen et al., 2003)
(Reinert et al., 2005)
(Kakuda et al., 2004)
(Pedersen et al., 2002; Pedersen et al., 2000)
(Fritz et al., 2006)
(Fritz et al., 2004)
(Lobsanov et al., 2004)
(Unligil et al., 2000)
(Gibbons et al., 2002)
(Lairson et al., 2004; Persson et al., 2001)
(Gastinel et al., 1999; Ramakrishnan et al., 2002; Ramakrishnan et al., 2004; Ramakrishnan and Qasba, 2001; Ramakrishnan and Qasba, 2002; Ramakrishnan et al., 2006; Ramakrishnan et al., 2001; Ramasamy et al., 2003) (Ramasamy et al., 2005)
(Letts et al., 2006; Nguyen et al., 2003; Patenaude et al., 2002)
(Lee et al., 2004; Letts et al., 2006; Nguyen et al., 2003; Patenaude et al., 2002)
(Boix et al., 2001; Boix et al., 2002; Gastinel et al., 2001; Zhang et al., 2004; Zhang et al., 2003)
(Charnock and Davies, 1999; Tarbouriech et al., 2001)
Références
21
GtfD
UGT71G1
VvGT1
BGT
PmST1
β-vancosaminyltransférase
β-Glucosyltransférase
β-Glucosyltransférase
Glycogène synthase 1
Glycogène synthase
Heptosyltransférase II
Trehalose-6-phosphate synthase β-1,4-N-Acetylglucosaminyltransférase
β-Glucosyltransférase
α-2,3-Sialyltransférase
1
1
1
5
5
9
20
63
80
Sialyltransférase
Cst II
MurG
OtsA
RfaF
PaGS
bifunctional α-2,3/2,8sialyltransférase
CMP-NeuA: α-2,3-sialyltransferase
UDP-glucose: triterpene glucosyltransférase UDP-glucose:anthocyanidin 3-Oglucosyltransférase ADP-glucose-starch glucosyltransférase NDP-glucose-starch glucosyltransférase ADP-heptose LPS heptosyltransférase II UDP-glucose-glucosephosphate glucosyltransférase UDP-N-acetylglucosamine-Nacetylmuramyl-(Pentapeptide) pyrophosphoryl-undecaprenol Nacetylglucosamine transférase DNA β-glucosyltransférase
UDP-β-L-4-epivancosamine:vancomycinvancosaminyltransférase D
UDP-glucose:glucosyltransférase GtfB
TDP-4-epi-vancosaminyltransférase GtfA
inversion
inversion
NeuAc(α2,3)Gal / NeuAc(α2,8)NeuAc
NeuAc(α2,3)Gal / NeuAc(α2,3)GalNac / NeuAc(α2,3)NeuAc
Glc(β1-O)-5hydroxymethyl cytosine
GlcNAc(β1,4)Mur NAc
inversion
inversion
Glc(α1,1)Glc-6-P
Glc(α1,4)Glc
4-epivancosamine (β1-O)-Tyr6 of vancomycin aglycone Glc(β1-O)(Phegly)4 of vancomycin aglycone 4-epivancosamine (β1,2)Glc(β1-O)(Phegly)4 of vancomycin aglycone Glc(β1-O)triterpene Glc(β1-O)cyanidin Glc(α1,4)Glc
rétention
inversion
rétention
rétention
inversion
inversion
inversion
inversion
inversion
Tableau 2: Ensemble des structures présentes dans la base de données de glycosyltransférases.
42
Repliement Cst-II
28
GtfB
β-Glucosyltransférase
1
AtGS
GtfA
β-epivancosaminyltransférase
1
Repliement GT-B
Campylobacter jejuni
Pasteurella multocida
Bacteriophage T4
Escherichia coli
Escherichia coli
Escherichia coli
Agrobacterium tumefaciens Pyrococcus abyssi
Medicago truncatula Vitis vinifera
Amycolatopsis orientalis
Amycolatopsis orientalis
Amycolatopsis orientalis
1RO7, 1RO8
2BGT, 2BGU, 1C3J, 1QKJ, 1JEJ, 1JG6, 1JG7, 1JIU, 1JIV, 1JIX, 1IXY, 1J39, 1NZD, 1NVK, 1NZF, 1SXP, 1SXQ, 1M5R , 1YA6, 1Y8Z, 1Y6G, 1Y6F, 1XV5 1EX0, 1EX1
1F0K, 1NLM
1GZ5, 1UQT, 1UQU
1PSW
2BFW, 2BIS
1RZU, 1RZV
2C1X, 2C1Z, 2C9Z
2ACV, 2ACW
1RRV
1IIR
1PN3, 1PNV
(Chiu et al., 2004)
(Ni et al., 2006)
(Lariviere et al., 2003; Lariviere and Morera, 2002; Lariviere and Morera, 2004; Lariviere et al., 2005; Moréra et al., 1999; Moréra et al., 2001; Vrielink et al., 1994)
(Ha et al., 2000; Mulichak et al., 2003)
(Gibson et al., 2004; Gibson et al., 2002)
(Horcajada et al., 2006)
(Buschiazzo et al., 2004)
(Offen et al., 2006)
(Shao et al., 2005)
(Mulichak et al., 2004)
(Mulichak et al., 2001)
(Mulichak et al., 2003)
Organisme H. sapiens
M. musculus
R. norvegicus C. elegans
Fonction β3-GalT1 β3-GalT2 β3-GalT4 β3-GalT5
Query Q9Y5Z6 O43825 O96024 Q9Y2C3
β3-GalT6 β3-GalT7 C1-GalT1 C1-GalT2 β3-GalT1 β3-GalT2
AY050570 Q9NY97 AF155582 AB084170 O54904 O54905
β3-GalT3 β3-GalT4 β3-GalT5 β3-GalT6
O54906 Q9Z0F0 Q9JI67 AY050569
C1-GalT1 β3-GalT4 C1-GalT1 C1-GalT1
AF157963 O88178 AF157962 Q18515
Highest probability match with known crystal structure 1QG8 (7%), 1LLL3 (13%), 1GA8 (11%), 1LZJ (13%) 1QG8 (11%); 1LL3 (11%); 1LZJ (11%); 1OMZ (11%) 1LZJ (15%); 1QG8 (11%) 1QG8 (13%), 1LZJ (13%), 1FGG (14%), 1OMZ (9%), 1LL3 (15%) 1FGG (19%); 1FR8 (17%); 1QG8 (13%) 1QG8 (10%), 1LZJ (13%) 1FR8 (11%); 1QG8 (12%) 1QG8 (9%); 1FGG (8%) 1QG8 (7%); 1GA8 (11%); 1LL3 (13%); 1LZJ (14%) 1QG8 (9%); 1LZJ (12%); 1LL3 (12%); 1OMZ (10%), 1FR8 (13%) 1QG8 (11%); 1LZJ (13%); 1LL3 (12%); 1OMZ (11%) 1QG8 (11%), 1LL3 (17%), 1GA8 (13%) 1FGG (12%); 1QG8 (10%) 1FR8 (18%); 1FGG (17%); 1OMZ (11%); 1QG8 (9%); 1LL3 (13%) 1FOA (13%) 1QG8 (11%); 1GA8 (13%) (no similar glycosyltransferases found) 1LZJ (11%)
Tableau 3: Analyse contenu de la base de données des glycosyltransférases.
22
ARTICLE I
Breton, C., Snajdrova, L., Jeanneau, C., Koca, J. & Imberty, A. (2005). Structures and mechanisms of glycosyltransferases superfamily. Glycobiology 16, 29R-37R. Résumé : Les glycosyltransférases (GTs) catalysent le transfert du groupement saccharidique d'un glucide donneur sur une molécule spécifique d'accepteur (glucide ou non-glucide). Une classification qui s'appuie sur la ressemblance des séquence divise les GTs en plusieurs dizaines de familles, ce qui reflète aussi la diversité des molécules qui peuvent être utilisées comme accepteurs. Par contre, ce groupe d'enzymes est caractérisé par une architecture tridimensionnelle beaucoup plus conservée. Depuis quelques années, seuls deux repliements (GT-A et GT-B) ont été identifiés dans les structures cristallographiques résolues. Un rapport récent sur la structure d'une sialyltransférase bactérienne a permis la définition d'un nouveau repliement. Les progrès dans l'élucidation des structures et des mécanismes de réaction des GTs sont discutés dans cet article de revue. Le nombre de structures cristallographiques de GTs étant toujours en croissance, on a développé une base de données 3DGlycosyltransferase qui rassemble les informations structurelles concernant cette classe d'enzymes.
23
24
Glycobiology vol. 16 no. 2 pp. 29R–37R, 2006 doi:10.1093/glycob/cwj016 Advance Access publication on July 21, 2005
REVIEW Structures and mechanisms of glycosyltransferases
Christelle Breton1,2, Lenka Šnajdrová2,3, Charlotte Jeanneau2, Jaroslav Koca3, and Anne Imberty1,2 2
CERMAV–CNRS, affiliated with Université Joseph Fourier, PO Box 53, 38041 Grenoble cedex 9, France; and 3NCBR, Masaryk University, Kotlarska 2, CZ-611 37 Brno, Czech Republic Accepted on July 13, 2005
Glycosyltransferases (GTs) catalyze the transfer of a sugar moiety from an activated donor sugar onto saccharide and nonsaccharide acceptors. A sequence-based classification spreads GTs in many families thus reflecting the variety of molecules that can be used as acceptors. In contrast, this enzyme family is characterized by a more conserved three-dimensional architecture. Until recently, only two different folds (GT-A and GT-B) have been identified for solved crystal structures. The recent report of a structure for a bacterial sialyltransferase allows the definition of a new fold family. Progress in the elucidation of the structures and mechanisms of GTs are discussed in this review. To accommodate the growing number of crystal structures, we created the 3D-Glycosyltransferase database to gather structural information concerning this class of enzymes. Key words: fold recognition/glycosyltransferase/mechanism/ superfamily/three-dimensional structure Introduction Glycosyltransferases (GTs; EC 2.4.x.y) constitute a large family of enzymes that are involved in the biosynthesis of oligosaccharides, polysaccharides, and glycoconjugates (Taniguchi et al., 2002). These molecules of enormous diversity mediate a wide range of functions from structure and storage to signalling. Particularly abundant are the GTs that transfer a sugar residue from an activated nucleotide sugar donor, to specific acceptor molecules, forming glycosidic bonds. Transfer of the sugar residue occurs with either the retention or the inversion of the configuration of the anomeric carbon. These enzymes are present in both prokaryotes and eukaryotes, and they generally display exquisite specificity for both the glycosyl donor and the acceptor substrates. In eukaryotes, most of the glycosylation reactions that generate the diversity of oligosaccharide structures of eukaryotic cells occur in the Golgi apparatus. Golgi resident GTs are type-II transmembrane proteins 1
To whom correspondence should be addressed; e-mail:
[email protected] and
[email protected]
with a large C-terminal globular catalytic domain facing the luminal side. Recent success in X-ray crystal structure determinations of Golgi GTs has provided a molecular basis accounting for donor and acceptor substrate specificities as well as catalysis. This review deals with the most recent structural data concerning sugar nucleotide-dependent GTs. It also includes fold recognition and homology-modelling studies that were shown to be of value for identifying the folds of many GT families or for rationalizing experimental data. Sequence-based GT families GTs have been classified into families by amino acid sequence similarities (Coutinho et al., 2003) (available at http:// afmb.cnrs-mrs.fr/CAZY). In addition to nucleotide sugardependent GTs, the CAZY database also integrates GTs that utilize dolichol-phospho-sugars, sugar-1-phosphates, and lipid diphospho-sugars as activated donors. At the time of writing (May 2005), the database comprises more than 12,000 known and putative GT sequences that have been divided into 78 families (denoted as GTx). Approximately 240 human sequences have already been listed in this database where they fall into 42 distinct families. However, not all GT sequences are present in the database, and the number of families will likely increase with the discovery of new GT genes. Large differences in the number and function of GTs are observed among families. A few families comprise a huge number of sequences from various sources with diverse functions. This is best illustrated with the family GT2 which contains more than 3500 sequences, originating from animal, plant, yeast, and bacterial species, and for which at least 12 distinct GT functions have already been characterized. The members of family 2 include cellulose synthase, chitin synthase, mannosyltransferase, glucosyltransferase, galactosyltransferase, rhamnosyltransferase, and so on. In contrast, other families are monofunctional and contain only a few sequences. In the case of “monospecific” families, sequence similarities are generally observed for the whole catalytic domain, whereas for the large “polyspecific” families, such as GT2 or GT4, they are mostly restricted to only a portion of the catalytic domain. The prediction of the function of a putative GT of sequence homology can be problematic because there are many examples of closely related sequences having different catalytic activity. The best example comes from the study of blood group A and B transferases, which differ by only four amino acids. They use the same H-antigen acceptor but a different glycosyl donor (uridine diphospho N-acetyl galactosamine [UDP-GalNAc] for the A transferase and UDP-Gal
25 For permissions, please e-mail:
[email protected] © The Author 2005. Published by Oxford University Press. All rights reserved.
29R
C. Breton et al.
for the B transferase) (Yamamoto et al., 1990). Therefore, if an open reading frame is related to a large polyspecific family, its precise function cannot be reliably predicted. This sequence-based classification is assumed to integrate both structural and mechanistic features within each family (Coutinho et al., 2003). When applied to glycoside hydrolases, this classification system was shown to correlate well with enzyme mechanism (inversion or retention of the anomeric configuration), because once established for a member of a family, the mechanism can be safely extended to all other members of the family (Davies and Henrissat, 1995). This may not apply so safely to all GT families, because distant similarities between inverting and retaining CAZY families have been noted, suggesting that these families share common ancestors (Franco and Rigden, 2003; Liu and Mushegian, 2003). A similar observation has been made for the retaining GT27 family (corresponding to the animal polypeptide-αGalNAc transferases), which displays sequence similarities with inverting enzymes (Breton et al., 1998), and that was initially grouped into GT2 family (Campbell et al., 1997). Another example is given by the GT52 family, which may comprise both inverting (α2,3-sialyltransferase) and retaining (α2-glucosyltransferase) enzymes that use different nucleotide donors, cytidine monophospho (CMP)-βN-acetyl neuraminic acid (NeuAc) and UDP-α-Glc, respectively. Very recently, the three-dimensional structure of a mannosylglycerate synthase (MGS) from Rhodobacter marinus has been published (Flint et al., 2005). Although MGS, at the sequence level, displays similarity with protein members of family GT2, it is a retaining enzyme that has been classified in a new family (GT78). Altogether, these results demonstrate that the catalytic mechanism (inverting or retaining) cannot always be predicted with reliability from sequence comparison alone.
Crystal structures of GTs Difficulties with high-level expression, purification, and crystallization hampered crystal structure determinations for GT enzymes. The first X-ray structure was reported in 1994 for bacteriophage T4-glucosyltransferase, an enzyme that transfers glucose from UDP-Glc to phage-modified DNA (Vrielink et al., 1994). Since then, >100 crystal structures have been described for proteins corresponding to 23 different GTs, from prokaryotes and eukaryotes. Structural information is now available for 17 distinct GT families, including both retaining and inverting enzymes (Table I). In contrast to glycosylhydrolases that adopt a large variety of folds, including all α, all β, or mixed α/β structures, GT folds have been observed to consist primarily of α/β/α sandwiches (Figure 1), similar or very close to the Rossmanntype fold, a classical structural motif (six-stranded parallel β-sheet with 321456 topology) found in many nucleotidebinding proteins (Lesk, 1995). Until recently, only two structural superfamilies have been described for GTs, named GT-A and GT-B, and which were first observed in the original SpsA and β-glucosyltransferase (BGT) structures, respectively (Vrielink et al., 1994; Charnock and Davies, 1999). A third family has recently emerged which comprises the bacterial sialyltransferase (CstII) belonging to family 30R
GT42 (Chiu et al., 2004). This protein displays a similar type of fold than GT-A, but with some differences, so it can be considered as a new fold. The GT-A and GT-B folds are also shared by non-GT enzymes, such as nucleotidyltransferases and sugar epimerases, respectively (Brown et al., 1999; Campbell et al., 2000). The GT-B fold consists of two separate Rossmann domains with a connecting linker region and a catalytic site located between the domains. There is an excellent structural conservation between protein members of the GT-B family, particularly in the C-terminal domain which corresponds to the nucleotide-binding domain. Variations are more pronounced in the N-terminal domains, in the loops and helices which point towards the active site, which have evolved to accommodate very different acceptors. Peptide motifs characteristic of the GT-B fold, notably a glutamate residue and glycine-rich loops interacting with the ribose and phosphate moieties of nucleotide donor, respectively, have been described (Wrabl and Grishin, 2001). But so far, GT-B enzymes do not appear to share any strictly conserved residue (Hu and Walker, 2002). Although divalent cations may be required for full activity of GT-B enzymes, there is no evidence of a bound metal ion associated with catalysis. The GT-A fold consists of an α/β/α sandwich (a sevenstranded β-sheet with 3214657 topology in which strand 6 is antiparallel to the rest) that resembles a Rossmann fold. The central β-sheet is flanked by a smaller one, and the association of both creates the active site. A general feature of all enzymes of the GT-A family is the presence of a common motif, the DxD motif, and also their requirement for a divalent cation for activity (Breton et al., 1998; Breton and Imberty, 1999). The DxD motif is shown in all crystal structures to interact primarily with the phosphate groups of nucleotide donor through the coordination of a divalent cation, typically Mn2+ (Figure 2). Depending on the GT, the two aspartate amino acids are not always conserved, but this particular motif, or its variants, can always be identified at the same location, in a short loop connecting one β-strand of the main α/β/α sandwich to a smaller one (Figure 1A). Comparison of the catalytic domains of enzymes of the GT-A family revealed the presence of two regions that are structurally well conserved in all members of the GT-A family, including inverting and retaining enzymes (Figure 2). This suggests that common structural elements are necessary for the glycosyl transfer reaction, irrespective of the stereochemistry of the reaction. The first region mostly corresponds to the Rossmann-type nucleotide-binding domain, encompassing the first 100–120 residues, and that is terminated by the DxD motif. The key amino acids that interact with UDP are mainly found at the C-term of strands β1 and β4. In some crystal structures, residues in the C-terminus of the catalytic domain were shown to make additional contacts with UDP. Differences are observed in the function of the residues of the DxD motif in retaining and inverting enzymes. In retaining enzymes, the two aspartate residues can interact with the Mn2+ ion, whereas only the last aspartate interacts with the metal cation in inverting enzymes (Persson et al., 2001; Tarbouriech et al., 2001). In both cases, the variable amino acid of the DxD motif (usually a polar or an aliphatic residue of moderate
26
Structures and mechanisms of glycosyltransferases
Table I. Glycosyltransferases with available crystal structures
Organism
Glycosyltransferase
Name
GT family
Number of structure
Reference
β-Glucosyltransferase
BGT
GT63
18
Vrielink et al. (1994)
Virus Phage T4 Prokaryotes Agrobacterium tumefaciens
Glycogen synthase 1
AtGS
GT5
2
Buschiazzo et al. (2004)
Amycolatopsis orientalis
β-Epi-vancosaminyltransferase
GtfA
GT1
2
Mulichak et al. (2003)
β-Glucosyltransferase
GtfB
GT1
1
Mulichak et al. (2001)
β-Vancosaminyltransferase
GtfD
GT1
1
Mulichak et al. (2004)
Bacillus subtilis
Putative glycosyltransferase
SpsA
GT2
5
Charnock and Davies (1999)
Campylobacter jejuni
α-2,3/2,8-Sialyltransferase
CstII
GT42
2
Chiu et al. (2004)
Escherichia coli
β-1,4-GlcNAc transferase
MurG
GT28
2
Ha et al. (2000)
Trehalose-6-phosphate synthase
OtsA
GT20
3
Gibson et al. (2002)
Heptosyltransferase II
RfaF
GT9
1
PDB 1PSW
Neisseria meningitidis
α-1,4-Galactosyltransferase
LgtC
GT8
3
Persson et al. (2001)
Rhodothermus marinus
Mannosylglycerate synthase
MGS
GT78
4
Flint et al. (2005)
Eukaryotes Yeast
α-1,2-Mannosyltransferase
Kre2p/Mnt1P
GT15
3
Lobsanov et al. (2004)
Mouse
α-1,4-N-Acetylhexosaminyltransferase
Extl2
GT64
4
Pedersen et al. (2003)
Polypeptide-α-GalNAc transferase
ppGalNAc-T1
GT27
1
Fritz et al. (2004)
Rabbit
α-Glucosyltransferase
Glycogenin
GT8
3
Gibbons et al. (2002)
β-1,2-GlcNAc transferase I
GnTI
GT13
3
Bovine
α-1,3-Galactosyltransferase
α3GalT
GT6
13
β-1,4-Galactosyltransferase I
β4GalT1
GT7
16
Gastinel et al. (1999)
β-1,3-Glucuronyltransferase
GlcAT-I
GT43
2
Pedersen et al. (2000)
β-1,3-Glucuronyltransferase
GlcAT-P
GT43
3
Kakuda et al. (2004)
α-1,3-GalNAc transferase A
GTA
GT6
12
Patenaude et al. (2002)
α-1,3-Galactosyltransferase B
GTB
GT6
6
Patenaude et al. (2002)
Human
Unligil et al. (2000) Gastinel et al. (2001)
All data are available from the 3D-Glycosyltransferase database accessible from the Glyco3D site (http://www.cermav.cnrs.fr/glyco3d/). Only reference to the original work is indicated. Glycogen and starch phosphorylases (GT35) are not included in the 3D-Glycosyltransferase database.
size) is involved in ribose binding. The C-terminal portion is highly variable and is mostly dedicated to the recognition of the acceptor. However, a common structural motif is seen that corresponds to the region β6–α4–α5 (as shown for α4-galactosyltransferase [LgtC] in Figure 2) which forms part of the active site. This corresponds to the second structurally conserved region which comprises residues that were shown in some crystal structures to interact with both the donor sugar and the sugar acceptor (Persson et al., 2001; Boix et al., 2002; Pedersen et al., 2002; Ramakrishnan et al., 2002). In inverting enzymes, the presumed catalytic base was proposed in this region (a Asp or Glu residue at the beginning of α5-helix) (Pedersen et al., 2002; Kakuda et al., 2004). Despite the similarity of their spatial folds, GT-A and GT-B enzymes appear to be unrelated. These two superfamilies have members in the three domains of life. In addition, some bifunctional enzymes, such as the exostosin family (GT47/GT64), are expected to contain both GT-A and GT-B folds. The third-fold family has recently emerged with the crystal structure of a sialyltransferase (CstII) from Campylobacter jejuni, a highly prevalent foodborne pathogen (Chiu
27
et al., 2004). This is the first crystal structure of an enzyme that utilizes a nucleotide monophosphosugar (CMP-NeuAc). This enzyme which belongs to GT42, displays a different type of α/β/α sandwich (a seven-stranded parallel β-sheet with 8712456 topology) and has no DxD motif. The ratio of loops to secondary elements is high in GTs, and many crystal structures do not describe the entire catalytic domain, because the polypeptide extremities and/or several loops are flexible and do not present clear electron density. In both the GT-A family and bacterial sialyltransferase, flexible loops appear to play an important role in substrate binding. Of paramount importance is the demonstration for some of these enzymes of an ordered binding of donor and acceptor substrates linked to a donor substrate induced conformational change (Boix et al., 2001, 2002; Ramakrishnan et al., 2002). A disordered loop (or C-term extremity) in the free enzyme becomes ordered upon nucleotide sugar binding and creates a lid over the donor substrate where additional residues make direct contacts with the diphosphate moiety (Figure 2). This new conformation, called the closed active conformation, creates a pocket that will serve as binding site for the acceptor (Figure 3). Affinity 31R
C. Breton et al.
and, therefore, to limit the hydrolysis of energetically precious nucleotide sugar. Another significant feature is the observed distorted conformation of the bound nucleotide sugar in the active site that may be important in the catalytic mechanism (for a review, see Qasba et al., 2005). A database is now available that provides an extensive list of the known three-dimensional structures of GTs (http://www.cermav.cnrs.fr/glyco3d). The GTs are classified not only according to CAZY systematics (Coutinho et al., 2003), but also according to the organism of origin, the linkage formed by the enzymatic reaction or the protein fold. Useful links permit retrieval of bibliographic information and also atomic coordinates at the Protein Data Bank and other protein databases with structural information. Images are provided to illustrate the details of protein/ substrate interactions. Molecular modelling of GTs Molecular modelling of GTs is difficult. The number of available crystal structures is still limited. Only 17 of the 78 CAZY families include at least one GT for which an X-ray structure has been reported. The low degree of sequence similarity within some of the CAZY families and the absence of similarity between different families represent an unsurmountable barrier for classical sequence alignment procedure which is a prerequisite for homology building. Docking of substrates also appears to be a difficult task because of the flexibility of the nucleotide sugar and the presence of phosphate and divalent cation. Appropriate energy parameters have been recently developed for some modelling softwares (Petrova et al., 1999). Fold recognition
Fig. 1. Ribbon diagram of three glycosyltransferases (GTs) representative of the different folds. Bound nucleotide sugar are represented with stick model, and manganese, when present, by a ball. (A) GT-A fold, mouse α-1,4-N-acetylhexosaminyltransferase (EXTL2) complexed with UDP-GalNAc (PDB code 1OMZ) (Pedersen et al., 2003), (B) GT-B fold, Escherichia coli MurG complexed with UDP-GlcNAc (PDB code 1NLM) (Hu et al., 2003), and (C) Campylobacter jejuni sialyltransferase CstII complexed with cytidine monophospho 3-fluoro N-acetyl neuraminic acid (CMP-3FNeuAc) (PDB code 1RO7) (Chiu et al., 2004). Drawings were prepared with the Chimera program (Pettersen et al., 2004).
studies performed on LgtC (GT8), with the use of titration microcalorimetry confirmed that the open state (free enzyme) has no or little affinity for the oligosaccharidic acceptor (Boix et al., 2002). This complex mechanism is thought to prevent water molecules to act as acceptor for sugar transfer 32R
Fold recognition is a theoretical approach which allows alignment of one sequence with one structure by a process called “threading” (Godzik, 2003). Although there is little or no sequence similarity between GTs, the limited number of observed folds facilitates the use of fold recognition methods to predict whether GT-A, GT-B, or “something else,” will be the most probable for a given sequence. When performed on selected sequences representing all GT families present in CAZY database, such “threading” analyses predicted that many other GT families should adopt the GT-A or GT-B fold (Breton et al., 2002; Franco and Rigden, 2003). In 2002, of 56 GT families, we predicted with a high level of confidence the occurrence of either a GT-A or a GT-B fold for 23 families, five among them have later been confirmed by crystal structures: GT5, 9, 20, 27, 64 (formerly 47). Table II which gives the results of an updated fold-recognition study extended to include the new families that were performed using the three-dimensional positionspecific scoring matrix (3D-PSSM) fold-recognition program (Kelley et al., 2000). With the growing number of crystal structures sharing a GT-A or GT-B fold and the existence of a new fold type (Cst II fold), the new picture that emerges from this study is the prediction of a GT-A fold for 22 families, a GT-B fold predicted for 13 families, and a CstII fold predicted for two families. Therefore, GT families, where a SpsA or a BGT fold has been experimentally
28
Structures and mechanisms of glycosyltransferases
Fig. 2. Schematic representation of the catalytic domains of different members of the glycosyltransferase GT-A superfamily including inverting and retaining enzymes. The location of the two best structurally conserved regions are indicated by the gray shading. The first region encompasses ∼100 residues at the N-terminus of the catalytic domain and corresponds roughly to the nucleotide-binding domain that is terminated by the DxD motif. The second region comprises the structural motif formed by β6-α4-α5 (α4-galactosyltransferase [LgtC] numbering) that is part of the acceptor domain. β-Strands and α-helices are represented by blue arrows and green cylinders, respectively. Plain arrows and cylinders correspond to the most conserved secondary structure elements in the GT-A superfamily and are denoted as indicated for the LgtC. Residues interacting with the nucleotide sugar and acceptor sugar are marked using the colour coding given at the bottom of the figure.
Table II. Classification of glycosyltransferase (GT) families in structural superfamilies
Fig. 3. Accessible surface representation of two glycosyltransferases (GTs). (A) Neisseria meningitidis α-1,4-galactosyltransferase LgtC complexed with donor and acceptor sugar analogs (PDB code 1GA8) (Persson et al., 2001). (B) T4 phage β-glucosyltransferase (BGT) complexed with UDP and a 13-mer DNA duplex (PDB code 1IXY) (Lariviere and Morera, 2002). Connolly surfaces were calculated and represented using the MOLCAD program (Waldherr-Teschner et al., 1992).
demonstrated or predicted with a high probability level, now represent 51 (65%) of the 78 currently known families. Families corresponding to integral membrane proteins with multiple transmembrane domains were not considered in this study (named GT-C superfamily by some authors, Kikuchi et al., 2003; Liu and Mushegian, 2003). It is striking to note that a CsT II fold can now be predicted with confidence for
29
GT families
GT-A fold
GT-B fold
CstII fold
Known fold
2, 6, 7, 8, 13, 15, 27, 43, 64, 78
1, 5, 9, 20, 28, 35, 63
42
Predicted folda
12, 14, 16, 21, 24, 25, 31, 32b, 34, 40, 45, 46, 49, 54, 55, 60, 62, 67, 69, 71, 74, 77
3, 4, 10, 18, 19, 30, 33, 38, 41, 47, 56, 70, 72
29, 73
a Using 3D-PSSM program (Kelley et al., 2000). GT families corresponding to integral membrane proteins with multitransmembrane domains were not included in this study (excluded GT families GT22, 39, 48, 50, 53, 57, 58, 59, 66, 76). b GT-B fold predicted in Wrabl and Grishin (2001) and Liu and Mushegian (2003), using other methods (i.e., PSI-BLAST, HMMer).
GT29, the enzyme family comprising the eukaryotic sialyltransferases. Some families gave weak or moderate scores in fold recognition (i.e., GT11, 17, 18, 23, 26, 37, 44, 52, 61, 65, 68, 75), and this probably means that novel folds can still be discovered in future structural studies. Probabilistic methods of database searching, such as PSI-BLAST (Altschul et al., 1997) and HMMer (Eddy, 1998), can be used to detect homologs with lower sequence similarity. Using these methods, some authors were also able to predict a GT-A or GT-B fold for a number of GT families (Kikuchi et al., 2003; Liu and Mushegian, 2003). No major conflicting results are observed between the different approaches 33R
C. Breton et al.
except for GT32 (Table II). Fold recognition has also been used in a more innovative manner to identify GTs in the fully sequenced genome of Mycobacterium tuberculosis (Wimmerova et al., 2003). All of the 3995 translated peptide sequences were threaded on a small library of structures with GT-A, GT-B, and Rossmann folds. The resulting folding “scores” were combined with biochemical characteristics of the sequences (length, theoretical pI...) and treated with chemometric methods to discriminate and/or predict folds. Several sequences with unknown function could be proposed as putative GTs. The two-domain GT-B fold was easier to identify. Multivariate sequence analysis associated with fold recognition also prove to be a useful tool for predicting both folds and mechanisms (inverting or retaining) for the Escherichia coli and Synechocystis putative GTs present in the CAZY database (Rosen et al., 2004).
expasy.org//SWISSMODEL.html), proposed models for the monoglucosyl–diacylglycerol synthase and diglucosyl– diacylglycerol synthase of Acholeplasma laidlawii and Streptococcus pneumonia involved in the synthesis of membrane glucolipids. In plants, a large family of glucosyltransferases (belonging to GT1 family) are involved in the biosynthesis of natural products, such as alkaloids, terpenoids, and phenylpropanoids. Based on the crystal structure of a bacterial glucosyltransferase from Amycolatopsis orientalis that belongs to the same sequence family in CAZY, a model of the betanidin 5-O-glucosyltransferase from Dorotheathus bellidiformis has been proposed (Hans et al., 2004) as well as of the cyanohydrin glycosyltransferase from Sorghum bicolor (Thorsøe et al., 2005).
Homology modelling and docking of substrates
Since both inverting and retaining glycosyltransferases undergo large conformational movements (Figure 4), the flexible loops that are proposed in GT mechanisms have been the subject of pioneering studies by molecular dynamic simulation. Two well-characterized systems, that is, the inverting bovine β4-galactosyltransferase (Ramakrishnan and Qasba,
Once a sequence has been attributed a fold, it is possible to predict the secondary structure elements, namely α-helices and β-strands, and to align it with sequences with known three-dimensional structure. When sequence identities are very low, comparison of hydrophobic clusters (HCA) has proven to be a useful method for aligning sequences with similar fold (Gaboriaud et al., 1987). A new structure can then be modelled using homology-building methods. However, in general, models have a low confidence index for flexible loops and highly variable regions, and for GTs, this may be a major problem for modelling acceptor sites. This is illustrated by the modelling studies performed in GT-A family. The first attempt to model a GT was done on an α3-galactosyltransferase, using as a template the structure of BGT which was the only crystal structure available at that time. Although modelling was incorrect, the Rossmann-like topology of the nucleotide-binding domain and the role of the DxD motif in binding the divalent cation were later confirmed in the solved crystal structure of α3-galactosyltransferase (Gastinel et al., 2001). In contrast, when the target and the template have sufficient identity, such as α3-GalT and the α3-GTs responsible for the synthesis of blood group A, blood group B, Forssman and iGb3 antigens, the models are accurate and allow for docking of nucleotide sugars and acceptors (Heissigerova et al., 2003). Specificity towards the sugar donor and sugar acceptor was shown to be determined by a few critical residues in the binding site. In addition, a closed active conformation of the modelled enzymes was proposed that may complement data from the crystal structures of the blood group A and B transferases determined in an open conformation (Patenaude et al., 2002). Some homology modelling studies were performed in the GT-B family. The GT-B topology with a large cleft between the two domains appears to be well adapted to accommodate large acceptor molecules as shown in Figure 3B with the docking of DNA in the BGT structure. The configuration of the substrates in the proposed model (Moréra et al., 1999) suggested the participation of an Asp residue (D100) as the catalytic base. A recent study supports a role in catalysis for this residue (Lariviere et al., 2003). Edman and colleagues (Edman et al., 2003), using the fully automated modelling server SWISS-MODEL (http://swissmodel. 34R
Molecular dynamics
Fig. 4. Superimposition of open (green) and close (red) forms of glycosyltransferases (GTs). (A) Bovine α-1,3-galactosyltransferase: open form (PDB code 1G8O) (Gastinel et al., 2001) and close form (PDB code 1K4V) (Boix et al., 2001). (B) Bovine β-1,4-galactosyltransferase form I (PDB code 1PZT) and form II (PDB code 1PZY) (Ramasamy et al., 2003). Drawings were prepared with the Chimera program (Pettersen et al., 2004).
30
Structures and mechanisms of glycosyltransferases
2001) and the retaining LgtC from Neisseria meningitidis (Persson et al., 2001), have been crystallized in their closed conformations. In both cases, when substrates are removed from the protein, several ns simulations with explicit water molecules allowed to see the first step of the loops opening (Gunasekaran and Nussinov, 2004; Snajdrová et al., 2004). Longer simulations would be required to visualize the complete loop movements (>20 Å change for one loop of β4-GalT), but these preliminary results demonstrate the correlated motions of several loops as well as the importance of contacts between loops in the mechanism. Mechanisms A specific feature of GTs is the use of an activated donor that can be a nucleoside diphosphosugar, nucleoside monophosphosugar, or lipid phosphosugar. Mechanistic analogies between GTs and glycosylhydrolases have been reviewed recently (Lairson and Withers, 2004). The mechanism of inverting GTs is believed to be similar to the one of inverting glycosylhydrolases with the requirement of one acidic amino acid that activates the acceptor hydroxyl group by deprotonation. All structural evidence to date supports the SN2 mechanism originally proposed by Wong’s group for α3-FucT (Murray et al., 1996). Mechanism of GT-A inverting GTs has the particularities to involve a Mn++ ion that plays the role of acid catalyst and also initiates a sequential ordered mechanism in which nucleotide sugar binding is followed by loops closing and acceptor binding (Ramakrishnan et al., 2004). GT-B fold-inverting GTs have completely different active sites with no ion involved. T4 phage BGT that transfers glucose to modified DNA also uses an SN2 mechanism with the participation of one acidic amino acid from the protein (Lariviere et al., 2003). The mechanism of retaining GTs has not been elucidated yet. Direct comparison with glycosylhydrolases would suggest a double displacement mechanism with a short-lived glycosyl–enzyme intermediate. Despite numerous efforts, the only intermediate that could be trapped involved an amino acid rather far from the active site (Lairson et al., 2004). An alternative mechanism suggests a direct attack by the acceptor, as previously proposed for glycogen phosphorylase (Klein et al., 1986). This so-called SNi-like mechanism was originally proposed by Persson et al. (2001) for bacterial LgtC. Quantum mechanical calculations using Density Functional Theory applied to a model site of LgtC (136 atoms included) confirmed that the one-step mechanism is energetically favoured (Tvaroska, 2004). These prediction were used for designing new putative inhibitors based on the scaffold of the transition state (Raab et al., 2005). Retaining GTs with GT-B fold, that have distant structural similarities with glycogen phosphorylases, have been proposed to adopt the same mechanism, based on structural studies of complexes with nontransferable analogues (Gibson et al., 2004). In all retaining GTs, the nucleotide sugar is forced to adopt a folded shape that brings the sugar over the pyrophosphate. This special conformation facilitates the transfer by several means: the C-1 is spatially accessible for the reactions, the anomeric bond is elongated and weakened by the torsion around the Π torsion angle as
31
calculated by ab initio methods (Petrova et al., 1999), and a hydrogen bond can be established between O-2 of sugar and phosphate, lowering the energy barrier. Conclusions A sequence-based classification places GTs in many families. In contrast, folds appear to be more conserved because there is convergence to only a few topologies for most of the GT families. The small variety of folds observed is compensated by a large structural variability in the acceptorbinding domain, thus conferring some functional plasticity which allows fine tuning with respect to the acceptor. The catalytic mechanisms are still poorly understood, but current data suggest the importance of movement of loops and domains in catalysis. Additional structural, modelling, and mutational studies are needed to further progress in the understanding of these enzymes.
Acknowledgments CERMAV is part of GDR CNRS n 2590 and FR CNRS n 2607. The work was supported by Ministry of Education (JK, LS contract MSM0021622413) of the Czech Republic. Travel and visits between NCBR and CERMAV were supported by a BARRANDE exchange program. C.J. benefited from a grant from French Ministry of Research and Education and L.S. from French Ministry for Foreign Affairs.
Abbreviations 3D-PSSM, three-dimensional position-specific scoring matrix; BGT, β-glucosyltransferase; CMP, cytidine monophosphate; GalNAc, N-acetylgalactosamine; GT, glycosyltransferase; MGS, mannosylglycerate synthase; NeuAC, N-acetyl neuraminic acid; UDP, uridine diphosphate.
References Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402. Boix, E., Swaminathan, G.J., Zhang, Y., Natesh, R., Brew, K., and Acharya, K.R. (2001) Structure of UDP complex of UDP-galactose: β-galactoside-α-1,3-galactosyltransferase at 1.53A resolution reveals a conformational change in the catalytically important C-terminus. J. Biol. Chem., 276, 48608–48614. Boix, E., Zhang, Y., Swaminathan, G.J., Brew, K., and Acharya, K.R. (2002) Structural basis of ordered binding of donor and acceptor substrates to the retaining glycosyltransferase alpha-1,3-galactosyltransferase. J. Biol. Chem., 277, 28310–28318. Breton, C. and Imberty, A. (1999) Structure-function studies of glycosyltransferases. Curr. Opin. Struct. Biol., 9, 563–571. Breton, C., Bettler, E., Joziasse, D.H., Geremia, R.A., and Imberty, A. (1998) Sequence-function relationships of prokaryotic and eukaryotic galactosyltransferases. J. Biochem. (Tokyo), 123, 1000–1009. Breton, C., Heissigerova, H., Jeanneau, C., Moravcova, J., and Imberty, A. (2002) Comparative aspects of glycosyltransferases. Biochem. Soc. Symp., 69, 23–32.
35R
C. Breton et al.
Brown, K., Pompeo, F., Dixon, S., Mengin-Lecreulx, D., Cambillau, C., and Bourne, Y. (1999) Crystal structure of the bifunctional N-acetylglucosamine 1-phosphate uridyltransferase from Escherichia coli: a paradigm for the related pyrophosphorylase superfamily. EMBO J., 18, 4096–4107. Buschiazzo, A., Ugalde, J.E., Guerin, M.E., Shepard, W., Ugalde, R.A., and Alzari, P.M. (2004) Crystal structure of glycogen synthase: homologous enzymes catalyze glycogen synthesis and degradation. EMBO J., 23, 3196–3205. Campbell, J.A., Davies, G.J., Bulone, V., and Henrissat, B. (1997) A classification of nucleotide-diphospho-sugar glycosyltransferases based on amino acid sequence similarities. Biochem. J., 326, 929–939. Campbell, R.E., Mosimann, S.C., Tanner, M.E., and Strynadka, N.C. (2000) The structure of UDP-N-acetylglucosamine 2-epimerase reveals homology to phosphoglycosyl transferases. Biochemistry, 39, 14993–15001. Charnock, S.J. and Davies, G.J. (1999) Structure of the nucleotide-diphospho-sugar transferase, SpsA from Bacillus subtilis, in native and nucleotide-complexed forms. Biochemistry, 38, 6380–6385. Chiu, C.P., Watts, A.G., Lairson, L.L., Gilbert, M., Lim, D., Wakarchuk, W.W., Withers, S.G., and Strynadka, N.C. (2004) Structural analysis of the sialyltransferase Cst II from Campylobacter jejuni in complex with a substrate analog. Nat. Struct. Mol. Biol., 11, 163–170. Coutinho, P.M., Deleury, E., Davies, G.J., and Henrissat, B. (2003) An evolving hierarchical family classification for glycosyltransferases. J. Mol. Biol., 328, 307–317. Davies, G. and Henrissat, B. (1995) Structures and mechanisms of glycosyl hydrolases. Structure, 3, 853–859. Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755–763. Edman, M., Berg, S., Storm, P., Wikstrom, M., Vikstrom, S., Ohman, A., and Wieslander, A. (2003) Structural features of glycosyltransferases synthesizing major bilayer and nonbilayer-prone membrane lipids in Acholeplasma laidlawii and Streptococcus pneumoniae. J. Biol. Chem., 278, 8420–8428. Flint, J., Taylor, E., Yang, M., Bolam, D.N., Tailford, L.E., MartinezFleites, C., Dodson, E.J., Davis, B.G., Gilbert, H.J., and Davies, G.J. (2005) Structural dissection and high-throughput screening of mannosylglycerate synthase. Nat. Struct. Mol. Biol., 12, 608–614. Franco, O.L. and Rigden, D.J. (2003) Fold recognition analysis of glycosyltransferase families: further members of structural superfamilies. Glycobiology, 13, 707–712. Fritz, T.A., Hurley, J.H., Trinh, L.B., Shiloach, J., and Tabak, L.A. (2004) The beginnings of mucin biosynthesis: the crystal structure of UDP-GalNAc:polypeptide alpha-N-acetylgalactosaminyltransferaseT1. Proc. Natl. Acad. Sci. U. S. A., 101, 15307–15312. Gaboriaud, C., Bissery, V., Benchetrit, T., and Mornon, J.P. (1987) Hydrophobic cluster analysis: an efficient way to compare and analyse amino acid sequences. FEBS Lett., 224, 149–155. Gastinel, L.N., Cambillau, C., and Bourne, Y. (1999) Crystal structures of the bovine beta4galactosyltransferase catalytic domain and its complex with uridine diphosphogalactose. EMBO J., 18, 3546–3557. Gastinel, L.N., Bignon, C., Misra, A.K., Hindsgaul, O., Shaper, J.H., and Joziasse, D.H. (2001) Bovine α1,3-galactosyltransferase catalytic domain structure and its relationship with ABO histo-blood group and glycosphingolipid glycosyltransferases. EMBO J., 20, 638–649. Gibbons, B.J., Roach, P.J., and Hurley, T.D. (2002) Crystal structure of the autocatalytic initiator of glycogen biosynthesis, glycogenin. J. Mol. Biol., 319, 463–477. Gibson, R.P., Turkenburg, J.P., Charnock, S.J., Lloyd, R., and Davies, G.J. (2002) Insights into trehalose synthesis provided by the structure of the retaining glucosyltransferase OtsA. Chem. Biol., 9, 1337–1346. Gibson, R.P., Tarling, C.A., Roberts, S., Withers, S.G., and Davies, G.J. (2004) The donor subsite of trehalose-6-phosphate synthase: binary complexes with UDP-glucose and UDP2-deoxy-2-fluoro-glucose at 2 A resolution. J. Biol. Chem., 279, 1950–1955. Godzik, A. (2003) Fold recognition methods. Methods Biochem. Anal., 44, 525–546. Gunasekaran, K. and Nussinov, R. (2004) Modulating functional loop movements: the role of highly conserved residues in the correlated loop motions. Chembiochem., 5, 224–230. Ha, S., Walker, D., Shi, Y., and Walker, S. (2000) The 1.9 A crystal structure of Escherichia coli MurG, a membrane-associated glycosyltransferase involved in peptidoglycan biosynthesis. Protein Sci., 9, 1045–1052.
36R
Hans, J., Brandt, W., and Vogt, T. (2004) Site-directed mutagenesis and protein 3D-homology modelling suggest a catalytic mechanism for UDP-glucose-dependent betanidin 5-Oglucosyltransferase from Dorotheanthus bellidiformis. Plant J., 39, 319–333. Heissigerova, H., Breton, C., Moravcova, J., and Imberty, A. (2003) Molecular modeling of glycosyltransferases involved in the biosynthesis of blood group A, blood group B, Forssman and iGb3 antigens and their interaction with substrates. Glycobiology, 13, 377–386. Hu, Y. and Walker, S. (2002) Remarkable structural similarities between diverse glycosyltransferases. Chem. Biol., 9, 1287–1296. Hu, Y., Chen, L., Ha, S., Gross, B., Falcone, B., Walker, D., Mokhtarzadeh, M., and Walker, S. (2003) Crystal structure of the MurG:UDP-GlcNAc complex reveals common structural principles of a superfamily of glycosyltransferases. Proc. Natl. Acad. Sci. U. S. A., 100, 845–849. Kakuda, S., Shiba, T., Ishiguro, M., Tagawa, H., Oka, S., Kajihara, Y., Kawasaki, T., Wakatsuki, S., and Kato, R. (2004) Structural basis for acceptor substrate recognition of a human glucuronyltransferase, GlcAT-P, an enzyme critical in the biosynthesis of the carbohydrate epitope HNK-1. J. Biol. Chem., 279, 22693–22703. Kelley, L.A., MacCallum, R.M., and Sternberg, M.J.E. (2000) Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol., 299, 499–520. Kikuchi, N., Kwon, Y.D., Gotoh, M., and Narimatsu, H. (2003) Comparison of glycosyltransferase families using the profile hidden Markov model. Biochem. Biophys. Res. Commun., 310, 574–579. Klein, H.W., Im, M.J., and Palm, D. (1986) Mechanism of the phosphorylase reaction. Utilization of D-gluco-hept-1-enitol in the absence of primer. Eur. J. Biochem., 157, 107–114. Lairson, L.L. and Withers, S.G. (2004) Mechanistic analogies amongst carbohydrate modifying enzymes. Chem. Commun. (Camb), 20, 2243– 2248. Lairson, L.L., Chiu, C.P., Ly, H.D., He, S., Wakarchuk, W.W., Strynadka, N.C., and Withers, S.G. (2004) Intermediate trapping on a mutant retaining alpha-galactosyltransferase identifies an unexpected aspartate residue. J. Biol. Chem., 279, 28339–22844. Lariviere, L. and Morera, S. (2002) A base-flipping mechanism for the T4 phage betaglucosyltransferase and identification of a transition-state analog. J. Mol. Biol., 324, 483–490. Lariviere, L., Gueguen-Chaignon, V., and Morera, S. (2003) Crystal structures of the T4 phage beta-glucosyltransferase and the D100A mutant in complex with UDP-glucose: glucose binding and identification of the catalytic base for a direct displacement mechanism. J. Mol. Biol., 330, 1077–1086. Lesk, A.M. (1995) NAD-binding domains of dehydrogenases. Curr. Opin. Struct. Biol., 5, 775–783. Liu, J. and Mushegian, A. (2003) Three monophyletic superfamilies account for the majority of the known glycosyltransferases. Protein Sci., 12, 1418–1431. Lobsanov, Y.D., Romero, P.A., Sleno, B., Yu, B., Yip, P., Herscovics, A., and Howell, P.L. (2004) Structure of Kre2p/Mnt1p: a yeast alpha1,2mannosyltransferase involved in mannoprotein biosynthesis. J. Biol. Chem., 279, 17921–17931. Moréra, S., Imberty, A., Aschke-Sonnenbron, U., Rüger, W., and Freemont, P.S. (1999) T4 phage beta-glucosyltransferase: substrate binding an proposed catalytic mechanism. J. Mol. Biol., 292, 717– 730. Mulichak, A.M., Losey, H.C., Walsh, C.T., and Garavito, R.M. (2001) Structure of the UDPglucosyltransferase GtfB that modifies the heptapeptide aglycone in the biosynthesis of vancomycin group antibiotics. Structure, 9, 547–557. Mulichak, A.M., Losey, H.C., Lu, W., Wawrzak, Z., Walsh, C.T., and Garavito, R.M. (2003) Structure of the TDP-epi-vancosaminyltransferase GtfA from the chloroeremomycin biosynthetic pathway. Proc. Natl. Acad. Sci. U. S. A., 100, 9238–9243. Mulichak, A.M., Lu, W., Losey, H.C., Walsh, C.T., and Garavito, R.M. (2004) Crystal structure of vancosaminyltransferase GtfD from the vancomycin biosynthetic pathway: interactions with acceptor and nucleotide ligands. Biochemistry, 43, 5170–5180. Murray, B.W., Takayama, S., Schultz, J., and Wong, C.H. (1996) Mechanism and specificity of human alpha-1,3-fucosyltransferase V. Biochemistry, 35, 11183–11195.
32
Structures and mechanisms of glycosyltransferases
Patenaude, S.I., Seto, N.O., Borisova, S.N., Szpacenko, A., Marcus, S.L., Palcic, M.M., and Evans, S.V. (2002) The structural basis for specificity in human ABO (H) blood group biosynthesis. Nat. Struct. Biol., 9, 685–690. Pedersen, L.C., Tsuchida, K., Kitagawa, H., Sugahara, K., Darden, T.A., and Negishi, M. (2000) Heparan/chondroitin sulfate biosynthesis. Structure and mechanism of human glucuronyltransferase I. J. Biol. Chem., 275, 34580–34585. Pedersen, L.C., Darden, T.A., and Negishi, M. (2002) Crystal structure of beta 1,3glucuronyltransferase I in complex with active donor substrate UDP-GlcUA. J. Biol. Chem., 277, 21869–21873. Pedersen, L.C., Dong, J., Taniguchi, F., Kitagawa, H., Krahn, J.M., Pedersen, L.G., Sugahara, K., and Negishi, M. (2003) Crystal structure of an alpha 1,4-Nacetylhexosaminyltransferase (EXTL2), a member of the exostosin gene family involved in heparan sulfate biosynthesis. J. Biol. Chem., 278, 14420–14428. Persson, K., Ly, H.D., Dieckelmann, M., Wakarchuk, W.W., Withers, S.G., and Strynadka, N.C.J. (2001) Crystal structure of the retaining galactosyltransferase LgtC from Neisseria meningitidis in complex with donor and acceptor sugar analogs. Nat. Struct. Biol., 8, 166–175. Petrova, P., Koca, J., and Imberty, A. (1999) Potential energy hypersurfaces of nucleotide-sugars: Ab initio calculations, force-field parametrization, and exploration of the flexibility. J. Am. Chem. Soc., 121, 5535–5547. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004) UCSF chimera – a visualization system for exploratory research and analysis. J. Comput. Chem., 25, 1605–1612. Qasba, P.K., Ramakrishnan, B., and Boeggeman, E. (2005) Substrateinduced conformational changes in glycosyltransferases. Trends Biochem. Sci., 30, 53–62. Raab, M., Kozmon, S., and Tvaroska, I. (2005) Potential transition-state analogs for glycosyltransferases. Design and DFT calculations of conformational behavior. Carbohydr. Res., 340, 1051–1057. Ramakrishnan, B. and Qasba, P.K. (2001) Crystal structure of lactose synthase reveals a large conformational change in its catalytic component the beta1,4-galactosyltransferase-I. J. Mol. Biol., 310, 205–218. Ramakrishnan, B., Balaji, P.V., and Qasba, P.K. (2002) Crystal structure of beta-1,4galactosyltransferase complex with UDP-Gal reveals an oligosaccharide binding site. J. Mol. Biol., 318, 491–502. Ramakrishnan, B., Boeggeman, E., Ramasamy, V., and Qasba, P.K. (2004) Structure and catalytic cycle of beta-1,4-galactosyltransferase. Curr. Opin. Struct. Biol., 14, 593–600. Ramasamy, V., Ramakrishnan, B., Boeggeman, E., and Qasba, P.K. (2003) The role of tryptophan 314 in the conformational changes of beta1,4-galactosyltransferase-I. J. Mol. Biol., 331, 1065–1076.
33
Rosen, M.L., Edman, M., Sjostrom, M., and Wieslander, A. (2004) Recognition of fold and sugar linkage for glycosyltransferases by multivariate sequence analysis. J. Biol. Chem., 279, 38683–38692. Snajdrová, L., Kulhánek, P., Imberty, A., and Koca, J. (2004) Molecular dynamics simulations of glycosyltransferase LgtC. Carbohydr. Res., 339, 995–1006. Taniguchi, N., Honke, K., and Fukuda, M. (2002) Handbook of Glycosyltransferase and Related Genes. Springer, Tokyo. Tarbouriech, N., Charnock, S.J., and Davies, G.J. (2001) Three-dimensional structures of the Mn and Mg dTDP complexes of the family GT-2 glycosyltransferase SpsA: a comparison with related NDP-sugar glycosyltransferases. J. Mol. Biol., 314, 655–661. Thorsøe, K.S., Bak, S., Olsen, C.E., Imberty, A., Breton, C. and Møller, B.L. (2005) Determination of catalytic key amino acids and UDP-sugar donor specificity of the cyanohydrin glycosyltransferase UGT85B1 from Sorghum bicolor: molecular modeling substantiated by site specific mutagenesis and biochemical analyses. Plant Physiol, forthcoming. Tvaroska, I. (2004) Molecular modeling insights into the catalytic mechanism of the retaining galactosyltransferase LgtC. Carbohydr. Res., 339, 1007–1014. Unligil, U.M., Zhou, S., Yuwaraj, S., Sarkar, M., Schachter, H., and Rini, J.M. (2000) X-ray crystal structure of rabbit N-acetylglucosaminyltransferase I: catalytic mechanism and a new protein superfamily. EMBO J., 19, 5269–5280. Vrielink, A., Rüger, W., Driessen, H.P.C., and Freemont, P.S. (1994) Crystal structure of the DNA modifying enzyme β-glucosyltransferase in the presence and absence of the substrate uridine diphosphoglucose. EMBO J., 13, 3413–3422. Waldherr-Teschner, M., Goetze, T., Heiden, W., Knoblauch, M., Vollhardt, H., and Brickmann, J. (1992) MOLCAD – computer aided visualization and manipulation of models in molecular science. In Post, F.H. and Hin, A.J.S. (eds), Advances in Scientific Visualization. Springer, Heidelberg, Germany, pp. 58–67. Wimmerova, M., Engelsen, S.B., Bettler, E., Breton, C., and Imberty, A. (2003) Combining fold recognition and exploratory data analysis for searching for glycosyltransferases in the genome of Mycobacterium tuberculosis. Biochimie, 85, 691–700. Wrabl, J. and Grishin, N. (2001) Homology between O-linked GlcNAc transferases and proteins of the glycogen phosphorylase superfamily. J. Mol. Biol., 314, 365–374. Yamamoto, F., Clausen, H., White, T., Marken, J., and Hakomori, S. (1990) Molecular genetic basis of the histo-blood group ABO system. Nature, 345, 229–233.
37R
34
2. METHODES : CHIMIE COMPUTATIONNELLE ET MODELISATION MOLECULAIRE Contrairement aux domaines classiques de la chimie qui ont une longue tradition, la chimie computationnelle et la modélisation moléculaire sont relativement nouvelles. Malgré leur jeunesse, ces méthodes sont de plus en plus utilisées et ont un avenir prometteur. On classe les méthodes actuelles de calcul en plusieurs catégories selon le niveau de théorie qu’elles utilisent ou bien selon le type de calcul qu’elles réalisent. On distingue la mécanique quantique (MQ) et la mécanique moléculaire (MM) qui sont les deux approches limites par rapport au niveau de théorie utilisée. Une combinaison de ces deux approches existe également, la technique hybride de mécanique quantique / mécanique moléculaire (MQ/MM). Les autres méthodes sont la dynamique moléculaire (DM) et la modélisation de Monte Carlo, les deux utilisant une approche MM pour échantillonner l’espace conformationnel des molécules et analyser leur comportement structural dans le temps. On distingue les calculs single-point et les optimisations de géométrie des molécules, les calculs de diverses propriétés chimiques ou les calculs de recherche conformationnelle. On peut utiliser chacune de ces approches aux différents niveaux de la théorie utilisée. La seule limitation est la taille du système étudié et l’équipement de calcul disponible. Toutefois, il faut toujours se rendre compte que la chimie computationnelle ne produit pas de solutions. Elle ne produit que les résultats numériques; l’évaluation de la qualité de ces résultats et l’appréciation de leur fiabilité est du ressort du scientifique qui emploie cet outil.
2.1. Mécanique quantique Dans la MQ, la molécule est considérée comme un ensemble de noyaux atomiques, chacun d’eux étant entouré de ses électrons, qui sont entièrement décrits par l’équation Schrödinger (Schrödinger, 1926). L’équation 1 décrit la forme de l’équation de Schrödinger en fonction du temps pour le système à une seule particule ∂ψ (r , t ) h 2 ∇ + ν Ψ (r, t ) = ih − ∂t 2m
(1)
où E est l’énergie totale de la particule de la masse m en mouvement à travers l’espace sous l’influence du champ externe ν. h est la constante de Planck réduite. Le vecteur de position r est défini dans les coordonnées cartésiennes par r = xi + yj + zk . On peut utiliser la fonction d’onde ψ décrivant l’état du système pour évaluer d’autres propriétés. ∇ 2 est l’opérateur laplacien, qui est décrit, (dans l’espace cartésien) par l’équation 2.
35
∇ i2 =
∂2 ∂2 ∂2 + + ∂x i2 ∂y i2 ∂z i2
(2)
La côté gauche de l’équation Schrödinger (eq. 1) est habituellement abrégée en Hψ, où H est l’opérateur hamiltonien de l’énergie, prenant généralement en considération cinq contributions de l’énergie totale: (1) l’énergie cinétique des électrons, (2) l’énergie cinétique des noyaux, (3) l’attraction des électrons aux noyaux, (4) l’attraction entre les électrons, et (5) la répulsion entre les noyaux. A condition que l’opérateur hamiltonien H ne soit pas une fonction de temps, il est possible de séparer les coordonnées et le temps, et on peut réduire l’équation de Schrödinger en une forme plus simple: H (r )Ψ(r ) = EΨ (r )
(3)
Ainsi, on peut déterminer l’énergie du système spécifique dans les conditions spécifiques (coordonnées, fonction d’onde) par l’intégration de l’équation de Schrödinger: E=
∫ Ψ *HΨdr ∫ Ψ *Ψdr
(4)
ψ peut être une fonction complexe, donc la notation complexe conjuguée est utilisée dans l’expression 4. Le carré de la fonction d’onde (ψ2) au point r est la probabilité de présence d’une particule en ce point. La probabilité d’existence de la particule dans l’espace entier doit être égale à 1, donc le dénominateur dans l’équation 4 doit être 1 également, et la fonction d’onde est normalisée. On peut résoudre l’équation de Schrödinger analytiquement seulement pour des problèmes très simples, tels que la particule unique dans une boîte, l’oscillateur harmonique ou l’atome d’hydrogène isolé. Afin d’effectuer ces résolutions numériques complexes et d’obtenir des résultats dans un temps raisonnable, il est nécessaire d’introduire plusieurs approximations. L’approximation de Born-Oppenheimer (Born and Oppenheimer, 1927) est la première et la plus fondamentale et est utilisée dans toutes les méthodes de MQ (Hehre et al., 1986). Selon cette approximation, le mouvement des électrons est séparé de celui des noyaux en prenant en compte le fait que les électrons sont beaucoup plus légers, et donc peuvent réagir à chaque changement de positions des noyaux presque immédiatement. On considère les noyaux comme fixes, donc leur fonction d’onde électronique dépend seulement de leurs positions (et non de leurs mouvements). D’autres approximations généralement utilisées concernent la forme de la fonction d’onde, comme l’approximation MO-LCAO (Mulliken, 1955) qui représente l’orbitale moléculaire comme une combinaison linéaire d’orbitales atomiques. En fonction des autres approximations utilisées, les méthodes de MQ sont divisées en quatre groupes principaux: 36
2.1.1. Les méthodes empiriques Les méthodes empiriques, comme, par exemple, le Hückel Molecular Orbital (HMO) or Extended Hückel Theory (EHT) réduisent considérablement les temps de calcul en le réduisant aux parties "nécessaires" ou "intéressantes" (Leach, 2001). Ces méthodes utilisent des approximations très grossières qui produisent des grandes erreurs de calcul, et donc ne sont employées que très rarement.
2.1.2. Les méthodes semi-empiriques Toutes les méthodes semi-empiriques, telles que CNDO, INDO, MINDO, NNDO, MNDO ou le bien-connu et encore utilisé Austin Model 1 (AM1) (Dewar, 1993) et Parametrization Method 3 (PM3) (Stewart, 1989a; Stewart, 1989b) négligent généralement le calcul de certaines intégrales et les remplacent par des paramètres expérimentaux. Ces paramètres peuvent être obtenus, par exemple, à partir des caractéristiques spectrales des atomes, de calcul ab initio de haut niveau ou d’autres méthodes expérimentales (Young, 2001). Une approximation supplémentaire des méthodes semi-empiriques est de considérer uniquement les électrons de valence dans le calcul, le reste des électrons étant inclus dans le "coeur" (avec le noyau), qui n’est pas pris en compte dans le calcul. De cette manière, tout en appartenant toujours aux méthodes QM, ces approches réduisent considérablement la puissance de calcul requis par rapport aux méthodes ab initio classiques et elles peuvent être utilisées pour des réactions chimiques.
2.1.3. Les méthodes ab initio Les méthodes ab initio résolvent l’équation de Schrödinger en utilisant un nombre mininal d’approximations, telles que celles de Born-Oppenheimer et de MO-LCAO. Elles ne comportent aucuns paramètres expérimentaux ou empiriques. Le problème principal des méthodes ab initio est la résolution des interactions électrostatiques entre les électrons. La méthode Hartree-Fock (HF) (Fock, 1930) était la première théorie introduite pour donner une réponse à ce problème. L’ensemble des électrons est défini de manière à ce que chaque électron se déplace dans le champ électrostatique moyen des autres électrons. Le système entier est donc décrit par une série d’équations HF, qui sont résolues par un processus itératif en utilisant la méthode du self-consistent field (SCF). Dans la SCF procédure, les fonctions ψ de toutes les OM (orbitales moléculaires) sont initialement estimées et sont utilisées pour construire les opérateurs hamiltoniens de chaque électron. Ces opérateurs hamiltoniens sont nécessaires pour la génération d’une nouvelle série de ψ, qui sera plus précise. Ce cycle est répété jusqu’à ce que la convergence soit atteinte. La qualité des résultats de méthodes HF dépend de la qualité de l’expansion de la fonction d’onde dans l’ensemble des bases (voir le
37
chapitre Fonctions des bases pour plus d’information). L’énergie calculée par la HF convergera vers la meilleure énergie accessible (HF limite) avec l’ensemble des bases complet. Malheureusement, la HF limite n’est pas encore l’énergie réelle (ie., expérimentale) à cause de quelques approximations supplémentaires dans la théorie HF. Cette théorie, par exemple, ne prend pas en compte les possibles effets relativistes, par exemple que la masse des électrons n’est pas forcément constante et qu’elle peut changer en fonction de leur vitesse. Une autre approximation, selon laquelle l’électron est en mouvement dans le potentiel moyen des autres électrons, et donc que sa position n’est pas affectée par la position des électrons voisins, ne permet pas représenter les effets de corrélation électronique. Cette énergie de corrélation est définie comme la différence entre l’énergie exacte et l’énergie limite de HF. De nombreuses méthodes de corrélation électronique, appellées aussi méthodes postHatree-Fock, ont été développées pour inclure l’énergie de corrélation dans le calcul (Cramer, 2002). Les méthodes les plus populaires et les plus fréquemment utilisées sont, par exemple, (1) la méthode de perturbation Møller-Plesset (Møller and Plesset, 1934) dans les versions MP2, MP4 et MP6, (2) les méthodes de configuration-interaction (CI) (Hehre et al., 1986) utilisée dans les niveaux CIS, CID, CISD ou QCISD, et (3) les méthodes de coupled clusters (CC) (Cizek, 1966) utilisées le plus fréquemment dans les variantes CCSD ou CCSD(T).
2.1.4. Les méthodes de la théorie de la fonctionnelle de la densité (DFT) Les méthodes de la théorie de la fonctionnelle de la densité (DFT) (Koch and Holthausen, 2001) dans leur utilisation comme alternatives aux méthodes ab initio sont fondées sur un autre niveau de théorie – les théorèmes de Hohenberg-Kohn (Hohenberg and Kohn, 1964). Dans ces méthodes, la fonction d’onde ψ de l’état fondamental non-dégénéré d’un système multi-électrons est substituée par la fonctionnelle apparente de la densité électronique ρ. Le gros avantage de la DFT par rapport aux méthodes ab initio est qu’on n’a pas besoin de calculer la fonction d’onde entière du système à N particules, mais que seule la densité électronique totale est calculée pour obtenir toutes les propriétés de l’état fondamental(Springborg, 1997). La charge de calcul est beaucoup plus faible, car la densité électronique ρ, fonction de trois variables, est un objet plus simple que la fonction d’onde ψ à N variables. Une autre simplification du calcul découle du fait que la DFT n’exige qu’une minimisation du potentiel d’énergie E[ρ(r)] au lieu de la solution exacte de l’équation Schrödinger comme c’est le cas dans les méthodes ab initio. La première méthode DFT utilisable était la théorie de la fonctionnelle de la densité de Kohn-Sham (KS-DFT) (Kohn and Sham, 1965) qui incluait la corrélation électronique et les effets d’échange. Généralement, les méthodes DFT nécessitent la construction de la fonctionnelle d’énergie. Les fonctionnelles utilisées le plus fréquemment sont la fonctionnelle d’énergie échange Becke-88 (B88) (Becke, 1988), les fonctionnelles de corrélation d’énergie 38
de Vosko-Wilk-Nussair (Vosko et al., 1980) et de Lee-Yang-Parr (Lee et al., 1988), la combinaison de B88 avec LYP appellée BLYP, ou bien la fonctionnelle hybride B3LYP qui est à présent très fréquemment utilisée (Jensen, 1999).
2.1.5. Fonctions de base Les méthodes HF et DFT comportent un calcul d’intégrales de fonctions d’orbitales atomiques. Le nombre et la configuration de ces fonctions d’orbitales atomiques est l’ensemble des fonctions de base. Les orbitales atomiques ont généralement la forme d’orbitales de type Slater (STO), mais en raison de la complexité de leur calcul, elles sont souvent réduites à des orbitales de type Gaussian (GTO). Le plus petit ensemble de fonctions de base est appelé STO-3G. Il utilise seulement une STO avec des orbitales s pour les atomes d’hydrogène et des orbitales s et p pour les autres atomes. La STO est alors développées en trois GTOs (Young, 2001). On peut améliorer la qualité des résultats en utilisant deux ou trois STOs pour les orbitales atomiques au lieu d’une (ensemble de fonctions de base double ou triple). Une alternative à cette approche est l’utilisation d’ensembles de base split-valence 3-21G, 4-31G, 6-31G, ou 6311G. Ces bases développent deux (ou trois) STOs pour les électrons de valence, et une STO pour les électrons des couches internes, en utilisant le nombre approprié de GTOs. Par exemple dans l’ensemble 6-31G, l’orbitale des couches internes est développée dans une série de six GTOs, tandis que le orbitales de valence sont développées dans une série de trois GTOs pour la première orbitale et une GTO pour la deuxième orbitale. Une autre amélioration consiste en l’addition de fonctions de polarisation (indiquées par *): on peut rajouter les orbitales d pour les atomes non-hydrogène (6-31G*) ainsi que les orbitales p pour les atomes hydrogènes (6-31G**). Pour les anions ou les composés riches en électrones, il es recommandé d’ajouter les fonctions de diffusion (indiquées par +) à l’ensemble des fonctions de base. Ainsi, l’ensemble le plus fréquemment utilisé est le 6-31+G, qui rajoute les fonctions de diffusion s et p aux atomes non-hydrogènes et le 6-31++G, qui rajoute les fonctions p aux atomes hydrogène (Sapse, 1998). Il est également possible de combiner les fonctions de polarisation et les fonctions de diffusion.
2.2. Mécanique moléculaire Les méthodes de mécanique moléculaire (MM) utilisent les lois de la physique classique pour la prédiction des structures et de leurs propriétés, comme l’énergie. Au même titre que les méthodes de MQ, les méthodes de MM utilisent une alternative à l’approximation de BornOppenheimer, mais les électrons ne sont pas explicitement inclus dans le calcul. Cette simplification assume que les électrons dans la molécule sont répartis dans leur distribution optimale, et seules les positions des noyaux sont prises en considération pour la résolution du
39
problème chimique particulier. La molécule est alors mécaniquement simplifiée à un ensemble de masses auxquelles des charges ont été attribuées. Ces masses interagissent entre eux par des forces classiques, décrites par des fonctions d’énergie potentielle. Les effets électroniques sont omis ou simplifiés, et inclus dans le champ de forces pour certains atomes spécifiques. Un champ de forces utilise généralement un ensemble d’équations pour définir comment l’énergie potentielle du système varie en fonction de la position des atomes. Le champ définit aussi les propriétés "chimiques" pour chaque atome (en prenant en compte son environnement chimique), telles que la charge partielle, l’hybridation ou le rayon de Van der Waals. Enfin, le champ de force définit les paramètres spécifiques, telles que les longueurs, les angles et les torsions de liaison, et les barrières de rotation des angles définies par les constantes de force. Ces valeurs sont généralement importées à partir de données expérimentales ou de calculs ab initio de haut niveau. Le théorème fondamental de la MM est que ‘énergie potentielle totale (Etot) peut être considérée comme la somme de plusieurs contributions (5): E tot = Estretch + E bend + E tors + E vdW + E elstat + E cross
(5)
Estretch est la fonction d’énergie pour l’étirement d’une liaison entre deux atomes, généralement définie par une fonction d’oscillateur harmonique, Ebend est la fonction d’ouverture d’un angle (généralement harmonique), Etors représente la fonction pour les variations de torsion, i.e. les rotations autour des liaisons. EvdW, Eelstat et Ecross représentent les fonctions pour les interactions de Van der Waals, électrostatiques et les termes croisés. Les interactions de Van der Waals sont décrites, par exemple, par le potentiel de Lennard-Jones (Lennard-Jones, 1931), les interaction électrostatiques par une simple loi de Coulomb, et les interactions croisées représentent les interactions spécifiques, comme l’étirement-flexion ou torsion-tension. De nombreux champs de force sont couramment utilisés tels que AMBER, CHARMM, GROMOS, MM1 à MM4, OPLS ou TRIPOS. Une revue générale des de champs de forces disponibles est donnée dans le livre de Young (Young, 2001). Du point de vue du temps de calcul, l’ensemble des approximations utilisées rendent les calculs de MM beaucoup moins exigeants que les calculs de MQ. On peut donc appliquer ces méthodes à un plus grand nombre de systèmes, y compris les biomolécules (protéines, acides nucléiques). A l’heure actuelle, les calculs sont souvent effectués en prenant en compte le solvant, ce qui représente mieux les conditions naturelles du système. Malheureusement, les méthodes de MM comportent aussi quelques limitations: (1) le nombre de systèmes paramétrés, pour lesquelles la méthode MM donne les résultats validés, (2) l’omission des électrons pendant le calcul, ce qui signifie que la MM en principe n’est pas capable à résoudre
40
les problèmes où dominent les effets électroniques (réactions chimiques, transferts d’électrons ou de protons etc.).
2.3. Dynamique moléculaire La dynamique moléculaire (DM) est la méthode la plus fréquemment utilisée pour la simulation de systèmes réels. La DM est capable de simuler un comportement structural en fonction du temps et dans les conditions spécifiées (pression, volume, température). Le principe de base des méthodes de DM est l’échantillonnage de l’espace conformationnel du système étudié par l’intégration des équations du mouvement de Newton pour tous les atomes présents dans le système (voir plus bas). Les simulations de DM sont très importantes pour la recherche du comportement structural des biomolécules en fonction du temps. En utilisant la DM, on peut étudier la flexibilité ou la rigidité des biomolécules, mesurer les interactions intermoléculaires entre la protéine et les ligands ou d’autres biomolécules, calculer l’énergie libre ou bien étudier l’effet du solvant sur la structure des biomolécules. Malheureusement, la DM classique utilisant des champs de force empiriques est limitée de la même manière que la MM et donc n’est pas applicable, par exemple, à l’étude des réactions enzymatiques.
2.3.1. Principe de base Lorsque l’on ignore les effets quantiques, chaque atome d’une molécule est considéré comme une masse ponctuelle dont le mouvement est déterminé par l’ensemble des forces exercées par tous les autres atomes en fonction du temps. Les intervalles de temps de chaque cycle de dynamique sont suffisamment courts, environ 1 fs (10-15 s), pour que l’on puisse considérer les forces interatomiques comme constantes. Il devient alors possible d’intégrer numériquement les équations de mouvement et de déterminer ainsi les positions des atomes de la molécule étudiée dans le temps. Cette évolution temporelle – la soi-disant trajectoire – d’une molécule est décrite par la résolution des équations de mouvement de Newton. La molécule est donc perçue comme une entité dynamique dont les atomes modifient leurs positions spatiales au cours du temps. Au temps t, chaque atome i, de vecteur position Xi et de masse mi subira une accélération ai telle que selon la loi de Newton: r r d 2 ri r dV Fi = mi a i = − r = mi 2 dri dt
(6)
où m est la masse d’atome, a est son accélération, t est le temps, r sont les coordonnées cartésiennes de l’atome, et V est l’énergie potentielle. On peut alors intégrer les équations de mouvement et obtenir une trajectoire pour chaque
41
atome en fonction du temps. Pour cela on utilise l’algorithme de précision de deuxième ordre – l’algorithme de Verlet (en troncature de la série de Taylor). Sa validité est aisément testée à l’aide de la troisième loi fondamentale de la thermodynamique qui implique que l’énergie totale d’un système doit être conservée au cours d’une simulation. En général le potentiel d’interaction Vtotal utilisé est un potentiel empirique « complet ». Ce qui veut dire qu’il contient les contributions des énergies de liaisons, d’angles de valence, d’angles dièdres et des interactions de van der Waals et électrostatiques.
2.3.2. Conditions périodiques Le solvant est introduit dans le modèle sous forme d’une boîte parallélépipédique au sein de laquelle, on place le soluté étudié. En pratique, on utilise un artifice de calcul appelé conditions périodiques qui permet de simuler des systèmes infinis. Celui-ci implique que la boîte de solvant est entourée de 26 copies identiques à elle-même (Figure 18). Quand une molécule de la boîte centrale sort par une face, elle est immédiatement remplacée par une copie qui apparaît par la face opposée. Pour reproduire des propriétés d’un grand système, une molécule dans la boîte centrale ne peut interagir qu’avec une copie de chaque particule (« minimum image convention »). En conséquence, la troncature de potentiel (cut-off) doit être inférieure à la demi-longueur des trois dimensions de boîte.
Figure 18: Conditions périodiques dans l'espace bidimensionnel.
Ainsi les conditions périodiques rendent possibles des simulations à pression et température constantes tout en maintenant fixe le nombre de molécules de l’ensemble moléculaire. Elles permettent donc de simuler des processus chimiques dans l’ensemble thermodynamique isotherme-isobare (N, T, P) qui correspond aux conditions généralement rencontrées au laboratoire. La méthode de Berendsen (Berendsen et al., 1984) que nous avons utilisé sert à réguler la température. Le système est lié avec un bain externe et les vélocités de chaque pas 42
sont multipliées par un facteur qui dirige la température vers la valeur désirée. De la même manière, la pression peut être contrôlée en ajustant le volume du système.
2.3.3. Algorithme SHAKE Afin de pouvoir tenir compte de toutes les interactions entre atomes, le pas d’intégration ∆t (algorithme de Verlet) doit être au moins 10 à 20 fois inférieur aux fréquences de vibration les plus élevées. Ce qui est généralement de l’ordre de 1 femtoseconde (1 fs = 10-15 s). En effet, un des problèmes de fond de la technique de dynamique moléculaire est l’inadéquation entre l’échelle de temps des phénomènes physiques observés et celle des simulations accessibles par la technologie actuelle. Une simulation courante dure plusieurs nanosecondes (Figure 19), tandis que les échelles de temps des processus dynamiques observés varient de la femtoseconde (e.g. l’élongation des liaisons covalentes) à la seconde (e.g. la vibration des cycles aromatiques des résidus d’acides aminés). La procédure SHAKE (van Gunsteren and Berendsen, 1990), qui néglige les élongations des liaisons, permet d’allonger le pas d’intégration jusqu’à 2, voire 4 femtosecondes. En effet, cet algorithme permet de fixer la longueur de certaines liaisons durant une dynamique moléculaire. Le plus souvent, SHAKE n’est appliqué que sur des liaisons impliquant un atome d’hydrogène dont les vibrations sont les plus rapides. Le choix du pas d’intégration dépend également de la température du système. Dans les simulations à haute température, il faut éviter des pas trop grands, qui entraînent un éclatement du système.
Figure 19: Un pas d'intégration ∆t est d'ordre de femtoseconds (10-15 s), alors que la trajectoire dure généralement quelques nanoseconds (10-9 s).
2.3.4. Sommation d’Ewald (Particle Mesh Ewald, PME) Le traitement des interactions électrostatiques à longues distances présente toujours un problème, notamment dans les simulations de dynamique moléculaire de molécules chargées. Une des solutions possibles, l’utilisation d’une grande distance de troncature, augmente le temps de calcul en fonction du cube de la distance. Comme cela a été démontré dans plusieurs simulations de molécules d’ADN (Cheatham III et al., 1995; York et al., 1995), l’introduction de la sommation d’Ewald (Essmann et al., 1995) semble plus efficace et plus précise. Cette méthode est basée sur un algorithme rapide qui calcule l’énergie électrostatique complète d’une boîte périodique, dans un système macroscopique infini d’images répétées. Elle évite 43
ainsi l’évaluation standard des interactions non-liées et utilise sa propre fonction composée d’une sommation dans l’espace direct et d’une autre dans l’espace réciproque pour calculer des interactions de type Lennard-Jones et les interactions électrostatiques.
2.3.5. Analyse d’une simulation de dynamique moléculaire Une fois la simulation réalisée, l’ensemble des données enregistrées permet d’analyser le comportement dynamique de la structure étudiée sous plusieurs formes. L’analyse d’une trajectoire dynamique la plus courante et la plus simple consiste à tracer une composante géométrique en fonction du temps. L’histoire des angles de torsion et des distances apporte de précieux renseignements dynamiques et aide à la compréhension de certaines propriétés physiques.
2.3.5.1.
Fonction de distribution radiale (radial distribution function, RDF)
La fonction de distribution radiale (Figure 20) est utilisée pour rendre compte du comportement d’une molécule en solution. Cette fonction fournit la distribution statistique des molécules de solvant autour d’un soluté pendant la durée de la simulation. On peut considérer la distribution d’atomes de solvant w (pour water) autour d’un atome de soluté, A, NwA(r) étant le nombre moyen d’atomes w dans une sphère de rayon r et de centre A. La fonction de distribution, gwA(r) est alors définie par: gWA ( r ) =
dN ( r ) 1 ⋅ WA 2 4πρ W r dr
(7)
où ρw est la densité des atomes dans le solvant w et 4πρ W r 2 normalise gwA(r) à une grande distance de soluté.
Figure 20: L'exemple d'une fonction radiale de distribution. Cette image représente une RDF pour les molécules d'eau autour d'un atome de carbon de substrate donneur de glycosyltransférase LgtC.
44
2.3.5.2.
Dynamique essentielle
La méthode de EDA (Essential Dynamic Analysis, traduit par Dynamique Essentielle) (Amadei et al., 1993)est utilisée pour séparer les mouvements corrélés et de grand amplitude du système simulé des mouvements locaux moins intéressants (les vibrations pour la plupart). L'analyse par la dynamique essentielle consiste en trois étapes. Premièrement, une matrice de covariance est créée à partir des fluctuations atomiques dans la trajectoire. Deuxièmement, la matrice est diagonalisée et un ensemble de valeurs propres et de vecteurs propres est obtenu. Les vecteurs propres correspondent aux directions dans l'espace, et les mouvements le long des vecteurs propres individuels correspondent aux fluctuations concertées des atomes. Les valeurs propres, par contre, représentent le moindre carré de la fluctuation totale du système le long des vecteurs propres correspondants. Et troisièmement, une nouvelle trajectoire contenant uniquement les mouvements atomiques corrélés est calculée à partir de la trajectoire originale, ce qui est fait par l'application des vecteurs propres les plus importants. Ceux-ci, comme l'ont montré Berendsen et al. (Van Aalten et al., 1997) correspondent aux valeurs propres les plus grandes, généralement avec des valeurs entre 5 et 10.
2.3.5.3.
Méthodologie MM-PBSA/MM-GBSA
Les méthodes MM/PBSA et MM/GBSA (Molecular Mechanics / Poisson Boltzmann Surface Area ou Generalized Born Surface Area) (Srinivasan et al., 1998) permettent de calculer les énergies libres absolues des molécules en solution. Ces énergies sont ensuite utilisées pour calculer les énergies libres d’interaction. Les deux méthodes sont utilisées pour l'analyse des trajectoires obtenues par dynamique moléculaire. Elles emploient la même stratégie de calcul, où l'énergie libre absolue totale du système est la somme de trois contributions, qui ont été calculées séparément (eq. 8) La première contribution (MM) contient l'énergie libre absolue du soluté dans le vide. Le deuxième (PB ou GB) et le troisième (SA) terme comprennent l'énergie libre de la solvatation du soluté, donc le changement d'énergie lié au passage du soluté du vide au solvant. ∆G solv = ∆E mm − T∆S + ∆G polar + ∆Gnonpolar 142 4 43 4 123 1424 3 MM
PB ou GB
(8)
SA
L'énergie libre du soluté dans le vide ( ∆H = ∆E mm − T∆S , donc la partie MM) est déterminé avec le programme SANDER d'AMBER. Le terme ∆Emm est la somme des contributions d'énergie électrostatique (décrit par le potentiel coulombien) et de van der Waals (décrit par le potentiel de Lennard-Jones), et peut contenir les contributions d'énergie interne (énergies d'élongation, de flexion ou de torsion). Les effets entropiques peuvent également être pris en compte (T∆S), où T es la température; le changement d'entropie (∆S) est typiquement estimé en s'appuyant sur les formules statistiques classiques et l'analyse d'un ensemble des structures 45
énergétiquement minimisées. Les deux méthodes diffèrent considérablement dans le calcul de l'énergie de solvatation électrostatique (∆Gpolar, donc la partie PB ou GB). La méthode MM/PBSA emploie la résolution de l'équation Poisson-Boltzmann, qui fait partie du programme DelPhi (Honig and Nicholls, 1995; Rocchia et al., 2001). Par contre, la méthode MM/GBSA emploie l'approximation de Born généralisée(Massova and Kollman, 2000). Les deux approches prennent en considération le solvant implicite. Le dernier terme est la contribution non-polaire (∆Gnonpolar, donc la partie SA). Sa valeur se rapporte à l'étendue de la surface du soluté accessible au solvant (Solvent Accessible Surface Area, SASA). L'énergie totale d'interaction (∆Gbind) est alors déterminée par l'équation 9: ∆G bind = ∆G complex − (∆G protein + ∆G ligand )
(9)
où les termes respectifs sont les énergies libres totales de solvatation (∆Gsolv) pour chaque espèce (la protéine, le ligand, le complexe).
2.4. Les méthodes hybrides Les deux méthodologies MQ et MM ont leurs points forts et leurs points faibles. Les méthodes hybrides essaient de tirer avantage des deux méthodes: le pouvoir de la MQ à modéliser des réactions chimiques ou d’autres processus de transfert d’électrons, et l’applicabilité de la MM aux grands systèmes moléculaires, comprenant même des molécules de solvant. Le système modélisé (par exemple une enzyme) est alors divisé en deux parties: la partie MQ, où la réaction chimique se déroule, et la partie MM, contenant le reste de la protéine et le solvant environnant. Ces méthodes sont appelées MQ/MM (Gao, 1998). La propriété fondamentale, calculée par toutes les approches computationnelles, est l’énergie potentielle. Dans le cas des méthodes MQ/MM, cette énergie est calculée selon le principe de base donné dans l’équation 7. E tot = E MQ + E MM + E inter(MQ/M M)
(10)
L‘énergie totale est calculée comme la somme des énergies calculées séparément par la MM (EMM) et la MQ (EMQ) et l‘énergie de l’interface entre ces deux régions (Einter(MQ/MM)). La région MQ peut être évaluée par n’importe quelle méthode semi-empirique (dans la majorité de cas par AM1), ab initio ou DFT. La partie MM est calculée par n’importe quel champ de force. L’interface MQ/MM est calculée soit par l‘approche des atomes liés ou par l’approche des orbitales hybrides sp2 (Reuter et al., 2000). 46
A l’heure actuelle, plusieurs algorithmes MQ/MM sont disponibles. Ils sont implémentés dans des logiciels commerciaux, tel que ONIOM (Svensson et al., 1996) dans le logiciel Gaussian03 (Frisch et al., 2004), ROAR (Cheng et al., 2002) dans le logiciel AMBER8 (Case et al., 2004), ou bien dans des logiciels développés localement dans les universités, qui combinent différents algorithmes MQ et MM . Ces logiciels MQ/MM sont généralement développés pour les problèmes spécifiques de leurs utilisateurs. La combinaison du code MQ de GAMESS (Schmidt et al., 1993) avec le champ de force de MM du CHARMM (Brooks et al., 1983) est aussi très populaire.
2.5. Modélisation par homologie Les structures tridimensionnelles des protéines présentent une source appréciable d’information pour l’annotation structurale de ces biomolécules. On détermine ces structures par des méthodes expérimentales, comme la cristallographie aux rayons X ou la spectroscopie RMN. Les méthodes expérimentales ne sont cependant pas toujours applicables. Dans ce cas, la prédiction de structure par les méthodes computationnelles peut aboutir à un modèle utile. On peut modéliser les structures de protéines soit ab initio, en n’utilisant que la séquence, ou par les méthodes comparatives qui sont fondées sur une base de données de structures de protéines connues (Baker and Sali, 2001; Fiser et al., 2002). Les méthodes ab initio s’appuient surtout sur les lois de la physique, alors que les méthodes comparatives consistent principalement en connaissances statistiques. Bien que des progrès significatifs aient été faits dans les méthodes ab initio (Bradley et al., 2003), la modélisation par homologie donne toujours les résultats les plus fiables à condition qu’il existe une structure de protéine connue dont la séquence est assez similaire à la séquence à modéliser (Baker and Sali, 2001). Pour que la prédiction de la structure d’une protéine par modélisation par homologie soit possible, deux conditions doivent être remplies (Blundell et al., 1987; Marti-Renom et al., 2000). Premièrement, il faut que la séquence à modéliser – la séquence cible – présente une nette similarité avec une autre séquence, dont la structure est connue la séquence « gabarit ». Deuxièmement, il est indispensable qu’un alignement précis entre la séquence cible et la séquence gabarit puisse être proposé. La modélisation par homologie consiste en quatre étapes: l’identification du repliement protéique, l’alignement des séquences cibles et gabarit, la construction du modèle et l’évaluation de ce modèle.
2.5.1. L’identification du repliement protéique et le choix du gabarit La première étape dans la modélisation par homologie est l’identification du repliement probable de la séquence cible. On peut identifier le repliement intéressant en utilisant l’un des logiciels qui analysent les bases de données de séquence et de structures, comme, par exemple,
47
la Protein Data Bank (PDB) (Berman et al., 2000), la classification structurelle des protéines (SCOP) (Murzin et al., 1995), ou la base de données CATH (Class, Architecture, Topology, and Homology) (Pearl et al., 2005). On peut distinguer trois classes de méthodes pour la recherche de séquences gabarits utilisables pour la modélisation de la séquence cible. Premièrement, les méthodes de comparaison par paires, qui comprennent les logiciels populaires BLAST (Altschul et al., 1990) et FASTA (Pearson, 1990). Ces programmes alignent la séquence cible sur toutes les séquences de la base de données dont les structures sont connues. Les performance et l’efficacité de cette classe de méthodes ont fait l’objet de nombreuses études (Brenner et al., 1998). Deuxièmement, les méthodes de profil de séquence tels que PSI-BLAST (Altschul et al., 1997) s’appuient sur les profils dérivés d’alignements de séquences multiples, qui accroissent la sensibilité et l’exactitude de la recherche. Troisièmement, les méthodes de repliement protéiques qui utilisent une combinaison de raisonnements séquentiels et structurels pour la détection des similitudes entre les structures et les séquences (David et al., 2000). Dans ces méthodes, la séquence cible est repliée sur une librairie de profils tridimensionnels ou de structures individuelles, et chaque repliement est évalué selon une certaine fonction score. Les méthodes et serveurs utilisés habituellement dans cette catégorie comprennent Superfamily (Gough et al., 2001), GenThreader (Jones, 1999), 3D-PSSM (Kelley et al., 2000) et ProFit (Sippl and Flockner, 1996). Lorsque les similarités de séquence sont faibles, ces méthodes de repliement sont plus efficaces pour la détection d’homologie structurale que les méthodes qui s‘appuient uniquement sur l’information séquentielle (Rohl et al., 2004).
2.5.2. L‘alignement cible-gabarit Après l’identification du gabarit, l’étape suivante dans la modélisation par homologie est l’alignement précis de la séquence cible sur la séquence gabarit. Bien que la plupart des méthodes donnent un alignement cible-gabarit, il est souvent nécessaire d’employer une méthode d’alignement spécialisée pour un re-alignement des séquences, car les méthodes d’identification du gabarit sont souvent conçues pour repérer les relations distantes, parfois au détriment de l’exactitude locale de l’alignement. Il est recommandé d’utiliser des méthodes qui permettent d’aligner les séquences correspondant aux structures secondaires et/ou aux région hydrophobes, comme par exemple le programme HCA (Hydrophobic Cluster Analysis) (Callebaut et al., 1997). De plus les acides aminés jouant un rôle particulier (ponts disulfure, site actif…) doivent être pris en compte lors de l’alignement final.
48
2.5.3. La construction du modèle L’alignement cible-gabarit est utilisé pour la construction du modèle tridimensionnel de la protéine cible. Il existe plusieurs méthodes pour la construction du modèle. Les différents procédés aboutissent néanmoins à des modèles très similaires, s’ils sont employés de manière optimale (Marti-Renom et al., 2002). La modélisation par la applications de contraintes spatiales est peut-être la méthode la plus prometteuse de toutes les techniques de modélisation par homologie (d’autres méthodes comportent la modélisation par assemblage de corps solides, la modélisation par superposition de segments ou la reconstruction des coordonnées). La procédure de la modélisation par homologie débute par l’alignement de la séquence cible sur une structure tridimensionnelle connue, dont la séquence a été identifiée comme proche à la séquence cible. A partir de l’alignement effectué dans l’étape précédente, on distingue les régions structurellement conservées (SCRs), qui ne présentent pas de problèmes majeurs pour la modélisation, et les régions variables – généralement les boucles – qui sont beaucoup plus compliquées à modéliser. On discerne deux approches fondamentales qui abordent ce problème. D’une part, les méthodes ab initio comportent la génération d’un grand nombre de conformations choisies au hasard, et leur évaluation selon différents critères, le plus souvent énergétiques (Fiser et al., 2000; Jacobson et al., 2004). Et d’autre part, les méthodes de bases de données produisent un ensemble de conformations connues qui s’appuie soit sur les ressemblances dans les séquences entre les boucles dans la banque de données et la boucle modélisée, soit sur les critères géométriques – par exemple la distance entre le C- et Nterminus de la boucle (Fernandez-Fuentes et al., 2006; Li et al., 1999). Une fois les boucles candidates choisies de cette manière, les critères énergétiques sont employés pour évaluer la probabilité du modèle. Ils existent aussi des méthodes qui combinent les deux approches. Le résultat obtenu est un modèle tridimensionnel pour la séquence cible contenant tous les atomes non hydrogènes de la chaîne principale et des chaînes latérales. Dans la première étape de la construction du modèle, les contraintes sur les distances et les angles de torsion sur la séquence cible sont dérivées de son alignement sur la structure tridimensionnelle du gabarit. Dans la deuxième étape, les contraintes spatiales et les contributions d’un champ de force imposant une stéréochimie adéquate sont combinés en fonction objet. Et finalement, le modèle est obtenu par minimisation de cette fonction objet dans les coordonnées cartésiennes. Parmi les nombreux programmes exploitant cette méthodologie, les plus couramment utilisés sont le MODELLER (Marti-Renom et al., 2000) et le SWISS-MODEL (Schwede et al., 2003).
49
2.5.4. L’évaluation de justesse du modèle La validité et la précision de la structure prédite déterminent l’information qu’on pourra en retirer. L’évaluation de la validité des modèles de protéines tridimensionnelles en l’absence des structures connues est donc indispensable pour leur interprétation. On peut évaluer le modèle entier ou ses régions individuelles. Il existent de nombreux logiciels et serveurs d’évaluation, dont le plus connu est PROCHECK (Laskowski et al., 1993).
50
3. CALCULS QUANTIQUES SUR LA LECTINE PA-IIL 3.1. Les lectines solubles de Pseudomonas aeruginosa Les structures de plusieurs lectines bactériennes solubles de Pseudomonas et d’autres bactéries gram négatives ont été décrites récemment (Imberty et al., 2004). Ces lectines sont supposées jouer un rôle dans la reconnaissance de l’hôte et aussi dans la formation et la cohésion du biofilm (Tielker et al., 2005). La bactérie gram négative Pseudomonas aeruginosa produit deux lectines solubles qui sont associées avec les facteurs de virulence: la lectine PA-IL spécifique du galactose (Cioci et al., 2003; Gilboa-Garber, 1982), et la lectine PA-IIL, qui s’attache de préférence au fucose (Mitchell et al., 2002), mais peut reconnaître aussi le mannose ou le fructopyranose (Garber et al., 1987; Loris et al., 2003). Il a été démontré que la lectine PA-IIL lie spécifiquement les glucides du lait maternel (Lesman-Movshovich et al., 2003). Ceci correspond au fait que le lait maternel, qui est extraordinairement riche en oligosaccharides fucosylés, protège les nourrissons contre les infections microbiennes (Perret et al., 2005).
3.2. La lectine PA-IIL La structure cristallographique de la lectine PA-IIL de la bactérie Pseudomonas aeruginosa a récemment été résolue en complexe avec le fucose (Mitchell et al., 2002). Cette lectine présente l’affinité la plus forte pour le fucose, mais des complexes de PA-IIL avec le Dfructose et le D-mannose ont aussi été décrits (Loris et al., 2003) ainsi qu’avec le L-galactose et le D-arabinose (Sabin et al., 2006). Le fucose est liée au site actif d’une manière qui n’a jamais été observée dans un complexe protéine-glucide. Le monomère de fucose est rattaché à deux cations de calcium par l’intermédiaire de trois fonctions hydroxyles, qui sont ainsi impliquées dans la coordination de ces cations et renforcent la liaison entre la protéine et le ligand (Figure 21, Figure 22).
51
Figure 21: Structure cristallographique d’un tétramère de lectine PA-IIL. Chaque monomère consiste de 114 résidus acides aminés, de deux cations calcium (violet) et d'une molécule de fucose.
Figure 22: Site de liaison de PA-IIL en interaction avec calcium et fucose. Les liaisons de coordination et les liaisons hydrogène sont représentées par lignes pointillées.
L’affinité de l’interaction entre le fucose et la lectine est remarquablement élevée, ce qui pourrait être une cause de la colonisation chronique des poumons de patients atteints de mucoviscidose par la bactérie Pseudomonas aeruginosa. Une meilleure compréhension du principe de cette interaction inhabituelle pourra aider à concevoir un ligand avec une affinité encore plus forte, qui occuperait le site de liaison de la lectine PA-IIL et ainsi, empêcherait la bactérie de se fixer dans les poumons de malades.
52
ARTICLE II
Mitchell, E. P., Sabin, C., Snajdrova, L., Pokorna, M., Perret, S., Gautier, C., Hofr, C., Gilboa-Garber, N., Koca, J., Wimmerova, M. & Imberty, A. (2005). High affinity fucose binding of Pseudomonas aeruginosa lectin PA-IIL: 1.0 A resolution crystal structure of the complex combined with thermodynamics and computational chemistry approaches. Proteins 58, 735-746. Résumé : La PA-IIL est une lectine soluble de Pseudomonas aeruginosa qui présente une haute affinité pour le fucose. Elle est étroitement liée aux facteurs de virulence de cette bactérie. Les études structurales précédentes ont révélé un nouveau mode d’interaction entre la protéine et le glucide, avec la participation directe de deux ions calcium. Une combinaison de méthodes thermodynamiques, structurales et computationnelles a été utilisée pour étudier les bases de la haute affinité pour le ligand monosaccharidique. La structure cristallographique du tétramère de PA-IIL en complexe avec le fucose et le calcium a été affinée à une résolution de 1.0 Å, ce qui a permis, en combinaison avec la modélisation moléculaire, de proposer un réseau des liaisons hydrogène dans le site de reconnaissance. Les calculs ab initio des charges partielles ont indiqué une importante délocalisation de charge entre les ions de calcium, les chaînes latérales du site de liaison de la protéine, et le ligand saccharidique. Cette délocalisation semble être responsable de la forte enthalpie de liaison et donc de la très forte affinité, qui a été observée pour ce mode unique de reconnaissance de glucide.
53
54
PROTEINS: Structure, Function, and Bioinformatics 58:735–746 (2005)
High Affinity Fucose Binding of Pseudomonas aeruginosa lectin PA-IIL: 1.0 Å Resolution Crystal Structure of the Complex Combined with Thermodynamics and Computational Chemistry Approaches ˇ najdrova´,2,3 Martina Pokorna´,3 Ste´phanie Perret,2 Edward P. Mitchell,1 Charles Sabin,1,2 Lenka S Catherine Gautier,2 Ctirad Hofr,4 Nechama Gilboa-Garber,5 Jaroslav Kocˇa,3 Michaela Wimmerova´,3,6 and Anne Imberty2,* 1 E.S.R.F. Experiments Division, Grenoble, France 2 CERMAV - CNRS (affiliated with Universite´ Joseph Fourier), Grenoble, France 3 NCBR, Masaryk University, Brno, Czech Republic 4 Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czech Republic 5 Bar-Ilan University, Faculty of Life Sciences, Ramat Gan, Israel 6 Department of Biochemistry, Masaryk University, Brno, Czech Republic
ABSTRACT PA-IIL is a fucose-binding lectin from Pseudomonas aeruginosa that is closely related to the virulence factors of the bacterium. Previous structural studies have revealed a new carbohydratebinding mode with direct involvement of two calcium ions (Mitchell E, Houles C, Sudakevitz D, Wimmerova M, Gautier C, Pere´z S, Wu AM, Gilboa-Garber N, Imberty A. Structural basis for selective recognition of oligosaccharides from cystic fibrosis patients by the lectin PA-IIL of Pseudomonas aeruginosa. Nat Struct Biol 2002;9:918 –921). A combination of thermodynamic, structural, and computational methods has been used to study the basis of the high affinity for the monosaccharide ligand. A titration microcalorimetry study indicated that the high affinity is enthalpy driven. The crystal structure of the tetrameric PA-IIL in complex with fucose and calcium was refined to 1.0 Å resolution and, in combination with modeling, allowed a proposal to be made for the hydrogen-bond network in the binding site. Calculations of partial charges using ab initio computational chemistry methods indicated that extensive delocalization of charges between the calcium ions, the side chains of the protein-binding site and the carbohydrate ligand is responsible for the high enthalpy of binding and therefore for the unusually high affinity observed for this unique mode of carbohydrate recognition. Proteins 2005;58:735–746. © 2004 Wiley-Liss, Inc.
those under assisted ventilation. Their infections are frequently fatal for cystic fibrosis patients. The airway epithelia glycoconjugates and the abundant mucins of CF patients are known to be modified and serve as attachment points for P. aeruginosa.1⫺3 For this reason, the different carbohydrate-binding proteins of these bacteria (pilin, flagellin, and non-pili lectins) are of high interest for their role in host recognition and specific adhesion. Under certain culture conditions, P. aeruginosa expresses two intracellular lectins, PA-IL and PA-IIL, specific for galactose and fucose, respectively.4 The two lectins consist of four subunits of 12.7 kDa5 and 11.7 kDa,6 respectively. These lectins, which do not display sequence similarity, are mainly associated with the inner membrane of the bacteria,7 but can also be exposed at the surface.8 The present study focuses on PA-IIL, the fucose-binding lectin that recently attracted much attention.9 In hemagglutination inhibition tests, the strongest monosaccharide derivative inhibition was obtained with p-nitrophenyl-␣-Lfucose followed by L-fucose ⬎ L-galactose ⬎ D-arabinose ⬎ D-fructose, and D-mannose while D-galactose was not active.10 PA-IIL is characterized by high affinity for Lfucose10 with an association constant (Ka) of 1.6 ⫻ 106 M⫺1. Such micromolar affinity range is very unusual among lectin-monosaccharide interactions where the millimolar range is more commonly observed. PA-IIL has been the subject of two crystallographic studies8,11 that yielded
Key words: Pseudomonas aeruginosa; lectin; fucose; cristallography; thermodynamics; quantum chemistry
Abbreviations: CF, cystic fibrosis; PA-IL, Pseudomonas aeruginosa first (galactose-binding) lectin; PA-IIL, Pseudomonas aeruginosa second (fucose-binding) lectin The Supplementary Materials referred to in this article can be found at http://www.interscience.wiley.com/jpages/0887-3585/suppmat/ index.html. Included is a table of charges similar to Table V but for other calculation methods (HF and Mulliken).
INTRODUCTION Pseudomonas aeruginosa is a gram-negative bacterium that is found in various environments including soil, water, and vegetation. It is also an opportunistic pathogen, responsible for numerous nosocomial infections in immunocompromised patients. The bacteria colonize patients suffering from a number of chronic lung diseases, particularly ©
2004 WILEY-LISS, INC.
*Correspondence to: Anne Imberty, CERMAV-CNRS, 601 rue de la Chimie, BP53, 38042 Grenoble cedex 9, France. E-mail:
[email protected] Received 4 June 2004; Accepted 13 August 2004 Published online 30 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.20330
55
736
E.P. MITCHELL ET AL.
supernatant was further purified on D-Mannose Agarose (Sigma-Aldrich, USA). PA-IIL was allowed to bind to the immobilized mannose in the equilibrating buffer (20 mM Tris/HCl, 100 mM NaCl, 100 M CaCl2, pH 7.5) and then was eluted by the elution buffer (20 mM Tris/HCl, 100 mM NaCl, 100 M CaCl2, 0.1 M mannose, pH 7.5). The purified protein was intensively dialyzed against distilled water for 1 week (for the removal of D-mannose), concentrated by lyophilization and kept at ⫺20°C.
several structures: in addition to the native lectin and the calcium-free one, complexes with three monosaccharides L-fucose, D-mannose, and D-fructopyranose have been described. In the PA-IIL/fucose complex,11 the fucose residue locks onto a pair of calcium ions with three of its hydroxyl groups participating in the coordination of these cations. Such a protein/carbohydrate binding mode has never been observed before. The different orientation of mannose and fructose in complex with the lectin8 yielded isosterical arrangements of three sugar hydroxyl groups in direct contact with the calcium ions. More recently, a sequencerelated protein purified from the plant pathogenic bacterium Ralstonia solananacearum, RS-IIL, has been cocrystallized with mannose and the structure displays the same peculiar role for a pair of calcium ions in the binding site.12 The presence of the two calcium ions in the binding site appears to be the clue for the two particularities of PA-IIL: the weak specificity and the high affinity. The weak specificity has been fully characterized by crystallography and the requirement for the particular stereochemistry of two equatorial and one axial hydroxyl groups is now well established12 and leads to a number of sugars being able to bind. The aim of the present work was to investigate the high affinity of PA-IIL for fucose. The thermodynamic basis of the affinity has been studied by analyzing enthalpy and entropy contributions while the atomic basis is characterized fully by the 1.0 Å resolution crystal structure of the complex. Further rationalization is brought by the study of charge distribution carried out with ab initio calculations. This combination of structural and thermodynamic approaches could serve for further development of anti-adhesion therapeutics against P. aeruginosa infections.
Hemagglutination Activities and Their Inhibition Samples (0.05 ml) of the native and recombinant PA-IIL solutions, each at a concentration of 1 mg per ml, were serially twofold diluted with 0.05 ml saline. Then 0.05 ml of 5% suspension of papain-treated type O human erythrocytes4 was added to each tube. Following a short shaking and 30 min at room temperature the tubes were centrifuged for 30 s (1000⫻) and the hemagglutination was examined. For the inhibition test L-fucose and D-mannose solutions at 0.15 M concentration were serially twofold diluted in 0.05 ml saline for each lectin preparation analysis. 0.05 ml of each lectin preparation at the dilution giving a titer of 1:32 was added to each tube in the respective series. After 30 min at room temperature 0.05 ml of the erythrocyte suspension was added to each tube and the hemagglutinating activity was examined as described above. Microcalorimetry Experiments Titration calorimetry experiments were performed using two different systems, a Microcal VP-ITC microcalorimeter (Microcal, Northampton, MA) and CSC4200 (Calorimetry Sciences Corp.), respectively. All titrations were performed in 10 mM Tris.HCl buffer containing 0.1 M NaCl and 1–3 mM CaCl2 at different pH as specified in Table II. Aliquots of 4 or 5 l of each carbohydrate, dissolved in the same buffer, were added at 4 min intervals into the lectin solution present in the calorimeter cell. In the titrations, the protein concentration in the cell varied from 0.078 to 1 and 0.20 to 0.25 mM for Microcal VP-ITC (cell volume ⫽ 1411 l) and CSC4200 (cell volume ⫽ 1300 l), respectively. The sugar concentration in the 250 l syringe was 10 to 20 times higher than the protein concentration used in the experiment (typically 1 and 4 –5 mM for Microcal VP-ITC and CSC4200, respectively). The temperature of the cell was controlled to 25 ⫾ 0.1°C. Data from a control experiment performed by identical monosaccharide injections into the cell containing only a buffer were subtracted prior to data analysis. Integrated heat effects were analyzed by nonlinear regression using a single site-binding model (Microcal Origin 5.0 and Bindworks 3.0). Fitted data yielded the binding constant (Kb) and the enthalpy of binding (⌬Hb). Other thermodynamic parameters, i.e., changes in free energy, ⌬Gb, and entropy, ⌬Sb, were calculated from equation
MATERIALS AND METHODS Construction of Plasmid for Expression of PA-IIL The oligonucleotides used as primers were as follows: 5⬘-GGA GAT ACC ATA TGG CAA CAC AAG GAG-3⬘ (27-mer) and 5⬘-TTC CAA GCT TCT AGC CGA GCG G-3⬘ (22-mer). The former is designed for the introduction of NdeI and the latter for HindIII restriction sites, respectively. The PCR was performed by using Taq polymerase (Qiagen, GmbH, Germany) and template DNA from Pseudomonas aeruginosa ATCC 33347. After digestion with NdeI and HindIII, the amplified fragment was introduced into multiple cloning site of pET-25(b⫹) vector (Novagen, Madison, WI) resulting in plasmid pET25pa2l. Expression and Purification of PA-IIL E. coli BL21(DE3) cells containing the plasmid pET25pa2l were cultured in 1 L of Luria broth (LB) at 37°C. When the culture reached an optical density of 0.5– 0.6 at 600 nm, iso-propyl--D-thiogalactopyranoside (IPTG) was added to a final concentration of 0.5 mM. Cells were harvested after 3 h incubation at 30°C, washed and resuspended in 10 ml of the buffer (20 mM Tris/HCl, 100 mM NaCl, 100 M CaCl2, pH 7.5). The cells were disrupted by sonication (Soniprep 150, Schoeller instruments, GB). After centrifugation at 10,000 g for 1 h, the
⌬G b ⫽ ⌬H b ⫺ T⌬S b ⫽ ⫺ RTlnK b where T is the absolute temperature and R ⫽ 8.314 J.mol⫺1.K⫺1. At least two independent titrations were performed for each ligand tested.
56
HIGH AFFINITY LECTIN/FUCOSE COMPLEX
737
TABLE I. Data Collection and Refinement Statistics Data Collection and Phasing Wavelength (Å) Resolution (Å) Highest resolution shell (Å) Completeness (%) Space group a, b, c (Å)  (°) Measured reflections Unique reflections Average multiplicity Nb of passes (low and high) RMerge (%) Average I/(I) Wilson B-factor (Å2) ACORN phasing statistics Correlation coefficient R factor Refinement Resolution range (Å) Nb of reflections Protein atoms Residues modelled with multiple locations Water molecules Solvent ions Metal ions Ligands Rcrys Rfree Cruickshank’s DPI based on maximum likelihood (Å) Average Biso (Å) All atoms Main chain Side chain and water RMS Deviation from ideality Bond (Å) Angles (degrees) Percent of refl. for Rfree Residues with alternative conformations Double
0.934 1.0 1.00 ⫺ 1.04 97.7 (94.3) P21 52.735, 72.827, 54.747 94.21 706877 216831 3.3 (2.9) 2 0.063 (0.228) 5.0 (2.8) 6.6 0.38 (166355 medium E reflections) 0.44 (166355 medium E reflections) 30–1.0 212499 3308 27 682 (59 with occupancy 0.5) 2 sulfate groups 8 Ca2⫹ 4 fucose residues 0.1146 0.1351 0.023 13.1 8.8 16.7 0.015 1.89 3% (4332 reflections) A31,A41,A46,A60,A62, A68,A77,B29,B46, B53,B68,B77, C13,C29, C59,C77,C82,D13, D41,D66,D68,D77. A81,B81,C81,D16,D81.
Triple
tant solution. All data images were recorded on an ADSC Q4R CCD detector (Quantum Corp.) on the fixed energy beamline ID14-2 at the ESRF (Grenoble, France). To avoid the loss of the strongest reflections owing to saturation, two sets of data, at low and high resolution, were collected in that order on the same crystal. For the high resolution data the detector was positioned as close as permitted on ID14-2. This did mean that the full resolution available from the crystal was not collected and resulted in a high resolution bin Rmerge of 22.8%. Measurements were made at a single X-ray wavelength of 0.934 Å. Data were processed with the HKL Suite13 and scaled and converted to structure factors using the CCP4 program suite14 (see Table I for data collection and diffraction data statistics).
Crystallization and Data Collection Lyophilized purified PA-IIL was dissolved in water (10 mg ml⫺1) in the presence of fucose (250 g ml⫺1), and salts (ZnCl2, CaCl2, MgCl2 at 2 mM). Crystallization trials were performed with Hampton crystallization screens I and II (Hampton research, Laguna Nigel, CA) using the hanging drop technique. After optimization of conditions, hanging drops were prepared using 2.5 l of this preparation and 2.5 l of the reservoir solution consisting of 1.75 M ammonium sulphate in 0.1 M Tris HCl (pH 8.5), and stored at 20°C. Crystals grew in a rectangular form to maximum dimensions of 0.3 ⫻ 0.3 ⫻ 0.5 mm3 after one month. They belong to space group P21 with unit cell dimensions of a ⫽ 52.735 Å, b ⫽ 72.827 Å, c ⫽54.747 Å, and  ⫽ 94.21° at 100K. The asymmetric unit is a tetramer, corresponding to a Vm 1.75 Å3 Da⫺1 and a solvent content of 29%. A crystal was cryo-cooled at 100 K, after soaking it for as short a time as possible in glycerol 25% v/v in the precipi-
Structure Determination Phasing was performed with ACORN,15 using only the position of five of the calcium ions from the crystal
57
738
E.P. MITCHELL ET AL.
field23 and the positions of all hydrogen atoms of the binding site were optimized.
structure of PA-IIL/fucose complex previously solved at 1.3 Å resolution.11 No reflections were omitted for the purposes of calculating the normalized structure factors for input to ACORN. Using the ACORN phases, an initial structure was built automatically using ARP/warp16 to give 440 residues out of a total of 456 for the asymmetric unit cell. The structure was edited using the program O17 to add missing residues, calcium ions, and fucose ligands that were clearly defined in the electron density maps. This model was then subjected to 20 cycles of refinement in REFMAC18 yielding Rcrys of 20.10% and Rfree of 21.80%. After several cycles of REFMAC refinement with isotropic B-factors, including automatic water molecule placement using ARP/warp, manual rebuilding with O, addition of riding hydrogens in REFMAC, and construction of alternative conformations of side chains, Rcrys and Rfree values respectively decreased to 15.91% and 16.89%. In the next stage, refinement including anisotropic B-factors and riding hydrogens on all protein and fucose groups except hydroxyl moieties, was performed with SHELXL.19 Additional water molecules were placed on the basis of Fo-Fc difference maps. Water molecules were added if the electron density was greater than three sigma and the hydrogen bonding network was sensible. In a number of cases the water molecules, though clearly present, were less than 2.0 Å apart and for these two water molecules, each with an occupancy of 0.5, were positioned in the model. In order to study the protonated state of the aspartic and glutamic acids in the binding site, the last cycles of refinement were conducted without stereochemical restraints on the carbon-oxygen bond lengths and bond angles of the carboxylate groups. This resulted in a final model of 456 residues with four fucose molecules (one per monomer), eight calcium ions (two per monomer), 682 water molecules of which 59 have a half occupancy, two sulfate ions with Rcrys of 11.46% and Rfree of 13.51% at 1.0 Å resolution (Table I). Additionally, the last six cycles of SHELX refinement were performed in parallel without the hydrogen atoms modeled on the fucose ligands in order to create an omit map and give a gauge for visualizing the binding site hydroxyl hydrogens (which had never been modeled during the refinement). Coordinates for the 1.0 Å resolution X-ray structure of PA-IIL/fucose complex have been deposited in the Protein Data Bank20 with the code 1UZV.
Quantum Chemical Calculations Quantum-chemical calculations were performed on a selected part of the 1.0 Å crystal structure of PA-IIL/ fucose, with hydrogen atoms located as described above. The ligand-site amino-acid residues Asn21, Ser23, G1u95, Asp96, Asp99, Asp101, Asn103, Asp104 from the A-chain and Gly114 (from the B-chain) were included in all the calculations. Two calcium ions, one fucose residue and one water molecule that create a hydrogen-bond bridge between the fucose and the protein were also included. As the first step, the geometry was optimized to ensure that the X-ray structure would be correctly reproduced by quantum chemical calculations. Because of the size of the system (138 heavy atoms), the initial geometry optimizations were performed by semi-empirical methods. All optimizations were carried out with the program MOPAC200224 using eight different combinations of methods and techniques: two semi-empirical methods, AM125 and PM326,27 were combined with two SCF procedures, conventional or MOZYME, and two optimizers, BFGS or EF. The following optional keywords were used: GEO-OK, MMOK, NOXYZ, and NODIIS. The optimizations were performed with both 1.3 Å and 1.0 Å resolution structures. The data analysis was performed using the program TRITON.28 The best results, with the lowest RMSD difference between the calculated and X-ray data, were obtained by the combination of PM3/EF with conventional SCF procedure. This procedure was therefore used for optimizing the binding site geometry. In order to render the computation less demanding, all backbone atoms were removed, except for Asn21, replacing the C␣ atoms by methyl groups. The population analysis on the ab initio level was performed on the binding site with no hydrogen atoms added on the aspartate and glutamate groups, leading to a net charge of ⫺2 for the system. Identical calculations were repeated for different protonation states, adding one or two hydrogen atoms on selected acetate groups. Mulliken population analysis29 and the population analysis based on the electrostatic potential (ESP) with the subsequent RESP procedure30,31 were performed on this reduced binding site model. The population analyses were based on two different single-point ab initio calculations, performed at the HF/6-31G** level, and the DFT B3LYP/6-31G** level. All the ab initio calculations were performed using the program Gaussian98.32
Location of Hydroxyl Groups by Molecular Modeling The hydrogen bond network in the binding site was built using the molecular graphics package Sybyl.21 Hydrogen atoms on carbons were added geometrically in their riding position. Hydrogens of carbohydrate hydroxyl groups (O2, O3, O4 and O6) and protein (OG or Ser23) were oriented towards the closer acceptor carboxylate group, when present. Several orientations were tested for the water molecules in order to optimize the resulting hydrogenbond network. Finally, partial charges were added according to the parameterization developed previously for protein– carbohydrate interactions22 for the Tripos force-
Data Mining of Similar Sequences Blast searches33 were performed on general databases (www.infobiogen.fr) as well as on microbial databases (http://www.ncbi.nlm.nih.gov/sutils) both at the peptide and DNA sequence levels. When only DNA sequences were available, they were translated starting from the first ATG motif upstream. Peptide sequence alignments were performed using Clustalx34 and displayed with Genedoc.35
58
HIGH AFFINITY LECTIN/FUCOSE COMPLEX
739
fitting a classical equation for single-site binding.36 The thermodynamic parameters, free energy, and entropy of binding, ⌬Gb and ⌬Sb, were calculated as indicated in the experimental section. The data obtained under different experimental conditions are listed in Table II. Depending on the pH and conditions of measurement, the binding affinity varied between 120 to 180 103 M⫺1, corresponding to dissociation constants ranging between 5.6 to 8.3 10⫺6 M. The experiments performed on the CSC and MicroCal calorimeters gave slightly different values for the stoichiometry of the interaction. While an occupancy of almost one fucose per monomer was measured on the Microcal, for the other machine only 75% saturation was obtained. This could be ascribed either to some defects in the protein preparation, with one monomer inactivated, or to problems due to the need of higher protein concentration and/or due to the lower sensitivity of the CSC microcalorimeter. Nevertheless, this does not affect the evaluation of the thermodynamic parameters. In all measurements, a high enthalpy of binding (⌬Hb ⬇ ⫺27.5 kJ/mol) was observed. The entropy contribution is weak but slightly favorable (T⌬Sb ⬇ 2 kJ/mol), contributing to less than 7% of the free energy of binding (⌬Gb ⬇ ⫺29.5 kJ/mol). Analysis of the Binding Site in the Very High Resolution Crystal Structure The crystal structure of the PA-IIL/fucose complex refined at 1.0 Å resolution displayed the same overall features [Fig. 2(A)] as those previously refined at 1.3 Å resolution in the same P21 space group11 and refined at 1.2 Å resolution in P212121 space group.8 In addition to the eight calcium ions, and four fucose ligands, 682 water molecules and two sulfate ions have been refined; 27 amino acids were modeled with two or three conformations. Each monomer is characterized by a calcium and fucose binding site made up primarily by one long beta-hairpin loop with the participation of a shorter loop and of the C-terminal acidic group at Gly114 of the neighboring monomer. The high resolution allows a detailed analysis of the site. Distances of interest for both the coordination of the two calcium ions, both hepta-coordinated, and the binding of fucose are given in Table III. The two calcium ions are very close to each other, with distances varying from 2.73 to 2.75 Å in the four monomers. Figure 2(B) illustrates the quality of the electron density map at the ligand site. The omit Fo–Fc difference map allows all of the aliphatic hydrogen atoms of fucose to be located. The difference map contour around the hydrogen at C-1 also indicates that a minority population of fucose adopts the -configuration, though this was not modeled. The hydroxyl hydrogen atoms are of the greatest interest since they are involved in the hydrogen-bond network between the fucose residue and the protein side chain. Unfortunately only one of three hydroxyl groups can be oriented from the density map. As seen in Figure 2(B), the hydrogen atom of the O-2 hydroxyl group points towards oxygen OD1 of the Asp96 side chain. Since not all hydrogen atoms can be identified clearly, it is not possible to conclude from the X-ray electron density maps alone the protonation state of the acidic groups
Fig. 1. Titration calorimetry results of L-Fuc (0.78 mM) binding to PA-IIL (0.08 mM) in 10 mM Tris buffer containing 100 mM NaCl and 3 mM CaCl2 (pH 7.5), at 25°C. Top: data obtained from 28 automatic injections, 10 l each, of L-Fuc, into PA-IIL in the cell. Bottom: plot of the total heat released as a function of total ligand concentration for the titration shown above (squares). The solid line represents the best least-square fit for the obtained data.
RESULTS Comparison of Hemagglutination Activity of Recombinant and Native PA-IIL The hemagglutinating titers of both the native and recombinant lectin preparations were almost the same, 1:256 –1:512. The hemagglutination appeared more rapidly with the recombinant lectin. Both lectin preparations also showed similar carbohydrate affinities: being inhibited by fucose dilution up to 0.147mM (11 tubes) and by mannose dilution up to 4.7 mM (5– 6 tubes). Thermodynamic Data The results of a typical microcalorimetry experiment are displayed in Figure 1, showing a steep decrease in the exothermic heat of binding while saturation is achieved. The values of binding affinity Kb, binding enthalpy ⌬Hb and stoichiometry per monomer n have been obtained by
59
740
E.P. MITCHELL ET AL.
TABLE II. Thermodynamics—ITC for PA-IIL with Fucose†
Exp 1—fucosea Exp 2—fucosea Exp 3—fucoseb
pH
Calcium (mM)
Kb 105 M⫺1
n
⫺⌬G (kJ/mol)
⫺⌬H (kJ/mol)
⫺⌬S (J/mol. K)
⫺T⌬S (kJ/mol)
7.4 8.5 7.5
4 1–3 3
1.2 (⫾ 0.1) 1.6 (⫾ 0.2) 1.8 (⫾ 0.2)
0.78 (⫾ 0.02) 0.79 ⫾ (0.06) 1.05 ⫾ (0.02)
29.0 (⫾ 0.1) 29.6 (⫾ 0.3) 29.9 (⫾ 0.2)
27.7 (⫾ 0.5) 27.5 (⫾ 1.4) 27.3 (⫾ 0.3)
⫺4.1 (⫾ 1.8) ⫺7.1 (⫾ 3.7) ⫺5.9 (⫾ 1.7)
⫺1.3 (⫾ 0.5) ⫺2.1 (⫾ 0.1) ⫺2.6 (⫾ 0.4)
†
Experiments 1 to 3 have been conducted using different batches of recombinant protein. Values in parentheses are standard deviations from at least two separate titrations. a Data from CSC ITC microcalorimeter. b Data from a Microcal VP-ITC microcalorimeter.
TABLE III. Geometrical Details of the Four Binding Sites (Distances in Å) in the 1.0-Å Resolution Crystal Structure of PA-IIL/Fucose Complex† Chain A
Chain B
Chain C
Interactions between the calcium ions and the protein. Ca-1 Asp101.OD1 2.37 2.37 2.37 Ca-1 Glu95.E1 2.46 2.46 2.48 Ca-1 Glu95.E2 2.41 2.42 2.42 Ca-1 Asp99.OD1 2.36 2.36 2.37 Ca-1 Asp104.OD2 2.42 2.41 2.41 Ca-2 Asn21.O 2.38 2.39 2.37 Ca-2 Asp101.OD2 2.41 2.41 2.42 Ca-2 Asn103.OD1 2.34 2.34 2.33 Ca-2 Asp104.OD1 2.35 2.35 2.36 2.42 2.44 2.43 Ca-2 Gly114.Oa Interactions between the calcium ions and the fucose residue. Ca-1 O2 2.51 2.49 2.51 Ca-1 O3 2.47 2.48 2.48 Ca-2 O3 2.48 2.49 2.50 Ca-2 O4 2.49 2.51 2.48 Direct interactions between the fucose residue and the protein. O2 Asp96.OD1 *** 2.67 2.66 2.62 O3 Asp99.OD1 2.98 3.00 2.94 O3 Asp99.OD2 *** 2.55 2.57 2.55 O3 Asp101.OD1 2.95 2.94 2.96 O3 Asp101.OD2 2.94 2.94 2.94 O3 Asp104.OD2 2.99 3.01 3.02 O4 Asn21.O 3.01 3.11 3.07 2.53 2.55 2.58 O4 Gly114.OXTa*** O5 Ser23.N *** 2.92 2.96 2.95 C-Me Thr45.CG2 4.06 4.27 4.09 Interactions between the fucose residue and the protein via water molecules. O1 Wat_1 *** 3.16 3.27 3.13 O2 Wat_1 *** 3.00 3.04 3.13 O1 Wat_2 *** 2.76 2.70 2.75 Wat_1 Thr98.N *** 2.90 2.97 2.90 Wat_1 Asp99.N *** 3.32 3.18 3.28 Wat_2 Ser23.OG *** 2.68 2.71 3.32 †
Chain D 2.36 2.45 2.41 2.38 2.41 2.39 2.43 2.33 2.34 2.41 2.48 2.48 2.46 2.50 2.66 3.00 2.58 2.96 2.92 2.99 3.03 2.58 2.98 4.07 3.27 3.04 2.76 2.93 3.16 2.62
The three stars indicate distances with geometrical feature in agreement with existence of a hydrogen bond. C-terminal residue from the other monomer.
a
CD-OD2). This can be compared to values of 1.24 and 1.32 Å observed for COOH observed in the atomic resolution endocellulase crystal structure.37 The C-O bonds of the fucose site carboxylates, bar Asp99, refine to lengths between 1.25 and 1.27 Å which are compatible with no protonation of those groups (the estimated average coordinate error being 0.023 Å). Further comparison of the molecular model with the protonated structure of the binding site, as treated by computational chemistry approaches are discussed below.
involved in ligand binding. Nevertheless, due to the high resolution of the data (around 5.6 observation per parameter during anisotropic B-factor refinement), it was possible to perform 20 refinement cycles without stereochemical restraints on the bond lengths of the COO groups in the ligand site and to analyze the differences, if any. There are few differences between the four monomers of the asymmetric unit, and therefore only the average bond lengths are listed in Table IV. Larger average bond length differences are observed for Asp99 (1.22 Å for CD-OD1 vs. 1.27 Å for
60
HIGH AFFINITY LECTIN/FUCOSE COMPLEX
741
Modeling of Hydrogen-Bond Network Where it is assumed, accordant with the X-ray data, that none of the aspartate and glutamate side chains of the binding site are protonated, the modeling study arrives at only one possible network of hydrogen bonds between the sugar and the protein. The most buried hydroxyl groups of fucose, i.e., O3 and O4, donate a hydrogen bond to Asp99.OD2 and Gly114*.OXT, respectively. The third hydroxyl group, O2, also donates a hydrogen bond to an acidic group, i.e., to OD1 of Asp96, this hydrogen bond being the only one that could be identified from the X-ray data. With the participation of the backbone NH of Ser23, that gives a hydrogen to the fucose ring oxygen O5, this gives a total of four direct hydrogen bonds that can be modeled unambiguously between the protein and the sugar [Fig. 2(C)]. The hydrogen atoms of the conserved water molecule Wat1 could also be modeled: the orientation of the water molecule could be deduced from the facts that it receives two hydrogens from NH of Thr98 and Asp99 and that it donates one hydrogen to the O2 hydroxyl of fucose. The second hydrogen of the same water molecule can then point to O1 of fucose. The special location of this water molecule seems to allow the water to stabilize the tip of the calcium-carbohydrate binding loop made up of 14 amino acids in a beta-hairpin motif. This particular water is conserved in all PA-IIL/fucose complexes (all four monomers, 1.3 and 1.0 Å resolution). Its special role in maintaining the conformation of the loop is illustrated by the fact that it is also conserved in the native PA-IIL structure, but not in the demetallized one, in which the calcium-binding loop is partially disordered.8 The second water molecule, Wat2, that stabilizes the alpha configuration of fucose by bridging the anomeric oxygen atom to the side chain of Ser23 is not so well conserved in all monomers and splits in two close sites (modeled at 0.5 occupancy each) in one of them. As a consequence, there are several possibilities for orienting the protons between the Ser23.OG, Wat2 and O1 of fucose and the network shown in Figure 2(C) represents only one possibility. The proposed network displayed in Figure 2(C) results in the formation of seven direct or indirect hydrogen bonds between one monosaccharide and the protein.
Fig. 2. Structure of the 1.0 Å resolution structure of the PA-IIL/fucose complex. A: Ribbon representation of the tetramer consisting of the asymmetric unit with stick representation of sugar and sulfate and cpk representation of the calcium ions. B: Final sigma weighted 2Fo–Fc electron density map (blue color contoured at 4 ) around the monosaccharide residue, water molecules and calcium ions and sigma weighted Fo–Fc omit map (green color contoured at2.5 ) after six cycles of refinement omitting the hydrogen atoms riding the fucose carbons. C: Interactions of PA-IIL with calcium ions and fucose with modeled hydrogen atoms. Coordination contacts are indicated by orange solid lines and hydrogen bonds by green dashed lines. The * symbol indicates the terminal glycine from the other monomer. All molecular drawings were prepared with MOLSCRIPT53 and RASTER-3D.54
61
742
E.P. MITCHELL ET AL.
TABLE IV. Carbon–Oxygen Bond Lengths in PA-IIL Binding Site† Length from ab initio calculations, Å Residue
C—O bond
N21 N103 G114
O OD1 O OXT OE1 OE2 OD1 OD2 OD1 OD2 OD1 OD2 OD1 OD2
E95 D96 D99 D101 D104
Length from X-ray data Åa
Nonprotonated
Gly114*protonated
Asp96protonated
Both protonated
1.25 1.23 1.27 1.27 1.26 1.27 1.25 1.26 1.22 1.27 1.26 1.25 1.25 1.27
1.23 1.24 1.25 1.26 1.27 1.27 1.27 1.25 1.27 1.26 1.27 1.27 1.28 1.27
1.23 1.24 1.25 1.33b 1.27 1.27 1.26 1.26 1.27 1.26 1.27 1.28 1.29 1.26
1.23 1.24 1.25 1.26 1.27 1.28 1.34b 1.22 1.28 1.25 1.27 1.27 1.27 1.27
1.23 1.25 1.25 1.33b 1.27 1.28 1.34b 1.22 1.27 1.26 1.28 1.28 1.28 1.26
†
Comparison between crystallographic data and models calculated using semi-empirical methods for different protonation states. Average over the four monomers. Standard deviations less than 0.02 Å. b Protonated oxygen atom. a
contains one protonated glutamate with C-O and C-OH distances of 1.24 and 1.32 Å, respectively.37 Comparison of the X-ray derived distances with the calculated ones indicates that only the calculation performed without any protonation is in agreement with experimental values. The binding site has been therefore considered as nonprotonated for the rest of the study.
Prediction of Protonation State by Ab Initio Calculations As described above, the PA-IIL binding site is characterized by the presence of six carboxylate groups that are involved in the binding of two calcium ions. Since there are no amino acids that are likely to be positively charged in the close vicinity of the binding site and assuming zero protonation of the carboxyl groups, the overall charge of the site is expected to be ⫺2. In order to confirm the crystallographic data indicating the absence of protonation, semi-empirical calculations were performed on the binding site in different ionization states. The following residues could potentially be protonated: Glu95, Asp96, Asp99, Asp101, Asp104, and Gly114B. Since both oxygens of Glu95, Asp101, and Asp104 are coordinated by calcium ions, they were excluded. Also Asp99 is unlikely to be protonated because the only position of a hydrogen would be in steric hindrance with O3 hydroxyl group of fucose. Therefore, the most probable candidates for protonation are the C-terminus carboxylate of Gly114B and the Asp96 residue. Four different systems were constructed and optimized: the fully deprotonated structure (i.e., none of the carboxyl groups of the binding site was protonated) with a net charge of ⫺2, two systems with net charges of ⫺1 (one with Gly114B protonated, the other with Asp96 protonated), and one system with a zero net charge (with both Gly231 and Asp96 protonated). In all the above calculations, the original X-ray structure was correctly reproduced with low RMSD (see Table IV) except for the structural details described below. Analysis of the resulting geometries (Table IV) indicates that the computed C–O bond lengths are indeed dependent upon the protonation state with bond length varying from 1.27 (⫹/⫺0.03) Å for a delocalized carboxylate C-O bond to 1.33 (⫹/⫺0.01) Å for a carboxylic C-OH bond. These lengths can be compared readily to the experimentally measured values in very high resolution crystal structure of endocellulase that
Calculation of Charge Distribution by Ab Initio Calculations Charge distribution has been calculated on a mini-model of the site as described in the experimental section. Mulliken and RESP charges were calculated using various methods and levels of theory and on different systems: isolated fucose, the binding site of PA-IIL without any ligand, the binding site of PA-IIL with two Ca2⫹ ions and, finally, the binding site of PA-IIL with two Ca2⫹ ions and either a fucose residue or three water molecules (the water molecules replacing the O2, O3, and O4 atoms of fucose) (Table V). The Mulliken scheme does not separate greatly the above different scenarios of the binding site (data in Supplementary Material). However, when the RESP procedure is applied, the charge distribution is very dependent on the composition of the complex. Results obtained by taking RESP charges from B3LYP/6-31G** calculations and from HF/6-31G** calculations are listed in Table V. The complete results, including all data, are provided in supplementary material. The atomic charges calculated with B3LYP or with HF/6-31G** calculations are in good agreement. When there is no ligand present in the binding site, the global charge of ⫺6 is distributed mainly over the carboxyl groups. When including two calcium ions in the system, highly negative charges are induced in their neighborhood (i.e., on oxygen atoms of carbonyl and carboxyl groups). The calcium ions themselves keep relatively high positive charges of about ⫹1.5. On the other hand, isolated fucose
62
743
HIGH AFFINITY LECTIN/FUCOSE COMPLEX
TABLE V. Charge Distribution in the Binding Site Calculated on Different Systems, Using B3LYP//6–31G** (HF/6–31G** in Italics) Ab Initio Calculations with RESP Charges
Protein Asn21.O Gly114.O Asn103.OD1 Asp99.OD1 Asp104.OD1 Asp101.OD1 Asp101.OD2 Ca2⫹ ions Ca1 Ca2 Fucose Fuc.O2 Fuc.O3 Fuc.O4 Water Wat2 Wat3 Wat4 a
PA-IIL/Ca2⫹ Fucose
PA-IIL/Ca2⫹ Water
PA-IIL/Ca2⫹
Isolated molecule
⫺0.36/⫺0.44 ⫺0.46/⫺0.56 ⫺0.52/⫺0.60 ⫺0.67/⫺0.78 ⫺0.42/⫺0.52 ⫺0.50/⫺0.61 ⫺0.65/⫺0.76
⫺0.42/⫺0.50 ⫺0.56/⫺0.65 ⫺0.51/⫺0.63 ⫺0.74/⫺0.87 ⫺0.60/⫺0.72 ⫺0.58/⫺0.70 ⫺0.69/⫺0.81
⫺0.45/⫺0.56 ⫺0.67/⫺0.78 ⫺0.61/⫺0.72 ⫺0.82/⫺0.93 ⫺0.77/⫺0.90 ⫺0.65/⫺0.77 ⫺0.76/⫺0.87
⫺0.32a ⫺0.54 ⫺0.56 ⫺0.64 ⫺0.65 ⫺0.51 ⫺0.81
1.08/1.19 0.61/0.71
1.33/1.49 0.99/1.18
1.48/1.62 1.48/1.64
⫺0.18/⫺0.24 0.02/0.01 0.00/⫺0.03
⫺0.52/⫺0.60 ⫺0.49/⫺0.57 ⫺0.52/⫺0.60 ⫺0.73/⫺0.84 ⫺0.35/⫺0.46 ⫺0.54/⫺0.67
B3LYP/6 –31G** calculations on isolated protein binding site did not converge.
Search for Similar Sequences in Databases
is a neutral molecule but displays rather negatively charged hydroxy-oxygens with partial charges of about ⫺0.5. When two calcium ions and fucose are present in the binding site, the charges on both calcium ions decreases significantly (by about 0.6 to 0.9). A strong charge transfer from three oxygen atoms of fucose and also from the neighboring oxygen atoms of amino acids in the binding site to calcium ions occurs. The situation is somewhere “in between” when there are three water molecules coordinated on calcium ions instead of three oxygens of fucose (see Table V). In summary, when fucose is not present in the active site, the calcium ions bind by ionic bonds. In this case, the solvent molecules should be very competitive since the free energy of Ca ion solvation is ⫺381 kcal/mol.38 This would suggest that there is no strong structural role of Ca ions in the unliganded PA-IIL. The fucose binding can be considered as an aqua-substitution reaction for the calcium ion: the fucose hydroxyl groups have a larger mean polarizability than water molecules, resulting in a larger charge redistribution upon binding. In this case, the binding is very strong and this suggests that the Ca ions have a significant structural role in the complex as the charge delocalization is crucial for its stability. Also, the binding site of the protein is stabilized by charge delocalization. It is interesting that when fucose is present, the charge transfer is not as significant on calcium 1 (charge 1.08) as it is on calcium 2 (charge 0.61). This may be caused by the difference in the functional groups that surround the ions and contribute differently to charge delocalization. While calcium 2 is coordinated by oxygens of three carboxyl groups, two carbonyl groups and two oxygens from fucose, calcium 1 is coordinated by oxygens of five carboxyl groups and two oxygens of fucose. So the real difference is the presence of two carbonyl groups which replace two carboxyl groups.
With the rapid progress of bacterial genomics, genes coding for PA-IIL-like proteins were identified in the newly sequenced genome of other bacteria: Ralstonia solanacearum,39 Chromobacterium violaceum40 and Burkholderia cenocepacia (The Sanger Institute, unfinished sequence data were obtained from http://www.sanger.ac.uk/ Projects/B_cenocepacia/). A similar gene is also present in the newly sequenced genome of another strain of Pseudomonas aeruginosa, UCBPP-PA14 (Harvard Partners Center for Genetics and Genomics, http://www.hpcgg.org:8080/ hpcgg/jsp/hpcgg/Sequence/mouse/pseudomonas.jsp). The gene product, protein accession number ZP_00136733.1, is apparently shorter at the N-terminus then the protein from P. aeruginosa PA01. Detailed analysis of nucleotide sequence prior to the gene sequence revealed that an alternative initiation GTG codon for the automatic gene annotation was taken into account, not considering a presence of the classical initiation ATG codon 15 bp upstream. Moreover, 11 bp further upstream of this new ATG translational start site, a putative Shine-Delgarno sequence can be found, confirming that five amino acids in the annotated gene ZP_00136733.1 are missing. The corrected N-teminus sequence results in complete homology of both P. aeruginosa lectins. The alignment of the PA-IIL like region of all the identified and putative lectins is given in Figure 3. In all cases, the PA-IIL-like domain corresponds to the C-terminus, finishing with a fully conserved WPLG sequence. The calcium and sugar binding loop (Asp96 to Asp104 in PA-IIL) is also highly conserved. Four of the sequences, i.e., the ones from both strains of P. aeruginosa, from R. solanacearum, and from C. violaceum, have very similar lengths and almost similar N-term sequences, while the three B. cenocepacia sequences have
63
744
E.P. MITCHELL ET AL.
Fig. 3. Alignment of PA-IIL-like sequences from different bacteria. The sequences are from the following genes, with their NCBI/EMBL deposition number between [brackets]. P.a_PA01, fucose-binding lectin from Pseudomonas aeruginosa PA01 [TrEMBL: Q9HYN5]; P.a_UCBPP, ORF Paer104401 from Pseudomonas aeruginosa UCBPP-PA14 [NCBI: ZP_00136733.1GI 32038461]; R.s., mannose-binding lectin from Ralstonia solanacearum GMI1000 [TrEMBL: Q8XUA5]; Ch.v., ORF CV1744 from Chromobacterium violaceum ATCC12472 [NCBI: NP_901414.1 GI:34495455]; B.c.1, B.c.2 and B.c.3, ORFs predicted from unfinished sequence data of Burkholderia cenocepacia J2315 [chromosome 2, complement: 215683-216501, 214786-215556 and 216624-217013]; Numberings of amino acids is putative for P.a_UCBPP, Ch.v. and the three B.c. and correspond to the most probable limits of the ORF. PA-IIL residues involved in calcium binding are marked by an asterisk and the ones responsible for the fine monosaccharide preference are indicated by an exclamation mark.
binding of glucose by Lens culinaris lectin44 and the binding of galactose by Erythrina corallodendron lectin.45 In the latter case, structural studies indicated that the favorable entropy may be due to the release of two tightly bound water molecules upon galactose binding.46 Similar explanation could apply to the favorable entropy observed in PA-IIL/fucose interaction since three water molecules are tightly bound to calcium in the crystal structure of the native lectin8 and are replaced by fucose hydroxyl groups in the complex.
longer and more variable N-term regions with no similarity with known proteins (data not shown here). DISCUSSION Thermodynamic Basis of PA-IIL Affinity to Fucose The present microcalorimetry study confirms the very high affinity of the PA-IIL for fucose, albeit with values one order of magnitude lower than that published previously from equilibrium dialysis study (Ka : 1.5 106 M⫺1).10 The discrepancies between theses values could be due to differences between methodological approaches or between the protein samples. Nevertheless, a review of lectin–protein interactions indicates that the affinity measured here is much higher than the millimolar range usually observed for binding unsubstituted monosaccharides,41 the highest reported value being a Ka of 9 103 M⫺1 for the binding to GalNAc of soybean agglutinin.42 Analysis of the different thermodynamic contributions demonstrates that the above interaction is characterized by a strong enthalpy of binding, accompanied by a weakly favorable entropy term (Table II). When compared to classical millimolar range lectin-monosaccharide interactions, the unusual affinity is explained not only by the relatively high enthalpy of binding, but also by the absence of an entropy barrier. The favorable entropy of binding is very unusual in protein– carbohydrate complexes where an entropy barrier is generally observed.41 The barrier is higher for the binding of flexible oligosaccharides, but unfavorable negative entropy contributions are also observed for monosaccharide binding and are generally attributed to the displacement of water molecules from disordered locations in amphiphilic regions of the carbohydrate binding sites to more ordered networks in bulk water solution.43 Nevertheless, favorable entropy has already been observed for the
Atomic Basis of the PA-IIL Affinity for Fucose A direct interaction between one calcium atom and a sugar ligand has previously been observed in the carbohydrate recognition domain (CRD) of C-type animal lectins that binds neutral monosaccharides with two equatorial hydroxyl groups such as D-mannose, L-fucose or Lgalactose.47 However, in the C-lectin family, the dissociation constant for monosaccharides is in the millimolar range. Another family of lectins that includes serum amyloid P component (SAP) and C-reactive protein (CRP), contains two close calcium ions involved in binding diverse ligands but it differs from that of PA-IIL since it is only directed to negatively charged carbohydrates.48 In the present work, the very particular environment of the two close calcium ions was analyzed. Three of the fucose hydroxyl groups are directly involved in the coordination sphere of the calcium ion. Two of these hydroxyl groups create very strong hydrogen bonds with acetate from protein side chains with an O—O distance of 2.55 (⫹/⫺0.02) Å, a value significantly shorter than that currently observed in protein carbohydrate interactions. Furthermore, the acetates that are involved in the hydrogen bonds also coordinate the calcium ions. This yields two peculiar six-membered ring arrangements (—O3—O3H—
64
745
HIGH AFFINITY LECTIN/FUCOSE COMPLEX
than other protein– carbohydrate interactions, due to the direct involvement of two cations in the binding. The very fine present analysis of the interaction gives some clues for the development of high-affinity ligands. Possible routes for a rational design of high-affinity antiadhesive compounds encompass further stabilization of the charge distribution, enhancement of the hydrophobic interactions that are minimal in the complexes studied to date and rational use of the presence of water molecules. These approaches could yield compounds of therapeutic interest not only against P. aeruginosa, but also against several other related dangerous pathogens that use similar lectins for their colonization strategy.
D99.OD2—D99.CD—D99.OD1—Ca1— and a similar ring involving O2, Gly114 and Ca2) that could be responsible for the strong charge delocalization shown by the ab initio calculations and could therefore play a major role in the unusually high affinity observed between PA-IIL and the monosaccharide. Biological Occurrence of this Sugar Binding Mode The PA-IIL sequence, first identified in 2000, has been reported recently in other phylogenetically related bacteria, the plant pathogen R. solanacearum and C. violaceum.12 This latter gram-negative saprophytic bacterium from soil and water is of high interest for biotechnology since it produces diverse antibiotics. It is normally considered as a nonpathogen, although it is occasionally involved in infections and may cause fatal septicemias in human and animals.49 Using the homology searches described above, three ORFs are described for the first time which code for three proteins containing C-terminal sequences highly similar to PA-IIL in another emerging pathogen, Burkholderia cenocepacia (previously B. cepacia). In the 1980s, it was found to cause life threatening pulmonary infections in cystic fibrosis patients with complications caused by B. cepacia having a mortality rate of 80%.50 It has to be noted that all amino acids involved in calcium binding are fully conserved in these different proteins, including the C-terminal glycine. All these proteins can therefore be postulated to bind two calcium ions and also to display the same dimeric structure as PA-IIL. On the other hand, some variations are observed in the threeamino-acid motif that is responsible for sugar specificity12 and it is difficult to predict what will be the highest affinity monosaccharide ligand for each sequence. The occurrence of the PA-IIL sequence in four different bacteria may give some clues as to the function of this lectin. These bacteria share some particularities: they are soil inhabitants, opportunistic pathogens, and highly dangerous once an infection is established. It has been observed that the ORFs of C. violaceum that present a close similarity to ORFs of both R. solanacearum (17.4%) and P. aeruginosa (9.6%) are mostly coding for proteins involved in the interactions of these bacteria with their environment.40
ACKNOWLEDGEMENTS We thank the ESRF, Grenoble, for access to synchrotron data collection facilities. The work was supported by Ministry of Education (LSˇ , MP, JK, MW contract LN00A016), and by Grant Agency (CH, Grant 202/01/ D110) of the Czech Republic, by CNRS, by French Ministry of Research ACI Microbiologie program and by Vaincre la Mucoviscidose. Travels and visits between NCBR and CERMAV are supported by a BARRANDE exchange program. REFERENCES 1. Scharfman A, Degroote S, Beau J, Lamblin G, Roussel P, Mazurier J. Pseudomonas aeruginosa binds to neoglycoconjugates bearing mucin carbohydrate determinants and predominantly to sialylLewis x conjugates. Glycobiology 1999;9:757–764. 2. Lamblin G, Degroote S, Perini JM, Delmotte P, Scharfman A, Davril M, Lo-Guidice JM, Houdret N, Dumur V, Klein A, Roussel P. Human airway mucin glycosylation: A combinatory of carbohydrate determinants which vary in cystic fibrosis. Glycoconj J 2001;18:661– 684. 3. Roussel P, Lamblin G. The glycosylation of airway mucins in cystic fibrosis and its relationship with lung infection by Pseudomonas aeruginosa. Adv Exp Med Biol 2003;535:17–32. 4. Gilboa-Garber N. Pseudomonas aeruginosa lectins. Meth Enzymol 1982;83:378 –385. 5. Avichezer D, Katcoff DJ, Garber NC, Gilboa-Garber N. Analysis of the amino acid sequence of the Pseudomonas aeruginosa galactophilic PA-I lectin. J Biol Chem 1992;267:23023–23027. 6. Gilboa-Garber N, Katcoff DJ, Garber NC. Identification and characterization of Pseudomonas aeruginosa PA-IIL lectin gene and protein compared to PA-IL. FEMS Immunol Med Microbiol 2000;29:53–57. 7. Glick J, Garber NC. The intracellular localization of Pseudomonas aeruginosa lectins. J Gen Microbiol 1983;9:3085–3090. 8. Loris R, Tielker D, Jaeger K-E, Wyns L. Structural basis of carbohydrate recognition by the lectin LecB from Pseudomonas aeruginosa. J Mol Biol 2003;331:861– 870. 9. Imberty A, Wimmerova M, Mitchell EP, Gilboa-Garber N. Structures of the lectins from Pseudomonas aeruginosa: Insights into molecular basis for host glycan recognition. Microb Infect 2004;6: 221–228. 10. Garber N, Guempel U, Gilboa-Garber N, Doyle RJ. Specificity of the fucose-binding lectin of Pseudomonas aeruginosa. FEMS Microbiol Lett 1987;48:331–334. 11. Mitchell E, Houles C, Sudakevitz D, Wimmerova M, Gautier C, Pere´z S, Wu AM, Gilboa-Garber N, Imberty A. Structural basis for oligosaccharide-mediated adhesion of Pseudomonas aeruginosa in the lungs of cystic fibrosis patients. Nat Struct Biol 2002;9:918 – 921. 12. Sudakevitz D, Kostlanova N, Blatman-Jan G, Mitchell EP, Lerrer B, Wimmerova M, Katcof fDJ, Imberty A, Gilboa-Garber N. A new Ralstonia solanacearum high affinity mannose-binding lectin RS-IIL structurally resembling the Pseudomonas aeruginosa fucose-specific lectin PA-IIL. Mol Microbiol 2004;52:691–700.
CONCLUSION Two paradigms dictate protein– carbohydrate interactions. The first one is that the efficacy of these interactions in biological processes is due to their multivalency that creates an “avidity” or “Velcro” effect that would compensate the low affinity (i.e., millimolar range) generally observed for each binding site.51 The second paradigm is that protein/sugar binding is mainly due to two types of interaction: the numerous hydrogen bonds, owing to sugar hydroxyl groups, contribute to the affinity, and the van der Waals interactions, mainly due to stacking of the hydrophobic face of sugar to side chains of aromatic amino acids, are responsible of the specificity.52 The interaction between PA-IIL soluble bacterial lectin and monosaccharides is clearly of a different type: almost no hydrophobic contact is involved and the interaction displays a higher affinity
65
746
E.P. MITCHELL ET AL. 36. Wiseman T, Williston S, Brandts JF, Lin LN. Rapid measurement of binding constants and heats of binding using a new titration calorimeter. Anal Biochem 1989;179:131–137. 37. Varrot A, Davies GJ. Direct experimental observation of the hydrogen-bonding network of a glycosidase along its reaction coordinate revealed by atomic resolution analyses of endoglucanase Cel5A. Acta Crystallogr D Biol Crystallogr 2003;59:447– 452. 38. Magini M, Tanaka N, Ohtaki H, Tamamushi R. Ions and molecules in solution. Amsterdam: Elsevier; 1983. 97 p. 39. Salanoubat M, Genin S, Artiguenave F, Gouzy J, Mangenot S, Arlat M, Billault A, Brottier P, Camus JC, Cattolico L, Chandler M, Choisne N, Claudel-Renard C, Cunnac S, Demange N, Gaspin C, Lavie M, Moisan A, Robert C, Saurin W, Schiex T, Siguier P, Thebault P, Whalen M, Wincker P, Levy M, Weissenbach J, Boucher CA. Genome sequence of the plant pathogen Ralstonia solanacearum. Nature 2002;415:497–502. 40. Brazilian National Genome Project Consortium. The complete genome sequence of Chromobacterium violaceum reveals remarkable and exploitable bacterial adaptability. Proc Natl Acad Sci USA 2003;100:11660 –11665. 41. Dam TK, Brewer CF. Thermodynamic studies of lectin-carbohydrate interactions by isothermal titration calorimetry. Chem Rev 2002;102:387– 429. 42. Gupta D, Cho M, Cummings RD, Brewer CF. Thermodynamics of carbohydrate binding to galectin-1 from Chinese hamster ovary cells and two mutants. A comparison with four galactose-specific plant lectins. Biochemistry 1996;35:15236 –15243. 43. Lemieux RU, Delbaere LT, Beierbeck H, Spohr U. Involvement of water in host-guest interactions. Ciba Found Symp 1991;158:231– 245. 44. Schwarz FP, Puri KD, Bhat RG, Surolia A. Thermodynamics of monosaccharide binding to concanavalin A, pea (Pisum sativum) lectin, and lentil (Lens culinaris) lectin. J Biol Chem 1993;268: 7668 –7677. 45. Surolia A, Sharon N, Schwarz FP. Thermodynamics of monosaccharide and disaccharide binding to Erythrina corallodendron lectin. J Biol Chem 1996;271:17697–17703. 46. Elgavish S, Shaanan B. Structures of the Erythrina corallodendron lectin and of its complexes with mono- and disaccharides. J Mol Biol 1998;277:917–932. 47. Drickamer K. Ca(2⫹)-dependent sugar recognition by animal lectins. Biochem Soc Trans 1996;24:146 –150. 48. Thompson D, Pepys MB, Tickle I, Wood S. The structures of crystalline complexes of human serum amyloid P component with its carbohydrate ligand, the cyclic pyruvate acetal of galactose. J Mol Biol 2002;320:1081–1086. 49. Duran N, Menck CF. Chromobacterium violaceum: a review of pharmacological and industrial perspectives. Crit Rev Microbiol 2001;27:201–222. 50. Mahenthiralingam E, Baldwin A, Vandamme P. Burkholderia cepacia complex infection in patients with cystic fibrosis. J Med Microbiol 2002;51:533–538. 51. Sacchettini JC, Baum LG, Brewer CF. Multivalent proteincarbohydrate interactions. A new paradigm for supermolecular assembly and signal transduction. Biochemistry 2001;40:3009 – 3015. 52. Vyas N. Atomic features of protein-carbohydrate interactions. Curr Opin Struct Biol 1991;1:732–740. 53. Kraulis P. Molscript: A program to produce both detailed and schematic plots of protein structures. J Appl Crystallogr 1991;24: 946 –950. 54. Merrit EA, Murphy ME. Raster3D version 2.0. A program for photorealistic molecular graphics. Acta Crystallogr D Biol Crystallogr 1994;50:869 – 873.
13. Otwinowski Z, Minor W. Processing of X-ray diffraction data collected in oscillation mode. In: Carter CWJ, Sweet RM, editors. Macromolecular Crystallography, Part A. San Diego: Academic Press; 1997. p 307–326. 14. Collaborative computational project number 4. The CCP4 suite: programs for protein crystallography. Acta Crystallogr D Biol Crystallogr 1994;50:760 –763. 15. Cowtan KD, Zhang KY. Density modification for macromolecular phase improvement. Prog Biophys Mol Biol 1999;72:245–270. 16. Perrakis A, Morris R, Lamzin VS. Automated protein model building combined with iterative structure refinement. Nat Struct Biol 1999;6:458 – 463. 17. Jones TA, Zou JY, Cowan SW, Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A 1991;47:110 – 119. 18. Murshudov GN, A.A.Vagin, E.J.Dodson. Refinement of macromolecular structures by the maximum-likelihood method. Acta Crystallogr D Biol Crystallogr 1997;53:240 –255. 19. Sheldrick GM, Schneider TR. Shelxl: High-resolution refinement. Meth Enzymol 1997;277:319 –343. 20. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res 2000;28:235–242. 21. Tripos Associates, 6.4. 1699 S. Hanley Road, Suite 303, St Louis, MO 63144 USA. 22. Imberty A, Bettler E, Karababa M, Mazeau K, Petrova P, Pe´rez S. Building sugars: The sweet part of structural biology. In Vijayan M, Yathindra N, Kolaskar AS, editors. Perspectives in Structural Biology. Hyderabad: Indian Academy of Sciences and Universities Press; 1999. p 392– 409. 23. Clark M, Cramer RDI, van den Opdenbosch N. Validation of the general purpose Tripos 5.2 force field. J Comput Chem 1989;10: 982–1012. 24. MOPAC, 2002. Tokyo: Fujitsu Limited; 1999. 25. Dewar MJS, Zoebisch EG, Healy EF, Stewart JJP. AM1: A new general purpose quantum mechanical model. J Am Chem Soc 1993;107:3902–3909. 26. Stewart JPP. Optimisation of parameters for semi-empirical methods I. Method. J Comp Chem 1989;10:209 –220. 27. Stewart JPP. Optimisation of parameters for semi-empirical methods II. Applications. J Comp Chem 1989;10:221–264. 28. Damborsky J, Prokop M, Koca J. TRITON: graphic software for rational engineering of enzymes. Trends Biochem Sci 2001;26:71– 73. 29. Mulliken RS. Electronic population analysis on LCAO-MO molecular wave functions I. J Chem Phy 1955;23:1833–1846. 30. Bayly CI, Cieplak P, Cornell WD, Kollman PA. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges—the RESP model. J Phys Chem 1993;97: 10269 –10280. 31. Cornell WD, Cieplak P, Bayly CI, Kollman PA. Application of RESP charges to calculate conformational energies, hydrogenbond energies, and free-energies of solvation. J Am Chem Soc 1993;115:9620 –9631. 32. Gaussian, 98, Revision A. 9. Pittsburgh: Gaussian, Inc.; 1998. 33. Altschul SF, Madden TL, Scha¨ffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;3389 – 3402. 34. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 1994;22:4673– 4680. 35. Nicholas KB, Jr. NHB, Deerfield DWI. GeneDoc: Analysis and visualization of genetic variation. EMBNEW News 1997;4:14.
66
4. MODELISATION DES GLYCOSYLTRANSFERASES PAR HOMOLOGIE 4.1. Les galactosyltransférases impliquées dans la biosynthèse des glycosaminoglycanes La biosynthèse des glycosaminoglycannes débute par la biosynthèse du précurseur tétrasaccharidique lié au groupement hydroxyle d’une serine: GlcAβ1,3Galβ1,3Galβ1,4XylβO-Ser. Ce précurseur sert d’amorce pour l’addition alternée de GlcNAc (ou GalNAc) et d’acide glucuronnique (GlcUA), ainsi donnant naissance à l’héparane sulfate ou au chondroïtine sulfate (Esko and Lindahl, 2001). L’assemblage de ce tétrasaccharide implique le transfert séquentiel d’unités monosaccharidiques à partir des nucleotides-sucres correspondants, sur l’extrémité réductrice de l’oligosaccharide en croissance. Parmi les glycosyltransférases impliquées dans cette biosynthèse, nous nous sommes intéressés à la galactosyltransférase II (β3GalT6) qui catalyse la formation de la liaison Galβ1,3Gal et qui a été récemment caractérisée et séquencée (Bai et al., 2001).
Figure 23: La réaction catalysée par la β3GalT6.
Cette enzyme, qui a un mécanisme d’inversion de configuration, appartient à la famille 31 du système CAZY (Coutinho and Henrissat, 1999) qui, à ce jour, comporte 342 protéines avec 6
67
différentes activités enzymatiques. Il s’agit donc d’une famille assez importante mais à ce jour, aucune structure cristallographique n’est disponible dans ce groupe d'enzymes.
4.2. La modélisation par homologie de β3GalT6 Nous nous sommes appliqués à modéliser par homologie la β3GalT6 pour avoir une structure représentative de la famille GT31. En l’absence de structure tridimensionnelle dans cette famille, la modélisation a été effectuée à partir d’une structure cristallographique d’une autre famille. La recherche de repliement protéique correspondant le mieux à cette séquence a été faite par 3D-PSSM (Kelley et al., 2000) et a montré que le taux de ressemblance le plus haut est entre la β3GalT6 humaine (code GenBank AY050570) (Bai et al., 2001) et la glucuronyltransférase GlcAT-I humaine (code PDB 1FGG) (Pedersen et al., 2000). Le Tableau 4 montre les résultats de l’analyse de séquences.
Organisme H. sapiens
M. musculus
R. norvegicus C. elegans
Fonction β3-GalT1 β3-GalT2 β3-GalT4 β3-GalT5
Query Q9Y5Z6 O43825 O96024 Q9Y2C3
β3-GalT6 β3-GalT7 C1-GalT1 C1-GalT2 β3-GalT1 β3-GalT2
AY050570 Q9NY97 AF155582 AB084170 O54904 O54905
β3-GalT3 β3-GalT4 β3-GalT5 β3-GalT6
O54906 Q9Z0F0 Q9JI67 AY050569
C1-GalT1 β3-GalT4 C1-GalT1 C1-GalT1
AF157963 O88178 AF157962 Q18515
Highest probability match with known crystal structure 1QG8 (7%), 1LLL3 (13%), 1GA8 (11%), 1LZJ (13%) 1QG8 (11%); 1LL3 (11%); 1LZJ (11%); 1OMZ (11%) 1LZJ (15%); 1QG8 (11%) 1QG8 (13%), 1LZJ (13%), 1FGG (14%), 1OMZ (9%), 1LL3 (15%) 1FGG (19%); 1FR8 (17%); 1QG8 (13%) 1QG8 (10%), 1LZJ (13%) 1FR8 (11%); 1QG8 (12%) 1QG8 (9%); 1FGG (8%) 1QG8 (7%); 1GA8 (11%); 1LL3 (13%); 1LZJ (14%) 1QG8 (9%); 1LZJ (12%); 1LL3 (12%); 1OMZ (10%), 1FR8 (13%) 1QG8 (11%); 1LZJ (13%); 1LL3 (12%); 1OMZ (11%) 1QG8 (11%), 1LL3 (17%), 1GA8 (13%) 1FGG (12%); 1QG8 (10%) 1FR8 (18%); 1FGG (17%); 1OMZ (11%); 1QG8 (9%); 1LL3 (13%) 1FOA (13%) 1QG8 (11%); 1GA8 (13%) (no similar glycosyltransferases found) 1LZJ (11%)
Tableau 4: Similarité de séquences entre différents β3-GalT et structures cristallographiques d’autres glycosyltransférases.
L’alignement des deux séquences a été effectué en considérant à la fois (1) l’homologie des séquences [Clustal W (Thompson et al., 1994), BioEdit (Hall, 1999)], (2) la position du site actif et des résidus importants catalytiques, et (3) la comparaison de clustars hydrophobes par la méthode HCA (Hydrophobic Cluster Analysis) (Callebaut et al., 1997). Malgré ces efforts, de nombreuses insertions et délétions de boucles sont présentes dans l’alignement final. Ceci peut être attribué à un taux de similarité entre les séquences relativement faible (19%). 68
Figure 24: L’alignement final de la glucuronyltransférase GlcAT-I humaine (code PDB 1FGG, code SwissProt O94766) et des glycosyltransférases β3GalT6 humaine (code GenBank AY050570) et murine (code GenBank AY050569).
La démarche suivante a consisté en la modélisation de la structure tridimensionnelle par le logiciel COMPOSER (Srinivasan and Blundell, 1993) dans le programme Sybyl . Plusieurs éléments de structures secondaires observés ou prédits sont assez bien conservés et donc ne présentent pas de problèmes pour la modélisation. Malheureusement, l’alignement comporte plusieurs insertions ou délétions majeures qui, en conséquence, ne peuvent pas être modélisées. Nous avons cependant créé un modèle préliminaire qui pourrait être utile plus tard, quand la connaissance sur cette famille de CAZY aura progressé.
69
Figure 25: Modèle préliminaire de la glycosyltransférase humaine β3GalT6.
4.3. Les galactosyltransférases associées aux chloroplastes dans les plantes L’enveloppe des plastes des plantes supérieures est un système membranaire de composition unique. A la différence des autres systèmes membranaires riches en phospholipides, les membranes plastidiales sont pour plus de 80 % constituées de galactolipides (Douce and Joyard, 1990) (monogalactosyldiacylglycérol, MGDG et digalactosyldiacylglycérol, DGDG; Figure 26). La majeure partie des lipides spécifiques de l’enveloppe plastidiale, et en particulier les galactolipides, sont synthétisés par des enzymes présentes dans cet organite (Marechal et al., 2002).
Figure 26: Galactolipides des membranes plastidiales. Le MGDG et le DGDG sont formés par l'assemblage d'un squellete glycérol, estérifié aux positions sn-1 et sn-2 par des acides gras et un ou deux galactoses en position sn3.
70
Un plaste, organite typiquement végétal, a été récemment décrit (Kohler et al., 1997) chez les Apicomplexes, des protistes classés traditionnellement dans le règne animal. Le phylum des Apicomplexes regroupe des parasites parmi les plus importants de l'homme, e.g. Toxoplasma gondii, vecteur de la toxoplasmose et Plasmodium falciparium, agent du paludisme. Chez ces parasites, le plaste – appelé apicoplaste – est vital : s’il est dégradé, la mort cellulaire suivra bientôt (Marechal and Cesbron-Delauw, 2001). Pour tous les plastes étudiés a ce jour, l'enveloppe limitante synthétise les lipides majeurs de l'organite, i.e. le MGDG et le DGDG (Marechal, 2002).
4.4. La modélisation par homologie de MGDG synthase La synthèse du MGDG est catalysée par une UDP-galactose:1,2-diacylglycérol 3-β-Dgalactosyltransférase (EC 2.4.1.46), appelée aussi MGDG synthase. Cette enzyme appartient à la famille GT 28 du système CAZY est catalyse le transfert d’un groupement α-D-galactose de l’UDP-galactose au 1,2-diacylglycérol (Figure 27). La MGDG synthase, enzyme clef de la biogenèse plastidiale, représente donc une cible potentielle pour des traitements thérapeutiques de type herbicide et antiparasitaire. La séquence codant pour une MGDG synthase n'a pas été identifiée dans les génômes des parasites. Pourtant, la synthèse de MGDG mesurée chez les plantes et les apicomplexes est inhibée par les ions Zn2+ de façon similaire (Marechal et al., 2002).
Figure 27: La réaction catalysée par la MGDG synthase. Le galactose est transféré de l’UDP-galactose au 1,2diacylglycérol. La configuration du carbone α du galactose transférée est inversé (un liaison β-1→sn-3 est formée), donc la MGDG synthase est une glycosyltransférase à mécanisme d’inversion.
71
La structure cristallographique de MGDG synthase n’est pas encore disponible. La protéine est une cible prometteuse pour des recherches d’herbicides, il est donc important de progresser dans la connaissance de la structure de la MGDG synthase afin de concevoir des inhibiteurs. A ce jour, 7 gènes ont été identifiés, trois chez Arabidopsis thaliana (MGD1, MGD2, MGD3), un chez le concombre, l’épinard, le tabac et le soja. Nous nous sommes attachés à modéliser la MGDG synthase de l’épinard. La structure de la MGD1 de Arabidopsis thaliana a été modélisée récemment à partir de MurG, une glycosyltransférase bactérienne de la famille 28 (Jeanneau, 2003). Malgré le faible taux d'identité (20%) entre les séquences de MurG et MGD1, les régions correspondants aux éléments de structures secondaires observés ou prédits sont très bien conservées et ne présentent pas d'insertion ou de délétion majeures. Le modèle a donc été construit en s'appuyant sur la ressemblance de ces régions et sur la conservation des résidus impliqués dans la reconnaissance du nucléotide-sucre. L’UDP-Gal a été placé dans le site de liaison du donneur dans une conformation et une orientation similaire à celles observées dans MurG en complexe avec l’UDP-GlcNAc. Ne disposant pas de structure de MurG en complexe avec son accepteur, il n'a pas été possible de prédire le site de liaison de l'accepteur. La structure de la MGD1 de Spinacia oleracea (soMGD1) a été modélisée en utilisant le modèle de atMGD1. Les séquences de soMGD1 et atMGD1 montrent un haut degré d'homologie (71 % d'identité), donc le modèle final de soMGD1 ressemble beaucoup à celui de at MGD1.
72
ARTICLE III
Botte, C., Jeanneau, C., Snajdrova, L., Bastien, O., Imberty, A., Breton, C. & Marechal, E. (2005). Molecular modelling and site directed mutagenesis of plant chloroplast MGDG synthase reveal critical residues for activity. J Biol Chem. 280, 34691-34701. Résumé : Le monogalactosyldiacylglycérol (MGDG), le lipide majeur dans les plastes de plantes et d'algues, est synthétisé par la MGD (ou MGDG synthase), une glycosyltransférase dimérique et membranaire, qui se trouve dans l'enveloppe plastidiale et qui catalyse le transfert d'un groupement galactose du glucide donneur (UDP-Gal) à la molécule d'accepteur (diacylglycérole). Bien que cette enzyme soit essentielle pour la biogenèse, et donc une cible prometteuse pour la conception d’herbicides, aucune information structurale n'est disponible. Les monomères de MGD montrent une forte similarité de séquence avec la MurG, une glycosyltransférase bactérienne qui catalyse le transfert d'un N-acétyl-glucosamine sur le peptidoglycanne. En s’appuyant sur la structure cristallographique de MurG de Escherichia coli, on a proposé un modèle pour la MGD de Spinacia oleracea. Cette prédiction structurale a été soutenue par les analyses de mutagenèse dirigée. L'architecture prédite du monomère est un repliement protéique constitué de deux domaines Rossmann. Le site de liaison pour l’UDP-galactose a été prédit dans la crevasse qui sépare les deux domaines Rossmann. Deux segments peptidiques de MGD (les boucles {β2-α2} et {β6-β7}) n’ont pas d’équivalents dans MurG, donc leurs structures n’ont pas pu être prédites. En confrontant le modèle obtenu avec l'information phylogénétique et biochimique, on a pu propose que la boucle {β2-α2} du domaine N-terminal soit impliquée dans la liaison du diacylglycérol.
73
74
THE JOURNAL OF BIOLOGICAL CHEMISTRY VOL. 280, NO. 41, pp. 34691–34701, October 14, 2005 © 2005 by The American Society for Biochemistry and Molecular Biology, Inc. Printed in the U.S.A.
Molecular Modeling and Site-directed Mutagenesis of Plant Chloroplast Monogalactosyldiacylglycerol Synthase Reveal Critical Residues for Activity*□ S
Received for publication, May 23, 2005, and in revised form, June 29, 2005 Published, JBC Papers in Press, July 11, 2005, DOI 10.1074/jbc.M505622200
Cyrille Botte´‡1, Charlotte Jeanneau§1, Lenka Snajdrova§, Olivier Bastien‡¶, Anne Imberty§, Christelle Breton§, and Eric Mare´chal‡2 From the ‡UMR 5168 CNRS-Commissariat a` l’Energie Atomique-Institut National de la Recherche Agronomique, Universite´ Joseph Fourier, De´partement Re´ponse et Dynamique Cellulaires, 17 rue des Martyrs, Commissariat a` l’Energie Atomique, 38054 Grenoble cedex 9, the §Centre de Recherche sur les Macromole´cules Ve´ge´tales-CNRS, 38041 Grenoble, and the ¶Gene-IT, 92500 Rueil-Malmaison, France Monogalactosyldiacylglycerol (MGDG), the major lipid of plant and algal plastids, is synthesized by MGD (or MGDG synthase), a dimeric and membrane-bound glycosyltransferase of the plastid envelope that catalyzes the transfer of a galactosyl group from a UDP-galactose donor onto a diacylglycerol acceptor. Although this enzyme is essential for biogenesis, and therefore an interesting target for herbicide design, no structural information is available. MGD monomers share sequence similarity with MURG, a bacterial glycosyltransferase catalyzing the transfer of N-acetyl-glucosamine on Lipid 1. Using the x-ray structure of Escherichia coli MURG as a template, we computed a model for the fold of Spinacia oleracea MGD. This structural prediction was supported by site-directed mutagenesis analyses. The predicted monomer architecture is a double Rossmann fold. The binding site for UDP-galactose was predicted in the cleft separating the two Rossmann folds. Two short segments of MGD (具2–␣2典 and 具6 –7典 loops) have no counterparts in MURG, and their structure could not be determined. Combining the obtained model with phylogenetic and biochemical information, we collected evidence supporting the 具2–␣2典 loop in the N-domain as likely to be involved in diacylglycerol binding. Additionally, the monotopic insertion of MGD in one membrane leaflet of the plastid envelope occurs very likely at the level of hydrophobic amino acids of the N-terminal domain.
Membranes of plant plastids have a unique lipid composition. Whereas phospholipids are the major polar lipids of cellular membranes, plastids contain up to 80% galactosylglycerolipids, i.e. 1,2-diacyl-3-O-(-Dgalactopyranosyl)-sn-glycerol (called monogalactosyldiacylglycerol or MGDG)3 and 1,2-diacyl-3-O-(␣-D-galactopyranosyl-(136)--D-galacto-
* This work was supported by Ministe`re de la Recherche and by Oseo Agence Nationale de la Valorisation de la Recherche Rhoˆne-Alpes. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. □ S The on-line version of this article (available at http://www.jbc.org) contains supplemental Fig. 1S. 1 Both authors contributed equally to this work. 2 To whom correspondence should be addressed. Tel.: 33-4-38-78-49-85; Fax: 33-4-3878-50-91; E-mail:
[email protected]. 3 The abbreviations used are: MGDG, monogalactosyldiacylglycerol; DAG, 1,2-diacylglycerol-sn-glycerol; DGDG, digalactosyldiacylglycerol; MGD, MGDG synthase; CHAPS, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid; MOPS, 4-morpholinepropanesulfonic acid; WT, wild-type; UDP-gal, UDP-galactose; MGlcDG, 1,2-diacyl-3-O-(-D-glucopyranosyl)-sn-glycerol; MGDG, 1,2-diacyl-3-O-(-D-galactopyranosyl)-sn-glycerol; MURG, UDP-␣-D-(N-acetyl)-glucosamine:N-acetylmuramyl(pentapeptide)pyrophosphoryl-undecaprenol-4-D-(N-acetyl)-glucosaminyl transferase.
75
OCTOBER 14, 2005 • VOLUME 280 • NUMBER 41
pyranosyl)-sn-glycerol (called digalactosyldiacylglycerol or DGDG) (1, 2). Because chloroplast thylakoids constitute the largest membrane surface in photosynthetic plants and algae, galactosylglycerides are considered as the most abundant polar lipids in the biosphere (3). MGDG is generated by transfer of a -galactosyl moiety from a water-soluble UDP-␣-D-galactose (UDP-Gal) donor onto the sn-3 position of the hydrophobic 1,2-diacyl-sn-glycerol (DAG) acceptor (4, 5). This reaction is catalyzed by a UDP-␣-D-galactose:1,2-diacylsn-glycerol 3--D-galactosyltransferase or MGDG synthase (TABLE ONE). Plastids derive from a single primary endosymbiosis between a cyanobacterium and a eukaryotic host (6, 7). Cyanobacteria share with plastids this unique galactolipid composition (8). However, synthesis of MGDG in cyanobacteria is a two-step process (9, 10). A first transfer of a -glucosyl moiety from a UDP-␣-D-glucose (UDP-Glc) donor onto DAG generates monoglucosyldiacylgycerol. Then an epimerase converts the -glucosyl polar head into -galactosyl, producing MGDG (TABLE ONE). Evolution from a two-enzyme MGDG synthesis in cyanobacteria to the one-enzyme synthesis in algae and plants is one of the major puzzling questions to understand the history of plastids. As soon as the first MGDG synthase (csMGD1) was molecularly characterized in cucumber (Cucumis sativa), the search for potential cyanobacterial homologues was conducted (11). However, based on the primary sequences, no gene candidate for cyanobacterial galactolipid synthesis could be identified (TABLE ONE). It is possible that the MGDG synthetic machinery is phylogenetically unrelated between cyanobacteria and eukaryotes. The picture is likely more complicated, because the MGDG synthase evolution cannot be fully traced in eukaryotes either. A collection of MGDG synthase (MGD) genes is now well established in Angiosperms, molecularly characterized in spinach (Spinacia oleracea) (12), thale cress (Arabidopsis thaliana) (13), and rice (Oriza sativa) (14). TABLE ONE gives a summary of what is currently known about MGDG synthases and related enzymes involved in glycolipid syntheses. MGD orthologues have been identified in the moss Physcomitrella patens, the green algae Chlamydomonas reinhardtii and Prototheca wickerhamii, and the red algae Cyanidioschyzon merolae. In the case of glaucophytes like Cyanophora paradoxa, whose plastids preserved a cyanobacterial peptidoglycan wall, it is not clear yet if MGDG is synthesized owing to a one-enzyme or a two-enzyme process. In eukaryotes that contain complex plastids inherited from a secondary endosymbiosis (Euglenids, Chlorarachniophytes, Cryptomonads, Haptophytes, Heterokonts, Dinoflagellates, and Apicomplexa) (7), MGD orthologues could only be found in a Heterokont, the diatom Thalassiosira pseudonana. In Euglenids (Euglena gracilis) or Apicomplexans for which we have abun-
JOURNAL OF BIOLOGICAL CHEMISTRY
34691
34692 JOURNAL OF BIOLOGICAL CHEMISTRY 5.3.1.x
2.4.1.157
EC number
2.4.1.46
2.4.1.46
2.4.1.46
2.4.1.46
2.4.1.157
2.4.1.227
UDP-␣-D-galactose: 1,2-diacylsn-glycerol 3-Dgalactosyltransferase
UDP-␣-D-galactose: 1,2-diacylsn-glycerol 3-Dgalactosyltransferase
UDP-␣-D-galactose: 1,2-diacylsn-glycerol 3-Dgalactosyltransferase
UDP-␣-D-galactose: 1,2-diacylsn-glycerol 3-Dgalactosyltransferase
UDP-␣-D-glucose: 1,2-diacylsn-glycerol 3␣-D-glucosyltransferase
UDP-␣-D-(N-acetyl)-glucosamine: N-acetylmuramyl(pentapeptide) pyrophosphoryl undecaprenol 4-D(N-acetyl)-glucosaminyl transferase
Synthesis of MGDG
Synthesis of MGDG
Synthesis of MGDG
Synthesis of MGDG
Synthesis of MGlcDG
Synthesis of lipid II
CAZy family
Threedimensional structure
Other than GT-28a dnb
UDP-␣-D-Glc
Synthesis of chloroplast MGDG in glaucophytes (i.e. Cyanophora)
No gene candidate
No gene candidate
Sugar donor
MGD
GT-28
This work
UDP-␣-D-Gal
MGD
GT-28
UDP-␣-D-Gal
UDP-␣-D-Gal
MGS
GT-4e
Model
Synthesis of MGlcDG in bacteria
Other than GT-28d
UDP-␣-D-Glc
UDP-␣-D-Gal
Exp. and sequence similarity
MURG
GT-28
X-ray
UDP-␣-D-GlcNAc
Synthesis of Lipid II peptidoglycan in bacteria and cyanobacteria
Exp. and sequence similarity
No gene candidate
Synthesis of MGDG in Apicomplexa (i.e. Plasmodium and Toxoplasma)
No gene candidate
Synthesis of MGDG in Euglenids (i.e. Euglena)
Sequence similarity
Synthesis of chloroplast MGDG in Heterokonts (i.e. Thalassiosira)
exp. and sequence similarityc
c
b
In the Synechocystis PCC 6803 complete genome, the only recorded amino acid sequence of the CAZy GT-28 family is the cyanobacterial MURG homologue. dn, does not apply. Exp., experimental evidence. d No GT-28 candidate in the Plasmodium or Toxoplasma predicted open reading frames. e Model from Ref. 60.
a
Name
Synthesis of MGDG in cyanobacteria (i.e. Synechocystis)
Molecular characterization
dn
DAG
Acceptor
(N-acetyl)muramyl (pentapeptide)pyrophosphorylundecaprenol (Lipid I)
DAG
DAG
DAG
DAG
DAG
Synthesis of chloroplast MGDG in plants (i.e. Physcomitrella, Spinacia, and Arabidopsis), green algae (i.e. Chlamydomonas and Prototheca), and red algae (Cyanidioschyzon)
?
1,2-Diacyl-3-O-(D-glucopyranosyl)-snglycerol glucosyl isomerase
Synthesis of MGDG
Synthesis of MGDG
UDP-␣-D-glucose: 1,2-diacylsn-glycerol 3-D-glucosyltransferase
Activity experimentally detected
Synthesis of MGlcDG
Function
Summary of plant plastid MGDG synthases and related enzymes involved in glycolipid syntheses in algae, cyanobacteria, and bacteria
TABLE ONE
-(N-acetyl)glucosaminyl (N-acetyl)muramyl (pentapeptide) pyrophosphorylundecaprenol (Lipid II)
MGlcDG
MGDG
MGDG
MGDG
MGDG
MGDG
MGDG
MGlcDG
Product
MGDG Synthase Molecular Modeling
76
VOLUME 280 • NUMBER 41 • OCTOBER 14, 2005
MGDG Synthase Molecular Modeling dant genomic information (Plasmodium falciparum and Toxoplasma gondii) no MGD candidate gene could be identified using classic bioinformatic tools. Lack of MGD orthologue in these organisms is surprising, because chloroplastic galactolipid syntheses could be assessed in Euglena (15) or in Toxoplasma and Plasmodium (16). Because fold is more conserved than sequence by evolution, three-dimensional structure comparison is a powerful means to establish relatedness of proteins (17), even in the absence of sequence similarity. Some clues to comprehend galactolipid synthesis in cyanobacteria, glaucophytes, euglenids, or apicomplexans might therefore benefit from the knowledge of the molecular structure of Angiosperm MGD. Synthesis of MGDG is a key process for the biogenesis of plastid membranes, particularly for thylakoid expansion (12, 13, 18). MGDG is also involved in the functional integrity of the photosynthetic machinery (19). Arabidopsis mutants containing half the normal MGDG amount are consistently severely affected, with defects in chloroplast development, impairment of photosynthesis, and an overall chlorotic phenotype. MGDG is the substrate for another essential lipid, DGDG (20, 21), which is exported to plasma membrane (22) and mitochondria (23) under phosphate deprivation, likely to replace missing phosphatidylcholine (24, 25). In addition to plastid membranes, MGDG synthesis is therefore essential for the biogenesis of most cell membranes. Taken together, the roles played by MGDG are vital and imply that MGD enzymes are potent targets for herbicide screening (26). Therefore, the MGD three-dimensional structure would also be an important starting point to dissect the MGD molecular mechanism and orientate the rational design of herbicide candidates. In Angiosperms, MGDG production is restricted to the two membranes of the envelope that surrounds plastids (27–29). Our current knowledge about MGDG synthase function, structure, and membrane topology is established from the enzymatic activity of purified chloroplast envelope from spinach (30, 31), latter attributed to soMGD1 (12). In A. thaliana, synthesis of MGDG is catalyzed by a family of three proteins (atMGD1, atMGD2, and atMGD3) of which activity and subcellular targeting to plastid were analyzed in depth (13, 18). Unfortunately, our attempts to crystallize MGD proteins from either of these models, spinach or Arabidopsis, after functional expression in Escherichia coli (26), were unsuccessful. Glycosyltransferases have been hierarchically classified on the basis of sequence similarities and stereochemistry of the reactions (32). Despite their number and functional diversity, glycosyltransferases fall into two major protein fold superfamilies named GT-A and GT-B, respectively (32, 33). This classification is constantly evolving and updated (CAZy classification available at afmb.cnrs-mrs.fr/CAZY/) (34). From sequence alignments and fold recognition, MGD enzymes are members of the GT-B superfamily. The closest homologues of MGD proteins were found to be MURG glycosyltransferases (11), with whom they are classified in the GT-28 family of the CAZy systematics. To date the GT-28 family does only contain one three-dimensional structure, i.e. that of E. coli MURG (ecMURG) (35, 36). MURG catalyzes the last intracellular step in bacterial and cyanobacterial peptidoglycan biosynthesis, i.e. transfer of an N-acetyl--D-glucosaminyl from a UDP-␣-DN-acetyl-glucosamine (UDP-GlcNAc) donor onto lipid 1 (Table I). For both MGD and MURG activities, a -glycosyl moiety is transferred from a UDP-␣-sugar donor to a hydrophobic acceptor, after an (␣3) inversion of the anomeric configuration of the sugar (Table I). It is therefore tempting to suppose that MGD from algae and plants derived from a MURG sequence of the ancestral symbiotic cyanobacteria. Here, we used the ecMURG structure as a template for secondary structure comparison and fold prediction. This approach was combined
77
OCTOBER 14, 2005 • VOLUME 280 • NUMBER 41
with enzymological data and phylogenetical comparisons to deduce molecular models for spinach soMGD1 and Arabidopsis atMGD1, atMGD2, and atMGD3, focusing particularly on the active site. The soMGD1 model was then challenged and sustained by site-directed mutagenesis analyses.
MATERIALS AND METHODS Materials—Unlabeled and 14C-labeled (11.0 GBq.mmol⫺1) UDP-Gal were obtained from Sigma and New England Nuclear, respectively. 1,2Dioleoyl-sn-glycerol (DAG), isopropyl--D-thiogalactopyranoside, phosphatidylglycerol, and CHAPS were purchased from Sigma. Wild-type soMGD1 Expression Vector and Site-directed Mutagenesis via PCR—The soMGD1 sequence used in this study was inserted in NdeI-BamHI cloning site of the pET-y3a vector (12). The full-length wild-type (WT) soMGD1 refers to the coding sequence truncated of its predicted chloroplastic transit peptides (12), i.e. from leucine 99 to alanine 522. Mutations were introduced into the cloned soMGD1 by using the QuikChange site-directed mutagenesis kit (Stratagene). Mutations were confirmed by sequencing (Genome Express, Grenoble). Recombinant Wild-type and Mutated soMGD1 Functional Expression in E. coli at 28 °C —Isolated colonies of transfected E. coli (BL21DE3) were inoculated in LB medium (2.5 ml, 100 g/ml carbenicillin) and grown at 37 °C. When A600 reached 0.5, the cell suspension was transferred to 15 ml of LB medium (100 g/ml carbenicillin) and grown at 37 °C until A600 reached 0.5. Cells were then transferred to 400 ml of LB medium (100 g/ml carbenicillin) and grown until A600 reached 0.5. Isopropylthio--D-galactopyranoside (0.4 mM) was subsequently added to induce soMGD1 expression, and the suspension was incubated at 28 °C for 4 h. Cells were harvested by centrifugation and stored at ⫺20 °C. Detergent Solubilization of Wild-type and Mutated soMGD1 and Purification by Hydroxyapatite Chromatography —All the operations were carried out at 4 °C. After expression of soMGD1, bacteria pellets (1 mg of proteins) were incubated for 30 min in 1 ml of medium A (6 mM CHAPS, 1 mM dithiothreitol, 10 mM MOPS-KOH, pH 7.8) containing 50 mM KH2PO4/K2HPO4. The mixture was centrifuged for 30 min at 7,000 ⫻ g. The pellets containing inclusion bodies of improperly folded and inactive soMGD1 polypeptides (26) were discarded. The supernatants, containing soMGD1 enzymes extracted from bacterial membranes by CHAPS and solubilized in mixed micelles (12, 13, 26), were loaded onto the top of a 5- ⫻ 15-mm column containing 500 l of a Hydroxyapatite-Ultrogel (LKD) matrix equilibrated with buffer A, at a 1 ml/min flow rate (37). The column was washed with 2 ml of buffer A. The matrix-bound soMGD1 wild-type and mutated proteins were eluted with 2 ml of 500 mM KH2PO4/K2HPO4 in buffer A. Fractions were collected and used for enzymatic assays and protein determination. MGDG Synthase Enzymatic Assay—Enzyme activity was assayed in mixed micelles at 25 °C (30). Phosphatidylglycerol (1.3 mM) and DAG (160 M) dissolved in chloroform were first introduced into glass tubes. After evaporation of the solvent under a stream of argon, 300 l of incubation medium containing 4.5 mM CHAPS, 1 mM dithiothreitol, 250 mM KCl, 250 mM KH2PO4/K2HPO4, and 10 mM MOPS-KOH, pH 7.8, and purified soMGD1 were added. The mixture was mixed vigorously and kept 30 min at 25 °C for equilibration of mixed micelles. The reaction was then started by addition of 1 mM UDP-[14C]galactose (37 Bq.mol⫺1) and stopped by addition of chloroform/methanol (1:2, v/v). The lipids were subsequently extracted (38), and the radioactivity of the 14 C-labeled MGDG ultimately produced was determined by liquid scintillation counting. Activity is expressed in micromoles of incorporated galactose.h⫺1.mg protein⫺1.
JOURNAL OF BIOLOGICAL CHEMISTRY
34693
MGDG Synthase Molecular Modeling MGD and MURG Sequences—In this study, twelve MGD and eight MURG sequences were used as reference sets for sequence analyses and structure model predictions. Each sequence is given with its accession reference in the source data base. Annotated MGD sequences from representative Angiosperms, i.e. eight in dicots, S. oleracea, soMGD1 (CAB56218); A. thaliana, atMGD1 (BAB12042), atMGD2 (T52269), and atMGD3 (BAB12041); C. sativa, csMGD1 (AAC49624.1), Nicotiana tabacum, ntMGD1 (BAB11980), and Glycine max, gmMGD1 (BAB11979); and two in the monocot O. sativa, osMGD1 (BAD33425) and osMGD2 (XM_481404), were obtained via the National Center for Biotechnology Information (NCBI) web site (www.ncbi.nlm.nih.gov/, February of 2005). Full-length sequence from the moss P. patens, ppMGD was deduced from clustered expressed sequence tag (EST) sequences (Php_AX155049, Php_dbEST_Id: 10946475_Frame-2 and Php_AX150691_Frame-3) obtained via the Moss Genome Initiative web site (www.leeds.moss.ac.uk, August of 2004). Partial sequence from the green alga C. reinhardtii MGD, crMGD (partial, C_21260001) was obtained via the ChlamyDB website (www.chlamy.org/chlamydb.html). Partial sequence from the non-photosynthetic green alga P. wickerhamii, pwMGD (partial, AAV65358) was obtained via the NCBI website. Annotated MGD sequence from the red alga C. merolae, cmMGD (#3974) was obtained via the Cyanidioschyzon Genome Project web site (merolae.biol.s.u-tokyo.ac.jp/). The three MGD sequences from the diatom T. pseudonana, tpMGD1 (full-length; Thp_grail.23.172.1 and Thp_newV2.0.genewise.23.85.1), tpMGD2 (partial, Thp_grail. 120.10.1), and tpMGD3 (partial, genewise.89.116.1) were obtained via the DOE Joint Genome Institute (genome.jgi-psf.org/thaps1/thaps1.home.html). The MURG sequences of seven bacteria, i.e. E. coli, ecMURG (CAA38867), Bacillus subtilis, bsMURG (P37585), Mycobacterium avium, maMURG (NP_960831), Vibrio vulnificus, vvMURG (Q7MNV1), Listeria innocua, liMURG (NP_471475), Lactococcus lactis, llMURG (NP_267745), and Bartonella bacilliformis, bbMURG (AAT38530), and one cyanobacteria, Synechocystis sp. PCC 6803, syMURG (NP_442963), were downloaded from the NCBI web site. These sequences were obtained by BLASTP similarity searches and selected for their representation of prokaryotic phylogenetic diversity. Phylogenetic Reconstruction—MGD and MURG phylogeny was inferred from the 22 sequences described under “Materials and Methods.” Distance between sequences was computed using the TULIP methods (39, 40). Alignment was achieved with the Smith-Waterman method and the BLOSUM 62 scoring matrix, using the Biofacet package from Gene-IT, France (41). We computed estimated z-scores with 2000 sequence shuffling (39, 42). Distance between sequences was then calculated using the z-score matrix (40). Tree topology was generated using the Neighbor-Joining algorithm (43). Homology Modeling—Homology modeling of soMGD1 was based on the crystal structure of ecMURG (36) using sequence alignment described in this report. ␣-Helix and -strand distributions in the MGD sequences were predicted with several secondary structure prediction servers (www.sbg.bio.ic.ac.uk/⬃3dpssm/ and www.npsa-pbil.ibcp.fr/ cgi-bin) (44, 45). Hydrophobic cluster analysis method was used to refine the sequence alignment (46). Hydrophobic cluster analysis is a graphical method based on the detection and comparison of hydrophobic clusters that are presumed to correspond to the regular secondary structure elements constituting the architecture of globular proteins. Structurally conserved regions, corresponding to secondary structure elements, were built with the COMPOSER homology modeling module (47) of the software Sybyl (Tripos Inc., St. Louis, MO). Loops were subsequently modeled from a general non-redundant protein fold data
78
34694 JOURNAL OF BIOLOGICAL CHEMISTRY
base in COMPOSER. Two protein segments (Asp69–Asn80 and Thr296– Ile305 in the soMGD1 sequence, called 具2–␣2典 and具6 –7典 loops) could not be modeled and were not considered in the final model. Minor steric conflicts and local conformational problems were analyzed with PROCHECK (48), and local optimization was performed in several cycles. The final model includes hydrogen atoms and partial atomic charges derived using the Pullman procedure. Docking of Nucleotide Sugar in the Binding Site—The UDP-Gal molecular structure was docked into the proposed active site of soMGD1 in an orientation and conformation similar to that of UDPGlcNAc in the ecMURG crystal structure (36). Several cycles of energy minimization were performed to optimize both the geometry of the ligand and the interacting protein side chains in the binding site and its vicinity. Energy calculations were performed using the Tripos force field (49) in the Sybyl package together with energy parameters derived for carbohydrates (50) and for nucleotide sugars (51).
RESULTS Homology of MGD Sequences throughout Evolution—The fully sequenced genome of A. thaliana contains three MGD proteins, atMGD1, atMGD2, and atMGD3 (18). These isoforms could be phylogenetically grouped into two types, the A-type (atMGD1) and the B-type (atMGD2 and atMGD3), having different substrate specificities with regard to DAG molecular species and different physiological functions. Awai et al. (13) showed that non-conserved residues could serve as signatures for A- and B-types. On a primary sequence basis, A-type is characterized by a canonical chloroplast transit peptide of ⬃100 amino acids, whereas the B-type exhibits a shorter addressing sequence of ⬃30 residues. Numerous full-length and cloned A-type MGD sequences are characterized in other Angiosperms (in S. oleracea, soMGD1; C. sativa, csMGD1; N. tabacum, ntMGD1; and G. max, gmMGD1), probably because this type is devoted to thylakoid biogenesis and is consistently highly expressed. By contrast, Awai et al. (13) could only trace B-types in gene fragments from unfinished plant genomes. It is now possible to describe in the sequenced genome of another plant model, i.e. O. sativa (rice), a second example of a multigenic family comprising an A-type (osMGD1) and a B-type (osMGD2). We sought the occurrence of MGDG synthases in other groups and identified MGD sequences in the moss P. patens, ppMGD (full-length); the green algae C. reinhardtii, crMGD (fragment) and P. wickerhamii, pwMGD (fragment); and the red alga C. merolae, cmMGD (full-length). In the complete genome sequence of C. merolae, only one MGD gene is predicted. In the diatom T. pseudonana, three different MGD sequences could be detected, tpMGD1 (full-length), tpMGD2 (fragment), and tpMGD3 (fragment). In diatoms, plastids derive from a secondary endosymbiosis (7) and are surrounded by three membranes, the outermost being connected to the endomembranes (reticulum). Consistently, the full-length tpMGD1 exhibits a bipartite N terminus with a predicted signal peptide (targeting to endomembranes) upstream from a chloroplast transit peptide (targeting to chloroplast envelope membranes). Because we do not know the complete sequences of tpMGD2 and tpMGD3, the precise subcellular localization of tpMGD1, tpMGD2, and tpMGD3 and their physiological functions, the numbering of tpMGD sequences was set arbitrarily and is not related to Angiosperm A- or B-types. Multiple alignments built with MGD sequences from distant species (see supplementary Fig. 1) highlight the residues conserved throughout evolution. No amino acid conservation is detected in N-terminal chloroplast-addressing sequences. Rather strong conservational pressure is noticed in most of the mature proteins, with highly conserved domains
VOLUME 280 • NUMBER 41 • OCTOBER 14, 2005
MGDG Synthase Molecular Modeling
FIGURE 1. Phylogeny of MGD and MURG. Fulllength sequences of MGDG synthases from five Angiosperms (at, A. thaliana; cs, C. sativus; gm, G. max; nt, N. tabacum; and so, S. oleracea), a Bryophyte (pp, P. patens), a Rhodophyte (cm, C. merolae), and a Heterokont (tp, T. pseudonana) and eight prokaryotic sequences of MURG enzymes (bb, Bartonella bacilliformins; bs, B. subtilis; li, L. innocua; ll, L. lactis; ma, Mycobacterium avium; sy, Synechocystis sp. PCC 6803; vv, V. vulnificus) were used to build a TULIP tree according to a previous study (40). The obtained phylogenetic tree shows related groups corresponding to MGD sequences from primary plastids of green lineage (green), MGD sequences from primary and secondary plastids of red lineage (red), and MURG sequences (gray). The MURG/MGD phylogenic discontinuity correlates with a functional discontinuity for substrates (Lipid 1/DAG and UDP-GlcNac/UDP-Gal) and the red lineage/green lineage discontinuity correlates with a functional discontinuity in used DAG molecular species (DAG of C16-C18 acyllength/DAG of C16-C18-C20 acyl-length). In Angiosperms, the type A and B clusters also correspond to differences in enzyme specificity for DAG molecular species. MGD and MURG segments, for which local similarity profiles correlate with these discontinuities, were analyzed as possible sites for DAG binding (see supplemental Fig. 1S).
such as two G-loops (G-loop 1, SDTGGGHRASA, and G-loop 2, VLXXGGG(E/D)GXG). In our attempts to predict a MGD structure, we used the residue conservation profile observed in eukaryotic MGD sequences to identify amino acids that might be likely important for catalysis, particularly in the N-domain of the protein that diverges from the N-domain of MURG and that is likely involved in acceptor binding (see below). Comparative Analysis of the Sequence and Substrate Evolution in MGD and MURG Phylogeny—Fig. 1 shows the phylogenetic tree built with ten MGD mature proteins identified in plants and protists and eight MURG sequences sampled in bacteria and cyanobacteria. Phylogeny was reconstructed using the TULIP tree method (39, 40). This tree recovers the A- and B-types MGD from Angiosperms in two clusters. The Bryophyte sequence (ppMGD) shares features with Angiosperm A-type. When computed with partial sequences from C. reinhardtii (crMGD) and P. wickerhamii (pwMGD), the green algae MGD proteins branch with the Angiosperms plus Bryophyte clade (not shown), indicating the close relationship between MGD of green algae and plants (Viridiplantae), in a “green lineage” subgroup. The red algal sequence from C. merolae (cmMGD) is set between Viridiplantae and Heterokonts, consistently with the origin of Heterokonts plastids deriving from a secondary endosymbiosis with a red alga (7). Thus, the phylogeny of MGD comprises a second subgroup, corresponding to the “red lineage” (Fig. 1). This red lineage subgroup of MGD sequences is more closely related to MURG sequences (Fig. 1). In our study, we sought to connect structural and functional information from ecMURG with functional information from soMGD1 to deduce structural features for this enzyme. Discontinuities between MURG and red lineage MGD and between red lineage and green lineage MGD might provide precious information, particularly regarding the acceptor site. In addition to the acceptor discontinuity that characterizes MURG (Lipid 1) and MGD (DAG), there is a subtle discontinuity in the acyl-species used to synthesize MGDG in plastids from chloroplasts
79
OCTOBER 14, 2005 • VOLUME 280 • NUMBER 41
(plants and green algae) and rhodoplasts (red algae), i.e. DAG of C16C18 acyl-length in plants (52) and green algae (53–55) and DAG of C16-C18-C20 acyl-length in red algae (55, 56). In Angiosperms, Awai et al. (13) had shown that A- and B-types also had different enzymological specificities regarding DAG molecular species, i.e. B-type enzymes exhibited a higher affinity for C18:2/C18:2 DAG than for 18:1/16:0, whereas A-type MGD did not exhibit different selectivity for these substrates. Looking back at the multiple alignment of MGD sequences (supplementary Fig. 1S), we sought therefore regions that might correlate with such phylogenic and acceptor discontinuities. Close to G-loop 1, we notice a DXWX(E/D)XXXWP segment of ⬃10 amino acids in Angiosperms and Bryophyte (supplemental Fig. 1) that shows a DXWKEYXGWP profile in Angiosperm B-type, a DLWX(E/D)HTPWP profile in Angiosperm A-type, and a divergent content in red algae and Heterokonts and that appears as an extra domain when compared with MURG (see Fig. 2, amino acid segment called 具2–␣2典 loop, see below). MGD Fold Prediction Using ecMURG as Template—The x-ray structure of ecMURG free enzyme (35) or in complex with its UDP-GlcNAc substrate (36) consists of two distinct domains similar in size, a N-domain and a C-domain, each containing a three layer ␣//␣ sandwich typical of a Rossmann fold. The secondary structure elements in the N-domain are numbered b1/␣1/2 . . . ␣5/6, and those in the C-domain ⬘1/␣⬘1/⬘2 . . . ␣⬘5/⬘6 (Fig. 2). Classic Rossmann folds contain conserved glycine-rich motifs, with the consensus GXGXXG. (5457 indicated G-loop 1 and G-loop 2 in Fig. 2. The N- and C-domains are joined by a short linker between 7 of the N-domain and the ␣-link of the C-domain, defining the floor of a deep separating cleft. The individual N- and C-domains are similar in the presence and absence of substrate; however, there is a change in the relative orientation around residue Asn161 of ecMURG (36) (hinge position indicated in Fig. 2.) The ␣⬘6 – 8 helices at the C terminus of the sequence return to the N-domain floor and stabilize the double-Rossmann fold.
JOURNAL OF BIOLOGICAL CHEMISTRY
34695
MGDG Synthase Molecular Modeling
FIGURE 2. Multiple alignment of the MGD sequences from S. oleracea and A. thaliana with the E. coli MURG sequence. Sequences were aligned with ClustalW followed by manual refinement using the hydrophobic cluster analysis method. Predicted secondary structures (␣-helices and -strands) are indicated. Alignment is given starting from the first -strand. N- and C-domains of the MURG double Rossmann fold are indicated. Gaps in aligned MURg and MGD are shown with a gray line. MURG hinge position between the N- and C-domains and the substrate binding sites identified from structural studies (35, 36) are indicated. Black boxes, conserved residues; gray boxes, similar residues. C-domains and the substrate binding sites identified from structural studies (35, 36) are indicated. Black boxes, conserved residues; gray boxes, similar residues. C-domains and the substrate binding sites identified from structural studies (35, 36) are indicated. Black boxes, conserved residues; gray boxes, similar residues.
80 34696 JOURNAL OF BIOLOGICAL CHEMISTRY
VOLUME 280 • NUMBER 41 • OCTOBER 14, 2005
MGDG Synthase Molecular Modeling Fig. 2 shows the multiple alignment of E. coli ecMURG, A. thaliana atMGD1, atMGD2, and atMGD3, and S. oleracea soMGD1, obtained with ClustalW and manually refined using hydrophobic cluster analysis method so as to optimize the alignment of putative secondary structure elements (46). The different MGD sequences are 60 –70% identical. Despite a low percentage of identity with MURG (⬃20%), the alignment shows a strong conservation of secondary structures (␣-helices and -strands) (Fig. 2). In particular, Rossmann folds of the N- and C-domains and the corresponding G-loop 1 and G-loop 2 are strictly conserved. Two important insertions are predicted in the N-domain of MGD, between 2 and ␣2 (the 具2–␣2典 loop) and between 6 and 7 (the 具6 –7典 loop). Among residues of the C-domain of ecMURG involved in the recognition of the UDP sugar, most are conserved in aligned MGD sequences. In particular, in the ecMURG/soMGD1aligned residues, we listed the Arg164/Arg313 and Glu269/Glu427 conservation at the positions involved in ribose recognition in MURG, Phe244/ Phe402 and Ile245/Val403 at the uracil recognition position, Thr266/Thr424 at the phosphate recognition position, and Gln288/Gln444 and Asn292/ Asn448 at the position involved in the GlcNAc recognition in MURG (Fig. 2). Interestingly, the Gln289 at the position involved in GlcNAc recognition in MURG, is not conserved in MGD and is substituted by a glutamate, Glu445 that may be important in the specificity for galactose. Differences are also detected in G-loop 2 at the Ser192/Gly345 and Gln193/Glu346 positions in the ecMURG/soMGD1 alignment. G-loop 2 is known to bind phosphate. Both G-loop 1 and G-loop 2 cover the donor binding site of MURG, and the observed substitution might be also important for appropriate binding of substrates in MGD. The multiple alignment (Fig. 2) was a basis for building a threedimensional model of soMGD1 based on the crystal structure of ecMURG. The resulting model is shown in Fig. 3. The soMGD1 structure is built as a double Rossmann fold, with a catalytic site inside the cleft separating N- and C-domains (Fig. 3). The prediction is interrupted in the N-domain at the level the 具2–␣2典 and 具6 –7典 regions, two loops that do not have corresponding sequences in MGD and could not be safely built. The roots for the 具6 –7典 region are oriented toward the floor of the double Rossmann fold. By contrast, the roots for the 具2– ␣2典 loop head toward the catalytic site. This orientation, combined with the correlation of the 具2–␣2典 amino acid profile with phylogeny and DAG molecular species specificity, suggests that the 具2–␣2典 loop might be involved in DAG recognition. Structural Prediction of the Substrate Binding Pocket in soMGD1—The UDP-Gal binding site was predicted using structural information from the ecMURG:UDP-GlcNAc x-ray crystal structure (36). The donor binding site is located in the conserved proline- and glycine-rich regions within the cleft separating the N- and C-domains, on the C-domain side (Fig. 4, A and B). G-loop 1 and G-loop 2 erect symmetrically above the substrate binding pocket (Fig. 4C). Deep inside the cleft, the aromatic ring of a phenylalanine, Phe402, is stackedontheuracilandcontactsitclosely(Fig.4,BandC).Theuracilisfurther held in place by contact from the N(3) and the O(4) atoms to the backbone of Val403. Ribose interacts with Arg313 and Glu427. The galactosyl moiety interacts with 2 amino acids of the ␣⬘5 helix: the acidic group of Glu445 is hydrogenbonded to O(3), whereas the NH2 of Gln444 bridges both O(3) and O(4). Hydrophobic contacts are also established between the methane hydrogens at C(4) and C(6) that interact with Pro422 (Fig. 4B). Because of the divergence between MURG and MGD in the N-domain, where the acceptor is believed to bind, no prediction was achieved concerning DAG binding. In our model we observe enrichment in hydrophobic amino acids close to the galactose binding site that may be important for DAG docking (data not shown). In enzymological analyzes, MGD activity was lost when the enzyme was incubated with lysine reagents (citraconic anhydride or tert-bu-
81
OCTOBER 14, 2005 • VOLUME 280 • NUMBER 41
FIGURE 3. Overall structure of the model of soMGD1:UDP-Gal complex. A, ribbon representation of soMGD1. UDP-Gal is shown in ball-and-stick representation. B, Connolly surface of the protein. Color was coded with hydrophobicity potential (from brown for hydrophobic to blue for hydrophilic regions). Stars indicate the location of the missing 具2–␣2典 loop.
toxycarbonyl-L-methioninehydrosuccidimidyl ester). MGD was protected from lysine-blocking agents by DAG, indicating that one or more important lysine residues were localized in the vicinity of the acceptor site (31). By substrate protection experiments, Mare´chal et al. (31) further showed that one or more key histidine and cysteine residues were present in the vicinity of the DAG and UDP-Gal binding sites. Fig. 4C highlights the cysteine, histidine, and lysine residues predicted in the vicinity of the MGD catalytic site, i.e. Cys272, His240, His245, and Lys419 and to a lesser extent a more remote Cys378. MGD Site-directed Mutagenesis and Functional Assay—We mutated recombinant soMGD1 via polymerase chain reaction, in most cases by replacing residues by a small amino acid (WT 3 Ala) or replacing conserved acidic residues by amines (Asp 3 Asn, Glu 3 Gln, and Asp/Glu 3 Asn), or basic amino acids by an acidic residue (Lys 3 Glu and Arg 3 Glu). In one mutant, the GGGEG segment of G-loop 2 was deleted (G-LOOP mutant). Wild type and mutants were expressed in E. coli after isopropyl--D-thiogalactopyranoside induction. Analysis of the total proteins from each strain showed the accumulation of a 45-kDa polypeptide (solid arrow, Fig. 5A) corresponding to the monomeric subunit of soMGD1 (12). The G-LOOP soMGD1 mutant had a lower apparent molecular weight (white arrow, Fig. 5A). No soMGD1 protein could be detected in non-induced controls (control, Fig. 5A). Production levels of WT and point-mutated soMGD1 polypeptides were equivalent, whereas an overaccumulation of the G-LOOP mutant was observed. In recombinant E. coli, polypeptides corresponding to soMGD1 are either targeted to the bacterial membranes (0.1%) where the enzyme is active, or massively routed to inclusion bodies as inactive
JOURNAL OF BIOLOGICAL CHEMISTRY
34697
MGDG Synthase Molecular Modeling
FIGURE 4. Three-dimensional representation of soMGD1 substrate binding site. A, detail view of the UDP-Gal binding pocket. Connolly surface color is coded with electrostatic potential (from blue for negatively charged to red for positively charged amino acids); UDP-Gal is represented with stick model. B, stereographic view of the UDP-Gal interaction with soMGD1 residues. Hydrogen bonds between UDP-Gal and soMGD1 residues are displayed with green dashed lines. C, general view of the vicinity of the UDP-Gal binding site. Colored amino acids were considered for mutations (green, amino acids predicted to bind UDP-Gal; red, lysine and histidine residues in the vicinity of the UDP-Gal binding site). Yellow arrows indicate the roots of the missing loop 具2–␣2典 that is presumed to play a role in DAG binding.
unfolded polypeptides (99.9%) (26). We solubilized membrane soMGD1 with CHAPS (30, 31, 37), a zwitterionic detergent that is also known to help MURG solubilization (58). The enzyme in mixed
82
34698 JOURNAL OF BIOLOGICAL CHEMISTRY
micelles was subsequently purified by a hydroxyapatite chromatography (12, 13, 37), and the purified fraction was used for enzymatic and protein assays and determination of specific activities (Fig. 5B). The
VOLUME 280 • NUMBER 41 • OCTOBER 14, 2005
MGDG Synthase Molecular Modeling decreases obtained after mutation of most sites suspected of being important for MGD, sustained the predicted model presented here. To check our hypothesis on the role of the 具2–␣2典 region in acceptor binding, kinetic parameters were determined for the native enzyme and mutant W171A. To that purpose, WT and W171A soMGD1 proteins expressed in E. coli membranes were extracted and solubilized with CHAPS (12, 13, 26) and further purified by hydroxyapatite-agarose chromatography (37). The purified fractions are delipidated; it is therefore possible to control the DAG and UDP-Gal availability and determine the corresponding Km according to the surface dilution model (30, 59). Under this experimental procedure, the Km measured for the recombinant WT protein in respect to UDP-Gal was ⬃6-fold higher than that measured for the chloroplast-purified enzyme (30) and was not affected by the mutation (for WT soMGD1, KmUDP-gal ⫽ 0.650 ⫾ 0.146 mM; for W171A soMGD1, KmUDP-gal ⫽ 0.565 ⫾ 0.565 mM). By contrast, a 2-fold increase in the Km value for DAG was observed (for the WT, KmDAG ⫽ 0.0120 ⫾ 0.0028 mol-fraction; for the W171A mutant, KmDAG ⫽ 0.0225 ⫾ 0.0035 mol-fraction). The effect of the W171A mutation on the affinity for DAG supports a possible role of the 具2–␣2典 loop in the enzyme interaction with DAG. Surface Analysis of soMGD1—Topological studies previously demonstrated the association of soMGD1 with envelope membranes (12). Using the present model, membrane association can be sought by surface analysis. Surface hydrophobic patches can be evidenced in the N-domain as clearly shown in Fig. 3B. Such hydrophobic regions that are found on both sides of the N-domain are expected to interact strongly with the envelope membrane.
DISCUSSION FIGURE 5. soMGD1 point mutation analysis. A, functional expression of WT and mutated soMGD1 in E. coli. E. coli cells (BL21 DE3) were transformed with pET-Y3a plasmid containing wild type (WT) or mutated soMGD1 genes (as indicated). Bacteria were grown at 37 °C until A600 reached 0.4 – 0.5. Expression of recombinant proteins was induced by addition of 0.4 mM isopropyl--D-thiogalactopyranoside at 28 °C for 4 h. 30 g of bacteria total protein extracts was subsequently separated by SDS-PAGE and visualized by Coomassie Blue staining. Control corresponds to non-induced bacteria. Black arrows indicate position of 45-kDa WT and mutated soMGD1. The white arrow indicates the G-loop-deleted form of the protein. Molecular mass markers are given in kilodaltons. B, galactosylation activity assays of the purified WT and mutated soMGD1 proteins. Expressed WT and mutated soMGD1 proteins were extracted from bacterial membranes and solubilized with CHAPS, a zwitterionic detergent, and subsequently purified by chromatography on hydroxyapatite-agarose as described under “Materials and Methods.” Specific activity is given in micromoles of galactose incorporated per hour per mg protein.
G-LOOP deletion prevented any proper enzyme solubilization by CHAPS, thus suggesting a role of this region in accurate protein folding. For this mutant, the activity was measured in crude bacteria extracts (Fig. 5B). Based on activity, three classes of mutants were observed. First, C378A mutant, targeting a Cys in the vicinity of the substrate binding region was only partially affected, with ⬃75% of the WT-specific activity (Fig. 5B). Likewise, soMGD1 substituted at the level of Cys284, Cys286, and Cys415 did not show any altered activity in crude bacteria extracts (not shown) and were not analyzed further after hydroxyapatite purification. Second, the W171A, E173N, W177A, R380E, N448A, E346A, H245A, and F402A were deeply affected with only 5–15% of the WT activity. In the third group comprising E427A, E445A, K419A, H240A, and the G-LOOP soMGD1, the activity was utterly abolished (Fig. 5B). The complete absence of activity in the G-LOOP mutant is not surprising, knowing the deleterious impact the G-loop deletion can produce on the three-dimensional structure. In a more refined way, a single point mutation occurring within G-loop 2, at the Glu346 position (E346A) was sufficient to tune down the activity. From these results, the activity
83
OCTOBER 14, 2005 • VOLUME 280 • NUMBER 41
Before ecMURG was crystallized, providing the precious structural information used in this study, biochemical analyses had given clues on MGD catalytic site, membrane topology, and oligomerization (12, 31). Using mixed micelles containing DAG, it was shown that the MGDG synthase activity from spinach chloroplast envelope was a sequential, either random or ordered, bireactant system, in which the binding of one substrate did not significantly change the specificity for the cosubstrate. It was therefore possible to estimate the Km values for UDPgalactose and various DAG species, indicating that the enzyme was able to discriminate between DAG molecular species (31). The inhibition by UDP that was competitive in regard to UDP-Gal and non-competitive in regard to DAG, supported the existence of separate sites for each substrate. Inactivation by amino acid reagents and protection by substrates indicated the existence of key lysine, histidine, and cysteine residues in the vicinity of the catalytic site (31). The occurrence of ⬃10 cysteine, ⬃15 histidine, and ⬃35 lysine residues in MGD sequences did not allow any simple identification of those that were most likely in the vicinity of the substrate binding sites. Sensitivity to a hydrophobic chelating agent o-phenanthroline and activity recovery after addition of bivalent metal cations supported the existence of one or more metal associated to the enzyme (26, 31). Using antibodies raised against soMGD1, the MGDG synthase from spinach was shown associated to membranes in a NaCl-resistant but NaOH-sensitive manner, indicating that the enzyme had no transmembrane span and was rather a “monotopic” enzyme, embedded in one membrane leaflet (12). Eventually, the kinetic of soMGD1 inactivation after ␥-radiation was consistent with a functional dimer (12). In the present study, we combined the biochemical data on MGD catalytic site with the structural information learned from MURG to refine the predicted soMGD1 model. MDG Overall Fold—Based on sequence comparisons (Figs. 1 and 2), the architecture of soMGD1 predicted in this study (Figs. 3 and 4) was
JOURNAL OF BIOLOGICAL CHEMISTRY
34699
MGDG Synthase Molecular Modeling sustained by PCR-point mutation analyses (Fig. 5). It consists of two Rossmann folds called the N- and C-domains. Catalytic site is predicted in the cleft between the two domains. The C-domain is the most satisfactorily predicted, with identification and confirmation by point mutation of key residues for UDP-Gal binding. In the N-domain, prediction is incomplete, lacking the two 具2–␣2典 and 具6 –7典 loops that have no counterpart in ecMURG. From phylogenetic comparisons (Fig. 1), we deduced that the 具2–␣2典 loop, which heads toward the catalytic site, may be involved in catalysis as well. Structural Homology Reveals the UDP-Gal Binding Site—The UDPGal binding site was predicted using the conformation of the UDPGlcNAc:ecMURG complex (36) (Fig. 4) and was supported by sitedirected mutagenesis experiments in which identified residues proved to be important for soMGD1 activity (Fig. 5). The overall orientation of the nucleotide sugar is conserved after optimization (Figs. 3A, 4A, and 4B). The predicted orientation of UDP in the binding site of soMGD1 involved the same network of hydrogen bonds and stacking contacts as that determined with UDP-GlcNAc in ecMURG. Hydrogen bonds with Gln444 and Glu445 are conserved (Gln288 and Gln289 in MurG). Some differences in binding and orientation of the glucoside moiety in MURG and the galactoside moiety in MGD1 were observed. In particular, specificity for galactose might derive from the proline residue at position 422 in MGD1. This bulky amino acid does not allow the presence of an equatorial hydroxyl group at position 4 but favors the hydrophobic contact with the methane hydrogen at this position. In MURG, the equivalent amino acid is a less bulky alanine residue that allows equatorial orientation of the O(4) atom establishing an hydrogen bond with this residue backbone. Proposed Residues of the 具2–␣2典 Loop Involved in the DAG Binding Site—No conclusive structure could be deduced for the acceptor binding site. The enrichment in hydrophobic amino acids close to the UDPGal binding site, as compared with MURG, might reflect the necessary environment for DAG, whose hydrophobicity is much higher than that of Lipid 1. The structure of the 具2–␣2典 loop could not be determined, but its roots head in the direction of the substrate binding pocket (Fig. 4C). That loop is rich in aromatic amino acids that are candidates for hydrophobic interactions and stacking. A possible role for that loop might be to maintain the acceptor or to discriminate between different acceptor species. Point mutation analyses support the importance of Trp171, Glu173, and Trp177 for activity (Fig. 5). The enzymological analysis of soMGD1 mutated at the level of Trp171 was undergone after membrane extraction, solubilization in detergent-lipid mixed micelles, purification, and Km computation following the surface dilution model. The overall decrease in specific activity measured with the W171A mutant could only be attributed to a decrease of the affinity for the provided DAG (in this experiment, containing only dioleoyl-1,2-snglycerol molecular species) supporting a possible role of the 具2–␣2典 loop in the enzyme interaction with DAG. The 具2–␣2典 loop seems therefore an important component of a DAG binding region, possibly in conjunction with remote residues of the soMGD1 sequence. Key Histidine and Lysine Residues in the Vicinity of the Active Site—We identified lysine and histidine residues in the vicinity of the broad substrate binding zone that proved essential for activity (Fig. 5). These amino acids were highly conserved in MGD sequences (supplemental Fig. 1S). Mutation of His240, His245, and Lys419 totally abolished the enzyme activity thus supporting theirroleincatalysis.Becausetheseresiduesdonotappeartobeinclosecontact with UDP-Gal, a speculative hypothesis would be that these residues might be involved in acceptor (DAG) binding. In this study, no cysteine residue could be identified as clearly essential for activity. In particular, cysteine residues in two intriguing CXC motifs (CYC in the 6 and CDC in the ⬘4 strands) were
84
34700 JOURNAL OF BIOLOGICAL CHEMISTRY
mutated (at the level of Cys284, Cys286, and Cys415), without any noticeable effect (not shown). Proposed Membrane Association Domain—Previous sequence analyses tempted to explain the monotopic insertion of soMGD1 inside one leaflet of the inner envelope membrane by detected amphipathic ␣-helices (12). Surface hydrophobic patches have been clearly evidenced in the N-domain (Fig. 3B). Such hydrophobic regions that are found on both sides of the N-domain are expected to interact strongly with the envelope membrane. It is striking to note that similar hydrophobic patches have been described in ecMURG that is also acting at the membrane surface (35). The cleft region from each MGD monomer (in the vicinity of which we predicted both the UDP-Gal binding site and the roots of the 具2–␣2典 loop) is likely to face the membrane surface where the very hydrophobic DAG resides and protrude sufficiently to bind the hydrophilic UDP-Gal sugar donor. Proposed MGD Dimerization Domain—To our knowledge, there is no report on a possible MURG dimerization as supported for MGD by radiation inactivation kinetics and cross-linking experiments (12, 26). The only described organization of two MURG proteins is in x-ray crystal where two protein molecules are observed in the asymmetric unit (35, 36). In the MGD model presented here, membrane association seems likely localized in the N-domain. Based on this membrane topology, on the geometric constraints derived from the DAG origin in the membrane and the UDP-Gal origin in the aqueous phase, it may be reasonable to suppose a dimerization involving the protruding C-domain. More investigations are required to explore this hypothesis. An additional question is also the possibility of a dimerization of MURG. Question of the Association of MGD with Metal Cations—Association of apo-MGD with metals is experimentally supported by sensitivity of the native and recombinant enzymes to ortho-phenanthroline, a hydrophobic chelating agent, and the subsequent partial recovery of the activity when supplied with metals such as Zn2⫹, Mg2⫹, or Cu2⫹ (26, 31). Metal coordination might imply cysteine and/or histidine residues. It is still not clear whether metal coordination might be directly involved in catalysis, stability of the overall structure, and/or MGD dimerization. The present study does not provide sufficient data to support any hypothetical metal binding site(s).
CONCLUSION Attempts to crystallize and solve the structure of plant MGDG synthases, the enzymes producing the most abundant polar lipid on earth, were unsuccessful until now. Using MURG as a template, and challenging our predictions by site-directed mutagenesis, we could obtain the overall fold of an Angiosperm MGDG synthase. As expected for a glycosyltransferase of the GT-B superfamily, the architecture of a monomer was that of a double Rossmann fold. The binding site for the UDPGal sugar donor was best predicted. The structure of two short segments (具2–␣2典 and 具6 –7典 loops) that have no counterparts in MURG could not be determined. Combining the obtained model with phylogenetic and biochemical information, we extended our structural investigation to structural questions that are unique to MGD proteins, that is binding of DAG, insertion into membranes, sites for dimerization, and association to a metal. In particular, we collected evidence supporting that the 具2–␣2典 loop in the N-domain is very likely involved in DAG binding. Additionally, the monotopic insertion of MGD in one membrane leaflet of the plastid envelope occurs very likely at the level of hydrophobic amino acids superficially exposed in the N-domain. Based on these topological constraints, MGD dimerization would occur in the protruding C-domain. We could not identify any site dedicated to metal association. Future prospects include a refinement of
VOLUME 280 • NUMBER 41 • OCTOBER 14, 2005
MGDG Synthase Molecular Modeling the MGD model, focusing on regions of MGD monomers, which likely play important functional roles, i.e. the 具2–␣2典 loop and parts of the C-domain that may be involved in protein dimerization. Based on MGD and MURG sequence similarity, the question of a possible dimerization (and association to metal) of MURG should also be addressed. Eventually, the presented structure will be a starting model to investigate possible molecular pharmacological mechanisms of MGD inhibitors and to understand the difficult question of the evolution of MGDG-synthesizing enzymes. Acknowledgment—We thank M. A. Block for fruitful discussion.
REFERENCES 1. Douce, R., Holtz, R. B., and Benson, A. A. (1973) J. Biol. Chem. 248, 7215–7222 2. Joyard, J., Block, M. A., Malherbe, A., Mare´chal, E., and Douce, R. (1993) in Lipid Metabolism in Plants (Moore, T. S., ed) pp. 231–257, CRC Press, Boca Raton, FL 3. Gounaris, K., and Barber, J. (1983) Trends Biochem. Sci. 8, 378 –381 4. Ferrari, R. A., and Benson, A. A. (1961) Arch. Biochem. Biophys. 93, 185–192 5. Neufeld, E. F., and Hall, C. W. (1964) Biochem. Biophys. Res. Commun. 14, 503–508 6. Archibald, J. M., and Keeling, P. J. (2002) Trends Genet. 18, 577–584 7. Keeling, P. J., Archibald, J. M., Fast, N. M., and Palmer, J. D. (2004) Science 306, 2191 8. Joyard, J., and Douce, R. (1987) in The Biochemistry of Plant Lipids: Structure and Function (Stumpf, P. K., ed) Vol. 9, pp. 215–274, Academic Press, New York 9. Feige, G. B., Heinz, E., Wrage, K., Cochems, N., and Ponzelar, E. (1980) in Biogenesis and Function of Plant Lipids (Mazliak, P., Benveniste, P., Costes, C., and Douce, R., eds) pp. 135–140, Elsevier-North-Holland Biomedical Press, Amsterdam 10. Murata, N., and Sato, N. (1982) in Biochemistry and Metabolism of Plant Lipids (Wintermanns, J. F. G. M., and Kuiper, P. J. C., eds) pp. 165–168, Elsevier Biomedical Press, Amsterdam 11. Shimojima, M., Morikawa, S., Maeda, K., Tohya, Y., Miyazawa, T., and Mikami, T. (1997) Proc. Natl. Acad. Sci. U. S. A. 94, 333–337 12. Mie`ge, C., Mare´chal, E., Shimojima, M., Awai, K., Block, M.A., Ohta, H., Takamiya, K., Douce, R., and Joyard, J. (1999) Eur. J. Biochem. 265, 990 –1001 13. Awai, K., Mare´chal, E., Block, M.A., Brun, D., Masuda, T., Shimada, H., Takamiya, K., Ohta, H., and Joyard, J. (2001) Proc. Natl. Acad. Sci. U. S. A. 98, 10960 –10965 14. Qi, Y. H., Yamauchi, Y., Ling, J. Q., Kawano, N., Li, D. B., and Tanaka, K. (2004) Planta 219, 450 – 458 15. Matson, R. S., Fei, M., Chang, S. B. (1970) Plant. Physiol. 45, 531–532 16. Mare´chal, E., Azzouz, N., De Macedo, C. S., Block, M. A., Feagin, J. E., Schwarz, R. T., and Joyard, J. (2002) Eukaryot. Cell 1, 653– 656 17. Jones, D., and Thornton, J (1993) J. Comput. Aided Mol. Des. 7, 439 – 456 18. Mare´chal, E., Awai, K., Block, M. A., Brun, D., Masuda, T., Shimada, H., Takamiya, K., Ohta, H., and Joyard, J. (2000) Biochem. Soc. Trans. 28, 732–738 19. Pali, T., Garab, G., Horvath, L. I., Kota, Z. (2003) Cell. Mol. Life. Sci. 60, 1591–1606 20. Do¨rmann, P., Balbo, I., and Benning, C. (1999) Science 284, 2181–2184 21. Benning, C., and Ohta, H. (2005) J. Biol. Chem. 280, 2397–2400 22. Andersson, M. X., Stridh, M. H., Larsson, K. E., Liljenberg, C., and Sandelius, A. S. (2003) FEBS Lett. 537, 128 –132 23. Jouhet, J., Mare´chal, E., Bligny, R., Baldan, B., Joyard, J., and Block, M. A. (2004) J. Cell Biol. 167, 863– 874 24. Ha¨rtel, H., and Benning, C. (2000) Biochem. Soc. Trans. 28, 729 –732 25. Ha¨rtel, H., Do¨rmann, P., and Benning, C. (2000) Proc. Natl. Acad. Sci. U. S. A. 97, 10649 –10654
26. Nishiyama, Y., Hardre´-Lie´nard, H., Miras, S., Mie`ge, C., Block, M. A., Revah, F., Joyard, J., and Mare´chal, E. (2003) Protein. Expr. Purif. 31, 79 – 87 27. Douce, R. (1974) Science 183, 852– 853 28. Block, M. A., Dorne, A. J., Joyard, J., and Douce, R. (1983) J. Biol. Chem. 258, 13281–13286 29. Cline, K., and Keegstra, K. (1983) Plant Physiol. 85, 780 –785 30. Mare´chal, E., Block, M. A., Joyard, J., and Douce, R. (1994) J. Biol. Chem. 289, 5788 –5798 31. Mare´chal, E., Mie`ge, C., Block, M. A., Joyard, J., and Douce, R. (1995) J. Biol. Chem. 270, 5714 –5722 32. Coutinho, P. M., Deleury, E., Davies, G. J., and Henrissat, B. (2003) J. Mol. Biol. 328, 307–317 33. Breton, C., and Imberty, A. (1999) Curr. Opin. Struct. Biol. 9, 563–571 34. Coutinho, P.M., and Henrissat, B. (1999) J. Mol. Microbiol. Biotechnol. 1, 307–308 35. Ha, S., Walker, D., Shi, Y., and Walker, S. (2000) Protein Sci. 9, 1045–1052 36. Hu, Y., Chen, L., Ha, S., Gross, B., Falcone, B., Walker, D., Mokhtarzadeh, M., and Walker, S. (2003) Proc. Natl. Acad. Sci. U. S. A. 100, 845– 849 37. Mare´chal, E., Block, M. A., Joyard, J., and Douce R. (1991) C. R. Acad. Sci. Paris 313, 521–528 38. Bligh, E. G., and Dyer, W. J. (1959) Can. J. Biochem. Physiol. 37, 911–917 39. Bastien, O., Aude, J. C., Roy, S., and Mare´chal, E. (2004) Bioinformatics 20, 534 –537 40. Bastien, O., Aude, J. C., Roy, S., and Mare´chal, E. (2005) BMC Bioinformatics 6, 49 41. Codani, J. J., Comet, J. P., Aude, J. C., Gle´met, E., Wozniak, A., Risler, J. L., He´naut, A., and Slonimski, P. P. (1999) Methods Microbiol. 28, 229 –244 42. Comet, J. P., Aude, J. C., Glemet, E., Risler, J. L., Henaut, A., Slonimski, P. P., and Codani, J. J. (1999) Comput. Chem. 23, 317–331 43. Felsenstein, J. (1989) Cladistics 5, 164 –166 44. Kelley, L. A., MacCallum, R. M., and Sternberg, M. J. (2000) J. Mol. Biol. 299, 499 –520 45. Combet, C., Blanchet, C., Geourjon, C., and Deleage, G. (2000) Trends Biochem. Sci. 25, 147–150 46. Callebaut, I., Labesse, G., Durand, P., Poupon, A., Canard, L., Chomilier, J., Henrissat, B., and Mornon, J. P. (1997) Cell Mol. Life Sci. 53, 621– 645 47. Blundell, T., Carney, D., Gardner, S., Hayes, F., Howlin, B., Hubbard, T., Overington, J., Singh, D. A., Sibanda, B. L., and Sutcliffe, M. (1988) Eur. J. Biochem. 172, 513–520 48. Laskowski, R. A., MacArthur, M. W., Moss, D. S., and Thornton, J. M. (1993) J. Appl. Crystallogr. 26, 283–291 49. Clark, M., Cramer, R. D. L., and van der Openbosch, N. (1989) J. Comput. Chem. 10, 982–1012 50. Imberty, A., Bettler, E., Karababa, M., Mazeau, K., Petrova, P., and Pe´rez, S. (1999) in Perspectives in Structural Biology (Vijayan, M., Yathindra, N., and Kolaskar, A. S., eds) pp. 392– 409, Indian Academy of Sciences and Universities Press, India 51. Petrova, P., Koca, J., and Imberty, A. (1999) J. Am. Chem. Soc. 121, 5535–5547 52. Heinz, E. (1977) in Lipids and Lipid Polymers (Tevini, M., and Lichtenthaler, H. K., eds) pp. 102–120, Springer Verlag, Berlin 53. Janeiro, D. R., and Barnett, R. (1981) J. Lipid Res. 22, 1119 –1125 54. Sato, N., Tsuzuki, M., and Kawagushi, A. (2003) Biochim. Biophys. Acta 1633, 27–34 55. Sanina, N. M., Goncharova, S. N., and Kostetsky, E. Y. (2004) Phytochemistry 65, 721–730 56. Cohen, Z., Khozin-Goldberg, I., Adlerstein, D., and Bigogno, C. (2000) Biochem. Soc. Trans. 28, 740 –743 57. Baker, P. J., Britton, K. L., Rice, D. W., Rob, A., and Stillman, T. L. (1992) J. Mol. Biol. 228, 662– 671 58. Crouvoisier, M., Mengin-Lecreulx, D., and van Heijenoort, J. (1999) FEBS Lett. 449, 289 –292 59. Deleted in proof 60. Edman, M., Berg, S., Storm, P., Wikstrom, M., Vikstrom, S., Ohman, A., and Wieslander, A. (2003) J. Biol. Chem. 278, 8420 – 8428
85 OCTOBER 14, 2005 • VOLUME 280 • NUMBER 41
JOURNAL OF BIOLOGICAL CHEMISTRY
34701
86
5. SIMULATIONS DE DYNAMIQUE MOLECULAIRE SUR LA GLYCOSYLTRANSFERASE LGTC Au cours de ce travail de thèse, je me suis intéressée à l’étude du mécanisme de la réaction de la glycosyltransférase LgtC, qui est présente dans la bactérie Neisseria meningitidis (Gotschlich, 1994).
5.1. Neisseria meningitidis La Neisseria meningitidis (le méningocoque) est une bactérie diplocoque à gram négatif, qui est l'agent de la méningite cérébro-spinale (voir le chapitre suivant). Le méningocoque est un parasite strict de l'espèce humain. Le rhino-pharynx est la porte d'entrée du germe dans l'organisme. De nombreux sujets sont porteurs sains et jouent un rôle capital dans l'épidémiologie de la maladie. La paroi du méningocoque, et des Neisseria en général, contient des lipooligosaccharides (LOS), dont la structure ressemble à une partie d’antigène Pk des glycolipides humains présent dans le sang, et ainsi permet le camouflage de la surface bactérienne contre le système immunitaire humain (Moran et al., 1996). Les polysaccharides capsulaires permettent de classer par agglutination les méningocoques en groupes sérologiques (les sérogroupes), classés à leur tour en sous-groupes (les sérotypes). La distinction entre les différents groupes est primordiale au moment de la mise au point des vaccins, car un vaccin préventif n'agit que sur un groupe de méningocoques spécifique. Ainsi, il existe des vaccins modernes très efficaces aussi bien pour le groupe A que le pour le groupe C extrêmement dangereux.
5.2. Épidémiologie des méningites bactériennes Les méningites bactériennes sont des infections des membranes (méninges) et du liquide céphalorachidien (LCR) qui entourent le cerveau et la moelle épinière ; elles sont une cause majeure de décès et d’incapacités dans le monde. Passée la période périnatale, trois bactéries, dont la transmission se fait d’homme à homme par les sécrétions respiratoires sont responsables de la plupart des méningites bactériennes : Neisseria meningitidis, Streptococcus pneumoniae et Haemophilus influenzae. L’étiologie des méningites bactériennes varie avec l’âge et la géographie. Tous les ans, on estime à un million le nombre de cas de méningite survenant dans le monde, dont 200 000 sont fatals. Le taux de létalité est fonction de l’âge et de la bactérie en cause, oscillant classiquement de 3% à 19% dans les pays développés. Une
87
létalité plus élevée (37%-60%) a été rapportée dans les pays en développement. Des séquelles, incluant surdité, retard mental et autres séquelles neurologiques, intéressent jusqu’à 54% des survivants. La méningococcie, ou infection à N. meningitidis, se présente sous deux formes cliniques : la méningite et la septicémie (méningococcémie). Si les deux syndromes peuvent coexister, la méningite seule est particulièrement fréquente. La réactivité immunologique du polyoside capsulaire permet de classer N. meningitidis en sérogroupes (voir le chapitre précédent). Douze sérogroupes ont été identifiés, mais les trois sérogroupes A, B et C sont responsables de plus de 90% des cas de méningococcie. La méningite à méningocoque diffère des autres méningites bactériennes par son potentiel épidémique. Les plus grandes épidémies surviennent dans une région de l’Afrique subsaharienne qui s’étend de l’Ethiopie à l’est jusqu’en Gambie à l’ouest, composée de 15 pays et regroupant plus de 260 millions d’habitants, désignée par l’expression "ceinture de la méningite". Dans cette zone de forte endémicité, de grandes épidémies surviennent périodiquement. Le traitement par les antibiotiques n'est efficace que dans le premier stade de la maladie. Les vaccins ont donc un rôle important dans la lutte contre les méningites bactériennes. Il existe des vaccins commercialisés contre N. meningitidis, H. influenzae et S. pneumoniae, mais la protection offerte par chacun est spécifique de la bactérie et limitée à certains sérogroupes ou sérotypes.
5.3. La glycosyltransférase LgtC La glycosyltransférase LgtC (Wakarchuk et al., 1998; Wakarchuk et al., 1996) de Neisseria meningitidis est une α-(1→4)-galactosytransférase de la famille GT8, qui retient la configuration a du carbone anomérique du galactose transféré. La LgtC catalyse une réaction importante dans la biosynthèse des lipooligosaccharidiques de la paroi bactérienne en transférant un groupe de α-D-galactose de l’UDP-galactose à un lactose terminal (Figure 28).
88
Figure 28: La réaction catalysée par LgtC.
La structure cristallographique de la glycosyltransférase LgtC a été résolue récemment en complexe avec un substrat donneur et un analogue de l’accepteur (Persson et al., 2001). La LgtC est un monomère composé de 286 résidus acides aminés, qui forment neuf brins β et quatorze hélices α (Figure 29). Une grande partie du site actif est cachée sous deux boucles, dénommées X et Y. La LgtC adopte le fold GT-A, et un motif DxD (Asp103-Ile104-Asp105) se trouve dans le site actif. Les acides aminés de ce motif, ainsi que le résidu His244, sont impliqués dans la coordination du cation Mn2+. Selon les données expérimentales, le reste de la coordination du cation Mn2+ est fourni par le substrat donneur UDP-Gal (Figure 30).
Figure 29: Représentation graphique de l’enzyme LgtC en complexe avec les substrats (code PDB 1GA8).
89
Figure 30: Schéma de coordination du cation de manganèse dans le site actif de la LgtC.
Le mécanisme de la réaction de la LgtC n’a pas encore été déterminé, bien que plusieurs hypothèses aient été présentées (Ly et al., 2002; Tvaroska, 2004). En connaissant le mécanisme de la réaction de cette enzyme, il serait possible d’envisager la conception d’inhibiteurs efficaces. De telles molécules pourraient en conséquence être utilisées comme des nouveaux antibiotiques spécifiques contre la bactérie Neisseria meningitidis.
90
ARTICLE IV
Snajdrova, L., Kulhanek, P., Imberty, A. & Koca, J. (2004). Molecular dynamics simulations of glycosyltransferase LgtC. Carbohydrate Research 339, 995-1006. Résumé : Des simulations de dynamique moléculaire ont été effectuées sur la α-(1-4)galactosyltransférase LgtC de Neisseria meningitidis avec et sans le substrat donneur UDPGal et en présence d'ion manganèse. L'analyse des trajectoires a révélé un mouvement limité dans la boucle X (résidus 75-80) et un changement conformationnel plus important dans la boucle Y (résidus 246-251) dans les simulations où l’UDP-Gal était absent. Dans ce cas, les boucles X et Y s'ouvrent de presque 10 Å, exposant le site actif au solvant. La « région charnière » responsable de l’ouverture des boucles est constitué des résidus 246-247. On a aussi analysé le comportement de l’ion manganèse au cours des simulations. Le nombre de coordination est 6 quand l’UDP-Gal est présent, et il monte à 7 si le substrat donneur est absent. Dans le deuxième cas, trois molécules d'eau coordonnent le manganèse. Dans les deux cas, la coordination est très stable, indiquant que l'ion manganèse est lié fortement dans le site actif de l’enzyme même si l’UDP-Gal est absent. Une analyse de la position des molécules d’eau a confirmé que la mobilité de ces molécules dans le site actif (et donc l'accessibilité de ce site pour les molécules du solvant) est plus forte en absence du substrat donneur.
91
92
Carbohydrate RESEARCH Carbohydrate Research 339 (2004) 995–1006
Molecular dynamics simulations of glycosyltransferase LgtC q a,a,b Petr Kulh Lenka Snajdrov anek,a,c Anne Imbertyb and Jaroslav Kocaa,c,* a
National Centre for Biomolecular Research and Department of Organic Chemistry, Faculty of Science, Masaryk University Brno, CZ-611 37 Brno, Czech Republic b Centre de Recherches sur les Macromolecules Vegetales, CNRS and Universite Joseph Fourier, IFR 2607, BP 53, F-38041 Grenoble, France c Department of Organic Chemistry, Faculty of Science, Masaryk University Brno, CZ-611 37 Brno, Czech Republic Received 13 October 2003; accepted 17 December 2003
Abstract—Molecular dynamics simulations have been performed on fully solvated a-(1 fi 4)-galactosyltransferase LgtC from Neisseria meningitidis with and without the donor substrate UDP-Gal and in the presence of the manganese ion. The analysis of the trajectories revealed a limited movement in the loop X (residues 75–80) and a larger conformational change in the loop Y (residues exposing the 246–251) in the simulation, when UDP-Gal was not present. In this case, the loops X and Y open by almost 10 A, active site to the solvent. The Ôhinge regionÕ responsible for the opening is composed of residues 246–247. We have also analyzed the behavior of the manganese ion in the simulations. The coordination number is 6 when UDP-Gal is present and it increases to 7 when it is absent. In the latter case, three water molecules become coordinated to the ion. In both cases, the coordination is very stable implying that the manganese ion is tightly bound in the active site of the enzyme even if UDP-Gal is not present. Further analysis of the structural water molecules location confirmed that the mobility of water molecules in the active site and the accessibility of this site for solvent are higher in the absence of the substrate. 2004 Elsevier Ltd. All rights reserved. Keywords: Galactosyltransferase; Molecular dynamics; Loops opening; Structural water molecules
catalyzing a key step in the biosynthesis of lipooligosaccharide structures by transferring a-D -galactose from UDP-galactose to a terminal lactose (Scheme 1).1 The resulting structure is a mimic of the human Pk blood group glycolipid and can therefore participate in the camouflage of the bacterial surface from recognition by the human immune system. Since Neisseria meningitidis can act as a human pathogen, causing an invasive systemic infection that can be fatal,2 understanding of the substrate recognition and mechanism of LgtC is of interest for future design of inhibitors that could block lipooligosaccharide biosynthesis. The crystal structure of LgtC has been recently solved as a complex with analogs of both donor and acceptor.3 The enzyme is a monomer composed of 286 residues that create nine b-strands and 14 a-helices (Fig. 1). The active site is mostly buried under two loops that presumably play a key role in organizing the substrates before the reaction and probably also in removing the products
1. Introduction The glycosyltransferases catalyze the transfer of glycosyl moieties from a donor sugar to an acceptor. In most cases, the donor is a nucleoside phosphosugar and the acceptor is a hydroxyl group of another sugar, a lipid, or another component of glycoconjugates. The glycosyltransferases are classified as either retaining or inverting, depending on the stereochemical outcome of the reaction catalyzed. The galactosyltransferase LgtC from Neisseria meningitidis is a retaining a-(1 fi 4)-glycosyltransferase,
Abbreviations: RMSD, Root mean squares deviations; EDA, Essential dynamics analysis. q Supplementary data associated with this article can be found in the online version, at doi:10.1016/j.carres.2003.12.024 * Corresponding author. Tel.: +420-541-129310; fax: +420-541129506; e-mail:
[email protected] 0008-6215/$ - see front matter 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.carres.2003.12.024
93
996
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
Scheme 1. Schematic representation of the LgtC catalyzed transfer of galactose from UDPGal to lactose. The product retains the configuration of the donor sugar glycosidic bond; LgtC is thus a retaining glycosyltransferase.
from the active site. Loop X (residues 75–80) makes a connection between two helices. Loop Y consists of residues 246–251 and is located in the C-terminus region of the enzyme. The two loops are on opposite sides of the active site groove. In the complex with substrate analogs, the two loops close above the donor sugar by a stacking interaction between Pro248 and His78.3 There is no available structural data about the conformation of the loops in the native enzyme, since the absence of UDPGal results in inability to crystallize the enzyme. The structure of LgtC is a key starting point to fully understand the mechanism of the enzymatic reaction. The understanding of the catalytic mechanism is an essential precondition for rational design of effective inhibitors of this enzyme. Although the structural details of the complex are well characterized,3 a subsequent detailed mechanistic study could not clearly delineate the chemical mechanism of this enzyme.4 Extensive structural studies have been performed on another family of retaining glycosyltransferases that contains bovine a-(1 fi 3)-galactosyltransferase5–7 and human blood group A and B glycosyltransferases.8;9 From the comparison of the different structures, it could be established that in this family the presence of UDP-Gal is necessary for the conformational change of two loops, and the closed conformation could be modeled.10 A first theoretical approach of catalytic reaction has also been proposed for a-(1 fi 3)-galactosyltransferase,11 but the proposed mechanism has not yet been validated by experimental work and is still discussed. Open questions are related not only to the enzymatic reaction itself, but also to the role of the enzyme in arranging the starting material to react and the products of the reaction to
depart from the active site. For example, no detailed information is known about the flexibility of loops X and Y and their role in attaching the manganese ion as well as in the donor and acceptor substrate anchoring. Because of the structural and mechanistic questions and problems mentioned above, we decided to use theoretical methods, namely molecular dynamics, to address the issue. Molecular dynamics has often proved to serve as a reliable computer simulation method that may usefully complement experimental observations. Very recently, it has also been used to analyze the backbone flexibility of an inverting glycosyltransferase.12 We report here the first molecular dynamics simulations of a retaining glycosyltransferase, LgtC, in water environment and in the presence of its donor substrate. The analyses of the trajectories provide an insight into atomic interactions within the active site of this glycosyltransferase as well as into the flexibility of the active site. The aim of this work is to analyze the active site loops flexibility, how it is influenced by the presence of the UDP-Gal substrate, and to determine the role of Mn2þ ions. As a spin-off, we will show how the protonation state of His244 may influence the behavior of the loops and how this may produce artifacts in the calculations.
2. Results and discussion Since kinetics studies suggested that UDP-Gal binds first to LgtC, followed by the acceptor lactose,4 we focus here on this first step and analyze the stability of the complex without the acceptor. Three molecular
94
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
997
Figure 1. Structural informations from the crystal structure of LgtC complexed with substrate analogs.3 Sequence information together with secondary structures (a), 3D-structure with schematic representation of a-helices and b-sheets (b). Donor, acceptor, and manganese ions are represented as CPK.
dynamics simulations of LgtC in explicit solvent were carried out in order to explore the dynamic behavior of
LgtC (Table 1). Conditions of simulation were chosen in order to study the stability of the protein loops as a
95
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
998
Table 1. Molecular dynamics simulations on LgtC Simulation
Composition
Purpose
Number of water moleculesa
I
LgtC + Mn2þ + UDP-Gal, d-coordinated His244 LgtC + Mn2þ + UDP-Gal, e-coordinated His244 LgtC + Mn2þ
To explore the effect of different protonation states of His244 To analyze flexibility of the structure with the donor substrate To identify flexible regions and the role of Mn2þ in binding
11,437
II III a
11,425 11,440
3 in all three cases. Simulation box size before equilibration was 75 · 78 · 82 A
Mn2þ ion (simulation II), His244 remains in contact with the ion, as observed experimentally, throughout the simulation. This latter situation is therefore chosen as the most probable one, and is kept for the simulations analyzed in the present work (simulations II and III). It follows from both simulations that UDP-Gal probably binds to the active site with e-coordinated His244. Nevertheless, the simulations do not exclude the second possibility (d-coordinated His244) but with His244 coordination type changing during loops X and Y closing (see below). However, this later case is less probable.
function of the presence of UDP-Gal and the role of Mn2þ in UDP-Gal binding. Furthermore, the hydration was thoroughly analyzed. 2.1. Influence of different tautomeric forms of His244 on LgtC structure The X-ray structure of LgtC was used as a source of initial coordinates for all molecular dynamics simulations. Since the resolution of the X-ray was not adequate enough to determine location of hydrogen atoms, they have to be added in theoretical positions. This may not be straightforward and simple as several residues can exist in more than one protonation state depending on pH, ionic strength etc. Following generally used convention for simulations in water solution, all Asp, Glu, Lys, and Arg residues were considered in ionic forms in all simulations and His residues were considered in neutral forms. These are represented by two tautomers in which the hydrogen atom is located either in e- or in d-position. In our calculations, each His residues were assigned such tautomeric form that the hydrogen bond network within the entire protein was optimal. This was only problematic for His244 as it is also the integral part of Mn2þ ion binding site and the structure of this site can be influenced by His244 isomeric form. His244 coordinates Mn2þ ion via the nitrogen atom in e-position in both X-ray structures of LgtC. However, the situation in free LgtC is not known because this structure has not been solved yet. It is therefore possible that His244 is d-coordinated. Simulations II and I with differently coordinated His244 were run to bring more light into this problem. When the nitrogen in d-position coordinates the manganese ion (simulation I), a strong repulsion between the hydrogen in e-position and the ion occurs. As a consequence, the entire side chain of this amino acid residue turns around during the initial minimization so that the d-nitrogen atom of His244 can coordinate the manganese ion. This causes a significant change of conformation of the part of the backbone related to this residue, which subsequently results in the opening of loops X and Y during the simulation. Such a situation is not in agreement with the crystal structure,3 and furthermore would probably cause hydrolysis of UDP-Gal prior to the enzymatic reaction. On the contrary, when the nitrogen in e-position coordinates the
2.2. Effect of the presence of UDP-Gal in the binding site Stability of the trajectories––Simulations in the presence (II) and in the absence (III) of UDP-Gal were calculated with total production length of 3 ns for each of them. In both cases, potential energy of the system became stable during the first 100 ps of the simulation. When looking at the stabilities of the geometry, simulation II needed almost 2 ns to reach the full stability whereas trajectory III became fully stable after about 1 ns (Fig. 2a). Analysis of the flexibility along the peptide chain can be performed by calculating theoretical B-factors13 and comparing them with the ones determined by X-ray crystallography. Both simulations II and III display an excellent agreement between calculated B-factors and the experimental ones (Fig. 2b). Small differences in some loop regions are discussed below. Active site opening and the loops flexibility––Essential dynamics analysis (EDA) was used to identify correlated atomic motions occurring during simulations II and III. EDA is able to filter out large correlated motions in the simulated system from uninteresting local motions (mostly vibrations). Overall translational and rotational motions were removed before EDA was started. EDA itself consists of the following steps. First, a covariance matrix is built from the atomic fluctuation in the trajectory. Only C-alpha atoms have been included since it has been shown that these atoms contain all the information for a reasonable description of the concerted motion.14 Upon matrix diagonalization, a set of eigenvalues and eigenvectors is obtained. The eigenvectors correspond to directions in space, and motions along single eigenvectors correspond to concerted fluctuation of atoms. The eigenvalues then represent the total mean
96
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
999
2.5
RMSD [angstroms]
2
1.5
1
0.5
0
0
1000
2000
3000
time [ps]
(a) 60 simulation II simulation III Xray structure 50
B-factor
40
30
20
10 loop X 1
0
(b)
0
A
2
B
50
3C
D
loop Y 4 5 E
100
6F G
7 H
150
I
J 8
200
K
L
9
250
M N
300
number of alpha carbon in the protein backbone
from the starting structure for simulations II and III (a), and B-factors of Ca -atoms of the backbone (b) taken Figure 2. Mass-weighted RMSD (in A) from X-ray data or calculated from MD simulations.
significant difference is observed in the C-terminal region, and especially in the loop Y. Indeed, when looking at the crystal structure, this is one of the loops that cover the binding site, together with the loop X, located in the N-terminal region. The latter loop X also displays slightly increased motion for simulation III in Figure 3. Interestingly, there are some parts of the protein that exhibit larger motions when the UDP-Gal is present than when it is absent. All of the residues concerned are distant from the active site. This behavior is explained by a high mobility of side chains and by their tendency to form ionic pairs with another side chain (e.g., Lys20 and Glu23, Arg53 and Glu23, Arg224 and Glu137, and so on) and does not directly depend on the presence of donor substrate.
square fluctuation of the system along the corresponding eigenvectors. In the last step, new trajectory containing only correlated atomic motions is calculated from the old trajectory by applying the most significant eigenvectors. These, as Berendsen and coworkers demonstrated,14;15 correspond to only the early largest eigenvalues, usually between 5 and 10. Root mean square deviation (RMSD) for each C-alpha atom calculated from such filtered trajectories (10 eigenvectors were used) are shown in Figure 3. In both simulations, as expected, loops display more motions than secondary structure elements do. As an overall observation, the simulation in absence of UDPGal displays larger motion of both loops X and Y than the simulation in the presence of this substrate. A
97
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
1000
RMSD for 10 eigenvectors of given simulation
4
simulation II simulation III 3
2
1
0
loop X A
0
2
B
50
3C
D
loop Y 4 5 E
6F G
100
7 H
150
I
J 8
K
L
200
9
250
M N
300
number of alpha atom in the protein backbone calculated from the first 10 eigenvectors (simulation II in black and simulation III in Figure 3. Results of essential dynamics analysis. RMSD (A) light grey). Simulation III displays larger motion of both loops X and Y than simulation II.
Ca247-Ca248 change. The second is an internal movement within loop Y itself (pseudodihedral Ca249Ca250-Ca251-Ca252––see supplementary material, Fig. 1S). Other flexible regions observed during the simulations––A certain number of loops that were stabilized by hydrogen bonds or ionic contact in the crystal structure exhibited a different conformational behavior during the molecular dynamics run in explicit solvent. The resulting local rearrangement explains the small differences in B-factors between simulations and X-ray that are displayed in Figure 2. Some of the observed movement present differences in simulation II and III: In the crystal structure, the C-terminal domain was bound to another part of the protein by a strong hydrogen bond involving the side chain of Ser270 and Lys20. In both simulations II and III, this contact is broken early during the MD simulation and the C-terminal domain does not keep its starting position. Furthermore, other flexible regions were observed in both simulations II and III. However, as it follows from the analyses of both trajectories, these changes do not influence behavior of the active site.
Loops opening in simulation III––The history of these two loops has been analyzed. The inspection of the trajectories reveals that loops X and Y behave differently in simulations II and III. During simulation III, loops X and Y were observed to move away from each other, which results in opening of the binding pocket and exposing the active site to the solvent. This did not occur in simulation II, where the donor substrate was bound. This is illustrated by measuring the closest distance of both loops (Fig. 4). It is seen that the loops measured by the distance of His78 open by almost 10 A and Pro248 alpha carbon atoms. There are two key interactions that keep loops X and Y close to each other in simulation II: First, the presence of alternating hydrogen bonds between atom NE2 of His78 of the loop X and oxygen atoms OP2 and OP6 of pyrophosphate of UDP-Gal; and second, the stacking interaction between the rings of His78 and Pro248. During simulation III (UDP-Gal not being present in the active site), His78 cannot obviously form any hydrogen bond with donor substrate and the stacking interaction between His78 and Pro248 appears to be not strong enough to keep the loops together. Dihedral angles over CA atoms in both loops X and Y and its surrounding were monitored and it was confirmed that relatively large changes occur in loop Y, whereas only small changes were observed in loop X. Further analysis revealed that there are two kinds of movement seen a in loop Y. The first is a ÔhingeÕ movement of the entire loop. The hinge is composed of the residues Cys246-Gly247. The movement is best represented by pseudodihedral angle Ca245-Ca246-
2.3. Coordination of the manganese ion Manganese ion is required for both the stability and the activity of LgtC.16 This is the case in glycosyltransferases adopting the same one domain fold (GT-A), in the sequence of which an acidic motif (DXD) is conserved and directly involved in coordination of the divalent ion.17;18 The role of this divalent cation is to bridge the protein side chains with the diphosphate group of the
98
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
1001
Figure 4. Distance between the a-carbons of His78 and Pro248 as a measure of the loops X and Y opening during simulations II (a) and III (b). Graphic representation of the system where the loops are closed (simulation II) (c) and open, with the active site exposed to solvent (end of simulation III) (d), and comparison of the two structures at the end of the MD simulations (e).
donor substrate. In the crystal structure of LgtC,3 a Mn2þ ion is coordinated by two phosphate oxygen atoms of UDP as well as by the side chain atoms of Asp103 and Asp105, the two acidic residues forming the DXD motif.
The question remains open as to whether the ion arrives into the active site prior to the UDP-Gal or together with it as a complex. We therefore ran the simulation III, where the UDP-Gal is not present, and compared the
99
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
1002
behavior of the Mn2þ ion with that in simulation II. We focused our attention mainly on the coordination number of the ion during the two simulations. In simulation II, the coordination number of manganese ion remains 6 over the whole trajectory involving both carboxylate oxygens of Aspl05, one carboxylate oxygen of Aspl03, imidazole nitrogen of His244 and two pyrophosphate oxygens of UDP-Gal (Fig. 5a). The average distance of the ion from its coordinating atoms (see Table 2), in a good oscillates between 2.1 and 2.4 A agreement with the X-ray structure. When the donor substrate is removed (simulation III), water molecules occupy the location of UDP and solvate Mn2þ ion. Radial distribution function (RDF) between
the ion and water oxygens calculated from the last 1 ns of simulation III shows two solvation shells (Fig. 6). The first solvation shell, which is represented by sharp peak and with the maximum at in the range from 2.1 to 2.6 A 2.3 A, contains three water molecules. Zero value of RDF between the first and second peaks indicates that no exchange of water molecules between the first and second solvation shell occurred in the last 1 ns of simulation. Indeed, residence time of each of these three water molecules is 3 ns. The second solvation shell is represented by a broader peak of RDF in the range of and with the maximum at 4.6 A. It contains 3.6–5.8 A about 10 water molecules with residence times shorter than 1000 ps.
Figure 5. Coordination of manganese ion during MD (a and b) and QM (c and d) calculations. Manganese ion is hexa-coordinated in simulation II (a), and hepta-coordinated in simulation III (b). The energy-minimized structures using ab initio quantum mechanics show better agreement of the hepta-coordinated structure (d) with the MD data than with that of the hexacoordinated manganese ion (c).
Table 2. Coordination of the manganese ion in structure determined by X-ray, molecular dynamics simulations and quantum mechanics calculations (all distances in A) X-raya
Atom
NE2(His244) OD1(Asp103) OD1(Asp105) OD2(Asp105) OP3(PYR285) OP5(PYR285) O(WAT377) O(WAT457) O(WAT9759) a b
Simulation II
Simulation III
QM calculation
Distance from Mn2þ
Average distance from Mn2þ
Standard deviation
Average distance from Mn2þ
Standard deviation
Two water molecules
Three water molecules
2.28 2.24 2.22 2.41 2.25 2.23 –– –– ––
2.37 2.17 2.18 2.27 2.18 2.17 –– –– ––
0.07 0.05 0.05 0.07 0.05 0.05 –– –– ––
2.44 2.18 2.26 2.29 –– –– 2.31 2.31 2.33
0.09 0.05 0.09 0.09 –– –– 0.07 0.08 0.08
2.19 2.08b 2.20 2.25 –– –– 2.27 –– 2.43
2.48 2.21 2.35 2.30 –– –– 2.28 2.53 2.20
Data was taken from crystal structure of LgtC.3 Distance to OD2(Asp103) atom, OD1(Asp103) atom forms hydrogen bond with water 377.
100
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
1003
8
RDF (r)
6 4 2 0
2
3
4
5
6
7
r [A] Figure 6. Radial distribution function calculated for the Mn2þ ion and water oxygens (data taken from the last 1 ns of simulation III). The first solvation shell (sharp peak) contains three water molecules, the second solvation shell (flat peak) contains about 10 water molecules.
The above analysis confirms that oxygen atoms of three water molecules coordinate the ion (Fig. 5b, Table 2), resulting in a coordination number of 7 for the manganese ion. This is not the most common stereochemistry of Mn2þ that is octahedral (coordination number 6). However, the Mn2þ stereochemistry is strongly influenced by the active site of the enzyme, especially by Asp105 that acts as a bidentate ligand. The average value of the O–Mn–O angle is approximately 56, which is significantly lower than the value for the regular octahedral configuration (90). This deformation generates some space to include one additional water molecule. In order to confirm the above observations that are based on force field calculations, more precise ab initio calculations were performed with two water molecules (coordination 6) and three water molecules (coordination 7) using a model of the binding site with a limited number of atoms. The results for coordination number 7 exhibit much better agreement with the geometry taken from molecular dynamics than those obtained for the model with the coordination number of 6 (Fig. 5c and d). These results are in agreement with the hypothesis that steric requirements of UDP-Gal do not allow the Mn2þ ion the coordination number of seven in the simulation II. At the same time, we may conclude that the metal ion is very well located in the active site of the enzyme without the UDP-Gal, so it may easily access the active site prior to UDP-Gal. 2.4. Analysis of the tightly bound water molecules As discussed above, analysis of the manganese environment indicated infinite (or at least very long) residence times for three coordinating water molecules. Analysis has been extended to all water molecules in contact with the protein in order to search for tightly bound water molecules that could play a role in the architecture of the binding site and might also become structural water molecules. A number of tightly bound water molecules were found to stay essentially in the
same positions as those in the X-ray structure. In general, the ÔinternalÕ tightly bound waters stayed in their positions over the whole (or nearly whole) trajectory, whereas those captured on the surface of LgtC were often replaced by water molecules from the solution. It was confirmed that most of the ÔinternalÕ tightly bound waters stabilize the structure of protein, whether forming a bridge between two or three secondary structural motifs or keeping stable one single loop of the protein. With regard to the binding site, 15 water molecules have residence time longer than 1 ns in simulation III and 8 ns in simulation II (see supplementary materials, Table 1S). Representative picture taken from the end of both trajectories, displaying the protein backbone and substrate(s) (Fig. 7), show the typical arrangement of water molecules in the active site. Simulation in the presence of nucleotide sugar––Four water residues from the X-ray structure stay in the active site during the entire MD trajectory, although with some small variations in hydrogen bonding pattern. Three of them are in direct contact with uridine–ribose part of UDP-Gal (Fig. 7a). Water 483 forms a bridge between OP2 of the pyrophosphate part of UDP-Gal and the backbone carbonyl group of His78. Water 324 is bound between the O20 atom of ribose and the carboxyl group of Asp103. Water 382 forms a bridge between the carbonyl group of Leu102 and the amido group of Ala5, keeping together two amino acid strands of LgtC. The fourth stable water molecule (302) interacts with the previous one and also forms a bridge between helix Ser80-Leu87 and the neighboring strand of the protein (Tyr101). Four other water molecules arrived from the bulk solution at the beginning of MD simulation, approached the donor-binding site and stayed there for a reasonably long time (over 1000 ps), but their binding was not as stable as in the case of the X-ray water molecules (which were localized deeply inside the molecule). No water molecule can come close to the manganese ion, as the latter is steadily coordinated by six electronegative atoms from protein and donor substrate (see above).
101
1004
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
substrate is protected from hydrolysis not only by side chains of residues Ile76, Cys246, and Tyr186, but also by stable binding of water molecules 1860 and 6853 to residues mentioned above. Simulation in the absence of nucleotide sugar––Similar remarks apply as for simulation II. The stable water molecules positioned around the active site of the protein are listed in the supplementary material. Many tightly bound waters stabilize the structure of the protein by keeping one or more structural motifs in a certain shape. For example, water 319 was found to stabilize loop X over nearly the entire trajectory, staying in a fixed position inside the loop, between carbonyl oxygens of Ile76, Ile79, and Ser80 and the side chain of Gln187. In contrast, several water molecules exchanged around the more flexible loop Y. Waters 377, 457, and 9759 coordinated the manganese ion as a consequence of absence of donor substrate. Several water molecules are located in the binding site for UDP-Gal where they fill the space for donor substrate (Fig. 7a). They exchange more frequently because of the absence of this substrate and due to the opening of loops X and Y.
3. Conclusions
Figure 7. End-of-trajectory picture of several stable water molecules in simulations II (a) and III (b). The accessibility of the active site for solvent is higher in the absence of donor substrate than in its presence. Apart of this, the picture shows the change of conformation of loop X (loop Y was not included for better readability of the picture).
Since no acceptor is included in this simulation, it is of interest to analyze how water molecules occupy this empty space. The affinity of LgtC to the lactose acceptor is in the millimolar range and only three hydrogen bonds are observed in the crystal structure between the lactose substrate and the surrounding amino acids.3 When examining the neighborhood of the three amino acids involved (Cys246, Asp130, and Gln189), two tightly bound water molecules could be localized in equivalent positions of two oxygen atoms of lactose, O3 of glucose, and O5 of galactose. Water 1860 came in 1658 ps from the bulk solution and formed hydrogen bonds with the hydroxyl group of Thr212 and the carboxyl group of Asp130. Water 6853 also came in 586 ps from the bulk solution and formed a hydrogen bond with Asp130 and Gln189 until 1498 ps (with a short interruption of 40 ps, forming a hydrogen bond with Cys246). No water remained for significant time at the theoretical location of the acceptor hydroxyl group (i.e., O4 of galactose). It therefore appears that the donor
102
Three molecular dynamics simulations were performed on fully solvated galactosyltransferase LgtC from Neisseria meningitidis. The key objective was to run a simulation with the manganese ion and the substrate donor UDP-Gal present and to compare it with the simulation where the UDP-Gal was removed. Essential dynamics analysis of the trajectories clearly revealed that some conformational movement occurs in loop X (residues 75–80) and substantial movement on loop Y (residues 246–251) close to the C-terminal of the protein. In the simulation with UDP-Gal this loop is tightly bound to the UDP-Gal. However, when UDP-Gal is not present in the system, both loops open almost by 10 A and the active site is exposed to the solvent. Three water molecules enter it and coordinate strongly the manganese ion. We have identified residues and dihedrals that are responsible for the loops opening. This Ôhinge regionÕ is composed of the residues 246–247. Blocking this region could allow for fixing the loops, especially loop Y, and may be also for crystallization of the enzyme without UDP-Gal, which has so far been a problem. Detailed analysis of the coordination sphere of manganese indicates that the coordination sphere of the ion is very stable when UDP-Gal is present. In this case, the coordination number is 6. When UDP-Gal is removed, the coordination number increases to 7 by coordination of three bulk water molecules. The coordination number of 7 is confirmed by quantum mechanics calculations. Also in this case, the coordination is very stable. Altogether, we may conclude that the manganese ion is
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
tightly bound in the active site of the enzyme, even if UDP-Gal is not present. In the analysis of structural water molecules, we focused on those residing in the active site of the protein. We found out that the number of water molecules is higher (and their residence times are correspondingly shorter) in the absence of donor substrate than in its presence, suggesting a higher mobility of water molecules and better accessibility of the active site in the former case.
4. Methods The simulations were performed using the A M B E R - 6 . 0 program package19 using the module SANDER. The initial coordinates were taken from the X-ray structure of LgtC galactosyltranseferase with UDP-2-deoxy-2fluorogalactose (UDP-Gal, donor analog) and 4-deoxylactose (acceptor analog), which is deposited in the Protein Databank20 under the code 1GA8.3 The original structure had to be modified. The four missing residues 218–221, which were not detected in X-ray experiment, were added in Sybyl 6.721 using protein loop search. In the simulation III, both donor and acceptor analog molecules were removed. In the simulations I and II, that differs only by the tautomeric state of His244, the acceptor analog molecule was removed and donor analog molecule was replaced with the native UDP-galactose. All of the simulated structures were built up by module XLEAP of A M B E R using the A M B E R force field parm94.22 The parameters and charges for UDP-Gal were taken from our previous parameterization.23 Hydrogens were also added by XLEAP. The system was solvated with TIP3P water molecules added as WATBOX216 cubes to form a rectangular water box. For details about number of water molecules and size of the box see Table 1. The minimum distance of the protein atoms from the walls of this rectangular box was set to In order to maintain electro-neutrality, 9 (trajec10 A. tories I, II) or 7 (trajectory III) Naþ counter ions were added by XLEAP. A stepwise equilibration of the system was performed before the production phase of MD was started. After the energy minimization of hydrogen atoms, ions and noncrystallographic water molecules were minimized, followed by full minimization of ions and all water molecules. To finish the equilibration of the solvent, a 20 ps MD of ions and water molecules was performed at 298 K. The equilibration continued with minimization of the whole system with gradual decreasing of restraints, which were imposed on side chains of the protein residues. After the last step, where no restraints were put on the protein, a 70 ps molecular dynamics with slow heating from 10 to 298 K was carried out, followed by 30 ps
1005
dynamics. Both equilibration MD simulations were carried out under constant pressure of 1 atm. The production phase of MD was performed at a constant pressure of 1 atm and a temperature of 298 K. A timestep of 1 fs was used for the simulations and the coordinates were saved every 1 ps. Bond lengths involving hydrogen atoms were constrained using the SHAKE algorithm.24 In all simulations, non-bonded interactions were treated with the Particle-Mesh Ewald (PME) method.25 The nonbonded pairlist was updated every 10 MD steps using a cutoff of 9 A. The MD trajectories were analyzed with the modules CARNAL and PTRAJ of A M B E R package. Essential dynamics analysis was performed with modules trjconv, g_covar and g_anaeig of G R O M A C S suit of programs.26;27 Tightly bound water molecules analysis was performed with program R E S T I M E (R. Rittenhouse, 2000, unpublished program), and in-house programs (Z. Kriz, 2002). Ab initio calculations were performed with Gaussian 98 molecular modeling package,28 using the Hartree– Fock method with 6-31G* basis set. Two restricted models of manganese ion coordination surrounding were used. The first model represents coordination 7 and the second one coordination 6. First model consists of truncated Asp103, Asp105, and His244 residues, Mn2þ ion, and water molecules 377, 457, and 9759. The Asp103 and Asp105 were truncated to formiate to mimic d-carboxyl groups. The His244 residue was truncated to imidazole ring. Initial coordinates were taken from the final coordinates of the trajectory III, coordinates of new hydrogen atoms were chosen in such Models were a way that new bonds had a length of 1 A. calculated with multiplicity six and with initial wavefunction guess calculated with STO-3G basis set. Hydrogen atom coordinates were first optimized followed by full energy minimization of the whole structure. The second model was generated from the result of the first model by removing the water 457. This structure was also fully geometry optimized.
5. Miscellaneous The drawings were prepared using the
VMD
program.29
Supporting information available List of tightly bound water molecules in LgtC active site (Table 1S), and the history of ÔpseudodihedralÕ angles as a representation of hinge (Ca245- Ca246- Ca247Ca248) and internal movement (Ca249- Ca250- Ca251Ca252) of the loop Y (Fig. 1S).
103
1006
L. Snajdrov a et al. / Carbohydrate Research 339 (2004) 995–1006
Acknowledgements L.S. is supported by a grant from the French Ministere des Affaires Etrangeres. The research has partially been supported (L.S., P.K.) by the Ministry of Education of the Czech Republic (grant LN00A0016). J.K.Õs stays in Grenoble were financed by CNRS. We are also grateful to Supercomputing Centre in Brno and the CECIC in Grenoble for providing us with computational resources.
References 1. Gotschlich, E. J. Exp. Med. 1994, 180, 2181–2190. 2. Tzeng, Y. L.; Stephens, D. S. Microbes Infect. 2000, 2, 687–700. 3. Persson, K.; Ly, H. D.; Dieckelmann, M.; Wakarchuk, W. W.; Withers, S. G.; Strynadka, N. C. J. Nature Struct. Biol. 2001, 8, 166–175. 4. Ly, H. D.; Lougheed, B.; Wakarchuk, W. W.; Withers, S. G. Biochemistry 2002, 41, 5075–5085. 5. Gastinel, L. N.; Bignon, C.; Misra, A. K.; Hindsgaul, O.; Shaper, J. H.; Joziasse, D. H. EMBO J. 2001, 20, 638–649. 6. Boix, E.; Swaminathan, G. J.; Zhang, Y. N.; Natesh, R.; Brew, K.; Acharya, K. R. J. Biol. Chem. 2001, 276, 48608– 48614. 7. Boix, E.; Zhang, Y. N.; Swaminathan, G. J.; Brew, K.; Acharya, K. R. J. Biol. Chem. 2002, 277, 28310–28318. 8. Patenaude, S. I.; Seto, N. O.; Borisova, S. N.; Szpacenko, A.; Marcus, S. L.; Palcic, M. M.; Evans, S. V. Nature Struct. Biol. 2002, 9, 685–690. 9. Nguyen, H. P.; Seto, N. O.; Cai, Y.; Leinala, E. K.; Borisova, S. N.; Palcic, M. M.; Evans, S. V. J. Biol. Chem., 2003, 278, 49191–49195. 10. Heissigerova, H.; Breton, C.; Moravcova, J.; Imberty, A. Glycobiology 2003, 13, 377–386. 11. Andre, I.; Tvaroska, I.; Carver, J. P. Carbohydr. Res. 2003, 338, 865–877. 12. Gunasekaran, K.; Ma, B.; Ramakrishnan, B.; Qasba, P. K.; Nussinov, R. Biochemistry 2003, 42, 3674–3687. 13. Resat, H.; Mezei, M. Biophys. J. 1996, 71, 1179–1190. 14. Amadei, A.; Linssen, A. B.; Berendsen, H. J. Proteins 1993, 17, 412–425. 15. Van Aalten, D. M. F.; De Groot, B. L.; Findlay, J. B. C.; Berendsen, H. J. C.; Amadei, A. J. Comput. Chem. 1997, 18, 169–181.
104
16. Wakarchuk, W. W.; Cunningham, A.; Watson, D. C.; Young, N. M. Protein Eng. 1998, 11, 295–302. 17. Breton, C.; Bettler, E.; Joziasse, D. H.; Geremia, R. A.; Imberty, A. J. Biochem. (Tokyo) 1998, 123, 1000–1009. 18. Busch, C.; Hofmann, F.; Selzer, J.; Munro, S.; Jeckel, D.; Aktories, K. J. Biol. Chem. 1998, 273, 19566–19572. 19. Case, D. A.; Pearlman, D. A.; Caldwell, J. W.; Cheatham T. E., III; Ross, W. S.; Simmerling, C. L.; Darden, T. A.; Merz, K. M.; Stanton, R. V.; Cheng, A. L.; Vincent, J. J.; Crowley, M.; Tsui, V.; Radmer, R. J.; Duan, Y.; Pitera, J.; Massova, I.; Seibel, G. L.; Singh, U. C.; Weiner, P. K.; Kollman, P.A. University of California: San Francisco, 1999. 20. Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucl. Acids Res. 2000, 28, 235–242. 21. SYBYL; 6.4 ed.; 1699 S. Hanley Road, Suite 303, St Louis, MO 63144, USA. 22. Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Gould, I. R.; Merz, K. M. J.; Ferguson, D. M.; Spellmeyer, D. C.; Fox, T.; Caldwell, J. W.; Kollman, P. A. J. Am. Chem. Soc. 1995, 117, 5179–5197. 23. Petrova, P.; Koca, J.; Imberty, A. J. Am. Chem. Soc. 1999, 121, 5535–5547. 24. Ryckaert, J. P.; Cicotti, G.; Berendsen, H. J. C. J. Comp. Chem. 1977, 23. 25. Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089–10092. 26. Berendsen, H. J. C.; van der Spoel, D.; van Drunen, R. Comp. Phys. Commun. 1995, 43–56. 27. Lindahl, E.; Hess, B.; van der Spoel, D. J. Mol. Mod. 2001, 306–317. 28. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; Zakrzewski, V. G.; Montgomery, J. A.; Stratmann, R. E.; Burant, J. C.; Dapprich, S.; Millam, J. M.; Daniels, A. D.; Kudin, K. N.; Strain, M. C.; Farkas, O.; Tomasi, J.; Barone, V.; Cossi, M.; Cammi, R.; Mennucci, B.; Pomelli, C.; Adamo, C.; Clifford, S.; Ochterski, J.; Petersson, G. A.; Ayala, P. Y.; Cui, Q.; Morokuma, K.; Malick, D. K.; Rabuck, A. D.; Raghavachari, K.; Foresman, J. B.; Cioslowski, J.; Ortiz, J. V.; Baboul, A. G.; Stefanov, B. B.; Liu, G.; Liashenko, A.; Piskorz, P.; Komaromi, I.; Gomperts, R.; Martin, R. L.; Fox, D. J.; Keith, T.; Al-Laham, M. A.; Peng, C.Y.; Nanayakkara, A.; Challacombe, M.; Gill, P. M. W.; Johnson, B.; Chen, W.; Wong, M. W.; Andres, J. L.; Gonzalez, C.; Head-Gordon, M.; Replogle, E. S.; Pople, J.A. 98, Revision A. 9 ed.; Gaussian, Inc., 1998. 29. Humphrey, W.; Dalke, A.; Schulten, K. J. Mol. Graph. 1996, 33–38.
ARTICLE V (EN PREPARATION)
Kulhanek, P., Snajdrova, L., Imberty, A. & Koca, J.: Molecular dynamics simulations of glycosyltransferase LgtC with substrate donor and acceptor under physiological conditions. (soumis à J. Amer. Chem. Soc.) Résumé : Les simulations de dynamique moléculaire ont été réalisées sur la α-(1,4)galactosyltransférase (LgtC) de Neisseria meningitidis en complexe avec un ion manganèse et (a) le substrat donneur UDP-Gal, (b) le substrat donneur UDP-Gal et l'inhibiteur deoxylactose, ou (c) le substrat donneur UDP-Gal et le substrat accepteur lactose. Tous les systèmes considérés ont été entièrement solvatés et les conditions physiologiques ont été simulés [c(NaCl) ~ 0,154M]. L'analyse par la méthode de dynamique essentielle ne montre pas de grands changements structuraux près du site actif. Notamment, aucun mouvement n’est observé pour les deux boucles qui couvrent le substrat donneur, ce qui confirme une forte affinité pour l'UDP-Gal. De plus, dans toutes les simulations, le substrat donneur conserve la conformation inhabituelle qui avait été observée précédemment dans les structures cristallographiques. Le deoxylactose reste lié fortement au site actif durant toute la simulation de 10 ns. Aucune molécule d'eau n’est détectée à proximité du carbone anomérique du groupement galactosyl. Au cours de la simulation en présence de lactose, la situation est tout à fait différente. Cette simulation représente le comportement du complexe pré-reactionnel qui ne peut être observé par méthode structurale. Deux positions du lactose dans le site de liaison sont observées pendant les premières 7 ns, puis le lactose est libéré. En fait, un déplacement du lactose par rapport au deoxylactose est observé après la phase d'équilibration. Le comportement des trois acides aminés (Gln189, Asp190 et Tyr186) qui pourraient jouer le rôle de nucléophile dans le mécanisme de double déplacement SN2 - toujours hypothétique - a été également analysé. Plusieurs molécules d'eau ont été localisées à proximité du carbone anomérique de l'UDP-Gal, et donc le solvant pourrait également jouer un rôle dans la réaction enzymatique. Cette étude permet de proposer une première hypothèse du mécanisme de réaction, qui n'a toutefois pas encore été appuyée par des calculs quantiques ou confirmée par des résultats expérimentaux.
105
106
Molecular Dynamics Simulations of Glycosyltransferase LgtC with Substrate Donor and Acceptor under Physiological Conditions Petr Kulhánek,a,b Lenka Šnajdrová,a,c Anne Imberty,c and Jaroslav Kočaa,b* a) National Centre for Biomolecular Research and b) Department of Organic Chemistry, Faculty of Science, Masaryk University, 611 37 Brno, Czech Republic c) Centre de Recherches sur les Macromolécules Végétales (CERMAV-CNRS), Affiliated with Université Joseph Fourier de Grenoble, BP 53, 38041 Grenoble Cedex, France
* author for correspondence, E-mail:
[email protected]
107
ABSTRACT
Molecular
dynamics
simulations
have
been
performed
on
the
fully
solvated
a-(1,4)-
galactosyltransferase LgtC from Neisseria meningitidis with (a) donor substrate UDP-Gal (LU), (b) donor substrate UDP-Gal and deoxylactose inhibitor (LUD), and finally with (c) donor substrate UDPGal and acceptor substrate lactose (LUL) under physiological conditions [c(NaCl)~0.154M]. Essential dynamics analysis did not show larger structural changes close to the active site. Especially, no significant movements were observed in the two loops that cover substrate donor, which confirms a high binding affinity of UDP-Gal to the active site. Moreover, donor substrate in all simulations remained in the same unusual conformation which had been previously observed in crystal structures. Deoxylactose was found to be stably bound to the active site during the whole 10 ns long LUD simulation. No water molecules were detected in the close proximity of the anomeric carbon of the galactosyl moiety. A different behavior of lactose binding in comparison with deoxylactose was found in LUL simulation. This simulation represents behavior of pre-reaction complex, which had not been observed by any structural method yet. Two different binding modes were observed during first 7 ns, then the lactose was released from the active site. A shift of lactose in comparison with deoxylactose within the active site was found. The behavior of three possible nucleophiles Gln189, Asp190, and Tyr186 required in still hypothetical double displacement SN2 mechanism were also analyzed. However, no substantial movements of side chains of these amino acids towards anomeric carbon of galactosyl moiety required by this mechanism were observed. The reaction mechanism that we propose has yet to be supported by quantum chemical calculations and experimental data.
KEYWORDS: galactosyltransferase, molecular dynamics, tightly bound water, binding energy, active site solvation, physiological conditions 108
Introduction The glycosyltransferases catalyze the transfer of glycosyl moieties from a donor sugar to an acceptor. In most cases, the donor is a nucleoside phosphosugar and the acceptor is a hydroxyl group of another sugar, a lipid, or another component of glycoconjugates. The galactosyltransferase LgtC from Neisseria meningitidis is a retaining α-1,4-glycosyltransferase, a member of family 8 of glycosyltransferases,1 that transfers α-D-galactose from UDP-galactose (UDP-Gal) to a terminal lactose (LAC) (Scheme 1). The resulting structure is a mimic of the human Pk blood group glycolipid and can therefore participate in the camouflage of the bacterial surface from recognition by the human immune system. As Neisseria meningitidis can act as a human pathogen causing an invasive systemic infection that can be fatal,2 understanding of the substrate recognition and mechanism of LgtC is of interest for future design of inhibitors that could block lipooligosaccharide biosynthesis.
Scheme 1. Schematic representation of the LgtC catalyzed transfer of galactose from UDP-Gal to lactose. The product retains the configuration of the donor sugar glycosidic bond; LgtC is thus a retaining glycosyltransferase.
The crystal structure of LgtC has been recently solved as a complex with analogs of both donor (UDP2-deoxy-2-fluorogalactose) and acceptor (deoxylactose).3 The enzyme is a monomer composed of 286 residues that create nine strands and fourteen α-helices (Figure 1). The active site is mostly buried under 109
two loops that presumably play a key role in organizing the substrates before the reaction and probably also in removing the products from the active site. Loop X (residues 75-80) makes a connection between helices C and D. Loop Y consists of residues 246-251 and is located in the C-terminus region of the enzyme connecting sheet 9 and helix M. The two loops are on opposite sides of the active site groove. In the complex with substrate analogs, the two loops close above the donor sugar by a stacking interaction between Pro248 and His78.3 Structural data about the conformation of the loops in the native enzyme is not available because the absence of UDP-Gal donor substrate results in inability to crystallize the enzyme. However, there are computer simulation data4 showing opening of the loops when UDP-Gal is not present.
Figure 1. Structural information from the crystal structure of LgtC complexed with substrate analogs. α−helices in purple; β−sheets in yellow; deoxylactose in iceblue; UDP-Gal in orange (UDP) and blue (galactosyl); X loop in red; Y loop in green. Drawing prepared by VMD.5
The structure of LgtC is a key starting point to fully understand the mechanism of the enzymatic reaction that is an essential precondition for rational design of effective inhibitors of this enzyme. Even if a reasonable effort was put into the field, there are still several important unanswered questions concerning binding substrates and the reaction mechanism of glycosyltransferases. In the case of inverting glycosyltransferases, the experimental structural information6,7 and also theoretical 110
calculations8,9 support an idea of a single SN2 nucleophilic attack by one of the hydroxyl groups of the acceptor. However, the situation is much less clear for retaining glycosyltransferases. There are several studies that experimentally exclude the so-called double displacement mechanism.10 In this mechanism, two consecutive SN2 steps each inverting configuration would occur so the final configuration is retained. Studies excluding double displacement mechanism are also supported by theoretical investigations11 that prefer a single step SNi reaction. However, the energy barrier calculated by this theoretical study is considerably higher (31 kcal/mol for model in vacuum and from 10 to 24 kcal/mol for models in enzyme environment approximated by homogeneous dielectric material) than that could be estimated from experimentally measured rate constants (12–14 kcal/mol).3,10 Only recently a paper appeared12 that again supports the idea of the double displacement mechanism. A review collecting above information can be found in refs 13 and 14. Another issue is that there was never considered a water molecule that would be involved in the reaction mechanism as there was no one found in existing crystal structure that would be close to the active site. However, the question is whether the situation is the same in a water environment and for real (not modified) substrates. Finally, there are also several issues concerning arranging the starting material to react and the products of the reaction to depart from the active site. Our attempt with this paper is to address the above questions. We will show differences in deoxylactose and lactose binding to the active site. Additionally, we will also somewhat weaken the statement coming from the X-ray structure analysis that there is no solvent water molecule present in the active site. Finally, we will contribute to the ongoing discussion on the double displacement mechanism by studying the role of the ASP190 residue and two other residues.
111
We have chosen molecular dynamics for our studies as it becomes a reliable and frequently used computer simulation method that may usefully complement experimental observations. Recently, molecular dynamics has been used to analyze the backbone flexibility of an inverting glycosyltransferase15 and also to study LgtC.4 We report here the first molecular dynamics simulations of LgtC that was run with both substrates in explicit water solvent and with such concentration of sodium ions that would best mimic the physiological conditions.
Computational Methods Molecular Dynamics Simulations. Three simulations abbreviated LU, LUD, and LUL (see Table 1) of 10 ns length for each of them were performed using the AMBER-7.0 program package16 using the modules SANDER and PMEMD.17 The initial coordinates were taken from the X-ray structure of LgtC galactosyltransferase with UDP-2-deoxy-2-fluorogalactose (donor analog) and 4-deoxylactose (DAC, acceptor analog), which is deposited in the Protein Databank18 under the code 1GA8.3 The original structure had to be modified. The four missing residues 218-221, which were not detected in X-ray experiment, were added in Sybyl 6.719 using protein loop search. The donor analog molecule was replaced with the native UDP-Gal in three simulations LU, LUD, and LUL. In simulation LUL, the deoxylactose was further replaced with lactose. All of the simulated structures were built up by module XLEAP of AMBER using the AMBER force field parm99.20 The parameters and charges for UDP-Gal were taken from our previous parameterization.21 Deoxylactose and lactose were described by GLYCAM04 force field.22 Manganese atom in the active site were describe as ion with charge two plus.23 Hydrogens were added by XLEAP. The system was solvated with TIP3P water molecules added as WATBOX216 cubes to form a rectangular water box. For details about number of water molecules and size of the box see Table I. The minimum distance of the protein atoms from the walls of this rectangular box was set to 10 Å. In order to maintain electro-neutrality and isotonic conditions, Na+ and Cl- ions were added to the simulated systems by XLEAP. 112
A stepwise equilibration of the system was performed before the production phase of MD was started. After the energy minimization of hydrogen atoms, ions and non-crystallographic water molecules were minimized, followed by full minimization of ions and all water molecules. To finish the equilibration of the solvent, a 50 ps long molecular dynamics simulation of ions and water molecules was performed at 298 K. The equilibration continued with minimization of the whole system with gradual decreasing of restraints which were imposed on side chains of the protein residues. After the last step, where no restraints were put on the protein, a 70 ps molecular dynamics with slow heating from 10 K to 298 K was carried out, followed by 30 ps dynamics. Both equilibration MD simulations were carried out under constant pressure of 1 atm. The production phase of molecular dynamics was performed at a constant pressure of 1 atm and a temperature of 298K. A timestep of 2 fs was used for the simulations and the coordinates were saved every 1 ps. Bond lengths involving hydrogen atoms were constrained using the SHAKE algorithm.24 In all simulations, non-bonded interactions were treated with the particle-mesh Ewald (PME) method25 with direct sum truncated at a cutoff of 9 Å. Trajectory Analysis. The MD trajectories were analyzed with the module PTRAJ of AMBER package (for geometry measurements, radial distribution function). Essential Dynamics Analysis (EDA) was performed with modules trjconv, g_covar and g_anaeig of GROMACS suite of programs.26,27 Tightly bound water molecules analysis and the calculation of 2D-radial distribution functions were performed with program RETINAL.28 This program uses fast method for searching contacts between solute and solvent based on subcell method. Binding Energy Calculations. MM/PBSA29,30,31 and MM/GBSA30,31 were used to calculate binding energies. These methods use the same approach that differs only in the evaluation of the electrostatic part of solvation energy. The binding energy ∆Gb is calculated from the absolute free energies of complex Gc and individual receptor Gr and ligand Gl as follows: ∆Gb = Gc – (Gr + Gl)
113
The complex is LgtC protein with both sugar donor and acceptor. The ligand is either DAC or LAC. The receptor is a complex of LgtC only with UDP-Gal. Absolute free energies are calculated in both methods in the similar way: G = - TS + Gpol + Gnon The first two parts of formula represent free energy of the system in vacuum. The last two terms then represent solvation free energy (e.g., the free energy related to the movement of the system from vacuum to solvent) that is divided in two individual contributions. The Gpol is the electrostatic part of solvation energy which is calculated either using Poisson-Boltzmann (PB) approach (MM/PBSA) or GeneralizedBorn (GB) approach (MM/GBSA). Different atom radii that determine boundary between solute and solvent were used for PB32 and GB.33 In PB calculations, the interior relative permitivity was set to 2 and the solvent relative permitivity to 78 which correspond to water. In GB calculations, the interior relative permitivity was set to 1 and the solvent relative permitivity to 78. In both approaches, the atomic charges were the same as in molecular dynamics simulations. The Gnon is the non-polar part of solvation energy which is calculated from solvent accessible surface area (SASA). There is a linear relationship between SASA and Gnon. Tension factor (slope of this linear relationship) of 0.00542 and 0.0072 was used in MM/PBSA32 and MM/GBSA34 calculations, respectively.
114
Results and Discussion Overall Characteristics of the Simulations. Three trajectories were run with total production length of 10 ns for each of them (Table 1). The simulations were carried out in the explicit water and in the presence of Na+ and Cl- ions. The concentration of NaCl molecules was chosen in such a way that the simulation mimic physiological conditions, e.g., c(NaCl) is close to 0.154 mol/l. The protein contains 31 positively and 30 negatively charged groups on the surface, which are even in some regions very close to each other. Non-zero ionic strength could then avoid possible structural changes caused by electrostatic attraction or repulsions among these groups because electrostatics of the system is much better balanced. Electrostatic field introduced by ions could also increase apparent permitivity in protein core, which could consequently slightly change the overall structure. In our previous study, physiological conditions were not used and the obtained results were still in good agreement with experimental observations. The reason why we decided to change simulation condition in presented simulations was that the binding affinity of deoxylactose or lactose is significantly lesser than that of UDP-Gal described in previous work. The neglecting of above mentioned ionic strength effects could then be fatal for the stabilities and correctness of simulations.
Table 1. Overview of simulated systems Abreviation Contents
Conditions
LU
LgtC + UDP-Gal
c(NaCl)=0.124 Mol/l, 14137 water molecules, 44 Na+ atoms, 35 Cl- atoms
LUD
LgtC + UDP-Gal + Deoxylactose c(NaCl)=0.127 Mol/l, 13769 water molecules, 44 Na+ atoms, 35 Cl- atoms
LUL
LgtC + UDP-Gal + Lactose
c(NaCl)=0.124 Mol/l, 14102 water molecules, 44 Na+ atoms, 35 Cl- atoms
115
Two simulations, namely LUD and LUL, were run to show the behavior of LgtC catalytic system either in the presence of sugar acceptor model (lactose) or the inhibitor (deoxylactose). The remaining simulation, LU, was run to show the behavior of loops that cover UDP-Gal in the active site in longer simulation time and under different simulation conditions than in previous work.4 Simple measurement of structural changes occurring during three simulations LU, LUD, and LUL is shown in Figure S1 (see Supporting Information section). It is represented by root-mean-square deviation (RMSD) of Cα atoms from their crystallographic positions. Large changes only occurred during first 2 to 3 ns and then the systems do not change significantly. In the period from 3 ns to 10 ns, average RMSD (and its variance) are 1.66 (0.09), 1.62 (0.10), and 1.48 (0.09) Å for LU, LUD, and LUL simulations, respectively. However, the RMSD quantity is hardly able to show regions, in which larger or smaller structural changes occurred. For this purpose, Essential Dynamics Analysis26,27 was used with the subsequent analysis of Cα atom fluctuations. Only Cα atoms were considered in EDA. In the first step, covariance matrix build from the fluctuations of these atoms around their average positions was calculated. Then its eigenvalues and their corresponding eigenvectors were evaluated. Eigenvectors can be used to split overall motion along trajectory into separate modes. The modes with greater eigenvalues represent slow motion – essential motion, whereas modes with smaller eigenvalues represent fast motion – thermal motion. In the final step, eigenvectors represented by 10 greatest eigenvalues were used for the construction of new trajectory from old one that does not contain thermal motion noise. Such trajectories for LU, LUD, and LUL simulations were analyzed. To find the regions with largest changes, root-mean-square fluctuations of individual Cα atoms were calculated (Figure 2). Main structural changes (RMSF > 0.75 Å) occur in the following protein parts (expressed by residue numbers): LU simulation 55-57, 66-68, 136-148, 182-183, 217-220, 222-228, 255; LUD simulation 2729, 53-57, 136-147, 216-224, 226, 255, 257-258, 260-; LUL simulation 26-28, 55-57, 121, 146-147, 218-221, 254-258. Described structural changes are mainly caused by the movements of loops on the protein outer interface. These protein regions also116 exhibit high values of B-factors in corresponding X-
ray structure (Figure S2 – see Supporting Information section), which means that above mentioned regions represent flexible parts of protein. The most important observation is that these structural changes do not influence overall protein folding and the structure of the active site and its close proximity on the time scale the simulations were run. This can be demonstrated by minimal distance between C1 atom of galactosyl moiety and Cα atom whose RMSF is greater than 0.75 Å These values for the first snapshots of trajectories (after equilibration) are 14.8 (GLN136), 15.3 (GLN136), 15.8 (ALA258) Å for LU, LUD, and LUL, respectively.
Figure 2. Root-mean-square fluctuations of Cα atoms from their average positions along peptide chain. Projected trajectories using eigenvectors represented by 10 greatest eigenvalues from EDA were used.
Stability of UDP-Gal binding into the active site and active site loops flexibility. In our previous work,4 main attention was focused to the analysis of how the X and Y loop behavior is influenced by presence/absence of UDP-Gal that is somewhat “covered” or stabilized by the loops. We have concluded that the loops spontaneously open when UDP-Gal is not present in the active site and keep closed in the opposite case. However, our simulations in that study were only 3 ns long and the ionic strength was lower (c(Na+) ≈ 0.03 Μ). In the present study, the loops were found to be closed during whole 10 ns simulations confirming conclusions from ref. 4. Very short loop opening was observed at 117
the beginning of simulation LU (Figure 3a) where the distance between X and Y loops is overcoming 9 Å. Similar situation occurred at the end of the simulation. However, observed loops opening is significantly smaller than in the situation without UDP-Gal in active site, in which loops were opened up to 15 Å during 3 ns long simulation.4 In the remaining time, the distance oscillates in the short range from 5.0 to 6.5 Å (Figure 3b). The stability of UDP-Gal binding into active site was also confirmed by EDA analysis, which does not show any significant essential motions of loops. The observed loop stability is in good agreement with experimental data10 showing that UDP-Gal binding to the active site is very strong (Km = 18 µΜ) and, therefore, a spontaneous releasing of UDP-Gal from the active site accompanied by the loops opening would not be likely on the time scale of our simulations.
Figure 3. (a) Distance between loop X and Y represented by distance between two Cα atoms of HIS78 and PRO248 residues. (b) Histogram analysis of distance distributions.
Deoxylactose and lactose binding into the active site – geometry considerations. Distance between C4' carbon of DAC/LAC (carbon atom in position 4 that binds the acceptor group in the case of LAC) and galactosyl C1 (anomeric) carbon was used as the first geometrical criterion to judge stability of the enzyme/substrate complex (Figure 4). The binding of deoxylactose to the active site is stable during the whole LUD simulation. The average value is 3.71 Å with root-mean-square deviation (rmsd) of about 0.19 Å. The situation for LUL simulation is completely different. Two changes in its binding were observed. The first one occurred at about 2 ns and the second one at about 7 ns. The average distance value for the first period is 4.37 Å with rmsd of about 0.15 Å. Small shift to 4.41 Å in average distance was observed for the second period but with significant increasing of its rmsd to 0.28 Å. After 7ns, the releasing of lactose from the active site was observed. 118
It can be seen that the average distance in LUL simulation is shifted by about 0.7 Å to a larger value compared to LUD simulation. This behavior might be expected as the hydroxyl group at position 4 exhibits additional steric requirements that have to be balanced by longer distance between the acceptor and donor molecules.
Figure 4. (a) Evolution of distance between the C1 carbon of UDP-Gal and the C4' lactose or deoxylactose carbon atom. (b) Histogram analysis of the above distance distributions.
Figure 5. (a) Evolution of the angle connecting C4', C1, and OP1 atoms. (b) Histogram analysis of the above angle distributions.
Two observed lactose binding modes were confirmed by measuring of the angle among C4', C1, and OP1 atoms (Figure 5). Because changes in the lactose binding occurred during the equilibration stage, the angle values at the beginning of production part of the trajectory differ for LUD and LUL simulations. Lactose binding mode switches back at about 2 ns. This is accompanied by a stepwise change of the measured angle. No similar stepwise change occurs for deoxylactose confirming stability of its binding to the active site during the whole LUD simulation. This is also confirmed in Figure 6 that shows superposition of X-ray structure and the average structure of the active site from the whole LUD simulation. Only small changes in positions of amino-acid residues were observed. 119
Figure 6. Superposition of X-ray (red) structure and the average structure from the whole LUD simulation (blue).
Figure 7. Superposition of the first (blue) and second (red) lactose binding modes. (View from the center of the protein).
Comparison of two lactose binding modes is shown in Figure 7. In the first mode, lactose is situated in such a position that hydroxyl group is above C1 carbon of UDP-Gal. In this case, the lactose could hypothetically attack UDP-Gal with the change of configuration on C1 atom. However, this is not in agreement with experimental observation showing that the configuration is retained. If the first mode represents a real situation then it would mean that the attack from this side would have to exhibit higher activation barrier than the real mechanism. Another and more probable explanation is that the first binding mode is just an artifact of LUL simulation because it was started from artificially constructed structure. It follows from the detailed analysis of the first 2 ns long period follows that lactose is continuously burring into the protein and when some critical point is reached it is very quickly (during several ps) switched to the second observed binding mode. In this mode, hydroxyl group of lactose is situated from the same side as diphosphate part as it would be expected. Lactose stays in this mode 120
approximately 5 ns. From the comparisons of the two lactose binding modes, it also follows that the change in lactose binding does not influence structure of UDP-Gal in which it remains in an unusual folded conformation.3
Behavior of possible nucleophiles during double displacement SN2 reactions. As noticed in introduction section, there is no complete knowledge of the reaction mechanism available. Mechanism that is suggested in other configuration retaining transferases requires a presence of a nucleophile somewhere directly within the enzyme itself. The transfer is then realized in two inverting steps, so the final configuration is the same as it was the starting one. It was speculated that GLN189 could be such a nucleophile within LgtC.3 It is not a strong nucleophile as for example glutamate but its terminal amide oxygen was found in the crystal structure in position very close to C1 atom from the required side of galactosyl moiety. The evolution of distance between such oxygen and C1 atom shows similar behavior in all presented simulations (Figure S3 – see Supporting Information section). Averaged values and rmsd of measured distance are 4.24 (0.40), 4.10 (0.29), and 4.81 (0.37) Å for LU, LUD, and LUL simulations, respectively. Only shift of about 0.6 Å to larger value was observed in the LUL simulation compared to the remaining two. This shift is connected with the movement of lactose to the above of the galactosyl moiety during the first 2 ns (as discussed before). This shift remains when lactose changes its binding position and even if it is released from the active site. Above analysis shows that GLN189 remains in the proper space in all simulations. However, experimental studies excluded this residue as possible nucleophile candidate.3 The mutation of GLN189 to ALA189 decreases enzyme activity to 3 % of the activity of the wild type. This means that such residue cannot be essential for the mechanism because sufficient activity still remains. We can also conclude that it is very unlike that another nucleophile candidate exists because GLN189 constantly occupies the space that would have to be reached by it. We will show it in following parts, in which the behavior of two other nucleophile candidates will be inspected. 121
First of considered nucleophile candidates is TYR186. This residue was found in the crystal structure relatively close to the anomeric carbon C1. Furthermore, its position in the enzyme might be suitable for the attack of the UDP-Gal from required side. However, oxygen atom of hydroxyl group situated on aromatic ring of this residue did not move towards C1 atom during any of the simulation (Figure S4 – see Supporting Information section). The average values and rmsd of distance between the oxygen and C1 atom are 9.57 (0.66), 9.60 (0.49), and 10.20 (0.45) Å for LU, LUD, and LUL simulations, respectively. The best situation is in simulation LU, in which the closest distance of about 6.6 Å was found. Also experimental observation excluded this residue because its mutation to phenylalanine lead even to the increase of catalytic activity by about 50%.3 Very recently, based on experimental studies, also the ASP190 residue was proposed as new candidate to play the nucleophile role.12 This is very surprising as the carboxyl group of this residues is as far as about 9 Å distant from the anomeric C1 carbon in the crystal structure. The idea that comes the first to explain this discrepancy is that a conformational change on the protein occurs that brings the carboxyl group of ASP190 more close to the anomeric carbon. Therefore, we have analyzed this distance in the LU, LUD, and LUL simulations (Figure S5 – see Supporting Information section) and found no substantial distance change in the course of all simulations. The average values and rmsd of distances between the closest carboxylic oxygen and C1 atom are 10.53 (0.45), 10.15 (0.52), 9.89 (0.35) for LU, LUD, and LUL simulations, respectively. Longer average distances for LU and LUD simulations were found because ASP190 is even moving apart from C1 atom. So there is no sign that the carboxyl group of ASP190 would get closer to the anomeric carbon within the simulated time scale. As this simulation time scale is substantially shorter than the rate of the enzymatic reaction, it is realistic to expect that also slower conformational changes could successfully contribute to bring the reacting groups closer to each other. But in this case very large conformational changes would have to occur.
Analysis of active site solvation and search for tightly bound water molecules. We have calculated radial distribution function (RDF) based on distance r between the oxygen atom of water molecules and 122
C1 carbon atom for LU, LUD, and LUL simulations (Figure S6 – see Supporting Information section). This function provides information about the occurrence of water molecules in close neighborhood of the active site. Surprisingly, RDF functions for LU and LUL simulations show a presence of water molecules close to C1 atom. This is in direct contradiction with conclusions drawn in ref. 3, where no water molecule was expected close to the active site. Because RDF does not include information about spatial distribution of water molecules around the active site, we extended it to two dimensions (Figure 10). The first one is the usual distance as discussed above and the second one is the angle comprised by the water oxygen, C1, and, OP1 atoms. The value of this angle shows whether the water oxygen could attack the C1 atom as SN2 nucleophile.
Figure 10. Radial distribution functions with spatial extension. (a) LU, (b) LUD, (c) LUL from 0 to 2 ns, and (d) LUL from 2 to 5ns.
Two solvation shells are well formed in LU simulation. The first shell is within range from 3 to 5 Å. Integration shows that there are on average 3.8 water molecules in it. The shell is represented by two 123
separated peaks in RDF. The larger one has maximum at 3.7 Å and 47 deg. It represents water molecules that interact with oxygen atoms of diphosphate part of UDP-Gal. The second peak has maximum at 3.7 Å and 116 deg. It represents water molecules that could attack C1 atom nearly from the opposite site as diphosphate group is situated. This means that these water molecules would be responsible for hydrolysis of UDP-Gal. However, the residence times of all water molecules in the first shell are very short. Average value is about 9 ps with longest residence time about 570 ps. Even these “long residency” water molecules very quickly exchange between the two occupied subspaces of the first shell. This could be an explanation why the observed hydrolysis is so slow in comparison with the catalyzed galactosyl transfer.10 The above finding is in good agreement with the crystal structure of LgtC with donor analog3 (PDB code 1G9R), where one water molecule is present in such a position that the geometrical parameters used above are of values 4.4 Å and 51 deg, respectively. Also acetic acid was found close to anomeric C1 atom in this crystal structure. One of its oxygens exhibits geometrical parameters 3.6 Å and 116 deg, which are almost identical as the parameters shown above for the second peak. We found two other crystal structures (PDB codes 1LL2 and 1UQU) of retaining glycosyltransferases that exhibit the presence of water molecules at least in one of above described positions.35,36 The second shell of C1 atom was then found within range from 5 to 7 Å and the average number of water molecules in it is about 4.1. When deoxylactose is present in the system, water molecules are effectively rejected from the close proximity of C1 atom. Water molecules are only detected in the distance larger than 5 Å. However, one exception was found. One water molecule reached C1 in shorter distance approximately after 3 ns of the simulation. This water occupied the position for about 12 ps. This water molecule originates from one of the water molecules found in the crystal structure and it participates with other water molecules in the binding of deoxylactose to the active site. The RDF was calculated separately for two binding modes of lactose. No water molecule was found closer to C1 atom than 5.5 Å in the first binding mode (see Figure 11a). The situation is different in the second binding mode. Well formed peak at position 3.5 Å and 129 deg shows the presence of water 124
molecules close to C1 atom. Integration of this peak in the range from 3 to 5 Å leads to average number of 2.4 water molecules. The representative pictures from the beginning and the end of the second mode are shown in Figure 11b and 11c. Different behavior of lactose compared to its deoxy analogue is probably caused by the presence of hydroxyl group at position 4'. As it was shown above this moves lactose by about 0.7 Å away from the active site. This shift together with the small change of GLN189 position generates space enough to enter water molecules.
125
Figure 11. Solvent behavior during lactose binding to LgtC. (a) first binding mode (1 ns), (b) second binding mode (2 ns), (c) second binding mode (6 ns). Drawing prepared by CHIMERA37 using solvent-excluded molecular surface created by MSMS package.38
126
Involved water molecules quickly exchange in the small cavity formed between lactose and UDP-Gal. It is interesting that they partially interact with carboxylic group of the ASP130. In some cases, one water molecule reaches position from which it could attack UDP-Gal as SN2 nucleophile. The ASP130 could then indirectly serve as the base that accepts proton from this water molecule (through water molecule chain or through lactose hydroxyl group in position 6'). As the result, galactose, UDP, and lactose would be present in the active site. In the next step of this hypothetical mechanism, the lactose hydroxyl group would attack C1 carbon from the opposite site. UDP would act as the base in this step. Proposed mechanism is shown in Scheme 2.
Scheme 2. Proposed mechanism of LgtC catalysis including water molecule in the active site.
127
Lactose and deoxylactose binding into the active site – energy considerations. The above geometrical observations show different behavior of deoxylactose and lactose in the active site. We will now quantify the changes by calculating the free energy of binding using MM/PBSA and MM/GBSA approaches. Evolution of binding free energy and its components calculated by MM/PBSA approach using a single trajectory (substrates/enzyme complex) is shown in Figure 12. It is clearly seen that the binding free energy for deoxylactose is lower than that for lactose during the entire simulations. There is also difference in binding energies between the first and second binding mode of lactose and between the second binding mode and the release of lactose from the active site. This change is more visible in the evolution of the free energy MM and PB components, especially for the release of lactose from the active site (SA component is very small and nearly constant therefore it is not shown in the graph). Average values of binding energies are summarized in Table 2. Both MM/PBSA and MM/GBSA methods give similar results. Calculated binding energies show that deoxylactose binds tightly than deoxylactose, which is accompanied by the binding energy differences of about 5 kcal/mol. When lactose switches from the first to the second binding mode, its binding energy further decreases by about 1.7 kcal/mol. Experimental lactose binding energy3,10 calculated from its Km value (20 mM) is about -2 kcal/mol. Our value is then overevaluated. The probable reason is that several approximations are used in our calculations. Especially solute entropy changes and deformation energies of protein and free deoxylactose and lactose were omitted in our calculations. The next problem is accompanied by the presence of water molecules that mediate the binding of both deoxylactose and lactose to the active site. This is especially important when lactose change its binding mode. There are several approaches how to deal with such water molecules39 but they have to permanently occupy some space element, which is not our case. Even if there are some approximations and limitations of these methods we can still use above results to drawn qualitative conclusions. First, they at least show that lactose bind to the active site less stable than deoxylactose, which is in agreement with experimental observation10 (Kei of deoxylactose is about 16 mM). Second, the binding energy decreases when lactose switches between the first and
128
second binding mode. This decrease could mean that the observed second binding mode is actually the beginning of lactose release from the active site.
Table 2. Binding free energies calculated by MM/PBSA and MM/GBSA methods. Simulation Period [ns]
∆Gb(MM/PBSA) [kcal/mol]
∆Gb(MM/GBSA) [kcal/mol]
LUD
0-10
-16.3
-15.4
LUL
0-2
-10.8
-10.7
LUL
2-7
-9.1
-8.9
Figure 12. (a) Evolution of the MM and PB binding energy components. (b) Evolution of total MM/PBSA binding free energy.
129
Conclusion Three simulations LU, LUD, and LUL of LgtC enzyme with UDP-Gal, UDP-Gal and deoxylactose, and UDP-Gal and lactose fully solvated and under physiological conditions were run with total length of 10 ns for each of them. No simulation shows any significant changes in LgtC folding. Larger changes only occurs in four regions that mostly represent flexible loops. An important observation is that these changes do not occur in close proximity of the active site. In all simulations, the loops covering UDPGal in unusual conformation stay closed, which is in agreement with very low Km value of its binding to the active site. A well formed first solvation shell containing about 3.8 water molecules was found in the LU simulation within distance range from 3 to 5 Å from C1 atom. It principally occupies two subspaces. The first one is on the same side of galactosyl moiety as the diphosphate. The second one is in a position from where the water molecule might attack UDP-Gal. These two subspaces well corresponds to the oxygen atoms of water and acetate found in the crystal structure of LgtC with UDP-Gal analog. Different behavior of deoxylactose and lactose was found. Deoxylactose was found to bind tightly to the active site during the whole LUD simulation and no water molecule was found in the close proximity of the UDP-Gal anomeric carbon C1, which is also in agreement with known crystal structure. When lactose is present in the active site, two different binding modes were observed. In the first mode, the lactose is continuously burring in to the active site during about 2 ns in such a way that hydroxyl group in 4' position approaches the anomeric carbon C1 of UDP-Gal from opposite side than it should be according to the reaction stereochemisty. No water molecules were found close to C1 carbon in this mode. Lactose is then very quickly switched to the second mode, in which the hydroxyl group is on the correct side of galactosyl moiety. This mode is exhibited by the presence of about 2.4 water molecules in the close proximity of carbon C1. These water molecules are situated approximately in the same space as the second part of the first solvation shell in LU simulation. Moreover they are pushed by lactose more above the galactosyl moiety. The oxygen of attacking lactose hydroxyl group then occupies the position as the first part of the first solvation shell in LU simulation. This finding is surprising as no 130
water molecules were supposed to be in the active side. One of the reasons would be that the hydroxyl group, which is missing in deoxylactose, exhibits additional steric requirements that are balanced by longer distance between the acceptor and donor molecules. After 5 ns long second binding mode, the release of lactose from the active site was observed. The role of possible nucleophile candidates GLN189, TYR186, and ASP190 in still hypothetical double SN2 displacement mechanism were also evaluated. None of them moves towards the anomeric carbon C1 in any simulation. Therefore it seems that they do not play a key role in this type of mechanism. However, the presence of water molecules in the second lactose binding mode show that these well positioned water molecules could play their role. Nevertheless, more studies are necessary to prove it.
Acknowledgments. The authors thank the Supercomputing Centre in Brno, Czech Republic, for providing access to computer facilities. Financial support from the Ministry of Education, Youth, and Physical Training of the Czech Republic (Contract No. MSM0021622413 – PK and LC06030 – JK) and from the Grant Agency of the Czech Republic (Grant No. 204/03/H016 - LS) is gratefully acknowledged. Supporting Information Available: Figure S1. Evolution of root-mean-square deviation of Cα atom positions from their X-ray positions in the course of simulations.; Figure S2. Experimental Bfactors of Ca atoms from crystal structure of LgtC complex with donor and acceptor analogs.3; Figure S3. (a) Evolution of the distances between C1 atom and the amide oxygen of GLN189 residue. (b) Histogram analysis of the above distances.; Figure S4. (a) Evolution of the distances between C1 atom and the hydroxyl oxygen of TYR186 residue. (b) Histogram analysis of the above distances.; Figure S5. (a) Evolution of the distances between C1 atom and the closest carboxylic oxygens of ASP190 residue. (b) Histogram analysis of the above distances.
131
References (1) Coutinho, P. M.; Deleury, E.; Davies, G. J.; Henrissat, B. J. Mol. Biol. 2003, 328, 307-317. (2) Tzeng, Y. L.; Stephens, D. S. Microbes Infect. 2000, 2, 687-700. (3) Persson, K.; Ly, H.D.; Dieckelmann, M.; Wakarchuk, W. W.; Withers, S. G.; Strynadka, N. C. Nature Struct. Biol. 2001, 8, 166-175. (4) Šnajdrová, L.; Kulhánek, P.; Imberty, A.; Koča, J. Carbohydrate Research 2004, 339, 995-1006. (5) Humphrey, W.; Dalke, A.; Schulten, K. J. Mol. Graphics 1996, 14, 33-38. (6) Ünligil, U. M.; Rini, J. M. Current Opinion in Structural Biology 2000, 10, 510-517. (7) Tarbouriech, N.; Charnock, S. J.; Davies, G.J. J. Mol. Biol. 2001, 314, 655-661. (8) Tvaroska, I.; Andre, I.; Carver, J. P. J. Am. Chem. Soc. 2000, 122, 8762-8776. (9) Tvaroska, I.; Andre, I.; Carver, J. P. Glycobiology 2003, 13, 559-566. (10) Ly, H. D.; Lougheed, B.; Wakarchuk, W. W.; Withers, S. G. Biochemistry 2002, 41, 5075–5085. (11) Tvaroska, I. Carbohydr. Res. 2004, 339, 1007-1014. (12) Lairson, L. L.; Chiu, C.P.; Ly, H. D.; He, S.; Wakarchuk, W. W.; Strynadka, N. C.; Withers, S. G. J. Biol. Chem. 2004, 279, 28339-28344. (13) Lairson, L. L.; Withers, S. G. Chem. Commun. 2004, 20, 2243-2248. (14) Breton, C.; Snajdrova, L.; Jeanneau, C.; Koca, J.; Imberty, A. Glycobiology 2006, 16, 29R-37R. (15) Gunasekaran, M.; Ma, B.; Ramakrishnan, B.; Qasba, P. K.; Nussinov, R. Biochemistry 2003, 42, 3674-3687.
132
(16) AMBER 7, Case, D. A.; Pearlman, D. A.; Caldwell, J. W.; Cheatham III, T. E.; Wang, J.; Ross, W. S.; Simmerling, C. L.; Darden, T. A.; Merz, K. M.; Stanton, R. V.; Cheng, A. L.; Vincent, J. J.; Crowley, M.; Tsui, V.; Gohlke, H.; Radmer, R. J.; Duan, Y.; Pitera, J.; Massova, I.; Seibel, G. L.; Singh, U. C.; Weiner, P. K.; Kollman. P.A. University of California, San Francisco, 2002. (17) PMEMD 3, Robert, E. D.; Pedersen, L. G. University of North Carolina-Chapel Hill, 2003 (18) Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. Nucleic Acids Res., 2000, 28, 235-242. (19) SYBYL. 1699 S. Hanley Road, Suite 303, St Louis, MO 63144 USA. (20) Wang, J.; Cieplak, P.; Kollman, P. A. J. Comp. Chem. 2000, 21, 1049-1074. (21) Petrova, P.; Koca, J.; Imberty, A. J. Amer. Chem. Soc., 1999, 121, 5535-5547. (22) GLYCAM04, Kirschner, K. N.; Tschampel, S. , Woods, R. ??? (23) Bradbrook, G. M.; Gleichmann, T.; Harrop, S. J.; Habash, J.; Raftery, J.; Kalb, J.; Yariv, J.; Hillier, I. H.; Helliwell, J. R. J. Chem. Soc. - Farad. Trans. 1998, 94,1603-1611. (24) Ryckaert, J. P.; Cicotti, G.; Berendsen, H. J. C. J. Comp. Chem., 1977, 23. (25) Darden, T.; York, D.; Pedersen, L. J. Chem. Phys. 1993, 98, 10089-10092. (26) Berendsen, H. J. C.; van der Spoel, D.; van Drunen, R. Comp. Phys. Comm., 1995, 43-56. (27) Lindahl, E.; Hess, B.; van der Spoel, D. J. Mol. Mod., 2001, 306-317. (28) RETINAL 1.0 and 2.0-beta, Kulhánek, P. Masaryk University, Brno, Czech Republic, 2005 (29) Srinivasan, J.; Miller, J.; Kollman, P. A.; Case, D. A. J. Biol. Struct. Dyn. 1998, 16, 671-682. (30) Srinivasan, J.; Cheatham, III, T. E.; Kollman, P.; Case, D. A. J. Am. Chem. Soc. 1998, 120, 9401-9409. 133
(31) Kollman, P. A.; Massova, I.; Reyes, C.; Kuhn, B.; Huo, S.; Chong, L.; Lee, M.; Lee, T.; Duan, Y.; Wang,W.; Donini, O.; Cieplak, P.; Srinivasan, J.; Case, D. A.; Cheatham, III, T. E. Accts Chem. Res. 2000, 33, 889–897. (32) Sitkoff, D.; Sharp, K. A.; Honig, B. J. Phys. Chem. 1994, 98, 1978-1988. (33) Tsui, V. ; Case, D. A. Biopolymers (Nucl. Acid. Sci.) 2001, 56, 275-291. (34) Gohlke, H.; Kiel, C.; Case, D. A. J. Mol. Biol. 2003, 330, 891–913. (35) Gibbons, B. J.; Roach, P. J.; Hurley, T.D. J. Mol. Biol. 2002, 319, 463-477. (36) Gibson, R. P.; Tarling, C. A.; Roberts, S. Withers, S. G.; Davies, G. J. J. Biol. Chem. 2004, 279, 19501955. (37) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. J. Comput. Chem. 2004, 25, 1605-1612. (38) Sanner, M. F.; Olson, A. J.; Spehner, J. C. Biopolymers 1996, 38, 305-320. (39) Li, Z.; Lazaridis, T. J. Phys. Chem. B 2005, 109, 662-670.
134
Supporting Information
Molecular Dynamics Simulations of Glycosyltransferase LgtC with Substrate Donor and Acceptor under Physiological Conditions Petr Kulhánek,a,b Lenka Šnajdrová,a,c Anne Imberty,c and Jaroslav Kočaa,b* a) National Centre for Biomolecular Research and b) Department of Organic Chemistry, Faculty of Science, Masaryk University, 611 37 Brno, Czech Republic c) Centre de Recherches sur les Macromolécules Végétales (CERMAV-CNRS), Affiliated with Université Joseph Fourier de Grenoble, BP 53, 38041 Grenoble Cedex, France
Figure S1. Evolution of root-mean-square deviation of Cα atom positions from their X-ray positions in the course of simulations.
Figure S2. Experimental B-factors of Cα atoms from crystal structure of LgtC complex with donor and acceptor analogs.3
135
Figure S3. (a) Evolution of the distances between C1 atom and the amide oxygen of GLN189 residue. (b) Histogram analysis of the above distances.
Figure S4. (a) Evolution of the distances between C1 atom and the hydroxyl oxygen of TYR186 residue. (b) Histogram analysis of the above distances.
Figure S5. (a) Evolution of the distances between C1 atom and the closest carboxylic oxygens of ASP190 residue. (b) Histogram analysis of the above distances.
136
Figure S6. Radial distribution functions (RDF) of water oxygens from C1 atom of UDP-Gal. The RDF for LUL simulation is separated for two observed lactose binding modes.
137
138
6. CONCLUSIONS ET PERSPECTIVES Durant ma thèse, j'ai eu l'occasion d’explorer différents domaines de la modélisation moléculaire. Bien que la discipline soit assez vaste et ne cesse de s’agrandir (comme le montre l’explosion des outils informatiques dans le domaine du séquençage de l’ADN et dans le traitement des données biologiques), j’ai acquis une bonne connaissance des principes de la bioinformatique et de la dynamique moléculaire, et j'ai appris à exploiter efficacement les outils que nous offre la récente technologie informatique. Dans la première partie de mes travaux, je me suis concentrée sur la lectine à fucose PA-IIL, qui a la particularité d’avoir pour son ligand une forte affinité que l’on ne retrouve pas habituellement dans les interactions protéine-glucide. Mon objectif a consisté à essayer d'élucider l'origine de cette forte affinité. J'ai alors opté pour une méthode de calculs quantiques centrés autour du ligand (le fucose) et son environnement immédiat. Il a été ainsi démontré qu'une des causes de la liaison forte du fucose pour la PA-IIL serait une délocalisation importante des charges électriques du complexe ligand-protéine. Nous sommes arrivés à cette conclusion suite à la comparaison de deux états mutuellement indépendants : le site de liaison « vide » et ce même site de liaison « occupé ». Il serait évidemment idéal de suivre une transition entre ces deux états : soit l’attachement du fucose au site de liaison, soit son départ du site. Malheureusement, les calculs quantiques sont tellement exigeants en termes de ressources informatiques qu’il n’est pas encore possible de simuler des mécanismes aussi complexes. Une suite logique de la recherche sur la forte affinité du fucose pour la lectine PA-IIL serait de faire une simulation de dynamique moléculaire (DM), car cette méthode pourrait nous indiquer beaucoup de choses sur la situation énergétique du complexe protéine-ligand. Les simulations de dynamique moléculaire ont plusieurs avantages par rapport aux calculs quantiques. Premièrement, elles ne sont pas restreintes à une fraction de la biomolécule mais elles peuvent prendre en considération l'unité biologique entière, le dimère de PA-IIL dans ce cas. Deuxièmement, la dynamique moléculaire n’est pas aussi exigeante sur le plan ressources informatiques, donc elle nous permet d’effectuer les simulations plus complexes. Il serait ainsi possible de suivre les changements d’énergie ou les changements des positions des atomes au cours d’une simulation durant laquelle le fucose est progressivement « tiré » vers le site de liaison et finalement « forcé » de se lier à ce site. Et troisièmement, la dynamique moléculaire permet d’effectuer les simulations en présence du solvant (l’eau), ce qui n’est pas possible dans les calculs quantiques. L’analyse en détail du comportement du solvant dans les simulations sur la lectine PA-IIL en complexe avec le fucose (et peut-être avec d’autres ligands) pourrait donc nous aider à comprendre des phénomènes que les calculs quantiques ne
139
sont pas en mesure d’expliquer. Les analyses de simulations de DM s’effectueront au laboratoire NCBR à Brno. Il est prévu d’exploiter l'approche MM/PBSA pour évaluer le point de vue énergétique de la liaison du fucose à la PA-IIL, et peut-être aussi d’autres méthodologies. Un autre domaine de ma recherche est centré sur la glycosyltransférase LgtC, plus particulièrement par des simulations de dynamique moléculaire (DM) de cette enzyme. La structure cristallographique de LgtC, qui a été décrite pour la première fois en 2001, est devenue un système modèle pour les chercheurs qui tentent de découvrir le mécanisme réactionnel des glycosyltransférases. Dans une première étude, nous avons opté pour la méthode de dynamique moléculaire, qui pouvait décrire le comportement des substrats dans le site actif de l'enzyme. Durant trois années, une série de simulations ont été effectuées qui ont apporté des informations intéressantes. Ainsi nous avons confirmé que les boucles au-dessus du site actif étaient relativement flexibles en absence des substrats donneur (UDP-Gal) et accepteur (lactose ou deoxylactose). Nous avons aussi fait des simulations avec la LgtC en présence de l’UDP-Gal soit sans substrat accepteur, soit en présence d’un substrat de taille minimum (le lactose), ou en présence d’un analogue inhibiteur, le deoxylactose. Après comparaison des trois simulations, nous avons conclu que la position de l’accepteur lactose dans le site actif de l’enzyme est probablement différente de celle qui a été observée pour le deoxylactose dans la structure cristallographique. L’analyse détaillée du comportement du solvant lors de simulations de dynamique moléculaire a suggéré que les molécules d’eau pourraient participer à la réaction enzymatique. Nous avons examiné le site actif de l’enzyme en détail dans le but d’identifier l’acide aminé qui pourrait jouer un rôle principal dans la réaction de transfert de galactose. De manière surprenante, on a montré que le groupement carboxyle du résidu aspartate 130 pourrait assumer ce rôle. Nous avons ensuite proposé un mécanisme de réaction de LgtC, qui n’est cependant pas encore confirmé ni par les calculs quantiques ni par l’expérience. Si notre hypothèse s’avère correcte, il s’agit d’un mécanisme de déplacement double. La démarche suivante qui s’offre logiquement pour appuyer ou démentir notre hypothèse serait probablement la méthodologie MQ/MM (ou même la dynamique quantique, DQ), sur laquelle on pourrait simuler directement la réaction enzymatique et qui évaluerait aussi le bilan énergétique du mécanisme proposé. Si les résultats des calculs à un niveau de théorie plus haut (MQ/MM ou DQ) montrent que le mécanisme proposé est réalisable du point de vue énergétique et spatial, les molécules inhibitrices de la LgtC pourraient être conçues. Ces molécules posséderaient une structure similaire à celle de l’état de transition de la réaction de LgtC ; la différence importante consisterait dans le fait que les molécules inhibitrices seraient stables, c’est-à-dire qu’elles posséderaient des liaisons covalentes là où les interactions sont
140
très faibles dans l’état de transition. Une fois les inhibiteurs conçus et synthétisés, ils pourraient être testés dans l’expérience. Les molécules qui montreraient le meilleur effet d’inhibition serviraient ensuite pour la conception de nouveaux antibiotiques contre la bactérie Neisseria meningitis, qui utilise la LgtC pour synthétiser la partie lipooligosaccharidique de sa paroi. Un domaine entièrement différent de mes travaux a porté sur la modélisation des protéines par homologie. J'ai ainsi expérimenté des approches diverses pour la recherche de séquences et de structures protéiques dans les bases de données internationales, lorsque je me suis appliquée à modéliser la glycosyltransférase β3GalT6 de la famille GT31. J'ai tenté de construire un modèle homologue en m’appuyant sur l’homologie avec une GlcAT-I de la famille GT43 (code PDB 1FGG); mais à cause d'une homologie de séquences assez basse (19%), ce travail n'a pas été poursuivi pour l’instant. Il pourra être repris si des structures cristallographiques de glycosyltransférases présentant des séquences de plus fortes similarités sont résolues. J'ai obtenu un modèle homologue plus plausible pour la galactosyltransférase MGD1 d’épinard de la famille GT28, qui est une enzyme clef de la biosynthèse plastidiale. Le modèle a été construit en exploitant une homologie de séquences avec une structure cristallographique connu de la même famille (MurG, code PDB 1NLM). Dans mon travail, je me suis appuyé sur une structure de MGD1 d’Arabidopsis thaliana précédemment modélisé au laboratoire. Le modèle résultant de soMGD1 se caractérise par le repliement GT-B et contient le substrat donneur (UDP-Gal) dans le site actif. La validation expérimentale du modèle a été effectué dans un autre laboratoire grenoblois par mutagenèse dirigée des acides aminés qui ont été proposés comme jouant un rôle important dans le site actif. Le modèle de la structure de MGD1 pourra être utilisé pour le développement de nouvelles drogues à action herbicide et anti-parasitaire. Pendant mes trois séjours à Grenoble, je me suis occupée de la mise à jour de la base de données de structures cristallographiques de glycosyltransférases (GTs), qui a été développée au CERMAV et dont la conception et l'entretien sont les résultats de l'effort de nombreuses personnes. La base de donnés de glycosyltransférases de CERMAV est un complément utile du système CAZY, particulièrement pour les chercheurs qui voudraient acquérir une vue d'ensemble de l'état actuel des structures tridimensionnelles expérimentales de ces protéines. Mon travail a consisté à vérifier périodiquement la publication de nouvelles structures cristallographiques de glycosyltransférases et à mettre à jour la base de données. Pendant les trois années, j’ai rajouté de dizaines d’entrées, en utilisant différentes sources pour trouver toutes les informations nécessaires. Je me suis aussi occupée de la création des images jpeg qui accompagnent tous les glycosyltransférases (à partir du modèle tridimensionnel). La mise à jour de la base de données de glycosyltransférases a donc constitué un apport fécond pour
141
moi, car elle m’a obligé à pénétrer profondément dans la problématique de ces enzymes. Les premières structures cristallographiques de glycosyltransférases ont été résolues seulement dans les années 2000 et il n’en existe encore qu’un nombre limité. Cependant, il est prévisible que le nombre de structures va augmenter de plus en plus rapidement. Sur 83 familles de GTs dans le système CAZY, seulement 19 sont représentés dans la base de données de CERMAV, ce qui montre une disparité énorme entre les le nombre de séquences connues et le faible nombre de structures résolues de ces enzymes. De plus, plusieurs groupes d’entrées dans la base de données du CERMAV se rapportent à une seule glycosyltransférase dans le système CAZY. En raison de cela, seulement 31 glycosyltransférases différentes sont représentées pour un total de 149 entrées. Cette disproportion est causée par le fait que pour certaines GTs très étudiées, un grand nombre de complexes ou de mutants ont été cristallisés et que toutes ces structures sont référencées dans la base de données. Etant donné que les structures de protéines d’une même famille de CAZY sont très similaires, la découverte de structure de GTs d’une famille « inexplorée » est un but important pour de nombreuses équipes de glycobiologistes. En particulier, il n’existe aucune information structurale sur les fucosyltransférases, ni sur les mannosyltransférases et sialyltransférases d’organismes eukaryotes. Mes travaux de thèse ont été très variés quant aux méthodes employées et aussi quant aux protéines étudiées, pourtant je crois qu’il pourraient contribuer un peu à la compréhension des sujets mentionnés. Ma recherche a été entièrement théorique, mais au départ, il y avait toujours une connaissance expérimentale (structure cristallographique, forte affinité observée entre la protéine et le ligand, etc.). Pour finir ces travaux, il faudra non seulement développer les résultats atteints jusqu’ici, mais surtout confirmer leur validité par des approches expérimentales.
142
REFERENCES Altekruse, S. F., Stern, N. J., Fields, P. I. and Swerdlow, D. L. (1999). Emerg Infect Dis. 5, 28-35. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). J. Mol. Biol. 215, 403-410. Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Nucleic Acids Res. 25, 3389-3402. Amadei, A., Linssen, A. B. and Berendsen, H. J. (1993). Proteins. 17, 412-425. Bachhawat-Sikder, K., Thomas, C. J. and Surolia, A. (2001). FEBS Lett. 500., 75-79. Bai, X., Zhou, D., Brown, J. R., Crawford, B. E., Hennet, T. and Esko, J. D. (2001). J Biol Chem. 276, 48189-48195. Bairoch, A., Boeckmann, B., Ferro, S. and Gasteiger, E. (2004). Brief Bioinform. 5, 39-55. Baker, D. and Sali, A. (2001). Science 294, 93-96. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S. R., Griffiths-Jones, S., Howe, K. L., Marshall, M. and Sonnhammer, E. L. (2002). Nucleic Acids Res. 30, 276-8. Becke, A. D. (1988). Physical Reviews A 38, 3098-3100. Berendsen, H. J. C., Postma, J. P. M., van Gunsteren, W. F., DiNola, A. and Haak, R. J. (1984). J. Phys. Chem. 81, 3684-3690. Bergsten, G., Wullt, B. and Svanborg, C. (2005). Int J Med Microbiol. 295, 487-502. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. and Bourne, P. E. (2000). Nucleic Acids Res. 28, 235-242. Blundell, T. L., Sibanda, B. L., Sternberg, M. J. and Thornton, J. M. (1987). Nature 326, 347352. Boix, E., Swaminathan, G. J., Zhang, Y., Natesh, R., Brew, K. and Acharya, K. R. (2001). J. Biol. Chem. 276, 48608-48614. Boix, E., Zhang, Y., Swaminathan, G. J., Brew, K. and Acharya, K. R. (2002). J. Biol. Chem. 277, 28310-28318. Born, M. and Oppenheimer, R. (1927). Annalen der Physik 84, 457-484. Bourne, Y. and Henrissat, B. (2001). Curr Opin Struct Biol. 11, 593-600. Bourne, Y., Rouge, P. and Cambillau, C. (1990). J Biol Chem. 265, 18161-18165.
143
Bradley, P., Chivian, D., Meiler, J., Misura, K. M., Rohl, C. A., Schief, W. R., Wedemeyer, W. J., Schueler-Furman, O., Murphy, P., Schonbrun, J., Strauss, C. E. and Baker, D. (2003). Proteins 53, 457-468. Brenner, S. E., Chothia, C. and Hubbard, T. J. (1998). Proc Natl Acad Sci U S A 95, 60736078. Breton, C., Snajdrova, L., Jeanneau, C., Koca, J. and Imberty, A. (2006). Glycobiology 16, 29R-37R. Brooks, B. R., Bruccoleri, R. E., Olafason, B. D., States, D. J., Swaminathan, S. and Karplus, M. (1983). Journal of Computational Chemistry 14, 187-217. Buschiazzo, A., Ugalde, J. E., Guerin, M. E., Shepard, W., Ugalde, R. A. and Alzari, P. M. (2004). EMBO J. 23, 3196-3205. Bush, C. A., Martin-Pastor, M. and Imberty, A. (1999). Annu. Rev. Biophys. Biomol. Struct. 28, 269-293. Callebaut, I., Labesse, G., Durand, P., Poupon, A., Canard, L., Chomilier, J., Henrissat, B. and Mornon, J. P. (1997). Cell Mol Life Sci. 53, 621-45. Carlet, J., Desenclos, J.-C., Aubry-Damon, H. and Lepoutre, A. (2001). Institut de Veille Sanitaire (InVS), Saint-Maurice. Case, D. A., Darden, T. A., Cheatham III, T. E., Simmerling, C. L., Wang, J., Duke, R. E., R., L., Merz, K. M., Wang, B., Pearlman, D. A., Crowley, M., Brozell, S., Tsui, V., Gohlke, H., Mongan, J., Hornak, V., Cui, G., Beroza, P., Schafmeister, C., Caldwell, J. W., Ross, W. S. and Kollman, P. A. (2004). University of California, San Francisco. Charnock, S. J. and Davies, G. J. (1999). Biochemistry 38, 6380-6385. Cheatham III, T. E., Miller, J. L., Fox, T., Darden, T. A. and Kollman, P. A. (1995). J. Am. Chem. Soc. 117, 4193-4194. Cheng, A., Stanton, R. S., Vincent, J. J., van der Vaart, A., Damodaran, K. V., Dixon, S. L., Hartsough, D. S., Mori, M., Best, S. A., Monard, G., Garcia, M., Van Zant, L. and Merz Jr., K. M. (2002). The Pennsylvania State University. Chiu, C. P., Watts, A. G., Lairson, L. L., Gilbert, M., Lim, D., Wakarchuk, W. W., Withers, S. G. and Strynadka, N. C. (2004). Nat Struct Mol Biol. 11, 163-70. Cioci, G., Mitchell, E. P., Gautier, C., Wimmerova, M., Sudakevitz, D., Perez, S., GilboaGarber, N. and A., I. (2003). FEBS Lett. 555, 297-301. Cizek, J. (1966). Journal of Chemical Physics 45, 4256-4268. Costerton, J. W., Stewart, P. S. and Greenberg, E. P. (1999). Science. 284, 1318-1322. Coutinho, P. M., Deleury, E., Davies, G. J. and Henrissat, B. (2003). J. Mol. Biol. 328, 307317.
144
Coutinho, P. M. and Henrissat, B. (1999). In "Recent Advances in Carbohydrate Bioengineering" (H. J. Gilbert, G. Davies, B. Henrissat and B. Svensson, eds), pp. 312. The Royal Society of Chemistry, Cambridge. Cramer, C. J. (2002). "Essentials of Computational Chemistry: Theories and Models". John Wiley and Sons, New York. Dam, T. K. and Brewer, C. F. (2002). Chem. Rev. 102, 387-429. David, R., Korenberg, M. J. and Hunter, I. W. (2000). Pharmacogenomics 1, 445-455. Davies, G., Sinnott, M. L. and Withers, S. G. (1998). In "Comprehensive Biological Catalysis: A Mechanistic Reference" (M. L. Sinnott, ed.), vol. 1, pp. 119-209. Academic Press, San Diego. Defez, C., Fabbro-Peray, P., Bouziges, N., Gouby, A., Mahamat, A., Daures, J. P. and Sotto, A. (2004). J Hosp Infect. 57, 209-216. Dewar, M. J. S., Zoebisch, E. G., Healy E. F., Stewart J. J. P. (1993). J. Am. Chem. Soc. 107, 3902-3909. Dork, T., Wulbrand, U., Richter, T., Neumann, T., Wolfes, H., Wulf, B., Maass, G. and Tummler, B. (1991). Hum Genet. 87, 441-446. Douce, R. and Joyard, J. (1990). Annu Rev Cell Biol. 6, 173-216. Elgavish, S. and Shaanan, B. (1998). J Mol Biol. 277, 917-932. Esko, J. D. and Lindahl, U. (2001). J. Clin. Invest 108, 169-173. Essmann, U., Perera, L. and Berkowitz, M. L. (1995). Journal of Chemical Physics 103, 8577-8593. Fernandez-Fuentes, N., Oliva, B. and Fiser, A. (2006). Nucleic Acids Res. 34, 2085-2097. Fiser, A., Do, R. K. and Sali, A. (2000). Protein Sci. 9, 1753-1773. Fiser, A., Feig, M., Brooks, C. L. r. and Sali, A. (2002). Acc. Chem. Res. 35, 413-421. Flint, J., Taylor, E., Yang, M., Bolam, D. N., Tailford, L. E., Martinez-Fleites, C., Dodson, E. J., Davis, B. G., Gilbert, H. J. and Davies, G. J. (2005). Nat Struct Mol Biol. 12, 608614. Fock, V. (1930). Zeitschrift für Physik 61, 126-148. Frisch, M. J., Trucks, G. W., Schlegel, H. B., Scuseria, G. E., Robb, M. A., Cheeseman, J. R., Montgomery, J., J. A., Vreven, T., Kudin, K. N., Burant, J. C., Millam, J. M., Iyengar, S. S., Tomasi, J., Barone, V., Mennucci, B., Cossi, M., Scalmani, G., Rega, N., Petersson, G. A., Nakatsuji, H., Hada, M., Ehara, M., Toyota, K., Fukuda, R., Hasegawa, J., Ishida, M., Nakajima, T., Honda, Y., Kitao, O., Nakai, H., Klene, M., Li, X., Knox, J. E., Hratchian, H. P., Cross, J. B., Bakken, V., Adamo, C., Jaramillo, J., Gomperts, R., Stratmann, R. E., Yazyev, O., Austin, A. J., Cammi, R., Pomelli, C.,
145
Ochterski, J. W., Ayala, P. Y., Morokuma, K., Voth, G. A., Salvador, P., Dannenberg, J. J., Zakrzewski, V. G., Dapprich, S., Daniels, A. D., Strain, M. C., Farkas, O., Malick, D. K., Rabuck, A. D., Raghavachari, K., Foresman, J. B., Ortiz, J. V., Cui, Q., Baboul, A. G., Clifford, S., Cioslowski, J., Stefanov, B. B., Liu, G., Liashenko, A., Piskorz, P., Komaromi, I., Martin, R. L., Fox, D. J., Keith, T., Al-Laham, M. A., Peng, C. Y., Nanayakkara, A., Challacombe, M., Gill, P. M. W., Johnson, B., Chen, W., Wong, M. W., Gonzalez, C. and Pople, J. A. (2004). Gaussian, Inc., Wallingford CT. Fritz, T. A., Hurley, J. H., Trinh, L. B., Shiloach, J. and Tabak, L. A. (2004). Proc Natl Acad Sci U S A. 101, 15307-15312. Fritz, T. A., Raman, J. and Tabak, L. A. (2006). J Biol Chem. Gao, J. (1998). In "Encyclopedia of Computational Chemistry" (P. v. R. Schleyer, ed.), pp. 1257-1262. John Wiley and Sons, New York. Garber, N., Guempel, U., Gilboa-Garber, N. and Royle, R. J. (1987). FEMS Microbiology Letters 48, 331-334. Gastinel, L. N., Bignon, C., Misra, A. K., Hindsgaul, O., Shaper, J. H. and Joziasse, D. H. (2001). EMBO J. 20, 638-649. Gastinel, L. N., Cambillau, C. and Bourne, Y. (1999). EMBO J. 18, 3546-3557. Gibbons, B. J., Roach, P. J. and Hurley, T. D. (2002). J. Mol. Biol. 319, 463-477. Gibson, R. P., Tarling, C. A., Roberts, S., Withers, S. G. and Davies, G. J. (2004). J Biol Chem 279, 1950-5. Gibson, R. P., Turkenburg, J. P., Charnock, S. J., Lloyd, R. and Davies, G. J. (2002). Chem Biol 9, 1337-1346. Gilboa-Garber, N. (1982). Methods Enzymol. 83, 378-85. Gotschlich, E. C. (1994). J Exp Med. . 180, 2181-2190. Gough, J., Karplus, K., Hughey, R. and Chothia, C. (2001). J. Mol. Biol. 313, 903-919. Grootenhuis, P. D. J. and van Boeckel, C. A. A. (1991). J Am Chem Soc. 113, 2743-2747. Ha, S., Walker, D., Shi, Y. and Walker, S. (2000). Protein Sci. 9, 1045-1052. Hall, T. A. (1999). Hehre, W. J., Radom, L., Schleyer, P. v. R. and Pople, J. A. (1986). "Ab Initio Molecular Orbital Theory". John Wiley and Sons, New York. Hohenberg, P. and Kohn, W. (1964). Physical Rewiev B 136, 864-871. Honig, B. and Nicholls, A. (1995). Science 268, 1144-1149. Horcajada, C., Guinovart, J. J., Fita, I. and Ferrer, J. C. (2006). J Biol Chem. 281, 2923-2931.
146
Huang, X., Tanaka, K. S. E. and Bennet, A. J. (1997). J Am Chem Soc. 119, 11147-11154. Imberty, A., Wimmerova, M., Mitchell, E. P. and Gilboa-Garber, N. (2004). Microbes Infect 6, 221-228. Jacobson, M. P., Pincus, D. L., Rapp, C. S., Day, T. J., Honig, B., Shaw, D. E. and Friesner, R. A. (2004). Proteins. 55, 351-367. Jeanneau, C. (2003). Université Joseph Fourier, Grenoble, France. Jensen, F. (1999). In "Introduction to Computational Chemistry". John Wiley and Sons, Chichester. Jones, D. T. (1999). J. Mol. Biol. 287, 797-815. Kakuda, S., Shiba, T., Ishiguro, M., Tagawa, H., Oka, S., Kajihara, Y., Kawasaki, T., Wakatsuki, S. and Kato, R. (2004). J Biol Chem. 279, 22693-703. Kelley, L. A., MacCallum, R. M. and Sternberg, M. J. (2000). J Mol Biol. 299, 499-520. Ketley, J. M. (1997). Microbiology. 143, 5-21. Koch, W. and Holthausen, M. C. (2001). "A Chemists's guide to Density Function Theory". Wiley-VCH, Weinheim. Kohler, S., Delwiche, C. F., Denny, P. W., Tilney, L. G., Webster, P., Wilson, R. J., Palmer, J. D. and Roos, D. S. (1997). Science 275, 1485-1489. Kohn, W. and Sham, L. J. (1965). Phys. Rev. 140, A1133–A1138. Kunin, C. M. (1987). "Urinary Tract Infections: Detection, Prevention, and Management". Williams & Wilkins, Baltimore. Lairson, L. L., Chiu, C. P., Ly, H. D., He, S., Wakarchuk, W. W., Strynadka, N. C. and Withers, S. G. (2004). J Biol Chem. 279, 28339-44. Lairson, L. L. and Withers, S. G. (2004). Chem Commun (Camb). 20, 2243-2248. Lariviere, L., Gueguen-Chaignon, V. and Morera, S. (2003). J. Mol. Biol. 330, 1077-1086. Lariviere, L. and Morera, S. (2002). J. Mol. Biol. 324, 483-490. Lariviere, L. and Morera, S. (2004). J Biol Chem. 279, 34715-20. Lariviere, L., Sommer, N. and Morera, S. (2005). J Mol Biol 352, 139-150. Laskowski, R. A., MacArthur, M. W., Moss, D. S. and Thornton, J. M. (1993). J. Appl. Cryst. 26, 283-291. Leach, A. R. (2001). "Molecular Modelling: Principles and Applications". Pearson: Prentice Hall, Harlow. Lee, C., Yang, W. and Parr, R. G. (1988). Phys Rev B Condens Matter. 37, 785-789. 147
Lee, H. J., Barry, C. H., Borisova, S. N., Seto, N. O., Zheng, R. B., Blancher, A., Evans, S. V. and Palcic, M. M. (2004). J Biol Chem. Lee, Y. C. and Lee, R. T. (1995). Acc. Chem. Res. 28, 321-327. Lennard-Jones, J. E. (1931). Proc.Camb.Phil.Soc. 27, 469. Leonidas, D. D., Vatzaki, E. H., Vorum, H., Celis, J. E., Madsen, P. and Acharya, K. R. (1998). Biochemistry 37, 13930-13940. Lesman-Movshovich, E., Lerrer, B. and Gilboa-Garber, N. (2003). Can J Microbiol. 49, 230235. Letts, J. A., Rose, N. L., Fang, Y. R., Barry, C. H., Borisova, S. N., Seto, N. O., Palcic, M. M. and Evans, S. V. (2006). J Biol Chem 281, 3625-3632. Li, W., Liu, Z. and Lai, L. (1999). Biopolymers. 49, 481-495. Lis, H. and Sharon, N. (1998). Chem Rev. 1998 Apr 2;98(2):637-674. 98, 637-647. Lobsanov, Y. D., Romero, P. A., Sleno, B., Yu, B., Yip, P., Herscovics, A. and Howell, P. L. (2004). J Biol Chem. 279, 17921-31. Loris, R., Hamelryck, T., Bouckaert, J. and Wyns, L. (1998). Biochim Biophys Acta. 1383, 936. Loris, R., Stas, P. P. and Wyns, L. (1994). J Biol Chem. 269, 26722-26733. Loris, R., Tielker, D., Jaeger, K. E. and Wyns, L. (2003). J. Mol. Biol. 331, 861-870. Ly, H. D., Lougheed, B., Wakarchuk, W. W. and Withers, S. G. (2002). Biochemistry 41, 5075-5085. Marechal, E. (2002). Trends Plant Sci. 7, 99-100. Marechal, E., Azzouz, N., de Macedo, C. S., Block, M. A., Feagin, J. E., Schwarz, R. T. and Joyard, J. (2002). Eukaryot Cell. 1, 653-656. Marechal, E. and Cesbron-Delauw, M. F. (2001). Trends Plant Sci. 6, 200-205. Marti-Renom, M. A., Madhusudhan, M. S., Fiser, A., Rost, B. and Sali, A. (2002). Structure (Camb). 10, 435-440. Marti-Renom, M. A., Stuart, A. C., Fiser, A., Sanchez, R., Melo, F. and Sali, A. (2000). Annu Rev Biophys Biomol Struct. 29, 291-325. Massova, I. and Kollman, P. A. (2000). Perspectives in Drug Discovery and Design 18, 113135. Merlet, P., Garnier, Y., Vinciguerra, M., Lambrechts, C., Durand, B., Houssemaine-Florent, H. and Ouvrard, C. (2004). Éditions Larousse, Paris. Mitchell, E. P., Houles, C., Sudakevitz, D., Wimmerova, M., Gautier, C., Peréz, S., Wu, A. 148
M., Gilboa-Garber, N. and Imberty, A. (2002). Nature Structural Biology 9, 918-921. Møller, C. and Plesset, M. S. (1934). Phys. Rev. 46, 618-622. Moolenaar, R. L., Crutcher, J. M., San Joaquin, V. H., Sewell, L. V., Hutwagner, L. C., Carson, L. A., Robison, D. A., Smithee, L. M. and Jarvis, W. R. (2000). Infect Control Hosp Epidemiol. 21, 80-85. Moran, A. P., Prendergast, M. M. and Appelmelk, B. J. (1996). FEMS Immunol Med Microbiol. 16, 105-115. Moréra, S., Imberty, A., Aschke-Sonnenbron , Rüger, W. and Freemont, P. S. (1999). J. Mol. Biol. 292, 717-730. Moréra, S., Lariviere, L., Kurzeck, J., Aschke_Sonnenborn, U., Freemont, P. S., Janin, J. and Ruger, W. (2001). J Mol Biol 311, 569-577. Mulichak, A. M., Losey, H. C., Lu, W., Wawrzak, Z., Walsh, C. T. and Garavito, R. M. (2003). Proc Natl Acad Sci U S A 100, 9238-9243. Mulichak, A. M., Losey, H. C., Walsh, C. T. and Garavito, R. M. (2001). Structure 9, 547557. Mulichak, A. M., Lu, W., Losey, H. C., Walsh, C. T. and Garavito, R. M. (2004). Biochemistry 43, 5170-5180. Mulliken, R. S. (1955). The Journal of Chemical Physics 23, 1833-1846. Murzin, A. G., Brenner, S. E., Hubbard, T. and Chothia, C. (1995). J. Mol. Biol. 247, 536-540. Ng, K. K., Kolatkar, A. R., Park-Snyder, S., Feinberg, H., Clark, D. A., Drickamer, K. and Weis, W. I. (2002). J Biol Chem. 277, 16088-16095. Nguyen, H. P., Seto, N. O., Cai, Y., Leinala, E. K., Borisova, S. N., Palcic, M. M. and Evans, S. V. (2003). J Biol Chem 278, 49191-5. Ni, L., Sun, M., Yu, H., Chokhawala, H., Chen, X. and Fisher, A. J. (2006). Biochemistry. 45, 2139-2148. Ofek, I. and Doyle, R. J. (1994). "Bacterial Adhesion to Cells and Tissues". Kluwer Academic Publishers. Ofek, I., Hasty, D. L., Abraham, S. N. and Sharon, N. (2000). Adv Exp Med Biol. 485, 183192. Offen, W., Martinez-Fleites, C., Yang, M., Kiat-Lim, E., Davis, B. G., Tarling, C. A., Ford, C. M., Bowles, D. J. and Davies, G. J. (2006). EMBO J. Patenaude, S. I., Seto, N. O., Borisova, S. N., Szpacenko, A., Marcus, S. L., Palcic, M. M. and Evans, S. V. (2002). Nature Struct. Biol. 9, 685-690. Pearl, F., Todd, A., Sillitoe, I., Dibley, M., Redfern, O., Lewis, T., Bennett, C., Marsden, R., Grant, A., Lee, D., Akpor, A., Maibaum, M., Harrison, A., Dallman, T., Reeves, G., 149
Diboun, I., Addou, S., Lise, S., Johnston, C., Sillero, A., Thornton, J. and Orengo, C. (2005). Nucleic Acids Res. 33, D241-251. Pearson, W. R. (1990). Methods Enzymol. 183. Pedersen, L. C., Darden, T. A. and Negishi, M. (2002). J. Biol. Chem. 277, 21869-21873. Pedersen, L. C., Dong, J., Taniguchi, F., Kitagawa, H., Krahn, J. M., Pedersen, L. G., Sugahara, K. and Negishi, M. (2003). J. Biol. Chem. 278, 14420-14428. Pedersen, L. C., Tsuchida, K., Kitagawa, H., Sugahara, K., Darden, T. A. and Negishi, M. (2000). J. Biol. Chem. 275, 34580-34585. Perret, S., Sabin, C., Dumon, C., Pokorna, M., Gautier, C., Galanina, O., Ilia, S., Bovin, N., Nicaise, M., Desmadril, M., Gilboa-Garber, N., Wimmerova, M., Mitchell, E. P. and Imberty, A. (2005). Biochem J 389, 325-332. Persson, K., Ly, H. D., Dieckelmann, M., Wakarchuk, W. W., Withers, S. G. and Strynadka, N. C. (2001). Nature Struct. Biol. 8, 166-175. Pizarro-Cerda, J. and Cossart, P. (2006). Cell 124, 715-727. Ramakrishnan, B., Balaji, P. V. and Qasba, P. K. (2002). J. Mol. Biol. 318, 491-502. Ramakrishnan, B., Boeggeman, E. and Qasba, P. K. (2004). Biochemistry 43, 12513-22. Ramakrishnan, B. and Qasba, P. K. (2001). J. Mol. Biol. 310, 205-218. Ramakrishnan, B. and Qasba, P. K. (2002). J. Biol. Chem. 277, 20833-20839. Ramakrishnan, B., Ramasamy, V. and Qasba, P. K. (2006). J Mol Biol 357, 1619-1633. Ramakrishnan, B., Shah, P. S. and Qasba, P. K. (2001). J. Biol. Chem. 276, 37665-37671. Ramasamy, V., Ramakrishnan, B., Boeggeman, E. and Qasba, P. K. (2003). J. Mol. Biol. 331, 1065-1076. Ramasamy, V., Ramakrishnan, B., Boeggeman, E., Ratner, D. M., Seeberger, P. H. and Qasba, P. K. (2005). J Mol Biol 353, 53-67. Ratjen, F. and Doring, G. (2003). Lancet. 361, 681-689. Reinert, D. J., Jank, T., Aktories, K. and Schulz, G. E. (2005). J Mol Biol 351, 973-8. Reuter, N., Dejaegere, A., Maigret, B. and Karplus, M. (2000). Journal of Physical Chemistry A 104, 1720-1733. Roberts, J. A., Marklund, B. I., Ilver, D., Haslam, D., Kaack, M. B., Baskin, G., Louis, M., Mollby, R., Winberg, J. and Normark, S. (1994). Proc Natl Acad Sci U S A 91, 1188993. Rocchia, W., Alexov, E. and Honig, B. (2001). J Phys Chem B 105, 6507-6514.
150
Rohl, C. A., Strauss, C. E., Chivian, D. and Baker, D. (2004). Proteins. 55, 656-677. Ruiz-Palacios, G. M., Cervantes, L. E., Ramos, P., Chavez-Munguia, B. and Newburg, D. S. (2003). J Biol Chem. 278, 14112-14120. Sabin, C., Mitchell, E. P., Pokorna, M., Gautier, C., Utille, J. P., Wimmerova, M. and Imberty, A. (2006). FEBS Lett. 580, 982-987. Sapse, A.-M. (1998). pp. 233. Oxford University Press, Oxford. Schmidt, M. W., Baldridge, K. K., Boatz, J. A., Elbert, S. T., Gordon, M. S., Jensen, J. H., Koseki, S., Matsunaga, N., Nguyen, K. A., Su, T. J., Windus, L., Dupuis, M. and Montgomery, J. A. (1993). Journal of Computational Chemistry 14, 1347-1363. Schrödinger, E. (1926). Annalen der Physik 79, 361-376. Schwede, T., Kopp, J., Guex, N. and Peitsch, M. C. (2003). Nucleic Acids Res. 31, 3381-3385. Shao, H., He, X., Achnine, L., Blount, J. W., Dixon, R. A. and Wang, X. (2005). Plant Cell. 17, 3141-3154. Sharon, N. and Ofek, I. (2000). Glycoconj J 17, 659-664. Singh, P. K., Schaefer, A. L., Parsek, M. R., Moninger, T. O., Welsh, M. J. and Greenberg, E. P. (2000). Nature. 407, 762-764. Sippl, M. J. and Flockner, H. (1996). Structure 4, 15-9. Springborg, M. (1997). "Density-Functional Methods in Chemistry and Materials Science". Wiley-Interscience. Srinivasan, J., Cheatham III, T. E., Cieplak, P., Kollman, P. A. and Case, D. A. (1998). J Am Chem Soc. 120, 9401-9409. Srinivasan, N. and Blundell, T. L. (1993). Protein Eng. 6, 501-512. Stewart, J. P. P. (1989a). The Journal of Computational Chemistry 10, 209-220. Stewart, J. P. P. (1989b). Journal of Computational Chemistry 10, 221-264. Svensson, M., Humbel, S., Froese, R. D. J., Matsubara, T., Sieber, S. and Morokuma, K. (1996). Journal of Physical Chemistry 100, 19357-19363. Tarbouriech, N., Charnock, S. J. and Davies, G. J. (2001). J. Mol. Biol. 314, 655-661. Tauxe, R. V. (1992). In "Campylobacter Jejuni: Current Status and Future Trends" (I. Nachamkin, M. J. Blaser and L. S. Tompkins, eds), pp. 9-12. American Society for Microbiology, Washington. Thompson, J. D., Higgins, D. G. and Gibson, T. J. (1994). Nucleic Acids Res. 22, 4673-80. Tielker, D., Hacker, S., Loris, R., Strathmann, M., Wingender, J., Wilhelm, S., Rosenau, F. and Jaeger, K. E. (2005). Microbiology 151, 1313-1323. 151
Tvaroska, I. (2004). Carbohydr Res. 339, 1007-14. Unligil, U. M., Zhou, S., Yuwaraj, S., Sarkar, M., Schachter, H. and Rini, J. M. (2000). EMBO J. 19, 5269-5280. Van Aalten, D. M. F., De Groot, B. L., Findlay, J. B. C., Berendsen, H. J. C. and Amadei, A. (1997). Journal Of Computational Chemistry 18, 169-181. van Gunsteren, W. F. and Berendsen, H. J. C. (1990). Angew. Chem. Int. Ed. Eng. 29, 9921023. Vocadlo, D. J., Wicki, J., Rupitz, K. and Withers, S. G. (2002). Biochemistry. 41, 9727-9735. Vosko, S. J., Wilk, L. and Nusair, M. (1980). Canadian Journal of Physics 58, 1200-1211. Vrielink, A., Ruger, W., Driessen, H. P. C. and Freemont, P. S. (1994). EMBO J. 13, 34133422. Wakarchuk, W. W., Cunningham, A., Watson, D. C. and Young, N. M. (1998). Protein Eng. 11, 295-302. Wakarchuk, W. W., Martin, A., Jennings, M. P., Moxon, E. R. and Richards, J. C. (1996). J Biol Chem. 271, 19166-19173. Walser, P. J., Haebel, P. W., Kunzler, M., Sargent, D., Kues, U., Aebi, M. and Ban, N. (2004). Structure. 12, 689-702. York, D. M., Yang, W., Lee, H., Darden, T. A. and Pedersen, L. G. (1995). J. Am. Chem. Soc. 117, 5001-5002. Young, D. C. (2001). "Computational Chemistry: A Practical Guide for Applying Techniques to Real World Problems". John Wiley and Sons, New York. Zechel, D. L. and Withers, S. G. (2000). Acc Chem Res. 33, 11-18. Zhang, Y., Deshpande, A., Xie, Z., Natesh, R., Acharya, K. R. and Brew, K. (2004). Glycobiology 14, 1295-1302. Zhang, Y., Swaminathan, G. J., Deshpande, A., Boix, E., Natesh, R., Xie, Z., Acharya, K. R. and Brew, K. (2003). Biochemistry. 42, 13512-21.
152