Multiple alignment of protein structures based on ligand or prosthetic group position
Jean-Christophe Nebel
Faculty of Computing, Information Systems & Mathematics, Kingston University, London
http://www.kingston.ac.uk/~ku33185/Nestor3D.html
Some challenges
• Protein 3D structure prediction (John Moult, organiser of CASP 6, December 2004)
  § Proteins with homologues: good models, but fine-grained (all-atom) models are still needed
  § Twilight zone: approximate (and useful) models; further improvements will require full-atom description and refinement
• Protein function annotation from 3D structure
  § Structural genomics projects: high-throughput delivery of protein structures regardless of the state of their functional annotation
• Drug design
  § High-resolution models of active sites are required
Principles
• Active sites are key to understanding protein function
• The most conserved regions of homologous proteins are linked to active sites (e.g. PROSITE patterns)
• Multiple alignment improves pattern recognition (e.g. ClustalW)
Multiple alignment of protein 3D structures based on active site position at the atomic level
Ø Generation of 3D motifs
3D motif generation
• Rigid alignment of protein 3D structures based on local features linked to active sites
  § prosthetic groups
  § ligands
  § PROSITE patterns (under development)…
• Generation of a consensus pattern based on distance thresholds
  § atom positions (1-1.4 Å)
  § chemical group positions, e.g. carboxyl (1.5-3.8 Å)
  § cavity position, if relevant (1-1.4 Å)
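The thresholding step can be illustrated with a minimal sketch. The slides do not spell out the exact procedure, so everything here is an assumption: the first structure is taken as the reference, an atom is kept when enough aligned structures have an atom of the same element within the threshold, and the consensus position is the mean of the matched positions. The function name `consensus_atoms`, the `min_support` parameter and the toy coordinates are hypothetical.

```python
from math import dist

def consensus_atoms(structures, threshold=1.25, min_support=None):
    """Sketch of threshold-based consensus (assumed procedure, not Nestor3D's code).

    structures: already-aligned structures, each a list of (element, (x, y, z)).
    threshold:  maximum distance in angstroms for two atoms to match.
    min_support: number of structures that must agree (default: all of them).
    """
    if min_support is None:
        min_support = len(structures)
    reference, others = structures[0], structures[1:]
    consensus = []
    for element, pos in reference:
        matches = [pos]
        for other in others:
            # Nearest atom of the same element in this structure, if any.
            candidates = [p for e, p in other if e == element]
            if not candidates:
                continue
            nearest = min(candidates, key=lambda p: dist(p, pos))
            if dist(nearest, pos) <= threshold:
                matches.append(nearest)
        if len(matches) >= min_support:
            # Consensus position: mean of the matched coordinates.
            mean = tuple(sum(c) / len(matches) for c in zip(*matches))
            consensus.append((element, mean))
    return consensus

# Toy data (hypothetical coordinates): the N atoms agree, one C atom is far away.
toy_structures = [
    [("N", (0.0, 0.0, 0.0)), ("C", (5.0, 0.0, 0.0))],
    [("N", (0.5, 0.0, 0.0)), ("C", (9.0, 0.0, 0.0))],
    [("N", (0.0, 0.4, 0.0)), ("C", (5.2, 0.0, 0.0))],
]
print(consensus_atoms(toy_structures))
```

With the default `min_support`, only the nitrogen survives in the toy data; lowering `min_support` to 2 also keeps the carbon seen in two of the three structures.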
Validation: Proteins with porphyrin rings
• The PDB holds 1551 proteins containing these groups (i.e. 5.3% of PDB entries, 01/02/05)
  § globins
  § peroxidases
  § cytochromes
  § P450s
  § chlorophyll proteins…
• All atoms of the porphyrin ring are used to align sets of homologous proteins
Validation: Methodology
• Representatives of proteins containing porphyrin rings
  § PDB50%
  § Identical chains and chains not involved with a prosthetic group were removed
  Ø Structures of 237 chains
• For each chain
  § Generation of a set of homologues (within our set) using FASTA
  § Generation of a 3D template based on homologues only
  Ø Comparison between template and PDB structure
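The comparison between a template and the held-out PDB structure can be sketched as follows. The slides do not define true and false positives, so this assumes that a template atom is a true positive when the real structure has an atom of the same element within the distance threshold, and a false positive otherwise. The function name `score_template` and the toy coordinates are hypothetical.

```python
from math import dist

def score_template(template, structure, threshold=1.25):
    """Count template atoms confirmed (TP) or not (FP) by the real structure.

    template, structure: lists of (element, (x, y, z)) in the same frame.
    The TP/FP definition is an assumption made for this sketch.
    """
    true_pos = false_pos = 0
    for element, pos in template:
        confirmed = any(
            e == element and dist(p, pos) <= threshold for e, p in structure
        )
        if confirmed:
            true_pos += 1
        else:
            false_pos += 1
    return true_pos, false_pos

# Toy example (hypothetical coordinates): one atom confirmed, one not.
toy_template = [("N", (0.0, 0.0, 0.0)), ("C", (3.0, 0.0, 0.0))]
toy_structure = [("N", (0.3, 0.0, 0.0)), ("C", (6.0, 0.0, 0.0))]
print(score_template(toy_template, toy_structure))
```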
Validation: Results
• 66 patterns were produced
• Parameters: at least 3 homologues required, E value of 10e-6, atom distance of 1.25 Å
[Figure: number of true and false positives; distribution of true positives]
Ø Similar results with other parameters, groups and cavity
Ø Detection of abnormal structures: 1U5U & 1S05
  § 1S05: structural model validated using a restricted set of NMR experiments
  § 1U5U: protein fragment
Application: Modelling of the active site of CYP17
• P450 protein involved in the biosynthesis of sex hormones
• Enzyme associated with some forms of cancer (breast & prostate)
• 3D structure is unknown
• P450 active site:
  § Haem group linked to the protein by a cysteine
  § Ligand (i.e. drug) on the other side of the haem group
Homologues of P450 human CYP17
P450s: 3500+ sequences (50+ human genes), 125 structures (5 human)

Protein   Length   Identity%   Hits
1PQ2      476      28.6%       136
1W0G      485      28.2%       135
1BU7      455      29.3%       79
1H5Z      455      24.0%       63
1N97      389      26.4%       61
1IZO      417      23.9%       35
1AKD      417      29.5%       18
Ø No single good candidate! But 5 proteins are close…
Consensus structure of CYP17
[Figure: consensus groups and consensus atoms, with the haem group and the cavity area labelled]
Is it biologically meaningful?
• P450 known patterns based on sequences
• 4 clusters based on structure
  Ø the 4 most common patterns!
• e.g. blue box, P450 signature: [FW]-[SGNH]-x-[GD]-x-[RKHPT]-x-C-[LIVMFAP]-[GAP]
Modelling of P450 active site based on consensus 3D structures, J.-C. Nebel, International Conference on Biomedical Engineering, BioMed 2005, 16-18 February 2005, Innsbruck, Austria
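The P450 signature quoted on this slide is written in PROSITE syntax, which can be translated into a regular expression for scanning sequences. The sketch below covers only the common syntax elements (`x`, `[...]`, `{...}`, repetition counts and the `<` / `>` anchors); the function name `prosite_to_regex` and the test sequence are hypothetical.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression (sketch)."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):      # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):        # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        # Separate the residue specification from an optional (n) or (n,m) count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":                                  # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        if low and high:
            core += f"{{{low},{high}}}"
        elif low:
            core += f"{{{low}}}"
        parts.append(core)
    return prefix + "".join(parts) + suffix

signature = "[FW]-[SGNH]-x-[GD]-x-[RKHPT]-x-C-[LIVMFAP]-[GAP]"
regex = prosite_to_regex(signature)
# Made-up sequence fragment containing one occurrence of the signature.
hit = re.search(regex, "AAFGAGPRICLGAA")
print(regex, hit.start() if hit else None)
```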
Comparison with CYP17 active site models
Generated independently by Dr S. Ahmed* (Kingston University - School of Chemical & Pharmaceutical Sciences)
[Figure label: putative H-bond]
*Ahmed S: The use of the novel substrate-heme complex approach in the derivation of a representation of the active site of the enzyme complex 17alpha-hydroxylase and 17,20-lyase. Biochem Biophys Res Commun 2004, 316(3):595-598.
Towards function prediction
Clustering according to active site similarity (Kulczynski’s metric & Neighbour Joining)
[Figure: clustering obtained with NESTOR3D compared with ClustalW]
Protein families: P450s, Globins, Flavocytochromes, Peroxidases, Catalases, Cytochromes B, C, C2, C3 & C’
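A similarity matrix of the kind fed to Neighbour Joining can be sketched as below. The slides name Kulczynski's metric without giving a formula, so this assumes the second Kulczynski coefficient, 0.5 * (s/|A| + s/|B|) with s the number of shared elements, applied to sets of consensus-element labels. The labels, the family keys and the function names are hypothetical, and the tree-building step itself is not shown.

```python
def kulczynski(a, b):
    """Second Kulczynski coefficient between two sets (assumed variant)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 0.5 * (shared / len(a) + shared / len(b))

def similarity_matrix(sites):
    """sites: mapping name -> set of consensus-element labels."""
    names = sorted(sites)
    matrix = [[kulczynski(sites[p], sites[q]) for q in names] for p in names]
    return names, matrix

# Hypothetical active-site descriptions for three proteins.
toy_sites = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2"},
    "C": {"f5"},
}
names, matrix = similarity_matrix(toy_sites)
for name, row in zip(names, matrix):
    print(name, row)
```

A distance matrix for tree building could then be taken as 1 minus each similarity, which is again an assumption rather than something stated in the slides.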
Nestor3D: free software (Java)
http://www.kingston.ac.uk/~ku33185/Nestor3D.html
• Functionality
  § Generation of consensus structures
  § Generation of cavity descriptions
  § Generation of similarity matrices
• Applications
  § Better understanding of active sites (drug design…)
  § Function prediction:
    Ø detection of putative active sites from a protein 3D structure
    Ø generation of a phylogenetic tree from a similarity matrix
  § Homology modelling
    Ø generation of a structure template (constraints…)
    Ø validation of predicted protein structures
Conclusion
• New method to produce high-resolution active site models
• Consensus structure elements are biologically meaningful and related to function
• Requires proteins interacting with ligands or heterogeneous groups:
  § rigid molecules: 6% of PDB entries
  § semi-rigid molecules: 20% of PDB entries
Future work
• Alignments based on PROSITE patterns
• Generation of a 3D motif database
5 kinases (ATP or ADP)