Multiple alignment of protein structures based on ligand or prosthetic group position
Jean-Christophe Nebel
Faculty of Computing, Information Systems & Mathematics
Kingston University, London

http://www.kingston.ac.uk/~ku33185/Nestor3D.html

Some challenges
• Protein 3D structure prediction (John Moult, organiser of CASP 6, December 2004)
  § Proteins with homologues: good models, but fine-grained (all-atom) models are still needed
  § Twilight zone: approximate (and useful) models; further improvements will require full-atom description and refinement
• Protein function annotation from 3D structure
  § Structural genomics projects: high-throughput delivery of protein structures regardless of the state of their functional annotation
• Drug design
  § High-resolution models of active sites are required

Principles
• Active sites are key to understanding protein function
• The most conserved regions of homologous proteins are linked to active sites (e.g. PROSITE patterns)
• Multiple alignment improves pattern recognition (e.g. ClustalW)

Multiple alignment of protein 3D structures based on active site position at atomic level
Ø generation of 3D motifs

3D motif generation
• Rigid alignment of protein 3D structures based on local features linked to active sites
  § prosthetic groups
  § ligands
  § PROSITE patterns (under development)…
• Generate consensus pattern based on threshold
  § atom positions (1-1.4 Å)
  § chemical group positions, e.g. carboxyl (1.5-3.8 Å)
  § cavity position, if relevant (1-1.4 Å)
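The thresholded consensus step can be illustrated with a minimal sketch. It assumes the structures have already been superposed into a common frame, ignores atom types, and uses the first structure as the reference; the function name `consensus_atoms` and the parameter `min_support` are hypothetical and are not taken from Nestor3D.

```python
from math import dist

def consensus_atoms(structures, threshold=1.25, min_support=None):
    """Return consensus atom positions from superposed structures.

    structures: list of non-empty lists of (x, y, z) tuples, in angstroms.
    A reference atom is kept when at least `min_support` of the other
    structures have an atom within `threshold` of it (default: all of them).
    The consensus position is the mean of the matched atoms.
    """
    reference, others = structures[0], structures[1:]
    if min_support is None:
        min_support = len(others)
    consensus = []
    for atom in reference:
        matched = [atom]
        for coords in others:
            # Closest atom of this structure to the reference atom.
            nearest = min(coords, key=lambda c: dist(atom, c))
            if dist(atom, nearest) <= threshold:
                matched.append(nearest)
        if len(matched) - 1 >= min_support:
            n = len(matched)
            consensus.append(tuple(sum(p[i] for p in matched) / n for i in range(3)))
    return consensus
```

The default threshold of 1.25 Å lies within the 1-1.4 Å range quoted for atom positions; chemical groups or cavities would need their own thresholds and representations.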

Validation: Proteins with porphyrin rings
• The PDB holds 1551 proteins containing these groups (i.e. 5.3% of PDB entries, 01/02/05)
  § globins
  § peroxidases
  § cytochromes
  § P450s
  § chlorophyll proteins…
• All atoms of the porphyrin ring are used to align sets of homologous proteins
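A rigid alignment driven by the prosthetic group can be sketched as follows. This is only an illustration: it uses the standard Kabsch least-squares superposition (which may or may not be the method used in Nestor3D), it assumes NumPy is available, and it assumes the ring atoms are listed in the same order in both structures. The function names are hypothetical.

```python
import numpy as np

def kabsch_transform(mobile, target):
    """Rotation R and translation t minimising the RMSD between
    mobile @ R.T + t and target (both are (N, 3) arrays, paired row by row)."""
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    covariance = (mobile - mobile_centroid).T @ (target - target_centroid)
    U, _, Vt = np.linalg.svd(covariance)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_centroid - R @ mobile_centroid
    return R, t

def superpose(protein_coords, ring_coords, reference_ring_coords):
    """Move a whole protein so that its porphyrin ring atoms fit the reference ring."""
    R, t = kabsch_transform(ring_coords, reference_ring_coords)
    return protein_coords @ R.T + t
```

Only the ring atoms determine the transform; the protein atoms are carried along, so differences around the active site remain visible after superposition.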

Validation: Methodology
• Representatives of proteins containing porphyrin rings
  § PDB50%
  § Identical chains and chains not involved with a prosthetic group were removed
  Ø Structures of 237 chains
• For each chain
  § Generation of a set of homologues (within our set) using FASTA
  § Generation of a 3D template based on homologues only
  Ø Comparison between template and PDB structure
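The comparison between a template and the held-out PDB structure might be scored along these lines. The slides do not define true and false positives precisely, so the definition used here is an assumption: a template atom counts as a true positive when the real structure has an atom within the distance threshold, and as a false positive otherwise. The name `score_template` is hypothetical.

```python
from math import dist

def score_template(template, structure, max_distance=1.25):
    """Count template atoms confirmed (true positives) or not confirmed
    (false positives) by the real structure.

    template, structure: lists of (x, y, z) tuples in the same frame, in angstroms.
    """
    true_positives = sum(
        1 for t in template
        if any(dist(t, atom) <= max_distance for atom in structure)
    )
    false_positives = len(template) - true_positives
    return true_positives, false_positives
```

Because the template is built from homologues only, the chain being tested never contributes to its own template, which makes this a leave-one-out style check.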

Validation: Results
• 66 patterns were produced (at least 3 homologues were required, E value of 10e-6, atom distance of 1.25 Å)
[Figure: Number of true and false positives]
[Figure: Distribution of true positives]
Ø Similar results with other parameters, groups and cavity
Ø Detection of abnormal structures: 1U5U & 1S05
  § 1S05: structural model validated using a restricted set of NMR experiments
  § 1U5U: protein fragment

Application: Modelling of active site of CYP17
• P450 protein involved in biosynthesis of sex hormones
• Enzyme associated with some forms of cancer (breast & prostate)
• 3D structure is unknown
• P450 active site:
  § Haem group linked to protein by a cysteine
  § Ligand (i.e. drug) on the other side of the haem group

Homologues of human CYP17
• P450s: 3500+ sequences (50+ human genes), 125 structures (5 human)

Protein   Length   Identity%   Hits
1PQ2      476      28.6%       136
1W0G      485      28.2%       135
1BU7      455      29.3%       79
1H5Z      455      24.0%       63
1N97      389      26.4%       61
1IZO      417      23.9%       35
1AKD      417      29.5%       18

Ø No single good candidate! But 5 proteins are close…

Consensus structure of CYP17
[Figure: consensus groups and consensus atoms, with the haem group and cavity area labelled]

Is it biologically meaningful?
• P450 known patterns based on sequences
• 4 clusters based on structure
  Ø the 4 most common patterns!
  e.g. blue box, P450 signature [FW]-[SGNH]-x-[GD]-x-[RKHPT]-x-C-[LIVMFAP]-[GAP]

Modelling of P450 active site based on consensus 3D structures, J.-C. Nebel, International Conference on Biomedical Engineering, BioMed 2005, 16-18 February 2005, Innsbruck, Austria
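The P450 signature quoted above is written in PROSITE notation, which maps directly onto a regular expression. The sketch below handles only the syntax used in that signature (single residues, bracketed alternatives and the wildcard x); exclusions in braces, repeats and anchors are not covered. The function name and the test sequence are made up for illustration.

```python
import re

def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern (elements separated by '-') to a regex.

    Supports single residues, [..] alternatives and the wildcard 'x' only.
    """
    parts = []
    for element in pattern.split("-"):
        # 'x' means any residue; other elements are already valid regex.
        parts.append("." if element == "x" else element)
    return "".join(parts)

P450_SIGNATURE = "[FW]-[SGNH]-x-[GD]-x-[RKHPT]-x-C-[LIVMFAP]-[GAP]"
signature_regex = re.compile(prosite_to_regex(P450_SIGNATURE))

# Made-up sequence fragment, used only to show how a hit is located.
hit = signature_regex.search("AAFGAGPRICLGAA")
```

The conserved cysteine in the pattern is the residue that links the haem group to the protein, which is why this sequence motif sits at the active site.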

Comparison with CYP17 active site models
Generated independently by Dr S. Ahmed* (Kingston University - School of Chemical & Pharmaceutical Sciences)
[Figure: putative H-bond labelled]

*Ahmed S: The use of the novel substrate-heme complex approach in the derivation of a representation of the active site of the enzyme complex 17alpha-hydroxylase and 17,20-lyase. Biochem Biophys Res Commun 2004, 316(3):595-598.

Towards function prediction
Clustering according to active site similarity (Kulczynski’s metric & Neighbour Joining)
[Figure: panels labelled NESTOR3D and ClustalW]
Protein families: P450s, Globins, Flavocytochromes, Peroxidases, Catalases, Cytochromes B, C, C2, C3 & C’
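A similarity matrix of the kind fed to Neighbour Joining can be sketched with Kulczynski's coefficient. The slides do not say which variant is used or how active-site features are encoded, so this example assumes the second Kulczynski coefficient applied to sets of feature labels; the function names and the toy feature sets are hypothetical.

```python
def kulczynski_similarity(features_a, features_b):
    """Second Kulczynski coefficient between two feature sets:
    0.5 * (shared / |A| + shared / |B|), ranging from 0 to 1."""
    a, b = set(features_a), set(features_b)
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return 0.5 * (shared / len(a) + shared / len(b))

def similarity_matrix(feature_sets):
    """Symmetric matrix of pairwise similarities, as a list of lists."""
    return [[kulczynski_similarity(x, y) for y in feature_sets] for x in feature_sets]

# Toy active-site descriptions (labels are arbitrary).
sites = [{1, 2, 3, 4}, {3, 4, 5}, {7, 8}]
matrix = similarity_matrix(sites)
```

A tree-building method such as Neighbour Joining works on distances, so the matrix would typically be converted first, for example with 1 - similarity.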

Nestor3D: free software (Java)
http://www.kingston.ac.uk/~ku33185/Nestor3D.html
• Functionality
  § Generation of consensus structures
  § Generation of cavity descriptions
  § Generation of similarity matrices
• Applications
  § Better understanding of active sites (drug design…)
  § Function prediction:
    Ø detection of putative active sites from a protein 3D structure
    Ø generation of phylogenetic tree from similarity matrix
  § Homology modelling
    Ø generation of a structure template (constraints…)
    Ø validation of predicted protein structure

Conclusion
• New method to produce high-resolution active site models
• Consensus structure elements are biologically meaningful and related to function
• Requires proteins interacting with ligands or heterogeneous groups:
  § rigid molecules: 6% PDB entries
  § semi-rigid molecules: 20% PDB entries

Future work
• Alignments based on PROSITE patterns
• Generation of a 3D motif database
[Figure: 5 kinases (ATP or ADP)]
