Current Protein and Peptide Science, 2006, 7, 000-000


Receptor-Based Computational Screening of Compound Databases: The Main Docking-Scoring Engines

Olivier Sperandio§‡, Maria A. Miteva§, Francois Delfaud‡ and Bruno O. Villoutreix§,*

§INSERM U648, University Paris V, 45 rue des Sts Peres, 75006 Paris, France; ‡MEDIT SA, 2 rue du Belvédère, 91120 Palaiseau, France

Abstract: The processes used by academic and industrial scientists to discover new drugs have recently experienced a true renaissance, with many new and exciting techniques. The number of protein structures and chemical ligands is constantly growing through parallel chemistry, X-ray crystallography, NMR and homology modeling, and so is the theoretical understanding of protein-ligand interactions. As such, structure-based approaches to drug design and in silico screening are becoming a routine part of most modern lead discovery programs. Prioritization of compound libraries is an extremely important task that aims at the rapid identification of tight-binding ligands and, ultimately, new therapeutic compounds. Combined with other experimental methods, these in silico approaches facilitate the design of new medicines to treat cardiovascular, degenerative, infectious and neoplastic diseases, among others. Here, we review key concepts and specific features of several selected ligand–receptor docking/scoring methods, while several other topics pertaining to in silico screening are reviewed in the following articles of this special issue of Current Protein and Peptide Science.

Keywords: Virtual ligand screening, structure-based drug design, docking, scoring.

I. INTRODUCTION

Research scientists are under ever-increasing pressure to identify new therapeutic compounds: there is an urgent need for new molecules to fight life-threatening diseases, and this pressure is compounded by the importance taken by generics, the sword of Damocles represented by the relatively short time span of patents, and the completion of genome projects, among other factors. Drug discovery is a complex and expensive endeavor. It is commonly accepted that there are seven steps in the drug discovery process: disease selection, target hypothesis, lead compound identification (screening), lead optimization, pre-clinical trial, clinical trial and pharmacogenomic optimization. Traditionally, these steps are carried out sequentially, so if one step is slow, the entire process is delayed. Because clinical trials cannot be sped up, the only way to accelerate the process seems to be to act on the preclinical steps.

Among the various techniques used to facilitate hit identification, experimental high- (or medium-) throughput screening (HTS) is probably the most investigated. Screening millions of compounds against a target is admittedly a very powerful way to identify hits, but the associated financial resources and time are tremendous [1]. Along the same line of reasoning, the virtual organic chemistry space accessible with currently known synthetic methods is estimated to contain between 10^20 and 10^24 molecules, so it is impossible to investigate all these compounds experimentally [2]. In addition, because HTS methods can yield numerous false-positive hits, it has been suggested that other tools/approaches should be used in combination with experimental screening to facilitate the drug discovery process [1, 3-5]. In fact, virtual or in silico ligand screening (VLS) has become a method of choice for hit identification (when used with caution), not to replace HTS and NMR-based screens but rather to complement them, so that experiments are carried out only on a small list of compounds preselected by computational means and human intervention [6-11].

Chemoinformatics is the application of informatics tools to help solve drug discovery problems, from library shaping to ADME-Tox prediction to VLS methods. All these methods, including in silico screening, are now frequently used [12-14] and also have applications in the elucidation of fundamental biochemical processes [15]. Among the various VLS methods directly used for hit identification, two families are usually distinguished: ligand-based screening and structure-based screening [16, 17]. For ligand-based methods, the strategy is to use information provided by compounds known to bind the desired target in order to identify other molecules in the databases with similar properties [5, 18-20]. This can be done by a variety of methods, including similarity and substructure search, clustering, QSAR, pharmacophore matching or three-dimensional (3D) shape matching. For structure-based methods (Fig. 1), it is assumed that the 3D structure of the target is known, either from X-ray crystallography or NMR experiments (few studies combining NMR structures and in silico screening have been reported so far), or predicted by homology modeling [21-27]. The principle here is to dock all the ligands present in a database into the binding pocket of the selected target and evaluate the fit between the molecules [9]. The quality of the fit is then used to rank the small molecules. The two critical parts of structure-based screening methods are the search for a plausible binding mode for the ligand (docking) and the comparison of the different binding modes (scoring) [28]. There are indeed many different algorithms to dock ligands and methods to score them (see for example [3, 8, 9, 20, 23, 29-42]).

Fig. (1). Overview of receptor-based VLS.

Next to structure-based screening methods, it is also important to mention inverse docking approaches, in which ligands are docked against numerous proteins. Since the original papers describing the method [47, 48], this concept has been gaining momentum [43-46], both to help scoring (see below) and to help understand selectivity [49, 50]. Here we review concepts pertaining to the field of receptor/structure-based VLS and several selected applications, with particular emphasis on methods well suited to high-throughput docking of large compound collections. We first comment on database and target preparation as well as on ADME (absorption, distribution, metabolism and excretion)/tox filters, and then describe selected applications. To shorten the paper, some methods are only listed in a section called “Other programs”. We do not distinguish between commercial and non-commercial VLS packages, for numerous reasons; most applications run on Linux/Unix (including Apple OS X) or Windows systems.

*Address correspondence to this author at the INSERM U648, University Paris 5, 45 rue des Sts Peres, 75006 Paris, France; E-mail: [email protected]

1389-2037/06 $50.00+.00 © 2006 Bentham Science Publishers Ltd.

II. BACKGROUND

Database and Target Preparation – ADME/tox Filters

In the initial stages of a virtual screening project it is necessary to prepare the compound collection and to define the target-binding pocket. There are many compound collections available, including databases from chemical vendors (see Table 1), and several collections have been investigated thoroughly by several research groups (see for instance [51-53]). It is also possible to use focused libraries (see for instance some recent examples [54-56]) or to generate virtual libraries constructed from small fragments, but in this latter case some of the resulting molecules may be very difficult to synthesize [57, 58]. The initial compound collection is then usually reduced in size by applying several in silico filters (see Table 1), in an attempt to retain molecules whose physical properties and chemical functionality are consistent with known drugs/leads/hits [51, 59-71]. Common filtering protocols are variations of Lipinski’s rule-of-five (potential for oral bioavailability). They can include a limit on the number of rotatable bonds, on the polar surface area or on ClogP (the calculated logarithm of the octanol/water partition coefficient, thus a measure of the differential solubility of a compound in these two solvents). Other rules involve removing compounds containing specific chemical substructures associated with poor chemical stability, reactivity [72] or toxicity (reactive groups include epoxides, anhydrides…; see a recent paper and references therein [73]), as well as frequent hitters [74] and promiscuous inhibitors [75]. Methods to predict drug metabolism (e.g., cytochrome-mediated metabolism, Pgp efflux) have also been reported [76-81]. The molecules selected after applying Lipinski’s rule-of-five or related filters are sometimes erroneously called “drug-like”, since many organic chemicals conform to these rules while being by no means drug-like. In other words, these rules define only some necessary conditions for a drug candidate (such as likely solubility and bioavailability), not sufficient ones.
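As a minimal sketch of the property-based filtering step, the Python function below applies Lipinski’s rule-of-five thresholds (molecular weight ≤ 500 Da, ClogP ≤ 5, at most 5 H-bond donors and at most 10 H-bond acceptors, with compounds tolerated if they violate no more than one criterion) to precomputed descriptors, and optionally adds a rotatable-bond limit such as the ≤ 10 value proposed by Veber et al. The `Compound` record and its field names are hypothetical; in practice the descriptors would come from one of the tools listed in Table 1, and, as stressed above, passing such a filter is a necessary condition, not a sufficient one.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Compound:
    # Hypothetical record: descriptors are assumed to be precomputed elsewhere.
    name: str
    mol_weight: float   # Da
    clogp: float
    h_donors: int
    h_acceptors: int
    rot_bonds: int


def lipinski_violations(c: Compound) -> int:
    """Count how many of Lipinski's four rule-of-five criteria are violated."""
    return sum([
        c.mol_weight > 500,
        c.clogp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])


def passes_filter(c: Compound, max_violations: int = 1,
                  max_rot_bonds: Optional[int] = None) -> bool:
    """Keep a compound with at most `max_violations` rule-of-five violations
    and, if requested, no more than `max_rot_bonds` rotatable bonds."""
    if lipinski_violations(c) > max_violations:
        return False
    if max_rot_bonds is not None and c.rot_bonds > max_rot_bonds:
        return False
    return True


def filter_library(library: Iterable[Compound], **kwargs) -> List[Compound]:
    return [c for c in library if passes_filter(c, **kwargs)]


if __name__ == "__main__":
    library = [
        Compound("small_polar", 320.4, 1.8, 2, 5, 4),
        Compound("greasy_big", 720.9, 6.3, 1, 8, 12),   # two violations
        Compound("flexible", 480.0, 3.1, 3, 7, 14),     # too many rotatable bonds
    ]
    kept = filter_library(library, max_rot_bonds=10)
    print([c.name for c in kept])  # ['small_polar']
```

Making the thresholds explicit parameters reflects the point made below that such filters should be tailored to each project rather than applied blindly.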
For fragment-based lead discovery [82] (fragonomics), Congreve et al. have proposed other rules, the so-called “rule of 3” (molecular mass < 300 Da, ClogP ≤ 3, and numbers of H-bond donors and acceptors each ≤ 3) [83] (see software such as Flux to help design fragment libraries [84]). In all cases, these filters should not be applied blindly but rather tailored to a specific project (e.g., a soft filtering protocol for some cancer projects, as it might be beneficial to keep potentially toxic compounds in the database). As a simple example of ADME/tox analysis, logP is known to provide valuable information because it reflects the hydrophobicity/hydrophilicity of a substance (http://www.raell.demon.co.uk/chem/logp/logppka.htm). A low logP (below 0, polar compounds) suggests molecules that need to be infused; for medium values (2-5, most drugs) the molecules can be used orally; and for very high logP (4-7, hydrophobic compounds) toxic compounds may build up in fatty tissues. Other properties not directly related to ADME/tox but very important to assess in silico include aqueous solubility and DMSO solubility [85]. The compound collections are usually supplied in 2D (SMILES format [86], see Table 1, or Mol2 or SDF [87]), and special care is needed to generate the 3D structures of the small molecules. Moreover, physically relevant ionization and tautomeric states have to be considered whenever possible [4]. The stereochemistry of chiral centers is often unknown before generation of the 3D database, and it can be necessary to enumerate the stereoisomers (e.g., enantiomers) arising from these chiral centers. Computer tools such as Corina, Concord, Converter, Omega or ICM can be used to generate 3D compound collections (see Table 1). File format conversion can be performed with OpenBabel/iBabel (Table 1). Some free online tools can generate a 3D structure from a SMILES input or from a manual drawing of the compound (Table 1). The design of the compound libraries is very important, since it plays a key role not only in real-life screening experiments but also in the evaluation of VLS packages (docking/scoring) [4, 88, 89]. Indeed, a typical validation study involves retrieving known actives from a random library, but if the random compounds are, on average, much smaller or bigger than the actives, the docking/scoring method will appear very efficient while it might in fact be of limited value [90]. Several research groups have presented protocols to generate target-focused libraries or virtual libraries (for example [91, 92]).

Analysis of the target 3D structures (point mutations, possible flexibility, quality of the 3D structures…) [25, 93-97] and the definition and preparation of the target-binding site are also crucial to the process, and lists of possible therapeutic targets as well as predictions of protein druggability/annotated binding sites [98-102] have been reported. Structural investigation of the binding pocket is important because docking/scoring methods are sensitive to the nature of the binding cavity (polar, closed, open, hydrophobic…; see for instance [103-105] and eHiTS by Zsoldos et al. in this special issue). The binding pocket can be known from experimental work or predicted theoretically (see for example [102, 106-112]) (see Table 1). Many tools can be used to characterize the pockets, such as MED-SuMo (structure-based comparison of proteins and protein-ligand interactions) [113], GRID and SuperStar (see Table 1).
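The validation pitfall mentioned above, where random compounds are on average much smaller than the actives, can be illustrated in a few lines of Python. The sketch computes the ROC AUC through its rank-statistic (Mann-Whitney) definition and scores compounds with a deliberately naive “function” that simply returns the heavy-atom count; the counts are invented toy numbers, not data from any study.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], decoy_scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen active scores higher
    than a randomly chosen decoy (ties count 1/2). Higher score = better rank."""
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))


def size_only_score(heavy_atoms: int) -> float:
    # Naive "scoring function": bigger molecules score better, mimicking the
    # size bias that many docking scores show.
    return float(heavy_atoms)


if __name__ == "__main__":
    actives = [34, 38, 41, 29, 36]              # toy heavy-atom counts
    biased_decoys = [18, 22, 20, 25, 19, 21]    # systematically smaller
    matched_decoys = [33, 37, 30, 40, 35, 28]   # size-matched to the actives

    act = [size_only_score(n) for n in actives]
    print(roc_auc(act, [size_only_score(n) for n in biased_decoys]))   # 1.0
    print(roc_auc(act, [size_only_score(n) for n in matched_decoys]))  # 19/30, about 0.63
```

With the biased decoys the size-only score looks perfect, whereas size-matched decoys expose it as barely better than random, which is why decoy sets are often matched to the actives on simple properties.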
Free Linux/Unix tools to analyze/predict target cavities can be found on the Internet, see for instance [114] (and Table 1). Finally, with regard to targets, several articles have focused on VLS for specific receptor classes/diseases such as G-protein-coupled receptors [115-121], EGFR [122], metalloenzymes [123], CDK2 [93], casein kinase 2 [124], peroxisome proliferator-activated receptor gamma [125], cytochrome P450 2D6 [126], the ribosome [127], RNA targets [128], kinases [129], metalloproteinases [130], tuberculosis [131], antimycobacterial drugs [132], prion disease [133], nuclear receptors [134], Severe Acute Respiratory Syndrome coronavirus 3C-like proteinase [135] and phosphatase 2C (a Ser/Thr phosphatase with divalent magnesium or manganese cofactors) [136], or on drug design to inhibit protein-protein interactions (e.g., in cancer research) [137-140]. Interestingly, an analysis of the population of enzyme structures in the Protein Data Bank shows that some target families are still poorly covered in spite of the exponential growth in the number of deposited structures [141]. Yet the situation is certainly going to change as many new 3D structures are being released by worldwide Structural Genomics initiatives (see for example, among many others, [142, 143]). One important advantage of in silico VLS methods is that they enable screening of regions other than the catalytic site (or other already well-known pockets of a given target) and thus the targeting of new regions of a protein, for instance to inhibit macromolecular interactions. Along the same line, a new protein with no known binding pocket or function can also be targeted, since it is possible to suggest likely binding sites. Appropriate protonation states of ionizable target residues in the binding site need to be determined [144], and the correct tautomer (e.g., for histidine) should be assigned if possible. The positions of hydrogen atoms on hydroxyl groups in the active site should also be examined and altered if necessary. In some cases, tightly bound crystallographic waters or ions might need to be retained during the docking/scoring process. Loop/side-chain/main-chain flexibility should be investigated prior to docking experiments if possible, by examining different crystal forms/NMR structures or through theoretical simulation procedures [94, 145-149].

Table 1. Tools to Facilitate VLS Studies

http://chembank.med.harvard.edu – Free collections and utilities
PubChem: http://pubchem.ncbi.nlm.nih.gov/ – An information resource linking chemistry and biology
http://www.bioscreening.com/compound_libraries.htm – Web directory about compound collections
http://www.cermn.unicaen.fr/chimiotheque – Free collections
http://www.genome.ad.jp/dbget/ligand.html – Free collections
NMRShiftDB: http://www.nmrshiftdb.org – Free collection [292-294]
http://Ligand.info – Utilities such as ligand clustering and ligand similarity search [295]
ChemDB: http://cdb.ics.uci.edu/CHEM/Web/ – Free collections and utilities
FAF-Drugs: http://bioserv.rpbs.jussieu.fr/Help/FAFDrugs.html – Free collections (and ADME/tox) and utilities [296]
ZINC: http://zinc.docking.org – Free collections (ADME/tox) and utilities [297]
ChemMine: http://bioweb.ucr.edu/ChemMine – Free collections and utilities
http://www.mdli.com – Available Chemicals Directory: commercial collection
http://www.chemnavigator.com – Commercial collection
http://www.ebi.ac.uk/chebi/ – Dictionary of small molecules
bindingDB: http://www.bindingdb.org/ – Measured binding affinities, complexes [166]
PDBbind: http://www.pdbbind.org/ – Proteins with co-crystallized ligands and experimental binding affinities
KiBank: http://kibank.iis.u-tokyo.ac.jp/ – Proteins with co-crystallized ligands and experimental binding affinities
RELIBASE: http://relibase.ebi.ac.uk – Proteins with co-crystallized ligands
http://bidd.nus.edu.sg/group/bidd.htm – Database of protein targets, ADME/tox and others (TTD, ADME associated protein)
DrugBank: http://redpoll.pharmacy.ualberta.ca/drugbank/ – Numerous data about drugs and targets [298]
AffinDB: http://www.agklebe.de/affinity – Proteins with co-crystallized ligands and experimental binding affinities [299]
http://www.inteligand.com/ – Ilib Diverse: tool to create virtual drug-like libraries
http://dtp.nci.nih.gov/index.html – The US National Cancer Institute collections, including natural products
http://cactus.nci.nih.gov/services/translate/ – Online SMILES translator and structure generator from F. Oellien and M.C. Nicklaus
http://cactvs.cit.nih.gov/SDF_toolkit/ – The SDF toolkit (in Perl)
http://cactus.nci.nih.gov/ncidb2/chem_www.html – Compilation of web sites that offer chemistry databases/search services, data about toxic molecules, hazardous substances…
http://solvdb.ncms.org/solvdb.htm – Solvent database
http://www.molecularmodels.ca/ – Models of main functional groups, courses in organic chemistry…
http://cgl.imim.es/biochemoinformatic.htm (Dr. J. Mestres lab.) – cMolP: compute molecular properties
Consolv: http://www.bch.msu.edu/labs/kuhn/web/software.html – Tool to analyze protein-water interactions [300]
AlogP: http://vcclab.org/lab/alogps – Tools to predict logP [301]
CDK: http://sourceforge.net/projects/cdk/ – Chemistry development kit [302]
http://www.iupac.org/dhtml_home.html – IUPAC International Chemical Identifier project
http://www2.chemie.uni-erlangen.de/services/gifcreator/ – GIF/PNG creator with SMILES input
http://www-ra.informatik.uni-tuebingen.de/software/joelib/ – Computational chemistry package
http://www.uku.fi/~thassine/ghemical/ and http://www.uiowa.edu/~ghemical/osx.shtml – Computational chemistry package [303]
http://www.chemaxon.com/products.html – Computer tools for chemistry
MoFa: http://www.inf.uni-konstanz.de/bioml/research/index.html – Molecular fragment miner
http://blue.chem.psu.edu/~rajarshi/code/java/ – Free ADME/tox, 2D to 3D and other utilities, based for instance on CDK [302]
http://iris12.colby.edu/~www/jme/smiledg.html and http://iris12.colby.edu/~www/jme/dg.html – 2D to 3D conversion and other tools
http://davapc1.bioch.dundee.ac.uk – 2D to 3D conversion
http://relibase.ebi.ac.uk – 2D to 3D conversion
Corina: http://www2.ccc.uni-erlangen.de/software/corina/corina.html – 2D to 3D conversion
Concord: http://www.tripos.com – 2D to 3D conversion
Converter: http://www.accelrys.com – 2D to 3D conversion
Omega: http://www.eyesopen.com – 2D to 3D conversion
ICM: http://www.molsoft.com – 2D to 3D conversion
BUILD3D via or with XDrawChem: http://xdrawchem.sourceforge.net/ – Possible 2D to 3D conversion
http://www.molinspiration.com/ – ADME/tox online
http://www.molsoft.com/ – ADME/tox online
http://www.chemaxon.com/ – ADME/tox online
http://www.mol-net.de/ – ADME/tox online
FAF-Drugs: http://bioserv.rpbs.jussieu.fr/Help/FAFDrugs.html – ADME/tox online, OpenBabel online [296]
http://www.daylight.com/smiles/f_smiles.html – Tutorial for SMILES and chemistry toolkit
http://146.107.217.178/lab/alogps/ – Computation of logP (one molecule at a time) with several methods and other utilities
www.tripos.com/data/support/mol2.pdf – Mol2 file format (2D or 3D)
OpenBabel: http://openbabel.sourceforge.net/babel.shtml – File format conversion
iBabel: http://www.macinchem.fsnet.co.uk/applescripts.htm – File format conversion and other chemistry tools (essentially for Mac or Linux/Unix)
MED-SuMo: http://www.medit.fr – Tool for analysis of binding sites
GRID: http://www.moldiscovery.com/soft_grid.php – Tool for analysis of binding sites
SuperStar: http://www.ccdc.cam.ac.uk/products/life_sciences/superstar – Tool for analysis of binding sites
http://www.bioinformatics.leeds.ac.uk/sb/ – Tool for analysis of binding sites [101]
Sc-PDB: http://bioinfo-pharma.u-strasbg.fr/scPDB/ – Tool for analysis of binding sites [100]
Q-site: http://www.bioinformatics.leeds.ac.uk/qsitefinder – Server to predict binding sites [107]
CASTp: http://sts.bioengr.uic.edu/castp/ – Server to predict binding sites
SCREEN: http://interface.bioc.columbia.edu/screen – Server to predict binding sites [304]
MEDock: http://medock.csie.ntu.edu.tw/ – Online tool to define binding sites [305]
http://rocr.bioinf.mpi-sb.mpg.de/ and http://www.r-project.org/ – R and ROCR: free tools to compute ROC curves
Moloc: http://www.moloc.ch/about.html – Roche biostructural modeling package

Docking and Scoring

Assuming the receptor structure is available, a primary challenge in lead discovery is to predict both ligand orientation and binding affinity; the former is often referred to as ‘molecular docking’ while the latter is referred to as ‘scoring’ (or ranking) [103]. Docking protocols can be described as a combination of two components: a search strategy and a simple scoring/fitting function used to assess the poses prior to the real scoring step. The search algorithm should generate an optimum number of configurations, which should include the experimentally determined binding mode. At this stage, it should be mentioned that docking accuracy is usually assessed by the ability to reproduce the experimentally determined binding mode of a ligand, but direct comparisons with experimental data should be made with caution since, for example, odd interactions could be brought about by extreme crystallization conditions or forced by crystal contacts with symmetry-related molecules.
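The ability to reproduce an experimental binding mode is commonly quantified as the heavy-atom root-mean-square deviation (RMSD) between the docked pose and the crystallographic ligand, with poses within about 2 Å often counted as successes. The sketch below assumes that both poses are already expressed in the receptor frame (so no superposition is applied, since the placement in the site is what is being tested) and that atoms are listed in the same order; symmetry-equivalent atom mappings, which real evaluation tools often handle, are ignored here.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def pose_rmsd(docked: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Heavy-atom RMSD between two poses of the same ligand.

    Assumes identical atom ordering and a shared receptor frame, so no
    superposition is performed before comparing coordinates.
    """
    if len(docked) != len(reference) or not docked:
        raise ValueError("poses must have the same, non-zero number of atoms")
    sq = sum((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 + (a[2] - b[2]) ** 2
             for a, b in zip(docked, reference))
    return math.sqrt(sq / len(docked))


def is_success(docked: Sequence[Coord], reference: Sequence[Coord],
               cutoff: float = 2.0) -> bool:
    # 2.0 Angstrom is a commonly used, but arbitrary, success threshold.
    return pose_rmsd(docked, reference) <= cutoff


if __name__ == "__main__":
    xtal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    shifted = [(x + 1.0, y, z) for x, y, z in xtal]  # rigid 1 Angstrom translation
    print(pose_rmsd(shifted, xtal))   # 1.0
    print(is_success(shifted, xtal))  # True
```

Given the caveats above about crystallization artifacts, a low RMSD is best read as agreement with one experimental snapshot rather than as proof of the true binding mode.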
The complexity of molecular docking implies several approximations, from rigid-body docking (where both partners are treated as rigid, although “flexibility” can be generated prior to docking), to (pseudo)flexible ligand docking (where the receptor is held rigid and the ligand is partially flexible), to flexible docking (where both receptor and ligand flexibility are considered). Algorithms dealing with flexibility can be divided into three types: systematic (e.g., incremental construction algorithms), stochastic (e.g., Monte Carlo methods and evolutionary algorithms, which make random changes to some variables and usually require multiple independent runs), and deterministic searches (e.g., energy minimization and molecular dynamics) (see explanations of these simulation methods in [36, 37]). Some VLS packages use more than one of these approaches. Generating a broad range of binding modes is useless without a model that ranks each conformation both accurately and efficiently. This step is also crucial because it ultimately determines which ligands are predicted to have affinity for a given target. A rigorous scoring function will generally be computationally expensive; thus, the function’s complexity is usually reduced, with a consequent loss of accuracy. The scoring functions commonly used fall into three classes: force-field based, empirical and knowledge-based (Fig. 2) [23, 28, 150]. Several of these functions have been recently evaluated [151-160].

Force-field based methods try to predict the binding free energy of a protein-ligand complex by adding up individual contributions from different types of interactions. Usually, the non-bonded interaction energy between the ligand and the protein is computed, and the interaction terms include van der Waals, electrostatics and hydrogen bonds. Force-field scoring functions are based on different force-field parameter sets, derived from physical/chemical experiments or theoretical computations; additional terms can also be used to refine the ranking efficiency (e.g., a better treatment of solvation energy) [160-162]. Programs for energetic analysis of receptor-ligand interactions based on force-field scoring functions are also available online, for instance PEARLS [163] (Table 2). Empirical scoring functions sum enthalpic and some entropic interaction terms with relative weights obtained from a training set of well-characterized protein-ligand complexes. The weights are assigned by regression methods so as to match experimentally determined affinities. Here again, the interaction terms include at least van der Waals, electrostatics and hydrogen bonds. Knowledge-based scoring functions rely on statistical analysis of experimentally determined structures to extract rules on preferred atom-pair interactions. These rules are converted into pair potentials that are used to score a ligand-binding pose. In some cases, scoring functions try to reproduce experimentally measured binding constants. However, this is not trivial for numerous reasons, including the fact that experimentally measured affinities of the same compounds for the same target can differ from laboratory to laboratory. Some research groups now suggest the use of quantum mechanics-based scoring functions for in silico screening experiments [164].
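The conversion of observed atom-pair statistics into pair potentials is commonly written as an inverse Boltzmann relation, w(r) = -kT ln(p_obs(r)/p_ref(r)), where p_obs is the observed distance distribution for an atom-type pair and p_ref a reference distribution. The sketch below applies this relation to distance-binned counts for a single protein/ligand atom-type pair; the toy counts, the 0.5 Å bins starting at 2 Å, the pseudocount and the flat reference state are illustrative assumptions, and published knowledge-based functions differ precisely in such choices.

```python
import math
from typing import List, Sequence

KT = 0.593  # RT in kcal/mol at about 298 K


def pair_potential(observed: Sequence[float], reference: Sequence[float],
                   pseudocount: float = 1.0, kt: float = KT) -> List[float]:
    """Inverse-Boltzmann pair potential per distance bin:
        w(r) = -kT * ln( p_obs(r) / p_ref(r) )
    p_obs and p_ref are the normalized, pseudocount-smoothed counts."""
    obs = [o + pseudocount for o in observed]
    ref = [r + pseudocount for r in reference]
    n_obs, n_ref = sum(obs), sum(ref)
    return [-kt * math.log((o / n_obs) / (r / n_ref)) for o, r in zip(obs, ref)]


def score_pose(distances: Sequence[float], potential: Sequence[float],
               bin_width: float = 0.5, r_min: float = 2.0) -> float:
    """Sum the tabulated potential over all pair distances of this atom-type
    pair in one pose; distances outside the tabulated range contribute nothing."""
    total = 0.0
    for d in distances:
        i = int((d - r_min) // bin_width)
        if 0 <= i < len(potential):
            total += potential[i]
    return total


if __name__ == "__main__":
    # Toy counts for bins 2.0-2.5, 2.5-3.0, 3.0-3.5 and 3.5-4.0 Angstrom.
    observed = [1, 30, 12, 7]
    reference = [9, 9, 9, 9]
    pot = pair_potential(observed, reference)
    print([round(w, 2) for w in pot])  # negative = over-represented = favorable
    print(round(score_pose([2.7, 2.8, 5.0], pot), 2))
```

A full function would tabulate one such potential per atom-type pair and sum all of them over the protein-ligand contacts of a pose.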
To help optimize scoring functions, several databases have been created, such as PDBbind [165], BindingDB [166], RELIBASE [167] and KiBank [168], in which scientists can find the 3D structure of a target, its ligands and measured binding affinity data (Kd, Ki, IC50) (see also Table 1). Other test sets to investigate binding and scoring have been reported, such as the CCDC/Astex test set [169]. In many cases, it will be important to tune the scoring function in order to improve the enrichment factor; this can be done by adjusting some parameters to better account for the nature of the binding pocket, or by selecting the best package for a given target, assuming one has enough experimental information to start with [50, 170-172]. To mitigate the approximations inherent in each scoring function, modeling groups have suggested the use of consensus scoring protocols [154, 173-183]. This type of approach combines several scoring functions so that they compensate for each other’s drawbacks and promote the identification of true positive hits. The value of consensus scoring is, however, still under debate (see for example [90, 170]). Other methods for post-processing of docking results have been investigated, such as MASC (multiple active site correction) [50], SIFt (structural interaction fingerprint) [184], or rescoring with X-Score [157] or HINT [185-187] (see the review by Ferrara et al. [150]). Several groups now suggest the use of consensus docking and consensus scoring protocols (see below).
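One simple form of consensus scoring, averaging the ranks given by several scoring functions, and the enrichment factor used to judge it can both be written in a few lines. The sketch assumes that each scoring function returns one value per compound with lower values being better (as for most energy-like scores) and that the set of known actives is given; the enrichment factor at a fraction x of the ranked database is taken as (actives in the top x / compounds in the top x) divided by (all actives / database size). The compound identifiers and scores are hypothetical, and published consensus schemes also use vote-based or regression-based combinations.

```python
from typing import Dict, List, Set


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank compounds by one scoring function (1 = best; lower score is better)."""
    ordered = sorted(scores, key=lambda cid: scores[cid])
    return {cid: i + 1 for i, cid in enumerate(ordered)}


def rank_consensus(all_scores: List[Dict[str, float]]) -> List[str]:
    """Order compounds by their mean rank over several scoring functions
    (all functions are assumed to score the same set of compounds)."""
    rank_tables = [ranks(s) for s in all_scores]
    compounds = rank_tables[0].keys()
    mean_rank = {c: sum(t[c] for t in rank_tables) / len(rank_tables)
                 for c in compounds}
    return sorted(compounds, key=lambda c: (mean_rank[c], c))  # ties broken by id


def enrichment_factor(ranked: List[str], actives: Set[str], fraction: float) -> float:
    """EF = (actives in top fraction / size of top fraction) / (all actives / N)."""
    n_top = max(1, round(fraction * len(ranked)))  # round() guards against float noise
    hits_top = sum(1 for c in ranked[:n_top] if c in actives)
    return (hits_top / n_top) / (len(actives) / len(ranked))


if __name__ == "__main__":
    # Hypothetical scores from two scoring functions (lower = better).
    score_a = {"c0": -5, "c1": -9, "c2": -6, "c3": -4, "c4": -3,
               "c5": -8, "c6": -2, "c7": -1, "c8": -7, "c9": 0}
    score_b = {"c0": -2, "c1": -6, "c2": -1, "c3": -3, "c4": -8,
               "c5": 0, "c6": -4, "c7": -5, "c8": 1, "c9": -7}
    actives = {"c1", "c4"}
    for label, tables in [("A", [score_a]), ("B", [score_b]),
                          ("consensus", [score_a, score_b])]:
        print(label, enrichment_factor(rank_consensus(tables), actives, 0.2))
```

With these toy numbers, each function alone places only one of the two actives in the top 20% (EF = 2.5), whereas the mean-rank consensus places both there (EF = 5). The example is constructed to favor consensus; as noted above, its benefit on real targets is debated.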


Fig. (2). A) Example of a force-field based scoring function [223]. B) Example of an empirical scoring function [215]. C) Example of a knowledge-based scoring function [291].
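For empirical functions such as the one in panel B, the regression step that assigns the weights can be sketched as ordinary least squares on ΔG ≈ w0 + Σ wi·termi, solved here through the normal equations (XᵀX)w = Xᵀy with a small Gaussian elimination. This assumes the per-complex interaction terms (e.g., numbers of hydrogen bonds, buried lipophilic area, rotatable bonds) have already been computed; the term names and the noise-free toy data are hypothetical, and real functions are trained on many more complexes, usually with numerically more robust solvers and cross-validation.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: terms are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_empirical_weights(terms: Sequence[Sequence[float]],
                          dg: Sequence[float]) -> List[float]:
    """Least-squares weights [w0, w1, ...] for dG ~ w0 + sum_i w_i * term_i,
    via the normal equations (X^T X) w = X^T y."""
    x = [[1.0] + list(row) for row in terms]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(x, dg)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict(weights: Sequence[float], row: Sequence[float]) -> float:
    return weights[0] + sum(w * t for w, t in zip(weights[1:], row))


if __name__ == "__main__":
    # Hypothetical terms per complex: (H-bonds, buried lipophilic area / 100 A^2, rotatable bonds)
    terms = [(2, 1.5, 3), (4, 2.0, 5), (1, 3.0, 2), (3, 0.5, 6), (5, 2.5, 1), (0, 1.0, 4)]
    true_w = [-1.0, -0.8, -1.2, 0.3]  # only used to synthesize toy "affinities"
    dg = [predict(true_w, t) for t in terms]
    w = fit_empirical_weights(terms, dg)
    print([round(v, 3) for v in w])
```

Because the toy affinities are generated without noise, the fit recovers the generating weights up to rounding; with real data the residuals of such a fit give a first idea of how well the chosen terms explain measured affinities.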

III. THE MAIN RECEPTOR-BASED DOCKING/SCORING PACKAGES

Stochastic (Random) Search

There are many docking programs available; they differ in the sampling algorithms used, the handling of ligand and protein flexibility, the scoring functions they employ, and the CPU time required to dock a molecule into a given target. In the following, we briefly outline the main features of these computer packages/algorithms. Before describing these applications, it is important to note that, for the time being, no single docking/scoring method performs consistently well across different protein targets, and that the packages should be tested on the selected targets, including with non-default parameters, before starting a screening campaign.

The main stochastic search methods are Monte Carlo (MC) and Genetic Algorithms (GA). MC can generate an ensemble of conformations statistically consistent with a given temperature. Random perturbations of the atomic positions (or of user-defined variables) are applied in order to explore the conformational space of the molecular system (applied differently, MC sampling can also be used to compute accurate binding free energies [188]). An energy function evaluates whether the energy of the newly generated conformation is lower than that of the previous step; if it is higher, the move is accepted only with a probability given by the Boltzmann factor (Metropolis criterion). GAs use the concept of evolution to explore the molecular conformational space and attempt to converge to optimal docking poses. The degrees of freedom of the system are encoded as ‘genes’ in a ‘chromosome’ and are stochastically perturbed through genetic operators: mutations change the value of the variable associated with a gene, and cross-over simulates gene recombination between two different chromosomes. A fitness function acts as evolutionary pressure to determine which members of the population are transmitted to the next generation.

Affinity (www.accelrys.com) (Essentially Monte Carlo with Simulated Annealing)

Affinity uses a combination of Monte Carlo and simulated annealing procedures to dock a ligand into a receptor pocket. It incorporates some ligand/protein flexibility.
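The Metropolis acceptance rule described above, combined with a simulated-annealing temperature schedule of the kind Affinity uses for refinement, can be sketched on a single toy torsion angle. The energy profile, step size, temperatures and geometric cooling rate are illustrative assumptions; this is not Affinity’s actual procedure, which operates on many degrees of freedom with a molecular mechanics force field.

```python
import math
import random
from typing import Callable, Tuple

KB = 0.0019872  # Boltzmann constant in kcal/(mol*K)


def metropolis_accept(delta_e: float, temperature: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (KB * temperature))


def torsion_energy(phi_deg: float) -> float:
    """Toy torsion profile: local minima at 60 and 300 degrees, global minimum at 180."""
    phi = math.radians(phi_deg)
    return 1.5 * (1 + math.cos(3 * phi)) + 1.0 * (1 + math.cos(phi))


def anneal(energy: Callable[[float], float], x0: float, rng: random.Random,
           t_start: float = 1000.0, t_end: float = 10.0, cooling: float = 0.95,
           steps_per_t: int = 50, max_step: float = 30.0) -> Tuple[float, float]:
    """Metropolis Monte Carlo at a geometrically decreasing temperature.
    Returns the lowest-energy angle visited and its energy."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            trial = (x + rng.uniform(-max_step, max_step)) % 360.0  # random torsion move
            e_trial = energy(trial)
            if metropolis_accept(e_trial - e, t, rng):
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling
    return best_x, best_e


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed for reproducibility
    phi, e = anneal(torsion_energy, x0=60.0, rng=rng)
    print(f"best torsion {phi:.1f} deg, energy {e:.3f} kcal/mol")
```

Starting from the local minimum at 60°, the high-temperature phase lets the walker cross the barriers, and the slow cooling then confines it near the global minimum at 180°, which is the rationale for annealing after an initial Monte Carlo search.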

Affinity is thus an energy-driven method. The molecular system is divided into three regions: a region where no atom is allowed to move, a region of full flexibility (ligand + binding site), and a buffer region. A rough starting conformation of the complex is first generated, on which a first Monte Carlo step is applied. This is a classical Monte Carlo simulation, except that each step yielding a lower energy is minimized before the Metropolis criterion is applied. Once the obtained conformation is satisfactory, a simulated annealing procedure is used to refine it. Affinity uses the molecular mechanics program Discover. The CVFF and CFF91 force-field parameters have been implemented to evaluate the binding energy during the docking procedure. Non-bonded interactions can be calculated in many different ways: a grid-based approach, a cell multipole approach, a group-based cutoff approach, or a hard-sphere steric method without electrostatics. Furthermore, Affinity can also incorporate some solvation effects. In addition to the non-bonded interactions, various empirical penalty terms, a distance-based hydrogen-bond term, a ligand confining term and a simple tethering term can be applied to aid the docking process.

Table 2. Some Docking/Scoring Engines/Tools

Some packages (see text for references, explanations and other programs):

Main method: stochastic (random) search
LigandFit (www.accelrys.com)
GLIDE (www.schrodinger.com/)
ICM (www.molsoft.com)
AutoDock 3.0 (www.scripps.edu/mb/olson/doc/autodock)
GOLD (www.ccdc.cam.ac.uk/) and SILVER (for the post-processing of docking results)
PRO_LEADS (www.protheric.com)

Main method: systematic search/incremental construction
SLIDE (www.bch.msu.edu/labs/kuhn/web/index.html)
FRED (www.eyesopen.com); see the online demos to try many OpenEye applications
DOCK (http://dock.compbio.ucsf.edu/) (see also ViewDock to analyze DOCK data: www.cgl.ucsf.edu/chimera/docs/UsersGuide/index.html, and tools for post-processing DOCK results [306])
Surflex (www.biopharmics.com)
DockIt (www.metaphorics.com)
FlexX (www.biosolveit.de/) (www.tripos.com/)
FlexE (www.biosolveit.de/) (www.tripos.com/)

Examples of online scoring tools
PEARLS: Program for Energetic Analysis of Receptor-Ligand Systems [163] – http://ang.cz3.nus.edu.sg/cgi-bin/prog/rune.pl
GFscore, a General Non-Linear Consensus Scoring Function for High-Throughput Docking [307] – http://gfscore.cnrs-mrs.fr/index.htm

Examples of online tools to analyze protein point mutations (important information to consider in some VLS projects)
SIFT: http://blocks.fhcrc.org/sift/SIFT.html [308]
PolyPhen: www.bork.embl-heidelberg.de/PolyPhen/ [309]
FOLD-X: http://fold-x.embl-heidelberg.de:1100/cgi-bin/main.cgi (also computes protein-protein binding energy) [310]

Some additional tools for small molecules or proteins (among many others)
http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Main_Page – Murray-Rust research group: World Wide Molecular Matrix, a WWW site about concepts and tools for Web services in the field of chemistry and drug design
http://www.bioinfo.no/tools/normalmodes [312] – Web server for Normal Mode Analysis of proteins to investigate large-amplitude movements; normal modes are computed with MMTK [311]
PCE (Protein Continuum Electrostatics): http://bioserv.rpbs.jussieu.fr/PCE [313] – Online tool to compute protein electrostatics

LigandFit (www.accelrys.com) (Monte Carlo) [189]

The package has utilities to predict or define the binding site, based either on the protein shape (flood-filling algorithm) or on a co-crystallized ligand. A Monte Carlo method is employed for the conformational search of the ligand (Fig. 3). During this search, bond lengths and bond angles are kept fixed while torsion angles are randomized (stochastic sampling), so several torsions may change at the same time in a single step. Once a new ligand conformation is generated, it is fitted into the binding pocket (shape similarity search procedure), optionally followed by rigid-body minimization. Poses are selected by comparing the shape of the ligand conformation with that of the binding pocket. If the shapes are similar, the ligand is docked into the binding site and its binding energy is evaluated with an energy function called DockScore [189]. DockScore has two terms: the interaction energy with the protein (essentially a soft 9-6 van der Waals term and an electrostatic term with a distance-dependent dielectric constant) and, optionally, the internal energy of the ligand. The interaction energy is evaluated on a grid. The position and orientation of the ligand are optimized by minimizing this "dock energy" with respect to rigid-body translations and rotations of the ligand. This docking procedure attempts to produce ligand poses with favorable interaction energy with the receptor; a separate scoring procedure is then performed to predict binding affinities and prioritize the docked ligands. Several scoring functions are available, including Ludi [190], LigScore [191] and PLP [192]. The Dreiding and CFF force-fields are available. A site-partitioning algorithm based on relocation clustering has been implemented to divide the original binding site into smaller sites of various sizes, which improves docking into some difficult pockets. The overall procedure is reasonably fast on Linux workstations, although it becomes much slower when the site-partitioning protocol is applied.

Structure-Based Virtual Ligand Screening

Current Protein and Peptide Science, 2006, Vol. 7, No. 5

Fig. (3). LigandFit flowchart.

GLIDE (Hierarchical Filters, Monte Carlo) (http://www.schrodinger.com/) [193, 194]

The method uses hierarchical filters to explore plausible docking poses for a given ligand within the receptor site (Fig. 4). The shape and properties of the receptor are represented on a grid by several different sets of fields (computed prior to docking), and the calculations become progressively more accurate as docking proceeds. The binding site is defined by a rectangular box that confines the translations of the ligand's center of mass to the desired region of the target. A set of initial ligand conformations is generated through an exhaustive search of the torsional minima, and the conformers are clustered in a combinatorial fashion. Each cluster, characterized by a common conformation of the core and an exhaustive set of side-chain conformations, is docked as a single object in the first stage. The search begins with a rough positioning and scoring phase that significantly narrows the search space and reduces the number of poses to be considered further (Fig. 4, steps 1 to 2c). The selected conformations are then minimized with the OPLS-AA force-field in the receptor binding site (Fig. 4, step 2d). Next, the 10 lowest-energy poses undergo an MC procedure in which nearby torsional minima are examined and the orientation of peripheral groups of the ligand is refined. The minimized poses are rescored using the GlideScore function, a more sophisticated version of ChemScore with force-field-based components and additional terms accounting for solvation. The best pose is chosen using a model energy score (Emodel) that combines the energy grid score, GlideScore, and the internal strain of the ligand. The non-bonded terms of the molecular mechanics component have been scaled down (soft potential) to compensate for the narrowness of certain receptor sites and their inability to accommodate the ligand.

Fig. (4). GLIDE flowchart.
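The hierarchical-filter idea described for GLIDE can be sketched as a funnel in which each stage rescores only the survivors of the previous one. This is a minimal illustration under stated assumptions, not GLIDE's implementation: poses are toy tuples, and `rough_score` and `fine_score` are hypothetical stand-ins for a coarse grid score and a more expensive refined score (lower is better).

```python
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, float]  # hypothetical toy pose: (coarse descriptor, fine descriptor)

def hierarchical_filter(
    poses: Sequence[Pose],
    stages: Sequence[Tuple[Callable[[Pose], float], int]],
) -> List[Pose]:
    """Apply progressively more expensive scoring stages (lower score = better).

    Each stage rescores only the survivors of the previous stage and keeps
    the `keep` best poses, so costly functions see few candidates.
    """
    survivors = list(poses)
    for score, keep in stages:
        survivors = sorted(survivors, key=score)[:keep]
    return survivors

def rough_score(p: Pose) -> float:
    # Cheap stand-in for a coarse grid-based score (hypothetical)
    return p[0]

def fine_score(p: Pose) -> float:
    # Stand-in for an expensive refined score (hypothetical)
    return p[0] + 2.0 * p[1]

poses = [(0.1, 5.0), (0.2, 0.1), (0.3, 0.2), (0.9, 0.0), (0.4, 3.0)]
best = hierarchical_filter(poses, [(rough_score, 3), (fine_score, 1)])
print(best)  # → [(0.2, 0.1)]
```

The trade-off the funnel encodes is cost control: the expensive function runs only on the survivors of the cheap stage, at the risk of discarding a pose that the cheap score misranks.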

Sperandio et al.

QXP (or Flo+ Software) (Monte Carlo) [195]

QXP contains several algorithms for generating low-energy conformers or for docking (e.g., the MCDOCK and FULLDOCK algorithms). The torsional conformation space of small molecules can be explored rapidly using Metropolis sampling. This is followed by rigid-body rotations and translations to align the ligand onto guide atoms within the active site; these guide atoms are atoms in van der Waals contact with the binding-site atoms. Once the ligand is aligned within the active site, the MC method is applied to it using only rigid-body rotations and translations. Conjugate-gradient minimization is then performed on the ligand torsion angles, followed by Metropolis MC. A grid representation of the receptor is used. In the most recent version of the package, protein flexibility within the binding pocket is allowed. QXP has two scoring functions: a traditional molecular mechanics force-field (a modified version of the AMBER force-field with short non-bonded cut-offs and a distance-dependent dielectric), and an empirical potency score optimized simultaneously for pose and affinity prediction. This empirical potency score contains terms for receptor-ligand atom-atom contacts, hydrogen bonds, steric repulsion, solvation, internal ligand strain, and ligand and receptor entropy. It has been developed and optimized to predict both the relative potencies of inhibitors in experimental SAR (structure-activity relationship) series and the crystallographic binding modes of those inhibitors for which complex structures were available [196].

Prodock (Monte Carlo) [197]

The method uses an MC minimization technique to dock flexible ligands into a flexible binding site, using internal coordinates to represent the molecules. The process differs from standard MC in that a gradient-based minimization takes place before the Metropolis criterion is applied. A grid-based representation of the interactions between the ligand and the receptor is used. During docking, the magnitudes of the various energy terms are scaled to facilitate sampling. Finally, a specific term in the energy function accounts for the solvation energy, which is proportional to the solvent-exposed surface area. Two force-fields are implemented in Prodock, namely AMBER and ECEPP/3, along with an implicit solvation model.

DockVision (www.dockvision.com) (Monte Carlo) [198]

This approach uses an MC-based docking algorithm, and the search process is divided into two stages. The first stage generates a random ligand orientation and uses MC to remove major steric clashes, while the second stage uses a more typical MC simulated annealing protocol. The first stage relies on a crude scoring function (a geometric score for atomic overlaps), while the second stage uses a potential energy function with scaled partial charges to prevent overestimation of the electrostatic energy. The idea is to use the simplest possible scoring function to increase computational speed while still trying to account for solvation effects. Both calculations use information precomputed on a grid. The two-stage docking procedure is then repeated for a large number of random ligand orientations.
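The MC-minimization scheme described for Prodock (and, in a similar form, for Affinity and QXP), in which each trial move is locally minimized before the Metropolis test, can be illustrated on a toy problem. This sketch assumes a hypothetical rugged one-dimensional energy and uses plain gradient descent as the local minimizer; the real programs work in internal coordinates on force-field energies.

```python
import math
import random

def energy(x: float) -> float:
    # Toy rugged 1-D "energy landscape" (hypothetical, for illustration only)
    return 0.1 * x * x + math.sin(3.0 * x)

def gradient(x: float) -> float:
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def local_minimize(x: float, step: float = 0.01, iters: int = 500) -> float:
    # Plain gradient descent stands in for a gradient-based minimizer
    for _ in range(iters):
        x -= step * gradient(x)
    return x

def mc_minimization(x0: float, n_steps: int = 200, kT: float = 0.5,
                    max_move: float = 1.5, seed: int = 0) -> tuple:
    rng = random.Random(seed)
    current = local_minimize(x0)
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(n_steps):
        # Random perturbation followed by local minimization
        trial = local_minimize(current + rng.uniform(-max_move, max_move))
        e_trial = energy(trial)
        # Metropolis criterion applied to the *minimized* trial energy
        if e_trial <= e_current or rng.random() < math.exp(-(e_trial - e_current) / kT):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best

best_x, best_e = mc_minimization(x0=4.0)
print(f"best x = {best_x:.3f}, E = {best_e:.3f}")
```

Because every accepted state is a local minimum, the Metropolis walk effectively hops between minima rather than drifting over the raw landscape, which is the motivation for minimizing before the acceptance test.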

ICM (www.molsoft.com) (Monte Carlo) [199, 200]

The ICM program (Fig. 5) is based on a Monte Carlo simulation that performs global optimization of the energy function of the flexible ligand in the receptor field (flexible ligand/grid receptor approach); receptor flexibility (side-chains and main-chain) can also be considered (see the IFREDA method [200]). The location of the receptor binding pocket can be specified by the user or selected by the program's cavity detection module. The energy terms include the internal energy of the ligand, based on the ECEPP/3 force-field, as well as soft van der Waals, hydrogen-bonding (spherical Gaussians centered at the donor and acceptor sites), electrostatic (with a distance-dependent dielectric constant) and hydrophobic ligand/receptor interaction terms, which are pre-calculated on the grid for computational efficiency. A Monte Carlo minimization procedure in internal coordinate space is employed to search for the global minimum of the energy function. Each step of the algorithm consists of a random change of one of two types, torsional or positional, followed by local minimization. A torsional move completely randomizes a single, arbitrarily chosen torsion angle. A positional move is a pseudo-Brownian random translation and rotation of the ligand. A third type of move can also be applied: torsion moves of amino-acid side-chains at the interface, using a biased-probability methodology (the idea is to sample with higher probability those regions of conformational space that are known a priori, from the analysis of rotamers, to be highly populated). Global optimization is performed in the binding site such that both the intramolecular ligand energy and the ligand-receptor interaction energy are optimized. The VLS scoring function used in ICM consists of the internal force-field energy of the ligand and the ligand/receptor interaction energy, optionally with a term accounting for the size of the binding site/ligand.
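ICM, like LigandFit, GLIDE, QXP, Prodock and DOCK, precomputes receptor interaction terms on a grid so that each energy evaluation during docking becomes a cheap lookup. A minimal sketch of that idea follows, assuming a single hypothetical receptor potential and trilinear interpolation between grid nodes (a common choice, but not necessarily the one used by any given program).

```python
import math
from typing import Callable, List, Tuple

Vec = Tuple[float, float, float]

class EnergyGrid:
    """Potential precomputed on a regular cubic 3-D grid, read back by trilinear interpolation."""

    def __init__(self, origin: Vec, spacing: float, n: int,
                 potential: Callable[[Vec], float]) -> None:
        self.origin, self.h, self.n = origin, spacing, n
        # Pre-compute the receptor potential once, before docking starts
        self.values: List[float] = [
            potential((origin[0] + i * spacing, origin[1] + j * spacing, origin[2] + k * spacing))
            for i in range(n) for j in range(n) for k in range(n)
        ]

    def _v(self, i: int, j: int, k: int) -> float:
        return self.values[(i * self.n + j) * self.n + k]

    def __call__(self, p: Vec) -> float:
        # Points are assumed to lie inside the grid box (outside points are extrapolated)
        f = [(p[d] - self.origin[d]) / self.h for d in range(3)]
        idx = [min(max(int(math.floor(c)), 0), self.n - 2) for c in f]
        t = [f[d] - idx[d] for d in range(3)]
        i, j, k = idx
        e = 0.0
        # Weighted sum over the 8 corners of the enclosing grid cell
        for di in (0, 1):
            for dj in (0, 1):
                for dk in (0, 1):
                    w = ((t[0] if di else 1 - t[0]) *
                         (t[1] if dj else 1 - t[1]) *
                         (t[2] if dk else 1 - t[2]))
                    e += w * self._v(i + di, j + dj, k + dk)
        return e

def toy_potential(p: Vec) -> float:
    # Hypothetical smooth attractive well centred at (1, 1, 1)
    r2 = sum((c - 1.0) ** 2 for c in p)
    return -math.exp(-r2)

grid = EnergyGrid(origin=(0.0, 0.0, 0.0), spacing=0.25, n=9, potential=toy_potential)
ligand_atoms = [(1.0, 1.0, 1.0), (1.1, 0.9, 1.3)]
print(sum(grid(a) for a in ligand_atoms))
```

The grid spacing controls the trade-off: finer grids reproduce steep terms (such as repulsive van der Waals walls) more faithfully but cost more memory and setup time.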
The ligand/receptor interaction energy includes several weighted terms: van der Waals, a hydrophobicity term based on the solvent-accessible surface buried upon binding, an electrostatic solvation term calculated using a boundary-element solution of the Poisson equation, hydrogen-bond interactions, and an entropic term proportional to the number of flexible torsions in the ligand. A history mechanism (stack file) that both expels unwanted minima and promotes the discovery of new minima is also implemented.

Fig. (5). ICM flowchart for VLS and/or small molecule-receptor docking.

MCDOCK (Monte Carlo) [201]

The method is a three-stage strategy using a Monte Carlo algorithm (Fig. 6). The three successive stages of the protocol progressively refine the placement of the ligand within the receptor site. The first stage (geometry-based docking) places the ligand inside the receptor site without major clashes by using a binary grid. An MC routine that uses only the 6 rigid-body degrees of freedom (rotation and translation) positions the rigid ligand in the binding site, and constraints prevent the ligand from escaping the binding site. The second stage uses another MC protocol (energy-based docking). The non-bonded terms of the energy function are the classical Lennard-Jones and Coulombic terms. For the internal energy of the ligand, only the non-bonded components are treated, while the torsional components are ignored. A set of cutoffs, depending on the stage of the sampling, is used to evaluate the non-bonded terms of the ligand-receptor interaction energy. After a global sampling that removes bad contacts between the ligand and the receptor, a simulated annealing protocol is applied using the Metropolis criterion. The position of the center of mass, the three overall Euler angles, and the internal torsion angles of the ligand are sampled. Finally, an MC protocol is used to prevent the system from being trapped in a local minimum.

Fig. (6). MCDOCK flowchart.

AutoDock 3.0 (Genetic Algorithm) (http://www.scripps.edu/mb/olson/doc/autodock) [202]

This program employs a Lamarckian genetic algorithm (LGA) that incorporates local minimization for a given fraction of the population. The LGA combines a global search for ligand conformation and orientation, handled by a genetic algorithm that switches between "genotypic space" and "phenotypic space", with an adaptive local search that performs energy minimization. Crossover and mutation operate in genotypic space on a random number of individuals, while individuals are evaluated in phenotypic space with the energy function. The scoring function of AutoDock 3.0 is modeled after the AMBER force-field and uses a pairwise sum of energetic terms with parameters for van der Waals, hydrogen bonding and distance-dependent dielectric electrostatics, as well as conformational torsional restriction entropy and empirical solvation terms.

DARWIN (Genetic Algorithm) [203]

The method combines a genetic algorithm and local gradient minimization with the CHARMM molecular mechanics force-field and energy function [204]. The energy evaluation contains a specific component for the solvation energy, a term that has been shown to be essential for accurately reproducing known binding modes. Solvent contributions are assessed using a modified version of DelPhi (see review on electrostatics [205]).

DIVALI (Genetic Algorithm) [206]

The method uses an AMBER-type potential energy function with a distance-dependent dielectric and a genetic algorithm search. The receptor is modeled as a rigid entity, and a grid-based evaluation of ligand-protein interaction energies can be performed. The scoring function used to discriminate survivors from unfavorable docking solutions is based on the AMBER energy function and is composed of both inter- and intramolecular energies. The intermolecular energy depends mainly on a distance-dependent electrostatic term and van der Waals interactions; hydrogen-bonding interactions and solvation effects are not explicitly taken into account. The internal energy of the ligand is the sum of intramolecular non-bonded interactions (van der Waals + Coulombic) and a torsional energy term.

FFLD (Genetic Algorithm; Fragment-Based) [207, 208]

The docking approach is a three-step strategy based on the decomposition of a flexible ligand into rigid fragments. First, the program SEED is used to dock the fragments into the binding site of the receptor. Second, the ligand is docked by a genetic algorithm (FFLD) that uses a fast scoring function.
The genetic algorithm perturbations affect only the conformation of the ligand; its placement in the binding site is determined by the SEED anchors and a least-squares fitting method. With this approach, the position and orientation of the ligand in the binding site are determined by the best binding modes of its fragments, previously docked using an accurate energy function that takes electrostatic solvation into account. The FFLD scoring function consists of an intraligand van der Waals energy, a ligand-protein soft-core van der Waals term, and an intermolecular polar energy term (number of hydrogen bonds and unfavorable polar contacts). Third, the FFLD results are post-processed by CHARMM [204] minimization (CHARMm22 force-field). The user can choose to take solvation effects into account (using the Generalized Born approximation for fragment solvation and an ad hoc procedure for the unbound receptor based on a finite-difference solution of the Poisson equation) [209].

GOLD (Genetic Algorithm) (http://www.ccdc.cam.ac.uk/) [210-212]

The GOLD program consists of three main parts and uses a genetic algorithm (GA) to explore the full range of ligand conformational flexibility and the rotational flexibility of selected receptor hydrogens. Ligand placement is based on fitting points. The program adds fitting points to hydrogen-bonding groups on the protein and ligand, and maps acceptor points in the ligand onto donor points in the protein and vice versa. Additionally, GOLD generates hydrophobic fitting points in the protein cavity onto which ligand CH groups are mapped. The GA optimizes flexible ligand dihedrals, ligand ring geometries, some protein dihedrals, and the mappings of the fitting points. Binding modes can be ranked with Goldscore, a molecular mechanics-like function with four terms: the protein-ligand hydrogen-bond score, a soft protein-ligand van der Waals term, and ligand intramolecular hydrogen-bond and intramolecular 6-12 van der Waals "strain" terms. Several other scoring functions are available in GOLD, and the Silver program for the post-processing of docking results has recently been reported. Water molecules can be treated specifically.

PRO_LEADS (http://www.protheric.com) [213]

PRO_LEADS docks flexible ligands into a rigid receptor site using a tabu search algorithm. The principle of tabu search is to maintain a list of tabu conformations: previously accepted conformations (via the scoring function) from which subsequently generated conformations must differ as much as possible (RMSD criterion), so as to diversify the conformational space explored. Once the list reaches a user-defined size, each newly accepted conformation replaces one already present in the tabu list on a "first in, first out" basis. Ligand flexibility is handled through a defined set of rotatable bonds, which represent the main degrees of freedom used in the tabu search. The full set of variables used in the tabu search is defined by an internal coordinate modeling tree [214].
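The tabu list mechanism described for PRO_LEADS can be sketched as follows. This is a minimal illustration, not PRO_LEADS itself: conformations are hypothetical vectors of torsion angles, `toy_score` is an invented objective (lower is better), the RMSD ignores angle periodicity for simplicity, and `collections.deque(maxlen=...)` provides the first-in, first-out replacement.

```python
import math
import random
from collections import deque
from typing import Callable, Tuple

Conf = Tuple[float, ...]  # hypothetical: a vector of torsion angles (radians)

def rmsd(a: Conf, b: Conf) -> float:
    # Plain RMSD over torsion values; angle periodicity is ignored for simplicity
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def tabu_search(score: Callable[[Conf], float], start: Conf, n_iter: int = 100,
                n_neighbours: int = 20, step: float = 0.3, tabu_size: int = 10,
                min_rmsd: float = 0.1, seed: int = 0) -> Tuple[Conf, float]:
    rng = random.Random(seed)
    tabu: deque = deque(maxlen=tabu_size)  # FIFO: oldest entry drops out when full
    current = start
    best, best_score = start, score(start)
    tabu.append(start)
    for _ in range(n_iter):
        neighbours = [tuple(c + rng.uniform(-step, step) for c in current)
                      for _ in range(n_neighbours)]
        # Reject candidates too close (by RMSD) to any conformation in the tabu list
        allowed = [c for c in neighbours if all(rmsd(c, t) >= min_rmsd for t in tabu)]
        if not allowed:
            continue
        current = min(allowed, key=score)  # best admissible move, even if uphill
        tabu.append(current)
        s = score(current)
        if s < best_score:
            best, best_score = current, s
    return best, best_score

def toy_score(c: Conf) -> float:
    # Hypothetical score with its minimum at torsions (1.0, -0.5, 2.0)
    target = (1.0, -0.5, 2.0)
    return sum((x - t) ** 2 for x, t in zip(c, target))

best, s = tabu_search(toy_score, start=(0.0, 0.0, 0.0))
print(best, s)
```

Accepting the best admissible neighbour even when it is uphill is what lets tabu search leave local minima; the tabu list then stops it from sliding straight back.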
The generated poses are evaluated by directly estimating the binding affinity between the two molecules with a modified version of the empirical scoring function ChemScore [215], which up-weights hydrogen bonds. A grid-based calculation of the potential energy is used for lipophilic interactions and to prevent steric clashes. A local minimization is applied to the lowest-energy conformations obtained from the tabu search.

SYSTEMATIC SEARCH/INCREMENTAL CONSTRUCTION

LibDock [216]

There are four important aspects to this docking procedure: the conformational search, the binding site image (i.e., the interaction hot spots), the matching step, and the final optimization and scoring step. First, several random conformations are generated (varying only the rotatable bonds). The internal ligand energy is estimated using van der Waals potentials and a dihedral angle term, and each conformation is then minimized. The binding site image consists of a list of apolar hot spots (points in the binding site that are favorable for an apolar atom to bind) and polar hot spots (points that are favorable for a hydrogen-bond donor or acceptor to bind). To initially position a given ligand conformation as a rigid body in the binding site, the atoms of the ligand are matched to the appropriate hot spots. A single conformation can produce up to 10,000 matches. Because it is too costly to optimize most of these matches, a pruning/scoring strategy is required. A simple scoring function (a hydrogen-bonding potential or a steric potential) is used, not to rank the docked conformations in absolute terms but as an initial filter that selects only a few docked conformations.

SLIDE (http://www.bch.msu.edu/labs/kuhn/web/index.html) [217, 218]

SLIDE docks flexible ligands into a partially flexible protein. The core of the approach is an iterative matching procedure between interaction centers within the receptor and interaction points within the ligand. The receptor site is described in terms of interaction points: hydrogen-bond donor, hydrogen-bond acceptor and hydrophobic. A similar procedure is applied to the ligands. A multilevel hashing procedure exhaustively detects matches between triplets of interaction points and triplets of ligand atoms such that the vertices are compatible and the edges are within a threshold distance. Once a match is found, the corresponding triplets are superimposed through a least-squares procedure. This determines an anchor fragment, which is subsequently kept rigid. Steric clashes between the protein and the ligand anchor fragment are resolved using rigid-body translations. Once the anchor fragment is placed and no collision is observed, the rest of the ligand atoms are flexibly added and optimized by rotating all single bonds; this step also includes some side-chain flexibility. In many cases, some van der Waals collisions between atoms of the ligand and the protein will remain. Mean-field theory is then used to decide which torsion angle should be rotated to improve shape complementarity.
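The triplet-matching step described for SLIDE can be illustrated with a brute-force search. This is a simplification under stated assumptions: SLIDE itself uses multilevel hashing (not reproduced here), the interaction-type labels and coordinates are hypothetical, and `tol` is an invented distance tolerance in the same units as the coordinates.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# (interaction type, coordinates); type labels are hypothetical
Point = Tuple[str, Tuple[float, float, float]]

def match_triplets(site: Sequence[Point], ligand: Sequence[Point], tol: float = 0.5
                   ) -> List[Tuple[Tuple[int, ...], Tuple[int, ...]]]:
    """Return (site triplet, ligand triplet) index pairs whose types agree vertex by
    vertex and whose three pairwise distances agree within `tol`."""
    matches = []
    for s in itertools.combinations(range(len(site)), 3):
        for l in itertools.permutations(range(len(ligand)), 3):
            # Vertex compatibility: same interaction type at each position
            if any(site[a][0] != ligand[b][0] for a, b in zip(s, l)):
                continue
            # Edge compatibility: corresponding distances within the tolerance
            if all(abs(math.dist(site[s[p]][1], site[s[q]][1])
                       - math.dist(ligand[l[p]][1], ligand[l[q]][1])) <= tol
                   for p, q in ((0, 1), (0, 2), (1, 2))):
                matches.append((s, l))
    return matches

site = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0)),
        ("hydrophobic", (0.0, 4.0, 0.0)), ("donor", (10.0, 10.0, 10.0))]
ligand = [("hydrophobic", (5.0, 9.0, 1.0)), ("donor", (5.0, 5.0, 1.0)),
          ("acceptor", (8.0, 5.0, 1.0))]
print(match_triplets(site, ligand))  # → [((0, 1, 2), (1, 2, 0))]
```

Each match would then be passed to the least-squares superposition that places the anchor fragment; the brute-force loops scale poorly with the number of points, which is why a hashing scheme is preferred in practice.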
The mean-field approach is an iterative process that assumes that most of the moves of both molecules required to accommodate the ligand are small. If no solution is found, the ligand orientation is rejected. The generated poses are evaluated with an empirical scoring function based on hydrogen bonds, hydrophobic contacts, and the number of displaced water molecules. The number of hydrogen bonds is counted, and the complementary surface between ligand and protein is evaluated for hydrophobic groups. Water displacement and the disruption of protein intramolecular hydrogen bonds can also be taken into account. Hydrophobicity is evaluated with a knowledge-based criterion [219]: based on how often atoms are exposed to solvent in a database of compounds, protein and ligand atoms are weighted with a hydrophobic score defining their effective hydrophobic interaction.

ProPose [220, 221]

This docking package is based on an incremental construction algorithm (the ligands are fragmented by cutting at rotatable bonds). A torsion angle library derived from semi-empirical quantum chemical calculations provides the torsion angle energies used to build the ligand within the active site of the receptor. The program uses a discrete "pharmacophore-like" description of molecular interactions, which is transformed into a smooth potential energy surface. A target description file contains the information required for the incremental docking procedure: atomic coordinates (protein and ligand) and the definition of interaction points and centers. Once a base fragment is docked, the remaining parts of the ligand are added incrementally inside the receptor binding pocket. A fast scoring function is then used to evaluate the ligand poses.

FRED (http://www.eyesopen.com) [222]

This method docks rigid ligand conformers into a rigid receptor site. It exhaustively docks compounds into the binding pocket by rigidly rotating and translating each conformer; this strategy avoids the sampling issues associated with stochastic methods. FRED then filters the pose ensemble, using a negative image of the active site to reject poses that clash with the protein. All remaining poses are ranked with one scoring function. Optionally, FRED can perform a systematic rigid-body optimization of the top-ranked poses. The best pose is selected using the Consensus Structure method (or, optionally, just one of the available scoring functions). A full coordinate optimization of the pose with the MMFF force-field can be performed. The refined poses can then be scored using one or more scoring functions, including MASC (Multiple Active Site Correction) corrected scoring functions [50]. Among the many features of FRED, it is worth noting its very fast Gaussian-based scoring function, which evaluates the surface complementarity between the receptor and the ligands.

DOCK and Methods Based on DOCK (http://dock.compbio.ucsf.edu/)

DOCK was one of the first docking programs developed for structure-based drug design [223] (see also review [224]). The method is divided into three main steps. First, a set of overlapping spheres in contact with the surface of the receptor site is determined (Fig. 7A); these spheres fill the molecular surface of the binding site and represent a negative image of the target site. Second, the sphere centers are matched with the ligand atoms using a graph-matching algorithm. Third, a scoring function is used to evaluate the docking poses by approximating the protein/ligand binding energy. Once the molecular surface of the protein has been generated, the program SPHGEN [225] creates spheres in the binding pocket using each point of the receptor surface and its associated normal. The centers of the spheres filling the receptor binding site represent putative positions of ligand atoms. The matching algorithm used in the DOCK suite looks for equivalent internal distances between subsets of ligand atoms and sphere centers. There are three main ways of proceeding: automatic, manual, and chemical. In the automatic procedure, for example, the number of desired ligand orientations must be specified, and DOCK generates matches between ligand atoms and sphere centers until that number of orientations is reached. In the manual procedure, the distance and node parameters of the graph must be specified, and DOCK finds the matches that satisfy them. Docking can be done with rigid or flexible ligands. In flexible-ligand docking, the very first stage is the identification of rigid fragments and the torsion angles that link them (Fig. 7B). Two methods can be applied: anchor-first search or simultaneous search. The anchor-first search method (DOCK4) is based on a divide-and-conquer algorithm and a greedy algorithm. It uses a multi-concentric region representation of the ligand, with the selected anchor fragment in the central region. Segments are then added to the anchor fragment, innermost layer first and, within a layer, largest segment first. Each added fragment is subjected to an extensive exploration of torsion angles. In the simultaneous search, all torsion angles are searched and minimized at the same time. The conformational search is performed before the orientation search, so that each conformation is handled independently. Energy minimization can be performed at each incremental cycle or only at the final stage. The best candidates are selected using an efficient scoring/pruning procedure. The ligand orientation is evaluated with a grid-based procedure in which steric and electrostatic interactions between the putative ligand and the receptor are pre-computed at each grid point. Several scoring approaches can be used: contact score, energy score (Lennard-Jones van der Waals potential and Coulombic electrostatics with a distance-dependent dielectric constant), Generalized Born/Surface Area (GB/SA) score (implemented in DOCK5), and chemical score. A continuum score calculation is also available.

TARGETED DOCK (Systematic, Based on DOCK) [226]

Targeted-DOCK is a modified version of DOCK 1.0 that can target user-specified atom types to selected positions in the receptor site. The list of pairs between ligand atoms and receptor spheres can be derived from an analysis of the receptor site, for example a specific hydrogen bond or a tightly bound water molecule. These pairs represent specific interactions rather than mere shape complementarity, so target sphere/atom pairs can be weighted differently from other sphere/atom pairs. The DOCK matching algorithm then focuses on this specific list of target sphere/ligand atom pairs.

PhDOCK (Systematic, Pharmacophore, Based on DOCK) [227]

PhDOCK is based on pharmacophore representations of small molecules stored in a database. This pharmacophoric-point representation is compared with predefined DOCK site points in the binding region in order to orient the ligand relative to the protein. The overall methodology follows the general DOCK protocol as implemented in DOCK 4.0. An iterative procedure is applied in which each molecule is associated with a pharmacophore. The pharmacophore representation is first used to overlay molecules based on their widest 3D pharmacophore. The basic elements of this representation are simply hydrogen-bond donors, hydrogen-bond acceptors, and ring centroids. For each orientation that provides a good match with the receptor points, the ensemble of conformers is docked into the binding site, and all members of the ensemble are scored.

Fig. (7). A) Binding site definition in DOCK. B) Optimization of van der Waals interactions in DOCK (by torsion angle variation).

SG-DOCK/SP-DOCK (Systematic, Molecular Field Similarity, Based on DOCK) [228]

These methods apply two distinct algorithms based on DOCK 4.0: SP-DOCK (similarity-penalized docking) and SG-DOCK (similarity-guided docking). SG-DOCK uses similarity criteria along the incremental construction process: the algorithm favors target-ligand orientations that reproduce the binding mode observed in the reference structures and penalizes those that diverge from it. The similarity docking score is used throughout the incremental ligand construction as well as in the final minimization step. In contrast, SP-DOCK uses similarity criteria only to penalize the docking pose after the docking process, so they affect only the final ranking. These algorithms use the program MIMIC to calculate a Gaussian-based molecular field (steric and electrostatic) similarity between a target ligand and a reference structure. Ligand-bound protein structures are used as reference structures, as are sets of probes or fragments placed in the active site of the protein.

MVP [229]

The MVP program implements several docking algorithms, including a special procedure that grows ligands within the binding site and a "superdock" procedure that fits fully grown compounds into the binding site by superimposition onto target points. The growth procedure starts from an "anchor group" within each compound. The superdock approach is broadly similar to that of the original DOCK program [224]. MVP accounts for some aspects of configurational entropy by running two separate calculations for each compound, one in the binding site and one free in solution, and calculating the binding energy using Boltzmann summations over the respective minima. HierVLS (hierarchical virtual ligand screening) [230] and the MPSim-Dock hierarchical docking algorithm [231] are other approaches of significant interest based on DOCK 4.0.
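To make MVP's use of Boltzmann summations over minima concrete, the sketch below assumes the standard form G = -kT ln sum_i exp(-E_i/kT), evaluated separately over the minima found in the binding site and in solution, with the binding energy taken as the difference. The exact expression used in MVP is not given in the text, so this formula, the kT value and the example energies are assumptions.

```python
import math
from typing import Sequence

KT_KCAL = 0.593  # kT in kcal/mol at about 298 K (assumed temperature)

def boltzmann_free_energy(minima: Sequence[float], kT: float = KT_KCAL) -> float:
    """-kT ln sum_i exp(-E_i/kT), computed stably by factoring out the lowest minimum."""
    e_min = min(minima)
    z = sum(math.exp(-(e - e_min) / kT) for e in minima)
    return e_min - kT * math.log(z)

def binding_energy(site_minima: Sequence[float], solution_minima: Sequence[float],
                   kT: float = KT_KCAL) -> float:
    # Boltzmann-summed energy in the site minus that of the free ligand in solution
    return boltzmann_free_energy(site_minima, kT) - boltzmann_free_energy(solution_minima, kT)

# Hypothetical minima energies (kcal/mol) for one compound
site = [-45.0, -44.2, -40.0]
solution = [-20.0, -19.5, -19.5, -18.0]
print(f"{binding_energy(site, solution):.2f} kcal/mol")
```

Summing over many minima rewards a compound that keeps several low-energy conformations available, which is how this kind of estimate captures part of the configurational entropy that a single-minimum energy ignores.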
HierVLS is a fast hierarchical docking approach that starts with a coarse-grained conformational search over a large number of configurations, filtered with a fast but crude energy function, followed by a succession of finer-grained levels that use increasingly accurate, but more expensive, descriptions of the ligand-protein-solvent interactions. The final step optimizes a single configuration of the ligand in the protein site using an accurate energy expression and description of the solvent, which would be impractical for all the conformations and sites sampled at the coarse level. MPSim-Dock combines elements of DOCK with the molecular dynamics methods available in the MPSim software.

Surflex (www.biopharmics.com) (Fragment-Based Approach) [232-236]
Surflex is based on a previously developed program named Hammerhead. It uses the same concepts of pocket finding and binding-site probing (protomols), but it is characterized by an innovative incremental construction of the ligand and a recently refined scoring function. The overall procedure and the two main phases of the algorithm are summarized in Fig. 8. The program first creates an idealized binding site, a protomol (mol file), that serves as a target to which putative ligands or ligand fragments are aligned on the basis of molecular similarity. Ligands are docked into the protein so as to optimize the value of the scoring function. Each putative ligand is fragmented into 1 to 10 molecular fragments, each of which may have some rotatable bonds. Each fragment is then conformationally searched, and each conformation of each fragment is aligned to the protomol to yield poses that maximize molecular similarity to the protomol. The aligned fragments are scored and pruned on the basis of the scoring function and the degree of protein interpenetration. Two procedures (incremental or whole-molecule) can be used to construct the full ligand from the aligned fragments. The best-scoring poses are subjected to gradient-based optimization of conformation and alignment, and the top-scoring poses are returned along with their scores. The poses can later be post-processed using user-defined parameters. The scoring function terms involve, in rough order of significance, hydrophobic complementarity, polar complementarity, entropic terms and solvation terms. A modified version of the scoring function makes explicit use of negative data, which vastly increases the amount of empirical information available to estimate its parameters [236].

Fig. (8). Surflex flowchart.

ADAM [237]
The program docks flexible ligands into a binding site on the assumption that hydrogen bonds between the ligand and the protein are the predominant interactions and can therefore guide docking. The protocol detects hydrogen-bonding schemes between the ligand and the protein and then tries to match the corresponding parts of the ligand with dummy atoms on the protein side. Dummy atoms are generated at the center of the hydrogen-bonding heteroatom sites. The matched part of the ligand serves as a placement fragment and allows a full in situ minimization (full ligand + protein) through a combination of successive minimization methods. In this step, both the ligand and the protein active-site atoms are allowed to move, which accounts for some protein flexibility. Once the hydrogen-bonding patterns have been constructed and optimized, the rest of the ligand is docked and minimized as well, through a two-step optimization procedure based on internal coordinates. The downhill simplex method is then used to minimize the ligand within the binding site. This last step also constitutes the scoring part of the docking procedure, because the total energy obtained at the end of the minimization is used for ranking.

CLIX (Chemical Match, Rigid-Body) [238]
The method docks rigid ligands by matching compatible pairs of interaction points in the protein with ligand atom groups. It first uses the program GRID (http://www.moldiscovery.com/soft_grid.php) to probe the binding site of the receptor and determine the most favorable interaction sites for specific chemical groups. CLIX then considers all combinations of pairs of GRID sites and ligand groups, and retains the pairs that display a satisfactory geometric fit and sufficient coincidence of atomic groups. When a matching pair is found, steric hindrance is checked and the ligand is rotated around the matched pair in order to maximize the number of favorable interactions between the receptor site and the ligand. The program uses a simple and fast energy function based on the energy maps provided by GRID: the score is the sum of the energies of the grid points occupied by the ligand atoms.

DockIt (www.metaphorics.com)
The program uses distance geometry to dock ligands into the receptor site.
The active site is defined as a set of spheres, and ligand conformations are constructed within this site by the distance geometry procedure. Thus, only the 2D connectivity of the ligands is required (Daylight SMILES or MDL mol files). The originality of the approach is that DockIt creates a bound conformation of each ligand that fits both the shape and the chemical complementarity of the site.

FLOG (Rigid Docking) [239]
The method rigidly docks previously generated conformers by matching interaction points of the receptor site with ligand atoms. The ligand atoms are assigned to several atom types according to their physical properties (neutral hydrogen-bond donors, neutral hydrogen-bond acceptors, polar, hydrophobic, etc.). The receptor energy field is represented by several grids that contain precalculated potential energies of interaction with putative ligand atoms, one grid per ligand atom type, so the evaluation of the binding energy reduces to a series of table look-ups. Match centers (interaction centers) are determined by searching for favorable interaction sites, and all possible atom-center matches are found with a clique-finding algorithm. For each match, an initial orientation of the ligand is produced and optimized until the grid-based score is maximal, by rotating and translating the ligand as a rigid body to obtain the best fit of the selected match pairs. The non-bonded interaction potentials are the canonical Lennard-Jones and Coulomb potentials, and other terms such as hydrogen-bond and hydrophobic interactions can be included. The hydrogen-bond potential contains an attractive Gaussian term; when no hydrogen bond is formed, the same term also serves to identify the presence of a hydrophobic interaction. These terms take into account both the complementarity and the geometry required for optimal hydrogen bonds. A simplex optimization is used to refine the final orientations of the docked ligand.

SYSDOC - EUDOC (Cation-pi Interaction) [240, 241]
The method docks externally generated conformers as rigid bodies into the binding site of the selected target. The complex conformations are generated by a systematic combination of rigid-body translations of the ligand and rotations around the three Cartesian axes. The resulting complexes are not minimized. The docking poses are scored by a continuum calculation of the binding energy with one of the three available force fields (TRIPOS, AMBER, CHARMM). The computed energy comprises three kinds of interaction: electrostatic, steric (van der Waals) and hydrogen bonding, and the Coulomb term uses a distance-dependent dielectric constant. Cation-pi interactions between the ligand and the target are also evaluated. Molecular flexibility is treated by using several pregenerated conformations of both the target and the ligand. In the latest version of the program suite (EUDOC), a virtual screening procedure has been added to address issues specific to library enrichment. This procedure focuses exclusively on the ligands that match the receptor well and does not perform an exhaustive exploration of the conformational space.
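The systematic rigid-body search described for SYSDOC can be sketched in Python under explicit simplifying assumptions: atoms are (x, y, z, charge) tuples, the steric term is a single generic 12-6 Lennard-Jones potential with made-up epsilon and sigma values, the distance-dependent dielectric is taken as 4r, and the hydrogen-bond and cation-pi terms are omitted. As in the method described above, the poses are only scored, never minimized. This is an illustration of the search strategy, not SYSDOC's code or force-field parameters.

```python
import itertools
import math

COULOMB_CONSTANT = 332.0637  # kcal*Angstrom/(mol*e^2)


def rotate(point, ax, ay, az):
    """Rotate a point about the x, then y, then z Cartesian axes (angles in radians)."""
    x, y, z = point
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    return (x, y, z)


def pair_energy(r, qi, qj, epsilon=0.1, sigma=3.5):
    """Coulomb term with a distance-dependent dielectric eps(r) = 4r, plus a 12-6 LJ term."""
    elec = COULOMB_CONSTANT * qi * qj / (4.0 * r * r)
    sr6 = (sigma / r) ** 6
    return elec + 4.0 * epsilon * (sr6 * sr6 - sr6)


def pose_energy(ligand, receptor):
    """Sum pair energies over all ligand-receptor atom pairs."""
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for rx, ry, rz, rq in receptor:
            r = max(math.dist((lx, ly, lz), (rx, ry, rz)), 0.5)  # avoid r -> 0
            total += pair_energy(r, lq, rq)
    return total


def systematic_search(ligand, receptor, center, step=1.0, n_steps=1, angle_step=90):
    """Score every pose on a rotation grid combined with a cubic translation grid.

    Rotations are applied about the ligand centroid, which is then placed at
    `center` plus each grid offset. Returns (best_energy, best_pose).
    """
    n = len(ligand)
    cx = sum(a[0] for a in ligand) / n
    cy = sum(a[1] for a in ligand) / n
    cz = sum(a[2] for a in ligand) / n
    angles = [math.radians(a) for a in range(0, 360, angle_step)]
    offsets = [i * step for i in range(-n_steps, n_steps + 1)]
    best_energy, best_pose = math.inf, None
    for ax, ay, az in itertools.product(angles, repeat=3):
        rotated = [rotate((x - cx, y - cy, z - cz), ax, ay, az) + (q,)
                   for x, y, z, q in ligand]
        for dx, dy, dz in itertools.product(offsets, repeat=3):
            pose = [(x + center[0] + dx, y + center[1] + dy, z + center[2] + dz, q)
                    for x, y, z, q in rotated]
            energy = pose_energy(pose, receptor)  # scored only, not minimized
            if energy < best_energy:
                best_energy, best_pose = energy, pose
    return best_energy, best_pose
```

The cost grows as the product of the rotation and translation grid sizes times the number of atom pairs, which is why finer grids quickly become expensive for large libraries.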
SANDOCK (Complementary Surface) [242]
The method docks ligands into protein receptor sites using a distance-matching algorithm and a chemically annotated dot-surface representation of the protein. SANDOCK characterizes the binding site of the target protein as a surface of dots, defined by chemical properties and accessibility, that represents the most probable complementary surface for different ligand atom types. Each dot is then characterized by three properties: hydrogen-bonding, hydrophobic and topological. The topological property represents the priority of a given dot, based on its solvent accessibility. A very similar procedure is then applied to the ligand atoms, except that the topological property is replaced by a flag that indicates whether the ligand atom is essential to the molecular framework. Ligand atoms are placed into the target active site with a distance-matching algorithm that compares the distances between receptor dots with the distances between ligand atoms. Once a certain number of distances have been matched, a rigid-body transformation is applied to fit the corresponding atoms onto the dots. The chemical properties of the dots and of the ligand atoms are taken into account by tuning the distance tolerance, which guides ligand placement according to the nature of the dots and atoms considered. The scoring function is a weighted sum of three terms: geometric fit, hydrophobic fit and hydrogen-bond fit. Specific distance constraints can be added to enforce a given ligand profile and orientation within the acceptor site.
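The distance-matching step described for SANDOCK can be illustrated with a brute-force Python sketch. It assumes that receptor dots and ligand atoms are given as (coordinates, property) tuples with a single property label each, that a dot may only pair with an atom carrying the same label, and that the per-property tolerances are placeholder values; SANDOCK's own data structures and tolerances are not described here. The subsequent rigid-body superposition of the matched atoms onto the dots is not shown.

```python
import itertools
import math

# Hypothetical tolerances (Angstrom): tighter for directional hydrogen-bonding matches.
TOLERANCE = {"hbond": 0.5, "hydrophobic": 1.0}


def distance_matches(dots, atoms, k=3):
    """Return lists of k (dot_index, atom_index) pairs whose internal distances agree.

    dots, atoms: lists of ((x, y, z), property) tuples. Each pair of matches
    must reproduce the dot-dot distance by the atom-atom distance within the
    smaller of the two tolerances involved.
    """
    candidates = [(i, j)
                  for i, (_, prop_dot) in enumerate(dots)
                  for j, (_, prop_atom) in enumerate(atoms)
                  if prop_dot == prop_atom]
    matches = []
    for combo in itertools.combinations(candidates, k):
        # each dot and each atom may be used only once within a match
        if len({i for i, _ in combo}) < k or len({j for _, j in combo}) < k:
            continue
        consistent = True
        for (i1, j1), (i2, j2) in itertools.combinations(combo, 2):
            d_dots = math.dist(dots[i1][0], dots[i2][0])
            d_atoms = math.dist(atoms[j1][0], atoms[j2][0])
            tol = min(TOLERANCE[dots[i1][1]], TOLERANCE[dots[i2][1]])
            if abs(d_dots - d_atoms) > tol:
                consistent = False
                break
        if consistent:
            matches.append(list(combo))
    return matches


# Usage: a 3-4-5 triangle of dots and a translated copy of it as ligand atoms.
dots = [((0, 0, 0), "hbond"), ((3, 0, 0), "hydrophobic"), ((0, 4, 0), "hbond")]
atoms = [((10, 10, 10), "hbond"), ((13, 10, 10), "hydrophobic"), ((10, 14, 10), "hbond")]
print(distance_matches(dots, atoms))  # [[(0, 0), (1, 1), (2, 2)]]
```

Brute force over all k-subsets is only practical for small sites; it is used here because it makes the distance-compatibility test explicit.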


Ph4Dock (Pharmacophoric Matching) [244]
The program docks rigid ligands into partially flexible receptor sites by matching pharmacophoric points between the ligand and the protein. The overall approach can be divided into five steps: conformational search of the ligands, concave search, pharmacophore query creation, pharmacophore search and energy minimization. The receptor site is detected and probed to list pharmacophoric features (dummy atoms) such as hydrogen-bond donors, hydrogen-bond acceptors and hydrophobic regions. In addition, exclusion spheres are defined to mark forbidden regions of the protein active site. The binding site is defined by a set of spheres that contact the surface of the protein binding site, in a manner similar to the program SPHGEN [225] from the DOCK package. Dummy atoms are placed at the centers of the generated spheres and clustered with a single-linkage clustering algorithm. The pharmacophoric feature associated with a dummy atom is determined solely from its electrostatic interaction energy with the protein binding site: the most negative energies are associated with hydrogen-bond acceptors, the most positive energies with hydrogen-bond donors, and intermediate values are considered hydrophobic. The matching procedure then finds matches between the ligand atoms and the protein-derived dummy atoms, that is, the largest set of compatible pharmacophoric features. A two-step minimization procedure optimizes the ligand poses and determines those that are passed to the scoring function. Although the protein atoms are treated as rigid by default, the receptor side chains can optionally be optimized. The main scoring function used in Ph4Dock is defined as follows:

$$U_{\mathrm{Total}} = U_{\mathrm{elec}} + U_{\mathrm{vdW}} + U_{\mathrm{ligand}} + U_{\mathrm{solv}}$$

where $U_{\mathrm{elec}}$ and $U_{\mathrm{vdW}}$ are the Coulomb and Lennard-Jones potentials between the ligand and the protein, respectively, $U_{\mathrm{ligand}}$ is the internal strain energy of the ligand and $U_{\mathrm{solv}}$ is the desolvation energy. This scoring function is used both for energy minimization and for the final ranking of the docked poses.

Q-fit [243]
The method docks molecular fragments into a rigid receptor site. It uses a probabilistic procedure based on statistical thermodynamic principles to place ligand atom triplets at the lowest-energy sites. The interaction energies of atom probes with the protein receptor are computed, and triplets of the most favorable interaction sites are generated and matched with compatible triplets of ligand atoms. Two alternative algorithms are available for this purpose: geometric hashing and pose clustering. Once a ligand or fragment placement has been accepted, a downhill simplex algorithm performs a rigid-body minimization to optimize the placement of the ligand within the receptor site. Q-fit evaluates the interaction between the ligand and the receptor through non-bonded energy calculations using force-field parameters from GRID. At each grid point, the non-bonded interaction energy of a probe type is calculated from van der Waals (6-12 Lennard-Jones), electrostatic and hydrogen-bond potentials. The first two potentials are those used in GRID, whereas the hydrogen-bond term uses a customized, direction-dependent 8-6 function.

FlexX (http://www.biosolveit.de/) (http://www.tripos.com/) [245]
FlexX docks flexible ligands into rigid receptors using an incremental approach and some concepts from the LUDI program [246]. The approach can be divided into three areas: conformational flexibility, protein-ligand interactions and scoring. The conformational flexibility of the ligand is modeled by a discrete set of preferred torsion angles at acyclic single bonds [247] and by multiple conformations for ring systems (computed with CORINA). For the interaction scheme, FlexX relies on the detection of geometrically restrictive interactions, such as hydrogen bonds (Fig. 9) and specific hydrophobic interactions such as phenyl-methyl contacts. Each such interaction has a favored interaction distance, from which a spherical interaction surface is derived. Interactions are defined by the proximity and compatibility of so-called interaction centers and interaction radii, which together define the interaction surfaces (represented on the receptor side by discrete interaction points for computational efficiency).
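Modeling ligand flexibility with a discrete set of preferred torsion angles turns the conformational space into a combinatorial one. The short Python sketch below enumerates the torsion combinations for a list of rotatable bonds; the bond-type labels and angle sets are hypothetical placeholders, not the torsion-angle library actually used by FlexX.

```python
import itertools

# Hypothetical preferred torsion angles (degrees) per rotatable-bond type.
PREFERRED_TORSIONS = {
    "sp3-sp3": [60.0, 180.0, 300.0],
    "sp2-sp3": [0.0, 60.0, 120.0, 180.0, 240.0, 300.0],
    "sp2-sp2": [0.0, 180.0],
}


def enumerate_torsion_states(bond_types):
    """Yield every combination of preferred torsion angles for the given rotatable bonds."""
    choices = [PREFERRED_TORSIONS[b] for b in bond_types]
    return itertools.product(*choices)


def count_torsion_states(bond_types):
    """Number of combinations, computed without enumerating them."""
    n = 1
    for b in bond_types:
        n *= len(PREFERRED_TORSIONS[b])
    return n


if __name__ == "__main__":
    print(list(enumerate_torsion_states(["sp3-sp3", "sp2-sp2"])))
    print(count_torsion_states(["sp3-sp3"] * 10))
```

Even with only three values per bond, ten sp3-sp3 bonds give 3**10 = 59,049 combinations for each ring conformation, which illustrates why building the ligand incrementally and pruning partial placements is attractive compared with enumerating complete conformers.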

Fig. (9). FlexX: conditions for the formation of an interaction, illustrated by a hydrogen bond between a carbonyl oxygen and an N-H group. The interaction centers are the oxygen and the hydrogen atom, and the interaction surface is shown as a grey cone. A contact is possible if the interaction center of each group lies approximately on the interaction surface (defined by the interaction radius) of the counter-group.
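The contact condition illustrated in Fig. 9 can be expressed as a geometric test: each interaction center must lie approximately on the interaction surface of the counter-group. The Python sketch below models an interaction surface as a spherical cap defined by a center, an interaction radius, an axis and an opening half-angle. These parameters, the distance tolerance and the hard yes/no cutoffs are simplifying assumptions for illustration, not FlexX's actual interaction model or parameter values.

```python
import math
from dataclasses import dataclass


def _angle_deg(u, v):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


@dataclass
class InteractionSurface:
    """Spherical cap of radius `radius` around `center`, opening `half_angle_deg` about `axis`."""
    center: tuple
    radius: float
    axis: tuple
    half_angle_deg: float

    def contains(self, point, tol=0.3):
        """True if `point` lies within `tol` of the cap (hard cutoffs, for illustration)."""
        v = tuple(p - c for p, c in zip(point, self.center))
        r = math.hypot(*v)
        if r == 0.0 or abs(r - self.radius) > tol:
            return False
        return _angle_deg(v, self.axis) <= self.half_angle_deg


def contact_possible(center_a, surface_a, center_b, surface_b):
    """Each interaction center must lie on the counter-group's interaction surface."""
    return surface_b.contains(center_a) and surface_a.contains(center_b)


# Usage: carbonyl O at the origin (C=O along +x) and a donor H 1.9 A away along +x.
acceptor_o = (0.0, 0.0, 0.0)
donor_h = (1.9, 0.0, 0.0)
o_surface = InteractionSurface(acceptor_o, 1.9, (1.0, 0.0, 0.0), 60.0)
h_surface = InteractionSurface(donor_h, 1.9, (-1.0, 0.0, 0.0), 30.0)
print(contact_possible(acceptor_o, o_surface, donor_h, h_surface))  # True
```

A scoring function would usually replace the hard cutoffs with a penalty that decreases smoothly with the deviation from the ideal distance and angle.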

The docking algorithm is divided into three phases: base fragment selection, base fragment placement and complex construction, in which the ligand is built incrementally from the base fragment. The protocol is highly sensitive to the choice of base fragment, and its performance degrades for ligands with large numbers of rotatable bonds. Base selection involves defining connected parts of the ligand, and base placement consists of matching all triplets of interaction centers within the fragment with compatible interaction points within the receptor site (Fig. 10). The goal of the base placement algorithm is thus to find positions in the active site that form a sufficient number of favorable interactions between this part of the ligand and the protein. Once a set of favorable placements has been computed for the base fragment, incremental construction starts. The construction is formulated as a tree search problem whose goal is to find the leaves containing placements with favorable binding energies, as estimated by the scoring function. Because exploring the whole tree is not feasible, a simple greedy heuristic is used: only some of the best partial placements found are kept for the next iteration. The fragments added first to the growing ligand are those that form additional hydrogen bonds or salt bridges, because these interactions are more directional and thus geometrically better defined. Each fragment addition is followed by an optimization step, which is required if new interactions are found or if overlaps are detected. The ligands are ranked with a modified version of Böhm's empirical scoring function, which estimates the free energy of binding ∆G. This scoring function includes several weighted terms: a fixed ground term, a term accounting for the loss of entropy upon ligand binding due to the restriction of rotatable bonds in the ligand, and hydrogen-bond, ionic, aromatic and lipophilic interaction terms (Fig. 11). (See the Tripos website for additional information about other FlexX methods, such as CombiFlexX, and numerous other drug design tools.)

Fig. (10). FlexX: mapping of interaction centers from the ligand onto three interaction points in the receptor site.

Fig. (11). FlexX scoring function.

FlexX-Pharm (http://www.biosolveit.de/) (http://www.tripos.com/) [248]
FlexX-Pharm uses the core algorithm of FlexX but allows the incorporation of information about protein-ligand binding modes derived from the receptor active site, which is treated as simple constraints. The program can either use the constraints, through look-up tables, to guide the FlexX docking step or simply serve as a post-filtering tool for FlexX results. The constraints defined in FlexX-Pharm can be either interaction or spatial constraints. Interaction constraints require that specific groups of atoms within the receptor interact with a part of the ligand in a predefined way (e.g., as hydrogen-bond donor and acceptor). Spatial constraints restrict certain ligand atom types or groups to a given region of the receptor. For each constraint, the program detects all counter-group candidates in the ligand. A maximum distance look-up table (MAXLT) is calculated for all candidate counter-groups and a minimum distance look-up table (MINLT) for the receptor, so that impossible combinations can be rejected. This look-ahead filtering is applied after the fragment placement of each incremental construction step; FlexX-Pharm therefore alters neither the base fragment placement nor the incremental construction of the ligand performed by FlexX. During fragment placement, FlexX-Pharm also detects additional interactions that deviate from ideal geometries. Three sequential, increasingly stringent checks verify whether the retained solutions match the constraints and prevent inappropriate docking solutions from being kept for the next incremental iteration. First, logical checks are simple logical conditions that determine whether the pharmacophoric constraints are satisfied and thus whether the partial docking solution can be retained. Second, distance checks examine only the constraints that remain possible, and not those that are already fulfilled. These checks serve as rules for adding the next fragment, by comparing either the distances between interaction constraints in the receptor and an anchor atom for the next fragment on the docked part of the ligand, or the intra-ligand distances between this anchor atom and candidate counter-groups. Finally, directed tweak checks perform a local flexible 3D search for the added fragment by adjusting its rotatable bonds.

FlexE (http://www.tripos.com/) [249]
The method uses the docking engine of FlexX but can take some protein flexibility into account. It uses a united representation of the protein that captures the structural variations of some flexible regions. This description is based on an ensemble of superimposed structures of the protein target, which are combined into the united protein structure by clustering the alternative side-chain conformations and backbone segments (called instances). The inter-dependencies within the ensemble are handled by a so-called incompatibility graph, into which the instances corresponding to the different parts of the structure are compiled and which guarantees that only mutually compatible instances are used simultaneously. A dedicated procedure searches this graph in order to submit valid structures to the docking engine. Each instance capable of forming interactions is given an interaction geometry, that is, a set of interaction points. These points are tested for overlaps with other parts of the protein and are used to score the interactions with the ligand. The united representation is used during the incremental construction of the ligand: the ligand is placed fragment by fragment within the binding site, and at each construction step all possible interactions between the growing ligand and all instances of the united protein structure are determined. The scoring function is then applied to each particular instance.

OTHER PROGRAMS

In the following, we report on other important tools that have been used in VLS studies and/or have only recently been described.

DoMCoSAR (Monte Carlo-simulated annealing) [250] provides docking modes that are consistent with a structure-activity relationship model. The approach establishes the binding mode of the compounds in a chemical series, assuming that all of them share the same binding mode. It is based on a combination of simulated annealing, CHARMm parameters and soft-core potentials (electrostatic, attractive and repulsive terms). The docking protocol is divided into three stages during which the soft-core potentials are adjusted.
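The soft-core idea can be illustrated with one common soft-core form of the 12-6 Lennard-Jones potential, in which the interatomic distance is replaced by an effective distance that stays finite at short range. The functional form, the parameter values and the softness schedule below are illustrative assumptions; the exact soft-core potentials used by DoMCoSAR are not specified here.

```python
def soft_core_lj(r, alpha, epsilon=0.2, sigma=3.5):
    """Soft-core 12-6 Lennard-Jones energy (kcal/mol) at distance r (Angstrom).

    The distance is replaced by r_eff = (r**6 + alpha * sigma**6) ** (1/6);
    alpha = 0 recovers the ordinary (hard) potential, while alpha > 0 keeps
    the energy finite even when two atoms overlap completely (r = 0).
    """
    r6_eff = r ** 6 + alpha * sigma ** 6
    if r6_eff == 0.0:
        raise ValueError("r must be > 0 when alpha is 0")
    s6 = sigma ** 6 / r6_eff
    return 4.0 * epsilon * (s6 * s6 - s6)


# Hypothetical softness schedule for three stages: soft, partially hardened, hard.
SOFTNESS_SCHEDULE = (1.0, 0.3, 0.0)

if __name__ == "__main__":
    clash = 2.0  # a short contact that the hard potential penalizes heavily
    for alpha in SOFTNESS_SCHEDULE:
        print(f"alpha={alpha:.1f}  E(r={clash} A)={soft_core_lj(clash, alpha):.2f}")
```

Lowering alpha toward zero from one stage to the next reproduces, in miniature, the progressive hardening of the potentials: early stages let atoms pass through each other during sampling, while the final stage uses the ordinary potential.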
During the first stage, the soft non-bonded potentials are used in combination with Monte Carlo sampling coupled to a simulated annealing procedure that starts at high temperature, and the molecules of the chemical series are docked into the binding site. During the second stage, the local minima obtained in the first stage are used as starting points, the starting temperature of the annealing is lowered and the soft-core potentials are progressively hardened. Finally, during the third stage, the soft potentials are removed and the resulting structures undergo local minimization. The approach is reported to be slow and is not applicable to virtual screening of diverse sets of molecules.

LIGIN is based on a surface complementarity approach and docks a rigid ligand into a rigid receptor [251]. The program first generates randomly distributed ligand positions and orientations, all confined within a user-defined box that encloses the receptor active site, and optimizes them to maximize a complementarity function. This function is a simple sum of the surface areas of atomic contacts, each weighted by plus or minus one depending on the types of atoms in contact (i.e., on their propensity to form favorable interactions), plus a term that prevents strong atomic overlap during the search. Once the complementarity function is maximized, the solutions are clustered and sorted. A list of hydrogen-bond candidates is then generated to drive a final optimization step that maximizes hydrogen bonding between the ligand and the receptor. Protein flexibility is taken into account to some extent, since the protocol can be instructed to ignore selected protein residues when evaluating the complementarity function.

FTDock can be used to dock rigid ligands into a rigid receptor site, although it is mainly used for protein-protein docking [252]. The method relies on Fourier correlation theory to evaluate successively the shape and electrostatic complementarity of the molecular surfaces of the ligand and the protein [253]. Yucca rigidly docks low-energy conformers, previously generated with Omega, into the receptor binding pocket and ranks them with a PLP-based scoring function (see V. Choi, http://people.cs.vt.edu/~vchoi/) [254]. Yamagishi and colleagues have reported a simple and very fast surface-matching procedure (faster than FRED, according to the authors) that appears promising for the initial, rapid screening of large compound collections [255]. Several programs developed in Dr. Jorgensen's laboratory, such as BOSS, BOMB and QikProp, have proven useful in virtual screening experiments [14, 33]. The protein alpha shape dock (PAS-Dock) tool is a new empirical scoring function suitable for virtual screening on homology models [256].

eHiTS (http://www.simbiosys.ca/ehits/) performs fast flexible ligand docking using a systematic algorithm with no random, stochastic or evolutionary element, and it provides comprehensive coverage of the search space. For each candidate structure, eHiTS generates all major docking modes that are compatible with the steric and chemical constraints of the target cavity. The output consists of multiple sets of 3D coordinates per structure, with rough fitness scores that are highly configurable.
A user-defined binding pocket or the entire receptor surface can be searched.

Other methods of interest for relatively fast library screening are GasDock (multipopulation genetic algorithm) [257] and DragHOME [258], which allows docking of ligands into low-resolution protein models. Another method, GEMDOCK (A Generic Evolutionary Method for molecular DOCKing, http://gemdock.life.nctu.edu.tw/dock/), has recently been reported [259]. Its docking engine combines discrete and continuous global search strategies, and scoring is performed with an empirical scoring function that is reported to allow rapid recognition of potential ligands. Virtual screening can also be performed via molecular dynamics simulations using a force field whose effective charges are refined by a novel procedure that relies on quantum-mechanical calculations and preserves the internal consistency of the parameterization scheme [260]. The MOE-Dock package (http://www.chemcomp.com/index.htm) uses a Tabu search methodology and simulated annealing to sample conformational space, and the docked ligands are then scored empirically. Quantum (http://www.q-pharm.com/home) and Moloc (http://www.moloc.ch/index.html) are modeling packages with many modules and tools for drug discovery projects and the structural analysis of proteins and small molecules. The RiboDock package uses Monte Carlo sampling and a fast empirical function to score RNA-ligand interactions [261]. A novel engine combining exhaustive similarity searches performed directly on SMILES strings with docking of flexible ligands has recently been presented; it aims to facilitate in silico screening of very large databases [262]. Packages that combine ligand-based and receptor-based approaches, such as CoLiBRI [263], are also emerging. Molecular dynamics can help to perform in silico screening of flexible ligands against flexible receptor binding sites [264].

Hierarchical multiple-filter database searching strategies are attractive approaches for drug lead exploration [265]. For instance, approaches that combine several methods are emerging, such as docking with FlexX followed by molecular dynamics and quantum mechanics/molecular mechanics (QM/MM) calculations [266], or molecular dynamics with a quantum-refined force field [267] (see the review by Ferrara et al. for additional comments on QM/MM calculations [150]). Hierarchical database screening using a pharmacophore model, rigid-body docking, solvation docking and molecular mechanics-Poisson-Boltzmann/surface area calculations has been shown to be useful [268]. Combining methods can also matter not only to improve the results but also to reduce software license costs. For example, studies combining FLOG and ICM-Dock [269] or FRED and Surflex-Dock [170] have been reported and successfully applied, for instance to the identification of a potent inhibitor of protein kinase CK2 [124].

Data obtained from VLS studies can be incorporated into de novo approaches to structure-based rational drug design, and combining the two concepts should help to design entirely novel potential leads. Examples of packages for such computations include Ludi [270], SPROUT [271, 272], LEAD3D [273], ENPDA (for peptide design) [274] and LigBuilder [275].

IV. SUCCESSES AND CHALLENGES

VLS methods based on the 3D structure of the receptor have proven useful for prioritizing large libraries and influential in drug design projects, despite current technical and theoretical limitations.
Numerous success stories of new hit identification, on different targets (X-ray structures or homology models), with compound collections of different kinds and sizes and with different programs, have been presented in recent reviews [3, 8, 12, 31, 32, 56, 276]. For example, Vangrevelinghe et al. discovered novel and selective inhibitors of protein casein kinase II (CK2, homology model) by screening a subset (400,000 molecules) of the Novartis database with DOCK [277]. The identification of novel and potent inhibitors of DNA gyrase by 3D structure-based biased 'needle' screening has been described [278]. Non-peptide inhibitors of beta-secretase have been identified via high-throughput docking and continuum electrostatics calculations on a library of over 200,000 compounds, and several promising lead compounds are under development [279]. Novel acetylcholinesterase inhibitors were discovered by screening 160,000 commercially available compounds with the ADAM&EVE method [280]. Jenkins et al. discovered hit compounds for angiogenin using several structure-based VLS methods [176]. Schapira et al. discovered novel nuclear hormone receptor antagonists by using ICM-VLS to screen several commercial compound databases [281, 282]. Kraemer et al. found inhibitors of human aldose reductase by applying a protocol of consecutive hierarchical filters (the docking software used was FlexX) to search the Available Chemicals Directory [283].

Many examples and applications discussed in the present article indicate that in silico VLS approaches are mature, but that the scoring step (reliable ranking of compounds) remains the major issue. Further progress will be required to better account for, and balance, entropic effects and electrostatic interactions (e.g., solvation). Scoring schemes could be improved by tailoring them to a specific target site rather than by writing new scoring functions or parameters. A related key challenge, affecting both docking and scoring, is the appropriate treatment of ionization and tautomeric states in the input data. Docking the correct ligand tautomer would require dynamic prediction of protein pKa values, since tautomers are influenced by their environment, but addressing this problem during VLS computations is challenging. Other problems in the field relate to ligand/target flexibility, the choice of force fields, partial charges, solvent and dielectric constant, and the exploration of multiple binding modes [23]. In addition, thorough docking/scoring calculations on databases containing millions of compounds can be highly time-consuming, so methods to speed up calculations are important. Computations can be carried out on clusters and/or via grid computing technologies, but limiting factors such as computer costs and the number of commercial software licenses needed to run docking simultaneously on multiple processors ("software cost") have to be taken into account.

V. CONCLUSION

Virtual screening methods based on the 3D structure of the receptor provide a real opportunity to identify new active compounds without bias towards known hits or leads. Yet, in some cases, it is of interest to use both ligand-based and structure-based approaches.
We have introduced in this review concepts related to target definition, pocket prediction, ADME/tox computations and library design, and presented diverse docking/scoring methods. Additional concepts, comments, methods and algorithms pertaining to the field of drug design can be found in the following reviews of this special issue of CPPS. With regard to docking/scoring, we have seen that there is room for improvement in the field (for example, the need to accurately account for both ligand and receptor flexibility [284]). Yet, because the interplay between docking and scoring is fairly complex, these improvements are difficult to make and/or assess. Many methods are able to produce reliable models of bound ligands (correct poses are generated), but it is still difficult to distinguish 'true' ligands from false positives. Thus, algorithms that better handle receptor flexibility, induced-fit motions and binding affinities are needed. Several approaches have been mentioned in the text; they can involve real flexibility of the partners and/or special scoring functions that indirectly take atom mobility into account. Very recent examples along this line that indirectly consider receptor atom mobility (in addition to those already described above) include the development of a new knowledge-based scoring function, M-Score, which takes isotropic B-factors into account [285], new approaches to induced-fit docking [286, 287], and the approach integrated in eHiTS (see details by Zsoldos et al. in this special issue). From analysis of the literature, there is overall agreement about the need to run in silico and in vitro experiments in parallel on several test cases in order to better understand how to optimize in silico/in vitro approaches [4, 288, 289]. Chaining or combining different methods can be beneficial in some cases, and it is conceivable that, in the next decade, such integrated methods will become mainstream. Combining NMR, X-ray analysis, in silico screening and parallel chemistry is obviously important, as in fragment-based lead discovery projects ("fragonomics") [3, 82, 290]. Prior to docking and scoring, a significant amount of effort still needs to be spent on generating and maintaining large compound collections and on preparing the targets. These steps are crucial and need to be performed with care. There is no clear agreement about what an optimal compound collection (size, diversity, etc.) for a VLS (or HTS) campaign should be, and research groups use different types of databases, for example collections containing only small fragments, natural products (more difficult, since these are generally large and flexible, with several chiral centers), or focused libraries. During the last 5 years, the number of success stories resulting from structure-based VLS experiments has grown significantly (a simple Internet or PubMed search supports this observation). Applications of these techniques not only give important information about key molecular mechanisms but also provide many novel hits that will help to cure diseases while facilitating the design of more specific wet-lab experiments. In numerous cases, structure-based VLS represents a useful alternative to experimental HTS for finding novel lead compounds [150].

ACKNOWLEDGMENTS

We thank the French Institute of Health and Medical Research (Inserm) for support and those who established the Inserm Avenir Career Award (Prof. C. Brechot and collaborators). We are grateful to the University of Paris 5 and Prof. D. Jore for providing laboratory space. Comments on our review article from Dr. Xueliang Fang (The University of Michigan, Ann Arbor, MI, USA) and Dr. Wen Hwa Lee (Structural Genomics Consortium, Oxford University, UK) were greatly appreciated.

REFERENCES

[1] Mestres, J. (2002) Biochem Soc Trans, 30, 797-799.
[2] Ertl, P. (2003) J Chem Inf Comput Sci, 43, 374-380.
[3] Congreve, M., Murray, C. W. and Blundell, T. L. (2005) Drug Discov Today, 10, 895-907.
[4] Brenk, R., Irwin, J. J. and Shoichet, B. K. (2005) J Biomol Screen, 10, 667-674.
[5] Bajorath, J. (2002) Nat Rev Drug Discov, 1, 882-894.
[6] Kubinyi, H. (2003) Nat Rev Drug Discov, 2, 665-668.
[7] Lundqvist, T. (2005) Curr Opin Drug Discov Devel, 8, 513-519.
[8] Shoichet, B. K. (2004) Nature, 432, 862-865.
[9] Lyne, P. D. (2002) Drug Discov Today, 7, 1047-1055.
[10] Bleicher, K. H., Bohm, H. J., Muller, K. and Alanine, A. I. (2003) Nat Rev Drug Discov, 2, 369-378.
[11] Wang, Y., Chiu, J.-F. and He, Q.-Y. (2005) Current Computer-Aided Drug Design, 1, 43-52.
[12] Hardy, L. W. and Malikayil, A. (2003) Curr. Drug. Discov., 15, 15-20.
[13] Jennings, A. and Tennant, M. (2005) Curr Pharm Des, 11, 335-344.
[14] Rotella, D. P. (2006) IDrugs, 9, 331-333.

[15] McConkey, B. J., Sobolev, V. and Edelman, M. (2002) Current Science, 83, 845-856.
[16] Dror, O., Shulman-Peleg, A., Nussinov, R. and Wolfson, H. J. (2004) Curr Med Chem, 11, 71-90.
[17] Schneidman-Duhovny, D., Nussinov, R. and Wolfson, H. J. (2004) Curr Med Chem, 11, 91-107.
[18] Stahura, F. L. and Bajorath, J. (2005) Curr Pharm Des, 11, 1189-1202.
[19] Lengauer, T., Lemmen, C., Rarey, M. and Zimmermann, M. (2004) Drug Discov Today, 9, 27-34.
[20] Xu, H. (2002) Curr Top Med Chem, 2, 1305-1320.
[21] Rockey, W. M. and Elcock, A. H. (2002) Proteins, 48, 664-671.
[22] Evers, A. and Klebe, G. (2004) J Med Chem, 47, 5381-5392.
[23] Mohan, V., Gibbs, A. C., Cummings, M. D., Jaeger, E. P. and DesJarlais, R. L. (2005) Curr Pharm Des, 11, 323-333.
[24] Oshiro, C., Bradley, E. K., Eksterowicz, J., Evensen, E., Lamb, M. L., Lanctot, J. K., Putta, S., Stanton, R. and Grootenhuis, P. D. (2004) J Med Chem, 47, 764-767.
[25] Davis, A. M., Teague, S. J. and Kleywegt, G. J. (2003) Angew Chem Int Ed Engl, 42, 2718-2736.
[26] Kairys, V., Fernandes, M. X. and Gilson, M. K. (2006) J Chem Inf Model, 46, 365-379.
[27] Wieman, H., Tondel, K., Anderssen, E. and Drablos, F. (2004) Mini Rev Med Chem, 4, 793-804.
[28] Kitchen, D. B., Decornez, H., Furr, J. R. and Bajorath, J. (2004) Nat Rev Drug Discov, 3, 935-949.
[29] Oprea, T. I. and Matter, H. (2004) Curr Opin Chem Biol, 8, 349-358.
[30] Jain, A. N. (2004) Curr Opin Drug Discov Devel, 7, 396-403.
[31] Alvarez, J. C. (2004) Curr Opin Chem Biol, 8, 365-370.
[32] Fradera, X. and Mestres, J. (2004) Curr Top Med Chem, 4, 687-700.
[33] Jorgensen, W. L. (2004) Science, 303, 1813-1818.
[34] Langer, T. and Hoffmann, R. D. (2001) Curr Pharm Des, 7, 509-527.
[35] Abagyan, R. and Totrov, M. (2001) Curr Opin Chem Biol, 5, 375-382.
[36] Brooijmans, N. and Kuntz, I. D. (2003) Annu Rev Biophys Biomol Struct, 32, 335-373.
[37] Taylor, R. D., Jewsbury, P. J. and Essex, J. W. (2002) J Comput Aided Mol Des, 16, 151-166.
[38] Schneider, G. and Bohm, H. J. (2002) Drug Discov Today, 7, 64-70.
[39] Waszkowycz, B. (2002) Curr Opin Drug Discov Devel, 5, 407-413.
[40] Jongejan, A., de Graaf, C., Vermeulen, N. P., Leurs, R. and de Esch, I. J. (2005) Methods Mol Biol, 310, 63-91.
[41] Anderson, A. C. (2003) Chem Biol, 10, 787-797.
[42] Cole, J. C., Murray, C. W., Nissink, J. W., Taylor, R. D. and Taylor, R. (2005) Proteins, 60, 325-332.
[43] Fernandes, M. X., Kairys, V. and Gilson, M. K. (2004) J Chem Inf Comput Sci, 44, 1961-1970.
[44] Rognan, D. (2006) J Physiol Paris.
[45] Sotriffer, C. A. and Dramburg, I. (2005) J Med Chem, 48, 3122-3125.
[46] Do, Q. T., Renimel, I., Andre, P., Lugnier, C., Muller, C. D. and Bernard, P. (2005) Curr Drug Discov Technol, 2, 161-167.
[47] Lamb, M. L., Burdick, K. W., Toba, S., Young, M. M., Skillman, A. G., Zou, X., Arnold, J. R. and Kuntz, I. D. (2001) Proteins, 42, 296-318.
[48] Chen, Y. Z. and Zhi, D. G. (2001) Proteins, 43, 217-226.
[49] Fukunishi, Y., Mikami, Y., Kubota, S. and Nakamura, H. (2005) J Mol Graph Model.
[50] Vigers, G. P. and Rizzi, J. P. (2004) J Med Chem, 47, 80-89.
[51] Baurin, N., Baker, R., Richardson, C., Chen, I., Foloppe, N., Potter, A., Jordan, A., Roughley, S., Parratt, M., Greaney, P., Morley, D. and Hubbard, R. E. (2004) J Chem Inf Comput Sci, 44, 643-651.
[52] Krier, M., Bret, G. and Rognan, D. (2006) J Chem Inf Model, 46, 512-524.
[53] Lameijer, E. W., Kok, J. N., Back, T. and Ijzerman, A. P. (2006) J Chem Inf Model, 46, 553-562.
[54] Capelli, A. M., Feriani, A., Tedesco, G. and Pozzan, A. (2006) J Chem Inf Model, 46, 659-664.
[55] Alvesalo, J. K. O., Siiskonen, A., Vainio, M. J., Tammela, P. S. M. and Vuorela, P. M. (2006) J Med Chem, 49, 2353-2356.
[56] Orry, A. J., Abagyan, R. A. and Cavasotto, C. N. (2006) Drug Discov Today, 11, 261-266.

[57] Pirok, G., Mate, N., Varga, J., Szegezdi, J., Vargyas, M., Dorant, S. and Csizmadia, F. (2006) J Chem Inf Model, 46, 563-568.
[58] Lameijer, E. W., Kok, J. N., Back, T. and Ijzerman, A. P. (2006) J Chem Inf Model, 46, 545-552.
[59] Kassel, D. B. (2004) Curr Opin Chem Biol, 8, 339-345.
[60] Muegge, I. (2003) Med Res Rev, 23, 302-321.
[61] Hou, T. and Xu, X. (2004) Curr Pharm Des, 10, 1011-1033.
[62] Beresford, A. P., Segall, M. and Tarbit, M. H. (2004) Curr Opin Drug Discov Devel, 7, 36-42.
[63] Lombardo, F., Gifford, E. and Shalaeva, M. Y. (2003) Mini Rev Med Chem, 3, 861-875.
[64] Gasteiger, J. (2003) Mini Rev Med Chem, 3, 789-796.
[65] Martin, Y. C. (2005) J Med Chem, 48, 3164-3170.
[66] Rishton, G. M. (2003) Drug Discov Today, 8, 86-96.
[67] Lipinski, C. A., Lombardo, F., Dominy, B. W. and Feeney, P. J. (2001) Adv Drug Deliv Rev, 46, 3-26.
[68] Li, A. P. (2001) Drug Discov Today, 6, 357-366.
[69] Roche, O. and Guba, W. (2005) Mini Rev Med Chem, 5, 677-683.
[70] Olah, M. M., Bologa, C. G. and Oprea, T. I. (2004) Curr Drug Discov Technol, 1, 211-220.
[71] Selzer, P., Roth, H. J., Ertl, P. and Schuffenhauer, A. (2005) Curr Opin Chem Biol, 9, 310-316.
[72] Socorro, I. M. and Goodman, J. M. (2006) J Chem Inf Model, 46, 606-614.
[73] von Korff, M. and Sander, T. (2006) J Chem Inf Model, 46, 536-544.
[74] Roche, O., Schneider, P., Zuegge, J., Guba, W., Kansy, M., Alanine, A., Bleicher, K., Danel, F., Gutknecht, E. M., Rogers-Evans, M., Neidhart, W., Stalder, H., Dillon, M., Sjogren, E., Fotouhi, N., Gillespie, P., Goodnow, R., Harris, W., Jones, P., Taniguchi, M., Tsujii, S., von der Saal, W., Zimmermann, G. and Schneider, G. (2002) J Med Chem, 45, 137-142.
[75] Feng, B. Y., Shelat, A., Doman, T. N., Guy, R. K. and Shoichet, B. K. (2005) Nat Chem Biol, 1, 146-148.
[76] Vermeulen, N. P. (2003) Curr Top Med Chem, 3, 1227-1239.
[77] Veber, D. F., Johnson, S. R., Cheng, H. Y., Smith, B. R., Ward, K. W. and Kopple, K. D. (2002) J Med Chem, 45, 2615-2623.
[78] Ertl, P., Rohde, B. and Selzer, P. (2000) J Med Chem, 43, 3714-3717.
[79] Nassar, A. E., Kamel, A. M. and Clarimont, C. (2004) Drug Discov Today, 9, 1055-1064.
[80] Oprea, T. D. (2002) Molecules, 7, 51-62.
[81] Caldwell, G. W. and Yan, Z. (2006) Curr Opin Drug Discov Devel, 9, 47-60.
[82] Rees, D. C., Congreve, M., Murray, C. W. and Carr, R. (2004) Nat Rev Drug Discov, 3, 660-672.
[83] Congreve, M., Carr, R., Murray, C. and Jhoti, H. (2003) Drug Discov Today, 8, 876-877.
[84] Fechner, U. and Schneider, G. (2006) J Chem Inf Model, 46, 699-707.
[85] Balakin, K. V., Savchuk, N. P. and Tetko, I. V. (2006) Curr Med Chem, 13, 223-241.
[86] Weininger, D. (1988) J Chem Inf Comput Sci, 28, 31-36.
[87] Dalby, A., Nourse, J. G., Hounshell, W. D., Gushurst, A. K. I., Grier, D. L., Leland, B. A. and Laufer, J. (1992) J Chem Inf Comput Sci, 32, 244-255.
[88] Knox, A. J., Meegan, M. J., Carta, G. and Lloyd, D. G. (2005) J Chem Inf Model, 45, 1908-1919.
[89] Schuffenhauer, A., Brown, N., Selzer, P., Ertl, P. and Jacoby, E. (2006) J Chem Inf Model, 46, 525-535.
[90] Verdonk, M. L., Berdini, V., Hartshorn, M. J., Mooij, W. T., Murray, C. W., Taylor, R. D. and Watson, P. (2004) J Chem Inf Comput Sci, 44, 793-806.
[91] Deng, Z., Chuaqui, C. and Singh, J. (2006) J Med Chem, 49, 490-500.
[92] Anderson, A. C. and Wright, D. L. (2005) Current Computer-Aided Drug Design, 1, 103-127.
[93] Thomas, M. P., McInnes, C. and Fischer, P. M. (2006) J Med Chem, 49, 92-104.
[94] Barril, X. and Morley, S. D. (2005) J Med Chem, 48, 4432-4443.
[95] Carlson, H. A. (2002) Curr Opin Chem Biol, 6, 447-452.
[96] Carlson, H. A. (2002) Curr Pharm Des, 8, 1571-1578.
[97] Carlson, H. A. and McCammon, J. A. (2000) Mol Pharmacol, 57, 213-218.
[98] Chen, X., Ji, Y. Z. and Chen, Y. Z. (2002) Nucleic Acids Res, 30, 412-415.

[99] Hajduk, P. J., Huth, J. R. and Tse, C. (2005) Drug Discov Today, 10, 1675-1682.
[100] Kellenberger, E., Muller, P., Schalon, C., Bret, G., Foata, N. and Rognan, D. (2006) J Chem Inf Model, 46, 717-727.
[101] Gold, N. D. and Jackson, R. M. (2006) J Chem Inf Model, 46, 736-742.
[102] An, J., Totrov, M. and Abagyan, R. (2005) Mol Cell Proteomics, 4, 752-761.
[103] Krovat, E. M., Steindl, T. and Langer, T. (2005) Current Computer-Aided Drug Design, 1, 93-102.
[104] Brenk, R., Vetter, S. W., Boyce, S. E., Goodin, D. B. and Shoichet, B. K. (2006) J Mol Biol, 357, 1449-1470.
[105] Armstrong, K. A., Tidor, B. and Cheng, A. C. (2006) J Med Chem, 49, 2470-2477.
[106] Sotriffer, C. and Klebe, G. (2002) Farmaco, 57, 243-251.
[107] Laurie, A. T. and Jackson, R. M. (2005) Bioinformatics, 21, 1908-1916.
[108] Elcock, A. H. (2001) J Mol Biol, 312, 885-896.
[109] Glaser, F., Morris, R. J., Najmanovich, R. J., Laskowski, R. A. and Thornton, J. M. (2006) Proteins, 62, 479-488.
[110] Hoppe, C., Steinbeck, C. and Wohlfahrt, G. (2006) J Mol Graph Model, 24, 328-340.
[111] Hajduk, P. J., Huth, J. R. and Fesik, S. W. (2005) J Med Chem, 48, 2518-2525.
[112] Tondel, K., Anderssen, E. and Drablos, F. (2002) J Comput Aided Mol Des, 16, 831-840.
[113] Jambon, M., Imberty, A., Deleage, G. and Geourjon, C. (2003) Proteins, 52, 137-145.
[114] Carpy, A. J. and Marchand-Geneste, N. (2003) SAR QSAR Environ Res, 14, 329-337.
[115] Stoermer, M. J. (2006) Medicinal Chemistry, 2, 89-112.
[116] Bock, J. R. and Gough, D. A. (2005) J Chem Inf Model, 45, 1402-1414.
[117] Klabunde, T. and Hessler, G. (2002) Chembiochem, 3, 928-944.
[118] Surgand, J. S., Rodrigo, J., Kellenberger, E. and Rognan, D. (2006) Proteins, 62, 509-538.
[119] Becker, O. M., Marantz, Y., Shacham, S., Inbal, B., Heifetz, A., Kalid, O., Bar-Haim, S., Warshaviak, D., Fichman, M. and Noiman, S. (2004) Proc Natl Acad Sci U S A, 101, 11304-11309.
[120] Shacham, S., Marantz, Y., Bar-Haim, S., Kalid, O., Warshaviak, D., Avisar, N., Inbal, B., Heifetz, A., Fichman, M., Topf, M., Naor, Z., Noiman, S. and Becker, O. M. (2004) Proteins, 57, 51-86.
[121] Edwards, B. S., Bologa, C., Young, S. M., Balakin, K. V., Prossnitz, E. R., Savchuck, N. P., Sklar, L. A. and Oprea, T. I. (2005) Mol Pharmacol, 68, 1301-1310.
[122] Cavasotto, C. N., Ortiz, M. A., Abagyan, R. A. and Piedrafita, F. J. (2006) Bioorg Med Chem Lett, 16, 1969-1974.
[123] Irwin, J. J., Raushel, F. M. and Shoichet, B. K. (2005) Biochemistry, 44, 12316-12328.
[124] Cozza, G., Bonvini, P., Zorzi, E., Poletto, G., Pagano, M. A., Sarno, S., Donella-Deana, A., Zagotto, G., Rosolen, A., Pinna, L. A., Meggio, F. and Moro, S. (2006) J Med Chem, 49, 2363-2366.
[125] Lu, I. L., Huang, C. F., Peng, Y. H., Lin, Y. T., Hsieh, H. P., Chen, C. T., Lien, T. W., Lee, H. J., Mahindroo, N., Prakash, E., Yueh, A., Chen, H. Y., Goparaju, C. M., Chen, X., Liao, C. C., Chao, Y. S., Hsu, J. T. and Wu, S. Y. (2006) J Med Chem, 49, 2703-2712.
[126] de Graaf, C., Oostenbrink, C., Keizers, P. H. J., van der Wijst, T., Jongejan, A. and Vermeulen, N. P. E. (2006) J Med Chem, 49, 2417-2430.
[127] Franceschi, F. and Duffy, E. M. (2006) Biochem Pharmacol.
[128] Filikov, A. V., Mohan, V., Vickers, T. A., Griffey, R. H., Cook, P. D., Abagyan, R. A. and James, T. L. (2000) J Comput Aided Mol Des, 14, 593-610.
[129] Muegge, I. and Enyedy, I. J. (2004) Curr Med Chem, 11, 693-707.
[130] Hu, X., Balaz, S. and Shelver, W. H. (2004) J Mol Graph Model, 22, 293-307.
[131] Arcus, V. L., Lott, J. S., Johnston, J. M. and Baker, E. N. (2006) Drug Discov Today, 11, 28-34.
[132] Kantardjieff, K. and Rupp, B. (2004) Curr Pharm Des, 10, 3195-3211.
[133] Reddy, T. R., Mutter, R., Heal, W., Guo, K., Gillet, V. J., Pratt, S. and Chen, B. (2006) J Med Chem, 49, 607-615.
[134] Cases, M., Garcia-Serna, R., Hettne, K., Weeber, M., van der Lei, J., Boyer, S. and Mestres, J. (2005) Curr Top Med Chem, 5, 763-772.

[135] Liu, Z., Huang, C., Fan, K., Wei, P., Chen, H., Liu, S., Pei, J., Shi, L., Li, B., Yang, K., Liu, Y. and Lai, L. (2005) J Chem Inf Model, 45, 10-17.
[136] Rogers, J. P., Beuscher, A. E., 4th, Flajolet, M., McAvoy, T., Nairn, A. C., Olson, A. J. and Greengard, P. (2006) J Med Chem, 49, 1658-1667.
[137] Arkin, M. (2005) Curr Opin Chem Biol, 9, 317-324.
[138] Arkin, M. R. and Wells, J. A. (2004) Nat Rev Drug Discov, 3, 301-317.
[139] Fairlie, D. P. (2004) Aust. J. Chem, 57, 855-857.
[140] Whitty, A. and Kumaravel, G. (2006) Nat Chem Biol, 2, 112-118.
[141] Mestres, J. (2005) Drug Discovery Today, 10, 1629-1637.
[142] Rupp, B. (2003) Acc Chem Res, 36, 173-181.
[143] Abagyan, R., Lee, W. H., Raush, E., Budagyan, L., Totrov, M., Sundstrom, M. and Marsden, B. D. (2006) Trends Biochem Sci, 31, 76-78.
[144] Nielsen, J. E. and McCammon, J. A. (2003) Protein Sci, 12, 1894-1901.
[145] Cavasotto, C. N., Kovacs, J. A. and Abagyan, R. A. (2005) J Am Chem Soc, 127, 9632-9640.
[146] Karplus, M. and Petsko, G. A. (1990) Nature, 347, 631-639.
[147] Halle, B. (2002) Proc Natl Acad Sci U S A, 99, 1274-1279.
[148] Ferrari, A. M., Wei, B. Q., Costantino, L. and Shoichet, B. K. (2004) J Med Chem, 47, 5076-5084.
[149] Karplus, M. and McCammon, J. A. (2002) Nat Struct Biol, 9, 646-652.
[150] Ferrara, P., Priestle, J. P., Vangrevelinghe, E. and Jacoby, E. (2006) Current Computer-Aided Drug Design, 2, 83-91.
[151] Wang, R., Lu, Y., Fang, X. and Wang, S. (2004) J Chem Inf Comput Sci, 44, 2114-2125.
[152] Perola, E., Walters, W. P. and Charifson, P. S. (2004) Proteins, 56, 235-249.
[153] Kellenberger, E., Rodrigo, J., Muller, P. and Rognan, D. (2004) Proteins, 57, 225-242.
[154] Kontoyianni, M., McClellan, L. M. and Sokol, G. S. (2004) J Med Chem, 47, 558-565.
[155] Kontoyianni, M., Sokol, G. S. and McClellan, L. M. (2005) J Comput Chem, 26, 11-22.
[156] Bursulaya, B. D., Totrov, M., Abagyan, R. and Brooks, C. L., 3rd. (2003) J Comput Aided Mol Des, 17, 755-763.
[157] Wang, R., Lai, L. and Wang, S. (2002) J Comput Aided Mol Des, 16, 11-26.
[158] Warren, G. L., Andrews, C. W., Capelli, A.-M., Clarke, B., LaLonde, J., Lambert, M. H., Lindvall, M., Nevins, N., Semus, S. F., Senger, S., Tedesco, G., Wall, I. D., Woolven, J. M., Peishoff, C. E. and Head, M. S. (in press) J Med Chem.
[159] Chen, H., Lyne, P. D., Giordanetto, F., Lovell, T. and Li, J. (2006) J Chem Inf Model, 46, 401-415.
[160] Huang, N., Kalyanaraman, C., Irwin, J. J. and Jacobson, M. P. (2006) J Chem Inf Model, 46, 243-253.
[161] Bernacki, K., Kalyanaraman, C. and Jacobson, M. P. (2005) J Biomol Screen, 10, 675-681.
[162] Ferrara, P., Gohlke, H., Price, D. J., Klebe, G. and Brooks, C. L., 3rd. (2004) J Med Chem, 47, 3032-3047.
[163] Han, L. Y., Lin, H. H., Li, Z. R., Zheng, C. J., Cao, Z. W., Xie, B. and Chen, Y. Z. (2006) J Chem Inf Model, 46, 445-450.
[164] Raha, K. and Merz, K. M., Jr. (2005) J Med Chem, 48, 4558-4575.
[165] Wang, R., Fang, X., Lu, Y., Yang, C. Y. and Wang, S. (2005) J Med Chem, 48, 4111-4119.
[166] Chen, X., Lin, Y. and Gilson, M. K. (2002) Biopolymers Nucleic Acid Sci, 61, 127-142.
[167] Hendlich, M., Bergner, A., Gunther, J. and Klebe, G. (2003) J Mol Biol, 326, 607-620.
[168] Zhang, J., Aizawa, M., Amari, S., Iwasawa, Y., Nakano, T. and Nakata, K. (2004) Comput Biol Chem, 28, 401-407.
[169] Nissink, J. W., Murray, C., Hartshorn, M., Verdonk, M. L., Cole, J. C. and Taylor, R. (2002) Proteins, 49, 457-471.
[170] Miteva, M. A., Lee, W. H., Montes, M. O. and Villoutreix, B. O. (2005) J Med Chem, 48, 6012-6022.
[171] Radestock, S., Bohm, M. and Gohlke, H. (2005) J Med Chem, 48, 5466-5479.
[172] Schulz-Gasch, T. and Stahl, M. (2003) J Mol Model (Online), 9, 47-57.
[173] Yang, J. M., Chen, Y. F., Shen, T. W., Kristal, B. S. and Hsu, D. F. (2005) J Chem Inf Model, 45, 1134-1146.

[174] Krovat, E. M. and Langer, T. (2004) J Chem Inf Comput Sci, 44, 1123-1129.
[175] Wang, R., Lu, Y. and Wang, S. (2003) J Med Chem, 46, 2287-2303.
[176] Jenkins, J. L., Kao, R. Y. and Shapiro, R. (2003) Proteins, 50, 81-93.
[177] Clark, R. D., Strizhev, A., Leonard, J. M., Blake, J. F. and Matthew, J. B. (2002) J Mol Graph Model, 20, 281-295.
[178] Wang, R. and Wang, S. (2001) J Chem Inf Comput Sci, 41, 1422-1426.
[179] Charifson, P. S., Corkery, J. J., Murcko, M. A. and Walters, W. P. (1999) J Med Chem, 42, 5100-5109.
[180] Baber, J. C., Shirley, W. A., Gao, Y. and Feher, M. (2006) J Chem Inf Model, 46, 277-288.
[181] Oda, A., Tsuchida, K., Takakura, T., Yamaotsu, N. and Hirono, S. (2006) J Chem Inf Model, 46, 380-391.
[182] Feher, M. (2006) Drug Discov Today, 11, 421-428.
[183] Mpamhanga, C. P., Chen, B., McLay, I. M., Ormsby, D. L. and Lindvall, M. K. (2005) J Chem Inf Model, 45, 1061-1074.
[184] Deng, Z., Chuaqui, C. and Singh, J. (2004) J Med Chem, 47, 337-344.
[185] Fornabaio, M., Spyrakis, F., Mozzarelli, A., Cozzini, P., Abraham, D. J. and Kellogg, G. E. (2004) J Med Chem, 47, 4507-4516.
[186] Fornabaio, M., Cozzini, P., Mozzarelli, A., Abraham, D. J. and Kellogg, G. E. (2003) J Med Chem, 46, 4487-4500.
[187] Cozzini, P., Fornabaio, M., Marabotti, A., Abraham, D. J., Kellogg, G. E. and Mozzarelli, A. (2002) J Med Chem, 45, 2469-2483.
[188] Clark, M., Guarnieri, F., Shkurko, I. and Wiseman, J. (2006) J Chem Inf Model, 46, 231-242.
[189] Venkatachalam, C. M., Jiang, X., Oldfield, T. and Waldman, M. (2003) J Mol Graph Model, 21, 289-307.
[190] Bohm, H. J. (1998) J Comput Aided Mol Des, 12, 309-323.
[191] Krammer, A., Kirchhoff, P. D., Jiang, X., Venkatachalam, C. M. and Waldman, M. (2005) J Mol Graph Model, 23, 395-407.
[192] Verkhivker, G. M., Rejto, P. A., Bouzida, D., Arthurs, S., Colson, A. B., Freer, S. T., Gehlhaar, D. K., Larson, V., Luty, B. A., Marrone, T. and Rose, P. W. (1999) J Mol Recognit, 12, 371-389.
[193] Halgren, T. A., Murphy, R. B., Friesner, R. A., Beard, H. S., Frye, L. L., Pollard, W. T. and Banks, J. L. (2004) J Med Chem, 47, 1750-1759.
[194] Friesner, R. A., Banks, J. L., Murphy, R. B., Halgren, T. A., Klicic, J. J., Mainz, D. T., Repasky, M. P., Knoll, E. H., Shelley, M., Perry, J. K., Shaw, D. E., Francis, P. and Shenkin, P. S. (2004) J Med Chem, 47, 1739-1749.
[195] McMartin, C. and Bohacek, R. S. (1997) J Comput Aided Mol Des, 11, 333-344.
[196] Cotesta, S., Giordanetto, F., Trosset, J. Y., Crivori, P., Kroemer, R. T., Stouten, P. F. and Vulpetti, A. (2005) Proteins, 60, 629-643.
[197] Trosset, J. Y. and Scheraga, H. A. (1999) Journal of computational chemistry, 20, 412-427.
[198] Hart, T. N. and Read, R. J. (1992) Proteins, 13, 206-222.
[199] Abagyan, R. and Totrov, M. (1994) J Mol Biol, 235, 983-1002.
[200] Cavasotto, C. N. and Abagyan, R. A. (2004) J Mol Biol, 337, 209-225.
[201] Liu, M. and Wang, S. (1999) J Comput Aided Mol Des, 13, 435-451.
[202] Morris, G., Goodsell, D., Halliday, R., Huey, R., Hart, W., Belew, R. and Olson, A. (1998) Journal of computational chemistry, 19, 1639-1662.
[203] Taylor, J. S. and Burnett, R. M. (2000) Proteins, 41, 173-191.
[204] Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S. and Karplus, M. (1983) J Comput Chem, 4, 187-217.
[205] Honig, B. and Nicholls, A. (1995) Science, 268, 1144-1149.
[206] Clark, K. P. and Ajay. (1995) Journal of computational chemistry, 16, 1210.
[207] Budin, N., Majeux, N. and Caflisch, A. (2001) Biol Chem, 382, 1365-1372.
[208] Cecchini, M., Kolb, P., Majeux, N. and Caflisch, A. (2004) J Comput Chem, 25, 412-422.
[209] Majeux, N., Scarsi, M., Apostolakis, J., Ehrhardt, C. and Caflisch, A. (1999) Proteins, 37, 88-105.
[210] Jones, G., Willett, P., Glen, R. C., Leach, A. R. and Taylor, R. (1997) J Mol Biol, 267, 727-748.

[211] Verdonk, M. L., Chessari, G., Cole, J. C., Hartshorn, M. J., Murray, C. W., Nissink, J. W., Taylor, R. D. and Taylor, R. (2005) J Med Chem, 48, 6504-6515.
[212] Verdonk, M. L., Cole, J. C., Hartshorn, M. J., Murray, C. W. and Taylor, R. D. (2003) Proteins, 52, 609-623.
[213] Baxter, C. A., Murray, C. W., Clark, D. E., Westhead, D. R. and Eldridge, M. D. (1998) Proteins, 33, 367-382.
[214] Abagyan, R. A., Totrov, M. and Kuznetsov, D. (1994) Journal of computational chemistry, 15, 488-506.
[215] Eldridge, M. D., Murray, C. W., Auton, T. R., Paolini, G. V. and Mee, R. P. (1997) J Comput Aided Mol Des, 11, 425-445.
[216] Diller, D. J. and Merz, K. M., Jr. (2001) Proteins, 43, 113-124.
[217] Schnecke, V., Swanson, C. A., Getzoff, E. D., Tainer, J. A. and Kuhn, L. A. (1998) Proteins, 33, 74-87.
[218] Zavodszky, M. I. and Kuhn, L. A. (2005) Protein Sci, 14, 1104-1114.
[219] Schnecke, V. and Kuhn, L. A. (2000) Perspective in drug discovery and design, 20, 171-190.
[220] Seifert, M. H., Schmitt, F., Herz, T. and Kramer, B. (2004) J Mol Model (Online), 10, 342-357.
[221] Seifert, M. H. (2005) J Chem Inf Model, 45, 449-460.
[222] McGann, M. R., Almond, H. R., Nicholls, A., Grant, J. A. and Brown, F. K. (2003) Biopolymers, 68, 76-90.
[223] Kuntz, I. D., Blaney, J. M., Oatley, S. J., Langridge, R. and Ferrin, T. E. (1982) J Mol Biol, 161, 269-288.
[224] Kuntz, I. D. (1992) Science, 257, 1078-1082.
[225] DesJarlais, R. L., Sheridan, R. P., Seibel, G. L., Dixon, J. S., Kuntz, I. D. and Venkataraghavan, R. (1988) J Med Chem, 31, 722-729.
[226] DesJarlais, R. L. and Dixon, J. S. (1994) J Comput Aided Mol Des, 8, 231-242.
[227] Joseph-McCarthy, D., Thomas, B. E., 4th, Belmarsh, M., Moustakas, D. and Alvarez, J. C. (2003) Proteins, 51, 172-188.
[228] Fradera, X., Knegtel, R. M. and Mestres, J. (2000) Proteins, 40, 623-636.
[229] Lambert, M. H. Docking conformationally flexible molecules into protein binding sites: New York 1997.
[230] Floriano, W. B., Vaidehi, N., Zamanakos, G. and Goddard, W. A., 3rd. (2004) J Med Chem, 47, 56-71.
[231] Cho, A. E., Wendel, J. A., Vaidehi, N., Kekenes-Huskey, P. M., Floriano, W. B., Maiti, P. K. and Goddard, W. A., 3rd. (2005) J Comput Chem, 26, 48-71.
[232] Welch, W., Ruppert, J. and Jain, A. N. (1996) Chem Biol, 3, 449-462.
[233] Ruppert, J., Welch, W. and Jain, A. N. (1997) Protein Sci, 6, 524-533.
[234] Jain, A. N. (2003) J Med Chem, 46, 499-511.
[235] Jain, A. N. (1996) J Comput Aided Mol Des, 10, 427-440.
[236] Pham, T. A. and Jain, A. N. (2005) J. Med. Chem., (in press).
[237] Mizutani, M. Y., Tomioka, N. and Itai, A. (1994) J Mol Biol, 243, 310-326.
[238] Lawrence, M. C. and Davis, P. C. (1992) Proteins, 12, 31-41.
[239] Miller, M. D., Kearsley, S. K., Underwood, D. J. and Sheridan, R. P. (1994) J Comput Aided Mol Des, 8, 153-174.
[240] Pang, Y. P. and Kozikowski, A. P. (1994) J Comput Aided Mol Des, 8, 669-681.
[241] Pang, Y. P., Perola, E., Xu, K. and Prendergast, F. G. (2001) J Comput Chem, 22, 1750-1771.
[242] Burkhard, P., Taylor, P. and Walkinshaw, M. D. (1998) J Mol Biol, 277, 449-466.
[243] Jackson, R. M. (2002) J Comput Aided Mol Des, 16, 43-57.
[244] Goto, J., Kataoka, R. and Hirayama, N. (2004) J Med Chem, 47, 6804-6811.
[245] Rarey, M., Kramer, B., Lengauer, T. and Klebe, G. (1996) J Mol Biol, 261, 470-489.
[246] Bohm, H. J. (1992) J Comput Aided Mol Des, 6, 61-78.
[247] Klebe, G. and Mietzner, T. (1994) J Comput Aided Mol Des, 8, 583-606.
[248] Hindle, S. A., Rarey, M., Buning, C. and Lengauer, T. (2002) J Comput Aided Mol Des, 16, 129-149.
[249] Claussen, H., Buning, C., Rarey, M. and Lengauer, T. (2001) J Mol Biol, 308, 377-395.
[250] Vieth, M. and Cummins, D. J. (2000) J Med Chem, 43, 3020-3032.
[251] Sobolev, V., Wade, R. C., Vriend, G. and Edelman, M. (1996) Proteins, 25, 120-129.
[252] Gabb, H. A., Jackson, R. M. and Sternberg, M. J. (1997) J Mol Biol, 272, 106-120.

[253] Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem, A. A., Aflalo, C. and Vakser, I. A. (1992) Proc Natl Acad Sci U S A, 89, 2195-2199.
[254] Choi, V. (2005) Chemistry & Biodiversity, 2, 1517-1524.
[255] Yamagishi, M. E., Martins, N. F., Neshich, G., Cai, W., Shao, X., Beautrait, A. and Maigret, B. (2006) J Mol Model (Online).
[256] Tondel, K., Anderssen, E. and Drablos, F. (2006) J Comput Aided Mol Des.
[257] Li, H., Li, C., Gui, C., Luo, X., Chen, K., Shen, J., Wang, X. and Jiang, H. (2004) Bioorg Med Chem Lett, 14, 4671-4676.
[258] Schafferhans, A. and Klebe, G. (2001) J Mol Biol, 307, 407-427.
[259] Yang, J. M. and Chen, C. C. (2004) Proteins, 55, 288-304.
[260] Curioni, A., Mordasini, T. and Andreoni, W. (2004) J Comput Aided Mol Des, 18, 773-784.
[261] Morley, S. D. and Afshar, M. (2004) J Comput Aided Mol Des, 18, 189-208.
[262] Vidal, D., Thormann, M. and Pons, M. (2006) J Chem Inf Model, 46, 836-843.
[263] Oloff, S., Zhang, S., Sukumar, N., Breneman, C. and Tropsha, A. (2006) J Chem Inf Model, 46, 844-851.
[264] Sivanesan, D., Rajnarayanan, R. V., Doherty, J. and Pattabiraman, N. (2005) J Comput Aided Mol Des, 19, 213-228.
[265] Lorber, D. M. and Shoichet, B. K. (2005) Curr Top Med Chem, 5, 739-749.
[266] Khandelwal, A., Lukacova, V., Comez, D., Kroll, D. M., Raha, S. and Balaz, S. (2005) J Med Chem, 48, 5437-5447.
[267] Ferrara, P., Curioni, A., Vangrevelinghe, E., Meyer, T., Mordasini, T., Andreoni, W., Acklin, P. and Jacoby, E. (2006) J Chem Inf Model, 46, 254-263.
[268] Wang, J., Kang, X., Kuntz, I. D. and Kollman, P. A. (2005) J Med Chem, 48, 2432-2444.
[269] Maiorov, V. and Sheridan, R. P. (2005) J Chem Inf Model, 45, 1017-1023.
[270] Bohm, H. J. (1994) J Comput Aided Mol Des, 8, 623-632.
[271] Boda, K. and Johnson, A. P. (2006) J Med Chem, in press.
[272] Boda, K. and Johnson, A. P. (in press) J Med Chem.
[273] Douguet, D., Munier-Lehmann, H., Labesse, G. and Pochet, S. (2005) J Med Chem, 48, 2457-2468.
[274] Belda, I., Madurga, S., Llora, X., Martinell, M., Tarrago, T., Piqueras, M. G., Nicolas, E. and Giralt, E. (2005) J Comput Aided Mol Des, 19, 585-601.
[275] Wang, R., Gao, Y. and Lai, L. (2000) J Mol Model, 6, 498-516.
[276] Shoichet, B. K., McGovern, S. L., Wei, B. and Irwin, J. J. (2002) Curr Opin Chem Biol, 6, 439-446.
[277] Vangrevelinghe, E., Zimmermann, K., Schoepfer, J., Portmann, R., Fabbro, D. and Furet, P. (2003) J Med Chem, 46, 2656-2662.
[278] Boehm, H. J., Boehringer, M., Bur, D., Gmuender, H., Huber, W., Klaus, W., Kostrewa, D., Kuehne, H., Luebbers, T., Meunier-Keller, N. and Mueller, F. (2000) J Med Chem, 43, 2664-2674.
[279] Huang, D., Luthi, U., Kolb, P., Edler, K., Cecchini, M., Audetat, S., Barberis, A. and Caflisch, A. (2005) J Med Chem, 48, 5108-5111.
[280] Mizutani, M. Y. and Itai, A. (2004) J Med Chem, 47, 4818-4828.
[281] Schapira, M., Raaka, B. M., Das, S., Fan, L., Totrov, M., Zhou, Z., Wilson, S. R., Abagyan, R. and Samuels, H. H. (2003) Proc Natl Acad Sci U S A, 100, 7354-7359.
[282] Schapira, M., Raaka, B. M., Samuels, H. H. and Abagyan, R. (2000) Proc Natl Acad Sci U S A, 97, 1008-1013.
[283] Kraemer, O., Hazemann, I., Podjarny, A. D. and Klebe, G. (2004) Proteins, 55, 814-823.
[284] Sherman, W., Day, T., Jacobson, M. P., Friesner, R. A. and Farid, R. (2006) J Med Chem, 49, 534-553.
[285] Yang, C.-Y., Wang, R. and Wang, S. (in press) J Med Chem.
[286] Moitessier, N., Therrien, E. and Hanessian, S. (in press) J Med Chem.
[287] Sherman, W., Beard, H. S. and Farid, R. (2006) Chem Biol Drug Des, 67, 83-84.
[288] Lang, P. T., Kuntz, I. D., Maggiora, G. M. and Bajorath, J. (2005) J Biomol Screen, 10, 649-652.
[289] Thomas, M. P. and McInnes, C. (2006) IDrugs, 9, 273-278.
[290] Edwards, P. J. (2006) IDrugs, 9, 347-353.
[291] Muegge, I., Martin, Y. C., Hajduk, P. J. and Fesik, S. W. (1999) J Med Chem, 42, 2498-2503.
[292] Steinbeck, C. and Kuhn, S. (2004) Phytochemistry, 65, 2711-2717.
[293] Steinbeck, C., Krause, S. and Kuhn, S. (2003) J Chem Inf Comput Sci, 43, 1733-1739.
[294] Steinbeck, C. (2001) Curr Opin Drug Discov Devel, 4, 338-342.

[295] von Grotthuss, M., Pas, J. and Rychlewski, L. (2003) Bioinformatics, 19, 1041-1042.
[296] Miteva, M. A., Violas, S., Montes, M., Gomez, D., Tuffery, P. and Villoutreix, B. O. (in press) Nucleic Acids Research.
[297] Irwin, J. J. and Shoichet, B. K. (2005) J Chem Inf Model, 45, 177-182.
[298] Wishart, D. S., Knox, C., Guo, A. C., Shrivastava, S., Hassanali, M., Stothard, P., Chang, Z. and Woolsey, J. (2006) Nucleic Acids Res, 34, D668-672.
[299] Block, P., Sotriffer, C. A., Dramburg, I. and Klebe, G. (2006) Nucleic Acids Res, 34, D522-526.
[300] Raymer, M. L., Sanschagrin, P. C., Punch, W. F., Venkataraman, S., Goodman, E. D. and Kuhn, L. A. (1997) J Mol Biol, 265, 445-464.
[301] Tetko, I. V. and Tanchuk, V. Y. (2002) J Chem Inf Comput Sci, 42, 1136-1145.
[302] Steinbeck, C., Hoppe, C., Kuhn, S., Floris, M., Guha, R. and Willighagen, E. L. (2006) Curr. Pharm. Des., in press.
[303] Hassinen, T. and Perakyla, M. (2001) J Comp Chem, 22, 1229-1242.
[304] Nayal, M. and Honig, B. (2006) Proteins.
[305] Chang, D. T., Oyang, Y. J. and Lin, J. H. (2005) Nucleic Acids Res, 33, W233-238.
[306] Springer, C., Adalsteinsson, H., Young, M. M., Kegelmeyer, P. W. and Roe, D. C. (2005) J Med Chem, 48, 6821-6831.
[307] Betzi, S., Suhre, K., Chetrit, B., Guerlesquin, F. and Morelli, X. (Submitted).
[308] Ng, P. C. and Henikoff, S. (2003) Nucleic Acids Res, 31, 3812-3814.
[309] Ramensky, V., Bork, P. and Sunyaev, S. (2002) Nucleic Acids Res, 30, 3894-3900.
[310] Guerois, R., Nielsen, J. E. and Serrano, L. (2002) J Mol Biol, 320, 369-387.
[311] Hinsen, K. (2000) J. Comput. Chem., 21, 79-95.
[312] Hollup, S. M., Salensminde, G. and Reuter, N. (2005) BMC Bioinformatics, 6, 52.
[313] Miteva, M. A., Tuffery, P. and Villoutreix, B. O. (2005) Nucleic Acids Res, 33, W372-375.

Received: December 13, 2005

Revised: March 16, 2006

Accepted: May 01, 2006