Current Computer-Aided Drug Design, 2007, 3, 33-49
Acceleration of the Drug Discovery Process: A Combinatorial Approach Using NMR Spectroscopy and Virtual Screening

Xavier Morelli1* and Alan C. Rigby2*

1CNRS-IBSM/BIP Laboratory, 31 Chemin Joseph Aiguier, 13402 Marseille, Cedex 20, France; 2Division of Molecular and Vascular Medicine, Department of Medicine, Beth Israel Deaconess Medical Center, and Harvard Medical School, Boston, MA 02215, USA

Abstract: The continued implementation of NMR-based approaches in hit-through-lead drug discovery in academic and corporate settings is founded upon NMR applications that assess structure-activity relationships. A very recent application of NMR spectroscopy to these discovery initiatives involves fraganomics, in which NMR is used to iteratively “guide” the assembly of several weakly interacting fragments or small molecules through chemical links. Moreover, several groups have recently reported the potential of integrating NMR spectroscopy with in silico virtual screens of large chemical repositories possessing diverse collections of small molecules. Importantly, an improved understanding of the intermolecular forces that mediate protein-protein and protein-ligand interactions has been integral to improving these virtual screening approaches, resulting in the identification of novel ligands for several therapeutic targets. We discuss the recent success of these structure-based discovery initiatives in targeting protein-protein interactions, which are responsible for the non-covalent assembly and/or regulation of macromolecular complexes and are a critical paradigm in many disease pathologies. The atomic details of these requisite interactions are the cornerstone of NMR and crystallographic “structure-guided” drug discovery initiatives aimed at disrupting complex formation. This review will predominantly focus on the recent advances in structure-based computational screening approaches, highlighting the successful integration of in silico virtual screens with NMR-based techniques. We also discuss the application of this powerful, combinatorial approach both to well-characterized target space and to unique chemical space such as protein-protein interaction inhibition (2P2I), which has recently been shown to be tractable to small molecule intervention.
Keywords: NMR, high throughput docking, structure-based screening.

*Address correspondence to these authors at the CNRS-IBSM/BIP Laboratory, 31 Chemin Joseph Aiguier, 13402 Marseille, Cedex 20, France; E-mail: [email protected] or Division of Molecular and Vascular Medicine, Department of Medicine, Beth Israel Deaconess Medical Center, and Harvard Medical School, Boston, MA 02215, USA; E-mail: [email protected]

1573-4099/07 $50.00+.00 © 2007 Bentham Science Publishers Ltd.

INTRODUCTION

The Human Genome Project [1] has identified approximately 30,000 genes in the human genome, with a subset of these genes, ~3,000-10,000, believed to be involved in the pathogenesis of disease [2,3]. In a comprehensive review of pharmaceutical industry portfolios, Drews and colleagues identified approximately 500 drug targets, but suggested there are likely as many as 5,000-10,000 potential therapeutic targets [4,5]. However, they caution that many of these potential targets remain underexploited since they are believed to be intractable to orally bioavailable small molecules [4,6,7]. That said, most commercial drug discovery initiatives continue to develop and/or expand their pre-clinical and clinical pipelines, focusing on G-protein coupled receptors (GPCRs), nuclear receptors, ion channels and enzymatic targets [3,5]. While most drugs on the market are competitive inhibitors of these targets, there remains a significant unmet clinical need in many disease indications, a predicament that is likely to worsen due to the emergence of serious clinical complications with current therapeutic drugs [8,9]. Corporate “hit through lead” portfolios are often biased to placate market forces, in an attempt to deliver “best-in-class” drugs [4,5,7,10]. The late stage “false positive” triage of compounds enrolled in research-intensive optimization is expensive. Structure-based and/or ligand-based virtual screening efforts that are partnered with structural biology methods, including X-ray crystallography and/or nuclear magnetic resonance (NMR) spectroscopy, provide a cost effective, streamlined approach for identifying hits and validating leads prior to further preclinical and clinical evaluation [11-13]. Each of these approaches has merit on its own in discovery pipelines; however, it is the combination of these approaches, specifically NMR with structure-based in silico screening, that will be the focus of this review.

In an attempt to expand what is considered to be a therapeutically drugable target, several groups have capitalized on the structural genomic initiatives that continue to provide new structures and on the interaction proteome that defines how these protein-protein interactions are assembled [14-18]. Protein-protein molecular recognition is the cornerstone of cellular function and mechanistic signal transduction. Targeting these protein-protein interfaces has been difficult, because they classically comprise relatively flat, featureless surfaces devoid of the cavities and/or pockets that are critical for ligand specificity and affinity [19]. Nonetheless, these protein-protein interfaces represent therapeutically interesting and commercially lucrative target space ([20] and references therein). The long-standing notion
that these interfaces are excessively complicated and thus “poorly drugable” is founded on the inherent three-dimensional (3D), biophysical complexity of this target space [19,20]. Although these protein-protein interfaces were believed to be refractory to small molecule intervention, an improved understanding of these complex surfaces and of the relative energetic contributions of interface shape, interface size, geometrical complexity, polarity and roughness has recently stimulated renewed interest in this target space [19,21]. The typical protein-protein interface is 750-1500 Å²; however, the notion of “interaction hot spots”, which contribute a disproportionate amount of binding energy to the interaction, has provided a unique target strategy for the identification of small molecule “hits”, which are compounds that are likely to possess in vitro and/or in silico activity [20]. These hot spots tend to reside on both proteins and are clustered within the interaction
interface, thereby contributing in a co-operative manner to the formation of the complex through surface complementarity and significant protein flexibility [22]. However, the combination of target complexity and limited chemical space sampling within the “drug-like” chemical repositories that are typically screened in small molecule High Throughput Screens (HTS) has created both an experimental and an intellectual barrier, limiting the therapeutic utility of these targets [20]. As recently as 2001, the widely held belief in drug discovery initiatives was that these interactions would likely remain intractable to ‘Lipinski rule of 5’-compliant, small molecule therapies [23-25]. One promising approach for the discovery of small molecule “hits” for this under-exploited target space is fragment-based lead discovery or fraganomics, whereby small, relatively simple fragments are evaluated for their
Fig. (1). Role of NMR in the Development of Candidate Small Molecules. In an attempt to meet current unmet clinical needs, drug discovery programs are using data provided by structural proteomic initiatives and the interaction proteome to search for novel therapeutic targets involved in these disease indications. If the 3D structure of the target is known, the drug discovery strategy proceeds to hit identification, which is the identification of small molecules that possess in vitro/in vivo function and have lead-like pharmacokinetic (PK) and pharmacodynamic (PD) properties. However, as is often the situation when exploring new 3D target space, NMR spectroscopy, X-ray crystallography or homology model-based approaches are first needed for structure determination. Using this target structure, one should initially evaluate its drugability using any one of a number of approaches, including the NMR approach recently proposed by Hajduk and colleagues at Abbott Laboratories [21]. If the target is deemed drugable according to the metrics evaluated, the search for “small molecule hits” is commenced. NMR is an important tool for “hit” identification, which involves a medium throughput NMR-based binding assay, chemical shift perturbation analysis (CSP), SAR by NMR, SAR by ILOE or other fragment-based approaches. Hits identified can be further evaluated for specificity and/or non-specific target reactivity using NMR approaches such as RAMPED-UP NMR and/or ALARM NMR [63,78,163]. Validated hits are then optimized using hit-through-lead approaches, which often involve medicinal chemistry and Quantitative Structure Activity Relationships (QSAR) that are iteratively evaluated by NMR [71]. The ultimate goal is a bioavailable small molecule with optimized PD/PK properties for subsequent pre-clinical and ultimately clinical evaluation.
ability to bind to a target and provide a scaffold or substrate for further medicinal chemistry and/or chemical ligation to another targeted fragment [26,27]. Successful fragment screening requires a biophysical screening approach that is capable of detecting relatively weak interactions while providing an observable window through which the linkage and subsequent interaction of these proximal fragments with the target can be monitored ([28-30] and references therein). Nuclear magnetic resonance (NMR) spectroscopy provides a robust platform to characterize both the ligand binding site and affinity, while simultaneously providing a window through which the entire target protein or proteins can be structurally observed without the need for a separate assay to detect this interaction. The ability of NMR to provide detailed structural information pertaining to target proteins and the newly identified hits or leads being optimized has facilitated the integration of NMR into many translational discovery programs in both corporate and academic settings (Fig. (1)). Fraganomics and other approaches that combine virtual screening with structural biology methods such as NMR continue to contribute to the growing knowledge base of improved scaffold diversity within small molecule “hits” and “leads” that are capable of inhibiting protein-protein interactions, including the nutlins, which inhibit the interaction between p53 and human double minute-2 (HDM2), as well as compounds that inhibit the heterodimerization of BCL2 family members, including those involving BCL2 and BCL-XL [31-34]. While several examples of these successes will be highlighted, a detailed discussion of this work is beyond the scope of this review. As we improve our understanding of the varied interactions that mediate molecular recognition at the cellular and/or structural level, one goal is to better understand and thus overcome the challenges associated with identifying novel, small molecule therapies. 
This will happen only if the current movement to strengthen the iterative integration of biophysical and computational approaches continues, ensuring that new strategies are developed, evaluated, refined and ultimately integrated into discovery programs. Here we review NMR spectroscopic and High Throughput Docking (HTD) approaches that are being implemented in commercial and academic drug discovery initiatives to circumvent “bottlenecks” encountered in traditional hit-through-lead drug discovery. Additionally, we will emphasize the utility and power of coupling NMR with structure-based HTD to accelerate discovery processes. Limitations in the scoring algorithms used to reproducibly rank identified “hits” within structure-based drug discovery initiatives have slowed the adoption of structure and/or ligand-based approaches, although many laboratories are focused on improving the accuracy and reproducibility of these scoring functions ([35-38] and references therein). We will present several docking strategies and comment on current scoring algorithm limitations, documenting improvements in both of these areas that have resulted in demonstrable successes and argue for the adoption of this combinatorial approach [39]. While structure-based virtual screening or NMR approaches for hit identification alone suffer from inherent caveats and/or limitations, it is the powerful combination of these approaches that will be the focus of this review. Others have extensively commented on and demonstrated the strength and utility of partnering homology-based structure
determination with these in silico approaches. While this is an important area of discovery research that has generated several hits and/or drugs, we will not discuss it further; several excellent papers documenting this approach and its success are provided ([11,40-42] and references therein). Here we highlight several recent successes that use this combined strategy to target protein-protein and protein-ligand interfaces, and other novel target space that had previously been perceived as too complex for small molecule, ‘rule of 5-compliant’ intervention, yet now represents an exciting paradigm shift in therapeutic intervention [43,44].

STRUCTURE BASED COMPUTATIONAL DRUG DISCOVERY

Computer aided drug discovery (CADD) approaches for screening virtual libraries in silico, which are theoretically comprised of 10^100 molecules, are a promising tool for exploring new and expanded chemical space. CADD approaches are likely to accelerate hit identification, and subsequent lead validation and optimization, to provide leads that are potentially more therapeutically viable [39,45]. In silico computational mining of three-dimensional databases or libraries, using either a pharmacophore that is comprised of critical functional determinants or target-based HTD, which facilitates the rapid screening of the molecular surface or target “active site” with members of a virtual compound library, offers several advantages over experimental HTS, but is not without its own limitations and caveats (Fig. (2)). However, an extensive study carried out by the Shoichet laboratory demonstrated that, for several targets, experimental HTS and in silico screening provided complementary data of use to these discovery communities [46].
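The 'rule of 5' compliance mentioned above can be illustrated with a small pre-filter applied to a virtual library before docking. This is only a minimal sketch, assuming the four descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) have already been computed by some cheminformatics toolkit; the `Compound` class, the compound names, the descriptor values and the `max_violations` parameter are hypothetical and are not taken from this review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    # Hypothetical record; descriptor values are assumed to be pre-computed.
    name: str
    mol_weight: float   # Da
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of the four rule-of-5 thresholds a compound exceeds."""
    checks = (
        c.mol_weight > 500.0,
        c.logp > 5.0,
        c.h_donors > 5,
        c.h_acceptors > 10,
    )
    return sum(checks)


def filter_library(library, max_violations=0):
    """Keep compounds with at most `max_violations` violations (an assumed policy)."""
    return [c for c in library if rule_of_five_violations(c) <= max_violations]


# Illustrative, made-up library entries.
library = [
    Compound("cmpd_A", 342.4, 2.1, 2, 5),
    Compound("cmpd_B", 612.7, 6.3, 4, 9),
    Compound("cmpd_C", 498.9, 4.8, 6, 8),
]

strict = [c.name for c in filter_library(library)]
lenient = [c.name for c in filter_library(library, max_violations=1)]
print(strict, lenient)
```

Whether to tolerate one violation is a design choice; for targets such as protein-protein interfaces, a stricter filter may discard exactly the larger scaffolds that this target space tends to require.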
As we will discuss, structure-based virtual screening uses critical structural descriptors that are extracted from the target protein (alone or in complex with endogenous ligand) and defined using biophysical approaches including but not limited to NMR spectroscopy. These pharmacophores guide and structurally filter HTD approaches that evaluate large chemical repositories for “hits” that satisfy these requirements, in a manner that is both efficient and cost effective. Many of the important advances in the use of NMR in drug discovery have been and continue to be reliant on recombinant expression of target proteins in E. coli bacterial expression systems that facilitate the preparation of adequate quantities of isotopically enriched target protein. In addition, these recombinant approaches allow us to readily optimize construct development for the expression of proteins and/or protein domains that are folded and amenable to solution studies in the concentration range needed for NMR spectroscopy. In an attempt to provide NMR data of the highest quality, the solution conditions are often optimized using strategies such as micro-drop that permit one to easily evaluate the buffer salt, the pH, the need for and/or concentration of stabilizer, and the temperature at which the studies will be carried out [47]. Unfortunately, these and other inherent limitations of NMR and/or X-ray crystallography have precluded the use of either approach for many targets, including membrane proteins such as GPCRs, which represent a single family of receptors that are involved in many signal
Fig. (2). Schematic Flowcharts of the Proposed Protein-Protein Interaction Inhibition (2P2I) Approach that Uses High Throughput Docking (HTD) in Hit Identification and Lead Optimization. A) If there are no known inhibitors of the target being explored, or the target represents unique target space, the first step (following the previously detailed structural study) in the 2P2I discovery flowchart involves HTD of a diverse 3D database representing many chemical scaffolds (1K to 50K compounds). The diversity of this initial library is critical for optimizing the chemical space being evaluated. We propose that our generalist/consensus scoring function (GFscore) should then be used to score and rank the “hits/small molecules” that dock within the target space (500 to 2K compounds). Having enriched the proprietary or publicly accessible database using these in silico approaches, we suggest performing experimental HTS on a portion of this same library in an attempt to enrich, within this subset of compounds, for hits that have demonstrable biological function and are not toxic. A comparison of the in silico and experimental HTS experiments will be used to further refine/optimize the scoring function. This specific scoring function and the preliminary hits that are obtained from the proposed “in silico Screening Protocol” will be used to screen the entire database (1M to 10M compounds, representing increased chemical space). B) Having optimized and refined the HTD protocol that incorporates the refined scoring function, which identified true positives that were subsequently validated using experimental HTS from a diverse subset of the library, we suggest that the entire database, including these “hits”, be screened in silico (1M to 10M compounds). These hits are then evaluated through in vitro and/or in vivo assays to determine if compounds within this identified hit list (500 to 2K compounds) are lead compounds. 
2D and 3D QSAR modeling (with CoMFA) can also be used where necessary, as well as combinatorial chemistry approaches, to ensure that the chemical space being explored has been optimized within the entirety of the database screened (10 to 100 compounds). Prior to clinical evaluation, it is necessary that the lead compound be rigorously evaluated for ADME, PD and PK properties, ensuring clinical efficacy without deleterious side effects. The originality of the proposed 2P2I approach (A and B) resides in the fact that an acceleration of the process is possible, since A) enriches or selects for a subset of molecules to be further used in defining the chemical space and the scoring parameters needed to reproducibly identify small molecule protein-protein interaction inhibitors when no known inhibitors are available.
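The score-rank-compare loop in panel A can be sketched in a few lines. The example below is a hedged illustration only: it uses plain rank averaging as a generic consensus scheme (it is not GFscore, whose form is not described here) and the standard enrichment factor to compare the in silico ranking with experimentally confirmed hits. The scoring-function names, compound identifiers, scores and active set are all hypothetical, and every compound is assumed to have been scored by every function.

```python
from statistics import mean


def consensus_rank(scores_by_function):
    """Order compounds by mean rank across scoring functions.

    scores_by_function: {function_name: {compound_id: score}}, where a lower
    (more negative) score is assumed to mean a better predicted binder.
    """
    ranks = {}
    for scores in scores_by_function.values():
        ordered = sorted(scores, key=scores.get)  # best score first
        for position, compound in enumerate(ordered, start=1):
            ranks.setdefault(compound, []).append(position)
    # Ties in mean rank are broken alphabetically by compound id.
    return sorted(ranks, key=lambda c: (mean(ranks[c]), c))


def enrichment_factor(ranked, actives, top_n):
    """(fraction of actives in the top_n) / (fraction of actives in the library)."""
    hits_selected = sum(1 for c in ranked[:top_n] if c in actives)
    return (hits_selected / top_n) / (len(actives) / len(ranked))


# Made-up docking scores from two hypothetical scoring functions.
scores = {
    "dock_a": {"c1": -9.0, "c2": -7.5, "c3": -6.0, "c4": -4.0},
    "dock_b": {"c1": -8.0, "c2": -5.0, "c3": -8.5, "c4": -3.0},
}
hts_actives = {"c1", "c3"}  # compounds assumed confirmed by experimental HTS

ranked = consensus_rank(scores)
ef_top2 = enrichment_factor(ranked, hts_actives, top_n=2)
print(ranked, ef_top2)
```

An enrichment factor above 1 indicates that the ranking places confirmed actives near the top more often than random selection would; tracking this value while adjusting the scoring scheme is one simple way to frame the refinement step described in the caption.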
transduction pathways, and more importantly are believed to be requisite in many metabolic and signaling pathways ([48] and references therein). Despite the absence of high-resolution structural data for many membrane protein families, the discovery of small molecule therapies continues to increase, with ~2000 new patents filed each year for drugs and/or small molecules that target select membrane protein families [48]. Using static or magic angle spinning (MAS) solid state NMR approaches and novel expression vectors for this family of proteins, a wealth of recently developed
dynamic and ligand binding data offers new promise in structure-based discovery initiatives of membrane proteins [48-51]. While important, solid-state approaches in drug discovery will not be discussed further. For a comprehensive review of the current state of this field, the reader is referred to an excellent review recently published by Dr. Anthony Watts [48]. This review will focus on the use of solution NMR spectroscopy, which since the advent of structure activity relationships (SAR) by NMR has been the principal NMR method for fragment-based lead screening of protein-ligand interactions, although other approaches have been developed and integrated into the biochemical and biophysical arsenal of NMR screening platforms ([29,52,53] and references therein).

INTEGRATION OF NMR INTO DRUG DISCOVERY

NMR is an established method for the three-dimensional structure determination of small proteins (≤ 40 kDa) and is an archetypical method for characterizing the molecular dynamics that are critical for macromolecular complex assembly ([52,54] and references therein). In recent years the molecular mass range amenable to structure determination by NMR has increased significantly (> 50 kDa) with the development of triple resonance pulse sequence technologies, increased magnetic field strengths and heteronuclear recombinant protein expression methodologies [52,55-57]. Similarly, the use of residual dipolar couplings, selective and/or segmental labeling strategies and transverse relaxation optimized spectroscopy (TROSY) provides an optimized biophysical tool amenable to the study of complex macromolecular systems [58-62]. Importantly, the ability of NMR spectroscopy to provide structural details for protein-protein and protein-ligand interaction interfaces in protein complexes well beyond 100 kDa is invaluable for translating these binding data into interaction interfaces that are now, as previously noted, viable targets in an emerging paradigm of discovery initiatives in search of novel small molecule therapeutics [29,52]. The increasing prevalence of NMR spectroscopy in drug discovery pipelines is centered around the ability of NMR to detect ligand binding over many affinity ranges, while also providing a detailed structural picture of the entire target and permitting the identification of the specific ligand binding site ([63] and references therein) (Fig. (1)). 
Furthermore, NMR-based HTS, fragment screening or “fraganomics”, SAR by NMR, and other NMR applications involved in hit-through-lead validation and optimization have been integrated into many discovery pipelines to facilitate earlier stage “false positive” triage ([57] and references therein). The success of these NMR-based approaches supports an increasingly important role for NMR in both ligand and target validation, which is likely to become increasingly significant as we extend the boundaries of conventional target space [64].

HIT IDENTIFICATION

Chemical Shift Perturbation Analysis

In addition to providing detailed structural information pertaining to newly identified drugable targets, NMR spectroscopy is the preferred method for characterizing the structural perturbations resulting from protein-protein and/or protein-ligand interactions over several affinity ranges. Ligand-induced localized changes in the chemical environment of nuclei that are within the recognition/binding site result in chemical shift perturbations (CSP) of those resonances critical for binding [52,65]. If ligand exchange is fast on the NMR chemical shift timescale, meaning that the exchange rate 1/τ (where τ is the lifetime) exceeds the chemical shift difference, expressed in frequency units, between the bound and free states of the ligand, an average chemical shift for each resonance
is observed and easily followed as a function of ligand added. In the fast exchange regime (