A computational de novo design tool for generating synthetically feasible drug-like molecules, and its application in fragment based drug design

B.C. Allen¹, V.J. Gillet¹, B. Chen¹, M.J. Bodkin², J. Cole³ and J. Liebeschuetz³

¹ University of Sheffield, Sheffield, United Kingdom
² Eli Lilly, Erl Wood, United Kingdom
³ Cambridge Crystallographic Data Centre, Cambridge, United Kingdom

Introduction

The goal of de novo design is to identify novel compounds with therapeutic potential from anywhere in chemical space, rather than being restricted to searching databases of previously synthesised compounds. The vast size of chemical space requires a highly efficient search strategy. Defining the therapeutic potential of a compound is an inherently multidimensional problem, as activity, bulk properties, ADME/Tox, etc. all need to be optimised simultaneously. Finally, there is little point in identifying theoretically interesting compounds that cannot be easily synthesised. The de novo design tool described in this poster uses reaction vectors to encode known reactions, which can then be applied to previously unseen reagents to generate novel products, and evolutionary multiobjective optimisation to search the available chemical space.

Figure 6 – Pharmacophore triplet driven molecule building

Part 1 shows a drug molecule within a protein binding site. The triangles represent some of the pharmacophore feature triplets found in that molecule. Each triplet is composed of three atoms, each allocated to one of the Donor, Acceptor, Polar, Anion, Cation or Hydrophobe classes.

Part 2 shows how triplets provide a useful series of goals for the tool to build towards, without overly constraining the exact route. The tool will preferentially generate molecules with Acceptors at A and B, and a Hydrophobe at C, based on topological distances from the atoms in the starting material.
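The triplet idea in Figure 6 can be illustrated with a small sketch. It assumes a molecule is given as a list of feature classes (or None for unlabelled atoms) plus an adjacency list, and that the graph is connected; the function names and the canonical key layout are hypothetical and are not taken from the tool itself. Each triplet is stored as three classes plus three topological (bond-count) distances, and two fingerprints are compared with a Tanimoto coefficient.

```python
from collections import deque
from itertools import combinations, permutations


def topological_distances(adjacency):
    """All-pairs bond-count distances, found by breadth-first search."""
    n = len(adjacency)
    dist = [[None] * n for _ in range(n)]
    for start in range(n):
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if dist[start][nbr] is None:
                    dist[start][nbr] = dist[start][atom] + 1
                    queue.append(nbr)
    return dist


def triplet_key(features, dist, atoms):
    """Canonical key for one triplet, independent of atom numbering."""
    candidates = [
        (features[a], features[b], features[c],
         dist[a][b], dist[b][c], dist[a][c])
        for a, b, c in permutations(atoms)
    ]
    return min(candidates)


def triplet_fingerprint(features, adjacency):
    """Set of canonical triplet keys over all labelled atoms."""
    dist = topological_distances(adjacency)
    labelled = [i for i, f in enumerate(features) if f is not None]
    return {triplet_key(features, dist, t) for t in combinations(labelled, 3)}


def tanimoto(fp_a, fp_b):
    """Shared triplets divided by all triplets seen in either fingerprint."""
    union = fp_a | fp_b
    return len(fp_a & fp_b) / len(union) if union else 0.0


# Hypothetical five-atom chain: Acceptor, unlabelled, Acceptor, unlabelled, Hydrophobe.
chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
forward = triplet_fingerprint(
    ["Acceptor", None, "Acceptor", None, "Hydrophobe"], chain)
backward = triplet_fingerprint(
    ["Hydrophobe", None, "Acceptor", None, "Acceptor"], chain)
print(forward, tanimoto(forward, backward))
```

Numbering the same chain from the other end gives the same fingerprint, which is the property a similarity objective of this kind needs.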


Thrombin

Thrombin is a serine protease found in the bloodstream that catalyses a number of important coagulation reactions [4]. There has been considerable effort in the development of direct Thrombin inhibitors as anticoagulant drugs to treat cardiovascular disease. While several of the most promising compounds are polypeptides, e.g. Hirudin and Bivalirudin, there are some small molecule inhibitors in clinical use, e.g. Argatroban, and ongoing efforts to develop more.

Reaction Vectors

A reaction vector (RV) is generated from the vector representations of the reactants and products [1]. Multiple reactions can give rise to the same RV, with the degree of redundancy dependent on the range and complexity of the descriptors used. In this work the descriptor used is the count of the atom pairs, as defined in Figure 1. Currently only AP2 and AP3 (atom pairs with path separation S = 2 and S = 3) are considered, as this appears to provide sufficient generality to generate novel molecules while maintaining specificity. Figure 1 shows the RV generated from an aromatic halogenation. Prior to generating RVs it is necessary to clean the database, to balance incomplete reactions.

Figure 1 – Generation of the reaction vector

Reagent Atom Pair            Count
C(2,2,1)-2(4)-C(2,2,1)       6
C(2,2,1)-3-C(2,2,1)          6

Product Atom Pair            Count
C(2,2,1)-2(4)-C(2,2,1)       4
C(3,2,1)-2(1)-Cl(1,0,0)      1
C(3,2,1)-2(4)-C(2,2,1)       2
C(2,2,1)-3-C(2,2,1)          4
C(3,2,1)-3-C(2,2,1)          2
C(2,2,1)-3-Cl(1,0,0)         2

Reaction Vector (product counts minus reagent counts)

Positive Atom Pairs          Count
C(3,2,1)-2(4)-C(2,2,1)       +2
C(3,2,1)-2(1)-Cl(1,0,0)      +1
C(3,2,1)-3-C(2,2,1)          +2
C(2,2,1)-3-Cl(1,0,0)         +2

Negative Atom Pairs          Count
C(2,2,1)-2(4)-C(2,2,1)       -2
C(2,2,1)-3-C(2,2,1)          -2

Atom Pairs: X1(h,p,r)-S(o)-X2(h,p,r) where:
• X1 and X2 are the atomic symbols.
• S is the path separation between the atoms.
• h is the number of non-hydrogen connections.
• p is the number of π electrons.
• r is the number of rings.
• o is the bond order (only relevant for S=2).
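The reaction vector for the aromatic halogenation in Figure 1 can be reproduced as a simple difference of atom pair counts. The sketch below is only illustrative: the atom pair strings are typed in by hand from the figure rather than perceived from a structure, and the function name reaction_vector is a hypothetical choice, not part of the tool.

```python
from collections import Counter


def reaction_vector(reagent_aps, product_aps):
    """Product AP counts minus reagent AP counts, with zero entries dropped."""
    all_pairs = set(reagent_aps) | set(product_aps)
    diff = {ap: product_aps.get(ap, 0) - reagent_aps.get(ap, 0)
            for ap in all_pairs}
    return {ap: delta for ap, delta in diff.items() if delta != 0}


# Atom pair counts copied from Figure 1 (typed by hand, not computed).
reagent = Counter({
    "C(2,2,1)-2(4)-C(2,2,1)": 6,
    "C(2,2,1)-3-C(2,2,1)": 6,
})
product = Counter({
    "C(2,2,1)-2(4)-C(2,2,1)": 4,
    "C(3,2,1)-2(1)-Cl(1,0,0)": 1,
    "C(3,2,1)-2(4)-C(2,2,1)": 2,
    "C(2,2,1)-3-C(2,2,1)": 4,
    "C(3,2,1)-3-C(2,2,1)": 2,
    "C(2,2,1)-3-Cl(1,0,0)": 2,
})

rv = reaction_vector(reagent, product)
positive = {ap: d for ap, d in rv.items() if d > 0}
negative = {ap: d for ap, d in rv.items() if d < 0}
for ap, delta in sorted(rv.items()):
    print(f"{ap:26s} {delta:+d}")
```

The positive and negative parts of the result match the two halves of the reaction vector listed in Figure 1.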

Given an RV and a reagent, generating a product structure is a three-step process:

1. Test the reagent to ensure it is suitable for the reaction. For example, in a single-component reaction the reagent must contain all the negative APs in the RV.
2. Generate a reagent fragment by removing negative APs. Figure 2 shows the fragmentation process. Two AP2s need to be removed, and only one pathway generates a fragment with the correct set of lost AP3s. The various incorrect pathways exhibit removal of non-matching AP2s, loss of AP3s which are not lost in the RV, or final fragments which still retain AP3s that need to be removed.
3. Grow product(s) by adding positive APs. Figure 3 shows the growth process. It proceeds as a breadth-first search of all possible AP2 additions. Each product fragment is validated, and removed if it contains an incorrect AP3 or lacks a site for an unused AP2.

Figure 2 – Generation of Reagent Fragments

Figure 3 – Growth of Product Molecules
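The bookkeeping behind steps 1 and 3 can be sketched at the level of atom pair counts alone. This is not the graph fragmentation and growth procedure of Figures 2 and 3; it only shows the suitability test for a single-component reaction and the atom pair counts that a correctly grown product would need to have, under the assumption that a valid product must reproduce the RV exactly. The function names and the ethane-like counter-example are hypothetical.

```python
from collections import Counter


def is_suitable(reagent_aps, rv):
    """Step 1: every negative AP must be present at least as often as it is removed."""
    return all(reagent_aps.get(ap, 0) >= -delta
               for ap, delta in rv.items() if delta < 0)


def expected_product_aps(reagent_aps, rv):
    """AP counts a valid product must show: reagent counts plus the RV."""
    if not is_suitable(reagent_aps, rv):
        raise ValueError("reagent lacks atom pairs that the RV removes")
    result = Counter(reagent_aps)
    for ap, delta in rv.items():
        result[ap] += delta
    return {ap: n for ap, n in result.items() if n != 0}


# Reaction vector for the aromatic halogenation in Figure 1.
halogenation_rv = {
    "C(3,2,1)-2(4)-C(2,2,1)": 2,
    "C(3,2,1)-2(1)-Cl(1,0,0)": 1,
    "C(3,2,1)-3-C(2,2,1)": 2,
    "C(2,2,1)-3-Cl(1,0,0)": 2,
    "C(2,2,1)-2(4)-C(2,2,1)": -2,
    "C(2,2,1)-3-C(2,2,1)": -2,
}
# Reagent from Figure 1, and a hypothetical reagent with no aromatic atom pairs.
aromatic_reagent = {"C(2,2,1)-2(4)-C(2,2,1)": 6, "C(2,2,1)-3-C(2,2,1)": 6}
ethane_like = {"C(1,0,0)-2(1)-C(1,0,0)": 1}

print(is_suitable(aromatic_reagent, halogenation_rv))
print(is_suitable(ethane_like, halogenation_rv))
print(expected_product_aps(aromatic_reagent, halogenation_rv))
```

In a full implementation, a count check like this would act as a cheap filter before the more expensive fragmentation and breadth-first growth, and again as a final validation of each candidate product.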

Objectives for De Novo Design

The selection of suitable design objectives is crucial to the successful generation of reasonable output molecules. Optimising simple properties such as molecular weight or logP is straightforward; however, to generate molecules with targeted biological activity, more complex objectives are required. These results were generated using topological similarity of pharmacophore triplet fingerprints to known actives, as shown in Figure 6. These descriptors were selected because they allow the tool to access structurally dissimilar molecules which share pharmacological features with the actives.

Test Sets and Results

To demonstrate the utility of the de novo design tool, it has been applied to a well-studied problem in fragment-based drug design: deriving drug-like molecules from active fragments.

Figure 7 shows three fragments with measured Thrombin inhibition. Using these fragments plus some modified versions as starting materials, and using pharmacophore similarity to known Thrombin inhibitors together with Lipinski-based criteria to drive the output into drug-like space, several sets of novel potential inhibitors have been generated. Figure 8 shows a plot of the Pareto front at various points during a program run. It shows that as new molecules are generated the population as a whole moves towards the optimisation criteria. Figure 9 shows the inhibitor targets used, and some of the more promising output molecules, compared with other known actives from the ChEMBL database. The molecules generated are novel, and appear to fall into the same area of chemical space as some known Thrombin inhibitors. They are suitably drug-like, and can in theory be generated from the fragments using a small number of known reactions. Clearly this approach can also generate unsuitable molecules, but by generating a set of output molecules distributed evenly through the range of solutions, some are likely to be plausible.

Figure 7 – Thrombin Fragments. Measured values: IC50 = 400 μM, IC50 = 330 μM, IC50 = 1000 μM.

Figure 8 – The Pareto front for Thrombin inhibitor generation. Axes: logP and Similarity. Series: initial molecules, after 50 new molecules, after 100 new molecules, after 200 new molecules.

Figure 9 – De Novo designed Thrombin inhibitors. Panels: Similarity Target, Generated Molecules, Measured Inhibitors.

Conclusions

Given a database of reactions, a set of desirable criteria and an appropriate starting material, it is possible to generate a set of novel molecules that cover the range of optimal combinations of desirable properties. Since all steps in generating the molecules are derived from known reactions, they should be relatively easy to synthesise. By choosing suitable criteria the tool can be used to generate drug-like molecules with similar pharmacology to known actives. This tool is highly applicable to problems in fragment based drug design.

Evolutionary Algorithms and Multidimensional Optimisation

Evolutionary algorithms are a well-established set of techniques for searching for optimal solutions from large datasets [2]. They require:
• A representation of the potential solutions: molecules as SMILES strings.
• An evaluation or ‘fitness’ function: Pareto ranks generated from multiple objectives using the NSGA-II algorithm [3], as shown in Figure 4.
• A selection method to choose which solution will reproduce: roulette wheel selection.
• Evolutionary operators to generate modified solutions: reaction vectors as molecular mutators.

The tool was coded as a node within the Knime chemical data integration platform, as shown in Figure 5. Part A shows the reaction cleaning and RV database generation, and part B shows the de novo design tool. This allowed for simple control of input data sets, and ready access to numerous tools for analysis of results.

Figure 4 – Pareto ranking for a two-objective problem, with the Pareto front marked.

Figure 5 – Knime workflow (parts A and B). Labels: Input molecules, Reaction vectors (as SQL table), Reagent database, Reaction Vector.
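The Pareto ranking and roulette wheel selection used by the evolutionary algorithm can be sketched as follows. This is a simplified illustration and not the NSGA-II implementation used in the tool: it ranks by repeatedly peeling off non-dominated fronts and omits crowding distance. The objective scores are made-up numbers, both objectives are assumed to be maximised, and the 1/rank selection weight is an assumption chosen for the example.

```python
import random


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_ranks(scores):
    """Rank 1 is the non-dominated front; later ranks are successive fronts."""
    ranks = [0] * len(scores)
    remaining = set(range(len(scores)))
    rank = 1
    while remaining:
        front = {i for i in remaining
                 if not any(dominates(scores[j], scores[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = rank
        remaining -= front
        rank += 1
    return ranks


def roulette_select(items, ranks, rng):
    """Pick one item with probability proportional to 1/rank (assumed weighting)."""
    weights = [1.0 / r for r in ranks]
    return rng.choices(items, weights=weights, k=1)[0]


# Hypothetical population with two illustrative objective scores per molecule.
population = ["mol_a", "mol_b", "mol_c", "mol_d"]
scores = [(0.9, 0.2), (0.5, 0.8), (0.4, 0.7), (0.3, 0.1)]

ranks = pareto_ranks(scores)
parent = roulette_select(population, ranks, random.Random(0))
print(ranks, parent)
```

Here mol_a and mol_b form the first front because neither beats the other on both objectives, so they receive the largest selection weights.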

Further Work

The key validation step for this tool is to apply it to real-world problems in collaboration with synthetic chemists, and to test the products. There are several other design objectives, such as QSAR models or docking/binding energy calculations, that could be included in the tool and might provide better biological activity in the product molecules. Additionally, work is in progress to improve the reaction vector algorithm, to speed it up and to allow accurate assessment of the size of potential compound libraries prior to generating all possible compounds. The tool also has the potential to be run in reverse to generate plausible synthetic routes to a known molecule.

References

1. Patel, H.; Bodkin, M. J.; Chen, B.; Gillet, V. J., Knowledge-Based Approach to de Novo Design Using Reaction Vectors. Journal of Chemical Information and Modeling 2009, 49, (5), 1163-1184.
2. Coello, C. A. C. In An introduction to evolutionary algorithms and their applications, 5th International School and Symposium on Advanced Distributed Systems, Guadalajara, Mexico, Jan 24-28, 2005.
3. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T., A fast and elitist multiobjective genetic algorithm: NSGA-II. Evolutionary Computation, IEEE Transactions on 2002, 6, (2), 182-197.
4. Gurm, H. S.; Bhatt, D. L., Thrombin, an ideal target for pharmacological inhibition: A review of direct thrombin inhibitors. American Heart Journal 2005, 149, (Supplement 1), S43-S53.