Appears in Proceedings of the 4th European Workshop on Case-Based Reasoning, pp. 358-369. Copyright © 1998 Springer-Verlag (www.springer.de). All rights reserved.

Case-Based Design for Tablet Formulation*

Susan Craw¹, Nirmalie Wiratunga¹, and Ray Rowe²

¹ School of Computer and Mathematical Sciences, The Robert Gordon University, St Andrew Street, Aberdeen AB25 1HG, UK
http://www.scms.rgu.ac.uk

² ZENECA Pharmaceuticals, Hurdsfield Industrial Estate, Macclesfield, Cheshire SK10 2NA, UK

Abstract. Case-Based Design (CBD) applies a knowledge-based process to the knowledge commonly associated with Case-Based Reasoning (CBR) systems: the library of exemplars. This paper investigates the problems in using commercial CBR tools, primarily aimed at classification applications, for a more knowledge-intensive CBD task, and proposes techniques that overcome some of these difficulties. This work results from the development of a pharmaceutical CBD system, Cbr-Tfs, that proposes tablet formulations in order to manufacture viable tablets. Results show that Cbr-Tfs proposes useful ingredients for the tablet, and that the quantities it suggests are well within the limits of the tablet manufacturing process. CBD's increased need for specialised adaptation knowledge is also highlighted, and this raises the issue of its acquisition.

1 Introduction

Many Case-Based Reasoning (CBR) systems classify feature vectors by retrieval from a library, and are suited to diagnosis, advice and help-desk applications. More sophisticated classification tasks demand more complex, knowledge-based techniques for retrieval and similarity matching. Case-Based Design (CBD) can be even more demanding, since it is now clear that the adaptation phase is also critical [15], and adaptation is itself highly knowledge-intensive and domain-specific. This has highlighted the deficiency of regarding the library as the only knowledge source, and led Richter to identify 4 knowledge containers [12]: the description language, similarity measures and transformations for adaptation, as well as the cases themselves. Therefore, the knowledge acquisition bottleneck affects more than the explicit problem-solving knowledge in the library. The early success of CBR tools concentrated effort on classification, so CBR tools are suited to simpler classification applications; the evolution of rule-based expert systems technology initially followed the same pattern. Even if a CBR tool provides general mechanisms that store and access complex exemplars, it is often more difficult to use the tool to incorporate and apply the domain-specific knowledge that is required for retrieval and adaptation, and that is so essential for CBD.

This paper describes the development of Cbr-Tfs, a CBD system for tablet formulations: designs that identify the excipients, and their quantities, that are added to a drug in order to manufacture a viable tablet delivering a desired dose of the drug. This task incorporates many of the complications of the CBD process, whilst retaining a relatively structured representation for exemplars. We introduce the tablet formulation task in Section 2, before reviewing some CBD approaches in Section 3. The development of Cbr-Tfs is described in Section 4. An evaluation of the effectiveness of Cbr-Tfs is reported in Section 5, before we draw some conclusions from this experience in Section 6.

* The work reported here underpins a recently awarded EPSRC grant (GR/L98015) that aims to provide automated tools that assist knowledge acquisition for CBD.

2 Tablet Formulation

The design of a new tablet involves identifying inert substances, called excipients, that balance the properties of the drug so that the tablet is manufactured in a robust form and the desired dose of drug is delivered to, and absorbed by, the patient. Excipients play the role of fillers, binders, lubricants, disintegrants and surfactants in the tablet. The difficulty of the formulation task arises from the need to select a set of mutually compatible excipients whilst at the same time satisfying a variety of other constraints. Furthermore, a formulation specifies the quantity of each of the added excipients.

We have a dataset of formulations for 13 drugs in doses 10mg, 60mg, ..., 360mg. We use it to build libraries of formulations for a subset of the drugs, and to create requirements for formulations for drugs not in the library.

Our collaborator Zeneca Pharmaceuticals currently uses a rule-based system, Tfs [13], to create tablet formulations. It is interesting to note that the human formulators applied a case-based approach before Tfs was introduced; Tfs is now in routine use at Zeneca, where it forms an important stage in the development of tablet formulations for new drugs. Here we investigate the suitability of CBR for tablet formulation, since the knowledge acquisition effort for the original rule-based Tfs was substantial, and the subsequent maintenance to reflect a policy change in tablet formulation required significant re-engineering. In another project we successfully replicated the debugging and maintenance of the rule-based Tfs using our automated knowledge refinement tool KRUST [3]. The knowledge engineering demand is demonstrated by the fact that Tfs is one of the few knowledge-based formulation systems in regular commercial use [4].
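Building a library from some drugs and creating requirements for a drug outside it amounts to a leave-one-drug-out split. A minimal Python sketch follows. It assumes each formulation is a dict with hypothetical `drug` and `dose` keys. The function name and the record layout are illustrative, not taken from the Zeneca dataset.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: {"drug": str, "dose": int, ...solution slots...}
Formulation = Dict[str, object]


def leave_one_drug_out(
    formulations: List[Formulation], held_out: str, dose: int
) -> Tuple[List[Formulation], List[Dict[str, object]]]:
    """Return (library, probes) for one held-out drug at one dose."""
    # The library holds every formulation of every *other* drug.
    library = [f for f in formulations if f["drug"] != held_out]
    # Each probe keeps only the requirement (drug and dose); the solution
    # slots are what the CBD system must propose.
    probes = [
        {"drug": f["drug"], "dose": f["dose"]}
        for f in formulations
        if f["drug"] == held_out and f["dose"] == dose
    ]
    return library, probes
```

With 13 drugs, calling this once per drug with `dose=10` gives the shape of the 10mg experiments in Section 4.3: a library built from the other 12 drugs and a probe for the held-out one.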

3 Case-Based Design

A design is a complex combination of interdependent components, whereas the solution for a classification problem is typically a nominal value, such as a disease name, chosen from an enumerated list. The complexity and variety of both the situation in which the design is to be used and the constraints that must be satisfied demand a powerful knowledge representation for exemplars, and hence an efficient index for the library and a sophisticated similarity function. Complex, object-oriented structures are common in CBD; e.g. chemical process designs [14] and architectural design [5, 10]. This is further complicated by the fact that the similarity of stored designs with new requirements may appear fairly contrived; e.g. with a styrofoam "cup" the material provides the functionality of a handle without being structurally similar. Structure-Behaviour-Function models are one approach that incorporates purpose alongside the more obvious content [9, 2]. Sophisticated knowledge-based matching is a feature of CBD, but is not restricted to design; e.g. Protos's success with audiology diagnosis is partially attributed to its matching [11].

Adaptation is vital for CBD systems, since the creative nature of design means that it is impossible for the CBD library to contain all the possible designs. In contrast, a CBR library of classification exemplars may very reasonably contain examples of each of the possible classes, although it may not contain exemplars that cover all possible situations in which these classes apply. Adaptation is primarily knowledge-based, since deciding when a design should be adapted and how the change should be achieved are each knowledge-intensive. At best the new design can be created from parts of retrieved designs, but normally retrieved designs must be altered to take account of differences in the new situation, as in Julia [7], or repaired to overcome failed designs, as with Chef [6]. In contrast, Dial [8] explicitly stores adaptation exemplars, in addition to the normal library, in order to recall previous adaptations.

4 The CBR-TFS System

Cbr-Tfs is implemented in ReCall, a tool offering a sufficiently rich knowledge representation for tablet formulation exemplars, and allowing Tcl scripts to define specialised CBR methods. Thus, ReCall supports standard retrieval and adaptation methods, with the flexibility to tailor these for the sophisticated matching and adaptation required by tablet formulation, and by design more generally. Figure 1 indicates the facilities available for each of the phases in the standard R4 cycle [1].

4.1 Object-Oriented Representation for Cases

An object-oriented approach stores the properties of individual drugs and excipients once, for re-use in different tablets. Tablet objects contain properties of the tablet such as dose, tablet weight and yield pressure. They also contain links to other object types. Drug objects contain properties of the drug such as solubility, stability and yield pressure; they also contain the stability of the drug with each available excipient. Filler, Binder, Lubricant, Disintegrant and Surfactant objects each contain properties of the excipient such as solubility and yield pressure.
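The key point of this representation is that drug and excipient objects are shared by reference, so their properties are stored once however many tablets use them. The Python dataclasses below sketch that idea. They are not ReCall's object model. Field names are abridged from Figure 2, and only the filler and disintegrant links are shown. Binder, lubricant and surfactant links would follow the same pattern.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Excipient:
    """One shared object per excipient (filler, binder, lubricant, ...)."""
    name: str
    solubility: float
    yield_pressure: float


@dataclass
class Drug:
    name: str
    solubility: float
    solubility_category: str
    yield_pressure: float
    # Stability of this drug with each excipient, keyed by excipient name.
    stability: Dict[str, float] = field(default_factory=dict)


@dataclass
class Tablet:
    drug: Drug                                   # link to a shared Drug object
    dose: float                                  # mg
    filler: Optional[Excipient] = None           # link to a shared Excipient
    filler_amount: Optional[float] = None        # % of tablet
    disintegrant: Optional[Excipient] = None
    disintegrant_amount: Optional[float] = None
```

`Tablet` holds references rather than copies. A change to a shared `Excipient` object, such as its solubility, is therefore seen by every tablet that links to it.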


[Fig. 1 diagram: the ReCall CBR shell takes a new problem through the RETRIEVE, REUSE, REVISE and RETAIN phases, using the case library, to produce a solution for the new problem. Facilities shown: ID3 decision tree indexes (yielding relevant cases); similarity matching with weighted attributes and Tcl functions (yielding the most similar cases); a voting mechanism; and Tcl adaptation.]

Fig. 1. Facilities in ReCall

The formulation exemplars in the library are tablet objects with most of the attributes instantiated with a value. Some drugs do not need a surfactant, in which case all the surfactant attributes are empty. Similarly, if a drug is unstable with a particular excipient, then the stability for this excipient in the drug object is zero. Figure 2 depicts a typical exemplar. A probe describes the properties of the drug and the dose to be delivered for a new tablet to be formulated. The objects and attributes are identical to those of exemplars, but only the Drug object is fully instantiated, and only the DrugLink and Dosage slots in the tablet object are instantiated; all other slots are filled by the CBD process.

Tablet1
  DrugLink: → DrugA
  Dosage: 10
  DrugConcentration: 0.1
  FillerConcentration: 0.8
  TabletDiameter: 6.92
  TabletWeight: 95
  YeildPressureSlow: 141.42
  StrainRateSensitivity: 22.42
  other tablet properties ...
  FillerLink: → Lactose
  FillerAmount: 84.2%
  DisintegrantLink: → Croscarmelose
  DisintegrantAmount: 2.1%
  other tablet properties ...

DrugA
  DrugName: DRUG-A
  DrugSolubility: 50
  DrugSolubilityCategory: Soluble
  DrugStability: 97.7
  DrugYieldPressure: 65
  other drug properties ...
  Croscarmelose: 96.8
  Gelatin: 102.8
  MaizeStarch: 71.6
  Talc: 0
  stabilities with other excipients ...

Lactose
  ExcipientName: Lactose
  ExcipientSolubility: 166
  ExcipientYieldPressure: 158.8
  other excipient properties ...

Croscarmelose
  ExcipientName: Croscarmelose
  ExcipientSolubility: 0
  ExcipientYieldPressure: 40.3
  other excipient properties ...

Fig. 2. A Tfs Exemplar
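A probe uses the same slots as an exemplar but leaves everything except DrugLink and Dosage empty. A zero stability marks an excipient that the drug is unstable with. The Python sketch below models a tablet as a plain dict keyed by Figure 2-style slot names. The slot list, which extends the Filler/Disintegrant naming to all five excipient roles, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical slot list: Figure 2's naming pattern applied to all five roles.
ROLES = ["Filler", "Binder", "Lubricant", "Disintegrant", "Surfactant"]
TABLET_SLOTS = ["DrugLink", "Dosage"] + [
    slot for role in ROLES for slot in (f"{role}Link", f"{role}Amount")
]


def make_probe(drug: str, dosage: float) -> Dict[str, Optional[object]]:
    """A probe: only DrugLink and Dosage are instantiated."""
    probe: Dict[str, Optional[object]] = dict.fromkeys(TABLET_SLOTS)  # all None
    probe["DrugLink"] = drug
    probe["Dosage"] = dosage
    return probe


def slots_to_fill(tablet: Dict[str, Optional[object]]) -> List[str]:
    """Slots the CBD process still has to fill."""
    return [slot for slot in TABLET_SLOTS if tablet.get(slot) is None]


def ruled_out(drug_stability: Dict[str, float]) -> List[str]:
    """Excipients the drug is unstable with (stability recorded as zero)."""
    return sorted(name for name, value in drug_stability.items() if value == 0)
```

For DrugA in Figure 2, `ruled_out` applied to the listed excipient stabilities would return only Talc.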


4.2 Decision Tree Indexing

The ID3 induction algorithm builds a decision tree, which is then used as an index for the library of exemplars. In our experiments the small size of the library did not necessitate an index, but the repeated similarity tests we used would benefit from an initial selection with larger libraries. It is important to note that ID3 is a learning algorithm for classification problems; i.e. given a small set of classes, the decision tree selects the predicted class for a new problem. Figure 3 contains a typical decision tree for 48 exemplars where the classes are filler names. Each node in the decision tree contains an attribute that partitions the exemplars according to the value each takes for that attribute. ID3 selects the attribute used at each decision so as to provide the most discriminating partition for the class being predicted. Therefore, the root node of the example decision tree contains all 48 exemplars, and the solubility category splits the library into three sets of 16 exemplars. These are further split by StrainRateSensitivity (SRS) or YieldPressure (YP), and the leaf nodes each contain exemplars with a particular filler. Two leaf nodes contain more than one filler name, since these exemplars cannot be further distinguished.

[Fig. 3 diagram: root with 48 exemplars, split on DrugSolubilityCategory into Soluble (16), Very-Slightly-Soluble (16) and Practically-Insoluble (16). Further splits on YP at 37.5 (8 exemplars) and SRS at 38 (8 exemplars). Filler names at the leaves include Lactose, Magnesium-Carbonate, Calcium-Phosphate, Magnesium-Phosphate and Ca-Dihydrogen-Phosphate.]

Fig. 3. A Typical Index Tree

The decision tree is used as an index by selecting those exemplars at the leaf node corresponding to the answers for the probe. Since the decision tree is used with the probe, the attributes used in the decision tree must be restricted to those that appear in the probe; i.e. Dosage, DrugLink and the attributes of the Drug object. We did not include both DrugSolubilityCategory and DrugSolubility; experimentation suggested that DrugSolubilityCategory gave simpler trees than DrugSolubility. An ID3 decision tree index works well for case-based classification, since the tree is built using the classification attribute as the concept. In contrast, for CBD it is difficult to know which of the many attributes of the solution should be used as the concept to be learned. We now describe the experiments that suggested a suitable use of index trees.
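The Python sketch below shows both halves of this scheme. The first half is ID3's information-gain choice of split attribute, with a chosen solution attribute (e.g. the filler) as the concept. The second half is a lookup that follows the probe's attribute values down to a leaf. This is a simplification, not ReCall's implementation. It handles symbolic attributes only; the numeric YP and SRS tests in Figure 3 would need threshold splits. Falling back to all exemplars under a node when the probe has an unseen value is an assumption.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Tuple, Union

Case = Dict[str, Any]
# A leaf is a list of cases; an internal node is (attribute, branches, cases).
Index = Union[List[Case], Tuple[str, Dict[Any, "Index"], List[Case]]]


def entropy(cases: List[Case], target: str) -> float:
    counts = Counter(c[target] for c in cases)
    n = len(cases)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def info_gain(cases: List[Case], attr: str, target: str) -> float:
    n = len(cases)
    remainder = 0.0
    for value in {c[attr] for c in cases}:
        subset = [c for c in cases if c[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(cases, target) - remainder


def build_index(cases: List[Case], attrs: List[str], target: str) -> Index:
    """ID3: split on the most informative attribute until classes are pure."""
    if len({c[target] for c in cases}) <= 1 or not attrs:
        return cases
    best = max(attrs, key=lambda a: info_gain(cases, a, target))
    if info_gain(cases, best, target) <= 1e-12:
        return cases  # remaining exemplars cannot be further distinguished
    rest = [a for a in attrs if a != best]
    branches = {
        value: build_index([c for c in cases if c[best] == value], rest, target)
        for value in {c[best] for c in cases}
    }
    return (best, branches, cases)


def retrieve(index: Index, probe: Case) -> List[Case]:
    """Follow the probe's answers to a leaf and return its exemplars."""
    while isinstance(index, tuple):
        attr, branches, cases = index
        value = probe.get(attr)
        if value not in branches:
            return cases  # unseen value: fall back to this node's exemplars
        index = branches[value]
    return index
```

Building one such index per solution attribute (filler, binder, and so on), restricted to the probe attributes, gives a set of per-concept indexes like those compared in Section 4.3.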


4.3 Experimentation with Indexes

We investigated the retrieval success for 10mg doses of one drug from a library containing formulations for the other 12 drugs, using decision tree indexes constructed for the concepts filler, binder, disintegrant, lubricant and surfactant in turn. This experimentation with 10mg tablets showed that assembling the exemplars retrieved by each of the 5 tree indexes (to predict filler, binder, lubricant, disintegrant and surfactant) often produced more successful formulations than using the filler tree alone to predict the filler, the binder tree alone to predict the binder, and so on. Further experiments resulted in the following strategy being adopted in Cbr-Tfs:

  Filler/Binder/Disintegrant: assemble the exemplars retrieved by all 5 tree indexes;
  Lubricant: use only the exemplars retrieved by the index built for lubricants; and
  Surfactant: use only the exemplars retrieved by the index built for surfactants.
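The adopted strategy is a small routing rule over the per-concept indexes. The Python sketch below assumes each index has already been queried with the probe and has returned a list of exemplar identifiers. The function and constant names are hypothetical.

```python
from typing import Dict, List

COMPONENTS = ("filler", "binder", "lubricant", "disintegrant", "surfactant")
POOLED = {"filler", "binder", "disintegrant"}  # these use all five indexes


def candidates_for(component: str, retrieved: Dict[str, List[str]]) -> List[str]:
    """Exemplar IDs to consider when proposing `component`.

    `retrieved` maps each index (named after the concept it was built to
    predict) to the exemplar IDs at the leaf matching the probe.
    """
    if component not in COMPONENTS:
        raise ValueError(f"unknown component: {component}")
    if component in POOLED:
        pooled: List[str] = []
        for index_name in COMPONENTS:
            pooled.extend(retrieved.get(index_name, []))
        return list(dict.fromkeys(pooled))  # union, first-seen order
    # Lubricant and surfactant: only their own index.
    return list(retrieved.get(component, []))
```

Deduplicating with `dict.fromkeys` keeps the union in first-seen order, which makes later ranking steps deterministic.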

4.4 Similarity Matching

The exemplars retrieved by the index are further filtered by applying a similarity function to assess their closeness to the probe. The similarity function is used to rank the retrieved exemplars, and so to select a subset of the most similar retrieved exemplars to pass to the adaptation phase. ReCall applies a simple difference calculation for numerical attributes and a binary match/no-match decision for symbolic attributes, but it allows a weighting factor to affect the relative importance of attributes; Cbr-Tfs assigns a higher weighting to dose, since it is a particularly important factor in formulation. Cbr-Tfs selects the retrieved exemplars to be used in adaptation by finding the intersection of the following sets¹:

  Absolute Threshold selects exemplars whose similarity with the probe is above some predefined value;
  Relative Similarity selects exemplars whose similarity with the probe is close to that of the best match; and
  Difference Method selects the most highly rated exemplars until the next exemplar is significantly poorer than its predecessor in the similarity ranking.
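A weighted similarity of this kind can be sketched as follows. ReCall's exact formula is not given here. The normalisation (numeric difference divided by the attribute's range), the 0 to 100 scale and the weights in the usage note are therefore assumptions. Only the structure follows the text: a difference for numbers, match/no-match for symbols, and per-attribute weights.

```python
from typing import Dict, Union

Value = Union[float, str]


def attribute_similarity(a: Value, b: Value, value_range: float) -> float:
    """1.0 for identical values, falling towards 0.0 as they differ."""
    if isinstance(a, str) or isinstance(b, str) or value_range <= 0:
        return 1.0 if a == b else 0.0  # symbolic: binary match / no-match
    return max(0.0, 1.0 - abs(a - b) / value_range)  # numeric: scaled difference


def similarity(
    probe: Dict[str, Value],
    exemplar: Dict[str, Value],
    weights: Dict[str, float],
    ranges: Dict[str, float],
) -> float:
    """Weighted mean of attribute similarities, scaled to 0..100 (assumed)."""
    total = sum(weights.values())
    score = sum(
        w * attribute_similarity(probe[attr], exemplar[attr], ranges.get(attr, 0.0))
        for attr, w in weights.items()
    )
    return 100.0 * score / total
```

One way to express Cbr-Tfs's emphasis on dose is to give `Dosage` a weight of 3 against 1 for the other attributes; the value 3 is illustrative.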

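Figure 4 shows an example with absolute threshold 50, relative similarity 20 and difference threshold 6. Here the difference method stops after Exemplar53, because the drop from 57 to 50 exceeds the threshold of 6. Both other sets contain these three exemplars, so the intersection corresponds to the subset for the difference method.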
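The three selection rules and their intersection can be sketched in Python as below, using the exemplar similarities from Figure 4 (60, 58, 57, 50, 49, 39, 15). Several interpretive assumptions are involved. "Above" is read as strictly greater. The relative-similarity value is treated as an absolute margin below the best score. The difference method stops when a drop exceeds the threshold. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Ranked = List[Tuple[str, float]]  # (exemplar id, similarity), best first


def absolute_threshold(ranked: Ranked, threshold: float) -> Set[str]:
    return {e for e, s in ranked if s > threshold}


def relative_similarity(ranked: Ranked, margin: float) -> Set[str]:
    best = ranked[0][1]
    return {e for e, s in ranked if best - s <= margin}


def difference_method(ranked: Ranked, gap: float) -> Set[str]:
    chosen = {ranked[0][0]}
    for (_, prev), (e, s) in zip(ranked, ranked[1:]):
        if prev - s > gap:  # next exemplar is significantly poorer: stop
            break
        chosen.add(e)
    return chosen


def select(ranked: Ranked, threshold: float, margin: float, gap: float) -> Set[str]:
    if not ranked:
        return set()
    ranked = sorted(ranked, key=lambda p: p[1], reverse=True)
    return (
        absolute_threshold(ranked, threshold)
        & relative_similarity(ranked, margin)
        & difference_method(ranked, gap)
    )


FIG4 = [
    ("Exemplar85", 60), ("Exemplar22", 58), ("Exemplar53", 57), ("Exemplar40", 50),
    ("Exemplar86", 49), ("Exemplar23", 39), ("Exemplar41", 15),
]
```

`select(FIG4, 50, 20, 6)` returns the set {Exemplar85, Exemplar22, Exemplar53}. This is the same set as `difference_method(FIG4, 6)` alone, matching the text. Since each rule keeps a prefix of the ranking, the intersection is simply the shortest of the three prefixes.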

4.5 Adaptation of Excipient

ReCall's standard method for adaptation is a voting mechanism, in which the most frequently occurring excipient in the retrieved exemplars is used. It also allows the developer to add Tcl functions that implement more specialised adaptation. Cbr-Tfs applies additional Tcl functions throughout adaptation; e.g. it

¹ Each method necessarily selects the exemplars most similar to the probe.


Exemplar     Similarity
Exemplar85   60
Exemplar22   58
Exemplar53   57
Exemplar40   50
Exemplar86   49
Exemplar23   39
Exemplar41   15
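ReCall's default adaptation, the voting mechanism of Section 4.5, can be sketched as a frequency count over the selected exemplars. This Python version illustrates the idea and is not ReCall's code. Two choices are assumptions. First, ties are broken in favour of the value seen first, so best-first input favours the more similar exemplars. Second, an empty slot counts as a vote for omitting that excipient.

```python
from collections import Counter
from typing import Dict, List, Optional


def vote(exemplars: List[Dict[str, Optional[str]]], slot: str) -> Optional[str]:
    """Most frequent value of `slot` among the selected exemplars.

    `Counter.most_common()` orders equal counts by first occurrence, so a
    tie goes to whichever value appears earliest in `exemplars`. A missing
    or empty slot counts as a vote for None (e.g. no surfactant).
    """
    if not exemplars:
        raise ValueError("no exemplars to vote over")
    counts = Counter(e.get(slot) for e in exemplars)
    return counts.most_common()[0][0]
```

With the exemplars passed in similarity order, for example the set selected in Section 4.4, a tie between two fillers resolves to the one used by the more similar exemplar.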
