Automatic Query Generation from Computerized Clinical Guidelines Zafar Hashmi1 , Tatjana Zrimec1, and Andrew Hopkins2 Center of Health Informatics, University New South Wales, Australia 2 Cardiology Department, St George Hospital, Sydney, Australia {Zafarh,Tatjana}@cse.unsw.edu.au,
[email protected] 1
280
International Journal of Information Studies
Volume 1 Issue 4
December 2009
Automatic Query Generation from Computerized Clinical Guidelines Zafar Hashmi1 , Tatjana Zrimec1, and Andrew Hopkins2 Center of Health Informatics, University New South Wales, Australia 2 Cardiology Department, St George Hospital, Sydney, Australia {Zafarh,Tatjana}@cse.unsw.edu.au,
[email protected]
1
ABSTRACT: A framework has been developed that automatically generates context-specific query, and determines its query type, from computerized clinical practice guidelines (CPGs). The generated query can be submitted to PubMed using web services to retrieve and link relevant medical literature pertaining to the computerized CPGs content. This framework makes use of contexts, semantics, statistical information and meta-information of the medical phrases in the Extended-Knowledge Components of the computerized CPG content. The medical literature retrieved and linked, by our framework, is found to be relevant to the knowledge of the computerized CPG. Keywords: Clinical guidelines computerization, Automatic query formulation, Medical literature retrieval. Received: 12 May 2009, Revised 28 June, Accepted 12 July 2009 1. Introduction Within the omnipresent internet, a large number of information systems for medical literature have been developed on the web [1, 2]. Such medical information systems, e.g. PubMed, MedlinePlus, Cochrane, provide comprehensive coverage of evidence based medical literature [1]. However, the search interfaces provided by these systems are less than ideal, potentially hindering health practitioners’ ability to fully utilize the data held within them [1, 3]. Most systems require the physician to formulate a focused query consisting of a few keywords, which need to be carefully selected to clearly define the information needs [1, 3-5]. Furthermore, the query may also require the selection of a query type such as “diagnosis”, “etiology”, “prognosis” or “therapies”. Previous studies have shown that searching for clinical information using the web-facilities is labor-intensive, time-consuming and complicated [4, 6-11]. Moreover, it has also been found that the health practitioner’s needs are often not satisfied [4, 8]. Clinical practice guidelines (CPGs) have been developed to provide comprehensive up-to-date interpretation of evidence-based best clinical practices [12]. Yet these complex guidelines are often not understood or, more importantly, fully implemented [13-15]. As a result there have been different methods suggested to transform CPGs into a computer understandable format [16-20]. Such computerized CPGs are used with decision support systems to provide evidence-based assistance [15, 21]. It has been observed that healthcare practitioners, when using CPGs, tend to validate or to supplement their understanding of the CPGs [22, 23]. Practitioners may refer to the current medical literature to acquire deeper insights into the clinical trials that support the clinical guidelines [23, 24]. There is an ongoing research to find efficient ways to fulfill healthcare practitioners’ information needs and clinical inquires at point of care [4, 25, 26]. It has been investigated and shown [29] that automatically retrieving and linking relevant online medical evidences to CPGs would alleviate the burden on clinicians, demands on their time and would provide more focused-assistance at point of care [27-28]. As the clinical guidelines deal with variety of illness issues such as diagnosis, etiology, treatment, therapy etc, so, the linked medical literature should be contextually relevant to the CPGs content [23, 30]. Consequently, automatically linking online medical literature to the knowledge content of CPGs would provide potential benefits to healthcare practitioners. In order for this to be effective the linked medical literature must be retrieved by a focused, semantically and contextually correlated clinical query to the CPG content. We have developed an approach for automatically linking current best evidence to particular segments of computerized clinical practice guidelines for providing relevant and more focused information. In this paper, we present our Context Specific Query Generation Framework (CQGF). This framework automatically generates queries from refined medical terms and phrases derived from computerized CPGs content [31, 32] and determines its query type. This is achieved by using the context, semantics and meta-information of the computerized CPG’s knowledge content. The generated query and its query type are sent to PubMed by using web-services to retrieve relevant medical literature pertaining to the CPGs’ knowledge content. The retrieved literature is linked and embedded along
International Journal of Information Studies
Volume 1 Issue 4
December 2009
281
with its abstract to the corresponding CPG’s knowledge content. The results of applying CQGF in cardiology domain are encouraging. 2. Representing Computerized Clinical Practice Guidelines with Extended Knowledge Components Clincal practice guidelines aim to comprehensively detail the available evidence and make recommendations. However they can be complex and they lack consistent structure, making them unsuitable for computerized applications. As a result it is necessary to segment the information contained in CPGs to allow further computerized analysis and the generation of effective algorithms to condense the key CPGs knowledge base into effective keywords to be used in a literature search. To this end we have developed a CPGs computerization framework that models the key aspects of clinical practice guidelines and [31, 32] an effective segmentation of CPG data. This framework is based on the document-centric GEM model [18] which is an XML format designed to allow the development of decision support systems based on CPGs. We have extended the GEM model’s Knowledge Components node to represent the knowledge structure of CPG’s content, and we termed these Extended-Knowledge Components. This representation models the clinical practice guidelines content in concise, yet clinically-focused segments of knowledge content. We have enriched the Extended-Knowledge Components with important information including a contextual impact factor, semantics and other meta-information pertaining to CPG content. These ‘Extended Knowledge Components’, created automatically, are computer-interpretable. This process allows the CPG to be represented by a set of Extended-Knowledge components, which can be formally represented by equation 1. A CPG is represented by ‘C’ and an extended knowledge component is represented by ‘Ex-KC’. C = {Ex-KC1, Ex-KC2….Ex-KCn}
(1)
A segment about Reperfusion therapy from the Australian clinical practice guidelines for the management of acute coronary syndromes 2006 (ACS_06)[33] is shown in Figure 1. Figure 2 shows an excerpt of an Extended Knowledge component (Ex-KC ) as a result of the computerization of that segment of the CPG. . 3. Context Specific clinical Query Generation Framework (CQGF) To retrieve ‘relevant’ medical literature related to the CPGs’ knowledge content, medical practitioners need to formulate a search query. Such a search query should manifest CPG-context, semantics, medical problem and clinical intention by using focused medical phrases [1, 3-5]. Multiple studies have investigated healthcare practitioners’ information needs and information seeking behavior [4, 6, 7, 25, 34-38]. Based on their findings, different query models and search strategies as well as techniques to categorize queries into query types have been proposed. However, concepts like scenario specific query, query types, tasks oriented query, query templates, and query categories have almost same meaning. Recent studies reveal that almost 60 % of clinicians queries center on specific query types or scenarios [2, 7, 10, 34-37, 39]. Cimino et al. (2002) showed that clinicians’ questions are predictable and they focus on specific tasks. Therefore, focused and context-sensitive medical queries, either from a healthcare practitioner or system-generated, should appropriately frame the problem and clinical intention. This can be achieved by using a list of specialized problem-specific medical-terms and concepts as the search query [5, 23].
Figure 1. A segment of content from ‘ACS 06’ CPG entitled reperfusion therapy
282
International Journal of Information Studies
Volume 1 Issue 4
December 2009
In devising our framework for query generation the following objectives were considered: 1. Retrieved literature should be focused and relevant to Extended-Knowledge Components since they are highly focused for specific clinical problems/procedures. 2. The search query should be objectively and automatically derived from the content of computerized CPGs (i.e. from Ex-KCs). 3. To formulate a more focused search query by exploiting the Ex-KC syntactic structure (elements tags), contextual, semantic and other meta-information. 4. To use a query model that captures the scope and intentions of Ex-KC. As most of the clinicians’ search queries center on specific scenarios or query types. To achieve the above defined objectives, we have developed a technique that (a) extracts potentially important medical terms and phrases based on the contextual importance, (b) filters out insignificant medical candidates, (c) exploits the information about the semantic types to find association between medical phrases and (d) categorizes search query based on defined clinical query intention defined in the query model. The resulting context-specific query is used to query MEDLINE using the ‘web service’ provided by PubMed’s E-utilities. After retrieving relevant medical articles, they are systematically linked and embedded within the Ex-KC. In the following section we describe our algorithm that carries out above defined tasks. 3.1 Algorithm for Context-Specific Clinical Query Generation Technique The heuristic for our technique of context-specific query generation from Extended-Knowledge Components is shown in the flowcharts in Figures 3, 4 and 5. Specifically (Figure 3), potentially significant medical term candidates are extracted from Ex-KCs. These medical term candidates are passed through a redundancy filter to eliminate redundant medical term candidates. Remaining medical term candidates are evaluated based on the significance of their semantic types. This is done by applying a semantic type filter to remove those medical term candidates that have semantically less significance. The resulting medical term candidates are then analyzed and additional meta-information is attached to each candidate by using Term frequency analysis and by Total weight calculation.
Figure 2. An Excerpt of an extended knowledge component (EX-KC) resulted from the computerization of ACS_06 CPG segment of reperfusion therapy procedures
International Journal of Information Studies
Volume 1 Issue 4
December 2009
283
A disease ontology filter is applied to each potentially significant medical term candidate to further specify the meaning of each medical term candidate. Those medical term candidates that do not conform to disease ontology are filtered out. In the next step three sets of potentially significant medical term candidates are created: • Total weight (TW) set - In this set all filtered medical term candidates that are potentially significant are sorted based on their total weight. The details for total weight technique are contained in 3.2.4 below. • Semantic relation score (SMR) set - This set has the same medical term candidates as in the TW set, the only difference is that the medical term candidates are sorted based on a Semantic Relation Score (SMR score). We have quantified the mutual-relation of each medical term candidate based on their semantic types. The details of this technique will be discussed in 3.2.6. • Knowledge component (KC) set – This set consists of all those significant medical term candidates which are found under the KC element of an Ex-KC. To formulate the final query set, a threshold is applied that is used to determine the number of medical term candidates for a final query set. If the number of potentially significant medical candidates is less than or equal to the threshold then all candidates are members of the final query set. The medical terms and phrases are used to find the query type (details in 3.2.8). If the number of medical term candidates is greater than the threshold then a Common Term Analysis is applied to the above defined three sets to reduce the number of candidates. Figure 4 shows the steps for the Common Term Analysis. With this analysis common medical terms and phrases that are within certain threshold values from the three sets of medical term candidates are identified (details in 3.2.7). The threshold values are based on the highest value of the total weight and the highest value of the SMR score of the candidates in TW set and SMR set. The goal here is to find a candidate that is common in at least two sets and has highest TW score and SMR score. This candidate becomes a member of the final query set. The threshold values vary between 100% to 25% of the highest values of the total weight and SMR score.
Figure 3. Flowchart of search query generation from Ex-KC – Generation of potentially significant medical candidates
284
International Journal of Information Studies
Volume 1 Issue 4
December 2009
Figure 4. Flowchart of the Common Term Analysis process The Common Term Analysis is continued until the number of candidates in the final query set reaches the threshold. Once the final query set is created the medical candidates are used to find the query type (see Figure 5). 3.2 Functional Flow of the Context Specific clinical Query Generation Framework (CQGF) In this section, we describe the functional flow of our CQGF framework and explain its modules. The CQGF architecture with its ‘functional-overview’ is shown in Figure 6. As described earlier (section 2), after computerization of CPG, its content is transformed into ‘Extended-Knowledge Components’ (Ex-KCs) that constitute the ‘Ex-KC knowledge base’. Each Ex-KC is
Figure 5. Determining the Query Type of a query set
International Journal of Information Studies
Volume 1 Issue 4
December 2009
285
Figure 6. CQGF architecture and functional overview taken one by one to be processed by CQGF framework so that a ‘Context-specific’ query from the Ex-KCs can be generated. The resulting queries are submitted to retrieve relevant medical articles pertaining to the Ex-KCs. These medical articles are linked and their abstracts are embedded within the corresponding Ex-KC. 3.2.1 Medical Terms Extractor The ‘Medical Phrase Extractor’ module extracts medical phrases from Ex-KCs. This module extracts only those medical phrases/terms from Ex-KCs that have specific ‘contextual impact factors’, as defined by the GEM model, within the Ex-KCs. The significant contextual segments within the Ex-KCs are (i) ‘Knowledge Component’ segment, (ii) ‘Recommendation’ segment, (iii) ‘Decision Variable’ segment, (iv) ‘Action’ segment and (v) ‘Imperative’ segment. The medical phrases within these five segments in the Ex-KC have relatively higher impact factors than other elements, so that, these have greater potential to be representative of the Ex-KCs. The ‘Medical Phrase Extractor’ sends the extracted medical phrases and terms to a ‘Redundancy Filter’ 3.2.2 Redundancy Filter There is a strong likelihood of redundancy in a set of extracted medical term candidates due to the same term being present in several different segments of the Ex-KC. The ‘Redundancy Filter’ removes all the redundant medical term candidates from the potential query set. The set of remaining medical term candidates is sent to ‘ST Filter Module’. 3.2.3 Semantic Type (ST) Filter The Semantic Type (ST) Filter module filters out those medical candidates whose semantic types are not considered to be significant. This module retrieves the ULMS (Unified Medical Language System) semantic types for all the medical terms / phrases. Then it uses a list of “permissible” semantic types to select those medical term candidates that are relevant for the domain. We have generated a list of semantic types relevant for Cardiology after consulting with specialists in the cardiology domain and after a qualitative analysis of the usage of these semantic types. With additional experiments conducted with clinical practice guidelines (CPGs) for heart disease we generated a list of semantic types best represented CPGs in Cardiology.
286
International Journal of Information Studies
Volume 1 Issue 4
December 2009
3.2.4 Info-Manager The ‘Info-Manager’ module for each potentially significant medical term candidate adds the following weights: ‘Term Frequency’, ‘Total Contextual Weight’ and ‘Total weight’. ‘Term Frequency’ is calculated by a frequency analysis of a medical term candidate to find the number of times this medical term candidate appeared in the corresponding Ex-KC. After frequency analysis a “Total Contextual Weight” of each medical candidate is calculated using the Contextual Weight. . Every element in Ex-KC refers to a specific context e.g. Recommendation, Decision Variable etc. Such contexts of different elements have been quantified based on their impact factor, into contextual weights (represented as η ) [40]. A medical term candidate belonging to a certain element is given the contextual weight of that element. Since the medical term candidate may appear in different elements within the Ex-KC, the aggregation of its contextual weights is calculated, which is named Total Contextual Weight. The Total Contextual Weight of any medical phrase is a summation of contextual weights of the medical phrase within Ex-KC that is defined in equation (2). In this equation, Tη mpα is a Total Contextual Weight of a medical candidate (mp) in i
‘ith’ number of Ex-KC ( α ). The “ η mp j ” is defined as the Contextual Weight of ‘jth’ mp in ‘ith’ number of Ex-KC ( α ) at CTS αi
context-structure (CTS). ‘N’ represent the number of time that mp is found in α . N
CTS Tηmpα = ∑ηmp j i
j =1
(2)
αi
After calculating the Total Contextual Weight of a medical term candidate, the Total Weight for the medical term candidate is calculated by two factors (a) Term-Frequency and (b) Total Contextual Weight. So that the Total weight is a summation of these two factors. If the Total Weight is represented by (ψ ) and the Term Frequency is represented by ( ) then the Total Weight can be formally defined in equation (3), where Tη mpα is the “Total Contextual Weight’ of ‘medical candidate” as i defined in equation (2).
ψ = + Tηmpα
i
3.2.5 Diseases Ontology Filter ‘Diseases Ontology’ represents domain knowledge that provides relationship information concerning different entities such as diseases, symptoms, diagnosis, procedures, etc, pertaining to the corresponding disease. Such domain knowledge plays an important role in identifying the essence of different medical terms. We have developed a prototype of diseases ontology for our framework. This Diseases Ontology can be exyended to be applied to any domain in general. For cardiology domain, we developed a “Heart Diseases Ontology” (HDO) that defines the relationship among heart diseases, symptoms, therapies, diagnosis procedures, and diagnostic clues. In Heart Diseases Ontology, we use ‘Has a’ relationship between diseases, symptoms, therapies, diagnosis, and diagnostic procedures. For example ST segment elevation myocardial infarction has the following symptoms: chest pain, arm pain etc. We used ICD10, SNOMEDCT, MeSH and the information provided by cardiology specialists. The “HDO-Filter” module uses this ontology and filters out those medical term candidates that do not conform to the HDO. 3.2.6 Semantic Relation Score Module The UMLS provides relationships between different semantic types. We used the “associated-with” relation that determines whether one semantic type is associated with another semantic type. Since every medical term candidate has a semantic type, analyzing the ‘associated with’ relation of semantic types, we were able to determine whether the medical term candidate is associated-with ‘none’, ‘one’, or ‘many’ medical phrases. The rationale behind this is the more associated a ‘medical candidate’ is with other medical term candidates, the more potential it has to be included in the final query set. We quantified the ‘associated-with’ relation by giving score of “1”, if medical candidate is found ‘associated-with’ other medical candidate. The score of “0” is given if ‘associated-with’ relation is not found with other medical phrase. It is formally represented by equation 4 where the ‘associated with relation’ between two medical candidates (mpi and mpj) in Ex-KC ( α ) is represented by ( SMRRel = ξ ).
International Journal of Information Studies
Volume 1 Issue 4
December 2009
287
⎧⎪1 SMRRel = ξ = ⎨ 0 ⎩⎪
α α ⎫ if mpi (associated-with) mp j ⎪ α α⎬ if mpi (Not associated-with) mp j ⎪ ⎭
(4)
N −1
SMRSCORE =
∑ξ
(5)
1
The summation of SMRRel = ξ score for a medical term candidate is termed the Semantic Relation score (SMR score) defined by equation 5. In equation 5, ‘N’ represents the total number of medical candidates in the query set. The value of ‘SMR score’ for a particular medical phrase could be within the range of [0 to (N-1)] where ‘N’ is the total number of medical phrases. The ‘SMR score Module’ calculates the ‘SMR score’ for each medical candidate and adds it to the corresponding medical candidate. 3.2.7 The Final Query Set The main function of the Final Query Set Creator (FQSC) is to finalize the medical term candidates for the search query that will be used to query the MEDLINE database. Previous studies have shown that search queries should include only a small number of highly significant medical terms. A query set of a large number of medical terms/phrases tends to generate a query that is too specific which usually returns no medical articles from the knowledge source [23]. During our experiments using search queries of different lengths with MEDLINE we found that five terms is the maximum useful number of medical terms/ phrases in a query. Therefore we used this value as the threshold for the final query set (THFQS). Upon receiving the medical term candidates, the Final Query Set Creator module creates the three sets of medical term candidates as described in section 3.1: a Total Weight (TW set), with medical term candidates sorted by the Total Weight, an ‘SMR score’ (SMR set) where medical candidates are sorted by the SMR scre and a ‘KC’ element set of medical candidates found under the of the Ex-KC (KC set). The rationale behind the third set is that the ‘KC’ element within Ex-KC represents Ex-KC title so the medical term candidates appearing in the KC element have additional significance. If the KC element medical term candidate is also found in the TW set or the SMR set, it increases the likelihood that the corresponding medical term candidate will be a member of final query set. The number of medical term candidates in the final query set is then adjusted. The Final Query Set Creator evaluates the number of received medical candidates against the threshold value (5 terms) for final query set. If the number of received medical candidates is less than the threshold value, it tries to find medical term candidates from the ‘KC set’ which do not appear in the TW set or the SMR set. The selection of medical term candidates from the KC set is based on the ‘Total weight’, as assigned by the “Info-manager”.Those candidates are used to complete the number of medical terms of the final query set. Otherwise the Final Query set contains the received elements. The resulting final query set is sent to the Query Type Finder. If the initial number of received medial term candidates is more than 5, then Common Term Analysis is applied to select the first 5 best medical term candidates. The heuristic used in the Common Term Analysis is based on three different threshold values: 100%, 50% and 25% of the highest value (as shown in Figure 4) in the TW set and the SMR set. All candidates that satisfy the specified conditions are moved to the Final Query Set which is then sent to the Query Type Finder. 3.2.8 Query Type Finder Different models have been proposed to categorize the intentions of clinicians for their information needs. In our work, we used PubMed to find relevant medical literature. PubMed has adopted a query model proposed by Haynes [36, 38] for clinician query classification. In line with the query classification at PubMed, we categorized our search query into four types: (i) Diagnosis: literature related to evaluation of disease process, (ii) Etiology: literature related to causation of disease or condition, (iii) Prognosis: literature related to predication or forecast of a course of disease or condition, (iv) Therapy: literature related to therapy, prevention, or rehabilitation. If the medical terms / phrases belong to two query types, both query types are assigned [29] to the final search query.If the search query falls into more than two categories or does not fall into any above defined category, it is assigned as a “General” type. The final search query set was automatically categorized by these four categories. Figure 5 shows the process for determining a query type. A set of semantic types is assigned to each query type. These sets were generated in consultation with cardiologists following the methods provided in the literature [2, 36, 41-43]. Each medical term candidate is associated with at least one semantic type in UMLS. The query type for a search query is determined by finding the maximum number of medical terms / phrases that belong to a specific query type.
288
International Journal of Information Studies
Volume 1 Issue 4
December 2009
3.2.9 PubMed Webservice The ‘PubMed Webservice Module’ is used to connect to PubMed, to send a search query and to receive the retrieved medical articles. It uses the Pubmed E-utilities features and implements the specification defined by PubMed. This module transforms a final search query into the format defined by PubMed and implements our heuristic to retrieve, link and embed information in the corresponding Ex-KC. If medical literature is not retrieved for the submitted query, this module removes the least significant medical term / phrase form the query based on the Total Weight and SMRscore value and re-submits the new query to PubMed. This process is repeated until a result set of medical literature is retrieved or the query terms are exhausted. 4. Working Example of CQGF Framework This section presents a working example of the CQGF framework functionalities. First, a section of knowledge content is taken from the clinical practice guideline (CPG) “Management of acute coronary syndromes 2006” (ACS 06) [33]. In this example the section entitled “Reperfusion Therapy” was used and details the management of patients with ST-segment-elevation myocardial infarction with thrombolytic agents, and is shown in Figure 1. Using the GEM Cutter tool and our CPG computerisation framework this section was transformed into an Ex-KC. Figure 2. shows an excerpt of the resulting “Reperfusion Therapy” Ex-KC. Second, the CQGF analyses the “Reperfusion Therapy Ex-KC” to find potentially significant medical term candidates as described in section 3.2.1. Next, redundant medical terms are filtered out from the extracted medical terms and medical phrases by the Redundancy Filter. The resulting set of remaining medical terms and phrases are listed, along with their semantic types, in Table 1. Further filtering is applied to this set of medical terms by the Semantic Type Filter that removes insignificant terms based on their semantic types (as described in section 3.2.3). The highlighted rows in Table 1 indicate those medical terms that remained after the application of the Semantic Type Filter. This subset of medical terms is then processed by the Info-Manager to add weighting information to the medical terms and phrases (see section 3.2.4). At this stage there were eleven potentially significant medical terms. These eleven terms were further processed by the Heart Diseases Ontology Filter to further stratify the relative importance of the medical terms. Medical Phrase/Term
Semantic Type
reperfusion therapy
Therapeutic or Preventive Procedure
science of anatomy
Biomedical Occupation or Discipline
thrombolytic therapy
Therapeutic or Preventive Procedure
assay of fibrinolysis
Laboratory Procedure
transplantation
Therapeutic or Preventive Procedure
operative surgical procedures
Therapeutic or Preventive Procedure
fibrinolysis
Physiologic Function
using
Functional Concept
patients
Patient or Disabled Group
combined
Qualitative Concept
methodology
Intellectual Product
myocardial infarction
Disease or Syndrome
surgery specialty
Biomedical Occupation or Discipline
anatomy aspects
Qualitative Concept
percutaneous coronary intervention
Therapeutic or Preventive Procedure
coronary artery bypass surgery
Therapeutic or Preventive Procedure
anatomy
Anatomical Structure
protein c inhibitor
Biologically Active Substance
transplanted tissue
Tissue
surgical aspects
Functional Concept
procedures
Health Care Activity
st segment elevation
Finding
utilization qualifier
Functional Concept
graft material
Biomedical or Dental Material
Table 1. Remaining medical phrases and terms after ‘Redundancy Filter’. Highlighted rows indicate those terms that are filtered by Semantic Type Filter
International Journal of Information Studies
Volume 1 Issue 4
December 2009
289
After HDO filtration, seven medical phrases are left which are further processed by the SMR score module (as described in section 3.2.6). These medical terms are shown along with their Total weight and their SMR score in Table 2. In Table 2 the medical candidate “fibrinolysis” has a SMR score of “0” because, its semantic type is “Physiologic Function” that does not have an “associated-with” relation with “Therapeutic or Preventive Procedure”, “Disease or Syndrome”, or “Finding” semantic types in UMLS. The number of the remaining medical terms and phrases for the final query set is greater than the pre-defined threshold for a final query set (THFQS), that is ‘5’ terms. To further reduce the number of terms in the query set, three sets were created. The Common Term Analysis was performed using the 3 sets (see section 3.2.7). TW set and SMR set terms and their values can be found in Table 2. The KC set medical terms were “reperfusion therapy, methodology, and procedures”. After the Common Term Analysis five medical terms / phrases remained in the final query set. FinalQuerySet = {reperfusion therapy, percutaneous coronary intervention, myocardial infarction, ST segment elevation, thrombolytic therapy}. The query type for the FinalQuerySet was determined (section 3.2.6) to be “Therapy”. The final query submitted to the PubMed was {reperfusion therapy, percutaneous coronary intervention, myocardial infarction, st segment elevation, thrombolytic therapy} with query type “Therapy”. The medical articles retrieved and linked to Reperfusion Therapy Ex-KC are shown in Table 3. Medical Phrase/Term
TW
Semantic Type
SMR score
reperfusion therapy
4.75
Therapeutic or Preventive Procedure
1.0
percutaneous coronary intervention
2.5
Therapeutic or Preventive Procedure
1.0
st segment elevation
1.75
Finding
1.0
myocardial infarction
1.75
Disease or Syndrome
5.0
thrombolytic therapy
1.5
Therapeutic or Preventive Procedure
1.0
fibrinolysis
1.5
Physiologic Function
0.0
coronary artery bypass surgery
1.5
Therapeutic or Preventive Procedure
1.0
Table 2. Resulting medical terms/phrases from Heart Diseases Ontology Filter Retrieved and Linked Medical Articles for Reperfusion Therapy Ex-KC Longer-term follow-up of patients recruited to the REACT (Rescue Angioplasty Versus Conservative Treatment or Repeat Thrombolysis) trial. Carver A, Rafelt S, Gershlick AH, Fairbrother KL, Hughes S, Wilcox R; REACT Investigators. J Am Coll Cardiol. 2009 Jul 7;54(2):118-26. Effect of intravenous FX06 as an adjunct to primary percutaneous coronary intervention for acute ST-segment elevation myocardial infarction results of the F.I.R.E. (Efficacy of FX06 in the Prevention of Myocardial Reperfusion Injury) trial. Atar D, Petzelbauer P, Schwitter J, Huber K, Rensing B, Kasprzak JD, Butter C, Grip L, Hansen PR, Süselbeck T, Clemmensen PM, Marin-Galiano M, Geudelin B, Buser PT; F.I.R.E. Investigators.J Am Coll Cardiol. 2009 Feb 24;53(8):720-9. Transferring patients with ST-segment elevation myocardial infarction for mechanical reperfusion: a meta-regression analysis of randomized trials. De Luca G, Biondi-Zoccai G, Marino P.Ann Emerg Med. 2008 Dec;52(6):665-76. Review. Association of angiographic perfusion score following percutaneous coronary intervention for ST-elevation myocardial infarction with left ventricular remodeling at 6 weeks in GRACIA-2. Abraham JM, Gibson CM, Pena G, Sanz R, AlMahameed A, Murphy SA, Blanco J, Alonso-Briales J, Lopez-Mesa J, Gimeno F, Sánchez PL, Fernández-Avilés F; GRACIA-2 (Grupo de Análisis de la Cardiopatía Isquémica Aguda) Investigators. J Thromb Thrombolysis. 2009 Apr;27(3):253-8. Epub 2008 Mar 11. Effects of pretreatment with clopidogrel on nonemergent percutaneous coronary intervention after fibrinolytic administration for ST-segment elevation myocardial infarction: a Clopidogrel as Adjunctive Reperfusion Therapy-Thrombolysis in Myocardial Infarction (CLARITY-TIMI) 28 study. Gibson CM, Murphy SA, Pride YB, Kirtane AJ, Aroesty JM, Stein EB, Ciaglo LN, Southard MC, Sabatine MS, Cannon CP, Braunwald E; TIMI Study Group. Am Heart J. 2008 Jan;155(1):133-9. Epub 2007 Nov 19. Primary percutaneous coronary intervention compared with fibrinolysis for myocardial infarction in diabetes mellitus: results from the Primary Coronary Angioplasty vs Thrombolysis-2 trial. Timmer JR, Ottervanger JP, de Boer MJ, Boersma E, Grines CL, Westerhout CM, Simes RJ, Granger CB, Zijlstra F; Primary Coronary Angioplasty vs Thrombolysis-2 Trialists Collaborators Group. Arch Intern Med. 2007 Jul 9;167(13):1353-9.
290
International Journal of Information Studies
Volume 1 Issue 4
December 2009
Does coronary angioplasty after timely thrombolysis improve microvascular perfusion and left ventricular function after acute myocardial infarction? Agati L, Funaro S, Madonna M, Sardella G, Garramone B, Galiuto L. Am Heart J. 2007 Jul;154(1):151-7. Percutaneous coronary intervention in patients receiving enoxaparin or unfractionated heparin after fibrinolytic therapy for ST-segment elevation myocardial infarction in the ExTRACT-TIMI 25 trial. Gibson CM, Murphy SA, Montalescot G, Morrow DA, Ardissino D, Cohen M, Gulba DC, Kracoff OH, Lewis BS, Roguin N, Antman EM, Braunwald E; ExTRACT-TIMI 25 Investigators. J Am Coll Cardiol. 2007 Jun 12;49(23):2238-46. Epub 2007 May 25.
Table 3. Medical articles retrieved from PubMed for “Reperfusion Therapy Ex-KC” 5. Discussion and Evaluation The clinical guideline, Australian clinical practice guidelines for the management of acute coronary syndromes 2006, was computerized using our CPG computerization framework. This guideline is widely used in the evaluation and management of patients presenting to Emergency Departments with acute coronary syndromes in Australia. The computerized CPG in form of EX-KCs was processed with CQGF framework to generate clinical queries, and to retrieve medical literature from MEDLINE. Domain experts (specialists from the Cardiology domain) were asked to analyze and evaluate the results in the following settings.
Relevant
Not relevant
Relevant
Partially relevant
Not relevant
100%
0%
55.6 %
44.4 %
0%
Table 4. Results for clinical queries for two evaluation measures
Table 5. Generated clinical queries results for three evaluation measures
First, the generated clinical queries were evaluated based on their relevancy: (i) Relevant and (ii) Not relevant. ‘Relevant’ indicates that clinical query is relevant to the corresponding segment of CPG and ‘Not relevant’ indicates that clinical query does not represent the corresponding content. CQGF generated query for this evaluation were marked by domain experts. Table 4 shows the results of this evaluation. Another evaluation of the generated query was conducted along the lines of three categories: (i) Relevant, (ii) Partially Relevant, and (iii) Not-relevant. For this evaluation ‘Relevant’ indicates that generated query exactly represents the CPG content, ‘Partially relevant’ indicates that generated query is relevant but medical terms may be added or deleted, and ‘Not relevant’ indicates that generated query does not represent CPG content at all. Table 5 shows the results for this evaluation. To evaluate the ‘Query Type’ generated by the CQGF for clinical query, domain experts were asked to provide their feedback along the lines of two measures: (i) Agree, and (ii) Disagree. Table 6 shows the results for this evaluation. We conducted another evaluation of the ‘Query Type’ along the lines of the following measures: (i) Agree, (ii) Agree with reservation, and (ii) Disagree. In this evaluation, ‘Agree’ indicates that query type is correct, ‘Agree with reservation’ indicates that Query Type is marginally correct but it could be different as well, and ‘Disagree’ indicates that Query Type is not correct. Table 7 shows the results for this evaluation. CQGF assigned some of the queries two query types e.g. Diagnosis and Therapy. In this case, domain expert classified the Query Type under “Agree with reservation” category. CQFG depends on the PubMed retrieval technique to retrieve the medical literature using the generated query and query type. We also asked the domain expert to analyze the retrieved articles for segments of CPG content. For this analysis maximum twelve retrieved medical articles were analyzed. Retrieved medical articles were evaluated for two categories (i) Relevant and (ii) Not-relevant. In this evaluation ‘Relevant’ indicates that retrieved articles are relevant with the possibility that some
Agree
Disagree
Agree
Agree with reservation
Disagree
88.9 %
11.1 %
77.6%
11.3%
11.1%
Table 6. Results of finding query type for two evaluation measures
International Journal of Information Studies
Table 7. Results for relevancy of retrieved for three evaluation measures
Volume 1 Issue 4
December 2009
291
of the articles are not directly relevant to CPG content, and ‘Not-relevant’ indicates that retrieved articles are not directly relevant to CPG content. Table 8 shows the results of this evaluation. Relevant
Not relevant
88.8 %
11.2 %
Table 8. Results for relevancy of retrieved medical articles On reviewing the evaluation by domain experts, we found that queries and query types generated for those CPG segments that have: medications information, quantity of medication dose, and usage of those medications, were not good match. Such information also does not appear quite often in medical articles titles and abstract, which affects the relevancy of retrieved articles as well. 6. Concluding Remarks and Future Work In this paper, we have presented our technique that automatically generates clinical query from semi-structured, informationrich segments of clinical practice guideline’s content. This technique exploits the semantic tags, contextual information, semantic relation, and relevant meta-information of CPG content to formulate focused, context-specific query and determines its query type. This technique is implemented in our ‘Context-specific Query Generation Framework’. This generated clinical query is used to query MEDLINE database. By using online PubMed facilities the system retrieves and links medical articles with the corresponding segments of a CPG content. Automatically generating clinical query to provide articles pertaining to CPG content can improve the use of computerized CPGs in clinical settings. It also would help alleviate some of the burden in seeking relevant information at point of care. We have identified number of issues that need to be addressed in future work for improving the automatic delivery of medical evidence pertaining to computerized CPG content. This includes: investigating the ways to enhance the accuracy of the semantic filter, to develop more detailed domain ontology, investigating other semantic tags in computerized CPG to increase the precision of clinical query formulation, finding more optimal query length and investigating other techniques to improve query categorization. References [1] Wesley, W. C., Victor, Z. L (2003). A Knowledge-based Approach for Scenario-specific Content Correlation in a Medical Digital Library, Medical Digital Library. UCLA, Computer Science Department. [2] Zhenyu, L., Wesley, W. C (2007). Knowledge-based query expansion to support scenario-specific retrieval of medical free text, Information Retrieval, 2 (10) 173-202. [3] Mendonça, E., Cimino, J (2000). Automated knowledge extraction from MEDLINE citations, In: Proc AMIA Symp. [4] Ely, J., Osheroff, J. A ., Chambliss, M., Ebell, M. H., Rosenbaum, M. E (2005). Answering physicians’ clinical questions: Obstacles and potential solutions, Journal of the American Medical Informatics Association, 12. 217-224. [5] Olena, M. (2005). Automatic Keyphrase Indexing with a Domain-Specific Thesaurus, In: Computer Science, Master: The University of Waikato. 95. [6] Covell, D., Uman, G., Manning, P (1985). Information needs in office practice: are they being met? Annals of Internal Medicine, (103) 596–599. [7] Ely, J., Osheroff, J., Ebell, M., Bergus, G,. Levy, B. Chambliss, M.,Evans, E. (1999). Analysis of questions asked by family doctors regarding patient care, BMJ, 319. 358-361. [8] Gorman, P., Helfand, M (1995). Information seeking in primary care: how physicians choose which clinical questions to pursue and which to leave unanswered, Medical Decision Making, 15. 113-119. [9] Gorman, P. N., Ash, J. S., and. Leslie, W. W. (1994). Can primary care physicians’ questions be answered using the medical journal literature? Bulletin of the Medical Library Association, 82. 140-146. [10] Hersh, W. R., Pentecost, J, Hickam, D. H. A task-oriented approach to information retrieval evaluation, JASIS, 47, 50-56. [11] Timpka, T., Ekstrom, M., Bjurulf, P. (1989). Information needs and information seeking behaviour in primary health care, Scandinavian Journal of Primary Health Care, vol. (7), pp. 105-109, 1989. [12] Shiffman, R., Michel, G., Essaihi, A., Thornquist, E. (2004a). Bridging the guideline implementation gap: a systematic, document-centered approach to guideline implementation, J Am Med Inform Assoc, 11 (5) 418-26.
292
International Journal of Information Studies
Volume 1 Issue 4
December 2009
[13]. Kosecoff, J. K., Rogers, W., McCloskey, L., Winslow, C., Brook, R (1987). Effects of the National Institutes of Health Consensus Development Program on physician practice, J Am Med Inform Assoc. 258. 2708-13. [14] Lomas, J. A., Domnick-Pierre, K., Vayda, E., Enkin, M., Hannah, W (1989). Do practice guideline guide practice? The effect of a consensus statement on the practice of physicians, N Engl J Med. 321, 1306-11. [15] Sonnenberg, F., Hagerty, C (2006). Computer-interpretable clinical practice guidelines. Where are we and where are we going? Year book of medical informatics, 145-58. [16] Andreas, S. Robert, K., Silvia, M (2002). Asbru 7.3 ReferenceManual. Technical Report Asgaard-TR-2002-1, Vienna University of Technology, Institute of Software Technology & Interactive Systems, Vienna, Austria. [17] Sutton, David, R. ., John, F. (2003). The syntax and semantics of the proforma guideline modeling language, Journal of the American Medical Informatics Association, 10. 433-443. [18] Shiffman, R., Karras, B., Agrawal, A., Chen, R., Marenco, L., Nath, S. (2000). GEM: a proposal for a more comprehensive guideline document model using XML, J Am Med Inform Assoc, 7 (5) 488-98. [19] Peleg, M., Boxwala, A., Ogunyemi, O., Zeng, Q., Tu, S., Lacson, R., Bernstam, E., Ash, N., Mork, P., Ohno-Machado, L., Shortliffe, E., Greenes, R. (2000). GLIF3: the evolution of a guideline representation format, In: Proceedings of the AMIA Symposium. [20] Mark, A. M., Samson, W. T., Das, Amar, K., Shahar, Y. (1996). EON: A componentbased approach to automation of protocol-directed therapy, Journal of the American Medical Information Association, 3. 367-388. [21] Hussain, S., Abidi, S (2007). Ontology Driven CPG Authoring and Execution via a Semantic Web Framework, In: 40th Hawaii IEEE International Conference on System Sciences (HICSS-40), Hawaii. [22] Chueh, H., Barnett, G (1997). Just-in-time” clinical information, Acad Med, 72, p. 512-517. [23] Abidi, S., Kershaw, M., Milios, E. (2005a). Augmenting GEM-Encoded Clinical Practice Guidelines with Relevant Best-Evidence Autonomously Retrieved from MEDLINE, Health Informatics Journal, 11, p. 57-72. [24] Jimmy, L., Dina, D.-F. (2007). Semantic Clustering of Answers to Clinical Questions, In: Proceedings of the 2007 Annual Symposium of the American Medical Informatics Association. [25] Chambliss, M., Conley, J (1996). Answering Clinical Questions, Journal of Family Practice, 43, pp. 140-144. [26] Dina, D.-F., Jimmy, L (2007). Answering Clinical Questions with Knowledge-Based and Statistical Techniques, Computational Linguistics 33. 63-103. [27] Demner-Fushman, D., Seckman, C., Fisher, C., Hauser, S., Clayton, J., Thoma, G. (2008). A Prototype System to Support Evidence-based Practice, In: Proceedings of the 2008 Annual Symposium of the American Medical Information Association (AMIA 2008), Washington, DC. [28] Seol, Y., Kaufman, D., Mendonça, E., Cimino, J., Johnson, S (2004). Scenario-based assessment of physicians’ information needs, In: MEDINFO 2004. [29] Abidi, S., Kershaw, M., Milios, E. (2005a). BiRD: A Strategy to Autonomously Supplement Clinical Practice Guidelines with Related Clinical Studies, In: 38th IEEE Hawaii International Conference on System Sciences. [30] Hashmi, Z., Zrimec, T., Hopkin, A (2009a). CQGF: Context Specific Query Generation Framework from Computerized Clinical Practice Guidelines, In: Second International Conference on the Applications of Digital Information and Web Technologies ICADIWT London. [31] Hashmi, Z. I., Zrimec, T., Hopkin, A (2009d). Computerization Framework for Clinical Practice Guidelines by Extending the XML Guidelines Element Model (GEM), In: The XXII International Conference of The European Federation For Medical Informatics MIE 2009, Sarajevo. [32] Hashmi, Z. I., Zrimec, T., Lajovic, I. (2009c). Engineering Techniques And Methods Towards Implementation Of CPG Knolwedge Computerization Framework, In: IADIS International Conference on E-Health, Algarve, Portugal. [33] Heart Foundation Australia, (2006). Guidelines for the management of acute coronary syndromes, v. 2009. [34] Ely, J., Osheroff, J., Gorman, P., Ebell, M., Chambliss, M., Pifer, E., Stavri, P. (2000). A taxonomy of generic clinical questions: classification study, BMJ, 321,. 429-432. [35] Haynes, R., McKibbon, K., Walker, C., Ryan, N., Fitzgerald, D., Ramsden, M. (1990). Online access to MEDLINE in Clinical Setting, Annals of Internal Medicine, 112, 78-84. [36] Haynes, B., Wilczynski, A., McKibbon, Walker, C., Sinclair, J (1994). Developing optimal search strategies for detecting clinically sound studies in MEDLINE, Journal American Medical Informatics Association, vol. 1 (6) 447–458. [37] Wilczynski, N., McKibbon, K. A., Haynes, R. B. (2001). Enhancing retrieval of best evidence for health care from bibliographic databases: Calibration of the hand search of the literature, In: Proceedings of 10th World Congress on Medical Informatics (MEDINFO 2001). [38]. Haynes, R., McKibbon, K., Wilczynski, N., Walter, S., Werre, S. (2005). Optimal search strategies for retrieving scientifically strong studies of treatment from Medline: analytical survey, BMJ, 1162-3.
International Journal of Information Studies
Volume 1 Issue 4
December 2009
293
[39] Cimino, J.Aguirre, A.,Johson, S., Peng, P. (1993a). Generic queries for meeting clinical information needs, Bulletin of the Medical Library Association, 81 (2) 195-206. [40] Hashmi, Z., Zrimec, T., Hopkin, A. (2009b). Evidence Based Knowledge Retrieval from CPG Knowledge Base by Context and Statistical Analysis Techniques, In: Second International Conference on the Applications of Digital Information and Web Technologies ICADIWT 2009, London. [41] Sibanda, T., He, T., Szolovits, P., Uzuner, O. (2006). Syntactically-informed semantic category recognition in discharge summaries, presented at AMIA Annu Symp Proc. [42] Bodenreide, R. O., McCray, A. (2003). Exploring semantic groups through visual approaches, Journal of Biomedical Informatics, 36 (6) 414-432. [43] Long, W. (2005). Extracting diagnoses from discharge summaries, In: The proceedings of AMIA Symposium.
294
International Journal of Information Studies
Volume 1 Issue 4
December 2009