Markov Logic Networks in Health Informatics

Shalini Ghosh, Natarajan Shankar, Sam Owre, Sean David∗, Gary Swan, Patrick Lincoln
SRI International, Menlo Park, CA

Abstract

Health informatics is a fertile source of applications for data-intensive computing. In this position paper, we discuss some problems in health informatics and present high-level ideas about possible approaches using the framework of probabilistic relational models, in particular Markov Logic Networks (MLNs).

We first give an overview of MLNs and then outline how we can use MLNs for tasks in health informatics. We focus on two important problems in this domain: (1) improving various clinical criteria used in medical diagnosis, and (2) doing data integration and joint inference across multiple information sources to get the necessary information for complex clinical criteria.

∗ Additional affiliations of Dr. David include Stanford School of Medicine, Stanford Hospital and Clinics, and Lucile Packard Children's Hospital.

Appearing in the ICML 2011 Workshop on Machine Learning for Global Challenges, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).

1. Introduction

Machine learning is having a growing impact on different aspects of healthcare, with state-of-the-art predictive modeling tools being used to tackle a wide variety of healthcare problems. Large amounts of data are being generated in the medical domain, e.g., electronic health records of patient health conditions, diagnostic tests, and drug regimens. Machine learning methods are becoming increasingly useful in analyzing large volumes of complex health informatics data and in improving different aspects of medical diagnosis, from disease detection to planning the treatment regimen. For example, outlier detection techniques are used to improve intensive care monitoring and alerts, multi-label classifiers are used to automate medical report analysis for assigning diagnostic codes, reinforcement learning is used for optimizing patient treatment strategies, statistical relational learning and reinforcement learning techniques are being applied to the personalization of cancer therapy, individual and population traits are being inferred from analyzing clinical temporal data, and ensembles of diverse predictive models are being used to aid clinical decisions (ICML Workshop, 2008; NIPS Workshop, 2010).

In this position paper, we show how probabilistic relational models, in particular Markov Logic Networks (MLNs) (Richardson & Domingos, 2006), can be a useful and natural tool for solving certain problems in health informatics. MLNs are probabilistic first-order relational models consisting of first-order logic formulas with associated weights, which enable us to reason in a more expressive first-order relational framework while capturing the uncertainty in a domain. Machine learning models, e.g., classification or regression trees, are already widely used in the medical domain. There are some advantages to using MLNs over models like decision trees. Decision trees are propositional, whereas MLNs have first-order rules and can therefore be more expressive. Moreover, MLNs allow rules to be applied probabilistically, which makes them more robust to possible errors in the rules.

Overview of MLNs: A Markov Logic Network (MLN) has a set of first-order formulas F with associated weights, defining a space of models in which each formula holds with a particular probability (proportional to its weight) defined over this space. There is also an associated set of constants K that are used to ground the formulas to get facts, which are the grounded instances of formulas in the MLN. One can compute the marginal probability of any formula over this space. In typical use, an MLN is used to infer the marginal probability distributions for (output) random variables and formulas based on the distribution of the input random variables. For example, given the test results of a patient, one can design an MLN to compute the probability that the patient has a particular disease. Let us consider a small example MLN that shows a possible relation between smoking and cancer. Suppose we know that smoking causes cancer and that friends influence each other's smoking behavior. In this example, there are three persons A, B and C. The Friends predicate indicates that two people are friends, while the Smokes and Cancer predicates indicate whether a person smokes or has cancer. We use the Probabilistic Consistency Engine (PCE) (Owre & Shankar, 2009) tool for MLN inference, in which the specification of this MLN would look like:


# Declarations.
sort Person;
const A, B, C: Person;
predicate Friends(Person, Person) hidden;
predicate Smokes(Person) hidden;
predicate Cancer(Person) hidden;

# Formulas.
add [x] Smokes(x) => Cancer(x) 2.0;
add [x, y] Friends(x, y) => (Smokes(x) => Smokes(y)) 1.0;

# Priors.
add [x] Cancer(x) -7.0;

# Facts.
add Smokes(A);
add Friends(A, B);

The first part of this simple example MLN consists of declarations of types (sorts), constants and predicates. The declarations are followed by the formulas, which state that (1) ∀x, if x smokes, then x has some probability of having cancer (corresponding to a weight of 2.0), and (2) ∀x, y, if x and y are friends and x smokes, then that influences the smoking habit of his/her friend y with some probability (corresponding to a weight of 1.0). The next part specifies the prior weight of the Cancer predicate: setting it to -7.0 is equivalent to setting a low prior probability of having cancer, based on domain knowledge. The final part of the MLN consists of the facts, which in this case indicate that A smokes, and that A and B are friends. Running inference on the above MLN model in PCE gives us the marginal probabilities of cancer: prob(Cancer(A)) = 0.0057, prob(Cancer(B)) = 0.0025, prob(Cancer(C)) = 0.0007. This result indicates that C has a low chance (less than 0.1%) of developing cancer, since based on the facts and rules in this MLN, C does not smoke. B has a higher chance (0.25%) of developing cancer than C, since B is a friend of A, who is a smoker. Finally, A has the highest probability (0.57%) of developing cancer, since A is a smoker.
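For intuition, the semantics behind these marginals can be reproduced by brute-force enumeration over the ground network, which is small enough here (13 hidden ground atoms). This is a sketch of MLN semantics, not of PCE itself: PCE uses sampling-based inference, so the exact marginals below differ somewhat from the numbers quoted above, but the qualitative ordering is the same.

```python
from itertools import product
from math import exp

# Brute-force MLN inference for the smoking example: enumerate every
# truth assignment to the hidden ground atoms, weight each world by
# exp(sum of weights of satisfied ground formulas), and normalize.
PEOPLE = ["A", "B", "C"]

# Facts act as hard evidence; all other ground atoms are hidden.
EVIDENCE = {("Smokes", "A"): True, ("Friends", "A", "B"): True}

ATOMS = ([("Smokes", p) for p in PEOPLE]
         + [("Cancer", p) for p in PEOPLE]
         + [("Friends", x, y) for x in PEOPLE for y in PEOPLE])
FREE = [a for a in ATOMS if a not in EVIDENCE]

def world_weight(w):
    """exp of the total weight of ground formulas satisfied in world w."""
    total = 0.0
    for x in PEOPLE:
        if (not w[("Smokes", x)]) or w[("Cancer", x)]:
            total += 2.0     # Smokes(x) => Cancer(x), weight 2.0
        if w[("Cancer", x)]:
            total += -7.0    # prior on Cancer(x), weight -7.0
        for y in PEOPLE:
            if (not w[("Friends", x, y)]) or (not w[("Smokes", x)]) or w[("Smokes", y)]:
                total += 1.0  # Friends(x, y) => (Smokes(x) => Smokes(y)), weight 1.0
    return exp(total)

Z = 0.0
cancer_mass = {p: 0.0 for p in PEOPLE}
for bits in product([False, True], repeat=len(FREE)):  # 2**13 worlds
    w = dict(EVIDENCE)
    w.update(zip(FREE, bits))
    wt = world_weight(w)
    Z += wt
    for p in PEOPLE:
        if w[("Cancer", p)]:
            cancer_mass[p] += wt

marginals = {p: cancer_mass[p] / Z for p in PEOPLE}
```

Exact enumeration recovers prob(Cancer(A)) > prob(Cancer(B)) > prob(Cancer(C)): A smokes for certain, B smokes with elevated probability because of the friendship with A, and C is least likely to smoke.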

2. Better Estimates for Clinical Diagnosis Criteria

The first problem we would like to consider is that of getting better estimates for the various criteria used in risk analysis for clinical diagnosis. Doctors and other medical personnel use different algorithms in various stages of medical treatment, e.g., for dosage determination in a drug regimen, or in risk analysis of a patient's condition. Here we consider the problem of risk analysis from clinical criteria in more detail.

Clinical feature                               Points
Clinical symptoms of DVT                       3.0
Other diagnosis less likely than PE            3.0
Heart rate > 100 beats per minute              1.5
Immobilization/surgery within last 4 weeks     1.5
Previous DVT or PE                             1.5
Hemoptysis                                     1.0
Malignancy                                     1.0

Table 1. Points in Wells rule for pulmonary embolism.

Specialists in various medical disciplines use different criteria to diagnose diseases, e.g., Framingham criteria for congestive heart failure, Duke criteria for infective endocarditis (IE), Wells Clinical Prediction Rule for pulmonary embolism (PE) or deep venous thrombosis (DVT), Diagnostic criteria for Thromboangiitis Obliterans, Jones criteria for Acute Rheumatic Fever (ARF), Eagle criteria, Apache score, Framingham stroke risk criteria, Goldberger criteria, and many others¹.

¹ http://medicalcriteria.com/

Some of these criteria involve domain-driven rules: e.g., the Wells rule for predicting pulmonary embolism (PE) uses the points specified in Table 1, where a total greater than 6 indicates high risk, a total between 2 and 6 indicates moderate risk, and a total less than 2 indicates low risk of PE. Other criteria involve hand-crafted rules: e.g., the Duke criteria for infective endocarditis (IE) identify 2 major conditions (e.g., positive blood culture) and 6 minor conditions (e.g., fever, microbiological evidence), and then diagnose IE if any of the following holds: (1) two major conditions are true, (2) one major and three minor conditions are true, or (3) five minor conditions are true.

The Problem: In most of these criteria, the points (if present) are somewhat ad hoc estimates. Also, disease risk is classified in these criteria by hand-coded algorithmic rules designed by a domain expert (e.g., thresholds on the sum of points, or majority rules, as explained above), generally based on the data that the domain expert has seen over the years. If we can instead encode these criteria by learning weighted rules (with weights corresponding to the points) from data to create predictive models, then we could have a more principled (and possibly more accurate) approach for designing these criteria.

Proposed Approach: We propose using MLNs to better infer the risk of certain diseases, as outlined by these clinical criteria. The rules that are already present in the criteria can be the initial formulas in the MLN. If we have patient data records for the diseases along with the corresponding symptoms that were demonstrated, we can use the data to learn the weights of the MLN formulas as well as learn new combinations of rules, yielding better formulas for detecting the disease under consideration. Since we already have the hand-created clinical criteria rules, those can be used as the initial set of rules to seed the MLN creation and also to constrain the weight/structure learning algorithms of the MLN.
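As used today, the Wells rule is a deterministic point sum with fixed thresholds. A minimal sketch of that baseline follows; the feature names are illustrative identifiers, not standard clinical codes.

```python
# Wells rule for PE as a deterministic scoring function (points from
# Table 1). Feature names are illustrative, not standard clinical codes.
WELLS_POINTS = {
    "clinical_symptoms_of_dvt": 3.0,
    "other_diagnosis_less_likely_than_pe": 3.0,
    "heart_rate_over_100": 1.5,
    "immobilization_or_surgery_within_4_weeks": 1.5,
    "previous_dvt_or_pe": 1.5,
    "hemoptysis": 1.0,
    "malignancy": 1.0,
}

def wells_risk(findings):
    """findings: set of feature names present for the patient.
    Returns (total points, risk category)."""
    total = sum(WELLS_POINTS[f] for f in findings)
    if total > 6:
        category = "high"       # total greater than 6
    elif total >= 2:
        category = "moderate"   # total between 2 and 6
    else:
        category = "low"        # total less than 2
    return total, category
```

The MLN proposal sketched in this section would replace these hand-set points and hard thresholds with formula weights learned from patient records.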
One of the advantages of this approach is that the decision process for clinical diagnosis is then based on a robust statistical inference process rather than the deterministic algorithms currently used in medical decision-making based on clinical criteria. The performance of MLNs can be estimated by standard validation techniques: one can compare the accuracy of the MLN model in predicting the presence of disease to the accuracy of the corresponding clinical criterion, over a test set of patients whose disease condition is known. Note that probabilistic predictive models have already been successful in providing improvements over standard criteria; e.g., the recently proposed PhysiScore (Saria et al., 2010) has much higher accuracy in predicting infant morbidity than current clinical criteria like the Apgar score for physiological assessment of newborn infants (Apgar, 1953).
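The validation procedure just described can be sketched as follows; all patient labels, probabilities, and criterion outputs below are synthetic and purely illustrative.

```python
# Hypothetical validation sketch: compare a probabilistic model's
# thresholded predictions against a rule-based criterion's predictions
# on a test set of patients with known disease status.
# All numbers below are synthetic and purely illustrative.

def accuracy(predictions, labels):
    """Fraction of patients whose predicted status matches the known one."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

labels = [1, 0, 1, 1, 0, 0, 1, 0]            # known disease status

# Marginal probabilities an MLN might output, thresholded at 0.5.
mln_probs = [0.91, 0.22, 0.67, 0.48, 0.10, 0.35, 0.80, 0.55]
mln_preds = [int(p >= 0.5) for p in mln_probs]

criterion_preds = [1, 0, 0, 1, 0, 1, 1, 1]   # e.g., from a point threshold

mln_acc = accuracy(mln_preds, labels)
criterion_acc = accuracy(criterion_preds, labels)
```

The same held-out test set is scored by both predictors, so their accuracies are directly comparable.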

3. Integrating Information across Health Records

For getting the data used in some of the complex clinical criteria discussed in the last section, medical personnel often have to gather patient health records from multiple sources. In many healthcare provider networks, information about a patient is distributed in health records across different providers. This is very prevalent in distributed networks, e.g., PPO networks, where patient health records can be distributed across multiple doctors' offices. But it is also an issue in more centralized networks, e.g., an HMO network, where patient data can often be split across multiple departments of specialization, especially for more complicated cases.

Today, this process of data integration and overall evaluation from multiple records is mostly manual, and is therefore error-prone. Automating this process could aid doctors by merging data and flagging inconsistencies, which would be easier to evaluate than going through the tedious manual process.

The Problem: One of the core tasks in this setting is combining information from multiple sources. Sometimes the information from one source could be incomplete or erroneous, which should be taken into account and potentially fixed during information integration. Directly integrating the data from different sources may not suffice: each individual source may be performing some underlying inference task, in which case combining information across the different sources requires integration of data as well as enforcing consistency between the individual inference outcomes.

Each information source can vary in degree of reliability and completeness. Doctors' offices that rely on medical transcription to populate a patient's health records can have typographical errors in the drug names of an allergy listing, and health records may not be complete and up-to-date; e.g., a patient who did not have a drug allergy when seeing one doctor may have developed it later, so that the drug allergy information in the patient's record at that doctor's office is incomplete. Each doctor's office also makes a decision on the health of a patient based on the information available to it. However, some doctors might need access to all of these records, e.g., a doctor who is investigating the patient's medical history before performing heart surgery and has to determine whether the patient is fit to undergo the surgery. Each individual record can give a different type of information, e.g., one has ENT information, while another has ObGyn records. In addition, the health record (e.g., blood test results) from each source could be accompanied by analysis from a doctor or technician, assessing whether the patient is healthy based on the report. The surgeon would consider all of the individual patient records as well as the health reports together, effectively performing a final joint inference to determine if the patient is overall fit for the surgery.

In the process, it could also be necessary to resolve possible inconsistencies while aggregating information from these different health records. For example, the doctor may need to resolve differences in recorded drug allergies. Another important aspect in which this reasoning could help the surgeon is in identifying adverse drug interactions: the surgeon can then determine the suitability of the drugs necessary for the surgery, given the person's previously recorded history of prescribed medications for other known health conditions.

Proposed Approach: The task of extracting and combining information from multiple sources in health informatics involves integrating the data from these sources as well as enforcing consistency between specific inference outputs. For the purpose of this position paper, we assume that data is either directly available from the individual sources in a structured format, or that we can use a state-of-the-art information extraction tool for extracting structured data from unstructured or semi-structured health record text. Assuming the data is available from the individual sources in the necessary structured format, we focus on (1) performing inference for the sub-tasks using the facts extracted by the information extraction systems from individual sources, and (2) doing an overall joint inference. For both problems (1) and (2), we propose using MLNs.

MLN for individual sources: Both the data and the rules from each individual information source can be modeled as an MLN. Running information extraction on the health records generates grounded entities as well as relations between the entities. The grounded entities constitute the constants in the MLN, while the extracted relations between them constitute the facts in the MLN. We recommend mapping the constants and facts of these individual MLNs to a common ontology, e.g., the UMLS ontology (Bodenreider, 2004), so that during subsequent information integration we do not have to worry about schema matching or other techniques for arriving at a common terminology. Each information source would have some domain-specific rules that are targeted to solve a sub-task. For example, a cardiologist will have some specific rules (see Section 2 for examples) to determine if a patient is healthy, after looking at the data relating to the cardiovascular health of the patient.
These rules can be flexibly modeled using formulas in first-order logic with associated weights in the MLN, where the weights can either be specified from the domain knowledge of an expert or learned from data (if available). So, in summary, the set of constants and facts generated by an information extraction system from a source's patient records, along with weighted formulas encoding domain-specific rules, defines the MLN for that source.

Composite MLN: We cast the problem of integrating data and inference output from multiple sources into one of (a) getting MLNs corresponding to each source, and (b) combining the individual MLNs into a single composite MLN. The composite MLN should: (1) be a valid MLN, i.e., it should define a valid probability distribution over the set of possible truth assignments, (2) resolve potential logical inconsistencies between the results of the individual MLNs, and (3) have rules that encode the combined knowledge from the rules in the different individual MLNs.

The direct way to combine MLNs is to take the union of the constants, facts and rules of the individual MLNs. However, simply doing this could cause some problems, as outlined below. Firstly, the formulas coming from different individual MLNs could conflict with each other. For example, suppose one MLN has the formula A:

∀x ¬healthy(x) =⇒ overweight(x)  2.0,

while another MLN has the two formulas B and C:

∀x ¬healthy(x) =⇒ underweight(x)  2.0,
∀x underweight(x) =⇒ ¬overweight(x).

A and B each have a high probability of being true (corresponding to the weight 2.0), while C is a deterministic rule. Formula A could be present in the MLN of a doctor's office that treats clinical obesity, while formulas B and C could be present in the MLN of a doctor's office that treats patients having weight loss due to chronic illnesses, e.g., cancer. Suppose we combine the rules from the two MLNs directly, so that A, B and C are in the same MLN, and consider a patient who is known to be not healthy. Formula A indicates that the patient is overweight with high probability, while formulas B and C together indicate that the patient is not overweight with high probability. Combining the formulas directly therefore yields an answer that is not very meaningful, since the patient is considered with high probability to be both overweight and not overweight. In this case, it is necessary to learn the weights of the formulas in the composite MLN based on other data. For example, if we knew from the patient's health records that the patient has cancer and is undergoing chemotherapy, then there is a high chance that the patient is not overweight; in that case, the weight of formula A should be suitably decreased in this particular composite MLN.

Secondly, if the composite MLN has no formulas other than those inherited from the individual MLNs, then we may not be able to have formulas that combine rules from different MLNs. For example, suppose two individual MLNs have the formulas ∀x ¬overweight(x) =⇒ healthy(x) and ∀x ¬smoker(x) =⇒ healthy(x). In the composite MLN, it is useful for the purpose of medical diagnosis to have the rule ∀x ¬overweight(x) ∧ ¬smoker(x) =⇒ healthy(x).
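The incoherence of the naive union can be checked by enumerating the possible worlds for a single patient known to be not healthy, treating C as a hard constraint and using the weights from formulas A and B above. This is an illustrative sketch of the ground model for one patient, not a general MLN engine.

```python
from math import exp

# Worlds over (overweight, underweight) for one patient who is known to
# be not healthy; hard rule C (underweight => not overweight) rules out
# the world where both atoms are true.
worlds, weights = [], []
for ow in (False, True):
    for uw in (False, True):
        if uw and ow:          # violates deterministic rule C
            continue
        w = 0.0
        if ow:                 # A: not healthy => overweight, weight 2.0
            w += 2.0
        if uw:                 # B: not healthy => underweight, weight 2.0
            w += 2.0
        worlds.append((ow, uw))
        weights.append(exp(w))

Z = sum(weights)
p_overweight  = sum(wt for (ow, _), wt in zip(worlds, weights) if ow) / Z
p_underweight = sum(wt for (_, uw), wt in zip(worlds, weights) if uw) / Z
# Both marginals come out equal (about 0.47): the merged model claims the
# patient is simultaneously likely overweight and likely underweight.
```

The merged model thus spreads nearly all of its probability mass over the two mutually exclusive states, which is exactly the uninformative outcome that weight re-learning on the composite MLN is meant to repair.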
So, it is important to be able to learn rules that compose formulas coming from the MLNs of different sources, making it necessary to perform structure learning in the composite MLN (along with learning the weights of the new rules). So while combining the formulas, it is important to have weight learning (Huynh & Mooney, 2011) to be able to resolve inconsistencies by suitable weighting, and structure learning (Kok & Domingos, 2005) to be able to learn rules that combine formulas coming from different sources.

There are other possible solutions for combining the MLNs. One can learn a decision tree that determines when to use each individual MLN; e.g., in the healthy/overweight example, the decision tree would indicate that, based on the patient's history, if he/she has obesity problems then the composite MLN should put a higher weight on formula A, whereas if the person has a chronic illness like cancer then the composite MLN should put a higher weight on formulas B and C. Another possible composition would be to chain the inference output of one MLN into another in a pipeline architecture. A different combination might use the probabilities generated by each MLN as (probabilistic) facts for the other MLN until the marginal probabilities converge; this approach would be useful if any part of the original data is missing while creating the composite MLN. Since different MLNs are derived from different sources, another aspect of the problem is that we can have more confidence in one source than in another. Estimating the confidences of the sources from the available data, and using these confidence values in estimating the weights of the formulas in the composite MLN, is another interesting line of research to explore.

Conclusion

In this position paper, we have discussed two problems in health informatics and presented solution strategies that use the framework of Markov Logic Networks (MLNs). There are other problems in health informatics that would be interesting to investigate, e.g., decreasing the relatively high false positive rate of breast cancer screening tests, where MLNs and other probabilistic models could be quite useful.

References

Apgar, Virginia. A proposal for a new method of evaluation of the newborn infant. Curr. Res. Anesth. Analg., 32(4), 1953.

Bodenreider, Olivier. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Research, 32:D267–D270, 2004.

Huynh, Tuyen N. and Mooney, Raymond J. Online max-margin weight learning for Markov logic networks. In SDM, 2011.

ICML Workshop. Machine Learning in Health Care Applications, 2008.

Kok, Stanley and Domingos, Pedro. Learning the structure of Markov logic networks. In ICML, pp. 441–448, 2005.

NIPS Workshop. Predictive Models in Personalized Medicine, 2010.

Owre, Sam and Shankar, Natarajan. PCE User Guide, Technical manual, ver 1.0. CSL, SRI International, July 2009.

Richardson, Matthew and Domingos, Pedro. Markov logic networks. Machine Learning, 62(1-2):107–136, 2006.

Saria, Suchi, Rajani, Anand K., Gould, Jeffrey, Koller, Daphne, and Penn, Anna A. Integration of early physiological responses predicts later illness severity in preterm infants. Science Translational Medicine, 2(48), 2010.