A Simple Bayesian Network for Tuberculosis Detection

Haobo Ma MD,1,2 Fu-Chiang Tsui PhD,1,2 Jeremy U. Espino MD,1,2 Michael M. Wagner MD, PhD1,2

1The RODS Laboratory, Center for Biomedical Informatics, University of Pittsburgh, PA
2Biomedical Security Institute, University of Pittsburgh and Carnegie Mellon, PA

Then, we validated the network with information available within the 2 weeks after admission, which included all variables from all five groups. Again, we calculated the sensitivity, specificity, and area under the ROC curve.

The early diagnosis of tuberculosis is important in reducing the exposure of other individuals to tuberculosis infection. Unfortunately, the identification of M. tuberculosis in culture requires four to five weeks before the result is available. Even rapid genetic probes require one to two weeks after a laboratory receives the sample. Thus, it is common practice for physicians to place patients "suspected" of tuberculosis infection in isolation and on prophylactic antibiotics based on signs, symptoms, preliminary laboratory tests, and radiological findings.

In the first step of the validation, the network gave an area under the ROC curve of 0.99 (95% CI, 0.97 to 1.0). We chose a cutoff probability of 0.6 for maximal discriminating power. At this cutoff, the network detected 11 of 12 tuberculosis cases, achieving a sensitivity of 92% (95% CI, 62 to 100%) and a specificity of 98% (95% CI, 89 to 100%).
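The reported intervals are consistent with exact binomial (Clopper-Pearson) confidence intervals, although the abstract does not state which method was used. The following is a minimal Python sketch under that assumption. The count of 49 correctly classified controls out of 50 is inferred from the reported 98% specificity, and the helper names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(g, target: float, iters: int = 60) -> float:
    """Bisection on [0, 1] for g(p) = target, where g is increasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for a proportion of k successes in n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


# Sensitivity: 11 of 12 tuberculosis cases detected (from the text).
sens_lo, sens_hi = clopper_pearson(11, 12)
# Specificity: 49 of 50 controls, an inferred count (98% of 50).
spec_lo, spec_hi = clopper_pearson(49, 50)
print(f"sensitivity {11 / 12:.2f}, 95% CI {sens_lo:.2f} to {sens_hi:.2f}")
print(f"specificity {49 / 50:.2f}, 95% CI {spec_lo:.2f} to {spec_hi:.2f}")
```

With only 12 positive cases the interval around the sensitivity is wide, which is the same limitation the authors raise about the small validation sample.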

We developed a simple Bayesian network for detecting pulmonary tuberculosis in hospitalized non-HIV patients. The network was developed with Hugin Expert, a development environment for Bayesian network construction. The network comprised 19 Boolean-valued nodes divided into five groups: demographics, clinical findings, radiological findings, laboratory, and medications. We estimated the conditional probability tables (CPTs) for each node from a literature review and consultation with two board-certified internists. We specifically parameterized the structure and CPTs to exclude HIV-infected patients. We excluded HIV-infected patients because we lacked the knowledge, whether from experts or from the literature, to parameterize the CPT of the "tuberculosis" node when conditioned on HIV status.
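To illustrate how Boolean nodes and CPTs combine into a diagnostic probability, here is a minimal Python sketch of inference by enumeration. It is not the authors' network: the structure (one "tuberculosis" parent with three independent child findings), the node names, and every probability below are hypothetical placeholders, since the abstract does not publish the actual structure or CPT values.

```python
from itertools import product

# Hypothetical prior P(tuberculosis = True); not a value from the paper.
PRIOR_TB = 0.05

# Hypothetical CPTs: P(finding = True | tuberculosis), keyed by the parent value.
CPT = {
    "clinical_finding": {True: 0.80, False: 0.20},
    "radiological_finding": {True: 0.70, False: 0.05},
    "laboratory_finding": {True: 0.60, False: 0.10},
}


def joint(tb: bool, findings: dict[str, bool]) -> float:
    """Joint probability of one complete assignment to all nodes."""
    p = PRIOR_TB if tb else 1 - PRIOR_TB
    for name, value in findings.items():
        p_true = CPT[name][tb]
        p *= p_true if value else 1 - p_true
    return p


def posterior_tb(evidence: dict[str, bool]) -> float:
    """P(tuberculosis = True | evidence), summing out unobserved findings."""
    hidden = [name for name in CPT if name not in evidence]
    totals = {True: 0.0, False: 0.0}
    for tb in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            findings = {**evidence, **dict(zip(hidden, values))}
            totals[tb] += joint(tb, findings)
    return totals[True] / (totals[True] + totals[False])


# Partial evidence versus evidence from every group of findings.
early = posterior_tb({"clinical_finding": True, "radiological_finding": True})
late = posterior_tb({"clinical_finding": True, "radiological_finding": True,
                     "laboratory_finding": True})
print(f"early posterior {early:.3f}, late posterior {late:.3f}")
```

Nodes without evidence are summed out rather than assumed negative, which is what lets a Bayesian network produce a probability from whatever subset of findings is available at a given time.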

In the second step of the validation, the area under the ROC curve was 1.0. Again we used the optimal cutoff probability of 0.6. At this cutoff, the sensitivity was 100% and the specificity was 100%. Caveats to this work are the manual abstraction of the chart data by physicians and the small sample size of the validation. Both indicate the preliminary nature of this work.

One advantage of manual abstraction is that it accurately represents the patient's condition. However, it may introduce personal bias, and it is a time-consuming process. Future research will measure the detection performance of the network using natural language processing to extract features from narrative clinical information such as chest x-ray reports, progress notes, and admission notes.

We retrospectively validated the network on a group of 62 cases: 12 patients who were diagnosed with tuberculosis during their hospital stay and 50 randomly selected non-tuberculosis inpatients. All cases were extracted from the clinical data repository at the UPMC Health System, Pittsburgh, PA, and covered January 1999 to December 2001. None of these patients were HIV positive. One medical informatics post-doctoral fellow reviewed the data and determined the Boolean value of each feature for each case.

Despite three years of data collection, the sample comprised only 62 cases and probably did not capture the range of case variation that would be encountered in the real world. Future evaluations of the network's performance will use a larger dataset.

Although the Bayesian network is not the only inference method that could be used to diagnose tuberculosis, our study is the first to use it for this purpose. We conclude that the Bayesian network is a promising method for timely tuberculosis diagnosis, assuming the data can be extracted automatically and accurately from electronic sources.

To determine whether diagnostic performance differs when inference is performed early in a patient's hospital stay rather than later, we validated the network in two steps. First, we validated the network using only the patient's demographics, clinical findings, and radiological findings available in the first 72 hours after admission. We determined the sensitivity, specificity, and area under the ROC curve of the network.
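The three metrics named here can be computed directly from the network's output probabilities and the true diagnoses. The following is a minimal Python sketch on made-up scores. The data are hypothetical, and treating a probability equal to the cutoff as positive is an assumption, since the abstract does not say how ties at the cutoff were handled.

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case; ties count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores: list[float], labels: list[int],
              cutoff: float) -> tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical posterior probabilities and true diagnoses (1 = tuberculosis).
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.90, 0.70, 0.40, 0.50, 0.20, 0.10, 0.05]

auc = roc_auc(scores, labels)
sensitivity, specificity = sens_spec(scores, labels, cutoff=0.6)
print(f"AUC {auc:.3f}, sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
```

Running the same functions once on probabilities computed from early evidence and once on probabilities computed from all evidence gives the early-versus-late comparison described above.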

AMIA 2002 Annual Symposium Proceedings, p. 1092
