Integrating Microarray Gene Expression Object Model and Clinical Document Architecture for Cancer Genomics Research

Yu Rang Park, B.S.1, Hye Won Lee, M.S.1, Ju Han Kim, M.D., Ph.D.1,2*

1Seoul National University Biomedical Informatics (SNUBI), 2Human Genome Research Institute, Seoul National University College of Medicine, Seoul 110-799, Korea, [email protected]

ABSTRACT

Systematic integration of gene expression profiling with clinical information may facilitate cancer genomics research. MAGE-OM (MicroArray Gene Expression Object Model) defines standard objects for genomic but not for clinical data. HL7 CDA (Clinical Document Architecture) is a document model for clinical information, describing syntax but not semantics. We designed a document template and common data elements in XML Schema with additional constraints for CDA to define content semantics, enabling data model-level integration of MAGE-OM and CDA for cancer genomics research.

Introduction

To meet the growing need for improved data communication between clinical and genomics research, it is necessary to overcome the barriers of data heterogeneity and lack of standards. Although the problem has been studied extensively, no clear solution has yet emerged [1]. HL7 CDA (Clinical Document Architecture) is a document model for the exchange of clinical information in XML [2]. MAGE-OM (MicroArray Gene Expression Object Model) is an object-oriented data model for gene expression data [3]. HL7 CDA, however, describes only the generic structure (i.e. syntax) of a clinical document and not the meaning of its contents (i.e. semantics), which is necessary for data model-level integration with the object models proposed in molecular biology. We designed a document template in XML Schema with additional constraints for CDA to define content semantics, enabling data model-level integration of MAGE-OM and CDA for cancer genomics research.

Methods and Results

Integrating CDA and MAGE-OM requires a template that places constraints on CDA. CDA is a generic container for representing clinical information in a form that is both human- and machine-readable. It allows one or more hierarchical sets of templates to be applied, which serve as additional constraints. Following the approach of the NCICB (National Cancer Institute Center for Bioinformatics) CDE (Common Data Element) development effort, we systematically extracted information units and common data elements from a collection of breast and lung cancer genomics studies and designed a document template using XML Schema for model-level data description.

There is a large gap between the viewpoints of clinical and genomic data models. Genomic data models view the entire clinical record simply as an “experimental sample,” like a ‘specimen’, whereas clinical data models view genomic data as a “molecular analysis result,” like a ‘CBC’. Because CDA defines only the generic document structure and not content semantics, we applied XML Schema to design a section- and entry-level template for “molecular analysis result” information, which was then linked to the BioSource class of MAGE-OM. Finally, we created a template for a cancer genomics data model consisting of ten clinical and genomic sections: Patient demography, History, Physical examination, Diagnosis, Treatment, Tumor info, Molecular analysis, Specimen info, Recurrence info and Death info.

Discussion

We created a template using XML Schema to integrate CDA and MAGE-OM. Among the three methods of creating templates [2], we applied the more constrained XML Schema. The resulting integrated data model conforms to standards and offers better generalizability than a local implementation strategy. It enables both syntactic and semantic validation of CDA XML documents integrated with MAGE-ML documents derived from MAGE-OM.

Acknowledgement: This study was supported by a grant from Korea Health 21 R&D Project (0412-MI01-0416-0002).
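As a rough illustration of the section-level constraint described in Methods and Results, the sketch below checks that a document contains the ten named sections. It is not the authors' XML Schema template. The namespace-free element names (`ClinicalDocument`, `structuredBody`, `section`, `title`), the matching on section titles, and the helper names `missing_sections` and `build_example` are all assumptions made for illustration; real CDA documents use the HL7 v3 namespace and a more deeply nested body.

```python
import xml.etree.ElementTree as ET

# The ten section names listed in the text. Identifying a section by its
# <title> text is an assumption of this sketch, not part of the paper.
REQUIRED_SECTIONS = [
    "Patient demography",
    "History",
    "Physical examination",
    "Diagnosis",
    "Treatment",
    "Tumor info",
    "Molecular analysis",
    "Specimen info",
    "Recurrence info",
    "Death info",
]


def missing_sections(xml_text):
    """Return the required section titles that are absent from the document."""
    root = ET.fromstring(xml_text)
    found = set()
    # Hypothetical, namespace-free layout: any <section> with a <title> child.
    for section in root.iter("section"):
        title = section.findtext("title")
        if title is not None:
            found.add(title.strip())
    return [name for name in REQUIRED_SECTIONS if name not in found]


def build_example(titles):
    """Build a toy document with one <section> per title (illustration only)."""
    root = ET.Element("ClinicalDocument")
    body = ET.SubElement(root, "structuredBody")
    for text in titles:
        section = ET.SubElement(body, "section")
        ET.SubElement(section, "title").text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `missing_sections(build_example(REQUIRED_SECTIONS))` on a complete toy document yields an empty list. A check like this covers only the presence of sections; the entry-level constraints and the link to the MAGE-OM BioSource class would need a schema-aware validator.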

References

1. Shabo A. Genotype Shared Model, V0.9, Health Level Seven Clinical-Genomics SIG, 2005.
2. Dolin RH, Alschuler L, Beebe C, et al. The HL7 Clinical Document Architecture. J Am Med Inform Assoc 2001;8(6):552-69.
3. Spellman PT, Miller M, Stewart J, et al. Design and implementation of microarray gene expression markup language (MAGE-ML). Genome Biol 2002;3(9):RESEARCH0046.1-0046.9.

AMIA 2005 Symposium Proceedings Page - 1073
