2009 International Conference on Emerging Technologies
Efficient Metadata Loading Algorithm for Generation and Parsing of Health Level 7 Version 3 Messages
Yasir Mehmood, Muhammad Younus Javed
Muhammad Afzal, Hafiz Farooq Ahmad
Department of Computer Engineering College of Electrical & Mechanical Engineering, NUST. Rawalpindi, Pakistan
Department of Computing School of Electrical Engineering & Computer Science, NUST. Islamabad, Pakistan {muhammad.afzal, farooq.ahmad}@seecs.edu.pk
Abstract— Information technology has started to focus on healthcare enterprises in order to provide better medical care. Several healthcare standards are used to communicate medical information across health enterprises swiftly and reliably. HL7 is one of these standards and is used for the exchange of medical information between healthcare systems. The main focus of this research is to make metadata processing in HL7 v3 efficient. HL7 v3 is an emerging standard that aims at semantic interoperability through well-defined information models such as the reference information model (RIM), the domain message information model (D-MIM), and the refined message information model (R-MIM). For implementation, these models are converted to a technology-specific format such as the model interchange format (MIF), which carries the metadata as XML. MIF files must be loaded into memory to generate and parse messages. The HL7 Java SIG API uses these files inefficiently: it loads every association present in a MIF file, whether or not it is required. This is not only a performance issue but also a waste of memory. In this paper, we propose an algorithm that improves message generation and parsing by skipping unnecessary associations during MIF loading. The technique is based on the proxy design pattern. It removes the performance bottleneck of the API and makes it space efficient.
I. INTRODUCTION
Health Level 7 (HL7) is one of several ANSI-accredited Standards Developing Organizations (SDOs) operating in the healthcare arena [1]. HL7 has developed several standards in the healthcare domain: conceptual standards such as the HL7 Reference Information Model (HL7 RIM), document standards such as the HL7 Clinical Document Architecture (HL7 CDA), application standards such as the HL7 Clinical Context Object Workgroup (HL7 CCOW), and messaging standards such as HL7 version 2.x and HL7 version 3.0. Messaging standards are of high importance because they define how information is packaged and communicated from one party to another [2]. Generating and parsing these messages relies on metadata, which describes the structure of the message being generated, the format on which messages are based, and the constraints applied to associations, attributes, etc. All HL7 v3 messages are based on the MIF and on the hierarchical message definition (HMD). Both formats are XML-based [3] and are supported by the Java SIG API; the proposed methodology is based on MIF, not on HMD.

Metadata loading for the generation of HL7 v3 messages poses several problems that must be handled for messages to be generated successfully. Loading the metadata completely requires a lot of memory, and metadata files may cross-reference each other, which can cause memory errors such as stack overflow. The methodology proposed in this paper loads metadata efficiently for message generation. It creates a new, temporary version of the existing MIF file, based on the existing version. This temporary file first receives all of the mandatory associations relative to the entry point, i.e., the point from which a particular metadata file is attached to the referencing message [4]. Once the mandatory associations are included, the developer decides which optional associations are required, and only those are included. When all mandatory and required optional associations of the entry point have been handled, the classes referenced in these associations are taken as entry points, and so on. Message generation therefore loads only the associations needed by a particular application, which uses space efficiently and removes the memory errors.

978-1-4244-5632-1/09/$26.00 ©2009 IEEE

II. MODEL INTERCHANGE FORMAT
MIF is a formal specification of all HL7 artifacts. It is based on XML and supports the storage and exchange of artifacts as part of the HL7 development process [5]. MIF is the pre-publication format in which HL7 stores its own data [3]. MIF files provide the information needed to generate messages, including the basic rules for associations between the classes of a particular message. This metadata describes everything about the message, i.e., the associations among classes, the cardinality of associations, the attribution level, etc. A MIF file contains two kinds of associations: mandatory associations (minimum cardinality greater than zero) and optional associations (minimum cardinality equal to zero). Mandatory associations are necessary for generating a message. Optional associations, in contrast, can be added or ignored depending on the requirements of a message, and in a given scenario many of them may be of no use.

A. MIF and HL7 API

The core HL7 artifacts, such as vocabulary, data types, the RIM, associations between RIM classes and their cardinality, and the cardinality of attributes, are also expressed in MIF. The Java Special Interest Group (SIG) has developed, with the approval of HL7 Inc., an API for the generation, parsing, and validation of messages using MIF, known as the Java SIG API. Similarly, the Eclipse OHT platform has its own Java API for generating and parsing HL7 messages based on MIF. Both projects are open source. Both APIs are very low level; most of the time they mirror the structure of the MIF file rather than offering higher-level concepts [5].

III. PROBLEM
The problem with the methodology followed in the Java SIG API is that all associations referenced in a metadata file are loaded. This is appropriate when all of the metadata really is needed, but in most cases it causes problems, because only a few of the associations are required. The existing methodology loads every association, whether mandatory or optional, and when an association is loaded, the class it references is loaded as well. Some associations reference another metadata file, which has the same structure and is treated in the same manner. The system therefore does a lot of extra work, wasting time and memory. In most cases there is no need to load all optional associations; an application may need only three or four of them, and consequently only a few of the referenced metadata files rather than all of them. A single metadata file may reference dozens of other metadata files while a developer needs only two or three, so the developer should be able to select from the list of available files; the current methodology provides no such option. Because a single metadata file can reference (in different associations) a large number of other metadata files, some of these files may refer directly or indirectly back to the root metadata file. Such cross-references consume a lot of memory and can result in stack overflow errors. The problem is encountered not only when generating a message but also when parsing it on the receiver side.
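To illustrate why cross-references can end in a stack overflow, the sketch below models MIF files as a hypothetical reference graph in which file A refers to B, B to C, and C back to A. Following every reference without remembering what has already been loaded recurses without end (Python raises RecursionError, the analogue of Java's StackOverflowError), while recording the files already loaded lets the traversal terminate. The file names and the loaded-set guard are illustrative assumptions, not code from the Java SIG API.

```python
from typing import Dict, List, Optional, Set

# Hypothetical reference graph: MIF file -> MIF files referenced by its associations.
# A -> B -> C -> A is an indirect reference back to the root file.
REFERENCES: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["C"],
    "C": ["A"],
}


def load_eagerly(mif: str) -> int:
    """Follow every reference with no record of what is already loaded.

    On a cycle this never terminates on its own; Python stops it with
    RecursionError once the recursion limit is reached.
    """
    count = 1
    for ref in REFERENCES[mif]:
        count += load_eagerly(ref)
    return count


def load_once(mif: str, loaded: Optional[Set[str]] = None) -> Set[str]:
    """Follow references, but skip any file that has already been loaded."""
    if loaded is None:
        loaded = set()
    if mif in loaded:
        return loaded
    loaded.add(mif)
    for ref in REFERENCES[mif]:
        load_once(ref, loaded)
    return loaded


try:
    load_eagerly("A")
except RecursionError:
    print("eager loading: RecursionError on the cyclic references")
print("guarded loading:", sorted(load_once("A")))  # ['A', 'B', 'C']
```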
IV. METHODOLOGY
The approach proposed in this paper applies both to loading metadata files for message generation and to parsing and validating messages. It is based on the proxy design pattern. Instead of loading all associations present in the metadata file, it first loads only the mandatory associations and then those associations that are optional at the R-MIM level but required in a particular scenario. This defers the full cost of message generation until the application actually needs it, which is the main motivation of the proxy design pattern [6]. To make message generation efficient in both time and space, the API is optimized to load only the associations required in a particular scenario. First, all mandatory associations attached to the entry point of the metadata file are identified by reading the MIF file. Associations within a metadata file sometimes reference other MIF files, which have a similar structure and may cross-reference each other. Mandatory associations are identified by the value of the minimum multiplicity attribute of the “targetConnection” tag nested inside the association tag: for a mandatory association, this value is greater than ‘0’. After the mandatory associations are identified, the metadata files they reference are identified and the application developer is asked to load them. When all mandatory associations attached to the entry point have been identified, the classes referenced in these associations are taken as entry points in turn; the mandatory associations attached to each of them are identified, any metadata files referenced in those associations are loaded, and so on. Once the mandatory associations are done, the optional associations are handled.
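A minimal sketch of the proxy idea, assuming a hypothetical `load_association` function that stands in for reading one association from a MIF file; the association names and target class names are made up for illustration. The proxy exposes the same attribute as the real association but calls the loader only on first access, so associations that are never touched are never loaded.

```python
from typing import Callable, Dict, List, Optional


class Association:
    """Fully loaded association (stand-in for a parsed MIF fragment)."""

    def __init__(self, name: str, target_class: str) -> None:
        self.name = name
        self.target_class = target_class


class AssociationProxy:
    """Same interface as Association, but loads the real object lazily."""

    def __init__(self, name: str, loader: Callable[[str], Association]) -> None:
        self.name = name
        self._loader = loader
        self._real: Optional[Association] = None

    @property
    def target_class(self) -> str:
        if self._real is None:  # first access: load the real association once
            self._real = self._loader(self.name)
        return self._real.target_class


# Hypothetical association names and target classes.
TARGETS: Dict[str, str] = {"author": "ClassA", "verifier": "ClassB", "dataEnterer": "ClassC"}
LOAD_LOG: List[str] = []  # records which associations were really loaded


def load_association(name: str) -> Association:
    """Hypothetical loader: in practice this would read the MIF file."""
    LOAD_LOG.append(name)
    return Association(name, TARGETS[name])


proxies = {name: AssociationProxy(name, load_association) for name in TARGETS}
print(proxies["dataEnterer"].target_class)  # ClassC
print(proxies["dataEnterer"].target_class)  # ClassC, served from the cached object
print(LOAD_LOG)  # ['dataEnterer']
```

Callers hold proxies for every association but pay the loading cost only for the ones they use; the same structure would apply if the loader also pulled in a referenced MIF file.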
First, all optional associations (those with a minimum multiplicity of ‘0’) are identified. The system then asks the application developer whether she/he wants to load each of them. If so, the association is loaded (along with the MIF file it references, if any); otherwise it is removed from the metadata file. When the first-level optional associations are done, the second-level associations are handled: the system takes the classes referenced by the first-level associations as entry points, repeats the step above, and so on. After all associations have been handled, they must be loaded into memory as a new temporary metadata file. For this purpose, a temporary MIF file is created and all of the identified associations are written to it. Next, each MIF file referenced from these associations is traversed with the same steps, and the associations identified in it replace the association of the temporary MIF file that contained the reference. This continues until all associations, and all MIF files referenced from them, have been handled. At the end, the temporary MIF file is complete; the system is given its path and loads all of its associations.
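The level-by-level procedure above can be sketched as a breadth-first worklist. This is an illustration under assumptions: the model is a hypothetical in-memory dictionary rather than parsed MIF files, class names are qualified as "FILE/Class" so that a target in another MIF file shows which file would have to be loaded, and the `wants` callback stands in for the developer's choice of optional associations. The visited set is our addition; it keeps a cross-reference back to the root from being expanded twice.

```python
from collections import deque
from typing import Callable, Dict, List, NamedTuple


class Assoc(NamedTuple):
    source: str    # class the association leaves, as "FILE/Class"
    name: str      # association name
    min_mult: int  # minimumMultiplicity; > 0 means mandatory
    target: str    # referenced class, as "FILE/Class"


# Hypothetical model: class -> outgoing associations.
MODEL: Dict[str, List[Assoc]] = {
    "ROOT/Entry": [
        Assoc("ROOT/Entry", "subject", 1, "ROOT/Subject"),
        Assoc("ROOT/Entry", "author", 0, "CMET_A/Author"),
        Assoc("ROOT/Entry", "dataEnterer", 0, "CMET_B/DataEnterer"),
    ],
    "ROOT/Subject": [Assoc("ROOT/Subject", "patient", 1, "CMET_C/Patient")],
    "CMET_B/DataEnterer": [Assoc("CMET_B/DataEnterer", "location", 0, "CMET_D/Place")],
    # Cross-reference back to the root entry point.
    "CMET_C/Patient": [Assoc("CMET_C/Patient", "subjectOf", 0, "ROOT/Entry")],
}


def select_associations(entry: str, model: Dict[str, List[Assoc]],
                        wants: Callable[[Assoc], bool]) -> List[Assoc]:
    """Collect mandatory and wanted optional associations, level by level."""
    keep: List[Assoc] = []
    visited = {entry}
    queue = deque([entry])
    while queue:
        cls = queue.popleft()
        assocs = model.get(cls, [])
        mandatory = [a for a in assocs if a.min_mult > 0]
        optional = [a for a in assocs if a.min_mult == 0 and wants(a)]
        for a in mandatory + optional:  # mandatory first, then chosen optional ones
            keep.append(a)
            if a.target not in visited:  # referenced class becomes a later entry point
                visited.add(a.target)
                queue.append(a.target)
    return keep


kept = select_associations("ROOT/Entry", MODEL, lambda a: a.name == "dataEnterer")
files = sorted({"ROOT"} | {a.target.split("/")[0] for a in kept})
print([a.name for a in kept])  # ['subject', 'dataEnterer', 'patient']
print(files)                   # ['CMET_B', 'CMET_C', 'ROOT']
```

Files CMET_A and CMET_D are never needed in this run. Even when `wants` accepts every association, the traversal terminates despite the `subjectOf` reference back to ROOT/Entry, because that class is already in the visited set.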
A similar approach is used for parsing a message. Here the idea is to parse both the message file and the MIF file, and to load associations from the metadata files into memory on the basis of the content of the message file. The message file is parsed first; then the name of each start tag is compared with the name attribute of the “targetConnection” tag in the metadata file or, for choice boxes, with the “traversalName” attribute of the “participantClassSpecialization” tag nested inside the “targetConnection” tag. Based on these comparisons, the matching associations and the metadata files they reference are loaded into memory, which makes the message easy to parse. In the worst case, the existing and the new technique give the same result. In the best case, only one metadata file may need to be loaded, depending on the requirements of the message. This technique greatly reduces the time required to process metadata files and also improves space utilization.

V. DETAILS
In the proposed methodology, a temporary MIF file based on the requirements of the application is created and loaded into memory. To generate a message, all mandatory associations relative to the entry point are first identified, and a new metadata file is created into which these associations are written. Mandatory associations are easily identified from their attribute values; the “mif:association” tag and its nested “mif:targetConnection” tag are the important ones. First traverse to the “mif:association” tags attached to the entry point, then read the “mif:targetConnection” tag nested in each of them. Each “mif:targetConnection” tag has specialized attributes such as name, “minimumMultiplicity”, “maximumMultiplicity”, “isMandatory”, and “sortKey”. Whether an association is mandatory is decided by the value of the “minimumMultiplicity” attribute: if it is greater than zero, the association is mandatory; otherwise it is optional. When all mandatory associations of the entry point have been identified, the remaining (optional) associations of the entry point are considered, and those that are needed are written to the temporary MIF file. For example, the R-MIM of the result event, whose model interchange format file is POLB_MT004000, has many optional associations attached to the entry point, such as “recordTarget”, “author”, “verifier”, “performer”, “dataEnterer”, and “informationRecipient”. One application may use only the “dataEnterer” association, while another may use all of them, depending on its requirements.
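The sketch below shows how the “minimumMultiplicity” test could be applied with Python's standard `xml.etree.ElementTree`, and how the selected associations could then be written to a temporary MIF file. The MIF fragment is hypothetical and heavily simplified (real MIF files contain far more structure), the mandatory “subject” association is invented for the example, and the namespace URI bound to the `mif` prefix is an assumption to check against the actual files. Only one level of associations is handled here.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

MIF_URI = "urn:hl7-org:v3/mif"  # assumed namespace URI for the "mif:" prefix
NS = {"mif": MIF_URI}
ET.register_namespace("mif", MIF_URI)  # keep the "mif:" prefix when writing

# Hypothetical, simplified MIF fragment for the associations of one entry point.
MIF = f"""<mif:staticModel xmlns:mif="{MIF_URI}">
  <mif:association><mif:targetConnection name="subject" minimumMultiplicity="1" maximumMultiplicity="1"/></mif:association>
  <mif:association><mif:targetConnection name="recordTarget" minimumMultiplicity="0" maximumMultiplicity="1"/></mif:association>
  <mif:association><mif:targetConnection name="author" minimumMultiplicity="0" maximumMultiplicity="*"/></mif:association>
  <mif:association><mif:targetConnection name="dataEnterer" minimumMultiplicity="0" maximumMultiplicity="1"/></mif:association>
</mif:staticModel>"""


def classify(root: ET.Element) -> Dict[str, List[str]]:
    """Split associations into mandatory (min > 0) and optional (min == 0)."""
    groups: Dict[str, List[str]] = {"mandatory": [], "optional": []}
    for assoc in root.findall("mif:association", NS):
        conn = assoc.find("mif:targetConnection", NS)
        if conn is None:
            continue
        kind = "mandatory" if int(conn.get("minimumMultiplicity", "0")) > 0 else "optional"
        groups[kind].append(conn.get("name"))
    return groups


def prune(root: ET.Element, keep: Set[str]) -> None:
    """Remove every association whose targetConnection name is not in keep."""
    for assoc in list(root.findall("mif:association", NS)):  # copy: removing while iterating
        conn = assoc.find("mif:targetConnection", NS)
        if conn is not None and conn.get("name") not in keep:
            root.remove(assoc)


root = ET.fromstring(MIF)
groups = classify(root)
wanted_optional = {"dataEnterer"}  # the application's choice of optional associations
keep = set(groups["mandatory"]) | (set(groups["optional"]) & wanted_optional)
prune(root, keep)

# Write the temporary MIF file; its path would then be handed to the loader.
with tempfile.NamedTemporaryFile("wb", suffix=".mif", delete=False) as fh:
    ET.ElementTree(root).write(fh, encoding="utf-8", xml_declaration=True)
    temp_path = fh.name

kept = [c.get("name") for c in ET.parse(temp_path).getroot().iter(f"{{{MIF_URI}}}targetConnection")]
print(groups)  # {'mandatory': ['subject'], 'optional': ['recordTarget', 'author', 'dataEnterer']}
print(kept)    # ['subject', 'dataEnterer']
os.remove(temp_path)  # the temporary file could also be kept for reuse
```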
After all mandatory and optional associations of the entry point are identified, the classes referenced in these associations are taken as entry points in turn, the steps above are repeated, and so on. If an association references another MIF file, all steps in this section are repeated for the referenced MIF file, and the associations required from it replace the association (which contained the reference to that MIF file) in the new MIF file. At the end, one temporary metadata file has been created, containing only the associations that match the requirements. After the message is generated, the temporary metadata file may be removed or kept for further use.

A. Algorithm for generation of message
1. Identify the entry point of the given MIF file.
2. Identify all associations attached to the entry point of the given MIF file.
3. Identify the mandatory associations among those found in step 2: their “minimumMultiplicity” attribute in the “mif:targetConnection” tag nested in the “mif:association” tag is greater than 0.
4. Load the associations of step 3 into the temporary MIF file in memory.
5. Identify the optional associations, whose “minimumMultiplicity” attribute is ‘0’.
   a. Select those associations that match the requirements and add them to the temporary MIF file.
6. Identify the classes referenced in the associations identified in steps 3 and 5.
7. Take these classes as entry points in turn and repeat steps 2 to 6.
8. If another metadata file is referenced in the associations identified in steps 3 and 5:
   a. Repeat steps 1 to 7 with the referenced metadata file, and so on.

The flow of these steps is given in Figure 1.

To parse a message, both the message and the metadata file must be understood, because the message file is based on the MIF file. All of the associations that were loaded to generate the message can be identified from the contents of the message file, since every start tag in the message file has a counterpart in the MIF file. For simple associations, each start tag of the message file is compared with the name attribute of the “targetConnection” tag nested inside the association tag of the metadata file. Handling choice box associations is a little trickier.
It can be handled by comparing the corresponding tag of the message file with the traversal name attribute “traversalName” of the participant class specialization tag “participantClassSpecialization” nested inside the “targetConnection” tag of the metadata file. Based on these comparisons, the system picks the associations whose values match. These associations are then written to the temporary metadata file, which is used to parse the message.

Figure 1. Flow of activities during generation of message

B. Algorithm for parsing of message
1. Read the value of each element of the message file.
2. Compare these values with the values in the metadata file. There are two possibilities:
   a. If the association is simple, compare the value with the name attribute of the “targetConnection” tag nested inside the association tag.
   b. If the association is of choice type, traverse to the participant class specialization tag “participantClassSpecialization” and compare the value of the message tag with its “traversalName” attribute.
3. Load into memory the associations whose values match.
4. Leave the rest of the associations.
5. If other MIF files are referenced, load them and repeat steps 1 to 4.
6. Parse the message with the new temporary MIF files.
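A minimal sketch of steps 1 to 4, assuming a simplified MIF fragment without namespaces and assuming that the “value” compared in step 1 is the element (tag) name in the message, as in the earlier description of start tags. Both XML fragments, including the “component” choice association and its two specializations, are hypothetical. Simple associations are matched on the “name” attribute of “targetConnection”; choice associations are matched on the “traversalName” of any nested “participantClassSpecialization”.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, simplified MIF fragment (no namespaces).
MIF = """
<staticModel>
  <association><targetConnection name="subject" minimumMultiplicity="1"/></association>
  <association><targetConnection name="author" minimumMultiplicity="0"/></association>
  <association><targetConnection name="dataEnterer" minimumMultiplicity="0"/></association>
  <association>
    <targetConnection name="component" minimumMultiplicity="0">
      <participantClassSpecialization traversalName="observation"/>
      <participantClassSpecialization traversalName="procedure"/>
    </targetConnection>
  </association>
</staticModel>
"""

# Hypothetical received message: only some associations are present.
MESSAGE = """
<resultEvent>
  <subject/>
  <dataEnterer/>
  <procedure/>
</resultEvent>
"""


def associations_used(message_xml: str, mif_xml: str) -> List[str]:
    """Return the names of MIF associations whose tags occur in the message."""
    tags = {el.tag for el in ET.fromstring(message_xml).iter()}  # step 1
    used: List[str] = []
    for conn in ET.fromstring(mif_xml).iter("targetConnection"):  # step 2
        choices = [s.get("traversalName")
                   for s in conn.findall("participantClassSpecialization")]
        if choices:
            # 2b: choice box, compare against each specialization's traversalName
            matched = any(c in tags for c in choices)
        else:
            # 2a: simple association, compare against the targetConnection name
            matched = conn.get("name") in tags
        if matched:
            used.append(conn.get("name"))  # step 3; unmatched ones are left (step 4)
    return used


print(associations_used(MESSAGE, MIF))  # ['subject', 'dataEnterer', 'component']
```

Only the matched associations would then go into the temporary MIF file used for parsing; “author” is never loaded for this message.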
These steps are repeated until all associations used in the message file have been identified and loaded, after which the message is easy to parse. In the worst case, all associations mentioned in the metadata files may be loaded, which results in performance degradation.

Figure 2. Flow diagram of loading of one association for parsing

C. Performance Analysis
In the existing methodology, the best-, average-, and worst-case behaviors are the same, because all associations are loaded whether they are needed or not. This wastes memory and time. In contrast, the average-case behavior of the proposed algorithm is far better, because it loads only the required associations, which saves a lot of memory and time. It also solves the problems of cross-references and stack overflow errors. To compare the existing and proposed methodologies, many messages, such as patient activation, placer
318
order, result event, etc., are generated on different systems, and their performance is measured in terms of processing time, memory usage, and correct generation of the message. The result event message could not be generated with the existing methodology because of a stack overflow error, which the optimized methodology removes. The other messages could be generated with the existing methodology, but generation was time-consuming and memory-inefficient; these problems are fixed in the proposed methodology. Table I compares the existing and proposed approaches for message generation and message parsing. The messages were generated for nine different laboratory tests, and the message type was placer order.

TABLE I. COMPARISON OF PROPOSED AND EXISTING METHODOLOGIES IN MESSAGING

No. of Tests   Generation, Existing (ms)   Generation, Proposed (ms)   Parsing, Existing (ms)   Parsing, Proposed (ms)
1              2259                        885                         2343                     913
2              2277                        899                         2367                     922
3              2290                        919                         2389                     935
4              2312                        934                         2399                     941
5              2330                        948                         2412                     957
6              2345                        964                         2429                     965
7              2367                        980                         2440                     976
8              2382                        1004                        2456                     998
9              2399                        1015                        2462                     1011

The time comparisons in Table I show the efficiency of the proposed approach over the existing one; Figure 3 shows this in detail for message generation and Figure 4 for message parsing. In every test, the proposed approach reduces processing time by more than 50 percent compared with the existing approach.

Figure 3. Comparison of Approaches in Message Generation
Figure 4. Comparison of Approaches in Message Parsing

VI. RELATED WORK
Since MIF is an XML schema, prior work on the efficient processing of XML schemas is relevant. Approaches adopted over time are based on compression of XML schemas, on static partitioning, and on dynamic partitioning. Compression of XML data was first addressed in XMill [8]. In this approach, the XML document is first compressed with a suitable algorithm and then compressed again with gzip to achieve as much compression as possible. The main drawback of this technique is that it is suited to transferring an XML schema from one point to another, whereas our concern is processing the schema to generate HL7 v3 messages, which compression with XMill cannot achieve efficiently. The schema would have to be decompressed in memory to generate and parse messages, which would waste memory and lead to the same problems discussed in this paper for the Java SIG API. Other approaches are based on XML parsing, such as pull-based parsing [9], lazy parsing [10], and schema-specific parsing [11]. Pull-based parsing is closely related to the approach proposed in this paper: it gives the user control to build or process only those parts of the data model that the application actually needs. Schema-specific parsing is useful only when all MIF files have the same schema; otherwise it incurs an extra penalty.

VII. CONCLUSION
The methodology used by the Java SIG API for generating and parsing messages has serious performance issues; in particular, it uses memory very inefficiently. The proposed methodology is an enhanced version of the existing one that overcomes these time and memory problems. It improves the metadata loading procedure by loading only the required objects rather than all objects. The work was evaluated on systems with different specifications, and in all cases it produced better results than the existing approach, as shown in Table I and Figures 3 and 4. In short, this technique uses memory and time much more efficiently than the existing methodology.

ACKNOWLEDGMENT
This work is part of the Health Life Horizon project funded by the National ICT R&D Fund, Pakistan [12] (http://hl7.seecs.edu.pk).

REFERENCES
[1] Welcome to Health Level Seven [online]. Available: http://www.hl7.org/
[2] Health Level 7 [online]. Available: http://en.wikipedia.org/wiki/HL7
[3] Meta information loader [online]. Available: http://aurora.regenstrief.org/javasig/wiki/HL7v3Overview
[4] HL7 Version 3 Guide, www.hl7.org, January 2005.
[5] The HL7 MIF - Model Interchange Format [online]. Available: http://www.ringholm.de/docs/03060_en_HL7_MIF.htm
[6] Proxy Design Pattern [online]. Available: http://www.inf.bme.hu/ooret/1999osz/DesignPatterns/Proxy4
[7] Java SIG API Documentation.
[8] H. Liefke and D. Suciu, “XMILL: An efficient compressor for XML data,” in Proceedings of ACM SIGMOD, pp. 153-164, 2000.
[9] A. Slominski, “XML pull parsing,” http://www.xmlpull.org/, 2004.
[10] M. L. Noga, S. Schott, and W. Lowe, “Lazy XML processing,” in Proceedings of the 2002 ACM Symposium on Document Engineering, pp. 88-94, 2002.
[11] K. Chiu and W. Lu, “A compiler-based approach to schema-specific XML parsing,” in The First International Workshop on High Performance XML Processing, 2004.
[12] Proposal / Application for ICT-Related Development and Research Grant [online]. Available: http://hl7.seecs.edu.pk/documentation.htm