SOFTWARE—PRACTICE AND EXPERIENCE Softw. Pract. Exper. 2006; 36:513–524 Published online 19 January 2006 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/spe.707
Generating content and display of library catalogue cards using XML technology Jovana Vidakovi´c∗,† and Miloˇs Rackovi´c Department of Mathematics and Computer Science, Faculty of Science, Trg Dositeja Obradovi´ca 4, 21000 Novi Sad, Serbia and Montenegro
SUMMARY This paper presents the use of XML technology in modelling library documents, i.e. catalogue cards (and the type of reports) found in a library information system. The method of schema formation for content of various types of library catalogue cards is also described. The display of catalogue cards has been done based on described schemas. The display process has been performed in two steps. The first one extracts the content from a bibliographical record based on the schema describing catalogue card concepts. The result is an XML document containing all catalogue card concepts filled in with the corresponding content from the bibliographical record. Its display is being performed in the second step, resulting in an HTML document. As well as adequate representation of data obtained by searching bibliographical c 2006 material databases, reporting is of prime importance in a library information system. Copyright John Wiley & Sons, Ltd. KEY WORDS :
XML; XML Schema language; catalogue cards; concepts; display
INTRODUCTION There are various types of reports in librarianship. One kind comprises reports about bibliographical material relating to reports on processed publications in the local or remote bibliographical material databases. They include catalogue cards, among other things. Catalogue cards are reports describing objects in a library collection. A catalogue card contains concepts. A concept contains fields, fields contain subfields and subfields contain subsubfields. The International Standard Bibliographic Description (ISBD) standard is used for defining the content and appearance of catalogue cards. The ISBD standard [1] is an international standard for bibliographical description and it states exactly the order of bibliographical data, the application of a particular punctuation system and data retrieval
∗ Correspondence to: Jovana Vidakovi´c, Department of Mathematics and Computer Science, Faculty of Science, Trg Dositeja Obradovi´ca 4, 21000 Novi Sad, Serbia and Montenegro. † E-mail:
[email protected]
c 2006 John Wiley & Sons, Ltd. Copyright
Received 16 July 2004 Revised 28 March 2005 Accepted 6 April 2005
514
J. VIDAKOVIC´ AND M. RACKOVIC´
Figure 1. An example of a catalogue card.
from predetermined sources. According to the ISBD standard, elements of a bibliographical description are grouped in areas whose order, content, language and alphabet are exactly defined. A catalogue card is made using ISBD principles. An example of a catalogue card is given in Figure 1. The XML technology is suitable for use in library information systems since library documents are structured and it is possible to use XML documents to model them. Library data are converted into XML documents, as described in several projects. The conversion of MARC records to XML documents has been done in the Java programming language, and it has been described in the Medline Project of Stanford University Medical Center [2]. Validation of XML documents is done with the help of DTD. The USA Library of Congress has developed the MARC XML project which features the manipulation of MARC data in the XML environment [3]. Conversion between MARC and MARC XML, published by the Library of Congress, is done using MARC4J. MARC4J is the library for working with MARC records in Java [4]. Using MARC4J, it is easy to write any kind of Java application or servlet that involves MARC or MARC XML data. At the University of Buffalo, State University of New York, MARC records were converted to XML catalogue pages containing bibliographical information, holdings information, and more [5]. Using an XSL stylesheet with properly tagged XML files, all of the data in the MARC records were preserved and control of the appearance of the page was gained. Citation elements of interest to the general user, such as author, title, call number and subject, were displayed and labelled. In Zeremski [6] and Jakˇsi´c [7] the schema of the UNIMARC format and the bibliographical record schema have been described, as well as bibliographical records validation. Currently, available references do not give an insight into the way of modelling library reports. If reports are accessible, only the final result of the database search of a library information system is visible, i.e. the display of data about the publication searched for. c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY
515
The development of the library information system BISIS started in 1993 at the University of Novi Sad [8]. So far it has featured reports modelled by various means. In [9] the extended model of the entity relationship vocabulary of the catalogue cards has been given, and the complete specification is done using the ORACLE CASE tool Designer 2000. Vuli´c [10] shows the object-oriented specification of the system for reporting using the unified modelling language. In the current version of BISIS, library reports are generated using the Java programming language. As a part of further development of BISIS, this paper describes the method of modelling library catalogue cards using the XML Schema language. First, the concepts of different types of catalogue cards are described by a schema, and then the particular types of catalogue cards are described by separate schemas. The schema is used to completely describe the concepts and content of the catalogue cards. In view of the described procedure it is possible to extend the document by new concepts, and it is also possible to model other types of library documents. This paper also presents two programs written in the Java programming language. The first one extracts the content out of the XML library record following rules of the described schemas, forms a new XML document which is then displayed as HTML by the second program. The detailed description of the system for generating and displaying library catalogue cards is shown in [11].
SPECIFICATION OF LIBRARY CATALOGUE CARD CONCEPTS USING XML SCHEMA LANGUAGE During analysis of the catalogue card concepts it has been noted that there are concepts recurring on various types of catalogue cards [12]. Catalogue card concepts have been modelled by a single schema, i.e. the content of each concept is described by citing the fields and subfields which it is made up of. This schema contains a description of the concepts, fields and subfields. The punctuation is also given, and particular catalogue card types are modelled by schemas invoking corresponding concepts by citing their names. A part of the schema for catalogue card concepts description is given in Figure 2. The root element is Concepts. It consists of subelements representing concepts. Each concept in the schema is modelled either by a sequence or choice of elements representing the fields comprising a particular concept. The exception is the CallNumber concept, where subfields are cited instead of fields, subfields and subsubfields. This has been done so as to achieve the consistency of the same level of concepts. Attributes appearing for concepts, fields and subfields define punctuation, display manner, alphabet, processing rule, justification, and font. The punctuation relating to the concept is of the String type and it can have the following values: new line, next line. It denotes the place on the catalogue card where the content of the new concept will be printed in relation to the previous concept. The punctuation for fields and subfields is an attribute of the String type representing a character or a number of characters preceding the display of a field or subfield from the record. If a subfield is repetitive and the punctuation is changed during the next occurrence, the attribute punctuationRep is introduced. The special case is conditional punctuation modelled by punctuationCond. This attribute is cited in case the punctuation of a field depends on other fields occurring in the record. The value of this attribute is a conditional expression starting with a keyword if and followed by an expression depending on what is taken into account when determining punctuation. c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
516
J. VIDAKOVIC´ AND M. RACKOVIC´
Figure 2. A part of the catalogue cards concepts description schema.
The rules for processing are as follows: aggregation, or, and. The application of a certain rule depends on which fields, i.e. subfields, should be included in the particular concept. The processing type is modelled by the attribute processing type. The aggregation rule processes all cited elements, i.e. fields and subfields, as they appear, and stores all those in the record to the concept content. This rule is represented as a sequence of corresponding elements, i.e. by an xsd:sequence element. The or rule is applied when the concept content should contain only one of the cited fields. As soon as the first field from the list is found in the bibliographical record, the processing is terminated. It is also possible to represent this rule as a choice of elements on the lower hierarchy level, i.e. by an xsd:choice element. The and rule is applied in the case that content of a concept should contain all cited fields. If any of the fields do not exist in the record, processing of the entire concept is unsuccessful. This rule is represented as a sequence of elements. c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY
517
The mode of display is modelled by the justification and font attributes. The attribute justification is cited only with concepts whose content should be justified to the right and it has the value right. These are the concepts CallNumber and InventoryNumber. With the remaining concepts it is understood that the content is justified to the left, so it is not cited. The font attribute is cited with concepts where the font is different from the standard font in which catalogue cards are being printed or the font size is different from the standard one. The standard font is Courier New, and the standard size is 10 pts. The attribute font can also have the following values: bold, if the contents should be emphasized, or compressed, if the letters should be less in the display. The CallNumber and Headword concepts are represented as the choice of fields, i.e. subfields from the bibliographical record which comprises them. The rule for processing these concepts is or, and that is represented by the xsd:choice tag. CallNumber is a mark put on library material in order to identify it and locate it in the warehouse. It consists of numbers, letters or combinations of the two. The data on location of the given bibliographical unit (call number) are located in the fields 996 or 997, in the subfield d which is repetitive. So, one of the two subfields is chosen: 996d or 997d. CallNumber is printed at the top right corner of the catalogue card, so the value of the justification attribute is right. The punctuation of this concept is not specified. The processing type is or, because the elements are processed in the given order and the processing terminates as soon as one is successfully processed. The field 996 subfield d is a complex type consisting of the sequence of elements, i.e. subsubfields. The rule for processing subsubfields is aggregation, because the subsubfields are processed in the given order and the subsubfield found in the record is being printed on the catalogue card. The subsubfields have a defined punctuation. For the subsubfield 1 of the field 996 of the subfield d (location mark) the punctuation is not specified, since it is the first subsubfield in the concept. The subsubfield n of the field 996 of the subfield d (current number) has the punctuation ‘–’, and the subsubfield f of the field 996 of the subfield d (format) has the punctuation ‘/’. Yet another concept is the main description. The main description on a catalogue card comprises bibliographical cataloguing data on the publication from the block 2. It is the content of all of the fields from block 2 present in the given record. The display of the main description starts with the title (subfield 200a) from the fourth character of the entry, while the next lines of the main description are printed starting with the first character of the entry. While printing the separate subfields, punctuation rules defined in the ISBD standard, or another set of rules connected to practice, are respected. Depending on the publication type, the appearance of the main description on the catalogue card varies. If the monograph with individual entry is in question, the main description is as shown in the Figure 3 (CataloguingDescMon). The concept is modelled by a complex structure, i.e. a sequence of elements representing fields being contained by the main description for monographs. The rule for processing is aggregation, because the fields are processed in the given order. The rule for processing subfields within fields is also aggregation, since it is possible that the bibliographical record does not contain any of the subfields which are cited in the schema. Each subfield has an attribute punctuation, whose value is printed before the content of the subfield from the record. This concept can also contain repetitive subfields, where the punctuation differs when appearing for the first and subsequent times, so the attribute punctuationRep is introduced, whose value is printed before the repetitive occurrence of the subfield. There are also subfields where punctuation depends on the occurrence of the remaining subfields in the record. These subfields have a defined attribute punctuationCond. The order of the application c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
518
J. VIDAKOVIC´ AND M. RACKOVIC´
Figure 3. The CataloguingDescMon concept.
of the punctuation during processing is as follows: punctuationCond, punctuationRep, punctuation. The punctuation for a repetitive field is applied in case a subfield which has already appeared in the record and gains advantage over the regular punctuation. Figure 4 gives an example of a subfield with attributes punctuation and punctuationRep. An example of a subfield with conditional punctuation is shown in Figure 5. If the subfield i of the field 225 is not preceded by the subfield h, but another, then the subfield i is preceded by the punctuation ‘.’ (point break); otherwise it is preceded by ‘,’ (comma break). The conditional expression modelling c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY
519
Figure 4. A subfield with repetitive punctuation.
Figure 5. A subfield with conditional punctuation.
this case is as follows: if(h not before) ‘.’, checking at the subfield i if there is not a subfield h before it, setting the punctuation to a new value. The main description by other kinds of publications is modelled in a similar way. Each of them is modelled by a separate concept. Concepts differ only in the content of the field appearing in each of them. For example, the main description for serial publications has the field 207 instead of 205, etc. The remaining concepts are modelled in much the same way, as described in detail in [11]. c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
520
J. VIDAKOVIC´ AND M. RACKOVIC´
Figure 6. Main catalogue card for monograph with an individual headword.
SPECIFICATION OF XML SCHEMA FOR DESCRIBING PARTICULAR CATALOGUE CARDS By describing appearance and content of concepts in a schema, it is possible to define all card types using these concepts. When defining these kinds of documents only the concept names are given without providing any details of the content, the display form and punctuation of the field, subfield or subsubfield present in the mentioned concept, as is described in [11]. The catalogue card about to be described in this paper is the main catalogue card for the monograph with an individual headword. This catalogue card type consists of the following concepts: call number, headword, main description for monograph with individual headword, remarks, ISBN, UDC number and inventory number. The schema modelled for this catalogue card type is shown in Figure 6 in the Schema Design view of the XML Spy Editor. It is possible to model XML Schema for an arbitrary type of catalogue card consisting of concepts specified in the schema for catalogue card concepts description in a similar way.
IMPLEMENTATION OF GENERATING CONTENT AND DISPLAY OF LIBRARY CATALOGUE CARDS In the library information system BISIS bibliographical records are saved in a UNIMARC format. A user searches the database by querying a particular field or fields, for example by author or book title, and gets a record, in the UNIMARC format, as a result. In the future development of the BISIS it is planned to use XML technology so that records are transformed from the UNIMARC format to XML documents. The obtained XML document should be valid according to the schema described in [6]. If a document is valid a catalogue card will be created. Catalogue cards are displayed with the help of two programs written in the Java programming language. The first program extracts content from an XML bibliographical record based on the schema describing catalogue card concepts and the schema describing a particular type of catalogue card. c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY
521
Figure 7. Class diagram of the first program.
The program first traverses the structure of the schema describing a particular type of catalogue card and finds the names of the concepts the card consists of. After that, the program looks for these concepts in the schema for describing the catalogue card concepts. It looks for the list of fields, subfields and subsubfields of the concept. Afterwards, the cited fields and subfields, i.e. subsubfields, are searched for in the record, and their content is extracted. The punctuation, found in the schema describing the catalogue card concepts, appends to the record content. The content with the punctuation is placed into a new XML document whose tags have names of the concepts. Beside the content, all data important for display, such as font and justification, are placed into that XML document. The class diagram of the first program is shown in Figure 7. The class FirstPass is the starting class in the first program and it uses methods from classes XMLUtils and MyXMLParser. The class XMLUtils is realized in the current BISIS version and is used to work with XML documents. The class MyXMLParser transforms the input XML file into a DOM (Document Object Model) tree. The class FirstPass uses the methods of the ConceptProcess class which process concepts. The attributes of the concept are being found, and subsequently so are the fields that the concept consists of. The names of the fields are stored in a list for further processing. All concepts, except for ArabicTracing and RomanTracing concepts, are further processed using the methods of the FirstLevelProcess class. Based on the list of field names from concepts, the appropriate fields in the record are being searched for. The subfields of the found field are also put into the special list, because it is important to know if a subfield has already appeared in the record in order to determine its punctuation. If punctuation of a subfield depends on the appearance of another subfield in the record, the conditional punctuation is applied by processing the corresponding conditional expression. The concepts ArabicTracing and RomanTracing are enclosed in separate classes due to their structure being modelled in a different way from other concepts. The resulting XML document is an instance of the schema modelled for checking if newly generated XML documents are valid. Part of this document is shown in Figure 8. The group attributes is separated c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
522
J. VIDAKOVIC´ AND M. RACKOVIC´
Figure 8. Part of the schema for checking if output XML documents are valid.
Figure 9. Class diagram of the second program.
because they appear in all concepts, as shown in the schema describing the catalogue card concepts. The document formed in this way is ready for display which is carried out by the second program. In the second program the tags relating to display are found and replaced with the appropriate HTML tags and attributes. For example, the content of the concept CallNumber is written right justified. The content of the attribute justification is put in the attribute ALIGN in the HTML tag TD, since it provides particular indentation. The corresponding places in the HTML code store the content of the tag content. Each of the attributes is processed by a separate class, and the class diagram of the second program is shown in Figure 9. The final result of these two programs is an HTML document ready for presentation by the browser. An example of the catalogue card display generated by the described application and shown by the browser can be seen in Figure 10. The detailed specification and the examples of outputs from the programs are shown in [11]. The HTML document which is the final result of these programs can be used as a means of showing c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
LIBRARY CATALOGUE CARDS USING XML TECHNOLOGY
523
Figure 10. An example of a catalogue card.
existing catalogue records on the screen in the traditional catalogue card format using a Web browser. This application can be extended for printing catalogue cards, so that catalogue cards for new and existing catalogue records can be printed in the traditional format.
CONCLUSION In this paper a way of bibliographical catalogue cards modelling using the XML Schema language is described, as well as display of these cards. A general description of catalogue cards is made by modelling the schema describing catalogue card concepts. Particular catalogue cards are modelled by individual schemas. The implementation is done in the Java programming language based on these schemas. This way of modelling bibliographical catalogue cards provides a simple method for adding new concepts and types of catalogue cards and for modifying existing ones. If an existing type of a catalogue card should be changed or a new type should be made using existing concepts, it is enough to change c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524
524
J. VIDAKOVIC´ AND M. RACKOVIC´
or make a schema for that type of catalogue card by citing only the concepts which it consists of, which can be done by the end user, i.e. a librarian. If a new concept should be created or an existing one should be modified, it is enough to change it or add it to the general schema by citing all the fields and subfields comprising the concept. It should be done abiding by the rules which the schema has been modelled by, so as to enable the application to process the new concept as well. Only if a new concept cannot fit into any of the existing rules, or needs some specific processing, is it necessary to change the existing programs. This system for generating and showing bibliographical catalogue cards should be integrated into the library information system BISIS based on the XML technology. The system can be expanded so that it can support other types of bibliographical reports. This way of report modelling could be applied to any other structured report.
ACKNOWLEDGEMENT
This paper is part of the scientific research project ‘XML technologies and cooperative information systems’ no. 1875, supported by the Ministry of Science, Technologies and Development, Republic of Serbia.
REFERENCES 1. Family of ISBDs. http://www.ifla.org/VI/3/nd1/isbdlist.htm [19 January 2005]. 2. Clarke KS. Medlane/XMLMARC update: From MARC to XML database, a project of Lane Medical Library, Stanford University Medical Center. http://elane.stanford.edu/laneauth/medlane 2001.html [19 January 2005]. 3. MARC XML—MARC 21 XML Schema, Official Web site. http://www.loc.gov/standards/marcxml/ [19 January 2005]. 4. Peters B. MARC4J http://marc4j.tigris.org/ [19 January 2005]. 5. Ludwig M. Breaking through the invisible Web—1/15/2003, netConnect, CA266430. http://www.schoollibraryjournal.com/article/CA266430.html [19 January 2005]. 6. Zeremski M. Modelling of UNIMARC format in XML technology. Master Thesis, Novi Sad, 2002 (in Serbian). Available at: http://diglib.ns.ac.yu/ndltd/docs/set1/ndltd64/ZeremskiMMagistarskaTeza.pdf [19 January 2005]. 7. Jakˇsi´c M. Mapping of bibliographical standards into XML. Software: Practice and Experience 2004; 34(11):1051–1064. 8. Surla D, Konjovi´c Z (eds.). Distributed Library Information System BISIS. Group for Information Technologies: Novi Sad, 2004 (in Serbian). 9. Savi´c N. System for generating catalogue cards in Library Information System. Master Thesis, Belgrade, 1998 (in Serbian). 10. Vuli´c T. The reporting and documentation process of the Library Information System. Master Thesis, Novi Sad, 1997 (in Serbian). 11. Vidakovi´c J. Modelling and implementation of bibliographical catalogue cards in XML technology. Master Thesis, Novi Sad, 2003 (in Serbian). Available at: http://diglib.ns.ac.yu/ndltd/docs/set2/ndltd335/JovanaMagistarski.pdf [19 January 2005]. 12. Vidakovi´c J, Rackovi´c M. Modelling the concepts of bibliographical catalogue cards using XML Schema language. Novi Sad Journal of Mathematics 2003; 33(2):95–102.
c 2006 John Wiley & Sons, Ltd. Copyright
Softw. Pract. Exper. 2006; 36:513–524