A Review of the Status of Twenty Digital Libraries Meyyappan, N., Chowdhury, G.G., Foo, S. (2000). Journal of Information Science, 26(5), 337-355.
A Review Of The Status Of Twenty Digital Libraries
N. Meyyappan, G.G. Chowdhury and Schubert Foo
Division of Information Studies, School of Applied Science, Nanyang Technological University, Singapore
Abstract
The recent proliferation of research in digital libraries has given rise to a number of working digital libraries around the world. These digital libraries have been defined, designed and developed differently, and therefore the experience that one might have with one particular digital library might not be the same as with other digital libraries. The current status of twenty digital libraries around the world (twelve from the US, three from the UK, two from Australia, one from New Zealand, one from Singapore, and one from Canada) has been reviewed. Various features of these selected digital libraries were collected from their home pages, journal articles and the information published on the Web. The parameters used to study the chosen digital libraries include: contents, type of library, organization, user interface, access, information retrieval, search features, output format, and links to other Internet resources. While some of the chosen digital libraries cater for a specific subject or document format, others play the role of digital as well as virtual libraries, giving access to the local digital collection as well as remote collections accessible through the Web. While most of these digital libraries have been developed for use in-house or by authorised users, some digital libraries are globally accessible. The chosen digital libraries differ in terms of their information search and output facilities, and very few have the facility to store search histories. Only four digital libraries have books in electronic form: the National Library of Canada in the general area, Gutenberg in the subject-specific area, and SETIS and Carnegie Mellon University in the special collection area.
The review confirms that whilst digital libraries to date have been quite useful, there is a need for further improvements in terms of user interfaces and information facilities. Additionally, this study reveals that two different types of digital libraries are likely to emerge in future. The first is subject and document specific digital libraries that will cater for a specific subject and type of information such as digital video, maps, photographs and paintings, theses, and so on. The second is hybrid libraries that will link traditional libraries, with their OPAC, CD-ROM and online databases, to the world of digital libraries and virtual libraries or gateways. The provision of personalized information services is an emerging trend in digital libraries, providing the next higher level of functionality to support users' specific information needs and preferred search and retrieval strategies.
Introduction
Digital library research has drawn much attention not only in the developed countries but also in developing countries. Improvements in information technology and increased funding towards information infrastructure have led to the development of a wide range of digital library collections and services. Some digital library research projects are run in collaboration with academic and international organizations. Digital Library Initiative projects in the US and the eLib projects in the UK have played a key role in digital library development. In addition, many digital library projects are currently underway in Australia, Asia, Europe, Africa and Latin America. While some of them have their own funding, others are funded under DL-specific funding initiatives.
Many definitions of digital libraries are available in the literature. According to Oppenheim [1], a digital library is an organized and managed collection of information in a variety of media (text, still image, video, audio, 3D models or a combination of these), all in digital form. The British Library DL program [2] defines a digital library as the widely accepted descriptor for the use of digital technologies to acquire, store, conserve, and provide access to information and materials in whatever form they were originally published. These definitions emphasize that the materials in a digital library should be in digital form. The Stanford digital library working group [3] goes further, defining digital libraries as a co-ordinated collection of services which are based on collections of materials, some of which may not be directly under the control of the organization providing a service in which they play a role. Drabenstott [4] has identified the following common elements from various definitions of digital libraries:
· The digital library is not a single entity;
· The digital library requires technology to link the resources of many;
· The linkages between the many digital libraries and information services are transparent to the end users;
· Universal access to digital libraries and information services is a goal;
· Digital library collections are not limited to document surrogates; they extend to digital artifacts that cannot be represented or distributed in printed forms.
The Digital Library Federation (DLF) [5] defines a digital library from a librarian's point of view, stating that digital libraries are organisations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or a set of communities. Borgman [6] has recently examined the definitions of digital libraries put forward by various research groups, and proposes that a digital library is: (1) a service; (2) an architecture; (3) information resources, databases, text, numbers, graphics, sound, video, etc.; and (4) a set of tools and capabilities to locate, retrieve and utilize the information resources available.
Digital libraries have been categorized differently by researchers. Oppenheim [1] has identified four types of libraries on a continuum running from the traditional to the digital: traditional, automated, hybrid and digital. A hybrid library, according to Oppenheim [1], is a library having a range of different information sources, printed or electronic, local or remote to various locations of library resources in different parts of the world. Rusbridge [7] suggests that a hybrid library should be designed to bring a range of technologies from different sources together in the context of a working library, and also to begin to explore integrated systems and services in both the electronic and print environments.
Digital library research involves people from different areas such as computer and information science, social science and economics, law, and so on. Building a good digital library involves research in a number of areas. Chowdhury and Chowdhury [8] have recently reviewed digital library research under 15 major headings, highlighting the major research activities in each of the 15 areas and pointing out the future of digital library research.
This paper presents an overview of twenty digital libraries from around the world with a view to:
· providing an understanding of the various working digital libraries in different parts of the world, and
· identifying the various features of the working digital libraries including content, coverage, organisation of information, user interface, access, and search and retrieval facilities.
Quite a few digital library projects are going on in different parts of the world under different digital library programmes. DLI phase 1 had six projects, DLI phase 2 [9] has more than 30 projects and the eLib programme [10] funded around 60 projects. Besides these, many commercial, public and government agencies have digital library projects around the world. Digital library projects are increasing at such a high rate that it is difficult to keep track of the total number of ongoing projects.
Twenty working digital libraries were selected for this study. The following points were considered in the selection: representation from different countries, origination from different types of institutions, and coverage of various subjects and various types of materials. The chosen digital library projects also include academic, public and special libraries. Out of the twenty chosen digital libraries, twelve are from the US, three from the UK, two from Australia, and one each from Canada, New Zealand and Singapore. Various features of the selected digital libraries were collected from their home pages, journal articles and the information published on the Web. The parameters used to study the chosen digital libraries include: the type of library, parent organization, the collection, information access, and information storage and retrieval including the search and output features. Thus, this paper attempts to give an idea of what a digital library is, what its objectives are, what it covers, and what search features, accessibility options, display formats, and so on it offers.
This paper is expected to be particularly useful for students and beginners in the digital library area because it provides a snapshot of the various features of some prominent digital libraries around the world. Parts of this paper may also be useful for digital library researchers, since it provides a comparison of some of these features. Changes in digital libraries are taking place all the time, and indeed many changes have taken place even during the short period of this study (from the end of 1999 to early 2000). However, efforts have been made to incorporate as many of these changes as possible. The list of references may appear short for a typical review paper. However, this list has to be supplemented with the URLs of the various digital libraries that were the primary sources of information used for this study.
Digital Libraries: Basic Information
Table 1 presents some general information about the chosen digital libraries, such as the name of the organisation or institution hosting the digital library, country, classification of the library, year of origin, URL, funding agency, and partnership with other organisations (if any). Table 2 gives information about the specialization, content and type of materials in the collection. Table 3 gives information about the access to and organisation of materials. Table 4 gives information on the output format, sort facility and search history, and Table 5 shows the search facilities available in the chosen digital libraries. Twelve of the twenty chosen digital library projects were undertaken in the US alone: nine of these were undertaken by universities, two by professional organisations, and one by the Library of Congress.
Out of the eight other digital libraries chosen from other parts of the world, six were undertaken at the university level while two were undertaken by national libraries, viz. the National Library of Canada and the British Library. Three digital library projects were undertaken by the University of California: one at the Santa Barbara campus, known as the Alexandria Digital Library (ADL), the second at the Berkeley campus (UCB), and the third at the University of California Office of the President. The British Library has five digital library projects, namely International Dunhuang, Beowulf, Bibliotheca Universalis, Magna Carta and Treasures Digitisation. The Beowulf project has been considered for this study. The earliest of the digital library projects, though it was not called a digital library then, was Gutenberg, which was set up in 1971. One project from the British Library started in 1993; six projects started in 1994 with funding from the DLI-1 programme; two projects started in 1995, three in 1996, two in 1998, and one in 1999. Two projects, viz. BUILDER and HEADLINE, funded by the eLib programme in the UK, started in 1998 and are still under development.
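The composition of the sample described above can be cross-checked with a short tally. The following is a minimal Python sketch rather than part of the original study: the variable names are illustrative, and the figures are simply those stated in the text.

```python
# Sample composition as stated in the text (variable names are illustrative).
by_country = {
    "US": 12, "UK": 3, "Australia": 2,
    "Canada": 1, "New Zealand": 1, "Singapore": 1,
}
us_hosts = {"university": 9, "professional organisation": 2, "Library of Congress": 1}
non_us_hosts = {"university": 6, "national library": 2}

total = sum(by_country.values())
non_us_total = total - by_country["US"]

# The host-type breakdowns should agree with the country counts.
consistent = (
    total == 20
    and sum(us_hosts.values()) == by_country["US"]
    and sum(non_us_hosts.values()) == non_us_total
)
print(total, non_us_total, consistent)
```

The check confirms that the country counts add up to twenty and that the breakdown by type of host institution matches the US and non-US subtotals.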
Objectives of the Chosen Digital Libraries
The motivation and specific objectives for each digital library differ significantly. Each of them was basically designed to achieve a specific purpose. For example,
· The main objective of the ACM digital library was to provide full-text access to articles and conference papers published in ACM periodicals and proceedings.
· ADL was designed to provide access to a large range of materials, from maps and images to text and multimedia, using spatially indexed information.
· The objective of AMMEM was to provide a rich primary source of materials relating to the history and culture of the US.
· The objective of the British Library's Electronic Beowulf project has been to increase access to its collections by use of imaging and network technology.
· Both BUILDER and HEADLINE aimed to create hybrid libraries. BUILDER is supposed to develop the working model of a hybrid library in a teaching and research context, seamlessly integrating access to a wide range of printed and electronic information sources through the WWW interface. HEADLINE, yet to put materials in its digital library, aims to provide the user with a wide range of library resources regardless of their physical form.
· The main objective of CDL is resource sharing among the University of California libraries on the various campuses and some local libraries.
· The objective of creating the digital library at Carnegie Mellon University (CMDL) was to create an integrated speech, image and language understanding digital video library, in addition to having some e-books, arts, music, e-journals and periodicals.
· GEMS has been built to serve as a vehicle to deliver a wide range of information resources, regardless of the media type, over a campus-wide network to all faculty, staff and students.
· The main objective of the Gutenberg project was to provide easy access to the humanities literature available in electronic format.
· IDL has been developed to build a collection of full-text journal articles from Physics, Engineering and Computer Science and to make them available over the WWW.
· The main objective of the IEL digital library is to provide electronic access to full-text articles, conference proceedings and IEEE standards in the areas of electrical engineering, information technology, applied physics and other technical disciplines.
· The objective of NCSTRL was to develop a distributed technical reports library containing a collection of technical reports related to computer science from institutions or organisations offering PhD programmes in computer science or engineering in different parts of the world.
· NDLTD was designed to build a digital library of theses and dissertations of masters and doctoral students from various universities in the US and around the globe.
· The electronic collection of NLC was set up to make Canadian online books, journals and catalogues of over 500 Canadian libraries available through the WWW.
· The NZDL's objective has been to develop the underlying technology for digital libraries, which will help others to create and manage their own collections, and to make it available to the public.
· The Queensland Digital Library project, called DIGILIB, was designed to hold a collection of images of a wide range of domestic, public, mining and agricultural buildings in Queensland and Brisbane.
· The objective of SETIS was to facilitate access to in-house and remote textual and image databases, instructional programs, and the creation and storage of electronic texts.
· The objective of the UCB digital library project was to develop tools and technologies to support highly improved models of the "scholarly information life cycle" in a distributed, continuous and self-publishing model through object recognition and image retrieval in large image databases.
· The main objective of UMDL is to offer electronic information resources in environmental studies and other interdisciplinary areas, including life, natural and social sciences, over a distributed network environment.
Funding
Most of the digital library research projects in the US were funded by the US funding agencies, viz. NSF, NASA and ARPA, in DLI phase 1. The six libraries received funds to the tune of US$ 25 million under DLI-1. Each one received approximately US$ 4 million or more. The NCSTRL project is sponsored by ARPA with the Corporation for National Research Initiatives (CNRI) and the National Science Foundation. Two libraries, HEADLINE and BUILDER, receive grants from eLib phase 3. The American Memory project is funded by the Library of Congress and private sector participation; the NDLTD project was funded by the US Department of Education and the Southeastern Universities Research Association (SURA). The New Zealand digital library (NZDL) project is funded by the New Zealand Foundation for Research, Science and Technology, and the Lotteries Grant Board. DIGILIB is a collaborative project between the University of Queensland's Architectural Department and Library; the Tertiary Education Institute and the University of Queensland funded this project. CDL is supported by the University of California, while the National Library of Canada supported the NLC digital library program. For the Electronic Beowulf project, the British Library Digital and Network Services Steering Committee has funded the substantial equipment purchases used in London, while the University of Kentucky has funded equipment and system support for use in Lexington. GEMS was supported by the Nanyang Technological University, Singapore [11]. The ACM and IEL digital libraries are managed by professional organisations. The ACM digital library is supported by the Association for Computing Machinery, and IEL is supported by the Institute of Electrical and Electronics Engineers (IEEE) and the Institution of Electrical Engineers (IEE).
Table 1: Basic information on the chosen digital libraries

Name | Institution and country | *Classification | Year | URL | Fund/Project
ACM (Association for Computing Machinery) | Association for Computing Machinery, USA | D | | www.acm.org/dl | ACM
ADL (Alexandria Digital Library) | University of California at Santa Barbara, USA | D | 1994 | http://alexandria.sdc.ucsb.edu | DLI-1
AMMEM (American Memory) | Library of Congress, USA | D | | http://lcweb2.loc.gov/ammem | Library of Congress, USA
British Library (British Library Electronic Beowulf Project) | British Library, UK | D | 1993 | http://www.bl.uk | British Library, UK and University of Kentucky, Lexington
BUILDER (Birmingham University Integrated Library Development and Electronic Resource) | University of Birmingham, UK | H | 1998 | http://builder.bham.ac.uk/ | eLib funded and partnership with the University of Oxford, University of Wolverhampton, West Hill College of Higher Educ. & Birmingham Central Library
CDL (California Digital Library) | University of California Office of the President, USA | | 1997 | www.cdlib.org | University of California, USA
CMDL (Carnegie Mellon University Digital Library) | Carnegie Mellon University, USA | | 1994 | www.ul.cs.cmu.edu | DLI-1
DIGILIB (Queensland Digital Library (QDL) Project) | University of Queensland, Australia | D | NA | www.architect.uq.edu.au/digilib/index.html | Tertiary Education Institute and University of Queensland
GEMS (Gateway Electronic Media Services) | Nanyang Technological University, Singapore | H | 1999 | www.ntu.edu.sg/library/media/gems/gems.htm | Nanyang Technological University, Singapore
GUTENBERG (Gutenberg Project) | University of Illinois, USA | D | 1971 | www.gutenberg.net | By donation and volunteer services
HEADLINE (Hybrid Electronic Access and Delivery in the Library Networked Environment) | London School of Economics, London School of Business, University of Hertfordshire, UK | H | 1998 | www.headline.ac.uk | eLib
IDL (Illinois Digital Library) | University of Illinois at Urbana-Champaign, USA | D | 1994 | http://dli.grainger.uiuc.edu | DLI-1 in partnership with 14 publishers and 5 S/W providers
IEL (IEEE / IEE Electronic Library) | IEEE and IEE, USA | D | NA | www.ieee.org/products/online/iel | IEEE and IEE
NCSTRL (Networked Computer Science Technical Reference Library) | Cornell University, USA | | 1995 | www.ncstrl.org | ARPA with CNRI and NSF
NDLTD (Networked Digital Library of Theses and Dissertations) | Virginia Tech. University, USA | | 1995 | www.theses.org | US Department of Education and South Eastern University Research Association
NLC (National Library of Canada) | National Library of Canada, Canada | | NA | www.nlc-bnc.ca | National Library of Canada
NZDL (New Zealand Digital Library) | University of Waikato, New Zealand | | 1996 | www.cs.waikato.ac.nz/~nzdl | New Zealand Foundation for Research, Science and Technology and Lotteries Grant Board
SETIS (Scholarly Electronic Text and Image Services) | University of Sydney, Australia | D | 1996 | http://setis.library.usyd.edu.au/ | University of Sydney
UCB (University of California at Berkeley) | University of California at Berkeley, USA | D | 1994 | http://elib.cs.berkeley.edu | DLI-1
UMDL (University of Michigan Digital Library) | University of Michigan, USA | D | 1994 | www.lib.umich.edu/libhome/dig.html | DLI-1

*Legend: D = Digital Library; H = Hybrid Library; NA = Not Available
Subject Coverage of the Chosen Digital Libraries
Most of the libraries chosen for this study have been set up in academic environments, providing access to e-journals, OPAC, CD-ROM databases, and online databases. We can classify these libraries into three major categories: general, subject-specific and specialized collections.
General Collection
BUILDER, National Library of Canada, GEMS, NZDL, UMDL and CDL cover e-journals, OPAC, CD-ROM and online databases from a number of disciplines.
Subject-specific Digital Libraries
ACM, ADL, AMMEM, GUTENBERG, IDL, IEL, NCSTRL, SETIS, and UCB are subject-specific digital libraries by virtue of their coverage. The ACM digital library covers literature from ACM's own publications only. ADL covers spatially referenced map information; IDL covers electronic journals in engineering, physics and computer science; NCSTRL covers computer science technical reports from computer science departments and industrial and government research laboratories in different parts of the world; IEL covers electrical and electronics engineering, information technology, applied physics and other technical disciplines from IEEE and IEE publications; UCB covers environmental subjects; and SETIS, GUTENBERG and AMMEM cover humanities and social sciences. The subject coverage of the HEADLINE project will be economics, finance, business and management, scalable to larger groupings of libraries.
Specialized Digital Libraries
DIGILIB covers images and photographs of historical buildings, and CMDL has concentrated on the development of a digital video library. The British Library's Electronic Beowulf project provides access to the manuscripts of the Old English poem Beowulf in the form of images. NDLTD covers theses and dissertations in various disciplines from various participating institutions.
The Collection
Table 2 provides information on the content and type of information contained in the digital libraries concerned. Some digital libraries contain only abstracts or bibliographic information while others contain full-text information. We can classify the contents of the twenty chosen libraries into five groups by type: bibliographic, full-text, both bibliographic and full-text, images or graphics, and multimedia. All the libraries provide access to bibliographic or full-text databases in some form or the other. ADL concentrates on multimedia databases, and CMDL's Informedia project concentrates on multimedia and music collections in the digital video library. The latter also covers e-books and full-text e-journals and periodicals. Each library has a separate user interface
for accessing OPAC. Three libraries, viz. UCB, DIGILIB and ADL, have special collections of maps, images, and so on. Four digital libraries have a collection of CD-ROM databases. In using these databases, the user first selects a database from a list of CD-ROM databases available in the library. Subsequently, the user interacts with the database using the search interface provided by the database producer. UMDL has more than 290 abstracting and indexing journals, newspapers and electronic journals in its networked digital library collection.

Table 2: Information about collections, content and type of chosen digital libraries

Name | Category | Content | Type
ACM | Specific | Articles, proceedings, calendar of events in ACM periodicals and proceedings | Bibliographic, full-text and combined
ADL | Specific | Geographically referenced materials: maps, images, texts | Maps, spatial images and texts
AMMEM | Specific | History and culture of the USA. Multimedia collections of digitized documents, photographs, sound and moving pictures and text from the library's Americana collections | Full-text, image, video and audio
BL | Special | Image based edition of the great Old English poem in the British Library and images of Cotton Vitellius A. XV. International Dunhuang, Bibliotheca Universalis, Magna Carta and Treasures Digitisation digital library projects, OPAC | Images
BUILDER | General | Printed and electronic information sources, and examination papers | Bibliographic, full-text and combined
CDL | General | On-line archive of California, Melvyl Union Catalogue, periodicals database, e-journals, abstracting & indexing databases | Bibliographic, full-text and combined
CMDL | Special | Digital video collection plus photographs and full-text | Digital video
DIGILIB | Special | Images of Queensland historic buildings, Brisbane architecture | Images and texts
GEMS | General | E-journals, OPAC, CD-ROM databases, project reports, AV sources | Bibliographic, full-text and combined
GUTENBERG | Specific | Humanities, literature and references | Full-text
HEADLINE | General | London School of Economics and London Business School library OPACs, CD-ROMs, e-journals, course material, exam papers, secondary sources, financial and government information | Full-text, bibliographic and combined
IDL | Specific | Journal articles from Physics, Engineering, and Computer Science journals | Bibliographic, full-text and combined
IEL | Specific | Articles, conference proceedings and technical standards in Electrical and Electronic Engineering, Information Technology and Applied Physics | Full-text
NCSTRL | Specific | Collection of Computer Science research reports & papers | Full-text, bibliographic and combined
NDLTD | Special | Theses and dissertations, e-journals, VT Spectrum, WDBJ7 script archives | Full-text, bibliographic and combined
NLC | General | On-line books, journals and OPAC | Full-text, bibliographic and combined
NZDL | General | Developing interface technology | Full-text
SETIS | Specific | Humanities, poetry, drama, dictionaries, text and image creation projects, digital versions of postgraduate theses | Full-text
UCB | Specific | Text, maps, images, sound, video, hyper-textual multimedia | Full-text, hypertext and multimedia
UMDL | General | E-journals, CD-ROM databases, electronic reference shelf and UM-Med Search | Full-text, bibliographic and combined
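The Category column of Table 2 lends itself to a quick tally. The following is a minimal Python sketch, not part of the original study: the dictionary simply transcribes the Name and Category columns of Table 2, and the variable names are illustrative.

```python
from collections import Counter

# Name -> Category, transcribed from Table 2 (BL is the British Library).
category = {
    "ACM": "Specific", "ADL": "Specific", "AMMEM": "Specific",
    "BL": "Special", "BUILDER": "General", "CDL": "General",
    "CMDL": "Special", "DIGILIB": "Special", "GEMS": "General",
    "GUTENBERG": "Specific", "HEADLINE": "General", "IDL": "Specific",
    "IEL": "Specific", "NCSTRL": "Specific", "NDLTD": "Special",
    "NLC": "General", "NZDL": "General", "SETIS": "Specific",
    "UCB": "Specific", "UMDL": "General",
}

# Number of libraries in each category.
counts = Counter(category.values())

# Libraries belonging to one category, in alphabetical order.
general = sorted(name for name, cat in category.items() if cat == "General")

print(dict(counts))
print(general)
```

Under this transcription, nine libraries are subject-specific, seven are general (the six named in the text plus HEADLINE, which Table 2 lists as General) and four are special collections.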
ACM has a collection of 39,378 full-text articles from the ACM journals and conference proceedings; tables of contents with over 7,000 citations from articles published in ACM journals and magazines from 1985 onwards; and tables of contents with nearly 35,000 citations from articles published in over 700 volumes of conference proceedings since 1985.
ADL has a collection of geographically referenced materials such as maps, images, texts and datasets in multimedia form in earth and social sciences. The datasets include metadata and basic data in digital elevation models, digital raster graphics, scanned aerial photographs, Landsat, seismic datasets and technical reports, Sierra Nevada ecologic project datasets, Mojave ecologic project datasets, and AVHRR. Metadata are available for Gazetteers, Geodex, Georef, the Mojave bibliography and PEGASUS map records.
AMMEM covers more than one million primary source materials relating to the history and culture of the United States of America. The collection also covers documents, films, manuscripts, photographs and sound recordings that describe American history.
The British Library's Electronic Beowulf project has a collection of manuscripts of the great Old English poem surviving in the British Library. In addition, Electronic Beowulf includes images of Cotton Vitellius A. xv, indispensable eighteenth-century transcriptions, copies of the 1815 first edition with early nineteenth-century collations of the manuscript, a comprehensive glossarial index, and a new edition and transcript. Major additions include links with the Toronto Dictionary of Old English project and with the comprehensive Anglo-Saxon bibliographies of the Old English Newsletter.
BUILDER has a collection of printed and electronic information sources, examination papers and electronic versions of the journals Forensic Linguistics and Midland History. BUILDER is also involved in developing a hybrid library search interface.
CDL consists of the On-line Archive of California, the Melvyl Union Catalogue, and the California periodicals database. More than 2000 electronic journals from major scholarly publishers and information providers are licensed and made available on its network. It has a collection of abstracting and indexing databases, reference databases, and automatic weekly search services.
CMDL has a multimedia digital library called Informedia that contains over one thousand hours of digital video, audio, images, and text. Informedia has a collection of more than 100 videos produced by the Bureau of Mines, Bureau of Reclamation, the Federal Emergency Management Agency Presents, the Fermi Lab, NASA core, the National Zoo, the National Oceanic and Atmospheric Administration, the Smithsonian Institution Presents, and the United States Geological Survey. This library also provides access to more than 300 e-journals, periodicals, and e-books.
DIGILIB has a collection of images of Queensland historic buildings that include a wide range of domestic, public, mining and agricultural buildings. Many of these buildings were previously unrecorded in any accessible form and several have since been demolished. Over 1030 images are currently stored in the library.
GEMS provides access to networkable CD-ROM databases, Chinese CD-ROM titles, online search services, e-journals, AV sources, OPAC, and the Web. GEMS has a collection of more than 310 e-journals. It has a digital collection of project reports, theses, conference articles and publications submitted by staff and students to the library. It also provides other information such as the academic calendar, course information, registration details, timetables, outstanding bills, and so on.
Gutenberg has a full-text collection of the Bible, Shakespearean drama, and other religious documents. Full texts of Roget's Thesaurus, almanacs, encyclopedias and dictionaries are also available.
HEADLINE includes electronic journals, locally digitized materials, course-related materials, reading lists, examination papers, local consortium catalogues, secondary sources such as BIDS, IBSS, ECONLit, SOSIG and Biz/Ed, financial data sets and government information. This digital library also covers diverse resources available at the partner sites.
IDL has developed a system called DeLIver (Desktop Link to Virtual Engineering Resources) that provides access to full-text articles from Physics, Engineering and Computer Science journals. The collection contains around 40,000 articles from over 54 journals from five publishers.
IEL has a collection of more than 500,000 full-text articles from over 12,000 publications including journals and conference proceedings. The coverage of the IEL includes full-text archives of IEEE and IEE publications from 1988 to the present. IEEE publishes nearly 30% of the world's literature in electrical, electronics and computer engineering and science, and provides access to more than 120 journal titles, more than 600 annual conference proceedings titles and over 875 IEEE technical standards. This electronic library is a subset of the INSPEC bibliographic and abstracts database.
NCSTRL has a collection of over 30,000 documents from more than 156 institutions offering PhD or engineering degrees in Computer Science. The NCSTRL collection is available from the servers of the participating institutions to anybody, anywhere in the world.
NDLTD has a collection of more than 1800 theses and dissertations at the Virginia Tech University campus. In addition, it has electronic journals, VT Spectrum, and WDBJ7 script archives. This digital library provides a facility for federated search across eleven other digital libraries of theses and dissertations. More than 60 institutions use this ETD software to create their own digital libraries of theses and dissertations.
NLC’s electronic collection incorporates formally published Canadian online books and journals. Catalogue records for Electronic Collection titles, including the Uniform Resource Locators (URLs), are also available. NLC’s electronic collection has eighteen
million full bibliographic records, 550,000 authority records, and 30 million holdings of 500 Canadian libraries including the National Library.
NZDL provides access to 13 collections mainly covering Computer Science but also including the HCI bibliography, FAQ archive, Humanity Development library, Indigenous peoples, youth culture oral history, Oxford text archive, project Gutenberg collection, TidBits and Newspapers in Maori. The Computer Science Technical Report collection is the largest, containing over 25,000 research reports from around 300 sites worldwide. There is a large collection of frequently asked questions on many topics, and a full-text index to the US newsmagazine, the Computists Communique.
SETIS provides access to a large number of networked and in-house full-text databases in the humanities. In addition to the literary, philosophical and religious texts, the service is engaged in a number of text and image creation projects. A large number of collections such as the American Poetry full-text database, Australian literature from the year 1840, English poetry database, English drama databases, Oxford English Dictionary, etc., are available. There is also a distributed database of postgraduate theses in digital form.
The UCB digital library maintains a collection of over 80,000 digital images, about 2 million records of data in tabular form and 2513 full-text documents in an online database. The collection includes documents, maps, articles, and reports on the environment of California, including Environmental Impact Reports (EIRs), educational pamphlets, water usage bulletins, and country plans.
UMDL concentrates on some journal literature and reference resources including McGraw Hill Encyclopedia of Science and Technology, Encyclopedia Americana, Encyclopaedia Britannica, and 200 core and popular journals. This library also provides access to 1100 Elsevier journals. UM coverage of journals and newspapers in digital form now exceeds 3000.
Information Storage and Retrieval
Information storage and retrieval plays an important role in any digital library. Specific information retrieval features of each digital library are discussed below.
ACM organizes its digital library collection using its own classification system, called the Computing Classification System (CCS). Collections of this digital library are indexed under journals and magazines, and proceedings by subject, by sponsor and by series. Conferences are also listed alphabetically under special interest group. All journals and proceedings literature covered by this library are grouped under eleven categories and also under 16 general terms.
ADL documents are organized by the Library of Congress Subject Headings. In addition, some index terms are assigned by the university to reflect its special collection of geospatially-referenced materials. Documents are organised to search geographic locations,
beginning and ending dates, type, ‘available as’ types, originators and identifiers. Documents are also organised so that the contents can be searched using a two-dimensional world map.
AMMEM categorises its collections by subject group, year, place, original format, digital format, library division and user’s format. Under each category, collections are displayed alphabetically. All documents under the subject category are grouped into thirteen sub-groups. There is a provision to select all the collections in a group, or any one or a set of collections, to search.
In Beowulf, images of the manuscripts are organised to search the entire edition, specific line(s), or specific folio(s). Users can also search by word, sub-string and alliteration. The Beowulf manuscript is divided between two scribes, and the scribes are searchable.
In BUILDER, documents are organized under department, title, course code and examination paper number. In Forensic journals and Midland History journals, documents are organized for full-text searching.
CDL has categorised its collections and services in three groups: browse, search and services. CDL has indexed documents under eight selected topics and alphabetically by title. Documents are organized under title, topic, and abstract. One can also search for the exact beginning of the title of a document. Documents are also organized to allow users to search in any of the following four formats: e-journals, databases, reference texts and archival finding aids. CDL provides access to many information resources for locating or gaining direct access to scholarly materials in both print and electronic formats.
CMDL groups its collections as art, books, collections, journals, multimedia, music, periodicals and projects. Under each group there are subgroups, and each subgroup has further subgroups. Items are organised in a hyperbolic tree structure and each group and subgroup is arranged alphabetically.
In the on-line book page, documents are organized under author and title.
In DIGILIB, images and photographs are organized by town, type, features, structures, materials and context.
GEMS provides access to a collection of CD-ROM and online databases and full texts of project reports and some selected papers. An alphabetical listing of databases and electronic journal titles is available for browsing. Documents are grouped under 72 subject headings. GEMS has a facility to provide electronic resources from NTU collections searched through the NTU OPAC. Collections are organized to provide cross-media search: OPAC, CD-ROM and online database indexes, digital theses, conference and other publications.
In Gutenberg, the whole list of books is arranged by date of release, by title and by author. Documents can be searched by title, author, subject, language, and Library of Congress Subject Headings. As all the available documents are in plain ASCII format, the downloaded documents can be used on any system.
In IDL, documents are organized such that each part is searchable by selecting or referring to that part. The full text of each article in the collection is tagged by title, abstract, table, analysis, references and conclusion parts, using SGML. This helps users search the full text or parts of the articles.
In IEL all the documents can be searched in the full text, in the body, title, URL, site name, image link, image alt text, description, keywords, and in remote anchor text as a phrase or terms, or in the name or in combination of the above. Documents are also organized by the date of submission. Users can view the table of contents of journals in PDF and HTML format.
NCSTRL collections are indexed under author, year, title, abstract and institution. Author, year and institution can be searched using the browse index facility, or by searching for words under abstract and title. After searching, users can retrieve full-text documents subject to the authors’ terms and conditions. The required documents can be downloaded in HTML format or in a format designed by the authors.
NDLTD has indexed documents under author and department. All the documents can be searched in the full text, in the body, title, URL, site name, image link, image alt text, description, keywords, and in remote anchor text as a phrase or terms, or in the name or in combination of the above. Documents are also organized by the date of submission.
In NLC, documents are indexed alphabetically under title and organized using the DDC system and full text. Full texts of electronic publications are archived in the following formats: ASCII, HTML, Text, Word, and WordPerfect.
In NZDL, documents are organized in such a way that one can search in the first page of a document or in one particular page of a document. There are 13 different collections. The user has to select a collection, choose the query type (Boolean or ranked) and specify the search terms.
In SETIS, there are many text and image creation projects.
Only the collections of six projects are arranged for browsing the full-text collections; documents are arranged under keyword, title of works, author, publication date, place of publication, publisher, male and female authors, and author date and literature period.
UCB uses the Cheshire II user interface, which allows three forms of search: simple forms, tile bars search, and browse lists of all documents. The simple search form has the facility to search by document or by page within a document. The tile bar interface allows users to make informed decisions about which documents and which passages of those documents to view, based on the distribution behaviour of the query terms in the documents. There are two tile bars: the simple tile bar, which is used to locate information in a collection of documents, and the single-document tile bar, which is a tool to locate information within a given document.
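The idea behind a tile bar, showing how query terms are distributed across the passages of a document, can be illustrated with a small sketch. This is not the Cheshire II implementation; the function name `tile_bar`, the fixed `segment_size`, and the whitespace tokenisation are all assumptions made for illustration.

```python
from typing import Dict, List


def tile_bar(text: str, query_terms: List[str], segment_size: int = 5) -> Dict[str, List[int]]:
    """Count occurrences of each query term in consecutive fixed-size passages.

    Hypothetical helper: real tile bar interfaces render these counts as
    shaded tiles; here they are returned as plain integers.
    """
    words = text.lower().split()
    # Cut the document into consecutive passages of `segment_size` words.
    segments = [words[i:i + segment_size] for i in range(0, len(words), segment_size)]
    # One row per query term, one cell per passage.
    return {term: [segment.count(term.lower()) for segment in segments]
            for term in query_terms}


sample = ("water usage in california is rising while water supply falls "
          "and california plans new water bulletins")
bars = tile_bar(sample, ["water", "california"])
```

A row such as `bars["water"]` shows at a glance which passages mention the term, which is the kind of information a user would need to choose a passage to view.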
UMDL resources are arranged in three forms: alphabetically by title, by category, and by service. There are nine category headings, viz., arts & humanities, business and economics, engineering, general references, government information & law, health sciences, news and current events, science, and social sciences. There are fourteen resources by service, among them Cambridge Science Abstracts, ISI Citation Databases, and Proquest. The UMDL project has also developed two methods of interaction: one on the multi-scale (infinite pan and zoom) platform of PAD++, and another on a distributed multi-person computing environment.
Search Features
Some digital libraries have more than one form of search, such as Simple Search and Advanced Search. The search features discussed here include the ones available in both the simple and advanced search modes. The various search facilities available in the twenty digital libraries are given in Table 5. A browse/index facility for searching is available in twelve libraries for a limited number of fields. Users can go to the alphabetical listing of a field and choose the keywords, author or title field for searching.
Boolean search
Boolean operators (AND, OR and NOT) are used to combine words or phrases in a search expression. Users can also enter a search phrase within quotes in a simple search field. Simple query forms allow a query to be entered on a single line. The search query may contain Boolean operators or phrases. The search query is parsed into words or phrases and Boolean operators. These words and/or phrases are connected with the AND operator. Users can also search in multiple fields in some of the chosen digital libraries. The multiple field search facility provides the ability to search in multiple fields using Boolean operators. In some digital libraries only AND and OR are used, and in some, AND is implied for multiple field search.
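The simple query form behaviour described above (a one-line query split into words, quoted phrases and Boolean operators, with AND connecting adjacent terms) can be sketched as follows. This is a minimal illustration under stated assumptions, not the parser of any of the reviewed libraries; the name `parse_simple_query` and the rule that operators are recognised case-insensitively are hypothetical.

```python
import re
from typing import List

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}


def parse_simple_query(query: str) -> List[str]:
    """Split a one-line query into phrases, words and Boolean operators.

    An AND is inserted between two adjacent terms that have no explicit
    operator between them (assumed behaviour for illustration).
    """
    # Quoted phrases are kept whole; everything else is split on whitespace.
    raw_tokens = re.findall(r'"[^"]+"|\S+', query)
    parsed: List[str] = []
    for token in raw_tokens:
        is_operator = token.upper() in BOOLEAN_OPERATORS
        previous_is_term = bool(parsed) and parsed[-1] not in BOOLEAN_OPERATORS
        if not is_operator and previous_is_term:
            parsed.append("AND")  # implied AND between adjacent terms
        parsed.append(token.upper() if is_operator else token)
    return parsed


implied = parse_simple_query("digital library")
explicit = parse_simple_query('"digital library" or archive')
```

Here `implied` gains an AND between the two words, while `explicit` keeps the quoted phrase as one term and the user's own operator.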
Table 3: Organisation and access facility of the chosen digital libraries
Name | Accessibility | Organisation of Information
ACM | Public access on subscription | ACM Computing Classification System. Broad groups under 11 categories, and 16 general terms.
ADL | For UC domain | Under Subject Heading and index terms assigned by cataloguer
AMMEM | Public access except for a few items | Broad groups, format, time, place, original format, digital format and library.
BL | Public access | Full manuscripts, line, folio, folioline, fitt, scribes, SGML tags.
BUILDER | Staff, student and faculty | Under department, title, course code, examination paper number
CDL | Open to all, other campus users and campus users | Subject, format and campus
CMU | Public access, some materials need password authentication | Grouped under art, books, journals, multimedia, music, periodicals and projects
DIGILIB | Public access | Organised to search by town, type, features, structure, materials and context.
GEMS | Staff, student and faculty | OPAC, e-journals, CD-ROM databases, examination papers
GUTENBERG | Public access, some materials are copyrighted | Author and Title
HEADLINE | London School of Economics, London School of Business, and University of Hertfordshire members: faculty, staff, students and selected other users | The digital library is yet to have materials
IEL | Full text is available to subscribers only | Full text, in the body, title, URL, site name, image link, image alt text, description, keywords, and in remote anchor text as a phrase or terms, or in the name or in combination of the above
NCSTRL | Public access | Author, title, year and abstracts.
NDLTD | Restricted, Unrestricted, and Mixed | Full text, in the body, title, URL, site name, image link, image alt text, description, keywords, and in remote anchor text as a phrase or terms, or in the name or in combination of the above. Indexed under Department and Author.
NLC | Public access, restricted and on payment | Title and subject
NZDL | Public access | Organized to search first page, same page and same document.
SETIS | Users of University of Sydney Campus; public access for some collections | Alphabetically by collection then by author.
UC B | Public access | Photographs, databases, documents and geographical layers. Under resources are arranged in different fields
UMDL | Three categories: UM network, authorised UM users, and open to all users | CD-ROM, e-journals under Subject (9 categories), Alphabetical and selected resources by service
IDL | | Articles are organised under full-text, different sections and figures.
NZDL uses &, | and ! as Boolean operators in queries for AND, OR and NOT respectively. NLC uses &, |, ~ and ; for AND, OR, NOT and NEAR respectively. This library also allows queries to be entered in French, whereby users can use the ‘accum’, ‘equiv’, and ‘minus’ operators for AND, OR and NOT respectively. If no Boolean operator is included in a search expression, the system treats it as a phrase search. NDLTD and IEL use ‘must contain’, ‘should contain’, and ‘must not contain’ as operators in place of the AND, OR, and AND NOT Boolean operators respectively. One can also use ‘+’ and ‘-’ as the addition and rejection operators in a query. Similarly, IDL has ‘must contain’, ‘may contain’, ‘not contain’ and ‘must contain nearby’ in place of the ‘AND’, ‘OR’, ‘NOT’ and proximity operators respectively. In Carnegie Mellon University’s e-books search, there are two searchable fields, author and title. These two fields can be combined with AND only. Other libraries use the standard Boolean operators AND, OR and NOT. In the Beowulf project, the OR, AND NOT and WITH operators are used for Boolean OR, AND NOT and AND respectively. Table 5 shows the various types of Boolean search facilities available in the chosen digital libraries.
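Because each library spells its operators differently, a cross-library search tool would need to translate them into a common form. The sketch below encodes only the NZDL and NLC symbols reported above; the dictionary layout, the function name `normalise_query`, and the assumption that the query is already split into tokens are illustrative choices, not features of either system.

```python
from typing import Dict, List

# Operator symbols as reported in the text for two of the libraries.
OPERATOR_SYNTAX: Dict[str, Dict[str, str]] = {
    "NZDL": {"&": "AND", "|": "OR", "!": "NOT"},
    "NLC": {"&": "AND", "|": "OR", "~": "NOT", ";": "NEAR"},
}


def normalise_query(library: str, tokens: List[str]) -> List[str]:
    """Replace library-specific operator symbols with canonical names.

    Hypothetical helper: tokens that are not operators pass through unchanged.
    """
    mapping = OPERATOR_SYNTAX[library]
    return [mapping.get(token, token) for token in tokens]


nzdl_query = normalise_query("NZDL", ["digital", "&", "library", "!", "video"])
nlc_query = normalise_query("NLC", ["text", ";", "image"])
```

Keeping the mapping in a table rather than in code makes it straightforward to add the word-style operators of other libraries (for example ‘must contain’) as further entries.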
Proximity Search
A proximity operator searches for both words in a field or text with a fixed number of intervening word(s) between them. Ten of the twenty digital libraries under study have proximity search facilities. The proximity search operators used are “ADJ”, “NEAR” and “WITH”. ACM, NLC, BUILDER, HEADLINE, SETIS and GEMS use NEAR as the proximity operator. NDLTD uses “ADJ” as the proximity operator. In BUILDER and HEADLINE, the NEAR proximity operator returns the documents in which the search terms occur within 50 words of each other: the closer together the words are, the higher the rank of the page, and so the higher it appears in the list of search results. In SETIS, users can search by phrase or a combination of two phrases using the proximity operator ‘NEAR’; the number of characters between words can be limited to 40, 80 or 120. Users can combine author and title fields with a keyword or phrase search selected from any one of the above fields. In some libraries, users can restrict the number of characters between two words while using the proximity operator. Table 5 shows the proximity search facilities available in the chosen digital libraries.
Phrase Search
A search expression may be built with a combination of terms or phrases and logical operators. A query may be entered in quotes to search for an exact match of the phrase. Fifteen of the twenty chosen libraries have a phrase search facility. In BUILDER and HEADLINE, if the user does not specify any Boolean operator, the system takes the search expression as a phrase. Only SETIS has the facility of combining two phrases. In some digital libraries, for example SETIS, if there is no Boolean operator between two words in the simple search form, the system takes it as a phrase; in some cases the phrase has to be entered within quotes. In some libraries, for example NZDL, the sequence of words is parsed and connected with the ‘AND’ operator.
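The NEAR behaviour reported for BUILDER and HEADLINE (matches within 50 words, ranked by closeness) can be sketched as below. This is an illustration under assumptions rather than either system's algorithm: the function names are hypothetical, text is tokenised on whitespace only, and distance is measured in word positions.

```python
from typing import Dict, List, Optional


def min_word_distance(text: str, first: str, second: str) -> Optional[int]:
    """Smallest gap, in word positions, between any occurrence of the two terms."""
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    if not first_positions or not second_positions:
        return None  # at least one term is absent from the document
    return min(abs(i - j) for i in first_positions for j in second_positions)


def near_search(documents: Dict[str, str], first: str, second: str,
                window: int = 50) -> List[str]:
    """Return ids of documents where the terms co-occur within `window` words.

    Results are ordered so that documents with the closest pair come first.
    """
    scored = []
    for doc_id, text in documents.items():
        distance = min_word_distance(text, first, second)
        if distance is not None and distance <= window:
            scored.append((distance, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


docs = {
    "a": "digital collections in a large library",
    "b": "digital library",
    "c": "digital video only",
}
ranked = near_search(docs, "digital", "library")
```

An exact two-word phrase is the special case where the second word must sit at distance one directly after the first; a character-based limit such as the 40, 80 or 120 characters offered by SETIS would need character offsets instead of word positions.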
Table 5 shows the phrase search features of the chosen digital libraries.
Truncation
Truncation searches allow users to search for different word variants with a single search expression, where the truncation symbol stands for one or more characters in the search term. There are three types of truncation: left truncation, right truncation and middle truncation. Right truncation matches any number of characters at the end of the word, while left truncation matches any number of characters followed by the search word. Middle truncation matches words with the given starting and ending characters and any intervening characters. Sixteen of the twenty chosen digital libraries have only the right truncation facility. Various operators such as ‘*’, ‘#’, ‘?’ are used for truncation. DIGILIB and Beowulf have the facility for single and multiple wild card searching. Table 5 shows the truncation search facilities available in the chosen digital libraries.
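The three truncation types can be illustrated by translating a truncated term into a regular expression. This is a minimal sketch: the function names are hypothetical, ‘*’ is used as the default symbol, and the symbol is assumed to match zero or more word characters, which may differ from how a given library treats it.

```python
import re
from typing import List


def truncation_to_regex(term: str, symbol: str = "*"):
    """Compile a truncated term into a case-insensitive regular expression."""
    # Escape the literal parts, then join them with a 'zero or more word
    # characters' pattern wherever the truncation symbol appeared.
    parts = [re.escape(part) for part in term.split(symbol)]
    return re.compile(r"\w*".join(parts), re.IGNORECASE)


def truncation_search(term: str, vocabulary: List[str], symbol: str = "*") -> List[str]:
    """Return the vocabulary words that match the truncated term in full."""
    pattern = truncation_to_regex(term, symbol)
    return [word for word in vocabulary if pattern.fullmatch(word)]


vocabulary = ["library", "libraries", "librarian", "interlibrary", "colour", "color"]
right = truncation_search("librar*", vocabulary)   # right truncation
left = truncation_search("*library", vocabulary)   # left truncation
middle = truncation_search("col*r", vocabulary)    # middle truncation
```

Using `fullmatch` rather than `search` is what keeps right truncation from also matching words such as "interlibrary" that merely contain the stem.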
Stemming
Stemming searches look for other grammatical forms of the search terms. For example, a stemming search on fly would also find flies. AMMEM, BUILDER, HEADLINE, NZDL, and ACM have this facility.
Fuzzy search
Fuzzy search expands the search by generating words spelled similarly to the specified word or phrase. This type of expansion allows for misspellings. Only the ACM digital library has this facility.
Phonic searching
Phonic search looks for a word that sounds like the word being searched for and begins with the same letter. Only the ACM digital library has this search facility.
Case sensitivity
Only four digital libraries provide case sensitivity options. NZDL has a facility to select case-sensitive or case-insensitive search using ‘c’ or ‘i’ respectively. The British Library’s Beowulf has the option to select case-sensitive search. In NDLTD and IEL, search terms in lower case will match words in any case; otherwise, an exact case match is used.
Term weighting
In a search expression, users can specify that some terms should count more than others. For example, if a user is looking for documents about both ‘Apple’ and ‘Pear’, he/she might want to give preference to the word ‘Apple’ over the word ‘Pear’. Term weighting gives preference in retrieval to documents that match the more heavily weighted terms. NLC and NZDL have the facility of term weighting search.
Limiters
Some of the digital library collections are grouped according to format, year, form or type. Limiters are used to select or restrict the search to a particular group of documents, forms or types. For example, NCSTRL and ACM provide a facility to limit the search by year. If a user wants to search for documents for two years, he/she can restrict the search period using ‘greater than’ and ‘less than’ operators. Only a few libraries have the facility of comparative operators ‘>’ and ‘