Making a Library Catalog Adaptive

Michael K. Buckland, Barbara A. Norgard, Christian Plaunt
School of Library and Information Studies
University of California, Berkeley, CA 92740

In: American Society for Information Science: Proceedings of the 55th Annual Meeting, 1992, pp. 260-263. Medford, NJ: Learned Information, 1992

Abstract
The design of a prototype adaptive catalog is presented. Online catalog searches commonly retrieve too few or too many items. This prototype, implemented as a transparent workstation-based front-end to the MELVYL (tm) online catalog of the holdings of the nine campuses of the University of California, adapts to excessive or insufficient retrieval by strategically limiting, sorting, or expanding users' searches, based on preferences defined by the user.
1
Overview
This paper reports on progress in a research and demonstration project at the School of Library and Information Studies, University of California, Berkeley, entitled “Prototype for an Adaptive Library Catalog” (Buckland et al. 1992). The project seeks to make the searching of a large online library catalog easier and more effective. In particular, remedies are needed for the tendency of large online bibliographic retrieval systems to retrieve too many or too few records. A prototype, named OASIS, has been developed in the form of a DEC workstation serving as a front-end to the University of California MELVYL (tm) Catalog. Through the pre-processing of commands and the post-processing of retrieval results, the OASIS prototype allows enhancements to the MELVYL system to be developed and demonstrated economically. Three “strategic commands” – FEWER, FILTER, and MORE – will be discussed.
2
Problems
This project addresses a set of related problems that are generally found in online library catalogs and, indeed, in other bibliographic retrieval systems:

1. Searches typically retrieve too many or too few records (Markey 1989; Hudson & Walker 1987; Larson 1991). This problem is exacerbated by the growing size of databases as new material is added and as old records are converted to machine-readable form.
2. Newer, more sophisticated retrieval techniques, usually developed on small files, do not (or may not) scale up well or economically when used on large files (Hildreth 1989).
3. Users use only the most basic commands. Complex commands are rarely employed (Bellardo 1985).
4. Users generally settle for the first few records or start over with some new search command (Seaman 1992).
5. Most users of online catalogs are generally inexpert and yet, as card catalogs are abandoned, have no choice but to use the online catalog. Hence a major design challenge is to avoid making the task of using the online catalog too complex (Walker 1990).

How can a system be designed to respond to this combination of problems?
3
A Solution: Adaptive Retrieval Systems
The approach adopted has three bases. Firstly, we seek to provide for the novice user some of the experience and tactics that an experienced search intermediary would use. Since human intermediaries cannot always be provided, this expertise needs to be incorporated into the system itself. Secondly, the system should be responsive to each user’s individual needs and preferences regardless of that user’s degree of skill. Thirdly, the system should be designed to be adaptive.

Catalog designers cannot predict what searches will be submitted. Searchers have little basis for predicting how much material relevant to any given search a database will contain. Online bibliographic systems have traditionally been designed to retrieve everything with specified attributes, but, in practice, people rarely want everything. They normally want a few: those few that will suit them best. Although it should be possible for searchers to seek and obtain “everything on” a topic, it would seem more sensible for catalogs ordinarily to supply a select few and allow the option of asking for more if and when desired. In other words, whatever the search and whatever the contents of the database, the catalog should ordinarily seek to supply the preferred number of items and retrieve those that appear to match the individual searcher’s preferences best.
4
Strategic Searches and Strategic Search Commands
A search strategy is a series of tactical search commands designed to achieve the desired result. Search strategies commonly fall into one of a few standard patterns. Conscious use of search strategies is characteristic of experienced searchers. Could at least some search strategies be incorporated into the system in such a way that they would be routinely used by inexperienced searchers? We use the term “strategic search command” to denote a single command that would automatically initiate a series of tactical search commands.

Consider the pervasive problem of excessive (or insufficient) retrieval. In any case of excessive retrieval it would be convenient to be able to issue a “FEWER” command that would automatically modify the search to yield a preferable subset. In cases of large retrieval, a “SORT” command that would analyze the retrieved set and present a statistical summary of it could prove very helpful, because the searcher could then understand the characteristics of what has been retrieved. Such an analysis would provide a basis for deciding how to proceed. If too few records have been retrieved, five or fewer, for example, it would be convenient to be able to enter a “MORE” command such that the system would retrieve more records. Implementations of strategic commands for FEWER, SORT, and MORE for the MELVYL catalog are being examined using the OASIS prototype.
5
The “FEWER” Command
A FEWER command has been developed for reducing the size of retrieved sets. To do this, the system must know which criteria to use as bases for reducing the retrieved set. As presently implemented there are three default preferences: it is assumed that, other things being equal, if there are too many records, a user tends to prefer records for materials that are conveniently accessible (e.g., held in a nearby library), are readable (e.g., are in English), and are up-to-date. This is operationalized by incorporating in the OASIS front-end a series of MELVYL search modifiers that limit search results in the preferred way, such as “AND AT BERKELEY”, “AND LANGUAGE ENGLISH”, “AND DATE RECENT”, and so on. A default set of such search limiters is provided, but any user can specify other personal preferences, e.g., for older material, in Spanish, and held in southern California. Whatever preferences are adopted, an excessively large retrieved set can be reduced by issuing the command FEWER. The MELVYL system would not recognize “FEWER”. Instead, the FEWER command is intercepted in the telnet interface between the front-end and MELVYL and, rather than being forwarded, triggers the sending of the first (or next) of the search limiters stored in the front-end.
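The interception step described above can be sketched as follows. This is a minimal illustration in Python, not the OASIS code (which the paper says is written in Lisp). The class name `FewerFrontEnd`, the `send_to_host` callback, the reset of the limiter sequence on each new FIND, and the choice to send nothing once the limiters are exhausted are all assumptions made for the example.

```python
from typing import Callable, List, Optional

# Default limiters quoted in the text; a user could supply a different list.
DEFAULT_LIMITERS = ["AND AT BERKELEY", "AND LANGUAGE ENGLISH", "AND DATE RECENT"]


class FewerFrontEnd:
    """Hypothetical front-end that rewrites FEWER before it reaches the host."""

    def __init__(self, send_to_host: Callable[[str], None],
                 limiters: Optional[List[str]] = None) -> None:
        self.send_to_host = send_to_host
        self.limiters = list(DEFAULT_LIMITERS if limiters is None else limiters)
        self.next_limiter = 0  # index of the limiter to send on the next FEWER

    def handle(self, user_input: str) -> None:
        command = user_input.strip()
        if command.upper() == "FEWER":
            # Never forward FEWER itself; send the next stored limiter instead.
            if self.next_limiter < len(self.limiters):
                self.send_to_host(self.limiters[self.next_limiter])
                self.next_limiter += 1
            return  # assumption: with no limiters left, nothing is sent
        if command.upper().startswith("FIND"):
            # Assumption: a new search restarts the limiter sequence.
            self.next_limiter = 0
        # Everything else passes through unchanged, as on a plain terminal.
        self.send_to_host(command)
```

For instance, with `sent = []` and `front_end = FewerFrontEnd(sent.append)`, handling a FIND command followed by two FEWER commands leaves the FIND command and the first two limiters in `sent`, in that order.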
6
Fewer Example
A MELVYL search on the Library of Congress Subject Heading “information retrieval” yields 314 records, which is likely to be too many for most purposes. The effect of repeated use of the FEWER command with the default set of preferences is as follows:

Command                           Records
FIND XSU INFORMATION RETRIEVAL        314
FEWER (sends AND AT UCB)              178
FEWER (sends AND LANG ENGLISH)        171
FEWER (sends AND DATE RECENT)          87
FEWER (sends AND DATE CURRENT)         29

7
Retrieved Set Analysis
The “FILTER” or “SORT” command analyzes a retrieved set and presents the user with a statistical summary. The MELVYL system would not recognize the FILTER command, which, again, is pre-processed into commands that instruct MELVYL to send a continuous display of the records. The OASIS front-end diverts these records into local storage, sorts the records according to the value of each attribute of interest, and displays a count of the results. In the initial versions, it was assumed that location, language, and date would be of interest. The OASIS software examines the downloaded MARC records for the values of these attributes and tallies them into a three-dimensional array that is mapped into a two-dimensional display. An analysis of a retrieved set of 104 records yielded by a subject search on “cluster analysis” is shown in Table 1.
FIND SUBJECT CLUSTER ANALYSIS    104

Location:      Berkeley           Elsewhere
Language:      English   Other    English   Other    Total
1989-92            5        2         5        1       13
1986-88            8        3         7        1       19
1983-85            4        2         8        2       16
1980-82            8        2         5        0       15
1977-79            3        1         7        2       13
1974-76            2        1         9        1       13
1956-73            9        0         6        0       15
Total:            39       11        47        7      104

Table 1. Retrieved set analysis.

In effect a multiplicity of subsets, one for each combination of each specified attribute, has been created. The display serves two quite different purposes: the display itself reveals the composition of what has been retrieved, which is helpful feedback in its own right; and the records in any subset can be selected for display, a searching amenity. In principle, such analysis could be based on any specifiable data in (or inferred from) any field in downloaded bibliographic records.
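The tallying behind a display like Table 1 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the OASIS implementation: records are represented as simple dictionaries with hypothetical `year`, `location`, and `language` keys rather than real MARC fields, and the date bands are copied from Table 1.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Date bands as they appear in Table 1.
DATE_BANDS: List[Tuple[int, int]] = [
    (1989, 1992), (1986, 1988), (1983, 1985), (1980, 1982),
    (1977, 1979), (1974, 1976), (1956, 1973),
]
# Column order of the two-dimensional display.
COLUMNS = [("Berkeley", "English"), ("Berkeley", "Other"),
           ("Elsewhere", "English"), ("Elsewhere", "Other")]


def band_label(year: int) -> str:
    """Return a label such as '1989-92' for the band containing the year."""
    for start, end in DATE_BANDS:
        if start <= year <= end:
            return f"{start}-{str(end)[-2:]}"
    return "other"  # years outside every band are not shown in the table


def analyze(records: Iterable[Dict]) -> Counter:
    """Count records per (date band, location, language) cell."""
    cells: Counter = Counter()
    for rec in records:
        location = "Berkeley" if rec["location"] == "Berkeley" else "Elsewhere"
        language = "English" if rec["language"] == "English" else "Other"
        cells[(band_label(rec["year"]), location, language)] += 1
    return cells


def table_rows(cells: Counter) -> List[List]:
    """Flatten the three-dimensional counts into rows with a row total."""
    rows = []
    for start, _end in DATE_BANDS:
        label = band_label(start)
        counts = [cells[(label, loc, lang)] for loc, lang in COLUMNS]
        rows.append([label] + counts + [sum(counts)])
    return rows
```

Keying the counts by the full attribute combination means each cell corresponds to a selectable subset of the downloaded records, which is the second purpose the display serves.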
8
“MORE” and “SUMMARIZE”
The emphasis of the project was on coping with excessive retrieval, but sometimes more records are wanted. In practice searchers do not simply want more; they want additional works that are similar to the retrieved set in specific ways. An expert searcher would expand a search result by finding other works by the same author, or on the same subject, or with similar titles, or with similar call numbers. The key is to identify the attribute that will lead to other works that are related in the desired way. For example, a MELVYL subject search on “bayesian statistics” yields no records. However, a title keyword search on “bayesian” and “statistics” does retrieve a set of 26 records. The OASIS command SUMMARIZE SUBJECTS downloads the subject headings assigned to the retrieved set and displays these subject headings in order of frequency so that the searcher can extend the search knowledgeably in whatever is the preferred direction. The SUMMARIZE SUBJECTS command, when applied to the 26 records found by the title keyword search “bayesian statistics”, yields the following:

FIND TW BAYESIAN STATISTICS                    26
SUMMARIZE SUBJECTS
Search result: 59 headings found

1. Bayesian statistical decision theory        26
2. Econometrics                                 5
3. Mathematical statistics                      5
4. Probabilities                                4
5. Social sciences                              3
6. Statistics                                   3
A subject search on “bayesian statistical decision theory” leads to 203 records, a significant expansion of the retrieved set.
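The frequency ranking performed by SUMMARIZE SUBJECTS can be sketched in a few lines of Python. This is a hedged illustration, not the OASIS code: the record structure (a dictionary with a hypothetical `subjects` list), the rule that a heading counts once per record, and the alphabetical tie-break are all assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize_subjects(records: Iterable[Dict]) -> List[Tuple[str, int]]:
    """Rank subject headings by the number of records that carry them."""
    counts: Counter = Counter()
    for rec in records:
        # Assumption: a heading repeated within one record counts only once.
        counts.update(set(rec["subjects"]))
    # Most frequent first; ties are broken alphabetically (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

The searcher would then pick one of the top-ranked headings and issue an ordinary subject search on it, as in the example above.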
9
Implementation
Modification of the software of an online library catalog is likely to be a complex and expensive undertaking. There is, therefore, considerable advantage if an inexpensive prototype can be used to support the development of new and experimental features and to gain experience with their effects before deciding to modify the operational system. At the same time, new commands should be tested against databases of realistic size and complexity.

The approach adopted in this project is to use a workstation as a transparent front-end to the MELVYL online catalog of the holdings of the nine campuses. The workstation normally functions as if it were a plain terminal transmitting data to and from MELVYL. However, the power of the workstation is used to identify some user input and pre-process it into a form that MELVYL will be able to handle. In particular, pre-processing is used to substitute a series of tactical commands for a user’s single strategic command. The workstation can also perform post-processing, downloading sets of retrieved records in order to do things that MELVYL cannot do, for example, retrieved set analysis and the summarizing of headings for expanded searches. The idea is that if a new function developed on the prototype were attractive enough, it could be implemented on the host system.

The OASIS front-end runs on a Unix workstation (DECStation 5000) in Berkeley. It is linked over the Internet to the MELVYL Catalog (Books), which runs on an IBM mainframe in Oakland. It is written in Lisp and is designed to be flexible and amenable to experimentation.
10
Comments
The project objective was to demonstrate that strategic commands to reduce or expand retrieval results are feasible. More experience is needed to ascertain how effective and how helpful they will be in routine use. It is hoped that simple strategic commands of this form will empower novice users to achieve more expert searching. It is already clear that strategic commands are a very convenient amenity for expert searchers.

Prototyping in this manner, using a workstation as a front-end to an operational system, is convenient but has two constraints. Firstly, the data available to the front-end is limited by the functionality of the host system. One can demonstrate innovative searches on downloaded retrieved sets, but searches in the host’s database are restricted to the search capabilities of the host. Secondly, the MELVYL system was not designed for high-speed downloading. Operations involving the downloading of large retrieved sets will remain tedious until downloading speeds much faster than one MARC record per second are achieved. Nevertheless, two-stage retrieval, downloading a large retrieved set for more detailed searching, provides an opportunity to use retrieval techniques on the retrieved set that might be impossible on the host, either because they would not scale up well to a very large database or because they are technically or economically impractical.

Note that the examples given involve the ranking of subsets rather than strict document ranking. Also, there appears to be significant scope for using non-topical attributes such as date, language, and location for limiting and analyzing search results. There seems no obvious reason why the approach described here could not be adopted more generally, not only for other library catalogs but also for other kinds of bibliographic retrieval systems such as indexing and abstracting services.

The general approach is to move some of the complexity out of the task facing the searcher and into the system (Buckland & Florian 1991).
11
Future Work
The existing experimental prototype needs to be made more robust to support more extensive and realistic use. The strategic search command for expanding searches needs further development. The initial focus on language, date, and location could be expanded to include any fields in the MARC record. The FILTER command can also be used to analyze the MARC records themselves, notably for detecting deficient records. In addition, the analysis of large retrieved sets is likely to be useful for collection analysis.
Acknowledgments
The work reported in this paper summarizes a research and demonstration project supported by the US Department of Education under the Higher Education Act, Title IID, award R19700017, and by the School of Library and Information Studies of the University of California at Berkeley.
References
Bellardo, T. (1985). What do we really know about online searchers. Online Review, 9:223–239.
Buckland, M. K. & D. Florian (1991). Expertise, task complexity, and the role of intelligent information systems. Journal of the American Society for Information Science, 42:635–643.
Buckland, M. K., B. A. Norgard, & C. Plaunt (1992). Design for an adaptive library catalog. In Networks, Telecommunications and the Networked Information Resource Revolution: Proceedings of the ASIS 1992 Mid-Year Meeting, pages 165–171, Silver Spring, MD. American Society for Information Science.
Hildreth, C. R. (1989). Intelligent Interfaces and Retrieval Methods for Subject Searching in Bibliographic Retrieval Systems. Cataloging Distribution Service, Library of Congress, Washington, DC.
Hudson, J. & G. Walker (1987). The year’s work in technical services research, 1986. Library Resources and Technical Services, (December 1987):275–286.
Larson, R. R. (1991). Classification clustering, probabilistic information retrieval, and the online catalog. Library Quarterly, 61(2):133–173.
Markey, K. (1989). Integrating the machine-readable LCSH into online catalogs. Information Technology and Libraries, 33:299–312.
Seaman, S. (1992). Online catalog failure as reflected through interlibrary loan error requests. College & Research Libraries, 53:113–120.
Walker, S. (1990). Interactional aspects of a reference retrieval system. In Informatics 10: Prospects for Intelligent Retrieval, pages 119–136, London. Aslib.