Prototyping Enhanced Online Search Capability

Michael K. Buckland, Mark H. Butler, Barbara A. Norgard, Christian Plaunt
School of Information Management and Systems
207C South Hall, UC Berkeley
Berkeley, CA 94720-4600

LP01

In Proceedings of the Fourteenth National Online Meeting, 1993, pp. 51-56. Medford, NJ: Learned Information, 1993.

Abstract

This paper reports on a project to use a workstation as a front-end connected over the Internet to a large second generation online library catalog. Pre-processing in the front-end enables the searcher to submit new commands, which the front-end passes on to the host in a form acceptable to the host. Post-processing by the front-end of downloaded sets (and supersets) permits two-stage retrieval strategies and, thereby, enhanced retrieval capabilities not supported by the host. There are limits to what can be done, but this approach has the advantage of very inexpensive prototyping without any access to the software of the host system. It is, therefore, without risk to the host. Examples of enhanced functionality in the form of "strategic" search commands and retrieved set analysis are described.

1 Introduction

During the past decade, there has been a great investment in online public access catalogs and electronic bibliographic records. We are now seeing the addition of large periodical indexes and full text databases to these OPACs. This incredible growth highlights two large problems:  

- the need to filter excessive retrieval
- the need to find additional items like ones already found (Norgard et al. 1993)

We sought to develop a simple way in which to explore solutions to these problems by enhancing the existing base, rather than starting again from scratch with a new OPAC. It is impractical and costly to explore and test solutions by modifying the software underlying an existing electronic union catalog.

The outcome of our project is OASIS (Otlet's Adaptive Searcher Information Service) (Buckland et al. 1993). It consists of several programs running in conjunction on a DECstation 5000/200. This program acts as an intermediary between the user and MELVYL (tm), the online union catalog of holdings for the entire University of California system (13 million holdings of 7 million items). OASIS allows us to appear to add commands to MELVYL without actually doing anything to MELVYL's software. New commands can be rapidly implemented and tested, without the large costs associated with actually modifying an existing database.

2 Description

The first incarnation of OASIS is based on a scripting language called Expect (Libes 1991). This UNIX-based program allows the computer to monitor user input and system response. Specific commands can then be executed based on the user/machine interaction. Expect implements many common UNIX commands and all of UNIX's file handling capabilities. The language is interpreted, thus allowing scripts to be modified while the program is running. Most importantly, Expect allows the spawning of subprocesses: in our case, telnet sessions to MELVYL.

The use of telnet allows us to establish a client-server relationship with MELVYL. Expect allows us to intercept all user input and to enhance and direct it properly. Similarly, we are able to intercept system response and to filter it for the user. We can also download large retrieval sets and apply additional processing before presenting the results to the user. MELVYL thinks OASIS is just a regular human user at a simple line terminal, and is happy to oblige its every request.

From the user's perspective all of this background work is transparent. Once OASIS has attached itself to MELVYL, it appears as though the user is using regular MELVYL except that there are now several additional commands available. For the benefit of the user, we have even added some online help for OASIS. When the user requests help, OASIS examines their request. If they require help about OASIS, then it is provided directly. If they require help about MELVYL, their command is passed on to MELVYL.

The simple C-like language underlying Expect allows us to rapidly implement and test new ideas. The relative ease with which new commands can be added provides great flexibility in terms of exploration of dead ends. There is no great loss of resources if an idea proves to be useless.

The current direction of the project breaks down into two major areas: pre-processing and post-processing commands. Pre-processing allows us to enhance the user's request by modifying their input. Post-processing allows us to enhance the union catalog by modifying its output. Taken together, these pre- and post-processed commands allow us to take some steps toward exploring solutions to excessive and insufficient retrieval.
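The interception step described above can be illustrated with a small sketch. This is not the OASIS source, which was written in Expect; it is a hypothetical Python rendering of the routing decision only. The function name `route` and the contents of `OASIS_COMMANDS` are assumptions made for illustration.

```python
from typing import Tuple

# Assumed set of commands handled by the front-end itself (illustrative only).
OASIS_COMMANDS = {"fewer", "filter", "aggregate", "summarize"}


def route(line: str) -> Tuple[str, str]:
    """Decide who should handle one line of user input.

    Returns ("oasis", command) when the front-end handles the line itself,
    or ("host", line) when the line should be passed to the host unchanged.
    """
    words = line.strip().lower().split()
    if not words:
        return ("host", line)
    if words[0] in OASIS_COMMANDS:
        return ("oasis", words[0])
    # Help about a front-end command is answered locally;
    # any other help request goes to the host catalog.
    if words[0] == "help" and len(words) > 1 and words[1] in OASIS_COMMANDS:
        return ("oasis", "help " + words[1])
    return ("host", line)
```

For example, `route("help fewer")` would be answered by the front-end, while `route("help find")` and ordinary search requests would be forwarded to the host as typed.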

2.1 Pre-Processing Commands

One key example of a pre-processing command is "fewer". This command is designed to take advantage of MELVYL's existing mechanisms for limiting an excessive retrieved set in a logical and systematic way. OASIS stores a list of these limiting commands. Currently the default list is:

and language English
and date recent (the last 10 years)
and date current (the last 3 years)
and form book


Each time the user issues the "fewer" command, the next limiting command on the list is sent to MELVYL. This can have the effect of drastically reducing the size of a retrieved set.

Search request: F SU MARX, KARL
Search result: 2,757 records at all libraries

Search request: F SU MARX, KARL AND LANGUAGE ENGLISH
Search result: 1,184 records at all libraries

Search request: F SU MARX, KARL AND LANGUAGE ENGLISH AND DATE RECENT
Search result: 352 records at all libraries

Search request: F SU MARX, KARL AND LANGUAGE ENGLISH AND DATE RECENT AND DATE CURRENT
Search result: 79 records at all libraries

Search request: F SU MARX, KARL AND LANGUAGE ENGLISH AND DATE RECENT AND DATE CURRENT AND FORM BOOK
Search result: 79 records at all libraries

The user has reduced an unusable retrieval set of over 2,500 records down to a more manageable 79. The list invoked by the "fewer" command is completely customizable by the user. At any point they may edit the list. One useful addition is limiting to one or more of the university's three undergraduate libraries. This provides, in effect, a limit to texts of a more introductory or general nature.

Search request: F SU ARTIFICIAL INTELLIGENCE
Search result: 1,817 records at all libraries

Search request: F SU ARTIFICIAL INTELLIGENCE AND LANGUAGE ENGLISH
Search result: 1,718 records at all libraries

Search request: F SU ARTIFICIAL INTELLIGENCE AND LANGUAGE ENGLISH AND DATE RECENT
Search result: 1,484 records at all libraries

Search request: F SU ARTIFICIAL INTELLIGENCE AND LANGUAGE ENGLISH AND DATE RECENT AND DATE CURRENT
Search result: 639 records at all libraries

Search request: F SU ARTIFICIAL INTELLIGENCE AND LANGUAGE ENGLISH AND DATE RECENT AND DATE CURRENT AND FORM BOOK
Search result: 636 records at all libraries

Search request: F SU ARTIFICIAL INTELLIGENCE AND LANGUAGE ENGLISH AND DATE RECENT AND DATE CURRENT AND FORM BOOK AND AT MOFFITT
Search result: 12 records at Moffitt
               636 records at all libraries
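The behaviour of "fewer" shown in the transcripts can be sketched as follows. This is a hypothetical Python illustration, not the Expect code used in OASIS: the class name `FewerState` and its methods are assumptions, and the sketch assumes that each invocation re-issues the base search with one more limiting clause appended, which is how the search requests above read.

```python
# Default limiting clauses, taken from the list given in the text.
DEFAULT_LIMITS = [
    "AND LANGUAGE ENGLISH",
    "AND DATE RECENT",
    "AND DATE CURRENT",
    "AND FORM BOOK",
]


class FewerState:
    """Tracks how many limiting clauses have been applied to one search."""

    def __init__(self, base_query, limits=None):
        self.base_query = base_query
        # The user may supply an edited list, e.g. one ending in "AND AT MOFFITT".
        self.limits = list(DEFAULT_LIMITS if limits is None else limits)
        self.applied = 0

    def fewer(self):
        """Return the next, narrower search request, or None if the list is used up."""
        if self.applied >= len(self.limits):
            return None
        self.applied += 1
        return " ".join([self.base_query] + self.limits[: self.applied])
```

A user-edited list is passed as `limits`; for instance, appending `"AND AT MOFFITT"` to the defaults reproduces the sequence of requests in the second transcript.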


In this way a typical search yielding 1,817 records has been winnowed down to 12 up-to-date introductory works.

Another example of pre-processing concerns error checking. Since all user commands are monitored by the OASIS program, it is possible to prevent the user from doing incorrect or inappropriate things. For instance, the program currently checks to make sure that there is a retrievable set before downloading records for post-processing. OASIS could also be programmed to correct the most common typographical errors in commands before sending them to MELVYL.

A possible future enhancement is error checking and guidance as the user edits the list of limiting commands invoked by "fewer". This list can be checked for accuracy, and suggestions can be made before the list is saved.

A second possible future enhancement involves the simulation of a common command language (Z39.58) on MELVYL's collection of bibliographic, periodical and full text databases. This would allow the user to simultaneously query these different databases using a single command language. The underlying program will parse the user's command and re-issue it to each database in a form it can understand. Certain heuristics can also be built in. For example, in certain databases it is better to search both title and abstract for key words, while in others searching the title and subject fields is more effective. Initially, the result of this command would be a list of the size of retrieved sets in each database available on MELVYL and a note of the characteristics of the material in each database. There is no reason why the system could not be extended to also query other sizeable online union catalogs or even Gopher and WAIS servers.
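The check for a retrievable set mentioned above could look something like the following. This is a hypothetical Python sketch rather than the OASIS implementation: the function names are assumptions, and the pattern is based only on the "Search result:" lines quoted in the transcripts in this paper, so other host responses may need different handling.

```python
import re

# Matches e.g. "1,817 records at all libraries" in a host response.
_COUNT = re.compile(r"(\d[\d,]*)\s+records?\s+at\s+all\s+libraries", re.IGNORECASE)


def retrieved_count(response: str) -> int:
    """Return the size of the retrieved set reported by the host, or 0 if none is found."""
    match = _COUNT.search(response)
    if match is None:
        return 0
    return int(match.group(1).replace(",", ""))


def has_retrievable_set(response: str) -> bool:
    """True when there is at least one record to download for post-processing."""
    return retrieved_count(response) > 0
```

When a response reports both a library-specific count and an overall count, as in the Moffitt example, this sketch returns the "all libraries" figure.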

2.2 Post-Processing

Two examples of post-processing are presented here. Both of them take advantage of a pseudo client-server relationship. They also take the opposite approach of a command like "fewer", in that the goal is usually to retrieve as large a set as possible.

The first example is a command called "filter" or "aggregate". It is designed to deal with the problem of the order of presentation of the retrieved set. Usually this order is alphabetical by author. If you retrieve 2,500 records you have no choice but to go through all of them until you find some of the items you want. Unfortunately the order provided is usually not the order wanted. "Filter" attempts to solve this problem by breaking the large set into subsets. The current criteria are location, date and language. The user is then free to view the contents of cells in any order they choose.

CAT-> find subject artificial intelligence

AGGREGATE=> r1 c2        display records in the cell at row 1 col 2 of grid
AGGREGATE=> aggregate    redisplay Aggregate grid (can use agg)
AGGREGATE=> done         exit back to MELVYL

                            UCB               Other
--------------------------------------------------------------------
||     ||           | English | Other || English | Other || Total ||
|| Row ||           |      C1 |    C2 ||      C3 |    C4 ||       ||
||----------------------------------------------------------------||
|| R1  || 1990-1992 |     232 |     4 ||     250 |     9 ||   495 ||
|| R2  || 1980-1989 |     441 |    49 ||     594 |    39 ||  1123 ||
|| R3  || 1970-1979 |      66 |     8 ||      68 |    11 ||   153 ||
|| R4  || 1960-1969 |      15 |     0 ||      27 |    10 ||    52 ||
|| R5  || 1950-1959 |       0 |     0 ||       1 |     0 ||     1 ||
|| R6  || 1900-1949 |       0 |     0 ||       1 |     0 ||     1 ||
|| R7  || 1800-1899 |       0 |     0 ||       0 |     0 ||     0 ||
|| R8  || 1700-1799 |       0 |     0 ||       0 |     0 ||     0 ||
|| R9  || 1600-1699 |       0 |     0 ||       0 |     0 ||     0 ||
|| R10 ||     -1599 |       2 |     0 ||       0 |     1 ||     3 ||
||----------------------------------------------------------------||
||     || Total     |     756 |    61 ||     941 |    70 ||  1828 ||
--------------------------------------------------------------------
Press RETURN for next set
AGGREGATE->
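The partitioning behind the grid can be sketched briefly. The following Python is a hypothetical illustration, not the OASIS code: the record layout (a dict with `year`, `language` and `locations` keys) and all function names are assumptions. It also assumes that a record held at the chosen location is counted only in that location's columns, so that each record lands in exactly one cell, as the row totals in the grid suggest.

```python
# Date ranges for rows R1..R10, as shown in the grid; None means "no lower bound".
DATE_ROWS = [
    (1990, 1992), (1980, 1989), (1970, 1979), (1960, 1969), (1950, 1959),
    (1900, 1949), (1800, 1899), (1700, 1799), (1600, 1699), (None, 1599),
]


def row_index(record):
    """Return the 0-based grid row for a record, or None if no range matches."""
    year = record["year"]
    for i, (low, high) in enumerate(DATE_ROWS):
        if (low is None or year >= low) and year <= high:
            return i
    return None


def column_index(record, location):
    """Columns C1..C4: chosen location English/Other, then elsewhere English/Other."""
    here = location in record["locations"]
    english = record["language"] == "English"
    return (0 if here else 2) + (0 if english else 1)


def aggregate(records, location):
    """Partition a downloaded set into a 10 x 4 grid of record lists."""
    grid = [[[] for _ in range(4)] for _ in DATE_ROWS]
    for record in records:
        r = row_index(record)
        if r is not None:
            grid[r][column_index(record, location)].append(record)
    return grid
```

With this layout, the command `r1 c2` corresponds to displaying the records in `grid[0][1]`, and the counts printed in the grid are the lengths of the cell lists.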

Some topics, particularly those dealing with technology, require finer time breaks. We have built in a second grid for only the last ten years. This allows the user to really focus in on what is wanted.

AGGREGATE=> r1 c2        display records in the cell at row 1 col 2 of grid
AGGREGATE=> aggregate    redisplay Aggregate grid (can use agg)
AGGREGATE=> done         exit back to MELVYL

                            UCB               Other
--------------------------------------------------------------------
||     ||           | English | Other || English | Other || Total ||
|| Row ||           |      C1 |    C2 ||      C3 |    C4 ||       ||
||----------------------------------------------------------------||
|| R1  || 1992-93   |      46 |     0 ||      77 |     1 ||   124 ||
|| R2  || 1991      |      90 |     1 ||      80 |     3 ||   174 ||
|| R3  || 1990      |     101 |     3 ||      98 |     5 ||   207 ||
|| R4  || 1989      |      68 |     8 ||      93 |     6 ||   175 ||
|| R5  || 1988      |      93 |     6 ||      96 |     8 ||   203 ||
|| R6  || 1987      |      80 |    10 ||     100 |     5 ||   195 ||
|| R7  || 1986      |      56 |     7 ||     105 |     8 ||   176 ||
|| R8  || 1985      |      57 |     7 ||      62 |     5 ||   131 ||
|| R9  || 1984      |      39 |     4 ||      63 |     1 ||   107 ||
|| R10 || 1983      |      24 |     2 ||      30 |     2 ||    58 ||
||----------------------------------------------------------------||
||     || Total     |     654 |    48 ||     804 |    44 ||  1550 ||
--------------------------------------------------------------------
Press RETURN for next set
AGGREGATE->

The entire retrieved set is stored in secondary memory on the workstation for the duration of the session. Thus users can modify their parameters and re-submit the set for additional post-processing. In an effort to provide finer granularity, we provide not only a campus by campus look, but also a library by library look within the campus. This allows the user to see what is available at the library where they are doing their search.

AGGREGATE=> r1 c2        display records in the cell at row 1 col 2 of grid
AGGREGATE=> aggregate    redisplay Aggregate grid (can use agg)
AGGREGATE=> done         exit back to MELVYL

                         Astr/Math            Other
--------------------------------------------------------------------
||     ||           | English | Other || English | Other || Total ||
|| Row ||           |      C1 |    C2 ||      C3 |    C4 ||       ||
||----------------------------------------------------------------||
|| R1  || 1990-1992 |      27 |     0 ||     455 |    13 ||   495 ||
|| R2  || 1980-1989 |     134 |    22 ||     901 |    66 ||  1123 ||
|| R3  || 1970-1979 |      40 |     3 ||      94 |    16 ||   153 ||
|| R4  || 1960-1969 |       9 |     0 ||      33 |    10 ||    52 ||
|| R5  || 1950-1959 |       0 |     0 ||       1 |     0 ||     1 ||
|| R6  || 1900-1949 |       0 |     0 ||       1 |     0 ||     1 ||
|| R7  || 1800-1899 |       0 |     0 ||       0 |     0 ||     0 ||
|| R8  || 1700-1799 |       0 |     0 ||       0 |     0 ||     0 ||
|| R9  || 1600-1699 |       0 |     0 ||       0 |     0 ||     0 ||
|| R10 ||     -1599 |       0 |     0 ||       2 |     1 ||     3 ||
||----------------------------------------------------------------||
||     || Total     |     210 |    25 ||    1487 |   106 ||  1828 ||
--------------------------------------------------------------------
Press RETURN for next set
AGGREGATE->

The user has the flexibility to toggle between libraries.

AGGREGATE=> r1 c2        display records in the cell at row 1 col 2 of grid
AGGREGATE=> aggregate    redisplay Aggregate grid (can use agg)
AGGREGATE=> done         exit back to MELVYL

                        Engineering           Other
--------------------------------------------------------------------
||     ||           | English | Other || English | Other || Total ||
|| Row ||           |      C1 |    C2 ||      C3 |    C4 ||       ||
||----------------------------------------------------------------||
|| R1  || 1990-1992 |     150 |     0 ||     332 |    13 ||   495 ||
|| R2  || 1980-1989 |     167 |     1 ||     868 |    87 ||  1123 ||
|| R3  || 1970-1979 |      14 |     0 ||     120 |    19 ||   153 ||
|| R4  || 1960-1969 |       5 |     0 ||      37 |    10 ||    52 ||
|| R5  || 1950-1959 |       0 |     0 ||       1 |     0 ||     1 ||
|| R6  || 1900-1949 |       0 |     0 ||       1 |     0 ||     1 ||
|| R7  || 1800-1899 |       0 |     0 ||       0 |     0 ||     0 ||
|| R8  || 1700-1799 |       0 |     0 ||       0 |     0 ||     0 ||
|| R9  || 1600-1699 |       0 |     0 ||       0 |     0 ||     0 ||
|| R10 ||     -1599 |       1 |     0 ||       1 |     1 ||     3 ||
||----------------------------------------------------------------||
||     || Total     |     337 |     1 ||    1360 |   130 ||  1828 ||
--------------------------------------------------------------------
Press RETURN for next set
AGGREGATE->

The next iteration of OASIS will allow for complete configurability by the user. They will be able to pick any three dimensions from a list. The dimensions can be any MARC field or subfield in the retrieved set. We expect this will be extremely useful for more experienced users who will be able to think up novel combinations of dimensions for displaying or re-displaying the information. For example, what books on music were published in San Francisco before 1900? A further extension would allow the user to combine separate retrieved sets for additional post-processing.

The second post-processing command is "summarize". It will display a summary list by order of frequency for the selected attribute. The fifteen items with the highest incidence are displayed. The simplest example allows a user to find applicable LC subject headings given only a title word. For instance, a search for the subject heading "Vietnam War" yields no records. However, a search for the title words "Vietnam" and "War" yields 535 records. When summarized by subject, the user can see which subject headings will retrieve the most items.

Your request: OASIS subject summary
Your result:  657 subject headings found

Type FIND SU and the heading to search for books with that heading.
Do not include birth or death dates.

 1. Vietnamese Conflict, 1961-1975                                111 records
 2. Vietnamese Conflict, 1961-1975 -- United States                85 records
 3. Vietnamese Conflict, 1961-1975 -- Personal narratives,         43 records
 4. Vietnamese Conflict, 1961-1975 -- Aerial operations, Americ    24 records
 5. Vietnamese Conflict, 1961-1975 -- Poetry                       22 records
 6. Vietnam -- History                                             21 records
 7. Vietnamese Conflict, 1961-1975 -- Literature and the confli    20 records
 8. United States -- Foreign relations -- Vietnam                  17 records
 9. Vietnam -- Foreign relations -- United States                  17 records
10. Vietnam -- History -- 1945-1975                                17 records
11. American literature -- 20th century -- History and criticis    15 records
12. Vietnamese Conflict, 1961-1975 -- Motion pictures and the      13 records
13. United States -- History -- 1945-                              12 records
14. Vietnamese Conflict, 1961-1975 -- Literature and the war       12 records
15. United States -- Foreign relations -- 1945-                    11 records

Similarly, a more advanced user can view a summary of subjects from a retrieved subject set, thus gaining an insight into the interrelatedness of subject headings in the literature, a rough co-occurrence measure, and the identification of a selection of related subjects potentially worth searching.

Your request: OASIS subject summary
Your result:  1275 subject headings found

Type FIND SU and the heading to search for books with that heading.
Do not include birth or death dates.

 1. Artificial intelligence                                 1152 records
 2. Artificial intelligence -- Congresses                    507 records
 3. Expert systems (Computer science)                        167 records
 4. Expert systems (Computer science) -- Congresses          115 records
 5. Artificial intelligence -- Data processing                92 records
 6. Robotics -- Congresses                                    64 records
 7. Computational linguistics                                 43 records
 8. Linguistics -- Data processing                            37 records
 9. Machine learning                                          36 records
10. Artificial intelligence -- Addresses, essays, lectures    35 records
11. Logic, Symbolic and mathematical                          27 records
12. Problem solving                                           27 records
13. Cognition -- Congresses                                   24 records
14. Computer vision -- Congresses                             24 records
15. Artificial intelligence -- Abstracts                      23 records

The summarize command will also work with authors and libraries. In the case of libraries, the idea is to facilitate browsing in an academic setting where there are many small libraries with overlapping holdings related to specific subject areas. Unlike the "filter" command, this will show the user which libraries are likely to be most worth a visit. Once those libraries have been identified, the user can look at a specific library's contents by using the library as a dimension in the "filter" command.
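The frequency summary produced by "summarize" amounts to counting attribute values across the downloaded set and keeping the most frequent ones. A minimal Python sketch follows, under stated assumptions: records are dicts whose attribute values are lists of strings, and the function name `summarize` and the key `subjects` are illustrative rather than taken from OASIS. It also assumes a heading is counted once per record, so the count reads as "records with that heading".

```python
from collections import Counter


def summarize(records, attribute, top=15):
    """Return the `top` most frequent values of `attribute` as (value, record count) pairs."""
    counts = Counter()
    for record in records:
        # Use a set so a value repeated within one record is counted once.
        for value in set(record.get(attribute, [])):
            counts[value] += 1
    return counts.most_common(top)
```

The same function would serve for authors or libraries by passing a different attribute name, provided the records carry those values.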

3 Conclusion

OASIS is an attempt to find an efficient means of prototyping solutions to excessive and insufficient retrieval on today's OPACs. Our current implementation allows us to rapidly develop and test potential solutions. If they do not prove effective, then very little has been lost in time or money. Furthermore, OASIS provides an example of how existing OPACs can be enhanced without modifying their underlying structure.

References

Buckland, M. K., M. H. Butler, B. A. Norgard, & C. Plaunt (1993). Oasis: A front-end for prototyping catalog enhancements. Library Hi Tech, 10(4):7-22.

Libes, D. (1991). Expect: Scripts for controlling interactive programs. Computing Systems, 4(2).

Norgard, B. A., M. G. Berger, M. K. Buckland, & C. Plaunt (1993). Online catalog: From technical services to access service. Advances in Librarianship, 17:111-148.
