UNDERSTANDING EXPERT SEARCH STRATEGIES FOR DESIGNING USER-FRIENDLY SEARCH INTERFACES

Anne Aula and Mika Käki
Tampere Unit for Computer-Human Interaction
Department of Computer and Information Sciences
Pinninkatu 53B, FIN-33014 University of Tampere
E-mail: [email protected], [email protected]
ABSTRACT

Web search engines face an extremely heterogeneous user population, from web novices to highly skilled experts. The search strategies of experienced web searchers are currently largely unknown, and this paper addresses the issue with an observational study. Seven computer scientists were observed during, and interviewed after, performing their own work-related web search tasks. The observations indicated that experts have effective means for enhancing their searching, such as using multiple search terms and operators, frequent query editing, using multiple windows, versatile result saving, and using the 'Find' functionality. On the other hand, even the experts had misconceptions about the default operator and the ordering of the returned documents. Based on the results, we suggest that search user interfaces should: 1) suggest alternative search terms, 2) explain search operators in natural language, 3) provide a search history, and 4) facilitate users’ orientation to the results. The suggestions are formulated as concrete solutions in a prototype user interface. The aim is to put the advanced strategies at the disposal of all users, not just experts.

KEYWORDS

Expert strategies, interface design, web search, search user interface.
1. INTRODUCTION

About 85% of the users of the World Wide Web (web) have been estimated to use search engines (Kobayashi and Takeda, 2000). However, novices have been shown to have considerable difficulties in finding information on the web (Pollock and Hockley, 1997), while experts are more successful. The difference between the groups implies that the strategies employed can make a difference.

Pollock and Hockley (1997) studied the web searching of Internet novices and found that novices inappropriately tried to use natural language expressions, tried to express several searches at the same time, and over- or under-specified their search requests. Furthermore, search engine log studies show that the “general public” also uses only a few search terms (typically 1 to 3), uses operators rarely, and makes many mistakes when using them (Jansen and Pooch, 2001; Jansen et al., 2000; Spink et al., 2000).

In addition to query formulation, an important part of information search is keeping the found information available for future use. Jones et al. (2001) and Sellen et al. (2002) report multiple methods (e.g., e-mailing URLs to oneself) that professionals use to achieve this.

Our goal is to uncover the strategies that experienced users employ to make the demanding task of information search fluent. We also aim to make those strategies available to less-experienced users. As a target group, we chose seven researchers in computer science. Researchers need large amounts of information in their work, and the web is a valuable source for it in computer science (Goodrum et al., 2001). Frequent use of the web, together with computer expertise, enables these people to use computers creatively in information searching.

Our observations showed that the experts formulated queries with multiple search terms and refined their queries frequently. They used many browser windows to retain the context while searching. They also used file system services to store temporary and final results, and they used the ‘Find’ function frequently. Surprisingly, they had misconceptions about the ordering of the results and the default operator.
2. PARTICIPANTS AND SETTINGS

Seven researchers in computer science participated. The participants had worked as researchers for 4.5 years on average. All of them were active users of the web, having used it for 7.6 years on average. The participants rated themselves as experienced information searchers (mean 8.3 on a scale from 1 to 10).

Participants were observed and interviewed in their own offices while doing work-related searches. They were told to use all the tools and methods they would normally use. They could finish the session whenever they wanted; if a session was still running after 30 minutes, the experimenter ended it. The session was recorded with a video camera. After the session, the participants were thoroughly interviewed about their search strategies.
3. OBSERVATIONS

All of the participants used Google (http://www.google.com/). They had used Google for 2.5 years on average. None of the subjects had altered the default settings of Google, so 10 results were shown per page. The mean time for a search session was about 25 minutes per participant. During the search session, participants entered 7.7 queries on average, resulting in a total of 54 queries.

Search terms were defined as character strings separated by a space. Stop words, such as articles, were not counted as search terms unless they were part of a phrase or preceded by a ‘+’ sign. The mean number of search terms per query was 3.3 (SD=1.8). For participants P1, P3, and P5, the mean number of search terms was 4.3 (SD=0.5), and for the others the mean was 2.3 (SD=0.8). Usually, the queries consisted of just the query terms written consecutively (86% of all queries). Phrase search was used in 4% of the queries and the ‘–’ sign in 10%. Four participants used neither phrases nor the ‘–’ sign.

Observations are summarized in Table 1, where the columns are arranged from left to right according to work experience. Note that the three users with the most work experience used the longest queries (shaded columns). Furthermore, these users are the ones who most commonly used search strategies that are not directly supported or suggested by the search engine (marked in the table with ‘+’). The misconceptions about the search engine are shown at the bottom of the table (marked with ‘–’).

Table 1. Summary of the observations. Subjects in the columns and strategies and problems in the rows. Advanced strategies are marked with ‘+’ and problems with ‘–’.

Observation                          P1    P3    P5    P7    P6    P4    P2
Number of years as a researcher      15    4.5   4     3     3     1     1
Number of search terms per query     3.8   4.2   4.9   3.3   2.5   1.7   1.6
Times refining the query             4     6     5     1     1     1     1

[The remaining rows of Table 1 carry per-participant ‘+’ and ‘–’ marks whose column positions are not recoverable here. Rows marked with ‘+’: Advanced search features (‘–’ or phrase); New window for the results; Saving links to separate file/folder; ‘Find’ to locate search terms; Copying and pasting search terms. Rows marked with ‘–’: Misconceptions about the default operator; Misconceptions about the order of results.]
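The term-counting rule described in the observations can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used in the study: the stop-word list is hypothetical (the text only mentions "stop words, such as articles"), and the sketch assumes that every word inside a quoted phrase counts as one term and that operator words are not treated specially.

```python
import re

# Hypothetical stop-word list; the study only says "stop words, such as articles".
STOP_WORDS = {"a", "an", "the"}


def count_search_terms(query: str) -> int:
    """Count search terms: space-separated strings, skipping stop words
    unless they are inside a quoted phrase or preceded by '+'."""
    count = 0
    # Each match is either a double-quoted phrase or a run of non-space characters.
    for match in re.finditer(r'"([^"]*)"|(\S+)', query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            # Assumption: every word in a phrase counts, stop words included.
            count += len(phrase.split())
        elif word.startswith("+"):
            # A '+' prefix forces the word to count, even if it is a stop word.
            if len(word) > 1:
                count += 1
        else:
            # Strip a leading exclusion sign before the stop-word check.
            bare = word.lstrip("-").lower()
            if bare and bare not in STOP_WORDS:
                count += 1
    return count


print(count_search_terms("the usability study"))               # 2
print(count_search_terms('"the art of search" +the -maps'))    # 6
```

In the second call, the four words of the phrase, the forced stop word, and the excluded word each count once.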
The most important advanced strategies used by the participants were the following:

• Saving links and results. All but one participant collected and organized the material they found for later use. Searching was seen as a two-stage process in which “relevant-looking” documents are first collected and only afterwards read in detail. Four participants used bookmarks, one used a separate text file, and one used file system folders to store the links. P1 commented: “Usually, in a search task like this, I go through a large number of search results quickly. If the results are PDF files, I save them on my computer and if they are WWW pages, I collect links to them. Only in the next phase I start reading the contents of the result pages.”

• Opening new windows. Four participants opened result pages in a new browser window. Windows were seen as good memory aids, but this strategy also had problems. P5 commented: “I can easily have ten windows open at the same time. […] Soon the system would have collapsed, because the system resources of Windows 98 are running out because of too many open browser windows.”

• Using the Find command. Two participants frequently used the ‘Find’ function to quickly locate the search terms in the documents (HTML or PDF) and to see the context in which they appeared.

• Copy-pasting terms. Two users commonly entered search terms using the ‘Copy’ and ‘Paste’ actions. For example, titles of articles on a web page were copied and pasted into the search field. P5 commented: “I start with common search terms. After finding something based on the common terms, I start using the terms found in the results.”

A surprising result was that the participants had misconceptions about the way Google works. None of the participants knew that Google uses AND as the default operator, and only two participants had a correct idea about the result ordering. P7 said: “In my understanding, it first tries to find the documents that have all the search terms and shows them, then the ones with all but one term and so on.”
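The difference between an implicit AND and the behaviour P7 expected can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions: documents are plain strings matched on whole lowercase words, the function names are hypothetical, and nothing here reflects how Google actually retrieves or ranks documents.

```python
def count_matched_terms(document, terms):
    """Number of query terms that occur as whole words in the document."""
    words = set(document.lower().split())
    return sum(term.lower() in words for term in terms)


def and_search(documents, terms):
    """Implicit AND: keep only documents that contain every query term."""
    return [d for d in documents if count_matched_terms(d, terms) == len(terms)]


def p7_model_search(documents, terms):
    """P7's mental model: documents with all terms first, then all but one,
    and so on. Assumption: documents matching no term are dropped."""
    scored = [(count_matched_terms(d, terms), d) for d in documents]
    # sorted() is stable, so documents with equal scores keep their input order.
    ranked = sorted(scored, key=lambda pair: -pair[0])
    return [d for matched, d in ranked if matched > 0]


docs = ["expert search strategies", "search interface design", "web novices"]
query = ["search", "strategies"]
print(and_search(docs, query))       # ['expert search strategies']
print(p7_model_search(docs, query))  # ['expert search strategies', 'search interface design']
```

Under the AND model, adding a term can only shrink the result set, whereas under P7's model it would mostly reorder it; a user holding the second model may not realize why a long query returns few results.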
4. IMPLICATIONS

To support users with different levels of information search skills in all phases of the search process, we suggest four guidelines for search user interfaces:

1. Suggest search terms. Our observations indicate that experienced searchers use more search terms than less experienced searchers. Usually, the terms are added with the Copy and Paste functions. The search interface should facilitate formulating more accurate queries by suggesting terms to the users.

2. Explain operators in natural language. Boolean operators are powerful search tools, but even expert users have misconceptions about the default operators used by search engines. If the operators were explained in natural language, using them would presumably become easier.

3. Provide a search history. The experts left “footsteps” by opening the results and formulating new queries in new browser windows. This was done to make it easy to go back to leads found earlier. The search interface should facilitate the search process by maintaining the footsteps for the user.

4. Facilitate the evaluation of the results. Participants used ‘Find’ to quickly see the context in which a search term occurred in the documents. If the interface highlighted the search terms, users could find the context in which the terms appear effortlessly.

We have implemented a search interface demonstrating these guidelines (Figure 1). On the left, there is a list of clusters automatically generated from Google results. The clusters group results that share the same words or phrases, and they are used for filtering the result listing. The clusters help users in evaluating the accuracy of large result sets (Guideline 4) and provide a fast way to access results, even the ones far down the result list: by selecting a cluster, the user sees only the results that contain the selected keyword(s). Systematic testing of this functionality is currently ongoing, and the first tests have shown it to be an effective method for accessing relevant documents in long result listings efficiently and accurately.
Figure 1: Enhanced search interface with keyword list and a pop-up menu.
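The idea of grouping results by shared keywords and filtering the listing by a selected cluster can be sketched as follows. This is a minimal illustration only: the prototype's actual clustering method is not described in this paper, and the stop-word list, the whitespace tokenization, and the names build_clusters and filter_by_cluster are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stop-word list used only for this example.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def build_clusters(results, min_size=2):
    """Map each keyword to the sorted indices of the results that contain it.
    Keywords shared by fewer than min_size results are dropped."""
    index = defaultdict(set)
    for i, text in enumerate(results):
        for word in set(text.lower().split()):
            if word not in STOP_WORDS:
                index[word].add(i)
    return {word: sorted(ids) for word, ids in index.items() if len(ids) >= min_size}


def filter_by_cluster(results, clusters, keyword):
    """Return only the results that belong to the selected cluster."""
    return [results[i] for i in clusters.get(keyword, [])]


results = [
    "Information visualization toolkit",
    "Scientific visualization of maps",
    "Maps of Tampere",
]
clusters = build_clusters(results)
print(clusters)  # keywords 'visualization' and 'maps', each shared by two results
print(filter_by_cluster(results, clusters, "maps"))
```

Selecting the "maps" cluster narrows the listing to the second and third results regardless of how far down the original list they appeared. A real implementation would also need to handle punctuation and multi-word phrases, which this sketch ignores.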
In addition to giving easy access to relevant results, the clusters serve as possible additional search terms (Guideline 1). The terms from the clusters can be added to the query through a pop-up menu as required or forbidden terms (Guideline 2).

We will also implement a method for rewriting the query in natural language so that the user can check whether the query was interpreted correctly (Guideline 2). For example, the user could formulate the query: “information visualization” OR “scientific visualization” –maps. The search user interface translates the query into natural language and shows the user its interpretation: You are now searching for documents that either have the phrase “information visualization” or “scientific visualization” in them (or both). The word “maps” will not occur anywhere in the documents.

Currently, our interface has a history function that stores the queries made during one search session in a combo box (Guideline 3). In the future, we plan to provide users with an enhanced history function that stores all the queries and important web pages and also makes it easy for users to organize their search history.
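A query-to-natural-language translation of the kind described above might be sketched as follows. This is not the planned implementation, only an illustration under stated assumptions: the function name describe_query is hypothetical, phrases use straight double quotes, OR joins the terms immediately before and after it, a leading ‘-’ or ‘–’ marks a forbidden term, and all other terms are combined with an implicit AND.

```python
import re


def label(token):
    """Describe a single term as a quoted phrase or a single word."""
    if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
        return 'the phrase "' + token[1:-1] + '"'
    return 'the word "' + token + '"'


def describe_query(query):
    """Translate a simple query into an English sentence."""
    # Tokens are optionally negated quoted phrases or runs of non-space characters.
    tokens = re.findall(r'[-–]?"[^"]*"|\S+', query)
    groups = []        # each group holds alternatives joined by OR
    excluded = []      # terms that must not occur
    join_next = False  # True right after an OR token
    for token in tokens:
        if token == "OR":
            join_next = True
            continue
        if token[0] in "-–" and len(token) > 1:
            excluded.append(token[1:])
        elif join_next and groups:
            groups[-1].append(token)
        else:
            groups.append([token])
        join_next = False
    clauses = ["contain " + " or ".join(label(t) for t in g) for g in groups]
    clauses += ["do not contain " + label(t) for t in excluded]
    if not clauses:
        return "The query is empty."
    return "You are searching for documents that " + " and ".join(clauses) + "."


print(describe_query('"information visualization" OR "scientific visualization" -maps'))
print(describe_query("web search"))
```

For the second query the output spells out the implicit AND ("contain the word "web" and contain the word "search""), which is exactly the default behaviour the participants were unaware of.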
5. DISCUSSION AND CONCLUSIONS

With seven participants, we do not claim to have revealed all, or even the most efficient, expert strategies that exist. Nevertheless, we showed that experts have strategies that make information searching efficient and fluent. We claim that novices can also benefit from these strategies, given an appropriate user interface to support them.

In the design suggestions, we proposed ways of serving the purposes of the advanced strategies the experts used. The suggestions include interface features that support searching as a process (search history), easier evaluation of the search results (result clustering and highlighting the search terms), and features that make it easier to formulate queries, including ones with Boolean operators (translating the operators and queries into natural language). The success of the proposed features is currently being evaluated in empirical studies.
ACKNOWLEDGMENTS

This work was supported by the Graduate School in User-Centered Information Technology and the Academy of Finland (project 178099). We thank Kari-Jouko Räihä for valuable comments.
REFERENCES

Goodrum, A.A. et al., 2001. Computer Science Literature and the World Wide Web. Available online at http://www.neci.nec.com/~lawrence/papers/cs-web01/cs-web01.pdf.
Jansen, B.J. and Pooch, U., 2001. Web user studies: A review and framework for future work. Journal of the American Society for Information Science and Technology, Vol. 52, No. 3, pp. 235-246.
Jansen, B.J., Spink, A., and Saracevic, T., 2000. Real life, real users, and real needs: A study and analysis of user queries on the web. Information Processing and Management, Vol. 36, No. 2, pp. 207-227.
Jones, W., Bruce, H., and Dumais, S., 2001. Keeping found things found on the Web. In Proceedings of the Tenth International Conference on Information and Knowledge Management, pp. 119-126.
Kobayashi, M. and Takeda, K., 2000. Information retrieval on the Web. ACM Computing Surveys, Vol. 32, No. 2, pp. 144-173.
Pollock, A. and Hockley, A., 1997. What’s wrong with Internet searching. D-Lib Magazine. Available online at www.dlib.org/dlib/march97/bt/03pollock.html.
Sellen, A.J., Murphy, R., and Shaw, K.L., 2002. How knowledge workers use the Web. In Proceedings of CHI’02, ACM Press, pp. 227-234.
Spink, A. et al., 2000. Searching the Web: The Public and Their Queries. Journal of the American Society for Information Science and Technology, Vol. 52, No. 3, pp. 226-234.