Searching on the Web: Two Types of Expertise
(updated handout version Sept. 8th 1999)

Gerhard Strube, Christoph Hoelscher
Center for Cognitive Science, Institute of Computer Science and Social Research, University of Freiburg, 79085 Freiburg, Germany

ABSTRACT
Efforts to improve Web search facilities call for improved understanding of user characteristics. We investigated the types of knowledge that are relevant for web-based information seeking, along with the knowledge structures and related strategies. In an exploratory field experiment, 12 established Internet experts were first interviewed about search strategies and then performed a series of realistic search tasks on the WWW. Based on this preliminary study a model of information searching on the WWW was derived and tested in a second study. In the second experiment two classes of potentially relevant types of knowledge were directly compared. Using a series of search tasks in an economics-related domain (introduction of the EURO currency) we investigated the effects of Web experience and domain-specific background knowledge on search strategies. We found independent and combined effects of both Web experience and domain knowledge, hinting at the importance of considering both types of expertise as cognitive factors in web-based searches.

1. INTRODUCTION
Internet search engines such as AltaVista or Excite are a central part of information seeking on the World Wide Web. While the skills necessary for navigating along hyperlinks within individual web sites (browsing) seem to be available to users after only minimal training, considerably more experience is required for query-based searching and inter-site navigation [4]. Because experienced users turn to search engines regularly for diverse information needs, it is reasonable to assume that they develop particular expert knowledge in mastering these more complex services. The research presented here focused on interactions with search engines and related services. In addition, query-based searching allows for comparisons with research on the search behavior of end users in traditional IR systems.

2. PILOT STUDY WITH INTERNET PROFESSIONALS
The behavior of experienced Internet users and their specific knowledge have not been systematically investigated. The first study was therefore exploratory: it aimed at a detailed description of Web expertise, describing typical search behavior of Web experts and constructing a descriptive model of information seeking with search engines. Comparable models for searching in electronic information systems were proposed by [3] and [5], but these did not consider, for example, the specific differences between the World Wide Web and bibliographic database systems.

We define Web expertise as a type of media competence: the knowledge and skills necessary to utilize the WWW and other Internet resources successfully in solving information problems. It has to be clearly distinguished from background knowledge related to the topic area of a specific Web search (see section 3).

Well-established Internet professionals were recruited for this study. Each had at least 3 years of intensive experience with the medium and used the Internet daily as a source of information at their workplace. The 12 participants, each of whom took part in both parts of the expert study, included information brokers, Web masters, Internet consultants, Web content designers, librarians and authors of books about online searching.

2.1 Procedure

2.1.1 Phase I: Interviews
First the participants were asked to describe their experience with the available search services, their search behavior and their intentions and rationales for using certain sources and strategies. With the help of mental walk-throughs, the process of searching for online information was then discussed step by step. Then a specialized card-sorting task was employed to scrutinize the experts' conceptual structures [6]: during the interview, relevant terminology and actions were recorded on color-coded cards. Afterwards, the experts were asked to build a graph structure with these cards. To support the participants in this task, some appropriate concept categories and relations were predefined and presented to the experts. The resulting structure was intended to represent an expert's personal conceptualization of the search process.

2.1.2 Phase II: Web-based information-seeking tasks
In the second phase of this expert study, the experts had to perform a number of real-life information-seeking tasks on the Internet (example: 'Which finger is unaffected in RSI?'). All inputs to the computer were carried out by an experimenter's assistant who received the oral instructions for each action from the participants, thus forcing the expert to make every step of the interaction process verbally explicit. Furthermore, the experts were asked to think out loud during their search activities.

2.2 Results of the expert study

2.2.1 Interviews
The experts reported a wealth of Internet-related knowledge, much of it highly idiosyncratic. Therefore, expert statements relevant to the search process were collected from the transcripts and entered into a matrix to determine which concepts, heuristics and strategies were common to the majority of the experts. Likewise, the concept-card models were inspected for knowledge structures shared across individuals. The statement matrix and the card models were aggregated into an initial process model of information seeking with search engines, describing the search process from the experts' shared perspective.

2.2.2 Web-based information-seeking tasks
We distinguished two levels of data analysis: the level of information-seeking steps and the level of individual search queries. For the analysis of information-seeking steps, a set of rules was derived from the experts' process model for segmenting and categorizing the protocols into action units. A total of 56 information problems were tackled by the participants, two thirds of them successfully. A total of 1956 action units were identified, each corresponding to a step in the process model. The matrix of transition probabilities between all steps of the model was computed, allowing for an analysis of interaction sequences. Only the main results are presented here.

On a global level of querying and browsing we found that for two thirds of the search tasks, the experts initially chose to use a search engine. Only in one third of the cases did they opt for accessing a known web site as the initial strategy. Finding potentially relevant documents with a search engine led to browsing episodes in about 47 percent of the cases. Once the searchers were in “browsing mode” they continued browsing for several hops, hence the .73 probability of one browsing move leading to the next. Browsing episodes were equally likely to terminate the search or to lead back to the search engine. This indicates that browsing and keyword-based searching were equally important in the search process and that the experts in our study switched back and forth between these modes quite frequently, as necessary.

[Figure (caption not recoverable; all participants are Expert Web users). Node labels: Information need; Search Engine interaction; Access known Web Site directly; Access document; Examine; Browse in Web site; Failure; Success]

A close-up on direct interaction with a search engine reveals a pattern that corresponds to the default handling of the search engine, with high transition probabilities (from top to bottom of figure 2). Additionally, the experts showed more complex behavior if no relevant documents were found, including reformulating or re-formatting existing queries, changing search engines, requesting additional result pages, and backtracking to earlier result pages or queries. Again we observe opportunistic behavior, which makes use of all the options a search engine provides. The interactive and iterative nature of the search process was also quite salient in the experts' behavior.

[Fig. 2: Pilot study: Search Engine Close-up (all participants are Expert Web users). Recovered step labels: Select / Launch Search Engine; [Advanced Search]; Generate / Select search terms]

[Figure: Query Formatting in Pilot Study (Experts in Pilot study). Axis label begins "No. of"; the chart itself is not recoverable]

3.1.1 Simulated Search tasks
Based on the process model developed in section 2, complex search tasks were broken down into sub-tasks corresponding to individual steps of the process, such as search term selection or query revision. The resulting sub-tasks were collected in a questionnaire. This approach ensured that each participant worked on the same stimuli (words, queries, result pages), allowing for comparisons that are not readily available from observing unrestricted task performance on the Web. In "real" searches on the Web, participants will follow different paths in trying to solve given tasks and will hardly ever face exactly the same pages of results or have to reformulate exactly the same search queries as another participant.

[Figure (caption not recoverable). Node labels: Access familiar Web site directly; Search Engine interaction; Access document; Examine; Browse in Web site; Failure; Success]
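The transition probabilities between process steps reported for the pilot study (for example the .73 probability of one browsing move leading to the next) can be estimated from coded action sequences. The following is a minimal sketch of such a first-order estimate, not the authors' actual procedure. The function name, the step labels and the three example protocols are hypothetical and serve only as an illustration; they are not data from the study.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order transition probabilities between process steps.

    `sequences` is a list of action-unit sequences, one per search task;
    each action unit is the label of a step in the process model.
    Returns a nested dict: probs[src][dst] = P(next is dst | current is src).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent pair (current step, next step) within a task.
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1

    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        # Normalize each row so that outgoing probabilities sum to 1.
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


# Hypothetical coded protocols (illustrative only, not data from the study).
protocols = [
    ["search_engine", "examine_document", "browse", "browse", "success"],
    ["search_engine", "examine_document", "search_engine",
     "examine_document", "browse", "failure"],
    ["known_site", "browse", "browse", "browse", "success"],
]

matrix = transition_probabilities(protocols)
for src in sorted(matrix):
    for dst in sorted(matrix[src]):
        print(f"{src} -> {dst}: {matrix[src][dst]:.2f}")
```

Counting only adjacent pairs within one task keeps transitions from leaking across task boundaries. Terminal steps such as success or failure have no outgoing row in this sketch.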

Across all experimental groups the pattern of action sequences is comparable to the data from the Pilot study. One important difference is that participants now found fewer useful pages and had to reiterate their searches more frequently to find relevant information (see Figs. 4 and 5 for details). This increased difficulty most likely reflects differences in both the tasks (harder) and the participants (overall lower levels of

[Bar chart, apparently Fig. 6 (caption not recoverable). Vertical axis: 0 to 90. Groups: Web-/Econo-, Web-/Econo+, Web+/Econo-, Web+/Econo+. Categories: Access topic-related Web site; Browse from university home; Access Search engine directly; Netscape Search button]

(Fig. 6) we find several important differences between groups. Only “double experts” initially tried to directly access web sites related to economics, while all others immediately accessed a search engine in one way or another. Web experts would type in the URL of their favorite search engine, while the “double novices” were highly inclined to simply click on the Netscape Search button (these effects proved to be significant interactions in a HILOGLINEAR analysis).

Once a Web search had led to a page of results (Fig. 7), Web experts were significantly more likely than Web novices to choose a target document for closer inspection (35% vs. 25%), while Web novices more often reiterated their search queries. We also found significant interactions of domain knowledge and Web expertise: Web experts with little domain knowledge were the most likely to pick a target document (possibly for lack of clear selection criteria). Double novices showed the highest proportion of query re-formulations while choosing the smallest number of target documents for closer examination, and of these documents the highest proportion turned out to be irrelevant (footnote 1). A qualitative inspection of the query re-formulations issued by the double novices indicated that they often made only small and ineffective changes to their queries, forcing them to reiterate repeatedly.

Footnote 1: In most cases, and more often than Web and/or domain experts, the "double novices" did not find the required information on the chosen page and returned to the results page without further browsing.

[Fig. 7: What to do with a page of results. Bar chart, vertical axis: Percent (0 to 60). Groups: Web-/Econo-, Web-/Econo+, Web+/Econo-, Web+/Econo+. Categories: Change of Search Engine; Reformulate / New Query; Next / prev Results; Choose a document]

3.2.2 Query properties
We found the same general pattern of query formulations for both the web-based search and the simulated search tasks, with the data from the search simulations being somewhat more clear-cut (Figure 8).

[Fig. 8: Query Formatting in Simulated search tasks. Vertical axis: percent of all queries. Panels: Domain knowledge LOW and Domain knowledge HIGH, each split into Web novice and Web expert. Recovered category labels: plus; AND; any kind of formatting; Errors]

Web experts relied significantly more on query formatting tools than Web novices (87% vs. 47%), while higher domain knowledge corresponded to a lower number of Boolean operators and modifiers being used. A very clear effect of Web expertise could be found for the number of queries with formatting errors (19.6% vs. 1.9%). Effects of the experimental conditions could also be established for the number of search terms per query and for the sources of search terms, but only in the searches actually performed on the Web, not in the Simulated Search.

From the Pilot study one would have expected Web experts to use longer queries. This hypothesis was not confirmed: the queries issued by Web experts were only marginally longer than those of Web novices (2.61 vs. 2.32 words/query). Instead we found a significant effect of domain knowledge: participants with little domain knowledge made significantly longer queries (average query length: 2.96 vs. 1.97 words). Domain experts may know more appropriate terms and hence need fewer of them.

The analysis of query formatting (Fig. 8) revealed that participants who know a lot about the subject domain but lack Web expertise are quite reluctant to use query formatting (see above). But they seem to compensate for this by showing more verbal creativity and flexibility than the other groups: they were the most likely to use their own terminology rather than rely only on the words already present in the original task statement. They also used completely different terminology from one query to the next more often than the other groups.
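The query properties reported in section 3.2.2 (words per query, share of queries using formatting) can be derived from a query log with simple string processing. The sketch below shows one way to do this under stated assumptions: the set of Boolean operators, the treatment of a leading + or - as a modifier and of double-quoted phrases as formatting, and the example queries are all assumptions made for illustration. The coding scheme actually used in the study is not specified here.

```python
import re

# Assumed operator set; the study's actual coding scheme is not given.
BOOLEAN_OPERATORS = {"AND", "OR", "NOT", "NEAR"}


def describe_query(query):
    """Return the number of search terms and whether formatting is used."""
    tokens = query.split()
    # A phrase enclosed in double quotes counts as formatting.
    has_phrase = bool(re.search(r'"[^"]+"', query))
    # A leading + or - on a term counts as a modifier.
    has_modifier = any(len(t) > 1 and t[0] in "+-" for t in tokens)
    # An upper-case Boolean keyword counts as an operator.
    has_boolean = any(t in BOOLEAN_OPERATORS for t in tokens)
    # Search terms are the remaining tokens, stripped of formatting marks.
    terms = [t.strip('+-"') for t in tokens if t not in BOOLEAN_OPERATORS]
    terms = [t for t in terms if t]
    return {
        "n_terms": len(terms),
        "formatted": has_phrase or has_modifier or has_boolean,
    }


def summarize(queries):
    """Average number of terms and share of formatted queries."""
    rows = [describe_query(q) for q in queries]
    n = len(rows)
    return {
        "mean_terms": sum(r["n_terms"] for r in rows) / n,
        "share_formatted": sum(r["formatted"] for r in rows) / n,
    }


# Hypothetical queries (illustrative only, not taken from the study).
example_queries = [
    'euro currency introduction',
    '+euro +"exchange rate"',
    'euro AND inflation',
    'EMU',
]

summary = summarize(example_queries)
print(summary)
```

Running `summarize` separately on the queries of each experimental group would give per-group figures comparable in kind to the words/query and formatting percentages above. Detecting formatting errors would need knowledge of each search engine's syntax and is left out of this sketch.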

4. DISCUSSION
In the Pilot study we investigated how Internet professionals conceptualize the search process and derived a process model of search engine interaction. This model was first applied to the search behavior of the same Internet professionals, and we believe that it has shown its value as a tool for capturing expert searching behavior. In the second study, the EURO study, we focused on a direct comparison of expert and novice web searchers. It turns out that the process model can be applied to the behavior of both expert and novice searchers and that it also captures differences between these groups.

Expertise was further differentiated into technical Web expertise and domain-specific background knowledge, in this case in the field of economics. The two types of expertise showed independent and combined effects. Participants who could rely on both types of expertise were overall most successful in their search behavior. Lacking one or the other type of expertise led to compensatory behavior, with, for example, domain-expert/web-novices relying heavily on terminology and avoiding query formatting. The severe troubles that the "double novices" faced when dealing with the tasks in the EURO study again point to the joint contribution that both domain knowledge and Internet expertise make to the search process.

Further analyses of the rich data set from the EURO study are currently underway. These will look, e.g., at how much time the experimental groups spend in individual search stages and overall, and will try to further differentiate between search paths that lead to success or failure.

5. REFERENCES
[1] Hsieh-Yee, I. (1993). Effects of Search Experience and Subject Knowledge on the Search Tactics of Novice and Experienced Searchers. Journal of the American Society for Information Science, 44(3), 161-174.
[2] Jansen, B. J., Spink, A., Bateman, J., & Saracevic, T. (1998). Real life information retrieval: A study of user queries on the Web. SIGIR Forum, 32(1), 5-17.
[3] Marchionini, G., Dwiggins, S., Katz, A., & Lin, X. (1993). Information Seeking in Full-Text End-User-Oriented Search Systems: The Roles of Domain and Search Expertise. Library and Information Science Research, 15, 35-69.
[4] Pollock, A., & Hockley, A. (1997). What's wrong with Internet Searching. D-Lib Magazine. [Online] Available: http://www.dlib.org/dlib/march97/bt/03pollock.html [1998, March 4]
[5] Shneiderman, B., Byrd, D., & Croft, W. B. (1997). Clarifying Search: A User-Interface Framework for Text Searches. D-Lib Magazine. [Online] Available: http://www.dlib.org/dlib/january97/retrieval/01shneiderman.html [1998, March 4]
[6] Strube, G., Janetzko, D., & Knauff, M. (1996). Cooperative construction of expert knowledge: the case of knowledge engineering. In P. B. Baltes & U. M. Staudinger (Eds.), Interactive minds (pp. 366-393). Cambridge: Cambridge University Press.