TASK-BASED INFORMATION INTERACTION EVALUATION: THE VIEWPOINT OF PROGRAM THEORY

KALERVO JÄRVELIN, PERTTI VAKKARI, PAAVO ARVOLA, FEZA BASKAYA, ANNI JÄRVELIN, JAANA KEKÄLÄINEN, HEIKKI KESKUSTALO, SANNA KUMPULAINEN, MIAMARIA SAASTAMOINEN, REIJO SAVOLAINEN, and EERO SORMUNEN, University of Tampere

Evaluation is central in research and development of information retrieval (IR). In addition to designing and implementing new retrieval mechanisms, one must also show through rigorous evaluation that they are effective. A major focus in IR is IR mechanisms' capability of ranking relevant documents optimally for the users, given a query. In practice, however, searching for information involves searchers and is highly interactive. When human searchers have been incorporated in evaluation studies, the results have often suggested that better ranking does not necessarily lead to better search task, or work task, performance. It is therefore not clear which system or interface features should be developed to improve the effectiveness of human task performance. In the present paper we focus on the evaluation of task-based information interaction (TBII). We give special emphasis to learning tasks in order to discuss TBII in more concrete terms. Information interaction is here understood as behavioral and cognitive activities related to task planning, searching information items, selecting between them, working with them, and synthesizing and reporting. These five generic activities contribute to task performance and outcome, and can be supported by information systems. In an attempt toward task-based evaluation, we introduce program theory as the evaluation framework. Such evaluation can investigate whether a program consisting of TBII activities and tools works, how it works, and further, provide a causal description of program (in)effectiveness. Our goal in the present paper is to structure TBII on the basis of the five generic activities and to consider the evaluation of each activity using the program theory framework. Finally, we combine these activity-based program theories into an overall evaluation framework for TBII. Such evaluation is complex due to the large number of factors affecting information interaction. Instead of presenting tested program theories, we illustrate how the evaluation of TBII should be accomplished using the program theory framework in the evaluation of systems and behaviors, and their interactions, comprehensively in context.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval---Search process, Selection process

General Terms: Experimentation, Human factors, Theory

Additional Key Words and Phrases: Task-based information interaction, evaluation

ACM Reference Format:
Kalervo Järvelin, Pertti Vakkari, Paavo Arvola, Feza Baskaya, Anni Järvelin, Jaana Kekäläinen, Heikki Keskustalo, Sanna Kumpulainen, Miamaria Saastamoinen, Reijo Savolainen, and Eero Sormunen. 2014. Task-based information interaction evaluation: the viewpoint of program theory. ACM TOIS Vol. 33, No. 1, Article 3, 30 pages. DOI: http://dx.doi.org/10.1145/2699660. Publication date: March 2015.
Authors' address: School of Information Sciences, University of Tampere, Finland.
1. INTRODUCTION
Evaluation is central in research and development of information retrieval (IR). It is not sufficient merely to design and implement new retrieval mechanisms; one must also show through rigorous evaluation that they are effective. In the literature of IR evaluation, a major focus is IR systems and their components. They are often evaluated by their capability of ranking relevant documents optimally for the users. This is often done through test-collection based experiments not involving users directly [Sanderson 2010]. In practice, however, searching for information involves searchers and is highly interactive. When human searchers have been incorporated in evaluation studies with simple search tasks, the results have often been mixed: better ranking does not necessarily lead to better search task performance [Turpin and Scholer 2006; Smith and Kantor 2008]. It is therefore not clear which system or interface features should be developed to improve the effectiveness of human task performance. This obvious limitation has led to task-based and contextual IR research [see Belkin 2010; Ingwersen and Järvelin 2005; Vakkari 2003]. We believe it plausible that information retrieval, as a practical activity, should contribute to task performance. If this is not observed, our expectation is wrong or we do not understand which factors contribute, and how, to task performance. We believe that rational development of information systems, including IR systems, requires an understanding of the task process where IR systems are used as tools. In this paper we focus on the evaluation of task-based information interaction (TBII). Tasks are understood as the larger tasks motivating information interaction. Sometimes such tasks are called work tasks [Ingwersen and Järvelin 2005] but we prefer the term "task" since all tasks of interest need not be work tasks. We focus in particular on learning tasks [cf. Marchionini 2006] to reduce the excessive complexity of embracing all possible tasks and to be able to discuss TBII in more concrete terms. Information interaction is here understood as behavioral and cognitive activities related to task planning, searching and selecting information items, working with information items, and synthesizing and reporting [cf. Blandford and Attfield 2010; Cool and Belkin 2002]. More specifically, the main activities constitutive of information interaction are characterized through the following examples:
— task planning: understanding goals, sketching the process, and mapping available and needed information
— searching information items: specifying, exploring, finding, or stumbling on items
— selecting information items: choosing, bookmarking, or storing items
— working with information items: reading, organizing, analyzing, or converting
— synthesizing and reporting: drafting, constructing, reporting, or disseminating.
Therefore, information interaction is broader than searching and subsumes information access, task-based searching, and task-based information retrieval, terms often used in the literature. Information items are any retrievable or accessible units of information that may come to the attention of the task performer. These may be traditional documents, their identifiable components (e.g., captions, paragraphs), or web pages containing various media types. We divide the TBII process into the five generic activities listed above. They all contribute to task performance and outcome, and can be supported by information systems.
They are essential activities in learning tasks and relevant in many other information intensive tasks. In practice, these activity types are intertwined and iterate but can be discerned at least for analytical purposes. We introduce program theory [Rossi et al. 2004] as the framework for TBII evaluation. As an evaluation framework, program theory is suitable for comprehensive evaluation of complex programs, which have several interacting factors affecting their performance. This framework originates from the evaluation of programs focusing on social problems. Program evaluation investigates whether the program works, how it
works, and further, explicates the theory which provides a causal description of program (in)effectiveness. Our goal in the present paper is to structure TBII on the basis of the generic activities, to consider the evaluation of each activity using the program theory framework, and finally to combine these activity-based program theories into an overall evaluation framework for TBII. Such evaluation is complex due to the large number of factors affecting information interaction [e.g., Ingwersen and Järvelin 2005]. We cannot yet provide fully developed and verified program theories for the activities or the entire task process. Rather, we illustrate how the evaluation of TBII could be accomplished using the program theory framework when the aim is to evaluate systems and behaviors, and their interactions, comprehensively in context. By "sources" we refer generally to web sites, databases, search engines, and (digital) libraries that provide information items. Information retrieval or searching as a practical activity is written in lower case, and the scholarly domain as "Information Retrieval". Both have the acronym "IR" and we hope that the context makes the meaning clear. We use "tools" to refer to information (retrieval) systems and their components supporting the five activities and their subactivities. We use "system" to refer to objects of evaluation, which may consist of tools, their users and the task process. In Section 2 we review earlier literature on TBII. In Section 3 we discuss evaluation and program theory in more detail. In Section 4 we discuss tasks and their constituent activities as a context of evaluation; here we focus on learning tasks as a specific subdomain, which will be followed throughout the rest of the paper. We also present an example of task-based evaluation of information interaction. In Sections 5 through 9 we discuss the applicability of the program theory framework for the evaluation of the generic activities in the context of learning tasks in particular. In Section 10 we collect the emerging overall evaluation framework based on the program theory approach. Section 11 concludes the paper.
2. EARLIER STUDIES ON TBII EVALUATION
There is a limited number of studies which either analyze information searching as part of a larger task or evaluate information retrieval systems from a task perspective, i.e., how the use of a search system or tool supports the accomplishment of a task. In the following we first review relevant evaluation frameworks and evaluation studies, and then major empirical studies on the relations between task performance and information searching. Early advocates of task-based IR system design and evaluation were Belkin, Seeger and Wersig [1983]. They suggested that information systems should support the problem treatment process and be evaluated accordingly. Hersh and his colleagues [1996] were among the first to empirically evaluate search systems from the viewpoint of task performance. They assessed to what extent students were able to solve clinical problems by using two information retrieval systems. There are some attempts to sketch frameworks for evaluating IR systems from a task perspective. Both Belkin [2010] and Vakkari [2010] argue that IR systems should be evaluated by their ability to advance the underlying task leading the actor to engage with the system. Because this happens through a sequence of sub-tasks, i.e., information interactions, IR systems should also be assessed by their ability to support reaching these subgoals.
2.1 Field studies
Perhaps the first attempt to explore how task performance is associated with information searching was Kuhlthau's [1993] Information Search Process Model (ISP), which was based on a series of field studies. She analyzed how learning tasks generated information searching and acquisition. The ISP model consists of six stages in the
learning task: initiation, selection, exploration, formulation, collection and presentation. Each of them produces differing information needs and searching. In the initial stages of the ISP the actors' understanding of the task is vague, and consequently, their information needs are unclear, and information searches are exploratory. When the actors find a focus, their notion of the task becomes clearer, information needs more articulated and information searching more directed. Extending this model, Vakkari [2001a; Vakkari et al. 2003] showed how the stages of the learning task process were associated with information searching. The selection of search terms, search tactics, relevance criteria and the types of information needed varied with the progress of task performance. There are some field studies on how professionals' work tasks are associated with information searching and information item use. Byström and Järvelin [1995] studied how task complexity was related to the types of information needed and the variety of information items used by municipal administrators. Kumpulainen and Järvelin [2010; Kumpulainen 2013] and Saastamoinen and colleagues [2012] examined how task complexity was associated with query types and the use of systems and resources in molecular medicine and public administration. The results show that tasks, sessions and information interaction differ considerably at different levels of task complexity. Attfield and Dowell [2003], and Markkula and Sormunen [2006] analyzed journalists' work processes. Both studies differentiated various stages in the production process and showed how these stages shaped information access and interaction. Freund and colleagues [2005] modeled how software engineers' work context was associated with their information behavior. Huuskonen and Vakkari [2010] studied how social workers record and use information in a client information system during various tasks. They showed how variation in social workers' tasks and information needs was associated with their way of exploring and using information in the records. The above studies concentrate on the participants' real tasks. Another option is to use simulated work tasks [Borlund 2000]. Qu and Furnas [2008] analyzed how people comprehend and structure topics not known to them, and how information searching is associated with this sense-making process. Li and Belkin [2010] studied the associations between simulated work task types and interactive search behavior. There were significant differences in the number of IR systems consulted, result pages viewed, and items viewed and selected across the six work task scenarios. There is a paucity of studies examining how task outcome is associated with the search process and search results, in particular. Vakkari and Huuskonen [2012] analyzed the extent to which the characteristics of the search process and search output (precision and recall) were associated with task outcome. They found that effort in the search process degraded precision, but improved task outcome. These results suggest that traditional effectiveness measures should be complemented with measures for task outcome. In sum, the field studies reviewed above have identified a number of relevant factors affecting information interaction, such as the stage of the task process, the selection of search terms, and search output. We suggest that these factors can be evaluated more comprehensively and systematically by building on the ideas of program theory.
2.2 Experimental studies
There are a few studies that have applied an experimental design to analyze how task performance is associated with information searching. Based on Vakkari's [2001a; Vakkari et al. 2003] studies, Wu and colleagues [Wu et al. 2008] developed a technique for identifying actors' work task stages (pre-focus, focus formulation or post-focus). The technique was used for modifying actors' task profiles and filtering information accordingly. They found in a comparative study that search effectiveness was higher when
applying their task stage identification method compared to traditional relevance feedback. Butcher and colleagues [2011] explored the impact of graphical and keyword search interfaces on actors' cognitive and behavioral processes during online information search for learning tasks. They showed that an interface representing the conceptual structure of a field supports deeper cognitive processing and evaluation of results during information search compared to an established keyword search system. The interface led actors to spend more time exploring domain ideas and relationships and less time exploring individual resources. Liu and Belkin [2012] investigated whether task type in writing a feature article has an effect on search behavior and task outcome over sessions. The results indicate that task type did not affect the outcome, but the actor's topic familiarity and task experience did have a positive effect. In addition, the actor's search and writing behavior affected the outcome: the greater the proportion of time used for writing instead of searching, the more sentences and facts were included in the report. Bron and colleagues [2012] studied the extent to which an exploratory search interface supports media researchers in an early stage of their research process in formulating a research problem. The number of query formulations was higher and the diversity of information items found greater in the exploratory interface compared to the baseline. However, this did not lead the actors using the exploratory interface to formulate better research questions on the five criteria assessed. Wildemuth [2004] studied how students' domain knowledge was associated with the search tactics used when searching a factual database on microbiology to answer clinical problems. She found that search tactics changed over time as the students' domain knowledge changed. Wildemuth and colleagues [1995] studied longitudinally the relationship between an actor's knowledge in a domain and searching proficiency in that domain, and the relationship between searching proficiency and database-assisted problem-solving performance. The authors found little evidence of any relationship between personal domain knowledge and searching proficiency (e.g., search results, and improvement in selection of search terms over time). However, searching proficiency was positively associated with problem solving. The existing models of TBII suggest the need for developing evaluation indicators that reflect both the goals and sub-goals of the tasks that trigger searching. They also suggest that task performance and the associated information interaction are processes that should be studied accordingly, i.e., longitudinally over several task and search sessions. Empirical results on the relations between task characteristics and information interaction are relatively scarce and in part contradictory. There is a need to explore extensively how tasks are associated with information interaction. We suggest that program theory provides a relevant framework to enhance evaluation studies such as these because it enables the systematic analysis of the factors triggering information searching, as well as of the outputs and outcomes of this activity.
3. EVALUATION AND PROGRAM THEORIES
Evaluation, in general, is the systematic determination of merit and significance of something using criteria against a set of standards [Scriven 1991]. Evaluation requires some object that is evaluated and some goal that should be achieved. Evaluation aims at analyzing to what extent the object of evaluation attains the goals. Goals are typically defined in terms of what the object (e.g. system, service) aims at achieving. To perform evaluation one needs to define (a) the object of evaluation, (b) the goal(s) of the object of evaluation, (c) indicators of goal achievement, and (d) criteria for assessing goal attainment. When defining the goals, it is important to understand the difference between outputs and outcomes of the evaluation objects [cf. Rossi et al. 2004]. Outputs are products delivered by a system, e.g., information items in the case of IR systems.
Outcomes are the benefits a system produces for the users, e.g., user’s improved attainment of search goals or ability to advance in a task [cf. Hersh 1994; Vakkari 2010]. In addition to the conceptualization of system objectives, it is essential to explicate how they are supposed to be achieved. In particular, it is necessary to model the mechanisms connecting the system features and system use to the explicated goals. A useful way of doing this is evaluation based on the program theory framework [Rossi et al. 2004]. The framework was originally developed for evaluating programs that address social problems. Due to its generic nature, the ideas of program theory can be applied to the evaluation of diverse kinds of social phenomena, including TBII. In TBII evaluation, the object of evaluation is task performance as a whole. Different tools and methods that are applied in information interaction represent types of interventions affecting the whole. The program theory framework is particularly suitable for the evaluation of TBII because this framework enables the review of cyclic and iterative features of activities constitutive of information interaction. A program theory (PT) is an evaluation framework that consists of an explicit theory of how a program causes the intended or observed outputs and outcomes, and an evaluation guided by the theory [Rogers et al., 2000; Rossi et al. 2004]. The program consists of inputs, activities or actions undertaken to bring about a desired goal manifesting as output(s) and possible outcomes. Outputs are immediate results of the activities, and outcomes are changes in knowledge, skills, or other characteristics that take place directly or indirectly, immediately or later as results of inputs, activities and outputs. The theory should establish a causal relationship in the program (A causes B) and in addition explain the causal mechanism (how A causes B). For this the theory should explain how the components of the program are related to each other. The evaluation derives the evaluation questions from the theory, and the theory should guide the design and execution of the evaluation. The evaluation investigates whether the program works, how it works, and further, explicates the theory. This includes causal description and causal explanation, as well as determining program (in)effectiveness. [Coryn et al. 2011; Rossi et al. 2004].
Fig. 3.1. Structuring evaluation from the viewpoint of program theory
Figure 3.1 illustrates the main idea of program theory. When applied to TBII, the inputs are the human resources and tools, and the time and money available to the task; the activities are those of the task process, like using the tools and interacting with information. Often the outputs of a given activity are inputs to the subsequent activities and finally build up the task outcome. In PT based evaluation, there are variables representing inputs, activities, outputs and outcomes. When explaining the outputs and outcomes,
they are represented by dependent variables whereas the inputs and activities are independent variables. When evaluating TBII in practice, the ultimate object evaluated is a task process in its context, where one or more human beings are performing a task supported by one or more information systems. A task often consists of several subtasks. In the present paper, we consider only task performance by individuals, not collaboration. To make evaluation simpler, more confined contexts than the ultimate motivating tasks are often used. This is illustrated in Figure 3.2, which shows four nested evaluation contexts ranging from the IR context to the seeking, task, and socio-organizational contexts. Some indicators for output/outcome are also suggested. We note, though, that such a collection of evaluation contexts alone is not a program theory for TBII because it provides no causal explanation. For example, it does not explain how the use of an information system produces the effects on task outcomes.
Fig. 3.2. Nested evaluation contexts for (I)IR [based on Ingwersen and Järvelin 2005, p. 322; Kekäläinen and Järvelin 2002]. Copyright © 2005 Springer.
While limited contexts are easier to study, easier to standardize, and have led to progress in IR research, one must bear in mind that improving outputs, e.g., the precision of search results, does not automatically result in improved outcomes [Vakkari and Huuskonen 2012]. In particular, one cannot know this if the outcomes are not clearly defined or lack good indicators, or if there is no program theory explaining how the outputs contribute to the outcomes [Järvelin 2009; 2011]. The evaluation of TBII is inherently highly complex because there are many variables interacting in the production of outputs and final outcomes. Therefore it is impossible for the present paper to suggest a program theory for TBII. There simply is no such body of tested empirical data available. Rather, the concept of program theory provides a framework for designing evaluation studies in order to establish program theories upon which TBII evaluation can be built.
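To make the structure of Figure 3.1 more concrete, the following minimal sketch shows one way the variables of a PT-based TBII study could be recorded and related: input and activity indicators are treated as independent variables and an outcome indicator as the dependent variable. All field names, indicators and the analysis choice are hypothetical illustrations, not prescriptions from this paper.

```python
# A minimal sketch (not from the paper) of operationalizing program theory
# variables for TBII evaluation: inputs and activities as independent
# variables, an outcome indicator as the dependent variable.
from dataclasses import dataclass
import numpy as np

@dataclass
class TaskObservation:
    # Inputs (hypothetical indicators)
    prior_domain_knowledge: float   # e.g., pre-test score
    time_available_min: float
    # Activities (hypothetical indicators)
    queries_issued: int
    items_read: int
    # Output and outcome (hypothetical indicators)
    items_selected: int             # output of searching/selection
    essay_grade: float              # outcome: assessed quality of the essay

def fit_outcome_model(observations):
    """Regress the outcome indicator on input and activity variables
    (ordinary least squares via numpy). Purely illustrative."""
    X = np.array([[1.0, o.prior_domain_knowledge, o.time_available_min,
                   o.queries_issued, o.items_read] for o in observations])
    y = np.array([o.essay_grade for o in observations])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # intercept plus one coefficient per independent variable
```

Such a regression only quantifies associations between the variables; a program theory additionally requires an explicit account of the mechanism by which the inputs and activities produce the outcomes.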
4. TASKS AND ACTIVITIES IN INFORMATION INTERACTION
To improve information interaction and task performance one needs to understand the phenomenon first. There is a range of factors affecting information interaction: factors related to tasks, actors and behaviors, information items and collections, information systems, and context [e.g., Ingwersen and Järvelin 2005]. Their effects and interactions need to be mapped as a prerequisite for evaluation that builds on the ideas of program theory (Figure 3.1). In general, a task can be defined as a sequence of activities a person performs in order to accomplish a goal. There are a number of task classifications specifying the types, dimensions and attributes of (work) tasks (for an overview, see [Li and Belkin 2008]). From the perspective of TBII evaluation, tasks classified as information intensive are particularly relevant. The performance of such tasks incorporates a variety of cognitive activities like planning, decision making and learning. Depending on the nature and amount of information required in task performance, tasks differ with regard to complexity, ranging from simple automatic processing tasks to genuine decision tasks [Byström and Järvelin 1995]. From the information interaction viewpoint, complex tasks [Byström and Järvelin 1995], which cannot be completed on the basis of previous knowledge but require searching for additional information and may require performance in several cycles, are especially relevant. Typically, intentional or unintentional learning takes place in performing complex tasks since information is used to construct new knowledge in the course of task performance. Thus, we analyze information interaction in tasks which include an intentional or unintentional learning component. For the sake of simplicity, we call such tasks learning tasks in order to be more concrete in our discussion. As we approach learning tasks from a broader perspective, such tasks are not confined to activities typically taking place in formal education, for example, writing essays. Learning theories take many forms and so do the competing definitions of learning. To keep the conceptualization of learning tasks manageable we lean on Bloom's classic taxonomy of learning objectives (goals). Bloom divided learning objectives into three broad "domains": cognitive (knowing), affective (feeling), and psychomotor (doing) [Bloom et al. 1956]. Here we limit our view to the cognitive domain. Krathwohl [2002] revised the representation of Bloom's cognitive domain by separating two dimensions: knowledge and cognitive processing. Learning goals in the knowledge dimension are classified into four categories: factual, conceptual, procedural and metacognitive knowledge. Learning goals related to the cognitive processing dimension consist of six categories: remember, understand, apply, analyze, evaluate and create. The categories on both dimensions form a hierarchy of increasing complexity. We expect that information interactions in a learning task relate to various types of cognitive processing of factual, conceptual, procedural and metacognitive knowledge. The actor may also transfer parts of information items into the task output, for example by copy-pasting, and thus avoid cognitive processing of information. This extreme behavior in information interaction falls outside Bloom's taxonomy but does not make the taxonomy invalid in our context. Kuhlthau's [1993] Information Search Process (ISP) model is based on, and has been verified in, a number of empirical studies.
These include studies both on students performing learning tasks in formal education [Kuhlthau 1993], and on experts such as financial analysts [Kuhlthau 1999], lawyers [Kuhlthau and Tama 2001], and scientists [Anderson 2006] performing information intensive work tasks. Similar sequences of subtasks (cf. the stages of the ISP model) and patterns of information interactions were identified in formal learning tasks and in information intensive work tasks. At the early stages of task performance the actor explores the problem at hand and information interactions serve learning. After the actor has formulated a
focus for the task, she moves into the stage of systematic information gathering [Kuhlthau 1993]. Based on this, we may argue that the ISP model is a relevant framework for investigating task-based information interactions.
Fig. 4.1. Learning task performance and information interactions.
In the present paper, however, we do not apply Kuhlthau's ISP model directly. It specifies the (sequential) stages of a task at a high abstraction level, which are not directly connected to information interactions. We want to identify generic activity types that are performed across the task process stages, relate to information interaction, and are amenable to support by information systems. Figure 4.1 presents our model for information interactions in a learning task. The activities and their relationships are briefly characterized below. We provide detailed descriptions of the activities, their relationships, inputs, processes and outputs in Sections 5-9. The model in Figure 4.1 represents the activities of an individual at three levels: task performance, information item interactions and source interactions. The fourth level, information sources, represents the formal and informal information channels available to the individual. At the task level, the model distinguishes between cognitive and behavioral processes. The former illustrates the process of knowledge construction generating task outcomes. The latter concerns the observable physical interactions with information items to generate task output (e.g. report, presentation). Because we have, for analytical reasons, separated task performance from search and information item interactions, task planning and reflective assessment (Plan/Assess in Figure 4.1) is the only activity assigned to the cognitive level. More specifically, task planning and reflective assessment are meta-activities dealing with the setting of task goals; assessing information needs; planning activities, their order and outputs; sketching task outcomes; and evaluating outcomes. The activities of information searching (Search) and selecting information items (Select) take place at the level of information source interactions (based on snippets or other surrogates of information items). The outputs of these activities, the selected information items, are inputs for the working with information items (Read) and synthesizing and reporting (Write) activities at the level of information item interactions. Further, the outputs and outcomes of activities at the level of information item interactions are inputs for the task's physical and cognitive performance, respectively. Note that in Figure 4.1 the effects of search interactions are mediated through information item
interactions to learning task outcomes. The model also characterizes the difference between task output and task outcome. The quality of one or the other can be used as an evaluation criterion. It is obvious that the five activity types described above cover a larger share of information interaction than searching and selection of information items alone. We believe that the typology of five generic activities is comprehensive enough for the needs of TBII evaluation. We will argue that these activity types allow building program theories that connect interaction inputs to outcomes in task performance, thus allowing a more comprehensive comparison of effects on task outcome due to the augmentation of individual activities. The five generic activity types can be approached at three levels: physical (observable) behavior, cognitive, and affective [Kuhlthau 1993]. While the last one is important, we deliberately leave it for later papers in order to keep the present one simpler, and focus on the first two aspects. The activities vary in their goals, process, clarity and structuredness across task stages but are in principle present in every stage. In practical TBII these activity types are also intertwined. Nevertheless, it is possible to recognize these types at least analytically. A task may consist of several subtasks with their own inputs, processes and outcomes. The analysis of activities can be applied to these as well. An Example of Comprehensive TBII Evaluation. Next we present a motivating example of TBII evaluation, which covers information searching and its contribution to task performance, and which is inspired by the program theory approach. As we approach the issues of TBII evaluation by focusing on learning tasks, the topic of the example discussed below is chosen from this particular perspective. We base our TBII example on a learning task from Vakkari and Huuskonen [2012]. The task is to write an essay on a given topic for a class on evidence-based medicine, acquiring information from Medline (the Medical Literature Analysis and Retrieval System Online). We assume that the participating students follow a simplified process model in their essay projects: they are advised first to understand and structure the topic of the essay and explicate their information need, and thereafter to start searching for information for the assignment based on their knowledge of the topic. Searching is divided into selecting search terms, and formulating and running queries. After searching, relevant information items are selected for further examination. The process continues by working with the information items, synthesizing information and writing the essay. We do not consider feedback and iteration here. A necessary condition for evaluating a tool, as well as the whole process, is to explicate its goal and derive criteria of success from that goal. We discuss below possible support tools for each process phase, and the goals of these tools, with tentative suggestions for indicators of success in reaching the goals. Task accomplishment requires understanding of the topic of the task as well as of the actions needed for the completion of the task. Because the background knowledge of the students may and usually does vary, their behavior at the planning phase varies as well. Some start searching for information right away with a preconceived idea; others do more conceptual analysis and planning for the essay writing.
The student may be supported in task planning by showing hierarchical or faceted divisions of the subject area, and typical actions and their order for the given task (e.g. essay writing). Alternatively, Kuhlthau's [1993] Information Search Process (ISP) model may be explained to the students to support their task planning. These aids could be evaluated by their use, by asking the students about their helpfulness, and by observing the results of the subsequent phases. Bates [1990] considers how and at what level supporting tools could be integrated into a search interface, in particular (1) the degree of searcher vs. tool involvement in the search, and (2) the type of activities that the student is able to direct the system to
do at once. The students search for information with the OVID-Medline interface, which has a thesaurus of the Medical Subject Headings (MeSH). The interface supports structuring the topic and selecting search terms. This tool can be assessed by the extent to which it supports shaping the topic, e.g., to what extent the thesaurus helps students to identify central concepts and their relations for the topic. The criterion of success could be the increase in the number and the specificity of concepts held by the students compared with MeSH [Vakkari 2010]. Further, terminological support for identifying search terms can be assessed by the extent to which the tool improved the query. Which query dimensions are selected for the evaluation depends on the goal of the search and the support tool. In general, a query is an input to be evaluated by its outputs, i.e., search results. Search results can be evaluated by the quality of their representation mode or as an indicator of the success of the whole search. In both cases, the number of useful information items identified on the result list, and inferred measures like precision, recall or cumulated gain, can be used as indicators of success. Support for selecting and saving relevant items for the next phase comprises such tools as bookmarking, and saving bibliographic information or information items. The goal is to help locate and refind the selected items as input for the next phase, and these tools can be evaluated by, e.g., users' satisfaction, time spent, or ease of locating and getting items for further processing. Working with information items comprises activities like reading and interpreting the selected items. Support is needed to find and collect relevant semantic components in relevant items. Semantic components consist of text passages, like sentences or paragraphs, representing a useful idea or fact for progressing in the construction of the task output, the essay. A supporting tool could be a focused search method, like contextualization [Arvola, Kekäläinen and Junkkari 2011]. The indicators of success may reflect the amount and proportion of relevant text read in the selected items [Arvola et al. 2010; Sakai and Dou 2013]. Moreover, the development of the conceptual structure of the topic could be measured as an indicator of success, which could be evaluated through conceptual maps [see e.g. Halttunen and Järvelin 2005]. The aim of the tools supporting synthesizing and reporting is the quality of the task output, in the case of essay writing the quality of the essay. Such tools assist, among other things, in annotation, information refinding and link creation between items and systems [Kopak et al. 2010]. These tools could be evaluated with indicators such as the proportion of the items selected from the search results that are cited in the essay, and the amount and proportion of text used from the selected information items. A more complicated indicator would be the degree of text transformation in writing the essay. This refers to the degree to which the actors have been able to synthesize the text in the information items used for creating the output, i.e., the essay. Sormunen and colleagues [2012] proposed a method for analyzing the degree of synthesizing. In addition to these indicators, subject experts can assess the quality of the output along various dimensions [Vakkari and Huuskonen 2012].
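Several of the indicators mentioned above are straightforward to compute once the underlying quantities have been logged. The sketch below, with hypothetical function and variable names, illustrates precision, recall, cumulated gain over a ranked result list, and the proportion of selected items that are cited in the essay; it is an illustration of the indicators, not an implementation used in the cited studies.

```python
# Illustrative computations of indicators mentioned in the text
# (hypothetical helpers, not taken from the cited studies).

def precision(retrieved, relevant):
    """Fraction of retrieved items that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def cumulated_gain(gains, cutoff):
    """Cumulated gain at a rank cutoff over graded relevance scores
    listed in rank order (cf. the CG measure)."""
    return sum(gains[:cutoff])

def citation_proportion(selected_items, cited_items):
    """Proportion of the items selected from the search results
    that are actually cited in the essay."""
    selected, cited = set(selected_items), set(cited_items)
    return len(selected & cited) / len(selected) if selected else 0.0

# Example: a result list with graded relevance gains 0-3
gains = [3, 0, 2, 1, 0, 0, 2]
cg_at_5 = cumulated_gain(gains, 5)                       # 3 + 0 + 2 + 1 + 0 = 6
p = precision(["d1", "d2", "d3"], ["d1", "d3", "d7"])    # 2/3
```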
Most of the success indicators suggested above can be combined with the effort required for using the evaluated tool to meet the goal. Effort can be measured objectively as time spent on a tool or activity, or as effort perceived by the actors. Therefore, these indicators can be expressed as gain/effort ratios, like the number of query terms per unit of time used for identifying them, or the increase in the number of concepts expressed in the information need per unit of time used for this articulation. The immediate outcome of the whole task process is learning about the topic of the essay and about the process of writing an essay; a long-term outcome could be the capability to apply what has been learned in new situations. The output of the task, the essay, is fairly easy to evaluate and indicates the quality of the immediate learning outcome. The subprocesses may be evaluated with regard to the output, even to the immediate
outcome. Evaluating the process with respect to the long-term outcome is more demanding. We propose that the ultimate goal of information interaction tools is to support performance in the task motivating searching. Therefore, it is necessary to understand the mechanisms that connect the contribution of these tools to the output of the task. If we do not know that, we are not able to claim anything about the role of information interaction tools in task performance.
5. TASK PLANNING AND REFLECTIVE ASSESSMENT
5.1 The Phenomenon
Task planning and reflective assessment is a meta-activity, a kind of auxiliary process [Sormunen et al. 2014]. Its purpose is to plan, monitor and evaluate task performance and task outcome. In addition to planning the task, it also focuses on planning and evaluating the inputs, processes and outputs, and the order, of the other activities (see Sections 6-9) present in the task performance process. Planning the task process is about organizing the activities towards the goal and planning the individual activities needed to reach the goal [Schraw 1998]. During the activity the actor sets the goals and plans the procedure of the activity at hand, against which performance is assessed and monitored. The task may be conceptualized differently by the actor depending on whether others assign the goals, or whether the actor sets them [Tanni and Sormunen 2008]. This conceptualization is called the perceived task, and it is created in relation to the actor's mental model. The actor's mental model of the task consists of concepts and their interrelations, and the mental model becomes clearer toward the end of the task process [Vakkari 2001a]. The task planning and reflective assessment activity is adaptive: it may change the task goals while the actor learns about the task or activity at hand. This kind of activity may occur during task performance as monitoring behavior, such as "keeping track" or "changing track". At every turn, the actor may re-assess these aspects and reformulate them. Task planning and reflective assessment also intertwines with the actor's metacognitive knowledge, which can be divided into knowledge about one's own cognitive state, the process, and the task itself [Flavell 1979]. Metacognitive control enables the actor to benefit from instructions and therefore helps to solve tasks [Carr et al. 1989]. Metacognitive knowledge during the activity is dynamic and may include "changing course" (switching tactics and reformulating the perceived goal), parallel thinking, pulling back and reflecting, scaffolding, curiosity, and assessing time and effort in order to allocate resources [Bowler 2010]. Declarative, procedural and conditional knowledge about the task and about oneself are constantly present while the actor is monitoring tasks and activities. Moreover, prior knowledge of the task, the task domain and task processes affects the behavior. It can be argued that by improving one's metacognitive skills and skills in task planning and reflective assessment, task performance is improved [Biggs 1988; Jones et al. 1995]. This activity might thus be one aspect to evaluate if success in reaching the task outcome is to be measured. Task planning and reflective assessment also relates to task complexity and professional training. Simple tasks are structured, and training makes many tasks simpler. In some cases the planning may be subconscious, reacting to what is being experienced without an a priori plan (cf. Suchman's [1987] situated actions). For the sake of simplicity, the present study focuses on task planning that is based on conscious, reflective thinking.
5.2 Evaluating Task Planning and Reflective Assessment
Task planning and reflective assessment is approached from the viewpoint of program theory in Figure 5.1, which depicts its inputs, processes and outputs and its (direct and indirect) effects on task outcomes. Task planning and reflective assessment takes contextual factors, the perceived task, and the information sources as input. Contextual factors include time and other constraints, the work environment, and socio-organizational factors. The perceived task is the actor's current understanding of the task. It is dynamic and reformulated during task performance, and it takes feedback from other activities. The third input, the information sources, comprises all possible information sources including information systems, social contacts and the actor's own memory. In Figure 5.1 we present the information sources at the conceptual schema level (understanding of potential sources), whereas in Figure 4.1 they are presented at the instance level (the various sources actually used).
Fig. 5.1. Task planning and reflective assessment
The processes of this activity are planning, monitoring and assessing the perceived task and the task performance itself [cf. Schraw 1998]. Planning involves the selection of appropriate strategies and activities and the allocation of resources. Monitoring involves the actor's real-time awareness of her comprehension and of the task performance. Assessing may focus on the inputs, process, outputs and outcome of an activity: for example, how thoroughly various information sources were searched, whether the selected information items cover the whole problem area of the task, and how elaborate a comprehension the actor achieved in working with the information items. Task planning and reflective assessment produces several outputs that contribute to the actor's comprehension of the task at hand. This refined understanding and focus formulation [cf. Vakkari 2001a] contributes to the task scope and quality. The outputs of the activity contribute directly or indirectly to task outcomes. Firstly, the task planning and reflective assessment activity contributes to the perception of the task and, hence, to the task goal definition and the task outcome planning. Secondly, it contributes to the individual activities (see Sections 6-9) and to the task performance process (how to carry it out, which activities are included, which strategies and tactics are selected), and thirdly, to cost-effort evaluations. We may illustrate the evaluation of task planning and reflective assessment by drawing on the example of essay writing introduced in Section 4 above. The actor first
creates an understanding of the task at hand, the perceived task. In the beginning, this understanding may be unclear, and it is iteratively refined during task performance. Then the actor plans, more or less consciously, the steps that lead to task completion. During task performance, the actor constantly monitors the actions and decides to continue or to stop an activity, or to choose another activity. The effort invested during the task process is evaluated against the goals. The actor assesses what is a reasonable effort to put into the essay writing in order to achieve the desired outcome, i.e., in the case of students, the mark. The actor's level of expertise and previous knowledge of similar tasks (both subject knowledge and procedural knowledge) affect the planning, monitoring and assessment processes. Task planning and reflective assessment affects the other activities. For example, during the searching activity (see Section 6) the perceived task and the actor's understanding of the outcome of the task affect the perceived information needs leading to information behavior. Planning leads to the formulation of the perceived task needs and the selection of suitable information systems for use, and it also affects how the needs are formulated and expressed to the selected search system. The selection is affected by the actor's awareness of, and the availability of, the information systems or other channels (i.e., information sources). This activity is also present when selecting documents and making relevance assessments (Section 7), working with information items (Section 8), and synthesizing and reporting (Section 9). The level of impact on the other activities may be of interest, e.g., how different task strategies affect the outputs of other activities and the task outcome. The task planning and reflective assessment activity can be supported by information systems. This may be achieved by modeling the task processes into a system, as in business process engineering. This is applicable to well-structured routine tasks such as administrative tasks [cf. Saastamoinen et al. 2012]. Bates' [1990] views on interface functions are similar: behavioral patterns are incorporated into the interface. In task modeling, the task structure, task process and source types can also be mined from web logs [Lucchese et al. 2011; Ruthven 2012] and used in task support. Secondly, task planning and reflective assessment may be supported by decision support systems, which facilitate solving less structured problems. They typically help compile useful information from information items in order to identify and solve problems and to make decisions. In the case of evidence-based medicine, characteristics of individual patients are matched to a computerized knowledge base, and software algorithms generate recommendations which help decide the diagnosis and suitable treatment [Berner 2007, pp. 3-5]. These examples show that this activity may be supported by information systems and that such systems affect the processes illustrated in Figure 5.1. This effect can be studied in experimental settings.
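As a toy illustration of the kind of decision support just mentioned, the sketch below matches hypothetical patient characteristics against a small rule base and returns the recommendations whose conditions match. The rules and thresholds are invented for illustration and do not represent any actual clinical system discussed by Berner [2007].

```python
# Toy sketch of rule-based decision support: patient characteristics are
# matched against a small, entirely hypothetical knowledge base and the
# matching recommendations are returned.

KNOWLEDGE_BASE = [
    # (condition over patient data, recommendation)
    (lambda p: p["systolic_bp"] >= 140, "Consider hypertension work-up"),
    (lambda p: p["hba1c"] >= 6.5, "Consider diabetes management guideline"),
    (lambda p: p["age"] >= 65 and p["smoker"], "Consider screening per guideline"),
]

def recommend(patient):
    """Return the recommendations whose conditions the patient matches."""
    return [advice for condition, advice in KNOWLEDGE_BASE if condition(patient)]

patient = {"age": 70, "smoker": True, "systolic_bp": 150, "hba1c": 5.9}
print(recommend(patient))
# ['Consider hypertension work-up', 'Consider screening per guideline']
```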
6. INFORMATION SEARCHING
6.1 The Phenomenon
During the searching activity the actor interacts with search systems with the intention of fulfilling goals originating from the task. Therefore, searching is an instrumental activity. Searching occurs if the perceived task needs cannot be fulfilled by, e.g., consulting the actor's own memory. The aim is to search for task-relevant information to be used in task performance. The task-based information interaction process as a whole may span a long period of time (e.g., weeks or months). The searching activity, however, is briefer and typically repeated at different phases of the TBII process. The underlying task dictates what kinds of supporting functions the actor needs from the system [cf. Checkland and Holwell 1998; Vakkari 1999].
In an abstract sense, the searching activity can be divided into querying and browsing phases. At the behavioral level, querying consists of expressing information needs to the system [see Ruthven 2008], while browsing entails a sequence of activities, typically (i) glimpsing an initial "scene" received, followed by (ii) homing in on some information item(s) observed, and finally (iii) acquiring or abandoning some results after examination [Bates 2007]. Both querying and browsing inherit their goals and success criteria from the task level [Byström and Hansen 2005; Xie 2008]. As in the case of task planning, the perceived search task may be reformulated during task performance, and it takes feedback from other activities. The searching activity also produces outputs that contribute to the other activities. First, as the searching activity is monitored, it produces inputs for task planning and reflective assessment. Second, the searching activity produces inputs for the subsequent selection activity. At the cognitive level, querying entails formulating and expressing the actor's intended meanings, determined in relation to the task at hand, using the language of the information system, while browsing entails considering meanings in relation to the task in the scene glimpsed, and in the objects observed and examined. During the interaction with the technical environment supporting the (sub-)activities, the actor may encounter conceptual, syntactic, or technological barriers [Kumpulainen and Järvelin 2012]. In order to be able to acquire information, the actor needs different types of knowledge (conceptual, syntactic, or technological) about the systems [Borgman 1985; Kumpulainen and Järvelin 2012]. Conceptual mismatches may be due to actor concepts that are not present in the system, system concepts that the actor should be aware of, or system and actor concepts that are similar but not identical [Blandford and Attfield 2010].
6.2 Evaluation of Information Searching
In Figure 6.1 we present information searching from the program theory point of view, starting with the inputs. Contextual factors, such as constraints in terms of time or costs, the perceived task needs (the actor's current understanding) and the characteristics of the actor (e.g., motivation and skills) affect the formation of the search process. Usually some (even brief) planning (see Section 4), during which the perceived task needs are formulated, precedes the decision to search. The perceived task needs must be transformed into search expressions and lead to homing in on some items in browsing. The actors differ in their ability to conceptualize and express their information needs [Belkin 1980; Kuhlthau 1991; Vakkari 2000] and the needs may change [Bates 1989] in dialogue with the system [Swanson 1977; Belkin 1980; Bookstein 1983]. The real-life searching process (expressing information needs / examining information objects) is typically interactive. Before the actual searching can take place, some particular search channels and systems must be selected. After the selection, the actor employs particular search strategies and tactics to find information which serves the perceived task needs. Searching may take place as cross-session and multi-session activities [Kumpulainen and Järvelin 2010; Kumpulainen 2014; Vakkari 2001b], and even a single search session may entail using multiple systems. These aspects of the searching process may affect the outputs of the activity (the search results). However, the effect of searching outputs on the task outcome is indirect [Vakkari and Huuskonen 2012].
Fig. 6.1. Information searching
Search output (search results) consists of the information items found, while the outcome entails the cognitive changes occurring during the search process. The information searching activity gives feedback to the task planning and reflective assessment activity. It is also a prerequisite for the subsequent selection activity. In the case of the essay writing task (cf. Section 4), the ability of the student to select channels/systems, to formulate and express the search needs, and to home in on relevant objects is affected by the actor's characteristics, such as search knowledge and domain knowledge. Further, the actor's personality, study approaches and motivation affect the search behavior [Heinström 2005; 2006]. Traditional IR evaluation has focused on the quality of the search result retrieved. Our goal, however, is to understand how the various factors related to the searching activity may affect the outcome of the task as a whole. These factors include the search tools included in the technical environment (e.g., support for expressing information needs; ease of operation); the actions of the actors (e.g., how the items retrieved are scanned visually); and various meta-activities (e.g., tactical-level changes in goals during the searching activity).
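Relating such factors to task outputs and outcomes presupposes that search interactions are captured in a form that can later be linked to them. A minimal, hypothetical logging scheme for the searching activity might look like the following; the event types, fields and the derived indicator are illustrative only.

```python
# A minimal, hypothetical event log for the searching activity: each event
# ties an observable interaction to a session and a task stage, so that
# searching behavior can later be related to outputs and outcomes.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchEvent:
    session_id: str
    timestamp: float                  # seconds from session start
    event_type: str                   # e.g., "query", "result_view", "click", "save"
    query_text: Optional[str] = None
    item_id: Optional[str] = None
    task_stage: Optional[str] = None  # e.g., "pre-focus", "focus", "post-focus"

def queries_per_session(events):
    """Count query events per session, one simple behavioral indicator."""
    counts = {}
    for e in events:
        if e.event_type == "query":
            counts[e.session_id] = counts.get(e.session_id, 0) + 1
    return counts
```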
7. SELECTING INFORMATION ITEMS
7.1 The Phenomenon
In selecting information items after retrieval, the issue is making decisions about the relevance or usefulness of the found items for the task process as seen at its current stage. This activity can take place separately after searching, but it is often interleaved with searching. It may also be interleaved with the next activity, working with information items. Nevertheless, the activity can be identified, at least analytically. Selecting information items requires interacting with the available item representations, which may include various elements (such as titles and abstracts), passages (e.g., snippets) or metadata (e.g., keywords), or the entire content (e.g., full text). The purpose of this interaction is to determine which information items are likely to contribute sufficiently to task performance and cover it sufficiently (i.e., are relevant), so that they should be selected for immediate or later use in task performance, while the rest are ignored. There is a large body of literature in Information Science on the concept of relevance; see, e.g., [Barry and Schamber 1998; Borlund 2003; Cosijn and Ingwersen 2000; Ingwersen and Järvelin 2005; Saracevic 1975; 1996a; 1996b]. In TBII, topical relevance may often be a first step (and important if the actor is a novice w.r.t. the task), while
the higher-order dimensions of relevance (e.g., pertinence, socio-cognitive and situational relevance) certainly play an essential role. Different actors, even if experts regarding the task, may assess the relevance of information items quite differently depending on many factors (e.g., prior knowledge, learning style). Vakkari [2000; 2001a-b] and Vakkari and Hakala [2000] analyzed factors contributing to relevance, relevance criteria, and how changes in the criteria are related to the stages of task performance. There is a connection between an individual actor's changing understanding of his/her task over time, how the relevance of information items is judged, and what types of information are considered relevant. The more structured the task, the better the actor is able to distinguish between relevant and other information items. Hansen [2011] found in a study on patent retrieval that relevance judgments may be made for a group of information items collectively as well as per item.
Fig. 7.1. Document selection decision stages [Wang and Soergel 1998, p. 118]. Copyright © 1998 John Wiley & Sons, Inc.
Wang and Soergel [1998] studied the document selection process and proposed the selection framework presented in Figure 7.1. The framework covers a range of document elements ('document information elements', DIEs), and applies multiple relevance criteria and five document value dimensions in the decision to select retrieved documents for later use. Oard and Kim [2001] proposed a framework for modeling the content of information items based on observations of how actors interact with those items in the course of information seeking and use. Based on these frameworks, we may identify four issues in designing tools that support information item selection: (1) the representation of individual information items (which DIEs are included, how they are represented, and in what order and with what emphasis), (2) how sequences of possibly overlapping information items in the retrieval result are represented, (3) how the representations can be manipulated, and (4) how the representations support relevance assessments based on varying criteria.
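To illustrate design issues (1) and (2) above, the following sketch (in Python; the element names and the example item are hypothetical) composes a result-list surrogate from a chosen subset of DIEs in a chosen order, which is exactly the kind of design decision the evaluation should examine.

from typing import Dict, Sequence

def build_surrogate(item: Dict[str, str],
                    elements: Sequence[str] = ("title", "snippet", "keywords"),
                    max_chars: int = 300) -> str:
    # Compose a textual surrogate from the chosen DIEs, in the chosen order.
    # Which DIEs are shown, in what order and at what length, are the design
    # choices whose effect on selection performance is to be evaluated.
    parts = [item[e] for e in elements if item.get(e)]
    return " | ".join(parts)[:max_chars]

# Two alternative surrogate configurations for the same (hypothetical) item:
item = {"title": "Statin therapy in elderly patients",
        "abstract": "A randomized trial of ...",
        "keywords": "statins; cardiovascular risk"}
print(build_surrogate(item))                                  # title and keywords (no snippet stored)
print(build_surrogate(item, elements=("abstract", "title")))  # abstract-first ordering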
7.2 Evaluation of Information Item Selection
Aiming at the evaluation of information item selection from a program theory perspective, we propose the model in Figure 7.2. The main input consists of the search results from the preceding search activity. Important factors that affect the selection process are the motivating task stage and extent, and the perceived types of information
needed (e.g., procedural, factual [Byström and Järvelin 1995]). The actor's characteristics (e.g., learning style, motivation) and preferences (e.g., for document genres), and contextual factors like time pressure and type of organization, affect the process. The process consists of three behaviors interacting with the information item presentation tools: selecting item representations, examining and manipulating item representations, and making decisions. The outputs are the information items selected for further processing in the task and the items ignored or rejected. Both sets may contain items that are topically relevant as well as non-relevant for the task, due to information saturation, inability to use some information, or omissions. Since information item selection is an instrumental activity, the goal is to select, in a cost-effective way, a sufficient number of information items for later use so that searching and selecting can be stopped.
Fig. 7.2. Information item selection
The short-term outcomes are information serving the task and an understanding of what information may be available. The outcome in the task is improved outcome quality due to a cost-effective information item selection process. Whether this actually happens under various circumstances is of interest. We need to learn through evaluation how this depends on the inputs to information item selection: search results, actors, perceived tasks, task stages, and contextual factors. It is important to understand the relative significance of each factor in current practice, and to explore effective new possibilities, whether better presentation interfaces or behaviors. The information item presentation tools may be components of a search engine, but they can be evaluated separately. In the sample learning task (cf. Section 4), the students receive bibliographic information and abstracts as search results, which they can display in a few formats and scan. Based on the university library's license agreements with publishers, many of the listed documents are available as HTML in a web browser or as freely downloadable PDF full texts. Both representations allow scanning and limited searching to support selection. Relevant items can be saved for later use by bookmarking them or by storing their bibliographic information or the items themselves. The representations used, the time spent and dwell times, and the decisions made can be registered and compared with the items that are actually used later and contribute to the final essay [cf. Tague and Schultz 1988].
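The registration and comparison described above could take, for instance, the form sketched below (in Python; the log format, the item identifiers and the measures are our illustrative assumptions): selection decisions and dwell times are logged per examined representation, and the share of selected items that are actually cited in the final essay is computed.

from typing import Dict, List, Set

# One log row per examined item representation (illustrative format).
selection_log: List[Dict] = [
    {"item": "pmid:101", "representation": "abstract", "dwell_s": 42.0, "decision": "select"},
    {"item": "pmid:102", "representation": "title",    "dwell_s":  3.5, "decision": "reject"},
    {"item": "pmid:103", "representation": "fulltext", "dwell_s": 95.0, "decision": "select"},
]

def selection_precision(log: List[Dict], cited_in_essay: Set[str]) -> float:
    # Share of selected items that end up being cited in the final essay.
    selected = {row["item"] for row in log if row["decision"] == "select"}
    return len(selected & cited_in_essay) / len(selected) if selected else 0.0

def mean_dwell_by_decision(log: List[Dict]) -> Dict[str, float]:
    # Average dwell time per decision type, e.g., effort spent on selected vs. rejected items.
    groups: Dict[str, List[float]] = {}
    for row in log:
        groups.setdefault(row["decision"], []).append(row["dwell_s"])
    return {d: sum(v) / len(v) for d, v in groups.items()}

print(selection_precision(selection_log, cited_in_essay={"pmid:101"}))  # 0.5
print(mean_dwell_by_decision(selection_log))  # {'select': 68.5, 'reject': 3.5}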
8. WORKING WITH INFORMATION ITEMS
8.1 The Phenomenon
A learning task requires the acquisition and interpretation of information items that are deemed relevant from the viewpoint of the learning goals and task outcome. In the Internet environment the information is often immediately accessible, and reading snippets (a list of query results, a link on a page) is interwoven with reading the information items themselves. At this phase reading is distinctively active: information is selected, gathered, organized, annotated and combined for further use, e.g., writing. The actor examines the information items through the lens of her learning task, that is, reading takes place in a situation framed by the task [Rouet 2006, pp. 23-26]. Reading may involve browsing, scanning [Bates 2007] and other modes of reading [Marshall 2009]. Arvola and Kekäläinen [2010] suggest a model for browsing within an e-document, where every browsing strategy eventually leads to reading some text passage (or any other semantic component, e.g., a picture), to assessing its relevance, and either to reading the document further or to moving to the next document (a toy simulation of such a loop is sketched at the end of this subsection). In general, the information architecture of each information item guides reading and interpretation [Chen and Lin 2014; Dillon and Turnbull 2005]. For example, the actor may take advantage of various text organizers such as genre-dependent semantic structures, rhetorical organization, or visual and verbal cues [Rouet 2006, pp. 30-61], or guiding gadgets provided by the interface [Arvola and Kekäläinen 2010]. While reading, the actor may apply various techniques (copying, note taking, annotation, etc.) to strengthen the retention of details or to prepare for the writing activity [Adler et al. 1998; Pearson et al. 2013]. The information is interpreted in relation to the actor's cognitive state in context, including the actor's understanding of the learning task [Goldman et al. 2013; Ingwersen and Järvelin 2005, pp. 274-276; Rouet 2006, p. xix]. Detailed reading of information items may reveal new search terms or new items of information, for instance in the form of references, which are picked up and possibly used as input in new cycles of searching. The actor's previous knowledge is the basis to build on; the learning task and the current situation provide strong contextual effects.
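A toy simulation of such a within-document browsing loop is sketched below (in Python; the stopping rule, probabilities and function names are our assumptions, not the model of Arvola and Kekäläinen [2010]): a passage is read, its relevance is assessed, and the actor either continues within the document or leaves for the next one.

import random
from typing import Callable, List

def browse_document(passages: List[str],
                    is_relevant: Callable[[str], bool],
                    continue_prob: float = 0.9,
                    seed: int = 0) -> List[str]:
    # Toy within-document browsing loop: read a passage, assess it, then
    # either continue within the document or leave for the next document.
    rng = random.Random(seed)
    relevant_passages = []
    for passage in passages:
        if is_relevant(passage):
            relevant_passages.append(passage)
        if rng.random() > continue_prob:  # leave the document
            break
    return relevant_passages

# Example: passages mentioning "statin" are taken as relevant.
doc = ["Background on cholesterol.", "Statin trial results.", "Limitations.", "Statin side effects."]
print(browse_document(doc, is_relevant=lambda p: "tatin" in p))
# ['Statin trial results.', 'Statin side effects.'] with this seed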
8.2 Evaluation of Working with Information Items
Figure 8.1 represents the domain of interest in working with information items. The inputs are partly shared with the previous activities, i.e., contextual factors and the actor's qualities, such as level of expertise. Some inputs change along the process, like the state of knowledge and motivation. The items selected in the previous activity (see Section 7) serve as inputs for working with information. The quality, genre and information architecture of the items are important inputs in the process. Working with information, i.e., scanning and browsing, reading, comparing and linking, and annotating, is facilitated by tools with different features, whose impact can be studied. The utilization of such tools is affected by the qualities of the actors, and this, combined with the task features, affects the time and effort needed to gather the information perceived necessary for task accomplishment. The outputs are the results of working with the information items, e.g., identified semantic components, the organization of those components, and annotations. The outcome is enhanced knowledge of the subject and preparation for the next phase in the process. In order to establish the causal chain in evaluation, we should identify the components affecting the outputs and outcomes. The mechanisms are neither sufficiently studied nor well understood. We are interested in how information items are browsed, scanned and read, and in which tools or techniques are used for annotation, and how. How do these actions vary across individuals and systems? For example, reading may be either continuous and linear or discontinuous and non-linear. Of interest is how much
linear versus selective reading is needed to make sense of the text, considering how the information is presented (i.e., device, browser, organization of content). How does annotation or note taking differ between systems, or from working with, e.g., paper and pencil? Beyond working on a single item, a further challenge is to compare the contributions of multiple selected items and to comprehend their relationships [Spivey 1997, p. 136; Adler et al. 1998].
Fig. 8.1. Working with information items
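The question of linear versus selective reading raised above can be operationalized from logged data, for instance as sketched below (in Python; the log format and the index itself are our illustrative assumptions): given the sequence of passage positions an actor visited, the share of transitions to the immediately following passage serves as a crude linearity index.

from typing import List

def linearity_index(visited_positions: List[int]) -> float:
    # Fraction of transitions that move to the immediately following passage.
    # 1.0 = perfectly linear, continuous reading; values near 0 indicate
    # selective, non-linear reading (jumps and backtracking).
    if len(visited_positions) < 2:
        return 1.0
    steps = list(zip(visited_positions, visited_positions[1:]))
    linear_steps = sum(1 for a, b in steps if b == a + 1)
    return linear_steps / len(steps)

print(linearity_index([1, 2, 3, 4, 5]))   # 1.0 (continuous reading)
print(linearity_index([1, 5, 2, 8, 3]))   # 0.0 (selective jumping)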
Various qualitative and quantitative methods can be applied in evaluation depending on the type of data. Qualitative evaluation takes place most conveniently in quasi-experimental settings where students work on realistic learning tasks [Kiili et al. 2012]. Tests are used to assess learning outcomes [Todd 2006]. A usability laboratory environment, e.g., eye tracking, gives robust data about the actor's attention and use of the tools. Transaction logging and other unobtrusive user interface monitoring outside the lab enable larger numbers of test persons [e.g., Kekäläinen et al. 2014]. In our running example (cf. Section 4), the students would read the selected items retrieved from Medline and saved in their personal repositories as links or items. Besides reading, the students might mark or annotate relevant parts of the information items (semantic components), categorize them, and create links between related or associated components. Reading enhances their understanding of the subject of the task, and leads to new searching cycles or to synthesizing and reporting. The output of this phase is the information identified, selected and organized from the information items inherited as input from the previous phase. The outcome is enhanced understanding of the subject of the essay and of the pertinent information items, which prepares the student for the next phase.
9. SYNTHESIZING AND REPORTING
9.1 The Phenomenon
In a typical complex task, working with selected information items to learn about a topic or to solve a problem is not enough to accomplish the task. The outcome of the cognitive processing of information has to be documented and justified using information items as evidence. This activity is an essential part of the knowledge work competences shared by professional experts. For example, in higher education,
academic writing is routinely taught to students as a scholarly task and competence [Monte-Sano and De La Paz 2012; Segev-Miller 2004]. Writing based on the use of multiple information items intertwines with reading as a reading-to-write process [Spivey 1997], but here we can make an analytical distinction between them. Writing starts as a cognitive process already while reading, in the form of planning how the information items can be used in the text to be written. In addition to information items, the writer takes advantage of her knowledge of the topic and her understanding of the discursive practices of the intended audience [Spivey 1997, pp. 144-145]. The form of writing (descriptive, narrative or argumentative genre) also affects the ways of using information items [Monte-Sano and De La Paz 2012]. Synthesizing and reporting based on information items take many forms. In copying, text (or other content) is transferred verbatim from an information item to one's own output. Patchwriting is close to copying but includes some minor editing, such as deleting or adding words or altering grammatical structures. In paraphrasing, the writer restates the passages of an information item in her own words. Copying, patchwriting and paraphrasing reproduce the content of information items without requiring cognitive processing beyond remembering (identifying) or understanding [Howard, Serviss, and Rodrigue 2010; Li and Casanave 2012]. In summarizing, the writer restates and compresses the main points of an information item [Howard et al. 2010; Sormunen and Lehtiö 2011]. In the case of argumentative writing, the author is expected to synthesize information from multiple information items [Sormunen et al. 2012]. Synthesizing across information items is cognitively more demanding than summarizing a single text. The writer has to develop an integrating idea of how to transform information from differently structured, even contradictory, information items into a new structure [Mateos and Solé 2009], and to take into account the differences in the contextual origin of the information items [Monte-Sano and De La Paz 2012].
9.2 Evaluation of Synthesizing and Reporting
Figure 9.1 depicts the inputs, process, outputs and outcome of the synthesizing and reporting activity. The process of the activity can be represented through three subprocesses: planning, translating and reviewing [Flower and Hayes 1981]. In planning, the actor organizes ideas concerning the task outcome, drawing on prior knowledge and on the information items retrieved, selected and interpreted in the course of task performance. Flower and Hayes [1981] define translating as ‘putting ideas into visible language’, i.e., generating drafts of documents or presentations according to the plan. In reviewing, the actor reads or views the drafts to detect problems in meaning or presentational conventions, and edits the drafts to complete the task output. The input factors are mainly the same as in Figure 8.1, but after working with information items the actor's state of knowledge is at a new level. Her perception of the task needs is also in a new form, because now the requirements for the synthesizing and reporting output have to be taken into account (document genre, discourse practices of the target audience). Semantic components distilled from the selected information items become a new input category.
Fig. 9.1. Synthesizing and reporting
Synthesizing and reporting can be supported by citation management software such as RefWorks, EndNote Web, Mendeley or Zotero, which have become popular tools in academic writing. These tools help organize and re-find the information items selected, stored and possibly classified or annotated by the actor. Via integration with word processing software, they help locate selected information items at the time of writing, and automatically create in-text citations and the list of references [Zaugg et al. 2011]. Present tools support writing only at the level of technical management of information items and their references. High-level semantic support has been demonstrated, for example, in argumentation support systems, but only in narrow domains [de Moor and Aakhus 2006]. There is an obvious need to develop systems that actively retrieve and suggest semantic components, relevant in a particular writing situation, from the repository of selected information items (a toy sketch of such support is given at the end of this section). The output of the synthesizing and reporting activity is a document or a presentation required in the task. From the learning perspective, the outcome of the process is the permanent change in the actor's factual, conceptual, procedural and metacognitive knowledge structures. From the task performance perspective, the outcomes are related to the content and quality of the output (an answer to a question, a solution to a problem, justifications for a recommendation, etc.). The student of our running example could start writing an essay using a word processor extended by a citation management system. During writing she can easily re-find selected information items and the semantic components annotated or bookmarked in the previous activity, and the software automatically adds in-text citations and information items to the list of references. If a dedicated writing template supports the task, the actor may be reminded of missing required entries in the prepared output. The student synthesizes research findings reported in different information items and uses references to various information items to justify her conclusions. The output of this process is an essay, which may be evaluated in terms of its argumentative quality. The outcome of the process is an increase in the student's factual and conceptual knowledge of the topic and her procedural knowledge of evidence-based medicine.
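As an illustration of the kind of support called for above, the sketch below (in Python; a plain word-overlap similarity, with hypothetical component identifiers) ranks previously stored semantic components by their similarity to the paragraph currently being written, so that the best candidates can be suggested to the writer. A production tool would of course use a proper retrieval model rather than this toy similarity.

from typing import Dict, List, Tuple

def tokenize(text: str) -> set:
    # Lowercase word set with simple punctuation stripping.
    return {w.lower().strip(".,;:()") for w in text.split() if w.strip(".,;:()")}

def suggest_components(current_paragraph: str,
                       components: Dict[str, str],
                       top_n: int = 3) -> List[Tuple[str, float]]:
    # Rank stored semantic components (identifier -> annotated text) by Jaccard
    # word overlap with the paragraph the actor is currently writing.
    query = tokenize(current_paragraph)
    scored = []
    for comp_id, text in components.items():
        words = tokenize(text)
        union = query | words
        scored.append((comp_id, len(query & words) / len(union) if union else 0.0))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical repository of annotated components from selected Medline items.
components = {
    "pmid:101#results": "statin therapy reduced cardiovascular events in elderly patients",
    "pmid:103#methods": "randomized controlled trial with two-year follow-up",
}
print(suggest_components("effects of statin therapy in elderly patients", components))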
10. DISCUSSION
In Sections 5-9 we have considered TBII evaluation in terms of the five activities individually. Figure 10.1 puts them together into a comprehensive TBII evaluation process
based on program theory. Each of the process boxes is an entire activity characterized in Sections 5-9. We will discuss the properties and limitations of this proposed evaluation framework below.
Fig. 10.1. Overall TBII evaluation framework based on program theory and five activity types
The main strengths of the evaluation framework include:
- it covers TBII comprehensively;
- it supports activity-based and comprehensive evaluation;
- it supports cyclic and sequential task process evaluation;
- it focuses on interaction evaluation;
- it directs attention to significant impacts and the components producing them;
- it supports the evaluation of diverse types of tools.
These strengths are discussed briefly below. The framework covers a range of activities in TBII from task planning to searching, selecting items, working with items, and reporting. It covers the scope of Figure 3.2 from the program theory viewpoint. The framework supports the evaluation of individual activities using their specific program theories. Each activity has its own program theory identifying the activity-specific inputs, processes, immediate outputs and outcomes. The evaluator can find out which activity-specific factors contribute, and in what way, to the outputs and outcomes. At the same time, the framework emphasizes comprehensive evaluation of TBII in task performance. The framework reminds the evaluator of the several activities that together mediate the impacts of each individual activity on the entire task's outputs and outcomes. So far, their relative impacts on task performance are not well known. Tasks vary in their complexity. Frequently repeated simple tasks may have a rather straightforward process, and their TBII may be described as a single pass through Figure 10.1. However, in complex tasks the TBII activities may be performed in various loops and iterations. This is represented in Figure 10.1 by emphasizing the central role of task planning and reflective assessment as a meta-activity. In principle, each move between activities is based on a reflective assessment of the situation at hand. Ending an activity returns the process to the meta-activity, which starts a new activity or repeats the previous one. Multiple visits to each activity in a task process can be evaluated individually or in aggregate. Program theories in general explain how the components (e.g., tools, behaviors) of the activities being evaluated are related to each other and jointly contribute to the outputs and outcomes. Therefore program theories are particularly suitable for evaluating TBII. The focus is on humans performing tasks with tools in contexts. The tools are not evaluated as stand-alone systems but rather as components in interaction with associated human behaviors and factors. A program theory aims to provide an explicit theory of how a program causes the intended outputs and outcomes. It specifies the causal mechanism explaining which components contribute, and how significantly, or fail to contribute, to the outputs and outcomes. Therefore program theories help direct attention to significant program
components rather than to marginal ones. This is a reminder to keep perspective in evaluation, and it helps avoid excessive sub-optimization of individual tools contributing to the process. The five activities incorporated in the proposed TBII evaluation framework can be supported by diverse types of tools. In more routine task processes, TBII can be modeled into a task-specific integrated process support tool. In more complex tasks this may not be possible, and one has to rely on more generic tools. However, even such tools are becoming increasingly available and integrated in the web environment, supporting the entire TBII process more comprehensively. The proposed comprehensive evaluation framework helps design such integrated tools properly for task support. The proposed TBII evaluation framework has some limitations. Most importantly, we have not specified complete and tested program theories for the individual activities. We have rather provided general-level models listing some essential aggregate factors and causal chains that call for empirical testing. These should serve as guidelines in detailed program specification and empirical evaluation. At the moment, there is not sufficient knowledge available for specifying program theories for the activities. It is also an open question how generally this can be done. While we focused on learning tasks, we believe that many tasks have a learning component and can be seen as learning tasks even if the main aim is something else (such as making a decision or producing a presentation). The task performer often cannot avoid learning about the domain, the problem, or the process when performing the task. In very routine tasks, however, the learning component may be minor. We think that many aspects of TBII evaluation in learning tasks also apply to a wider scope of tasks. Much empirical research is needed to establish this. Table 10.1 gives a typology of the four types of studies needed.
Table 10.1. Types of research designs
Study type / Focus        Descriptive    Experimental
Focus on phenomenon       I              II
Focus on tools            III            IV
Research designs may be phenomenon-oriented, asking how the activity is performed, or tool-oriented, evaluating how tools contribute to the activity outputs. Under both orientations, research may be descriptive or experimental. Descriptive studies focusing on the phenomenon (I) are the first steps in understanding how actors perform an activity in the wild. Experimental studies focusing on the phenomenon (II) allow finding out how much, and in what way, the components of each activity specified in the program theory contribute to the outputs and outcomes. Descriptive studies focusing on tools (III) aim to find out how a given tool is used and how it contributes to the activity. Finally, experimental studies focusing on tools (IV) enable answering questions such as how much, and in what way, an individual tool serving an activity contributes to the outputs and outcomes. In the preceding sections we have discussed several studies that fall into these types. We believe that the IR research community needs a systematic approach to carrying out further evaluation studies on TBII. Another possible limitation of our framework for TBII evaluation is its focus on learning tasks. We approach learning tasks from the broader perspective of performing complex tasks, which involve an intentional or unintentional learning component. We are therefore not confined to activities typically taking place in formal education, such as writing essays. While the scope of information-intensive tasks our framework covers remains to be analyzed, it covers at least an important subset of tasks. A third limitation of our framework is that we do not consider human cooperation in information-intensive task performance. Computer-supported cooperative work
(CSCW) [Neale et al. 2004] and collaborative information seeking (or retrieval) [Shah 2012] are important application areas of TBII and would probably benefit from program-theory-inspired evaluation. One may question whether the program theory approach is necessary for TBII evaluation. Program theory is a way to approach evaluation, to ask pertinent questions and to structure evaluation comprehensively. It is a heuristic toward good evaluations of social programs and, as proposed in the present paper, of TBII. Looking at the result alone, e.g., the proposed evaluation framework for TBII, it is impossible to prove that program theory was used to derive it (unless this is explicitly stated). It may be possible to arrive at a similar result without program theory. Still, we are convinced that program theory is an excellent approach for structuring the evaluation of TBII.
11. CONCLUSIONS
We have presented an evaluation framework for task-based information interaction based on program theory. By tasks we mean the tasks initiating information interaction, and learning tasks in particular. Our aim is to extend IR evaluation so that an information system's contribution to the task outcome can be evaluated. Program theory requires building models that explicitly connect input and process factors to outputs and outcomes when evaluating task-based information interaction, where a human being performs a task and uses tools for information interaction in context. We propose that, in the case of learning tasks, there are five intertwined activity types that cover information interaction from searching to task outcome: task planning and reflective assessment, searching information items, selecting them, working with them, and synthesizing and reporting. Following program theory, we structured in each case, at a general level, the input, process and output factors and how they are connected across activity types. We are convinced that this type of analysis and modeling helps to properly evaluate information interaction, and the tools facilitating it, for their contributions to task outcomes. The proposed task-based evaluation framework extends the range of activities and associated support tools to be evaluated. Not only should search activities and tools producing ranked result lists be evaluated, but also the activities related to processing and using the information in the retrieved documents for task performance. The framework structures the activities and their relations to be explored in evaluation. In this way it facilitates the discovery of new research questions for the research community. The phenomenon under investigation becomes more complex: several factors, including the tool to be evaluated, may influence the attainment of the goal set for the object of evaluation. In order to evaluate the object validly, one has to control the other potentially relevant factors contributing to goal attainment. It is also important to explicate the mediating or moderating factors influencing the evaluation result. This requires a more thorough structuring of the factors belonging to the research design, i.e., increasing theoretical understanding of the phenomenon of interest. TBII evaluation seems to speak for enriching the field of IR with ideas from HCI. This suggests taking the human actor more into account as a part of the system to be evaluated. The major types of factors influencing each activity type are the perceived task with its characteristics; the actor with his/her characteristics (prior knowledge and perceived situation); prior information interaction results as input to the activity under scrutiny; the information system(s) supporting the activity; and the perceived goals regarding the output of the activity and the outcomes of the task. When an information system is one of the objects of evaluation, it must be understood how the system features under study are likely to enhance an actor's ability to perform information interaction and, consequently, the task. Actors' characteristics vary, and the support the actors derive from the system varies accordingly. Therefore, it is essential to include human factors also in evaluation experiments, not only in field studies.
A step further in including human factors in evaluation experiments is to keep system features constant and to explore how manipulating human behavior influences the efficiency or effectiveness of a system in task performance. This would emphasize the focus on the phenomenon in research designs (cf. research design II in Table 10.1). In line with the study by Moraveji and colleagues [2011], one could assess to what extent enhancing human capacity, e.g., by educating actors in searching or other activities, would influence the way actors perform the activity and how this is related to interacting with information. In all, combined with our other suggestions, this would enlarge the scope of IR evaluation towards understanding the role of tasks and human behavior in this context.
ACKNOWLEDGMENTS
This research was funded, in part, by the Academy of Finland grant #133021. The constructive comments by the anonymous reviewers are also gratefully acknowledged.
REFERENCES Adler, A., Gujar, A., Harrison, B.L., O’Hara, K., and Sellen, A. 1998. A diary study of work-related reading: design implications for digital reading devices. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '98). ACM/Addison-Wesley, New York, NY, USA, 241-248. Anderson, T.D. 2006. Uncertainty in action: observing information seeking within the creative processes of scholarly research. Information Research, 12, 1(2006), paper 283. Arvola, P. and Kekäläinen, J. 2010. Simulating user interaction in result document browsing. In Workshop Proceedings of the 33rd Annual International ACM SIGIR Conference: Simulation of Interaction, Automated Evaluation of Interactive IR. ACM, New York, NY, USA, 27-28. Arvola, P., Kekäläinen, J., and Junkkari, M. 2010. Expected reading effort in focused retrieval evaluation. Information Retrieval 13, 5 (2010), 460-484. Arvola, P., Kekäläinen, J., and Junkkari, M. 2011. Contextualization models for XML retrieval. Inform. Process. Manag. 47, 5 (2011), 762-776. Attfield, S. and Dowell, J. 2003. Information seeking and use by newspaper journalists. J. Doc. 59, 2 (2003), 187-204. Barry, C.L. and Schamber, L. 1998. Users' criteria for relevance evaluation: A cross-situational comparison. Inform. Process. Manag. 34, 2/3 (1998), 219-236. Bates, M.J. 1989. The design of browsing and berrypicking techniques for the online search interface. Online Inform. Rev. 13, 5 (1989), 407-424. Bates, M.J. 1990. Where should the person stop and the information search interface start? Inform. Process. Manag. 26, 5 (1990), 575-591. Bates, M.J. 2007. What is browsing – really? A model drawing from behavioural science research. Information Research 12, 4 (2007), paper 330. Belkin, N.J. 1980. Anomalous states of knowledge as a basis for information retrieval. Can. J. Inform. Lib. Sci. 5 (1980), 133-143. Belkin, N.J. 2010. On the evaluation of interactive information retrieval systems. In The Janus Faced Scholar. A Festschrift in Honor of Peter Ingwersen. Det Informationsvidenskabelige Akademi (Royal School of Library and Information Science, Copenhagen); ISSI (International Society for Scientometrics and Informetrics), Copenhagen, Denmark, 13-22. Belkin, N.J., Seeger, R., and Wersig, G. 1983. Distributed expert problem treatment as a model for information system analysis and design. J. Inform. Sci. 5 (1983), 153-167. Berner, E.S. 2007. Clinical Decision Support Systems. Springer, New York, NY, USA. Biggs, J. 1988. The role of metacognition in enhancing learning. Aust. J. Educ. 32, 2 (1988), 127-138. Blandford, A. and Attfield, S. 2010. Interacting with Information. Morgan and Claypool, San Rafael, CA, USA. Bloom, B.S., Engelhart, M.D., Furst, E.J., Hill, W.H., and Krathwohl, D.R. 1956. Taxonomy of Educational Objectives: The Classification of Educational Goals. David McKay, New York, NY, USA. Bookstein, A. 1983. Information retrieval: a sequential learning process. J. Am. Soc. Inf. Sci. 34, 5 (1983), 331-342. Borgman, C.L. 1985. The user's mental model of an information retrieval system. In Proceedings of the Eighth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '85). ACM, New York, NY, USA, 268–273. Borlund, P. 2000. Experimental components for the evaluation of interactive information retrieval systems. J. Doc. 50, 1, (2000), 71-90. Borlund, P. 2003. The concept of relevance in IR. J. Am. Soc. Inf. Sci. Technol. 54, 10 (2003), 913–925. Bowler, L. 2010. 
A taxonomy of adolescent metacognitive knowledge during the information search process. Libr. Inform. Sci. Res., 32, 1 (2010), 27-42. Bron, M., van Gorp, J., Nack, F., de Rijke, M., Vishneuski, A., and de Leeuw, S. 2012. A subjunctive exploratory search interface to support media studies researchers. In Proceedings of the 35th
International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '12). ACM, New York, NY, USA, 425-434. doi:10.1145/2348283.2348342 Butcher, K.R., Davies, S., Crockett, A., Dewald, A., and Zheng, R. 2011. Do graphical search interfaces support effective search for and evaluation of digital library resources? In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL '11). ACM, New York, NY, USA, 315-324. doi:10.1145/1998076 Byström, K. and Hansen, P. 2005. Conceptual framework for tasks in information studies. J. Am. Soc. Inf. Sci. Technol. 56, 10 (2005), 1050-1061. Byström, K. and Järvelin, K. 1995. Task complexity affects information seeking and use. Inform. Process. Manag. 31, 2 (1995), 191-213. Carr, M., Kurtz, B.E., Schneider, W., Turner, L.A., and Borkowski, J.G. 1989. Strategy acquisition and transfer among German and American children: Environmental influences on metacognitive development. Dev. Psychol. 25, 5 (1989), 765-771. Checkland, P. and Holwell, S. 1998. Action research: Its nature and validity. Syst. Pract. Act. Res. 11, 1 (1998), 9-21. Chen, C-M. and Lin, S-T. 2014. Assessing effects of information architecture of digital libraries on supporting e-learning: A case study on the Digital Library of Nature & Culture. Comput. Educ. 75, 1(2014), 92-102. Cool, C. and Belkin, N.J. 2002. A classification of interactions with information. In Emerging Frameworks and Methods, Proc. of the 4th International Conference on Conceptions of Library and Information Science (CoLIS4). Greenwood Village, CO, Libraries Unlimited, 1-15. Coryn, C.L.S., Noakes, L.A., Westine, C.D., and Schröter, D.C. 2011. A systematic review of theory-driven evaluation practice from 1990 to 2009. Am. J. Eval., 32, 2 (2011), 199-226. Cosijn, E. and Ingwersen, P. 2000. Dimensions of relevance. Inform. Process. Manag. 36, 4 (2000), 533-550. Dillon, A. and Turnbull, D. 2005. Information architecture. In (3rd ed.) Encyclopedia of Library and Information Sciences 1, 1 (2005), 2361–2368. Flavell, J.H. 1979. Metacognition and cognitive monitoring: A new area of cognitive-developmental inquiry. Am. Psychol. 34, 10 (1979), 906-911. Flower, L., and Hayes, J. R. 1981. A cognitive process theory of writing. College Composition and Communication, 32, 4 (1981), 365–387. Freund, L., Toms, E.G., and Clarke, C.L.A. 2005. Modeling task-genre relationships for IR in the workplace. In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Rretrieval (SIGIR '05). ACM, New York, NY, USA, 441-448. Goldman, S., Lawless, K., and Manning, F. 2013. Research and development of multiple source comprehension assessment. In Reading - from words to mutiple texts, M. A. Britt, S. R. Goldman, and J.-F. Rouet (Eds.). Routledge, New York, NY, USA, 180–199. Halttunen, K., and Järvelin, K. 2005. Assessing learning outcomes in two information retrieval learning environments. Inform. Process. Manag. 41, 4 (2005), 949 – 972. Hansen, P. 2011. Task-based Information Seeking and Retrieval in the Patent Domain: Processes and Relationships. Acta Universitatis Tamperensis: 1631, Tampere University Press, Tampere, Finland. Heinström, J. 2005. Fast surfing, broad scanning and deep diving: The influence of personality and study approach on students’ information-seeking behavior. J. Doc. 61, 2 (2005), 228–247. Heinström, J. 2006. 
Fast surfing for availability or deep diving into quality – motivation and information seeking among middle and high school students. Information Research 11, 4 (2006), paper 433. Hersh, W. 1994. Relevance and retrieval evaluation: Perspectives from medicine. J. Am. Soc. Inf. Sci. 45, 1 (1994), 201-206. Hersh, W., Pentecost, J., and Hickam, D. 1996. A task-oriented approach to information retrieval. J. Am. Soc. Inf. Sci. 47, 1 (1996), 50-56. Howard, R., Serviss, T., and Rodrigue, T. 2010. Writing from sources, writing from sentences. Writing and Pedagogy 2, 2 (2010), 177–192. Huuskonen, S. and Vakkari, P. 2010. Client information system as an everyday information tool in child protection work. In Proceedings of the Third Symposium on Information Interaction in Context (IIiX '10). ACM, New York, NY, USA, 3-12. Ingwersen, P. and Järvelin, K. 2005. The Turn: Integration of Information Seeking and Retrieval in Context. Springer, Dordrecht, The Netherlands. Järvelin, K. 2009. Explaining user performance in information retrieval: Challenges to IR evaluation. In Proceedings of the 2nd International Conference on the Theory of Information Retrieval (ICTIR '09). Springer, Heidelberg, Germany, 289-296. Järvelin, K. 2011. Evaluation. In Interactive Information Seeking, Behaviour and Retrieval, Ruthven, I. and Kelly, D. (Eds.). Facet Publishing, London, UK, 113-138. Jones, M.G., Farquhar, J.D., and Surry, D.D. 1995. Using metacognitive theories to design user interfaces for computer-based learning. Educational Technology 35, 4 (1995), 12–22. Kekäläinen, J., Arvola, P., and Kumpulainen, S. 2014. Browsing patterns in retrieved documents. In Proceedings of the 5th Information Interaction in Context Symposium (IIIX '14). ACM, New York, NY, USA, 299-302.
Kekäläinen, J. and Järvelin, K. 2002. Evaluating information retrieval systems under the challenges of interaction and multi-dimensional dynamic relevance. In Proceedings of the 4th CoLIS Conference. Libraries Unlimited, Greenwood Village, CO, USA, 253-270. Kiili, C., Laurinen, L., Marttunen, M., and Leu, D. J. 2012. Working on understanding during collaborative online reading. Journal of Literacy Research 44, 4 (2012), 448–483. Kopak, R., Freund, L., and O’Brien, H.L. 2010. Supporting semantic navigation. In Proceedings of the Third Symposium on Information Interaction in Context (IIIX ’10). ACM, New York, NY, USA, 359–364. Krathwohl, D.R. 2002. A revision of Bloom’s taxonomy: An overview. Theory into Practice 41, 4 (2002), 212–264. Kuhlthau, C. 1991. Inside the search process. J. Am. Soc. Inf. Sci. 42, 5 (1991), 361-371. Kuhlthau, C. 1993. Seeking Meaning. Ablex, Norwood, NJ, USA. Kuhlthau, C. 1999. The role of experience in the information search process of an early career information worker: Perceptions of uncertainty, complexity, construction, and sources. J. Am. Soc. Inf. Sci. 50, 5 (1999), 399–412. Kuhlthau, C. and Tama, S.L. 2001. Information search process of lawyers: a call for ’just for me’ information services. J. Doc. 57, 1 (2001), 25–43. Kumpulainen, S. 2013. Task-based Information Access in Molecular Medicine: Task Performance, Barriers, and Searching within a Heterogeneous Information Environment. Ph.D. Thesis, Tampere University Press, Acta Universitatis Tamperensis: 1879. Kumpulainen, S. 2014. Trails across the heterogeneous information environment: manual integration patterns of search systems in molecular medicine. J. Doc. 70, 5 (2014), 856-877. doi:10.1108/JD-06-2013-0082. Kumpulainen, S. and Järvelin, K. 2010. Information interaction in molecular medicine: Integrated use of multiple channels. In Proceedings of the Third Symposium on Information Interaction in Context (IIiX '10). ACM, New York, NY, USA, 95-104. Kumpulainen, S. and Järvelin, K. 2012. Barriers to task-based information access in molecular medicine. J. Am. Soc. Inf. Sci. Technol. 63, 1 (2012), 86–97. Li, Y. and Belkin, N. 2008. A faceted approach to conceptualizing tasks in information seeking. Inform. Process. Manag. 44, 6 (2008), 1822-1837. Li, Y. and Belkin, N. 2010. An exploration of the relationships between work task and interactive information search behavior. J. Am. Soc. Inf. Sci. Technol. 61, 9 (2010), 1771-1789. Li, Y. and Casanave, C.P. 2012. Two first-year students’ strategies for writing from sources: Patchwriting or plagiarism? Journal of Second Language Writing 21, 2 (2012), 165–180. Liu, J. and Belkin, N. 2012. Searching vs. writing: Factors affecting information use task performance. In Proceedings of the American Society for Information Science and Technology, 49, 1 (2012), 1–10. Lucchese, C., Orlando, S., Perego, R., Silvestri, F., and Tolomei, G. 2011. Identifying task-based sessions in search engine query logs. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (WSDM '11). ACM, New York, NY, USA, 277-286. Marchionini, G. 2006. Exploratory search: from finding to understanding. Commun. ACM 49, 4 (2006), 41-46. Markkula, M. and Sormunen, E. 2006. Video needs at the different stages of television program making process. In Proceedings of the 1st International Conference on Information Interaction in Context (IIiX ‘06). ACM, New York, NY, USA, 111-118. Marshall, C.C. 2009. Reading and Writing the Electronic Book.
Morgan and Claypool, San Rafael, CA, USA. Mateos, M. and Solé, I. 2009. Synthesising information from various texts: a study of procedures and products at different educational levels. Eur. J. Psychol. Educ. 24, 4 (2009), 435-451. Monte-Sano, C. and De La Paz, S. 2012. Using Writing Tasks to Elicit Adolescents’ Historical Reasoning. J. Lit. Res. 44, 3 (2012), 273–299. de Moor, A. and Aakhus, M. 2006. Argumentation support: From technologies to tools. Commun. ACM 49, 3 (2006), 93-98. Moraveji, N., Russell, D., Bien, J., and Mease, D. 2011. Measuring improvement in user search performance resulting from optimal search tips. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '11). ACM, New York, NY, USA, 355-364. Neale, D.C., Carroll, J.M., and Rosson, M.B. 2004. Evaluating computer-supported cooperative work: models and frameworks. In Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work (CSCW’04). ACM, New York, NY, USA, 112-121. Oard, D.W. and Kim, J. 2001. Modeling information content using observable behavior. In Proceedings of the 64th Annual Conference of the American Society for Information Science and Technology (ASIST 2001). Washington, USA, 481-488. Pearson, J., Buchanan, G., and Thimbleby, H. 2013. Designing for Digital Reading. Morgan and Claypool, San Rafael, CA, USA. Qu, Y. and Furnas, G. 2008. Model-driven evaluation of exploratory search: A study under sensemaking framework. Inform. Process. Manag. 44, 2 (2008), 534-555. Rogers, P.J., Petrosino, A., Huebner, T.A., and Hacsi, T.A. 2000. Program theory evaluation: Practice, promise, and problems. New Directions for Evaluation, 87 (2000), 5–13.
Rossi, P.H., Lipsey, M.W., and Freeman, H. 2004. Evaluation: A Systematic Approach. Sage, Thousand Oaks, CA, USA. Rouet, J.-F. 2006. The Skills of Document Use: From Text Comprehension to Web-based Learning. Erlbaum, Mahwah, NJ, USA. Ruthven, I. 2008. Interactive information retrieval. Annu. Rev. Inform. Sci. 42, 1 (2008), 43-91. Ruthven, I. 2012. Grieving online: the use of search engines in times of grief and bereavement. In Proceedings of the 4th Information Interaction in Context Symposium (IIIX '12). ACM, New York, NY, USA, 120-129. Saastamoinen, M., Kumpulainen, S., and Järvelin, K. 2012. Task complexity and information searching in administrative tasks revisited. In Proceedings of the 4th Information Interaction in Context Symposium (IIIX '12). ACM, New York, NY, USA, 204-213. Sakai, T. and Dou, Z. 2013. Summaries, ranked retrieval and sessions: A unified framework for information access evaluation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '13). ACM, New York, NY, USA, 473-482. Sanderson, M. 2010. Test collection based evaluation of information retrieval systems. Foundations and Trends in Information Retrieval 4, 4 (2010), 247-375. Saracevic, T. 1975. Relevance: A review of and a framework for thinking on the notion in information science. J. Am. Soc. Inf. Sci. 26, 6 (1975), 321-343. Saracevic, T. 1996a. Modeling interaction in information retrieval (IR): a review and proposal. In Proceedings of the American Society for Information Science, 33 (1996), 3-9. Saracevic, T. 1996b. Relevance reconsidered ’96. In Information science: Integration in perspectives. Proceedings of the 2nd Conference on Conceptions of Library and Information Science. The Royal School of Librarianship, Copenhagen, DK, 201-218. Schraw, G. 1998. Promoting general metacognitive awareness. Instructional Science 26, 1-2 (1998), 113-125. Scriven, M. 1991. Evaluation Thesaurus. 4th ed. Sage, Newbury Park, CA, USA. Segev-Miller, R. 2004. Writing from sources: The effect of explicit instruction on college students’ processes and products. Educational Studies in Language and Literature 4, 1 (2004), 5–33. Shah, C. 2012. Collaborative Information Seeking: The Art and Science of Making the Whole Greater than the Sum of All. Springer, Heidelberg, Germany. Smith, C.L. and Kantor, P.B. 2008. User adaptation: Good results from poor systems. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '08). ACM, New York, NY, USA, 147-154. Sormunen, E. and Lehtiö, L. 2011. Authoring Wikipedia articles as an information literacy assignment – copy-pasting or expressing new understanding in one’s own words? Information Research 16, 4 (2011), paper 503. Sormunen, E., Heinström, J., Romu, L., and Turunen, R. 2012. A method for the analysis of information use in source-based writing. Information Research 17, 4 (2012), paper 535. Sormunen, E., Tanni, M., Alamettälä, T., and Heinström, J. 2014. Students' group work strategies in source-based writing assignments. J. Assoc. Inf. Sci. Technol. 65, 6 (2014), 1217-1231. Spivey, N.N. 1997. The Constructivist Metaphor. Reading, Writing, and the Making of Meaning. Academic Press, San Diego, CA, USA. Suchman, L. 1987. Plans and Situated Actions: The Problem of Human-machine Communication. Cambridge University Press, New York, USA. Swanson, D. 1977. Information retrieval as a trial-and-error process. Libr. Quart. 47, 2 (1977), 128-148. Tague, J.
and Schultz, R. 1988. Some measures and procedures for evaluation of the user interface in an information retrieval system. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '88). ACM, New York, NY, USA, 371-385. Tanni, M. and Sormunen, E. 2008. A critical review of research on information behavior in assigned learning tasks. J. Doc. 64, 6 (2008), 893-914. Todd, R. 2006. From information to knowledge: charting and measuring changes in students’ knowledge of a curriculum topic. Information Research 11, 4 (2006), paper 264. Turpin, A. and Scholer, F. 2006. User performance versus precision measures for simple search tasks. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06). ACM, New York, NY, USA, 11-18. Vakkari, P. 1999. Task complexity, problem structure and information actions: integrating studies on information seeking and retrieval. Inform. Process. Manag. 35, 6 (1999), 819-837. Vakkari, P. 2000. Relevance and contributing information types of searched documents in task performance. In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '00). ACM, New York, NY, USA, 2-9. Vakkari, P. 2001a. A theory of the task-based information retrieval process: a summary and generalization of a longitudinal study. J. Doc. 57, 1 (2001), 44-60. Vakkari, P. 2001b. Changes in search tactics and relevance judgments in preparing a research proposal: A summary of findings of a longitudinal study. Information Retrieval 4, 3-4 (2001), 295-310. Vakkari, P. 2003. Task-based information searching. Annu. Rev. Inform. Sci. 37, 1 (2003). Information Today, Medford, NJ, USA.
Vakkari, P. 2010. Exploratory searching as conceptual exploration. In Proceedings of the 4th Workshop on Human-computer Interaction and Information Retrieval. Rutgers University, New Brunswick, NJ, USA, 24-27. Vakkari, P. and Hakala, N. 2000. Changes in relevance criteria and problem stages in task performance. J. Doc. 56, 5 (2000), 540-562. Vakkari, P. and Huuskonen, S. 2012. Search effort degrades search output but improves task outcome. J. Am. Soc. Inf. Sci. Technol. 63, 4 (2012), 657-670. Vakkari, P., Pennanen, M., and Serola, S. 2003. Changes of search terms and tactics while writing a research proposal. Inform. Process. Manag. 39, 3 (2003), 445-463. Wang, P. and Soergel, D. 1998. A cognitive model of document use during a research project: Study I: Document selection. J. Am. Soc. Inf. Sci. 49, 2 (1998), 115-133. Wildemuth, B. 2004. The effects of domain knowledge on search tactic formulation. J. Am. Soc. Inf. Sci. Technol. 55, 3 (2004), 246-258. Wildemuth, B. M., de Bliek, R., Friedman, C. P., and File, D. D. 1995. Medical students’ personal knowledge, searching proficiency, and database use in problem solving. J. Am. Soc. Inf. Sci. Technol. 46, 8 (1995), 590-607. Wu, I.C., Lu, D.R., and Chang, P.C. 2008. Towards incorporating a task-stage identification technique into long-term document support process. Inform. Process. Manag. 44, 5 (2008), 1649-1672. Xie, I. 2008. Interactive Information Retrieval in Digital Environments. IGI, New York, NY, USA. Zaugg, H., West, R.E., Tateishi, I., and Randall, D.L. 2011. Mendeley: Creating communities of scholarly inquiry through research collaboration. TechTrends 55, 1 (2011), 32-36.