From Documents to Tasks: Deriving User Tasks from Document Usage Patterns

Oliver Brdiczka
Palo Alto Research Center (PARC)
3333 Coyote Hill Road, Palo Alto, CA, USA
[email protected]

ABSTRACT
A typical knowledge worker is involved in multiple tasks and switches frequently between them every work day. These frequent switches are expensive because each task switch requires some recovery time as well as the reconstitution of task context. The first task management support systems have been proposed in recent years to assist the user during these switches. However, these systems still require a fairly large investment from the user, who must either learn to use or train such a system. In order to reduce the necessary amount of training, this paper proposes a new approach for automatically estimating a user's tasks from document interactions in an unsupervised manner. While most previous approaches to task detection look at the content of documents or window titles, which might raise confidentiality and privacy issues, our approach only requires document identifiers and the temporal switch history between them as input. Our prototype system monitors a user's desktop activities and logs the documents that have focus on the user's desktop, attributing a unique identifier to each of these documents. Retrieved documents are filtered by their dwell times, and a document similarity matrix is estimated based on document frequencies and switches. A spectral clustering algorithm then groups documents into tasks using the derived similarity matrix. The described prototype system has been evaluated on 29 days of user data from 10 different subjects in a corporation. The obtained results indicate that the approach outperforms previous approaches that use content.

Categories and Subject Descriptors
H.1.2 [User/Machine Systems]: Human factors; H.5.2 [User Interfaces]: Theory and methods; I.5 [Pattern Recognition]: Clustering

General Terms
Algorithms, Experimentation, Measurement.

Keywords
Automatic task identification, document clustering, user task modeling.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. IUI'10, February 7–10, 2010, Hong Kong, China. Copyright 2010 ACM 978-1-60558-515-4/10/02...$10.00.

1. INTRODUCTION
Work of knowledge workers is characterized by spending short amounts of time on tasks and switching frequently between them [2]. While frequent switches between task contexts lead to long recovery periods and increased stress [5], some dedicated UI and technology
to assist the user in these recovery and switching phases has been proposed in recent years (e.g., [1] or [7]). However, the lack of correctness in the representation of the user's tasks, as well as high set-up and maintenance costs in terms of training or learning to use these new systems, are still hurdles to their wide adoption. In this paper, we present a novel approach for constructing a model of a user's tasks without user intervention. The proposed method is based on spectral clustering of document interactions and, in contrast to previous approaches, does not rely on the content of the documents themselves. The correctness of the proposed approach has been evaluated on a large data set of 29 days of desktop work from 10 knowledge workers.

Much research has focused on automatic task prediction using supervised machine learning. The TaskTracer system [1], built within the DARPA CALO project, employed task prediction approaches that focus on selecting features that are observed and discriminative for each task. After feature selection, tasks are predicted using a hybrid approach of a naïve Bayesian classifier and support vector machines [9]. The features used include Windows desktop events, email senders/recipients and window title information. A recent version (TaskTracer2 [10]) includes further features and an improved real-time detection and online learning algorithm. However, these approaches rely on supervised training: the user needs to invest time to declare his or her current activity at run time, and a task model and representation is learned and evolved from the given user feedback. No a priori model or representation of the user's tasks is acquired from collected observations (without human intervention). An unsupervised approach [8] that uses similar features can detect task switches and identify task boundaries, but not the actual tasks themselves. Rattenbury et al.
proposed an automatic task support system, called CAAD [7]. CAAD grouped software artifacts (documents, folders, web pages, people) from a user's interaction contexts into clusters to represent his or her tasks. While CAAD established associations between artifacts and task clusters automatically using an algorithmic approach, it did not focus on the a priori identification of the user's tasks, but rather on the visualization and possible editing of the clusters. Consequently, system evaluations concerned the acceptance of the visualization and approach rather than the accuracy of the task identification. The Connections system [12] aimed at enhancing search and document retrieval with usage contexts. The idea is to improve content-based search (e.g., Google Desktop) by integrating document usage context. Included features are document and file accesses as well as document content. A later system, called Confluence [3], enhances this search with task-based document retrieval. Both systems showed improvements in document search in their evaluations. While these systems do not require any user input
concerning his tasks or activities, they do not create a representation of a user's tasks or identify those tasks explicitly. The Microsoft SWISH system [6] is the work most closely related to ours. SWISH constructs a model of a user's tasks based on tf-idf-filtered terms in the titles of application windows and the window switching history. Probabilistic Latent Semantic Indexing is applied to the filtered terms, together with the switching history, to isolate a number of clusters corresponding to the user's tasks. While the first SWISH prototype was evaluated with good results in a lab study, the number of involved users and tasks was rather small compared to the data set used for evaluation in this paper.

The aim of the approach proposed in this paper is to construct a first representation of the user's tasks based on observed document interactions. The method is completely unsupervised, so no user feedback is necessary. In contrast to previous approaches, the proposed method only leverages document usage information (i.e., switches between documents and document dwell times) and does not touch the content of the documents themselves. The extraction and use of content becomes a problem particularly when task representations are shared among co-workers. A user may, for example, not want to share all documents of a task with his subordinates. By using an algorithm that is based only on document identifiers, it is guaranteed that the shared task representation itself will not contain or reveal any sensitive content (e.g., keywords). Links to the actual documents can be included, but will be protected by access rights that can be granted or refused. In comparison to most of the previous work, the proposed approach has been applied and evaluated on a large dataset collected from corporate users who were not involved in the research project.
The obtained results are promising, outperforming previous content-based approaches.

2. SPECTRAL CLUSTERING OF DOCUMENT INTERACTIONS
The aim of the approach proposed in this paper is to provide a means to discover and reconstruct a user's tasks from document interactions without using any content-related data. Our prototype system logs document interactions (opening, closing, and switching between documents). Similarity scores for documents are then derived based on their co-occurrences. A spectral clustering approach groups documents into clusters corresponding to potential user tasks.

2.1 Logging of Document Interactions
Our prototype logs which applications get focus on a user's desktop and when. The path of the document currently displayed in the application window is then extracted. Document information is anonymized by automatically attributing a unique identifier to each extracted document path and only logging this identifier for further analysis. The prototype has been implemented on Windows XP using the AutoHotkey (www.autohotkey.com) scripting language. Embedded Visual Basic scripting has been used to query and interact with Microsoft Outlook and Microsoft Office applications. The logging prototype resides in the Windows taskbar and runs in the background on a user's PC. While our prototype is not able to extract document paths from all possible applications, it covers Microsoft Outlook, Microsoft Office applications and Acrobat Reader, as well as most utility programs included in Windows XP. In addition to logging the document identifiers, the prototype also calculates and maintains usage statistics for each document
identifier. These include the similarity score to other documents as well as the dwell time for each document. The similarity score for two documents doc1 and doc2 is defined by:

sim(doc1, doc2) = (# co-occurrences of doc1 and doc2) / (# occurrences of doc1)
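To make the definition concrete, the score can be derived from a time-ordered focus log of anonymized document identifiers. The following is a minimal Python sketch, not the prototype's actual AutoHotkey/Java implementation; `similarity_scores` and the sample log are hypothetical names. It counts a co-occurrence for every direct switch between two distinct documents and an occurrence for every contiguous focus period, as defined in the text:

```python
from collections import Counter

def similarity_scores(focus_log):
    """sim(d1, d2) = #co-occurrences of d1 and d2 / #occurrences of d1.

    focus_log: time-ordered document identifiers, one entry per
    contiguous focus period (an "occurrence"). Each direct switch
    between two distinct documents counts as one co-occurrence of
    that (unordered) pair.
    """
    occurrences = Counter(focus_log)
    co_occurrences = Counter()
    for d1, d2 in zip(focus_log, focus_log[1:]):
        if d1 != d2:  # a switch between two distinct documents
            co_occurrences[frozenset((d1, d2))] += 1
    sim = {}
    for pair, n in co_occurrences.items():
        d1, d2 = tuple(pair)
        sim[(d1, d2)] = n / occurrences[d1]  # note: asymmetric score
        sim[(d2, d1)] = n / occurrences[d2]
    return sim

# Example log of anonymized identifiers (never paths or contents):
log = ["doc1", "doc2", "doc1", "doc3", "doc1", "doc2"]
sim = similarity_scores(log)  # e.g. sim[("doc1", "doc3")] == 2/3
```

Note that the score is asymmetric and not bounded by 1, since the denominator counts only the occurrences of the first document.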
A co-occurrence is defined by a direct switch from one document to another on a user's desktop, while an occurrence is defined by a contiguous period of time during which one document has focus. The dwell time t_dwell of a document doc1 refers to the time a user actively works on the document, i.e.
t_dwell(doc1) = t_focus(doc1) − t_idle(doc1),

where t_focus is the time a document has focus on a user's desktop and t_idle is the time during which no keyboard or mouse events occurred while the document had focus. Similarity score and dwell time are the input for our spectral clustering approach, described in the following.

2.2 Spectral Clustering Approach
As soon as enough data has been logged, the spectral clustering algorithm is started and run as a background process by the logging prototype residing in the taskbar. The data processing and spectral clustering algorithm have been implemented in Java using the WEKA machine learning framework [13]. Spectral clustering refers to a class of techniques that rely on the eigenvalues of a similarity matrix to partition points into disjoint clusters, such that points in the same cluster have high similarity and points in different clusters have low similarity. Spectral clustering has been widely used in machine learning, computer vision and speech processing. Spectral clustering techniques make use of the spectrum of the similarity matrix of the data to perform dimensionality reduction, clustering in fewer dimensions. We use an implementation of the Shi-Malik algorithm [11] included in the WEKA machine learning framework [13]. Given a set of data points A, the similarity matrix may be defined as a matrix S where S_ij represents a measure of the similarity between points i and j in a graph. The algorithm partitions the data points A into two sets (A1, A2) based on the eigenvector v corresponding to the second-smallest eigenvalue of the Laplacian matrix L of S. L is
defined by L = I − D^{−1/2} S D^{−1/2}, where D is the diagonal matrix with entries D_ii = Σ_j S_ij. Each partitioning step is evaluated by
calculating the degree of dissimilarity between the two resulting sets A1 and A2, the so-called normalized cut criterion (Ncut). The Ncut criterion computes the weight of the edges connecting the two partitions as a fraction of the total edge connections to all the nodes in the graph. By comparing the Ncut value to a threshold (chosen between 0 and 1), it can be decided whether the current partition should be subdivided, recursively repartitioning the segmented parts if necessary. The recursive splitting stops as soon as no partition has an Ncut value below the specified threshold. The set of data points A is thus hierarchically subdivided into clusters of points with highest similarity. Before applying spectral clustering to the document similarity values, we filter the observed documents by their dwell times. The aim is to remove spurious documents that do not belong to any of the user's tasks and therefore the interaction with these documents is
limited and of short duration. Only documents with an average dwell time of at least 25 seconds are considered relevant to any of the user's tasks in our approach. The similarity scores of these documents are the input for spectral clustering (S_ij = sim(doc_i, doc_j)). The threshold for the Ncut criterion has been set to 0.1, admitting only partition splits that cut connections between a rather small fraction of documents. The resulting partitions from the spectral clustering algorithm represent groups of documents with high similarity, i.e., with a high fraction of switches between them. Assuming that the user switches more frequently between documents relating to a specific task and less frequently between documents belonging to different tasks, each of the constructed clusters is considered to correspond to one of the user's tasks.

3. EXPERIMENTAL EVALUATION
To evaluate our approach, we adopted an evaluation methodology similar to the one used in [6]. Ground truth was obtained by manually labeling user task classes during a large data collection study involving 10 subjects (cf. Section 3.1, Data Collection). Precision, recall and F-measure were then estimated for the clusters identified by our method (cf. Section 3.2, Evaluation and Results).

3.1 Data Collection
Data was collected in situ by observing a total of 10 employees working at a research and development company. The subjects were knowledge workers belonging to different departments of the company, including accounting, library services, intellectual property, IT and company administration. None of the subjects was involved in our research project or prototype development. All but one subject (who was observed for only two whole days due to scheduling constraints) were observed for three whole work days. The observed days were not contiguous; consecutive observation days were separated by one week to one month.
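The Ncut-based splitting decision described in Section 2.2 can be sketched as follows. This is an illustrative Python sketch, not the Java/WEKA implementation used by the prototype; it assumes the similarity matrix has been symmetrized (e.g., as the average of sim(doc_i, doc_j) and sim(doc_j, doc_i)), and the function names are hypothetical:

```python
def ncut(S, part1, part2):
    """Normalized cut (Ncut) value for a bipartition of the document graph.

    S: symmetric similarity weights as a dict keyed by (i, j) pairs
       (missing pairs are treated as weight 0).
    Ncut(A1, A2) = cut(A1, A2)/assoc(A1, V) + cut(A1, A2)/assoc(A2, V),
    where cut sums the weights of edges crossing the partition and
    assoc sums the weights from one set to all nodes V.
    """
    nodes = part1 | part2
    w = lambda i, j: S.get((i, j), 0.0)
    cut = sum(w(i, j) for i in part1 for j in part2)
    assoc1 = sum(w(i, j) for i in part1 for j in nodes)
    assoc2 = sum(w(i, j) for i in part2 for j in nodes)
    if assoc1 == 0.0 or assoc2 == 0.0:
        return float("inf")  # degenerate partition: never accept
    return cut / assoc1 + cut / assoc2

def should_split(S, part1, part2, threshold=0.1):
    # Accept a split only if it severs a small fraction of the total
    # edge weight (0.1 is the threshold used in this paper).
    return ncut(S, part1, part2) < threshold

# Two tightly connected pairs joined by one weak edge: split accepted.
S = {("a", "b"): 1.0, ("b", "a"): 1.0, ("c", "d"): 1.0,
     ("d", "c"): 1.0, ("a", "c"): 0.01, ("c", "a"): 0.01}
accepted = should_split(S, {"a", "b"}, {"c", "d"})  # True
```

In the recursive scheme, each accepted split is reapplied to its two halves until no candidate split falls below the threshold.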
The days of observation were selected to be ones in which the subject engaged in at least one recurring task, such as monthly or weekly status reporting. However, the selected recurrent tasks constituted only a small subset of the complete task set observed for each user. The remaining tasks could also recur across observation days (but we did not influence their recurrence in any way). The observer would meet the subject upon their arrival at work and follow the subject as closely as possible until the end of the business day. Using a paper notepad with an electronic LiveScribe pen (www.livescribe.com), the researcher would record and label, to the second, user tasks and their start/end times (e.g., 8:15:32am start "Expense Reporting"). Prior to and during the shadowing, our prototype was installed on each subject's PC to record document usage. User task and document usage data were anonymized by attributing identifiers to documents and subjects. Our prototype applied the spectral clustering algorithm to the recorded document interactions of each subject, automatically generating document clusters corresponding to the subject's tasks.

3.2 Evaluation and Results
The aim of the evaluation was to assess the quality of the document clusters identified by our clustering method with respect to the actual user tasks (identified and labeled by a human observer as ground truth). However, as our method is completely unsupervised, it does not assign task labels to the identified clusters that could be used for direct comparison with the ground truth. To solve this problem, we automatically assign a task label to each cluster by
looking at the percentage of documents in the cluster that belong to a specific task label. The task label with the largest percentage of documents in the cluster is assumed to be the task label of that cluster. To evaluate performance, we used three quantitative measures: precision, recall and F-measure. Precision refers to the fraction of documents in a cluster that belong to the task label of that cluster. Recall designates the fraction of all documents belonging to a task label that appear in the corresponding cluster. The F-measure is the weighted harmonic mean of precision and recall:
F-measure = (2 × precision × recall) / (precision + recall)
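This label assignment and the three measures can be made concrete with a short sketch. The following Python code is illustrative only; `evaluate_clusters` and the toy data are hypothetical names, not part of our study data:

```python
from collections import Counter

def evaluate_clusters(clusters, true_label):
    """Majority-label each cluster, then score it.

    clusters: list of sets of document identifiers (one set per cluster).
    true_label: ground-truth task label for each document identifier.
    Returns a list of (label, precision, recall, F-measure) tuples.
    """
    results = []
    for cluster in clusters:
        # The most frequent ground-truth label becomes the cluster label.
        counts = Counter(true_label[d] for d in cluster)
        label, in_cluster = counts.most_common(1)[0]
        # All documents carrying that label, anywhere in the data set.
        relevant = sum(1 for lab in true_label.values() if lab == label)
        precision = in_cluster / len(cluster)
        recall = in_cluster / relevant
        f_measure = 2 * precision * recall / (precision + recall)
        results.append((label, precision, recall, f_measure))
    return results

# Toy example: two ground-truth tasks, two discovered clusters.
truth = {"d1": "expenses", "d2": "expenses",
         "d3": "proposal", "d4": "proposal"}
clusters = [{"d1", "d2", "d3"}, {"d4"}]
scores = evaluate_clusters(clusters, truth)
```

In the toy example, the first cluster receives the label "expenses" with precision 2/3 and recall 1, while the second receives "proposal" with precision 1 and recall 1/2.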
Higher F-measure generally indicates higher performance of the algorithm concerned. A high number of different tasks was observed and labeled for the evaluation. A total of 572 tasks were identified by the human observer for the 10 subjects, ranging from 35 (minimum) to 77 (maximum) tasks per subject. Our prototype system identified a total of 219 clusters, with a minimum of 9 and a maximum of 52 clusters per subject. Table 1 (last row) shows the results (precision, recall and F-measure) obtained for the whole data set using the evaluation methodology described earlier in this section. While the obtained overall precision is rather low, our method has a high recall value. The low precision can be explained by the fact that the number of task labels is higher than the number of clusters identified. If there are more task labels than clusters, many documents belonging to tasks that have not been mapped to any cluster add noise to the identified clusters (their task label has never been considered for creating or mapping to a cluster, but they still count as erroneous members of a cluster). However, the clusters that have been identified each seem to correspond well to one particular user task, resulting in a good overall recall value.

               5 tasks (10 subjects)   All tasks (10 subjects)
  Precision    0.71                    0.20
  Recall       0.77                    0.76
  F-measure    0.74                    0.32
Table 1: Average precision, recall and F-measure for the 5 most frequent tasks per subject (50 tasks in total) and for all subjects' tasks (572 tasks in total).

The tasks manually identified by the observer were recorded at different levels of granularity, i.e., both high-level tasks corresponding to the subject's projects and low-level tasks corresponding to basic actions were recorded. Example task labels are "Proposal for Client X" (high-level) or "Filling out expense web form" (low-level). In order to arrive at the right granularity of the labeled tasks and to be able to compare with previous approaches, tasks were ordered by their total duration, i.e., the cumulative amount of time the user spent on them. The evaluation of the document clusters identified by our approach was then conducted on a limited number of these ordered tasks. By limiting the number of tasks considered for evaluation, we focus on evaluating higher-level tasks (e.g., projects) in which the subjects spend much of their time. Table 1 (first row) shows the results (precision, recall and F-measure) obtained for the 5 most frequent tasks of each subject. We obtain both good precision and good recall for our approach. These results seem to indicate that our approach outperforms previous content-based approaches like SWISH [6], which obtained 0.49 (precision), 0.72
(recall) and 0.58 (F-measure) on a smaller data set of only 4 hours and 5 tasks in total. Thus, document switch information and dwell times seem to be sufficient to identify a user's high-level tasks. Figure 1 depicts average precision, recall and F-measure values for an increasing number of user tasks. We see that as more and more tasks with lower frequencies are included, the precision gradually drops, while the recall remains fairly constant. Our interpretation is that the document clusters identified by our method correspond quite well to specific high-level user tasks. However, as the number of tasks in the data set increases, and with it their granularity, not all tasks can be isolated by our method. This is also due to the fact that we chose not to force the algorithm to generate a specific number of clusters, but to create the clusters based on the characteristics of the logged data.
Figure 1: Average precision, recall and F-measure with respect to the number of user tasks [plot omitted: x-axis "Number of User Tasks (observed)" ranging from 3 to 15; y-axis ranging from 0.2 to 1.0]

4. CONCLUSION AND FUTURE WORK
This paper introduced a novel approach for automatically constructing a model of a user's tasks based on logged document interactions. In contrast to previous approaches, no content-related data (document text, window titles, etc.) is recorded. Identifiers are attributed to logged documents in order to maintain data confidentiality and user privacy. The prototype system and approach have been evaluated on data from 29 work days of 10 subjects. The obtained results show that the proposed approach works well in identifying a smaller set of high-level tasks, while, as the number of tasks and in particular their granularity increases, the precision of the isolated clusters gradually decreases. Furthermore, the results seem to indicate that our method outperforms previous content-based approaches like SWISH. We believe that the proposed approach provides a new, efficient means for estimating a first representation of a user's tasks without any need for human intervention or input. Existing task prediction systems like TaskTracer move from tasks to documents/artifacts (top-down) and thus require the user to extensively define and train task labels before use. The proposed move from documents/artifacts to tasks (bottom-up) allows for an easier (semi-)automatic initialization of task prediction. By unobtrusively observing the user for a certain period of time, a first task representation can be constructed without user input. The user may eventually associate labels with the discovered task/document clusters. However, a simple activity/task-based organization of work artifacts without explicit labels and their prediction has also shown its benefits to users [7]. The user can simply choose the pertinent 'task context' (artifact collection) when he or she switches tasks. Each document cluster/task can further be edited by the user and ultimately also be shared among users.

5. REFERENCES
[1] A.N. Dragunov, T.G. Dietterich, K. Johnsrude, M. McLaughlin, L. Li, and J.L. Herlocker. TaskTracer: a desktop environment to support multitasking knowledge workers. In Proc. IUI '05, 75-82, 2005.
[2] V.M. González and G. Mark. "Constant, constant, multi-tasking craziness": managing multiple working spheres. In Proc. CHI '04, 113-120, 2004.
[3] K. Gyllstrom, C. Soules, and A. Veitch. Activity put in context: identifying implicit task context within the user's document interaction. In Proc. IIiX '08, 51-56, 2008.
[4] G. Mark, V.M. González, and J. Harris. No task left behind?: examining the nature of fragmented work. In Proc. CHI '05, 321-330, 2005.
[5] G. Mark, D. Gudith, and U. Klocke. The cost of interrupted work: more speed and stress. In Proc. CHI '08, 107-110, 2008.
[6] N. Oliver, G. Smith, C. Thakkar, and A.C. Surendran. SWISH: semantic analysis of window titles and switching history. In Proc. IUI '06, 194-201, 2006.
[7] T. Rattenbury and J. Canny. CAAD: an automatic task support system. In Proc. CHI '07, 687-696, 2007.
[8] J. Shen, L. Li, and T.G. Dietterich. Real-time detection of task switches of desktop users. In Proc. IJCAI, 2007.
[9] J. Shen, L. Li, T.G. Dietterich, and J.L. Herlocker. A hybrid learning system for recognizing user tasks from desktop activities and email messages. In Proc. IUI '06, 86-92, 2006.
[10] J. Shen, J. Irvine, X. Bao, M. Goodman, S. Kolibaba, A. Tran, F. Carl, B. Kirschner, S. Stumpf, and T.G. Dietterich. Detecting and correcting user activity switches: algorithms and interfaces. In Proc. IUI '09, 117-126, 2009.
[11] J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8): 888-905, 2000.
[12] C.A. Soules and G.R. Ganger. Connections: using context to enhance file search. SIGOPS Oper. Syst. Rev. 39(5): 119-132, 2005.
[13] I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2005. Software available at http://www.cs.waikato.ac.nz/ml/weka/ (retrieved Sep. 2009).