Examining Task Relationships in Multitasking Consumer Search Sessions: A Query Log Analysis
Xiang Zhou, Pengyi Zhang, Jun Wang
Department of Information Management, Peking University
5 Yiheyuan Rd, Haidian District, Beijing 100871, China
{zhouxiang.im, pengyi, junwang}@pku.edu.cn

ABSTRACT
With the advancement of the Internet, search has expanded to every aspect of our daily lives, including online shopping. Previous research has found that the focused, one-time search task model does not apply in many real-life settings, and that people engage in multitasking sessions. In this poster, we examine the relationships among tasks in multitasking product search through query log analysis. We analyzed 47,387 queries issued by 2,910 users in 18,102 product search sessions from taobao.com, the largest Chinese C2C e-commerce site. Results show that: (1) 35.7% of all search sessions were multitasking sessions; (2) users issued more queries and spent more time in multitasking sessions, but the number of queries per task remained similar for monotasking and multitasking sessions; (3) about 80% of the tasks in multitasking sessions were unrelated, while the other 20% were hierarchical or sibling tasks. The results provide an exploratory understanding of the relationships among multiple product search tasks and could be useful for query recommendation and product recommendation.
Keywords
Information seeking, query log analysis, consumer search task, task relationships, taobao.com
INTRODUCTION
Users engage in search tasks in many contexts, including online shopping. According to the China Internet Network Information Center (CNNIC), the number of Chinese Internet users who shop online had reached 413 million by December 2015. The Internet has greatly lowered the cost of searching for alternative and substitute products, enabling consumers to collect more information about products, brands, and sellers before they make purchasing decisions. Information search, the first stage in the buying process, becomes more important in online shopping than in traditional retailing (Rowley, 2000). Online shopping is more “information intensive”, meaning that e-commerce Web sites intended for transactions become information sources (Fortune, 1998).

Similar to other kinds of Web search, users engage in multitasking sessions when searching for product information. Research has found that multitasking is quite common in Web search (Lucchese, Orlando, Perego, Silvestri, & Tolomei, 2013; Spink, Ozmutlu, & Ozmutlu, 2002). Tasks are often interleaved. For example, Feild and Allan (2013) found that a 10-query sequence typically consists of three to seven search tasks. Figure 1 shows an example query sequence from a user session log on Taobao.com, China’s leading C2C e-commerce site. In this nine-query session, the consumer searched for three purchasing intentions: a cake, an umbrella, and a bag. These tasks were not related; however, some search tasks may be related (such as buying a pair of pants and a matching shirt for a special occasion).

Figure 1: Example Multitasking Product Search

Very little research has examined product information search to understand the characteristics of multitasking product search. In this paper, we aim to address this gap by examining task relationships in multitasking product search sessions through query log analysis. Our two research questions are:
R1. What are the characteristics of multitasking product search in terms of number of queries and session length?
R2. How do tasks in multitasking search sessions relate to each other?

ASIST 2016, October 14-18, 2016, Copenhagen, Denmark. Author retains copyright.
RELATED RESEARCH
Search Sessions and Tasks
A search session is often defined as a series of queries made by a user within a range of time; thus, time intervals and search patterns are often used for session identification (He, Göker, & Harper, 2002). Previous research has used session deltas ranging from 5 to 30 minutes: if no interaction happens within 5 to 30 minutes of the previous interaction, the next interaction is considered a new session (Kamvar & Baluja, 2006). In e-commerce research, the session delta has been set at longer intervals, such as 90 minutes (Karpischek, Santani, & Michahelles, 2012). A large body of research has focused on identifying search tasks from query sequences (Jones & Klinkner, 2008; Li, Deng, Dong, Chang, & Zha, 2014; Lucchese et al., 2013). For example, Jones and Klinkner (2008) used a hierarchical clustering approach to identify search tasks and goals from query sequences. Lucchese et al. (2013) used a two-step clustering approach based on lexical and semantic features of adjacent queries. Li et al. (2014) used probabilistic models based on query co-occurrence. Feild and Allan (2013) found that search task context has a large impact on query recommendation performance, and suggested that task-aware query recommendation should include features such as user behavior and Web search features in addition to query log analysis.
Multitasking Search
Previous research has found that multitasking information search is a common user behavior in Web search. For example, Spink et al. (2002) found that 11.4% of 1,000 randomly extracted sessions on Excite were multitasking sessions. Sessions with more queries were more likely to be multitasking: Spink, Park, Jansen, and Pedersen (2006) found that 81.1% of two-query sessions and 91.3% of sessions with three or more queries on AltaVista were multitasking sessions. Lucchese et al. (2013) found that 74% of Web queries were part of multitasking search sessions in a three-month sample of AOL search logs. In multitasking Web search, users often change topics within the same session (Feild & Allan, 2013; Spink et al., 2002). For example, Spink et al. (2002) found that users changed topical categories, such as from entertainment to hobbies, in 60% of the multitasking sessions. Besides topical changes, tasks in such multitasking sessions may be interleaved or hierarchically organized (Jones & Klinkner, 2008). Users might have primary and secondary tasks while working on multiple tasks simultaneously (Spink, Cole, & Waller, 2008). Relatively little research has examined the relationships among tasks.

METHODS
Data Set
Our data set includes browser click-through logs from taobao.com for the month of May 2013. We cleaned the data set by removing users with too many sessions (the top 2.5%), who are likely to be sellers. The data set used in this analysis includes 47,387 queries from 2,910 users in 18,102 sessions. Each record contains the following fields (Figure 2 shows some example log records of a user):
• IP address: the IP address from which a click is made;
• URL: the URL of the Web page the user visited;
• query terms: the query as entered by the user (if any);
• date and time: the date and time when the user opened the URL.
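The cleaning step described above, which drops the heaviest users as likely sellers, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes per-user session counts are already available, reads "top 2.5%" as the 2.5% of users with the most sessions (rounded up to a whole user), and uses a hypothetical function name `drop_heavy_users`.

```python
import math


def drop_heavy_users(session_counts, top_share=0.025):
    """Drop the `top_share` fraction of users with the most sessions.

    session_counts maps a (hypothetical) user ID to that user's number of
    sessions. The number of users removed is rounded up (an assumption);
    ties at the cutoff keep insertion order because Python's sort is stable.
    """
    n_drop = math.ceil(len(session_counts) * top_share)
    ranked = sorted(session_counts, key=session_counts.get, reverse=True)
    heavy = set(ranked[:n_drop])
    return {user: n for user, n in session_counts.items() if user not in heavy}


# Usage with made-up counts: 50 users, "u49" has the most sessions.
counts = {f"u{i}": i + 1 for i in range(50)}
kept = drop_heavy_users(counts)
print(len(kept))  # 50 * 0.025 = 1.25, rounded up to 2 removed: prints 48
```

A percentile cutoff on the counts themselves would be an equally plausible reading of "top 2.5%"; the rank-based version is used here only because it removes a predictable number of users.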
Figure 2: Example Log Records
Session and Task Identification
In this paper, we set the session delta at 45 minutes: if a user was inactive for 45 minutes, his/her next activity is considered the start of a new session. The example in Figure 2 contains three sessions, each searching for a different product category. For task identification, we employed hierarchical clustering of query terms based on pairwise Jaccard similarity and experimented with thresholds from 0.2 to 0.6. We randomly sampled 1,015 (2%) search log records and manually created a gold standard. We then set the threshold at 0.35, which yielded the highest F-score for task identification.
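The session and task identification steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: events are assumed to be (timestamp, query) pairs for one user, query terms are assumed to be whitespace-separated (real Chinese queries would need word segmentation), "hierarchical clustering" is instantiated as single-linkage clustering with a similarity cutoff, and the F-score is computed over query pairs, which is one common way to score a clustering against a gold standard. All function names are hypothetical.

```python
from datetime import datetime, timedelta
from itertools import combinations

SESSION_DELTA = timedelta(minutes=45)  # session delta used in this study
TASK_THRESHOLD = 0.35                  # Jaccard threshold chosen in this study


def split_sessions(events, delta=SESSION_DELTA):
    """Split one user's (timestamp, query) events into lists of queries.

    A new session starts when the gap since the previous event is at least
    `delta` (treating an exact 45-minute gap as a break is an assumption).
    Events with an empty or None query still count as activity.
    """
    sessions, last_time = [], None
    for timestamp, query in sorted(events, key=lambda e: e[0]):
        if last_time is None or timestamp - last_time >= delta:
            sessions.append([])
        if query:
            sessions[-1].append(query)
        last_time = timestamp
    return sessions


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tasks(queries, threshold=TASK_THRESHOLD):
    """Single-linkage clustering of queries; returns one task label per query.

    Queries share a task if a chain of query pairs with Jaccard >= threshold
    connects them (implemented with a union-find forest).
    """
    terms = [set(q.lower().split()) for q in queries]  # assumed tokenization
    parent = list(range(len(queries)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(queries)), 2):
        if jaccard(terms[i], terms[j]) >= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(queries))]


def pairwise_f1(predicted, gold):
    """F-score over query pairs: a pair is positive if both queries share a task."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_pred = predicted[i] == predicted[j]
        same_gold = gold[i] == gold[j]
        tp += same_pred and same_gold
        fp += same_pred and not same_gold
        fn += same_gold and not same_pred
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


t0 = datetime(2013, 5, 1, 9, 0)
events = [
    (t0, "red umbrella"),
    (t0 + timedelta(minutes=5), "umbrella"),
    (t0 + timedelta(minutes=10), "chocolate cake"),
    (t0 + timedelta(hours=2), "leather bag"),  # 110-minute gap: new session
]
sessions = split_sessions(events)
labels = cluster_tasks(sessions[0])
print(len(sessions), len(set(labels)))  # 2 sessions; 2 tasks in the first one
print(pairwise_f1(labels, ["umbrella", "umbrella", "cake"]))  # 1.0
```

A threshold sweep like the one described would call `cluster_tasks` for each candidate value between 0.2 and 0.6 on the gold-standard sample and keep the value with the highest `pairwise_f1`.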
A session is considered a multitasking search session when the session contains queries that are matched to two or more product categories.
Task Relationship Analysis
We used Taobao’s product catalog, available through its open API¹, as the external knowledge source for determining how two tasks are related. The catalog is a 5-level hierarchical classification with more than 14,000 product categories. We matched a task to a product category if any of the queries belonging to the task contained the category name. We defined three types of task relationships based on the relationship of the product categories in the catalog:
• Hierarchical: the product categories that a user searched for have a parent-child relationship.
• Sibling: the product categories that a user searched for are siblings in the product category tree.
• Unrelated: all other relationships.
Table 1 shows some examples of hierarchical, sibling, and unrelated tasks.

Type of relationship | Task 1 product category (ID) | Task 2 product category (ID)
Hierarchical | Clothing (161) | Shirts (16102)
Hierarchical | Home furniture (107) | Sofa (10704)
Sibling | Pants (16103) | Belts (16116)
Sibling | Shirts (16102) | Dress pants (16107)
Unrelated | Cell phone (11201) | Television (1220109)
Unrelated | Clips (1300302) | Pants (16103)
Table 1: Examples of Different Task Relationships

# of tasks in a session | N | % | Cumulative %
1 | 11640 | 64.3% | 64.3%
2 | 3823 | 21.1% | 85.4%
3 | 1330 | 7.3% | 92.8%
4 | 626 | 3.5% | 96.2%
5 | 288 | 1.6% | 97.8%
6 | 170 | 0.9% | 98.8%
7 | 85 | 0.5% | 99.2%
8 | 44 | 0.2% | 99.5%
9 | 25 | 0.1% | 99.6%
10 | 21 | 0.1% | 99.7%
11 or more | 50 | 0.3% | 100%
Table 2: Number of Tasks in a Session
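The category matching rule and the three relationship types defined for Table 1 can be sketched as follows. The catalog excerpt is hypothetical: it covers only the IDs in Table 1, uses English names in place of the Chinese category names, and links Pants (16103) and Belts (16116) to Clothing (161) based only on their shared ID prefix, which is an assumption. IDs with no parent entry are treated as having an unknown parent. Preferring the longest matching category name is also an assumption, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical catalog excerpt: category ID -> name (IDs from Table 1).
NAMES = {
    161: "Clothing", 16102: "Shirts", 16103: "Pants", 16107: "Dress pants",
    16116: "Belts", 107: "Home furniture", 10704: "Sofa",
    11201: "Cell phone", 1220109: "Television", 1300302: "Clips",
}
# Hypothetical child -> parent links consistent with Table 1.
PARENT = {16102: 161, 16103: 161, 16107: 161, 16116: 161, 10704: 107}


def match_category(task_queries, names=NAMES):
    """Return the ID of a category whose name occurs in any query of the task.

    Longer names are tried first, so "dress pants" maps to Dress pants
    rather than Pants. Returns None if no category name matches.
    """
    for cat_id, name in sorted(names.items(), key=lambda kv: -len(kv[1])):
        if any(name.lower() in q.lower() for q in task_queries):
            return cat_id
    return None


def relationship(a, b, parent=PARENT):
    """Classify two distinct category IDs as hierarchical, sibling, or unrelated."""
    if parent.get(a) == b or parent.get(b) == a:
        return "hierarchical"
    if parent.get(a) is not None and parent.get(a) == parent.get(b):
        return "sibling"
    return "unrelated"


def analyze_session(tasks):
    """tasks: list of query lists, one per identified task in a session.

    Returns (is_multitasking, {category pair: relationship}); a session is
    multitasking when its queries match two or more product categories.
    """
    cats = {match_category(t) for t in tasks} - {None}
    pairs = {(a, b): relationship(a, b) for a, b in combinations(sorted(cats), 2)}
    return len(cats) >= 2, pairs


multi, pairs = analyze_session([["clothing sale"], ["white shirts"], ["sofa"]])
print(multi)               # True
print(pairs[(161, 16102)])  # hierarchical
```

Only direct parent-child links count as hierarchical here, following the definition above; treating any ancestor as hierarchical would be a different reading of the 5-level catalog.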
As Table 1 shows, according to the product catalog, clothing and shirts are hierarchical tasks, shirts and dress pants are sibling tasks, and cell phone and television are unrelated tasks.
¹ open.taobao.com

RESULTS
Distribution of Sessions by Number of Tasks
We examined sessions by the number of tasks they contain. Table 2 shows the number of tasks per session. 64.3% (11,640) of all sessions contained only one task, while 35.7% (6,462) were multitasking sessions, i.e., contained more than one task. About 21.1% of the sessions contained two tasks, 7.3% contained three tasks, and 0.3% contained eleven or more tasks. We plotted the distribution of all sessions by the number of tasks in each session. In Figure 3, the X-axis (log scale) is the number of tasks in a session, and the Y-axis (log scale) is the number of sessions containing that many tasks. The number of sessions followed a power-law distribution over the number of tasks in a session (R² = 0.95): most search sessions contained a small number of tasks, while very few sessions contained many tasks.
Figure 3: Session Distribution by Number of Tasks
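The shares in Table 2 and the log-log fit behind Figure 3 can be checked with a few lines of arithmetic. This sketch assumes the power law was fitted by ordinary least squares on ln(number of sessions) against ln(number of tasks), using only the bins 1 to 10 because the open "11 or more" bin has no single x value. The source does not state its exact fitting procedure, so the R² printed here need not match the reported 0.95.

```python
import math

# Sessions by number of tasks, from Table 2.
COUNTS = {1: 11640, 2: 3823, 3: 1330, 4: 626, 5: 288,
          6: 170, 7: 85, 8: 44, 9: 25, 10: 21}
OPEN_BIN = 50  # sessions with 11 or more tasks

total = sum(COUNTS.values()) + OPEN_BIN
multitasking = total - COUNTS[1]
print(total, multitasking, f"{multitasking / total:.1%}")  # 18102 6462 35.7%

# Per-bin and cumulative percentages, as in the last two columns of Table 2.
running = 0
for k, n in COUNTS.items():
    running += n
    print(k, f"{n / total:.1%}", f"{running / total:.1%}")


def loglog_fit(counts):
    """OLS fit of ln(count) = a + b * ln(k); returns (b, r_squared)."""
    xs = [math.log(k) for k in counts]
    ys = [math.log(n) for n in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy * sxy / (sxx * syy)


slope, r2 = loglog_fit(COUNTS)
print(f"exponent {slope:.2f}, R^2 {r2:.2f}")
```

The negative slope is the power-law exponent; a straight line on log-log axes is what "power-law distribution" means in this context.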
Characteristics of Multitasking Sessions
Table 3 shows the average number of queries per session and per task in monotasking and multitasking sessions.

# of tasks in a session | # of queries per session | # of queries per task
One task | 1.46 | 1.46
Two tasks | 3.16 | 1.58
Three or more tasks | 6.96 | 1.66
Table 3: Number of Queries per Session and per Task
Users issued more queries in multitasking sessions. Monotasking sessions contained 1.46 queries on average, whereas two-task sessions contained 3.16 queries, and sessions with three or more tasks contained 6.96 queries. The average number of queries per task increased only slightly, from 1.46 in monotasking sessions to 1.58 in two-task sessions and 1.66 in sessions with three or more tasks.
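The per-session and per-task averages in Table 3 can be computed from per-session summaries as sketched below. The record layout (number of tasks, number of queries) is an assumption, as is computing queries per task as total queries divided by total tasks within each row group; averaging each session's own ratio instead would give slightly different numbers. Function names and the toy data are hypothetical.

```python
from collections import defaultdict


def group_label(num_tasks):
    """Map a session's task count to the row labels used in Table 3."""
    if num_tasks == 1:
        return "One task"
    if num_tasks == 2:
        return "Two tasks"
    return "Three or more tasks"


def queries_table(sessions):
    """sessions: iterable of (num_tasks, num_queries) pairs, one per session.

    Returns {row label: (queries per session, queries per task)}.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # sessions, tasks, queries
    for num_tasks, num_queries in sessions:
        row = totals[group_label(num_tasks)]
        row[0] += 1
        row[1] += num_tasks
        row[2] += num_queries
    return {label: (q / n, q / t) for label, (n, t, q) in totals.items()}


# Made-up sessions, not the study's data.
toy = [(1, 1), (1, 2), (2, 3), (3, 7)]
for label, (per_session, per_task) in queries_table(toy).items():
    print(f"{label}: {per_session:.2f} per session, {per_task:.2f} per task")
```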
Table 4 shows the average session length of monotasking and multitasking sessions and the time spent on each task.

# of tasks in a session | Average session length | Time spent per task
One task | 36m8s | 36m8s
Two tasks | 57m7s | 28m33s
Three or more tasks | 92m3s | 21m59s
Table 4: Average Session Length
The average length of monotasking sessions is 36 minutes and 8 seconds, whereas the average length of two-task sessions is 57 minutes and 7 seconds. Sessions with three or more tasks lasted 92 minutes and 3 seconds on average. Correlation analysis shows that session length is positively related to the number of tasks a user is dealing with in that session (correlation coefficient equals 0.346, P
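The per-task times in Table 4 and the correlation reported above can be illustrated with two small helpers. The sketch assumes the reported coefficient is Pearson's r computed over sessions (the source names only a "correlation coefficient") and that durations were truncated to whole seconds, which is consistent with halving 57m7s to the 28m33s shown for two-task sessions. The session data passed to `pearson` is made up, and both function names are hypothetical.

```python
import math


def fmt_duration(seconds):
    """Format seconds as e.g. '36m8s', truncating fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs}s"


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


two_task_session = 57 * 60 + 7             # 57m7s from Table 4
print(fmt_duration(two_task_session / 2))  # 28m33s

# Made-up sessions: number of tasks and session length in seconds.
tasks = [1, 1, 2, 1, 3, 2, 4]
lengths = [1800, 2600, 3100, 1500, 5200, 2900, 4800]
print(round(pearson(tasks, lengths), 3))
```

A rank-based coefficient such as Spearman's would also be defensible for a skewed variable like the number of tasks; Pearson is shown only because it is the most common default.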