Article
A learning approach for email conversation thread reconstruction
Journal of Information Science 39(6) 846–863 © The Author(s) 2013 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav DOI: 10.1177/0165551513494638 jis.sagepub.com
Mostafa Dehghani School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Iran
Azadeh Shakery School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Iran and School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Iran
Masoud Asadpour School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Iran and School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Iran
Arash Koushkestani School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Iran
Abstract
An email conversation thread is defined as a topic-centric discussion unit composed of emails exchanged among the same group of people by reply or forwarding. Detecting the conversation threads contained in email corpora can help both humans, in digesting the content of discussions, and automatic methods, in extracting useful information from the conversations. This research explores two new feature-enriched learning approaches, LExLinC and LExTreC, to reconstruct the linear and tree structures of conversation threads in email data. In this work, some simplifying assumptions made by previous methods for extracting conversation threads are relaxed, which makes the proposed methods more powerful in detecting real conversations. Additionally, the supervised nature of the proposed methods makes them adaptable to new environments by automatically adjusting the features and their weights. Experimental results show that the proposed methods are highly effective in detecting conversation threads and outperform existing methods.
Keywords Conversation thread reconstruction; email; information management; linear and tree structures of conversation threads; machine learning
Corresponding author: Azadeh Shakery, Room 517, School of ECE, College of Engineering, University of Tehran, North Kargar Ave, Tehran, Iran. Email: [email protected]

1. Introduction
Email has evolved as a convenient means of communication among people in recent years. It is a fast, efficient and inexpensive way of sharing information which offers a user the chance to discuss different topics with different people. Email data are valuable sources of information and analysing the discussions taking place can be beneficial for various groups of users and applications. A key difference between emails and other types of text documents is that emails are not completely independent of each other. An email could be a response to another email and thus, in order to understand the intention of a certain email, it is necessary to find the conversational information hidden in the previous related emails. In addition, users who want to obtain information from past discussions usually like to follow arguments in discussions. Therefore, detecting dependencies among emails can improve the quality of email data analysis. A conversation thread in an email corpus is formally defined as a set of exchanged emails on the same topic among the same group of people by
reply or forwarding [1]. Recently, conversation thread detection from email data has been profitably employed in several applications, like discussion search in public mailing lists [2–6], topic detection in email corpora [7], email summarization [8, 9], email classification [10, 11], question answering [12, 13], expert finding [14, 15], improvement of user experience in email management [16], visualization [17, 18], analysis of information sharing in interactive environments [19] and detection of argument patterns to evaluate the credibility of messages in conversations [20].

Two structures could be assumed for conversation threads: linear structure and tree structure. In linear structure, emails that belong to the same conversation are detected and arranged in chronological order, shaping a flat structure. On the other hand, since users in email conversations are allowed to choose any preceding email to reply to, many branches of discussion appear in a conversation and a conversation thread demonstrates a tree-shaped structure. In tree structure, conversation threads are shaped like a rooted tree in which the first email is the root and replies or forwards are shown as its children [2].

In this paper, we cast the problem of reconstructing email conversation threads as a supervised learning problem and propose two approaches for detecting conversation threads, LExLinC and LExTreC. In the former, we reconstruct the linear structure of conversation threads. To this end, we determine a number of email features that are capable of detecting co-thread relations between emails. Learning algorithms are exploited to classify emails based on these features, and conversation threads are extracted based on the detected relations. In the second approach, we consider a tree structure for emails. We introduce some new features that help us detect parent–child relationships between emails.
Then learning algorithms are exploited to arrange emails in a tree structure based on these relationships. In general, different attributes can be used as features for detecting co-thread or parent–child relationships between emails, for example, email subject, content, date and participants. Many previous works in this area have employed only a subset of these attributes. For example, since email subjects are usually the same within an email thread, some previous works have extensively used this feature for reconstructing conversation threads [21, 22]. These works first group emails based on their subjects and then analyse each group to extract the structure. However, the assumption that all emails in a conversation thread have the same subject is not always true. It is possible that users change the subject during the same conversation, or that two separate conversations share the same subject, especially for general subjects like meetings or appointments. Some other previous works try to reconstruct the tree structure of conversations for clusters of emails [2, 23, 24]. They attempt to find parent–child relationships among emails in the same cluster. Although these methods are able to reconstruct the tree structure of conversations, they fail when clusters should be merged or split to create real conversations. In our proposed approaches, we have considered all of these concerns, trying to exploit all available features without making any restricting assumptions. We have also defined and used some new features that are shown to significantly improve the quality of thread reconstruction. Specifically, we use named entities to estimate the similarity of email topics, and use the social network of people to estimate the closeness of email participants. Moreover, compared with methods that heuristically combine many different features, leveraging learning algorithms makes our method flexible and adaptive.
In our method, the learning algorithms judge the relative importance and effectiveness of each feature, instead of heuristically combining the features, and thus our method is capable of establishing a model that combines and uses the features efficiently. We evaluate our proposed methods on three real datasets: the BC3 corpus and two datasets extracted from the Apache Tomcat and Redhat Fedora public mailing lists. We compare the results with previous approaches for reconstructing conversation threads. The results indicate that our method outperforms existing methods both in finding emails in the same discussion thread and in reconstructing the structure of threads. The rest of the paper is organized as follows. In Section 2, we begin with an overview of related work. In Sections 3 and 4, we describe the details of our approach to thread reconstruction and the design of the features used in our models. Experimental results and evaluation analysis, along with some discussion, are presented in Sections 5 and 6. Finally, we draw conclusions and propose some research directions in Section 7.
2. Related work
Much research has been done on detecting conversation threads. Existing methods can be categorized into two groups: (1) metadata-based and (2) content-based. Metadata-based approaches [21, 25] use email header fields such as IN-REPLY-TO and REFERENCES to detect conversation threads. The Zawinski algorithm [25] is one of the most popular algorithms for email threading and is based on metadata information. However, since these header fields are optional for email clients, they are not always available and, in addition, such data are not capable of reconstructing all email conversations accurately. On the other hand, content-based approaches use the content of emails, including subject line, body
text, time and participant information, for reconstructing conversation threads. Content-based approaches again fall into two groups: the ones that only detect conversations and those that reconstruct the tree structure of conversations. The first group of content-based approaches only groups emails into conversations, paying no attention to the structure of conversations. Wu and Oard [22] use the subject lines of emails to detect conversation threads. Specifically, they cluster emails into the same conversation thread if the emails have the same subject line and have at least one participant in common. Wang et al. [26] extract email threads using the Zawinski algorithm and then merge or decompose extracted threads based on their subject lines in order to reconstruct the conversations. Cowan-Sharp [27] applies Latent Dirichlet Allocation to the problem of detecting topic and topic change in email conversation threads. In this work, it is demonstrated that Latent Dirichlet Allocation can be used to successfully classify email messages to threads. Erera and Carmel [1] cluster emails into coherent conversations exploiting a similarity function that considers all relevant email attributes, such as email subject, participants, date of submission and email content. The methods in the second group of content-based conversation thread reconstruction try to reconstruct tree structures of emails. Lewis and Knowles [28] are among these works. They regard email threading as a retrieval problem and study five retrieval strategies to indicate whether an email is a response to another one. Their results show that the most effective strategy is using the quotation of an email as query and the unquoted parts of other emails as documents. In their work, Lewis and Knowles focused only on the body text of emails and did not take advantage of other information available in emails. Another work in this category is Yeh and Harnly’s method [21]. 
In their work, they make the assumption that all emails in the same conversation have the same subject line and that the lifetime of a conversation is usually shorter than a fixed period. Thus they divide emails into a number of groups such that all emails in the same group have the same subject line and the maximum time difference between any two emails in a group is less than a fixed threshold. They then try to reconstruct the tree structure of email threads by detecting parent–child relations among emails in the same group. Although their method is successful in detecting tree structures, the assumptions they have made do not always hold, which is confirmed in their experiments. Recently, Joshi et al. [29] used segmentation and detection of near-duplicate emails to find and organize messages that should be grouped together based on reply and forwarding relationships. They make the assumption that, when an email replies to another email, the content of the latter exists in the former as quoted text in a separate segment. Thus they reconstruct conversation threads considering these segmentation patterns. Dehghani et al. [30] map the problem of reconstructing conversation threads to an optimization problem and propose an evolutionary algorithm to solve it. In their method, all main attributes of emails are aggregated in a fitness function. There is also some related research on chat [31, 32] and forum data [33] or other social web domains [34]. Wang et al. [23] use a graph-based connectivity matrix, in which both content similarity and temporal information are utilized to recover tree structures in newsgroup data given a discussion stream of messages. Their approach only recovers the tree structure of each thread and does not consider merging or splitting threads to reconstruct real conversations. In addition, they use a few simple similarity features, and parameters are manually tuned in their proposed method. Wang et al.
[24] propose a probabilistic model in the conditional random fields framework to predict the replying structure of online forum discussions. There is also some research work that employs conversation threads to improve forum retrieval and search [2, 35]. Seo et al. [2] investigate retrieval and learning techniques that exploit the hierarchical thread structures in community sites. They introduce structure discovery techniques that use a variety of features to model relationships among posts that are in the same thread. In fact, they make some restricting assumptions and try to reconstruct the tree structure of a thread from the flat structure of its posts. While these methods are effective in detecting threads, they might fail to detect all email conversations [29] or to reconstruct the true structures of the conversations [1]. Some of these methods have made assumptions that are not consistent with our problem definition, for example, considering the same subject line for all emails in the same conversation [21, 22], or assuming no need to merge or split groups of emails for reconstructing real conversation threads [2, 23, 24]. In this research, we concentrate on reconstructing email conversation threads without making any restricting or simplifying assumptions about the structure or content of conversations.
3. Conversation thread reconstruction
Mailing list archives commonly maintain or display the structure of email conversation threads in two ways: linear structure and tree structure. In linear structure, as the name implies, all emails that belong to the same thread are arranged in chronological order. In tree structure, emails are arranged in a tree based on their reply relationships in conversation threads. Figure 1 shows the linear and tree structures of a sample conversation thread. In this figure, each node represents an email. In Figure 1(A), edges represent time priorities, while in Figure 1(B), edges represent reply relations between emails: there is a directed edge from ei to ej if the child email ei is a response to the parent email ej.
Figure 1. A sample conversation thread.
Our main contribution in this paper is two-fold: we first propose an approach to reconstructing the linear structure of a conversation thread by detecting clusters of emails that belong to the same conversation thread and then sorting the emails in chronological order. We further propose an approach to extracting the relationships of emails in each thread and reconstructing the tree structure of conversation threads based on the extracted parent–child relations. In cases where the user is interested in seeing emails in the order they were written, the linear structure of a conversation thread can be helpful. However, based on this structure, the emails are presented strictly in chronological order, with no attention to the sender and receiver of each individual email. Tree structure, in comparison, allows the user to quickly appreciate the overall structure of a conversation, specifically who is replying to whom. As such, it is most useful in situations where there are complex debates in conversations. In long conversations, it quickly becomes impossible to follow the arguments without considering the tree structure of conversations. A downside of using tree structure is that the user is unable to follow emails in chronological order.
3.1. Learning to extract linear structures of conversations (LExLinC)
In our proposed approach, we have mapped the problem of reconstructing the linear structure of conversation threads to a graph clustering problem. In this mapping, we first create a semantic email network out of the set of emails: a weighted undirected graph whose nodes represent emails and edges represent co-thread relationships between emails. On the created graph, we are able to calculate measures like closeness, betweenness and other centrality measures, which are beneficial for extracting relationships among emails and for other purposes like detecting conversation focus of email discussions [36]. Clustering the emails in the created network will result in the conversation threads: any two emails in the same cluster belong to the same thread. Since we have a semantic network of emails, we can take advantage of graph clustering algorithms for this purpose. Figure 2 shows the steps of extracting the linear structure of conversation threads. In the first step, a network indicating the relationships between emails is created. This network is clustered into conversation threads in the second step. In the third step, emails in each cluster are arranged in chronological order to reveal the linear structure of conversation threads. The problem is now reduced to creating a semantic email network out of the set of emails that is capable of indicating the relationships between emails in terms of belonging to the same thread. To create this semantic email network, a naive approach would be to connect two emails in the network if their similarity is above a threshold, where the similarity is calculated as a function of the main features of the emails. The problem with this approach is that there is no easy way to determine the importance of each feature. To avoid such a limitation, we propose a feature-based learning method in our approach.
We propose to train a predictor to predict the co-thread relationship between two emails. There is an edge between two emails in the network if the predicted score is positive, with the weight of the edge being proportional to the predicted co-thread score. In the training phase, for any pair of emails in the training data, a feature vector composed of main attributes of the emails is generated. If two emails are co-thread, the corresponding pair is labelled 1; otherwise it is labelled 0. We will introduce the features in detail in Section 4.
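As an illustration of the three steps in Figure 2, the sketch below builds the semantic email network from a pairwise co-thread predictor and then clusters it. This is a minimal sketch, not the paper's implementation: `predict_score` is a placeholder for the trained classifier of Section 4, and connected components stand in for the (unspecified) graph clustering algorithm.

```python
from itertools import combinations

def build_cothread_graph(n_emails, predict_score):
    """Step 1: build the semantic email network. Nodes are email
    indices; an edge (i, j) exists when the trained pairwise predictor
    returns a positive co-thread score, which becomes the edge weight."""
    edges = {}
    for i, j in combinations(range(n_emails), 2):
        score = predict_score(i, j)
        if score > 0:
            edges[(i, j)] = score
    return edges

def cluster_threads(n_emails, edges):
    """Step 2: group emails into conversation threads. Connected
    components (via union-find) stand in for the graph clustering step;
    any weighted graph clustering algorithm could be substituted."""
    parent = list(range(n_emails))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    clusters = {}
    for i in range(n_emails):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

Step 3 is then simply a chronological sort of each cluster, e.g. `sorted(cluster, key=lambda i: dates[i])`, yielding the linear structure of each thread.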
Figure 2. Reconstructing the linear structure of conversation threads.
Figure 3. Reconstructing the tree structure of conversation threads.
3.2. Learning to extract tree structures of conversations (LExTreC)
In order to reconstruct the tree structure of a conversation, we first find paths of arguments in the conversation, and then integrate these paths to construct the conversation tree. An argument is a linear path from an email to the root of the conversation tree. These mined arguments are valuable and meaningful by themselves, since each path represents an independent discussion. Figure 3 shows the entire procedure of reconstructing the tree structure of a conversation thread. We employ a learning-to-rank approach to learn a model to find an argument path from each node to the root of the conversation tree. To train the model, we extract all argument paths from conversation trees in training data and compute the required features. Using the learned model, we can find the argument paths from different nodes to the root of the conversation tree in the test data. The final trees are then constructed using the most probable parent–child relationships extracted based on the argument paths. The details of the procedure are presented in the rest of this subsection.
3.2.1. Extracting paths of arguments. In order to extract the argument paths, we use a learning approach. We consider each argument path as an ordered list, and use learning-to-rank to train a model that enables us to recover ordered lists corresponding to paths in the test data. Learning-to-rank is widely used in various information retrieval tasks. This approach
is based on machine learning techniques and performs ranking based on a model learned from a training dataset. For information retrieval, features extracted from query–document pairs are used for learning a scoring function. The learned scoring function is then employed to retrieve and rank documents in response to a query [37]. To train the ranking model in our task, we extract all paths in the conversation trees in the training data. The path extraction is done simply by following tags representing reply relations between emails in the training data. To map the problem to a retrieval and ranking problem, each email in the training data is considered as a query (we call it Qmail) and all other emails are considered as documents (we call them Dmails). The relevant Dmails of a particular Qmail are its ancestors in the argument path. The ancestors of each Qmail node are sorted such that the direct father of the Qmail is the first in the ranked list and the root of the tree is the last. We then define a number of features representing parent–child relations between emails in argument paths. The details of the features are presented in Section 4. Finally, a learning-to-rank algorithm is used to learn a model to combine the extracted features. Extracting argument paths in the test data is thus reduced to finding ranked lists of relevant Dmails. We consider each email in the conversation tree as a Qmail, and extract the path from the direct father of the Qmail to the root of the tree. These extracted paths are combined in the next step to generate the conversation tree. Note that, in general, all the emails are ranked in response to a Qmail, but each argument path comprises a certain number of emails. Thus we have to cut the ordered list and discard emails with low relevance scores. We use a dynamic threshold for this purpose, considering only Dmails with scores higher than the threshold as relevant Dmails comprising the argument path.
In order to set the threshold, we simply find the maximum difference between consecutive Dmail scores in response to a Qmail, and cut the list at that point. It is noteworthy that using a dynamic threshold is indispensable: since the range of argument path scores is very wide, a fixed threshold will not work. In addition, the strength of the features used for learning the model leads to a big gap between the scores of relevant and irrelevant Dmails, with irrelevant Dmails receiving far lower scores than relevant ones. Therefore our simple technique for finding a dynamic threshold is effective. The left box in Figure 3 shows the whole procedure of extracting argument paths. In the sample shown, the test data contains 11 emails and each of these emails is considered as a Qmail in turn. In response to each Qmail, the relevant Dmails are sorted and cut using a dynamic threshold to generate the argument path of the Qmail.
3.2.2. From arguments to conversations. To reconstruct the tree structure of conversation threads using extracted argument paths, the straightforward method would be to take the first Dmail in the ranked list of each Qmail: if its score is higher than a fixed threshold and its date precedes that of the Qmail, it is considered the parent of the Qmail; otherwise, the Qmail is the root of the thread. Given all parent–child relations, construction of the tree would be trivial. In this method, it is assumed that a reply relationship between two emails is independent of the emails' grandparents and grandchildren and of other relations between emails. A similar approach is used by Seo et al. [2] for this task. Experimental results show that the accuracy of the conversations extracted using this method is sometimes poor because of ambiguities such as taking a sibling in the true conversation tree as the direct father of a node. To make the results more robust, in LExTreC we try to make better use of the information in the argument paths to extract parent–child relations.
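The dynamic-threshold cut of Section 3.2.1 can be sketched as follows. The input is assumed to be a list of (Dmail, score) pairs already sorted by descending relevance score; the list is truncated at the largest gap between consecutive scores.

```python
def cut_at_largest_gap(ranked_dmails):
    """Cut a descending-sorted list of (dmail, score) pairs at the point
    of maximum difference between consecutive scores, keeping only the
    Dmails above the largest gap as the argument path."""
    if len(ranked_dmails) < 2:
        return [d for d, _ in ranked_dmails]
    gaps = [ranked_dmails[k][1] - ranked_dmails[k + 1][1]
            for k in range(len(ranked_dmails) - 1)]
    cut = gaps.index(max(gaps)) + 1  # keep everything before the widest gap
    return [d for d, _ in ranked_dmails[:cut]]
```

For example, with scores 9.5, 9.1, 2.0, 1.8, the widest gap (9.1 to 2.0) separates the two relevant Dmails from the rest, matching the observation above that relevant and irrelevant Dmails are separated by a large score gap.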
The idea is to consider all the relevant Dmails of a Qmail as its potential parents and make use of implicit information between Dmails in extracted argument paths to refine the parent–child relations. By an explicit relation, we refer to an extracted relationship between a Qmail and one of its relevant Dmails; an implicit relation denotes an extracted relationship between two consecutive Dmails in response to a Qmail. To better illustrate the relation between arguments and conversations, consider Figure 4. Figure 4(A) shows the tree structure of a conversation thread. The arcs labelled Arg.# show argument paths from emails to the root email. Figure 4(B–D) shows the corresponding argument paths extracted using our proposed method. In these figures, nodes represent emails, straight edges labelled W indicate the explicit parent–child relations between emails, and dotted edges labelled W′ indicate the implicit parent–child relations. As seen in Figure 4(D), the extracted argument paths are not always accurate. For example, in this figure, since e2 and e3 are close in time, e2 is given a high score when e3 is considered as the Qmail, and it will be incorrectly recognized as the direct father of e3 if we just focus on this argument path for extracting parent–child relations. However, the correct parent–child relation between e1 and e3 can be indirectly inferred from the implicit relations between e1 and e3 in the argument paths extracted when e5 and e6 are considered as Qmails. Thus we can improve the accuracy of extracted parent–child relations using these implicit relations.

Figure 4. Reconstructing conversations from arguments.

To measure the quality of implicit parent–child relations, the explicit similarity scores, named W_{i,j,k} in Figure 4, are unfolded into the feature space considering the learned weights of the features, and implicit relations are then validated using dot products of email feature vectors. Let v_{i,j,k} denote the feature vector describing the relation between e_i as a child and e_j as the parent when e_k is considered as the Qmail, with W_{i,j,k} being the learned model's score for v_{i,j,k}. The reason that we unfold the similarity scores into feature vectors instead of directly using the Ws is to better estimate these similarities in the feature space. For example, if e_i is similar to e_j in terms of time and similar to e_k in terms of content, intuitively there is no similarity between e_j and e_k regarding e_i, and using the dot product of v_{i,j,i} and v_{i,k,i} truly leads to a low similarity between e_j and e_k. Consider the case where e5 is a Qmail and e3, e1, e0 are the Dmails extracted on the path from e5 to the root of the tree (Figure 4(C)). On top of the explicit information in this path, we can implicitly consider e1 as the parent of e3 regarding W′_{3,1,5}, which is inferred from the existing explicit feature vectors. In order to evaluate the quality of this extra information, we calculate W′_{3,1,5} as the dot product of v_{5,3,5} and v_{5,1,5}. After extracting all implicit parent–child relations, in order to determine the direct father of each email eQ, the similarity between each email eD and email eQ is calculated as:

S_{QD} = W_{Q,D,Q} + Σ_i W′_{Q,D,i}

where S_{QD} indicates the similarity between eQ as a child and eD as a parent. Finally, PQ is selected as the direct father of eQ, where:

PQ = e_j, with j = argmax_i S_{Qi}
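Assuming the explicit scores W_{Q,D,Q} and implicit scores W′_{Q,D,i} have already been computed, the parent-selection rule above amounts to the following sketch. The dictionary-based data structures are hypothetical, not from the paper.

```python
def select_parents(explicit, implicit):
    """explicit[(q, d)]: score W_{q,d,q} of d as a potential parent of q
    from q's own argument path. implicit[(q, d)]: list of W'_{q,d,i}
    values inferred from other Qmails' argument paths. The combined
    similarity is S_{QD} = W_{Q,D,Q} + sum_i W'_{Q,D,i}, and the direct
    father of each q is the d maximizing S_{QD}."""
    parents = {}
    qmails = {q for q, _ in explicit}
    for q in qmails:
        best_d, best_s = None, float("-inf")
        for (qq, d), w in explicit.items():
            if qq != q:
                continue
            s = w + sum(implicit.get((q, d), []))
            if s > best_s:
                best_d, best_s = d, s
        parents[q] = best_d
    return parents
```

In the Figure 4 example, even if e2 explicitly outscores e1 as a parent of e3, the implicit scores W′_{3,1,5} and W′_{3,1,6} accumulated from other argument paths can push e1 ahead, recovering the correct relation.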
Experimental results confirm the effectiveness of this method, especially when siblings make it hard to detect the correct parent–child relations.
4. Features
In this section, we introduce the features we have employed to extract relations between emails. We have selected these features with regard to four main aspects of email data: content, subject, date and participants. It is noteworthy that the definitions of some of these features differ between reconstructing the linear structure and the tree structure of conversations; these differences are mentioned when defining the features.
4.1. Email content
The first seven features are related to the similarity of email contents. Clearly, when two emails are written along the same discussion, they usually contain similar texts and use similar words. Thus, we can define features based on the content similarity of emails.
4.1.1. Email text similarity. When creating a reply email, many email clients automatically quote the content of the original email in the reply. Thus, the content of an email can be split into its quoted and unquoted parts. To measure the content similarity of two emails, we have employed different approaches when reconstructing the linear or tree structure
of conversations. When reconstructing the linear structure of a conversation, we consider the whole content of the emails to measure their similarity. In reconstructing the tree structure of a conversation, if the Qmail contains quoted text, the quoted part is considered as the query; otherwise, the whole text is given as the query. For Dmails, the whole content is considered as the document. The reason for this choice is that, in reconstructing the tree structure of conversations, the features should predict parent–child relations, and the quoted part of a child email is usually a near duplicate of the content of its parent. We employ five different information retrieval techniques to measure the content similarities, which introduce five different features. These techniques include a vector space model with pivoted normalization [38], Okapi BM25 [39], and KL-divergence between the statistical language models of two emails with three different smoothing methods: absolute discounting, Dirichlet prior smoothing and Jelinek–Mercer smoothing [40].
4.1.2. Email named entity similarity. Named entities are words or phrases that refer to various entities of interest, including persons, locations and organizations. Since emails are usually about specific issues, in many cases they contain named entities in their texts. If two emails talk about the same subject and are in the same conversation, it is probable that they contain common named entities, especially if one is replying to the other. To find named entities in the text of emails, we employ the Stanford University named entity recognizer tool [41]. We use the Dice similarity of the sets of named entities to measure the similarity of two emails:

Sim_NamedEntity(e1, e2) = 2|NE_{e1} ∩ NE_{e2}| / (|NE_{e1}| + |NE_{e2}|)
where NE_e indicates the set of named entities of email e. This feature is calculated in the same way for constructing both the linear and tree structures of conversations.
4.1.3. Speech act. For the reconstruction of conversation threads, it is useful to classify emails according to the intent of the sender. Speech acts are used to investigate the intent of the writer in conversational text. In a formal definition, a speech act is an act that is performed by making an utterance, defined in terms of a speaker's intention; it provides a function in communication that has an effect on listeners [42]. Usually a sequential nature exists between the speech act of an email and the speech acts of its children. For example, an email with a Request speech act is mostly coupled with a reply email with a Deliver speech act. We use this feature to improve the effectiveness of detecting reply relations in the reconstruction of the tree structure of conversation threads. For automatic speech act labelling at the email level, we use Ciranda [43] to label emails as Request, Propose, Commit, Data and Meeting. Very briefly, a Request email asks the recipient to perform some action; a Propose email proposes a joint activity (i.e. asks the recipient to perform some action and commits the sender); a Commit email commits the sender to some future sequence of action; Data is information, or a pointer to information, delivered to the recipient; and a Meeting is a joint activity in a specific time and/or specific space. We estimate the speech act score of each pair of Qmail and Dmail by estimating the conditional probability of a Dmail with speech act SA1 (D_SA1) being the parent of a Qmail with speech act SA2 (Q_SA2). This probability is estimated using the maximum likelihood estimation method as:

P(D_SA1 | Q_SA2) = P(D_SA1 ∩ Q_SA2) / P(Q_SA2) ≈ N_{SA2→SA1} / N_{SA2}
where N_{SA_2→SA_1} is the number of child emails in the training data with speech act label SA_2 whose parents are labelled as SA_1, and N_{SA_2} is the total number of child emails in the training data with speech act SA_2.
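A sketch of this maximum likelihood estimate, assuming the labelled parent–child pairs are available as (parent_act, child_act) tuples (a hypothetical input format; the paper obtains the labels with Ciranda):

```python
from collections import Counter

def estimate_speech_act_scores(labelled_pairs):
    """MLE of P(parent has act SA1 | child has act SA2) from training pairs."""
    pair_counts = Counter()   # N_{SA2 -> SA1}
    child_counts = Counter()  # N_{SA2}
    for parent_act, child_act in labelled_pairs:
        pair_counts[(child_act, parent_act)] += 1
        child_counts[child_act] += 1
    # conditional probability estimated by relative frequency
    return {
        (child, parent): n / child_counts[child]
        for (child, parent), n in pair_counts.items()
    }

# toy training data: two Deliver replies to Request parents, one to a Propose
scores = estimate_speech_act_scores([
    ("Request", "Deliver"), ("Request", "Deliver"), ("Propose", "Deliver"),
])
```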
4.2. Email subject

The next feature we employ measures the similarity of email subjects. To measure the subject similarity between two emails, in reconstructing both the linear and tree structures of conversations, we first canonize the subjects: the parts of the subject that are not generated by humans, like fw: and re:, are removed. The normalized word overlap is then used as the similarity measure. Let S_e be the set of words belonging to the canonized subject of email e. Subject similarity is defined as:
Journal of Information Science, 39(6) 2013, pp. 846–863 Ó The Author(s), DOI: 10.1177/0165551513494638
Sim_subject(e_1, e_2) = 2|S_{e_1} ∩ S_{e_2}| / (|S_{e_1}| + |S_{e_2}|)
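A sketch of the canonization and similarity steps; the exact prefix list is an assumption, since the paper names only fw: and re::

```python
import re

# strip machine-generated reply/forward prefixes (assumed list: re, fw, fwd)
_PREFIX = re.compile(r"^\s*(re|fw|fwd)\s*:\s*", re.IGNORECASE)

def canonize(subject):
    """Remove reply/forward prefixes, possibly stacked, and normalize case."""
    prev = None
    while prev != subject:
        prev, subject = subject, _PREFIX.sub("", subject)
    return subject.strip().lower()

def subject_similarity(subj1, subj2):
    """Dice coefficient over the word sets of the canonized subjects."""
    s1, s2 = set(canonize(subj1).split()), set(canonize(subj2).split())
    if not s1 and not s2:
        return 0.0
    return 2 * len(s1 & s2) / (len(s1) + len(s2))
```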
4.3. Email date

One of the most important features we use is the date of emails. We have used the date similarity function proposed by Erera and Carmel [1] with a slight modification. Assuming that a reasonable upper bound on response time can be specified, Erera and Carmel use a fixed time window of length maxdiff, above which the date similarity is zero. To reconstruct the linear structure of conversations, we use the same similarity function:

Sim_date(e_1, e_2) = 1 − min(1, |d_{e_1} − d_{e_2}| / maxdiff)

where d_e is the date of email e. For reconstructing the tree structure of conversations, this feature must be modified to give a high penalty if the Dmail's date is after the Qmail's date:

Sim_date(Qmail, Dmail) = 1 − min(1, (d_Q − d_D) / maxdiff)   if d_Q − d_D ≥ 0
                       = −penalty                            if d_Q − d_D < 0
We have used penalty = 10 in our experiments.
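Both variants can be sketched as follows, with dates as numeric timestamps; the sign convention on the penalty branch is our reading of the formula:

```python
def date_similarity_linear(d1, d2, maxdiff):
    """Linear-structure date similarity (fixed window, Erera and Carmel)."""
    return 1 - min(1, abs(d1 - d2) / maxdiff)

def date_similarity_tree(d_q, d_d, maxdiff, penalty=10):
    """Tree-structure variant: a candidate parent (Dmail) dated after the
    child (Qmail) is heavily penalized; penalty=10 as in the experiments."""
    diff = d_q - d_d
    if diff >= 0:
        return 1 - min(1, diff / maxdiff)
    return -penalty
```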
4.4. Email participants

Our next two features evaluate the relationship among people who participate in a conversation. We consider both the local closeness of participants and their global closeness. In other words, in addition to considering the relation of people due to their participation in the conversation, we consider their closeness with respect to their communication social network. Since the meanings of these two features in reconstructing linear and tree structures are close, they are calculated identically for both cases. To estimate the similarity of two emails in terms of their participants' local closeness, we use Erera and Carmel's method [1]. They use a variant of Dice similarity, taking the activity role of participants into consideration. In addition to local closeness, we measure the global closeness of participants of two emails as well. The idea is that people who are closer to each other in terms of communication are more likely to contribute to a specific conversation. For estimating closeness of participants, we use their social communication network. First we create a directed graph representing the social network of email communication. In this graph, nodes represent participants, and for each email in the dataset, the node that represents the sender of the email is connected to the nodes that represent the recipients of that email. We can estimate the closeness of two nodes in this graph using different social network similarity measures. In this paper, we exploit neighbourhood overlap, a simple measure of the closeness of two nodes, for this purpose. Let adj(p) denote the set of all neighbours of node p, including p itself. The closeness of two nodes is defined as:

Closeness(p_i, p_j) = |adj(p_i) ∩ adj(p_j)| / |adj(p_i) ∪ adj(p_j)|

The similarity of two emails regarding the global closeness of their participants is calculated as:

Sim_socialnetwork(e_1, e_2) = ( Σ_{p_i ∈ P_{e_1}} Σ_{p_j ∈ P_{e_2}} Closeness(p_i, p_j) ) / ( 2 log |P_{e_1} ∪ P_{e_2}| )
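A sketch of the neighbourhood-overlap closeness and the resulting email similarity; treating the communication graph as undirected for the neighbour sets, and guarding the log denominator, are simplifying assumptions:

```python
import math
from collections import defaultdict

def build_adjacency(emails):
    """Neighbour sets from (sender, recipients) pairs; adj(p) includes p itself."""
    adj = defaultdict(set)
    for sender, recipients in emails:
        for r in recipients:
            adj[sender].add(r)
            adj[r].add(sender)
    for p in list(adj):
        adj[p].add(p)
    return adj

def closeness(adj, p_i, p_j):
    """Neighbourhood overlap of two participants."""
    union = adj[p_i] | adj[p_j]
    return len(adj[p_i] & adj[p_j]) / len(union) if union else 0.0

def social_network_similarity(adj, parts1, parts2):
    """Global-closeness similarity of two emails' participant lists."""
    n_union = len(set(parts1) | set(parts2))
    if n_union < 2:
        return 0.0  # guard: log(1) = 0 would divide by zero
    total = sum(closeness(adj, p_i, p_j) for p_i in parts1 for p_j in parts2)
    return total / (2 * math.log(n_union))

# toy network: a wrote to b and c, b wrote to c
adj = build_adjacency([("a", ["b", "c"]), ("b", ["c"])])
```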
Experimental results show that all these features are effective, and exploiting learning algorithms to learn to properly combine these features and use them together can significantly improve the quality of conversation thread reconstruction.
5. Experiments

In this section, the experimental results are presented to evaluate the effectiveness of the proposed methods. After introducing the test collections used in these experiments, we explain the evaluation measures, and then describe and analyse the results and compare our methods with some other research work in this area.
Table 1. Dataset statistics.

Dataset | Number of emails | Number of threads | Average number of emails per thread | Number of emails in the longest thread | Maximum node degree | Maximum conversation tree depth
BC3     | 261              | 40                | 6.53                                | 11                                     | 6                   | 6
Apache  | 2185             | 258               | 8.47                                | 177                                    | 38                  | 41
Redhat  | 10427            | 842               | 12.384                              | 89                                     | 17                  | 34
Figure 5. BC3 dataset: (A) tagged conversation threads; (B) social network of email communications.
5.1. Datasets

In this study, we have performed experiments on three different datasets. The first dataset is the BC3 corpus [44], which is a subset of W3C [45]. The W3C corpus was crawled from w3c.org (the World Wide Web Consortium's website). This dataset contains thread information, as well as human-written abstract summaries for the emails and speech act labels. The text content of BC3 emails has a total of 3222 sentences. This corpus consists of 162 IDs. The tagged conversation threads of the BC3 corpus are shown in Figure 5(A). We have made use of the social network extracted from W3C instead of the social network extracted from BC3 itself. The reason is that we want to analyse all communications, even indirect ones, to compute the features; some of these relationships exist in W3C but may have been omitted in BC3. The social network of email communications in the BC3 corpus is shown in Figure 5(B). This network contains 6807 nodes and 18,247 edges. The second dataset we have used in these experiments is a subset of the Apache Tomcat public mailing list (http://tomcat.apache.org/mail/dev). This dataset contains the discussions of the mailing list from August 2011 to March 2012. We have removed small threads, i.e. those that contain fewer than four emails. The total number of people who participated in these discussions is only 80, which is much lower than in the BC3 corpus. The third dataset is a subset of the Fedora Redhat Project public mailing list (https://www.redhat.com/archives/fedora-devel-list). This dataset contains discussions that took place in the first six months of 2009. We eliminated all threads with fewer than four emails. A total of 675 people participated in these discussions. Table 1 shows some statistics about these three datasets. An important point about these datasets is that Apache and Redhat are public mailing lists and are similar to forums; hence in most cases emails have been sent to a central email address.
In this aspect, the social network of people
extracted from these datasets is different from that of typical email data. However, BC3 is an email dataset and meets all of the email specifications we have mentioned. This difference between the datasets leads to different results when the introduced methods are applied to each of them.
5.2. Evaluation metrics

In order to evaluate the effectiveness of our method in reconstructing the linear structure of conversation threads, we have used Precision, Recall and Rand index as evaluation metrics. To compute these measures, each pair of emails is considered in turn. If the two emails belong to the same thread in both the tagged conversations and the inferred conversations, we have a true positive (TP) case. A false positive (FP) case is when the two emails do not belong to the same thread in the tagged conversations, but are labelled as co-threads in the inferred conversations. A false negative (FN) case is when the two emails belong to the same thread in the tagged conversations but are not co-threads in the inferred conversations. A true negative (TN) case is when the two emails do not belong to the same conversation, either in the tagged conversations or in the inferred conversations. Precision, Recall and Rand index are defined as:

Precision = TP / (TP + FP);  Recall = TP / (TP + FN);  Rand index = (TP + TN) / (TP + TN + FP + FN)
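These pairwise counts can be computed directly from two thread assignments; the mapping from email id to thread id is an assumed input format for illustration:

```python
from itertools import combinations

def pairwise_metrics(gold, predicted):
    """Precision, recall and Rand index over all email pairs.

    gold, predicted: dicts mapping each email id to its thread id.
    """
    tp = fp = fn = tn = 0
    for e1, e2 in combinations(sorted(gold), 2):
        same_gold = gold[e1] == gold[e2]
        same_pred = predicted[e1] == predicted[e2]
        if same_gold and same_pred:
            tp += 1
        elif not same_gold and same_pred:
            fp += 1
        elif same_gold and not same_pred:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    rand = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, rand
```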
In order to assess the ability of the proposed method to reconstruct trees of conversation threads, a comprehensive evaluation has been performed in this work. Some of the previous studies on conversation thread reconstruction have only employed the accuracy of the predicted edges, defined as the proportion of correctly predicted edges (Acc_edge), as the evaluation criterion [26, 30]. However, Wang et al. [24] argued that such a measure is not sufficient to judge the predicted structure: it only evaluates pointwise predictions on each node, but totally ignores the predicted structure. They defined a set of metrics to address this problem and evaluate the quality of the predicted structure more appropriately with respect to the ground-truth conversation threads. The first metric, Acc_path, defined as the correct portion of the paths from each node to the root, measures whether we can read from the root to a particular email without missing an email or meeting an irrelevant email from another branch. Acc_path is defined as:

Acc_path = (1/n) Σ_{i=1}^{n} δ[Path_GT(i) = Path_IC(i)]

where Path_GT(i) and Path_IC(i) indicate the set of nodes lying on the path from node i to the root of the conversation in the ground-truth and the inferred conversation, respectively, and n is the number of emails in the dataset. This metric is highly strict in matching the whole path, and the authors also present a relaxed version which computes the overlap between the path from a node to the root in the ground-truth and the path from the same node to the root in the inferred conversation. Path precision P_path and recall R_path are then defined as:

P_path = (1/n) Σ_{i=1}^{n} ||Path_GT(i) ⊂ Path_IC(i)||;  R_path = (1/n) Σ_{i=1}^{n} ||Path_IC(i) ⊂ Path_GT(i)||
In fact, these path-based metrics emphasize the correct prediction of nodes at higher levels of the tree. Another important aspect that should be evaluated is how well the local structure of a node is preserved. When a node is a branching node, it is crucial to recover all its child nodes to keep the correct track of its sub-trees. Therefore the authors have also defined node precision P_node and node recall R_node as the overlap between the sets of children of each node in the ground-truth and inferred conversations:

P_node = (1/n) Σ_{i=1}^{n} ||Child_GT(i) ⊂ Child_IC(i)||;  R_node = (1/n) Σ_{i=1}^{n} ||Child_IC(i) ⊂ Child_GT(i)||

where Child_GT(i) and Child_IC(i) indicate the set of children of node i in the ground-truth and the inferred conversation, respectively.
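The path metrics can be sketched as follows, reading the overlap notation ||A ⊂ B|| as |A ∩ B| / |B| (our interpretation of the relaxed overlap); the node metrics are computed analogously over child sets. The parent-dict input format is an assumption for illustration:

```python
def path_to_root(parent, node):
    """Set of nodes on the path from `node` to the root (inclusive)."""
    path = {node}
    while parent.get(node) is not None:
        node = parent[node]
        path.add(node)
    return path

def path_precision_recall(parent_gt, parent_ic):
    """P_path and R_path averaged over all nodes.

    parent_gt, parent_ic: dicts mapping each email to its parent
    (None for roots).
    """
    nodes = list(parent_gt)
    p = r = 0.0
    for i in nodes:
        gt, ic = path_to_root(parent_gt, i), path_to_root(parent_ic, i)
        p += len(gt & ic) / len(ic)  # fraction of inferred path that is correct
        r += len(gt & ic) / len(gt)  # fraction of true path that is recovered
    n = len(nodes)
    return p / n, r / n
```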
5.3. Experiment results

In all experiments, we conducted 5-fold cross-validation for evaluating the proposed methods. We evaluate the proposed methods for reconstructing linear and tree structures of conversation threads separately. In each model, results of some
Table 2. Evaluation results of reconstructing the linear structure of conversation threads.

                               | BC3                             | Apache                          | Redhat
Methods                        | Precision | Recall | Rand index | Precision | Recall | Rand index | Precision | Recall | Rand index
Wu and Oard [22]               | 0.601     | 0.625  | 0.690      | 0.406     | 0.457  | 0.542      | 0.498     | 0.526  | 0.554
LExLinC (subject/participants) | 0.625     | 0.669  | 0.733      | 0.55      | 0.601  | 0.631      | 0.520     | 0.554  | 0.601
Erera and Carmel [1]           | 0.891     | 0.903  | 0.928      | 0.771     | 0.705  | 0.861      | 0.808     | 0.832  | 0.892
LExLinC (all features)         | 0.992     | 0.972  | 0.988      | 0.854     | 0.824  | 0.936      | 0.88      | 0.89   | 0.943
previous works are presented for comparison. Our evaluation process is based on comparing the tagged conversation threads in the datasets with the conversations inferred by our algorithms.

5.3.1. Extracting linear structure of conversations. In order to compare the results of the proposed LExLinC method with other existing approaches in this research area, we have applied the methods proposed by Wu and Oard [22] and by Erera and Carmel [1] to each dataset. Wu and Oard group emails into the same conversation thread if they have the same subject line and at least one sender/recipient in common. Erera and Carmel begin by grouping messages with identical subjects into candidate threads. These threads are then broken into sub-conversations according to the email similarities, and similar sub-conversations are grouped together to form conversations. Erera and Carmel's approach is in fact very similar to our proposed approach in terms of the leveraged features, and hence we have compared the results of our approach with these methods. In addition, we have tested LExLinC in a special case that uses only the subjects and participants of emails as features. We have tested four conventional methods for the prediction part of LExLinC: SVM, KNN, Bayesian and a predictor based on neural networks [46]. Among these methods, SVM led to the best results; thus we only report the results that employ SVM. In addition, we use a graph clustering algorithm, PPC [47], in LExLinC to partition the generated email graph to separate email conversation threads. Table 2 shows the results on the three datasets. As can be seen from the table, LExLinC outperforms all other methods on all three datasets. Performance of LExLinC on the Apache dataset is poorer than on the other two datasets.
Since in Apache users participate in discussions more actively, the dataset contains several threads with many emails, and long conversations often digress gradually, clustering all emails in long threads is rather difficult. On the other hand, the results of LExLinC using only subject and participants as features indicate that LExLinC's improvements over Wu and Oard's method do not depend only on the type of features. Even when LExLinC uses the same features as the other methods, it outperforms them owing to its ability to balance the effects of different features by learning the best way of combining them. We also conducted a statistical significance test (t-test) on the improvements of LExLinC over Erera and Carmel's methods and on the improvements of LExLinC using only subject and participants over Wu and Oard's method. The results indicate that, in both cases, the improvements of LExLinC in terms of precision and recall are statistically significant (p-value < 0.000005).

5.3.2. Extracting tree structure of conversations. In the next set of experiments, the performance of LExTreC for reconstructing the tree structure of conversations is evaluated. To compare LExTreC's performance with other methods, we have tested on the three datasets some previous works that try to reconstruct the tree structure of conversation threads in email corpora. Lewis and Knowles [28] studied five retrieval strategies to indicate whether an email is a response to another one. Their results showed that the most effective strategy is using the quotation of one email as a query and the unquoted parts of other emails as documents. Joshi et al. [29] used segmentation and detection of near-duplicate emails to find and organize messages that should be grouped together based on reply and forwarding relationships. Seo et al. [2] investigated retrieval and learning techniques that exploit the hierarchical thread structures in forum data.
They introduced structure discovery techniques that use a variety of features to model relations between posts that are in the same thread. In fact, they tried to reconstruct the tree structure of a thread from the flat structure of its posts. We use their idea on email datasets, but with different features related to email data. To investigate the performance of learning-to-rank algorithms, we have employed four different algorithms that have been shown to be among the best techniques on numerous test collections: Ranking SVM, AdaRank, RankNet and
Table 3. Evaluation results of reconstructing tree structure of conversation threads on the BC3 dataset.

Methods                | Acc_edge | Acc_path | P_path | R_path | P_node | R_node
Lewis and Knowles [28] | 0.410    | 0.315    | 0.331  | 0.423  | 0.382  | 0.410
Joshi et al. [29]      | 0.881    | 0.800    | 0.891  | 0.714  | 0.879  | 0.765
Seo et al. [2]         | 0.961    | 0.914    | 0.887  | 0.851  | 0.902  | 0.911
LExTreC                | 0.970    | 0.961    | 0.98   | 0.94   | 0.989  | 0.966
Table 4. Evaluation results of reconstructing tree structure of conversation threads on the Apache dataset.

Methods                | Acc_edge | Acc_path | P_path | R_path | P_node | R_node
Lewis and Knowles [28] | 0.201    | 0.191    | 0.303  | 0.330  | 0.288  | 0.283
Joshi et al. [29]      | 0.711    | 0.703    | 0.761  | 0.611  | 0.720  | 0.599
Seo et al. [2]         | 0.681    | 0.686    | 0.724  | 0.778  | 0.708  | 0.685
LExTreC                | 0.787    | 0.770    | 0.856  | 0.890  | 0.76   | 0.780
Table 5. Evaluation results of reconstructing tree structure of conversation threads on the Redhat dataset.

Methods                | Acc_edge | Acc_path | P_path | R_path | P_node | R_node
Lewis and Knowles [28] | 0.242    | 0.239    | 0.315  | 0.293  | 0.317  | 0.294
Joshi et al. [29]      | 0.797    | 0.707    | 0.832  | 0.641  | 0.861  | 0.719
Seo et al. [2]         | 0.702    | 0.632    | 0.781  | 0.763  | 0.816  | 0.810
LExTreC                | 0.831    | 0.802    | 0.893  | 0.930  | 0.892  | 0.908
RankBoost [38]. The performances of all of these algorithms were almost the same, but Ranking SVM led to relatively better results. Ranking SVM tries to accurately rank the documents at the top of the rankings, owing to their importance for retrieval systems. Since in our task this issue is very important, Ranking SVM learned relatively better models. Evaluation results of reconstructing the tree structure of conversation threads on the three datasets are presented in Tables 3–5. Statistical t-tests indicate that the improvements of our method in terms of precision and recall over all methods on all three datasets are statistically significant (p-value < 0.0005). Comparing the results in Tables 3–5, it can be seen that the results on the BC3 and Redhat datasets are better than those on the Apache dataset. In the Apache dataset, structures of conversation threads are very complicated, making the detection of the exact structure of threads harder. On the other hand, in the Apache dataset there exist some cases where an email has many children, especially the root emails. In many cases, users do not reply to the email that they really want to reply to; instead they reply to another email, usually the root email. In some of these cases, LExTreC detects the true location of emails in the tree; however, the ground-truth is based on tags which are generated automatically from users' actions. Nevertheless, it is obvious that LExTreC significantly outperforms the previous methods on all three datasets. The most important factor behind this significant accuracy, especially in the path-based metrics, is that we focus on creating paths and try to reconstruct the whole tree structure as accurately as possible. Lewis and Knowles just used the text content of emails and thus failed to reconstruct the correct structure. Joshi et al. made the assumption that, when an email replies to another email, the content of the latter is quoted in the former in a separate segment. This assumption is not always true: in the BC3 dataset, nearly 30% of replies do not contain quoted text, so Joshi et al.'s method cannot obtain a high recall. By error examination of Seo et al.'s method, we found that their method in some cases incorrectly takes sibling nodes as parents, since they only take direct similarities into account.
6. Discussion

In this section, we try to address a number of important issues about our proposed approach. We will first present a brief discussion of the time complexity of LExLinC and LExTreC. We then discuss the reasons why the ground-truth might
Table 6. Effect of textual features – BC3 dataset.

                              | LExLinC             | LExTreC
                              | F1    | Rand index  | F1_path | F1_node
Full features                 | 0.982 | 0.988       | 0.961   | 0.978
Without text-based features   | 0.783 | 0.817       | 0.734   | 0.722
With only text-based features | 0.590 | 0.616       | 0.552   | 0.554
not indicate actual conversations in some cases. We will continue with a comprehensive study of the role of textual features of email contents in reconstruction of conversation threads. Finally, we briefly discuss cross-learning for situations where training data are insufficient.
6.1. A brief discussion of time complexity

The time complexities of the training stages for LExLinC and LExTreC are almost the same. Having n emails, the time complexity of LExLinC's feature extraction is O(n²). The time complexity of SVM training depends on the number of support vectors, which is small compared with n, so the complexity of the whole training stage is O(n²). This is also the case for the training stage of LExTreC, since the time complexity of path extraction is O(n). The testing stage of LExLinC has a time complexity of O(n²) for feature extraction, plus O(n) for clustering using the PPC method, in addition to O(n log n) for sorting in chronological order, which yields an overall complexity of O(n²). The time complexity of LExTreC is O(n²) for feature extraction plus the overhead for calculating the implied relationships. This overhead equals O(n²d³), where d indicates the depth of the deepest discussion thread: we have n emails and, for each email, we can get d² parent–child relationships by dot product; then we have to search for each of these d² vectors in all n·d vectors for the entire set of emails in order to change the original vectors. Usually d does not grow with n and is always far smaller than n. For example, the number of emails in the Redhat dataset is five times that of the Apache dataset, but the value of d is smaller than in Apache (Table 1). In the worst case, which hardly ever happens, all of the emails in a dataset are in one thread and this thread has a linear structure. Ignoring the worst-case scenario, the total time complexity of the testing stage of LExTreC can practically be estimated as O(n²).
6.2. A note on the imperfection of ground-truth

In the evaluation phase of this study, three different datasets have been used. The structures of the ground-truth conversations in these datasets are based on In-Reply-To tags. The most important difference among the datasets is the way they were annotated. In the BC3 corpus, these tags are assigned manually, so the tagged conversations are real conversations. In contrast, in the Apache and Redhat datasets, these tags are assigned automatically. Therefore, unlike BC3, some tagged conversations must be split or merged to form real conversations. As an illustration, consider the following scenarios. In open source communities, it is common for a group of experts to participate in discussions on approximately similar topics. Consequently, there might exist more than one tagged conversation with almost the same group of people (i.e. experts) trying to solve similar problems. According to the formal definition of conversation threads, which describes a conversation as a set of messages passed among the same group of people on the same topic, these threads must be considered a single conversation. On the other hand, people may start a different discussion as a reply to an outdated conversation, for example to avoid retyping addresses. In these situations, automatically tagged conversations must be split into two distinct conversations, as they have different topics even though their participants could be the same. Our method distinguishes these situations and merges or splits conversations appropriately. In some cases these issues cause differences between the ground-truth and the results of our proposed methods.
6.3. Effect of email text in reconstructing conversation threads

Text content of emails is an important attribute in determining their topics. Since all emails in a conversation are composed around the same topic, it could be beneficial to leverage their text when extracting conversations. Here, we study the effect of the presence or absence of text-based features in reconstructing conversation threads. Table 6 presents the
Table 7. Comparison of different textual features – BC3 dataset.

             | Linear structure   | Tree structure
Methods      | F1    | Rand index | F1_path | F1_node
VS_Pivoted   | 0.413 | 0.441      | 0.383   | 0.408
BM25         | 0.484 | 0.471      | 0.402   | 0.401
LM_AD        | 0.467 | 0.470      | 0.399   | 0.407
LM_JM        | 0.470 | 0.479      | 0.421   | 0.426
LM_D         | 0.489 | 0.487      | 0.436   | 0.430
Speech act   | —     | —          | 0.081   | 0.109
Named entity | 0.564 | 0.512      | 0.441   | 0.451
results of three different experiments on the reconstruction of both the linear and tree structures of conversation threads, in terms of the F1 measure, the harmonic mean of precision and recall. In the first experiment, LExLinC and LExTreC are evaluated using all features. In the second experiment, the methods are evaluated employing only features not related to the body text of emails, and the third experiment evaluates the ability of our method when only text-based features are provided. According to the results in Table 6, reconstructing conversation threads without text-based features yields rather good results. However, leveraging text-based features along with the other features can significantly improve the results. Even when our methods use only text-based features, the results are acceptable. This indicates that, in the absence of non-textual features, as in SMS or chat data, we can still reconstruct conversation threads with reasonable accuracy. Moreover, we have evaluated the effect of each of the text-based features separately. Table 7 presents the results of measuring the text similarity of emails in reconstruction of conversation threads using five different information retrieval techniques, including a vector space model with pivoted normalization (VS_Pivoted), Okapi BM25, and KL-divergence between statistical language models of the text content of two emails with three different smoothing methods, comprising absolute discounting (LM_AD), Jelinek–Mercer smoothing (LM_JM) and Dirichlet prior smoothing (LM_D), as well as two natural language processing techniques, using speech acts and the similarity of named entities. As the table shows, using the similarity of named entities leads to the best results, owing to taking into account the named entities present in email conversations.
Furthermore, comparing the language modelling results with different smoothing methods, it can be seen that the performance of Dirichlet prior smoothing is better than that of the other two methods. The reason is that the text bodies of emails have different sizes, and Dirichlet prior smoothing takes the length of the documents into account.
6.4. Ability to generalize

Although our proposed method is supervised, we would like to test its adaptability in the absence of labelled data. In some cases we do not have any labelled threads from the target email dataset, but we might have some labelled threads from other email datasets. In this set of experiments, we investigate the capability of our method in these situations. We perform cross-dataset train/test experiments to evaluate the generality of the learning algorithm, using all three collections. The test sets and training sets are constructed by random selection of 20 threads from each of the three datasets. Then LExLinC and LExTreC are learned on each training set and the learned models are applied to all three testing sets. F1 and Acc_path are reported as the evaluation metrics. In Tables 8 and 9, columns indicate the testing set and rows indicate the training set. Comparing on-diagonal and off-diagonal entries in Tables 8 and 9 shows that LExLinC and LExTreC are capable of generalizing in the learning process to some extent. Since Apache and Redhat are both technical mailing lists, they share some properties, and the learning models trained on Apache achieve promising performance on Redhat, and vice versa. However, BC3 is different from the other two datasets in terms of social network structure and discussed topics, and the interaction patterns of its users are quite different from those of the other two email datasets. In such cases, employing unsupervised methods, like threading emails based on their subjects, text and similarity of participants, would be more appropriate. Nonetheless, when there exists a labelled dataset with the same properties as the target dataset, our method using cross learning outperforms unsupervised methods.
Table 8. Cross learning in LExLinC – F1.

Train \ Test | BC3   | Apache | Redhat
BC3          | 0.974 | 0.453  | 0.470
Apache       | 0.501 | 0.821  | 0.755
Redhat       | 0.613 | 0.791  | 0.860
Table 9. Cross learning in LExTreC – Acc_path.

Train \ Test | BC3   | Apache | Redhat
BC3          | 0.949 | 0.244  | 0.301
Apache       | 0.281 | 0.741  | 0.711
Redhat       | 0.394 | 0.670  | 0.781
7. Conclusions and future work

The main goal of this study was to investigate methods to reconstruct conversation threads in email corpora. Email conversation threads are defined based on reply and forwarding relations among emails, and also on their topics and participants. We proposed the LExLinC and LExTreC learning methods, which extract the linear and tree structures of conversations, respectively. LExLinC learns to extract co-thread relations between emails and partitions the dataset into clusters of emails such that each cluster represents a conversation thread. LExTreC, on the other hand, learns parent–child relations among emails and extracts the tree structure of conversations. Our evaluation results on three email corpora indicated that our proposed learning approaches improve the performance of conversation thread reconstruction. Furthermore, we studied the effect of text-based features in reconstructing conversation threads. Finally, we discussed the generalization ability of our supervised methods by examining cross learning across the three datasets. In the future, we will try to improve the quality of the detected conversations by estimating the content similarities more accurately, for example using semantic similarity among the contents of emails or employing text content expansion when the text content of an email is very short. Another idea for future work is to study different closeness measures to estimate the similarity of people with respect to their social network. We can also adapt our method for use in online environments, adding new emails into existing conversation trees using incremental learning methods.

Funding

This research was in part supported by two grants from the Institute for Research in Fundamental Sciences, Tehran, Iran (no. CS13914-18 and no. CS1390-4-06).
References [1] [2] [3] [4] [5] [6]
Erera S and Carmel D. Conversation detection in email systems. In: Proceedings of European conference on information retrieval (ECIR’08), 2008, pp. 498–505. Seo J, Croft WB and Smith DA. Online community search using conversational structures. Information Retrieval 2011; 14(6): 547–571. Duan H and Zhai Ch. Exploiting thread structures to improve smoothing of language models for forum post retrieval. In: Proceedings of European conference on information retrieval (ECIR), 2011, pp. 350–361. Kolla M and Vechtomova O. Retrieval of discussions from enterprise mailing lists. In: Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’07), 2007, pp. 881–882. Bhatia S and Mitra P. Adopting inference networks for online thread retrieval. In: Proceedings of the 24th Association for the Advancement of Artificial Intelligence (AAAI) conference, 2010, pp. 1300–1305. Magnani M, Montesi D and Rossi L. Conversation retrieval for microblogging sites. Information Retrieval 2011; 15(3): 354–372.
Journal of Information Science, 39(6) 2013, pp. 846–863 Ó The Author(s), DOI: 10.1177/0165551513494638