Person Name Disambiguation in Web Pages using Social Network, Compound Words and Latent Topics

Shingo Ono1, Issei Sato1, Minoru Yoshida2, and Hiroshi Nakagawa2

1 Graduate School of Information Science and Technology, The University of Tokyo
2 Information Technology Center, The University of Tokyo
[email protected],
[email protected],
[email protected],
[email protected]
Abstract. The World Wide Web (WWW) provides much information about persons, and in recent years WWW search engines have been commonly used for learning about persons. However, many persons have the same name, and that ambiguity typically causes the search results for one person name to include Web pages about several different persons. We propose a novel framework for person name disambiguation that has the following three component processes: extraction of social network information by finding co-occurrences of named entities; measurement of document similarities based on occurrences of compound key words; and inference of topic information from documents based on the Dirichlet Process Unigram Mixture model. Experiments using an actual Web document dataset show that the results of our framework are promising.
Keywords: person name disambiguation, web people search, clustering, social network
1 Introduction
The World Wide Web (WWW) provides much information about persons, and in recent years WWW search engines have been commonly used for learning about persons. However, ambiguity in person names (i.e., many persons having the same name) typically causes the search results for one person name to contain Web pages about several different persons. In this paper, the ambiguity of person names in Web pages is defined as follows. Each string appearing as a name on a Web page is a reference to a certain entity in the real world, i.e., each name refers to an entity. The ambiguity of person names in Web pages is that person names sharing the same string on many Web pages refer to different entities. For example, if you want to know about a "George Bush" who is not the president but an ordinary person, the many pages about the president returned in the search results may be a problem for you. Depending on the circumstances, we may have to search once more to find Web pages about the target person among the many search results, which can be hard and time-consuming work. Hereinafter, we use the term "person name" to mean a string indicating the name of a person.

In this paper, we propose a novel framework for person name disambiguation (i.e., the problem of clustering Web pages about persons with the same name according to the true entities). Our framework is based on the following three intuitions:

1. Each person has his/her own social network.
2. There are specific compound key words that characterize him/her.
3. Each person is related to some specific topics.

These intuitions led to our framework, which comprises the following steps. First, we extract social networks by finding co-occurrences of person names with Named Entity extraction tools (NE taggers). Second, we measure document similarities based on occurrences of compound key words that are extracted by using statistics of compound nouns and their components. Third, we infer topic information from documents based on Unigram Mixture, a basic topic model that is a probabilistic generative model of a document. In particular, we use the Dirichlet Process Unigram Mixture (DPUM), an extension of Unigram Mixture that uses the Dirichlet Process. Finally, we cluster Web pages by using the above three types of features (i.e., social networks, document similarities, and document topics). Among these three steps, the first is the one proposed in our previous work [13].

The remainder of this paper is organized as follows. Sections 2 and 3 explain the task definition and related works. Section 4 explains our framework. Section 5 evaluates our framework with an actual Web document dataset. Section 6 summarizes our work.
2 Task Definition
Our task, the disambiguation of person names appearing on Web pages, is formalized as follows. The query (target person name) is referred to as q. The set of Web pages obtained by inputting query q to a search engine is denoted by P = {d1, d2, · · · , dk}. Each Web page di contains at least one occurrence of the string q. The jth appearance of string q on Web page di is denoted by sij. Each sij indicates exactly one entity in the set E = {e1, e2, · · · , en} of real-world entities having the name q. The set of all sij is denoted by S. We define the function Φ : S → E, which maps each name appearing in a document to an entity in the real world; in other words, Φ maps a string occurrence to an entity. Our purpose is to find a function Φ̆ that approximates function Φ.

The modeling above permits the same string q appearing in the same document to refer to different entities. Web pages with such properties are quite rare, and dealing with them makes the system more complicated, so we decided to ignore such pages by assuming that all instances of the same string q on a certain Web page di refer to the same entity, i.e., for each i there exists em ∈ E such that ∀j, Φ(sij) = em. This assumption means that the same name appearing multiple times on one page refers to only one entity, which results in a simpler model Φ′ : P → E. In this research, our aim was to estimate Φ′. The problem here is that n (which appears in the definition of E) is not known in advance; in other words, we do not know how many distinct entities have the name q. We actually estimate Φ′ by clustering Web pages. Our system works as follows. Given query q, the system retrieves Web pages that contain the string q using a search engine and then disambiguates the references. Finally, the system outputs a set of page clusters, each of which refers to a single entity.
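To make this formalization concrete, the following minimal Python sketch (with purely hypothetical page and entity identifiers) shows the object the system ultimately estimates: a mapping Φ′ from whole pages to entities, equivalently a partition of the retrieved page set into clusters.

# Minimal sketch of the task: the system estimates a mapping Phi' from pages
# to entities, i.e., a partition of the retrieved pages into clusters.
# The page ids and entity labels below are purely illustrative.
pages = ["d1", "d2", "d3", "d4", "d5"]        # P: pages returned for query q
phi_prime = {"d1": "e1", "d2": "e1", "d3": "e2", "d4": "e1", "d5": "e3"}

def to_clusters(assignment):
    """Convert a page-to-entity mapping into a list of page clusters."""
    clusters = {}
    for page, entity in assignment.items():
        clusters.setdefault(entity, set()).add(page)
    return list(clusters.values())

print(to_clusters(phi_prime))   # three clusters: {d1, d2, d4}, {d3}, {d5}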
3 Related Works
Several important works have tried to solve the task described in the previous section. Bagga and Baldwin [3] applied the vector space model to calculating similarity between names using only co-occurring words. Based on this, Niu et al. [12] presented an algorithm that uses information extraction results in addition to co-occurring words. However, these methods had only been tested on small artificial test data, leaving doubt concerning their suitability for practical use. Mann and Yarowsky [8] employed a clustering algorithm to generate person clusters based on extracted biographic data. However, this method was also tested only on artificial test data. Wan et al. [15] proposed a system that rebuilt search results for person names. Their system, called WebHawk, was aimed at practical use like ours, but their task was somewhat different. Their system was designed for actual frequent queries, and its algorithm was specialized for English person name queries that consist of three words: family name, first name, and middle name. They mainly assumed queries of these forms and took middle names
into consideration, which may have improved accuracy. However, it would not be suitable for other types of names, such as Japanese names (consisting only of a family name and a given name). As another approach to this task, Bekkerman and McCallum [4] proposed two methods of finding Web pages that refer to a particular person. Their work consists of two distinct mechanisms: the first is based on link structure and the second uses agglomerative/conglomerative double clustering. However, they focused on disambiguating an existing social network of people, which is not the case when searching for people in real situations. In addition, our experience is that the number of direct links between pages that contain the same name is smaller than expected, so information on link structure would be difficult to use for our task. Although there may be indirect links (i.e., one page can be found from another page via other pages), it is too time consuming to find them.
4 Proposed Framework
In this section, we explain the three types of features used in the proposed framework: social networks, document similarities, and document topics.
4.1 Preprocessing
We eliminate noise tokens such as HTML tags and stop words. We then extract local texts consisting of the 100 words before and after each occurrence of the target person name (the query). In this study, our analysis is limited to these local texts.
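As an illustration of this windowing step, the following sketch extracts the ±100-word local texts around each occurrence of the query name. It assumes whitespace tokenization for brevity; the actual preprocessing (HTML and stop-word removal, Japanese tokenization) is not reproduced here.

def local_texts(document_text, query, window=100):
    """Return windows of `window` words before and after each occurrence
    of the query name (whitespace tokenization assumed)."""
    tokens = document_text.split()
    spans = []
    for i, token in enumerate(tokens):
        if query in token:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            spans.append(" ".join(tokens[start:end]))
    return spans

print(local_texts("... a lecture by Taro Kimura was held ...", "Kimura", window=3))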
4.2 Extraction of Social Networks
This section explains how to extract social networks and how to cluster pages by using them. This method was used in our previous work [13]. We use a graph representation of the relations between documents. Let G be an undirected graph with vertex set V and edge set E. Each vertex vi ∈ V corresponds to page di, and edge eij represents that di and dj refer to the same entity. On the other hand, a social network can be seen as another graph structure in which each node represents an entity, each edge represents the fact that two entities have a relation, and each connected component represents one social network. We assume that every pair of entities appearing on the same page has a relation. We also assume that the same name in the same social network refers to the same entity. In graph G, we make edge eij if the same person name m (other than q) appears in both di and dj because, roughly speaking, this means that both di and dj are related to m, i.e., both are in the same social network to which m belongs. (We ignore the ambiguity of m itself by assuming that it is rare for two or more social networks to contain the same person name pair (q, m).) Moreover, we utilize the place names and organization names that appear near the position of the target person name to extract more social network information; place names and organization names around the target person name can be as discriminating as person names. To identify person, place, and organization names, we used CaboCha (http://chasen.org/~taku/software/cabocha/) as an NE tagger, which tags each proper noun according to context, e.g., as a person name (family name), person name (first name), place name, or organization name.

The clustering algorithm based on social networks is presented below.

Procedure: Clustering by Social Networks (SN)
1. From all documents dj (1 ≤ j ≤ k), extract person names (full names), place names, and organization names with an NE tagger.
2. Calculate the SN similarity simSN(dx, dy) as
   simSN(dx, dy) = µ · (number of person names appearing in both dx and dy) + ν · (number of place or organization names appearing in both dx and dy).
3. If simSN(dx, dy) ≥ θSN, then Φ′(dx) = Φ′(dy), where θSN is a threshold.

Here µ and ν are weighting parameters. In this study, µ and ν are constrained so that µ ≫ ν; this constraint says that person names are more important than other names. Φ′(dx) = Φ′(dy) means that the two pages dx and dy are to be placed in the same cluster, and clustering is done as follows. Let G be an undirected graph with vertex set V and edge set E, where each vertex vi ∈ V corresponds to page di. The result of the above procedure gives the edge set E: edge eij ∈ E exists if and only if the constraint Φ′(di) = Φ′(dj) was added in Step 3. Then, graph G = ⟨V, E⟩ has some connected components, and each connected component corresponds to one cluster of Web pages, all of which refer to the same entity. In Fig. 1, the dotted lines show occurrences of the same person name, place name, or organization name. In Fig. 2, the solid lines show the connections among documents whose SN similarities are over the threshold.
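The following is a minimal Python sketch of this procedure. It assumes the named entities have already been extracted from each page (e.g., with an NE tagger such as CaboCha) and that the query name itself has been removed from the person-name lists; the weights and the threshold are illustrative values, not the ones used in our experiments. Connected components are obtained with a simple union-find.

from itertools import combinations

def cluster_by_social_network(doc_entities, mu=1.0, nu=0.1, theta_sn=2.0):
    """doc_entities maps a page id to (person_names, place_or_org_names).
    Pages whose SN similarity reaches theta_sn are merged, and the connected
    components of the resulting graph are returned as clusters."""
    parent = {d: d for d in doc_entities}

    def find(x):                       # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for dx, dy in combinations(doc_entities, 2):
        persons_x, others_x = doc_entities[dx]
        persons_y, others_y = doc_entities[dy]
        sim = (mu * len(set(persons_x) & set(persons_y))
               + nu * len(set(others_x) & set(others_y)))
        if sim >= theta_sn:            # add edge e_xy, i.e., Phi'(dx) = Phi'(dy)
            union(dx, dy)

    clusters = {}
    for d in doc_entities:
        clusters.setdefault(find(d), []).append(d)
    return list(clusters.values())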
4.3 Document Similarities based on Compound Key Words
This section explains how to measure document similarities based on compound key words and how to cluster documents by the similarity. First, we calculate an importance score for the compound words in a document with the method proposed by Nakagawa et al. [10]. Next, we construct a compound word vector cwv = (s1, s2, · · · , sVc) for each document, where {1, 2, · · · , Vc} are the indices of the compound words in the documents and sv is the score of compound word v. Then, we measure the document similarity by the scalar product of the compound word vectors. Finally, we cluster the documents by the similarity and a threshold.
Fig. 1. Co-occurrence of Person Name, Place Name and Organization Name.
Fig. 2. Clusters constructed by Social Networks corresponding to µ, ν and θSN .
The importance score of a compound word is calculated as follows. Let CW (= W1 W2 · · · WL) be a compound word, where each Wi (i = 1, . . . , L) is a simple noun. Let f(CW) be the number of independent occurrences of compound word CW in a document, where an "independent" occurrence of CW means that CW is not part of any longer compound noun. The importance score of compound word CW is

Score(CW) = f(CW) · LR(CW)   (1)

where LR(CW) is defined as

LR(CW) = ( Π_{i=1}^{L} (LN(Wi) + 1)(RN(Wi) + 1) )^{1/(2L)}   (2)

LN(Wi) and RN(Wi) are the frequencies of nouns that directly precede or succeed the simple noun Wi.
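A small sketch of Eqs. (1) and (2) is given below; the component nouns and the LN/RN counts are hypothetical, and in practice they would be collected from the documents being scored.

def lr_score(components, left_freq, right_freq):
    """LR(CW) from Eq. (2): the product of (LN(Wi)+1)(RN(Wi)+1) over the
    component nouns, raised to the power 1/(2L)."""
    L = len(components)
    prod = 1.0
    for w in components:
        prod *= (left_freq.get(w, 0) + 1) * (right_freq.get(w, 0) + 1)
    return prod ** (1.0 / (2 * L))

def importance_score(components, f_independent, left_freq, right_freq):
    """Score(CW) = f(CW) * LR(CW) from Eq. (1)."""
    return f_independent * lr_score(components, left_freq, right_freq)

# Hypothetical counts for the compound noun "information retrieval system".
ln = {"information": 2, "retrieval": 5, "system": 7}
rn = {"information": 6, "retrieval": 3, "system": 1}
print(importance_score(["information", "retrieval", "system"], 4, ln, rn))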
This scoring system is available as the "Term Extraction System" (http://www.r.dl.itc.u-tokyo.ac.jp/~nakagawa/resource/termext/atr-e.html). The clustering algorithm based on compound key words is presented below.

Procedure: Clustering by Compound Key Words (CKW)
1. From all documents dj (1 ≤ j ≤ k), extract compound key words and construct compound word vectors cwvj (1 ≤ j ≤ k) with the Term Extraction System.
2. Calculate the CKW similarity simCKW(dx, dy) as simCKW(dx, dy) = cwvx · cwvy.
3. If simCKW(dx, dy) ≥ θCKW, then Φ′(dx) = Φ′(dy), where θCKW is a threshold.

Given the constraints Φ′(dx) = Φ′(dy), clustering is done in the same way as for Social Networks.
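The CKW similarity itself is simply a scalar product over sparse vectors, as in the following sketch; the compound words and scores are hypothetical, and in the actual system they come from the Term Extraction System.

def ckw_similarity(cwv_x, cwv_y):
    """Scalar product of two sparse compound word vectors, each represented
    as a dict mapping a compound word to its importance score."""
    return sum(cwv_x[w] * cwv_y[w] for w in cwv_x.keys() & cwv_y.keys())

cwv_a = {"graduate school": 3.2, "information retrieval": 5.1}
cwv_b = {"information retrieval": 4.4, "search engine": 2.0}
print(ckw_similarity(cwv_a, cwv_b))   # 5.1 * 4.4 = 22.44
# Pages whose similarity reaches theta_CKW are constrained to share a cluster;
# clustering then proceeds over the resulting edges exactly as in the SN step.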
4.4 Estimating the Latent Topic of a Document
In this paper, we assume that pages referring to the same entity have the same latent topic, which specifies a word distribution. Therefore, inferring the latent topic of a page allows pages that have the same topic to be categorized into the same cluster. As a clustering algorithm that can treat the latent topic, we adopt Unigram Mixture, a basic topic model [11]. Moreover, we use Unigram Mixture extended by the Dirichlet Process [6]: the Dirichlet Process Unigram Mixture (DPUM). DPUM can estimate the number of latent topics corresponding to a set of pages. In person name disambiguation, the number of true entities (topics) is unknown at first, so DPUM is well suited to our purpose.

Unigram Mixture is a probabilistic generative model of a document based on the Unigram Model, which assumes that the words of every document are drawn independently from a single multinomial distribution. In Unigram Mixture, each document is generated by a topic-conditional multinomial distribution p(w|z, ϕ), where z ∈ {1, 2, · · · , T} is a latent topic and T is the number of latent topics. ϕ = {ϕt} (t = 1, · · · , T) are the parameters of the multinomial distributions corresponding to the latent topics, where ϕt = (ϕt1, ϕt2, · · · , ϕtNv), Nv is the vocabulary size, and ϕtw is the probability that word w is generated from topic t.

A problem is that the number of latent topics is unknown in advance. To solve this problem, nonparametric Bayes models using the Dirichlet Process have been proposed [1][5][6]. Such a model can change its structure (e.g., the number of latent topics) in correspondence with the data. A mixture model extended by the Dirichlet Process is called a Dirichlet Process Mixture (DPM) [1]. Sethuraman provided a constructive representation of the Dirichlet Process as the Stick-breaking Process [14]. By using the Stick-breaking Process, an effective learning algorithm for DPM can be derived [5].
The stick-breaking process is based on countably infinite sequences of random variables {βt}, {πt}, and {ϕt} (t = 1, 2, · · · ) as follows:

βt ∼ Beta(1, α0),   πt = βt Π_{i=1}^{t−1} (1 − βi)   (3)

ϕt ∼ G0   (4)

α0 is a concentration parameter and G0 is the base measure of the Dirichlet Process. In DPUM, G0 is the Dirichlet distribution p(ϕ|λ), where λ is a parameter of the Dirichlet distribution, and Beta is the Beta distribution. We write π (= {πt}) ∼ SB(π; α0) if π is constructed by Eq. (3). The process of generating a document in DPUM by using the stick-breaking process is as follows:

1. Draw π (= {πt}) ∼ SB(π; α0).
2. Draw ϕt ∼ G0 (t = 1, 2, · · · , ∞).
3. For each document d:
   (a) Draw zd ∼ Multi(z; π).
   (b) For each of the Nd words wdn, draw wdn ∼ p(w|zd, ϕ).

Note that Multi is a multinomial distribution and p(w = v|z = t, ϕ) = ϕtv. Therefore, DPUM can be formulated by the joint probability distribution

p(D, z, π, ϕ|α0, λ) = p(π|α0) p(ϕ|λ) Π_{d=1}^{M} p(wd|zd, ϕ) p(zd|π)   (5)

p(wd|zd, ϕ) = Π_{n=1}^{Nd} p(wdn|zd, ϕ)   (6)
M is the number of documents and Nd is the number of words in document d. wd = (wd1, wd2, · · · , wdNd) is a sequence of Nd words, where wdn denotes the nth word in the sequence. p(π|α0) is SB(π|α0). For inference of the latent topics in DPUM, we adopt Variational Bayes inference, which provides a deterministic method [2]. Blei et al. proposed a framework of Variational Bayes inference for DPM that is restricted to exponential family mixtures and is formulated by the Stick-breaking Process [5]. This inference scheme does not need the number of latent topics to be specified, but it does need a maximum number of latent topics to be set due to computational cost.
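For intuition, the sketch below samples documents forward from a truncated stick-breaking construction of DPUM, following Eqs. (3)-(6). It assumes NumPy, all hyperparameter values are illustrative, and it performs forward sampling only; it is not the Variational Bayes inference used in our experiments.

import numpy as np

def generate_dpum_corpus(num_docs=5, doc_len=20, vocab_size=50,
                         alpha0=1.0, lam=0.5, max_topics=30, seed=0):
    rng = np.random.default_rng(seed)

    # Stick-breaking weights pi_t (Eq. 3), truncated at max_topics.
    beta = rng.beta(1.0, alpha0, size=max_topics)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))
    pi = beta * remaining

    # Topic-word multinomials phi_t drawn from the base measure G0 = Dirichlet(lam).
    phi = rng.dirichlet(np.full(vocab_size, lam), size=max_topics)

    docs, topics = [], []
    for _ in range(num_docs):
        z = rng.choice(max_topics, p=pi / pi.sum())             # latent topic z_d
        words = rng.choice(vocab_size, size=doc_len, p=phi[z])  # the N_d words
        docs.append(words)
        topics.append(int(z))
    return docs, topics

docs, topics = generate_dpum_corpus()
print(topics)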
5 Experimentation of Proposed Framework

5.1 Data Set
As mentioned above, an English corpus for the name disambiguation task was developed for WePS [16]. Because our system targets Japanese Web pages, however, we developed an original Japanese Web page test set for this task as follows.
We first input Japanese person name queries into a search engine. Some of the person queries were chosen from among ambiguous popular names. For example, "Taro Kimura" is a very common name in Japan, and we found there were many people called "Taro Kimura", including a famous commentator, a member of the Diet, a translator, and a schoolmaster. Some other queries were selected from persons in our laboratory, and other person name queries were generated automatically. Second, we retrieved Web pages containing these names with a search engine; if a query hit many pages, we collected the top 100-200 Web pages. Finally, these pages were manually annotated. Annotators removed pages that violated our assumption that one page refers to only one entity. Note that the annotators were unable to determine every page perfectly; a few pages were too ambiguous to determine, and to standardize the results, each such ambiguous page was regarded as referring to another independent entity, i.e., each of them composed a cluster by itself in the correct grouping. As a result, we collected 5015 Web pages for 38 person names, and all page references were clarified.
5.2 Evaluation
Precision (P), recall (R), and F-measure (F) were used as the evaluation metrics in our experiments. All metrics were calculated as follows [7]. Assume C = {C1, C2, · · · , Cn} is the set of clusters of the correct grouping and D = {D1, D2, · · · , Dm} is the set of clusters produced by the system, where Ci and Dj are sets of pages. For each correct cluster Ci (1 ≤ i ≤ n), we calculated precision, recall, and F-measure against every cluster Dj (1 ≤ j ≤ m) as

Pij = |Ci ∩ Dj| / |Dj|,   Rij = |Ci ∩ Dj| / |Ci|,   Fij = 2 Pij Rij / (Pij + Rij).

The F-measure of Ci, denoted Fi, was calculated as Fi = maxj Fij. Using j′ = argmaxj Fij, Pi and Ri were calculated as Pi = Pij′ and Ri = Rij′. The overall evaluation was conducted by calculating the weighted average, where the weights are proportional to the number of elements in the clusters:

F = Σ_{i=1}^{n} |Ci| Fi / |C|,   where |C| = Σ_{i=1}^{n} |Ci|.

The weighted average precision and recall were calculated in the same way as the F-measure.
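A direct implementation of these metrics might look like the following sketch, where correct and result are lists of page-id sets; the example clusterings are hypothetical.

def clustering_scores(correct, result):
    """Weighted precision/recall/F-measure using the per-cluster best match
    described above."""
    total = sum(len(c) for c in correct)
    P = R = F = 0.0
    for c in correct:
        best = (0.0, 0.0, 0.0)                 # (F_ij, P_ij, R_ij) of the best D_j
        for d in result:
            overlap = len(c & d)
            if overlap == 0:
                continue
            p, r = overlap / len(d), overlap / len(c)
            f = 2 * p * r / (p + r)
            if f > best[0]:
                best = (f, p, r)
        F += len(c) * best[0] / total
        P += len(c) * best[1] / total
        R += len(c) * best[2] / total
    return P, R, F

correct = [{"d1", "d2", "d3"}, {"d4", "d5"}]
result = [{"d1", "d2"}, {"d3", "d4", "d5"}]
print(clustering_scores(correct, result))      # (P, R, F) = (0.87, 0.80, 0.80) approx.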
5.3 Baseline
The baseline is a clustering method that uses a well-known document similarity based on word frequencies.
First, we construct a word frequency vector wfvj = (f1, f2, · · · , fW) for each document dj, where {1, 2, · · · , W} are the indices of the vocabulary in the documents and fw is the frequency of word w in document dj. Then, we measure the document similarity by the scalar product of the word frequency vectors:

simBase(dx, dy) = wfvx · wfvy

Finally, we cluster the documents by the similarity simBase and a threshold θBase; the clustering is done in the same way as for Compound Key Words. Moreover, we investigated a "No Cluster" setting in which all documents are categorized into different clusters, that is, no two documents are ever placed in the same cluster.
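For completeness, a minimal sketch of the baseline similarity, assuming tokenized local texts as input:

from collections import Counter

def baseline_similarity(tokens_x, tokens_y):
    """simBase: scalar product of raw word frequency vectors."""
    wfv_x, wfv_y = Counter(tokens_x), Counter(tokens_y)
    return sum(wfv_x[w] * wfv_y[w] for w in wfv_x.keys() & wfv_y.keys())

print(baseline_similarity("a b b c".split(), "b c c d".split()))   # 2*1 + 1*2 = 4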
5.4 Experimentation
We investigated which of Social Networks (SN), Compound Key Words (CKW), Dirichlet Process Unigram Mixture (DP), or their combinations performed best. A combination of two or three methods means that the different methods are used together. More precisely, the result of combining SN and CKW is given by considering the graphs G = ⟨V, ESN ∪ ECKW⟩ and G = ⟨V, ESN ∩ ECKW⟩, where GSN = ⟨V, ESN⟩ is the result for SN and GCKW = ⟨V, ECKW⟩ is the result for CKW, as sketched below. DP needs an initialization of the latent topic zd of each document and a maximum number of latent topics. When DP was applied in a stand-alone way, the latent topics were initialized randomly and the maximum number of topics was set to 100. When DP was combined with the SN/CKW methods, the SN/CKW methods were applied first and DP was then initialized with their results; that is, we regarded the clusters constructed by the SN/CKW methods as the initial latent topics of DP and then applied DP. In this case, the maximum number of latent topics was set to the number of clusters constructed by the SN/CKW methods.

Table 1 lists the results averaged over the 38 queries. Figs. 3-6 show the F-measure of SN, CKW, SN∪CKW, SN+DP, CKW+DP, and SN∪CKW+DP with respect to each person name. According to the results, either SN or CKW alone showed a great improvement over the baseline. In addition, they seem to exploit distinct types of information to a certain extent, because SN∪CKW shows a four- to five-point improvement over SN or CKW alone. The fact that DP also improves SN and CKW on F-measure means that DP introduces another aspect of the information, i.e., document topics. As expected from these results, the proposed combination (SN∪CKW+DP) showed the highest performance on F-measure among all the methods.
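The following sketch illustrates how two edge sets are combined before the connected-component clustering; the page ids and edges are hypothetical. The resulting clusters are then used as the initial latent topics for DP, as described above.

def combine_edge_sets(edges_sn, edges_ckw, mode="union"):
    """Combine the edge sets produced by the SN and CKW steps. Each edge is a
    frozenset of two page ids; the combined graph is clustered by taking its
    connected components, exactly as in the individual methods."""
    if mode == "union":
        return edges_sn | edges_ckw
    if mode == "intersection":
        return edges_sn & edges_ckw
    raise ValueError("mode must be 'union' or 'intersection'")

e_sn = {frozenset({"d1", "d2"}), frozenset({"d2", "d3"})}
e_ckw = {frozenset({"d1", "d2"}), frozenset({"d4", "d5"})}
print(combine_edge_sets(e_sn, e_ckw, "union"))
print(combine_edge_sets(e_sn, e_ckw, "intersection"))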
Table 1. Results: Average of 38 queries

Method        |   F    |   P    |   R
--------------+--------+--------+-------
No Cluster    | 0.2996 | 1.0000 | 0.2536
Baseline      | 0.5319 | 0.7744 | 0.5350
DP            | 0.3853 | 0.8443 | 0.3526
SN            | 0.7163 | 0.9000 | 0.6692
CKW           | 0.7077 | 0.8263 | 0.7109
SN ∩ CKW      | 0.6196 | 0.9469 | 0.5180
SN ∪ CKW      | 0.7546 | 0.8448 | 0.7707
SN + DP       | 0.7388 | 0.8994 | 0.6975
CKW + DP      | 0.7130 | 0.8260 | 0.7210
SN ∩ CKW + DP | 0.6535 | 0.9457 | 0.5542
SN ∪ CKW + DP | 0.7601 | 0.8462 | 0.7793
Fig. 3. F-measure of SN and SN+DP with respect to each person name.
Fig. 4. F-measure of CKW and CKW+DP with respect to each person name.
Fig. 5. F-measure of SN, CKW and SN∪CKW with respect to each person name.
Fig. 6. F-measure of SN∪CKW and SN∪CKW+DP with respect to each person name.
6 Conclusion

We have proposed a novel framework for person name disambiguation that has the following three component processes: social networks, document similarities based on compound key words, and document topics. Experiments using an actual Web document dataset show that the results of our framework are promising, because the framework exploits distinct types of information potentially contained within documents.

Acknowledgments. This research was funded in part by Category "A" of "Scientific Research" Grants in Japan.
References

1. Antoniak: Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. The Annals of Statistics, Vol. 2, No. 6, 1974.
2. H. Attias: Learning Parameters and Structure of Latent Variable Models by Variational Bayes. In Proceedings of Uncertainty in Artificial Intelligence, 1999.
3. A. Bagga and B. Baldwin: Entity-Based Cross-Document Coreferencing Using the Vector Space Model. In Proceedings of COLING-ACL 1998, pp. 79–85, 1998.
4. R. Bekkerman and A. McCallum: Disambiguating Web Appearances of People in a Social Network. In Proceedings of WWW2005, pp. 463–470, 2005.
5. D. M. Blei and M. I. Jordan: Variational Inference for Dirichlet Process Mixtures. Bayesian Analysis, Vol. 1, No. 1, pp. 121–144, 2005.
6. Ferguson: A Bayesian Analysis of Some Nonparametric Problems. The Annals of Statistics, Vol. 1, No. 2, 1973.
7. B. Larsen and C. Aone: Fast and Effective Text Mining Using Linear-Time Document Clustering. In Proceedings of the 5th ACM SIGKDD, pp. 16–22, 1999.
8. G. S. Mann and D. Yarowsky: Unsupervised Personal Name Disambiguation. In Proceedings of CoNLL-2003, pp. 33–40, 2003.
9. T. S. Morton: Coreference for NLP Applications. In Proceedings of ACL-2000, pp. 173–180, 2000.
10. H. Nakagawa and T. Mori: Automatic Term Recognition Based on Statistics of Compound Nouns and their Components. Terminology, Vol. 9, No. 2, pp. 201–219, 2003.
11. K. Nigam, A. K. McCallum, S. Thrun and T. M. Mitchell: Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, Vol. 39, pp. 103–134, 2000.
12. C. Niu, W. Li, and R. K. Srihari: Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction. In Proceedings of ACL-2004, pp. 598–605, 2004.
13. S. Ono, M. Yoshida and H. Nakagawa: NAYOSE: A System for Reference Disambiguation of Proper Nouns Appearing on Web Pages. In Proceedings of AIRS2006, LNCS 4182, pp. 338–349, 2006.
14. Sethuraman: A Constructive Definition of Dirichlet Priors. Statistica Sinica, Vol. 4, pp. 639–650, 1994.
15. X. Wan, J. Gao, M. Li, and B. Ding: Person Resolution in Person Search Results: WebHawk. In Proceedings of CIKM2005, pp. 163–170, 2005.
16. Web People Search Task, SemEval-2007.