A Graph Partitioning Approach to Entity Disambiguation Using Uncertain Information

Emili Sapena, Lluís Padró, and Jordi Turmo
TALP Research Center, Universitat Politècnica de Catalunya, Barcelona, Spain
{esapena,padro,turmo}@lsi.upc.edu
Abstract. This paper presents a method for Entity Disambiguation in Information Extraction from different sources on the web. Once entities and the relations between them are extracted, it is necessary to determine which ones refer to the same real-world entity. We model the problem as a graph partitioning problem in order to combine the available information more accurately than a pairwise classifier. Moreover, our method handles uncertain information, which turns out to be quite helpful. Two algorithms are trained and compared, one probabilistic and the other deterministic. Both are tuned using genetic algorithms to find the best weights for the set of constraints. Experiments show that graph-based modeling yields better results when uncertain information is used.
1 Introduction and Motivation

Entity disambiguation resolves the many-to-many correspondence between mentions of entities in natural language and real-world entities. A real-world entity can be expressed using different aliases for multiple reasons: use of abbreviations, different naming conventions (e.g. “Name Surname” and “Surname, N.”), misspellings, or naming variations over time (e.g. “Leningrad” and “Saint Petersburg”). Furthermore, different real-world entities may have the same name or share some aliases. For instance, two citations of “J. Smith” in different documents may refer to different authors. In order to keep data extracted from text coherent for further analysis, information integration is mandatory: one must determine when different mentions refer to the same real entity and when identical mentions refer to different ones. This problem arises in many applications that integrate data from multiple sources. In particular, many tasks related to natural language processing are affected by it, such as question answering, summarization, and information extraction. The entity disambiguation problem is also known as identity uncertainty, record linkage, deduplication, and mention matching, among others. Many techniques have been explored for the Entity Disambiguation problem. Some of them use rules [1] while others use string similarity functions [2,3]. In most works, the knowledge is manually defined, such as rules or weights [1,2], and only some works rely on machine learning approaches [3,4]. Some techniques take advantage of an ontology structure, for example clustering template elements [5] or exploiting relations [6,7].
id | Name or alias | Relations with clubs
A | Tomasz Waldoch | Schalke (2001-2002)
B | Tomasz Waldoch | Bochum (1993-1999), Schalke 04 (1999-2006)
C | Waldoch | Bochum (1996-1998)

For example, suppose the three football players in the table are candidates to be the same real person, with their relations with clubs as shown. A and B have exactly the same name, but they are extracted from different sources, so we are not sure whether they are the same person. C is an alias of both, extracted from another document. A pairwise classifier would easily determine that A and B are the same person, and also B and C, because they have played in the same clubs at the same time. However, determining whether A and C are the same real entity may fail because of a lack of information between them. If that happens, there will be a contradiction at the end of the process. In addition, element A has a relation with the club “Schalke” (element X), while element B has a relation with “Schalke 04” (element Y). These two clubs are also entities to disambiguate. We call this uncertain information because we can neither ensure that they play in the same club nor the opposite. If the classifier tries to determine whether X and Y are the same club, it needs to know whether A and B are the same person. However, if it first tries to disambiguate A and B, it needs to know whether X and Y are the same club. An iterative process seems more appropriate for this kind of information.

Fig. 1. An example of the shortcomings of a pairwise classifier
More recent works take advantage of some domain knowledge at the semantic level to improve the results. For example, [8] shows how semantic rules, either automatically learned or specified by a domain expert, can improve the results. [9] use probabilistic domain constraints in a more general model, using a relaxation labeling algorithm to perform matching. Most of these works treat the problem as a pairwise binary classification, where a pair of entity mentions is classified as referring to the same entity or not. However, this point of view does not always take advantage of all the available information, mainly for two reasons:

– A classifier by pairs uses the attributes of both elements, their relations, and the constraints applied to them. It can misclassify some pairs of elements simply because of a lack of information, which may lead the process to finish with contradictions in the results.

– In a binary classifier, it is not possible to use uncertain relations during the disambiguation process, because the order of the pairs to classify may change the final result. Moreover, that uncertainty might make the process fail. An example is given in Figure 1.

We use the terms uncertain information or uncertain relations for relations that refer to an alias. In a data set where different entity types are related and each type has uncertainty (possible duplicates), we have to deal with uncertain relations.
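To make the notion of an uncertain relation concrete, the example of Figure 1 can be represented as elements whose relations point to aliases rather than to resolved entities. The following Python sketch uses illustrative names and structures (they are assumptions, not the paper's implementation):

```python
# Sketch of the Figure 1 example: relations are stored against aliases, so
# whether A and B share a club depends on whether the club mentions
# X ("Schalke") and Y ("Schalke 04") are themselves the same entity.
people = {
    "A": {"name": "Tomasz Waldoch", "relations": [("playsFor", "X", (2001, 2002))]},
    "B": {"name": "Tomasz Waldoch", "relations": [("playsFor", "Bochum", (1993, 1999)),
                                                  ("playsFor", "Y", (1999, 2006))]},
    "C": {"name": "Waldoch",        "relations": [("playsFor", "Bochum", (1996, 1998))]},
}

clubs = {
    "X": {"name": "Schalke"},
    "Y": {"name": "Schalke 04"},
}

# Uncertain pairs: deciding whether A and B corefer requires knowing whether
# X and Y corefer, and vice versa, which motivates an iterative, graph-based
# resolution instead of an ordered sequence of pairwise decisions.
uncertain_pairs = [("A", "B"), ("X", "Y")]
```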
As far as we know, three approaches in the state of the art deal with these problems. The first one consists of an iterative execution of the classification process, as in the work of [6]. The second approach [10] defines the task with Markov logic networks and solves it by logical inference with a MaxSAT solver. The third approach represents the problem as a graph for a subsequent partitioning [11,12]. We center our work on the graph-based approach. In this paper we propose a graph representation of the Entity Disambiguation problem that takes advantage of already extracted uncertain information. We also propose the use of two iterative algorithms for its resolution. On the one hand, this point of view can overcome the difficulties of pairwise classifiers because the results do not end in contradictions at the end of the process. On the other hand, using algorithms that iteratively solve the graph partitioning, one may use the information of uncertain relations in a more natural way. The rest of the document is structured as follows. Section 2 presents an overview of related work. Section 3 formally defines the problem. The methodology employed is explained in Section 4, and the algorithms are detailed in Section 5. The last sections describe our experiments, results and conclusions.
2 Related Work

To the best of our knowledge, there are few works approaching multi-type entity disambiguation as a graph using uncertain information. [11] defines a conditional model to disambiguate different entity types using, among other things, the information offered by their relations. They propose a relational graph partitioning algorithm that ensures consistency in the decisions taken and also takes advantage of uncertain relations. There is no prior pruning of the data in order to reduce execution costs. Other recent works [13,14] are based on the same idea. The system and learning process of [15] are adaptive to any dataset. They deal with uncertain information using a structural connection strength measure, separately from the feature functions, so it cannot incorporate semantic constraints over relations such as time, kind of relation, and so on. The algorithms used in these works are based on greedy decisions where, as in any greedy algorithm, wrong decisions at the beginning may lead to bad performance. In our work we use two different algorithms that work iteratively and only take definitive decisions at the end of the execution. Other improvements presented here are a candidate selection step to reduce execution costs and a homogeneous way to use attributes and relations between elements in any feature function.
3 Problem Definition and Representation

The Entity Disambiguation problem consists of a set of references to entities (elements) that have to be mapped to the minimal collection of individual entities. By representing the problem as a graph, we reduce Entity Disambiguation to a graph partitioning problem given a set of constraints. At the end of the process, every partition is a group of elements representing a real entity.
Let G = G(V, E) be an undirected graph where V is a set of vertices and E a set of edges. Each element in our data is represented as a vertex v ∈ V, and an edge e ∈ E is added to the graph for every pair of vertices representing elements which can potentially be the same entity. The set of constraints between two elements is used to compute a variable weight value on each edge, which indicates how confident we are that the elements represented by the two adjacent vertices are the same real entity. There are two kinds of constraints:

– Fixed constraints. Constraints that depend on static data. These constraints are comparisons of template element attributes, relations, and other semantic rules. For example, two organizations having the same year of foundation, or two people related to the same organization.

– Variable constraints. Constraints obtained from uncertain relations. These constraints may change their influence during the disambiguation process depending on the current state of the elements involved in the uncertain relation. For example, two people may be the same real person when the organizations related to them may be the same organization. If, during the disambiguation process, both organizations in the example tend to be the same real entity, both people will also tend to be the same person, and vice versa.

Variable constraints are evaluated at each iteration during the disambiguation process, and their weights are obtained depending on the state of the elements involved. Finally, the edge weight used by the algorithms is the sum of the weight produced by the fixed constraints and the weight obtained by evaluating the variable constraints. Negative weights indicate that the involved elements should not be in the same partition. Let x = (x_1, ..., x_n) be the set of elements to disambiguate. For each x_i, a vertex v_i is added to the graph. The elements may have attributes, which we write as x_i = (x_i.a_1, x_i.a_2, x_i.a_3, ...) where, for instance, when x_i is an element of type organization, x_i.a_1 is the attribute foundation year. The set of relations between two elements x_i and x_j, whether direct or indirect, is represented as r_ij. Additionally, we have a vector s(t) = (s_1, ..., s_n) containing the state of each vertex at iteration t. The state of a vertex v_i is s_i, a value indicating the partition it is assigned to. Generally, the edge weights for the graph partitioning are obtained before resolution as follows:

e_{ij}.weight = \sum_k \lambda_k f_k(x_i, x_j, r_{ij})    (1)
where f_k(·) is a feature function that evaluates constraint k. It may use the information of the elements x_i and x_j, some of their attributes, and their relations, and λ_k is the weight applied to the feature function. However, in our proposal we also use variable constraints, which need to know the state of other vertices in order to be evaluated. Consequently, we call fixed weight the contribution obtained as in equation (1):

e_{ij}.w_{fix} = \sum_k \lambda_k f_k(x_i, x_j, r_{ij})    (2)
and we define the variable weight as:

e_{ij}.w_{var}(t) = \sum_k \lambda_k f_k(x_i, x_j, r_{ij}, s(t))    (3)
where t is the iteration number while the process is running. By adding s(t), we provide new information to the feature function, which is used to evaluate variable constraints. Finally, in each iteration, the definitive weight is obtained as follows:

e_{ij}.weight(t) = e_{ij}.w_{fix} + e_{ij}.w_{var}(t)    (4)
This implies that the algorithms used to solve the graph partitioning need to deal with dynamic weight values.
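As an illustration of equations (1)-(4), the following Python sketch computes the fixed weight once and re-evaluates the variable weight at every iteration from the current partition state s(t). The function names and the edge dictionary layout are assumptions for illustration, not the authors' implementation:

```python
def fixed_weight(xi, xj, rij, fixed_constraints):
    # Equation (2): weighted sum of fixed feature functions,
    # where fixed_constraints is a list of (lambda_k, f_k) pairs.
    return sum(lam * f(xi, xj, rij) for lam, f in fixed_constraints)

def variable_weight(xi, xj, rij, state, variable_constraints):
    # Equation (3): variable feature functions also see the current states s(t).
    return sum(lam * f(xi, xj, rij, state) for lam, f in variable_constraints)

def edge_weight(edge, state):
    # Equation (4): definitive weight at iteration t = fixed part + variable part.
    # edge["w_fix"] is computed once with fixed_weight(); the variable part is
    # re-evaluated every iteration from the current partition assignment `state`.
    return edge["w_fix"] + variable_weight(edge["xi"], edge["xj"], edge["rij"],
                                           state, edge["var_constraints"])
```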
4 Methodology

The methodology used in our work for the Entity Disambiguation problem consists of four steps. First, the candidates are selected. Second, the constraints between them are found in order to generate the graph. Once the problem is represented as a graph, the third step is to find the optimal weight combination for the feature functions that evaluate the constraints. Finally, the graph partitioning problem is solved. The input of the process is a set of elements extracted from different sources that might be duplicated, and the output is this set of elements grouped by the real entities they refer to. The following subsections describe each of these steps.

4.1 Candidate Selection

In order to avoid a graph where each vertex is adjacent to all the others, we generate the graph by selecting elements that are candidates to be the same real entity. We select as candidates pairs of elements where one is an alias of the other. To do so we use the Alias Assignment method developed by [16]. The method consists of training a Support Vector Machines pairwise classifier where each pair of elements is represented as a vector of features. These features are obtained using similarity functions such as string matching, edit distance and acronym similarity. World knowledge is also used, such as city names in different languages for organizations or typical nicknames for people. Depending on the kind of elements and their domain, such as organizations or people, we use different features. Not all of the candidates have an edge between them, because not all of them share an alias. For example, suppose we have three person elements named a) “Jason”, b) “Jason Smith” and c) “Smith”. Elements a and b will be adjacent vertices in the graph because “Jason” is an alias of “Jason Smith”. The same happens with elements b and c. The vertices representing a and c will not be connected by any edge; however, they will be in the same subgraph, so they may turn out to be the same entity in the end. We then also link candidates that we are quite sure are not the same entity, given that they are already in the same subgraph. These edges have a negative weight, which may help the algorithm. Following the same example, if “Jason” and “Smith” have different birth dates, they are also linked.
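A minimal sketch of this candidate selection stage is shown below, with a simple alias test based on token containment and string similarity standing in for the trained SVM alias classifier of [16]. The threshold and helper names are illustrative assumptions:

```python
import difflib
from itertools import combinations

def is_alias(name_a, name_b, threshold=0.8):
    # Crude stand-in for the SVM alias classifier: token containment
    # (e.g. "Jason" vs "Jason Smith") or high string similarity.
    tokens_a, tokens_b = set(name_a.lower().split()), set(name_b.lower().split())
    if tokens_a <= tokens_b or tokens_b <= tokens_a:
        return True
    return difflib.SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio() >= threshold

def candidate_pairs(elements):
    # elements: dict id -> name. Returns the pairs that become graph edges.
    return [(i, j) for i, j in combinations(elements, 2)
            if is_alias(elements[i], elements[j])]

print(candidate_pairs({"a": "Jason", "b": "Jason Smith", "c": "Smith"}))
# -> [('a', 'b'), ('b', 'c')]; a and c end up in the same subgraph only through b
```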
In this way, the whole problem representation is a graph consisting of a set of subgraphs, each of which is an entity disambiguation subproblem. Candidate selection is not a strictly necessary stage, but it helps reduce computational costs.

4.2 Constraints Evaluation

In the second step we generate the constraints applicable to the pairs of candidates. Using the ontology and the knowledge of the domain, an expert manually writes a set of rules that, when applied to a pair of elements, help to determine whether they are the same entity or not. These constraints can be seen as soft rules, and their influence or weight is determined with Genetic Algorithms using training data in the next step. Constraints can be of any order, that is, any number of elements or pairs of elements may be involved in a constraint. Table 1 shows some examples of constraints.

Table 1. Examples of constraints

Constraint | Kind of entity affected | Description | Kind of constraint
c1 | Organization | Org_i and Org_j are likely to match if their foundation dates are equal | Fixed
c2 | Organization | Org_i and Org_j are likely to match if Per_l and Per_k are also likely to match, where Per_l belongsTo Org_i and Per_k belongsTo Org_j | Variable
c3 | Person | Per_i and Per_j are likely to match if they are related to the same organization | Fixed
c4 | Person | Per_i and Per_j are unlikely to match if they are involved in different events at the same time | Fixed
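As an illustration, constraints such as c1 and c2 in Table 1 can be written as feature functions returning 1 when the constraint fires and 0 otherwise. The sketch below assumes a particular attribute and relation layout and a state encoding (partition number per element id); it is not the rule language actually used by the authors:

```python
def c1_same_foundation(org_i, org_j, rij):
    # Fixed constraint c1: organizations are likely to match if their
    # foundation dates are known and equal.
    year_i, year_j = org_i.get("foundation_year"), org_j.get("foundation_year")
    return 1.0 if year_i is not None and year_i == year_j else 0.0

def c2_shared_member(org_i, org_j, rij, state):
    # Variable constraint c2: organizations tend to match if a person related
    # to each one is currently assigned to the same partition, i.e. the two
    # person elements also tend to match at this iteration.
    for per_l in org_i.get("members", []):
        for per_k in org_j.get("members", []):
            if state[per_l] == state[per_k]:
                return 1.0
    return 0.0
```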
4.3 Finding Optimal Weights

The performance of the algorithms depends on the edge weights which, in turn, depend on the constraint weights. In order to achieve good performance, it is mandatory to find a good combination of constraint weights. Searching the space of constraint weight combinations exhaustively is intractable here, so we use Genetic Algorithms for this task [17]. Other works have also used evolutionary algorithms to train similar processes successfully [18]. This step is only done for training, and it needs an annotated dataset. The graph is solved using different weight combinations, and an evolutionary process evaluates the graph partitioning results each time. Once training is done, the constraint weights are saved for further executions.

4.4 Solving the Graph Partitioning Problem

The graph partitioning task determines the best partition assignment for the vertices, given a set of conditions. In this case, the conditions are the edge weights, which represent how strong the involved constraints are. Positive weights indicate that both adjacent vertices should be in the same partition, while negative weights indicate the opposite. The higher the absolute weight, the harder the condition. The algorithms used (detailed in Section 5) iteratively look for partition assignments according to the indications of the edge weights.
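A schematic view of this weight-search step is sketched below as a plain genetic loop over constraint weight vectors, scored by the F1 of the resulting partitioning on annotated training data. The population size, mutation scheme, and the solve_graph and evaluate_f1 callables are placeholders; the paper does not specify these details:

```python
import random

def tune_weights(n_constraints, solve_graph, evaluate_f1,
                 pop_size=50, generations=100, mutation_rate=0.1):
    # Genetic search over constraint weight vectors (n_constraints >= 2 assumed).
    # solve_graph(weights) partitions the training graph with those weights;
    # evaluate_f1(partition) compares the result against the annotated data.
    population = [[random.uniform(-1.0, 1.0) for _ in range(n_constraints)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda w: evaluate_f1(solve_graph(w)),
                        reverse=True)
        parents = scored[:pop_size // 2]                      # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_constraints)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [w + random.gauss(0, 0.1) if random.random() < mutation_rate
                     else w for w in child]                   # mutation
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: evaluate_f1(solve_graph(w)))
```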
5 Algorithms

We propose the use of two algorithms for entity disambiguation. The reason to compare a deterministic algorithm (Relax) with a probabilistic one (Ants) is scalability. While a deterministic algorithm can ensure that the result is the best possible, for larger datasets it needs more resources and might become intractable. On the contrary, a probabilistic algorithm like Ants can achieve good (though not optimal) performance while avoiding such computational cost issues.

5.1 Relaxation Labeling Algorithm

Relaxation is a generic name for a family of iterative algorithms which perform function optimization based on local information. They are closely related to neural nets and gradient step. The algorithm has been widely used to solve AI problems [19] and also NLP problems such as PoS-tagging [20], chunking, knowledge integration, and semantic parsing [21]. Relaxation labeling (Relax) solves our weighted constraint satisfaction problem dealing with variable compatibility coefficients. Each vertex is assigned to a partition satisfying as many constraints as possible.

5.2 Ants Algorithm

The Ants algorithm is a multiagent system based on the idea of parallel search. A generic version of the algorithm was proposed in [22]. The algorithm treats the problem as a graph coloring problem, optimizing a global fitness function. In theoretical computer science, “graph coloring” usually refers to a very specific constraint satisfaction problem: assigning colors to vertices such that no two adjacent vertices have the same color. However, this algorithm is more general: it optimizes a global fitness function using colors as vertex states and a local fitness function to decide the color of each vertex. By playing with the local and global fitness functions one can adapt the algorithm to solve almost any constraint satisfaction problem. The algorithm works as follows. Initially, all vertices are randomly colored and a given number of agents (ants) are placed on the vertices, also at random. Then the ants move around the graph and change the coloring according to a local optimization criterion. The local and global fitness functions depend on the problem to solve and are the only part that normally needs adaptation. Each movement or decision taken by an ant has a probability of error, which prevents the algorithm from falling into local minima. The adaptation of the algorithm to our entity disambiguation task is done by finding appropriate global and local fitness functions. The local fitness function is defined as:

Fit(v) = \frac{\sum_{i=0}^{m-1} e_i.weight - \sum_{j=m}^{l-1} e_j.weight}{\sum_{i=0}^{l-1} |e_i.weight|}    (5)
where vertex v has l adjacent vertices and e.weight are the values of the edge weights. Indices 0 to m−1 correspond to the edges to adjacent vertices with the same color, and indices m to l−1 to the edges to adjacent vertices with a different color. Note that the edge weights can be negative; their value is obtained as explained in Section 3. If the obtained value is negative, Fit(v) returns zero. The global fitness function is then the average of the fitness over all vertices:

GlobalFitness = \frac{1}{n} \sum_{i=0}^{n-1} Fit(v_i)    (6)

where n is the total number of vertices. At the end of the execution, vertices sharing a color are elements that refer to the same entity.
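A direct transcription of equations (5) and (6) into Python follows. Only the fitness computation comes from the paper; the container layout (a dict mapping each vertex to its weighted adjacency list and a dict of current colors) is an assumption:

```python
def local_fitness(vertex_edges, colors, v):
    # Equation (5): edges to same-colored neighbours count positively, edges to
    # differently-colored neighbours negatively, normalised by the sum of
    # absolute edge weights; negative values are clipped to zero.
    same = sum(w for u, w in vertex_edges[v] if colors[u] == colors[v])
    diff = sum(w for u, w in vertex_edges[v] if colors[u] != colors[v])
    norm = sum(abs(w) for _, w in vertex_edges[v])
    if norm == 0:
        return 0.0
    return max(0.0, (same - diff) / norm)

def global_fitness(vertex_edges, colors):
    # Equation (6): average of the local fitness over all n vertices.
    n = len(vertex_edges)
    return sum(local_fitness(vertex_edges, colors, v) for v in vertex_edges) / n
```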
6 Evaluation Framework

We evaluated our approach to entity disambiguation using two datasets: Football and Cora.

6.1 Football Dataset

We use data automatically extracted from different websites about football¹ (soccer). The entities to disambiguate are people (players, coaches, referees and presidents), organizations (clubs and federations) and teams. Other entities, such as competitions, awards and matches, are also extracted but do not need disambiguation. Relations (some of them temporal) and events have also been extracted from these websites (for example, players belong to clubs, teams play matches). Many players, clubs and teams have similar or identical names. In this situation, and when the information is extracted from different sources, one cannot integrate the information using only similarities or name comparisons. The whole extracted dataset consists of a high number of elements and relations, as summarized in Table 2. A representative part of the data has been manually labeled (last two columns in Table 2). We generated about 750,000 fixed constraints and 25,000 variable constraints of 33 different types (some examples are given in Table 1). Each algorithm is trained and tested with five-fold cross-validation over the manually labeled data. The reason to evaluate the system on this dataset is the existence of uncertain relations. During the information extraction process, some relations point to an alias for which, at that moment, it is not possible to know which real entity it refers to (for example, “Robert” belongsTo “Manchester”: one does not know which Robert nor which Manchester). We store such relations as uncertain and use them in the subsequent disambiguation process. We have extracted different kinds of entities with relations between them, which gives us a good scenario to test our proposal.

6.2 Cora

In order to evaluate our methodology and algorithms on a widely used dataset, we chose Cora². It contains about 1800 citations of 600 different papers.
¹ http://www.lsi.upc.edu/~esapena/data/footballdb.tar.gz
² http://www.cs.umass.edu/~mccallum/data/cora-refs.tar.gz
Table 2. Data used in the experiments

Kind of entity | # extracted elements | # Ambiguous elements | # Candidate pairs | # Elements labeled | # Real entities labeled
Person | 22,828 | 17,721 | 207,275 | 888 | 326
Club | 1,929 | 811 | 1,334 | 54 | 18
Team | 1,830 | 682 | 1,049 | 53 | 21
We disambiguate papers, authors and venues, but only papers are used for training and testing because only papers are labeled. The dataset is split into three non-overlapping parts (fahl, kibl and utgo) that we use for cross-validation.
7 Experiments

Three experiments have been carried out to evaluate our methodology and the proposed algorithms:

– Comparing pair classification with graph partitioning. The goal is to show that the graph point of view and the use of uncertain relations achieve better results than pair classification. We train Support Vector Machines (SVM) to disambiguate entities as a binary classifier. Each pair of candidate elements is evaluated using the information of the fixed constraints. We compare the results of SVM with the results obtained with Relax and Ants.

– Comparing the algorithms with and without uncertain relations. In this second experiment, the goal is to measure how helpful the use of uncertain information is when disambiguating. We compare both algorithms, Relax and Ants, with and without variable constraints.

– Comparing the Relax and Ants algorithms with Greedy Agglomerative Clustering. The goal of this third experiment is to compare the iterative algorithms with a greedy one, and also to corroborate that our proposed algorithms achieve state-of-the-art performance on a widely used corpus.

To evaluate the results we use the Purity and Inverse Purity measures and their harmonic mean F1. In the Entity Disambiguation problem, the input is a set of elements and we expect a concrete grouping of them in the output; that is, we have to evaluate how correct the obtained groups of elements are. Purity (Pur) and Inverse Purity (IPur) are standard clustering measures that help us evaluate this kind of result. The precision of a cluster P ∈ \mathcal{P} for a given category L ∈ \mathcal{L} is given by Prec(P, L) := |P ∩ L| / |P|. The overall value for purity is computed by taking the weighted average of the maximal precision values:

Pur(\mathcal{P}, \mathcal{L}) = \sum_{P \in \mathcal{P}} \frac{|P|}{|D|} \max_{L \in \mathcal{L}} Prec(P, L)    (7)
Inverse purity is computed analogously, with the roles of partitions and categories swapped:

IPur(\mathcal{P}, \mathcal{L}) = \sum_{L \in \mathcal{L}} \frac{|L|}{|D|} \max_{P \in \mathcal{P}} Prec(L, P)    (8)
F_1 = \frac{2 \cdot Pur(\mathcal{P}, \mathcal{L}) \cdot IPur(\mathcal{P}, \mathcal{L})}{Pur(\mathcal{P}, \mathcal{L}) + IPur(\mathcal{P}, \mathcal{L})}    (9)
Pairwise precision and recall measures are not fully adequate for a partitioning problem. A good explanation of why Purity and Inverse Purity are more appropriate for this kind of evaluation can be found in [15]. In the results we only show the final F1. We use two baselines to compare the algorithms. The baseline Join groups all the elements of each subgraph; that is, all directly or indirectly connected candidates are considered to be the same real entity. It produces an Inverse Purity of almost 100%, depending on the goodness of the candidate selection process. The baseline Disjoin does the opposite: it separates all the elements as if each one were a different real entity, obtaining 100% Purity. A Greedy Agglomerative Clustering (GAC) algorithm has also been implemented in order to compare its performance with that of our proposed algorithms using the same information.
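The measures in equations (7)-(9) can be computed directly from the system partition \mathcal{P} and the gold classes \mathcal{L}. A small sketch, with clusters and classes represented as Python sets of element ids (the representation is an assumption):

```python
def purity(partitions, classes, n_elements):
    # Equation (7): weighted average of the best precision of each partition.
    return sum(len(p) / n_elements *
               max(len(p & l) / len(p) for l in classes)
               for p in partitions)

def inverse_purity(partitions, classes, n_elements):
    # Equation (8): same computation with partitions and classes swapped.
    return sum(len(l) / n_elements *
               max(len(l & p) / len(l) for p in partitions)
               for l in classes)

def f1(partitions, classes, n_elements):
    # Equation (9): harmonic mean of Purity and Inverse Purity.
    pur = purity(partitions, classes, n_elements)
    ipur = inverse_purity(partitions, classes, n_elements)
    return 2 * pur * ipur / (pur + ipur)

# Toy example: system output vs. gold grouping over 5 elements.
P = [{1, 2}, {3, 4, 5}]
L = [{1, 2, 3}, {4, 5}]
print(round(f1(P, L, 5), 3))  # -> 0.8
```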
8 Results

The results obtained in the first experiment are shown in Table 3. As expected, if we evaluate the accuracy achieved by the SVM, we obtain a performance that seems quite good: 85.8%. However, once pairs are classified, the final result has to determine which elements refer to the same real entity. We join all the pairs of elements classified as positive to obtain a set of groups (single-link). This last step generates large groups of elements because any misclassification causes the merging of two groups, and a few such misclassifications are enough to produce this poor performance. Consequently, the final results of the SVM tend to be similar to the baseline Join. Both graph-based algorithms, Relax and Ants, outperform the SVM thanks to the graph-based approach.

Table 3. Results of the first and second experiments on the Football dataset. +UR means: using Uncertain Relations.

Algorithm | Join | Disjoin | SVM | Ants | Ants + UR | Relax | Relax + UR
F1 | 8.2 | 12.3 | 53.6 | 75.6 | 79.2 | 81.5 | 83.7
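To see why the single-link step described above is so sensitive, it can be written as a union-find over positively classified pairs: a single spurious positive edge fuses two otherwise separate groups. The sketch below uses assumed names and is only illustrative of that mechanism:

```python
def single_link_groups(elements, positive_pairs):
    # Union-find over pairs the classifier labelled positive: every positive
    # pair merges its two groups, so one false positive fuses two real entities.
    parent = {e: e for e in elements}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in positive_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for e in elements:
        groups.setdefault(find(e), set()).add(e)
    return list(groups.values())
```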
The results obtained in the second experiment (Table 3) show that for both algorithms, Relax and Ants, the performance is better when uncertain information is used. Note that variable constraints represent only 3% of the constraints in the Football dataset, but they help the algorithms by contributing more information. The third experiment (Table 4) shows that the iterative algorithm Ants performs slightly better than Greedy Agglomerative Clustering (GAC) using the same information on the Cora dataset. However, Relax does not reach GAC performance, and all three results lie within an interval of about 1.3%, which is not significant. A possible reason is that in Cora the most informative constraints are string comparisons.
Table 4. Results of the third experiment on the Cora dataset

Cora | fahl | kibl | utgo | Average
GAC | 86.6 | 96.8 | 94.7 | 92.7
Ants | 88.7 | 97.0 | 94.2 | 93.3
Relax | 88.0 | 95.7 | 92.4 | 92.0
9 Conclusions

We have proposed two algorithms for the Entity Disambiguation problem and a graph-based modeling that uses uncertain information. Our hypothesis is that the graph-based point of view can solve some of the problems of pairwise classifiers. Experiments show that our modeling yields better results since it combines the available information more accurately. It is also able to use uncertain information, which turns out to be quite helpful. Finally, we have seen that the proposed iterative algorithms achieve performance comparable to state-of-the-art ones on a widely used corpus when no useful uncertain information is available.
References

1. Hernandez, M.A., Stolfo, S.J.: The merge/purge problem for large databases. In: SIGMOD 1995: Proceedings of the 1995 ACM SIGMOD international conference on Management of data, pp. 127–138. ACM Press, New York (1995)
2. Cohen, W., Ravikumar, P., Fienberg, S.: A comparison of string distance metrics for name-matching tasks. In: Proceedings of the IJCAI (2003)
3. Bilenko, M., Mooney, R.J.: Adaptive duplicate detection using learnable string similarity measures. In: KDD 2003: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 39–48. ACM Press, New York (2003)
4. Han, H., Giles, L., Li, H.Z.C., Tsioutsiouliklis, K.: Two supervised learning approaches for name disambiguation in author citations. In: Proceedings of the 2004 Joint ACM/IEEE Conference on Digital Libraries, pp. 296–305 (2004)
5. McCallum, A., Nigam, K., Ungar, L.H.: Efficient clustering of high-dimensional data sets with application to reference matching. In: KDD 2000: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 169–178. ACM Press, New York (2000)
6. Bhattacharya, I., Getoor, L.: Iterative record linkage for cleaning and integration. In: DMKD 2004: Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, Paris, France, pp. 11–18. ACM Press, New York (2004)
7. Pasula, H., Marthi, B., Milch, B., Russell, S., Shpitser, I.: Identity uncertainty and citation matching. In: Advances in Neural Information Processing Systems (NIPS) (2002)
8. Doan, A., Lu, Y., Lee, Y., Han, J.: Profile-based object matching for information integration. IEEE Intelligent Systems 18(5), 54–59 (2003)
9. Shen, W., Li, X., Doan, A.: Constraint-based entity matching. In: Proceedings of AAAI (2005)
10. Singla, P., Domingos, P.: Entity resolution with Markov logic. In: ICDM 2006, pp. 572–582. IEEE Computer Society, Washington (2006)
11. Culotta, A., McCallum, A.: Joint deduplication of multiple record types in relational data. In: CIKM 2005: Proceedings of the 14th ACM international conference on Information and knowledge management, pp. 257–258. ACM, New York (2005)
12. Han, H., Zha, H., Giles, C.L.: Name disambiguation in author citations using a k-way spectral clustering method. In: JCDL 2005: Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries, pp. 334–343. ACM, New York (2005)
13. Bhattacharya, I., Getoor, L.: Collective entity resolution in relational data. ACM Trans. Knowl. Discov. Data 1(1), 5 (2007)
14. Wang, C., Lu, J., Zhang, G.: A constrained clustering approach to duplicate detection among relational data. In: Advances in Knowledge Discovery and Data Mining, pp. 308–319 (2007)
15. Chen, Z., Kalashnikov, D.V., Mehrotra, S.: Adaptive graphical approach to entity resolution. In: JCDL 2007: Proceedings of the 7th ACM/IEEE joint conference on Digital libraries, pp. 204–213. ACM, New York (2007)
16. Sapena, E., Padró, L., Turmo, J.: Alias assignment in information extraction. In: Proceedings of SEPLN-2007, Sevilla, Spain (2007)
17. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., Boston (1989)
18. Pelillo, M., Abbattista, F., Maffione, A.: An evolutionary approach to training relaxation labeling processes. Pattern Recogn. Lett. 16(10), 1069–1078 (1995)
19. Rosenfeld, R., Hummel, R.A., Zucker, S.W.: Scene labelling by relaxation operations. IEEE Transactions on Systems, Man and Cybernetics 6(6), 420–433 (1976)
20. Màrquez, L., Padró, L., Rodríguez, H.: A machine learning approach for POS tagging. Machine Learning Journal 39(1), 59–91 (2000)
21. Atserias, J.: Towards Robustness in Natural Language Understanding. Ph.D. Thesis, Dept. Lenguajes y Sistemas Informáticos, Euskal Herriko Unibertsitatea, Donosti, Spain (2006)
22. Comellas, F., Ozon, J.: An ant algorithm for the graph colouring problem. In: ANTS 1998 From Ant Colonies to Artificial Ants: First international workshop on ant colony optimization, Brussels (1998)