ASKNet: Automatically Creating Semantic Knowledge Networks from Natural Language Text
Brian Harrington Oxford University Computing Laboratory University of Oxford
Thesis submitted for the degree of Doctor of Philosophy Hilary Term 2009
To Sophie: You are the love of my life and have made me happier than I ever thought possible. You mean everything to me. I want to spend the rest of my life with you. Will you marry me?
Acknowledgements I would like to thank my supervisor Stephen Clark for his continued support throughout my time at Oxford. It was his assistance and guidance that made this thesis possible. I would also like to thank my family, especially my parents who have always provided me with moral (not to mention financial) support. Finally, I would like to thank Sophie for her role in motivating me. It is my desire to start a life with her that has pushed me through the final stages of writing this thesis and prevented me from becoming a perpetual student. This research was funded by the Clarendon Scholarship and in part by the Canadian Centennial Scholarship.
Contents

1 Introduction
  1.1 Motivation
  1.2 Existing Semantic Resources
    1.2.1 Manually Constructed Resources
    1.2.2 Automatically Constructed Resources
  1.3 Contributions of This Thesis
  1.4 Outline

2 Parsing and Semantic Analysis
  2.1 Parsing
    2.1.1 Choosing a Parser
    2.1.2 The C&C Parser
  2.2 Semantic Analysis
    2.2.1 Discourse Representation Theory
    2.2.2 Boxer

3 Semantic Networks
  3.1 A Semantic Network Definition
  3.2 The ASKNet Semantic Network
    3.2.1 Temporality and Other Issues
    3.2.2 Network Implementation
  3.3 Parser Filtering
    3.3.1 C&C Parser Filter
    3.3.2 Boxer Filter

4 Information Integration
  4.1 Spreading Activation
    4.1.1 History of Spreading Activation
  4.2 Spreading Activation in ASKNet
    4.2.1 Update Algorithm: Example
    4.2.2 Update Algorithm: Implementation
    4.2.3 Firing Algorithms

5 Evaluation
  5.1 Network Creation Speed
  5.2 Manual Evaluation
    5.2.1 Building the Network Core
    5.2.2 Evaluating the Network Core
    5.2.3 Results

6 Semantic Relatedness
  6.1 Using ASKNet to Obtain Semantic Relatedness Scores
  6.2 WordSense 353
  6.3 Spearman's Rank Correlation Coefficient
  6.4 Experiment 1
    6.4.1 Data Collection & Preparation
    6.4.2 Calculating a Baseline Score
    6.4.3 Calculating the ASKNet Score
    6.4.4 Results
    6.4.5 Discussion
  6.5 Experiment 2
    6.5.1 Inspecting the Corpus
    6.5.2 Building a Better Corpus
    6.5.3 Results: New Corpus
    6.5.4 Discussion

7 Conclusions
  7.1 Future Work
    7.1.1 Future Improvements
    7.1.2 External Improvements

A Published Papers

B Semantic Relatedness Scores & Rankings - Initial Corpus

C Semantic Relatedness Scores & Rankings - Improved Corpus
List of Figures

2.1 A simple CCG derivation using forward (>) and backward (<) ...

Figure 2.3: Example Boxer output in "Pretty Print" format for the sentences Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.

Boxer also performs some limited anaphoric pronoun resolution. Some example output of the program can be seen in Figures 2.2 and 2.3. Representing the sentence as a drs is ideal for ASKNet for several reasons. The drs structure very closely mirrors the semantic network structure used in ASKNet, with discourse referents being roughly equivalent to object nodes and the semantic relations being analogous to either node labels or relations (see Section 3.2.2).
Chapter 3

Semantic Networks

A semantic network can loosely be defined as any graphical representation of knowledge using nodes to represent semantic objects and arcs to represent relationships between objects. Semantic networks have been used since at least the 3rd century AD in philosophy, and computer implementations have been in use for over 45 years [Masterman, 1962]; a wide variety of formalisms have used the name semantic network [Sowa, 1992].
3.1 A Semantic Network Definition
For the purposes of this thesis, we will posit certain requirements for what we will consider as an acceptable semantic network. Primarily, we will require that the relations in the network be labelled and directed. This is to distinguish semantic networks from what we will call associative networks, which connect concepts based simply on the existence of a relationship, without regard to the relationship's nature or direction (see Figure 3.1). Associative networks, often referred to as "pathfinder networks", are technically a type of semantic network, and are quite often used because they can easily be extracted from co-occurrence statistics [Church and Hanks, 1990] and have
proven useful for many tasks [Schvaneveldt, 1990]; however, for our purposes their lack of power and expressiveness will discount them from consideration.
Figure 3.1: An example of an associative network. Objects and concepts are linked without distinction for type or direction of link.

The second requirement we shall impose upon semantic networks is that they be structurally unambiguous. A given network structure should have only one semantic meaning. Thus, even though the semantically different ideas of John using a telescope to see a man and John seeing a man carrying a telescope can be encoded in the same English sentence John saw the man with the telescope, when that sentence is translated into a semantic network, the structure of the network must uniquely identify one of the two interpretations (see Figure 3.2).
Figure 3.2: Semantic network representations of the two parses of "John saw the man with the telescope": (a) John used the telescope to see the man; (b) John saw the man carrying the telescope

Semantic networks may still contain lexical ambiguity through having ambiguous words used as labels on nodes and arcs. For example, in Figure 3.3 it is impossible to tell whether bank refers to a financial institution or the edge of a river. It is theoretically possible to remove lexical ambiguity from a semantic network by forcing each node to be assigned to a particular sense of the word(s) in its label; however, word sense disambiguation is a very difficult task and there is no complete solution currently available.
Figure 3.3: A lexically ambiguous network (John, linked by "went to", to bank)

The third and final requirement we will make for semantic networks is that they must be able to accommodate the complex structures regularly found in natural language text. In particular we will require that the network allow relations between complex concepts, which may themselves contain many concepts and relations. This is to distinguish proper semantic networks from what we will call atomic networks, which only allow simple nodes representing a single concept. These networks can only accommodate a limited type of information, and thus we will not include them in our definition of semantic networks.

This notion of semantic network is not definitive, nor is it complete. We have said nothing of the network's ability to deal with temporal, probabilistic or false information. A definitive definition of semantic networks (if indeed such a definition is possible) is beyond the scope of this thesis. We have merely defined the minimum requirements necessary for a network to be acceptable for our purposes.
3.2 The ASKNet Semantic Network
The semantic network formalism developed for ASKNet meets all of the criteria we have set out for consideration as a "proper" semantic network, and also has a few extra features that make it particularly well suited to the ASKNet project. We will first explain how the criteria are met, and then briefly describe the extra features that have been added.

ASKNet trivially meets the first criterion by its design. All relations in ASKNet are labelled and directed, with a defined agent and target node, of which at least one must be present before a relation can be added to the network.

The second criterion is taken care of by the parser. One of the primary functions of a parser is to select one parse from the list of possible parses for a natural language sentence. Since no information is discarded in the translation from the parser output to the network creation, we maintain a single parse and thus are left without any of the original structural ambiguity.

The third criterion is met by the hierarchical structure of the network. This allows complex concepts and even entire discourses to be treated as single objects. As we see in Figure 3.4, complex objects can be built up from smaller objects and their relations. The hierarchical structure is unrestrictive, and thus it is possible for any pair of nodes to have a relation connecting them, or for a single node to be a constituent of multiple complex nodes.

Figure 3.4: A Hierarchical Semantic Network

In Figure 3.4 we can also see the attribute nodes (denoted by ellipses). Any object or relation can have multiple attribute nodes, and attribute nodes can also be complex nodes.

One additional, but very important, feature of ASKNet's semantic network is that every link in the network is assigned a value between 0 and 1. This value represents the confidence or the salience of the link. It can be determined by various means, such as the confidence we have in the source of our information, or the number of different sources which have repeated a particular relation. In practice, the value (or weight) of a link is set by the update algorithm (see Section 4.2.1). Weights can also be assigned to attribute links.
3.2.1 Temporality and Other Issues
The network formalism presented here is robust and flexible; however, it does not deal with all types of information. For example, there is nothing in the network to deal with the temporal nature of information. There is no way for ASKNet to know whether a particular piece of information is meant to be true for a certain time period, or indefinitely true. For our current purposes, we will not attempt to expand the network formalism to deal with issues such as temporality. We recognise that there will be some limitations to the types of information that can be represented by ASKNet; however, the current definition will be sufficient for the functionality required in this thesis.
3.2.2 Network Implementation
ASKNet's internal semantic network is implemented in Java. It is designed to allow maximum flexibility in node type and hierarchy. A simplified UML class diagram for the network is given in Figure 3.5 (much of the detail has been omitted to increase the saliency of the more important features). In this section we will explore the details of each of the classes of the network, explaining the functionality and design decisions of each class and how it interacts with the other parts of the network.
SemNode
SemNodes come in 4 distinct types (ObjNode, RelNode, AttNode and ParentNode). Each node type has a distinct activation threshold, but all of the node types are implemented almost identically. The primary difference between the node types is the way they are treated by the SemNet class.
• ObjNodes represent atomic semantic objects. They have a special field called neType which is set if the named entity recogniser (see Section 2.1.2) provided it with a label. For example, if a node represents a person, its neType field would be set to "per".

Figure 3.5: UML class diagram of ASKNet's semantic network architecture

• RelNodes represent relations between objects or concepts. While some semantic networks simply label their links with the names of relations, ASKNet uses fully implemented nodes for this purpose, primarily so that relations themselves can have attributes and adjustable firing potentials, and also so that a single relation can have more than one label. All RelNodes have an agent and target link which provide the direction of the relation; at least one of these links must be instantiated for a RelNode to be created. This ensures that all relations
must have a direction.

• AttNodes represent attributes of an object or concept. They are essentially simplifications of the "attribute of" relationship. Creating a distinct node type for attributes reduces unnecessary relations in the network, improving performance and making the networks more intuitive.

• ParentNodes represent complex semantic concepts made up of two or more nodes. All of the members of the complex concept are labelled as the concept's children, and each node has a link to any ParentNodes of which it is a member. ParentNodes are often vacuous parents, which means that they are unlabeled and provide no additional information beyond the grouping of their constituent nodes. ParentNodes also have an allFire mode, wherein all of their children must have fired before they are allowed to fire. This prevents one constituent of a complex concept from causing the entire concept to fire.

All nodes have a unique id assigned to them at the time of their creation, which indicates the document from which they originated. A set of labels allows many labels to be added to a single node, which is necessary as the same concept is often referred to in a variety of ways. Each node has arrays of links to the nodes with which it is connected; these are stored based on the type of node linked. All nodes contain a link to the monitor processes for their network so that they can easily report their status and events such as firing, label changes or deletion requests. Finally, each node contains a link to the SemFire object for the network, which processes firing requests and controls firing sequences for the nodes.

SemNodes can receive a pulse of activation from another node or from the network; this increases the potential variable. If this causes the potential to exceed the firingLevel, the node sends a request to SemFire to be fired (see Section 4.2.3 for more details). Nodes can also be merged; merging copies all of one node's links and labels to another node and then deletes the first node. Deleting a node sends messages to all connected nodes to delete the appropriate links from both ends, so that no "dead" links can exist in the network.
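To make this description concrete, the following is a minimal Java sketch of a SemNode. Every class, field and method name here is an illustrative assumption based on the description above, not ASKNet's actual source.

    // Simplified sketch of a SemNode; names are assumptions, not the thesis code.
    abstract class SemNodeSketch {
        final String id;                                   // unique id; encodes the source document
        final java.util.Set<String> labels = new java.util.HashSet<>(); // one concept, many names
        String neType;                                     // NE type for ObjNodes, e.g. "per"
        double potential;                                  // accumulated activation
        double firingLevel;                                // firing threshold; varies by node type
        final SemFire semFire;                             // central firing controller
        final SemMonitor monitor;                          // receives status reports (e.g. firing)

        SemNodeSketch(String id, SemFire fire, SemMonitor mon) {
            this.id = id; this.semFire = fire; this.monitor = mon;
        }

        /** Receive a pulse of activation; request firing once the threshold is exceeded. */
        void receiveActivation(double amount) {
            potential += amount;
            if (potential > firingLevel) {
                semFire.requestFire(this);                 // SemFire decides when firing happens
            }
        }

        interface SemFire    { void requestFire(SemNodeSketch n); }
        interface SemMonitor { void notifyFired(SemNodeSketch n); }
    }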
SemLink
SemLinks form the links between nodes. Each link is assigned a strength when it is created, which can either represent the certainty of the link (i.e., how confident the system is that this link exists in the real world) or its salience (i.e., how often this link has been repeated in the input in comparison with other links of a similar type). This can be increased or decreased by the network as more information is gained.
SemNet
The SemNet class is the interface into the semantic network. All of the functionality of the network is available through SemNet's methods. SemNet is used to add, remove, retrieve and manipulate nodes. It also indexes the nodes and contains the update algorithm (see Section 4.2.1).

SemNet must be able to retrieve nodes based on both their unique ID and their label. Since the same label may be used in many nodes, this is achieved with a pair of hashtables. The first hashtable maps a string into a list of the IDs of all nodes which have that string as one of their labels. The second hashtable maps an ID to its corresponding node. The combination of these two hashtables allows SemNet to efficiently retrieve nodes based on either their label or their ID.

SemNet's print() method prints the contents of the network in GraphViz [Gansner and North, 2000] format so that the graph can be displayed visually for manual debugging and evaluation. This is done by calling the print() method of every node in the network. Each node then prints out its own details and the details of all of its links in a format which can be turned into a graphical representation by GraphViz. The majority of the diagrams in this thesis were created in this manner. For examples, see Figures 3.7 and 3.10.

The print() method is very rarely called on an entire network, for the simple reason that the resultant graphical representation would be far too large. For this reason, SemNet has a printFired() method which only prints nodes which have fired since the last time the network was reset.
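As a rough illustration of the dual-hashtable index described above (the class and field names are assumptions, reusing the SemNodeSketch class from the earlier sketch):

    // Sketch of SemNet's two-table node index; names are illustrative.
    final class SemNetIndexSketch {
        // label -> IDs of all nodes carrying that label
        private final java.util.Hashtable<String, java.util.List<String>> idsByLabel =
            new java.util.Hashtable<>();
        // ID -> the node itself
        private final java.util.Hashtable<String, SemNodeSketch> nodeById =
            new java.util.Hashtable<>();

        void add(SemNodeSketch n) {
            nodeById.put(n.id, n);
            for (String label : n.labels) {
                idsByLabel.computeIfAbsent(label, k -> new java.util.ArrayList<>()).add(n.id);
            }
        }

        SemNodeSketch byId(String id) { return nodeById.get(id); }

        /** All nodes having the given label; both lookups are hash-based. */
        java.util.List<SemNodeSketch> byLabel(String label) {
            java.util.List<SemNodeSketch> result = new java.util.ArrayList<>();
            for (String id : idsByLabel.getOrDefault(label, java.util.List.of())) {
                result.add(nodeById.get(id));
            }
            return result;
        }
    }

Keeping the label index in terms of IDs rather than node objects means a merged or deleted node only needs to be repaired in one place, the ID table.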
SemMonitor
SemMonitor receives status reports from every node in the network; this can be used for debugging purposes but it is also used to track which nodes fired in a given sequence of activation. All nodes have a link to the SemMonitor object for the network and are required to notify SemMonitor every time they fire.
SemFire
SemFire is structured similarly to SemMonitor in that every node in the network contains a link to the single SemFire object. When a node wishes to fire, it notifies SemFire. SemFire keeps a list of all requests and permits nodes to fire in an order specified by the firing algorithm (see Section 4.2.3).
3.3 Parser Filtering
The output of ASKNet's parsing and analysis tools must be manipulated to turn each sentence's representation into a semantic network update which can be used by the update algorithm (see Section 4.2.1). The module which performs this data manipulation is called the parser filter. The parser filter is designed in a modular fashion so that when one of the parsing and analysis tools changes, the parser filter can be easily replaced or altered without affecting the rest of ASKNet. Two parser filters have been developed for ASKNet: the first filters the output of the C&C parser, and the second filters the output of Boxer.
3.3.1 C&C Parser Filter
The first filter developed for the system was designed to work with the C&C parser's grammatical relations output. (The parser filter described here is compatible with an older beta version of the grammatical relations output from the parser, which explains why the grammatical relations in this example are different to those in [Clark and Curran, 2007].) The filter is essentially a set of rules mapping relations output by the parser to network features. For example, as can be seen in Table 3.1, upon encountering the output vmod(word1, word2), the filter turns the node for word2 into an attribute for the relational node word1 (if either of the nodes does not exist it is created; if the node for word1 is not already a relNode it is turned into one). Some of the rules require some complexity to ensure that links, especially those between parental nodes, are preserved during the application of various rules. There are also a few "ad hoc" rules created to deal properly with phenomena such as conjunctions and disjunctions. The order in which rules are applied also greatly affects the performance of this filter.
Parser Output          Rule
comp(word1, word2)     Merge Node1 and Node2
vmod(word1, word2)     Node2 becomes attNode; Node1 becomes relNode; Node2 becomes attribute of Node1; parents of Node2 become parents of Node1
ncsubj(word1, word2)   Node1 becomes relNode; subject link of Node1 set to Node2
dobj(word1, word2)     Node1 becomes relNode; object link of Node1 points to Node2

Table 3.1: A sample of the rules used by the C&C parser filter

The C&C parser filter is no longer used in ASKNet, but it is a good example of the type of filter that would need to be created if we chose to change the parsing and data analysis tools. The grammatical relations output by the C&C parser are radically different to the output of Boxer, which is currently used, but with the creation of a simple filter, it can be fully integrated into ASKNet with little difficulty.
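For illustration, a filter of this kind could be organised as a dispatch over grammatical relation names. The interface below is an assumption made for this sketch, and the real rules were considerably richer (parent-link preservation, conjunction handling, significant rule ordering):

    // Sketch of rule dispatch in a C&C-style parser filter; all names are illustrative.
    final class CCFilterSketch {
        interface NetOps {                                 // operations the rules need on the network
            void merge(String a, String b);
            void makeRelNode(String id);
            void makeAttNode(String id);
            void setAttribute(String relId, String attId);
            void setSubject(String relId, String subjId);
            void setObject(String relId, String objId);
        }

        static void applyRule(String relation, String word1, String word2, NetOps net) {
            switch (relation) {
                case "comp"   -> net.merge(word1, word2);
                case "vmod"   -> { net.makeRelNode(word1); net.makeAttNode(word2);
                                   net.setAttribute(word1, word2); }
                case "ncsubj" -> { net.makeRelNode(word1); net.setSubject(word1, word2); }
                case "dobj"   -> { net.makeRelNode(word1); net.setObject(word1, word2); }
                default       -> { /* remaining relations handled by further rules */ }
            }
        }
    }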
3.3.2 Boxer Filter
The Boxer filter takes advantage of the recursive nature of Boxer's prolog output. The program is written recursively, handling one predicate at a time and continually calling itself on any sub-predicates. Like the C&C parser filter, the Boxer filter is essentially a set of rules mapping predicates to network fragments (see Figure 3.8 for a simple example). However, with the output of Boxer, the predicates are nested recursively, so the filter must deal with them recursively. Table 3.2 shows the rules for a number of Boxer's prolog predicates. Several of the rules used by the Boxer filter are context sensitive (i.e., if a predicate tries to label a node which is in one of its parent nodes, it is treated as an attribute instead). There are also a number of "special case" rules, such as those shown in Table 3.2 where the predicate is 'agent' or 'event'.
nmod(Vinken_2, Pierre_1)
nmod(years_5, 61_4)
comp(old_6, years_5)
ncsubj(old_6, Vinken_2)
detmod(board_11, the_10)
dobj(join_9, board_11)
nmod(director_15, nonexecutive_14)
detmod(director_15, a_13)
comp(as_12, director_15)
vmod(join_9, as_12)
comp(Nov._16, 29_17)
vmod(join_9, Nov._16)
xcomp(will_8, join_9)
ncsubj(will_8, Vinken_2)
ncsubj(join_9, Vinken_2)

Pierre|NNP|N/N  Vinken|NNP|N  ,|,|,  61|CD|N/N  years|NNS|N  old|JJ|(S[adj]\NP)\NP  ,|,|,  will|MD|(S[dcl]\NP)/(S[b]\NP)  join|VB|(S[b]\NP)/NP  the|DT|NP[nb]/N  board|NN|N  as|IN|((S\NP)\(S\NP))/NP  a|DT|NP[nb]/N  nonexecutive|JJ|N/N  director|NN|N  Nov.|NNP|((S\NP)\(S\NP))/N[num]  29|CD|N[num]  .|.|.

nmod(Vinken_2, Mr._1)
nmod(N.V._7, Elsevier_6)
nmod(group_12, publishing_11)
nmod(group_12, Dutch_10)
detmod(group_12, the_9)
conj(,_8, group_12)
conj(,_8, N.V._7)
comp(of_5, group_12)
comp(of_5, N.V._7)
nmod(chairman_4, of_5)
dobj(is_3, chairman_4)
ncsubj(is_3, Vinken_2)

Mr.|NNP|N/N  Vinken|NNP|N  is|VBZ|(S[dcl]\NP)/NP  chairman|NN|N  of|IN|(NP\NP)/NP  Elsevier|NNP|N/N  N.V.|NNP|N  ,|,|,  the|DT|NP[nb]/N  Dutch|NNP|N/N  publishing|VBG|N/N  group|NN|N  .|.|.

Figure 3.6: Sample output from the C&C parser for the text "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group".
Figure 3.7: The C&C parser filter output for the input given in Figure 3.6
Figure 3.8: A simple example of the Boxer DRS (left) and resulting ASKNet network fragment (right) for the sentence "John scored a great goal"

The Boxer filter continues calling itself recursively, creating sub-networks within parent nodes (this results in the hierarchical nature of the network) until it has processed the entire prolog drs structure and we are left with a semantic network which represents all of the information in the discourse.
Prolog Predicate       Rule
drs(A, B)              Create one node for each of the discourse referents in A; recursively call filter on B
prop(x, B)             Recursively call filter on B; set x as the parent node for the network fragment created by B
named(x, text, type)   Set x to named entity type type; give node x label text
pred(text, x)          Give node x label text
pred('event', x)       Set x to type relNode
pred(text, [x, y])     Create relNode z with label text; set subject link of z to x; set object link of z to y
pred('agent', [x, y])  Set agent link of y to x
eq(x, y)               Create relNode z with label is; set subject link of z to x; set object link of z to y
or(A, B)               Create parentNode x with label or; create unlabeled parentNodes y and z; set x as parent of y and z; recursively call filter on A and B, setting y and z as the parent nodes for the network fragments they create

Table 3.2: A sample of the rules used by the Boxer filter. Capital letters represent Prolog statements; lower case letters represent Prolog variables.
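The recursion can be pictured as follows. The Pred representation and the helper methods are assumptions made for this sketch, not Boxer's or ASKNet's actual interfaces:

    // Sketch of the Boxer filter's recursion over nested DRS predicates; names are illustrative.
    final class BoxerFilterSketch {
        /** A parsed Prolog predicate: functor, nested predicates, and atomic arguments. */
        record Pred(String functor, java.util.List<Pred> subPreds, java.util.List<String> args) {}

        void filter(Pred p, String parentId) {
            switch (p.functor()) {
                case "drs":                                // one node per referent, then recurse on body
                    for (String referent : p.args()) createObjNode(referent, parentId);
                    for (Pred sub : p.subPreds()) filter(sub, parentId);
                    break;
                case "prop":                               // sub-DRS grouped under a parent node
                    for (Pred sub : p.subPreds()) filter(sub, p.args().get(0));
                    break;
                case "pred":
                    if (p.args().size() == 2) labelNode(p.args().get(1), p.args().get(0));
                    else createRelNode(p.args());          // two-place predicates become relNodes
                    break;
                default:                                   // named, eq, or, ... handled by similar rules
                    for (Pred sub : p.subPreds()) filter(sub, parentId);
            }
        }

        void createObjNode(String id, String parent) { /* add node; record its parent */ }
        void labelNode(String id, String label)      { /* attach label (or attribute, in context) */ }
        void createRelNode(java.util.List<String> a) { /* relNode with subject/object links */ }
    }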
smerge(
  drs(
    [[1001, 1002]:x0, [1004, 1005]:x1, [1006]:x2, [1010]:x3, [1009]:x4, [1013]:x5, [1016]:x6],
    [ [2001]:named(x0, mr, ttl),
      [1002, 2002]:named(x0, vinken, per),
      [1001]:named(x0, pierre, per),
      [1004]:card(x1, 61, ge),
      [1005]:pred(year, [x1]),
      [1006]:prop(x2, drs([], [[1006]:pred(old, [x0])])),
      [1006]:pred(rel, [x2, x1]),
      []:pred(event, [x2]),
      [1011]:pred(board, [x3]),
      [1016, 1017]:timex(x6, date([]:'XXXX', [1016]:'11', [1017]:'29')),
      [1009]:pred(join, [x4]),
      [1009]:pred(agent, [x4, x0]),
      [1009]:pred(patient, [x4, x3]),
      [1014]:pred(nonexecutive, [x5]),
      [1015]:pred(director, [x5]),
      [1012]:pred(as, [x4, x5]),
      [1016]:pred(rel, [x4, x6]),
      []:pred(event, [x4])]),
  drs(
    [[2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012]:x7, [2006, 2007]:x8,
     [2009]:x9, [2011]:x10, [2003]:x11],
    [ [2004]:pred(chairman, [x7]),
      [2006, 2007]:named(x8, elsevier_nv, loc),
      [2005]:pred(of, [x7, x8]),
      [2010]:pred(dutch, [x9]),
      [2011]:pred(publishing, [x10]),
      []:pred(nn, [x10, x9]),
      [2012]:pred(group, [x9]),
      [2005]:pred(of, [x7, x9]),
      [2003]:prop(x11, drs([], [[2003]:eq(x0, x7)])),
      []:pred(event, [x11]) ])).

Figure 3.9: Sample output from Boxer for the text "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group".
Figure 3.10: Boxer filter output for the input given in Figure 3.9
Chapter 4

Information Integration

The power of ASKNet comes from its ability to integrate information from various sources into a single cohesive representation. This is the main goal of the update algorithm (see Section 4.2.1). For an example of the type of additional information which can be gained by information integration, consider the set of network fragments shown in Figure 4.1. Each fragment is taken from a different source, and without being integrated into a cohesive resource the fragments are of little value, particularly as they are likely to be scattered among many other fragments. However, when we integrate the fragments into a single network by mapping co-referent nodes together, as in Figure 4.2, it becomes apparent that there is a connection between Chemical F and Disease B that would not have been apparent from the information given in each fragment separately. In Figure 4.2 there is a path connecting Chemical F and Disease B even though they never appeared together within a document. Analysing such paths could lead to the discovery of novel relationships (this is discussed further in Section 7.1.1).
Figure 4.1: A collection of network fragments taken from various sources, including news, biomedical, geographical and political information

This simple example shows the potential power that can be gained by integrating networks into cohesive units. We will revisit this example in later chapters when we explore the uses of the ASKNet system, and at that time we will see how beneficial these connections can be.
Information Integration and GOFAI
Some might argue that information integration in the manner outlined in this thesis is ai-complete, requiring "Good Old Fashioned ai" (GOFAI). While that may be the case if one were arguing about a system's ability to extract all possible information within a corpus, it is certainly possible to integrate a large amount of information without the need for GOFAI. With the vast quantities of natural language text readily available through the internet, a system could integrate only a small percentage of the information it received and still produce a resource that is useful to the scientific community. We do not claim that the methodologies outlined in this thesis will ever be able to extract
all possible information from a corpus, but we will attempt to show that they can extract and integrate enough information to create a high quality, large scale, useful resource.

Figure 4.2: An integrated network created from the fragments in Figure 4.1
Existing Information Integration Systems
Most of the research on information integration has been done in the database paradigm, using string similarity measurements to align database fields [Bilenko et al., 2003]. Research on natural language information integration has mostly centred on document clustering based on attributes gained from pattern matching [Wan et al., 2005]. The majority of automatically created semantic resources, such as those referenced in Section 1.2.2, have only the simplest forms of information integration. This limits their ability to create large scale resources from diverse text sources, as it limits the system's usefulness when processing data other than dictionary or encyclopedia entries (which are explicit enough to be processed without needing integration). One particularly interesting line of research is the work of Guha and Garg [Guha and Garg, 2004]. They propose a search engine which clusters document results
which relate to a particular person. The proposed methodology is to create binary first order logic predicates (e.g., first name(x, Bob), works for(x, IBM)) which can be treated as attributes of a person, and then to use those attributes to cluster documents about one particular individual. This amounts to a simplified version of the problem ASKNet attempts to solve, using a simplified network and limiting the domain to personal information; the results, however, are promising.
4.1 Spreading Activation
Spreading activation is a common feature in connectionist models of knowledge and reasoning, and is usually connected with the neural network paradigm. Spreading activation in neural networks is the process by which activation can spread from one node in the network to all adjacent nodes in a similar manner to the firing of a neurone in the human brain. Nodes in a spreading activation neural network receive activation from their surrounding nodes, and if the total amount of accumulated activation exceeds some threshold, that node then fires, sending its activation to all nodes to which it is connected. The amount of activation sent between any two nodes is proportional to the strength of the link between those nodes with respect to the strength of all other links connected to the firing node. The activation function used in ASKNet is given in (4.1). Spreading activation algorithms are, by nature, localised algorithms. Due to the signal attenuation parameter (given as βx,y in Equation 4.1), it is guaranteed that the signal can only travel a set distance from the originating node. Assuming the network in which they are implemented is of sufficient size, firing any one node affects only a small percentage of the nodes (i.e., those strongly linked to the original firing node), and leaves the remainder of the network unaffected.
activation_{i,j} = α_i · weight_{i,j} / Σ_{k ∈ link(i), k ≠ j} β_{i,k} · weight_{i,k}        (4.1)

Symbol definitions:
α_x: firing variable, which fluctuates depending on node type
activation_{x,y}: amount of activation sent from node x to node y when node x fires
weight_{x,y}: strength of the link between node x and node y
β_{x,y}: signal attenuation on link (x, y), 0 < β < 1; determines the amount of activation that is lost along each link, and fluctuates depending on link type
link(x): the set of nodes y such that link(x, y) exists
link(x, y): the directed link from node x to node y
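In code, Equation 4.1 amounts to a weighted normalisation over the firing node's other links. The following Java sketch uses assumed, simplified types rather than ASKNet's real classes:

    // Sketch of Equation 4.1; the Link type and accessor names are assumptions.
    final class ActivationSketch {
        record Link(double weight, double beta) {}         // beta = signal attenuation on the link

        /** activation_{i,j} = alpha_i * weight_{i,j} / sum over k != j of beta_{i,k} * weight_{i,k} */
        static double activation(double alphaI, Link toJ, java.util.List<Link> linksOfI) {
            double denominator = 0.0;
            for (Link l : linksOfI) {
                if (l != toJ) denominator += l.beta() * l.weight();
            }
            return alphaI * toJ.weight() / denominator;    // assumes node i has at least one other link
        }
    }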
4.1.1 History of Spreading Activation
The discovery that human memory is organised semantically, and that concepts which are semantically related can excite one another, came from the field of psycholinguistics. Meyer and Schvaneveldt [Meyer and Schvaneveldt, 1971] showed that when participants were asked to classify pairs of words, having a pair of words which were semantically related increased both the speed and the accuracy of the classification. They hypothesised that when one word is retrieved from memory, this causes other semantically related words to be primed, and thus retrieval of those words will be facilitated. The formal theory of spreading activation can be traced back to the work of Quillian [Quillian, 1969], who proposed a formal model for spreading activation in a semantic network. This early theory was little more than a marker passing method, where the connection between any two nodes was found by passing markers to all adjacent nodes until two markers met, similar to a breadth first search. It was the work of Collins and Loftus [Collins and Loftus, 1975] that added the main features of what we today consider spreading activation, such as signal attenuation,
summation of activation from input nodes, and firing thresholds. Despite the obvious theoretical advantages of Collins and Loftus' model, due to computational constraints much of the work which has used the title of "spreading activation" has very rarely used the full model. Many researchers used a simplified marker passing model [Hirst, 1987], or used a smaller or simplified network, because the manual creation of semantic networks that fit Collins and Loftus' model was too time consuming [Crestani, 1997, Preece, 1981]. The application of spreading activation to information retrieval gained a great deal of support in the 1980s and early 1990s [Salton and Buckley, 1988, Kjeldsen and Cohen, 1988, Charniak and Goldman, 1993]; however, the difficulty of manually creating networks, combined with the computational intractability of automatically creating networks, caused most researchers to abandon this course [Preece, 1981]. In the past few years there has been an increase in the number of nlp projects utilising spreading activation on resources such as WordNet and Wikipedia [Wang et al., 2008, Nastase, 2008].
4.2 Spreading Activation in ASKNet
The semantic network created for ASKNet has been designed specifically for use with spreading activation. Each node maintains its own activation level and threshold, and can independently send activation to all surrounding nodes (This can be done with or without regard to the direction of the links. For the purposes of this thesis, unless stated otherwise, all activation spread disregards link direction). Monitor processes control the firing patterns and record the order and frequency of node firing. Each of the various types of nodes (object, relation, parent, attribute, etc.) can
have its own firing threshold and even its own firing algorithm. Each node type has a global signal attenuation value that controls the percentage of the activation that a node of this type passes on to each of its neighbours when firing. This mirrors natural neural networks, and also ensures that the network will always eventually return to a stable state: with each successive firing some activation is lost, and thus firing cannot continue indefinitely.

Spreading activation is by nature a parallel process; however, it is implemented sequentially in ASKNet for purely computational reasons. While future work may allow parallelisation of the algorithm, the current system has been designed to ensure that the sequential nature of the processing does not adversely affect the outcome.

Two separate implementations of the firing algorithm have been created. The first is a pulsing algorithm, where each node which is prepared to fire at a given stage fires, and further activation is suspended until all nodes have finished firing. This is analogous to having the nodes fire simultaneously on set pulses of time. The second implementation uses a priority queue to allow the nodes with the greatest amount of activation to fire first (for more detailed information see Section 4.2.3). The second algorithm is more analogous to the asynchronous firing of neurones in the human brain; both have been fully implemented, and the user can choose which firing method they wish the system to use.
4.2.1 Update Algorithm: Example
The update algorithm takes a smaller network or network fragment (the update network) and integrates it into a larger network (the main network). Essentially the same algorithm is used for updating at the sentence level and at the document level. When updating at the sentence level, the update network represents the next sentence in the document and the main network represents all previous sentences in the document. When updating at the document level, the update network represents the document, and the main network represents all of the documents that have been processed by the system.

This section attempts to convey an understanding of the update algorithm by walking through a simple example. The example uses a simplified network to avoid unnecessary details. The numbers used here are not representative of the values received in the actual ASKNet system; the changes in values at each update have been increased so that we can see a change in the network after only one iteration. In normal performance, changes of this magnitude would require multiple iterations of the algorithm.

For this example, we consider the update shown in Figure 4.3, where we are attempting to update a network containing information on United States politics, along with a few "red herrings", with an update network formed from the sentence "Bush beat Gore to the White House". All nodes in this example will be referred to by their ID field.

Figure 4.3: An example main network containing information about United States politics, writers and mathematicians being updated by a network fragment formed from the sentence "Bush beat Gore to the White House."

Initially, all named entity nodes from the update network are matched with any similar nodes from the main network. (The algorithm is not restricted to named entity nodes, and could be computed for any or all node types; for clarity of explanation, we restrict this example to named entity nodes.) The nodes are compared on simple similarity characteristics, as computed in Equation 4.2. A similarity score is then calculated for each node pairing, producing the matrix shown in Table 4.1. For the purposes of this example, we will assume that initially all of the similarity scores for disputed nodes are equal.

Table 4.1: Similarity Matrix: initial scoring

      georgebush   johnbush   algore   gorevidal   whitehouse
bu    0.5          0.5        -        -           -
go    -            -          0.5      0.5         -
wh    -            -          -        -           0.5

Once the initial scoring is completed, the algorithm chooses a node from the update network (in this case let us choose bu) and attempts to refine its similarity scores. In order to do this, it first puts activation into the bu node, then allows the network to
fire. This results in activation accumulating in the go and wh nodes. The amount of activation in each node will depend on the structure of the network and the strength of the various links. For the purposes of this example, it is sufficient to see that the go and wh nodes will receive some activation.

score_{i,j} = α · NEBool_{i,j} + β · |labels_i ∩ labels_j| / |labels_i ∪ labels_j|        (4.2)

Symbol definitions:
score_{x,y}: the initial similarity score computed for the node pair (x, y)
α: weighting given to named entity similarity
NEBool_{x,y}: a boolean set to 1 if x and y have the same NE type, otherwise set to 0
β: weighting given to label similarity
labels_x: the set of textual labels of node x

Once the network has settled into a stable state, the activation from all update network nodes, except the node being refined (in this case bu), is transferred to the corresponding nodes in the main network, as seen in Figure 4.4. The activation for nodes with more than one potential mapping is split among all potential main network candidate nodes based on the current similarity matrix score. In this example, since the similarity matrix scores (go, algore) and (go, gorevidal) are equal, the activation from go is split evenly.

The main network is then allowed to fire, and the transferred activation spreads throughout the main network. In our example, some of the activation from the algore and whitehouse nodes will reach the georgebush node, the activation from the gorevidal node will not reach any named entity nodes, and the johnbush node will receive no activation at all.

The algorithm can now refine the similarity scores based on the activation received. Since the georgebush node received some activation, we will increase the similarity score for (bu, georgebush) slightly. The johnbush node did not receive any activation at all, and so we will decrease the similarity score for (bu, johnbush). The resulting similarity matrix is shown in Table 4.2.
Figure 4.4: The activation from the update network is transferred to the main network. For nodes with more than one potential mapping, the activation is split based on the current similarity matrix score.

Table 4.2: Similarity Matrix: after refining scores for the bu node

      georgebush   johnbush   algore   gorevidal   whitehouse
bu    0.6          0.25       -        -           -
go    -            -          0.5      0.5         -
wh    -            -          -        -           0.5

The algorithm has now used semantic information to refine the scores for the mapping of the bu node. Since the nodes to which bu and georgebush are connected are similar, we have an increased confidence that they refer to the same real-world entity, and since the nodes to which bu and johnbush are connected share no similarity, we have decreased our confidence that they refer to the same real-world entity.
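For concreteness, a minimal sketch of the initial scoring step (Equation 4.2); the parameter names follow the symbol table above, and everything else is an assumption:

    // Sketch of the initial similarity score of Equation 4.2; names are illustrative.
    static double initialScore(java.util.Set<String> labelsI, java.util.Set<String> labelsJ,
                               boolean sameNeType, double alpha, double beta) {
        java.util.Set<String> intersection = new java.util.HashSet<>(labelsI);
        intersection.retainAll(labelsJ);                   // |labels_i ∩ labels_j|
        java.util.Set<String> union = new java.util.HashSet<>(labelsI);
        union.addAll(labelsJ);                             // |labels_i ∪ labels_j|
        double overlap = union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
        return alpha * (sameNeType ? 1.0 : 0.0) + beta * overlap;
    }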
The algorithm then attempts to refine the scores for another node in the update network (let us choose go). The process this time is similar; however, rather than the activation from bu being transferred evenly to the potential main network matches, as the activation from go was in the previous iteration, it is instead transferred more heavily to the georgebush node, since the similarity score for (bu, georgebush) is now higher than that of (bu, johnbush), as depicted in Figure 4.5. Increasing the activation in the georgebush node means that the algore node will receive more activation, and thus we will increase its score more than we would have, had we not refined the scores for the bu node. Thus, as we refine, the similarity matrix becomes more refined with each iteration. The similarity matrix after refining the scores for the bu and go nodes can be seen in Table 4.3.

Table 4.3: Similarity Matrix: after refining scores for the bu and go nodes

      georgebush   johnbush   algore   gorevidal   whitehouse
bu    0.6          0.25       -        -           -
go    -            -          0.65     0.25        -
wh    -            -          -        -           0.5

Refining the scores for the bu and go nodes means that when we attempt to refine the final wh node, less activation is wasted in the activation transfer from update network to main network (i.e., the georgebush and algore nodes get much more activation and the johnbush and gorevidal nodes get less). Therefore when the main network is fired, the whitehouse node receives more activation than it would have if we had not refined the other scores. Thus we increase its score more, resulting in the similarity matrix given in Table 4.4.

After one iteration of the update algorithm, the similarity matrix has been improved. As we saw already in the first iteration, the increased confidence in one mapping leads to large increases in confidence in future mappings (this is in line with intuition: as we gain increased confidence in one area, we can use that confidence to
make bolder predictions in related areas). Therefore the update algorithm becomes a self-reinforcing loop, allowing the similarity scores to converge.

Figure 4.5: The activation from the update network is transferred to the main network. The activation from the bu node is split unevenly, with a higher percentage going to georgebush than johnbush due to our updated similarity scores.

Table 4.4: Similarity Matrix: after one iteration of the update algorithm

      georgebush   johnbush   algore   gorevidal   whitehouse
bu    0.6          0.25       -        -           -
go    -            -          0.65     0.25        -
wh    -            -          -        -           0.7

The update algorithm is run multiple times on a single update, or until all update nodes have a similarity score above a set threshold (called the mapping threshold).
When the algorithm terminates, any pairs with a similarity score above the mapping threshold are mapped together, and all non-mapped nodes are simply placed into the main network. Eventually, our example should map wh to whitehouse, bu to georgebush and go to algore, resulting in the updated network shown in Figure 4.6.
Figure 4.6: Network resulting from application of the update algorithm

This is a simplified example, and many features of the algorithm have been abstracted away. However, it gives an overall understanding of how the algorithm works, and hopefully provides an insight into the intuition behind the update algorithm. Further details of the particulars are provided in the next section.
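Abstracting away the details of the worked example, the overall control flow of the update algorithm can be sketched as below. Every method name here is an illustrative assumption, not the NetMerger interface described in the next section:

    // Skeleton of the update algorithm's control flow; names are illustrative.
    abstract class UpdateLoopSketch {
        abstract java.util.Set<String> updateNodes();                     // nodes to be mapped
        abstract void refine(String node);                                // one spreading activation pass
        abstract boolean allAboveThreshold(double mappingThreshold);      // convergence test
        abstract java.util.Map<String, String> pairsAbove(double thresh); // update node -> main node
        abstract void map(String updateNode, String mainNode);            // merge the two nodes

        void run(double mappingThreshold, int maxIterations) {
            for (int i = 0; i < maxIterations && !allAboveThreshold(mappingThreshold); i++) {
                for (String node : updateNodes()) {
                    refine(node);                          // each refinement sharpens later ones
                }
            }
            pairsAbove(mappingThreshold).forEach(this::map); // unmapped nodes are simply added
        }
    }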
4.2.2 Update Algorithm: Implementation
The update algorithm, like the rest of ASKNet, is implemented in Java. The majority of the algorithm is implemented by the NetMerger class, which takes two networks as parameters and returns a list of nodes which should be mapped together. The update
algorithm was designed to be executable on very large networks, and thus required data structures and sub-algorithms that would ensure that the overall algorithm would be efficient in terms of both time and memory usage. In this section we will explain the three main classes which perform the update algorithm, and the data structures which they use to implement the algorithm efficiently.
NetMerger
The NetMerger class implements the actual update algorithm as described in Section 4.2.1. The mergeDiscourse method performs the updates at the sentence level, mapping each sentence's drs into a single update network. The mergeNetworks method performs an almost identical algorithm to merge the update networks into the main network. The two methods are implemented separately so that different firing parameters can be used for the two types of updates, and also so that, in future, additional features such as improved anaphora resolution could be implemented in the mergeDiscourse method.

The merge methods do not actually perform the mappings; rather, they calculate the similarity scores and return listings of node pairs which should be mapped together. The map method then performs the actual mapping, calling appropriate methods from the SemNet class.

The NetMerger class never directly manipulates network nodes. It works exclusively with node IDs in string form, and interfaces with the SemNet class to perform all necessary functions. This results in increased modularity of code, and also simplifies the writing and debugging of the algorithms.
ScoreMatrix
With networks potentially reaching millions of nodes, it is obviously inefficient to calculate and store the similarity scores for all possible node pairs, particularly as the vast majority of the scores would never be updated. For this reason, the scoreMatrix class is implemented as a specialised sparse matrix. The scoresTable hashtable maps the ID of each node in the update network to a hashset of MapScore objects, which represents all of the similarity scores in its row. This allows calculation of the relevant elements of the similarity matrix without making it necessary to create objects for similarity scores which never get updated. If a score is never updated, it is never created, thus conserving memory. The use of hashtables and hashsets also allows efficient lookup of individual similarity scores, thus allowing individual scores to be calculated without the need to search through entire rows or columns of the matrix. Figure 4.7 shows an example of the data structures used.
Figure 4.7: An example similarity matrix and the corresponding ScoreMatrix data structure

MapScore
MapScore objects represent a similarity score for a pair of nodes. The update and main network nodes are differentiated, as several of the functions which use MapScore are designed around the assumption that the update algorithms will normally be performed between one very large network and one relatively small network. Thus differentiating the networks to which the nodes belong is very important.
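As a rough sketch of how the scoresTable and MapScore might fit together (the class, field and method names here are assumptions, not ASKNet's actual code):

    // Sketch of the sparse ScoreMatrix; names are illustrative.
    final class ScoreMatrixSketch {
        /** A similarity score for one (update node, main node) pair. */
        record MapScore(String updateId, String mainId, double score) {}

        // update-node ID -> the MapScore objects in its row; absent scores are never materialised
        private final java.util.Hashtable<String, java.util.HashSet<MapScore>> scoresTable =
            new java.util.Hashtable<>();

        void put(String updateId, String mainId, double score) {
            java.util.HashSet<MapScore> row =
                scoresTable.computeIfAbsent(updateId, k -> new java.util.HashSet<>());
            row.removeIf(s -> s.mainId().equals(mainId));  // replace any previous score for the pair
            row.add(new MapScore(updateId, mainId, score));
        }

        /** Scores that were never created are reported as 0 without being stored. */
        double get(String updateId, String mainId) {
            for (MapScore s : scoresTable.getOrDefault(updateId, new java.util.HashSet<>())) {
                if (s.mainId().equals(mainId)) return s.score();
            }
            return 0.0;
        }
    }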
4.2.3 Firing Algorithms
Ultimately each node in a neural network should act independently, firing whenever it receives the appropriate amount of activation. This asynchronous communication between nodes is more directly analogous to the workings of the human brain, and most spreading activation theories assume a completely asynchronous model. In practice, it is difficult to have all nodes operating in parallel. ASKNet attempts to emulate an asynchronous network through its firing algorithm. Each network has a SemFire object (see Section 3.2.2 for class diagram) which controls the firing of the nodes in that network. When a node in the network is prepared to fire, it sends a firing request to the SemFire object. The SemFire object then holds the request until the appropriate time before sending a firing permission message to the node allowing it to fire. Two separate firing algorithms have been implemented in ASKNet.
Pulse Firing
The pulse firing algorithm emulates a network where all nodes fire simultaneously at a given epoch of time. Each node that is prepared to fire at a given time fires, and the system waits until all nodes have fired and all activation levels have been calculated before beginning the next firing round. To implement this algorithm, the SemFire object retains two lists of requests. The first is the list of firing requests which will be fulfilled on this pulse; we will call this list the pulse list. The second list contains all requests made during the current pulse; we will call this the wait list. The SemFire object fires all of the nodes with requests in the pulse list, removing a request once it has been fulfilled (in this algorithm the order of firing is irrelevant), while placing all firing requests it receives into the wait list. Once the pulse list is empty, and all requests from the current pulse have been collected in the wait list, the SemFire object simply moves all requests from the wait list into the pulse list, and is then ready for the next pulse.
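A compact sketch of this two-list mechanism (the Node interface and all names are assumptions):

    // Sketch of pulse firing with a pulse list and a wait list; names are illustrative.
    final class PulseFireSketch {
        interface Node { void fire(); }                    // firing sends activation to neighbours

        private java.util.List<Node> pulseList = new java.util.ArrayList<>(); // fires this pulse
        private java.util.List<Node> waitList  = new java.util.ArrayList<>(); // requests made meanwhile

        void requestFire(Node n) { waitList.add(n); }      // called by nodes over their threshold

        /** Fire everything queued for this pulse; requests arriving meanwhile wait for the next. */
        void pulse() {
            for (Node n : pulseList) n.fire();             // order within a pulse is irrelevant
            pulseList = waitList;                          // next pulse = requests from this one
            waitList = new java.util.ArrayList<>();
        }
    }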
Priority Firing
The priority firing algorithm emulates a network where the amount of activation received by a node dictates the speed with which the node fires. Nodes receiving higher amounts of activation will fire faster than nodes which receive just enough to meet their firing threshold. To implement this algorithm, the SemFire object retains a priority queue of requests, where each request is assigned a priority based on the amount of activation the node received over its activation threshold (Equation 4.3). The SemFire object fulfills the highest priority request; if a new request is received while the first request is being processed, it is added to the queue immediately.
priority_i = α_i · (act_i − level_i)        (4.3)

Symbol definitions:
priority_x: the priority of node x
α_x: type priority variable, dependent on the node type of x (can be set to give a higher priority to a particular node type)
act_x: activation level of node x
level_x: firing level of node x
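And the corresponding sketch for priority firing, computing the priority of Equation 4.3 when the request is queued (again, all names are assumptions):

    // Sketch of priority firing using Equation 4.3; names are illustrative.
    final class PriorityFireSketch {
        interface Node {
            double typePriority();                         // alpha_i, dependent on node type
            double activation();                           // act_i
            double firingLevel();                          // level_i
            void fire();
        }

        record Request(Node node, double priority) {}

        private final java.util.PriorityQueue<Request> queue =
            new java.util.PriorityQueue<>((a, b) -> Double.compare(b.priority(), a.priority()));

        void requestFire(Node n) {                         // priority_i = alpha_i * (act_i - level_i)
            queue.add(new Request(n, n.typePriority() * (n.activation() - n.firingLevel())));
        }

        void run() {                                       // new requests may be queued during fire()
            while (!queue.isEmpty()) queue.poll().node().fire();
        }
    }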
The two firing algorithms would be equivalent if all of the activation in the network spread equally. However when a node fires, it sends out a set amount of activation, and excess activation received above the firing threshold disappears from the network. The effect of this disappearing activation is that the order in which nodes fire can change the final pattern of activation in the network. It is therefore important that both firing algorithms be implemented and tested so that a choice can be made based on their contribution to the performance of the system. In practice both firing algorithms obtain similar results with the minor differences being cancelled out when processing large data sets. For the experimental data in this thesis we have chosen to use the pulse firing algorithm as it allows for easier debugging since we can pause the firing at any step to get a “freeze-frame” of the system mid-fire.
Chapter 5

Evaluation

Evaluation of large scale semantic network creation systems is a difficult task. The scale and complexity of the networks make traditional evaluation metrics impractical, and since ASKNet is the first system of its kind, there are no existing systems against which we can directly compare. In this chapter we discuss the metrics we have developed in order to evaluate ASKNet, and describe the implementation and results of those metrics.

One of the most important aspects of ASKNet is its ability to create large scale semantic networks efficiently. In particular, this means that as the size of a network grows, the time taken to add a new node should not increase exponentially. In order to evaluate the efficiency of ASKNet, we first show evidence of its ability to efficiently create semantic networks on a scale comparable with the largest available resources. We then establish an upper bound on network creation time, showing that as the network grows, the time required to add new nodes increases linearly. The establishment of this upper bound is very important: although it is necessary to show that ASKNet can efficiently create networks with a size comparable to any existing resource, the upper bound shows that the existing algorithms can scale up
to networks many orders of magnitude larger, on a scale which simply cannot be achieved by manual creation.

Creating large networks efficiently is trivial if there is no regard to the quality of the networks produced (for example, one could simply create nodes and links at random). It is therefore important that we establish the quality of the networks produced by ASKNet. This is not a trivial task, as the networks are generally too large to evaluate manually, and there exists no gold standard against which we can compare. We attempt to solve these problems by evaluating the precision of a "network core", a subset of the network containing the most important information from the input documents. Using humans to evaluate these cores, we were able to establish a precision score of 79.1%. This is a very promising result for such a difficult task.

While human evaluation of the network cores provides a good direct measure of the quality of a portion of the network, it is also important to evaluate the produced networks as a whole. Therefore we implement a task-based evaluation, using ASKNet to perform a real world task and comparing its results against those of state-of-the-art systems using other methodologies. In this instance, we have chosen automatic judgement of the semantic relatedness of words, a task for which we believe ASKNet to be well suited. We compare ASKNet's scores against human judgements, and find that they correlate at least as well as the scores of top performing systems utilising WordNet, pointwise mutual information or vector based approaches. This evaluation will be described in Chapter 6.
5.1 Network Creation Speed
One of the major goals of the ASKNet system is to efficiently develop semantic resources on a scale never before available. To this end, it is not only important that
we are able to build networks quickly, it is also imperative that the time required to build networks does not increase exponentially with respect to the network size. This is one of the advantages of using spreading activation algorithms: since the area of the network affected by any one firing is not dependent on the size of the overall network, the time taken by the algorithms should not increase exponentially with the network size.

In order to evaluate the network creation speed, we chose articles of newspaper text from the 1998 New York Times, as taken from the AQUAINT Corpus of English News Text (made available by the Linguistic Data Consortium (LDC)), which mentioned the then United States President Bill Clinton. By choosing articles mentioning a single individual we hoped to create an overlap in named entity space without limiting the amount of data available. This ensured that the update algorithm (see Section 4.2.1) was run more frequently than would be expected on unrestricted text, thus giving us a good upper bound on the performance of ASKNet. In order to further ensure that this experiment gave a true representation of the network creation speed, all spreading activation based parameters were set to allow for maximum firing strength and spread. This meant that as many nodes as possible were involved in the update algorithm, thus increasing the algorithmic complexity as much as possible.

After processing approximately 2 million sentences, ASKNet was able to build a network of over 1.5 million nodes and 3.5 million links in less than 3 days. This time also takes into account the parsing and semantic analysis (see Table 5.1). This is a vast improvement over manually created networks, which take years or even decades to achieve networks of less than half this size [Matuszek et al., 2006].

During the creation of the network, the time taken to add a new node to the network was monitored and recorded against the number of nodes in the network.
Total Number of Nodes                                       1,500,413
Total Number of Edges                                        3,781,088
Time: Parsing                                          31 hrs : 30 min
Time: Semantic Analysis                                16 hrs : 54 min
Time: Building Network & Information Integration       22 hrs : 24 min
Time: Total                                            70 hrs : 48 min

Table 5.1: Statistics pertaining to the creation of a large scale semantic network

This allowed us to chart the rate at which network creation speed slowed in relation to network size. The results are shown in Figure 5.1.
Figure 5.1: Average time to add a new node to the network vs. total number of nodes

As the network size began to grow, the average time required to add a new node began to climb exponentially. This is to be expected in a small network, as the update algorithm’s spreading activation would reach most or all nodes in the network, and so each additional node would increase the time required for each run of the update algorithm. Because the spreading activation algorithms are localised (see Section 4.1), once
the network becomes so large that the activation does not spread to the majority of nodes, the addition of a new node is unlikely to have any effect on the spreading activation algorithm. As we see in Figure 5.1, when the network size hits a critical point (in this case approximately 850,000 nodes) the average time required to add a new node begins to grow linearly with respect to network size. This shows that average node insertion time eventually grows linearly with the size of the network, which implies that the total time to create a network (assuming the network is large enough) is linear with respect to the network’s size.

This result was obtained by choosing input sentences on similar topics, with high named entity overlap, and the maximum possible activation spread. In practice the exponential growth of node creation time would terminate at a much smaller network size. Eventually average node insertion time becomes effectively constant, as adding new nodes becomes less likely to affect firing algorithms for other parts of the network.

This evaluation has empirically established that, for networks over a given size, the time required to build the network grows linearly with respect to network size. This is important in establishing ASKNet’s ability to build very large scale networks. It is promising that we were able to build a network twice as large as anything previously in existence in only a matter of days; it is even more promising that the growth rate has been shown to be sub-exponential. This means that there is very little limitation on the size of network that could potentially be created with ASKNet.
5.2 Manual Evaluation
Evaluating large-scale semantic networks is a difficult task. Traditional nlp evaluation metrics such as precision and recall do not apply readily to semantic networks; the networks are too large to be directly evaluated by humans; and even the notion
of what a “correct” network should look like is difficult to define. nlp evaluation metrics also typically assume a uniform importance of information. However, when considering semantic networks, there is often a distinction between relevant and irrelevant information. For example, a network containing information about the Second World War could contain the fact that September 3rd 1939 was the day that the Allies declared war on Germany, and also the fact that it was a Sunday. Clearly, for many applications, the former fact is much more relevant than the latter. In order to achieve a meaningful precision metric for a semantic network, it is important to focus the evaluation on high-relevance portions of the network.

There is no gold-standard resource against which these networks can be evaluated, and given their size and complexity it is highly unlikely that any such resource will be built. Therefore evaluation can either be performed by direct human evaluation or by indirect, application based evaluation. For this chapter we have chosen direct human evaluation.

The size of the networks created by ASKNet makes human evaluation of the entire network impossible. It is therefore necessary to define a subset of the network on which to focus evaluation efforts. In early experiments, we found that human evaluators had difficulty in accurately evaluating networks with more than 20-30 object nodes and 30-40 relations. Rather than simply evaluating a random subset of the network, which may be of low relevance, we evaluated a network core, which we define as a set of high-relevance nodes and the network paths which connect them. This allows us to maintain a reasonably sized network for evaluation, while still ensuring that we are focusing our efforts on the high-relevance portions of the network. These are also likely to be the portions of the network which have undergone the most iterations of the update algorithm. Therefore the evaluation will be more likely to give an accurate representation of ASKNet’s overall capability, rather than being dominated by the
quality of the nlp tools used.

We evaluated networks based on documents from the 2006 Document Understanding Conference (duc). These documents are taken from multiple newspaper sources and grouped by topic. This allows us to evaluate ASKNet on a variety of inputs covering a range of topics, while ensuring that the update algorithm is tested by the repetition of entities across documents. In total we used 125 documents covering 5 topics, randomly chosen from the 50 topics covered in duc 2006. The topics chosen were: Israeli West Bank Settlements, Computer Viruses, NASA’s Galileo Mission, the 2001 Election of Vladimir Putin and the Elian Gonzalez Custody Battle.
5.2.1 Building the Network Core
Our task in building the core is to reduce the size of the evaluation network while maintaining the most relevant information for this particular type of network (newspaper text). We begin to build the core by adding all named entity nodes which are mentioned in more than 10% of the documents (a value chosen pragmatically to obtain a core of an appropriate size). In evaluating the duc data, we find that over 50% of the named entity nodes are mentioned in only a single document (and thus are very unlikely to be central to the understanding of the topic). Applying this restriction reduces the number of named entities to an average of 12 per topic network, while still ensuring that the most important entities remain in the core.

For each of the named entity nodes in the core, we perform a variation of Dijkstra’s algorithm [Dijkstra, 1959] to find the strongest path to every other named entity node in the core. Rather than using the link weights to determine the shortest path, as in the normal Dijkstra’s algorithm, we use the spreading activation algorithm to
determine the path along which the greatest amount of activation will travel between the two nodes, which we call the primary path. Adding all of these paths to the core results in a representation containing the most important named entities in the network, and the primary path between each pair of nodes (if such a path exists). Pseudocode for this algorithm is given in Figure 5.2.

    Algorithm 1: CreateCore
    Data: an ASKNet network A = (N, V), where N = nodes, V = links
    Result: Core: the network core of A
    begin
        NENodes ← {n in N | n is a named entity node};
        CoreNodes ← {n in N | n appeared in >10% of documents};
        CoreNEs ← NENodes ∩ CoreNodes;
        PathNodes ← ∅; PathLinks ← ∅; Core ← ∅;
        for x in CoreNEs do
            for y in CoreNEs, y ≠ x do
                giveActivation(x);
                while notReceivedActivation(y) do
                    fireNetwork(A);
                    giveActivation(x);
                /* Trace the path of maximum activation from y back to x */
                tempNode = y;
                while tempNode ≠ x do
                    prevNode = tempNode;
                    /* maxContrib(i) returns the node which sent the most activation to i */
                    tempNode = maxContrib(tempNode);
                    PathNodes ← PathNodes ∪ {tempNode};
                    PathLinks ← PathLinks ∪ {link(prevNode, tempNode)};
        Core = (CoreNodes ∪ PathNodes, PathLinks);
    end

Figure 5.2: Pseudocode for the algorithm used to create a network core given an ASKNet network

The core that results from the Dijkstra-like algorithm focuses on the relationships between the primary entities and discards peripheral information about individual
entities within the network. It also focuses on the strongest paths, which represent the most salient relationships between entities, and leaves out the less salient relationships (represented by the weaker paths). As an example, the core obtained from the “Elian Gonzalez Custody Battle” network (see Figure 5.3) maintained the primary relationships between the important entities within the network, but discarded information such as the dates of many trials, the quotes of less important figures relating to the case, and information about entities which did not directly relate to the case itself.

Running the algorithm on each of the topic networks produced from the duc data results in cores with an average of 20 object nodes and 32 relations per network, which falls within the acceptable limit for human evaluation. An additional benefit of building the core in this manner is that, since the resulting core tends to contain the most salient nodes and relations in the network, human evaluators can easily identify which portions of the network relate to which aspect of the stories.

We also found during our experiments that the core tended to stabilise over time. On average, only 2 object nodes and no named entity nodes changed within the core of each network between inputting the 20th and the 25th document of a particular duc category. This indicates that the core, defined in this way, is a relatively stable subset of the network, and represents information which is central to the story and is therefore repeated in each article.
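For concreteness, the following is a minimal Python rendering of the CreateCore pseudocode in Figure 5.2. The network interface (reset_activation, give_activation, received_activation, fire, max_contrib, link) is assumed, standing in for the spreading activation machinery described in Chapter 4; the reset between node pairs is also an assumption.

    def create_core(network, core_nes):
        """Collect the primary paths between every pair of core named entities."""
        path_nodes, path_links = set(), set()
        for x in core_nes:
            for y in core_nes:
                if y is x:
                    continue
                network.reset_activation()          # assumed: clear residual activation
                network.give_activation(x)
                while not network.received_activation(y):
                    network.fire()
                    network.give_activation(x)
                # Trace the path of maximum activation from y back to x.
                node = y
                while node is not x:
                    prev = node
                    node = network.max_contrib(prev)  # strongest contributor to prev
                    path_nodes.add(node)
                    path_links.add(network.link(prev, node))
        return core_nes | path_nodes, path_links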
5.2.2 Evaluating the Network Core
ASKNet uses the GraphViz [Gansner and North, 2000] library to produce graphical output. This allows human evaluators to quickly and intuitively assess the correctness of portions of the network. One network was created for each of the 5 topics, and
graphical representations were output for each network. Examples of the graphical representations of the network cores used for evaluation are shown in Figure 5.3 and Figure 5.5, with magnified views given in Figure 5.4 and Figure 5.6. To ease the evaluator’s task, we chose to output the graphs without the recursive nesting. In some cases, connector nodes (ovals) were added to provide information that was lost due to the removal of the nesting.

Figure 5.3: Graphical representation for topic: “Elian Gonzalez Custody Battle”.

Each of the 5 topic networks was evaluated by 3 human evaluators. (The networks were distributed in such a way as to ensure that no two networks were evaluated by the same 3 evaluators.) Five evaluators participated in the experiment, all of whom were graduate students in non-computer-science subjects and spoke English as a first language. None of
the evaluators had any prior experience with ASKNet or similar semantic networks. The evaluators were provided with the graphical output of the networks they were to assess, the sentences that were used in the formation of each path, and a document explaining the nature of the project, the formalities of the graphical representation, and step-by-step instructions for performing the evaluation. (All of the evaluation materials provided to the evaluators can be found online at www.brianharrington.net/asknet.)

Figure 5.4: Expanded section of Figure 5.3.

The evaluation was divided into 2 sections, and errors were classified into 3 types. The evaluators were first asked to evaluate the named entity nodes in the network, to determine whether each node had a type error (an incorrect named entity type as assigned by the named entity tagger, as shown in Figure 5.7) or a label error (an incorrect set
of labels, indicating that the node did not correspond to a single real world entity, as shown in Figure 5.8). The evaluators were then asked to evaluate each primary path. If there was an error at any point in the path, the entire path was said to have
a path error (as shown in Figure 5.9) and deemed to be incorrect. In particular, it is important to notice that in the bottom example of Figure 5.9, the error actually caused several paths (i.e., “Melissa Virus” - “Microsoft Word”, “Melissa Virus” - “Microsoft Word documents” and “Microsoft Word” - “Microsoft Word documents”) to be considered incorrect. This lowered the overall network scores by potentially penalising the same mistake multiple times, but as in all stages of this evaluation we felt it important to err on the side of caution, to ensure that our results were under-estimations of network quality rather than over-estimations.

The error types were recorded separately in an attempt to discover their source. Type errors are caused by the named entity tagger; label errors by the update algorithm or the semantic analyser (Boxer); and path errors by the parser or Boxer.

Figure 5.5: Graphical representation for topic: “2001 Election of Vladimir Putin”.

Figure 5.6: Expanded section of Figure 5.5.
Figure 5.7: Examples of type errors. Left: “Melissa virus”, a computer virus identified as a location. Right: “Gaza Strip”, a location identified as an organisation.
Figure 5.8: Examples of label errors. Left: After processing a sentence containing the phrase “...arriving in Israel Sunday to conduct...” the phrase “Israel Sunday” is mistakenly identified as a single entity. Right: The location “Miami” and the person “Miami Judge” collapsed into a single node.
5.2.3 Results
The scores reported by the human evaluators are given in Table 5.2. The scores given are the percentage of nodes and paths that were represented entirely correctly. A named entity node with either a type or label error was considered incorrect, and any path segment containing a path error resulted in the entire path being labelled as incorrect. The overall average precision was 79.1%, with a Kappa Coefficient [Carletta, 1996] of 0.69, indicating a high level of agreement between evaluators.
Figure 5.9: Examples of path errors. Top: Three independent meetings referenced in the same sentence, all involving relatives of Elian Gonzalez, are identified as a single meeting event. Bottom: The network indicates that the computer, rather than the virus, hides in Microsoft Word documents.
Topic               Eval 1   Eval 2   Eval 3     Avg
Elian Gonzalez       88.2%    70.1%    75.0%   77.6%
Galileo Probe        82.6%    87.0%    91.3%   87.0%
Viruses              68.4%    73.7%    73.7%   71.9%
Vladimir Putin       90.3%    82.8%    94.7%   89.9%
West Bank            68.2%    77.3%    70.0%   72.3%
Average Precision:                             79.1%

Table 5.2: Evaluation Results

Topic               NE Type   Label    Path
Elian Gonzalez         8.3%   50.5%   41.7%
Galileo Probe         22.2%   55.6%   22.2%
Viruses               93.8%    0.0%    6.3%
Vladimir Putin        22.2%   33.3%   44.4%
West Bank             66.7%   27.8%    5.6%
Total:                43.4%   32.9%   23.7%

Table 5.3: Errors by Type
Due to the nature of the evaluation, we can perform further analysis on the errors reported by the evaluators, and categorise each error by type, as seen in Table 5.3.

The results in Table 5.3 indicate that the errors within the network do not come from a single source, but rather are scattered across each of the steps. The NE Type errors were made by the ner tool. The Label errors came from either Boxer (mostly from mis-judged entity variable allocation) or from the Update Algorithm (from merging nodes which were not co-referent). The Path errors were caused by the parser mis-parsing the sentence, by Boxer mis-analysing the semantics, or by inappropriate mappings in the Update Algorithm. The errors appear to be relatively evenly distributed, indicating that, as each of the tools used in the system improves, the overall quality of the network will increase. Some topics tended to cause particular types of problems. Notably, the ner tool performed very poorly on the Viruses topic. This is to be expected, as the majority of the named entities were names of computer viruses or software programs that would not have existed in the training data used for the ner tagging model.

An overall precision of 79.1% is highly promising for such a difficult task. The
high score indicates that, while semantic network creation is by no means a solved problem, it is possible to create a system which combines multiple natural language inputs into a single cohesive knowledge network, and does so with a high level of precision. In particular, we have shown that ASKNet’s use of spreading activation techniques results in a high quality network core, with the most important named entities and the relations between those entities being properly represented in the majority of cases.
Chapter 6

Semantic Relatedness

The ability to determine semantic relatedness between two terms could be of great use to a wide variety of nlp applications, such as information retrieval, query expansion, word sense disambiguation and text summarisation [Budanitsky and Hirst, 2006]. However, it is important to draw a distinction between semantic relatedness and semantic similarity. Resnik [1999] illustrates this point by writing: “Semantic similarity represents a special case of semantic relatedness: for example, cars and gasoline would seem to be more closely related than, say, cars and bicycles, but the latter pair are certainly more similar”. Budanitsky and Hirst [2006] further point out that “Computational applications typically require relatedness rather than just similarity; for example, money and river are cues to the in-context meaning of bank that are just as good as trust company”. Despite these distinctions, many papers continue to use these terms interchangeably. For the purposes of this thesis, we will honour this distinction, and we will use the term semantic distance to refer to the union of these two concepts.

One of the most popular modern methods for automatically judging semantic distance is the use of WordNet [Fellbaum, 1998], using the paths between words
in the taxonomy as a measure of distance. While many of these approaches have obtained promising results for measuring semantic similarity [Jiang and Conrath, 1997, Banerjee and Pedersen, 2003], the results for measuring semantic relatedness have been much less promising [Hughes and Ramage, 2007]. One of the major drawbacks of using WordNet as a basis for evaluating semantic relatedness is its hierarchical taxonomic structure. This results in terms such as “car” and “bicycle” being very close in the network, but terms such as “car” and “gasoline” being separated by a great distance. Another difficulty results from the non-scalability of WordNet, which we addressed in Section 1.2.1. While the quality of the network is very high, the manual nature of its construction prevents it from having the coverage necessary to reliably obtain scores for any arbitrary word pair. ASKNet’s non-hierarchical nature and generalised relation links, combined with its robustness in dealing with different types of input, make it, at least in principle, a much more suitable resource for discovering semantic relatedness between terms.

An additional way of obtaining semantic distance scores is to calculate pointwise mutual information (pmi) across a large corpus. By obtaining the frequency with which words co-occur in the corpus and dividing by the total number of times each term appears, one can obtain a measure of association of those terms within that corpus. If the corpus is large enough, these values can be used as a measure of semantic distance. The drawback of this methodology is that it requires a very large corpus, and while word co-occurrences can be computed efficiently, it is still necessary to process a great deal of information in order to build a representative score. One alternative to computing pmi based on word co-occurrence is to use the number of results retrieved by large search engines when searching for individual words and word pairs. This method has been used successfully in measuring semantic similarity [Turney, 2001].
A final method for improving upon simple word co-occurrence is the use of vector space models, which can use word co-occurrence as features in the vectors, but can also incorporate additional linguistic information such as syntactic relations. Using these additional types of information has been shown to improve scores in similar tasks [Padó and Lapata, 2007].

Of these traditional methods for obtaining scores of semantic distance, none is particularly suited to measuring semantic relatedness, as opposed to similarity, and all of them (with the exception of [Turney, 2001]) require either a manually created resource or a large, pre-compiled corpus. In this chapter we will detail an alternative methodology which uses ASKNet to obtain scores for semantic relatedness using a relatively small corpus automatically harvested from the web with minimal human intervention.
6.1 Using ASKNet to Obtain Semantic Relatedness Scores
In this chapter we detail two experiments using ASKNet to obtain scores for the semantic relatedness of word pairs, comparing those scores against human generated scores for the same word pairs. In the first experiment, the ASKNet networks are built on a corpus obtained directly from search engine results. The second experiment has an identical methodology to the first, but a few simple heuristics are used to obtain a more representative corpus, which results in a large improvement to the correlation scores.

Once a large scale ASKNet network is constructed, it is possible to use the spreading activation functions of the network (as described in Section 4.1) to efficiently
obtain a distance score between any node pair (x, y). This score is obtained by placing a set amount of activation (α) in node x, allowing the network to fire until it stabilises, and then noting the total amount of activation received during this process by node y, which we call act(x, y, α). This process is repeated starting with node y to obtain act(y, x, α). We call the sum of these two values dist(x, y, α):

    dist(x, y, α) = act(x, y, α) + act(y, x, α)    (6.1)

where act(i, j, α) is the total amount of activation received by node j when node i is given α activation and the network is then allowed to fire. Since we will be using a constant value for α, we will shorten this to dist(x, y).

dist(x, y) is a measure of the total strength of connection between nodes x and y, relative to the other nodes in their region. This takes into account not just direct paths, but also indirect paths, if the links along those paths are of sufficient strength. Since ASKNet relations are general, and not hierarchical in nature, this score can be used as a measure of semantic relatedness between two terms. If we take the (car, gasoline) and (car, bicycle) examples mentioned earlier, we can see that firing the node representing car in ASKNet should result in more activation being sent to gasoline than to bicycle, as the former shares more direct and indirect relations with car. This means that, unlike WordNet or other taxonomic resources, ASKNet can be used directly to infer semantic relatedness, rather than semantic similarity.

In order to evaluate ASKNet’s ability to produce measures of semantic relatedness, we chose to correlate the system’s measurements with those given by humans. In these experiments we take human judgements to be a gold standard, and attempt to use ASKNet to replicate those judgements.
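The computation of dist can be sketched as follows. This is illustrative only: the network methods used here (reset_activation, give_activation, stable, fire_and_report) and the value of α are assumptions standing in for ASKNet's actual firing interface.

    ALPHA = 10.0  # initial activation; an illustrative constant, not ASKNet's value

    def act(network, source, target, alpha=ALPHA):
        """Total activation received by target after source is given alpha."""
        network.reset_activation()
        network.give_activation(source, alpha)
        received = 0.0
        while not network.stable():            # fire until activation dies out
            received += network.fire_and_report(target)
        return received

    def dist(network, x, y):
        """dist(x, y) = act(x, y, alpha) + act(y, x, alpha), as in Equation 6.1."""
        return act(network, x, y) + act(network, y, x)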
6.2 WordSense 353
In order to obtain human judgements against which we could compare ASKNet’s scoring, we used the WordSimilarity-353 (ws-353) collection [Finkelstein et al., 2002]. Although the name implies that the scores are similarity rankings, human judges were in fact asked to score 353 pairs of words for their relatedness on a scale of 1 to 10. The ws-353 collection contains word pairs which are not semantically similar, but still receive high scores because they are judged to be related (e.g., the pair (money, bank) receives a score of 8.50). It also contains word pairs which do not share a part of speech (e.g., (drink, mouth)), and at least one term which does not appear in WordNet at all (Maradona). All of these have proven difficult for WordNet based methods, and resulted in significantly poorer results than those obtained with collections emphasising semantic similarity [Hughes and Ramage, 2007].
6.3 Spearman’s Rank Correlation Coefficient
For consistency with previous literature, we use Spearman’s rank correlation coefficient (also known as Spearman’s ρ) [Spearman, 1987] as a measure of the correlation between the ASKNet scores and those from the ws-353. Spearman’s rank correlation coefficient assesses the measurements based on their relative ranking rather than on their values. We take as vectors the set of human measurements X = ⟨x_1, ..., x_n⟩ and ASKNet measurements Y = ⟨y_1, ..., y_n⟩ and convert them into ranks X' = ⟨x'_1, ..., x'_n⟩ and Y' = ⟨y'_1, ..., y'_n⟩ respectively; i.e., if x_i is the largest value in X, then x'_i = 1, if x_j is the second largest value in X, then x'_j = 2, and so on. The correlation coefficient can then be calculated by Equation 6.2.
    ρ = [ n Σ x'_i y'_i − (Σ x'_i)(Σ y'_i) ] / [ sqrt( n Σ x'_i² − (Σ x'_i)² ) · sqrt( n Σ y'_i² − (Σ y'_i)² ) ]    (6.2)

where each sum runs over i = 1, ..., n; x'_k denotes the k’th element of vector X'; and n is the number of elements in vectors X' and Y'.
The significance value is calculated using a simple permutation test, finding the probability that a random permutation of X' will achieve an equal or greater value of ρ.
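As a check on the formula, the following is a small self-contained sketch that computes ρ exactly as in Equation 6.2, ranking the largest value 1; for simplicity it ignores tied scores.

    def ranks(values):
        """Rank 1 for the largest value, 2 for the next largest, and so on."""
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        r = [0] * len(values)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    def spearman_rho(xs, ys):
        n = len(xs)
        rx, ry = ranks(xs), ranks(ys)
        num = n * sum(a * b for a, b in zip(rx, ry)) - sum(rx) * sum(ry)
        den = ((n * sum(a * a for a in rx) - sum(rx) ** 2) ** 0.5
               * (n * sum(b * b for b in ry) - sum(ry) ** 2) ** 0.5)
        return num / den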
6.4 Experiment 1

6.4.1 Data Collection & Preparation
In order to use ASKNet to develop rankings for each word pair in the ws-353, we first extracted each individual word from the pairings, resulting in a list of 440 words (some words were used in multiple pairings). For each of the words in this list, we then performed a query in a major search engine, in this case Google, and downloaded the first 5 page results for that query. (The choice of 5 as the number of documents to download for each word was based on a combination of intuition about the precision and recall of search engines, and the purely pragmatic issue of obtaining a corpus that could be held in system memory.)

Each of the downloaded web pages was then cleaned by a set of Perl scripts which removed all HTML markup, javascript code and comments. Punctuation was added where necessary (e.g., upon encountering a paragraph or line break tag, if the previous string did not end in a full stop, one was added). Statistics for the resulting corpus are given in Table 6.1.
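The cleaning step might look something like the following sketch. The original used Perl scripts; this Python rendering, and the exact set of tags treated as block boundaries, are illustrative assumptions.

    import re

    def clean_page(html):
        # Drop scripts and comments before stripping the remaining markup.
        html = re.sub(r'<script.*?</script>', ' ', html, flags=re.S | re.I)
        html = re.sub(r'<!--.*?-->', ' ', html, flags=re.S)
        # Split at block-level boundaries such as <p> and <br>.
        pieces = re.split(r'<(?:p|br)[^>]*>', html, flags=re.I)
        out = []
        for piece in pieces:
            text = re.sub(r'<[^>]+>', ' ', piece)       # strip remaining tags
            text = re.sub(r'\s+', ' ', text).strip()
            if text:
                # Add a full stop at the block boundary if one is missing.
                out.append(text if text.endswith('.') else text + '.')
        return ' '.join(out)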
Experiment 1 Corpus
Number of Sentences                995,981
Number of Words                  4,471,301
Avg. Number of Sentences/Page        452.7
% Pages from Wikipedia                18.5

Table 6.1: Summary statistics for the corpus generated in Experiment 1
6.4.2 Calculating a Baseline Score
A simple baseline score was calculated using pointwise mutual information (pmi) of word co-occurrence statistics. The corpus was used as input to a Perl script which counted the number of paragraphs and documents in which word pairs co-occurred. The score of a word pair was increased by 1 point for every document in which both words occurred, and by a further point for every paragraph in which both words occurred. (This means that a word pair co-occurring in a single paragraph automatically received a score of at least 2 points; it also means that the score for a particular word pair can have a value greater than 1.) This score was then divided by the product of the total number of occurrences of each word in the corpus, in line with the standard definition of pmi. The methodology used is formalised in Equation 6.3. Note that, unlike the traditional definition of pmi, we do not take the log of the scores; this is because the final result is based on rank, and the log is therefore unnecessary. The result of performing Spearman’s rank correlation on these scores is given in Table 6.3.

    Score_{x,y} = co-occur(x, y) / ( occur(x) · occur(y) )    (6.3)

where co-occur(i, j) is the number of documents in which words i and j co-occur plus the number of paragraphs in which they co-occur, and occur(i) is the total number of times word i occurs in the corpus.
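A sketch of this baseline computation is given below; the original was a Perl script. The data layout (a corpus as a list of documents, each a list of paragraphs, each a set of words) and the use of per-document counts for occur() are assumptions made for illustration.

    from collections import Counter
    from itertools import combinations

    def baseline_scores(docs):
        """Equation 6.3: co-occurrence points divided by individual occurrences."""
        occur, co_occur = Counter(), Counter()
        for doc in docs:
            doc_words = set().union(*doc) if doc else set()
            occur.update(doc_words)
            for pair in combinations(sorted(doc_words), 2):
                co_occur[pair] += 1          # one point per shared document
            for para in doc:
                for pair in combinations(sorted(para), 2):
                    co_occur[pair] += 1      # one further point per shared paragraph
        return {pair: count / (occur[pair[0]] * occur[pair[1]])
                for pair, count in co_occur.items()}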
6.4.3 Calculating the ASKNet Score
After processing the corpus to build an ASKNet network with approximately 800,000 nodes and 1,900,000 edges, the appropriate node pairs were fired to obtain the distance measure as described earlier. Those measurements were then recorded as ASKNet’s measurement of semantic relatedness between the two terms. If a term was used as a label in two or more nodes, the node containing the fewest extraneous labels was chosen. If more than one node used the term as a label with the same overall number of labels, the input activation was split evenly between them.

It is important to note that no manual adjustments were made to ASKNet to facilitate this experiment. All of the firing parameters were set based on intuition and results from previous experiments, before any data was entered into the system. This means that the system was not fine-tuned to this particular task. This methodology was chosen for two reasons: firstly, because the ws-353 contains few enough pairs that we thought it unwise to split up the collection for training and testing; and secondly, because we hoped to show that a completely “un-tweaked” network could perform at least as well as manually tuned systems based on WordNet.
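The node selection rule can be sketched as follows; the node and network attributes used here are assumptions for illustration.

    def fire_term(network, term, alpha):
        """Fire the node(s) labelled with term, as described above."""
        candidates = [n for n in network.nodes if term in n.labels]
        if not candidates:
            return
        fewest = min(len(n.labels) for n in candidates)
        tied = [n for n in candidates if len(n.labels) == fewest]
        for node in tied:                 # split activation evenly on a tie
            network.give_activation(node, alpha / len(tied))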
6.4.4 Results
The scores for both the baseline system and ASKNet were compared against those from the ws-353 collection using Spearman’s rank correlation. Example scores and ranks are given in Table 6.2, and the correlation results are given in Table 6.3. For comparison, we have included the results of the same correlation on scores from four additional systems; these scores were obtained from [Hughes and Ramage, 2007].
Word Pair               ws-353   Baseline   ASKNet   ws-353   Baseline   ASKNet
                         Score      Score    Score     Rank       Rank     Rank
love - sex                6.77        266     5.90      144         62       85
tiger - cat               7.35        296     9.06      109         53       58
tiger - tiger               10        398    58.87        1         33        9
book - paper              7.46        511    45.15       98         20       15
computer - keyboard       7.62        216     9.52       82         83       56
computer - internet       7.58          0     0.00       86        316      316
plane - car               5.77        500     9.16      214         21       57
train - car               6.31        938    37.04      177          5       20
television - radio        6.77        186     6.18      143         99       82
media - radio             7.42        138     3.05      103        120      130
drug - abuse              6.85         64     1.24      138        190      169
bread - butter            6.19        202     6.54      188         90       79
cucumber - potato         5.92          0     0.00      204        316      316
doctor - nurse               7        108     4.17      127        138      115
professor - doctor        6.62        137    15.38      156        121       40
student - professor       6.81        260    14.07      139         69       42
smart - student           4.62         12     0.14      256        260      252
smart - stupid            5.81          4     0.03      213        295      290
company - stock           7.08        408    11.49      121         30       53
stock - market            8.08        411    51.51       52         29       12
stock - phone             1.62         37     0.29      340        222      238
stock - CD                1.31          8     0.05      341        279      279
stock - jaguar            0.92         91     4.30      345        156      112
stock - egg               1.81          9     0.06      335        271      273
fertility - egg           6.69         19     0.19      150        246      247
stock - live              3.73        232     3.07      283         77      129
stock - life              0.92        158     2.81      344        111      135
book - library            7.46        342    38.02       99         40       19
bank - money              8.12        525    32.68       48         18       23
wood - forest             7.73        270    17.32       72         61       36
money - cash              9.15        214     8.93        6         85       59
professor - cucumber      0.31          0     0.00      352        316      316
king - cabbage            0.23         48     0.88      353        206      187
king - queen              8.58         58     1.08       24        197      176
king - rook               5.92        141     3.17      205        118      128
bishop - rabbi            6.69         12     0.11      152        260      259

Table 6.2: Relatedness scores and score rankings for ws-353, baseline system and ASKNet.
The Jiang-Conrath system [Jiang and Conrath, 1997] computes semantic distance between a word pair by using the information content of the two words and their lowest common subsumer in the WordNet hierarchy. The Lin approach [Lin, 1998] similarly uses WordNet to discover word pair similarity, which he defines as the “ratio between the amount of information needed to state their commonality and the information needed to fully describe what they are”. Both of these approaches were identified as top performers in a survey of methodologies for calculating semantic distance [Budanitsky and Hirst, 2006]. The Hughes-Ramage system [Hughes and Ramage, 2007] uses random walks over WordNet to achieve a similar metric to the other systems, but they also augment this with walks of associative networks generated from word co-occurrence in WordNet’s gloss definitions. This is explicitly done in order to improve their methodology’s ability to calculate semantic relatedness as opposed to semantic similarity.

In addition to the WordNet based systems, we performed a simple pmi measure over the British National Corpus (bnc). (Data cited herein has been extracted from the British National Corpus Online service, managed by Oxford University Computing Services on behalf of the BNC Consortium; all rights in the texts cited are reserved.) These scores were calculated in the same manner as our baseline, both with and without stemming, but on the larger, more general corpus of the bnc. The results of the pmi based methodology were poor, due largely to the problem of data sparseness. Several word pairs never co-occurred in the bnc, and some individual terms never occurred at all. Stemming helped the pmi approach slightly, but it still performed far worse than the Hughes & Ramage WordNet based system. This is likely because many of the word pairs, even after stemming, simply never appeared together in the corpus.

Also worth noting is the information provided to us by Jeffrey Mitchell, of the University of Edinburgh School of Informatics. He used vector based models similar
to the methodology of [Padó and Lapata, 2007] on the bnc in order to compute scores for the word pairs given in the ws-353. Through personal communication we learned that this methodology obtained a correlation coefficient of 0.5012.
ρ 0.195 0.216 0.552 0.192 0.250 0.310 0.391
Table 6.3: Rank correlation scores for ASKNet, the baseline system and existing WordNet based systems. Figures 6.1 and 6.2 provide scatter plots of the rank order of the scores for the baseline and ASKNet respectively. In Figure 6.1, note the large number of data points lying directly on the x-axis. These points indicate word pairs for which a score of zero was obtained, indicating that the word pair never co-occurred in a document. We can also see that there is very little visible correlation (as would be indicated by a tendency for the data points to cluster around a line from the bottom left to the top right of the graph) in either graph. Table 6.2 provides a sample of the scores and rankings produced by both systems compared to those of the ws-353 gold standard (the full results can be found in Appendix C).
6.4.5 Discussion
These results were somewhat disappointing. While the ASKNet system did manage to out-perform the Jiang-Conrath and Lin systems, neither of those methodologies specifically targets semantic relatedness, and both failed even to reach the baseline score. ASKNet was out-performed by the Hughes-Ramage system by over 16 percentage points.
Figure 6.1: Scatter plot of rank order of ws-353 scores vs. rank order of baseline scores.
6.5 Experiment 2

6.5.1 Inspecting the Corpus
After the disappointing result in Experiment 1, we used ASKNet’s graphical output to manually survey pieces of the network. Upon inspecting the corpus, several problems were identified with the data collection and preparation methodology described in Section 6.4.1, suggesting that the low scores were a result of the corpus rather than the experiments themselves.

Some pages retrieved contained no meaningful content of any description. For example, the first link retrieved for the query term Maradona is his public site (http://www.diegomaradona.com),
which is implemented entirely in Flash and cannot easily be converted into plain text. The corpus entry for this web page therefore consisted solely of the sentence “CASTELLANO — ENGLISH” (taken from two small language selection links at the bottom of the page).

Figure 6.2: Scatter plot of rank order of ws-353 scores vs. rank order of ASKNet scores.

Over 75% of the query words resulted in at least one of the five results being a link to Wikipedia. This may initially seem like a positive result, as it would indicate that a good percentage of words had at least one document giving an encyclopedic overview of the term; in fact, it proved to be disastrous to the results. For more than half of the query terms linking to Wikipedia, the resulting document was a “disambiguation page”: a page which contains very little text relating to the term itself, but is instead merely a listing of links to other Wikipedia articles relating to the various senses of the term. A partial example of such an article is given in Figure 6.3.
Figure 6.3: An excerpt from the Wikipedia “disambiguation page” for the query term “American”.

These “disambiguation pages” are problematic not only in that they provide very little information for ASKNet to use, but also in that they are structured as a list, which causes the data preparation script used in Section 6.4.1 to treat each element of the list as a sentence fragment, which often did not result in a sensible parse being returned by the C&C parser.

Another problem was that the search engine used (in this case Google) uses word
stemming of query terms. This resulted in a search for “planning” returning documents which never contained the word “planning”, but instead contained the word “plan”. Since both the baseline system and ASKNet used the original query terms un-stemmed, the potentially useful information provided by these documents was being ignored in score calculation.
6.5.2 Building a Better Corpus
In order to improve the corpus, several changes were made to the data preparation method described in Section 6.4.1. These changes are listed below:
1. A heuristic was added so that if a sentence ended with a colon and was immediately followed by a list, that sentence was copied once for each element of the list and concatenated with each list element, excluding the colon. (An example of the change that this made is given in Figure 6.4; a sketch of the heuristic is given after this list.)

2. For Wikipedia disambiguation pages, identified as such either by the string “(disambiguation)” in the title or by the first sentence being of the form “X may refer to:”, all of the links in the disambiguation list were followed, and the resulting pages were also added to the corpus. This means that for the example given in Figure 6.3, the Wikipedia pages for United States, Americas and Indigenous peoples of the Americas would be added to the corpus.
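A minimal sketch of heuristic 1, assuming the list items arrive as separate strings; the lowercasing of each item's first character is an assumption based on the output shown in Figure 6.4.

    def expand_colon_list(sentence, list_items):
        """Copy a colon-terminated sentence onto each element of its list."""
        stem = sentence.rstrip().rstrip(':')
        out = []
        for item in list_items:
            item = item.strip().rstrip('.')
            if item:
                item = item[0].lower() + item[1:]   # assumed normalisation
                out.append(f"{stem} {item}")
        return out

    # expand_colon_list("American may refer to:",
    #                   ["A person, inhabitant, or attribute of the United States of America."])
    # -> ["American may refer to a person, inhabitant, or attribute of the United States of America"]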
Additionally, an implementation of the Porter Stemmer [Porter, 1980] was added to ASKNet: for each word added to a node as a label, the stemmed version of the word was also added if it differed from the original. We also stemmed all of the words in the corpus for the computation of the baseline score. Summary statistics for the resulting corpus are given in Table 6.4.
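The stemmed-label addition might look like the following, using NLTK's Porter stemmer as a stand-in for the implementation added to ASKNet; the node interface is an assumption.

    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def add_label(node, word):
        node.labels.add(word)
        stem = stemmer.stem(word)
        if stem != word:            # only add the stem when it differs
            node.labels.add(stem)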
American may refer to:
A person, inhabitant, or attribute of the United States of America.
A person, an inhabitant, or attribute of the Americas, the lands and regions of the Western Hemisphere.
A person or attribute of the indigenous peoples of the Americas.

American may refer to a person, inhabitant, or attribute of the United States of America. American may refer to a person, an inhabitant, or attribute of the Americas, the lands and regions of the Western Hemisphere. American may refer to a person or attribute of the indigenous peoples of the Americas.

Figure 6.4: The text retrieved from the page shown in Figure 6.3 using the original data preparation methods (top) and the improved heuristic (bottom).

Experiment 1 Corpus
Number of Sentences                  995,981
Number of Words                    4,471,301
Avg. Number of Sentences/Page          452.7
% Pages from Wikipedia                  18.5

Experiment 2 Corpus
Number of Sentences                1,042,128
Number of Words                    5,027,947
Avg. Number of Sentences/Page          464.4
% Pages from Wikipedia                  22.1

Table 6.4: Summary statistics for the corpus generated in both experiments
6.5.3 Results: New Corpus
The baseline and ASKNet scores were re-computed on the new corpus, using the new techniques described in the previous section. The results are shown in Table 6.5.
                                         ρ
Jiang-Conrath                        0.195
Lin                                  0.216
Hughes-Ramage                        0.552
pmi: bnc-unstemmed                   0.192
pmi: bnc-stemmed                     0.250
Original Corpus
  Baseline: Word co-occurrence       0.310
  ASKNet                             0.391
Improved Corpus
  Baseline: Word co-occurrence       0.408
  ASKNet                             0.609
Table 6.5: Rank correlation scores for ASKNet, the baseline system and existing WordNet based systems.

Figures 6.5 and 6.6 provide scatter plots of the rank order of the scores for the baseline and ASKNet respectively, as calculated on the new corpus. Note the reduction in the number of data points lying directly on the x-axis in Figure 6.5 when compared with Figure 6.1, indicating fewer word pairs with no co-occurrences. We can also note the distinct visible correlation in Figure 6.6. Table 6.6 provides a sample of the scores and rankings produced by both systems run on the new corpus, compared to those of the ws-353 gold standard.
6.5.4 Discussion
The improvements to the corpus allowed for a massive improvement in the quality of ASKNet’s scores. The rank correlation coefficient improved to 0.609, which indicates a performance at least on par with any of the existing methodologies. This shows that, with a properly acquired corpus, ASKNet can be used to judge semantic relatedness at least as well as any other existing system, without the need for a manually created network such as WordNet or a large human-collected corpus such as the bnc.
Word Pair               ws-353   Baseline   ASKNet   ws-353   Baseline   ASKNet
                         Score      Score    Score     Rank       Rank     Rank
computer - keyboard       7.62       1403    26.45       82         59       65
computer - internet       7.58        390     5.2        86        144      178
plane - car               5.77        922     1.09      214         76      288
train - car               6.31       3705     5.16      177         26      179
television - radio        6.77       2657    47.63      143         33       38
media - radio             7.42        833    11.61      103         84      112
drug - abuse              6.85        435    19.14      138        136       85
bread - butter            6.19        693    21.91      188         98       77
cucumber - potato         5.92         20     0.26      204        287      321
doctor - nurse               7        426    50.97      127        139       36
professor - doctor        6.62       1545   126.82      156         52       14
student - professor       6.81       1503    33.08      139         53       55
smart - student           4.62         23     2.34      256        281      246
smart - stupid            5.81          3    12.93      213        323      105
company - stock           7.08       3259    41.84      121         30       44
stock - market            8.08       6061    49.58       52         15       37
stock - phone             1.62         33     0.34      340        267      318
stock - CD                1.31          7     0.14      341        311      332
stock - jaguar            0.92        430    16.25      345        138       89
stock - egg               1.81          8     0.2       335        303      326
fertility - egg           6.69         28     2.93      150        274      231
stock - live              3.73        318     2.57      283        157      235
stock - life              0.92        338     2.08      344        152      263
book - library            7.46       4419    67.1        99         23       30
bank - money              8.12       3561    33.85       48         27       53
wood - forest             7.73       1836    25.36       72         42       69
money - cash              9.15        901    37.97        6         77       47
professor - cucumber      0.31        217     2.9       352        177      232
king - cabbage            0.23         88     0.96      353        221      297
king - queen              8.58        108     3.4        24        209      221
king - rook               5.92        476     5.07      205        131      180
bishop - rabbi            6.69        103     1.76      152        212      271
Table 6.6: Relatedness scores and score rankings for ws-353, baseline system and ASKNet as computed on the improved corpus.

Additionally, since ASKNet used a web-based corpus to generate its scores, it did not encounter the same data sparseness problems seen in the other systems. [Hughes and Ramage, 2007] were forced to remove at least one word pair from their analysis because the word Maradona did not appear in WordNet. When performing experiments on a similar data set using vector space models, [Padó and Lapata,
6.5. EXPERIMENT 2
2007] were forced to remove seven of the 143 word pairs due to one of the words having too low a frequency in their corpus (in this case, the bnc). Since our ASKNet based methodology retrieved data directly from the internet, it encountered no such problems.

Figure 6.5: Scatter plot of rank order of ws-353 scores vs. rank order of baseline scores as calculated on the improved corpus.

One of the initial goals of this evaluation was to assess ASKNet’s ability to perform a real-world task with as little human intervention as possible. To that end, all of the firing parameters of ASKNet were set before any data was collected, and even then, only the most coarse adjustments were made manually, in anticipation of the types of data that would be found in the corpus. While our improvements to the corpus creation process could be seen as human intervention, they were relatively minor, and with appropriate foresight would have been included in the initial corpus creation process. The data was by no means “hand picked” to be appropriate for ASKNet or
for the task at hand.

Figure 6.6: Scatter plot of rank order of ws-353 scores vs. rank order of ASKNet scores as calculated on the improved corpus.

In conclusion, we have demonstrated that a novel approach to automatically measuring the semantic relatedness of words, using a relatively small, task focused, web-harvested corpus to build an ASKNet network, can perform at least as well as any existing system. This shows that the networks produced by ASKNet are of sufficient quality to be of use in a real world application, and therefore we consider this to be a very positive result in our evaluation.
Chapter 7

Conclusions

In this thesis we have detailed the conception, development and evaluation of ASKNet, a system for automatically creating large scale semantic knowledge networks from natural language text. We have shown that existing nlp tools, an appropriate semantic network formalism and spreading activation algorithms can be combined to design a system which is capable of efficiently creating semantic networks on a scale never before possible, and of promising quality.

The primary focus of this thesis has been to combine ai techniques with nlp tools in order to efficiently achieve cross-document information extraction and integration. This work promises not only to provide researchers with large scale networks, a useful tool in their own right, but also to provide a new methodology for large scale knowledge acquisition. We have shown that cross-document information integration, a step which has often been overlooked as either unnecessary or unfeasible, is both necessary for creating high quality semantic resources and possible to achieve efficiently on a large scale using existing tools and algorithms. Furthermore, we have shown that it is possible to automatically create high quality semantic resources on a large scale in a reasonable time without the need for manually created resources, and crucially, that
by using appropriate algorithms, the scale of those resources can increase indefinitely with only a linear increase in creation time.

Semantic networks have a wide variety of applications, and a system for automatically generating such networks could have far reaching benefits for multiple areas, both in the field of nlp and in other research areas which require knowledge acquisition systems that can perform effectively and efficiently on a large scale. There are many areas of research to which a system such as ASKNet could potentially be of benefit; biomedical research is one of the most obvious examples. The current state of information overload is resulting in unforeseen difficulties for many researchers. The problem of research has, in many areas, ceased to be a search for information and become an exercise in filtering the vast amounts of information which are readily available. Developing tools and methodologies that can aid in this filtering process is an important task for the nlp community, not just for the benefit of researchers within our own field, but for the benefit of the academic community at large.

This thesis has focused on the ASKNet system as both a real-world tool for developing semantic resources and a proof of concept system showing what is possible in the field of knowledge discovery. The ASKNet system has demonstrated that by combining ai and nlp techniques it is possible to create semantic resources larger, faster and with higher quality than anything previously obtainable.
7.1 Future Work
In this section we briefly survey some potential future directions for the ASKNet project. These are roughly divided into two categories: improvements which are external to the ASKNet project but could be incorporated to improve the networks it creates; and potential improvements to, and uses of, ASKNet networks.
7.1.1 Future Improvements
ASKNet was designed primarily as a “proof of concept” system in order to test various hypotheses regarding the use of nlp tools and spreading activation theories to integrate information from multiple sources into a single cohesive network. While we have shown in this thesis that network creation time is linear with respect to network size, it is still unlikely that any networks on a scale much larger than those created for this thesis would be made using a single CPU. It would be possible in future to create a version of ASKNet that would work in a distributed fashion, using a single network that could be built and manipulated by any number of computers working independently.
A Distributed ASKNet
In order to create a distributed version of ASKNet, one would simply have to add interfaces into a central network for multiple computers (essentially, this would involve making the SemNet interface described in Section 3.2.2 available to multiple agents). In order to avoid conflicts between agents, an agent would require the ability to “lock” a section of the network, then to receive a copy of that section which it could update before merging it back with the main “global” network. Through this distributed process, multiple agents could work on the same network simultaneously without fear of corrupting the network by trying to update the same nodes at the same time. An agent would only have to wait for another agent to terminate if that agent were trying to update a similar section of the network; in a large enough network this is likely to be very infrequent.
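The lock/copy/merge cycle described above might be sketched as follows. This is a toy illustration under several assumptions: region locking is reduced to one lock per region key, the node store is a simple dictionary, and none of these names exist in the current SemNet interface.

    import threading
    from contextlib import contextmanager

    class GlobalNetwork:
        """Agents lock a region, copy it out, update the copy, merge it back."""

        def __init__(self):
            self.nodes = {}                  # node id -> node data
            self._region_locks = {}          # region key -> Lock
            self._guard = threading.Lock()   # protects the lock table itself

        @contextmanager
        def checkout(self, region_key, node_ids):
            with self._guard:
                lock = self._region_locks.setdefault(region_key, threading.Lock())
            with lock:                       # other agents on this region block here
                copy = {i: self.nodes.get(i) for i in node_ids}
                yield copy                   # the agent updates this private copy
                self.nodes.update(copy)      # merge back into the global network

An agent processing a document would then do its parsing and mapping against the private copy inside the checkout block, so only agents touching overlapping regions ever wait on each other.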
Potential Uses for ASKNet
There are many potential uses for ASKNet networks. In this section we will simply list a few areas which we feel could benefit from ASKNet in the relatively near future, or to which we feel ASKNet is particularly well suited.
• Entity Relationship Discovery: This is an extension of the semantic relatedness scores generated in Chapter 6. Rather than finding the semantic relatedness of words, ASKNet could be used to judge the semantic relatedness of named entities. This task is even less suited to existing resources like WordNet, and could be very useful in information retrieval. In particular, if ASKNet can be ported to the biomedical domain, the ability to discover the existence of relationships between entities such as genes and proteins would be very useful. Since the C&C tools are currently being ported to work with biomedical data, this application should be possible in the very near future.

• Question Answering: An obvious use of ASKNet is in a question answering (qa) system. Most existing qa systems are only able to extract information from single documents. ASKNet could be employed to increase the recall of these systems by allowing them to answer questions which require inter-document analysis.

• Novel Relationship Discovery: An extension of the Entity Relationship Discovery functionality could be to attempt to discover novel relationships. In order to do this, one would merely have to find entities which have strong relationships in ASKNet without having any relations connecting them directly. This would be analogous to entities being strongly related, but never being mentioned within the same context. An example of this sort of relationship was found by [Swanson, 1986], where it was discovered through an extensive
7.1. FUTURE WORK
literature review that there was a link between Raynaud’s disease and fish-oil, despite no experiments having ever linked the two directly. The use of ASKNet to discover these types of novel relationships could potentially evolve into an interesting ai project, and be of great use to a wide variety of areas.
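As a rough sketch of this search, assuming graph is a simple adjacency mapping (node to set of neighbours) and relatedness is a function returning the spreading-activation score of Chapter 6 (neither name is part of the actual ASKNet interface):

def novel_relationships(entities, graph, relatedness, threshold=10.0):
    """Yield entity pairs that score as strongly related but share no
    direct relation in the network (candidate novel relationships)."""
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            directly_linked = b in graph.get(a, set()) or a in graph.get(b, set())
            score = relatedness(a, b)
            if not directly_linked and score >= threshold:
                yield a, b, score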
7.1.2 External Improvements
ASKNet is designed to adapt to improvements made in other areas of NLP. In this section we list just a few improvements which we believe could contribute to an increase in the quality and usefulness of the created networks, but whose implementation is beyond the scope of this project. Each of these is an active area of research, and we hope that future improvements in any or all of these areas will benefit ASKNet.
• Anaphora resolution: Currently ASKNet has only very rudimentary anaphora resolution, provided by Boxer combined with a few heuristics added in the network creation process. This means a good deal of potentially useful information is not being integrated, even at the sentence level. Anaphora resolution is an active area of research and improvements are being made [Charniak and Elsner, 2009]. Improvements in this area could increase the recall of ASKNet tremendously.

• Temporal relations: The network formalism used throughout this thesis is atemporal, and cannot easily accommodate temporal information. This is a difficulty that would need to be resolved before any sophisticated automated reasoning could be done on the created networks. It could possibly be resolved by adding a temporal element to each relation, indicating the time period for which it held (a minimal sketch of such an annotation follows this list). This is a difficult task, as no tools are currently available which appropriately capture this information; however, there is active research in this area [Pustejovsky et al., 2005], and we hope that in future this information can be added to ASKNet.

• Domain adaptation: The ability to create networks for new domains, particularly those where similar resources are scarce or non-existent, would be very useful for obvious reasons. While the ASKNet framework is not particularly tied to a specific domain, the tools it uses (i.e., C&C and Boxer) are trained on a particular domain (newspaper text) and will obviously have reduced performance on a domain that is novel to them. Work is currently underway to port the C&C tools to the biomedical domain. This will allow us to test the plausibility and level of difficulty of porting ASKNet to new domains once the tools it uses have been properly adapted.
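As a rough illustration of the temporal element suggested in the second item above, the following is a minimal sketch. The TemporalRelation structure and its field names are illustrative assumptions only; no such structure exists in the current network implementation.

from dataclasses import dataclass
from typing import Optional

@dataclass
class TemporalRelation:
    # A relation annotated with the period for which it held.
    source: str                       # id of the source node
    label: str                        # relation label, e.g. "president-of"
    target: str                       # id of the target node
    valid_from: Optional[str] = None  # date the relation began to hold
    valid_to: Optional[str] = None    # date it ceased to hold (None = still holds)

# For example, a relation that held only for a fixed period:
# TemporalRelation("clinton", "president-of", "usa", "1993-01-20", "2001-01-20")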
Appendix A

Published Papers

The following is a listing of papers extracted from the materials in this dissertation which have been published in other venues. All papers have been co-authored by Brian Harrington and Stephen Clark. The papers & publications are as follows:

• Journal Papers

– Harrington B. & Clark S. ASKNet: Creating and Evaluating Large Scale Integrated Semantic Networks (expanded version of the ICSC-08 paper). International Journal of Semantic Computing, 2(3), pp. 343–364, 2009.

• Conference Papers

– Harrington B. & Clark S. ASKNet: Automated Semantic Knowledge Network. Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07), Vancouver, Canada, 2007.

– Harrington B. & Clark S. ASKNet: Creating and Evaluating Large Scale Integrated Semantic Networks. Proceedings of the Second IEEE International Conference on Semantic Computing (ICSC-08), Santa Clara, USA, 2008.
Appendix B

Semantic Relatedness Scores & Rankings - Initial Corpus

This appendix provides the complete set of relatedness scores and score rankings for ws-353, the baseline system and ASKNet, as computed on the initial corpus. An extract from this table is given in Table 6.2.

Word Pair | ws-353 Score | Baseline Score | ASKNet Score | ws-353 Rank | Baseline Rank | ASKNet Rank
love - sex | 6.77 | 266 | 5.90 | 144 | 62 | 85
tiger - cat | 7.35 | 296 | 9.06 | 109 | 53 | 58
tiger - tiger | 10 | 398 | 58.87 | 1 | 33 | 9
book - paper | 7.46 | 511 | 45.15 | 98 | 20 | 15
computer - keyboard | 7.62 | 216 | 9.52 | 82 | 83 | 56
computer - internet | 7.58 | 0 | 0.00 | 86 | 316 | 316
plane - car | 5.77 | 500 | 9.16 | 214 | 21 | 57
train - car | 6.31 | 938 | 37.04 | 177 | 5 | 20
telephone - communication | 7.5 | 195 | 4.79 | 94 | 95 | 103
television - radio | 6.77 | 186 | 6.18 | 143 | 99 | 82
media - radio | 7.42 | 138 | 3.05 | 103 | 120 | 130
drug - abuse | 6.85 | 64 | 1.24 | 138 | 190 | 169
bread - butter | 6.19 | 202 | 6.54 | 188 | 90 | 79
cucumber - potato | 5.92 | 0 | 0.00 | 204 | 316 | 316
doctor - nurse | 7 | 108 | 4.17 | 127 | 138 | 115
professor - doctor | 6.62 | 137 | 15.38 | 156 | 121 | 40
student - professor | 6.81 | 260 | 14.07 | 139 | 69 | 42
smart - student | 4.62 | 12 | 0.14 | 256 | 260 | 252
smart - stupid | 5.81 | 4 | 0.03 | 213 | 295 | 290
company - stock | 7.08 | 408 | 11.49 | 121 | 30 | 53
stock - market | 8.08 | 411 | 51.51 | 52 | 29 | 12
stock - phone | 1.62 | 37 | 0.29 | 340 | 222 | 238
stock - CD | 1.31 | 8 | 0.05 | 341 | 279 | 279
stock - jaguar | 0.92 | 91 | 4.30 | 345 | 156 | 112
stock - egg | 1.81 | 9 | 0.06 | 335 | 271 | 273
fertility - egg | 6.69 | 19 | 0.19 | 150 | 246 | 247
stock - live | 3.73 | 232 | 3.07 | 283 | 77 | 129
stock - life | 0.92 | 158 | 2.81 | 344 | 111 | 135
book - library | 7.46 | 342 | 38.02 | 99 | 40 | 19
bank - money | 8.12 | 525 | 32.68 | 48 | 18 | 23
wood - forest | 7.73 | 270 | 17.32 | 72 | 61 | 36
money - cash | 9.15 | 214 | 8.93 | 6 | 85 | 59
professor - cucumber | 0.31 | 0 | 0.00 | 352 | 316 | 316
king - cabbage | 0.23 | 48 | 0.88 | 353 | 206 | 187
king - queen | 8.58 | 58 | 1.08 | 24 | 197 | 176
king - rook | 5.92 | 141 | 3.17 | 205 | 118 | 128
bishop - rabbi | 6.69 | 12 | 0.11 | 152 | 260 | 259
Jerusalem - Israel | 8.46 | 622 | 116.17 | 28 | 14 | 3
Jerusalem - Palestinian | 7.65 | 382 | 38.41 | 79 | 36 | 18
holy - sex | 1.62 | 25 | 0.32 | 339 | 237 | 235
fuck - sex | 9.44 | 104 | 3.91 | 2 | 143 | 121
Maradona - football | 8.62 | 100 | 8.35 | 22 | 148 | 64
football - soccer | 9.03 | 227 | 14.02 | 10 | 80 | 43
football - basketball | 6.81 | 229 | 3.97 | 140 | 79 | 119
football - tennis | 6.63 | 39 | 0.39 | 154 | 219 | 226
tennis - racket | 7.56 | 50 | 0.60 | 89 | 203 | 210
Arafat - peace | 6.73 | 198 | 19.76 | 147 | 92 | 31
Arafat - terror | 7.65 | 196 | 5.47 | 78 | 94 | 90
Arafat - Jackson | 2.5 | 0 | 0.00 | 321 | 316 | 316
law - lawyer | 8.38 | 496 | 137.92 | 33 | 22 | 2
movie - star | 7.38 | 261 | 11.24 | 108 | 67 | 54
movie - popcorn | 6.19 | 134 | 3.55 | 187 | 124 | 124
movie - critic | 6.73 | 183 | 2.07 | 146 | 102 | 151
movie - theater | 7.92 | 306 | 62.43 | 62 | 47 | 6
physics - proton | 8.12 | 32 | 0.41 | 47 | 231 | 223
physics - chemistry | 7.35 | 182 | 4.25 | 110 | 103 | 114
space - chemistry | 4.88 | 87 | 0.74 | 248 | 162 | 201
alcohol - chemistry | 5.54 | 47 | 0.53 | 225 | 208 | 217
vodka - gin | 8.46 | 69 | 0.80 | 29 | 183 | 193
vodka - brandy | 8.13 | 19 | 0.18 | 45 | 246 | 249
drink - car | 3.04 | 341 | 5.09 | 304 | 41 | 93
drink - ear | 1.31 | 721 | 13.26 | 342 | 11 | 46
drink - mouth | 5.96 | 35 | 0.41 | 200 | 226 | 223
drink - eat | 6.87 | 748 | 20.93 | 134 | 10 | 29
baby - mother | 7.85 | 36 | 0.50 | 67 | 225 | 218
drink - mother | 2.65 | 17 | 0.12 | 316 | 249 | 255
car - automobile | 8.94 | 345 | 6.16 | 14 | 39 | 83
gem - jewel | 8.96 | 276 | 59.08 | 13 | 57 | 8
journey - voyage | 9.29 | 3 | 0.02 | 3 | 300 | 298
boy - lad | 8.83 | 80 | 0.90 | 17 | 171 | 185
coast - shore | 9.1 | 65 | 2.41 | 7 | 189 | 142
asylum - madhouse | 8.87 | 9 | 0.08 | 16 | 271 | 264
magician - wizard | 9.02 | 186 | 34.34 | 11 | 99 | 22
midday - noon | 9.29 | 20 | 0.33 | 4 | 245 | 233
furnace - stove | 8.79 | 88 | 0.86 | 19 | 160 | 191
food - fruit | 7.52 | 205 | 4.89 | 92 | 88 | 99
bird - cock | 7.1 | 122 | 4.69 | 120 | 130 | 106
bird - crane | 7.38 | 41 | 0.80 | 105 | 217 | 193
tool - implement | 6.46 | 73 | 0.85 | 166 | 178 | 192
brother - monk | 6.27 | 68 | 4.80 | 179 | 184 | 102
crane - implement | 2.69 | 3 | 0.02 | 315 | 300 | 298
lad - brother | 4.46 | 14 | 0.12 | 263 | 256 | 255
journey - car | 5.85 | 84 | 1.07 | 212 | 168 | 178
monk - oracle | 5 | 61 | 0.60 | 241 | 193 | 210
cemetery - woodland | 2.08 | 0 | 0.00 | 329 | 316 | 316
food - rooster | 4.42 | 39 | 0.35 | 264 | 219 | 231
coast - hill | 4.38 | 33 | 0.28 | 265 | 228 | 239
forest - graveyard | 1.85 | 0 | 0.00 | 333 | 316 | 316
shore - woodland | 3.08 | 0 | 0.00 | 303 | 316 | 316
monk - slave | 0.92 | 7 | 0.04 | 347 | 283 | 286
coast - forest | 3.15 | 187 | 2.21 | 302 | 97 | 146
lad - wizard | 0.92 | 4 | 0.02 | 346 | 295 | 298
chord - smile | 0.54 | 0 | 0.00 | 350 | 316 | 316
glass - magician | 2.08 | 0 | 0.00 | 330 | 316 | 316
noon - string | 0.54 | 0 | 0.00 | 351 | 316 | 316
rooster - voyage | 0.62 | 0 | 0.00 | 349 | 316 | 316
money - dollar | 8.42 | 244 | 5.59 | 32 | 74 | 87
money - cash | 9.08 | 214 | 8.93 | 8 | 85 | 59
money - currency | 9.04 | 246 | 11.92 | 9 | 73 | 52
money - wealth | 8.27 | 249 | 5.07 | 41 | 71 | 94
money - property | 7.57 | 215 | 6.78 | 87 | 84 | 78
money - possession | 7.29 | 95 | 2.48 | 112 | 151 | 140
money - bank | 8.5 | 525 | 32.68 | 26 | 18 | 23
money - deposit | 7.73 | 273 | 8.20 | 73 | 60 | 69
money - withdrawal | 6.88 | 57 | 0.79 | 132 | 199 | 196
money - laundering | 5.65 | 102 | 20.88 | 217 | 146 | 30
money - operation | 3.31 | 296 | 4.54 | 297 | 53 | 107
tiger - jaguar | 8 | 115 | 4.88 | 57 | 134 | 100
tiger - feline | 8 | 17 | 0.28 | 59 | 249 | 239
tiger - carnivore | 7.08 | 38 | 0.73 | 122 | 221 | 203
tiger - mammal | 6.85 | 22 | 0.45 | 135 | 240 | 220
tiger - animal | 7 | 249 | 8.69 | 128 | 71 | 62
tiger - organism | 4.77 | 26 | 0.24 | 250 | 235 | 244
tiger - fauna | 5.62 | 9 | 0.08 | 222 | 271 | 264
tiger - zoo | 5.87 | 106 | 3.53 | 210 | 139 | 125
psychology - psychiatry | 8.08 | 0 | 0.00 | 51 | 316 | 316
psychology - anxiety | 7 | 0 | 0.00 | 126 | 316 | 316
psychology - fear | 6.85 | 0 | 0.00 | 137 | 316 | 316
psychology - depression | 7.42 | 0 | 0.00 | 101 | 316 | 316
psychology - clinic | 6.58 | 0 | 0.00 | 158 | 316 | 316
psychology - doctor | 6.42 | 0 | 0.00 | 169 | 316 | 316
psychology - Freud | 8.21 | 0 | 0.00 | 42 | 316 | 316
psychology - mind | 7.69 | 0 | 0.00 | 77 | 316 | 316
psychology - health | 7.23 | 0 | 0.00 | 115 | 316 | 316
psychology - science | 6.71 | 0 | 0.00 | 149 | 316 | 316
psychology - discipline | 5.58 | 0 | 0.00 | 223 | 316 | 316
psychology - cognition | 7.48 | 0 | 0.00 | 95 | 316 | 316
planet - star | 8.45 | 629 | 23.57 | 30 | 13 | 26
planet - constellation | 8.06 | 127 | 1.34 | 53 | 127 | 166
planet - moon | 8.08 | 262 | 5.18 | 50 | 65 | 92
planet - sun | 8.02 | 232 | 5.35 | 56 | 77 | 91
planet - galaxy | 8.11 | 165 | 2.21 | 49 | 110 | 146
planet - space | 7.92 | 319 | 8.33 | 63 | 44 | 66
planet - astronomer | 7.94 | 227 | 7.15 | 61 | 80 | 75
precedent - example | 5.85 | 88 | 1.74 | 211 | 160 | 155
precedent - information | 3.85 | 5 | 0.03 | 280 | 292 | 290
precedent - cognition | 2.81 | 14 | 0.12 | 312 | 256 | 255
precedent - law | 6.65 | 340 | 19.29 | 153 | 42 | 32
precedent - collection | 2.5 | 4 | 0.03 | 320 | 295 | 290
precedent - group | 1.77 | 42 | 0.40 | 337 | 215 | 225
precedent - antecedent | 6.04 | 0 | 0.00 | 193 | 316 | 316
cup - coffee | 6.58 | 142 | 4.46 | 157 | 117 | 109
cup - tableware | 6.85 | 0 | 0.00 | 136 | 316 | 316
cup - article | 2.4 | 364 | 4.85 | 322 | 38 | 101
cup - artifact | 2.92 | 11 | 0.07 | 310 | 264 | 268
cup - object | 3.69 | 80 | 0.75 | 288 | 171 | 199
cup - entity | 2.15 | 50 | 0.45 | 328 | 203 | 220
cup - drink | 7.25 | 105 | 1.67 | 114 | 141 | 158
cup - food | 5 | 139 | 1.52 | 243 | 119 | 160
cup - substance | 1.92 | 37 | 0.26 | 332 | 222 | 243
cup - liquid | 5.9 | 51 | 0.64 | 208 | 202 | 208
jaguar - cat | 7.42 | 136 | 24.68 | 102 | 122 | 25
jaguar - car | 7.27 | 102 | 7.79 | 113 | 146 | 72
energy - secretary | 1.81 | 0 | 0.00 | 334 | 316 | 316
secretary - senate | 5.06 | 0 | 0.00 | 239 | 316 | 316
energy - laboratory | 5.09 | 61 | 0.66 | 238 | 193 | 206
computer - laboratory | 6.78 | 46 | 0.55 | 142 | 210 | 215
weapon - secret | 6.06 | 66 | 1.01 | 192 | 186 | 180
FBI - fingerprint | 6.94 | 187 | 7.86 | 130 | 97 | 71
FBI - investigation | 8.31 | 126 | 13.68 | 38 | 128 | 45
investigation - effort | 4.59 | 32 | 0.62 | 257 | 231 | 209
Mars - water | 2.94 | 365 | 8.50 | 309 | 37 | 63
Mars - scientist | 5.63 | 84 | 0.87 | 219 | 168 | 188
news - report | 8.16 | 136 | 3.93 | 43 | 122 | 120
canyon - landscape | 7.53 | 16 | 0.14 | 91 | 253 | 252
image - surface | 4.56 | 254 | 4.38 | 258 | 70 | 111
discovery - space | 6.34 | 90 | 0.89 | 174 | 157 | 186
water - seepage | 6.56 | 23 | 0.39 | 160 | 238 | 226
sign - recess | 2.38 | 66 | 0.78 | 324 | 186 | 197
Wednesday - news | 2.22 | 3 | 0.02 | 327 | 300 | 298
mile - kilometer | 8.66 | 86 | 3.71 | 21 | 165 | 123
computer - news | 4.47 | 50 | 0.59 | 260 | 203 | 212
territory - surface | 5.34 | 29 | 0.22 | 229 | 234 | 245
atmosphere - landscape | 3.69 | 22 | 0.19 | 287 | 240 | 247
president - medal | 3 | 9 | 0.05 | 305 | 271 | 279
war - troops | 8.13 | 210 | 4.53 | 44 | 87 | 108
record - number | 6.31 | 920 | 15.82 | 178 | 6 | 38
skin - eye | 6.22 | 174 | 1.54 | 184 | 106 | 159
Japanese - American | 6.5 | 197 | 2.52 | 162 | 93 | 139
theater - history | 3.91 | 73 | 0.91 | 275 | 178 | 184
volunteer - motto | 2.56 | 3 | 0.02 | 318 | 300 | 298
prejudice - recognition | 3 | 5 | 0.03 | 306 | 292 | 290
decoration - valor | 5.63 | 3 | 0.02 | 218 | 300 | 298
century - year | 7.59 | 1598 | 38.72 | 85 | 3 | 17
century - nation | 3.16 | 1802 | 61.30 | 301 | 2 | 7
delay - racism | 1.19 | 0 | 0.00 | 343 | 316 | 316
delay - news | 3.31 | 9 | 0.06 | 298 | 271 | 273
minister - party | 6.63 | 58 | 0.76 | 155 | 197 | 198
peace - plan | 4.75 | 106 | 2.20 | 253 | 139 | 149
minority - peace | 3.69 | 11 | 0.07 | 285 | 264 | 268
attempt - peace | 4.25 | 90 | 2.07 | 267 | 157 | 151
government - crisis | 6.56 | 105 | 1.71 | 159 | 141 | 157
deployment - departure | 4.25 | 0 | 0.00 | 269 | 316 | 316
deployment - withdrawal | 5.88 | 0 | 0.00 | 209 | 316 | 316
energy - crisis | 5.94 | 44 | 1.39 | 203 | 213 | 165
announcement - news | 7.56 | 8 | 0.06 | 88 | 279 | 273
announcement - effort | 2.75 | 10 | 0.06 | 313 | 267 | 273
stroke - hospital | 7.03 | 174 | 2.88 | 124 | 106 | 134
disability - death | 5.47 | 23 | 0.35 | 226 | 238 | 231
victim - emergency | 6.47 | 47 | 0.80 | 165 | 208 | 193
treatment - recovery | 7.91 | 15 | 0.11 | 64 | 255 | 259
journal - association | 4.97 | 33 | 0.27 | 245 | 228 | 242
doctor - personnel | 5 | 14 | 0.11 | 242 | 256 | 259
doctor - liability | 5.19 | 11 | 0.10 | 236 | 264 | 262
liability - insurance | 7.03 | 265 | 53.75 | 125 | 64 | 11
school - center | 3.44 | 193 | 4.44 | 293 | 96 | 110
reason - hypertension | 2.31 | 68 | 1.26 | 325 | 184 | 168
reason - criterion | 5.91 | 37 | 0.56 | 206 | 222 | 213
hundred - percent | 7.38 | 120 | 1.24 | 106 | 131 | 169
Harvard - Yale | 8.13 | 401 | 35.76 | 46 | 31 | 21
hospital - infrastructure | 4.63 | 7 | 0.04 | 255 | 283 | 286
death - row | 5.25 | 438 | 6.97 | 235 | 28 | 77
death - inmate | 5.03 | 78 | 2.21 | 240 | 176 | 146
lawyer - evidence | 6.69 | 116 | 1.11 | 151 | 133 | 175
life - death | 7.88 | 607 | 43.05 | 66 | 16 | 16
life - term | 4.5 | 1507 | 49.06 | 259 | 4 | 14
word - similarity | 4.75 | 61 | 0.56 | 252 | 193 | 213
board - recommendation | 4.47 | 9 | 0.06 | 261 | 271 | 273
governor - interview | 3.25 | 9 | 0.07 | 299 | 271 | 268
OPEC - country | 5.63 | 40 | 0.38 | 220 | 218 | 229
peace - atmosphere | 3.69 | 7 | 0.04 | 286 | 283 | 286
peace - insurance | 2.94 | 6 | 0.04 | 308 | 288 | 286
territory - kilometer | 5.28 | 22 | 0.50 | 233 | 240 | 218
travel - activity | 5 | 79 | 0.93 | 244 | 173 | 183
competition - price | 6.44 | 95 | 2.92 | 167 | 151 | 133
consumer - confidence | 4.13 | 4 | 0.02 | 270 | 295 | 298
consumer - energy | 4.75 | 61 | 1.18 | 251 | 193 | 172
problem - airport | 2.38 | 12 | 0.08 | 323 | 260 | 264
car - flight | 4.94 | 261 | 4.14 | 247 | 67 | 116
credit - card | 8.06 | 442 | 226.92 | 54 | 27 | 1
credit - information | 5.31 | 277 | 12.98 | 231 | 56 | 48
hotel - reservation | 8.03 | 33 | 1.19 | 55 | 228 | 171
grocery - money | 5.94 | 0 | 0.00 | 202 | 316 | 316
registration - arrangement | 6 | 10 | 0.06 | 196 | 267 | 273
arrangement - accommodation | 5.41 | 2 | 0.01 | 228 | 308 | 308
month - hotel | 1.81 | 17 | 0.15 | 336 | 249 | 250
type - kind | 8.97 | 609 | 12.37 | 12 | 15 | 50
arrival - hotel | 6 | 0 | 0.00 | 195 | 316 | 316
bed - closet | 6.72 | 43 | 1.08 | 148 | 214 | 176
closet - clothes | 8 | 31 | 0.54 | 58 | 233 | 216
situation - conclusion | 4.81 | 48 | 0.45 | 249 | 206 | 220
situation - isolation | 3.88 | 17 | 0.14 | 278 | 249 | 252
impartiality - interest | 5.16 | 4 | 0.03 | 237 | 295 | 290
direction - combination | 2.25 | 52 | 0.33 | 326 | 201 | 233
street - place | 6.44 | 170 | 3.24 | 168 | 108 | 127
street - avenue | 8.88 | 87 | 0.95 | 15 | 162 | 182
street - block | 6.88 | 132 | 2.93 | 131 | 125 | 131
street - children | 4.94 | 92 | 1.52 | 246 | 154 | 160
listing - proximity | 2.56 | 2 | 0.01 | 319 | 308 | 308
listing - category | 6.38 | 7 | 0.05 | 171 | 283 | 279
cell - phone | 7.81 | 319 | 102.29 | 70 | 44 | 4
production - hike | 1.75 | 2 | 0.01 | 338 | 308 | 308
benchmark - index | 4.25 | 2 | 0.01 | 268 | 308 | 308
media - trading | 3.88 | 45 | 0.74 | 276 | 211 | 201
media - gain | 2.88 | 544 | 8.25 | 311 | 17 | 68
dividend - payment | 7.63 | 73 | 4.08 | 81 | 178 | 117
dividend - calculation | 6.48 | 0 | 0.00 | 163 | 316 | 316
calculation - computation | 8.44 | 12 | 0.12 | 31 | 260 | 255
currency - market | 7.5 | 153 | 2.93 | 93 | 113 | 131
OPEC - oil | 8.59 | 87 | 13.81 | 23 | 162 | 44
oil - stock | 6.34 | 78 | 0.66 | 175 | 176 | 206
announcement - production | 3.38 | 0 | 0.00 | 295 | 316 | 316
announcement - warning | 6 | 9 | 0.07 | 197 | 271 | 268
profit - warning | 3.88 | 7 | 0.05 | 277 | 283 | 279
profit - loss | 7.63 | 170 | 7.49 | 80 | 108 | 74
dollar - yen | 7.78 | 66 | 4.94 | 71 | 186 | 96
dollar - buck | 9.22 | 10 | 0.08 | 5 | 267 | 264
dollar - profit | 7.38 | 62 | 0.69 | 107 | 191 | 204
dollar - loss | 6.09 | 22 | 0.21 | 191 | 240 | 246
computer - software | 8.5 | 293 | 17.96 | 27 | 55 | 35
network - hardware | 8.31 | 79 | 2.28 | 39 | 173 | 145
phone - equipment | 7.13 | 262 | 4.70 | 118 | 65 | 105
equipment - maker | 5.91 | 42 | 0.36 | 207 | 215 | 230
luxury - car | 6.47 | 62 | 0.67 | 164 | 191 | 205
five - month | 3.38 | 182 | 2.41 | 294 | 103 | 142
report - gain | 3.63 | 461 | 7.57 | 290 | 24 | 73
investor - earning | 7.13 | 45 | 1.01 | 119 | 211 | 180
liquid - water | 7.89 | 449 | 12.17 | 65 | 25 | 51
baseball - season | 5.97 | 114 | 5.00 | 199 | 136 | 95
game - victory | 7.03 | 104 | 1.52 | 123 | 143 | 160
game - team | 7.69 | 825 | 65.19 | 76 | 9 | 5
marathon - sprint | 7.47 | 3 | 0.02 | 96 | 300 | 298
game - series | 6.19 | 393 | 12.54 | 189 | 34 | 49
game - defeat | 6.97 | 148 | 2.16 | 129 | 115 | 150
seven - series | 3.56 | 185 | 14.63 | 291 | 101 | 41
seafood - sea | 7.47 | 19 | 0.32 | 97 | 246 | 235
seafood - food | 8.34 | 26 | 0.39 | 35 | 235 | 226
seafood - lobster | 8.7 | 6 | 0.05 | 20 | 288 | 279
lobster - food | 7.81 | 71 | 1.18 | 69 | 182 | 172
lobster - wine | 5.7 | 0 | 0.00 | 216 | 316 | 316
food - preparation | 6.22 | 131 | 1.41 | 185 | 126 | 164
video - archive | 6.34 | 6 | 0.03 | 173 | 288 | 290
start - year | 4.06 | 887 | 18.01 | 272 | 7 | 34
start - match | 4.47 | 235 | 6.31 | 262 | 76 | 81
game - round | 5.97 | 709 | 21.38 | 198 | 12 | 27
boxing - round | 7.61 | 92 | 10.34 | 83 | 154 | 55
championship - tournament | 8.36 | 83 | 1.31 | 34 | 170 | 167
fighting - defeating | 7.41 | 2 | 0.01 | 104 | 308 | 308
line - insurance | 2.69 | 305 | 18.61 | 314 | 48 | 33
day - summer | 3.94 | 266 | 6.40 | 274 | 62 | 80
summer - drought | 7.16 | 0 | 0.00 | 117 | 316 | 316
summer - nature | 5.63 | 55 | 0.75 | 221 | 200 | 199
day - dawn | 7.53 | 103 | 1.05 | 90 | 145 | 179
nature - environment | 8.31 | 299 | 8.29 | 37 | 52 | 67
environment - ecology | 8.81 | 96 | 5.68 | 18 | 149 | 86
nature - man | 6.25 | 2368 | 57.34 | 180 | 1 | 10
man - woman | 8.3 | 842 | 21.15 | 40 | 8 | 28
man - governor | 5.25 | 275 | 15.78 | 234 | 58 | 39
murder - manslaughter | 8.53 | 222 | 49.56 | 25 | 82 | 13
soap - opera | 7.94 | 13 | 0.07 | 60 | 259 | 268
opera - performance | 6.88 | 400 | 16.19 | 133 | 32 | 37
life - lesson | 5.94 | 3 | 0.02 | 201 | 300 | 298
focus - life | 4.06 | 314 | 4.06 | 271 | 46 | 118
production - crew | 6.25 | 115 | 4.30 | 182 | 134 | 112
television - film | 7.72 | 446 | 8.35 | 74 | 26 | 64
lover - quarrel | 6.19 | 3 | 0.02 | 186 | 300 | 298
viewer - serial | 2.97 | 0 | 0.00 | 307 | 316 | 316
possibility - girl | 1.94 | 2 | 0.01 | 331 | 308 | 308
population - development | 3.75 | 331 | 5.49 | 282 | 43 | 89
morality - importance | 3.31 | 10 | 0.09 | 296 | 267 | 263
morality - marriage | 3.69 | 5 | 0.03 | 284 | 292 | 290
Mexico - Brazil | 7.44 | 120 | 2.04 | 100 | 131 | 153
gender - equality | 6.41 | 113 | 3.43 | 170 | 137 | 126
change - attitude | 5.44 | 180 | 13.19 | 227 | 105 | 47
family - planning | 6.25 | 89 | 1.17 | 181 | 159 | 174
opera - industry | 2.63 | 205 | 2.70 | 317 | 88 | 136
sugar - approach | 0.88 | 16 | 0.15 | 348 | 253 | 250
practice - institution | 3.19 | 303 | 5.91 | 300 | 50 | 84
ministry - culture | 4.69 | 8 | 0.05 | 254 | 279 | 279
problem - challenge | 6.75 | 151 | 1.79 | 145 | 114 | 154
size - prominence | 5.31 | 95 | 0.87 | 230 | 151 | 188
country - citizen | 7.31 | 305 | 4.91 | 111 | 48 | 97
planet - people | 5.75 | 145 | 2.44 | 215 | 116 | 141
development - issue | 3.97 | 384 | 4.90 | 273 | 35 | 98
experience - music | 3.47 | 275 | 2.54 | 292 | 58 | 138
music - project | 3.63 | 155 | 1.44 | 289 | 112 | 163
glass - metal | 5.56 | 302 | 8.81 | 224 | 51 | 61
aluminum - metal | 7.83 | 79 | 2.64 | 68 | 173 | 137
chance - credibility | 3.88 | 0 | 0.00 | 279 | 316 | 316
exhibit - memorabilia | 5.31 | 2 | 0.01 | 232 | 308 | 308
concert - virtuoso | 6.81 | 8 | 0.05 | 141 | 279 | 279
rock - jazz | 7.59 | 240 | 7.89 | 84 | 75 | 70
museum - theater | 7.19 | 22 | 0.28 | 116 | 240 | 239
observation - architecture | 4.38 | 6 | 0.03 | 266 | 288 | 290
space - world | 6.53 | 494 | 7.07 | 161 | 23 | 76
preservation - world | 6.19 | 96 | 0.87 | 190 | 149 | 188
admission - ticket | 7.69 | 85 | 2.39 | 75 | 166 | 144
shower - thunderstorm | 6.31 | 2 | 0.01 | 176 | 308 | 308
shower - flood | 6.03 | 34 | 0.31 | 194 | 227 | 237
weather - forecast | 8.34 | 72 | 3.90 | 36 | 181 | 122
disaster - area | 6.25 | 126 | 1.73 | 183 | 128 | 156
governor - office | 6.34 | 85 | 4.79 | 172 | 166 | 103
architecture - century | 3.78 | 202 | 5.59 | 281 | 90 | 87
Appendix C

Semantic Relatedness Scores & Rankings - Improved Corpus

This appendix provides the complete set of relatedness scores and score rankings for ws-353, the baseline system and ASKNet, as computed on the improved corpus. An extract from this table is given in Table 6.6.
Word Pair | ws-353 Score | Baseline Score | ASKNet Score | ws-353 Rank | Baseline Rank | ASKNet Rank
love - sex | 6.77 | 599 | 5.74 | 144 | 110 | 169
tiger - cat | 7.35 | 947 | 1.82 | 109 | 73 | 270
tiger - tiger | 10 | 8773 | 175.86 | 1 | 8 | 8
book - paper | 7.46 | 5215 | 36.2 | 98 | 20 | 51
computer - keyboard | 7.62 | 1403 | 26.45 | 82 | 59 | 65
computer - internet | 7.58 | 390 | 5.2 | 86 | 144 | 178
plane - car | 5.77 | 922 | 1.09 | 214 | 76 | 288
train - car | 6.31 | 3705 | 5.16 | 177 | 26 | 179
telephone - communication | 7.5 | 491 | 15.52 | 94 | 127 | 95
television - radio | 6.77 | 2657 | 47.63 | 143 | 33 | 38
media - radio | 7.42 | 833 | 11.61 | 103 | 84 | 112
drug - abuse | 6.85 | 435 | 19.14 | 138 | 136 | 85
bread - butter | 6.19 | 693 | 21.91 | 188 | 98 | 77
cucumber - potato | 5.92 | 20 | 0.26 | 204 | 287 | 321
doctor - nurse | 7 | 426 | 50.97 | 127 | 139 | 36
professor - doctor | 6.62 | 1545 | 126.82 | 156 | 52 | 14
student - professor | 6.81 | 1503 | 33.08 | 139 | 53 | 55
smart - student | 4.62 | 23 | 2.34 | 256 | 281 | 246
smart - stupid | 5.81 | 3 | 12.93 | 213 | 323 | 105
company - stock | 7.08 | 3259 | 41.84 | 121 | 30 | 44
stock - market | 8.08 | 6061 | 49.58 | 52 | 15 | 37
stock - phone | 1.62 | 33 | 0.34 | 340 | 267 | 318
stock - CD | 1.31 | 7 | 0.14 | 341 | 311 | 332
stock - jaguar | 0.92 | 430 | 16.25 | 345 | 138 | 89
stock - egg | 1.81 | 8 | 0.2 | 335 | 303 | 326
fertility - egg | 6.69 | 28 | 2.93 | 150 | 274 | 231
stock - live | 3.73 | 318 | 2.57 | 283 | 157 | 235
stock - life | 0.92 | 338 | 2.08 | 344 | 152 | 263
book - library | 7.46 | 4419 | 67.1 | 99 | 23 | 30
bank - money | 8.12 | 3561 | 33.85 | 48 | 27 | 53
wood - forest | 7.73 | 1836 | 25.36 | 72 | 42 | 69
money - cash | 9.15 | 901 | 37.97 | 6 | 77 | 47
professor - cucumber | 0.31 | 217 | 2.9 | 352 | 177 | 232
king - cabbage | 0.23 | 88 | 0.96 | 353 | 221 | 297
king - queen | 8.58 | 108 | 3.4 | 24 | 209 | 221
king - rook | 5.92 | 476 | 5.07 | 205 | 131 | 180
bishop - rabbi | 6.69 | 103 | 1.76 | 152 | 212 | 271
Jerusalem - Israel | 8.46 | 11617 | 98.15 | 28 | 7 | 17
Jerusalem - Palestinian | 7.65 | 4212 | 97.15 | 79 | 24 | 18
holy - sex | 1.62 | 283 | 4.87 | 339 | 161 | 186
fuck - sex | 9.44 | 392 | 10.69 | 2 | 143 | 114
Maradona - football | 8.62 | 844 | 36.74 | 22 | 81 | 50
football - soccer | 9.03 | 1412 | 133.9 | 10 | 57 | 13
football - basketball | 6.81 | 529 | 20.01 | 140 | 121 | 84
football - tennis | 6.63 | 84 | 2.2 | 154 | 222 | 252
tennis - racket | 7.56 | 70 | 32.38 | 89 | 237 | 57
Arafat - peace | 6.73 | 2093 | 107.12 | 147 | 39 | 16
Arafat - terror | 7.65 | 673 | 17.16 | 78 | 102 | 87
Arafat - Jackson | 2.5 | 55 | 0.73 | 321 | 249 | 304
law - lawyer | 8.38 | 13792 | 110.76 | 33 | 3 | 15
movie - star | 7.38 | 1479 | 9.96 | 108 | 54 | 122
movie - popcorn | 6.19 | 1234 | 42.94 | 187 | 64 | 42
movie - critic | 6.73 | 248 | 3.88 | 146 | 169 | 209
movie - theater | 7.92 | 6648 | 168.49 | 62 | 11 | 9
physics - proton | 8.12 | 1378 | 22.38 | 47 | 60 | 74
physics - chemistry | 7.35 | 433 | 20.8 | 110 | 137 | 81
space - chemistry | 4.88 | 144 | 2.42 | 248 | 195 | 239
alcohol - chemistry | 5.54 | 61 | 2.37 | 225 | 244 | 244
vodka - gin | 8.46 | 81 | 0.5 | 29 | 224 | 313
vodka - brandy | 8.13 | 20 | 4.96 | 45 | 285 | 184
drink - car | 3.04 | 621 | 3.39 | 304 | 107 | 222
drink - ear | 1.31 | 1327 | 1.65 | 342 | 63 | 272
drink - mouth | 5.96 | 43 | 4.99 | 200 | 260 | 183
drink - eat | 6.87 | 2170 | 4.29 | 134 | 38 | 198
baby - mother | 7.85 | 53 | 14.06 | 67 | 252 | 102
drink - mother | 2.65 | 48 | 1.08 | 316 | 256 | 289
car - automobile | 8.94 | 621 | 5.65 | 14 | 106 | 170
gem - jewel | 8.96 | 5917 | 70.91 | 13 | 16 | 28
journey - voyage | 9.29 | 3 | 24.71 | 3 | 320 | 71
boy - lad | 8.83 | 90 | 3.97 | 17 | 219 | 205
coast - shore | 9.1 | 377 | 63.43 | 7 | 148 | 33
asylum - madhouse | 8.87 | 10 | 135.62 | 16 | 300 | 12
magician - wizard | 9.02 | 3434 | 406.39 | 11 | 28 | 3
midday - noon | 9.29 | 33 | 222.97 | 4 | 268 | 6
furnace - stove | 8.79 | 289 | 31.9 | 19 | 160 | 58
food - fruit | 7.52 | 489 | 6.91 | 92 | 130 | 151
bird - cock | 7.1 | 565 | 16.11 | 120 | 116 | 90
bird - crane | 7.38 | 80 | 7.52 | 105 | 226 | 142
tool - implement | 6.46 | 136 | 6.36 | 166 | 199 | 157
brother - monk | 6.27 | 597 | 87.11 | 179 | 111 | 20
crane - implement | 2.69 | 176 | 2.96 | 315 | 185 | 230
lad - brother | 4.46 | 12 | 1.4 | 263 | 298 | 279
journey - car | 5.85 | 107 | 3.06 | 212 | 210 | 227
monk - oracle | 5 | 61 | 17.06 | 241 | 242 | 88
cemetery - woodland | 2.08 | 0 | 0 | 329 | 352 | 352
food - rooster | 4.42 | 35 | 2.39 | 264 | 265 | 242
coast - hill | 4.38 | 31 | 2.12 | 265 | 271 | 259
forest - graveyard | 1.85 | 22 | 0.29 | 333 | 282 | 320
shore - woodland | 3.08 | 1 | 0.01 | 303 | 342 | 342
monk - slave | 0.92 | 4 | 0.17 | 347 | 318 | 329
coast - forest | 3.15 | 226 | 9.72 | 302 | 174 | 124
lad - wizard | 0.92 | 7 | 0.16 | 346 | 313 | 330
chord - smile | 0.54 | 0 | 0 | 350 | 348 | 348
glass - magician | 2.08 | 0 | 0 | 330 | 349 | 349
noon - string | 0.54 | 0 | 0 | 351 | 350 | 350
rooster - voyage | 0.62 | 1 | 0.02 | 349 | 332 | 339
money - dollar | 8.42 | 559 | 15.27 | 32 | 118 | 96
money - cash | 9.08 | 933 | 38.39 | 8 | 75 | 46
money - currency | 9.04 | 1409 | 37.1 | 9 | 58 | 49
money - wealth | 8.27 | 596 | 7.37 | 41 | 112 | 144
money - property | 7.57 | 679 | 6.19 | 87 | 100 | 164
money - possession | 7.29 | 326 | 10.43 | 112 | 154 | 116
money - bank | 8.5 | 3271 | 29.99 | 26 | 29 | 60
money - deposit | 7.73 | 1130 | 21.39 | 73 | 67 | 78
money - withdrawal | 6.88 | 80 | 5.62 | 132 | 227 | 173
money - laundering | 5.65 | 8005 | 199.95 | 217 | 9 | 7
money - operation | 3.31 | 779 | 10.2 | 297 | 91 | 118
tiger - jaguar | 8 | 490 | 23.37 | 57 | 129 | 73
tiger - feline | 8 | 42 | 5.82 | 59 | 262 | 168
tiger - carnivore | 7.08 | 73 | 10.08 | 122 | 234 | 120
tiger - mammal | 6.85 | 54 | 1.29 | 135 | 251 | 282
tiger - animal | 7 | 873 | 7.08 | 128 | 79 | 149
tiger - organism | 4.77 | 24 | 0.45 | 250 | 279 | 316
tiger - fauna | 5.62 | 8 | 1.93 | 222 | 305 | 265
tiger - zoo | 5.87 | 353 | 8.75 | 210 | 150 | 130
psychology - psychiatry | 8.08 | 15 | 0.2 | 51 | 295 | 325
psychology - anxiety | 7 | 0 | 0 | 126 | 344 | 344
psychology - fear | 6.85 | 15 | 0.19 | 137 | 296 | 327
psychology - depression | 7.42 | 2 | 0.02 | 101 | 330 | 338
psychology - clinic | 6.58 | 16 | 0.22 | 158 | 291 | 323
psychology - doctor | 6.42 | 2 | 0.03 | 169 | 326 | 337
psychology - Freud | 8.21 | 161 | 2.14 | 42 | 187 | 258
psychology - mind | 7.69 | 123 | 1.64 | 77 | 204 | 273
psychology - health | 7.23 | 158 | 2.1 | 115 | 188 | 260
psychology - science | 6.71 | 476 | 6.34 | 149 | 132 | 158
psychology - discipline | 5.58 | 97 | 1.3 | 223 | 214 | 281
psychology - cognition | 7.48 | 379 | 5.05 | 95 | 146 | 181
planet - star | 8.45 | 3022 | 15.23 | 30 | 31 | 97
planet - constellation | 8.06 | 136 | 6.34 | 53 | 200 | 159
planet - moon | 8.08 | 584 | 15.76 | 50 | 114 | 91
planet - sun | 8.02 | 1930 | 24.81 | 56 | 40 | 70
planet - galaxy | 8.11 | 225 | 5.87 | 49 | 175 | 167
planet - space | 7.92 | 976 | 7.06 | 63 | 71 | 150
planet - astronomer | 7.94 | 742 | 27.85 | 61 | 95 | 61
precedent - example | 5.85 | 279 | 4.34 | 211 | 162 | 196
precedent - information | 3.85 | 62 | 0.92 | 280 | 240 | 298
precedent - cognition | 2.81 | 13 | 2.21 | 312 | 297 | 251
precedent - law | 6.65 | 2541 | 42.93 | 153 | 34 | 43
precedent - collection | 2.5 | 7 | 0.39 | 320 | 308 | 317
precedent - group | 1.77 | 49 | 1.16 | 337 | 254 | 285
precedent - antecedent | 6.04 | 8 | 0.11 | 193 | 304 | 334
cup - coffee | 6.58 | 709 | 21.11 | 157 | 96 | 79
cup - tableware | 6.85 | 0 | 0 | 136 | 353 | 353
cup - article | 2.4 | 498 | 2.25 | 322 | 124 | 250
cup - artifact | 2.92 | 7 | 0.99 | 310 | 310 | 294
cup - object | 3.69 | 76 | 0.91 | 288 | 231 | 300
cup - entity | 2.15 | 45 | 1.44 | 328 | 259 | 278
cup - drink | 7.25 | 210 | 6.78 | 114 | 179 | 153
cup - food | 5 | 155 | 2.1 | 243 | 190 | 262
cup - substance | 1.92 | 26 | 0.79 | 332 | 276 | 303
cup - liquid | 5.9 | 65 | 2.27 | 208 | 239 | 248
jaguar - cat | 7.42 | 2505 | 7.62 | 102 | 35 | 139
jaguar - car | 7.27 | 779 | 4.41 | 113 | 90 | 193
energy - secretary | 1.81 | 77 | 1.03 | 334 | 230 | 292
secretary - senate | 5.06 | 205 | 2.74 | 239 | 181 | 234
energy - laboratory | 5.09 | 74 | 2.01 | 238 | 233 | 264
computer - laboratory | 6.78 | 245 | 5.63 | 142 | 171 | 171
weapon - secret | 6.06 | 137 | 7.61 | 192 | 197 | 140
FBI - fingerprint | 6.94 | 792 | 64.03 | 130 | 87 | 32
FBI - investigation | 8.31 | 1587 | 240.38 | 38 | 50 | 4
investigation - effort | 4.59 | 72 | 9.26 | 257 | 235 | 127
Mars - water | 2.94 | 860 | 4.17 | 309 | 80 | 203
Mars - scientist | 5.63 | 88 | 3.02 | 219 | 220 | 228
news - report | 8.16 | 1470 | 25.91 | 43 | 55 | 67
canyon - landscape | 7.53 | 16 | 9.67 | 91 | 294 | 125
image - surface | 4.56 | 602 | 6.23 | 258 | 109 | 161
discovery - space | 6.34 | 408 | 7.23 | 174 | 141 | 145
water - seepage | 6.56 | 56 | 6.01 | 160 | 247 | 166
sign - recess | 2.38 | 78 | 1.47 | 324 | 229 | 276
Wednesday - news | 2.22 | 19 | 0.91 | 327 | 288 | 301
mile - kilometer | 8.66 | 1124 | 72.48 | 21 | 68 | 27
computer - news | 4.47 | 126 | 2.15 | 260 | 203 | 257
territory - surface | 5.34 | 24 | 0.46 | 229 | 280 | 315
atmosphere - landscape | 3.69 | 19 | 2.25 | 287 | 289 | 249
president - medal | 3 | 22 | 0.72 | 305 | 283 | 305
war - troops | 8.13 | 457 | 10.41 | 44 | 134 | 117
record - number | 6.31 | 12330 | 146.27 | 178 | 5 | 10
skin - eye | 6.22 | 203 | 4.74 | 184 | 182 | 188
Japanese - American | 6.5 | 377 | 4.21 | 162 | 147 | 201
theater - history | 3.91 | 116 | 1.47 | 275 | 205 | 277
volunteer - motto | 2.56 | 2 | 5.42 | 318 | 324 | 176
prejudice - recognition | 3 | 3 | 0.88 | 306 | 322 | 302
decoration - valor | 5.63 | 2 | 54.05 | 218 | 328 | 34
century - year | 7.59 | 4709 | 13.83 | 85 | 21 | 103
century - nation | 3.16 | 6141 | 3.47 | 301 | 13 | 218
delay - racism | 1.19 | 0 | 0 | 343 | 346 | 346
delay - news | 3.31 | 154 | 3.42 | 298 | 191 | 220
minister - party | 6.63 | 1768 | 26.24 | 155 | 43 | 66
peace - plan | 4.75 | 227 | 1.88 | 253 | 173 | 268
minority - peace | 3.69 | 8 | 1.06 | 285 | 306 | 291
attempt - peace | 4.25 | 208 | 7.17 | 267 | 180 | 147
government - crisis | 6.56 | 187 | 3.58 | 159 | 184 | 216
deployment - departure | 4.25 | 782 | 10.43 | 269 | 89 | 115
deployment - withdrawal | 5.88 | 0 | 0 | 209 | 351 | 351
energy - crisis | 5.94 | 550 | 7.67 | 203 | 119 | 138
announcement - news | 7.56 | 158 | 7.48 | 88 | 189 | 143
announcement - effort | 2.75 | 8 | 4.44 | 313 | 307 | 191
stroke - hospital | 7.03 | 289 | 8.71 | 124 | 159 | 131
disability - death | 5.47 | 68 | 2.15 | 226 | 238 | 256
victim - emergency | 6.47 | 110 | 11.95 | 165 | 207 | 109
treatment - recovery | 7.91 | 56 | 2.51 | 64 | 248 | 237
journal - association | 4.97 | 29 | 2.29 | 245 | 272 | 247
doctor - personnel | 5 | 20 | 3.47 | 242 | 286 | 219
doctor - liability | 5.19 | 12 | 1.38 | 236 | 299 | 280
liability - insurance | 7.03 | 6153 | 225.19 | 125 | 12 | 5
school - center | 3.44 | 623 | 7.22 | 293 | 105 | 146
reason - hypertension | 2.31 | 126 | 4.42 | 325 | 202 | 192
reason - criterion | 5.91 | 62 | 6.2 | 206 | 241 | 163
hundred - percent | 7.38 | 264 | 6.36 | 106 | 168 | 156
Harvard - Yale | 8.13 | 3798 | 79.17 | 46 | 25 | 22
hospital - infrastructure | 4.63 | 43 | 0.92 | 255 | 261 | 299
death - row | 5.25 | 700 | 2.1 | 235 | 97 | 261
death - inmate | 5.03 | 222 | 30.52 | 240 | 176 | 59
lawyer - evidence | 6.69 | 134 | 4.21 | 151 | 201 | 200
life - death | 7.88 | 4703 | 22.04 | 66 | 22 | 76
life - term | 4.5 | 5242 | 7.73 | 259 | 19 | 137
word - similarity | 4.75 | 95 | 6.25 | 252 | 217 | 160
board - recommendation | 4.47 | 17 | 1 | 261 | 290 | 293
governor - interview | 3.25 | 8 | 3.9 | 299 | 302 | 207
OPEC - country | 5.63 | 5368 | 72.98 | 220 | 18 | 26
peace - atmosphere | 3.69 | 4 | 0.24 | 286 | 317 | 322
peace - insurance | 2.94 | 5 | 0.16 | 308 | 316 | 331
territory - kilometer | 5.28 | 50 | 10.19 | 233 | 253 | 119
travel - activity | 5 | 142 | 3.23 | 244 | 196 | 225
competition - price | 6.44 | 315 | 5.63 | 167 | 158 | 172
consumer - confidence | 4.13 | 371 | 5.45 | 270 | 149 | 175
consumer - energy | 4.75 | 216 | 2.52 | 251 | 178 | 236
problem - airport | 2.38 | 76 | 1.15 | 323 | 232 | 286
car - flight | 4.94 | 437 | 2.16 | 247 | 135 | 255
credit - card | 8.06 | 47692 | 524.34 | 54 | 1 | 2
credit - information | 5.31 | 1705 | 14.11 | 231 | 47 | 100
hotel - reservation | 8.03 | 276 | 37.73 | 55 | 163 | 48
grocery - money | 5.94 | 2 | 0.03 | 202 | 325 | 336
registration - arrangement | 6 | 7 | 2.18 | 196 | 312 | 254
arrangement - accommodation | 5.41 | 1 | 0.55 | 228 | 338 | 312
month - hotel | 1.81 | 49 | 1.55 | 336 | 255 | 275
type - kind | 8.97 | 1355 | 6.42 | 12 | 61 | 155
arrival - hotel | 6 | 54 | 0.71 | 195 | 250 | 306
bed - closet | 6.72 | 109 | 7.81 | 148 | 208 | 136
closet - clothes | 8 | 81 | 64.49 | 58 | 225 | 31
situation - conclusion | 4.81 | 57 | 4.17 | 249 | 246 | 202
situation - isolation | 3.88 | 16 | 2.75 | 278 | 292 | 233
impartiality - interest | 5.16 | 202 | 3.73 | 237 | 183 | 212
direction - combination | 2.25 | 35 | 1.9 | 326 | 266 | 267
street - place | 6.44 | 339 | 2.96 | 168 | 151 | 229
street - avenue | 8.88 | 95 | 43.64 | 15 | 216 | 41
street - block | 6.88 | 324 | 17.24 | 131 | 155 | 86
street - children | 4.94 | 153 | 5.03 | 246 | 193 | 182
listing - proximity | 2.56 | 1 | 2.39 | 319 | 337 | 243
listing - category | 6.38 | 147 | 4.63 | 171 | 194 | 190
cell - phone | 7.81 | 11828 | 78.55 | 70 | 6 | 24
production - hike | 1.75 | 1 | 0.12 | 338 | 334 | 333
benchmark - index | 4.25 | 2 | 0.97 | 268 | 331 | 296
media - trading | 3.88 | 79 | 3.62 | 276 | 228 | 214
media - gain | 2.88 | 831 | 3.11 | 311 | 85 | 226
dividend - payment | 7.63 | 672 | 41.56 | 81 | 103 | 45
dividend - calculation | 6.48 | 1 | 0.01 | 163 | 341 | 341
calculation - computation | 8.44 | 57 | 8.33 | 31 | 245 | 135
currency - market | 7.5 | 401 | 7.08 | 93 | 142 | 148
OPEC - oil | 8.59 | 1668 | 78.74 | 23 | 48 | 23
oil - stock | 6.34 | 72 | 0.68 | 175 | 236 | 309
announcement - production | 3.38 | 3 | 0.04 | 295 | 321 | 335
announcement - warning | 6 | 7 | 26.72 | 197 | 309 | 63
profit - warning | 3.88 | 6 | 1.07 | 277 | 314 | 290
profit - loss | 7.63 | 837 | 20.12 | 80 | 83 | 83
dollar - yen | 7.78 | 495 | 80.5 | 71 | 125 | 21
dollar - buck | 9.22 | 9 | 3.36 | 5 | 301 | 223
dollar - profit | 7.38 | 113 | 4.68 | 107 | 206 | 189
dollar - loss | 6.09 | 32 | 0.98 | 191 | 269 | 295
computer - software | 8.5 | 2696 | 45.09 | 27 | 32 | 40
network - hardware | 8.31 | 246 | 25.75 | 39 | 170 | 68
phone - equipment | 7.13 | 491 | 11.48 | 118 | 126 | 113
equipment - maker | 5.91 | 36 | 3.92 | 207 | 264 | 206
luxury - car | 6.47 | 95 | 2.5 | 164 | 215 | 238
five - month | 3.38 | 242 | 4.38 | 294 | 172 | 194
report - gain | 3.63 | 786 | 3.8 | 290 | 88 | 211
investor - earning | 7.13 | 101 | 12.41 | 119 | 213 | 106
liquid - water | 7.89 | 1709 | 15.54 | 65 | 46 | 94
baseball - season | 5.97 | 1204 | 35.96 | 199 | 66 | 52
game - victory | 7.03 | 164 | 6.22 | 123 | 186 | 162
game - team | 7.69 | 6831 | 24.58 | 76 | 10 | 72
marathon - sprint | 7.47 | 2 | 1.19 | 96 | 329 | 284
game - series | 6.19 | 6124 | 74.28 | 189 | 14 | 25
game - defeat | 6.97 | 274 | 8.43 | 129 | 164 | 134
seven - series | 3.56 | 1554 | 33.01 | 291 | 51 | 56
seafood - sea | 7.47 | 47 | 3.59 | 97 | 258 | 215
seafood - food | 8.34 | 41 | 11.66 | 35 | 263 | 111
seafood - lobster | 8.7 | 21 | 9.66 | 20 | 284 | 126
lobster - food | 7.81 | 529 | 9.75 | 69 | 120 | 123
lobster - wine | 5.7 | 0 | 0 | 216 | 345 | 345
food - preparation | 6.22 | 619 | 11.89 | 185 | 108 | 110
video - archive | 6.34 | 28 | 0.59 | 173 | 275 | 311
start - year | 4.06 | 2172 | 8.63 | 272 | 37 | 132
start - match | 4.47 | 678 | 8.6 | 262 | 101 | 133
game - round | 5.97 | 2479 | 7.57 | 198 | 36 | 141
boxing - round | 7.61 | 1035 | 15.03 | 83 | 70 | 98
championship - tournament | 8.36 | 153 | 20.84 | 34 | 192 | 80
fighting - defeating | 7.41 | 1 | 3.27 | 104 | 339 | 224
line - insurance | 2.69 | 1883 | 6.11 | 314 | 41 | 165
day - summer | 3.94 | 1338 | 15.6 | 274 | 62 | 93
summer - drought | 7.16 | 0 | 0 | 117 | 343 | 343
summer - nature | 5.63 | 82 | 2.4 | 221 | 223 | 240
day - dawn | 7.53 | 137 | 4.3 | 90 | 198 | 197
nature - environment | 8.31 | 1118 | 9.97 | 37 | 69 | 121
environment - ecology | 8.81 | 768 | 33.66 | 18 | 92 | 54
nature - man | 6.25 | 5829 | 4.27 | 180 | 17 | 199
man - woman | 8.3 | 12541 | 142.99 | 40 | 4 | 11
man - governor | 5.25 | 1593 | 3.85 | 234 | 49 | 210
murder - manslaughter | 8.53 | 15307 | 634.36 | 25 | 2 | 1
soap - opera | 7.94 | 328 | 4.38 | 60 | 153 | 195
opera - performance | 6.88 | 1755 | 15.65 | 133 | 44 | 92
life - lesson | 5.94 | 47 | 0.71 | 201 | 257 | 307
focus - life | 4.06 | 574 | 4.92 | 271 | 115 | 185
production - crew | 6.25 | 528 | 13.21 | 182 | 122 | 104
television - film | 7.72 | 1753 | 22.37 | 74 | 45 | 75
lover - quarrel | 6.19 | 2 | 20.62 | 186 | 327 | 82
viewer - serial | 2.97 | 1 | 0.01 | 307 | 335 | 340
possibility - girl | 1.94 | 1 | 0.17 | 331 | 333 | 328
population - development | 3.75 | 693 | 4.14 | 282 | 99 | 204
morality - importance | 3.31 | 25 | 1.85 | 296 | 277 | 269
morality - marriage | 3.69 | 61 | 1.09 | 284 | 243 | 287
Mexico - Brazil | 7.44 | 272 | 3.88 | 100 | 166 | 208
gender - equality | 6.41 | 1227 | 46.43 | 170 | 65 | 39
change - attitude | 5.44 | 1441 | 14.45 | 227 | 56 | 99
family - planning | 6.25 | 822 | 14.1 | 181 | 86 | 101
opera - industry | 2.63 | 272 | 1.91 | 317 | 165 | 266
sugar - approach | 0.88 | 16 | 0.32 | 348 | 293 | 319
practice - institution | 3.19 | 625 | 4.74 | 300 | 104 | 187
ministry - culture | 4.69 | 25 | 0.64 | 254 | 278 | 310
problem - challenge | 6.75 | 490 | 8.77 | 145 | 128 | 129
size - prominence | 5.31 | 92 | 2.39 | 230 | 218 | 241
country - citizen | 7.31 | 525 | 6.84 | 111 | 123 | 152
planet - people | 5.75 | 322 | 1.63 | 215 | 156 | 274
development - issue | 3.97 | 594 | 3.49 | 273 | 113 | 217
experience - music | 3.47 | 389 | 3.69 | 292 | 145 | 213
music - project | 3.63 | 957 | 12.28 | 289 | 72 | 107
glass - metal | 5.56 | 886 | 12.1 | 224 | 78 | 108
aluminum - metal | 7.83 | 264 | 51.45 | 68 | 167 | 35
chance - credibility | 3.88 | 0 | 0 | 279 | 347 | 347
exhibit - memorabilia | 5.31 | 1 | 0.68 | 232 | 336 | 308
concert - virtuoso | 6.81 | 5 | 5.24 | 141 | 315 | 177
rock - jazz | 7.59 | 840 | 26.46 | 84 | 82 | 64
museum - theater | 7.19 | 28 | 1.23 | 116 | 273 | 283
observation - architecture | 4.38 | 4 | 0.2 | 266 | 319 | 324
space - world | 6.53 | 750 | 2.2 | 161 | 93 | 253
preservation - world | 6.19 | 104 | 2.35 | 190 | 211 | 245
admission - ticket | 7.69 | 461 | 70.23 | 75 | 133 | 29
shower - thunderstorm | 6.31 | 1 | 0.48 | 176 | 340 | 314
shower - flood | 6.03 | 31 | 9.23 | 194 | 270 | 128
weather - forecast | 8.34 | 937 | 90.72 | 36 | 74 | 19
disaster - area | 6.25 | 414 | 6.48 | 183 | 140 | 154
governor - office | 6.34 | 742 | 27.85 | 172 | 94 | 62
architecture - century | 3.78 | 560 | 5.55 | 281 | 117 | 174
Bibliography

Satanjeev Banerjee and T. Pedersen. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), 2003.
M. Bilenko, R. Mooney, W. Cohen, P. Ravikumar, and S. Fienberg. Adaptive name matching in information integration. IEEE Intelligent Systems, 18:16–23, Sep/Oct 2003.
Johan Bos. Towards wide-coverage semantic interpretation. In Proceedings of the Sixth International Workshop on Computational Semantics (IWCS-6), pages 42–53, 2005.
Johan Bos, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. Wide-coverage semantic representations from a CCG parser. In Proceedings of the 20th International Conference on Computational Linguistics (COLING-04), pages 1240–1246, Geneva, Switzerland, 2004.
T. Briscoe and J. Carroll. Robust accurate statistical annotation of general text. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, pages 1499–1504, Las Palmas, Gran Canaria, 2002.
Alexander Budanitsky and Graeme Hirst. Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32:13–47, March 2006.
J. Carletta. Assessing agreement on classification tasks: the Kappa statistic. Computational Linguistics, 22(2):249–254, 1996.
Eugene Charniak. A maximum-entropy-inspired parser. In Proceedings of the First Conference of the North American Chapter of the Association for Computational Linguistics, pages 132–139, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc.
Eugene Charniak and Micha Elsner. EM works for pronoun anaphora resolution. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 148–156, Athens, Greece, 2009.
Eugene Charniak and Robert P. Goldman. A Bayesian model of plan recognition. Artificial Intelligence, 64(1):53–79, 1993.
Kenneth Ward Church and Patrick Hanks. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):22–29, 1990.
S. Clark and J. R. Curran. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4):493–552, 2007.
Stephen Clark and James R. Curran. Parsing the WSJ using CCG and log-linear models. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL '04), pages 104–111, Barcelona, Spain, 2004.
Allan M. Collins and Elizabeth F. Loftus. A spreading-activation theory of semantic processing. Psychological Review, 82(6):407–428, 1975.
M. Collins. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637, 2003.
Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania, 1999.
F. Crestani. Application of spreading activation techniques in information retrieval. Artificial Intelligence Review, 11(6):453–482, Dec 1997.
J. R. Curran and S. Clark. Language independent NER using a maximum entropy tagger. In Proceedings of the Seventh Conference on Natural Language Learning (CoNLL-03), pages 164–167, Edmonton, Canada, 2003.
Jon Curtis, G. Matthews, and D. Baxter. On the effective use of Cyc in a question answering system. In Papers from the IJCAI Workshop on Knowledge and Reasoning for Answering Questions, Edinburgh, Scotland, 2005.
Jon Curtis, D. Baxter, and J. Cabral. On the application of the Cyc ontology to word sense disambiguation. In Proceedings of the Nineteenth International FLAIRS Conference, pages 652–657, Melbourne Beach, FL, May 2006.
H. Trang Dang, J. Lin, and D. Kelly. Overview of the TREC 2006 question answering track. In Proceedings of the Fifteenth Text Retrieval Conference (TREC 2006), Gaithersburg, MD, 2006.
E. W. Dijkstra. A note on two problems in connexion with graphs. Numerische Mathematik, 1:269–271, 1959.
William B. Dolan, L. Vanderwende, and S. Richardson. Automatically deriving a structured knowledge base from on-line dictionaries. In Proceedings of the Pacific Association for Computational Linguistics, Vancouver, British Columbia, April 1993.
Oren Etzioni, Michael Cafarella, Doug Downey, Stanley Kok, Ana-Maria Popescu, Tal Shaked, Stephen Soderland, Daniel S. Weld, and Alexander Yates. Web-scale information extraction in KnowItAll: (preliminary results). In WWW '04: Proceedings of the 13th International Conference on World Wide Web, pages 100–110, New York, NY, USA, 2004. ACM.
Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, USA, 1998.
Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1):116–131, 2002.
Emden R. Gansner and Stephen C. North. An open graph visualization system and its applications to software engineering. Software: Practice and Experience, 30(11):1203–1233, 2000.
R. V. Guha and A. Garg. Disambiguating people in search. In 13th World Wide Web Conference (WWW 2004), New York, USA, 2004.
A. Hickl, J. Williams, J. Bensley, K. Roberts, B. Rink, and Y. Shi. Recognizing textual entailment with LCC's groundhog system. In Proceedings of the Second PASCAL Challenges Workshop, Venice, Italy, 2006.
G. Hirst. Semantic Interpretation and the Resolution of Ambiguity. Studies in Natural Language Processing. Cambridge University Press, Cambridge, UK, 1987.
J. Hockenmaier. Data and Models for Statistical Parsing with Combinatory Categorial Grammar. PhD thesis, University of Edinburgh, 2003.
Thad Hughes and Daniel Ramage. Lexical semantic relatedness with random graph walks. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 581–589, Prague, Czech Republic, 2007.
J. J. Jiang and D. W. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In International Conference on Research on Computational Linguistics (ROCLING X), Taipei, Taiwan, September 1997.
H. Kamp. A theory of truth and semantic representation. In J. Groenendijk et al., editors, Formal Methods in the Study of Language. Mathematisch Centrum, 1981.
Hans Kamp and Uwe Reyle. From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory. Kluwer Academic, Dordrecht, 1993.
Rick Kjeldsen and Paul R. Cohen. The evolution and performance of the GRANT system. Technical report, University of Massachusetts, Amherst, MA, USA, 1988.
Dan Klein and Christopher D. Manning. Fast exact inference with a factored model for natural language parsing. Advances in Neural Information Processing Systems, 15:3–10, 2003.
Douglas B. Lenat. Cyc: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11):33–38, 1995.
Dekang Lin. An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning, 1998.
H. Liu and P. Singh. Commonsense reasoning in and over natural language. In Proceedings of the 8th International Conference on Knowledge-Based Intelligent Information & Engineering Systems (KES'2004), Wellington, New Zealand, 2004a.
H. Liu and P. Singh. ConceptNet: A practical commonsense reasoning tool-kit. BT Technology Journal, 22:211–226, Oct 2004b.
Margaret Masterman. Semantic message detection for machine translation, using an interlingua. In Proceedings of the 1961 International Conference on Machine Translation of Languages and Applied Language Analysis, pages 438–475, London, 1962.
Cynthia Matuszek, J. Cabral, M. Witbrock, and J. DeOliveira. An introduction to the syntax and content of Cyc. In 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, Stanford, CA, USA, March 2006.
D. E. Meyer and R. W. Schvaneveldt. Facilitation in recognizing pairs of words: Evidence of a dependence between retrieval operations. Journal of Experimental Psychology, 90(2):227–234, 1971.
D. Moldovan, S. Harabagiu, R. Girju, P. Morarescu, F. Lacatusu, A. Novischi, A. Badulescu, and O. Bolohan. LCC tools for question answering. In 11th Text Retrieval Conference, Gaithersburg, MD, 2002.
Vivi Nastase. Topic-driven multi-document summarization with encyclopedic knowledge and spreading activation. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP-2008), pages 763–772, Honolulu, October 2008.
Sebastian Padó and Mirella Lapata. Dependency-based construction of semantic space models. Computational Linguistics, 33(2):161–199, 2007.
Patrick Pantel and Marco Pennacchiotti. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the Conference on Computational Linguistics / Association for Computational Linguistics (COLING/ACL-06), Sydney, Australia, 2006.
Patrick Pantel, Deepak Ravichandran, and Eduard Hovy. Towards terascale knowledge acquisition. In Proceedings of the Conference on Computational Linguistics (COLING-04), pages 771–777, Geneva, Switzerland, 2004.
M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
S. Preece. A Spreading Activation Model for Information Retrieval. PhD thesis, University of Illinois, Urbana, IL, 1981.
James Pustejovsky, Robert Knippen, Jessica Littman, and Roser Saurí. Temporal and event information in natural language text. Language Resources and Evaluation, 39(2-3):123–164, 2005.
M. Ross Quillian. The teachable language comprehender: A simulation program and theory of language. Communications of the ACM, 12(8):459–476, 1969.
Philip Resnik. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research, 11:95–130, 1999.
Stephen D. Richardson, William B. Dolan, and Lucy Vanderwende. MindNet: Acquiring and structuring semantic information from text. In Proceedings of COLING '98, 1998.
G. Salton and C. Buckley. On the use of spreading activation methods in automatic information retrieval. In SIGIR '88: Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 147–160, New York, NY, USA, 1988. ACM Press.
L. Schubert and M. Tong. Extracting and evaluating general world knowledge from the Brown corpus. In Proceedings of the HLT/NAACL 2003 Workshop on Text Mining, 2003.
Roger W. Schvaneveldt, editor. Pathfinder Associative Networks: Studies in Knowledge Organization. Ablex Publishing Corp., Norwood, NJ, USA, 1990. ISBN 0-89391-624-2.
Push Singh, Thomas Lin, Erik T. Mueller, Grace Lim, Travell Perkins, and Wan Li Zhu. Open Mind Common Sense: Knowledge acquisition from the general public. In Lecture Notes in Computer Science, volume 2519, pages 1223–1237. Springer Berlin / Heidelberg, 2002.
John F. Sowa. Semantic networks. In S. C. Shapiro, editor, Encyclopedia of Artificial Intelligence. Wiley-Interscience, New York, 2nd edition, 1992.
C. Spearman. The proof and measurement of association between two things. The American Journal of Psychology, 100(3-4):441–471, 1987.
Mark Steedman. The Syntactic Process. The MIT Press, Cambridge, MA, 2000.
Tom Stocky, Alexander Faaborg, and Henry Lieberman. A commonsense approach to predictive text entry. In Proceedings of the Conference on Human Factors in Computing Systems, Vienna, Austria, April 2004.
D. R. Swanson. Fish oil, Raynaud's syndrome, and undiscovered public knowledge. Perspectives in Biology and Medicine, 30(1):7–18, 1986.
E. F. Tjong Kim Sang and F. De Meulder. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In Walter Daelemans and Miles Osborne, editors, Proceedings of CoNLL-2003, pages 142–147, 2003.
Peter D. Turney. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Lecture Notes in Computer Science, 2001.
J. van Eijck. Discourse representation theory. In Encyclopedia of Language and Linguistics. Elsevier Science Ltd, 2nd edition, 2005.
J. van Eijck and H. Kamp. Representing discourse in context. In J. van Benthem and A. ter Meulen, editors, Handbook of Logic and Language. MIT Press, Cambridge, MA, USA, 1997.
Xiaojun Wan, Jianfeng Gao, Mu Li, and Binggong Ding. Person resolution in person search results: WebHawk. In CIKM '05: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pages 163–170, New York, NY, USA, 2005. ACM Press.
Huan Wang, Xing Jiang, Liang-Tien Chia, and Ah-Hwee Tan. Ontology enhanced web image retrieval: aided by Wikipedia & spreading activation theory. In MIR '08: Proceedings of the 1st ACM International Conference on Multimedia Information Retrieval, pages 195–201, New York, NY, USA, 2008. ACM.
WordNet. WNStats - WordNet 2.1 database statistics. http://wordnet.princeton.edu/. Viewed 25 July 2006.