PAC Learning Constraint Dependency Grammar Constraints
Mary P. Harper, Christopher M. White, Stephen A. Hockema, and Randall A. Helzerman
1285 Electrical Engineering Building, School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907-1285
Abstract
Constraint Dependency Grammar (CDG) [11, 13] is a constraint-based grammatical formalism that has proven effective for processing English [5] and improving the accuracy of spoken language understanding systems [4]. However, prospective users of CDG face a steep learning curve when trying to master this powerful formalism. Therefore, a recent trend in CDG research has been to try to ease the burden of grammar writers by developing methods for automatically learning CDG grammars from annotated sentences [22, 23]. In this paper, we prove that CDG grammar constraints are PAC learnable.
1 Introduction
Constraint Dependency Grammar (CDG) [11, 13] is a constraint-based grammatical formalism that has proven effective for processing English [5] and improving the accuracy of spoken language understanding systems [4]. However, prospective users of CDG face a steep learning curve when trying to master a new grammatical formalism. Therefore, a recent trend in CDG research has been to try to ease the burden of grammar writers by developing methods for automatically learning CDG constraints from annotated sentences [22, 23]. In order to demonstrate that CDG grammar constraints can be effectively learned in polynomial time, we will prove that CDG constraints are PAC learnable from positive training examples. In effect, so long as we have a corpus of sentences that are labeled with parse relations between words, grammar constraints that cover the language represented by the sentences can be learned in polynomial time.
This paper first introduces Constraint Dependency Grammar in section 2. This description of CDG is necessarily brief; for additional information on this topic, please refer to papers written by several CDG researchers [4, 5, 11, 12, 13, 14, 15]. Section 3 describes the concept of abstract role value tuples as an alternative way of representing CDG constraints. Finally, in section 4, the PAC-learning model is defined, the PAC-learnability of some related target concepts is described, and finally abstract role value tuples are shown to be PAC-learnable.
This material is based upon work supported by a grant from the Intel Research Council and the National Science Foundation under Grant No. IRI-9704358. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Intel or the National Science Foundation.
2 CDG and Parsing
In this section, we will first define CDG and then give an example of a simple CDG that will be used to illustrate the parsing algorithm.
2.1 CDG Definitions
Constraint Dependency Grammar (CDG), introduced by Maruyama [11, 12, 13], uses constraints rather than production rules for parsing. The parsing algorithm is framed as a constraint satisfaction problem; the parsing rules are the constraints and the solutions are the parses. A CDG is defined as a four-tuple, $\langle \Sigma, R, L, C \rangle$, where:
$\Sigma$ = a finite set of preterminal symbols, or lexical categories = $\{\sigma_1, \ldots, \sigma_c\}$.
$R$ = a finite set of uniquely named roles (or role-ids) = $\{r_1, \ldots, r_p\}$.
$L$ = a finite set of labels = $\{l_1, \ldots, l_q\}$.
$C$ = a constraint formula.
A sentence $s = w_1 w_2 w_3 \ldots w_n$ is a string of finite length $n$ and is an element of $\Sigma^*$. For each word $w_i \in \Sigma$ of a sentence $s$, there are $p$ different roles, yielding $n \times p$ roles for the entire sentence. A role is a variable that is assigned a role value, which is an element of the set $L \times \{1, 2, \ldots, n\}$. Role values will be denoted in the parsing example as l-m, where $l \in L$ and $m \in \{1, 2, \ldots, n\}$ is called the modifiee. Each label in $L$ indicates a different syntactic function, and $m$ specifies the position that the role value's word is modifying when it takes on the function specified by the label $l$ (e.g., subj-3). Maruyama originally used a modifiee of NIL to indicate that a role value does not require a modifiee. In this paper, we indicate that a role value does not require a modifiee by having the role value's modifiee set equal to its position. This simplifies both the description of CDG and the subsequent discussion on PAC learning.
Once a role value is assigned to a role, it is expanded to include, in addition to its label and modifiee, an associated lexical category, the role name for the role to which it is assigned, and the position of the word in the sentence to which it is assigned. An alternative characterization of the problem of assigning role values to roles is to view a role value as an element of the set $\Sigma \times R \times POS \times L \times MOD$, where $POS = \{1, 2, \ldots, n\}$ is the set of all possible positions a role value's word can have, $MOD = \{1, 2, \ldots, n\}$ is the set of modifiees that a role value can have, and $n$ is the sentence length. Once a role value is assigned to a role, the element of $\Sigma$, the element of $R$, and the role value's position are determined.
The sentence $s$ is said to be generated by the grammar $G$ if there exists an assignment $A$ that maps a role value to each of the $n \times p$ roles for $s$ such that $C$ is satisfied. There may be more than one assignment of role values to the roles of a sentence which satisfies $C$, in which case there is more than one parse for the sentence. $L(G)$ is the language generated by grammar $G$ if and only if $L(G)$ is the set of all sentences generated by $G$.
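The expanded role value described above can be sketched as a small record type. This is only an illustration under our own naming assumptions: the class `RoleValue`, its field names, and the helper `has_modifiee` are hypothetical and are not part of the CDG formalism itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleValue:
    """One element of Sigma x R x POS x L x MOD (hypothetical field names)."""
    cat: str   # lexical category of the word (element of Sigma)
    rid: str   # name of the role the value is assigned to (element of R)
    pos: int   # 1-based position of the word in the sentence
    lab: str   # label (element of L)
    mod: int   # position of the modifiee

    def has_modifiee(self) -> bool:
        # Convention used in this paper: mod == pos replaces Maruyama's NIL.
        return self.mod != self.pos

    def __str__(self) -> str:
        # The l-m notation used in the parsing example.
        return f"{self.lab}-{self.mod}"


subj = RoleValue(cat="noun", rid="governor", pos=2, lab="SUBJ", mod=3)
root = RoleValue(cat="verb", rid="governor", pos=3, lab="ROOT", mod=3)
print(subj, subj.has_modifiee())  # SUBJ-3 True
print(root, root.has_modifiee())  # ROOT-3 False
```

Making the record immutable lets role values be stored in sets and used as dictionary keys, which is convenient when they index arc matrices.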
Note that the empty string has no roles and is always generated by any grammar according to the definition. $C$ is a logical formula of the form:

$\forall x_1 : \mathrm{role}(x_1),\ \forall x_2 : \mathrm{role}(x_2) \wedge (x_2 \neq x_1),\ \ldots,$
$\forall x_a : \mathrm{role}(x_a) \wedge (x_a \neq x_1) \wedge (x_a \neq x_2) \wedge \ldots \wedge (x_a \neq x_{a-1})$
$(P_1 \wedge P_2 \wedge \ldots \wedge P_m)$

Each $P_i$ has the form: IF Antecedent THEN Consequent, where Antecedent and Consequent are predicates involving comparisons such as = and <, or predicates joined by the logical connectives and, or, or not. This is a first-order predicate calculus formula over all roles that requires that an assignment of role values to roles be consistent with the formula. Those role values that are inconsistent with $C$ can be eliminated.
Several access functions are defined for accessing the needed grammatical information associated with a role value in order to test it for grammaticality in $C$; these include: (pos x), which returns the position of the word for the role value assigned to x; (rid x), which returns the role name of x to which the role value is assigned; (lab x), which returns the label of the role value assigned to x; (mod x), which returns the position of the modifiee for the role value assigned to x; and (cat x), which returns the category of the role value assigned to x. The constants allowed in $C$ include elements and subsets of $\Sigma \cup R \cup L$. Maruyama [11] originally allowed $C$ to also contain as constants the numbers corresponding to the position of a word; however, exact position information is not required, nor is it very useful for describing legal grammatical relations in a grammar (see footnote 1).
A subformula $P_i$ may contain up to $a$ variables based on the above description. A subformula is called a unary constraint if it contains only a single variable (by convention, we use $x_1$) and a binary constraint if it contains two variables (by convention, $x_1$ and $x_2$). The value of $a$, the maximum number of variables in the subformulas of $C$, is called the arity parameter for a CDG grammar. A CDG also has a degree parameter, which is the number of roles in the grammar.
The set of languages accepted by CDG grammars is a superset of the set of languages that can be accepted by context-free grammars. In fact, Maruyama [11, 12] proves that any arbitrary CFG converted to Greibach normal form can be converted into a CDG grammar with a degree of two and an arity of two that accepts the same language as the CFG. In addition, CDG can accept languages that CFGs cannot, for example, $a^n b^n c^n$ (where a, b, and c are terminal symbols) and $ww$ (where w is some string of terminal symbols).
Footnote 1: The only reference to a position in a sentence found in any published CDG grammar [11] is to the word in the first position of a sentence. However, the concept that something can be in the first position if it can be before all of the other words is easily captured by constraints that make no reference to position 1.
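To make the IF-THEN form of a subformula concrete, the following minimal sketch writes the five access functions as Python functions and encodes one unary constraint as a predicate. The tuple layout, the helper `implies`, and the function name `det_constraint` are our own assumptions for illustration; the constraint itself is the first unary constraint of the example grammar in Section 2.2.

```python
from collections import namedtuple

# Hypothetical layout for an expanded role value.
RoleValue = namedtuple("RoleValue", "cat rid pos lab mod")


# Access functions, mirroring (cat x), (rid x), (pos x), (lab x), (mod x).
def cat(x): return x.cat
def rid(x): return x.rid
def pos(x): return x.pos
def lab(x): return x.lab
def mod(x): return x.mod


def implies(antecedent, consequent):
    """IF antecedent THEN consequent, as material implication."""
    return (not antecedent) or consequent


def det_constraint(x1):
    """A determiner's governor role must be labeled DET and modify a word to its right."""
    return implies(
        cat(x1) == "det" and rid(x1) == "governor",
        lab(x1) == "DET" and pos(x1) < mod(x1),
    )


print(det_constraint(RoleValue("det", "governor", 1, "DET", 2)))    # True
print(det_constraint(RoleValue("det", "governor", 1, "DET", 1)))    # False
print(det_constraint(RoleValue("noun", "governor", 2, "SUBJ", 3)))  # True (antecedent is false)
```

A role value for which the antecedent is false satisfies the constraint vacuously, which is why the noun role value passes a constraint written for determiners.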
2.2 CDG Parsing Example
Figure 1 depicts the process of parsing the sentence The dog eats, using the following grammar: $\Sigma$ = {noun, verb, det}, $R$ = {governor (depicted as G)}, $L$ = {DET, SUBJ, ROOT}, and $C = \forall x_1 : \mathrm{role}(x_1),\ \forall x_2 : \mathrm{role}(x_2) \wedge x_2 \neq x_1 \;\; F$, where $F$ is the conjunction of the constraints in Figure 1. This grammar has a degree of one and an arity of two. We will demonstrate the parsing process in five steps, although some of the steps can be merged. We will also describe how parses can be extracted at the end of the parsing process.
Step 1: Initializing the Network. Initially, each of the roles of a word is assigned the set of all possible role values. Since the sentence has three words, the possible labels and modifiees for the role values assigned to each role for each word category in the sentence would be {DET, SUBJ, ROOT} $\times$ {1, 2, 3} = {DET-1, DET-2, DET-3, SUBJ-1, SUBJ-2, SUBJ-3, ROOT-1, ROOT-2, ROOT-3}. At the end of parsing, the role values that can be assigned to the roles given the constraints will represent the parses for the sentence. The first constraint network in Figure 1 shows the state of the parse after all of the role values are generated.
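Step 1 can be sketched in a few lines of Python. The function name `initial_network` and the dictionary representation (word position mapped to a set of label and modifiee pairs) are assumptions made for this illustration; because the example grammar has a single role, the role name is left implicit.

```python
from itertools import product

LABELS = ["DET", "SUBJ", "ROOT"]
SENTENCE = [("The", "det"), ("dog", "noun"), ("eats", "verb")]


def initial_network(sentence, labels):
    """Assign every (label, modifiee) pair to the governor role of each word."""
    n = len(sentence)
    return {
        position: set(product(labels, range(1, n + 1)))
        for position in range(1, n + 1)
    }


network = initial_network(SENTENCE, LABELS)
for position, values in network.items():
    word = SENTENCE[position - 1][0]
    print(word, sorted(f"{label}-{modifiee}" for label, modifiee in values))
```

Each of the three roles receives the same nine role values, matching the first constraint network in Figure 1.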
Step 2: Propagating Unary Constraints and Enforcing Node Consistency. Next, unary constraints are applied to the role values assigned to the roles. Unary constraints test whether the role values are legal independently of the other role values in the sentence. Any role value that makes a unary constraint false cannot satisfy $C$; hence, it is eliminated from its role because it can never participate in an assignment $A$. Inconsistent role values are deleted to re-establish the network's node consistency with the constraints. For example, in the first constraint network depicted in Figure 1, the only role values for the governor role of The that satisfy the first constraint are DET-2 and DET-3. The second constraint network in Figure 1 depicts the state of the parse after applying all of the unary constraints and enforcing node consistency. Unary constraint propagation can be merged with role value generation in order to create only those role values that are consistent with the unary constraints. Initial network construction together with unary constraint propagation requires a worst-case running time of $O(n^2)$, where $n$ is the number of words in the sentence.
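The merged form of Steps 1 and 2, in which only role values consistent with the unary constraints are created, might look like the following sketch. The three lambdas transcribe the unary constraints listed in Figure 1; the test on the role name is omitted because the example grammar has only the governor role. All names here (`UNARY_CONSTRAINTS`, `node_consistent_network`) are hypothetical.

```python
from itertools import product


def implies(antecedent, consequent):
    return (not antecedent) or consequent


# Unary constraints 1-3 of Figure 1, as predicates over one role value.
UNARY_CONSTRAINTS = [
    lambda cat, pos, lab, mod: implies(cat == "det", lab == "DET" and pos < mod),
    lambda cat, pos, lab, mod: implies(cat == "noun", lab == "SUBJ" and pos < mod),
    lambda cat, pos, lab, mod: implies(cat == "verb", lab == "ROOT" and mod == pos),
]


def node_consistent_network(cats, labels):
    """Generate only the role values that satisfy every unary constraint."""
    n = len(cats)
    network = {}
    for pos, cat in enumerate(cats, start=1):
        network[pos] = {
            f"{lab}-{mod}"
            for lab, mod in product(labels, range(1, n + 1))
            if all(c(cat, pos, lab, mod) for c in UNARY_CONSTRAINTS)
        }
    return network


network = node_consistent_network(["det", "noun", "verb"], ["DET", "SUBJ", "ROOT"])
print(network[1] == {"DET-2", "DET-3"}, network[2], network[3])
# True {'SUBJ-3'} {'ROOT-3'}
```

The loop visits each of the n roles once and tests a number of candidate role values proportional to n, which is where the quadratic bound comes from when the numbers of labels and constraints are treated as constants.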
Figure 1: Using constraints to parse the sentence: The dog eats.

Words, categories, and positions: The (det, 1), dog (noun, 2), eats (verb, 3). Each word has a single governor role, G.

After Step 1:
  The  G: {DET-1, DET-2, DET-3, SUBJ-1, SUBJ-2, SUBJ-3, ROOT-1, ROOT-2, ROOT-3}
  dog  G: {DET-1, DET-2, DET-3, SUBJ-1, SUBJ-2, SUBJ-3, ROOT-1, ROOT-2, ROOT-3}
  eats G: {DET-1, DET-2, DET-3, SUBJ-1, SUBJ-2, SUBJ-3, ROOT-1, ROOT-2, ROOT-3}

Unary Constraints:
  1. (if (and (= (cat x1) det) (= (rid x1) governor)) (and (= (lab x1) DET) (< (pos x1) (mod x1))))
  2. (if (and (= (cat x1) noun) (= (rid x1) governor)) (and (= (lab x1) SUBJ) (< (pos x1) (mod x1))))
  3. (if (and (= (cat x1) verb) (= (rid x1) governor)) (and (= (lab x1) ROOT) (= (mod x1) (pos x1))))

After Steps 2 and 3 (apply unary constraints, enforce node consistency, and build arcs):
  The  G: {DET-2, DET-3}
  dog  G: {SUBJ-3}
  eats G: {ROOT-3}
  Arc matrix The/dog:  (DET-2, SUBJ-3) = 1, (DET-3, SUBJ-3) = 1
  Arc matrix The/eats: (DET-2, ROOT-3) = 1, (DET-3, ROOT-3) = 1
  Arc matrix dog/eats: (SUBJ-3, ROOT-3) = 1

Binary Constraint:
  (if (and (= (lab x1) DET) (= (rid x1) governor) (= (mod x1) (pos x2)) (= (rid x2) governor)) (= (cat x2) noun))

After Step 4 (apply binary constraints):
  The  G: {DET-2, DET-3}
  dog  G: {SUBJ-3}
  eats G: {ROOT-3}
  Arc matrix The/dog:  (DET-2, SUBJ-3) = 1, (DET-3, SUBJ-3) = 1
  Arc matrix The/eats: (DET-2, ROOT-3) = 1, (DET-3, ROOT-3) = 0
  Arc matrix dog/eats: (SUBJ-3, ROOT-3) = 1

After Step 5 (enforce arc consistency):
  The  G: {DET-2}
  dog  G: {SUBJ-3}
  eats G: {ROOT-3}
Step 3: Preparing for Binary Constraint Propagation. The next step
is to prepare the network for binary constraints, which will then be used to determine those pairs of role values that can legally coexist. To keep track of pairs of role values, arcs connecting each role to all other roles in the network are constructed. Each arc has an associated arc matrix, whose row and column indices are the role values associated with the two roles it connects. The entries in an arc matrix can either be a 1 (indicating that the two role values indexing the entry are compatible) or a 0 (indicating that the two role values cannot simultaneously exist). Initially, all entries in a matrix are set to 1 as in the second constraint network of Figure 1.
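A minimal sketch of Step 3, assuming the node-consistent network of the running example and a dictionary-of-dictionaries representation for arc matrices (both the representation and the name `build_arcs` are our own choices for illustration):

```python
from itertools import combinations, product

# Node-consistent network after Step 2 (word position -> role values).
network = {1: ["DET-2", "DET-3"], 2: ["SUBJ-3"], 3: ["ROOT-3"]}


def build_arcs(network):
    """Create one arc per pair of roles; every matrix entry starts at 1."""
    return {
        (i, j): {(u, v): 1 for u, v in product(network[i], network[j])}
        for i, j in combinations(sorted(network), 2)
    }


arcs = build_arcs(network)
for roles, matrix in arcs.items():
    print(roles, matrix)
```

With three roles there are three arcs, and the matrices hold 2, 2, and 1 entries respectively, as in the second constraint network of Figure 1.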
Step 4: Executing Binary Constraint Propagation. Binary constraints
are then propagated to determine which pairs of role values can legally co-exist. If a pair of role values satisfies the antecedent of a binary constraint but does not satisfy the consequent, then the two role values tested cannot appear together in a sentence parse. The binary constraint shown in Figure 1 is applied to the pairs of role values indexing the entries in the matrices in the second constraint network. For example, when x1 = DET-3 for The and x2 = ROOT-3 for eats, the antecedent of the binary constraint is satisfied, but the consequent is not; hence, the role values are incompatible. This is indicated by replacing the entry of 1 with 0 as shown in the third constraint network of Figure 1.
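The propagation step can be sketched as follows for the running example. The binary constraint is the one shown in Figure 1, with the role-name tests dropped because the grammar has a single role; the tuple layout `RV` and the function names are hypothetical. Each pair is tested in both variable orders, since either role value may play the part of x1.

```python
from collections import namedtuple
from itertools import combinations, product

RV = namedtuple("RV", "cat pos lab mod")

# Node-consistent network after Step 2.
network = {
    1: [RV("det", 1, "DET", 2), RV("det", 1, "DET", 3)],
    2: [RV("noun", 2, "SUBJ", 3)],
    3: [RV("verb", 3, "ROOT", 3)],
}


def binary_constraint(x1, x2):
    """IF x1 is a DET that modifies x2's word THEN x2's word is a noun."""
    antecedent = x1.lab == "DET" and x1.mod == x2.pos
    return (not antecedent) or x2.cat == "noun"


def propagate_binary(network):
    """Fill each arc matrix with 1 for compatible pairs and 0 otherwise."""
    return {
        (i, j): {
            (u, v): int(binary_constraint(u, v) and binary_constraint(v, u))
            for u, v in product(network[i], network[j])
        }
        for i, j in combinations(sorted(network), 2)
    }


arcs = propagate_binary(network)
det3, root3 = network[1][1], network[3][0]
print(arcs[(1, 3)][(det3, root3)])  # 0
```

Only the entry for DET-3 and ROOT-3 becomes 0; every other pair either fails the antecedent or satisfies the consequent.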
Step 5: Enforcing Arc Consistency. Finally, arc consistency (or filtering) is enforced in order to eliminate role values that are inconsistent with all of the role values that can be assigned to another role in the parse of a sentence [10, 16, 17]. In this case, the role value is incompatible with all sentence parses and can therefore be deleted. The fourth constraint network in Figure 1 demonstrates the effect of enforcing arc consistency. Preparation for binary constraints, binary constraint propagation, and arc consistency can be merged together during parsing. The CDG parsing algorithm requires $O(n^4)$ time in the worst case to prepare for and propagate the binary constraints over pairs of role values and to perform arc consistency, where $n$ is the number of words in the sentence.
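Filtering can be illustrated with a simple fixed-point loop over the arc matrices of the running example. This sketch is a naive stand-in chosen for brevity; it is not claimed to be the arc consistency algorithm of the cited work, and the function name and data layout are assumptions.

```python
def enforce_arc_consistency(network, arcs):
    """Repeatedly delete role values that have no compatible partner on some arc."""
    network = {role: set(values) for role, values in network.items()}
    changed = True
    while changed:
        changed = False
        for (i, j), matrix in arcs.items():
            for a, b in ((i, j), (j, i)):
                for v in list(network[a]):
                    # Matrix keys are ordered (value of role i, value of role j).
                    supported = any(
                        matrix[(v, w) if a == i else (w, v)] for w in network[b]
                    )
                    if not supported:
                        network[a].discard(v)
                        changed = True
    return network


# State after Step 4 of the running example.
network = {1: {"DET-2", "DET-3"}, 2: {"SUBJ-3"}, 3: {"ROOT-3"}}
arcs = {
    (1, 2): {("DET-2", "SUBJ-3"): 1, ("DET-3", "SUBJ-3"): 1},
    (1, 3): {("DET-2", "ROOT-3"): 1, ("DET-3", "ROOT-3"): 0},
    (2, 3): {("SUBJ-3", "ROOT-3"): 1},
}

filtered = enforce_arc_consistency(network, arcs)
print(filtered)  # {1: {'DET-2'}, 2: {'SUBJ-3'}, 3: {'ROOT-3'}}
```

DET-3 is removed because its only entry on the arc to eats is 0; the loop then repeats once more to confirm that no further deletions follow from that removal.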
Extracting Parses. A parse for the sentence consists of an assignment of role values to roles such that the unary and binary constraints are satisfied for the assignment. A parse is extracted by searching through the roles and selecting role values that are compatible with all of the other role values assigned to the other roles. There is a single parse for our example sentence since there is only one role value per role in the sentence. Depending on the grammar and the sentence to be parsed, there can be more than one parse for a sentence; hence, there can be more than one assignment of role values to the roles of the sentence.
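One way to carry out this search is a depth-first enumeration that extends a partial assignment one role at a time. The sketch below is an assumption about how extraction could be implemented, not a description of the authors' implementation; it is run on the network as it stands after Step 4 to show that the search alone already rejects DET-3.

```python
def extract_parses(network, arcs):
    """Enumerate assignments in which every pair of chosen role values is compatible."""
    roles = sorted(network)

    def extend(partial):
        if len(partial) == len(roles):
            yield dict(zip(roles, partial))
            return
        role = roles[len(partial)]
        for value in sorted(network[role]):
            # Earlier roles have smaller keys, so arcs are indexed (earlier, role).
            if all(arcs[(r, role)][(u, value)] for r, u in zip(roles, partial)):
                yield from extend(partial + [value])

    return list(extend([]))


network = {1: {"DET-2", "DET-3"}, 2: {"SUBJ-3"}, 3: {"ROOT-3"}}
arcs = {
    (1, 2): {("DET-2", "SUBJ-3"): 1, ("DET-3", "SUBJ-3"): 1},
    (1, 3): {("DET-2", "ROOT-3"): 1, ("DET-3", "ROOT-3"): 0},
    (2, 3): {("SUBJ-3", "ROOT-3"): 1},
}

parses = extract_parses(network, arcs)
print(parses)  # [{1: 'DET-2', 2: 'SUBJ-3', 3: 'ROOT-3'}]
```

For an ambiguous sentence the same function would return several assignments, one per parse.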
Runtime Considerations for Arity. Although CDG supports any arity, as the arity of the grammar increases, so does the cost of propagating the constraints. For example, for a grammar to support ternary constraints ($a = 3$), there must be $\binom{n}{3}$ arcs created joining role triples, and constraints must be propagated over three-dimensional matrices, leading to a worst-case running time of $O(n^6)$. In general, practical CDGs are limited to $a = 2$, which provides sufficient expressivity to represent all context-free grammars, as well as some grammars that are context sensitive.
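The growth in cost with arity can be checked by counting matrix entries. The following is a back-of-the-envelope count, assuming that $p$ (roles per word) and $q$ (number of labels) are constants of the grammar and that each constraint test on one entry takes constant time; the general bound on the last line is our extrapolation from the binary and ternary cases.

```latex
\begin{align*}
\text{roles in the sentence} &= np, \qquad \text{role values per role} \le qn \\
\text{unary tests} &\le np \cdot qn = O(n^2) \\
\text{binary entries} &\le \binom{np}{2} (qn)^2 = O(n^2) \cdot O(n^2) = O(n^4) \\
\text{ternary entries} &\le \binom{np}{3} (qn)^3 = O(n^3) \cdot O(n^3) = O(n^6) \\
\text{arity-}a\text{ entries} &\le \binom{np}{a} (qn)^a = O(n^{2a})
\end{align*}
```

Each additional variable therefore multiplies the worst-case number of entries by a factor proportional to $n^2$: one factor of $n$ for the extra role and one for its candidate modifiees.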
3 Abstract Role Value Tuples
As discussed in the previous section, the grammaticality of a sentence in a language defined by a CDG, as well as its possible parses, is determined by the constraints of the grammar. If the set of all possible role values assigned to the roles of a sentence of length $n$ is denoted $S_1 = \Sigma \times R \times \{1, 2, \ldots, n\} \times L \times \{1, 2, \ldots, n\}$, where $n$ can be any arbitrary natural number greater than 0, unary constraints partition $S_1$ into grammatical and ungrammatical role values. Similarly, binary constraints partition the set $S_2 = S_1 \times S_1 = S_1^2$ into compatible and incompatible pairs. In general, constraints with arity $a$ partition $S_a = S_1^a$ into compatible and incompatible $a$-tuples of role values. For this paper, we will focus on $a = 2$; however, this discussion can be easily generalized for any constant arity.
An alternative way of representing the unary and binary constraints of a grammar would be as a set of grammatical role values and compatible pairs of role values. Unfortunately, the sets $S_1$ and $S_2$ are infinite because the position associated with a role value in any arbitrary sentence, as well as the position of its modifiee, could be any natural number (unless we limit the power of CDG). Fortunately, it is possible to construct another view of role values given that constraints in a CDG do not need to use the exact position of a word or a modifiee in the sentence to parse sentences [4, 5, 11, 13, 14, 15]; they only need to test the relative positions between role values and their modifiees. Recall, for example, the first constraint in our parsing example:

  (if (and (= (cat x1) det) (= (rid x1) governor))
      (and (= (lab x1) DET) (< (pos x1) (mod x1))))

This unary constraint simply tests the relative position of the role value and the thing it is modifying. Similarly, binary constraints test for the relative positions of two role values and their modifiees.
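The observation that only relative position matters can be illustrated with a small sketch that replaces the exact position and modifiee of a role value by the relation between them. The function names and the tuple layout are hypothetical, and this is only an informal illustration of the idea, not the formal definition of abstract role value tuples.

```python
def relation(a, b):
    """Relative order of two positions."""
    return "<" if a < b else ">" if a > b else "="


def abstract_unary(cat, rid, pos, lab, mod):
    """Keep category, role, and label; replace (pos, mod) by their relation."""
    return (cat, rid, lab, relation(pos, mod))


# Role values from sentences of very different lengths collapse to one tuple.
short = abstract_unary("det", "governor", 1, "DET", 2)
long_ = abstract_unary("det", "governor", 57, "DET", 60)
print(short, short == long_)  # ('det', 'governor', 'DET', '<') True
```

Under this view the number of distinct tuples is at most $3 \cdot |\Sigma| \cdot |R| \cdot |L|$, which is finite even though the set of concrete role values is not.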