Polynomial Time Algorithms for Finding Unordered Tree Patterns with Internal Variables

Takayoshi Shoudai¹, Tomoyuki Uchida², and Tetsuhiro Miyahara²

¹ Department of Informatics, Kyushu University, Kasuga 816-8580, Japan
[email protected]
² Faculty of Information Sciences, Hiroshima City University, Hiroshima 731-3194, Japan
{uchida@cs,miyahara@its}.hiroshima-cu.ac.jp
Abstract. Many documents such as Web documents or XML files have tree structures. A term tree is an unordered tree pattern consisting of internal variables and tree structures. In order to extract meaningful and hidden knowledge from such tree structured documents, we consider a minimal language (MINL) problem for term trees. The MINL problem for term trees is to find a term tree t such that the language generated by t is minimal among languages, generated by term trees, which contain all given tree structured data. Firstly, we show that the MINL problem for regular term trees is computable in polynomial time if the number of edge labels is infinite. Next, we show that the MINL problems with optimizing the size of an output term tree are NP-complete. Finally, in order to show that our polynomial time algorithm for the MINL problem can be applied to data mining from real-world Web documents, we show that regular term tree languages are polynomial time inductively inferable from positive data if the number of edge labels is infinite.
1 Introduction
Many documents such as Web documents or XML files have tree structures. In order to extract meaningful and hidden knowledge from such documents, we need tree structured patterns which can explain them. As tree structured patterns, a tree pattern [1, 3, 4], a type of objects [9] and a tree-expression pattern [12] were proposed. In [5, 11], we presented the concept of term trees as a graph pattern suited for representing unordered tree structured data. A term tree is an unordered tree pattern which consists of internal variables and tree structures. A term tree differs from the other representations proposed in [1, 3, 4, 9, 12] in that it has internal structured variables which can be substituted by arbitrary trees. A term tree t is said to be regular if the labels of all variables are different. In [7], we proved that the matching problem for an extended regular term tree and a standard tree is NP-complete. However, in [6], we showed that the matching problem for a regular term tree whose variables consist of two vertices is computable in polynomial time. Therefore, in this paper, we consider only such regular term trees.

Since a variable can be replaced by any tree, overgeneralized patterns explaining all given data are meaningless. The purpose of this work is therefore to find one of the least generalized term trees explaining tree structured data. In Fig. 1, we give examples of term trees t and t′ which can explain all trees T_1, T_2 and T_3. The term tree t is an overgeneralized term tree which is meaningless, whereas the term tree t′ is one of the least generalized term trees.

Fig. 1. A term tree t and a term tree t′ as an overgeneralized term tree and one of the least generalized term trees explaining T_1, T_2 and T_3, respectively. A variable is represented by a box with lines to its elements. The label of a box is the variable label of the variable.

The concept represented by a term tree, which is called a term tree language, is the set of all trees obtained by replacing all variables by arbitrary trees. The minimal language (MINL) problem is the problem of finding a term tree whose term tree language is minimal among term tree languages containing all given trees. In [5, 11], we showed that the MINL problem is computable in polynomial time for a special type of regular term trees called regular term caterpillars. In this paper, we show that the MINL problem for regular term trees is computable in polynomial time if the number of edge labels is infinite. Moreover, we consider the following two problems. Firstly, MINL with Variable-size Minimization is the problem of finding a term tree t such that the term tree language of t is minimal and the number of variables in t is minimum. Secondly, MINL with Tree-size Maximization is the problem of finding a term tree t such that the term tree language of t is minimal and the number of vertices in t is maximum. We prove that both problems are NP-complete. These results show the hardness of finding the optimum regular term tree representing all given data. In order to show that our polynomial time algorithm for the MINL problem can be applied to data mining from real-world Web documents, we show that regular term tree languages are polynomial time inductively inferable from positive data if the number of edge labels is infinite.
In [8], we proposed a tag tree pattern, which is a special type of regular term tree suited for expressing structures of XML documents, and presented an algorithm for generating all maximally frequent tag tree patterns. The results of this paper give a theoretical foundation for the result in [8].

This paper is organized as follows. In Section 2, we give the notion of a term tree as a tree structured pattern and formally define the MINL problem for regular term trees. In Section 3, we give a polynomial time algorithm solving the MINL problem for regular term trees if the number of edge labels is infinite. In Section 4, we show the hardness of finding the optimal term tree which explains all given data. In Section 5, we show that regular term tree languages are polynomial time inductively inferable from positive data.
2 Preliminaries – Term Trees as Tree Structured Patterns
Let T = (V_T, E_T) be a rooted unordered tree (or simply a tree) with a set V_T of vertices, a set E_T of edges, and an edge labeling. A variable in T is a list [u, u′] of two distinct vertices u and u′ in V_T. A label of a variable is called a variable label. Λ and X denote a set of edge labels and a set of variable labels, respectively, where Λ ∩ X = ∅. For a set S, the number of elements in S is denoted by |S|.

Definition 1. A triplet g = (V_g, E_g, H_g) is called a rooted term tree (or simply a term tree) if H_g is a finite set of variables such that for any [u, u′] ∈ H_g, [u′, u] is not in H_g, and the graph (V_g, E_g ∪ E′_g) is a tree where E′_g = {{u, v} | [u, v] ∈ H_g}. A term tree g is called regular if all variables in H_g have mutually distinct variable labels in X. In particular, a term tree with no variable is called a ground term tree and considered to be a standard tree. RTT and GTT denote the set of all regular term trees and the set of all ground term trees, respectively.

For a term tree f and its vertices v_1 and v_i, a path from v_1 to v_i is a sequence v_1, v_2, ..., v_i of distinct vertices of f such that for 1 ≤ j < i, there exists an edge or a variable which consists of v_j and v_{j+1}. If there is an edge or a variable which consists of v and v′ such that v lies on the path from the root of f to v′, then v is said to be the parent of v′ and v′ is a child of v. Without loss of generality, we assume that v′ is a child of v if [v, v′] is a variable in f.

Let f = (V_f, E_f, H_f) and g = (V_g, E_g, H_g) be regular term trees. We say that f and g are isomorphic, denoted by f ≡ g, if there is a bijection ϕ from V_f to V_g such that (i) the root of f is mapped to the root of g by ϕ, (ii) {u, v} ∈ E_f if and only if {ϕ(u), ϕ(v)} ∈ E_g and the two edges have the same edge label, and (iii) [u, v] ∈ H_f if and only if [ϕ(u), ϕ(v)] ∈ H_g. Two isomorphic regular term trees are considered to be identical.

Let f and g be term trees with at least two vertices. Let σ = [u, u′] be a list consisting of the root u of g and another vertex u′ in g. The form x := [g, σ] is called a binding for x. A new term tree f{x := [g, σ]} is obtained by applying the binding x := [g, σ] to f in the following way: Let e_1 = [v_1, v′_1], ..., e_m = [v_m, v′_m] be the variables in f with the variable label x. Let g_1, ..., g_m be m copies of g and u_i, u′_i the vertices of g_i corresponding to u, u′ of g, respectively. For each variable e_i = [v_i, v′_i], we attach g_i to f by removing the variable e_i from H_f and by identifying the vertices v_i, v′_i with the vertices u_i, u′_i of g_i. We define the root of the resulting term tree as the root of f. A substitution θ is a finite collection of bindings {x_1 := [g_1, σ_1], ..., x_n := [g_n, σ_n]}, where the x_i's are mutually distinct variable labels in X. The term tree fθ, called the instance of f by θ, is obtained by applying all the bindings x_i := [g_i, σ_i] to f simultaneously. For term trees f and g, if there exists a substitution θ such that f ≡ gθ, we write f ⪯ g. In particular, we write f ≺ g if f ⪯ g and g ⋠ f. In Fig. 2 we give examples of a regular term tree, a substitution and an instance.
Fig. 2. Ground term trees t_1 and t_2, and an instance tθ which is obtained by applying a substitution θ = {x := [t_1, [v_1, v_2]], y := [t_2, [u_1, u_2]]} to the regular term tree t. A variable is represented by a box with lines to its elements. The label of a box is the variable label of the variable.
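The binding operation defined above can be illustrated with a minimal Python sketch. The representation is an assumption made only for this illustration and is not taken from the paper: edges and variables are stored as dictionaries keyed by (parent, child) pairs, and the names TermTree and apply_binding are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class TermTree:
    """Hypothetical container: edges and variables are keyed by (parent, child)."""
    root: object
    edges: dict = field(default_factory=dict)      # (parent, child) -> edge label
    variables: dict = field(default_factory=dict)  # (parent, child) -> variable label

    def vertices(self):
        vs = {self.root}
        for a, b in list(self.edges) + list(self.variables):
            vs.update((a, b))
        return vs


_fresh = count()  # used to generate fresh vertex names for each copy of g


def apply_binding(f, x, g, sigma):
    """Return f{x := [g, sigma]} where sigma = (u, u2) and u is the root of g."""
    u, u2 = sigma
    if u != g.root or u2 == u or u2 not in g.vertices():
        raise ValueError("sigma must be (root of g, another vertex of g)")
    # Keep all edges of f and all variables of f that are not labeled x.
    kept = {k: lab for k, lab in f.variables.items() if lab != x}
    result = TermTree(f.root, dict(f.edges), kept)
    for (v, v2), lab in f.variables.items():
        if lab != x:
            continue
        tag = next(_fresh)
        # Copy g with fresh vertex names, then identify u with v and u2 with v2.
        rename = {w: ("copy", tag, w) for w in g.vertices()}
        rename[u], rename[u2] = v, v2
        for (a, b), l in g.edges.items():
            result.edges[(rename[a], rename[b])] = l
        for (a, b), l in g.variables.items():
            result.variables[(rename[a], rename[b])] = l
    return result


# Usage: f has one variable labeled x; g is a path u - w - u2 with labels p, q.
f = TermTree("r", {("a", "b"): "k"}, {("r", "a"): "x"})
g = TermTree("u", {("u", "w"): "p", ("w", "u2"): "q"})
t = apply_binding(f, "x", g, ("u", "u2"))
```

In this usage example the variable [r, a] is replaced by a copy of g, so the instance t is a ground term tree with three labeled edges. A substitution with several bindings would, under the same assumptions, apply one such step per variable label.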
For a (regular) term tree g, the (regular) term tree language L(g) of g is defined as L(g) = {h ∈ GTT | h ≡ gθ for a substitution θ}. The class RTTL of all regular term tree languages is defined as RTTL = {L(g) | g ∈ RTT}. Let S be a nonempty finite subset of GTT and r be a regular term tree. A regular term tree language L(r) is minimal for (S, RTT) if (i) S ⊆ L(r) and (ii) L(s) ⊊ L(r) implies S ⊈ L(s) for any s ∈ RTT.

Minimal Language (MINL) Problem for RTT
Instance: A nonempty finite subset S of GTT.
Question: Find a regular term tree r such that L(r) is minimal for (S, RTT).
3 Polynomial Time Algorithm for Finding Minimal Regular Term Languages
Since solving the minimal language problem for RTT is essential to the learnability of regular term tree languages, we give the following theorem.

Theorem 1. The minimal language problem for RTT with infinite edge labels is computable in polynomial time.

Proof. We show that the procedure MINL(S) (Fig. 3) works correctly for finding a regular term tree r such that the language L(r) is minimal for (S, RTT).

Procedure MINL(S);
Input: a nonempty finite set S of ground term trees.
Output: a regular term tree g such that the language L(g) is minimal for (S, RTT).
begin
    g := Basic-Tree(S);
    foreach variable [u, v] ∈ H_g do
        foreach edge label c which appears in S do
        begin
            let g′ be a term tree which is obtained from g by replacing variable [u, v] with an edge labeled with c;
            if S ⊆ L(g′) then begin g := g′; break end
        end
end;

Procedure Basic-Tree(S);
begin
    // Each variable is assumed to be labeled with a distinct variable label.
    d := 0; g := ({r}, ∅, ∅);
    g := breadth-expansion(r, g, S);
    max-depth := the maximum depth of the trees in S;
    d := d + 1;
    while d ≤ max-depth − 1 do
    begin
        v := a vertex at depth d which is not yet visited;
        g := breadth-expansion(v, g, S);
        while there exists a sibling of v which is not yet visited do
        begin
            Let v′ be a sibling of v which is not yet visited;
            g := breadth-expansion(v′, g, S)
        end;
        d := d + 1
    end;
    return g
end;

Procedure breadth-expansion(v, g, S);
begin
    g′ := depth-expansion(v, g, S);
    while g ≠ g′ do
    begin
        g := g′;
        g′ := depth-expansion(v, g, S)
    end;
    return g
end;

Procedure depth-expansion(v, g, S);
begin
    Let g be (V_g, ∅, H_g);
    Let v′ be a new vertex and [v, v′] a new variable;
    g′ := (V_g ∪ {v′}, ∅, H_g ∪ {[v, v′]});
    while S ⊆ L(g′) do
    begin
        g := g′; v := v′;
        Let v′ be a new vertex and [v, v′] a new variable;
        g′ := (V_g ∪ {v′}, ∅, H_g ∪ {[v, v′]})
    end;
    return g
end;

Fig. 3. MINL(S): An algorithm for finding a regular term tree r such that the language L(r) is minimal for (S, RTT).

Let h = (V_h, E_h, H_h) and g = (V_g, E_g, H_g) be term trees. We write h ≈ g if there exists a bijection ξ : V_h → V_g such that for u, v ∈ V_h, {u, v} ∈ E_h or [u, v] ∈ H_h if and only if {ξ(u), ξ(v)} ∈ E_g or [ξ(u), ξ(v)] ∈ H_g. For RTT with infinite edge labels, the following two claims hold:

Claim 1. For any g, h ∈ RTT with h ≈ g, h ⪯ g if and only if L(h) ⊆ L(g).

Proof of Claim 1. If h ⪯ g, we have L(h) ⊆ L(g) straightforwardly. We show that h ⪯ g if L(h) ⊆ L(g). Since h ≈ g, there exists a bijection ξ such that for u, v ∈ V_h, {u, v} ∈ E_h or [u, v] ∈ H_h if and only if {ξ(u), ξ(v)} ∈ E_g or [ξ(u), ξ(v)] ∈ H_g. If there does not exist such a bijection ξ for which [ξ(u), ξ(v)] is in H_g for all [u, v] ∈ H_h, then the ground term tree which is obtained by replacing all variables in h with edges carrying an edge label which does not appear in g (such a label exists since the number of edge labels is infinite) is in L(h) but not in L(g), which contradicts L(h) ⊆ L(g). Hence [ξ(u), ξ(v)] ∈ H_g for all [u, v] ∈ H_h. Therefore h ⪯ g. (End of Proof of Claim 1)

By Claim 1 we can replace the inclusion relation ⊆ on RTTL with the relation ⪯ on RTT.

Claim 2. Let g = (V_g, E_g, H_g) be an output regular term tree of the procedure MINL, given a nonempty finite set S of ground term trees. If there exists a regular term tree h with S ⊆ L(h) ⊆ L(g), then h ≈ g.

Proof of Claim 2. Let g′ be the regular term tree which is obtained by Basic-Tree(S). Then L(h) ⊆ L(g) ⊆ L(g′) and g ≈ g′. Let h′ be the regular term tree which is obtained by replacing all edges in h with variables. Then h ≈ h′ and L(h) ⊆ L(h′). Let θ be a substitution which realizes h ≡ g′θ and θ′ a substitution which is obtained by replacing all edges appearing in θ with variables. Then h′ ≡ g′θ′. Since S ⊆ L(h) ⊆ L(h′), S ⊆ L(g′θ′). Since Basic-Tree(S) generates a regular term tree whose language is minimal for (S, {g ∈ RTT | g has no edge}), g′θ′ ≡ g′. Therefore h′ ≡ g′, and hence h ≈ g. (End of Proof of Claim 2)

Suppose that there exists a regular term tree h = (V_h, E_h, H_h) such that S ⊆ L(h) ⊊ L(g). From the above two claims, we obtain h ≺ g and h ≈ g. There exist a variable [u, v] ∈ H_g labeled with x and an edge label a such that h ⪯ g{x := [T_a, [u′, v′]]} where T_a is a tree consisting of an edge {u′, v′} labeled with a. We denote by f the regular term tree which is obtained in the procedure MINL before trying to replace the variable [u, v] with an edge labeled with a. Then we have that S ⊈ L(f{x := [T_a, [u′, v′]]}) and there exists a substitution θ with g ≡ fθ. From h ⪯ g{x := [T_a, [u′, v′]]} ≡ fθ{x := [T_a, [u′, v′]]} ≡ f{x := [T_a, [u′, v′]]}θ, we have h ⪯ f{x := [T_a, [u′, v′]]}. Thus S ⊆ L(h) ⊆ L(f{x := [T_a, [u′, v′]]}). This contradicts S ⊈ L(f{x := [T_a, [u′, v′]]}).
In [6], we gave an algorithm for deciding whether or not a given ground term tree T is a member of L(g) for a given regular term tree g. The algorithm runs in O(n^2 N^3.5) time where n and N are the numbers of vertices in g and T, respectively. Let N_max and N_min be the maximum and minimum numbers of vertices of ground term trees in S, respectively. Since the final regular term tree g is no larger than the smallest ground term tree in S, the algorithm MINL(S) checks whether or not S ⊆ L(g) at most O(N_min) times. Therefore the algorithm runs in O(N_min^3 N_max^3.5 |S|) time. □
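The second phase of MINL(S), which tries to replace each variable of the basic tree by a labeled edge, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the test S ⊆ L(g′) is passed in as a hypothetical oracle named covers, and the toy oracle in the usage example only handles samples whose trees all have the same shape as the basic tree, so it does not stand in for the membership algorithm of [6].

```python
def refine(variables, edge_labels, covers):
    """Greedy variable-to-edge replacement, mirroring the main loop of MINL(S).

    variables   -- positions of the variables of the basic tree (hashable ids)
    edge_labels -- the edge labels which appear in the sample S
    covers      -- hypothetical oracle: covers(assignment) is True iff S ⊆ L(g'),
                   where assignment maps each position to an edge label,
                   or to None if the position is still a variable
    """
    assignment = {pos: None for pos in variables}
    for pos in list(assignment):
        for c in edge_labels:
            trial = dict(assignment)
            trial[pos] = c
            if covers(trial):      # S ⊆ L(g') still holds: keep the labeled edge
                assignment = trial
                break
    return assignment


# Toy usage (assumption: every sample tree has exactly the shape of the basic
# tree, so a sample is just a map from edge position to edge label).
sample = [{"e1": "a", "e2": "b"}, {"e1": "a", "e2": "c"}]


def toy_covers(assignment):
    return all(
        label is None or tree[pos] == label
        for tree in sample
        for pos, label in assignment.items()
    )


result = refine(["e1", "e2"], ["a", "b", "c"], toy_covers)
```

In the toy run, position e1 becomes an edge labeled a because both sample trees agree there, while e2 stays a variable. Each call to covers corresponds to one check of S ⊆ L(g′) in Fig. 3.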
4 Hardness Results of Finding Regular Term Trees of Minimal Language
In Section 3, we have given a polynomial time algorithm for finding a regular term tree of minimal language from a given sample. In this section, we discuss MINL problems with optimizing the size of an output regular term tree.

MINL with Variable-size Minimization
Instance: A nonempty finite subset S of GTT and a positive integer K.
Question: Is there a regular term tree t = (V, E, H) with |H| ≤ K such that L(t) is minimal for (S, RTT)?

Theorem 2. MINL with Variable-size Minimization is NP-complete.

Proof. Membership in NP is obvious. We transform 3-SAT to this problem. Let U = {x_1, ..., x_n} be a set of variables and C = {c_1, ..., c_m} be a collection of clauses over U such that each clause c_j (1 ≤ j ≤ m) has |c_j| = 3. We use symbols T, F, d_1, d_2, d_4, d_5, x_1, ..., x_n as edge labels. We construct trees T_1, ..., T_m from c_1, ..., c_m. Let P_0 be the tree which is described in Fig. 4. For a clause c_j, which contains x_{j1}, x_{j2}, x_{j3} as positive or negative literals, we have the 7 truth assignments to x_{j1}, x_{j2}, x_{j3} which satisfy the clause c_j. For the ith truth assignment (x_{j1}, x_{j2}, x_{j3}) = (b_{i1}, b_{i2}, b_{i3}) (i = 1, 2, ..., 7), we construct P_i by removing the branches which are labeled with ¬b_{i1}, ¬b_{i2}, ¬b_{i3} from the subtrees corresponding to x_{j1}, x_{j2}, x_{j3} of P_0, respectively. For example, for a clause {x_1, ¬x_2, x_3} the tree P_1 in Fig. 4 shows an assignment x_1 = true, x_2 = false, x_3 = true and the tree P_2 shows x_1 = true, x_2 = false, x_3 = false. We also construct two special trees T and T′ (Fig. 5). T and T′ have 7 subtrees like T_j. Only one subtree of them is distinct. Let S = {T_1, ..., T_m, T, T′} be a sample set. Lastly we set K = 7n.

Fig. 4. T_j for a clause c_j = {x_1, ¬x_2, x_3}. P_i (1 ≤ i ≤ 7) is constructed systematically by removing the appropriate 3 branches labeled with T or F from P_0. Only P_1, for (x_1, x_2, x_3) = (true, false, true), and P_2, for (x_1, x_2, x_3) = (true, false, false), are shown in this figure.

Fig. 5. Two special sample trees T and T′.

The depth of a vertex v is the length of the unique path from the root to v.

Fact 1. For any regular term tree g which is minimal for (S, RTT), (i) the root of g has exactly 7 children which connect to the root by edges labeled with d_1, (ii) each vertex of depth 1 of g has exactly n children which connect to the parent by edges labeled with d_2, and (iii) each vertex of depth 2 of g has exactly one child.

This fact means that any regular term tree g which is minimal for (S, RTT) has at least 7n variables, each of which is located in a subtree rooted at a vertex of depth 3.

Fact 2. Let g_1 and g_2 be trees described in Fig. 6. There are only three regular term trees which are minimal for ({g_1, g_2}, RTT). The three regular term trees are described in Fig. 6.

Fig. 6. The regular term trees t_1, t_2, t_3 which generate minimal languages for {g_1, g_2} where both g_1 and g_2 have the same edge label x_i.

Fig. 7. The regular term trees t′_1, t′_2, t′_3 which generate minimal languages for {g′_1, g′_2} where g′_1 and g′_2 have distinct edge labels x_i and x_j (i ≠ j), respectively.
Fig. 8. The output regular term tree when there is a truth assignment which satisfies C (Theorem 2).
Fact 3. Let g′_1 and g′_2 be trees described in Fig. 7. There are only three regular term trees which are minimal for ({g′_1, g′_2}, RTT). The three regular term trees are described in Fig. 7.

From the above facts, if there is a truth assignment which satisfies all clauses in C, there is a regular term tree t = (V, E, H) with |H| = 7n such that L(t) is minimal for (S, RTT) (Fig. 8). Conversely, if there is a regular term tree t = (V, E, H) with |H| = 7n such that L(t) is minimal for (S, RTT), the regular term tree is isomorphic to the one described in Fig. 8 when the edge labels T and F are ignored. For 1 ≤ i ≤ n, we assign true to x_i if the nearest edge label from x_i of type T, F is T, otherwise we assign false to x_i. Then this truth assignment satisfies C. □

Next we show that it is hard to compute the regular term tree of maximum tree-size which is minimal for (S, RTT) for a given sample set S.

MINL with Tree-size Maximization
Instance: A nonempty finite subset S of GTT and a positive integer K.
Question: Is there a regular term tree t = (V, E, H) with |V| ≥ K such that L(t) is minimal for (S, RTT)?

Theorem 3. MINL with Tree-size Maximization is NP-complete.

Proof. (Sketch) Membership in NP is obvious. We transform 3-SAT to this problem in a similar way to Theorem 2. We use only blank as an edge label. Each variable x_i (1 ≤ i ≤ n) is transformed into a tree of the form shown in Fig. 9. It is easy to see that the number of vertices of a regular term tree which matches the trees corresponding to x_i and x_j (i ≠ j) is at most 5(n + 1). We construct trees T_1, ..., T_m from c_1, ..., c_m in a similar way to Theorem 2, but we use subtrees P_1, ... of the form shown in Fig. 10 instead of the subtrees P_1, ... in Fig. 4. Moreover we construct one special tree T (Fig. 10). Let S = {T_1, ..., T_m, T} be a sample set of GTT. Lastly we set K = 35n^2 + 92n + 8. Then, we can compute a regular term tree t = (V, E, H) with |V| = K such that L(t) is minimal for (S, RTT) (Fig. 11) if and only if 3-SAT has a truth assignment which satisfies all clauses in C. □
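The reduction in the proof of Theorem 2 builds one subtree P_i for each of the 7 truth assignments that satisfy a 3-literal clause. The following Python sketch only enumerates those assignments; it does not construct the trees. The function name and the encoding of a literal as a (variable name, is_positive) pair are assumptions made for this illustration.

```python
from itertools import product


def satisfying_assignments(clause):
    """List the truth assignments to the clause's variables that satisfy it.

    clause -- list of (variable_name, is_positive) literals over distinct
              variables, e.g. [("x1", True), ("x2", False), ("x3", True)]
              for the clause {x1, not x2, x3}
    """
    names = [name for name, _ in clause]
    result = []
    for values in product([True, False], repeat=len(clause)):
        # A clause is satisfied when at least one literal evaluates to true.
        if any(value == positive for value, (_, positive) in zip(values, clause)):
            result.append(dict(zip(names, values)))
    return result


# Usage: the example clause {x1, not x2, x3} from Fig. 4.
assignments = satisfying_assignments([("x1", True), ("x2", False), ("x3", True)])
```

For a clause over three distinct variables, exactly one of the 8 assignments falsifies every literal, which is why 7 subtrees P_1, ..., P_7 are constructed per clause. The order in which this sketch lists the assignments is not meant to match the numbering of P_1, ..., P_7 in the paper.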
Fig. 9. Subtrees corresponding to x_1, ..., x_n.

Fig. 10. A subtree P_1 corresponding to the truth assignment (x_1, x_2, x_3) = (true, false, true) and a special sample tree T.

Fig. 11. The output regular term tree when there is a truth assignment which satisfies C (Theorem 3).
5 Polynomial Time Inductive Inference of Regular Term Tree Languages from Positive Data
In this section, a language is a subset of GTT. An indexed family of recursive languages (or simply a class) is a family of languages C = {L_1, L_2, ...} such that there is an effective procedure to decide whether w ∈ L_i or not, given a tree w and an index i. In our setting, an index is a term tree, and the language L_i with an index i is the term tree language L(i) of a term tree i. A class C is said to have finite thickness if for any nonempty finite set T ⊆ GTT, the cardinality of {L ∈ C | T ⊆ L} is finite.

An inductive inference machine (IIM, for short) is an effective procedure which requests inputs from time to time and produces indices as hypotheses from time to time. A positive presentation σ of a nonempty language L is an infinite sequence w_1, w_2, ... of trees in GTT such that {w_1, w_2, ...} = L. We denote by σ[n] the initial segment of σ of length n ≥ 0. For an IIM M and a finite sequence σ[n] = w_1, w_2, ..., w_n, we denote by M(σ[n]) the last hypothesis produced by M when it is successively presented w_1, w_2, ..., w_n on its input requests. An IIM M is said to converge to an index r for a positive presentation σ if there is an n ≥ 1 such that for any m ≥ n, M(σ[m]) is r.

Let C be a class and M be an IIM for C. M is said to infer C in the limit from positive data if for any L ∈ C and any positive presentation σ of L, M converges to an index r for σ such that L = L_r. A class C is said to be inferable in the limit from positive data if there is an IIM which infers C in the limit from positive data. A class C is said to be polynomial time inductively inferable from positive data if there exists an IIM for C which outputs hypotheses in polynomial time with respect to the length of the input data read so far, and infers C in the limit from positive data.

Theorem 4 ([2],[10]). Let C be a class. If C has finite thickness, and the membership problem and the minimal language problem for C are computable in polynomial time, then C is polynomial time inductively inferable from positive data.

The membership problem for RTTL is, given a ground term tree g and a regular term tree t, the problem of deciding whether or not g ∈ L(t). We gave a polynomial time algorithm for the membership problem for RTTL in [6]. Since the class RTTL has finite thickness [5], from Theorems 1 and 4 we have Theorem 5.

Theorem 5. The class RTTL with infinite edge labels is polynomial time inductively inferable from positive data.
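One common way to turn a membership test and a MINL procedure into an IIM, in the spirit of Theorem 4, is to keep the current hypothesis while it explains each new example and to recompute a minimal hypothesis otherwise. The Python sketch below illustrates that loop only. The names learn, member and minl are hypothetical, and the usage example replaces term trees by a toy class (the sets of positive multiples of k, indexed by k) in which a greatest common divisor plays the role of a minimal language; it is not the construction of [2] or [10].

```python
from functools import reduce
from math import gcd


def learn(presentation, member, minl):
    """Yield the current hypothesis after each example of a positive presentation.

    member(w, h) -- hypothetical membership test: is w in the language of index h?
    minl(sample) -- hypothetical MINL procedure: an index whose language is
                    minimal among those containing every element of sample
    """
    seen = []
    hypothesis = None
    for w in presentation:
        seen.append(w)
        # Recompute only when the current hypothesis fails to explain w.
        if hypothesis is None or not member(w, hypothesis):
            hypothesis = minl(seen)
        yield hypothesis


# Toy class (assumption, for illustration only): L_k = positive multiples of k.
def toy_member(w, k):
    return w % k == 0


def toy_minl(sample):
    return reduce(gcd, sample)


hypotheses = list(learn([12, 18, 30], toy_member, toy_minl))
```

In the toy run the hypothesis changes from 12 to 6 when 18 arrives and is then kept for 30. For RTTL, member would be the membership algorithm of [6] and minl the procedure MINL(S) of Section 3, so each update would take polynomial time in the size of the data read so far.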
6 Conclusions
The minimal language problem is a kernel problem in learning methods such as inductive inference. We have given a polynomial time algorithm for solving the minimal language problem for regular term trees. From the viewpoint of computational complexity, we have shown that it is hard to solve the minimal language problems with optimizing the size of an output regular term tree. By using our polynomial time algorithm, we have shown that the class of regular term tree languages with infinite edge labels is polynomial time inductively inferable from positive data.

We can give membership and MINL algorithms for the following two classes of regular term trees with no edge label: RTTL_1 = {L(g) | g ∈ RTT and the number of children of every vertex in g is not 2} and RTTL_2 = {L(g) | g ∈ RTT and (i) for every pair of vertices in g whose degrees are more than 2, there exists a vertex of degree 2 on the path between them, and (ii) there is no vertex of degree 3 in g such that the distance between any leaf and the vertex is at least 2}. Therefore these classes are polynomial time inductively inferable from positive data. However, it is an open question whether or not the class of all regular term tree languages with no edge label is polynomial time inductively inferable from positive data.
References

1. T. R. Amoth, P. Cull, and P. Tadepalli. Exact learning of unordered tree patterns from queries. Proc. COLT-99, ACM Press, pages 323–332, 1999.
2. D. Angluin. Finding patterns common to a set of strings. Journal of Computer and System Sciences, 21:46–62, 1980.
3. H. Arimura, T. Shinohara, and S. Otsuki. Polynomial time algorithm for finding finite unions of tree pattern languages. Proc. NIL-91, Springer-Verlag, LNAI 659, pages 118–131, 1993.
4. S. Goldman and S. Kwek. On learning unions of pattern languages and tree patterns. Proc. ALT-99, Springer-Verlag, LNAI 1720, pages 347–363, 1999.
5. S. Matsumoto, Y. Hayashi, and T. Shoudai. Polynomial time inductive inference of regular term tree languages from positive data. Proc. ALT-97, Springer-Verlag, LNAI 1316, pages 212–227, 1997.
6. T. Miyahara, T. Shoudai, T. Uchida, T. Kuboyama, K. Takahashi, and H. Ueda. Discovering new knowledge from graph data using inductive logic programming. Proc. ILP-99, Springer-Verlag, LNAI 1634, pages 222–233, 1999.
7. T. Miyahara, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Polynomial time matching algorithms for tree-like structured patterns in knowledge discovery. Proc. PAKDD-2000, Springer-Verlag, LNAI 1805, pages 5–16, 2000.
8. T. Miyahara, T. Shoudai, T. Uchida, K. Takahashi, and H. Ueda. Discovery of frequent tree structured patterns in semistructured web documents. Proc. PAKDD-2001, Springer-Verlag, LNAI 2035, pages 47–52, 2001.
9. S. Nestorov, S. Abiteboul, and R. Motwani. Extracting schema from semistructured data. Proceedings of ACM SIGMOD International Conference on Management of Data, pages 295–306, 1998.
10. T. Shinohara. Polynomial time inference of extended regular pattern languages. Springer-Verlag, LNCS 147, pages 115–127, 1982.
11. T. Shoudai, T. Miyahara, T. Uchida, and S. Matsumoto. Inductive inference of regular term tree languages and its application to knowledge discovery. Information Modelling and Knowledge Bases XI, IOS Press, pages 85–102, 2000.
12. K. Wang and H. Liu. Discovering structural association of semistructured data. IEEE Trans. Knowledge and Data Engineering, 12:353–371, 2000.