Learning Concatenations of Locally Testable Languages from Positive Data
Satoshi Kobayashi and Takashi Yokomori
Department of Computer Science and Information Mathematics The University of Electro-Communications 1-5-1, Chofugaoka, Chofu, Tokyo 182, Japan e-mail:fsatoshi,
[email protected]
Abstract
This paper introduces the class of concatenations of locally testable languages and its subclasses, and presents some results on the learnability of these classes from positive data. We first establish several relationships among the language classes introduced, and give a sufficient condition for a concatenation operation to preserve finite elasticity of a language class $C$. Then we show that, for each $k$, the class $CLT_{\le k}$, a subclass of concatenations of locally testable languages, is identifiable in the limit from positive data. Further, we introduce a notion of local parsability, and define a class $(k,l)$-$CLTS$, which is a subclass of the class of concatenations of strictly locally testable languages. Then, for each $k, l \ge 1$, $(k,l)$-$CLTS$ is proved to be identifiable in the limit from positive data using reversible automata with the conjectures updated in polynomial time. Some possible applications of this result are also briefly discussed.
1 Introduction
Inductive inference is a process of acquiring a concept from its examples. Gold formulated this process as one of identifying a target concept in the limit, which is called Gold's identification in the limit [Gol67]. He also showed that a superfinite class, i.e., a class which contains all finite concepts and at least one infinite concept, is not identifiable in the limit from positive data only. This was a surprising result because it implies that the class of regular languages is not learnable from positive data. On the other hand, some interesting classes of languages, such as the $k$-reversible languages [Ang82] and the pattern languages [Ang80], are known to be identifiable in the limit from positive data. However, as mentioned in [Ang82], the learnability from positive data of subclasses of regular languages remains open to further study. In particular, [Ang82] refers to a possibility of close relationships between noncounting languages and reversible languages, and suggests that a certain synthetic approach to learning these two language classes might give some useful results for analyzing subclasses of regular languages learnable from positive data. Recently, in [Yok90], Yokomori showed results on the learnability of the class of strictly locally testable languages from positive data and presented an interesting relationship between strictly $k$-testable languages and $(k+1)$-reversible languages.

This paper introduces some subclasses of noncounting languages, and investigates relationships among those classes and the class of reversible languages. Further, we present some learnability results on the classes. In section 2, we introduce the class of concatenations of locally testable languages $CLTS$ and its subclasses, and then we compare them with the class of reversible languages in section 3. Section 4 presents theoretical results on the learnability of $CLTS$ and its subclasses. In particular, we show that the class $(k,l)$-$CLTS$, which is a subclass of $CLTS$ with local parsability, is identifiable in the limit using $(k+2l)$-reversible automata with the conjectures updated in polynomial time. Some possible applications of the results are also briefly discussed.
2 Concatenations of Locally Testable Languages
Let $\Sigma$ be a finite alphabet and $\Sigma^*$ be the set of all finite-length strings over $\Sigma$. Let $\Sigma^k$ be the set of all strings over $\Sigma$ of length $k$. We denote the null string by $\lambda$. $\Sigma^+$ is defined as $\Sigma^* - \{\lambda\}$. The length of a string $w \in \Sigma^*$ is denoted by $|w|$. This should not be confused with the notation $|S|$ for a set $S$, which represents the cardinality of $S$. A language is a subset of $\Sigma^*$. In this section, we consider only non-null languages. Therefore, in this section, we assume that a language over $\Sigma$ is a subset of $\Sigma^+$. A concatenation of languages, $L_1 \cdot L_2$, is defined as the set of strings $\{w_1 w_2 \mid w_1 \in L_1, w_2 \in L_2\}$.

$L_k(w)$ and $R_k(w)$ are defined as the $k$-length prefix and $k$-length suffix of $w$, respectively. These notations are defined only when $w$ has length $k$ or more. Further, let $I_k(w)$ be the set of all interior substrings of $w$ of length $k$. Note that, for any string $w$ with $|w| \le k+1$, it holds that $I_k(w) = \emptyset$, where $\emptyset$ denotes the empty set. Then, we define the class of locally testable languages as follows [MP71].

Let $k$ be a positive integer. A language $L$ over $\Sigma$ is $k$-testable iff for all strings $w_1, w_2$ of length $k$ or more, if $L_k(w_1) = L_k(w_2)$, $R_k(w_1) = R_k(w_2)$ and $I_k(w_1) = I_k(w_2)$, then either both $w_1$ and $w_2$ are in $L$ or neither is. A language $L$ is locally testable iff $L$ is $k$-testable for some positive integer $k$. The class of $k$-testable languages and the class of locally testable languages are denoted by $LT_{=k}$ and $LT$, respectively. We denote $\bigcup_{i \le k} LT_{=i}$ by $LT_{\le k}$. The definition of $k$-testable languages says nothing about strings of length less than $k$. So, a $k$-testable language may include any subset of strings of length less than $k$.

For any positive integer $k$, a language $L$ over $\Sigma$ is said to be strictly $k$-testable iff there exist finite sets $A$, $B$, and $C$ such that $A, B, C \subseteq \Sigma^k$, and for any string $w$ with $|w| \ge k$, $w \in L$ iff $L_k(w) \in A$, $R_k(w) \in B$, and $I_k(w) \subseteq C$. Here, $\langle A, B, C \rangle$ is called a triple for $L$ and denoted by $triple(L)$. A language $L$ is strictly locally testable iff $L$ is strictly $k$-testable for some positive integer $k$. We denote the class of strictly $k$-testable languages and the class of strictly locally testable languages by $LTS_{=k}$ and $LTS$, respectively. The class $\bigcup_{i \le k} LTS_{=i}$ is denoted by $LTS_{\le k}$.
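The definitions of the prefix, suffix, and interior-substring sets and of a triple translate directly into a membership test. The Python sketch below is an illustration only: the function names are hypothetical and do not come from the paper. It evaluates the three sets for a string and checks the triple condition for strings of length at least $k$, using the two triples given in Example 1 as data.

```python
from itertools import product
import re

def prefix_k(w, k):
    """Length-k prefix of w (assumes len(w) >= k)."""
    return w[:k]

def suffix_k(w, k):
    """Length-k suffix of w (assumes len(w) >= k)."""
    return w[len(w) - k:]

def interior_k(w, k):
    """Length-k substrings that start after the first character and end
    before the last one; empty whenever len(w) <= k + 1."""
    return {w[i:i + k] for i in range(1, len(w) - k)}

def in_strictly_k_testable(w, k, A, B, C):
    """Membership test for len(w) >= k, given a triple <A, B, C>."""
    if len(w) < k:
        raise ValueError("a triple says nothing about strings shorter than k")
    return (prefix_k(w, k) in A and suffix_k(w, k) in B
            and interior_k(w, k) <= C)

# Triples taken from Example 1 (strictly 2-testable and strictly 3-testable).
triple2 = ({"aa"}, {"bb"}, {"aa", "ab", "bb"})
triple3 = ({"aaa", "aab"}, {"bbb", "abb"}, {"aaa", "aab", "abb", "bbb"})

# All strings over {a, b} of length 3..7.
words = ["".join(t) for n in range(3, 8) for t in product("ab", repeat=n)]
agree = all(
    in_strictly_k_testable(w, 2, *triple2)
    == in_strictly_k_testable(w, 3, *triple3)
    for w in words
)
```

On this finite sample both triples accept the same strings, which is consistent with the claim in Example 1. A bounded enumeration of this kind is only a sanity check, not a proof.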
Theorem 1 [MP71]
(1) The class of locally testable languages ($k$-testable languages) is closed under the Boolean operations.
(2) The class of strictly locally testable languages (strictly $k$-testable languages) is closed under intersection.
(3) The class of locally testable languages ($k$-testable languages) is the closure of that of strictly locally testable languages (strictly $k$-testable languages) under the Boolean operations.

Example 1 Let us consider a strictly 2-testable language $L$ over $\Sigma = \{a, b\}$, for which $\langle \{aa\}, \{bb\}, \{aa, ab, bb\} \rangle$ is a triple. This language is also denoted by the regular expression $aaa^*bbb^*$. Here we can easily show that $L$ is a strictly 3-testable language for which $\langle \{aaa, aab\}, \{bbb, abb\}, \{aaa, aab, abb, bbb\} \rangle$ is a triple.

At first thought, it seems to hold that for any positive integer $k$, $LTS_{=k}$ and $LT_{=k}$ are contained in $LTS_{=k+1}$ and $LT_{=k+1}$, respectively. However, this is not the case. For example, let us consider a strictly $k$-testable language $L' = \{a^k, a^{k+1}\}$ for which $\langle \{a^k\}, \{a^k\}, \emptyset \rangle$ is a triple. Then, it holds that $L'$ is not in $LT_{=k+1}$ because for $w_1 = a^{k+1} \in L'$ and $w_2 = a^{k+2} \notin L'$, we have $L_{k+1}(w_1) = L_{k+1}(w_2)$, $R_{k+1}(w_1) = R_{k+1}(w_2)$, and $I_{k+1}(w_1) = I_{k+1}(w_2) (= \emptyset)$. Therefore, in general it holds that $LTS_{=k} \not\subseteq LT_{=k+1}$. Further, by Theorem 1, we have $LT_{=k} \not\subseteq LT_{=k+1}$ and $LTS_{=k} \not\subseteq LTS_{=k+1}$. □

Here, let us consider a slightly different definition of locally testable languages as follows. In this setting, we say that a language $L$ is $k$-testable iff for all strings $w_1, w_2$ of length $k+1$ or more, if $L_k(w_1) = L_k(w_2)$, $R_k(w_1) = R_k(w_2)$ and $I_k(w_1) = I_k(w_2)$, then either both $w_1$ and $w_2$ are in $L$ or neither is. The only difference from the original definition is the length bound, which is $k+1$ instead of $k$. In this definition, we can prove that $LT_{=k}$ is contained in $LT_{=k+1}$ in the following manner. Let $L$ be a language in $LT_{=k}$ and $w_1, w_2$ be strings of length $k+2$ or more such that $L_{k+1}(w_1) = L_{k+1}(w_2)$, $R_{k+1}(w_1) = R_{k+1}(w_2)$, and $I_{k+1}(w_1) = I_{k+1}(w_2)$. It suffices to show $w_1 \in L$ iff $w_2 \in L$. First, we prove $L_k(w_1) = L_k(w_2)$, $R_k(w_1) = R_k(w_2)$ and $I_k(w_1) = I_k(w_2)$. It is easy to see $L_k(w_1) = L_k(w_2)$ and $R_k(w_1) = R_k(w_2)$. For proving $I_k(w_1) = I_k(w_2)$, we consider the next two cases. In case $|w_1| \ge k+3$, $I_{k+1}(w_1) = I_{k+1}(w_2)$ immediately implies $I_k(w_1) = I_k(w_2)$. In case $|w_1| = k+2$, we have $I_{k+1}(w_2) = I_{k+1}(w_1) = \emptyset$. Therefore, $|w_2| = k+2$ holds. Let $aw$ be $L_{k+1}(w_1) (= L_{k+1}(w_2))$, where $a \in \Sigma$ and $w \in \Sigma^*$. Then we have $I_k(w_1) = \{w\} = I_k(w_2)$.
Hence, in any case, we have $I_k(w_1) = I_k(w_2)$. Therefore, it holds that $w_1 \in L$ iff $w_2 \in L$, since $L \in LT_{=k}$. This implies that $L$ is a $(k+1)$-testable language. As discussed above, if we use the new definition, then we have interesting inclusion properties among the classes of $k$-testable languages. However, in the rest of the paper, we restrict our attention to the original definition of locally testable languages.

Let us consider the class of concatenations of locally testable languages. For any class of languages $C$, we denote by $Con(C)$ the smallest class of languages that includes $C$ and is closed under concatenation. Then, by $CLT$, $CLT_{=k}$, $CLT_{\le k}$, $CLTS$, $CLTS_{=k}$, and $CLTS_{\le k}$, we denote $Con(LT)$, $Con(LT_{=k})$, $Con(LT_{\le k})$, $Con(LTS)$, $Con(LTS_{=k})$, and $Con(LTS_{\le k})$, respectively.

Example 2 Let us consider languages $L_1$ and $L_2$ over $\Sigma = \{a, b, c\}$, which are denoted by the regular expressions $(a+b)(a+b)^*$ and $(b+c)(b+c)^*$, respectively. It is easy to see that $L_1$ and $L_2$ are strictly 1-testable languages such that $triple(L_1) = \langle \{a,b\}, \{a,b\}, \{a,b\} \rangle$ and $triple(L_2) = \langle \{b,c\}, \{b,c\}, \{b,c\} \rangle$. Let $L_3 = L_1 \cup L_2$. Then, by Theorem 1, we have that $L_3$ is 1-testable, so $L_3 \in LT$. Note that $L_3$ is $k$-testable for any positive integer $k$ since both $L_1$ and $L_2$ are strictly $k$-testable for any positive integer $k$. However, we can prove that $L_3 \notin CLTS$. Let us assume $L_3 \in CLTS$. Then there exist some positive integer $n$ and a sequence of strictly locally testable languages $S_1, S_2, \ldots, S_n$ such that $L_3 = S_1 \cdot S_2 \cdots S_n$ and each $S_i$ is strictly $k_i$-testable for some positive integer $k_i$. Here we have $n = 1$ since $a \in L_3$. (Recall that we consider only non-null languages.) Let $\langle A, B, C \rangle$ be a triple for $S_1 = L_3$. Then, since $a^{k_1+1} b^{k_1+1} \in L_3$ and $b^{k_1+1} c^{k_1+1} \in L_3$, it holds that $a^{k_1}, b^{k_1} \in A$, $b^{k_1}, c^{k_1} \in B$, and $0 \le \forall j \le k_1$ $(a^{k_1-j} b^j \in C \wedge b^{k_1-j} c^j \in C)$. Therefore, $a^{k_1} b^{k_1} c^{k_1} \in L_3$, which is a contradiction.

Let $L_4 = L_1 \cdot L_2 \cdot L_1$. Then, $L_4$ is in $CLTS$ by its definition. Note that $L_4$ is in $CLTS_{=k}$ for any positive integer $k$. However, we can prove $L_4 \notin LT$ as follows. Let us assume that $L_4 \in LT$. Then there exists some positive integer $k$ such that $L_4$ is $k$-testable. Here we have $w_1 = a^{k+1} c a^k \in L_4$ and $w_2 = a^k c a^k c a^k \notin L_4$. It is easy to see that $L_k(w_1) = L_k(w_2)$, $R_k(w_1) = R_k(w_2)$, and $I_k(w_1) = I_k(w_2)$ hold. This is a contradiction. □

From the discussion above, we have the next lemma.

Lemma 1
(1) There exists a language $L$ such that, for any positive integer $k$, $L \in LT_{=k}$ and $L \notin CLTS$.
(2) There exists a language $L$ such that, for any positive integer $k$, $L \in CLTS_{=k}$ and $L \notin LT$.

Then, we have the following.

Theorem 2
(1) $CLTS$, $CLTS_{=k}$, and $CLTS_{\le k}$ are incomparable to $LT$.
(2) $CLTS$ ($CLTS_{=k}$, $CLTS_{\le k}$) properly includes $LTS$ ($LTS_{=k}$, $LTS_{\le k}$, respectively).
(3) $LT$ ($LT_{=k}$, $LT_{\le k}$) properly includes $LTS$ ($LTS_{=k}$, $LTS_{\le k}$, respectively).
(4) $CLT$ ($CLT_{=k}$, $CLT_{\le k}$) properly includes $LT$ ($LT_{=k}$, $LT_{\le k}$, respectively).
(5) $CLT$ ($CLT_{=k}$, $CLT_{\le k}$) properly includes $CLTS$ ($CLTS_{=k}$, $CLTS_{\le k}$, respectively).

In this paper, we define noncounting languages by using the notion of locally testable languages. The class $NC$ of noncounting languages is defined as the smallest class of languages that contains $LT$ and is closed under the Boolean operations and concatenation. Therefore, all of the language classes introduced in this section are subclasses of $NC$.
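The argument in Example 2 that the threefold concatenation of $(a+b)(a+b)^*$, $(b+c)(b+c)^*$ and $(a+b)(a+b)^*$ is not locally testable can be checked mechanically for small window lengths. The following Python sketch is illustrative only: the helper names are hypothetical, and the regular expression is simply a transcription of that concatenation. For each window length it builds the two witness strings from the example and compares their prefix, suffix, and interior-substring sets.

```python
import re

def signature(w, k):
    """Return (k-prefix, k-suffix, set of interior k-substrings) of w,
    assuming len(w) >= k."""
    interior = frozenset(w[i:i + k] for i in range(1, len(w) - k))
    return (w[:k], w[len(w) - k:], interior)

# Concatenation (a+b)(a+b)* . (b+c)(b+c)* . (a+b)(a+b)* from Example 2.
L4_PATTERN = re.compile(r"[ab]+[bc]+[ab]+")

def in_L4(w):
    return L4_PATTERN.fullmatch(w) is not None

def example2_check(k):
    """Build the two witness strings for window length k and report
    (w1 is a member, w2 is a member, signatures coincide)."""
    w1 = "a" * (k + 1) + "c" + "a" * k
    w2 = "a" * k + "c" + "a" * k + "c" + "a" * k
    return in_L4(w1), in_L4(w2), signature(w1, k) == signature(w2, k)

checks = {k: example2_check(k) for k in range(1, 7)}
```

For every window length tried, the first string is a member, the second is not, and the two signatures coincide, so no $k$-testable language in that range can separate them.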
3 Comparison with Reversible Languages
In this section, we compare the classes of languages introduced in section 2 with the class of reversible languages, which is identifiable in the limit from positive data [Ang82]. Here we give the definition of reversible languages based on the language-theoretic characterization [Ang82].

Let $k$ be a non-negative integer. A language $L$ is $k$-reversible iff whenever $u_1 v w$ and $u_2 v w$ are in $L$ and $|v| = k$, it holds that for any $x \in \Sigma^*$, $u_1 v x \in L$ iff $u_2 v x \in L$. (In case $k = 0$, we say $L$ is zero-reversible rather than 0-reversible.) A language $L$ is said to be reversible iff $L$ is $k$-reversible for some non-negative integer $k$. The class of $k$-reversible languages and the class of reversible languages are denoted by $Rev(k)$ and $Rev$, respectively. Then we have the following.

Lemma 2 [Ang82] For any non-negative integer $k$, $Rev(k)$ is properly contained in $Rev(k+1)$.

Example 3 The language denoted by the regular expression $(bb)^+$ is zero-reversible. However, this language is not contained in $NC$ (cf. [MP71], p.6).

Let $L_1$ and $L_2$ be strictly 1-testable languages such that $triple(L_1) = \langle \{a\}, \{a\}, \{a\} \rangle$ and $triple(L_2) = \langle \{c\}, \{c\}, \{a\} \rangle$. Then, by the definition, $L_3 = L_1 \cdot L_2 \in CLTS$ holds. Note that $L_3$ is in $CLTS_{=k}$ for any positive integer $k$. However, we can prove that $L_3 \notin Rev$ as follows. Let us assume that $L_3$ is $k$-reversible for some non-negative integer $k$. Then, $a a^k c \in L_3$, $a c a^k c \in L_3$, and $a a^k c a c \in L_3$ hold. Therefore, by the definition of $k$-reversible language, we have that $a c a^k c a c \in L_3$, which is a contradiction.

Let $L_4$ and $L_5$ be strictly 1-testable languages such that $triple(L_4) = \langle \{a\}, \{a, b\}, \{a\} \rangle$ and $triple(L_5) = \langle \{c\}, \{a\}, \{a\} \rangle$. Then $L_6 = L_4 \cup L_5$ is in $LT$. Note that $L_6$ is $k$-testable for any positive integer $k$. We can prove that $L_6$ is not in $Rev$.
Let us assume that $L_6$ is $k$-reversible for some non-negative integer $k$. Then, $a(a)^k a \in L_6$, $c(a)^k a \in L_6$, and $a(a)^k b \in L_6$ hold. Therefore, by the definition of $k$-reversible language, we have that $c(a)^k b \in L_6$, which is a contradiction. □

Using the discussion above, we have the following.

Lemma 3
(1) There exists a language $L$ such that, for any positive integer $k$, $L \in CLTS_{=k}$ and $L \notin Rev$.
(2) There exists a language $L$ such that, for any positive integer $k$, $L \in LT_{=k}$ and $L \notin Rev$.

Further, the next fact is proved.

Lemma 4 [Yok90] $LTS_{=k}$ is properly contained in $Rev(k+1)$.

Therefore, we have the following.

Theorem 3
(1) The following classes are incomparable to $Rev$: $NC$, $LT$, $LT_{=k}$, $LT_{\le k}$, $CLTS$, $CLTS_{=k}$, $CLTS_{\le k}$, $CLT$, $CLT_{=k}$, and $CLT_{\le k}$.
(2) $LTS$ is properly contained in $Rev$.

Part of the relationships among the classes of languages introduced in this paper is summarized in Figure 1.
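The two non-reversibility arguments of Example 3 each exhibit a concrete tuple of strings that contradicts the definition of a $k$-reversible language. The Python sketch below replays those witnesses for small $k$. It is an illustration under stated assumptions: the two regular expressions are transcriptions of the languages described by the triples in Example 3, and the function names are hypothetical.

```python
import re

def is_violation(in_lang, u1, u2, v, w, x):
    """True if u1vw and u2vw are both members while u1vx and u2vx
    disagree on membership, i.e. the tuple refutes |v|-reversibility."""
    shared_tail = in_lang(u1 + v + w) and in_lang(u2 + v + w)
    return shared_tail and in_lang(u1 + v + x) != in_lang(u2 + v + x)

# First language of Example 3: a+ followed by "c" or "c a* c".
L3_PATTERN = re.compile(r"a+c(a*c)?")
# Second language of Example 3: the union of "a", "a a* (a|b)" and "c a+".
L6_PATTERN = re.compile(r"a(a*[ab])?|ca+")

def in_L3(s):
    return L3_PATTERN.fullmatch(s) is not None

def in_L6(s):
    return L6_PATTERN.fullmatch(s) is not None

# Witnesses from the text, with v = a^k for k = 0..5.
results = [
    (
        is_violation(in_L3, "a", "ac", "a" * k, "c", "cac"),
        is_violation(in_L6, "a", "c", "a" * k, "a", "b"),
    )
    for k in range(6)
]
```

Each entry of `results` pairs the outcome for the concatenation language with the outcome for the union language. Both witnesses are violations for every $k$ tried, matching the contradictions derived in the example.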
4 Learnability Results
4.1 Definitions
Here we briefly introduce some fundamental definitions. For more details, please refer to [Gol67], [BB75], [Ang80], and [LZ93].

Let $C$ be a class of non-empty languages over a fixed alphabet $\Sigma$. Then, we consider a class of representations $R$ for $C$ with the following properties.
1. $R$ is a recursively enumerable language (over some fixed alphabet).
2. For all $L \in C$, there exists $r \in R$ such that $r$ represents $L$ (denoted by $L(r) = L$).
3. There exists a recursive function $f$ such that for all $r \in R$ and $w \in \Sigma^*$, $f(r, w) = 1$ if $w \in L(r)$, and $f(r, w) = 0$ otherwise.

We say that a class of representations $R$ is class preserving with respect to $C$ iff $C = \{L(r) \mid r \in R\}$ holds. A class of representations $R$ is said to be class comprising with respect to $C$ iff $C \subseteq \{L(r) \mid r \in R\}$ holds (cf. [LZ93]).

For a given $L \in C$, a positive presentation of $L$ is any infinite sequence $w_1, w_2, w_3, \ldots$ of strings such that $\forall w \in L\ \exists i\ (w = w_i)$ and $\forall i\ (w_i \in L)$. Let $L$ be a given language. We say that an algorithm $A$ identifies $L$ in the limit from positive data using $R$ iff for any positive presentation of $L$, the infinite sequence $r_1, r_2, r_3, \ldots$ of representations in $R$ produced by $A$ converges to a representation $r$ such that $L = L(r)$. A class $C$ of languages is said to be identifiable in the limit from positive data using $R$ iff there exists an algorithm $A$ such that $A$ identifies every language in $C$ in the limit from positive data using $R$.

A learning algorithm $A$ is said to be responsive on $C$ iff for any $L \in C$ and any positive presentation of $L$, $A$ always outputs some conjecture between any two consecutive input requests. An algorithm $A$ consistently identifies $C$ in the limit from positive data using $R$ iff $A$ identifies $C$ in the limit from positive data using $R$ and for any representation $r_i$ produced by $A$, the given set of strings $\{w_1, w_2, \ldots, w_i\}$ is contained in $L(r_i)$. An algorithm $A$ conservatively identifies $C$ in the limit from positive data using $R$ iff $A$ identifies $C$ in the limit from positive data using $R$ and for any output $r_i$ ($i \ge 2$) of $A$, it holds that, if $L(r_{i-1})$ contains the set of given strings $\{w_1, \ldots, w_i\}$, then $r_i = r_{i-1}$. A class $C$ of languages is said to be identifiable in the limit from positive data using $R$ with the conjectures updated in polynomial time iff there exists some algorithm $A$ which is responsive on $C$ and consistently and conservatively identifies $C$ in the limit from positive data using $R$ with the property that the time used by $A$ for updating conjectures is bounded by some polynomial in the size of the examples given up to that point, i.e., $|w_1| + \cdots + |w_i|$.

It is often the case that a class of representations $R$ with the class preserving property may be encoded by the set of positive integers so that each integer $i$ corresponds to the $i$th representation in $R$. In this case, by $L_i$, we denote the language which is represented by an integer $i$, and an infinite sequence $L_1, L_2, L_3, \ldots$ is called an indexed family of recursive languages, or an indexed family for short [Ang80].

Note: In the sequel, if a representation class $R$ is not specified, we always assume that some appropriate enumerable class of representations with the class preserving property is attached to a target concept class.
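To make the notions of positive presentation, conjecture, consistency, and conservativeness concrete, the following Python sketch shows a very simple learner for strictly $k$-testable languages with a known $k$. It is an assumption-laden illustration and is not claimed to be the algorithm of [Yok90]: the representation (a triple plus the set of observed short strings) and all function names are hypothetical. After each example it outputs the triple collected from the examples seen so far.

```python
def learn_strictly_k_testable(presentation, k):
    """Yield one conjecture (A, B, C, short) after each example.

    A, B, C collect the observed k-prefixes, k-suffixes and interior
    k-substrings; `short` collects examples shorter than k, which a
    triple does not constrain.
    """
    A, B, C, short = set(), set(), set(), set()
    for w in presentation:
        if len(w) < k:
            short.add(w)
        else:
            A.add(w[:k])
            B.add(w[len(w) - k:])
            C.update(w[i:i + k] for i in range(1, len(w) - k))
        yield frozenset(A), frozenset(B), frozenset(C), frozenset(short)

def accepts(conjecture, w, k):
    """Membership in the language represented by a conjecture."""
    A, B, C, short = conjecture
    if len(w) < k:
        return w in short
    interior = {w[i:i + k] for i in range(1, len(w) - k)}
    return w[:k] in A and w[len(w) - k:] in B and interior <= C

# A finite initial segment of a positive presentation of aaa*bbb*.
examples = ["aabb", "aaabb", "aabbb", "aaabbb"]
conjectures = list(learn_strictly_k_testable(examples, 2))
```

In this sketch a conjecture changes only when the new example falls outside the previous conjecture, and every conjecture contains all examples seen so far, which mirrors the conservative and consistent behaviour defined above. On the sample shown, the learner reaches the triple of Example 1 after three examples and does not change afterwards.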
4.2 Learnability of $CLT_{\le k}$ from Positive Data
Let $C$ be an indexed family. We say $C$ has finite thickness iff for any string $w \in \Sigma^*$, the number of languages in $C$ which contain $w$ is finite. Angluin showed that finite thickness is a sufficient condition for learnability from positive data [Ang80]. Wright introduced another sufficient condition for learnability from positive data, called finite elasticity, originally in [Wri89] and in corrected form in [MSW90]. An indexed family $C$ of languages has infinite elasticity iff there exist an infinite sequence $w_0, w_1, w_2, \ldots$ of strings and an infinite sequence $L_1, L_2, \ldots$ of languages in $C$ such that, for any $k \ge 1$, $\{w_0, w_1, \ldots, w_{k-1}\} \subseteq L_k$ and $w_k \notin L_k$ hold. A class $C$ has finite elasticity iff $C$ does not have infinite elasticity. As proved in [Wri89], it holds that, if a class $C$ has finite thickness, then $C$ has finite elasticity.

Theorem 4 [Wri89] An indexed family $C$ is identifiable in the limit from positive data if $C$ has finite elasticity.

Since the number of $k$-testable languages on a fixed finite alphabet is finite, we immediately obtain the following.

Lemma 5 $LT_{\le k}$ has finite thickness, and therefore, finite elasticity.

By Theorem 4 and Lemma 5, we have the following.

Theorem 5 $LT_{\le k}$ is identifiable in the limit from positive data.
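The finiteness used for Lemma 5 can be made explicit by a crude count. The bound below is a sketch added for illustration and is not a result stated in the paper. Write $m = |\Sigma|$. A $k$-testable language cannot separate two strings of length at least $k$ that share the same $k$-prefix, $k$-suffix, and set of interior $k$-substrings, and there are at most $m^k \cdot m^k \cdot 2^{m^k}$ such combinations. Strings shorter than $k$ may be included freely, and there are $\sum_{i=0}^{k-1} m^i$ of them.

```latex
\[
  |LT_{=k}| \;\le\; 2^{\, m^{2k}\, 2^{m^{k}} \;+\; \sum_{i=0}^{k-1} m^{i}},
  \qquad
  |LT_{\le k}| \;\le\; \sum_{i=1}^{k} |LT_{=i}| .
\]
```

Since $LT_{\le k}$ is then a finite class, any string belongs to only finitely many of its members, which is exactly finite thickness.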
Now, for an indexed family $C$, let us consider the learnability of $Con(C)$ from positive data. The next result is useful for proving the learnability of $Con(C)$ from positive data when $C$ has finite elasticity.

Lemma 6 Let us consider an indexed family $C$ with the following properties.
(C1) For any language $L$ in the class, if $\lambda \in L$, then $L = \{\lambda\}$.
(C2) The class has finite elasticity.
Then, $Con(C)$ also satisfies the conditions (C1) and (C2).
Proof
Let us consider a language $L_1 \cdots L_n$ in $Con(C)$ which contains the null string. Then, each $L_i$ must contain the null string. From the condition (C1) of $C$, each $L_i = \{\lambda\}$, so we have $L_1 \cdots L_n = \{\lambda\}$. Therefore, $Con(C)$ satisfies the condition (C1).

The proof for the claim that $Con(C)$ satisfies the condition (C2) is as follows. Let us assume that $Con(C)$ has infinite elasticity. Then there exist infinite sequences $w_0, w_1, w_2, \ldots$ of strings and $L_1, L_2, \ldots$ of languages in $Con(C)$ such that $\{w_0, w_1, \ldots, w_{k-1}\} \subseteq L_k$ and $w_k \notin L_k$ for any positive integer $k$. For any $L_k$ ($k \ge 1$), there is a sequence $L_{k,1}, L_{k,2}, \ldots, L_{k,l(k)}$ of languages in $C$ such that $L_k = L_{k,1} \cdot L_{k,2} \cdots L_{k,l(k)}$. Here we may assume that $\forall k \ge 1$, $1 \le \forall i \le l(k)$ $(\lambda \notin L_{k,i})$, since, otherwise, $L_{k,i} = \{\lambda\}$ by the condition (C1) of $C$, and therefore, $L_{k,i}$ can be removed from the sequence.

Then we construct the following three infinite sequences: $k_0, k_1, k_2, \ldots$ of non-negative integers, $N_0, N_1, N_2, \ldots$ of sets of non-negative integers, and $t_{k_0}, t_{k_1}, t_{k_2}, \ldots$ of tuples of strings, where by $l_i$ we denote the length of $t_{k_i}$. The construction is based on the recursive procedure below.
Initialization: $i = 0$; $k_0 = 0$; $N_{-1} =$ the set of all positive integers.

Stage $i$: For each tuple of strings $t = (s_1, s_2, \ldots, s_p)$ such that $w_{k_i} = s_1 s_2 \cdots s_p$ and $1 \le \forall j \le p$ $(s_j \in \Sigma^+)$, let $A_t = \{k \in N_{i-1} \mid l(k) = p \wedge 1 \le \forall j \le p\ (s_j \in L_{k,j})\}$. Find $t_{k_i} = (w_{k_i,1}, w_{k_i,2}, \ldots, w_{k_i,l_i})$ such that $|A_{t_{k_i}}| = \infty$. Let $N_i = A_{t_{k_i}}$ and $k_{i+1} = \min\{j \in N_i\}$. Initialize each $A_t$ to $\emptyset$. Go to stage $i+1$.
For each tuple $t = (s_1, s_2, \ldots, s_p)$ of strings, $A_t$ represents the set of all indices $i$ of languages $L_i (= L_{i,1} \cdot L_{i,2} \cdots L_{i,p})$ such that $s_1 \in L_{i,1}, s_2 \in L_{i,2}, \ldots, s_p \in L_{i,p}$. Here we can prove, for each stage, that there exists some $A_{t_{k_i}}$ such that $|A_{t_{k_i}}| = \infty$, and it holds that $k_{i+1} > k_i$, $N_i \subseteq N_{i-1}$ and $|N_i| = \infty$. The claim is proved by induction on $i$.

Let us consider the case $i = 0$. In this case, we have for all positive integers $k$, $w_{k_0} = w_0 \in L_k$. Therefore, for any $k \ge 1$, there exists a tuple of strings $t_k = (s_{k,1}, s_{k,2}, \ldots, s_{k,l(k)})$ such that $w_{k_0} = w_0 = s_{k,1} s_{k,2} \cdots s_{k,l(k)}$ and $1 \le \forall j \le l(k)$ $(s_{k,j} \in L_{k,j})$. Here we have $1 \le \forall j \le l(k)$ $(s_{k,j} \ne \lambda)$, since $\lambda \notin L_{k,j}$. Therefore, for each $k (\ge 1)$, there exists a tuple $t_k$ such that $k \in A_{t_k}$. The number of tuples $t = (s_1, s_2, \ldots, s_p)$ such that $w_0 = s_1 s_2 \cdots s_p$ and $1 \le \forall j \le p$ $(s_j \in \Sigma^+)$ is finite. Hence, there exists a tuple $t_{k_0}$ such that $|A_{t_{k_0}}| = \infty$. Therefore, we have $|N_0| = \infty$. It is easy to see $N_0 \subseteq N_{-1}$ by the definition of $N_0$ and $A_{t_{k_0}}$. From $\forall j \in N_{-1}$ $(j \ge 1)$, we have $k_1 \ge 1 > 0 = k_0$. Therefore, the claim holds in case $i = 0$.

Let us assume the claim holds in case $i = n-1$ and consider the case $i = n$. In a similar manner as in the case $i = 0$, we can prove that there exists a tuple of strings $t_{k_n} = (w_{k_n,1}, \ldots, w_{k_n,l_n})$ such that $|A_{t_{k_n}}| = \infty$. Therefore, we have $|N_n| = \infty$. It is sufficient to show $k_{n+1} > k_n$ and $N_n \subseteq N_{n-1}$. It is easy to see $N_n \subseteq N_{n-1}$ by the definition of $N_n$ and $A_{t_{k_n}}$. From $w_{k_n} \notin L_{k_n}$, we have that there is no tuple $t = (s_1, \ldots, s_p)$ such that $w_{k_n} = s_1 s_2 \cdots s_p$, $l(k_n) = p$ and $1 \le \forall j \le l(k_n)$ $(s_j \in L_{k_n,j})$. Therefore, for any $t = (s_1, \ldots, s_p)$ such that $w_{k_n} = s_1 \cdots s_p$ and $1 \le \forall j \le p$ $(s_j \in \Sigma^+)$, $k_n \notin A_t$ holds. Hence, we have $k_n \notin A_{t_{k_n}} = N_n$. This implies that $k_{n+1} > k_n$. Therefore the claim holds.
Next, for infinitely many integers $k_i$ ($i \ge 0$), we select a string $w_{k_i,j^*}$ and a language $L_{k_i,j^*}$ from $t_{k_i} = (w_{k_i,1}, w_{k_i,2}, \ldots, w_{k_i,l_i})$ and $L_{k_i,1}, \ldots, L_{k_i,l(k_i)}$, respectively, and construct an infinite sequence of strings and languages satisfying the condition of infinite elasticity. The construction is as follows. For $1 \le \forall j \le |w_0|$, let $P_j = \{k_n \mid j \le l(k_n) \wedge w_{k_n,j} \notin L_{k_n,j}\}$. Here, we have that $\forall k_n$, $1 \le \exists j \le l(k_n)$ $(w_{k_n,j} \notin L_{k_n,j})$, since $w_{k_n} \notin L_{k_n}$. It also holds that, for any positive integer $k$, $l(k) \le |w_0|$, since $\forall k \ge 1$ $(w_0 \in L_k)$.
These facts imply that, for each $k_n$, there exists some integer $j$ such that $1 \le j \le |w_0|$ and $k_n \in P_j$. Therefore, there exists some $j^*$ such that $|P_{j^*}| = \infty$. By $k_{p_i}$, we denote the $i$th smallest element in $P_{j^*}$. Let us consider an infinite sequence of strings, $w_{k_{p_1},j^*}, w_{k_{p_2},j^*}, \ldots$ and an infinite sequence of languages, $L_{k_{p_1},j^*}, L_{k_{p_2},j^*}, \ldots$. By the relation $N_{p_1} \supseteq N_{p_2} \supseteq N_{p_3} \supseteq \cdots \supseteq N_{p_i - 1}$, it is easy to see that $k_{p_i} \in N_{p_m}$ holds for $m = 1, 2, \ldots, i-1$. Therefore, we have $w_{k_{p_m},j^*} \in L_{k_{p_i},j^*}$ for $m = 1, 2, \ldots, i-1$ by the definition of $N_{p_m}$ and $A_{t_{k_{p_m}}}$. Hence, $\{w_{k_{p_1},j^*}, w_{k_{p_2},j^*}, \ldots, w_{k_{p_{i-1}},j^*}\} \subseteq L_{k_{p_i},j^*}$ holds. On the other hand, by the definition of $P_{j^*}$, we have $w_{k_{p_i},j^*} \notin L_{k_{p_i},j^*}$ for any $k_{p_i} \in P_{j^*}$. These facts imply that the class $C$ has infinite elasticity, which is a contradiction. This completes the proof. □
Example 4 Let us consider language classes $C_0 = \{\{a\}, a^*\}$ and $C_1 = \{(\lambda + a), a^*\}$ over $\Sigma = \{a\}$, both of which have finite thickness. $Con(C_0)$ has finite elasticity. However, $Con(C_1)$ does not have finite elasticity, which is because $C_1$ does not satisfy the condition (C1). Therefore, the condition (C1) is necessary in this sense. Note that $C_1$ is identifiable in the limit from positive data but that $Con(C_1)$ is not identifiable in the limit from positive data, since there exists no finite tell-tale (cf. [Ang80]) for $a^*$. □

It was shown in [MS93] that a fixed finite number of language concatenations preserves finite elasticity. Here, we have proved that, under the condition (C1), $Con(C)$, i.e., an arbitrary number of language concatenations, preserves finite elasticity. Note, however, that without the condition (C1), $Con(C)$ does not preserve finite elasticity of $C$, as shown in the example above.

Theorem 6 $CLT_{\le k}$ and $CLTS_{\le k}$ are identifiable in the limit from positive data.

Proof
By Lemma 5 and Lemma 6, $CLT_{\le k}$ has finite elasticity. By Theorem 2, $CLTS_{\le k}$ has finite elasticity, too. By Theorem 4, we have the results. □
4.3 Local Parsability
We showed in the previous subsection that the classes $CLT_{\le k}$ and $CLTS_{\le k}$ are identifiable in the limit from positive data, which does not mean the existence of efficient learning algorithms for $CLT_{\le k}$ or $CLTS_{\le k}$. In this subsection, we consider the problem of learning a subclass of $CLTS_{\le k}$ from positive data with the conjectures updated in polynomial time. The difficulty of efficient learning of $CLTS_{\le k}$ seems to lie in the intractability of finding concatenation points of given training examples. Therefore, it would be better to impose some reasonably restrictive conditions on concatenating strictly locally testable languages for obtaining some efficiently learnable subclass of $CLTS_{\le k}$.

In this paper, we introduce a notion called local parsability. This notion is closely related to the notion of local parsability originally defined in [MP71], which was proposed for analyzing code events. Intuitively, a language $L$ in $Con(C)$ is said to be locally parsable if we can determine the concatenation points of any given string $w$ in $L$ by scanning $w$ with a window of fixed finite length. More formally, the notion is defined as follows.

Let $k$ be a positive integer. A parse set of length $k$ is a finite set of pairs of strings $(p, q)$ such that $|p| \le k$, $|q| \le k$. Let $PS$ be a parse set of length $k$. Then, for any string $w$ in $\Sigma^*$, $N(w, PS)$ is defined as the set consisting of $0$ and $|w|$, and integers $i$ such that $\exists (p, q) \in PS$ $\exists x, y \in \Sigma^*$ $(w = xpqy \wedge i = |x| + |p|)$. Here, we denote the $i$-th smallest element of $N(w, PS)$ by $j_i$.

Let $C$ be a class of languages, $PS$ be a parse set of length $k$, $L_1, \ldots, L_n$ be a finite sequence of languages in $C$ and $w$ be a string in $L = L_1 \cdots L_n$. Then, $w$ is parsable to $L_1, \ldots, L_n$ based on $PS$ iff $|N(w, PS)| = n+1$ and $\forall j_i, j_{i+1} \in N(w, PS)$ $(sub(w, j_i + 1, j_{i+1}) \in L_i)$, where, by $sub(w, i, j)$, we denote the substring of $w$ which starts at the $i$th and ends at the $j$th character of $w$. In case of $i > j$, $sub(w, i, j)$ represents $\lambda$. A language $L \in Con(C)$ is said to be $k$-parsable iff there exist a finite sequence $L_1, \ldots, L_n$ of languages in $C$ and a parse set $PS$ of length $k$ such that $L = L_1 \cdots L_n$ and, for any string $w \in L_1 \cdots L_n$, $w$ is parsable to $L_1, \ldots, L_n$ based on $PS$. A language $L \in Con(C)$ is said to be locally parsable iff $L$ is $k$-parsable for some positive integer $k$.

The class of languages $(k,l)$-$CLTS$ is defined as the smallest class of languages that contains all languages $L \in CLTS_{\le k}$ such that $L$ is $l$-parsable. The class of languages $PCLTS$ is defined as the smallest class of languages that contains all languages $L \in CLTS$ such that $L$ is locally parsable.
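The definition of the set of parse points and of parsability can be read as a simple scanning procedure. The Python sketch below is an illustration with hypothetical function names. It computes the parse points of a string for a given parse set, cuts the string at those points, and checks each segment against a list of membership predicates. The parse set and the component languages are those of Example 5 (nonempty blocks of a's, then b's, then a's).

```python
def parse_points(w, PS):
    """Sorted parse points: 0, len(w), and every i such that w = x p q y
    with i = len(x) + len(p) for some pair (p, q) in the parse set PS."""
    points = {0, len(w)}
    for p, q in PS:
        pq = p + q
        start = w.find(pq)
        while start != -1:
            points.add(start + len(p))
            start = w.find(pq, start + 1)
    return sorted(points)

def segments(w, PS):
    """Substrings between consecutive parse points, left to right."""
    pts = parse_points(w, PS)
    return [w[a:b] for a, b in zip(pts, pts[1:])]

def is_parsable(w, PS, languages):
    """True if the parse points cut w into exactly len(languages)
    segments and the i-th segment belongs to the i-th language."""
    segs = segments(w, PS)
    return len(segs) == len(languages) and all(
        in_lang(s) for in_lang, s in zip(languages, segs)
    )

def only_a(s):
    return s != "" and set(s) == {"a"}

def only_b(s):
    return s != "" and set(s) == {"b"}

PS = {("a", "b"), ("b", "a")}          # parse set of length 1
points = parse_points("aabba", PS)      # boundaries after "aa" and "bb"
parts = segments("aabba", PS)
```

Because each parse point is found by looking at a bounded window around it, the cut positions can be computed in a single pass over the string, which is the property the efficient learning result relies on.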
Example 5 Let $L_1$ and $L_2$ be strictly 1-testable languages such that $triple(L_1) = \langle \{a\}, \{a\}, \{a\} \rangle$ and $triple(L_2) = \langle \{b\}, \{b\}, \{b\} \rangle$. Let $L_3 = L_1 \cdot L_2 \cdot L_1$. Then we can easily prove $L_3 \in (1,1)$-$CLTS$, since $PS = \{(a, b), (b, a)\}$ is a parse set of length 1 such that for any string $w \in L_3$, $w$ is parsable to $L_1, L_2, L_1$ based on $PS$. Note here that $L_3$ is in $(k,l)$-$CLTS$ for any positive integers $k$ and $l$. We can prove $L_3 \notin LT$ as follows. Let us assume $L_3 \in LT$. Then there exists some positive integer $k$ such that $L_3$ is $k$-testable. We have $w_1 = a^{k+1} b a^k \in L_3$ and $w_2 = a^k b a^k b a^k \notin L_3$. It is easy to see that $L_k(w_1) = L_k(w_2)$, $R_k(w_1) = R_k(w_2)$ and $I_k(w_1) = I_k(w_2)$ hold, which is a contradiction. □

Therefore, we have the following.

Lemma 7 For any positive integers $k$ and $l$, there exists a language $L$ in $(k,l)$-$CLTS$ such that $L \notin LT$.

Note that any language $L$ in $LTS_{\le k}$ is in $(k,l)$-$CLTS$, for any positive integer $l$, since any string in $L$ is $l$-parsable to $L$ based on $PS = \emptyset$.

Theorem 7
(1) $PCLTS$ and $(k,l)$-$CLTS$ are incomparable to $LT$.
(2) $PCLTS$ ($(k,l)$-$CLTS$) properly includes $LTS$ ($LTS_{\le k}$, respectively).
[Figure 1: Relationships between subclasses of regular languages. Classes shown: Regular, NC, CLT, CLTS, LT, PCLTS, LTS, Rev.]
4.4 Efficient Learning of $(k,l)$-$CLTS$ from Positive Data
In this subsection, we will show that $(k,l)$-$CLTS$ is identifiable in the limit from positive data using reversible automata with the conjectures updated in polynomial time. The proof is based on a relationship between $(k,l)$-$CLTS$ and $Rev(k+2l)$.

Lemma 8 Let $L$ be any language in $LTS_{\le k}$ and $u_1, u_2, v, x$, and $y$ be any strings over $\Sigma$ such that $u_1 v x, u_2 v x, u_1 v y \in L$, and $|v| = k+1$. Then $u_2 v y \in L$ holds.

Proof
There exists some positive integer $j (\le k)$ such that $L$ is strictly $j$-testable by the definition of $LTS_{\le k}$. By Lemma 4, $L$ is $(j+1)$-reversible. Then we have that $L$ is $(k+1)$-reversible by Lemma 2. Therefore, we have $u_2 v y \in L$ by the definition of $(k+1)$-reversible languages. □

We can then prove an interesting relationship between $(k,l)$-$CLTS$ and $Rev(k+2l)$.

Theorem 8 For any positive integers $k$ and $l$, $(k,l)$-$CLTS$ is contained in $Rev(k+2l)$.

Proof
Let $L$ be a language in $(k,l)$-$CLTS$. Then there exist a positive integer $n$, a sequence of languages $L_1, \ldots, L_n$ in $LTS_{\le k}$, and a parse set $PS$ of length $l$ such that $L = L_1 \cdots L_n$, and, for all $w \in L$, $w$ is parsable to $L_1, \ldots, L_n$ based on $PS$. Let us consider any strings $u_1, u_2, v, x, y \in \Sigma^*$ such that $w_1 = u_1 v x \in L$, $w_2 = u_2 v x \in L$, $w_3 = u_1 v y \in L$ and $|v| = k+2l$. It is sufficient for us to show $w_4 = u_2 v y \in L$.
Here, we denote the i-th smallest element in N (w ; PS ) by j , for p = 1; 2; 3; 4. For proving w4 = u2 vy 2 L, it suces to show j N (w4 ; P S ) j= n + 1 and 8j4 ; j 4+1 2 N (w4; P S ) sub(w4; j 4 + 1; j 4+1 ) 2 L . From the assumption above, we have, (1) j N (w ; P S ) j= n + 1 (for p = 1; 2; 3) (2) 8j ; j +1 2 N (w ; P S ) sub(w ; j + 1; j +1 ) 2 L (for p = 1; 2; 3) Further, for a nite set of integers N and an integer m, by N (N ), we denote the set of all elements in N which are less than or equal to m (grater than m), and, by N=m, we denote the set fe 0 m j e 2 N g. Then we have the followings.(cf. Figure 2) (3) N (w3 ; PS )j 1 j+ = N (w1 ; P S )j 1 j+ (4) N (w2 ; PS ) j 2 j+ = j u2 j = N (w1 ; P S ) j 1 j+ = j u1 j (5) N (w2 ; PS )j 2 j+ = N (w4 ; P S )j 2 j+ (6) N (w3 ; PS ) j 1 j+ = j u1 j = N (w4 ; P S ) j 2 j+ = j u2 j p
p
i
i
i
i
i
i
p
p
p
i
i
p
p
p
p
i
i
i
m
u
l
> u
l
u
l
> u
l
u
>m
l
> u
u
l
l
> u
l
N(w1,PS) >|u1|+l
[...] (9) $\forall i > i_3$ ($j_i^1 - |u_1| = j_i^2 - |u_2| > l \wedge j_i^3 - |u_1| = j_i^4 - |u_2| > l$).
By (2), (8), (9), we have
$\forall j_i^4, j_{i+1}^4 \in N(w_4, PS)$ s.t. $i + 1 \le i_3$:
$sub(w_4, j_i^4 + 1, j_{i+1}^4) = sub(u_2 v, j_i^4 + 1, j_{i+1}^4) = sub(u_2 v, j_i^2 + 1, j_{i+1}^2) = sub(w_2, j_i^2 + 1, j_{i+1}^2) \in L_i$,
$\forall j_i^4, j_{i+1}^4 \in N(w_4, PS)$ s.t. $i > i_3$:
$sub(w_4, j_i^4 + 1, j_{i+1}^4) = sub(vy, j_i^4 + 1 - |u_2|, j_{i+1}^4 - |u_2|) = sub(vy, j_i^3 + 1 - |u_1|, j_{i+1}^3 - |u_1|) = sub(w_3, j_i^3 + 1, j_{i+1}^3) \in L_i$.
Therefore, it only remains to show $sub(w_4, j_{i_3}^4 + 1, j_{i_3+1}^4) \in L_{i_3}$. We consider the two cases $j_{i_3+1}^4 \le |u_2| + k + l$ and $j_{i_3+1}^4 \ge |u_2| + k + l + 1$.
In the case $j_{i_3+1}^4 \le |u_2| + k + l$, we have $j_{i_3+1}^2 = j_{i_3+1}^4 \le |u_2| + k + l$, since $N(w_2, PS)_{\le |u_2|+k+l} = N(w_4, PS)_{\le |u_2|+k+l}$ holds. Therefore, by (2) and (8), it holds that
$sub(w_4, j_{i_3}^4 + 1, j_{i_3+1}^4) = sub(u_2 v, j_{i_3}^4 + 1, j_{i_3+1}^4) = sub(u_2 v, j_{i_3}^2 + 1, j_{i_3+1}^2) = sub(w_2, j_{i_3}^2 + 1, j_{i_3+1}^2) \in L_{i_3}$.
Let us consider the case $j_{i_3+1}^4 \ge |u_2| + k + l + 1$. In this case, $j_{i_3+1}^2 \ge |u_2| + k + l + 1$ holds, since otherwise $j_{i_3+1}^4 = j_{i_3+1}^2 \le |u_2| + k + l$ would hold by the relation $N(w_2, PS)_{\le |u_2|+k+l} = N(w_4, PS)_{\le |u_2|+k+l}$. Therefore, we have $j_{i_3+1}^1, j_{i_3+1}^3 \ge |u_1| + k + l + 1$ by (9). Let
$q_1 = sub(u_1 v, j_{i_3}^1 + 1, |u_1| + l)$,
$q_2 = sub(u_2 v, j_{i_3}^2 + 1, |u_2| + l)$,
$z_1 = sub(vx, k + l + 2, j_{i_3+1}^1 - |u_1|)$,
$z_2 = sub(vy, k + l + 2, j_{i_3+1}^3 - |u_1|)$, and
$r = sub(v, l + 1, k + l + 1)$.
Then, by (2), (8), (9), we have
$sub(w_1, j_{i_3}^1 + 1, j_{i_3+1}^1) = q_1 r z_1 \in L_{i_3}$,
$sub(w_2, j_{i_3}^2 + 1, j_{i_3+1}^2) = q_2 r z_1 \in L_{i_3}$,
$sub(w_3, j_{i_3}^3 + 1, j_{i_3+1}^3) = q_1 r z_2 \in L_{i_3}$, and
$sub(w_4, j_{i_3}^4 + 1, j_{i_3+1}^4) = q_2 r z_2$.
Here note that $|r| = k + 1$. Therefore, by Lemma 8 and $L_{i_3} \in LTS_k$, we have $sub(w_4, j_{i_3}^4 + 1, j_{i_3+1}^4) = q_2 r z_2 \in L_{i_3}$. This completes the proof. □
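The index notation used in the proof above can be made concrete with a small sketch. It assumes that $sub(w, i, j)$ denotes the substring from the $i$-th to the $j$-th symbol (1-indexed, inclusive), which is consistent with $|r| = k + 1$ for $r = sub(v, l+1, k+l+1)$. The function names `sub`, `at_most`, `greater` and `shift`, the string `v`, and the position set `N` are hypothetical and chosen only for illustration; the set `N` is not computed from any actual parse.

```python
def sub(w, i, j):
    """Substring of w from the i-th to the j-th symbol (1-indexed, inclusive)."""
    return w[i - 1:j]

def at_most(N, m):
    """N_{<=m}: the elements of N that are less than or equal to m."""
    return {e for e in N if e <= m}

def greater(N, m):
    """N_{>m}: the elements of N that are greater than m."""
    return {e for e in N if e > m}

def shift(N, m):
    """N/m = {e - m | e in N}."""
    return {e - m for e in N}

# Hypothetical parameters: k = 1, l = 2, and a string v with |v| = k + 2l = 5.
k, l = 1, 2
v = "abcde"
head = sub(v, 1, l)                    # first l symbols of v
r = sub(v, l + 1, k + l + 1)           # middle block of length k + 1
tail = sub(v, k + l + 2, k + 2 * l)    # remaining l - 1 symbols of v

# Hypothetical set of positions, used only to exercise the set notation.
N = {0, 3, 7, 9}
low = at_most(N, 3)                    # {0, 3}
high_shifted = shift(greater(N, 3), 3) # {4, 6}
```

With these parameters $v$ splits as `head + r + tail`, which mirrors how the segment $q_p\, r\, z_p$ in the proof straddles the common factor $v$.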
Theorem 9 [Ang82] The class of $k$-reversible languages is identifiable in the limit from positive data using $k$-reversible automata with the conjectures updated in polynomial time.
The learning algorithm for $Rev(k)$ is called $k$-RI in [Ang82].
Theorem 10 $(k, l)$-CLTS is identifiable in the limit from positive data using $(k+2l)$-reversible automata with the conjectures updated in polynomial time.
Proof By Theorem 8 and Theorem 9, we have only to apply the learning algorithm $(k+2l)$-RI for learning $(k, l)$-CLTS. □
Note here that the $(k+2l)$-RI algorithm does not always output a conjecture $g_i$ such that $L(g_i) \in (k, l)$-CLTS. Therefore, Theorem 10 does not imply the existence of a class preserving efficient learning algorithm, but only the class comprising efficient learnability of the class $(k, l)$-CLTS. It is an open question whether class preserving efficient learnability holds for $(k, l)$-CLTS.
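As a rough illustration of the kind of procedure invoked in the proof of Theorem 10, the following is a naive fixed-point sketch in the spirit of Angluin's $k$-RI: build the prefix tree acceptor of the sample, then repeatedly merge two blocks of states that either violate determinism, or share a common $k$-leader while both being final or both having the same successor on some symbol. This is not the polynomial-time implementation of [Ang82]; the function name `k_ri` and the union-find bookkeeping are assumptions made for the example. For $(k, l)$-CLTS one would call it with parameter $k + 2l$.

```python
from itertools import combinations

def k_ri(sample, k):
    """Naive sketch: return an acceptor for a k-reversible generalization of `sample`."""
    prefixes = {w[:i] for w in sample for i in range(len(w) + 1)}
    parent = {p: p for p in prefixes}  # union-find over prefix tree states

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def merge_one():
        """Merge one violating pair of blocks; return False if none is left."""
        trans = {(find(p[:-1]), p[-1], find(p)) for p in prefixes if p}
        # Determinism: two different successors of one block on one symbol.
        succ = {}
        for s, a, t in trans:
            if succ.setdefault((s, a), t) != t:
                parent[t] = succ[(s, a)]
                return True
        # k-leaders: strings of length k labelling a path that ends in a block.
        leaders = {find(p): {""} for p in prefixes}
        for _ in range(k):
            step = {b: set() for b in leaders}
            for s, a, t in trans:
                step[t] |= {u + a for u in leaders[s]}
            leaders = step
        # Candidate groups: all final blocks, and all a-predecessors of a block.
        groups = {"final": {find(w) for w in sample}}
        for s, a, t in trans:
            groups.setdefault((a, t), set()).add(s)
        for blocks in groups.values():
            for b1, b2 in combinations(sorted(blocks), 2):
                if leaders[b1] & leaders[b2]:
                    parent[b2] = b1
                    return True
        return False

    while merge_one():
        pass

    delta = {(find(p[:-1]), p[-1]): find(p) for p in prefixes if p}
    finals = {find(w) for w in sample}

    def accepts(word):
        if not prefixes:
            return False
        state = find("")
        for a in word:
            state = delta.get((state, a))
            if state is None:
                return False
        return state in finals

    return accepts

sample = ["ab", "cb", "abd"]
accepts_1 = k_ri(sample, 1)  # 1-reversible conjecture
accepts_0 = k_ri(sample, 0)  # 0-reversible conjecture
```

On this sample the 1-reversible conjecture accepts `cbd` (since `ab` and `cb` share the factor `b` of length 1 before a common suffix) but not `abdd`, whereas the 0-reversible conjecture generalizes further and also accepts `abdd`.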
5 Concluding Remarks
In [Yok90], Yokomori presented a learning algorithm for the class LTS= from positive data. In the current paper, we have introduced some extended classes, $CLT_k$, $CLTS_k$, $(k, l)$-CLTS, etc., by concatenating locally testable languages, and established their relationships. These classes, $CLT_k$, $CLTS_k$ and $(k, l)$-CLTS, are proved to properly include LTS= and to be identifiable in the limit from positive data. In particular, we have shown that the class $(k, l)$-CLTS is identifiable in the limit from positive data using reversible automata with the conjectures updated in polynomial time, which is based on a close relationship between the class $(k, l)$-CLTS and the class of $(k+2l)$-reversible languages. On the other hand, in [YIK94], we applied the learning algorithm for LTS= to the problem of identifying the α-chain region of amino acid sequences of hemoglobin and obtained an overall success rate of more than 90% correct prediction for unknown α-chain regions of amino acid sequences. This work, motivated by the theoretical result of [Hea87], is interesting in that it bridges the gap between the mathematical analysis of the splicing process of DNA sequences and formal language theory. This paper presents some theoretical results on the learnability of some subclasses of concatenations of locally testable languages from positive data. The experimental work [YIK94] strongly suggests that the class of concatenations of locally testable languages may effectively model some classes of amino acid or DNA sequences where sequential locality changes exist. In fact, the notion of local parsability is strongly motivated by the biological observation that exon-intron boundaries are characterized by a finite set of pairs of base sequences.
Therefore, Theorem 10 suggests that Angluin's $k$-RI algorithm has some potential to find sequentially changing common localities in given samples of amino acid or DNA sequences, provided that local feature changes of biological data are locally parsable. These application issues in amino acid or DNA sequence analysis are left for future work. It also remains of theoretical interest to find an efficient learning algorithm for $(k, l)$-CLTS in which the representation class $R$ is class preserving with respect to the target class $(k, l)$-CLTS.
Acknowledgement
We would like to thank anonymous referees for their valuable comments. This work was supported in part by Grants-in-Aid for Scientific Research No.06780302 and No.06249202 from the Ministry of Education, Science and Culture, Japan.
References
[Ang80] D. Angluin. Inductive inference of formal languages from positive data. Information and Control, vol.45, pp.117-135, 1980.
[Ang82] D. Angluin. Inference of reversible languages. Journal of the ACM, vol.29, pp.741-765, 1982.
[BB75] L. Blum and M. Blum. Toward a Mathematical Theory of Inductive Inference. Information and Control, vol.28, pp.125-155, 1975.
[Gol67] E. Mark Gold. Language identification in the limit. Information and Control, vol.10, pp.447-474, 1967.
[Hea87] T. Head. Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors. Bulletin of Mathematical Biology, vol.49, pp.737-759, 1987.
[LZ93] S. Lange and T. Zeugmann. Language Learning in Dependence on the Space of Hypotheses. Proc. of 6th Annual ACM Workshop on Computational Learning Theory, pp.127-136, 1993.
[MS93] T. Moriyama and M. Sato. Properties of Language Classes with Finite Elasticity. Proc. 4th Workshop on Algorithmic Learning Theory, Lecture Notes in Artificial Intelligence 744, pp.187-196, 1993.
[MSW90] T. Motoki, T. Shinohara and K. Wright. The correct definition of finite elasticity: corrigendum to identification of unions. Proc. of 4th Workshop on Computational Learning Theory, pp.375-375, 1991.
[MP71] R. McNaughton and S. Papert. Counter-Free Automata. MIT Press, Cambridge, MA, 1971.
[Yok90] T. Yokomori. A note on the polynomial-time identification of strictly local languages in the limit. Report CSIM90-03, Dept. of Comput. Sci. and Inf. Math., Univ. of Elect.-Communi., 1990.
[YIK94] T. Yokomori, N. Ishida and S. Kobayashi. Learning Local Languages and Its Application to Protein α-chain Identification. Proc. of 27th Hawaii International Conference on System Sciences, Hawaii, pp.113-122, January 1994.
[Wri89] K. Wright. Identification of unions of languages drawn from an identifiable class. Proc. of 2nd Workshop on Computational Learning Theory, pp.328-333, 1989.