LOCALLY, LOCALLY THRESHOLD AND PIECEWISE TESTABLE LANGUAGES. SOME IMPLEMENTED ALGORITHMS D. Belostotski
D. Kravtsov
A. Shemshurenko
A.N. Trahtman
M. Sobol
Sh. Yakov
Bar-Ilan University, Dep. of Math. and CS, 52900,Ramat Gan, Israel email:
[email protected]
Abstract
We implement a set of procedures for deciding whether or not a language given by its minimal automaton or by its syntactic semigroup is locally testable, threshold locally testable, strictly locally testable, or piecewise testable. The level of local testability of syntactic semigroup is also found. Some new eective polynomial time algorithms are used. These algorithms have been implemented as a C=C ++ package (TESTAS).
Key words:Language, Algorithm, Automaton, Semigroup, Local testability, Piecewise testability, Local threshold testability
Introduction There is a deep connection between problems in string pattern matching and nite automata. A number of classical algorithms in everyday use in text editors and other pattern matching software actually compute nite automata from descriptions of patterns as regular expressions. The class of languages arising from nite patterns is the class of locally testable languages and some their generalizations. The concept of local testability is connected with languages, nite automata and semigroups. Membership of a long text in such a language just depends on a scan of short subpatterns of the text. It is best understood in terms of a kind of computational procedure used to classify a two-dimensional image: a window of relatively small size is moved around on the image and a record is made of the various attributes of the image that are detected by what is observed through the window. No record is kept of the order in which the attributes are observed, where each attribute occurs, or how many times it occurs. We say that a class of images is locally testable if a decision about whether a given image belongs to the class can be made simply on the basis of the set of attributes that occur. Locally testable and piecewise testable languages are the best known subclasses of star-free languages. These classes with generalizations de ne two important directions in investigations of these languages. 1
The concept of local testability was introduced by McNaughton and Papert [?] and by Brzozowski and Simon [?]. Locally testable automata have a wide spectrum of applications. Regular languages and picture languages can be described with the help of a strictly locally testable languages [?], [?]. Local automata (kind of locally testable automata) are heavily used to construct transducers and coding schemes adapted to constrained channels [?]. Locally testable languages are used in the study of DNA and informational macromolecules in biology [?]. The locally threshold testable languages were introduced by Beauquier and Pin [?]. These languages generalize the concept of locally testable language and have been studied extensively in recent years. An important reason to study locally threshold testable languages is their applicability in pattern recognition [?], [?], [?]. Stochastic locally threshold testable languages, also known as n grams are used in pattern recognition, particular in speech recognition, both in acousticphonetics decoding as in language modeling [?]. An application of these languages to the inference problem [?] is studied too. Piecewise testable languages are the nite Boolean combinations of languages of the form A a1 A a2 A :::A a A where k 0, a is a letter from the alphabet A and A is the free monoid over A [?]), [?].. There are several systems for manipulating automata and semigroups ( for survey, story and details see [?] and [?]. The list of the systems is following Froidure and Pin paper [?] and preprint of Caron [?]: 1. REGPACK [?] 2. AUTOMATE [?] 3. AMoRE [?] 4. Grail [?] 5. The FIRE Engine [?] 6. LANGAGE [?]. 7. APL package [?]. 8. Froidure and Pin package [?]. 9. Sutner package [?]. Some algorithms concerning distinct kinds of testability of nite automata were implemented by Caron [?], [?]. One of them was improved and implemented in our package. i
k
1 De nitions and results Let be an alphabet and let + denote the free semigroup on . If w 2 + , let jwj denote the length of w. Let k be a positive integer. Let i (w) [t (w)] denote the pre x [sux] of w of length k or w if jwj < k. Let F (w) denote the set of factors of w of length k with at least j occurrences. A language L [a semigroup S ] is called l-threshold k-testable if there is an alphabet [and a surjective morphism : + ! S ] such that for all u, v 2 + , if i 1 (u) = i 1 (v ), t 1 (u) = t 1 (v ) and F (u) = F (v ) for all j l, then either both u and v are in L or neither is in L [u = v]. An automaton is l-threshold k-testable if the automaton accepts a l-threshold k-testable language [the syntactic semigroup of the automaton is l-threshold k-testable]. A language L [a semigroup S , an automaton A] is locally threshold testable if it is l-threshold k-testable for some k and l. The syntactic characterization of locally threshold testable languages was given by Beauquier and Pin [?]: Given the syntactic semigroup S of the language L, we form a graph G(S) as follows. The vertices of G(S) are the idempotents of S , and the edges from e to f are the elements of the k
k
k;j
k
k
k
k
k;j
k;j
2
form esf . A language L is locally threshold testable if and only if S is aperiodic and for any two nodes e, f and three edges p, q, r such that p and q are edges from e to f and r is an edge from f to e we have prq = qrp. We use in our package a polynomial time algorithm for the local threshold testability problem for syntactic semigroup of the language [?]. In spite of the fact that we must verify an identity in ve variables, the time complexity of the algorithm is O(n3 ). Local testability can be considered as a special case of local l-threshold testability for l = 1. A locally testable language L is a language with the property that for some nonnegative integer k, called the order or the level of local testability, whether or not a word u is in the language L depends on (1) the pre x and sux of the word u of length k 1 and (2) the set of intermediate substrings of length k of the word u. For a given k the language is called k-testable. The de nition of strictly locally testable language is analogous, only the length of pre x and sux is equal to k, in the de nition of strongly locally testable language pre x and sux are omitted at all. Kim, McNaughton and McCloskey [?] have found necessary and sucient conditions of local testability and a polynomial time algorithm for local testability problem for automata based on these conditions. Kim and McNaughton [?] have shown that computing the order k of a locally testable deterministic automaton is NP-hard. The situation in semigroups is more favorable. We implement in our package a polynomial time algorithm of worst case asymptotic cost O(n2 ) for nding the order of local testability for a given semigroup [?]. A polynomial time algorithm of order O(n2 ) for local testability problem for semigroups [?] is implemented in our package as well. The class of locally testable semigroups coincides with the class of strictly locally testable semigroups [?], whence the same algorithm of order O(n2 ) checks strictly locally testable semigroups. By n is denoted here the size of the semigroup. Piecewise testable languages were characterized by Simon [?]: A language is piecewise testable i its syntactic monoid is J -trivial (meaning that distinct elements of monoid generate distinct ideals). Stern [?] modi ed these necessary and sucient conditions and described a polynomial time algorithm to verify piecewise testability. The algorithm was implemented by Caron [?]. We use in our package an algorithm to verify piecewise testability of deterministic nite automaton of order O(n2 ) [?]. In comparison, Stern's algorithm for piecewise testability [?] has worst-case complexity O(n5 ). The parameter n is here the sum of number of states and number of edges of the state transition graph of the automaton (n can be also considered as the product of the number of states by the size of the alphabet). We implement also an algorithm to verify piecewise testability of a nite semigroup of order O(n2 ) where n is the size of the semigroup [?]. The family of nite strongly locally testable semigroups is exactly the intersection of the family of nite locally testable semigroups and family of piecewise testable semigroups [?]. So, our package checked strongly locally testable semigroups too. The time complexity of the algorithm is O(n2 ). Regretfully, it is not true that a language is strongly locally testable if and only if its syntactic semigroup is strongly locally testable [?]. 3
2 Description of the package TESTAS The package includes three programs. They analyze: 1) an automaton of the language presented as oriented labeled graph; 2) syntactic semigroup of the language; and nd 3) direct product of two semigroups presented by their Cayley graph. The input of the graph algorithms is a directed graph of type LEDA.graph. The package creates an object of type graph and initializes it to the empty directed graph G. This constructor speci es the numbers of free data slots in the nodes and edges of G that can be used for storing the entries of node and edge arrays. The User has opportunity to de ne the number of nodes and size of alphabet of edge labels, to change them, to add and delete edges, nodes and labels. The program checks piecewise testability of a nite automaton and nds strongly connected components of the graph of the automaton. The input of semigroup algorithms is Cayley graph of the semigroup presented by the table: elements X generators where the elements of the semigroup are represented by integers from 0 to n 1 with generators in beginning. i-th line of the table is a list of products of i-th element on all generators. Set of generators is not necessarily minimal, therefore the multiplication table of the semigroup (Cayley table) is acceptable too. The input le may be ordinary txt le. Comments without numerals may be placed in the le as well.. The program checks local testability, local threshold testability and piecewise testability of syntactic semigroup of the language. Strictly locally testable and strongly locally testable semigroups are veri ed as well. The level of local testability of syntactic semigroup is also found. The User can verify aperiodity and associative low too. There exists possibility to change values of products in the table of the Cayley graph. Maximal size of considered semigroups was about some thousands elements. Big size semigroups are obtained with help of the direct product program.
References [1] M.-P. Beal, J. Senellart, On the bound of the synchronization delay of local automata, Theoret. Comput. Sci. 205, 1-2(1998), 297-306. [2] D. Beauquier, J.E. Pin, Factors of words, Lect. Notes in Comp. Sci. Springer, Berlin, 372(1989), 63-79. [3] J.-C. Birget, Strict local testability of the nite control of two-way automata and of regular picture description languages, J. of Alg. Comp. 1, 2(1991), 161-175. [4] J.A. Brzozowski, I. Simon, Characterizations of locally testable events, Discrete Math. 4, (1973), 243-271. [5] P. Caron, LANGAGE: A Maple package for automaton characterization of regular languages, Springer, Lect. Notes in Comp. Sci. 1436(1998), 46-55. 4
[6] P. Caron, Families of locally testable languages, Theoret. Comput. Sci., 242(2000), 361-376. [7] J.M. Camparnaud, G. Hansel, Automate, a computing package for automata and nite semigroups, J. of Symbolic Comput.. 12(1991), 197-220. [8] G. Cousineau, J.F. Perrot, J.M. Riet, APL programs for direct computationof a nite semigroup, APL Congress 73, Amsterdam, North Holand Publ. Co., (1973), 67-74. [9] V.Froidure,J.-E. Pin, Algorithms for computing nite semigroups. F. Cucker and M. Shub eds.,Foundations of Comp. Math. Springer, (1997), 112-126. [10] K.S. Fu, Pattern recognition and applications Prentice Hall, N.Y., 1982. [11] R.C. Gonzales, M.G. Thomason, Syntactic pattern recognition. An introduction Addison Wesley, N.Y., 1978. [12] T. Head, Formal languages theory and DNA: an analysis of the generative capacity of speci c recombinant behaviors, Bull. Math. Biol. 49(1987), 4, 739-757. [13] F. Hinz, Classes of picture languages that cannot be distinguished in the chain code concept and deletion of redundant retreats, Springer, Lect. Notes in Comp. Sci. 349(1990), 132-143. [14] S. Kim, R. McNaughton, R. McCloskey, A polynomial time algorithm for the local testability problem of deterministic nite automata, IEEE Trans. Comput. 40(1991) N10, 10871093. [15] S. Kim, R. McNaughton, Computing the order of a locally testable automaton, Lect. Notes in Comp. Sci. Springer, 560(1991), 186-211. [16] E. Leiss, Regpack, an interactive package for regular languages and nite automata , Research report CS-77-32, Univ. of Waterloo, 1977. [17] S. W. Margolis, J.E. Pin, Languages and inverse semigroups, Lect. Notes in Comp. Sci, Springer 199(1985), 285-299. [18] O. Matz, A. Miller, A. Pottho, W. Thomas, E.Valkema, Report on the program AMoRE, Inst. inf. und pract. math., Christian-Albrecht Univ. Kiel, 1995. [19] R. McNaughton, S. Papert,Counter-free automata M.I.T. Press Mass., 1971. [20] D. Raymond, D. Wood, Grail, a C + + library for automata and expressions, J. of Symb. Comp., 17 (1994), 341-350. [21] J. Ruiz, S. Espana, P. Garcia, Locally threshold testable languages in strict sense: Application to the inference problem. Springer, Lect. Notes in Comp. Sci 1433(1998), 150-161. [22] C. Selmi, Over testable languages. Theoret. Comput. Sci.,161(1996), 157-190. [23] I. Simon, Piecewise testable events, Springer, Lect. Notes in Comp. Sci., 33(1975), 214-222. 5
[24] J. Stern, Complexity of some problems from the theory of automata. Inf. and Control, 66(1985), 163-176. [25] K. Sutner, Finite State Mashines and Syntactic Semigroups, The Mathematica J. 2(1991), 78-87. [26] A.N. Trahtman, A polynomial time algorithm for local testability and its level. Int. J. of Algebra and Comp. v. 9, 1(1998), 31-39. [27] A.N. Trahtman,Checking local threshold and piecewise testability, (submitted). [28] A.N. Trahtman,Verifying piecewise testability of nite deteministic automata, CIAA2000, London, Ontario, Canada, Yuly 24-25, 2000. [29] E. Vidal, F. Casacuberta, P. Garcia, Grammatical inference and automatic speech recognition. In Speech recognition and coding Springer, 1995, 175-191. [30] B.W. Watson, The design and implementation of the FIRE Engine: A C + + toolkit for Finite Automata and Regular Expressions, Comp. Sci. Rep. 94722, Endhoven Univ. of Techn. 1994. [31] Y. Zalcstein, Locally testable language, J. Comp. System Sci. 6(1972), 151-167.
6