Transformations that Preserve Learnability*

Andris Ambainis and Rūsiņš Freivalds

Institute of Mathematics and Computer Science, University of Latvia, Raina bulv. 29, Riga, Latvia, e-mail: {ambainis, rusins}@cclu.lv

Abstract. We consider transformations (performed by general recursive operators) mapping recursive functions into recursive functions. These transformations can be considered as mapping sets of recursive functions into sets of recursive functions. A transformation is said to preserve the identification type I if it always maps I-identifiable sets into I-identifiable sets. There are transformations preserving FIN but not EX, and there are transformations preserving EX but not FIN. However, transformations preserving EXi always preserve EXj for j < i.

1 Introduction

In his inaugural lecture (1872), delivered before taking up a professorship at the University of Erlangen, Felix Klein (1849-1925) presented an astonishing program for remaking geometry. The listeners were confused and even shocked. In this program (nowadays known as the Erlangen program) geometry was considered as "what remains invariant under motion transformations". It seemed unbelievable that a geometry textbook could have no pictures in it. However, a century later there are more geometry books containing no illustrations than containing any.

The best description of the algebraic (rather, group-theoretic) approach is found in the book [18] by Hermann Weyl (1885-1955). To cut a long story short, the approach is as follows: "If you are to find deep properties of some object, consider all the natural transformations that preserve your object (i.e. under which the object remains invariant). These transformations, when supplied with the algebraic operation of composition, form a group. Consider the automorphisms of this group. The automorphisms form another group. Consider the generating elements of these two groups, and consider the structure of their subgroups. You will get a huge number of properties of your object that are most essential for understanding it."

The algebraic approach made its way into Theoretical Computer Science long ago as well, and there have been some spectacular justifications of it. One of the most convincing examples (at least for the authors of

* This research was supported by Latvia's Science Council grant 93.599. The first author was supported in part by the scholarship "SWH Izglītībai, Zinātnei un Kultūrai" from Latvia's Education Foundation.


this paper) is the history of the equivalence problem for multi-tape one-way deterministic automata [7]. For two decades this problem was open, and all the many attempts to solve it failed. Finally, the problem was solved by Tero Harju and Juhani Karhumäki by a reduction to a theorem of Samuel Eilenberg in algebraic automata theory.

We have started an algebraic approach to inductive inference, closely following the guidelines of F. Klein and H. Weyl. However, we are only at the very starting point. We have considered transformations defined by general recursive operators. These transformations map functions into functions, which naturally induces a mapping from sets of functions into sets of functions. We are interested in finding out whether a transformation always maps an identifiable set of functions into an identifiable set of functions. In the positive case we say that the transformation preserves identifiability.

There are many known identification types. We wished to find the first relations between the groups of transformations preserving different identification types. Let T(I) be the set of all transformations preserving identification type I. One might expect that, for two identification types, I1 ⊆ I2 implies either T(I1) ⊆ T(I2) or T(I2) ⊆ T(I1). We found that this is not true: FIN ⊂ EX, but neither T(FIN) ⊆ T(EX) nor T(EX) ⊆ T(FIN). On the other hand, for all i and j, if i < j, then T(EXj) ⊆ T(EXi).

2 Related Work

Barzdins' conjecture and robustness. Many proofs in inductive inference are based on self-referential arguments (arguments involving classes of functions such that the values of a function contain a program computing the function). It has been argued that no interesting function class can have such a structure and, hence, that these proof methods are not natural. In the 1970s, Barzdins suggested a way to formulate such objections. Namely, he suggested that, if a class of functions U is identifiable only due to a self-referential property, then there exists a recursive operator Φ such that Φ(U) is not identifiable. He conjectured that one of the early results in inductive inference (the separation between EX and Popperian EX) would fail if the requirement that Φ(U) be identifiable for every Φ was introduced. Several authors [6, 13, 19] have studied Barzdins' conjecture. Fulk [6] proved that Barzdins' conjecture is false and considered separations between learning classes which are robust, i.e. preserved by any transformation Φ. Similarly to our work, these papers considered identifiable sets of functions and transformations which preserve or do not preserve learnability. The difference from this paper is that they focused on classes of functions which are learnable under any transformation, while we focus on transformations which preserve the learnability of function classes.

Intrinsic complexity. Freivalds, Kinber and Smith [5] used transformations to define reducibilities between learning problems and to analyze the intrinsic


complexity of learning problems. For language learning, a similar approach was considered by Jain and Sharma [10, 11].

Transformations which preserve PAC-learnability. Pitt and Warmuth have studied transformations which preserve learnability in the framework of PAC-learning and the polynomial prediction model [12, 16].

3 Preliminaries

3.1 Notation

ℕ = {0, 1, 2, ...} denotes the set of natural numbers. ⊆ denotes inclusion, ⊂ denotes proper inclusion. φ denotes a fixed acceptable programming system for partial recursive functions [17, 14]; we call it the φ-system. φi denotes the function computed by the program with number i in the φ-system. Any recursion-theoretic notation not explained below is from [17].

3.2 Inductive inference of recursive functions

As is traditional in the study of inductive inference, we will use the basic model of learning in the limit developed by Gold [8]. As targets of the learning procedure, we will use the recursive functions. Several authors have argued that this paradigm is sufficiently general to model, via suitable encodings, a large variety of real-world learning situations [1].

In the sequel, we will consider the learning of a particular recursive function f: ℕ → ℕ. An Inductive Inference Machine (abbreviated IIM) is an algorithmic device that receives as input the values f(0), f(1), ... and tries to guess some program computing the function f. An IIM outputs a sequence of conjectures, each computing some partial recursive function. The function f is identified by an IIM if the sequence of conjectures produced when using the values of f as input converges to a program computing f. The set of recursive functions U is identified by an IIM M if M identifies each function in U. The set U is identifiable if there exists an IIM M which identifies it. The collection of all identifiable sets will be denoted EX.

An IIM makes a mindchange when it outputs a conjecture that is syntactically different from the previous one. Two cases have been considered. The first, called learning in the limit, is when the number of mindchanges is finite but not bounded a priori. This notion is precisely the same as EX-identification as defined above. The second case arises when the number of allowed mindchanges is bounded a priori by some constant n. In this case, the learning process is assumed to stop when (and if) the n-th mindchange is made. This restricted form of inference is called EXn-identification. The classes EX0, EX1, ... are defined analogously to the class EX. A well-known result from [3] is that EX0 ⊂ EX1 ⊂ ... ⊂ EXn ⊂ EXn+1 ⊂ ... ⊂ EX. EX0 is also denoted FIN. In FIN-identification, the IIM can output only one conjecture on input f and cannot change it later.
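As a purely illustrative sketch of these definitions (not from the paper), the following Python fragment models an IIM as a map from initial segments to conjectures, using identification by enumeration over a hypothetical finite list of candidate total functions; the names `make_enumeration_iim` and `candidates` are our own.

```python
# Illustrative sketch: an IIM as a map from an initial segment
# [f(0), ..., f(n)] to a conjecture.  The "programs" here are Python
# callables; the machine conjectures the index of the first candidate
# consistent with the data seen so far (identification by enumeration).

def make_enumeration_iim(candidates):
    def iim(segment):
        for idx, h in enumerate(candidates):
            if all(h(x) == v for x, v in enumerate(segment)):
                return idx  # conjecture: first consistent candidate
        return None         # no consistent candidate yet
    return iim

# A sample class: f_i(0) = i and f_i(x) = 0 for x > 0.
candidates = [lambda x, i=i: i if x == 0 else 0 for i in range(10)]
iim = make_enumeration_iim(candidates)

f3 = candidates[3]
conjectures = [iim([f3(x) for x in range(n + 1)]) for n in range(5)]
# For this class the very first conjecture is already correct and never
# changes (0 mindchanges): a single value f(0) determines the function.
```

The sequence of conjectures converges, and counting syntactic changes in it is exactly the mindchange count bounded in EXn-identification.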


3.3 Recursive operators

In this paper, we shall study the transformations which preserve identifiability. We require that transformations be computable; more formally, we require that they be general recursive operators [17]. A general recursive operator is an algorithmic mapping which maps total functions to total functions and is defined for all recursive functions. Below, we give a more formal definition.

Consider a Turing machine M that reads from the input pairs of natural numbers (k, m) and, from time to time, outputs pairs (k′, m′). This machine defines an enumeration operator Φ which maps sets of pairs to sets of pairs: Φ(S) is the set of pairs which the machine outputs if it receives S as the input.

Let P be the set of all partial functions of one variable. An enumeration operator induces a mapping Ψ from P to P. The mapping Ψ is a partial recursive operator. If f is a partial function, then Ψ(f) is undefined if the machine M, on the graph of f (the set of pairs (k, m) such that f(k) = m), outputs two pairs (k′, m1) and (k′, m2) such that m1 ≠ m2 (i.e. outputs both that Ψ(f)(k′) = m1 and Ψ(f)(k′) = m2). Otherwise, Ψ(f) is the partial function such that Ψ(f)(k′) = m′ if and only if M outputs (k′, m′) on the input f. A partial recursive operator Ψ is a recursive operator if Ψ(f) is defined for all partial functions of one variable, i.e. if there is no input on which M outputs both (k, m1) and (k, m2) for m1 ≠ m2. A recursive operator Ψ is a general recursive operator if, for any total (even nonrecursive) function f, Ψ(f) is a total function. For more information on recursive and general recursive operators refer to [17].

In this paper, we consider general recursive operators as transformations on the set of all recursive functions and study them in the context of inductive inference.

τ and σ denote initial segments of total recursive functions. |τ| denotes the length of τ. τ◇y denotes the concatenation of y at the end of τ, i.e. τ′ = τ◇y is defined as follows:

τ′(x) = τ(x), if x < |τ|;
τ′(x) = y, if x = |τ|;
τ′(x) is undefined otherwise.

τ1◇τ2 denotes the concatenation of τ1 and τ2, i.e., τ = τ1◇τ2 is defined by

τ(x) = τ1(x), if x < |τ1|;
τ(x) = τ2(x − |τ1|), if |τ1| ≤ x < |τ1| + |τ2|;
τ(x) is undefined otherwise.

Let τ = (m0, ..., mi) and σ = (m′0, ..., m′j) be initial segments of total recursive functions. Ψ(τ) = σ denotes that Ψ maps τ to σ (i.e., if f(0) = m0, ..., f(i) = mi, then Ψ(f)(0) = m′0, ..., Ψ(f)(j) = m′j).
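To make the operator definitions concrete, here is a minimal sketch (ours, not the paper's): a general recursive operator modeled as a higher-order function where each output value Ψ(f)(x) consults only finitely many values of f, so total inputs yield total outputs. The operator chosen below is an arbitrary illustrative example.

```python
# Sketch of a general recursive operator: Psi(f)(x) = f(x) + f(x + 1).
# Each output value depends on a finite part of the graph of f,
# mirroring the enumeration-operator definition, and Psi(f) is total
# whenever f is total.

def psi(f):
    return lambda x: f(x) + f(x + 1)

def image(operator, function_class):
    # The induced transformation on sets of functions.
    return [operator(f) for f in function_class]

succ = lambda x: x + 1   # f(x) = x + 1
g = psi(succ)            # g(x) = (x + 1) + (x + 2) = 2x + 3
values = [g(x) for x in range(4)]
```

The induced map `image` is exactly the passage from functions to sets of functions used throughout the paper.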

4 Learning criteria which are invariant under all transformations

Let K be an identification type (i.e. the collection of all sets identifiable according to some criterion of success, e.g. EX, FIN, EXi, etc.).


There are two trivial cases when, for any U ∈ K and any general recursive operator Φ, Φ(U) ∈ K:

1. K contains all finite sets of functions and no infinite sets of functions (for example, K is the identification of minimal programs in some acceptable programming systems [4]). Then, for any U ∈ K, Φ(U) is finite and, hence, Φ(U) ∈ K.
2. The set of all recursive functions is identifiable (BC*-identification [3] or learning with confidence [2], for example). Then, for any U ∈ K, Φ(U) is a subset of the set of all recursive functions and, hence, is learnable.

In these two cases, K-identifiability is preserved by any recursive operator Φ. However, neither of these two examples is natural. In the first case, the learning technique is so weak that it allows learning only finitely many functions. In the second case, it is so strong that it allows learning everything.

There is a third, more natural case when K-learnability is invariant under any recursive operator Φ. An IIM is Popperian if all its conjectures (even the incorrect ones) are total recursive functions [3]. A set of functions U is PEX-identifiable if it is EX-identified by some Popperian IIM M.

Theorem 1 [19] If U ∈ PEX and Φ is a recursive operator, then Φ(U) ∈ PEX.

Proof. It can be proved that U ∈ PEX if and only if U is a recursively enumerable class of total recursive functions. For any r.e. class U, Φ(U) is an r.e. class of total recursive functions, too. Hence, if U ∈ PEX, then Φ(U) ∈ PEX. □

It is known that PEX-identification has other nice closedness properties, too: it is closed under finite and even recursively enumerable union.
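The fact driving this proof, that an r.e. class of total recursive functions is PEX-identifiable by enumeration and that an operator image of such a class is enumerated the same way, can be sketched as follows. The names and the toy class below are illustrative assumptions, not the paper's constructions.

```python
# Sketch: identification-by-enumeration of an r.e. class of total
# functions.  Every conjecture computes a total function, so the
# learner is Popperian; applying an operator to each enumerated
# function enumerates the image class, which is therefore still PEX.

def pex_learner(enumeration, segment):
    # Conjecture the index of the first enumerated (total) function
    # consistent with the observed segment.
    for idx, h in enumerate(enumeration):
        if all(h(x) == v for x, v in enumerate(segment)):
            return idx
    return None

def transformed(enumeration, phi):
    # If U is enumerated by `enumeration`, this enumerates Phi(U).
    return [phi(h) for h in enumeration]

# Toy r.e. class U: the constant functions c, for c = 0..99.
U = [lambda x, c=c: c for c in range(100)]
phi = lambda f: (lambda x: 2 * f(x))          # a simple operator
conj_U = pex_learner(U, [7, 7, 7])            # learns the constant 7
conj_phiU = pex_learner(transformed(U, phi), [14, 14])
```

The same learner code works unchanged on the transformed enumeration, which is the point of the closure argument.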

5 Transformations which preserve FIN or EX

We compare the set of transformations that preserve EX and the set of transformations that preserve FIN.

Theorem 2 There exists a one-to-one general recursive operator Ψ which preserves EX-identifiability but does not preserve FIN-identifiability.

Proof. Consider the recursive operator Ψ defined as follows:

Ψ(0^n) = 0^n,
Ψ(0^n ◇ k ◇ v) = 0^(n+k) ◇ 0^n ◇ k ◇ v

for any n, k ∈ ℕ satisfying k > 0 and any sequence of values v.

Claim 1 Ψ does not preserve FIN-identifiability.
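Before the formal proof, a small simulation illustrates why FIN fails under Ψ. Caveat: the defining formula of Ψ is partly illegible in the scanned original, so the encoding used below is our reconstruction and should be read as an assumption; the prefix-collision phenomenon it exhibits is the point.

```python
# Simulation under the reconstructed definition (an assumption):
#   Psi(0^n) = 0^n,
#   Psi(0^n . k . v) = 0^(n+k) . 0^n . k . v   for k > 0,
# where "." is segment concatenation.

def psi_segment(seg):
    # Apply Psi to an initial segment given as a list of values.
    for n, val in enumerate(seg):
        if val != 0:
            k, v = val, list(seg[n + 1:])
            return [0] * (n + k) + [0] * n + [k] + v
    return list(seg)          # all-zero segments map to themselves

def f_segment(i, length):
    # Initial segment of f_i: f_i(0) = i, f_i(x) = 0 otherwise.
    return [i if x == 0 else 0 for x in range(length)]

# Psi(f_0) is everywhere zero, while for m > n the image Psi(f_m)
# starts with at least n zeros: a FIN machine that commits after
# reading 0^n on Psi(f_0) emits that same single conjecture on
# infinitely many distinct functions Psi(f_m).
n = 4
prefixes = [psi_segment(f_segment(m, 10))[:n] for m in range(n + 1, n + 4)]
```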


Proof. The set U consisting of all functions

fi(x) = i, if x = 0;
fi(x) = 0, otherwise,

for i ∈ ℕ is FIN-identifiable. Consider the set Ψ(U). By way of contradiction, assume that some IIM M FIN-identifies Ψ(U). Ψ(f0) = f0 because Ψ(0^n) = 0^n. So M identifies f0, i.e. it outputs a conjecture on f0 after reading a finite initial segment 0^n. This initial segment is common to Ψ(f0) and all functions Ψ(fm) with m > n. Hence, on all these functions M outputs the same conjecture. This conjecture is correct on at most one of these functions. Hence, M does not identify some Ψ(fm). Contradiction. □

Claim 2 If U ∈ EX, then Ψ(U) ∈ EX.

Proof. Let τ be an initial segment of a function in U and let σ be Ψ(τ). If σ is not of the form 0^m, there exists at most one τ such that Ψ(τ) = σ. Hence, if U is identified by a machine M, then Ψ(U) is identified by a machine M′ working as follows:

1. If the segment f(0), ..., f(n) of the input function read by the machine M′ is everywhere zero, output the everywhere-zero function as the conjecture.
2. Otherwise, search for a segment τ which is mapped to f(0), ..., f(n) by the operator Ψ, compute the conjecture of M after reading τ, and denote it h. Output a program computing Ψ(φh).

Let f ∈ U. If Ψ(f) is the everywhere-zero function, it is evidently identified by M′. Otherwise, starting from the first nonzero value of Ψ(f), M′ computes the initial segments of f correctly. If M outputs a correct program for f on some initial segment of f, then M′ outputs a correct program for Ψ(f) on the corresponding initial segment of Ψ(f). Hence, if f ∈ U, then M′ identifies Ψ(f). □

Surprisingly, it appears that not all FIN-preserving operators preserve EX-identifiability as well. (We expected that it would be possible to use the transformations of input segments (from Ψ(τ) to τ) and of conjectures (from f to Ψ(f)) for FIN as subroutines in the transformation of EX-identification algorithms.) Using a complicated diagonalization argument, we prove

Theorem 3 There exists a one-to-one general recursive operator Ψ which pre-

serves FIN-identifiability but does not preserve EX-identifiability.

Proof. We construct, in parallel, a transformation Ψ which preserves FIN-identifiability and a learning machine M which identifies a set U such that Ψ(U) ∉ EX. N1, N2, ... denotes some numbering of all FIN-identification machines. With each machine Ni we associate the set Si consisting of all initial segments (f(0), ..., f(n)) of functions such that Ni issues a conjecture after reading f(n) and without reading f(n+1) (and the next values). M1, M2, ... is some numbering of all EX-identification machines. We construct a set Ui such that Mi does not learn Ψ(Ui). To construct Ui, we use


functions f with f(0) = i. Simultaneously, we define the learning machine M so that M EX-identifies Ui. The construction is as follows:

1. Define Ψ((i)) = (i).
2. Set τ = (i). Define that M issues a conjecture hτ on τ. The conjecture hτ is a program such that φhτ(x) = τ(x), if τ(x) is defined. (For example, if τ = (i), then φhτ(0) = i.) To compute the next values, the program hτ simulates the next steps of the diagonalization process given below and looks where the value of the function computed by hτ is defined.
3. Start stage 1.

Stage s. Let m be the length of τ. Define that M changes its conjecture if it reads τ◇0 (i.e., M reads τ, then it reads the next value, and the next value of the function is 0). M's conjecture after reading this input is a program computing

f0(x) = τ(x), if x < m;
f0(x) = 0, otherwise.

s.1. Set k equal to 1.
s.2. Define that, for any initial segment σ such that τ ⊄ σ, σ = σ′◇y and Ψ(σ′) is defined before this step, Ψ(σ) =
s.3. For all k′