WHY CLASSICAL MODELS FOR PATTERN RECOGNITION ARE NOT PATTERN RECOGNITION MODELS

Lev Goldfarb and Jaroslav Hook
Faculty of Computer Science, University of New Brunswick
P.O. Box 4400, Fredericton, N.B., E3B 5A3, Canada

(International Conference on Advances in Pattern Recognition, ed. Sameer Singh, Springer, pp. 405–414, 1998)

ABSTRACT. In this paper we outline a simple explanation of why, we think, the classical, or vector-space-based (including artificial neural net), models for pattern recognition are fundamentally inadequate as such. This explanation is based on a radically new understanding of the nature of inductive learning processes, which became possible only after a careful analysis of the axiomatic foundations of a new inductive learning model proposed by the first author in 1990, a model that overcomes the above limitations. The new model, the evolving transformation system, emerged from a 13-year effort to find a mathematical framework that would unify the two main, structurally different, approaches to pattern recognition: the vector-space-based and the syntactic approaches. The decisive deficiency of the classical vector-space-based pattern recognition models, as it turns out, is the intrinsic inability of the underlying mathematical model, the normed vector space, to accommodate, during the learning process in a realistic environment, the discovery of the corresponding class distance function under the more general, symbolic (rather than numeric) form of pattern representation. Typically, such symbolic distance functions have very little in common with the very restricted, "Euclidean", class of distance functions that, due to the underlying algebraic structure of the vector space, is unavoidably associated with vector-based pattern representation. In other words, the class of symbolic distance functions is incomparably larger than the class consistent with the vector space structure, and so the discovery and construction of the appropriate class distance function during the learning process simply cannot proceed in the vector space setting.

1. INTRODUCTION

The area of pattern recognition is almost half a century old, but today, as in 1974, one can repeat with Vapnik and Chervonenkis that "the problem of pattern recognition still has not acquired the formal formulation that would satisfy all the researchers" [1, p. 9]. In the same preface to their book they further suggest: "In essence, the different points of view on the statement of pattern recognition problem are determined by the answer to the question: Is there a single construction principle for adequate description of patterns of various nature or in each case the construction of the description language is the problem for the specialists in the concrete area?

If the answer is 'yes', then the discovery of this principle must form the main research direction in pattern recognition. It should form the main direction since it would be a general and principally new direction. If the answer is 'no', then the pattern recognition is reduced to the problem of the average risk minimization for a special class of decision rules and can be considered as one of the directions in applied statistics. The answer to this question has not been found and that is why the choice of the problem statement has been so far a question of faith" [1, p. 11].

Indeed, since the late 1970s and early 1980s, the field of pattern recognition has basically been dominated by two fundamentally different formal frameworks: the original vector-space-based framework and the syntactic framework. In spite of the optimistic hopes, the integration of the two approaches into a single one has proved to be much more difficult than was anticipated. It is important to stress that each of the two approaches has some desirable complementary features arising from the two corresponding axiomatic structures: the normed vector space structure and Chomsky's formal grammar structure. To simplify, one can say that the critical relevant feature of the former structure is the existence of metric-based decision surfaces, while that of the latter is the presence of a generative class representation, however rigid that representation is.

The first author has been working on the unification of the above two frameworks for more than twenty years. During that time, it has gradually become clear that a very closely related problem, the problem of inductive learning (formulated in a radically new form), absorbs the main difficulties associated with a satisfactory formulation of the pattern recognition problem. Moreover, it appears that the key to the solution of these twin problems is the new concept of inductive class representation, i.e. a model of a simultaneously generative and distance-based representation, or encoding, of the (infinite) pattern class based on a small finite training set (see section 4).

In this paper, based on the new conception of the inductive learning process (section 4), we present a simple explanation of why, we think, the classical, or normed vector-space-based (including ANN), models for pattern recognition, in view of the limitations imposed by the axiomatic structure of the normed vector space itself, are in principle incapable of capturing the inductive class structure, and therefore of modeling the process of pattern recognition in a realistic environment. For a similar argument, based, however, on a very simple, basic, necessary (but not sufficient) requirement for an inductive learning model, see [2]. This intrinsic limitation of the vector-space-based models should partly explain the difficulties encountered in reconciling the above two pattern recognition frameworks. The paper is written by the first author, who used in section 5 the results of the experiments performed by the second author under the supervision of the first author.

2. OBJECT REPRESENTATION AND THE CORRESPONDING OPERATIONS

The classical (and essentially all known) mathematical structures that offer an axiomatic description of a set of objects with some relations among them, e.g. an ordered set, a group, a ring, a vector space, a topological space, view objects as endowed with a fixed structure specified by means of the defining set of axioms. In other words, one might say that the axioms "force" the corresponding structure on the individual objects in the set. For example, given a set of three linearly independent vectors, i.e. a basis, in a 3-dimensional vector space over the field of reals, it follows from the axiomatic definition of this space that any given vector in it can be uniquely represented as a linear combination of those three vectors, and this decomposition should be viewed as a fixed structural description, or representation, of that vector (with respect to the given basis).

The axioms of a concrete mathematical structure characterize completely the operations that can be applied to the objects when this structure is imposed on a set of objects. Hence, when adopting a particular mathematical structure to model an actual process in nature, one always has to keep in mind that object relations, or operations, other than those specified by the axioms of the chosen structure, and thus the corresponding structural object features, become absolutely invisible, or nonexistent. In other words, the adoption of a "wrong", or inappropriate, mathematical structure for the modeling of some real phenomenon has very significant implications for our ability to capture the central features of the phenomenon.

In this connection, we believe that, as far as models of biological processing are concerned, there are presently no nontrivial satisfactory examples of the utilization of any basic mathematical structures. This may, and as we will argue does, point to the inadequacy of the existing mathematical structures in this respect. What are the basic reasons for the last statement? It appears that the main reason is the following: classical mathematical structures were abstracted during the development of "numeric" mathematics, while biologically, or biochemically, more relevant mathematical structures should be abstracted during the development of "symbolic" mathematics, whose axiomatics is related to symbolic rather than numeric operations. The first author has already outlined the nature of such operations (see, for example, [3, 4, 5]). In the case of a set of strings over a finite alphabet, examples of such operations are substring insertion, deletion, and substitution. (Another important path "that points in the right direction" is via the abstract data type, or ADT, a very central concept in computer science.)

The fundamental differences between the "numeric" and the "symbolic" mathematical structures become more apparent when one begins to investigate the resulting variety of admissible distance functions, i.e. the variety of configurations of the corresponding sets under the various distances that are consistent with the underlying (numeric or symbolic) sets of operations. This brings us to the next section.

3. SYMBOLIC DISTANCES AS BASED ON THE SYMBOLIC OPERATIONS

Conventionally, in mathematics, when introducing a topology into an existing mathematical structure, e.g. into a group or a vector space, the basic criterion is to make sure that this topology "makes" the corresponding operations continuous (as functions) with respect to it [6, Ch. 3]. For example, vector addition and scalar multiplication in a normed vector space must be continuous with respect to the metric topology induced by the corresponding distance function. As was discussed in [7], this consistency, on the one hand, is absolutely necessary to ensure the consistency of the local space properties throughout the entire vector space; on the other hand, it imposes severe restrictions on the corresponding class of admissible distance functions. The latter is the necessary price one must pay in order to work in the "nice" spaces that are generalized "numeric" spaces.

The accumulated mathematical experience strongly suggests that, when approaching nonnumeric, or symbolic, entities, essentially the same principle must be applied: to ensure the consistency of the introduced topology with the underlying axiomatic, or operational, structure of the set of objects, the distance between two objects in the set must be defined by means of the corresponding set of symbolic operations. For a number of reasons, one of which is to enforce competitive cooperation, it is useful to assign to each of the symbolic operations a normalized nonnegative weight, i.e. so that the weights of all operations for a fixed symbolic structure sum to 1.


Definition 1. A symbolic structure, or a transformation system (TS), is a triple T = (S, O, D), where:
• S is a set of structured, or symbolic, object representations called structs;
• O = {o_i}, i = 1, ..., m, is a finite set of (reversible) operations, each of which is a multivalued function that can transform one symbolic object into others (e.g. deletions/insertions and substitutions), such that any two structs from S can be transformed into each other by means of these operations;
• D = {d_ω}, ω ∈ Ω, is a parametric family of distance functions with the parameter set

Ω = { ω = (w_1, w_2, ..., w_m) ∈ R^m | ∀i w_i ≥ 0, Σ_i w_i = 1 }

(Ω is called the unit simplex), where, for each parameter vector ω of operation weights, the distance d_ω(s_1, s_2) is defined as the minimum, over all sequences of operations that transform struct s_1 into struct s_2, of the sum of the weights of the operations in the sequence.

In this definition, the "reversibility" of the set O of operations simply means that the action of any operation on a struct can be reversed by means of another operation (e.g. deletion/insertion). For an axiomatic definition of the transformation system, which has emerged within the framework of the evolving transformation system model, see, for example, [3].

Returning to the issue raised at the end of section 2, regarding the fundamental difference between the "numeric" and "symbolic" mathematical structures, we should note that the difference is roughly explicated by the difference between the corresponding families of admissible distance functions. Thus, for example, the initial work [8] already suggested that, given any fixed finite-dimensional pseudo-Euclidean (more general than Euclidean) vector space and a fixed typical symbolic distance function on the set of strings over a finite alphabet, almost all finite sets of strings cannot be isometrically (i.e. preserving the pairwise string distances) represented in that pseudo-Euclidean space. Even more important, however, is the fact [7] that, while in a finite-dimensional vector space all admissible distance functions are equivalent to each other, in a symbolic setting this is absolutely not so: even for a set of strings over a finite alphabet there are, in fact, infinitely many nonequivalent classes of distance functions. Basically, what explains the relative richness of the class of symbolic distance functions is the relative richness of the set of structural paths between a pair of structs (see Definition 1). Moreover, as one moves up the ladder of representational complexity, e.g. from vectors to strings, to labeled trees, to labeled graphs, the richness of the corresponding classes of distance functions grows.

In pattern recognition, the pivotal role of the concept of distance defined on the set of patterns is almost universally recognized (see, for example, [5]). In the following section we attempt to make this point more explicit by incorporating the concept of distance into a new definition of the inductive learning problem, a problem which appears to be more basic than the pattern recognition problem.
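To make the distance d_ω of Definition 1 concrete, here is a minimal Python sketch of d_ω for strings under single-character deletion/insertion operations (the operation set used in the example of section 4), computed by standard weighted-edit-distance dynamic programming. The function name and the sample weight table are our illustrative assumptions, not part of the model itself.

```python
def d_omega(s1, s2, w):
    """Weighted edit distance: the minimal total weight of single-character
    deletions/insertions transforming s1 into s2, where deleting or inserting
    character x costs w[x] (the operation weights of Definition 1)."""
    n, m = len(s1), len(s2)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # transform a prefix of s1 into the empty string
        D[i][0] = D[i - 1][0] + w[s1[i - 1]]
    for j in range(1, m + 1):          # build a prefix of s2 from the empty string
        D[0][j] = D[0][j - 1] + w[s2[j - 1]]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s1[i - 1] == s2[j - 1]:
                D[i][j] = D[i - 1][j - 1]                    # match: no cost
            else:
                D[i][j] = min(D[i - 1][j] + w[s1[i - 1]],    # delete from s1
                              D[i][j - 1] + w[s2[j - 1]])    # insert into s1
    return D[n][m]

# An illustrative weight vector omega on the unit simplex:
w = {"a": 0.2, "b": 0.5, "c": 0.3}
print(d_omega("abc", "aabbc", w))  # 0.7: insert one 'a' (0.2) and one 'b' (0.5)
```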

4. INDUCTIVE LEARNING AND INDUCTIVE CLASS REPRESENTATION

Since paper [9] was devoted to this topic, we restrict ourselves here to the substantially modified basic definitions, whose present form was recently proposed by the first author. We note that the following two definitions amount to a radically new understanding of the concept of class and of class generalization, as well as of the inductive learning process.

Definition 2. Given some countable representation structure S (specified, as always, axiomatically), by a class C we will understand a countable subset of S whose origin is assumed to be some constructive generative process that, starting from some finite number of elements and using some finite set of operations allowed by the axioms of S, generates all of C.

The assumptions of the above definition are justified by our present understanding of the physical and chemical processes responsible for the evolution of objects in the universe [10, pp. 105–111]: it is almost universally accepted in science that all objects have a hierarchical compositional history and that, at each (fixed) hierarchical level, an object can be represented as a compositional, or combinative, structure composed of a finite set of "symbolic" primitives according to some fixed set of structural rules.

Definition 3. Given a small finite set C+ of k (typically 4 ≤ k ≤ 15) positive patterns randomly chosen from a class C (the class to be learned) and a small finite set C− of negative patterns, i.e. patterns not from C, we say that a system is capable of learning class C inductively if it is able to construct an inductive class representation (ICR) of C that satisfies the following two conditions:
• the ICR can generate from C+ or its subset (in a manner consistent with the structure of S) an approximation C* of C such that C* ∩ C− = ∅; and
• the ICR also specifies the class distance function d on S, which must be consistent with the structure of S and such that, for some small nonnegative number ε, the part of C* contained in the union of the k spheres S(s′, ε) = { s ∈ S | d(s, s′) < ε }, for s′ ∈ C+, coincides with the part of C generated by C+ and, moreover, the same union contains no elements from the complement of C.

The phrase "consistent with the structure of S" simply means that the constructive generative process must rely only on the operations allowed by the axioms of S (see section 2). The above definition proposes to base our understanding of the nature of the inductive learning process on the presence of some general mechanism, the ICR, that is capable 1) of generating some approximation of the class on the basis of a small training set of patterns and, more significantly, 2) of producing the class distance function (which is not necessarily a metric), which can then serve as a basic tool for a fuzzy but more refined delineation of the class boundary.

The main sources of the present form of the above definition are the concept of ICR proposed by the first author around 1993–94, during the ongoing development of the evolving transformation system (ETS) model [11, Sec. 5], and the classical mathematical concept of an inductive definition of a set (see, for example, [12, Sec. 3.1]). Paper [11] is particularly recommended as the only published brief outline of a more typical ETS learning algorithm. We should also note that, since the early 1970s, the indispensable role of the concept of a distance, or, inversely, similarity, measure in the clarification of the concept of class has gradually become very apparent in cognitive psychology (see, for example, [13, Ch. 5], [14]).

It is not difficult to see that a satisfactory (according to the above definition) inductive learning model simultaneously solves the pattern recognition problem: the corresponding class distance function d offers the sought pattern classification tool, with the help of which one can answer the question whether or not a given pattern belongs to class C. In the ETS model, which suggested the above definitions, the inductive class representation takes on a very revealing and satisfactory form, illustrated (in condensed form) in the following very simple example (see [4]).

Example. Suppose that the set of patterns is a set of 2-dimensional closed plane shapes composed of three primitives: the straight segment a, the convex corner b, and the concave corner c. Then the set of structs (Def. 1) could be specified as the set of strings over the three-letter alphabet {a, b, c}, where any two strings obtained from each other by a circular permutation are equivalent, i.e. are considered identical. Given a small (of size 10–15) training set corresponding to the class of rectangles, at the end of the learning process (in this paper we do not consider the ETS learning algorithms that actually construct the ICR; see, for example, [4], [11]), the resulting ICR (see its general form given after the example) may look, for example, as follows:

( s′ = abaaababaaab, { o1, o2, o3 }, ω* = (0, 1, 0) ), or
( s′ = baabaaaabaabaaaa, { o1, o2, o3 }, ω* = (0, ½, ½) ), or
( s′ = aababaabab, { o1, o2, o3 }, ω* = (0, 0, 1) ),

where s′ is the generating struct, o1, o2, o3 denote the deletions/insertions of a, b, c respectively, and ω* = (w1, w2, w3) is the corresponding weight vector (see Def. 1). As shown in [4], in this case the corresponding (optimal) family of operation weights forms a 1-dimensional side w2 + w3 = 1 of the 2-dimensional unit simplex Ω in R³. In the absence of noise, any string s whose (circular edit) distance d_ω*(s, s′) is 0 is classified as belonging to the class of rectangles. ▶

One should note that the above example corresponds to the simplest case, in which the construction of the corresponding ICR does not involve the discovery of any new (macro)operations, e.g. deletions/insertions of substrings. Note that the model name, "evolving transformation system", reflects the fact that the learning process constructs a sequence of transformation systems (each with a more adequate set of operations) resulting in the final ICR. The general form of the ICR in the ETS model is the triple

Π = (C+′, O_fin, Ω_fin),

where C+′ is a small subset of C+ (the set of generators), O_fin is the final set of operations at the end of the learning process, and Ω_fin ⊆ Ω is the set of optimal weight vectors for the final transformation system. It is not difficult to see that the two conditions in Definition 3 are satisfied (C* is generated by the operations from O_fin acting on the structs from C+′).

To summarize, the main point is that the process of inductive learning, and therefore of pattern recognition, cannot be properly understood or satisfactorily modeled independently of the two conditions described in Definition 3. In particular, we strongly believe that the failure of the classical, i.e. vector-space-based, pattern recognition models, considered next, can be attributed to the inability of the vector space framework to accommodate, for practically all nontrivial environments, the inductive learning process as conceived above.
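Reusing d_omega from the sketch in section 3, the classification rule at the end of the example can be illustrated as follows. Since strings are identified up to circular permutation, the distance is minimized over all rotations; with ω* = (0, 1, 0), deletions/insertions of a are free, so any closed shape with exactly four convex corners (four b's) and no concave corners lies at distance 0 from the generating struct. The helper name and the test strings are ours, for illustration.

```python
def circular_d(s, s_prime, w):
    """Circular edit distance: the minimum of d_omega over all rotations of s,
    since strings equivalent up to circular permutation are identical."""
    rotations = [s[i:] + s[:i] for i in range(len(s))] or [s]
    return min(d_omega(r, s_prime, w) for r in rotations)

s_prime = "abaaababaaab"                  # generating struct of the first ICR above
w_star = {"a": 0.0, "b": 1.0, "c": 0.0}   # the weight vector omega* = (0, 1, 0)

print(circular_d("abababab", s_prime, w_star))      # 0.0: a rectangle (four b's)
print(circular_d("aabaabaabaab", s_prime, w_star))  # 0.0: another rectangle
print(circular_d("ababab", s_prime, w_star))        # 2.0: a hexagon, two extra b's
```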

5. ARE THERE ANY ENVIRONMENTS FOR WHICH THE CLASSICAL PATTERN RECOGNITION MODELS ARE WELL SUITED?

In this section, based on the new understanding of inductive learning outlined in the last section, we first describe the unrealistic restrictions imposed on pattern recognition environments by the structure of the classical, vector-space-based, models. We then discuss the implications of ignoring these restrictions and the ongoing, and, as we believe, unproductive, attempts to bypass them. For a related complementary discussion centered around other examples, see [2].

Based on Definitions 2 and 3, it is easy to see that, in a vector space framework (where the basic operations are vector addition and scalar multiplication), a class that is not a union of several subclasses must, first, lie in the linear or affine subspace generated by the positive training set C+, and this subspace must contain no negative training patterns; second, within this subspace, all the class patterns "related" to the patterns in C+ (typically all of C) must lie in some spheres around the patterns from C+, with no non-class patterns present in those spheres. Obviously, these conditions do not hold in practice.

Historically, this meant that, in order to use the vector-space-based pattern recognition models at all, one had to bring into consideration some classes of nonlinear "decision" functions. However, what do these classes of nonlinear functions have to do with the problem of inductive class representation? In fact, there are uncountably many different classes of nonlinear functions, and practically all of them are related neither to the structure of the representation space, i.e. to the input vector space, nor to the "structure" of the training set of vectors. Thus, if the input representation structure S is a vector space, then, whether or not we add to the vector (or linear) space axioms some nonlinear structure (e.g. for the internal nodes of the corresponding ANN), the following very important question arises: what are we learning during the learning process? Certainly, the "learned" decision surfaces do not give us anything even remotely resembling the inductive class representation (ICR) outlined in Definition 3.

Following further the logic of the new learning framework, one can observe that the essentially unique (up to equivalence [7]) Euclidean distance in the vector space is practically never related to the class distance; yet, according to the proposed framework, it is the latter that is directly involved in capturing the class generalization. In other words, the vector space learning framework, whose linear structure "requires" an analytically "nice", locally homogeneous and fixed metric geometry, does not allow one to mold the distance to the ICR. Hence, in the light of the learning framework of section 4, the classical "learning algorithms" in the vector space now appear to us as no more than sophisticated games whose aim is to enclose the positive training patterns within some optimal surfaces (chosen from a fixed class of surfaces) in such a way that the negative training patterns are excluded. It is no wonder, then, that, in most cases, multiple complex surfaces are needed to accomplish the goal of the "game". By contrast, in the ETS framework, the class distance function constructed during learning (by searching for the "optimal" operations with the "optimal" weights) changes, with each new set of operations, "irreversibly" the "configuration" of the pattern space itself.

To illustrate the above points related to the inadequacy of the classical models (see also [2]), we have performed several simple experiments, the results of which would probably not surprise anyone sufficiently familiar with the classical vector-space-based learning paradigm.
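To make the first restriction above concrete, here is a minimal numpy sketch of the affine-hull membership test: in a vector space, any class "generated" from C+ by the basic operations must lie in the affine (or linear) subspace spanned by C+. The function name and the tolerance are our illustrative choices.

```python
import numpy as np

def in_affine_hull(x, C_plus, tol=1e-8):
    """Check whether x is an affine combination of the rows of C_plus,
    i.e. x = sum(c_i * p_i) with the coefficients c_i summing to 1."""
    A = np.vstack([C_plus.T, np.ones((1, len(C_plus)))])  # append the affine constraint
    b = np.append(x, 1.0)
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return bool(np.linalg.norm(A @ coef - b) < tol)

# Two positive training vectors in R^3 span a line (their affine hull);
# the vector space framework can only place the class inside that line.
C_plus = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(in_affine_hull(np.array([2.0, 2.0, 0.0]), C_plus))  # True: on the line
print(in_affine_hull(np.array([1.0, 0.0, 0.0]), C_plus))  # False: off the line
```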
However, in light of the above considerations, we believe that the results might help to reinforce the above implications of the proposed new understanding of inductive learning and pattern recognition processes. Figure 1 gives some examples (see [15] for more details) of the experimental pattern recognition results produced by a version of the backpropagation algorithm given in [16] (a three-layer architecture with the following parameters: input layer of size 96 = 8×12, hidden layer of size 8, output layer of size 1, temperature 0.9, momentum μ = 0.5, α = 0.1, error margin 0.01, maximal number of iterations 2,250,000). The binary images are of size 8×12, and the training sets supplied by the authors of [16] are of sizes 30–50 (for each digit). The algorithm converged successfully for each digit class and correctly classified all training patterns. Moreover, the test patterns, also supplied by the authors of [16], were recognized with an error rate of 9–10%. The patterns given in Figure 1 were constructed by us.
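For concreteness, a minimal numpy sketch of the three-layer architecture just described follows. The reported parameters (96-8-1 layers, temperature 0.9, momentum 0.5, α = 0.1) are taken from above; the weight initialization, the use of the temperature inside the sigmoid, and the per-pattern update are our assumptions, not details taken from [16].

```python
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_OUT = 96, 8, 1            # 8x12 binary image, flattened
TEMP, MOMENTUM, ALPHA = 0.9, 0.5, 0.1

W1 = rng.normal(0.0, 0.1, (N_HID, N_IN)); V1 = np.zeros_like(W1)
W2 = rng.normal(0.0, 0.1, (N_OUT, N_HID)); V2 = np.zeros_like(W2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z / TEMP))   # "temperature" assumed to scale the input

def train_step(x, t):
    """One backpropagation step with momentum for a single pattern (x, t)."""
    h = sigmoid(W1 @ x)                            # hidden activations
    y = sigmoid(W2 @ h)                            # output activation
    delta_out = (y - t) * y * (1 - y) / TEMP       # output error signal
    delta_hid = (W2.T @ delta_out) * h * (1 - h) / TEMP
    V2[:] = MOMENTUM * V2 - ALPHA * np.outer(delta_out, h)
    V1[:] = MOMENTUM * V1 - ALPHA * np.outer(delta_hid, x)
    W2 += V2; W1 += V1
    return float(np.abs(y - t)[0])                 # error; train until below the margin

x = rng.integers(0, 2, N_IN).astype(float)         # a stand-in binary image
print(train_step(x, 1.0))
```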

[Fifteen 8×12 binary images omitted; the labels assigned by the network were: "4" (×4), "0" (×4), "2" (×2), "Not 0", "Not 1" (×2), "8", and "Not 7".]

Figure 1: Examples of the binary images classified by the backpropagation algorithm as belonging to the class (or its complement) indicated in quotation marks.

These examples do reinforce the above observations: no concept of class has been captured at the end of learning. In particular, one can easily find 1) patterns close to each other in the Euclidean metric but belonging to different classes, and 2) patterns distant from each other in the Euclidean metric but belonging to the same class. Thus, the popular but misleading impression that the classical models are "good enough for practical purposes" is just a tranquillizing illusion, which has emerged recently (after the popularization and the accompanying vulgarization brought by the "connectionists"), when a virtue has been made of necessity: the lack of good models and the resulting need to extensively constrain the original problems.
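The two observations above are easy to reproduce even on toy data. A minimal sketch, with tiny illustrative stand-in images of our own (not the 8×12 test patterns of [16]):

```python
import numpy as np

# Toy 5x3 binary "images": a centered "1", the same "1" shifted left,
# and a "7"-like pattern obtained by adding a single top-left pixel.
one_center = np.array([[0, 1, 0]] * 5, dtype=float)
one_left   = np.array([[1, 0, 0]] * 5, dtype=float)
seven_ish  = one_center.copy(); seven_ish[0, 0] = 1.0

def euclid(a, b):
    return float(np.linalg.norm(a.ravel() - b.ravel()))

print(euclid(one_center, seven_ish))  # 1.00: tiny distance, different classes
print(euclid(one_center, one_left))   # 3.16: larger distance, same class
```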

6. CONCLUSION

In this paper we very briefly outlined a framework for inductive learning and pattern recognition based on the evolving transformation system (ETS) model [3, 4, 5, 11]. The outlined framework projects the view that the inductive learning process embodies an analytically new form of constructive "activity": the discovery of the appropriate set of weighted "symbolic" operations which capture the class representation based on a small training set of patterns. These weighted operations create the connection, or bridge, between the finite set of training patterns and the class generated by the positive training patterns, i.e. they form the core of the inductive class representation. The same weighted operations automatically modify the representation of the patterns in the class by modifying the corresponding distance function (i.e. the distance function, defined on the representation set S, corresponding to the class being learned; see Definition 3).

In particular, the framework suggests a radically new concept of pattern representation: a pattern representation always involves some finite set of operations, which, if necessary, are dynamically updated during the learning process. These operations can be thought of as pattern/class features, and the distance between structurally "compatible" patterns is always computed based on the current set of operations. The patterns, and therefore the classes, are always viewed as being of generative origin, and the learning process is supposed to find, in some chosen operational form and based on a training set, a corresponding (to this operational form) generative inductive class representation: the class consists of those patterns that, first, can be generated (with the help of the operations) from the positive training patterns and, second, are sufficiently close (with respect to the constructed class distance function) to the corresponding positive training patterns.

On the basis of the above framework (see also another argument, based on the intrinsic inadequacy of the vector-space-based learning paradigms, in [2]), we suggested that the classical, or vector-space-based, learning framework, in view of the basic axiomatic restrictions of the vector space structure, cannot accommodate at all the inductive learning processes as they emerge from this new framework. This type of axiomatic restriction is not peculiar to the vector space structure only; such restrictions apply to all classical mathematical structures, i.e. those of "numeric" origin. In other words, what emerges from the above considerations is the need for a fundamentally different class of mathematical structures: "symbolic" mathematical structures, in which the finite set of basic operations is not fixed axiomatically but can be modified dynamically (see, for example, [3, 5, 8]). The main advantage of such mathematical structures is their ability to modify dynamically their global metric geometry in order to find the class of distance functions that best captures the inductive class representation.


Finally, it is interesting to note that if, indeed, as it appears to us, the above new framework, ETS, does capture the essential features of learning processes, then we are faced with the monumental but unavoidable task of designing radically new, symbolic, measurement devices, which are at the same time inductive measurement devices [3]. While all present measurement devices, through their interacting components, produce numbers (with the corresponding, essentially unique, distance function) as outputs, the symbolic measurement devices, through their interacting components, would produce more general entities, structs (symbolic entities with the corresponding set of weighted symbolic operations; see Definition 1), as their outputs. Such autonomous "intelligent" measurement devices, as is the case with all biological transducers, would be able to interact with the environment in a more direct and evolutionary manner, i.e. based on past experience, and would be able to capture the inductive structure of events/objects. It is the inductive structure that, in the long run, appears to be the only reliable structure worth capturing (see [17], particularly pp. 242–245).

REFERENCES

[1] Vapnik V.N. and Chervonenkis A.Ya., Pattern Recognition Theory: Statistical Problems of Learning. Nauka, Moscow, 1974 (in Russian).
[2] Bhavsar V.C., Ghorbany A.A., Goldfarb L., Artificial neural networks are not learning machines. Internal Technical Report, Faculty of Computer Science, University of New Brunswick.
[3] Goldfarb L. and Deshpande S., What is a symbolic measurement process? In: Proceedings of the IEEE Conference on Systems, Man, and Cybernetics, vol. 5. IEEE Press, 1997, pp. 4139–4145.
[4] Goldfarb L., On the foundations of intelligent processes I: An evolving model for pattern learning. Pattern Recognition 1990; 23: 595–616.
[5] Goldfarb L., What is distance and why do we need the metric model for pattern learning? Pattern Recognition 1992; 25: 431–438.
[6] Taylor A.E., Introduction to Functional Analysis. Wiley, New York, 1967.
[7] Goldfarb L., Abela J., Bhavsar V.C., Kamat V.N., Can a vector space based learning model discover inductive class generalization in a symbolic environment? Pattern Recognition Letters 1995; 16(7): 719–726.
[8] Goldfarb L., A new approach to pattern recognition. In: Kanal L.N., Rosenfeld A. (eds.) Progress in Pattern Recognition 2. North-Holland, Amsterdam, 1983, pp. 241–402.
[9] Goldfarb L., Inductive class representation and its central role in pattern recognition. In: Proceedings of the Conference on Intelligent Systems: A Semiotic Perspective, vol. 1. NIST, Gaithersburg, Maryland, 1996, pp. 53–58.
[10] Reeves H., Malicorne: Earthly Reflections of an Astrophysicist. Stoddart, Toronto, 1993.
[11] Goldfarb L., Nigam S., The unified learning paradigm: A foundation for AI. In: Honavar V. and Uhr L. (eds.) Artificial Intelligence and Neural Networks: Steps toward Principled Integration. Academic Press, Boston, 1994, pp. 533–559.
[12] Hein J.L., Discrete Structures, Logic, and Computability. Jones and Bartlett, Boston, 1995.
[13] Bourne L.E., Dominowski R.L., Loftus E.F., Healy A.F., Cognitive Processes, 2nd edition. Prentice-Hall, Englewood Cliffs, New Jersey, 1986.
[14] Hahn U., Chater N., Concepts and similarity. In: Lamberts K., Shanks D. (eds.) Knowledge, Concepts, and Categories. MIT Press, Cambridge, Massachusetts, 1997, pp. 43–93.
[15] Hook J., Are Artificial Neural Networks Learning Machines? Master's Thesis, Faculty of Computer Science, University of New Brunswick, Fredericton, Canada, 1998.
[16] Pandya A.S., Macy R.B., Pattern Recognition with Neural Networks in C++. IEEE Press, 1996.
[17] Plotkin H., Darwin Machines and the Nature of Knowledge. Penguin Books, London, 1995.