ALGEBRAIC STRUCTURE OF SOME LEARNING SYSTEMS

Jean-Gabriel GANASCIA
LAFORIA - Institut Blaise Pascal
Université Pierre et Marie CURIE
Tour 46-0, 4 Place Jussieu
75252 Paris CEDEX, FRANCE
[email protected]

Abstract: Our goal is to define some general properties of representation languages — i.e. lattice structures, distributive lattice structures, cylindric algebras... — to which generalization algorithms can be related. This paper introduces a formal framework providing a clear description of the version space. It is of great theoretical interest since it makes the generalization and the comparison of many machine learning algorithms possible. Moreover, it could lead to reconsidering some aspects of the classical description of the version space. In this paper, we shall restrict the scope of investigation to lattices, i.e., to cases where there exists one and only one generalization for any set of examples. More precisely, we take into account a particular kind of lattice: Brouwerian lattices. It is shown that a particularly interesting case covered by this restriction is the product of hierarchical posets, which is equivalent to the conjunction of tree-structured or linearly ordered attributes.

1. INTRODUCTION

In the past, very little attention has been paid to the mathematical structure of the objects involved in machine learning processes. For example, the notion of a linearly ordered or tree-structured attribute, which is commonly used in Machine Learning, is not related to well-defined mathematical properties. In fact, most of the time, machine learning mechanisms rely on the introduction of an ordering relation which is related to the notion of generality or to the notion of subsumption. This ordering relation by itself restricts the range of mathematical frameworks which can structure the representation language.

The traditional artificial intelligence approach defines knowledge representation languages before defining the properties of those languages. In the case of machine learning, there is no precise definition of the language properties which are required by the learning algorithms. This leads to some confusion, since the limitations of representation languages and the limitations of the algorithms which manipulate their expressions are not clearly distinguished. For instance, in the case of ID3-like induction systems, it appears that the attribute-value representation is a particular case of a more general representation language which could easily extend those systems. However, the classical description of the algorithm does not make the extension to more general languages obvious, since it is limited to representation languages based on attribute-value structural descriptions.

The goal here is to define some general properties of representation languages — i.e. lattice structures, distributive lattice structures, ... — on which generalization algorithms can be based. This is of great theoretical interest, since it makes the generalization and the comparison of many machine learning algorithms possible. Nevertheless, it should not be confused with learnability, whether in Gold's [Gold 67], Valiant's [Valiant 84] or other learning paradigms. The present goal is not to define general limitations of learning mechanisms but to relate machine learning algorithms to the mathematical properties of the manipulated objects. On the other hand, there have been some attempts to define a general learning framework using the notion of version space, but recent studies show that, in practice, this framework is not usable (see [Haussler 88] or [Hirsh 92]).

We shall restrict our presentation to lattices. This means that each set of descriptions has one and only one least general generalization. This restriction covers many applications in machine learning; for instance, it covers all the ID3-like systems [Quinlan 1983, 1986], but it does not cover the case where matching is multiple, i.e., where first order logic is required (cf. [Muggleton and Feng 1990]). In those cases, the notion of cylindric algebra has to be introduced. It can be seen as a generalization of the present work, but the principles on which it relies are similar to those presented here.

2. INTRODUCTION TO VERSION SPACE

Introduced by T. Mitchell [Mitchell 82], the version space has been seen as a general framework in which every machine learning algorithm could be described as a search algorithm. In this framework, similarity-based learning — SBL — can be summarized by the following points (cf. [Mitchell 82]):

"Given:
- a language in which to describe instances
- a language in which to describe generalizations
- a matching predicate that matches generalizations to instances
- a set of positive and negative training instances of a target generalization to be learned
Determine: generalizations within the provided language that are consistent with the presented training instances (i.e. plausible descriptions of the target generalization)"

It is assumed that the space of generalizations is ordered by the relation "is more general than", noted ≤g, which is defined by: G1 ≤g G2 if and only if {i∈I | M(G1,i)} ⊇ {i∈I | M(G2,i)}, where M(G,i) means that the generalization G matches the instance i, M being the matching predicate. The set of all consistent hypotheses is defined by two sets, the set of maximally specific generalizations, noted S-set, and the set of maximally general generalizations, noted G-set. Adding positive and negative instances of the target concept leads to increasing the S-set and decreasing the G-set. The algorithm stops when the S-set equals the G-set or when some inconsistency arises.

Practical [Bundy & Al. 85] and theoretical studies showed that this framework is actually not usable. The first reason is that the number of examples required to ensure the convergence of the algorithm can be exponential in the problem size. The second is that the size of the G-set can also become exponential, even in some trivial cases like the one given in [Haussler 88]. For the sake of clarity, let us recall the examples given in [Haussler 88]. On one hand, Haussler shows that if the instance space X is defined by the boolean attributes A1, A2, ..., An, and if the target concept h is supposed to be A1 = true, we need more than 2^(n-2) positive examples and more than 2^(n-2) negative examples, even if the hypothesis space is restricted to pure conjunctive concepts. The reason is that there are 2^(n-2) positive examples such that A1 = true and A2 = true, so if we want to distinguish A1 = true from A1 = true & A2 = true, we need more than 2^(n-2) positive examples. The same argument can be applied to negative examples. Therefore, the number of examples needed is exponential in the number of attributes.

On the other hand, let us suppose that X is again defined by the boolean attributes A1, A2, ..., An and that there is one positive example Q = (true, true, ..., true) and n/2 negative examples:
(false, false, true, true, true, ..., true, true, true)
(true, true, false, false, true, ..., true, true, true)
... ...
(true, true, true, true, true, ..., true, false, false)
Assume that the target concept h is a pure conjunctive hypothesis consistent with the positive example Q. Then (1) h is of the form Ai1 = true & Ai2 = true & ... & Aik = true, for some {Ai1, Ai2, ..., Aik} ⊆ {A1, A2, ..., An}, and (2) h must contain the following atoms:
either the atom A1 = true or the atom A2 = true, to exclude the first counter-example,
either the atom A3 = true or the atom A4 = true, to exclude the second counter-example,
... ...
either the atom An-1 = true or the atom An = true, to exclude the last counter-example.
Therefore, it is easy to show that the maximally general concept which meets both (1) and (2) is a disjunctive normal form containing at least 2^(n/2) conjunctions. It follows that the size of the G-set is exponential in the number of counter-examples.

Many solutions have been proposed to overcome these difficulties. One proposal is to modify the learning bias (cf. [Utgoff 86]). Another is to consider only a list of negative instances to represent the G-set [Hirsh 92]. It is also possible to provide new ad hoc representations of the version space [Nicolas 93] or to decompose the generalization language on a product of attributes [Carpineto 92], etc. It appears that all those solutions are restricted to particular cases, for instance to the cases where the S-set is conjunctive and/or where the generalization space is a lattice — i.e. each set of instances has one and only one generalization.

We claim that, the learning problem being stated as above, it is possible to get a better formalization than the one proposed by the classical version space. It is only necessary to add the hypothesis that the instance language is ordered by the generality relation. Then, the S-set and the G-set are related to the instance language and not to the generalization language, which is just used to compute an efficient generalization. In this framework, we do not have to make the S-set and the G-set converge, since it is only necessary to have a maximally general generalization consistent with the instances, i.e. a G-set. To clarify our ideas, let us restrict ourselves to the case where the description language is a Brouwerian lattice (cf. Appendix) and let us formalize the notions of S-set and G-set in this context.
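To make Haussler's G-set blow-up concrete, the following minimal Python sketch (ours, not part of the original paper; attribute indices and helper names are arbitrary) enumerates, for a small n, all pure conjunctive hypotheses consistent with the single positive example and the n/2 negative examples above, then keeps the maximally general ones:

    from itertools import chain, combinations

    def powerset(s):
        return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

    def covers(h, x):
        # the pure conjunction "Ai = true for every i in h" matches instance x
        return all(x[i] for i in h)

    n = 6
    positive = (True,) * n
    # k-th negative example: attributes 2k and 2k+1 are false, all others true
    negatives = [tuple(i not in (2 * k, 2 * k + 1) for i in range(n))
                 for k in range(n // 2)]

    consistent = [set(h) for h in powerset(range(n))
                  if covers(h, positive)
                  and not any(covers(h, ce) for ce in negatives)]

    # maximally general = minimal sets of required attributes
    g_set = [h for h in consistent if not any(h2 < h for h2 in consistent)]
    print(len(g_set))  # 8, i.e. 2 ** (n // 2)

Adding one more pair of attributes doubles the printed count, matching the 2^(n/2) bound.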

3. FORMALIZATION OF THE LEARNING PROBLEM

It is possible to formulate the learning problem as it was introduced by T. Mitchell (see above) using elementary lattice theory notions (see Appendix). To do so, let us suppose that, given a set E = {e1, e2, ..., en} of positive instances and a set CE = {ce1, ce2, ..., cem} of negative instances, a concept C has to be learned. We shall assume that the positive instances, ei, and the negative instances, cej, are described as points of a representation space ℜ which is ordered by the generality relation ≤g. We shall also assume that ℜ is a Brouwerian lattice, which means (cf. Appendix) that for each pair {a, b} there exists a least upper bound of a and b, noted (a ∨ b), a greatest lower bound of a and b, noted (a ∧ b), and a pseudo-complement of a relative to b, noted (a : b).

3.1. An example

To make the presentation more intuitive for the reader, we shall provide an example which will illustrate all the abstract notions presented in the paper. This example is drawn from [Carpineto 92] and from [Nicolas 93]. It is related to the playing card domain. We shall use a description language with three tree-structured attributes, rank, oddity and colour:

[Figure: three tree-structured attribute hierarchies. Colour: Anysuit splits into Black (♠, ♣) and Red (♥, ♦). Oddity: Anyoddity splits into Odd (1, 3, 5, 7, 9, J, K) and Even (2, 4, 6, 8, 10, Q). Rank: Anyrank splits into Numbered (1, 2, ..., 10) and Face (J, Q, K).]

Let us suppose there are two positive examples, (7 ♠) and (King ♠). The least general generalization is the greatest lower bound of (7 ♠) and (King ♠), i.e. it is the greatest element E such that E ≤g (7 ♠) and E ≤g (King ♠). In other words, E = (7 ♠) ∧ (King ♠) = (7 ♠) OR (King ♠). In the case of the classical version space, we would have E = Odd & ♠. Using only distributive lattices and the tree-structured properties, it is possible to obtain E = Anyrank & Odd & ♠ & R where R is a disjunction:
E = (((rank = 7) ∨ (oddity = 7) ∨ (colour = ♠)) ∧ ((rank = K) ∨ (oddity = K) ∨ (colour = ♠)))
E = ((colour = ♠) ∨ (((rank = 7) ∨ (oddity = 7)) ∧ ((rank = K) ∨ (oddity = K))))
E = ((colour = ♠) ∨ (rank = Anyrank) ∨ (oddity = Odd) ∨ (((rank = 7) ∨ (oddity = 7)) ∧ ((rank = K) ∨ (oddity = K))))
because, due to the properties of tree-structured attributes, we have (rank = 7) = (rank = 7) ∨ (rank = Anyrank), (rank = K) = (rank = K) ∨ (rank = Anyrank), (oddity = 7) = (oddity = 7) ∨ (oddity = Odd) and (oddity = K) = (oddity = K) ∨ (oddity = Odd).

This makes E = (colour = ♠) & (rank = Anyrank) & (oddity = Odd) & R, where R is a disjunction of the form:
R = ((rank = 7) ∨ (oddity = 7)) ∧ ((rank = K) ∨ (oddity = K))
Let us remark that the least general generalization of a and b is noted (a ∧ b), which corresponds to the logical disjunction — i.e. (a OR b) — and not to the conjunction. This surprising effect is due to the fact that the ordering relation is the generalization ordering and not implication or subsumption. As we shall see in the following, the least general generalization keeps a lot of information, which makes it very similar to a factorization process. Here, the learning mechanism which has to forget some information is related to the search procedure, not to the generalization mechanism. In fact, generalization is just used to store information and to define the search space properly.

3.2. Maximally specific generalization

As stated above, the maximally specific generalization, i.e. the S-set, is the least general generalization of the positive instances which is consistent with the negative instances. In other words, it is the greatest lower bound of E, which means that:

S = e1 ∧ e2 ∧ ... ∧ en

Using the properties of the glb — greatest lower bound — it is obvious that ∀i∈[1, n] S ≤g ei, so S generalizes all the positive instances. Moreover, if there exists another generalization, it is lower — i.e. more general — than S, which means that S is the least general generalization of the examples. However, the negative instances are not involved in this definition of S. In fact, due to the properties of lattices, a negative instance is covered by S if and only if it is covered by at least one positive instance. We shall now prove this property.

Definition: an element a ≠ ∅ is said to be ∧-irreducible (resp. ∨-irreducible) iff b ∧ c = a (resp. b ∨ c = a) implies b = a or c = a.

Lemma: if P is ∨-irreducible [resp. ∧-irreducible] in a distributive lattice L, then:
P ≤ x1 ∨ x2 ∨ ... ∨ xk implies ∃i∈[1, k] such that P ≤ xi
[resp. P ≥ x1 ∧ x2 ∧ ... ∧ xk implies ∃i∈[1, k] such that P ≥ xi]

Proof:
P ≤ x1 ∨ x2 ∨ ... ∨ xk implies P = P ∧ (x1 ∨ x2 ∨ ... ∨ xk) = (P ∧ x1) ∨ (P ∧ x2) ∨ ... ∨ (P ∧ xk) (because the lattice is distributive). P being ∨-irreducible, ∃i∈[1, k] such that P = P ∧ xi, i.e. P ≤ xi. [The proof is similar for ∧-irreducible elements.]

Remark: as noted in the Appendix, every Brouwerian lattice is distributive.

Theorem: if ∃j ∈ [1, m] such that cej ≥ S, then ∃i ∈ [1, n] such that cej ≥ ei. In other words, if some negative instance of C is covered by S, then it is also covered by some positive instance of C.

Proof: it suffices to apply the previous lemma. Two cases have to be studied: either cej is ∧-irreducible or not. In the first case, one only has to apply the preceding lemma with P = cej and xi = ei. In the second case, cej is the conjunction of ∧-irreducible elements, so

cej = a1 ∧ a2 ∧ ... ∧ ap, the ai being ∧-irreducible elements.

Therefore, it is possible to apply the preceding lemma to the ai, since it is obvious that ai ≥ cej ≥ S, which means that ai ≥ e1 ∧ e2 ∧ ... ∧ en.

Being a least general generalization, S is mainly restricted to the positive instances.

Example: the previous example shows how to compute the least general generalization, i.e. how to compute S. Let us remark that using the classical version space generalization — i.e. E = Odd & ♠ & Anyrank — would lead to a contradiction with (9 ♠) as counter-example, while the present generalization — i.e. E = ((colour = ♠) ∨ (rank = Anyrank) ∨ (oddity = Odd) ∨ (((rank = 7) ∨ (oddity = 7)) ∧ ((rank = K) ∨ (oddity = K)))) — would not. Here, the generalization is essentially viewed as a factorization. However, as we shall see in the following, the description language is a distributive lattice, so it may involve a product of distributive lattices, each one being associated with some hierarchy. In this case, the generalization may correspond to what people call "climbing a hierarchy of predicates".
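For this product-of-hierarchies case, the attribute-wise part of S can be computed by climbing each tree to the least common ancestor of the observed values. The sketch below is our own illustration, not the paper's algorithm; the tree encodings are assumptions read off the card figure, and the sketch computes only the attribute-wise generalization (Anyrank, Odd, ♠), not the extra disjunctive term R that the Brouwerian lgg retains:

    # Each tree-structured attribute is encoded as a child -> parent map.
    RANK = {"7": "Numbered", "Q": "Face", "K": "Face",
            "Numbered": "Anyrank", "Face": "Anyrank"}
    ODDITY = {"7": "Odd", "K": "Odd", "Q": "Even",
              "Odd": "Anyoddity", "Even": "Anyoddity"}
    COLOUR = {"♠": "Black", "♣": "Black", "Black": "Anysuit", "Red": "Anysuit"}
    TREES = (RANK, ODDITY, COLOUR)

    def ancestors(v, parent):
        # path from a value up to the root of its hierarchy
        path = [v]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    def lca(a, b, parent):
        # least common ancestor = least general generalization of two values
        bs = set(ancestors(b, parent))
        return next(x for x in ancestors(a, parent) if x in bs)

    def lgg(e1, e2, trees=TREES):
        # attribute-wise least general generalization of two instances
        return tuple(lca(a, b, t) for a, b, t in zip(e1, e2, trees))

    # (7 ♠) and (K ♠), each described as a (rank, oddity, colour) triple
    print(lgg(("7", "7", "♠"), ("K", "K", "♠")))  # ('Anyrank', 'Odd', '♠')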

3.3. Maximally general generalization

As described above, the maximally general generalization G — the so-called G-set — can be a huge formula, even in trivial cases. We now define the G-set using lattice operators so as to provide a useful representation of the version space. In the Appendix, we define the difference operator, noted /, which is the dual of the pseudo-complement. More precisely, the difference b/a is the least x — i.e. the most general — such that (a ∨ x) ≥ b. The considered lattices being Brouwerian, it is obvious that there exists a difference for each pair of elements. Using the difference operator, it is then possible to define the G-set by the following formula:

G = (e1 ∧ e2 ∧ ... ∧ en) / (ce1 ∧ ce2 ∧ ... ∧ cem)
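Before the proof, here is a toy numeric check of this formula (ours, not from the paper). The divisors of 30 ordered by divisibility form a finite distributive — hence Brouwerian — lattice with ∨ = lcm and ∧ = gcd, so the difference can be found by brute force:

    from math import gcd
    from functools import reduce

    DIVISORS = [d for d in range(1, 31) if 30 % d == 0]
    lcm = lambda x, y: x * y // gcd(x, y)

    def difference(b, a):
        # least x (w.r.t. divisibility) such that a ∨ x ≥ b,
        # i.e. b divides lcm(a, x)
        sols = [x for x in DIVISORS if lcm(a, x) % b == 0]
        return next(x for x in sols if all(s % x == 0 for s in sols))

    positives, negatives = [15, 30], [6, 2]   # arbitrary toy "instances"
    S = reduce(gcd, positives)    # ∧ ei  = 15
    CE = reduce(gcd, negatives)   # ∧ cej = 2
    G = difference(S, CE)         # least x with x ∨ CE ≥ S
    print(S, CE, G)               # 15 2 15

Here G = 15 indeed divides both positives (covers them) and divides neither negative.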

To prove that G effectively corresponds to the G-set, we have to prove the three following points:
1- G covers all the positive instances, i.e. ∀i∈[1, n] G ≤ ei.
2- G does not cover any negative instance, i.e. ∀j∈[1, m] ¬(G ≤ cej).
3- G is the most general element satisfying 1 and 2.

1- G covers all the positive instances
Proof: by definition, G is the least element such that G ∨ (ce1 ∧ ... ∧ cem) ≥ e1 ∧ ... ∧ en; it follows that G ≤ e1 ∧ ... ∧ en because, if it were not the case, G ∧ (e1 ∧ ... ∧ en) would be a solution...

Therefore ∀i ∈ [1, n] G ≤ ei.

2- G does not cover any negative instance
It is possible to prove that if G covers a negative instance, then this instance is also covered by S.
Proof: let us assume that there exists a negative instance, cek, which is covered by G, i.e. such that G ≤ cek. Using distributivity we obtain:
G ∨ (ce1 ∧ ... ∧ cem) = (G ∨ ce1) ∧ ... ∧ (G ∨ cem)
But, as cek ≥ G, we have G ∨ cek = cek, so cek is one of the terms of the conjunction above, and therefore:
cek ≥ (G ∨ ce1) ∧ ... ∧ (G ∨ cem) = G ∨ (ce1 ∧ ... ∧ cem) ≥ e1 ∧ ... ∧ en

Which means that S ≤ cek.

3- G is the most general element which satisfies 1 and 2
Proof: it follows naturally from the definition of the difference: G is by construction the least — i.e. most general — element x such that x ∨ (ce1 ∧ ... ∧ cem) ≥ e1 ∧ ... ∧ en.

The preceding formula makes the computation of G possible, but it is very inefficient. As Hirsh [Hirsh 92] shows, it is only necessary to provide some representation of the version space which makes it possible to: 1- test whether an instance belongs to the version space and 2- update the version space. Hirsh proposes to memorize a list of negative instances. Here, the proposed solution is similar: it is to memorize the least general generalization of the positive instances, i.e. S, and the least general generalization of the negative instances. In other words, we just have to memorize:

ce1 ∧ ce2 ∧ ... ∧ cem and e1 ∧ e2 ∧ ... ∧ en

It is interesting to see that general considerations on the mathematical structure of the objects manipulated by learning algorithms lead to practical consequences. Moreover, it appears that our solution improves on the solution proposed by Hirsh, because we only have to memorize the generalization of the negative instances and not the list of all the instances. Even if the generalization keeps a lot of information, it is more efficient to consider only the generalization of the negative instances.

Example: let us consider again the previous example and let us suppose there are two positive instances, (7 ♠) and (K ♠), and two negative instances, (7 ♣) and (Q ♠). The least general generalization of the examples is E = ((colour = ♠) ∨ (rank = Anyrank) ∨ (oddity = Odd) ∨ (((rank = 7) ∨ (oddity = 7)) ∧ ((rank = K) ∨ (oddity = K)))) while the least general generalization of the counter-examples is CE = ((colour = Black) ∨ (rank = Anyrank) ∨ (oddity = Anyoddity) ∨ (((rank = 7) ∨ (oddity = 7) ∨ (colour = ♣)) ∧ ((rank = Q) ∨ (oddity = Q) ∨ (colour = ♠)))). Now, let us suppose we want to generate some conjunctive hypothesis h belonging to the version space. It is the least general conjunction G such that G ∨ CE ≥ E, i.e. G = E/CE. In our example, neither G = (colour = ♠) nor G = (oddity = Odd) is a solution, so there is only one conjunctive solution, which is (colour = ♠) ∨ (oddity = Odd), i.e. (colour = ♠) & (oddity = Odd). As we see on this example, in the case of conjunctions of attributes, it is very easy to compute the solution with this formalism. It is also easy to detect that no conjunctive solution exists and to compute disjunctive solutions.
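This example can be checked by brute force. The sketch below is again our own illustration, with the same hypothetical card-hierarchy encoding as the earlier sketch (repeated here so the snippet runs on its own); it enumerates every conjunction of one node per attribute and keeps those covering both positives while excluding both negatives:

    from itertools import product

    RANK = {"7": "Numbered", "Q": "Face", "K": "Face",
            "Numbered": "Anyrank", "Face": "Anyrank"}
    ODDITY = {"7": "Odd", "K": "Odd", "Q": "Even",
              "Odd": "Anyoddity", "Even": "Anyoddity"}
    COLOUR = {"♠": "Black", "♣": "Black", "Black": "Anysuit", "Red": "Anysuit"}
    TREES = (RANK, ODDITY, COLOUR)

    def more_general(g, v, parent):
        # g ≤g v iff g is v or one of its ancestors in the hierarchy
        while v != g:
            if v not in parent:
                return False
            v = parent[v]
        return True

    def covers(h, card, trees=TREES):
        return all(more_general(g, v, t) for g, v, t in zip(h, card, trees))

    positives = [("7", "7", "♠"), ("K", "K", "♠")]
    negatives = [("7", "7", "♣"), ("Q", "Q", "♠")]

    nodes = lambda parent: sorted(set(parent) | set(parent.values()))
    solutions = [h for h in product(nodes(RANK), nodes(ODDITY), nodes(COLOUR))
                 if all(covers(h, e) for e in positives)
                 and not any(covers(h, ce) for ce in negatives)]
    print(solutions)  # [('Anyrank', 'Odd', '♠')], i.e. (colour = ♠) & (oddity = Odd)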

4. STRUCTURE OF LATTICES

So far we have seen how to circumscribe the version space in ℜ; we shall now study the exploration of ℜ and then its structure. Let us remember that ℜ is a distributive lattice. Using classical results from lattice theory, it is possible to define a particular class of distributive lattices for which the complexity of the exploration is linear in the number of points, i.e. in the number of symbols belonging to ℜ. These lattices, as we shall see, will play a central role in the structure of ℜ. This section introduces this class of lattices and goes on to show how it is possible to structure ℜ.

Definition: A subset A of a poset X is said to be ∧-closed if and only if ∀a∈A, ∀x, x ≤ a ⇒ x∈A.

Theorem: the free distributive lattice generated by n symbols, ∅ and I, is dually isomorphic to the ring of all ∧-closed subsets of {1, 2, ..., n}.
Proof: See [Birkhoff 67].

Definition: A poset X is said to be hierarchical if and only if ∀(a, b) ∈ X², a ≥ b or a ≤ b or {x | x ≥ a and x ≥ b} = ∅. This means that a ∨ b = a if a ≥ b, b if a ≤ b, and I otherwise.

Example: all three attributes Colour, Oddity and Rank are built on hierarchical posets, since the join of any two values is either one of the two values or the empty description. For instance, (colour = ♠) ∨ (colour = ♣) = I while (colour = ♠) ∨ (colour = black) = (colour = ♠).

Theorem: Any element belonging to a distributive lattice L built on a hierarchical poset X is a greatest lower bound of elements of the poset X.
Proof: See [Ganascia 92].

Therefore, in the case of distributive lattices built on hierarchical posets, only the greatest lower bounds of elements of the hierarchical posets have to be considered. In other words, using hierarchical posets, the search space, i.e. the portion of the distributive lattice ℜ which has to be explored, is restricted to the ring of all elements of the poset X on which ℜ is built. In practice, hierarchical posets correspond to attributes whose values are disjoint, linearly ordered or, more generally, partially ordered through a hierarchy.

Hierarchical posets correspond to hierarchies of propositions. A single hierarchical poset is not a sufficient representation, since it would restrict descriptions to just one attribute at a time. To increase the representational power, therefore, the product of distributive lattices built on hierarchical posets can be introduced. Since it has been proved that the product of distributive lattices is a distributive lattice, the resulting lattice ℜ is just the product of n lattices built on hierarchical posets. This structure can be interpreted in terms of machine learning as a set of attributes, each of which is a hierarchical attribute, i.e. an attribute whose values belong to a hierarchy. This is exactly the structure of the representation that is used in [Mitchell 82] and in many other papers in the literature. It can also be seen as an extension of the attribute-value vectors as they are used in the classical TDIS algorithms like ID3, CN2, etc. The use of tree-structured attributes is a particular case of this structure, so the previous example falls within this framework. One of the main advantages of this structure is that it allows a direct introduction of knowledge through hierarchies, whether this knowledge be composed of exclusive values, ordered attributes or general hierarchies.
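A minimal sketch of the hierarchical-poset join given by this definition (our illustration; comparability is tested by walking a child → parent map, and the string "I" stands for the adjoined top element):

    # colour hierarchy as a child -> parent map (values assumed from the figure)
    COLOUR = {"♠": "Black", "♣": "Black", "♥": "Red", "♦": "Red",
              "Black": "Anysuit", "Red": "Anysuit"}

    def leq(a, b, parent):
        # a ≤ b iff a is b or one of b's ancestors (a is more general than b)
        while b != a:
            if b not in parent:
                return False
            b = parent[b]
        return True

    def join(a, b, parent, top="I"):
        # a ∨ b in a hierarchical poset: the more specific of two comparable
        # elements, the adjoined top I otherwise
        if leq(b, a, parent):
            return a
        if leq(a, b, parent):
            return b
        return top

    print(join("♠", "Black", COLOUR))  # ♠
    print(join("♠", "♣", COLOUR))      # I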

5. CONCLUSION

The aim of this paper was to give a clear, intuitive presentation of the proposed formal framework, which is a nice formulation of the version space. Other work (see [Ganascia 93]) based on the same formalism makes it possible to generalize the classical Top Down Induction Techniques and to extend them with respect to learning strategies and representation. In a few words, it allows the computation of an approximation of the G-set with k-CNF formulas. Therefore, it makes it possible to establish a strong link between the version space and the Top Down Induction Techniques.

However, the framework is not only of theoretical interest, and the next step is to use it in a practical manner along three lines. The first point is to use it to test different learning strategies. It will be implemented so as to enable an easy introduction of many different learning strategies, thus allowing classical "ID-like" or "CN-like" strategies to be evaluated and compared with new ones. The second point concerns the ability to modify the learning bias and to express it as part of the knowledge. In the case of the classical version space, the generalization language is given at the very beginning, when the S-set and the G-set are computed, while in our framework it need only be given when the generalization step occurs. This renders possible a dynamic transformation of the part of the learning bias which is related to the generalization language. The third point is related to the extension of the proposed framework to cylindric algebras, in order to take into account cases where the reduction to an attribute-value language is not possible.

The final point is mainly of theoretical interest and involves examining the exact role of the Brouwerian logic on which the new model is based. The introduction of Brouwerian logic was in fact a surprising spin-off, considering that the only objective was to get a non-atomic lattice as a description space. This question is directly related to the fact that distributive complemented lattices are necessarily atomic, which means that the introduction of classical negation leads to a Boolean representation. Since the aim was to justify the attribute-value representation, a simplification operation had to be introduced without using complementation; the choice was pseudo-complementation. It now seems interesting to investigate the links between induction and intuitionism in more depth. Concerning this point, our intuition is that it could enlighten the philosophical debate related to induction, since the classical Hempel objection could be refuted.

6. REFERENCES

[Birkhoff 1967] Birkhoff G., Lattice Theory, Third Edition, American Mathematical Society, Providence, RI, 1967.
[Blumer & Al. 1987] "Occam's razor", Information Processing Letters 24, 1987.

[Bundy & Al. 1985] Bundy A., Silver B., Plummer D., "An analytical comparison of some rule-learning programs", Artificial Intelligence 27, 1985.
[Carpineto 1992] "Trading off Consistency and Efficiency in Version-Space Induction", ICML-92.
[Clark and Niblett 1987] "The CN2 Induction Algorithm", Machine Learning, 3, p. 261-283.
[Ganascia 1992] "An algebraic formalization of CHARADE", LAFORIA, Internal report, November 1992.
[Ganascia 1993] "TDIS: an Algebraic Formalization", to appear in IJCAI-93, August 1993, Chambéry, France.
[Gold 1967] "Language identification in the limit", Information and Control, Vol. 10, pp. 447-474.
[Grätzer 1978] General Lattice Theory, Academic Press, New York, 1978.
[Haussler 1988] "Quantifying Inductive Bias: AI Learning Algorithms and Valiant's Learning Framework", Artificial Intelligence 36, 1988.
[Hirsh 1992] "Polynomial-Time Learning with Version Space", ICML-92.
[Mitchell 1980] Mitchell T., "The Need for Biases in Learning Generalizations", internal report, Rutgers University, May 1980; reprinted in Readings in Machine Learning, Morgan Kaufmann.
[Mitchell 1982] Mitchell T., "Generalization as search", Artificial Intelligence 18, p. 203-226, 1982.
[Muggleton and Feng 1990] "Efficient Induction of Logic Programs", Proceedings of the First Conference on Algorithmic Learning Theory, Tokyo, Japan: Ohmsha, 1990.
[Nicolas 1993] Nicolas J., "Une représentation efficace pour les espaces des versions" (An efficient representation for version spaces).
[Quinlan 1983] "Learning efficient classification procedures", in Machine Learning: an Artificial Intelligence Approach, Michalski, Carbonell & Mitchell (eds.), Morgan Kaufmann, 1983, p. 463-482.
[Quinlan 1986] "The effect of noise on concept learning", in Machine Learning: an Artificial Intelligence Approach, Vol. II, Michalski, Carbonell & Mitchell (eds.), Morgan Kaufmann, 1986, p. 149-166.
[Rasiowa 1974] An Algebraic Approach to Non-Classical Logics, Studies in Logic and the Foundations of Mathematics, North-Holland, Amsterdam-London, 1974.
[Utgoff 1986] Utgoff P., "Shift of bias for inductive concept learning", in Machine Learning 2, Morgan Kaufmann, 1986.
[Valiant 1984] "A theory of the learnable", Comm. ACM 27 (11), 1984, p. 1134-1142.
[Wille 1989] "Knowledge Acquisition by Methods of Formal Concept Analysis", Technische Hochschule Darmstadt, Fachbereich Mathematik, July 1989, preprint N° 1238.

7. APPENDIX

This is an introduction to some basic notions of lattice theory. More details can be found in [Rasiowa 74], [Grätzer 78] and [Birkhoff 67].

Definition: A poset is a partially ordered set, i.e. a set with a partial ordering relation ≤.

Definition: A lattice is a poset E such that there exist a least upper bound and a greatest lower bound for each pair (a, b) of elements of E. The least upper bound is noted (a ∨ b) and the greatest lower bound (a ∧ b).

Definition: A lattice is distributive if and only if it satisfies (1) or (2):
(1) ∀(x, y, z) x ∧ (y ∨ z) = (x ∧ y) ∨ (x ∧ z)
(2) ∀(x, y, z) x ∨ (y ∧ z) = (x ∨ y) ∧ (x ∨ z)

Definition: The pseudo-complement of a relative to b is the greatest element x such that a ∧ x ≤ b. It is noted "a : b".

Since we need to simplify, i.e. to subtract, to build the difference between two descriptions, the dual of the relative pseudo-complement will be defined and named "difference".

Definition: The difference "b/a" is the dual of the pseudo-complement of a relative to b. In other words, it is the least x such that a ∨ x ≥ b.

Definition: a Brouwerian lattice is a lattice in which there exists a difference "b/a" for each pair of elements (a, b).

Theorem: Every Brouwerian lattice is distributive.
Proof: See [Birkhoff 67].

Theorem: Every finite distributive lattice is a Brouwerian lattice.
Proof: See [Birkhoff 67].
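As a toy illustration of the pseudo-complement (our sketch, complementing the difference computed earlier), consider again the divisor lattice of 30, where ≤ is divisibility and ∧ = gcd:

    from math import gcd

    DIVISORS = [d for d in range(1, 31) if 30 % d == 0]

    def pseudo_complement(a, b):
        # greatest x (w.r.t. divisibility) such that a ∧ x ≤ b,
        # i.e. gcd(a, x) divides b
        sols = [x for x in DIVISORS if b % gcd(a, x) == 0]
        return next(x for x in sols if all(x % s == 0 for s in sols))

    print(pseudo_complement(6, 15))  # 15: gcd(6, 15) = 3, which divides 15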
