IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 10, NO. 4, AUGUST 2002


Fuzzy Models of Language Structures

Fransisco Criado Torralba, Tamaz Gachechiladze, Hamlet Meladze, and Guram Tsertsvadze

Abstract—Statistical distributions of language structures reflect important regularities controlling the informational and psycho-physiological processes that accompany the generation of verbal language or printed texts. In this paper, fuzzy quantitative models of language statistics are constructed. The suggested models are based on the assumption of a superposition of two kinds of uncertainty: probabilistic and possibilistic. The realization of this superposition in statistical distributions is achieved by the splitting procedure of the probability measure. In this way, fuzzy versions of the generalized binomial, Fucks’, and Zipf–Mandelbrot distributions are constructed, describing the probabilistic and possibilistic organization of language at any level: morphological, syntactic, or phonological.

Index Terms—Fuzzy sets, linguistic modeling, membership functions, probability theory.

One has

Taking the previous into account, there are two direct possibilities of grouping the terms 1)

I. SET SPLITTING

LET be a finite set and any subset, , where and . Consider a correspondence , where is the indicator of subset ,

i.e., the grouping corresponds to the splitting of

(3)

(1) In this paper, it is supposed that is the support of the mappings and [i.e., , ]. More and determines the support precisely, is the support of : , that is if is of , ), a subnormal fuzzy subset ( . In other cases . In all instances determines the support of . Notice that “ ” does not denote especially the functional is called a correspondence but only that any pair if and equasplitting of a crisp set tion (1) holds. and are According to [1], the splitting components the dual subset with respect to . fuzzy subsets of . Call induces The splitting procedure of some subsets corresponding splitting of the union and intersection of these two subsets. Let and and the results of be split when and are their splitting. How can given? This splitting must satisfy the natural requirement

, when

This splitting has the same character as in.1 Indeed, consider and first split as

Now, split

as

This leads to (3). Evidently, the splitting order is not essential. For this reason, [2], (3), and in are called “sequential splitting” 2)

if

,

(2) if

Manuscript received February 6, 2001; revised October 30, 2001 and February 26, 2002.
F. Criado Torralba is with the Department of Statistics, Málaga University, Málaga E-29071, Spain (e-mail: [email protected]).
T. Gachechiladze, H. Meladze, and G. Tsertsvadze are with the Department of Applied Mathematics and Computer Science, Tbilisi State University, Tbilisi 380028, Georgia.
Publisher Item Identifier 10.1109/TFUZZ.2002.800655.

1 Throughout the paper, keep in mind that the tilde (or double tilde) in the notation for the intersection and union of crisp subsets A and B stands for the splitting induced by the splitting of the corresponding components A and B.

1063-6706/02$17.00 © 2002 IEEE


i.e., this grouping corresponds to the following splitting of the intersection

(4)

Such splitting is called “simultaneous” in [2]. In the case of the union , fulfillment of the natural requirement

is demanded, and operating as in the case of intersection one can easily obtain

(sequential splitting)

(5)

(simultaneous splitting)

(6)

Now consider the Cartesian product. Let two universal sets and be given, and , . If their splittings are given, then according to

one can write

(sequential splitting)

(7)

(simultaneous splitting)

(8)

Finally, some simple relations connected with the dual subset are listed, which can be proved directly from the following definitions:

1) involution ;
2) connection with Zadeh’s complement , where is the usual complement of in the universal set;
3) duality laws for intersection and union (true for both kinds of splitting)

(9)

(10)

which turn into De Morgan’s laws for fuzzy subsets when and , i.e., when the supports of both mappings and coincide with the universal set.

II. THE LATTICE OF SPLIT ELEMENTS OF ORDINARY INDICATORS’ BOOLEAN LATTICE

First, let some definitions be considered.

Definition 1: A set , partially ordered, is called a lattice if any two elements and have an infimum (intersection ) and a supremum (union ). Brouwer and Heyting have introduced a generalization of the Boolean algebra (see [4]).

Definition 2: The lattice is called a Brouwer’s lattice if for any given element of this lattice and the set of all such that has a greatest element,2 called the relative pseudocomplement of in . The relative pseudocomplement of in , is called the pseudocomplement of and is denoted by .

Consider the Boolean lattice with natural ordering (if and for , , then ). The set of all split elements of this lattice with natural ordering is a lattice.

Theorem 1: This lattice of split elements is a Brouwer’s lattice.

The direct demonstration of this theorem [i.e., the demonstration that for any two elements and the set of all such that3 has the greatest element called the relative pseudocomplement of in ] can be done according to [4]. It is easy to see that

(11)

where is a pseudocomplement of and, as a function of , represents the indicator of the usual complement of the set in . Next, the following theorem is easy to demonstrate. The theorem shown below is well known, but here it is presented in terms of fuzzy subsets.

Theorem 2: The following statements hold in the lattice : If

then

(12)

The two theorems previously considered, as will be seen further on, are important when calculating probabilities of some random events.

2 The greatest element of the set X is an element b ∈ X such that x ≤ b for all x ∈ X.

3 denotes the natural order: Ã B̃ ⇒ Ã B̃ .

III. THE SPLITTING OF A SET

The splitting of a set, which corresponds to the indicator splitting, as seen, is represented as

(13)

Here, is the operation of set synthesis. The term “splitting” of corresponds to the splitting procedure of the classical indicator resulting in the pair

CRIADO TORRALBAet al.: FUZZY MODELS OF LANGUAGE STRUCTURES


. The term “split element” means the first element of a pair , i.e., . All results can be formulated in terms of the splitting procedure. All formulas in this paper must be considered as rules for operating with the first component of dual pairs in practical use of the theory of fuzzy sets, and they can be justified by the splitting procedure. On the basis of (13) one can obtain a more general expression

that, obviously, will make sense under the condition that , or . One can also obtain the existence conditions for expressions , etc. Considering that such a condition holds for the aforementioned expressions, one can easily prove that

8) the annihilation laws: and ;
9) involution law for fuzzy complement: ;
10) identity laws: , and , ;
11) order inversion laws: and ;
12) De Morgan’s laws: and .

In connection with the introduced notion of dual subsets, one can prove the following laws:

13) involution law for the dual subset:

14) duality laws for the union and intersection of split subsets:

(14) For example, to prove law 13), one can write

For example, to prove the last two formulas, one can write

On the other hand, according to (12)

Let it be assumed that in these formulas the following relations hold:

(15)

which are evident because of (4), (6), and (13).

In the lattice of split subsets, almost all Boolean lattice rules hold:

1) reflexivity: ;
2) antisymmetry: , ;
3) transitivity: , ;
4) idempotency: and ;
5) commutativity: and ;
6) associativity: and ;
7) distributivity: and ;

Comparing these expressions, one obtains the required proof. Now, to prove the second law (14) one has

Notice that in lattice the laws of contradiction and tertium non datur do not hold.
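The failure of the laws of contradiction and tertium non datur, alongside the survival of De Morgan’s laws, can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the standard pointwise min/max operations and Zadeh’s complement 1 − μ on a finite universe, which may differ from the split-subset operations defined in this paper. The function names and the membership values are hypothetical.

```python
def fuzzy_and(mu_a, mu_b):
    """Pointwise minimum (assumed intersection)."""
    return [min(a, b) for a, b in zip(mu_a, mu_b)]


def fuzzy_or(mu_a, mu_b):
    """Pointwise maximum (assumed union)."""
    return [max(a, b) for a, b in zip(mu_a, mu_b)]


def zadeh_complement(mu):
    """Zadeh's complement 1 - mu, applied pointwise."""
    return [1.0 - a for a in mu]


# Hypothetical membership values on a four-point universe.
mu = [0.0, 0.3, 0.5, 1.0]
nu = [0.2, 0.9, 0.5, 0.4]

# Law of contradiction would require this to be identically 0.
contradiction = fuzzy_and(mu, zadeh_complement(mu))

# Tertium non datur would require this to be identically 1.
excluded_middle = fuzzy_or(mu, zadeh_complement(mu))

# De Morgan: complement of the intersection equals union of complements.
de_morgan_holds = zadeh_complement(fuzzy_and(mu, nu)) == fuzzy_or(
    zadeh_complement(mu), zadeh_complement(nu)
)

print(contradiction, excluded_middle, de_morgan_holds)
```

Both laws fail exactly at the points where the membership value is strictly between 0 and 1, which is the sense in which fuzziness separates this lattice from a Boolean one.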


In Section I, some examples of indicator splitting were considered. Let other examples be considered. . Let be the universal set, Splitting the set difference . The equality holds. If one splits subsets and , then the splitting of this equality, according to (4), will be

Under the condition that is a narrowing of . responding , i.e., Let be the universal set again but now It is widely known that

on cor.

(16) For splitting the symmetric difference

The splitting of

, one has

leads to the formula

(17) On the other hand (21) Thus, for the split symmetric difference, one also has the formula

. Under the condition that Splitting of the element of the universal set (fuzzy point) (22) where

According to (17) (18)

(23)

Actually, taking into account law 7), one has

Theorem 3: Let be the universal set, and a corresponding split point; then the splitting of the universal set determined by the splitting of the point will be the relative pseudocomplement of in :

(24)

Proof:

Equations (17) and (18) can be rewritten as

(19)

Let be the universal set and . By the equality

the splitting of , according to the aforementioned example, leads to the equality

(20)

IV. DUAL ELEMENT AND FUZZINESS (QUALITATIVE CONSIDERATION)

As illustrated before, the dual element plays an important role in describing split subset lattices. Now, the role of the dual element in understanding fuzziness will be considered. There is an important difference between usual and fuzzy subsets: the usual subset (set) can be represented as an aggregate of real objects, whereas only the real measured potential possibility of aggregate formation corresponds to fuzzy subsets. A fuzzy subset is a medium of formation for a real aggregate. It is important to notice that the term “medium of formation” is borrowed from [5] to underline the following circumstance: any sequence of research outcomes is a result of acts of free decision-making by the subject (observer); any concrete sequence is a crisp finite subset of some universum, but the fuzzy subset is analogous to Weil’s continuum.

In the lattice of fuzzy subsets, a dual element is defined by the splitting procedure [2], [3]. Its sense can be explained as follows: the value of the membership function is a degree of concordance of an element with the concept represented by ; the value has the same sense with respect to the concept represented by , which in the pair with , defines a crisp subset . The nearer (in some sense) to and


[6], then more fuzzy is the statement “elements of possess property ( ).” Below, a qualitative description of fuzziness is considered analogously with [6], but with the following difference: in [6], the fuzziness is characterized by the relation between and Zadeh’s negation . In the present case, the less rigid relation between and , which in the authors’ opinion underlines the fact that fuzziness is an intrinsic property of and is independent of pseudocomplement, is assumed as a basis. The basis for considering the relation is a relation in a distributive lattice, “ is between and ” [6].

Definition 3: Let , (distributive lattice). is not less fuzzy than , if and are in between and . Here means sequential splitting of the intersection of and [3]. So that

(25)

Theorem 4: Relation is reflexive and transitive on , i.e.,

and

It can be seen that on is not antisymmetric and, hence, it is not a partial order. is such that Theorem 5: Relation on and ; 1) . 2) define such a relation such that On the lattice if or or . is an equivalence relation. Each equivalence class consists of a fuzzy subset and , then the equivalence class corresponding dual. If consists of only one element. Check that is really an equivalence relation. The reflexivity and and let and symmetry are obvious. Let ; if the second means that from the first follows then in this case ; and if then . The other cases are checked analogously. Thus, and or or . The transitivity is thus proved. The subset consisting of any fuzzy subset and its corresponding dual subset is called the dual pair. According to Theorem 5, if one component of the dual pair is more fuzzy than any component of the other pair, then any component of the first pair is more fuzzy than any component of the second pair. So it is reasonable to introduce the notion of fuzziness of the dual pair. Definition 4: Let be a set of dual pairs. Define on a takes place , then relation such that if for one can say that the dual pair is no less fuzzy than the dual pair . Theorem 6: The relation on the set of dual pairs is a partialorder relation. Proof: . Consider the component , as 1) Reflexivity: then . and . 2) Transitivity: means that for all and all .

means that for all and all . According to Theorem 4, , , and , consequently .
3) Antisymmetry: if and , then . One has: if and , then . According to Theorem 5, or and ; i.e., or . Besides, it must be . Consequently .

The dual pair corresponds to the splitting procedure. From the aforementioned consideration, one can speak about a rise in fuzziness at splitting (fuzziness of splitting). Relation defines a partial order in the set of split subsets.

The properties of fuzzy subsets studied in the previous sections enable one to consider problems of fuzzy subset theory in terms of the splitting procedure.4 However, the aim here is to consider fuzzy quantitative models of language structures.

Remark 1: The term “splitting” of corresponds to the splitting procedure of the classical indicator, which results in the pair ( , ). The term “split element” refers to the component of a pair ( , ), for instance . All results can be formulated in terms of the splitting procedure. All formulas in this paper must be considered as rules for dual pair operations with respect to the practical use of the theory of fuzzy sets and can be supported by the splitting procedure.

V. PROBABILITY MEASURE SPLITTING

Let be a given probability space. The probability of the event is calculated from the formula

(26)

According to the splitting procedure of the set , this formula can be rewritten in the following form:

(27)

where is a -measurable membership function (the corresponding subset is a fuzzy random event). Define and as follows:

and

(28)

the probability of fuzzy event and the probability of dual fuzzy event , correspondingly. Call the representation

(29)

the procedure of probability measure splitting.

4 This idea, put forward by V. Kreinovich, is very elegant, especially for considering fuzziness quantitatively; this fundamental quality was presented in terms of the dual pair (X~; X~ ). The authors would like to thank the referee for his interesting suggestions.
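A small numerical illustration of probability measure splitting on a finite sample space may help. The sketch assumes, as a reading of the splitting procedure, that the indicator of a crisp event splits additively into a membership function and its dual, and it takes the probability of a fuzzy event to be the expectation of its membership function, following Zadeh [9]. The sample space, the probabilities, the membership values, and the function name are all made up for illustration.

```python
def fuzzy_event_probability(p, mu):
    """Expectation of a membership function under a discrete distribution p."""
    return sum(p[w] * mu[w] for w in p)


# Hypothetical finite probability space.
p = {"w1": 0.2, "w2": 0.3, "w3": 0.4, "w4": 0.1}

# Crisp event A = {w1, w2, w3}, described by its indicator.
indicator_A = {"w1": 1.0, "w2": 1.0, "w3": 1.0, "w4": 0.0}

# Assumed additive splitting: indicator_A = mu_A + mu_A_dual pointwise.
mu_A = {"w1": 0.9, "w2": 0.5, "w3": 0.25, "w4": 0.0}
mu_A_dual = {w: indicator_A[w] - mu_A[w] for w in p}

prob_A = fuzzy_event_probability(p, indicator_A)    # crisp probability of A
prob_fuzzy = fuzzy_event_probability(p, mu_A)       # probability of the fuzzy event
prob_dual = fuzzy_event_probability(p, mu_A_dual)   # probability of its dual

print(prob_A, prob_fuzzy, prob_dual)
```

Under the additive assumption the two split probabilities sum to the crisp probability of A, which is the sense in which the measure itself is split.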


From , where immediately follows that:

The main property of additivity

is an arbitrary subset, it

(33) can be split in the following way. The left-hand side is Furthermore, since

, it is evident that (34)

Because

From the properties of the Lebesgue–Stieltjes integral the properties of the split probability measures are as follows: ; 1) monotony: 2) continuity with respect to monotonic sequences:

are not intersected, then

(35)

(29 ) 3) strong additivity: and the right-hand side 4)

-semiadditivity:

(29 ) Finally, the property of additivity for split events is written in the following way:

(36)

Measure 3) follows from the equality:

and measure 4) from the inequality or (37)

(30)

Thus, the denumerable additivity property also holds for fuzzy subsets. Similarly

(31)

A. Conditional Probability (Nonsplit Conditions)

The condition expressed by the phrase “for a given event ” means that the initial probability space is replaced by the probability space , where is a conditional probability measure. As is known, the conditional probability of some event is the conditional mathematical expectation of its indicator

, then also, , . From these relations, it follows that for nonintersecting subsets, the law (14) can be represented as

Actually, if

(38) (14 ) In [7], this quantity is interpreted as the value of function

In general, for any finite (32)

(39)


More generally, if is the denumerable partition of and is the minimal -field induced by this partition, then the -measurable function

Then, the right-hand side of (38) can be represented as

(40) is a value of the conditional probability for a given -field . The procedure of splitting indicator leads to the notion of the conditional probability of fuzzy event for a given (nonsplit) -field (41) and

(45)

The left-hand side is equal to . The authors obtained the descriptive definition of conditional probability [7]. The conditional expectation for a given (conditional probability of for a given ) is the -measurable function whose indefinite integral with respect to is a narrowing on of the definite integral of with respect to

(46)

(42)
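For a finite partition, the conditional probability of a fuzzy event given a nonsplit (crisp) partition can be sketched as the conditional expectation of the membership function on each block. This is an illustrative reading only, not a transcription of (41) and (42): the partition, the probabilities, the membership values, and the function name are hypothetical.

```python
def conditional_fuzzy_probability(p, mu, partition):
    """Return {block name: P(fuzzy event | block)} for a finite partition."""
    result = {}
    for name, block in partition.items():
        block_prob = sum(p[w] for w in block)
        if block_prob == 0:
            continue  # conditioning on a null block is left undefined
        result[name] = sum(p[w] * mu[w] for w in block) / block_prob
    return result


# Hypothetical data: four outcomes, a fuzzy event, and a two-block partition.
p = {"w1": 0.2, "w2": 0.3, "w3": 0.4, "w4": 0.1}
mu = {"w1": 0.9, "w2": 0.5, "w3": 0.25, "w4": 0.0}
partition = {"B1": ["w1", "w2"], "B2": ["w3", "w4"]}

cond = conditional_fuzzy_probability(p, mu, partition)

# Total probability: averaging the conditional values over the blocks
# recovers the unconditional probability of the fuzzy event.
total = sum(cond[name] * sum(p[w] for w in block)
            for name, block in partition.items())

print(cond, total)
```

The last step is the discrete counterpart of integrating the conditional probability function over the partition.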

It is easy to see that ; 1) , or is a -measurable, then 2) if a.s. Note that (46) also makes sense for nondenumerable partitions [7]. B. Conditional Probability (Split Conditions)

The splitting rule is (43) Formula (40) defines the conditional expectation for any non. Actually, may be represented as a null crisp event : union over a subclass of

and, according to (38), one may write

The constructive definition of is such that the direct application of the splitting procedure for obtaining conditional probability in the case of a fuzzy condition is impossible. However, as will be seen below, one can obtain a formula similar to (38) in the case of a fuzzy condition (split condition). For this purpose, when splitting the corresponding measure, it is essential to retain some features of this formula. Proceeding from the notion of the mathematical expectation of a random event indicator for a given function [7], if for such a function one takes a function corresponding to the fuzzy condition (the membership function of the fuzzy condition) and performs the convenient splitting, a reasonable measure that has almost all basic properties of ordinary conditional probability can be obtained. Let induce the denumerable partition of ; ( , , ). In this case

(47) (44)

Thus, for a function of conditional probability in the case of fuzzy condition one can take the expression

One can see that if is known, then can be evaluated. Let be a narrowing of on , which is determined by the formula

(48) where the numbers (49)


A similar expression is obtained for

. It is clear that

Formulas (54) and (55) are evident. For example, in the case of (54) one has (taking into account the absolute continuity of measures and with respect to measure )

and (50) . If for any natural

Now consider any measurable indicator one defines the function

then the sequence has

tends to

. One a.s.

(51) and

VI. SPLITTING SHANNON’S ENTROPY

Let set be split point by point. In this case, Shannon’s entropy of probability distribution , turns into , where is the membership function of a fuzzy point . If the branching property of function , [8] is used, then

(56)

It is clear that

(52)

These definitions make sense because of the generalized Radon–Nikodym theorem [7]. Comparing the aforementioned formulas, one can conclude that can be considered as the conditional probability function in the case of the measurable fuzzy condition. All considered formulas were related to measures of crisp events in the case of the fuzzy condition. If one performs the splitting of the indicator on the right-hand side of basic formula (51), then one obtains the definition of

where5

(57)

and

(58)

is Zadeh’s entropy [9], i.e., an entropy of fuzzy set with respect to probability distribution

(53)

where [see (3)]. The considered version of conditional probability in the case of the fuzzy condition almost surely has all properties of ordinary probabilities except the condition , which must be replaced by

a.s.

(59)

Function is actually a Kullback directed divergence , [10] and

(54) (60)

One has a.s.

is a weighted nonprobabilistic entropy of [11].
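The entropy quantities named in this section can be computed directly for a small discrete example. The sketch below uses the standard definitions from the cited sources rather than the paper’s own equations: Zadeh’s entropy of a fuzzy set with respect to a distribution, taken as −Σ μᵢ pᵢ log pᵢ [9], and the De Luca–Termini nonprobabilistic entropy [11]. Weighting the latter by pᵢ, using base-2 logarithms, and taking the dual membership as 1 − μ for a point-by-point split of the whole set are assumptions made for illustration; the numbers are made up.

```python
import math


def shannon_entropy(p):
    """Shannon entropy of a discrete distribution, in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def zadeh_entropy(p, mu):
    """Zadeh's entropy of a fuzzy set mu with respect to distribution p."""
    return -sum(m * pi * math.log2(pi) for pi, m in zip(p, mu) if pi > 0)


def weighted_de_luca_termini(p, mu):
    """De Luca-Termini fuzziness of mu, with each point weighted by p (assumed)."""
    def s(x):
        if x in (0.0, 1.0):
            return 0.0  # crisp membership carries no fuzziness
        return -(x * math.log2(x) + (1 - x) * math.log2(1 - x))
    return sum(pi * s(m) for pi, m in zip(p, mu))


# Hypothetical distribution and membership function on three points.
p = [0.5, 0.25, 0.25]
mu = [0.5, 1.0, 0.0]
mu_dual = [1.0 - m for m in mu]  # assumed dual for a point-by-point split

h_shannon = shannon_entropy(p)
h_fuzzy = zadeh_entropy(p, mu)
h_dual = zadeh_entropy(p, mu_dual)
fuzziness = weighted_de_luca_termini(p, mu)

print(h_shannon, h_fuzzy, h_dual, fuzziness)
```

With these assumptions, Zadeh’s entropies of the fuzzy set and of its dual add up to Shannon’s entropy, while the weighted nonprobabilistic term is nonzero only at the point whose membership is strictly between 0 and 1.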

a.s. a.s.

(55)

5log(1) = log (1).


It can be seen that (56) is no more than Hiroto’s measure of uncertainty [12]. Notice that if the branching property is used in some other way, then (56) can be rewritten in the following form:

(61)

where

In addition, functions and are connected by Jumarie’s entropy [13]:

(62)

VII. FUZZY DISTRIBUTIONS

A. Binomial Distribution With Fuzzy Elementary Events

Let be the space of elementary events. One can obtain the fuzzy elementary events by splitting usual events and . For membership functions, one can write

and

(63)

where , . According to (28), the probability of fuzzy elementary events is

(64)

where and are the probabilities of the corresponding crisp events. Now it is easy to write the split binomial distribution corresponding to fuzzy elementary events. Only two variants will be considered: completely simultaneous and completely sequential. The intermediate cases are not of any interest and for this reason they will not be considered here. For the completely simultaneous case, the split binomial distribution is

(65)

where is the fuzzy Bernoulli event. The normalization factor is

For the completely sequential case, one gets

(66)

and

where

The important characteristic of split Bernoulli probability (65) is the composition law; in the simultaneous case

(67)

and in the sequential case

(68)

As well as the characteristic of binomial probabilities in the case of fuzzy elementary events, one may consider the known property of exponential distribution; in the simultaneous case

(69)

and in the sequential case

(70)


where

where

B. The Binomial Distribution With Fuzzy Number of Successes

Let set be considered. The fuzzy quantity “approximately from ” is defined as the fuzzy subset of . Therefore, the corresponding distribution is

(71)

where is the membership function of fuzzy number “approximate from .” This distribution is also called the binomial distribution because it is characterized by the above composition law and the property of exponential distribution.

The Poisson limit

(73)

where and are connected by the relation

From a practical viewpoint, what is interesting is the expresand sion of the sum over all values of

C. Fuzzy Upper Binomial Distribution

(74)

The consideration of the usual upper binomial distribution is based on the model of superposition of two events: the Bernoulli event and the emergence of the total amount of failures characterized by a priori probability . If is the probability of elementary success, and are values of membership functions corresponding to complicated events when distinguishing the events of a Bernoulli and non-Bernoulli origin, then the universal set , which is the composition , , is split in the following way:

The corresponding membership function

where

Taking into account the relation between , , and , then

D. Negative Binomial Distribution With Fuzzy Elementary Events [10]

[condition is obviously satisfied]. The probability measure corresponding to fuzzy upper binomial distribution is

Let the sequence of Bernoulli trials with probability of fuzzy success be considered ( ), probability of usual Bernoulli elementary event; denotes the probability that the th success takes place at the th trial, provided trials are continued up to the th success. Accepting the splitting scheme that is used for the binomial distribution with fuzzy elementary events, one can write

(75)

Since for any (72)


then the aforementioned formula can be written in the following form:

(76)

Define negative binomial distribution with fuzzy elementary events, but fixed real number and as sequence

(77)

where

Note that if , or , then (77) reduces to the usual negative binomial distribution.

Fuzzy Fucks’ Distribution: As in the case of “upper Bernoulli” distribution, all variants of Fucks’ distributions [15] are based on the assumption that Fucks’ event is a superposition of Bernoulli and deterministic events

(78)

is deterministic (certainly successes in trials) and is a Bernoulli event [ successes in random events]. There are many variants of Fucks’ event splitting. Only some of them are considered in this paper.

1) The deterministic event is nonfuzzy, but Bernoulli elementary events are fuzzy. In this case

where

consequently

The corresponding probability measure is

(79)

(for simultaneous splitting) with

and

(80)

(for sequential splitting) with

Here is connected to linguistic spectrum [15].

2) events are split ( ) and Bernoulli events are crisp:

Evidently

(81)

where is the membership function of fuzzy set and

(82)

3) In the case when both deterministic and Bernoulli events are split, one must discriminate clearly the simultaneous and successive or sequential splitting of Fucks’ event. In the last case, it is easy to obtain the final result. Consideration of the two aforesaid cases allows one to write

(83)

(84)

(simultaneous splitting of Bernoulli event) and

(85)

(completely sequential splitting of Bernoulli event). When Fucks’ event is split simultaneously, the authors’ reasoning is as follows: is a realized chain of distributed successes and failures, a chain that is a concatenation of two others: a deterministic one in which there are only successes, and a Bernoulli sequence of length containing successes. Therefore, simultaneous splitting must take place according to the rule

(86)


Consequently

(87)

The considered fuzzy Fucks’ distributions play a leading part in constructing fuzzy quantitative microlinguistical models of language.

4) Some language structures are often described by generalized Fucks’ distributions when there are two kinds of successes with probabilities and . In this case

(88)

If one is only interested in one kind of success, then

(89)

The corresponding Poisson limit ( , and ) is

(90)

where

(91)

is Euler integral, and , an incomplete gamma function. Taking into account the relation between the incomplete gamma function and the χ²-distribution, one finally obtains

(92)

where is the χ²-distribution with degrees of freedom. Distribution (92) is called the “χ²-distribution with approximately degrees of freedom.”

Fuzzy Zipf–Mandelbrot Distribution: It is well known that Mandelbrot’s theory of recurrent coding constitutes the basis of statistical macrolinguistics. If the vocabulary of volume is divided into classes according to the informational cost [16] of words of a given class, then the probability of the word of the th class can be expressed as

(93)

where , , do not depend on the cost and is an informational cost of the th class. Let three cases of splitting be considered.

1) The set of classes . In this case

(94)

2) The set of informational costs . For is a function of , according to the principle of generalization [18] one obtains

(95)

3) When the number of classes is fuzzy number , by analogy with binomial distribution with fuzzy number of trials, one can write

(96)

The aforementioned formulas must be applied to the whole language as a formation medium, while the classical one must be applied to individual texts.

VIII. GRAPHEMES DISTRIBUTION MODELING BY WORDS IN THE SPANISH LANGUAGE

In this section, a research method for the probability–possibility organization of the distribution of graphemes by words is described. The aim of this investigation is not to construct a final quantitative model of the word formation process, but rather to illustrate the possibilities of the suggested model.

Description of the Model

Consider -seat carteges of three types of symbols conventionally called empty symbols, zero elements, and unit elements. Unit elements are obtained directly from experience, and the real structures of natural language, which are formed by picked-up language elements, are estimated using the said elements. According to the suggested model, the process of formation of any analyzed structure is considered as the superposition of purely random (probabilistic) and possibilistic (fuzzy) processes, i.e., as the composition of two bodies of evidence: probabilistic (dissonant) and possibilistic (consonant). The structure of the probabilistic body of evidence corresponds to Bernoulli events where is the cartege length, the whole number of unit elements, the number of a priori fixed (determined) unit elements (d.u.e.). As to the structure of the possibilistic body of evidence, it consists of consonant


(focal) events to which the possibility distribution corresponds; here is the event “no d.u.e.,” “one d.u.e.,” , “ d.u.e.” Probabilities of focal events are expressed by the following possibility distribution:

(97)

The compositional rule is described by the following relations:

(98)

(99)

where is a combined event whose probability is defined by (99). Notice that can be considered as the mathematical expectation of the random variable defined on the constant body of evidence. The generating function corresponding to (99) is

(100)

where is the generating function of the Bernoulli distribution. Moments of the random variable defined on the combined body are

(101)

and

(102)

where is the mean over the focal probabilities.

(103)

Here denotes mathematical expectation. In the case of practical calculations, the corresponding Poisson limit

(104)

is used instead of (99). The generating function is

(105)

The meaning of constant is defined by the relation

(106)

i.e., is the difference between common and focal mathematical expectations. Formula (99) describes a class of carteges that can be called “nonuncomponent.” All seats in these carteges are filled with elements (no empty symbols). Such a class of carteges can be used as a model if the frequency of the zero event can be represented by number . Experience shows that for any grapheme this condition is not valid. One must use the more general model based on distribution (92). This distribution differs from (99) by supplementary factors and turns into (99) when . It is easy to see that the corresponding generating function is

(107)

where by convention when when For example

(108)

Let it be supposed that distribution (92) must be considered as a model of grapheme distribution by words. The division of all words of natural language into fuzzy classes corresponds to focal events. Each of them corresponds to specific features of word formation of a given class.


Illustrative Example

The considered model will be used for investigating grapheme distribution in Spanish words. The consideration of all graphemes, the comparative analysis of their distributions and models, and informational analysis will be given in a separate paper. Let “a” be considered as an example. The initial material was obtained from the Diccionario Manual Español–Ruso [17], with about 6000 dictionary units. The observation results of different frequencies for grapheme “a” are given in Table I. The vertical entries correspond to the number of occurrences of the grapheme in a word, the horizontal ones to the different initial graphemes of a word.

TABLE I
RESULTS OF DIFFERENT FREQUENCIES FOR GRAPHEME “a”

As will be shown below, a good possibility distribution is the following one:

(109)

Using (108) and the observation results, one gets

which for defining gives

From this equation, . Focal probabilities: 0.3951 and 0.6049. The data obtained give the following combined distribution for grapheme “a” over words

(110)

Results of calculations are given in Table II. For comparing the mean of observations, results over all initial graphemes are given in the same table. Fuzzy subset of the number of determined graphemes

(111)

can be interpreted as “approximately 0,” i.e., “almost all” graphemes “a” are random.

TABLE II
RESULTS OBTAINED FROM (110)

REFERENCES

[1] L. Zadeh, “Fuzzy sets,” Inform. Control, vol. 8, p. 338, 1965.
[2] T. Gachechiladze and T. Manjaparashvili, “Fuzzy random events and corresponding probability measures,” Rep. Tbilisi University, Tbilisi, Georgia, 1990.
[3] F. Criado and T. Gachechiladze, “Fuzzy random events and their corresponding conditional probability measures,” in Real Academia de Ciencias Exactas LXXXIX, Madrid, Spain, 1995.
[4] G. Birkhoff, Lattice Theory, NY, 1981.
[5] H. Weil, Filosofie der mathematik und Naturwissenschaft-Handbuch der Filosofie, A. Baumler and S. Schöter, Eds., 1927.
[6] R. Yager, “On the measures of fuzziness and negation, I,” Int. J. General Syst., vol. 5, p. 221, 1979.
[7] M. Loeve, Theory of Probability. Princeton, NJ: Van Nostrand, 1960.
[8] A. Fainstein, Foundation of Information Theory. New York: McGraw-Hill, 1958.
[9] L. Zadeh, “Probability measures and fuzzy events,” J. Math. Anal. and Applic., vol. 23, p. 424, 1968.
[10] S. Kullback, Information Theory and Statistics. London, U.K.: Wiley, 1958.


[11] A. De Luca and S. Termini, “A definition of nonprobabilistic entropy,” Inform. Control, vol. 20, p. 301, 1972.
[12] K. Hiroto, “Ambiguity based on the concept of subjective entropy,” in Fuzzy Information and Decision Processes, M. M. Gupta and E. Sánchez, Eds. Amsterdam, The Netherlands: North Holland, 1982.
[13] F. Criado and T. Gachechiladze, “Entropy of fuzzy events,” Fuzzy Sets Syst., vol. 88, no. 1, 1997.
[14] T. Gachechiladze and T. Manjaparashvili, “Fuzzy linguistical models,” in Quantitative Linguistic. Tbilisi, Georgia: Tallin-Tbilisi, 1990.
[15] W. Fucks, “Mathematical theory of word formation,” in Communication Theory, London, U.K., 1953.
[16] B. Mandelbrot, “An information theory and statistical structure of language,” in Communication Theory, W. Jackson, Ed. London, U.K., 1953.
[17] Diccionario Manual Español-Ruso Moscu, Russia, 1978.
[18] L. Zadeh, “The concept of linguistic variable and its application to approximate reasoning,” Inform. Sci., vol. 8, pp. 199–249, 301–357, 1975.

Tamaz Gachechiladze was born in 1929. He is an Assistant Professor with the Department of Stochastic Processes Theory, Tbilisi State University, Georgia. His current research interests include mathematical cybernetics and informatics.

Fransisco Criado Torralba was born in 1945. He received the Ph.D. degree from Malaga University, Malaga, Spain, in 1979. Currently, he lectures on statistics and operational research at Malaga University. He has published a large number of papers on mathematical systems modeling, fuzzy systems, game theory and control theory, as well as having participated actively in several EU research programs.

Guram Tsertsvadze was born in 1933. He is Chair of Mathematical Cybernetics and Informatics, and Full Professor in the Department of Applied Mathematics and Computer Sciences, Tbilisi State University, Georgia. His current research interests include mathematical cybernetics and informatics.

Hamlet Meladze was born in 1939. He is currently Chair of Computer Mathematical Providing and Information Technologies, and Full Professor in the Department of Applied Mathematics and Computer Sciences, Tbilisi State University, Georgia. His current research interests include mathematical cybernetics and informatics.
