IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 22, NO. 5, OCTOBER 2014
On Advanced Computing With Words Using the Generalized Extension Principle for Type-1 Fuzzy Sets

Mohammad Reza Rajati, Student Member, IEEE, and Jerry M. Mendel, Life Fellow, IEEE

Abstract—In this paper, we propose and demonstrate an effective methodology for implementing the generalized extension principle to solve Advanced Computing with Words (ACWW) problems. Such problems involve implicit assignments of linguistic truth, probability, and possibility. To begin, we establish the vocabularies of the words involved in the problems, and then collect data from subjects about the words, after which fuzzy set models for the words are obtained by using the Interval Approach (IA) or the Enhanced Interval Approach (EIA). Next, the solutions of the ACWW problems, which involve the fuzzy set models of the words, are formulated using the Generalized Extension Principle. Because the solutions to those problems involve complicated functional optimization problems that cannot be solved analytically, we then develop a numerical method for their solution. Finally, the resulting fuzzy set solutions are decoded into natural language words using Jaccard's similarity measure. We explain how ACWW can address some prototype engineering problems, and we connect the methodology of this paper with Perceptual Computing.

Index Terms—Advanced Computing with Words (ACWW), Generalized Extension Principle, Perceptual Computing, type-1 fuzzy sets, validation, Zadeh's challenge problems.

I. INTRODUCTION

Computing with Words (CWW or CW) is a methodology of computation whose objects are words rather than numbers [19], [25], [47], [49]. Words are drawn from natural languages and are modeled by fuzzy sets. Basic CWW deals with simple assignments of attributes through IF-THEN rules. According to Zadeh [49], Advanced Computing with Words (ACWW) involves problems in which the carriers of information are numbers, intervals, and words. Assignment of attributes may be implicit, and one generally deals with assignments of linguistic truth, probability, and possibility constraints through complicated natural language statements. Moreover, world knowledge is necessary for solving ACWW problems [20]. Modeling words is consequently an important task for solving ACWW problems. Mendel [19] argues that since words mean different things to different people, a first-order uncertainty model for a word should be an Interval Type-2 Fuzzy Set (IT2 FS). Interestingly, Zadeh [49] anticipates that in the future,

Manuscript received July 27, 2013; accepted October 7, 2013. Date of publication November 5, 2013; date of current version October 2, 2014. The work of M. R. Rajati was supported by the Annenberg Fellowship Program and the 2011 Summer Research Institute Fellowship Program of the University of Southern California. The authors are with the Ming Hsieh Department of Electrical Engineering, Signal and Image Processing Institute, University of Southern California, Los Angeles, CA 90089 USA (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TFUZZ.2013.2287028

fuzzy sets of higher type will play a central role in ACWW. Therefore, it is plausible to examine the solutions to ACWW problems when words are modeled by IT2 FSs. In this paper, however, we use IT2 FS models of words only to establish Type-1 Fuzzy Set (T1 FS) models of them, for reasons that are given in Section II-A.
CWW has been applied successfully to hierarchical and distributed decision making [8], [22], Perceptual Reasoning and Perceptual Computing [21], [23], [40], and decision support [9], [15]. There have been extensive attempts to implement the approach in more realistic settings, e.g., [12], [24], and [28], and attempts to formalize the paradigm of CWW [3], [4], [13]. Despite the extensive literature on CWW, to the best of our knowledge there have only been a few attempts to deal with ACWW problems [27], [29]–[31], [33], [34].
Zadeh has introduced a set of challenge problems for ACWW in his recent works, and has proposed solutions to them [46], [49]. The solutions to many of the problems utilize the Generalized Extension Principle (GEP), which results in complicated optimization problems. In this paper, we establish a methodology for carrying out the computations of the GEP for solving ACWW problems. We focus on solving one of Zadeh's famous ACWW problems, and later demonstrate that some of his other famous ACWW problems can also be solved in exactly the same manner.
Historically, Zadeh's ACWW challenge problems have been formulated in terms of everyday reasoning examples involving attributes like height, age, distance, etc. (e.g., Probably John is tall. What is the probability that John is short?¹). In this paper, we also demonstrate how ACWW can address more realistic problems dealing with subjective judgments. It has been demonstrated in [6], [7], [10], [26], [37], [38], [42], and [53] that subjective, fuzzy, and linguistic probabilities can be used to model the subjective assessments of safety, reliability, and risk. Some examples that address product safety, network security, and network trust are as follows.
It is somewhat improbable that cars of model X are unsafe. What is the probability that they are safe?
It is very probable that the network provided by Company Y is highly secure. What is the probability that it is somewhat insecure?
Probably the online auction of website Z is pretty trustworthy. What is the probability that it is extremely trustworthy?

¹This question can also be stated as: What is the probability that (the height of) John is short?



It is somewhat likely that the risk associated with investment in real estate is high. What is the average risk associated with investment in real estate?
In all of these examples, the numeric probability of a fuzzy event (e.g., unsafe, highly secure) is implicitly constrained by a linguistic probability (e.g., somewhat probable, somewhat likely); therefore, they are ACWW problems. These examples suggest that the methodologies for solving Zadeh's ACWW problems can also be applied to solve more realistic problems. We show how to do this in Section V.
The rest of this paper is organized as follows. In Section II, we describe a famous ACWW problem, its variations, and their solutions that are obtained by using the GEP. In Section III, we first establish fuzzy set models of the words that are involved in the ACWW problems of Section II, and then implement numerical solutions to the problems when type-1 fuzzy set word models are used. In Section IV, we discuss how to validate the solutions of the ACWW problems that are solved in this paper. In Section V, we solve an engineering ACWW problem using the methodology of our earlier sections. In Section VI, we show how some of Zadeh's other ACWW problems can be solved using the same methodology as described in Section III. In Section VII, we present a high-level discussion describing how ACWW problems can be formulated by investigating the linguistic description of the problems. In Section VIII, we investigate the relationship between the methodology of this paper and Perceptual Computing; and, finally, in Section IX, we present some conclusions, as well as some directions for future research.

II. PROBLEM DESCRIPTION

A. PJS and PJW Problems

Among Zadeh's many ACWW problems is the following famous Probability that John is short (PJS) problem: Probably John is tall. What is the probability that John is short? This problem involves a linguistic probability (probably) and an implicit assignment of that linguistic probability to the probability that John is tall. The probability that "John is tall," P_Tall, is calculated as² [43]:

P_{\mathrm{Tall}} = \int_a^b p_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh    (1)

in which a and b are the minimum and maximum possible heights of men and p_H is the probability distribution function of heights, where

\int_a^b p_H(h)\,dh = 1.    (2)

The probability of the fuzzy event "Short" is calculated as

P_{\mathrm{Short}} = \int_a^b p_H(h)\,\mu_{\mathrm{Short}}(h)\,dh.    (3)

To derive the soft constraint imposed on P_Short by the fact that P_Tall is constrained by "Probably," one needs to use the framework of the GEP, which is an important tool for propagation of possibilistic constraints, and was originally introduced in [45]. Assume that f(·) and g(·) are real functions:

f, g : U_1 \times U_2 \times \cdots \times U_n \rightarrow V.    (4)

Moreover, assume that

f(X_1, X_2, \ldots, X_n) is A
g(X_1, X_2, \ldots, X_n) is B

where A and B are T1 FSs. Then, A induces B as in (5):

\mu_B(v) = \begin{cases} \sup_{u_1,\ldots,u_n \,|\, v = g(u_1,\ldots,u_n)} \mu_A(f(u_1,\ldots,u_n)), & \exists\, v = g(u_1,\ldots,u_n) \\ 0, & \nexists\, v = g(u_1,\ldots,u_n) \end{cases}    (5)

The GEP basically extends the function g(f^{-1}(\cdot)) : V \rightarrow V to T1 FSs, where f^{-1} is the pre-image of the function f(·). In the PJS problem, f = P_Tall, and

f : X_{[a,b]} \rightarrow \mathbb{R}    (6)

where X_{[a,b]} is the space of probability distribution functions on [a, b]. In addition, g = P_Short, and

g : X_{[a,b]} \rightarrow \mathbb{R}.    (7)

The GEP then implies that the soft constraint on the probability that "John is short" is as given in (8):

\mu_{P_{\mathrm{Short}}}(v) = \begin{cases} \sup_{p_H \,|\, v = \int_a^b p_H(h)\mu_{\mathrm{Short}}(h)dh,\ \int_a^b p_H(h)dh = 1} \mu_{\mathrm{Probable}}\!\left(\int_a^b p_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh\right), & \exists\, p_H \in X_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_{\mathrm{Short}}(h)\,dh \\ 0, & \nexists\, p_H \in X_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_{\mathrm{Short}}(h)\,dh \end{cases}    (8)

²We assume that "Probably John is tall" is equivalent to "It is probable that John is tall."
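To make the mechanics of (5) concrete, the following minimal Python sketch evaluates the GEP numerically for a toy case with scalar functions f and g on a discretized domain. The triangular membership function, the specific f and g, and the grid resolution are illustrative assumptions, not part of the paper.

```python
import numpy as np

# Toy illustration of the GEP in (5): given a T1 FS A that constrains f(u),
# induce the T1 FS B that constrains g(u), by a brute-force sup over a grid of u.

def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b (illustrative)."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Assumed toy functions and word model (not from the paper).
f = lambda u: u ** 2            # f(u) is constrained by A
g = lambda u: 2.0 * u + 1.0     # we want the induced constraint B on g(u)
mu_A = lambda y: tri(y, 0.2, 0.5, 0.8)

u_grid = np.linspace(0.0, 1.0, 2001)   # discretized domain of u
v_vals = g(u_grid)                     # all attainable values v = g(u)
gamma = mu_A(f(u_grid))                # mu_A(f(u)) for each u

# mu_B(v) = sup over all u whose g(u) falls in a small bin around v
# (a numerical stand-in for the equality constraint in (5)).
v_bins = np.linspace(v_vals.min(), v_vals.max(), 101)
mu_B = np.zeros_like(v_bins)
idx = np.digitize(v_vals, v_bins)
for k, gk in zip(idx, gamma):
    k = min(k, len(mu_B) - 1)
    mu_B[k] = max(mu_B[k], gk)         # take the sup within each bin

print(list(zip(v_bins[::20].round(2), mu_B[::20].round(2))))
```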


Note that (8) cannot be solved by using the α-cut decomposition theorem, since B(α) = g(f^{-1}(A(α))) (f = P_Tall and g = P_Short), but the relation f^{-1}(·) cannot be derived explicitly.

In this paper, we also study variations of the PJS problem, which include: Probably John is tall. What is the probability that John is W? This is called the "PJW problem." In this problem, W represents any height word. Similar to (8), the GEP yields the soft constraint P_W on the probability that "John is W," as in (9):

\mu_{P_W}(v) = \begin{cases} \sup_{p_H \,|\, v = \int_a^b p_H(h)\mu_W(h)dh,\ \int_a^b p_H(h)dh = 1} \mu_{\mathrm{Probable}}\!\left(\int_a^b p_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh\right), & \exists\, p_H \in X_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_W(h)\,dh \\ 0, & \nexists\, p_H \in X_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_W(h)\,dh \end{cases}    (9)

Note that (8) and (9) are difficult functional optimizations that have to be carried out over X_{[a,b]}, the space of probability distributions over [a, b]. In this paper, a methodology for performing the optimizations is offered. Although our solutions to the PJW problems use T1 FSs, those FSs are obtained from IT2 FS models for reasons that are given in Section III.

The readers may be wondering why we have not presented our solutions to the PJW problem entirely within the framework of IT2 FSs, since we are strong advocates for using such FSs in CWW problems. We have not done this for the following reasons.
1) To date, although Zadeh formulated the PJS problem and gave the structure of its solution using the GEP for T1 FSs, no numerical or linguistic solutions were provided even for T1 FSs. We believe that we are the first to do so in this paper.
2) All of the basic concepts for solving the PJW problem are more easily explained and understood using T1 FSs.
3) To date, how to state and solve the GEP for IT2 FSs is an open research topic, one that we are researching and will report on in the future.
4) We strongly believe that an IT2 solution to a problem needs to be compared with T1 solutions, so that one can: 1) see if an IT2 solution is needed, and 2) observe similarities and differences between the T1 and IT2 solutions. The results that we have presented in this paper will serve as a useful baseline for such comparisons.

B. An Interesting Special Case

When the probability distribution of the height of the population of which John is a member is known, the PJW problem becomes a special problem. Assume that the probability distribution of heights is known to be p*_H. Then, the answer to the question "What is the probability that John is W" is a numerical value

P^*_W = \int_a^b p^*_H(h)\,\mu_W(h)\,dh    (10)

and there is no need for the information "Probably John is tall." This means that this piece of information in the PJW and PJS problems compensates for the imprecision of knowledge about the probability distribution of heights. However, if the problem is to be solved given that "Probably John is tall," we can still use (9) to derive an MF for P*_W. Since P^*_{\mathrm{Tall}} = \int_a^b p^*_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh is also a numerical value, (9) reduces to:

\mu_{P^*_W}(v) = \begin{cases} \mu_{\mathrm{Probable}}\!\left(\int_a^b p^*_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh\right), & v = \int_a^b p^*_H(h)\,\mu_W(h)\,dh \\ 0, & \text{otherwise.} \end{cases}    (11)

Equation (11) says that μ_{P*_W} represents a fuzzy singleton at P*_W. Obviously, the membership value of the fuzzy singleton depends on the compatibility of Probable with \int_a^b p^*_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh (which is exactly μ_Probable(P*_Tall)). The less this compatibility is, the smaller is the membership value for this fuzzy singleton, which can be interpreted as less confidence in the solution. In the extreme case when μ_Probable(\int_a^b p^*_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh) = 0, we obtain the empty set as P*_W, because μ_{P*_W}(v) = 0 ∀v, which reflects a total incompatibility of a human's world knowledge (assumed for the problem) with reality. It is worth noting that the empty set can be interpreted as the word Undefined [50].

III. IMPLEMENTATION OF THE SOLUTION TO THE PROBABILITY THAT JOHN IS W PROBLEM

In this section, we implement the solution to the PJW problem, which includes the PJS problem. We model the words involved in the problem, and use a methodology to approximate the solution to the optimization problem of (9).

A. Modeling Words

To begin, we established the following vocabularies of linguistic heights and linguistic probabilities: Heights = {Very short, Short, Moderately short, Medium, Moderately tall, Tall, Very tall} and Probabilities = {Extremely improbable, Very improbable, Improbable, Somewhat improbable, Tossup, Somewhat probable, Probable, Very probable, Extremely probable}. Next, we modeled all of these words as FSs. Recall that there are at least two types of uncertainty associated with a word [18], [19]: 1) intra-uncertainty, the uncertainty an individual has about the meaning of a word;³ and 2) inter-uncertainty, the uncertainty a group of people have about the meaning of a word. In other words, words mean different things

to different people, and this fact calls for (at least) using IT2 FSs as models of words [18], [19].

³This is related to the uncertainty associated with unsharpness of classes [48].

In order to synthesize IT2 FS models of words, we began by collecting data from a group of subjects and then used either the interval approach (IA) [14] or the enhanced interval approach (EIA) [41]. We collected data from 48 subjects using the Amazon Mechanical Turk website [1] for the aforementioned vocabulary of linguistic heights, and from 111 subjects for the vocabulary of linguistic probabilities [32]. We used the EIA [41] to obtain IT2 FS models of the words from the data. The IT2 FS footprints of uncertainty (FOUs) have nine parameters (see Fig. 1); (q, r, s, t) determine the upper membership function (UMF) and (q₁, r₁, s₁, t₁, h) determine the lower membership function (LMF), where h is the height of the LMF.

Fig. 1. FOU of an IT2 FS described by nine parameters.

The vocabulary of linguistic heights modeled by IT2 FSs is depicted in Fig. 2. We assumed that the minimum possible height is a = 139.7 cm and the maximum possible height is b = 221 cm.⁴ The parameters of the FOUs of the linguistic height words are given in Table I.

Fig. 2. Vocabulary of IT2 FSs representing linguistic heights.

Fig. 3. Vocabulary of IT2 FSs representing linguistic probabilities.

⁴This range corresponds to [4′7″, 6′9″], and is inspired by the range [140 cm, 220 cm] that Zadeh uses in [51].

To make sure that the vocabulary of height words provides an appropriate partitioning of the space of heights, we calculated the pairwise Jaccard similarities between the height words. The Jaccard similarity [5], [39] between IT2 FSs Ã and B̃ is calculated as

s_J(\tilde{A}, \tilde{B}) = \frac{\int_U \min(\mu_{\overline{A}}(u), \mu_{\overline{B}}(u))\,du + \int_U \min(\mu_{\underline{A}}(u), \mu_{\underline{B}}(u))\,du}{\int_U \max(\mu_{\overline{A}}(u), \mu_{\overline{B}}(u))\,du + \int_U \max(\mu_{\underline{A}}(u), \mu_{\underline{B}}(u))\,du}.    (12)

Pairwise similarities between the height words are shown in Table II. Observe that the words have pairwise similarities that are less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse. Similar information for the vocabulary of linguistic probabilities [32] is given in Fig. 3 and Tables III and IV. In Table IV, observe that the probability words also have pairwise similarities less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse.

Because Zadeh's solutions [49] involve T1 FS models of words, we chose those models from our IT2 FS models by using two kinds of embedded T1 FSs.
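As a concrete illustration of how (12) can be evaluated from the nine FOU parameters, here is a minimal Python sketch. It assumes trapezoidal UMFs and LMFs built from (q, r, s, t) and (q₁, r₁, s₁, t₁, h) on a uniformly discretized universe; the specific parameter values are placeholders, not the entries of Tables I or III.

```python
import numpy as np

def trap(x, q, r, s, t, height=1.0):
    """Trapezoidal membership function with support [q, t], core [r, s], and given height."""
    left = np.clip((x - q) / max(r - q, 1e-12), 0.0, 1.0)
    right = np.clip((t - x) / max(t - s, 1e-12), 0.0, 1.0)
    return height * np.minimum(left, right)

def jaccard_it2(u, fou_a, fou_b):
    """Jaccard similarity (12) between two IT2 FSs given as (UMF params, LMF params)."""
    (qa, ra, sa, ta), (qa1, ra1, sa1, ta1, ha) = fou_a
    (qb, rb, sb, tb), (qb1, rb1, sb1, tb1, hb) = fou_b
    umf_a, lmf_a = trap(u, qa, ra, sa, ta), trap(u, qa1, ra1, sa1, ta1, ha)
    umf_b, lmf_b = trap(u, qb, rb, sb, tb), trap(u, qb1, rb1, sb1, tb1, hb)
    num = np.trapz(np.minimum(umf_a, umf_b), u) + np.trapz(np.minimum(lmf_a, lmf_b), u)
    den = np.trapz(np.maximum(umf_a, umf_b), u) + np.trapz(np.maximum(lmf_a, lmf_b), u)
    return num / den

# Placeholder FOUs on a height axis in cm (not the paper's Table I values).
u = np.linspace(139.7, 221.0, 2000)
tall = ((170, 180, 195, 205), (175, 182, 190, 198, 0.8))
very_tall = ((180, 195, 221, 221), (190, 205, 221, 221, 0.7))
print(round(jaccard_it2(u, tall, very_tall), 3))
```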

Definition 1: An embedded type-1 fuzzy set A_e of an IT2 FS Ã over U is a T1 FS over U for which \forall t \in U,\ \mu_{\underline{A}}(t) \le \mu_{A_e}(t) \le \mu_{\overline{A}}(t).

Definition 2: A normal embedded type-1 fuzzy set A^n_e of an IT2 FS Ã over U is an embedded T1 FS of Ã that is normal, i.e., \sup_t \mu_{A^n_e}(t) = 1.

A UMF is a normal embedded T1 FS. Our first T1 FS is the UMF from each of the IT2 FS FOUs (see Figs. 2 and 3, and Tables I and III). Our second T1 FS is obtained from the Fig. 1 and Fig. 3 FOU parameters, as shown in Fig. 4, and is also a normal embedded T1 FS, sort of an average T1 FS. In the sequel, we refer to this T1 FS as a "normal embedded T1 FS." The vocabulary of height words modeled by normal embedded T1 FSs is shown in Fig. 5 and their parameters are given in Table V.


TABLE I MEMBERSHIP FUNCTION PARAMETERS OF THE HEIGHT WORDS DEPICTED IN FIG. 2

TABLE II PAIRWISE SIMILARITIES BETWEEN THE HEIGHT WORDS DEPICTED IN FIG. 2

TABLE III MEMBERSHIP FUNCTION PARAMETERS OF THE PROBABILITY WORDS DEPICTED IN FIG. 3

TABLE IV PAIRWISE SIMILARITIES BETWEEN THE PROBABILITY WORDS DEPICTED IN FIG. 3

Comparable MFs of normal embedded T1 FS models of our nine probability words are given in Fig. 6 and Table VI, respectively. We also observed that our probability words have pairwise similarities less than 0.5, indicating that this vocabulary provides a good partitioning of the universe of discourse for probability.

B. Approximate Solution to the Optimization Problem

Fig. 4. Normal embedded type-1 fuzzy sets (shown by the dashed line) for left-shoulder, interior, and right-shoulder FOUs.

Fig. 5. Vocabulary of linguistic heights modeled by normal embedded type-1 fuzzy sets.

TABLE V MEMBERSHIP FUNCTION PARAMETERS OF THE HEIGHT WORDS DEPICTED IN FIG. 5

Fig. 6. Vocabulary of normal embedded type-1 fuzzy set models of linguistic probabilities.

TABLE VI MEMBERSHIP FUNCTION PARAMETERS OF THE PROBABILITY WORDS IN FIG. 6

Next, we solve the optimization problem of (9). It is a functional optimization problem, and cannot be solved analytically. Instead, our approach is to:
1) Choose the family (or families) of probability distributions pertinent to the problem.
2) Choose the ranges of the parameters of the families of probability distributions.
3) Discretize the ranges of parameters of the probability distributions.
4) Construct a pool of probability distributions having all possible combinations of parameters, and for all of its members:
   a) Choose a specific p_H from the pool (again, note that \int_a^b p_H(h)\,dh = 1, because p_H(h) is a probability distribution function on [a, b]).
   b) Compute v = \int_a^b p_H(h)\,\mu_W(h)\,dh.
   c) Compute \int_a^b p_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh.
   d) Compute γ(v) = \mu_{\mathrm{Probable}}(\int_a^b p_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh).
5) Construct a scatter plot of γ(v) versus v.
6) Detect an envelope of γ(v), namely μ_{P_W}(v). The envelope detection plays the role of taking the sup. One can imagine different ways of detecting the envelope. We used the following algorithm:
   a) Divide the space of possible v's, which is [0, 1], into N bins.
   b) For each bin:
      1) Search for all the (v, γ(v)) pairs whose v value falls in the bin.
      2) Compute γ*, the maximum of the γ(v)'s associated with the pairs found in the previous step. If there are no pairs whose v's fall in the bin, γ* = 0.
      3) For v's that are members of the bin, set μ_{P_W}(v) = γ*.
(A code sketch of steps 1)–6) is given after (13).)
Implementation of solutions using a huge number of distributions involves an enormous computational burden, and it is impossible to carry out the optimization over all possible distributions. One needs to incorporate some additional world knowledge about the type of probability distribution of heights into the solution of the PJW problem, since one should only use probability distributions for heights of males that make sense. More generally, each ACWW problem has a real-world domain associated with it; hence, it is either explicitly or implicitly constrained by that domain. Therefore, when probability distributions are needed, they should be selected pertinent to that domain. It is shown in [36] that the distribution of the height of Americans is a mixture of Gaussian distributions, since heights of both American men and women obey Gaussian distributions. This suggests that the optimization problem in (9) can be carried out on X^N_{[a,b]}, the space of all normal distributions over [a, b]. The probability density function of a Gaussian probability distribution is

f(x|\alpha, \beta) = \frac{1}{\sqrt{2\pi}\,\beta}\, e^{-\frac{(x-\alpha)^2}{2\beta^2}}.    (13)
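The following minimal Python sketch illustrates steps 1)–6) for the PJW problem, assuming trapezoidal T1 FS word models and the truncated-Gaussian candidates and parameter ranges described below (the renormalization of (14)). The word parameters shown are placeholders, not the entries of Tables I and III, and the grid is coarser than the paper's 200 × 200 grid.

```python
import numpy as np
from scipy.stats import norm

A, B = 139.7, 221.0                      # height range [a, b] in cm
h = np.linspace(A, B, 2000)              # discretized height axis

# Placeholder trapezoidal T1 FS word models (illustrative, not Table I/III values).
def trap(x, q, r, s, t):
    return np.clip(np.minimum((x - q) / max(r - q, 1e-12), (t - x) / max(t - s, 1e-12)), 0, 1)

mu_tall = trap(h, 170, 180, 195, 205)
mu_W = trap(h, 150, 158, 168, 176)       # e.g., a "Medium"-like word (assumed shape)
mu_probable = lambda v: trap(np.atleast_1d(v), 0.5, 0.6, 0.75, 0.85)[0]   # assumed shape

def truncated_gaussian(alpha, beta):
    """Gaussian pdf renormalized to [A, B], as in (14)."""
    return norm.pdf(h, alpha, beta) / (norm.cdf(B, alpha, beta) - norm.cdf(A, alpha, beta))

# Steps 1)-4): pool of truncated Gaussians over a parameter grid, then (v, gamma) pairs.
pairs = []
for alpha in np.linspace(160.0, 183.8, 50):      # 200 points each in the paper
    for beta in np.linspace(5.0, 10.0, 50):
        pH = truncated_gaussian(alpha, beta)
        v = np.trapz(pH * mu_W, h)               # step 4b
        p_tall = np.trapz(pH * mu_tall, h)       # step 4c
        pairs.append((v, mu_probable(p_tall)))   # step 4d

# Steps 5)-6): envelope detection over N bins of [0, 1].
N = 100
mu_PW = np.zeros(N)
for v, g in pairs:
    k = min(int(v * N), N - 1)
    mu_PW[k] = max(mu_PW[k], g)                  # sup within each bin

centers = (np.arange(N) + 0.5) / N
print(centers[mu_PW > 0][:5], mu_PW[mu_PW > 0][:5])
```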


⁵The smallest average height among world countries is 160 cm and the largest average height is 183.8 cm [2]. For standard deviations, we could not find a source that reports the standard deviations of heights for each country. The standard deviation of the height of Americans is 7–7.5 cm [36]; therefore, we intuitively chose the range [5, 10] cm as the range of standard deviations of heights. Note that this information comprises additional world knowledge for solving the problem.

Because x ∈ [a, b], we normalize each probability distribution f(x|α, β) by F(b|α, β) − F(a|α, β), so as to make a probability distribution on [a, b], where F(x|\alpha, \beta) = \int_{-\infty}^{x} f(\lambda|\alpha, \beta)\,d\lambda is the cumulative distribution function of f(x|α, β); therefore, for each distribution, we construct

g(x|\alpha, \beta) \equiv \frac{f(x|\alpha, \beta)}{F(b|\alpha, \beta) - F(a|\alpha, \beta)}\, I_{[a,b]}(x)    (14)

where I_{[a,b]}(·) is the indicator function of the interval [a, b]. In this paper, we chose 200 equally spaced points in the intervals [160, 183.8] and [5, 10], respectively,⁵ as candidates for α and β, which led to 40 000 Gaussian distributions (normalized over [a, b]), and then implemented the above algorithm.

The γ(v) versus v scatter plots for the case of using UMF T1 FSs are depicted in Fig. 7.

Fig. 7. Scatter plots of γ(v) versus v for each P_W using UMFs as T1 FS models of words, when distributions are Gaussian (panels correspond to P_Very short, P_Short, P_Moderately short, P_Medium, P_Moderately tall, P_Tall, and P_Very tall).

It is not surprising that the scatter plots for the words Very short, Short, and Moderately short are left-shoulders, because the a priori knowledge that "Probably John is tall" intuitively suggests that the probability of short-sounding height words must be close to zero. Note that in Fig. 7(a), the scatter plot is so narrow that it is not visible. As we will see, it gives a very narrow fuzzy set, which is close to a singleton at zero. The scatter plot associated with Tall has points that are exactly on the MF of Probable. The scatter plot associated with Very tall has an interior shape and results in an interior MF. Similar plots were obtained for normal embedded T1 FSs.

The envelopes for the scatter plots associated with UMFs and normal embedded T1 FSs are depicted in Figs. 8 and 9, respectively. We then calculated the Jaccard's similarity of the solutions with each linguistic probability, so as to translate the results into natural language words. Those similarities are summarized in Tables VII and VIII. The boldface numbers signify the words with the highest similarities to each P_W.

All of our solutions to the problem "What is the probability that John is W," given "Probably John is tall," are summarized in Table IX. They were obtained by choosing the word (in Tables VII and VIII) that has the largest similarity. Therefore, for example, when UMFs are used as T1 FS models of words, the linguistic solutions to the PJW problems are:
"It is extremely improbable that John is very short."
"It is extremely improbable that John is short."
"It is extremely improbable that John is moderately short."
"It is very improbable that John has medium height."
"It is somewhat improbable that John is moderately tall."
"It is probable that John is tall." (which means that the algorithm works correctly, since we have already assumed that "Probably John is tall".)
"It is improbable that John is very tall."
Note that in Figs. 8 and 9, some of the fuzzy sets are so narrow that they are not visible. Consequently, their similarities with the word extremely improbable are very small. One can linguistically interpret those narrow fuzzy probabilities as around zero.
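To illustrate the decoding step just described, the following sketch picks the vocabulary word most similar to a computed envelope μ_{P_W}(v), using the T1 restriction of the Jaccard measure (both arguments are T1 FSs here). The names `centers`, `mu_PW`, and `probability_vocab` are hypothetical arrays produced by the earlier sketches, not outputs defined by the paper.

```python
import numpy as np

def jaccard_t1(u, mu_a, mu_b):
    """Jaccard similarity between two T1 FSs sampled on the grid u."""
    return np.trapz(np.minimum(mu_a, mu_b), u) / np.trapz(np.maximum(mu_a, mu_b), u)

def decode(u, mu_solution, vocab):
    """Return the vocabulary word whose MF is most similar to the solution MF.

    vocab: dict mapping word -> sampled membership values on u
    (assumed available, e.g., the UMFs of the nine probability words)."""
    sims = {w: jaccard_t1(u, mu_solution, mu_w) for w, mu_w in vocab.items()}
    return max(sims, key=sims.get), sims

# Hypothetical usage with an envelope mu_PW computed on bin centers in [0, 1]:
# word, sims = decode(centers, mu_PW, probability_vocab)
# print(word)
```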

Observe from Table IX that four of the linguistic solutions are the same, regardless of which T1 FS model was used for all the words; however, the fact that three of the solutions are different suggests that results for this ACWW problem can be sensitive to the T1 FS that is used to model a word. Comparing Figs. 8 and 9, observe that normal embedded T1 FSs provide narrower MFs for the words involved in the PJW problem than do UMF T1 FSs. Because the GEP propagates uncertainty, solutions corresponding to normal embedded T1 FSs are therefore narrower (i.e., less uncertain) than solutions using UMF T1 FSs.


Fig. 8. Detected envelopes μ_{P_W}(v) for the plots in Fig. 7. UMFs were used as T1 FS models of words and probability distributions were Gaussian.

A reviewer of this paper suggested that the MFs might be determined by using only the endpoints of the intervals that are used as ranges for the means and standard deviations of the Gaussian probability distributions. We examined v = P_{\mathrm{Very\ tall}} and γ(v) = μ_{\mathrm{Probable}}(P_{\mathrm{Tall}}) for α = 160, 183.6 and β = 5, 10, and obtained the following (v, γ(v)) pairs: (0, 0), (0.0067, 0), (0.2314, 0.5055), (0.2704, 0.4566). These four points do not describe the shape of the fuzzy set in Fig. 8(g); hence, it is not possible to determine solution MFs by using only the endpoints of the intervals that are used as ranges of the parameters of the probability distributions.⁶

⁶A somewhat analogous situation occurs when one computes the centroid of an IT2 FS [11]. The two endpoints of the centroid are not associated with the centers of gravity of the lower and upper MFs. Instead, each is associated with a T1 FS that includes a portion of both the lower and upper MFs.

Fig. 9. Detected envelopes μ_{P_W}(v) when normal embedded type-1 fuzzy set models of words and Gaussian distributions were used.

IV. ON CORRECTNESS OF THE RESULTS

A natural question to ask at this point is: How does one know that the solution obtained for our ACWW problem is correct? In [19], Mendel claims that a Turing-type test is presently the only way to validate a CWW solution. Such a test requires that a human provide an answer to the ACWW problem; however, we have found that ACWW problems are so complicated that a human might be unable to do this. Consequently, we know of no general way to answer this question. Instead, we fall back on a very simple idea, namely, we consider one PJW problem for which we surely know what the correct answer is, and see if the GEP and our algorithm for solving (9) provide that answer. That PJW problem is: "Probably John is W. What is the probability that John is W?" Common sense tells us the answer is "Probable," i.e., "It is probable that John is W."
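As a hedged illustration of this sanity check, the self-contained snippet below runs the same enumeration with one word W playing both roles; the word and probability MF shapes are placeholders, not the paper's tables, and the grid is coarser than the paper's.

```python
import numpy as np
from scipy.stats import norm

A, B, N = 139.7, 221.0, 100
h = np.linspace(A, B, 2000)

def trap(x, q, r, s, t):
    return np.clip(np.minimum((x - q) / max(r - q, 1e-12), (t - x) / max(t - s, 1e-12)), 0, 1)

mu_W = trap(h, 150, 158, 168, 176)                                        # any height word (placeholder)
mu_probable = lambda v: trap(np.atleast_1d(v), 0.5, 0.6, 0.75, 0.85)[0]   # placeholder "Probable"

def truncated_gaussian(alpha, beta):
    return norm.pdf(h, alpha, beta) / (norm.cdf(B, alpha, beta) - norm.cdf(A, alpha, beta))

# "Probably John is W. What is the probability that John is W?" -> expect (a trimmed) "Probable".
mu_PW = np.zeros(N)
for alpha in np.linspace(160.0, 183.8, 50):
    for beta in np.linspace(5.0, 10.0, 50):
        pH = truncated_gaussian(alpha, beta)
        v = float(np.trapz(pH * mu_W, h))        # P_W for this distribution
        k = min(int(v * N), N - 1)
        mu_PW[k] = max(mu_PW[k], mu_probable(v)) # gamma = mu_Probable(P_W), per (16)

centers = (np.arange(N) + 0.5) / N
print(centers[mu_PW > 0.9])   # should lie inside the core of "Probable" under these assumptions
```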


TABLE VII SIMILARITIES BETWEEN THE T1 FSS DEPICTED IN FIG. 8 AND UMFS FOR LINGUISTIC PROBABILITY WORDS

TABLE VIII SIMILARITIES BETWEEN THE T1 FSS DEPICTED IN FIG. 9 AND NORMAL EMBEDDED T1 FSS FOR LINGUISTIC PROBABILITY WORDS

TABLE IX SUMMARY OF THE SOLUTIONS TO THE PROBLEM “WHAT IS THE PROBABILITY THAT JOHN IS W , GIVEN “PROBABLY JOHN IS TALL”

The solution to this problem using T1 FSs and the GEP is given in (15) [see also (9)]:

\mu_{P_W}(v) = \begin{cases} \sup_{p_H \,|\, v = \int_a^b p_H(h)\mu_W(h)dh,\ \int_a^b p_H(h)dh = 1} \mu_{\mathrm{Probable}}\!\left(\int_a^b p_H(h)\,\mu_W(h)\,dh\right), & \exists\, p_H \in X^N_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_W(h)\,dh \\ 0, & \nexists\, p_H \in X^N_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_W(h)\,dh \end{cases}    (15)

Because

\sup_{p_H \,|\, v = \int_a^b p_H(h)\mu_W(h)dh,\ \int_a^b p_H(h)dh = 1} \mu_{\mathrm{Probable}}\!\left(\int_a^b p_H(h)\,\mu_W(h)\,dh\right) = \sup_{p_H \,|\, v = \int_a^b p_H(h)\mu_W(h)dh,\ \int_a^b p_H(h)dh = 1} \mu_{\mathrm{Probable}}(v) = \mu_{\mathrm{Probable}}(v)    (16)

(15) simplifies to:

\mu_{P_W}(v) = \begin{cases} \mu_{\mathrm{Probable}}(v), & \exists\, p_H \in X^N_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_W(h)\,dh \\ 0, & \nexists\, p_H \in X^N_{[a,b]}\ \text{s.t.}\ v = \int_a^b p_H(h)\,\mu_W(h)\,dh. \end{cases}    (17)

From (17), it is obvious that the membership function of P_W is always equal to that of Probable, except for those v's for which no probability distribution is found that satisfies v = \int_a^b p_H(h)\,\mu_W(h)\,dh, which, by the second line of (17), means that μ_{P_W}(v) = 0. The latter occurs when W has a membership function with small overlap with all p_H's, so that the integral \int_a^b \mu_W(h)\,p_H(h)\,dh has an upper bound much smaller than 1. Therefore, the solution using T1 FSs gives the word "Probable" as the solution, except for those v's for which no p_H is found so that v = \int_a^b \mu_W(h)\,p_H(h)\,dh (i.e., v's that are greater than the upper bound of \int_a^b \mu_W(h)\,p_H(h)\,dh), for which μ_{P_W} is set to zero by (17), resulting in a "trimmed" version of "Probable." The similarity of the output with "Probable" is not exactly 1, due to the approximation enforced by the envelope detection algorithm and possible "trimming." We could stop with theoretical result (17), but we want to verify that our computational method also leads to the right-hand side of (17).


TABLE X SIMILARITIES BETWEEN THE WORDS IN FIG. 10 AND THE UMFS OF THE LINGUISTIC PROBABILITY WORDS OF FIG. 3

Fig. 10. Detected envelopes μ_{P_W}(v), which are solutions to the PJW problem "Probably John is W. What is the probability that John is W?"

In order to demonstrate (17), we calculated the solution to the PJW problem “Probably John is W . What is the probability that John is W ?” We used UMF T1 FS word models, and divided the intervals [160, 183.8] and [5, 10] into 200 equally spaced points so that there were 200 × 200 = 40 000 Gaussian probability distributions. The solutions for each W are shown in Fig. 10, and their similarities with the vocabulary of linguistic probabilities

modeled by UMFs are given in Table X. Observe that five of the seven solutions map to the word "Probable," which is the result that was expected.
For the words Very short and Very tall, we got zero MFs over the whole [0, 1] range, due to the fact that the overlaps of the selected p_H's with μ_W were small, so the maximum v = \int_a^b \mu_W(h)\,p_H(h)\,dh was so small that μ_Probable(v) = 0 for all of the obtained v's. (Note that μ_Probable(v) = 0 for v \notin [0.5086, 0.7914].) In addition, for the v's that are not equal to \int_a^b \mu_W(h)\,p_H(h)\,dh for any p_H, the GEP in (17) and our algorithm set μ_{P_W} = 0. One can interpret this phenomenon as a discrepancy between the FS models of the extreme words "Very short" and "Very tall" and the world knowledge about the ranges of the parameters of the probability distributions, i.e., the possibility assigned by the extreme words to height values that are close to the endpoints of the interval of heights is large, while the probability densities that are assigned to those points by the probability distributions are very small.
This may suggest that, to avoid such a discrepancy, one should establish a new scale for heights based on the ranges for means and standard deviations (e.g., one could say that the smallest height to be considered is the smallest mean height (160 cm) minus twice the smallest conceivable standard deviation (2 × 5 cm), which gives 150 cm, and the largest height that has to be considered is the largest average height (183.6 cm) plus twice the largest conceivable standard deviation (2 × 10 cm), which gives 203.6 cm). Collecting data from subjects about heights on the new scale [150, 203.6] might solve the discrepancy problem, because then even extreme words have reasonable amounts of overlap with the probability distributions. However, neglecting people with heights less than 150 cm or more than 203.6 cm does not seem to be a reasonable decision. We leave further investigation of this problem to future research.
If we use a wider range for the means of the probability distributions, e.g., [142.7, 201], all seven solutions do map into "Probable." This suggests (somewhat surprisingly) that, besides the world knowledge, the validation process needs to be taken into account for determining the ranges of parameters of probability distributions. How to validate the GEP approach for solving ACWW problems in a more general way is an important and open research question, one that we leave for future research.

V. ENGINEERING ADVANCED COMPUTING WITH WORDS PROBLEM

In this section, we solve a specific problem about product reliability to demonstrate how ACWW problems for real-world


applications can be solved by the methodology of this paper. Consider the following statement, which provides a subjective judgment about the reliability of a product, and a related question: Probably product X is highly reliable. What is the probability that the reliability of X is R? R represents a word describing reliability, and can be one of the words: None to very little, Very low, Low, More or less low, From fair to more or less high, More or less high, High, and Extremely high. These words constitute the user-friendly eight-word vocabulary introduced in [23, ch. 7]. Their FOUs are depicted in Fig. 11 and the parameters of the MFs are given in Table XI. The pairwise Jaccard similarities between those words are given in Table XII. Note that the threshold of pairwise similarity for an appropriate partitioning of the space was set to 0.6 in [23, ch. 7]; therefore, this vocabulary represents a good partitioning of the space.

Fig. 11. Vocabulary of IT2 FSs representing linguistic reliability [23, Fig. 7.5].

TABLE XI MEMBERSHIP FUNCTIONS OF THE WORDS DEPICTED IN FIG. 11 [23, TABLE 7.13]

We chose a [0, 10] scale, and interpreted reliability as "time to failure," so "high reliability" corresponds to a large time to failure. This scale is a "hypothetical scale" and can be rescaled to any appropriate time scale. As in Section III, we used the UMFs of IT2 FS models of reliability words as the T1 FS models of those words (see Table XI and Fig. 11).

In the reliability literature, time to failure is modeled by a variety of distributions including exponential, Gaussian, and Weibull. Here, we assume that the distribution of time to failure (as a measure of reliability) is Weibull, whose probability density function is

f(x|\alpha, \beta) = \beta\,\alpha^{-\beta}\, x^{\beta - 1}\, e^{-(x/\alpha)^{\beta}}\, I_{[0,\infty)}(x)    (18)

where I_{[0,∞)}(·) is the indicator function of the interval [0, ∞). In all of the simulations of this section, we chose⁷ α ∈ [3, 8] and β ∈ [0.01, 10], and normalized the Weibull distributions by F(10|α, β) − F(0|α, β), where F(·|α, β) is the cumulative distribution function of a Weibull distribution with parameters α and β, to make them finite probability distributions on [0, 10]. We divided each of the intervals pertinent to α and β into 200 equally spaced points, obtaining 40 000 Weibull probability distributions. Solutions from the GEP using UMFs were obtained employing the same methodology as was applied to the PJW problem; the detected envelopes are shown in Fig. 12.

Fig. 12. Detected envelopes μ_{P_R} when UMF fuzzy set models of words and Weibull distributions were used.

We then calculated the Jaccard's similarity of the solutions with each linguistic probability, so as to translate the results into natural language words. Those similarities are summarized in Table XIII. All of our solutions to the problem "What is the probability that the reliability of product X is R," given "Probably product X has high reliability," are summarized in Table XIV. They were obtained by choosing the word (in Table XIII) that has the largest similarity; thus, the linguistic solutions to this problem are:
"It is extremely improbable that X's reliability is none to very little."
"It is extremely improbable that X has very low reliability."
"It is extremely improbable that X has low reliability."
"It is extremely improbable that X has more or less low reliability."
"It is probable that X's reliability is from fair to more or less high."
"It is very probable that X's reliability is more or less high."
"It is probable that X's reliability is high." (which means that the algorithm works correctly, since we have already assumed that "Probably X has high reliability.")
"It is improbable that X's reliability is extremely high."

Comment: In some of the solutions there is a "spike" close to v = 0. Actually, when the distributions used for solving the reliability problems (or the PJW problems) have "tails" (like Gaussian and Weibull distributions), the membership functions of P_R (or P_W), at least for some R's (W's), are nonzero for v's that are very close to zero. When μ_R is far from μ_High (e.g., R = From fair to more or less high satisfies this), and when a probability distribution p has a large amount of overlap with μ_High, then \int_a^b \mu_{\mathrm{High}}(\lambda)\,p(\lambda)\,d\lambda can be large, so that \mu_{\mathrm{Probable}}(\int_a^b \mu_{\mathrm{High}}(\lambda)\,p(\lambda)\,d\lambda) \neq 0. However, since μ_R lies in the region of the tail of p, v = \int_a^b \mu_R(\lambda)\,p(\lambda)\,d\lambda is very small, but nonzero; hence, μ_{P_R}(v) is nonzero for very small values of v. One may therefore expect such spikes to appear in many solutions to the reliability and PJW problems that mainly look like right-shoulders or interiors (but are actually nonconvex fuzzy sets due to the spikes). Those spikes are so small that, in many solutions, they are not visible. More importantly, they do not contribute very much to similarity calculations, because they have very small areas.

⁷Since we normalize the probability distributions and make them finite on [0, 10], for whatever set of parameters we select, the mean of the truncated probability distributions stays within [0, 10]; however, we showed through numerical simulation that when α ∈ [3, 8] and β ∈ [0.01, 10], the means of the probability distributions that we chose are in the range [0.0578, 7.6106] and their standard deviations are in the range [0.3434, 2.7998].


TABLE XII PAIRWISE SIMILARITIES BETWEEN THE WORDS DEPICTED IN FIG. 11

TABLE XIII SIMILARITIES BETWEEN THE WORDS DEPICTED IN FIG. 12 AND LINGUISTIC PROBABILITY WORDS

TABLE XIV SUMMARY OF THE SOLUTIONS TO THE PROBLEM "WHAT IS THE PROBABILITY THAT THE RELIABILITY OF PRODUCT X IS R," GIVEN "PROBABLY PRODUCT X HAS HIGH RELIABILITY"

P_R when R is:                      UMF
None to very little                 Extremely improbable
Very low                            Extremely improbable
Low                                 Extremely improbable
More or less low                    Extremely improbable
From fair to more or less high      Probable
More or less high                   Very probable
High                                Probable
Extremely high                      Improbable

VI. OTHER ZADEH ADVANCED COMPUTING WITH WORDS CHALLENGE PROBLEMS

Zadeh has proposed some other challenge problems for ACWW [49], some of which can also be solved by the GEP. In this section, we present three of them so that the reader will see that the methodology we have presented for the PJW problems can in principle also be used to solve the other problems.

A. Tall Swedes Problem (AHS)

The tall Swedes problem⁸ is about the average height of Swedes (AHS), and is: Most Swedes are tall. What is the average height of Swedes? This problem involves a linguistic quantifier (Most) and an implicit assignment of the linguistic quantifier to the portion of tall Swedes. The portion of Swedes who are tall is equivalent to the probability of the fuzzy event Tall, and is calculated as

P_{\mathrm{Tall}} = \int_a^b \mu_{\mathrm{Tall}}(h)\, p_H(h)\, dh    (19)

in which a and b are the minimum and maximum possible heights and p_H is the probability distribution function of heights, which clearly satisfies

\int_a^b p_H(h)\, dh = 1.    (20)

The soft constraint "Most" is assigned to P_Tall. On the other hand, the average height of Swedes is calculated as

AH = \int_a^b p_H(h)\, h\, dh.    (21)

To derive the soft constraint imposed on AH by the fact that P_Tall is constrained by "Most," one needs to use the framework of the GEP. In the Tall Swedes problem, f = P_Tall, where

f : X^N_{[a,b]} \rightarrow \mathbb{R}    (22)

in which X^N_{[a,b]} is the space of all possible height probability distribution functions on [a, b]; and, g = AH, where

g : X^N_{[a,b]} \rightarrow \mathbb{R}.    (23)

⁸We do not use the acronym "TSP" because it is already widely used for the famous Traveling Salesman Problem.


The GEP then implies that the soft constraint on the average height of Swedes is computed as

\mu_{AH}(v) = \sup_{p_H \,|\, v = \int_a^b p_H(h)\,h\,dh,\ \int_a^b p_H(h)\,dh = 1} \mu_{\mathrm{Most}}\!\left(\int_a^b p_H(h)\,\mu_{\mathrm{Tall}}(h)\,dh\right).    (24)

Comparing (24) and (9), it is clear that our methodology for solving (9) can also be used to solve (24). It is worth noting that other approaches to solving the tall Swedes problem have been offered in [27] and [34].

We are interested in converting Zadeh's challenge problems to analogous engineering problems. An example of such a problem for the AHS problem would be the following: Most of the products of Company X have somewhat short lifetimes. What is the average life-time that is expected from the products of Company X?

B. Robert's Problem (RP)

The RP is as follows: Usually Robert leaves his office at about 5 pm. Usually it takes Robert about an hour to get home from work. What is the probability that Robert is at home at (or before) 6:15 pm?

The time of arrival of Robert, Z, the time of departure of Robert, X, and his travel time, Y, are three random variables that satisfy

Z = X + Y.    (25)

The probability density function of Z is calculated as

p_Z(v) = \int_{-\infty}^{+\infty} p_X(\lambda)\, p_Y(v - \lambda)\, d\lambda \equiv p_X * p_Y.    (26)

The probability of the fuzzy event A ≡ About 5 pm, P_A, is

P_A = \int_{a_1}^{b_1} p_X(\lambda)\, \mu_A(\lambda)\, d\lambda    (27)

where a_1 and b_1 are, respectively, the earliest and latest possible times that Robert leaves the office. The probability of the fuzzy event B ≡ About an hour, P_B, is

P_B = \int_{a_2}^{b_2} p_Y(\lambda)\, \mu_B(\lambda)\, d\lambda    (28)

where a_2 and b_2 are, respectively, the smallest and the largest possible amounts of time that it takes Robert to get home from work. The probability that Robert is at home before t = 6:15 pm, P_Home, is

P_{\mathrm{Home}} = \int_{t_0}^{t} p_Z(\lambda)\, d\lambda \equiv w    (29)

where t_0 is the earliest time for Robert to get home. P_A and P_B are both constrained by "usually." We want to find the constraint Π on P_Home, i.e., given

\int_{a_1}^{b_1} p_X(\lambda)\, \mu_A(\lambda)\, d\lambda \ \text{is Usually}

and

\int_{a_2}^{b_2} p_Y(\lambda)\, \mu_B(\lambda)\, d\lambda \ \text{is Usually}

find Π such that

P_{\mathrm{Home}} = \int_{t_0}^{t} (p_X * p_Y)(\lambda)\, d\lambda \ \text{is } \Pi.


Then,

\mu_{\Pi}(w) = \sup_{\substack{p_X, p_Y \,|\, w = \int_{t_0}^{t} p_Z(\lambda)d\lambda,\ p_Z = p_X * p_Y \\ \int_{a_1}^{b_1} p_X(\lambda)d\lambda = 1,\ \int_{a_2}^{b_2} p_Y(\lambda)d\lambda = 1}} \min\!\left( \mu_{\mathrm{Usually}}\!\left(\int_{a_1}^{b_1} p_X(\lambda)\,\mu_A(\lambda)\,d\lambda\right),\ \mu_{\mathrm{Usually}}\!\left(\int_{a_2}^{b_2} p_Y(\lambda)\,\mu_B(\lambda)\,d\lambda\right) \right).    (30)

The following algorithm can be used to solve the RP:
1) Choose the families of probability distributions pertinent to p_X and p_Y.
2) Choose the ranges of the parameters of the families of the probability distributions.
3) Discretize the ranges of the parameters of the probability distributions.
4) For p_X, construct a pool of probability distributions having all possible combinations of parameters.
5) For p_Y, construct a pool of probability distributions having all possible combinations of parameters. For all of the members of the two pools:
   a) Choose two specific probability distributions p_X and p_Y from the pools (note that \int_{a_1}^{b_1} p_X(\lambda)\,d\lambda = 1 and \int_{a_2}^{b_2} p_Y(\lambda)\,d\lambda = 1, because p_X and p_Y are probability distribution functions on [a_1, b_1] and [a_2, b_2], respectively).
   b) Compute p_Z = p_X * p_Y.
   c) Compute w = \int_{t_0}^{t} p_Z(\lambda)\,d\lambda.
   d) Compute γ(w) = \min(\mu_{\mathrm{Usually}}(\int_{a_1}^{b_1} p_X(\lambda)\,\mu_A(\lambda)\,d\lambda),\ \mu_{\mathrm{Usually}}(\int_{a_2}^{b_2} p_Y(\lambda)\,\mu_B(\lambda)\,d\lambda)).
6) Construct a scatter plot of γ(w) versus w.
7) Detect an envelope of γ(w), namely μ_Π(w).

An equivalent engineering ACWW problem for Robert's problem would be [31]: Usually Product I lasts for about 5 years and is then replaced by a refurbished one. Usually, the refurbished Product I lasts for about 2 years. What is the probability that a new Product I is not needed until the eighth year?

C. Swedes and Italians Problem (SIP)

The SIP is: Most Swedes are much taller than most Italians. What is the difference in the average height of Swedes and the average height of Italians?

Zadeh formulates the SIP using generalized constraints, as follows. Assume that the population of Swedes is represented by {S_1, S_2, \ldots, S_m} and the population of Italians is represented by {I_1, I_2, \ldots, I_n}. The height of S_i is denoted by x_i, i = 1, \ldots, m, and the height of I_j is denoted by y_j, j = 1, \ldots, n. Let

x \equiv (x_1, x_2, \ldots, x_m),\quad y \equiv (y_1, y_2, \ldots, y_n).    (31)

Much taller is defined as a fuzzy relation on H_S^m \times H_I^n, in which H_S^m and H_I^n are, respectively, the spaces of all possible heights of Swedes and Italians, and the degree to which S_i is much taller than I_j is h_{ij} (i = 1, \ldots, m;\ j = 1, \ldots, n), i.e.,

h_{ij} \equiv \mu_{\mathrm{Much\ taller}}(x_i, y_j).    (32)

The cardinality c_i of the set of Italians in relation to whom a Swede S_i is much taller can be calculated using the following Σ-count [44]:

c_i = \sum_{j=1}^{n} \mu_{\mathrm{Much\ taller}}(x_i, y_j) = \sum_{j=1}^{n} h_{ij}.    (33)

The proportion of Italians in relation to whom S_i is much taller, ρ_i, is then

\rho_i \equiv \frac{c_i}{n}.    (34)

Using a T1 FS model for the linguistic quantifier Most, the degree u_i to which a Swede S_i is much taller than most Italians is

u_i = \mu_{\mathrm{Most}}(\rho_i).    (35)

The proportion of the m Swedes who are much taller than most Italians can be derived via the division of the Σ-count of those Swedes by m:

v = \frac{1}{m} \sum_{i=1}^{m} u_i.    (36)

Consequently, the degree to which v belongs to the linguistic quantifier Most is determined by

M(x, y) = \mu_{\mathrm{Most}}(v)    (37)

in which the fact that v is a function of x and y is emphasized in the argument of M. The difference in the average height of Swedes and the average height of Italians, d, is calculated as

d = \frac{1}{m} \sum_{i} x_i - \frac{1}{n} \sum_{j} y_j.    (38)

To derive the linguistic constraint imposed on d by (37), one exploits the GEP. Zadeh's approach states that there is a soft constraint "Most" on v, the Σ-count of Swedes who are much taller than most Italians, given by (36), and requires the calculation of the soft constraint on d, given by (38). Therefore, in (5), f(x, y) = v, and

f : H_S^m \times H_I^n \rightarrow \mathbb{R}.    (39)

In addition, g(x, y) = d = \frac{1}{m}\sum_i x_i - \frac{1}{n}\sum_j y_j, and

g : H_S^m \times H_I^n \rightarrow \mathbb{R}.    (40)

The GEP implies that the soft constraint D on the difference in average heights d is characterized by the following membership


function:

\mu_D(d) = \sup_{\substack{(x, y) \in H_S^m \times H_I^n \\ d = \frac{1}{m}\sum_i x_i - \frac{1}{n}\sum_j y_j}} \mu_{\mathrm{Most}}\!\left( \frac{1}{m} \sum_{i=1}^{m} \mu_{\mathrm{Most}}\!\left( \frac{1}{n} \sum_{j=1}^{n} \mu_{\mathrm{Much\ taller}}(x_i, y_j) \right) \right) = \sup_{\substack{(x, y) \in H_S^m \times H_I^n \\ d = \frac{1}{m}\sum_i x_i - \frac{1}{n}\sum_j y_j}} M(x, y)    (41)

in which (x, y) belongs to H_S^m × H_I^n, the space of all possible heights that Swedes and Italians can have. The sup is taken over this space since we have no information on the height distributions among these two nationalities.

The problem stated in (41) can be solved using the following algorithm, which is in the spirit of our methodology for solving the PJW problem:
1) Choose N_1 different x's and N_2 different y's for which x_i ∈ H_S and y_j ∈ H_I.
2) Construct a pool of all possible pairs (x, y), and for all of its members, repeat 2a–2c:
   a) Choose a specific (x, y) from the pool.
   b) Compute d = \frac{1}{m}\sum_i x_i - \frac{1}{n}\sum_j y_j.
   c) Compute γ(d) = \mu_{\mathrm{Most}}(\frac{1}{m}\sum_{i=1}^{m} \mu_{\mathrm{Most}}(\frac{1}{n}\sum_{j=1}^{n} \mu_{\mathrm{Much\ taller}}(x_i, y_j))).
3) Construct a scatter plot of γ(d) versus d.
4) Detect an envelope of γ(d), namely μ_D(d).

An equivalent engineering ACWW problem for the SIP would be: Most of the products of Company X have much longer lifetimes than most of the products of Company Y. What is the difference between the average life-time of the products of Company X and the average life-time of the products of Company Y?

VII. DISCUSSION

In this section, we briefly discuss how one can solve an ACWW problem in general. This issue is not an easy one, because ACWW problems can conceivably be very complicated, due to the complicated nature of natural languages; hence, more research needs to be done to recognize different categories of ACWW problems involving implicit assignments of probability, truth, possibility, etc. Nevertheless, inspired by the ACWW problems that have been discussed in this paper, we attempt to demonstrate a fairly general framework for solving ACWW problems that involve linguistic probabilities, namely:
1) Determine the linguistic probability words, linguistic quantifiers, and linguistic usuality words in the problem.
2) From the linguistic description of the problem, determine which quantities (i.e., numeric probabilities) are constrained by the words that were found in the previous step. This may be a challenging task, since those words may be implicitly assigned to the quantity (e.g., Most is assigned to the portion of Swedes who are tall, Probable is assigned to the probability that John is tall, etc.).
3) Formulate the numeric probabilities in the previous step:
   a) Use definite integrals involving indicator functions of non-fuzzy events or membership functions of fuzzy events and the (unknown) probability distribution functions pertinent to those events (continuous case); or
   b) Use the fraction of the cardinality of the fuzzy event over the cardinality of the whole population (discrete case). Like the continuous case, for which the probability density functions might be unknown, some quantities related to the population (e.g., the average height of a population or the variance of the height of a population) may be unknown.
4) Formulate the quantity on which a fuzzy constraint must be calculated, in terms of the unknowns (e.g., probability distributions) of Steps 3(a) or 3(b). This quantity may be an average, a numeric probability, or a function of some averages, etc. To do this, the calculi of fuzzy sets are needed.
5) Apply the GEP: knowing the soft constraints on the quantities formulated in Step 3, determine the soft constraint on the quantity that was formulated in Step 4. The sup of the GEP is taken over all possible unknowns of Steps 3(a) or 3(b). For example, if a probability distribution is not known, or if the height of the individuals in a population is unknown, the sup is taken over all admissible probability distributions or admissible heights of individuals. The word "admissible" is crucial and implies that additional problem-specific world knowledge is needed.
We leave it to the reader to return to our earlier sections to see how these five steps were implemented for specific ACWW problems.

VIII. RELATIONSHIP BETWEEN PERCEPTUAL COMPUTING AND ADVANCED COMPUTING WITH WORDS USING THE GENERALIZED EXTENSION PRINCIPLE

In this section, we investigate the relationship between ACWW and perceptual computing [23]. The architecture of a Perceptual Computer (Per-C) is illustrated in Fig. 13 [16]–[18].

Fig. 13. Perceptual computer, which uses FS models for words.

The Per-C has three major subsystems: 1) the encoder, which constructs fuzzy set models of words that are used by the CWW engine; 2) the CWW engine, which, based on the knowledge pertinent to a specific problem and the calculi of fuzzy sets, computes a fuzzy set at its output that is used by the decoder; and 3) the decoder, which transforms the FS output of the engine into a recommendation (a word) that is comprehensible to humans. Data that support the recommendation may also be provided.

In this paper, we used the IA [14] or the EIA [41] for establishing IT2 FS models of words. We then used UMFs or normal embedded T1 FSs extracted from the IT2 FS models of words as T1 FS models of them; hence, we used the IA or EIA as encoders. To compute the solutions to ACWW problems


using those fuzzy set models, we used the GEP; hence, the GEP acts as a new⁹ CWW engine. We then used Jaccard's similarity measure as a decoder to transform the fuzzy set provided by the GEP into a word. We can therefore conclude that ACWW using the GEP fits within the framework of Perceptual Computing.
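Read as a pipeline, the Per-C mapping above can be sketched in a few lines of Python. Here, encode_word, gep_engine, and decode are hypothetical stand-ins for the EIA-based encoder, the GEP computation, and the Jaccard decoder sketched earlier, not functions defined in the paper.

```python
from typing import Callable, Dict, Sequence

def perceptual_computer(problem_words: Sequence[str],
                        encode_word: Callable[[str], object],
                        gep_engine: Callable[[Dict[str, object]], object],
                        decode: Callable[[object], str]) -> str:
    """Per-C pipeline: encoder -> CWW engine (here, the GEP) -> decoder."""
    word_models = {w: encode_word(w) for w in problem_words}   # encoder (e.g., EIA-based FS models)
    solution_fs = gep_engine(word_models)                      # CWW engine (e.g., the GEP of Section III)
    return decode(solution_fs)                                 # decoder (e.g., Jaccard similarity to a vocabulary)

# Hypothetical usage:
# word = perceptual_computer(["Probable", "Tall", "Short"], encode_word, gep_engine, decode)
```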

IX. CONCLUSION

Zadeh believes that the GEP is one of the main aggregation tools for CWW, especially when dealing with probability constraints. Unfortunately, analytic solutions involving the GEP are presently hopeless, and existing numerical algorithms (e.g., α-cut decomposition) are not directly applicable. In this paper, we solved some of Zadeh's challenge problems that involve linguistic probabilities using a novel algorithm for implementing the GEP. To the best of our knowledge, our algorithm is the first attempt to actually implement Zadeh's solution to some of his ACWW problems. The applicability of our algorithm has been demonstrated by solving some specific ACWW problems.

An important prerequisite for using the GEP for problems involving linguistic probabilities is knowing the type of the probability distributions for a specific problem. This additional "world knowledge" is a luxury that may not always be available. A limitation of our algorithm for implementing the GEP is its "exhaustive search" over the space of the parameters involved in the problem, especially when the number of parameters (of the probability distributions) proliferates.10 Implementing the GEP without exhaustive search is certainly very worthy of future research.

We also believe that there is more than one way to solve ACWW problems, and have presented different solutions to Zadeh's ACWW problems in [29]–[31], [33], and [34] using syllogisms and Novel Weighted Averages [35]. Those solutions do not need any world knowledge about probability distributions, but they need world knowledge about the domains of the variables involved in the problems (e.g., time and height). This is at the expense of obtaining lower and upper probabilities for a problem, instead of getting only one fuzzy probability. Interestingly, those solutions also use the extension principle, but in a different way from the GEP. They involve the challenge of choosing appropriate compatibility measures [29], and may require multiple computational methodologies for the fusion of inconsistent information [31].

How to validate a particular solution to an ACWW problem is an important and interesting issue. It is worth reminding the reader that most ACWW problems are so complicated that a human cannot provide an answer to them; hence, validation by means of a Turing Test is not possible. In this paper, we established a method for investigating the correctness of solutions by checking whether the algorithm yields the correct solution to a problem whose solution is intuitively known. We believe that establishing methodologies for validating the solutions to an ACWW problem is an open and very important research topic. Since the GEP is the main tool for manipulating Z-numbers [52], our methodology may also contribute to developing viable frameworks for computing with Z-numbers.

ACKNOWLEDGMENT

The authors would like to thank Prof. Lotfi A. Zadeh for explaining the solutions to his ACWW challenge problems to us and for his continuing support and enthusiasm. They would also like to thank the reviewers for their constructive feedback and comments, which improved this paper.

9 Other CWW engines are: Novel Weighted Averages and IF–THEN rules.
10 An example of this occurs when we need the distribution of heights of people (men and women), which is a mixture of Gaussians (kf(x|α1, β1) + (1 − k)f(x|α2, β2)), and hence has five parameters.
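As a hypothetical illustration of footnote 10 (reading f(x|α, β) as a Gaussian density with mean α and standard deviation β, which is our assumption, and with parameter values and grid sizes that are our own choices), the snippet below writes out the five-parameter mixture density and shows how quickly an exhaustive GEP search over its parameters grows.

```python
import numpy as np

def mixture_pdf(x, k, alpha1, beta1, alpha2, beta2):
    """k*f(x|alpha1, beta1) + (1 - k)*f(x|alpha2, beta2): a two-component
    Gaussian mixture with five free parameters (mixing weight, two means,
    and two standard deviations)."""
    gauss = lambda x, a, b: np.exp(-0.5 * ((x - a) / b) ** 2) / (b * np.sqrt(2 * np.pi))
    return k * gauss(x, alpha1, beta1) + (1 - k) * gauss(x, alpha2, beta2)

# Even a coarse grid of 20 values per parameter already requires evaluating
# 20**5 = 3.2 million candidate distributions in an exhaustive GEP search.
print(mixture_pdf(1.70, 0.5, 1.62, 0.07, 1.76, 0.07))  # density at 1.70 m
print(20 ** 5)
```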

REFERENCES
[1] Amazon Mechanical Turk. (2013). [Online]. Available: https://www.mturk.com/mturk/
[2] Human height. (2013). [Online]. Available: https://en.wikipedia.org/wiki/Human_height
[3] Y. Cao and G. Chen, "A fuzzy petri-nets model for computing with words," IEEE Trans. Fuzzy Syst., vol. 18, no. 3, pp. 486–499, Jun. 2010.
[4] Y. Cao, M. Ying, and G. Chen, "Retraction and generalized extension of computing with words," IEEE Trans. Fuzzy Syst., vol. 15, no. 6, pp. 1238–1250, Dec. 2007.
[5] V. Cross and T. Sudkamp, Similarity and Compatibility in Fuzzy Set Theory: Assessment and Applications. New York, NY, USA: Physica-Verlag, 2002.
[6] Y. Ding and A. Lisnianski, "Fuzzy universal generating functions for multi-state system reliability assessment," Fuzzy Sets Syst., vol. 159, no. 3, pp. 307–324, 2008.
[7] J. Dunyak, I. Saad, and D. Wunsch, "A theory of independent fuzzy probability for system reliability," IEEE Trans. Fuzzy Syst., vol. 7, no. 3, pp. 286–294, Jun. 1999.
[8] S. Han and J. Mendel, "A new method for managing the uncertainties in evaluating multi-person multi-criteria location choices, using a perceptual computer," Ann. Oper. Res., vol. 195, no. 1, pp. 277–309, 2012.
[9] F. Herrera, S. Alonso, F. Chiclana, and E. Herrera-Viedma, "Computing with words in decision making: Foundations, trends and prospects," Fuzzy Optim. Decis. Making, vol. 8, no. 4, pp. 337–364, 2009.
[10] I. Karimi and E. Hüllermeier, "Risk assessment system of natural hazards: A new approach based on fuzzy probability," Fuzzy Sets Syst., vol. 158, no. 9, pp. 987–999, 2007.
[11] N. N. Karnik and J. M. Mendel, "Centroid of a type-2 fuzzy set," Inf. Sci., vol. 132, no. 1, pp. 195–220, 2001.
[12] E. S. Khorasani, P. Patel, S. Rahimi, and D. Houle, "An inference engine toolkit for computing with words," J. Ambient Intell. Humaniz. Comput., vol. 4, no. 4, pp. 451–470, Aug. 2013.
[13] E. S. Khorasani, S. Rahimi, and W. Calvert, "Formalization of generalized constraint language: A crucial prelude to computing with words," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 43, no. 1, pp. 246–258, Feb. 2013.
[14] F. Liu and J. M. Mendel, "Encoding words into interval type-2 fuzzy sets using an interval approach," IEEE Trans. Fuzzy Syst., vol. 16, no. 6, pp. 1503–1521, Dec. 2008.
[15] L. Martínez, D. Ruan, and F. Herrera, "Computing with words in decision support systems: An overview on models and applications," Int. J. Computat. Intell. Syst., vol. 3, no. 4, pp. 382–395, 2010.
[16] J. M. Mendel, "The perceptual computer: An architecture for computing with words," in Proc. 10th IEEE Int. Conf. Fuzzy Syst., 2001, vol. 1, pp. 35–38.
[17] J. M. Mendel, "An architecture for making judgments using computing with words," Int. J. Appl. Math. Comput. Sci., vol. 12, no. 3, pp. 325–336, 2002.
[18] J. M. Mendel, "Computing with words and its relationships with fuzzistics," Inf. Sci., vol. 177, no. 4, pp. 988–1006, 2007.
[19] J. M. Mendel, "Computing with words: Zadeh, Turing, Popper and Occam," IEEE Comput. Intell. Mag., vol. 2, no. 4, pp. 10–17, Nov. 2007.


[20] J. M. Mendel, J. Lawry, and L. A. Zadeh, "Foreword to the special section on computing with words," IEEE Trans. Fuzzy Syst., vol. 18, no. 3, pp. 437–440, 2010.
[21] J. M. Mendel and D. Wu, "Perceptual reasoning for perceptual computing," IEEE Trans. Fuzzy Syst., vol. 16, no. 6, pp. 1550–1564, Dec. 2008.
[22] J. M. Mendel and D. Wu, "Computing with words for hierarchical and distributed decision making," in Computational Intelligence in Complex Decision Systems. Paris, France: Atlantis, 2009.
[23] J. M. Mendel and D. Wu, Perceptual Computing: Aiding People in Making Subjective Judgments. New York, NY, USA: Wiley-IEEE Press, 2010.
[24] J. M. Mendel and D. Wu, "Challenges for perceptual computer applications and how they were overcome," IEEE Comput. Intell. Mag., vol. 7, no. 3, pp. 36–47, 2012.
[25] J. M. Mendel, L. A. Zadeh, E. Trillas, R. R. Yager, J. Lawry, H. Hagras, and S. Guadarrama, "What computing with words means to me [discussion forum]," IEEE Comput. Intell. Mag., vol. 5, no. 1, pp. 20–26, Feb. 2010.
[26] B. Möller, W. Graf, and M. Beer, "Safety assessment of structures in view of fuzzy randomness," Comput. Struct., vol. 81, no. 15, pp. 1567–1582, 2003.
[27] V. Novák and P. Murinová, "Intermediate quantifiers of natural language and their syllogisms," in Proc. World Conf. Soft Comput., 2011, pp. 178–184.
[28] P. Patel, E. Khorasani, and S. Rahimi, "An API for generalized constraint language based expert system," in Proc. Annu. Meet. North Amer. Fuzzy Inf. Process. Soc., 2012, pp. 1–6.
[29] M. R. Rajati and J. M. Mendel, "Lower and upper probability calculations using compatibility measures for solving Zadeh's challenge problems," in Proc. IEEE Int. Conf. Fuzzy Syst., 2012, pp. 1–8.
[30] M. R. Rajati and J. M. Mendel, "Solving Zadeh's Swedes and Italians challenge problem," in Proc. Annu. Meet. North Amer. Fuzzy Inf. Process. Soc., 2012, pp. 1–6.
[31] M. R. Rajati and J. M. Mendel, "Advanced computing with words using syllogistic reasoning and arithmetic operations on linguistic belief structures," in Proc. IEEE Int. Conf. Fuzzy Syst., 2013, pp. 1–8, paper F-1449.
[32] M. R. Rajati and J. M. Mendel, "Modeling linguistic probabilities and linguistic quantifiers using interval type-2 fuzzy sets," in Proc. 2013 Joint IFSA World Congr. Annu. Meet. North Amer. Fuzzy Inf. Process. Soc., 2013, pp. 327–332.
[33] M. R. Rajati, J. M. Mendel, and D. Wu, "Solving Zadeh's Magnus challenge problem on linguistic probabilities via linguistic weighted averages," in Proc. IEEE Int. Conf. Fuzzy Syst., 2011, pp. 2177–2184.
[34] M. R. Rajati, D. Wu, and J. M. Mendel, "On solving Zadeh's tall Swedes problem," in Proc. World Conf. Soft Comput., 2011, paper no. 442.
[35] M. R. Rajati and J. M. Mendel, "Novel weighted averages versus normalized sums in computing with words," Inf. Sci., vol. 235, pp. 130–149, 2013.
[36] M. Schilling, A. Watkins, and W. Watkins, "Is human height bimodal?," Amer. Statistician, vol. 56, no. 3, pp. 223–229, 2002.
[37] F. Tatari, M. Akbarzadeh-T, and A. Sabahi, "Fuzzy-probabilistic multi agent system for breast cancer risk assessment and insurance premium assignment," J. Biomed. Informat., vol. 45, pp. 1021–1034, 2012.
[38] J. Wang, "A subjective methodology for safety analysis of safety requirements specifications," IEEE Trans. Fuzzy Syst., vol. 5, no. 3, pp. 418–430, Aug. 1997.
[39] D. Wu and J. M. Mendel, "A comparative study of ranking methods, similarity measures and uncertainty measures for interval type-2 fuzzy sets," Inf. Sci., vol. 179, no. 8, pp. 1169–1192, 2009.
[40] D. Wu and J. M. Mendel, "Perceptual reasoning for perceptual computing: A similarity-based approach," IEEE Trans. Fuzzy Syst., vol. 17, no. 6, pp. 1397–1411, Dec. 2009.
[41] D. Wu, J. M. Mendel, and S. Coupland, "Enhanced interval approach for encoding words into interval type-2 fuzzy sets and its convergence analysis," IEEE Trans. Fuzzy Syst., vol. 20, no. 3, pp. 499–513, Jun. 2012.
[42] R. R. Yager, "Perception-based granular probabilities in risk modeling and decision making," IEEE Trans. Fuzzy Syst., vol. 14, no. 2, pp. 329–339, Apr. 2006.
[43] L. A. Zadeh, "Probability measures of fuzzy events," J. Math. Anal. Appl., vol. 23, no. 2, pp. 421–427, 1968.
[44] L. A. Zadeh, "A computational approach to fuzzy quantifiers in natural languages," Comput. Math. Appl., vol. 9, no. 1, pp. 149–184, 1983.
[45] L. A. Zadeh, "Fuzzy logic = computing with words," IEEE Trans. Fuzzy Syst., vol. 4, no. 2, pp. 103–111, May 1996.
[46] L. A. Zadeh, "From computing with numbers to computing with words," Ann. New York Acad. Sci., vol. 929, no. 1, pp. 221–252, 2001.
[47] L. A. Zadeh, "From computing with numbers to computing with words–from manipulation of measurements to manipulation of perceptions," in Logic, Thought and Action, D. Vanderveken, Ed. (Series Logic, Epistemology, and the Unity of Science, vol. 2). Dordrecht, The Netherlands: Springer-Verlag, 2005, pp. 507–544.
[48] L. A. Zadeh, "A summary and update of fuzzy logic," in Proc. 2010 IEEE Int. Conf. Granul. Comput., 2010, pp. 42–44.
[49] L. A. Zadeh, Computing with Words: Principal Concepts and Ideas (Series Studies in Fuzziness and Soft Computing, vol. 277). Berlin, Germany: Springer-Verlag, 2012.
[50] L. A. Zadeh, "The concept of a linguistic variable and its application to approximate reasoning—Part II," Inf. Sci., vol. 8, no. 4, pp. 301–357, 1975.
[51] L. A. Zadeh, "Precisiated natural language (PNL)," AI Mag., vol. 25, no. 3, 2004.
[52] L. A. Zadeh, "A note on Z-numbers," Inf. Sci., vol. 181, no. 14, pp. 2923–2932, 2011.
[53] J. Zhou, "Reliability assessment method for pressure piping containing circumferential defects based on fuzzy probability," Int. J. Pressure Vessels Piping, vol. 82, no. 9, pp. 669–678, 2005.

Mohammad Reza Rajati (S'12) was born in Kermanshah, Iran, in 1984. He received the B.Sc. degree in electrical engineering (with major option in control systems) from Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran, in 2005 and the M.Sc. degree in mechatronics engineering from K. N. Toosi University of Technology, Tehran, in 2008. He is currently working toward the Ph.D. degree in electrical engineering and the M.Sc. degree in mathematical statistics, respectively, with the Department of Electrical Engineering and the Department of Mathematics, University of Southern California (USC), Los Angeles, CA, USA, where he has been an Annenberg Fellow since 2009. He is currently a Research Assistant with the Center for Integrated Smart Oilfield Technologies (CiSoft), USC. He has published 25 technical papers. His research interests include computational intelligence and its applications to control systems, decision making, smart oilfield technologies, and signal processing. He is also interested in classical control theory, mathematical statistics, and machine learning. Mr. Rajati is a student member of the IEEE Computational Intelligence Society. He has served on the editorial board of two books and as a Reviewer for many journals and conferences in the area of computational intelligence, including the IEEE TRANSACTIONS ON FUZZY SYSTEMS, the IEEE TRANSACTIONS ON CYBERNETICS, IEEE Computational Intelligence Magazine, the IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, and the International Joint Conference on Neural Networks. He received a Best Student Paper Award at the 2012 North American Fuzzy Information Processing Society Meeting, Berkeley, CA, USA.

Jerry M. Mendel (S'59–M'61–SM'72–F'78–LF'04) received the Ph.D. degree in electrical engineering from the Polytechnic Institute of Brooklyn, Brooklyn, NY, USA. He is currently a Professor of electrical engineering and systems architecting engineering with the University of Southern California, Los Angeles, CA, USA, where he has been since 1974. He has published more than 550 technical papers and is the author and/or editor of ten books, including Uncertain Rule-based Fuzzy Logic Systems: Introduction and New Directions (Prentice-Hall, 2001), Perceptual Computing: Aiding People in Making Subjective Judgments (Wiley & IEEE Press, 2010), and Advances in Type-2 Fuzzy Sets and Systems (Springer, 2013). His present research interests include type-2 fuzzy logic systems and their applications to a wide range of problems, including smart oil field technology, computing with words, and fuzzy set qualitative comparative analysis. Dr. Mendel is a Distinguished Member of the IEEE Control Systems Society and a Fellow of the International Fuzzy Systems Association. He was the President of the IEEE Control Systems Society in 1986. He is a member of the Administrative Committee of the IEEE Computational Intelligence Society, was the Chairman of its Fuzzy Systems Technical Committee (TC), and was the Chairman of the Computing With Words Task Force of that TC. Among his awards are the 1983 Best Transactions Paper Award from the IEEE Geoscience and Remote Sensing Society, the 1992 Signal Processing Society Paper Award, the 2002 and 2014 IEEE TRANSACTIONS ON FUZZY SYSTEMS Outstanding Paper Awards, a 1984 IEEE Centennial Medal, an IEEE Third Millennium Medal, and a Fuzzy Systems Pioneer Award from the IEEE Computational Intelligence Society in 2008.