Statistical Papers 40, 277-295 (1999)

Statistical Papers © Springer-Verlag 1999

Estimating the expected value of fuzzy random variables in random samplings from finite populations

M. Asunción Lubiano and M. Ángeles Gil

Received: April 27, 1998; revised version: September 16, 1998

In this paper we consider the problem of estimating the expected value of a fuzzy-valued random element in random samplings from finite populations. To this purpose, we quantify the associated sampling error by means of a parameterized measure we have introduced in a previous paper.

Keywords: Aumann's integral, expected value of a fuzzy random variable, fuzzy random variable, $\vec{\lambda}$-mean squared dispersion, random samplings, random set.

1

INTRODUCTION

Given a random experiment characterized by a probability space, random elements associated with it are assignment processes converting experimental outcomes into numerical values, vector values, set values, and others. Depending on the nature of the images in the processes above, we can distinguish different kinds of random elements, like the well-known random variables, the random vectors, the random sets and, more recently, the fuzzy random variables.

Fuzzy random variables correspond to a high level of generalization (in the sense that they include random variables, vectors and sets as particular cases), but to an intermediate level of precision (in between random variables/vectors and random sets). Fuzzy random variables, as introduced by Puri and Ralescu (1986), have been formalized in terms of notions from the Theory of Random Sets, so that the measurability and expectation of fuzzy random variables have been defined on the basis of the measurability condition for a set-valued mapping (which is stated by using Hausdorff's metric) and the Aumann integral, respectively. Several statistical studies for fuzzy random variables concerning either inferential techniques, or decision-making problems, or the measurement of inequality, have been developed (see, for instance, Ralescu 1982, 1995ab, Ralescu and Ralescu 1984, 1986, Kruse and Meyer 1987, Gil and López-Díaz 1996, Gebhardt et al. 1998, Gil et al. 1998ab).

In this paper we discuss the problem of estimating the expected value of a fuzzy random variable in a finite population, when only the information relating to a sample from this population can be examined. When we deal with (real-valued) random variables, we can consider two sampling theories, namely, the standard sample survey theory and the classical sampling one. In the classical theory of Mathematical Statistics, variables are usually assumed to be distributed in accordance with a known parametric distribution (say Normal, Poisson, Pascal, etc.), whereas in the sample survey theory no such knowledge is assumed and populations are supposed to be finite. In the framework of fuzzy random variables, the assumption of having a known parametric distribution is unrealistic. In fact, the literature on models of distributions for fuzzy random variables is quite scarce (see Puri and Ralescu 1985, Ralescu 1995c), so that in this context the sample survey theory makes more sense than the classical one.

Throughout this paper we will show that if we consider random samplings with or without replacement, the sample expected value of a fuzzy random variable is an unbiased (in Puri and Ralescu's sense) estimator of the corresponding population expected value. To assess the accuracy of a fuzzy estimator of a fuzzy parameter, we have previously established (Lubiano et al. 1999) a measure extending the mean square error of the real-valued case: the $\vec{\lambda}$-mean squared dispersion. The main contribution of this paper lies in the possibilities (due to the use of the $\vec{\lambda}$-mean squared dispersion) of quantifying the error associated with the unbiased estimators in terms of this measure, and especially of estimating this error and choosing appropriate sample sizes. The results obtained will be applied in an example and some related open problems will be discussed.

2

PRELIMINARIES

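Let $\mathcal{K}_c(\mathbb{R}^p)$ be the class of nonempty compact and convex subsets of the Euclidean space $\mathbb{R}^p$ ($p \in \mathbb{N}$), and let $\mathcal{F}_c(\mathbb{R}^p)$ denote the class of the mappings $V : \mathbb{R}^p \to [0,1]$ (or fuzzy subsets of $\mathbb{R}^p$) such that $V_\alpha \in \mathcal{K}_c(\mathbb{R}^p)$ for all $\alpha \in [0,1]$, where $V_\alpha = \{x \in \mathbb{R}^p \mid V(x) \ge \alpha\}$ if $\alpha \in (0,1]$ and $V_0 = \mathrm{cl}\{x \in \mathbb{R}^p \mid V(x) > 0\}$. We will refer to $V_\alpha$ as the $\alpha$-level set of $V$.

The space $\mathcal{K}_c(\mathbb{R}^p)$ can be endowed with a linear structure induced by the scalar product and the Minkowski addition, that is,

$$\lambda A = \{\lambda a \mid a \in A\}, \qquad A + B = \{a + b \mid a \in A,\ b \in B\},$$

for all $A, B \in \mathcal{K}_c(\mathbb{R}^p)$ and $\lambda \in \mathbb{R}$. The Hausdorff distance between two elements $A, B \in \mathcal{K}_c(\mathbb{R}^p)$ is defined as

$$d_H(A,B) = \max\Big\{ \sup_{a \in A} \inf_{b \in B} \|a - b\|,\ \sup_{b \in B} \inf_{a \in A} \|a - b\| \Big\},$$

where $\|\cdot\|$ is the Euclidean norm in $\mathbb{R}^p$. $(\mathcal{K}_c(\mathbb{R}^p), d_H)$ is a complete and separable metric space (see Debreu 1967).

In a similar way, the space $\mathcal{F}_c(\mathbb{R}^p)$ can be endowed with a linear structure induced by the fuzzy product $\odot$ and addition $\oplus$ based on Zadeh's extension principle (1975), in accordance with which (and following Nguyen 1978) for each $\alpha \in [0,1]$ we have that

$$(\lambda \odot V)_\alpha = \lambda V_\alpha, \qquad (V \oplus W)_\alpha = V_\alpha + W_\alpha,$$

for all $V, W \in \mathcal{F}_c(\mathbb{R}^p)$ and $\lambda \in \mathbb{R}$. On the other hand, $\mathcal{F}_c(\mathbb{R}^p)$ can be endowed with the metric $d_\infty$ defined (Puri and Ralescu 1981) as follows:

$$d_\infty(V, W) = \sup_{\alpha \in (0,1]} d_H(V_\alpha, W_\alpha) \quad \text{for all } V, W \in \mathcal{F}_c(\mathbb{R}^p).$$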
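The operations and metrics above can be illustrated in the one-dimensional case $p = 1$, where compact convex sets are closed intervals. The following is a minimal sketch under assumptions that are not part of the paper: fuzzy values are taken to be triangular fuzzy numbers encoded as `(left, peak, right)`, all function names are hypothetical, and $d_\infty$ is evaluated on a finite grid of levels (for triangular numbers the endpoint differences are linear in the level, so a grid containing 0 and 1 already reaches the supremum).

```python
from typing import Tuple

Interval = Tuple[float, float]            # compact interval [lo, hi]
Triangular = Tuple[float, float, float]   # (left, peak, right), left <= peak <= right


def minkowski_add(a: Interval, b: Interval) -> Interval:
    """A + B = {a + b : a in A, b in B} for compact intervals."""
    return (a[0] + b[0], a[1] + b[1])


def scalar_mul(lam: float, a: Interval) -> Interval:
    """lam * A = {lam * a : a in A}; a negative lam swaps the endpoints."""
    lo, hi = lam * a[0], lam * a[1]
    return (min(lo, hi), max(lo, hi))


def hausdorff(a: Interval, b: Interval) -> float:
    """Hausdorff distance between two compact intervals of the real line."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def level_set(v: Triangular, alpha: float) -> Interval:
    """alpha-level of a triangular fuzzy number (alpha = 0 gives the closed support)."""
    left, peak, right = v
    return (left + alpha * (peak - left), right - alpha * (right - peak))


def d_inf(v: Triangular, w: Triangular, steps: int = 100) -> float:
    """Grid evaluation of sup over alpha of d_H(V_alpha, W_alpha)."""
    alphas = [k / steps for k in range(steps + 1)]
    return max(hausdorff(level_set(v, a), level_set(w, a)) for a in alphas)


print(hausdorff((0.0, 1.0), (2.0, 5.0)))
print(d_inf((0, 1, 2), (1, 2, 4)))
```

For general fuzzy values the grid only approximates the supremum, so a finer grid or a closed form for the level sets would be needed.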

$(\mathcal{F}_c(\mathbb{R}^p), d_\infty)$ is a complete nonseparable metric space (see Puri and Ralescu 1986, Klement et al. 1986).

Let $(\Omega, \mathcal{A}, P)$ be a probability space. If $\mathcal{B}_{d_\infty}$ is the $\sigma$-field generated by the open sets in $\mathcal{F}_c(\mathbb{R}^p)$ defined in terms of the $d_\infty$ metric, then a fuzzy random variable (also called a convex random fuzzy set) associated with $(\Omega, \mathcal{A})$ is a Borel-measurable function, that is, an $(\mathcal{A}, \mathcal{B}_{d_\infty})$-measurable function $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^p)$. The concept above is sometimes generalized by defining a fuzzy random variable as a mapping $\mathcal{X} : \Omega \to \mathcal{F}_c(\mathbb{R}^p)$ such that the section $\mathcal{X}_\alpha : \Omega \to \mathcal{K}_c(\mathbb{R}^p)$, which is called the $\alpha$-level function and is defined by $\mathcal{X}_\alpha(\omega) = (\mathcal{X}(\omega))_\alpha$, is a (compact) random set (i.e., an $(\mathcal{A}, \mathcal{B}_{d_H})$-measurable function) for all $\alpha \in [0,1]$. In fact, the results in this paper are all applicable to this general notion.

If $\mathcal{X}$ is a fuzzy random variable, then the expected value of $\mathcal{X}$ in $(\Omega, \mathcal{A}, P)$ is defined as the unique element $\tilde{E}(\mathcal{X}) \in \mathcal{F}_c(\mathbb{R}^p)$ such that for each $\alpha \in [0,1]$ the set $(\tilde{E}(\mathcal{X}))_\alpha$ is the Aumann integral (Aumann 1965) of $\mathcal{X}_\alpha$ with respect to $P$, that is,

$(\tilde{E}(\mathcal{X}))_\alpha = \{\, E(f) \mid f : \Omega \to \mathbb{R}^p,\ E(\|f\|)$ […]

then we will complete the preliminary sample by choosing an additional simple random sample of size $n - n_1$. Since the method in Theorem 3.5 supplies an overassessment of the minimum sample size for our aim, the final sample size is expected to ensure this condition. Some additional comments about the approach considered to determine a suitable sample size will be given in the concluding section. Alternative ways to estimate $\Delta^2_{\vec{\lambda}}(\mathcal{X})$ in the last problem are suggested in most texts on Sampling Theory (cf. Cochran 1977, Barnett 1991).

4

ESTIMATING THE EXPECTED VALUE OF A FUZZY RANDOM VARIABLE IN RANDOM SAMPLING WITH REPLACEMENT

In this section we examine a problem similar to that in Section 3, but in random sampling with replacement from a finite population $\Omega$. Therefore, we still consider the uniform distribution on the probability space associated with $\Omega$, but individuals or units can arise more than once in the same sample. Although random sampling with replacement from a finite population is only occasionally employed in practice, it is worth analyzing the results for it, since they can serve as the basis to draw conclusions for more complex sampling schemes.

Assume that a sample of size $n$ is chosen at random and with replacement from the overall population. The sample expected value of $\mathcal{X}$, $\overline{\mathcal{X}}_n$, defines a fuzzy random variable associated with the probability space $(T_n^w, \mathcal{P}(T_n^w), p^w)$ ($T_n^w$ being the space of the $CR_{N,n} = \binom{N+n-1}{n}$ distinct possible random samples with replacement of size $n$ from the given population, and $p^w[\nu]$ being the probability of choosing the sample $\nu \in T_n^w$, which does not determine a uniform distribution on $T_n^w$ in the considered sampling). As for simple random sampling, $\overline{\mathcal{X}}_n$ is an unbiased fuzzy estimator of $\overline{\mathcal{X}}$ in random sampling with replacement. Thus,

Theorem 4.1 In random sampling with replacement of size $n$ from the population $\Omega$ of $N$ sampling units, $U_1, \ldots, U_N$, the $\mathcal{F}_c(\mathbb{R})$-valued estimator $\overline{\mathcal{X}}_n$ is unbiased to estimate the population expected value $\overline{\mathcal{X}}$, that is, $\tilde{E}(\overline{\mathcal{X}}_n) = \overline{\mathcal{X}}$ (where the last expectation $\tilde{E}(\overline{\mathcal{X}}_n)$ is computed over the space $(T_n^w, \mathcal{P}(T_n^w), p^w)$).

Proof. Indeed,

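$$\tilde{E}(\overline{\mathcal{X}}_n) = \bigoplus_{\nu \in T_n^w} p^w[\nu] \odot \overline{\mathcal{X}}_n[\nu],$$

and

$$\overline{\mathcal{X}}_n[\nu] = \frac{1}{n} \odot \bigoplus_{j=1}^{N} t_j[\nu] \odot \mathcal{X}(U_j),$$

where $t_j[\nu]$ is the number of times that $U_j$ appears in $\nu \in T_n^w$, $j = 1, \ldots, N$. The real-valued random variables $t_j$ are defined on $(T_n^w, \mathcal{P}(T_n^w), p^w)$ and the random vector $(t_1, \ldots, t_N)$ has a multinomial distribution $\mathcal{M}(n, 1/N, \ldots, 1/N)$. By virtue of the properties of the expectation of fuzzy random variables, we obtain that

$$\tilde{E}(\overline{\mathcal{X}}_n) = \frac{1}{n} \odot \bigoplus_{j=1}^{N} E(t_j) \odot \mathcal{X}(U_j),$$

where $E(t_j) = \sum_{\nu \in T_n^w} p^w[\nu]\, t_j[\nu] = n/N$, and hence $\tilde{E}(\overline{\mathcal{X}}_n) = \overline{\mathcal{X}}$. □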
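The unbiasedness stated in Theorem 4.1 can be checked by brute force on a toy population. The sketch below makes several assumptions that are not in the paper: the population values are hypothetical triangular fuzzy numbers `(left, peak, right)`, for which the fuzzy addition and the product by a nonnegative scalar act componentwise, and the expectation is taken over the $N^n$ equally likely ordered draws rather than over the unordered samples in $T_n^w$ (both induce the same multinomial distribution for the counts $t_j$). Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction
from itertools import product


def fuzzy_mean(values):
    """(1/k) times the fuzzy sum of k triangular fuzzy numbers (left, peak, right)."""
    k = len(values)
    return tuple(sum(Fraction(v[i]) for v in values) / k for i in range(3))


def expected_sample_mean(population, n):
    """Expected value of the sample mean over all ordered draws with replacement."""
    N = len(population)
    weight = Fraction(1, N ** n)          # each ordered draw has probability 1 / N^n
    total = [Fraction(0)] * 3
    for draw in product(range(N), repeat=n):
        mean = fuzzy_mean([population[j] for j in draw])
        total = [t + weight * c for t, c in zip(total, mean)]
    return tuple(total)


# Hypothetical population of N = 4 triangular fuzzy values.
population = [(10, 15, 25), (0, 5, 10), (40, 50, 60), (20, 30, 35)]

print(expected_sample_mean(population, 3) == fuzzy_mean(population))
```

The enumeration grows as $N^n$, so this is only a sanity check for very small populations, not a practical procedure.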

Regarding the accuracy, or precision, of $\overline{\mathcal{X}}_n$ in estimating $\overline{\mathcal{X}}$, we can now establish that

Theorem 4.2 In random sampling with replacement of size $n$ from the population $\Omega$ of $N$ sampling units, the $\vec{\lambda}$-mean squared dispersion of $\overline{\mathcal{X}}_n$ about $\overline{\mathcal{X}}$ is given by

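$$\mathrm{MSD}_{\vec{\lambda}}(\overline{\mathcal{X}}_n, \overline{\mathcal{X}}) = \Delta^2_{\vec{\lambda}}(\overline{\mathcal{X}}_n) = \frac{\Delta^2_{\vec{\lambda}}(\mathcal{X})}{n},$$

where $\Delta^2_{\vec{\lambda}}(\overline{\mathcal{X}}_n)$ is computed over $T_n^w$ whereas $\Delta^2_{\vec{\lambda}}(\mathcal{X})$ is computed over $\Omega$.

Proof. Indeed, following arguments similar to those in Theorem 3.2, we have that

$$\Delta^2_{\vec{\lambda}}(\overline{\mathcal{X}}_n) = \int_{[0,1]} \Big[ \lambda_1 \mathrm{Var}\Big(\frac{1}{n}\sum_{j=1}^{N} t_j \sup \mathcal{X}_\alpha(U_j)\Big) + \lambda_2 \mathrm{Var}\Big(\frac{1}{n}\sum_{j=1}^{N} t_j\, \mathrm{mid}\, \mathcal{X}_\alpha(U_j)\Big) + \lambda_3 \mathrm{Var}\Big(\frac{1}{n}\sum_{j=1}^{N} t_j \inf \mathcal{X}_\alpha(U_j)\Big) \Big]\, d\alpha$$

and, since for each $j, l \in \{1, \ldots, N\}$ with $l \neq j$ we have that $\mathrm{Var}(t_j) = (N-1)\,n/N^2$ and $\mathrm{Cov}(t_j, t_l) = -n/N^2$, we can conclude that

$$\begin{aligned}
\Delta^2_{\vec{\lambda}}(\overline{\mathcal{X}}_n) = \frac{1}{n} \int_{[0,1]} \Big[\, & \lambda_1 \Big\{ \frac{N-1}{N^2} \sum_{j=1}^{N} (\sup \mathcal{X}_\alpha(U_j))^2 - \frac{2}{N^2} \sum_{j=1}^{N-1} \sum_{l=j+1}^{N} \sup \mathcal{X}_\alpha(U_j)\, \sup \mathcal{X}_\alpha(U_l) \Big\} \\
+\ & \lambda_2 \Big\{ \frac{N-1}{N^2} \sum_{j=1}^{N} (\mathrm{mid}\, \mathcal{X}_\alpha(U_j))^2 - \frac{2}{N^2} \sum_{j=1}^{N-1} \sum_{l=j+1}^{N} \mathrm{mid}\, \mathcal{X}_\alpha(U_j)\, \mathrm{mid}\, \mathcal{X}_\alpha(U_l) \Big\} \\
+\ & \lambda_3 \Big\{ \frac{N-1}{N^2} \sum_{j=1}^{N} (\inf \mathcal{X}_\alpha(U_j))^2 - \frac{2}{N^2} \sum_{j=1}^{N-1} \sum_{l=j+1}^{N} \inf \mathcal{X}_\alpha(U_j)\, \inf \mathcal{X}_\alpha(U_l) \Big\} \Big]\, d\alpha = \frac{\Delta^2_{\vec{\lambda}}(\mathcal{X})}{n}.
\end{aligned}$$

□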
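The $1/n$ law in Theorem 4.2 can also be verified numerically on a toy population. The sketch below rests on assumptions that are not spelled out in this part of the paper: the squared distance is taken to be the integral over the levels of the $\lambda$-weighted squared differences of the sup, mid and inf of the level sets (as the proof above suggests), the fuzzy values are hypothetical triangular numbers `(left, peak, right)`, and all function names are illustrative. Because the integrand is a quadratic polynomial in the level for triangular numbers, Simpson's rule on $[0,1]$ integrates it exactly.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def summaries(v, a):
    """(sup, mid, inf) of the a-level of a triangular fuzzy number (left, peak, right)."""
    left, peak, right = v
    lo, hi = left + a * (peak - left), right - a * (right - peak)
    return (hi, (lo + hi) / 2, lo)


def sq_distance(v, w, lam):
    """Assumed squared distance: integral over [0, 1] of the lam-weighted squared gaps."""
    def integrand(a):
        return sum(l * (x - y) ** 2
                   for l, x, y in zip(lam, summaries(v, a), summaries(w, a)))
    # Simpson's rule, exact here because the integrand is quadratic in a.
    return (integrand(Fraction(0)) + 4 * integrand(HALF) + integrand(Fraction(1))) / 6


def fuzzy_mean(values):
    """Componentwise mean, valid for triangular fuzzy numbers."""
    k = len(values)
    return tuple(sum(Fraction(v[i]) for v in values) / k for i in range(3))


def population_dispersion(population, lam):
    """Mean squared distance of the population values about the population mean."""
    centre = fuzzy_mean(population)
    return sum(sq_distance(u, centre, lam) for u in population) / len(population)


def msd_of_sample_mean(population, n, lam):
    """Mean squared distance of the sample mean about the population mean,
    averaged over all N^n equally likely ordered draws with replacement."""
    N = len(population)
    centre = fuzzy_mean(population)
    weight = Fraction(1, N ** n)
    return sum(weight * sq_distance(fuzzy_mean([population[j] for j in draw]), centre, lam)
               for draw in product(range(N), repeat=n))


population = [(10, 15, 25), (0, 5, 10), (40, 50, 60)]    # hypothetical data
lam = (Fraction(1, 3), Fraction(1, 3), Fraction(1, 3))   # hypothetical weights

print(msd_of_sample_mean(population, 2, lam) == population_dispersion(population, lam) / 2)
```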

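As an immediate consequence of Theorems 3.2 and 4.2 we obtain the unsurprising conclusion that simple random sampling is more precise than random sampling with replacement in estimating $\overline{\mathcal{X}}$. The following result concerns the estimation of $\Delta^2_{\vec{\lambda}}(\overline{\mathcal{X}}_n)$ in random sampling with replacement:

Theorem 4.3 In random sampling with replacement of size $n$ from the population $\Omega$ of $N$ sampling units, the (real-valued) estimator

$$\int_{[0,1]} \big[ \lambda_1 s_n^2(\sup \mathcal{X}_\alpha) + \lambda_2 s_n^2(\mathrm{mid}\, \mathcal{X}_\alpha) + \lambda_3 s_n^2(\inf \mathcal{X}_\alpha) \big]\, d\alpha$$

is unbiased to estimate $\Delta^2_{\vec{\lambda}}(\mathcal{X})$.

Proof. Indeed, in random sampling with replacement of size $n$ we have that $s_n^2(\sup \mathcal{X}_\alpha)$, $s_n^2(\mathrm{mid}\, \mathcal{X}_\alpha)$ and $s_n^2(\inf \mathcal{X}_\alpha)$ are unbiased to estimate $\mathrm{Var}(\sup \mathcal{X}_\alpha)$, $\mathrm{Var}(\mathrm{mid}\, \mathcal{X}_\alpha)$ and $\mathrm{Var}(\inf \mathcal{X}_\alpha)$, respectively, whence the result follows.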
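A sample-based estimate of the population dispersion, in the spirit of Theorem 4.3, might be computed as in the following sketch. It is not the authors' implementation, and it assumes things this part of the paper does not state explicitly: $s_n^2$ is read as the sample quasi-variance (divisor $n - 1$), the dispersion is read as the integral over the levels of the $\lambda$-weighted variances of the sup, mid and inf of the level sets, and the data are hypothetical triangular fuzzy numbers `(left, peak, right)`. The final check averages the estimator over all ordered draws with replacement and compares it with the population value.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)


def summaries(v, a):
    """(sup, mid, inf) of the a-level of a triangular fuzzy number (left, peak, right)."""
    left, peak, right = v
    lo, hi = left + a * (peak - left), right - a * (right - peak)
    return (hi, (lo + hi) / 2, lo)


def weighted_variance(values, lam, ddof):
    """Integral over [0, 1] of sum_k lam[k] * variance of the k-th level summary.

    ddof=0 gives the population version, ddof=1 the quasi-variance version.
    """
    n = len(values)

    def integrand(a):
        columns = zip(*(summaries(v, a) for v in values))   # sup, mid, inf columns
        total = Fraction(0)
        for lam_k, col in zip(lam, columns):
            mean = sum(col) / n
            total += lam_k * sum((c - mean) ** 2 for c in col) / (n - ddof)
        return total

    # Simpson's rule, exact here because the integrand is quadratic in a.
    return (integrand(Fraction(0)) + 4 * integrand(HALF) + integrand(Fraction(1))) / 6


def expected_estimate(population, n, lam):
    """Average of the sample estimator over all N^n ordered draws with replacement."""
    N = len(population)
    weight = Fraction(1, N ** n)
    return sum(weight * weighted_variance([population[j] for j in draw], lam, ddof=1)
               for draw in product(range(N), repeat=n))


population = [(10, 15, 25), (0, 5, 10), (40, 50, 60)]    # hypothetical data
lam = (Fraction(1, 2), Fraction(0), Fraction(1, 2))      # hypothetical weights

print(expected_estimate(population, 2, lam) == weighted_variance(population, lam, ddof=0))
```

Dividing the sample estimate by $n$ would then give an estimate of the dispersion of the sample mean itself.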

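Corollary 4.4 In random sampling with replacement of size $n$ from the population $\Omega$ of $N$ sampling units, the (real-valued) estimator

$$\frac{1}{n} \int_{[0,1]} \big[ \lambda_1 s_n^2(\sup \mathcal{X}_\alpha) + \lambda_2 s_n^2(\mathrm{mid}\, \mathcal{X}_\alpha) + \lambda_3 s_n^2(\inf \mathcal{X}_\alpha) \big]\, d\alpha$$

is unbiased to estimate $\Delta^2_{\vec{\lambda}}(\overline{\mathcal{X}}_n)$.

The problem of choosing an appropriate sample size can now be solved as follows:

Theorem 4.5 In random sampling with replacement from the population $\Omega$ of $N$ sampling units, the sample size

$$n = \frac{\Delta^2_{\vec{\lambda}}(\mathcal{X})}{d^2 \alpha}$$

satisfies that $\Pr(D_{\vec{\lambda}}(\overline{\mathcal{X}}_n, \overline{\mathcal{X}}) > d) \le \alpha$.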
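As a small illustration of how Theorem 4.5 might be used, the sketch below assumes the sample size is the Chebyshev-type bound $\Delta^2_{\vec{\lambda}}(\mathcal{X}) / (d^2 \alpha)$, rounded up to an integer. The function name and the numeric inputs are hypothetical; in practice the dispersion would have to be replaced by an estimate from a preliminary sample.

```python
import math


def sample_size_with_replacement(dispersion: float, d: float, alpha: float) -> int:
    """Smallest integer n with dispersion / (n * d^2) <= alpha.

    dispersion: (an estimate of) the population dispersion, assumed nonnegative
    d:          tolerated distance between sample and population expected values
    alpha:      tolerated probability of exceeding d
    """
    if dispersion < 0:
        raise ValueError("dispersion must be nonnegative")
    if d <= 0:
        raise ValueError("d must be positive")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    # At least one unit is always sampled, even when the bound is below 1.
    return max(1, math.ceil(dispersion / (d * d * alpha)))


print(sample_size_with_replacement(2.0, 0.5, 0.25))   # 2 / (0.25 * 0.25) = 32
```

Since the bound comes from a Chebyshev-type inequality, it can be expected to overstate the sample size actually required.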

5

ILLUSTRATIVE EXAMPLE

The use of the results developed in this paper will now be illustrated by means of an example.

Example. A psychologist wants to make a survey of the preferred age for the inhabitants of a given city of 25,000, to estimate the mean preference. For this purpose, the psychologist selects a preliminary simple random sample of $n_1 = 100$ inhabitants of this city, and asks them for the time of their life they consider has been/is/will be the best (assuming that they are not suffering from any illness during this period). Assume the answers of the people polled are as follows:

15 'when I am/was YOUNG' ($\tilde{x}_1$),
2 'when I am/was VERY YOUNG' ($\tilde{x}_2$),
2 'when I am/was EXTREMELY YOUNG' ($\tilde{x}_3$),
3 'when I am/was FAIRLY YOUNG' ($\tilde{x}_4$),
10 'when I am/was MIDDLE-AGED' ($\tilde{x}_5$),
8 'when I am/was ABOVE MIDDLE-AGED' ($\tilde{x}_6$),
15 'when I am/was BELOW MIDDLE-AGED' ($\tilde{x}_7$),
10 'when I am/was AROUND MIDDLE-AGED' ($\tilde{x}_8$),
4 'when I am/was VERY MIDDLE-AGED' ($\tilde{x}_9$),
7 'when I am/was OLD' ($\tilde{x}_{10}$),
2 'when I am VERY OLD' ($\tilde{x}_{11}$),
1 'when I am EXTREMELY OLD' ($\tilde{x}_{12}$),
15 'when I am/was NOT OLD' ($\tilde{x}_{13}$),
6 'when I am/was FAIRLY OLD' ($\tilde{x}_{14}$).

The preceding values for the variable $\mathcal{X}$ = "time of life" over the population $\Omega$ of the 25,000 inhabitants of the considered city are clearly ill-defined and no exact boundaries are universally established for them. Suppose that to express the answers above, we use for instance the fuzzy regions with supports in [0,100] based on S-curves (see, for instance, Cox 1994), where

[…]

Secondly, it would also be valuable to examine how operational the extension of Chebychev's inequality for fuzzy random variables based on the $d_\infty$ metric becomes in practice, since $d_\infty$ would determine a real (instead of an average) upper bound for the error incurred in the estimation. Actually, the practical inconvenience of this extension will almost certainly lie in estimating $E(d_\infty(\overline{\mathcal{X}}_n, \overline{\mathcal{X}}))$ to apply it in getting appropriate sample sizes.

It should be remarked that there is no deviation in the results in this paper from known classical results in Sampling Theory with finite populations. This coincidence is mainly due to the properties satisfied by the measure of dispersion which is considered throughout the paper, and to the fact pointed out in Section 2 that, under the convexity assumption for variable values, the expected value of a fuzzy random variable on a finite population has an expression similar to that of a real-valued variable, except for the extended operations.

Conclusions in this paper for $\mathcal{F}_c(\mathbb{R})$-valued random elements are applicable to a particular case of the variance in Körner (1997) and Näther (1997), which was introduced for $\mathcal{F}_c(\mathbb{R}^p)$-valued random elements and uses different metrics $d_{p,2}$, $p \in [1, +\infty)$ (in fact, for the particular case in which $k = 1$ and $p = 2$, we obtain that $D_{(.5,.0,.5)}(V, W) = d_{2,2}(V, W)$). However, for $p \in [1, +\infty) \setminus \{2\}$ conclusions would differ from those in this paper and, in most cases, computations become much more complex or even unfeasible. In the same way, and for $\mathcal{F}_c(\mathbb{R})$-valued random elements, the results in this paper can be extended by using the generalized distance in Bertoluzza et al. (1995), which takes into account all points in the $\alpha$-levels of the involved fuzzy values by considering an immediate bijection.

Finally, an open problem which is closely connected with the questions in this paper is that of defining interval estimates for fuzzy parameters (see Kruse and Meyer 1987, Ralescu 1995a, and Gebhardt et al. 1998, for approaches to this problem).

ACKNOWLEDGEMENTS

The research in this paper was supported in part by the Spanish DGES Grant No. PB95-1049 (Ministerio de Educación y Cultura) and FICYT Grant No. 37/PB-TIC97-02 (Consejería de Cultura del Principado de Asturias). Their financial support is gratefully acknowledged. The authors are very thankful to their colleagues Professors Norberto Corral and María Teresa López for their strong help in solving the computational aspects in this paper, and to Dr. Miguel López-Díaz, Professor Wolfgang Näther, Dr. Ralph Körner and the referees of this paper for their valuable comments.

References

[1] Aumann, R.J. (1965). Integrals of set-valued functions. J. Math. Anal. Appl., 12, 1-12.
[2] Bandemer, H. and Gottwald, S. (1995). Fuzzy Sets, Fuzzy Logic, Fuzzy Methods with Applications. John Wiley & Sons, Chichester.
[3] Barnett, V. (1991). Sample Survey. Principles and Methods. Edward Arnold, London.
[4] Bertoluzza, C., Corral, N. and Salas, A. (1995). On a new class of distances between fuzzy numbers. Mathware & Soft Computing, 2, 71-84.
[5] Byrne, C. (1978). Remarks on the set-valued integrals of Debreu and Aumann. J. Math. Anal. Appl., 62, 243-246.
[6] Cochran, W.G. (1977). Sampling Techniques. John Wiley & Sons, New York.
[7] Cox, E. (1994). The Fuzzy Systems Handbook. Academic Press, Cambridge.
[8] Debreu, G. (1967). Integration of correspondences. Proc. Fifth Berkeley Symp. Math. Statist. Prob., 1965/66, 2, Part 1. Univ. of California Press, Berkeley, 351-372.
[9] Gebhardt, J., Gil, M.A. and Kruse, R. (1998). Fuzzy Set-Theoretic Methods in Statistics. In Fuzzy Sets in Decision Analysis, Operations Research and Statistics, Ch. 10. The Handbook of Fuzzy Sets Series (Didier Dubois and Henri Prade, Series Editors), Roman Slowinski, Ed., Kluwer Academic Publishers, New York, 311-347.
[10] Gil, M.A. and López-Díaz, M. (1996). Fundamentals and Bayesian Analyses of decision problems with fuzzy-valued utilities. Int. J. Approx. Reason., 15, 203-224.
[11] Gil, M.A., López-Díaz, M. and López-García, H. (1998a). The fuzzy hyperbolic inequality index associated with fuzzy random variables. European J. Oper. Res., 110, 377-391.
[12] Gil, M.A., López-Díaz, M. and Rodríguez-Muñiz, L.J. (1998b). An improvement of a comparison of experiments in statistical decision problems with fuzzy utilities. IEEE Trans. Syst. Man, Cyb. (Accepted, in press).
[13] Körner, R. (1997). On the variance of fuzzy random variables. Fuzzy Sets and Systems, 92, 83-93.
[14] Klement, E.P., Puri, M.L. and Ralescu, D.A. (1986). Limit theorems for fuzzy random variables. Proc. R. Soc. Lond., A 407, 171-182.
[15] Kruse, R. and Meyer, K.D. (1987). Statistics with Vague Data. Reidel Publ. Co., Dordrecht.
[16] Lubiano, M.A., Gil, M.A., López-Díaz, M. and López, M.T. (1999). The $\vec{\lambda}$-mean squared dispersion associated with a fuzzy random variable. Fuzzy Sets and Systems. (Accepted, in press).
[17] Näther, W. (1997). Linear statistical inference for random fuzzy data. Statistics, 29, 221-240.
[18] Negoita, C. and Ralescu, D.A. (1987). Simulation, Knowledge-based Computing, and Fuzzy Statistics. Van Nostrand Reinhold Co., New York.
[19] Nguyen, H.T. (1978). A note on the extension principle for fuzzy sets. J. Math. Anal. Appl., 64, 369-380.
[20] Puri, M.L. and Ralescu, D. (1981). Différentielle d'une fonction floue. C.R. Acad. Sci. Paris, Sér. I, 293, 237-239.
[21] Puri, M.L. and Ralescu, D.A. (1985). The concept of normality for fuzzy random variables. Ann. Probab., 13, 1373-1379.
[22] Puri, M.L. and Ralescu, D.A. (1986). Fuzzy random variables. J. Math. Anal. Appl., 114, 409-422.
[23] Ralescu, A. and Ralescu, D.A. (1984). Probability and fuzziness. Inf. Sci., 17, 85-92.
[24] Ralescu, A. and Ralescu, D.A. (1986). Fuzzy sets in statistical inference. The Mathematics of Fuzzy Systems (A. Di Nola and A.G.S. Ventre, Eds.). Verlag TÜV Rheinland, Köln, 273-283.
[25] Ralescu, D.A. (1982). Fuzzy logic and statistical estimation. Proc. 2nd World Conference on Mathematics at the Service of Man, 605-606, Las Palmas-Canarias.
[26] Ralescu, D.A. (1995a). Fuzzy probabilities and their applications to statistical inference. Advances in Intelligent Computing - IPMU'94, Lecture Notes in Computer Science, 945, 217-222.
[27] Ralescu, D.A. (1995b). Inequalities for fuzzy random variables. Proc. 26th Iranian Mathematical Conference, 333-335, Kerman.
[28] Ralescu, D.A. (1995c). Fuzzy random variables revisited. Proc. IFES'95 and Fuzzy IEEE Joint Conference, Vol. 2, 993-1000, Yokohama.
[29] Zadeh, L.A. (1975). The concept of a linguistic variable and its application to approximate reasoning. Parts 1, 2, and 3. Inf. Sci., 8, 199-249; 8, 301-357; 9, 43-80.

M. Asunción Lubiano
M. Ángeles Gil
Dpto. de Estadística, I.O. y D.M.
Universidad de Oviedo
E-33071 Oviedo, Spain