Soft Comput (2015) 19:883–890 DOI 10.1007/s00500-014-1415-5
FOCUS
The statistical inferences of fuzzy regression based on bootstrap techniques Woo-Joo Lee · Hye Young Jung · Jin Hee Yoon · Seung Hoe Choi
Published online: 13 August 2014 © Springer-Verlag Berlin Heidelberg 2014
Abstract In this paper, we estimate the parameters of fuzzy regression models and investigate statistical inference with crisp inputs and fuzzy outputs for each α-cut. The proposed approaches to statistical inference are the fuzzy least squares (FLS) method and the bootstrap technique. FLS is constructed by minimizing the sum of squares of the total difference between observed and estimated outputs. Numerical examples illustrate how to perform the hypothesis tests and how to obtain the percentile confidence regions with the proposed approach.
Keywords Fuzzy regression · Fuzzy least squares method · Bootstrap method
Communicated by J.-W. Jung.
W.-J. Lee · H. Y. Jung: Department of Mathematics, Yonsei University, Seoul, Republic of Korea
J. H. Yoon: School of Mathematics and Statistics, Sejong University, Seoul, Republic of Korea
S. H. Choi (corresponding author): School of Liberal Arts and Science, Korea Aerospace University, Goyang, Republic of Korea
1 Introduction
The linear regression model is the most frequently used technique in statistical models for explaining the relationship
between one or more independent variables and a response variable. Based on this relationship, the response variable can be predicted from the independent variables. In traditional statistics, the theory of parameter estimation in linear regression models is almost completely established. In such studies, the model elements involved, i.e., variables, parameters and error terms, should be precise, and analysis can be conducted only with precise data. However, because data are vague in many practical situations, the traditional least squares method cannot always be applied. Fuzzy linear regression, introduced by Tanaka et al. (1982), considered LR-type fuzzy numbers and minimized the fuzziness of the fuzzy linear regression model. The problems of statistical inference with fuzzy data, i.e., estimation, hypothesis testing, prediction and analysis of variance, have also been developed by various authors: Arnold (1995), Casals et al. (1986), Grzegorzewski (2000), Kim and Seo (2012), and Wu (2005, 2007). In particular, to test statistical hypotheses about fuzzy random variables, bootstrap techniques were applied by Montenegro et al. (2004). Recently, bootstrap techniques have been applied to regression models (Akbari et al. 2012; Lin et al. 2012; Maria et al. 2013). In this paper, we fit a linear regression model with fuzzy data by applying bootstrap techniques, since the bootstrap procedure is known as a fast convergent method. Bootstrapping is a general approach to statistical inference based on building a sampling distribution for a statistic by resampling from the data at hand. It does not assume any underlying population distribution; other commonly used resampling methods are the permutation method and the jackknife method. We deal with estimation and hypothesis testing for the parameters of the fuzzy regression model. The structure of the paper is as follows: In Sect. 2, some basic concepts of fuzzy theory are briefly reviewed. In Sect. 3, the fuzzy analogs of the normal equations of the least squares method are derived, the fuzzy least squares estimators are obtained, and bootstrap methods are applied to statistical inference. In Sect. 4, numerical examples are presented to illustrate the regression model with fuzzy data using bootstrap techniques. Finally, Sect. 5 summarizes the main results and draws conclusions.
2 Preliminaries
In this section, we introduce some definitions regarding fuzzy sets and fuzzy numbers, as well as some basic concepts of fuzzy theory.

Definition 1 A fuzzy subset $\tilde A$ with membership function $\xi_{\tilde A} : \mathbb{R}^n \to [0, 1]$ is called a fuzzy number if (i) $\tilde A$ is normal, (ii) $\tilde A$ is convex, (iii) $\xi_{\tilde A}$ is semicontinuous, (iv) $\mathrm{supp}(\tilde A) := \{x \in \mathbb{R}^n : \xi_{\tilde A}(x) > 0\}$ is compact.

Definition 2 The α-cut of a fuzzy number $\tilde A$ is a non-fuzzy set defined as $A_\alpha := \{x \in \mathbb{R}^n \mid \xi_{\tilde A}(x) \ge \alpha\}$, $0 \le \alpha \le 1$.

The set of all fuzzy numbers will be denoted by $F_c(\mathbb{R})$. The α-cut $A_\alpha$ of $\tilde A \in F_c(\mathbb{R})$ is a closed and bounded interval for each α ∈ [0, 1]. Hence a fuzzy number $\tilde A \in F_c(\mathbb{R})$ is completely determined by the end points of the intervals $A_\alpha = [A^L_\alpha, A^U_\alpha]$. The arithmetic operations for two fuzzy numbers $A_\alpha$ and $B_\alpha$ are defined in the standard way, in terms of the α-cuts for α ∈ [0, 1].

Addition: $A_\alpha + B_\alpha = [A^L_\alpha + B^L_\alpha,\ A^U_\alpha + B^U_\alpha]$.
Scalar multiplication: for given $k \in \mathbb{R}$, $k \cdot A_\alpha = [kA^L_\alpha, kA^U_\alpha]$ if $k \ge 0$, and $k \cdot A_\alpha = [kA^U_\alpha, kA^L_\alpha]$ if $k < 0$.
Scalar addition: $k + A_\alpha = [k + A^L_\alpha,\ k + A^U_\alpha]$.

Using the generalized Hukuhara difference (Stefanini 2010), the subtraction of intervals is defined as follows:
$$[A^L_\alpha, A^U_\alpha] \ominus_g [B^L_\alpha, B^U_\alpha] = [C^-_\alpha, C^+_\alpha],$$
where $C^-_\alpha = \min\{A^L_\alpha - B^L_\alpha,\ A^U_\alpha - B^U_\alpha\}$ and $C^+_\alpha = \max\{A^L_\alpha - B^L_\alpha,\ A^U_\alpha - B^U_\alpha\}$.

An alternative representation of a closed interval $A_\alpha = [A^L_\alpha, A^U_\alpha]$ uses the midpoint and the width, i.e.,
$$A^C_\alpha = \tfrac{1}{2}(A^L_\alpha + A^U_\alpha) \quad \text{and} \quad A^W_\alpha = \tfrac{1}{2}(A^U_\alpha - A^L_\alpha).$$
Therefore, the interval $A_\alpha$ can be equivalently expressed as the vector $A_\alpha = (A^C_\alpha, A^W_\alpha)$, $A^W_\alpha \ge 0$. In this paper, the distance between the α-cuts of two fuzzy numbers $\tilde A$ and $\tilde B$ is the Euclidean distance
$$d(A_\alpha, B_\alpha) = \sqrt{(A^C_\alpha - B^C_\alpha)^2 + (A^W_\alpha - B^W_\alpha)^2}. \qquad (1)$$
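The interval operations above can be illustrated with a short sketch. It represents an α-cut as a plain `(lower, upper)` tuple; the function names `add`, `scale`, `gh_diff`, `mid_width` and `dist` are hypothetical and chosen here only for illustration.

```python
import math

def add(a, b):
    """Interval addition: [aL + bL, aU + bU]."""
    return (a[0] + b[0], a[1] + b[1])

def scale(k, a):
    """Scalar multiplication; the endpoints swap when k < 0."""
    return (k * a[0], k * a[1]) if k >= 0 else (k * a[1], k * a[0])

def gh_diff(a, b):
    """Generalized Hukuhara difference of two intervals."""
    lo, hi = a[0] - b[0], a[1] - b[1]
    return (min(lo, hi), max(lo, hi))

def mid_width(a):
    """Midpoint/width representation (A^C, A^W) of an interval."""
    return ((a[0] + a[1]) / 2.0, (a[1] - a[0]) / 2.0)

def dist(a, b):
    """Euclidean distance (1) between the (midpoint, width) vectors."""
    (ac, aw), (bc, bw) = mid_width(a), mid_width(b)
    return math.hypot(ac - bc, aw - bw)

A, B = (1.0, 3.0), (0.5, 4.5)
print(add(A, B), scale(-2, A), gh_diff(A, B), dist(A, B))
```

A useful consequence of the midpoint/width form is the identity $d^2(A_\alpha, B_\alpha) = \tfrac{1}{2}[(A^L_\alpha - B^L_\alpha)^2 + (A^U_\alpha - B^U_\alpha)^2]$, so a sum of squared distances separates into one sum over lower endpoints and one over upper endpoints.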
The notion of a fuzzy random variable was introduced by various authors: Kwakernaak (1978, 1979), Kruse and Meyer (1987), Puri and Ralescu (1986), and Hu et al. (2002). Let $(\Omega, \mathcal{F}, P)$ be a probability space. A mapping $X : \Omega \to F_c(\mathbb{R})$ is said to be an $F_c(\mathbb{R})$-valued fuzzy random variable in the sense of Puri and Ralescu (1986) if the α-level mappings $X_\alpha$ are random sets, defined such that
$$X_\alpha(\omega) = (X(\omega))_\alpha = [\inf (X(\omega))_\alpha,\ \sup (X(\omega))_\alpha]$$
for all ω ∈ Ω and α ∈ [0, 1]. If $\sup_{x \in X_0} |x| \in L^1$, the Aumann-type expected value (Puri and Ralescu 1986) is the unique fuzzy set $E[X] \in F_c(\mathbb{R})$ satisfying, for all α ∈ [0, 1],
$$(E[X])_\alpha = \text{Aumann's integral of } X_\alpha = [E(\inf X^L_\alpha),\ E(\sup X^U_\alpha)].$$
Note that in the case of one-dimensional fuzzy random variables, the expectation described above is the same as the expectation of fuzzy random variables proposed by Kruse and Meyer (1987) and Feng et al. (2001). In addition, there are different approaches to defining the variance of fuzzy random variables (Kruse and Meyer 1987; Feng et al. 2001; Körner 1997, etc.). In this paper, the variance of the fuzzy random variable $X_\alpha$ is defined in Fréchet's sense as the real value
$$\sigma^2_{X_\alpha} = E\left[d^2(X_\alpha, E X_\alpha)\right].$$
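For a finite sample of α-cut intervals, the expectation and the Fréchet-type variance above have natural empirical counterparts. The following is a minimal sketch under the assumption that the sample mean replaces the expectation and that the variance divides by n; the function names are hypothetical.

```python
def interval_mean(sample):
    """Endpoint-wise sample mean of a list of (lower, upper) intervals."""
    n = len(sample)
    return (sum(a[0] for a in sample) / n, sum(a[1] for a in sample) / n)

def d2(a, b):
    """Squared distance (1) in (midpoint, width) coordinates."""
    c = (a[0] + a[1]) / 2.0 - (b[0] + b[1]) / 2.0
    w = (a[1] - a[0]) / 2.0 - (b[1] - b[0]) / 2.0
    return c * c + w * w

def frechet_variance(sample):
    """Average squared distance of the intervals to their mean (divisor n)."""
    m = interval_mean(sample)
    return sum(d2(a, m) for a in sample) / len(sample)

sample = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(interval_mean(sample), frechet_variance(sample))
```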
3 Statistical inferences for fuzzy regression models
The fuzzy regression model can be categorized depending on the types of variable and parameter. (a) Fuzzy output, crisp input, and fuzzy parameters. (b) Fuzzy output, fuzzy input, and crisp parameters. (c) Fuzzy output, fuzzy input, and fuzzy parameters. In this paper, we use the fuzzy regression model of the first category, based on the fuzzy least squares (FLS) method.

3.1 Fuzzy regression model
In this paper, we restrict the model to the following simple case without loss of generality:
$$\tilde y_i = \tilde\beta_0 + \tilde\beta_1 x_i + \tilde\epsilon_i, \quad i = 1, \dots, n \ \text{and} \ \alpha \in [0, 1],$$
where $\tilde y_i$ is the fuzzy output variable, $x_i$ is a non-fuzzy (crisp) input, $\tilde\beta_0$ is the fuzzy intercept coefficient, $\tilde\beta_1$ is the fuzzy slope coefficient and $\tilde\epsilon_i$ is an error without any distributional assumption. Now let us consider the following α-cut linear model:
$$y_{i\alpha} = \beta_{0\alpha} + \beta_{1\alpha} x_i + \epsilon_{i\alpha}. \qquad (2)$$
Since the regression coefficients of model (2) are α-cuts of fuzzy numbers, the estimated dependent variable $\hat y_{i\alpha}$ is also an α-cut of a fuzzy number. The α-cuts of the fuzzy coefficients $\beta_{j\alpha}$ (j = 0, 1) are obtained by minimizing the error, measured by (1), between the actual observation $y_{i\alpha}$ and the estimated fuzzy output $\hat y_{i\alpha}$:
$$Q(\beta^L_{j\alpha}, \beta^U_{j\alpha}) = \sum_{i=1}^{n} d^2(y_{i\alpha}, \hat y_{i\alpha}).$$
The estimators for $\beta_{0\alpha}$ and $\beta_{1\alpha}$ are obtained by solving the following equations:
$$\frac{\partial Q}{\partial \beta^L_{j\alpha}} = 0, \qquad \frac{\partial Q}{\partial \beta^U_{j\alpha}} = 0.$$
Then the α-cut estimators of the unknown parameters $\beta_{0\alpha}$ and $\beta_{1\alpha}$ are
$$\hat\beta^L_{0\alpha} = \bar y^L_\alpha - \hat\beta^L_{1\alpha}\bar x, \qquad \hat\beta^L_{1\alpha} = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y^L_{i\alpha} - \bar y^L_\alpha)}{\sum_{i=1}^{n}(x_i - \bar x)^2}$$
and
$$\hat\beta^U_{0\alpha} = \bar y^U_\alpha - \hat\beta^U_{1\alpha}\bar x, \qquad \hat\beta^U_{1\alpha} = \frac{\sum_{i=1}^{n}(x_i - \bar x)(y^U_{i\alpha} - \bar y^U_\alpha)}{\sum_{i=1}^{n}(x_i - \bar x)^2}.$$
We denote the closed interval $\hat\beta_{j\alpha}$ for j = 0, 1 by
$$\hat\beta_{j\alpha} = [\hat\beta^-_{j\alpha}, \hat\beta^+_{j\alpha}], \qquad (3)$$
where $\hat\beta^-_{j\alpha} = \min\{\hat\beta^L_{j\alpha}, \hat\beta^U_{j\alpha}\}$ and $\hat\beta^+_{j\alpha} = \max\{\hat\beta^L_{j\alpha}, \hat\beta^U_{j\alpha}\}$.

An attractive feature of the α-cut approach is that all α-cuts form a nested structure with respect to α (Kaufmann 1975). Given $\alpha_1, \alpha_2 \in (0, 1]$ with $\alpha_1 > \alpha_2$, the feasible regions defined by $\alpha_1$ are smaller than those defined by $\alpha_2$. As a result, we have $[\hat\beta^-_{j\alpha_1}, \hat\beta^+_{j\alpha_1}] \subseteq [\hat\beta^-_{j\alpha_2}, \hat\beta^+_{j\alpha_2}]$, j = 0, 1, which implies that the membership functions are convex. Thus $\hat\beta^-_{1\alpha}$ and $\hat\beta^-_{0\alpha}$ increase with α, and $\hat\beta^+_{1\alpha}$ and $\hat\beta^+_{0\alpha}$ decrease with α. According to the "Resolution Identity" introduced by Zadeh (1975a,b,c), the fuzzy estimator $\hat{\tilde\beta}_j$ for the regression parameter $\tilde\beta_j$ can be represented by the family of closed intervals
$$\hat{\tilde\beta}_j = \bigcup_{\alpha \in [0,1]} \alpha\,[\hat\beta^-_{j\alpha}, \hat\beta^+_{j\alpha}] \quad \text{for } j = 0, 1.$$
Since the closed interval $\hat\beta_{j\alpha}$ is not an empty set and the membership function of $\hat{\tilde\beta}_j$ is upper semicontinuous, the fuzzy estimator $\hat{\tilde\beta}_j$ is also a fuzzy number.

3.2 Bootstrap fuzzy regression analysis
In this section, we introduce bootstrap procedures. In general, bootstrap methods for regression are divided into two approaches: the first is based on resampling the observations and the second is based on resampling the errors. The bootstrap based on resampling errors is known to be more suitable for the case of deterministic x, whereas the bootstrap based on drawing an i.i.d. sample from the observation pairs is more appropriate for the case of random x. Bootstrapping pairs can, however, also be used for deterministic x (Efron 1982). The bootstrap is a "model-dependent" method in terms of its implementation and performance, although it requires no theoretical formula for the quantity to be estimated and is less model-dependent than the traditional approach. In this paper, we use the bootstrap method based on resampling errors. The bootstrap procedure is as follows:

Step 1 Estimate the regression coefficients of model (2), and calculate the residuals as
$$\hat\epsilon_{i\alpha} = [\hat\epsilon^-_{i\alpha}, \hat\epsilon^+_{i\alpha}], \quad i = 1, \dots, n,$$
where $\hat\epsilon^-_{i\alpha} = \min\{y^L_{i\alpha} - \hat y^-_{i\alpha},\ y^U_{i\alpha} - \hat y^+_{i\alpha}\}$ and $\hat\epsilon^+_{i\alpha} = \max\{y^L_{i\alpha} - \hat y^-_{i\alpha},\ y^U_{i\alpha} - \hat y^+_{i\alpha}\}$.

Step 2 Calculate the centered residual of $\hat\epsilon_{i\alpha}$:
$$e_{i\alpha} = [e^-_{i\alpha}, e^+_{i\alpha}], \qquad (4)$$
where $e^-_{i\alpha} = \min\{\hat\epsilon^-_{i\alpha} - \bar{\hat\epsilon}^-_{\alpha},\ \hat\epsilon^+_{i\alpha} - \bar{\hat\epsilon}^+_{\alpha}\}$ and $e^+_{i\alpha} = \max\{\hat\epsilon^-_{i\alpha} - \bar{\hat\epsilon}^-_{\alpha},\ \hat\epsilon^+_{i\alpha} - \bar{\hat\epsilon}^+_{\alpha}\}$, with the bar denoting the sample mean over i.

Step 3 Draw a bootstrap random sample of size n with replacement, $e^{(b)}_{1\alpha}, \dots, e^{(b)}_{n\alpha}$, from the $e_{i\alpha}$ values calculated in Step 2, giving probability 1/n to each $e_{i\alpha}$. Compute the bootstrap $y^{(b)}_{i\alpha}$ values by adding the resampled residuals onto the least squares regression fit, holding the regression design fixed:
$$y^{(b)}_{i\alpha} = \hat y_{i\alpha} + e^{(b)}_{i\alpha}.$$

Step 4 Obtain the least squares estimates from the bootstrap sample:
$$\hat\beta^{(b)}_{j\alpha} = [\hat\beta^{(b)-}_{j\alpha}, \hat\beta^{(b)+}_{j\alpha}], \qquad (5)$$
where $\hat\beta^{(b)-}_{j\alpha} = \min\{\hat\beta^{(b)L}_{j\alpha}, \hat\beta^{(b)U}_{j\alpha}\}$ and $\hat\beta^{(b)+}_{j\alpha} = \max\{\hat\beta^{(b)L}_{j\alpha}, \hat\beta^{(b)U}_{j\alpha}\}$.

Step 5 Repeat Step 3 and Step 4 for b = 1, 2, ..., B.

Fig. 1 Bootstrap procedures. a Vector representation of the bootstrap estimates, b standardized bootstrap estimates, c empirical distribution

Table 1 The triangular crisp input and fuzzy output

By resampling residuals and randomly reattaching them to fitted values, the procedure implicitly assumes that the errors are identically distributed. Bootstrapping draws an analogy between the fitted value $\hat y_\alpha$ in the sample and $y_\alpha$ in the population, and between the residual $e_\alpha$ in the sample and the error $\epsilon_\alpha$ in the population (Efron 1982). In the bootstrap principle, the population is to the sample as the sample is to the bootstrap samples. According to the weak law of large numbers, the empirical distribution function converges in probability to the true distribution function. Note that we define the bootstrap observation $y^{(b)}_{i\alpha}$ by treating $\hat\beta_\alpha$ as the "true" parameter and the residuals $e_{i\alpha}$, from which $e^{(b)}_{i\alpha}$ is drawn, as the "population" of errors (Wu 1986). It is easy to check (see appendix) that
$$E\left(\hat\beta^{(b)}_{j\alpha} \ominus_g \hat\beta_{j\alpha}\right) = \{0\}. \qquad (6)$$
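Steps 1 to 5 of the residual bootstrap can be sketched for a single α-cut as follows. This is an illustrative reading of the procedure, not the authors' implementation: the helper names `ols` and `residual_bootstrap`, the use of NumPy, the seed handling and the synthetic demo data are all assumptions made here.

```python
import numpy as np

def ols(x, y):
    """Ordinary least squares fit y ~ b0 + b1 * x; returns (b0, b1)."""
    xc = x - x.mean()
    b1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
    return y.mean() - b1 * x.mean(), b1

def residual_bootstrap(x, y_lo, y_hi, B=1000, seed=0):
    """Steps 1-5 for one alpha-cut. Returns (beta1_hat, boot), where
    boot has shape (B, 2) with rows [beta1^(b)-, beta1^(b)+]."""
    rng = np.random.default_rng(seed)
    x, y_lo, y_hi = (np.asarray(v, dtype=float) for v in (x, y_lo, y_hi))
    n = len(x)
    # Step 1: fit both endpoints and form the interval residuals.
    (b0L, b1L), (b0U, b1U) = ols(x, y_lo), ols(x, y_hi)
    fit_L, fit_U = b0L + b1L * x, b0U + b1U * x
    fit_lo, fit_hi = np.minimum(fit_L, fit_U), np.maximum(fit_L, fit_U)
    r_lo, r_hi = y_lo - fit_lo, y_hi - fit_hi
    eps_lo, eps_hi = np.minimum(r_lo, r_hi), np.maximum(r_lo, r_hi)
    # Step 2: center the residual endpoints.
    c_lo, c_hi = eps_lo - eps_lo.mean(), eps_hi - eps_hi.mean()
    e_lo, e_hi = np.minimum(c_lo, c_hi), np.maximum(c_lo, c_hi)
    boot = np.empty((B, 2))
    for b in range(B):                    # Step 5: repeat B times
        idx = rng.integers(0, n, size=n)  # Step 3: resample with replacement
        yb_lo, yb_hi = fit_lo + e_lo[idx], fit_hi + e_hi[idx]
        s_lo, s_hi = ols(x, yb_lo)[1], ols(x, yb_hi)[1]  # Step 4: refit
        boot[b] = min(s_lo, s_hi), max(s_lo, s_hi)
    return (min(b1L, b1U), max(b1L, b1U)), boot

# Synthetic demo data (not from the paper).
x_demo = np.arange(1, 16, dtype=float)
lo_demo = 5.9 + 0.03 * x_demo + 0.1 * np.sin(x_demo)
hi_demo = lo_demo + 0.5 + 0.02 * x_demo
beta1_hat, boot = residual_bootstrap(x_demo, lo_demo, hi_demo, B=200, seed=1)
print(beta1_hat, boot.mean(axis=0))
```

The residual intervals are resampled as whole pairs, so the lower and upper residual of one observation stay together, which matches the interval addition in Step 3. In line with (6), the bootstrap slopes should scatter around the original estimate.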
While the bootstrap is based on the principle of substitution and mimicking sampling behavior, its application is usually carried out with data resampling. In this case, the Monte Carlo simulation method can be used to calculate the bootstrap estimators.

3.3 The hypothesis test for slope coefficient
In this section, we investigate the hypothesis test for the regression parameter $\beta_{1\alpha}$ based on the bootstrap sampling distribution of $\hat\beta^{(b)}_{1\alpha}$. Traditional hypothesis tests for unknown parameters are commonly based on statistics that depend on random samples. The distributions of these statistics are usually determined by a distributional assumption on the random sample. In this regard, the bootstrap method does not need a distributional assumption, because it makes use of the empirical distribution of the bootstrap samples. In the regression model, the hypothesis test for the slope coefficient (parameter) $\beta_{1\alpha}$ amounts to testing whether it equals some constant $\beta_{1,0\alpha}$. The hypotheses for this test are
$$H_0 : \beta_{1\alpha} = \beta_{1,0\alpha} \quad \text{vs.} \quad H_1 : \beta_{1\alpha} \neq \beta_{1,0\alpha}.$$
The test statistic used for this test is given by
$$T_\alpha = d^2\left(\frac{\hat\beta_{1\alpha} \ominus_g \beta_{1,0\alpha}}{s_{\hat\beta_{1\alpha}}},\ \{0\}\right), \qquad (7)$$
where
$$s^2_{\hat\beta_{1\alpha}} = \frac{\sum_{i=1}^{n} d^2(y_{i\alpha}, \hat y_{i\alpha})}{(n - 2)\sum_{i=1}^{n}(x_i - \bar x)^2}.$$
The distance d was defined in (1). To perform the statistical hypothesis test using bootstrap techniques, we propose the following procedure:
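Table 1 data for Example 1

| No. | $x_i$ | $\tilde y_i$ |
|---|---|---|
| 1 | 1 | (6.2, 6.2, 6.8) |
| 2 | 2 | (6.1, 6.4, 6.7) |
| 3 | 3 | (5.7, 5.8, 6.3) |
| 4 | 4 | (6.7, 6.8, 7.0) |
| 5 | 5 | (5.5, 5.9, 6.2) |
| 6 | 6 | (5.9, 6.2, 6.5) |
| 7 | 7 | (6.1, 6.4, 6.7) |
| 8 | 8 | (6.0, 6.6, 6.9) |
| 9 | 9 | (6.0, 6.3, 7.1) |
| 10 | 10 | (6.4, 6.8, 7.0) |
| 11 | 11 | (6.6, 6.9, 7.2) |
| 12 | 12 | (6.7, 7.0, 7.3) |
| 13 | 13 | (6.3, 6.5, 6.9) |
| 14 | 14 | (6.5, 6.8, 7.3) |
| 15 | 15 | (6.1, 6.3, 7.4) |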
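To connect the Table 1 data with the α-cut estimators of Sect. 3.1, the sketch below computes the α-cuts of the triangular outputs and the resulting slope intervals. It assumes that each triple is read as (left end, mode, right end) of a triangular fuzzy number, and it uses `numpy.polyfit` as a stand-in for the closed-form least squares formulas; the function names are hypothetical.

```python
import numpy as np

# Table 1 data: crisp x_i and triangular outputs read as (left, mode, right).
x = np.arange(1, 16, dtype=float)
y = np.array([
    (6.2, 6.2, 6.8), (6.1, 6.4, 6.7), (5.7, 5.8, 6.3), (6.7, 6.8, 7.0),
    (5.5, 5.9, 6.2), (5.9, 6.2, 6.5), (6.1, 6.4, 6.7), (6.0, 6.6, 6.9),
    (6.0, 6.3, 7.1), (6.4, 6.8, 7.0), (6.6, 6.9, 7.2), (6.7, 7.0, 7.3),
    (6.3, 6.5, 6.9), (6.5, 6.8, 7.3), (6.1, 6.3, 7.4),
])

def alpha_cut(tri, alpha):
    """Lower and upper endpoints of the alpha-cut of triangular numbers."""
    left, mode, right = tri[:, 0], tri[:, 1], tri[:, 2]
    return left + alpha * (mode - left), right - alpha * (right - mode)

def slope_interval(x, tri, alpha):
    """Least squares slope fitted separately to each endpoint series."""
    lo, hi = alpha_cut(tri, alpha)
    s_lo = np.polyfit(x, lo, 1)[0]   # polyfit returns [slope, intercept]
    s_hi = np.polyfit(x, hi, 1)[0]
    return min(s_lo, s_hi), max(s_lo, s_hi)

alphas = np.linspace(0.0, 1.0, 11)
cuts = [slope_interval(x, y, a) for a in alphas]
for a, (lo, hi) in zip(alphas, cuts):
    print(f"alpha={a:.1f}  slope in [{lo:.4f}, {hi:.4f}]")
```

The printed intervals shrink as α grows, which is the nested structure described in Sect. 3.1, and at α = 1 the interval collapses to a single slope value.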
Algorithm Repeat Steps 1 to 4 for $\alpha_k = \alpha_{k-1} + \Delta\alpha$, k = 1, 2, ..., until $\alpha_k = 1$, where $\Delta\alpha \to 0$ and the initial value is $\alpha_0 = 0$.
Step 1 Calculate the least squares coefficients from the bootstrap fuzzy sample, and convert $\hat\beta^{(b)}_{1\alpha_k}$ into its vector representation (Fig. 1a).
Step 2 Standardize $\hat\beta^{(b)}_{1\alpha_k}$ using the constant interval and the sample deviation (Fig. 1b), and calculate the distance from {0} (Fig. 1c).
Step 3 Find the empirical distribution of the test statistic based on $T^{(b)}_\alpha$, b = 1, ..., B, obtained from the bootstrap samples.
Step 4 Compute the p value as the proportion of values in $\{T^{(1)}_\alpha, T^{(2)}_\alpha, \dots, T^{(B)}_\alpha\}$ that are greater than or equal to $T_\alpha$. The null hypothesis $H_0$ should be rejected at significance level λ whenever
$$p\text{-value} = \frac{\sum_{b=1}^{B} I\left(T^{(b)}_\alpha \ge T_\alpha\right)}{B} < \lambda, \qquad (8)$$
where I(·) is the indicator function, equal to 1 if the condition in parentheses is true and 0 otherwise.
Now, we introduce the 100(1−λ) % simultaneous confidence region for the coefficient $\beta_{1\alpha}$ derived from the bootstrap empirical sampling distribution of $T^{(b)}_\alpha$. The bootstrap percentile confidence region is based upon the quantiles of the bootstrap empirical distribution. We use the critical point which has its
Table 2 The interval estimate of α-cuts at eleven distinct α-values in EX.1

| α | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat\beta_0^-$ | 5.952 | 5.970 | 5.988 | 6.006 | 6.024 | 6.042 | 6.060 | 6.078 | 6.096 | 6.114 | 6.131 |
| $\hat\beta_0^+$ | 6.418 | 6.389 | 6.361 | 6.332 | 6.303 | 6.275 | 6.246 | 6.217 | 6.189 | 6.160 | 6.131 |
| p value | 0.071 | 0.057 | 0.052 | 0.047 | 0.044 | 0.039 | 0.035 | 0.035 | 0.029 | 0.025 | 0.021 |
| $\hat\beta_1^-$ | 0.0318 | 0.0327 | 0.0336 | 0.0346 | 0.0355 | 0.0364 | 0.0374 | 0.0383 | 0.0392 | 0.0401 | 0.0411 |
| $\hat\beta_1^+$ | 0.0586 | 0.0568 | 0.0551 | 0.0533 | 0.0516 | 0.0498 | 0.0481 | 0.0463 | 0.0446 | 0.0428 | 0.0411 |
| p value | 0.368 | 0.319 | 0.272 | 0.225 | 0.194 | 0.164 | 0.135 | 0.119 | 0.105 | 0.098 | 0.093 |

Table 3 Test statistics and critical point of α-cuts for slope coefficient in EX.1

| α | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $T_\alpha$ | 2.858 | 2.841 | 2.814 | 2.777 | 2.730 | 2.675 | 2.612 | 2.543 | 2.469 | 2.392 | 2.314 |
| CP | 3.713 | 3.555 | 3.385 | 3.229 | 3.070 | 2.939 | 2.771 | 2.633 | 2.501 | 2.382 | 2.268 |
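The decision rule used in Example 1 (reject the null hypothesis when the observed $T_\alpha$ exceeds the critical point CP) can be applied directly to the Table 3 entries. The snippet below is only a reading aid for the table; the variable names are chosen here for illustration.

```python
# Table 3 entries for alpha = 0, 0.1, ..., 1.0.
alphas = [round(0.1 * k, 1) for k in range(11)]
T = [2.858, 2.841, 2.814, 2.777, 2.730, 2.675,
     2.612, 2.543, 2.469, 2.392, 2.314]
CP = [3.713, 3.555, 3.385, 3.229, 3.070, 2.939,
      2.771, 2.633, 2.501, 2.382, 2.268]

# Reject H0 at level lambda = 0.1 wherever the statistic exceeds CP.
rejected = [a for a, t, c in zip(alphas, T, CP) if t > c]
print(rejected)
```

On the tabulated grid, the null hypothesis is rejected only at the two largest α levels, which agrees with the slope p values in Table 2 falling below 0.1 there.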
Table 4 Data for Ex.2

| No. | $x_i$ | $\tilde y_i$ |
|---|---|---|
| 1 | 5 | (4, 11, 19) |
| 2 | 8 | (11, 16, 20) |
| 3 | 11 | (15, 18, 21) |
| 4 | 14 | (21, 24, 26) |
| 5 | 17 | (23, 25, 27) |
| 6 | 19 | (26, 30, 34) |
| 7 | 22 | (27, 31, 39) |
| 8 | 24 | (28, 37, 48) |
cumulative ratio 1 − λ (i.e., $P(T_\alpha < t_\lambda) = 1 - \lambda$) according to the ordered values of $T^{(b)}_\alpha$. Using (1) and (7), the estimated confidence region is
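$$\left(\beta^C_{1\alpha} - \hat\beta^C_{1\alpha}\right)^2 + \left(\beta^W_{1\alpha} - \hat\beta^W_{1\alpha}\right)^2 < t_\lambda^2 \cdot \frac{\sum_{i=1}^{n} d^2(y_{i\alpha}, \hat y_{i\alpha})}{(n - 2)\sum_{i=1}^{n}(x_i - \bar x)^2}. \qquad (9)$$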
Here, $\beta^C_{1\alpha}$ and $\beta^W_{1\alpha}$ are the midpoint and width of $\beta_{1\alpha} = [\beta^-_{1\alpha}, \beta^+_{1\alpha}]$, respectively.

4 Numerical examples
Numerical examples are used to illustrate the fuzzy regression models summarized in the previous sections. These examples focus on the illustration and application of bootstrap techniques in fuzzy regression analysis.
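Before turning to the examples, the quantities of Sect. 3.3 can be sketched in code: the statistic (7), the p value (8), the critical point $t_\lambda$ and the membership check for the region (9). The text does not spell out how each bootstrap statistic $T^{(b)}_\alpha$ is standardized, so the functions below simply take an array of bootstrap statistics as given; all function names are hypothetical and NumPy is an assumed dependency.

```python
import numpy as np

def t_statistic(beta_hat, beta_null, s):
    """T of (7): squared distance to {0} of the scaled gH-difference."""
    lo = beta_hat[0] - beta_null[0]
    hi = beta_hat[1] - beta_null[1]
    d_lo, d_hi = min(lo, hi) / s, max(lo, hi) / s
    center, width = (d_lo + d_hi) / 2.0, (d_hi - d_lo) / 2.0
    return center ** 2 + width ** 2

def bootstrap_p_value(t_obs, t_boot):
    """p value of (8): share of bootstrap statistics >= the observed one."""
    return float(np.mean(np.asarray(t_boot) >= t_obs))

def critical_point(t_boot, lam):
    """Empirical (1 - lam) quantile of the bootstrap statistics."""
    return float(np.quantile(np.asarray(t_boot), 1.0 - lam))

def in_region(beta, beta_hat, t_lam, s):
    """Membership check for region (9) in (midpoint, width) coordinates."""
    c = (beta[0] + beta[1]) / 2.0 - (beta_hat[0] + beta_hat[1]) / 2.0
    w = (beta[1] - beta[0]) / 2.0 - (beta_hat[1] - beta_hat[0]) / 2.0
    return c ** 2 + w ** 2 < (t_lam ** 2) * (s ** 2)

# Toy numbers (not from the paper): slope cut [0.03, 0.05], H0: slope = {0}.
print(t_statistic((0.03, 0.05), (0.0, 0.0), 0.01))
print(bootstrap_p_value(3.0, [1.0, 2.0, 3.0, 4.0]))
```

Here `s` stands for the standard error $s_{\hat\beta_{1\alpha}}$ defined after (7); `in_region` follows (9) literally, including the squared critical point.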
Fig. 2 The plot of the statistical inferences in EX.1. a Plotting, b the behavior of p value, c 90 % confidence regions of slope
Example 1 The data pairs $(x_i, \tilde y_i)$, i = 1, ..., 15, of Table 1 are used to demonstrate the proposed procedure in the case of crisp input x and fuzzy output $\tilde y$. Table 2 provides the p values and the lower and upper bounds of the membership functions of $\hat{\tilde\beta}_1$ and $\hat{\tilde\beta}_0$ for 11 α levels, 0, 0.1, ..., 1.0, obtained by applying the least squares method. Precisely, α = 1.0 shows the regression coefficient that is most likely, and α = 0 shows the range in which the regression coefficient could appear. In this example, although the two regression coefficients $\hat{\tilde\beta}_1$ and $\hat{\tilde\beta}_0$ are fuzzy, their
Table 5 The interval estimate of α-cuts at eleven distinct α-values in EX.2

| α | 0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $\hat\beta_0^-$ | −0.629 | −0.118 | 0.393 | 0.904 | 1.416 | 1.927 | 2.438 | 2.949 | 3.460 | 3.971 | 4.483 |
| $\hat\beta_0^+$ | 7.036 | 6.781 | 6.525 | 6.270 | 6.015 | 5.759 | 5.504 | 5.249 | 4.993 | 4.738 | 4.483 |
| p value | 0.108 | 0.087 | 0.069 | 0.059 | 0.054 | 0.043 | 0.045 | 0.046 | 0.047 | 0.051 | 0.050 |
| $\hat\beta_1^-$ | 1.249 | 1.255 | 1.260 | 1.265 | 1.270 | 1.275 | 1.281 | 1.286 | 1.291 | 1.296 | 1.301 |
| $\hat\beta_1^+$ | 1.567 | 1.540 | 1.514 | 1.487 | 1.461 | 1.434 | 1.408 | 1.381 | 1.355 | 1.328 | 1.301 |
| p value | 0.067 | 0.053 | 0.051 | 0.055 | 0.051 | 0.058 | 0.056 | 0.065 | 0.073 | 0.081 | 0.086 |
most likely values are 0.0411 and 6.131, respectively, and it is impossible for their values to fall outside the ranges [0.0318, 0.0586] and [5.952, 6.418], respectively. Note that in this example, if one reports the value of a regression coefficient as a crisp number, some important information in the fuzzy regression analysis is lost. Table 3 shows the test statistic and the critical point (CP) of the slope coefficient at significance level λ = 0.1 for 1,000 replications, based on the algorithm described in Sect. 3. To check whether the linear relationship between the independent variable and the dependent variable is significant, we consider the following hypotheses:
$H_0 : \beta_{1\alpha} = \{0\}$ vs. $H_1 : \beta_{1\alpha} \neq \{0\}$. For each α-cut, the null hypothesis states that the slope is equal to {0}, and the alternative hypothesis states that the slope is not equal to {0}. To perform the hypothesis test, 1,000 replicate data sets were created by the bootstrap method from the residuals of each observation. If the observed $T_\alpha$ value is larger than CP, or the p value is smaller than the significance level λ, then the null hypothesis is rejected. Figure 2 shows the behavior of the p value over the α-cuts. As shown in Fig. 2, when α is larger than 0.883, we reject the null hypothesis in favor of the alternative at the level of significance λ = 0.1. On the other hand, when α is smaller than 0.883, we do not reject the null hypothesis. This example shows that the statistical significance of the slope can change depending on the vagueness of the data. In the data of Example 1, the statistical significance changes at α = 0.883; moreover, the estimates are no longer fuzzy numbers. The 90 % confidence region defined in (9) is provided in Fig. 2c, where the x-axis is the center $\beta_1^C = (\beta_1^- + \beta_1^+)/2$ of $\beta_1 = [\beta_1^-, \beta_1^+]$, the y-axis is the width $\beta_1^W = (\beta_1^+ - \beta_1^-)/2$, and the z-axis is α. The black line shows the estimates.

Example 2 Table 4 shows the fuzzy observations, which are non-symmetric triangular fuzzy numbers taken from Chang and Lee (1994).
Fig. 3 The plot of the statistical inferences in EX.2. a Plotting, b the behavior of p value (vertical axis: p value for β0 and β1; horizontal axis: α), c 90 % confidence regions of slope
In Table 5, the regression coefficients $\hat{\tilde\beta}_0$ and $\hat{\tilde\beta}_1$ are fuzzy numbers and their most likely values are 4.483 and 1.301, respectively. It is impossible for their values to fall outside the ranges [−0.629, 7.036] and [1.249, 1.567], respectively. Figure 3b shows the behavior of the p value according to the α-cuts for the hypotheses $H_0 : \beta_{j\alpha} = \{0\}$ vs. $H_1 : \beta_{j\alpha} \neq \{0\}$, j = 0, 1. As shown in Fig. 3b, we accept the alternative hypothesis at the level of significance λ = 0.1. That is, the slope is significant for all α, and $\hat{\tilde\beta}_1$ is a fuzzy number based on the "Resolution Identity" of Zadeh. Figure 3c shows the bootstrap percentile confidence regions for the slope coefficient at the 90 % confidence level. The x-axis is the center $\beta_1^C = (\beta_1^- + \beta_1^+)/2$ of $\beta_1 = [\beta_1^-, \beta_1^+]$, the y-axis is the width $\beta_1^W = (\beta_1^+ - \beta_1^-)/2$, and the z-axis is α. The black line shows the estimates of the slope. Because the width ($\beta_1^W$) is greater than or equal to 0, the magenta part of Fig. 3c is the confidence region of the slope coefficient. For these results, 1,000 replications (iterations) were performed using 1,000 bootstrap samples.
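5 Conclusion
In this paper, we have presented several results concerning statistical inference in the presence of fuzzy data using bootstrap techniques. For each α-cut, we estimated the linear regression model and tested hypotheses about it based on the fuzzy least squares estimator. Numerical examples illustrate the hypothesis test and provide the confidence regions in fuzzy regression analysis obtained with the proposed bootstrap algorithm. As shown in the first example, the result of the hypothesis test for the fuzzy regression parameters depends on the degree of membership of the fuzzy data. In conclusion, the bootstrap method is preferable for statistical models with fuzzy data, because it does not rely on the distributional assumptions on the residuals, such as normality, that some theoretical properties of traditional inference require.

Acknowledgments This research was partially supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT and Future Planning (Grant number: NRF-2014R1A1A2002032).

Appendix
To prove Eq. (6), the following four cases apply according to the endpoints of $\hat y_{i\alpha}$ and $\hat y^{(b)}_{i\alpha}$.
1. $\hat y^L_{i\alpha} < \hat y^U_{i\alpha}$ and $\hat y^{(b)L}_{i\alpha} < \hat y^{(b)U}_{i\alpha}$.
2. $\hat y^L_{i\alpha} > \hat y^U_{i\alpha}$ and $\hat y^{(b)L}_{i\alpha} > \hat y^{(b)U}_{i\alpha}$.
3. $\hat y^L_{i\alpha} < \hat y^U_{i\alpha}$ and $\hat y^{(b)L}_{i\alpha} > \hat y^{(b)U}_{i\alpha}$.
4. $\hat y^L_{i\alpha} > \hat y^U_{i\alpha}$ and $\hat y^{(b)L}_{i\alpha} < \hat y^{(b)U}_{i\alpha}$.

In Case 1, under the given conditions, it follows that $\hat y^-_{i\alpha} = \hat y^L_{i\alpha}$ and $\hat y^+_{i\alpha} = \hat y^U_{i\alpha}$, so that $\hat y^{(b)-}_{i\alpha} = \hat y^L_{i\alpha} + e^{(b)-}_{i\alpha}$ and $\hat y^{(b)+}_{i\alpha} = \hat y^U_{i\alpha} + e^{(b)+}_{i\alpha}$.

In Case 2, under the given conditions, it follows that $\hat y^-_{i\alpha} = \hat y^U_{i\alpha}$ and $\hat y^+_{i\alpha} = \hat y^L_{i\alpha}$, so that $\hat y^{(b)-}_{i\alpha} = \hat y^U_{i\alpha} + e^{(b)-}_{i\alpha}$ and $\hat y^{(b)+}_{i\alpha} = \hat y^L_{i\alpha} + e^{(b)+}_{i\alpha}$.

In Cases 1 and 2, taking $\hat y^{(b)}_{i\alpha} \ominus_g \hat y_{i\alpha}$ produces $\hat y^{(b)-}_{i\alpha} - \hat y^-_{i\alpha} = e^{(b)-}_{i\alpha}$ and $\hat y^{(b)+}_{i\alpha} - \hat y^+_{i\alpha} = e^{(b)+}_{i\alpha}$. Therefore
$$E\left(\hat\beta^{(b)}_{1\alpha} \ominus_g \hat\beta_{1\alpha}\right) = E\left[\frac{\sum_{i=1}^{n}(x_i - \bar x)e^{(b)-}_{i\alpha}}{\sum_{i=1}^{n}(x_i - \bar x)^2},\ \frac{\sum_{i=1}^{n}(x_i - \bar x)e^{(b)+}_{i\alpha}}{\sum_{i=1}^{n}(x_i - \bar x)^2}\right]$$
$$= \begin{cases} \dfrac{\sum_{i=1}^{n}(x_i - \bar x)}{\sum_{i=1}^{n}(x_i - \bar x)^2}\, E\left[e^{(b)-}_{i\alpha},\ e^{(b)+}_{i\alpha}\right] & \text{if } x_i \ge \bar x, \\[2ex] \dfrac{\sum_{i=1}^{n}(x_i - \bar x)}{\sum_{i=1}^{n}(x_i - \bar x)^2}\left(\{0\} \ominus_g E\left[-e^{(b)+}_{i\alpha},\ -e^{(b)-}_{i\alpha}\right]\right) & \text{if } x_i < \bar x \end{cases}$$
$$= \{0\}.$$
The intercept coefficient is derived in a similar way:
$$E\left(\hat\beta^{(b)}_{0\alpha} \ominus_g \hat\beta_{0\alpha}\right) = E\left[\bar e^{(b)-}_{i\alpha} - \left(\hat\beta^{(b)-}_{1\alpha} - \hat\beta^-_{1\alpha}\right)\bar x,\ \bar e^{(b)+}_{i\alpha} - \left(\hat\beta^{(b)+}_{1\alpha} - \hat\beta^+_{1\alpha}\right)\bar x\right]$$
$$= E\left(\left[\bar e^{(b)-}_{i\alpha},\ \bar e^{(b)+}_{i\alpha}\right] + \{0\} \ominus_g \left[-\hat\beta^{(b)-}_{1\alpha} + \hat\beta^-_{1\alpha},\ -\hat\beta^{(b)+}_{1\alpha} + \hat\beta^+_{1\alpha}\right]\bar x\right)$$
$$= E\left(\bar e^{(b)}_{i\alpha}\right) \ominus_g E\left(\hat\beta^{(b)}_{1\alpha} \ominus_g \hat\beta_{1\alpha}\right)\bar x = \{0\}.$$

In Case 3, under the given conditions, it follows that $\hat y^-_{i\alpha} = \hat y^L_{i\alpha}$ and $\hat y^+_{i\alpha} = \hat y^U_{i\alpha}$, so that $\hat y^{(b)-}_{i\alpha} = \hat y^U_{i\alpha} + e^{(b)+}_{i\alpha}$ and $\hat y^{(b)+}_{i\alpha} = \hat y^L_{i\alpha} + e^{(b)-}_{i\alpha}$.

In Case 4, under the given conditions, it follows that $\hat y^-_{i\alpha} = \hat y^U_{i\alpha}$ and $\hat y^+_{i\alpha} = \hat y^L_{i\alpha}$, so that $\hat y^{(b)-}_{i\alpha} = \hat y^L_{i\alpha} + e^{(b)+}_{i\alpha}$ and $\hat y^{(b)+}_{i\alpha} = \hat y^U_{i\alpha} + e^{(b)-}_{i\alpha}$.

From these conditions, Case 3 and Case 4 are impossible.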
References
Akbari MG, Mohammadalizadeh R, Rezaei M (2012) Bootstrap statistical inference about the regression coefficients based on fuzzy data. Int J Fuzzy Syst 14:549–556
Arnold BF (1995) Statistical tests optimally meeting certain fuzzy requirements on the power function and on the sample size. Fuzzy Sets Syst 75:365–372
Casals MR, Gil MR, Gil P (1986) On the use of Zadeh's probabilistic definition for testing statistical hypotheses from fuzzy information. Fuzzy Sets Syst 20:175–190
Chang P-T, Lee ES (1994) Fuzzy least absolute deviations regression and the conflicting trends in fuzzy parameters. Comput Math Appl 28:89–101
Efron B (1982) The jackknife, the bootstrap, and other resampling plans, vol 38. Society of Industrial and Applied Mathematics CBMS-NSF Monographs, Philadelphia. ISBN 0898711797
Feng Y-H, Hu L-J, Shu H-S (2001) The variance and covariance of fuzzy random variables and their applications. Fuzzy Sets Syst 120:487–497
Grzegorzewski P (2000) Testing statistical hypotheses with vague data. Fuzzy Sets Syst 112:501–510
Hu L, Wu R, Shao S (2002) Analysis of dynamical systems whose inputs are fuzzy stochastic processes. Fuzzy Sets Syst 129:111–118
Kaufmann A (1975) Introduction to the theory of fuzzy subsets, vol 1. Academic Press, New York
Kim SJ, Seo IY (2012) A clustering approach to wind power prediction based on support vector regression. Int J Fuzzy Log Intell Syst 12:108–112
Körner R (1997) On the variance of fuzzy random variables. Fuzzy Sets Syst 92:83–93
Kruse R, Meyer KD (1987) Statistics with vague data. D. Reidel Publishing Company, Dordrecht
Kwakernaak H (1978) Fuzzy random variables, part I: definitions and theorems. Inf Sci 15:1–15
Kwakernaak H (1979) Fuzzy random variables, part II: algorithms and examples for the discrete case. Inf Sci 17:253–278
Lin JG, Zhuang QY, Huang C (2012) Fuzzy statistical analysis of multiple regression with crisp and fuzzy covariates and applications in analyzing economic data of China. Comput Econ 39:29–49
Maria BF, Renato C, Gil GR (2013) Bootstrap confidence intervals for the parameters of a linear regression model with fuzzy random variables. Towards Adv Data Anal Comb Soft Comput Stat 285:33–42
Montenegro M, Colubi A, Casals MR, Gil MA (2004) Asymptotic and bootstrap techniques for testing the expected value of a fuzzy random variable. Metrika 59:31–49
Puri MD, Ralescu D (1986) Fuzzy random variables. J Math Anal Appl 114:409–422
Stefanini L (2010) A generalization of Hukuhara difference for interval and fuzzy arithmetic. Fuzzy Sets Syst 161:1564–1576
Tanaka H, Uejima S, Asai K (1982) Linear regression analysis with fuzzy model. IEEE Trans Syst Man Cybern SMC-2:903–907
Wu CFJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14(4):1261–1295
Wu H-C (2005) Statistical hypotheses testing for fuzzy data. Inf Sci 175:30–56
Wu H-C (2007) Analysis of variance for fuzzy data. Int J Syst Sci 38:235–246
Zadeh LA (1975a) The concept of linguistic variable and its application to approximate reasoning I. Inf Sci 8:199–249
Zadeh LA (1975b) The concept of linguistic variable and its application to approximate reasoning II. Inf Sci 8:301–357
Zadeh LA (1975c) The concept of linguistic variable and its application to approximate reasoning III. Inf Sci 9:43–80