International Journal of Fuzzy Systems, Vol. 14, No. 4, December 2012
Bootstrap Statistical Inference about the Regression Coefficients Based on Fuzzy Data

M. G. Akbari, R. Mohammadalizadeh, and M. Rezaei

Abstract

The theory of regression methods is based on the crispness of the observations and of the parameters of interest. However, there are many situations in which these concepts are imprecise. The theory of fuzzy sets, on the other hand, is a well-established tool for the formulation and analysis of imprecise and subjective concepts, and in such situations fuzzy regression must be used. In this paper, we first recall a well-known signed distance and then use it to estimate the crisp regression coefficients from fuzzy data by the least squares method. Finally, we present confidence intervals and hypothesis tests for these coefficients based on bootstrap theory, and numerical examples are provided to illustrate the approach. For confidence interval and hypothesis testing problems, bootstrap techniques (Efron and Tibshirani [10]) have been shown empirically to be efficient and powerful.

Keywords: Canonical fuzzy number, Yao-Wu signed distance, Bootstrap theory, Linear regression, Confidence interval, Hypothesis testing.
Corresponding Author: M. G. Akbari, Department of Statistics, School of Mathematical Sciences, Birjand University, 91775-1159, Birjand, Iran. E-mail: [email protected]
R. Mohammadalizadeh, E-mail: [email protected]
M. Rezaei, E-mail: [email protected]
Manuscript received 23 Nov. 2010; revised 25 Feb. 2011; accepted 6 June 2012.

1. Introduction

Fuzzy linear regression models are used to obtain an appropriate linear relation between a dependent variable and several independent variables in a fuzzy environment. A fuzzy linear regression model was first introduced by Tanaka et al. [27]. They formulated a linear regression model with fuzzy response data, crisp predictor data and fuzzy parameters as a mathematical programming problem. Their approach was later improved by Tanaka et al. [25, 26, 28]. Diamond [9] proposed the fuzzy least squares approach to determine fuzzy parameters by defining a metric between two fuzzy numbers. Xizhao and Minghu [30] introduced a new principle for estimating parameters, called the Minmax principle, and discussed the Minmax estimation of the regression model parameters and the LS estimation of the approximation model parameters. Wu [29] presented fuzzy estimates of regression parameters in linear regression models for imprecise input and output data with the help of the "Resolution Identity". Nasrabadi et al. [20] considered fuzzy linear regression models with fuzzy-crisp output and fuzzy-crisp input, and proposed an estimation method along with a mathematical-programming-based approach. Nather [21] presented different approaches to regression analysis when the data are fuzzy. Bargiela et al. [7] proposed an iterative algorithm for multiple regression with fuzzy variables. Ge and Wang [11] first extended the fuzzy linear regression model to its regularized version, the regularized fuzzy linear regression model, so as to enhance its generalization capability; the regularized fuzzy linear regression model was then explained as the corresponding equivalent maximum a posteriori problem. González-Rodríguez et al. [12] considered the estimation of a simple linear regression model for fuzzy random variables; specifically, they addressed the least-squares estimation problem in terms of a versatile metric. Lin and Pai [18] studied a fuzzy support vector regression model for business cycle predictions. Azadeh et al. [6] presented an integrated fuzzy regression algorithm for energy consumption estimation with non-stationary data (a case study of Iran).
Sener and Karsak [24] presented a fuzzy multiple objective decision framework to determine target levels of engineering characteristics in product design. Its objectives include not only the fulfillment of engineering characteristics to maximize customer satisfaction, but also the maximization of extendibility and the minimization of technical difficulty of engineering characteristics, subject to a financial budget constraint. Most of the literature on regression models in fuzzy environments is based on independent fuzzy random variables [11, 12, 21, 26] or on linear programming methods [6, 20, 24, 26]. In some cases these approaches to estimating the parameters of regression models are complex. Moreover, we cannot use these
© 2012 TFSA
approaches if some components of the model are not available. Hence, in this paper, we estimate the parameters of the regression model based on the signed distance of Yao and Wu [32]. Statistical inference about the parameters of the model is also investigated using bootstrap approaches. The present work has the following advantages:

1) The proposed method provides a simpler procedure for estimating the parameters of the model, which makes it accessible and useful to researchers in different sciences [6, 24].
2) In some cases the present method performs better than those of Yang and Lin [31] and Arabpour and Tata [5] (see Section 4).

The paper is organized as follows. In Section 2 we describe some basic concepts of canonical fuzzy numbers and the Yao-Wu signed distance. Section 3 is devoted to estimating the regression coefficients using the least squares error method. In Section 4 we evaluate the performance of our fuzzy regression model. In Sections 5 and 6 we give confidence intervals and hypothesis tests, respectively, for these coefficients based on t-bootstrap theory, with examples. Finally, a brief conclusion is provided in Section 7.
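2. Preliminaries

In this section we introduce canonical fuzzy numbers and the Yao-Wu signed distance, respectively.

A. Canonical fuzzy numbers

Let $\mathbb{R}$ be the set of all real numbers. A fuzzy subset $\tilde{a}$ of $\mathbb{R}$ is defined by its membership function $\mu_{\tilde{a}} : \mathbb{R} \to [0,1]$. We denote by $\tilde{a}_\alpha = \{x : \mu_{\tilde{a}}(x) \ge \alpha\}$ the $\alpha$-cut set of $\tilde{a}$, and $\tilde{a}_0$ is the closure of the set $\{x : \mu_{\tilde{a}}(x) > 0\}$. Furthermore:

1. $\tilde{a}$ is called a normal fuzzy set if there exists $x$ such that $\mu_{\tilde{a}}(x) = 1$;
2. $\tilde{a}$ is called a convex fuzzy set if $\mu_{\tilde{a}}(\lambda x + (1-\lambda) y) \ge \min\{\mu_{\tilde{a}}(x), \mu_{\tilde{a}}(y)\}$ for all $\lambda \in [0,1]$;
3. the fuzzy set $\tilde{a}$ is called a fuzzy number if $\tilde{a}$ is a normal convex fuzzy set and its $\alpha$-cut sets are bounded for all $\alpha \neq 0$;
4. $\tilde{a}$ is called a closed fuzzy number if $\tilde{a}$ is a fuzzy number and its membership function $\mu_{\tilde{a}}$ is upper semicontinuous;
5. $\tilde{a}$ is called a bounded fuzzy number if $\tilde{a}$ is a fuzzy number and its membership function $\mu_{\tilde{a}}$ has compact support.

If $\tilde{a}$ is a closed and bounded fuzzy number with $a_\alpha^L = \inf\{x : x \in \tilde{a}_\alpha\}$ and $a_\alpha^U = \sup\{x : x \in \tilde{a}_\alpha\}$, and its membership function is strictly increasing on the interval $[a_0^L, a_1^L]$ and strictly decreasing on the interval $[a_1^U, a_0^U]$, then $\tilde{a}$ is called a canonical fuzzy number.

Given a real number $a$, we can induce a fuzzy number $\tilde{a}$ with membership function $\mu_{\tilde{a}}(r)$ such that $\mu_{\tilde{a}}(a) = 1$ and $\mu_{\tilde{a}}(r) < 1$ for $r \neq a$. Let $F(\mathbb{R})$ be the set of all canonical fuzzy real numbers induced by the real numbers $\mathbb{R}$.

Let "$\odot$" be a binary operation $\oplus$ or $\ominus$ between two canonical fuzzy numbers $\tilde{a}$ and $\tilde{b}$. The membership function of $\tilde{a} \odot \tilde{b}$ is defined by

$$\mu_{\tilde{a} \odot \tilde{b}}(z) = \sup_{x \circ y = z} \min\{\mu_{\tilde{a}}(x), \mu_{\tilde{b}}(y)\}$$

for $\odot = \oplus$ or $\ominus$ and $\circ = +$ or $-$. Let $\odot_{int}$ be a binary operation $\oplus_{int}$ or $\ominus_{int}$ between two closed intervals $[a, b]$ and $[c, d]$. Then $[a, b] \odot_{int} [c, d]$ is defined by

$$[a, b] \odot_{int} [c, d] = \{z : z = x \circ y,\ x \in [a, b],\ y \in [c, d]\}.$$

If $\tilde{a}$ and $\tilde{b}$ are two closed fuzzy numbers, then $\tilde{a} \oplus \tilde{b}$ and $\tilde{a} \ominus \tilde{b}$ are also closed fuzzy numbers. Furthermore, we have

$$(\tilde{a} \oplus \tilde{b})_\alpha = [a_\alpha^L + b_\alpha^L,\ a_\alpha^U + b_\alpha^U], \qquad (\tilde{a} \ominus \tilde{b})_\alpha = [a_\alpha^L - b_\alpha^U,\ a_\alpha^U - b_\alpha^L].$$

B. Yao-Wu signed distance

We now define a distance between fuzzy numbers which is used later. Several ranking methods have been proposed so far, by Puri and Ralescu [23], Cheng [8], Modarres and Sadi-Nezhad [19], Nojavan and Ghazanfari [22], and Akbari and Rezaei [1]. In this paper we use another metric for canonical fuzzy numbers, known as the Yao and Wu signed distance.

Definition 1: For each $a, b \in \mathbb{R}$ define the signed distance $d^*$ of $a$ and $b$ by $d^*(a, b) = a - b$.

Thus, we have the following way to define the rank of any two numbers on $\mathbb{R}$. For each $a, b \in \mathbb{R}$,

$$d^*(a, b) > 0 \iff d^*(a, 0) > d^*(b, 0) \iff a > b,$$
$$d^*(a, b) < 0 \iff d^*(a, 0) < d^*(b, 0) \iff a < b,$$
$$d^*(a, b) = 0 \iff d^*(a, 0) = d^*(b, 0) \iff a = b.$$

Definition 2: Let $F(\mathbb{R})$ be the set of all canonical fuzzy numbers on the real line $\mathbb{R}$. For each $\tilde{a}, \tilde{b} \in F(\mathbb{R})$ define the signed distance of $\tilde{a}$ and $\tilde{b}$ as follows:

$$d(\tilde{a}, \tilde{b}) = \int_0^1 d^*\big(M_\alpha(\tilde{a}), M_\alpha(\tilde{b})\big)\, d\alpha = \int_0^1 \big(M_\alpha(\tilde{a}) - M_\alpha(\tilde{b})\big)\, d\alpha,$$

where

$$M_\alpha(\tilde{a}) = \frac{a_\alpha^L + a_\alpha^U}{2} \quad \text{and} \quad M_\alpha(\tilde{b}) = \frac{b_\alpha^L + b_\alpha^U}{2}.$$

Definition 3 (Yao and Wu, [32]): For each $\tilde{a}, \tilde{b} \in F(\mathbb{R})$, define the ranking of $\tilde{a}$ and $\tilde{b}$ by

$$d(\tilde{a}, \tilde{b}) > 0 \iff d(\tilde{a}, 0) > d(\tilde{b}, 0) \iff \tilde{b} \prec \tilde{a},$$
$$d(\tilde{a}, \tilde{b}) < 0 \iff d(\tilde{a}, 0) < d(\tilde{b}, 0) \iff \tilde{a} \prec \tilde{b},$$
$$d(\tilde{a}, \tilde{b}) = 0 \iff d(\tilde{a}, 0) = d(\tilde{b}, 0) \iff \tilde{a} \approx \tilde{b}.$$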
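The signed distance of Definition 2 can be illustrated numerically. The following is a minimal sketch, assuming triangular fuzzy numbers written as a hypothetical tuple `(left, mode, right)`; the function names are illustrative and not part of the paper. For such numbers the integral of the $\alpha$-cut midpoint has the closed form $(\text{left} + 2\,\text{mode} + \text{right})/4$, which the numerical integration reproduces.

```python
from typing import Tuple

# Hypothetical representation (an assumption of this sketch):
# a triangular fuzzy number is the tuple (left, mode, right).
Triangular = Tuple[float, float, float]


def alpha_cut(a: Triangular, alpha: float) -> Tuple[float, float]:
    """Return the alpha-cut [a^L_alpha, a^U_alpha] of a triangular fuzzy number."""
    left, mode, right = a
    return left + alpha * (mode - left), right - alpha * (right - mode)


def distance_to_zero(a: Triangular, steps: int = 1000) -> float:
    """Approximate d(a, 0): the integral over alpha in [0, 1] of the
    alpha-cut midpoint M_alpha(a), using the midpoint rule."""
    total = 0.0
    for k in range(steps):
        lower, upper = alpha_cut(a, (k + 0.5) / steps)
        total += 0.5 * (lower + upper)
    return total / steps


def signed_distance(a: Triangular, b: Triangular, steps: int = 1000) -> float:
    """Signed distance d(a, b) = d(a, 0) - d(b, 0)."""
    return distance_to_zero(a, steps) - distance_to_zero(b, steps)


def closed_form(a: Triangular) -> float:
    """Closed form of d(a, 0) for a triangular fuzzy number."""
    left, mode, right = a
    return (left + 2.0 * mode + right) / 4.0
```

For example, with `a = (1, 2, 4)` and `b = (0, 1, 2)` the signed distance `signed_distance(a, b)` is positive, so under Definition 3 the number `b` is ranked below `a`.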
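3. Least squares error

Let $(\tilde{x}_1, \tilde{y}_1), \ldots, (\tilde{x}_n, \tilde{y}_n)$ be pairs of canonical fuzzy numbers. For any fuzzy line $\tilde{y}_i = \beta_0 \oplus \beta_1 \tilde{x}_i$, $i = 1, 2, \ldots, n$, the sum of squares error is defined to be

$$SSE(\beta_0, \beta_1) = \sum_{i=1}^{n} d^2(\tilde{y}_i, \beta_0 \oplus \beta_1 \tilde{x}_i) = \sum_{i=1}^{n} \big[d(\tilde{y}_i, 0) - \beta_0 - \beta_1 d(\tilde{x}_i, 0)\big]^2,$$

where $d$ is the Yao-Wu signed distance. If $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least squares estimators of $\beta_0$ and $\beta_1$, then we have

$$SSE(\hat{\beta}_0, \hat{\beta}_1) = \min_{\beta_0, \beta_1} SSE(\beta_0, \beta_1).$$

The minimizing values of $\beta_0$ and $\beta_1$ are:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} d(\tilde{x}_i, 0)\, d(\tilde{y}_i, 0) - \frac{1}{n} \sum_{i=1}^{n} d(\tilde{x}_i, 0) \sum_{i=1}^{n} d(\tilde{y}_i, 0)}{\sum_{i=1}^{n} d^2(\tilde{x}_i, 0) - \frac{1}{n} \big(\sum_{i=1}^{n} d(\tilde{x}_i, 0)\big)^2}, \qquad \hat{\beta}_0 = \frac{1}{n} \sum_{i=1}^{n} d(\tilde{y}_i, 0) - \hat{\beta}_1 \frac{1}{n} \sum_{i=1}^{n} d(\tilde{x}_i, 0).$$

By using the properties of the Yao-Wu signed distance we verify

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} d(\tilde{x}_i, \bar{x})\, d(\tilde{y}_i, \bar{y})}{\sum_{i=1}^{n} d^2(\tilde{x}_i, \bar{x})}, \qquad \hat{\beta}_0 = d(\bar{y}, 0) - \hat{\beta}_1 d(\bar{x}, 0),$$

where $d(\bar{x}, 0)$ and $d(\bar{y}, 0)$ are equal to $\frac{1}{n} \sum_{i=1}^{n} d(\tilde{x}_i, 0)$ and $\frac{1}{n} \sum_{i=1}^{n} d(\tilde{y}_i, 0)$, respectively; furthermore $\bar{x} = \frac{1}{n}(\tilde{x}_1 \oplus \cdots \oplus \tilde{x}_n)$ and $\bar{y} = \frac{1}{n}(\tilde{y}_1 \oplus \cdots \oplus \tilde{y}_n)$.

Theorem 1: If $(\tilde{x}_i, \tilde{y}_i)$, $i = 1, \ldots, n$, are pairs of canonical fuzzy numbers and $\hat{y}_i = \hat{\beta}_0 \oplus \hat{\beta}_1 \tilde{x}_i$, then

a) $\sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i) = 0$,
b) $\sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i)\, d(\tilde{x}_i, 0) = 0$.

Proof: Setting the partial derivatives of $SSE(\beta_0, \beta_1)$ equal to zero gives

$$\sum_{i=1}^{n} \big[d(\tilde{y}_i, 0) - \hat{\beta}_0 - \hat{\beta}_1 d(\tilde{x}_i, 0)\big] = 0, \qquad (1)$$
$$\sum_{i=1}^{n} d(\tilde{x}_i, 0) \big[d(\tilde{y}_i, 0) - \hat{\beta}_0 - \hat{\beta}_1 d(\tilde{x}_i, 0)\big] = 0. \qquad (2)$$

Since $d(\tilde{y}_i, \hat{y}_i) = d(\tilde{y}_i, 0) - d(\hat{y}_i, 0) = d(\tilde{y}_i, 0) - \hat{\beta}_0 - \hat{\beta}_1 d(\tilde{x}_i, 0)$, in view of (1) and (2) we simply have $\sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i) = 0$ and also $\sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i)\, d(\tilde{x}_i, 0) = 0$.

Theorem 3: Let $(\tilde{x}_i, \tilde{y}_i)$, $i = 1, \ldots, n$, be pairs of canonical fuzzy numbers. Then we have

$$SST = SSR + SSE,$$

where $SST = \sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y})$, $SSR = \sum_{i=1}^{n} d^2(\hat{y}_i, \bar{y})$ and $SSE = \sum_{i=1}^{n} d^2(\tilde{y}_i, \hat{y}_i)$.

Proof: The facts that are used to prove the theorem are as follows:

$$d(\hat{y}_i, \bar{y}) = \hat{\beta}_0 + \hat{\beta}_1 d(\tilde{x}_i, 0) - d(\bar{y}, 0) = \hat{\beta}_1 d(\tilde{x}_i, \bar{x}). \qquad (3)$$

Now by (3) and the equalities $\sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i) = 0$ and $\sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i)\, d(\tilde{x}_i, 0) = 0$ we have

$$\sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i)\, d(\hat{y}_i, \bar{y}) = 0. \qquad (4)$$

Moreover, $d(\tilde{y}_i, \bar{y}) = d(\tilde{y}_i, \hat{y}_i) + d(\hat{y}_i, \bar{y})$, which results in:

$$d^2(\tilde{y}_i, \bar{y}) = d^2(\tilde{y}_i, \hat{y}_i) + d^2(\hat{y}_i, \bar{y}) + 2\, d(\tilde{y}_i, \hat{y}_i)\, d(\hat{y}_i, \bar{y}). \qquad (5)$$

Finally, using (4) and (5) we have

$$\sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y}) = \sum_{i=1}^{n} d^2(\tilde{y}_i, \hat{y}_i) + \sum_{i=1}^{n} d^2(\hat{y}_i, \bar{y}) + 2 \sum_{i=1}^{n} d(\tilde{y}_i, \hat{y}_i)\, d(\hat{y}_i, \bar{y}) = \sum_{i=1}^{n} d^2(\tilde{y}_i, \hat{y}_i) + \sum_{i=1}^{n} d^2(\hat{y}_i, \bar{y}) + 0.$$

A statistic that is used to show how well the fuzzy line is fitted to the fuzzy data is the coefficient of determination. It is defined as the ratio of the regression sum of squares to the total sum of squares. It is denoted by $R^2$ and is written as follows:

$$R^2 = \frac{\sum_{i=1}^{n} d^2(\hat{y}_i, \bar{y})}{\sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y})}.$$

Definition 4: The correlation of $\tilde{x}$ and $\tilde{y}$ is defined as follows:

$$\rho(\tilde{x}, \tilde{y}) = \frac{cov(\tilde{x}, \tilde{y})}{\sigma_{\tilde{x}}\, \sigma_{\tilde{y}}},$$

where $cov(\tilde{x}, \tilde{y})$, $\sigma_{\tilde{x}}^2$ and $\sigma_{\tilde{y}}^2$ are equal to $\frac{1}{n} \sum_{i=1}^{n} d(\tilde{x}_i, \bar{x})\, d(\tilde{y}_i, \bar{y})$, $\frac{1}{n} \sum_{i=1}^{n} d^2(\tilde{x}_i, \bar{x})$ and $\frac{1}{n} \sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y})$, respectively, and $d$ is the Yao-Wu signed distance.

Now, we define or verify some of the properties of the correlation as follows:

a) $\rho(\tilde{x}, \tilde{x}) = 1$.
b) $\rho(\tilde{x}, \tilde{y}) = \rho(\tilde{y}, \tilde{x})$.
c) $|\rho(\tilde{x}, \tilde{y})| \le 1$.

Remark 1: $|\rho(\tilde{x}, \tilde{y})| \le 1$.

Proof: We have

$$0 \le \frac{1}{n} \sum_{i=1}^{n} \left[\frac{d(\tilde{x}_i, \bar{x})}{\sigma_{\tilde{x}}} + \frac{d(\tilde{y}_i, \bar{y})}{\sigma_{\tilde{y}}}\right]^2 = 1 + 1 + 2\rho(\tilde{x}, \tilde{y}) = 2\big(1 + \rho(\tilde{x}, \tilde{y})\big),$$

thus we verify $\rho(\tilde{x}, \tilde{y}) \ge -1$, and similarly

$$0 \le \frac{1}{n} \sum_{i=1}^{n} \left[\frac{d(\tilde{x}_i, \bar{x})}{\sigma_{\tilde{x}}} - \frac{d(\tilde{y}_i, \bar{y})}{\sigma_{\tilde{y}}}\right]^2 = 2\big(1 - \rho(\tilde{x}, \tilde{y})\big),$$

which results in $|\rho(\tilde{x}, \tilde{y})| \le 1$.

Remark 2: …

Theorem 4: If $\rho(\tilde{x}, \tilde{y})$ and $R^2$ are the correlation and the coefficient of determination, respectively, then $|R| = |\rho(\tilde{x}, \tilde{y})|$.

Proof: By (3),

$$|R| = \sqrt{\frac{\sum_{i=1}^{n} d^2(\hat{y}_i, \bar{y})}{\sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y})}} = \sqrt{\frac{\hat{\beta}_1^2 \sum_{i=1}^{n} d^2(\tilde{x}_i, \bar{x})}{\sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y})}} = \frac{\left|\sum_{i=1}^{n} d(\tilde{x}_i, \bar{x})\, d(\tilde{y}_i, \bar{y})\right|}{\sqrt{\sum_{i=1}^{n} d^2(\tilde{x}_i, \bar{x}) \sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y})}} = |\rho(\tilde{x}, \tilde{y})|.$$

Remark 3: The above regression procedures reduce to the ordinary regression procedures if the data set is crisp (non-fuzzy).

Example 1: Suppose that we have taken a fuzzy random sample of size $n = 8$ from a population and we have observed the following triangular fuzzy data:

Table 1. Fuzzy random sample of size $n = 8$ from a population.

We obtain $\sum_{i=1}^{n} d^2(\tilde{x}_i, \bar{x}) = 97.72$, $\sum_{i=1}^{n} d^2(\tilde{y}_i, \bar{y}) = 31.5$, $\sum_{i=1}^{n} d(\tilde{x}_i, \bar{x})\, d(\tilde{y}_i, \bar{y}) = 50.75$, $\hat{\beta}_1 = 0.52$ and $\hat{\beta}_0 = 3.57$, and $R = 0.915$.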
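The estimation step of this section can be sketched in a few lines. The code below is an illustrative sketch only, assuming triangular fuzzy numbers stored as hypothetical tuples `(left, mode, right)`, for which $d(\tilde{a}, 0) = (\text{left} + 2\,\text{mode} + \text{right})/4$; the data are invented for illustration and are not the data of Table 1.

```python
def d0(a):
    """Signed distance to zero of a triangular fuzzy number (left, mode, right)."""
    left, mode, right = a
    return (left + 2.0 * mode + right) / 4.0


def fit_fuzzy_line(xs, ys):
    """Least squares estimates of crisp coefficients from fuzzy pairs.

    Returns a dict with the intercept b0, slope b1 and R^2, all computed
    from the signed distances d(x_i, 0) and d(y_i, 0).
    """
    dx = [d0(x) for x in xs]
    dy = [d0(y) for y in ys]
    n = len(dx)
    mean_x = sum(dx) / n  # plays the role of d(x_bar, 0)
    mean_y = sum(dy) / n  # plays the role of d(y_bar, 0)
    sxx = sum((u - mean_x) ** 2 for u in dx)
    syy = sum((v - mean_y) ** 2 for v in dy)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(dx, dy))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # equals SSR / SST for this fit
    return {"b0": b0, "b1": b1, "r2": r2}


# Hypothetical data lying exactly on the line y = 1 + 2x after defuzzification.
xs = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5)]
ys = [(2, 3, 4), (4, 5, 6), (6, 7, 8), (8, 9, 10)]
result = fit_fuzzy_line(xs, ys)
```

Because the fit depends on the data only through the signed distances, crisp data (where each tuple has equal entries) give the ordinary least squares estimates, in line with Remark 3.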
4. Performance of fuzzy regression model
To evaluate the performance of fuzzy regression models, Kim and Bishu [16] studied a modification of fuzzy linear regression analysis based on a criterion of minimizing the difference of the fuzzy membership values between the observed and estimated fuzzy numbers. Kao and Chyu [14] proposed a two-stage approach to construct the fuzzy linear regression model. In the first stage, the fuzzy observations were defuzzified so that the traditional least-squares method could be applied to find a crisp regression line showing the general trend of the data. In the second stage, the error term of the fuzzy regression model, which represents the fuzziness of the data in a general sense, was determined to give the regression model the best explanatory power for the data. Yang and Lin [31] proposed two estimation methods along with a fuzzy least-squares approach; these methods can be used effectively for parameter estimation. Kao and Chyu [15] proposed an idea stemming from classical least squares to handle fuzzy observations in regression analysis; based on the extension principle, the membership function of the sum of squared errors was constructed. Hojati et al. [13] proposed a new method for the computation of fuzzy regression that is simple and gives good solutions. Krätschmer [17] contributed to the estimation of parameters in fuzzy regression models with random fuzzy sets, investigating models with crisp parameters and fuzzy observations of the variables. Two estimation methods along with a fuzzy least squares approach were proposed, which could be used effectively for parameter estimation, and comparisons were made between them. Arabpour and Tata [5] used the absolute difference between the membership values of the observed and estimated fuzzy responses as a measure of error (the area in Figure 1).
In these papers this measure is also adopted as the criterion for comparing the performance of different methods; smaller values of this measure indicate a better fit.
Figure 1. The prediction error for an observation.
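The error measure described above can be approximated numerically. The following is a minimal sketch, assuming that the error for one observation is the area between the membership functions of the observed and the estimated response, and that both are triangular fuzzy numbers stored as hypothetical tuples `(left, mode, right)`; the function names are illustrative.

```python
def membership(a, t):
    """Membership value at t of a triangular fuzzy number (left, mode, right)."""
    left, mode, right = a
    if t <= left or t >= right:
        return 0.0
    if t <= mode:
        return (t - left) / (mode - left)
    return (right - t) / (right - mode)


def prediction_error(observed, estimated, steps=20000):
    """Approximate the integral of |mu_observed(t) - mu_estimated(t)| over the
    union of the two supports, using the midpoint rule."""
    low = min(observed[0], estimated[0])
    high = max(observed[2], estimated[2])
    width = (high - low) / steps
    total = 0.0
    for k in range(steps):
        t = low + (k + 0.5) * width
        total += abs(membership(observed, t) - membership(estimated, t))
    return total * width
```

Two identical fuzzy numbers give an error of zero, while two triangular numbers with disjoint supports give the sum of their areas; summing `prediction_error` over all observations gives a total error of the kind used to compare methods.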
By applying the fuzzy least squares method described in Section 3 we estimated the parameters $\beta_0$ and $\beta_1$ for this model. In order to compare our method with those of Yang-Lin and Arabpour-Tata, we assume that $\tilde{x}_i$ and $\tilde{y}_i$, $i = 1, 2, \ldots, n$, are triangular fuzzy numbers.

Example 2: Consider Table 1. To compare the effectiveness of our method with those of Yang-Lin and Arabpour-Tata, we apply all three methods to these data. The fuzzy regression model estimated by our method is as follows:

$$\hat{y} = 0.52\, \tilde{x} \oplus 3.57,$$

the fuzzy regression model estimated by Yang and Lin is as follows:

$$\hat{y} = (0.5251, 0.5293, 0.5335)\, \tilde{x} \oplus (3.2052, 3.4967, 3.7882),$$

and the fuzzy regression model estimated by Arabpour and Tata is as follows:

$$\hat{y} = (0.518804, 0.519348, 0.523524)\, \tilde{x} \oplus (3.27580, 3.57243, 3.83865).$$

Table 2 lists these errors. The sum of errors of our method (6.66069) is less than that of the Yang-Lin method and that of the Arabpour-Tata method. This implies that our fit is slightly better than those of the Yang-Lin and Arabpour-Tata methods.
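Table 2. The data and error in Example 1.

5. Bootstrap confidence interval for $\beta_0$ and $\beta_1$

In this section we introduce bootstrap confidence intervals based on t-bootstrap theory for $\beta_0$ and $\beta_1$. We generate $B$ bootstrap fuzzy random samples $(\tilde{x}^*, \tilde{y}^*)^1, (\tilde{x}^*, \tilde{y}^*)^2, \ldots, (\tilde{x}^*, \tilde{y}^*)^B$, where each $(\tilde{x}^*, \tilde{y}^*)^b$ is a fuzzy sample of size $n$ drawn randomly and with replacement from $(\tilde{x}_i, \tilde{y}_i)$; $i = 1, 2, \ldots, n$, and we compute

$$t_i^{*b} = \frac{\hat{\beta}_i^{*b} - \hat{\beta}_i}{se(\hat{\beta}_i^{*b})}, \qquad b = 1, 2, \ldots, B, \quad i = 0, 1,$$

where $\hat{\beta}_i^{*b}$ is the least squares estimate of $\beta_i$ computed from the $b$th bootstrap sample and $se(\hat{\beta}_i^{*b})$ is its standard error.

The $\gamma$th percentile of $t_i^{*b}$ is estimated by the value $\hat{t}_i^{(\gamma)}$ that satisfies

$$\frac{\#\{b : t_i^{*b} \le \hat{t}_i^{(\gamma)}\}}{B} = \gamma,$$

where $b = 1, 2, \ldots, B$ and $i = 0, 1$. The bootstrap confidence interval using fuzzy data is

$$\Pi_{\beta_i} = \big(\hat{\beta}_i - \hat{t}_i^{(1-\gamma)}\, se(\hat{\beta}_i),\ \hat{\beta}_i - \hat{t}_i^{(\gamma)}\, se(\hat{\beta}_i)\big), \qquad i = 0, 1,$$

where $se(\hat{\beta}_0)$ and $se(\hat{\beta}_1)$ are equal to

$$\sqrt{MSE \left(\frac{1}{n} + \frac{d^2(\bar{x}, 0)}{\sum_{i=1}^{n} d^2(\tilde{x}_i, \bar{x})}\right)} \quad \text{and} \quad \sqrt{\frac{MSE}{\sum_{i=1}^{n} d^2(\tilde{x}_i, \bar{x})}},$$

respectively; furthermore $MSE = SSE(\hat{\beta}_0, \hat{\beta}_1)/(n - 2)$.

Remark 4: If $B\gamma$ is not an integer, the following procedure can be used. Assuming $\gamma \le 0.5$, let $k = [(B + 1)\gamma]$, the largest integer $\le (B + 1)\gamma$. Then we define the empirical $\gamma$ and $1 - \gamma$ quantiles by the $k$th largest and $(B + 1 - k)$th largest values of $t_i^{*b}$, $i = 0, 1$, respectively.

Example 3: Consider Table 1. Suppose we are interested in bootstrap confidence intervals using those fuzzy data. We use the package "MINITAB 13" to deal with the complicated structure of these fuzzy data. For example, if $B = 10000$, the estimate of the 5% point is the 500th largest value of the $t_i^{*b}$s and the estimate of the 95% point is the 9500th largest value of the $t_i^{*b}$s. The last line of Table 3 shows the percentiles of $t_i^{*b}$ for $\beta_0$ and $\beta_1$ computed using 10000 bootstrap samples.

Table 3. Percentiles of the $t$ distribution with 6 degrees of freedom, and of the bootstrap distributions of $t_0^{*b}$ and $t_1^{*b}$.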
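The resampling procedure of this section can be sketched as follows. This is an illustrative sketch, not the MINITAB computation used in the paper: it assumes triangular fuzzy numbers stored as hypothetical tuples `(left, mode, right)`, uses the classical standard errors of the least squares coefficients (an assumption of this sketch), applies the quantile rule of Remark 4, and runs on invented data rather than the data of Table 1.

```python
import math
import random


def d0(a):
    """Signed distance to zero of a triangular fuzzy number (left, mode, right)."""
    left, mode, right = a
    return (left + 2.0 * mode + right) / 4.0


def fit(dx, dy):
    """Least squares fit on defuzzified values.

    Returns ((b0, b1), (se0, se1)), or None for a degenerate sample."""
    n = len(dx)
    mx, my = sum(dx) / n, sum(dy) / n
    sxx = sum((x - mx) ** 2 for x in dx)
    if sxx < 1e-12:  # all x values (nearly) equal: slope undefined
        return None
    b1 = sum((x - mx) * (y - my) for x, y in zip(dx, dy)) / sxx
    b0 = my - b1 * mx
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(dx, dy)) / (n - 2)
    if mse < 1e-12:  # perfect fit: standard errors vanish
        return None
    se1 = math.sqrt(mse / sxx)
    se0 = math.sqrt(mse * (1.0 / n + mx * mx / sxx))
    return (b0, b1), (se0, se1)


def empirical_quantiles(values, gamma):
    """k-th and (B + 1 - k)-th ordered values, with k = floor((B + 1) * gamma)."""
    ordered = sorted(values)
    count = len(ordered)
    k = max(1, math.floor((count + 1) * gamma))
    return ordered[k - 1], ordered[count - k]


def bootstrap_t_intervals(xs, ys, gamma=0.05, n_boot=2000, seed=0):
    """Bootstrap-t intervals for (beta_0, beta_1) from fuzzy pairs."""
    rng = random.Random(seed)
    dx, dy = [d0(x) for x in xs], [d0(y) for y in ys]
    est, se = fit(dx, dy)
    n = len(dx)
    t_star = ([], [])
    while len(t_star[0]) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs
        res = fit([dx[i] for i in idx], [dy[i] for i in idx])
        if res is None:  # skip degenerate resamples
            continue
        for i in (0, 1):
            t_star[i].append((res[0][i] - est[i]) / res[1][i])
    intervals = []
    for i in (0, 1):
        low_t, high_t = empirical_quantiles(t_star[i], gamma)
        intervals.append((est[i] - high_t * se[i], est[i] - low_t * se[i]))
    return est, intervals


# Hypothetical triangular fuzzy data (not the data of Table 1).
XS = [(m - 1.0, m, m + 1.0) for m in (2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0)]
YS = [(m - 0.5, m, m + 0.5) for m in (4.7, 5.2, 6.3, 6.8, 7.9, 8.2, 9.4, 9.8)]
```

Calling `bootstrap_t_intervals(XS, YS)` returns the point estimates together with one interval per coefficient. Skipping degenerate resamples is a design choice of this sketch; it avoids division by zero when a resample contains too few distinct points.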
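The bootstrap confidence intervals ($\gamma = 0.05$) using fuzzy data are

$$\Pi_{\beta_1} = (0.52 - 2.736 \times 0.0937,\ 0.52 + 2.398 \times 0.0937) = (0.26, 0.75),$$
$$\Pi_{\beta_0} = (3.57 - 2.140 \times 0.7806,\ 3.57 + 2.123 \times 0.7806) = (1.90, 5.23).$$

The distributions of $t_i^{*b}$, $i = 0, 1$, computed using 10000 bootstrap samples are shown in Figures 2 and 3.

Figure 2. Bootstrap distribution of $t_0^{*b}$.

Figure 3. Bootstrap distribution of $t_1^{*b}$.

6. Bootstrap testing statistical hypotheses

In this section we give hypothesis tests on the coefficients based on bootstrap theory. We describe a bootstrap method that is designed directly for hypothesis testing with fuzzy data based on the Yao-Wu signed distance.

Computation of the bootstrap test statistics for testing $H_0: \beta_0 = \beta_{00}$:

1. Draw $B$ samples of size $n$ with replacement from $(\tilde{x}_i, \tilde{y}_i)$; $i = 1, 2, \ldots, n$.
2. Evaluate $t_0^{*b}$ on each sample, where

$$t_0^{*b} = \frac{\hat{\beta}_0^{*b} - \hat{\beta}_0}{se(\hat{\beta}_0^{*b})}, \qquad b = 1, 2, \ldots, B,$$

also $\hat{\beta}_0^{*b}$ and $se(\hat{\beta}_0^{*b})$ are equal to the least squares estimate of $\beta_0$ and its standard error computed from the $b$th bootstrap sample, respectively.

Computation of the bootstrap test statistics for testing $H_0: \beta_1 = \beta_{10}$:

1. Draw $B$ samples of size $n$ with replacement from $(\tilde{x}_i, \tilde{y}_i)$; $i = 1, 2, \ldots, n$.
2. Evaluate $t_1^{*b}$ on each sample, where

$$t_1^{*b} = \frac{\hat{\beta}_1^{*b} - \hat{\beta}_1}{se(\hat{\beta}_1^{*b})}, \qquad b = 1, 2, \ldots, B,$$

also $\hat{\beta}_1^{*b}$ and $se(\hat{\beta}_1^{*b})$ are equal to the least squares estimate of $\beta_1$ and its standard error computed from the $b$th bootstrap sample, respectively.

Decision rule

We choose a small probability $\gamma$ such as 0.01, 0.05 or 0.1 (the significance level). We reject $H_0$ at error level $\gamma$ if the observed statistic $t_i = (\hat{\beta}_i - \beta_{i0})/se(\hat{\beta}_i)$ is smaller than the lower bootstrap percentile or larger than the upper bootstrap percentile of $t_i^{*b}$. We accept $H_0$ at error level $\gamma$ if $t_i$ lies between these two percentiles.

Example 4: Consider Table 1. Suppose that we are interested in testing

a) $H_0: \beta_0 = \beta_{00}$ versus $H_1: \beta_0 \neq \beta_{00}$,
b) $H_0: \beta_1 = \beta_{10}$ versus $H_1: \beta_1 \neq \beta_{10}$.

The results of these tests are shown in Tables 4 and 5.

Table 4. Bootstrap test quantities and results.

Table 5. Bootstrap test quantities and results.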
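The test of this section can be sketched in the same spirit. The code below is a sketch under stated assumptions, not the authors' implementation: it works directly on defuzzified values $d(\tilde{x}_i, 0)$ and $d(\tilde{y}_i, 0)$, uses the classical standard errors, treats the test as two-sided with the $\gamma/2$ and $1 - \gamma/2$ bootstrap percentiles as critical values (an assumption of this sketch), and uses invented data.

```python
import math
import random


def fit(dx, dy):
    """Least squares fit; returns ((b0, b1), (se0, se1)), or None if degenerate."""
    n = len(dx)
    mx, my = sum(dx) / n, sum(dy) / n
    sxx = sum((x - mx) ** 2 for x in dx)
    if sxx < 1e-12:
        return None
    b1 = sum((x - mx) * (y - my) for x, y in zip(dx, dy)) / sxx
    b0 = my - b1 * mx
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(dx, dy)) / (n - 2)
    if mse < 1e-12:
        return None
    return (b0, b1), (math.sqrt(mse * (1.0 / n + mx * mx / sxx)),
                      math.sqrt(mse / sxx))


def bootstrap_test(dx, dy, index, beta_null, gamma=0.05, n_boot=2000, seed=0):
    """Two-sided bootstrap-t test of H0: beta_index = beta_null.

    Returns (t_obs, lower, upper, reject)."""
    est, se = fit(dx, dy)
    t_obs = (est[index] - beta_null) / se[index]
    rng = random.Random(seed)
    n = len(dx)
    t_star = []
    while len(t_star) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # step 1: resample pairs
        res = fit([dx[i] for i in idx], [dy[i] for i in idx])
        if res is None:  # skip degenerate resamples
            continue
        # step 2: studentized statistic centred at the original estimate
        t_star.append((res[0][index] - est[index]) / res[1][index])
    t_star.sort()
    k = max(1, math.floor((n_boot + 1) * gamma / 2.0))
    lower, upper = t_star[k - 1], t_star[n_boot - k]
    return t_obs, lower, upper, (t_obs < lower or t_obs > upper)


# Hypothetical defuzzified data (signed distances to zero).
DX = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0, 11.0, 13.0]
DY = [4.7, 5.2, 6.3, 6.8, 7.9, 8.2, 9.4, 9.8]
```

For instance, `bootstrap_test(DX, DY, index=1, beta_null=100.0)` tests an implausibly large slope and leads to rejection, while a null value equal to the estimated slope gives an observed statistic of zero and is not rejected.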
7. Conclusions

In this paper, we proposed a technique for bootstrap statistical inference about the regression coefficients based on fuzzy data, using the Yao-Wu signed distance. The introduced method appears to be simpler and more accurate than the Yang-Lin and Arabpour-Tata methods. Furthermore, this metric is very realistic because

• it has very good statistical properties in connection with regression coefficients;
• it involves distances between extreme points;
• it is a distance with convenient statistical features.

This method is especially attractive since the estimators are very similar to those of the classical linear regression model, and when the fuzzy data become crisp, the estimators and the fuzzy regression line are identical to those in the classical case. Extension of the proposed method to linear models, such as regression models and design of experiments, is a potential area for future work. Furthermore, for testing the hypothesis $H_0: \beta_i = \tilde{\beta}_{i0}$, $i = 0, 1$, as a fuzzy hypothesis (fuzzy coefficient), we can apply the methods in the papers of Akbari and Rezaei [2], Akbari et al. [3], and Akbari and Rezaei [4]. These methods are a potential area for future work.

Acknowledgment

The authors wish to express their thanks to the referees for valuable comments which improved the paper. The authors would also like to thank the Editor-in-Chief, Professor Wang, for his useful help.

References

[1] M. G. Akbari and A. Rezaei, “Order statistics using fuzzy random variables,” Statistics and Probability Letters, vol. 79, no. 8, pp. 1031-1037, 2009.
[2] M. G. Akbari and A. Rezaei, “Bootstrap statistical inference for the variance based on fuzzy data,” Austrian Journal of Statistics, vol. 38, no. 2, pp. 121-130, 2009.
[3] M. G. Akbari, A. Rezaei, and Y. Waghei, “Statistical inference about the variance of fuzzy random variables,” Sankhya, vol. 76, no. 2, pp. 206-221, 2009.
[4] M. G. Akbari and A. Rezaei, “Bootstrap testing fuzzy hypotheses and observations on fuzzy statistic,” Expert Systems with Applications, vol. 37, no. 8, pp. 5782-5787, 2010.
[5] A. Arabpour and M. Tata, “Estimating the parameters of a fuzzy linear regression model,” Iranian Journal of Fuzzy Systems, vol. 5, no. 2, pp. 1-19, 2008.
[6] A. Azadeh, M. Saberi, and O. Seraj, “An integrated fuzzy regression algorithm for energy consumption estimation with non-stationary data: A case study of Iran,” Energy, vol. 35, no. 6, pp. 2351-2366, 2010.
[7] A. Bargiela, W. Pedrycz, and T. Nakashima, “Multiple regression with fuzzy data,” Fuzzy Sets and Systems, vol. 158, no. 19, pp. 2169-2188, 2007.
[8] C. Cheng, “A new approach for ranking fuzzy numbers by distance method,” Fuzzy Sets and Systems, vol. 95, no. 3, pp. 307-317, 1998.
[9] P. Diamond, “Fuzzy least squares,” Information Sciences, vol. 46, no. 3, pp. 141-157, 1988.
[10] B. Efron and R. J. Tibshirani, “An Introduction to the Bootstrap,” Chapman-Hall, New York, 1993.
[11] H. W. Ge and S. T. Wang, “Dependency between degree of fit and input noise in fuzzy linear regression using non-symmetric fuzzy triangular coefficients,” Fuzzy Sets and Systems, vol. 158, no. 19, pp. 2189-2202, 2007.
[12] G. González-Rodríguez, A. Blanco, A. Colubi, and M. A. Lubiano, “Estimation of a simple linear regression model for fuzzy random variables,” Fuzzy Sets and Systems, vol. 160, no. 3, pp. 357-370, 2009.
[13] M. Hojati, C. R. Bector, and K. Smimou, “A simple method for computation of fuzzy linear regression,” European Journal of Operational Research, vol. 166, no. 1, pp. 172-184, 2005.
[14] C. Kao and C. Chyu, “A fuzzy linear regression model with better explanatory power,” Fuzzy Sets and Systems, vol. 126, no. 3, pp. 401-409, 2002.
[15] C. Kao and C. Chyu, “Least-square estimates in fuzzy regression analysis,” European Journal of Operational Research, vol. 148, no. 2, pp. 426-435, 2003.
[16] N. Kim and R. R. Bishu, “Evaluation of fuzzy linear regression models by comparing membership functions,” Fuzzy Sets and Systems, vol. 100, no. 1-3, pp. 343-353, 1998.
[17] V. Krätschmer, “Least-square estimates in linear regression models with vague concepts,” Fuzzy Sets and Systems, vol. 157, no. 19, pp. 2579-2592, 2006.
[18] K. P. Lin and P. F. Pai, “A fuzzy support vector regression model for business cycle predictions,” Expert Systems with Applications, vol. 37, no. 7, pp. 5430-5435, 2010.
[19] M. Modarres and S. Sadi-Nezhad, “Ranking fuzzy numbers by preference ratio,” Fuzzy Sets and Systems, vol. 118, no. 3, pp. 429-436, 2001.
[20] M. M. Nasrabadi and E. Nasrabadi, “A mathematical-programming approach to fuzzy linear regression analysis,” Applied Mathematics and Computation, vol. 16, no. 3, pp. 873-881, 2004.
[21] W. Nather, “Regression with fuzzy data,” Computational Statistics and Data Analysis, vol. 51, no. 1, pp. 235-252, 2006.
[22] M. Nojavan and M. Ghazanfari, “A fuzzy ranking method by desirability index,” Journal of Intelligent and Fuzzy Systems, vol. 17, no. 1, pp. 27-34, 2006.
[23] M. L. Puri and D. A. Ralescu, “Differential of fuzzy functions,” Journal of Mathematical Analysis and Applications, vol. 114, no. 2, pp. 552-558, 1983.
[24] Z. Sener and E. E. Karsak, “A combined fuzzy linear regression and fuzzy multiple objective programming approach for setting target levels in quality function deployment,” Expert Systems with Applications, vol. 38, no. 4, pp. 3015-3022, 2011.
[25] H. Tanaka, I. Hayashi, and J. Watada, “Possibilistic linear regression analysis with fuzzy model,” European Journal of Operational Research, vol. 40, no. 3, pp. 389-396, 1989.
[26] H. Tanaka and H. Lee, “Interval regression analysis by quadratic programming approach,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 6, no. 4, pp. 473-481, 1988.
[27] H. Tanaka, S. Uegima, and K. Asai, “Linear regression analysis with fuzzy model,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 12, no. 6, pp. 903-907, 1982.
[28] H. Tanaka and J. Watada, “Possibilistic systems and their application to the linear regression model,” Fuzzy Sets and Systems, vol. 27, no. 3, pp. 275-289, 1988.
[29] H. C. Wu, “Fuzzy estimates of regression parameters in linear regression models for imprecise input and output data,” Computational Statistics and Data Analysis, vol. 42, no. 1-2, pp. 203-217, 2003.
[30] W. Xizhao and H. Minghu, “Fuzzy linear regression analysis,” Fuzzy Sets and Systems, vol. 51, no. 2, pp. 179-188, 1992.
[31] M. S. Yang and T. S. Lin, “Fuzzy least-squares linear regression analysis for fuzzy input-output data,” Fuzzy Sets and Systems, vol. 126, no. 3, pp. 389-399, 2002.
[32] J. S. Yao and K. Wu, “Ranking fuzzy numbers based on decomposition principle and signed distance,” Fuzzy Sets and Systems, vol. 116, no. 2, pp. 275-288, 2000.