Expert Systems with Applications 39 (2012) 840–848
A fuzzy extended DELPHI method for adjustment of statistical time series prediction: An empirical study on dry bulk freight market case

Okan Duru a,*, Emrah Bulut b, Shigeru Yoshida b

a Department of Maritime Transportation and Management Engineering, Istanbul Technical University, Tuzla 34940, Istanbul, Turkey
b Department of Maritime Logistics, Kobe University, Higashinada 658-0022, Kobe, Japan
Article info

Keywords: Fuzzy-DELPHI; Decision support systems; Forecasting support systems; Dry bulk shipping; Consensus forecasts

Abstract

This paper investigates the forecasting accuracy of fuzzy extended group decisions in the adjustment of statistical benchmark results. DELPHI is a frequently used method for implementing accurate group consensus decisions. The concept of consensus is subject to expert characteristics, and it is sometimes ensured by a facilitator's judgment. Fuzzy set theory deals with uncertain environments and has been adapted for DELPHI, called fuzzy-DELPHI (FD). The present paper extends the recent literature via an implementation of FD for the adjustment of statistical predictions. We propose a fuzzy-DELPHI adjustment process for the improvement of accuracy and introduce an empirical study to illustrate its performance in the validation of adjustments of statistical forecasts in the dry bulk shipping index. © 2011 Elsevier Ltd. All rights reserved.
An earlier version of this paper appeared as part of the ICCAE 2010 Conference Proceedings, published under IEEE copyright.
* Corresponding author. Tel.: +81 80 3830 0190; fax: +81 78 431 6259. E-mail address: [email protected] (O. Duru).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.07.082

1. Introduction

Over the past 50 years, scholars have conducted substantial research on accurate group decisions and the improvement of forecasting tasks. The structure and design of group decision-making is critical for ensuring efficiency and participation stability, as it is a recurring challenge in traditional group meetings. Dalkey and Helmer (1963) were the first scholars to draw attention to the efficiency of group decisions by suggesting the DELPHI consensus building methodology (Linstone & Turoff, 1975; Rowe & Wright, 1999). The DELPHI method has become widely accepted and adopted for many different fields in which group consensus decisions are needed (Fischer, 1981; Parenté, Anderson, Myers, & O'Brien, 1984; Rowe & Wright, 1996). While these studies have focused on conventional group meetings, DELPHI has also been implemented for forecasting purposes, such as forecasting economic and political issues (Parenté et al., 1984), project selection problems (Yang & Hsieh, 2009), and the assessment of gas storage services (Bonacina, Cretì, & Sileo, 2009).

The main attributes of DELPHI are its anonymity, iterative process, feedback process and consensus of group members via equal participation in outcomes (Rowe & Wright, 1999). DELPHI is employed for forecasting purposes by utilizing expert judgments as one of the judgmental forecasting tools. Expert prediction has been suggested by many researchers for forecasting
tasks (Goodwin & Wright, 1993; Lawrence, Edmundson, & O'Connor, 1985; Lawrence & O'Connor, 1992). Ex ante studies of expert prediction have attempted to average the individual interpretations. However, DELPHI consensus offers a unique improvement to expert groups via its iterative and anonymous process, which provides revisions of single forecasts according to overall intentions and prevents individual domination of the group.

The term 'consensus' is one of the critical concepts of DELPHI-like procedures. A consensus defines the degree of agreement on the intended decision task. In most cases, uncertainty is considered to be the opposite of consensus (Zarnowitz & Lambros, 1987). Uncertainty is a term that describes the dispersion of the outcomes. A DELPHI procedure strives to produce a collective group decision, but there may be significant differences in the participants' predictions. There may be no meaningful consensus if the standard deviation of the outcomes reflects a highly diffuse set of results. Despite the iterative structure of the DELPHI procedure, recurring sessions may not assure a consensus of the group.

The manner by which to maintain a group consensus is one of the critical questions in studies of DELPHI-like systems. In most cases, the variable that the group is attempting to predict may take a wide range of values, and it is a crucial task to predict data from a volatile background. Zarnowitz and Lambros (1987) presented an analysis of consensus and uncertainty, and one of their important results was that volatility and uncertainty are highly correlated. Consensus building can be a particularly difficult task for the facilitator when the data regarding the objective variable indicate a non-stationary pattern.

Zadeh (1965) introduced fuzzy set theory, whose primary purpose is to reduce system uncertainty in several applications. Fuzzy logic techniques have
provided considerable improvements in the fields of automation and computer intelligence, which use many uncertain variables. Fuzzy set theory is an alternative for increasing predictive performance. It ensures consensus incentives and compiles expert expectations. The implementation of fuzzy DELPHI methods (FDM) has been investigated by many researchers. Kaufmann and Gupta (1988) introduced an FDM procedure designed to define optimistic, moderate and pessimistic valuations of experts via triangular fuzzy numbers (TFN). In their technology forecasting study, Ishikawa et al. (1993) collected responses on a scale of expert judgments. TFNs use the selected scales as their grades of membership. The final crisp result is calculated by max–min FDM and fuzzy integration. An important conclusion of this study is that the FDM ensured consensus in only one round. Chang, Huang, and Lin (2000) proposed a new FDM approach to assess managerial talent. This study used fuzzy statistics and ensured continuous, mathematically explicit membership functions. Chang et al. (2000) reviewed the previous FDM works and concluded that the FDM provides an opportunity to treat fuzziness arising from the forecasting objective and experts themselves. Fuzzy numbers provide a way to process groups of variables rather than the crisp estimates of the conventional DELPHI method. The original contribution of the present paper is that it improves the FDM for financial time series forecasting and implements an algorithm for defining the final crisp adjustments of statistical benchmark methods. Fuzzy numbers are applied to incorporate the scale of adjustments. An expert group is required to adjust the statistical average forecast provided by linguistic terms that are linked to percentage changes. These adjustments are transferred to intended fuzzy sets. The crisp result is calculated as the centroid of the final average fuzzy set. 
This paper is organized as follows: Section 2 reviews the data on the shipping freight market and detects the unit roots of seasonality in the BDI data. Section 3 presents the forecasting models and procedures. Section 4 examines the empirical results and compares the performance of several results. Section 5 concludes the present paper and provides recommendations for future studies.
2. The Baltic Dry Index and motivation for fuzzy-DELPHI forecasting

Shipping is an important method of world transportation, and most merchandise and industrial products are carried by commercial ships. The price of shipping service (freight rate) is an equilibrium price of the ship's owner–charterer negotiations and is listed in USD per metric ton or USD per day. Freight rates also have a considerable effect on final product prices. In the industry-retail market linkage, shipping margins play a critical role, and in some cases, they are the cause of large price fluctuations (Metaxas, 1971).

The Baltic Dry Index (BDI) is a leading global economic indicator that traces prices for shipping bulk cargoes such as iron ore, coal and grains from producers such as Brazil, Australia and South Africa to markets in the US, Europe, Japan and China. The BDI is useful in that it offers a prognostic for determining where prices for raw materials shipped in dry bulk are rising and, therefore, provides a future look into inflationary or deflationary trends in the prices of goods ('The economists' magazine, October 16th, 2008). The BDI is a cumulative price of several standard contracts, including those of various dry bulk routes, ship sizes and types of contract (voyage domain or time domain charters). A panelist committee of the Baltic Exchange defines BDI levels for trading days based on the reported fixtures, or when there is no reported contract, the committee estimates the market.
The prediction of freight rates has many fields of interest, including transport planning, fixing the prices of finished goods, and financing ships; furthermore, it affects industries linked to shipping. Despite the value of freight forecasting, the task of prediction has some limitations and difficulties in practice in the shipping industry. Previous forecasting studies have suggested some univariate methods and causal models, but the recent behavior of shipping freights has shown highly volatile and sporadic fluctuations. It is subject to changes based on political and behavioral aspects whose prediction is purely a matter of judgment (Duru & Yoshida, 2008a; Duru & Yoshida, 2008b). Due to the lack of accurate statistical results, judgmental forecasting and its improved methods are considered suitable for revealing subjective factors.

Duru and Yoshida (2009) first applied classical DELPHI to short-term freight market forecasting (they also used the BDI series) and suggested DELPHI particularly for the analysis of sporadic conditions. Duru, Bulut, and Yoshida (2010) also applied the fuzzy-DELPHI method to forecasting the dry bulk freight market. In that method, an automatic forecasting model, X12 ARIMA, is used for statistical extrapolation. However, X12 ARIMA has modeling limitations, and some of its processes, such as seasonal differencing, are applied unnecessarily. Compared with conventional DELPHI, the proposed fuzzy-DELPHI method handles uncertainty in the decision space better and also combines the advantages of statistical methods. Since experts are required to adjust statistical benchmark results, the quality of the statistical extrapolation also matters. Therefore, the present paper extends the earlier work with state-of-the-art methods of univariate time series modeling.
Forecasting of the levels of the data is performed by the Box–Jenkins type autoregressive integrated moving average (ARIMA) method, and the volatility of the series is also investigated and predicted by the generalized autoregressive conditional heteroscedasticity (GARCH) model. These methods are discussed in depth in Section 3.

The proposed method contributes to the literature in a number of ways. First, this paper investigates the accuracy of the fuzzy-DELPHI consensus method for a financial forecasting problem. Second, the proposed method does not ignore the capability of statistical methods to extrapolate by recognizing historical patterns, and it combines them with an expert consultation process. Finally, more complex methods of statistical extrapolation are used than in previous studies.

In the present study, the fuzzy-DELPHI (FD) method was implemented for the BDI data, and predictions were calculated for the 2009:11–2010:04 term (monthly average) by statistical methods and expert judgments. The empirical study was conducted in the middle of September 2009 with 11 experts, mostly from the shipbroking and managerial professions. The subjects were asked to define an adjustment to the statistical benchmark forecast.

The BDI raw data consist of a monthly series from 1999:1 to 2009:10. Fig. 1 shows the BDI data history in the sample. The yellow part between 2009:01 and 2009:10 indicates the testing period for statistical extrapolation, and the green part between 2009:11 and 2010:04 indicates the prediction period. Descriptive statistics of the raw BDI data and the first difference series of the BDI data for the period from January 1999 to October 2009 are reported in Table 1. Differencing is normally expected to bring the data closer to normality, but the results clearly reject this assumption. The Jarque–Bera statistics show that neither series is normally distributed. The coefficient of variation for the differenced series is excessively high, which cannot be explained by tangible variables. Fig. 1 and Table 1 show that the freight market has very high volatility and that statistical prediction should contain processes for modeling volatility.

Table 2 presents the results of unit root tests for the levels and the differenced BDI series. The tests are based on the Augmented Dickey–Fuller (ADF) and Phillips–Perron testing procedures (Dickey & Fuller,
[Figure 1 plot: monthly BDI level on the vertical axis (0 to 12000) against time on the horizontal axis (half-yearly ticks from Jan-99 to 2010).]
Fig. 1. The series of the Baltic Dry Index (sample from 1999:01 to 2009:10 and objective term from 2009:11 to 2010:04) (yellow period is the testing term for statistical model. Green period is out of sample prediction term). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 1
Descriptive statistics for levels and logarithmic BDI data (sample from 1999:1 to 2009:10).*

Statistics        BDI               1st Diff. of BDI
N                 130               130
Mean              3156              14.58
Maximum           10843             2556.78
Minimum           803               -3167.57
S.D.              2412              690.89
C.V. (a)          0.76              47.38
Skew              1.41              -0.94
Kurtosis          4.48              9.54
Jarque–Bera (b)   55.36 [0.000]     251.54 [0.000]

* The BDI dataset is supplied from NYK Line Research Group, Tokyo.
(a) The coefficient of variation.
(b) The Jarque–Bera statistic is a measure of normality (Jarque & Bera, 1980).
1981; Phillips & Perron, 1988). According to both tests, the raw data have a unit root, and the first difference of the BDI is stationary, which is a prerequisite of time series modeling.

The final diagnostic test is the seasonal unit root test. Seasonality is one of the main components of many economic time series. Ignoring the effects of seasonality may cause significant inconsistency in the modeling process. Furthermore, modeling is based on continuous and stable epochs in linear or non-linear forms. Therefore, defining the type of seasonality has a critical role, because many scholars have noted the drawbacks of routine adjustments of seasonality (Barsky & Miron, 1989; Hylleberg, Engle, Granger, & Yoo, 1990). Barsky and Miron (1989) pointed out that the routine elimination of the seasonal cycle causes the loss of important information about fluctuations. Hylleberg et al. (1990) first introduced the problem and developed a testing procedure named the HEGY test (HEGY stands for the initials of the authors of Hylleberg et al. (1990)). The HEGY test procedure investigates whether the seasonality has unit roots and how the seasonality repeats itself. Various frequencies are tested, and inferences are drawn from the cumulative analysis. Duru and Yoshida (2009) presented a recent HEGY test and indicated that the seasonality has unit roots. At most frequencies, there are no stationary seasonal cycles, including annual fluctuations. Because of the complexity of the topic (this paper does not deal with extensive statistical procedures) and the recent results of Duru and Yoshida (2009), the present research does not pursue seasonal modeling.

Based on the diagnostic tests, univariate modeling will be performed with the autoregressive integrated moving average method on the differenced series, I(1) (integrated once), and volatility will be estimated by the GARCH approach. Details of these methods are given in the next section.
3. Models of forecasting

The application of the FD method for forecasting dry bulk shipping index proceeds in six steps. As indicated in Fig. 2, the procedure begins with the generation of statistical benchmark results. To determine the universe of discourse, statistical forecasts are recognized as the origin of the adjustments, and upward and
Table 2
Unit root test for BDI series (1999:1–2009:10).

BDI                ADF                         Phillips–Perron
                   Intercept   Int. + trend    Intercept   Int. + trend
Levels             -2.84       -3.42           -2.23       -2.49
1st diff.          -6.88       -6.86           -6.10       -6.06
Critical values
  1%               -3.48       -4.03           -3.48       -4.03
  5%               -2.88       -3.45           -2.88       -3.45
  10%              -2.58       -3.15           -2.58       -3.15
Box and Jenkins (1976) extended ARMA with a differencing operation to ensure the stationarity of the time series. The term "Integrated" enters the model name through the order of differencing (ARIMA), and ARIMA has since been associated with the names of Box and Jenkins. The ARIMA method also provides an additional function that uses autocorrelation and partial autocorrelation to define the order of the ARMA terms. The procedure is outlined as follows:

(1) Testing for stationarity and definition of the order of differencing.
(2) Testing for seasonality and definition of the order of seasonal differencing.
(3) Definition of the order of the ARMA model.
(4) Estimation of the model parameters.
(5) Diagnostic checks for the residual white noise assumption.

A non-seasonal ARIMA model is classified as an "ARIMA(p,d,q)" model, where:
p is the number of autoregressive terms,
d is the number of non-seasonal differences, and
q is the number of lagged forecast errors in the prediction equation.
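The role of the three orders can be illustrated with a minimal sketch in Python. It is not the software workflow used in this study: the function names, the no-intercept autoregressive form and the least-squares fit are illustrative assumptions. The series is differenced once (d = 1), autoregressive terms at chosen lags are fitted on the differenced data, and the forecasts of the differences are accumulated back into levels.

```python
import numpy as np

def difference(series, d=1):
    """Apply d rounds of first differencing (the 'integrated' part of ARIMA)."""
    out = np.asarray(series, dtype=float)
    for _ in range(d):
        out = np.diff(out)
    return out

def fit_ar_ols(x, lags):
    """Least-squares fit of x_t on its own values at the given lags (no intercept)."""
    x = np.asarray(x, dtype=float)
    max_lag = max(lags)
    y = x[max_lag:]
    # Column k holds x_{t-k} aligned with the target x_t.
    X = np.column_stack([x[max_lag - k: len(x) - k] for k in lags])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def forecast_levels(series, lags, coef, steps):
    """Forecast the first differences recursively, then undo the differencing."""
    dx = list(difference(series, d=1))
    level = float(series[-1])
    levels = []
    for _ in range(steps):
        step = sum(c * dx[-k] for c, k in zip(coef, lags))
        dx.append(step)       # the forecast difference feeds later steps
        level += step         # integrate back to the level of the series
        levels.append(level)
    return levels

# Hypothetical toy series whose differences halve at every step.
toy = [100, 108, 112, 114, 115, 115.5]
phi = fit_ar_ols(difference(toy), lags=[1])
print(phi, forecast_levels(toy, [1], phi, steps=2))
```

Moving average terms (q > 0) need an iterative estimator and are left out of this sketch on purpose.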
Fig. 2. The overall procedure for the fuzzy-DELPHI adjustment of statistical forecasts (© 2010 IEEE).
downward increments are calculated as percentage deviations from the statistical forecast (steps 2 and 3). After the fuzzification of the adjustments, the DELPHI group panel is performed, and the results of the group consensus are assigned as a fuzzy DELPHI adjustment of statistical forecasts in step 4. Defuzzification of the average fuzzy adjustment gives a crisp adjustment, which is applied to statistical results to produce the final results of the FD procedure (steps 5 and 6). Brief descriptions of the methods and steps are introduced in Sections 3.1–3.3.
For example, a model with first-order autoregressive and first-order moving average terms under first-order differencing is denoted ARIMA(1,1,1). The volatility part of the prediction is estimated by generalized autoregressive conditional heteroscedasticity (GARCH). GARCH was introduced by Bollerslev (1986) and improves on the ARCH model of Engle (1982) by estimating the conditional variance $\sigma_t^2$ from both the error variances $\varepsilon_{t-q}^2$ and the previous variances $\sigma_{t-p}^2$ in autoregressive form as follows:

\[
\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \cdots + \alpha_q \varepsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p \sigma_{t-p}^2 \tag{2}
\]
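The recursion in Eq. (2) can be sketched in a few lines of Python. This is only an illustration of the variance equation with given coefficients, not an estimator and not the EViews routine used in the study; the function name, the start-up rule (sample mean of squared residuals unless `init_var` is supplied) and the toy numbers are assumptions.

```python
import numpy as np

def garch_variance(residuals, alpha0, alphas, betas, init_var=None):
    """Conditional variances from Eq. (2) for given coefficients.

    alphas[i-1] multiplies the squared residual at lag i and betas[j-1]
    multiplies the conditional variance at lag j. The returned array has
    one more entry than `residuals`: the last entry is the one-step-ahead
    variance forecast.
    """
    eps2 = np.asarray(residuals, dtype=float) ** 2
    n = len(eps2)
    start = float(np.mean(eps2)) if init_var is None else float(init_var)
    sigma2 = np.full(n + 1, start)   # start-up values for the first lags
    m = max(len(alphas), len(betas))
    for t in range(m, n + 1):
        arch = sum(a * eps2[t - i] for i, a in enumerate(alphas, start=1))
        garch = sum(b * sigma2[t - j] for j, b in enumerate(betas, start=1))
        sigma2[t] = alpha0 + arch + garch
    return sigma2

# Hypothetical residuals and coefficients for a GARCH(1,1) illustration.
print(garch_variance([1.0, -2.0, 3.0], alpha0=0.5, alphas=[0.2], betas=[0.3], init_var=1.0))
```

In practice the residuals would come from the fitted mean equation, and the coefficients would be estimated by maximum likelihood rather than supplied by hand.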
Error variances are based on ARIMA forecasts.

3.2. DELPHI group decision
3.1. Generating statistical benchmark results

The statistical part of the present study is carried out in two phases: univariate time series estimation and volatility estimation. In the first phase, the autoregressive integrated moving average (ARIMA) method is used for the extrapolation of statistical predictions. Batchelor, Alizadeh, and Visvikis (2007) performed a series of forecasting exercises for spot and forward freight prices, and the ARIMA results indicated serious reductions of the RMSE (root mean squared error), as compared to the random walk benchmark model, in most of the horizons. For instance, the 20-day horizon results produced reductions of about 36–61% for four different routes, and these results were superior to those of the VAR (vector auto regression), VECM (vector error correction model) and S-VECM (a restricted VECM). Therefore, ARIMA is preferred for the univariate extrapolation.

The autoregressive moving average (ARMA) approach was first suggested by Wold (1938). Wold combined the AR and MA forms into the ARMA process and indicated that it is very useful for many stationary time series. The ARMA process is defined as:

\[
x_t = \Phi_1 x_{t-1} + \Phi_2 x_{t-2} + \cdots + \Phi_p x_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q} \tag{1}
\]

where $x_t$ is the stationary time series and $\varepsilon_t$ is the error term, $\varepsilon_t \sim N(\mu, \sigma^2)$.
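As a hedged illustration of Eq. (1), the following Python sketch computes in-sample residuals and a one-step-ahead prediction for given ARMA coefficients, using the sign convention of Eq. (1) (moving average terms are subtracted). The function name and the treatment of the unobserved start-up values (taken as zero) are assumptions made for the example.

```python
import numpy as np

def arma_one_step(x, phi, theta):
    """Residuals and one-step-ahead forecast under Eq. (1).

    phi[i] multiplies x at lag i+1 and theta[j] multiplies the error at
    lag j+1. Values before the start of the sample are treated as zero.
    """
    x = np.asarray(x, dtype=float)
    e = np.zeros(len(x))

    def predict(t):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        ma = sum(theta[j] * e[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        return ar - ma

    for t in range(len(x)):
        e[t] = x[t] - predict(t)     # error term of Eq. (1) at time t
    return e, predict(len(x))        # forecast for the next, unobserved period

# Hypothetical ARMA(1,1) coefficients and a two-point toy series.
residuals, forecast = arma_one_step([1.0, 0.5], phi=[0.5], theta=[0.2])
print(residuals, forecast)
```

The forecast sets the future error term to its expected value of zero, which is why only past errors appear in the prediction.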
The DELPHI group decision method was developed to improve the accuracy and performance of traditional group decisions, which are based on mutual meetings and are subject to drawbacks based on participant ranks or other psychological factors. A traditional group meeting can easily be affected by conflicts of opinions, and it will be subject to the volatility of personal points of view or prejudices. In optimistic terms, this situation can lead to an adjustment of inaccuracy, but under normal conditions, it means that decision makers will lose their intensity of participation, and a few managers may become dominant in the business judgment. In contrast to the traditional group meeting, the DELPHI group consensus ensures equilibrium of sentiments and also offers some advantages for enlarging the field of innovations and predictions through the individual and private concentration of an expert. DELPHI is widely used in many industries for various purposes, mainly for the prediction and planning of production and supply requirements. DELPHI is also able to generate financial predictions, and it can be implemented to improve the efficiency of an expert group that is required to predict the forthcoming monetary conditions or financial risks. The conventional time series analysis is an extrapolation algorithm for synchronizing future trends by matching the estimation and post-sample patterns. If the series has a stationary trend and the data satisfy the normality assumption, then the traditional time series methods can provide accurate forecasts, and perhaps,
there is no need for any other approach. However, many time series data do not satisfy these assumptions, and judgmental forecasting has a highly important role in this case, as many unusual and sporadic events occur in all industries and business systems. DELPHI group decisions can be performed for many different objectives, and short-term forecasting is one of its crucial fields of application. Although DELPHI is frequently proposed for long-term forecasting, and although it is not suggested for series with sufficient backward data, DELPHI may also improve short-term forecasting performance in some fields of application. The present paper employs the DELPHI process to maintain an equality of expert sentiments and to produce accurate predictions of the BDI. The main tasks of DELPHI are as follows:

• Ensure the anonymity of the information provided by experts and of their identities.
• Support the group by sufficient and timely reports of group sentiments by a feedback connection.
• Strive to gather predictions for consensus building.
• Decide a proper termination time of the session. In most cases, it is adjusted at the second iteration, but the proper number of iterations will be the major judgment of the facilitator.

3.3. Fuzzy forecasting and fuzzy extended DELPHI group decision procedure

Fuzzy set theory was developed by Zadeh (1965) and has been used for many different applications over the second half of the last century. The particular value of a fuzzy set lies in the opportunity it offers for modeling uncertainty. Most systems have uncertainty, which makes them difficult to manage. Fuzzy set theory provides a way to combine a group of data, and the computation of several outcomes can be modeled over fuzzy numbers rather than over a wide range of crisp data sets. Fuzzy set theory is frequently applied to engineering problems and expert system designs.
Forecasting is one of its particular fields of application, and several studies have implemented fuzzy set theory and fuzzy inference systems for forecasting tasks. For example, fuzzy time series analysis is a unique area of fuzzy studies. Several
studies have investigated fuzzy time series methodology and improved forecasting efficiency (Chen, 1996; Song & Chissom, 1993). The traditional DELPHI methodology defines an efficient process for group consensus decisions. For forecasting purposes, the fuzzy extended DELPHI approach improves the availability of a robust agreement among group members in a short period. When the objective task consists of a large range of data and when a system has a particular type of uncertainty (nonlinearity, non-stationarity, sporadic shocks, etc.), the fuzzification of the dataset and execution of the DELPHI procedure over a base of fuzzy numbers can improve the performance of the traditional DELPHI method. In the proposed procedure, the BDI series is adjusted by linguistic variables and their linked intervals of percentage adjustment. The subjects are required to adjust the predictions of the statistical benchmark method to increase their accuracy and to reflect the current judgmental directions of the market in a quantitative model. The FD adjustment approach enables us to reduce data uncertainty and to group the data into linguistic terms. Judgmentally adjusted statistical forecasts are concluded from implementing the defuzzified adjustments. The present research was performed on October 2009, and the forecasting term is defined as the term between November 2009 and April 2010 for monthly averages. Experts were asked to adjust the statistical estimate through the DELPHI group decision. The statistical forecasts were calculated by the benchmark method, and a review report was provided to the participant experts during the first week of October 2009. The report consisted of tabular and graphical presentations of the BDI for several backward histories (i.e., one-year, three-year) and also presented the results of statistical extrapolations. In the first iteration, one week was allowed for experts to complete personal adjustments of the statistical results. 
The adjustments were represented by the intervals of required change, which are based on linguistic variables. The interface window of prediction inquiry for November 2009 average is presented in Fig. 3. The results of the first iteration were returned to the group with a summary report, which included the average, minimum and maximum adjustments for each of the forecasting horizons. During
Fig. 3. The interface window for the fuzzy-DELPHI adjustment study.
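[Figure 4 plot: fuzzification map for adjustments of statistical forecasts. Vertical axis: membership grade $\mu_{\tilde{A}}(x)$ from 0 to 1; horizontal axis: x, from the lower bound of U to the upper bound of U. Fuzzy sets: $\tilde{A}_1$ (absolutely less), $\tilde{A}_2$ (far less), $\tilde{A}_3$ (less), $\tilde{A}_4$ (statistical forecast), $\tilde{A}_5$ (more), $\tilde{A}_6$ (far more), $\tilde{A}_7$ (absolutely more).]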
Fig. 4. Fuzzification map for transformation of DELPHI group adjustments of statistical forecasts.
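[Figure 5 plot: fuzzification map for adjustments of statistical forecasts for November 2009. Vertical axis: membership grade $\mu_{\tilde{A}}(x)$ from 0 to 1; horizontal axis: x, with marked points 2074, 2518, 2740, 2962, 3185, 3407 and 3851 under the fuzzy sets $\tilde{A}_1$ to $\tilde{A}_7$.]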
Fig. 5. A sample fuzzification map of fuzzy-DELPHI adjustments for November 2009 BDI (percentage deviations are applied to statistical value).
the last week of October 2009, the expert group was required to compose one more prediction within three days, after reviewing the first iteration feedback report. Then, the results of the second iteration were collected. The DELPHI sessions were completed after two iterations. The average coefficient of variation (C.V.) decreased between iterations, giving a 38% decline in group variance. The decline in response variance indicates the improvement achieved by the repetitive process.

The percentage adjustments of the DELPHI expert group were collected in linguistic form. The group's linguistic interpretations are transformed by the fuzzification process into their related fuzzy sets. The extended procedure of fuzzification is as follows. A fuzzy set is a group of data with graded levels of membership. Let U be the universe of discourse, with $U = (u_1, u_2, \ldots, u_m)$, where each $u_i$ is a linguistic variable. The definition of a fuzzy adjustment is as follows:

Definition. Let $Y(p)$, $\forall p \in \mathbb{R}: p \to [0, 1]$, indicate the percentage adjustment of the statistical prediction. Let $Y(p)$ be the universe of discourse, U, defined by the fuzzy set $\mu_i(p)$. If $A(p)$ consists of $\mu_i(p)$, $i \in \mathbb{N}^+$, then $A(p)$ is called a fuzzy adjustment on $Y(p)$.

The fuzzy set of judgmental adjustments $\mu_i(p)$ is defined by the percentage change of the statistical results. The upper bound and lower bound of U are defined by analysis of the raw data. After the ratio-to-change (percentage deviation) transformation, the mean, $\mu$, and standard deviation, $\sigma$, are calculated. Since the statistical estimate is expected to lie within the range of the raw data deviations under the normality assumption in the long term, $\pm 3\sigma$ boundaries are selected for the possible deviations of the statistical benchmark. In the BDI raw data, the number of outliers is 9, which is just 1% of all deviations. Therefore, the theoretical 0.98 probability range ($\pm 3\sigma$) is preferred under the normality assumption, $N(\mu, \sigma^2)$. The boundaries are +30% and -30%, respectively. The fuzzy adjustment sets are defined as follows: $\tilde{A}_1 = (-30\%, -30\%, -15\%)$, $\tilde{A}_2 = (-20\%, -15\%, -10\%)$, $\tilde{A}_3 = (-15\%, -7.5\%, 0\%)$, $\tilde{A}_4 = (-7.5\%, 0\%, 7.5\%)$, $\tilde{A}_5 = (0\%, 7.5\%, 15\%)$, $\tilde{A}_6 = (10\%, 15\%, 20\%)$, $\tilde{A}_7 = (15\%, 30\%, 30\%)$. Fig. 4 shows the fuzzy sets with their linguistic terms, and Fig. 5 presents a sample fuzzification map for November 2009 after calculation of the midpoints of the fuzzy sets.
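The seven adjustment sets, read with negative percentages for the "less" terms, can be turned into a fuzzification map around any statistical forecast. The Python sketch below is illustrative only: the dictionary layout, the function names and the forecast value of 3000 are hypothetical, and the linguistic labels follow those used in this paper.

```python
# Triangular adjustment sets in percent, stored as (lower, peak, upper).
ADJUSTMENT_SETS = {
    "absolutely less": (-30.0, -30.0, -15.0),
    "far less": (-20.0, -15.0, -10.0),
    "less": (-15.0, -7.5, 0.0),
    "agree with forecast": (-7.5, 0.0, 7.5),
    "more": (0.0, 7.5, 15.0),
    "far more": (10.0, 15.0, 20.0),
    "absolutely more": (15.0, 30.0, 30.0),
}

def to_levels(forecast, tfn_percent):
    """Convert a percentage triple into index levels around the forecast."""
    return tuple(forecast * (100.0 + p) / 100.0 for p in tfn_percent)

def fuzzification_map(forecast):
    """Map every linguistic term to its triangular set in index levels."""
    return {term: to_levels(forecast, tfn) for term, tfn in ADJUSTMENT_SETS.items()}

# Hypothetical statistical forecast, chosen only to keep the arithmetic simple.
for term, levels in fuzzification_map(3000.0).items():
    print(f"{term:>20}: {levels}")
```

The same mapping applied to the actual monthly benchmark forecast would give the level scale that the experts see for that month.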
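A triangular fuzzy number (TFN) $\tilde{A}$ can be defined as $\tilde{A} = (a, b, c)$ with the membership function

\[
\mu_{\tilde{A}}(x) =
\begin{cases}
0, & x < a \\
\dfrac{x - a}{b - a}, & a \le x < b \\
1, & x = b \\
\dfrac{x - c}{b - c}, & b < x \le c \\
0, & x > c
\end{cases}
\]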
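A short Python sketch may help connect the membership function to the crisp adjustment. It assumes, as this paper states, that expert responses are averaged and that the crisp value is the centroid of the averaged set; the component-wise averaging of the triples, the closed-form centroid (a + b + c) / 3 of a triangle, the function names and the three sample responses are illustrative assumptions rather than the authors' code.

```python
from statistics import fmean

def tfn_membership(x, a, b, c):
    """Membership grade of x in the triangular fuzzy number (a, b, c)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (x - c) / (b - c)       # falling edge

def average_tfn(responses):
    """Component-wise arithmetic mean of the experts' triangular numbers."""
    return tuple(fmean(r[i] for r in responses) for i in range(3))

def centroid(tfn):
    """Horizontal centroid of a triangular membership function."""
    a, b, c = tfn
    return (a + b + c) / 3.0

# Hypothetical panel: two experts answer "more", one agrees with the forecast.
panel = [(0.0, 7.5, 15.0), (0.0, 7.5, 15.0), (-7.5, 0.0, 7.5)]
adjustment = centroid(average_tfn(panel))        # crisp adjustment in percent
adjusted = 3000.0 * (100.0 + adjustment) / 100.0  # applied to a made-up forecast
print(adjustment, adjusted)
```

The boundary sets, whose peak coincides with an end point, are handled by the early returns, so no division by zero occurs.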
According to the above definition, the discrete fuzzy sets can be replaced with the following triangular fuzzy numbers:

$\tilde{A}_1 = (d_0, d_0, d_2)$,
$\tilde{A}_2 = (d_1, d_2, d_3)$,
$\tilde{A}_3 = (d_2, d_3, d_4)$,
...
$\tilde{A}_{m-1} = (d_{m-2}, d_{m-1}, d_m)$,
$\tilde{A}_m = (d_{m-1}, d_m, d_m)$.

The degree to which each datum belongs to a set is indicated by its membership grade in the fuzzy sets. Our empirical study on expert adjustment of statistical forecasts uses the following linguistic fuzzy sets: $\tilde{A}_1$ (absolutely less), $\tilde{A}_2$ (far less), $\tilde{A}_3$ (less), $\tilde{A}_4$ (agree with forecast), $\tilde{A}_5$ (more), $\tilde{A}_6$ (far more) and $\tilde{A}_7$ (absolutely more).

The final crisp adjustment is calculated as the centroid of the final fuzzy set, which is the arithmetic average of the expert responses.

4. Empirical results

The fuzzy-DELPHI method is applied to a survey group for the calibration of statistical forecasting results. The raw BDI data are transformed into the differenced series dBDI, and the remaining process is applied to the differenced data. EViews 6.0 (© QMS LLC) software is used for statistical analysis and modeling purposes. According to the investigation of autocorrelation and partial autocorrelation, the main function is found to consist of autoregressive components at lags of one and four periods. Due to partial autocorrelation
Table 3
GARCH(3,4) estimation of the BDI.

Variables     Coefficient    S.E.       t-Stat    P-value

OLS estimation of dependent variable dBDI
dBDI = a1 dBDI(-1) + a2 dBDI(-4)
dBDI(-1)      0.4341         0.073      5.945     0.000
dBDI(-4)      0.2357         0.073      3.208     0.002

S.E.: 569.58              Q-stat (d) (5): 2.08
R-squared: 0.26           Qsq-stat (d) (5): 100.76
Log-likelihood: 1101.46   ARCH (e) (1): 19.43 (0.000)
DW (a): 1.87              White (f): 14.24 (0.000)
AIC (b): 15.542           Residual SK: 0.37
SBIC (c): 15.583          Residual KR: 6.80

GARCH(3,4) estimation of dependent variable dBDI
$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \alpha_2 \varepsilon_{t-2}^2 + \alpha_3 \varepsilon_{t-3}^2 + \alpha_4 \varepsilon_{t-4}^2 + \beta_1 \sigma_{t-1}^2 + \beta_2 \sigma_{t-2}^2 + \beta_3 \sigma_{t-3}^2$
dBDI(-1)      0.4330         0.071      6.033     0.000
dBDI(-4)      0.093          0.035      2.637     0.008
α0            342617.7       66785.6    5.130     0.000
α1            0.6216         0.192      3.223     0.001
α2            0.3887         0.106      3.657     0.000
α3            0.4327         0.153      2.833     0.004
α4            0.4818         0.122      3.938     0.000
β1            0.3774         0.201      1.869     0.061
β2            0.7293         0.098      7.413     0.000
β3            0.5427         0.170      3.182     0.001

S.E.: 622.85              Q-stat (5): 4.45
R-squared: 0.24           Qsq-stat (5): 10.96
Log-likelihood: 977.87    Residual SK: 0.37
DW: 1.86                  Residual KR: 6.28
AIC: 15.198
SBIC: 15.418

(a) Durbin–Watson statistics (Durbin & Watson, 1950; Durbin & Watson, 1951).
(b) Akaike Information Criterion (Akaike, 1981).
(c) Schwarz Bayesian Information Criterion (Schwarz, 1978).
(d) Ljung and Box (1978) Q statistics for autocorrelations of residuals and squared residuals.
(e) Test of autoregressive conditional heteroscedasticity for residuals (Engle, 1982).
(f) Test of residual heteroscedasticity (White, 1980).
results of squared residuals and squared returns, GARCH(3,4) is selected for volatility extrapolation. The final results of the statistical approach are presented in Table 3. All variables are significant at the 95% confidence level except the one-period lagged conditional variance (P-value 0.061, about 94% confidence), which is still at an acceptable level. The ordinary least squares (OLS) method is used for the autoregressive part. The diagnostic tests indicate that the variance is probably conditional: the Q-statistics for squared residuals, the ARCH(1) test and the White test all support the need for volatility modeling. The GARCH model results are then listed, and the lower AIC and SBIC values indicate that modeling volatility gives the superior specification. The accuracy of the proposed method and of the statistical results is compared using three error metrics. The root mean squared error (RMSE) is calculated as follows:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{n} (Dv_t - Fv_t)^2}{n}} \quad (t = 1, 2, \ldots, n) \qquad (3)$$
RMSE is useful for evaluating the size of errors, and it is particularly sensitive to large errors because squaring magnifies them. RMSE can be normalized to give the normalized RMSE (NRMSE), which is the ratio of RMSE to the range between the maximum and minimum of the series. Another accuracy metric is the mean absolute percentage error (MAPE), defined by:
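$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{Dv_t - Fv_t}{Dv_t} \right| \quad (t = 1, 2, \ldots, n) \qquad (4)$$

MAPE gives the average ratio of the absolute error to the corresponding data value $Dv_t$.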
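As a hedged illustration of Eqs. (3) and (4) and of the NRMSE definition, the following Python sketch computes the three error metrics. The function names and the sample numbers are hypothetical and are not taken from the study; the NRMSE here assumes that the range is taken over the actual series.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error, following Eq. (3)."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)


def nrmse(actual, forecast):
    """RMSE divided by the max-min range (assumed: range of the actual series)."""
    return rmse(actual, forecast) / (max(actual) - min(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, following Eq. (4), as a fraction."""
    n = len(actual)
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / n


# Hypothetical index values, used only to exercise the functions.
actual = [2000.0, 2500.0, 3000.0]
forecast = [2100.0, 2400.0, 3300.0]

print("RMSE :", round(rmse(actual, forecast), 2))
print("NRMSE:", round(nrmse(actual, forecast), 3))
print("MAPE :", round(mape(actual, forecast), 3))
```

MAPE is returned as a fraction rather than a percentage, which matches the scale of the scores reported in Table 4.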
Table 4. The RMSE, NRMSE and MAPE scores of the fuzzy-DELPHI judgmental adjustment study.

                     Fuzzy-DELPHI adj.   ARIMA–GARCH   Naïve
The first round
  MAPE               0.14                0.19          0.16
  RMSE               521                 698           616
  NRMSE              0.41                0.55          0.49
The second round
  MAPE               0.10
  RMSE               345
  NRMSE              0.27
Table 4 gives the results of the accuracy metrics. The no-change model (Naïve) is also presented as a benchmark. The Naïve model assumes that the present value equals the value one period earlier, and it is frequently used as a baseline for comparison. On every accuracy measure, the judgmentally adjusted forecast outperforms the statistical prediction. In particular, the second round of the DELPHI application improved the results considerably: the error measures fall by around 30–40% over the iterative process. The ARIMA–GARCH model is inferior to the simple Naïve method, which indicates that the statistical approach on its own is of little practical use. The fuzzy-DELPHI adjustment increases the accuracy and practical value of the model. Fig. 6 presents the statistical and adjusted series together with the actual dataset. Statistical forecasting produces larger errors particularly at turning points, and expert adjustment can dampen these large deviations.
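The comparisons drawn from Table 4 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the inputs are the rounded scores printed in Table 4, and the no-change rule is written as it is described in the text.

```python
def naive_forecast(series):
    """No-change forecast: predict each value with the previous observation."""
    # The forecasts align with series[1:]; the first observation has no forecast.
    return list(series[:-1])


def relative_reduction(before, after):
    """Share of the first-round error that is removed in the second round."""
    return (before - after) / before


# Rounded fuzzy-DELPHI scores as printed in Table 4.
first_round = {"MAPE": 0.14, "RMSE": 521, "NRMSE": 0.41}
second_round = {"MAPE": 0.10, "RMSE": 345, "NRMSE": 0.27}

reductions = {
    name: relative_reduction(first_round[name], second_round[name])
    for name in first_round
}

for name, value in reductions.items():
    print(f"{name}: {value:.1%} lower in the second round")
```

With the rounded table values, the reductions come out at about 29% for MAPE and about 34% for both RMSE and NRMSE.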
[Figure: line chart with a vertical axis from 0 to 5000 and monthly periods in 2009 and 2010 on the horizontal axis. Series: actual data, statistical prediction, fuzzy-DELPHI adjusted predictions.]
Fig. 6. Actual BDI dataset, statistical predictions and fuzzy-DELPHI adjusted predictions.
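For readers who want to trace how a single adjusted point of the kind plotted in Fig. 6 might be produced, the sketch below follows the rule stated in Section 3: the expert responses are averaged into one fuzzy set and its centroid is taken as the crisp adjustment. Everything numeric here is an assumption made for illustration: the breakpoints of the triangular sets, the three expert responses, the component-wise averaging of triangular numbers, and the additive use of the adjustment are not taken from the study.

```python
# Hypothetical triangular fuzzy numbers (lower, peak, upper) for three of the
# seven linguistic terms, in index points of adjustment. Values are assumed.
LINGUISTIC_SETS = {
    "less": (-200.0, -100.0, 0.0),
    "agree with forecast": (-100.0, 0.0, 100.0),
    "more": (0.0, 100.0, 200.0),
}


def average_fuzzy(responses):
    """Component-wise arithmetic mean of triangular fuzzy numbers."""
    n = len(responses)
    return tuple(sum(r[i] for r in responses) / n for i in range(3))


def centroid(tfn):
    """Centroid of a triangular fuzzy number (a, b, c): (a + b + c) / 3."""
    a, b, c = tfn
    return (a + b + c) / 3.0


def adjusted_forecast(statistical_forecast, expert_terms):
    """Add the crisp group adjustment to a statistical forecast (assumed additive)."""
    responses = [LINGUISTIC_SETS[term] for term in expert_terms]
    return statistical_forecast + centroid(average_fuzzy(responses))


# Hypothetical panel of three experts.
panel = ["agree with forecast", "more", "more"]
adjustment = centroid(average_fuzzy([LINGUISTIC_SETS[t] for t in panel]))
print("crisp adjustment:", round(adjustment, 2))
print("adjusted forecast:", round(adjusted_forecast(3000.0, panel), 2))
```

Averaging before defuzzifying keeps the spread of the panel's opinions in the intermediate fuzzy set, which is one reason to prefer it over averaging crisp scores directly.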
5. Conclusion

Shipping, as a leading indicator of economic activity, has an important role in the world trade system, and it can be considered an indicator of upturns or slowdowns in the economy. Maritime transportation statistics provide a record of world trade traffic (over 80% of world trade uses the seaways), and the first indications of economic activity can thus be deduced from shipping indicators. One of the unique determinants of maritime transportation is the price of the shipping service, called the freight rate. A proper forecast of freight rates has many fields of application, including various industrial, financial and governmental decisions. Because of these uses, forecasting tasks remain necessary even when their predictions are inferior.

The current paper attempts to combine some theoretical implications of conventional DELPHI procedures and fuzzy sets into an applicable hybrid model for shipping businesses and other stakeholders of local and global economic institutions. The adjustment of statistical results is a common process in several businesses, and it is important to define a structure for efficient and proper adjustments. The fuzzy-DELPHI based adjustment procedure is investigated in a dry bulk shipping example, and the results are promising. One critical conclusion is that group consensus was achieved, since a reduction in variance was obtained. Moreover, the fuzzy-DELPHI based study provided superior predictions compared with the statistical benchmark results. In fact, the statistical approach could not outperform the no-change strategy, but the proposed method improved its accuracy through expert-aided design.

Further research could develop a method for optimizing the size of the fuzzy sets or for selecting their shape. The present paper applied triangular sets as in conventional use, but several alternatives exist.
Another contribution would be to bootstrap expert judgments to account for long-term structural deviations, since structural bias may exist in specific expert valuations.

Acknowledgement

An earlier version of this paper appeared in [Duru, O., Bulut, E., & Yoshida, S. (2010). Fuzzy extended group consensus of judgmental adjustments on statistical forecasts. In Proceedings of the International conference on computer and automation engineering, Singapore], as part of the ICCAE 2010 conference proceedings, published under the IEEE copyright.

References

Akaike, H. (1981). Likelihood of a model and information criteria. Journal of Econometrics, 16, 3–14.
Barsky, R. B., & Miron, J. A. (1989). The seasonal cycle and the business cycle. Journal of Political Economy, 97, 503–534.
Batchelor, R., Alizadeh, A., & Visvikis, I. (2007). Forecasting spot and forward prices in the international freight market. International Journal of Forecasting, 23, 101–114.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics, 31, 307–327.
Bonacina, M., Cretì, A., & Sileo, A. (2009). Gas storage services and regulation in Italy: A Delphi analysis. Energy Policy, 37, 1277–1288.
Box, G. E. P., & Jenkins, G. M. (1976). Time series analysis: Forecasting and control (revised ed.). Oakland CA: Holden-Day.
Chang, P. T., Huang, L. C., & Lin, H. J. (2000). The fuzzy Delphi method via fuzzy statistics and membership function fitting and an application to the human resources. Fuzzy Sets and Systems, 112, 511–520.
Chen, S. M. (1996). Forecasting enrolments based on fuzzy time series. Fuzzy Sets and Systems, 81, 311–319.
Dalkey, N. C., & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9, 458–467.
Dickey, D. A., & Fuller, W. A. (1981). Likelihood ratio statistics for autoregressive time series with a unit root. Econometrica, 49, 1057–1072.
Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression, I. Biometrika, 37, 409–428.
Durbin, J., & Watson, G. S. (1951). Testing for serial correlation in least squares regression, II. Biometrika, 38, 159–179.
Duru, O., & Yoshida, S. (2008a). Composite forecast: A new approach for forecasting shipping markets. In Proceedings for the international association of maritime economists conference, Dalian, China, April 2–4.
Duru, O., & Yoshida, S. (2008b). Market psychology. Lloyd’s Shipping Economist, 30, 30–31.
Duru, O., & Yoshida, S. (2009). Judgmental forecasting in the dry bulk shipping business: Statistical vs. judgmental approach. Asian Journal of Shipping and Logistics, 25(2), 189–217.
Duru, O., Bulut, E., & Yoshida, S. (2010). Fuzzy extended group consensus of judgmental adjustments on statistical forecasts. In Proceedings for the 2nd international conference on computer and automation engineering (ICCAE) 2010. Singapore, February 26–28. doi:10.1109/ICCAE.2010.5451405.
Engle, R. F. (1982). Autoregressive conditional heteroskedasticity with estimates of the variance of U.K. inflation. Econometrica, 50, 987–1008.
Fischer, G. W. (1981). When oracles fail – A comparison of four procedures for aggregating subjective probability forecasts. Organizational Behavior and Human Performance, 28, 96–110.
Goodwin, P., & Wright, G. (1993). Improving judgemental time series forecasting: A review of the guidance provided by research. International Journal of Forecasting, 9, 147–161.
Hylleberg, S., Engle, R. F., Granger, C. W. J., & Yoo, B. S. (1990). Seasonal integration and cointegration. Journal of Econometrics, 44, 215–238.
Ishikawa, A., Amagasa, M., Shiga, T., Tomizawa, G., Tatsuta, R., & Mieno, H. (1993). The max–min Delphi method and fuzzy Delphi method via fuzzy integration. Fuzzy Sets and Systems, 55, 241–253.
Jarque, C. M., & Bera, A. K. (1980). Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Economics Letters, 6, 255–259.
Kaufmann, A., & Gupta, M. M. (1988). Fuzzy mathematical models in engineering and management science. Amsterdam: North-Holland.
Lawrence, M. J., Edmundson, R. H., & O’Connor, M. J. (1985). An examination of the accuracy of judgemental extrapolation of time series. International Journal of Forecasting, 1, 25–35.
Lawrence, M. J., & O’Connor, M. J. (1992). Exploring judgemental forecasting. International Journal of Forecasting, 8, 15–26.
Linstone, H. A., & Turoff, M. (1975). The Delphi method: Techniques and applications. London: Addison-Wesley.
Ljung, G. M., & Box, G. E. P. (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297–303.
Metaxas, B. (1971). The economics of tramp shipping. London: Athlone Press.
Parenté, F. J., Anderson, J. K., Myers, P., & O’Brien, T. (1984). An examination of factors contributing to Delphi accuracy. Journal of Forecasting, 3(2), 173–182.
Phillips, P. C. B., & Perron, P. (1988). Testing for a unit root in time series regression. Biometrika, 75, 335–346.
Rowe, G., & Wright, G. (1996). The impact of task characteristics on the performance of structured group forecasting techniques. International Journal of Forecasting, 12, 73–89.
Rowe, G., & Wright, G. (1999). The Delphi technique as a forecasting tool: Issues and analysis. International Journal of Forecasting, 15, 353–375.
Schwarz, G. E. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461–464.
Song, Q., & Chissom, B. S. (1993). Fuzzy forecasting enrollments with fuzzy time series – Part 1. Fuzzy Sets and Systems, 54, 1–9.
Yang, T., & Hsieh, C. H. (2009). Six-sigma project selection using national quality award criteria and Delphi fuzzy multi criteria decision-making method. Expert Systems with Applications, 36, 7594–7603.
White, H. (1980). A heteroscedasticity-consistent covariance matrix estimator and a direct test for heteroscedasticity. Econometrica, 48, 817–838.
Wold, H. (1938). A study in the analysis of stationary time series. Stockholm: Almqvist & Wiksell.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338–353.
Zarnowitz, V., & Lambros, L. A. (1987). Consensus and uncertainty in economic prediction. Journal of Political Economy, 95, 591–621.