Copernican Journal of Finance & Accounting 2015, volume 4, issue 2

e-ISSN 2300-3065 p-ISSN 2300-1240

Žylius G. (2015). Evaluation of ATM Cash Demand Process Factors Applied for Forecasting with CI Models. Copernican Journal of Finance & Accounting, 4(2), 211–235. http://dx.doi.org/10.12775/CJFA.2015.025

Gediminas Žylius* Kaunas University of Technology

Evaluation of ATM Cash Demand Process Factors Applied for Forecasting with CI Models

Keywords: computational intelligence, regression, time series forecasting, cash management, data-based forecasting, daily cash flow.

JEL Classification: C45, C53, G21.

Abstract: The purpose of cash management is to optimize the distribution of cash. Effective cash management brings savings to retail banks related to: dormant cash reduction; reduced replenishment costs; decrease of cash preparation costs; reduction of cash insurance costs. Optimization of cash distribution for retail banking in ATM and branch networks requires estimation of future cash demand/supply. This estimation determines overall cash management efficiency: accurate cash demand estimation reduces overall bank costs. In order to estimate future cash demand, cash flow forecasting must be performed, usually based on historical cash point (ATM or branch) cash flow data. Many factors that are uncertain and may change in time influence the cash supply/demand process of a cash point. These may vary across cash points and are related to location, climate, holiday, celebration day and special event (such as salary days and sales at a nearby supermarket) factors. Some factors affect cash demand periodically. Periodical factors form various seasonalities in the cash flow process: daily (related to intraday factors throughout the day), weekly (mostly related to weekend effects), monthly (related to payday) and yearly (related to climate seasons, tourist and student arrivals, periodical celebration days such as New Year) seasons. Uncertain (aperiodic) factors are mostly related to celebration days that do not occur periodically (such as Easter) and structural break factors that form a long-term or permanent cash flow shift (a new shopping mall near the cash point, a shift of working hours), some of which may be temporary (reconstruction of a nearby building that restricts cash point reachability). These factors form a cash flow process that contains a linear or nonlinear trend, mixtures of various seasonal components (intraday, weekly, monthly, yearly), level shifts and heteroscedastic uncertainty. Historical data-based forecasting models therefore need to approximate the historical cash demand process as accurately as possible, properly evaluating these factors, and to forecast future cash flow based on the estimated empirical relationship.

* Date of submission: August 7, 2015; date of acceptance: October 14, 2015. Contact information: [email protected], Department of Automation, Faculty of Electrical and Electronics Engineering, Kaunas University of Technology, Studentų g. 50-154, LT-51367 Kaunas, Lithuania, phone: +370 642 04 808.

Introduction

The aim of this research is to study how cash flow process factors affect cash flow forecasting accuracy in an ATM network, using computational intelligence (CI) methods as cash flow forecasting models when performing daily aggregated cash flow forecasting. For factor evaluation, 8 typical (affected by different factors) ATM cash withdrawal process flows selected from a real ATM network are used, with a historical period of 33 months. Previous studies of Automated Teller Machine (ATM) withdrawal cash flow (Rodrigues, Esteves 2010) show that this process is strongly affected by calendar factors (day-of-the-week, week-of-the-month, month-of-the-year and holidays). Those effects can be used as numerical input values with CI models. A classical neural network regression approach using calendar effects (working day, weekday, holiday effect, salary day effect) as inputs was used by Kumar and Walia (2006) for cash demand process forecasting. However, cash demand varies in time, and simply using regression inputs without incorporating recent time series information is usually not enough. Simutis et al. (2007) applied a more advanced flexible neural network approach by incorporating both regression inputs (calendar effects) and time series inputs (such as the aggregated cash demand of the previous several days). During the last decade, support vector machines (SVM) were considered for various computational intelligence tasks as an alternative to neural networks because of benefits such as decision stability, less overfitting and better generalization when smaller training (historical) data is available. However, Simutis, Dilijonas and Bastina (2008) show that application of support vector machines to cash demand forecasting has no superiority over neural networks and is even less accurate when a reasonably long historical data period for model training is available. As an alternative to the neural network approach, an interval type-2 fuzzy neural network (IT2FNN) was applied for cash demand forecasting (Darwish 2013). This type of model has both on-line structure and parameter learning abilities that let the model automatically adapt to different cash flow processes. CI models for forecasting are considered to be a more advanced approach. However, time series models are also used for cash flow process forecasting. When comparing classical econometric time series models, research shows (Gurgul, Suder 2013) that SARIMA (seasonal autoregressive integrated moving average) models are most accurate. It is shown (Wagner 2010) that the SARIMA model even outperforms a joint forecasting approach using vector time series models. A comparison between time series probability density forecasting models (such as linear, autoregressive, structural and Markov-switching time series models) is made by Brentnall, Crowder and Hand (2010b), and the results show that the Markov-switching density forecasting model performs best. Deeper investigation of cash demand process factors is also useful for forecasting. Random-effects models (Brentnall, Crowder & Hand 2008, 2010a) are used when modelling individual client cash withdrawal patterns for ATM withdrawal forecasting. Laukaitis (2008) presented research on cash flow forecasting in which intraday cash flow time series are treated as random continuous functions projected onto a low dimensional subspace and a functional autoregressive model is applied as a predictor of cash flow and transaction intensity.

Forecasting models

This section presents the computational intelligence forecasting models that are applied for daily-aggregated ATM cash flow forecasting.

Support vector regression model. Support vector regression (SVR) is the application of the support vector machines (SVM) model to regression problems. SVM was originally proposed by Cortes and Vapnik (1995) as a robust linear model for classification problems. The idea behind SVM is to map input space data vectors to the output space via a high dimensional space called feature space. This mapping is performed using the so-called kernel trick. By doing so, the linear nature of the SVM model can be applied for nonlinear function approximation. The nonlinear mapping requires nonlinear kernel function selection. Gaussian kernel functions are usually used because of their few parameters. In this paper two types of SVR models are applied: 1) ν-support vector regression (ν-SVR) and 2) least squares support vector regression (LS-SVR).

ν-SVR. Let the input data vectors be x_i ∈ R^n (the i-th observation, an n-dimensional vector) and the output (cash flow) data be y_i ∈ R^1. The optimization problem for the ν-SVR algorithm is formulated by the following equations (Chih-Chung & Chih-Jen 2011):

\[
\min_{w,\,b,\,\xi,\,\xi^{*},\,\varepsilon}\;\; \frac{1}{2}w^{T}w + C\left(\nu\varepsilon + \frac{1}{l}\sum_{i=1}^{l}\left(\xi_i + \xi_i^{*}\right)\right)
\]
\[
\text{subject to}\quad
\begin{cases}
\left(w^{T}\phi(x_i) + b\right) - y_i \le \varepsilon + \xi_i,\\
y_i - \left(w^{T}\phi(x_i) + b\right) \le \varepsilon + \xi_i^{*},\\
\xi_i,\ \xi_i^{*} \ge 0,\quad \varepsilon \ge 0,
\end{cases}
\]

where φ(x_i) is the kernel function (Gaussian) that performs the mapping of the input space to the high dimensional feature space (the space where linear regression is performed); w is the parameter vector of the n-dimensional hyperplane; b is the hyperplane bias parameter; ξ_i, ξ_i* are upper and lower training errors (slack variables) subject to the ε-insensitive tube; C is a cost parameter that controls the trade-off between allowing training errors and forcing rigid margins; ν is a regularization parameter that controls the number of support vectors; l is the number of data points (observations). Data points that lie on the boundaries of the ε-insensitive tube are called support vectors. A graphical illustration of the ν-SVR model is depicted in Figure 1.

Figure 1. An illustration of ν-SVR approximation principle

Source: created by authors.

In this research the ν-SVR code implemented in the LIBSVM library is used (see Chih-Chung & Chih-Jen 2011).
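The study drives LIBSVM from MATLAB; purely as an illustrative sketch of the same ν-SVR formulation, scikit-learn's wrapper around LIBSVM can be used in a few lines. The feature matrix, target values and hyperparameters below are placeholders, not the data or settings of the paper:

```python
# Minimal nu-SVR sketch using scikit-learn's wrapper around LIBSVM.
# Data and hyperparameter values are illustrative placeholders only.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 9))   # 9 input features per day (see the inputs section)
y_train = rng.normal(size=200)        # daily aggregated withdrawals (stand-in values)

# nu controls the fraction of support vectors, C the error/margin trade-off,
# and the RBF (Gaussian) kernel performs the implicit feature-space mapping.
model = NuSVR(kernel="rbf", nu=0.5, C=1.0, gamma="scale")
model.fit(X_train, y_train)

X_next = rng.normal(size=(1, 9))      # features of the day to forecast
print(model.predict(X_next))
```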

LS-SVR. The least squares support vector regression (LS-SVR) model is very similar to ν-SVR. The optimization problem is formulated by the following equations (Suykens et al. 2002):

\[
\min_{w,\,b,\,e}\;\; \frac{1}{2}w^{T}w + \frac{\gamma}{2}\sum_{i=1}^{l} e_i^{2}
\qquad \text{subject to}\quad w^{T}\phi(x_i) + b + e_i = y_i,
\]

where e_i are error variables and γ is a regularization constant. Differently from ν-SVR, LS-SVR does not use an insensitive tube and is regularized only by the parameter γ, so it is not as sparse as the ν-SVR model (it has more support vectors). But the least squares loss function brings other flexibility benefits for regression problems. As for the ν-SVR model, Gaussian kernel functions are used for the LS-SVR model.

In this research the LS-SVR code from the LS-SVMlab toolbox presented on its website is used (see Pelckmans et al. 2002).
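The paper relies on the LS-SVMlab MATLAB toolbox. Conceptually, LS-SVR training reduces to a single linear system in the dual variables, which the following self-contained sketch illustrates; the Gaussian kernel width, γ and the random data are illustrative assumptions, not the paper's settings:

```python
# Minimal LS-SVR sketch: training reduces to solving one linear system
# in the dual variables (alpha, b). gamma and the kernel width sigma are
# illustrative placeholders.
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Gaussian kernel matrix between row vectors of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvr_fit(X, y, gamma=10.0, sigma=1.0):
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    # Block system: [[0, 1^T], [1, K + I/gamma]] @ [b, alpha] = [0, y]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                      # bias b, dual weights alpha

def lssvr_predict(X_new, X_train, b, alpha, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 9)), rng.normal(size=100)
b, alpha = lssvr_fit(X, y)
print(lssvr_predict(X[:3], X, b, alpha))
```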

Relevance vector regression model. Relevance vector regression (RVR) (Tipping 2001) is a model that has the same linear functional form as support vector regression:

\[
y(x; w) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0,
\]

where K(x, x_i) is defined as a kernel function and w is the model weight vector. RVR uses even fewer support vectors (it is more sparse than ν-SVR); these are called relevance vectors. Bayesian inference methodology is used during RVR model parameter and relevance vector determination. RVR uses an EM-like (expectation-maximization) learning algorithm and applies a priori distributions (because of the Bayesian methodology) over the parameters. Gaussian kernels, as in both the LS-SVR and ν-SVR cases, are also used with RVR.

In this research the RVR implementation in the SparseBayes package for MATLAB by the RVR author himself (see Tipping 2001) is used.
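Whatever package performs the training, the fitted RVR predictor has the sparse kernel form given above; the toy sketch below only illustrates that functional form. The relevance vectors, weights and kernel width are invented placeholders, not values produced by the study:

```python
# Sketch of the RVR prediction form y(x) = sum_i w_i * K(x, x_i) + w_0.
# Relevance vectors and weights are made-up placeholders; in the paper they
# come from the SparseBayes training procedure.
import numpy as np

def gaussian_kernel(x, xi, sigma=1.0):
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma ** 2))

relevance_vectors = np.array([[0.1] * 9, [0.5] * 9, [0.9] * 9])  # only a few points kept
weights = np.array([1.2, -0.7, 0.4])                             # w_1 .. w_3
w0 = 0.05                                                        # bias term

def rvr_predict(x):
    return sum(w * gaussian_kernel(x, rv)
               for w, rv in zip(weights, relevance_vectors)) + w0

print(rvr_predict(np.full(9, 0.3)))
```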

Feed-forward neural network model. The feed-forward neural network (FFNN) is the most popular artificial neural network architecture for nonlinear function approximation (classification and regression) problems. The architecture of the algorithm was inspired by biological neural networks (brains) and models the interconnection of neurons that can compute values from inputs by feeding information through the network. The FFNN model architecture is depicted in Figure 2.

Figure 2. An illustration of feed-forward neural network architecture

Source: created by authors.

In this research, the logarithmic sigmoid transfer function was used in the hidden layers and a linear transfer function in the output layer. The logarithmic sigmoid transfer function output for an input variable a is equal to:

\[
y = \frac{1}{1 + e^{-a}}.
\]

In this work, the Levenberg–Marquardt (Hagan & Menhaj 1994) backpropagation (Rumelhart, Hinton & Williams 1986) training algorithm was used for feed-forward neural network model training, which is implemented in the MATLAB Neural Networks Toolbox.

Generalized regression neural network model. The Generalized Regression Neural Network (GRNN), first proposed by Specht (1991), is a special case of the radial basis function (RBF) neural network. GRNN does not require an iterative training procedure (error backpropagation as with other neural network architectures). The GRNN training procedure requires only specification of the RBF spread parameter. It uses those functions to cover the input space and approximates the function as a weighted linear combination of radial basis functions. The number of RBF functions is equal to the number of observations (the number of days in the historical daily cash flow). Each RBF is formed for each data point vector, which is a center of the RBF. RBF transfer function values are calculated according to the Euclidean distance from the central point to the input vector. In this research the GRNN implemented in the MATLAB Neural Network Toolbox is used.
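The research uses the MATLAB Neural Network Toolbox GRNN; because GRNN needs no iterative training, its whole prediction mechanism fits in a few lines, sketched here with an arbitrary spread value and stand-in data:

```python
# Minimal GRNN sketch: prediction is a normalized, RBF-weighted average of the
# training targets; the only free parameter is the spread (placeholder value).
import numpy as np

def grnn_predict(X_train, y_train, X_new, spread=0.5):
    preds = []
    for x in X_new:
        d2 = np.sum((X_train - x) ** 2, axis=1)       # squared Euclidean distances
        w = np.exp(-d2 / (2.0 * spread ** 2))         # one RBF centred on every training day
        preds.append(np.dot(w, y_train) / (np.sum(w) + 1e-12))  # weighted average of targets
    return np.array(preds)

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 9)), rng.normal(size=200)
print(grnn_predict(X_train, y_train, X_train[:3]))
```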

Adaptive neuro-fuzzy inference system model. The adaptive neuro-fuzzy inference system (ANFIS) (Jang 1993) combines fuzzy inference system and neural network features: neural network training capabilities (backpropagation) with fuzzy input and output formation (Takagi–Sugeno fuzzy inference system). An architecture of an ANFIS model that has two membership functions is depicted in Figure 3. This type of model has 5 layers: fuzzy layer (1), product layer (2), normalization layer (3), defuzzification layer (4) and summation layer (5).

Figure 3. Illustration of ANFIS model architecture with two membership functions

Source: created by authors.

For a 1st order Sugeno fuzzy model, a typical IF-THEN rule set can be expressed as:
1) IF x is A1 AND y is B1 THEN f1 = p1x + q1y + r1;
2) IF x is A2 AND y is B2 THEN f2 = p2x + q2y + r2.

Further, the functionality of each of the five layers is shortly explained:
■ 1st layer. Forms the output, which determines the membership degree in each of the membership functions (µA1, µA2, µB1, µB2):
\[
O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2, \qquad O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4.
\]
■ 2nd layer. In this layer each node is fixed and represents the weight of a particular rule. In this node the AND operation is performed, which is the product of the inputs:
\[
O_{2,i} = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2.
\]
■ 3rd layer. Each node of this layer is also fixed and calculates the normalized rule excitation degree:
\[
O_{3,i} = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2.
\]
■ 4th layer. This layer is not fixed as the others, and the parameters (p_i, q_i, r_i) are estimated during the training process. The outputs of the nodes are calculated as:
\[
O_{4,i} = \bar{w}_i f_i = \bar{w}_i\left(p_i x + q_i y + r_i\right).
\]
■ 5th layer. This is an output layer, where the output value is calculated as a sum of all inputs:
\[
O_{5,1} = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i}.
\]

In this research, two types of ANFIS model training algorithms are used: classical gradient steepest descent backpropagation and a hybrid training algorithm. Hybrid training combines gradient descent backpropagation and least squares methods. Backpropagation is used to tune the input layer membership function parameters, while the least squares method is used for output function parameter tuning.

For input layer membership function parameter initialization, the fuzzy c-means (FCM) clustering algorithm is used, which partitions the data of the input space into some number (c) of clusters and uses them as input membership function initialization.

In this research the ANFIS model implemented in the MATLAB Fuzzy Logic Toolbox is used.
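To make the five-layer computation concrete, the sketch below runs one forward pass of the two-rule Sugeno system described above with Gaussian membership functions; all membership and consequent parameter values are invented for illustration and are not parameters learned in the study:

```python
# One forward pass through the 5 ANFIS layers for the two-rule Sugeno system.
# All membership-function and consequent parameters are illustrative placeholders.
import numpy as np

def gauss_mf(v, c, s):
    # Gaussian membership function with centre c and width s.
    return np.exp(-((v - c) ** 2) / (2.0 * s ** 2))

x, y = 0.4, 0.7                                   # crisp inputs

# Layer 1: membership degrees (A1, A2 on x; B1, B2 on y).
muA = [gauss_mf(x, 0.0, 0.5), gauss_mf(x, 1.0, 0.5)]
muB = [gauss_mf(y, 0.0, 0.5), gauss_mf(y, 1.0, 0.5)]

# Layer 2: rule firing strengths (product AND).
w = [muA[0] * muB[0], muA[1] * muB[1]]

# Layer 3: normalized firing strengths.
w_bar = [wi / sum(w) for wi in w]

# Layer 4: weighted rule consequents f_i = p_i*x + q_i*y + r_i.
p, q, r = [1.0, -0.5], [0.3, 0.8], [0.1, 0.0]
f = [p[i] * x + q[i] * y + r[i] for i in range(2)]
layer4 = [w_bar[i] * f[i] for i in range(2)]

# Layer 5: overall output is the sum of the layer-4 outputs.
print(sum(layer4))
```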

Extreme learning machine model. Extreme learning machines (ELM) (Huang, Zhu & Siew 2006) have the same architecture as single hidden layer feed-forward neural networks. Differently from conventional neural network architectures, ELM does not require backpropagation for parameter tuning. Instead, hidden node weights are chosen randomly and output weights are determined analytically. The main advantage of this type of learning is speed, which is many times faster than conventional iterative tuning (such as backpropagation).

Given N observations (x_i, y_i), a single layer neural network output with M hidden nodes is modeled as:

\[
o_i = \sum_{j=1}^{M} \beta_j\, g\!\left(w_j \cdot x_i + b_j\right),
\]

where x_i is the i-th input vector; w_j is the weight vector connecting the j-th hidden node and the input nodes; β_j is the weight scalar connecting the j-th hidden node and the output node; b_j is the bias parameter of the j-th hidden node. In this research linear output nodes and sigmoid hidden nodes are used. The above equation can be written in vector form:

\[
T = H\beta,
\]

where H is the N × M hidden layer output matrix with H_{ij} = g(w_j \cdot x_i + b_j). The solution applying ELM theory is simply estimated as:

\[
\hat{\beta} = H^{\dagger} T,
\]

where H^{\dagger} = (H^{T}H)^{-1}H^{T} is the Moore–Penrose generalized inverse (pseudoinverse) matrix.

A MATLAB implementation of classical ELM is used in this research, which is available at the authors' webpage (see Huang, Zhu & Siew 2006).
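Since the output weights have the closed-form solution above, a complete ELM regressor is only a few lines. The sketch below follows those equations; the hidden layer size, random seed and data are illustrative placeholders rather than the configuration used in the paper:

```python
# Minimal ELM regression sketch: random hidden layer, analytic output weights.
# Hidden layer size, seed and data are illustrative placeholders.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def elm_fit(X, y, n_hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input-to-hidden weights w_j
    b = rng.normal(size=n_hidden)                 # random hidden biases b_j
    H = sigmoid(X @ W + b)                        # hidden layer output matrix H
    beta = np.linalg.pinv(H) @ y                  # output weights via Moore-Penrose pseudoinverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return sigmoid(X @ W + b) @ beta

rng = np.random.default_rng(1)
X, y = rng.normal(size=(300, 9)), rng.normal(size=300)
W, b, beta = elm_fit(X, y)
print(elm_predict(X[:3], W, b, beta))
```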
Experimental data

In this research, 8 different typical ATM daily withdrawal data series are used, with a historical period of up to 990 days. Those 8 ATMs were selected from a large database and represent typical cash flow factors that occur in the cash flow process. Further, a short explanation of every ATM is given:
■ ATM number 1 contains cash flow with a strong weekly seasonality factor;
■ ATM number 2 contains cash flow with strong yearly and weekly seasonality factors, when the yearly seasonality is smooth;
■ ATM number 3 contains cash flow with a strong monthly seasonality factor;
■ ATM number 4 contains cash flow with strong yearly and weekly seasonality factors, when the yearly seasonality is not smooth (temporal structural breaks);
■ ATM number 5 contains cash flow with weak mixed (weekly and monthly) seasonality, increasing trend and heteroscedasticity factors with a permanent structural break factor (a sudden cash flow break when the ATM cash flow decreases);
■ ATM number 6 contains cash flow with mixed (weekly and monthly) seasonality and permanent structural break factors (a sudden cash flow break when the ATM cash flow increases);
■ ATM number 7 contains cash flow with weekly seasonality and temporal structural break factors (a sudden cash flow break when the ATM cash flow decreases and again increases to the normal cash flow level after some period);
■ ATM number 8 contains cash flow with weekly seasonality, increasing trend and heteroscedasticity factors.

Cash flow pictures together with autocorrelation functions are presented in the Appendix for every ATM.

Methodology

In order to evaluate forecasting accuracy, a specific metric is needed. In this research a forecasting accuracy metric called symmetric mean absolute percentage error (SMAPE) is used, which is calculated with the following formula:

\[
\mathrm{SMAPE} = \frac{100}{n}\sum_{t=1}^{n}\frac{\left|F_t - A_t\right|}{\left(\left|A_t\right| + \left|F_t\right|\right)/2},
\]

where F_t is the forecasted cash flow value and A_t is the real cash flow value.
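A direct implementation of this metric, following the averaging convention of the formula above (the values in the call are illustrative only):

```python
# SMAPE as defined above: mean of |F - A| / ((|A| + |F|) / 2), in percent.
import numpy as np

def smape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(forecast - actual) / denom)

print(smape([100, 120, 80], [110, 100, 90]))   # illustrative values
```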

Forecasting is performed by training a model with one part of the historical cash flow dataset, and testing (forecasting accuracy evaluation) is done with another part of the historical cash flow dataset. Four cases of training for every ATM are performed (Figure 4): 1) models are trained with a 2 year (or 800 days for some ATMs) historical period; 2) models are trained with a 1.5 year (or 400 days for some ATMs) historical period; 3) models are trained with a 1 year (or 200 days for some ATMs) historical period; 4) models are trained with a 0.5 year (or 100 days for some ATMs) historical period. But for testing, the same amount of data is used for every training case.
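A sketch of this data division for one ATM series, assuming (as Figure 4 suggests) that every training window ends where the common test window begins; the series values and the test window length are placeholders, since the paper only states that the test set is the same across cases:

```python
# Split one daily cash flow series into four training windows of different
# lengths that all end at the same point, plus one common test window.
import numpy as np

series = np.arange(1000, dtype=float)     # stand-in for a daily withdrawal history
test_len = 200                            # placeholder: same test set for every case
train, test = series[:-test_len], series[-test_len:]

train_lengths = {"2y/800d": 800, "1.5y/400d": 400, "1y/200d": 200, "0.5y/100d": 100}
splits = {name: (train[-n:], test) for name, n in train_lengths.items()}

for name, (tr, te) in splits.items():
    print(name, "train days:", len(tr), "test days:", len(te))
```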

Figure 4. Illustration of ATM cash flow data division

Source: created by authors.

In order to train a model for forecasting, some input features that represent cash flow factors must be constructed. In this research 9 features are constructed as inputs for every forecasting model (i represents the forecasting day index):
■ 1st input. Month number (numbers 1–12);
■ 2nd input. Day of the month (numbers 1–31);
■ 3rd input. Day of the week (numbers 1–7);
■ 4th input. Withdrawal amount of the i-1 day (previous day);
■ 5th input. Withdrawal amount of the i-7 day (previous week day);
■ 6th input. Withdrawal amount of the i-14 day (previous 2nd week day);
■ 7th input. Withdrawal amount of the i-21 day (previous 3rd week day);
■ 8th input. Withdrawal amount of the i-28 day (previous 4th week day);
■ 9th input. Sum of withdrawals of the last 5 days (i-5 to i-1).

Those features were selected after experimental studies. In order to evaluate which features (factors) are best for each particular type of ATM cash flow forecasting, one must perform forecasting with all combinations of features, which, using the binomial formula, gives

\[
\sum_{k=1}^{9}\binom{9}{k} = 2^{9} - 1 = 511
\]

training cases. However, using knowledge from previous experimental studies, features 5 to 8 are considered as one set, so the combination set reduces from 9 to 6 and the number of combinations reduces from 511 to 63.
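An illustrative construction of these nine inputs from a daily withdrawal series (pandas is used here for convenience; the withdrawal values are stand-ins and the column names are ad hoc):

```python
# Build the nine model inputs for each forecasting day i from a daily series.
# Withdrawal values are stand-in data; feature definitions follow the list above.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-01", periods=990, freq="D")
cash = pd.Series(rng.gamma(2.0, 500.0, size=990), index=idx, name="withdrawals")

features = pd.DataFrame({
    "month":        idx.month,                      # 1st input: 1-12
    "day_of_month": idx.day,                        # 2nd input: 1-31
    "day_of_week":  idx.dayofweek + 1,              # 3rd input: 1-7
    "lag_1":        cash.shift(1),                  # 4th input: previous day
    "lag_7":        cash.shift(7),                  # 5th input: previous week day
    "lag_14":       cash.shift(14),                 # 6th input
    "lag_21":       cash.shift(21),                 # 7th input
    "lag_28":       cash.shift(28),                 # 8th input
    "sum_last_5":   cash.shift(1).rolling(5).sum(), # 9th input: sum of days i-5..i-1
}, index=idx)

data = features.join(cash).dropna()                 # drop days without a full 28-day history
print(data.head())
```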

CI forecasting models have specific external parameters that need to be properly adjusted during the training phase. In order to statistically minimize CI forecasting model overfitting and underfitting during the training phase, a 10-fold cross-validation procedure is used with every forecasting model that needs parameter tuning. Parameters tuned during the training phase are used for the testing phase.
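As an illustration of this tuning step, a 10-fold cross-validated grid search with SMAPE as the selection criterion can wrap any of the models; the choice of ν-SVR here and the parameter grid are placeholders, not the grids used in the study:

```python
# Hedged sketch of 10-fold cross-validated hyperparameter tuning scored by SMAPE.
# The model choice and parameter grid are illustrative placeholders.
import numpy as np
from sklearn.svm import NuSVR
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer

def smape(actual, forecast):
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(forecast - actual) / denom)

scorer = make_scorer(smape, greater_is_better=False)   # lower SMAPE is better
grid = {"nu": [0.25, 0.5, 0.75], "C": [1.0, 10.0], "gamma": ["scale", 0.1]}

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 9)), rng.normal(size=200)

search = GridSearchCV(NuSVR(kernel="rbf"), grid, scoring=scorer, cv=10)
search.fit(X, y)
print(search.best_params_, -search.best_score_)        # best settings and their SMAPE
```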


Results

This section presents the forecasting accuracy results. Forecasting accuracy results averaged over all 7 forecasting models are presented in Table 1. The first row of the table gives the ATM numbers according to their type, as described in the Experimental data section. The second to fifth rows show the average SMAPE (averaged over all model forecasts) for a particular training type when the inputs that yield the best average SMAPE amongst all forecasting models and training types (averaged over all seven models and all four training types) are selected. The sixth row shows those best average input numbers according to their type, as described in the Methodology section. The seventh row shows the best SMAPE averaged over all models when the best input set for a particular training type (not for all training types) is selected. The eighth row shows those best input numbers for that particular training type. The other rows show the same kind of results for the other training types. Forecasting results averaged over all forecasting models and sorted according to the average of all forecasting models and training types for each particular input type (sorted 63 inputs) are depicted in the Appendix. Further, an explanation of the forecasting results for each ATM is given:
■ ATM number 1: the worst accuracy is achieved with the shortest training history (0.5 year) for most training inputs (see figure in the Appendix). Training with a long history (2 years) does not improve the average forecasting results, and the best results are achieved with 1.5 year training – a trade-off between overfitting because of too short a history and overfitting because of learning patterns that do not reoccur in the future. The best average input set consists of the weekly values of cash flow (1–4 weeks before, see Table 1). However, the week number was not selected as a best input over all models; this shows that the cash flow is non-stationary and a recent history update of the cash flow pattern improves forecasting even under strong weekly seasonality.
■ ATM number 2: as for ATM 1, the worst results are achieved with the shortest history training. Now a long history is more significant than for ATM 1, and the best results are achieved with 2 year training (mostly over all input sets, see figure in the Appendix). However, the best average input set does not include the month number or the day of month. The week number and the 1–4 previous weeks' cash flows are related to the strong weekly seasonality, and the moving average (9th input) lets the model react more adaptively to recent cash flow trend variations caused by the yearly seasonality (see Table 1).

■ ATM number 3: the results are not as affected by the history period length as in the ATM 2 case, but the worst results are also obtained using 0.5 year training. The month and day of the week numbers were not included in the best input set; this shows that no significant weekly or yearly seasonality that might affect forecasting took place. The other included inputs are related to monthly periodicity and help to forecast the changing monthly pattern.
■ ATM number 4: surprisingly, as for the yearly seasonality in ATM 2, the month number input was not included in the best input set, even though this type of ATM does not have a smooth transition of the yearly pattern. This might be explained by a stronger impact of recent cash flow values (moving average, cash flow of the day before, or 1–4 previous week cash flows) on a sudden change in cash flow than of the yearly-related (month number, day of month) regression inputs: even though the yearly seasonality is strong, it has a quasi-periodic structure, and the values of the last year only partially reoccur in the current year on the same day. However, as for ATM number 2, a long history is needed in order to forecast more accurately.
■ ATM number 5: interesting results are obtained for this kind of ATM. As expected, the most accurate results for most of the inputs (see figure in the Appendix) are achieved when the shortest history (after the sudden cash flow process structural break occurs) is used. This is because the cash flow training data after the structural break constitute a relatively larger part of the training examples than for longer period training data, so the model adapts to the recent break. But it is interesting that this effect is only partial: the results are worse for 1.5 year training, but for 2 year training the accuracy increases. This is explained by the integral part of the ATM cash flow (increasing trend): at the beginning the model learns the small level cash flow relationship, which is similar to that after the structural break. As expected, for this kind of ATM with a structural break, inputs that encode the recent past are more effective than regression inputs.
■ ATM number 6: even though this ATM has a structural break like ATM number 5, the structural break impact is less obvious. This might be related to the direction of the structural break and the break level difference. An explanation of the forecasting results is more difficult: the shortest history learning is not the best (looking at the figure in the Appendix it seems worst) because of overfitting caused by too short a training history, even though it contains history without the structural break. A long history contains the structural break, so this disturbs model training and overfitting also takes place. The day of month input is included in the best input set because of the monthly seasonality, and the 1–4 previous week inputs add to both weekly and monthly (because of 4 weeks) pattern forecasting. But the moving average and previous day cash flow inputs were not included; this might be because of the too small structural break. However, the 1–4 previous week cash flow inputs contain recent past information, together encoding the weekly and monthly seasonality patterns.

of month input is included in best input set because of monthly seasonality, and 1–4 previous week inputs add to both weekly and monthly (because of 4 weeks) pattern forecasting. But moving average and previous day cash flow inputs were not included. This might be because of too small structural break. However 1–4 previous week cash flow inputs contain recent past information together encoding weekly and monthly seasonality patterns. ATM number 7: because temporal structural break was relatively short, 2 year training still performed best. 1 year training performed worst because full temporal break was included and concluded biggest part of training data compared to 2 or 1.5 year training cases and this lead to model overfitting. 0.5 year training didn’t include temporal structural break part, but didn’t perform best because of too short training history. Because of weekly seasonality, best input set included related inputs. ATM number 8: looking at figure in Appendix it is seen that using 2 year history training is worst case for most input sets, so this means that if ATM has growing tendency, forgetting the past is good, but too few training values will not train the model with maximum accuracy, so proper decision for model training is needed because of short and long history training trade-off. Moving average input is selected to best input set and helps when with trend forecasting, weekly-related inputs correspond to weekly seasonality factors in ATM cash flow process.
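Much of the per-ATM discussion above comes down to how much history the model is allowed to see relative to structural breaks and seasonal cycles. The following is a small illustrative sketch, on a synthetic series with an assumed break (not the study's data or protocol), of cutting the four training-window lengths from Table 1 before a forecast day.

```python
import numpy as np

# Training-history lengths compared in the study (days, as labelled in Table 1)
HISTORY_LENGTHS = {"2 year": 800, "1.5 year": 400, "1 year": 200, "0.5 year": 100}

def training_window(series, forecast_day, history_days):
    """Slice of the daily series used to fit a model that forecasts
    `forecast_day` using at most `history_days` of preceding history."""
    start = max(0, forecast_day - history_days)
    return series[start:forecast_day]

# Synthetic series with an upward level shift (structural break) at day 600
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(100, 10, 600), rng.normal(180, 10, 400)])

for name, days in HISTORY_LENGTHS.items():
    window = training_window(series, forecast_day=900, history_days=days)
    # Shorter windows are dominated by post-break data and adapt to the new
    # level; longer windows mix both regimes (and, on real data, are the only
    # ones long enough to capture yearly seasonality).
    print(f"{name:>8} ({days} days): mean level {window.mean():.1f}")
```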

Table 1. Forecasting results (in SMAPE, %) over all forecasting models for every ATM

ATM Number                                                  |      1 |      2 |       3 |        4 |      5 |      6 |       7 |      8
2 Year/800 Days Training (Best Avg. Inputs)                 |  40.64 |  59.84 |   66.30 |    55.00 |  83.27 |  39.30 |   50.94 |  43.04
1.5 Year/400 Days Training (Best Avg. Inputs)               |  39.88 |  61.53 |   67.02 |    56.70 |  85.38 |  40.21 |   51.16 |  43.09
1 Year/200 Days Training (Best Avg. Inputs)                 |  40.55 |  64.66 |   70.05 |    57.35 |  80.89 |  39.76 |   51.80 |  42.60
0.5 Year/100 Days Training (Best Avg. Inputs)               |  40.95 |  67.07 |   69.96 |    57.27 |  82.48 |  42.61 |   51.24 |  43.26
Best Avg. Inputs                                            |   5678 | 356789 | 2456789 |      349 |  56789 | 235678 |   35678 | 356789
2 Year/800 Days Training (Best 2 Year/800 Days Inputs)      |  40.64 |  59.84 |   66.30 |    55.00 |  82.40 |  39.30 |   50.71 |  43.04
Best 2 Year/800 Days Inputs                                 |   5678 | 356789 | 2456789 |      349 |      9 | 235678 |  235678 | 356789
1.5 Year/400 Days Training (Best 1.5 Year/400 Days Inputs)  |  39.42 |  61.53 |   66.92 |    55.71 |  83.31 |  40.21 |   51.16 |  43.09
Best 1.5 Year/400 Days Inputs                               |  35678 | 356789 |  245678 |       39 |      9 | 235678 |   35678 | 356789
1 Year/200 Days Training (Best 1 Year/200 Days Inputs)      |  40.55 |  64.66 |   67.24 |    56.72 |  80.89 |  39.76 |   51.74 |  42.60
Best 1 Year/200 Days Inputs                                 |   5678 | 356789 | 1245678 | 13456789 |  56789 | 235678 | 2345678 | 356789
0.5 Year/100 Days Training (Best 0.5 Year/100 Days Inputs)  |  40.95 |  66.88 |   67.17 |    57.27 |  75.45 |  41.48 |   51.08 |  42.52
Best 0.5 Year/100 Days Inputs                               |   5678 | 3456789 |    234 |      349 |    123 | 245678 |  235678 | 345678

Source: own studies.
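The "Best Avg. Inputs" rows of Table 1 are obtained by ranking candidate input sets by SMAPE averaged over all seven models and all four training histories. The sketch below illustrates only that selection logic; the evaluation function is a placeholder (the study fits the actual CI models), the model names are stand-ins, and the enumerated subsets are not the study's exact list of 63 candidate input sets.

```python
import itertools
import numpy as np

MODELS = [f"model_{i}" for i in range(1, 8)]      # stand-ins for the seven CI models
HISTORY_DAYS = [800, 400, 200, 100]               # the four training-history lengths
INPUTS = range(1, 10)                             # the nine candidate inputs

def evaluate_smape(model, input_subset, history_days):
    """Placeholder: in the study this would fit `model` on `history_days` of
    history using only `input_subset` and return the hold-out SMAPE (%)."""
    rng = np.random.default_rng(abs(hash((model, input_subset, history_days))) % 2**32)
    return 40.0 + 30.0 * rng.random()

def rank_input_sets(candidate_sets):
    """Average SMAPE over all models and training histories for every
    candidate input set, best (smallest) first."""
    scores = {
        subset: np.mean([evaluate_smape(m, subset, h)
                         for m in MODELS for h in HISTORY_DAYS])
        for subset in candidate_sets
    }
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy run over all two-element input sets only, to keep the example small
candidates = list(itertools.combinations(INPUTS, 2))
best_subset, best_score = rank_input_sets(candidates)[0]
print("best input set (toy run):", best_subset, round(best_score, 2))
```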

 Conclusions and future works

After obtaining forecasting results for various types of ATM cash flows, two important conclusions can be drawn.

First, choosing the model training history is very important. Using a long history is useful if yearly seasonality factors are relatively important in the ATM cash flow process. If the cash flow is relatively stationary (as for ATM number 1 and ATM number 3) and has strong monthly or weekly seasonality, using more history does not increase forecasting accuracy. If the ATM cash flow process contains structural breaks or various trend components, the training history length must be selected carefully in order to avoid overfitting.

Second, proper input selection is even more important than the choice of training history period. Time-series inputs such as the previous day's cash flow value, the moving average and the cash flow values of the 1–4 previous weeks proved to be more efficient input features than regression inputs, because they encode recent history and thus carry time-varying information. Calendar-effect forecasting inputs are therefore not as important as time-series feature inputs; purely regression inputs are effective only when stationarity of the cash flow can be assumed, and a combination of both may increase forecasting accuracy. ■
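To make the distinction between regression (calendar) inputs and time-series inputs concrete, the sketch below assembles one candidate feature vector of both kinds for a daily series. It is a sketch under assumptions: the helper name, the seven-day moving-average window and the exact composition are illustrative, while the study's own input numbering is defined in its methodology section.

```python
import datetime as dt
import numpy as np

def build_features(series, dates, t, ma_window=7):
    """Assemble one candidate input vector for forecasting day t of a daily
    cash-flow series: calendar (regression) inputs followed by time-series
    inputs of the kinds discussed in the conclusions."""
    d = dates[t]
    calendar_inputs = [
        d.month,                      # month number (yearly seasonality)
        d.day,                        # day of month (payday / monthly effects)
        d.isocalendar()[1],           # week number
        d.weekday(),                  # day of week (weekly seasonality)
    ]
    time_series_inputs = [
        series[t - 1],                               # cash flow on the previous day
        *[series[t - 7 * k] for k in range(1, 5)],   # cash flow 1-4 weeks before
        float(np.mean(series[t - ma_window:t])),     # moving average of recent days
    ]
    return np.array(calendar_inputs + time_series_inputs, dtype=float)

# Usage on a synthetic two-year daily series (not the study's data)
start = dt.date(2013, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(730)]
series = np.random.default_rng(1).gamma(shape=2.0, scale=50.0, size=730)
print(build_features(series, dates, t=400))
```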




In future work, multiple-day-ahead forecasting will be investigated, which is more practical for cash flow forecasting. Multiple-day-ahead forecasting is, however, a far more challenging research object for CI models because of factors related to error accumulation and conditional probability estimation. Automatic statistical methods for identifying the ATM type, and thus for selecting the inputs and the history period, will also be investigated.
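One common way to obtain multiple-day-ahead forecasts from a one-day-ahead model is the recursive scheme sketched below, where each forecast is fed back as if it were an observation. This is only an illustration of why errors accumulate over the horizon, not necessarily the scheme that will be adopted in the follow-up work.

```python
import numpy as np

def recursive_forecast(model, history, horizon, n_lags):
    """Recursive multiple-day-ahead scheme: forecast one day, append the
    forecast to the history and repeat. Each step feeds earlier forecast
    errors back into the inputs, which is why errors tend to accumulate."""
    extended = list(history)
    forecasts = []
    for _ in range(horizon):
        x = np.array(extended[-n_lags:])        # most recent lagged values
        y_hat = model(x)                        # one-day-ahead prediction
        forecasts.append(y_hat)
        extended.append(y_hat)                  # treat the forecast as observed
    return np.array(forecasts)

# Toy usage with a naive "model": mean of the last week as tomorrow's value
history = np.array([120, 90, 60, 150, 200, 170, 140, 130, 95, 70, 160, 210, 175, 145])
week_mean_model = lambda lags: float(np.mean(lags))
print(recursive_forecast(week_mean_model, history, horizon=5, n_lags=7))
```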

 References

Brentnall, A.R., Crowder, M.J., & Hand, D.J. (2008). A statistical model for the temporal pattern of individual automated teller machine withdrawals. Journal of the Royal Statistical Society: Series C (Applied Statistics). 57(1). 43–59. http://dx.doi.org/10.1111/j.1467-9876.2007.00599.x.
Brentnall, A.R., Crowder, M.J., & Hand, D.J. (2010a). Predicting the amount individuals withdraw at cash machines using a random effects multinomial model. Statistical Modeling. 10(2). 197–214. http://dx.doi.org/10.1177/1471082X0801000205.
Brentnall, A.R., Crowder, M.J., & Hand, D.J. (2010b). Predictive-sequential forecasting system development for cash machine stocking. International Journal of Forecasting. 26. 764–776.
Chih-Chung, C., & Chih-Jen, L. (2011). LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2(27). 1–27. http://www.csie.ntu.edu.tw/~cjlin/libsvm (accessed: 29.03.2015).
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning. 20(3). 273–297. http://dx.doi.org/10.1007/BF00994018.
Darwish, S.M. (2013). A methodology to improve cash demand forecasting for ATM network. International Journal of Computer and Electrical Engineering. 5(4). 405–409. http://dx.doi.org/10.7763/IJCEE.2013.V5.741.
Ekinci, Y., Lu, J.C., & Duman, E. (2015). Optimization of ATM cash replenishment with group-demand forecasts. Expert Systems with Applications. 42(7). 3480–3490. http://dx.doi.org/10.1016/j.eswa.2014.12.011.
Gurgul, H., & Suder, M. (2013). Modeling of withdrawals from selected ATMs of the Euronet network. AGH Managerial Economics. 13. 65–82. http://dx.doi.org/10.7494/manage.2013.13.65.
Hagan, M.T., & Menhaj, M. (1994). Training feed-forward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks. 5(6). 989–993.
Huang, G.B., Zhu, Q.Y., & Siew, C.K. (2006). Extreme learning machine: theory and applications. Neurocomputing. 70. 489–501. http://www.ntu.edu.sg/home/egbhuang/elm_codes.html (accessed: 29.03.2015).
Jang, J.S.R. (1993). ANFIS: adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics. 23(3). 665–685. http://dx.doi.org/10.1109/21.256541.
Kumar, P., & Walia, E. (2006). Cash forecasting: An application of artificial neural networks in finance. International Journal of Computer Science & Applications. 3(1). 61–77.
Laukaitis, A. (2008). Functional data analysis for cash flow and transactions intensity continuous-time prediction using Hilbert-valued autoregressive processes. European Journal of Operational Research. 185(3). 1607–1614. http://dx.doi.org/10.1016/j.ejor.2006.08.030.
Pelckmans, K., Suykens, J.A.K., Van Gestel, T., De Brabanter, J., Lukas, L., Hamers, B., De Moor, B., & Vandewalle, J. (2002). LS-SVMlab: a Matlab/C toolbox for Least Squares Support Vector Machines. Internal Report 02-44, ESAT-SISTA, KU Leuven, Leuven, Belgium. http://www.esat.kuleuven.be/sista/lssvmlab/old/lssvmlab_paper0.pdf (accessed: 29.03.2015).
Rodrigues, P., & Esteves, P. (2010). Calendar effects in daily ATM withdrawals. Economics Bulletin. 30(4). 2587–2597.
Rumelhart, D.E., Hinton, G.E., & Williams, R.J. (1986). Learning representations by back-propagating errors. Nature. 323. 533–536.
Simutis, R., Dilijonas, D., & Bastina, L. (2008). Cash demand forecasting for ATM using neural networks and support vector regression algorithms. Proceedings of the twentieth EURO mini conference on continuous optimization and knowledge-based technologies (EurOPT-2008). Neringa, Lithuania. 416–421.
Simutis, R., Dilijonas, D., Bastina, L., & Friman, J. (2007). A flexible neural network for ATM cash demand forecasting. Proceedings of the sixth WSEAS international conference on computational intelligence, man-machine systems and cybernetics (CIMMACS 07). 162–165.
Specht, D.F. (1991). A general regression neural network. IEEE Transactions on Neural Networks. 2(6). 568–576.
Suykens, J.A.K., Van Gestel, T., De Brabanter, J., De Moor, B., & Vandewalle, J. (2002). Least Squares Support Vector Machines. World Scientific. Singapore.
Tipping, M.E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research. 1. 211–244. http://www.miketipping.com/sparsebayes.htm (accessed: 29.03.2015).
Wagner, M. (2010). Forecasting daily demand in cash supply chain. American Journal of Economics and Business Administration. 2(4). 377–383. http://dx.doi.org/10.3844/ajebasp.2010.377.383.

 Appendix

[Figures. For each of the eight ATMs the Appendix shows three panels: the daily cash flow (x-axis: Day Sequence Number), its Sample Autocorrelation Function (x-axis: Lag), and the Forecasting Results in SMAPE for the 2 year, 1.5 year, 1 year and 0.5 year (800, 400, 200 and 100 days) training histories, plotted against the Sequence Number of the Mean-Sorted Feature Set.]