Applied Economics, 2002, 34, 1607–1615
Neural network forecasts of input–output technology

CHRISTOS T. PAPADAS* and W. GEORGE HUTCHINSON

Agricultural University of Athens, Department of Agricultural Economics, Iera Odos 75, Votanikos 118 55, Athens, Greece and The Queen's University of Belfast, Department of Agricultural and Food Economics, Newforge Lane, Belfast BT9 5PX, Northern Ireland, UK
A significant part of the literature on input–output (IO) analysis is dedicated to the development and application of methodologies for forecasting and updating technology coefficients and multipliers. Prominent among such techniques is the RAS method, while more information-demanding econometric methods, as well as other less promising ones, have been proposed. However, there has been little interest in the use of more modern and often more innovative methods, such as neural networks, in IO analysis in general. This study constructs, proposes and applies a Backpropagation Neural Network (BPN) with the purpose of forecasting IO technology coefficients and, subsequently, multipliers. The RAS method is also applied to the same set of UK IO tables, and the discussion of the results of both methods is accompanied by a comparative analysis. The results show that the BPN offers a valid alternative way of IO technology forecasting, and many forecasts were more accurate using this method. Overall, however, the RAS method outperformed the BPN, but the difference is too small to be systematic and there are further ways to improve the performance of the BPN.
I. INTRODUCTION

The costs of compiling accurate input–output (IO) tables, in terms of both resources and time spent, have made the updating and projecting of IO technology coefficients an important issue, at least for short- and medium-run input–output analysis. The existing techniques usually rely on a set of IO coefficients for a base year and on some incomplete information on total interindustry transactions and outputs. This is the case for the RAS method, introduced by Stone (1961) and Stone and Brown (1962), which has since found numerous applications. Other methods based on a very similar logic and data requirements have been proposed by Matuszewski et al. (1964) and Friedlander (1961) but have not been seen as serious improvements over RAS (while Friedlander's method also has the major drawback of possibly producing negative forecasts). In addition, Snower (1990) extended the RAS method to account for price movements, proposing a technique that requires a substantially larger amount of information than the traditional RAS method. Other than RAS, proposed methods include the 'Best Practice Method' of Miernyk (1970), the 'Ex Ante Method' (Fisher and Chilton, 1975; Hamilton, 1982), and the use of econometric estimates of production frontiers for IO forecasting (Field, 1986). Despite criticisms of the 'restrictive' economic logic in the assumptions of RAS (Harrigan, 1982), the method remains the most widely used, analysed, and extended one. Reasons for this include the very demanding information requirements of other methods, their ability to produce inconsistent and even impossible results, computational difficulties, and other large costs involved, which diminish the advantage of short- to medium-run IO forecasting over the actual construction of IO transactions tables. Newer methodologies and technologies, however, have not yet made it into input–output analysis, even though their use is increasing in other areas of quantitative
* To whom correspondence should be addressed: [email protected]
economics, mostly for the purposes of forecasting and optimization. In some instances they are used for theoretical analysis. These methodologies include applications of neural networks, genetic algorithms and fuzzy logic. This paper introduces the use of neural networks in IO analysis. In particular, it constructs, proposes, and applies a backpropagation neural network (BPN) with the objective of updating and forecasting IO technology coefficients, and therefore IO multipliers as well. Certain difficulties have prevented the acceptance of neural networks as a valid alternative to existing methods, especially the RAS, and the structure of the proposed BPN aims at overcoming these difficulties in an effective way. The ability to assimilate and process a large amount of data, their flexibility, and the lack of relatively restrictive assumptions make the newer technologies and methodologies promising not only in technology forecasting, but also in other areas of IO analysis.
II. THE RAS METHOD

If $A = [a_{ij}]$ is the known IO coefficient matrix at a given base year and $B = [b_{ij}]$ the unknown similar matrix of another period, according to the RAS method B can be forecasted if, for the same period, we know the gross output $q_i$ of every sector i, the purchases of total intermediate inputs $v_i$ for every sector i, and the sales of total intermediate inputs $u_i$ to the other sectors for every sector i. Given this new information, RAS attempts to find technology coefficients as 'close' as possible to those of the base period, even though 'closeness' can be defined in different ways. In particular, the forecasted matrix is given at the final stage of the method's algorithm by:

$$B = RAS \quad (1)$$

where R and S are diagonal matrices whose respective diagonal elements $r_i$ and $s_i$ account for the 'substitution' and 'fabrication' effects. In particular, it was argued (Stone and Brown, 1963) that each element of A is subject to an effect of substitution, representing the degree to which the sectoral output i has been substituted as an intermediate input in the production processes, and it is this effect that is measured by $r_i$. Similarly, each element of A is subject to a 'fabrication' effect, representing the degree to which the ratio of primary to intermediate inputs needed in the production of sectoral output i has changed, and this effect is measured by $s_i$. (Leontief has also interpreted each $r_i$ as a reflection of proportional changes in the productivity of each sector i, and each $s_i$ as a reflection of changes in the efficiency of the ith sector in the use of all intermediate inputs.) The solution of the RAS algorithm can also be obtained by solving the following system with respect to the necessary coefficients of B and the fabrication and substitution effects:

$$\sum_j r_i a_{ij} s_j q_j = u_i \quad \text{where } i = 1, 2, \ldots, n \quad (2)$$

$$\sum_i r_i a_{ij} s_j q_j = v_j \quad \text{where } j = 1, 2, \ldots, n \quad (3)$$

and

$$b_{ij} = r_i a_{ij} s_j \quad (4)$$

Finally, Bacharach (1970) has shown that the RAS solution, in achieving 'closeness' in terms of forecasted IO coefficients, can also be obtained by minimizing the following objective function with respect to the $b_{ij}$:

$$\sum_i \sum_j b_{ij} \log\left(\frac{b_{ij}}{a_{ij}}\right) \quad (5)$$

subject to:

$$\sum_j b_{ij} q_j = u_i \quad i = 1, 2, \ldots, n \quad (6)$$

$$\sum_i b_{ij} q_j = v_j \quad j = 1, 2, \ldots, n \quad (7)$$
The RAS method guarantees the non-negativity of its results, while zero coefficients in the base year will result in zero coefficients being forecasted for these particular intermediate transactions. Similarly, non-zero coefficients in the base year will not become zero in the forecasted technical coefficient matrix. The main criticism of RAS relates to the fact that, as can be seen from Equations 2–4, the substitution and fabrication effects $r_i$ and $s_i$ for each sector i work uniformly through all sectors in the corresponding rows and columns during the calculation process. This is a restriction, despite the fact that the different combination of $r_i$ and $s_j$ for every coefficient allows, of course, each one of them to take a different path towards its value in the forecasted matrix B.
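To make the algorithm concrete, the following is a minimal sketch of the classical iterative RAS procedure corresponding to Equations 1–7 (Python with numpy is assumed; the function name, tolerance and iteration cap are illustrative choices, not part of the original method's specification):

```python
import numpy as np

def ras_update(A, q, u, v, tol=1e-9, max_iter=1000):
    """Biproportional (RAS) update of an IO coefficient matrix.

    A : (n, n) base-year technical coefficients a_ij
    q : (n,)   new-period gross outputs q_j
    u : (n,)   new-period intermediate sales (row totals), Eq. 6
    v : (n,)   new-period intermediate purchases (column totals), Eq. 7
    Assumes strictly positive totals and sum(u) == sum(v).
    """
    Z = A * q                        # implied flows z_ij = a_ij * q_j
    for _ in range(max_iter):
        r = u / Z.sum(axis=1)        # row factors ('substitution' effects)
        Z = r[:, None] * Z
        s = v / Z.sum(axis=0)        # column factors ('fabrication' effects)
        Z = Z * s[None, :]
        if np.abs(Z.sum(axis=1) - u).max() < tol:  # rows still consistent?
            break
    return Z / q                     # forecast coefficients b_ij = z_ij / q_j
```

Note that zeros in A stay zero under the row and column scalings, and all forecasts remain non-negative, exactly the properties of RAS noted above.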
III. AN OVERVIEW OF THE BPN ALGORITHM

Neural networks represent parallel computing processes which, in various ways, simulate biological intelligence. This does not imply that the simulation approaches the complexity of the human brain, but some of the existing knowledge of the brain's functioning has provided the main idea behind neural networks. The one employed in this study is the Backpropagation Neural Network (BPN), or Feedforward Neural Network. Every input (here, an independent variable) is represented by a neuron (input neuron), and all input neurons are thought of as being placed in a layer, the input layer. Input neurons are not connected among themselves, but they are all connected to
all neurons of another (higher) layer, which is called the middle (or hidden) layer. Again, the neurons of the middle layer are not connected among themselves, but each one of them is connected to the neurons of another (higher) middle layer, and so on. There is no predetermined number of middle layers or of neurons in the middle layers. The final layer, however, is called the output layer and consists of output neurons, each one representing an output (here, a dependent variable). There can therefore be more than one dependent variable. Each connection of a neuron i to a neuron j of the next layer is given a weight $w_{ij}$. Neurons are also called nodes. Briefly and in general, the BPN operates as follows. An input pattern (a set of input values) is fed to the corresponding input nodes. Every input neuron sends this 'signal' to each node of the next ('above') middle layer with which it is connected. This signal is adjusted, however, by the weight of the particular connection. The total signal received by each middle layer node is therefore the sum of the weighted signals sent by each node of the input layer. Each node of the middle layer will process the total signal received through an output ('transformation') function and send a signal to all nodes of the next layer, and each of these signals will again be adjusted by the weight of the particular connection (weights are initially given arbitrary values). Hence, every node of this second middle layer will receive a total weighted signal, and the process goes on as described above. Eventually, such processed and adjusted signals will reach the output nodes of the output layer. Each output node will therefore receive such signals from all nodes of the 'previous' middle layer, process them through its own output function, and produce an output signal (assessment, forecast, etc.) corresponding to this particular output (here, dependent) variable. The outputs produced by the output neurons are now compared to the real output pattern (output values), assumed here to be known, which corresponds to the initial input pattern fed to the input layer of the BPN. Input and output patterns of the training data set are also called training facts. The difference between the produced and the real values of the dependent variables represented by the output neurons results in an error signal for each output node. All errors from each output node are now transmitted backwards to all the nodes of the previous layer. The receiving nodes do not get the error signals as such but only a portion of them, based roughly on the contribution that the receiving node has made to the output produced at the output node. The process is repeated from layer to layer until all nodes in the BPN receive an error signal based on their relative contribution to the output produced at each output node in the output layer. Based on these error signals received, the connection weights between each node and the nodes of the next
layer are updated until the BPN converges to a state where all training patterns (the input patterns given to the input layer and the real, known output patterns) are 'encoded'. This implies that the weights are now such that if a training input pattern is given again to the BPN, the latter will produce a set of output values similar to the real ones. The problem, therefore, is to find values for all connection weights such that, when a set of values (input pattern) is given to the independent variables, the values produced for the dependent variables are similar to the actual ones (output pattern) for the given input pattern. It must be clarified that the BPN uses the whole data set ('makes a run'); it then calculates the errors it makes, adjusts its weights, and repeats the process, further adjusting the weights by going through the whole training data set again on each run. It is assumed that after this successful training of the BPN, the underlying relationship between independent and dependent variables has been captured by the estimated weights. The BPN should then be able to produce, from its output layer nodes, reliable forecasts or estimates of the dependent variables when we do not know their values, any time the respective input neurons are provided with values for the independent variables.
A synoptic formal description of BPN

This section presents, in a formal summarized form, the operation of the BPN. The existence of one middle ('hidden') layer is assumed, and hence this BPN has three layers in total. There have been several different descriptions of the BPN in the literature, some of them written especially for economists (Kastens et al., 1995). The description adopted here is based on Freeman and Skapura (1991), where a more detailed explanation of the BPN's operation along the following lines is presented. Assume that we have P vector-pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_P, y_P)$ representing an unknown functional relationship $y = g(x)$, where $x \in R^N$ and $y \in R^M$. The purpose is to train the three-layer BPN of N input nodes (independent variables) and M output nodes (dependent variables), using the P vectors, to 'learn' the true relationship $g(\cdot)$ by adjusting its weights. The steps that summarize the BPN's operation are the following:

(1) An input vector $x_r = (x_{r1}, x_{r2}, \ldots, x_{rN})^t$ is applied to the input layer of N nodes (where t simply denotes the number of times this vector is applied to the input layer, since during training it is applied several times). Every input node sends the received signal to all nodes of the middle layer, adjusted by the weight of each particular connection, and therefore all middle ('hidden') layer nodes receive adjusted signal inputs from all input nodes. The net input to the jth node of the hidden layer (h) is therefore given by:
$$net^h_{rj} = \sum_{i=1}^{N} w^h_{ji} x_{ri} + \theta^h_j$$
where $w^h_{ji}$ is the weight of the connection between the ith input node and the jth node of the hidden layer, and $\theta^h_j$ is a 'bias term' which reflects the existence of an additional neuron in the input layer that constantly produces the value 1, and whose connection weights are subject to updating like all other weights. Such a node has been found to improve the performance of the BPN.

(2) This net input 'activates' the jth node of the hidden layer and produces an output according to the output function $i_{rj} = f^h_j(net^h_{rj})$. This process happens in all hidden layer nodes. Output functions can take one of several forms, but the sigmoidal one is usually adopted, given its monotonicity, its output-limiting property and its differentiability.

(3) Each hidden layer node sends its signal to all output nodes, adjusted in the process by the connection weights. In particular, each node of the output layer receives as a signal the total input:

$$net^o_{rk} = \sum_{j=1}^{L} w^o_{kj} i_{rj} + \theta^o_k$$
Here $net^o_{rk}$ is the net input to the kth node of the output layer, coming from all nodes of the hidden layer $j = 1, 2, \ldots, L$, and $w^o_{kj}$ is the weight of the connection between the jth node of the middle layer and the kth output node. A node constantly producing the value 1 is also included in the middle layer.

(4) All output nodes process their received input and produce their output. The kth output node, for example, produces the output $o_{rk} = f^o_k(net^o_{rk})$, where $f^o_k(net^o_{rk})$ is the output function of the kth output node.

(5) The error between the output (estimated value of the dependent variable) of the output node and the real one is given by $\delta_{rk} = (y_{rk} - o_{rk})$, where $y_{rk}$ is the real output (value of the dependent variable) of output node k, corresponding to the input pattern r. It is this error that the BPN 'sees' and tries to minimize by updating the connection weights. In fact, for all given input and output patterns, the BPN trains and updates weights minimizing half the sum of squared errors (Generalized Delta Rule, GDR):

$$E_p = \frac{1}{2} \sum_{k=1}^{M} \delta^2_{rk} \quad (8)$$
We need to define the following quantities: $\delta^o_{rk} = (y_{rk} - o_{rk}) f^{o\prime}_k(net^o_{rk})$ for the output nodes, where the prime indicates the derivative of the transformation function with respect to $net^o_{rk}$, and similarly $\delta^h_{rj} = f^{h\prime}_j(net^h_{rj}) \sum_k \delta^o_{rk} w^o_{kj}$ for the hidden layer nodes.
(6) The weights of connections between the middle and output layer nodes are updated according to $w^o_{kj}(t+1) = w^o_{kj}(t) + \eta \delta^o_{rk} i_{rj}$, where $\eta$ is a predetermined 'learning rate' parameter and t denotes the number of times the input and output pattern r has been applied, until the BPN is trained properly. As mentioned earlier, the BPN does not finish training with one input and output pattern before it moves to the next one. Each time, it goes through all the data again and again ('runs'), adjusting the weights. The procedure described takes place for all input and output patterns used in the training process.

(7) The weights of connections between the input and middle layer nodes are updated as $w^h_{ji}(t+1) = w^h_{ji}(t) + \eta \delta^h_{rj} x_i$. The order of the updates within an individual layer is not important.

The training stops when the BPN 'learns' the relationship between inputs and outputs and, given the final values of the updated weights, application of all input patterns will produce the true output patterns from the output nodes. The error measurement minimized during training for the above purpose is always the one given by Equation 8. The learning rate parameter is positive and usually less than 1. It is related to the minimization of (8) through a steepest descent algorithm. A larger value of the parameter will result in a speedier training process, but carries the danger of obtaining a local rather than a global minimum of (8) when the BPN converges, after which further training does not improve the weights or the accuracy of the BPN predictions of the given output patterns from the input ones. A relatively small value of the parameter will result in a larger number of runs (iterations) t and a longer training time, but increases the chances of reaching BPN convergence and final weight values at a lower or global minimum of (8). However, reduction of the parameter value beyond a specific minimum may not provide better results and may result in non-convergence of the training process. Freeman and Skapura (1991) give derivations of the formulas for the updating of the weights.
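As an illustration only, steps (1)–(7) can be condensed into a few lines of code. The sketch below (Python with numpy assumed; the initialization range, run count and function names are assumptions, not the authors' implementation) uses the sigmoidal output function, for which $f' = f(1 - f)$, so the deltas and updates match the GDR expressions above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bpn(X, Y, L=7, eta=0.7, runs=5000):
    """Three-layer BPN trained by the Generalized Delta Rule.

    X : (P, N) input patterns; Y : (P, M) output patterns; L hidden nodes.
    Bias terms are folded in as extra constant-1 inputs, matching the
    'constant 1' neurons described in the text.
    """
    N, M = X.shape[1], Y.shape[1]
    Wh = rng.uniform(-0.5, 0.5, (L, N + 1))     # hidden weights w^h_ji (+ bias)
    Wo = rng.uniform(-0.5, 0.5, (M, L + 1))     # output weights w^o_kj (+ bias)
    for _ in range(runs):                       # one 'run' = pass over all facts
        for x, y in zip(X, Y):
            xb = np.append(x, 1.0)              # bias neuron emitting 1
            i_h = sigmoid(Wh @ xb)              # i_rj = f(net^h_rj)
            ib = np.append(i_h, 1.0)
            o = sigmoid(Wo @ ib)                # o_rk = f(net^o_rk)
            d_o = (y - o) * o * (1 - o)         # delta^o_rk, using f' = f(1 - f)
            d_h = i_h * (1 - i_h) * (Wo[:, :L].T @ d_o)   # delta^h_rj
            Wo += eta * np.outer(d_o, ib)       # w^o_kj += eta * delta^o_rk * i_rj
            Wh += eta * np.outer(d_h, xb)       # w^h_ji += eta * delta^h_rj * x_ri
    return Wh, Wo

def run_bpn(Wh, Wo, X):
    """Forward pass of the trained network (the 'running' stage)."""
    H = sigmoid(np.hstack([X, np.ones((len(X), 1))]) @ Wh.T)
    return sigmoid(np.hstack([H, np.ones((len(X), 1))]) @ Wo.T)
```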
IV. EMPIRICAL ANALYSIS

A neural network of the BPN type was constructed and trained in order to forecast and update input–output technical coefficients. The objective was to do so using as input patterns only the amount and kind of information needed for the application of the RAS method. The results of the BPN forecasts were analysed to assess its performance using certain criteria. In addition, the RAS method was also applied and its results were compared with those of the BPN, to see whether the BPN can offer a credible alternative to RAS.
The data used were the aggregated IO tables of the UK economy for the years 1984 and 1992. The UK economy was divided into seven major sectors: Agriculture, Energy, Manufacturing, Construction, Distribution, Transportation, and Services. The level of aggregation depends on the type and purpose of IO analysis, but the BPN could be used regardless of the size of the input and output patterns, since the computational power required is easily available for any conceivable size of IO tables. The current size of the aggregated IO tables provides 49 observations (input and output patterns) from the 1984 IO tables for training. The forecasted results are compared with the real IO coefficients of the 1992 tables. Various types of IO forecasting and RAS applications with a similarly high level of aggregation can be found in the literature; of particular interest is the study by Toh (1998), which also provides an instrumental variable approach to the interpretation of structural changes through the substitution and fabrication effects. It must also be clarified that the UK IO tables, and the transaction matrices from which the technical coefficients are derived, are the commodity-by-industry accounts of the UK economy, which account for secondary production. Direct coefficient matrices, even though they are referred to as 'Leontief', are in fact of the Stone type. The economy is not divided simply into economic sectors to which we assign the total value of their commodity output (including secondary production of commodities), but into specific commodities. The commodity-by-commodity technical coefficients used indicate the amount of one commodity needed per unit of output of another commodity, regardless of the producing sector.

Structure of the BPN used

In order to confine the study to the use of the same information as in RAS, a BPN is constructed that associates, during training, the same kind of information as in the case of RAS (but now for the base period) with the IO coefficients of the base period. For an output pattern $a_{ij}$, the respective inputs are therefore the intermediate sales $u_i$, the intermediate purchases $v_j$, and the total outputs $q_i$ and $q_j$ for the commodities in the row and column of the specific coefficient (the notation follows Section II). Even though only $q_j$ appears in the presentation of RAS through (2)–(4) and (5)–(7), all $q$s are considered, since the relationships apply for all commodities i. Also, in every iteration of the original RAS algorithm (Miller and Blair, 1985), the outputs for both i and j are considered. In a more direct association method, therefore, both should be considered in every observation used. Once training and the weight updating process are completed, providing the BPN with the same kind of information for the period under forecast (assumed known, as in RAS) should lead to the desired forecasts, using the updated connection weights and output functions of the
nodes. However, the four inputs $u_i$, $v_j$, $q_i$, and $q_j$ were transformed into two by taking the ratios $u_i/q_i$ and $v_j/q_j$. These form the first two input nodes in the input layer. Adopting these two inputs provides several advantages. First of all, both inputs represent ratios taking values between zero and one (they are certainly less than one due to the existence of final demand and primary inputs). Hence, both training inputs are of similar scales, which is known to facilitate the training of the BPN. They are also of a similar scale to the training output, since the latter, being an IO coefficient, is a ratio too. The main advantage, however, lies in the later application ('running') of the trained BPN for forecasting. Using inputs for the period of forecasting of the same scale as the training inputs allows for more reliable results. This is because the relationship between inputs and outputs outside the range of the training patterns may be different and not captured during the calculation of the training weights. The RAS method, however, indirectly uses more information than that formally described. This is because it is an algorithm which is 'cell-sensitive' in the IO coefficient matrix throughout its process. The exact position of a coefficient's cell, in terms of its specific row and column, is in fact 'considered' by the algorithm, which after several matrix operations converges to a solution and provides a forecast for a coefficient at its respective cell in the forecasted IO coefficient matrix. Accounting for the specific row and the specific column of a technical coefficient (a different combination each time for each one of them) captures several technical characteristics or 'particularities' of the sectors or commodities involved, beyond the pieces of information mentioned. This is, in fact, the reason that econometric forecasts of IO coefficients need more information on the relevant sectors or commodities than the RAS method. It is believed that this also partially explains the absence of BPN forecasting from IO analysis, since with only the two input nodes adopted so far, the BPN does not yield accurate forecasts, certainly much worse than those provided by RAS. This problem was addressed by making the BPN account for the specific row and column of each coefficient, borrowing an idea from econometrics. As input nodes, 'dummy neurons' were introduced, each of which represents a row or a column. Every row and every column is represented by a dummy row or column neuron; therefore, in this particular case, fourteen such nodes were introduced, accounting for the seven rows and seven columns of the coefficient matrix. Every row dummy neuron takes the value 1 if the inputs refer to that row and 0 otherwise. Similarly, every column dummy neuron takes the value 1 if the inputs refer to that column and 0 otherwise. The values of 1 and 0 are also very close to the scale of the output and the other input values. Hence, together with
the two input nodes mentioned earlier, there are in total sixteen nodes in the input layer. According to the above, if all available data for 1984 (49 observations) are used in training, every row dummy neuron of the input layer takes the value 1 seven times (for the seven columns of the tables), since there are seven coefficients in the different columns that belong to the same row. In the remaining 42 observations, the input values of that node are zero. Similarly, each of the column dummy neurons takes the value 1 seven times, corresponding to the seven coefficients in the seven rows of the particular column represented by the column dummy node. In the other 42 observations the node takes the value zero. All row dummy nodes therefore take the value 1 seven times and the value 0 the remaining times, but the observations in which they take the value 1 differ from one row dummy to another, since they represent different rows. Similarly, all column dummy nodes take the value 1 seven times and the value 0 the remaining times, but the observations in which they take the value 1 differ from one column dummy node to another, since they represent different columns. To put it differently, for each row and column dummy node, not only is the value 1 taken only seven times (the rest of the observations for each node being zero) but, in addition, there is only one observation in which a particular row dummy node and a particular column dummy node simultaneously take the value 1 (all other dummy nodes taking the value 0). Each one of the 49 such combinations represents a particular matrix cell (row-column) of a particular coefficient. The BPN employed therefore has sixteen input layer nodes: two representing and taking the values of $u_i/q_i$ and $v_j/q_j$, as explained earlier, and fourteen dummy nodes taking the value 1 or 0, while there is one output node in the output layer representing and taking the values of the coefficient $a_{ij}$ corresponding to the inserted input data for a given i and j each time. There is only one middle layer, as is usually the case, with seven nodes. There is no clear-cut rule on the preferred structure in terms of middle layers and nodes, only some rules of thumb that were used and experimented with to suggest numbers. Larger and smaller numbers of hidden nodes did not improve the results and, beyond a certain point, results began to worsen. Similarly, alternative learning rates were tried, and finally $\eta = 0.7$ was adopted. A training tolerance of 0.06 was used. This means that during training the BPN was programmed to consider that an output pattern was learned (predicted accurately) if the forecast error was no more than 6% of the total range of the output values (here, the IO coefficients).
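To make the encoding concrete, a sketch of how the 49 training facts could be assembled is given below (Python with numpy assumed; the function name is illustrative, and the u/v notation follows Section II). Each fact pairs the two ratio inputs and the fourteen 0/1 dummies with the corresponding coefficient:

```python
import numpy as np

def make_patterns(A, q, u, v):
    """Build the BPN training facts from one year's IO data.

    For each coefficient a_ij, the input pattern is
    [u_i/q_i, v_j/q_j, 7 row dummies, 7 column dummies]  (16 nodes)
    and the output pattern is a_ij itself (1 node).
    """
    n = A.shape[0]                        # n = 7 sectors -> 49 facts
    X, Y = [], []
    for i in range(n):
        for j in range(n):
            row_d = np.eye(n)[i]          # row dummy neurons: 1 for row i
            col_d = np.eye(n)[j]          # column dummy neurons: 1 for column j
            X.append(np.concatenate(([u[i] / q[i], v[j] / q[j]], row_d, col_d)))
            Y.append([A[i, j]])
    return np.array(X), np.array(Y)       # shapes (49, 16) and (49, 1)
```

Training would use the 1984 patterns; feeding the same encoding of the 1992 marginal information into the trained network would then yield the forecasts described next.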
In general, a very narrow training tolerance, which forces the BPN to forecast the training output patterns from the training input patterns with extreme accuracy before training stops, may 'lose generality' and not perform well when the network is later used for the forecasting of unknown output patterns from given input patterns. A wide training tolerance, on the other hand, may lead to an end to the training of the BPN before it has fully captured the underlying relationship between inputs and outputs. The output functions for all middle layer nodes and the output node (here only one) are sigmoidal. Hence, $f^h_j(net^h_{rj}) = (1 + e^{-net^h_{rj}})^{-1}$ and $f^o(net^o_r) = (1 + e^{-net^o_r})^{-1}$. Having trained the BPN using the 1984 IO data for the UK economy, it was run using the 1992 data. Exactly the same inputs as earlier (but now for 1992) were inserted into its input nodes and, using the calculated weights and the same output functions for the middle layer and the output node, the BPN provided forecasts of the IO coefficients. These results have been compared with the actual 1992 IO coefficients to evaluate the performance of the BPN. The RAS method described earlier was also used to obtain forecasts of the 1992 IO coefficients, in order to compare its performance with that of the BPN. The RAS method was applied as the optimization problem described by Equations 5–7. The RAS problem therefore included 49 variables (the coefficients to be forecasted) and 14 constraints. Naturally, some forecast errors are negative and some positive. Figure 1 shows the size of the absolute values of the forecast errors for the 1992 IO coefficients, using both methods. The vertical axis measures the sizes of the absolute errors and the horizontal axis the 49 respective coefficients. The coefficients are presented row by row for each column. For example, the first seven refer to the coefficients of the seven rows and the first column, the next seven to the coefficients of the seven rows and the second column, and so on. Columns and rows are ordered in the same way as the seven major IO sectors and commodities above. Figure 1 shows the number of forecasts that were more accurate with one method than with the other, and also the range within which the forecast errors lie. It is obvious that most of the errors lie in very narrow ranges, indicating satisfactory performance of both methods. The RAS method appears to outperform the BPN here, as most of the forecast errors are smaller with RAS. However, 14 coefficients, representing 28.6% of the total, were forecast more accurately with the BPN. In addition, for another 18 coefficients the difference in the absolute forecasting errors between the two methods was zero to two decimal places (and for most of these it was either zero up to the third decimal place or very close to such a value). These are very small values considering the high level of aggregation in the IO tables used, implying that the better performance of RAS is not very significant. There are only very few outliers, representing relatively large BPN forecast errors, seen in Fig. 1, with large error differences between the two methods, and these represent the few cases for which the RAS superiority was substantial. A more detailed discussion of the performance and comparison of the two methods can be facilitated by the data in Table 1.
[Fig. 1. Absolute values of forecast errors of IO coefficients: BPN and RAS. Vertical axis: absolute values of errors (0 to 0.1); horizontal axis: the 49 IO coefficients; two series: BPN error and RAS error.]
Table 1 provides some statistics and information on the forecasts and on the absolute values of the forecast errors under both the BPN and RAS methods (since errors can take both signs, absolute values are used to assess each method's accuracy). Table 1 also provides, in the third column, the same information for the distribution created by taking the absolute value of the difference between the absolute errors under the two methods (again, since the difference of absolute values can take both signs, the absolute value of the difference is used). Examining such a distribution allows for a better comparison of the performance of the two methods.
Table 1. Information on forecast errors and the performance of BPN and RAS

                                                      BPN      RAS      BPN-RAS
Average absolute error                                0.0214   0.0130   0.0126
Variance of absolute errors                           0.0004   0.0002   0.0002
Range of absolute errors                              0.0935   0.0512   0.0813
(Average absolute error)/(IO coefficient range)       0.0604   0.0368   0.0357
(Range of absolute errors)/(IO coefficient range)     0.2648   0.0903   0.2303
The third column of the table refers to the distribution of absolute differences between the absolute errors of the two methods. Consider first the first two columns, where column one provides an evaluation of the BPN performance and column two an evaluation of the RAS method. Average values of absolute errors are very common in assessing IO coefficient forecasts. The value of 0.0214 is not, in the authors' view, significant if one considers again the level of aggregation in the IO tables, which naturally tends to magnify the coefficients when interindustry transactions are not zero (as in some cases). This is particularly true if one chooses to ignore the high error values of a very few outliers (or even just the most pronounced one), which of course contribute to a higher average. It is indicative of the good performance of the BPN that the average absolute error is only 6% of the entire range of the actual IO coefficients, as the fourth row shows. The RAS performed even better (3.7%) with regard to that criterion. Using this percentage as a criterion (the usual one in assessing BPN performance in forecasting tasks) is a better alternative to percentage errors. This is because very high percentage errors may occur in the prediction of very small coefficients, where the error value is still small and perhaps insignificant for IO analysis, in which case the prediction method has worked well. On the other hand, small percentage errors on high-value coefficients may reflect large error sizes. Examining the error
relative to the range of the actual values provides a more realistic assessment of its importance, and also of the performance of the method. The small variances indicate that the absolute errors are generally very close to the average absolute error, which indicates in turn that most of them are equally small percentages of the true coefficient range. To get a picture of this, the range of the absolute errors was estimated as a percentage of the true coefficient range. For both methods, BPN and RAS, it is small: 26.5% and 9% respectively. The relatively large difference between the methods can again be attributed to the few outlier observations where the error difference is large. (This can easily be verified by looking at the sizeable difference between the average absolute error and the range of absolute errors, both as percentages of the true coefficient range. The difference is much larger than in the case of RAS.) The results nevertheless testify to a satisfactory performance for both methods. If the focus is on the differences in performance of the two methods, it should be noticed that the difference between the usual measure of performance, the average absolute error, across the two methods is only 0.008, again rather small if one considers the range of coefficients and the high level of IO aggregation in the data used. The final column of Table 1 refers to the distribution resulting from the absolute differences between the absolute errors of BPN and RAS (since the simple differences of absolute errors can take both signs). The first row provides the average of these absolute differences. This average is obviously a stricter and arguably more thorough comparison of the two forecasting methods. It is for this reason that the value of this average, 0.0126, is higher than the difference of the average absolute errors. It is still a relatively small value, representing only 3.57% of the range of IO coefficients, as can be seen in the fourth row. The rest of the values provided in the other rows of the final column have much less value for comparison purposes. It is natural that the figure in the last row lies, by the definition of the relevant distribution, between the corresponding figures in the other two columns, and the fact that it is closer to the value in the BPN column than to that in the RAS column demonstrates the effect of the outliers on the range of values in the distribution of the final column. Overall, these results show that both methods performed relatively well in the forecasting of IO technology coefficients. In addition, although RAS outperformed the BPN, the latter generally achieved either slightly better or only slightly worse results. Furthermore, improvements in the structure of the BPN are possible, while the use of dummy neurons seems to remove those obstacles previously met in the application of neural networks to IO forecasting.
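For completeness, the statistics reported in Table 1 are straightforward to compute once both forecast matrices are in hand; a minimal sketch (Python with numpy assumed, function name illustrative):

```python
import numpy as np

def error_stats(B_hat, B_true):
    """Summary statistics of absolute forecast errors, as in Table 1."""
    e = np.abs(B_hat - B_true).ravel()           # absolute errors of 49 forecasts
    rng_coef = B_true.max() - B_true.min()       # range of the true IO coefficients
    return {
        "average absolute error": e.mean(),
        "variance of absolute errors": e.var(),
        "range of absolute errors": e.max() - e.min(),
        "avg abs error / coefficient range": e.mean() / rng_coef,
        "range of abs errors / coefficient range": (e.max() - e.min()) / rng_coef,
    }

# The third column of Table 1 applies the same summaries to the distribution
# | |BPN error| - |RAS error| | across the 49 coefficients.
```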
V. SUMMARY AND CONCLUSIONS

It remains a paradox that neural networks have not been widely combined with and used in input–output analysis. This is particularly true in those cases where forecasted values are the main objective, rather than the derivation and estimation of functional forms, because the usual criticism (often unfair) that neural networks constitute black boxes, producing estimates without much explanation of the underlying relationships, is then of less importance. The introduction and use of dummy neurons, as proposed in this paper, allows for much improved and more reliable results. It allows for the consideration of factors captured in other traditional forecasting methods such as RAS, and it removes serious computational problems that have perhaps prevented BPN applications in the field of IO analysis. These results demonstrate that the structure of the BPN adopted performed well. The RAS method outperformed the BPN overall, but the latter achieved better forecasts for almost 29% of the relevant IO coefficients. In general, differences in errors were not substantial, with just a few exceptions. The better overall performance of the RAS method cannot be accepted as a rule, and the difference in performance is clearly not great enough to take the BPN out of the picture as an alternative method of forecasting. In addition, it is believed that the experimentation here has been carried out under conditions more favourable to the RAS method. The high level of aggregation in the IO tables used gives more validity to the assumption of uniform substitution and fabrication effects. Combined with the fact that in highly aggregated tables the IO coefficients tend not to exhibit great changes from one period to the next, RAS forecasts have been facilitated, as they try to approach the initial coefficients as closely as possible given the new information and under the uniformity assumptions. Under different conditions, a flexible forecasting method that captures non-linearities (here, non-uniformities of the two effects) and does not use a closeness criterion may perform better. The computational power to deal with much larger IO tables using larger BPNs is readily available. The evaluation of the overall BPN performance, and its comparison with that of the RAS method, was also affected by some outliers: cases where the absolute values of the BPN errors were either very high or much higher than the corresponding values for the RAS method. It may be possible, however, to deal with these cases more efficiently at a more practical level. Associated with the problem of forecasting IO coefficients is the 'reconciliation issue'. This refers to the following fact: once the forecasts have been made, each column (row) sum may yield a total percentage for these intermediate input transactions different from the initially given information on column (row) sum values, taken as
percentages of total outputs (also given), corresponding to these columns (rows). Several methods have been proposed to overcome this problem and the issue is not resolved. Usually, however, the methodologies lead to a reduction of large differences between the forecasted changes of the various coefficients. This reduces the size of large errors in cases such as our outliers, where the coefficient remains much closer to its previous value than forecasted. In addition, the structure of the BPN, in terms of the number of middle layer nodes or even the number of middle layers (and the learning rate as well), was selected more or less arbitrarily, based on some experimentation and rules of thumb. This is most often the case, due to the difficulties involved in developing a formal procedure for selecting an optimum BPN structure. However, there is a way to optimize the structure of a BPN using genetic algorithms, a procedure finding wider application in neural networks. Such an optimization would lead to improvements in the results, especially in those cases where large errors are observed, and would allow the BPN to compete more effectively with other methods. BPNs also present some other attractive features for IO analysis, as they can be used in the forecasting of other types of coefficients, and therefore multipliers, where other forecasting methodologies have difficulties or are even non-existent. We refer here to certain types of extended (energy, environmental, etc.) IO models. Future research should focus on the development of neural networks that can offer substantial help in input–output applications to different types of problems. Neural networks may also help revive some solutions and approaches to IO analysis which were theoretically promising but finally abandoned and replaced by others.

REFERENCES

Bacharach, M. (1970) Biproportional Matrices and Input–Output Change, Cambridge University Press, Cambridge.
Field, K. F. (1986) Input–output technology forecasting: a microeconometric foundation approach, Department of Economics, University of Strathclyde.
Fisher, W. H. and Chilton, C. H. (1975) Developing ex-ante input–output flow and capital coefficients, in W. F. Gossling (ed.), Capital Coefficients and Dynamic Input–Output Models, Input–Output Publishing Co., London.
Freeman, J. A. and Skapura, D. M. (1991) Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley.
Friedlander, D. (1961) A technique for estimating a contingency table given the marginal totals and supplementary data, Journal of the Royal Statistical Society, 124, 412–20.
Hamilton, D. (1982) Theoretical and practical aspects of ex-ante techniques in updating and forecasting of input–output coefficients, Fraser of Allander Institute, University of Strathclyde, Glasgow.
Harrigan, F. (1982) Projection and updating input–output matrices, SRC research project, Fraser of Allander Institute, University of Strathclyde, Glasgow.
Kastens, T. L., Featherstone, A. M. and Biere, A. W. (1995) A neural network primer for agricultural economists, Agricultural Finance Review, 55, 54–73.
Matuszewski, T., Pitts, P. R. and Sawyer, J. A. (1964) Linear programming estimates of changes in input–output coefficients, Canadian Journal of Economics and Political Science, 33, 203–10.
Miernyk, W. H. (1970) Stimulating Regional Economic Growth, D.C. Heath and Co., Lexington, MA.
Miller, R. E. and Blair, P. D. (1985) Input–Output Analysis: Foundations and Extensions, Prentice Hall, London.
Snower, D. J. (1990) New methods of updating input–output matrices, Economic Systems Research, 2, 27–38.
Stone, R. (1961) Input–Output and National Accounts, Organization for European Economic Cooperation, Paris.
Stone, R. and Brown, A. (1962) A Computable Model of Economic Growth, Vol. 1, A Programme for Growth, Chapman and Hall, London.
Stone, R. and Brown, A. (1963) Input–Output Relationships, 1954–66, No. 3, A Programme for Growth, Department of Applied Economics, Cambridge University.
Toh, M.-H. (1998) The RAS approach in updating input–output matrices: an instrumental variable interpretation and analysis of structural change, Economic Systems Research, 10(1), 63–78.