AN EFFICIENT MODEL FOR PRODUCT ALLOCATION USING OPTIMAL COMBINATIONS OF NEURAL NETWORKS

Sherif Hashem*, Yuehwern Yih, and Bruce Schmeiser
School of Industrial Engineering, Purdue University

In Intelligent Engineering Systems through Artificial Neural Networks, Vol. 3, C. Dagli, L. Burke, B. Fernandez, and J. Ghosh (Eds.), ASME Press, pp. 669-674.

* Supported by PRF Research Grant 6901627 from Purdue University, West Lafayette, IN.

Abstract

Recently, constructing MSE-optimal linear combinations of trained neural networks has been introduced as a method for integrating the knowledge acquired by a number of trained neural networks, thereby improving the accuracy of the resultant model. We apply this method to develop a neural network based model that aids in solving a product allocation problem. Our approach yields a significant improvement in modeling accuracy as well as a dramatic reduction in training time.

1 INTRODUCTION

Constructing neural network (NN) based models often involves trying a mixture of network architectures and training parameters in order to achieve acceptable performance. Consequently, a number of trained networks are produced. Typically, one of these networks is chosen as best, while the rest are discarded. Hashem and Schmeiser (1992) suggest constructing MSE-optimal linear combinations (MSE-OLC) of the corresponding outputs of the trained networks. This approach allows the integration of the knowledge acquired by the trained networks, and thus may significantly improve the resultant model accuracy. A brief overview of the MSE-OLC of trained NNs is given in Section 2. In Section 3, we construct a NN based model, using MSE-OLC of trained networks, to aid in solving a product allocation problem. The conclusions are summarized in Section 4.

2 MSE-OLC OF TRAINED NNs

Linear combinations of estimators have been used by the statistics community for a long time. Clemen (1989) cites more than 200 papers in his review of the literature related to combining forecasts, including contributions from the forecasting, psychology, statistics, and management science literatures. Hashem and Schmeiser (1992, 1993) investigate combining the corresponding outputs of a number of trained NNs, and illustrate that optimal linear combinations (OLC) may significantly improve the accuracy of NN based models.

[Figure 1: Linear combination of the outputs of p trained NNs. The networks share the input X; their outputs Y1, ..., Yp are multiplied by the combination-weights α1, ..., αp and summed to give the combined output Y.]

From a NN perspective, combining the corresponding outputs of a number of trained NNs is similar to creating a large NN in which the trained NNs are subnetworks operating in parallel, and the combination-weights are the weights of the output layer (Figure 1). The main difference between the two situations is that in the former the weights of the trained NNs are frozen, and the combination-weights are computed by performing a simple (fast) matrix inversion, as discussed in Section 2.2. In the latter situation, training one large NN, a large number of weights must be estimated (trained) simultaneously. Thus training time may be longer, and the risk of overfitting the data may also increase.

2.1 OLC problem description

First, we discuss multi-input-single-output mappings. Multi-input-multi-output mappings are discussed in Section 2.3. Consider a multi-input-single-output mapping approximated by a trained NN. A trained NN accepts a vector-valued input $\vec{x}$ and returns a scalar output $y(\vec{x})$. The approximation error is $\delta(\vec{x}) = t(\vec{x}) - y(\vec{x})$, where $t(\vec{x})$ is the true answer for a given input $\vec{x}$. A linear combination of the outputs of $p$ NNs (Figure 1) returns the scalar output

$$\tilde{y}(\vec{x}; \vec{\alpha}) = \sum_{j=1}^{p} \alpha_j \, y_j(\vec{x}),$$

with corresponding error $\tilde{\delta}(\vec{x}; \vec{\alpha}) = t(\vec{x}) - \tilde{y}(\vec{x}; \vec{\alpha})$, where $y_j(\vec{x})$ is the output of the $j$th network and $\alpha_j$ is the combination-weight associated with $y_j(\vec{x})$, $j = 1, 2, \ldots, p$. The problem is to find good values for $\alpha_1, \ldots, \alpha_p$.

2.2 Computing the MSE-OLC weights

The combination-weights may be obtained based on any optimality criterion. We focus on minimizing the mean squared error (MSE) of the combined model over a given set of observed data. Think of the input $\vec{x}$ as an observation of a random variable $\vec{X}$ from a (usually unknown) multivariate distribution function $F_{\vec{X}}$. Then the MSE-OLC is defined by the optimal-weights vector $\vec{\alpha}^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_p^*)$ that minimizes $\mathrm{MSE}(\tilde{\delta}(\vec{X}; \vec{\alpha})) = \mathrm{E}(\tilde{\delta}(\vec{X}; \vec{\alpha}))^2$, where E denotes expected value with respect to $F_{\vec{X}}$. From Hashem and Schmeiser (1992),

$$\vec{\alpha}^* = \Omega^{-1} \vec{\rho}, \quad (1)$$

where $\Omega = [\omega_{ij}] = [\mathrm{E}(y_i(\vec{X}) \, y_j(\vec{X}))]$ is a $p \times p$ matrix and $\vec{\rho} = [\rho_i] = [\mathrm{E}(t(\vec{X}) \, y_i(\vec{X}))]$ is a $p \times 1$ vector.

In practice, one seldom knows $F_{\vec{X}}$. Thus $\Omega$ and $\vec{\rho}$ in (1) need to be estimated. If $D$ is a set of independent observations from $F_{\vec{X}}$, then each $\vec{x} \in D$ may be treated as equally likely. Hence $\Omega$ and $\vec{\rho}$ may be estimated using

$$\hat{\omega}_{ij} = \sum_{k=1}^{|D|} y_i(\vec{x}_k) \, y_j(\vec{x}_k) / |D| \quad \text{for all } i, j, \quad (2)$$

$$\hat{\rho}_i = \sum_{k=1}^{|D|} t(\vec{x}_k) \, y_i(\vec{x}_k) / |D| \quad \text{for all } i, \quad (3)$$

respectively, where $|D|$ denotes the cardinality of $D$.
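To make the estimators concrete, here is a minimal sketch of equations (1)-(3) in Python with NumPy (the paper specifies no implementation language, so this choice and the function names are ours), assuming the outputs of the p trained networks over the data set D are collected as columns of a matrix:

```python
import numpy as np

def mse_olc_weights(Y, t):
    """Estimate the MSE-OLC weights, Eqs. (1)-(3).

    Y : (n, p) array; column j holds the outputs y_j(x_k) of the
        j-th trained network on the n observations x_k in D.
    t : (n,) array of true answers t(x_k).
    Returns the (p,) optimal-weights vector alpha* = Omega^{-1} rho.
    """
    n = Y.shape[0]
    Omega = Y.T @ Y / n   # Eq. (2): sample estimate of [E(y_i y_j)]
    rho = Y.T @ t / n     # Eq. (3): sample estimate of [E(t y_i)]
    return np.linalg.solve(Omega, rho)   # Eq. (1), via a linear solve

def combine(Y, alpha):
    """Combined output y~(x; alpha) = sum_j alpha_j y_j(x)."""
    return Y @ alpha
```

Using a linear solve rather than an explicit matrix inverse is numerically preferable; if the networks' outputs are strongly correlated, Omega may be near-singular, and a least-squares solve (np.linalg.lstsq) is a safer fallback.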

2.3 Multi-input-multi-output MSE-OLC problem

One approach to the multi-output case is to compute an optimal combination-weights vector for each output separately. Such treatment is straightforward and minimizes the total MSE for multi-input-multi-output mappings. This approach is adopted in Section 3.
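Assuming the mse_olc_weights helper sketched above, the per-output treatment amounts to an independent solve for each of the m outputs:

```python
# Ys[m] is the (n, p) matrix of the p networks' m-th outputs over D;
# T is the (n, m) matrix of true targets. Each output gets its own
# weight vector, computed independently of the others.
alphas = [mse_olc_weights(Ys[m], T[:, m]) for m in range(T.shape[1])]
```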

3 PRODUCT ALLOCATION PROBLEM

We study the allocation problem of a product made by an Indiana manufacturing company. Due to workmanship requirements, the production capacity is limited, and the daily demand is typically two to three times the available supply. The product is made and then distributed to 40 Customer Service Centers (CSCs) spread across the United States. The problem is to allocate the available supply among the demanding CSCs in a manner that (at least partially) satisfies their demand, while following some guidelines and constraints. Among these are the capacity of the distribution trucks (98 units/truck), the priority of each CSC, the frequency and size of each CSC's historical demands, and the geographic location of each CSC. These constraints make the allocation problem fairly complex. Currently, the daily allocation schedule is planned manually. As an interim step, we use a NN based model, trained to mimic the human scheduler, to produce good initial schedules that the human scheduler can improve on. In this way, we hope to reduce the time and effort associated with the daily allocation process. In this section, we conduct a pilot study to examine the performance of NN based models.

3.1 Model structure

The data for creating the NN model consist of the allocation schedules for 42 consecutive working days. For each day, the demand by each CSC as well as the supply allocated to that CSC are given. The total daily demand ranges between 1093 and 3093 units, while the available daily supply ranges between 637 and 1127 units. The data are split randomly into a training data set of 30 days and a testing data set of 12 days. The 40 CSCs are partitioned into 4 groups; once the supply to a given group is determined by one NN, it may be reallocated among the CSCs in that group by another NN. This hierarchical approach requires much smaller NNs than would a single NN handling all 40 CSCs at one level. Moreover, the resultant allocation subproblems are similar; hence, we focus here on the first subproblem in the hierarchy: allocating the available supply among the 4 groups. The individual demands from each group, the total demand, and the total supply are inputs to the model. The output of the model is the supply to be allocated to each group. Thus the model has 6 inputs and 4 outputs. The total MSE (over all 4 outputs) is used as a measure of the accuracy of the NN model.

3.2 Input-output representations

We investigate the following three different data representations (a sketch of the corresponding transformations is given below):

A. All inputs and outputs are expressed in number of units.

B. The inputs are expressed in number of units, while the outputs (individual supplies) are expressed as percentages of the total supply.

C. The individual demands and the individual supplies are expressed as percentages of the total demand and the total supply, respectively. The total demand and total supply are in number of units.
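As an illustration, the following sketch builds one 6-input/4-output training pair under each representation. Whether percentages are stored as fractions in [0, 1] or as values in [0, 100] is our assumption (fractions shown), since the paper does not fix the scale, and the function name is hypothetical:

```python
import numpy as np

def make_example(demands, total_demand, total_supply, supplies, rep="B"):
    """Build one (input, target) pair under representation A, B, or C.

    demands : the 4 group demands (units); supplies: the 4 allocated
    group supplies (units). Percentages are stored as fractions in
    [0, 1] (an assumption; the paper does not state the scale).
    """
    demands = np.asarray(demands, dtype=float)
    supplies = np.asarray(supplies, dtype=float)
    if rep == "A":    # all inputs and outputs in units
        x = np.concatenate([demands, [total_demand, total_supply]])
        y = supplies
    elif rep == "B":  # outputs as fractions of the total supply
        x = np.concatenate([demands, [total_demand, total_supply]])
        y = supplies / total_supply
    else:             # "C": demands and supplies both as fractions
        x = np.concatenate([demands / total_demand,
                            [total_demand, total_supply]])
        y = supplies / total_supply
    return x, y
```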

3.3 Neural network topologies

Since the NN topology may influence its approximation capability, we investigate three different network topologies:

- 6-3-4 NN: network with one hidden layer that contains 3 hidden units;
- 6-4-4 NN: network with one hidden layer that contains 4 hidden units;
- 6-3-2-4 NN: network with two hidden layers that contain 3 and 2 hidden units, respectively.

The activation function for the hidden units as well as the output units is the logistic sigmoid function $g(x) = (1 + e^{-x})^{-1}$.
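A minimal sketch of the forward pass for, e.g., the 6-3-4 topology with the logistic sigmoid on both layers; the matrix shapes and weight-storage convention are our assumptions, chosen to be consistent with the stated layer sizes:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, W1, b1, W2, b2):
    """Forward pass of the 6-3-4 topology: X is (n, 6), W1 is (6, 3),
    W2 is (3, 4); sigmoid activations on hidden and output layers."""
    H = sigmoid(X @ W1 + b1)     # hidden activations, (n, 3)
    return sigmoid(H @ W2 + b2)  # network outputs, (n, 4)
```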

Table 1: Total MSEs of the best trained NNs obtained after 1000 iterations

  Input-output     Training Data   Testing Data
  Representation   Total MSE       Total MSE
  ---------------------------------------------
  A                6303            8228
  B                0.0066          0.0093
  C                0.0064          0.0086

Table 2: Total MSEs of the MSE-OLC of nine NNs trained for 1000 iterations

  Input-output     Training Data   Testing Data
  Representation   Total MSE       Total MSE
  ---------------------------------------------
  A                3501            4761
  B                0.0043          0.0054
  C                0.0047          0.0061

3.4 Neural network training

For every input-output representation, 3 replications (with independent initial connection-weights) of each of the 3 network topologies are trained using the Error Backpropagation algorithm (Hertz et al. 1991, pp. 115-130). The iteration budget is preset to 1000 iterations. The network that yields the best performance (among the 9 NNs) in terms of the total MSE on the training data is selected. The total MSEs on the training and testing data are shown in Table 1.
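The paper follows Hertz et al. (1991) for backpropagation; the sketch below is a generic batch-gradient-descent variant for the 6-3-4 topology under a squared-error loss. The learning rate and the weight-initialization range are our own assumptions, since the paper does not report them:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_6_3_4(X, T, lr=0.1, iters=1000):
    """Backpropagation (batch gradient descent) for the 6-3-4 topology.
    X: (n, 6) inputs; T: (n, 4) targets scaled into (0, 1)."""
    W1 = rng.uniform(-0.5, 0.5, (6, 3)); b1 = np.zeros(3)
    W2 = rng.uniform(-0.5, 0.5, (3, 4)); b2 = np.zeros(4)
    n = X.shape[0]
    for _ in range(iters):
        H = sigmoid(X @ W1 + b1)        # hidden activations, (n, 3)
        Y = sigmoid(H @ W2 + b2)        # outputs, (n, 4)
        dY = (Y - T) * Y * (1 - Y)      # output delta (squared error)
        dH = (dY @ W2.T) * H * (1 - H)  # hidden delta, backpropagated
        W2 -= lr * H.T @ dY / n; b2 -= lr * dY.mean(axis=0)
        W1 -= lr * X.T @ dH / n; b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2
```

Under this protocol, the nine networks per representation would come from three calls per topology with different random seeds, after which the best-on-training network is kept (Table 1) or all nine are combined (Table 2).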

3.5 Results of the MSE-OLC of the trained networks

For each of the three input-output representations, the MSE-OLC of the nine trained networks is constructed. The performance of the combination on both the training and the testing data is summarized in Table 2. Compared to the performance of the single best network (Table 1), the MSE-OLC of the nine trained networks (Table 2) reduces the total MSE by 27% to 44% on the training data, and by 29% to 42% on the testing data. Since only the training data are used in estimating the combination-weights for the MSE-OLC, the comparable performance of the combined model on the training and testing data sets suggests that the MSE-OLC generalizes well. Moreover, in representations A and B, the MSE-OLCs of the nine networks, trained for 1000 iterations, outperform the best NNs obtained with a training budget of 100,000 iterations. For representation C, the MSE-OLC of the nine networks, trained for 1000 iterations, performs as well as the best NN obtained with a training budget of 100,000 iterations. This result suggests that using MSE-OLC may eliminate the need for excessive training to achieve a given model accuracy.


3.6 Discussion

In all three input-output representations (A, B, C), the MSE-OLC of the nine trained networks yields better model accuracy than the best single NN (Tables 1 and 2). Hence, we focus on the analysis of the MSE-OLC models. From Table 2, representation B yields a lower total MSE than representation C on both the training data and the testing data. Moreover, after converting the NN outputs in representation B from percentages of the total supply back to numbers of units, the total MSE is 3639 on the training data and 4455 on the testing data. Thus representation B also performs better than representation A in terms of the total MSE on the testing data. This makes representation B the best performer among the three representations, since the performance on the testing data is often used as a measure of the out-of-sample performance of the model. Another performance measure, of special practical value, is the frequency with which the approximation error $\tilde{\delta}$ exceeds a certain tolerance level. Since the capacity of the distribution trucks is 98 units, this value is chosen as the tolerance level. For a given output, the number of times that $\tilde{\delta}$ exceeds the tolerance is divided by the total number of data points, yielding the desired error frequency. For representation B, the error frequencies for the four outputs are between 3% and 17% on the training data, and between 0% and 25% on the testing data. These results indicate that the NN based model produces allocation schedules close to those produced by the human scheduler.
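A small sketch of this error-frequency computation, per output, with the 98-unit truck capacity as the tolerance (the function name is ours):

```python
import numpy as np

def error_frequency(pred_units, true_units, tol=98.0):
    """Fraction of data points, per output, whose absolute allocation
    error exceeds the tolerance (here, the 98 units/truck capacity)."""
    err = np.abs(np.asarray(pred_units) - np.asarray(true_units))
    return (err > tol).mean(axis=0)   # (n, 4) errors -> (4,) frequencies
```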

4 CONCLUSIONS

Our pilot study shows that NNs may be successfully employed to capture allocation patterns and relationships that influence product allocation decisions. MSE-OLC is a straightforward and effective method for integrating the knowledge acquired by a number of trained networks. The gain in accuracy, compared to using the best network, is substantial in our example. Moreover, the reduction in the training time required to achieve such accuracy is dramatic.

REFERENCES

Clemen, R.T. (1989), "Combining Forecasts: A Review and Annotated Bibliography," International Journal of Forecasting, vol. 5, pp. 559-583.

Hashem, S., and B. Schmeiser (1992), "Improving Model Accuracy Using Optimal Linear Combinations of Trained Neural Networks," Tech. Rep. SMS92-16, School of Industrial Engineering, Purdue University.

Hashem, S., and B. Schmeiser (1993), "Approximating a Function and its Derivatives Using MSE-Optimal Linear Combinations of Trained Feedforward Neural Networks," Proceedings of the World Congress on Neural Networks, Lawrence Erlbaum Associates, New Jersey, vol. 1, pp. 617-620.

Hertz, J., A. Krogh, and R.G. Palmer (1991), Introduction to the Theory of Neural Computation, Addison-Wesley.
