Cross-Validation with Active Pattern Selection for Neural Network Classifiers

Friedrich Leisch, Lakhmi C. Jain, and Kurt Hornik

Abstract: We propose a new approach for leave-one-out cross-validation of neural network classifiers called "cross-validation with active pattern selection" (CV/APS). In CV/APS, the contribution of the training patterns to network learning is estimated and this information is used for active selection of CV patterns. On the tested examples, the computational cost of CV can be drastically reduced with only small or no errors.

Keywords: Cross-validation, Active Pattern Selection, Classification, Risk

I. Introduction

Since its (re-)introduction and systemization by Stone in the mid-70's, cross-validation (CV) has become a popular tool for the assessment of statistical models and statistical prediction, e.g., for regression models, especially when only few data are available. Stone prefers the term assessment to validation, "which has a ring of excessive confidence about it" [1]. Cross-validation cannot "validate" a model, but gives an estimate of its generalization capabilities. Cross-validation is a very general framework using no special model assumptions. It can be shown to be asymptotically equivalent to Akaike's Information Criterion [2], [3].

In many cases the training set is too small, in relation to the classification problem, to be reduced arbitrarily without loss of design quality, e.g., by splitting the available examples into a training and a test set. This difficulty can be overcome by use of leave-one-out cross-validation, at the expense of a drastic increase in computational cost. The computational cost of leave-one-out cross-validation can be very high, especially for large networks and training sets. The network must be retrained for every pattern in the training set, thus the total computational cost typically is of order N times the average network training cost, where N is the size of the training set. The contributions of the individual training patterns are very different, such that the costs can often be reduced by using information about the "importance" of the patterns.

This paper is organized as follows: Section II reviews pattern selection for neural network training. Section III gives an introduction to the estimation of misclassification rates by cross-validation. In Section IV a new approach combining active pattern selection and cross-validation is developed; it is tested on some artificial and real world examples in Section V.

F. Leisch and K. Hornik are with the Institut für Statistik und Wahrscheinlichkeitstheorie, Technische Universität Wien, Wiedner Hauptstraße 8-10/1071, A-1040 Wien, Austria; email: {Friedrich.Leisch, [email protected]. L. C. Jain is with the Knowledge-based Engineering Systems Group, University of South Australia, Adelaide, SA-5095, Australia; email: [email protected].

II. Pattern Selection for Neural Network Training

Pattern selection for neural networks has been receiving increasing interest in the past few years. We distinguish between active sampling and active selection; see, e.g., [4] for an introduction. In short, active sampling is choosing which unlabeled examples we want to label, and active selection is choosing on which labeled examples we wish to train.

Active sampling tries to determine the distribution of the training data and samples with a distribution different from the "environmental distribution" (the natural distribution of the data in the learning environment). Active sampling for a classification problem might, e.g., sample more data near the class boundaries and sample with lower density inside the classes.

Active pattern selection is used when there is no "oracle" that can be queried for new data, and starts with a fixed set of examples. The task is to find a subset that is as small as possible and contains as much information as possible. Obviously there is a tradeoff between these two goals, such that the problem may be reformulated as finding a minimal subset that contains at least a certain amount of information.

Recently, several techniques for active pattern selection during the training process of a neural network have been developed. The dynamic pattern selection algorithm, an extension of standard back-propagation, was introduced in [5]. The network training is started with a small subset. During the training process, the generalization of the network is estimated using an independent test set, and a new pattern is selected when the generalization estimate exceeds the apparent network error on the current training set. The new training example is selected to have maximal error. A similar algorithm, called active selection of training sets, was developed by [6]. New patterns having maximal error are added to the current subset using an integrated mean square error estimate. The main focus lies on the reduction of the size of the training set. Both algorithms have been developed for continuous mappings and cannot be used directly for classification tasks. [7] studies the effects of several strategies called pedagogical pattern selection. The patterns are not presented in uniform permutations as in normal back-propagation algorithms, but error-dependent presentation probabilities or error-dependent repetitions are used.

III. Neural Network Classifiers and Cross-Validation

In this section we briefly present the estimation of misclassification rates using cross-validation. Consider the task of classifying a random vector ξ into one of c classes C_1, ..., C_c, where ξ takes values in an arbitrary d-dimensional input space X. Let g(·) : X → {1, ..., c} be a classification function mapping the input space X onto the finite set {1, ..., c}; e.g., g is the function defined by a neural network. The classification error for input ξ = x can be measured by the loss functional

$$ L_g(x) = \sum_{n=1}^{c} P(C_n \mid x) \, l(n, g(x)) $$

where P(C_n | x) is the posterior probability of class C_n given x and l(n, m) = l_{n,m} is the cost arising from classifying a member of C_n as C_m. The standard case is uniform cost for each error, i.e., l_{n,n} = 0 and l_{n,m} = 1 for n ≠ m, in which case L is called the misclassification rate of the classifier; in the following we consider uniform costs only. The classification task is to find an optimal function g minimizing the average loss

$$ E \, L_g(\xi) = \int_X L_g(x) \, dF(x) $$

where F denotes the (typically unknown) distribution of ξ. The average loss may be identified as the risk associated with the classifier.

Let X_N = {x_1, ..., x_N} be a set of independent input vectors for which the true class is known, available for training the classifier and testing its performance. Further, let g(·|X_N) denote a classifier trained using the set X_N. The average loss on X_N, called the apparent loss, typically underestimates the actual loss E L_g(ξ|X_N), because g(·|X_N) is constructed by minimizing the loss on X_N. Therefore, the performance of the classifier should be estimated using a test set different from the training set.

The most popular way to estimate the actual loss is to split X_N into a training set and a test set. The classifier is designed using the training set and tested using the independent test set. In actual practice, the number of training and test data is limited, i.e., N is fixed, resulting in a tradeoff between training and test data. Using most data for training (and therefore only few data for testing) may yield a good classifier design, but the estimate for performance prediction is not good. On the other hand, if we want a lot of data for testing, we can use only few for training, and therefore the classifier design may not be good. If only few examples are available, we may not be able to afford a test set at all, because all N examples are needed for training the classifier.
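To make the loss definitions above concrete, the following Python sketch computes the apparent loss of a set of predictions, both for uniform costs (the misclassification rate) and for a general cost matrix. This is an illustration added here, not code from the original paper:

```python
import numpy as np

def misclassification_rate(y_true, y_pred):
    # Empirical 0-1 loss under uniform costs (l_{n,n} = 0, l_{n,m} = 1
    # for n != m): the plug-in estimate of the risk E L_g(xi).
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def average_loss(y_true, y_pred, cost):
    # Average loss for a general c x c cost matrix, where cost[n, m] is
    # the cost l(n, m) of classifying a member of C_n as C_m.
    cost = np.asarray(cost)
    return float(cost[np.asarray(y_true), np.asarray(y_pred)].mean())
```

Evaluated on the training set itself, this yields the apparent loss; evaluated on an independent test set, it estimates the actual loss.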

A. Leave-k-out Cross-Validation

By repeatedly training the classifier with N − k examples and using the average error made on the patterns left out, one obtains an estimator for the performance of a classifier trained on all available example patterns X_N, called leave-k-out cross-validation. The most popular choices for k are k = 1 ("leave-one-out") and k = N/10 ("10-fold"). Let X_N^{(n)} = {x_1, ..., x_{n−1}, x_{n+1}, ..., x_N} denote the example set with the n-th pattern left out and let C(x_n) ∈ {1, ..., c} denote the class of x_n. The leave-one-out CV estimator for E L_g(ξ|X_N) is defined as

$$ L^{CV}(X_N) = \frac{1}{N} \sum_{n=1}^{N} l\left( C(x_n), \, g(x_n \mid X_N^{(n)}) \right). $$

Cross-validation overcomes the problems arising from the tradeoff between the sizes of training and test sets by using all examples for both training and testing. L^{CV} is an asymptotically unbiased estimator, because

$$ E \, L_g(x_n \mid X_N^{(n)}) = E \, L_g(\xi \mid X_N^{(n)}) \approx E \, L_g(\xi \mid X_N) $$

for large N; the expectation is with respect to ξ and all possible training sets X_N of size N. Obviously, leave-one-out CV has a smaller bias than leave-k-out with k > 1, but simultaneously the largest computational cost. The main computational cost (CPU time, ...) of CV arises from retraining the classifier. For leave-one-out CV we need to train the network N times, where N is the size of the training set. Thus, the average computation time for leave-one-out is of order N times the average training time.
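For illustration, the leave-one-out estimator can be sketched in Python as follows. The hooks train_fn and predict_fn are hypothetical stand-ins for whatever classifier is used; the experiments in this paper used SNNS-trained MLPs, not this code:

```python
import numpy as np

def leave_one_out_cv(X, y, train_fn, predict_fn):
    # L^CV(X_N): retrain on X_N^(n) for every n and average the 0-1 loss
    # on the held-out pattern x_n. Costs N retraining runs.
    X, y = np.asarray(X), np.asarray(y)
    N = len(y)
    errors = 0
    for n in range(N):
        keep = np.arange(N) != n             # index set of X_N^(n)
        model = train_fn(X[keep], y[keep])   # retrain without pattern n
        errors += int(predict_fn(model, X[n]) != y[n])
    return errors / N
```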

IV. Cross-Validation with Active Pattern Selection

The expected value of the leave-one-out CV estimate L^{CV} for uniform costs (l_{n,n} = 0, l_{n,m} = 1) given a training set X_N can be written as

$$ E \, L^{CV}(X_N) = \frac{1}{N} \sum_{n=1}^{N} p_n $$

with misclassification probabilities

$$ p_n := P\left\{ g(x_n \mid X_N^{(n)}) \neq C(x_n) \right\} $$

where P is taken with respect to initialization and training (presentation order). A neural network is usually initialized with random weights before backpropagation training, hence g(x_n | X_N^{(n)}) is a random variable. The probabilities p_n may be seen as the "importance" of the x_n for CV, or as a measure of how "easily" the classifier learns these examples given X_N^{(n)}. E.g., the p_n may be high near the class boundaries and small near the class centers. If the example patterns are drawn i.i.d. from their environmental distribution F, the patterns misclassified in their CV run, i.e., patterns with g(x_n | X_N^{(n)}) ≠ C(x_n), are uniformly distributed within the set X_N.

We want to reduce the computational cost of CV by leaving out not all examples, but only the examples in a subset X_N^K of size K < N. We define a new CV estimate as

$$ L^{CV}(X_N; X_N^K) = \frac{1}{N} \sum_{k=1}^{K} l\left( C(x_{n_k}), \, g(x_{n_k} \mid X_N^{(n_k)}) \right). \quad (1) $$

The expected value of this CV estimate is

$$ E \, L^{CV}(X_N; X_N^K) = \frac{1}{N} \sum_{k=1}^{K} p_{n_k} \quad (2) $$

hence the introduced error is minimal if we use the K patterns with the highest p_{n_k}. As the p_n are unknown, we cannot construct an optimal subset X_N^K directly. We will use some heuristics and the capability of multi-layer perceptrons to estimate the a posteriori probabilities of the classes to construct the subset X_N^K.

A. Output of the Trained Network

Multi-layer perceptrons (MLPs) are known to be asymptotically equivalent to the optimal Bayesian classifier, given that sufficiently many hidden units are available and that training uses sufficient amounts of data [8], [9]. We use an MLP with d input nodes and c output nodes with outputs y(x) = (y_1(x), ..., y_c(x)) given input x; further, let 0 ≤ y_n ≤ 1, e.g., by use of a sigmoid activation function in the output nodes. The number of hidden nodes has to be sufficient to approximate the a posteriori probabilities of the classes; how to find an optimal architecture is beyond the scope of this paper.

Each class is assigned one output node and the network is trained using target vectors t(x) = (t_1(x), ..., t_c(x)), where

$$ t_{C(x)}(x) = 1, \qquad t_n(x) = 0 \quad \forall n \neq C(x). $$

The network is trained to minimize the quadratic error

$$ e(x) = \sum_n \left( y_n(x) - t_n(x) \right)^2. $$

The outputs y_n then converge to the empirical a posteriori probabilities P(C_n | x) given input x; see, e.g., [9] for details. Suppose the network has been successfully trained, thus y_n(x) ≈ P(C_n | x). We can rewrite the output error as

$$ e(x) = \left( 1 - y_{C(x)}(x) \right)^2 + \sum_{n \neq C(x)} y_n(x)^2. \quad (3) $$

The output error will be small for inputs x with low uncertainty, i.e., for inputs where the a posteriori probability of class C(x) is near 1 and the probabilities of the other classes nearly vanish. On the other hand, if x comes from a region where two or more classes overlap, then y_{C(x)}(x) will be significantly smaller than 1 and the second term in (3) will increase, too. If the network does not have sufficient approximation capabilities, because there are too few hidden nodes, the error will also be larger.

We propose to use the output error of each pattern for the construction of a set X_N^K that is not optimal, but hopefully good. Our algorithm for cross-validation on K actively selected patterns (CV/APS, CV with active pattern selection) works as follows (a code sketch is given at the end of this subsection):

1. Train the NN classifier using training set X_N.
2. Set k = 1 and E_n^1 = e(x_n) for all n.
3. Find the unused pattern x_{n_k} ∉ {x_{n_1}, ..., x_{n_{k−1}}} having maximal error

$$ E_{n_k}^{k} = \max_{n \notin \{n_1, \ldots, n_{k-1}\}} E_n^{k}. $$

4. Train the network using training set X_N^{(n_k)}.
5. Set E_n^{k+1} := E_n^k + e(x_n) for all n.
6. Set k := k + 1 and repeat from 3 until k = K.

The summation E_n^k of the errors e(x_n) over all training processes should provide independence from individual training runs, i.e., the information about the "importance" of each example becomes more accurate after every training cycle. Prior to actually excluding patterns, information is gained from K independent training runs. E_n^k / k is the average final output error, hence our algorithm excludes training patterns with small expected error E e(x_n), where the expectation is taken with respect to the initialization of the network weights.

The stopping condition "create a subset of size K" may be replaced by more flexible strategies, such as the construction of subsets containing the patterns whose E_n^k is at least as big as q percent of the maximum E_n^k.
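A minimal Python sketch of the CV/APS loop, assuming generic hooks train_fn, predict_fn, and output_error_fn (the per-pattern quadratic output errors e(x_n)); these names are ours for illustration and do not correspond to any particular library:

```python
import numpy as np

def cv_aps(X, y, train_fn, predict_fn, output_error_fn, K):
    # Steps 1-2: train on the full set X_N, initialize E_n^1 = e(x_n).
    X, y = np.asarray(X), np.asarray(y)
    N = len(y)
    model = train_fn(X, y)
    E = np.array(output_error_fn(model, X, y), dtype=float)
    selected = []                            # indices n_1, ..., n_{k-1}
    cv_errors = 0
    for _ in range(K):
        masked = E.copy()
        masked[selected] = -np.inf           # step 3: restrict to unused
        n_k = int(np.argmax(masked))         # pattern with maximal E_n^k
        selected.append(n_k)
        keep = np.arange(N) != n_k
        model_k = train_fn(X[keep], y[keep]) # step 4: train on X_N^(n_k)
        cv_errors += int(predict_fn(model_k, X[n_k]) != y[n_k])
        E += output_error_fn(model_k, X, y)  # step 5: accumulate e(x_n)
    return cv_errors / N                     # equation (1) divides by N
```

Note that the estimate is divided by N rather than K, in accordance with equation (1); the Summary discusses the resulting downward bias.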

B. Accumulated Training Error

The algorithm proposed in Section IV-A above uses only the fact that MLPs estimate the a posteriori probabilities of the classes. The final output error of the trained network is used, but no information about the training process itself. Consider an MLP trained by backpropagation (BP). Let e^l(x_n) denote the error of pattern x_n after backprop training epoch l (one training epoch equals one training cycle in which each pattern is presented once to the network and the corresponding errors e^l(x_n) are back-propagated), and let

$$ \tilde{e}(x_n) = \sum_l e^l(x_n) $$

be the accumulated training error. Then ẽ should be high for patterns contributing much to the training process. On the other hand, patterns with small ẽ contributed less to the training and were somehow easier to learn. Thus, we use the heuristic that p_n should be higher for patterns with high ẽ and smaller for patterns with low ẽ. We define a variant of the CV/APS algorithm, called CV/APS 2, by replacing e with ẽ (see the sketch below):

1. Train the NN classifier using training set X_N.
2. Set k = 1 and Ẽ_n^1 = ẽ(x_n) for all n.
3. Find the unused pattern x_{n_k} ∉ {x_{n_1}, ..., x_{n_{k−1}}} having maximal error

$$ \tilde{E}_{n_k}^{k} = \max_{n \notin \{n_1, \ldots, n_{k-1}\}} \tilde{E}_n^{k}. $$

4. Train the network using training set X_N^{(n_k)}.
5. Set Ẽ_n^{k+1} := Ẽ_n^k + ẽ(x_n) for all n.
6. Set k := k + 1 and repeat from 3 until k = K.

Ẽ_n^k / k represents the average area under the learning curve of pattern x_n.
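The only new ingredient relative to CV/APS is collecting the per-epoch errors during backpropagation. A speculative sketch, with init_fn, epoch_fn, and error_fn as hypothetical hooks into the training loop (the original experiments used SNNS, not this code):

```python
import numpy as np

def train_with_accumulated_error(X, y, init_fn, epoch_fn, error_fn, epochs):
    # Trains for a fixed number of BP epochs while recording the
    # accumulated training error e~(x_n) = sum_l e^l(x_n); CV/APS 2 then
    # ranks patterns by e~ instead of the final output error e.
    model = init_fn()                      # random initial weights
    e_tilde = np.zeros(len(y))
    for _ in range(epochs):
        model = epoch_fn(model, X, y)      # one pass over all patterns
        e_tilde += error_fn(model, X, y)   # add this epoch's e^l(x_n)
    return model, e_tilde
```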

V. Experiments

CV/APS and CV/APS 2 were tested on two artificial benchmark problems and on five real world data sets. In most of the experiments below we used slightly oversized networks; this way we avoid getting stuck in local minima too often. (The goal of the simulations was to demonstrate CV/APS on some architecture, hence it was not necessary to use an optimal architecture.) All simulations were performed on a Linux 486 PC using the Stuttgart Neural Network Simulator (SNNS).

The outcome of leave-one-out CV is a series of N 0's and 1's, where 0 stands for patterns correctly classified when left out and 1 for patterns misclassified. L^{CV}(X_N) then simply equals the number of 1's divided by the number of patterns. If the patterns x_n are drawn i.i.d. from the environmental distribution, the 1's are distributed uniformly over the whole series. CV/APS tries to find the 1's as soon as possible by shuffling the order of the patterns such that the 1's are more likely to occur at the beginning of the CV series.

A. Artificial Data

CV/APS was tested on two benchmark problems used, e.g., by [10] for the estimation of NN misclassification rates. The first problem is the "circle in a square" (see Figure 1) and was learned by an MLP with 2 input, 10 hidden, and 2 output nodes (2-10-2). The second problem is a continuous generalization of the well-known XOR problem and was learned by a 2-4-2 MLP. In both problems, 200 input patterns are uniformly distributed on the square −1 ≤ x, y ≤ 1 and classified according to Figure 1.

Fig. 1. Circle in a square (left) and continuous XOR problem (right).

Tables I and II show the results of the simulations on the circle and XOR problems, respectively. For each problem, 40 training sets of size N = 200 were created. The first block of each table shows the average apparent and actual misclassification rates, the average normal leave-one-out CV estimate L^{CV}, and the 10-fold CV estimate.

TABLE I
Results on circle problem, N = 200, 40 repeats.

    apparent                  1.40
    actual                    3.99
    L^CV                      3.25
    10-fold                   5.75

                          CV/APS    CV/APS 2
    L^CV, K = N/2           3.25       3.25
    L^CV, K = N/4           3.25       3.20
    last (avg.)            18.50      23.20
    last (75%)             24.00      35.00
    last (max.)            33.00      52.00

TABLE II
Results on continuous XOR problem, N = 200, 40 repeats.

    apparent                  5.80
    actual                    7.83
    L^CV                      7.45
    10-fold                  10.20

                          CV/APS    CV/APS 2
    L^CV, K = N/2           7.40       7.40
    L^CV, K = N/4           6.70       6.60
    last (avg.)            68.70      73.70
    last (75%)             86.00      94.00
    last (max.)           104.00     102.00

The apparent misclassification rate is, as expected, too small; the 10-fold estimate is too high, whereas leave-one-out CV gives a useful estimate.

The second block of Tables I and II shows the CV/APS and CV/APS 2 estimates. First a subset half the size of the training set (K = N/2 = 100) and then a subset only a quarter the size of the training set (K = N/4 = 50) were used; thus we reduce the computational costs to 50% and 25% of the original costs. With K = N/2 we introduce no error at all for the circle problem, where the CV/APS estimates equal the normal CV estimate; for the XOR case only a small error (7.40 compared to 7.45) is introduced.

The third block of Tables I and II shows when the last misclassified pattern was found by CV/APS. In the circle case, on average we would have to use only K = 18.5 (K = 23.20 for CV/APS 2) to get the same misclassification estimate as with normal CV. Hence, on average we could reduce the computational costs by approximately 90%. The third quartiles are 24 (CV/APS) and 35 (CV/APS 2), respectively; hence, in 30 of the 40 simulations the last misclassified pattern was found before these numbers. The worst cases were 33 (CV/APS) and 52 (CV/APS 2). The results for the XOR case are slightly worse than for the circle case, but the worst cases of 102 and 104 show that we could use only half the training set and introduce almost no error at all. For both problems, CV/APS worked better than CV/APS 2.

Fig. 2. Five simulations on the circle problem, normalized error E_{n_k}^k / max_n E_n^k versus k.

Fig. 3. Accumulated error E_n^{201} after 201 training runs (original run plus 200 CV runs) for the circle problem. Patterns near the class boundary are assigned higher importance than those in the inner regions of the classes.

Figure 2 shows, for five simulations, the sum of the errors E_{n_k}^k at the time of the selection of the k-th pattern. To get output between zero and one, E_{n_k}^k is normalized by the maximum E_n^k at this stage; this value may thus be interpreted as the contribution of the pattern relative to the most important pattern. The circles at the bottom left mark patterns that were misclassified during their CV run; e.g., in the first simulation (lowest row of circles), the misclassified patterns were found at positions k = 2, 4, 6, 10 and 12. With normal CV, misclassified patterns are distributed uniformly within the training set, thus the circles would be distributed uniformly along the axis between 1 and 200. Figures 2 and 4 clearly show how CV/APS rearranges the patterns such that misclassified patterns are at the beginning of the training set. They also show the correlation between the probability of misclassification (the "density of circles") and E_{n_k}^k.

B. Real World Data

We also tested CV/APS on several real world data sets. The ship data set was provided by DSTO Australia and is not freely available. The other data sets are standard benchmarks from the UCI repository of machine learning databases at www.ics.uci.edu/~mlearn/MLRepository.html.

Ship: This data set consists of radar measurements of six types of ships. The inputs are 19-dimensional vectors and should be classified according to the type of ship. A training set of size N = 480 was available for this problem and we used a 19-4-6 MLP as classifier.

Heart Disease: This is a data set of size N = 297, where each pattern consists of 13 features. The goal is to detect the absence or presence of heart disease in the patient. We used an MLP with 13 inputs, two outputs and no hidden layer.

Glass: This data set uses 10 features (chemical analysis, etc.) to classify 6 different types of glass. The size of the training set was N = 214 and we used a 10-5-6 MLP.

Vehicle: Classify a given silhouette as one of four types of vehicle, using a set of 18 features extracted from the silhouette. The size of the training set was N = 400 and we used an 18-6-4 MLP.

Breast Cancer: Classify each pattern as benign or malignant based on 9 features. The size of the training set was N = 400 and we used a 9-4-2 MLP.

Table III shows the results on the real world problems. Note that the actual error is of course unknown. CV/APS and CV/APS 2 work well with K = N/2 in all cases, and even choosing K = N/4 introduces only a small error in all problems except vehicle.

TABLE III
Error estimates on real world data sets. The columns contain the error on the training set (app), leave-one-out (L^CV) cross-validation, 10-fold cross-validation, CV/APS with K = N/2 and K = N/4, and CV/APS 2 with K = N/2 and K = N/4.

                                            CV/APS          CV/APS 2
                   app    L^CV  10-fold    N/2    N/4      N/2    N/4
    ship           3.54   8.12   10.62     8.12   7.91     8.12   7.91
    heart disease 15.15  18.18   19.86    18.18  17.84    18.18  17.50
    glass          1.87   5.60    6.94     5.60   5.60     5.60   4.67
    vehicle        8.50  23.00   23.25    22.50  16.25    22.50  16.75
    breast cancer  1.61   3.37    4.39     3.37   3.37     3.37   3.37

Figure 4 shows the sum of the errors E_{n_k}^k at the time of the selection of the k-th pattern for the vehicle and the breast cancer problems. Again, the distribution of the misclassified patterns (circles at the bottom) and E_{n_k}^k are highly correlated. Any stopping rule based on the value of E_{n_k}^k would stop CV/APS much earlier for the breast cancer problem than for the vehicle problem.

Fig. 4. Simulations on the breast cancer and vehicle classification problems, normalized error E_{n_k}^k / max_n E_n^k versus k.

VI. Summary

A, to our knowledge, new approach for leave-one-out cross-validation of neural network classifiers has been proposed: cross-validation with active pattern selection (CV/APS). The contribution of the training patterns to network learning is estimated and this information is used for active selection of CV patterns. On two artificial and five real world examples, the computational cost of CV could be drastically reduced with only small or no errors.

The proposed algorithms seem to be a promising starting point for further analysis. More refined stopping criteria should be investigated, i.e., how many training patterns can be left out without losing too much information. In our examples, stopping CV/APS at a value of E_{n_k}^k / max_n E_n^k = 0.05 works very well (see Figures 2 and 4); a sketch of such a rule is given below.

The CV/APS estimates are based on K patterns, but the sums (1) and (2) are divided by N. Hence, CV/APS strictly underestimates the true leave-one-out estimate. One could try to bound this bias by estimating the misclassification probability p_{n_K} of the last pattern left out. Another technique could be to combine CV/APS and 10-fold CV: the former strictly underestimates leave-one-out, the latter usually overestimates it. Hence, a combination of both could give bounds on the leave-one-out estimate at much smaller computational cost. Finally, CV/APS gives a measure of pattern "importance" which may be used for training set reduction as a data preprocessing stage.
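The 0.05 threshold mentioned above could be implemented, e.g., as follows. This is a speculative sketch, with E being the array of accumulated errors E_n^k and selected the list of indices already used as CV patterns, as maintained by the CV/APS loop sketched in Section IV:

```python
import numpy as np

def should_stop(E, selected, threshold=0.05):
    # Stop CV/APS once the largest accumulated error E_n^k among the
    # still unused patterns falls below `threshold` times the overall
    # maximum max_n E_n^k (0.05 worked well in the examples above).
    remaining = np.delete(E, selected)
    return remaining.max() / E.max() < threshold
```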

Acknowledgments

Friedrich Leisch's research at the Knowledge-based Engineering Systems Group, University of South Australia, was supported by a Kurt Gödel scholarship of the Austrian Federal Ministry of Science and Research. Thanks are due to Ray Johnson from DSTO Australia for providing the ship classification data set.

References

[1] M. Stone, "Cross-validatory choice and assessment of statistical predictions", Journal of the Royal Statistical Society B, vol. 36, no. 2, pp. 111-147, 1974.
[2] H. Akaike, "Information theory and an extension of the maximum likelihood principle", in 2nd International Symposium on Information Theory, B. N. Petrov and F. Csaki, Eds., Akademiai Kiado, Budapest, Hungary, 1973.
[3] M. Stone, "An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion", Journal of the Royal Statistical Society B, vol. 39, no. 1, pp. 44-47, 1977.
[4] M. Plutowski, Selecting training exemplars for neural network learning, PhD thesis, University of California, San Diego, USA, 1994.
[5] A. Röbel, "The dynamic pattern selection algorithm: Effective training and controlled generalization of backpropagation neural networks", Tech. Rep., Technische Universität Berlin, Germany, 1994.
[6] M. Plutowski and H. White, "Selecting concise training sets from clean data", IEEE Transactions on Neural Networks, vol. 4, no. 2, pp. 305-318, 1993.
[7] C. Cachin, "Pedagogical pattern selection strategies", Neural Networks, vol. 7, no. 1, pp. 175-181, 1994.
[8] D. W. Ruck, S. K. Rogers, M. Kabrisky, M. E. Oxley, and B. W. Suter, "The multilayer perceptron as an approximation to a Bayes optimal discriminant function", IEEE Transactions on Neural Networks, vol. 1, no. 4, pp. 296-298, December 1990.
[9] J. B. Hampshire and B. A. Pearlmutter, "Equivalence proofs for multi-layer perceptron classifiers and the Bayesian discriminant function", Tech. Rep., Carnegie Mellon University, Pittsburgh, USA, 1994.
[10] C. R. Reeves and J. C. O'Brien, "Estimation of misclassification rates in neural network applications", School of Mathematical and Information Sciences, Coventry University, UK, 1993.
[11] J. H. Gennari, P. Langley, and D. Fisher, "Models of incremental concept formation", Artificial Intelligence, vol. 40, pp. 11-61, 1989.

Friedrich Leisch was born in Vienna, Austria. He received his M.Sc. degree in applied mathematics in 1993 from the Technische Universität Wien, Vienna, Austria. Currently, he is a Research and Teaching Assistant with the Department of Statistics and Probability Theory of the Technische Universität Wien, Vienna, Austria. Friedrich Leisch spent the academic year 1995 as an Invited Researcher at the Knowledge-Based Intelligent Engineering Systems Group (KES), University of South Australia, Adelaide, Australia. His general research interests are in statistics and neural networks, including classification, adaptive resampling and model selection.

Lakhmi C. Jain received his B.E. (Hons), M.E. and Ph.D. degrees in Electronic Engineering. He is a founding director of the Knowledge-Based Intelligent Engineering Systems Group (KES), located in the Faculty of Information Technology, University of South Australia, Adelaide, Australia. Prof. Jain is one of the Editors-in-Chief of the International Journal of Knowledge-Based Intelligent Engineering Systems and serves as an Associate Editor of the IEEE Transactions on Industrial Electronics. His interests focus on the use of novel techniques such as knowledge-based systems, artificial neural networks, fuzzy systems and genetic algorithms in engineering systems, and in particular on the application of these techniques to solving practical engineering problems. He holds a joint patent (with Mr Udina) on the Skylight Light Intensity Data Logger.

Kurt Hornik was born in Vienna, Austria. He received the M.Sc. and Ph.D. degrees in applied mathematics in 1985 and 1987, respectively, both from the Technische Universität Wien, Vienna, Austria. Currently, he is an Associate Professor with the Department of Statistics and Probability Theory of the Technische Universität Wien, Vienna, Austria. Dr. Hornik's general interests lie in the areas of computational intelligence, statistics, and biostatistics. Specific research areas of current interest include adaptive feature extraction algorithms and pattern recognition using neural networks.
