IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 2, FEBRUARY 2003
Order Selection for Vector Autoregressive Models

Stijn de Waele and Piet M. T. Broersen
Abstract—Order-selection criteria for vector autoregressive (AR) modeling are discussed. The performance of an order-selection criterion is optimal if the model of the selected order is the most accurate model in the considered set of estimated models: here, vector AR models. Suboptimal performance can be a result of underfit or overfit. The Akaike information criterion (AIC) is an asymptotically unbiased estimator of the Kullback–Leibler discrepancy (KLD) that can be used as an order-selection criterion. AIC is known to suffer from overfit: The selected model order can be greater than the optimal model order. Two causes of overfit are finite sample effects and asymptotic effects. As a consequence of finite sample effects, AIC underestimates the KLD for higher model orders, leading to overfit. Asymptotically, overfit is the result of statistical variations in the order-selection criterion. To derive an accurate order-selection criterion, both causes of overfit have to be addressed. Moreover, the cost of underfit has to be taken into account. The combined information criterion (CIC) for vector signals is robust to finite-sample effects and has the optimal asymptotic penalty factor. This penalty factor is the result of a tradeoff of underfit and overfit. The optimal penalty factor depends on the number of estimated parameters per model order. The CIC is compared to other criteria such as the AIC, the corrected Akaike information criterion (AICc), and the consistent minimum description length (MDL).

Index Terms—Multivariate time series analysis, order selection, selection bias.

Manuscript received October 17, 2001; revised September 24, 2002. This work was supported by the Dutch Technology Foundation (STW) under Contract DTN 4758. The associate editor coordinating the review of this paper and approving it for publication was Dr. Alexi Gorokhov. The authors are with the Signals, Systems, and Control Group, Delft University of Technology, Delft, The Netherlands (e-mail: [email protected]). Digital Object Identifier 10.1109/TSP.2002.806905
I. INTRODUCTION
DETERMINATION of the model order is an important step in vector autoregressive (AR) modeling. In this paper, we consider order selection for vector AR models estimated from observations of a vector-valued time series. Automatic order selection using statistical order-selection criteria was first introduced by Akaike [1]. A typical application of vector AR models is clutter suppression in airborne radar signal processing, where the colored clutter signal is whitened by inverse filtering with an estimated vector AR filter [2].

The performance of an order-selection criterion is optimal if the model of the selected order is the most accurate model in the considered set of estimated models. Note that this is not necessarily the true model order. If the true process is AR(10), where the last six parameters are insignificant, the estimated AR(4) model will be the most accurate. Order selection can fail in two ways: by selecting a model order that is too low or a model order that is too high. These phenomena are called underfit and overfit, respectively.

The Akaike information criterion (AIC) is known to suffer from overfit [3], [4]. As a result, a lot of attention has been given to reducing overfit in order selection, and several solutions to the overfit problem have been proposed. One solution is to choose a very conservative value for the maximum candidate model order, which strongly reduces the finite sample overfit. For vector AR models estimated from observations of a vector-valued time series, the maximum model order should then be kept small relative to the number of observations per element [5]. However, the optimal model order may well be greater than this maximum candidate order. As a result, this restriction can reduce the performance of AR modeling [6]. A corrected AIC (AICc) has been introduced based on a different elaboration of asymptotic results [7]. Although asymptotically equivalent to AIC, simulations have shown that more accurate order selection is achieved with this criterion [8]. Another solution is the use of consistent criteria with a penalty factor that increases with the number of observations [4]. A consistent criterion is the minimum description length (MDL) criterion [9] or the equivalent Bayesian information criterion (BIC), where the penalty factor is set to ln N. Asymptotically, MDL works well if the true process is a finite-order AR process with very significant parameters. In practice, many processes are of a more complex nature, typically AR(∞). Consistent criteria do not perform very well for these complex processes [10].

As mentioned before, not only overfit but also underfit may lead to reduced accuracy of order selection. The combined information criterion (CIC) for scalar signals [11] is based on a trade-off of underfit and overfit. It is a combination of the finite sample information criterion (FSIC) and an asymptotic order-selection criterion with penalty factor 3. In this paper, we derive the CIC criterion for vector signals and compare it with existing criteria. Throughout, we take into account the possibility of order selection for partial prediction. Partial prediction means predicting $m'$ elements ($m' \le m$) of the vector $x_n$ from previous observations. An application of partial prediction is found in control [12]. We discuss the case where the model order is determined based on the given data only. If reliable knowledge about the optimal order is available from previous experiments, this should be taken into account.

In Section II, some definitions for vector AR time series analysis are given. In Section III, overfit in order selection is discussed. In Section IV, the combined information criterion CIC for vector signals is derived. This new criterion is compared with other criteria in a simulation experiment described in Section V.
II. DEFINITIONS

A discrete-time vector time series $x_n$ is a vector from a vector space as a function of the integer variable $n$. The dimension of $x_n$ is $m$; components of the vector are denoted $x_n(i)$. An AR($p$) vector process is a stationary stochastic signal $x_n$ that is generated by the following difference equation:

$$x_n = \sum_{i=1}^{p} A_i x_{n-i} + \epsilon_n. \qquad (1)$$

The $A_i$ are the AR parameter linear mappings from the vector space onto itself and are abbreviated to "AR parameters." Each linear mapping can be fully characterized by an $m \times m$ matrix [13, p. 84]. The number of AR parameters $p$ is the AR order. The generating signal $\epsilon_n$ is a white noise signal with covariance matrix $\Sigma_\epsilon$. The variance of $\epsilon_n$ is given by the trace of its covariance matrix:

$$\sigma_\epsilon^2 = \operatorname{tr} \Sigma_\epsilon. \qquad (2)$$

The covariance matrix of $x_n$ is denoted $\Sigma_x$. The auto- and cross correlations and spectra can be calculated from the AR parameters. A useful representation of the AR parameters are the partial correlations $P_i$ [14]. The requirement of stationarity of the AR model can be expressed in the partial correlations as

$$\bar{\sigma}(P_i) < 1, \qquad i = 1, \dots, p \qquad (3)$$

which means that all singular values of the $P_i$ are smaller than 1.

The error of an estimated model is evaluated by using the estimated model for prediction. The one-step-ahead predictor $\hat{x}_n$ for an estimated AR($k$) model with parameters $\hat{A}_1, \dots, \hat{A}_k$ is given by

$$\hat{x}_n = \sum_{i=1}^{k} \hat{A}_i x_{n-i}. \qquad (4)$$

The prediction error signal $\hat{\epsilon}_n$ is given by

$$\hat{\epsilon}_n = x_n - \hat{x}_n. \qquad (5)$$
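To make these definitions concrete, the following minimal NumPy sketch simulates a vector AR process according to (1) and evaluates the one-step-ahead prediction error (4)-(5) for a given parameter set. The dimension, order, and parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_var(A, Sigma_eps, N, rng):
    """Simulate x_n = sum_i A[i] x_{n-1-i} + eps_n, cf. (1)."""
    m, p = A[0].shape[0], len(A)
    L = np.linalg.cholesky(Sigma_eps)
    x = np.zeros((N + p, m))
    for n in range(p, N + p):
        eps = L @ rng.standard_normal(m)
        x[n] = sum(A[i] @ x[n - 1 - i] for i in range(p)) + eps
    return x[p:]

def prediction_error_cov(x, A_hat):
    """Sample covariance of the one-step-ahead prediction error (4)-(5)."""
    p = len(A_hat)
    err = []
    for n in range(p, len(x)):
        x_pred = sum(A_hat[i] @ x[n - 1 - i] for i in range(p))
        err.append(x[n] - x_pred)
    err = np.asarray(err)
    return err.T @ err / len(err)

rng = np.random.default_rng(0)
A_true = [np.array([[0.5, 0.1], [0.0, 0.3]])]   # illustrative AR(1) parameter matrix
Sigma = np.eye(2)
x = simulate_var(A_true, Sigma, N=500, rng=rng)
PE = prediction_error_cov(x, A_true)            # close to Sigma for the true model
print(PE)
```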
The model error (ME) is a normalized version of the one-step-ahead prediction error (PE)

$$\mathrm{ME} = N \operatorname{tr}\!\left[\Sigma_\epsilon^{-1}\,\mathrm{PE} - I_m\right] \qquad (6)$$

with the prediction error matrix

$$\mathrm{PE} = E\!\left[\hat{\epsilon}_n \hat{\epsilon}_n^{T}\right]. \qquad (7)$$

Here, $\Sigma_\epsilon$ is assumed to be of full rank. This is a generalization of the model error for scalar signals [15]. It is the first-order Taylor approximation of the Kullback–Leibler discrepancy (KLD) for normally distributed processes [16]. An equally accurate approximation of the KLD is found using the determinant instead of the trace in the ME:

$$\mathrm{ME}_{\det} = N \ln\frac{\det \mathrm{PE}}{\det \Sigma_\epsilon}. \qquad (8)$$

With partial prediction, we are interested in predicting $m'$ elements ($m' \le m$) of $x_n$ based on all elements of the previous observations. Without loss of generality, the discussion is restricted to prediction of the first $m'$ elements of $x_n$ so that

$$x'_n = \left[x_n(1), \dots, x_n(m')\right]^{T}. \qquad (9)$$

Order selection for partial prediction was first discussed by Akaike [17]. An application is the prediction error method of system identification, where one is interested in predicting the output of a system based on previous values of the input and the output [18]. The model error for partial prediction is defined as

$$\mathrm{ME}' = N \operatorname{tr}\!\left[(\Sigma'_\epsilon)^{-1}\,\mathrm{PE}' - I_{m'}\right] \qquad (10)$$

with $\mathrm{PE}'$ and $\Sigma'_\epsilon$ the $m' \times m'$ submatrices of $\mathrm{PE}$ and $\Sigma_\epsilon$ corresponding to the predicted elements (11). Asymptotically, the expectation of the model error is equal to the number of estimated parameters for unbiased models (model order $k$ at least equal to the true order):

$$E\!\left[\mathrm{ME}'(k)\right] = k\,m\,m'. \qquad (12)$$

For the selection of the model order, several order-selection criteria are available. The formulation of the criteria is adapted to order selection for partial prediction. An order-selection criterion can be based on the fit RES($k$) of the estimated AR($k$) model to the data plus a penalty for the number of estimated parameters. The residual is given by

$$\mathrm{RES}(k) = \ln \det \hat{\Sigma}'_\epsilon(k) \qquad (13)$$

where $\hat{\Sigma}_\epsilon(k)$ is the estimate of the covariance matrix of the generating white noise $\epsilon_n$ of the estimated AR($k$) model; $\hat{\Sigma}'_\epsilon(k)$ is related to $\hat{\Sigma}_\epsilon(k)$ in the same way as $\Sigma'_\epsilon$ is related to $\Sigma_\epsilon$ [see (11)]. The order-selection criteria that will be discussed are the following.

• The generalized information criterion (GIC), which is defined as

$$\mathrm{GIC}(k, \alpha) = \mathrm{RES}(k) + \alpha\,\frac{k\,m\,m'}{N}. \qquad (14)$$

• The AIC [14]

$$\mathrm{AIC}(k) = \mathrm{GIC}(k, 2). \qquad (15)$$

• The MDL or BIC [9]

$$\mathrm{MDL}(k) = \mathrm{GIC}(k, \ln N). \qquad (16)$$

• The AICc [7] for partial prediction, which replaces the fixed penalty of AIC applied to RES($k$) by a finite-sample corrected penalty (17).

• The CIC for vector signals, which combines FSIC with the optimal asymptotic penalty factor (18).

The expression for CIC is given here for the sake of completeness; it will be discussed in detail in Section IV. The selected model order is the model order for which an order-selection criterion is minimal:

$$\hat{k} = \arg\min_{k \le k_{\max}} \mathrm{GIC}(k, \alpha) \qquad (19)$$

and analogously for the other criteria. For standard modeling ($m' = m$), the residual can be expressed in terms of the estimated partial correlations $\hat{P}_i$ (20).
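As an illustration of (13)-(16) and (19), the sketch below evaluates a GIC-type criterion from a list of estimated residual covariance matrices and selects the minimizing order. How the residual covariances are obtained (e.g., with a Nuttall-Strand or least-squares fit) is left out; the toy covariance values and helper names are assumptions for illustration only.

```python
import numpy as np

def gic(res_cov, k, alpha, N, r):
    """GIC(k, alpha) = RES(k) + alpha*k*r/N with RES(k) = ln det of the
    estimated residual covariance, cf. (13)-(14); r parameters per order."""
    return np.log(np.linalg.det(res_cov)) + alpha * k * r / N

def select_order(res_covs, N, r, penalty):
    """Selected order = argmin of the criterion over all candidate orders (19)."""
    values = [gic(c, k, penalty, N, r) for k, c in enumerate(res_covs)]
    return int(np.argmin(values))

# Toy residual covariances for orders 0..4: real improvement up to order 2,
# only marginal decreases beyond that (illustrative numbers, not from the paper).
N, m, r = 200, 2, 4
res_covs = [np.eye(m) * s for s in (1.30, 1.10, 1.00, 0.995, 0.993)]
print("AIC selects order", select_order(res_covs, N, r, penalty=2.0))        # (15)
print("MDL selects order", select_order(res_covs, N, r, penalty=np.log(N)))  # (16)
```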
The most common estimators for vector AR models are the Yule–Walker, least squares, and Nuttall–Strand estimators [14]. The Nuttall–Strand or multivariate Burg algorithm estimates the partial correlations $\hat{P}_i$ directly from the data. Unlike the least squares estimate, the resulting model is guaranteed to be stationary since the estimated partial correlations satisfy the inequality (3). The estimate does not contain the triangular bias that is present in the Yule–Walker estimate [19]. Therefore, the Nuttall–Strand estimator deserves a preference.

III. OVERFIT IN ORDER SELECTION

In this section, it will be shown that overfit can be explained from the statistical properties (bias and variance) of the order-selection criterion. The optimal model order is the order for which the model error ME is lowest. Two causes of overfit can be distinguished: finite sample effects and asymptotic effects. The range of model orders where the number of estimated parameters is smaller than 0.1 times the total number of observations $Nm$ is considered as the asymptotic regime. This results in the following restriction of the model order $k$:

$$\text{Asymptotic regime:}\quad k\,m\,m' < 0.1\,N\,m. \qquad (21)$$

A. Finite Sample Overfit

The AIC is an asymptotically unbiased estimator of the KLD. This asymptotic approximation is only valid as long as the number of estimated parameters is small compared with the number of observations. For high model orders, the expectation of AIC is too low. As a result, a very high model order will be selected with a very poor accuracy [5], [7]. The solution to the problem of finite sample overfit has been discussed before [7], [11]; finite sample overfit is briefly discussed here to provide a complete survey of the overfit problem and to show the different mechanisms governing finite sample overfit and asymptotic overfit.

Finite sample overfit can be removed by using an estimator for the KLD that has a lower bias for higher model orders. Exact calculations of finite sample effects are not viable. However, sufficiently accurate results can be obtained by using simulation results. Finite sample estimators of the KLD are the corrected AIC, AICc, and the FSIC [20]. The generalization of FSIC to vector signals replaces the asymptotic penalty applied to RES($k$) by a penalty built from finite sample variance coefficients $v_i$ (22). The $v_i$ are the finite sample variance coefficients that have been determined from simulation experiments. They contain the statistical finite sample behavior of a particular estimator. For the Nuttall–Strand estimator, the $v_i$ are given by one over the number of degrees of freedom per element that is available for the estimation of $P_i$ [21] (23). The advantage of using FSIC is that the observed difference in the behavior of different estimators is reflected in the criterion.
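The following sketch indicates how finite sample variance coefficients could enter an FSIC-like penalty. Both the product form of the penalty and the expression used for the v_i are placeholders assumed for illustration; the coefficients actually used in (22)-(23) are determined empirically for the Nuttall-Strand estimator [21] and are not reproduced here.

```python
import numpy as np

def fsic_like(res_logdet, v):
    """A finite-sample criterion of the form RES(k) + penalty(v_1..v_k).

    res_logdet : ln det of the residual covariance of the AR(k) model.
    v          : finite-sample variance coefficients v_1..v_k (assumed values).
    The product form of the penalty is an assumption for illustration.
    """
    v = np.asarray(v, dtype=float)
    penalty = np.prod((1.0 + v) / (1.0 - v)) - 1.0
    return res_logdet + penalty

# Placeholder coefficients: one over an assumed number of degrees of freedom
# per element remaining at order i (illustrative only).
N, m, k = 200, 2, 5
v_assumed = [1.0 / (N / m - i) for i in range(1, k + 1)]
print(fsic_like(res_logdet=0.0, v=v_assumed))
```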
B. Asymptotic Overfit

1) Introduction: Asymptotic overfit is the effect that the selected model order is greater than the optimal model order even if the maximum number of estimated parameters is small with respect to the total number of observations $Nm$. With AIC, the probability of selecting a model order greater than the true order is considerable, even if the maximum model order is within the asymptotic regime [3]. As a result, AIC does not provide a consistent estimate of the true model order [4].

In the asymptotic regime, AIC provides an unbiased estimate of the KLD. Here, overfit is a result of statistical variations in the order-selection criterion. In the next subsection, the cost of overfit as a result of these statistical variations will be calculated using asymptotic approximations. However, we will first show that this cause of overfit is indeed mainly restricted to the asymptotic regime. Only after having established this does an asymptotic calculation become meaningful.

A typical simulation example is a two-dimensional (2-D) vector moving average (MA) process of order 1, given by

$$x_n = \epsilon_n + B\,\epsilon_{n-1} \qquad (24)$$

where $\epsilon_n$ is white noise with covariance matrix $\Sigma_\epsilon$. As is often the case for practical processes, this example process cannot be described exactly by a finite-order AR model. The number of observations is 200 per element. Overfit as a result of statistical fluctuations occurs when the increment of the order-selection criterion is comparable with its standard deviation. The increment $\Delta\mathrm{FSIC}$ is defined as

$$\Delta\mathrm{FSIC}(k) = \mathrm{FSIC}(k) - \mathrm{FSIC}(k-1). \qquad (25)$$

The expectation of the increment of FSIC and its standard deviation as a function of the model order for the simulation example are given in Fig. 1. As can be seen in the figure, the magnitudes of $E[\Delta\mathrm{FSIC}]$ and its standard deviation are comparable only for low model orders. Therefore, overfit as a result of statistical fluctuations is indeed restricted to the asymptotic regime. In addition, the example shows that statistical fluctuations are not very influential for the model orders where the estimated parameters are very significant.

The performance of an order-selection criterion should be measured by determining the error of the selected model. An important factor determining the cost of overfit is selection bias [22]. Consider selection between an estimated AR($k$) model and an estimated AR($k+1$) model from an AR($k$) process. Due to the penalty for the additional parameters in AIC, order $k+1$ is only selected if the residual reduction is greater than the average reduction. Since the true parameter is zero, the larger parameter value results in a large model error ME. Therefore, we find that the expectation of AIC($k+1$), given that $k+1$ is the selected order, is smaller than the a priori expectation of AIC($k+1$):

$$E\!\left[\mathrm{AIC}(k+1)\,\middle|\,\hat{k} = k+1\right] < E\!\left[\mathrm{AIC}(k+1)\right]. \qquad (26)$$
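The behavior described above can be reproduced with a short Monte Carlo sketch: generate a 2-D MA(1) process as in (24), fit vector AR models of increasing order, and inspect how the residual log-determinant decreases with the order. The MA parameter matrix, the least-squares fit, and all numerical values are illustrative assumptions; the paper uses the Nuttall-Strand estimator and FSIC.

```python
import numpy as np

def simulate_ma1(B, N, rng):
    """x_n = eps_n + B eps_{n-1}, unit-variance white noise (values illustrative)."""
    eps = rng.standard_normal((N + 1, B.shape[0]))
    return eps[1:] + eps[:-1] @ B.T

def fit_var_ls(x, k):
    """Least-squares AR(k) fit; returns ln det of the residual covariance."""
    N, m = x.shape
    if k == 0:
        res = x
    else:
        X = np.hstack([x[k - 1 - i : N - 1 - i] for i in range(k)])
        Y = x[k:]
        coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
        res = Y - X @ coef
    S = res.T @ res / len(res)
    return np.log(np.linalg.det(S))

rng = np.random.default_rng(2)
B = np.array([[0.6, 0.2], [-0.1, 0.4]])           # illustrative MA(1) parameter
x = simulate_ma1(B, N=200, rng=rng)
logdets = [fit_var_ls(x, k) for k in range(11)]
increments = np.diff(logdets)                      # change in RES per extra order
print(np.round(increments, 4))
```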
Fig. 1. Absolute increment of the finite sample information criterion ΔFSIC and its standard deviation as a function of the model order k.
At the same time, the model error for the selected order is greater than the a priori expectation:

$$E\!\left[\mathrm{ME}(k+1)\,\middle|\,\hat{k} = k+1\right] > E\!\left[\mathrm{ME}(k+1)\right]. \qquad (27)$$

Since the model error and the KLD are asymptotically equivalent, it can be concluded that AIC does not provide an unbiased estimate of the KLD for the selected order. As a simulation example to show this effect, a sample of the MA(1) process (24) is used. The results are given in Fig. 2.

Fig. 2. A priori expectations of AIC and the model error ME compared with the expectations of AIC and ME given that the model order k is the selected model order.

2) Cost of Overfit: The calculation of the cost of overfit given here is a generalization of Shibata's calculation for scalar signals [3]. We will determine the cost of overfit as the expectation of the model error of a selected model in white noise. In the second-order Taylor approximation around $\hat{A}_i = 0$, GIC can be expressed in the parameters as

$$\mathrm{GIC}(k, \alpha) = \mathrm{RES}(0) - \sum_{i=1}^{k} \|\hat{A}_i\|_F^2 + \alpha\,\frac{k\,r}{N} \qquad (28)$$

where $\|\cdot\|_F$ is the Frobenius norm and $r$ is the number of parameters estimated per model order; the residual itself can be expressed in terms of the parameters as

$$\mathrm{RES}(k) = \mathrm{RES}(0) - \sum_{i=1}^{k} \|\hat{A}_i\|_F^2. \qquad (29)$$

The model error ME is given by

$$\mathrm{ME}(k) = N \sum_{i=1}^{k} \|\hat{A}_i\|_F^2. \qquad (30)$$

Note that a low value of GIC corresponds to a large value of ME because $\sum_i \|\hat{A}_i\|_F^2$ occurs with the opposite sign in (28) and (30). This explains the opposite behavior of AIC and ME for the selected order, as expressed by (26) and (27) and found in the simulation example (see Fig. 2). Using the asymptotic theory for parameter estimation, it can be shown that the estimated parameters are independent and normally distributed with zero mean and variance $1/N$. Therefore, the sum of squares $N \sum_i \|\hat{A}_i\|_F^2$ has a Chi-square distribution with $kr$ degrees of freedom, which is denoted $\chi^2_{kr}$.

We will determine the cost of overfit for a more general order-selection problem. Suppose models of increasing model order $k$ are estimated, and the number of estimated parameters of the model of order $k$ is $kr$. Therefore, $r$ is the number of additionally estimated parameters when the model order is increased by 1. The decrease of the residual is denoted $\Delta\mathrm{RES}$:

$$\Delta\mathrm{RES}(k) = \mathrm{RES}(k-1) - \mathrm{RES}(k) \qquad (31)$$

where $N\,\Delta\mathrm{RES}(k)$ has a Chi-square distribution with $r$ degrees of freedom. The GIC can be expressed in terms of the residual decreases as

$$\mathrm{GIC}(k, \alpha) = \mathrm{RES}(0) - \sum_{i=1}^{k} \Delta\mathrm{RES}(i) + \alpha\,\frac{k\,r}{N} \qquad (32)$$

while the ME is given by

$$\mathrm{ME}(k) = N \sum_{i=1}^{k} \Delta\mathrm{RES}(i). \qquad (33)$$

Standard time series analysis fits into this framework by using $\Delta\mathrm{RES}(i) = \|\hat{A}_i\|_F^2$; the number of additionally estimated parameters per order is then equal to $r = m\,m'$. The cost of overfit is the ME of the selected model:

$$C_o = E\!\left[\mathrm{ME}(\hat{k})\right]. \qquad (34)$$

By calculating the ME of the selected model, selection bias is automatically taken into account. To evaluate this expression, the following result, derived by Spitzer as a corollary from [23, Th. 3.1], is used.

Corollary (Spitzer): Given is a set of stochastic variables $S_k = \sum_{i=1}^{k} u_i$ with $S_0 = 0$, where the $u_i$ are independent identically distributed (i.i.d.). Then, the expectation of the maximum of $S_0, \dots, S_K$ is given by

$$E\!\left[\max_{0 \le k \le K} S_k\right] = \sum_{k=1}^{K} \frac{1}{k}\,E\!\left[\max(S_k, 0)\right]. \qquad (35)$$

Since this corollary considers the maximum of a set of stochastic variables and we take the minimum of GIC, we will look at minus GIC. This can be written as a sum of i.i.d. variables $u_i$:

$$-\mathrm{GIC}(k, \alpha) = -\mathrm{RES}(0) + S_k, \qquad S_k = \sum_{i=1}^{k} u_i, \quad u_i = \Delta\mathrm{RES}(i) - \alpha\,\frac{r}{N}. \qquad (36)$$

The ME can be expressed in terms of the $u_i$ as

$$\mathrm{ME}(k) = N S_k + \alpha\,r\,k. \qquad (37)$$
Since GIC is minimal for the selected model order, the value of $S_k$ for the selected model order is given by $\max_k S_k$. The selected order is denoted $\hat{k}$. Therefore, the expectation of the ME is given by

$$E\!\left[\mathrm{ME}(\hat{k})\right] = N\,E\!\left[\max_{0 \le k \le K} S_k\right] + \alpha\,r\,E[\hat{k}]. \qquad (38)$$

The first contribution is zero if the selected order is $\hat{k} = 0$. Using the corollary of Spitzer, the first contribution can be evaluated (39). Using the fact that $N S_k + \alpha r k$ has a Chi-square distribution with $kr$ degrees of freedom (40) and some straightforward rearrangement, this contribution can be written in closed form (41).

Fig. 3. Cost of overfit C_o for AIC (penalty α = 2) as a function of the additional number of estimated parameters per model order r.

As in the derivation of Shibata [3], we now apply Spitzer's corollary to calculate the expectation of $\hat{k}$; the result is given by (42). Combining (41) and (42) and letting the maximum candidate order $K$ tend to infinity yields the expectation of the ME for the selected model, i.e., the asymptotic cost of overfit $C_o(\alpha, r)$ (43). The cost of overfit as a function of $r$ for AIC ($\alpha = 2$) is plotted in Fig. 3. Examples of order-selection problems and their corresponding additional number of estimated parameters per order are scalar AR ($r = 1$) and standard 2-D AR ($r = 4$).
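The asymptotic cost of overfit can also be estimated directly by Monte Carlo instead of via Spitzer's identity: draw the scaled residual decreases as independent Chi-square variables as in (31)-(33), select the order that minimizes GIC, and average the ME of the selected model. This is a sketch under the white-noise assumption of this subsection; the function and variable names are ours.

```python
import numpy as np

def overfit_cost(alpha, r, k_max=50, trials=20000, rng=None):
    """Monte Carlo estimate of E[ME(selected order)] when the truth is white noise.

    N*dRES(k) is drawn as a Chi-square variable with r degrees of freedom (31);
    GIC uses the cumulative decrease with penalty alpha*k*r/N (32); ME is the
    cumulative decrease itself (33). N cancels out of the selection and is set to 1.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    d = rng.chisquare(r, size=(trials, k_max))             # N*dRES(k) for k = 1..k_max
    cum = np.cumsum(d, axis=1)
    gic = np.hstack([np.zeros((trials, 1)),
                     -cum + alpha * r * np.arange(1, k_max + 1)])
    k_hat = np.argmin(gic, axis=1)                          # selected order, 0..k_max
    me = np.where(k_hat > 0, cum[np.arange(trials), np.maximum(k_hat - 1, 0)], 0.0)
    return me.mean()

print(overfit_cost(alpha=2.0, r=1))   # cost of overfit of AIC, scalar AR (r = 1)
print(overfit_cost(alpha=2.0, r=4))   # cost of overfit of AIC, full 2-D AR (r = 4)
```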
IV. COMBINED INFORMATION CRITERION

In the previous section, it was shown that finite sample overfit can be prevented by using an unbiased estimator for the KLD. However, an unbiased estimator for the KLD results in a considerable cost of asymptotic overfit. The cost of asymptotic overfit can be reduced by increasing the penalty factor $\alpha$. An example of such an order-selection criterion is the MDL [see (16)], which has a penalty factor $\ln N$. This results in accurate order selection for finite-order AR processes with very significant parameters. Otherwise, increasing the penalty factor will introduce underfit. Underfit means that the selected model order is lower than the optimal model order, thus missing significant parameters. In this section, an optimal asymptotic penalty factor $\alpha_{\mathrm{opt}}$ is calculated by making a trade-off of underfit and overfit. Finally, the CIC for vector signals is introduced. This order-selection criterion has the optimal penalty factor in the asymptotic regime and takes finite sample effects into account for higher model orders.

A. Cost of Underfit

The cost of underfit is determined by the model error that results from missing a critical parameter $P_c$. A critical parameter for the penalty factor $\alpha$ is defined as the parameter for which the expectation of GIC is equal for the AR($k$) model that includes the critical parameter and the AR($k-1$) model that does not include this parameter. Using the asymptotic Taylor approximation (28), the expectation of GIC can be written as

$$E\!\left[\mathrm{GIC}(k, \alpha)\right] = E\!\left[\mathrm{GIC}(k-1, \alpha)\right] - E\!\left[\|\hat{P}_c\|_F^2\right] + \alpha\,\frac{r}{N}. \qquad (44)$$

The expectation of the norm of the estimated partial correlation is given by (45). Using the fact that the parameter estimates are asymptotically unbiased with variance $1/N$ per element, this becomes

$$E\!\left[\|\hat{P}_c\|_F^2\right] = \|P_c\|_F^2 + \frac{r}{N}. \qquad (46)$$

Substituting this expression and $E[\mathrm{GIC}(k-1, \alpha)]$ in (44) and equalizing the two expectations yields

$$\|P_c\|_F^2 = (\alpha - 1)\,\frac{r}{N}. \qquad (47)$$
Including the true $P_c$ in the AR($k$) model results in a decrease of the model error with respect to the AR($k-1$) model of $N\|P_c\|_F^2 = (\alpha - 1)r$ for including $P_c$, and an increase of $r$ as a result of inaccuracy in the estimated parameters. Therefore, the total cost of underfit is given by

$$C_u(\alpha, r) = (\alpha - 1)\,r - r = (\alpha - 2)\,r. \qquad (48)$$

The norm of the critical parameter decreases as $1/N$. This reflects that, as the number of observations is increased, smaller details become significant because parameter estimation becomes more accurate. Not including this small parameter in the selected model therefore contributes to the cost of underfit.

B. Optimal Penalty Factor

Given the expressions for the cost of overfit (43) and the cost of underfit (48), we will now define an optimal penalty factor. The optimal penalty factor $\alpha_{\mathrm{opt}}$ is defined as the penalty factor for which the maximum of the costs of underfit and overfit is minimal:

$$\alpha_{\mathrm{opt}} = \arg\min_{\alpha} \max\!\left[C_u(\alpha, r),\, C_o(\alpha, r)\right]. \qquad (49)$$

Since the cost of overfit is monotonically decreasing as a function of $\alpha$ and the cost of underfit is monotonically increasing, this amounts to equating $C_u$ and $C_o$:

$$C_u(\alpha_{\mathrm{opt}}, r) = C_o(\alpha_{\mathrm{opt}}, r). \qquad (50)$$

Fig. 4. Optimal penalty as a function of the additional number of estimated parameters per model order r.

The optimal penalty factor as a function of $r$ is plotted in Fig. 4. It is quite accurately approximated by a penalty that equals 3 for $r = 1$ and decreases toward 2 for large $r$ (51). Using this approximation, the order-selection criterion for partial prediction becomes $\mathrm{GIC}(k, \alpha_{\mathrm{opt}}(r))$ (52). This can be rearranged to yield

$$\mathrm{GIC}(k, \alpha_{\mathrm{opt}}) = \mathrm{AIC}(k) + \left[\alpha_{\mathrm{opt}}(r) - 2\right]\frac{k\,r}{N}. \qquad (53)$$

Using this order-selection criterion, an optimal trade-off of asymptotic underfit and overfit is made. However, GIC will suffer from finite sample overfit, as it does not take finite sample effects into account. The CIC has the optimal penalty factor in the asymptotic regime, whereas an unbiased estimate of the KLD is used in the finite sample regime. The CIC is given by FSIC with the additional asymptotic penalty term of (53):

$$\mathrm{CIC}(k) = \mathrm{FSIC}(k) + \left[\alpha_{\mathrm{opt}}(r) - 2\right]\frac{k\,r}{N}. \qquad (54)$$

This order-selection criterion will now be compared with existing order-selection criteria; the differences are illustrated with simulations. Compared to AIC, two differences are present. First, the asymptotic penalty factor is greater than the penalty 2 of AIC. This yields a better trade-off of asymptotic underfit and overfit. Moreover, CIC is not subject to finite sample overfit, as AIC is. The CIC deviates quite considerably from the MDL. The ever increasing penalty factor $\ln N$ in MDL can lead to a great cost of underfit, which is given by

$$C_u(\ln N, r) = (\ln N - 2)\,r \qquad (55)$$

as can be found by substituting the penalty factor of MDL in (48). This is a result of the fact that small parameters that could improve the model accuracy are not included in the selected model. This explains why MDL is less accurate for complex [AR(∞)] processes. For the Nuttall–Strand estimator, the performance of CIC differs from AICc only in the asymptotic regime. Both criteria are not subject to finite sample overfit. The difference in the asymptotic penalty factor is largest for small $r$; the cost of asymptotic overfit is small for large $r$.
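A sketch of how a CIC-style criterion could be assembled from the pieces above: a finite-sample penalty for the higher model orders plus the extra asymptotic term that raises the penalty factor above the value 2 of AIC. The approximation used here for the optimal penalty (2 + 1/r, which equals 3 for r = 1 and tends to 2 for large r) is an assumption for illustration, as are the toy numbers; the combination of FSIC with the extra asymptotic penalty term follows (54).

```python
import numpy as np

def optimal_penalty(r):
    """Assumed approximation of the optimal asymptotic penalty factor:
    3 for r = 1, decreasing toward 2 for large r (cf. Fig. 4)."""
    return 2.0 + 1.0 / r

def cic_like(res_logdet, k, N, r, fs_penalty):
    """CIC-style criterion (54): finite-sample penalized residual plus the extra
    asymptotic term (optimal_penalty - 2)*k*r/N that lifts the penalty above 2."""
    return res_logdet + fs_penalty + (optimal_penalty(r) - 2.0) * k * r / N

N, m, r = 200, 2, 4
# res_logdets[k] and fs_penalties[k] would come from the estimated AR(k) models
# and the estimator's finite-sample variance coefficients (illustrative values).
res_logdets  = [0.30, 0.10, 0.00, -0.005, -0.008]
fs_penalties = [0.00, 0.021, 0.043, 0.066, 0.090]
values = [cic_like(res_logdets[k], k, N, r, fs_penalties[k]) for k in range(5)]
print("selected order:", int(np.argmin(values)))
```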
V. SIMULATIONS

Simulations illustrate the differences between the various order-selection criteria. We will discuss three different cases:

• the MA(1) process given by (24);
• an AR(3) process with a moderate number of observations, with parameters given by (56);
• the same AR(3) process with a larger number of observations.

The ME of the selected model and the asymptotic penalty factor for a number of order-selection criteria are given in Table I. The maximum model order considered for selection equals 50. The first simulation example shows that the effect of finite sample overfit in AIC leads to a poor quality of the selected model. This type of overfit does not occur in the other two examples; here, the maximum model order is relatively small with respect to the number of observations. The second example shows that the cost of underfit in MDL can be large: the estimated AR(3) model typically is the most accurate model, yet MDL frequently selects the AR(0) model. In a theoretical analysis of order selection, it is often assumed that underfit does not occur. This assumption is only realistic if a larger number of observations of an AR process is available, as in the third simulation example. The CIC and AICc yield accurate models in all three examples; the model error is somewhat smaller for CIC. The third example, where underfit is absent, shows that the theoretical analysis of Section III accurately describes the asymptotic cost of overfit.
TABLE I. Selection quality of four order-selection criteria for vector AR models. The accuracy of the selected model is expressed using the ME (average of 10 000 simulations). Also given is the asymptotic penalty factor. AIC and AICc are equivalent in the asymptotic regime; the difference between these criteria occurs in the finite sample regime.
The asymptotic cost of overfit calculated using (43) for the 2-D AR(3) process is 2.23 for the penalty factor 2, as in AIC and AICc. Combining this cost of overfit with the cost of parameter estimation of 6 [see (12)] yields a total error of 8.23. For the penalty factor of 2.5 as used in CIC, a lower cost of overfit of 1.05 is predicted, yielding a total error of 7.05. Both predictions are in agreement with the simulation results.

VI. CONCLUDING REMARKS

The analysis of underfit and overfit in vector time series analysis has resulted in the CIC for vector signals. It has been shown that the optimal penalty factor depends on the additional number of parameters $r$ that is estimated if the model order is increased by 1. The optimal penalty factor decreases from 3 for $r = 1$ to the penalty factor 2, as in AIC, for large $r$.

The analysis of underfit and overfit, leading to the order-selection criterion CIC, can easily be extended to other order-selection problems such as linear regression. The finite sample behavior of an estimator can be determined from a simple simulation experiment. The assumption of the Chi-square distribution for the residuals in the calculation of the asymptotic cost of overfit is valid for a wide range of problems [24].

REFERENCES

[1] H. Akaike, "Fitting autoregressive models for prediction," Ann. Inst. Stat. Math., vol. 21, pp. 243–247, 1969.
[2] J. Li, G. Liu, and P. Stoica, "Moving target feature extraction for airborne high-range resolution phased-array radar," IEEE Trans. Signal Processing, vol. 49, pp. 277–289, Feb. 2001.
[3] R. Shibata, "Approximate efficiency of a selection procedure for the number of regression variables," Biometrika, vol. 71, no. 1, pp. 43–49, 1984.
[4] M. B. Priestley, Spectral Analysis and Time Series. London, U.K.: Academic, 1981.
[5] Y. Sakamoto, M. Ishiguro, and G. Kitagawa, Akaike Information Criterion Statistics. Tokyo, Japan: KTK, 1986.
[6] J. Roman, M. Rangaswamy, D. Davis, Q. Zhang, B. Himed, and J. Michels, "Parametric adaptive matched filter for airborne radar applications," IEEE Trans. Aerosp. Electron. Syst., vol. 36, pp. 677–691, Apr. 2000.
[7] C. Hurvich and C. Tsai, "A corrected Akaike information criterion for vector autoregressive model selection," J. Time Series Anal., vol. 14, no. 3, pp. 271–279, 1993.
[8] T. Subba Rao, Developments in Time Series Analysis. London, U.K.: Chapman and Hall, 1993, ch. 5, pp. 50–66.
[9] J. Rissanen, "Modeling by the shortest data description," Automatica, vol. 14, pp. 465–471, 1978.
[10] D. Anderson, K. Burnham, and G. White, "Comparison of AIC and CAIC for model selection and statistical inference from capture-recapture studies," J. Applied Statist., vol. 25, no. 2, pp. 263–282, 1998.
[11] P. M. T. Broersen, "Finite sample criteria for autoregressive order-selection," IEEE Trans. Signal Processing, vol. 48, pp. 3550–3558, Dec. 2000.
[12] H. Akaike and G. Kitagawa, Eds., The Practice of Time Series Analysis. New York: Springer, 1999, Statistics for Engineering and Physical Science.
[13] W. Greub, "Linear algebra," in Graduate Texts in Mathematics, fourth ed. New York: Springer, 1981.
[14] S. L. Marple, Digital Spectral Analysis With Applications. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[15] P. M. T. Broersen, "The quality of models for ARMA processes," IEEE Trans. Signal Processing, vol. 46, pp. 1749–1752, June 1998.
[16] S. de Waele and P. M. T. Broersen, "Finite sample effects in vector autoregressive modeling," IEEE Trans. Instrum. Meas., vol. 51, Oct. 2002, to be published.
[17] H. Akaike, "Autoregressive model fitting for control," Ann. Inst. Stat. Math., vol. 23, pp. 163–180, 1971.
[18] L. Ljung, System Identification: Theory for the User, second ed. Upper Saddle River, NJ: Prentice-Hall, 1999.
[19] J. Erkelens and P. Broersen, "Bias propagation in the autocorrelation method of linear prediction," IEEE Trans. Speech Audio Processing, vol. 5, pp. 116–119, Mar. 1997.
[20] P. Broersen and H. Wensink, "Autoregressive model order-selection by a finite sample estimator for the Kullback–Leibler discrepancy," IEEE Trans. Instrum. Meas., vol. 46, pp. 2058–2061, July 1998.
[21] S. de Waele and P. M. T. Broersen, "Finite sample effects in multichannel autoregressive modeling," in Proc. IMTC Conf., Budapest, Hungary, 2001, pp. 1–6.
[22] A. Miller, Subset Selection in Regression. London, U.K.: Chapman and Hall, 1990.
[23] F. Spitzer, "A combinatorial lemma and its application to probability theory," Trans. Amer. Math. Soc., vol. 82, pp. 323–339, 1956.
[24] S. Kullback, Information Theory and Statistics. London, U.K.: Wiley, 1959.
Stijn de Waele was born in Eindhoven, The Netherlands, in 1973. He received the M.Sc. degree in applied physics in 1998 from Delft University of Technology, Delft, The Netherlands, where he is currently pursuing the Ph.D. degree with the Department of Applied Physics. His research interests are the development of new time series analysis algorithms and their application to radar signal processing.
Piet M. T. Broersen was born in Zijdewind, the Netherlands, in 1944. He received the M.Sc. degree in applied physics in 1968 and the Ph.D. degree in 1976, both from the Delft University of Technology, Delft, the Netherlands. He is currently with the Department of Applied Physics, Delft University of Technology. His main research interest is automatic identification. He found a solution for the selection of order and type of time series models and the application to spectral analysis, model building, and feature extraction. His next subject is the automatic identification of input–output relations with statistical criteria.