Model Selection Tutorial #1: Akaike’s Information Criterion

Daniel F. Schmidt and Enes Makalic

Melbourne, November 22, 2008

Daniel F. Schmidt and Enes Makalic

Model Selection with AIC

Motivation Estimation AIC

Derivation References

Content

1 Motivation
2 Estimation
3 AIC
4 Derivation
5 References


Problem

We have observed n data points y^n = (y_1, ..., y_n) from some unknown probabilistic source p*, i.e. y^n ∼ p*, where y^n ∈ Y^n. We wish to learn about p* from y^n; more precisely, we would like to discover the generating source p*, or at least a good approximation of it, from nothing but y^n.


Statistical Models

To approximate p* we will restrict ourselves to a set of potential statistical models. Informally, a statistical model can be viewed as a conditional probability distribution over the potential dataspace Y^n:

$$p(y^n \mid \theta), \quad \theta \in \Theta$$

where θ = (θ_1, ..., θ_p) is a parameter vector that indexes the particular model. Such models satisfy

$$\int_{y^n \in \mathcal{Y}^n} p(y^n \mid \theta)\, dy^n = 1$$

for a fixed θ.


Statistical Models ...

An example would be the univariate normal distribution:

$$p(y^n \mid \theta) = \left(\frac{1}{2\pi\tau}\right)^{n/2} \exp\left(-\frac{1}{2\tau}\sum_{i=1}^{n}(y_i - \mu)^2\right)$$

where
p = 2
θ = (µ, τ) are the parameters (µ the mean, τ the variance)
Y^n = R^n
Θ = R × R+



Parameter Estimation

Given a statistical model and data y^n, we would like to take a guess at a plausible value of θ. The guess should be ‘good’ in some sense. There are many ways to approach this problem; we shall discuss one particularly relevant and important method: Maximum Likelihood.


Method of Maximum Likelihood (ML), Part 1

A heuristic procedure introduced by R. A. Fisher. It possesses good properties in many cases, and is very general and easy to understand. To estimate the parameters θ of a statistical model from y^n, solve

$$\hat{\theta}(y^n) = \arg\max_{\theta \in \Theta} \left\{ p(y^n \mid \theta) \right\}$$

or, more conveniently,

$$\hat{\theta}(y^n) = \arg\min_{\theta \in \Theta} \left\{ -\log p(y^n \mid \theta) \right\}$$



Method of Maximum Likelihood (ML), Part 2

Example: estimating the mean parameter µ of a univariate normal distribution. The negative log-likelihood function is

$$L(\mu, \tau) = \frac{n}{2}\log(2\pi\tau) + \frac{1}{2\tau}\sum_{i=1}^{n}(y_i - \mu)^2$$

Differentiating L(·) with respect to µ yields

$$\frac{\partial L(\mu,\tau)}{\partial \mu} = \frac{1}{2\tau}\left(2n\mu - 2\sum_{i=1}^{n} y_i\right)$$

Setting this to zero and solving for µ yields

$$\hat{\mu}(y^n) = \frac{1}{n}\sum_{i=1}^{n} y_i$$
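To make this concrete, here is a minimal Python sketch (our own illustration, not from the slides) that computes the closed-form estimates on simulated data and checks them against a direct numerical minimisation of the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=100)   # simulated data: mu = 2, tau = 2.25

def neg_log_lik(theta, y):
    """Negative log-likelihood L(mu, tau) of the univariate normal model."""
    mu, tau = theta
    if tau <= 0:
        return np.inf                           # keep the optimiser inside Theta
    n = len(y)
    return 0.5 * n * np.log(2 * np.pi * tau) + np.sum((y - mu) ** 2) / (2 * tau)

# Closed-form ML estimates (the sample mean, and the variance with divisor n)
mu_hat = y.mean()
tau_hat = np.mean((y - mu_hat) ** 2)

# Numerical check: minimise the negative log-likelihood directly
res = minimize(neg_log_lik, x0=[0.0, 1.0], args=(y,), method="Nelder-Mead")
print(mu_hat, tau_hat)   # closed form
print(res.x)             # numerical optimum; should agree to several decimals
```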


Univariate Polynomial Regression

A more complex model: k-th order polynomial regression. Let each y(x) be distributed as a univariate normal with variance τ and mean

$$\mu(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_k x^k$$

The parameters of this model are θ^(k) = (τ, β_0, ..., β_k). In this model the data y^n is associated with known covariates x^n. Given an order k, maximum likelihood can be used to estimate θ^(k). But it cannot be used to provide a suitable estimate of the order k!


Univariate Polynomial Regression

If we let

$$\hat{\mu}^{(k)}(x) = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 x^2 + \cdots + \hat{\beta}_k x^k$$

then Maximum Likelihood chooses β̂^(k)(y^n) to minimise

$$\hat{\tau}^{(k)}(y^n) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{\mu}^{(k)}(x_i)\right)^2$$

This is called the residual variance. The minimised negative log-likelihood obtained by plugging in the Maximum Likelihood estimates is

$$L(y^n \mid \hat{\theta}^{(k)}(y^n)) = \frac{n}{2}\log\left(2\pi\hat{\tau}^{(k)}(y^n)\right) + \frac{n}{2}$$
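Least squares coincides with ML here (normal noise), so the whole fit can be sketched in a few lines of Python. fit_poly and the simulated sample are our own illustration, with the generating curve borrowed from the ‘truth’ used in the figures below:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(-1.0, 1.0, size=n)
# Generating curve matching the 'truth' on the next slides, with tau = 1 noise
y = 9.7*x**5 + 0.8*x**3 + 9.4*x**2 - 5.7*x - 2.0 + rng.normal(size=n)

def fit_poly(x, y, k):
    """ML fit of a k-th order polynomial; returns (beta_hat, tau_hat, neg_log_lik)."""
    n = len(y)
    beta = np.polyfit(x, y, deg=k)       # least squares == ML under normal noise
    resid = y - np.polyval(beta, x)
    tau_hat = np.mean(resid ** 2)        # residual variance (divisor n, not n-1)
    nll = 0.5 * n * np.log(2 * np.pi * tau_hat) + 0.5 * n
    return beta, tau_hat, nll

beta_hat, tau_hat, nll = fit_poly(x, y, k=5)
print(tau_hat, nll)
```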

Method of Maximum Likelihood (ML), Part 4

[Figure: n data points drawn from the ‘truth’ µ(x) = 9.7x^5 + 0.8x^3 + 9.4x^2 − 5.7x − 2, τ = 1; y plotted against x ∈ [−1, 1]]

[Figure: polynomial fit, k = 2, τ̂^(2)(y) = 4.6919]

[Figure: polynomial fit, k = 5, τ̂^(5)(y) = 1.1388]

[Figure: polynomial fit, k = 10, τ̂^(10)(y) = 1.0038]

[Figure: polynomial fit, k = 20, τ̂^(20)(y) = 0.1612]


A problem with Maximum Likelihood

It is not difficult to show that

$$\hat{\tau}^{(0)} > \hat{\tau}^{(1)} > \hat{\tau}^{(2)} > \cdots > \hat{\tau}^{(n-1)}$$

and furthermore that τ̂^(n−1) = 0. From this it is obvious that attempting to estimate k using Maximum Likelihood will fail, i.e. the solution of

$$\hat{k} = \arg\min_{k \in \{0,\ldots,n-1\}} \left\{ \frac{n}{2}\log\left(2\pi\hat{\tau}^{(k)}(y^n)\right) + \frac{n}{2} \right\}$$

is simply k̂ = n − 1, irrespective of y^n.
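Continuing the sketch above, a quick loop makes the failure visible: the residual variance, and hence the minimised negative log-likelihood, can only decrease as k grows, so minimising over k always returns the largest order tried:

```python
# Reusing x, y and fit_poly from the earlier sketch
for k in range(0, 21):
    _, tau_k, nll_k = fit_poly(x, y, k)
    print(f"k = {k:2d}   tau_hat = {tau_k:8.4f}   neg-log-lik = {nll_k:8.2f}")
```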


Some solutions ...

The minimum encoding approach, pioneered by C.S. Wallace, D. Boulton and J.J. Rissanen.
The minimum discrepancy estimation approach, pioneered by H. Akaike.


Kullback-Leibler Divergence

AIC is based on estimating the Kullback-Leibler (KL) divergence:

$$\mathrm{KL}(f\,\|\,g) = \underbrace{-\int_{\mathcal{Y}^n} f(y^n)\log g(y^n)\,dy^n}_{\text{Cross-entropy}} + \underbrace{\int_{\mathcal{Y}^n} f(y^n)\log f(y^n)\,dy^n}_{\text{Negative entropy}}$$

The cross-entropy, ∆(f ||g), is the ‘expected negative log-likelihood’ of data coming from f under g.
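For a single observation the KL divergence between two normals has a closed form, which a short numerical sketch can verify (our own illustration; scipy assumed, τ denotes variance throughout):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_normal(mu1, tau1, mu2, tau2):
    """Closed-form KL( N(mu1, tau1) || N(mu2, tau2) ), tau = variance."""
    return 0.5 * (np.log(tau2 / tau1) + (tau1 + (mu1 - mu2) ** 2) / tau2 - 1.0)

# Numerical check by direct integration of f * log(f / g)
f = norm(loc=0.0, scale=1.0)           # f = N(0, 1)
g = norm(loc=1.0, scale=np.sqrt(2.0))  # g = N(1, 2)

integrand = lambda y: f.pdf(y) * (f.logpdf(y) - g.logpdf(y))
kl_numeric, _ = quad(integrand, -np.inf, np.inf)
print(kl_normal(0.0, 1.0, 1.0, 2.0), kl_numeric)  # should agree
```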

[Figure: cross-entropy for polynomial fits of order k = 0, . . . , 20, plotted against k]


Akaike’s Information Criterion

Problem: the KL divergence depends on knowing the truth (our p*). Akaike's solution: estimate it!


The AIC score for a model is

$$\mathrm{AIC}(\hat{\theta}(y^n)) = -\log p(y^n \mid \hat{\theta}(y^n)) + p$$

where p is the number of free model parameters. Using AIC, one chooses the model that solves

$$\hat{k} = \arg\min_{k \in \{0,1,\ldots\}} \left\{ \mathrm{AIC}(\hat{\theta}^{(k)}(y^n)) \right\}$$

(Note this is one half of the more common definition AIC = −2 log p(y^n | θ̂) + 2p; the two have the same minimiser.)
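In code, the criterion is one line. A hedged sketch follows; the candidate scores below are placeholders for illustration, not numbers from the slides:

```python
def aic(neg_log_lik, p):
    """AIC score in the halved convention used here: -log p(y^n | theta_hat) + p."""
    return neg_log_lik + p

# Choose among candidate models; each entry is (neg_log_lik, free parameters).
candidates = {"k=2": (80.3, 4), "k=5": (45.1, 7), "k=10": (44.0, 12)}
best = min(candidates, key=lambda m: aic(*candidates[m]))
print(best)  # the model with the smallest AIC score
```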


Properties of AIC

Under certain conditions the AIC score satisfies

$$\mathbb{E}_{\theta^*}\left[\mathrm{AIC}(\hat{\theta}(y^n))\right] = \mathbb{E}_{\theta^*}\left[\Delta(\theta^* \,\|\, \hat{\theta}(y^n))\right] + o_n(1)$$

where o_n(1) → 0 as n → ∞. In words, the AIC score is an asymptotically unbiased estimate of the cross-entropy risk; this means it is only valid if n is ‘large’.


Properties of AIC

AIC is good for prediction. It is an asymptotically efficient model selection criterion: in words, as n → ∞, with probability approaching one, the model with the minimum AIC score will also possess the smallest Kullback-Leibler divergence. However, it is not necessarily the best choice for induction.



Conditions for AIC to apply

AIC is an asymptotic approximation; one should consider whether it applies before using it. For AIC to be valid:
n must be large compared to p
the true model must satisfy θ* ∈ Θ
every θ ∈ Θ must map to a unique distribution p(·|θ)
the Maximum Likelihood estimates must be consistent and approximately normally distributed for large n
L(θ) must be twice differentiable with respect to θ for all θ ∈ Θ


Some models to which AIC can be applied include ...

Linear regression models, function approximation
Generalised linear models
Autoregressive Moving Average models, spectral estimation
Constant bin-width histogram estimation
Some forms of hypothesis testing



When not to use AIC

Multilayer Perceptron Neural Networks: many different θ map to the same distribution.
The Neyman-Scott problem, mixture modelling: the Maximum Likelihood estimates are not consistent.
The uniform distribution: L(θ) is not twice differentiable.

The AIC approach may still be applied to these problems, but the derivations need to be different.


Application to polynomials

The AIC criterion for polynomials is

$$\mathrm{AIC}(k) = \frac{n}{2}\log\left(2\pi\hat{\tau}^{(k)}(y^n)\right) + \frac{n}{2} + (k+2)$$

[Figure: AIC(k) plotted against k = 0, . . . , 20]
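Putting the earlier pieces together, AIC order selection is a small change to the failed ML loop (reusing x, y, n and fit_poly from the sketches above):

```python
import numpy as np

def aic_poly(k, tau_hat, n):
    """Halved-AIC score for a k-th order polynomial; p = k + 2 free parameters."""
    return 0.5 * n * np.log(2 * np.pi * tau_hat) + 0.5 * n + (k + 2)

scores = {k: aic_poly(k, fit_poly(x, y, k)[1], n) for k in range(0, 21)}
k_hat = min(scores, key=scores.get)
print("AIC selects k =", k_hat)
```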

[Figure: the fitted polynomial of the selected order; on this dataset AIC selects k̂ = 3]


Improvements to AIC

For some model types it is possible to derive improved estimates of the cross-entropy. Under certain conditions, the ‘corrected’ AIC (AICc) criterion

$$\mathrm{AICc}(\hat{\theta}(y^n)) = -\log p(y^n \mid \hat{\theta}(y^n)) + \frac{n(p+1)}{n-p-2}$$

satisfies

$$\mathbb{E}_{\theta^*}\left[\mathrm{AICc}(\hat{\theta}(y^n))\right] = \mathbb{E}_{\theta^*}\left[\Delta(\theta^* \,\|\, \hat{\theta}(y^n))\right]$$

In words, it is an exactly unbiased estimator of the cross-entropy, even for finite n.
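A corresponding sketch in the same halved convention (our own; note the formula requires n > p + 2):

```python
def aicc(neg_log_lik, p, n):
    """Halved corrected-AIC: -log p(y^n | theta_hat) + n(p + 1) / (n - p - 2)."""
    assert n > p + 2, "AICc needs n > p + 2"
    return neg_log_lik + n * (p + 1) / (n - p - 2)
```

As n → ∞ the penalty n(p + 1)/(n − p − 2) tends to p + 1, i.e. the AIC penalty plus a constant, so the two criteria select the same model for large n.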


Application to polynomials

The AICc criterion for polynomials is

$$\mathrm{AICc}(k) = \frac{n}{2}\log\left(2\pi\hat{\tau}^{(k)}(y^n)\right) + \frac{n}{2} + \frac{n(k+2)}{n-k-3}$$

[Figure: criterion score plotted against k = 0, . . . , 20 for both AIC and AICc]


Using AICc

AICc tends to perform better than AIC, especially when n/p is small. It is theoretically only valid for homoskedastic linear models; these include:
Linear regression models, including linear function approximation
Autoregressive Moving Average (ARMA) models
Linear smoothers (kernel, local regression, etc.)

In practice, it tends to perform well as long as the model class is suitably regular.


Some theory

Let k* be the true number of parameters, and assume that the model space is nested. There are two sources of error (discrepancy) in model selection:
Discrepancy due to approximation: the main source of error when underfitting, i.e. when k̂ < k*
Discrepancy due to estimation: the source of error when exactly fitting or overfitting, i.e. when k̂ ≥ k*


Discrepancy due to Approximation

[Figure: the true curve and the best fitting cubic]

Discrepancy due to Estimation

[Figure: the true curve with lower and upper confidence intervals for the fitted curve]


Derivation

The aim is to show that

$$\mathbb{E}_{\theta^*}\left[L(y^n \mid \hat{\theta})\right] + p = \mathbb{E}_{\theta^*}\left[\Delta(\theta^* \,\|\, \hat{\theta})\right] + o_n(1)$$

Note that (under certain conditions)

$$\mathbb{E}_{\theta^*}\left[\Delta(\theta^* \,\|\, \hat{\theta})\right] = \Delta(\theta^* \,\|\, \theta_0) + \frac{1}{2}(\hat{\theta}-\theta_0)'\, J(\theta_0)\, (\hat{\theta}-\theta_0) + o_n(1)$$

... and

$$\Delta(\theta^* \,\|\, \theta_0) = \mathbb{E}_{\theta^*}\left[L(y^n \mid \hat{\theta})\right] + \frac{1}{2}(\hat{\theta}-\theta_0)'\, H(\hat{\theta})\, (\hat{\theta}-\theta_0) + o_n(1)$$

where

$$J(\theta_0) = \left[\frac{\partial^2 \Delta(\theta^* \,\|\, \theta)}{\partial\theta\, \partial\theta'}\right]_{\theta=\theta_0}, \qquad H(\hat{\theta}) = \left[\frac{\partial^2 L(y^n \mid \theta)}{\partial\theta\, \partial\theta'}\right]_{\theta=\hat{\theta}}$$

and θ_0 is the value of θ ∈ Θ minimising ∆(θ* || θ) (the ‘pseudo-true’ parameter).



Derivation

Since

$$\mathbb{E}_{\theta^*}\left[\frac{1}{2}(\hat{\theta}-\theta_0)'\, J(\theta_0)\, (\hat{\theta}-\theta_0)\right] = \frac{p}{2} + o_n(1)$$

$$\mathbb{E}_{\theta^*}\left[\frac{1}{2}(\hat{\theta}-\theta_0)'\, H(\hat{\theta})\, (\hat{\theta}-\theta_0)\right] = \frac{p}{2} + o_n(1)$$

then, substituting,

$$\mathbb{E}_{\theta^*}\left[\Delta(\theta^* \,\|\, \hat{\theta})\right] = \left(\mathbb{E}_{\theta^*}\left[L(y^n \mid \hat{\theta})\right] + \frac{p}{2} + o_n(1)\right) + \frac{p}{2} + o_n(1) = \mathbb{E}_{\theta^*}\Bigl[\,\underbrace{L(y^n \mid \hat{\theta}) + p}_{\mathrm{AIC}(\hat{\theta})}\,\Bigr] + o_n(1)$$
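The slides do not spell out why each quadratic form contributes p/2. A brief sketch of the standard argument (our own addition): asymptotically θ̂ − θ_0 is normal with covariance J(θ_0)^{-1} (the Hessians here absorb the factor of n), and H(θ̂) converges to J(θ_0), so

```latex
% For z ~ N(0, \Sigma) and fixed A:  E[z' A z] = tr(A \Sigma).
% With z = \hat{\theta} - \theta_0,  A = J(\theta_0),  \Sigma = J(\theta_0)^{-1}:
\mathbb{E}_{\theta^*}\!\left[(\hat{\theta}-\theta_0)'\, J(\theta_0)\, (\hat{\theta}-\theta_0)\right]
  \approx \operatorname{tr}\!\left(J(\theta_0)\, J(\theta_0)^{-1}\right)
  = \operatorname{tr}(I_p) = p
```

Halving gives the p/2 terms above; the same argument applies with H(θ̂) in place of J(θ_0).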



References

S. Kullback and R. A. Leibler, ‘On Information and Sufficiency’, The Annals of Mathematical Statistics, Vol. 22, No. 1, pp. 79–86, 1951.
H. Akaike, ‘A new look at the statistical model identification’, IEEE Transactions on Automatic Control, Vol. 19, No. 6, pp. 716–723, 1974.
H. Linhart and W. Zucchini, Model Selection, John Wiley and Sons, 1986.
C. M. Hurvich and C. Tsai, ‘Regression and Time Series Model Selection in Small Samples’, Biometrika, Vol. 76, pp. 297–307, 1989.
J. E. Cavanaugh, ‘Unifying the Derivations for the Akaike and Corrected Akaike Information Criteria’, Statistics & Probability Letters, Vol. 33, pp. 201–208, 1997.
