Discrete choice models with non-linear externalities

discrete choice models with non-linear externalities: a numerical estimation method

Fabio Vanni
December 8, 2014

Institute of Economics, Scuola Superiore Sant'Anna, Pisa, Italy. Co-authors: Ugo Gragnolati and Prof. Giulio Bottazzi.

outline

This talk presents:
∙ a stochastic discrete choice model with non-linear terms,
∙ a method to carry out inferential analysis on its parameters,
∙ numerical techniques and applications to simulated and empirical data.


the model

interdependencies in the decisions of agents

• Individual choices are determined both by intrinsic features of the object and by the choices of other individuals.
∙ Consumers buy a product on the basis of its qualities and as a function of the number of other consumers already possessing it.
∙ Social media and consumer preferences.
∙ Firms choose a location on the basis of geography and of the presence of other firms in the area.


markov process and polya urn schemes

∙ Class of stochastic models such as the Ehrenfest-Brillouin model: a Markov process equivalent to an ergodic Polya process.
∙ Population of N agents choosing among L alternatives; each agent maximizes an individual utility which depends on the choices of the others.


sequential choices

The process is made up of a sequence of individual choices:
∙ Framework with L locations and N agents.

∙ An agent is selected at random to revise his choice.

∙ An alternative l is chosen by the agent with probability

  p_l = g_l / ∑_{l'=1}^{L} g_{l'}

∙ g = (g_1, . . . , g_L) is the vector of common utilities.
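The revision step above can be sketched in a few lines. This is an illustrative Python sketch, not the authors' Matlab code; the function names and the linear utility values are hypothetical.

```python
import random

def revise_choice(n, g):
    """One revision step of the sequential choice process:
    pick an agent at random, remove him from his current location,
    and reassign him to location l with probability g_l / sum(g)."""
    L = len(n)
    # the revising agent sits in location l with probability n_l / N
    old = random.choices(range(L), weights=n)[0]
    n[old] -= 1
    # common utilities evaluated at the configuration without that agent
    weights = [g(l, n[l]) for l in range(L)]
    new = random.choices(range(L), weights=weights)[0]
    n[new] += 1
    return n

# linear utility g_l = a_l + b * n_l with illustrative values
a = [1.0, 2.0, 1.5]
b = 0.1
n = [5, 3, 2]
n = revise_choice(n, lambda l, nl: a[l] + b * nl)
print(sum(n))  # the population is conserved: 10
```

The agent is removed before the utilities are evaluated, so n_l counts the other agents at each location, as in the Ehrenfest-Brillouin scheme.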

linear utility function

• The linear case¹: g_l = a_l + b·n_l
∙ a_l captures the intrinsic features of alternative l (local attractiveness)
∙ b captures the social externality of alternative l (social influence)
∙ n_l is the number of agents that selected alternative l

• Ergodic process:
∙ numerical averages converge asymptotically to the invariant distribution of the process.

• Analytical form of the equilibrium distribution:
∙ a closed-form likelihood function to evaluate the parameters that best fit the observations².

1. G. Bottazzi and A. Secchi, LEM Working Paper, September 2007; Bottazzi et al., 2008; Gragnolati, Regional Studies, 2013.
2. Bottazzi

non-linear externalities

• The non-linear case: g_l = a_l + b·n_l + c·n_l²
∙ c > 0: positive spillover (boost effect)
∙ c < 0: spatial congestion
∙ c = 0: recovers the linear case ("Polya case")

• No analytical form for the equilibrium distribution:
∙ study of the time evolution and trajectories of the configuration vector n(t) = (n_1(t), . . . , n_L(t))

• The process is still ergodic:
∙ numerical estimation of the equilibrium distribution
∙ time means as ensemble averages:

  lim_{T→∞} (1/T) ∑_{t=0}^{T} δ_{n_t, n} = π(n)
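Ergodicity is what makes the numerical route work: the equilibrium distribution π can be estimated by time-averaging along a single trajectory. A minimal Python sketch of this idea, using a hypothetical toy two-location process with linear utilities (not the paper's actual generator):

```python
import random
from collections import Counter

def estimate_pi(step, n0, T, burn_in=1000):
    """Estimate the equilibrium distribution pi(n) of an ergodic
    process by time-averaging indicator functions along one
    trajectory: pi(n) ~ (1/T) * sum_t delta(n_t, n)."""
    n = list(n0)
    counts = Counter()
    for _ in range(burn_in):      # discard the transient
        n = step(n)
    for _ in range(T):
        n = step(n)
        counts[tuple(n)] += 1     # delta(n_t, n)
    return {cfg: c / T for cfg, c in counts.items()}

# toy two-location revision step with utilities g_l = 1 + 0.1 * n_l
def step(n):
    old = random.choices(range(len(n)), weights=n)[0]
    n = n[:]
    n[old] -= 1
    w = [1.0 + 0.1 * nl for nl in n]
    new = random.choices(range(len(n)), weights=w)[0]
    n[new] += 1
    return n

pi = estimate_pi(step, [3, 3], T=20000)
print(abs(sum(pi.values()) - 1.0) < 1e-9)  # the estimate is a probability distribution
```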

small perturbations regime

Considerations on the non-linear case:
• Limits in the evaluation of the non-linear term:
∙ no closed-form likelihood function
∙ computational estimation in a high-dimensional space.

• Getting around the limits:
∙ from point estimation in terms of configuration vectors (ML approach)
∙ to aggregate estimation in terms of occupancy classes (χ² approach)
∙ small perturbation regime c ≈ 1/n

estimation & inferential analysis

framework for the hypothesis testing analysis

(a) The basic search procedure to find the "best" parameter c.
(b) Build the test statistic.
(a+b) Rejection of the null hypothesis (c = 0).

search procedure

(a) Grid-like procedure

[diagram: starting from the observed configuration n_o, for each candidate value c_1, . . . , c_K a trajectory n′(t_min; c_k) → n′(t_min + T; c_k) is simulated and compared to n_o through the objective function Θ]

1. Initial observed configuration n_o
2. equilibrium configuration for a given c
3. trajectory evolution
4. comparison measure Θ
5. another value of c
6. K independent trajectories
7. closeness among trajectories

  ĉ(n_o) = argmin_c { Θ( n′(c), n_o ) }

Point estimation: the value of c that generates a configuration closest to the observed n_o.

montecarlo statistics

(b) Test the statistical significance of ĉ:

1. Monte Carlo approach: build the test statistic from configurations n_1, . . . , n_S drawn under the null hypothesis (c = 0), via a Polya random generator.
2. Run the previous search procedure to find the first optimum point c̃(n_1).
3. For different seeds apply the same procedure: S replicas.
4. Build the test statistic from the S independent estimates c̃(n_1), . . . , c̃(n_S).

[figure: empirical distribution of the null estimates c̃, with two-sided thresholds −ĉ_2 and ĉ_2]
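With the S null estimates in hand, the two-sided p-value of the observed ĉ is just the fraction of null estimates at least as extreme. A minimal Python sketch with hypothetical numbers (the real estimates come from the Polya generator and the search procedure above):

```python
def two_sided_pvalue(c_hat, null_estimates):
    """Monte Carlo two-sided p-value: the fraction of null-model
    estimates at least as extreme (in absolute value) as c_hat."""
    S = len(null_estimates)
    extreme = sum(1 for c in null_estimates if abs(c) >= abs(c_hat))
    return extreme / S

# hypothetical null estimates c~(n_s) from S = 8 replicas under c = 0
null_c = [-0.4, 0.1, 0.3, -0.2, 0.05, -0.1, 0.2, -0.3]
p = two_sided_pvalue(0.35, null_c)
print(p)  # 0.125: one of the eight null estimates is as extreme as 0.35
```

If p falls below the chosen significance level, the null hypothesis c = 0 is rejected.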

complete hypothesis testing

[figure: the search procedure is run on the observed configuration n_o and, in parallel, on S configurations n_1, . . . , n_S generated by the null model (c = 0); each null run yields an estimate c̃(n_s), while the observation yields ĉ]

  ĉ(n_o) = argmin_c { Θ( n′(c), n_o ) }

1. the empirical distribution built from the c̃'s under the null hypothesis c = 0
2. two-sided p-value of ĉ(n_o)
3. reject or not the null hypothesis c = 0.

the objective function

objective function

• Θ compares two configurations of the system:
∙ χ² minimization approach.

• Distance measures in terms of occupancy classes:
∙ occupancy defined as the number of alternatives chosen by exactly n agents.

χ2 occupancy measure

∙ Classes C_j chosen to be equally populated
∙ Each bin contains the same number of occurrences

[Figures (a)-(c)]
(a) from the observed n_o, the binning is obtained from the cumulative distribution
(b) the histogram with respect to these classes is almost flat
(c) visual insight into the difference between configurations

comparing two occupation vectors

∙ The distance function used to compare configurations is the χ² statistic:

  Θ(n′, n_o) = χ² = ∑_{j=1}^{J} (h_{o,j} − h_{e,j})² / h_{e,j} ,   h_{e,j} := (1/T) ∑_{t=t_min}^{t_min+T} h_{t,j}

∙ h_{e,j}: the average frequency of class j over the time span T of the trajectory n′(c)
∙ h_{o,j}: the frequency of class j for the observed configuration n_o
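The two formulas above translate directly into code. An illustrative Python sketch with hypothetical occupancy histograms (the real h's come from the binning of the previous slide):

```python
def chi2_distance(h_obs, h_exp):
    """Occupancy-class chi-square distance between an observed
    frequency vector h_obs and the trajectory-averaged expected
    frequencies h_exp: sum_j (h_o,j - h_e,j)^2 / h_e,j."""
    return sum((o - e) ** 2 / e for o, e in zip(h_obs, h_exp))

def time_averaged_frequencies(history):
    """h_e,j = (1/T) * sum_t h_t,j over T snapshots of the
    occupancy histogram along a trajectory."""
    T = len(history)
    J = len(history[0])
    return [sum(h[j] for h in history) / T for j in range(J)]

# hypothetical occupancy histograms over J = 3 classes, T = 3 snapshots
history = [[4, 3, 3], [5, 2, 3], [3, 4, 3]]
h_e = time_averaged_frequencies(history)   # [4.0, 3.0, 3.0]
h_o = [5, 3, 2]
print(round(chi2_distance(h_o, h_e), 4))   # 0.5833
```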

comparing two occupation vectors

Other possible choices for the objective function are:

∙ Hellinger-Bhattacharya distance:

  d_H(p, q) = [ ∑_i (√p_i − √q_i)² ]^{1/2}

∙ Kullback-Leibler divergence:

  KL(p, q) = ∑_i p_i log(p_i / q_i)

∙ ∆-distance:

  ∆(p, q) = ∑_i (p_i − q_i)² / (p_i + q_i)
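The three alternatives are straightforward to implement. A Python sketch over discrete distributions, with illustrative inputs; terms with zero mass are skipped so the KL and ∆ sums stay defined:

```python
import math

def hellinger(p, q):
    """Hellinger-Bhattacharya distance between discrete distributions."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                         for pi, qi in zip(p, q)))

def kullback_leibler(p, q):
    """Kullback-Leibler divergence KL(p, q) = sum_i p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def delta_distance(p, q):
    """Delta-distance: sum_i (p_i - q_i)^2 / (p_i + q_i)."""
    return sum((pi - qi) ** 2 / (pi + qi)
               for pi, qi in zip(p, q) if pi + qi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(hellinger(p, p) == 0.0 and delta_distance(p, p) == 0.0)  # both vanish at p = q
```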

optimization methods

Numerical solution to the optimization problem that minimizes Θ = χ²:

  ĉ(n_o) = argmin_c { Θ( n′(c), n_o ) }

Using two different methods:
∙ Grid method
∙ Iterative method

grid method - description

• Search for the minimum point ĉ ∈ [c_min, c_max].
• Interval chosen according to a quasi-linear condition.
• Find the minimum of the noisy function χ².
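A Python sketch of the grid method. Here `theta` stands in for the real objective, which would simulate a trajectory at parameter c and return its χ² distance from the observation; the noisy quadratic below is only an illustrative substitute.

```python
import random

def grid_search(theta, c_min, c_max, K):
    """Grid method: evaluate the (noisy) objective Theta on K + 1
    equally spaced values of c and return the argmin."""
    step = (c_max - c_min) / K
    grid = [c_min + k * step for k in range(K + 1)]
    return min(grid, key=theta)

# illustrative noisy objective with a minimum near c = 0.01
def theta(c):
    return (c - 0.01) ** 2 + random.uniform(0, 1e-6)

c_hat = grid_search(theta, -0.03, 0.035, 40)
print(abs(c_hat - 0.01) < 0.01)  # the estimate falls near the true minimum
```

The noise term is what makes the step-size choice delicate: near the minimum, neighboring grid points differ by less than the noise amplitude, so the returned argmin fluctuates between them.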

grid method - limitations and advantages

• Simple, and no convergence criterion to reach.
• Choice of the step-size ∆c:
∙ Finer: increases the resolution but magnifies the oscillations deriving from the random nature of the process.
∙ Coarser: increases the speed of the algorithm, but loses definition and enlarges the variance of the estimate (less likely to reject the null hypothesis).

[figure: χ² as a noisy function of c over the grid [−3, 3.5]·10⁻², comparing a small and a large step-size ∆c]

iterative method

• Iterative method to find the minimum.
• Exploits the quadratic behavior of the χ²:
∙ local minimum found by successively fitting parabolas (Brent, 1973).

• No requirement for a differentiable function:
∙ avoids the critical computation of derivatives in the case of noisy functions.

• Superlinear rate of convergence (≈ 1.34).

parabolic approximation

1. Take three initial guess points x_0, x_1, x_2.
2. Evaluate the function at the points: f_0 = f(x_0), f_1 = f(x_1), f_2 = f(x_2).
3. Fit a parabola via Lagrange interpolation:

  q(x) = a_0 + a_1 (x − x_0) + a_2 (x − x_0)(x − x_1)

  a_0 = f_0 ,   a_1 = (f_1 − f_0)/(x_1 − x_0) ,   a_2 = [ (f_2 − f_0)/(x_2 − x_0) − a_1 ] / (x_2 − x_1)

4. The parabola's minimum is the estimate for the minimum of the function:

  x̄ = (x_0 + x_1)/2 − a_1/(2 a_2)

5. Discard the oldest point.
6. Repeat until convergence.
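The six steps above can be sketched directly. This is a bare illustration of successive parabolic interpolation, without the safeguards (bracketing, golden-section fallback) that a production implementation such as Brent's method would add:

```python
def parabolic_minimize(f, x0, x1, x2, tol=1e-8, max_iter=100):
    """Successive parabolic interpolation: repeatedly fit a parabola
    through the last three points and jump to its vertex, using the
    update x_bar = (x0 + x1)/2 - a1 / (2 a2) from the Lagrange fit.
    Sketch only: no safeguard against a2 <= 0 (non-convex fits)."""
    pts = [x0, x1, x2]
    for _ in range(max_iter):
        x0, x1, x2 = pts[-3], pts[-2], pts[-1]
        f0, f1, f2 = f(x0), f(x1), f(x2)
        a1 = (f1 - f0) / (x1 - x0)
        a2 = ((f2 - f0) / (x2 - x0) - a1) / (x2 - x1)
        x_bar = (x0 + x1) / 2 - a1 / (2 * a2)
        if abs(x_bar - x2) < tol:
            return x_bar
        pts = pts[1:] + [x_bar]   # discard the oldest point
    return pts[-1]

# quadratic test: the parabolic fit recovers the vertex exactly
c_min = parabolic_minimize(lambda c: (c - 0.02) ** 2 + 3.0, -0.03, 0.0, 0.03)
print(abs(c_min - 0.02) < 1e-6)  # True
```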

convergence criteria

Differently from the deterministic case, convergence must be declared under noise:
∙ stochastic behavior of the evolved configurations
∙ χ² is a noisy function
∙ fix a tolerance criterion based on the variance of the process:

  δχ² ≤ σ(χ²)

  stop when the fluctuations of the minimum values are of the same order of magnitude as the estimated standard deviation of χ².
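The stopping rule can be sketched as follows. The numbers are hypothetical; in practice σ(χ²) would be estimated from repeated evaluations of the objective at a fixed c.

```python
import statistics

def converged(recent_minima, chi2_samples):
    """Stochastic stopping rule sketch: stop when the spread of the
    last few minimum values is within the estimated standard
    deviation of the noisy chi-square objective."""
    delta = max(recent_minima) - min(recent_minima)
    sigma = statistics.stdev(chi2_samples)
    return delta <= sigma

# hypothetical repeated chi^2 evaluations at a fixed c (noise level)
noise = [0.0102, 0.0098, 0.0101, 0.0099, 0.0100]
print(converged([0.00710, 0.00709, 0.00711], noise))  # fluctuations within the noise
```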

comparison of two methods

The best approach: the power to discriminate in the hypothesis testing at the same algorithmic cost.

Grid search approach:
↑ simple implementation
↑ no convergence needed
↑ does not require differentiability
↓ fixed search interval
↓ grid definition and step-size
↓ higher resolution requires larger computational cost
↓ slow search

Parabolic interpolation approach:
↑ more performing method
↑ 4x faster than the grid method
↑ resolution independent (only three points)
↓ more complex computational implementation
↓ constraint specification: control of local maxima and alignment issues costs an extra iteration

computational costs

The central code is the evolution of the choice process, generating configuration vectors over time. The asymptotic temporal cost is:

  C(L, T) ∼ O(L) · O(T) ,   C(L, T) = (a_0 + a_1 L) · T

where:
∙ [a_0] = [a_1] = [sec], evaluated for the specific machine in use
∙ L is the number of alternatives, T the number of time steps in the evolution

The spatial cost, the physical memory needed, is:

  M(L, T) = m · L · T ,

where [m] = [MBytes] is the weight of each stored number (8 bytes each).

computational costs

machine specification:
• Matlab 64-bit on Linux and Windows machines
• Intel i7 quad-core CPU at 2.10 GHz, 8 GB DDR3 SDRAM
• Programming according to Matlab's column-major order
• Parallel computing over N_p = 4 workers: C(L, T)_P = C(L, T)/N_p

precision variables

• The total time of the estimation, and hence its efficiency, is affected by:
1. T, the temporal length of each occupancy trajectory and of the respective objective-function computation;
2. S, the number of seeds used to build the empirical statistics for the hypothesis testing.

performance on simulated data

Example of inferential analysis with:
• Precision parameters: S = 100 trajectories, T = 10⁵ time steps
• System size: L = 10³ alternatives and N = 10⁴ agents

Grid method:
∙ grid of 40 increments in [−1·10⁻³, 1·10⁻³]
∙ ∼ 2 hrs of computation

Iterative method:
∙ each c found in 4-8 iterative steps
∙ ∼ 20 min of computation

overview recapitulation

bottom-up recap.

Nested summary of the previous steps, bottom-up:

∙ Specification of the numerical solution to the minimization of the objective function

  ĉ(n_o) = argmin_c { Θ( n′(c), n_o ) }

  best performed with successive parabolic interpolation.

∙ Definition of a measure of distance between the occupancy distributions of the configuration vectors,

  Θ(n′, n_o) = ∑_{j=1}^{J} (h_{o,j} − h_{e,j})² / h_{e,j} ,

  the χ² minimization in terms of occupancy classes, to find the value of c that makes the distributions as close as possible.

∙ Statistical compatibility with the observed distribution: given the set of parameters (a, b, c), test the null hypothesis c = 0 against the alternative, at a given significance level (p-value).

∙ Exploit the previous search procedure and inferential analysis to build a point estimation of multiple parameters via a multistage procedure.

point estimation of multiple parameters

multistage procedure

Estimate multiple parameters, e.g. (b, c): a multidimensional space where each dimension is a parameter to be estimated.

∙ Procedure based on a coordinate descent method, in which each parameter estimate is obtained with respect to one variable at a time while the other is kept fixed.

A. The single parameter c is estimated conditionally on an initial value b = b̂. Start by testing the hypothesis c = 0; if rejected, take the new value ĉ_0.
B. Restrict to c = ĉ_0 and test the hypothesis that b = b̂_0. If not, proceed.
E. The procedure stops when the set of estimates is no longer statistically different (cannot reject the last null hypothesis).
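The alternation above is plain coordinate descent. A Python sketch under simplifying assumptions: the hypothesis-testing stages are replaced by a plain tolerance check, and `grid_min` is a crude stand-in for the one-dimensional minimizer (grid or parabolic) described earlier; the objective and all values are illustrative.

```python
def coordinate_descent(objective, b0, c0, minimize_1d, rounds=10, tol=1e-6):
    """Multistage sketch: optimize one parameter at a time while the
    other is kept fixed, alternating until the estimates stop moving.
    minimize_1d(f, x0) may be any 1-D minimizer."""
    b, c = b0, c0
    for _ in range(rounds):
        c_new = minimize_1d(lambda c_: objective(b, c_), c)
        b_new = minimize_1d(lambda b_: objective(b_, c_new), b)
        if abs(c_new - c) < tol and abs(b_new - b) < tol:
            return b_new, c_new
        b, c = b_new, c_new
    return b, c

# toy separable objective with minimum at (b, c) = (0.5, 0.02)
obj = lambda b, c: (b - 0.5) ** 2 + (c - 0.02) ** 2

# crude 1-D grid minimizer around the current point, standing in
# for the grid or parabolic methods of the previous sections
def grid_min(f, x0):
    return min((x0 + k * 1e-3 for k in range(-1000, 1001)), key=f)

b_hat, c_hat = coordinate_descent(obj, 0.0, 0.0, grid_min)
print(abs(b_hat - 0.5) < 1e-2 and abs(c_hat - 0.02) < 1e-2)  # True
```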

application to empirical data

localization choices

• After testing the estimation procedure on simulated data:
• Analysis of localization choices for firms using Italian census data for the year 2001 (see ISTAT, 2006).
• Are the localization choices of firms affected by non-linear externalities?
• Negative non-linear terms in p_l ∼ a_l + b·n_l + c·n_l² signal a congestion effect in some commuting zones.
• Agents are the N firms that localize their plants across the L commuting zones that compose Italy.

firms’ localization in italy

∙ Territory split into L = 686 commuting areas.
∙ Data for different industrial sectors (Constructions, Textiles, Apparel, Food, etc.)
∙ N is the number of firms per sector.
∙ The model parameters:
∙ a_l captures the advantage of that location (attractiveness factor)
∙ b is the positive externality of localization (n_l firms already there)

firms’ localization in italy

Analysis to detect the presence of non-linear externalities:
∙ Very high-dimensional problem, with N typically between 10³ and 10⁶ firms.
∙ So far only one sector with significant evidence for c ≠ 0.
∙ 20 other sectors still to analyze in Italy, plus sectors in the USA.

conclusions

conclusions

• Analysis of the presence of non-linear externalities in a discrete choice model.
• Estimation framework developed for high-dimensional problems.
• MATLAB toolbox with:
∙ random number generator for the generalized Polya distribution,
∙ optimization procedure (successive parabolic approximation),
∙ statistical significance tests and Monte Carlo simulations,
∙ coordinate descent method for multivariate parameter estimation.

• LEM Working Paper, Online ISSN 2284-0400.

Thank you

[email protected] [email protected]

