CAUSALITY Contributions to Spatial Econometrics

Marcos Herrera Gómez

Advisors: Dr. Jesús Mur Lacambra and Dr. Manuel Ruiz Marín

Contents

List of Symbols and Abbreviations
List of Tables
List of Figures
1 Introduction
2 Approaches to the Causality Concept
2.1 Philosophical Causal Theories
2.1.1 Determinist Theories
2.1.1.1 Regularity Theory
2.1.1.2 Counterfactual Theory
2.1.1.3 Manipulability or Agency Theory
2.1.1.4 Theory of Mechanisms and Capacities
2.1.2 Non-deterministic or Probabilistic Theory
2.2 Pearl’s Graphical Method and Glymour’s Algorithm
2.3 Causal Approaches in Economics and Econometrics
2.3.1 Principal Theoretical Currents concerning Causation
2.3.1.1 A Priori Structural Approach
2.3.1.2 Inferential Structural Approach
2.3.1.3 Inferential Process Approach
2.3.1.4 A Priori Process Approach
2.4 Summary
3 Causality in Space. A Parametric Approach
3.1 Causation in Space with Information Constraints
3.2 A Testing Strategy for Causality
3.2.1 Selection of the Spatial Structure
3.2.1.1 J Test
3.2.2 Cross-Spatial Dependence Analysis
3.2.2.1 Bivariate Moran Test
3.2.2.2 Lagrange Multiplier Test
3.2.2.3 Behavior of the Independence Tests. Finite Samples
3.3 Spatial Causality
3.3.1 The Lagrange Multiplier Version of the Granger Test
3.3.1.1 Performance for Finite Samples
3.3.2 Granger-Wiener Predictive Efficiency
3.3.2.1 Monte Carlo Simulations
3.4 Summary
Appendix
4 Causality in Space. A Non-Parametric Approach
4.1 Symbolic Dynamics and Entropy
4.1.1 Symbolization Process
4.1.2 Entropy: Definitions and Concepts
4.2 Independence in Spatial Processes
4.2.1 Testing Independence between Spatial Processes
4.2.2 Consistency of the Test Υ(m)
4.2.3 Permutation Alternative to the Independence Test
4.2.4 Performance for Finite Samples
4.3 Analysis of the Appropriate Spatial Structure
4.3.1 Detection of Most Informative Weighting Matrix
4.3.2 Analysis of Performance in Finite Samples
4.4 Spatial Causality in Information
4.4.1 Spatial Causality Test
4.4.2 Monte Carlo Simulations
4.5 Summary
5 Spatial Causality between Migration and Unemployment in the Italian Provinces
5.1 Introduction
5.2 Review of Empirical Evidence
5.3 Migration and Regional Unemployment in Italy
5.4 Procedure for Detecting Spatial Causation
5.4.1 The Framework for the Analysis
5.4.2 Step 1: The Selection of the Spatial Structure
5.4.3 Step 2: Bivariate Spatial Dependence Analysis
5.4.4 Step 3: Spatial Causality
5.4.4.1 Parametric Test
5.4.4.2 Non-parametric Test
5.5 Conclusions
Appendix
6 Final Conclusions
6.1 Principal Contributions
6.2 Future Lines of Research
Bibliography

List of Abbreviations and Symbols

ABBREVIATIONS

ARMA: Autoregressive Moving Average Process
AR: Autoregressive Process
DGP: Data Generating Process
DAG: Directed Acyclic Graph
FC: Fidelity Condition
GMM: Generalized Method of Moments
LM: Lagrange Multiplier
MCC: Markov Causal Condition
MA: Moving Average Process
N: Number of Finite Elements, or Order of S
QML: Quasi-Maximum Likelihood
x, y, z, ...: Random Variables
X, Y, Z, ...: Random Vectors
S: Set of Points in Space
SEM: Simultaneous Equation Model
A, B, C, D: Single Events
SARAR(1, 1): Spatial Autoregressive Model with Autoregressive Errors
SpVAR: Spatial Vector Autoregressive
W(d): Spatial Weighting Matrix of Order d
SVAR: Structural Vector Autoregressive
2SLS: Two-Stage Least Squares
SpVARCS: Spatial Vector Autoregressive of Cross-Section Data
VAR: Vector Autoregressive
MSE: Mean Square Error

SYMBOLS

∼as: Asymptotically Distributed
∧: And
⇔: Bidirectional Causality
B(n, p): Binomial Distribution with Parameters n and p
Iyx: Bivariate Moran Test
⇒: Causes
χ²k: Chi-Square Distribution with k Degrees of Freedom
C_a^b: Combinatorial Number of a and b
E(· | ·): Conditional Expectation
P(· | ·): Conditional Probability
h·|·(m): Conditional Symbolic Entropy
CV(·), V(· | ·): Conditional Variance
⊂: Containment
→D: Convergence in Distribution as N → ∞
→P, plim: Convergence in Probability as N → ∞
cov(x, y): Covariance between x and y
C: Critical Region
≈: Distributed Approximately
∼: Distributed
F(·): Distribution Function
m: Embedding Dimension
∅: Empty Set
E(·): Expectation
R²y/x: Expected Coefficient of Determination
I: Fisher’s Information Matrix
∀: For All
l(·): Gradient Vector
H1: Hypothesis, Alternative
H0: Hypothesis, Null
IN: Identity Matrix of Order N
⊥: Independence
i.i.d.: Independently and Identically Distributed
ι(·), τ(·): Indicator Function
∞: Infinity
Λ: Information Set
P(·, ·): Joint Probability
⊗: Kronecker Product
LF: Likelihood Function
ryx: Linear Correlation Coefficient between x and y
L(·, ·): Log-Likelihood Function
H−i: Identity Matrix whose i-th Row has been Removed
Me: Median of Process
ln(·): Natural Logarithm
ℕ: Natural Numbers
⇏: Does Not Cause
¬: No Occurrence
N(µ, σ²): Normal Distribution with Mean µ and Variance σ²
#: Number of Elements of a Set
∨: Or
∂(·)/∂(·): Partial Derivative, First
∂²(·)/∂(·)∂(·): Partial Derivative, Second
p.d.f.: Probability Density Function
∏: Product of Numbers
Q.E.D.: Quod Erat Demonstrandum
ℝ: Real Numbers
Nsi: Set of Neighbours of si
W(x, y): Set of Space-Dependent Structures between x and y
Γn: Set of n Symbols
α: Significance Level
XW: Space-Dependent Structure for x
YW: Space-Dependent Structure for y
Σ: Sum
h(m): Symbolic Entropy
f(·), g(·): Symbolization Map
→: Then
tr: Trace
A′: Transpose of A
σ²(·): Variance
li: Zero Column Vector Except for the i-th Element, which is 1

List of Tables

2.1 Determinist Theories of Causality
2.2 Theoretical Approaches in Economics
3.1 Empirical Size of Iyx Test at 5% level
3.2 Empirical Power of Iyx Test at 5% level
3.3 Empirical Power of Iyx Test at 5% level
3.4 Empirical Power of Iyx Test at 5% level
3.5 Empirical Power of Iyx Test at 5% level
3.6 Empirical Size of LMI Test at 5% level
3.7 Empirical Power of LMI Test at 5% level
3.8 Empirical Power of LMI Test at 5% level
3.9 Empirical Power of LMI Test at 5% level
3.10 Empirical Power of LMI Test at 5% level
3.11 Empirical Size of LMNC Test at 5% level
3.12 Global Estimated Power of LMNC Test at 5% level
3.13 Global Estimated Power of LMNC Test at 5% level
3.14 Global Estimated Power of LMNC Test at 5% level
3.15 Empirical Size of p̂ Test at 5% level
3.16 Global Estimated Power of p̂ Test at 5% level
3.17 Global Estimated Power of p̂ Test at 5% level
3.18 Global Estimated Power of p̂ Test at 5% level
4.1 Empirical Size of Ψ̂1 Test at 5% level
4.2 Empirical Power of Ψ̂1 Test at 5% level
4.3 Empirical Power of Ψ̂1 Test at 5% level
4.4 Empirical Power of Ψ̂1 Test at 5% level
4.5 Empirical Power of Ψ̂1 Test at 5% level
4.6 Empirical Power of Ψ̂1 Test at 5% level
4.7 Empirical Power of Ψ̂1 Test at 5% level
4.8 Empirical Size of Ψ̂2 Test at 5% level
4.9 Empirical Power of Ψ̂2 Test at 5% level
4.10 Empirical Power of Ψ̂2 Test at 5% level
4.11 Empirical Power of Ψ̂2 Test at 5% level
4.12 Empirical Power of Ψ̂2 Test at 5% level
4.13 Simulations of Conditional Entropy. Linear Case
4.14 Simulations of Conditional Entropy. Non-Linear Case
4.15 Empirical Size of δ̂(YW, XW) Test at 5% level
4.16 Global Estimated Power of δ̂(YW, XW) Test at 5% level
4.18 Global Estimated Power of δ̂(YW, XW) Test at 5% level
4.20 Global Estimated Power of δ̂(YW, XW) Test at 5% level
5.1 National Unemployment Rate and Net Migration
5.2 Conditional Entropy. 3-year Periods
5.3 Conditional Entropy. Alternative Aggregations
5.4 J-Test. 3-year Periods
5.5 J-Test. Alternative Aggregations
5.6 Iyx and LMI Tests. 3-year Periods
5.7 Iyx and LMI Tests. Alternative Aggregations
5.8 Ψ1 and Ψ2 Tests. 3-year Periods
5.9 Ψ1 and Ψ2 Tests. Alternative Aggregations
5.10 LMNC Test. 3-year Periods
5.11 LMNC Test. Alternative Aggregations
5.12 δ̂(YW, XW) Test. 3-year Periods
5.13 δ̂(YW, XW) Test. Alternative Aggregations
5.14 Italian Provinces

List of Figures

1.1 Causal Language in Economics
2.1 Common Cause
2.2 Intermediate Cause
2.3 Causal Sequence between Events
3.1 Strategy for Approach of Causality
4.1 Example of a Regular 3 × 3 Lattice for xs and ys
5.1 Relationship between Unemployment and Net Migration
5.2 Spatial Distribution of Variables
5.3 Quantile-Quantile Graph

1 Introduction

Causality is referred to on a daily basis. Our colloquial language is full of causal relationships, generalizations of experiences and observations that are classified as the effects or causes of certain actions: “Smoking causes cancer”, “Physical exercise improves your health”, “I passed the exam because I studied hard”, etc. Everyday speech abounds in phrases containing the words cause and effect or related terms. In a scientific setting, in spite of influential attempts to relegate this language to a pre-scientific era (such as Russell, 1953), the idea of causality persists. Causal relationships are present in all the sciences: physical, biological and social.

Many scientists seek causal inferences, arguing that the principal goal of science is to discover the mechanisms that govern the object of study. Whether from a scientific or a colloquial perspective, we are interested in knowing what or who generated a result, how a given objective can be attained, or what the consequences of performing a given action are. But what does causality mean? What is causality on an ontological level? Epistemologically, how can we detect the existence of a causal relationship? In general, there is a tendency to refrain from answering these questions explicitly. There is no clear agreement concerning what causality means exactly, and the answers to the other questions differ in their nuances. These shortcomings are not only found in science, as philosophy is not unanimous in its perception of the issue either. In economics, there has been a renewed interest in the study of causality in the last few decades. An indication of this is Figure 1.1, which shows the trend in the use of causality-related terms from 1930 to 2001.


Figure 1.1: Causal Language in Economics

Source: Hoover (2004). The graph shows articles using words of the “causal family” as a fraction of all articles in the JSTOR archive of economics journals. Causality-related terms: cause, causes, causal, causally, causality, causation.

This figure comes from a study by Hoover (2004), who tracked the use of causality-related terms in leading economics journals. These terms fell out of use from the 1950s to the 1980s, but their use boomed again in the 1990s. Although the figure should be interpreted with caution (the study counts every publication that uses at least one causality-related term), it shows the current significance of the topic. The treatment of causality has been important since the onset of econometric modelling. Some authors, such as Heckman (1999), argue that the principal contribution of econometrics is the definition of causal parameters, their discovery from data and the role that these parameters play in policy assessment. Despite its importance, however, the study of causality has been limited in spatial econometrics. This is a relatively young discipline, which has grown considerably in the last few decades due, among other things, to a growing concern for economic issues associated with space and territory. It should be noted that mainstream research in the field has never addressed the problem of causality. The concept closest to cause that can be found in this literature is explanation, in the sense that the variables on the right-hand side explain what happens to the left-hand variable.


There are several reasons for this omission. It can be attributed to the type of data and relationships used in spatial econometrics: cross-sectional data and models that are static in the temporal sense. A discussion of causality usually refers to a temporal context so, in a cross-section, it would not appear to make much sense to consider this problem. Another frequently mentioned argument is that general equilibrium models are generally used in cross-section (Isard, 1971): the cross-section reflects the long-term solution, without the need to consider the issue of causality explicitly. Part of the time series literature (Charemza and Deadman, 1997; Brockwell and Davis, 2003, for example) shares the simplistic view that cross-sections should be interpreted in terms of static long-term relationships, while time series lead to dynamic short-term models. In any event, this is a clearly unsatisfactory situation. If we cannot make progress in the discussion of causality in space, we should reconsider the real utility of cross-sectional econometric models. In that case, the role of spatial econometrics in the scientific method of economics would be purely descriptive, as its ability to confirm or reject theories would evidently be limited. Our premise is that causality is one of the strong concepts in economics. In general, economic models are causal models, transmitting the impact of some variables on others. The analysis of causality should therefore be part of regular econometric modelling, irrespective of the context in which these models are resolved (time, space or mixed). Over time, we can observe and isolate sequences of assumed causes and hypothetical effects, which facilitates the construction of causality tests based on the premise of temporal precedence.
In space, however, this information is not available, which compels us to seek alternatives to temporal precedence without sacrificing the objective of investigating causal relationships. In this thesis, we approach the study of causal relationships in space in a compact manner, attempting to present a conceptual and operational definition so that its existence can be empirically detected.

The relationship between philosophical explanation and empirical strategy is of the utmost importance for applied research because, in general, although empirical studies do not make explicit the concept of causality they use, they do use one implicitly. This lack of clear concepts has generated major debates and hindered scientific progress, with important consequences for data interpretation. In general, we speak of causality when we refer to a relationship or process between two or more events in which one generates or produces the other. This process implies that one event is the effect of another, commonly known as the cause. According to the definition provided by the Royal Spanish Academy (Real Academia Española), cause means “basis or origin of something, reason to act, law according to which effects are produced”. These definitions, however, are not precise enough to serve as the basis of our research. We will study the concept in depth and formalize its characteristics. We will review the main currents of thought in the philosophical field and then consider the predominant approaches used in economics, highlighting the differences between schools. We will also include the most recent developments related to causal methods, such as Pearl’s graphical approach and the algorithm proposed by Spirtes, Glymour and Scheines. This review enables us to lay the foundations for an analysis of causality in a spatial context and to devise a statistical procedure for empirically testing its existence. The statistical approaches used include both parametric and non-parametric methods.

The study is structured as follows. Chapter 2 reviews the literature on causality. The objective is to describe the different schools found in both the philosophical and economic fields. This review helps us to better understand the problems associated with the term causality and leads us to define our work plan. Chapter 3 develops a strategy to detect causation with parametric methods.
For the detection of spatial causation, we present an approach based on a system of simultaneous equations, and also an approach based on the incremental predictability of the effect process when the data set contains information about the cause variable. Chapter 4 describes a non-parametric method based on concepts from Information Theory. This is a new method, which enables us to develop a strategy for testing causation without having to estimate causal parameters. Chapter 5 provides an empirical application to a particularly relevant topic: migration and unemployment in the Italian provinces. In this chapter, we apply all the tools previously developed, comparing the inferences obtained with the parametric and non-parametric methods. Finally, Chapter 6 presents a summary highlighting the main contributions of this study and possible future lines of research.

2 Approaches to the Causality Concept

2.1 Philosophical Causal Theories

There has been a very intense debate concerning the definition of causality in the philosophy of science. The different positions reflect two antagonistic ways of seeing the world: the deterministic and the non-deterministic views. Those in favour of determinism maintain that the world is governed by laws and that individuals or objects have no possibility of choice. In this world, everyone behaves according to established patterns. According to this idea, behaviour is governed by hidden laws that can only be discovered by the scientific method. The theories based on this view include the Neo-Humean regularity approach, the manipulation approach, counterfactual theory and the theory of mechanisms and capacities, each of which is developed in the following subsections. The non-deterministic approach, however, holds that the world is too complex to establish exact causal relationships. Its arguments are based on the fact that laws cannot be established in many scientific fields and on the developments of quantum mechanics regarding the unpredictability of the behaviour of subatomic particles.

2.1.1 Determinist Theories

Brady (2003) presents a review of determinist philosophical theories, which are described below together with their advantages and disadvantages. Table 2.1 summarizes the essential characteristics of each approach.

Table 2.1: Determinist Theories of Causality

Neo-Humean Regularity Theory
Major authors: Hume (1739), Mill (1843), Mackie (1965)
Approach to the symmetric aspect: Observation of constant conjunction and correlation
Approach to the asymmetric aspect: Temporal precedence
Emphasis (causes of effects or effects of causes?): Causes of effects
Advantage: Observational and causal modeling

Counterfactual Theory
Major authors: Lewis (1973)
Approach to the symmetric aspect: Consideration of similar worlds of: “If the cause occurs then so does the effect” and “if the cause does not occur then the effect does not occur”
Approach to the asymmetric aspect: Consideration of the truth of: “If the effect does not occur, then the cause may still occur”
Emphasis (causes of effects or effects of causes?): Effects of causes
Advantage: Experiments, case study comparisons

Manipulation Theory
Major authors: von Wright (1971), Gasking (1955), Collingwood (1940)*, Menzies and Price (1993), Woodward (2002)*
Approach to the symmetric aspect: Consideration of a manipulation that regularly produces the effect from the cause
Approach to the asymmetric aspect: Observation of the effect of the manipulation
Emphasis (causes of effects or effects of causes?): Effects of causes
Advantage: Experiments, natural experiments, quasi-experiments

Mechanisms and Capacities
Major authors: Cartwright (1989), Glennan (1996)
Approach to the symmetric aspect: Consideration of whether there is a mechanism or capacity that leads from the cause to the effect
Approach to the asymmetric aspect: An appeal to the operation of the mechanism
Emphasis (causes of effects or effects of causes?): Causes of effects
Advantage: Analytic models, case studies

Source: Brady (2003). * Included in Zalta (ed.) (2002).

2.1.1.1 Regularity Theory

From this viewpoint, a successful causal explanation is one that finds the causes of effects. Its principal supporters include Hume, Mill and Mackie.

David Hume made considerable contributions to the analysis of causal relationships. In Book I, Part III, of A Treatise of Human Nature (1739), he illustrates causation by means of billiard balls: one ball hits another and causes it to move. The idea of cause and effect is obtained by isolating the essential elements of the example, which are:

1. First, the idea of causality between two objects, cause and effect, involves contiguity (one billiard ball causes the other to move when they touch). This characteristic establishes that the two objects are close in space and time.
2. Second, the cause has priority in time relative to the effect, which is called succession or temporality (the ball that moves first is the cause; the other is the effect). According to Hume, if a cause were contemporaneous with its effect, this effect would in turn be contemporaneous with its own effects, and so on, and there would be no such thing as a succession of events: all objects would be co-existent. This characteristic is usually referred to as the asymmetry of the relationship.
3. Third, there has to be a constant union between cause and effect, called invariance or constant conjunction. This is the most important of the three qualities, as it establishes the relationship, and it is the source of the idea of necessary connection.

Other characteristics can be deduced from these three, such as: (i) the same cause always produces the same effect, and the same effect only arises from the same cause (causal determinism); (ii) when different objects produce the same effect, causality must be attributed to a feature shared by those objects; (iii) the difference between the effects of two similar objects must derive from the respect in which they differ. Regularity (constant conjunction), contiguity and temporal priority summarize Hume’s initial approach.
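Hume’s conditions can be made concrete with a small check over a record of events. The following is a minimal Python sketch, assuming a hypothetical encoding of observations as (time step, event label) pairs; `constant_conjunction` and its `max_lag` parameter are illustrative names, not part of any library. It tests that every observed occurrence of the cause is followed, strictly later and within `max_lag` steps, by the effect. This encodes constant conjunction and temporal priority, with `max_lag` as a crude temporal stand-in for contiguity (spatial contiguity is not modelled).

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (time step, event label); a hypothetical encoding


def constant_conjunction(history: List[Event], cause: str, effect: str,
                         max_lag: int = 1) -> bool:
    """Check Hume's conditions on an observed record of events.

    Returns True if every usable occurrence of `cause` is followed by
    `effect` strictly later (temporal priority) and within `max_lag`
    steps (contiguity in time), i.e. the conjunction is constant.
    """
    if not history:
        return False
    last_time = max(t for t, _ in history)
    effect_times = {t for t, e in history if e == effect}
    # Occurrences too close to the end of the record are skipped, because
    # their effect would fall outside the observation window.
    cause_times = [t for t, e in history
                   if e == cause and t + max_lag <= last_time]
    if not cause_times:
        return False  # no usable instances: no regularity can be observed
    return all(
        any(t + lag in effect_times for lag in range(1, max_lag + 1))
        for t in cause_times
    )


# Night and day alternate over ten time steps.
days = [(t, "day" if t % 2 == 0 else "night") for t in range(10)]
print(constant_conjunction(days, "day", "night"))   # True
print(constant_conjunction(days, "night", "day"))   # True
```

Both directions pass, which anticipates the objection based on Irzik (2001) discussed below: a perfectly constant, temporally ordered conjunction does not by itself separate causal from non-causal sequences.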
This view reflects a highly empirical position and a clear methodological inclination towards inductivism, that is, towards observational generalizations. In a causal statement, the cause and effect events are general sentences, as opposed to singular (observable) instances. Regularity is what distinguishes the general from the singular. Subsequently, in An Enquiry Concerning Human Understanding (1748), Hume attempts to clarify some of the above points. He focuses on the idea of necessary connection as a mental consequence of regularity: the idea of necessary connection comes from the way in which the human mind reacts to the perception of regularity, not from a characteristic of objects in the real world. In this second definition of causality, interest is focused on the mental process, on how people think about causality (Hausman, 1998). When the two publications are compared, it is evident that Hume provides two definitions of causality, one based on observed regularity and another based on the idea of such regularity. Which of the two definitions does Hume actually prefer? There is no consensus. Beauchamp and Rosenberg (1981) believe that Hume accepts both definitions, which are consistent and complementary. Hoover (2001), on the other hand, believes that Hume draws a division between the ontological part (what causality is) and the conditions for causal inference (the regularity relationship). He argues that Hume is a causal skeptic, as the two meanings cannot both be precise: the nearest thing to the ontological notion of causality, attributed to a mental process, is the observation of the regularity relationship, and these terms are not equivalent.

Regularity theory has been criticized in both of its versions. With regard to the constant conjunction criterion, it is considered to apply to all instances in which the circumstances of the cause-effect sequence are judged to be similar, so whenever an effect fails to occur it could be argued that the circumstances were not similar. Hume’s theory maintains that the cause inevitably leads to the effect. In other words, if C is a sufficient cause of D, the presence of C always implies the presence of D. If the relationship is more complex, however, and there is another cause B, complementary to C, that helps to produce D, then C is no longer sufficient and becomes only a necessary condition for the presence of D. So all the necessary conditions for the occurrence of D should be included. Hausman (1989), for example, points out that the relationship between a light switch and the light emitted by a bulb is deterministic, but there is not always light when we press the switch: the bulb could be broken or there could be a faulty contact. According to Hume’s theory, the causal relationship is incomplete in such cases, and more events have to be added to fully explain the effect. The difficulty of specifying the set of causes that are necessary and sufficient for an effect is a reason for questioning whether the connection between cause and effect is deterministic.

In view of this, Mackie (1965) proposes a refinement of the definition of cause, the INUS condition (an insufficient but necessary part of an unnecessary but sufficient condition). This is better explained by an example. If someone reports that a fire was caused by a cigarette butt, the butt is an ingredient of an implicit scenario that could include a careless smoker, a fair amount of inflammable material near the butt, etc. The scenario is not a necessary cause of the fire (the fire could have started for another reason) but, if appropriately described, it is a sufficient cause. The cigarette butt itself is not sufficient to cause the fire (other things, such as the presence of oxygen, are required), but it is a necessary part of the scenario (the fire would not have started if it were not for the butt). The unextinguished cigarette butt is an insufficient but necessary part of a scenario that is unnecessary but sufficient for the fire to occur: the butt is an INUS cause. Mackie (1974) places this concept in the context of what he calls the causal field, defined as a scenario in which the effect sometimes occurs but not always. Defining a causal field is a way of delimiting causal analysis.
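Mackie’s definition can be checked mechanically on a toy deterministic model. The Python sketch below assumes a hypothetical model `fire` in which a fire occurs when flammable material and oxygen are present together with some ignition source (the cigarette butt or, as an alternative route, a short circuit); the factor names and the model itself are illustrative assumptions. The function `is_inus` tests the four INUS requirements for one factor relative to one scenario. It is a simplification: it checks that the factor itself is non-redundant in the scenario, but not that every other part of the scenario is.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, Iterator

Model = Callable[[FrozenSet[str]], bool]  # present factors -> effect occurs?


def all_scenarios(universe: Iterable[str]) -> Iterator[FrozenSet[str]]:
    """Enumerate every subset of the universe of factors (2**n scenarios)."""
    items = sorted(universe)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return (frozenset(s) for s in subsets)


def is_inus(factor: str, scenario: FrozenSet[str], effect: Model,
            universe: Iterable[str]) -> bool:
    """True if `factor` is an Insufficient but Necessary part of an
    Unnecessary but Sufficient scenario for the effect."""
    if factor not in scenario:
        return False
    sufficient = effect(scenario)                        # scenario suffices
    necessary_part = not effect(scenario - {factor})     # factor not redundant
    insufficient = not effect(frozenset({factor}))       # factor alone fails
    # Unnecessary: the effect also occurs in some scenario lacking part of this one.
    unnecessary = any(effect(s) for s in all_scenarios(universe)
                      if not scenario <= s)
    return sufficient and necessary_part and insufficient and unnecessary


def fire(present: FrozenSet[str]) -> bool:
    """Hypothetical toy model of the fire example."""
    ignition = "cigarette_butt" in present or "short_circuit" in present
    return ignition and {"flammable_material", "oxygen"} <= present


UNIVERSE = {"cigarette_butt", "flammable_material", "oxygen", "short_circuit"}
scenario = frozenset({"cigarette_butt", "flammable_material", "oxygen"})
print(is_inus("cigarette_butt", scenario, fire, UNIVERSE))  # True
```

In this toy model oxygen also passes the check, which illustrates why Mackie delimits the analysis with a causal field: conditions present throughout the field are treated as background rather than as causes.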
Once the causal field has been defined, the analysis consists of seeking a difference between the cases in which the effect occurs and those in which it does not. Mill (1843) enlarges upon Hume’s proposal by including the possibility of plural causes. In that case, the same effect may be found without the same cause, because it is produced by another object. Mill then postulates a number of methodological criteria for the empirical identification of causes or effects. These principles are:

1. Method of Agreement: if D is invariably preceded by a certain condition, this condition is its cause.
2. Method of Difference: if D occurs in the presence of C but not in its absence, then, all other conditions being constant (ceteris paribus), C is the cause of D.
3. Indirect Method or Method of Residues: if D occurs in a given situation, and part of D is known to be due to a particular antecedent, the remainder of D is the effect of the remaining antecedents.
4. Method of Concomitant Variations: if variation in C is accompanied by variation in D, C is the cause (or effect) of D.

Regularity theory fails to distinguish causal from non-causal sequences. In the example provided by Irzik (2001), night and day form an invariable and repetitive sequence, but it can hardly be accepted as a causal relationship. Relationships arising from common causes cannot be detected by Humean conditions either. Mill attempted to solve this problem by adding the condition that a cause must be an unconditional invariable antecedent. Assume, for instance, that C and B are antecedent events of event D, that C is necessary and sufficient for D in the observed data, and that B always occurs in the observed data. We can conclude that C causes D only if we can say that C would be necessary and sufficient for D even if ¬B occurred. This condition, introduced by Mill, involves a kind of counterfactual argument that is the essence of the approach proposed by Lewis. Mackie (1974) argues that a causal relationship should contain a priority condition to distinguish between causal and non-causal sequences. The temporality criterion is thus replaced by causal priority.
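Mill's Method of Agreement and Method of Difference can be read as simple checks over a table of observed cases. The sketch below assumes that each case records, as booleans, which conditions were present and whether D occurred. The function names and the toy cases are hypothetical. The last example shows why regularity alone is not enough: day invariably precedes night, so it passes the Method of Agreement.

```python
from typing import Dict, List, Optional, Set

Case = Dict[str, bool]  # condition -> present?; the key EFFECT records whether D occurred
EFFECT = "D"

def method_of_agreement(cases: List[Case]) -> Set[str]:
    """Conditions present in every observed case in which D occurs."""
    positives = [case for case in cases if case[EFFECT]]
    if not positives:
        return set()
    common = {k for k, v in positives[0].items() if v and k != EFFECT}
    for case in positives[1:]:
        common &= {k for k, v in case.items() if v and k != EFFECT}
    return common

def method_of_difference(present: Case, absent: Case) -> Optional[str]:
    """If two cases differ in exactly one condition and D occurs only in the
    first, return that condition (the ceteris paribus comparison)."""
    diffs = [k for k in present if k != EFFECT and present[k] != absent[k]]
    if len(diffs) == 1 and present[EFFECT] and not absent[EFFECT]:
        return diffs[0]
    return None

cases = [
    {"C": True, "B": True, "E": False, "D": True},
    {"C": True, "B": False, "E": True, "D": True},
    {"C": False, "B": True, "E": True, "D": False},
]
print(method_of_agreement(cases))                                  # {'C'}
print(method_of_difference({"C": True, "B": True, "D": True},
                           {"C": False, "B": True, "D": False}))   # C

# Night (D) is invariably preceded by day, so "day" passes the Method of
# Agreement even though the sequence is not causal.
nights = [{"day_before": True, "D": True} for _ in range(3)]
print(method_of_agreement(nights))                                 # {'day_before'}
```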

With regard to the asymmetry between cause and effect, the only characteristic that helps make this distinction is temporal precedence. Philosophers such as Russell supported Hume, but others, including Kant, argue that causes and effects can occur contemporaneously. Hausman (1989) highlights that causes can continue to exist after some of their effects have begun. In these cases, temporal precedence cannot determine the direction of the asymmetry. An extensive discussion of this approach can be found in chapter 3 of Mackie (1974), Beauchamp and Rosenberg (1981), chapter 8 of Horwich (1987), chapter 3 of Hausman (1998) and Irzik (2001).

2.1.1.2 Counterfactual Theory

Although he never explored this approach, Hume (1748) was the first to provide an explicit definition of counterfactual causality: “We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second. Or, in other words, where, if the first object had not been, the second never had existed”. The key to this approach is the counterfactual element: the second object, or effect, would not have occurred if, contrary to the facts, the first object, or cause, had not occurred. The counterfactual approach was further developed by Lewis (1973). According to Lewis, the approach requires the truth of two statements about two different events, C and D. Lewis establishes that, when the occurrence of events C and D is observed, the counterfactual “if C were to occur, D would occur” can be held to be true. The truth of this statement is the first condition for a causal relationship. He then considers the truth of the second counterfactual statement: “if C were not to occur, D would not occur either”. If this is also true, D is causally dependent on C. For Lewis, causal dependence between events is sufficient for causation, but not necessary: causation is transitive, and C causes D if and only if the two events are connected by a chain of causal dependence.
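Lewis's definition of causation as a chain of causal dependence amounts to taking the transitive closure (the ancestral) of the dependence relation. The minimal sketch below assumes that causal dependence has already been established and is supplied as a hypothetical list of pairs (c, e), each meaning "e causally depends on c". Evaluating the counterfactuals themselves is outside its scope.

```python
from typing import Hashable, Iterable, Tuple

def causes(dependence: Iterable[Tuple[Hashable, Hashable]],
           c: Hashable, e: Hashable) -> bool:
    """c causes e iff a chain c -> x1 -> ... -> e of causal dependence exists.
    Each pair (a, b) in `dependence` means that b causally depends on a."""
    edges = set(dependence)
    frontier, seen = [c], {c}
    while frontier:
        x = frontier.pop()
        for a, b in edges:
            if a == x and b not in seen:
                if b == e:
                    return True
                seen.add(b)
                frontier.append(b)
    return False

# Hypothetical events: X depends on C and D depends on X; the pair ("C", "D")
# is not listed, so D does not depend directly on C.
dependence = [("C", "X"), ("X", "D")]
print(causes(dependence, "C", "D"))   # True: connected by a chain
print(causes(dependence, "D", "C"))   # False
```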

This theory’s virtue is that it deals directly with singular causal events and does not require the examination of a large number of observations of C and D. The ability to analyze specific cases matters to all investigators, and especially to those who conduct case studies. One observation of a cause and its effect is sufficient to establish causality if it can be shown that, in the closest similar world, the absence of the cause leads to the absence of the effect. The problem with this approach is that the second condition cannot be evaluated in the real world: since C has in fact occurred, the antecedent of the second counterfactual is false. Lewis suggests evaluating that counterfactual in the possible world closest to the real one in which C does not occur. The problem is how to identify this closest world. Moreover, the direction of causality cannot be established, so two effects of a common cause could mistakenly be taken as cause and effect. Assume we are studying the relationship between a barometer reading and a storm, both of which are effects of changes in atmospheric pressure. If the barometer reading falls, the storm occurs, and the converse is also true. By the counterfactual definition, therefore, each of these events is the cause of the other. Adding the temporal precedence condition to the theory would solve the causal direction problem, but not the common cause problem. This is one of the reasons why Lewis decides not to continue in this direction. Critics of this viewpoint claim that counterfactual notions, such as the closest possible world, are inherently metaphysical and that the approach is therefore less scientific than is commonly assumed (Dawid, 2000; Shafer, 2000). A complete development of this theory, together with its limitations, can be found in Dowe (2000) and Menzies (2002).

2.1.1.3 Manipulability or Agency Theory

A manipulability explanation focuses on finding the effects of causes. In general, this approach is predominant in the experimental sciences. Briefly, its supporters claim that if C is a genuine cause of D, we should be able to manipulate C in such a way as to manipulate or alter D. Unlike other theories, this approach is supported by many scholars without philosophical backgrounds, although its main supporters include renowned philosophers such as von Wright (1974) and Menzies and Price (1993). The principal characteristic of the initial approach is that the definition of causality is linked to human action; in other words, it is an anthropocentric concept. As von Wright acknowledges, the notion of causality is linked to the personal experience of doing one thing and thereby achieving another. Causes are also relative: only manipulable events can be potential causes, and the remaining events preceding the effect form part of the causal background but not of the possible causes (Anderson, 1938). Because it treats effectively and potentially controllable events as causal, this approach can resolve some of the problems found in regularity theory. Some regular non-causal relationships are easily resolved by the manipulability approach. In the case of day and night, for instance, neither event is controllable, so the sequence cannot be causal. The anthropocentric position has attracted criticism, because it restricts the causal domain and cannot explain events whose processes are independent of humans. This shortcoming led Menzies and Price (1993) to refine the theory. They argue that causal deductions can be made, as if the events were manipulable, by means of small-scale models or simulations of uncontrollable events such as volcanic eruptions or earthquakes and their effects on a city. This contribution from Menzies and Price was not well received. The use of small-scale models is controversial, because the causal process in the model may not correctly represent the large-scale phenomenon.
As regards simulations, the manipulable causes must be similar to the phenomenon involving non-manipulable causes; consequently, the notion of similarity presupposes that the same causal process is operative in both cases. The problem is that explaining non-manipulable causes requires a definition of similarity, but this concept is not defined in agency theory terms (Woodward, 2001). This problem can be solved by including a counterfactual formulation in manipulation theory: C causes D if, and only if, D would change under an appropriate manipulation of C. Woodward (2000, 2001) claims that an event can qualify as an intervention even if it does not involve human action. In other words, a natural process involving inanimate beings can qualify as an intervention if counterfactual information is included. This type of investigation is described as a natural experiment. Accepting this view implies limiting the study of causation to manipulable events or to events similar to interventions. This leaves a large part of science, including astronomy and geology, outside the scope of the study of causality. In our own field, economics, we should then study causality only in matters such as programme evaluation, labour policies and other areas where there is room for manipulation. Sobel (1999) mentions that this philosophical approach leads to the empirical strategy of causal inference developed by Neyman (1923), Rubin (1974) and Holland (1986). See Heckman (2005) for a detailed analysis.

2.1.1.4 Theory of Mechanisms and Capacities

In this approach, causality is established by the presence of a mechanism between causes and effects. Observed regularities are explained in terms of a lower-level process, and the mechanism involved varies from one field to another and from one time to another. This approach also provides a satisfactory solution to the problem of matching causes to particular effects, which undermines regularity and counterfactual theories. According to Cartwright (1989), a cause is a capacity or power to produce a given effect. This capacity is more or less stable across situations. In this approach, regularity is an empirical manifestation of the nature of things in special circumstances. The notion of stable capacity is linked to that of invariance, in the sense that the relationship between cause and effect must remain stable when circumstances change. When it is said that smoking causes cancer, this must be true across different ages, genders, etc. Glennan (2002) provides the following definition: “A mechanism for a behavior is a complex system that produces that behavior by the interaction of a number of parts, where the interactions between parts can be characterized by direct, invariant, change-relating generalizations” (p. 344). For Glennan, causal mechanisms are relatively stable combinations of parts, and they can be manifest in multiple space-time locations. Mechanisms consist of a number of parts with relatively stable properties in the absence of intervention. Given a particular decomposition into parts, interaction between parts is a causal notion that should be interpreted in terms of the truth of certain counterfactuals. The idea of direct, invariant, change-relating generalizations is close to manipulability theory and to Hume’s idea of succession. Every chain of mechanistic explanation must ultimately rest on a law derived from observed regularities. Brady (2003) highlights that mechanistic explanation is general, allowing each science to adopt its own level of explanation according to the phenomenon studied. Social scientists do not seek chemical explanations of what they study, and chemists do not seek to explain social or aggregate behaviour.

2.1.2 Non-deterministic or Probabilistic Theory

The theories reviewed so far assume that causal relations are deterministic. But in the social sciences, and particularly in economics, individuals anticipate certain events or learn from past experience, so a deterministic causal relation is difficult to accept. A non-deterministic theory may be more acceptable because it makes use of probability, although the use of probability is not limited to this approach.

Causal probabilistic theory establishes the need for a non-deterministic treatment of causation. Its principal supporters are Reichenbach (1956), Good (1961, 1962), Salmon (1980) and Suppes (1970). Several arguments have been put forward in favour of non-determinism: everyday life and scientific activity are not transparently deterministic, and even if the world were deterministic, it would be too complex for a deterministic explanation. Some authors, such as Sobel (1995) and Irzik (2001), see in this approach an attempt to salvage Humean, or regularity, theory by reinterpreting the constant conjunction relation. Following Suppes (1970), it can be argued that constant conjunction is an overly restrictive criterion. On this view, one event is the cause of another if the appearance of the former is very probably followed by the appearance of the latter. This characteristic can be described as probable conjunction. In probabilistic terms, let $P(D \mid C)$ be the conditional probability of D given that C occurred. Reinterpreting the constant conjunction criterion established by Hume, this would be expressed as:

$$P(D \mid C) = 1 \quad \text{and} \quad P(D \mid \neg C) = 0,$$

where $\neg C$ indicates that C did not occur. As we saw earlier, the problem is that many phenomena never meet this criterion. Probabilistic causation addresses this problem and only requires the occurrence of C to make the occurrence of D more likely. In probabilistic terms, there is probable conjunction when:

$$P(D \mid C) > P(D \mid \neg C) \;\Rightarrow\; C \rightarrow D,$$

where $\rightarrow$ is read “causes”: C causes D. The simplest formulation of probabilistic theory states that C causes D if $P(D \mid C) > P(D)$. However, this implies that $P(C \mid D) > P(C)$, that is, that D causes C, because $P(D,C) = P(D \mid C)\,P(C) = P(C \mid D)\,P(D)$, where $P(D,C)$ is the joint probability of C and D. As Suppes proposes, the problem can be solved by requiring the cause to come before the effect, so that $P(D_{t+1} \mid C_t) > P(D_{t+1})$, which does not imply that $P(C_{t+1} \mid D_t) > P(C_{t+1})$. This problem is known as observational equivalence and has often been discussed in economics (Simon, 1953; Sargent, 1976). The new formulation of the constant conjunction criterion is more practical, although some problems remain. Assume that an event A generates two effects, C and D, and that C always precedes D. Let C be lightning and D thunder. D occurs whenever C is present, and $\neg D$ whenever $\neg C$ occurs. This meets the above condition, implying that lightning causes thunder. We know, however, that this is not true, and that the relationship is spurious because of a common cause. Suppes adds a condition to rule out this type of relationship: C and D must not share a common cause that accounts for their association. More specifically, an event C causes an event D if, and only if, $P(D \mid C) > P(D)$ and there is no event A, prior to C and D, that screens off the relationship between C and D. Event A explains away $C \rightarrow D$ if, and only if, $P(D_{t+2} \mid C_{t+1}) > P(D_{t+2})$ and $P(D_{t+2} \mid C_{t+1}, A_t) = P(D_{t+2} \mid A_t)$. Conditional on A, then, the occurrence of C does not increase the probability of D. Simpson’s paradox is another complication. Assume that C causes D and that there are no factors that explain both events. We would therefore expect the conditional probability of D, given C, to be greater than the probability of D. Assume also that C contributes to B, which prevents D from appearing. In this case, C is a positive causal factor for D through one path and a negative causal factor for the same event through the other path. If the two influences cancel each other out, we will not observe any increase in the probability of D. Reichenbach (1956) attempted to explain causal asymmetry without presupposing temporal order. His principal contribution is related to the common cause principle: if an improbable coincidence occurs, there must be a common cause.
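The symmetry problem, Suppes's screening-off condition and the cancellation case can all be checked on small discrete distributions. The sketch below uses exact fractions and two hypothetical models. The probabilities are illustrative assumptions, not estimates. In the first model, a common cause A produces lightning C and thunder D. In the second, C raises D directly but also produces B, which lowers D.

```python
from fractions import Fraction as F
from itertools import product

def bern(p, x):
    """P(X = x) for a binary variable X with P(X = 1) = p."""
    return p if x else 1 - p

class Model:
    """Discrete joint distribution over binary variables, given by a factorized joint."""

    def __init__(self, names, joint):
        self.names, self.joint = names, joint

    def prob(self, **fixed):
        total = F(0)
        for values in product((0, 1), repeat=len(self.names)):
            world = dict(zip(self.names, values))
            if all(world[k] == v for k, v in fixed.items()):
                total += self.joint(**world)
        return total

    def cond(self, target, given):
        return self.prob(**target, **given) / self.prob(**given)

def common_cause_joint(a, c, d):
    # A at t (common cause), C at t+1 (lightning), D at t+2 (thunder).
    return (bern(F(3, 10), a)
            * bern(F(9, 10) if a else F(1, 20), c)
            * bern(F(4, 5) if a else F(1, 10), d))

def cancellation_joint(c, b, d):
    # C raises D directly, but C also produces B, which lowers D.
    p_d = {(1, 1): F(2, 5), (1, 0): F(9, 10), (0, 1): F(1, 10), (0, 0): F(3, 5)}
    return bern(F(1, 2), c) * bern(F(4, 5) if c else F(1, 5), b) * bern(p_d[(c, b)], d)

spurious = Model(("a", "c", "d"), common_cause_joint)
masked = Model(("c", "b", "d"), cancellation_joint)

# Symmetry: C raises the probability of D, and D raises that of C.
print(spurious.cond({"d": 1}, {"c": 1}) > spurious.prob(d=1),
      spurious.cond({"c": 1}, {"d": 1}) > spurious.prob(c=1))            # True True
# Screening off: P(D | C, A) = P(D | A), so A explains away C -> D.
print(spurious.cond({"d": 1}, {"c": 1, "a": 1})
      == spurious.cond({"d": 1}, {"a": 1}))                             # True
# Cancellation: P(D | C) = P(D | not C), although C raises D at each level of B.
print(masked.cond({"d": 1}, {"c": 1}) == masked.cond({"d": 1}, {"c": 0}),
      masked.cond({"d": 1}, {"c": 1, "b": 1})
      > masked.cond({"d": 1}, {"c": 0, "b": 1}))                        # True True
```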
Reichenbach highlights that causation is not limited to a functional relationship, as functional relationships are symmetrical in cause and effect, while causal relationships are asymmetrical.

Reichenbach analyses the direction of time in terms of the direction of causality, appealing to the common cause principle. Suppose the coincidence of two events, A and B, occurs more often than their independent occurrence would imply, that is, the events satisfy $P(A,B) > P(A)\,P(B)$. Then there is a common cause C for these events such that the fork ACB is conjunctive and satisfies:

1. $A \perp B \mid C$.
2. $A \perp B \mid \neg C$.
3. $P(A \mid C) > P(A \mid \neg C)$.
4. $P(B \mid C) > P(B \mid \neg C)$.

where $A \perp B \mid C$ means that A and B are probabilistically independent conditional on C. These conditions imply that $P(A,B) > P(A)\,P(B)$, so the common cause explains the dependence between A and B. This principle is of key importance for determining the direction of causality and the direction of time. Suppose there is a fork ACB (Figure 2.1) whose relations comply with the common cause principle, and no other C′ satisfies these conditions in relation to A and B. Then C is the common cause, and antecedent, of A and B.

Figure 2.1: Common Cause

[Figure: A ← C → B]
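The link between the fork conditions and the observed association can be verified algebraically. Write $c = P(C)$, $a_1 = P(A \mid C)$, $a_0 = P(A \mid \neg C)$, $b_1 = P(B \mid C)$ and $b_0 = P(B \mid \neg C)$. Assume that C screens off A from B both when it occurs and when it does not, as in Reichenbach's definition of a conjunctive fork. Then:

$$
\begin{aligned}
P(A,B) &= c\,a_1 b_1 + (1-c)\,a_0 b_0,\\
P(A)\,P(B) &= \bigl(c\,a_1 + (1-c)\,a_0\bigr)\bigl(c\,b_1 + (1-c)\,b_0\bigr),\\
P(A,B) - P(A)\,P(B) &= c\,(1-c)\,(a_1 - a_0)\,(b_1 - b_0).
\end{aligned}
$$

For $0 < c < 1$, the inequalities $P(A \mid C) > P(A \mid \neg C)$ and $P(B \mid C) > P(B \mid \neg C)$ make both differences positive, so $P(A,B) > P(A)\,P(B)$.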

Reichenbach extends his analysis to intermediate causes: C is an intermediate cause between A and B (Figure 2.2) if $1 > P(B \mid C) > P(B \mid A) > P(B) > 0$ and $A \perp B \mid C$. The purpose of this discussion is to show that conditional probabilities can be used to construct a complete causal graph.

Figure 2.2: Intermediate Cause

[Figure: A → C → B]
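Reichenbach's conditions for an intermediate cause can be checked numerically on a hypothetical chain A → C → B. The numbers in the sketch below are illustrative assumptions. The joint distribution is built so that B depends on A only through C, and exact fractions let the ordering of probabilities and the screening-off condition be compared without rounding.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical chain A -> C -> B; the numbers are illustrative assumptions.
P_A = F(1, 2)
P_C_GIVEN_A = {1: F(4, 5), 0: F(1, 5)}    # P(C = 1 | A)
P_B_GIVEN_C = {1: F(9, 10), 0: F(1, 10)}  # P(B = 1 | C); B depends on A only via C

def bern(p, x):
    """P(X = x) for a binary X with P(X = 1) = p."""
    return p if x else 1 - p

def joint(a, c, b):
    """P(A=a, C=c, B=b) = P(a) P(c | a) P(b | c)."""
    return bern(P_A, a) * bern(P_C_GIVEN_A[a], c) * bern(P_B_GIVEN_C[c], b)

def prob(**fixed):
    total = F(0)
    for a, c, b in product((0, 1), repeat=3):
        world = {"a": a, "c": c, "b": b}
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(a, c, b)
    return total

def cond(target, given):
    return prob(**target, **given) / prob(**given)

p_b = prob(b=1)
p_b_a = cond({"b": 1}, {"a": 1})
p_b_c = cond({"b": 1}, {"c": 1})
print(p_b, p_b_a, p_b_c)                               # 1/2 37/50 9/10
print(1 > p_b_c > p_b_a > p_b > 0)                     # True
print(cond({"b": 1}, {"c": 1, "a": 1}) == p_b_c)       # True: C screens off A
```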

2.2 Pearl’s Graphical Method and Glymour’s Algorithm

This approach maintains an axiomatic definition of causality. In other words, it details the conditions necessary for a set of variables to be classified as causal. They include the following (for simplicity’s sake, relationships are represented by simple events):

1. The relationship must be transitive: if A causes B and B causes C, then A causes C.
2. The relationship must be local, in the sense that it meets the Markov condition: events are due to their nearest causes.
3. The relationship must be irreflexive: an event cannot cause itself, although this does not mean that all events must be causally explained. Every causal explanation includes events that are accepted without being derived from prior events.
4. Finally, the relationship must be asymmetric: if A causes B, then B cannot simultaneously be a cause of A. This does not rule out feedback over time, in which B affects a later occurrence of A.

The idea here is that the causal dependence between a set of variables can be represented by a directed acyclic graph (DAG). Pearl (2000) presents this graphical tool as an alternative formalization of causality. Drawing on the Cowles Commission’s analysis in the fifties, Pearl criticizes economists’ current tendency to interpret the equal sign (=) in structural models as the usual mathematical symbol instead of as a representation of direct causality. From his viewpoint, equations should represent causal models: the effects are located on the left of the equal sign and their causes on the right. All equations represent causal laws. These equations can be represented by graphs with a unique causal direction and no causal simultaneity. A probability measure can be assigned to the represented variables so that it reflects the dependence or independence between them. This is known as a Bayesian network. Basically, a Bayesian network consists of a directed acyclic graph (DAG) whose nodes are variables in the domain of interest. Each node is associated with the probability distribution of its variable conditional on its parents in the graph (or with its marginal distribution if it has no parents). Nodes and probabilities are linked by a fundamental assumption known as the Markov condition: each variable is probabilistically independent of its non-descendants, conditional on its parents in the graph. The joint probability of any combination of the variables in the domain can then be calculated as the product of these conditional distributions. A causal network is a Bayesian network in which the graph structure is interpreted in terms of direct causal relations. In Figure 2.3, A and B cause C, and both cause D indirectly, through C. Under a causal interpretation, the Markov condition, now called the causal Markov condition (CMC), states that each variable is probabilistically independent of all variables that are not its effects, conditional on its direct causes. It is normally assumed that the CMC holds if the graph correctly describes the causal relations between the variables and no causally relevant variable is missing.

Figure 2.3: Causal Sequence between Events

[Figure: A → C ← B, C → D]
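The sketch below is a minimal Bayesian network with the structure of Figure 2.3. It assumes binary variables and hypothetical conditional probabilities. The joint distribution is the product of each node's probability given its parents (the Markov factorization). The checks confirm two implications of the Markov condition for this graph: D is independent of A and B given C, and A and B are marginally independent. The last line shows that A and B become dependent once their common effect C is observed.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical binary network with the structure of Figure 2.3:
# A -> C <- B, C -> D. All probabilities are illustrative assumptions.
ORDER = ("A", "B", "C", "D")                 # a topological order of the DAG
PARENTS = {"A": (), "B": (), "C": ("A", "B"), "D": ("C",)}
CPT = {                                      # P(node = 1 | parent values)
    "A": {(): F(1, 2)},
    "B": {(): F(1, 4)},
    "C": {(1, 1): F(9, 10), (1, 0): F(3, 5), (0, 1): F(1, 2), (0, 0): F(1, 10)},
    "D": {(1,): F(4, 5), (0,): F(1, 5)},
}

def joint(world):
    """Markov factorization: product of P(node | parents) over all nodes."""
    p = F(1)
    for node in ORDER:
        p1 = CPT[node][tuple(world[q] for q in PARENTS[node])]
        p *= p1 if world[node] == 1 else 1 - p1
    return p

def prob(**fixed):
    """Probability that the variables in `fixed` take the given values."""
    total = F(0)
    for values in product((0, 1), repeat=len(ORDER)):
        world = dict(zip(ORDER, values))
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(world)
    return total

def cond(target, given):
    return prob(**target, **given) / prob(**given)

print(prob() == 1)                                                          # True
# Markov condition: given its parent C, D carries no further dependence on A, B.
print(cond({"D": 1}, {"C": 1, "A": 1, "B": 0}) == cond({"D": 1}, {"C": 1}))  # True
# A and B are independent, but become dependent once their effect C is known.
print(prob(A=1, B=1) == prob(A=1) * prob(B=1))                              # True
print(cond({"A": 1}, {"C": 1, "B": 1}) == cond({"A": 1}, {"C": 1}))         # False
```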

The CMC also embodies the common cause principle proposed by Reichenbach (1956). This principle establishes that if variables in a system that meets the Markov condition show a high degree of association, the association is due to a common, possibly latent, factor. Spirtes, Glymour and Scheines (2000) use the CMC and the faithfulness condition (FC) to bridge the gap between causal structures and non-experimental data. These conditions allow causal inferences to be drawn from observational data as if there had been an experimental intervention. The faithfulness condition assumes that probabilistic dependence rests entirely on the underlying causal connections. In other words, all the independence and conditional independence relations between observed variables result from the CMC applied to the true causal structure that generated the data. These two conditions lead to the manipulation theory proposed by Meek and Glymour (1994). When probabilities satisfy the CMC and the FC, and when intervention is ideal in the sense of manipulability, causal inference is legitimate. Given an external intervention on a variable A in a causal model, the investigator can derive the resulting probability distribution over the complete model simply by altering the conditional probability distribution of A. If the intervention on A is strong enough, it can be regarded as A’s only cause. The model does not have to be changed, provided that the system’s causal structure remains unaltered. The meaning of intervention here is similar to the definition proposed by Woodward, with no human action required. To implement this theory, Glymour and his group of investigators developed specific software, called TETRAD, for the analysis of structural equation models.
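Altering only the conditional distribution of the manipulated variable is usually called truncated factorization. The sketch below is a minimal illustration built on a hypothetical three-variable model. Z causes both X and Y, and X also causes Y. All names and numbers are assumptions, and the manipulated variable is X rather than A. The intervention replaces P(X | Z) with a point mass and leaves every other factor unchanged. As a result, observing X = 1 and setting X = 1 give different probabilities for Y.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model (illustrative numbers): Z -> X, Z -> Y, X -> Y.
ORDER = ("Z", "X", "Y")
PARENTS = {"Z": (), "X": ("Z",), "Y": ("X", "Z")}
CPT = {  # P(node = 1 | parent values)
    "Z": {(): F(1, 2)},
    "X": {(1,): F(4, 5), (0,): F(1, 5)},
    "Y": {(0, 0): F(1, 10), (1, 0): F(3, 10), (0, 1): F(7, 10), (1, 1): F(9, 10)},
}

def joint(world, do=None):
    """Markov factorization; nodes in `do` have their mechanism replaced."""
    do = do or {}
    p = F(1)
    for node in ORDER:
        if node in do:
            # Ideal intervention: P(node | parents) becomes a point mass.
            p *= 1 if world[node] == do[node] else 0
        else:
            p1 = CPT[node][tuple(world[q] for q in PARENTS[node])]
            p *= p1 if world[node] == 1 else 1 - p1
    return p

def prob(fixed, do=None):
    total = F(0)
    for values in product((0, 1), repeat=len(ORDER)):
        world = dict(zip(ORDER, values))
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(world, do)
    return total

def cond(target, given, do=None):
    return prob({**target, **given}, do) / prob(given, do)

seeing = cond({"Y": 1}, {"X": 1})           # observational P(Y=1 | X=1)
doing = cond({"Y": 1}, {}, do={"X": 1})     # interventional P(Y=1 | do(X=1))
print(seeing, doing)                        # 39/50 3/5
```

The gap between the two numbers is the confounding contributed by Z. Without the path from Z to X, both quantities would coincide.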
TETRAD uses search algorithms to find all possible paths between variables, much as other programs for automatic path search, such as LISREL and EQS, do. Different causal search algorithms have been developed along this line of reasoning (Spirtes, Glymour and Scheines, 2000). In general, they start with the estimation of correlations between the variables. The most common is the PC algorithm, which assumes that graphs are strictly acyclic, or recursive. The process begins with a graph in which all the variables are causally connected in an unknown direction. Successive independence tests are then performed between pairs of adjacent variables. The algorithm conditions first on the empty set, then on sets of one, two, three variables and so on, until no larger conditioning sets remain. Whenever independence is found, the edge between the two variables is removed from the graph. Once the pairs have been checked, the algorithm examines triples of variables in which two variables are not adjacent but are both connected to the third. If the third variable is not in the set that made the other two independent, it is taken to be a common effect of both, and the triple is oriented towards it. A provisional causal arrangement is obtained once these triples have been identified. The strategy followed by Glymour and his group can be divided into two steps. They first generate a list of all the relevant relations contained in the data. They then scan the structure to see which explanation accounts most simply for the largest number of these relations. Most applications of these methods assume that the causal structure is acyclic, but relations between economic variables are often cyclic or simultaneously determined. Some progress has been made in extending the analysis to simultaneous causal systems (Pearl, 2000; Richardson, 1997; Richardson, Spirtes and Glymour, 1997). There are few applications of this approach to economic issues, but we can mention Swanson and Granger (1997), Demiralp and Hoover (2003) and Demiralp, Hoover and Perez (2008). The graphical method offers an alternative approach to the traditional identification problem in structural vector autoregressions (SVAR). Identification generally involves imposing constraints on contemporaneous relations, and it depends on a priori assumptions.
Swanson and Granger (1997) exploit conditional independence in the data to reduce the number of identifying assumptions. Demiralp and Hoover (2003), in turn, provide Monte Carlo evidence on the effectiveness of the graphical method for discovering the true contemporaneous structure of an SVAR. The impact and use of the graphical approach, together with the Glymour algorithm, is considerable. But it has also been the object of harsh criticism. Cartwright (1999), for example, argues that the elimination approach is useless if relevant variables and genuine causes have not been included from the start. According to Cartwright, some hypotheses can only be investigated through randomized experiments, rather than by seeking the CMC in non-experimental data. Randomized experiments, however, cannot be used in sciences such as astronomy and geology, which have nonetheless produced solid causal inferences. Cartwright argues that causal laws cannot be reduced to probabilistic laws, so the CMC is questionable. According to Cartwright (1999), probabilities can be a guide to causes, but they are like the symptoms of a disease: there is no general formula for divining the disease from one symptom. In other words, there is no universal condition that can be imposed on all causal structures. Cartwright (1983) claims that it is a mistake to draw universal causal inferences: causal relations are always confined to one specific population. Dupré and Cartwright (1988) suggest that there are only probabilistic mechanisms or capacities, not probabilistic causal laws for everything. A causal explanation depends on the stability of the capacities. In contrast to probabilistic causality, which is relative to a set of variables, capacities remain constant outside the context in which they were measured (see Section 2.1.1.4 for a more detailed description of these concepts). Likewise, Cartwright (1999) is skeptical about the universal nature of the faithfulness condition.
This condition generally rules out two equally strong causal effects that cancel each other out. Without it, however, any lack of evidence for a causal connection could be explained away by arguing that the effects cancel each other out. In response, Glymour (1999) accepts the criticism that the true model of a scientific issue is too complex for all relevant variables to be included. But it is also true that science is rarely interested in absolute truth. An investigation that correctly discovers the causes of much of the variation in a social phenomenon, while leaving out minor causes, would be a victory. Another important criticism was provided by Humphreys and Freedman (1996), who highlight the limitations of the approach of Spirtes and his colleagues. They indicate, for example, that there is no coherent reason to believe that a graphical representation is synonymous with causation. The claim that the graph represents causes rests on the CMC, and the CMC is only the Markov condition with the added assumption that the graphical arrangement represents causality. Humphreys and Freedman remain firm: causality is not a consequence of the theory put forward by Spirtes, but another assumption.

Investigators using this approach would therefore need to know the causes beforehand. For other criticisms, and for a discussion of the examples analyzed in the book by Spirtes, Glymour and Scheines (2000), see Freedman and Humphreys (1998).
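The PC search described in this section can be sketched in a few dozen lines. The sketch below simplifies it under stated assumptions. Conditional independence queries are answered by a hypothetical oracle that lists the independencies implied by the graph in Figure 2.3; in practice these answers come from statistical tests on data. After colliders are oriented, only one of the standard orientation-propagation rules (Meek's first rule) is applied. With this oracle, the sketch recovers the full structure A → C ← B, C → D.

```python
from itertools import combinations

# Hypothetical oracle: the conditional independencies implied by the DAG
# A -> C <- B, C -> D (Figure 2.3). With real data, a statistical test
# (e.g. on partial correlations) would answer these queries instead.
INDEPENDENCIES = {
    (frozenset("AB"), frozenset()),
    (frozenset("AD"), frozenset("C")),
    (frozenset("AD"), frozenset("BC")),
    (frozenset("BD"), frozenset("C")),
    (frozenset("BD"), frozenset("AC")),
}

def independent(x, y, given):
    return (frozenset((x, y)), frozenset(given)) in INDEPENDENCIES

def pc(nodes, independent):
    # Step 1 (skeleton): start from the complete undirected graph and remove
    # an edge as soon as some conditioning set of size 0, 1, 2, ... separates it.
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    depth = 0
    while any(len(adj[v]) - 1 >= depth for v in nodes):
        for x in nodes:
            for y in sorted(adj[x]):
                if y not in adj[x]:          # already removed in this pass
                    continue
                for s in combinations(sorted(adj[x] - {y}), depth):
                    if independent(x, y, s):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        depth += 1

    # Step 2 (colliders): for x - z - y with x, y non-adjacent, orient
    # x -> z <- y when z is not in the set that separated x and y.
    directed = set()
    for z in nodes:
        for x, y in combinations(sorted(adj[z]), 2):
            if y not in adj[x] and z not in sepset[frozenset((x, y))]:
                directed |= {(x, z), (y, z)}

    # Step 3 (Meek's first rule): x -> z - y with x, y non-adjacent gives z -> y.
    changed = True
    while changed:
        changed = False
        for x, z in list(directed):
            for y in adj[z]:
                if (y != x and y not in adj[x]
                        and (z, y) not in directed and (y, z) not in directed):
                    directed.add((z, y))
                    changed = True
    return adj, directed

adj, directed = pc(list("ABCD"), independent)
print(sorted(directed))   # [('A', 'C'), ('B', 'C'), ('C', 'D')]
```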

2.3 Causal Approaches in Economics and Econometrics

John Stuart Mill, one of the classic points of reference when discussing methodology in economics, argues that economics is an inexact and separate science whose general principles are known a priori. The basis of all economic knowledge is thus the ceteris paribus principle (Hausman, 1989). Following Mill’s premise, the Cowles Commission (Koopmans, 1950; Hood and Koopmans, 1953) developed the approach called simultaneous equation modeling (SEM), using economic theory a priori. On the other hand, the seminal work of Haavelmo (1944) introduced statistical techniques with which the joint endogeneity of the data could be formulated. These two contributions were of key importance for the development of 20th century econometrics (Morgan, 1990). Haavelmo established the basis for the statistical treatment of endogenous variables. The Cowles Commission’s work builds on this foundation and is of particular interest for the relationship between theory and data. Heckman (1999) maintains that most econometric theory is an adaptation of statistical developments. The main difference between econometrics and statistics is that econometric analysis pursues causal relations, whereas statistics focuses on the analysis of correlations. He argues that the principal contribution of econometrics is the definition of causal parameters, their identification from data and the role of these parameters in policy assessment.

2.3.1 Principal Theoretical Currents concerning Causation

Hoover (2006) establishes a classification of the principal doctrines on causation in economics, which is shown in Table 2.2. This section reviews the principal theoretical currents, though not all of them.

Table 2.2: Theoretical Approaches in Economics

| | Structural | Process |
|---|---|---|
| A priori | Cowles Commission’s theory, Koopmans (1953), Hood and Koopmans (1953) | Zellner (1979) |
| Inferential | Simon (1953), Hoover (1990, 2001), Favero and Hendry (1992), Angrist and Krueger (1999, 2001) | Granger (1969), Sims (1980) |

Source: Hoover (2006).

2.3.1.1 A Priori Structural Approach

The Cowles Commission relates causality to the invariant properties of structural econometric models. This approach emphasizes the distinction between internal and external variables and the identification and estimation of structural parameters. It assumes that economics has an a priori character, as discussed by Mill. The approach’s apriorism comes from confidence that economic theory is an appropriate guide for the identification of causality, largely based on the simultaneous equation model (SEM). This method was mainly used to estimate Keynesian macroeconomic models and the parameters of supply and demand curves.

In the mid-60s, the Cowles Commission research programme was broadly perceived as intellectually successful but empirically unsuccessful. The term structural is not explicitly defined by the Cowles Commission. In general, it refers to the difference between the structural and the reduced form of the model. In the structural form, each internal variable is expressed in terms of other internal and external variables. The reduced form is a solution in which each internal variable is expressed only in terms of the external variables. The critics of the a priori structural approach included Lucas (1976), who questions the theoretical classification of variables as external or internal. Lucas rejects the simultaneous equation approach for the following reason. If agents anticipate certain events, the internal variables will change while the external variables do not. The estimated coefficients will then capture not the effect of the external variables, but a combination of the two factors. Lucas (1976) highlights the importance of rational expectations and argues that structural models cannot be used to identify causal effects. This criticism is closely related to the problem of parameter identification and to the concept of exogeneity. The alternative consists of letting all the variables participate symmetrically and treating them all as internal, which leads to the vector autoregression (VAR) proposed by Sims (1980).
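The distinction between structural and reduced forms can be illustrated with a hypothetical two-equation market model. The coefficients below are assumptions chosen for illustration. The sketch solves the structural form for the internal variables (quantity q and price p) in terms of the external variables (income y and a weather shock w). It then recovers the slopes as ratios of reduced-form coefficients (indirect least squares). That recovery relies on the exclusion restrictions stated in the comments.

```python
import numpy as np

# Hypothetical structural form (coefficients are illustrative assumptions):
#   demand: q = 10 - 1.0 p + 0.5 y     (y: income, external; excluded from supply)
#   supply: q =  2 + 2.0 p - 1.0 w     (w: weather shock, external; excluded from demand)
# Written as B @ [q, p] = G @ [1, y, w], with internal variables on the left.
B = np.array([[1.0, 1.0],      # q + 1.0 p = 10 + 0.5 y
              [1.0, -2.0]])    # q - 2.0 p =  2 - 1.0 w
G = np.array([[10.0, 0.5, 0.0],
              [2.0, 0.0, -1.0]])

# Reduced form: each internal variable as a function of external ones only.
PI = np.linalg.solve(B, G)     # rows: q, p; columns: constant, y, w
print(np.round(PI, 4))

# Each reduced-form coefficient mixes several structural parameters.
# Because w is excluded from demand, the demand slope is a ratio of
# reduced-form coefficients; likewise y identifies the supply slope.
demand_slope = PI[0, 2] / PI[1, 2]   # (dq/dw) / (dp/dw)
supply_slope = PI[0, 1] / PI[1, 1]   # (dq/dy) / (dp/dy)
print(demand_slope, supply_slope)    # approximately -1.0 and 2.0
```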

2.3.1.2 Inferential Structural Approach

Simon (1957) formalizes the concept of causal order as an asymmetric relation that is invariant to interventions on the system's basic parameters. Simon (1953) shows that causality can be defined in structural models not only between internal and external variables but also among internal variables. He also shows that the conditions for a correctly defined causal order are similar to the identification conditions.


Hendry (1995) defines a cause as a quantitative process that induces changes over time within a parametric structure. A structure is an entity that remains invariant to interventions and directly characterizes the relations involved. The relationship between cause and effect is asymmetric: the effect cannot induce the cause. Hoover (2001) defines a cause as an asymmetric relation between unconnected parameters within a causal structure, which is a feature of reality. Providing empirical evidence of a given causal relation between two internal variables in a model involves establishing the causal order and ensuring that the equation is identified. A basic assumption here is that the unobservable external variables are uncorrelated with the observable external variables; this is another way of expressing Reichenbach's common cause principle. Simon, on the other hand, departs from the Cowles Commission proposal by suggesting that causal direction is revealed by controlled or natural experiments. Hoover (2001) generalizes Simon's approach to non-linear equations, showing that the idea of a natural experiment can be captured by institutional changes, historical events and other non-statistical information that allows structural breaks to be tested in the system under analysis. This non-statistical information is important for recognizing an intervention as belonging to the process that governs the causal relationship. Hoover therefore argues that the causal process can be identified with non-statistical information, and not through the methodological apriorism discarded by the rational expectations critique. The microeconometric current of causal analysis led by Angrist and Krueger (1999, 2001) can be seen as following the scheme proposed by Simon (Hoover, 2006), with the identification of causal relationships guided by instrumental variables (Angrist, Imbens and Rubin, 1996).
A natural experiment is a change in a relevant policy or factor that can be identified by non-statistical information.

2.3.1.3 Inferential Process Approach

This approach relies on the statistical properties of the process under study to identify causal direction. In a series of papers, Sims (1972, 1980, 1986) promoted the definitive abandonment of the SEM approach in favour of a more flexible technique, the vector autoregression, in which all variables are treated as endogenous. Sims stresses what he calls the "incredible" nature of the identification assumptions used by the Cowles Commission. The causality proposed by Granger (1969) and Sims (1980) is inferential and data-based, with no reference to the economic theories that could support such a process or relationship. Granger's analysis is close to the criterion established by Hume: temporal order is the key to causal identification, and all possible a priori information is set aside. In terms of our philosophical review, this approach fits perfectly with the position adopted by Suppes (1970): the probabilistic approach identifies a cause with an increase in the probability of the effect, which is essentially the definition applied by Granger. In the following chapters, we consider incremental predictability and the vector autoregression approach more broadly.

2.3.1.4 A priori Process Approach

Zellner (1979) claims that causality can be defined in terms of predictability according to a law or set of laws. He therefore agrees with Granger on the inferential approach, but in his view this inference rests on the theoretical support of a guiding law, which acts as a framework for distinguishing genuine laws from false or non-causal regularities. Zellner criticizes Granger's position on causation for two reasons. First, it is not satisfactory to identify cause with a mere order in time. Second, Granger's approach is atheoretical: lacking a reference theory, Granger's method can pick up accidental regularities.


2.4 Summary

In this chapter, we have presented the principal currents of thought regarding causation. It is difficult to provide a neutral definition of causation, as it has been conceived as a relationship, a force, or a connection between events, objects, variables, states, etc. The most recent theories combine mathematical logic, graph theory and Bayesian methods, among other tools. The situation is similar in economics, where we also find a large collection of definitions of causal relationships.

Each approach offers its own definition of causation and an empirical method for detecting it. In any event, the inferential approaches (structural and process) have become dominant in present-day econometrics. The structural approach focuses more on microeconomic issues, such as program evaluation and natural experiments, whereas the process approach finds its main area of empirical application in macroeconomics. These approaches should not be seen as competing, as they rest on different assumptions. Our interest lies in spatial econometrics, where the researcher has no control over the variables. We therefore naturally prefer Granger's (1969) approach.

Briefly, Granger's approach assumes weak stationarity of the series involved, say $x_t$ and $y_t$. Let $\Lambda_t$ be the information set available at time $t$. This set includes all the historical information, up to period $t$, of both series, $x_t$ and $y_t$, together with $z_t$, which represents the set of contextual variables within which the causal relationship is investigated. That is:

$$\Lambda_t \equiv \{x_{t-j},\, y_{t-j},\, z_{t-j}\}, \quad \forall j \geq 0.$$

Let $\Lambda'_t \equiv \{y_{t-j},\, z_{t-j}\}$, $\forall j \geq 0$, be the information set that excludes the past and present of $x_t$, and let $\sigma^2(\cdot)$ denote the variance of the corresponding forecast error. According to Granger (1969), $x_t$ causes $y_t$ if and only if

$$\sigma^2\left(y_{t+1} \mid \Lambda_t\right) < \sigma^2\left(y_{t+1} \mid \Lambda'_t\right).$$

In other words, $x_t$ causes $y_t$ if $y_{t+1}$ can be predicted more precisely (with a smaller forecast-error variance) when the past values of $x_t$ are included in the information set. Granger (1980) highlights a fundamental condition: the causal variable must contain unique information about the effect variable, and this information must not be present in any other variable. As a result, the cause variable must help to predict the effect variable better. This discussion leads to the concept of incremental predictability as a quantifiable measure for testing causation. The same idea was put forward by Wiener (1956): "For two simultaneously measured signals, if we can predict the first signal better by using the past information from the second one than by using the information without it, then we call the second signal causal to the first one". Our objective is to develop, on the basis of the Granger-Wiener framework, a concept of causation in space similar to the one proposed for time series.
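As an illustration of incremental predictability, the following Python sketch compares the in-sample one-step forecast-error variance of $y_{t+1}$ obtained from a regression on a reduced information set (a constant and $y_t$, standing in for $\Lambda'_t$) with the one obtained when $x_t$ is added (standing in for $\Lambda_t$). It is a minimal sketch under stated assumptions, not a formal Granger test: a single lag replaces the full past, there are no contextual variables $z_t$, the data are simulated so that $x$ drives $y$ with a one-period delay, and the helper names (`ols_ssr`, `granger_variances`) are hypothetical.

```python
import random

def ols_ssr(rows, target):
    """Least squares via the normal equations; returns the sum of squared residuals."""
    k = len(rows[0])
    # augmented system [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, target))] for i in range(k)]
    for c in range(k):  # forward elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(c + 1, k):
            f = a[r][c] / a[c][c]
            for j in range(c, k + 1):
                a[r][j] -= f * a[c][j]
    b = [0.0] * k
    for c in reversed(range(k)):  # back substitution
        b[c] = (a[c][k] - sum(a[c][j] * b[j] for j in range(c + 1, k))) / a[c][c]
    return sum((t - sum(bi * ri for bi, ri in zip(b, r))) ** 2
               for r, t in zip(rows, target))

def granger_variances(y, x):
    """Forecast-error variances of y[t+1] given (1, y[t]) and given (1, y[t], x[t])."""
    target = y[1:]
    restricted = [[1.0, y[t]] for t in range(len(y) - 1)]          # no information on x
    unrestricted = [[1.0, y[t], x[t]] for t in range(len(y) - 1)]  # adds x[t]
    n = len(target)
    return ols_ssr(restricted, target) / n, ols_ssr(unrestricted, target) / n

# Hypothetical data in which x feeds into y with a one-period delay
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [0.0] + [0.8 * x[t - 1] + 0.2 * rng.gauss(0, 1) for t in range(1, 500)]
var_r, var_u = granger_variances(y, x)
print(f"without x: {var_r:.3f}  with x: {var_u:.3f}")
```

Because the unrestricted regression nests the restricted one, its in-sample residual variance can never be larger; a formal test asks whether the reduction is statistically significant.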

3 Causality in Space. A Parametric Approach

As seen in the previous chapter, causation has been an issue of key importance in philosophy, economics and econometrics. In a sense, identifying an unequivocal causal relationship, in the way expected by economic theory, is fundamental for evaluating the relevance of an econometric specification. This concern, however, has received little attention in the specific context of cross-sectional spatial econometric models. Our proposal attempts to give due consideration to the problem of causation in spatial econometrics, adopting a definition of causation that emphasizes incremental information content. Intuitively, establishing causation means that the variable considered the cause provides additional information about the variable considered the effect. In turn, a reference to informative content requires a particular definition of information. We could consider, for instance, "phenomenon that provides things with meaning or sense", "organized set of processed data that constitutes a message about a given entity or phenomenon", or "numerical quantity that captures the uncertainty in the result of an experiment to be performed". The last definition is the best suited to our purpose and refers directly to the entropy of information (Information Theory), which we present in Chapter 4. A concern for capturing information from data has always been present in statistics and econometrics. As Soofi (1994) mentions, in a parametric context Fisher (1921) proposes the inverse of the variance of the sampling distribution of an estimator as the measure of the relevant information provided by the data about an unknown parameter. Based on the idea of incremental predictability, Granger (1969) uses the measure of information proposed by Fisher to develop the concept of causation that is most popular in econometrics.

We now present the treatment of causation for a pure cross-section, using information measures similar to those used by Granger for bivariate relations. In space, however, a series of prior steps is needed before testing for causation. Briefly, the strategy we propose begins with the analysis of the spatial structure and of the relationship between the variables. The detection of spatial dependence is implemented both in a single-equation framework and in a system of simultaneous equations. The consideration of systems of equations in space provides an adequate context for directly testing spatial causation. We end with a complementary strategy based on an iterated forecasting exercise for cross-sectional data. We complete the study with the corresponding Monte Carlo experiments.

3.1 Causation in Space with Information Constraints

Spatial econometrics is a relatively young discipline that has grown considerably in the last few decades, largely due to a growing concern for issues associated with space and territory. Its development was significantly influenced by Luc Anselin's textbook "Spatial Econometrics: Methods and Models", published in 1988. Given the significance of this book, it is interesting to note that the problem of causation is not even considered in it: its list of topics comprises approximately 500 items, none of them related to causation. The same occurs with other textbooks, such as Paelinck and Klaassen (1979), Upton and Fingleton (1985) or Anselin and Florax (1995), and with more recent publications such as Tiefelsdorf (2000), Griffith (2003), Anselin, Florax and Rey (2004), Getis, Mur and Zoller (2004), Arbia (2006) or Pace and Lesage (2009). Our bibliographical review detected only three papers that have considered the problem. The first, by Blommestein and Nijkamp (1981), adapts the structural approach to spatial data, using a recursive triangular system that follows the methodology suggested by Simon (1953). Second, Stern (2000) describes the difficulties involved in applying Granger's treatment of causation to space and highlights its importance in a different sense: Granger's approach should be interpreted as a test that indicates which variables are dependent and which are explanatory. Finally, Mur and Paelinck (2009) review Granger's concepts of identification and causation with regard to space, reinterpreting the terms future/past as near/distant. For testing purposes, these authors measure the explanatory capacity of the cause variable relative to the effect variable through the correlation coefficient, similar to Granger's test in time series. Other than these few papers, research on the subject is practically non-existent. This may be due to several reasons, some of them obvious. The first concerns the type of data and relations used in spatial econometrics: a single cross-section and static models (from a time perspective). As we have seen, one of the basic principles in the discussion of causation is the temporal precedence of the cause over the effect. This principle plays an important role in Granger's approach, where the information contained in the lags of one variable is used to examine whether it has causal relations with the rest. In a cross-section, by contrast, only simultaneous observations of the different variables are available. In turn, part of the literature holds that a cross-section reflects the long-term equilibrium solution, which avoids the need to address the issue of causation openly. This is a convenient yet unacceptable explanation.

Paelinck (1983) suggests a more interesting argument, insisting that disequilibria are dominant in space. The dynamics of economic events in space (defined as the time between a shock at one point in space and the materialization of all its effects) are very slow. From this perspective, a cross-section is the combination of a series of prior disequilibria that began in the past. This view is attractive from a theoretical perspective but clearly problematic, as it does not allow different causal pathways to be identified. Another traditional principle in the literature on causation is the physical proximity between cause and effect, which contradicts the principle of allotopy: "The factors that explain an economic phenomenon in one part of space are often located in distant areas", as stated by Ancot, Kuiper and Paelinck (1990, p. 141). The lack of physical proximity between interacting agents is the basis of spatial econometrics, one of whose objectives is to identify and evaluate the spatial dynamic mechanisms that produce interactions between points that are not necessarily close to each other in space. In general, the specialized literature has not paid much attention to spatial causation. Indeed, the five principles that, according to Ancot, Kuiper and Paelinck (1990, pp. 141-142), must guide model specification are:

1. Spatial models must include spatial interdependence.

2. These spatial interdependence relations will probably be asymmetrical.

3. The principle of allotopy.

4. Ex post interaction can be different from ex ante interaction.

5. The models must contain explicit topological elements.

As we can see, none of these principles explains why one variable is placed on the left-hand side of the equal sign and the rest on the right. The closest concept in this literature is explanation, in the sense that the right-hand-side variables should explain the variable on the left-hand side.


3.2 A Testing Strategy for Causality

When causality is analyzed in a time context, a sequence of issues must be resolved before the problem itself is tackled (Geweke, 1984). One of them, for example, is selecting the appropriate lag order for each variable, without questioning that each series can be explained by its own past values. A similar process can be followed in a spatial context, although a broader discussion is required. For example, selecting the spatial structure of each series is not straightforward: it is not enough to specify the order of the spatial lag; we also have to define the number of neighbours for that order. Once dependence has been detected in at least one of the series, we can analyze the relationship between the variables. In general, the analysis of dependence assumes that the detected spatial structure (order and number of neighbours) is the same for all the variables. However, although this is generally accepted, it is not always the case. To advance our study of spatial causality, we propose the strategy summarized in Figure 3.1. Given two spatial series, we first test for spatial dependence in each series. If the spatial information is irrelevant for both of them, we should use a traditional causal approach, such as those proposed by Pearl and Glymour. If we detect spatial dependence, space has to be treated as an informative structure that is relevant for the causality analysis, and we suggest including a procedure to select the most appropriate weighting matrix and its relevant order. Having identified the spatial structure of each series, we continue with the study of cross-dependence between the variables, using the previously detected structures. We now use a bivariate relationship to address spatial causation. The process ends here if there is no bivariate dependence. If there is dependence between the variables, we analyze the direction of the information flow. We suggest two strategies: Lagrange multipliers or iterated prediction. This final stage can lead to different conclusions, the most desirable of which is the detection of a single direction in the information.

Figure 3.1: Strategy for Approach of Causality

Bivariate Analysis of Causality among Series with Spatial Information [flowchart]:

1. Do the series contain spatial structure? No: space is not relevant (Pearl, Glymour approach or traditional analysis). Yes: go to step 2.
2. Is there spatial dependence among the series? No: only univariate dependence; there is no bivariate spatial causality mechanism. Yes: go to step 3.
3. Can a single direction be detected in the information? No: dependence between series or two-directionality. Yes: spatial causation.

3.2.1 Selection of the Spatial Structure

The spatial structure is usually introduced by means of a weighting matrix. This matrix plays an important role, as it essentially defines the set of neighbours of each location. In general, it seems reasonable to assume that nearby points will show stronger relationships than more distant ones. In econometric practice, the weighting matrix is generally constructed from geography or geometry, using the concepts of contiguity and distance. The criterion is selected subjectively, depending on the investigator's knowledge of the topic under study. This decision is important, because the weighting matrix conditions the subsequent analysis: we are selecting the structure within which we analyze causes and effects among the variables. This structure must contain appropriate and non-redundant information. The idea, then, is to use a statistical criterion that allows us to select, from several candidates, the most appropriate weighting matrix for the problem at hand. Surprisingly, this aspect has rarely been considered in the specialized literature. The only procedure we can suggest is the J test, known in econometrics as a tool for selecting between non-nested models (Davidson and MacKinnon, 1981).
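A common distance-based construction is the m − 1 nearest-neighbours criterion. The Python sketch below builds such a row-normalized matrix from point coordinates, assuming equal weights 1/(m − 1) for the neighbours of each location; the function name `knn_weights` and the random map on the unit square are illustrative assumptions, and ties in distance are broken by index. Varying m produces the kind of competing candidate matrices among which a selection criterion has to choose.

```python
import math
import random

def knn_weights(coords, m):
    """Row-normalized W linking each point to its m - 1 nearest points."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # distances to every other point; sorting on (distance, index) breaks ties by index
        others = sorted((math.hypot(xi - xj, yi - yj), j)
                        for j, (xj, yj) in enumerate(coords) if j != i)
        for _, j in others[:m - 1]:
            w[i][j] = 1.0 / (m - 1)  # equal weights, so each row sums to one
    return w

rng = random.Random(4)
coords = [(rng.random(), rng.random()) for _ in range(100)]  # hypothetical random map
W4 = knn_weights(coords, m=4)  # three neighbours per location
W8 = knn_weights(coords, m=8)  # seven neighbours: a competing candidate
```

The resulting matrices are generally asymmetric, since j can be among the nearest neighbours of i without i being among those of j.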

3.2.1.1 J Test

Kelejian (2008) discusses the applicability of the J test in a spatial context. Briefly, the problem he formulates is as follows. He proposes a SARAR(1, 1) model, where the subscript 0 indicates that it is the model under the null hypothesis:

$$y = X_0\beta_0 + \lambda_0 W_0 y + u_0, \qquad u_0 = \rho_0 M_0 u_0 + \varepsilon_0 \tag{3.2.1}$$

where $y$ denotes the $N \times 1$ vector of observations of the dependent variable and $X_0$ denotes the $N \times k$ matrix of regressors (in our case, it could contain a single constant term). Both $X_0$ and $y$ are measured without error. $W_0$ and $M_0$ are $N \times N$ spatial weighting matrices defined a priori, $\beta_0$ is a $k \times 1$ vector of unknown parameters, $\lambda_0$ and $\rho_0$ are unknown scalar parameters, $u_0$ denotes the $N \times 1$ vector of error terms and $\varepsilon_0$ is an $N \times 1$ vector of innovations, assumed to satisfy $\varepsilon_0 \sim \text{i.i.d.}\,(0, \sigma^2 I_N)$. This is called Model$_0$.

Under the alternative hypothesis, the data-generating process has a similar structure, Model$_1$:

$$y = X_1\beta_1 + \lambda_1 W_1 y + u_1, \qquad u_1 = \rho_1 M_1 u_1 + \varepsilon_1 \tag{3.2.2}$$

Premultiplying Model$_0$ by $(I_N - \rho_0 M_0)$ yields:

$$y_0(\rho) = Z_0(\rho)\,\gamma + \varepsilon_0 \tag{3.2.3}$$

where $y_0(\rho) = (I_N - \rho_0 M_0)\,y$ and $Z_0(\rho) = (I_N - \rho_0 M_0)\,Z_0$, with $Z_0 = (X_0, W_0 y)$ and $\gamma = (\beta_0', \lambda_0)'$. The same transformation can be applied to Model$_1$. In this context, the J test can be seen as a test on the following augmented equation:

$$y_0(\rho) = Z_0(\rho)\,\gamma + \phi\left[Z_1(\rho_1)\,\hat\gamma_1\right] + \varepsilon_0 \tag{3.2.4}$$

where $\hat\gamma_1$ is a consistent estimator of $\gamma_1$ and $\phi$ is a parameter whose value under the null hypothesis is $\phi = 0$. The parameters to be estimated are $\beta_0$, $\lambda_0$, $\rho_0$ and $\sigma_0^2$ for Model$_0$, and $\beta_1$, $\lambda_1$, $\rho_1$ and the variance $\sigma_1^2$ for Model$_1$. These coefficients can be obtained by the generalized method of moments (GMM) suggested by Kelejian and Prucha (1999) or by the more recent quasi-maximum likelihood (QML) method proposed by Burridge and Fingleton (2010). Below we briefly present the GMM procedure of Kelejian and Prucha. As model (3.2.4) contains a spatial lag of the dependent variable, the proposed estimation method is based on instrumental variables. Let the lists of instruments be:

$$T_0 = \left(X_0, W_0X_0, \ldots, W_0^rX_0, M_0W_0X_0, \ldots, M_0W_0^rX_0\right)_{LI}$$
$$T_1 = \left(X_1, W_1X_1, \ldots, W_1^rX_1, M_1W_1X_1, \ldots, M_1W_1^rX_1\right)_{LI}$$
$$\bar T = \left(\bar X, W\bar X, \ldots, W^r\bar X, MW\bar X, \ldots, MW^r\bar X\right)_{LI}$$

where $\bar X = (X_0, X_1)$ and the subscript $LI$ indicates that the columns of the corresponding matrices are linearly independent; typically, $r \leq 2$. Kelejian suggests the following procedure:

1. Estimate the null model (3.2.1) by two-stage least squares (2SLS), using the instrument matrix $T_0$, and obtain the residual vector $\hat u_0$. Repeat this procedure for the alternative model (3.2.2) by 2SLS, using the instrument matrix $T_1$.
2. Take $\hat\gamma_1$ in (3.2.4) as the 2SLS estimator of the alternative model based on the instrument matrix $T_1$.
3. Using the estimated residuals of the null model, $\hat u_0$, estimate the parameter $\rho_0$ by the generalized moments procedure (GMM) proposed by Kelejian and Prucha (1998). Replace $\rho_0$ with $\hat\rho_0$ and estimate the resulting model by 2SLS using the instrument matrix $T_0$. Obtain the residual vector $\hat\varepsilon$ and use it to estimate the corresponding variance, $\hat\sigma_\varepsilon^2 = \hat\varepsilon'\hat\varepsilon/N$. This is the generalized spatial two-stage least squares procedure.
4. Replace $\rho$ in (3.2.4) by $\hat\rho_0$. Taking $F = Z_1\hat\gamma_1$ as the empirical counterpart of the added term in (3.2.4), let
$$y_0(\hat\rho) \approx Z_0(\hat\rho_0)\,\gamma + \phi F + \varepsilon_0 \tag{3.2.5}$$
5. Estimate (3.2.5) by 2SLS using $\bar T$ as instruments.

Specifically, the set of regressors of (3.2.5) is denoted by $S = (Z_0(\hat\rho), F)$ and the regression parameters by $\eta = (\gamma', \phi)'$. Note that, under the null hypothesis, $\eta_0 = (\gamma', 0)'$. Let $\hat S = PS \equiv (\hat Z_0(\hat\rho), \hat F)$, where $P = \bar T(\bar T'\bar T)^{-1}\bar T'$, so that the 2SLS estimator of $\eta$ is:

$$\hat\eta = (\hat S'\hat S)^{-1}\hat S'\,y_0(\hat\rho).$$

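Kelejian (2008) shows that

$$N^{1/2}(\hat\eta - \eta) \xrightarrow{D} N\!\left(0,\ \sigma_\varepsilon^2 \left[\operatorname*{plim}_{N\to\infty} \frac{\hat S'\hat S}{N}\right]^{-1}\right), \qquad \hat\sigma_\varepsilon^2 \xrightarrow{P} \sigma_\varepsilon^2 \tag{3.2.6}$$

In finite samples, therefore, inference can be based on an approximation such as:

$$\hat\eta \approx N\!\left(\eta,\ \hat\sigma_\varepsilon^2 (\hat S'\hat S)^{-1}\right) \tag{3.2.7}$$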

Let $\bar k = k + 2$ and $\hat\eta = (\hat\gamma', \hat\phi)'$, and let $\hat V_{\hat\phi}$ be the estimated variance of $\hat\phi$, which appears in the $(k+2) \times (k+2)$ entry of the $\bar k \times \bar k$ matrix $\hat\sigma_\varepsilon^2(\hat S'\hat S)^{-1}$ in (3.2.7). Then a Wald test of $H_0: \phi = 0$ against $H_1: \phi \neq 0$ at the $\alpha\%$ level of significance rejects $H_0$ if

$$\hat\phi\,\hat V_{\hat\phi}^{-1}\,\hat\phi > \chi^2_{1-\alpha}(1) \tag{3.2.8}$$
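The estimator, its covariance approximation and the Wald statistic can be illustrated with a deliberately simplified Python sketch. It assumes a pure spatial-lag model $y = \beta x + \lambda W y + \varepsilon$ (no spatial error, so the GMM step for $\rho$ is skipped), a hypothetical ring-shaped map in which each unit's neighbours are the two units on either side, and the instrument list $(x, Wx, W^2x)$, that is, $r = 2$. It computes $\hat\eta = (\hat S'\hat S)^{-1}\hat S'y$ with $\hat S = PS$, the covariance $\hat\sigma_\varepsilon^2(\hat S'\hat S)^{-1}$ with $\hat\sigma_\varepsilon^2 = \hat\varepsilon'\hat\varepsilon/N$, and the Wald statistic for a single coefficient. All function names are hypothetical, and the coefficient tested here is $\lambda$ rather than the J-test parameter $\phi$.

```python
import random

def ring_lag(v, k=2):
    """W v on a hypothetical ring map: mean of the k neighbours on each side."""
    n = len(v)
    return [sum(v[(i + d) % n] + v[(i - d) % n] for d in range(1, k + 1)) / (2 * k)
            for i in range(n)]

def ls(cols, y):
    """Least squares of y on the given columns; returns (coefficients, (C'C)^-1)."""
    k = len(cols)
    # augmented matrix [C'C | I | C'y], reduced by Gauss-Jordan elimination
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [float(i == j) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [row[2 * k] for row in a], [row[k:2 * k] for row in a]

def fitted(cols, coef):
    return [sum(b * col[i] for b, col in zip(coef, cols)) for i in range(len(cols[0]))]

def two_sls(y, regressors, instruments):
    """2SLS with S_hat = P S, P = T (T'T)^-1 T'; returns (eta_hat, covariance)."""
    s_hat = [fitted(instruments, ls(instruments, s)[0]) for s in regressors]
    eta, inv = ls(s_hat, y)                                          # (S_hat'S_hat)^-1 S_hat'y
    resid = [yi - fi for yi, fi in zip(y, fitted(regressors, eta))]  # uses S, not S_hat
    sigma2 = sum(e * e for e in resid) / len(y)                      # e'e / N
    return eta, [[sigma2 * v for v in row] for row in inv]

def wald(estimate, variance):
    """Wald statistic for H0: coefficient = 0; compare with 3.841 (chi2(1), 5%)."""
    return estimate ** 2 / variance

rng = random.Random(2)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
eps = [0.2 * rng.gauss(0, 1) for _ in range(n)]
y = [0.0] * n
for _ in range(200):  # fixed-point iteration for y = 0.5 W y + x + eps
    y = [0.5 * wy + xi + e for wy, xi, e in zip(ring_lag(y), x, eps)]
wx = ring_lag(x)
eta, cov = two_sls(y, [x, ring_lag(y)], [x, wx, ring_lag(wx)])
print([round(v, 3) for v in eta], wald(eta[1], cov[1][1]))
```

Note that the residuals used for $\hat\sigma_\varepsilon^2$ are computed with the original regressors $S$, not with their projections $\hat S$; using $\hat S$ there is a common mistake that inflates the variance estimate.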

As an alternative to the asymptotic distribution, Burridge and Fingleton (2010) suggest a bootstrap procedure with better finite-sample properties. As a generalization of this procedure, Kelejian proposes a limited number, $g \geq 1$, of alternatives of the same type, in which Model$_0$ is not nested. The J test works reasonably well in finite samples, although it may lack power, especially when the rival matrices are very similar. For further details, see Burridge and Fingleton (2010). Recall that our objective is to select the most informative weighting matrix, assuming dependence between $x$ and $y$. In short, the problem of interest is $X_0 = X_1 = x$ and $\rho_0 = \rho_1 = 0$, but $W_0 \neq W_1$. In other words, there are two models, Model$_0$ and Model$_1$, with the same explanatory variable and no spatial autocorrelation in their error terms, but with different weighting matrices. The resulting specification is:

$$y = \beta_j x + \lambda_j W_j x + u_j, \qquad j = 0, 1 \tag{3.2.9}$$

This expression is a reformulation of (3.2.1) in which $W_j y$ is replaced by $W_j x$. In sum, few alternative procedures exist for selecting the correct weighting matrix. The J test aims to determine the spatial setting on which the rest of the analysis is based. Given the importance of this decision, the non-parametric chapter presents an alternative procedure to compete with the J test.
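Because specification (3.2.9) contains no spatial lag of $y$ and no spatial error, its regressors are exogenous and the J test reduces to the original Davidson and MacKinnon (1981) version estimated by ordinary least squares. The Python sketch below illustrates this special case under hypothetical assumptions: a ring-shaped map in which $W_j$ averages the $k_j$ nearest units on each side, data simulated with the matrix of one of the models, and helper names (`ring_lag`, `ols`, `j_test`) of our own. It fits Model$_1$, adds its fitted values $F$ to Model$_0$, and returns $\hat\phi$ together with the Wald statistic of (3.2.8); OLS replaces the 2SLS and GMM steps that the general SARAR case requires.

```python
import random

def ring_lag(v, k):
    """W_k v on a hypothetical ring map: mean of the k neighbours on each side."""
    n = len(v)
    return [sum(v[(i + d) % n] + v[(i - d) % n] for d in range(1, k + 1)) / (2 * k)
            for i in range(n)]

def ols(cols, y):
    """OLS of y on the columns; returns (coefficients, (X'X)^-1, s^2 = e'e/(N-K))."""
    k, n = len(cols), len(y)
    a = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)]
         + [float(i == j) for j in range(k)]
         + [sum(p * q for p, q in zip(cols[i], y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan on [X'X | I | X'y]
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        piv = a[c][c]
        a[c] = [v / piv for v in a[c]]
        for r in range(k):
            if r != c:
                f = a[r][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    b = [row[2 * k] for row in a]
    resid = [y[i] - sum(bj * cols[j][i] for j, bj in enumerate(b)) for i in range(n)]
    return b, [row[k:2 * k] for row in a], sum(e * e for e in resid) / (n - k)

def j_test(y, x, k0, k1):
    """J test of Model0 (W built with k0) against Model1 (k1) in y = b x + l W x + u."""
    w1x = ring_lag(x, k1)
    g1, _, _ = ols([x, w1x], y)                                # estimate Model1
    f = [g1[0] * xi + g1[1] * wi for xi, wi in zip(x, w1x)]    # F = Z1 g1_hat
    eta, inv, s2 = ols([x, ring_lag(x, k0), f], y)             # augmented Model0
    phi = eta[2]
    return phi, phi ** 2 / (s2 * inv[2][2])                    # (phi_hat, Wald statistic)

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [xi + wi + 0.05 * rng.gauss(0, 1) for xi, wi in zip(x, ring_lag(x, 4))]
phi_a, wald_a = j_test(y, x, k0=1, k1=4)  # H0 uses the wrong matrix
phi_b, wald_b = j_test(y, x, k0=4, k1=1)  # H0 uses the true matrix
print(round(phi_a, 3), wald_a > 3.841, round(phi_b, 3))
```

Running the test in both directions, as in the usage lines, is the usual practice with non-nested hypotheses: ideally one direction rejects and the other does not.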

3.2.2 Cross-Spatial Dependence Analysis

Having selected the spatial structure that best fits the data, the next step is to check for dependence between the variables involved. This can be done with at least two different approaches. The first is based on a simple generalization of Moran's I test; the second uses a vector autoregressive model adapted to the spatial case.

3.2.2.1 Bivariate Moran Test

The most popular test for spatial autocorrelation is probably Moran's I test (1950). It can easily be generalized to bivariate processes (Wartenberg, 1985):

$$I_{yx} = \frac{N}{S_0}\,\frac{\sum_{i=1}^{N}\sum_{j=1,\,j\neq i}^{N} w_{ij}\,(y_i - \bar y)(x_j - \bar x)}{\sqrt{\sum_{i=1}^{N}(y_i - \bar y)^2}\,\sqrt{\sum_{j=1}^{N}(x_j - \bar x)^2}} \tag{3.2.10}$$

where $S_0 = \sum_{i=1}^{N}\sum_{j=1,\,j\neq i}^{N} w_{ij}$ and $w_{ij}$ is the $(i, j)$ entry of the weighting matrix $W$.
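The statistic in (3.2.10) is straightforward to compute. The Python sketch below takes W as a dense list of lists; the four-unit ring with weight 0.5 on each of the two adjacent units is a hypothetical toy map chosen so that the results can be checked by hand. Function names are illustrative, and only $I_{yx}$ itself is computed, not its moments.

```python
import math

def bivariate_moran(y, x, w):
    """I_yx of (3.2.10); w is an N x N list of lists of spatial weights."""
    n = len(y)
    ybar, xbar = sum(y) / n, sum(x) / n
    dy = [v - ybar for v in y]
    dx = [v - xbar for v in x]
    s0 = sum(w[i][j] for i in range(n) for j in range(n) if i != j)
    cross = sum(w[i][j] * dy[i] * dx[j] for i in range(n) for j in range(n) if i != j)
    scale = math.sqrt(sum(v * v for v in dy)) * math.sqrt(sum(v * v for v in dx))
    return (n / s0) * cross / scale

def ring_weights(n):
    """Hypothetical row-normalized ring: the two adjacent units get weight 0.5 each."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][(i - 1) % n] = 0.5
        w[i][(i + 1) % n] = 0.5
    return w

w4 = ring_weights(4)
print(bivariate_moran([1, -1, 1, -1], [1, -1, 1, -1], w4))  # -1.0: neighbours always have opposite signs
print(bivariate_moran([1, 1, -1, -1], [1, 1, -1, -1], w4))  # 0.0: neighbour terms cancel out
```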

The expected value and variance of the statistic are given by the following expressions:

$$E(I_{yx}) = \frac{-r_{yx}}{N-1} \tag{3.2.11}$$

$$V(I_{yx}) = \frac{r_{yx}^2\, N\left[2B + (2S_3 - 2S_5)(N-3) + S_3 N_{23}\right] - \dfrac{m_{y^2x^2}}{\sigma_y^2\sigma_x^2}\left[6B + (4S_1 - 2S_2)(N-3) + S_1 N_{23}\right] + N\left[B + (2S_4 - S_6)(N-3) + S_4 N_{23}\right]}{(N-1)\,N_{23}\,S_0^2} - \left(\frac{-r_{yx}}{N-1}\right)^2 \tag{3.2.12}$$

where $r_{yx}$ is the population linear correlation coefficient between $x$ and $y$, $\sigma_y^2 = V(y)$, $\sigma_x^2 = V(x)$, $m_{y^2x^2} = \sum_{i=1}^{N} y_i^2 x_i^2 / N$, $N_{23} = (N-2)(N-3)$ and $B = S_0^2 - S_2 + S_1$, with

$$S_1 = \sum_{i=1}^{N}\sum_{j\neq i} w_{ij}(w_{ij} + w_{ji}), \quad S_2 = \sum_{i=1}^{N}(w_{i\bullet} + w_{\bullet i})^2, \quad S_3 = \sum_{i=1}^{N}\sum_{j\neq i} w_{ij}w_{ji},$$
$$S_4 = \sum_{i=1}^{N}\sum_{j\neq i} w_{ij}^2, \quad S_5 = \sum_{i=1}^{N} w_{i\bullet}w_{\bullet i}, \quad S_6 = \sum_{i=1}^{N}\left(w_{i\bullet}^2 + w_{\bullet i}^2\right),$$

where $w_{i\bullet} = \sum_{j=1}^{N} w_{ij}$ and $w_{\bullet j} = \sum_{i=1}^{N} w_{ij}$. A detailed derivation of these results can be found in Czaplewski and Reich (1993). For moderate sample sizes (in any event, $N > 40$), the authors recommend the statistic $T = (I_{yx} - E(I_{yx}))/V(I_{yx})^{1/2}$, which is approximately distributed as a standard normal. The null hypothesis of no correlation is rejected when $|T| > z_{\alpha/2}$, where $z_{\alpha/2}$ is the critical value of the standard normal distribution that leaves a probability of $\alpha/2$ in the right tail.

3.2.2.2 Lagrange Multiplier Test

A more elaborate alternative is to formulate a bivariate spatial vector autoregressive model based on a single cross-section, SpVARCS:

$$y = \alpha_0 + \sum_{k=1}^{k_{11}} \rho^k_{yy} W^k y + \sum_{k=0}^{k_{12}} \rho^k_{yx} W^k x + u_y \tag{3.2.13}$$
$$x = \beta_0 + \sum_{k=0}^{k_{21}} \rho^k_{xy} W^k y + \sum_{k=1}^{k_{22}} \rho^k_{xx} W^k x + u_x \tag{3.2.14}$$

One advantage of this model is that it provides a simple approach to a Granger-like spatial causality, as we shall see in the next section. The following specification (which can be modified if necessary) allows us to test the assumption of independence between the series:

$$[I_N - \rho_{yy}W]\,y + [\beta I_N + \rho_{yx}W]\,x + \mu_y = u_y \tag{3.2.15}$$
$$[\theta I_N + \rho_{xy}W]\,y + [I_N - \rho_{xx}W]\,x + \mu_x = u_x \tag{3.2.16}$$

In matrix notation:

$$AY + \mu = u$$

where $Y = [y', x']'$ is a $(2N \times 1)$ vector, with $y$ and $x$ two column vectors of dimension $(N \times 1)$. The vector $\mu$ is also of order $(2N \times 1)$: $\mu = m \otimes l$, where $l$ is an $(N \times 1)$ vector of ones and $m = [\mu_y, \mu_x]'$. The error vector is also broken down into two sub-vectors of order $(N \times 1)$, $u = [u_y', u_x']'$, where $u \sim N(0, \Sigma \otimes I_N)$ with $\Sigma = \operatorname{diag}(\sigma_y^2, \sigma_x^2)$, so that the variance-covariance matrix of $u$ is:

$$\Sigma \otimes I_N = \begin{bmatrix} \sigma_y^2 I_N & 0 \\ 0 & \sigma_x^2 I_N \end{bmatrix} \tag{3.2.17}$$

Matrix $A$ is of order $(2N \times 2N)$ with the following structure:

$$A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}, \qquad A_{11} = I_N - \rho_{yy}W, \quad A_{12} = \beta I_N + \rho_{yx}W, \quad A_{21} = \theta I_N + \rho_{xy}W, \quad A_{22} = I_N - \rho_{xx}W$$

Assuming normality, the log-likelihood function is:

$$L(Y, \Psi) = -N\ln(2\pi) - \frac{N}{2}\ln|\Sigma| + \ln|A| - \frac{1}{2}[AY + \mu]'(\Sigma \otimes I_N)^{-1}[AY + \mu]$$

with $\Psi = [\rho_{yy}, \beta, \rho_{yx}, \mu_y, \sigma_y^2, \rho_{xx}, \theta, \rho_{xy}, \mu_x, \sigma_x^2]'$. Writing $u_y = A_{11}y + A_{12}x + \mu_y l$ and $u_x = A_{21}y + A_{22}x + \mu_x l$, and denoting by $a_{ij}$ the $(i, j)$ block of $A^{-1}$, the score vector is:

$$l(\Psi) = \begin{bmatrix}
\partial L/\partial\rho_{yy} \\ \partial L/\partial\beta \\ \partial L/\partial\rho_{yx} \\ \partial L/\partial\mu_y \\ \partial L/\partial\sigma_y^2 \\ \partial L/\partial\rho_{xx} \\ \partial L/\partial\theta \\ \partial L/\partial\rho_{xy} \\ \partial L/\partial\mu_x \\ \partial L/\partial\sigma_x^2
\end{bmatrix} = \begin{bmatrix}
\frac{1}{\sigma_y^2}\, y'W'u_y - \operatorname{tr}(a_{11}W) \\
-\frac{1}{\sigma_y^2}\, x'u_y + \operatorname{tr}(a_{21}) \\
-\frac{1}{\sigma_y^2}\, x'W'u_y + \operatorname{tr}(a_{21}W) \\
-\frac{1}{\sigma_y^2}\, l'u_y \\
-\frac{N}{2\sigma_y^2} + \frac{u_y'u_y}{2\sigma_y^4} \\
\frac{1}{\sigma_x^2}\, x'W'u_x - \operatorname{tr}(a_{22}W) \\
-\frac{1}{\sigma_x^2}\, y'u_x + \operatorname{tr}(a_{12}) \\
-\frac{1}{\sigma_x^2}\, y'W'u_x + \operatorname{tr}(a_{12}W) \\
-\frac{1}{\sigma_x^2}\, l'u_x \\
-\frac{N}{2\sigma_x^2} + \frac{u_x'u_x}{2\sigma_x^4}
\end{bmatrix}$$

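The details can be found in the appendix to this chapter. The hypothesis of independence between the series is specified as:

$$H_0: A_{12} = A_{21} = 0 \tag{3.2.18}$$
$$H_1: A_{12} \neq 0 \ \vee\ A_{21} \neq 0 \tag{3.2.19}$$

Since $W$ has a zero diagonal, $H_0$ is equivalent to $\beta = \rho_{yx} = \theta = \rho_{xy} = 0$.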

The score, reordered and evaluated under $H_0$, simplifies as follows. Under $H_0$, $A$ is block diagonal, so $a_{12} = a_{21} = 0$ and the trace terms vanish; moreover, the derivatives with respect to the parameters estimated under the null are zero at the restricted estimates:

$$l(\Psi)\big|_{H_0} = \begin{bmatrix} \partial L/\partial\beta \\ \partial L/\partial\rho_{yx} \\ \partial L/\partial\theta \\ \partial L/\partial\rho_{xy} \\ \partial L/\partial\rho_{yy} \\ \partial L/\partial\mu_y \\ \partial L/\partial\sigma_y^2 \\ \partial L/\partial\rho_{xx} \\ \partial L/\partial\mu_x \\ \partial L/\partial\sigma_x^2 \end{bmatrix}_{H_0} = \begin{bmatrix} l_0 \\ 0 \end{bmatrix}, \qquad l_0 = \begin{bmatrix} -\frac{1}{\sigma_y^2}\, x'u_y \\ -\frac{1}{\sigma_y^2}\, x'W'u_y \\ -\frac{1}{\sigma_x^2}\, y'u_x \\ -\frac{1}{\sigma_x^2}\, y'W'u_x \end{bmatrix} \tag{3.2.20}$$

The Lagrange multiplier statistic is the quadratic form of the score in the inverse of the information matrix, both evaluated under the null hypothesis. Combining (3.2.20) with expression (3.4.84) of the appendix, we obtain:

$$LM_I = l_0'\, I^{11}\, l_0 \sim \chi^2_4 \tag{3.2.21}$$

where $I^{11}$ is the inverse of the variance-covariance matrix of the vector $l_0$, given in expression (3.4.85) of the appendix; the $\chi^2_4$ distribution holds asymptotically. Let $\alpha$, with $0 \leq \alpha \leq 1$, be such that $P(\chi^2_4 > \chi^2_\alpha) = \alpha$. To test

$H_0$: $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are mutually independent,

the decision rule for the $LM_I$ test at a confidence level of $100(1-\alpha)\%$ is: if $0 \leq LM_I \leq \chi^2_\alpha$, we cannot reject $H_0$; otherwise, we reject $H_0$.
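The restricted score $l_0$ only needs the residuals of the two separate spatial autoregressions estimated under $H_0$. The Python sketch below computes $l_0$ in the order $(\beta, \rho_{yx}, \theta, \rho_{xy})$, that is, from $x'u_y$, $x'W'u_y$, $y'u_x$ and $y'W'u_x$, and then the statistic $LM_I = l_0' I^{11} l_0$ with the decision rule above at $\alpha = 0.05$. Assumptions: the matrix $I^{11}$ must be supplied by the user (it comes from the appendix expression and is not reproduced here), the spatial lag is passed as a function `w_lag` returning $Wv$, the critical value 9.488 of the $\chi^2_4$ distribution is hard-coded, and all names are hypothetical.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def restricted_score(y, x, u_y, u_x, s2_y, s2_x, w_lag):
    """l0 in the order (beta, rho_yx, theta, rho_xy).

    u_y, u_x: residuals of the two separate models estimated under H0;
    s2_y, s2_x: their variance estimates; w_lag(v) must return W v.
    """
    return [-dot(x, u_y) / s2_y,          # -(1/s_y^2) x'u_y
            -dot(w_lag(x), u_y) / s2_y,   # -(1/s_y^2) x'W'u_y, written as (Wx)'u_y
            -dot(y, u_x) / s2_x,          # -(1/s_x^2) y'u_x
            -dot(w_lag(y), u_x) / s2_x]   # -(1/s_x^2) y'W'u_x

def lm_statistic(l0, i11):
    """LM_I = l0' I^11 l0, with I^11 supplied by the user."""
    k = len(l0)
    return sum(l0[a] * i11[a][b] * l0[b] for a in range(k) for b in range(k))

CHI2_4_CRIT = 9.488  # approx. 95% quantile of chi-square with 4 d.f. (alpha = 0.05)

def reject_independence(lm, critical=CHI2_4_CRIT):
    """Decision rule: reject H0 (independence) when LM_I exceeds the critical value."""
    return lm > critical

def swap(v):
    """Hypothetical two-unit map: each unit's only neighbour is the other one."""
    return v[::-1]

# Toy check; the identity matrix is a placeholder for I^11, not the appendix expression.
l0 = restricted_score(y=[1.0, 0.0], x=[0.0, 1.0], u_y=[1.0, 1.0], u_x=[2.0, 0.0],
                      s2_y=1.0, s2_x=1.0, w_lag=swap)
identity = [[float(i == j) for j in range(4)] for i in range(4)]
lm = lm_statistic(l0, identity)
print(l0, lm, reject_independence(lm))
```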

3.2.2.3 Behavior of the Independence Tests. Finite Samples

In this section, we present the size and power of the $I_{yx}$ and $LM_I$ tests by means of Monte Carlo simulation exercises. The same analysis will be performed in the non-parametric chapter, so we give a full presentation of the data-generating processes (D.G.P.), even though not all of them apply to the tests considered here. Each experiment starts by generating a random map in a hypothetical two-dimensional space. This irregular map is reflected in the corresponding normalized W matrix, which is constructed following the m − 1 nearest neighbours criterion. The following global parameters are involved in the D.G.P.:

$$ N \in \{100, 400, 1000\}, \qquad \rho \in \{0.3, 0.5, 0.7\}, \qquad m \in \{4, 6, 8\} \tag{3.2.22} $$

where N is the sample size, ρ is the spatial autocorrelation parameter and m is usually known as the embedding dimension. Briefly, the latter corresponds to the set formed by each observation and its m − 1 neighbours (we return to this term in the next chapter). In the experiments, we simulate both linear and non-linear relations between the variables x and y. In the linear case, we control the strength of the relation through, for instance, the coefficient of determination expected from the equation. Based on a specification such as:

$$ y = \beta x + \theta W x + \varepsilon, \tag{3.2.23} $$

the strength of the relation can be summarized by the expected coefficient $R^2_{y/x}$.

Under equation (3.2.23), the expected coefficient of determination is given below. The derivation assumes unit variances for x and ε, no correlation between the two variables, and a normalized W in which each observation has m − 1 equally weighted neighbours, so that the spatial lag $w_{i.}x$ has variance 1/(m − 1):
$$ R^2_{y/x} = \frac{\beta^2 + \theta^2/(m-1)}{\beta^2 + \theta^2/(m-1) + 1} $$

We have considered different values for this coefficient:
$$ R^2_{y/x} \in \{0.4, 0.6, 0.8\} \tag{3.2.24} $$

For simplicity, we keep β = 0.5 in all cases. The spatial lag parameter of y, ρ, is 0.3 for $R^2_{y/x}$ = 0.4, 0.5 for $R^2_{y/x}$ = 0.6 and 0.7 for $R^2_{y/x}$ = 0.8. The spatial lag parameter of x, θ, is then obtained by solving the previous expression for θ:
$$ \theta = \sqrt{\frac{(1-m)\left(\beta^2(1-R^2_{y/x}) - R^2_{y/x}\right)}{1-R^2_{y/x}}} $$
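The calibration of θ can be checked numerically. The sketch below computes the expected $R^2_{y/x}$ and inverts it for θ. It assumes unit-variance x and ε and equal weights 1/(m − 1) in each row of W, as in the derivation above. The expression under the square root is the one given above, rearranged as $(m-1)\left(R^2 - \beta^2(1-R^2)\right)/(1-R^2)$. The function names are illustrative.

```python
import math


def expected_r2(beta: float, theta: float, m: int) -> float:
    """Expected R^2 of y = beta*x + theta*W x + eps (unit variances, m - 1 equal-weight neighbours)."""
    signal = beta**2 + theta**2 / (m - 1)   # variance of beta*x_i + theta*w_i. x
    return signal / (signal + 1.0)          # Var(eps_i) = 1


def theta_for_r2(r2: float, beta: float, m: int) -> float:
    """Non-negative theta that delivers the target expected R^2 (inverse of expected_r2)."""
    radicand = (m - 1) * (r2 - beta**2 * (1.0 - r2)) / (1.0 - r2)
    if radicand < 0:
        raise ValueError("beta alone already explains more than the target R^2")
    return math.sqrt(radicand)


# Grid used in the experiments: beta = 0.5, R^2 in {0.4, 0.6, 0.8}, m in {4, 6, 8}
grid = {(r2, m): theta_for_r2(r2, 0.5, m) for r2 in (0.4, 0.6, 0.8) for m in (4, 6, 8)}
```

For example, with β = 0.5, $R^2_{y/x}$ = 0.4 and m = 4, the radicand is 3 · 0.25 / 0.6 = 1.25, so θ = √1.25 ≈ 1.118.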

Having defined the values of the parameters involved in the simulation, we can present the different processes used in the analysis. To analyze the empirical size, we have considered that the variables are distributed as follows:

$$ y \sim N(0,1), \qquad x \sim N(0,1) \tag{3.2.25} $$
Three linear and three non-linear models have been considered for the power analysis. The linear models appear in expressions (3.2.26), (3.2.27) and (3.2.28). The non-linear models are obtained by applying a non-linear transformation to the variable y of the corresponding linear case.

Linear Models

DGP1 (Intra-Dependence)
$$ y = \rho W y + \varepsilon \tag{3.2.26} $$
DGP2 (Inter-Dependence)
$$ y = \beta x + \theta W x + \varepsilon \tag{3.2.27} $$
DGP3 (Inter-Dependence and Intra-Dependence)
$$ y = (I - \rho W)^{-1}(\theta W x + \varepsilon) \tag{3.2.28} $$

Non-Linear Models

DGP4 (Intra-Dependence)
$$ y = 1/\left[(I - \rho W)^{-1}\varepsilon\right] \tag{3.2.29} $$
DGP5 (Inter-Dependence)
$$ y = 1/(\beta x + \theta W x + \varepsilon) \tag{3.2.30} $$
DGP6 (Inter-Dependence and Intra-Dependence)
$$ y = 1/\left[(I - \rho W)^{-1}(\theta W x + \varepsilon)\right] \tag{3.2.31} $$
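One Monte Carlo draw can be sketched in Python as below. Several details are assumptions, because the text does not specify them: the random map consists of points scattered uniformly on the unit square, neighbours are ranked by Euclidean distance, and W is row-standardised. x and ε are drawn independently from N(0, 1), as the text specifies for all DGPs. The reciprocals in DGP4-DGP6 are taken element by element. The function names are hypothetical.

```python
import numpy as np


def knn_weights(coords: np.ndarray, m: int) -> np.ndarray:
    """Row-standardised W in which each location has its m - 1 nearest neighbours."""
    n, k = coords.shape[0], m - 1
    if not 1 <= k < n:
        raise ValueError("need 1 <= m - 1 < number of locations")
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff**2).sum(axis=-1))          # Euclidean distances (an assumption)
    np.fill_diagonal(dist, np.inf)                  # a location is not its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]          # indices of the k closest locations
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0 / k   # each row sums to one
    return W


def simulate_dgp(dgp: int, W: np.ndarray, rho: float, beta: float, theta: float,
                 rng: np.random.Generator):
    """Draw (x, y) from DGP1-DGP6; DGP4-DGP6 are element-wise reciprocals of DGP1-DGP3."""
    n = W.shape[0]
    x = rng.standard_normal(n)
    eps = rng.standard_normal(n)
    A = np.eye(n) - rho * W                         # (I - rho W)
    core = dgp if dgp <= 3 else dgp - 3
    if core == 1:
        y = np.linalg.solve(A, eps)                 # reduced form of y = rho W y + eps
    elif core == 2:
        y = beta * x + theta * (W @ x) + eps
    elif core == 3:
        y = np.linalg.solve(A, theta * (W @ x) + eps)
    else:
        raise ValueError("dgp must be an integer between 1 and 6")
    if dgp >= 4:
        y = 1.0 / y
    return x, y


rng = np.random.default_rng(2024)
coords = rng.uniform(size=(100, 2))                 # hypothetical random map
W = knn_weights(coords, m=4)
x, y = simulate_dgp(3, W, rho=0.3, beta=0.5, theta=1.0, rng=rng)
```

The sketch solves the linear system $(I - \rho W)y = \cdot$ instead of forming $(I - \rho W)^{-1}$ explicitly. The two give the same draw, but solving is cheaper for N = 1000.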

In all cases, x ∼ N(0, 1), ε ∼ N(0, 1) and Cov(x, ε) = 0. As mentioned earlier, some processes will not be used in this section because they belong to the null hypothesis of the independence tests. This is the case of the linear model DGP 1 and its non-linear extension, DGP 4. For the other processes, the following tables summarize the results of the simulations. Table 3.1 shows the empirical size of the $I_{yx}$ test. The values are good, fluctuating around 5%.

Table 3.1: Empirical Size of $I_{yx}$ Test at 5% level
m     N = 100   N = 400   N = 1000
4     4.0       6.3       5.6
6     4.8       4.4       4.4
8     4.1       3.6       4.8
Note: Number of Replications: 1000
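Monte Carlo sampling error gives a more precise reading of "around 5%". An empirical rejection rate estimated from R independent replications has standard error $\sqrt{\alpha(1-\alpha)/R}$. The short sketch below computes the resulting approximate 95% band around the nominal level for R = 1000, assuming independent replications and a normal approximation.

```python
import math


def size_band(nominal: float = 0.05, reps: int = 1000, z: float = 1.96):
    """Approximate 95% band (in %) for an empirical size when the true size equals `nominal`."""
    se = math.sqrt(nominal * (1.0 - nominal) / reps)
    return 100.0 * (nominal - z * se), 100.0 * (nominal + z * se)


low, high = size_band()
print(f"{low:.2f}% to {high:.2f}%")   # prints "3.65% to 6.35%"
```

Empirical sizes outside this band point to a distortion larger than Monte Carlo noise alone would produce.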


Table 3.2 presents the power of the bivariate Moran test for the case of inter-dependence, DGP 2. The test performs only moderately for small sample sizes. As the sample size increases, however, its power rapidly rises to 100%.

Table 3.2: Empirical Power of $I_{yx}$ Test at 5% level, DGP 2
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   56.1  82.2  90.2    100   100   100     100   100   100
$R^2_{y/x}$ = 0.6   37.6  60.8  70.0    91.6  100   99.5    100   100   100
$R^2_{y/x}$ = 0.8   28.0  46.5  57.1    83.4  98.7  98.6    100   100   100
Note: Number of Replications: 1000

Table 3.3 shows the results for DGP 3, intra- and inter-dependence. The results are good in most of the simulated cases. For N = 100, as the number of neighbours increases, the estimated power rapidly approaches 100%.

Table 3.3: Empirical Power of $I_{yx}$ Test at 5% level, DGP 3
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   77.8  95.0  97.3    100   100   100     100   100   100
$R^2_{y/x}$ = 0.6   59.2  85.1  92.4    98.7  100   100     100   100   100
$R^2_{y/x}$ = 0.8   51.0  74.5  86.6    97.2  100   100     100   100   100
Note: Number of Replications: 1000

The estimated power values for the non-linear cases, DGP 5 and DGP 6, are summarized in Tables 3.4 and 3.5. Power falls considerably relative to the linear processes, and in no case does it exceed 10%. The bivariate Moran test is therefore highly sensitive to non-linearity in the relationship between the variables.



Table 3.4: Empirical Power of $I_{yx}$ Test at 5% level, DGP 5
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   4.3   5.3   3.0     7.3   5.1   8.5     5.9   9.7   4.2
$R^2_{y/x}$ = 0.6   3.9   4.1   4.2     4.5   3.2   9.7     7.7   5.3   5.1
$R^2_{y/x}$ = 0.8   5.5   4.9   5.5     7.1   7.6   6.6     5.8   3.7   8.4
Note: Number of Replications: 1000

Table 3.5: Empirical Power of $I_{yx}$ Test at 5% level, DGP 6
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   4.2   4.8   6.1     5.2   5.6   2.8     6.6   7.5   3.2
$R^2_{y/x}$ = 0.6   5.7   5.8   5.7     10.0  5.4   7.4     3.7   5.1   6.6
$R^2_{y/x}$ = 0.8   4.5   5.2   6.6     6.3   7.1   6.3     7.3   7.4   3.8
Note: Number of Replications: 1000

Next, we present the Monte Carlo results for the $LM_I$ test. Table 3.6 shows the empirical size of the test, with values below the nominal level of 5%.

Table 3.6: Empirical Size of $LM_I$ Test at 5% level
m     N = 100   N = 400   N = 1000
4     4.0       3.0       3.0
6     3.0       3.0       4.0
8     3.0       3.0       3.0
Note: Number of Replications: 1000

The estimated power is shown in Tables 3.7 and 3.8. For both processes, DGP 2 and DGP 3, the results are good, with values of practically 100% in nearly all cases. For small samples, the $LM_I$ test performs better than the $I_{yx}$ test.



Table 3.7: Empirical Power of $LM_I$ Test at 5% level, DGP 2
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   100   100   100     99.0  100   100     100   100   100
$R^2_{y/x}$ = 0.6   100   99.0  100     100   99.0  100     100   100   100
$R^2_{y/x}$ = 0.8   100   100   100     100   100   100     100   100   100
Note: Number of Replications: 1000

Table 3.8: Empirical Power of $LM_I$ Test at 5% level, DGP 3
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   97.0  99.0  97.0    100   100   100     100   100   100
$R^2_{y/x}$ = 0.6   97.0  99.0  98.0    100   100   100     100   100   100
$R^2_{y/x}$ = 0.8   97.0  96.0  99.0    100   100   100     100   100   100
Note: Number of Replications: 1000

Tables 3.9 and 3.10 summarize the performance of the Lagrange multiplier for non-linear processes. When the variables are non-linearly related, the power of the test is practically zero in both cases. This is similar to the result obtained for the bivariate Moran test, although the values here are considerably lower.

Table 3.9: Empirical Power of $LM_I$ Test at 5% level, DGP 5
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   1.0   0.0   1.0     2.0   1.0   0.0     1.0   1.0   1.0
$R^2_{y/x}$ = 0.6   1.0   0.0   1.0     0.0   1.0   0.0     1.0   0.0   0.0
$R^2_{y/x}$ = 0.8   0.0   0.0   2.0     0.0   1.0   1.0     1.0   2.0   0.0
Note: Number of Replications: 1000



Table 3.10: Empirical Power of $LM_I$ Test at 5% level, DGP 6
                    N = 100             N = 400             N = 1000
m                   4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4   0.0   0.0   0.0     0.0   0.0   0.0     1.0   1.0   1.0
$R^2_{y/x}$ = 0.6   0.0   2.0   3.0     0.0   0.0   0.0     0.0   0.0   0.0
$R^2_{y/x}$ = 0.8   2.0   2.0   1.0     2.0   0.0   2.0     0.0   1.0   0.0
Note: Number of Replications: 1000

In sum, the two tests work well with linear relationships, but they are unable to detect dependence between non-linearly related variables.

3.3 Spatial Causality
The Granger test of Chapter 2 rests on the idea that using the data on variable x, assumed to be the cause, helps to improve the forecast of the variable assumed to be the effect, y. That is, $\sigma^2(y_{t+1} \mid \Lambda_t) < \sigma^2(y_{t+1} \mid \Lambda'_t)$, where $\Lambda'_t$ denotes the information set without the history of x. The problem is that, in space, we cannot use the terms future or past. The standard solution is to resort to substitute concepts such as near/distant. We first present a traditional proposal in which we analyze causality using a bivariate system of equations. In the second part of the section, we examine an algorithm based solely on the predictive capacity of each variable. In each case we include the principal results of the Monte Carlo simulations.

3.3.1 The Lagrange Multiplier Version of the Granger Test
Following the logical sequence of Section 3.2.2.2, suppose the hypothesis of independence (3.2.18) in the bivariate system (3.2.15)-(3.2.16) has been rejected. We then turn to the hypothesis of non-causality (in information) of x relative to y:

$$ H_0: A_{12} = 0 \qquad H_1: A_{12} \neq 0 \tag{3.3.1} $$
or to the hypothesis of non-causality of y relative to x:
$$ H_0: A_{21} = 0 \qquad H_1: A_{21} \neq 0 \tag{3.3.2} $$
Assume that we are interested in analyzing the hypothesis of non-causality of x relative to y. The null hypothesis is:
$$ H_0: A_{12} = 0 \qquad H_1: A_{12} \neq 0 \tag{3.3.3} $$

Maintaining the same ordering in the score as in the previous section, under the null hypothesis, we obtain:
$$ l(\Psi)\big|_{H_0} = \begin{pmatrix} l_{\rho_{yx}} \\ l_{\beta} \\ l_{\rho_{xy}} \\ l_{\theta} \\ l_{\rho_{yy}} \\ l_{\mu_y} \\ l_{\sigma_y^2} \\ l_{\rho_{xx}} \\ l_{\mu_x} \\ l_{\sigma_x^2} \end{pmatrix}_{H_0} = \begin{pmatrix} l_0 \\ l_1 \end{pmatrix} = \begin{pmatrix} l_0 \\ 0 \end{pmatrix}, \qquad l_0 = \begin{pmatrix} -\frac{1}{\sigma_y^2}\, x'W'u_y - \operatorname{tr}\left(A_{11}^{-1}A_{21}A_{22}^{-1}W\right) \\ -\frac{1}{\sigma_y^2}\, x'u_y - \operatorname{tr}\left(A_{11}^{-1}A_{21}A_{22}^{-1}\right) \end{pmatrix} \tag{3.3.4} $$
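The trace terms in (3.3.4) come from the Jacobian $\log|A|$ of the bivariate system. The sketch below checks them numerically with central finite differences. Some of its ingredients are assumptions, not taken from the text: a ring-shaped W, arbitrary parameter values, and blocks $A_{11} = I - \rho_{yy}W$, $A_{22} = I - \rho_{xx}W$, $A_{21} = \theta I + \rho_{xy}W$. It also assumes $A_{12} = \beta I + \rho_{yx}W$, so that $\partial A_{12}/\partial\beta = I$ and $\partial A_{12}/\partial\rho_{yx} = W$, which is the parametrisation the signs in (3.3.4) suggest. At $A_{12} = 0$, the derivatives of $\log|A|$ should equal $-\operatorname{tr}(A_{11}^{-1}A_{21}A_{22}^{-1})$ and $-\operatorname{tr}(A_{11}^{-1}A_{21}A_{22}^{-1}W)$.

```python
import numpy as np


def ring_weights(n: int) -> np.ndarray:
    """Row-standardised W whose neighbours are the two adjacent units on a ring (illustrative)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W


def log_abs_det(A11, A12, A21, A22) -> float:
    """log|det A| for the 2N x 2N block matrix A = [[A11, A12], [A21, A22]]."""
    _, logdet = np.linalg.slogdet(np.block([[A11, A12], [A21, A22]]))
    return logdet


n = 30
W, I = ring_weights(n), np.eye(n)
rho_yy, rho_xx, theta, rho_xy = 0.4, 0.3, 0.5, 0.2      # illustrative values
A11, A22 = I - rho_yy * W, I - rho_xx * W
A21 = theta * I + rho_xy * W

h = 1e-6
# Central differences in beta (A12 = beta I) and rho_yx (A12 = rho_yx W) around A12 = 0
d_beta = (log_abs_det(A11, h * I, A21, A22) - log_abs_det(A11, -h * I, A21, A22)) / (2 * h)
d_rho_yx = (log_abs_det(A11, h * W, A21, A22) - log_abs_det(A11, -h * W, A21, A22)) / (2 * h)

core = np.linalg.inv(A11) @ A21 @ np.linalg.inv(A22)    # A11^{-1} A21 A22^{-1}
trace_beta = -np.trace(core)
trace_rho_yx = -np.trace(core @ W)
```

Because all blocks here are polynomials in the same W, they commute, and the ordering of $A_{11}^{-1}$, $A_{21}$ and $A_{22}^{-1}$ inside the trace does not matter.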

Again, the idea is to combine the score of (3.3.4) with the inverse of the information matrix which, for the null hypothesis of (3.3.3), appears in expression (3.4.126) of the Appendix. The Lagrange multiplier obtained is:



$$ LM_{NC} = l_0' I^{11} l_0 \sim \chi^2_2 \tag{3.3.5} $$

where $I^{11}$ is the inverse of the variance-covariance matrix of the vector $l_0$, evaluated under the null hypothesis, as shown in (3.4.127). Let α be a real number with 0 < α < 1, and let $\chi^2_\alpha$ satisfy $P(\chi^2_2 > \chi^2_\alpha) = \alpha$.

To test

$H_0$: $\{x_s\}_{s\in S}$ does not cause $\{y_s\}_{s\in S}$,

the decision rule for the $LM_{NC}$ test at a confidence level of 100(1 − α)% is: if $0 \le LM_{NC} \le \chi^2_\alpha$, we cannot reject $H_0$; otherwise, we reject $H_0$.

3.3.1.1 Performance for Finite Samples
The following global parameters are used in the D.G.P.:

$$ N \in \{100, 400, 1000\}, \qquad \rho \in \{0.3, 0.5, 0.7\}, \qquad m \in \{4, 6, 8\} \tag{3.3.6} $$

where N is the sample size, ρ the spatial autocorrelation parameter and m the embedding dimension (the number of neighbours of each observation, m − 1, plus one). As in the previous section, we use the expected $R^2_{y/x}$ to control the linear relationship between the variables x and y, in an equation such as:

$$ y = \beta x + \theta W x + \varepsilon \tag{3.3.7} $$

The β parameter is kept fixed at 0.5, while the value of θ is adjusted according to the expected $R^2_{y/x}$ and the embedding dimension, m.

To analyze the size, we consider a very simple D.G.P.:



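$$ y \sim N(0,1), \qquad x \sim N(0,1) \tag{3.3.8} $$
The following processes were considered for obtaining the estimated power:

DGP1
$$ y = (I - \rho W)^{-1}(\beta x + \theta W x + \varepsilon) \tag{3.3.9} $$
DGP2
$$ y = \left[(I - \rho W)^{-1}(\beta x + \theta W x + \varepsilon)\right]^{-1} \tag{3.3.10} $$
DGP3
$$ y = \exp\left[(I - \rho W)^{-1}(\beta x + \theta W x + \varepsilon)\right]^{3} \tag{3.3.11} $$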

Moreover, x ∼ N (0, 1) , ε ∼ N (0, 1) and cov (x, ε) = 0. Table 3.11 shows the empirical size results. The values are systematically lower than the nominal level of significance, with a single case (N = 100 and m = 8) exceeding this nominal value.

Table 3.11: Empirical Size of $LM_{NC}$ Test at 5% level
m     N = 100   N = 400   N = 1000
4     1.9       1.8       2.1
6     2.0       1.9       2.2
8     6.8       1.8       3.4
Note: Number of Replications: 1000

The following tables show the global estimated power of the statistic. Global power refers to the cases where we reject H0 of non-causality from x to y, but we cannot simultaneously reject H0 of non-causality from y to x.
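The definition of global power can be written as a small helper. The sketch assumes, hypothetically, that each replication yields one p-value for the $LM_{NC}$ test of non-causality from x to y and one for the reverse direction. A replication counts towards global power only when the first is rejected and the second is not.

```python
import numpy as np


def global_power(p_x_to_y, p_y_to_x, alpha: float = 0.05) -> float:
    """Percentage of replications rejecting 'x does not cause y' but not 'y does not cause x'."""
    p_xy = np.asarray(p_x_to_y, dtype=float)
    p_yx = np.asarray(p_y_to_x, dtype=float)
    if p_xy.shape != p_yx.shape:
        raise ValueError("one pair of p-values per replication is required")
    correct_direction = (p_xy < alpha) & (p_yx >= alpha)
    return 100.0 * correct_direction.mean()


# Four hypothetical replications: only the first and the last detect x -> y alone
example = global_power([0.01, 0.01, 0.20, 0.03], [0.50, 0.01, 0.50, 0.90])   # 50.0
```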



Table 3.12: Global Estimated Power of $LM_{NC}$ Test at 5% level, DGP 1
                               N = 100             N = 400             N = 1000
                     m         4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4    ρ = 0.3   29.0  12.0  11.0    92.0  93.0  93.0    99.0  100   100
                     ρ = 0.5   50.0  19.0  23.0    99.0  95.0  93.0    100   97.0  99.0
                     ρ = 0.7   69.0  31.0  20.0    100   93.0  95.0    98.0  100   97.0
$R^2_{y/x}$ = 0.6    ρ = 0.3   45.0  18.0  20.0    97.0  97.0  97.0    97.0  100   99.0
                     ρ = 0.5   52.0  30.0  22.0    100   97.0  95.0    100   98.0  98.0
                     ρ = 0.7   60.0  39.0  24.0    95.0  95.0  95.0    99.0  99.0  98.0
$R^2_{y/x}$ = 0.8    ρ = 0.3   29.0  22.0  16.0    90.0  89.0  89.0    100   100   99.0
                     ρ = 0.5   36.0  35.0  19.0    93.0  94.0  91.0    100   97.0  96.0
                     ρ = 0.7   45.0  29.0  22.0    94.0  98.0  92.0    100   99.0  100
Note: Number of Replications: 1000

Table 3.12 presents the results for DGP 1. The statistic's performance is acceptable. For N = 100 the power is low, 69% in the best case. It improves rapidly as the sample size increases, reaching more than 90% in most cases for samples of 400 and 1000 observations. Table 3.13 presents the performance of the $LM_{NC}$ test for the second data-generating process, DGP 2. The results are highly deficient, with extremely low estimated power.

Table 3.13: Global Estimated Power of $LM_{NC}$ Test at 5% level, DGP 2
                               N = 100             N = 400             N = 1000
                     m         4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4    ρ = 0.3   5.6   1.0   3.9     4.0   0.0   0.0     3.0   2.0   4.0
                     ρ = 0.5   5.3   6.0   4.8     3.0   5.0   5.0     2.0   1.0   4.0
                     ρ = 0.7   3.8   3.0   4.6     4.0   2.0   8.0     1.0   3.0   1.0
$R^2_{y/x}$ = 0.6    ρ = 0.3   4.1   4.0   6.3     6.0   6.0   3.0     2.0   1.0   1.0
                     ρ = 0.5   5.0   5.0   4.2     9.0   8.0   2.0     4.0   3.0   1.0
                     ρ = 0.7   4.5   3.0   4.9     6.0   4.0   6.0     0.1   3.0   1.0
$R^2_{y/x}$ = 0.8    ρ = 0.3   4.5   5.0   5.8     4.0   10.0  11.0    0.0   1.0   6.0
                     ρ = 0.5   5.9   4.0   7.0     6.0   8.0   12.0    5.0   1.0   7.0
                     ρ = 0.7   6.0   3.0   6.1     5.0   3.0   6.0     6.0   0.0   7.0
Note: Number of Replications: 1000


Table 3.14 shows the results obtained for the second non-linear process, DGP 3. Once again, the estimated power of the test is very low and never exceeds 12%.

Table 3.14: Empirical Power of $LM_{NC}$ Test at 5% level, DGP 3
                               N = 100             N = 400             N = 1000
                     m         4     6     8       4     6     8       4     6     8
$R^2_{y/x}$ = 0.4    ρ = 0.3   7.0   10.0  4.0     12.0  5.0   7.0     2.0   2.0   2.0
                     ρ = 0.5   8.0   4.0   4.0     11.0  9.0   12.0    4.0   0.0   2.0
                     ρ = 0.7   6.0   12.0  9.0     8.0   12.0  7.0     5.0   3.0   5.0
$R^2_{y/x}$ = 0.6    ρ = 0.3   3.0   5.0   6.0     10.0  8.0   9.0     2.0   4.0   5.0
                     ρ = 0.5   6.0   9.0   9.0     4.0   9.0   5.0     2.0   4.0   3.0
                     ρ = 0.7   3.0   3.0   10.0    4.0   3.0   4.0     6.0   5.0   5.0
$R^2_{y/x}$ = 0.8    ρ = 0.3   3.0   6.0   8.0     11.0  8.0   7.0     7.0   6.0   3.0
                     ρ = 0.5   6.0   7.0   4.0     2.0   5.0   7.0     8.0   5.0   6.0
                     ρ = 0.7   4.0   7.0   8.0     5.0   1.0   8.0     7.0   7.0   6.0
Note: Number of Replications: 1000

Again, as with $LM_I$, the power of the $LM_{NC}$ test is extremely low for non-linear processes. The test cannot detect the direction of the information flow between the variables under study.

3.3.2 Granger-Wiener Predictive Efficiency
In this section we develop an alternative to the Lagrange tests, based on Wiener's original idea about the prediction of signals from parallel processes. In general, forecasting in spatial econometrics is limited to two situations. The first is spatial interpolation: we work with observations from fixed locations and combine this information to generate new observations of the variable at a series of intermediate, non-observed points. The second is extrapolation, which consists of obtaining more disaggregated data from a given level of spatial aggregation. Spatial forecasting, whether extrapolation or interpolation, is directly related to well-known phenomena such as ecological inference (King, 1997) and the modifiable areal unit problem, or MAUP (Arbia, 1989). What these questions have in common is the problem of using relations between variables from different geographical areas. It is well known that economic data (and spatial data in particular) are aggregations of individual records. These basic data are randomly aggregated into political-administrative units of different shapes and sizes. The accumulation of basic records is not systematic, and this is the origin of the MAUP introduced by Openshaw and Taylor (1979).

Our approach does not specifically aim to forecast intermediate or disaggregated data. Rather, conditional on the level of aggregation of the data, our proposal identifies causality mechanisms by forecasting the data corresponding to each point of the space, using the information provided by the other locations. For this purpose, we adopt the framework proposed by Kelejian and Prucha (2007), adapted to the problem of causality in space. This framework is not substantially different from the bivariate approach with which we obtained the Lagrange multipliers. Specifically, Kelejian and Prucha (2007) frame the discussion in terms of two jointly normally distributed vectors, $Z_1$ and $Z_2$, with $(Z_1, Z_2) \sim N(\mu, V)$, where
$$ \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} \quad \text{and} \quad V = \begin{pmatrix} V_{11} & V_{12} \\ V_{21} & V_{22} \end{pmatrix}. $$

If the objective is to minimize the mean square error (MSE) of the forecast of $Z_1$ conditional on $Z_2$, the optimal predictor is the conditional mean, denoted by $E(Z_1 \mid Z_2 = z_2)$. Similarly, $V(Z_1 \mid Z_2 = z_2)$ denotes the corresponding conditional variance. Using standard results:

$$ E(Z_1 \mid Z_2 = z_2) = \mu_1 + V_{12}V_{22}^{-1}(z_2 - \mu_2), \qquad V(Z_1 \mid Z_2 = z_2) = V_{11} - V_{12}V_{22}^{-1}V_{21}. \tag{3.3.12} $$
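Expression (3.3.12) translates directly into code. The following is a minimal NumPy sketch that assumes $V_{22}$ is nonsingular; the helper name is illustrative. It solves a linear system instead of forming $V_{22}^{-1}$ explicitly, which is the usual, numerically safer choice.

```python
import numpy as np


def conditional_normal(mu1, mu2, V11, V12, V22, z2):
    """Mean and covariance of Z1 given Z2 = z2 when (Z1, Z2) is jointly normal."""
    mu1, mu2, z2 = (np.asarray(a, dtype=float) for a in (mu1, mu2, z2))
    V11, V12, V22 = (np.asarray(a, dtype=float) for a in (V11, V12, V22))
    K = np.linalg.solve(V22, V12.T).T      # K = V12 V22^{-1} (V22 symmetric)
    mean = mu1 + K @ (z2 - mu2)
    cov = V11 - K @ V12.T                   # V11 - V12 V22^{-1} V21, with V21 = V12'
    return mean, cov


# Scalar example: Var(Z1) = 2, Var(Z2) = 1, Cov(Z1, Z2) = 0.8, observed z2 = 0.5
mean, cov = conditional_normal([1.0], [0.0], [[2.0]], [[0.8]], [[1.0]], [0.5])
```

In the scalar example, the conditional mean is 1 + 0.8 · 0.5 = 1.4 and the conditional variance is 2 − 0.8² = 1.36.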

Turning back to the case under study, we first assume that y does not cause x in a given spatial context, and we want to determine whether x causes y. Consider the following model:

$$ y = \rho W y + \beta x + \theta W x + \varepsilon \tag{3.3.13} $$

where x and y are (N × 1) vectors, W is a normalized (N × N) weighting matrix and ε is a vector of error terms. The parameters ρ, β and θ are scalars. Assume that $\varepsilon \sim N(0, \sigma_\varepsilon^2 I_N)$, |ρ| < 1 and $(I_N - \rho W)$ is nonsingular, where $I_N$ is the identity matrix of order N. The reduced form of (3.3.13) is $y = (I_N - \rho W)^{-1}(\beta x + \theta W x + \varepsilon)$. Considering model (3.3.13), the objective is to forecast the i-th element of y, $y_i$:

$$ y_i = \rho w_{i.} y + \beta x_i + \theta w_{i.} x + \varepsilon_i \tag{3.3.14} $$

where $w_{i.}$ is the i-th row of W, and $x_i$ and $\varepsilon_i$ are the i-th elements of x and ε. Note that $w_{i.}y$ does not include $y_i$, because the elements of the main diagonal of W are zero. Next, let $H_{-i}$ be the ((N − 1) × N) matrix obtained by deleting the i-th row of the identity matrix $I_N$. Let $y_{-i}$ be the vector of the (N − 1) observations of variable y other than the i-th, so that $y_{-i} = H_{-i}y$. We can obtain the prediction of each observation of vector y conditional on the following data sets:

Λ1 = {x, W } , Λ2 = {x, W, wi. y} , Λ3 = {x, W, y−i } .


Clearly, $\Lambda_1$ and $\Lambda_2$ are contained in the full-information set, $\Lambda_3$. The $\Lambda_3$ information set includes the (N − 1) observations of vector y (all except the i-th), as well as vector x and the weighting matrix W. $\Lambda_2$ includes information about a linear combination of the observations of y, while the first set, $\Lambda_1$, contains no information about variable y. With the information set $\Lambda_1$, the forecasts are obtained from the reduced form of model (3.3.13). With the $\Lambda_2$ information set, the approximation is that of equation (3.3.14).

Accordingly, we consider three different predictors, denoted $y_i^{(p)}$, p = 1, 2, 3. Each of them is the conditional mean of variable y at point i given the corresponding information set $\Lambda_p$, p = 1, 2, 3. We thus obtain:

(a) Predictor for information level 1
$$ y_i^{(1)} = E(y_i \mid x, W) = l_i'(I - \rho W)^{-1}(\beta x + \theta W x) \tag{3.3.15} $$
where $l_i$ is a column vector of zeros except for the i-th element, which is equal to 1.

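(b) Predictor for information level 2
$$ y_i^{(2)} = E(y_i \mid x, W, w_{i.}y) = \rho w_{i.}y + \beta x_i + \theta w_{i.}x + \operatorname{cov}(\varepsilon_i, w_{i.}y)\left[VC(w_{i.}y)\right]^{-1}\left[w_{i.}y - E(w_{i.}y)\right] \tag{3.3.16} $$
where $w_{i.}y = \beta w_{i.}(I - \rho W)^{-1}x + \theta w_{i.}(I - \rho W)^{-1}Wx + w_{i.}(I - \rho W)^{-1}\varepsilon$. The conditional variance is $VC(w_{i.}y) = \sigma_\varepsilon^2\, w_{i.}(I - \rho W)^{-1}(I - \rho W')^{-1}w_{i.}'$, the covariance is $\operatorname{cov}(\varepsilon_i, w_{i.}y) = \sigma_\varepsilon^2\, w_{i.}(I - \rho W)^{-1}l_i$, and $E(w_{i.}y) = w_{i.}(I - \rho W)^{-1}(\beta x + \theta W x)$.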

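(c) Predictor for information level 3
$$ y_i^{(3)} = E(y_i \mid x, W, y_{-i}) = \rho w_{i.}y + \beta x_i + \theta w_{i.}x + \operatorname{cov}(\varepsilon_i, y_{-i})'\left[VC(y_{-i})\right]^{-1}\left[y_{-i} - E(y_{-i})\right] \tag{3.3.17} $$
where $VC(y_{-i}) = \sigma_\varepsilon^2\, H_{-i}(I - \rho W)^{-1}(I - \rho W')^{-1}H_{-i}'$ is the conditional variance, $\operatorname{cov}(\varepsilon_i, y_{-i}) = \sigma_\varepsilon^2\, H_{-i}(I - \rho W)^{-1}l_i$ and $E(y_{-i}) = H_{-i}(I - \rho W)^{-1}(\beta x + \theta W x)$.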
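The sketch below computes the three predictors (3.3.15)-(3.3.17) for a single location, together with the forecast-error variances (3.3.18)-(3.3.20) that the text derives next. Like the derivation, it treats ρ, β, θ and $\sigma_\varepsilon^2$ as known. It also assumes that W has a zero diagonal and a non-zero row for location i. The function name and the ring example are hypothetical. Since $\operatorname{cov}(y, \varepsilon_i) = \sigma_\varepsilon^2(I - \rho W)^{-1}l_i$, every covariance the predictors need can be read from that one column.

```python
import numpy as np


def spatial_predictors(y, x, W, rho, beta, theta, sigma2, i):
    """Predictors y_i^(1), y_i^(2), y_i^(3) and their forecast-error variances for location i."""
    n = len(y)
    B = np.linalg.inv(np.eye(n) - rho * W)            # (I - rho W)^{-1}
    mean_y = B @ (beta * x + theta * (W @ x))          # E(y | x, W)
    Sigma = sigma2 * B @ B.T                            # Var(y | x, W)
    cov_eps = sigma2 * B[:, i]                          # cov(y, eps_i) = sigma2 B l_i
    w = W[i]                                            # w_i. (row i of W)
    structural = rho * (w @ y) + beta * x[i] + theta * (w @ x)

    # Level 1: only x and W are used (reduced form)
    pred1, var1 = mean_y[i], Sigma[i, i]

    # Level 2: condition on the scalar w_i. y
    c2 = w @ cov_eps                                    # cov(eps_i, w_i. y)
    v2 = w @ Sigma @ w                                  # var(w_i. y)
    pred2 = structural + c2 / v2 * (w @ y - w @ mean_y)
    var2 = sigma2 - c2**2 / v2

    # Level 3: condition on y_{-i}, all observations except the i-th
    keep = np.arange(n) != i
    c3 = cov_eps[keep]                                  # cov(eps_i, y_{-i})
    coef = np.linalg.solve(Sigma[np.ix_(keep, keep)], c3)
    pred3 = structural + coef @ (y[keep] - mean_y[keep])
    var3 = sigma2 - c3 @ coef
    return (pred1, pred2, pred3), (var1, var2, var3)


# Illustration on a ring of 20 locations with simulated (hypothetical) data
n = 20
W = np.zeros((n, n))
for k in range(n):
    W[k, (k - 1) % n] = W[k, (k + 1) % n] = 0.5
rng = np.random.default_rng(1)
x, eps = rng.standard_normal(n), rng.standard_normal(n)
y = np.linalg.solve(np.eye(n) - 0.5 * W, 0.5 * x + 0.8 * (W @ x) + eps)
preds, variances = spatial_predictors(y, x, W, rho=0.5, beta=0.5, theta=0.8, sigma2=1.0, i=3)
```

Because the information sets are nested, the three variances should not increase from level 1 to level 3, which is a useful sanity check on any implementation.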

Let $e_i^{(j)} = y_i - y_i^{(j)}$, j = 1, 2, 3, be the forecast error of $y_i$ when the predictor is $y_i^{(j)}$. The variances of the respective forecast errors are then:
$$ \operatorname{var}\left(e_i^{(1)} \mid \Lambda_1\right) = \operatorname{var}\left(\left[(I - \rho W)^{-1}\varepsilon\right]_i \mid \Lambda_1\right) = \sigma_\varepsilon^2\, (I - \rho W)^{-1}_{i.}\left[(I - \rho W)^{-1}_{i.}\right]', \tag{3.3.18} $$
$$ \operatorname{var}\left(e_i^{(2)} \mid \Lambda_2\right) = \operatorname{var}(\varepsilon_i \mid \Lambda_2) = \sigma_\varepsilon^2 - \left[\operatorname{cov}(\varepsilon_i, w_{i.}y)\right]^2\left[\operatorname{var}(w_{i.}y)\right]^{-1}, \tag{3.3.19} $$
$$ \operatorname{var}\left(e_i^{(3)} \mid \Lambda_3\right) = \operatorname{var}(\varepsilon_i \mid \Lambda_3) = \sigma_\varepsilon^2 - \operatorname{cov}(\varepsilon_i, y_{-i})'\left[VC(y_{-i})\right]^{-1}\operatorname{cov}(\varepsilon_i, y_{-i}). \tag{3.3.20} $$
Using this information, we can develop different indicators to detect the existence of causal relationships. The following options seem natural: (a) the forecast error; (b) the variance of the forecast error; (c) the structural permanence test. In all three cases, the indicator should give better results when we follow the right direction of causality. For the forecast error, we assume that if variable x causes variable y, then:



$$ e_i^{(j)}(y_i \mid y_{-i}; x) \le e_i^{(j)}(y_i \mid y_{-i}) \tag{3.3.21} $$

In other words, if variable x causes variable y, the forecast error will be smaller for the correctly specified model (Aznar, 1989). To avoid sign cancellations, we use the absolute values of the errors. The proposal is to repeat the comparison (3.3.21) for each point in the sample, so that the discussion can be completed with a formal statistical test, such as a test of proportions. First, let $\tau_i$ be the indicator function defined as:

$$ \tau_i = \begin{cases} 1 & \text{if } i \text{ satisfies } e_i^{(j)}(y_i \mid y_{-i}) > e_i^{(j)}(y_i \mid y_{-i}, x) \\ 0 & \text{otherwise} \end{cases} \tag{3.3.22} $$

Then $\tau_i$ is a Bernoulli variable with probability of "success" p, where "success" means that the i-th observation meets the inequality criterion. If Q counts the number of successes in the sample, it is distributed as:

$$ Q = \sum_{i=1}^{N} \tau_i \sim B(N, p) \tag{3.3.23} $$

Hence the estimated proportion, $\hat{p}$, has an approximately normal distribution when N is large enough:
$$ \hat{p} = \frac{Q}{N} \approx N\left(p, \frac{p(1-p)}{N}\right) \tag{3.3.24} $$

If x causes y, this proportion should be significantly higher than 0.5. Under $H_0$, p = 0.5, so the one-sided interval in which $\hat{p}$ should fall is
$$ \left(-\infty;\ 0.5 + N_\alpha\sqrt{0.25/N}\right] \tag{3.3.25} $$
where $N_\alpha$ is the upper-α critical value of the standard normal distribution.

If $\hat{p}$ lies in the interval (3.3.25), we cannot reject $H_0$.
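The proportion test (3.3.22)-(3.3.25) can be sketched as follows. It compares absolute forecast errors, as the text indicates. This is a minimal illustration with hypothetical names. The inputs are the errors of the predictor without and with x for the same N locations, and $N_\alpha$ is taken from the standard-library normal distribution in Python.

```python
import math
from statistics import NormalDist


def proportion_test(err_without_x, err_with_x, alpha: float = 0.05):
    """One-sided test of H0: p = 0.5 against p > 0.5; returns (p_hat, upper bound, reject?)."""
    if len(err_without_x) != len(err_with_x) or len(err_with_x) == 0:
        raise ValueError("need one pair of forecast errors per location")
    n = len(err_with_x)
    # tau_i = 1 when adding x reduces the absolute forecast error at location i
    q = sum(abs(a) > abs(b) for a, b in zip(err_without_x, err_with_x))
    p_hat = q / n
    n_alpha = NormalDist().inv_cdf(1.0 - alpha)        # upper-alpha normal critical value
    upper = 0.5 + n_alpha * math.sqrt(0.25 / n)        # right end of interval (3.3.25)
    return p_hat, upper, p_hat > upper
```

With N = 100 and α = 0.05, the upper bound is about 0.5 + 1.645 · 0.05 ≈ 0.582. Adding x would therefore have to reduce the absolute error at 59 or more locations before $H_0$ is rejected.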

3.3.2.1 Monte Carlo Simulations
In this section we present some of the main results of the Monte Carlo experiments for the proportion test (3.3.25). Table 3.15 shows the empirical size for the three information levels. In most cases the empirical size falls well below the nominal 5% level.

Table 3.15: Empirical Size of $\hat{p}$ Test at 5% level
Informative Level   m     N = 100   N = 400   N = 1000
Λ1                  4     1.0       3.0       2.0
                    6     2.0       1.0       1.0
                    8     1.0       0.0       1.0
Λ2                  4     1.0       2.0       2.0
                    6     0.0       2.0       2.0
                    8     1.0       1.0       4.0
Λ3                  4     2.0       1.0       2.0
                    6     3.0       3.0       1.0
                    8     1.0       0.0       3.0
Note: Number of Replications: 1000

Tables 3.16, 3.17 and 3.18 show the estimated power for processes DGP 1, DGP 2 and DGP 3, respectively. Overall, the results are very disappointing, with very low estimated power even for the linear case, DGP 1. Whatever the information level, the test shows no ability to detect the right direction of the information flow.



Table 3.16: Global Estimated Power of pˆ Test at 5% level Λ1 m 2 Ry/x = 0.4

2 Ry/x = 0.6

2 Ry/x = 0.8

N = 100 4

2 Ry/x = 0.6

2 Ry/x = 0.8

2 Ry/x = 0.6

2 Ry/x = 0.8

7.0 4.0 3.0 8.0 10.0 10.0 10.0 13.0 19.0 20.0 25.0 13.0 26.0 20.0 26.0 23.0 30.0 20.0 N = 100

6

N = 1000 8

6

4

6

8

4

6

8

4

6

8

2.0 0.0 3.0 3.0 5.0 4.0 19.0 12.0 15.0

0.0 0.0 3.0 0.0 3.0 1.0 5.0 5.0 9.0

2.0 1.0 4.0 2.0 1.0 1.0 2.0 3.0 2.0

1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 2.0

0.0 1.0 2.0 2.0 0.0 1.0 2.0 3.0 1.0

6.0 6.0 9.0 6.0 6.0 5.0 9.0 8.0 9.0

5.0 3.0 7.0 3.0 5.0 8.0 7.0 1.0 5.0

2.0 6.0 8.0 6.0 4.0 5.0 6.0 6.0 8.0

3.0 5.0 5.0 5.0 4.0 4.0 8.0 3.0 6.0

8

4

1.0 1.0 0.0 0.0 2.0 0.0 2.0 10.0 2.0

6

8

0.0 5.0 5.0 3.0 2.0 5.0 2.0 6.0 1.0 5.0 2.0 2.0 2.0 6.0 10.0 10.0 1.0 8.0 N = 400

11.0 9.0 8.0 10.0 3.0 4.0 4.0 6.0 12.0 N

8

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

6

0.0 1.0 0.0 0.0 2.0 3.0 2.0 1.0 1.0 0.0 0.0 2.0 1.0 0.0 0.0 0.0 1.0 0.0 N = 100

0.0 6.0 5.0 5.0 3.0 3.0 2.0 5.0 2.0 7.0 2.0 5.0 3.0 5.0 4.0 12.0 1.0 14.0 N = 400

4

2.0 2.0 2.0 0.0 0.0 2.0 0.0 1.0 1.0

4

3.0 1.0 0.0 2.0 4.0 1.0 4.0 1.0 1.0

N = 400

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

Note: Number of Replications: 1000


4

6.0 16.0 14.0 22.0 24.0 27.0 35.0 47.0 33.0

Λ3 m 2 Ry/x = 0.4

8

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

Λ2 m 2 Ry/x = 0.4

6

DGP 1

4

8.0 7.0 6.0 6.0 4.0 5.0 8.0 3.0 6.0

9.0 7.0 6.0 10.0 8.0 10.0 8.0 6.0 8.0 8.0 8.0 6.0 9.0 9.0 3.0 2.0 12.0 11.0 = 1000 6

8

7.0 3.0 5.0 4.0 6.0 4.0 10.0 6.0 3.0 3.0 6.0 7.0 5.0 5.0 4.0 6.0 5.0 6.1 N = 1000


Table 3.17: Global Estimated Power of pˆ Test at 5% level Λ1 m 2 Ry/x = 0.4

2 Ry/x = 0.6

2 Ry/x = 0.8

N = 100 4

2 Ry/x = 0.6

2 Ry/x = 0.8

2 Ry/x = 0.6

2 Ry/x = 0.8

N = 1000 8

8

4

6

8

4

6

8

4

6

8

1.0 1.0 0.0 0.0 1.0 1.0 1.0 0.0 2.0

0.0 1.0 2.0 2.0 0.0 1.0 2.0 3.0 1.0

0.0 2.0 1.0 2.0 0.0 1.0 0.0 1.0 1.0

3.0 0.0 2.0 0.0 1.0 1.0 3.0 2.0 5.0

3.0 1.0 1.0 4.0 3.0 3.0 4.0 4.0 5.0

2.0 4.0 1.0 0.0 6.0 7.0 8.0 5.0 7.0

0.0 0.0 0.0 3.0 4.0 3.0 4.0 1.0 3.0

3.0 3.0 4.0 3.0 0.0 2.0 3.0 0.0 9.0

9.0 3.0 2.0 4.0 5.0 2.0 3.0 0.0 6.0

8

4

2.0 1.0 1.0 4.0 3.0 5.0 7.0 5.0 1.0 N

6

8

0.0 0.0 0.0 3.0 1.0 2.0 0.0 2.0 5.0 0.0 4.0 4.0 3.0 3.0 6.0 1.0 1.0 0.0 = 400

4.0 3.0 2.0 8.0 9.0 1.0 2.0 4.0 3.0 N

6

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

6

2.0 2.0 1.0 1.0 4.0 0.0 0.0 1.0 1.0 0.0 1.0 4.0 2.0 4.0 1.0 2.0 0.0 1.0 = 400

4

1.0 1.0 0.0 0.0 2.0 0.0 2.0 1.0 2.0 N

0.0 1.0 1.0 2.0 2.0 2.0 2.0 1.0 1.0 0.0 2.0 1.0 2.0 0.0 1.0 2.0 1.0 1.0 = 100

7.0 2.0 0.0 4.0 3.0 5.0 0.0 1.0 1.0 N

6

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

4

0.0 2.0 5.0 1.0 3.0 3.0 2.0 2.0 2.0 0.0 2.0 4.0 3.0 0.0 4.0 2.0 1.0 1.0 = 100

N = 400

4

3.0 1.0 0.0 2.0 4.0 1.0 4.0 1.0 1.0 N

Λ3 m 2 Ry/x = 0.4

8

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

Λ2 m 2 Ry/x = 0.4

6

DGP 2

4

4.0 4.0 4.0 3.0 2.0 4.0 4.0 4.0 5.0 N

3.0 0.0 0.0 4.0 4.0 6.0 2.0 0.0 3.0 3.0 3.0 1.0 2.0 1.0 8.0 1.0 4.0 3.0 = 1000 6

8

4.0 3.0 2.0 3.0 2.0 4.0 4.0 0.0 1.0 0.0 3.0 3.0 1.0 6.0 4.0 2.0 1.0 0.0 = 1000

Note: Number of Replications: 1000



Table 3.18: Global Estimated Power of pˆ Test at 5% level DGP 3 Λ1 m 2 Ry/x = 0.4

2 Ry/x = 0.6

2 Ry/x = 0.8

N = 100 4

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

Λ2 m 2 Ry/x = 0.4

2 Ry/x = 0.6

2 Ry/x = 0.8

4

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

Λ3 m 2 Ry/x = 0.4

2 Ry/x = 0.6

2 Ry/x = 0.8

0.0 0.0 0.0 0.0 1.0 1.0 1.0 1.0 1.0 N

ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7 ρ = 0.3 ρ = 0.5 ρ = 0.7

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 N

6

8

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 1.0 1.0 0.0 1.0 3.0 = 100 6

8

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 = 100

4

2.0 1.0 2.0 1.0 1.0 2.0 0.0 0.0 2.0 N 4

1.0 0.0 0.0 2.0 2.0 1.0 0.0 2.0 2.0 N

6

N = 1000 8

0.0 2.0 2.0 0.0 1.0 2.0 1.0 2.0 0.0 1.0 0.0 1.0 0.0 0.0 0.0 2.0 2.0 2.0 = 400 6

8

2.0 0.0 2.0 1.0 2.0 0.0 1.0 0.0 1.0 5.0 0.0 4.0 1.0 3.0 1.0 0.0 0.0 2.0 = 400

4

0.0 0.0 1.0 2.0 0.0 2.0 0.0 0.0 2.0 N 4

0.0 0.0 1.0 2.0 0.0 1.0 1.0 1.0 0.0 N

6

8

2.0 0.0 1.0 1.0 0.0 1.0 0.0 0.0 1.0 4.0 1.0 6.0 2.0 5.0 2.0 0.0 1.0 0.0 = 1000 6

8

3.0 5.0 1.0 5.0 0.0 7.0 0.0 0.0 1.0 2.0 1.0 1.0 4.0 0.0 2.0 2.0 1.0 1.0 = 1000

4

6

8

4

6

8

4

6

8

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

0.0 1.0 1.0 2.0 1.0 1.0 2.0 2.0 2.0

2.0 0.0 1.0 1.0 1.0 0.0 0.0 1.0 0.0

0.0 2.0 2.0 1.0 2.0 1.0 1.0 5.0 4.0

0.0 2.0 2.0 1.0 2.0 1.0 0.0 0.0 0.0

2.0 0.0 0.0 2.0 0.0 0.0 1.0 1.0 1.0

0.0 1.0 1.0 1.0 2.0 0.0 2.0 2.0 0.0

Note: Number of Replications: 1000


N = 400


3.4 Summary
This chapter has presented a strategy for detecting causality in space that places special emphasis on the concept of incremental information: a variable is classified as a cause if it provides additional information about the effect variable. As in time series analysis, we propose a sequence of steps. The first stage is the analysis of the univariate spatial structure of each series. Next, we test the hypothesis of independence between the series using, for example, the bivariate Moran test, $I_{yx}$, or the Lagrange Multiplier, $LM_I$. A multivariate simultaneous spatial model then provides an adequate context for a spatial Granger causality test, $LM_{NC}$.

A predictive approach has also been applied to supplement the $LM_{NC}$ test. As in time series forecasting, the idea is to predict the data corresponding to each point in the spatial system using the information provided by the other locations. The forecasts are evaluated by the mean absolute error (other indicators gave similar results), and this information is turned into a simple proportion test.

These procedures enable us to approach the concept of causality in space. The procedure based on the Lagrange Multiplier $LM_{NC}$ works reasonably well in the linear case. However, no clear sign of directionality is obtained when we apply this method to non-linear relationships. For the predictive approach, the results are unexpectedly poor, and no incremental predictive information can be detected. In general, the tests we have presented are unable to detect non-linear relationships appropriately. In the next chapter, we present an alternative proposal that can capture more useful information than the parametric approach.



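Appendix
NOTE 1: Inverse Partitioned Matrix
$$ A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \implies A^{-1} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \tag{3.4.1} $$
with
$$ a_{11} = \left[A_{11} - A_{12}A_{22}^{-1}A_{21}\right]^{-1} \tag{3.4.2} $$
$$ a_{12} = -a_{11}A_{12}A_{22}^{-1} \tag{3.4.3} $$
$$ a_{21} = -A_{22}^{-1}A_{21}a_{11} \tag{3.4.4} $$
$$ a_{22} = A_{22}^{-1} + A_{22}^{-1}A_{21}a_{11}A_{12}A_{22}^{-1} \tag{3.4.5} $$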
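Expressions (3.4.2)-(3.4.5) can be verified numerically. The sketch below builds $A^{-1}$ block by block and compares it with a direct inversion on an arbitrary test matrix. It assumes that $A_{22}$ and the Schur complement $A_{11} - A_{12}A_{22}^{-1}A_{21}$ are invertible.

```python
import numpy as np


def partitioned_inverse(A11, A12, A21, A22):
    """Inverse of [[A11, A12], [A21, A22]] from the partitioned formulas (3.4.2)-(3.4.5)."""
    A22_inv = np.linalg.inv(A22)
    a11 = np.linalg.inv(A11 - A12 @ A22_inv @ A21)            # (3.4.2)
    a12 = -a11 @ A12 @ A22_inv                                # (3.4.3)
    a21 = -A22_inv @ A21 @ a11                                # (3.4.4)
    a22 = A22_inv + A22_inv @ A21 @ a11 @ A12 @ A22_inv       # (3.4.5)
    return np.block([[a11, a12], [a21, a22]])


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)   # diagonal shift keeps A and its blocks well conditioned here
A_inv = partitioned_inverse(A[:3, :3], A[:3, 3:], A[3:, :3], A[3:, 3:])
```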

NOTE 2: Second Derivative of the Log-likelihood Function ∂2L N =− 2 ∂µy ∂µy σy

(3.4.6)

0

∂2L lx =− 2 ∂µy ∂β σy

(3.4.7)

0

∂2L l Wy =− 2 ∂µy ∂ρyx σy

(3.4.8)

0

l Wy ∂2L = ∂µy ∂ρyy σy2

(3.4.9)

0

∂2L l uy = 4 2 ∂µy ∂σy σy

(3.4.10)

0

∂2L l Wy =− 2 ∂µx ∂θ σx

(3.4.11)

0

∂2L l Wx = ∂µx ∂ρxx σx2

82

(3.4.12)

3.4 Summary

0

l Wy ∂2L =− 2 ∂µx ∂ρxy σx

(3.4.13)

0

∂2L l ux = 4 2 ∂µx ∂σx σx

(3.4.14)

∂2L xx −1 = − 2 − trA−1 22 A21 a11 A22 A21 a11 ∂β∂β σy

(3.4.15)

∂2L −1 −1 = −trA−1 22 a11 − trA22 A21 a11 A12 A22 a11 ∂β∂θ

(3.4.16)

0

∂2L −1 −1 −1 −1 = −trA−1 22 W A22 A21 a11 − trA22 A21 a11 A12 A22 W A22 A21 a11 ∂β∂ρxx

(3.4.17)

0

∂2L x Wx −1 −1 A21 a11 W A−1 = − 2 − trA22 22 A21 a11 − trA22 W a11 ∂β∂ρyx σy

(3.4.18)

0

∂2L x Wy = − trA−1 22 A21 a11 W a11 ∂β∂ρyy σy2

∂2L −1 −1 = −trA−1 22 A21 a11 A12 A22 W a11 − trA22 W a11 ∂β∂ρxy

(3.4.19)

(3.4.20)

0

∂2L x uy = 4 2 ∂β∂σy σy

(3.4.21)

∂2L yy −1 = − 2 + tra11 A12 A−1 22 a11 A12 A22 ∂θ∂θ σy

(3.4.22)

0

83

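$$\frac{\partial^2 L}{\partial\theta\,\partial\rho_{xx}} = \frac{y'Wy}{\sigma_x^2} - \operatorname{tr}\left(a_{11}A_{12}A_{22}^{-1}WA_{21}a_{11}\right) - \operatorname{tr}\left(a_{11}A_{12}A_{22}^{-1}WA_{22}^{-1}\right) \qquad (3.4.23)$$

$$\frac{\partial^2 L}{\partial\theta\,\partial\rho_{yx}} = -\operatorname{tr}\left(a_{11}WA_{22}^{-1}A_{21}a_{11}A_{12}A_{22}^{-1}\right) - \operatorname{tr}\left(a_{11}WA_{22}^{-1}\right) \qquad (3.4.24)$$

$$\frac{\partial^2 L}{\partial\theta\,\partial\rho_{yy}} = -\operatorname{tr}\left(a_{11}Wa_{11}A_{12}A_{22}^{-1}\right) \qquad (3.4.25)$$

$$\frac{\partial^2 L}{\partial\theta\,\partial\rho_{xy}} = -\frac{y'Wy}{\sigma_x^2} - \operatorname{tr}\left(a_{11}A_{12}A_{22}^{-1}Wa_{11}A_{12}A_{22}^{-1}\right) \qquad (3.4.26)$$

$$\frac{\partial^2 L}{\partial\theta\,\partial\sigma_x^2} = -\frac{x'u_x}{\sigma_y^4} \qquad (3.4.27)$$

$$\frac{\partial^2 L}{\partial\rho_{xx}\,\partial\rho_{xx}} = -\frac{x'W'Wx}{\sigma_x^2} - \operatorname{tr}\,A_{22}^{-1}\left[W + WA_{22}^{-1}A_{21}a_{11}A_{12} + A_{21}a_{11}A_{12}A_{22}^{-1}WA_{22}^{-1}A_{21}a_{11}A_{12} + A_{21}a_{11}A_{12}A_{22}^{-1}W\right]A_{22}^{-1}W \qquad (3.4.28)$$

$$\frac{\partial^2 L}{\partial\rho_{xx}\,\partial\rho_{yx}} = -\operatorname{tr}\,A_{22}^{-1}A_{21}a_{11}W\left[A_{22}^{-1}A_{21}a_{11}A_{12} + I\right]A_{22}^{-1}W \qquad (3.4.29)$$

$$\frac{\partial^2 L}{\partial\rho_{xx}\,\partial\rho_{yy}} = -\operatorname{tr}\left(A_{22}^{-1}A_{21}a_{11}Wa_{11}A_{12}A_{22}^{-1}W\right) \qquad (3.4.30)$$

$$\frac{\partial^2 L}{\partial\rho_{xx}\,\partial\rho_{xy}} = \frac{x'W'Wy}{\sigma_x^2} - \operatorname{tr}\,A_{22}^{-1}\left[I + A_{21}a_{11}A_{12}A_{22}^{-1}\right]Wa_{11}A_{12}A_{22}^{-1}W \qquad (3.4.31)$$

$$\frac{\partial^2 L}{\partial\rho_{xx}\,\partial\sigma_x^2} = \frac{x'W'u_x}{\sigma_x^4} \qquad (3.4.32)$$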

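$$\frac{\partial^2 L}{\partial\rho_{yx}\,\partial\rho_{yx}} = -\frac{x'W'Wx}{\sigma_y^2} - \operatorname{tr}\left(A_{22}^{-1}A_{21}a_{11}WA_{22}^{-1}A_{21}a_{11}W\right) \qquad (3.4.33)$$

$$\frac{\partial^2 L}{\partial\rho_{yx}\,\partial\rho_{yy}} = \frac{x'W'Wy}{\sigma_y^2} - \operatorname{tr}\left(A_{22}^{-1}A_{21}a_{11}Wa_{11}W\right) \qquad (3.4.34)$$

$$\frac{\partial^2 L}{\partial\rho_{yx}\,\partial\rho_{xy}} = -\operatorname{tr}\left[I + A_{22}^{-1}A_{21}a_{11}A_{12}\right]A_{22}^{-1}Wa_{11}W \qquad (3.4.35)$$

$$\frac{\partial^2 L}{\partial\rho_{yx}\,\partial\sigma_y^2} = \frac{x'W'u_y}{\sigma_y^4} \qquad (3.4.36)$$

$$\frac{\partial^2 L}{\partial\rho_{yy}\,\partial\rho_{yy}} = -\frac{x'W'Wy}{\sigma_y^2} - \operatorname{tr}\left(a_{11}Wa_{11}W\right) \qquad (3.4.37)$$

$$\frac{\partial^2 L}{\partial\rho_{yy}\,\partial\rho_{xy}} = -\operatorname{tr}\left(a_{11}A_{12}A_{22}^{-1}Wa_{11}W\right) \qquad (3.4.38)$$

$$\frac{\partial^2 L}{\partial\rho_{yy}\,\partial\sigma_y^2} = -\frac{x'W'u_y}{\sigma_y^4} \qquad (3.4.39)$$

$$\frac{\partial^2 L}{\partial\rho_{xy}\,\partial\rho_{xy}} = -\frac{y'W'Wy}{\sigma_x^2} - \operatorname{tr}\left(a_{11}A_{12}A_{22}^{-1}Wa_{11}A_{12}A_{22}^{-1}W\right) \qquad (3.4.40)$$

$$\frac{\partial^2 L}{\partial\rho_{xy}\,\partial\sigma_y^2} = -\frac{x'W'u_x}{\sigma_x^4} \qquad (3.4.41)$$

$$\frac{\partial^2 L}{\partial\sigma_y^2\,\partial\sigma_y^2} = \frac{R}{2\sigma_y^4} - \frac{u_y'u_y}{\sigma_x^6} \qquad (3.4.42)$$

$$\frac{\partial^2 L}{\partial\sigma_x^2\,\partial\sigma_x^2} = \frac{R}{2\sigma_y^4} - \frac{u_x'u_x}{\sigma_x^6} \qquad (3.4.43)$$

$$\frac{\partial^2 L}{\partial\beta\,\partial\sigma_x^2} = \frac{\partial^2 L}{\partial\theta\,\partial\sigma_y^2} = \frac{\partial^2 L}{\partial\rho_{xx}\,\partial\sigma_y^2} = \frac{\partial^2 L}{\partial\rho_{xx}\,\partial\sigma_x^2} = \frac{\partial^2 L}{\partial\rho_{xx}\,\partial\sigma_x^2} = \frac{\partial^2 L}{\partial\rho_{xx}\,\partial\sigma_x^2} = 0 \qquad (3.4.44)$$

$$\frac{\partial^2 L}{\partial\mu_y\,\partial\mu_x} = \frac{\partial^2 L}{\partial\mu_y\,\partial\theta} = \frac{\partial^2 L}{\partial\mu_y\,\partial\rho_{xx}} = \frac{\partial^2 L}{\partial\mu_y\,\partial\rho_{xy}} = \frac{\partial^2 L}{\partial\mu_x\,\partial\beta} = 0 \qquad (3.4.45)$$

$$\frac{\partial^2 L}{\partial\mu_x\,\partial\rho_{yx}} = \frac{\partial^2 L}{\partial\mu_x\,\partial\rho_{yy}} = \frac{\partial^2 L}{\partial\mu_x\,\partial\sigma_y^2} = \frac{\partial^2 L}{\partial\mu_y\,\partial\sigma_x^2} = \frac{\partial^2 L}{\partial\sigma_y^2\,\partial\sigma_x^2} = 0 \qquad (3.4.46)$$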

NOTE 3: Independence Hypothesis

The null hypothesis of independence between the series can be formulated as:

$$H_0: A_{12} = A_{21} = 0 \qquad H_1: A_{12} \neq 0 \,\lor\, A_{21} \neq 0 \qquad (3.4.47)$$

implying that:

$$E(y) = -\mu_y A_{11}^{-1}l = \delta_y \qquad (3.4.48)$$

$$E(x) = -\mu_x A_{22}^{-1}l = \delta_x \qquad (3.4.49)$$

$$E(yy') = A_{11}^{-1}\left[\mu_y^2 ll' + \sigma_y^2 I\right]A_{11}'^{-1} = \Delta_y \qquad (3.4.50)$$

$$E(xx') = A_{22}^{-1}\left[\mu_x^2 ll' + \sigma_x^2 I\right]A_{22}'^{-1} = \Delta_x \qquad (3.4.51)$$

$$E(yx') = (\mu_x\mu_y)A_{11}^{-1}ll'A_{22}'^{-1} = \Delta_{xy} \qquad (3.4.52)$$

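The elements of the information matrix, under the null hypothesis, are:

$$I_{\mu_y\mu_y} = \frac{N}{\sigma_y^2} \qquad (3.4.53)$$

$$I_{\mu_y\beta} = \mu_x\frac{l'A_{22}^{-1}l}{\sigma_y^2} \qquad (3.4.54)$$

$$I_{\mu_y\rho_{yx}} = -\mu_x\frac{l'WA_{22}^{-1}l}{\sigma_y^2} \qquad (3.4.55)$$

$$I_{\mu_y\rho_{yy}} = -\mu_y\frac{l'WA_{11}^{-1}l}{\sigma_y^2} \qquad (3.4.56)$$

$$I_{\mu_x\theta} = -\mu_y\frac{l'A_{11}^{-1}l}{\sigma_x^2} \qquad (3.4.57)$$

$$I_{\mu_x\rho_{xx}} = -\mu_x\frac{l'WA_{22}^{-1}l}{\sigma_x^2} \qquad (3.4.58)$$

$$I_{\mu_x\rho_{xy}} = -\mu_y\frac{l'WA_{11}^{-1}l}{\sigma_x^2} \qquad (3.4.59)$$

$$I_{\beta\beta} = \frac{1}{\sigma_y^2}\operatorname{tr}\Delta_x \qquad (3.4.60)$$

$$I_{\beta\theta} = \operatorname{tr}\left(A_{22}^{-1}A_{11}^{-1}\right) \qquad (3.4.61)$$

$$I_{\beta\rho_{yx}} = -\frac{1}{\sigma_y^2}\operatorname{tr}\left(W\Delta_x\right) - \operatorname{tr}\left(A_{22}^{-1}WA_{11}^{-1}\right) \qquad (3.4.62)$$

$$I_{\beta\rho_{yy}} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(W\Delta_{xy}\right) \qquad (3.4.63)$$

$$I_{\beta\rho_{xy}} = \operatorname{tr}\left(A_{22}^{-1}WA_{11}^{-1}\right) \qquad (3.4.64)$$

$$I_{\theta\theta} = \frac{1}{\sigma_x^2}\operatorname{tr}\Delta_y \qquad (3.4.65)$$

$$I_{\theta\rho_{xx}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W\Delta_{xy}\right) \qquad (3.4.66)$$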

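$$I_{\theta\rho_{yx}} = \operatorname{tr}\left(A_{22}^{-1}WA_{11}^{-1}\right) \qquad (3.4.67)$$

$$I_{\theta\rho_{xy}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W\Delta_y\right) \qquad (3.4.68)$$

$$I_{\rho_{xx}\rho_{xx}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W'W\Delta_x\right) + \operatorname{tr}\left(A_{22}^{-1}WA_{22}^{-1}W\right) \qquad (3.4.69)$$

$$I_{\rho_{xx}\rho_{xy}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W'W\Delta_{xy}\right) \qquad (3.4.70)$$

$$I_{\rho_{xx}\sigma_x^2} = \frac{\operatorname{tr}\left(A_{22}'^{-1}W\right)}{\sigma_x^4} \qquad (3.4.71)$$

$$I_{\rho_{yx}\rho_{yx}} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(W'W\Delta_x\right) \qquad (3.4.72)$$

$$I_{\rho_{yx}\rho_{yy}} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(W'W\Delta_{xy}\right) \qquad (3.4.73)$$

$$I_{\rho_{yx}\rho_{xy}} = \operatorname{tr}\left(A_{22}^{-1}WA_{11}^{-1}W\right) \qquad (3.4.74)$$

$$I_{\rho_{yy}\rho_{yy}} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(W'W\Delta_{xy}\right) + \operatorname{tr}\left(A_{11}^{-1}WA_{11}^{-1}W\right) \qquad (3.4.75)$$

$$I_{\rho_{yy}\sigma_y^2} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(A_{11}^{-1}W\right) \qquad (3.4.76)$$

$$I_{\rho_{xy}\rho_{xy}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W'W\Delta_y\right) \qquad (3.4.77)$$

$$I_{\sigma_y^2\sigma_y^2} = \frac{N}{2\sigma_y^4} \qquad (3.4.78)$$

$$I_{\sigma_x^2\sigma_x^2} = \frac{N}{2\sigma_x^4} \qquad (3.4.79)$$

$$I_{\mu_y\mu_x} = I_{\mu_y\theta} = I_{\mu_y\rho_{xx}} = I_{\mu_y\rho_{xy}} = I_{\mu_y\sigma_y^2} = I_{\mu_y\sigma_x^2} = I_{\mu_x\beta} = 0 \qquad (3.4.80)$$

$$I_{\mu_x\rho_{yx}} = I_{\mu_x\rho_{yy}} = I_{\beta\rho_{xx}} = I_{\theta\rho_{yy}} = I_{\mu_x\sigma_y^2} = I_{\mu_x\sigma_x^2} = 0 \qquad (3.4.81)$$

$$I_{\rho_{yy}\rho_{xy}} = I_{\rho_{xx}\rho_{yx}} = I_{\rho_{xx}\rho_{yy}} = I_{\rho_{xx}\sigma_y^2} = I_{\rho_{yx}\sigma_y^2} = I_{\rho_{yx}\sigma_x^2} = 0 \qquad (3.4.82)$$

$$I_{\sigma_x^2\sigma_y^2} = I_{\theta\sigma_y^2} = I_{\theta\sigma_x^2} = I_{\beta\sigma_y^2} = I_{\beta\sigma_x^2} = I_{\rho_{xy}\sigma_y^2} = I_{\rho_{xy}\sigma_x^2} = I_{\rho_{yy}\sigma_x^2} = 0 \qquad (3.4.83)$$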

Then, the information matrix is:

$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix} \qquad (3.4.84)$$

Here the parameters are ordered as $(\beta, \rho_{yx}, \theta, \rho_{xy} \mid \mu_y, \rho_{yy}, \sigma_y^2, \mu_x, \rho_{xx}, \sigma_x^2)$, $I_{21} = I_{12}'$, and a dot denotes the entry determined by symmetry. The blocks are:

$$I_{11} = \begin{pmatrix} I_{\beta\beta} & I_{\beta\rho_{yx}} & I_{\beta\theta} & I_{\beta\rho_{xy}} \\ \cdot & I_{\rho_{yx}\rho_{yx}} & I_{\rho_{yx}\theta} & I_{\rho_{yx}\rho_{xy}} \\ \cdot & \cdot & I_{\theta\theta} & I_{\theta\rho_{xy}} \\ \cdot & \cdot & \cdot & I_{\rho_{xy}\rho_{xy}} \end{pmatrix}$$

$$I_{12} = \begin{pmatrix} I_{\beta\mu_y} & I_{\beta\rho_{yy}} & 0 & 0 & 0 & 0 \\ I_{\rho_{yx}\mu_y} & I_{\rho_{yx}\rho_{yy}} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{\theta\mu_x} & I_{\theta\rho_{xx}} & 0 \\ 0 & 0 & 0 & I_{\rho_{xy}\mu_x} & I_{\rho_{xy}\rho_{xx}} & 0 \end{pmatrix}$$

$$I_{22} = \begin{pmatrix} I_{\mu_y\mu_y} & I_{\mu_y\rho_{yy}} & 0 & 0 & 0 & 0 \\ \cdot & I_{\rho_{yy}\rho_{yy}} & I_{\rho_{yy}\sigma_y^2} & 0 & 0 & 0 \\ 0 & \cdot & I_{\sigma_y^2\sigma_y^2} & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{\mu_x\mu_x} & I_{\mu_x\rho_{xx}} & 0 \\ 0 & 0 & 0 & \cdot & I_{\rho_{xx}\rho_{xx}} & I_{\rho_{xx}\sigma_x^2} \\ 0 & 0 & 0 & 0 & \cdot & I_{\sigma_x^2\sigma_x^2} \end{pmatrix}$$

The block of $I^{-1}$ corresponding to the sub-matrix $I_{11}$ is:

$$I^{11} = \left[I_{11} - I_{12}I_{22}^{-1}I_{21}\right]^{-1} \qquad (3.4.85)$$

NOTE 4: No Causality Hypothesis from x to y

The null hypothesis that x does not cause y can be formulated as:

$$H_0: A_{12} = 0 \qquad H_1: A_{12} \neq 0 \qquad (3.4.86)$$

implying that:

$$E(y) = -\mu_y A_{11}^{-1}l = \delta_y \qquad (3.4.87)$$

$$E(x) = -\mu_x A_{22}^{-1}l - A_{22}^{-1}A_{21}\delta_y = \delta_x \qquad (3.4.88)$$

$$E(yy') = A_{11}^{-1}\left[\mu_y^2 ll' + \sigma_y^2 I\right]A_{11}'^{-1} = \Delta_y \qquad (3.4.89)$$

$$E(xx') = A_{22}^{-1}\left[\sigma_x^2 I + \mu_x^2 ll' + \mu_x\left(l\,(A_{21}\delta_y)' + A_{21}\delta_y l'\right) + A_{21}\delta_y\delta_y'A_{21}'\right]A_{22}'^{-1} = \Delta_x \qquad (3.4.90)$$

$$E(xy') = -A_{22}^{-1}\left[\mu_x l\delta_y' + A_{21}\Delta_y\right] = \Delta_{xy} \qquad (3.4.91)$$

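The elements of the information matrix, under the null hypothesis, are:

$$I_{\mu_y\mu_y} = \frac{N}{\sigma_y^2} \qquad (3.4.92)$$

$$I_{\mu_y\beta} = \frac{l'\delta_x}{\sigma_y^2} \qquad (3.4.93)$$

$$I_{\mu_y\rho_{yx}} = \frac{l'W\delta_x}{\sigma_y^2} \qquad (3.4.94)$$

$$I_{\mu_y\rho_{yy}} = -\frac{l'W\delta_y}{\sigma_y^2} \qquad (3.4.95)$$

$$I_{\mu_x\theta} = \frac{l'\delta_y}{\sigma_x^2} \qquad (3.4.96)$$

$$I_{\mu_x\rho_{xx}} = -\frac{l'W\delta_x}{\sigma_x^2} \qquad (3.4.97)$$

$$I_{\mu_x\rho_{xy}} = \frac{l'W\delta_y}{\sigma_x^2} \qquad (3.4.98)$$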

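$$I_{\beta\beta} = \frac{1}{\sigma_y^2}\operatorname{tr}\Delta_x + i_{\beta\beta}; \quad i_{\beta\beta} = \operatorname{tr}\left(A_{22}^{-1}A_{21}A_{11}^{-1}A_{22}^{-1}A_{21}A_{11}^{-1}\right) \qquad (3.4.99)$$

$$I_{\beta\theta} = i_{\beta\theta}; \quad i_{\beta\theta} = \operatorname{tr}\left(A_{22}^{-1}A_{11}^{-1}\right) \qquad (3.4.100)$$

$$I_{\beta\rho_{xx}} = i_{\beta\rho_{xx}}; \quad i_{\beta\rho_{xx}} = \operatorname{tr}\left(A_{22}^{-1}WA_{22}^{-1}A_{21}A_{11}^{-1}\right) \qquad (3.4.101)$$

$$I_{\beta\rho_{yx}} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(W\Delta_x\right) + i_{\beta\rho_{yx}}; \quad i_{\beta\rho_{yx}} = \operatorname{tr}\left(A_{22}^{-1}A_{11}^{-1}\right)\operatorname{tr}\left(A_{22}^{-1}A_{21}A_{11}^{-1}WA_{22}^{-1}A_{21}A_{11}^{-1}\right) \qquad (3.4.102)$$

$$I_{\beta\rho_{yy}} = i_{\beta\rho_{yy}}; \quad i_{\beta\rho_{yy}} = \operatorname{tr}\left(A_{22}^{-1}A_{21}A_{11}^{-1}WA_{11}^{-1}\right) \qquad (3.4.103)$$

$$I_{\beta\rho_{xy}} = i_{\beta\rho_{xy}}; \quad i_{\beta\rho_{xy}} = \operatorname{tr}\left(A_{22}^{-1}WA_{11}^{-1}\right) \qquad (3.4.104)$$

$$I_{\beta\sigma_y^2} = \frac{1}{\sigma_y^2}i_{\beta\sigma_y^2}; \quad i_{\beta\sigma_y^2} = \operatorname{tr}\left(A_{22}^{-1}A_{21}A_{11}^{-1}\right) \qquad (3.4.105)$$

$$I_{\theta\theta} = \frac{1}{\sigma_x^2}\operatorname{tr}\Delta_y \qquad (3.4.106)$$

$$I_{\theta\rho_{xx}} = -\frac{1}{\sigma_x^2}\operatorname{tr}\left(W\Delta_{xy}\right) \qquad (3.4.107)$$

$$I_{\theta\rho_{yx}} = i_{\theta\rho_{yx}}; \quad i_{\theta\rho_{yx}} = \operatorname{tr}\left(A_{22}^{-1}WA_{11}^{-1}\right) \qquad (3.4.108)$$

$$I_{\theta\rho_{xy}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W\Delta_y\right) \qquad (3.4.109)$$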

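$$I_{\rho_{xx}\rho_{xx}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W'W\Delta_x\right) + i_{\rho_{xx}\rho_{xx}}; \quad i_{\rho_{xx}\rho_{xx}} = \operatorname{tr}\left(A_{22}^{-1}WA_{22}^{-1}W\right) \qquad (3.4.110)$$

$$I_{\rho_{xx}\rho_{yx}} = i_{\rho_{xx}\rho_{yx}}; \quad i_{\rho_{xx}\rho_{yx}} = \operatorname{tr}\left(A_{22}^{-1}A_{21}A_{11}^{-1}WA_{22}^{-1}W\right) \qquad (3.4.111)$$

$$I_{\rho_{xx}\rho_{xy}} = -\frac{1}{\sigma_x^2}\operatorname{tr}\left(W'W\Delta_{xy}\right) \qquad (3.4.112)$$

$$I_{\rho_{xx}\sigma_x^2} = \frac{1}{\sigma_x^4}i_{\rho_{xx}\sigma_x^2}; \quad i_{\rho_{xx}\sigma_x^2} = \operatorname{tr}\left(A_{22}'^{-1}W\right) \qquad (3.4.113)$$

$$I_{\rho_{yx}\rho_{yx}} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(W'W\Delta_x\right) + i_{\rho_{yx}\rho_{yx}}; \quad i_{\rho_{yx}\rho_{yx}} = \operatorname{tr}\left(A_{22}^{-1}A_{21}A_{11}^{-1}WA_{22}^{-1}A_{21}A_{11}^{-1}W\right) \qquad (3.4.114)$$

$$I_{\rho_{yx}\rho_{yy}} = -\frac{1}{\sigma_y^2}\operatorname{tr}\left(W'W\Delta_{xy}\right) + i_{\rho_{yx}\rho_{yy}}; \quad i_{\rho_{yx}\rho_{yy}} = \operatorname{tr}\left(A_{22}^{-1}A_{21}A_{11}^{-1}WA_{11}^{-1}W\right) \qquad (3.4.115)$$

$$I_{\rho_{yx}\rho_{xy}} = i_{\rho_{yx}\rho_{xy}}; \quad i_{\rho_{yx}\rho_{xy}} = \operatorname{tr}\left(A_{22}^{-1}WA_{11}^{-1}W\right) \qquad (3.4.116)$$

$$I_{\rho_{yx}\sigma_y^2} = \frac{1}{\sigma_y^2}i_{\rho_{yx}\sigma_y^2}; \quad i_{\rho_{yx}\sigma_y^2} = \operatorname{tr}\left(W'A_{22}^{-1}A_{21}A_{11}^{-1}\right) \qquad (3.4.117)$$

$$I_{\rho_{yy}\rho_{yy}} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(W'W\Delta_{xy}\right) + i_{\rho_{yy}\rho_{yy}}; \quad i_{\rho_{yy}\rho_{yy}} = \operatorname{tr}\left(WA_{11}^{-1}W\right)\operatorname{tr}\left(A_{11}^{-1}\right) \qquad (3.4.118)$$

$$I_{\rho_{yy}\sigma_y^2} = \frac{1}{\sigma_y^2}i_{\rho_{yy}\sigma_y^2}; \quad i_{\rho_{yy}\sigma_y^2} = \frac{1}{\sigma_y^2}\operatorname{tr}\left(A_{11}^{-1}W\right) \qquad (3.4.119)$$

$$I_{\rho_{xy}\rho_{xy}} = \frac{1}{\sigma_x^2}\operatorname{tr}\left(W'W\Delta_y\right) \qquad (3.4.120)$$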

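$$I_{\sigma_y^2\sigma_y^2} = \frac{N}{2\sigma_y^4} \qquad (3.4.121)$$

$$I_{\sigma_x^2\sigma_x^2} = \frac{N}{2\sigma_x^4} \qquad (3.4.122)$$

$$I_{\mu_x\rho_{yy}} = I_{\mu_x\rho_{yx}} = I_{\mu_x\sigma_y^2} = I_{\mu_x\sigma_x^2} = I_{\rho_{xx}\sigma_y^2} = I_{\beta\sigma_x^2} = I_{\theta\sigma_x^2} = I_{\theta\sigma_y^2} = 0 \qquad (3.4.123)$$

$$I_{\rho_{yy}\rho_{xy}} = I_{\rho_{xy}\sigma_x^2} = I_{\rho_{xy}\sigma_y^2} = I_{\sigma_x^2\sigma_y^2} = I_{\rho_{yy}\sigma_x^2} = I_{\rho_{yx}\sigma_x^2} = 0 \qquad (3.4.124)$$

$$I_{\rho_{xx}\rho_{yy}} = I_{\theta\rho_{yy}} = I_{\mu_y\theta} = I_{\mu_y\rho_{xx}} = I_{\mu_y\mu_x} = I_{\mu_y\rho_{xy}} = I_{\mu_y\sigma_y^2} = I_{\mu_y\sigma_x^2} = I_{\mu_x\beta} = 0 \qquad (3.4.125)$$

Then, the information matrix is: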



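$$I = \begin{pmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{pmatrix} \qquad (3.4.126)$$

Here the parameters are ordered as $(\beta, \rho_{yx} \mid \mu_y, \rho_{yy}, \sigma_y^2, \mu_x, \theta, \rho_{xx}, \rho_{xy}, \sigma_x^2)$, $I_{21} = I_{12}'$, and a dot denotes the entry determined by symmetry. The blocks are:

$$I_{11} = \begin{pmatrix} I_{\beta\beta} & I_{\beta\rho_{yx}} \\ \cdot & I_{\rho_{yx}\rho_{yx}} \end{pmatrix}$$

$$I_{12} = \begin{pmatrix} I_{\beta\mu_y} & I_{\beta\rho_{yy}} & I_{\beta\sigma_y^2} & 0 & I_{\beta\theta} & I_{\beta\rho_{xx}} & I_{\beta\rho_{xy}} & 0 \\ I_{\rho_{yx}\mu_y} & I_{\rho_{yx}\rho_{yy}} & I_{\rho_{yx}\sigma_y^2} & 0 & I_{\rho_{yx}\theta} & I_{\rho_{yx}\rho_{xx}} & I_{\rho_{yx}\rho_{xy}} & 0 \end{pmatrix}$$

$$I_{22} = \begin{pmatrix} I_{\mu_y\mu_y} & I_{\mu_y\rho_{yy}} & 0 & 0 & 0 & 0 & 0 & 0 \\ \cdot & I_{\rho_{yy}\rho_{yy}} & I_{\rho_{yy}\sigma_y^2} & 0 & 0 & 0 & 0 & 0 \\ 0 & \cdot & I_{\sigma_y^2\sigma_y^2} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & I_{\mu_x\mu_x} & I_{\theta\mu_x} & I_{\mu_x\rho_{xx}} & I_{\mu_x\rho_{xy}} & 0 \\ 0 & 0 & 0 & \cdot & I_{\theta\theta} & I_{\theta\rho_{xx}} & I_{\theta\rho_{xy}} & 0 \\ 0 & 0 & 0 & \cdot & \cdot & I_{\rho_{xx}\rho_{xx}} & I_{\rho_{xx}\rho_{xy}} & I_{\rho_{xx}\sigma_x^2} \\ 0 & 0 & 0 & \cdot & \cdot & \cdot & I_{\rho_{xy}\rho_{xy}} & 0 \\ 0 & 0 & 0 & 0 & 0 & \cdot & 0 & I_{\sigma_x^2\sigma_x^2} \end{pmatrix}$$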

The block of $I^{-1}$ corresponding to the sub-matrix $I_{11}$ is:

$$I^{11} = \left[I_{11} - I_{12}I_{22}^{-1}I_{21}\right]^{-1} \qquad (3.4.127)$$

4 Causality in Space. A Non-Parametric Approach

As established in the previous chapter, our proposed definition of causality emphasises incremental information content. In this chapter, information refers to entropy, a numerical quantity that captures the uncertainty of a random variable. The Fisher (1921) proposal for the parametric case (the sample variance) is a special case of this broader definition. As in the previous chapter, we first focus on detecting dependence between the series, using a technique known as symbolic dynamics. This technique reduces the observed series to a set of symbols that capture the relevant information, which facilitates the estimation of an information measure known as symbolic entropy. Detecting spatial dependence between series brings us closer to causality, although it does not establish a direction between the variables involved. Moreover, space itself may play a more important role than expected. This leads to the next section, which presents a procedure for choosing an appropriate weighting matrix under dependence. We conclude with the spatial causality test proper. This test is the final step of a non-trivial process designed so that causality is established unambiguously. Its ability to determine the direction between series is the fundamental added value of this information-based approach to spatial causality.


As in Chapter 3, each section includes the results of the corresponding Monte Carlo simulations.

4.1 Symbolic Dynamics and Entropy

Symbolic dynamics is based on transforming a series into a sequence of symbols that capture statistically useful information that cannot be observed directly. The idea is to consider a space in which all the possible states of a system are represented. This space can be partitioned into a finite number of regions, each represented by a symbol. In other words, symbolic dynamics is a coarse-grained description of a system's dynamics. For further details, see Hao and Zheng (1998). Moreover, the use of symbols makes it simple to calculate the entropy of the series, a measure of the utmost importance for establishing causality in space. The definitions and concepts that follow reflect the ideas of Matilla and Ruiz (2008) and López et al. (2010), who symbolize univariate series in time and space, respectively, by introducing an order pattern in the series.

4.1.1 Symbolization Process

This section explains the symbolization of spatial processes. It proposes a symbolization that, in general terms, performs well for detecting dependent processes and causal relationships between spatial processes. When more information is available about the processes under study, the researcher can improve on the proposed symbolization. Let $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ be two spatial processes of real data, where $S$ is a set of points or locations in space. To symbolize the series, we have to define a nonempty finite set of symbols. This set must be capable of representing the information about the spatial process needed to test the established null hypothesis. It is denoted by $\Gamma_n = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$, and each of its elements $\sigma_i$, for $i = 1, 2, \ldots, n$, is a symbol.


Symbolizing a process therefore means defining a map

$$f: \{x_s\}_{s\in S} \to \Gamma_n \qquad (4.1.1)$$

such that each element $x_s$ is associated with a single symbol $f(x_s) = \sigma_{i_s}$, with $i_s \in \{1, 2, \ldots, n\}$. We then say that location $s \in S$ is $\sigma_i$-type, relative to the series $\{x_s\}_{s\in S}$, if and only if $f(x_s) = \sigma_i$. We call $f$ the symbolization map. The same process can be followed for the series $y_s$. We then introduce the bivariate process $\{Z_s\}_{s\in S}$ as:

$$Z_s = \{x_s, y_s\} \qquad (4.1.2)$$

where $x_s$ and $y_s$ are the previously defined univariate spatial processes. For this bivariate process, we define the set of symbols $\Omega_n^2$ as the direct product of the two sets $\Gamma_n$, that is, $\Omega_n^2 = \Gamma_n \times \Gamma_n$. Its elements are of the form $\eta_{ij} = (\sigma_i^x, \sigma_j^y)$. The symbolization map of the bivariate process is

$$g: \{Z_s\}_{s\in S} \to \Omega_n^2 = \Gamma_n \times \Gamma_n \qquad (4.1.3)$$

defined by

$$g\left(Z_s = (x_s, y_s)\right) = \left(f(x_s), f(y_s)\right) = \eta_{ij} = \left(\sigma_i^x, \sigma_j^y\right) \qquad (4.1.4)$$

We say that $s$ is $\eta_{ij}$-type for $Z = (x, y)$, or simply that $s$ is $\eta_{ij}$-type, if and only if $s$ is $\sigma_i^x$-type for $x$ and $\sigma_j^y$-type for $y$. Our interest is limited to bivariate processes, although the approach can easily be generalized to multivariate processes as follows. Consider a $k$-dimensional spatial process $\{Z_s\}_{s\in S} = \{x_{1s}, x_{2s}, \ldots, x_{ks}\}$. Let $\Omega_n^k = \Gamma_n \times \Gamma_n \times \cdots \times \Gamma_n$ be the direct product of $k$ copies of $\Gamma_n$, and let $\eta_{i_1, i_2, \ldots, i_k} = (\sigma_{i_1}, \sigma_{i_2}, \ldots, \sigma_{i_k}) \in \Omega_n^k$. We then say that $s$ is $\eta_{i_1, i_2, \ldots, i_k}$-type if and only if $s$ is $\sigma_{i_j}$-type for $x_{js}$ for all $j = 1, 2, \ldots, k$.
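The direct product $\Omega_n^k$ and the $\eta$-type of a location can be enumerated mechanically. The sketch below assumes, purely for illustration, the four-symbol set $\{0, 1, 2, 3\}$, which is the set $\Gamma_m$ introduced below with $m = 4$. It lists the joint symbols for $k = 2$ and $k = 3$. The function names `joint_symbol_set` and `joint_type`, and the symbol data, are hypothetical.

```python
from itertools import product


def joint_symbol_set(gamma, k):
    """Omega_n^k: the direct product of k copies of the symbol set gamma."""
    return list(product(gamma, repeat=k))


def joint_type(symbols_by_series, s):
    """eta-type of location s: the tuple of its symbols in each of the k series."""
    return tuple(series[s] for series in symbols_by_series)


gamma = [0, 1, 2, 3]  # illustrative symbol set (Gamma_m with m = 4)
omega2 = joint_symbol_set(gamma, 2)
omega3 = joint_symbol_set(gamma, 3)
print(len(omega2), len(omega3))  # 16 64
print(omega2[:3])                # [(0, 0), (0, 1), (0, 2)]

# Hypothetical symbols of k = 3 series at four locations.
symbols = [[0, 2, 1, 3], [1, 1, 0, 2], [3, 0, 0, 1]]
print(joint_type(symbols, 1))    # (2, 1, 0)
```

The number of joint symbols grows as $n^k$. This is why the number of series and the embedding dimension both bear on the sample size needed later in the chapter.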


We can define different symbolization maps depending on the problem. In our case, we define the symbolization function $f$ as follows. Let $Me_x$ be the median of the univariate spatial process $\{x_s\}_{s\in S}$, and define the indicator function

$$\tau_s = \begin{cases} 1 & \text{if } x_s \geq Me_x \\ 0 & \text{otherwise} \end{cases} \qquad (4.1.5)$$

Let $m \geq 2$ be the embedding dimension. For each $s \in S$, let $N_s$ be the set formed by the $(m-1)$ nearest neighbours of $s$. We use the term $m$-surrounding to denote the set formed by $s$ and $N_s$, such that the $m$-surrounding is $x_m(s) = (x_s, x_{s_1}, \ldots, x_{s_{m-1}})$. We define the indicator function for each $s_i$, with $i = 1, 2, \ldots, m-1$:

$$\iota_{ss_i} = \begin{cases} 0 & \text{if } \tau_s \neq \tau_{s_i} \\ 1 & \text{otherwise} \end{cases} \qquad (4.1.6)$$

We can then establish a symbolization map for the spatial process $\{x_s\}_{s\in S}$ as $f: \{x_s\}_{s\in S} \to \Gamma_m$, defined as:

$$f(x_s) = \sum_{i=1}^{m-1}\iota_{ss_i} \qquad (4.1.7)$$

where $\Gamma_m = \{0, 1, \ldots, m-1\}$, so that the cardinality of the set is equal to $m$. The symbolization process consists of comparing, for each location $s$, the value $\tau_s$ with $\tau_{s_i}$ for each $s_i$ in the set of the $m-1$ nearest neighbours of $s$. This symbolization captures the relevant information about the neighbourhood of $s$.

We propose the following example to facilitate the interpretation of the symbolization process. Assume that there are two spatial processes distributed on a $3 \times 3$ regular map, as shown in Figure 4.1. Considering $m = 4$, we can represent the series $x_s$ such that $x_4(s_1) = (x_{s_1} = 4, x_{s_2} = 1, x_{s_4} = 6, x_{s_5} = 2)$ is the 4-surrounding of $s_1$, formed by $s_1$ and its 3 nearest neighbours. For each remaining location of the series $x_s$, we form the 4-surroundings $x_4(s_2), x_4(s_3), \ldots, x_4(s_9)$.


Figure 4.1: Example of a 3 × 3 regular lattice for $x_s$ and $y_s$.

Let $N_{s_i}$ denote the set formed by the 3 nearest neighbours of $s_i$. Then $N_{s_2} = \{s_3, s_1, s_5\}$, $N_{s_3} = \{s_2, s_6, s_5\}$, $N_{s_4} = \{s_5, s_1, s_7\}$, $N_{s_5} = \{s_6, s_2, s_4\}$, $N_{s_6} = \{s_3, s_5, s_9\}$, $N_{s_7} = \{s_8, s_4, s_5\}$, $N_{s_8} = \{s_9, s_5, s_7\}$ and $N_{s_9} = \{s_6, s_8, s_5\}$. The same process can be applied to $y_s$. In this example, the symbol associated with $s_1$ for the series $x_s$ is $f(x_{s_1}) = (\iota_{s_1s_2} = 0) + (\iota_{s_1s_4} = 1) + (\iota_{s_1s_5} = 0) = 1$. Likewise, we obtain the symbols associated with the remaining locations: $f(x_{s_2}) = 1$; $f(x_{s_3}) = 1$; $f(x_{s_4}) = 1$; $f(x_{s_5}) = 1$; $f(x_{s_6}) = 2$; $f(x_{s_7}) = 2$; $f(x_{s_8}) = 2$; $f(x_{s_9}) = 1$. Similarly, the symbols associated with the series $y_s$ are: $f(y_{s_1}) = 0$; $f(y_{s_2}) = 1$; $f(y_{s_3}) = 1$; $f(y_{s_4}) = 1$; $f(y_{s_5}) = 2$; $f(y_{s_6}) = 2$; $f(y_{s_7}) = 1$; $f(y_{s_8}) = 2$; $f(y_{s_9}) = 2$.

Note that, when the spatial processes are independent and the proposed symbolization map is used for each process, the probability of occurrence of each symbol is $p(\sigma) = C_\sigma^{m-1}/2^{(m-1)}$. Here $C_\sigma^{m-1} = (m-1)!/[(m-1-\sigma)!\,\sigma!]$ denotes the number of combinations of $m-1$ elements taken $\sigma$ at a time, for all symbols $\sigma \in \{0, \ldots, m-1\}$. For example, for $m = 4$ and assuming independence of the spatial process, the expected relative frequencies of the symbols are $p(\sigma = 0) = 1/8$, $p(\sigma = 1) = 3/8$, $p(\sigma = 2) = 3/8$ and $p(\sigma = 3) = 1/8$.
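The probabilities $p(\sigma) = C_\sigma^{m-1}/2^{(m-1)}$ have a simple reading. The symbol counts how many of the $m-1$ neighbours lie on the same side of the median as $s$. If each of these comparisons is treated as an independent fair coin (a simplifying assumption made here only for interpretation), the symbol follows a Binomial$(m-1, 1/2)$ distribution. The minimal sketch below computes these probabilities with exact fractions from the Python standard library. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def null_symbol_probabilities(m):
    """p(sigma) = C(m-1, sigma) / 2^(m-1) for sigma = 0, ..., m-1."""
    return [Fraction(comb(m - 1, sigma), 2 ** (m - 1)) for sigma in range(m)]


probs = null_symbol_probabilities(4)
print(probs)       # [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
print(sum(probs))  # 1
```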


Having symbolized the univariate processes, we now pair, for each location, the symbols obtained for the respective series. This gives the symbolization of the bivariate process. For example, location $s_2$ is $(1, 1)$-type. Examples of other symbolization functions can be found in Matilla and Ruiz (2008, 2009), López et al. (2010) and Ruiz, López and Páez (2009). In the latter, the proposal is applied to discrete data, while the first two are limited to continuous variables. It is also important to note that the symbolization procedure is applicable to both regular and irregular spatial structures, and to both points and areas.
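The symbolization map (4.1.5)-(4.1.7) and the pairing into bivariate symbols can be sketched in a few lines of Python with NumPy. The neighbour sets are those of the $3 \times 3$ lattice example, with $s_1, \ldots, s_9$ indexed 0 to 8. The data values are hypothetical. Only $x_{s_1} = 4$, $x_{s_2} = 1$, $x_{s_4} = 6$ and $x_{s_5} = 2$ come from the text, and the remaining values of Figure 4.1 are invented, so the resulting symbols need not match those reported for the figure.

```python
import numpy as np

# Neighbour sets N_s of the 3 x 3 lattice example (locations s1..s9 indexed 0..8).
NEIGHBOURS = [[1, 3, 4], [2, 0, 4], [1, 5, 4], [4, 0, 6], [5, 1, 3],
              [2, 4, 8], [7, 3, 4], [8, 4, 6], [5, 7, 4]]


def symbolize(values, neighbours):
    """Symbol f(x_s) of (4.1.7) for every location s."""
    values = np.asarray(values, dtype=float)
    tau = values >= np.median(values)  # indicator tau_s of (4.1.5)
    # iota_{s s_i} = 1 when s and its neighbour s_i fall on the same side of the median (4.1.6)
    return [int(sum(tau[s] == tau[j] for j in nbrs)) for s, nbrs in enumerate(neighbours)]


# Hypothetical data: only x at s1, s2, s4, s5 (4, 1, 6, 2) is taken from the text.
x = [4, 1, 7, 6, 2, 9, 3, 8, 5]
y = [3, 8, 2, 5, 9, 1, 7, 4, 6]

fx = symbolize(x, NEIGHBOURS)
fy = symbolize(y, NEIGHBOURS)
pairs = list(zip(fx, fy))  # bivariate symbol eta_ij = (f(x_s), f(y_s)) of each location

print(fx)        # [2, 2, 1, 0, 1, 2, 1, 1, 2]
print(fy)        # [0, 1, 1, 2, 2, 1, 2, 0, 1]
print(pairs[1])  # (2, 1): with these data, s2 is (2, 1)-type
```

Passing the neighbour sets explicitly keeps the sketch valid for irregular spatial structures as well. Only the list of $m-1$ nearest neighbours of each location changes.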

4.1.2 Entropy: Definitions and Concepts

This section explains some basic concepts of Information Theory. A thorough account can be found in Cover and Thomas (1991). The entropy concept is at the core of Information Theory: it provides a measure of the uncertainty of a stochastic process. Let $x$ be a discrete random variable that takes on values $\{x_1, x_2, \ldots, x_n\}$ with probabilities $p(x_i)$ for each $i = 1, 2, \ldots, n$, respectively.

Definition 1: The Shannon entropy, $h(x)$, of a discrete random variable $x$ is defined as $h(x) = -\sum_{i=1}^{n} p(x_i)\ln(p(x_i))$.

When the base of the logarithm is 2, the units of measure are bits. We use the natural (Napierian) logarithm, so the units are nats. By convention, $0 \ln 0 = 0$; in other words, adding zero-probability terms does not alter the entropy. Based on the definition of individual entropy, we can consider the joint entropy of a pair of random variables.

Definition 2: The entropy $h(x, y)$ of a pair of discrete random variables $(x, y)$ with joint distribution $p(x, y)$ is $h(x, y) = -\sum_x\sum_y p(x, y)\ln(p(x, y))$.

We can in turn define conditional entropy.

Definition 3: The conditional entropy $h(x|y)$, for the joint distribution $p(x, y)$, is defined as $h(x|y) = -\sum_x\sum_y p(x, y)\ln(p(x|y))$.
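Definitions 1 to 3 translate directly into code. The sketch below uses NumPy and a hypothetical $2 \times 2$ joint distribution. It computes the entropies in nats with the convention $0 \ln 0 = 0$, and checks numerically two of the relationships listed below: the chain rule $h(x, y) = h(x) + h(y|x)$ and the asymmetry of conditional entropy. The function names are illustrative, not taken from the source.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats (Definitions 1 and 2); zero probabilities are skipped, i.e. 0 ln 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def conditional_entropy(joint):
    """h(row | column) = -sum p(r, c) ln p(r | c) (Definition 3); rows index the first variable."""
    joint = np.asarray(joint, dtype=float)
    p_col = np.broadcast_to(joint.sum(axis=0), joint.shape)  # p(c) repeated down each column
    mask = joint > 0
    return float(-np.sum(joint[mask] * np.log(joint[mask] / p_col[mask])))


# Hypothetical joint distribution: rows are the values of x, columns the values of y.
joint = np.array([[0.30, 0.10],
                  [0.05, 0.55]])
h_xy = entropy(joint)
h_x = entropy(joint.sum(axis=1))
h_y_given_x = conditional_entropy(joint.T)  # transpose so that y indexes the rows
h_x_given_y = conditional_entropy(joint)

print(np.isclose(h_xy, h_x + h_y_given_x))   # True: h(x, y) = h(x) + h(y|x)
print(np.isclose(h_x_given_y, h_y_given_x))  # False: h(x|y) and h(y|x) differ here
```

Because only probabilities enter these functions, rescaling the underlying variables leaves every entropy unchanged. This is the contrast with the variance drawn in the text.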

In other words, the conditional entropy $h(x|y)$ is the entropy of $x$ that remains once $y$ has been observed. With these definitions, we can show some interesting relationships:

1. $h(x, y) = h(x) + h(y|x)$.
2. $h(y|x) \neq h(x|y)$ in general.
3. $h(x, y) = h(y, x)$.

Note that measures of entropy are functions of the probability distribution of the random variables. In other words, they do not depend on the values of the variables, only on their probabilities. In contrast, the variance depends on the values of the variables and is sensitive to changes in the unit of measurement.

These entropy concepts can be adapted to the probability distribution of the symbols calculated in the previous section, as follows. Having symbolized the series as shown there, for an embedding dimension $m \geq 2$, it is easy to calculate the absolute and relative frequencies of the different collections of symbols $\sigma_i^{x_s} \in \Gamma_n$ and $\sigma_j^{y_s} \in \Gamma_n$. We define the absolute frequency of symbol $\sigma_i^x$ as:

$$n_{\sigma_i^x} = \#\left\{s \in S \mid s \text{ is } \sigma_i^x\text{-type for } x\right\} \qquad (4.1.8)$$

Similarly, for the series $\{y_s\}_{s\in S}$, the absolute frequency of symbol $\sigma_j^y$ is defined as

$$n_{\sigma_j^y} = \#\left\{s \in S \mid s \text{ is } \sigma_j^y\text{-type for } y\right\} \qquad (4.1.9)$$

Once the absolute frequencies have been calculated, the relative frequencies can also be estimated:

$$p(\sigma_i^x) \equiv p_{\sigma_i^x} = \frac{\#\left\{s \in S \mid s \text{ is } \sigma_i^x\text{-type for } x\right\}}{|S|} = \frac{n_{\sigma_i^x}}{|S|} \qquad (4.1.10)$$

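$$p(\sigma_j^y) \equiv p_{\sigma_j^y} = \frac{\#\left\{s \in S \mid s \text{ is } \sigma_j^y\text{-type for } y\right\}}{|S|} = \frac{n_{\sigma_j^y}}{|S|} \qquad (4.1.11)$$

where $|S|$ denotes the cardinality of the set $S$; in general, $|S| = N$. Similarly, we calculate the relative frequency for $\eta_{ij} \in \Omega_n^2$:

$$p(\eta_{ij}) \equiv p_{\eta_{ij}} = \frac{\#\left\{s \in S \mid s \text{ is } \eta_{ij}\text{-type}\right\}}{|S|} = \frac{n_{\eta_{ij}}}{|S|} \qquad (4.1.12)$$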

Once the necessary symbols and concepts have been introduced, we are in a position to develop the symbolic entropy concept for a two-dimensional spatial series $\{Z_s\}_{s\in S}$. This is the Shannon entropy of the $m^2$ different symbols:

$$h_Z(m) = -\sum_{\eta \in \Omega_m^2} p(\eta)\ln(p(\eta)) \qquad (4.1.13)$$

Symbolic entropy is an indicator of the information contained in the $m^2$ symbols used in the symbolization. Similarly, we can define the marginal symbolic entropies as

$$h_x(m) = -\sum_{\sigma^x \in \Gamma_m} p(\sigma^x)\ln(p(\sigma^x)) \qquad (4.1.14)$$

$$h_y(m) = -\sum_{\sigma^y \in \Gamma_m} p(\sigma^y)\ln(p(\sigma^y)) \qquad (4.1.15)$$

Note that the marginal entropies satisfy $0 \leq h(m) \leq \ln(n)$. The lower limit is reached when a single symbol appears, and the upper limit is reached when all the symbols have the same probability of occurrence. In turn, we can obtain the symbolic entropy of $y$ conditioned on the occurrence of symbol $\sigma^x$ in $x$ as:

$$h_{y|\sigma^x}(m) = -\sum_{\sigma^y \in \Gamma_m} p(\sigma^y|\sigma^x)\ln(p(\sigma^y|\sigma^x)) \qquad (4.1.16)$$

We can also estimate the conditional symbolic entropy of $y_s$ given $x_s$:

$$h_{y|x}(m) = -\sum_{\sigma^x \in \Gamma_m}\sum_{\sigma^y \in \Gamma_m} p(\sigma^x, \sigma^y)\ln(p(\sigma^y|\sigma^x)) \qquad (4.1.17)$$

Since $p(\sigma^x, \sigma^y) = p(\sigma^x)p(\sigma^y|\sigma^x)$, we can manipulate the expression as follows:

$$h_{y|x}(m) = -\sum_{\sigma^x \in \Gamma_m}\sum_{\sigma^y \in \Gamma_m} p(\sigma^x)p(\sigma^y|\sigma^x)\ln(p(\sigma^y|\sigma^x)) = -\sum_{\sigma^x \in \Gamma_m} p(\sigma^x)\sum_{\sigma^y \in \Gamma_m} p(\sigma^y|\sigma^x)\ln(p(\sigma^y|\sigma^x)) \qquad (4.1.18)$$

so that the conditional symbolic entropy of $y_s$ given $x_s$,

$$h_{y|x}(m) = \sum_{\sigma^x \in \Gamma_m} p(\sigma^x)\,h_{y|\sigma^x}(m) \qquad (4.1.19)$$

is understood as the average symbolic entropy of $y$ conditioned on the symbolic occurrence of $x$.

Note that entropy, interpreted as a measure of information, is not new in econometrics and spatial statistics. Cressie (1993), a reference work on spatial statistics, opens its first chapter by considering entropy as a measure of a system's disorder to motivate the statistical discussion of spatial data. It also highlights the relationship between entropy and the variance, a measure more familiar to statisticians. Possibly because it is simple to estimate, the variance has played a predominant role as a measure of information, although interest in the entropy concept and its potential applications has grown in recent years.
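Equations (4.1.10)-(4.1.19) amount to counting symbol frequencies and plugging them into Shannon's formula. The sketch below computes $h_Z(m)$, $h_x(m)$, $h_y(m)$ and $h_{y|x}(m)$ from two symbol sequences. It obtains $h_{y|x}(m)$ as the weighted average (4.1.19) and checks that the result equals $h_Z(m) - h_x(m)$. The symbol sequences are hypothetical values for nine locations with $m = 4$. The function names are illustrative.

```python
from collections import Counter

import numpy as np


def relative_frequencies(items):
    """Relative frequencies n_sigma / |S| of the symbols in `items`, as in (4.1.10)-(4.1.12)."""
    counts = Counter(items)
    total = len(items)
    return {symbol: count / total for symbol, count in counts.items()}


def shannon(freqs):
    """Shannon entropy in nats of a dict of relative frequencies (absent symbols contribute 0)."""
    return float(-sum(p * np.log(p) for p in freqs.values()))


def symbolic_entropies(sx, sy):
    pairs = list(zip(sx, sy))                   # eta_ij-type of each location
    p_x = relative_frequencies(sx)
    h_z = shannon(relative_frequencies(pairs))  # joint symbolic entropy (4.1.13)
    h_x = shannon(p_x)                          # marginal (4.1.14)
    h_y = shannon(relative_frequencies(sy))     # marginal (4.1.15)
    # (4.1.19): weighted average over sigma^x of h_{y|sigma^x}(m) from (4.1.16)
    h_y_given_x = 0.0
    for a, p_a in p_x.items():
        ys_given_a = [b for (xa, b) in pairs if xa == a]
        h_y_given_x += p_a * shannon(relative_frequencies(ys_given_a))
    return h_z, h_x, h_y, h_y_given_x


# Hypothetical symbols for nine locations with m = 4 (symbols in {0, 1, 2, 3}).
sx = [2, 2, 1, 0, 1, 2, 1, 1, 2]
sy = [0, 1, 1, 2, 2, 1, 2, 0, 1]
h_z, h_x, h_y, h_y_given_x = symbolic_entropies(sx, sy)
print(np.isclose(h_y_given_x, h_z - h_x))  # True: h_Z = h_x + h_{y|x}
print(0 <= h_x <= np.log(4))               # True: bound 0 <= h(m) <= ln(n) with n = 4
```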

4.2 Independence in Spatial Processes

The tools developed so far enable us to approach the issue of dependence among spatial processes. Building on López et al. (2010), who work with univariate processes, we develop an approach adapted to relationships among spatial processes. Specifically, López et al. (2010) derive a statistic for an independent spatial process $\{x_s\}_{s\in S}$, denoted by $SG$, based on symbolic dynamics and entropy. Adapted to the symbolization proposed in the previous section, for a fixed embedding dimension $m \geq 2$, this statistic would be:

$$SG(m) = 2N\left[2(m-1)\ln(2) - \sum_{i=1}^{m}\ln\left(C_{\sigma_i}^{m-1}\right) - h_x(m)\right] \qquad (4.2.1)$$

Under the null hypothesis $H_0$ of independence, the statistic $SG$ is distributed as a Chi-square with $m$ degrees of freedom.

4.2.1 Testing Independence between Spatial Processes

To test independence among series, consider a two-dimensional spatial series $\{Z_s = \{x_s, y_s\}\}_{s\in S}$ with a fixed embedding dimension $m \geq 2$. To test the independence of the series $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$, we propose the following null and alternative hypotheses:

$H_0$: $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are i.i.d. and independent of each other
$H_1$: the negation of $H_0$

Now, for a symbol $\eta \in \Omega_n^2$ of the bivariate process $\{Z_s\}_{s\in S}$, we define the random variable $\tau_{\eta s}$ as follows:

$$\tau_{\eta s} = \begin{cases} 1 & \text{if } s \text{ is } \eta\text{-type} \\ 0 & \text{otherwise} \end{cases} \qquad (4.2.2)$$

So $\tau_{\eta s}$ is a Bernoulli variable with probability of "success" $p_\eta$, where "success" means that $s$ is $\eta$-type. It is easy to see that

$$\sum_{\eta \in \Omega_n^2} p_\eta = 1 \qquad (4.2.3)$$

We now assume that the set of locations $S$ is finite, and $N$ denotes the number of elements of $S$. We are interested in knowing how many elements of $S$ are $\eta$-type, for all symbols $\eta \in \Omega_n^2$. We therefore construct the following random variable

$$Q_\eta = \sum_{s \in S}\tau_{\eta s} \qquad (4.2.4)$$

Note that the variables $\tau_{\eta s}$ are not all independent, because the $m$-surroundings overlap, so $Q_\eta$ is not exactly a Binomial random variable. However, a sum of dependent Bernoulli variables can be approximated by a Binomial random variable if the following conditions are met (Soon, 1996):

1. The dependence between the indicators is weak, and
2. The probability of occurrence of the indicators is small.

The second condition is met by the way in which the symbols were constructed: under the null hypothesis, the probability of success of an indicator $\tau_{\eta s}$ is small ($p(\sigma_i) = C_{\sigma_i}^{m-1}/2^{(m-1)}$) for most of the symbols. The first condition can only be met if the distribution of the locations on the map is regular and the embedding dimension is relatively small. If the embedding dimension is large, or the spatial system is irregular, this condition becomes more difficult to sustain. To ensure that the dependence among the $\tau_{\eta s}$ indicators is weak enough, it is possible to control the degree of overlapping of the $m$-surroundings. Overlapping occurs when the $m$-surroundings of different locations share common neighbours. To obtain a good Binomial approximation, we can consider a subset of locations $\tilde{S} \subseteq S$ with controlled overlapping, so that the dependence among the $\tau_{\eta s}$ indicators is weak for $s \in \tilde{S}$. Obviously, a good Binomial approximation involves a loss of sample information, and this trade-off must be taken into account. This strategy yields an overlapping index that can be set when the test is implemented. A method for constructing the set of locations $\tilde{S}$ with controlled overlapping can be found in Ruiz, López and Páez (2009). Therefore, under the conditions stated, the variable $Q_\eta$ can be approximated as a Binomial random variable:

$$Q_\eta \approx B(N, p_\eta) \qquad (4.2.5)$$

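Under the null hypothesis $H_0$, the joint probability function of the $n^2$ variables $(Q_{\eta_{11}}, Q_{\eta_{12}}, \ldots, Q_{\eta_{nn}})$ is multinomial:

$$P\left(Q_{\eta_{11}} = a_{11}, \ldots, Q_{\eta_{nn}} = a_{nn}\right) = \frac{(a_{11} + a_{12} + \cdots + a_{nn})!}{a_{11}!\,a_{12}!\cdots a_{nn}!}\,p_{\eta_{11}}^{a_{11}}p_{\eta_{12}}^{a_{12}}\cdots p_{\eta_{nn}}^{a_{nn}} \qquad (4.2.6)$$

where $a_{11} + a_{12} + \cdots + a_{nn} = N$. The likelihood function of the joint distribution (4.2.6) is:

$$LF\left(p_{\eta_{11}}, p_{\eta_{12}}, \ldots, p_{\eta_{nn}}\right) = \frac{N!}{n_{\eta_{11}}!\,n_{\eta_{12}}!\cdots n_{\eta_{nn}}!}\,p_{\eta_{11}}^{n_{\eta_{11}}}p_{\eta_{12}}^{n_{\eta_{12}}}\cdots p_{\eta_{nn}}^{n_{\eta_{nn}}} \qquad (4.2.7)$$

and, as $\sum_{i,j} p_{\eta_{ij}} = 1$, it follows that

$$LF\left(p_{\eta_{11}}, p_{\eta_{12}}, \ldots, p_{\eta_{nn}}\right) = \frac{N!}{n_{\eta_{11}}!\,n_{\eta_{12}}!\cdots n_{\eta_{nn}}!}\,p_{\eta_{11}}^{n_{\eta_{11}}}p_{\eta_{12}}^{n_{\eta_{12}}}\cdots p_{\eta_{n,n-1}}^{n_{\eta_{n,n-1}}}\left(1 - p_{\eta_{11}} - \cdots - p_{\eta_{n,n-1}}\right)^{n_{\eta_{nn}}} \qquad (4.2.8)$$

The logarithm of the likelihood function is

$$L\left(p_{\eta_{11}}, p_{\eta_{12}}, \ldots, p_{\eta_{nn}}\right) = \ln\left(\frac{N!}{n_{\eta_{11}}!\,n_{\eta_{12}}!\cdots n_{\eta_{nn}}!}\right) + \sum_{(i,j)\neq(n,n)} n_{\eta_{ij}}\ln\left(p_{\eta_{ij}}\right) + n_{\eta_{nn}}\ln\left(1 - p_{\eta_{11}} - p_{\eta_{12}} - \cdots - p_{\eta_{n,n-1}}\right)$$

To obtain the maximum-likelihood estimators $\hat{p}_{\eta_{ij}}$ of $p_{\eta_{ij}}$ for all $i, j = 1, 2, \ldots, n$, we set the gradient equal to zero,

$$\frac{\partial L\left(p_{\eta_{11}}, p_{\eta_{12}}, \ldots, p_{\eta_{nn}}\right)}{\partial p_{\eta_{ij}}} = 0$$

so that

$$\hat{p}_{\eta_{ij}} = \frac{n_{\eta_{ij}}}{N}$$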

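So the likelihood ratio statistic is (Lehmann, 1986):

$$\lambda(Q) = \frac{\frac{N!}{n_{\eta_{11}}!\cdots n_{\eta_{nn}}!}\left(p_{\eta_{11}}^{(0)}\right)^{n_{\eta_{11}}}\cdots\left(p_{\eta_{nn}}^{(0)}\right)^{n_{\eta_{nn}}}}{\frac{N!}{n_{\eta_{11}}!\cdots n_{\eta_{nn}}!}\,\hat{p}_{\eta_{11}}^{n_{\eta_{11}}}\cdots\hat{p}_{\eta_{nn}}^{n_{\eta_{nn}}}} = \frac{\prod_{i=1}^{n}\prod_{j=1}^{n}\left(p_{\eta_{ij}}^{(0)}\right)^{n_{\eta_{ij}}}}{\prod_{i=1}^{n}\prod_{j=1}^{n}\left(\frac{n_{\eta_{ij}}}{N}\right)^{n_{\eta_{ij}}}} = N^{\sum_{i=1}^{n}\sum_{j=1}^{n}n_{\eta_{ij}}}\,\frac{\prod_{i=1}^{n}\prod_{j=1}^{n}\left(p_{\eta_{ij}}^{(0)}\right)^{n_{\eta_{ij}}}}{\prod_{i=1}^{n}\prod_{j=1}^{n}n_{\eta_{ij}}^{n_{\eta_{ij}}}} = N^N\prod_{i=1}^{n}\prod_{j=1}^{n}\left(\frac{p_{\eta_{ij}}^{(0)}}{n_{\eta_{ij}}}\right)^{n_{\eta_{ij}}}$$

where $p_{\eta_{ij}}^{(0)}$ denotes the probability of symbol $\eta_{ij}$ under the null hypothesis. Under the joint null hypothesis of independence between the series and i.i.d., $\Upsilon(m) = -2\ln(\lambda(Q))$ asymptotically follows a Chi-square distribution with $k$ degrees of freedom. Here $k$ equals the number of unknown parameters under $H_1$ minus the number of unknown parameters under $H_0$ (Lehmann, 1986). Then,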

$$\Upsilon(m) = -2\ln(\lambda(Q)) = -2\left[N\ln N + \sum_{i=1}^{n}\sum_{j=1}^{n}n_{\eta_{ij}}\ln\left(\frac{p_{\eta_{ij}}^{(0)}}{n_{\eta_{ij}}}\right)\right] \sim \chi_k^2 \qquad (4.2.9)$$

Under the null hypothesis, the spatial processes $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are independent and i.i.d., so it must be true that

$$p_{\eta_{ij}}^{(0)} = p_{\sigma_i^x}^{(0)}\,p_{\sigma_j^y}^{(0)}$$

Using this result, and considering that $\sum_{i=1}^{n}\sum_{j=1}^{n}n_{\eta_{ij}}/N = 1$, we can deduce that

$$\begin{aligned}\Upsilon(m) &= -2N\left[\ln N + \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(\frac{p_{\sigma_i^x}^{(0)}p_{\sigma_j^y}^{(0)}}{N\,\frac{n_{\eta_{ij}}}{N}}\right)\right]\\ &= -2N\left[\ln N + \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(p_{\sigma_i^x}^{(0)}p_{\sigma_j^y}^{(0)}\right) - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(\frac{n_{\eta_{ij}}}{N}\right) - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln N\right]\\ &= -2N\left[\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(p_{\sigma_i^x}^{(0)}p_{\sigma_j^y}^{(0)}\right) - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(\frac{n_{\eta_{ij}}}{N}\right)\right]\end{aligned}$$

So, with the symbolization proposed in Section 4.1.1, and considering that under the null hypothesis the spatial processes are independent of each other and i.i.d., $p_{\sigma_i^x}^{(0)} = p_{\sigma_i^y}^{(0)} = C_{\sigma_i}^{m-1}/2^{(m-1)}$ for all $i = 1, 2, \ldots, m$.

On the other hand, $h_Z(m) = -\sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(\frac{n_{\eta_{ij}}}{N}\right)$. We can readily obtain the following result:

$$\Upsilon(m) = 2N\left[2(m-1)\ln(2) - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(C_{\sigma_i^x}^{m-1}C_{\sigma_j^y}^{m-1}\right) - h_Z(m)\right] \qquad (4.2.10)$$

We have therefore shown the following:

Theorem 4.2.1: Let $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ be two spatial processes with $|S| = N$. Assume that the processes have been symbolized with the symbolization map defined in Section 4.1.1. Denote by $h_Z(m)$ the entropy defined in (4.1.13) for a fixed embedding dimension $m \geq 2$, with $m \in \mathbb{N}$. If the spatial series $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are i.i.d. and independent of each other, the statistic

$$\Upsilon(m) = 2N\left\{\left[2(m-1)\ln(2) - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(C_{\sigma_i^x}^{m-1}C_{\sigma_j^y}^{m-1}\right)\right] - h_Z(m)\right\}$$

is asymptotically distributed as a $\chi^2_{m^2+1}$.

Let $\alpha$ be a real number with $0 \leq \alpha \leq 1$, and let $\chi^2_\alpha$ be such that $\Pr\left(\chi^2_k > \chi^2_\alpha\right) = \alpha$. To test

$H_0$: $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are i.i.d. and independent of each other,

the decision rule for the $\Upsilon(m)$ test at the $100(1-\alpha)\%$ confidence level is:

If $0 \leq \Upsilon(m) \leq \chi^2_\alpha$, fail to reject $H_0$.
Otherwise, reject $H_0$.
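The statistic of Theorem 4.2.1 can be computed directly from the symbol counts. The sketch below implements both the closed form (4.2.10) and the likelihood-ratio form (4.2.9), using $p^{(0)}_{\eta_{ij}} = p^{(0)}_{\sigma_i^x}p^{(0)}_{\sigma_j^y}$ with $p^{(0)}_\sigma = C_\sigma^{m-1}/2^{(m-1)}$, and checks that the two forms agree. As a simplifying assumption, the symbols are drawn directly from their null Binomial$(m-1, 1/2)$ distribution rather than obtained by symbolizing simulated spatial data, so the overlapping of $m$-surroundings is ignored. The function names are hypothetical.

```python
from collections import Counter
from math import comb, log

import numpy as np


def null_prob(sigma, m):
    """p^(0)_sigma = C(m-1, sigma) / 2^(m-1) under independence."""
    return comb(m - 1, sigma) / 2 ** (m - 1)


def upsilon(sx, sy, m):
    """Closed form (4.2.10); cells with n_eta = 0 contribute nothing."""
    N = len(sx)
    counts = Counter(zip(sx, sy))
    h_z = -sum((n / N) * log(n / N) for n in counts.values())
    weighted_log_c = sum((n / N) * log(comb(m - 1, i) * comb(m - 1, j))
                         for (i, j), n in counts.items())
    return 2 * N * (2 * (m - 1) * log(2) - weighted_log_c - h_z)


def upsilon_lr(sx, sy, m):
    """Likelihood-ratio form (4.2.9): -2 [N ln N + sum n ln(p0 / n)]."""
    N = len(sx)
    counts = Counter(zip(sx, sy))
    return -2 * (N * log(N) + sum(n * log(null_prob(i, m) * null_prob(j, m) / n)
                                  for (i, j), n in counts.items()))


rng = np.random.default_rng(1)
m, N = 4, 400
# Simplification: symbols drawn from their null Binomial(m-1, 1/2) law, independently for x and y.
sx = rng.binomial(m - 1, 0.5, size=N).tolist()
sy = rng.binomial(m - 1, 0.5, size=N).tolist()
print(np.isclose(upsilon(sx, sy, m), upsilon_lr(sx, sy, m)))  # True: the two forms agree

# A perfectly dependent pair (y copies x) yields a much larger value of the statistic.
print(upsilon(sx, sx, m), upsilon(sx, sy, m))
```

To apply the decision rule, $\Upsilon(m)$ would then be compared with the $\chi^2$ critical value for the degrees of freedom stated in Theorem 4.2.1, which can be obtained, for example, with `scipy.stats.chi2.ppf(1 - alpha, df)`.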

This test can be generalized to $k$ spatial processes. The final structure of the multivariate test is:

$$\Upsilon(m) = 2N\left[k(m-1)\ln(2) - \sum_{i_1=1}^{n}\cdots\sum_{i_k=1}^{n}\frac{n_{\eta_{i_1\ldots i_k}}}{N}\ln\left(\prod_{j=1}^{k}C_{\sigma_{i_j}^{x_j}}^{m-1}\right) - h_Z(m)\right] \qquad (4.2.11)$$

where $Z$ is the $k$-dimensional joint process; the statistic is asymptotically distributed as a $\chi^2_{m^k+1}$. Our interest focuses on the step that follows the rejection of $H_0$. Once $H_0$ is rejected, we can proceed to detect causality and its direction in the relationship studied.

4.2.2 Consistency of the Test $\Upsilon(m)$

In the previous section we obtained the dependence test and its asymptotic distribution, which is well defined under the null hypothesis. We now establish its consistency, under weak conditions, for a wide range of spatial processes. Consistency is a very important property for a statistical test, as it ensures that, asymptotically, the null hypothesis is rejected with probability one when it is false. In this case, the test rejects the joint null hypothesis of i.i.d. and independence provided that the dependence structure (linear or non-linear) is of order less than $m$.

Theorem 4.2.2: Let $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ be two stationary processes, and $m > 2$ with $m \in \mathbb{N}$. Under dependence of order less than $m$, $\lim_{N\to\infty}\Pr\left(\hat{\Upsilon}(m) > C\right) = 1$ for all $0 < C < \infty$, $C \in \mathbb{R}$.

Proof: We first note that $\operatorname{plim}_{N\to\infty}\hat{p}_\sigma = p_\sigma$ exists for every spatial process, where $\hat{p}_\sigma = n_\sigma/N$. By continuity of the log function, it then follows that

$$\operatorname*{plim}_{N\to\infty}\hat{h}(m) = h(m) \qquad (4.2.12)$$

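Let $\hat{\Upsilon}(m)$ denote the estimator of $\Upsilon(m)$ and recall that

$$\hat{\Upsilon}(m) = 2N\left[2(m-1)\ln(2) - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(C^{m-1}_{\sigma_i^x}\,C^{m-1}_{\sigma_j^y}\right) - h_Z(m)\right].$$

The term in brackets,

$$H(m) = 2(m-1)\ln(2) - \sum_{i=1}^{n}\sum_{j=1}^{n}\frac{n_{\eta_{ij}}}{N}\ln\left(C^{m-1}_{\sigma_i^x}\,C^{m-1}_{\sigma_j^y}\right) - h_Z(m),$$

so that $\hat{\Upsilon}(m) = 2N\,H(m)$, can be rewritten as follows (see the proof of Theorem 4.2.1):

$$H(m) = -\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{n_{\eta_{ij}}}{N}\ln\left(\frac{C^{m-1}_{\sigma_i}\,C^{m-1}_{\sigma_j}}{2^{2(m-1)}\,n_{\eta_{ij}}/N}\right) \qquad (4.2.13)$$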

Furthermore, for every positive real number $x \in \mathbb{R}$, $-\ln(x) \geq 1 - x$, with equality only when $x = 1$. Using these results, and considering that

$$2^{m-1} = (1+1)^{m-1} = \sum_{i=0}^{m-1} C^{m-1}_i,$$

together with equation (4.2.13), under the alternative hypothesis of dependence among processes of an order less than $m$ we have

$$H(m) > \sum_{i=1}^{m}\sum_{j=1}^{m}\frac{n_{\eta_{ij}}}{N}\left(1 - \frac{C^{m-1}_{\sigma_i}\,C^{m-1}_{\sigma_j}}{2^{2(m-1)}\,n_{\eta_{ij}}/N}\right) = 0.$$

The last equality holds because $\sum_{i}\sum_{j} n_{\eta_{ij}}/N = 1$ and, by the binomial identity above, $\sum_{i}\sum_{j} C^{m-1}_{\sigma_i}C^{m-1}_{\sigma_j}/2^{2(m-1)} = 1$. Therefore, under dependence of order less than $m$,

$$H(m) > 0 \qquad (4.2.14)$$

Let $0 < C < \infty$ with $C \in \mathbb{R}$, and let $N$ be large enough that

$$\frac{C}{2N} < H(m) \qquad (4.2.15)$$

Then

$$\Pr\left[\hat{\Upsilon}(m) > C\right] = \Pr\left[2N\,H(m) > C\right] = \Pr\left[H(m) > \frac{C}{2N}\right].$$

So, by equation (4.2.15),

$$\lim_{N\to\infty}\Pr\left[\hat{\Upsilon}(m) > C\right] = 1,$$

as desired. Q.E.D.

Note that the consistency of $\Upsilon(m)$ depends on the parameter $m$, since the test detects dependence of order less than $m$. Because the investigator must choose this value, we consider the following minimal conditions:


1. The lowest value of $m$ is 2 for each series.
2. The highest value of $m$ depends on the sample size $N$. $N$ should be at least the number of joint symbols ($n \times n \leq N$), so that there are at least as many $m$-surroundings as possible symbols. Being conservative, as suggested by Rohatgi (1976), a good approximation to the tabulated limiting $\chi^2$ distribution requires expected frequencies of at least 5. This means that the embedding dimension must be fixed so that $5(n \times n) \leq N$. For example, in our particular non-standard symbolization, if we set $m = 5$, the joint distribution has 25 symbols, so we would need a sample of 125 observations to obtain an appropriate approximation to the $\chi^2$ distribution.
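The two conditions above translate into a simple rule for the largest admissible embedding dimension. The sketch assumes, as in the example, that each series has $n = m$ symbols. The joint distribution then has $m^2$ cells, and the rule reads $5m^2 \leq N$. The helper name is hypothetical.

```python
def max_embedding_dimension(N, min_expected=5):
    """Largest m with min_expected * m**2 <= N (assumes n = m symbols per series)."""
    if min_expected * 2 * 2 > N:
        raise ValueError("N is too small even for m = 2")
    m = 2
    # increase m while every joint symbol still has the required expected count
    while min_expected * (m + 1) ** 2 <= N:
        m += 1
    return m


for N in (100, 400, 1000):
    print(N, max_embedding_dimension(N))  # 100 4, then 400 8, then 1000 14
```

For $N = 100$ the rule gives $m = 4$, and for $N = 400$ it allows values up to $m = 8$.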

4.2.3 Permutation Alternative to the Independence Test

As mentioned, the variable $Q_\eta$ in equation (4.2.5) does not always provide a good approximation to the Binomial distribution under the joint null hypothesis of i.i.d. and independence. If the embedding dimension is large or the spatial system is irregular, the overlap between $m$-surroundings tends to be substantial, making $Q_\eta$ a poor approximation to a Binomial random variable. For these situations, we present an alternative strategy for testing independence based on random permutation. Permutation techniques also allow us to test hypotheses different from the one considered in Section 4.2.1. First, we propose the following null and alternative hypotheses:

$H_0$: $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are i.i.d. and independent of each other
$H_1$: $H_0$ is false

Let $\Psi_1 = \frac{1}{2N}\Upsilon$. The permutation test procedure, with a number $B$ of permutations (perms), is as follows:

1. Compute the value of the statistic $\hat{\Psi}_1$ from the original samples $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$.


2. Resample $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ to obtain two permuted series $\{x_s(b)\}_{s\in S}$ and $\{y_s(b)\}_{s\in S}$, where $b$ indexes the permutation sample.
3. For the series $\{x_s(b)\}_{s\in S}$ and $\{y_s(b)\}_{s\in S}$, estimate the statistic $\hat{\Psi}_1^{(b)}$.
4. Repeat steps 2 and 3 $B-1$ times to obtain $B$ permutation values of the statistic, $\{\hat{\Psi}_1^{(b)}\}_{b=1}^{B}$.
5. Compute the estimated permutation p-value:
$$p_{boots}\text{-value}\left(\hat{\Psi}_1\right) = \frac{1}{B}\sum_{b=1}^{B}\tau\left(\hat{\Psi}_1^{(b)} > \hat{\Psi}_1\right) \qquad (4.2.16)$$
where $\tau(\cdot)$ is an indicator function that equals 1 if the inequality holds and 0 otherwise.
6. Reject the null hypothesis of independence between $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ if
$$p_{boots}\text{-value}\left(\hat{\Psi}_1\right) < \alpha \qquad (4.2.17)$$
for a nominal size $\alpha$.

We can also design a permutation test for a weaker null hypothesis. Specifically, we can drop the condition that $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are i.i.d. and test only the hypothesis of independence between the series. To do so, recall that, under independence between the spatial processes,

$$p_{\eta_{ij}} = p_{\sigma_i^x}\,p_{\sigma_j^y},$$

and therefore it follows immediately that $h_Z(m) = h_x(m) + h_y(m)$. Thus, we propose the statistic $\hat{\Psi}_2 = \hat{h}_x(m) + \hat{h}_y(m) - \hat{h}_Z(m)$. Replacing $\hat{\Psi}_1$ with $\hat{\Psi}_2$ in steps 1 to 6 gives a new permutation test for the null hypothesis:

$H_0'$: $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are independent of each other
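Steps 1 to 6 can be sketched as a generic permutation routine. This is an illustration under stated assumptions, not the author's code. `statistic` is a user-supplied, hypothetical callable that should symbolize the raw series as in Section 4.1.1 and return $\hat{\Psi}_1$ or $\hat{\Psi}_2$. `psi2` computes $\hat{\Psi}_2$ from symbols that have already been obtained. As in step 2, both series are reshuffled independently.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Empirical Shannon entropy (natural log) of a sequence of symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def psi2(sx, sy):
    """Psi_2 = h_x + h_y - h_Z, i.e. the empirical mutual information."""
    return entropy(sx) + entropy(sy) - entropy(list(zip(sx, sy)))


def permutation_pvalue(statistic, x, y, B=200, seed=0):
    """Permutation p-value in the spirit of (4.2.16).

    statistic(x, y) should symbolize the raw series itself, so that each
    permuted sample is re-symbolized before the statistic is evaluated.
    """
    rng = random.Random(seed)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(B):
        xb, yb = list(x), list(y)
        rng.shuffle(xb)  # permute x across locations
        rng.shuffle(yb)  # permute y independently of x
        if statistic(xb, yb) > observed:  # indicator tau(.) of (4.2.16)
            exceed += 1
    return exceed / B


# Toy use: raw values are treated directly as symbols, and y copies x.
x = [0, 1] * 50
y = list(x)
p = permutation_pvalue(psi2, x, y)  # no permutation exceeds ln 2, so p is 0.0
```

The same routine applies to $\hat{\Psi}_1$ by passing a callable that returns $\Upsilon(m)/(2N)$.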


4.2.4 Performance for Finite Samples

In this section we present the empirical size and empirical power of the new test using Monte Carlo simulation exercises. The data-generating processes, parameters and other settings are the same as in Section 3.2.2.3. For each DGP we present two versions of the test. First, we show the results of the general test, whose null hypothesis is

$H_0$: $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are i.i.d. and mutually independent

Table 4.1 shows the size of the general independence statistic, $\hat{\Psi}_1$. The values are satisfactory, fluctuating around 5%. Following the rule that each symbol must have an expected frequency of at least 5, for a sample size of $N = 100$ we only consider the case $m = 4$. The same precaution is applied when computing the statistical power.

Table 4.1: Empirical Size of $\hat{\Psi}_1$ Test at 5% level
m     N = 100    N = 400    N = 1000
4     4.9        6.3        4.8
6     −          6.2        6.2
8     −          6.4        6.1
Note: Perms: 200. Number of Replications: 1000.

The power for DGP 1, which only presents intra-dependence, is shown in Table 4.2. The results are clearly satisfactory, with power rising rapidly as the sample size and the spatial dependence grow.

Table 4.2: Empirical Power of $\hat{\Psi}_1$ Test at 5% level
DGP 1
N       m     ρ = 0.3    ρ = 0.5    ρ = 0.7
100     4     39.0       94.5       100
400     4     96.0       100        100
400     6     85.0       100        100
400     8     71.5       100        100
1000    4     100        100        100
1000    6     99.0       100        100
1000    8     96.0       100        100
Note: Perms: 200. Number of Replications: 200.

Table 4.3 shows the power for DGP 2 (inter-dependence only). In all cases the test shows good power, quickly reaching 100% as the parameter values increase. For DGP 3 (intra- and inter-dependence), the results are even better, with power of practically 100% in all cases (Table 4.4).

Table 4.3: Empirical Power of $\hat{\Psi}_1$ Test at 5% level
DGP 2
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     46.0           76.5           83.5
400     4     98.5           100            100
400     6     100            100            100
400     8     98.5           100            100
1000    4     100            100            100
1000    6     100            100            100
1000    8     100            100            100
Note: Perms: 200. Number of Replications: 200.

Table 4.4: Empirical Power of $\hat{\Psi}_1$ Test at 5% level
DGP 3
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     95.0           98.0           99.5
400     4     100            100            100
400     6     100            100            100
400     8     100            100            100
1000    4     100            100            100
1000    6     100            100            100
1000    8     100            100            100
Note: Perms: 200. Number of Replications: 200.

Power values are lower for non-linear processes, although still high. Tables 4.5, 4.6 and 4.7 show the results for DGP 4 (intra-dependence only), DGP 5 (inter-dependence only) and DGP 6 (intra- and inter-dependence), respectively. In all cases, the power grows as the sample size increases. Tables 4.6 and 4.7 report values of $R^2_{y/x}$ for information only, since the relationship between the variables is clearly non-linear. As in the linear case, the power of the test for non-linear intra- and inter-dependence (Table 4.7) is satisfactory even with small sample sizes.


Table 4.5: Empirical Power of $\hat{\Psi}_1$ Test at 5% level
DGP 4
N       m     ρ = 0.3    ρ = 0.5    ρ = 0.7
100     4     20.5       65.0       99.0
400     4     75.5       100        100
400     6     57.5       97.5       100
400     8     45.5       96.0       100
1000    4     100        100        100
1000    6     95.0       100        100
1000    8     93.0       100        100
Note: Perms: 200. Number of Replications: 200.

Table 4.6: Empirical Power of $\hat{\Psi}_1$ Test at 5% level
DGP 5
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     20.5           34.5           49.5
400     4     82.5           95.0           100
400     6     92.5           98.0           100
400     8     86.0           99.0           100
1000    4     99.0           100            100
1000    6     100            100            100
1000    8     100            100            100
Note: Perms: 200. Number of Replications: 200.

Table 4.7: Empirical Power of $\hat{\Psi}_1$ Test at 5% level
DGP 6
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     67.0           82.5           88.5
400     4     100            100            100
400     6     99.5           100            100
400     8     99.5           100            100
1000    4     100            100            100
1000    6     100            100            100
1000    8     100            100            100
Note: Perms: 200. Number of Replications: 200.

Next, we present the results of the permutation test for the null hypothesis of independence between the series, namely:

$H_0'$: $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ are independent of each other

As with the parametric dependence tests, we do not show power results of the $\hat{\Psi}_2$ statistic for DGP 1 and DGP 4, as they fall under the null hypothesis. Table 4.8 presents the empirical size of the $\hat{\Psi}_2$ test, which performs acceptably in general: the values fluctuate around 5%, with a maximum of 6.6% and a minimum of 4.7%.


Table 4.8: Empirical Size of $\hat{\Psi}_2$ Test at 5% level
m     N = 100    N = 400    N = 1000
4     5.6        4.7        4.8
6     −          5.2        5.1
8     −          6.6        5.9
Note: Number of Replications: 1000.

For the case of inter-dependence, DGP 2, the empirical power is shown in Table 4.9. For small sample sizes the results are poor, although the power improves rapidly for $N = 400$. For a sample size of $N = 1000$, the power of the test reaches 100% in all cases. Similar behaviour can be observed for the case of inter- and intra-dependence, DGP 3, shown in Table 4.10.

Table 4.9: Empirical Power of $\hat{\Psi}_2$ Test at 5% level
DGP 2
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     23.0           27.5           31.5
400     4     62.5           81.0           90.5
400     6     68.5           82.0           87.5
400     8     78.0           79.0           86.5
1000    4     100            100            100
1000    6     100            100            100
1000    8     100            100            100
Note: Perms: 200. Number of Replications: 200.

Table 4.10: Empirical Power of $\hat{\Psi}_2$ Test at 5% level
DGP 3
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     22.5           21.5           20.5
400     4     62.0           71.0           74.5
400     6     72.5           73.0           70.5
400     8     71.5           65.5           65.5
1000    4     94.0           97.5           99.0
1000    6     95.0           98.5           99.0
1000    8     99.0           99.0           99.0
Note: Perms: 200. Number of Replications: 200.

We now present the non-linear cases. Table 4.11 shows the results for DGP 5. This process, which contains only inter-dependence, has high power when the sample size is large ($N = 1000$), but the results are weak for small and medium sample sizes. In general, power increases when the initial linear dependence between the variables is stronger.


Table 4.11: Empirical Power of $\hat{\Psi}_2$ Test at 5% level
DGP 5
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     9.0            13.0           18.5
400     4     36.0           54.5           61.0
400     6     52.0           53.5           62.5
400     8     57.0           54.5           57.5
1000    4     80.0           88.0           96.0
1000    6     83.0           87.5           92.5
1000    8     89.5           84.5           87.5
Note: Perms: 200. Number of Replications: 200.

Finally, Table 4.12 shows how the test performs for inter- and intra-dependence, DGP 6, with high values only when the sample contains 1000 observations.

Table 4.12: Empirical Power of $\hat{\Psi}_2$ Test at 5% level
DGP 6
N       m     R²y/x = 0.4    R²y/x = 0.6    R²y/x = 0.8
100     4     9.0            12.0           11.0
400     4     29.5           29.5           34.0
400     6     40.5           41.0           40.5
400     8     52.0           44.0           47.0
1000    4     71.0           75.0           78.0
1000    6     73.0           71.5           73.5
1000    8     83.5           80.5           73.0
Note: Perms: 200. Number of Replications: 200.

4.3 Analysis of the Appropriate Spatial Structure

Once dependence between series has been detected, the next step is to verify the spatial structure of the variables involved. The idea is to include all the relevant information in the causal analysis, but no more than necessary. We present a procedure that adapts the symbolic entropy concept. It enables us to detect the most appropriate type of weighting matrix for the relationship under study, as well as the most informative order of the spatial weighting matrix. These modifications adapt those proposed by Matilla and Ruiz (2010) to the bivariate spatial case under standard symbolization.

4.3.1 Detection of Most Informative Weighting Matrix

In spatial econometrics, it is customary to specify a particular weighting matrix from among the different types available (for a review, see Anselin, 2002, and Getis and Aldstadt, 2004). In general, this selection is made a priori, depending on the user's judgment. In this section, we aim to establish a procedure that chooses the structure providing the most information. The selection criterion is thus based on objective information from the data themselves, and does not depend on the investigator's subjectivity.

Let $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ be two spatial processes such that $x_s$ possibly causes $y_s$. Each process is embedded in an $m$-dimensional surrounding and symbolized as proposed in Section 4.1.1. Let $\mathcal{W}(x,y) = \{W_j \mid j \in J\}$ be the finite set formed by all the relevant contact matrices that determine all the possible causal relationships between the two processes, where $J$ is a set of indices. We call $\mathcal{W}(x,y)$ the spatial-dependence structure set between $x$ and $y$. Let $K$ be a subset of $\Gamma_m$ and let $W \in \mathcal{W}(x,y)$. We can then define

$$K^x_W = \{\sigma^x \in K \mid \sigma^x \text{ is admissible for } Wx\} \qquad (4.3.1)$$

where "admissible" means that the probability of occurrence of the symbol is positive. We use $\Gamma^x_m$ to denote the set of symbols that are admissible for $\{x_s\}_{s\in S}$. Let $W_0 \in \mathcal{W}(x,y)$ be the most informative weighting matrix for the relationship between $x$ and $y$. Given the spatial process $\{y_s\}_{s\in S}$, there is a subset $K \subseteq \Gamma_m$ such that $p\left(K^x_{W_0} \mid \sigma^y\right) > p\left(K^{*x}_{W} \mid \sigma^y\right)$ for all $K^* \subseteq \Gamma_m$, $W \in \mathcal{W}(x,y) \setminus \{W_0\}$ and $\sigma^y \in \Gamma^y_m$. Then

$$h_{W_0x|y}(m) = -\sum_{\sigma^y\in\Gamma^y_m} p(\sigma^y)\sum_{\sigma^x\in K^x_{W_0}} p(\sigma^x|\sigma^y)\ln\left(p(\sigma^x|\sigma^y)\right) \leq -\sum_{\sigma^y\in\Gamma^y_m} p(\sigma^y)\sum_{\sigma^x\in K^{*x}_{W}} p(\sigma^x|\sigma^y)\ln\left(p(\sigma^x|\sigma^y)\right) = h_{Wx|y}(m) \qquad (4.3.2)$$

We have thus proved the following theorem.

Theorem 4.3.1: Let $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ be two spatial processes. For a fixed embedding dimension $m \geq 2$, with $m \in \mathbb{N}$, if the most informative weighting matrix revealing the spatial-dependence structure between $x$ and $y$ is $W_0 \in \mathcal{W}(x,y)$, then

$$h_{W_0x|y}(m) = \min_{W\in\mathcal{W}(x,y)}\left\{h_{Wx|y}(m)\right\} \qquad (4.3.3)$$
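The following sketch implements the selection rule (4.3.3), estimating each conditional entropy from empirical symbol frequencies. It assumes the symbols of $y$ and of each candidate lag $Wx$ are already available at every location (the symbolization is not shown). The helper names and the dictionary input format are hypothetical. Empirical frequencies automatically restrict the sums to observed, and hence admissible, symbols.

```python
import math
from collections import Counter


def _entropy(seq):
    """Empirical Shannon entropy (natural log) of a sequence of hashables."""
    n = len(seq)
    return -sum((c / n) * math.log(c / n) for c in Counter(seq).values())


def conditional_entropy(target, given):
    """Empirical h(target | given) = h(target, given) - h(given)."""
    return _entropy(list(zip(target, given))) - _entropy(list(given))


def most_informative_matrix(lag_symbols, sy):
    """Pick the candidate W with the smallest h_{Wx|y}(m), as in (4.3.3).

    lag_symbols: dict mapping a label for each candidate W to the symbols
    of the spatial lag W x at every location (hypothetical input format).
    sy: symbols of y at the same locations.
    """
    scores = {w: conditional_entropy(s, sy) for w, s in lag_symbols.items()}
    best = min(scores, key=scores.get)
    return best, scores


sy = [0, 1, 0, 1, 2, 2]
candidates = {
    "W1": [0, 1, 0, 1, 2, 2],  # lag symbols fully determined by y
    "W2": [0, 0, 1, 1, 0, 1],  # lag symbols only loosely related to y
}
best, scores = most_informative_matrix(candidates, sy)  # best == "W1"
```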

4.3.2 Analysis of Performance in Finite Samples

In this section, we present evidence on the performance of conditional entropy as a criterion for selecting the spatial structure of dependent, and possibly causal, relationships in finite samples. We compute 1000 Monte Carlo replications of each model under the following scenario. We first generate the data of each process using a first-order weighting matrix, $W_1$. To mimic a real decision problem, we consider five different weighting matrices:

1. $W_1$: first-order weighting matrix. First order means that the $m-1$ nearest neighbours are taken for each point.
2. $W_2$: second-order weighting matrix. Second order means that the $m$-th to $2(m-1)$-th nearest neighbours are taken for each point, that is, the $2(m-1)$ nearest neighbours excluding the first $m-1$.
3. $W_3$: weighting matrix including the $m-2$ nearest neighbours plus the $m$-th nearest neighbour.
4. $W_4$: weighting matrix including the $m-3$ nearest neighbours plus the $m$-th and $(m+1)$-th nearest neighbours.
5. $W_5$: weighting matrix including the $m-4$ nearest neighbours plus the $m$-th, $(m+1)$-th and $(m+2)$-th nearest neighbours.

The problem is to select the most informative of these five alternatives to capture the relationship between variables $x$ and $y$. Tables 4.13 and 4.14 show the results obtained from the conditional entropy in a linear and a non-linear process, respectively. As mentioned earlier, the correct matrix with which the data were generated is always $W_1$. The % Selection column shows the percentage of replications in which the respective matrix is chosen for having the lowest conditional entropy. As the linear dependence of the process increases, the gap between the mean conditional entropy of $W_1$ and that of the other matrices grows, and the percentage of correct choices increases accordingly. In the linear case, with $R^2_{y/x} = 0.8$, the percentage of success is over 85% even in the worst case. As the sample size and linear dependence grow, conditional entropy captures the information in the data more effectively. It chooses matrix $W_1$ more often, and in every replication in the most favourable situation ($N = 1000$, $R^2_{y/x} = 0.8$).

Table 4.13: Simulations of Conditional Entropy. Linear Case (m = 5)
                 N = 100              N = 400              N = 1000
R²y/x   W     Mean    % Selection   Mean    % Selection   Mean    % Selection
0.4     W1    1308.9  39.0          1419.7  62.9          1448.3  76.5
        W2    1394.9  6.6           1503.5  1.2           1520.8  0.3
        W3    1336.5  24.0          1445.7  20.4          1468.9  15.7
        W4    1352.3  17.0          1465.2  7.2           1484.8  4.2
        W5    1358.1  13.4          1471.3  8.3           1492.4  3.3
0.6     W1    1236.8  59.9          1350.4  88.5          1377.0  97.9
        W2    1389.6  3.9           1499.3  0.3           1519.7  0.0
        W3    1306.4  15.8          1417.0  8.0           1439.5  2.0
        W4    1339.5  9.5           1453.6  1.9           1473.4  0.1
        W5    1344.5  10.9          1466.9  1.3           1488.4  0.0
0.8     W1    1103.1  85.4          1206.1  99.9          1234.7  100
        W2    1384.7  1.1           1498.9  0.0           1518.1  0.0
        W3    1276.5  8.0           1386.4  0.1           1408.1  0.0
        W4    1334.2  2.5           1446.6  0.0           1465.8  0.0
        W5    1344.3  3.0           1465.6  0.0           1485.8  0.0
Note: Number of Replications: 1000.

Table 4.14 shows the results for the non-linear case. Although the percentage of correct selections falls slightly, the overall picture remains essentially the same, and conditional entropy continues to perform adequately.


Table 4.14: Simulations of Conditional Entropy. Non Linear Case (m = 5)
                 N = 100              N = 400              N = 1000
R²y/x   W     Mean    % Selection   Mean    % Selection   Mean    % Selection
0.4     W1    1349.2  26.8          1443.7  51.8          1465.4  69.8
        W2    1386.2  13.7          1505.6  2.2           1524.1  0.4
        W3    1364.0  16.5          1461.0  21.0          1481.5  17.5
        W4    1368.7  17.7          1474.5  10.8          1493.5  5.7
        W5    1355.5  25.3          1475.7  14.2          1496.8  6.6
0.6     W1    1311.9  39.5          1400.0  77.5          1413.8  94.5
        W2    1386.3  9.3           1506.5  1.0           1523.2  0.1
        W3    1345.4  19.4          1445.7  11.0          1462.3  4.1
        W4    1363.6  12.6          1468.2  4.7           1486.0  0.5
        W5    1356.1  19.2          1475.0  5.8           1494.5  0.1
0.8     W1    1244.4  59.0          1301.6  96.6          1304.9  100
        W2    1379.7  6.8           1505.5  0.3           1521.1  0.0
        W3    1329.5  12.5          1427.0  1.7           1439.9  0.0
        W4    1352.5  9.0           1465.0  0.6           1480.9  0.0
        W5    1348.6  12.7          1472.9  0.8           1493.0  0.0
Note: Number of Replications: 1000.
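The five candidate structures of Section 4.3.2 can be written as sets of neighbour ranks, each containing $m-1$ ranks, and then mapped to neighbour indices. This sketch assumes Euclidean distances between point coordinates and breaks ties by point index. The thesis does not specify how ties are handled, and the helper names are hypothetical.

```python
import math


def candidate_rank_sets(m):
    """Neighbour ranks (1 = nearest) defining W1..W5; each set has m - 1 ranks."""
    if m < 4:
        raise ValueError("W5 is only well defined here for m >= 4")
    return {
        "W1": list(range(1, m)),                          # ranks 1, ..., m-1
        "W2": list(range(m, 2 * (m - 1) + 1)),            # ranks m, ..., 2(m-1)
        "W3": list(range(1, m - 1)) + [m],                # m-2 nearest + m-th
        "W4": list(range(1, m - 2)) + [m, m + 1],         # m-3 nearest + m-th, (m+1)-th
        "W5": list(range(1, m - 3)) + [m, m + 1, m + 2],  # m-4 nearest + three more
    }


def neighbours_by_rank(coords, ranks):
    """For each point, indices of its neighbours at the given distance ranks."""
    if ranks and max(ranks) > len(coords) - 1:
        raise ValueError("not enough points for the requested ranks")
    result = []
    for i, (xi, yi) in enumerate(coords):
        # (distance, index) pairs: sorting breaks distance ties by index
        order = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        result.append([order[r - 1][1] for r in ranks])
    return result


sets = candidate_rank_sets(5)  # sets["W5"] == [1, 5, 6, 7]
coords = [(float(d), 0.0) for d in (0, 1, 3, 6, 10, 15, 21, 28, 36)]
w1 = neighbours_by_rank(coords, sets["W1"])  # w1[0] == [1, 2, 3, 4]
```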

4.4 Spatial Causality in Information

Having detected the dependence between series, and having identified the most informative spatial structure, we proceed to test causality in information. This sequence aims to avoid confusing causality with dependence. We therefore begin the causal analysis by testing dependence between the series. We then include the spatial structure that provides the most information about this dependence, which is possibly causal.

4.4.1 Spatial Causality Test

Let $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ be two spatial processes and let $\mathcal{W}(x,y)$ be the set of spatial-dependence structures between $x$ and $y$. We use

$$X_W = \{W_i x \mid W_i \in \mathcal{W}(x,y)\} \qquad (4.4.1)$$

$$Y_W = \{W_i y \mid W_i \in \mathcal{W}(x,y)\} \qquad (4.4.2)$$

to denote the sets of spatial lags of $x$ and $y$ given by all the contact matrices in $\mathcal{W}(x,y)$.

Definition: We say that $\{x_s\}_{s\in S}$ does not cause $\{y_s\}_{s\in S}$ under the spatial structure $X_W$ and $Y_W$ if

$$h_{y|Y_W}(m) = h_{y|Y_W,X_W}(m) \qquad (4.4.3)$$

We then propose a one-sided non-parametric test of the null hypothesis

$H_0$: $\{x_s\}_{s\in S}$ does not cause $\{y_s\}_{s\in S}$ under the spatial structure $X_W$ and $Y_W$

with the statistic

$$\hat{\delta}(Y_W, X_W) = \hat{h}_{y|Y_W}(m) - \hat{h}_{y|Y_W,X_W}(m) \qquad (4.4.4)$$

If $X_W$ contains no extra information about $y$, then $\hat{\delta}(Y_W, X_W) = 0$; otherwise, $\hat{\delta}(Y_W, X_W) > 0$. The asymptotic distribution of the statistic is unknown, so an alternative is needed. As an approximation, we propose using a permutation technique. In a non-parametric context, the advantage of this technique over asymptotic methods is that it makes no assumptions about population distributions, avoiding the mathematical difficulties posed by asymptotic theory. Moreover, as shown by Efron and Tibshirani (1993), the method is reliable and consistent for estimating the distribution of an estimator, and is more accurate in finite samples than asymptotic approximations.


There are several permutation possibilities, one of which is to resample $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ independently. Unfortunately, the dependence structure between the series is lost in this resampling. This could affect the distribution of the test statistic and, under the null hypothesis of non-causality, could cause the rejection probability to deviate from the nominal size.¹ The permutation test procedure, with a number $B$ of permutations (perms), consists of the following steps:

1. Compute the value of the statistic $\hat{\delta}(Y_W, X_W)$ from the original samples $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$.
2. Resample $\{x_s\}_{s\in S}$ and $\{y_s\}_{s\in S}$ to obtain two permuted series $\{x_s(b)\}_{s\in S}$ and $\{y_s(b)\}_{s\in S}$, where $b$ indexes the permutation sample.
3. For the series $\{x_s(b)\}_{s\in S}$ and $\{y_s(b)\}_{s\in S}$, estimate the statistic $\hat{\delta}^{(b)}(Y_W, X_W)$.
4. Repeat steps 2 and 3 $B-1$ times to obtain $B$ permutation realizations of the statistic, $\{\hat{\delta}^{(b)}(Y_W, X_W)\}_{b=1}^{B}$.
5. Compute the estimated permutation p-value:
$$p_{boots}\text{-value}\left(\hat{\delta}(Y_W, X_W)\right) = \frac{1}{B}\sum_{b=1}^{B}\tau\left(\hat{\delta}^{(b)}(Y_W, X_W) > \hat{\delta}(Y_W, X_W)\right) \qquad (4.4.5)$$
where $\tau(\cdot)$ is an indicator function that equals 1 if the inequality holds and 0 otherwise.
6. Reject the null hypothesis that $\{x_s\}_{s\in S}$ does not cause $\{y_s\}_{s\in S}$ under the spatial structure $\mathcal{W}(x,y)$ if
$$p_{boots}\text{-value}\left(\hat{\delta}(Y_W, X_W)\right) < \alpha \qquad (4.4.6)$$
for a nominal size $\alpha$.

¹ As an alternative to this proposal, we have performed simulations resampling only the variable considered to be the cause, so as to preserve part of the dependence between the series. The results show that the power does not decrease significantly compared with the proposal presented here.
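The statistic (4.4.4) and its permutation p-value (4.4.5) can be sketched using empirical conditional entropies. The sketch assumes the symbols of $y$ and of each spatial lag in $Y_W$ and $X_W$ are already computed. `delta_from_raw` is a hypothetical user-supplied callable that re-symbolizes the permuted raw series and rebuilds the lags before evaluating $\hat{\delta}$. Passing no $Y_W$ lags corresponds to $Y_W = \emptyset$. This is an illustration, not the author's code.

```python
import math
import random
from collections import Counter


def _entropy(seq):
    """Empirical Shannon entropy (natural log) of a sequence of hashables."""
    n = len(seq)
    return -sum((c / n) * math.log(c / n) for c in Counter(seq).values())


def _cond_entropy(target, given):
    """Empirical h(target | given); `given` holds one tuple per location."""
    return _entropy(list(zip(target, given))) - _entropy(given)


def delta_hat(sy, x_lags, y_lags=()):
    """delta(Y_W, X_W) = h(y | Y_W) - h(y | Y_W, X_W), cf. (4.4.4).

    sy: symbols of y; x_lags, y_lags: lists of symbol sequences, one per
    spatial lag in X_W and Y_W. An empty y_lags means Y_W is empty.
    """
    if not x_lags:
        raise ValueError("X_W must contain at least one spatial lag")
    n = len(sy)
    cond_y = list(zip(*y_lags)) if y_lags else [()] * n
    cond_yx = list(zip(*(list(y_lags) + list(x_lags))))
    return _cond_entropy(sy, cond_y) - _cond_entropy(sy, cond_yx)


def delta_pvalue(delta_from_raw, x, y, B=200, seed=0):
    """Permutation p-value (4.4.5); x and y are reshuffled independently."""
    rng = random.Random(seed)
    observed = delta_from_raw(x, y)
    exceed = 0
    for _ in range(B):
        xb, yb = list(x), list(y)
        rng.shuffle(xb)
        rng.shuffle(yb)
        if delta_from_raw(xb, yb) > observed:  # indicator tau(.)
            exceed += 1
    return exceed / B


sy = [0, 1, 0, 1]
informative = delta_hat(sy, x_lags=[[0, 1, 0, 1]])    # ln 2: Wx determines y
uninformative = delta_hat(sy, x_lags=[[0, 0, 1, 1]])  # about 0
```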


4.4.2 Monte Carlo Simulations

We now present the results of the Monte Carlo experiments, using the DGPs introduced in Section 3.3. One advantage of the $\hat{\delta}(Y_W, X_W)$ test is that it can be adapted to data availability. As a general rule, we require that, on average, each symbol have an expected frequency close to 5. This decision affects the information that can be included in the structures $Y_W$ and $X_W$. For instance, for $N = 100$, the $\hat{\delta}(Y_W, X_W)$ test contains the following information: $Y_W = \emptyset$, $X_W = \{Wx\}$. For $N = 400$ and $N = 1000$, the information set can be enlarged so that $Y_W = \{Wy\}$ and $X_W = \{Wx\}$. For simplicity, the $\hat{\delta}(Y_W, X_W)$ statistic with $Y_W = \emptyset$, $X_W = \{Wx\}$ is called the simple statistic, and the one with $Y_W = \{Wy\}$, $X_W = \{Wx\}$ the broad statistic. Table 4.15 shows the size of the statistic. The results are good, with values close to 5% (between 4.5% and 6.3%) for both the simple and the broad statistic.

Table 4.15: Empirical Size of $\hat{\delta}(Y_W, X_W)$ Test at 5% level
YW       Ø          Ø          Ø          Wy         Wy
XW       Wx         Wx         Wx         Wx         Wx
m        N = 100    N = 400    N = 1000   N = 400    N = 1000
4        5.5        5.7        5.6        5.9        6.0
5        4.9        6.0        5.4        4.5        5.3
6        −          −          6.3        −          5.4
Note: Perms: 200. Number of Replications: 1000.

Tables 4.16, 4.18 and 4.20 present the global estimated power of the statistic. The power value is calculated after applying the test in both directions, from x to y and from y to x. For the proposed D.G.P., the global power captures all the cases in which, for each simulation, we reject the null hypothesis of non-causality from x to y, and we cannot reject the null hypothesis from y to x.
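The global power described above can be computed from the two sets of permutation p-values, one pair per replication. The following is a minimal sketch that assumes the rejection rule (4.4.6) with $\alpha = 0.05$. The p-values in the example are made up for illustration and the function name is hypothetical.

```python
def global_power(p_xy, p_yx, alpha=0.05):
    """Share (%) of replications rejecting x -> y non-causality but not y -> x."""
    if len(p_xy) != len(p_yx):
        raise ValueError("one pair of p-values per replication is required")
    # success: reject H0 from x to y (p < alpha) and fail to reject from y to x
    hits = sum(1 for a, b in zip(p_xy, p_yx) if a < alpha and b >= alpha)
    return 100.0 * hits / len(p_xy)


# Four illustrative replications: only the first and last count as successes.
print(global_power([0.01, 0.01, 0.20, 0.03], [0.50, 0.01, 0.50, 0.06]))  # 50.0
```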


DGP 1 m

Y W , XW = Wx

5

4 43.0 40.0 27.0 81.0 78.0 66.0 97.0 94.0 90.0

5 28.0 30.0 25.0 81.0 75.0 44.0 97.0 99.0 81.0

YW = W y, XW = W x

N = 400 6 80.0 85.0 92.0 80.0 92.0 96.0 74.0 84.0 96.0

YW = Ø, XW = W x

4

5 67.0 78.0 81.0 66.0 93.0 94.0 59.0 72.0 92.0

4 48.0 53.0 73.0 16.0 35.0 59.0 4.0 25.0 55.0

5 26.0 60.0 70.0 13.0 31.0 57.0 13.0 19.0 56.0

4 70.0 78.0 74.0 74.0 91.0 94.0 64.0 84.0 95.0

5 82.0 89.0 84.0 96.0 98.0 99.0 94.0 99.0 100

6 84.0 78.0 78.0 98.0 99.0 98.0 98.0 99.0 100

YW = W y, XW = W x

N = 1000

6 35.0 52.0 71.0 27.0 53.0 68.0 21.0 39.0 65.0

YW = Ø, XW = W x

Table 4.16: Global Estimated Power of δˆ (YW , XW ) Test at 5% level

Ø, XW

N = 100 =

4

YW

42.0 54.0 69.0 ρ = 0.3 2 42.0 84.0 Ry/x = 0.4 ρ = 0.5 44.0 37.0 80.0 ρ = 0.7 57.0 61.0 56.0 ρ = 0.3 69.0 68.0 79.0 R2 = 0.6 ρ = 0.5 70.0 y/x 56.0 84.0 ρ = 0.7 77.0 86.0 41.0 ρ = 0.3 87.0 2 87.0 65.0 Ry/x = 0.8 ρ = 0.5 91.0 74.0 80.0 ρ = 0.7 79.0 Note: Perms: 100. Number of Replications: 100.


4

5

ρ = 0.3 12.0 32.0 2 Ry/x = 0.4 ρ = 0.5 16.0 19.0 ρ = 0.7 18.0 26.0 ρ = 0.3 27.0 41.0 2 Ry/x = 0.6 ρ = 0.5 43.0 33.0 ρ = 0.7 14.0 30.0 ρ = 0.3 60.0 59.0 2 Ry/x = 0.8 ρ = 0.5 42.0 57.0 ρ = 0.7 32.0 32.0 Note: Perms: 100. Number of Replications:

m

N = 100

YW = Ø, XW = W x

DGP 2

Y W , XW

100.

4 66.0 57.0 48.0 78.0 79.0 73.0 79.0 80.0 81.0

5 73.0 67.0 70.0 89.0 81.0 72.0 83.0 91.0 95.0

6 68.0 67.0 68.0 84.0 85.0 88.0 89.0 93.0 94.0

YW = Ø, XW = W x

4 48.0 42.0 27.0 69.0 57.0 33.0 96.0 77.0 72.0

5 56.0 56.0 35.0 79.0 68.0 47.0 94.0 94.0 65.0

YW = W y, XW = W x

N = 400 4 80.0 73.0 78.0 71.0 65.0 81.0 55.0 65.0 75.0

5 81.0 84.0 81.0 74.0 74.0 82.0 80.0 71.0 78.0

6 87.0 83.0 84.0 74.0 83.0 88.0 78.0 81.0 89.0

4 72.0 67.0 54.0 81.0 84.0 82.0 84.0 87.0 84.0

5 89.0 85.0 77.0 87.0 90.0 92.0 90.0 95.0 95.0

6 91.0 91.0 83.0 97.0 93.0 91.0 93.0 98.0 98.0

YW = W y, XW = W x

N = 1000 YW = Ø, XW = W x

Table 4.18: Global Estimated Power of δˆ (YW , XW ) Test at 5% level


DGP 3 m

Y W , XW = Wx

5

4

4 62.0 55.0 41.0 89.0 85.0 75.0 93.0 93.0 95.0

5 62.0 59.0 43.0 90.0 89.0 69.0 97.0 100 93.0

YW = W y, XW = W x

N = 400 6 81.0 89.0 91.0 87.0 89.0 95.0 78.0 85.0 90.0

YW = Ø, XW = W x

5 79.0 87.0 87.0 78.0 90.0 89.0 78.0 72.0 86.0

4 56.0 59.0 72.0 31.0 48.0 70.0 18.0 43.0 72.0

5 56.0 62.0 64.0 29.0 48.0 66.0 20.0 27.0 57.0

4 82.0 87.0 87.0 81.0 88.0 92.0 76.0 85.0 93.0

5 92.0 96.0 90.0 95.0 95.0 99.0 92.0 99.0 100

6 94.0 97.0 90.0 97.0 99.0 100 96.0 100 100

YW = W y, XW = W x

N = 1000

6 63.0 59.0 81.0 46.0 53.0 73.0 32.0 40.0 64.0

YW = Ø, XW = W x

Table 4.20: Global Estimated Power of δˆ (YW , XW ) Test at 5% level

Ø, XW

N = 100 =

4

YW

41.0 49.0 ρ = 0.3 65.0 2 43.0 Ry/x = 0.4 ρ = 0.5 47.0 73.0 37.0 ρ = 0.7 56.0 76.0 63.0 ρ = 0.3 68.0 59.0 68.0 R2 = 0.6 ρ = 0.5 76.0 82.0 y/x 55.0 ρ = 0.7 79.0 84.0 86.0 ρ = 0.3 88.0 59.0 2 85.0 Ry/x = 0.8 ρ = 0.5 84.0 71.0 74.0 ρ = 0.7 81.0 88.0 Note: Perms: 100. Number of Replications: 100.


Table 4.16 shows the results obtained for the linear case, DGP 1. When $N = 100$, we can only include information in $X_W$. Despite this constraint, the power grows as the linear relationship strengthens. For $R^2_{y/x} = 0.8$, the power obtained is good, 91% in the best case. For larger sample sizes, such as $N = 400$ and $N = 1000$, we can test for causality using both versions of the statistic, simple and broad. For $m$ equal to 4 and 5, under weak linear dependence, the simple statistic performs better. When there are more observations, the value of $m$ can be increased to 6, with good performance in all the simulated cases. For the broad statistic, we can only consider $m$ equal to 4 and 5. With a high coefficient of determination, the broad test performs better in practically all the cases considered. The broad test also performs better for large sample sizes, making better use of the available information and attaining power values above 90% in most cases, and even 100% when $R^2_{y/x} = 0.8$.

For the non-linear processes (Tables 4.18 and 4.20), both versions of the test perform as described above. Even for $N = 1000$, the test performs better in the non-linear process DGP 3 than in the linear process DGP 1.


4.5 Summary

This chapter has developed tools that supplement causal analysis, following the strategy discussed in Chapter 3. Based on the concept of symbolic entropy as a general information measure, we have developed several tests and procedures that capture relevant information among series. We started by focusing on the detection of dependence between series, presenting a statistic whose two permutation versions, $\Psi_1$ and $\Psi_2$, perform well in finite samples. We then presented a procedure based on conditional entropy that can detect the most informative spatial structure when the relationships between variables are linear or non-linear. Finally, the chapter focused on the spatial causality test proper. This test completes a complex process that enables us to determine unequivocally the direction of information between series. Its performance in the simulations is promising for the unequivocal detection of information flows between variables, and it can detect spatial causation under both linear and non-linear conditions. Besides establishing an operative and detectable concept of causation in space, the tools developed in this chapter lead us to discuss specification in spatial econometrics. This is decisive for establishing more than an empirical regularity in the data.

5 Spatial Causality between Migration and Unemployment in the Italian Provinces¹

¹ This chapter was developed while the author was conducting a research stay at ISEA (Rome, Italy) and was partly financed by DGA (CONAID) and CAI (Ref.: CH12/10).

5.1 Introduction

This chapter is concerned with the relationship between migration and unemployment at the regional level. In particular, we aim to study whether there is spatial causality from interregional migration to regional unemployment or vice versa. The relationship between migration and unemployment is often the subject of public debate. Many argue that immigration is the cause of high unemployment in regions receiving large numbers of migrants. This view can be supported with theoretical arguments from the neoclassical school. From this perspective, assuming that labour is homogeneous and that there is perfect competition in the goods market, workers move to prosperous regions, increasing the labour supply there (direct effect). In turn, immigration increases the consumption of local goods, leading firms to increase their labour demand (indirect effect). According to the neoclassical perspective, the direct effect prevails over the indirect effect, resulting in higher unemployment. From a different perspective, the New Economic Geography, with its emphasis on core-periphery dynamics (Krugman, 1991), also supports the existence of a causal relationship between migration and unemployment. Assuming imperfect competition in the goods market and rigidity in the labour market, Epifani and Gancia (2005) present a model in which the forces that generate agglomeration also determine the spatial disparities that produce interregional persistence in the unemployment rate. If regional integration results from falling transport costs, we can expect greater migration from backward to prosperous regions. Migration will stimulate agglomeration economies (home-market effect), increasing business profits and, therefore, the demand for labour. In this case, the indirect effect of immigration on labour demand prevails over the direct effect. In other words, the core-periphery dichotomy is reinforced, with immigration reducing the unemployment rate in the region of destination. Other authors, however, suggest that unemployment is the cause of migration. Pissarides and Wadsworth (1989) argued that unemployment affects migration, as people move from places where they are not employed to places offering better employment prospects. Unemployment in the place of origin also increases the probability of migration, as residents of high-unemployment areas are more likely to become or remain unemployed. According to this perspective, unemployment in the place of origin is one of the main causes of migration. In sum, there are important theoretical arguments in favour of both causal directions, which makes this an interesting case for empirical debate. The following section reviews the literature on the subject, distinguishing between two types of studies: (1) those that attempt to determine the correct direction of causality, and (2) those that assume a specific theoretical hypothesis and try to verify it.
Section 5.3 analyzes the evolution of unemployment and interregional migration in the Italian provinces, highlighting their principal characteristics. Section 5.4 applies the procedure for detecting spatial causality. The chapter ends with some conclusions.


5.2 Review of Empirical Evidence

In general, the initial approaches to this problem were based on national data, studying the impact of migration on destination countries such as the United States, Canada and Australia, among others. Subsequently, the focus has increasingly shifted to intra-national processes. These studies can be divided into two types. Causal studies consider the simultaneity between the variables of interest and aim explicitly to test causality, usually following Granger's approach. Non-causal studies are generally based on a prior theoretical position that the study attempts to corroborate, for example by highlighting the explanatory power of the model. Marr and Siklos (1994) is one of the so-called causal studies. These authors study the relationship between immigration and unemployment in Canada, using quarterly data for the 1962-1990 period. They conclude that, before 1978, changes in migratory levels did not affect the Canadian unemployment rate, but that this effect was significant after that year.

In a subsequent study, Marr and Siklos (1995) use annual data for Canada in 1926-1992 and apply a vector autoregressive (VAR) approach to a set of variables: immigration, unemployment, salaries and gross domestic product (GDP). The evidence suggests that immigration causes unemployment. Withers and Pope (1985, 1993) investigate the case of Australia for 1861-1991 and 1948-1982. Using annual and quarterly data, these authors find evidence that unemployment causes immigration but immigration does not cause unemployment. Using quarterly data for Canada, Lee (1992) finds that immigration causes greater unemployment and that unemployment is only a weak cause of immigration. Díaz-Emparanza and Espinosa (2000), applying the same empirical strategy as Lee (1992), analyze the causal relationship between unemployment and immigration in Spain in the 1981-1998 period (monthly data). The results show that, in the short term, there is no Granger causality from unemployment to immigration, but there is weak causality from immigration to unemployment. These studies take cointegration into account in the causal relationships. Tian and Shan (1999) investigate the relationship between unemployment and immigration in New Zealand and Australia. Using quarterly data for the 1983-1995 period, they construct a six-variable VAR. Their main conclusion is that there is no evidence of Granger causality between the two variables. Konya (2000) studies the bivariate relationship between unemployment and immigration in Australia in 1981-1998. Using quarterly data, he establishes that there is unidirectional Granger causality from immigration to unemployment. The effect is adverse, with immigration leading to greater unemployment in the long term. For Canada, Islam (2007) finds that migration does not cause unemployment, but that there is evidence that unemployment causes migration. Based on a vector error correction model, he suggests that greater unemployment causes less immigration in the short term; in the long term, immigration does not increase aggregate unemployment. For the Canadian province of British Columbia, Gross (2004) finds that, in the short term, immigration increases unemployment. This effect disappears in the long term, when immigrants help to reduce unemployment by creating more jobs. Feridun conducts a series of causal studies involving unemployment, immigration and GDP per capita. Feridun (2004) investigates the causal relationship between these variables for Finland in 1981-2001. The results show that immigration causes unemployment, and that there is no causality in the reverse direction. Applying a similar method, Feridun (2005) investigates the relationship for Norway; using cointegrated VAR models, he finds no evidence of causality between unemployment and immigration in 1983-2003.
Finally, Feridun (2007) investigates a similar relationship in Sweden. Using annual data for the 1980-2004 period, he finds no evidence that immigration causes unemployment, although he does find that unemployment causes immigration. Most causal studies investigate the relationship between unemployment and immigration on a national scale. The few regional studies include Gross (2004), who focuses on a single regional market.

With regard to non-causal studies, several deal with the effect of regional unemployment on emigration. Pissarides and McMaster (1990) aim to identify the extent to which regional migration rates respond to salary differentials and regional unemployment. Using aggregate data for the regions of Great Britain in 1961-1982, they show that the net migration rate presents a small positive response to changes in the regional unemployment rate. Pissarides and Wadsworth (1989) use individual data to investigate whether unemployment (individual and local) affects the probability of migration in Great Britain. Their results, similar to those subsequently presented by Hughes and McCormick (1994), show that the effect of regional unemployment on emigration is insignificant. Using individual data, Da Vanzo (1978) finds that, in the United States, unemployment is only relevant for explaining the migration of people who are unemployed. In Spain, Antolin and Bover (1997) find that unemployment affects the migration of people who are not registered as unemployed. Using a micro database, the authors show that unemployment affects individual migration decisions; at the aggregate level, however, they find no relationship between migration and the unemployment rate. Herzog, Schlottmann and Boehm (1993) examine different empirical job-search studies. In most of them, the dependent variable is dichotomous, measuring only the decision to migrate or not. The results show the importance of personal and local conditions for explaining the decision to migrate. For a good summary of other similar studies, see Greenwood (1975, 1985).


Faini et al. (1997) find evidence for Italy that the impact of unemployment on the decision to migrate is highly non-linear. Using individual data, they show that unemployment encourages long-distance mobility but discourages short-distance mobility. This suggests that regional unemployment is strongly related to geographic space. Other studies focus on the effect of migration on unemployment. Blanchard and Katz (1992) find that migration acted as an equilibrating mechanism for regional unemployment in the United States. Using combined data for Austria, Winter-Ebmer and Zweimüller (1994) conclude that immigration had only a slight short-term impact on the unemployment risk of the previously resident population. Coulon (2005) finds that growing migration did not have a significant impact on employment in six Swiss regions from 1991 to 2003. Groenewold (1997) analyses a model in which immigration acts as an adjustment mechanism between regions. The author shows that interregional equilibrating forces act slowly and did not help to equalize regional unemployment rates in Australia. Pischke and Velling (1997) analyse panel data for 167 regional labour markets in Germany from 1985 to 1989. Using the change in the foreign population of each market as the explanatory variable, they find only small negative effects on the unemployment rate. Galloway and Jozefowicz (2008) investigate the impact of migration on the unemployment rate in 26 regional markets in the Netherlands. Using panel data for 1996-2003, they find that migration had a positive effect on the regional unemployment rate. See Elhorst (2003) for a good review of the literature that attempts to explain regional unemployment differences.

Etzo (2008) presents a good review of the literature on internal migration. Few studies use spatial data and spatial parametric tools, and our review found no empirical studies applying causality techniques to panel data or using a spatial vector autoregressive approach.


5.3 Migration and Regional Unemployment in Italy

Internal migration in Italy has gone through several cycles. The aggregate flow was very large in the 1960s, with a considerable number of people moving from the south to the north. This movement weakened in the 1970s and 1980s (Faini et al., 1997). Mobility was still limited in the early 1990s, although migration from the south to the north started to grow again in the mid-1990s (Basile and Causi, 2007). As for the national unemployment rate, two different periods can be distinguished after the end of the 1970s (Bertola and Garibaldi, 2003). The first shows a sustained increase until the recovery from the crisis of 1992-1993. The second started in the mid-1990s and is characterized by a general decline in unemployment rates. Focusing on 1995-2006, we present a descriptive analysis of the evolution of migration and unemployment.

We use annual data for the NUTS 3 regions (provinces) from the Italian National Statistics Institute (ISTAT). The unemployment rate is the number of unemployed divided by the labour force. The net migration rate is the average net migratory balance divided by the total population aged 15-64 in each province and year. The net migratory balance is defined as the number of migrants entering the municipal population registers minus the number of migrants leaving them. Net migration has been used to measure migratory flows in many empirical studies (see, for example, Elhorst, 2003; Basile and Causi, 2007). The analysis uses averages over four three-year periods². This is because the migration variable is recorded through two administrative acts: cancellation of registration in the province of origin and registration in the province of destination. These acts can be delayed for some time, which affects mobility measurements; we use three-year averages in order to correct this possible bias. Table 5.1 presents a statistical summary for the 103 Italian provinces in the four periods.

² Given the number of available years, the study can be conducted at different levels of temporal aggregation. The main conclusions, however, do not change, so we decided to focus on the three-year case. Section 5.4 presents the results for three levels of temporal aggregation: (a) four 3-year periods, (b) two 6-year periods and (c) one 12-year period.

Table 5.1: National Unemployment Rate and Net Migration

             Unemployment Rate (%)     Net Migration (%)
Period       Average   Stand. Dev.     Average   Stand. Dev.
1995-1997    10.73     6.07            0.27      0.43
1998-2000    10.36     6.62            0.33      0.60
2001-2003     8.47     6.00            0.98      0.62
2004-2006     7.59     4.55            0.64      0.64

Source: ISTAT.
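The variable definitions above can be illustrated with a short Python sketch. It is not the author's code: the function names and the toy inputs are hypothetical, and computing the annual rate before averaging (rather than averaging the net balance first) is an assumption. The averaging helper also shows why 1995-2006 splits cleanly into the three aggregation levels of footnote 2 (3, 6 and 12 years).

```python
from __future__ import annotations

from statistics import mean


def unemployment_rate(unemployed: float, labour_force: float) -> float:
    """Unemployment rate in percent: unemployed divided by the labour force."""
    return 100.0 * unemployed / labour_force


def net_migration_rate(registrations: float, cancellations: float,
                       population_15_64: float) -> float:
    """Net migration rate in percent (hypothetical helper).

    Net migratory balance = registrations minus cancellations in the
    municipal population registers, divided by the population aged 15-64.
    """
    return 100.0 * (registrations - cancellations) / population_15_64


def period_averages(annual: dict[int, float], start: int, end: int,
                    width: int = 3) -> dict[str, float]:
    """Average annual values over consecutive, non-overlapping windows."""
    n_years = end - start + 1
    if n_years % width:
        raise ValueError("the sample length must be a multiple of the window width")
    averages = {}
    for first in range(start, end + 1, width):
        last = first + width - 1
        averages[f"{first}-{last}"] = mean(annual[y] for y in range(first, last + 1))
    return averages


# Toy annual series for one province (made-up numbers, for illustration only)
annual_rates = {year: 10.0 - 0.25 * (year - 1995) for year in range(1995, 2007)}
for width in (3, 6, 12):  # the three aggregation levels of footnote 2
    print(width, period_averages(annual_rates, 1995, 2006, width))
```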

The unemployment rate fell during the period 1995-2006. The Italian rate was 10.73% for the first period, 1995-1997, falling to 7.59% in 2004-2006. Net migration grew in the first three periods, from an average of 0.27% in 1995-1997 to 0.98% in 2001-2003. The last period registers a decrease in the aggregated value to 0.64%. The national dispersion of the data differs in each period.

In the case of unemployment, the standard deviation grew from 1995 to 2000 and then fell below the 1995-1997 level. Dispersion in the net migration rate grew steadily throughout the four periods considered. As is well known, the southern Italian provinces have evolved differently from those in the north and centre. Taking this into account, Figure 5.1 shows the situation of each group in the different periods. The scatter plots show a negative relationship between the variables, with a slope that becomes weaker period after period. The northern provinces have a low unemployment rate throughout the period and a net inflow of immigrants.

Since the axes in Figure 5.1 use the same scale in every panel, the growing concentration of unemployment levels across the periods is directly visible. As mentioned earlier, unemployment fell in both the southern and the northern provinces.


Figure 5.1: Relationship between Unemployment and Net Migration
[Scatter plots of % Unemployment (vertical axis) against % Net Migration (horizontal axis), distinguishing Northern and Southern Provinces, for 1995-1997, 1998-2000, 2001-2003 and 2004-2006.]
Source: ISTAT. Southern Provinces: provinces belonging to the regions of Campania, Abruzzo, Molise, Basilicata, Calabria, Puglia, Sicilia and Sardegna. Northern Provinces: provinces belonging to the regions of the Centre and North: Valle d’Aosta, Piemonte, Lombardia, Trentino Alto Adige, Friuli Venezia Giulia, Veneto, Liguria, Emilia Romagna, Marche, Toscana, Lazio and Umbria.

However, if we consider the ratio between unemployment in the southern and in the northern provinces, it rises from 2.29 in 1995-1997 to 3.13 in 2001-2003 and then falls to 2.75 in 2004-2006. The gap between North and South thus widened in the first part of the period and narrowed afterwards, while remaining above its initial level. This is paradoxical, since we would expect convergence between the unemployment rates of the Italian provinces, that is, a progressive reduction in the relative unemployment differential. Finally, we present the spatial distribution of the variables (Figure 5.2).
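The South/North ratio discussed above can be computed as in the following sketch. The text does not state whether the ratio uses unweighted province averages or aggregate rates for each group, so the unweighted mean used here is an assumption; the regional classification follows the note to Figure 5.1, and the input values are invented for illustration.

```python
from __future__ import annotations

from statistics import mean

# Southern regions as listed in the note to Figure 5.1
SOUTHERN_REGIONS = {"Campania", "Abruzzo", "Molise", "Basilicata",
                    "Calabria", "Puglia", "Sicilia", "Sardegna"}


def south_north_ratio(rates: list[float], regions: list[str]) -> float:
    """Ratio of mean unemployment in southern vs. northern provinces.

    Each province is weighted equally (an assumption); every province
    whose region is not southern is treated as northern/centre.
    """
    south = [r for r, g in zip(rates, regions) if g in SOUTHERN_REGIONS]
    north = [r for r, g in zip(rates, regions) if g not in SOUTHERN_REGIONS]
    return mean(south) / mean(north)


# Hypothetical provinces: unemployment rate (%) and region
rates = [20.0, 10.0, 5.0, 5.0]
regions = ["Sicilia", "Campania", "Lombardia", "Veneto"]
print(south_north_ratio(rates, regions))  # 15 / 5 = 3.0
```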


Figure 5.2: Spatial Distribution of Variables
[Quartile maps of the Italian provinces: % Unemployment and % Net Migration for 1995-1997, 1998-2000, 2001-2003 and 2004-2006.]
Source: ISTAT

As Figure 5.2 shows, the spatial pattern of unemployment has not been uniform across the provinces. There are clear and persistent clusters of similar values in the northern and southern regions. The unemployment rate was particularly high in the regions of Sicilia, Sardegna, Calabria, Campania, Puglia and Basilicata; by contrast, unemployment has been lower in the northern provinces. As for migratory flows, the northern provinces are net recipients of migrants, whereas the southern provinces are net generators. The pattern is similar year after year. These results show the importance of geographic space for the variables of interest and the persistence of their spatial distribution. In conclusion, the distribution by quartile shows a similar spatial pattern in each period. During the period analyzed, national unemployment fell, while regional disparities remained high and persistent. These results are similar to those reported by Faini et al. (1997) and Alesina, Danninger and Rostagno (1999), among others.

5.4 Procedure for Detecting Spatial Causation

In this section we apply the strategy outlined in Figure 3.1, using parametric and non-parametric tools at each step. We present the results of the causality analysis for different time periods: in addition to the four 3-year periods on which the analysis initially focuses, we apply the causality tests to two 6-year periods and to the average for the whole period. We can thus compare the results obtained at different levels of temporal aggregation.

5.4.1 The Framework for the Analysis

First, we have to establish whether the variables follow a normal distribution. This assumption is important for causality analysis based on the parametric approach, whereas the results of the non-parametric approach are not affected by the lack of normality of the variables.


Figure 5.3 compares the quantiles of different transformations of the variables under study with those of a normal distribution in a Q-Q graph. We use the log transformation for unemployment, as it is the closest to a normal distribution. For net migration, only six transformations are possible, because the original variable takes negative values. In this case, we decided to keep the raw (untransformed) data, given that their fit to the quantiles of the normal distribution is acceptable. Note that the log transformation does not affect the symbolization process involved in the non-parametric approach, since it is strictly increasing and therefore preserves the ordering of the observations.

Figure 5.3: Quantile-Quantile Graph
[Q-Q plots against the normal distribution. % Unemployment, 1995-1997: cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square and 1/cubic transformations. % Net Migration, 1995-1997: cubic, square, identity, inverse, 1/square and 1/cubic transformations.]
Source: ISTAT. Comparison of quantiles for different transformations with respect to the normal distribution.
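The visual comparison in Figure 5.3 can be approximated numerically by the correlation between the sorted data and the corresponding normal quantiles, i.e. how close each Q-Q plot is to a straight line. The sketch below is an illustration, not the procedure used in the thesis: the plotting positions (i + 0.5)/n and the sign convention for the inverse powers (negated so that, for positive data, each transformation preserves order, as in common ladder-of-powers implementations) are assumptions. Transformations that are undefined for some observation, such as log or sqrt of a negative net migration rate, are skipped, which is why only six remain for that variable.

```python
from __future__ import annotations

import math
from statistics import NormalDist, mean

# Ladder of power transformations (inverse powers negated to keep order)
TRANSFORMS = {
    "cubic": lambda v: v ** 3,
    "square": lambda v: v ** 2,
    "identity": lambda v: v,
    "sqrt": math.sqrt,
    "log": math.log,
    "1/sqrt": lambda v: -1.0 / math.sqrt(v),
    "inverse": lambda v: -1.0 / v,
    "1/square": lambda v: -1.0 / v ** 2,
    "1/cubic": lambda v: -1.0 / v ** 3,
}


def qq_correlation(values: list[float]) -> float:
    """Pearson correlation between sorted data and standard normal quantiles."""
    xs = sorted(values)
    n = len(xs)
    qs = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, mq = mean(xs), mean(qs)
    sxy = sum((a - mx) * (b - mq) for a, b in zip(xs, qs))
    sxx = sum((a - mx) ** 2 for a in xs)
    sqq = sum((b - mq) ** 2 for b in qs)
    return sxy / math.sqrt(sxx * sqq)


def rank_transforms(values: list[float]) -> list[tuple[str, float]]:
    """Transformations sorted by Q-Q correlation (closest to normal first)."""
    scores = {}
    for name, f in TRANSFORMS.items():
        try:
            scores[name] = qq_correlation([f(v) for v in values])
        except (ValueError, ZeroDivisionError):
            continue  # undefined for some observation (e.g. log of a negative)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up log-normal sample: the log transformation should rank first
sample = [math.exp(NormalDist().inv_cdf((i + 0.5) / 50)) for i in range(50)]
print(rank_transforms(sample)[:3])
```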

5.4.2 Step 1: The Selection of the Spatial Structure

In this step, we need to select a weighting matrix from all the possible choices. Recall that, in general, researchers can use their own criteria to construct a weighting matrix, based on geographic distance, the length of the common border between regions, transport service frequency or socioeconomic criteria such as similarities in productive structures. Assuming that there is more than one viable alternative, the question is: which of the alternative matrices should be selected?


In our case, the only available criterion for constructing the weighting matrix is nearest neighbours, so we consider a scenario in which we want to choose the most informative matrix from a set of combinations of first- and second-order neighbours. The selection of the spatial structure is conditioned by data availability: for the non-parametric test to be operative, we have to limit the number of neighbours of each location to 3, that is, m = 4 (each location plus its three neighbours). The comparison of weighting matrices is therefore limited to the matrices that satisfy this constraint. The following matrices were considered:

1. W1: first-order weighting matrix, including the 1st, 2nd and 3rd nearest neighbours.
2. W2: second-order weighting matrix, including the 4th, 5th and 6th nearest neighbours.
3. W3: weighting matrix including the 1st, 2nd and 4th nearest neighbours.
4. W4: weighting matrix including the 1st, 4th and 5th nearest neighbours.

Table 5.2 summarizes the results of the conditional entropy procedure. In the Selection column, 1st, 2nd, 3rd and 4th indicate that the corresponding weighting matrix is the first, second, third or fourth preferred.
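A minimal sketch of how the four candidate matrices listed above can be built from location coordinates: for each location, only the neighbours whose distance rank belongs to a given set are kept, and the row is standardised. The coordinates, the use of Euclidean distance and row standardisation are assumptions for illustration; they are not specified at this point in the text.

```python
from __future__ import annotations

import math

# Distance ranks (1 = nearest) that define each candidate matrix
CANDIDATES = {
    "W1": (1, 2, 3),
    "W2": (4, 5, 6),
    "W3": (1, 2, 4),
    "W4": (1, 4, 5),
}


def rank_weights(coords: list[tuple[float, float]],
                 ranks: tuple[int, ...]) -> list[list[float]]:
    """Row-standardised weight matrix built from selected neighbour ranks."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Other locations sorted by Euclidean distance (ties broken by index)
        others = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        chosen = [others[r - 1][1] for r in ranks]
        for j in chosen:
            W[i][j] = 1.0 / len(chosen)
    return W


# Hypothetical locations on a line with increasing gaps (no distance ties)
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0),
          (10.0, 0.0), (15.0, 0.0), (21.0, 0.0), (28.0, 0.0)]
matrices = {name: rank_weights(points, r) for name, r in CANDIDATES.items()}
print([j for j, w in enumerate(matrices["W4"][0]) if w > 0])
```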

Table 5.2: Conditional Entropy. 3-year Periods

         1995-1997           1998-2000           2001-2003           2004-2006
         Value   Selection   Value   Selection   Value   Selection   Value   Selection
W1       439.3   1st         446.1   1st         426.2   1st         612.8   2nd
W2       809.2   4th         771.0   4th         818.7   4th         729.4   4th
W3       746.6   3rd         660.9   3rd         519.1   2nd         635.5   3rd
W4       734.7   2nd         543.6   2nd         583.2   3rd         609.3   1st


In most of the periods, the most informative of the four options is W1. In the last period, however, the most informative is W4 (Table 5.2). For the alternative time intervals (Table 5.3), W1 is selected for the first period, 1995-2000, and for the average over 1995-2006, whereas W3 is selected for the second period, 2001-2006.

Table 5.3: Conditional Entropy. Alternative Aggregations

         1995-2000           2001-2006           1995-2006
         Value   Selection   Value   Selection   Value   Selection
W1       455.3   1st         553.1   3rd         583.7   1st
W2       778.1   4th         812.0   4th         725.1   4th
W3       750.4   3rd         508.1   1st         696.2   3rd
W4       719.4   2nd         547.2   2nd         593.2   2nd
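The selection rule behind Tables 5.2 and 5.3 (a lower conditional entropy means a more informative matrix) can be checked against the reported values. The sketch only ranks the transcribed numbers and counts how often each matrix comes first; it does not reproduce the conditional entropy computation itself, which is described earlier in the thesis.

```python
from collections import Counter

NAMES = ("W1", "W2", "W3", "W4")

# Conditional entropy values (W1, W2, W3, W4) from Tables 5.2 and 5.3
ENTROPY = {
    "1995-1997": (439.3, 809.2, 746.6, 734.7),
    "1998-2000": (446.1, 771.0, 660.9, 543.6),
    "2001-2003": (426.2, 818.7, 519.1, 583.2),
    "2004-2006": (612.8, 729.4, 635.5, 609.3),
    "1995-2000": (455.3, 778.1, 750.4, 719.4),
    "2001-2006": (553.1, 812.0, 508.1, 547.2),
    "1995-2006": (583.7, 725.1, 696.2, 593.2),
}


def preference_order(values):
    """Matrix names from most informative (lowest entropy) to least."""
    return [name for _, name in sorted(zip(values, NAMES))]


first_choices = {period: preference_order(v)[0] for period, v in ENTROPY.items()}
print(first_choices)
print(Counter(first_choices.values()).most_common())
```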

Since W1 is the matrix selected most often, we adopt it. Next, we use the J-test to check whether the other matrices provide results similar to those of W1. The results are summarized in Table 5.4.

Table 5.4: J-Test. 3-year Periods

Periods      W1 vs. W2   W1 vs. W3   W1 vs. W4
             p-value     p-value     p-value
1995-1997    0.73        0.94        0.51
1998-2000    0.05        0.37        0.40
2001-2003    0.07        0.28        0.23
2004-2006    0.03        0.08        0.28

Note: In terms of equation (3.2.9), Model 0 uses W1 and Model 1 uses Wj, j = 2, 3, 4.

In 1998-2000 and 2004-2006, we reject the null hypothesis (the model with W1) in favour of the alternative hypothesis (the model with W2). When the roles are reversed and the W2 model is taken as the null, the null hypothesis cannot be rejected. The J-test therefore points to W2 as the correct weighting matrix in both periods.
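The logic of the J-test can be illustrated with a deliberately simplified sketch: both competing models are linear regressions of y on a constant and a spatially lagged regressor Wx, estimated by OLS, and the fitted values of the alternative model are added to the null model; a significant coefficient on those fitted values rejects the null. This is not the spatial specification of equation (3.2.9): the OLS setting, the normal approximation for the p-value, the ring-shaped weight matrices and the simulated data are all assumptions made for illustration.

```python
from __future__ import annotations

import random
from statistics import NormalDist


def invert(a: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """OLS estimates, residuals and (X'X)^-1 via the normal equations."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    return beta, resid, inv


def lag(W, x):
    """Spatial lag Wx."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def j_test_pvalue(y, x, W_null, W_alt):
    """Two-sided p-value of the J-test of the W_null model against W_alt."""
    X_alt = [[1.0, v] for v in lag(W_alt, x)]
    b_alt, _, _ = ols(X_alt, y)
    fitted_alt = [b_alt[0] + b_alt[1] * r[1] for r in X_alt]
    # Null model augmented with the alternative model's fitted values
    X_aug = [[1.0, v, f] for v, f in zip(lag(W_null, x), fitted_alt)]
    beta, resid, inv = ols(X_aug, y)
    s2 = sum(e * e for e in resid) / (len(y) - len(beta))
    t = beta[2] / (s2 * inv[2][2]) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def ring_weights(n, offsets):
    """Hypothetical row-standardised weights: neighbours at given ring offsets."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for o in offsets:
            W[i][(i + o) % n] += 1.0 / len(offsets)
    return W


# Simulated data in which the "far" structure generates y
rng = random.Random(1)
n = 100
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
W_near, W_far = ring_weights(n, (-1, 1)), ring_weights(n, (-3, 3))
y = [1.0 + 2.0 * v + 0.1 * rng.gauss(0.0, 1.0) for v in lag(W_far, x)]
print(j_test_pvalue(y, x, W_near, W_far))  # null: W_near, alternative: W_far
print(j_test_pvalue(y, x, W_far, W_near))  # null: W_far, alternative: W_near
```

Running the test in both directions mirrors the reasoning above: a matrix is preferred when its model rejects the rival but is not itself rejected.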


Note that, for 1998-2000, the W2 matrix has the lowest information content (highest entropy) according to the conditional entropy measures, and that the J-test is highly sensitive to non-linear mechanisms. Table 5.5 shows the results for the alternative time intervals. The same comment applies as for the 3-year periods: W1 is rejected against W2 in 2001-2006.

Table 5.5: J-Test. Alternative Aggregations

Periods      W1 vs. W2   W1 vs. W3   W1 vs. W4
             p-value     p-value     p-value
1995-2000    0.24        0.66        0.91
2001-2006    0.04        0.09        0.20
1995-2006    0.10        0.27        0.50

Note: In terms of equation (3.2.9), Model 0 uses W1 and Model 1 uses Wj, j = 2, 3, 4.

5.4.3 Step 2: Bivariate Spatial Dependence Analysis

Having selected the most informative spatial structure, our next step is to test for spatial dependence between the variables. Tables 5.6 and 5.7 show the results of the parametric tests I_yx and LM_I. Spatial dependence between unemployment and net migration is detected in all cases. Using the weighting matrix W2, chosen by the J-test for the 2004-2006 period, leads to the same conclusion.

Table 5.6: I_yx and LM_I Tests. 3-year Periods

             Bivariate Moran         LM_I
Periods      I_yx      p-value       LM_I      p-value      Conclusion
1995-1997    -0.63     0.00          23.02     0.00         Dependence
1998-2000    -0.77     0.00          24.23     0.00         Dependence
2001-2003    -0.78     0.00          27.06     0.00         Dependence
2004-2006    -0.78     0.00          18.53     0.00         Dependence

Note: W1 = 3 nearest neighbours is used in all cases.


Table 5.7: I_yx and LM_I Tests. Alternative Aggregations

             Bivariate Moran         LM_I
Periods      I_yx      p-value       LM_I      p-value      Conclusion
1995-2000    -0.72     0.00          22.64     0.00         Dependence
2001-2006    -0.81     0.00          24.56     0.00         Dependence
1995-2006    -0.78     0.00          26.01     0.00         Dependence

Note: W1 = 3 nearest neighbours is used in all cases.
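For reference, a common form of the bivariate Moran statistic standardises both variables and correlates y with the spatial lag of x, I_yx = z_y' W z_x / n, with significance assessed by permuting x across locations. The I_yx reported in Tables 5.6 and 5.7 is defined earlier in the thesis and may differ in scaling or in how its p-values are obtained, so this sketch, including its ring-shaped toy matrix and invented data, is only illustrative.

```python
from __future__ import annotations

import random
from statistics import mean, pstdev


def standardise(v: list[float]) -> list[float]:
    m, s = mean(v), pstdev(v)
    return [(a - m) / s for a in v]


def bivariate_moran(y, x, W) -> float:
    """I_yx = z_y' W z_x / n for a row-standardised W."""
    zy, zx = standardise(y), standardise(x)
    lag_zx = [sum(w * b for w, b in zip(row, zx)) for row in W]
    return sum(a * b for a, b in zip(zy, lag_zx)) / len(y)


def moran_permutation_test(y, x, W, perms=199, seed=0):
    """Observed I_yx and a two-sided permutation p-value (x is shuffled)."""
    rng = random.Random(seed)
    observed = bivariate_moran(y, x, W)
    shuffled = list(x)
    extreme = 0
    for _ in range(perms):
        rng.shuffle(shuffled)
        if abs(bivariate_moran(y, shuffled, W)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (perms + 1)


def ring_weights(n: int) -> list[list[float]]:
    """Hypothetical contiguity on a ring: each unit's two adjacent units."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = W[i][(i + 1) % n] = 0.5
    return W


# Toy example: a smooth spatial trend in x mirrored (negatively) in y
n = 40
x = [float(min(i, n - i)) for i in range(n)]
y = [-v for v in x]
print(moran_permutation_test(y, x, ring_weights(n)))
```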

Tables 5.8 and 5.9 show the results of the permutation-based non-parametric tests. As with the parametric approach, we detect spatial dependence in all periods, with one exception: in 2004-2006, spatial dependence is detected only in a univariate sense (intra-dependence). This could be due to the choice of weighting matrix, as W1 was used in all cases. However, when this matrix is replaced by W4, the matrix selected by the conditional entropy measure for this period, the results do not change: only intra-dependence is detected.

Table 5.8: Ψ1 and Ψ2 Tests. 3-year Periods

H0:          Intra-inter independence    Inter independence
Periods      Ψ1       p-value            Ψ2       p-value      Conclusion (dependence)
1995-1997    1.29     0.00               0.10     0.04         inter-intra
1998-2000    1.76     0.00               0.12     0.02         inter-intra
2001-2003    1.88     0.00               0.13     0.01         inter-intra
2004-2006    1.65     0.00               0.04     0.55         intra

Note: Perms: 200 in all cases. m = 4, 3 nearest neighbours.

In the case of the alternative aggregations, Table 5.9 shows the results corresponding to different time intervals.


Table 5.9: Ψ1 and Ψ2 Tests. Alternative Aggregations

H0:          Intra-inter independence    Inter independence
Periods      Ψ1       p-value            Ψ2       p-value      Conclusion (dependence)
1995-2000    1.76     0.00               0.18     0.00         inter-intra
2001-2006    1.96     0.00               0.17     0.00         inter-intra
1995-2006    1.87     0.00               0.20     0.00         inter-intra

Note: Perms: 200 in all cases. m = 4, 3 nearest neighbours.

In sum, the non-parametric results imply that there can be no spatial causation between unemployment and migration in 2004-2006, because the two variables show no inter-dependence in this period. Despite this result, we keep the 2004-2006 interval in the analysis to check whether the δ̂(Y_W, X_W) statistic detects causality.
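The conclusions reported in Tables 5.8 and 5.9 follow a simple decision rule on the two p-values, which can be written down explicitly. The rule below is inferred from those tables and assumes a 5% significance level; it is a sketch, not a restatement of the formal procedure given earlier in the thesis.

```python
def dependence_type(p_psi1: float, p_psi2: float, alpha: float = 0.05) -> str:
    """Classify spatial dependence from the Psi1 and Psi2 permutation p-values.

    Psi1 tests H0: intra-inter independence; Psi2 tests H0: inter independence.
    """
    if p_psi1 >= alpha:
        return "independence"
    if p_psi2 < alpha:
        return "inter-intra"
    return "intra"


# p-values (Psi1, Psi2) transcribed from Table 5.8
table_5_8 = {"1995-1997": (0.00, 0.04), "1998-2000": (0.00, 0.02),
             "2001-2003": (0.00, 0.01), "2004-2006": (0.00, 0.55)}
for period, (p1, p2) in table_5_8.items():
    print(period, dependence_type(p1, p2))
```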

5.4.4 Step 3: Spatial Causality

Having detected spatial dependence between the series (with the exception of the interval 2004-2006 in the non-parametric case), we move on to the causality tests. For the parametric approach, the Lagrange Multiplier test LM_NC appears to be the best option, so the iterated prediction procedure is ruled out. For the non-parametric approach, given the sample size, we use the simple version of the δ̂(Y_W, X_W) test.

5.4.4.1 Parametric Test

The LM_NC results are shown in Tables 5.10 and 5.11.

Table 5.10: LM_NC Test. 3-year Periods

H0:          Unemp. ⇏ Migr.         Migr. ⇏ Unemp.
Periods      LM_NC     p-value      LM_NC     p-value      Conclusion
1995-1997    241.6     0.00         257.6     0.00         Migr. ⇔ Unemp.
1998-2000    474.8     0.00         451.9     0.00         Migr. ⇔ Unemp.
2001-2003    418.5     0.00         575.5     0.00         Migr. ⇔ Unemp.
2004-2006    389.5     0.00         439.2     0.00         Migr. ⇔ Unemp.

Note: “⇏” means does not cause and “⇔” means bidirectionality. Weighting matrix: W1.


In all cases and for all time intervals, the null hypothesis of non-causality is rejected in both directions, so we are unable to detect a single direction in the information flow. We therefore conclude that there is spatial bidirectionality between the variables studied.

Table 5.11: LM_NC Test. Alternative Aggregations

H0:          Unemp. ⇏ Migr.         Migr. ⇏ Unemp.
Periods      LM_NC     p-value      LM_NC     p-value      Conclusion
1995-2000    357.5     0.00         366.4     0.00         Migr. ⇔ Unemp.
2001-2006    480.3     0.00         738.1     0.00         Migr. ⇔ Unemp.
1995-2006    465.7     0.00         522.7     0.00         Migr. ⇔ Unemp.

Note: “⇏” means does not cause and “⇔” means bidirectionality. Weighting matrix: W1.

5.4.4.2 Non-parametric Test

Table 5.12 shows the values of the δ̂(Y_W, X_W) test for the null hypothesis of non-causality from unemployment to migration and from migration to unemployment, together with the p-value of each statistic based on 200 permutations. The permutation results are obtained by independently re-sampling both variables; similar conclusions are obtained when only the cause variable is re-sampled.
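The two re-sampling schemes mentioned above can be sketched generically. The statistic is passed in as a function because δ̂(Y_W, X_W) itself is defined earlier in the thesis and is not reproduced here; the one-sided rule (large values count as evidence against H0), the (count + 1)/(perms + 1) convention and the toy cross-product statistic are assumptions.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence

Statistic = Callable[[Sequence[float], Sequence[float]], float]


def permutation_pvalue(stat: Statistic, effect: Sequence[float],
                       cause: Sequence[float], perms: int = 200,
                       resample_both: bool = True, seed: int = 0) -> float:
    """Permutation p-value for a statistic whose large values reject H0.

    resample_both=True shuffles both variables independently; otherwise
    only the candidate cause variable is shuffled.
    """
    rng = random.Random(seed)
    observed = stat(effect, cause)
    eff, cau = list(effect), list(cause)
    exceed = 0
    for _ in range(perms):
        rng.shuffle(cau)
        if resample_both:
            rng.shuffle(eff)
        if stat(eff, cau) >= observed:
            exceed += 1
    return (exceed + 1) / (perms + 1)


def cross_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Toy statistic (hypothetical stand-in for the real one)."""
    return sum(u * v for u, v in zip(a, b))


values = [float(i) for i in range(50)]
print(permutation_pvalue(cross_product, values, values))
print(permutation_pvalue(cross_product, values, values, resample_both=False))
```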

Table 5.12: δ̂(Y_W, X_W) Test. 3-year Periods

H0:          Unemp. ⇏ Migr.             Migr. ⇏ Unemp.
Periods      δ̂(Y_W, X_W)   p-value     δ̂(Y_W, X_W)   p-value     Conclusion
1995-1997    0.06           0.28        0.19           0.00        Migr. ⇒ Unemp.
1998-2000    0.06           0.44        0.18           0.00        Migr. ⇒ Unemp.
2001-2003    0.05           0.46        0.15           0.00        Migr. ⇒ Unemp.
2004-2006    0.06           0.42        0.07           0.21        Dependence

Note: Perms: 200. “⇏” means does not cause and “⇒” means causes. Weighting matrix: W1, m = 4.


The results reveal causality in information for the first three intervals, whereas only dependence (as expected) is detected between the two variables in 2004-2006. Table 5.13 shows the results for the alternative temporal aggregations, which are consistent with those obtained previously. Spatial causality from net migration to unemployment is detected in 1995-2000 and for the whole period 1995-2006. In 2001-2006, the inclusion of the 2004-2006 interval, in which the variables are independent, weakens the direction of the information flow.

Table 5.13: δ̂(Y_W, X_W) Test. Alternative Aggregations

H0:          Unemp. ⇏ Migr.             Migr. ⇏ Unemp.
Periods      δ̂(Y_W, X_W)   p-value     δ̂(Y_W, X_W)   p-value     Conclusion
1995-2000    0.05           0.56        0.17           0.00        Migr. ⇒ Unemp.
2001-2006    0.07           0.30        0.09           0.10        Dependence
1995-2006    0.04           0.68        0.15           0.00        Migr. ⇒ Unemp.

Note: Perms: 200. “⇏” means does not cause and “⇒” means causes. Weighting matrix: W1, m = 4.

In sum, the non-parametric approach detects a causal relationship from net migration to unemployment. For the 6-year aggregations, there is a clear direction in the information flow in 1995-2000, as in the 3-year periods, whereas in the second 6-year period, 2001-2006, only dependence is detected. For the whole period 1995-2006, we detect a clear causal direction in the information flow, from migration to unemployment.
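The conclusions in Tables 5.10 to 5.13 map the two non-causality p-values to a direction of information flow. The function below encodes that mapping, assuming a 5% level and that dependence has already been established in Step 2; it is illustrative, and the ASCII arrows stand in for the symbols used in the tables.

```python
def causal_conclusion(p_unemp_to_migr: float, p_migr_to_unemp: float,
                      alpha: float = 0.05) -> str:
    """Direction implied by two tests of non-causality (H0: A does not cause B)."""
    u_causes_m = p_unemp_to_migr < alpha  # reject H0: Unemp. does not cause Migr.
    m_causes_u = p_migr_to_unemp < alpha  # reject H0: Migr. does not cause Unemp.
    if u_causes_m and m_causes_u:
        return "Migr. <=> Unemp."
    if m_causes_u:
        return "Migr. => Unemp."
    if u_causes_m:
        return "Unemp. => Migr."
    return "Dependence"


# p-values of the delta test, Table 5.12 (Unemp. -> Migr., Migr. -> Unemp.)
table_5_12 = [(0.28, 0.00), (0.44, 0.00), (0.46, 0.00), (0.42, 0.21)]
print([causal_conclusion(a, b) for a, b in table_5_12])
```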


5.5 Conclusions

This chapter investigated the relationship between unemployment and net regional migration in the Italian provinces. The topic is particularly relevant, as there are theoretical arguments supporting either direction of causality, depending on the initial theoretical assumptions. Applying the procedure for detecting spatial causality, we have reached several interesting conclusions. The analysis of the spatial structure and of bivariate dependence shows that there is strong dependence between the variables in the cross-sections corresponding to the 3-year averages. This leads us to the analysis of spatial causality, in order to detect a single direction in the information flow between the variables. Only in the last interval, 2004-2006, did the non-parametric approach fail to detect dependence between the variables. The causality analysis was performed with the LM_NC and δ̂(Y_W, X_W) tests, which performed better in finite samples. In the parametric case, we are unable to identify a unique direction in the information flow; bidirectionality is detected instead. On the other hand, the results of the δ̂(Y_W, X_W) statistic show that there is spatial causality from migration to unemployment. It is important to note that this test does not provide the sign of the relationship, as individual parameters are not estimated. It is worth acknowledging the existence of strong non-linearities between the variables, which make the non-parametric approach more reliable. This result tends to support the theoretical arguments of the neoclassical and new economic geography paradigms. One limitation is that the sample comprises only 103 observations. Moreover, the analysis is valid only for bivariate relationships; a third variable, such as relative wages, could alter this relationship.



Appendix

Table 5.14: Italian Provinces

No. Province              No. Province              No. Province              No. Province
1   Torino                27  Venezia               53  Grosseto              79  Catanzaro
2   Vercelli              28  Padova                54  Perugia               80  Reggio di Calabria
3   Novara                29  Rovigo                55  Terni                 81  Trapani
4   Cuneo                 30  Udine                 56  Viterbo               82  Palermo
5   Asti                  31  Gorizia               57  Rieti                 83  Messina
6   Alessandria           32  Trieste               58  Roma                  84  Agrigento
7   Aosta                 33  Piacenza              59  Latina                85  Caltanissetta
8   Imperia               34  Parma                 60  Frosinone             86  Enna
9   Savona                35  Reggio nell’Emilia    61  Caserta               87  Catania
10  Genova                36  Modena                62  Benevento             88  Ragusa
11  La Spezia             37  Bologna               63  Napoli                89  Siracusa
12  Varese                38  Ferrara               64  Avellino              90  Sassari
13  Como                  39  Ravenna               65  Salerno               91  Nuoro
14  Sondrio               40  Forlì-Cesena          66  L’Aquila              92  Cagliari
15  Milano                41  Pesaro-Urbino         67  Teramo                93  Pordenone
16  Bergamo               42  Ancona                68  Pescara               94  Isernia
17  Brescia               43  Macerata              69  Chieti                95  Oristano
18  Pavia                 44  Ascoli Piceno         70  Campobasso            96  Biella
19  Cremona               45  Massa                 71  Foggia                97  Lecco
20  Mantova               46  Lucca                 72  Bari                  98  Lodi
21  Bolzano-Bozen         47  Pistoia               73  Taranto               99  Rimini
22  Trento                48  Firenze               74  Brindisi              100 Prato
23  Verona                49  Livorno               75  Lecce                 101 Crotone
24  Vicenza               50  Pisa                  76  Potenza               102 Vibo Valentia
25  Belluno               51  Arezzo                77  Matera                103 Verbano-Cusio-Ossola
26  Treviso               52  Siena                 78  Cosenza


6 Final Conclusions

This study has approached the analysis of spatial causality for a cross-section of data. We aimed to provide an operative concept of causality in space and proposed a series of steps for testing it empirically, based on both parametric and non-parametric approaches. In order to establish a definition of spatial causality, we analyzed different philosophical and economic traditions so as to clarify the concept. As our interest lies in spatial econometrics, where there is no control over the variables under study, it is natural to consider the inferential process approach as the most appropriate (Granger, 1969; Sims, 1980).

Within this general approach, our proposal adopts a definition of causality that emphasizes the concept of incremental informative content. Intuitively, it establishes that causality means that the cause variable provides additional information about the effect variable. In the parametric context of Chapter 3, this measure of information content led us to the concepts of sample variance and forecast error. As stated by Granger (1980), a variable will be causal if it contains unique information about the effect variable. As a result, the cause variable must help us explain, or forecast better, the effect variable (in terms of lower variance or mean absolute error). Chapter 4 considered a broader definition of information content. The key concept in this case is the term information in the sense of a “numerical quantity that captures the uncertainty in the result of an experiment to be conducted”. This definition makes direct reference to the entropy of a given information set.
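The entropy-based notion of information content can be illustrated with a plug-in estimate of Shannon entropy for discrete (already symbolized) data. The sketch below is only an illustration under that assumption. It is not the δ̂(Y_W, X_W) statistic of Chapter 4. It measures how much knowing a candidate cause X reduces the uncertainty about an effect Y, using H(Y) - H(Y | X). The function names `shannon_entropy`, `conditional_entropy` and `information_gain`, and the toy sequences, are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(symbols: Sequence[Hashable], base: float = 2.0) -> float:
    """Plug-in estimate of H = -sum_k p_k log p_k from observed symbol frequencies."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


def conditional_entropy(effect: Sequence[Hashable], cause: Sequence[Hashable],
                        base: float = 2.0) -> float:
    """H(effect | cause) = H(effect, cause) - H(cause)."""
    joint = list(zip(effect, cause))
    return shannon_entropy(joint, base) - shannon_entropy(cause, base)


def information_gain(effect: Sequence[Hashable], cause: Sequence[Hashable],
                     base: float = 2.0) -> float:
    """Reduction in uncertainty about `effect` once `cause` is known (mutual information)."""
    return shannon_entropy(effect, base) - conditional_entropy(effect, cause, base)


if __name__ == "__main__":
    y = [0, 1, 0, 1, 1, 0, 1, 0]              # hypothetical symbolized effect
    x_informative = list(y)                   # determines y exactly
    x_unrelated = [0, 0, 1, 1, 0, 0, 1, 1]    # hypothetical, independent of y in-sample
    print(information_gain(y, x_informative))
    print(information_gain(y, x_unrelated))
```

With these toy sequences, the informative candidate removes all uncertainty about y, a gain of one bit. The unrelated one leaves H(y | x) equal to H(y), so its gain is zero.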



6.1 Principal Contributions

The main contribution of our study is the provision of an operative concept of spatial causality and of a strategy capable of detecting it empirically (see Table 3.1). Alternative procedures or tests were included in each step of the strategy. These procedures may also be useful in other fields, not only for causal analysis. With regard to the problem of selecting a weighting matrix, there is only one formal test in the literature, the J-test. This step was enriched with the Conditional Entropy procedure, which is capable of detecting the correct order and number of neighbours even when the relationships are non-linear. We presented an alternative to the bivariate Moran test in the form of a Lagrange Multiplier, LM_I. In the non-linear case, the non-parametric Ψ1 test performs quite well for both intra- and inter-dependence. Depending on the characteristics of the spatial distribution of the data, either an asymptotic or a permutation version of the Ψ1 test can be used. The alternative permutation version, Ψ2, provides a second option for detecting dependence between variables. Detection of spatial dependence leads to the final step of the procedure, in which we try to detect the direction of the information flow between the variables. We propose three alternatives. In the parametric case, these are a Lagrange Multiplier, LM_NC, and a Granger-like iterative predictive approach. In the non-parametric case, it is the δ̂(Y_W, X_W) test. The predictive approach has severe limitations and is not capable of detecting causality in space. The performance of the LM_NC test is acceptable under specific conditions, of which linearity is the most stringent. When these conditions do not hold, the power of the test falls drastically. The non-parametric δ̂(Y_W, X_W) test meets the conditions required to detect the direction of the information flow between spatial variables.
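The permutation idea behind the dependence tests can be made concrete with the conventional bivariate Moran statistic (Wartenberg, 1985), the benchmark to which LM_I is offered as an alternative. The sketch below computes that statistic and a one-sided permutation pseudo p-value. It is not the LM_I, Ψ1 or Ψ2 statistic of this study. The line-shaped contiguity matrix, the function names, the seed and the one-sided alternative are assumptions made for illustration only.

```python
import numpy as np


def line_contiguity(n: int) -> np.ndarray:
    """Row-standardised W for n regions on a line (hypothetical layout).

    Region i is a neighbour of regions i - 1 and i + 1; requires n >= 2.
    """
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w / w.sum(axis=1, keepdims=True)


def bivariate_moran(x: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Bivariate Moran statistic I_xy = z_x' W z_y / S0.

    z_x and z_y are standardised with the population standard deviation,
    and S0 is the sum of all weights (S0 = n for a row-standardised W).
    """
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    return float(zx @ w @ zy) / w.sum()


def permutation_test(x, y, w, n_perm: int = 999, seed: int = 0):
    """One-sided pseudo p-value for positive spatial cross-correlation.

    Under the null, the values of y are randomly reassigned to locations,
    which destroys any spatial link between x and the neighbours' y.
    """
    rng = np.random.default_rng(seed)
    observed = bivariate_moran(x, y, w)
    as_extreme = sum(
        bivariate_moran(x, rng.permutation(y), w) >= observed
        for _ in range(n_perm)
    )
    return observed, (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    x = np.arange(20, dtype=float)   # hypothetical smooth spatial trend
    y = 2.0 * x + 1.0                # perfectly aligned second variable
    stat, p_value = permutation_test(x, y, line_contiguity(20))
    print(f"I_xy = {stat:.3f}, pseudo p-value = {p_value:.3f}")
```

With a perfectly aligned trend on 20 regions, I_xy is about 0.97. The permutation test should then return the minimum attainable pseudo p-value, 1/1000. The asymptotic alternative mentioned in the text would instead compare a standardized statistic with a reference distribution. That variant is not shown here.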


6.2 Future Lines of Research

The problem considered has deeper implications and highlights issues that have not yet been satisfactorily resolved in spatial econometrics. The problem of identification is one of them, as the use of a cross-section implies a large number of constraints on the parameters of a single spatial model. Paelinck and Nijkamp (1978) do not discuss the importance of economic theory for achieving a good identification of the model, but they do insist on the importance of explicitly recognizing the problem and the constraints derived from the need for identification. Exogeneity is another such issue. It is common practice to assume that the regressors of the equation are exogenous. This assumption facilitates subsequent inference, but it is not self-evident. In terms of general equilibrium models, the exogeneity constraint makes little sense. In strictly spatial terms, Tobler's (1970) well-known first law of geography, “everything is related to everything else, but near things are more related than distant things”, directly questions the universal validity of the principle of exogeneity. The recent literature has paid more attention to this issue, with the widespread use of instrumental variables and GMM procedures, as seen, for instance, in Anselin and Kelejian (1997), Fingleton and Le Gallo (2008) or Kelejian and Prucha (2009). However, there is little work on the characterization of a variable as exogenous or endogenous, on the types of variables that can be included in a cross-section specification, or on exogeneity tests. Ultimately, we return to the most elementary issue, namely the identification conditions of the model. In other words, we need to consider multivariate rather than univariate or bivariate equations. Future research will extend the analysis by including third variables, following the common cause principle proposed by Reichenbach (1956).

The common cause principle establishes that two variables are statistically dependent if, and only if, one variable causes the other or they share one or more common causes. Taking this principle into account helps avoid mistaking spurious dependence for causality.
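A small simulation shows the spurious-dependence problem that the common cause principle addresses. It assumes a hypothetical linear-Gaussian data-generating process in which Z drives both X and Y and there is no link between X and Y. Here cov(X, Y) = 1 and var(X) = var(Y) = 2, so the marginal correlation is about 1/2. The partial correlation given Z should be close to zero. Partial correlation is used only as a convenient conditioning device, valid as a test of conditional independence under joint normality. It is not the entropy-based machinery used in this study, and all names in the code are illustrative.

```python
import numpy as np


def residualize(v: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Residuals from an OLS regression of v on a constant and z."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def common_cause_demo(n: int = 10_000, seed: int = 1):
    """Simulate X <- Z -> Y with no direct link between X and Y (hypothetical DGP)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)           # common cause
    x = z + rng.normal(size=n)       # X depends on Z only
    y = z + rng.normal(size=n)       # Y depends on Z only
    marginal = np.corrcoef(x, y)[0, 1]
    partial = np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]
    return float(marginal), float(partial)


if __name__ == "__main__":
    marginal, partial = common_cause_demo()
    print(f"corr(X, Y) = {marginal:.3f}, corr(X, Y | Z) = {partial:.3f}")
```

A bivariate analysis of X and Y alone would report clear dependence. Conditioning on the common cause removes that dependence, which is the situation a multivariate extension should be able to detect.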


One alternative line of research is the detection of causality with the structural process approach. Also known as the natural experiment approach, it has been developed and defended by Heckman (2005) and Angrist and Krueger (1999, 2001). As we saw in Chapter 2, the basic idea is to observe the difference in effects between groups of individuals that can be called treatment and control groups. The methodology based on Information Theory can easily be adapted for this purpose. With regard to our empirical analysis, the sample size of 103 regions means that we can only apply the lowest informative level in the non-parametric case. Moreover, a third variable such as relative wages could be affecting the causal relationship. Future research will explore these alternatives.


Bibliography

[1] Alesina, A., Danninger, S. and M. Rostagno (1999). «Redistribution through public employment», IMF Working Paper, 177.
[2] Ancot, J., Kuiper, H. and J. Paelinck (1990). «Five principles of spatial econometrics illustrated», in Chatterji and Kuenne (eds.), Dynamics and conflict in regional structural change. London, Macmillan.
[3] Anderson, J. (1938). «The problem of causality», Australasian Journal of Psychology and Philosophy, 16, pp. 127-142.
[4] Angrist, J., Imbens, G. and D. Rubin (1996). «Identification of causal effects using instrumental variables», Journal of the American Statistical Association, 91, pp. 430-442.
[5] Angrist, J. and A. Krueger (1999). «Empirical strategies in labor economics», in Ashenfelter and Card (eds.), Handbook of Labor Economics, vol. 3A. Amsterdam, Elsevier, pp. 1277-1366.
[6] Angrist, J. and A. Krueger (2001). «Instrumental variables and the search for identification: From supply and demand to natural experiments», Journal of Economic Perspectives, 15(4), pp. 69-85.
[7] Anselin, L. (1988). Spatial econometrics: Methods and models. Dordrecht, Kluwer Academic Publishers.


[8] Anselin, L. (2002). «Under the hood. Issues in the specification and interpretation of spatial regression models», Agricultural Economics, 27(3), pp. 247-267.
[9] Anselin, L. and R. Florax (eds.) (1995). New directions in spatial econometrics. Berlin, Springer.
[10] Anselin, L., Florax, R. and S. Rey (eds.) (2004). Advances in spatial econometrics: Methodology, tools and applications. Berlin, Springer-Verlag.
[11] Anselin, L. and H. Kelejian (1997). «Testing for spatial error autocorrelation in the presence of endogenous regressors», International Regional Science Review, 20, pp. 153-182.
[12] Antolín, P. and O. Bover (1997). «Regional migration in Spain: The effect of personal characteristics and of unemployment, wage and house price differentials using pooled cross-sections», Oxford Bulletin of Economics and Statistics, 59(2), pp. 215-235.
[13] Arbia, G. (1989). Spatial data configuration in statistical analysis of regional economics and related problems. Dordrecht, Kluwer Academic Publishers.
[14] Arbia, G. (2006). Spatial econometrics: Statistical foundations and applications to regional convergence. Berlin, Springer.
[15] Aznar, A. (1989). Econometric model selection: A new approach. Dordrecht, Kluwer Academic Publishers.
[16] Aznar, A. and J. Trívez (1987). «Causal relationships between money and income in the Spanish economy», in P. Artus (ed.), Monetary policy: A theoretical and econometric approach. Dordrecht, Kluwer.
[17] Aznar, A. and J. Trívez (1988). «Relaciones entre causalidad, exogeneidad y predetermineidad», Estadística Española, 29, pp. 51-69.


[18] Basile, R. and M. Causi (2007). «Le determinanti dei flussi migratori nelle province italiane: 1991-2000», Economia e Lavoro, 2, pp. 139-159.
[19] Beauchamp, T. and A. Rosenberg (1981). Hume and the problem of causation. New York, Oxford University Press.
[20] Bertola, G. and P. Garibaldi (2003). «The structure and history of Italian unemployment», CESifo Working Paper Series 907, CESifo Group Munich.
[21] Blanchard, O. and L. Katz (1992). «Regional evolutions», Brookings Papers on Economic Activity, Economic Studies Program, The Brookings Institution, 23(1992-1), pp. 1-76.
[22] Blommestein, H. and P. Nijkamp (1981). «Soft spatial econometric causality model», working paper presented at the Seventh Pacific Regional Science Conference, Surfers Paradise, Australia.
[23] Brady, H. (2003). «Models of causal inference: Going beyond the Neyman-Rubin-Holland theory», Survey Research Center and UC DATA, University of California, Berkeley.
[24] Brockwell, P. and R. Davis (2003). Introduction to time series and forecasting. Berlin, Springer.
[25] Burridge, P. and B. Fingleton (2010). «Improving the J test in the SARAR model by likelihood-based estimation», 9th International Workshop in Spatial Econometrics and Statistics, 24-25 June, Orléans, France.
[26] Cartwright, N. (1983). How the laws of physics lie. Oxford, Clarendon.
[27] Cartwright, N. (1989). Nature’s capacities and their measurement. New York, Oxford University Press.
[28] Cartwright, N. (1999). «Causal diversity and the Markov condition», Synthese, 12, pp. 3-27.


[29] Charemza, W. and D. Deadman (1997). New directions in econometric practice. Cheltenham, Edward Elgar.
[30] Collingwood, R. (1940). An essay on metaphysics. Oxford, Clarendon Press.
[31] Coulon, A. (2005). «More on the employment effect of recent immigration: Switzerland in the 90’s», Working Paper, London School of Economics, pp. 1-17.
[32] Cover, T. and J. Thomas (1991). Elements of information theory. New York, John Wiley and Sons.
[33] Cressie, N. (1993). Statistics for spatial data. New York, Wiley.
[34] Czaplewski, R. and R. Reich (1993). «Expected value and variance of Moran’s bivariate spatial autocorrelation statistic under permutation», USDA Forest Service Research Paper RM-309, Fort Collins, Colorado, pp. 1-13.
[35] Da Vanzo, J. (1978). «Does unemployment affect migration? Evidence from microdata», Review of Economics and Statistics, 60, pp. 504-514.
[36] Davidson, R. and J. G. MacKinnon (1981). «Several tests for model specification in the presence of alternative hypotheses», Econometrica, 49, pp. 781-793.
[37] Dawid, A. (2000). «Causal inference without counterfactuals», Journal of the American Statistical Association, 86, pp. 9-26.
[38] Demiralp, S. and K. Hoover (2003). «Searching for the causal structure of a vector autoregression», Oxford Bulletin of Economics and Statistics, 65 (supplement), pp. 745-767.
[39] Demiralp, S., Hoover, K. and S. Perez (2008). «A bootstrap method for identifying and evaluating a structural vector autoregression», Oxford Bulletin of Economics and Statistics, 70(4), pp. 509-533.


[40] Díaz-Emparanza, I. and A. Espinosa (2000). «Análisis de la relación entre el desempleo y la inmigración internacional», Documentos de Trabajo Biltoki, 013.
[41] Domingo, C. (2003). Condiciones de causalidad en modelos econométricos. Ph.D. Thesis, University of Zaragoza, Spain.
[42] Dowe, P. (2000). «Is causation influence?», unpublished working paper.
[43] Dupres, J. and N. Cartwright (1988). «Probability and causality: Why Hume and indeterminism don’t mix», Noûs, 22, pp. 521-536.
[44] Efron, B. and R. Tibshirani (1993). An introduction to the bootstrap. New York, Chapman and Hall.
[45] Elhorst, J. (2003). «The mystery of regional unemployment differentials: Theoretical and empirical explanations», Journal of Economic Surveys, 17, pp. 709-748.
[46] Epifani, P. and G. Gancia (2005). «Trade, migration and regional unemployment», Regional Science and Urban Economics, 35, pp. 625-644.
[47] Etzo, I. (2008). «Internal migration: A review of the literature», MPRA Paper 8783, University Library of Munich, Germany.
[48] Faini, R., Galli, G., Gennari, P. and F. Rossi (1997). «An empirical puzzle: Falling migration and growing unemployment differentials among Italian regions», European Economic Review, 4, pp. 571-579.
[49] Favero, C. and D. Hendry (1992). «Testing the Lucas critique: A review», Econometric Reviews, 11(3), pp. 265-306.
[50] Feridun, M. (2004). «Does immigration have an impact on economic development and unemployment? Empirical evidence from Finland (1981-2001)», International Journal of Applied Econometrics and Quantitative Studies, 1, pp. 39-60.


[51] Feridun, M. (2005). «Economic impact of immigration on the host country: The case of Norway», Prague Economic Papers, 4, pp. 350-362.
[52] Feridun, M. (2007). «Immigration, income and unemployment: An application of the bounds testing approach to cointegration», Journal of Developing Areas, 41(1), pp. 37-51.
[53] Fingleton, B. and J. Le Gallo (2008). «Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: Finite sample properties», Papers in Regional Science, 87, pp. 319-339.
[54] Fisher, R. (1921). «On mathematical foundations of theoretical statistics», Philosophical Transactions of the Royal Society of London, Series A, 222, pp. 309-368.
[55] Freedman, D. and P. Humphreys (1998). «Are there algorithms that discover causal structure?», UCB Statistics Technical Report, 514.
[56] Galloway, R. and J. Jozefowicz (2008). «The effects of immigration on regional unemployment rates in the Netherlands», International Advances in Economic Research, 14(3), pp. 291-302.
[57] Gasking, D. (1955). «Causation and recipes», Mind, New Series, 64(265), pp. 479-487.
[58] Getis, A. and J. Aldstadt (2004). «Constructing the spatial weights matrix using a local statistic», Geographical Analysis, 36(2), pp. 90-104.
[59] Getis, A., Mur, J. and H. Zoller (eds.) (2004). Spatial econometrics and spatial statistics. Houndmills, Palgrave Macmillan.
[60] Geweke, J. (1984). «Inference and causality in economic time series models», in Griliches and Intriligator (eds.), Handbook of econometrics, vol. 2, Chapter 19. Amsterdam, North Holland, pp. 1101-1144.


[61] Glennan, S. (1996). «Mechanisms and the nature of causation», Erkenntnis, 44, pp. 49-71.
[62] Glennan, S. (2002). «Rethinking mechanistic explanations», in Barrett and Alexander (eds.), PSA 2000 Part II Symposia Papers, Supplement to Philosophy of Science, 69(3), pp. S342-S353.
[63] Glymour, C. (1999). «Rabbit hunting», Synthese, 121, pp. 55-78.
[64] Good, I. (1961). «A causal calculus I», British Journal for the Philosophy of Science, 11, pp. 305-318.
[65] Good, I. (1962). «A causal calculus II», British Journal for the Philosophy of Science, 12, pp. 43-51.
[66] Granger, C. (1969). «Investigating causal relations by econometric models and cross-spectral methods», Econometrica, 37(3), pp. 424-438.
[67] Granger, C. (1980). «Testing for causality: A personal viewpoint», Journal of Economic Dynamics and Control, 2(4), pp. 329-352.
[68] Greenwood, M. (1975). «Research on internal migration in the United States: A survey», Journal of Economic Literature, 13, pp. 397-433.
[69] Greenwood, M. (1985). «Human migration: Theory, models, and empirical studies», Journal of Regional Science, 25(4), pp. 521-544.
[70] Griffith, D. (2003). Spatial autocorrelation and spatial filtering: Gaining understanding through theory and scientific visualization. Berlin, Springer-Verlag.
[71] Groenewold, N. (1997). «Does migration equalize regional unemployment rates? Evidence from Australia», Papers in Regional Science, 76(1), pp. 1-20.
[72] Gross, D. (2004). «Impact of immigrant workers on a regional labour market», Applied Economics Letters, 11, pp. 405-408.


[73] Haavelmo, T. (1944). «The probability approach in econometrics», Econometrica, 12 (supplement), July.
[74] Hao, B. and W. Zheng (1998). Applied symbolic dynamics and chaos. Singapore, World Scientific.
[75] Hausman, D. (1989). «Economic methodology in a nutshell», Journal of Economic Perspectives, 3(2), pp. 115-127.
[76] Hausman, D. (1998). Causal asymmetries. Cambridge, Cambridge University Press.
[77] Heckman, J. (1999). «Causal parameters and policy analysis in economics: A twentieth century retrospective», National Bureau of Economic Research, Working Paper 7333.
[78] Heckman, J. (2005). «The scientific model of causality», Sociological Methodology, 35, pp. 1-97.
[79] Hendry, D. (1995). Dynamic econometrics. Oxford, Oxford University Press.
[80] Herzog, H., Schlottmann, A. and T. Boehm (1993). «Migration as spatial job-search: A survey of empirical findings», Regional Studies, 27(4), pp. 327-340.
[81] Holland, P. (1986). «Statistics and causal inference» (with discussion), Journal of the American Statistical Association, 81, pp. 945-970.
[82] Hood, W. and T. Koopmans (eds.) (1953). «Studies in econometric method», Cowles Commission Monograph 14. New Haven, Yale University Press.
[83] Hoover, K. (1990). «The logic of causal inference: Econometrics and the conditional analysis of causality», Economics and Philosophy, 6(2), pp. 207-234.
[84] Hoover, K. (2001). Causality in macroeconomics. Cambridge, Cambridge University Press.


[85] Hoover, K. (2004). «Lost causes», Journal of the History of Economic Thought, 26(2), pp. 149-164.
[86] Hoover, K. (2006). «Causality in economics and econometrics», in Durlauf and Blume (eds.), The New Palgrave Dictionary of Economics.
[87] Horwich, P. (1987). Asymmetries in time: Problems in the philosophy of science. Cambridge, Cambridge University Press.
[88] Hughes, G. and B. McCormick (1994). «Does migration reduce differentials in regional unemployment rates?», in Van Dijk et al. (eds.), Migration and labor market adjustment, Kluwer Academic Publishers.
[89] Hume, D. (1739/2001). Un tratado sobre la naturaleza humana. Ensayo para introducir el método del razonamiento experimental en los asuntos morales. Edición Electrónica, Libros en la red, Diputación de Albacete.
[90] Hume, D. (1748/1975). An enquiry concerning human understanding. Oxford, Clarendon Press.
[91] Humphreys, P. and D. Freedman (1996). «The grand leap», British Journal for the Philosophy of Science, 47, pp. 113-123.
[92] Irzik, G. (2001). «Three dogmas of Humean causation», in Galavotti, Suppes and Costantini (eds.), Stochastic causality, CSLI Publications, pp. 85-101.
[93] Isard, W. (1971). Métodos de análisis regional: Una introducción a la ciencia regional. Barcelona, Ariel.
[94] Islam, A. (2007). «Immigration unemployment relationship: The evidence from Canada», Australian Economic Papers, 46(1), pp. 52-66.
[95] Kelejian, H. (2008). «A spatial J-test for model specification against a single or a set of non-nested alternatives», Letters in Spatial and Resource Sciences, 1, pp. 3-11.


[96] Kelejian, H. and I. Prucha (1998). «A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances», Journal of Real Estate Finance and Economics, 17, pp. 99-121.
[97] Kelejian, H. and I. Prucha (1999). «A generalized moments estimator for the autoregressive parameter in a spatial model», International Economic Review, 40, pp. 509-533.
[98] Kelejian, H. and I. Prucha (2007). «The relative efficiencies of various predictors in spatial econometric models containing spatial lags», Regional Science and Urban Economics, 37, pp. 363-374.
[99] Kelejian, H. and I. Prucha (2009). «Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances», Working Paper, Department of Economics, University of Maryland.
[100] King, G. (1997). A solution to the ecological inference problem: Reconstructing individual behavior from aggregate data. Princeton, Princeton University Press.
[101] Konya, L. (2000). «Bivariate causality between immigration and long-term unemployment in Australia, 1981-1998», Victoria University of Technology Working Paper, 18/00.
[102] Koopmans, T. (ed.) (1950). «Statistical inference in dynamic economic models», Cowles Commission Monograph 10. New York, Wiley.
[103] Krugman, P. (1991). «Increasing returns and economic geography», Journal of Political Economy, 99, pp. 483-499.
[104] Lee, H. (1992). «Maximum likelihood inference on cointegration and seasonal cointegration», Journal of Econometrics, 54, pp. 1-47.
[105] Lehmann, E. (1986). Testing statistical hypotheses. New York, John Wiley & Sons.


[106] Lewis, D. (1973). «Causation», The Journal of Philosophy, 70(17), pp. 556-567.
[107] Lewis, D. (1986). «Causal explanation», in Philosophical Papers, vol. II. Oxford, Oxford University Press.
[108] López, F., Matilla-García, M., Mur, J. and M. Ruiz Marín (2010). «A non-parametric spatial independence test using symbolic entropy», Regional Science and Urban Economics, 40(2-3), pp. 106-115.
[109] Lucas, R. (1976). «Econometric policy evaluation: A critique», in Brunner and Meltzer (eds.), The Phillips curve and labor markets. Carnegie-Rochester Conference Series on Public Policy, v. 11, Spring. Amsterdam, North Holland, pp. 161-168.
[110] Mackie, J. (1965/1993). «Causes and conditions», in Sosa and Tooley (eds.), Causation. Oxford, pp. 33-55.
[111] Mackie, J. (1974). The cement of the universe. A study of causation. Oxford, Oxford University Press.
[112] Marr, W. and P. Siklos (1994). «The link between immigration and unemployment in Canada», Journal of Policy Modeling, 16(1), pp. 1-25.
[113] Marr, W. and P. Siklos (1995). «Immigration and unemployment: A Canadian macroeconomic perspective», in DeVoretz (ed.), Diminishing returns: The economics of Canada’s recent immigration policy. Toronto, C.D. Howe Institute, pp. 293-330.
[114] Matilla-García, M. and M. Ruiz Marín (2008). «A non-parametric independence test using permutation entropy», Journal of Econometrics, 144, pp. 139-155.
[115] Matilla-García, M. and M. Ruiz Marín (2009). «Detection of non-linear structure in time series», Economics Letters, 105(1), pp. 1-6.


[116] Matilla-García, M. and M. Ruiz Marín (2010). «Spatial symbolic entropy: A tool for detecting the order of contiguity», Geographical Analysis, preprint.
[117] Meek, C. and C. Glymour (1994). «Conditioning and intervening», British Journal for the Philosophy of Science, 45(4), pp. 1001-1021.
[118] Menzies, P. (2002). «Counterfactual theories of causation», in Zalta (ed.), Stanford Encyclopedia of Philosophy. Stanford, Stanford University.
[119] Menzies, P. and M. Price (1993). «Causation as a secondary quality», British Journal for the Philosophy of Science, 44, pp. 187-203.
[120] Mill, J. S. (1843/1973). «A system of logic: Ratiocinative and inductive», in Robson (ed.), The collected works of John Stuart Mill, v. 7. Toronto, University of Toronto Press.
[121] Moran, P. (1950). «The interpretation of statistical maps», Journal of the Royal Statistical Society, Series B, 10, pp. 243-251.
[122] Morgan, M. (1990). The history of econometric ideas. Cambridge, Cambridge University Press.
[123] Mur, J. and J. Paelinck (2009). «Some issues on the concept of causality in spatial econometrics models», working paper presented at the III World Conference of Spatial Econometrics, Barcelona, Spain.
[124] Neyman, J. (1923). «Statistical problems in agricultural experimentation» (with discussion), Journal of the Royal Statistical Society Suppl., 2, pp. 107-180.
[125] Openshaw, S. and P. Taylor (1979). «A million or so correlation coefficients: Three experiments on the modifiable areal unit problem», in Wrigley (ed.), Statistical Applications in the Spatial Sciences. London, Pion, pp. 127-144.
[126] Pace, K. and J. LeSage (2009). Introduction to spatial econometrics. London, Chapman & Hall/CRC.


[127] Paelinck, J. (1983). Formal spatial economic analysis. Aldershot, Gower Press.
[128] Paelinck, J. and L. Klaassen (1979). Spatial econometrics. Saxon House.
[129] Paelinck, J. and P. Nijkamp (1978). Operational theory and method in regional economics. Saxon House.
[130] Pearl, J. (1995). «Causal diagrams for empirical research» (with discussion), Biometrika, 82, pp. 669-710.
[131] Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, Cambridge University Press.
[132] Pischke, J. and J. Velling (1997). «Employment effects of immigration to Germany: An analysis based on local labor markets», The Review of Economics and Statistics, 79(4), pp. 594-604.
[133] Pissarides, C. and I. McMaster (1990). «Regional migration, wages and unemployment: Empirical evidence and implications for policy», Oxford Economic Papers, Oxford University Press, 42(4), pp. 812-831.
[134] Pissarides, C. and J. Wadsworth (1989). «Unemployment and the inter-regional mobility of labour», Economic Journal, 99(397), pp. 739-755.
[135] Price, M. (1979). «Causality in temporal systems: A correction», Journal of Econometrics, 10, pp. 253-256.
[136] Reichenbach, H. (1956). The direction of time. Berkeley, University of California Press.
[137] Richardson, T. (1997). «A characterization of Markov equivalence for directed cyclic graphs», International Journal of Approximate Reasoning, 17(2/3), pp. 107-162.
[138] Richardson, T., Spirtes, P. and C. Glymour (1997). «A note on cyclic graphs and dynamical feedback systems», in Madigan and Smyth (eds.), Proceedings of Artificial Intelligence and Statistics, pp. 421-428.


[139] Rodríguez Feijoó, S. (1990). La causalidad en el marco econométrico: La determinación empírica de la causalidad. Ph.D. Thesis, University of Las Palmas de Gran Canaria, Spain.
[140] Rohatgi, V. (1976). An introduction to probability theory and mathematical statistics. New York, Wiley.
[141] Rubin, D. (1974). «Estimating causal effects of treatments in randomized and nonrandomized studies», Journal of Educational Psychology, 66, pp. 688-701.
[142] Ruiz, M., López, F. and A. Páez (2009). «Testing for spatial association of qualitative data using symbolic dynamics», Journal of Geographical Systems, doi:10.1007/s10109-009-0100-1.
[143] Russell, B. (1953). «On the notion of cause, with applications to the free-will problem», in Feigl and Brodbeck (eds.), Readings in the philosophy of science. New York, Appleton-Century-Crofts, pp. 387-407.
[144] Salmon, W. (1980). «Probabilistic causality», Pacific Philosophical Quarterly, 61, pp. 50-74.
[145] Sargent, T. (1976). «The observational equivalence of natural and unnatural rate theories of macroeconomics», Journal of Political Economy, 84, pp. 631-640.
[146] Shafer, J. (2000). «The triumph», The Journal of Philosophy, 97, April, pp. 165-181.
[147] Shan, J., Morris, A. and F. Sun (1999). «Immigration and unemployment: New evidence from Australia and New Zealand», International Review of Applied Economics, 13(2), pp. 253-258.
[148] Shannon, C. (1948). «A mathematical theory of communication», Bell System Technical Journal, 27, pp. 379-423 and pp. 623-656, July and October.


[149] Simon, H. (1953). «Causal order and identifiability», in Hood and Koopmans (eds.), Studies in Econometric Method, Cowles Commission Monograph 14. New Haven, Yale University Press.
[150] Simon, H. (1957). «Amounts of fixation and discovery in maze learning behavior», Psychometrika, 22(3), pp. 261-268.
[151] Sims, C. (1980). «Macroeconomics and reality», Econometrica, 48, pp. 1-48.
[152] Sims, C. (1982). «Policy analysis with econometric models», Brookings Papers on Economic Activity, pp. 107-152.
[153] Sims, C. (1986). «Are forecasting models usable for policy analysis?», Federal Reserve Bank of Minneapolis Quarterly Review, 10(1), Winter, pp. 2-15.
[154] Sobel, M. (1995). «Causal inference in the social and behavioral sciences», in Arminger, Clogg and Sobel (eds.), Handbook of statistical modeling for the social and behavioral sciences. New York, Plenum Press, pp. 1-38.
[155] Sobel, M. (1999). «Causal inference in the social sciences», Journal of the American Statistical Association, 95, pp. 647-651.
[156] Soofi, E. (1994). «Capturing the intangible concept of information», Journal of the American Statistical Association, 89(428), pp. 1243-1254.
[157] Soon, S. (1996). «Binomial approximation for dependent indicators», Statistica Sinica, 6(3), pp. 703-714.
[158] Spirtes, P., Glymour, C. and R. Scheines (2000). Causation, prediction, and search. 2nd edition. Cambridge, MA, MIT Press.
[159] Stern, D. (2000). «Applying recent developments in time series econometrics to the spatial domain», Professional Geographer, 52(1), pp. 37-49.
[160] Suppes, P. (1970). A probabilistic theory of causality. Acta Philosophica Fennica, Fasc. XXIV.


[161] Swanson, N. and C. Granger (1997). «Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions», Journal of the American Statistical Association, 92(1), pp. 357-367.
[162] Tian, G. and J. Shan (1999). «Do migrants rob jobs? New evidence from Australia», Australian Economic History Review, 39(2), pp. 133-142.
[163] Tiefelsdorf, M. (2000). Modelling spatial processes. The identification and analysis of spatial relationships in regression residuals by means of Moran’s I. Berlin, Springer.
[164] Tobler, W. (1970). «A computer movie simulating urban growth in the Detroit region», Economic Geography, 46, pp. 234-240.
[165] Trívez, J. (1986). Causalidad de Granger en un marco bivariante. Ph.D. Thesis, University of Zaragoza, Spain.
[166] Trívez, J. (1991). «Causalidad de Granger en un marco bivariante», Estadística Española, 33, pp. 131-154.
[167] Upton, G. and B. Fingleton (1985). Spatial data analysis by example, vol. 1. New York, Wiley.
[168] von Wright, G. (1971). Explanation and understanding. New York, Cornell University Press.
[169] Wartenberg, D. (1985). «Multivariate spatial correlation: A method for exploratory geographical analysis», Geographical Analysis, 17, pp. 263-283.
[170] Wiener, N. (1956). «The theory of prediction», in Beckenbach, E. (ed.), Modern Mathematics for Engineers. New York, McGraw-Hill.
[171] Winter-Ebmer, R. and J. Zweimüller (1994). «Do immigrants displace native workers? The Austrian experience», CEPR Discussion Paper 991, London.
[172] Withers, G. and D. Pope (1985). «Immigration and unemployment», The Economic Record, 61, pp. 554-563.


[173] Withers, G. and D. Pope (1993). «Do migrants rob jobs? Lessons from the Australian history 1961-1991», Journal of Economic History, 53, pp. 719-742.
[174] Woodward, J. (2000). «Explanation and invariance in the special sciences», British Journal for the Philosophy of Science, 51, pp. 197-254.
[175] Woodward, J. (2001). «Probabilistic causality, direct causes and counterfactual dependence», in Galavotti, Suppes and Costantini (eds.), Stochastic causality. Stanford, CSLI Publications, pp. 39-63.
[176] Woodward, J. (2002). «Causation and manipulability», in Zalta (ed.), Stanford Encyclopedia of Philosophy. Stanford, Stanford University.
[177] Zalta, E. (ed.) (2002). Stanford encyclopedia of philosophy. The Metaphysics Research Lab, Center for the Study of Language and Information. Stanford, Stanford University.
[178] Zellner, A. (1979). «Causality and econometrics», in Brunner and Meltzer (eds.), Three aspects of policy making: Knowledge, data and institutions, Carnegie-Rochester Conference Series on Public Policy, vol. 10. Amsterdam, North-Holland, pp. 9-54.

