Geostatistical Methods for the Identification of Flow and Transport Parameters in the Subsurface

Thesis accepted by the Faculty of Civil and Environmental Engineering of the Universität Stuttgart in fulfillment of the requirements for the degree of Doktor der Ingenieurwissenschaften (Dr.-Ing.)

Submitted by

Wolfgang Nowak

born in Esslingen am Neckar

Referees (Hauptberichter and Nebenberichter):

Prof. Dr. András Bárdossy, Prof. Peter K. Kitanidis, PD Dr. Olaf A. Cirpka

Date of the oral examination: November 8, 2004

Institut für Wasserbau der Universität Stuttgart

2005

D93

Geostatistical Methods for the Identification of Flow and Transport Parameters in the Subsurface

CIP cataloging record (Deutsche Bibliothek):

Nowak, Wolfgang: Geostatistical Methods for the Identification of Flow and Transport Parameters in the Subsurface / von Wolfgang Nowak. Institut für Wasserbau, Universität Stuttgart. Stuttgart: Inst. für Wasserbau, 2005 (Mitteilungen / Institut für Wasserbau, Universität Stuttgart; H. 134). Zugl.: Stuttgart, Univ., Diss., 2005. ISBN 3-933761-37-9

There are no objections to the reproduction and translation of this work; it is merely requested that the source be cited. Published in 2005 by the Eigenverlag (in-house publisher) of the Institut für Wasserbau. Printed by Sprint-Druck, Stuttgart.

Acknowledgments

I would like to gratefully thank my supervisor, PD Dr. Olaf A. Cirpka, for the great supervision, guidance and constructive discussions. Furthermore, I owe a debt of gratitude to my co-supervisors Prof. Peter K. Kitanidis and Prof. András Bárdossy, and I would like to explicitly mention Prof. Peter K. Kitanidis and his group at Stanford University in thankful recognition of their hospitality. The experimental data used in this study were laboriously collected by my two colleagues Surabhin Jose and Arifur Rahman with the support of many research assistants. Thanks to the entire SMART work group for the friendly working environment. Sascha Tenkleve was very helpful in the work on spectral methods. Erika Bäcker, John Geier, Sascha Tenkleve and Insa Neuweiler contributed to this dissertation by reviewing certain chapters. This thesis was funded by the Deutsche Forschungsgemeinschaft under grants Ci 26/3-2 to Ci 26/3-4, “Experimental and Numerical Studies on Mixing of Reactive Compounds in Groundwater”.

Contents

List of Symbols and Abbreviations
Abstract
Zusammenfassung

1 Introduction
  1.1 Motivation
  1.2 Goal

2 Approaches for Inverse Modeling
  2.1 Computational Models
  2.2 Model Uncertainty
    2.2.1 Origins of Model Uncertainty
    2.2.2 Modeling under Uncertainty
    2.2.3 Challenges in Inverse Modeling
  2.3 Geostatistical Description of Heterogeneity
  2.4 Bayes Theorem
  2.5 Inverse Modeling in Subsurface Flow
    2.5.1 Deterministic Approaches
    2.5.2 Geostatistical Approaches
    2.5.3 Hybrid Approaches
  2.6 Geostatistical Inverse Modeling

3 Governing Equations
  3.1 Groundwater Flow
  3.2 Solute Transport
  3.3 Moment Generating Equations

4 Heterogeneity in Nature and Model
  4.1 Solute Transport in Heterogeneous Aquifers
    4.1.1 Transport Processes
    4.1.2 Flow in Heterogeneous Porous Media
    4.1.3 Dispersion of Solutes in Porous Media
    4.1.4 The Method of Moments
  4.2 Dispersion in Computational Models
    4.2.1 Macrotransport Theory
    4.2.2 Geostatistical Inverse Modeling
    4.2.3 Conditional Realizations
    4.2.4 Combined Approaches

5 Approach
  5.1 The Thesis
  5.2 Proposed New Method
    5.2.1 Outline of the Proposed New Method
    5.2.2 Temporal Moments of Local Breakthrough Curves
    5.2.3 Properties of the Dispersion Coefficient
    5.2.4 Discretization and Resolution
  5.3 Detailed Task Description

6 Quasi-Linear Geostatistical Inversing
  6.1 Linear Cokriging
    6.1.1 Cokriging with Uncertain Mean
    6.1.2 Properties of Cokriging
    6.1.3 Bad Condition of the Cokriging Matrix
  6.2 Quasi-Linear Geostatistical Approach
    6.2.1 Introduction
    6.2.2 Successive Linearization
    6.2.3 Conventional Iteration Algorithm
    6.2.4 Form of the Solution
    6.2.5 Drawbacks of the Conventional Algorithm
  6.3 Modified Levenberg-Marquardt Algorithm
    6.3.1 Properties of the Modified Algorithm
    6.3.2 Step Size Control
    6.3.3 Application to Known and Unknown Mean
  6.4 Identification of Structural Parameters
  6.5 Performance Test
  6.6 Summary and Conclusions

7 Sensitivity Analysis
  7.1 Outline of the Adjoint-State Method
  7.2 Small Perturbation Analysis
    7.2.1 Log-Conductivity
    7.2.2 Log-Dispersion Coefficient
  7.3 Weak Formulations
    7.3.1 Log-Conductivity
    7.3.2 Log-Dispersion Coefficient
  7.4 Adjoint State Sensitivities
    7.4.1 Sensitivities with Respect to log K
    7.4.2 Sensitivities with Respect to log D
  7.5 On Practical Application
  7.6 Graphical Examples and Discussion

8 Spectral Methods for Geostatistics
  8.1 Basic Matrix Operations in Geostatistics
    8.1.1 Convolution-Type Operations
    8.1.2 Decomposition-Type Operations
    8.1.3 Deconvolution-Type Operations
  8.2 Computational Costs
    8.2.1 Convolution-Type Operations
    8.2.2 Decomposition-Type Operations
    8.2.3 Deconvolution-Type Operations
  8.3 Structured Matrices
    8.3.1 Diagonal Matrices
    8.3.2 Block Matrices
    8.3.3 Toeplitz Matrices
    8.3.4 Circulant Matrices
    8.3.5 Vandermonde Matrices and the Discrete Fourier Matrix
    8.3.6 Properties and Structure of Covariance Matrices
  8.4 Circulant Embedding
    8.4.1 Graphical Example
    8.4.2 Mathematical Description
    8.4.3 Embedding Size
  8.5 Convolutions
    8.5.1 Matrix-Vector Products
    8.5.2 Bilinear and Quadratic Forms
    8.5.3 Matrix-Matrix Multiplications
    8.5.4 Exemplification
  8.6 Realizations of Non-Stationary Random Fields
    8.6.1 Stationary Random Fields
    8.6.2 Non-Stationary Random Fields
    8.6.3 Illustrative Examples
  8.7 Deconvolution
    8.7.1 Preconditioned Conjugate Gradients
    8.7.2 Circulant PCG for Toeplitz Systems
    8.7.3 Test Cases
  8.8 Summary and Conclusions

9 Finite Element Formulations
  9.1 Discretization by the Finite Element Method
    9.1.1 Interpolation
    9.1.2 Method of Weighted Residuals
  9.2 Groundwater Flow
  9.3 Temporal Moments
  9.4 Adjoint State Equations
  9.5 Post-processing for Sensitivities
  9.6 Computational Speedup

10 Application to Artificial Data
  10.1 Basic Test Case
  10.2 Input Data and Properties
    10.2.1 Magnitude of the Coefficient
    10.2.2 Estimated Breakthrough Curves
    10.2.3 Longitudinal and Transverse Character
  10.3 Structural Parameters
  10.4 Stability Analysis
    10.4.1 Interdependencies
    10.4.2 Normalized Second Central Moment
  10.5 Summary and Conclusions

11 Application to Experimental Data
  11.1 Experimental Data Set
  11.2 Application
  11.3 Discussion
    11.3.1 Statistical Tests
    11.3.2 Data Quality and Quantity
    11.3.3 Verification
  11.4 Summary

12 Summary and Conclusions
  12.1 Summary
  12.2 Concluding Remarks
  12.3 Future Research

Bibliography

A Mathematical Tools
  A.1 Matrix Algebra
    A.1.1 Partitioned Matrices
    A.1.2 Trace
    A.1.3 Matrix Derivatives
  A.2 Integration Rules for R^n
  A.3 Analytical Expressions for Element Matrices

List of Figures

2.1 Forward modeling
2.2 Inverse modeling
2.3 Deterministic approach for flow model calibration
2.4 Three conditional realizations
4.1 Heterogeneity from pore scale to regional scale
4.2 Dispersive mechanisms
4.3 Local breakthrough curve associated with a pulse-like injection and its temporal moments
4.4 Macrodispersion, effective dispersion and plume size
4.5 Sampling density and dispersion in estimated conductivity fields
6.1 Test case for Quasi-Linear Geostatistical Inversing
7.1 Sensitivities with respect to log K and log Ds
8.1 Periodic embedding
8.2 Finite and periodic covariance function
8.3 Memory consumption for Qss
8.4 CPU time for quadratic forms of Toeplitz matrices
8.5 Different types of non-stationary realizations
8.6 Condition number of different Toeplitz matrices
8.7 Number of PCG iterations for different circulant preconditioners
8.8 Impact of filtering in circulant PCG
10.1 Test case: true log K distribution and resulting flow net and temporal moments from numerical simulation
10.2 Results of geostatistical inversing for an artificial data set extracted from the test case displayed in Figure 10.1
10.3 Effect of doubling the transverse resolution of the measurement grid
10.4 Effect of doubling the measurement error
10.5 Estimated log Ds and data quality and quantity
10.6 Simulated breakthrough curves for different data spacing
10.7 Impact of components in log Ds depending on input data
11.1 Filling of the sandbox
11.2 Microstructures within the sand lenses
11.3 Experimental and filtered truncated breakthrough curve
11.4 Results of geostatistical inversing for experimental data set
11.5 Standard deviation of estimation (log K and log Ds) for experimental data set
11.6 Orthonormal residuals for experimental data set
11.7 Comparison of sandbox filling and estimated log K
11.8 Tracer concentration in transverse experiment and in simulation

List of Tables

6.1 Parameters used in the test cases
10.1 Parameters for artificial test cases
11.1 Sand types for the sandbox
11.2 Parameters and prior knowledge used for experimental data set
11.3 Posterior parameter values from the experimental data set
11.4 Estimation variances σ² and posterior correlation coefficients r for the uncertain drift coefficients

List of Symbols and Abbreviations

As a general convention, regular letters like the concentration c or the molecular diffusion coefficient Dm denote scalars. Bold small letters denote vectors (usually column vectors), such as the velocity v or the data vector of observations y. Bold capital letters like the dispersion tensor D or the sensitivity matrix H denote matrices. For matrix indices, the first index corresponds to the row number and the second index to the column number.

Scalar Quantities

Symbol   Units               Description
c        [M L^-3]            Solute concentration
dx       [L]                 Grid spacing in x-direction
dy       [L]                 Grid spacing in y-direction
Da       [L^2 T^-1]          Apparent dispersion coefficient, specified by subscript
Dg       [L^2 T^-1]          Geometric mean of the dispersion coefficient
Dℓ       [L^2 T^-1]          Longitudinal dispersion coefficient
Dloc     [L^2 T^-1]          Local dispersion coefficient
Dm       [L^2 T^-1]          Molecular diffusion coefficient
Ds       [L^2 T^-1]          Scalar dispersion coefficient
Dt       [L^2 T^-1]          Transverse dispersion coefficient
i                            Index, imaginary number
j                            Index
k                            Iteration step number, order of temporal moments
K        [L T^-1]            Hydraulic conductivity
Kg       [L T^-1]            Geometric mean of conductivity
L        [L]                 Length
L                            Objective function
Lm                           Likelihood term in the objective function
Lp                           Prior term of the objective function
m                            Number of observations
mk       [M L^-3 T^(k+1)]    k-th temporal moment
m2c      [M L^-3 T^3]        Second central temporal moment
m2cn     [T^2]               Normalized second central temporal moment
n                            Number of unknown parameter values
p                            Probability density, number of base functions
P                            General parameter
t        [T]                 Time
x        [L]                 Spatial coordinate
y        [L]                 Spatial coordinate
Y                            Log-conductivity
Y'                           Perturbations of log-conductivity
z                            General state variable
z        [L]                 Spatial coordinate
Z                            Observation of a general state variable
αℓ       [L]                 Longitudinal dispersivity
αt       [L]                 Transverse dispersivity
γ                            Parameter regulating innovation and projection
Γ                            Boundary
ε                            Random vector
εc                           Complex random vector
εr                           Real random vector
λ                            Levenberg-Marquardt parameter for innovation
λx       [L]                 Correlation length in x-direction
λy       [L]                 Correlation length in y-direction
µ                            Mean value of quantity specified by subscript
ν                            Lagrange multiplier
Ξ                            Log-dispersion coefficient
Ξ'                           Perturbations of the log-dispersion coefficient
π                            Pi
σ                            Standard deviation of quantity specified by subscript
τ                            Levenberg-Marquardt parameter for projection
φ        [L]                 Hydraulic head
χ²                           Chi-square
ψ                            Adjoint state variable, specified by subscript
Ω                            Domain

Vectors and Matrices

Symbol   Units        Description
C                     Circulant matrix
D        [L^2 T^-1]   Dispersion tensor
f                     Model transfer function, m × 1
F                     Discrete Fourier matrix
G                     Generalized covariance matrix, specified by subscripts
H                     Sensitivity matrix, m × n
H̃k                    Sensitivity matrix in k-th iteration step, m × n
I                     Identity matrix
M                     Mapping matrix
n                     Normal vector
P                     Block of a partitioned matrix
q        [L T^-1]     Specific discharge
Q                     Covariance matrix, specified by subscripts
r                     Vector of measurement error, m × 1
r̂                     Residuals of the inverse model, m × 1
r̂n                    Orthonormalized residuals, m × 1
R                     Matrix of measurement error, m × m
s                     Vector of unknown values, n × 1
ŝ                     Posterior (conditional) mean of unknown parameters, n × 1
sk                    Current estimate at k-th iteration step, n × 1
S                     Diagonal scaling matrix
T                     Toeplitz matrix
u                     Vector to be convoluted
v                     Fourier transform of u
v        [L T^-1]     Velocity
V                     Vandermonde matrix
W                     Weighting matrix, weighting function (FEM)
X                     Matrix of discretized base functions for uncertain mean, n × p
y                     Vector of observations, m × 1
ŷ                     Observations reproduced by inverse model, m × 1
y'k                   Modified observations in k-th iteration step, m × 1
β                     Coefficients for base functions, p × 1
β*                    Prior mean value, p × 1
β̂                     Posterior mean values, p × 1
θ                     Structural parameters
θ*                    Prior estimate of structural parameters
λ                     Vector of eigenvalues
Λ                     Diagonal matrix of eigenvalues
ξ                     Cokriging weights, m × 1

Operators and Other Symbols

Symbol      Description
∂           Partial derivative
∆           Difference operator
∇           Nabla operator
(·)·(·)     Scalar product
(·)∘(·)     Hadamard (elementwise) product
δ(·)        Dirac delta function
δij         Kronecker delta
(·)^T       Matrix transpose (as superscript)
(·)^H       Hermitian matrix transpose (as superscript)
(·)^-1      Inverse (for matrices)
(·)^(1/2)   Symmetric square root decomposition (for matrices)

Abbreviations

BLUE   Best Linear Unbiased Estimator
FEM    Finite Element Method
FFT    Fast Fourier Transform
MAP    Maximum A Posteriori Likelihood Method
PCG    Preconditioned Conjugate Gradient
PDE    Partial Differential Equation
SLE    Successive Linear Estimator
SPDE   Stochastic Partial Differential Equation
SUPG   Streamline Upwind Petrov-Galerkin

Abstract

By definition, log-conductivity fields estimated by geostatistical inversing do not resolve the full variability of heterogeneous aquifers. In transport simulations, therefore, the dispersion of solute clouds is under-predicted. Macrotransport theory defines dispersion coefficients that parameterize the total magnitude of variability. Using these dispersion coefficients together with estimated conductivity fields would over-predict dispersion, since estimated conductivity fields already resolve some of the variability. To date, only a few methods allow the use of estimated conductivity fields in transport simulations. A review of these methods reveals that they are either associated with excessive computational costs, cover only special cases, or are merely approximate. Their predictions hold only in a stochastic sense and cannot take measurements of transport-related quantities into account in an explicit manner.

In this dissertation, I develop, implement and apply a new method for the geostatistical identification of flow and transport parameters in the subsurface. The parameters featured here are the log-conductivity and a scalar log-dispersion coefficient; the extension to other parameters such as retardation coefficients or reaction rates is straightforward. Geostatistical identification of flow parameters is well-known; the simultaneous identification together with transport parameters is new. To implement the new method, I develop a modified Levenberg-Marquardt algorithm for the Quasi-Linear Geostatistical Approach and extend the latter to the generalized case of uncertain prior knowledge. I derive the sensitivities of the state variables of interest with respect to the newly introduced scalar log-dispersion coefficient. Further, I summarize and extend the list of spectral methods that drastically speed up the expensive matrix operations involved in geostatistical inverse modeling.

If the quality and quantity of input data are sufficient, the new method accurately simulates the dispersive mechanisms of spreading, dilution and the irregular movement of the center of mass of a plume. It therefore adequately predicts the mixing of solute clouds and effective reaction rates in heterogeneous media. I perform extensive series of test cases to discuss and prove certain properties of the new method and the new dispersion coefficient. The character and magnitude of the identified dispersion coefficient depend strongly on the quality and quantity of input data and on their potential to resolve variability in the conductivity field. Because inverse models of transport are coupled to inverse models of flow, the information in the input data has to characterize the flow field sufficiently; otherwise, transport-related input data cannot be interpreted. Application to an experimental data set from a large-scale sandbox experiment and comparison to results from existing approaches in macrotransport theory show good agreement.

Zusammenfassung

Motivation

Groundwater is an important resource that is endangered worldwide, both in its quantity and its quality. Computational models for the prediction and risk assessment of groundwater quantity and quality are needed to ensure sustainable management. Such models are indispensable, in particular, for the planning and design of remediation measures and for assessing natural attenuation. To guarantee the reliability and predictive power of such models, the natural system in question must be represented in the model as accurately as possible. Unfortunately, aquifers are in general heterogeneous, and measurements of aquifer properties are expensive and prone to error. Hence, the parameters in computational models are uncertain and subject to error in almost all cases. If this uncertainty is not rigorously quantified, the unknown uncertainty of their predictions renders the models useless. Methods for model calibration must therefore account for the uncertainty and the heterogeneity of aquifers under all circumstances.

Uncertainty in the description of aquifers is best tackled in a stochastic framework. The fusion of model calibration and stochastics led to methods of geostatistical inversing. These treat the log-conductivity or other hydraulic parameters as random functions in space, characterized by their mean and a covariance function. Measurements of dependent quantities, such as hydraulic heads, are used to gain information about the parameters through conditioning. Since such input data convey only incomplete information, the estimated log-conductivity fields can never reproduce the full extent of the variability of a natural system. The development of such methods will continue to pose challenging research tasks in the future.

The most important mechanisms of solute transport in heterogeneous systems are advection and dispersion. Advection is the transport of dissolved compounds with the groundwater flow. Dispersion originates from the heterogeneity of the flow field and thus from the heterogeneity of the aquifer. Since estimated log-conductivity fields by definition cannot reflect the entire variability, using them in transport modeling underestimates the dispersion of dissolved compounds. Macrotransport theory defines dispersion coefficients that compensate for a complete neglect of variability in computational models. Because estimated log-conductivity fields already account for part of the variability, applying macroscale dispersion coefficients to estimated fields would overestimate dispersion. An erroneous description of dispersion leads to an erroneous description of dilution and mixing and, ultimately, to wrong predictions of chemical reactions between dissolved compounds in the subsurface.

To date, only a few methods exist that allow estimated conductivity fields to be used in transport modeling. These methods, however, are either associated with unacceptable computational effort, applicable only to special cases, or valid only approximately. All of them reproduce transport processes in heterogeneous media merely in a stochastic sense and make no explicit use of transport-related measurement data. The pertinent literature distinguishes three dispersive mechanisms: the irregular movement of the center of mass of solute clouds, spreading, and dilution. Existing methods can so far only mimic sums of these processes. With the available tools, it is impossible to include each of these mechanisms simultaneously, quantitatively and qualitatively, in computational models with estimated or calibrated parameter values for heterogeneous media.

Goal of this Work

The goal of this work is to develop, implement and apply a geostatistical method for the identification of parameter values for flow and transport in the subsurface. The method shall make the fullest possible use of all available measurement data and other information. In particular, it shall reflect dispersion in an adequate manner and to the correct extent, accounting separately for the uncertainty of the center of mass, spreading, and dilution. This is the basis for predicting the mixing of dissolved compounds and effective reaction rates in heterogeneous media.

Thesis and Approach

In this dissertation, I put forward the following thesis: Geostatistical inverse modeling yields smoothed parameter fields for the description of heterogeneous media. For the modeling of flow and transport processes, these smoothed fields are equivalent to the heterogeneous media if flow and transport processes are accounted for in the inverse modeling. To this end, the media must be described by fields of relevant parameters for both flow and transport processes, conditioned on measurements of state variables from flow and transport processes. As simple as this thesis may sound, it has to date been neither formulated nor implemented in the literature.

Based on this thesis, I have successfully derived, developed and applied a new method for the geostatistical identification of flow and transport parameters in the subsurface. The parameters considered in this dissertation are the log-conductivity and a scalar dispersion coefficient; an extension to further transport parameters is readily possible. The geostatistical identification of flow parameters is well-known in the literature; the simultaneous geostatistical identification of flow and transport parameters, however, is a concept applied for the first time in this dissertation.

As input data, I use hydraulic data and data from tracer experiments in order to describe both flow and transport processes. The hydraulic data considered here are measurements of conductivity, hydraulic head and total discharge. From tracer experiments, I obtain temporal moments of local breakthrough curves. To identify the parameter values, I use an existing Bayesian method for geostatistical parameter estimation. The spatial distribution of the dispersion coefficient identified in this way compensates for the amount of unresolved variability in the simultaneously identified conductivity field.

With the identified parameter fields, groundwater flow and advective-dispersive transport in the subsurface can be modeled. Each of the three dispersive mechanisms named above is thereby reproduced qualitatively and quantitatively, as far as the available input data permit. Given input data of sufficient quality, these fields can thus describe the mixing of solute clouds qualitatively and quantitatively and predict effective reaction rates in heterogeneous media.
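For orientation, the objective function minimized in this kind of Bayesian geostatistical estimation can be sketched in its generic form (the exact formulation, including the treatment of the uncertain drift coefficients β, follows in Chapter 6): with the log-conductivity and log-dispersion values collected in the parameter vector s,

$$
L(\mathbf{s}) \;=\; \underbrace{\tfrac{1}{2}\,\bigl(\mathbf{y}-\mathbf{f}(\mathbf{s})\bigr)^{T}\mathbf{R}^{-1}\bigl(\mathbf{y}-\mathbf{f}(\mathbf{s})\bigr)}_{L_m} \;+\; \underbrace{\tfrac{1}{2}\,\bigl(\mathbf{s}-\mathbf{X}\boldsymbol{\beta}\bigr)^{T}\mathbf{Q}_{ss}^{-1}\bigl(\mathbf{s}-\mathbf{X}\boldsymbol{\beta}\bigr)}_{L_p},
$$

in which y is the vector of observations, f(s) the model transfer function, R the matrix of measurement error, Xβ the uncertain prior mean and Q_ss the prior covariance matrix of the parameters.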

Improvements of Existing Methods and New Methods

The new method developed here is based on the Quasi-Linear Geostatistical Approach of Kitanidis (1995) [56]. I was able to speed up this approach considerably with the help of spectral (FFT-based) methods. I increased its stability by developing a specially adapted Levenberg-Marquardt algorithm in place of the originally used Gauss-Newton algorithm. Furthermore, I extended the approach by the concept of uncertain prior knowledge.

For the new types of input data and the newly introduced parameter, I analytically derived the corresponding sensitivities of the numerical models with the help of the method of adjoint states. The sensitivities are required to successively linearize the model functions in the course of the inversion. The newly derived sensitivities are those of the total discharge and of the second central temporal moment with respect to the log-conductivity, as well as those of the first and the second central temporal moment with respect to the log-dispersion coefficient.

In addition, I compiled a selection of spectral methods for Toeplitz matrices and extended it by new methods. With their help, all expensive matrix operations in the Quasi-Linear Geostatistical Approach can be carried out with ease. The spectral methods newly developed by me are the matrix-times-Toeplitz-matrix multiplication, the solution of ill-conditioned Toeplitz systems, and the generation of realizations with a certain kind of statistical non-stationarity. To further speed up the computations, I derived analytical solutions of certain element matrices in the finite element method with the help of third-order tensors. These analytical solutions for element matrices can be assembled into the global finite element system of equations in fully vectorized form.
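To illustrate the core idea behind these FFT-based methods, the following minimal MATLAB sketch (illustrative only, not the thesis code; the function name is hypothetical) multiplies a symmetric Toeplitz covariance matrix, given by its first column q, with a vector in O(n log n) operations via circulant embedding:

function y = toeplitz_matvec(q, x)
% Multiply the symmetric Toeplitz matrix with first column q by the vector x,
% using circulant embedding and the FFT instead of forming the full matrix.
n   = numel(q);
c   = [q(:); q(n-1:-1:2)];        % embed Toeplitz in a circulant of size 2n-2
lam = fft(c);                     % eigenvalues of the circulant matrix
xe  = [x(:); zeros(n-2, 1)];      % zero-pad x to the embedding size
ye  = ifft(lam .* fft(xe));       % circulant matrix-vector product
y   = real(ye(1:n));              % first n entries yield the Toeplitz product
end

For example, with q = exp(-(0:n-1)'/10), an exponential covariance sampled on a regular grid, toeplitz_matvec(q, x) agrees with toeplitz(q) * x up to round-off, but never stores the n-by-n matrix.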

Properties of the New Dispersion Coefficient

The newly introduced dispersion coefficient is scalar, logarithmic, effective and specific. In detail, this is explained as follows:

• I defined it as a scalar quantity because, owing to general limitations in the collection of measurement data, individual entries of a full dispersion tensor could not be identified.

• I defined it as a log-normally distributed quantity in order to guarantee its non-negativity.

• It is an effective dispersion coefficient because the input data used here describe effective dispersion in the sense of Dentz et al. [22].

• I call it a specific parameter because it specifically compensates for the lack of variability in the simultaneously estimated log-conductivity field.

Since the parameter is specific to the simultaneously identified log-conductivity field, its properties change with the quality and quantity of the available input data. In the limiting case that the input data fully resolve the variability of the log-conductivity field, the dispersion coefficient behaves approximately like a transverse local dispersion coefficient. In the limiting case of a complete absence of information, the identified log-conductivity field exhibits no variability at all; the identified coefficient then behaves like a longitudinal effective dispersion coefficient. This refers in particular to the order of magnitude of its values and to the mechanisms it describes.

Properties of the New Method

The predictive capability of the method naturally increases with the quality and quantity of the input data. The input data suffice to characterize the system almost completely if more than one measurement point of the input data per correlation length of the conductivity field is available. In this case, even complete breakthrough curves are reproduced well, although only low-order temporal moments of the breakthrough curves are used as input data. The improved optimization algorithm converged reliably in all test cases.

A stability analysis revealed some interesting aspects. In conventional modeling, flow is simulated first; transport processes are then computed in the resulting flow field. The simulation of transport cannot be more accurate than the simulation of flow. I find that the same holds for inverse modeling. Based on this, I have formulated the following requirements. First, the input data must sufficiently characterize the flow processes in question, since otherwise the transport-related input data are inevitably misinterpreted. Second, the measurement tolerances assigned to the transport-related data must include both the actual uncertainty of the measurement and the model error resulting from the estimated flow field. Third, the inverse modeling of transport cannot converge before the inverse modeling of flow. These requirements hold in principle for all applications in which models are to be calibrated simultaneously for flow and transport processes.

Application to Experimental Data

To explore the properties of the new method and the new coefficient, I carried out extensive test computations. The project within which this dissertation was written also provides for an application to experimental data. The aim of the experiments was to apply the theory of macroscale transport to real data from large-scale experiments under controlled conditions and to verify it. The members of the project group, Jose (2004) and Rahman (2004) [48, 76], performed large-scale laboratory experiments on longitudinal and transverse dispersion in a heterogeneously sand-filled experimental tank, and evaluated the resulting data with the help of existing theories. For verification, I identified the conductivity field and the spatial distribution of the scalar log-dispersion coefficient with my newly developed method. The evaluation and a comparison of the results showed the following:

• The algorithm converged reliably. The results were physically meaningful and passed all statistical tests applied. They reproduced the input data sensibly, with the exception of a subdomain near the inflow of the tank.

• The prediction of a further experiment, which I had not used for the inverse modeling, confirmed the correctness of the identified parameter distribution.

• The subdomain near the inflow suffers above all from increased uncertainty in the description of the flow field. This underlines the requirements regarding the reliability of hydraulic data.

• A direct comparison of the conductivity field with the filling pattern of the tank turned out positive, although the tank had not been filled according to a known geostatistical model.

• The properties of the newly introduced dispersion coefficient derived from theoretical considerations were confirmed in the application.

• The comparison with the results of other evaluation methods showed good agreement and thus confirmed the existing theories.

• Individual core zones of particularly strong mixing could be identified and explained mechanistically.

Concluding Remarks

Altogether, the newly developed and implemented method performed very well in applications both to series of artificial test cases and to experimental data. It provides reliable estimates of the spatial distribution of parameters for modeling flow and transport in heterogeneous media. The uncertainty of the center of mass of solute clouds as well as their spreading and dilution are described simultaneously and individually, both qualitatively and quantitatively, as far as the input data permit. This allows the mixing and reaction of compounds in the subsurface to be predicted while rigorously accounting for model uncertainties. As with all geostatistical methods, the quality of the identified parameter fields depends exclusively on the quality and quantity of the input data.

For field applications, the problem of numerical resolution will pose further interesting tasks. If the declared goal is to represent effective dispersion and local dilution in the model, point-like measurements of breakthrough curves from tracer experiments must be available as input data. The discretization of the involved numerical models must be fine enough to resolve flow and transport processes on such small scales. High resolutions with grid spacings of only a few millimeters, however, impose an upper limit on the size of the model domain. Increasing computational power and a further development of the method using more specialized numerical schemes and adaptive streamline-oriented grids will help here.

Chapter 1

Introduction

1.1 Motivation

Groundwater is an important resource, globally endangered both in its quantity and its quality. Computational models for the prediction and risk assessment of groundwater quantity and quality are required to ensure its sustainable management. In particular, the design of remediation strategies and the prediction of natural attenuation are impossible without such models. To assure the reliability and predictive power of computational models, it is essential to represent the natural system in question accurately. Unfortunately, aquifers are in general heterogeneous, and data on aquifer properties are expensive and prone to error. Therefore, the parameters in computational models are mostly uncertain. If one does not rigorously quantify the uncertainty of the model parameters, the unquantified uncertainty of the model output renders the models useless. Techniques for model calibration therefore have to account for aquifer heterogeneity and uncertainty.

The uncertainty of aquifer properties is best tackled in a stochastic framework. The fusion of model calibration and stochastics resulted in methods of geostatistical inversing. They treat the log-conductivity or other hydraulic parameters as random space variables, characterized by their mean values and their covariance functions. Measurements of dependent quantities, such as hydraulic heads, are used to infer information about the parameters. Since the input data convey only incomplete information, the resulting estimates of log-conductivity cannot fully reproduce the heterogeneity of aquifers. The development of such methods continues to pose challenging tasks for future research.

The most important mechanisms for solute transport in heterogeneous formations are advection and dispersion. Advection is the transport of solutes with the flow of groundwater. Dispersion originates from the variability of the flow field, and hence from aquifer heterogeneity. For solute clouds, it describes dilution, the distortion of their outline, and the irregular motion of their center of mass. Among these mechanisms, dilution allows adjacent plumes to mix and thus undergo chemical reactions. Therefore, it is vital to quantify dispersion accurately in order to predict reaction rates in heterogeneous media. If, in computational models, natural variability is resolved only partly, the dispersion of solute clouds is underestimated, and so are effective reaction rates.

Macrotransport theory replaces heterogeneous media by equivalent homogeneous media. The dispersion of solutes is parameterized and simulated by dispersion coefficients in a diffusion-like process. Unfortunately, predictions based on macrotransport theory hold only in a stochastic sense. Macrotransport theory requires a statistical description of the heterogeneity, i.e., the covariance function. To obtain this function, a certain amount of sample data must be available. Ironically, only the statistics of these data are used, while the information carried by the actual values is entirely discarded.

Geostatistical inversing uses the actual values to estimate the spatial distribution of hydraulic conductivity. However, a colloquialism among modelers says never to perform transport simulations on a geostatistically estimated conductivity field: the resulting conductivity fields do not resolve the entire variability and therefore underestimate the dispersion of solute clouds. When applying macrotransport dispersion coefficients to these estimates, dispersion is overestimated, since the heterogeneity would be both partially resolved by the estimated conductivity field and fully parameterized by the dispersion coefficients.

Only a few existing methods fill this gap. The most common ones simulate the unresolved variability in a random fashion by generating conditional realizations of the conductivity field at excessive computational costs. Other approaches parameterize the unresolved heterogeneity in a fashion similar to macrotransport theory. These approaches are either exact and unbearable in their computational costs, or they are only approximate and restricted to special cases. Further, none of them can quantitatively describe all three dispersive mechanisms both individually and simultaneously. In this field, there is an obvious demand for further research.

1.2 Goal

The goal of this thesis is to develop, implement and apply a new method for geostatistical inversing that simultaneously identifies the parameters for flow and transport in the subsurface. The method to be developed will follow the principle of geostatistical inverse modeling. In contrast to existing methods, it aims at simultaneously estimating the spatial distributions of conductivity and of a dispersion coefficient, and it could be further extended to other transport parameters, such as retardation factors. In many cases, the data used for geostatistical inversing are breakthrough curves from tracer experiments. Along with information on the conductivity field, these data also convey information on dispersion. This part of the information, which has not been used for geostatistical inversing until now, will be exploited in the new method.

For geostatistical inversing, various methods exist. Among these, the most appropriate one has to be found, adapted and fine-tuned for the new task at hand. The most important properties to look for are the stability of the underlying algorithm, computational efficiency, and the ability to rigorously quantify the uncertainty of the identified parameter values. Most available inversion algorithms successively linearize the sensitivities of the measured quantities with respect to the parameters to be identified. The dispersion coefficient is a new type of unknown in geostatistical inversing, so that expressions for its sensitivities need to be derived. Since transport simulations in computational models often require a fine discretization with a large number of unknown discrete values, computational costs are an issue to be aware of. To minimize the computational effort, efficient algorithms for certain matrix operations in geostatistics must be reviewed and extended. Also, the methods to compute model sensitivities and to simulate groundwater flow and solute transport must be computationally efficient.

Once the method is developed, it is to be implemented in MATLAB, a language for mathematical programming. It is to be tested on artificial data sets for the sake of verification, to investigate its properties and to gain experience with its performance. This thesis is placed within the framework of a project to investigate dilution, mixing and reaction of solutes in the subsurface. As a last step, the new method is to be applied to a real data set obtained from an experiment performed by members of the project group.
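To make the link between breakthrough curves and the input data concrete, the following MATLAB sketch (MATLAB being the implementation language named above; the curve is synthetic and purely illustrative) computes the low-order temporal moments of a local breakthrough curve by numerical quadrature:

% Temporal moments of a local breakthrough curve c(t) at a single observation
% point; a synthetic Gaussian pulse stands in for measured data.
t  = linspace(0, 100, 1001)';          % time axis
c  = exp(-(t - 30).^2 / (2 * 5^2));    % synthetic breakthrough curve
m0 = trapz(t, c);                      % zeroth temporal moment
m1 = trapz(t, t .* c);                 % first temporal moment
m2 = trapz(t, t.^2 .* c);              % second temporal moment
t_mean = m1 / m0;                      % mean breakthrough time
m2cn   = m2 / m0 - t_mean^2;           % normalized second central moment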

Chapter 2

Approaches for Inverse Modeling

In this chapter, I review existing approaches for inverse modeling. Section 2.1 gives a brief introduction to computational models, forward modeling and inverse modeling. Section 2.2 discusses the ubiquitous nuisance of model uncertainty, its origins, and how to handle modeling under uncertainty. It covers the problems, implications and imperatives that arise from model uncertainty when calibrating models through inverse modeling. Sections 2.3 and 2.4 summarize the concepts of geostatistics and Bayes theorem, which are highly useful for quantifying the uncertainty of model parameters. The objective of Section 2.5 is to give an overview of techniques for inverse modeling and of how they cope with uncertainty. Finally, Section 2.6 deals with geostatistical inverse modeling. In later chapters, I identify the parameter values for hydraulic conductivity and dispersion using the Quasi-Linear Geostatistical Approach for inverse modeling. The last section explains why I chose this specific approach and what improvements to this method are necessary in order to equip it for my purposes.

2.1 Computational Models

Definition of Terms

A computational model is a set of equations or a piece of program code that evaluates a model function, transforming model input into model output. Its purpose is to simulate the response of a natural or technical system to system excitation. Computational models are based on conceptual models that define the system by those properties and processes relevant to explaining the system behavior.

Model input comprises all values that enter into the model function, describing external forces or other kinds of excitation that act on the system. This includes control variables and subsidiary conditions, such as boundary and initial conditions. The model function or transfer function is a mathematical function that relates model input to model output by simulating certain processes. Model parameters or independent quantities are parameters in the transfer function that characterize the properties of the physical system or processes in question. Model output (state variables, dependent quantities) comprises all values resulting from the transfer function. They represent the state of the modeled system and how it responds to its excitations. Computational models can be random or deterministic, depending on whether random variables appear in the model. They can be linear or non-linear, depending on the character of the transfer function. Further, they are either stationary or dynamic, depending on whether time is considered as a system variable.

An example from the field of hydrogeology is a numerical model for groundwater flow: the system to be modeled is an aquifer, characterized by its geometry and the spatial distribution of hydraulic conductivity. System excitations, like groundwater recharge or extraction, lead to a changing groundwater table as the system response. In the model, hydraulic conductivity is a parameter. The pumping rate at an extraction well, together with boundary and initial conditions, is the model input, and the resulting distribution of the hydraulic head is the model output.

Forward and Inverse Modeling

Given a model function, model input and model parameters, forward modeling is the process of evaluating the model output to simulate a system response to system excitations, as illustrated in Figure 2.1. The primary goal may be the design, management or prediction of technical or natural systems, or theoretical considerations for a better understanding of single processes and process interactions.

Figure 2.1: Forward modeling

In theory, an accurate simulation model must fulfill two requirements: (1) the transfer function must be an exact description of the processes in the system, and (2) the model parameters must be known. In practice, however, knowledge of the processes and physical properties of natural systems is always uncertain and incomplete. To set up simulation models that are sufficiently accurate for design and management purposes, the model structure and parameters are determined in a process called model calibration: the model is adapted and modified until model input and model output match given data sets of observed system excitation and response. Once the conceptual model structure is fixed or assumed to be known, finding the adequate parameter values is referred to as parameter identification. In contrast to forward modeling, where input and parameters are known and the output is unknown, here the output is known while parameter values or single control variables of the model input are unknown. Since this reverses the flow of information in the computational model (see Figure 2.2), parameter identification is often called inverse modeling. In this thesis, inverse modeling refers to the case where model input and model output are known, whereas model parameters are unknown.
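As a minimal, hypothetical example of these terms in code, consider steady-state one-dimensional groundwater flow between two fixed-head boundaries, written in MATLAB. The nodal conductivities K are the model parameters, the boundary heads are the model input, and the simulated heads h are the model output (the uniform grid spacing cancels out of the discrete equations and is therefore omitted):

function h = forward_flow(K, hL, hR)
% Minimal forward model: solves d/dx( K dh/dx ) = 0 on a uniform grid of
% numel(K) nodes with fixed heads hL and hR at the two boundaries.
n  = numel(K);
Ke = 2 ./ (1 ./ K(1:n-1) + 1 ./ K(2:n));   % harmonic-mean conductivity per edge
A  = zeros(n-2);                           % system matrix for interior heads
b  = zeros(n-2, 1);
for i = 1:n-2                              % flux balance at interior node i+1
    A(i, i) = -(Ke(i) + Ke(i+1));
    if i > 1,   A(i, i-1) = Ke(i);   end
    if i < n-2, A(i, i+1) = Ke(i+1); end
end
b(1)   = b(1)   - Ke(1)   * hL;            % move the known boundary heads
b(end) = b(end) - Ke(n-1) * hR;            % to the right-hand side
h = [hL; A \ b; hR];                       % model output: simulated heads
end

Calling h = forward_flow(exp(randn(11, 1)), 1, 0), for instance, simulates the head distribution across a log-normally distributed conductivity field.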

2.2 Model Uncertainty

2.2.1 Origins of Model Uncertainty

In general, computational models suffer from three different kinds of uncertainty: uncertainty of the conceptual model, numerical error and parameter uncertainty. Good readings on this subject, with many examples and lucid explanations, are the books by Sun (1994) and Schweppe (1973) [94, 87].


Figure 2.2: Inverse modeling

If any of these uncertainties is too high, the entire model is useless. Further, if one of them is significantly higher than the others, it makes no sense to try to improve the others. A model that is conceptually wrong may be calibrated with excellent data sets and equipped with the best numerical schemes, but will still yield wrong results. It lies in the hands of the educated modeler to conceptualize the system to be modeled in an appropriate way. I assume for the purpose of this thesis that conceptual error is absent. As for numerical error, the increasing computational power of contemporary computers can easily reduce it to acceptable levels by using finer discretizations. Usually, it is smaller than the other errors by orders of magnitude. In many cases, the remaining uncertainty, that of the parameters, is the toughest to handle.

Parameter uncertainty may be defined as the imperfection of information. In principle, there are three sources of uncertainty that affect model parameters: (1) measurement error, (2) spatial heterogeneity and (3) temporal fluctuations.

(1) Measurements of physical quantities are always subject to measurement error: In almost all cases, there is a discrepancy between the real and the measured value. It may originate, among other sources, from inaccurate techniques, reading error, cross-sensitivities of the measurement devices, or from disturbing the system by the measurement process itself. Measurement error is either systematic or random. Random error is unbiased white noise and can be dealt with by placing the process of measurement in a stochastic framework. Systematic error, by contrast, adds an unknown bias to the measurement values and hence is the more severe type of measurement error.

(2) Natural systems often display spatial variability in their physical properties, referred to as heterogeneity. Most measurement techniques collect only point-like observations or spatially averaged values. Other measurement techniques, like many methods of remote sensing, offer good spatial resolution. However, they often yield qualitative rather than quantitative information.

(3) Besides spatial heterogeneity, natural systems tend to exhibit fluctuations in time. Measurements are either snapshots or values averaged over time. Continuous observations of fluctuating quantities are not always possible. In most cases, continuous observations are rather a time series of snapshots or moving averages. If the actual structure of heterogeneity or the regime of fluctuation is not known a priori, the exact distribution in time and space is uncertain with an infinite number of degrees of freedom.

Due to measurement error, the inaccessibility of certain types of information and the costs of measurements, heterogeneity and fluctuations can never be resolved at arbitrarily small scales.


The input to inverse modeling is a given amount of information from measurements. Since these are subject to error and limited in number, they do not suffice to identify the model parameters unimpaired by uncertainty. This is especially the case when considering spatial heterogeneity or fluctuations in dynamic models. As a consequence, the model parameters and the resulting model output are always subject to uncertainty.

2.2.2 Modeling under Uncertainty

The previous section can be summarized as follows: Since nature is heterogeneous and measurements are always subject to error, information on natural systems is always incomplete and uncertain. A simulation model, to suffice for prediction and management purposes, has to describe the natural system accurately. It does so in quality by the mathematical formulation of a conceptual model and in quantity by describing physical properties with the help of model parameters. The model parameters are usually not known a priori. Instead, they need to be identified from observations of the natural system through inverse modeling. Since these observations are subject to uncertainty and since heterogeneity can never be resolved at arbitrarily small scales, the resulting parameter values are subject to uncertainty as well. This uncertainty irrevocably propagates onto the model output, rendering simulation models categorically erroneous and subject to uncertainty. It is self-evident that, for prediction and management purposes, uncertain results from simulation models are entirely useless and even dangerous unless their uncertainty is rigorously quantified.

To assess the uncertainty of model output, uncertainty analysis involves two steps: quantifying the uncertainty of model input and parameters, and then propagating this uncertainty onto the model output. For the latter, sensitivity analysis comes into play, where the dependency of the model output on model input and on model parameters is identified, typically in the form of partial derivatives. The result of a complete uncertainty analysis is a mathematical or stochastic description of the uncertainty of the model output, e.g., in the form of error bounds, confidence intervals or entire statistical distributions. It is needless to elaborate on the fact that uncertainty analysis, even if performed most conscientiously, can be no more accurate than its very own input: the quantification of uncertainty in the model input and model parameters. Hence, methods for inverse modeling must quantify the uncertainty of the parameters they identify.
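To make the two steps concrete, consider the simplest variant of uncertainty propagation, a first-order second-moment analysis: the model is linearized about the mean parameter values by a finite-difference sensitivity matrix, and the parameter covariance is mapped onto the output. The following minimal Python sketch illustrates the principle; the toy model and all names are hypothetical and serve only as an illustration, not as a definitive implementation:

import numpy as np

def fosm(model, p_mean, P_cov, eps=1e-6):
    # first-order second-moment analysis: linearize the model about the
    # mean parameters and map the parameter covariance onto the output
    y0 = model(p_mean)
    J = np.zeros((y0.size, p_mean.size))       # sensitivity matrix dy/dp
    for j in range(p_mean.size):
        dp = np.zeros(p_mean.size)
        dp[j] = eps * max(abs(p_mean[j]), 1.0)
        J[:, j] = (model(p_mean + dp) - y0) / dp[j]
    return y0, J @ P_cov @ J.T                 # output and its covariance

# hypothetical toy model: head loss over two conductivity zones in series
model = lambda p: np.array([1.0 / p[0] + 1.0 / p[1]])
y, C = fosm(model, np.array([2.0, 5.0]), np.diag([0.1, 0.2]))
print(y, np.sqrt(np.diag(C)))                  # prediction and standard error

For strongly non-linear models, such a linearized analysis is only a first approximation; Monte-Carlo simulation is the general, albeit costly, alternative.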

2.2.3 Challenges in Inverse Modeling

Inverse modeling is the procedure of finding values for model parameters such that the model output is an accurate simulation of the real system response. The starting point is a computational model and observed data sets of system excitation and response. It is implicitly assumed that the underlying conceptual model correctly describes the significant processes that act in the system. Since the observations are subject to error, any method of inverse modeling must accurately quantify the resulting uncertainty of the parameters. Only then can uncertainty analysis take over to quantify how this uncertainty propagates from the parameters onto the model output. If this crucial requirement is not met, the computational model will produce erroneous predictions with a seemingly deterministic character. Not specifying error bounds or other measures of uncertainty will inevitably lead to drastic mistakes in management and risk assessment.

As discussed in Section 2.2.1, parameter uncertainty originates from measurement error or lacking resolution in time and space. While forward modeling problems of stable processes are always well-posed, these deficiencies often lead to ill-posed inverse problems. This topic is covered in general by


Sun (1994), and for specific cases by Dietrich and Newsam (1989) and by Yeh and Simunek (2002) [94, 24, 103]. Well-posedness and ill-posedness are defined as follows:

Well-Posedness: A problem is well-posed if its solution fulfills the three criteria of existence, uniqueness and stability.

Ill-Posedness: Any problem that fails to be well-posed is an ill-posed problem.

The following is a definition and discussion of the three criteria.

Existence: There is a solution, i.e., a set of discrete values or a function, that satisfies all conditions and equations.

Undoubtedly, the physical reality is an existing solution to inverse problems. However, observations are unavoidably subject to error and may lead to measurement values contradicting the physics of the system. Then, there is no solution to the inverse problem that satisfies the observations. For example, inaccurate measurements of solute concentration may yield negative values, or hydraulic heads inaccurately observed in the field may suggest that groundwater flows uphill. The problem of existence can be tackled by accounting for measurement error in a stochastic framework.

Uniqueness: There is only one solution.

There are two reasons why an inverse problem may be non-unique. First, the problem may be mathematically under-determined. This is the case if the number of unknown parameter values is larger than the number of observations and auxiliary conditions. Under-determination can always be removed by introducing more observations, additional auxiliary conditions, prior assumptions or other types of regularization. Second, there may be inherent non-uniqueness in the problem setup itself, originating from redundant information or from a lack of certain types of information. For example, if only observations of the hydraulic head are available, only contrasts but not the absolute values of hydraulic conductivity can be determined. Again, this type of non-uniqueness may be overcome by including observations of the lacking type, additional auxiliary conditions, prior assumptions or other types of regularization.

Stability: The solution for the model parameters is a continuous function of the model input and the observations.

It can easily be shown that inverse problems are unstable with respect to measurement error. Assume that an observation error is such that the solution is at the limit of existence. For example, the hydraulic conductivity has to be infinite to allow groundwater flow in a region where measurements suggest that there is no hydraulic gradient. At this limit, the solution for a parameter like the hydraulic conductivity is not a continuous function of the measurements.

2.3 Geostatistical Description of Heterogeneity

In the previous section, it became clear that parameter uncertainty has to be quantified. Since the parameter uncertainty in groundwater flow and solute transport models is closely connected to the


heterogeneity of aquifers, powerful means to describe heterogeneity in a mathematical framework are required. The concept of geostatistics offers a wide toolbox to characterize heterogeneity. Basic literature on this topic are the textbooks by Matheron (1971), Cressie (1991) and Kitanidis (1997) [68, 18, 60].

The basic idea is to approach heterogeneous structures as random spatial functions. These random functions are split up into a spatial function for the expected value and random fluctuations. The random fluctuations are characterized by geostatistical models. The most common assumption considers the fluctuations to be multi-Gaussian distributed. A covariance function describes spatial correlation as a function of distance. The value of the covariance function for a distance of zero is the variance, quantifying the overall magnitude of variability. Often, certain analytical functions with only a few structural parameters are used as covariance functions. Among these are the Gaussian function to characterize smooth structures, the exponential function for rough structures, linear relations to represent fractal media or the Dirac function for white noise. In most cases, the structural parameters are the variance and the integral scale or correlation length, i.e., an average length of coherent spatial structures. Other model functions and the corresponding mathematical expressions can be found in the literature.

The geostatistical model describes the characteristics of a population or ensemble of possible outcomes of a random process. A single aquifer with unknown spatial structure is regarded as one realization drawn from the ensemble. On that basis, predictions from the ensemble are applied to the aquifer. An important issue is ergodicity. If a realization is large enough compared to the integral scale, then it is ergodic, i.e., volume averages can be used instead of the ensemble average.

The assumption of second-order stationarity significantly simplifies the mathematical framework of geostatistics: The covariance is assumed to be a function of the separation distance between two points of consideration and to be independent of the actual location. Alternatively, functions are assumed to be intrinsic, which is less strict. Intrinsic functions, again, have stochastic properties that do not change in space, but a covariance function cannot be denoted explicitly. This is the case, e.g., for media with infinite variance or for media with an unknown mean value. Then, a generalized covariance function takes the place of the covariance function. Other, less common assumptions exist that are of no interest for my thesis.

The most widespread application of geostatistics is kriging, a method to estimate or interpolate an unknown spatial function between locations of observations where the values are known. In case the observations do not return values of the unknown function but of a quantity that is correlated with that function, the equivalent method is called cokriging. Applications of geostatistics to stochastic hydrology are summarized by De Marsily (1986) [21]. The advantage of geostatistical interpolation is that the uncertainty of the spatial function is always quantified. Prior to considering the observations, it is given by the variance in the geostatistical model. After taking into account the observations in techniques like kriging or cokriging, the remaining uncertainty is described by the estimation variance.
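As an illustration, the following minimal Python sketch (hypothetical numbers, one spatial dimension, known zero mean) performs simple kriging with an exponential covariance model and evaluates the estimation variance mentioned above:

import numpy as np

def exponential_cov(h, variance, corr_length):
    # exponential covariance model for rough structures
    return variance * np.exp(-np.abs(h) / corr_length)

# observations of, e.g., log-conductivity fluctuations about a known mean
x_obs = np.array([2.0, 5.0, 9.0])
y_obs = np.array([-0.3, 0.4, 0.1])
x_est = np.linspace(0.0, 10.0, 101)            # estimation grid

sigma2, ell = 1.0, 3.0
Q = exponential_cov(x_obs[:, None] - x_obs[None, :], sigma2, ell)
q = exponential_cov(x_est[:, None] - x_obs[None, :], sigma2, ell)

lam = np.linalg.solve(Q, q.T)                  # simple-kriging weights
y_hat = lam.T @ y_obs                          # conditional mean
var_hat = sigma2 - np.sum(q * lam.T, axis=1)   # estimation variance

At the observation points the estimation variance drops to zero; far away from all observations it returns to the prior variance of the geostatistical model.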

2.4 Bayes Theorem

Many geostatistical methods have been shown to be special applications of Bayes theorem. It is the most basic law of information processing, describing how additional new information reduces uncertainty. Before reviewing the concept behind Bayes theorem, I summarize some frequently used terms.


Definition of Terms

Joint probability: The probability of two or more events occurring together (jointly).

Marginal probability: The probability of an event occurring regardless of all other events that might occur jointly.

Marginalization: Computing the marginal probability from a joint probability.

Conditional probability: The probability that an event occurs under the condition that a specific other event occurs jointly.

Conditioning: Computing the conditional probability from a joint probability given a specific condition.

Prior knowledge: The information (e.g., probabilities) available before conditioning.

Posterior knowledge: The information (e.g., conditional probabilities) available after conditioning.

Bayes Theorem

Bayes theorem describes how additional new information reduces uncertainty in a versatile and general mathematical framework. It may be denoted as follows:

\[ p(s|y) \sim p(s)\, p(y|s) \; . \]

Here, $p(s)$ is the prior probability density function (pdf) describing the initial uncertainty of some unknown quantity $s$. $p(y|s)$ is the conditional pdf of the observations $y$ for a given value of the unknowns $s$. It quantifies the reliability of the observations and is often called the likelihood of the measurements. Finally, $p(s|y)$ is the conditional pdf of the unknowns given the observations, representing the reduced uncertainty after considering the new information from the observations. Sometimes, Bayes theorem is denoted in a more general form:

\[ p(s|y) = \frac{p(s)\, p(y|s)}{p(y)} \; , \]

in which $p(y)$, the marginal pdf of the observations, is merely a normalizing quantity that can be omitted when considering one specific set of observations.
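For the special case of a Gaussian prior and a Gaussian likelihood, the conditioning can be written in closed form. A minimal Python sketch of this conjugate case, with purely hypothetical numbers:

import numpy as np

# prior: s ~ N(mu, sig2); observation: y = s + noise, noise ~ N(0, r2)
mu, sig2 = 0.0, 1.0          # prior mean and variance of the unknown
r2 = 0.25                    # variance of the measurement error
y = 0.8                      # one observed value

# posterior p(s|y) ~ p(s) p(y|s) is again Gaussian in the conjugate case
post_var = 1.0 / (1.0 / sig2 + 1.0 / r2)
post_mean = post_var * (mu / sig2 + y / r2)
print(post_mean, post_var)   # uncertainty is reduced: post_var < sig2

The posterior variance is smaller than both the prior variance and the measurement-error variance: the new information reduces uncertainty, exactly as stated above.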

2.5 Inverse Modeling in Subsurface Flow

This section gives an overview of different approaches to inverse modeling. In general, there are deterministic and stochastic approaches. I refer to those approaches that include deterministic assumptions within the stochastic framework as hybrid approaches. In the following, I will briefly review their basic features with the main focus on how they handle the problems of heterogeneity, ill-posedness and parameter uncertainty.

One could differentiate between methods that rely on direct or indirect measurements. Direct measurements immediately refer to values of model parameters, while indirect measurements relate to model output. Kriging, for example, performs spatial interpolation between scattered observations of parameter values. Cokriging, by contrast, uses measurements of dependent quantities to identify the spatial distribution of parameters. I do not differentiate between these cases but merely categorize direct measurements as a special case of indirect observations.


2.5.1 Deterministic Approaches

Deterministic approaches for inverse modeling in subsurface flow make deterministic assumptions on the spatial structure of parameters. They represent the spatial distribution of parameters, such as the hydraulic conductivity of an aquifer, by a limited number of zones or layers with constant values in each unit. The shape of the zonation and layering is specified by the modeler. The values inside the zones and layers are obtained in an optimization procedure in which the model output is fitted to observations of state variables. Figure 2.3 shows a schematic example of an aquifer in a discrete computational model characterized by four zones with different values of hydraulic conductivity. Given values of the hydraulic heads at the locations of the observation wells (black circles), the zonal values can be determined.

Figure 2.3: Deterministic approach for flow model calibration

Probably the most widespread representative of this group is the computer program Modflow-P (recently replaced by Modflow-2000 and permanently being upgraded) maintained by the U.S. Geological Survey (Hill, 1992) [42], followed by UCODE, issued by the same organization (Hill, 1998) [44]. Early versions of Modflow-P only included observations of conductivity and hydraulic head to determine zonal values of hydraulic conductivity. The ADV package extended the capabilities of Modflow-P to include observations of solute transport (Anderman and Hill, 1997 [1]).

Advantages

The number of zones is usually chosen smaller than the number of observations, rendering the optimization a mathematically over-determined problem. This type of problem is easily solved by least-squares fitting algorithms, such as the Gauss-Newton algorithm.

Some of these approaches include a stochastic quantification of measurement error. The Gauss-Newton algorithm provides means to quantify the resulting uncertainty of the parameters in the form of their error variance and correlation among each other, as used in the Modflow add-ons described by Hill (1994) [43]. Further, allowing for measurement error in the conceptualization of the inverse problem overcomes most complications concerning ill-posedness. If prior knowledge (like expected values of the zonal values) or direct measurements of the zonal values are additionally taken into account, the resulting problems are most likely to be well-posed.

If a few zones constitute the predominant heterogeneity throughout the aquifer and the shape of the zones can be specified with confidence, the concept of zonation is certainly helpful. Then, the zonal values


of conductivity can be identified with ease and their uncertainty can be quantified accurately. Since the number of unknown zonal values is typically small, the computational effort is relatively low.

Disadvantages

The problem of deterministic approaches is that zonation is a rather hard assumption, considering that geological knowledge is mostly uncertain. No experienced and serious geologist will want to make binding statements on the shape of the zonation unless the aquifer in question has undergone expensive geophysical site exploration and extensive sampling. Only in some cases does technical knowledge provide clear zone outlines, e.g., for clay liners under landfills or for funnel-and-gate systems. The uncertainty of the zone outlines can hardly be included in subsequent uncertainty analyses. Further, heterogeneities inside the zones or layers may be the predominant heterogeneity. They may entirely override the zonal structures. Then, zonation does not appropriately represent aquifer heterogeneity and the uncertainty associated with it, and should not be used.
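To make the over-determined least-squares fitting at the core of these deterministic approaches concrete, the following Python sketch implements a plain Gauss-Newton iteration; the three-observation toy model is hypothetical and merely stands in for a flow model with two zonal parameters:

import numpy as np

def gauss_newton(f, p, d, n_iter=10, eps=1e-6):
    # plain Gauss-Newton iteration for over-determined least squares
    for _ in range(n_iter):
        r = d - f(p)                                  # residual vector
        J = np.zeros((d.size, p.size))                # finite-difference Jacobian
        for j in range(p.size):
            dp = np.zeros(p.size); dp[j] = eps
            J[:, j] = (f(p + dp) - f(p)) / eps
        p = p + np.linalg.solve(J.T @ J, J.T @ r)     # normal equations
    return p

# hypothetical model: three observations depend on two zonal parameters
f = lambda p: np.array([p[0] + p[1], p[0] - p[1], p[0] * p[1]])
d = f(np.array([2.0, 1.0]))                           # synthetic noise-free data
print(gauss_newton(f, np.array([1.0, 1.0]), d))       # recovers [2.0, 1.0]

The same machinery also delivers the uncertainty estimate: for Gaussian measurement errors with variance sigma2, the parameter covariance is approximately sigma2 times the inverse of J.T @ J, which is how the error variances and correlations mentioned above are obtained.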

2.5.2 Geostatistical Approaches

Geostatistical approaches do not make deterministic assumptions on the structure of aquifer heterogeneity. Without such assumptions, the parameter values may vary in space everywhere in the model domain. The unknown random spatial function of the parameters is typically discretized on a fine computational grid, resulting in a high number of unknown discrete values. Consequently, the number of unknowns is larger by orders of magnitude than the number of measurements. This results in mathematically under-determined problems with an infinite number of possible solutions for the parameter field, spanning a solution space.

At this point, geostatistics comes into play. The covariance function characterizes the spatial correlation of the parameters and acts as a regularization: the solution space is reduced to a subset of admissible solutions that obey certain statistics (see, e.g., Kitanidis, 1997 [60]). The shape of the covariance function is usually derived from analyzing data sets of measurements in procedures like variogram fitting or the optimization of structural parameters (e.g., Matheron, 1971, Kitanidis, 1986 [68, 51]). Measurements, as well, are interpreted as outcomes of random processes, characterized by the measured values themselves and the variance of measurement error.

Geostatistical methods fall into two groups. The first group, mostly called Monte-Carlo methods, randomly simulates many possible outcomes for the solution out of the ensemble statistics. The measurements are included by demanding that each realization of the parameters leads to model output that meets the measurements. The tolerance for meeting the measurements is specified by the measurement error. Realizations that fulfill this condition are referred to as conditional realizations, conditioned on the measurements. The various methods in this group differ in the way they generate their conditional realizations. In post-processing steps, the conditional mean, i.e., the mean value of the conditional realizations at each discrete location, is evaluated. Further, the conditional covariance can be evaluated in order to quantify the remaining uncertainty inside the solution space after conditioning.

Figure 2.4 shows an example of three different realizations drawn from an ensemble characterized by a certain covariance function. Apparently, all three realizations show the same type of spatial structure. They have been conditioned to have an identical value at a specified observation location (marked by black circles), making them conditional realizations. Each of these realizations might be an equally likely physical reality for the distribution of hydraulic conductivity in an aquifer for which a certain value has been measured at a specific location.


Figure 2.4: Three conditional realizations

The second group of geostatistical methods directly seeks analytical solutions for the expected value and the covariance of the conditional realizations. Into this group fall methods like simple kriging for the case of direct measurements (e.g., Matheron, 1971 [68]) and simple cokriging for the case of indirect measurements (e.g., De Marsily, 1986 [21]). The resulting mathematical problem is usually well-determined with a unique solution. If desired, conditional realizations can easily be generated in post-processing steps (e.g., Kitanidis, 1996 [57]).

Advantages

The advantage of geostatistical methods is that they do not rely on unjustifiable, hard deterministic assumptions on the structure of aquifer heterogeneity. Instead, they assume certain geostatistical properties of the aquifer. This includes covariance functions to describe the spatial correlation within the parameter field. To minimize the extent of prior assumptions, the shape of the covariance function is determined from the measurements themselves. Most geostatistical methods can be shown to be equivalent to the strict Bayesian principle of information processing (e.g., Kitanidis, 1986 [51]). By following Bayes theorem, they quantify the uncertainty of the identified parameters in a rigorous and complete manner.

Disadvantages

To date, most geostatistical methods for parameter identification suffer from high computational costs, as will be dealt with in Section 2.6. In part, this originates from handling the stochastic framework for the large numbers of unknown discrete parameter values. Geostatistical methods for inverse modeling are based on a less intuitive problem setup in comparison to deterministic methods. Further, they require a deeper understanding to be applied correctly.


Both drawbacks make them less user-friendly. Partly for this reason and due to the high computational costs, no commercial software is available to date.
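To illustrate how conditional realizations such as those in Figure 2.4 can be generated in the case of direct measurements, the following Python sketch draws unconditional realizations by a Cholesky factorization and forces them through the data by a kriging correction (cf. the post-processing route of Kitanidis, 1996 [57]); grid, data and numbers are hypothetical:

import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 20.0, n)
sigma2, ell = 1.0, 2.0
Q = sigma2 * np.exp(-np.abs(x[:, None] - x[None, :]) / ell)  # exponential model

obs = np.array([20, 90, 160])        # indices of the observation points
y_obs = np.array([0.5, -0.8, 0.2])   # measured values

L = np.linalg.cholesky(Q + 1e-10 * np.eye(n))  # jitter for numerical stability
for k in range(3):
    s_u = L @ rng.standard_normal(n)           # unconditional realization
    # kriging correction forces the realization through the data
    w = np.linalg.solve(Q[np.ix_(obs, obs)], y_obs - s_u[obs])
    s_c = s_u + Q[:, obs] @ w                  # conditional realization
    assert np.allclose(s_c[obs], y_obs)

All realizations honor the data exactly; away from the data points they fluctuate freely, and their ensemble spread visualizes the conditional covariance.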

2.5.3 Hybrid Approaches

Hybrid approaches include deterministic knowledge on the structure of heterogeneity in the geostatistical framework. In arbitrary combinations, geostatistical methods can be equipped with additional deterministic shape functions like the mean value, trend functions or zone outlines. These shape functions may be attributed with unknown, uncertain, or known values as prior knowledge. A simple example is kriging with a spatial trend as described in the textbook by Kitanidis (1997) [60]. Depending on how much substance is attributed to the prior knowledge, hybrid approaches may be placed anywhere between the limits of deterministic methods and geostatistical methods. Mathematically, prior knowledge on deterministic shapes is treated like constraints, helping to regularize the problem.

Advantages

Including zones and trends in geostatistical methods can be done at virtually no computational cost at all. They can be exploited to regularize the inverse problem and stabilize optimization algorithms that would fail for purely stochastic approaches. Further, additional prior knowledge helps to overcome the type of ill-posedness that originates from missing types of information. In case both deterministic zones and heterogeneity within the zones contribute to the total heterogeneity of an aquifer, hybrid approaches are the best choice.

Disadvantages

For hybrid approaches, the same disadvantages apply as for stochastic approaches concerning the computational costs and the availability of commercial software.

2.6 Geostatistical Inverse Modeling

In brief, the previous section can be summarized as follows: geostatistical approaches are superior to deterministic approaches since they do not make deterministic assumptions on spatial structure, which are associated with unquantifiable uncertainty. Instead, they derive assumptions on the geostatistical properties of the aquifer from the available data themselves. They rigorously quantify the uncertainty of identified parameters. However, they suffer from high computational costs.

In this section, I provide a general overview of the most relevant methods for geostatistical inversing, focusing on whether they comply with the Bayesian principle, on their stability, and on their computational costs. I decide which method is apt for the objectives of my thesis. At the end of this section, I conclude what improvements are necessary to equip the chosen method with the computational efficiency, stability and other properties required to ensure its successful application.

Cokriging is a geostatistically based technique to identify an unknown spatial parameter field given observations of a correlated quantity. Originally developed for mining exploration, cokriging was successfully used as a tool for inverse modeling in hydrogeology already in the early eighties (De Marsily, 1986 [21]). It considers the unknown parameters, such as the hydraulic conductivity


field of an aquifer, as a random space function which is conditioned on observations of dependent quantities, such as the hydraulic head or the travel time of a solute. The cross-covariance between the random space functions, i.e., the unknown parameters and the dependent quantities, is fully determined by a model function and the auto-covariance of the parameter field. In many cases, the model function is a partial differential equation. It may be non-linear with respect to the unknown parameters, so that inverse modeling in hydrogeology is in general a non-linear problem.

Linearized Cokriging, developed by Kitanidis and Vomvoris (1983) [61], identifies the hydraulic conductivity of aquifers given measurements of conductivity and hydraulic head at steady state. The method linearizes the groundwater flow equation about a homogeneous prior mean value of conductivity, which is applicable for low degrees of variability only. The shape of the auto-covariance function is obtained from the available data using a maximum likelihood method.

Carrera and Glorioso (1991) [7] concluded that cokriging should be performed in an iterative manner by successively linearizing the model function about the current estimate of the parameter field. Following this rationale, iterative methods for geostatistical inversing emerged. Basically, four differing concepts have been published. The first group of methods, such as the Iterative Cokriging-Like Technique (Yeh et al., 1995 [101]), defines an approximate linearization of the model function once. Then, cokriging is applied repeatedly while both the linearization and the auto-covariance of the parameter field remain constant. Other methods conceptualize the iterative procedure as a sequence of Bayesian updating steps. They update both the linearization and the auto-covariance of the parameter field during their iteration algorithm, like the Successive Linear Estimator (SLE) by Yeh and coworkers (1996) [102]. The third group of methods successively linearizes the model function about the current estimate while keeping the covariances. Into this group fall the Quasi-Linear Geostatistical Approach (in short: the Geostatistical Approach) by Kitanidis (1996) [56] and the Maximum a Posteriori (MAP) method by McLaughlin and Townley (1996) [69]. The last group of methods, including the Pilot Point Method by RamaRao et al. (1995) [78] and the method of Sequential Self-Calibration by Gómez-Hernández et al. (1997) [37], generates conditional realizations. Zimmerman et al. (1998) [105] provide a comparison of how these methods and others perform in some exemplary applications.

Computational Costs

In my thesis, I aim at large inverse problems where the number of unknown parameters may easily rise above $n = 10^5$, e.g., $n = 10^6$ for well-resolved 2-D or 3-D applications. A disadvantage of cokriging-like techniques lies in the computational costs involved in handling the auto-covariances and cross-covariances. This has led to the development of alternative geostatistical methods of inversing in which the covariance matrices are not fully determined, like the Pilot-Point Method (RamaRao et al., 1995 [78]) and the method of Sequential Self-Calibration (Gómez-Hernández et al., 1997 [37]). Both methods use simplified conditioning techniques to generate conditional realizations.
While the simplifications may speed up the generation of a single realization, the number of realizations required to cover the entire variability of the parameter field increases rapidly with the number of unknown parameter values. Given the size of problems that will appear in my thesis, these alternative methods are not applicable.

Zimmerman (1989) [106] showed that, under certain conditions, the auto-covariance matrix of the unknowns has specific structural properties. Given these properties, one can apply so-called spectral methods to perform all required matrix operations highly efficiently. With the computational costs reduced, cokriging-like techniques excel in terms of computational efficiency while outrivaling the alternative methods in quantifying parameter uncertainty. The conditions are that the unknown parameter field must be statistically second-order stationary or at least intrinsic, and defined on a regular equispaced grid. Spectral methods are dealt with in Chapter 8. To face the size of problems occurring in my thesis, spectral methods are indispensable.

Conditional covariance matrices, however, are no longer second-order stationary or intrinsic and hence do not allow the application of spectral methods. The Successive Linear Estimator updates the covariance matrices in its iterative procedure of Bayesian updating: the conditional covariance matrices from the respective preceding step are taken as covariance matrices for the following step. Hence, this method cannot be sped up by spectral methods. No matter how advantageous it may be in other respects, it is ruled out for my purposes.
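The structural property exploited by such spectral methods is, for a stationary covariance on a regular grid, Toeplitz structure: the covariance matrix can be embedded in a circulant matrix that is diagonalized by the discrete Fourier transform. A minimal 1-D Python sketch (hypothetical grid and covariance parameters) of a fast covariance-times-vector product:

import numpy as np

n = 4096
dx, sigma2, ell = 1.0, 1.0, 10.0
# first row of the Toeplitz covariance matrix on a regular 1-D grid
c = sigma2 * np.exp(-dx * np.arange(n) / ell)

# embed into a circulant matrix of size 2n-2, diagonalized by the FFT
c_circ = np.concatenate([c, c[-2:0:-1]])
eig = np.fft.fft(c_circ).real          # eigenvalues of the embedding

def cov_times(v):
    # O(n log n) evaluation of Q @ v without forming the n-by-n matrix Q
    v_pad = np.concatenate([v, np.zeros(c_circ.size - n)])
    return np.fft.ifft(eig * np.fft.fft(v_pad)).real[:n]

v = np.random.default_rng(1).standard_normal(n)
qv = cov_times(v)                      # matches the explicit product

The product costs O(n log n) operations and O(n) storage instead of O(n^2), which is what makes problems with 10^5 to 10^6 unknowns tractable; Chapter 8 develops this machinery in detail.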

Quantification of Parameter Uncertainty

Kitanidis (1986) [51] showed that the method of Linearized Cokriging and the Quasi-Linear Geostatistical Approach are identical to a full Bayesian analysis. The same holds for the Successive Linear Estimator. The rigorous Bayesian context allows an accurate quantification of the parameter uncertainty while imposing a minimum of structural assumptions on the unknowns. Parameter uncertainty is expressed by the conditional covariance of the parameter field.

The simplified conditioning techniques in the Pilot-Point Method and the method of Sequential Self-Calibration make their compliance with the Bayesian principle questionable. This leads me to carefully refrain from using them in my thesis.

Among other differences discussed elsewhere (Kitanidis, 1996, Kitanidis, 1997, McLaughlin and Townley, 1997 [58, 59, 70]), the Quasi-Linear Geostatistical Approach and the MAP method differ as follows. The former defines the solution in a parameterized form based on a rigorous Bayesian analysis. The sole purpose of the iteration procedure is to optimize the subspace spanned by the parameterization. In each iteration step, the previous trial solution is projected onto the current subspace. Only in the last iteration step, when the optimal subspace has been found, is the conditioning carried out and the conditional covariance evaluated in this optimal subspace. The underlying approach is discussed in depth in Chapter 6.2. In contrast to this, the MAP method seeks a solution that is a sum of the different parameterizations encountered during the course of iteration, and the conditional covariance is computed in the last step, based on the final parameterization. This is an inconsistency that violates the Bayesian principle and disqualifies the MAP approach from application in my thesis.

Stability Issues

The iteration algorithm underlying the Quasi-Linear Geostatistical Approach is in many respects formally similar to the Gauss-Newton algorithm (Press et al., 1992 [75]). For least-squares fitting problems, the Gauss-Newton algorithm is well known to be efficient for mildly non-linear problems. For strongly non-linear problems, however, it fails. The Levenberg-Marquardt algorithm (Levenberg, 1944, Marquardt, 1963 [65, 67]) is a modification of the Gauss-Newton method that, in a self-adaptive manner, navigates between the Gauss-Newton algorithm and the method of steepest descent (Press et al., 1992 [75]).
Combining the robustness of the latter with the computational efficiency of the Gauss-Newton method, the Levenberg-Marquardt algorithm has become a highly valued optimization tool for non-linear problems of least-squares fitting in many engineering fields.


The basic idea of the Levenberg-Marquardt algorithm is to regularize the iteration procedure by amplifying the main diagonal in the matrix of second derivatives. This concept has already been applied to geostatistical inverse modeling. In cokriging-like procedures, the measurement error appears on the main diagonal of the so-called cokriging matrix, which resembles the matrix of second derivatives in many ways. Dietrich and Newsam (1989) [24] discussed how increasing the measurement error in the cokriging procedure can help to improve ill-conditioned cokriging matrices and suppress artifacts of numerical error in the estimated parameter fields. However, they note that it induces a loss of information. The Successive Linear Estimator uses a temporarily and adaptively amplified measurement-error term in its cokriging matrix and further relaxation terms at other positions to stabilize the algorithm.

In hydrogeological applications, higher variability of the parameters leads to stronger non-linearity of the inverse problem, thus decreasing the convergence radius and increasing the number of necessary iterations in non-linear geostatistical inverse methods. Above a certain extent of non-linearity, they fail to converge. In this respect, the Successive Linear Estimator is superior to the Quasi-Linear Geostatistical Approach. Due to its stabilization mechanisms, it has been applied to more non-linear problems such as inverse modeling in the vadose zone (e.g., Hughson and Yeh, 2000 [47]).

Implications for this Thesis

It seems that, up to now, the three properties of stability, computational efficiency and rigorous quantification of uncertainty form something of a magic triangle. Methods that seek to optimize one of them accept drawbacks for the others. To overcome this dilemma, I decided to use the Quasi-Linear Geostatistical Approach for its rigorous Bayesian background, to improve its computational efficiency and to stabilize its optimization algorithm. In Chapter 8, I will summarize existing spectral methods, introduce some extensions and present new spectral methods. By combining all these methods, I have sped up the Quasi-Linear Geostatistical Approach dramatically. The power of spectral methods is proven in small test cases at the end of that chapter. Following the example of Dietrich and Newsam (1989) [24] and Yeh et al. (1996) [102], I have developed a modified Levenberg-Marquardt algorithm for the Quasi-Linear Geostatistical Approach in Chapter 6. As demonstrated in an example, the new algorithm stabilizes and speeds up the Geostatistical Approach significantly.
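For orientation, the following Python fragment sketches the textbook version of the damping idea described above for a least-squares problem with Jacobian J and residual r; it is not the modified algorithm developed in Chapter 6:

import numpy as np

def lm_step(J, r, lam):
    # one damped step: amplifying the main diagonal of J^T J blends the
    # Gauss-Newton direction (lam -> 0) with steepest descent (lam large)
    A = J.T @ J
    return np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)

# a driver adapts lam: decrease it after a successful step, increase it
# whenever the objective function fails to improve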

Chapter 3

Governing Equations

In this chapter, I provide the governing equations that describe groundwater flow and transport in the subsurface. For the purpose of geostatistical inverse modeling, I characterize local breakthrough curves by their temporal moments. Later in this chapter, I summarize the definition of temporal moments, their physical meaning, and their generating equations.

3.1 Groundwater Flow

According to Darcy's law, the specific discharge $\mathbf{q}$ $[LT^{-1}]$ in a locally isotropic porous medium is

\[ \mathbf{q} = -K \nabla \phi \; , \tag{3.1} \]

where $K$ $[LT^{-1}]$ is the hydraulic conductivity, $\phi$ $[L]$ is the hydraulic head and $\nabla$ is the nabla operator. Under steady-state conditions, applying mass conservation yields the steady-state groundwater flow equation:

\[ \nabla \cdot (K \nabla \phi) = 0 \quad \text{in } \Omega \; , \tag{3.2} \]

in which $\Omega$ denotes the domain. A general set of boundary conditions is given by:

\begin{align}
(K \nabla \phi) \cdot \mathbf{n} &= \tilde{q} && \text{on } \Gamma_1 \nonumber \\
\phi &= \tilde{\phi} && \text{on } \Gamma_2 \tag{3.3} \\
(K \nabla \phi) \cdot \mathbf{n} &= 0 && \text{on } \Gamma_{no} \; , \nonumber
\end{align}

in which $\tilde{\phi}$ and $\tilde{q}$ are specified functions defined on the boundary $\Gamma = \Gamma_1 \cup \Gamma_2 \cup \Gamma_{no}$ of the domain $\Omega$, and $\mathbf{n}$ is the normal vector on $\Gamma$ pointing outward. $\Gamma_{no}$ is merely a special case of a specified-flux boundary condition as on $\Gamma_1$. However, it is convenient to denote it as a separate boundary section, since some specific simplifications will hold for the no-flux section in Section 7. The seepage velocity $\mathbf{v}$ $[LT^{-1}]$ is related to $\mathbf{q}$ through the porosity $\theta$ $[-]$ of the porous medium:

\[ \mathbf{v} = \frac{\mathbf{q}}{\theta} \; . \tag{3.4} \]


3.2 Solute Transport

Transport of a conservative, non-sorbing solute in the subsurface is described by the well-known advection-dispersion equation:

\[ \frac{\partial c}{\partial t} + \nabla \cdot (\mathbf{v} c - \mathbf{D} \nabla c) = 0 \quad \text{in } \Omega \; , \tag{3.5} \]

in which $c$ $[ML^{-3}]$ denotes the solute concentration, $t$ $[T]$ is time and $\mathbf{D}$ $[L^2T^{-1}]$ is the dispersion tensor. According to Scheidegger (1961) [86], the entries $D_{ij}$ of the dispersion tensor $\mathbf{D}$ are:

\[ D_{ij} = \frac{v_i v_j}{\|\mathbf{v}\|} \left( \alpha_l - \alpha_t \right) + \delta_{ij} \left( \alpha_t \|\mathbf{v}\| + D_m \right) \; . \tag{3.6} \]

Here, $v_i$ is the $i$-th component of the velocity vector, $\alpha_l$ and $\alpha_t$ $[L]$ are the longitudinal and transverse dispersivities, respectively, and $D_m$ $[L^2T^{-1}]$ is the effective molecular diffusion coefficient. The Kronecker symbol $\delta_{ij}$ is unity for $i = j$ and zero otherwise. A set of general boundary conditions is:

\begin{align}
(\mathbf{v} c - \mathbf{D} \nabla c) \cdot \mathbf{n} &= \tilde{v} \tilde{c} && \text{on } \Gamma_{in1} \nonumber \\
c &= \tilde{c} && \text{on } \Gamma_{in2} \tag{3.7} \\
(\mathbf{D} \nabla c) \cdot \mathbf{n} &= 0 && \text{on } \Gamma_{no} \cup \Gamma_{out} \; , \nonumber
\end{align}

where $\tilde{c}$ is a specified concentration, $\tilde{v}$ is the velocity normal to the boundary, $\Gamma_{in}$ is the inflow boundary with the subsections $\Gamma_{in1}$ and $\Gamma_{in2}$, $\Gamma_{no}$ are the no-flow boundary sections and $\Gamma_{out}$ represents the outflow sections. The subdivision of $\Gamma_{in}$ into $\Gamma_{in1}$ and $\Gamma_{in2}$ is restricted such that a coincidence of a Dirichlet boundary in the flow problem ($\Gamma_2$) and a Neumann boundary in the transport problem ($\Gamma_{in1}$) does not occur. The reasons for this restriction are twofold: first, a fixed-head boundary condition in the flow problem and a specified solute flux in the transport problem seldom coincide in nature. Second, this restriction avoids highly inconvenient situations in later derivations in Section 7.
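Equation (3.6) translates directly into a few lines of code. A Python sketch for later reference (the function name and sample numbers are arbitrary illustrations):

import numpy as np

def scheidegger_tensor(v, alpha_l, alpha_t, Dm):
    # dispersion tensor after Scheidegger (1961), cf. eq. (3.6)
    v = np.asarray(v, dtype=float)
    vnorm = np.linalg.norm(v)
    return (np.outer(v, v) / vnorm * (alpha_l - alpha_t)
            + np.eye(v.size) * (alpha_t * vnorm + Dm))

D = scheidegger_tensor([1e-4, 0.0], alpha_l=0.1, alpha_t=0.01, Dm=1e-9)
# for flow aligned with x: D[0,0] = alpha_l*|v| + Dm, D[1,1] = alpha_t*|v| + Dm

For flow aligned with a coordinate axis, the tensor is diagonal with the familiar longitudinal and transverse dispersion coefficients as its entries.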

3.3 Moment Generating Equations

Definition of Temporal Moments

The definition of the $k$-th temporal moment $m_k$ $[MT^kL^{-3}]$ of a local breakthrough curve at location $\mathbf{x}$ is:

\[ m_k(\mathbf{x}) = \int_0^\infty t^k c(\mathbf{x}, t) \, dt \; , \tag{3.8} \]

in which $c(\mathbf{x}, t)$ is the local concentration measured over time at a location $\mathbf{x}$. The second central temporal moment is defined by:

\[ m_{2c}(\mathbf{x}) = \int_0^\infty \left( t - \frac{m_1(\mathbf{x})}{m_0(\mathbf{x})} \right)^2 c(\mathbf{x}, t) \, dt \; . \]

These integrals exist provided that $c$ converges to zero faster than any polynomial for $t \to \infty$. An alternative way to compute the second central moment $m_{2c}$ $[MT^2L^{-3}]$ is Steiner's well-known theorem from mechanics:

\[ m_{2c} = m_2 - \frac{m_1^2}{m_0} \; . \tag{3.9} \]


I define the normalized second central temporal moment by:

\[ m_{2cn} = \frac{m_{2c}}{m_1} \; . \tag{3.10} \]
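In practice, eqs. (3.8)-(3.10) are evaluated by numerical quadrature of measured breakthrough curves. A Python sketch with a synthetic, hypothetical curve:

import numpy as np

def trapz(f, t):
    # trapezoidal rule for a curve f sampled at times t
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def temporal_moments(t, c):
    # temporal moments of a breakthrough curve, eqs. (3.8)-(3.10)
    m0 = trapz(c, t)
    m1 = trapz(t * c, t)
    m2 = trapz(t**2 * c, t)
    m2c = m2 - m1**2 / m0          # Steiner's theorem, eq. (3.9)
    return m0, m1, m2c, m2c / m1   # the last entry is m2cn, eq. (3.10)

t = np.linspace(0.0, 100.0, 4001)              # observation times
c = np.exp(-0.5 * ((t - 20.0) / 4.0)**2)       # synthetic pulse breakthrough
m0, m1, m2c, m2cn = temporal_moments(t, c)
print(m1 / m0, m2c / m0)   # mean arrival time ~ 20, spread ~ 16 (= 4**2)

Truncating the tailing at a finite observation time biases the higher moments; in practice, the curves must be recorded long enough for the integrals in eq. (3.8) to converge.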

Under the conditions for flow and transport met in later chapters of this thesis, $m_{2c}$ increases along the direction of mean flow in a quadratic manner and is higher than $m_1$ by orders of magnitude. This behavior may lead to ill-conditioned matrices in the geostatistical inverse procedure discussed in Chapter 6. The idea behind normalizing $m_{2c}$ by $m_1$ is to remove the quadratic trend and to obtain a quantity that is of about the same order of magnitude as $m_1$.

Physical Meaning of Temporal Moments

The concept of moments both in space and time is not new. It is used in statistics, mechanics and many other engineering disciplines. Harvey and Gorelick (1995) as well as Cirpka and Kitanidis (2000) [40, 13] discuss the physical meaning of temporal moments in the context of characterizing flow and transport in heterogeneous aquifers. Figure 4.3 in the next chapter shows an example of a breakthrough curve and its temporal moments. It may be the result of a tracer test where a short pulse of tracer was injected in an injection well and then observed in observation wells further downstream. The physical meaning of the temporal moments relevant for my thesis is:

The Zeroth Temporal Moment is the area under the breakthrough curve. It denotes the total mass that passes by the location of observation, divided by the discharge.

The First Temporal Moment, if normalized by the zeroth moment, corresponds in mechanics to the center of gravity of the area under the curve. For breakthrough curves, it quantifies the mean arrival time of the solute under investigation.

The Second Central Moment normalized by the zeroth moment corresponds to the moment of inertia in mechanics. In statistics, it is the variance of a probability density function. For local breakthrough curves, it quantifies the spread or diffuseness of the breakthrough around the mean arrival time.

The Normalized Second Central Moment corresponds to the relative spread of a breakthrough curve.

Moment Generating Equations

Temporal moment generating equations are PDEs that describe the temporal moments of the contamination history at all points in the domain. The derivation is given elsewhere (Harvey and Gorelick, 1995, Cirpka and Kitanidis, 2000 [40, 13]). Hence, I merely provide the final expressions. The general approach to obtain the generating equations is to multiply the advection-dispersion equation (eq. 3.5) and its boundary conditions (eq. 3.7) by $t^k$, then integrate over time and apply the rules of partial integration to simplify the resulting terms. According to Harvey and Gorelick (1995) [40], the generating equation for the $k$-th temporal moment is:

\[ \nabla \cdot (\mathbf{v} m_k - \mathbf{D} \nabla m_k) = k\, m_{k-1} \quad \text{in } \Omega \; . \tag{3.11} \]

The boundary conditions that correspond to eq. (3.7) are:

\begin{align}
(\mathbf{v} m_k - \mathbf{D} \nabla m_k) \cdot \mathbf{n} &= \tilde{v} \tilde{m}_k && \text{on } \Gamma_{in1} \nonumber \\
m_k &= \tilde{m}_k && \text{on } \Gamma_{in2} \tag{3.12} \\
(\mathbf{D} \nabla m_k) \cdot \mathbf{n} &= 0 && \text{on } \Gamma_{no} \cup \Gamma_{out} \; . \nonumber
\end{align}


For the second central moment, a different source term appears on the right-hand side (Cirpka and Kitanidis, 2000 [13]):

\[ \nabla \cdot (\mathbf{v} m_{2c}) - \nabla \cdot (\mathbf{D} \nabla m_{2c}) = \frac{2}{m_0} \left( \nabla m_1 \right)^T \mathbf{D} \left( \nabla m_1 \right) \; . \tag{3.13} \]

The boundary conditions given in eq. (3.7) become:

\begin{align}
(\mathbf{v} m_{2c} - \mathbf{D} \nabla m_{2c}) \cdot \mathbf{n} &= \tilde{v} \tilde{m}_{2c} && \text{on } \Gamma_{in1} \nonumber \\
m_{2c} &= \tilde{m}_{2c} && \text{on } \Gamma_{in2} \tag{3.14} \\
(\mathbf{D} \nabla m_{2c}) \cdot \mathbf{n} &= 0 && \text{on } \Gamma_{no} \cup \Gamma_{out} \; , \nonumber
\end{align}

in which I simplified for the advection-dominated case by assuming that there is no significant diffusive flux across $\Gamma_{in1}$ and $\Gamma_{in2}$. At this point, it becomes clear that the moment generating equations are formally identical to the advection-dispersion equation (eq. 3.5) in steady state. Hence, temporal moments of local breakthrough curves can be modeled directly by solving the steady-state equations (eqs. 3.11-3.14) instead of solving for the entire concentration history and then integrating over time.

The generating equation for the normalized second central moment has highly inconvenient source terms with the square of $m_1$ in the denominator. Since $m_1$ is equal or close to zero on the inflow boundary under typical conditions, $m_{2cn}$ cannot be defined on the inflow boundary. As a consequence, its governing equation cannot be solved accurately. Instead, $m_{2cn}$ may be evaluated using eq. (3.10) everywhere but on the inflow boundary.
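To illustrate the recursive character of eq. (3.11), in which the $k$-th moment is driven by the $(k-1)$-th, the following Python sketch solves the 1-D version with uniform coefficients by upwind finite differences; geometry and numbers are hypothetical:

import numpy as np

def solve_moment(v, D, dx, n, source, m_in):
    # steady 1-D moment-generating equation, cf. eq. (3.11):
    # v dm/dx - D d2m/dx2 = source, fixed inflow value, zero outflow gradient
    A = np.zeros((n, n)); b = np.array(source, dtype=float)
    for i in range(1, n - 1):
        A[i, i - 1] = -v / dx - D / dx**2       # upwind advection + diffusion
        A[i, i]     =  v / dx + 2 * D / dx**2
        A[i, i + 1] = -D / dx**2
    A[0, 0] = 1.0; b[0] = m_in                  # inflow: m = m_in
    A[-1, -1] = 1.0; A[-1, -2] = -1.0; b[-1] = 0.0  # outflow: dm/dx = 0
    return np.linalg.solve(A, b)

n, dx, v, D = 200, 0.5, 1.0, 0.05
m0 = solve_moment(v, D, dx, n, np.zeros(n), 1.0)    # k = 0: no source
m1 = solve_moment(v, D, dx, n, 1 * m0, 0.0)         # k = 1: source = k m0
# the mean arrival time m1/m0 grows essentially like x/v along the column

One steady-state solve per moment thus replaces the transient transport simulation and the subsequent integration over time.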

Chapter 4

Heterogeneity in Nature and Model

Chapter 2 discussed how to identify model parameters like the hydraulic conductivity under uncertainty. Large portions of this uncertainty originate from the heterogeneity of aquifers, which cannot be resolved in arbitrary detail. This chapter approaches heterogeneity from a different angle: Heterogeneity of physical properties propagates onto flow and transport in groundwater and changes their character. In Section 4.1, I review how heterogeneity affects groundwater flow and solute transport. Section 4.2 discusses existing methods to include the effects of unresolved heterogeneity in computational models.

4.1 Solute Transport in Heterogeneous Aquifers

4.1.1 Transport Processes

A solute in groundwater is typically subject to the following processes: advection, molecular diffusion and dispersion (see, e.g., Bear, 1972 or Scheidegger, 1961 [5, 86]). For the moment, chemical transformation or mass transfer between different phases is not considered. However, effective reaction rates in heterogeneous media are often controlled by dispersive mixing. Therefore, the correct quantification and characterization of dispersion is a vital necessity when predicting reactive transport in the subsurface.

Advection is the motion of the dissolved molecules with the fluid motion. Molecular diffusion is the random displacement of solute molecules due to Brownian motion. Dispersion is the spreading of a solute cloud in fluids caused by irregular motion patterns of advection. The following two sections deal with the phenomena of dispersion in heterogeneous aquifers.

4.1.2 Flow in Heterogeneous Porous Media

In heterogeneous porous media, fluid motion by advection is irregular in direction and magnitude because the medium in which the fluid moves is heterogeneous. Figure 4.1, (a) to (e), zooms from the smallest to the largest scale of aquifer heterogeneity and shows the consequences for groundwater flow. For the sake of simplicity, a porous non-fractured aquifer is considered.


(a) On the smallest relevant scale, aquifers consist of single pores. Inside each pore, the velocity observed along the cross-section shows certain profiles due to friction and shear forces. When conceptualizing pores as tubes or slits with laminar flow, the velocity profile inside the pores is parabolic.

(b) Different pore throats vary in size. A larger aperture allows higher velocities.

(c) The flow path through the pores is tortuous, since the fluid has to flow around grains of varying size and geometry.

(d) The Darcy scale is the scale at which the geometry of single pores becomes irrelevant and aquifers may be seen as continua described by their hydraulic conductivity. Local variations in the aquifer material with differing hydraulic conductivity result in spatial variations of velocity.

(e) The geologic processes that generate aquifers produce spatial patterns of differing aquifer materials at larger scales, e.g., sedimentation patterns with gravel, sand or clay lenses.

Figure 4.1: Heterogeneity from pore scale to regional scale

4.1.3 Dispersion of Solutes in Porous Media

Figure 4.2 illustrates the effects of the irregularity of advection on solute plumes: First, the center of mass of the plume meanders along a curvilinear trajectory oriented in the direction of macroscopic mean flow. The meandering is caused by large-scale variations in the velocity field. Second, due to variations on scales smaller than the plume, the plume outline becomes increasingly stretched and distorted, and develops finger-like features. This mechanism is often called spreading. Third, the increased interface amplifies the effects of diffusion and leads to dilution. The sum of these three mechanisms is referred to as dispersion. Some definitions of dispersion account for only the last one or two mechanisms.

Meandering of the plume center is a purely advective process. The same is true for the distortion of the plume outline, often called spreading. However, merely changing the position and the shape does not change the volume of fluid occupied by the plume. The process of diffusion, supported by the elongated interfaces of the spreading plume, does lead to an increase of the fluid volume occupied by the plume.


Figure 4.2: Dispersive mechanisms

The latter effect is called dilution. Dilution and spreading are not to be confused. Dilution reduces peak concentrations within the plume and allows adjacent plumes to mix and undergo chemical reactions. Spreading does not lead to mixing, but increases the driving forces that lead to dilution. Mistaking spreading for dilution leads to a drastic overestimation of effective reaction rates in heterogeneous media. The degree of dilution can be quantified using the dilution index according to Kitanidis (1994) [55].

The definition of a macroscopic mean flow direction and velocity introduces a differentiation between longitudinal and transverse components. Since different effects contribute to the longitudinal and transverse components of dispersion, dispersion is in general anisotropic. The longitudinal effects are in most cases larger than the transverse ones. The irregular motion of the center of mass contributes both to the longitudinal and the transverse direction. For large travel distances, however, the transverse meandering averages out and diminishes in relevance. Spreading occurs mainly in the longitudinal direction and predominates over diffusion in the longitudinal direction. Since diffusion acts orthogonal to the plume interface, its boost takes effect mainly in the transverse directions. Cirpka and Attinger (2003) [11] quantify dispersion based on a macroscopic mean flow that changes direction and magnitude over time. In that case, the difference between longitudinal and transverse components is reduced.

4.1.4 The Method of Moments

The method of moments is used in many fields of engineering. For many purposes in mechanics, for example, solid bodies are described by their total mass, their center of gravity and their moment of inertia. These quantities correspond to the zeroth, first and second central spatial moment of mass, respectively. The same holds for statistics, where random samples are often characterized by


the total number, mean value and variance. In any case, the second central moment quantifies the spread around the center.

Spatial Moments of Solute Clouds

Applied to solute plumes in groundwater, the zeroth spatial moment is the total mass of solute. It does not change unless the solute undergoes chemical transformations, decay or mass transfer between phases. The first spatial moment of the solute cloud, normalized by the zeroth, represents the position of the center of mass. The second central spatial moments are a measure of the spread or dispersion around the center of mass.

For dispersion theory, the second central moments play a major role. In an ensemble of random conductivity fields, the center of mass of the solute cloud differs from realization to realization. If the second central spatial moment is defined with respect to the ensemble-averaged center of mass of all solute clouds, it includes the uncertainty in predicting the position of the center of mass. Defined with respect to the actual center of mass in a single realization, it only captures spreading and dilution. For a plume with an initial size of zero, spreading does not occur. In this case, the second central spatial moments only quantify dilution. For more details, see Kitanidis (1988) [52].

Temporal Moments of Breakthrough Curves

Instead of conceptualizing solute clouds as a spatial distribution of mass recorded at a snapshot in time, another powerful method is to observe time series of bypassing concentration at fixed positions. These time series are called breakthrough curves. Local breakthrough curves have a point-like support volume. They stand in contrast to breakthrough curves that are observed as averages over larger cross-sectional areas perpendicular to the flow direction, in which the small-scale variability of the plume is averaged out. Figure 4.3 shows an example of a breakthrough curve and its temporal moments. It may be the result of a tracer test where a short pulse of tracer was injected in an injection well and then observed in an observation well further downstream.
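The spatial moments described above are computed from a concentration snapshot by numerical integration. A 2-D Python sketch with a synthetic, hypothetical plume:

import numpy as np

def spatial_moments(c, x, y):
    # zeroth, first and second central spatial moments of a 2-D
    # concentration snapshot c[i, j] sampled at x[i], y[j]
    dx, dy = x[1] - x[0], y[1] - y[0]
    M0 = c.sum() * dx * dy                      # total mass (per unit depth)
    xc = (c.sum(axis=1) @ x) * dx * dy / M0     # center of mass
    yc = (c.sum(axis=0) @ y) * dx * dy / M0
    Mxx = (c.sum(axis=1) @ (x - xc)**2) * dx * dy / M0
    Myy = (c.sum(axis=0) @ (y - yc)**2) * dx * dy / M0
    return M0, (xc, yc), (Mxx, Myy)

# synthetic Gaussian plume: the second central moments recover its variances
x = np.linspace(0, 50, 251); y = np.linspace(0, 20, 101)
X, Y = np.meshgrid(x, y, indexing="ij")
c = np.exp(-(X - 30)**2 / (2 * 4.0) - (Y - 10)**2 / (2 * 1.0))
print(spatial_moments(c, x, y))   # Mxx ~ 4.0, Myy ~ 1.0

Evaluated for an ensemble of realizations, the growth of these moments about the ensemble-averaged center of mass versus the actual center of mass distinguishes the two interpretations discussed above.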

Figure 4.3: Local breakthrough curve associated with a pulse-like injection and its temporal moments


Breakthrough curves can be characterized by their temporal moments. A thorough discussion in the context of dispersion in heterogeneous media is provided by Cirpka and Kitanidis (2000) [13]. The mathematical definition and a short description of the physical meaning are summarized in Chapter 3. As in the case of spatial moments, the second central temporal moment has a different meaning depending on the point in time with respect to which it is evaluated. The relevant aspect for my thesis is the following: The second central temporal moment of a local breakthrough curve evaluated with respect to the local arrival time is a measure of dilution if the initial size of the plume is zero. It is a measure of dilution and spreading if the initial size of the plume is non-zero. In contrast to point-like observations, breakthrough curves observed at a control plane include the variance of arrival time at different positions on the control plane. The difference between local and averaged information is illustrated most graphically by Cirpka and Nowak (2003) [15].

4.2 Dispersion in Computational Models

This section discusses methods to deal with dispersion in computational models of solute transport. The process of dispersion originates from aquifer heterogeneity, which only in exceptional cases is fully captured by the model parameters. If computational models represent heterogeneity only in part or not at all, they underestimate dispersion. Three basic concepts exist to make up for this deficiency. (1) Macrotransport theory (Section 4.2.1) replaces heterogeneous systems by macroscopically equivalent homogeneous systems and parameterizes the unresolved heterogeneity to represent dispersion as a diffusion-like process. (2) Inverse modeling (Section 4.2.2) resolves heterogeneity in as much detail as allowed by the available data. Still, the resulting parameter fields are less heterogeneous than nature and under-predict dispersion. The method of generating conditional realizations (Section 4.2.3) takes over from inverse modeling and simulates the unresolved heterogeneity in a random fashion. (3) Other approaches (Section 4.2.4) likewise take over from inverse modeling but then parameterize the unresolved heterogeneity in a fashion similar to macrotransport theory.

4.2.1 Macrotransport Theory

Macrotransport theory uses an upscaling approach to represent the behavior of microscopically heterogeneous systems on the macro scale. Heterogeneity below the macro scale is averaged out, resulting in a macroscopically equivalent homogeneous system. The effects of unresolved heterogeneity are represented on the macro scale. A well-known example of upscaling is Darcy's law, although most textbooks do not refer to it as such. Darcy's law considers porous media on a scale sufficiently larger than single pores, called the Darcy scale. On that scale, it represents the properties of the porous medium by an effective hydraulic conductivity value. The resulting Darcy velocity is only valid if the flow process is considered at the Darcy scale or on larger scales.

Conceptually, macrotransport theory characterizes advection in heterogeneous media by the rate of growth of the first spatial moment of a solute cloud. It describes dispersion by the rate of growth of the second central spatial moment. The local values of hydraulic conductivity are replaced by a macroscopic value such that the rate of growth of the first spatial moment is met. Dispersion is simulated by the virtual diffusion-like process of macrodispersion. It follows Fick's laws, where the so-called macrodispersion tensor takes the position of the diffusion coefficient. The value of the macrodispersion tensor is chosen such that the second central moments of the plume grow with the
expected rate. The components of the tensor, especially the longitudinal one, are much larger than the molecular diffusion coefficient.

The diffusion-like character of macrodispersion in computational models must not be confused with actual diffusion and dilution (e.g. Kitanidis, 1988 [52]). Macrodispersion as a diffusive process is merely a virtual phenomenon created by the upscaling procedure; it exists in this form only in macroscale models. Upscaling groundwater flow from the Darcy scale to the macro scale results in macroscopic flow: the character of the process remains the same, only with a different parameter value. Upscaling solute advection, in contrast, results in a corresponding macroscopic advective process plus the process of macrodispersion. The latter is diffusion-like although it originates from the impact of heterogeneity on the process of advection.

Scale Dependency of Upscaled Parameters

In modeling, many scales may be relevant:

• At the Darcy scale, a system of interconnected pores is replaced by an equivalent continuum described by the hydraulic conductivity. At this scale and above, Darcy's law holds.
• The grid scale is set by the grid spacing of a computational model.
• The characteristic scale of the structures in heterogeneous media is often described by the integral scale or correlation length in geostatistical considerations.
• The macro scale is the scale at which the user of a computational model decides to resolve heterogeneity, parameterizing the heterogeneity on the scales below.
• The scale relevant for transport considerations is given by the initial size or the travel distance of a plume.
• The observation scale is given by the support volume of observations or the spacing in between.
• The experimental scale, i.e., the size of a field or laboratory setup, sets a limit to the maximum scale that may occur.

If the relation between different scales is not considered at all levels of modeling, serious conceptual errors occur. Self-evidently, it is useless to try to characterize small-scale variability by measurement techniques with a large-scale support volume or with large spacing in between. Everyone would agree that it is useless to predict microscopic fluid velocities by Darcy's law. In the same manner, the concentrations predicted by macrotransport theory are valid only on the macro scale. Macrodispersion is a process that always leads to smooth, idealized shapes of plumes. If the experimental scale or the scale of observation and heterogeneity is smaller than the macro scale, macrotransport theory cannot be used to predict the outcome of an experiment in more than a stochastic manner.

Further, the relation between single scales may affect the value of macroscopic parameters. This caused some confusion in the late 1970s and the 1980s. In the natural-gradient tracer experiment at the Borden site, Ontario, Sudicky and Cherry (1979) [93] showed that macrodispersion depends on the distance the tracer covered from injection to observation. Anderson (1979) [2] observed the difference in macrodispersion between small-scale laboratory tests and field-observed plumes, and its dependency on the scale considered in field experiments. The bottom line of these observations is that the amount of unresolved heterogeneity increases with the size of the scales involved. Hence, the resulting macroscopic dispersion parameters increase with the scale considered.
If the velocity field is fully resolved down to almost the molecular level, only molecular diffusion needs to be considered. On the Darcy or model grid scale, the corresponding parameter is called the local dispersion coefficient. The dispersion coefficients on the macro scale are discussed in the following.


Macrodispersion and Effective Dispersion

The macrodispersion tensor describes the dispersion of an ensemble of plumes around the ensemble center of mass, characterized by the second central spatial moments. Figures 4.4 (a) and (b) illustrate realizations of plumes in different realizations of a heterogeneous medium, visualized by a flow net. Subfigures (c) and (d) show the averages over an ensemble of ten realizations. Given a stochastic description of aquifer heterogeneity in the form of a covariance function, the macrodispersion tensor may be computed using linear stochastic theory (see, e.g., Gelhar and Axness, 1983, or Neumann et al., 1987 [35, 72]).

Dagan (1988) [19] showed that the macrodispersion coefficient is travel-time dependent. His basic approach is to consider the uncertainty of the position of a particle, integrated along the macroscopically expected trajectory. The uncertainty is obtained by error propagation, starting from the covariance function of hydraulic conductivity and proceeding to the hydraulic head, the velocity field and finally the position of the particle. The travel-time dependency is explained by the fact that the particle has to cover some distance before it has sampled the complete range of possible heterogeneity. After a certain relaxation time depending on the integral scale of the aquifer, the macrodispersion tensor reaches a large-time limit. With smaller integral scales, the asymptotic value is approached faster.

If a plume is sufficiently large, ensemble statistics hold for a single plume, which is referred to as ergodicity. That is, for plumes sufficiently larger than the integral scale of an aquifer, the macrodispersion tensor adequately characterizes dispersion for a single plume in a single realization. Most plumes observed in field studies, however, are too small to satisfy the condition of ergodicity. The second central moments used to define the macrodispersion tensor are evaluated with respect to the ensemble center of mass. Hence, the macrodispersion tensor combines two effects: the average dispersion of plumes around their respective centers of mass, and the inter-realization variation of the center of mass (Kitanidis, 1988 [52]). The latter often dominates over the dispersion in a single realization. For non-ergodic plumes, macrodispersion overestimates the dispersion of plumes in single realizations (see, e.g., Molz and Widdowson, 1988 [71]). Figures 4.4 (c) and (d) show this effect: the macroscopic ensemble-average dispersion, visualized by the 95% contours of the plumes, is more drastic for the small plume than for the large plume.

Effective dispersion describes only the dispersion of plumes around their respective centers of mass (e.g. Kitanidis, 1988, or Rajaram and Gelhar, 1993 [52, 77]). Before taking the ensemble average, the differing positions of the respective centers of mass are subtracted from all plumes. The difference between macrodispersion and effective dispersion is larger for smaller plumes, as exemplified in Figures 4.4 (e) and (f). The center of mass of each plume is marked by small black circles. Apparently, not considering the uncertainty associated with the center yields much less dispersion in the ensemble average for the small plume. For the large plume, the difference is significantly smaller, and for ergodic plumes, the difference vanishes. Dentz and coworkers derived closed-form expressions for the effective dispersion tensor using an Eulerian approach for a plume that is injected at a point (Dentz et al., 2000a [22]).
The same group (Dentz et al., 2000b [23]) extended the method to plumes that are injected over a larger volume. Fiori and Dagan (2000) [28] found identical expressions based on a Lagrangian approach. The effective dispersion tensor approaches the same large-time limit as the macrodispersion tensor. The asymptotic approach, however, is much slower, so that effective dispersion is much smaller than macrodispersion for most of the time. Since the macrodispersion tensor includes, besides dilution, spreading and the uncertainty of the center of mass, it leads to an overestimation of reaction rates in heterogeneous media for reactions that are controlled by dispersive mixing. Cirpka and Nowak (2003) [15] claim that the effective dispersion of a point-like injection quantifies only dilution, and hence can be used to accurately predict mixing
and reaction rates for reactive transport. This claim has been verified in numerical analyses by Cirpka (2002) [10]. The latter study shows, however, that the amount of spreading around the center of mass is under-predicted when this approach is applied to plumes with a larger initial size.

Figure 4.4: Macrodispersion, effective dispersion and plume size

Definition via the Method of Moments

Since the second central spatial moments and the second central temporal moment quantify the dispersion of solute clouds, they are often used to define dispersion coefficients in the literature. The macrodispersion tensor is defined as half the rate of change of the second central spatial moments normalized by the zeroth spatial moment. In this context, the second central moment is defined with respect to the macroscopically expected center of mass of the solute cloud. Alternatively, its longitudinal component is defined via the spatial derivative of the second central temporal moment observed at control planes orthogonal to the direction of macroscopic flow. The effective dispersion tensor is defined in the same manner. However, the center of mass considered for the second central spatial moment is the actual center of mass of the solute cloud, and for temporal moments, local breakthrough curves are considered, i.e., curves observed at a point rather than averaged over control planes.
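For reference, the verbal definitions above can be written compactly. The following display is a reconstruction in standard notation (compare Kitanidis, 1988 [52]), not a quotation of the formal definitions in Chapter 3; c(x, t) denotes the concentration distribution and x̄(t) the center of mass:

M^{(0)} = \int_\Omega c(\mathbf{x},t)\,\mathrm{d}\mathbf{x} , \qquad \bar{x}_i(t) = \frac{1}{M^{(0)}} \int_\Omega x_i\, c(\mathbf{x},t)\,\mathrm{d}\mathbf{x} ,

M^{(2c)}_{ij}(t) = \int_\Omega \left( x_i - \bar{x}_i \right)\left( x_j - \bar{x}_j \right) c(\mathbf{x},t)\,\mathrm{d}\mathbf{x} , \qquad D_{ij}(t) = \frac{1}{2}\, \frac{\mathrm{d}}{\mathrm{d}t} \left( \frac{M^{(2c)}_{ij}(t)}{M^{(0)}} \right) ,

with x̄ taken as the macroscopically expected center of mass for the macrodispersion tensor, and as the actual center of mass of the individual plume for the effective dispersion tensor.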


Influence of Local Transverse Dispersion

Transverse local dispersion and diffusion lead to longitudinal effective dispersion on larger scales. I refer to this phenomenon as aliasing. Observed at the micro scale, diffusion is isotropic: there is no difference between longitudinal and transverse diffusion, and the diffusion coefficient is a scalar quantity. However, if one of two tagged solute particles diffuses transversely from one streamline to an adjacent streamline with a different velocity, the two particles separate longitudinally. The more transverse contrasts exist in the flow field, the more pronounced the effect of aliasing and the stronger the longitudinal separation of the particles. In macroscale models, there are no transverse contrasts in the velocity field, so local transverse dispersion is aliased into longitudinal effective dispersion. The more transverse contrasts occur in the original velocity field, the higher the required longitudinal component of the dispersion tensor at the macro scale.

I consider the generating equation for the second central temporal moment (eq. 3.13) to analyze the influence of transverse local dispersion on longitudinal effective dispersion. The source term is a quadratic form involving the gradient of the first temporal moment and the local dispersion tensor. Without transverse local dispersion and diffusion, transverse gradients in the first temporal moment do not contribute, and only longitudinal local dispersion and diffusion lead to longitudinal effective dispersion. If transverse diffusive or dispersive exchange is allowed for, transverse gradients in the first temporal moment boost longitudinal effective dispersion.

There is an indirect, counter-intuitive effect, however, that influences the relation between transverse local and longitudinal effective dispersion. It can be seen from the generating equation for the first temporal moment (eq. 3.11) that strong transverse local dispersion suppresses transverse gradients in the first temporal moment. This has the potential to reduce the value of the source term and hence to reduce longitudinal effective dispersion. Above a certain value, this effect becomes dominant over the effects of aliasing. I provide a numerical test series on this in Section 10.2.3. For macrodispersion, Dagan (1988) and Gelhar and Axness (1983) [19, 35] have shown that a higher value of the local dispersion tensor and molecular diffusion leads to a lower magnitude of macroscopic longitudinal dispersion: by smoothing out transverse contrasts in concentration, the development of sharp and long fingers of plumes is suppressed. Since the effective and the macrodispersion tensor approach the same large-time limit, this confirms my considerations at least in the large-time limit.

Advantages and Disadvantages

The advantage of macrotransport theory is that analytical solutions for the dispersion coefficients have been found. Since the dispersion coefficient merely has to be added to the diffusion coefficient that already exists in most transport models, the additional computational costs are virtually nonexistent. An inherent disadvantage of the underlying upscaling approach is that predictions from macrotransport theory hold only for ensemble averages or for averages over large volumes. Another drawback is that no actual data can be used to improve the predictions by partially resolving heterogeneity, since doing so would interfere with the upscaling approach.

4.2.2 Geostatistical Inverse Modeling

Conditioning, or geostatistical inverse modeling, uses observed data to resolve heterogeneity as well as possible (see Chapter 2). Large-scale features are resolved better than small-scale features,
depending on the spacing and support volume of the observations. Conditioning reduces the low-frequency components in the covariance functions that describe random spatial fields such as conductivity. The remaining unresolved heterogeneity has a high-frequency, i.e., small-scale, nature. Quite obviously, conditioned hydraulic conductivity fields show a lack of variability and hence under-predict dispersion. A lucid explanation is provided by Cirpka and Nowak (2003) [15]. A general rule in hydrogeology says to “never use a kriged conductivity field for transport modeling”.

Figure 4.5: Sampling density and dispersion in estimated conductivity fields

Figure 4.5 illustrates this effect. Subfigure (a) shows an artificial heterogeneous conductivity field together with the corresponding flow net and snapshots of a plume introduced instantaneously along the left boundary. Flow is from left to right. Subfigures (b) to (d) show conductivity fields recovered by geostatistical inverse modeling with their corresponding flow nets and snapshots. The white circles with black dots mark locations of observations of conductivity, head and arrival time each. As the spacing of observations increases, fewer features of the conductivity field are resolved. The corresponding flow fields become more and more regular and cause less spreading and dilution of plumes. The lack of heterogeneity and the resulting under-prediction of dispersion has to be made up for by one of the methods discussed in the following.


4.2.3 Conditional Realizations

One approach to make up for the lack of variability in the conditional mean hydraulic conductivity field is to generate conditional realizations. Direct methods to generate conditional realizations are the Pilot Point Method by RamaRao et al. (1995) [78], the method of Sequential Self-Calibration by Gómez-Hernández et al. (1997) [37] and the spectral method by Dietrich and Newsam (1996) [26]. An indirect method that first computes the conditional mean and then adds random fluctuations is the Quasi-Linear Geostatistical Approach by Kitanidis (1995) [56]. A minimal sketch of the conditioning step follows at the end of this subsection.

Advantages and Disadvantages

The advantage of generating conditional realizations is that actual flow-related measurement data are taken into account to resolve heterogeneity. This reduces the remaining unresolved variability and hence leads to smaller uncertainty of predictions. The conditioning procedure ensures that each realization is equivalent to its natural pre-image with respect to the observed quantities at the measurement locations. The disadvantage is that single realizations have no predictive power. Instead, an ensemble of realizations has to be generated. In order to adequately capture the variability of the entire population, many realizations are required, leading to high computational costs. Flow-related properties are met by each single realization as far as the data are concerned. Some versions also consider transport-related data as information on advection and use them to condition the flow field. Transport-related quantities like dilution or spreading, however, are not included in the conditioning procedure. Predictions of these quantities can only be made based on the ensemble statistics of the conditional realizations, and hold only for the ensemble statistics of the aquifer.
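The following is a minimal sketch of conditioning by kriging, deliberately simplified to direct, noise-free measurements of the parameter field itself with a known zero mean; the methods cited above handle the far harder case of indirect data. All grid sizes, locations and values are illustrative assumptions.

```python
import numpy as np
rng = np.random.default_rng(2)

# 1D exponential covariance on a fine grid, five direct log-conductivity data
x = np.linspace(0.0, 1.0, 200)
Q = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)
idx = np.array([20, 60, 100, 140, 180])          # measurement locations
y = np.array([0.5, -1.0, 0.3, 1.2, -0.4])        # observed values

L = np.linalg.cholesky(Q + 1e-10 * np.eye(len(x)))
s_u = L @ rng.standard_normal(len(x))            # unconditional realization

# correct the realization by kriging its mismatch at the data locations
Qyy = Q[np.ix_(idx, idx)]
Qsy = Q[:, idx]
w = np.linalg.solve(Qyy, y - s_u[idx])
s_c = s_u + Qsy @ w                              # conditional realization

assert np.allclose(s_c[idx], y)                  # honors the data exactly
```

The same identity underlies the indirect methods: an unconditional realization is corrected by the (co)kriged interpolation of its mismatch at the data locations, so the result both honors the data and retains small-scale variability away from them.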

4.2.4 Combined Approaches

Another approach to include the unresolved variability of conditional mean hydraulic conductivity fields is to parameterize it in a fashion similar to macrotransport theory. Since the resulting parameters only correct for the amount of unresolved variability, Cirpka and Nowak (2003) called them corrective dispersion coefficients. Three approaches exist in this field.

Rubin's (1991) [82] starting point is the approach by Dagan (1988) [19]. Instead of integrating the uncertainty of the position of a particle along the macroscopically expected trajectory, Rubin uses the conditional uncertainty and the conditional trajectory. The conditional uncertainty, which accounts for measurements of hydraulic conductivity, hydraulic head and velocity, is smaller than the unconditional uncertainty. Hence, the resulting dispersion coefficient is smaller. This method is rigorous in its analysis, but has huge computational costs, since multiple integrals have to be evaluated along all trajectories of interest.

Rubin et al. (1999) [84] presented a generalized approach that covers unresolved sub-grid-scale heterogeneity. The underlying assumption is that inverse modeling captures all heterogeneity above the grid scale or on the scale of certain larger blocks. The computational effort is kept small. While the original publication covers only macrodispersion, Rubin et al. (2003) [83] extended the approach to effective dispersion. However, the method does not account for the remaining uncertainty on the block or grid scale.

Cirpka and Nowak (2003) [15] make certain simplifications that are admissible if the observations used in conditioning lie on a regular grid. This allows a simplified expression for the conditional covariance of hydraulic conductivity fields. The conditional covariance is then used
as input for the approaches by Dagan (1988) [19] and Dentz et al. (2000) [22] to obtain dispersion coefficients that cover the remaining lack of variability after conditioning. Originally designed for kriged conductivity fields, the method can easily be extended to cokriged fields.

Aliasing Revisited

At this point, I need to come back to the aliasing effect discussed in the previous section. When defining corrective dispersion coefficients, the extent of aliasing occurring on the conditioned field depends on the sharpness of the transverse contrasts that are reproduced by the conditional field. The more the available data allow reproducing transverse contrasts, the more the transverse components of corrective dispersion approach the value of local dispersion. That is, the anisotropy and character of corrective dispersion depend on how well the available data resolve transverse contrasts. A case study in Section 10.2.3 illustrates this property.

The only way to directly characterize transverse features of flow and transport is to install a dense, curtain-like array of multilevel observation wells. For this purpose, the spacing between the sampling ports has to be sufficiently small to resolve even small-scale variability. In most field situations, the costs of such sampling arrays make this option entirely infeasible. Aliased longitudinal effects can be observed as longitudinal spreading in single breakthrough curves from tracer tests. However, the aliased transverse effects are merged with other longitudinal effects in breakthrough curves. Given breakthrough curves and a conditional flow field, the longitudinal dispersion observed in the breakthrough curves could be attributed to longitudinal corrective dispersion alone, to transverse local dispersion alone, or to arbitrary combinations between these two extremes. The degree of anisotropy of the dispersion tensor cannot be concluded from breakthrough curves.

Advantages and Disadvantages

The advantage of these combined methods is that they make it possible to combine conditioning and macrotransport theory. Specific advantages and disadvantages of each method have been covered above. A common disadvantage is that, again, transport predictions hold only for the ensemble average of the aquifer.

Chapter 5

Approach

5.1 The Thesis

In the upscaling of groundwater flow in heterogeneous media, one obtains an equivalent homogeneous medium characterized by a macroscopic conductivity parameter. The equivalence refers to flow-related considerations and is valid in a stochastic sense. The homogeneous medium, however, leads to an under-prediction of transport-related quantities like spreading and dilution.

Macrotransport theory shows clearly that, in transport considerations, heterogeneity of the flow field is inseparably coupled to dispersion. The associated upscaling procedure for advective transport not only produces the process of macroscopic advection; additionally, it produces the process of macrodispersion with its own new scale-dependent dispersion coefficient. The resulting homogeneous medium is characterized by a conductivity, and dispersion of solute clouds in the homogenized medium is characterized by a dispersion coefficient. In total, the system behavior is equivalent to that of the heterogeneous medium for both flow- and transport-related considerations. Again, this holds in a stochastic sense.

Conditioning can be interpreted in analogy to upscaling: the heterogeneous medium is replaced by an equivalent smoother medium. The scale considered and the smoothness, however, are set by the available data. They are rather ill-defined compared to the clearly defined scale and homogeneity used in upscaling procedures. Another difference is that the equivalence is defined with respect to the observed quantities at given locations. The interpolation of state variables and parameters between the observations holds for the ensemble average.

Most existing methods for geostatistical inverse modeling consider only the process of flow. The unknown parameter is usually hydraulic conductivity, and observations are quantities such as measurements of conductivity or hydraulic heads. If tracer data are involved, mostly the arrival time is used. In this context, arrival time is interpreted as a flow-related quantity that mainly characterizes the velocities in the flow field. As a consequence, the resulting smoothed medium is equivalent to the heterogeneous medium for flow considerations. However, it leads to under-prediction of spreading, dilution and mixing if used for transport considerations. Under-predicting the mixing of solutes leads to under-prediction of effective rates for chemical reactions that are controlled by mixing. This is a clear shortcoming of existing methods for geostatistical inversing.

From the above comparison, I conclude that existing methods of geostatistical inverse modeling are similar to upscaling the process of groundwater flow. The resulting smoother medium is described by the flow-related parameter of conductivity, conditioned on flow-related quantities. Hence, one
must not expect it to be equivalent for transport. In order to obtain an equivalent medium for transport and to overcome the shortcoming of current methods for geostatistical inversing, the process of transport has to be included in the conditioning procedure. Following this rationale, I formulate my thesis:

Geostatistical inversing produces smoothed media equivalent to the respective heterogeneous media for both flow and transport only if the processes of flow and transport are both considered in the conditioning procedure. The medium must be described by parameters relevant for flow as well as for transport, conditioned on observations of quantities related to both flow and transport.

5.2 Proposed New Method

5.2.1 Outline of the Proposed New Method

Based on the above thesis, I now propose a new geostatistical method to identify flow and transport parameters in the subsurface. The outline of the new method is:

1. The unknown parameters are identified using a Bayesian geostatistical inverse approach to flexibly describe heterogeneous aquifers while rigorously quantifying parameter uncertainty.

2. Both hydraulic conductivity and the dispersion coefficient are unknown parameters, characterizing both the flow-related properties of heterogeneous aquifers and the properties of transport therein.

3. The unknown parameters are conditioned on measurements of flow- and transport-related quantities, such as hydraulic quantities and temporal moments of breakthrough curves.

The following is a closer look at the core points:

Ad 1: In Chapter 2, I recalled that the uncertainty of parameters must be quantified rigorously; otherwise, the output of computational models is useless for predictive purposes. Geostatistical methods outclass deterministic approaches in describing the heterogeneity of aquifers while minimizing deterministic structural assumptions. Within the group of geostatistical methods, Bayesian methods are superior in quantifying parameter uncertainty. Among the latter, I chose the Quasi-Linear Geostatistical Approach by Kitanidis (1995) [56]. Other methods may be more robust without additional modifications; however, they cannot be sped up to an acceptable level of computational costs for the purpose of this thesis. The main reasons for choosing the Quasi-Linear Geostatistical Approach were that both its computational efficiency and the stability of its underlying optimization algorithm can be improved to fit the purposes of the proposed method.

Ad 2: Past applications of geostatistical inverse modeling concentrated on characterizing the properties of heterogeneous aquifers relevant for flow. They mostly feature a scalar, i.e. isotropic, log-conductivity coefficient Y = log K. The concept of identifying a dispersion coefficient by geostatistical inverse modeling is new. Similarly to the conductivity coefficient, I choose the new unknown to be an apparent scalar log-dispersion coefficient Ξ = log Ds. This choice is based mainly on the availability of data and information. Before discussing and justifying these properties, I clarify the character of the available information in Section 5.2.3. If necessary, other transport-related parameters characterizing sorption, reaction rates, etc., could be included. However, I restrict the current work to the dispersion coefficient.


Ad 3: Since the hydraulic conductivity and a dispersion coefficient are to be identified, the available data have to carry information on both parameters. Measurements of quantities such as the hydraulic conductivity, hydraulic heads, discharge across control planes, local velocities, etc., carry information on the hydraulic conductivity. In this thesis, I consider measurements of conductivity, heads, and total discharge. Measurements of solute concentrations carry information on transport-related processes. Here, I consider temporal moments of local breakthrough curves. The reasons for this choice are given in the next section.

5.2.2 Temporal Moments of Local Breakthrough Curves

In order to consider transport in the conditioning procedure, information on solute transport must be collected and included. This can be achieved by performing tracer tests. In a tracer test, a solute is injected into the aquifer at a certain location, and the evolving distribution of concentration is observed. Unfortunately, concentration data in the subsurface can mostly be measured in observation wells only. The spatial distribution of solute concentration is hard to determine unless an expensive, dense network of observation wells is installed. However, each observation well delivers breakthrough curves of solute concentration, i.e., time series of bypassing solute concentration. The temporal resolution is almost arbitrarily fine, especially if in-situ detectors and data loggers are used.

Breakthrough curves carry a lot of insignificant information. The values of concentration at closely spaced times are highly correlated. Furthermore, the data before the first measurable impact of the tracer carry only the information that the tracer has not arrived yet. This makes all data points prior to the first breakthrough redundant. Similarly, for times after the full breakthrough, the only information conveyed is that the tracer front has fully passed by. It is desirable to reduce the amount of redundant information in the data, because the computational effort of conditioning rises at least with the square of the number of observations. Moreover, redundant data can cause conditioning procedures to stagnate.

Using temporal moments of breakthrough curves remedies these problems. The lower temporal moments, especially the first and the second central, summarize the most significant information on conductivity and the dispersion coefficient. This fact is reflected in the definition of macroscopic parameters: the macroscopic velocity is defined via the spatial derivative of the first temporal moment. The definition of dispersion coefficients based on the second central temporal moment is described in the previous chapter, Section 4.1.4.

Another advantage of temporal moments of breakthrough curves is that they can be computed directly via moment-generating equations. The generating equations for temporal moments are steady-state equations (see Chapter 3). Therefore, the computational costs are drastically smaller than those for solving for the entire concentration history using the transient advection-dispersion equation. A critical aspect is that higher-order temporal moments, like the second central temporal moment, are subject to numerical error in the moment-generating equations and are sensitive to noise in the measured time series. This requires extra attention when solving the generating equations and when evaluating moments from measured curves.

My thesis is placed within a project that features laboratory-scale experiments on effective dispersion, dilution and mixing. Their purpose is to verify the dispersion models from stochastic linear theory on realistic data sets, yet under controlled conditions. The proposed new method is to be applied to these data sets. Hence, the dispersion parameter of interest for my thesis is the effective dispersion coefficient as defined by Dentz et al. (2000a) [22]. The second central temporal moments from observed breakthrough curves can be used to define the effective dispersion coefficient if the breakthrough curves are measured at point-like locations.
Larger support volumes implicitly average out heterogeneity in the concentration distribution and include the variance of arrival time within the support volume, rendering the data useless for this purpose: such data would quantify macrodispersion instead of effective dispersion. Further, the solute cloud has to be injected with an initial volume or width of zero, or at least with a perfectly known initial spread, so that the amount of dispersion observed in the breakthrough curves comprises only dilution and does not include spreading. Neither theoretical requirement is entirely realizable in practical experiments, but the deviations from theory can be minimized. In the theses of project group members published in parallel with my thesis (Jose, 2004, Rahman, 2004 [48, 76]), an adequate experimental setup is described. Jose and Rahman use small pointed optical fibers that detect the concentration of a fluorescent tracer directly in single pores of sandy aquifers. Instantaneous injection of a tracer is replaced by continuous injection. As input data for the inverse model, I use the truncated temporal moments of the observed breakthrough curves. Truncated moments are equivalent to the temporal moments corresponding to instantaneous tracer injection (Jose, 2004 [48]).

Identification of Other Transport Parameters

For other applications, other types of dispersion coefficients might be of interest in order to quantify dilution and spreading or even macrodispersion. To identify these coefficients, the data have to reflect the respective averaging procedures, e.g., by using larger support volumes or by averaging point-wise measurements over control sections. If one desires to quantify decay or chemical reactions, observations of the zeroth temporal moment must be included. Sorption can be quantified by using the first temporal moments of a sorbing and a non-sorbing tracer, or by including sufficient hydraulic data to characterize the Darcy velocity independently of the advective velocity. For kinetic mass transfer coefficients between a mobile and an immobile phase, the third temporal moment could be used, since it quantifies the skewness of curves.

5.2.3 Properties of the Dispersion Coefficient

The properties of the new dispersion coefficient are chosen in accordance with the data available for the conditioning procedure. It is senseless to define the dispersion coefficient in a manner that cannot be identified from the available data. Hence, the primary questions for discussing and justifying the properties chosen for the new coefficient must be the following:

• What data can be obtained from experiments?
• What information do the available data convey?
• What characteristics of dispersion can be described, and what kind of coefficient can be identified from the conveyed information at all?

Available Data

Local breakthrough curves are available at a spacing of observations typically determined by cost-benefit considerations. The curves are characterized by their temporal moments for the sake of computational efficiency and stability.


Information Conveyed by the Data

The available data convey information on solute concentration passing by in the longitudinal direction at selected locations. Hence, the data from one observation well convey mainly information on longitudinal aspects of the solute cloud. Most aquifers are anisotropic: their integral scale in the horizontal directions is much larger than in the vertical direction. The consequence for solute transport is that features of the plume outline, like fingers, have a comparatively small transverse extent. Good examples of this effect are the shapes of the plumes displayed in the figures of the previous chapter, see Figures 4.4 and 4.5. The amount of transverse information is solely given by the spacing of observations in the transverse directions. Resolving the vertical transverse direction can be done only to a certain degree by equipping observation wells with multilevel sampling ports at small vertical distances. The characterization of longitudinal features of plumes is less critical than that of transverse features, since there is less variability in plumes along the longitudinal direction. Local breakthrough curves provide reliable and highly resolved information at each point. For transverse dispersion, the vertical spacing of observations is a severely limiting factor. Therefore, the available information most likely suffices to characterize longitudinal dispersion better than transverse dispersion.

Parameter Identifiability

According to the above considerations, the available data quantify longitudinal dispersion much better than transverse dispersion. Yet, the question of what is longitudinal and what is transverse is not trivial. In the previous chapter, I discussed the effect of aliasing: transverse local dispersion is aliased into longitudinal effects at larger scales. Hence, the more the inverse model resolves the transverse variability of the flow field, the more the simulated longitudinal dispersion depends on the magnitude of the transverse dispersion coefficient. That is, the amount of transverse dispersion that has to be simulated by the dispersion coefficient depends on the ability of the data to resolve transverse contrasts. In summary, the ratio between transverse and longitudinal dispersion is unknown and depends on the quality and quantity of available data. Additionally, the transverse and longitudinal components cannot be identified separately from the available information, since the data do not suffice to quantify the transverse component. Therefore, I decide that only a lumped dispersion coefficient can be identified from the data. In the absence of a reason to fix a specific ratio for lumping, I choose a ratio of unity and make the dispersion coefficient a scalar quantity, denoted by the subscript s in Ds.

Properties

With the issue of identifiability resolved, I can summarize and justify the properties of the new dispersion coefficient.

Scalar: I define the new dispersion coefficient as a scalar quantity, since transverse and longitudinal components cannot be identified separately. This issue is investigated in more detail in Chapter 10.

Log-normal: I define the new dispersion coefficient to be distributed log-normally. This choice ensures its non-negativity, as required by the fundamental laws of thermodynamics. The basic assumption behind the parameterization used in macrotransport theory is that the dispersion
coefficients roughly scale with velocity. Conductivity is commonly assumed to be log-normally distributed, so that the resulting velocities are approximately log-normal. Hence, macrotransport theory offers no argument against this choice.

Travel-Time Dependent: Just like other dispersion coefficients, the new dispersion coefficient is travel-time dependent. However, the proposed method cannot explicitly describe this dependency. Since heterogeneity at larger scales is more likely to be resolved by the inverse model, the remaining unresolved heterogeneity has a smaller integral scale than the heterogeneity of the true system. Therefore, the large-time limit is reached faster.

Specific: Due to the dependency on travel time, the new coefficient correctly describes the observed dispersion of solute clouds that are injected in the same manner as the tracer used to obtain the measured breakthrough curves. Furthermore, the dispersion coefficient is valid only together with the conductivity field conditioned jointly on the same data, since it makes up for the specific amount of variability left unresolved by the conductivity field.

Effective: In the applications featured here, the new dispersion coefficient is an effective dispersion coefficient in the sense that it quantifies dilution of solute clouds, but does not describe spreading and the uncertainty in predicting the center of mass. This property is inherited from the input data, i.e., the second central temporal moments of local breakthrough curves. The dispersive mechanisms of spreading and the irregular movement of the center of mass are simulated by the conductivity field jointly conditioned on the same data.

5.2.4 Discretization and Resolution

Discretization Schemes

The governing equations defined in Chapter 3 are partial differential equations. Analytical solutions exist only for homogeneous media and for special cases of heterogeneous media. To evaluate these equations for arbitrarily heterogeneous media, they need to be solved numerically. The accuracy of the numerical solution depends on the type of discretization scheme chosen and on the resolution.

In geostatistics, the parameters of the governing equations are assumed to be random spatial functions. In a computational framework, this translates into a numerical array of discrete values defined on a grid. Alternatively, a functional approach could be chosen, in which the spatial functions are parameterized more effectively than by simple discretization, for example by Fourier coefficients, wavelets, or analytical functionals derived from theoretical considerations. The interface between geostatistics and the numerical solution of partial differential equations, however, requires evaluating the values of the parameters on the grid prescribed by their discretization scheme.

I choose to represent the unknown parameters by discrete values on one regular, equispaced grid both for the geostatistics and for the discretization of the governing equations. The advantage is that no expensive mapping between grids is required at any step. Further, the regular equispaced grid simplifies and speeds up many computations. It allows the use of spectral methods for the basic geostatistical matrix operations, as discussed in Chapter 8. It also simplifies and speeds up the discretization scheme for flow and transport, as I demonstrate in Chapter 9. The discretization schemes I decided to use are described in the same chapter.

Required Resolution

The resolution of the common grid has to satisfy the following criteria:


1. The accuracy of the numerical solution of the governing equations for homogeneous parameter fields depends on the resolution. The grid has to be fine enough to avoid, or at least minimize, numerical error and spurious oscillations.

2. The spatial functions for the parameters have heterogeneous structures with characteristic integral scales. The grid has to be fine enough to resolve these structures. McLaughlin and Townley (1996) [69] follow a geostatistical approach that defines a certain functional space. They argue that only in this manner can the properties be analyzed properly. Still, for numerical evaluation in cases where analytical solutions do not hold, they discretize the functional space on a computational grid, obtaining a vector space. Then, again, the discretization must be fine enough for the vector space to adequately resolve the functional space.

3. The measurements used for conditioning in the final application have certain support volumes. The grid must be fine enough to adequately simulate the system behavior at this scale.

The resolution required to avoid oscillations in the numerical schemes is almost irrelevant compared to the other restrictions. The experiment performed within the overarching project features an artificial heterogeneous sandy aquifer with a vertical integral scale in the range of four to five centimeters. At an absolute minimum of five to ten grid points per integral scale, this requires a vertical grid spacing of at most four to ten millimeters. The optical fibers used to measure the local breakthrough curves have a support volume with an estimated maximum diameter of two or three millimeters. The latter two criteria set the grid scale to the order of a few millimeters.

Allowable Resolution

For a certain larger spacing of the observations, is it allowable to resolve the system on a grid at a much smaller scale? Is it allowable to choose a grid spacing maybe even below the Darcy scale? The answer to both questions is yes, with one restriction. I first neglect the fact that the grid scale might fall short of the Darcy scale and focus on the first question. In fact, I implicitly assume that the system of interest behaves like a continuum even on the grid scale of a few millimeters. The geostatistical approach allows interpolating the physical properties of the system between observed locations and therefore allows resolving the physical properties at arbitrarily small scales. Further, the governing equations describe the physical nature of the processes acting in the system in a continuous manner. Given the parameters down to arbitrarily small scales, they allow evaluating state variables at the same small scale. Hence, it is allowable to resolve the system and its behavior on a grid much finer than the spacing of observations. Due to the nature of the geostatistical approach, the predictions of system properties and behavior between the observations hold for the ensemble average.

Now, the issue of the sub-Darcy scale remains. At the Darcy scale and above, a porous medium may be described by a continuum approach. Strictly speaking, the hydraulic conductivity is not defined below that scale, ergodicity does not hold for the process of flow, and predictions based on Darcy's law are valid only in an effective, ensemble-average sense. As a result, predictions of my inverse model on the sub-Darcy scale hold only for the ensemble average, even at the locations of measurements.

5.3 Detailed Task Description

The proposed new method outlined in Section 5.2.1 is mainly based on the Quasi-Linear Geostatistical Approach by Kitanidis (1995) [56]. However, several steps must be undertaken to extend it and
fit it to the purposes of my thesis and to test and discuss the new method. In detail, the following steps are required:

• The concept of uncertain prior knowledge has to be included in the Quasi-Linear Geostatistical Approach by Kitanidis (see Chapter 2). This applies both to the expected value of the unknown parameters and to the structural parameters that define the auto-covariance of the unknown parameters, see Sections 6.1 and 6.4.

• The optimization algorithm used in the Quasi-Linear Geostatistical Approach fails for highly non-linear applications (see Chapter 2). It has to be replaced by a more stable version, i.e., by introducing a specific Levenberg-Marquardt modification, see Sections 6.2.3 and 6.3.

• The matrix operations performed on the auto-covariance matrix of the unknown parameters are computationally too expensive for well-resolved applications (see Chapter 2). Spectral methods allow these operations to be performed with ease and with highly reduced memory consumption. The list of available spectral methods needs to be complemented to cover all involved matrix operations, see Chapter 8 and the sketch after this list.

• The Quasi-Linear Approach successively linearizes the forward model about the current estimate of the unknown parameters. While the sensitivities of most types of observations with respect to log-conductivity are available in the literature, they need to be derived with respect to the scalar log-dispersion coefficient, see Chapter 7.

• The method will be implemented in the mathematical programming language MATLAB.

• The method will be tested on artificial data sets to assess its stability and to discuss the properties of the new coefficient, see Chapter 10.

• The method will be applied to a data set from an experiment performed within this project, see Chapter 11.
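To illustrate the core idea behind the spectral methods of Chapter 8 (this is not the thesis implementation, which is in MATLAB), the following sketch shows in Python how a stationary covariance matrix on a regular 1D grid, which is Toeplitz, can be embedded in a circulant matrix and multiplied with a vector via the FFT in O(n log n), without ever forming the full matrix. Grid size, sill and correlation length are arbitrary illustrative values.

```python
import numpy as np

# stationary exponential covariance on a regular 1D grid -> Toeplitz matrix
n, dx, sill, ell = 256, 0.01, 1.0, 0.05
lags = dx * np.arange(n)
row = sill * np.exp(-lags / ell)            # first row/column of Q_ss

# embed the symmetric Toeplitz matrix in a circulant one of size 2n - 2
circ = np.concatenate([row, row[-2:0:-1]])
eig = np.fft.fft(circ).real                 # eigenvalues of the circulant

def cov_matvec(v):
    """Q_ss @ v in O(n log n), never forming the n-by-n matrix."""
    vpad = np.concatenate([v, np.zeros(len(circ) - n)])
    return np.fft.ifft(eig * np.fft.fft(vpad)).real[:n]

# verify against the explicit dense product
Q = sill * np.exp(-np.abs(lags[:, None] - lags[None, :]) / ell)
v = np.random.default_rng(0).standard_normal(n)
assert np.allclose(cov_matvec(v), Q @ v)
```

In two or three dimensions, the same embedding applies dimension by dimension, turning the Toeplitz structure into a block-circulant one handled by multidimensional FFTs; this is what makes covariance operations on finely resolved grids affordable.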

Chapter 6

Quasi-Linear Geostatistical Inversing

This chapter picks up from Chapter 2, where I decided to use the Quasi-Linear Geostatistical Approach by Kitanidis (1995) [56] to identify the parameter values for hydraulic conductivity and dispersion. The Quasi-Linear Geostatistical Approach essentially is a Bayesian method for non-linear inverse modeling. Based on cokriging, it accounts for the non-linearity using a special form of successive linearization. To equip it for my purposes, the underlying optimization algorithm needs to be stabilized. The goal of this chapter is to introduce, discuss and test the new stabilized algorithm.

To provide some basic understanding, I first derive linear cokriging from Bayes' theorem and discuss some crucial properties in Section 6.1. For later use, I introduce a new notation for the cokriging equations and for the objective function that is minimized when finding the cokriging estimate. Section 6.2 discusses some vital properties of the Quasi-Linear Geostatistical Approach that must first be understood and then carefully maintained when replacing the iteration algorithm. Finally, in Section 6.3, I present the stabilized algorithm, discuss its properties, and test its increased stability and enhanced efficiency in a small test case.

6.1 Linear Cokriging

Cokriging is a geostatistically based technique to identify an unknown spatial parameter field given observations of a correlated quantity. In hydrogeologic applications, i.e., geostatistical inverse modeling, the unknown parameter might be the hydraulic conductivity field of a porous formation, and the observations typically are measurements of hydraulic heads, solute concentrations or travel times.

Most publications feature cokriging with known mean or with unknown mean, i.e., they do not consider uncertain prior knowledge such as an uncertain mean. The concept of an unknown mean is far less realistic and useful than the concept of an uncertain mean. Vague information on the type of aquifer material, for example, provides some useful prior information on the mean value of hydraulic conductivity. Prior information stabilizes the estimation procedure and allows estimating parameter fields even in cases where the problem statement would otherwise be ill-posed. A lucid example is the estimation of hydraulic conductivity when only head-related measurements and boundary conditions are provided: only through prior information on the mean does the problem have a unique solution.

In this thesis, I derive cokriging for the case of an uncertain mean in the Bayesian context. This is the general, unifying case and includes both of the former as special cases. Apart from the more realistic
nature and the stabilizing aspects, it discloses certain properties of cokriging that will prove useful for the design of a specialized optimization algorithm later in my thesis.

6.1.1 Cokriging with Uncertain Mean

Prior Distribution

Consider a random n × 1 multi-Gaussian vector of unknowns s with expected value E[s|β] = Xβ and n × n covariance matrix Qss. In hydrogeological applications, s may be the vector of unknown log-conductivity values in all grid cells. X is an n × p matrix of known deterministic base functions, and β is a p × 1 vector of uncertain drift coefficients. The probability density function (pdf) of s for a given value of β is s|β ∼ N(Xβ, Qss). This notation is equivalent to:

p(s \mid \beta) \propto \exp\left( -\tfrac{1}{2}\, (s - X\beta)^T Q_{ss}^{-1} (s - X\beta) \right) . \quad (6.1)

I obtain the marginal distribution of s regardless of the drift coefficients β from Bayes' theorem:

p(s) = \frac{p(s \mid \beta)\, p(\beta)}{p(\beta \mid s)} , \quad (6.2)

in which p(β) is the prior knowledge on β, defined via a (multi-)Gaussian pdf:

p(\beta) \propto \exp\left( -\tfrac{1}{2}\, (\beta - \beta^*)^T Q_{\beta\beta}^{-1} (\beta - \beta^*) \right) . \quad (6.3)

I determine the remaining unknown p(β|s) on the right-hand side by maximizing the joint pdf of s and β, p(s, β) = p(s|β) p(β), with respect to β. This yields a (multi-)Gaussian pdf β|s ∼ N(β̂, Qββ|s), with the mean β̂ and the conditional covariance Qββ|s given by:

Q_{\beta\beta|s} = \left( Q_{\beta\beta}^{-1} + X^T Q_{ss}^{-1} X \right)^{-1}
\hat{\beta} = \beta^* + Q_{\beta\beta|s}\, X^T Q_{ss}^{-1} (s - X\beta^*) . \quad (6.4)

Substituting eq. (6.4) into eq. (6.2), the marginal prior distribution p(s) becomes s ∼ N(Xβ*, Gss) (compare Kitanidis, 1986, for the case of unknown mean [51]), where

G_{ss} = \left[ Q_{ss}^{-1} - Q_{ss}^{-1} X \left( X^T Q_{ss}^{-1} X + Q_{\beta\beta}^{-1} \right)^{-1} X^T Q_{ss}^{-1} \right]^{-1} = Q_{ss} + X Q_{\beta\beta} X^T . \quad (6.5)

Gss is the generalized covariance matrix of s (Kitanidis, 1993 [54]). To perform the inversion in eq. (6.5), I have applied eq. (A.10). For the case of unknown mean, Gss is not a regular matrix and only its inverse is a well-behaved quantity. From eq. (6.4), it follows after some matrix algebra that

G_{ss}^{-1} (s - X\beta^*) = Q_{ss}^{-1} \left( s - X\hat{\beta} \right) , \quad (6.6)

which I will use below for simplifications.

Observations

Now, consider the m × 1 vector of observations y related to s via a linear transfer function f:

y = f(s) + r = H s + r . \quad (6.7)


r is the m × 1 vector of observation errors with zero mean and m × m covariance matrix R. H is the so-called sensitivity matrix (or Jacobian), defined by:

H_{ij} = \frac{\partial f_i(s)}{\partial s_j} ,

which does not depend on s in the linear case. The likelihood of the measurements, as well, is defined by a multi-Gaussian pdf:

p(y \mid s) \propto \exp\left( -\tfrac{1}{2}\, (y - Hs)^T R^{-1} (y - Hs) \right) . \quad (6.8)

In hydrogeological applications, y may be a vector of head measurements, f(s) the respective model output at the measurement locations for a given conductivity field, and r the errors made when measuring hydraulic heads or arrival times. Error propagation yields the expected value of y for given β, the auto-covariance matrix Qyy of the observations y, and the cross-covariance matrix Qsy between s and y (see, e.g., Schweppe, 1973 [87]):

E[y \mid \hat{\beta}] = H X \hat{\beta}
Q_{sy} = Q_{ss} H^T
Q_{yy} = H Q_{ss} H^T + R . \quad (6.9)

Following the same procedure as for the prior statistics of s, I obtain that β for given values of y is distributed according to β|y ∼ N(β̂, Qββ|y), in which β̂ and Qββ|y are:

\hat{\beta} = \beta^* + Q_{\beta\beta|y}\, X^T H^T Q_{yy}^{-1} (y - H X \beta^*)
Q_{\beta\beta|y} = \left( Q_{\beta\beta}^{-1} + X^T H^T Q_{yy}^{-1} H X \right)^{-1} . \quad (6.10)

In analogy to the marginal distribution of s, I derive the marginal distribution of y. The resulting pdf is y ∼ N(HXβ*, Gyy), with Gyy defined by

G_{yy} = \left[ Q_{yy}^{-1} - Q_{yy}^{-1} H X \left( X^T H^T Q_{yy}^{-1} H X + Q_{\beta\beta}^{-1} \right)^{-1} X^T H^T Q_{yy}^{-1} \right]^{-1} = Q_{yy} + H X Q_{\beta\beta} X^T H^T = H G_{ss} H^T + R . \quad (6.11)

Again, a useful identity similar to eq. (6.6) holds, which will prove helpful for later simplifications:

G_{yy}^{-1} (y - H X \beta^*) = Q_{yy}^{-1} \left( y - H X \hat{\beta} \right) . \quad (6.12)

Posterior Mean

The cokriging estimate ŝ for the unknowns s given the observations y is identical to the mean value of the posterior distribution of s given y. It can be derived from Bayesian analysis (compare Kitanidis, 1986 [51]):

p(s \mid y) = \frac{p(y \mid s)\, p(s)}{p(y)} \propto \exp\left( -\tfrac{1}{2}\, (s - X\beta^*)^T G_{ss}^{-1} (s - X\beta^*) - \tfrac{1}{2}\, (y - Hs)^T R^{-1} (y - Hs) \right) . \quad (6.13)


The normalizing constant p(y) has been omitted here. The value of s that maximizes eq. (6.13) is the posterior mean ŝ, or the cokriging estimate. Maximizing eq. (6.13) is equivalent to minimizing its negative logarithm, hereafter referred to as the objective function L(s):

L(s) = \tfrac{1}{2}\, (y - Hs)^T R^{-1} (y - Hs) + \tfrac{1}{2}\, (s - X\beta^*)^T G_{ss}^{-1} (s - X\beta^*) . \quad (6.14)

Now, I set the first derivative of L(s) to zero in order to obtain the normal equations. After several rearrangements, in which I use eq. (6.9) and apply eq. (A.11) to the partitioned matrix

\begin{bmatrix} Q_{ss}^{-1} & H^T \\ H & -R \end{bmatrix} ,

I obtain:

\hat{s} = X \hat{\beta} + Q_{sy} Q_{yy}^{-1} \left( y - H X \hat{\beta} \right) .

This shows that ŝ is comprised of the prior mean Xβ̂ plus an innovation term, which depends on the deviations of the observations from their expectation. The innovation term represents those random fluctuations of s about its mean value that are relevant for the observations. Defining the m × 1 vector ξ allows expressing ŝ in a parameterized form (compare Kitanidis, 1996 [57]):

\xi = Q_{yy}^{-1} \left( y - H X \hat{\beta} \right) \quad (6.15)
\hat{s} = X \hat{\beta} + Q_{sy}\, \xi . \quad (6.16)

The matrix of base functions X and the cross-covariance matrix Qsy evidently are used as geostatistically based parameterizations of ŝ. To obtain β̂, I insert eq. (6.16) into eq. (6.4) and simplify:

X^T H^T \xi = Q_{\beta\beta}^{-1} \left( \hat{\beta} - \beta^* \right) . \quad (6.17)

Enforcing the constraint (6.17) while solving eq. (6.15) is accomplished by solving the (m + p) × (m + p) system (compare Kitanidis, 1996 [57]):

\begin{bmatrix} Q_{yy} & H X \\ X^T H^T & -Q_{\beta\beta}^{-1} \end{bmatrix} \begin{bmatrix} \xi \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} y \\ -Q_{\beta\beta}^{-1} \beta^* \end{bmatrix} . \quad (6.18)

For the special case of Q_{\beta\beta}^{-1} = 0, i.e. cokriging with fully unknown mean, eq. (6.16) is also known as the function estimate form of ordinary cokriging. Eq. (6.18) is the system of cokriging equations with the cokriging matrix, and eq. (6.17) is known as the unbiasedness constraint (e.g. Kitanidis, 1997 [60]). In the standard geostatistical literature, these equations are derived by finding an unbiased linear estimator with minimum estimation variance (Best Linear Unbiased Estimator, BLUE). For the sake of easy reading, I adopt this nomenclature.

Posterior Covariance

The posterior covariance Qss|y of s given y quantifies the amount of uncertainty remaining after the conditioning procedure. The Hessian, i.e. the matrix of second derivatives of the objective function (eq. 6.14), is sometimes referred to as the observed information. Its inverse defines the posterior covariance matrix:

Q_{ss|y} = \left( \frac{\partial^2 L(s)}{\partial s^T \partial s} \right)^{-1} = \left( H^T R^{-1} H + G_{ss}^{-1} \right)^{-1} . \quad (6.19)

6.1 Linear Cokriging

45

Applying rules of matrix algebra yields a form that is computationally more efficient: Qss|y

= Qss −



HQss X

T 

Qyy XT H T

HX −Q−1 ββ

−1 

HQss X



.

The central matrix in this expression is the cokriging matrix. According to the rules of partitioned matrices (eqs. A.2-A.11), the inverse of the cokriging matrix is 

Qyy XT H T

HX −Q−1 ββ

−1

=



Pyy Pby

Pyb Pbb



,

in which the submatrices are Pyy

 −1 −1 T T −1 −1 XT HT Q−1 = Q−1 yy yy − Qyy HX X H Qyy HX + Qββ

Pby

=

Pbb

 −1 T T −1 = − Q−1 + X H Q HX . yy ββ



T T −1 Q−1 ββ + X H Qyy HX

−1

T XT HT Q−1 yy = Pyb

(6.20) (6.21) (6.22)

Kitanidis (1996) [57] used this partitioning to find an expression for the posterior covariance that is most convenient for the case of unknown mean: = Qss − Qsy Pyy Qys − XPbb XT − XPby Qys − Qsy Pyb XT .

Qss|y

(6.23)

−1 It is obvious that Pyy = Gyy and Pbb = −Qββ|y . Since, for the case of uncertain mean, Gss is a regular matrix, I may apply the rules for partitioned matrices (eq. A.10) directly to eq. (6.19). After some simplifying rearrangements, I obtain the most efficient form of the posterior covariance for the case of uncertain mean:

Qss|y

= Gss − Gsy G−1 yy Gys ,

(6.24)

where Gsy = GTys = Gss HT . Partitioned Form For later analysis, it is convenient to separate ξ from βb in the cokriging system (eq. 6.18) and the BLUE (eq. 6.16). Using the same partitioning as above: ∗ = Pyy y − Pyb Q−1 ββ β

(6.25)

∗ βb = Pby y − Pbb Q−1 ββ β

(6.26)

ξ

∗ ˆs = (XPby + Qsy Pyy ) y − (XPbb + Qsy Pyb ) Q−1 ββ β .

(6.27)

Simplified Objective Function Now I present a new form of the objective function that is computationally most efficient. Consider the prior term of eq. (6.14): Lp

=

1 T ∗ (s − Xβ ∗ ) G−1 ss (s − Xβ ) . 2

46

Quasi-Linear Geostatistical Inversing

−1 −1 By substituting G−1 ss = Gss Gss Gss and eqs. (6.6), (6.16) and (6.9) in the prior term, I can simplify the prior term:

1 T ξ (Gyy − R) ξ . 2 can be expressed by: Lp =

Likewise, the likelihood term Lm

Lm =

1 T ξ (R) ξ . 2

(6.28)

(6.29)

Thus, the objective function in total can be written as: L=

1 T ξ Gyy ξ . 2

(6.30)

6.1.2 Properties of Cokriging Uniqueness and Well-Posedness The matrix of base functions X and the cross-covariance matrix Qsy evidently are used as a geostatistically based parameterization of ˆs in the BLUE (eq. 6.16), spanning a (m + p)-dimensional subspace for ˆs. As a consequence, the degrees of freedom are reduced from n to (m + p). Only (m + p) parameters βb and ξ have to be solved in the cokriging system of equations (eq. 6.18). This subspace is not an arbitrary choice, but stems from strict Bayesian analysis.

The task of parameter identification is initially underdetermined in the sense that there are less observations than unknown values (m < n). The geostatistical approach, however, converts the problem into a well-determined problem with (m + p) equations for (m + p) unknown parameters. Hence, provided that the cokriging matrix is not rank deficient, parameter estimation and inverse modeling based on cokriging-like procedures is a well-posed problem and yields unique solutions. Limiting Cases for Measurement Error and Prior Information

To clarify the influence of the measurement error R on ˆs, I analyze how the BLUE ˆs reproduces the measurements: ˆ y

= Hˆs .

(6.31)

The residuals of the BLUE are:   ˆr = y − Hˆs = y − H Xβb + Qsy ξ = Rξ ,

(6.32)

in which I have used eqs. (6.16), (6.15) and (6.9). Eq. (6.32) shows that ˆs never perfectly reproduces the observations for non-zero R (unless in the unlikely case that ξ vanishes). It is of great importance to understand that the residuals ˆr are not a lack of accuracy in the BLUE, but an inevitable and desired consequence of meeting the measurements under the smoothness condition implied by the prior statistics of s. The statistical properties of these residuals are discussed below. For later derivations, it is convenient to replace ξ in eq. (6.32) by eq. (6.25): ∗ ˆr = RPyy y − RPyb Q−1 ββ β .

(6.33)

6.1 Linear Cokriging

47

ˆ for zero measurement error: Using eq. (6.32), it is easy to obtain the properties of ˆr and y ˆ lim y

R→0

= y

lim ˆr = 0 .

R→0

(6.34)

At the limit of infinite measurement error, substituting R−1 = 0 into the posterior pdf (eq. 6.13) yields the prior pdf of s, so that: lim

ˆs = Xβ ∗

lim

ˆ y

lim

ˆr = y − HXβ ∗ .

R−1 → 0 R−1 → 0 R−1 → 0

= HXβ ∗

(6.35) (6.36) (6.37)

If the drift coefficients are absolutely unknown before considering the measurements, i.e. Q −1 ββ = 0, analyzing eq. (6.10) yields:  b = XT HT Q−1 HX −1 XT HT Q−1 y , (6.38) lim β yy yy Q−1 ββ →0

i.e., the posterior mean in eq. (6.16) is solely based on the observations. For the limiting case of perfectly known prior mean, with Qββ = 0, the second term in eq. (6.10) vanishes: lim βb = β ∗ .

Qββ →0

(6.39)

Then, the value of βb is fully determined by the prior value β ∗ of the trend parameters.

Orthonormal Residuals

Kitanidis (1991) [53] discusses that the orthonormal residuals, denoted by ˆrn , are an important quantity for model criticism. In this section, I show how to efficiently ortho-normalize the cokriging residuals ˆr to obtain ˆrn . First, I define the covariance of ξ with the help of eq.(6.15) and eq.(6.9):     T  T −1 −1 b b E ξξ = E Qyy y − HXβ y − HXβ Qyy = Q−1 yy E



 T  b y − HXβb y − HXβ Q−1 yy

−1 −1 = Q−1 yy Qyy Qyy = Qyy .

Using eqs. (6.40) and (6.32), the statistics of ˆr can be derived as follows:     Qˆrˆr = E ˆrˆrT = E Rξξ T R = RQ−1 yy R .

(6.40)

(6.41)

Qˆrˆr quantifies that part of the variability in y which is not captured by the cokriging procedure. Now that the covariance matrix of ˆr is known, I use the following transformation to obtain orthonormal residuals: ˆrn = Syy R−1ˆr , (6.42) where Syy is the upper triangular matrix obtained from the Cholesky decomposition of Qyy . The proof that ˆrn is indeed an orthonormal quantity is simple:     E ˆrn ˆrTn = Syy R−1 E ˆrˆrT R−1 STyy = I      E ˆrTn ˆrn = Tr E ˆrn ˆrTn = m .

48

Quasi-Linear Geostatistical Inversing

According to Kitanidis (1991) [53], the geostatistical model used for Qss , Qββ and the measurement errors R must be rejected if ˆrn does not have zero mean and a variance of unity within certain bounds. For statistical testing, ˆrTn ˆrn should follow the χ2 distribution with m degrees of freedom. Computational Costs Discussing the computational costs of cokriging becomes relevant for large problems, since they typically increase with the square or cube of the problem size. In cases where the transfer function f (s) is a partial differential equation, the discretization of the unknowns is dictated by stability criteria of the numerical schemes applied for evaluating f (s). Hence, a number of unknowns in the order of n = 106 is not unusual. The number of observations m is in most cases much smaller than n, e.g. in the order of 10 or 100, and the number p of base functions is typically one for the case of a constant unknown or uncertain mean. The cokriging equations are rapidly solved since the cokriging matrix is sized only (m + p) × (m + p). The potentially expensive tasks are (1) computing H, (2) computing Qsy and Qyy , and (3) evaluating the value of the objective function. However, means to reduce these costs have been found: 1. Applying standard numerical differentiation, computing H takes (n + 1) evaluations of f (s). A highly efficient way to compute H is the adjoint-state method, which requires only (m + 1) solutions of problems that are formally similar to f (s), see, e.g., Sun and Yeh, 1990, or Sun, 1994 [95, 94]. 2. Computing Qsy and Qyy by standard methods is strictly impossible for large n. When n = 106 , for example, storingQss requires 8000 Gigabyte of memory, exceeding the capacity of all present-day HDD devices. Clearly, Qsy = Qss HT cannot be evaluated in such cases. Under certain conditions, however, spectral methods help to drastically reduce the storage requirements and computational costs. These methods are discussed separately in Chapter 8. 3. The objective function in the form of eq. (6.14) requires storage of Qss and the solution of ∗ G−1 ss (s − β ) (eq. 6.14), which is associated with prohibitive computational costs. For linear cokriging, evaluating the objective function is not compulsory. In iterative schemes like the Quasi-Linear Geostatistical Approach, however, it is a necessity. The simplified form presented above (eq. 6.30) is evaluated in an instant since it replaces Qss by Qyy . This holds in the typical case that n  m.

6.1.3 Bad Condition of the Cokriging Matrix In some cases, the cokriging matrix may have a poor condition. Then, numerical artifacts and anomalies appear in the solution caused by numerical errors while solving the cokriging system. Dietrich and Newsam (1989) [24] discussed how these effects appear as the quantity of available data increases and the discretization of the unknowns is refined. They suggested to increase the measurement error R to improve the condition of the cokriging matrix. Increased measurement error, however, induces a loss of information. In the following, I will discuss additional potential reasons for a poor condition of the cokriging matrix and effective countermeasures. In general, the bad condition can originate from either of the following: 1. The sensitivities of the observations vary over many orders of magnitude, mainly since the absolute values of the observations differ drastically. This is the case, e.g., when simultaneously considering measurements of seepage velocity, hydraulic heads and arrival times.

6.2 Quasi-Linear Geostatistical Approach

49

2. There is high correlation among the observations because the spacing in between is small compared to the correlation scale of the unknowns, or due to observations with otherwise redundant information. 3. The cokriging matrix is rank-deficient due to a lack of a specific type of information.. Basically, (1) is a simple scaling problem, (2) is a structural problems of the observations, and (3) is a structural problem of the inverse problem. The scaling problem can be solved by simple diagonal scaling: Let M be the cokriging matrix and S a diagonal scaling matrix. The elements on the diagonal are given by the inverse square root of the absolute values of the diagonal elements in M. Then the solution of Mx = b can be obtained from solving the scaled system:  SMS S−1 x = (Sb) .

The scaling factors might be so large that numerical errors in the order of the floating point precision are amplified up to significant orders by the scaling matrix itself. For these cases, I recommend to use a combination of diagonal scaling and iterative refinement, if necessary, using a LevenbergMarquardt algorithm. Too high correlation among observations can usually be overcome by considering measurement error in an adequate manner. In most cases, the lacking type of information can be substituted by prior knowledge in form of an uncertain mean for the parameters. An alternative powerful but still unpublished method to deal with poorly conditioned cokriging matrices by Kitanidis is to use the Moore-Penrose generalized inverse of the cokriging matrix instead of its inverse. As stable and robust as it might be in most applications, I found this approach inappropriate in the context of the optimization algorithms I present in the following section.

6.2 Quasi-Linear Geostatistical Approach 6.2.1 Introduction In Chapter 2, I discussed the spectrum of available methods for non-linear geostatistical inversing, and subsequently chose to use the Quasi-Linear Geostatistical Approach in this thesis. This method first estimates the unknown parameter field in a cokriging-like procedure, assuming that the autocovariance function of the unknowns is perfectly known. In a second step, the structural parameters that define the covariance function are estimated. Since the overall procedure is non-linear, both steps are repeated iteratively. In the original publication by Kitanidis (1995) [56], the Quasi-Linear Geostatistical Approach is designed for unknown mean of the parameter field. In the previous section, I derived linear cokriging for the case of uncertain mean. The reasons for using the general case of uncertain mean were fourfold: (1) the concept of uncertain mean is more realistic than the concept of unknown mean, (2) prior information stabilizes the estimation procedure, (3) it removes the seemingly immanent ill-posedness of certain problem types and (4) it is the general case that includes the known and unknown mean as limiting cases. For the very same reasons, I will present an extension of the Quasi-Linear Geostatistical Approach to the case of uncertain mean in Sections 6.2.2 and 6.2.3. In this thesis, I use the Quasi-Linear Geostatistical Approach to identify the parameters for both hydraulic conductivity and dispersion given observations of hydraulic heads and temporal moments of

50

Quasi-Linear Geostatistical Inversing

breakthrough curves from tracer experiments. Clearly, the governing equations for these quantities are non-linear with respect to the unknown parameters. The Quasi-Linear Geostatistical Approach successively linearizes the forward problem about the current estimate in an iterative scheme. It defines the solution to the inverse problem in a parameterized form. The parameterization is based on a rigorous Bayesian analysis, resulting in a form of the solution that is highly similar to the cokriging estimator. The sole purpose of the iteration procedure is to optimize the subspace in which the parameterization of the solution is defined. In each iteration step, the previous trial solution is projected onto the current subspace. Only at the last iteration step, when the optimal subspace has been found, the conditioning is carried out and the conditional covariance is evaluated based on this optimal subspace. The form of the solution used in this method and some consequences thereof are discussed in depth in Section 6.2.4. The iteration algorithm underlying the Geostatistical Approach formally similar to the GaussNewton algorithm (see, e.g. Press et al., 1992 [75]) for least-squares fitting. For mildly non-linear least-squares fitting, the Gauss-Newton algorithm is well-known to be efficient. However, it fails for strongly non-linear problems. The inverse problem posed in this thesis exhibits a higher degree of non-linearity than past applications of the Quasi-Linear Geostatistical Approach. Observations such as temporal moments of breakthrough curves are especially non-linear with respect to hydraulic conductivity since their sensitivity pattern is strongly influenced by the streamline pattern. The streamline pattern is being distorted and may oscillate in the iterative procedure. The same holds for the sensitivity pattern of the dispersion coefficient. Further non-linearity stems from the fact that the sensitivities of conductivity and the dispersion coefficient are inter-dependent. This will become evident in Chapter 7, where each of the quantities will occur in the expressions for the sensitivity of the respective other quantity. Stronger non-linearity of the inverse problem decreases the convergence radius and increases the number of necessary iterations. Above a certain extent of non-linearity, the method fails entirely. This gives a clear motivation to improve the existing optimization algorithm in the Quasi-Linear Geostatistical Approach in terms of stability and robustness, reducing the number of iteration steps and increasing the convergence radius for application to strongly non-linear problems. The Levenberg-Marquardt algorithm (Levenberg, 1994, Marquardt 1963 [65, 67]) is a modification of the Gauss-Newton method that, in a self-adaptive manner, navigates between Gauss-Newton and the method of steepest descent (see, e.g., Press et al., 1992 [75]). This idea is accomplished by amplifying the main diagonal of the matrix of second derivatives in the iteration algorithm. Combining the robustness of the method of steepest descent with the computational efficiency of the GaussNewton method, the Levenberg-Marquardt algorithm has become a highly valued optimization tool for non-linear tasks of least-squares fitting in many engineering fields. The basic idea of the Levenberg-Marquardt algorithm has been applied to geostatistical inverse modeling by several authors. 
Dietrich and Newsam (1989) [24] discussed how increasing the measurement error on the main diagonal of the cokriging matrix can help to improve ill-conditioned matrices and suppress anomalies in the estimated parameter fields. The Successive Linear Estimator introduced by Yeh et al. (1996) [102] uses an adaptively amplified measurement error term in the auto-covariance of measurements and a relaxation term for the cross-covariances to stabilize the algorithm. Due to its robustness, the Successive Linear Estimator has already been applied to inverse problems in variably saturated flow (Zhang and Yeh, 1997, Hughson and Yeh, 2000 [104, 47]). Motivated by the success of the Levenberg-Marquardt algorithm in other areas of engineering and in the Successive Linear Estimator, I present a modified Levenberg-Marquardt algorithm for the QuasiLinear Geostatistical Approach in Section 6.3 and discuss its properties in Section 6.3.1. The original form of the Levenberg-Marquardt algorithm has to be modified to account for the parameterized

6.2 Quasi-Linear Geostatistical Approach

51

form of the solution used in the Geostatistical Approach. Further modifications are necessary to avoid violations of the underlying Bayesian concept. For the structural parameters, the same argumentation applies as for prior information on the mean of the unknown parameters. The part of the original method that estimates the structural parameters does not account for prior information. In cases where the data do not reflect the entire covariance function, the structural parameters are not identifiable. A more realistic approach is that, due to rough knowledge of the geology of a porous formation, some uncertain prior knowledge on the correlation scales or other structural parameters is available. Then, like when introducing prior knowledge on the mean value, the presence of prior information on the structural parameters improves the identifiability of the structural parameters. In Section 6.4, I will extend the estimation of the structural parameters to cases with prior knowledge. I will conclude this section with a performance test in which I apply both the conventional and the new algorithm to a typical non-linear inverse modeling problem taken from Cirpka and Kitanidis (2001) [14].

6.2.2 Successive Linearization The Quasi-Linear Geostatistical Approach successively linearizes the transfer function about the current estimate sk : ˜ k (s − sk ) f (s) ≈ f (sk ) + H ˜ k = ∂f (s) , H ∂s sk

(6.43)

˜ k is the m × n sensitivity matrix linearized about sk . Eq. (6.43) is exact at the limit of in which H s → sk . All quantities marked with a tilde are linearized quantities. Now, a modified vector of observations is introduced: ˜ k sk , yk0 = y − f (sk ) + H (6.44) and from linearized error propagation one obtains (see, e.g., Schweppe, 1973 [87]): i h ˜ k Xβb E y0 |βb = H k

˜ sy,k Q

˜ Tk = Qss H

˜ yy,k Q

˜ k Qss H ˜ Tk + R . = H

(6.45)

Using the modified vector of observations produces a linearized objective function that is formally identical to eq. (6.14): L (s) =

T   1 1 T ∗ ˜ k s R−1 y ˜ ks . ˜k − H ˜k − H (s − Xβ ∗ ) G−1 y ss (s − Xβ ) + 2 2

(6.46)

From here on, all subsequent derivations are identical to linear cokriging. However, when evaluating the objective function, only the simplification of the prior term (eq. 6.28), but not eq. (6.29) can be applied, since the latter is only exact for s = sk . In the linearized case, the expressions for the conditional covariance (eqs. 6.23 and 6.24) are not exact. Instead, the Cramer-Rao inequality provides a lower bound for the conditional covariance: Qss|y ≥ Gss − Gsy G−1 yy Gys . For the proof, see Rao (1973) [79, pp.324] or Schweppe (1973) [87, pp.372].

52

Quasi-Linear Geostatistical Inversing

6.2.3 Conventional Iteration Algorithm The iteration algorithm used in the Quasi-Linear Geostatistical Approach is formally similar to the Gauss-Newton method (Press et al., 1992 [75]) with a non-linear constraint. For comparison, I provide both algorithms below. Algorithm 1 (Gauss-Newton method with constraint): An unknown ng × 1 vector of parameters ξ is related to the mg × 1 vector of measurements y, mg > ng , by the relation y = f (ξ) + r. The objective function T to be minimized is χ2 = (y − f (ξ)) Wξξ (y − f (ξ)), in which Wξξ is a mg × mg weighting matrix, while fulfilling a constraint of the form G (ξ) = G0 . Deviations of G (ξ) from G0 are punished by the weighting matrix Wνν . Define an initial guess ξ0 . Then, ∂G(ξ) ˜ k = ∂f (ξ) and g ˜ 1. compute H = k ∂ξ ∂ξ . ξk

ξk

2. Find ξk+1 by solving:

(6.47)

ξk+1 = ξk + ∆ξ



˜ T Wξξ H ˜k H k ˜k g

˜kT g −1 −Wνν



∆ξ ν



=



˜ T Wξξ (y − f (ξk )) −H k (G0 − G (ξk ))



.

(6.48)

3. Increase k by one and repeat until convergence. The algorithm introduced by Kitanidis (1995) [56] covers the case of unknown mean. The version discussed here extends the method to the general case of uncertain mean: Algorithm 2 (Quasi-Linear Geostatistical Approach): Define an initial guess s0 . Then, ˜ k. 1. compute H 2. Find sk+1 by solving:  ˜ Qyy,k ˜T XT H k

˜ sy,k ξk+1 sk+1 = Xβbk+1 + Q

˜ k X   ξk+1  H bk+1 −Q−1 β ββ

=



˜ k sk y − f (sk ) + H −1 ∗ −Qββ β

(6.49) 

.

(6.50)

3. Increase k by one and repeat until convergence.

˜ T WH ˜ k and Q ˜ yy,k are derived from the Hessian Algorithm 1 and 2 have in common, that both H k matrices of the corresponding linearized objective functions (compare eq. 6.30). Further, the second ˜ k X and g ˜k being the corresponding derivaline in both cases follows from the constraints, with H tives. Finally, the right-hand side vectors of both eq. (6.48) and (6.50) contain the residuals from the previous step in one form or another. The main difference originates from the parameterized form used in the Geostatistical Approach. The estimator in Algorithm 2 (eq. 6.49) is based on the (m + p)-dimensional subspace spanned by X ˜ sy,k . In each iteration step, when H ˜ k is updated, the subspace spanned by Q ˜ sy,k is updated and Q ˜ simultaneously. Then, the old subspace defined by Qsy,k−1 is outdated and no more valid. Hence, unlike in the Algorithm 1, eq.(6.47), the updated solution is not given by the previous solution plus a modification. Instead, Algorithm 2 projects the previous solution onto the new subspace by including ˜ k sk in the right-hand side vector of eq. (6.50). the term H

6.2 Quasi-Linear Geostatistical Approach

53

6.2.4 Form of the Solution The following is a graphical and instructive example to discuss the form of the solution and the iteration algorithm chosen in the Geostatistical Approach (Algorithm 2). Consider that two subspaces ˜ sy,k−1 and the current subspace spanned by Q ˜ sy,k , are available: an outdated subspace spanned by Q ˜ sy,k 6= Q ˜ sy,k−1 . The subspace spanned by Q ˜ sy,k−1 is not a linear combination of the components Q ˜ sy,k . The current sensitivity matrix H ˜ k is more accurate for the following estimation, since it has of Q ˜ k−1 has been linearized been linearized about a value of s that is closer to ˆs than the value which H ˜ about. Let us clearly regard Hk−1 as a poor linearization. Assume I defined a solution to the inverse problem in the following form: ˜ sy,k ξk + Q ˜ sy,k−1 ξk−1 , ˆs = Xβbk + Xβbk−1 + Q

(6.51)

bk , βbk−1 , ξk and ξk−1 so that ˆs minimizes the objective function and then fitted the parameters β (eq. 6.46). It is clear that, if neglecting the smoothness condition implied by the prior term, we are free to choose any combination of the parameters that lead to a perfect fit with the measurements. Since there are (2m + 2p) parameters to fit while the observations and the conditions for the trend parameters result in no more than (m + p) equations, the solution for ˆs would not be unique. Now, let us take into account the contribution of the prior term in the objective function. The current subspace is based on the more accurate linearization. This fact makes it more “efficient” for meeting the measurements in the sense that smaller perturbations lead to the same satisfaction of the measurements while allowing for a smaller value of the prior term. Hence, the previous subspace is completely discarded by finding that only with ξk−1 = 0 the objective function is minimized. The same analysis holds for any number of available subspaces. At this point, I will clarify several consequences: 1. As mentioned in previous sections, the subspaces spanned by X and Qsy reduce the degrees of freedom and hence allow to define a unique solution in the case of linear cokriging. If several subspaces were available, the property of uniqueness would be lost. 2. The iteration procedure in Algorithm 2 finds the optimal subspace for the estimator, in which the optimum subspace is defined such that the measurements are satisfied by minimum perturbations. By defining the unique optimal subspace for the non-linear case, the Quasi-Linear Geostatistical Approach maintains the uniqueness of the solution. 3. Algorithm 2 does not adhere to previous trial solutions, but projects them onto the current subspace in each iteration step. The final estimate is entirely based on the final (optimal) subspace, and the conditional covariance is defined in the very same subspace, using eq. (6.24) with ˜ k . The single iteration steps are not a process of conditioning, but merely of finding the H=H optimal subspace. Hence, it is not necessary to update the prior covariance during the iteration procedure as it would be the case in a sequential or successive Bayesian updating procedure. In the Quasi-Linear Geostatistical Approach, seen from the Bayesian point of view, the act of conditioning is entirely carried out in the final step. 4. Line search algorithms find solutions of the form sk+1 = sk +∆s. In the case of the Quasi-Linear Geostatistical Approach, this would be equivalent to the form given in eq. (6.51). If a line search modification was applied to Algorithm 2, the outdated subspaces would not be discarded, and the Bayesian concept would be violated. 5. Solutions of the form sk+1 = sk +∆s do not violate the Bayesian concept if, and only if, they are used in the context of Bayesian updating procedures such as the Successive Linear Estimator (Yeh et al., 1996 [102]) or sequential kriging and cokriging (Vargas and Yeh, 1999 [100]). In

54

Quasi-Linear Geostatistical Inversing

these methods, each iteration step is defined as a process of conditioning. The covariances are successively updated so that the prior covariance of each iteration step is given by the conditional covariance of the preceding step.

6.2.5 Drawbacks of the Conventional Algorithm For strongly non-linear problems, the Gauss-Newton method (Algorithm 1) in general and the QuasiLinear Geostatistical Approach (Algorithm 2) in particular are known to diverge due to overshooting and oscillations. In comparison to Algorithm 1, Algorithm 2 has an additional disadvantage based on the changing subspace of the solution, as I will show in the following analysis. Deterioration of the Solution I split the entries of the right-hand side vector in eq. (6.18) into an innovative and a projecting part: # " # "   ˜ k sk y − f (sk )  H ˜ y + = . (6.52) ∗ ∗ b b −Q−1 −Q−1 −Q−1 ββ β ββ β − βk ββ βk {z } {z } | | projecting

innovative

When inserting these parts into the linearized cokriging equations (eq. 6.50) separately, I obtain a projecting part and an innovative part of the parameter vector:       ξk+1 ξpr ξin = + b , (6.53) βbk+1 βbpr βin which in turn can be inserted into the estimator (eq. 6.49): sk+1

˜ sy,k ξin . ˜ sy,k ξpr + Xβbin + Q = Xβbpr + Q | {z } | {z }

(6.54)

˜ k spr + H ˜ k sin . = H | {z } | {z }

(6.55)

spr

sin

  The innovative part generates new innovations based on the residuals (y − f (sk )) and β ∗ − βbk of the previous trial solution, while the projecting part projects sk onto the new subspace spanned by ˜ sy,k . Finally, the splitting affects the values of the transfer function y ˆ returned by the BLUE: Q ˆ k+1 y

ˆ pr y

ˆ in y

˜ k spr for y in eq. (6.33) to analyze how the projection ˆ pr = H Now, I substitute the projecting part y reproduces the observations: b ˜ yy,k H ˜ k sk − R P ˜ yb,k Q−1 β ˆrpr = RP ββ k .

(6.56)

ˆ pr 6= y ˆ k unless R = 0: Then it becomes evident that y ˆ pr lim y

R→0

=

ˆk . y

(6.57)

This means that, for non-zero R, the projecting part spr never satisfies the observations to the same extent as the previous trial solution sk . The extreme case of infinite R may serve as an illustrative example: Inserting the projecting part into eq. (6.37) yields: lim

R−1 → 0

spr

= Xβ ∗ .

(6.58)

6.3 Modified Levenberg-Marquardt Algorithm

55

˜k = H ˜ k−1 , it can be shown that spr is equal to sk if, and only if, R = 0. This is easy to see For H ˜ sy,k = Q ˜ sy,k−1 , and the projection from the previous since the subspace for ˆs does not change, i.e., Q ˜ k 6= H ˜ k−1 , the subspace onto the current one is an identity operation. For non-linear f (s), i.e. H projection is not an identity operation. R = 0 is no more sufficient to ensure that spr = sk , so that, in general, spr 6= sk . Then, because f (s) is non-linear and eq. (6.43) is only an approximation, it is not ˜ k spr = H ˜ k sk . Thus, the projecting part of necessarily true that f (spr ) = f (sk ) even for R = 0 and H the solution deteriorates in any case. The larger the extent of non-linearity or the larger the step sizes occurring during iteration, the less accurate is the linearization. This, in turn, leads to a higher degree of deterioration in the projection, with the potential to prevent the entire algorithm from converging. Local Minima For strongly non-linear problems, the objective function may have multiple minima. In such cases, the Geostatistical Approach (Algorithm 2) may find a local minimum that satisfies the measurements to an extent specified by the measurement error statistics. Its identity to the global minimum, however, cannot be proved. Then, it is common practice to accept the solution if (1) it is acceptably smooth to the subjective satisfaction of the modeler, and (2a) the objective function does not exceed a prescribed value, typically derived from the χ2 -distribution for (m + p) degrees of freedom or (2b) the ortho-normalized residuals obey certain statistics (see Kitanidis, 1991 [53]). According to my experience, most failures to find an acceptable solution originate from overshooting of iteration steps, which leads to solutions that are not sufficiently smooth in the sense of the prior distribution.

6.3 Modified Levenberg-Marquardt Algorithm In this section, I present and discuss a modified Levenberg-Marquardt Algorithm for the QuasiLinear Geostatistical Approach. The choice of the Levenberg-Marquardt algorithm and the nature of the modifications is based on the following train of thought. 1. The Geostatistical Approach (Algorithm 2) suffers from oscillations and overshooting, leading to solutions that fail to comply with the smoothness constraint. Further, in addition to typical problems of successive linearization methods with strongly non-linear problems, the solution deteriorates whenever the step size is too large and the algorithm may fail. Applying a line search on top of the Algorithm 2 would violate the required form of the solution. 2. The Levenberg-Marquardt algorithm (Levenberg, 1944, Marquardt, 1963 [65, 67]) suppresses oscillations and overshooting by controlling the step size and direction. It does so by amplifying the diagonal entries of eq. (6.48) in the Gauss-Newton algorithm (Algorithm 1). 3. This is similar to amplifying the measurement error R in Algorithm 2, eq. 6.50. Using R to amplify the diagonal entries of the linearized cokriging system will put the step size control into a statistically based and well controllable framework within the Bayesian concept. 4. When exerting intelligent control over R during the course of iteration, the solution space can systematically be screened starting at the prior mean, which increases the chance that the solution complies with the smoothness condition. 5. The role of the projecting and the innovative parts can be taken into account such that R is controlled separately for these two parts: to suppress the deterioration through the projection and to prevent overshooting in the innovative part. The measurement error R has to be decreased in the projection part and increased in the innovation part to reduce the step size.

56

Quasi-Linear Geostatistical Inversing

6. Error analysis of the linearization can be used to prescribe a certain maximum step size. Further, if an iteration step is very small and the linearization is still sufficiently accurate, it can be re-used for the next iteration step. Again, I first discuss a few properties of the standard Levenberg-Marquardt algorithm for leastsquares fitting before introducing its new counterpart for the Geostatistical Approach. Algorithm 3 (Levenberg-Marquardt algorithm with constraint): The problem description is identical to Algorithm 1. Define an initial guess ξ0 and initialize the Levenberg-Marquardt parameter λ with λ > 0. Then, ˜ k and g ˜k . 1. compute H 2. Find ξk+1 by solving: 

˜ T Wξξ H ˜ k + λD1 H k ˜k g

(6.59)

ξk+1 = ξk + ∆ξ   ˜kT ∆ξ g −1 ν −Wνν + λD2   ˜ T Wξξ (y − f (ξk )) −H k . = G0 − G (ξk )

(6.60)

If the objective function does not improve, increase λ and repeat step 2. Otherwise, decrease λ. 3. Increase k by one and repeat until convergence. The terms λD1 and λD2 amplify the diagonal entries of the Hessian matrix in eq. (6.60). Initially, λ is assigned a low value, λ > 0. Whenever convergence is poor, λ is increased by a user-defined factor, and is again decreased whenever convergence is good. For λ → ∞ , the step size |∆ξ| approaches zero, the search direction approaches the direction of steepest descent, and there is always an improvement of the objective function unless ξk is a minimum. As ξk converges towards the solution, λ can be decreased to zero. Ideally, during the last iteration steps, the unmodified system of equations is used and the algorithm is identical to Algorithm 1. Algorithm 4 (Modified Levenberg-Marquardt Algorithm for the Quasi-Linear Geostatistical Approach): The problem statement is as specified for Algorithm 2. Error analysis yields that the error of linearization is acceptable only for |s − sk | < ∆s1 , and negligible for |s − sk | < ∆s2 . Define an initial guess s0 = Xβ ∗ and initialize λ with λ > 0. ˜ k unless |sk−1 − sk | < ∆s2 . 1. compute H

2. Find sk+1 by solving the following equations:   bpr + βbin + Q ˜ sy,k (ξpr + ξin ) sk+1 = X β  ˜ Qyy,k + λR ˜T XT H k

 ˜ Qyy,k − τ R ˜T XT H k

  ˜ kX ξin H − (1 + λ) Q−1 βbin ββ

  ˜ kX ξpr H − (1 + λ) Q−1 βbpr ββ

=

"

=

"

τ = 1 − (1 + λ)−γ .

y − f (sk )

∗ b −Q−1 ββ β − βk

(6.61) 

˜ k sk H b − (1 + λ) Q−1 ββ βk

# #

(6.62)

(6.63) (6.64)

If |sk+1 − sk | >= ∆s1 or if the objective function does not improve, increase λ and repeat step 2. Otherwise decrease λ and continue. 3. Increase k by one and repeat until convergence.

6.3 Modified Levenberg-Marquardt Algorithm

57

6.3.1 Properties of the Modified Algorithm Algorithm 4 has the following properties: 1. The Levenberg-Marquardt parameter λ controls the step size. If the previous step is sufficiently small, the limit for λ → ∞ is a step size of zero. This property is discussed below in more detail. 2. By appropriate choice of γ > 0, the algorithm can be fine-tuned to the problem at hand. For large γ, the algorithm becomes more aggressive in suppressing the deterioration of the projecting part. I recommend to choose γ > 1 to ensure that ξrep → ξk is of higher order than ξin → 0. 3. As the algorithm converges, λ can be decreased towards zero. For λ → 0, the algorithm is identical to the conventional form (Algorithm 2). 4. The solution found by Algorithm 4 has the same properties as the solution found by Algorithm 2, following the strict Bayesian framework. Eq. (6.61) is not to be confused with the form of the solution that would violate the Bayesian concept as discussed above (eq. 6.51), since no outdated subspaces appear here. 5. The solution space is screened in a controlled manner, starting at the prior mean. In some cases, the uniqueness of the solution is questionable because the problem is strongly non-linear and the objective function has several local minima. Then, the solution found by Algorithm 4 has better chances to comply with the smoothness condition and to fulfill common statistical criteria for testing the solution. 6. The costs of searching for an adequate value of λ and for computing new linearizations are minimized through error analysis.

6.3.2 Step Size Control ˜ yy,k is negligible for very large λ. Then, approximate Qyy ≈ λR and substitute the In eq. (6.62), Q modified cokriging matrix from eq. (6.62) in eqs. (6.20) to (6.22) to obtain the limit of the P submatrices for λ → ∞. Insert the resulting expressions and the right-hand side vector from eq. (6.62) into eqs. (6.25) & (6.26) to give: lim (ξin )

λ→∞

lim

λ→∞



bin β

= 0



= 0.

(6.65)

This shows that increasing λ can be used to restrict the step size for the innovative part. ˜ yy,k −τ R = Similarly, I can show that spr = sk for λ → ∞. Considering that, according to eq. (6.45), Q ˜ k Qss H ˜ T for infinite λ, R vanishes from eq. (6.56), and I obtain: H k lim ˆrpr

λ→∞

˜ k spr lim H

λ→∞

= 0 =

˜ k sk . H

(6.66)

˜k = H ˜ k−1 : Combining eqs. (6.61) through (6.66) yields for the linear case with H lim sk+1 = sk .

λ→∞

(6.67)

58

Quasi-Linear Geostatistical Inversing

Still, the algorithm is subject to the deterioration of the solution, since eq. (6.67) holds only for the ˜k = H ˜ k−1 . For H ˜ k 6= H ˜ k−1 , these identities are only approximations. In most linear case with H situations, the approximate character of eq. (6.67) does not cause problems. In case problems should ˜k ≈ occur, the step size restriction defined by ∆s1 can be chosen more drastically to ensure that H ˜ Hk−1 .

6.3.3 Application to known and unknown mean Algorithm 4 is designed for the case of uncertain mean. The cases of known and unknown mean are merely special limiting cases of this general case. To obtain an algorithm for the case of unknown mean, set Q−1 0 in all places. A more stable version for the unknown mean case can be obtained ββ =   by substituting β ∗ − βbk = 0 in eq. (6.62). In this case, no bias is exerted onto βb so that effectively

the algorithm behaves like in the unknown mean case, but the step size control over β is still active.

For the case of known mean, the entire derivations simplify, and the additional terms, rows and columns for βb disappear in all equations. Instead, the known mean value Xβ is added to sk+1 in eq. (6.61) and subtracted from sk in eq. (6.63).

6.4 Identification of Structural Parameters Up to now, the structural parameters that define the covariance function for the unknown parameter field was assumed to be known. The Quasi-Linear Geostatistical Approach, in a second step after conditioning the unknowns, estimates the structural parameters based on information included in the observations (Kitanidis, 1995 [56]). For reasons discussed in Section 6.2.1, I now extend the existing method to include prior information on the structural parameters. Consider the covariance matrix Qss a function of structural parameters θ such as the variance and integral scale of a geostatistical model. These parameters can be identified based on the observations y by maximizing the posterior probability density function of the structural parameters given the measurements, p (θ|y). Assume that the prior information on θ is available in the form a (multi-) Gaussian probability density function:   1 T −1 ∗ (θ − θ ) , (6.68) p (θ) ∝ ||Qθθ || 2 exp − (θ − θ ∗ ) Q−1 θθ 2 in which ||·|| denotes the determinant of a matrix. Applying Bayes theorem yields for p (θ|y): p (θ|y) =

p (y|θ) p (θ) . p (y)

(6.69)

Omitting the normalizing constant p (y), substituting p (θ) by eq. (6.68) and p (y) by eq. (6.11), and including the linearization yields a convenient expression for the posterior pdf of the structural parameters:   1 0 − 21 0 ∗ T −1 0 ∗ p (θ|y ) ∝ ||Gyy Qθθ || exp − (y − HXβ ) Gyy (y − HXβ ) 2   1 −1 ∗ T ∗ · exp − (θ − θ ) Qθθ (θ − θ ) . (6.70) 2

6.5 Performance Test

59

The vector y0 is defined by eq. (6.44). The tilde to denote linearized quantities has been omitted for H and Gyy . For the sake of easy reading, I define: z = y0 − HXβ ∗ . To find the peak of this function, I minimize its negative logarithm: L (θ|y) = C0 +

1 1 ln ||Gyy Qθθ || + zT G−1 yy z 2 2

1 T ∗ + (θ − θ ∗ ) Q−1 θθ (θ − θ ) , 2

(6.71)

in which C0 is the logarithm of the proportionality constant in eq. (6.70). Since Gyy potentially is a non-linear function of the structural parameters, I employ the Gauss-Newton method: θk+1 = θk − F−1 g ,

(6.72)

where F is the Fisher information matrix and g is the gradient of eq. (6.71). Using the rules of partial derivatives, it is easy to check that ∂Gyy ∂θ

=

∂Qyy . ∂θ

After applying eqs. (6.73) and (A.15) through (A.17), I obtain:   ∂L (θ|y) 1 ∂Qyy −1 1 ∂Qyy −1 ∗ gi = = Tr Gyy − zT G−1 Gyy z + eTi Q−1 yy θθ (θ − θ ) . ∂θi 2 ∂θi 2 ∂θi

(6.73)

(6.74)

The Fisher information matrix is defined by the expected value of the Hessian matrix. The i, j-th element of the Hessian is given by:  2    1 ∂ Qyy −1 1 ∂Qyy −1 ∂Qyy −1 ∂ 2 L (θ|y) = Tr G − Tr Gyy Gyy ∂θi ∂θj 2 ∂θi ∂θj yy 2 ∂θi ∂θj ∂Qyy −1 ∂Qyy −1 1 ∂ 2 Qyy −1 Gyy Gyy z − zT G−1 G z + eTi Q−1 yy θθ ej . ∂θi ∂θj 2 ∂θi ∂θj yy   Taking the expected value, applying eqs. (A.12) to (A.14) and taking into account that E zzT is equal to Gyy yields for the elements of F:   1 ∂Qyy −1 ∂Qyy −1 Fij = Tr Gyy Gyy + eTi Q−1 (6.75) θθ ej . 2 ∂θi ∂θj +zT G−1 yy

These results differ from the results by Kitanidis (1995) [56] only in the definition of G−1 yy and through the terms that stem from introducing prior information on the structural parameters.

6.5 Performance Test To compare the performance of the modified Levenberg-Marquardt algorithm and of the conventional algorithm, I apply both of them to a problem described by Cirpka and Kitanidis (2001) [14] with several simplifications: I seek for the hydraulic conductivity distribution K with uncertain mean in a 2-dimensional locally isotropic aquifer, considering measurements of hydraulic head φ

60

Quasi-Linear Geostatistical Inversing

and arrival time t50 of a conservative tracer. Since a full mathematical description of the underlying problem is given in the original publication, I only provide a brief summary. The unknown quantity is the log-conductivity Y = log K, discretized as an elementwise constant function on a regular grid with n elements. The unknowns are second-order stationary with uncertain constant mean, making X a n × 1 vector with unit entries. The covariance matrix Qss is given by the exponential model with structural parameters that are assumed to be known for simplicity. Since the grid is regular and equispaced, Qss is structured such that I can apply the spectral methods described in Chapter 8 to compute Qyy and Qsy . The transformation Y = ln K is linearized about Y˜k by:   ˜k + K ˜kY 0 , exp Y˜k + Y 0 ≈ K   ˜ k = exp Y˜k . with K

(6.76)

The domain Ω with the boundary Γ is rectangular. The boundary conditions for the flow problem are fixed head on the east and west section and no-flow conditions north and south, forcing the regional mean groundwater flow from west to east. The boundary conditions for the tracer are an instantaneous release of the tracer on the west boundary at time zero, zero-flux in north and south and no diffusive flux in the east.

Figure 6.1: Test case for Quasi-Linear Geostatistical Inversing The sensitivities of the observation with respect to Y are given in Chapter 7. A major contribution to the non-linearity of this problem originates from the linearization of K = exp (Y ). Error analysis yields: ε (Y 0 ) = exp (Y 0 ) − (1 + Y 0 ) . (6.77) I decide the error to be acceptable for Y 0 < 0.4, ε ≈ 0.1, and negligible for Y 0 < 0.01, ε ≈ 5e − 5, which I will take into account in Algorithm 4.

6.5 Performance Test

61

To set up test cases, I generate unconditional realizations of Y using the spectral approach of Dietrich and Newsam (1993) [25], solve the flow and transport problem, pick values of φ and t50 at the measurement locations, and add white noise to obtain artificial measurement data. An example of an unconditional realization of log K together with the corresponding head and arrival time distribution and measurement locations is displayed in Figure 1, (a) through (c). The subfigures show a realization of Y = log K with σY2 = 3.2. (a), hydraulic heads (b) and arrival time (c). The dots represent locations of measurements of the corresponding quantity. The grayscale is normalized for later direct comparison to the subfigures on the right. Subsequently, I ’forget’ the generated distributions of log K and proceed with the Quasi-Linear Geostatistical Approach to determine the unknown spatial distribution of Y , using both algorithms for comparison. For illustration, the result of a specific test case is displayed in Figure 1, (d) through (f). The conductivity distribution recovered through the Geostatistical Approach is smoother than the original field. However, the dependent quantities used for conditioning meet the measurements at the measurement locations. Table 6.1: Parameters used in the test cases parameter

units

value

parameter

units

value

domain length Lx correl. length λx grid spacing dx

m m m

1000 4 4

domain length Ly correl. length λy grid spacing dy

m m m

500 2 4

observations φ observations t50 error σh for h error σt for t50

m %

25 15 0.01 10

porosity dispersivity al dispersivity αt diffusion Dm

m m

0.3 10 1 10−9

m2 s

I ran several test cases. In case one, I chose the variance of Y sufficiently small so that the problem is effectively linear. In all other test cases, I increase the variance of Y by simply scaling the realization used in case one, making the problem increasingly non-linear. The chosen parameter values are listed in Table 6.1. In some regions, the contour-lines of the hydraulic head in Figure 1b indicate that the streamlines are inclined up to almost ninety degrees at a variance of σY2 = 6.4. The distorted streamline pattern resembles the extreme degree of non-linearity due to the heterogeneity of the flow field which affects all transport-related processes and quantities like, e.g., the transport of a tracer and hence its arrival time distribution. In the almost linear test case, σY2 = 0.1, both algorithms find identical solutions within few steps. At a variance of σY2 = 0.4, the conventional algorithm (Algorithm 2) begins to oscillate, but finally finds a solution after 15 steps, while the modified Levenberg-Marquardt version (Algorithm 4) converges after five steps. The solutions differ slightly, but both have similar values of the objective function that cannot be rejected based on χ2 statistics using a 95% confidence level. However, the value of the prior term is smaller for the solution obtained from Algorithm 4, i.e., the latter solution is smoother than the one obtained from the conventional algorithm. The case with variance of unity is the limiting case where the conventional case can still converge. At a variance of σY2 = 1.6, the conventional algorithm fails. The new algorithm still proves stable at a variance of σY2 = 3.2, converging after 19 steps. The solution for this case is shown in Figure 6.1(d,e,f). The solution cannot be rejected on a 95% confidence level. Only at a variance of σY2 = 6.4, the new algorithm stagnates and produces a solution that must be rejected in the statistical test.

62

Quasi-Linear Geostatistical Inversing

6.6 Summary and Conclusions In this chapter, I extended the Quasi-Linear Geostatistical Approach to account for prior information both on the mean of the unknowns and on the structural parameters that define the auto-covariance of the unknown parameter field. Further, I developed a modified Levenberg-Marquardt algorithm to replace the Gauss-Newton algorithm in the original method. A discussion in the Bayesian framework revealed that, and why, the solution has to obey a certain form. The solution is defined in a subspace obtained from the geostatistical approach. The subspace changes during the iteration procedure. In order to comply with the Bayesian concept and to maintain the uniqueness of the solution, only the final and optimal subspace must be used, and the previous trial solution must be projected onto the current subspace in each iteration step. Like the Gauss-Newton algorithm for least-squares fitting, the Gauss-Newton version of the QuasiLinear Geostatistical Approach encounters problems in strongly non-linear cases. Overshooting and oscillations may occur, causing the algorithm to diverge. In case the inverse problem is strongly nonlinear and the objective function has multiple local minima, excessively large steps lead to solutions that fail to obey the smoothness condition implied by the geostatistical approach. In the quasi-Linear Geostatistical Approach, the projection onto the current subspace introduces additional instabilities. The modified Levenberg-Marquardt algorithm for the Quasi-Linear Geostatistical Approach splits each iteration step into two parts: Projecting the previous trial solution onto the current subspace and reducing the residuals. Each part is equipped with its own stabilization mechanism. The first stabilization is to reduce the deterioration of the trial solution during the projection onto the current subspace, while the second restricts the improvement of the residuals to prevent overshooting. The new algorithm screens the solution space starting at the prior mean in a geostatistically controlled manner. Exerting control over the step size, it reduces the risk of oscillation or overshooting of the solution. In case of strong non-linearity, the objective functions may have multiple minima and the identity of the solution to the global minimum cannot be proved. Instead, decision criteria based on geostatistical considerations are used to reject or accept the solution. According to my experience, local minima do in most cases not fulfill these decision criteria since they fail to comply with the smoothness condition implied by the geostatistical approach. By putting the step size control into the geostatistical framework, the new algorithm has improved chances to obey the smoothness criterion. I demonstrated in test cases that the new algorithm has an increased convergence radius. It can cope with stronger non-linearity while requiring less iteration steps than its Gauss-Newton relative. This allows to apply the Quasi-Linear Geostatistical Approach to cases of higher variability and increased non-linearity.

Chapter 7

Sensitivity Analysis In this chapter, I derive sensitivities for observations of state variables with respect to parameters of the transfer function. In my case, the transfer function is given by the governing equations defined in Chapter 3. The state variables include the hydraulic head φ, the total flux Qtot , and the first and second central temporal moments m1 and m2c of local breakthrough curves, or alternatively the normalized second central moment m2cn . As parameters, I consider the log-conductivity log K and the scalar log-dispersion coefficient log Ds . The previous chapter covered the method of Quasi-Linear Geostatistical Inversing. It successively linearizes the transfer function about the current estimate using the sensitivity matrix H, see eq. (6.43). Since the number of observations and discrete values of the unknown parameters may be fairly large, efficient methods are required for the sensitivity analysis. Numerical differentiation is the most straightforward method. Consider an application with n unknown values of log K and m observations. Then, computing the entries of the m × n sensitivity matrix H from finite difference quotients takes n + 1 evaluations of the governing equations. The costs of numerical differentiation are prohibitive for large problems. Evaluating the transfer function requires the solution of pde’s. This is typically associated with computational costs in the order of n2 when using the usual range of solvers based on the Conjugate Gradient method. These costs can be reduced to the order of n log2 n when using multigrid solvers, like the Algebraic Multigrid Solver by Ruge and Stüben (1986) [85]. Altogether, numerical differentiation has costs in the order of at least n2 log2 n. A much more efficient method to compute sensitivities is the adjoint-state method (see, e.g., Townley and Wilson, 1985, Sykes et al., 1985 [97, 96]). The most detailed discussion of this method can be found in the comprehensive textbook by Sun (1994) [94]. Adjoint state sensitivities have successfully been applied for inverse modeling by many authors (e.g., Sun and Yeh, 1990, LaVenue and Pickens, 1992, Cirpka and Kitanidis, 1999 [95, 63, 14]). For each of the m observation, a set of adjoint problems has to be solved that is formally similar or even identical to the transfer function. When again using algebraic multigrid solvers, each adjoint problem has computational costs in the order of n log n. The adjoint state method allows to evaluate the sensitivity matrix with costs only in the order of mn log2 n. It trades one order in n for one order in m. Because observations are notoriously expensive in the field of hydrogeology, m is typically much smaller than n, and this trade drastically reduces the computational costs. The new contributions of this chapter are the sensitivities of Qtot , m2c and m2cn with respect to log K and the sensitivities of m1 , m2c and m2cn with respect to log D. The derivations for the sensitivities

64

Sensitivity Analysis

of m1 and φ with respect to log K are covered by Sun (1994) and Cirpka and Kitanidis (1999) [94, 14]. However, I decided to include them for the sake of completeness.

7.1 Outline of the Adjoint-State Method Before diving into the details of adjoint state sensitivities, I find it advisable to give a short general overview of the underlying method. The goal is to find the total derivative of an observation Zi of a state variable zi with respect to the unknown parameter P . The sensitivity is expressed by a ratio of perturbations: dZi Z0 ≈ i0 . dP P Formally, one could denote the process of observing the state variable zi at location xi as follows: Z Zi = zi (x) = δij δ (xi ) zj dΩ , Ω

in which summation over index j for all types of relevant state variables is implied. The Kronecker delta δij is unity for i = j and zero else, and δ (xi ) is a Dirac pulse at the location of observation. Perturbation analysis yields: Z Zi0 =



δij δ (xi ) zj0 dΩ .

(7.1)

In Section 7.2, I will derive stochastic partial differential equations for the governing equations, spde's for short. They describe how perturbations P' of the parameters propagate onto perturbations zj'. These spde's inconveniently contain derivatives of the perturbations. To obtain expressions that are linear in all zj', I derive weak forms of the spde's in Section 7.3. In the weak formulations, the differential operators are shifted from the state variables and parameters onto trial functions ψj:

$$\int_\Omega \mathcal{D}(\psi_j,\ldots)\, z_j' + \mathcal{D}(\psi_j,\ldots)\, P' \, d\Omega = 0 . \qquad (7.2)$$

Here, D(·) denotes differential expressions that are not specified any further in this introduction. Adding eqs. (7.1)&(7.2) leads to an expression of the form:

$$Z_i' = \int_\Omega \left[\delta_{ij}\,\delta(\mathbf{x}) + \mathcal{D}(\psi_j,\ldots)\right] z_j' + \left[\mathcal{D}(\psi_j,\ldots)\right] P' \, d\Omega . \qquad (7.3)$$

Now, the trial functions ψj are chosen such that the brackets multiplied by zj' vanish:

$$\delta_{ij}\,\delta(\mathbf{x}) + \mathcal{D}(\psi_j,\ldots) = 0 . \qquad (7.4)$$

The equations originating from this step are the adjoint-state equations, and ψj denotes the adjoint states of the state variables zj. Since the differential operators shifted onto the adjoint states originally stem from the governing equations, the adjoint-state equations are formally very similar to the governing equations. With the brackets vanishing, eq. (7.3) simplifies to

$$Z_i' = \int_\Omega \left[\mathcal{D}(\psi_j,\ldots)\right] P' \, d\Omega . \qquad (7.5)$$

When defining P as piecewise constant values within sub-volumes Ωk of the domain, one can divide by Pk':

$$\frac{Z_i'}{P_k'} = \int_{\Omega_k} \left[\mathcal{D}(\psi_j,\ldots)\right] d\Omega . \qquad (7.6)$$


Eq. (7.6) is the desired expression for the sensitivity. I discuss these latter steps in Section 7.4. Typically, the adjoint-state equations (eq. 7.4) are solved numerically. The sensitivity is then evaluated in a postprocessing step by inserting the resulting values of ψj into eq. (7.6). How to solve the adjoint-state equations under various conditions is covered in Section 7.5.

7.2 Small Perturbation Analysis

7.2.1 Log-Conductivity

Stochastic pde's are partial differential equations that propagate perturbations of the parameters onto dependent state variables. In other words, they describe how variations in the parameter input cause variations in the state variable output. Stochastic pde's, or spde's for short, are derived by performing a perturbation analysis on the parameters and state variables in the governing equations.

Small-Perturbation Analysis

Assume that K is a random variable distributed log-normally. Via the transformation

$$Y = \log K ,$$

one obtains the corresponding Gaussian quantity, the log-conductivity Y. Now, write Y as the sum of its expected value Ȳ = E[Y] and a perturbation Y':

$$Y = \bar{Y} + Y' ,$$

where E[Y'] = 0. The inverse transformation yields:

$$K = \exp(Y) = \exp(\bar{Y} + Y') = \exp(\bar{Y})\exp(Y') = K_g \exp(Y') \approx K_g + K_g Y' ,$$

in which the exponential function has been linearized about the geometric mean Kg. Obviously, the first-order expected value of K is the geometric mean $K_g = \exp(\bar{Y})$, and its perturbation is K' = Kg Y'. The perturbations of log K propagate onto the dependent state variables:

$$\phi = \bar{\phi} + \phi' , \quad \mathbf{q} = \bar{\mathbf{q}} + \mathbf{q}' , \quad \mathbf{v} = \bar{\mathbf{v}} + \mathbf{v}' , \quad m_1 = \bar{m}_1 + m_1' , \quad m_2 = \bar{m}_2 + m_2' , \quad m_{2c} = \bar{m}_{2c} + m_{2c}' .$$

SPDE's for Groundwater Flow

Replacing K and q in eq. (3.1) by the perturbed quantities yields

$$\bar{\mathbf{q}} + \mathbf{q}' = -(K_g + K_g Y')\,\nabla\left(\bar{\phi} + \phi'\right) . \qquad (7.7)$$

Taking the expected value and neglecting products of perturbations yields a linearized expression for the mean specific discharge:

$$\bar{\mathbf{q}} = -K_g \nabla\bar{\phi} . \qquad (7.8)$$

Subtracting the expected value from eq. (7.7) and dropping second-order terms yields a linearized expression for the fluctuations of q:

$$\mathbf{q}' = -K_g\left(\nabla\phi' + Y'\nabla\bar{\phi}\right) . \qquad (7.9)$$


Inserting eq. (7.8) into the groundwater flow equation (eq. 3.2) and its boundary conditions (eq. 3.3) produces the pde for the expected value of the head:

$$\nabla\cdot\left(K_g\nabla\bar{\phi}\right) = 0 \quad \text{in } \Omega \qquad (7.10)$$

$$\left(K_g\nabla\bar{\phi}\right)\cdot\mathbf{n} = \tilde{q} \ \text{on } \Gamma_1 , \qquad \bar{\phi} = \tilde{\phi} \ \text{on } \Gamma_2 , \qquad \left(K_g\nabla\bar{\phi}\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no} . \qquad (7.11)$$

When inserting eq. (7.9) into eqs. (3.2)&(3.3), one obtains the spde for the heads and its boundary conditions:

$$\nabla\cdot\left(K_g\nabla\phi' + K_g Y'\nabla\bar{\phi}\right) = 0 \quad \text{in } \Omega \qquad (7.12)$$

$$\phi' = 0 \ \text{on } \Gamma_2 , \qquad \left(K_g\nabla\phi' + K_g Y'\nabla\bar{\phi}\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_1 . \qquad (7.13)$$

SPDE's for Temporal Moments

Following the same procedure starting from the generating equation for the first temporal moment, one obtains for the expected value of m1:

$$\nabla\cdot\left(\bar{\mathbf{v}}\bar{m}_1 - \mathbf{D}\nabla\bar{m}_1\right) - \bar{m}_0 = 0 \quad \text{in } \Omega \qquad (7.14)$$

$$\left(\bar{\mathbf{v}}\bar{m}_1 - \mathbf{D}\nabla\bar{m}_1\right)\cdot\mathbf{n} = \tilde{v}\tilde{m}_1 \ \text{on } \Gamma_{in1} , \qquad \bar{m}_1 = \tilde{m}_1 \ \text{on } \Gamma_{in2} , \qquad \left(\mathbf{D}\nabla\bar{m}_1\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} \qquad (7.15)$$

and the corresponding spde:

$$\bar{\mathbf{v}}\cdot\nabla m_1' + \mathbf{v}'\cdot\nabla\bar{m}_1 - \nabla\cdot\left(\mathbf{D}\nabla m_1'\right) = 0 \quad \text{in } \Omega \qquad (7.16)$$

$$\left(\bar{\mathbf{v}} m_1' + \mathbf{v}'\bar{m}_1 - \mathbf{D}\nabla m_1'\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1} , \qquad m_1' = 0 \ \text{on } \Gamma_{in2} , \qquad \left(\mathbf{D}\nabla m_1'\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} . \qquad (7.17)$$

Again using the same method, I have derived the spde for the second central moment:

$$\bar{\mathbf{v}}\cdot\nabla m_{2c}' + \mathbf{v}'\cdot\nabla\bar{m}_{2c} - \nabla\cdot\left(\mathbf{D}\nabla m_{2c}'\right) - 4\nabla m_1'\cdot\left(\frac{\mathbf{D}}{m_0}\nabla\bar{m}_1\right) = 0 \quad \text{in } \Omega \qquad (7.18)$$

$$\left(\bar{\mathbf{v}} m_{2c}' + \mathbf{v}'\bar{m}_{2c} - \mathbf{D}\nabla m_{2c}'\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1} , \qquad m_{2c}' = 0 \ \text{on } \Gamma_{in2} , \qquad \left(\mathbf{D}\nabla m_{2c}'\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} . \qquad (7.19)$$

7.2.2 Log-Dispersion Coefficient

In analogy to the spde's for log K, I now present new spde's that describe how perturbations of the scalar log-dispersion coefficient log Ds propagate onto dependent state variables. The state variables I consider here are the first and the second central temporal moments of local breakthrough curves.


Small Perturbation Analysis

I assume that Ds is a random variable which is distributed log-normally. Via the transformation

$$\Xi = \log D_s ,$$

I obtain a corresponding Gaussian quantity, the scalar log-dispersion coefficient Ξ. Now, I write Ξ as the sum of its expected value Ξ̄ = E[Ξ] and a perturbation Ξ':

$$\Xi = \bar{\Xi} + \Xi' ,$$

with E[Ξ'] = 0. The inverse transformation yields:

$$D_s = \exp(\Xi) = \exp(\bar{\Xi} + \Xi') = \exp(\bar{\Xi})\exp(\Xi') = D_g \exp(\Xi') \approx D_g + D_g \Xi' ,$$

in which I linearized the exponential function about Dg. Obviously, the first-order expected value of Ds is its geometric mean $D_g = \exp(\bar{\Xi})$, and its perturbation is Ds' = Dg Ξ'. The perturbations of Ds propagate onto the dependent state variables:

$$m_1 = \bar{m}_1 + m_1' , \qquad m_{2c} = \bar{m}_{2c} + m_{2c}' .$$

SPDE for the k-th Temporal Moment

For simplicity, I denote Ds by D in the following. I replace D and mk in eqs. (3.11)&(3.12) by the perturbed quantities, take the expected value and neglect products of perturbations to obtain the first-order expected value of mk:

$$\mathbf{v}\cdot\nabla\bar{m}_k - \nabla\cdot\left(D_g\nabla\bar{m}_k\right) - k\,\bar{m}_{k-1} = 0 \quad \text{in } \Omega \qquad (7.20)$$

$$\left(\mathbf{v}\bar{m}_k - D_g\nabla\bar{m}_k\right)\cdot\mathbf{n} - \tilde{v}_{in}\tilde{m}_{k,in} = 0 \ \text{on } \Gamma_{in1} , \qquad \bar{m}_k = \tilde{m}_k \ \text{on } \Gamma_{in2} , \qquad \left(D_g\nabla\bar{m}_k\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} . \qquad (7.21)$$

Then, I subtract the expected value from the perturbed generating equations and drop all second-order terms to set up the linearized spde for the perturbations mk':

$$\mathbf{v}\cdot\nabla m_k' - \nabla\cdot\left(D_g\nabla m_k' + D_g\Xi'\nabla\bar{m}_k\right) - k\,m_{k-1}' = 0 \quad \text{in } \Omega \qquad (7.22)$$

$$\left(\mathbf{v} m_k' - D_g\nabla m_k' - D_g\Xi'\nabla\bar{m}_k\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1} , \qquad m_k' = 0 \ \text{on } \Gamma_{in2} , \qquad \left(D_g\nabla m_k' + D_g\Xi'\nabla\bar{m}_k\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} . \qquad (7.23)$$

SPDE for the Second Temporal Moment

Likewise, I apply the same procedure to eqs. (3.13)&(3.14). The pde for the first-order expected value of the second central moment is:


$$\mathbf{v}\cdot\nabla\bar{m}_{2c} - \nabla\cdot\left(D_g\nabla\bar{m}_{2c}\right) - 2\frac{D_g}{m_0}\nabla\bar{m}_1\cdot\nabla\bar{m}_1 = 0 \quad \text{in } \Omega$$

$$\mathbf{n}\cdot\left(\mathbf{v}\bar{m}_{2c} - D_g\nabla\bar{m}_{2c}\right) - \tilde{v}_{in}\tilde{m}_{2c,in} = 0 \ \text{on } \Gamma_{in1} , \qquad \bar{m}_{2c} = \tilde{m}_{2c,in} \ \text{on } \Gamma_{in2} , \qquad \mathbf{n}\cdot\left(D_g\nabla\bar{m}_{2c}\right) = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} ,$$

and the corresponding spde is:

$$\mathbf{v}\cdot\nabla m_{2c}' - \nabla\cdot\left(D_g\nabla m_{2c}' + D_g\Xi'\nabla\bar{m}_{2c}\right) - 2\frac{D_g\Xi'}{m_0}\nabla\bar{m}_1\cdot\nabla\bar{m}_1 - 4\frac{D_g}{m_0}\nabla m_1'\cdot\nabla\bar{m}_1 = 0 \quad \text{in } \Omega \qquad (7.24)$$

$$\mathbf{n}\cdot\left(\mathbf{v} m_{2c}' - D_g\nabla m_{2c}' - D_g\Xi'\nabla\bar{m}_{2c}\right) = 0 \ \text{on } \Gamma_{in1} , \qquad m_{2c}' = 0 \ \text{on } \Gamma_{in2} , \qquad \mathbf{n}\cdot\left(D_g\nabla m_{2c}' + D_g\Xi'\nabla\bar{m}_{2c}\right) = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} . \qquad (7.25)$$

7.3 Weak Formulations

7.3.1 Log-Conductivity

In this section, I present the weak formulations of the spde's defined in Section 7.2.1 for perturbations of log K.

Groundwater Flow

Starting from eq. (7.12), multiply by the trial solution ψφ and integrate over the entire domain Ω to derive a weak formulation of the hydraulic-head spde:

$$\int_\Omega \psi_\phi\,\nabla\cdot\left(K_g\nabla\phi' + K_g Y'\nabla\bar{\phi}\right) d\Omega = 0 \quad \text{in } \Omega .$$

To shift the differential operators from the perturbations of φ onto the trial solution, I apply Green's theorem to the single terms and then sort by perturbations:

$$\int_\Omega \left[\nabla\cdot\left(K_g\nabla\psi_\phi\right)\right]\phi'\, d\Omega - \int_\Omega \left[\nabla\psi_\phi\cdot\left(K_g\nabla\bar{\phi}\right)\right] Y'\, d\Omega + \int_\Gamma \left[\psi_\phi\left(K_g\nabla\phi' + K_g Y'\nabla\bar{\phi}\right) - \phi'\left(K_g\nabla\psi_\phi\right)\right]\cdot\mathbf{n}\, d\Gamma = 0 .$$

The boundary conditions for the head spde (eq. 7.13) simplify the boundary terms:

$$\left[\psi_\phi\left(K_g\nabla\phi' + K_g Y'\nabla\bar{\phi}\right) - \phi'\left(K_g\nabla\psi_\phi\right)\right]\cdot\mathbf{n} = \begin{cases} -\phi'\left(K_g\nabla\psi_\phi\right)\cdot\mathbf{n} & \text{on } \Gamma_1\cup\Gamma_{no} \\ \psi_\phi\left(K_g\nabla\phi' + K_g Y'\nabla\bar{\phi}\right)\cdot\mathbf{n} & \text{on } \Gamma_2 . \end{cases} \qquad (7.26)$$


By choosing additional boundary conditions for the weighting function ψφ, all remaining boundary terms vanish except one:

$$\int_\Omega \left[\nabla\cdot\left(K_g\nabla\psi_\phi\right)\right]\phi'\, d\Omega - \int_\Omega \left[\nabla\psi_\phi\cdot\left(K_g\nabla\bar{\phi}\right)\right] Y'\, d\Omega - \int_{\Gamma_1} \phi'\left[K_g\nabla\psi_\phi\right]\cdot\mathbf{n}\, d\Gamma = 0 \quad \text{in } \Omega \qquad (7.27)$$

$$\psi_\phi = 0 \ \text{on } \Gamma_2 , \qquad \left(K_g\nabla\psi_\phi\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no} . \qquad (7.28)$$

Eqs. (7.27)&(7.28) are the weak form of the head spde for perturbations of log K. The remaining boundary term could be canceled by an appropriate choice for the boundary section Γ1. However, it will be taken care of at a later stage.

First Temporal Moment

A weak formulation for the spde for the first temporal moment is:

$$\int_\Omega \psi_1\left(\bar{\mathbf{v}}\cdot\nabla m_1' + \mathbf{v}'\cdot\nabla\bar{m}_1\right) - \psi_1\,\nabla\cdot\left(\mathbf{D}\nabla m_1'\right) d\Omega = 0 \quad \text{in } \Omega .$$

After applying Green's theorem to the single terms, one obtains:

$$-\int_\Omega m_1'\left(\bar{\mathbf{v}}\cdot\nabla\psi_1 + \nabla\cdot\left(\mathbf{D}\nabla\psi_1\right)\right) d\Omega + \int_\Omega \psi_1\,\mathbf{v}'\cdot\nabla\bar{m}_1\, d\Omega + \int_\Gamma \left(\psi_1\bar{\mathbf{v}} m_1' - \psi_1\mathbf{D}\nabla m_1' + m_1'\mathbf{D}\nabla\psi_1\right)\cdot\mathbf{n}\, d\Gamma = 0 .$$

The boundary conditions specified in eqs. (7.17)&(7.13) and eq. (7.11) simplify the boundary terms. Choosing appropriate boundary conditions for the trial function cancels out the remaining terms on the boundary:

$$-\int_\Omega m_1'\left(\bar{\mathbf{v}}\cdot\nabla\psi_1 + \nabla\cdot\left(\mathbf{D}\nabla\psi_1\right)\right) d\Omega + \int_\Omega \psi_1\,\mathbf{v}'\cdot\nabla\bar{m}_1\, d\Omega = 0 \quad \text{in } \Omega$$

$$\left(\mathbf{D}\nabla\psi_1\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1}\cup\Gamma_{no} , \qquad \psi_1 = 0 \ \text{on } \Gamma_{in2} , \qquad \left(\bar{\mathbf{v}}\psi_1 + \mathbf{D}\nabla\psi_1\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{out} .$$

The perturbations of v may be expressed using eq. (7.9), so that:

$$\int_\Omega \psi_1\,\mathbf{v}'\cdot\nabla\bar{m}_1\, d\Omega = -\int_\Omega \psi_1\left(\frac{K_g}{\theta} Y'\nabla\bar{\phi} + \frac{K_g}{\theta}\nabla\phi'\right)\cdot\nabla\bar{m}_1\, d\Omega .$$

Applying Green's theorem once again to transfer the differential operator away from the perturbations yields

$$-\int_\Omega \frac{K_g}{\theta}\nabla\phi'\cdot\psi_1\nabla\bar{m}_1\, d\Omega = \int_\Omega \phi'\,\nabla\cdot\left(\psi_1\frac{K_g}{\theta}\nabla\bar{m}_1\right) d\Omega - \int_\Gamma \phi'\,\psi_1\frac{K_g}{\theta}\nabla\bar{m}_1\cdot\mathbf{n}\, d\Gamma .$$


The new boundary term vanishes everywhere except on the inflow boundary Γin1, since φ' = 0 on Γout ∪ Γ2 and ∇m̄1 = 0 on Γno. Again, this remaining boundary term is taken care of later. The weak form of the spde for m1 and perturbations of log K is:

$$-\int_\Omega \left[\bar{\mathbf{v}}\cdot\nabla\psi_1 + \nabla\cdot\left(\mathbf{D}\nabla\psi_1\right)\right] m_1'\, d\Omega + \int_\Omega \left[\nabla\cdot\left(\psi_1\frac{K_g}{\theta}\nabla\bar{m}_1\right)\right]\phi'\, d\Omega + \int_\Omega \left[\psi_1\bar{\mathbf{v}}\cdot\nabla\bar{m}_1\right] Y'\, d\Omega - \int_{\Gamma_{in1}} \phi'\left[\psi_1\frac{K_g}{\theta}\nabla\bar{m}_1\right]\cdot\mathbf{n}\, d\Gamma = 0 \quad \text{in } \Omega \qquad (7.29)$$

$$\left(\mathbf{D}\nabla\psi_1\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1}\cup\Gamma_{no} , \qquad \psi_1 = 0 \ \text{on } \Gamma_{in2} , \qquad \left(\bar{\mathbf{v}}\psi_1 + \mathbf{D}\nabla\psi_1\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{out} . \qquad (7.30)$$

Second Central Temporal Moment

The spde for m2c is identical to the spde for m1 except for the additional source term. I multiply the additional source term by the weighting function ψ2c and then integrate over the entire domain. Then, I apply Green's theorem:

$$-4\int_\Omega \psi_{2c}\frac{\mathbf{D}}{m_0}\nabla m_1'\cdot\nabla\bar{m}_1\, d\Omega = +4\int_\Omega m_1'\,\nabla\cdot\left(\psi_{2c}\frac{\mathbf{D}}{m_0}\nabla\bar{m}_1\right) d\Omega - 4\int_\Gamma m_1'\left(\psi_{2c}\frac{\mathbf{D}}{m_0}\nabla\bar{m}_1\right)\cdot\mathbf{n}\, d\Gamma .$$

The new boundary term vanishes on all boundaries except on Γin1 since m1' = 0 on Γin2 and (D∇m̄1)·n = 0 on Γ \ Γin1. My weak formulation for the spde of m2c and perturbations of log K is:

$$-\int_\Omega \left[\bar{\mathbf{v}}\cdot\nabla\psi_{2c} + \nabla\cdot\left(\mathbf{D}\nabla\psi_{2c}\right)\right] m_{2c}'\, d\Omega + \int_\Omega \left[\nabla\cdot\left(\psi_{2c}\frac{K_g}{\theta}\nabla\bar{m}_{2c}\right)\right]\phi'\, d\Omega + \int_\Omega \left[\psi_{2c}\bar{\mathbf{v}}\cdot\nabla\bar{m}_{2c}\right] Y'\, d\Omega + 4\int_\Omega \left[\nabla\cdot\left(\psi_{2c}\frac{\mathbf{D}}{m_0}\nabla\bar{m}_1\right)\right] m_1'\, d\Omega - \int_{\Gamma_{in1}} \left[\phi'\,\psi_{2c}\frac{K_g}{\theta}\nabla\bar{m}_{2c} + 4 m_1'\,\psi_{2c}\frac{\mathbf{D}}{m_0}\nabla\bar{m}_1\right]\cdot\mathbf{n}\, d\Gamma = 0 \quad \text{in } \Omega \qquad (7.31)$$

$$\left(\mathbf{D}\nabla\psi_{2c}\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1}\cup\Gamma_{no} , \qquad \psi_{2c} = 0 \ \text{on } \Gamma_{in2} , \qquad \left(\bar{\mathbf{v}}\psi_{2c} + \mathbf{D}\nabla\psi_{2c}\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{out} . \qquad (7.32)$$


7.3.2 Log-Dispersion Coefficient

In this section, I derive weak formulations for the spde's defined in Section 7.2.2 for perturbations of log D.

First Temporal Moment

Again following the same procedure, I multiply the spde for the k-th temporal moment (eq. 7.22) by a weighting function ψk and integrate over the entire domain:

$$\int_\Omega -m_k'\,\mathbf{v}\cdot\nabla\psi_k + m_k'\,\nabla\cdot\left(D_g\nabla\psi_k\right) d\Omega + \int_\Omega \nabla\psi_k\cdot\left(D_g\Xi'\nabla\bar{m}_k\right) d\Omega - \int_\Omega \psi_k\, k\, m_{k-1}'\, d\Omega + \int_\Gamma \left(\psi_k\mathbf{v} m_k' - \psi_k D_g\nabla m_k' - \psi_k D_g\Xi'\nabla\bar{m}_k + m_k' D_g\nabla\psi_k\right)\cdot\mathbf{n}\, d\Gamma = 0 .$$

The boundary conditions specified in eq. (7.23) simplify the new boundary terms. I choose the boundary conditions for ψk so that all remaining boundary terms vanish. The resulting weak form for mk and perturbations of log D is:

$$\int_\Omega \left[-\mathbf{v}\cdot\nabla\psi_k - \nabla\cdot\left(D_g\nabla\psi_k\right)\right] m_k'\, d\Omega + \int_\Omega \left[\nabla\psi_k\cdot D_g\nabla\bar{m}_k\right]\Xi'\, d\Omega - \int_\Omega \psi_k\, k\, m_{k-1}'\, d\Omega = 0 \quad \text{in } \Omega \qquad (7.33)$$

$$\left(D_g\nabla\psi_k\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1} , \qquad \psi_k = 0 \ \text{on } \Gamma_{in2} , \qquad \left(\mathbf{v}\psi_k + D_g\nabla\psi_k\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} . \qquad (7.34)$$

Second Central Temporal Moment

Like in the case of log K, the spde for m2c (eq. 7.18) differs from the spde for mk (eq. 7.16) only in the source terms. Hence, the weak formulation for the second central moment spde differs from eq. (7.33) only through the weak formulation of the following terms:

$$-\int_\Omega \psi_{2c}\left(2\frac{D_g\Xi'}{m_0}\nabla\bar{m}_1\cdot\nabla\bar{m}_1 + 4\frac{D_g}{m_0}\nabla m_1'\cdot\nabla\bar{m}_1\right) d\Omega .$$

As before, I apply Green's theorem to all terms that contain derivatives of perturbations:

$$4\int_\Omega \psi_{2c}\frac{D_g}{m_0}\nabla m_1'\cdot\nabla\bar{m}_1\, d\Omega = -4\int_\Omega m_1'\,\nabla\cdot\left(\psi_{2c}\frac{D_g}{m_0}\nabla\bar{m}_1\right) d\Omega + 4\int_\Gamma \psi_{2c}\, m_1'\frac{D_g}{m_0}\nabla\bar{m}_1\cdot\mathbf{n}\, d\Gamma .$$

The boundary conditions for the temporal moment spde (eq. 7.17) cancel out this new boundary integral on all boundaries except on Γin1. I neglect the remaining term on Γin1 for two reasons.


First, both the dispersive flux across the inflow boundary and the perturbations of m1 at the inflow boundary are small in the advection-dominated case. Second, the moment-generating equation for m2c (eq. 3.13) was derived under similar assumptions. Finally, the weak formulation of the spde for m2c and perturbations of log D is:

$$\int_\Omega \left[-\mathbf{v}\cdot\nabla\psi_{2c} - \nabla\cdot\left(D_g\nabla\psi_{2c}\right)\right] m_{2c}'\, d\Omega + \int_\Omega \left[D_g\nabla\psi_{2c}\cdot\nabla\bar{m}_{2c} - 2\frac{\psi_{2c}}{m_0}\nabla\bar{m}_1\cdot D_g\nabla\bar{m}_1\right]\Xi'\, d\Omega + 4\int_\Omega \left[\nabla\cdot\left(\psi_{2c}\frac{D_g}{m_0}\nabla\bar{m}_1\right)\right] m_1'\, d\Omega = 0 \quad \text{in } \Omega \qquad (7.35)$$

$$\left(D_g\nabla\psi_{2c}\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{in1} , \qquad \psi_{2c} = 0 \ \text{on } \Gamma_{in2} , \qquad \left(\mathbf{v}\psi_{2c} + D_g\nabla\psi_{2c}\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no}\cup\Gamma_{out} . \qquad (7.36)$$

7.4 Adjoint State Sensitivities

Now that the weak formulations of the spde's for all relevant state variables exist, this section goes back to the main thread outlined in Section 7.1.

7.4.1 Sensitivities with Respect to log K

Observations of Hydraulic Heads and Temporal Moments

First, I consider observations of φ, m1 and m2c and derive their sensitivity with respect to Y = log K. In this context, eq. (7.1) becomes:

$$Z_i' = \int_\Omega \delta_{i\phi}\,\delta(\mathbf{x})\,\phi' + \delta_{i1}\,\delta(\mathbf{x})\, m_1' + \delta_{i2c}\,\delta(\mathbf{x})\, m_{2c}'\, d\Omega . \qquad (7.37)$$

Now, I add eqs. (7.27), (7.29) and (7.31) and sort by perturbations to obtain the equivalent of eq. (7.3):

$$Z_i' = \int_\Omega \left[\delta_{i\phi}\,\delta(\mathbf{x}) + \nabla\cdot\left(K_g\nabla\psi_\phi\right) + \nabla\cdot\left(\psi_1\frac{K_g}{\theta}\nabla\bar{m}_1\right) + \nabla\cdot\left(\psi_{2c}\frac{K_g}{\theta}\nabla\bar{m}_{2c}\right)\right]\phi'\, d\Omega - \int_{\Gamma_1} \phi'\left[K_g\nabla\psi_\phi\right]\cdot\mathbf{n}\, d\Gamma - \int_{\Gamma_{in1}} \phi'\left[\psi_{2c}\frac{K_g}{\theta}\nabla\bar{m}_{2c} + \psi_1\frac{K_g}{\theta}\nabla\bar{m}_1\right]\cdot\mathbf{n}\, d\Gamma + \int_\Omega \left[\delta_{i1}\,\delta(\mathbf{x}) - \bar{\mathbf{v}}\cdot\nabla\psi_1 - \nabla\cdot\left(\mathbf{D}\nabla\psi_1\right) + 4\nabla\cdot\left(\psi_{2c}\frac{\mathbf{D}}{m_0}\nabla\bar{m}_1\right)\right] m_1'\, d\Omega + \int_\Omega \left[\delta_{i2c}\,\delta(\mathbf{x}) - \bar{\mathbf{v}}\cdot\nabla\psi_{2c} - \nabla\cdot\left(\mathbf{D}\nabla\psi_{2c}\right)\right] m_{2c}'\, d\Omega + \int_\Omega \left[-\nabla\psi_\phi\cdot\left(K_g\nabla\bar{\phi}\right) + \psi_1\bar{\mathbf{v}}\cdot\nabla\bar{m}_1 + \psi_{2c}\bar{\mathbf{v}}\cdot\nabla\bar{m}_{2c}\right] Y'\, d\Omega . \qquad (7.38)$$


The terms involving the perturbations m2c', m1' and φ' disappear given that the terms in the corresponding square brackets vanish. This is permissible since in Section 7.3.1 I introduced all ψ without further specifications, and only chose some boundary conditions. Setting the terms inside the square brackets to zero yields the adjoint-state equations for the adjoint states ψφ, ψ1 and ψ2c. The adjoint-state equation for the second central moment m2c is:

$$\bar{\mathbf{v}}\cdot\nabla\psi_{2c} + \nabla\cdot\left(\mathbf{D}\nabla\psi_{2c}\right) - \delta_{i2c}\,\delta(\mathbf{x}) = 0 \quad \text{in } \Omega , \qquad (7.39)$$

subject to the boundary conditions specified in eq. (7.32). For the first temporal moment m1, the adjoint-state equation is:

$$\bar{\mathbf{v}}\cdot\nabla\psi_1 + \nabla\cdot\left(\mathbf{D}\nabla\psi_1\right) - 4\nabla\cdot\left(\psi_{2c}\frac{\mathbf{D}}{m_0}\nabla\bar{m}_1\right) - \delta_{i1}\,\delta(\mathbf{x}) = 0 \quad \text{in } \Omega , \qquad (7.40)$$

subject to eq. (7.30). The adjoint-state equation for the hydraulic head φ is:

$$\nabla\cdot\left(K_g\nabla\psi_\phi\right) + \nabla\cdot\left(\psi_{2c}\frac{K_g}{\theta}\nabla\bar{m}_{2c}\right) + \nabla\cdot\left(\psi_1\frac{K_g}{\theta}\nabla\bar{m}_1\right) + \delta_{i\phi}\,\delta(\mathbf{x}) = 0 \quad \text{in } \Omega , \qquad (7.41)$$

subject to eq. (7.28) and the additional boundary condition:

$$\left[\frac{K_g}{\theta}\psi_{2c}\nabla\bar{m}_{2c} + \frac{K_g}{\theta}\psi_1\nabla\bar{m}_1 + K_g\nabla\psi_\phi\right]\cdot\mathbf{n} = 0 \ \text{on } \Gamma_1\cap\Gamma_{in1} .$$

The remaining terms in eq. (7.38) are:

$$Z_i' = \int_\Omega \left[-K_g\nabla\bar{\phi}\cdot\nabla\psi_\phi + \psi_1\bar{\mathbf{v}}\cdot\nabla\bar{m}_1 + \psi_{2c}\bar{\mathbf{v}}\cdot\nabla\bar{m}_{2c}\right] Y'\, d\Omega .$$

If Y is defined piecewise constant within each sub-volume Ωk, the sensitivity is given by:

$$\frac{dZ_i}{dY_k} = \frac{Z_i'}{Y_k'} = \int_{\Omega_k} \left[-K_g\nabla\bar{\phi}\cdot\nabla\psi_\phi - \psi_1\frac{K_g}{\theta}\nabla\bar{\phi}\cdot\nabla\bar{m}_1 - \psi_{2c}\frac{K_g}{\theta}\nabla\bar{\phi}\cdot\nabla\bar{m}_{2c}\right] d\Omega , \qquad (7.42)$$

in which I express v̄ using Darcy's law.

Normalized Second Central Moment

Since no sensible generating equation could be defined for the normalized second central temporal moment m2cn, its sensitivity with respect to log K cannot be obtained directly through the adjoint-state method. Instead:

$$m_{2cn} = \frac{m_{2c}}{m_1} , \qquad \frac{\partial m_{2cn}}{\partial Y} = \frac{\partial}{\partial Y}\left(\frac{m_{2c}}{m_1}\right) = \frac{1}{m_1}\frac{\partial m_{2c}}{\partial Y} - \frac{m_{2c}}{m_1^2}\frac{\partial m_1}{\partial Y} . \qquad (7.43)$$


This expression is based on a potentially small difference between large numbers. That is, relatively small numerical errors in the computation of the partial sensitivities can lead to relatively large errors in the total sensitivity. However, using the normalized moment m2cn instead of m2c reduces numerical instabilities in the conditioning procedure, because m2cn and m1 are less interdependent than m2c and m1. Whether the advantages or the disadvantages prevail, I will check in a test case in Chapter 10.

Measurement of the Total Flux

Now I derive the sensitivity of the total flux Q with respect to log K. In analogy to eq. (7.1), the starting point is an integral form of the observation process. Since the total flux is not a local pointlike observation somewhere in the domain but rather an integral observation on the inflow or outflow boundary, I use the following expression:

$$Q = \int_{\Gamma_{in}} \left(K\nabla\phi\right)\cdot\mathbf{n}\, d\Gamma$$

$$Q' = \int_{\Gamma_{in}} \left(K_g Y'\nabla\bar{\phi} + K_g\nabla\phi'\right)\cdot\mathbf{n}\, d\Gamma . \qquad (7.44)$$

I use a weak formulation of the groundwater flow equation that is identical to eq. (7.26) with the trial function ψq and add it to eq. (7.44):

$$Q' = \int_{\Gamma_{in}} \left(K_g Y'\nabla\bar{\phi} + K_g\nabla\phi'\right)\cdot\mathbf{n}\, d\Gamma + \int_\Omega \left[\nabla\cdot\left(K_g\nabla\psi_q\right)\right]\phi'\, d\Omega - \int_\Omega \left[\nabla\psi_q\cdot\left(K_g\nabla\bar{\phi}\right)\right] Y'\, d\Omega + \int_\Gamma \left[\psi_q\left(K_g\nabla\phi' + K_g Y'\nabla\bar{\phi}\right) - \phi'\left(K_g\nabla\psi_q\right)\right]\cdot\mathbf{n}\, d\Gamma . \qquad (7.45)$$

When defining the adjoint state ψq to meet:

$$\nabla\cdot\left(K_g\nabla\psi_q\right) = 0 \quad \text{in } \Omega$$

$$\psi_q = -1 \ \text{on } \Gamma_{in} , \qquad \psi_q = 0 \ \text{on } \Gamma_{out} , \qquad \left(K_g\nabla\psi_q\right)\cdot\mathbf{n} = 0 \ \text{on } \Gamma_{no} ,$$

and dividing by Yk', the sensitivity of the total flux with respect to Y is:

$$\frac{dQ}{dY_k} = \frac{Q'}{Y_k'} = \int_{\Omega_k} K_g\nabla\psi_q\cdot\nabla\bar{\phi}\, d\Omega . \qquad (7.46)$$

It may be noteworthy that ψq can be computed directly from the mean head φ̄ in case both Γin and Γout are Dirichlet boundaries in the flow problem and if there are no internal sources or sinks:

$$\psi_q = -\frac{\bar{\phi} - \tilde{\phi}_{out}}{\tilde{\phi}_{in} - \tilde{\phi}_{out}} .$$

7.4.2 Sensitivities with Respect to log D

In this section, I consider observations of m1 and m2c and derive their sensitivity with respect to Ξ = log D. In this context, eq. (7.1) becomes:

$$Z_i' = \int_\Omega \delta_{i1}\,\delta(\mathbf{x})\, m_1' + \delta_{i2c}\,\delta(\mathbf{x})\, m_{2c}'\, d\Omega . \qquad (7.47)$$


Now, I add eq. (7.33) for k = 1 and eq. (7.35), and then sort by perturbations:

$$Z_i' = \int_\Omega \left[\delta_{i2c}\,\delta(\mathbf{x}) - \mathbf{v}\cdot\nabla\psi_{2c} - \nabla\cdot\left(D_g\nabla\psi_{2c}\right)\right] m_{2c}'\, d\Omega + \int_\Omega \left[\delta_{i1}\,\delta(\mathbf{x}) - \mathbf{v}\cdot\nabla\psi_1 - \nabla\cdot\left(D_g\nabla\psi_1\right) + 4\nabla\cdot\left(\psi_{2c}\frac{D_g}{m_0}\nabla\bar{m}_1\right)\right] m_1'\, d\Omega + \int_\Omega \left[\nabla\psi_{2c}\cdot D_g\nabla\bar{m}_{2c} - 2\frac{\psi_{2c}}{m_0}\nabla\bar{m}_1\cdot D_g\nabla\bar{m}_1 + \nabla\psi_1\cdot D_g\nabla\bar{m}_1\right]\Xi'\, d\Omega . \qquad (7.48)$$

As before, the perturbations m2c' and m1' can be eliminated given that the terms inside the corresponding square brackets vanish. Together with the boundary conditions chosen in Section 7.3.2, this defines the adjoint-state equations for ψ2c and ψ1. The resulting adjoint-state equations for m2c and m1 for the sensitivity with respect to log D are identical to eqs. (7.39)&(7.40). The sensitivity is given by

$$\frac{dZ_i}{d\Xi_k} = \frac{Z_i'}{\Xi_k'} = \int_{\Omega_k} \left[\nabla\psi_{2c}\cdot D_g\nabla\bar{m}_{2c} - 2\frac{\psi_{2c}}{m_0}\nabla\bar{m}_1\cdot D_g\nabla\bar{m}_1 + \nabla\psi_1\cdot D_g\nabla\bar{m}_1\right] d\Omega . \qquad (7.49)$$

Similar to eq. (7.43), the sensitivity of m2cn is:

$$\frac{\partial m_{2cn}}{\partial \Xi} = \frac{1}{m_1}\frac{\partial m_{2c}}{\partial \Xi} - \frac{m_{2c}}{m_1^2}\frac{\partial m_1}{\partial \Xi} . \qquad (7.50)$$
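As a minimal illustration, the chain rules of eqs. (7.43)&(7.50) can be applied directly to already computed sensitivities. In the MATLAB line below, h_m1 and h_m2c denote the 1 × n sensitivity rows of m1 and m2c for one observation point, and m1_obs and m2c_obs the simulated moments at that point (all names are hypothetical):

    % Sensitivity row of the normalized moment m2cn = m2c/m1, eqs. (7.43) & (7.50):
    h_m2cn = h_m2c / m1_obs - (m2c_obs / m1_obs^2) * h_m1;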

7.5 On Practical Application

The adjoint-state equations (eqs. 7.39 to 7.41) are solved as follows. For evaluating the sensitivity of m2c with respect to log K, the source term related to the position of the measurement is active for ψ2c in eq. (7.39). All other source terms related to measurement locations are inactive. The adjoint-state equation for ψ2c is solved first. It appears as a source term in the adjoint-state equation for ψ1 (eq. 7.40), which is solved next. The adjoint-state equation for ψφ (eq. 7.41) is solved last since it has source terms related to both ψ2c and ψ1. For a measurement of m1, the source term at the measurement location is active only in the adjoint-state equation for ψ1 (eq. 7.40), and ψ2c is zero in the entire domain. Hence, only ψ1 and ψφ need to be solved for, in the same sequence as outlined above. When considering a measurement of φ, all adjoint states except ψφ are zero, so that only the latter has to be solved for. The same principle applies when evaluating the sensitivity of observations with respect to log Ds. To save computational effort, I take a closer look at the adjoint-state equations. Not only are the adjoint-state equations for ψ2c and ψ1 identical for both the sensitivity with respect to log K and log Ds; further, the adjoint state ψ1 for a measurement of m1 is identical to the adjoint state ψ2c for a measurement of m2c. Keeping track of these identities reduces the number of adjoint-state equations that must be solved. Once all adjoint-state equations relevant for a respective type of observation are solved, eqs. (7.42)&(7.49) are evaluated to obtain the desired sensitivity. A minimal sketch of this sequence follows below.
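In the sketch, solve_adjoint_m2c, solve_adjoint_m1, solve_adjoint_head and postprocess_sensitivity are hypothetical placeholders for numerical solvers of eqs. (7.39)-(7.41) and for the evaluation of eq. (7.42); the example treats a single observation of m2c:

    % Hypothetical sketch of the solution sequence for one m2c observation:
    src    = dirac_at(x_obs);                     % unit source at the measurement location
    psi2c  = solve_adjoint_m2c(src);              % eq. (7.39): source active here
    psi1   = solve_adjoint_m1(psi2c, 0);          % eq. (7.40): psi2c acts as a source
    psiphi = solve_adjoint_head(psi1, psi2c, 0);  % eq. (7.41): solved last
    h_row  = postprocess_sensitivity(psiphi, psi1, psi2c);   % evaluate eq. (7.42)

For a measurement of m1, the same sketch applies with psi2c set to zero and the unit source moved to the second step.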


7.6 Graphical Examples and Discussion

In order to illustrate the sensitivities of the hydraulic head φ and the first and second central temporal moments m1 and m2c with respect to log K and log Ds, they are displayed in Figure 7.1. The sensitivities shown here are related to measurements of the state variables at (x, y) = (3 m, 0.25 m), linearized about constant mean values. The grayscale is normalized from -1 to 1. The sensitivity ∂φ/∂Y is a classical dipole centered about the location of measurement. All other quantities, i.e. m1, m2c and m2cn, are dominated by advection. Their sensitivities are aligned along the up-gradient part of the streamline that passes through the location of measurement, subject to dispersion.

Figure 7.1: Sensitivities with respect to log K and log Ds

Linearized about a constant mean value, the sensitivity patterns are very straightforward. If linearized about heterogeneous fields of log K and log Ds, the patterns change in many ways. First of all, the streamlines change. Second, the magnitude of the sensitivities changes locally, since all terms in the corresponding equations are multiplied by the local values of Kg and Dg. Third, all terms in the sensitivities include gradients of state variables and adjoint states. For heterogeneous parameters, these gradients change and have transverse components that do not exist in the homogeneous case.

In the case shown here, the sensitivity of m1 with respect to log Ds seems to be almost insignificant. Apparently, the sensitivities of m2c and m2cn with respect to log Ds are identical, and the difference between the sensitivities with respect to log K is barely visible. The difference is that a certain multiple of ∂m1/∂Y is subtracted from ∂m2c/∂Y to obtain ∂m2cn/∂Y. This changes dramatically when linearizing about heterogeneous fields of log K and log Ds. Then, adjacent streamlines have different values of m1 and m2c, so that dispersive exchange between streamlines may be dominant for the local values of m1 and m2c. Further, the differences between the sensitivities of m2c and m2cn become clearly visible, and m2cn and m1 are significantly less interdependent than m2c and m1.

The physical interpretation of the sensitivity of m2c with respect to log D (eq. 7.49) is as follows: the first term resembles the spreading of breakthrough curves through transverse dispersive exchange of m2c between adjacent streamlines. If an adjacent streamline has a higher value of m2c, then an increase in the dispersion coefficient will transport that higher value into the area of influence of the respective measurement. The second term resembles the spreading by dispersion within the area of influence itself. In both cases, the area of influence is marked by minus the gradient of ψ2c. The third term resembles the spreading generated by dispersive exchange of m1 within the region marked by minus the gradient of ψ1. The first and the third term only contribute if the values of the respective state variables in adjacent streamlines differ.

Chapter 8

Spectral Methods for Geostatistics

In geostatistics, spatially distributed unknowns are commonly interpreted as realizations of random processes, characterized by their mean values and covariance functions. Typical examples for this are soil and aquifer properties, the hydraulic head or solute concentrations. The same holds for time series analysis, e.g. in stochastic hydrology. Most geostatistical methods discretize the unknowns on a grid, so that the auto-covariance of the unknowns is described by a matrix. All subsequent tasks, like conditioning, the generation of realizations, or evaluating the likelihood of realizations, involve matrix operations performed on the auto-covariance matrix of the unknowns. These basic matrix operations are the subject matter of Section 8.1.

Considering the unknowns resolved by a number n of discrete values, the auto-covariance matrix of the unknowns is sized n × n. Typically, the resolution is not dictated by the geostatistical inverse method itself. Instead, the objectives behind the inverse problem statement or numerical stability criteria force a fine resolution of the unknowns. For well-resolved 2-D or 3-D applications, the number of discrete values may easily rise to the order of 10,000 or 1,000,000. Then, the storage requirements for the auto-covariance matrix and the computational costs for matrix operations become restricting or even prohibitive (e.g., Zimmerman et al., 1998 [105]). The aspect of computational costs is covered in Section 8.2.

These circumstances have heavily limited those geostatistical inverse methods that rely on strict Bayesian analysis. Under the pressure of increasingly large problems, alternative geostatistical methods have emerged that avoid complete assembly and explicit use of large matrices. However, as discussed in Chapter 6, these alternative methods sacrifice accuracy in some way or another. In this chapter, I summarize and extend a toolbox of highly efficient methods to reduce the computational costs of basic matrix operations in geostatistics.

If the unknown quantity is a stationary random variable discretized on a regular and equispaced grid, the auto-covariance matrix of the unknowns has symmetric Toeplitz structure. Zimmerman (1989) [106] showed that this specific structure can be exploited to reduce both the storage requirements and the computational costs for the matrix operations in question. An even more convenient class of matrices are symmetric circulant matrices. Highly efficient algorithms for circulant matrices were discovered as early as the middle of the 20th century (e.g. Good, 1950, Davis, 1979 [38, 20]) and have been applied and extended successfully ever since. The vast majority of these methods is based on the Fast Fourier Transform (FFT) by Cooley and Tukey (1965) [17]. More details on structured matrices and the structure of the auto-covariance matrix are provided in Section 8.3.


Through circulant embedding as described in Section 8.4, Toeplitz matrices like the auto-covariance matrix of the unknowns can be embedded in larger circulant matrices. Then, matrix operations can be performed using FFT-based spectral methods, speeding up geostatistics by orders of magnitude. A survey of existing FFT-based methods, extensions thereof, and new spectral methods are presented in Sections 8.5 to 8.7.

In 1-D applications, the auto-covariance matrix is a symmetric Toeplitz matrix, and symmetric block Toeplitz matrices with Toeplitz blocks occur in the 2-D case. For higher dimensionality d, symmetric level-d block Toeplitz structure is imposed. In the following, I will discuss the 1-D case for simplicity. The straightforward extension to the d-dimensional case is left to the reader.

Comparison to Other Spectral Methods

One can find other spectral methods in geostatistics, where the solution of pde's is performed in the Fourier space. These methods produce solutions for periodic or infinite flow domains that may be used as approximate solutions for finite domains (e.g. Yeh et al., 1995 [101]). Other spectral methods, like those used by Harter et al. (1996) [39], place the perturbation analysis of the governing equations into the Fourier space. This requires the perturbations of dependent quantities, such as hydraulic heads and solute travel time, to be second-order stationary or at least intrinsic.

In comparison to the latter, the methods in this chapter are merely numerical algorithms to speed up matrix operations performed on the auto-covariance matrix of the unknowns. They do not influence the conceptualization of the inverse problem. The results are not approximate and do not require periodicity or infinite domains. Only the unknowns, not the dependent quantities, are required to be second-order stationary or at least intrinsic.

8.1 Basic Matrix Operations in Geostatistics

The most important and basic matrix operations in geostatistics are multiplications (convolution-type operations), decompositions, and inversions (deconvolution-type operations). In the following, I will illuminate where these types of operations occur in geostatistics.

8.1.1 Convolution-Type Operations

Many methods of geostatistical inversing are founded on conditioning, such as cokriging and the Quasi-Linear Geostatistical Approach by Kitanidis (1995) [56], among the other methods summarized in Chapter 6. This includes the MAP method by McLaughlin and Townley (1996) [69] and the Successive Linear Estimator by Yeh et al. (1996) [102].

For conditioning, the cross-covariance between the unknowns and the observations and the auto-covariance among the observations are used to infer information from the observations onto the unknowns. The easiest example for the underlying mathematics is the linear cokriging estimator for the case of perfectly known zero mean. In analogy to cokriging with uncertain mean (compare Section 6.1.1), consider the n × 1 vector of unknowns s and m observations of a dependent quantity arranged in the m × 1 vector y. The vector of unknowns s is characterized by its zero mean and the n × n covariance matrix Qss. The cross-covariance matrix Qsy and the auto-covariance matrix Qyy describe the statistical relation between the unknowns s and the observations y, and among the observations y, respectively.


In hydrogeology, the observed quantities are related to the unknowns through transfer functions, like the groundwater flow equation or the transport equation. From these transfer functions, the m × n sensitivity matrix H = ∂y/∂s is computed. According to linear error propagation, Qsy and Qyy are (compare eq. (6.9) in Section 6.1.1):

$$\mathbf{Q}_{sy} = \mathbf{Q}_{ss}\mathbf{H}^T , \qquad \mathbf{Q}_{yy} = \mathbf{H}\mathbf{Q}_{ss}\mathbf{H}^T . \qquad (8.1)$$

Given these quantities, the cokriging estimator for the unknowns is (Kitanidis, 1996 [57]):

$$\hat{\mathbf{s}} = \mathbf{Q}_{sy}\mathbf{Q}_{yy}^{-1}\mathbf{y} . \qquad (8.2)$$

I categorize the matrix-matrix multiplications in eq. (8.1) as convolution-type operations because, when considering only single rows of H before spatial discretization, these equations turn into convolution integrals:

$$Q_{s(\mathbf{x})y_i} = \int_\Omega Q_{ss}(\mathbf{x},\mathbf{x}')\,\frac{\partial y_i}{\partial s(\mathbf{x}')}\, d\mathbf{x}' , \qquad Q_{y_i y_j} = \iint_\Omega \frac{\partial y_i}{\partial s(\mathbf{x}'')}\, Q_{ss}(\mathbf{x}'',\mathbf{x}')\,\frac{\partial y_j}{\partial s(\mathbf{x}')}\, d\mathbf{x}'\, d\mathbf{x}'' . \qquad (8.3)$$
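For small problem sizes, eqs. (8.1)&(8.2) translate directly into MATLAB; the following lines are a dense small-scale demonstration only, assuming Qss, H and the data vector y are given:

    % Linear cokriging with perfectly known zero mean, eqs. (8.1) & (8.2):
    Qsy  = Qss * H';           % cross-covariance between unknowns and observations
    Qyy  = H * Qsy;            % auto-covariance among the observations
    shat = Qsy * (Qyy \ y);    % conditional mean estimate of s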

8.1.2 Decomposition-Type Operations

The generation of unconditional realizations of spatial random variables is the basis for all Monte-Carlo techniques in geostatistics, including the Pilot-Point method (RamaRao et al., 1995, LaVenue et al., 1995 [78, 64]) or the Sequential Self-Calibration Method (Gómez-Hernández et al., 1997 [37]). An unconditional realization su of a random vector s with covariance matrix Qss can be generated using a square root decomposition $\mathbf{S}_{ss}^T\mathbf{S}_{ss} = \mathbf{Q}_{ss}$ and an n × 1 random vector ε ∼ N(0, I):

$$\mathbf{Q}_{ss} = \mathbf{S}_{ss}^T\mathbf{S}_{ss} , \qquad \mathbf{s}_u = \mathbf{S}_{ss}^T\boldsymbol{\varepsilon} . \qquad (8.4)$$

The standard decomposition chosen to generate realizations of Gaussian random fields is the Cholesky decomposition. Alternative methods to generate realizations are, among others, the turning band method or sequential Gaussian simulation. However, many of these alternative methods are only approximate or restricted to intrinsic or second-order stationary cases, whereas the Cholesky decomposition can be used for arbitrary types of covariance functions, given that the resulting covariance matrix is positive-definite.
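For moderate n, eq. (8.4) with a Cholesky decomposition reads in MATLAB as follows; chol returns the upper triangular factor S with S'*S = Qss:

    % Unconditional realization via Cholesky decomposition, eq. (8.4):
    n  = size(Qss, 1);
    S  = chol(Qss);            % upper triangular, S'*S = Qss
    xi = randn(n, 1);          % xi ~ N(0, I)
    su = S' * xi;              % realization with covariance Qss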

8.1.3 Deconvolution-Type Operations

The product of a Toeplitz matrix with a vector is referred to as convolution. Deconvolution is the reverse process, i.e., the solution of the system Qss u = s, evaluating Qss⁻¹ s, or reverting the convolution integrals (eq. 8.3) in any other form. Deconvolution is the basis for picture deblurring, many types of digital filters, estimation of system input or the transfer function in linear system theory, and many other applications in engineering and signal processing. In geostatistics, the likelihood of a realization or the value of the prior term (compare eq. 6.14) is given by the following quadratic form:

$$L_p(\mathbf{s}) = \frac{1}{2}\,\mathbf{s}^T\mathbf{Q}_{ss}^{-1}\mathbf{s} .$$


For the sake of simplicity, I have chosen the expression for perfectly known zero mean. This problem can be subdivided into first solving the deconvolution problem Qss u = s and then evaluating (1/2) sᵀu. In Section 6.1, I have introduced a computationally efficient form of the prior term (eq. 6.28) that circumvents the deconvolution problem. It is applicable whenever the value of s is expressed through a certain parameterization. In the context of the Quasi-Linear Geostatistical Approach, this prerequisite is met by all trial solutions sk but the initial guess s0. The initial guess often is the prior mean, so that per definition Lp(s0) = 0. In some situations, the prior mean is a poor initial guess. To ensure convergence and to keep the number of iteration steps low, an up-sampled coarse-grid approximation of the posterior mean or other types of improved initial guesses may be advisable. Then, because the initial guess is not in accordance with the required parameterized form, the prior term has to be evaluated through brute-force deconvolution.

8.2 Computational Costs

The following sections assess the computational costs for the three basic types of matrix operations in geostatistics. I consider both standard methods and the computationally efficient methods that are available to date, and I introduce the improvements which I will contribute in Sections 8.5 to 8.7.

8.2.1 Convolution-Type Operations

For large numbers of unknowns, the matrix-matrix multiplications in eq. (8.1) dominate the computational costs of cokriging-like methods. Computing Qss is an operation that requires O(n^2) Floating Point Operations (FLOPS). The matrix multiplication Qsy = Qss Hᵀ is O(mn^2), and the computation of Qyy via Qyy = H Qsy + R or Qyy = H Qss Hᵀ + R takes O(nm^2) and O(mn^2 + nm^2) FLOPS, respectively.

Storage of Qss consumes O(n^2) Bytes. In case n = 1,000,000, Qss has a size of 8,000 GByte if stored at double precision, exceeding the capacity of every present HDD device. To prevent memory overflow, single columns of Qss can be constructed, multiplied by H and deleted, a procedure that drastically increases CPU time. A method with reduced computational costs is superfast convolution via FFT (Van Loan, 1992) [99]. Kozintsev (1999) [62] shows how to evaluate the quadratic form uᵀ Qss u through FFT, which can be used to compute the diagonal of Qyy. In Section 8.5, I summarize these methods and extend them to be applied to the matrix-matrix products in eq. (8.1).

8.2.2 Decomposition-Type Operations

If using the Cholesky decomposition to obtain an upper triangular decomposition of Qss in eq. (8.4), generating unconditional realizations has computational costs of O(n^3) FLOPS and storage costs of O(n^2) Bytes.

A great effort has been invested in efficient algorithms for decomposing (block-) Toeplitz matrices by exploiting their specific displacement structure (see Kailath, 1995 [49]). The resulting methods (e.g., Gallivan, 1996, Stewart, 1997a, Stewart, 1997b [34, 91, 90]) are based on a recursive (block-) Schur algorithm and the hyperbolic (block-) Householder transformation (Gallivan, 1994 [33]). A good summary of direct Toeplitz solvers based on the Schur algorithm is given by Kailath and Sayed (1999) [50]. Look-ahead strategies were introduced to prevent breakdown of the algorithm for nearly-singular matrices (Gallivan, 1995 [32]). These algorithms do not require explicit storage of the Toeplitz matrix. Their result is an upper triangular matrix with no further specific structure that has to be stored explicitly. Therefore, they still explicitly compute and store O(n^2) elements of a matrix.

The more efficient method is to perform circulant embedding on the Toeplitz matrix to obtain a corresponding larger circulant matrix and then perform the decomposition. Generating unconditional realizations using this method was introduced by Dietrich and Newsam (1993) and extended to conditional realizations by the same authors in 1996 [25, 26]. Since only symmetric nonnegative-definite circulant matrices have a symmetric real circulant decomposition, the embedding size takes some extra attention to maintain the definiteness during the embedding process (Dietrich and Newsam, 1997 [27]). Cirpka and Nowak (2004) [16] use a slightly different circulant decomposition that makes it possible to simulate random variables under certain types of statistical non-stationarity.

In Section 8.6, I summarize the decomposition methods by Dietrich and Newsam (1993) [25] and Cirpka and Nowak (2004) [16] and present a modification of the latter. In contrast to the original version, my new modified version allows the simulation of non-stationary random variables with explicit cross-correlation between zones that have different covariance functions.

8.2.3 Deconvolution-Type Operations

To solve a deconvolution problem, a term of the form Qss⁻¹ s has to be evaluated. Direct inversion of an unstructured n × n matrix has computational costs of O(n^3) FLOPS, which is impractical or even impossible for large n. Indirect solvers, like the Preconditioned Conjugate Gradients method, reduce these costs to O(n^2) FLOPS. The method of Conjugate Gradients for symmetric positive-definite systems of equations was introduced by Hestenes and Stiefel (1952) [41]. A pseudocode for the method is provided in Barrett et al. (1993) [4]. Shewchuk (1994) [89] gives probably the best introduction to Conjugate Gradients without what he calls the agonizing pain.

Still, without additional measures to exploit the specific structure of the matrix, the computational costs may be prohibitive. Further, standard indirect solvers require explicit storage of the matrix, which is apparently intractable for very large n. Due to the high relevance of deconvolution in many areas of engineering, mathematicians have been working on the problem of deconvolution for a long time, and many highly efficient algorithms are available. The inversion of (block-) circulant matrices has been discovered and re-invented several times (e.g., Good, 1950, Rino, 1970, Searle, 1979 [38, 81, 88]). Trapp (1973) [98] added the Moore-Penrose generalized inverse for (block-) circulant matrices. Nott and Wilson (1997) [73] and Kozintsev (1999) [62] discuss the solution of (block-) circulant systems to evaluate the likelihood of realizations. These methods provide direct solutions in O(n log2 n) FLOPS by using the FFT to diagonalize the matrix and invert the eigenvalues. However, they are only applicable if the original system is circulant. Applying these algorithms to solve Toeplitz systems introduces large errors at the boundaries of the domain. Direct Toeplitz solvers are based on the decomposition algorithms discussed above. A different approach to obtain stabilized direct solvers of complexity O(n (log2 n)^2) is pivoting, as covered by Van Barel et al. (2001) [66].


A newer kind of Toeplitz solver that is only of complexity O(n log2 n) is the Preconditioned Conjugate Gradient (PCG) method with circulant preconditioners. Its success relies on a good choice of the preconditioner to cluster the eigenvalues around unity. The original idea of using circulant preconditioners for Toeplitz systems emerged in the late eighties, discovered independently by Olkin (1986) [74] and Strang (1986) [92]. The extension to block Toeplitz matrices goes back to Chang and Olkin (1994) [9] and Holmgren and Otto (1992) [45]. A comprehensive overview of the method and a variety of available preconditioners is provided by Chang and Ng (1996) [8]. For some deconvolution problems in geostatistics, where the covariance matrices may have an extremely poor condition, the rationale of clustering the eigenvalues around unity for defining the preconditioner is not very beneficial. In Section 8.7, I quickly summarize the Preconditioned Conjugate Gradient method and existing circulant preconditioners for Toeplitz systems. Then, I discuss under what conditions covariance matrices have a poor condition. Finally, I elaborate on how to regularize preconditioners or otherwise modify the deconvolution problem to facilitate rapid convergence of the PCG method.
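To sketch the idea (without singling out a specific preconditioner from the literature), the preconditioner solve can be applied in the Fourier domain. In the hypothetical MATLAB lines below, c1 is assumed to be the first column of some circulant approximation to Qss; how to choose and regularize it, in particular near-zero eigenvalues, is the subject of Section 8.7:

    % Sketch: PCG on Qss*u = s with a circulant preconditioner applied via FFT.
    lambda = fft(c1);                             % eigenvalues of the circulant approximation
    Minv   = @(r) real(ifft(fft(r) ./ lambda));   % preconditioner solve M\r in O(n log2 n)
    Afun   = @(u) Qss * u;                        % or a matrix-free Toeplitz product (Section 8.5)
    u = pcg(Afun, s, 1e-8, 500, Minv);            % MATLAB's built-in PCG solver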

8.3 Structured Matrices

Before diving into the details of spectral methods, I shall give a quick introduction to those matrix structures that are relevant for a detailed understanding of this chapter. A matrix is called a structured matrix if its elements obey specific rules, formulae, or other kinds of restrictions. Matrix structure is an important issue because it may simplify storage and matrix operations. Each specific structure allows for different simplifications that are more or less powerful.

8.3.1 Diagonal Matrices

In diagonal matrices, all off-diagonal elements are zero, and only the values on the main diagonal need to be stored. Consider the diagonal elements of a diagonal matrix Λ (sized n × n) stored in the n × 1 vector λ. Matrix-vector multiplication of Λ with an n × 1 vector v simplifies to:

$$\boldsymbol{\Lambda}\mathbf{v} = \boldsymbol{\lambda}\circ\mathbf{v} , \qquad (8.5)$$

in which ∘ denotes the elementwise or Hadamard product. This matrix-vector product only requires n multiplications compared to n^2 if arbitrary n × n matrices are involved. Likewise, while inversion and decomposition of arbitrary n × n matrices usually have computational costs in the order of n^3, diagonal matrices can be inverted and decomposed in only n elementary computational operations. The square root decomposition of Λ is Λ^(1/2), a diagonal matrix with the elementwise square roots of λ on the main diagonal. The inverse Λ⁻¹ is obtained by inverting each element of λ separately.
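In MATLAB, these O(n) operations read as follows, for a diagonal matrix stored only as its vector of diagonal entries lambda:

    % Diagonal matrix operations in O(n), compare eq. (8.5):
    w        = lambda .* v;     % matrix-vector product in Hadamard form
    inv_diag = 1 ./ lambda;     % diagonal entries of the inverse
    sqr_diag = sqrt(lambda);    % diagonal entries of the square root decomposition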

8.3.2 Block Matrices

A block-structured matrix is a matrix where a specific structure applies to blocks of elements. The blocks themselves again show the same structure. In applied matrix algebra, block-structured matrices appear in many two-dimensional problems. Three-dimensional problems result in level-3 block structures.


The stiffness matrix in Finite Difference codes for 1-D regular grids may serve as an example. For each node, the stiffness matrix contains a diagonal entry and connections to all adjacent nodes of the grid. The stiffness matrix for the groundwater flow problem, for example, is a tridiagonal matrix: only the main diagonal and the two adjacent diagonals have non-zero entries. 2-D Finite Difference applications produce a block-pentadiagonal structure, and the 3-D case results in level-3 block-heptadiagonal structure.

8.3.3 Toeplitz Matrices

Another very important type of matrix with diagonally simplified structure are Toeplitz matrices, having constant values along each diagonal. In the following, all quantities related to Toeplitz matrices are primed. An n'x × n'x symmetric Toeplitz (ST) matrix T has the structure (Golub and Van Loan, 1996 [36], p. 193):

$$\mathbf{T} = \begin{pmatrix} t_0 & t_1 & \cdots & t_{n_x'-1} \\ t_1 & t_0 & & t_{n_x'-2} \\ \vdots & & \ddots & \vdots \\ t_{n_x'-1} & t_{n_x'-2} & \cdots & t_0 \end{pmatrix} .$$

The first row is given by the series t0 ... t_{n'x−1}. To construct the (k + 1)-th row, shift the k-th row to the right by one, and fill the leading empty position with the k-th element of the series t1 ... t_{n'x−1}. Symmetric block Toeplitz matrices with Toeplitz blocks have the same structure with the ti replaced by Ti, denoting ST blocks sized n'y × n'y, so that the total size is n'x n'y × n'x n'y. Symmetric level-d block-Toeplitz matrices are defined likewise. All level-d block Toeplitz matrices are uniquely defined by their first row or column.

8.3.4 Circulant Matrices

Circulant matrices are a special case of Toeplitz matrices where the generating series is symmetric (e.g., Davis, 1979 [20]). In the following, unprimed quantities correspond to circulant matrices. Symmetric circulant matrices sized 2nx × 2nx are defined as (Golub and Van Loan, 1996, pp. 201-202 [36]):

$$\mathbf{C} = \begin{pmatrix} c_0 & c_1 & \cdots & c_{n_x} & \cdots & c_1 \\ c_1 & c_0 & & c_{n_x-1} & & c_2 \\ \vdots & & \ddots & & & \vdots \\ c_{n_x} & c_{n_x-1} & & c_0 & & c_{n_x-1} \\ \vdots & & & & \ddots & \vdots \\ c_1 & c_2 & \cdots & c_{n_x-1} & \cdots & c_0 \end{pmatrix} .$$

The first row is given by the series c0 ... c_{nx} ... c1 (the index runs from zero to nx and down to one again to render it symmetric per definition). To construct the (k + 1)-th row, shift the k-th row to the right by one, and fill the leading empty position with the last element of the k-th row. In symmetric circulant matrices with circulant blocks, the ci are replaced by Ci, which themselves are symmetric circulant submatrices sized 2ny × 2ny, so the total size is 4 ny nx × 4 ny nx. Symmetric level-d block-circulant matrices are defined likewise. All level-d block-circulant matrices are completely defined by the first row or column. The sum, product, square root decomposition and inverse of circulant matrices are again circulant. Further qualities will be discussed in the context of the Fourier matrix. Please note that, if t0 ... t_{n'x−1} = c0 ... c_{nx}, the leading block of the circulant matrix is identical to the Toeplitz matrix T defined above. Since circulant matrices are mathematically more convenient, this identity will be used to embed Toeplitz matrices in larger circulant matrices.


8.3.5 Vandermonde Matrices and the Discrete Fourier Matrix

The last type of matrix structure I will discuss here is a special case of the Vandermonde matrix. An n × n Vandermonde matrix is uniquely defined by the elements ωi, i = 1 ... n. The k-th row, k = 1 ... n, consists of the ωi to the power of (k − 1):

$$\mathbf{V} = \begin{pmatrix} \omega_1^0 & \omega_2^0 & \cdots & \omega_n^0 \\ \omega_1^1 & \omega_2^1 & \cdots & \omega_n^1 \\ \vdots & \vdots & & \vdots \\ \omega_1^{n-1} & \omega_2^{n-1} & \cdots & \omega_n^{n-1} \end{pmatrix} .$$

If the ωi are the n-th roots of unity, one obtains the 1-D discrete Fourier matrix. The d-dimensional discrete Fourier matrix is a level-d block Vandermonde matrix. Other publications might normalize the Fourier matrix by a factor of n^(−1/2). The definition chosen here is in accordance with the discrete Fourier matrix (dftmtx) in the programming language MATLAB. The most interesting property of the discrete Fourier matrix is that the matrix-vector product v = Fu is the Fourier transform of u. It can be computed superfast by the Fast Fourier Transform (FFT) (Cooley and Tukey, 1965 [17]) in O(n log2 n) FLOPS:

$$\mathbf{v} = \mathbf{F}\mathbf{u} = \mathcal{F}(\mathbf{u}) . \qquad (8.6)$$

While the original FFT algorithm requires n to be a power of two, the Fastest Fourier Transform in the West (FFTW) by Frigo and Johnson (1998) [31] allows superfast evaluation for arbitrary n. Both algorithms are available for the general d-dimensional case. F is a unitary matrix but for a normalizing factor of n, which means that its inverse is a multiple of its Hermitian transpose:

$$\mathbf{F}^H\mathbf{F} = n\mathbf{I} . \qquad (8.7)$$

The Hermitian transpose is the transpose of the complex conjugate. Further, F is symmetric:

$$\mathbf{F}^T = \mathbf{F} . \qquad (8.8)$$

The reverse operation can be evaluated superfast by the inverse Fourier transform:

$$\mathbf{u} = \mathbf{F}^{-1}\mathbf{v} = \frac{1}{n}\mathbf{F}^H\mathbf{v} = \mathcal{F}^{-1}(\mathbf{v}) . \qquad (8.9)$$

Another property of the Fourier matrix is that it diagonalizes circulant matrices (e.g. Barnett, 1990, pp. 350-354 [3]) according to the diagonalization theorem:

$$\mathbf{C} = \frac{1}{n}\mathbf{F}^H\boldsymbol{\Lambda}\mathbf{F} , \qquad (8.10)$$

with C denoting an arbitrary n × n circulant matrix, Λ a diagonal matrix with the n eigenvalues λi of C on the diagonal, and F the Fourier matrix. Obviously, the Fourier matrix contains the universal eigenvectors for all circulant matrices. A proof for the 2-D case, where F is the 2-D Fourier matrix and C is a block-circulant matrix with circulant blocks, is given by Nott and Wilson (1997) [73]. After diagonalization, most matrix operations can be performed efficiently on the diagonal matrix Λ. The eigenvalues of an SCC matrix can be computed as follows. After multiplication by F, eq. (8.10) becomes:

$$\mathbf{F}\mathbf{C} = \boldsymbol{\Lambda}\mathbf{F} . \qquad (8.11)$$


One column of C contains all information, thus only the first column needs to be considered:

$$\mathbf{F}\mathbf{C}_1 = \boldsymbol{\Lambda}\mathbf{F}_1 . \qquad (8.12)$$

Since all entries of F1 are unity, this simplifies to:

$$\boldsymbol{\lambda} = \mathbf{F}\mathbf{C}_1 = \mathcal{F}(\mathbf{C}_1) , \qquad (8.13)$$

where λ is an n × 1 vector of the eigenvalues. Using FFT to evaluate FC1 renders the eigenvalue decomposition O(n log2 n) compared to the conventional O(n^3).
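A quick MATLAB check of eq. (8.13) for a small, arbitrarily chosen symmetric generating series (for such a series, toeplitz returns the circulant matrix itself):

    % Eigenvalues of a symmetric circulant matrix via FFT, eq. (8.13):
    c      = [4 2 1 0.5 1 2]';     % symmetric generating series (arbitrary example)
    C      = toeplitz(c);          % for a symmetric series, toeplitz(c) is circulant
    lambda = fft(c);               % eigenvalues in O(n log2 n)
    err    = norm(sort(real(lambda)) - sort(eig(C)));   % vanishes to round-off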

8.3.6 Properties and Structure of Covariance Matrices

Covariance matrices are symmetric and positive definite by definition. If the random process is statistically second-order stationary or at least intrinsic, its (generalized) covariance function is invariant under translation. When such a random process is discretized on a d-dimensional regular equispaced finite grid, the covariance matrix has symmetric level-d block Toeplitz structure. Random processes with the same properties, but defined in periodic domains, have symmetric level-d block circulant covariance matrices. A mathematical proof is given by Zimmerman (1989) [106].

8.4 Circulant Embedding

The covariance matrices met in geostatistics have Toeplitz structure under the conditions discussed in Section 8.3.6. It is apparent from eq. (8.13) that circulant matrices are mathematically highly convenient. To tap the long list of efficient algorithms for circulant matrices, Toeplitz matrices are embedded in larger circulant matrices. Depending on the required matrix operation, other quantities need to be embedded as well to match the size of the enlarged matrix, and the result may have to be extracted from some resulting embedded quantity.

8.4.1 Graphical Example

Consider that a finite domain Ω' may be interpreted as a sub-domain of a larger virtual domain Ω. This process is referred to as embedding, and Ω' and Ω are the embedded and the embedding domain, respectively. Let s' be a random spatial function defined in Ω' which is second-order stationary with covariance function Q'(h). In order to maintain the statistical properties of s' in the embedding process, the covariance function for all separation distances h that are observable in Ω' must be identical in Ω. For circulant embedding, the embedding domain Ω and the embedded random variable s have to be periodic, rendering the covariance function Q(h) periodic as well. For simplicity, choose Ω twice the length and width of Ω', as exemplified in Figure 8.1. Since covariance functions are even functions by definition, the periodic covariance function Q(h) is obtained by simply mirroring Q'(h) (see Figure 8.2). For the sake of efficient matrix operations, there is no need to actually generate the random space variable s in the embedding domain.


Figure 8.1: Periodic Embedding


Figure 8.2: Finite and periodic covariance function

8.4.2 Mathematical Description

A very detailed and lucid explanation of the circulant embedding procedure is given by Kozintsev (1999) [62]. The condensed mathematical description of circulant embedding for Toeplitz matrices is as follows: to embed symmetric Toeplitz matrices in symmetric circulant matrices, extend the series t0 ... t_{n'x−1} by appending the elements t1 ... t_{n'x−2} in reverse order to obtain a series c0 ... c_{nx} ... c1, nx = n'x − 1. This corresponds to mirroring the covariance function Q'(h) to render it periodic. To embed symmetric level-d block Toeplitz matrices in symmetric level-d circulant matrices, embed the Toeplitz blocks Ti in circulant blocks Ci, and then extend the series of the blocks to obtain a periodic series of blocks.

Formally, Cirpka and Nowak (2004) [16] denote the extraction of an embedded Toeplitz matrix T from the embedding circulant matrix C as follows:

$$\mathbf{T} = \mathbf{M}^T\mathbf{C}\mathbf{M} , \qquad (8.14)$$

in which M is an n × n' mapping matrix that transfers the entries of the finite embedded domain onto the periodic embedding domain. M has one single entry of unity per column. Mᵀ extracts the entries from the periodic embedding domain to obtain the entries for the finite embedded domain. Embedding a vector u' can be written as:

$$\mathbf{u} = \mathbf{M}\mathbf{u}' , \qquad (8.15)$$

and extraction is denoted by u' = Mᵀu. In practice, the embedding is done by zero padding, and extracting is achieved by discarding the excessive elements.
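In 1-D, embedding and extraction reduce to a few lines of MATLAB. In the sketch, t denotes the first row of the symmetric Toeplitz matrix T, e.g. a covariance function evaluated at lags 0 to n'−1, and the simplest embedding size of Section 8.4.3 is assumed:

    % Circulant embedding of a symmetric Toeplitz matrix, 1-D case:
    c = [t, t(end-1:-1:2)];        % mirrored series: first row of C, length n = 2(n'-1)
    n = length(c);
    % Embedding u = M*u' is zero padding; extraction u' = M'*u discards the padding:
    pad     = @(u_small) [u_small; zeros(n - length(u_small), 1)];
    extract = @(u_big) u_big(1:length(t));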

8.4.3 Embedding Size

Embeddings of arbitrary size are allowed under certain circumstances that are beyond the scope of this thesis. The simplest case, as described above, is to choose the embedding to double the number of nodes along each dimension. Larger embeddings extend the series with new elements from the covariance function before appending the reverse series. Smaller embeddings can be chosen if, e.g., the last 2k elements of the series are constant, so that the last k elements can serve as the mirrored image of the previous k elements. For the size of the embedding, two aspects are of special interest. First, for some spectral methods, the circulant embedding matrix has to be non-negative definite (e.g. Nott and Wilson, 1997, Dietrich and Newsam, 1993 [73, 25]) or positive definite (Kozintsev, 1999 [62]). Second, choosing powers of two for n is especially suited for standard FFT algorithms, whereas newer algorithms like the FFTW perform almost as well for n differing from a power of two. The impact of the embedding size on the definiteness is discussed by Dietrich and Newsam (1997) [27].

8.5 Convolutions

In this section, I summarize existing methods for convolution and the evaluation of quadratic forms via FFT. Then, I extend these methods to evaluate bilinear forms and matrix-matrix products like in eq. (8.1).

8.5.1 Matrix-Vector Products

Using the diagonalization theorem (eq. 8.10), the matrix-vector product of a circulant matrix and a vector can be written as (Van Loan, 1992, pp. 205-209 [99]):

$$\mathbf{C}\mathbf{u} = \frac{1}{n}\mathbf{F}^H\boldsymbol{\Lambda}\mathbf{F}\mathbf{u} = \mathcal{F}^{-1}\left(\mathcal{F}(\mathbf{C}_1)\circ\mathcal{F}(\mathbf{u})\right) , \qquad (8.16)$$


in which eqs. (8.13) and (8.5) were used. The same holds for level-d block circulant matrices when using the d-dimensional FFT. This procedure is called convolution via FFT. The same technique can be applied to Toeplitz matrices when using circulant embedding. Combining eqs. (8.16)&(8.15) yields (compare Kozintsev, 1999 [62]):

$$\mathbf{T}\mathbf{u}' = \mathbf{M}^T\mathbf{C}\mathbf{M}\mathbf{u}' = \frac{1}{n}\mathbf{M}^T\left(\mathbf{F}^H\boldsymbol{\Lambda}\mathbf{F}\right)\mathbf{M}\mathbf{u}' = \mathbf{M}^T\mathcal{F}^{-1}\left(\mathcal{F}(\mathbf{C}_1)\circ\mathcal{F}(\mathbf{u})\right) , \qquad (8.17)$$

which implies that u' has to be embedded in a larger vector u by zero-padding. Then, it is multiplied with the circulant matrix using convolution via FFT, and the result has to be extracted by discarding the superfluous elements.
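A matrix-free MATLAB implementation of eq. (8.17) for the 1-D case might look as follows; t is again the generating series of T, and u_small the n' × 1 input vector:

    % Superfast Toeplitz matrix-vector product T*u' via circulant embedding, eq. (8.17):
    function w = toepmv(t, u_small)
        t = t(:); u_small = u_small(:);
        c = [t; t(end-1:-1:2)];                  % first column of the embedding circulant C
        u = [u_small; zeros(length(c) - length(u_small), 1)];   % zero padding, u = M*u'
        w = ifft(fft(c) .* fft(u));              % circulant convolution via FFT, eq. (8.16)
        w = real(w(1:length(u_small)));          % extraction w' = M'*w; T is real symmetric
    end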

8.5.2 Bilinear and Quadratic Forms

I will now present an extension of convolution via FFT to evaluate bilinear forms u1'^H T u2'. Starting from a modified version of the second line in eq. (8.17) and considering that Mᵀ = M^H, I obtain:

$$\mathbf{u}_1'^H\mathbf{T}\mathbf{u}_2' = \frac{1}{n}\mathbf{u}_1'^H\mathbf{M}^H\mathbf{F}^H\boldsymbol{\Lambda}\mathbf{F}\mathbf{M}\mathbf{u}_2' = \frac{1}{n}\left(\mathbf{F}\mathbf{u}_1\right)^H\boldsymbol{\Lambda}\left(\mathbf{F}\mathbf{u}_2\right) = \frac{1}{n}\sum_{k=1}^{n}\left(v_1^*\right)_k \lambda_k \left(v_2\right)_k , \qquad (8.18)$$

in which v1 = Fu1 and v2 = Fu2 are computed by FFT, v1* is the complex conjugate of v1, and k is an index running over all elements of the vectors. For u1 = u2, this simplifies to the quadratic form (compare Nott and Wilson, 1997 [73]):

$$\mathbf{u}'^T\mathbf{T}\mathbf{u}' = \frac{1}{n}\sum_{k=1}^{n}\lambda_k\left|v_k\right|^2 . \qquad (8.19)$$
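With the same embedding, eqs. (8.18)&(8.19) need only two FFTs and one weighted sum. In the sketch below, c is the first column of the embedding circulant, n its size, and u1_small and u2_small are the real n' × 1 input vectors (all names chosen for illustration):

    % Bilinear and quadratic forms via FFT, eqs. (8.18) & (8.19):
    lambda = fft(c);                                      % eigenvalues of C, eq. (8.13)
    v1 = fft([u1_small(:); zeros(n - length(u1_small), 1)]);
    v2 = fft([u2_small(:); zeros(n - length(u2_small), 1)]);
    b  = real(sum(conj(v1) .* lambda .* v2)) / n;         % bilinear form, eq. (8.18)
    q  = real(sum(lambda .* abs(v1).^2)) / n;             % quadratic form, eq. (8.19)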

8.5.3 Matrix-Matrix Multiplications

Now I extend the methods for matrix-vector multiplication and bilinear forms to matrix-matrix multiplications. Consider H an m × n matrix. The computation of Qsy = Qss Hᵀ, in which Qss is a symmetric Toeplitz matrix, can be split up into single matrix-vector multiplications:

$$\mathbf{Q}_{sy,k} = \mathbf{Q}_{ss}\mathbf{u}_k , \quad k = 1\ldots m ,$$

where uk is the k-th column of Hᵀ. Likewise, the product H Qss Hᵀ can be split up into m^2 sub-problems:

$$\left(\mathbf{H}\mathbf{Q}_{ss}\mathbf{H}^T\right)_{kl} = \mathbf{u}_k^T\mathbf{Q}_{ss}\mathbf{u}_l .$$

As H Qss Hᵀ is symmetric, only the upper triangle and the diagonal have to be computed. All of these tasks can be performed using eqs. (8.17)-(8.19).
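Using the matrix-free product toepmv from the sketch in Section 8.5.1 columnwise, the products of eq. (8.1) follow without ever assembling Qss; H and the generating covariance series t are assumed given:

    % Matrix-matrix products of eq. (8.1) without assembling Qss:
    m   = size(H, 1);
    Qsy = zeros(size(H, 2), m);
    for k = 1:m
        Qsy(:, k) = toepmv(t, H(k, :)');    % column k of Qss*H', via FFT
    end
    Qyy = H * Qsy;                           % m x m, cheap since typically m << n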


8.5.4 Exemplification

To demonstrate the power of evaluating a bilinear form u1ᵀ Qss u2 via FFT compared to the standard method, I implemented and timed both methods in MATLAB. As the standard method runs out of memory quite easily, I coded an additional method that exploits the Toeplitz structure of Qss to reduce storage. It successively generates the columns of Qss by permutation, storing only one column at a time. The MATLAB codes ran on a contemporary personal computer (1.5 GB RAM, 900 MHz AMD Athlon CPU) in a test series, evaluating artificial problems with increasing problem sizes n. As exemplary problem, I chose a Gaussian covariance function with 20 times the correlation length per domain length, so that nx = 1.3 n'x and ny = 1.3 n'y could be chosen as an adequate embedding. The n × 1 vectors u1 and u2 were drawn from a random generator.

Figure 8.3 compares the storage requirements of the standard and the spectral method. On my reference computer, the standard method ran out of memory at n = 2^12 = 4096 and the columnwise standard method at n = 2^16 = 65,536, while the FFT-based method could be applied to grids of up to n = 2^21 = 2,097,152 without memory problems.

Figure 8.3: Memory consumption for Qss (RAM in MByte over matrix size; standard versus spectral method)

Figure 8.4 illuminates the contrast in CPU time for the matrix-matrix multiplication, including the generation of Q_ss. Although the columnwise standard method has significant overhead for the permutation of the columns of Q_ss, it is faster than the standard method by one order of magnitude, as it only computes n entries of Q_ss. Due to its embedding overhead, the FFT-based method is slower than both standard methods for small n. Being of lower order in n, it outruns both standard methods for n > 2^8 = 256. At the upper limits, the columnwise standard method takes approximately 30 minutes for its maximum allowable problem size (n = 2^16 grid points). In comparison, the FFT-based method takes only seven seconds for n = 2^16. For the maximum problem size on the reference computer, n = 2^21, it takes only eight minutes.

Figure 8.4: CPU time for quadratic forms of Toeplitz matrices (time in seconds over matrix size; standard, columnwise and spectral methods)

8.6 Realizations of Non-Stationary Random Fields

This section summarizes the decomposition methods by Dietrich and Newsam (1993) [25] and Cirpka and Nowak (2004) [16]. Later in this section, I present a modification of the latter that allows simulating non-stationary random variables with explicit cross-correlation between zones of different covariance functions.

8.6.1 Stationary Random Fields

Dietrich and Newsam (1993) [25] showed that an unconditional realization of a stationary random function discretized on a regular equispaced grid can be generated as follows. For the covariance matrix, which is a symmetric Toeplitz matrix T, choose an embedding M according to eq. (8.14). The embedding has to satisfy the criterion that the resulting symmetric circulant matrix C is nonnegative definite (Dietrich and Newsam, 1997 [27]). A complex square root decomposition of C is given by:

\[
C_c^{\frac{1}{2}} = \sqrt{\frac{1}{n}}\, \Lambda^{\frac{1}{2}} F , \tag{8.20}
\]

where Λ^{1/2} is a diagonal matrix with the square root of each eigenvalue of C on the diagonal. The proof is simple:

\[
\left( C_c^{\frac{1}{2}} \right)^H C_c^{\frac{1}{2}} = \frac{1}{n} F^H \Lambda^{\frac{1}{2}} \Lambda^{\frac{1}{2}} F = \frac{1}{n} F^H \Lambda F = C .
\]

Generate two n × 1 random vectors ε_1 and ε_2 with ε_1, ε_2 ∼ N(0, I) to obtain a complex random vector ε_c = ε_1 + iε_2. Then compute:

\[
u' = M^T \left( C_c^{\frac{1}{2}} \right)^H \varepsilon_c
   = \sqrt{\frac{1}{n}}\, M^T F^H \Lambda^{\frac{1}{2}} \varepsilon_c
   = \sqrt{\frac{1}{n}}\, M^T F^H \left( \lambda^{\frac{1}{2}} \circ \varepsilon_c \right) . \tag{8.21}
\]

Here, u' is a complex n' × 1 vector, u' = u'_1 + i u'_2, and both u'_1 and u'_2 are unconditional realizations. The last line in eq. (8.21) implies computing the eigenvalues of C using eq. (8.13), multiplying elementwise with the complex random vector ε_c, applying the inverse transform F^H, dividing by √n, and finally extracting the embedded realizations. A proof that u'_1 and u'_2 have covariance T is provided by Dietrich and Newsam (1993) in the Appendix [25].
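A minimal one-dimensional MATLAB sketch of eq. (8.21) follows; the sizes and the covariance model are illustrative assumptions. Note that MATLAB's ifft carries the 1/n factor, which accounts for the scaling by √n below.

```matlab
% Minimal 1-D sketch of eq. (8.21): two unconditional realizations of a
% stationary random field via circulant embedding (assumed sizes).
n0 = 1000; dx = 0.01; lambda = 0.1;
n  = 2^nextpow2(2*n0);
h  = (0:n-1)'*dx;  h = min(h, n*dx - h);
C1 = exp(-(h/lambda).^2);              % Gaussian covariance, unit variance
lam = real(fft(C1));                   % eigenvalues of C
lam(lam < 0) = 0;                      % guard against roundoff; C itself
                                       % must be nonnegative definite
epsc = randn(n,1) + 1i*randn(n,1);     % complex random vector eps_c
w  = sqrt(n) * ifft(sqrt(lam).*epsc);  % (1/sqrt(n)) F^H (lam.^0.5 o eps_c)
u1 = real(w(1:n0));                    % two independent unconditional
u2 = imag(w(1:n0));                    % realizations with covariance T
```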

8.6.2 Non-Stationary Random Fields

Cirpka and Nowak (2004) [16] show how to define the covariance matrices for selected special cases of non-stationary random fields. The cases of interest here are variance scaling and covariance blending.

Variance Scaling

Variance scaling is mathematically described as follows (Cirpka and Nowak, 2004 [16]):

\[
Q' = A' T A' , \tag{8.22}
\]

in which Q' is the non-stationary covariance matrix subject to variance scaling, T is a stationary covariance matrix with symmetric Toeplitz structure, and A' is a real diagonal scaling matrix with nonnegative entries on the diagonal, defined by the n' × 1 vector a'. The embedded version of eq. (8.22) is:

\[
Q' = M^T Q M = M^T A C A M , \tag{8.23}
\]

where Q is the embedded covariance matrix subject to variance scaling, A = M A' M^T is the zero-padded version of A', defined by the embedded vector a = M a' on its main diagonal, and C is a symmetric circulant matrix that embeds T as defined in eq. (8.14). A symmetric real square root decomposition of a symmetric nonnegative-definite circulant matrix C is given by:

\[
C_r^{\frac{1}{2}} = \frac{1}{n} F^H \Lambda^{\frac{1}{2}} F . \tag{8.24}
\]

The proof is again simple:

\[
\left( C_r^{\frac{1}{2}} \right)^H C_r^{\frac{1}{2}} = \frac{1}{n^2} F^H \Lambda^{\frac{1}{2}} F F^H \Lambda^{\frac{1}{2}} F = \frac{1}{n} F^H \Lambda F = C ,
\]

in which eqs. (8.7) & (8.10) were used. Since C is symmetric, real and nonnegative-definite, Λ is real and nonnegative-definite. Hence, C_r^{1/2} is again symmetric, real and nonnegative-definite, and has symmetric circulant structure. A real square root decomposition for Q is:

\[
Q_r^{\frac{1}{2}} = C_r^{\frac{1}{2}} A , \tag{8.25}
\]

which can easily be checked using eq. (8.24). A realization with non-stationary covariance Q' can be generated using a real n × 1 random vector ε_r ∼ N(0, I):

\[
u' = M^T \left( Q_r^{\frac{1}{2}} \right)^H \varepsilon_r = M^T A\, C_r^{\frac{1}{2}} \varepsilon_r . \tag{8.26}
\]

It is easy to prove that:

\[
\mathrm{Cov}\left[ u' \right] = E\left[ u' u'^T \right] = M^T A C A M = Q' .
\]

Since C_r^{1/2} is circulant, the matrix-vector product in eq. (8.26) is done by convolution via FFT. Then, the scaling with A is performed through elementwise multiplication with a, followed by extracting u'.

Covariance Blending

Cirpka and Nowak (2004) [16] also provide a mathematical description of covariance blending:

\[
Q' = A_1' T_1 A_1' + A_2' T_2 A_2' . \tag{8.27}
\]

The symbols are defined in accordance with eq. (8.22). The embedded equivalent of eq. (8.27) is:

\[
Q' = M^T Q M = M^T \left( A_1 C_1 A_1 + A_2 C_2 A_2 \right) M . \tag{8.28}
\]

The embedded covariance matrix Q has the following complex square root decomposition:

\[
Q_c^{\frac{1}{2}} = C_{r,1}^{\frac{1}{2}} A_1 - i\, C_{r,2}^{\frac{1}{2}} A_2 . \tag{8.29}
\]

Once more, the proof is trivial:

\[
\left( Q_c^{\frac{1}{2}} \right)^H Q_c^{\frac{1}{2}} = A_1 C_1 A_1 + A_2 C_2 A_2 = Q . \tag{8.30}
\]

Now, a procedure similar to eq. (8.21) may be applied, this time using the complex random vector ε_c:

\[
\begin{aligned}
u' &= M^T \left( Q_c^{\frac{1}{2}} \right)^H \varepsilon_c \\
   &= M^T \left( A_1 C_{r,1}^{\frac{1}{2}} + i\, A_2 C_{r,2}^{\frac{1}{2}} \right) \left( \varepsilon_1 + i \varepsilon_2 \right) \\
\mathrm{Re}\left( u' \right) &= M^T \left( A_1 C_{r,1}^{\frac{1}{2}} \varepsilon_1 - A_2 C_{r,2}^{\frac{1}{2}} \varepsilon_2 \right) = u_1 \\
\mathrm{Im}\left( u' \right) &= M^T \left( A_1 C_{r,1}^{\frac{1}{2}} \varepsilon_2 + A_2 C_{r,2}^{\frac{1}{2}} \varepsilon_1 \right) = u_2 .
\end{aligned}
\tag{8.31}
\]

The proof that the covariance of u_1 and u_2 is Q' follows from eq. (8.30) in only a few steps. Since the entries of ε_c are random, the negative sign in line three of eq. (8.31) is irrelevant. Further, since ε_1 and ε_2 are uncorrelated, independent realizations for each covariance model can be generated and scaled using eq. (8.26) with convolution via FFT and then summed. The entire procedure may be applied for an arbitrary number m of blended covariance models:

\[
u' = M^T \left( \sum_{k=1}^{m} A_k C_{r,k}^{\frac{1}{2}} \varepsilon_k \right) . \tag{8.32}
\]
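A one-dimensional MATLAB sketch of eq. (8.32) for m = 2 follows; the two covariance models and the linear blending weights are illustrative assumptions.

```matlab
% Sketch of eq. (8.32) for m = 2 blended covariance models in 1-D.
n0 = 1000; dx = 0.01; n = 2^nextpow2(2*n0);
h  = (0:n-1)'*dx;  h = min(h, n*dx - h);
lam1 = max(real(fft(exp(-(h/0.10).^2))), 0);   % Gaussian, lambda = 0.10
lam2 = max(real(fft(exp(-h/0.05))), 0);        % exponential, lambda = 0.05

a1 = [linspace(1,0,n0)'; zeros(n-n0,1)];       % embedded scaling vectors:
a2 = [linspace(0,1,n0)'; zeros(n-n0,1)];       % linear transition

% C_r^(1/2)*eps = ifft(sqrt(lam).*fft(eps)), since C_r^(1/2) = (1/n)F^H L^(1/2) F
cr = @(lam, e) real(ifft(sqrt(lam) .* fft(e)));
u  = a1.*cr(lam1, randn(n,1)) + a2.*cr(lam2, randn(n,1));
u  = u(1:n0);                                  % extract the realization
```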


Covariance Blending and Zonation with Explicit Correlation

In case the scaling functions in eq. (8.27) are smooth and sufficiently overlap, the resulting realizations appear to show a continuous blending between the different covariance models. However, the elements of the individual realizations to be summed up in eq. (8.32) are not explicitly correlated. Covariance zonation is the special case of covariance blending where the entries of the scaling functions are complementarily either zero or unity. If using eq. (8.32) to generate realizations with covariance zonation, the result is discontinuous at the transition between the zones. Merely choosing ε_1 = ε_2 = … = ε_m introduces some indefinite correlation between the individual zones. To enforce explicit correlation, I define the covariance matrix as follows:

\[
Q' = M^T Q M , \qquad Q = C_{r,1}^{\frac{1}{2}} A_1 A_1 C_{r,1}^{\frac{1}{2}} + C_{r,2}^{\frac{1}{2}} A_2 A_2 C_{r,2}^{\frac{1}{2}} . \tag{8.33}
\]

The corresponding complex square root decomposition of Q is:

\[
Q_c^{\frac{1}{2}} = A_1 C_{r,1}^{\frac{1}{2}} - i\, A_2 C_{r,2}^{\frac{1}{2}} , \tag{8.34}
\]

which is basically the transpose of the square root decomposition defined in eq. (8.29). The proof that Q_c^{1/2} is a complex square root decomposition of Q is evident. In analogy to eq. (8.31), this leads to the following method to generate realizations with covariance Q':

\[
u' = M^T \left( Q_c^{\frac{1}{2}} \right)^H \varepsilon_c
   = M^T \left( C_{r,1}^{\frac{1}{2}} A_1 + i\, C_{r,2}^{\frac{1}{2}} A_2 \right) \left( \varepsilon_1 + i \varepsilon_2 \right) . \tag{8.35}
\]

The effective difference to generating realizations without explicit correlation across the zone transitions is that the scaling is applied to the random vectors ε_1 and ε_2 before the convolution instead of after it. Again, this leads to the following method for an arbitrary number m of zones (see the sketch below):

\[
u' = M^T \left( \sum_{k=1}^{m} C_{r,k}^{\frac{1}{2}} A_k \varepsilon_k \right) . \tag{8.36}
\]
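In the MATLAB sketch, eq. (8.36) merely reverses the order of scaling and convolution. The following continuation reuses n, n0, lam1, lam2 and cr from the blending sketch above; the indicator scaling (zone 1 as the left half, zone 2 as the right half) is an illustrative assumption.

```matlab
% Sketch of eq. (8.36): zonation with explicit cross-correlation.
a1 = [ones(n0/2,1);  zeros(n - n0/2, 1)];      % indicator scaling vectors
a2 = [zeros(n0/2,1); ones(n0/2,1); zeros(n-n0,1)];
u  = cr(lam1, a1.*randn(n,1)) + cr(lam2, a2.*randn(n,1));  % scale first,
u  = u(1:n0);                                              % then convolve
```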

8.6.3 Illustrative Examples

Figure 8.5 provides examples for each kind of non-stationarity discussed above. Subfigure (a) shows a simple case of variance scaling, where the variance increases from zero to unity from left to right. Subfigure (b) exhibits an example of covariance blending with a linear transition from an isotropic Gaussian covariance model on the left to an anisotropic exponential covariance model on the right. Subfigure (c) shows the same covariance models in a zonation setup without cross-correlation at the zone transitions. Finally, Subfigure (d) demonstrates zonation with explicit cross-correlation between the zones. The transition in (b) is smooth since the scaling functions for the covariance blending are themselves smooth. In (c), the scaling functions are not smooth. Instead, they switch from zero to unity to indicate the membership of points to the individual zones. The resulting realization is discontinuous at the transition. Here, it becomes evident that the seeming correlation at the transition in (b) originated only from the smoothness of the scaling function. The explicit cross-correlation in (d) enforces a continuous and smooth transition between the zones regardless of the scaling functions.


Figure 8.5: Different types of non-stationary realizations

8.7 Deconvolution

Deconvolution is the solution of a Toeplitz system Ax = b. As discussed in Section 8.2, the fastest solver for symmetric real positive-definite Toeplitz systems is the Preconditioned Conjugate Gradients (PCG) method with a circulant preconditioner. In this section, I summarize this algorithm and circulant preconditioners. Then, I discuss situations where the circulant PCG method fails with the circulant preconditioners available to date. Finally, I present regularized circulant preconditioners and a geostatistical approach to deconvolution that solve the deconvolution problem whenever the standard circulant PCG method fails. To prove the power of the new approaches, I include a small performance test.

8.7.1 Preconditioned Conjugate Gradients

The Conjugate Gradients method is attributed to Hestenes and Stiefel (1952) [41]. The following is the preconditioned version taken from Shewchuk (1994) [89], who offers the most detailed and reader-friendly derivation of the algorithm.

Algorithm 1 (Preconditioned Conjugate Gradients, PCG): The linear system Ax = b is to be solved for a real symmetric positive-definite n × n matrix A. A preconditioner M, an initial guess x_0 and an error tolerance ε < 1 are provided. Initialize the algorithm with the counter k = 1, the error vector r = b − Ax_0, the preconditioned conjugate gradient d = M^{-1}r, the residual δ_1 = r^T M^{-1} r and the initial residual δ_0 = δ_1. Then,


1. Update the trial solution x:

\[
q = A d , \qquad \alpha = \frac{\delta_k}{d^T q} , \qquad x = x + \alpha d .
\]

2. Update the error vector and the residual:

\[
r = r - \alpha q , \qquad s = M^{-1} r , \qquad \delta_{k+1} = r^T s .
\]

3. Update the preconditioned conjugate gradient:

\[
d = s + \frac{\delta_{k+1}}{\delta_k}\, d .
\]

4. Increase k by one and repeat until k > k_{max} or δ_{k+1} < ε² δ_0.

The variables q, α and s are auxiliaries to reduce the computational costs. The PCG algorithm requires only one matrix-vector product q = Ad per iteration step. The line s = M^{-1}r merely implies applying the preconditioner. The preconditioner is usually chosen such that this operation is cheap. In the absence of roundoff error, the PCG algorithm converges towards the exact solution within m steps, where m is the number of distinct eigenvalues. If the preconditioner is chosen such that the eigenvalues of the preconditioned matrix M^{-1}A are clustered around unity, the PCG converges in only a few steps. For a more detailed discussion, see Shewchuk (1994) [89].
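A minimal MATLAB sketch of Algorithm 1 follows. The function handles Afun and Minvfun stand for A and M^{-1}; for Toeplitz A, both can be realized by convolution via FFT as in Section 8.5. The function and argument names are illustrative, not taken from the thesis code.

```matlab
function x = pcg_sketch(Afun, b, Minvfun, x, epsTol, kmax)
% Minimal sketch of Algorithm 1 (PCG). Afun(v) returns A*v,
% Minvfun(v) applies the preconditioner M^{-1} to v.
r      = b - Afun(x);            % error vector
d      = Minvfun(r);             % preconditioned conjugate gradient
delta  = r' * d;                 % residual delta_1
delta0 = delta;                  % initial residual
for k = 1:kmax
    q     = Afun(d);             % one matrix-vector product per step
    alpha = delta / (d' * q);
    x     = x + alpha * d;       % step 1: update trial solution
    r     = r - alpha * q;       % step 2: update error vector
    s     = Minvfun(r);
    deltaNew = r' * s;
    d     = s + (deltaNew/delta) * d;   % step 3: update search direction
    delta = deltaNew;
    if delta < epsTol^2 * delta0        % step 4: convergence test
        break
    end
end
end
```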

8.7.2 Circulant PCG for Toeplitz Systems

Usually, A is a sparse matrix, and the preconditioner M is chosen to be sparse and computationally inexpensive to apply in s = M^{-1}r. If A is a Toeplitz matrix, then the matrix-vector products Ax_0 and q = Ad are executed in O(n log n) FLOPS using convolution via FFT. The task is then to find a preconditioner that clusters the eigenvalues of M^{-1}T around unity and at the same time is cheap to apply, although T is not a sparse matrix. If a circulant matrix is chosen as preconditioner M, evaluating s = M^{-1}r can again be performed in no more than O(n log n) FLOPS using the highly efficient methods available for the solution of circulant systems (e.g., Good, 1950, Rino, 1970, Searle, 1979 [38, 81, 88]). In principle, these methods are identical to convolution via FFT (eq. 8.16) when dividing instead of multiplying by the eigenvalues λ_i.

Poorly Conditioned Systems

In some cases, the Toeplitz system is extremely ill-conditioned, in particular when the convolution to be reversed has a highly diffusive nature. An intuitive example is identifying the initial shape of a contaminant plume that was subject to diffusion over a long time span. Obviously, information destroyed by diffusion is irrevocably lost due to the fundamental theorems of thermodynamics.


Diffusion is mathematically equivalent to a convolution with a Gauss-shaped covariance function, resulting in a nearly singular Toeplitz matrix. Seen from the standpoint of digital filtering, the convolution with a Gauss-shaped covariance function is a low-pass filter that eliminates all high frequencies from the original signal. To obtain the original signal, the process of convolution has to be reversed mathematically. Since the low-pass filter totally annihilated the high-frequency components, it is self-evident that deconvolution in this case is an intractable problem regardless of the type of solver used.

Figure 8.6: Condition number cond(T) of Toeplitz matrices for Gaussian, exponential and spherical covariance functions, plotted over the normalized correlation length

Especially Gaussian covariance functions with a large correlation length lead to poorly conditioned or almost singular covariance matrices Q_ss. In Figure 8.6, I show how the condition of n × n Toeplitz matrices, here n = 100, corresponding to various covariance functions depends on the correlation length scale. Here, the correlation length λ is normalized by the domain length L. In contrast to the exponential and the spherical model, which are comparably good-natured, the matrix condition for the Gaussian model rises more than exponentially with the correlation length. Apparently, covariance matrices for the Gaussian model have an extremely poor condition if the domain is sized less than 20 correlation length scales. If using the circulant PCG method to evaluate Q_ss^{-1} s, the resulting preconditioners, chosen to cluster the eigenvalues around unity, themselves show an extremely poor condition. Hence, numerical noise is amplified by the preconditioner, and the algorithm stagnates or even diverges.

Regularized Circulant Preconditioners

In general, the PCG algorithm converges fastest when the preconditioner M is chosen to cluster the eigenvalues of M^{-1}A around unity. Different choices of circulant preconditioners that follow this directive by minimizing certain norms are summarized by Chang and Ng (1996) [8]. In case the Toeplitz matrix has an extremely poor condition, I suggest adding a regularization term to the diagonal of the preconditioner. Assume that C_1 is the n × 1 first column of a circulant preconditioner C. According to eq. (8.13), the eigenvalues of C are:

\[
\lambda(C) = F C_1 = F\left( C_1 \right) .
\]


Adding a well-conditioned positive-definite circulant regularization matrix εR onto C yields a regularized circulant preconditioner C* with the eigenvalues:

\[
\lambda\left( C^* \right) = F\left( C_1 + \varepsilon R_1 \right) = F\left( C_1 \right) + F\left( \varepsilon R_1 \right) .
\]

If choosing R = I, the regularized eigenvalues are simple to obtain:

\[
\lambda\left( C^* \right) = F\left( C_1 + \varepsilon I_1 \right) = \lambda(C) + \varepsilon ,
\]

since the Fourier transform of I_1 = [1, 0, 0, …, 0]^T is a vector of unit entries. The condition c* of C* is given by:

\[
c^* = \frac{\lambda_{\max}(C) + \varepsilon}{\lambda_{\min}(C) + \varepsilon} .
\]

Given a required condition c*_min for the preconditioner, I choose ε according to:

\[
\varepsilon = \frac{\lambda_{\max}(C) - \lambda_{\min}(C)\, c^*_{\min}}{c^*_{\min} - 1} . \tag{8.37}
\]
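A short MATLAB sketch of the regularization follows. Here, lamC denotes the eigenvalues of some circulant preconditioner (e.g., the Strang preconditioner) and r the current PCG residual; both names, like the target condition, are illustrative assumptions.

```matlab
% Sketch: applying a regularized circulant preconditioner, eq. (8.37).
lamMax = max(lamC);  lamMin = min(lamC);
cStar  = 1e5;                               % required condition c*_min
epsReg = (lamMax - lamMin*cStar) / (cStar - 1);
epsReg = max(epsReg, 0);                    % no regularization if not needed
lamReg = lamC + epsReg;                     % eigenvalues of C* = C + eps*I
s = real(ifft(fft(r) ./ lamReg));           % s = (C*)^{-1} r, circulant solve
```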

Geostatistical Approach to Deconvolution

An alternative method to stabilize the problem of deconvolution is to interpret it in analogy to cokriging with known zero mean. For this purpose, I assume the unknowns to be distributed with x ∼ N(0, Q_xx). As measurement equation, I use:

\[
b = T x + r ,
\]

in which the measurement error is characterized by r ∼ N(0, R). Hence, b is distributed with b ∼ N(0, T Q_xx T^T + R). This setup leads to the following deconvolution estimator:

\[
\hat{x} = Q_{xx} T^T \left( T Q_{xx} T^T + R \right)^{-1} b
        = \left( T + R \left( Q_{xx} T \right)^{-1} \right)^{-1} b .
\]

Up to here, I have not specified Q_xx and R. Choosing x to be uncorrelated, i.e., Q_xx = ε_1 I, implies only the weakest assumptions. Then:

\[
\hat{x} = \left( T + \left( \varepsilon_1 T \right)^{-1} R \right)^{-1} b .
\]

Choosing R = ε_2 T T^T declares the error to be white noise on the side of b, while R = ε_2 I declares the error to be white noise on the side of x. An intermediate assumption is R = ε_2 T. These options lead to, respectively:

\[
\hat{x} = \left( 1 + \frac{\varepsilon_2}{\varepsilon_1} \right)^{-1} T^{-1} b , \qquad
\hat{x} = \left( T + \frac{\varepsilon_2}{\varepsilon_1}\, T^{-1} \right)^{-1} b , \qquad
\hat{x} = \left( T + \frac{\varepsilon_2}{\varepsilon_1}\, I \right)^{-1} b .
\]

Apparently, the first option simply scales the result by a constant and does not affect the condition of the system to be solved. It seems to be of no further use. Options two and three, in contrast, open up the opportunity to control the condition of the system to be solved. While the control power of option two depends on the distribution of the eigenvalues of T, it is clear for option three that choosing the limit ε = ε_2/ε_1 → ∞ enforces a condition of unity, making this option the most pragmatic approach. In contrast to using a regularized preconditioner, the geostatistical approach to deconvolution modifies the entire system to be solved by filtering out noise of a specified magnitude. I quantify the error introduced by the noise-filter parameter ε using the posterior covariance of cokriging with known zero mean (compare Kitanidis, 1996 [57]):

\[
Q_{xx|b} = \varepsilon_1 I - \varepsilon_1 I\, T^T \left( T \varepsilon_1 I\, T^T + \varepsilon_2 T \right)^{-1} T \varepsilon_1 I
         = \varepsilon_1 \varepsilon \left( T + \varepsilon I \right)^{-1} .
\]

The choice of the noise-filter parameter ε is a tradeoff between stability and speed on the one hand and accuracy on the other. As for the actual value of ε, the main interest is in choosing a small value that is just sufficient to regularize the system of equations.
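As a sketch, option three can be combined with the PCG sketch from Section 8.7.1. The quantities lam, n and n0 are assumed from the earlier embedding sketches, and the noise-filter value is illustrative; pcg_sketch is the hypothetical helper function defined above.

```matlab
% Sketch of option three of the geostatistical deconvolution:
% solve (T + eps*I) x = b by (here unpreconditioned) PCG.
M     = speye(n, n0);                             % embedding (zero-padding)
Tmul  = @(x) real(M' * ifft(lam .* fft(M*x)));    % T*x via eq. (8.17)
b     = Tmul(ones(n0,1));                         % rhs as in the test cases
epsNF = 0.35;                                     % noise filter (illustrative)
Afun  = @(x) Tmul(x) + epsNF*x;                   % (T + eps*I)*x
Minv  = @(r) r;                                   % identity preconditioner
x = pcg_sketch(Afun, b, Minv, zeros(n0,1), 1e-8, 100);
```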

8.7.3 Test Cases

To exemplify the problems of standard circulant preconditioners for poorly conditioned Toeplitz systems, and to show the power of regularized preconditioners and the geostatistical approach, I discuss a small test case. Using the Gaussian covariance model with different normalized correlation length scales λ/L and unit variance, I set up symmetric n × n Toeplitz matrices T with n = 2000. A corresponding artificial right-hand side vector b for each Toeplitz matrix is computed from b = Tx using convolution via FFT, where x is an n × 1 vector with unit entries. Then, I use the PCG method with circulant preconditioners to solve the Toeplitz system and recover x.

Figure 8.7: Number of PCG iterations for different circulant preconditioners (standard, regularized and geostatistical) over the normalized correlation length λ/L

As example of conventional circulant preconditioners, I employ the preconditioner by Strang (1986) [92], since it is the simplest choice and still one of the most efficient (see the numerical


example in Chang and Ng, 1996 [8]). As can be seen in Figure 8.7, the PCG algorithm with the standard Strang preconditioner is efficient below a normalized correlation length of λ/L = 0.001. Above that, the condition of the Toeplitz system quickly degenerates, resulting in a nearly singular preconditioner. The required number of iteration steps explodes instantaneously, and the algorithm does not converge.

The second example is the Strang preconditioner with the newly proposed regularization, fixing the condition of the preconditioner at c* = 10^5. For low normalized correlation lengths, it behaves identically to the case without regularization. At normalized correlation lengths above 0.001, however, it stabilizes the number of necessary iteration steps at about twelve.

In the third example, I approach the deconvolution problem in the geostatistical framework. I determine a robust value for the noise-filter parameter ε by the same method as for the regularization parameter, this time restricting the condition of the entire system to a maximum of c* = 10^5. The result for x is only an approximation at normalized correlation lengths above 0.001, but the number of iteration steps never rises above five. Figure 8.8 compares the true solution for x and the filtered result from geostatistical deconvolution for λ/L = 0.1, plotted over the normalized spatial coordinate ξ/L. The value of the noise-filter parameter ε necessary to fix the condition of the system is 0.3548. Whether this error is acceptable or not depends on the objectives of the underlying application.

Figure 8.8: Impact of filtering in circulant PCG (filtered versus true solution x over the normalized coordinate ξ/L)

8.8 Summary and Conclusions

In this chapter, I have discussed the three basic types of matrix operations in geostatistics that are performed on the auto-covariance matrix of the unknowns. These are convolution-type, decomposition-type and deconvolution-type operations. Computational costs often restrict the application of geostatistics to problems with a large number of unknowns. To date, especially Bayesian methods have seriously suffered from these restrictions.


I showed that for each type of operation, there are methods to greatly reduce the computational costs. For this, the covariance of the unknowns has to be statistically second-order stationary or at least intrinsic, and the unknowns must be defined on a regular equispaced grid, imposing Toeplitz structure onto the auto-covariance matrix. To complete the toolbox of highly efficient methods for the purposes of this thesis, I added (1) a method to evaluate bilinear forms of Toeplitz matrices, (2) a method to generate realizations for certain kinds of non-stationary covariance models, and (3) a stabilization of existing deconvolution methods for the case of poorly conditioned systems of equations. These spectral methods in general perform the matrix multiplications of interest in O(n log n) FLOPS and exploit the Toeplitz structure to reduce storage requirements for the auto-covariance matrix to O(n) bytes. Altogether, this makes it possible to apply geostatistics, and especially Bayesian methods like the Quasi-Linear Geostatistical Approach, to larger problems, with numbers of unknowns in the range of n = 1,000,000 and higher.

Chapter 9

Finite Element Formulations

This chapter gives a condensed introduction to the Finite Element Method and demonstrates how to achieve computational savings through analytical expressions for certain element-related matrices. In my thesis, I use the Finite Element Method to solve partial differential equations like the governing equations and their adjoint state equations, and to evaluate the differential expressions occurring in the sensitivities (Chapter 7). Unlike the Finite Volume or the Finite Difference Method, the Finite Element Method explicitly includes definitions for differential expressions of state variables.

I chose the standard Galerkin Finite Element Method to solve all flow-related equations. For this kind of partial differential equation, it is known to be among the best available discretization schemes. For all transport-related equations, I use the Streamline-Upwind Petrov-Galerkin method (Brooks and Hughes, 1982 [6]). It is the most suitable version of the Finite Element Method for advective-dispersive transport.

I can apply the spectral methods discussed in Chapter 8 only if I discretize the domain by a regular equispaced grid. On that grid, I define parameters like hydraulic conductivity and the scalar apparent dispersion coefficient as cell-wise constant discrete values, to comply with the expressions for the sensitivities derived in Chapter 7 (eqs. 7.42 and 7.49). I use the very same grid for the Finite Element Method to avoid mapping the parameter values between several grids.

Other discretization schemes might be better suited for advective-dispersive transport, like the streamline-oriented Finite Volume Method. It excels in keeping numerical dispersion at low levels and is a valuable tool especially for low local transverse dispersion (see, e.g., Cirpka et al., 1999 [12]). In contrast to the Finite Element Method, it satisfies the mass balance locally instead of merely globally. I decided not to use this method for two reasons. First, the streamline-oriented grid would change in each iteration step of the conditioning procedure, and second, the parameters would have to be mapped between the equispaced grid and the streamline-oriented grid.

The restriction to a regular equispaced grid leads to the definition of finite elements with identical rectangular geometry. This allows finding and using many analytical expressions for element-related matrices in the Finite Element Method. The main focus of this chapter is on presenting these analytical solutions, which speed up all computations by orders of magnitude. I cover the basics of the Finite Element Method only to accurately define the formulations used. Details of the Finite Element Method and its properties are covered in popular textbooks (e.g., Fletcher, 1996, vols. 1 and 2, Hughes, 1987, Reddy, 1993 [29, 30, 46, 80, 6]).


9.1 Discretization by the Finite Element Method

9.1.1 Interpolation

The Finite Element Method discretizes the model domain by finite elements. Each element is characterized by several nodes. In case of rectangular bilinear 2-D elements, there are four nodes located at the corners. At each node, the global spatial coordinates and the values of the state variables are defined. In the interior of each element, these quantities are interpolated by continuous interpolation functions. Parameters that describe the physical properties of the domain may be defined either as elementwise constant values, or they may be defined at the nodes and then interpolated as well. The isoparametric concept uses one function for all interpolations. Often, the interpolation function is called the shape function. The shape function is defined in local coordinates for each element. It consists of contributions for each node. For rectangular 2-D elements, it is N = [N_1 N_2 N_3 N_4]. If the local node indices start in the upper left corner and increase counterclockwise, then choosing bilinear interpolation leads to the following shape function:

\[
\begin{aligned}
N_1 &= \frac{1}{d_x d_y}\, (d_x - x)\, y \\
N_2 &= \frac{1}{d_x d_y}\, (d_x - x)(d_y - y) \\
N_3 &= \frac{1}{d_x d_y}\, x\, (d_y - y) \\
N_4 &= \frac{1}{d_x d_y}\, x\, y .
\end{aligned}
\]

Here, d_x and d_y define the size of the element, and x and y are the local coordinates defined on the intervals 0…d_x and 0…d_y, respectively. The nodal values ẑ_1…ẑ_4 of a state variable z are arranged in a 4 × 1 column vector, so that the interpolated value z̃ inside the element is given by:

\[
\tilde{z} = N \hat{z} = \left[ N_1 \; N_2 \; N_3 \; N_4 \right]
\begin{bmatrix} \hat{z}_1 \\ \hat{z}_2 \\ \hat{z}_3 \\ \hat{z}_4 \end{bmatrix} .
\]

Using the same interpolation for both the local coordinates and the state variables is called the isoparametric concept.
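As a quick check of the shape functions and the interpolation, consider the following minimal MATLAB sketch; the element dimensions and nodal values are illustrative assumptions.

```matlab
% Sketch: bilinear shape functions on a rectangular element (local coords).
% Node order: 1 upper left, then counterclockwise; dx, dy are illustrative.
dx = 0.01; dy = 0.005;
Nfun = @(x,y) [ (dx-x)*y, (dx-x)*(dy-y), x*(dy-y), x*y ] / (dx*dy);

zhat   = [1; 2; 4; 3];                  % nodal values of a state variable
ztilde = Nfun(dx/2, dy/2) * zhat;       % value at the element center; all
                                        % N_i = 1/4 there, so ztilde = 2.5
assert(abs(sum(Nfun(rand*dx, rand*dy)) - 1) < 1e-12)  % partition of unity
```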

9.1.2 Method of Weighted Residuals

Consider a general differential equation of the form

\[
\mathcal{D}(z) = 0 ,
\]

where z is the state variable and 𝒟 denotes a set of arbitrary differential operators. Replacing the exact solution for z by an interpolated approximation z̃ introduces a residual ε. The method of weighted residuals requires the residual, multiplied by a weighting function W and integrated over the entire domain Ω, to vanish:

\[
\int_{\Omega} W \mathcal{D}(\tilde{z})\, d\Omega = \int_{\Omega} W \varepsilon\, d\Omega = 0 .
\]


Standard Galerkin

In the standard Galerkin Finite Element Method, the weighting function is identical to the transpose of the shape function, so that W = N^T.

Streamline-Upwind Petrov-Galerkin

To prevent oscillations in advection-dominated transport, the Streamline-Upwind Petrov-Galerkin (SUPG) method uses upstream weighting in the weighting function (Brooks and Hughes, 1982 [6]):

\[
W_{up} = N^T + \tau \left( \nabla N \right)^T v ,
\]

in which τ is the upwind coefficient. For bilinear elements, τ is

\[
\tau = \coth\left( \frac{Pe}{2} \right) - \frac{2}{Pe} \approx \sqrt{\frac{Pe^2}{36 + Pe^2}} ,
\]

and Pe is the grid-Peclet number. For 2-D grids, the grid-Peclet number can be expressed as:

\[
Pe = \frac{\| v \|}{D_l}\, \frac{\sqrt{2}\, d_x d_y}{\sqrt{d_x^2 + d_y^2}} ,
\]

in which the fraction is sometimes referred to as the effective grid spacing d_e, ‖v‖ is the absolute value of the velocity, and D_l is the total longitudinal dispersion. The derivative of the modified weighting function is identical to that of the shape function, so that ∇W_up = (∇N)^T.
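A small MATLAB sketch of the upwind coefficient as a function of grid spacing, velocity and dispersion; all parameter values are illustrative assumptions.

```matlab
% Sketch: grid-Peclet number and SUPG upwind coefficient tau.
dx = 0.01; dy = 0.005;                  % element size (illustrative)
v  = 1e-5;  Dl = 1e-8;                  % |v| and longitudinal dispersion
de  = sqrt(2)*dx*dy / sqrt(dx^2+dy^2);  % effective grid spacing
Pe  = v*de / Dl;                        % grid-Peclet number
tau = coth(Pe/2) - 2/Pe;                % exact upwind coefficient
tauApprox = sqrt(Pe^2 / (36 + Pe^2));   % cheap approximation
```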

9.2 Groundwater Flow

The starting point is the steady-state flow equation with isotropic K and a source term. Approximating the hydraulic head φ and the source term q_s by the interpolated nodal values φ̂ and q̂_s yields:

\[
-\nabla \cdot \left( K \nabla N \hat{\phi} \right) - N \hat{q}_s = \varepsilon .
\]

Now, I apply the standard Galerkin weighting function W = N^T, integrate over each element volume and set the weighted residual to zero:

\[
-\int_{V_{el}} N^T \nabla \cdot \left( K \nabla N \hat{\phi} \right) dV = \int_{V_{el}} N^T N \hat{q}_s\, dV .
\]

In the following, ∇·( ) is denoted by ∇^T( ). Since the nodal values are not a function of space and the conductivity is elementwise constant, both can be moved out of the integral. Further, applying Green's first theorem to the left-hand side removes the second derivative from the shape function. This step is necessary since the second derivative of a bilinear function vanishes, and because the boundary integrals are convenient for defining boundary conditions:

\[
\int_{V_{el}} (\nabla N)^T \nabla N\, dV\, K \hat{\phi}
= \int_{V_{el}} N^T N\, dV\, \hat{q}_s
+ \int_{\Gamma_{el}} N^T n^T \nabla N\, d\Gamma\, K \hat{\phi} .
\]

Assembling all element-related equations leads to the global Finite Element system of equations. The boundary integral vanishes at the element boundaries in the interior of the domain. The boundary


conditions chosen for the groundwater flow equation cancel out or quantify the remaining terms at the boundary of the domain. Dirichlet boundary conditions entirely eliminate the columns and rows of the corresponding boundary nodes by rearranging the global system of equations. Neumann boundary conditions quantify the boundary flux via specifying q̂_b = K∇Nφ̂ in the boundary term:

\[
\int_{\Omega} (\nabla N)^T \nabla N\, d\Omega\, K \hat{\phi}
= \int_{\Omega} N^T N\, d\Omega\, \hat{q}_s
+ \int_{\Gamma} N^T n^T\, d\Gamma\, \hat{q}_b .
\]

A short notation of the global system of equations is:

\[
S \hat{\phi} = M \hat{q}_s + I \hat{q}_b ,
\]

where S is the global stiffness matrix for the hydraulic heads, M is the global mass matrix, I is the global inflow matrix, and the right-hand side is called the load vector. In the element-related equations, the equivalents of S, M and I are called the element stiffness, mass and inflow matrices, respectively. They are identical for all elements that share the same geometry.
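Since the integrands of the element matrices are low-order polynomials, their analytical values can be reproduced exactly by low-order Gauss quadrature. The following MATLAB sketch assembles the element stiffness matrix ∫(∇N)^T(∇N) dV for the shape functions of Section 9.1.1; the element size is an illustrative assumption, and the code is a sketch rather than the thesis implementation.

```matlab
% Sketch: element stiffness matrix int (grad N)^T (grad N) dV, evaluated
% exactly by 2x2 Gauss quadrature (the integrand is biquadratic).
% Node order as in Section 9.1.1: 1 upper left, then counterclockwise.
dx = 0.01; dy = 0.005;
gp = [0.5 - 1/(2*sqrt(3)), 0.5 + 1/(2*sqrt(3))];   % Gauss points on (0,1)
Se = zeros(4,4);
for xi = gp*dx
    for eta = gp*dy
        % gradients of N1..N4 at (xi,eta); rows are [d/dx; d/dy]
        gradN = [ -eta,     -(dy-eta), (dy-eta), eta;
                  (dx-xi),  -(dx-xi),  -xi,      xi ] / (dx*dy);
        Se = Se + (gradN' * gradN) * (dx/2)*(dy/2); % Gauss weights 1/2,
    end                                             % scaled by dx and dy
end
% Se is identical for all elements of the same geometry, so it needs to
% be computed only once and can be scaled by the elementwise K afterwards.
```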

9.3 Temporal Moments

Here, I start with the moment-generating equation for the k-th temporal moment in a steady-state divergence-free flow field. Using interpolated nodal values and the method of weighted residuals yields:

\[
\int_{V_{el}} W v^T \nabla N \hat{m}_k - W \nabla \cdot \left( D \nabla N \hat{m}_k \right) dV
= \int_{V_{el}} W\, k\, N \hat{m}_{k-1}\, dV .
\]

Applying Green's first theorem to the dispersive term, setting up the global system of equations and using the SUPG weighting function leads to

\[
\int_{\Omega} \left[ N^T v^T \nabla N + (\nabla N)^T \left( D + \tau v v^T \right) \nabla N \right] d\Omega\; \hat{m}_k
= \int_{\Omega} \left( N^T + \tau (\nabla N)^T v \right) N\, d\Omega\; k\, \hat{m}_{k-1} ,
\]

in which summation over all elements is implied and the dispersive boundary term is eliminated by the boundary conditions of the transport problem. Now, I apply Green's theorem to the advective term. In the resulting boundary term, I replace the velocity v by the boundary fluxes from the flow problem:

\[
\int_{\Gamma} N^T n^T \frac{1}{\theta}\, \hat{q}_b\, N\, d\Gamma\; \hat{m}_k
+ \int_{\Omega} \left[ -(\nabla N)^T v^T N + \tau (\nabla N)^T v v^T \nabla N + (\nabla N)^T D \nabla N \right] d\Omega\; \hat{m}_k
= \int_{\Omega} \left( N^T + \tau (\nabla N)^T v \right) N\, d\Omega\; k\, \hat{m}_{k-1} .
\]

In contrast to other approaches that inconsistently approximate the boundary fluxes using a diagonalized form of the load vector from the flow problem, my formulation is fully consistent and does not suffer from oscillations at Neumann boundaries.

The moment-generating equation for the second central temporal moment in a divergence-free steady-state flow field differs from the generating equation for the k-th temporal moment only through the following source term (eqs. 3.11 & 3.13). After normalizing the zeroth moment, the source term simplifies to:

\[
2 \nabla m_1 \cdot \left( D \nabla m_1 \right) .
\]


For the sake of computational efficiency, I approximate the source term by the standard Galerkin method:

\[
2 \int_{V} (\nabla N)^T D\, (\nabla N)\, \hat{m}_1\, N\, dV\; \hat{m}_1 .
\]

9.4 Adjoint State Equations

The adjoint state equations for the hydraulic head and the temporal moments differ from the original equations only by additional source terms, reversed flow direction and the boundary conditions. In the following, I discuss only the additional source terms. All of the additional source terms are of the same type. To state an example, I will explain how to evaluate the source term in the adjoint state equation for the first temporal moment (eq. 7.40) within the Finite Element framework. The source term is

\[
-4 \nabla \cdot \left( \psi_{2c} \frac{D}{m_0} \nabla \bar{m}_1 \right) .
\]

After normalizing the zeroth moment, I apply the method of weighted residuals to obtain:

\[
-4 \int_{V_{el}} W \nabla \cdot \left( D (\nabla N) \hat{m}_1 N \right) dV\; \hat{\psi}_{2c} .
\]

Then, I apply Green's theorem to shift the divergence operator onto the weighting function. Further, I use the standard Galerkin weighting function for the sake of computational efficiency, leading to

\[
4 D \int_{V_{el}} (\nabla N)^T (\nabla N)\, \hat{m}_1\, N\, dV\; \hat{\psi}_{2c}
- 4 D \int_{\Gamma_{el}} N^T n^T (\nabla N)\, \hat{m}_1\, N\, d\Gamma\; \hat{\psi}_{2c} . \tag{9.1}
\]

In the adjoint state equation for the hydraulic head, boundary terms of the following form appear:

\[
\frac{K_g}{\theta} \left( n \cdot \nabla \bar{m}_{2c} \right) \psi_{2c} .
\]

Obviously, their standard Galerkin formulation is identical to the element boundary integral from eq. (9.1):

\[
\frac{K_g}{\theta} \int_{\Gamma} N^T n^T (\nabla N)\, \hat{m}_{2c}\, N\, d\Gamma\; \hat{\psi}_{2c} .
\]

9.5 Post-processing for Sensitivities

The expressions for the sensitivities derived in Chapter 7 are:

\[
\frac{\partial Z_i}{\partial Y_j} = \int_{V_j} \left( -\bar{K} \nabla \bar{\phi} \cdot \nabla \psi_\phi
+ \psi_1 \frac{\bar{K}}{\theta} \nabla \bar{\phi} \cdot \nabla \bar{m}_1
+ \psi_{2c} \frac{\bar{K}}{\theta} \nabla \bar{\phi} \cdot \nabla \bar{m}_{2c} \right) dV_j
\]

\[
\frac{\partial q}{\partial Y_j} = -\frac{1}{\hat{\phi}_{in}} \int_{V_j} \left( \bar{K} \nabla \bar{\phi} \cdot \nabla \bar{\phi} \right) dV_j
\]

\[
\frac{\partial Z_i}{\partial \Xi_j} = \int_{V_j} \left( \nabla \psi_{2c} \cdot \bar{D} \nabla \bar{m}_{2c}
- 2 \frac{\psi_{2c}}{m_0} \nabla \bar{m}_1 \cdot \bar{D} \nabla \bar{m}_1
+ \nabla \psi_2 \cdot \bar{D} \nabla \bar{m}_2
+ \nabla \psi_1 \cdot \bar{D} \nabla \bar{m}_1 \right) dV_j .
\]


Apparently, these expressions consist of two different types of terms: (1) scalar products of gradients multiplied by elementwise constant factors, and (2) the latter multiplied by an adjoint state variable. I treat all terms of the first type as demonstrated by the following example. Using the standard Galerkin method leads to

\[
\int_{V_{el}} \bar{K} \nabla \bar{\phi} \cdot \nabla \psi_\phi\, dV_{el}
= \bar{K}\, \hat{\phi}^T \int_{V_{el}} (\nabla N)^T (\nabla N)\, dV_{el}\; \hat{\psi}_\phi ,
\]

which is apparently based on the element stiffness matrix.

For terms of the second type, I use expressions of the form:

\[
\int_{V_{el}} \frac{\bar{K}}{\theta}\, \psi_1 \nabla \bar{\phi} \cdot \nabla \bar{m}_1\, dV_{el}
= -\frac{\bar{K}}{\theta}\, \hat{m}_1^T \int_{V_{el}} (\nabla N)^T (\nabla N)\, \hat{\phi}\, N\, dV_{el}\; \hat{\psi}_1 .
\]

This integral is formally identical to the volume integral in the source terms for the adjoint state equations for the temporal moments (eq. 9.1).

9.6 Computational Speedup

The standard method for assembling the global system of equations in the Finite Element Method is to evaluate all element-related matrices elementwise in a big loop over all elements. Depending on the type of the element integrals, this includes further sub-loops for numerical integration, resulting in fairly large computational costs. The analytical expressions presented in the Appendix greatly reduce these computational costs for several reasons. First, they replace numerical integration at the element level for most types of element integrals. Second, all terms are broken up into submatrices that are identical for all elements, so that the matrices only need to be computed once. Third, all subsequent multiplications with nodal values can be executed in a single global vectorized multiplication. In some cases, the vectorization may require multiplications on third-order tensors. The speedup I could achieve with this technique ranges from a factor of 100 for the assembly of the global stiffness matrix in the groundwater flow problem to a factor of 1000 for the evaluation of source terms in the adjoint state equations.

Chapter 10

Application to Artificial Data

In this chapter, I test the proposed new method by running test cases on artificial data sets. Special aspects of interest are the impact of data quality and quantity, the consequences of lumping the transverse and longitudinal components of dispersion into a single scalar parameter, the consequences of using temporal moments instead of full breakthrough curves, and the stability of the method. Once these issues are covered, I can confidently apply the new method to the experimental data set in the next chapter.

I generate the artificial data sets numerically on a computer. The advantage is that the 'unknown reality' is a known realization of log-conductivity used as artificial reality. Hence, all parameters and state variables are exactly known everywhere in the domain, unaffected by measurement error or a limited number of measurements. Further, systematic errors or uncontrolled conditions that may appear in some experiments do not exist, and the forward model describing the system behavior is exact. This allows me to directly compare the results of the inverse model to the artificial reality.

The test cases generated in this chapter are smaller versions of the experiments conducted within the framework of the overarching project. Therefore, the dispersion coefficient of interest quantifies effective dispersion in the sense of Dentz et al. (2000a) [22], and the dimensions of the virtual domain reflect the laboratory scale.

10.1 Basic Test Case

To produce artificial data sets for test cases, I generate realizations of log K using the exponential model and the parameters shown in Table 10.1. Then, I solve the groundwater flow equation and the generating equations for the first and second central temporal moments. The virtual domain is rectangular. The boundary conditions for the flow problem are a fixed-head condition on the left and right sections and a no-flow condition on the top and bottom sections, enforcing groundwater flow from left to right. The boundary conditions for the tracer are an instantaneous release on the left boundary at time zero, zero flux on top and bottom, and no diffusive flux on the right. In the moment-generating equations, I use a local dispersion tensor defined by the local longitudinal and transverse dispersivities α_ℓ and α_t. For the parameter values relevant for flow and transport, see Table 10.1. Figure 10.1 shows the artificial reality generated for the basic test case, represented by the random realization of log K, the first and second central temporal moments and a flow net. Finally, I pick values of log K and the state variables at chosen locations and add white noise to obtain artificial measurements.


Table 10.1: Parameters for artificial test cases

parameter            units   value     |  parameter            units   value
domain length Lx     m       4         |  domain length Ly     m       0.5
grid spacing dx      mm      10        |  grid spacing dy      mm      5
correl. length λx    m       0.3       |  correl. length λy    m       0.075
mean K = exp(Y)      m/s     1·10⁻³    |  variance σ²Y         –       1
data spacing Y       –       5/6·λ     |  data spacing φ       –       5/6·λ
data spacing m1      –       5/6·λ     |  data spacing m2c     –       5/6·λ
error σY             –       1         |  error σφ             mm      1
error σm1            %       10        |  error σm2c           %       25
porosity             –       0.3       |  diffusion Dm         m²/s    10⁻¹⁰
dispersivity αl      mm      5         |  dispersivity αt      mm      0.5
mean ∇φ              –       0.0025    |  mean v               m/s     8·10⁻⁶

110

Application to Artificial Data

Figure 10.1: Test case. True log K distribution and resulting flow net and temporal moments from numerical simulation

Figure 10.2: Results of geostatistical inversing for an artificial data set extracted from the test case displayed in Figure 10.1. The estimated geometric mean value of Ds is 3.1 · 10−8 m2 /s. This corresponds to a dispersivity of α = 0.0038m, a value between the local longitudinal and transverse dispersivity used for generating the artificial problem. Since the available data were sufficient to characterize the flow field down to a rather small scale, this small order of magnitude was to be expected. I discuss the dependency of

10.2 Input Data and Properties

111

log Ds on the quantity and quality of input data in the upcoming section. There, I also discuss the relation between log Ds as a scalar coefficient and transverse and longitudinal dispersion.

10.2 Input Data and Properties This section investigates the impact of data quantity and quality. It has effects on (1) the magnitude of the estimated dispersion coefficient, (2) the predictive power of the estimated parameters, (3) the quality of estimated breakthrough curves and (4) the character of the dispersion coefficient.

10.2.1 Magnitude of the Coefficient To investigate the impact of data quantity and quality, I performed a series of additional test cases. They are mostly identical to the base case presented above, but they differ in the number of measurements and in the error attributed to the measurements in the conditioning procedure.

Figure 10.3: Effect of doubling the transverse resolution of the measurement grid

Impact of Data Quantity and Quality Figure 10.3 shows a test case in which the space between observation points in transverse direction is half of the space in the base case (see Figure 10.2 for comparison). Apparently, more variability in the conductivity field can be resolved, so that the plume is dispersed by the flow field to a higher degree. The estimated dispersion coefficient is adequately lower: it has to parameterize a smaller amount of unresolved variability. Vice versa, a lower number of measurements results in smoother

112

Application to Artificial Data

Figure 10.4: Effect of doubling the measurement error conductivity and flow fields that require a larger dispersion coefficient. Again, white circles with black dots mark the locations of measurements. The density of data points along transverse directions is more important than in longitudinal direction. The reasons are that the distortion of solute clouds occurs mainly along the direction of mean flow, so that the correlation length of transport-related quantities is much higher in longitudinal direction than in transverse direction. This issue is closely related to the anisotropy of the spatial structure of log Ds addressed in Section 10.3. Figure 10.4 shows an example from the test cases to illustrate the impact of data quality. Here, the measurement error was double compared to the base case. Lower data quality leads to less variability in the estimated conductivity field, resulting in larger values of the estimated dispersion coefficient. 2 A simple scalar measure for the quality and quantity of data is σ ¯ Y,c , the spatial average of the estimation variance of Y = log K normalized by the prior variance of log K. In Figure 10.5, I plotted 2 the mean value of log Ds versus σ ¯Y,c for all test cases. It makes clear how the value of the estimated dispersion coefficient decreases with data quality and quantity.

Cirpka and Nowak (2003) [15] provide analytical expressions for the effective and macrodispersion tensor based on the averaged posterior covariance of log-conductivity fields after kriging. Their general conclusions confirm my findings. For a more detailed analysis, their method could be extended to include indirect measurements. This, however, would exceed the scope of my thesis. Limiting Cases and Base Functions for the Prior Mean The lower limiting value of log Ds appears when the data are sufficient to fully resolve the variability of the conductivity and flow field. The resulting expected value is in the order of the transverse local

10.2 Input Data and Properties

113

mean value logD

−16

−17

−18

0.2

0.4 0.6 0.8 normalized est. var. logK

1

Figure 10.5: Estimated log Ds an data quality and quantity dispersion coefficient. The upper limiting case is when the available data resolve no variability at all. The resulting expected value for this case is in the order of the travel-time dependent longitudinal effective dispersion coefficient according to Dentz et al. (2000a) [22]. Why a longitudinal coefficient sets the upper limit and a transverse coefficient sets the lower limit is discussed in Section 10.2.3. Obviously, the expected value of log Ds is rather uncertain and depends on the amount of information contained in the available flow data. To include this insight in the inversion procedure, I define the prior mean of log Ds as uncertain mean with two base functions. The first base function is spatially uniform and has an uncertain value according to the lower limiting case. The second is a linear trend in the direction of macroscopic flow, approximating the early-time behavior of the longitudinal effective dispersion coefficient.

10.2.2 Estimated Breakthrough Curves In this section, I compare the breakthrough curves reproduced by the inverse model to the original breakthrough curves. In the inverse model, I only use the first and second central temporal moment of the breakthrough curves. The main question for this section is, how good the moments characterize the breakthrough curves and how much information is lost by disregarding the full time series. To answer this question, I evaluated the above series of test cases under a different aspect. Using a transient transport model on the estimated parameters fields, I computed the estimated breakthrough curves and compared them to the original ones. Figure 10.6 shows how the match of the breakthrough curves improves with decreasing data spacing. Subfigure (a) shows the original breakthrough curves from the artificial data set. Subfigures (b) to (d) compare the estimated breakthrough curves (black) to the original ones (gray) for decreasing transverse spacing ∆t of observations relative to the correlation length in y-direction with ∆t = 2λy (a), ∆t = λy (b) and ∆t = 0.5λy (c).


All cases reproduce the first and second central temporal moments of the measured breakthrough curves, since the moments were used as input data. For data spacing larger than the correlation length (Subfigure b), the match of the overall shape is poor. For example, bimodal breakthrough or tailing effects are not reproduced well. For data spacing equal to the correlation length (Subfigure c), the match becomes good. Introducing additional observations still improves the match a little (Subfigure d), but the effort is out of proportion to the additional quality of the breakthrough curves.

Figure 10.6: Simulated breakthrough curves (c/c_ref over t/t_end) for different data spacing

The shape of the breakthrough curves can be met although I only use lower-order temporal moments, for the following reason. For conservative tracers, effects like tailing and bimodality of breakthrough are caused by transverse diffusive-dispersive exchange between adjacent streamlines with different velocities, arrival times and spreads of breakthrough. Geostatistical inversion interpolates both the unknown parameters and the state variables between the locations of observations. This includes the arrival time and spread of breakthrough in streamlines adjacent to observations. With increasing quality and quantity of input data, these quantities are interpolated more reliably between the observations, and the simulated breakthrough curves approach their true shape. The conclusion is that the ability of temporal moments to characterize features of the breakthrough curves not included in the moments themselves depends on data quality and quantity.


10.2.3 Longitudinal and Transverse Character

This section deals with the character of the estimated dispersion coefficient. Since it is a scalar quantity, the same value applies to both transverse and longitudinal dispersion. I will show that, under certain conditions, either its longitudinal or its transverse effect is the dominant factor for longitudinal effective dispersion and hence determines the estimated value.

In Sections 4.2.1 and 4.2.4, I discussed the influence of transverse local dispersion on the second central temporal moment and on longitudinal effective dispersion. With strong transverse contrasts in the flow field, transverse local dispersion is aliased into longitudinal effective dispersion. The aliasing effects dominate over longitudinal local dispersion. Based on the generating equation for the second central temporal moments, I predicted that above a certain value of local transverse dispersion, the effect of aliasing is again reduced, since transverse contrasts in the first temporal moment are suppressed. Further, I concluded that the exact behavior in this respect depends on the magnitude of resolved small-scale variability in the flow field. The amount of variability recovered by geostatistical inversing of course depends on the quality and quantity of input data. Hence, I expect the estimated dispersion coefficient to have a different character depending on the quantity and quality of input data.

In the absence of transverse contrasts in the flow field, the transverse effect of the estimated dispersion coefficient is irrelevant. Then, its value is identified based on its longitudinal effects. If transverse contrasts of the flow field are well resolved, its transverse effects are aliased into longitudinal effective dispersion. The direct longitudinal effects become less relevant, and its value is identified mainly based on the aliased transverse effects.

To support these considerations, I set up one more series of test cases. Using the relation α_ℓ = α_t = D/v, I separate the transverse and longitudinal dispersivities in the scalar log-dispersion coefficient. Then, I can perform a sensitivity analysis on the second central temporal moment to check the relative importance of the two components. Since I expect the sensitivity to depend on the available data, I performed the sensitivity analysis for all test cases considered in Section 10.2.1.

Figure 10.7 illustrates the result. It shows the sensitivity of m2c, averaged over the outflow section, with respect to the components α_ℓ and α_t separated from log D, plotted over the averaged normalized estimation variance of log K. Higher data quality and quantity corresponds to lower estimation variance. The crosses correspond to a simple average of m2c, while the circles correspond to the macroscopic averaging procedure that includes the variance of m1 at the outflow section in the average of m2c. Clearly, for high data quality and quantity, the second central temporal moment depends mainly on the separated transverse component of the estimated dispersion coefficient. For low data quality and quantity, it depends mainly on the separated longitudinal component. Further, Subfigure (b) shows that the sensitivity of the second central moment with respect to the transverse component is indeed negative.

10.3 Structural Parameters

The structural parameters of log Ds, like its variance and integral scales, are not known a priori. Methods of variogram fitting are not applicable since direct measurements of this quantity do not exist. In principle, one could derive the covariance function of dispersion coefficients using linear stochastic theory based on the posterior covariance of the estimated conductivity field. This, however, would exceed the scope of my thesis.


Figure 10.7: Impact of the components of log Ds depending on input data (panels: (a) ∂m2c/∂αℓ, (b) −∂m2c/∂αt, plotted over σ̄²_{Y,c})

Hence, I decided to identify the structural parameters from the observations following the method by Kitanidis (1995) [56]. The mathematics are outlined in Section 6.4. To obtain an educated initial guess and uncertain prior knowledge, I suggest the following. Given realizations of log K with known structural parameters and corresponding estimated conductivity fields conditioned on hydraulic data, the underestimation of the second central temporal moment in the estimated field can be evaluated. The spatial derivative of this quantity along the direction of macroscopic flow is a rough estimate for a corrective longitudinal dispersion coefficient. The uncertain prior knowledge on the structural parameters can be obtained from this rough estimate. The actual values are hard to identify from the observations without prior knowledge.

The structural parameters identified in different test cases differed significantly. However, all test cases showed one common pattern: the transverse correlation length λx is always smaller for log Ds than for log K, and the longitudinal correlation length λy tends to be larger for log Ds than for log K. I explain this behavior as follows: Hot spots for dilution and mixing can be found at the finger-shaped features of solute clouds that are distorted by the irregularity of advection in heterogeneous aquifers. To be more precise, it is the zones of sharp transverse contrasts in concentration along the fringes of these finger-shaped features. If these features are under-predicted in the estimated log-conductivity field, the dispersion coefficient has to make up for the mixing zones at their fringes. Hence, the spatial structure of the dispersion coefficient must roughly mimic the spatial structure of the missing fringes of the finger-shaped features.

Typically, these features are very thin in the transverse and long in the longitudinal direction. Their average thickness is in the order of the transverse correlation length of log K, so that their fringes


typically are thinner than half the correlation length. In the longitudinal direction, however, since the flow systematically seeks paths of low resistance, they may extend over longer distances than the correlation length of log K. The resulting spatial structure of log Ds has a smaller transverse and a larger longitudinal correlation length than log K. As a rough estimate, the transverse correlation length of log Ds is taken to be half as large as that of log K, and the longitudinal correlation length of log Ds twice as large as that of log K. In the estimation of structural parameters, I include this educated guess as uncertain prior knowledge about the structural parameters.

The values for the structural parameters identified in different test cases differ significantly. However, all test cases confirm that the transverse correlation length λx is smaller for log Ds than for log K, and the longitudinal correlation length λy tends to be larger for log Ds than for log K. The problems in identifying the value of λx are most probably caused by an insufficient number of measurements in the transverse direction: in most cases, λx was smaller than the transverse spacing of measurements. As for λy, the results of the estimation are relatively insensitive to its actual value under the given conditions.

The spatial structure of log Ds is defined by the geostatistical parameterization Qss H^T in eq. (6.16). The patterns within the sensitivity matrix H dominate over the spatial correlation described by Qss: Figure 7.1 in Chapter 7 shows that the patterns of the sensitivities with respect to log Ds extend along the up-gradient streamline from the location of measurement to the inflow boundary of the domain. Another reason might be that the size of the domain used in the test cases was definitely too small for log Ds to reach ergodicity, and the structural parameters might depend on the ability of single observations to resolve certain spatial structures.

10.4 Stability Analysis

10.4.1 Interdependencies

Four aspects of interdependency are of special interest when analyzing the stability of the method: the interdependency of the processes, of the unknown parameters, of the sensitivities, and of the observations.

Interdependency of Processes

In forward models for groundwater flow and solute transport, the process of transport depends strongly on the process of flow. For the case of constant fluid properties, this dependency is usually unidirectional: transport processes are modeled based on the previously evaluated flow field. The prediction of transport cannot be better than the prediction of flow. Inverse models for flow and transport behave the same way: the process of transport cannot be inverted better than the process of flow. Further, if flow and transport are inverted jointly, the inversion of transport cannot converge before the inversion of flow has converged.

I shall provide an example. The arrival time and entropy of a solute cloud in nature always increase along its flow path. Assume there are strong transverse contrasts in both quantities. At the same time, assume that the meandering of streamlines is estimated incorrectly, either because of poor flow data or because the inversion of flow has not yet converged. Together, this causes an unphysical interpretation of the transport data: the observed arrival time and entropy may seemingly decrease along a streamline. In such cases, the estimated values for log K and log D locally approach excessively small values, and the entire inversion procedure may stagnate.


The interdependency of processes is reflected in the interdependency of parameters, sensitivities and observations.

Interdependency of Parameters

In forward models, the parameters for flow and transport, like hydraulic conductivity and dispersivity, are independent of each other. The magnitude of dispersivity depends on the scale at which the system is modeled; this scale is fixed and does not change. When jointly inverting flow and transport, however, the required value of the dispersion coefficient depends on the amount of variability in the flow field that is not resolved. If the inversion procedure is iterative and successively resolves more and more variability, the required value of the dispersion coefficient changes during the course of iteration.

Interdependency of Sensitivities

Geostatistical inversion uses the sensitivities of the observations with respect to the unknown parameters to compute the auto-covariance between the observations. This quantity is used to interpret the available data. For non-linear problems, the sensitivities change during the inversion procedure, and so does the interpretation of the data. Different types of observations are affected to different degrees. Observations of conductivity, for example, are not affected. The sensitivities of flow-related quantities depend on the current value of the flow parameters, i.e., of conductivity. The sensitivities of transport-related quantities depend on both flow and transport parameters and on flow-related state variables. Moreover, the sensitivities of higher-order temporal moments depend on the current value of lower-order temporal moments. These interdependencies can be seen from the adjoint-state equations and expressions for sensitivities in Chapter 7. In summary, the sensitivities depend on all previous quantities in the following order: log-conductivity log K, hydraulic head φ, first temporal moment m1, and second central moment m2c. The degree of non-linearity increases in the same order, so that the later quantities are more prone to instabilities in an inversion process based on successive linearization.

Interdependency of Observations

Observations that are highly interdependent, in the sense that their sensitivities are linearly dependent, lead to a poor condition of the auto-covariance matrix of the observations. This gives rise to numerical error in the inversion procedure. Hence, if there is a choice, the types of observation should be as independent as possible. I discuss this point in more detail in the next section.

Conclusions

From these points, I draw four conclusions:

• The data must convey sufficient information on the process of flow. Otherwise, the process of transport cannot be inverted.

• The inversion of transport cannot be more accurate than the inversion of flow. The error attributed to observations of transport-related quantities must include both measurement error and the model error in the inversion of flow.


• The inversion of transport cannot be expected to converge before the inversion of flow has converged. If the overall inversion algorithm does not converge, the situation can be improved by first inverting flow and then using the result as initial guess for the joint inversion of flow and transport. Alternatively, the Levenberg-Marquardt algorithm introduced in Chapter 6 can be set up to modify the measurement error of flow and transport data separately, such that transport data are attributed a high error until the flow data are fitted well.

• If applicable, the types of observations should be chosen to be as independent as possible.

10.4.2 Normalized Second Central Moment

The second central temporal moment m2c is especially susceptible to the instabilities mentioned above. First, being a state variable of the transport process, it depends on both flow and transport parameters. Second, for the same reason, it depends on all other state variables of flow and transport. The dependency of the second central temporal moment on the first temporal moment can be reduced by normalizing it by the latter. This yields the normalized second central temporal moment m2cn I defined in Chapter 3. Unfortunately, the generating equation for this quantity has several cumbersome source terms, so that I have evaluated it from m2c and m1 in a post-processing step. The sensitivities for m2cn are also obtained from post-processing the sensitivities of m2c and m1 (see eqs. 7.43 and 7.50). Therefore, m2cn may be quite susceptible to numerical error.

In order to assess the advantages and disadvantages of the normalization, I repeated some test cases, this time using observations of m2cn instead of m2c. The condition of the auto-covariance matrix of the observations improved by seven orders of magnitude. This is due both to a more uniform magnitude of the single entries and to a reduced interdependency of the observations. The computational effort per iteration step slightly increased, but the number of iteration steps decreased by an average of twenty percent. There were no observable artefacts caused by numerical error. Unfortunately, I implemented the normalized second central moment only at a late stage of this thesis, so that most computations are based on observations of the second central moment.
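To illustrate the post-processing step, the following sketch computes m2cn and its sensitivity from m2c and m1 by the chain rule. The exact definition of m2cn is given in Chapter 3; here the common normalization m2cn = m2c/m1² is assumed for illustration, so both the formula and the resulting derivative are stated as assumptions, not as the exact expressions of eqs. 7.43 and 7.50.

```python
import numpy as np

def normalize_m2c(m1, m2c, H_m1, H_m2c):
    """Post-processing sketch: normalized second central moment and its
    sensitivity via the chain rule. ASSUMES m2cn = m2c / m1**2; the exact
    normalization is defined in Chapter 3 of the thesis.
    m1, m2c:      moments at the observation points, shape (n_obs,)
    H_m1, H_m2c:  sensitivity matrices, shape (n_obs, n_params)"""
    m2cn = m2c / m1 ** 2
    # d(m2c / m1^2)/dp = (1/m1^2) dm2c/dp - (2 m2c / m1^3) dm1/dp
    H_m2cn = H_m2c / m1[:, None] ** 2 - (2.0 * m2c / m1 ** 3)[:, None] * H_m1
    return m2cn, H_m2cn
```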

10.5 Summary and Conclusions

Altogether, the test cases showed that the method fulfills what it promises: joint inversion of flow and transport produces estimated parameters that adequately represent the system to be modeled for both flow and transport considerations. I demonstrated this on test cases in which I estimated both log-conductivity log K and a scalar log-dispersion coefficient log Ds given observations of hydraulic and tracer data.

The properties of the estimated parameter field depend strongly on the quality and quantity of the input data used. For increasing quality and quantity of input data, the amount of variability resolved in the conductivity field increases, and the mean value of the dispersion coefficient decreases. If the variability is well resolved, the transverse effects of the scalar dispersion coefficient prevail; if less variability is resolved, the longitudinal character dominates. The more information is available, the better the match of the full breakthrough curves, even if only the first and second central temporal moments of the breakthrough curves are used as input data. The structural parameters for the estimated dispersion coefficient cannot be known a priori but can be identified from the available data.


The method converged in all test cases. However, four important aspects have to be kept in mind. (1) The conditioning on transport data cannot be more accurate and cannot converge earlier than the conditioning on flow data. (2) To ensure stability and obtain physically meaningful estimates, the flow field has to be characterized sufficiently well by data on flow-related quantities. (3) The measurement error assigned to observations of transport-related quantities in the inverse model has to account for both the actual measurement error and the model error in the estimated flow field. (4) High interdependencies between observations can lead to numerical error in the inversion procedure. Normalizing the second central temporal moments of breakthrough curves reduces the interdependency of the first and the second central moments and hence increases the stability of the method.

Chapter 11

Application to Experimental Data

Within the framework of the overarching project, Jose and Rahman (Jose, 2004; Rahman, 2004 [48, 76]) conducted tracer experiments in a large-scale pseudo 2-D sandbox. The purpose of the experiments was to verify theoretical studies on effective dispersion, dilution and reactive mixing with experimental data. Both authors evaluated and discussed their experimental results using analytical solutions and apparent dispersion coefficients.

In this chapter, I apply my proposed new method to the experimental data set by Jose (2004) to test how well the method can handle realistic conditions. These conditions include an imperfect match between the forward model and the experimental system, and heterogeneity in the form of natural sedimentation patterns that do not correspond to a known geostatistical model. Further, I compare the results from my geostatistical inverse model with their key findings to cross-validate my model and their results.

11.1 Experimental Data Set

The experimental setup and the theoretical background for the experiments are described in detail in the dissertations by Jose (2004) and Rahman (2004) [48, 76]. In the following, I provide a rough overview of the setup and of the experiments relevant for my thesis.

Experimental Setup

The sandbox has dimensions of 14 m × 0.13 m × 0.5 m (length × width × height). The front face of the long side is equipped with glass panels for direct observation of tracer migration. The left and right faces are connected to inlet and outlet chambers with hydraulic switchboards to control the hydraulic boundary conditions. The sand is covered on top with a bentonite layer to simulate confined conditions. The sandbox is filled heterogeneously with lenses of four types of sand differing in their grain size distribution. The lenses have an average length of 3 m and an average height of 0.08 m. The filling procedure resembled a sedimentation process, leading to microstructures within the lenses. Figure 11.1 displays the outline of the sand lenses in the box, and Figure 11.2 shows a 1 m long section to exemplify the types of microstructures. Table 11.1 provides the data describing the four sand types. The grayscale


values in Figure 11.1 are: black: fine sand; dark gray: medium sand; light gray: mixed sand; white: coarse sand. The important aspect of the filling for geostatistical inverse modeling is that the filling is not constructed deterministically according to a known distribution and does not obey a second-order stationary multi-Gaussian covariance function. Due to the microstructures within the lenses, and especially due to thin layers of ultra-fine grains, the lenses are not necessarily the hydraulically dominating structures in the spatial distribution of hydraulic conductivity.

Figure 11.1: Filling of the sandbox

Figure 11.2: Microstructures within the sand lenses

The hydraulic boundary conditions for flow are given by a constant flux of 3 ℓ/s, resulting in a steady-state flow field with a head difference of approximately 3.5 cm across the 14 m length of the box. Initially, the box was flushed with tracer-free water. Starting at time zero, tracer solutions were injected into the inflow of the box. A dense grid of piezometers and fiberoptic probes was installed. Ten cross-sections with ten piezometers each are evenly distributed along the box to collect measurements of hydraulic head.


Table 11.1: Sand Types for the sandbox

Sand Type   Grain Size          Hydraulic Conductivity
Coarse      1.0 mm – 2.5 mm     1.67 · 10−2 m/s
Mixed       0.3 mm – 1.2 mm     4.32 · 10−3 m/s
Medium      0.0 mm – 3.0 mm     9.09 · 10−4 m/s
Fine        0.1 mm – 0.8 mm     5.61 · 10−4 m/s

The piezometric probes are connected to an array of glass tubes on a board with a millimeter scale. Fiberoptic probes for point-like measurement of breakthrough curves are arranged in ten cross-sections between the cross-sections of piezometers. The number of probes per cross-section alternates between ten and nineteen. The total flux can be measured at the outlet.

Experiment on Longitudinal Dispersion

In the setup described above, Jose (2004) [48] conducted a conservative tracer experiment to investigate longitudinal effective dispersion. He continuously injected a fluorescent tracer solution over the entire inflow section. Data of total flux and hydraulic head were collected from the steady-state flow field. The breakthrough curves of the tracer were recorded in highly resolved time series, with one reading every 60 seconds. The experiment lasted roughly four weeks, until the entire box was flushed with the tracer solution.

Experiment on Transverse Dispersion

Rahman (2004) [76] conducted an experiment on transverse effective dispersion. He injected a tracer solution of Cochineal Red A (E 124) into the lower half of the inflow section. At steady state, he took digital photos through the glass panels to visualize the spatial distribution of the plume.

Clogging of Pore Space and Temporal Fluctuations in Conductivity

After a relatively short time period during the experiments, one fourth of the total head loss was observed to occur within the first meter of the sandbox. The experimentalists believe that this was caused by clogging of the pore space. A potential reason is growth of biomass in spite of anti-microbial additives to the inflow. Further, surfactants added to suppress the sorption of tracers onto the sand grains have been observed to form larger micelles and fine submerged foam that might have reduced the free pore space. Over time, the total conductivity of the sandbox decreased, especially after the surfactants were introduced.

The duration of a single experiment to obtain full tracer breakthrough was about one month. The relation between total discharge and total difference in hydraulic head was observed to fluctuate within these periods. Because a constant-flux boundary condition was used for flow, these fluctuations are believed not to affect the flow velocities and hence not to affect the transport of the tracers. In case these fluctuations are caused by local effects, however, the streamline pattern changes over time. Then, the snapshot-like measurements of hydraulic head and the time-integrated measurements of arrival time may be inconsistent. Given the information and measurements collected during the experiments, this issue cannot be resolved.


11.2 Application

Input Data

As input data, I used measurements of hydraulic heads and temporal moments of local breakthrough curves from the experiment on longitudinal dispersion by Jose (2004) [48]. Prior to evaluating the first and second central truncated temporal moments, I applied a noise filter to the breakthrough curves. The truncated moments are identical to the temporal moments computed from the derivative of the breakthrough curves, hence corresponding to the case of instantaneous injection. Figure 11.3 shows an example of a measured breakthrough curve (top) and the curve after removing noise and taking the derivative (bottom). To minimize the impact of long-time drift effects of the fluorimetric system and of the remaining noise, I defined cutoff points (black circles) and disregarded the time series before and after these points. Since the cutoff points were hard to choose in cases where an irregular long-time drift and the breakthrough signal were superimposed, some measurements are subject to a rather high measurement error. The measurement error I attributed to each type of measurement is listed in Table 11.2. I assigned an extra-high measurement error to those measurements suffering from extremely poor signal-to-noise ratio or from excessive drift effects. To visualize the determined values of the temporal moments, the dash-dotted line shows a Gaussian function with the same zeroth, first and second central temporal moments.

Figure 11.3: Experimental and filtered truncated breakthrough curve. (a) measured concentration c/c0; (b) filtered time derivative ∂/∂t (c/c0); both plotted over t [h].
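As a minimal numerical sketch of this evaluation (assuming the breakthrough curve has already been noise-filtered and truncated at the cutoff points; the actual filter used in the thesis is not reproduced here):

```python
import numpy as np
from scipy.integrate import trapezoid

def truncated_moments(t, c):
    """Temporal moments of a breakthrough curve c(t) from continuous
    injection. Taking the time derivative converts it into the response
    to an instantaneous injection (cf. Figure 11.3b)."""
    dc = np.gradient(c, t)                            # pulse-response curve
    m0 = trapezoid(dc, t)                             # zeroth moment (recovered mass)
    m1 = trapezoid(t * dc, t) / m0                    # mean arrival time
    m2c = trapezoid((t - m1) ** 2 * dc, t) / m0       # second central moment
    return m0, m1, m2c

# Synthetic example with a smooth, noise-free breakthrough curve:
t = np.linspace(0.0, 100.0, 2001)                     # [h]
c = 0.5 * (1.0 + np.tanh((t - 40.0) / 8.0))           # normalized concentration c/c0
m0, m1, m2c = truncated_moments(t, c)                 # m1 comes out close to 40 h
```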

Prior Knowledge

For the unknown parameters, both log K and log Ds, I used the exponential geostatistical model with uncertain structural parameters. As prior mean of log K, I used a constant uncertain mean plus an additional base function simulating the clogging of the pore space in the first meter of the sandbox. The prior mean for log Ds is discussed in the previous chapter. Table 11.2 displays the values of the parameters for the prior knowledge. The prior mean values listed there correspond to the base functions for the constant mean. I expected the clogged pore space to have a value of log K lower by 2.5 with a variance


of unity. I set up the linear trend in log Ds as an increase by unity over the length of the sandbox, again with a variance of unity.

Table 11.2: Parameters and prior knowledge used for experimental data set

Parameter                   Units   Value
Domain length Lx            m       14
Domain length Ly            m       0.5
Grid spacing dx             mm      5
Grid spacing dy             mm      2.5
Molecular diffusion Dm      m2/s    10−10
Porosity                    -       0.45
log K variance σY²          -       2
log K prior mean βY*        -       -6
Uncertainty QββY            -       1
log K clogging effect       -       -2
Uncert. of clogging         -       1
log K correl. length λx     m       1
log K correl. length λy     m       0.05
log Ds variance σΞ²         -       4
log Ds prior mean βΞ*       -       -17
Uncertainty QββΞ            -       1
log Ds trend (x)            -       2
Uncert. of trend            -       1
log Ds correl. length λx    m       1
log Ds correl. length λy    m       0.05
Measurement error σφ        mm      1
Measurement error σm1       %       10
Measurement error σm2c      %       25
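For reference, the exponential geostatistical model named above can be sketched as follows. The anisotropy convention assumed here (scaling each separation component by its correlation length before taking the norm) is a common choice; the exact form follows the definitions in the earlier chapters of the thesis.

```python
import numpy as np

def exponential_covariance(hx, hy, variance, lambda_x, lambda_y):
    """Anisotropic exponential covariance between two points separated by
    (hx, hy); used as prior model for both log K and log Ds."""
    h_eff = np.sqrt((hx / lambda_x) ** 2 + (hy / lambda_y) ** 2)
    return variance * np.exp(-h_eff)

# Prior model for log K with the values from Table 11.2:
q = exponential_covariance(hx=0.5, hy=0.02, variance=2.0,
                           lambda_x=1.0, lambda_y=0.05)
```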

Results of the Geostatistical Inversion

The estimation of the unknown parameters converged after approximately twenty inner iteration steps. Thanks to the stabilized Levenberg-Marquardt iteration algorithm, no oscillations or other instabilities occurred. The estimation of structural parameters (see Section 6.4) took three outer iteration steps and converged to values not far from the prior estimates. Overall, the geostatistical inverse model took about four weeks of computation time. Given the fine resolution of the domain, the computations would have been impossible without the spectral methods from Chapter 8.

Figure 11.4 shows the estimated parameter values together with the flow net and the distributions of the first and second central temporal moments. The black and white dots in the log K plot mark the locations of piezometric probes measuring hydraulic heads; those in the log Ds plot mark the fluorimetric probes measuring breakthrough curves.

Structural Parameters

I derived the values for the correlation lengths of log K from the filling pattern of the sandbox. The variances of log K and log Ds as well as the correlation lengths λx and λy of log Ds are estimated from the input data. The resulting values are included in Table 11.3. Values that differed from their prior estimates only insignificantly are taken over from Table 11.2. The transverse correlation length for log Ds is half of the transverse correlation length for log K; the longitudinal correlation length is double. I offer a more detailed discussion of the values of the structural parameters in Section 6.4.


Figure 11.4: Results of the geostatistical inversion for the experimental data set

Posterior Knowledge

The estimation variance of the identified parameters Y = log K and Ξ = log Ds is a measure for the remaining uncertainty after conditioning. It includes the uncertainty of the mean values. Its square root, the estimation standard deviation, is plotted in Figure 11.5. The averaged estimation variance σ̄²est, a scalar measure of uncertainty, is listed in Table 11.3 together with other relevant posterior quantities. Apparently, the averaged estimation variance of log K is reduced relative to the prior variance by a higher factor than the estimation variance of log Ds. This was to be expected since the


measurements of hydraulic heads contribute only to the estimation of log K.

Figure 11.5: Standard deviation of estimation (log K and log Ds) for the experimental data set

The estimation variance does not account for spatial correlation among the unresolved variability. In geostatistical inverse modeling, large-scale variability is in general recovered more easily than small-scale variability. Hence, the estimation variance may be fairly large although a lot of uncertainty in large-scale fluctuations has been removed. A better measure of total uncertainty that accounts for spatial correlation is the averaged posterior covariance. I define it as:

\bar{Q}_{ss|y} = \left( \det Q_{ss|y} \right)^{1/n} ,
where n is the number of unknown parameters, Qss|y is the posterior covariance matrix (eq. 6.24), and det(·) is the matrix determinant. This measure is taken from the normalizing factor of Gaussian probability density functions. It is not a suitable measure of overall uncertainty in case the posterior covariance matrix is rank-deficient, e.g., when direct measurements with zero estimation error are involved. The two measures are identical if there is no spatial correlation, i.e., for white noise.

The posterior uncertainty of the mean values is quantified by the matrix Qββ|s (see eq. 6.4). This matrix contains the auto-covariances and the cross-covariances between the identified values of the uncertain drift coefficients. The estimation variances for each drift coefficient and the correlation coefficients resulting from Qββ|s are displayed in Table 11.4. Before conditioning, Qββ was an identity matrix, i.e., no correlation between the drift coefficients was assumed and the variance of each coefficient was unity. After conditioning, the uncertainty in the constant mean value of log K has almost vanished, probably because of the high number of measurements of the first temporal moment. The effect of pore clogging is still rather uncertain; I attribute this to the fact that no measurements are located directly in the affected zone. The uncertainty in the mean value of log Ds is not reduced as much, most probably because the measurements of the second central moment were comparatively uncertain. The single contributions to the mean values of log K and log Ds are anti-correlated with each other, as expected for addends. The low values of the correlation coefficients between the drift coefficients for log K and those for log Ds show that the mean values of log K and log Ds are almost uncorrelated.
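Numerically, the averaged posterior covariance defined above is best evaluated through the log-determinant, since the determinant of a large covariance matrix under- or overflows in floating-point arithmetic. A minimal sketch, assuming Qss|y is symmetric positive definite (the rank-deficient case is excluded above anyway):

```python
import numpy as np

def averaged_posterior_covariance(Q):
    """(det Q)^(1/n) via the Cholesky factorization Q = L L^T:
    log det Q = 2 * sum(log diag(L)), which avoids overflow for large n."""
    n = Q.shape[0]
    L = np.linalg.cholesky(Q)                    # requires Q positive definite
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return np.exp(logdet / n)
```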


Table 11.3: Posterior parameter values from the experimental data set

Parameter (Y = log K)          Units   Value
Variance σY²                   -       2
Correlation length λx          m       1
Correlation length λy          m       0.05
Avg. est. var. σ̄²Y,est         -       1.08
Avg. post. covar. Q̄YY|y        -       0.73
Posterior mean β̂Y              -       -5.17
Post. uncertainty Q̂ββY         -       0.066
Clogging effect                -       -2.51
Uncert. of clogging            -       0.79

Parameter (Ξ = log Ds)         Units   Value
Variance σΞ²                   -       8
Correlation length λx          m       2
Correlation length λy          m       0.025
Avg. est. var. σ̄²Ξ,est         -       5.68
Avg. post. covar. Q̄ΞΞ|y        -       4.17
Posterior mean β̂Ξ              -       -17.96
Post. uncertainty Q̂ββΞ         -       0.72
Trend over 14 m                -       1.04
Uncert. of trend               -       0.72

Table 11.4: Estimation variances σ² and posterior correlation coefficients r for the uncertain drift coefficients

                  mean of log K   trend of log K   mean of log Ds   trend of log Ds
mean of log K     σ² = 0.0659     r = −0.4455      r = −0.0466      r = −0.0253
trend of log K    r = −0.4455     σ² = 0.7937      r = −0.0011      r = +0.0019
mean of log Ds    r = −0.0466     r = −0.0011      σ² = 0.3583      r = −0.9016
trend of log Ds   r = −0.0253     r = +0.0019      r = −0.9016      σ² = 0.7200

11.3 Discussion

11.3.1 Statistical Tests

The validity of the solution can be tested statistically. First of all, twice the value of the objective function (eq. 6.46) should follow the χ² distribution with m degrees of freedom, where m is the number of observations. The sum of squares of the orthonormal residuals r̂n (eq. 6.42) should behave the same. The difference between the two quantities is that the orthonormal residuals are based on the linearization about the previous estimate, while the value of the objective function is evaluated exactly. With an increasing degree of non-linearity, their values may differ significantly. For the current application, the number of measurements is m = 346 and the values of the above quantities are χ²OF = 202.38 and χ²r̂n = 342.74, respectively. Both values are below their expected value but are within the 95% confidence interval.

The orthonormal residuals are plotted in Figure 11.6. Plus symbols are for measurements of hydraulic head, circles for the first temporal moment, and x-marks for the second central temporal moment. In the optimal case, the orthonormal residuals should be normally distributed with zero mean and unit variance. They should behave like white noise without any spatial correlation or other structure. The values of their mean and variance are close to optimal at μr̂n = −0.05 and σ²r̂n = 0.99, respectively. The plot, however, reveals slightly higher values for measurements of m2c than for m1.

Figure 11.6: Orthonormal residuals for experimental data set (residuals r̂n plotted over measurement number)

Further, there is a slight spatial trend: the residuals tend to be larger for the lower measurement numbers of each observation type, which correspond to the measurements close to the inlet of the sandbox. Altogether, this leads me to place high trust in the validity and quality of the solution everywhere except close to the inlet. In that region, the clogging of the pore space is uncertain and the measurements show a larger mismatch than in other regions.
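A minimal sketch of how the χ² check can be set up (using SciPy's χ² quantile function; the degrees of freedom equal the number of observations):

```python
from scipy.stats import chi2

def chi2_interval(m, level=0.95):
    """Two-sided confidence interval for a chi-square distributed test
    statistic with m degrees of freedom; its expected value is m."""
    alpha = 1.0 - level
    return chi2.ppf(alpha / 2.0, df=m), chi2.ppf(1.0 - alpha / 2.0, df=m)

lo, hi = chi2_interval(m=346)   # interval against which the statistics are compared
```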

11.3.2 Data Quality and Quantity

The quality of the input data is not entirely satisfactory. The deviations of measured hydraulic head from the macroscale expectation in this experiment are typically in the range of a few millimeters. Compared to a measurement error of one millimeter, this results in a poor signal-to-noise ratio of about 5:1. Almost half of the measured breakthrough curves were affected by long-time drift and fluctuations of the same magnitude as the breakthrough signal itself. Additionally, the measured breakthrough curves suffered from a signal-to-noise ratio varying between 100:1 for good probes and 10:1 for bad probes, with a few exceptional outliers that I discarded entirely. This corresponds to roughly twice the measurement error used for the base case in the previous chapter. Overall, the quality of the hydraulic data was more questionable than the quality of the tracer data. I therefore expect the resulting parameter values to be based more on information from tracer data than from hydraulic data.

Further, the hydraulic conductivity close to the inlet changed due to clogging of the pore space, and fluctuations of the total head difference were observed during the experiments. The hydraulic head is measured as single snapshots in time, while the temporal moments of breakthrough curves are quantities integrated over the entire duration of the experiment. This can lead to inconsistencies between the measurements of hydraulic heads and those of temporal moments.

The quantity of the input data, however, is high. Measurements of the hydraulic head are available once per correlation length of log K in both directions. Measurements of breakthrough curves are available at the same spacing, and additionally every second cross-section had twice the number of probes in transverse direction. Based on the experience from the previous chapter, this


density of data is sufficient to ensure a reliable estimate in spite of the relatively high measurement error.

In Chapters 5 and 10, I discussed the impact of data quality and quantity on the character of log Ds. With the expectedly good characterization of the flow field, log Ds should be dominated by its transverse effects. The resulting values should be on the order of the transverse local dispersion coefficient rather than on the order of the longitudinal effective dispersion coefficient.

11.3.3 Verification

Comparison to the Filling Pattern

Figure 11.7 directly compares the pattern of the sandbox filling (Figure 11.1) and the estimated distribution of log K (Figure 11.4). The grayscale is matched for direct comparison. The conductivity values for the sand types are as shown in Table 11.1. A closer look at the figure reveals that most areas of high and low estimated conductivity are in accordance with the conductivity suggested by the filling pattern.

Figure 11.7: Comparison of sandbox filling and estimated log K

From the amount of input data, one could assume that the pattern of the sand filling should have been captured more accurately. There is, however, a list of good reasons why this assumption is not justified. First, as expected from its nature, geostatistical inversion cannot reproduce sharp outlines of lenses, unless the respective outlines are included as separate base functions for the uncertain mean or the measurement spacing is much smaller. Second, the measurements of hydraulic head have a relatively low signal-to-noise ratio and partially convey erroneous information. Third, the conductivity of the single sand types was measured in permeameter tests without the microstructures produced in the sandbox. The microstructures within the lenses, especially the layers of ultra-fines, may play an important role for the hydraulic properties of the sandbox that is not apparent in the filling pattern. Fourth, the filling pattern, especially the downward-curved shape of single lenses, does not correspond to any known theoretical geostatistical model.

Comparison to Apparent Longitudinal Effective Dispersion

Jose (2004) [48] used the temporal moments of the measured local breakthrough curves for two types of evaluation. First, he computed the apparent effective dispersion coefficient Dℓa according to Cirpka and Kitanidis (2000) [13] for each measurement location. Unlike the longitudinal effective dispersion coefficient Dℓe by Dentz et al. (2000a) [22], Dℓa is a quantity averaged over the flow path up to each respective measurement location. The value of Dℓa, averaged over single cross-sections of measurements, starts with Dℓa = 0.75 · 10−5 m2/s at the inlet and increases towards the outlet, reaching Dℓa = 2.75 · 10−5 m2/s. Single hot spots of mixing, identified by extreme values in the probe-wise values of Dℓa, are at (x, y) = (7 m, 0.25 m), (x, y) = (9.75 m, 0.25 m) and (x, y) = (12.6 m, 0.4 m). Second, he fitted the effective dispersion model of Dentz et al. (2000a) to the measured data with the structural parameters of log K and the isotropic local dispersion coefficient Dloc as unknowns. The resulting values are λx = 1.43 m and λy = 0.039 m for the correlation lengths, σY² = 2.45 for the variance, and Dloc = 8.24 · 10−9 m2/s.

Dℓa is a quantity related to uniform stationary macroscopic flow fields with no variability resolved. The value of log Ds obtained from the geostatistical inverse model is coupled to the heterogeneous field of log K identified from the input data. With the variability of flow partially resolved, the value of Ds is expectedly significantly smaller: the geometric mean value of Ds = exp(Ξ) is 1.59 ·


10−8 m2/s, increasing up to 4.57 · 10−8 m2/s in a log-linear trend towards the outlet of the sandbox. This is lower than Dℓa by three orders of magnitude.

The hot spots of mixing are identified based on the measurements of temporal moments. Since these were used as input data to the inverse model, the inverse model by definition reproduces these hot spots. The hot spot at (x, y) = (7 m, 0.25 m), for example, is visible in Figure 11.4 as a sudden increase of m2cn at a height of 0.25 m upstream of the probes located at x = 7 m. It is produced by a higher value of log Ds upstream of the probes in question, and by a zone of shear flow between x = 6 m and x = 7 m appearing as high transverse contrasts in both m1 and specific discharge. The latter can be seen from the distorted streamline pattern in the respective area. The same phenomena can be observed for the hot spot at (x, y) = (12.6 m, 0.4 m) at a height between y = 0.3 m and y = 0.5 m upstream of the probes in question. For the hot spot at (x, y) = (9.75 m, 0.25 m), these phenomena are not as clearly visible.

Another reason why Ds is smaller than Dℓa is that, for a well-resolved flow field, the expected character of Ds is close to that of a transverse local dispersion coefficient. The value of Dloc fitted by Jose (2004), Dloc = 8.24 · 10−9 m2/s, corresponds to a flow field with all variability above the Darcy scale resolved. Since the variability of the flow field is not fully resolved by the inverse model, the mean value D̄s = 1.59 · 10−8 m2/s is higher than Dloc. The difference is merely a factor of two, indicating that the data were sufficient to resolve a great part of the variability that is relevant for dispersion.

Predicting the Experiment on Transverse Dispersion

The experiment on transverse dispersion conducted by Rahman (2004) was not included when identifying the unknown parameters log K and log Ds. I fed the identified parameters into a Finite Element transport model with adequate boundary conditions and solved for steady-state concentration.


Then I compared the results of the experiment and the simulation. In the experiment, Rahman used a fixed-flux boundary condition and introduced the tracer solution into the bottom half of the flux. The boundary conditions for the transport model are chosen so that the dividing streamline for the plume is at 50% of the total discharge.

Figure 11.8 compares a digitally processed image of tracer concentration (a,c) taken from Rahman (2004) to the simulated concentration (b,d), together with the estimated flow net. The thick black line marks the dividing streamline of the plume in the inlet. The grayscale is normalized so that black-to-white corresponds to maximum-to-zero concentration. The details of the image processing are explained in Rahman's thesis. The intensity of red color is roughly proportional to tracer concentration. Additionally, I inverted and magnified the RGB channels to maximize the grayscale contrast between the sand tinted red by the tracer and the undiscolored sand. In regions of lower tracer concentration, disturbances from the color of the sand and from single reflecting sand grains may be significant. However, the picture is good enough for a rough cross-check.

Obviously, there is a mismatch in the tracer pattern in the first three meters of the sandbox. After that, the match is quite good with a few exceptions. Apparently, the hydraulic properties of the sandbox within the first three meters were estimated poorly. This coincides with the conspicuously high orthonormal residuals of the measurements near the inlet shown in Figure 11.6. Further, it coincides with the relatively high uncertainty in estimating the influence of clogged pores in the very same region. Another potential explanation is that this experiment was conducted a year later than the experiment used to identify the parameter values. During that year, clogging of the pore space through the surfactant solutions at different concentrations may have altered the distribution of hydraulic conductivity. Whether the inverse model failed to adequately process the input data in that region, whether the measurements were affected too strongly by error, or whether the hydraulic properties of the sandbox changed over time cannot be determined from the available information. To answer this question, the experiments would have to be repeated. Further, the setup of the box would have to be modified to allow a better signal-to-noise ratio for measurements of hydraulic head. Streamlines and contour lines of the hydraulic head are always orthogonal to each other; better input data related to the hydraulic head would significantly improve the match of simulated streamlines for the entire sandbox, also leading to a better fit of the dividing streamline.

Comparison to Apparent Transverse Local Dispersion

Rahman (2004) [76] evaluated the experiment on transverse dispersion by extracting highly resolved transverse profiles of tracer concentration from the digital images. He then fitted an analytical solution of the advection-dispersion equation to the concentration profiles and obtained values for an apparent transverse dispersion coefficient Dta. An important effect can be seen in the simulated concentration of tracer in Figure 11.8: the width of the transverse transition between high and low concentrations depends not only on dispersion, but is most sensitively influenced by local convergence or divergence of streamlines.
If the streamline pattern is not known, the width of this transition zone is interpreted as a result of transverse dispersion alone. This can lead to unphysical values of estimated dispersion coefficients. To minimize this effect, Rahman corrected the profiles for the streamline pattern prior to fitting, using the streamlines from my simulations. The resulting value for Dta fluctuates about Dta = 5.2 · 10−9 m2/s. According to Dentz et al. (2000a) [22], the transverse effective dispersion coefficient that applies here is identical to the transverse local dispersion coefficient. This is roughly consistent with the value of Dloc = 8.24 · 10−9 m2/s found by Jose (2004) [48]. Again, this confirms the magnitude of the mean value identified for Ds, D̄s = 1.59 · 10−8 m2/s: at the limit of exactly resolving the flow field, the value of Ds should be close to the transverse local dispersion coefficient.
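As an illustration of this type of profile fitting: for a steady-state plume fringe, a common analytical solution of the advection-dispersion equation is the error-function profile c/c0 = ½ erfc((y − y0)/(2 √(Dt τ))), with τ the advective travel time to the cross-section. The sketch below fits this profile to a synthetic example; it deliberately omits the streamline correction described above, and the travel time is a hypothetical value.

```python
import numpy as np
from scipy.special import erfc
from scipy.optimize import curve_fit

def transverse_profile(y, y0, sigma):
    """Transverse concentration profile across a plume fringe:
    c/c0 = 1/2 erfc((y - y0) / (2 sigma)), with sigma = sqrt(Dt * tau)."""
    return 0.5 * erfc((y - y0) / (2.0 * sigma))

tau = 1.0e5                                    # [s], hypothetical travel time
y = np.linspace(0.0, 0.5, 200)                 # [m], height in the sandbox
c = transverse_profile(y, 0.25, np.sqrt(5.2e-9 * tau))    # synthetic "data"
popt, _ = curve_fit(transverse_profile, y, c, p0=[0.2, 0.05])
Dt_fit = popt[1] ** 2 / tau                    # recovers about 5.2e-9 m^2/s
```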

Figure 11.8: Tracer concentration in transverse experiment and in simulation

Comparison of Breakthrough Curves

As for the artificial data sets used in the previous chapter, the identified parameters were used in a transient transport model to simulate the full breakthrough curves for direct comparison. This task was carried out in the Master's Thesis of M. Anisur Rahman (2004).

11.4 Summary

In this chapter, I successfully applied my proposed new method to an experimental data set from a tracer test in a laboratory-scale sandbox conducted by Jose (2004). The inversion procedure converged after three outer iteration steps for the structural parameters of log Ds, with about twenty inner iteration steps each for the identification of the unknown parameters log K and log Ds. The computational effort was acceptable and no oscillations occurred, thanks to the spectral methods from Chapter 8, the efficient Finite Element formulations from Chapter 9, and the stabilized iteration algorithm from Chapter 6.


The solution passes all statistical tests. Direct comparison of the identified distribution of log K to the filling of the sandbox showed a moderately good match; the imperfect match can be justified by a long list of reasons. Predicting a different experiment on transverse dispersion by Rahman (2004), however, revealed a poor match of the streamline pattern in the first three meters of the sandbox. This coincides with conspicuously high orthonormal residuals of measurements in that region, and with the uncertainty introduced by clogged pores in the very same region. This stresses how important it is to accurately control and measure the hydraulic properties when collecting input data for the proposed new method.

Overall, the quality and quantity of the input data were sufficient to adequately characterize the heterogeneous structures of the sand filling in most parts of the sandbox. The character of log Ds to be expected in this case is rather that of a transverse dispersion coefficient. The magnitude of the identified dispersion parameter is in accordance with the values for local dispersion and transverse effective dispersion found by Jose (2004) and Rahman (2004). Single hot spots of dispersion identified by Jose (2004) were reproduced by the inverse model.

Chapter 12

Summary and Conclusions

12.1 Summary

In this dissertation, I hypothesized that:

Geostatistical inversion produces smoothed media equivalent to the respective heterogeneous media for both flow and transport only if the processes of flow and transport are considered in the conditioning procedure. The medium must be described by parameters relevant for flow as well as for transport, conditioned on observations of quantities related to both flow and transport.

As obvious as this hypothesis may sound, it has not been formulated in the literature or applied up to the present. Based on this hypothesis, I successfully developed, implemented and applied a new method for the geostatistical identification of flow and transport parameters in the subsurface. The parameters featured in this study are log-conductivity and a scalar log-dispersion coefficient; the extension to other parameters is straightforward. Geostatistical identification of flow parameters is well known. Simultaneous identification together with transport parameters, however, is new.

Estimated log-conductivity fields from geostatistical inversion by definition do not resolve the full variability of heterogeneous aquifers. Therefore, in transport simulations, the dispersion of solute clouds is under-predicted. Macrotransport theory defines dispersion coefficients to make up for a total lack of variability in computational models. Since estimated log-conductivity fields resolve parts of the variability, dispersion coefficients from macrotransport theory applied to estimated conductivity fields would over-predict dispersion. Until now, only a few methods existed that allow using estimated conductivity fields for transport simulations. These methods, however, are either associated with unacceptable computational costs, apply merely to special cases, or are only approximations. All of these methods predict solute transport in heterogeneous media only in a stochastic sense and do not make explicit use of transport-related data.

The new method presented here fills this gap: it explicitly meets the observed flow-related and transport-related input data. Since the input data are chosen to also quantify the dispersion of solutes, the resulting parameter fields simulate the correct amount of dispersion. Further, the method is computationally efficient compared to most of the existing methods. As input data, I use hydraulic and tracer data to characterize both flow and transport processes. The hydraulic data considered are direct measurements of conductivity, hydraulic heads and total discharge; the tracer data considered are temporal moments of local breakthrough curves.


In the literature, three dispersive mechanisms are distinguished: the irregular motion of the center of mass of solute clouds, spreading, and dilution. The parameter fields identified by the new method are adequate to simulate groundwater flow and advective-dispersive transport in the subsurface while qualitatively and quantitatively reproducing each of the three dispersive mechanisms. Therefore, they accurately describe the mixing of solute clouds and effective reaction rates in heterogeneous media.

Summary of Methods

The new method is based on the Quasi-Linear Geostatistical Approach by Kitanidis (1995) [56]. Among the variety of existing approaches, I chose it for its rigorous quantification of parameter uncertainty following the Bayesian framework, and because I could drastically speed it up using superfast spectral (FFT-based) methods. Other methods may have had more robust built-in optimization algorithms; I removed this drawback by designing a modified Levenberg-Marquardt algorithm to replace the original built-in Gauss-Newton-like algorithm. I upgraded the Quasi-Linear Geostatistical Approach with the concept of uncertain prior knowledge. This makes it possible to include uncertain information, educated guesses, or deterministically known structures in the estimation procedure and increases the flexibility and stability of the method.

For the new types of observations and unknowns, I derived the model sensitivities that are required for successively linearizing the forward model in the inversion procedure. In existing methods, the sensitivities of hydraulic heads and of the first temporal moment of breakthrough curves with respect to log-conductivity had already been derived. The newly derived sensitivities are those of the total flux and of the second central temporal moment with respect to log-conductivity, and those of the first and second central temporal moments with respect to the scalar log-dispersion coefficient.

Furthermore, I summarized and extended spectral methods for Toeplitz matrices. With their help, I execute all expensive matrix operations in the Quasi-Linear Geostatistical Approach efficiently; this includes multiplication, decomposition and inversion (a sketch of the core FFT-based multiplication is given after the list below). I extended the available methods to matrix-Toeplitz-matrix multiplication, the solution of poorly conditioned Toeplitz systems, and the generation of a certain kind of non-stationary realizations. Last but not least, I derived analytical solutions of certain element-related matrices in the finite element method using third-order tensors. Using these analytical solutions, the global system of finite element equations can be assembled in a fully vectorized and computationally most efficient manner.

Summarized Properties of the New Dispersion Coefficient and the New Method

The new dispersion coefficient is scalar, log-normal, efficient and specific for the following reasons:

• I defined the unknown dispersion coefficient as a scalar quantity because single components of a full dispersion tensor could not be identified independently due to general restrictions in the acquisition of data.

• I defined it to be distributed log-normally in order to ensure its non-negativity.

• It is an efficient dispersion coefficient since the input data used here quantify effective dispersion in the sense of Dentz et al. (2000) [22].

• I call it a specific quantity because it makes up for the specific lack of variability in the simultaneously estimated log-conductivity field.
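As announced above, the following is a minimal sketch of the core operation behind these spectral methods: multiplying a Toeplitz matrix with a vector in O(n log n) by embedding it into a circulant matrix, which is diagonalized by the discrete Fourier transform. This is the generic textbook construction, not the full implementation of Chapter 8.

```python
import numpy as np
from scipy.linalg import toeplitz   # dense reference, used only for verification

def toeplitz_matvec(c, r, x):
    """Multiply the Toeplitz matrix T (first column c, first row r) with x
    by embedding T into a circulant matrix of size 2n - 1 and using FFTs."""
    n = len(c)
    v = np.concatenate([c, r[-1:0:-1]])          # first column of the circulant
    x_pad = np.concatenate([x, np.zeros(n - 1)])
    y = np.fft.ifft(np.fft.fft(v) * np.fft.fft(x_pad))
    return np.real(y[:n])

# Verification against the dense matrix-vector product:
n = 6
c = np.exp(-np.arange(n) / 2.0)    # e.g., a column of an exponential covariance
x = np.random.rand(n)
assert np.allclose(toeplitz_matvec(c, c, x), toeplitz(c, c) @ x)
```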


Since the parameter is specific to the jointly identified log-conductivity field, its properties change with the quantity and quality of the available data. I investigated this dependency in a series of test cases on artificial data sets. At the limit of infinite information, the variability of log-conductivity is fully resolved. At this limit, transverse local dispersion aliased into longitudinal effects dominates over direct local longitudinal dispersion. The dispersion coefficient then effectively behaves like a transverse local coefficient, with respect to both its magnitude and its quickly reached asymptotic limit. At the limit of no information, no variability is resolved in log-conductivity and transverse dispersion is almost insignificant. The identified coefficient then behaves like a longitudinal effective dispersion coefficient, with respect to both its magnitude and its slow approach to the large-time limit.

The spatial structure of the identified dispersion coefficient is defined by a few structural parameters, namely the variance and the correlation length scales in the principal directions. They are identified from the available data following the Quasi-Linear Geostatistical Approach by Kitanidis (1995) [56]. The test cases collectively showed that the correlation length along the direction of flow is roughly double, and the transverse correlation length roughly half, compared to the respective correlation length scales of log-conductivity. I related this behavior to the typical shape of fingers of solute clouds in heterogeneous media.

The predictive power of the method of course increases with increasing quantity and quality of input data. In a series of test cases, I demonstrated that the whole breakthrough curves are simulated well. Although only the first and second central temporal moments have been used as input data, they are sufficient to fully characterize the entire system if more than one data point per correlation length of the conductivity field is available.

Stability Issues

The geostatistical inversion algorithm converged reliably and with no observable oscillations in almost all test cases. A stability analysis revealed several interesting aspects. In forward modeling, flow problems are solved first, and transport is then simulated in the resulting flow field. The simulation of transport cannot be more accurate than the simulation of flow. The very same holds for inverse modeling: the inversion of transport is unidirectionally coupled to the inversion of flow. I showed this dependency in the linearized model sensitivities used in the inverse model. Based on this insight, I concluded the following. First, the available data must convey sufficient information on the process of flow. Otherwise, the streamline pattern is estimated poorly, the advective causalities in the system are misunderstood, and the interpretation of transport data is wrong, if not even unphysical. Second, following from the first point, the measurement error attributed to transport data must account for both the actual measurement error and the model error of the estimated flow field. Third, the inversion of transport can only converge after the inversion of flow has converged.

Another aspect is the interdependency of input data. A further series of test cases proved that the second central temporal moment becomes more independent of the first temporal moment after normalization by the latter.
The normalization led to faster convergence and to significantly better matrix conditions in sensitive parts of the iteration algorithm.

Application to Experimental Data

Jose (2004) and Rahman (2004) [48, 76] conducted large-scale laboratory experiments on longitudinal and transverse dispersion, dilution and mixing. The experiments were conducted in a heterogeneously filled quasi 2-D sandbox of 14 meters length and 50 centimeters height. From the experiment


on longitudinal dispersion by Jose (2004) [48], I took a data set with a total of 100 measurements of hydraulic heads and about 150 measurements of local breakthrough curves. From the curves, I evaluated the first and second central temporal moments.

Given these experimental data, I identified the log-conductivity distribution in the sandbox and the scalar log-dispersion coefficient using the new method. The inverse model converged reliably and yielded parameters that simulate the input data well. Statistical tests performed on the solution, as well as an examination of the orthonormal residuals, gave good results everywhere except close to the inlet of the sandbox. A direct comparison to schematic plots of the filling patterns showed reasonable agreement. I used the identified parameters to predict the experiment on transverse dispersion by Rahman (2004) [76] for further cross-validation. Comparison to his data confirmed the previous findings. The quality and quantity of the input data and the conclusions from the cross-validations suggest that the available data characterize the experimental system sufficiently in most parts. The single drawback was the relatively poor quality of the hydraulic data, leading to the observed mismatch close to the inlet.

The mean value of the identified scalar log-dispersion coefficient was log Ds = −17.96, corresponding to a value for Ds of 1.5 · 10−8 m2/s. It has a log-linear trend rising to 4.5 · 10−8 m2/s towards the outlet of the sandbox. The variability of the sandbox filling is resolved for the greatest part, so that the scalar dispersion coefficient mostly reflects transverse dispersion. Hence, it should be somewhat larger than the local transverse dispersion coefficient and much smaller than the longitudinal effective dispersion coefficient. This compares well to other methods quantifying dispersion in the sandbox experiments: Rahman (2004) [76] found an apparent local transverse dispersion coefficient of Dta = 5.2 · 10−9 m2/s obtained from concentration profiles at cross-sections. Jose (2004) [48] determined a scalar local dispersion coefficient of Dloc = 8.24 · 10−9 m2/s by fitting the dispersion model of Dentz et al. (2000) [22] to his experimental breakthrough curves. From the same data, he found that the apparent longitudinal effective dispersion coefficient shows a trend, rising from roughly Dℓa = 0.75 · 10−5 m2/s at the inlet to about Dℓa = 2.75 · 10−5 m2/s at the outlet.

In the results of Jose (2004) [48], several apparent hot spots of mixing can be identified based on extreme values in the apparent longitudinal effective dispersion coefficient. The parameters estimated by the inverse model reproduce these hot spots. The common pattern at these hot spots is a shear zone of flow with a highly distorted streamline pattern that brings solute particles of different age from adjacent streamlines closely together.

12.2 Concluding Remarks

Overall, the new method worked well both in artificial test cases and in evaluating the large-scale laboratory experiment. As for all geostatistical inverse methods, the quality of the identified parameter fields and the predictive power depend on the quality and quantity of the input data.

For field-scale applications, the issue of numerical resolution will be a continuing challenge. If one desires to quantify effective dispersion, point-wise measurements of breakthrough curves must be collected. The resolution of the computational forward model has to be fine enough to resolve flow and transport processes down to that scale. Fine resolutions with a few millimeters of grid spacing set a certain upper limit to the maximum domain size, dictated by computational power.


12.3 Future Research

The new method revealed the need for further research in the following areas:

1. One of the main purposes of identifying effective dispersion coefficients is to accurately quantify the dilution and mixing of solute clouds. The predictive power of the new method for mixing-controlled reactive transport is still to be proven. This could be done by predicting reactive tracer experiments conducted within this project.

2. Instead of identifying a scalar dispersion coefficient, one could try to identify a full dispersion tensor in spite of all arguments against it. The identifiability of the single components could be ensured by including sufficient prior knowledge, e.g., on the ratio of anisotropy. This ratio, however, cannot be more than an educated guess unless derived from computationally expensive methods, such as linear stochastic theory applied to the non-stationary conditional covariance of the identified log-conductivity field.

3. The method can be extended to other transport parameters, like retardation factors and reaction rates. Adequate informative input data need to be included, and the model sensitivities of the observed quantities with respect to the new unknowns must be derived.

4. The method currently uses one grid both for discretizing the unknown parameters and for the numerical flow and transport simulations. The method could possibly be improved by mapping effective parameter values between different grids, e.g., using homogenization techniques. Then, transport-related equations could be solved using a streamline-oriented Finite Volume Method, allowing coarser grids for numerical purposes.

5. Spectral methods drastically reduced the computational costs of the new method. However, these spectral methods are restricted to regular equispaced grids. Efficient algorithms for the underlying Fourier transform on irregular grids have been developed in recent years. Geostatistics in general would greatly benefit if the toolbox of spectral methods were extended to irregular grids.


Bibliography

[1] E.R. Anderman and M.C. Hill. Advective-transport observation (ADV) package, a computer program for adding advective-transport observations of steady-state flow fields to the three-dimensional ground-water flow parameter-estimation model MODFLOWP. Open-File Report 97-14, U.S. Geological Survey, 1997.
[2] M.P. Anderson. Using models to simulate the movement of contaminants through groundwater flow systems. CRC Critical Reviews in Environmental Control, 9:97–156, 1979.
[3] S. Barnett. Matrices: Methods and Applications. Oxford Applied Mathematics and Computing Science Series. Clarendon Press, Oxford, 1990.
[4] R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, Philadelphia, PA, 1993.
[5] J. Bear. Dynamics of Fluids in Porous Media. Elsevier Science, New York, 1972.
[6] A.N. Brooks and T.J.R. Hughes. Streamline upwind/Petrov-Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations. Computer Methods in Applied Mechanics and Engineering, 32(1–3):199–259, 1982.
[7] J. Carrera and L. Glorioso. On geostatistical formulation of the groundwater flow inverse problem. Advances in Water Resources, 14(5):273–283, 1991.
[8] R.H. Chan and M.K. Ng. Conjugate gradient methods for Toeplitz systems. SIAM Review, 38(3):427–482, 1996.
[9] T. Chan and J. Olkin. Circulant preconditioners for Toeplitz-block matrices. Numerical Algorithms, 6:89–101, 1994.
[10] O.A. Cirpka. Choice of dispersion coefficients in reactive transport calculations on smoothed fields. Journal of Contaminant Hydrology, 58:261–282, 2002.
[11] O.A. Cirpka and S. Attinger. Effective dispersion in heterogeneous media under random transient flow conditions. Water Resources Research, 39(9), doi:10.1029/2002WR001931, 2003.
[12] O.A. Cirpka, R. Helmig, and E.O. Frind. Numerical methods for reactive transport on rectangular and streamline-oriented grids. Advances in Water Resources, 22(7):711–728, 1999.
[13] O.A. Cirpka and P.K. Kitanidis. Characterization of mixing and dilution in heterogeneous aquifers by means of local temporal moments. Water Resources Research, 36(5):1221–1236, 2000.
[14] O.A. Cirpka and P.K. Kitanidis. Sensitivity of temporal moments calculated by the adjoint-state method, and joint inversing of head and tracer data. Advances in Water Resources, 24(1):89–103, 2000.


[15] O.A. Cirpka and W. Nowak. Dispersion on kriged hydraulic conductivity fields. Water Resources Research, 39(2), doi:10.1029/2001WR000598, 2003.
[16] O.A. Cirpka and W. Nowak. First-order variance of travel time in non-stationary formations. Water Resources Research, 40, doi:10.1029/2003WR002851, 2004.
[17] J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19:297–301, 1965.
[18] N.A.C. Cressie. Statistics for Spatial Data. John Wiley & Sons, New York, 1991.
[19] G. Dagan. Time-dependent macrodispersion for solute transport in anisotropic heterogeneous aquifers. Water Resources Research, 24(9):1491–1500, 1988.
[20] P.J. Davis. Circulant Matrices. Pure and Applied Mathematics. John Wiley and Sons, New York, 1979.
[21] G. de Marsily. Quantitative Hydrogeology. Academic Press, San Diego, CA, 1986.
[22] M. Dentz, H. Kinzelbach, S. Attinger, and W. Kinzelbach. Temporal behaviour of a solute cloud in a heterogeneous porous medium. 1. Point-like injection. Water Resources Research, 36(12):3591–3604, 2000.
[23] M. Dentz, H. Kinzelbach, S. Attinger, and W. Kinzelbach. Temporal behaviour of a solute cloud in a heterogeneous porous medium. 2. Spatially extended injection. Water Resources Research, 36(12):3605–3614, 2000.
[24] C.R. Dietrich and G.N. Newsam. A stability analysis of the geostatistical approach to aquifer transmissivity identification. Stochastic Hydrology and Hydraulics, 3:293–316, 1989.
[25] C.R. Dietrich and G.N. Newsam. A fast and exact method for multidimensional Gaussian stochastic simulations. Water Resources Research, 29(8):2861–2869, 1993.
[26] C.R. Dietrich and G.N. Newsam. A fast and exact method for multidimensional Gaussian stochastic simulations: Extension to realizations conditioned on direct and indirect measurements. Water Resources Research, 32(6):1643–1652, 1996.
[27] C.R. Dietrich and G.N. Newsam. Fast and exact simulation of stationary Gaussian processes through circulant embedding of the covariance matrix. SIAM Journal on Scientific Computing, 18(4):1088–1107, 1997.
[28] A. Fiori and G. Dagan. Concentration fluctuations in aquifer transport: A rigorous first-order solution and applications. Journal of Contaminant Hydrology, 45(1–2):139–163, 2000.
[29] C.A.J. Fletcher. Computational Techniques for Fluid Dynamics, Vol. 1: Fundamental and General Techniques. Springer Series in Computational Physics. Springer Verlag Telos, New York, 2nd edition, 1996.
[30] C.A.J. Fletcher. Computational Techniques for Fluid Dynamics, Vol. 2: Specific Techniques for Different Flow Categories. Springer Series in Computational Physics. Springer Verlag Telos, New York, 2nd edition, 1996.
[31] M. Frigo and S.G. Johnson. FFTW: An adaptive software architecture for the FFT. In Proc. ICASSP, volume 3, pages 1381–1384, IEEE, Seattle, WA, 1998. http://www.fftw.org.
[32] K. Gallivan, S. Thirumalai, and P. Van Dooren. A block Toeplitz look-ahead Schur algorithm. In M. Moonen and B. De Moor, editors, SVD in Signal Processing III, Algorithms, Architectures and Applications, pages 199–206, Elsevier, Amsterdam, 1995.


[33] K. Gallivan, S. Thirumalai, and P. Van Dooren. On solving block Toeplitz systems using a block Schur algorithm. In Intern. Conf. Parallel Processing, Proc. ICPP-94, St. Charles, IL, pages III-274–III-281, 1994.
[34] K. Gallivan, S. Thirumalai, P. Van Dooren, and V. Vermaut. High performance algorithms for Toeplitz and block Toeplitz matrices. Linear Algebra and its Applications, 241–243(1–3):343–388, 1996.
[35] L.W. Gelhar and C.L. Axness. Three-dimensional stochastic analysis of macrodispersion in aquifers. Water Resources Research, 19(1):161–180, 1983.
[36] G.H. Golub and C.F. van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 3rd edition, 1996.
[37] J.J. Gómez-Hernández, A. Sahuquillo, and J.E. Capilla. Stochastic simulation of transmissivity fields conditional to both transmissivity and piezometric data. 1. Theory. Journal of Hydrology, 203:162–174, 1997.
[38] I.J. Good. On the inversion of circulant matrices. Biometrika, 37:185–186, 1950.
[39] T. Harter, A.L. Gutjahr, and T.C.-J. Yeh. Linearized cosimulation of hydraulic conductivity, pressure head, and flux in saturated and unsaturated, heterogeneous porous media. Journal of Hydrology, 183:169–190, 1999.
[40] C.F. Harvey and S.M. Gorelick. Temporal moment-generating equations: Modeling transport and mass-transfer in heterogeneous aquifers. Water Resources Research, 31(8):1895–1911, 1995.
[41] M.R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49:409–436, 1952.
[42] M.C. Hill. A computer program (MODFLOWP) for estimating parameters of a transient, three-dimensional, ground-water flow model using nonlinear regression. Open-File Report 91-484, U.S. Geological Survey, 1992.
[43] M.C. Hill. Five computer programs for testing weighted residuals and calculating linear confidence and prediction intervals on results from the ground-water parameter-estimation program MODFLOWP. Open-File Report 93-481, U.S. Geological Survey, 1994.
[44] M.C. Hill. Methods and guidelines for effective model calibration. Water-Resources Investigations Report 98-4005, U.S. Geological Survey, 1998.
[45] S. Holmgren and K. Otto. Iterative solution methods and preconditioners for block-tridiagonal systems of equations. SIAM Journal on Matrix Analysis and Applications, 13:863–886, 1992.
[46] T.J.R. Hughes. The Finite Element Method. Prentice Hall International, Inc., 1987.
[47] D. Hughson and T.-C.J. Yeh. An inverse model for three-dimensional flow in variably saturated porous media. Water Resources Research, 36(4):829–840, 2000.
[48] S.C. Jose. Experimental investigations on longitudinal reactive mixing in heterogeneous aquifers. PhD thesis, Institut für Wasserbau, Universität Stuttgart, 2004, in progress.
[49] T. Kailath and A.H. Sayed. Displacement structure: Theory and applications. SIAM Review, 37(3):297–386, 1995.
[50] T. Kailath and A.H. Sayed. Fast Reliable Algorithms for Matrices with Structure. SIAM, Philadelphia, PA, 1999.


[51] P.K. Kitanidis. Parameter uncertainty in estimation of spatial functions: Bayesian analysis. Water Resources Research, 22(4):499–507, 1986.
[52] P.K. Kitanidis. Predictions by the method of moments of transport in heterogeneous formations. Journal of Hydrology, 102(1–4):453–473, 1988.
[53] P.K. Kitanidis. Orthonormal residuals in geostatistics: Model criticism and parameter estimation. Mathematical Geology, 23(5):741–758, 1991.
[54] P.K. Kitanidis. Generalized covariance functions in estimation. Mathematical Geology, 25(5):525–540, 1993.
[55] P.K. Kitanidis. The concept of the dilution index. Water Resources Research, 30(7):2011–2026, 1994.
[56] P.K. Kitanidis. Quasi-linear geostatistical theory for inversing. Water Resources Research, 31(10):2411–2419, 1995.
[57] P.K. Kitanidis. Analytical expressions of conditional mean, covariance, and sample functions in geostatistics. Stochastic Hydrology and Hydraulics, 12:279–294, 1996.
[58] P.K. Kitanidis. On the geostatistical approach to the inverse problem. Advances in Water Resources, 19(6):333–342, 1996.
[59] P.K. Kitanidis. Comment on "A reassessment of the groundwater inverse problem" by D. McLaughlin and L.R. Townley. Water Resources Research, 33(9):2199–2202, 1997.
[60] P.K. Kitanidis. Introduction to Geostatistics. Cambridge University Press, Cambridge, 1997.
[61] P.K. Kitanidis and E.G. Vomvoris. A geostatistical approach to the inverse problem in groundwater modeling (steady state) and one-dimensional simulations. Water Resources Research, 19(3):677–690, 1983.
[62] B. Kozintsev. Computations with Gaussian random fields. PhD thesis, Institute for Systems Research, University of Maryland, 1999.
[63] A.M. LaVenue and J.F. Pickens. Application of a coupled adjoint sensitivity and kriging approach to calibrate a groundwater flow model. Water Resources Research, 28(6):1543–1559, 1992.
[64] A.M. LaVenue, B.S. RamaRao, G. de Marsily, and M.G. Marietta. Pilot point methodology for automated calibration of an ensemble of conditionally simulated transmissivity fields. 2. Application. Water Resources Research, 31(3):495–516, 1995.
[65] K. Levenberg. A method for the solution of certain nonlinear problems in least squares. Quarterly of Applied Mathematics, 2:164–168, 1944.
[66] M. Van Barel, G. Heinig, and P. Kravanja. A stabilized superfast solver for nonsymmetric Toeplitz systems. SIAM Journal on Matrix Analysis and Applications, 23(2):494–510, 2001.
[67] D.W. Marquardt. An algorithm for least squares estimation of non-linear parameters. Journal of the Society for Industrial and Applied Mathematics, 11:431–441, 1963.
[68] G. Matheron. The Theory of Regionalized Variables and Its Applications. Ecole de Mines, Fontainebleau, France, 1971.
[69] D. McLaughlin and L.R. Townley. A reassessment of the groundwater inverse problem. Water Resources Research, 32(5):1131–1161, 1996.


[70] D. McLaughlin and L.R. Townley. Reply to comment by P.K. Kitanidis on "A reassessment of the groundwater inverse problem". Water Resources Research, 33(9):2203, 1997.
[71] F. Molz and M. Widdowson. Internal inconsistencies in dispersion-dominated models that incorporate chemical and microbial kinetics. Water Resources Research, 24(4):615–619, 1988.
[72] S.P. Neuman, C.L. Winter, and C.M. Newman. Stochastic theory of field-scale Fickian dispersion in anisotropic porous media. Water Resources Research, 23(3):453–466, 1987.
[73] D.J. Nott and R.J. Wilson. Parameter estimation for excursion set texture models. Signal Processing, 63:199–210, 1997.
[74] J. Olkin. Linear and Nonlinear Deconvolution Problems. PhD thesis, Rice University, Houston, 1986.
[75] W.H. Press, B.P. Flannery, S.A. Teukolsky, and W.T. Vetterling. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, Cambridge, 2nd edition, 1992.
[76] A.M. Rahman. Transverse dispersive mixing in heterogeneous porous media. PhD thesis, Institut für Wasserbau, Universität Stuttgart, 2004, in progress.
[77] H. Rajaram and L.W. Gelhar. Plume scale-dependent dispersion in heterogeneous aquifers. 2. Eulerian analysis and three-dimensional aquifers. Water Resources Research, 29(9):3261–3276, 1993.
[78] B.S. RamaRao, A.M. LaVenue, G. de Marsily, and M.G. Marietta. Pilot point methodology for automated calibration of an ensemble of conditionally simulated transmissivity fields. 1. Theory and computational experiments. Water Resources Research, 31(3):475–493, 1995.
[79] C.R. Rao. Linear Statistical Inference and its Applications. John Wiley and Sons, New York, 2nd edition, 1973.
[80] J.N. Reddy. Introduction to the Finite Element Method. McGraw-Hill Science/Engineering/Math, New York, 2nd edition, 1993.
[81] C.L. Rino. The inversion of covariance matrices by Finite Fourier Transforms. IEEE Transactions on Information Theory, 16:230–232, 1970.
[82] Y. Rubin. Prediction of tracer plume migration in heterogeneous porous media by the method of conditional probabilities. Water Resources Research, 27(6):1291–1308, 1991.
[83] Y. Rubin, A. Bellin, and A.E. Lawrence. On the use of block-effective macrodispersion for numerical simulations of transport in heterogeneous formations. Water Resources Research, 39(9), doi:10.1029/2002WR001727, 2003.
[84] Y. Rubin, A. Sun, R. Maxwell, and A. Bellin. The concept of block-effective macrodispersivity and a unified approach for grid-scale- and plume-scale-dependent transport. Journal of Fluid Mechanics, 395:161–180, 1999.
[85] J.W. Ruge and K. Stüben. Algebraic multigrid (AMG). In Multigrid Methods, Frontiers in Applied Mathematics, vol. 5. SIAM, Philadelphia, 1986.
[86] A.E. Scheidegger. General theory of dispersion in porous media. Journal of Geophysical Research, 66:3273–3278, 1961.
[87] F.C. Schweppe. Uncertain Dynamic Systems. Prentice-Hall, Englewood Cliffs, NJ, 1973.
[88] S.R. Searle. On inverting circulant matrices. Linear Algebra and its Applications, 25:77–89, 1979.


[89] J.R. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain, 1994. http://www.cs.berkeley.edu/~jrs/.
[90] M. Stewart. Cholesky factorization of semi-definite Toeplitz matrices. Linear Algebra and its Applications, 254(1–3):497–525, 1997.
[91] M. Stewart and P. Van Dooren. Stability issues in the factorization of structured matrices. SIAM Journal on Matrix Analysis and Applications, 18(1):104–118, 1996.
[92] G. Strang. A proposal for Toeplitz matrix calculations. Studies in Applied Mathematics, 74:171–176, 1986.
[93] E.A. Sudicky and J.A. Cherry. Field observations of tracer dispersion under natural flow conditions in an unconfined sandy aquifer. Water Pollution Research in Canada, 14:1–17, 1979.
[94] N.-Z. Sun. Inverse Problems in Groundwater Modeling. Theory and Applications of Transport in Porous Media. Kluwer Academic Publishers, Dordrecht, 1994.
[95] N.-Z. Sun and W.W.-G. Yeh. Coupled inverse problems in groundwater modeling. 1. Sensitivity analysis and parameter identification. Water Resources Research, 26(10):2507–2525, 1990.
[96] J.F. Sykes, J.L. Wilson, and R.W. Andrews. Sensitivity analysis for steady-state groundwater flow using adjoint operators. Water Resources Research, 21(3):359–371, 1985.
[97] L.R. Townley and J.L. Wilson. Computationally efficient algorithms for parameter estimation and uncertainty propagation in numerical models of groundwater flow. Water Resources Research, 21(12):1851–1860, 1985.
[98] G.E. Trapp. Inverses of circulant matrices and block circulant matrices. Kyungpook Mathematical Journal, 13(1):11–20, 1973.
[99] C.F. van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM Publications, Philadelphia, PA, 1992.
[100] J.A. Vargas-Guzmán and T.C.-J. Yeh. Sequential kriging and cokriging: Two powerful geostatistical approaches. Stochastic Environmental Research and Risk Assessment, 13:416–435, 1999.
[101] T.-C.J. Yeh, A.L. Gutjahr, and M. Jin. An iterative cokriging-like technique for ground-water flow modeling. Ground Water, 33(1):33–41, 1995.
[102] T.-C.J. Yeh, M. Jin, and S. Hanna. An iterative stochastic inverse method: Conditional effective transmissivity and hydraulic head fields. Water Resources Research, 32(1):85–92, 1996.
[103] T.-C.J. Yeh and J. Šimůnek. Stochastic fusion of information for characterizing and monitoring the vadose zone. Vadose Zone Journal, 1:207–221, 2002.
[104] J. Zhang and T.C.-J. Yeh. An iterative geostatistical inverse method for steady flow in the vadose zone. Water Resources Research, 33(1):63–71, 1997.
[105] D.A. Zimmerman, G. de Marsily, C.A. Gotway, M.G. Marietta, C.L. Axness, R.L. Beauheim, R.L. Bras, J. Carrera, G. Dagan, P.B. Davies, D.P. Gallegos, A. Galli, J. Gómez-Hernández, P. Grindrod, A.L. Gutjahr, P.K. Kitanidis, A.M. LaVenue, D. McLaughlin, S.P. Neuman, B.S. RamaRao, C. Ravenne, and Y. Rubin. A comparison of seven geostatistically based inverse approaches to estimate transmissivities for modeling advective transport by groundwater flow. Water Resources Research, 34(6):1373–1413, 1998.
[106] D.L. Zimmerman. Computationally exploitable structure of covariance matrices and generalized covariance matrices in spatial models. Journal of Statistical Computation and Simulation, 32(1/2):1–15, 1989.

Appendix A

Mathematical Tools

A.1 Matrix Algebra

A.1.1 Partitioned Matrices

Define the partitioned matrix M and its inverse N:

$$
\begin{bmatrix} \mathbf{M}_{11} & \mathbf{M}_{12} \\ \mathbf{M}_{21} & \mathbf{M}_{22} \end{bmatrix}^{-1}
= \begin{bmatrix} \mathbf{N}_{11} & \mathbf{N}_{12} \\ \mathbf{N}_{21} & \mathbf{N}_{22} \end{bmatrix} ,
\tag{A.1}
$$

then the submatrices obey the following equations (e.g., Schweppe, 1973 [87]):

$$
\begin{aligned}
\mathbf{N}_{22} &= \left( \mathbf{M}_{22} - \mathbf{M}_{21}\mathbf{M}_{11}^{-1}\mathbf{M}_{12} \right)^{-1} && \text{(A.2)} \\
\mathbf{N}_{11} &= \mathbf{M}_{11}^{-1} + \mathbf{M}_{11}^{-1}\mathbf{M}_{12}\mathbf{N}_{22}\mathbf{M}_{21}\mathbf{M}_{11}^{-1} && \text{(A.3)} \\
\mathbf{N}_{12} &= -\mathbf{M}_{11}^{-1}\mathbf{M}_{12}\mathbf{N}_{22} && \text{(A.4)} \\
\mathbf{N}_{21} &= -\mathbf{N}_{22}\mathbf{M}_{21}\mathbf{M}_{11}^{-1} . && \text{(A.5)}
\end{aligned}
$$

Since the block indexing is exchangeable, the submatrices also follow:

$$
\begin{aligned}
\mathbf{N}_{11} &= \left( \mathbf{M}_{11} - \mathbf{M}_{12}\mathbf{M}_{22}^{-1}\mathbf{M}_{21} \right)^{-1} && \text{(A.6)} \\
\mathbf{N}_{12} &= -\mathbf{N}_{11}\mathbf{M}_{12}\mathbf{M}_{22}^{-1} && \text{(A.7)} \\
\mathbf{N}_{21} &= -\mathbf{M}_{22}^{-1}\mathbf{M}_{21}\mathbf{N}_{11} && \text{(A.8)} \\
\mathbf{N}_{22} &= \mathbf{M}_{22}^{-1} + \mathbf{M}_{22}^{-1}\mathbf{M}_{21}\mathbf{N}_{11}\mathbf{M}_{12}\mathbf{M}_{22}^{-1} , && \text{(A.9)}
\end{aligned}
$$

leading to the following identities:

$$
\left( \mathbf{M}_{22} - \mathbf{M}_{21}\mathbf{M}_{11}^{-1}\mathbf{M}_{12} \right)^{-1}
= \mathbf{M}_{22}^{-1} + \mathbf{M}_{22}^{-1}\mathbf{M}_{21}\mathbf{N}_{11}\mathbf{M}_{12}\mathbf{M}_{22}^{-1}
\tag{A.10}
$$

$$
\mathbf{M}_{11}^{-1}\mathbf{M}_{12}\left( \mathbf{M}_{22} - \mathbf{M}_{21}\mathbf{M}_{11}^{-1}\mathbf{M}_{12} \right)^{-1}
= \left( \mathbf{M}_{11} - \mathbf{M}_{12}\mathbf{M}_{22}^{-1}\mathbf{M}_{21} \right)^{-1}\mathbf{M}_{12}\mathbf{M}_{22}^{-1} .
\tag{A.11}
$$
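As a quick cross-check of Eqs. (A.2)–(A.5), the following NumPy sketch (not part of the original appendix; all variable names and test values are arbitrary) verifies the block-inverse formulas on a random, well-conditioned test matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 3, 4
n = n1 + n2
M = rng.standard_normal((n, n)) + n * np.eye(n)  # well-conditioned test matrix
M11, M12 = M[:n1, :n1], M[:n1, n1:]
M21, M22 = M[n1:, :n1], M[n1:, n1:]
N = np.linalg.inv(M)
inv = np.linalg.inv

N22 = inv(M22 - M21 @ inv(M11) @ M12)                   # (A.2)
N11 = inv(M11) + inv(M11) @ M12 @ N22 @ M21 @ inv(M11)  # (A.3)
N12 = -inv(M11) @ M12 @ N22                             # (A.4)
N21 = -N22 @ M21 @ inv(M11)                             # (A.5)

assert np.allclose(N[:n1, :n1], N11) and np.allclose(N[:n1, n1:], N12)
assert np.allclose(N[n1:, :n1], N21) and np.allclose(N[n1:, n1:], N22)
```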


A.1.2 Trace

The value of the trace is invariant with respect to cyclic shifts of products inside the trace:

$$
\mathrm{Tr}\left[ \mathbf{x}^T \mathbf{A} \mathbf{x} \right] = \mathrm{Tr}\left[ \mathbf{x}\mathbf{x}^T \mathbf{A} \right] .
\tag{A.12}
$$

Since the trace of a matrix is a linear operation, the order of trace and expectation may be interchanged:

$$
E\left[ \mathrm{Tr}\left[ \mathbf{x}^T \mathbf{A} \mathbf{x} \right] \right] = \mathrm{Tr}\left[ E\left[ \mathbf{x}\mathbf{x}^T \right] \mathbf{A} \right] .
\tag{A.13}
$$

A useful property of quadratic forms is that they are scalars, so they are identical to their trace:

$$
\mathbf{x}^T \mathbf{A} \mathbf{x} = \mathrm{Tr}\left[ \mathbf{x}^T \mathbf{A} \mathbf{x} \right] ,
\tag{A.14}
$$

and the above equalities can be applied to quadratic forms.
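A minimal numerical sanity check of Eqs. (A.12) and (A.14), added here for illustration and assuming nothing beyond standard NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
x = rng.standard_normal((5, 1))

q = float(x.T @ A @ x)                        # the quadratic form is a scalar
assert np.isclose(q, np.trace(x.T @ A @ x))   # (A.14): a scalar equals its trace
assert np.isclose(q, np.trace(x @ x.T @ A))   # (A.12): cyclic shift inside the trace
```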

A.1.3 Matrix Derivatives

$$
\begin{aligned}
\frac{\partial \|\mathbf{A}\|}{\partial \theta} &= \|\mathbf{A}\| \, \mathrm{Tr}\left[ \mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \theta} \right] && \text{(A.15)} \\
\frac{\partial \mathbf{A}^{-1}}{\partial \theta} &= -\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial \theta} \mathbf{A}^{-1} && \text{(A.16)} \\
\frac{\partial \, \mathrm{Tr}\left[ \mathbf{A} \right]}{\partial \theta} &= \mathrm{Tr}\left[ \frac{\partial \mathbf{A}}{\partial \theta} \right] && \text{(A.17)}
\end{aligned}
$$
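These three identities can be verified by central finite differences along a one-parameter family A(θ) = A₀ + θB. The sketch below is not part of the original appendix; it reads ‖A‖ as the determinant of A, which is the interpretation under which Eq. (A.15), Jacobi's formula, holds:

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps = 4, 1e-6
A0 = rng.standard_normal((n, n)) + n * np.eye(n)   # keep A invertible
B = rng.standard_normal((n, n))                    # dA/dtheta for A(theta) = A0 + theta*B
A = lambda th: A0 + th * B

# (A.15): d||A||/dtheta = ||A|| Tr[A^-1 dA/dtheta], with ||A|| read as det(A)
fd_det = (np.linalg.det(A(eps)) - np.linalg.det(A(-eps))) / (2 * eps)
assert np.isclose(fd_det, np.linalg.det(A0) * np.trace(np.linalg.inv(A0) @ B), rtol=1e-4)

# (A.16): dA^-1/dtheta = -A^-1 (dA/dtheta) A^-1
fd_inv = (np.linalg.inv(A(eps)) - np.linalg.inv(A(-eps))) / (2 * eps)
assert np.allclose(fd_inv, -np.linalg.inv(A0) @ B @ np.linalg.inv(A0), rtol=1e-4)

# (A.17): dTr[A]/dtheta = Tr[dA/dtheta]
fd_tr = (np.trace(A(eps)) - np.trace(A(-eps))) / (2 * eps)
assert np.isclose(fd_tr, np.trace(B), rtol=1e-6)
```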

A.2 Integration Rules for Rⁿ

The divergence theorem relates the integral of the divergence of a vector field q over a domain Ω to the flux across the domain boundary Γ:

$$
\int_\Omega \nabla \cdot \mathbf{q} \, d\Omega = \int_\Gamma \mathbf{q} \cdot \mathbf{n} \, d\Gamma .
\tag{A.18}
$$

Consider q given by:

$$
\mathbf{q} = f \nabla g ,
\tag{A.19}
$$

in which f and g are arbitrary differentiable functions. Then, from partial integration, one obtains:

$$
\nabla \cdot \mathbf{q} = \nabla \cdot (f \nabla g) = \nabla f \cdot \nabla g + f \nabla \cdot (\nabla g) .
$$

Combining these equations, one obtains Green's first theorem:

$$
\int_\Omega f \nabla^2 g \, d\Omega = - \int_\Omega \nabla f \cdot \nabla g \, d\Omega + \int_\Gamma f \nabla g \cdot \mathbf{n} \, d\Gamma .
\tag{A.20}
$$

Similarly, let q = f∇g − g∇f, so that ∇ · q = f∇²g − g∇²f, to obtain Green's second theorem:

$$
\int_\Omega f \nabla^2 g \, d\Omega = \int_\Omega g \nabla^2 f \, d\Omega + \int_\Gamma \left( f \nabla g - g \nabla f \right) \cdot \mathbf{n} \, d\Gamma .
\tag{A.21}
$$


Further, Green's theorems can be extended and applied to weak formulations (e.g., Sun, 1994 [94, p. 104]):

$$
\int_\Omega \psi \nabla \cdot (f \nabla g) \, d\Omega = - \int_\Omega \nabla \psi \cdot (f \nabla g) \, d\Omega + \int_\Gamma (\psi f \nabla g) \cdot \mathbf{n} \, d\Gamma
\tag{A.22}
$$

$$
\int_\Omega \psi \nabla \cdot (f \nabla g) \, d\Omega = \int_\Omega g \nabla \cdot (f \nabla \psi) \, d\Omega + \int_\Gamma f \left( \psi \nabla g - g \nabla \psi \right) \cdot \mathbf{n} \, d\Gamma
\tag{A.23}
$$

$$
\int_\Omega \psi \nabla \cdot (f \mathbf{v}) \, d\Omega = - \int_\Omega \nabla \psi \cdot (f \mathbf{v}) \, d\Omega + \int_\Gamma (\psi f \mathbf{v}) \cdot \mathbf{n} \, d\Gamma
\tag{A.24}
$$

$$
\int_\Omega \psi \mathbf{v} \cdot \nabla f \, d\Omega = - \int_\Omega f \nabla \cdot (\psi \mathbf{v}) \, d\Omega + \int_\Gamma (\psi f \mathbf{v}) \cdot \mathbf{n} \, d\Gamma
\tag{A.25}
$$
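As a symbolic sanity check, Green's first theorem (A.20) can be verified on a rectangle with smooth test functions. The SymPy sketch below is illustrative only and not part of the original appendix; the chosen f, g and the rectangle dimensions are arbitrary:

```python
import sympy as sp

x, y = sp.symbols('x y')
ax, by = 2, 3                       # rectangle [0, ax] x [0, by]
f = x**2 * y + 1                    # arbitrary smooth test functions
g = sp.sin(x) * y**2

lap_g = sp.diff(g, x, 2) + sp.diff(g, y, 2)
lhs = sp.integrate(f * lap_g, (x, 0, ax), (y, 0, by))

vol = -sp.integrate(sp.diff(f, x) * sp.diff(g, x) + sp.diff(f, y) * sp.diff(g, y),
                    (x, 0, ax), (y, 0, by))
# boundary term: f (grad g . n) on the four edges, with outward normals
bnd  = sp.integrate((f * sp.diff(g, x)).subs(x, ax), (y, 0, by))   # right,  n = +e_x
bnd -= sp.integrate((f * sp.diff(g, x)).subs(x, 0),  (y, 0, by))   # left,   n = -e_x
bnd += sp.integrate((f * sp.diff(g, y)).subs(y, by), (x, 0, ax))   # top,    n = +e_y
bnd -= sp.integrate((f * sp.diff(g, y)).subs(y, 0),  (x, 0, ax))   # bottom, n = -e_y

assert sp.simplify(lhs - (vol + bnd)) == 0   # (A.20) holds exactly
```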

A.3 Analytical Expressions for Element Matrices

The analytical solutions for the stiffness matrix S, the mass matrix M and the inflow matrix I can be found in any basic Finite Element textbook. The stiffness matrix S for 2-D bilinear rectangular elements with size dx × dy is:

$$
\mathbf{S} = \int_{V_{el}} \left( \nabla \mathbf{N} \right)^T \nabla \mathbf{N} \, dV
= \frac{dy}{6\,dx} \begin{bmatrix} +2 & +1 & -2 & -1 \\ +1 & +2 & -1 & -2 \\ -2 & -1 & +2 & +1 \\ -1 & -2 & +1 & +2 \end{bmatrix}
+ \frac{dx}{6\,dy} \begin{bmatrix} +2 & -2 & +1 & -1 \\ -2 & +2 & -1 & +1 \\ +1 & -1 & +2 & -2 \\ -1 & +1 & -2 & +2 \end{bmatrix} .
$$

The mass matrix M is:

$$
\mathbf{M} = \int_{V_{el}} \mathbf{N}^T \mathbf{N} \, dV
= \frac{dx\,dy}{36} \begin{bmatrix} 4 & 2 & 2 & 1 \\ 2 & 4 & 1 & 2 \\ 2 & 1 & 4 & 2 \\ 1 & 2 & 2 & 4 \end{bmatrix} .
$$

The inflow matrix I is (assuming that the inflow boundary is at nodes 1 and 2):

$$
\mathbf{I} = \int_{\Gamma_{in,el}} \mathbf{N}^T \mathbf{N} \, d\Gamma
= \frac{dy}{6} \begin{bmatrix} 2 & 1 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} .
$$

The following analytical solutions have not previously been derived or reported in the literature. To compute the source terms in the adjoint state equations for temporal moments, I derived the following analytical solution:

$$
\int_{V_{el}} \left( \nabla \mathbf{N} \right)^T \left( \nabla \mathbf{N} \right) \hat{\mathbf{z}} \mathbf{N} \, dV
- \int_{\Gamma_{el}} \mathbf{N}^T \mathbf{n}^T \left( \nabla \mathbf{N} \right) \hat{\mathbf{z}} \mathbf{N} \, d\Gamma =
$$
$$
\frac{dx}{24\,dy} \begin{bmatrix} -3 & +3 & -1 & +1 \\ -3 & +3 & -1 & +1 \\ -1 & +1 & -1 & +1 \\ -1 & +1 & -1 & +1 \end{bmatrix} \hat{z}_1
+ \frac{dx}{24\,dy} \begin{bmatrix} -1 & +1 & -1 & +1 \\ -1 & +1 & -1 & +1 \\ -1 & +1 & -3 & +3 \\ -1 & +1 & -3 & +3 \end{bmatrix} \hat{z}_3
+ \frac{dy}{24\,dx} \begin{bmatrix} -3 & -1 & +3 & +1 \\ -1 & -1 & +1 & +1 \\ -3 & -1 & +3 & +1 \\ -1 & -1 & +1 & +1 \end{bmatrix} \hat{z}_1
+ \frac{dy}{24\,dx} \begin{bmatrix} +3 & +1 & -3 & -1 \\ +1 & +1 & -1 & -1 \\ +3 & +1 & -3 & -1 \\ +1 & +1 & -1 & -1 \end{bmatrix} \hat{z}_3
$$
$$
+ \frac{dx}{24\,dy} \begin{bmatrix} +3 & -3 & +1 & -1 \\ +3 & -3 & +1 & -1 \\ +1 & -1 & +1 & -1 \\ +1 & -1 & +1 & -1 \end{bmatrix} \hat{z}_2
+ \frac{dx}{24\,dy} \begin{bmatrix} +1 & -1 & +1 & -1 \\ +1 & -1 & +1 & -1 \\ +1 & -1 & +3 & -3 \\ +1 & -1 & +3 & -3 \end{bmatrix} \hat{z}_4
+ \frac{dy}{24\,dx} \begin{bmatrix} -1 & -1 & +1 & +1 \\ -1 & -3 & +1 & +3 \\ -1 & -1 & +1 & +1 \\ -1 & -3 & +1 & +3 \end{bmatrix} \hat{z}_2
+ \frac{dy}{24\,dx} \begin{bmatrix} +1 & +1 & -1 & -1 \\ +1 & +3 & -1 & -3 \\ +1 & +1 & -1 & -1 \\ +1 & +3 & -1 & -3 \end{bmatrix} \hat{z}_4 ,
$$

in which ẑ₁ . . . ẑ₄ are the single nodal entries of ẑ. The artifice in this expression is to take ẑ out of the integral by breaking it up into its single entries; only after this step can it be moved out of the integral. Without the element boundary term, this expression is useful for computing sensitivities:

$$
\int_{V_{el}} \left( \nabla \mathbf{N} \right)^T \left( \nabla \mathbf{N} \right) \hat{\mathbf{z}} \mathbf{N} \, dV =
$$
$$
\frac{dx}{24\,dy} \begin{bmatrix} +3 & +3 & +1 & +1 \\ -3 & -3 & -1 & -1 \\ +1 & +1 & +1 & +1 \\ -1 & -1 & -1 & -1 \end{bmatrix} \hat{z}_1
+ \frac{dx}{24\,dy} \begin{bmatrix} +1 & +1 & +1 & +1 \\ -1 & -1 & -1 & -1 \\ +1 & +1 & +3 & +3 \\ -1 & -1 & -3 & -3 \end{bmatrix} \hat{z}_3
+ \frac{dy}{24\,dx} \begin{bmatrix} +3 & +1 & +3 & +1 \\ +1 & +1 & +1 & +1 \\ -3 & -1 & -3 & -1 \\ -1 & -1 & -1 & -1 \end{bmatrix} \hat{z}_1
+ \frac{dy}{24\,dx} \begin{bmatrix} -3 & -1 & -3 & -1 \\ -1 & -1 & -1 & -1 \\ +3 & +1 & +3 & +1 \\ +1 & +1 & +1 & +1 \end{bmatrix} \hat{z}_3
$$
$$
+ \frac{dx}{24\,dy} \begin{bmatrix} -3 & -3 & -1 & -1 \\ +3 & +3 & +1 & +1 \\ -1 & -1 & -1 & -1 \\ +1 & +1 & +1 & +1 \end{bmatrix} \hat{z}_2
+ \frac{dx}{24\,dy} \begin{bmatrix} -1 & -1 & -1 & -1 \\ +1 & +1 & +1 & +1 \\ -1 & -1 & -3 & -3 \\ +1 & +1 & +3 & +3 \end{bmatrix} \hat{z}_4
+ \frac{dy}{24\,dx} \begin{bmatrix} +1 & +1 & +1 & +1 \\ +1 & +3 & +1 & +3 \\ -1 & -1 & -1 & -1 \\ -1 & -3 & -1 & -3 \end{bmatrix} \hat{z}_2
+ \frac{dy}{24\,dx} \begin{bmatrix} -1 & -1 & -1 & -1 \\ -1 & -3 & -1 & -3 \\ +1 & +1 & +1 & +1 \\ +1 & +3 & +1 & +3 \end{bmatrix} \hat{z}_4 .
$$
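The same quadrature cross-check as above applies to this expression. The sketch below (again with the inferred node numbering and arbitrary element size, and not part of the original derivation) isolates the two matrices multiplying ẑ₁ by setting ẑ = e₁:

```python
import numpy as np

dx, dy = 1.2, 0.7  # element size (arbitrary test values); node numbering as above
N  = lambda xi, eta: np.array([(1-xi)*(1-eta), (1-xi)*eta, xi*(1-eta), xi*eta])
Nx = lambda xi, eta: np.array([-(1-eta), -eta, (1-eta), eta]) / dx
Ny = lambda xi, eta: np.array([-(1-xi), (1-xi), -xi, xi]) / dy

zhat = np.array([1.0, 0.0, 0.0, 0.0])  # zhat = e_1 isolates the zhat_1 terms
T = np.zeros((4, 4))
gp = [0.5 - 0.5/np.sqrt(3.0), 0.5 + 0.5/np.sqrt(3.0)]  # 2x2 Gauss rule, exact here
for xi in gp:
    for eta in gp:
        G = np.vstack([Nx(xi, eta), Ny(xi, eta)])        # 2x4 gradient matrix (nabla N)
        T += 0.25 * dx * dy * (G.T @ G) @ np.outer(zhat, N(xi, eta))

T_ref = dx/(24*dy) * np.array([[ 3,  3,  1,  1], [-3, -3, -1, -1],
                               [ 1,  1,  1,  1], [-1, -1, -1, -1]]) \
      + dy/(24*dx) * np.array([[ 3,  1,  3,  1], [ 1,  1,  1,  1],
                               [-3, -1, -3, -1], [-1, -1, -1, -1]])
assert np.allclose(T, T_ref)
```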

Finally, I use the boundary integral separately to compute the boundary values for the adjoint state of the heads, this time presuming that the inflow boundary is at nodes 3 and 4:

$$
\int_{\Gamma_{in}} \mathbf{N}^T \mathbf{n}^T \left( \nabla \mathbf{N} \right) \hat{\mathbf{z}} \mathbf{N} \, d\Gamma =
\frac{dy}{24\,dx} \begin{bmatrix} +0 & +0 & +0 & +0 \\ +0 & +0 & +0 & +0 \\ +0 & +0 & -6 & -2 \\ +0 & +0 & -2 & -2 \end{bmatrix} \hat{z}_1
+ \frac{dy}{24\,dx} \begin{bmatrix} +0 & +0 & +0 & +0 \\ +0 & +0 & +0 & +0 \\ +0 & +0 & +6 & +2 \\ +0 & +0 & +2 & +2 \end{bmatrix} \hat{z}_3
+ \frac{dy}{24\,dx} \begin{bmatrix} +0 & +0 & +0 & +0 \\ +0 & +0 & +0 & +0 \\ +0 & +0 & -2 & -2 \\ +0 & +0 & -2 & -6 \end{bmatrix} \hat{z}_2
+ \frac{dy}{24\,dx} \begin{bmatrix} +0 & +0 & +0 & +0 \\ +0 & +0 & +0 & +0 \\ +0 & +0 & +2 & +2 \\ +0 & +0 & +2 & +6 \end{bmatrix} \hat{z}_4 .
$$
