AStA Adv Stat Anal (2013) 97:173–179
DOI 10.1007/s10182-012-0191-8

ORIGINAL PAPER

Estimation of spatial processes using local scoring rules

SPATIAL SPECIAL ISSUE

A. Philip Dawid · Monica Musio

Received: 13 October 2011 / Accepted: 3 April 2012 / Published online: 19 April 2012 © Springer-Verlag 2012

Abstract We display pseudo-likelihood as a special case of a general estimation technique based on proper scoring rules. Such a rule supplies an unbiased estimating equation for any statistical model, and this can be extended to allow for missing data. When the scoring rule has a simple local structure, as in many spatial models, the need to compute problematic normalising constants is avoided. We illustrate the approach through an analysis of data on disease in bell pepper plants.

Keywords Proper scoring rule · Pseudo-likelihood · Ratio matching · Unbiased estimating equation

1 Introduction

Maximum likelihood estimation of a spatial process can be computationally demanding, because of the need to manipulate the normalisation constant of the joint distribution. Besag (1975) developed the method of pseudo-likelihood to sidestep this problem. This has traditionally been considered as an approximation (of unknown quality) to the full likelihood. However, as we describe below, the method can be justified in its own right, as leading to an unbiased estimating equation. Other methods with similar justification and properties can be constructed from general proper scoring rules, and these supply useful alternatives to pseudo-likelihood.

A.P. Dawid
Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, Cambridge, UK

M. Musio
Dipartimento di Matematica ed Informatica, Università di Cagliari, Cagliari, Italy


In Sect. 2 we recall the definition of a proper scoring rule, and describe how such a rule can be used to construct an unbiased estimating equation for any smooth parametric family. Section 3 introduces a family of proper scoring rules that are particularly suited to estimation of locally defined spatial processes; Besag’s pseudo-likelihood is seen as a special case. In Sect. 4 we describe the analysis of a spatial dataset using pseudo-likelihood and an alternative scoring rule, and compare the results in Sect. 5. Finally, Sect. 6 summarises our findings and indicates lines for further research.

2 Proper scoring rules

Let $X$ be a random variable taking values in a sample space $\mathcal{X}$. A scoring rule (see e.g. Dawid 1986) is a loss function $S(x, Q)$ measuring the quality of a quoted probability distribution $Q$ for $X$, in the light of the realised outcome $x$ of $X$. It is proper if, for any distribution $P$ for $X$, the expected score $S(P, Q) := E_{X \sim P}\, S(X, Q)$ is minimised by quoting $Q = P$. There is a very wide variety of proper scoring rules: for general characterisations see McCarthy (1956), Savage (1971), Hendrickson and Buehler (1971), Gneiting and Raftery (2007); for various special cases, see Dawid (1998, 2007), Dawid and Sebastiani (1999), Gneiting and Raftery (2007).

A prominent example (Good 1952) is the log score, $S(x, Q) = -\log q(x)$, where $q(\cdot)$ denotes the density (probability mass function in the discrete case) of $X$ under $Q$. So long as the size of $\mathcal{X}$ exceeds two, this is essentially the only proper scoring rule that is local (Bernardo 1979), i.e. involves $q(\cdot)$ only through its value at the realised outcome $x$. However, many more “local” proper scoring rules are admitted if we weaken the definition of locality to allow dependence on derivatives of $q(\cdot)$ at $x$ in the continuous case (Parry et al. 2012), or on the values of $q(\cdot)$ at points neighbouring $x$ in the discrete case (Dawid et al. 2012). All these extensions have the convenient property that they can be computed without knowledge of the normalisation constant of the density.

For a finite (especially binary) sample space $\mathcal{X}$, another useful proper scoring rule is the Brier (Brier 1950) or quadratic score, $S(x, Q) = \{1 - q(x)\}^2 + \sum_{y \neq x} q(y)^2$, which is just the squared Euclidean distance between the vector $(q(y) : y \in \mathcal{X})$ corresponding to $Q$, and the vector corresponding similarly to the one-point distribution at $x$.

The theory of proper scoring rules is primarily conceived as belonging to subjectivist Bayesian statistics (de Finetti 1962; Savage 1971), but it also has important applications to classical inference.
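As a rough numerical illustration of propriety (not part of the original paper), the following Python sketch evaluates the expected log and Brier scores $S(P, Q)$ on a three-point sample space and searches a grid of quotes $Q$ for the minimiser. The distribution `p`, the grid resolution and all function names are assumptions made for this example only.

```python
import math

def log_score(x, q):
    """Log score S(x, Q) = -log q(x); q maps outcomes to probabilities."""
    return -math.log(q[x])

def brier_score(x, q):
    """Brier score S(x, Q) = {1 - q(x)}^2 + sum over y != x of q(y)^2."""
    return (1.0 - q[x]) ** 2 + sum(q[y] ** 2 for y in q if y != x)

def expected_score(score, p, q):
    """S(P, Q) = E_{X ~ P} S(X, Q) on a finite sample space."""
    return sum(p[x] * score(x, q) for x in p)

# Hypothetical "true" distribution P on three outcomes.
p = {"a": 0.5, "b": 0.3, "c": 0.2}

# Candidate quotes Q on a grid of step 0.05 (strictly positive masses only).
grid = [i / 20 for i in range(1, 20)]
candidates = [
    {"a": qa, "b": qb, "c": 1.0 - qa - qb}
    for qa in grid
    for qb in grid
    if 1.0 - qa - qb > 1e-9
]

# For a proper scoring rule the expected score is minimised by quoting Q = P.
best_log = min(candidates, key=lambda q: expected_score(log_score, p, q))
best_brier = min(candidates, key=lambda q: expected_score(brier_score, p, q))
```

Because `p` itself lies on the grid, both searches should return the quote that coincides with `p`, up to floating-point rounding.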
Thus, given a proper scoring rule $S$, for any smooth parametric statistical model $\mathcal{P} = \{P_\theta\}$ for $X$ define

$$s(x, \theta) := \frac{\partial S(x, P_\theta)}{\partial \theta}.$$

Then we can estimate $\theta$ by $\hat\theta_S$, the root of the estimating equation

$$s(x, \theta) = 0. \qquad (1)$$

When $S$ is the log score, this is just the likelihood equation, and $\hat\theta_S$ is the maximum likelihood estimate. More generally it can be shown (Dawid and Lauritzen 2005) that, for any differentiable proper scoring rule and any smooth statistical model, $E_\theta\{s(X, \theta)\} = 0$, i.e. (1) is an unbiased estimating equation. In particular it will typically deliver a consistent (if not necessarily efficient) estimator in repeated sampling. We could take advantage of this flexibility by choosing $S$ to increase robustness or ease of computation.
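One way to see why (1) is unbiased: propriety says that $\theta \mapsto S(P_{\theta_0}, P_\theta)$ is minimised at $\theta = \theta_0$, so, when differentiation under the expectation is permitted, its derivative $E_{\theta_0}\{s(X, \theta)\}$ vanishes there. The sketch below checks this numerically for the Brier score. The one-parameter family (a binomial with two trials), the finite-difference step and the function names are assumptions chosen for illustration, not taken from the paper.

```python
def model(theta):
    """Hypothetical one-parameter family on {0, 1, 2}: Binomial(2, theta)."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

def brier(x, q):
    """Brier score of the quoted mass function q at the outcome x."""
    return (1 - q[x]) ** 2 + sum(qy ** 2 for y, qy in enumerate(q) if y != x)

def s(x, theta, h=1e-6):
    """s(x, theta) = dS(x, P_theta)/dtheta, by central finite difference."""
    return (brier(x, model(theta + h)) - brier(x, model(theta - h))) / (2 * h)

def expected_s(theta_true, theta_eval):
    """E_{X ~ P_theta_true} s(X, theta_eval), summing over the sample space."""
    return sum(p * s(x, theta_eval) for x, p in enumerate(model(theta_true)))

at_truth = expected_s(0.3, 0.3)   # numerically zero: (1) is unbiased
off_truth = expected_s(0.3, 0.5)  # clearly nonzero at a wrong parameter value
```

Evaluating the expectation at the data-generating value gives zero up to finite-difference error, while evaluating it elsewhere does not, which is what makes the root of (1) a sensible estimator.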

3 Application to spatial processes

Let $V$ be a set of sites at which observations $X = (X_v : v \in V)$ may be made. In our intended interpretation, $V$ is a two-dimensional spatial array, and a distribution $P$ for $X$ is a spatial process. Many interesting spatial processes are initially defined locally, in terms of $\{P_v : v \in V\}$, where $P_v$ denotes the family of conditional distributions for $X_v$, given the values of $X_{\setminus v}$, the variables at all other sites. In particular, if $P$ is Markov on a graph $G$ over $V$, then $P_v$ only depends on the values of $X_{\mathrm{ne}(v)}$, the variables at the sites neighbouring $v$.

Such a conditional specification, however, is not readily converted into a joint density for all the variables. In particular, it is typically problematic to work with the normalising constant of this joint distribution. This complicates tasks such as maximum likelihood estimation.

We can instead apply the above theory of proper scoring rules, based directly on the initial conditional specification. For simplicity, suppose each $X_v$ takes values in the same set $\mathcal{X}_0$, and let $S_0$ be a proper scoring rule (the single site rule) over $\mathcal{X}_0$. Define a scoring rule $S$ over $\mathcal{X}$ by

$$S(x, Q) = \sum_v S_0(x_v, Q^*_v), \qquad (2)$$

where $x = (x_v : v \in V)$, and $Q^*_v$ is the conditional distribution, under $Q$, of $X_v$, given the realised values $x_{\setminus v}$ for the variables $X_{\setminus v}$ at all sites other than $v$.¹ It is easily seen that $S$ is a proper scoring rule. Moreover, since it depends only on the conditional specification, the need to evaluate the normalising constant of the full joint distribution is avoided. Corresponding to (2) we have an unbiased estimating equation

$$\sum_v s_0(x_v, P_{\theta,v}) = 0, \qquad (3)$$

where $P_{\theta,v}$ is the conditional distribution of $X_v$ under $P_\theta$ given $x_{\setminus v}$ and $s_0(x_v, P_{\theta,v}) := \partial S_0(x_v, P_{\theta,v}) / \partial\theta$, each term in the sum having expectation 0.

When $S_0$ is the log score, (2) is the negative logarithm of Besag’s pseudo-likelihood (Besag 1975), and (3) is the corresponding pseudo-likelihood equation. For $X_v$ binary and $S_0$ the Brier score, (3) yields the method of ratio matching (Hyvärinen 2007).

Missing data are readily dealt with, so long as they are missing completely at random. Thus let $A_v = 0$ if the value at any site in $\{v\} \cup \mathrm{ne}(v)$ is missing, else $A_v = 1$. Then $s_0(x_v, P_{\theta,v}) \times A_v$ has expectation 0, so if we simply omit incomplete terms from (3) we shall still have an unbiased estimating equation.

¹ There is no difficulty in principle in allowing the sample space for $X_v$ to vary with $v$, and the single site scoring rule to vary with both $v$ and $x_{\setminus v}$.
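The construction in (2), together with the rule for omitting incomplete terms, can be sketched as follows. This is a toy setting invented for illustration: binary sites on a path graph, a logistic conditional specification with hypothetical parameters `alpha` and `beta`, and `None` marking a missing value. Only the local conditional probabilities are evaluated, so no joint normalising constant appears anywhere.

```python
import math

def neighbours(v, n):
    """Neighbours of site v on a path graph with n sites (toy choice of G)."""
    return [u for u in (v - 1, v + 1) if 0 <= u < n]

def cond_prob_one(x, v, alpha, beta):
    """P(X_v = 1 | x_ne(v)) under an assumed logistic local specification."""
    eta = alpha + beta * sum(x[u] for u in neighbours(v, len(x)))
    return 1.0 / (1.0 + math.exp(-eta))

def log_score(xv, pi):
    """Single-site log score; summing it gives minus the log pseudo-likelihood."""
    return -math.log(pi if xv == 1 else 1.0 - pi)

def brier_score(xv, pi):
    """Single-site Brier score for a binary outcome, equal to 2 (x_v - pi)^2."""
    q_obs = pi if xv == 1 else 1.0 - pi
    return 2.0 * (1.0 - q_obs) ** 2

def composite_score(x, alpha, beta, s0):
    """Equation (2), omitting every term with A_v = 0.

    Returns the total score and the number of sites that contributed.
    """
    total, used = 0.0, 0
    for v in range(len(x)):
        block = [v] + neighbours(v, len(x))
        if any(x[u] is None for u in block):
            continue  # A_v = 0: x_v or one of its neighbours is missing
        total += s0(x[v], cond_prob_one(x, v, alpha, beta))
        used += 1
    return total, used

# Six sites, with the value at site 2 missing.
x = [1, 1, None, 0, 0, 1]
pl_value, pl_used = composite_score(x, 0.0, 0.0, log_score)
rm_value, rm_used = composite_score(x, 0.0, 0.0, brier_score)
```

Here sites 1, 2 and 3 are dropped because site 2 is missing, so only three terms contribute to each score.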


4 Example: Phytophthora on bell pepper

The data in Fig. 1 indicate the presence or absence of the pathogen Phytophthora capsici Leonian in bell pepper plants on a regular 20 × 20 grid (Chadoeuf et al. 1992). We model the data as a stationary first-order Markov process with respect to the grid, which must thus follow the autologistic model (Besag 1972, 1974; Gumpertz et al. 1997):

$$\operatorname{logit} \pi_{ij} = \alpha + \beta (x_{i-1,j} + x_{i+1,j}) + \gamma (x_{i,j-1} + x_{i,j+1}), \qquad (4)$$

where $\pi_{ij}$ is the probability of $X_{ij} = 1$, given all other values.

To fit by maximum pseudo-likelihood (PL), we can proceed as if the $(X_{ij})$ were all independent, treating the neighbour sums in (4) as fixed covariates, and maximise the resulting “likelihood”. This can be done by a standard generalised linear model analysis, readily implemented in standard software such as R (R Development Core Team 2007), using the binomial family and (default) logit link function. The R package Rcitrus (Krainski and Ribeiro 2005) provides various functions to manipulate and analyse such data.

Alternatively, and possibly more robustly, we could apply ratio matching (RM), based on the Brier scoring rule, which leads to the least-squares recipe: minimise $\sum_{i,j} (x_{ij} - \pi_{ij})^2$. Again this can be implemented in standard GLM software, treating the data as if they were normal with constant variance, and using the logit link function. In R this can be effected using the glm() command with option family = quasi(link=logit,variance=constant). Alternatively, one can make appropriate modifications to the function autologistic.citrus in the Rcitrus package to perform this analysis.

Fig. 1 Presence (1) and absence (0) of pathogen in bell pepper plants

Although it is easy to compute the estimates, the “standard errors” output by standard GLM software will be inappropriate, typically underestimating true uncertainty, since they do not take any account of the dependence in the data. Instead, we apply the parametric bootstrap (Gumpertz et al. 1997; Krainski et al. 2008). Using the estimated parameter values, and starting from the actual data, we can simulate a new realisation of the process by Gibbs sampling, updating one site at a time according to its conditional distribution given its neighbours, continuing until the process has reached equilibrium. This procedure can be efficiently carried out by the coding method (Besag 1974) and blocked Gibbs sampling: we colour the sites alternately black and white as on a chessboard; update all the black sites, simultaneously and independently, given the white sites; then the whites given the blacks; and so on until equilibrium.

We repeat this some large number $N$ times, and each time estimate the parameters by our chosen method. From these $N$ bootstrap estimates we can estimate the standard deviation of the estimator of any parameter or parameter-function, which we then take as applying to the original estimates.
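The fitting and bootstrap steps described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which used R and Rcitrus): gradient descent stands in for GLM software, boundary sites are held fixed during simulation, a fixed number of sweeps stands in for "until equilibrium", and every function name and default value is hypothetical.

```python
import math
import random
from collections import Counter

def sigmoid(eta):
    eta = max(-30.0, min(30.0, eta))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-eta))

def features(x, i, j):
    """Covariates of model (4) at interior site (i, j): (1, beta term, gamma term)."""
    return (1, x[i - 1][j] + x[i + 1][j], x[i][j - 1] + x[i][j + 1])

def cell_counts(x):
    """Tabulate (covariates, response) over interior sites only."""
    n, m = len(x), len(x[0])
    return Counter((features(x, i, j), x[i][j])
                   for i in range(1, n - 1) for j in range(1, m - 1))

def mean_score(counts, theta, method):
    """Mean single-site score: log score for PL, (x - pi)^2 for RM."""
    total = 0.0
    for (z, y), c in counts.items():
        p = sigmoid(sum(t * zk for t, zk in zip(theta, z)))
        if method == "PL":
            total -= c * math.log(p if y == 1 else 1.0 - p)
        else:
            total += c * (y - p) ** 2
    return total / sum(counts.values())

def fit(x, method="PL", steps=3000, lr=0.5):
    """Gradient descent on the mean score; returns (alpha, beta, gamma)."""
    counts = cell_counts(x)
    n_obs = sum(counts.values())
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for (z, y), c in counts.items():
            p = sigmoid(sum(t * zk for t, zk in zip(theta, z)))
            # Derivative of the single-site score with respect to the logit.
            w = (p - y) if method == "PL" else 2.0 * (p - y) * p * (1.0 - p)
            for k in range(3):
                grad[k] += c * w * z[k] / n_obs
        theta = [t - lr * gk for t, gk in zip(theta, grad)]
    return theta

def gibbs(x, theta, sweeps, rng):
    """Chessboard Gibbs sampler; boundary sites stay fixed (an assumption)."""
    x = [row[:] for row in x]
    n, m = len(x), len(x[0])
    for _ in range(sweeps):
        for colour in (0, 1):  # all sites of one colour, then the other
            for i in range(1, n - 1):
                for j in range(1, m - 1):
                    if (i + j) % 2 == colour:
                        eta = sum(t * zk for t, zk in zip(theta, features(x, i, j)))
                        x[i][j] = 1 if rng.random() < sigmoid(eta) else 0
    return x

def bootstrap_sd(x, method, n_boot, sweeps, seed=0):
    """Parametric bootstrap standard deviations of the fitted coefficients."""
    rng = random.Random(seed)
    theta_hat = fit(x, method)
    reps = [fit(gibbs(x, theta_hat, sweeps, rng), method) for _ in range(n_boot)]
    sds = []
    for k in range(3):
        mean = sum(r[k] for r in reps) / n_boot
        sds.append(math.sqrt(sum((r[k] - mean) ** 2 for r in reps) / (n_boot - 1)))
    return theta_hat, sds
```

A call such as `bootstrap_sd(data, "RM", n_boot=500, sweeps=100)` would mirror the procedure in the text for a grid `data` given as a list of 0/1 rows. Sites of one colour have only neighbours of the other colour, so updating them one after another in place is equivalent to the simultaneous update described above. Tabulating the at most 18 distinct covariate and response combinations keeps each gradient step cheap.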

5 Results

Table 1 displays the results of fitting the model (4) by pseudo-likelihood (PL) and by ratio matching (RM). Values at sites on the boundary of the grid, which do not have four observed neighbours, are not used as responses, though they are used as covariate values for their neighbouring interior sites. There are thus 18 × 18 = 324 data-points used to fit the model. Standard errors are computed from N = 500 bootstrap samples.

Table 1 Coefficients estimated by pseudo-likelihood (PL) and ratio matching (RM)

                 PL                       RM
                 coefficient   sd         coefficient   sd
Intercept, α     −2.4358       0.2850     −2.2859       0.3328
WE, β             1.6443       0.2767      1.5839       0.2959
NS, γ             0.6422       0.2534      0.5536       0.2793

A similar analysis can be performed if we include covariates in the model. In particular we have analysed the bell pepper dataset for field 1 detailed in Gumpertz et al. (1997), using an extension of the autologistic model (4) which includes additional terms on the right hand side that are linear functions of the covariates soil water content (%) (swc) and soil pathogen population (spp). We have fitted this model using both PL and RM. Results of the analysis are displayed in Table 2.

Table 2 Coefficients estimated by pseudo-likelihood (PL) and ratio matching (RM); swc = soil water content, spp = soil pathogen population

                 PL                       RM
                 coefficient   sd         coefficient   sd
Intercept, α     −2.9480       1.2383     −2.8188       1.5767
WE, β             1.2933       1.4147      1.2823       1.1544
NS, γ             0.3524       2.3061      0.3734       1.6031
swc               0.0454       0.1162      0.0335       0.1493
spp              −0.0305       0.1719     −0.0500       0.2774

Although the estimates of parameter-values and their standard errors are, unsurprisingly, different for the two different scoring rules, in both examples they are acceptably close together, and there do not appear to be any consistent discrepancies. This reflects the generally “clean” nature of these particular datasets. For other datasets we might expect more notable differences. In particular, when there are a number of sites where the probability is close to 0 (or 1) while the corresponding outcome is 1 (or 0), situations penalised very heavily by the log score, the RM approach should be more relaxed than PL in allowing such seemingly discrepant estimates.
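The contrast between the two penalties can be made concrete with a few lines of Python. This sketch simply tabulates the single-site log-score and least-squares terms when the outcome is 1 but the fitted probability is small; the probability values are arbitrary choices for illustration.

```python
import math

def log_penalty(pi, x):
    """Single-site log score: unbounded as the fitted probability of x tends to 0."""
    return -math.log(pi if x == 1 else 1.0 - pi)

def rm_penalty(pi, x):
    """Single-site least-squares (RM) term: never exceeds 1."""
    return (x - pi) ** 2

# Outcome x = 1 observed at sites whose fitted probability is small.
table = [(pi, log_penalty(pi, 1), rm_penalty(pi, 1)) for pi in (0.1, 0.01, 0.001)]
```

As the fitted probability shrinks, the log-score term grows without bound while the RM term stays below 1, so a handful of such sites influences a PL fit much more strongly than an RM fit.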

6 Concluding remarks

The PL and RM methods, as well as methods similarly derived from other proper scoring rules, lead to estimators obtained by solving an unbiased estimating equation. In consequence, any such estimator will typically be consistent under repeated sampling. In the examples studied the estimates obtained from applying PL and RM are broadly in line, neither being obviously more accurate than the other. Further theoretical and computational work is under way to explore and compare their accuracy, efficiency and robustness properties. Datasets simulated with parameter-values close to criticality (Gumpertz et al. 1997, Table 1) should prove particularly illuminating.

Acknowledgement

We are grateful to Elias Krainski for his assistance with using Rcitrus.

References

Bernardo, J.M.: Expected information as expected utility. Ann. Stat. 7, 686–690 (1979)
Besag, J.E.: Nearest-neighbour systems and the auto-logistic model for binary data. J. R. Stat. Soc. B 34, 75–83 (1972)
Besag, J.E.: Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. B 36(2), 192–236 (1974)
Besag, J.E.: Statistical analysis of non-lattice data. J. R. Stat. Soc., Ser. D Stat. 24, 179–195 (1975)
Brier, G.W.: Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1–3 (1950)
Chadoeuf, J., Nandris, D., Geiger, J., Nicole, M., Pierrat, J.: Modélisation spatio-temporelle d’une epidémie par un processus de Gibbs: estimation et tests. Biometrics 48, 1165–1175 (1992)
Dawid, A.P.: Probability forecasting. In: Kotz, S., Johnson, N.L., Read, C.B. (eds.) Encyclopedia of Statistical Sciences, vol. 7, pp. 210–218. Wiley-Interscience, New York (1986)
Dawid, A.P.: Proper measures of discrepancy uncertainty and dependence with applications to predictive experimental design (revised). Tech. Rep. 139b, Department of Statistical Science, University College London (1998). URL http://tinyurl.com/6fa4ekz
Dawid, A.P.: The geometry of proper scoring rules. Ann. Inst. Stat. Math. 59, 77–93 (2007). URL http://tinyurl.com/65t4xml
Dawid, A.P., Lauritzen, S.L.: The geometry of decision theory. In: Proceedings of the Second International Symposium on Information Geometry and Its Applications, pp. 22–28. University of Tokyo, Tokyo (2005)
Dawid, A.P., Lauritzen, S.L., Parry, M.: Proper scoring rules on discrete sample spaces. Ann. Stat. (2012). arXiv:1104.2224v1
Dawid, A.P., Sebastiani, P.: Coherent dispersion criteria for optimal experimental design. Ann. Stat. 27, 65–81 (1999)
de Finetti, B.: Does it make sense to speak of ‘good probability appraisers’? In: Good, I.J. (ed.) The Scientist Speculates: an Anthology of Partly-Baked Ideas, pp. 357–364. Basic Books, New York (1962)
Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. J. Am. Stat. Assoc. 102, 359–378 (2007)
Good, I.J.: Rational decisions. J. R. Stat. Soc. B 14, 107–114 (1952)
Gumpertz, M.L., Graham, J.M., Ristaino, J.B.: Autologistic model of spatial pattern of phytophthora epidemic in bell pepper: effects of soil variables on disease presence. J. Agric. Biol. Environ. Stat. 2, 131–156 (1997)
Hendrickson, A.D., Buehler, R.J.: Proper scores for probability forecasters. Ann. Math. Stat. 42, 1916–1921 (1971)
Hyvärinen, A.: Some extensions of score matching. Comput. Stat. Data Anal. 51, 2499–2512 (2007)
Krainski, E.T., Ribeiro, P.J. Jr.: Rcitrus: funções em R para análise de dados de doenças de citros. R package version 0.3-0 (2005). URL http://www.est.ufpr.br/Rcitrus
Krainski, E.T., Ribeiro, P.J. Jr., Bassanezi, R.B., Franciscon, L.: Autologistic model with an application to the citrus “sudden death” disease. Sci. Agric. 65, 541–547 (2008)
McCarthy, J.: Measures of the value of information. Proc. Natl. Acad. Sci. USA 42, 654–655 (1956)
Parry, M., Dawid, A.P., Lauritzen, S.L.: Proper local scoring rules. Ann. Stat. (2012). arXiv:1101.5011v1
R Development Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing (2007). URL http://www.R-project.org
Savage, L.J.: Elicitation of personal probabilities and expectations. J. Am. Stat. Assoc. 66, 783–801 (1971)
