Coverage Probability Fails to Ensure Reliable Inference

arXiv:1706.08565v1 [math.ST] 26 Jun 2017

Michael Scott Balch, Alexandria Validation Consulting, LLC
Ryan Martin, Department of Statistics, North Carolina State University
Scott Ferson, Institute for Risk and Uncertainty, University of Liverpool

June 28, 2017

Abstract

We present two examples, one pedagogical and one practical, in which confidence distributions support plainly misleading inferences. The root problem in both examples is that the coverage probability criterion allows certain false propositions to have a high probability of being assigned a high degree of confidence. In short, the problem is false confidence. This deficiency is almost universally present in confidence distributions. It is merely not obvious in most problems. Since Bayesian posterior distributions are sometimes rationalized as approximate confidence distributions, this finding has troubling implications for pragmatic users of Bayesian methods. The Martin–Liu validity criterion offers an alternative path for modern frequentist inference. Under this approach, confidence and plausibility assigned to propositions must satisfy a reliability-of-inference interpretation. Consonant confidence structures satisfy this interpretation, and confidence intervals are a special case within that class. So, while confidence distributions fail to provide reliable inference, confidence intervals themselves suffer no such failure. Approaches to inference based solely on coverage probability are the result of a long- and widely-held misunderstanding of confidence intervals and the rigorous inferences that they support. Future efforts to improve upon confidence intervals should be grounded in the stricter Martin–Liu validity criterion.

Keywords: Confidence distribution, Bayesian posterior, Inferential model


1 Introduction

Confidence intervals have long been a staple of frequentist statistical inference, rationalized through the concept of "coverage probability." That is, confidence intervals are widely accepted as valid and meaningful because, over repeated draws of the sampling data, a 100(1 − α)% confidence interval has probability (1 − α) of covering the fixed true value of the parameter being inferred (e.g., Casella and Berger, 2002). Over the past two decades, there have been multiple efforts to build a theory of confidence distributions, in which coverage probability is used to rationalize a distributional representation of inference, rather than just an interval (Schweder and Hjort, 2002; Singh et al., 2007). A complete review of these approaches is beyond the scope of this paper, but there is some recent literature (e.g., Nadarajah et al., 2015; Xie and Singh, 2013; Schweder and Hjort, 2016) that the interested reader may consult.

One of these approaches, the theory of confidence structures, is phrased generally enough to allow the propagation of confidence distributions (Balch, 2012). Confidence structures offer nearly all the flexibility of the Bayesian approach together with the frequentist rigor offered by an interpretation grounded in coverage probability.[1] This theory has helped to motivate subsequent work, including Ferson et al. (2013, 2014), Balch and Smarslok (2014), and O'Rawe et al. (2015).

Unfortunately, the theory does not work. The definitions and proofs in Balch (2012) are mathematically valid, as far as they go. Nevertheless, subsequent applied work has shown that confidence distributions can lead to inferences that are plainly misleading (Balch, 2016). Section 3 provides a simple example of this problem, and Section 4 explores the misleading inferences encountered in satellite conjunction analysis. Section 5 identifies the problem in both of these examples as literal false confidence and then proves that false confidence afflicts all confidence distributions on continuous real parameters. Moreover, false confidence is not limited to confidence distributions. The proof presented in Section 5.2 applies to all Kolmogorov-satisfying[2] belief functions used to represent uncertainty in a statistically inferred parameter with an unknown fixed value. Section 5.4 considers the circumstances under which the false confidence theorem will or will not apply to a Bayesian posterior. It potentially applies to most Bayesian posteriors, insofar as Bayesians do not usually expect or require a prior or posterior to represent the same kind of aleatory variability that a frequentist expects a sampling model to represent (Mayo, 1996; Barnett, 1999; Robert, 2007).

Fortunately, an alternative approach exists. The inferential model (IM) framework falls within the same fiducial tradition as confidence distributions and confidence structures (Martin and Liu, 2013, 2015a,c,b). In the Martin–Liu framework, any inference must satisfy the criterion that the plausibility accorded to any true proposition[3] must have a right-of-uniform distribution. That is, over repeated sampling of the data, the plausibility accorded to a true proposition must have a distribution that is stochastically greater than the uniform distribution on the unit interval. If that requirement is reminiscent of the classical property satisfied by p-values, that is because p-values and Martin–Liu plausibility are intimately connected (Martin and Liu, 2014). What this criterion ultimately means is that, if a proposition is assigned a 100(1 − α)% confidence by an IM, that confidence is justified because, if the proposition were false, it would have had a probability of at most α of being assigned such a high confidence value. As shown in Section 5, the same cannot be said of confidence distributions.

In Section 6, it is shown that there is a special type of confidence structure that always satisfies the Martin–Liu validity criterion: consonant confidence structures. In Section 7, it is shown that confidence intervals also satisfy the Martin–Liu validity criterion. The paper concludes in Section 8 with some remarks on how the findings of this paper relate to the traditional understanding of confidence intervals.

[1] Balch (2012) refers to "coverage probability" as "Neyman–Pearson confidence", after the statisticians who popularized the coverage probability rationalization for confidence intervals.
[2] A "Kolmogorov-satisfying" belief function is simply a belief function that satisfies the Kolmogorov axioms.
[3] These propositions usually take the form of "θ ∈ A", where θ is the parameter being inferred and A is a set of possible values. A proposition of the form "θ ∈ A" is true if A actually contains the true value of θ and false if it does not.

2 Review: Confidence Structures and Martin–Liu Validity

2.1 Theory of Confidence Structures

A confidence structure is an observation-dependent random set whose belief function corresponds to coverage probability (Balch, 2012). Let θ ∈ Ωθ be the parameter that is being inferred, and let X ∈ Ωx be the random variable whose distribution depends on θ and upon which the inference about θ is based. Further, let ProX|θ be the probability measure for X corresponding to the sampling distribution. A confidence structure consists of an auxiliary variable, a ∈ Ωa (with a fixed probability measure, Proa), and an observation-dependent map, Θ : Ωa × Ωx → Bθ (where Bθ is a Borel algebra on Ωθ), from the auxiliary variable to the parameter being inferred. To qualify as a confidence structure, the map must satisfy the following criterion:

ProX|θ({x : ∪_{a∈A} Θ(a, x) ∋ θ}) ≥ Proa(A), ∀θ ∈ Ωθ, A ∈ Ba,   (1)

where Ba is the Borel algebra for the auxiliary variable. This criterion represents the requirement that the map of any measurable set in the auxiliary space, A ∈ Ba, is a confidence interval or, more generally, a confidence region.

Confidence values assigned to propositions about the value of θ are computed as follows:

CnfΘ|x(B) = Proa({a : Θ(a, x) ⊆ B}), ∀B ∈ Bθ.   (2)

The confidence value assigned to a set, B ∈ Bθ, by a confidence structure is rationalized by the fact that it is the coverage probability of the largest confidence interval/region contained by that set. In the context of this paper, that is the "coverage probability criterion." For confidence structures that violate the Kolmogorov axioms,[4] it is useful to define the plausibility function, as follows:

PlsΘ|x(B) = 1 − CnfΘ|x(B^C) = Proa({a : Θ(a, x) ∩ B ≠ ∅}), ∀B ∈ Bθ.

A colloquial understanding for confidence and plausibility is relatively easy to establish. The confidence assigned to a proposition reflects the amount of affirmative support assigned to that proposition by the confidence structure. The plausibility of a proposition is simply the extent to which that proposition has not yet been proven false.

The theory of confidence structures is the most general, i.e., permissive, theory of inference motivated by coverage probability that has been attempted so far. This general phrasing has advantages over more traditional confidence distributions. In the theory of confidence structures, one can readily define a multidimensional confidence distribution. For example, consider the classic problem of normal inference with unknown mean and standard deviation, i.e., θ = (µ, σ). One can use the following pivots for x̄ and sx:

(x̄ − µ) / (σ/√n) ∼ n(0, 1)  and  sx² / (σ²/(n − 1)) ∼ χ²_{n−1}

to derive the following confidence structure for θ = (µ, σ):

Θ(a, x) = ( x̄ − (sx/√n) · a1/√(a2/(n − 1)) ,  sx/√(a2/(n − 1)) ),

where a1 ∼ n(0, 1), a2 ∼ χ²_{n−1}, and a1 and a2 are stochastically independent. In traditional theories of confidence distributions, even this simple and classic two-dimensional solution has yet to receive a formal representation (Xie and Singh, 2013; Schweder, 2007). The generality of confidence structures does not stop there. One can also define confidence structures on continuous parameters with discrete observables, as in binomial inference (Balch, 2012). One can also define a consonant confidence structure; these are explored in Section 2.3. Finally, all of this generality leads to a major result: that confidence structures can be propagated (Balch, 2012). This claim contradicts accepted wisdom about confidence distributions (Xie and Singh, 2013; Cox, 2006), and Section 3 briefly comments on this discrepancy of opinion.

[4] All random sets, including confidence structures, support belief functions (or, in this case, confidence functions) that satisfy the more general Shafer axioms (Shafer, 1976; Molchanov, 2005; Salicone, 2007).
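As a concrete check on criterion (1) for this two-dimensional structure, the following is a minimal numerical sketch. It is our own code, not from Balch (2012), and all variable names are ours. It inverts the map Θ to find the auxiliary pair (a1, a2) that lands on the true (µ, σ) and verifies that, over repeated samples, that pair falls inside a chosen auxiliary set A at least as often as Proa(A) demands.

```python
# Minimal sketch (not from the paper): empirical check of criterion (1) for
# the two-dimensional normal confidence structure. Names are ours.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu_true, sigma_true, n = 1.0, 2.0, 10

# Auxiliary set A: central 95% box for (a1, a2), so Pro_a(A) = 0.95**2.
z = stats.norm.ppf(0.975)
c_lo, c_hi = stats.chi2.ppf([0.025, 0.975], df=n - 1)

covered, trials = 0, 20000
for _ in range(trials):
    x = rng.normal(mu_true, sigma_true, size=n)
    xbar, sx = x.mean(), x.std(ddof=1)
    # Invert the map Theta(a, x): the auxiliary pair that sends this sample
    # onto the true (mu, sigma).
    a1 = (xbar - mu_true) / (sigma_true / np.sqrt(n))
    a2 = (n - 1) * sx**2 / sigma_true**2
    covered += (abs(a1) <= z) and (c_lo <= a2 <= c_hi)

print(covered / trials, ">= required", 0.95**2)  # approx 0.9025
```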

2.2 Traditional Confidence Distributions

In the theory of confidence structures defined in Balch (2012), a confidence distribution is any confidence structure whose confidence function satisfies the Kolmogorov axioms. Except where stated otherwise, that is the definition used in this paper. However, that definition is not universally recognized.

Traditionally, confidence distributions are much more strictly defined. Under most traditional definitions, a confidence distribution is an observation-dependent probability distribution on the inferred parameter, with the property that the cumulative distribution function has a uniform distribution at the true parameter value, over repeated draws of the data (Schweder and Hjort, 2002; Singh et al., 2007; Xie and Singh, 2013). That is, if FΘ|x is the cumulative distribution function on θ for a given observation, x, then FΘ|X(θ|X) ∼ U(0, 1) at the true value of θ. Phrased a little more precisely, that criterion is as follows:

ProX|θ({x : FΘ|x(θ) ≤ α}) = α, ∀θ ∈ R, α ∈ (0, 1).

This definition falls as a special case within the theory of confidence structures, in which the auxiliary variable is uniformly distributed on [0, 1] and the map function is the inverse cumulative distribution function for θ. That is, a ∼ U(0, 1) and Θ(a, x) = F⁻¹Θ|x(a).
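The uniformity requirement above is easy to check by simulation. The sketch below is our own code (it anticipates the t-based confidence distribution used in Section 3): it evaluates the confidence distribution's CDF at the true mean over repeated samples and confirms that the result behaves like a U(0, 1) draw.

```python
# Sketch (ours): at the true mu, F_{Theta|X}(mu) should be uniform on (0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu_true, sigma_true, n = 0.0, 1.0, 10

u = []
for _ in range(20000):
    x = rng.normal(mu_true, sigma_true, size=n)
    xbar, sx = x.mean(), x.std(ddof=1)
    # CDF of the t-based confidence distribution for mu, at the true mu.
    u.append(stats.t.cdf((mu_true - xbar) / (sx / np.sqrt(n)), df=n - 1))

print(np.quantile(u, [0.1, 0.5, 0.9]))  # should be near 0.1, 0.5, 0.9
```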

2.3 Consonant Confidence Structures

Consonant confidence structures, as defined in Balch (2012), are confidence structures that satisfy the axioms of possibility theory (Dubois and Prade, 2012), rather than probability theory. In its raw form, a consonant confidence structure is simply a nested set of confidence intervals/regions. As a result, any consonant confidence structure can be written or rewritten in terms of an auxiliary variable that is uniform on [0, 1], with the following property:

Θ(a1) ⊆ Θ(a2), ∀a1, a2 ∈ [0, 1] s.t. a1 ≥ a2.

Alternatively, a consonant confidence structure can be represented in terms of a pointwise plausibility function. That is, let r : Ωθ → [0, 1], with

rΘ|x(θ) = PlsΘ|x({θ}) = Proa({a : Θ(a, x) ∋ θ}), ∀θ ∈ Ωθ.   (3)

This is usually the most compact representation of a consonant confidence structure (or any possibility distribution). From this pointwise function, the plausibility and confidence assigned to a measurable set can be computed as follows:

PlsΘ|x(B) = sup_{θ∈B} rΘ|x(θ)  and  CnfΘ|x(B) = 1 − sup_{θ∉B} rΘ|x(θ).   (4)

[Figure 1 here. Left panel ("Taking a Confidence Interval"): the pointwise plausibility function r(θ) against the inferred parameter θ, with an α-slice at α = 0.4 giving a 60% confidence interval. Right panel ("Computation of Confidence and Plausibility"): two example sets, one with Pls = 0.41 and Cnf = 0, the other with Pls = 1 and Cnf = 0.353; also marked, r(−2) = 0.13.]

Figure 1: Basics of a consonant confidence structure. Left panel shows how to draw a confidence interval with a desired coverage probability. Right panel shows confidence and plausibility computed for two sets of interest via Equation 4 from point-wise plausibility, as defined in Equation 3.

This pair of formulas for computing confidence and plausibility is identical to the pair of formulas for computing the necessity and possibility functions in possibility theory (Salicone, 2007; Dubois and Prade, 2012). Figure 1 illustrates these basic properties of a consonant confidence structure.

Balch (2012) pursued consonant confidence structures principally because, in some problems, it is easier to derive a consonant confidence structure via p-values than it would be to derive a confidence distribution. However, in the present paper, it is found that consonant confidence structures may be the only confidence structures that consistently yield reliable inference.

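As a small illustration of Equation 4, the following sketch (our own; the pointwise plausibility curve below is a stand-in, not the exact curve of Figure 1) computes the plausibility and confidence of a set B from r(θ) on a grid.

```python
# Sketch (ours) of Equation 4 on a grid: Pls(B) = sup over B of r(theta),
# Cnf(B) = 1 - sup over the complement of B. The curve r is a stand-in.
import numpy as np

theta = np.linspace(-3, 3, 601)
r = np.clip(1 - np.abs(theta) / 2.5, 0.0, 1.0)  # pointwise plausibility

def pls_cnf(in_B):
    """Plausibility and confidence of B, given a boolean mask over the grid."""
    pls = r[in_B].max() if in_B.any() else 0.0
    cnf = 1.0 - (r[~in_B].max() if (~in_B).any() else 0.0)
    return pls, cnf

print(pls_cnf(theta >= 1.5))          # a set away from the peak: Cnf = 0
print(pls_cnf(np.abs(theta) <= 2.0))  # a set containing the peak: Pls = 1
```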

2.4 Martin–Liu Validity Criterion

Like the theory of confidence structures, the Martin–Liu IM framework produces an observation-conditional random set on the parameter being inferred. However, instead of rationalizing the confidence function in terms of coverage probability, Martin and Liu enforce a reliability-of-inference criterion. Specifically, they require that any true proposition be accorded plausibility with a uniform or right-of-uniform distribution (Martin and Liu, 2013, 2015a,c,b). That is, ∀θ ∈ Ωθ, the inferential model must satisfy the following:

∀B ∈ Bθ s.t. B ∋ θ, ProX|θ({x : PlsΘ|x(B) ≤ α}) ≤ α, ∀α ∈ [0, 1].

This is the Martin–Liu validity criterion. As already mentioned in Section 1, a result of this criterion is that only a true proposition can have a high probability of being assigned a high confidence value by a Martin–Liu IM.

Interestingly, Martin and Liu (2013, 2015b) have found that consonant IMs[5] are guaranteed to satisfy their validity criterion. That begs a question: Is there a connection between Martin–Liu IMs and consonant confidence structures and, if so, how deep is it? In Section 6, we prove that all consonant confidence structures satisfy the Martin–Liu validity criterion. First, though, Sections 3–5 illustrate and then demonstrate why it is advantageous to abandon the coverage probability criterion in favor of the Martin–Liu criterion. Specifically, it is shown that Kolmogorov-satisfying confidence structures (i.e., confidence distributions) assign copious and consistent amounts of confidence to fixed false propositions. In other words, confidence distributions consistently support misleading inferences.

[5] Martin and Liu (2013) refer to nested predictive random sets on the auxiliary variable used to define valid IMs, rather than directly referring to consonant IMs, but the effect and meaning are the same.

3 The |µ| Problem: A Simple Pedagogical Example

It is no secret that propagating confidence distributions, as though they were ordinary probability distributions, can lead to trouble (e.g., Schweder and Hjort, 2002; Xie and Singh, 2013; Cox, 2006). Nevertheless, Balch (2012) claims that it can be done. This section explores the propagation of uncertainty in an inferred parameter, µ, through a simple non-linear function, µ ↦ |µ|. What is found here is that, unless one insists on restricting confidence distributions to those that can be expressed via a monotone map (see Section 2.2), the propagated confidence distribution on |µ| is as valid (or invalid) as the original generally accepted confidence distribution on µ. The propagated confidence distribution on |µ| does support misleading inferences. Specifically, it consistently indicates that |µ| is larger than it really is. However, it turns out that the misleading inferences about |µ| have been lurking inside the confidence distribution for µ all along.

3.1 Problem Set-Up

Consider the classic problem of inferring the value of the mean, µ, of a normal population, based on an iid random sample drawn from X ∼ n(µ, σ). If σ is unknown, the confidence distribution for µ is readily defined as follows:

M(a, x) = x̄ − (sx/√n) F⁻¹_{n−1}(a), with a ∼ U(0, 1),

where M is the map function for µ, x̄ is the sample mean, sx is the sample standard deviation, n is the sample size, F⁻¹_{n−1} is the inverse cumulative distribution function for a t-distribution with n − 1 degrees of freedom, and a is the unit uniform auxiliary variable. This definition conforms to both the theory of confidence structures and the more traditional definition of a confidence distribution.

[Figure 2 here. Three panels: a confidence distribution for µ ("a confidence distribution, by both definitions"), a confidence structure for |µ| ("a confidence distribution according to Balch (2012)"), and a cumulative distribution function for |µ| ("not a confidence distribution, by either definition"), each plotting the auxiliary variable or cumulative confidence against the parameter.]

Figure 2: Propagated confidence distribution for |µ|.

To this problem is added a small twist. Instead of inquiring about the signed value of µ, the analyst is interested only in the magnitude of µ, i.e., |µ|. In the theory of confidence structures, the following map for |µ| is perfectly valid:

|M|(a, x) = | x̄ − (sx/√n) F⁻¹_{n−1}(a) |, with a ∼ U(0, 1).

In the more traditional definition for confidence distributions, this map is unacceptable because it is non-monotonic. Only the inverse cumulative distribution function would be acceptable as the map, and intervals drawn from that map do not have coverage probability properties. Figure 2 illustrates an example confidence distribution for µ, the corresponding confidence distribution for |µ| (valid only in the theory of confidence structures), and the corresponding cumulative distribution function for |µ|, which is generally agreed not to support a coverage probability interpretation.
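The propagation itself is mechanical, as the following sketch shows. This is our own code, following the push-through-the-map construction of Balch (2012), with names of our choosing: the same auxiliary draws are sent through M and through |M|, and confidence is read off as auxiliary probability.

```python
# Sketch (ours): propagate the t-based confidence distribution for mu through
# mu -> |mu| by pushing auxiliary draws through the map, per Balch (2012).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, sigma = 10, 1.0
x = rng.normal(0.0, sigma, size=n)          # data with true mu = 0
xbar, sx = x.mean(), x.std(ddof=1)

a = rng.uniform(size=100000)                # unit uniform auxiliary variable
mu_draws = xbar - (sx / np.sqrt(n)) * stats.t.ppf(a, df=n - 1)   # M(a, x)
abs_mu_draws = np.abs(mu_draws)                                  # |M|(a, x)

# Confidence of a proposition = auxiliary probability of landing in the set.
t0 = 0.1 * sigma / np.sqrt(n)
print((abs_mu_draws >= t0).mean())          # Cnf(|mu| >= 0.1*sigma/sqrt(n))
```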


3.2 Misleading Inference

What is potentially troubling about this example, at least for an adherent to the theory of confidence structures, is that the valid confidence structure in the middle panel of Figure 2 supports the same confidence function as the invalid confidence distribution illustrated in the right-most panel of Figure 2. Both representations of uncertainty will assign the same level of confidence to any set of interest. Does that represent a potential for trouble?

In fact, it does. That trouble arises when the true value of |µ| is small relative to the standard deviation, σ, of the population being sampled. For simplicity, suppose the true value of µ is exactly zero. Trivially, one will always obtain 100% confidence that |µ| > 0, by virtue of the fact that a continuous confidence distribution cannot assign positive confidence to a single point, in this case, {0}. Somewhat less trivial is the fact that the confidence distribution for |µ| is driven by σn^{−1/2}. No matter how small the true value of |µ| is, the propagated confidence distribution will almost always indicate that |µ| is at least on the same scale as σn^{−1/2}. For example, over repeated draws of the data, the propagated confidence distribution for |µ| will always have at least a 90% probability of assigning at least a 90% confidence to the false proposition that |µ| ≥ 0.1σn^{−1/2}; see Figure 3. Again, this holds even if the true value of |µ| is zero.
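The phenomenon behind Figure 3 can be reproduced directly. The following Monte Carlo sketch is our own: with µ = 0, it computes the confidence assigned to the false proposition |µ| ≥ 0.1σn^{−1/2} over repeated samples and reports how often that confidence exceeds 0.90.

```python
# Monte Carlo sketch (ours) of the false-confidence effect in Figure 3.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, sigma = 10, 1.0
thresh = 0.1 * sigma / np.sqrt(n)

conf = []
for _ in range(5000):
    x = rng.normal(0.0, sigma, size=n)      # true mu = 0, so the proposition
    xbar, sx = x.mean(), x.std(ddof=1)      # |mu| >= thresh is false
    scale = sx / np.sqrt(n)
    # Cnf(|mu| >= thresh) = Cnf(mu <= -thresh) + Cnf(mu >= thresh)
    conf.append(stats.t.cdf((-thresh - xbar) / scale, df=n - 1)
                + stats.t.sf((thresh - xbar) / scale, df=n - 1))

print(np.mean(np.array(conf) >= 0.90))      # probability of >= 90% confidence
```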

3.3 Interpretation

Now, one could take this problem as a clear and obvious rationale for the traditional position that confidence distributions should not be propagated. The difficulty with this view is that the misleading inferences supported by the propagated confidence distribution for |µ| are also supported by the traditionally accepted confidence distribution for µ. For example, the proposition {|µ| ≥ 0.1σn^{−1/2}} is equivalent to the proposition {µ ≤ −0.1σn^{−1/2}} ∨ {0.1σn^{−1/2} ≤ µ}. And the confidence being assigned to {|µ| ≥ 0.1σn^{−1/2}} by the propagated confidence distribution for |µ| is exactly equal to the confidence assigned to {µ ≤ −0.1σn^{−1/2}} ∨ {0.1σn^{−1/2} ≤ µ} by the traditionally accepted confidence distribution for µ. It bears re-iterating that these propositions will have a high probability of being assigned high amounts of confidence, regardless of whether or not the true value of |µ| is larger than 0.1σn^{−1/2}.

[Figure 3 here. A density plot of the computed confidence values.]

Figure 3: Approximate sampling distribution of CnfX({µ : |µ| ≥ 0.1σn^{−1/2}}), where X = (X1, . . . , Xn) iid n(0, σ²), with n = 10 and σ = 1.

When a proposition has a high probability of being assigned a high confidence regardless of whether or not it is true, that confidence is false confidence; see Section 5.1. The false confidence assigned by the propagated distribution for |µ| is intuitively obvious. The false confidence assigned by the confidence distribution for µ is just as false and just as much the product of a misleading inference, but it usually goes unnoticed because it is assigned to a proposition with which an analyst would not ordinarily be concerned. The next section focuses on a problem in which analysts are concerned with exactly that kind of proposition.

4 Satellite Conjunction Analysis: A Practical Example

A satellite conjunction is an event in which either two satellites or a satellite and a piece of tracked debris are estimated to pass near each other. It usually results in a near-miss but, unfortunately, collisions do happen (Kessler et al., 2010). Conjunction analysis is the process by which a satellite operator computes the risk of collision during a conjunction event (Council et al., 2011). Increasingly, satellite operators (and their suppliers) are moving toward an approach in which they attempt to compute the "probability" of collision during a conjunction event (Alfano and Oltrogge, 2016). This is an epistemic probability, as opposed to an aleatory (i.e., frequentist) probability. The satellite paths are not random, in the frequentist sense of the word. Rather, they are fixed but uncertain due to random errors in observations used to estimate the orbits (Alfano and Oltrogge, 2016; Balch, 2016).

The distribution used to represent the uncertainty in the satellite paths can readily be understood as a confidence distribution. The satellite orbit estimates (and the uncertainty therein) are tracked via a Kalman filter (Kalman et al., 1960), a well-established method among engineers specializing in guidance, navigation, and control. Under certain assumptions, the Kalman filter produces unbiased estimates of the state vector (i.e., the position and velocity of the two satellites) with the estimation errors having a multivariate normal distribution.[6] Let XT be the true, unknown, state vector; that is, the positions and velocities of the two satellites at closest approach. Let X̂ be the estimate of that state produced by the Kalman filter. Finally, let C represent the known covariance matrix of the estimation errors. The estimate of the system state can be expressed as follows:

X̂ = XT + EC √ΛC ξ,

where EC is a matrix whose columns are the eigenvectors of C, ΛC is the diagonal matrix whose entries are the eigenvalues of C, and ξ is an iid unit normal random variable (i.e., an auxiliary variable). While the full multivariate confidence distribution for this problem is only readily defined in the theory of confidence structures, each of the marginal confidence distributions (on each of the scalar elements of XT) satisfies the traditional definition of a confidence distribution.

With this confidence distribution in hand, it should be a straightforward matter to compute the epistemic "probability" (i.e., confidence) assigned to the prospect of collision in a given conjunction analysis. Mathematically speaking, the computation is relatively simple. In fact, this is as simple a problem as an advocate for confidence distributions is likely to find in a real-world engineering context.[7] Unfortunately, interpreting the result of this calculation is not a simple task. As illustrated in Figure 4, for a given satellite size, R, and estimated distance D at closest approach, past a certain point, as the uncertainty, S, in the orbit estimates increases, the epistemic "probability" of collision decreases. That is to say, the epistemic "probability" computed from the confidence distribution indicates that, as data quality decreases, the satellites get safer. This problem is known as "probability dilution", and it is a serious red flag for engineers attempting to calculate collision risks. To any engineer worth his or her salt, the idea that less accurate data somehow make a system safer is foolish on its face (Balch, 2016).

[6] Plenty of objections can be raised against the Kalman filter as it is applied to orbit estimation. It is worth noting, however, that there are variants that compensate for the worst deficiencies (Julier and Uhlmann, 1997). In any event, this is the common basis for all conjunction analysis today. And the biggest problem that engineers face is just getting the computation of risk correct, given this model. If risk numbers of the correct order of magnitude can be obtained, that will be a vast improvement over the current situation (Balch, 2016).
[7] Multiple sources can be consulted for the details of this calculation (e.g., Balch, 2016; Patera, 2001; Alfano, 2003; Alfano and Oltrogge, 2016).

[Figure 4 here. Two panels: computed probability of collision, on linear and log10 scales, against log10 of S/R, for D/R ranging from 0.5 to 1000.]

Figure 4: Computed Collision Probability as a Function of D/R and S/R, where D is nominal distance between satellites at closest approach, R is combined radius of the two satellites, and S is the standard deviation of the relative position of the two satellites at closest approach. From Balch (2016) with permission.

There have been multiple attempts to "fix" or avoid probability dilution, for example, Alfano (2003), Frigm (2009), Plakalović et al. (2011), Sun et al. (2014), and Balch (2016). However, none of these attempts has been based in a thorough understanding of what exactly is wrong with probability dilution.

Balch (2016) noted that, were the probability of collision aleatory, rather than epistemic, then probability dilution would actually make sense. For example, suppose two satellites were known, with certainty, to be on a collision course and the satellite operator could only impart an impulse of random magnitude in a random direction. (Hypothetically, this impulse could be the result of an uncontrolled maneuvering thruster.) The higher the variance of that added impulse, the bigger the resulting perturbation is likely to be, and hence the lower the resulting probability of collision. So, the problem with probability dilution is not the math. The math makes sense, in the right context. Probability dilution is only counter-intuitive when the orbit "variance" reflects the limits of data quality, rather than genuine aleatory variability in the orbits themselves.

The final clue about the nature of probability dilution is the fact that, for a fixed S/R ratio, there is a maximum possible computable probability of collision. Figure 5 illustrates this maximum computable probability. Whether or not the two satellites are on a collision course, no matter what results are obtained, the analyst will have a certain minimum confidence that the satellites will not collide. That minimum confidence is determined purely by the data quality. For example, if the uncertainty in the distance between two satellites at closest approach is ten times the combined size of the two satellites,[8] the analyst will always compute at least a 99% confidence that the satellites are safe. That is absurd.

[Figure 5 here. Two panels: maximum collision probability, on linear and log10 scales, against log10 of S/R.]

Figure 5: Maximum Computable Probability of Collision, given S/R. This occurs at D/R = 0. From Balch (2016) with permission.

[8] Unfortunately, greater relative uncertainties than that are not at all unheard of in satellite conjunction analysis. For example, the morning of the collision between Iridium 33 and Cosmos 2251, the estimated distance at closest approach between the two satellites was 584 meters (Council et al., 2011), and the combined effective radius of the two satellites was less than ten meters (ast, 2017; Garrison et al., 1997). Even though a formal statement of the acknowledged uncertainty in the 584 meter estimate is difficult to track down, since the true error in the estimate was over fifty times larger than the combined satellite size, one can safely assume that the S/R ratio was greater than ten.
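Probability dilution is easy to reproduce in a stripped-down setting. The following is a two-dimensional toy of our own construction; it follows the general setup in Balch (2016) but is not his code, and the function name and parameter values are ours. The confidence mass inside the collision disk shrinks once the position uncertainty S dominates the estimated miss distance.

```python
# Toy sketch (ours) of probability dilution in a 2-D encounter plane.
import numpy as np

def collision_confidence(D, R, S, m=200000, seed=0):
    """Confidence mass inside the disk of radius R, with the relative
    position uncertain (std. dev. S) about the estimate (D, 0)."""
    rng = np.random.default_rng(seed)
    pos = rng.normal([D, 0.0], S, size=(m, 2))
    return (np.linalg.norm(pos, axis=1) <= R).mean()

D, R = 5.0, 1.0
for S in [0.5, 1.0, 2.0, 5.0, 20.0, 100.0]:
    print(S, collision_confidence(D, R, S))
# Past a point, larger S (worse data) yields a smaller computed collision
# "probability" -- the dilution effect.
```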

5 The Root Cause: False Confidence

There is a common problem at play in the examples given in Sections 3–4. In Section 3, even if |µ| = 0, the proposition |µ| ≥ 0.1σn^{−1/2} will almost always be assigned a high confidence value. The complaint in the satellite collision risk calculation problem is that, for high ratios of orbit track uncertainty to satellite size, a high confidence is guaranteed to be assigned to the proposition that the satellites will not collide, regardless of whether or not they actually are on a collision course. In both these examples, the fundamental problem is that certain propositions have a high probability of being assigned a high confidence, regardless of whether or not they are true.

5.1 Definition

Strictly speaking, one could decry any confidence (or belief) assigned to a false proposition as "false confidence". However, since the whole point of statistical inference is that one does not know (a priori) the exact true value of the parameter being inferred, it is to be expected that, in the course of any given inference, some false propositions will be assigned some amount of confidence. The problem exemplified in Sections 3–4 is more severe: there are certain fixed false propositions that are guaranteed or nearly guaranteed to be assigned a high confidence. This suggests a formal definition for (uncontrolled) false confidence.

An inferential method can formally be said to suffer from false confidence if, for any arbitrarily specified fixed level of confidence and probability of assignment, short of unity, there is at least one false proposition to which the method will assign (at least) the prescribed level of confidence with (at least) the prescribed probability. Next we show that all (or nearly all) confidence distributions used in practice suffer from false confidence.

5.2 Theorem

Most statistical inference in science and engineering involves one or more real-valued continuous parameter(s) inferred from continuous real-valued observables. Let the observable be X ∈ Rⁿ, and let the parameter be θ ∈ Ωθ ⊆ Rᵐ, where m is the number of parameters being inferred. The confidence distributions commonly used in practice are, for all values of the observable, continuous distributions over the parameter(s) being inferred. That is, confidence assigned by the confidence distribution CnfΘ|x can be represented via a probability density function, say, cx(θ), depending on the observable(s) x. We assume that cx(θ) is a continuous function of x for fixed θ. These assumptions describe an "ordinary" confidence distribution in the sense that most confidence distributions, approximate confidence distributions, and Bayesian posteriors used in practice satisfy all of these assumptions. The false confidence theorem states that all confidence distributions satisfying these assumptions suffer from false confidence. Formal statement of the theorem and proof follow.

Theorem 1. Consider a confidence distribution CnfΘ|x characterized by a probability density function cx(θ) on Ωθ satisfying the assumptions above. Then for any θ ∈ Ωθ, any α ∈ (0, 1), and any p ∈ (0, 1), there exists a set A ⊂ Ωθ such that

A ∌ θ  and  ProX|θ({x : CnfΘ|x(A) ≥ 1 − α}) ≥ p.

Proof. For a fixed realization, x, set c̄x = supθ cx(θ). Then for any bounded set B ⊂ Ωθ,

CnfΘ|x(B) = ∫_B cx(θ) dθ ≤ c̄x v(B),   (5)

where v(B) denotes the volume[9] of B. As a function of X, for fixed θ, c̄X has a distribution; let ξ = ξθ,p ∈ (0, ∞) denote the pth quantile of the distribution of c̄X, i.e., ProX|θ({x : c̄x ≤ ξ}) = p. Choose a neighborhood B ∋ θ with volume v(B) = α/ξ, so that v(B)ξ = α. Define A as the complement of B. That is, A = {θ ∈ Ωθ : θ ∉ B}. Since confidence distributions satisfy the third Kolmogorov axiom, we have

CnfΘ|x(A) = 1 − CnfΘ|x(B).   (6)

By definition, A ∌ θ; by (5) and (6), we have

c̄X ≤ ξ  ⟹  CnfΘ|x(B) ≤ v(B)ξ  ⟺  CnfΘ|x(B) ≤ α  ⟺  CnfΘ|x(A) ≥ 1 − α.

The definition for ξ stipulates that the left-most event occurs with probability p and, therefore, the right-most event occurs with at least probability p, proving the claim.

[9] Or length, area, or hyper-volume, depending on the dimensionality of θ.
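For intuition, the theorem can be checked numerically in the normal-mean setting of Section 3. The following is our own sketch: take B = (−ε, ε) around the true µ = 0, let A be its complement, and watch the probability that A receives confidence at least 1 − α approach one as ε shrinks.

```python
# Numerical sketch (ours) of the false confidence theorem for the t-based
# confidence distribution on a normal mean, with true mu = 0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, alpha, eps = 10, 0.05, 1e-4     # B = (-eps, eps); A = complement of B

hits, trials = 0, 5000
for _ in range(trials):
    x = rng.normal(0.0, 1.0, size=n)
    xbar, sx = x.mean(), x.std(ddof=1)
    scale = sx / np.sqrt(n)
    cnf_B = (stats.t.cdf((eps - xbar) / scale, df=n - 1)
             - stats.t.cdf((-eps - xbar) / scale, df=n - 1))
    hits += (1.0 - cnf_B) >= 1.0 - alpha   # Cnf(A) = 1 - Cnf(B)

print(hits / trials)   # tends to 1 as eps -> 0
```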

5.3 Practical Importance

It may seem like a cheat to prove the above theorem by defining an arbitrarily small neighborhood around the true parameter value and then marveling that such a small neighborhood is assigned such a small confidence. After all, inferences are uncertain, and it is natural that a relatively precise proposition (i.e., small set) should be assigned a small confidence value, even if it is true. However, that is not the heart of the false confidence problem. The heart of the false confidence problem is that, since distributional representations of uncertainty satisfy the Kolmogorov axioms, any confidence not assigned to a true (but narrow) proposition must be assigned to its false (but broad) complement. The ultimate cause of the false confidence problem is additivity, i.e., the third Kolmogorov axiom.

Now, again, one could insist that the false confidence being assigned to the complement of a small neighborhood around the true parameter value is a mere theoretical curiosity of no practical importance. Unfortunately, that is not true. In the examples of Sections 3–4, the false confidence being assigned is specifically being assigned to the complement of a neighborhood about a parameter value of interest. In the |µ| example, the neighborhood of interest is the neighborhood around µ = 0. Since that neighborhood's complement is assigned large amounts of false confidence, even if |µ| really is small, the confidence distribution on |µ| will indicate that it is large. Similarly, in the satellite collision risk calculation, the relatively small neighborhood of interest is the set of displacement values indicative of an impending collision between the two satellites. Even if the two satellites are on a collision course, the analyst computing collision "probabilities" actually has little to no chance of correctly detecting an impending collision, because large amounts of false confidence are being assigned to the complement of that neighborhood of interest.

These problems matter. Section 3 may be a toy example problem, but it represents a whole class of problems in which inferential uncertainty has to be propagated through non-linear functions. Non-linear propagation happens. Similarly, the satellite collision risk problem represents a class of engineering problems with bounded failure domains. Problems with closed failure domains happen; in fact, they happen in any collision avoidance problem, not just those that involve satellites in orbit. The fact that these problems cast an ugly light on the performance of confidence distributions does not make them somehow degenerate or invalid. Engineers and scientists still have to deal with them.

Also, there is no obvious reason to think that false confidence is only ever assigned to the full complement of a vanishingly small neighborhood. An arbitrarily high false confidence may only be assigned to the complement of a small neighborhood, but a moderate or high amount of false confidence can still be assigned to the complement of a larger neighborhood around the true value of the parameter. Moreover, the false confidence assigned to a set is also potentially assigned to its various subsets. Consider the example in Section 3. Large amounts of false confidence are assigned to propositions of the form {µ ≤ −0.1σn^{−1/2}} ∨ {0.1σn^{−1/2} ≤ µ}. How much of that false confidence is assigned to {µ ≤ −0.1σn^{−1/2}}, and how much is assigned to {µ ≥ 0.1σn^{−1/2}}? Phrased more precisely, how much of the confidence assigned to each of those two propositions is false confidence? Similarly, how much of the confidence assigned to {µ ≤ −0.2σn^{−1/2}} or {µ ≥ 0.2σn^{−1/2}} is false confidence? How much of the confidence assigned to {µ ≤ −0.3σn^{−1/2}} or {µ ≥ 0.3σn^{−1/2}} is false confidence? Given only the traditional confidence distribution on µ, it is impossible to tell. In short, false confidence casts doubt on all of the inferences supported by a confidence distribution. A practical theory of statistical inference must at least control the rate at which false confidence is assigned.

5.4 Implications for Bayesian Inference

The problem of false confidence is not limited to confidence distributions; it afflicts Bayesian posteriors as well. In fact, as the generality of the proof presented in Section 5.2 implies, any Kolmogorov-satisfying belief function on an inferred parameter (with a fixed unknown value), by whatever means obtained, will support misleading inferences. Even if a Bayesian posterior approximates or asymptotically converges to a confidence distribution, that is still not a good rationalization for using the Kolmogorov-satisfying belief function supported by that Bayesian posterior.

That being said, there is one possible class of exception to the application of this result to Bayesian posteriors. The false confidence theorem is built on the assumption that the parameter being inferred has a fixed true value. There are situations in which the parameter being inferred can reasonably be construed as having been randomly drawn from a known prior distribution representing genuine aleatory variability (von Mises, 1942; Fisher, 1957). If this is the case, then theoretically, the posterior also represents genuine aleatory variability; Bayes' rule has simply updated the distribution from which θ is supposed to have been drawn. In this context, it could be argued that there are no fixed "true" and "false" propositions, because the inferred parameter is not being modeled as having a fixed true value, but rather as a realization of a random process. If one were to accept that argument, then the result of Section 5.2 would be rendered moot.

In fact, the question of how seriously the reader takes the false confidence theorem may be entirely governed by the question of how seriously the reader takes the supposition that θ has a fixed true value, as opposed to having been randomly drawn from some relevant prior distribution. A full consideration of this question is beyond the scope of this paper. We suggest, though, that this question is more practical than philosophical. The answer to it may vary from problem to problem. Moreover, it is also fair to challenge the Bayesian-minded reader to ask himself or herself the reverse question: Do I take my prior distribution so literally, as representing aleatory variability, that I can safely ignore the false confidence theorem? The answer to this question may also vary from problem to problem. The remainder of this paper continues, though, as before, on the assumption that any given act of statistical inference concerns a parameter with a fixed, albeit uncertain, true value.

6 Martin–Liu Validity of Consonant Confidence Structures

The Martin–Liu validity criterion explicitly controls the rate at which false confidence can be assigned to a proposition. Recall the criterion in its plausibility form:

∀B ∈ Bθ s.t. B ∋ θ, ProX|θ({x : PlsΘ|x(B) ≤ α}) ≤ α, ∀α ∈ [0, 1].   (7)

Since CnfΘ|x(B) = 1 − PlsΘ|x(B^C), this criterion can be rewritten as follows:

∀A ∈ Bθ s.t. A ∌ θ, ProX|θ({x : CnfΘ|x(A) ≥ 1 − α}) ≤ α, ∀α ∈ [0, 1].

That is, any false proposition has, at most, a probability of α of being assigned a confidence value equal to or greater than 1 − α. As mentioned in Section 2.4, there is a connection between Martin–Liu IMs and consonant confidence structures. Specifically, the following theorem states that all consonant confidence structures satisfy the Martin–Liu criterion.

Theorem 2. A consonant confidence structure satisfies the Martin–Liu validity criterion.

Proof. There are two equivalent ways of representing a consonant confidence structure: either via a pointwise plausibility function, rΘ|x : Ωθ → [0, 1], or via a consonant map from a unit uniform auxiliary variable. That is, Θ : [0, 1] × Ωx → Bθ s.t. Θ(a1, x) ⊆ Θ(a2, x), ∀x ∈ Ωx, a1 ≥ a2. Taking the consonant map representation as primary, the pointwise plausibility function is computed as follows:

rΘ|x(θ) = PlsΘ|x({θ}) = Proa({a : Θ(a, x) ∋ θ}), ∀θ ∈ Ωθ.

Since the map is ordered and consonant, the following equivalence holds:

∀α ∈ [0, 1], θ ∈ Ωθ,  rΘ|x(θ) ≤ α ⟺ θ ∉ ∪_{a∈(α,1]} Θ(a, x).

That is, θ being assigned a pointwise plausibility value less than or equal to α is equivalent to Θ(a, x) failing to cover θ, for all a greater than α. Phrased in terms of probabilities over repeated draws of X, the above equivalence leads to the following equality:

∀α ∈ [0, 1], θ ∈ Ωθ,  ProX|θ({x : rΘ|x(θ) ≤ α}) = ProX|θ({x : ∪_{a∈(α,1]} Θ(a, x) ∌ θ}).   (8)

In other words, the probability of θ being assigned a pointwise plausibility value less than or equal to α is equal to the probability of Θ(a, x) failing to cover θ, for all a greater than α. Now, recall the requirement (1) imposed on all confidence structures, consonant or otherwise. Then ∪_{a∈(α,1]} Θ(a, x) must cover the true value of θ with at least a probability of 1 − α. That is,

ProX|θ({x : ∪_{a∈(α,1]} Θ(a, x) ∋ θ}) ≥ 1 − α, ∀θ ∈ Ωθ, α ∈ [0, 1].

Rearranging obtains

ProX|θ({x : ∪_{a∈(α,1]} Θ(a, x) ∌ θ}) ≤ α, ∀θ ∈ Ωθ, α ∈ [0, 1].

Combining this relationship with (8) yields the following:

ProX|θ({x : rΘ|x(θ) ≤ α}) ≤ α,

which means that rΘ|X(θ) has a right-of-uniform distribution. Since PlsΘ|x(B) ≥ rΘ|x(θ) for all B ∋ θ, it follows that PlsΘ|X(B) is also right-of-uniform for any B ∋ θ, hence (7).

Not only do consonant confidence structures satisfy the Martin–Liu validity criterion, Martin (2014) has also found that Martin–Liu IMs result in consonant confidence structures. The two approaches are near-identical. In fact, the only practical distinction between consonant confidence structures and Martin–Liu IMs is that the latter have a prescribed normative method of construction, whereas consonant confidence structures do not. Balch (2012) suggests a method for constructing consonant confidence structures via p-values, but this method is not asserted to be normative in any way. Recent efforts suggest that even this distinction may not be substantive, but the details will be presented elsewhere.
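Theorem 2 can be seen at work in the normal-mean problem. The following sketch is ours; the consonant structure is the standard one built from nested central t-intervals, for which the pointwise plausibility reduces to r(µ) = 2 min(F, 1 − F). The plausibility at the true µ is (right-of-)uniform over repeated samples, exactly as the validity criterion demands.

```python
# Sketch (ours): Martin-Liu validity of the consonant structure built from
# nested central t-intervals for a normal mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, mu_true = 10, 0.0

r_at_truth = []
for _ in range(20000):
    x = rng.normal(mu_true, 1.0, size=n)
    xbar, sx = x.mean(), x.std(ddof=1)
    F = stats.t.cdf((mu_true - xbar) / (sx / np.sqrt(n)), df=n - 1)
    r_at_truth.append(2 * min(F, 1 - F))   # pointwise plausibility at truth

r_at_truth = np.array(r_at_truth)
for a in [0.05, 0.25, 0.50]:
    print(a, (r_at_truth <= a).mean())     # each frequency should be <= a
```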

7 Martin–Liu Validity of Confidence Intervals

Confidence intervals have traditionally been justified in terms of the coverage probability that they guarantee, but they also satisfy the Martin–Liu validity criterion. While coverage probability, on its own, is not enough to guarantee reliable inference, as shown in the previous section, coverage probability and consonance together are adequate. Confidence intervals are trivially consonant. The plausibility accorded to any point inside a 100(1 − α)% confidence interval, for fixed α, is (practically speaking) unity;[10] the plausibility accorded to any point outside the confidence interval is α. Any set containing the confidence interval is assigned a confidence of 100(1 − α)%. Any set failing to contain the confidence interval is assigned zero confidence. In short, a confidence interval is an extremely coarse consonant confidence structure, defined as follows:

Θ(a, x) = { R,                 a ∈ [0, α),
          { [θL(x), θR(x)],    a ∈ [α, 1],

with a ∼ U(0, 1). As a consonant confidence structure, Theorem 2 implies that a confidence interval satisfies the Martin–Liu validity criterion, albeit in an inefficient manner. Figure 6 illustrates a confidence interval and its performance (inefficiently) satisfying the Martin–Liu validity criterion.

Even without Theorem 2, it is trivially easy to demonstrate that confidence intervals satisfy the Martin–Liu validity criterion, provided that one is willing to accept the intuitive assignments of plausibility described in the previous paragraph. A 100(1 − α)% confidence interval has a probability of α of assigning a plausibility of α to the true parameter value and a probability of 1 − α of assigning a plausibility of 1 to the true parameter value. That distribution of plausibility is strictly right-of-uniform. It is extremely right-of-uniform. So, it satisfies the Martin–Liu validity criterion. Interestingly, in this context, the coverage probability criterion is simply the requirement that the confidence interval have a probability of 1 − α of assigning a plausibility of 1 to the true parameter value. In this light, coverage probability can be seen as merely an incomplete and inefficient substitute for the Martin–Liu validity criterion.

[10] While one has the general sense that values near the edge of a (central) confidence interval are less plausible than those in the middle of the interval, practically speaking, if all one retains is the fixed-α confidence interval, then that kind of information has been thrown away.
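The two-point plausibility law described above is simple enough to verify directly. This is a sketch of our own, using the standard 95% t-interval:

```python
# Sketch (ours): a 95% t-interval as a coarse consonant confidence structure.
# The plausibility it accords the true mu is alpha when the interval misses
# and 1 when it covers -- an extremely right-of-uniform distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, mu_true, alpha = 10, 0.0, 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)

pls = []
for _ in range(20000):
    x = rng.normal(mu_true, 1.0, size=n)
    half = tcrit * x.std(ddof=1) / np.sqrt(n)
    pls.append(1.0 if abs(x.mean() - mu_true) <= half else alpha)

print(np.mean(np.array(pls) == 1.0))   # coverage, close to 1 - alpha = 0.95
```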

[Figure 6 here. Left panel ("Confidence Interval"): pointwise plausibility against the inferred parameter θ. Right panel ("Martin–Liu Validity of Confidence Interval"): CDF of the pointwise plausibility at the true parameter value.]

Figure 6: A 95% confidence interval illustrated as a coarse consonant confidence structure. The more nuanced consonant confidence structure from which it was derived is illustrated with a dashed line.

8 Conclusions

This paper started with a review of confidence distributions, in the broader context of the theory of confidence structures. Two examples were explored, in which confidence distributions support plainly misleading inferences. The core problem with confidence distributions was then identified as false confidence. It was then proved that false confidence is present in all or nearly all confidence distributions used in practice. This finding also applies to approximate confidence distributions and most Bayesian posteriors. Methods suffering from the problem of false confidence were then contrasted with methods that satisfy the Martin–Liu validity criterion, which explicitly eliminates the problem of uncontrolled false confidence. It was found that, within the theory of confidence structures, consonant confidence structures are guaranteed to satisfy the Martin–Liu validity criterion.

It was also found that confidence intervals satisfy the Martin–Liu validity criterion. The failure of confidence distributions is due to a fundamental weakness in coverage probability as a criterion for statistical inference. Mathematically speaking, the difference between consonant confidence structures and confidence distributions is that, while a consonant confidence structure only represents a consonant set of confidence regions, a confidence distribution represents an entire Borel algebra of confidence regions. In terms of coverage probability, both approaches are valid. In fact, were coverage probability an adequate criterion, there would be every reason to prefer confidence distributions over consonant confidence structures. However, when evaluated in terms of how a confidence structure assigns confidence (and plausibility) to fixed propositions over repeated draws of the data, the deficiencies of confidence distributions become all too clear. The misleading inferences supported by confidence distributions in both non-linear propagation problems and collision risk calculations are revealed to be the result of a deep problem with false confidence that is ever-present in confidence distributions. Simply put, confidence distributions stretch the coverage probability criterion past its breaking point.

Since any statistical method satisfying the Martin–Liu validity criterion also has coverage probability properties, the Martin–Liu validity criterion is a stronger criterion than coverage probability. The fact that confidence intervals satisfy this stronger criterion means that the statistics community has long understated the empirical rigor of confidence intervals. Confidence intervals not only cover the true value of the parameter being inferred at a promised rate, they also assign confidence and plausibility to fixed propositions (i.e., sets) in a way that is statistically reliable, albeit inefficient relative to more nuanced consonant confidence structures.


References

(Accessed May 13, 2017). Strela-2m. In Wade, M., editor, Encyclopedia Astronautica. http://www.astronautix.com/craft/strela2m.htm.

Alfano, S. (August, 2003). Relating position uncertainty to maximum conjunction probability, AAS-paper 03-548. In AAS/AIAA Astrodynamics Specialists Conference, Big Sky, Montana.

Alfano, S. and Oltrogge, D. (2016). Probability of collision: Valuation, variability, visualization and validity. In AIAA/AAS Astrodynamics Specialist Conference, page 5654.

Balch, M. S. (2012). Mathematical foundations for a theory of confidence structures. International Journal of Approximate Reasoning, 53:1003–1019.

Balch, M. S. (2016). A corrector for probability dilution in satellite conjunction analysis. In 18th AIAA Non-Deterministic Approaches Conference, page 1445.

Balch, M. S. and Smarslok, B. P. (2014). A pre-validation study on supersonic wind tunnel data collected from legacy aerothermal experiments. In 16th AIAA Non-Deterministic Approaches Conference, page 0812.

Barnett, V. (1999). Comparative Statistical Inference. Wiley and Sons, Chichester, New York, 3rd edition.

Casella, G. and Berger, R. L. (2002). Statistical Inference. Thomson Learning, Pacific Grove, California, 2nd edition.

Council, N. R. et al. (2011). Limiting Future Collision Risk to Spacecraft: An Assessment of NASA's Meteoroid and Orbital Debris Programs. National Academies Press.

Cox, D. R. (2006). Principles of Statistical Inference. Cambridge University Press, New York, NY.

Dubois, D. and Prade, H. (2012). Possibility Theory: An Approach to Computerized Processing of Uncertainty. Springer Science & Business Media.

Ferson, S., Balch, M., Sentz, K., and Siegrist, J. (2013). Computing with confidence. In Proceedings of the Eighth International Symposium on Imprecise Probability: Theory and Applications.

Ferson, S., O'Rawe, J., and Balch, M. (2014). Computing with confidence: imprecise posteriors and predictive distributions. In Vulnerability, Uncertainty, and Risk: Quantification, Mitigation, and Management, pages 895–904.

Fisher, R. A. (1957). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh, United Kingdom.

Frigm, C. (2009). A single conjunction risk assessment metric: the F-value. Advances in the Astronautical Sciences, 135(2):1175–1192.

Garrison, T., Ince, M., Pizzicaroli, J., and Swan, P. (1997). Systems engineering trades for the Iridium constellation. Journal of Spacecraft and Rockets, 34(5):675–680.

Julier, S. J. and Uhlmann, J. K. (1997). New extension of the Kalman filter to nonlinear systems. In AeroSense'97, pages 182–193. International Society for Optics and Photonics.

Kalman, R. E. et al. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45.

Kessler, D. J., Johnson, N. L., Liou, J., and Matney, M. (2010). The Kessler syndrome: implications to future space operations. Advances in the Astronautical Sciences, 137(8):2010.

Martin, R. (2014). Random sets and exact confidence regions. Sankhya A, 76(2):288–304.

Martin, R. and Liu, C. (2013). Inferential models: A framework for prior-free posterior probabilistic inference. Journal of the American Statistical Association, 108(501):301–313.

Martin, R. and Liu, C. (2014). A note on p-values interpreted as plausibilities. Statistica Sinica, pages 1703–1716.

Martin, R. and Liu, C. (2015a). Conditional inferential models: combining information for prior-free probabilistic inference. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(1):195–217.

Martin, R. and Liu, C. (2015b). Inferential Models: Reasoning with Uncertainty, volume 145. CRC Press.

Martin, R. and Liu, C. (2015c). Marginal inferential models: prior-free probabilistic inference on interest parameters. Journal of the American Statistical Association, 110(512):1621–1631.

Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. University of Chicago Press, Chicago, Illinois.

Molchanov, I. (2005). Theory of Random Sets. Springer-Verlag, London, England, 1st edition.

Nadarajah, S., Bityukov, S., and Krasnikov, N. (2015). Confidence distributions: A review. Statistical Methodology, 22:23–46.

O'Rawe, J. A., Ferson, S., and Lyon, G. J. (2015). Accounting for uncertainty in DNA sequencing data. Trends in Genetics, 31(2):61–66.

Patera, R. P. (2001). General method for calculating satellite collision probability. Journal of Guidance, Control, and Dynamics, 24(4):716–722.

Plakalović, D., Hejduk, M., Frigm, R., and Newman, L. (2011). A tuned single parameter for representing conjunction risk. In 2011 AIAA/AAS Astrodynamics Specialist Conference, Girdwood, AK.

Robert, C. (2007). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer Science & Business Media.

Salicone, S. (2007). Measurement Uncertainty: An Approach via the Mathematical Theory of Evidence. Springer Science+Business Media, LLC.

Schweder, T. (2007). Confidence nets for curves. In Advances in Statistical Modeling and Inference: Essays in Honor of Kjell A. Doksum, pages 593–609.

Schweder, T. and Hjort, N. L. (2002). Confidence and likelihood. Scandinavian Journal of Statistics, 29:309–332.

Schweder, T. and Hjort, N. L. (2016). Confidence, Likelihood, Probability: Statistical Inference with Confidence Distributions, volume 41 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, New York.

Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton, New Jersey.

Singh, K., Xie, M., and Strawderman, W. E. (2007). Confidence distribution (CD): Distribution estimator of a parameter. Lecture Notes–Monograph Series, 54:132–150.

Sun, Z., Luo, Y., and Niu, Z. (2014). Spacecraft rendezvous trajectory safety quantitative performance index eliminating probability dilution. Science China Technological Sciences, 57(6):1219–1228.

von Mises, R. (1942). On the correct use of Bayes' formula. The Annals of Mathematical Statistics, 13(2):156–165.

Xie, M. and Singh, K. (2013). Confidence distribution, the frequentist distribution estimator of a parameter: a review. International Statistical Review, 81(1):3–39.