Statistics and Probability Letters 146 (2019) 90–96

Contents lists available at ScienceDirect

Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro

Variable-based missing mechanism for an incomplete contingency table with unit missingness

Saebom Jeon a, Tae Yeon Kwon b, Yousung Park c,∗

a Department of Marketing Information Consulting, Mokwon University, Republic of Korea
b Department of International Finance, Hankuk University of Foreign Studies, Republic of Korea
c Department of Statistics, Korea University, Republic of Korea

Article info

Article history: Received 22 March 2018; received in revised form 19 October 2018; accepted 3 November 2018; available online 12 November 2018.

Keywords: EMAR; Mechanism criteria; Missing ratios; Missing unit

Abstract

For an incomplete two-way contingency table including cases of missing units (i.e., unobserved data for both variables), we show that every pair of missing mechanisms posited for bivariate categorical variables is uniquely reproduced by an extended missing at random (EMAR) model with exactly equal fit. A condition is thus proposed to identify EMAR from the other missing mechanisms.

© 2018 Elsevier B.V. All rights reserved.

1. Introduction

Categorical variables subject to missing data can be summarized as an incomplete contingency table with supplemental margins. The missing probability of a variable can depend on the other variables, which may be observed or not. We need to identify how the missing data occur because the inference on missing data depends on the missing mechanism. Missing mechanisms are classified as missing completely at random (MCAR) if the missingness is unrelated to both the observed and unobserved variables, missing at random (MAR) if the missingness depends only on the observed measurements, and missing not at random (MNAR) if the missingness depends on the values of the missing data (Little and Rubin, 2014).

However, this taxonomy becomes somewhat unclear when missing units are involved in a contingency table. In a two-way contingency table, the missingness of one variable may depend on the other variable, which is observed in its supplemental margin but not in missing units. Molenberghs et al. (2008) classified the missing mechanism in this case as MNAR because the missingness depends on unobserved values of the other variable due to the missing units, while Park et al. (2014) classified it as MAR because they seemed to regard such a case as ignorable missingness.

To eliminate this ambiguity of missing mechanism arising from missing units in a two-way contingency table, we define a variable-based missing mechanism and propose a new missing mechanism called extended missing at random (EMAR), in the sense that EMAR includes MAR as a special case when there are only missing items. The EMAR missingness is defined by isolating it from the MNAR missing mechanisms classified by the traditional MCAR/MAR/MNAR taxonomy.
We show that every pair of missing mechanisms posited for two categorical variables is uniquely reproduced by an EMAR pair model with exactly equal fit under the selection model and pattern mixture model frames. Using the approaches of Park et al. (2014) and Kim et al. (2015), we then propose a new criterion for the identification condition to separate EMAR from MNAR and MCAR.

The rest of the paper is organized as follows. In Section 2, we define the variable-based missing mechanism to clarify missing mechanisms in an incomplete two-way contingency table with both missing items and units. Section 3 shows that MNAR, EMAR, and MCAR missing mechanisms are not identifiable in either the estimable selection or the estimable pattern mixture frame. Then, we propose a new condition to distinguish EMAR from MNAR and MCAR. Our results are applied to real data and examined by a simulation study.

∗ Corresponding author. E-mail address: [email protected] (Y. Park).
https://doi.org/10.1016/j.spl.2018.11.006
0167-7152/© 2018 Elsevier B.V. All rights reserved.

Table 1
$P(r_1, r_2 \mid y_1, y_2)$ under the selection model according to the missing mechanisms of $Y_1$ (rows) and $Y_2$ (columns).

| $Y_1$ \ $Y_2$ | MCAR | EMAR | MNAR |
|---|---|---|---|
| MCAR | $P(r_1, r_2)$ | $P(r_1 \mid r_2)\,P(r_2 \mid y_1)$ | $P(r_1 \mid r_2)\,P(r_2 \mid y_2)$ |
| EMAR | $P(r_2 \mid r_1)\,P(r_1 \mid y_2)$ | $\frac{P(r_1, r_2)}{P(r_1)P(r_2)}\,P(r_2 \mid y_1)\,P(r_1 \mid y_2)$ | $P(r_1, r_2 \mid y_2)$ |
| MNAR | $P(r_2 \mid r_1)\,P(r_1 \mid y_1)$ | $P(r_1, r_2 \mid y_1)$ | $\frac{P(r_1, r_2)}{P(r_1)P(r_2)}\,P(r_2 \mid y_2)\,P(r_1 \mid y_1)$ |

2. Variable-based missing mechanism and EMAR

Denote by $Y_1$ and $Y_2$ the two random variables with $I$ and $J$ levels, respectively, in a two-way contingency table, and by $R_1$ and $R_2$ the respective missing indicators, where $R_i = 1$ if $Y_i$ is observed and $R_i = 2$ if $Y_i$ is missing for $i = 1, 2$. Using these missing indicators, we have completely observed cell counts denoted by $z_{ij11}$ when $R_1 = R_2 = 1$, supplemental margins only on $Y_1$ denoted by $z_{i+12}$ when $R_1 = 1$ and $R_2 = 2$, supplemental margins only on $Y_2$ denoted by $z_{+j21}$ when $R_1 = 2$ and $R_2 = 1$, and the count of missing units denoted by $z_{++22}$ when $R_1 = R_2 = 2$. Thus, $IJ + I + J + 1$ observations are available to estimate the cell probabilities.

The most popular factorizations for the joint probability function of $Y_1$, $Y_2$, $R_1$, and $R_2$ are selection models and pattern mixture models (Little and Rubin, 2014; Little, 1994). Under the selection model, the joint probability function of $Y_1$, $Y_2$, $R_1$, and $R_2$ can be factorized as

$$P(y_1, y_2, r_1, r_2) = P(y_1, y_2)\,P(r_2 \mid r_1, y_1, y_2)\,P(r_1 \mid y_1, y_2), \tag{1}$$

where $P(y_1, y_2, r_1, r_2) = P(Y_1 = y_1, Y_2 = y_2, R_1 = r_1, R_2 = r_2)$ for $y_1 = 1, \ldots, I$, $y_2 = 1, \ldots, J$, and $r_1, r_2 = 1, 2$, and the other marginal and conditional probabilities are similarly defined. We define the variable-based missing mechanism as follows.

Definition 2.1 (Variable-based Missing Mechanism). The missing mechanism of one variable in an incomplete two-way contingency table with missing items and units is MNAR if its missingness depends only on its own missing value, EMAR if its missingness depends only on the other variable, and MCAR if its missingness does not depend on either variable.

Under the variable-based missing mechanism, the missingness of $Y_1$ is MNAR if $R_1$ depends only on $Y_1$, EMAR if $R_1$ depends only on $Y_2$, and MCAR if $R_1$ does not depend on either $Y_1$ or $Y_2$. The missingness of $Y_2$ is defined similarly. EMAR becomes MAR when the contingency table includes only missing items, because the missingness of one variable then depends on the observed values of the other variable; hence EMAR is a broader concept than MAR. The MNAR defined above may be slightly narrower than the traditional MNAR. However, they are the same as long as the missing mechanism is free from missing types (i.e., missing item and missing unit), because the missingness of one variable depends only on its own missing values.

The factorization in (1) involves more parameters than the degrees of freedom ($IJ + I + J$). The minimum requirement for model (1) to be estimable is to impose independence between $R_i$ and $Y_j$ (or $Y_i$) if $R_i$ depends only on $Y_i$ (or $Y_j$) for $i \neq j = 1, 2$. This is equivalent to a log-linear model with no three- or four-way interactions among $R_1$, $R_2$, $Y_1$, and $Y_2$ (Baker et al., 1992; Choi et al., 2009; Park and Choi, 2010).

When the independence holds, the selection model is simplified and estimable as follows. If $R_1$ depends only on $Y_1$ and $R_2$ depends only on $Y_2$ (i.e., both $Y_1$ and $Y_2$ are MNAR), the second and third terms in Eq. (1) reduce to $P(r_2 \mid r_1, y_1, y_2) = P(r_2 \mid r_1, y_2)$ and $P(r_1 \mid y_1, y_2) = P(r_1 \mid y_1)$, respectively, and the selection model in Eq. (1) can be represented by

$$P(y_1, y_2, r_1, r_2) = P(y_1, y_2) \times P(r_2 \mid r_1, y_2)\,P(r_1 \mid y_1) = P(y_1, y_2) \times \frac{P(r_1, r_2)}{P(r_1)P(r_2)}\,P(r_2 \mid y_2)\,P(r_1 \mid y_1). \tag{2}$$

In the same way for other missing mechanisms of $Y_1$ and $Y_2$, one can show that $P(y_1, y_2, r_1, r_2)$ for the nine cases of selection model factorization is $P(y_1, y_2)$ multiplied by $P(r_1, r_2 \mid y_1, y_2)$ provided in Table 1.

Next, under the pattern mixture model, the joint probability of $Y_1$, $Y_2$, $R_1$, and $R_2$ can be factorized as

$$P(y_1, y_2, r_1, r_2) = P(r_1, r_2)\,P(y_1, y_2 \mid r_1, r_2). \tag{3}$$

Table 2
$P(y_1, y_2 \mid r_1, r_2)$ under the pattern mixture model for different missing mechanisms of $Y_1$ (rows) and $Y_2$ (columns).

| $Y_1$ \ $Y_2$ | MCAR | EMAR | MNAR |
|---|---|---|---|
| MCAR | $P(y_1, y_2)$ | $P(y_2 \mid y_1)\,P(y_1 \mid r_2)$ | $P(y_1 \mid y_2)\,P(y_2 \mid r_2)$ |
| EMAR | $P(y_1 \mid y_2)\,P(y_2 \mid r_1)$ | $\frac{P(y_1, y_2)}{P(y_1)P(y_2)}\,P(y_2 \mid r_1)\,P(y_1 \mid r_2)$ | $P(y_1 \mid y_2)\,P(y_2 \mid r_1, r_2)$ |
| MNAR | $P(y_2 \mid y_1)\,P(y_1 \mid r_1)$ | $P(y_2 \mid y_1)\,P(y_1 \mid r_1, r_2)$ | $\frac{P(y_1, y_2)}{P(y_1)P(y_2)}\,P(y_1 \mid r_1)\,P(y_2 \mid r_2)$ |

The estimable pattern mixture model can be factorized by using the same arguments as in the estimable selection model. Table 2 describes the nine cases of pattern mixture model factorization according to the missing mechanisms of $Y_1$ and $Y_2$. The joint probability $P(y_1, y_2, r_1, r_2)$ is represented as the product of $P(r_1, r_2)$ and $P(y_1, y_2 \mid r_1, r_2)$ provided in Table 2. These selection model and pattern mixture model factorizations reduce to the typical expression of MAR provided by Molenberghs et al. (2008) when a two-way contingency table contains only missing items.

3. Identification of missing mechanisms

Let $\pi_{ijk\ell}$ be $P(Y_1 = i, Y_2 = j, R_1 = k, R_2 = \ell)$ for notational simplicity. Its marginal probabilities are similarly defined as, for example, $\pi_{i+12} = P(Y_1 = i, R_1 = 1, R_2 = 2)$ and $\pi_{++22} = P(R_1 = 2, R_2 = 2)$. Then, the observed likelihood is

$$L = \prod_{i=1}^{I} \prod_{j=1}^{J} \pi_{ij11}^{z_{ij11}} \; \prod_{i=1}^{I} \pi_{i+12}^{z_{i+12}} \; \prod_{j=1}^{J} \pi_{+j21}^{z_{+j21}} \; \pi_{++22}^{z_{++22}}, \tag{4}$$

where the fixed total count is $N = \sum_{i,j,k,\ell} z_{ijk\ell}$, for $i = 1, \ldots, I$, $j = 1, \ldots, J$, $k = 1, 2$, and $\ell = 1, 2$.
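As a minimal sketch of how the observed likelihood (4) can be evaluated, the following Python function computes its logarithm from the four groups of observed counts and the corresponding probabilities. The function name, argument layout, and the 2 × 2 counts are hypothetical and serve only as an illustration; they are not taken from the paper.

```python
import math

def observed_loglik(z11, z12, z21, z22, p11, p12, p21, p22):
    """Log of likelihood (4), without the multinomial constant.

    z11[i][j], p11[i][j] : fully observed counts and pi_{ij11}
    z12[i],    p12[i]    : Y1-only margin and pi_{i+12}
    z21[j],    p21[j]    : Y2-only margin and pi_{+j21}
    z22,       p22       : missing units and pi_{++22}
    """
    ll = 0.0
    for zrow, prow in zip(z11, p11):
        ll += sum(z * math.log(p) for z, p in zip(zrow, prow) if z > 0)
    ll += sum(z * math.log(p) for z, p in zip(z12, p12) if z > 0)
    ll += sum(z * math.log(p) for z, p in zip(z21, p21) if z > 0)
    if z22 > 0:
        ll += z22 * math.log(p22)
    return ll

# Hypothetical 2 x 2 counts, for illustration only.
z11 = [[40, 20], [10, 30]]
z12 = [12, 8]
z21 = [6, 14]
z22 = 10
N = sum(map(sum, z11)) + sum(z12) + sum(z21) + z22

# Relative frequencies: observed-cell probabilities that fit the counts exactly.
sat = dict(
    p11=[[z / N for z in row] for row in z11],
    p12=[z / N for z in z12],
    p21=[z / N for z in z21],
    p22=z22 / N,
)
ll_saturated = observed_loglik(z11, z12, z21, z22, **sat)

# A competing probability vector over the same 9 observed cells.
K = 4 + 2 + 2 + 1
uniform = dict(p11=[[1 / K] * 2] * 2, p12=[1 / K] * 2, p21=[1 / K] * 2, p22=1 / K)
ll_uniform = observed_loglik(z11, z12, z21, z22, **uniform)

print(ll_saturated, ll_uniform)
```

The relative frequencies give the largest attainable value of (4), so any model whose fitted observed-cell probabilities equal them fits the observations perfectly in the sense used in this section.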

3.1. Non-identifiability of missing mechanisms

Define the following two missing ratios using the cell probabilities:

$$\alpha_{ij} = \frac{\pi_{ij21}}{\pi_{ij11}} \quad \text{and} \quad \beta_{ij} = \frac{\pi_{ij12}}{\pi_{ij11}}. \tag{5}$$
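The missing ratios in (5) can be illustrated numerically. The sketch below builds a joint distribution in which $R_1$ depends only on $Y_1$ and $R_2$ depends only on $Y_2$ (both variables MNAR in the sense of Definition 2.1) and then computes $\alpha_{ij}$ and $\beta_{ij}$. All numbers are hypothetical, and the conditional independence of $R_1$ and $R_2$ given $(Y_1, Y_2)$ is a simplifying assumption made only for this example.

```python
# Hypothetical inputs, for illustration only (I = 2, J = 3).
p_y = [[0.25, 0.15, 0.10],          # P(Y1 = i, Y2 = j)
       [0.20, 0.10, 0.20]]
p_r1_missing = [0.10, 0.25]         # P(R1 = 2 | Y1 = i): depends on Y1 only
p_r2_missing = [0.05, 0.10, 0.20]   # P(R2 = 2 | Y2 = j): depends on Y2 only

I, J = 2, 3

def joint(i, j, k, l):
    """pi_{ijkl}, with k, l in {1 (observed), 2 (missing)}.

    Assumption: R1 and R2 are conditionally independent given (Y1, Y2).
    """
    pr1 = p_r1_missing[i] if k == 2 else 1 - p_r1_missing[i]
    pr2 = p_r2_missing[j] if l == 2 else 1 - p_r2_missing[j]
    return p_y[i][j] * pr1 * pr2

# Missing ratios from Eq. (5).
alpha = [[joint(i, j, 2, 1) / joint(i, j, 1, 1) for j in range(J)] for i in range(I)]
beta = [[joint(i, j, 1, 2) / joint(i, j, 1, 1) for j in range(J)] for i in range(I)]

for i in range(I):
    print("alpha row", i + 1, [round(a, 4) for a in alpha[i]])
for i in range(I):
    print("beta row", i + 1, [round(b, 4) for b in beta[i]])
```

In this construction each row of `alpha` is constant across `j` and each column of `beta` is constant across `i`, so the ratios vary only with the index of the variable that drives the missingness.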

It is easy to show from Tables 1 and 2 that these missing ratios in the estimable selection model and pattern mixture model frameworks given in (2) and (3) reduce to $\alpha_{ij} = \alpha_i$, $\alpha_j$, and $\alpha$ when $Y_1$ is MNAR, EMAR, and MCAR, respectively, and to $\beta_{ij} = \beta_j$, $\beta_i$, and $\beta$ when $Y_2$ is MNAR, EMAR, and MCAR, respectively. A similar reparameterization was also used in Baker et al. (1992) and Park et al. (2014) for a log-linear model specification. The incompletely observed cell probabilities $\pi_{ij21}$ and $\pi_{ij12}$ can be estimated using the above two missing ratios $\alpha_{ij}$ and $\beta_{ij}$ and the completely observed cell probabilities $\pi_{ij11}$.

Lemma 3.1. Suppose that the marginal distribution of unobserved $Y_i$, $i = 1, 2$, arising from missing units is proportional to that from missing items. Then,

$$\pi_{ij22} = \alpha_{ij}\,\beta_{ij}\,\pi_{ij11}\,\frac{\pi_{+j11}\,\pi_{++22}}{\pi_{+j21}\,\pi_{++12}}.$$
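The formula in Lemma 3.1 can be checked numerically. The sketch below takes the case in which both variables are EMAR, so that $\alpha_{ij} = \alpha_j$ and $\beta_{ij} = \beta_i$, and fills in the missing-unit cells $\pi_{ij22}$ from the lemma. The probabilities and ratios are hypothetical values chosen only for illustration.

```python
# Hypothetical inputs for a 2 x 3 table; both variables EMAR,
# so alpha_ij = alpha_j and beta_ij = beta_i.
pi11 = [[0.24, 0.16, 0.08],
        [0.16, 0.04, 0.12]]      # pi_{ij11}
alpha = [0.05, 0.10, 0.15]       # alpha_j = pi_{ij21} / pi_{ij11}
beta = [0.10, 0.05]              # beta_i  = pi_{ij12} / pi_{ij11}

I, J = 2, 3
pi21 = [[alpha[j] * pi11[i][j] for j in range(J)] for i in range(I)]
pi12 = [[beta[i] * pi11[i][j] for j in range(J)] for i in range(I)]

col11 = [sum(pi11[i][j] for i in range(I)) for j in range(J)]   # pi_{+j11}
col21 = [sum(pi21[i][j] for i in range(I)) for j in range(J)]   # pi_{+j21}
pi_pp12 = sum(map(sum, pi12))                                   # pi_{++12}

# Mass left for missing units so that all probabilities sum to one.
pi_pp22 = 1.0 - sum(map(sum, pi11)) - sum(map(sum, pi21)) - pi_pp12

# Lemma 3.1: pi_{ij22} from the missing ratios and the observed margins.
pi22 = [[alpha[j] * beta[i] * pi11[i][j] * col11[j] * pi_pp22 / (col21[j] * pi_pp12)
         for j in range(J)] for i in range(I)]

total22 = sum(map(sum, pi22))
print(total22, pi_pp22)
```

In this EMAR case the filled-in cells add up to $\pi_{++22}$, which is the consistency one would expect from the lemma: the missing-unit mass is distributed over the cells without changing its total.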

The proof is in Appendix A.1. The assumption in Lemma 3.1 ensures the same missing mechanism for missing items and missing units. Lemma 3.1 implies that the parameters needed to estimate $\pi_{ij21}$, $\pi_{ij12}$, and $\pi_{ij22}$ are $\pi_{ij11}$, $\alpha_{ij}$, $\beta_{ij}$, and $\pi_{++22}/\pi_{++12}$.

In particular, when both $Y_1$ and $Y_2$ are EMAR, the parameters are $\pi_{ij11}$, $\alpha_j$, $\beta_i$, and $\pi_{++22}/\pi_{++12}$, whose number is equal to the number of observations $z_{ij11}$, $z_{+j21}$, $z_{i+12}$, and $z_{++22}$. The ML estimates of the observed likelihood given in (4) are therefore perfectly fitted to the observations, with $\hat{\pi}_{ij11} = z_{ij11}/N$, $\hat{\alpha}_j = z_{+j21}/z_{+j11}$, and $\hat{\beta}_i = z_{i+12}/z_{i+11}$. In this regard, EMAR can be interpreted as ignorable since the conditional distribution of $R_1$ and $R_2$ given $Y_1$ and $Y_2$ is the same for all unobserved values of $Y_1$ and $Y_2$.

On the other hand, the MNAR missing mechanism requires solutions of the interior ML equations $\pi_{+j21} = \sum_i \alpha_i \pi_{ij11}$ for $j = 1, \ldots, J$ and $\pi_{i+12} = \sum_j \beta_j \pi_{ij11}$ for $i = 1, \ldots, I$, where $\alpha_i$ and $\beta_j$ must be nonnegative because they should satisfy $\pi_{ij21} = \alpha_i \pi_{ij11}$ and $\pi_{ij12} = \beta_j \pi_{ij11}$, respectively. However, the two equations generally do not hold. For example, if $I > J$, no solution exists in general because there are $I$ equations for $J$ parameters in $z_{i+12} = \sum_j \beta_j z_{ij11}$, and even for $I = J$, $\alpha_i$ or $\beta_j$ may have a negative value, which is called the boundary solution problem (Clarke, 2002; Park et al., 2014). Therefore, in general, the ML estimates under the MNAR missing mechanism do not perfectly fit the observations even when the MNAR model is saturated (i.e., $I = J$). The ML estimates under MCAR, of course, do not perfectly fit the observations, as the number of parameters is smaller than the number of observations.

Based on this discussion, we may proceed with the following data analysis. First, select any pair of missing mechanisms for $Y_1$ and $Y_2$ among MNAR, EMAR, and MCAR, except for EMAR on both $Y_1$ and $Y_2$, and fit the selected missing model to maximize the observed likelihood (4). Second, denote the resulting fitted estimates of $z_{ij11}$, $z_{i+12}$, $z_{+j21}$, and $z_{++22}$ by $z^*_{ij11}$, $z^*_{i+12}$, $z^*_{+j21}$, and $z^*_{++22}$, respectively, which are, in turn, fitted to the EMAR $Y_1$ and $Y_2$ model. Given the nature of the EMAR model discussed above, there is a unique EMAR $Y_1$ and $Y_2$ model that is perfectly fitted to the estimated observations, and it yields the same observed likelihood (4) as that of the selected missing model, although the EMAR model produces different estimates for the missing probabilities $\pi_{ij12}$, $\pi_{ij21}$, and $\pi_{ij22}$ from those of the selected missing model. This leads to the following result.

Theorem 3.2. Every pair of missing mechanisms in an incomplete contingency table with missing items and units is uniquely reproduced by an EMAR model with exactly equal fit.

This theorem says that the missing mechanism in a two-way table with missing items and units is not identifiable owing to the perfect fit of the EMAR model to the observations. We illustrate this non-identifiability with real data in Section 3.3. The non-identifiability calls for a criterion to identify the missing mechanism in an incomplete two-way contingency table.

3.2. A new identification rule

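Now, we propose a criterion to identify the missing mechanism using the cell probabilities. Let $\omega^{+}_{ii'}$ and $\omega^{j}_{ii'}$ be the ratios of incompletely observed cell probabilities and of completely observed cell probabilities, given by $\pi_{i+12}/\pi_{i'+12}$ and $\pi_{ij11}/\pi_{i'j11}$, respectively, for $i \neq i'$. For $i, i' = 1, \ldots, I$, define

$$\omega^{+}_{ii'} = \frac{\pi_{i+12}}{\pi_{i'+12}}, \quad \omega^{\max}_{ii'} = \max_j \frac{\pi_{ij11}}{\pi_{i'j11}}, \quad \text{and} \quad \omega^{\min}_{ii'} = \min_j \frac{\pi_{ij11}}{\pi_{i'j11}} \quad \text{for } i \neq i'.$$

Here, we assume that $\pi_{ijk\ell} > 0$ for all $i$, $j$, $k$, and $\ell$. Similarly, for $j, j' = 1, \ldots, J$, define

$$\omega^{+}_{jj'} = \frac{\pi_{+j21}}{\pi_{+j'21}}, \quad \omega^{\max}_{jj'} = \max_i \frac{\pi_{ij11}}{\pi_{ij'11}}, \quad \text{and} \quad \omega^{\min}_{jj'} = \min_i \frac{\pi_{ij11}}{\pi_{ij'11}} \quad \text{for } j \neq j'.$$

These ratio combinations can be estimated by the missing ratios $\alpha_{ij}$ and $\beta_{ij}$ according to the missing mechanisms of $Y_1$ and $Y_2$.

Theorem 3.3. Define two conditions to identify missing mechanisms of $Y_1$ and $Y_2$ as

$$C1: \; \omega^{\min}_{jj'} < \omega^{+}_{jj'} < \omega^{\max}_{jj'} \quad \text{and} \quad C2: \; \omega^{\min}_{ii'} < \omega^{+}_{ii'} < \omega^{\max}_{ii'}. \tag{6}$$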

The missing mechanism of $Y_1$ is EMAR if there is a pair of $j$ and $j'$ violating C1, and that of $Y_2$ is EMAR if there is a pair of $i$ and $i'$ violating C2.

The proof of Theorem 3.3 is in Appendix A.2. The theorem says that MNAR and MCAR always satisfy C1 and C2, while EMAR rarely satisfies the two conditions, as shown in a limited simulation study below. Thus, we propose MNAR or MCAR as the missing mechanism when condition C1 for $Y_1$ and condition C2 for $Y_2$ are met. Once the missing mechanism is determined as MNAR or MCAR by the two conditions, MCAR can be distinguished from MNAR by the usual likelihood ratio test, as MCAR is nested in MNAR.

In practice, conditions C1 and C2 are examined using the estimates of $\pi_{ij11}$, $\pi_{i+12}$, and $\pi_{+j21}$. Since observations can be interpreted as realized counts from an undisclosed missing mechanism, natural estimates of them are $z_{ij11}/N$, $z_{i+12}/N$, and $z_{+j21}/N$, respectively. Similar conditions based on observations have been discussed in log-linear models by Park et al. (2014) and Kim et al. (2015).

3.3. Data analysis of birth weight and smoking status

The non-identifiability between EMAR and MNAR (or MCAR) and the new identification rule are illustrated with the real data of Baker et al. (1992). Table 3(a) provides the incomplete contingency table classified by mother's smoking status (yes, no) and newborn's weight (< 2500 g, ≥ 2500 g), with two supplemental margins concerned only with smoking and only with newborn's weight, and with the count of missing units. Baker et al. (1992) proposed nine identifiable log-linear models denoted by M1($\alpha$, $\beta$), M2($\alpha$, $\beta_i$), M3($\alpha_j$, $\beta$), M4($\alpha$, $\beta_j$), M5($\alpha_i$, $\beta$), M6($\alpha_i$, $\beta_i$), M7($\alpha_j$, $\beta_j$), M8($\alpha_i$, $\beta_j$), and M9($\alpha_j$, $\beta_i$), where $\alpha_{ij}$ and $\beta_{ij}$ are the two missing ratios defined in (5). Thus, for example, $Y_1$ is MNAR and $Y_2$ is MCAR in M5($\alpha_i$, $\beta$), whereas $Y_1$ and $Y_2$ are both MNAR in M8($\alpha_i$, $\beta_j$) and are both EMAR in M9($\alpha_j$, $\beta_i$). Note that MAR does not exist in these log-linear models, as mentioned earlier, and thus the non-identifiability between MNAR and MAR (Molenberghs et al., 2008) is not applicable within such log-linear models.

The fits of M1 through M8 and their corresponding EMAR counterparts to the observed data in Table 3(a) coincide with each other, as illustrated in Table 3(b) and (c) for M5($\alpha_i$, $\beta$) and M8($\alpha_i$, $\beta_j$), respectively. This confirms the claim of Theorem 3.2. However, their fully estimated cell counts are quite different, as shown in Table 3(d) and (e). Since $(\omega^{\min}_{jj'}, \omega^{+}_{jj'}, \omega^{\max}_{jj'}) = (.14, .306, .215)$ and $(\omega^{\min}_{ii'}, \omega^{+}_{ii'}, \omega^{\max}_{ii'}) = (.87, .924, 1.329)$ from the observed data in Table 3(a), condition C1 is violated but C2 is satisfied. Thus, the plausible missing mechanisms are EMAR for $Y_1$ and MNAR for $Y_2$, and the corresponding log-linear model is M7($\alpha_j$, $\beta_j$). This model is saturated and produces a perfect fit to the observed data; its fully estimated counts are presented in Table 3(f).
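The check of conditions C1 and C2 for this data set can be sketched in a few lines of Python. The counts below are the values commonly reported for the Baker et al. (1992) birth weight and smoking data; they are not reproduced in the text above, so they are an assumption here and should be verified against Table 3(a). The helper name `check` is hypothetical.

```python
# Counts as commonly reported for the Baker et al. (1992) data
# (assumption: not shown in the text; verify against Table 3(a)).
# Y1 = smoking (yes, no), Y2 = birth weight (< 2500 g, >= 2500 g).
z11 = [[4512, 21009],
       [3394, 24132]]   # z_{ij11}: both variables observed
z12 = [1049, 1135]      # z_{i+12}: smoking observed, weight missing
z21 = [142, 464]        # z_{+j21}: weight observed, smoking missing

def check(plus, ratios):
    """Return (omega_min, omega_plus, omega_max, condition holds?)."""
    lo, hi = min(ratios), max(ratios)
    return lo, plus, hi, lo < plus < hi

# C1 for the pair (j, j') = (1, 2): column ratios within each row i.
c1 = check(z21[0] / z21[1], [z11[i][0] / z11[i][1] for i in range(2)])
# C2 for the pair (i, i') = (1, 2): row ratios within each column j.
c2 = check(z12[0] / z12[1], [z11[0][j] / z11[1][j] for j in range(2)])

print("C1:", [round(v, 3) for v in c1[:3]], "satisfied:", c1[3])
print("C2:", [round(v, 3) for v in c2[:3]], "satisfied:", c2[3])
```

With these counts the computed triplets agree with the values quoted above up to rounding, and the sketch reports C1 as violated and C2 as satisfied. Because the table is 2 × 2, there is only one pair of levels to examine for each condition; swapping the pair inverts all three ratios and leaves the conclusion unchanged.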

Table 3
Birth weight and smoking: nonidentifiability of missing mechanism and the estimated cell count by the new identification rule.

Table 4
Number of cases satisfying condition C1 under each of the missing mechanisms out of 10,000 repetitions.

| Missing mechanism | Missing percentage | $\alpha_j = (\alpha_1, \alpha_2, \alpha_3)$: (1, 1, 1.5) | (1, 1, 1.7) | (1, 1, 2) | (1, 1.7, 2) | (1, 2, 2) |
|---|---|---|---|---|---|---|
| EMAR | 10% | 1956 | 426 | 24 | 9 | 6 |
| EMAR | 20% | 1343 | 101 | 0 | 1 | 0 |
| EMAR | 30% | 1124 | 49 | 0 | 0 | 0 |

| Missing mechanism | Missing percentage | $\alpha_i = (\alpha_1, \alpha_2)$: (1, 1.0) | (1, 1.2) | (1, 1.4) | (1, 1.6) | (1, 1.8) |
|---|---|---|---|---|---|---|
| MNAR or MCAR | 10% | 9672 | 9768 | 9810 | 9790 | 9731 |
| MNAR or MCAR | 20% | 9931 | 9976 | 9981 | 9952 | 9928 |
| MNAR or MCAR | 30% | 9981 | 9989 | 9990 | 9988 | 9971 |

Note: $\alpha_j$ and $\alpha_i$ indicate the extent of EMAR and MNAR for $Y_1$, respectively.

3.4. Simulation studies

We consider 2 × 3 × 2 × 2 contingency tables where $Y_1$ has two levels, $Y_2$ has three levels, and $R_1$ and $R_2$ each have two levels. We use the conditional probabilities $\pi_{11|11} = 0.3$, $\pi_{21|11} = 0.2$, $\pi_{12|11} = 0.2$, $\pi_{22|11} = 0.05$, $\pi_{13|11} = 0.1$, and $\pi_{23|11} = 0.15$, where $\pi_{ij|k\ell} = P(Y_1 = i, Y_2 = j \mid R_1 = k, R_2 = \ell)$, to generate completely observed cell counts. The missing percentage of $Y_1$, $P(R_1 = 2, R_2 = 1)$, varies from 10% to 30%, with $P(R_1 = 1, R_2 = 2) = 0.03$ and $P(R_1 = 2, R_2 = 2) = 0.02$ fixed. For simplicity of discussion, we consider only condition C1 for $Y_1$. When $Y_1$ is EMAR, we vary $\alpha_j$, $j = 1, 2, 3$, so that the maximum ratios among them are 1.5, 1.7, and 2, whereas, when $Y_1$ is MNAR or MCAR, we set $\alpha_2$ to be from 1 to 1.8 times larger than $\alpha_1$, where $Y_1$ is MCAR if $\alpha_1 = \alpha_2 = 1$.

Table 4 shows the number of times condition C1 is satisfied in 10,000 repetitions with a sample size of 3,000, chosen to minimize the impact of sampling error. When $Y_1$ is EMAR, the number satisfying C1 is less than 430 except for $\alpha_j = (1, 1, 1.5)$. The number also decreases as the missing percentage increases or as the extent of EMAR (i.e., the maximum ratio among $\alpha_1$, $\alpha_2$, and $\alpha_3$) increases. On the other hand, as expected, when $Y_1$ is MNAR or MCAR, more than 96% of the repetitions satisfy condition C1 regardless of $\alpha_2$. The remaining 4% may be due to sampling error. Based on this simulation, the plausible missing mechanism of $Y_1$ ($Y_2$) is MNAR or MCAR when C1 (C2) is met.
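A reduced version of this simulation can be sketched as follows. The cell probabilities and the fixed values 0.03 and 0.02 come from the description above; everything else is an assumption made for illustration: the function names, the rescaling of $\alpha$ so that $P(R_1 = 2, R_2 = 1)$ equals the target missing percentage, the lumping of the two patterns with $R_2 = 2$ into one category (they do not enter C1), the treatment of empty cells as a violation, and the use of 200 rather than 10,000 repetitions.

```python
import random

PI_11 = [[0.30, 0.20, 0.10],   # pi_{ij|11} from the text: rows i = 1, 2; columns j = 1, 2, 3
         [0.20, 0.05, 0.15]]
P_12, P_22 = 0.03, 0.02        # P(R1=1, R2=2) and P(R1=2, R2=2), fixed in the text

def cell_probs(miss, alpha, mechanism):
    """Probabilities of the 6 cells (i,j,1,1), the 3 margins (+,j,2,1), and the rest."""
    p11 = 1.0 - miss - P_12 - P_22
    full = [[p11 * p for p in row] for row in PI_11]
    if mechanism == "EMAR":     # alpha indexed by j
        w = [sum(alpha[j] * full[i][j] for i in range(2)) for j in range(3)]
    else:                       # MNAR or MCAR: alpha indexed by i
        w = [sum(alpha[i] * full[i][j] for i in range(2)) for j in range(3)]
    scale = miss / sum(w)       # assumption: rescale so that P(R1=2, R2=1) = miss
    return [p for row in full for p in row] + [scale * v for v in w] + [P_12 + P_22]

def c1_holds(z11, z21):
    """C1 for all pairs j < j', using cross-multiplication instead of division."""
    if min(z21) == 0 or min(min(r) for r in z11) == 0:
        return False            # assumption: empty cells count as a violation
    for j in range(3):
        for k in range(j + 1, 3):
            # omega_plus = z21[j]/z21[k]; omega_i = z11[i][j]/z11[i][k]
            above_min = any(z21[j] * z11[i][k] > z11[i][j] * z21[k] for i in range(2))
            below_max = any(z21[j] * z11[i][k] < z11[i][j] * z21[k] for i in range(2))
            if not (above_min and below_max):
                return False
    return True

def count_c1(miss, alpha, mechanism, reps=200, n=3000, seed=0):
    """Number of simulated tables (out of reps) that satisfy C1."""
    rng = random.Random(seed)
    probs = cell_probs(miss, alpha, mechanism)
    hits = 0
    for _ in range(reps):
        counts = [0] * len(probs)
        for c in rng.choices(range(len(probs)), weights=probs, k=n):
            counts[c] += 1
        hits += c1_holds([counts[0:3], counts[3:6]], counts[6:9])
    return hits

emar = count_c1(0.10, (1, 2, 2), "EMAR")
mcar = count_c1(0.10, (1, 1.0), "MNAR")
print("C1 satisfied out of 200:", "EMAR", emar, "| MCAR", mcar)
```

The counts from such a sketch will not match Table 4 exactly, since the number of repetitions and the random number generator differ, but the qualitative pattern should be the same: C1 holds in most MCAR samples and in very few samples generated under a strong EMAR setting.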


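Appendix

A.1. Proof of Lemma 3.1

Proof. Let $\alpha^*_{ij} = \pi_{ij22}/\pi_{ij12}$ and $\beta^*_{ij} = \pi_{ij22}/\pi_{ij21}$ define missing ratios for missing units. Then, it is straightforward to show by using Tables 1 and 2 that $\alpha^*_{ij} = \alpha^*_i$, $\alpha^*_j$, and $\alpha^*$ when $Y_1$ is MNAR, EMAR, and MCAR, respectively, and $\beta^*_{ij} = \beta^*_j$, $\beta^*_i$, and $\beta^*$ when $Y_2$ is MNAR, EMAR, and MCAR, respectively. Observe that

$$\pi_{i|j22} = \alpha^*_{ij}\,\pi_{i|j12}\,\frac{\pi_{+j12}}{\pi_{+j22}} \quad \text{and} \quad \pi_{i|j21} = \alpha_{ij}\,\pi_{i|j11}\,\frac{\pi_{+j11}}{\pi_{+j21}}, \tag{A.1}$$

$$\pi_{i|j22} = \beta^*_{ij}\,\pi_{i|j21}\,\frac{\pi_{+j21}}{\pi_{+j22}} \quad \text{and} \quad \pi_{i|j12} = \beta_{ij}\,\pi_{i|j11}\,\frac{\pi_{+j11}}{\pi_{+j12}}, \tag{A.2}$$

where $\pi_{i|jk\ell} = P(Y_1 = i \mid Y_2 = j, R_1 = k, R_2 = \ell)$ for $k = 1, 2$ and $\ell = 1, 2$. Since $Y_1$ is independent of $R_2$ for MNAR $Y_1$ and is independent of $R_1$ for EMAR or MCAR $Y_1$, we have $\pi_{i|j22} = \pi_{i|j21}$ for MNAR $Y_1$ and $\pi_{i|j22} = \pi_{i|j12}$ for EMAR or MCAR $Y_1$. Applying these equalities to (A.1) and (A.2) for the corresponding missing mechanism of $Y_1$, we have

$$\alpha^*_i = \alpha_i\,\frac{\pi_{+j11}\,\pi_{+j22}}{\pi_{+j12}\,\pi_{+j21}} \ \text{for MNAR} \quad \text{and} \quad \beta^*_{ij} = \beta_{ij}\,\frac{\pi_{+j11}\,\pi_{+j22}}{\pi_{+j12}\,\pi_{+j21}} \ \text{for EMAR or MCAR}. \tag{A.3}$$

Since the marginal distribution of unobserved $Y_2$ arising from missing units is proportional to that from missing items, $\pi_{+j22}$ is proportional to $\pi_{+j12}$ and hence $\pi_{+j22}/\pi_{+j12} = \pi_{++22}/\pi_{++12}$. Plugging this equality into (A.3), we have

$$\pi_{ij22} = \begin{cases} \alpha^*_i\,\pi_{ij12} = \alpha_i\,\dfrac{\pi_{+j11}\,\pi_{++22}}{\pi_{+j21}\,\pi_{++12}}\,\pi_{ij12} = \alpha_i\,\beta_{ij}\,\pi_{ij11}\,\dfrac{\pi_{+j11}\,\pi_{++22}}{\pi_{+j21}\,\pi_{++12}} & \text{for MNAR } Y_1, \\[2ex] \beta^*_{ij}\,\pi_{ij21} = \beta_{ij}\,\dfrac{\pi_{+j11}\,\pi_{++22}}{\pi_{+j21}\,\pi_{++12}}\,\pi_{ij21} = \beta_{ij}\,\alpha_j\,\pi_{ij11}\,\dfrac{\pi_{+j11}\,\pi_{++22}}{\pi_{+j21}\,\pi_{++12}} & \text{for EMAR } Y_1, \\[2ex] \beta^*_{ij}\,\pi_{ij21} = \beta_{ij}\,\dfrac{\pi_{+j11}\,\pi_{++22}}{\pi_{+j21}\,\pi_{++12}}\,\pi_{ij21} = \beta_{ij}\,\alpha\,\pi_{ij11}\,\dfrac{\pi_{+j11}\,\pi_{++22}}{\pi_{+j21}\,\pi_{++12}} & \text{for MCAR } Y_1, \end{cases}$$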

where $\beta_{ij} = \beta_j$, $\beta_i$, and $\beta$ when $Y_2$ is MNAR, EMAR, and MCAR, respectively. □

A.2. Proof of Theorem 3.3

Proof. We only prove condition C1 for variable $Y_1$, as C2 for $Y_2$ can be shown in the same way. Assume that $Y_1$ is MNAR and, for each fixed $j$ and $j'$, denote the $i$ determining $\omega^{\max}_{jj'}$ and $\omega^{\min}_{jj'}$ by $i^*$ and $i^o$, respectively. Then, we have

$$\frac{\omega^{\max}_{jj'}}{\omega^{+}_{jj'}} = \frac{\pi_{i^*j11}}{\pi_{i^*j'11}} \cdot \frac{\sum_i \pi_{ij'21}}{\sum_i \pi_{ij21}} = \frac{\sum_i \alpha_i\,\pi_{ij'11}\,\dfrac{\pi_{i^*j11}}{\pi_{i^*j'11}}}{\sum_i \alpha_i\,\pi_{ij11}}, \tag{A.4}$$

where the second equality holds because $\pi_{ij21} = \alpha_i \pi_{ij11}$. Since $\alpha_i > 0$ and $\omega^{\max}_{jj'} = \dfrac{\pi_{i^*j11}}{\pi_{i^*j'11}} \geq \dfrac{\pi_{ij11}}{\pi_{ij'11}}$ for all $i$, we have

$$\omega^{\max}_{jj'} \geq \omega^{+}_{jj'}. \tag{A.5}$$

Similarly, observe that

$$\frac{\omega^{\min}_{jj'}}{\omega^{+}_{jj'}} = \frac{\pi_{i^oj11}}{\pi_{i^oj'11}} \cdot \frac{\sum_i \pi_{ij'21}}{\sum_i \pi_{ij21}} = \frac{\sum_i \alpha_i\,\pi_{ij'11}\,\dfrac{\pi_{i^oj11}}{\pi_{i^oj'11}}}{\sum_i \alpha_i\,\pi_{ij11}} < 1, \tag{A.6}$$

where the inequality holds because $\omega^{\min}_{jj'} = \dfrac{\pi_{i^oj11}}{\pi_{i^oj'11}} \leq \dfrac{\pi_{ij11}}{\pi_{ij'11}}$ and $\alpha_i > 0$ for all $i$. Since (A.5) and (A.6) are true for every pair of $j$ and $j'$, condition C1 always holds for MNAR $Y_1$. When $Y_1$ is MCAR, condition C1 is immediately met by replacing $\alpha_i$ in (A.4) and (A.6) with $\alpha$. Therefore, if there is at least one pair of $j$ and $j'$ violating C1, the missing mechanism of $Y_1$ is neither MNAR nor MCAR. □

References

Baker, S.G., Rosenberger, W.F., Dersimonian, R., 1992. Closed-form estimates for missing counts in two-way contingency tables. Stat. Med. 11 (5), 643–657.
Choi, B.-S., Choi, J.W., Park, Y., 2009. Bayesian methods for an incomplete two-way contingency table with application to the ohio (buckeye state) polls. Surv. Methodol. 35 (1), 37–71.
Clarke, P.S., 2002. On boundary solutions and identifiability in categorical regression with non-ignorable non-response. Biometrical J. 44 (6), 701–717.
Kim, S., Park, Y., Kim, D., 2015. On missing-at-random mechanism in two-way incomplete contingency tables. Statist. Probab. Lett. 96, 196–203.
Little, R.J., 1994. A class of pattern-mixture models for normal incomplete data. Biometrika 81 (3), 471–483.
Little, R.J., Rubin, D.B., 2014. Statistical Analysis with Missing Data, Vol. 333. John Wiley & Sons.


Molenberghs, G., Beunckens, C., Sotto, C., Kenward, M.G., 2008. Every missingness not at random model has a missingness at random counterpart with equal fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 (2), 371–388.
Park, Y., Choi, B.-S., 2010. Bayesian analysis for incomplete multi-way contingency tables with nonignorable nonresponse. J. Appl. Stat. 37 (9), 1439–1453.
Park, Y., Kim, D., Kim, S., 2014. Identification of the occurrence of boundary solutions in a contingency table with nonignorable nonresponse. Statist. Probab. Lett. 93, 34–40.