Biostatistics (2010), 11, 3, pp. 559–571 doi:10.1093/biostatistics/kxq006 Advance Access publication on February 19, 2010
Joint modeling of intercourse behavior and human fecundability using structural equation models
SUNGDUK KIM, RAJESHWARI SUNDARAM∗
Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, 6100 Executive Boulevard, Rockville, MD 20852, USA
[email protected]
GERMAINE M. BUCK LOUIS
Epidemiology Branch, Division of Epidemiology, Statistics and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, 6100 Executive Boulevard, Rockville, MD 20852, USA
SUMMARY
Human fecundability is defined as the probability of conception during a menstrual cycle among couples at risk for pregnancy. It is highly relevant for understanding human reproduction and represents a series of highly interrelated and timed processes. The statistical literature has recognized the need to incorporate both biological and behavioral factors (Barrett and Marshall, 1969; Dunson and Stanford, 2005) when modeling conception probabilities, given that intercourse during the fertile window is a necessary but not sufficient criterion for conception. The heterogeneity of behaviors such as the timing and frequency of intercourse in a menstrual cycle needs to be considered when estimating conception. Here we propose a joint model of intercourse behavior and human fecundability through a classic conception probability model and a structural equation model (SEM) to accommodate intercourse during the menstrual cycle. The SEM part of the proposed model allows the dependency between intercourse behaviors on consecutive days in a menstrual cycle to vary across days. Consequently, the proposed model can accommodate not only a broad variety of intercourse patterns and dependency structures but also general covariate effects. Finally, we present a detailed analysis of the New York State Angler Cohort Prospective Pregnancy Study to illustrate the proposed methodology.
Keywords: Conception; Fecundity; Intercourse; Latent variables; Markov chain Monte Carlo; Menstrual cycle; Posterior distribution; Structural equation model.
∗ To whom correspondence should be addressed.
© The Author 2010. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: [email protected].
1. INTRODUCTION
Human fecundability is of great interest to reproductive scientists as well as couples trying to conceive. Scientists are interested in identifying exposures or behaviors that enhance or reduce fecundity while
couples are interested in maximizing their specific chance of achieving conception. Many of the so-called determinants of male or female fecundity have been quantified without regard to the couple’s intercourse behaviors, raising concern about their accuracy. Statisticians have long been interested in developing models that can capture the biologic aspects of fecundability (e.g. Weinberg and Gladen, 1986; Zhou and Weinberg, 1996; Scheike and Jensen, 1997; Dunson and Zhou, 1997; Dunson and Weinberg, 2000; Dunson and others, 2001; Dunson and Stanford, 2005). Reproductive age women typically ovulate a single ovum each menstrual cycle, which is presumed capable of being fertilized by a single sperm. A host of behaviors impact the probability of conception as do various biologic markers such as the quality of cervical mucus (Bigelow and others, 2004) or sperm. While investigators continue to improve their ability to measure exposures during the critical window of human reproduction, less attention has been paid to the modeling of behavioral factors despite their importance for conception probabilities (Dunson, 2003; Stanford and Dunson, 2007). Furthermore, intercourse behavior is heterogeneous across couples and is presumably influenced by such factors as couples’ ages, parity or the number of live births, and sexual libido. These factors also influence fecundability, thereby making it difficult to differentiate between behavioral factors and biological determinants. For example, parity may be associated with an increased probability of conception but associated with less-frequent sexual intercourse. As such, statistical models must be capable of specifying lifestyle or behavioral factors (e.g. cigarette smoking, diet) without having to condition in any way on intercourse patterns. Most of the analysis of human fecundability data has focused on the model proposed by Schwartz and others (1980):
$$P(Y_{ij} = 1 \mid \mathbf{X}_{ij}) = \omega \Bigl\{ 1 - \prod_{k} (1 - \lambda_k)^{X_{ijk}} \Bigr\}. \tag{1.1}$$
In the above equation, $Y_{ij}$ is the indicator of conception for couple $i$ in cycle $j$, $\mathbf{X}_{ij} = (X_{ij1}, \ldots, X_{ijK})$ denotes the vector of intercourse indicators over the fertile window for couple $i$ in cycle $j$, and $k$ denotes a specific day in the fertile window. Furthermore, $\omega$ denotes the cycle viability probability and $\lambda_k$ denotes the probability of conception in a viable cycle with intercourse only on day $k$ of the fertile window. Various generalizations of the Schwartz model have been proposed in the literature: Weinberg and others (1994) incorporated cycle-specific covariates, Dunson and Zhou (1997) incorporated within-woman dependency, and Zhou and Weinberg (1996) incorporated day-specific covariates. However, data on biological factors that indicate the viability status of a cycle are typically unavailable, thus leading to weak identifiability of the cycle viability term $\omega$. To address this, Dunson and Stanford (2005) proposed the following model:
$$P(Y_{ij} = 1 \mid \mathbf{X}_{ij}) = 1 - \prod_{k} (1 - \lambda_{ijk})^{X_{ijk}}. \tag{1.2}$$
Here the $k$th day-specific conception probability $\lambda_{ijk}$ is allowed to vary from cycle to cycle and can be subject specific. The authors used a complementary log-log model with a gamma frailty term to model the day-specific conception probabilities. The above-mentioned models ((1.1) and (1.2) and their subsequent extensions in the literature) are conditional on the intercourse pattern, that is, the intercourse pattern is assumed to be given. However, Dunson (2003) studied the joint modeling of intercourse and fecundability under a variant of the Schwartz model that does not require the intercourse pattern to be fixed, and proposed the following model for fecundability:
$$P(Y_{ij} = 1 \mid \omega_{ij}, \boldsymbol{\rho}, \boldsymbol{\lambda}) = \omega_{ij} \Bigl\{ 1 - \prod_{k} (1 - \rho_{ijk} \lambda_k) \Bigr\}, \tag{1.3}$$
where $\boldsymbol{\rho} = (P(X_{ij1} = 1), \ldots, P(X_{ijK} = 1))$ denotes the probabilities of intercourse in the fertile window and is calculated by modeling the probability of the intercourse pattern. Note that this is a marginal model for conception with respect to intercourse. As pointed out by Dunson (2003), this marginal model is useful when (i) the interest lies in modeling fecundability and some of the cycles have missing intercourse information, (ii) the interest lies in marginal modeling of fecundability, and (iii) one is interested in ascertaining the determinants of sexual behavior. However, in Dunson (2003), it is assumed that a couple’s acts of intercourse on consecutive days are independent. This may be too restrictive for many applications. In this paper, we are interested in (i) understanding the heterogeneity of the intercourse behavior in the fertile window by accounting for dependency of intercourse acts on consecutive days, (ii) modeling the probability of conception without conditioning on the intercourse behavior using a marginal model for fecundability while accounting for varied day-specific intercourse probabilities, and (iii) identifying the determinants of intercourse behavior. Consequently, we study the joint modeling of intercourse pattern and human fecundability under a variant of (1.2), without assuming independence of the acts of intercourse on consecutive days. In fact, this dependency may also vary across days. Wilcox and others (2004) have found that the highest frequency of intercourse occurs during the fertile window, estimated to be 5 days before ovulation and the day of ovulation. These findings are indicative of a possible biological libido effect. Motivated by this, we propose a structural equation model (SEM) to analyze and assess the dependency between intercourse acts during the fertile window, via a set of latent variables, taking into account the natural nesting of days within a cycle for an individual.
Furthermore, the day-specific conception probabilities are modeled by a generalized linear mixed-effects model which allows them to be both subject and cycle specific. This paper is organized as follows. In Section 2, we present our SEM developed for the intercourse pattern and our model for the probability of conception. In Section 3, we apply our methods to the New York State Angler Cohort Prospective Pregnancy Study (NYSACPPS). In Section 4, we conclude with a discussion of our findings. Details of the likelihood, priors, and posteriors based upon the proposed models and the derivation of an appropriate deviance function for model assessment are provided in the supplementary material available at Biostatistics online. The development of an efficient Markov chain Monte Carlo (MCMC) algorithm for carrying out the computations is also provided in the accompanying supplementary material.
2. MODEL
2.1 Fecundability models
We begin by introducing some notation. Henceforth, $i$ denotes a couple, $j$ denotes a menstrual cycle, and $k$ denotes a specific day within the fertile window of a menstrual cycle. Here $k$ is indexed relative to the day of ovulation (day 0) and is negative prior to ovulation and positive afterward. We will use an extended fertile window ranging from $k = -8, \ldots, 3$ instead of the typical $k = -5, \ldots, 1$ in order to account for considerable variation in the length of a fertile window (see Keulers and others, 2007). We assume that there are $I$ couples in the study and that the $i$th couple contributes $n_i$ cycles. Let $Y_{ij}$ denote the binary indicator variable for conception, where $Y_{ij}$ equals 1 if conception occurs in cycle $j$ for couple $i$ and is equal to 0 otherwise. Similarly, let $X_{ijk}$ denote the binary indicator variable for intercourse, that is, $X_{ijk} = 1$ if intercourse occurs on day $k$ of cycle $j$ for couple $i$ and is 0 otherwise. Let $\mathbf{Z}_{ij} = (Z_{ij1}, \ldots, Z_{ijp})$ and $\mathbf{U}_{ij} = (U_{ij1}, \ldots, U_{ijq})$ denote the covariate vectors which potentially influence fecundability and intercourse, respectively. The $\mathbf{Z}_{ij}$ may share common components with $\mathbf{U}_{ij}$. We also write $\rho^1_{ijk} = \Pr(X_{ijk} = 1 \mid \mathbf{U}_{ij})$ and $\rho^0_{ijk} = \Pr(X_{ijk} = 0 \mid \mathbf{U}_{ij}) = 1 - \rho^1_{ijk}$. Moreover, let $\boldsymbol{\rho}_{ijk} = (\rho^0_{ijk}, \rho^1_{ijk})$ and $\boldsymbol{\rho} = (\boldsymbol{\rho}_{ijk}, k = 1, \ldots, K, j = 1, \ldots, n_i, i = 1, \ldots, I)'$.
The model (1.2) for the probability of conception in cycle $j$ for couple $i$, conditional on the intercourse pattern $\mathbf{X}_{ij} = (X_{ij1}, \ldots, X_{ijK})'$ and $b_i$, is
$$P(Y_{ij} = 1 \mid \mathbf{X}_{ij}, b_i) = 1 - \prod_{k} (1 - \lambda^*_{ijk})^{X_{ijk}}.$$
Here $b_i$ is the couple-specific random effect and $\lambda^*_{ijk}$ is the day-specific probability of conception on day $k$ of cycle $j$ for couple $i$ given that intercourse occurs only on day $k$. Let $\boldsymbol{\lambda}^* = (\lambda^*_{ijk}, k = 1, \ldots, K; j = 1, \ldots, n_i; i = 1, \ldots, I)'$. Note that the above model is valid under the assumption that batches of sperm introduced into the reproductive tract by different intercourse acts commingle and compete independently to fertilize the ovum for a couple within a cycle. Under this assumption, one can show that the marginal (with respect to intercourse) model of interest for the probability of conception becomes
$$\Pr(Y_{ij} = 1 \mid \boldsymbol{\rho}, \boldsymbol{\lambda}^*) = 1 - \prod_{k=1}^{K} (1 - \rho^1_{ijk} \lambda^*_{ijk}). \tag{2.1}$$
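The step from the conditional model to the marginal model (2.1) can be checked numerically. The sketch below is illustrative only: it assumes that, given $\boldsymbol{\rho}$, the intercourse indicators are independent across days (an assumption of this sketch), and the function names and the numerical values of `rho1` and `lam_star` are hypothetical, not taken from the study.

```python
from itertools import product
import math


def conditional_conception_prob(x, lam_star):
    """P(Y = 1 | X = x) = 1 - prod_k (1 - lam*_k)^{x_k}."""
    return 1.0 - math.prod((1.0 - l) ** xi for xi, l in zip(x, lam_star))


def marginal_conception_prob(rho1, lam_star):
    """Closed form (2.1): 1 - prod_k (1 - rho1_k * lam*_k)."""
    return 1.0 - math.prod(1.0 - r * l for r, l in zip(rho1, lam_star))


def marginal_by_enumeration(rho1, lam_star):
    """Average the conditional model over all 2^K intercourse patterns,
    assuming independent intercourse indicators across days."""
    total = 0.0
    for x in product((0, 1), repeat=len(rho1)):
        # Probability of this pattern under day-wise independence.
        p_x = math.prod(r if xi else 1.0 - r for xi, r in zip(x, rho1))
        total += p_x * conditional_conception_prob(x, lam_star)
    return total


# Hypothetical day-specific intercourse and conception probabilities.
rho1 = [0.20, 0.30, 0.25]
lam_star = [0.05, 0.12, 0.10]

closed = marginal_conception_prob(rho1, lam_star)
brute = marginal_by_enumeration(rho1, lam_star)
print(abs(closed - brute) < 1e-12)
```

The enumeration is exponential in the number of days and is shown only as a check; the product form is what makes the marginal model cheap to evaluate.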
The advantage of using such a marginal model has already been explained in Section 1. In addition, this model avoids the weak identifiability issues of the Schwartz and others (1980) model by allowing the day-specific conception probabilities to vary by couple as well as cycle. Next, we model the day-specific conception probabilities through the following reparameterization:
$$\lambda^*_{ijk} = \frac{\exp(\lambda_{ijk})}{1 + \exp(\lambda_{ijk})},$$
using a generalized linear mixed-effects model. Observe that with the above reparameterization, $-\infty < \lambda_{ijk} < \infty$. Let $\boldsymbol{\lambda} = (\lambda_{ijk}, k = 1, \ldots, K, j = 1, \ldots, n_i, i = 1, \ldots, I)'$ and $\mathbf{b} = (b_1, \ldots, b_I)'$. Incorporating this, our proposed conception model is given by
$$\Pr(Y_{ij} = 1 \mid \boldsymbol{\rho}, \mathbf{b}, \boldsymbol{\lambda}, \mathbf{Z}_{ij}) = 1 - \prod_{k=1}^{K} \frac{1 + \rho^0_{ijk} \exp(\lambda_{ijk})}{1 + \exp(\lambda_{ijk})} \quad \text{and} \quad \lambda_{ijk} = b_i + \mathbf{Z}'_{ijk} \boldsymbol{\beta} + \epsilon_{ijk}. \tag{2.2}$$
Here the subject-specific effect $b_i \sim N(0, \sigma^2)$, where the variance $\sigma^2$ is parameterized as $e^{\sigma_b}$, the overall error $\epsilon_{ijk} \sim N(0, \sigma_e^2)$, and $b_i$ and $\epsilon_{ijk}$ are assumed independent. Observe that the underlying latent variable has a random-effects model structure and that the reparameterized $\lambda_{ijk}$ can take any value on the real line. This representation facilitates an easy implementation of the Gibbs sampling algorithm.
2.2 Intercourse behavior models
We develop an SEM to analyze the multilevel intercourse pattern in a fertile window. SEMs are powerful multivariate regression techniques that can handle scenarios where the predictor and response variables can be either latent or observed. Bayesian methods to handle heterogeneity in SEMs have shown that ignoring heterogeneity can lead to misleading inferences (Ansari and others, 2000). Bayesian methods for analyzing data with clustered structure via SEMs are investigated by Ansari and others (2000) and Dunson and Perreault (2001). Recall that our definition of the fertile window is from day −8 before ovulation (day 0) to day 3 after ovulation. We are interested in modeling the dependency between the occurrence of intercourse acts on different days within a fertile window of a menstrual cycle and studying the effects of covariates like
smoking, alcohol, parity, and age on frequency of intercourse. Additionally, the dependency of occurrence of intercourse acts may also vary across days (Wilcox and others, 2004). In order to account for these issues, we begin by proposing the following measurement model part of the SEM for the binary response (intercourse pattern) $\{X_{ijk}, k = 1, \ldots, K\}$ based on the covariates $\{\mathbf{U}_{ijk}, k = 1, \ldots, K\}$. We model the binary intercourse data via the flexible approach of data augmentation, first proposed by Albert and Chib (1993). In this approach, latent variables are introduced in an intermediate step. The use of latent variables can be thought of as data augmentation, which aids in the calculation of the exact posterior distribution of the parameters in the model for binary data given the observed binary data. For example, the probit regression model for binary outcomes can be seen to have an underlying normal regression structure on latent continuous data. See Albert and Chib (1993) and Van Dyk and Meng (2001) for details. Motivated by this, the proposed model for intercourse in terms of the intermediate latent variables is given as follows:
$$X_{ijk} = \begin{cases} 1 & \text{if } w_{ijk} > 0, \\ 0 & \text{if } w_{ijk} \le 0, \end{cases} \quad \text{and} \quad w_{ijk} = \mu_k + \boldsymbol{\alpha}'_k \boldsymbol{\omega}_k \boldsymbol{\eta}_{ij} + \boldsymbol{\phi}' \mathbf{U}_{ijk} + \delta_{ijk}, \tag{2.3}$$
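where $\delta_{ijk} \sim N(0, \sigma_k^2)$, $\mu_k$ is the overall mean effect due to response $k$ (i.e. intercourse on day $k$), and $\boldsymbol{\alpha}_k$ is a $p_k$-dimensional column vector of coefficients loading on $\boldsymbol{\eta}_{ij}$, an $r$-dimensional vector of latent variables. Also, $\boldsymbol{\omega}_k$ is the $p_k \times r$ fixed loading matrix that controls the dependence of response $k$ on the set of latent variables and $\boldsymbol{\phi}$ is the $q$-dimensional vector of regression coefficients corresponding to $\mathbf{U}_{ij}$, a $q$-dimensional vector of covariates. Note that the probability of intercourse is modeled by
$$P(X_{ijk} = 1 \mid \boldsymbol{\eta}_{ij}, \mathbf{U}_{ijk}) = \Phi(\mu_k + \boldsymbol{\alpha}'_k \boldsymbol{\omega}_k \boldsymbol{\eta}_{ij} + \boldsymbol{\phi}' \mathbf{U}_{ijk}), \tag{2.4}$$
where $\Phi$ denotes the standard normal distribution function. Given $(\mu_k, \boldsymbol{\alpha}_k, \boldsymbol{\phi}, \mathbf{U}_{ijk})$, the mean of $w_{ijk}$ is $E(w_{ijk} \mid \mu_k, \boldsymbol{\alpha}_k, \boldsymbol{\phi}, \mathbf{U}_{ijk}) = \mu_k + \boldsymbol{\phi}' \mathbf{U}_{ijk}$. The covariance between the latent variables for intercourse on days $k$ and $k'$ for the $i$th couple in cycle $j$ is given by
$$\mathrm{Cov}(w_{ijk}, w_{ijk'}) = \boldsymbol{\alpha}'_k \boldsymbol{\omega}_k \mathrm{Var}(\boldsymbol{\eta}_{ij}) \boldsymbol{\omega}'_{k'} \boldsymbol{\alpha}_{k'} + \sigma_k^2 I\{k = k'\}. \tag{2.5}$$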
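To make the covariance structure in (2.5) concrete, the following sketch assembles the full covariance matrix of the latent variables for one cycle. It assumes $p_k = 1$, so that each $\boldsymbol{\alpha}_k$ is a scalar and each $\boldsymbol{\omega}_k$ is a row vector that picks out one latent variable. The function name `latent_covariance` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def latent_covariance(alpha, omega, v_eta, sigma2):
    """Covariance matrix of (w_1, ..., w_K) following (2.5) with p_k = 1.

    alpha  : (K,)   scalar loading alpha_k for each day
    omega  : (K, r) row k is the fixed loading vector omega_k
    v_eta  : (r, r) Var(eta)
    sigma2 : (K,)   residual variances sigma_k^2
    """
    a = alpha[:, None] * omega  # row k is alpha_k * omega_k
    return a @ v_eta @ a.T + np.diag(sigma2)


# Hypothetical example: 4 days, 2 latent variables.
alpha = np.array([0.7, 0.4, 0.5, 0.3])
omega = np.array([[1.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 1.0]])  # days 1-2 load on eta_1, days 3-4 on eta_2
v_eta = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
sigma2 = np.ones(4)

C = latent_covariance(alpha, omega, v_eta, sigma2)
print(np.round(C, 3))
```

In this layout, days that load on the same latent variable have covariance $\alpha_k \alpha_{k'}$, while days that load on different latent variables are linked only through the off-diagonal entries of $\mathrm{Var}(\boldsymbol{\eta}_{ij})$.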
Next, we consider the following structural part of the model:
$$\boldsymbol{\eta}_{ij} = \Gamma \boldsymbol{\eta}_{ij} + \boldsymbol{\xi}_{ij}, \tag{2.6}$$
where $\boldsymbol{\xi}_{ij} = (\xi_{ij1}, \xi_{ij2}, \ldots, \xi_{ijr})' \sim N(0, \mathrm{diag}(\sigma^2_{\eta_1}, \ldots, \sigma^2_{\eta_r}))$, and $\boldsymbol{\xi}_{ij}$ and $\delta_{ijk}$ are assumed to be independent. Here, a single loading matrix $\Gamma$ is assumed for both endogenous and exogenous latent variables. Further, note that $\Gamma$ is an $r \times r$ loading matrix such that the diagonal elements of $\Gamma$ are all 0 and the rows of $\Gamma$ corresponding to the exogenous latent variables are all 0. We assume that $I - \Gamma$ is invertible. To avoid certain scaling problems between the $\boldsymbol{\eta}_{ij}$ and the scale of $\boldsymbol{\alpha}_k$, the variance of each component of $\boldsymbol{\eta}_{ij}$ is assumed to be 1. This can be achieved without any loss of generality, details of which are provided in the supplementary material available at Biostatistics online. For the NYSACPPS, the loading of the 12 manifest variables (fertile window length here is 12 days) in the measurement model, as well as the structural model of the SEM, is illustrated in Figure 1. We split our fertile window into 3 parts: days −8 through −5; days −4 through 1; and days 2 and 3. This is motivated by the findings of Wilcox and others (2004) that an increase in frequency of intercourse occurs around ovulation, reflecting biological changes in libido.
Fig. 1. Path diagram for the New York State Angler Prospective Pregnancy Cohort.
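The proposed structural model of the SEM based on Figure 1 can be explicitly written as
$$\eta_{ij1} = \xi_{ij1}, \quad \eta_{ij2} = \frac{\gamma_1}{\sqrt{1 + \gamma_1^2}}\, \eta_{ij1} + \xi_{ij2}, \quad \eta_{ij3} = \frac{\gamma_2 \sqrt{1 + \gamma_1^2}}{\sqrt{1 + \gamma_2^2 + (\gamma_3 + \gamma_1 \gamma_2)^2}}\, \eta_{ij2} + \frac{\gamma_3}{\sqrt{1 + \gamma_2^2 + (\gamma_3 + \gamma_1 \gamma_2)^2}}\, \eta_{ij1} + \xi_{ij3},$$
where $\xi_{ij1} \overset{\text{i.i.d.}}{\sim} N(0, 1)$, $\xi_{ij2} \overset{\text{i.i.d.}}{\sim} N\bigl(0, \frac{1}{1 + \gamma_1^2}\bigr)$, and $\xi_{ij3} \overset{\text{i.i.d.}}{\sim} N\bigl(0, \frac{1}{1 + \gamma_2^2 + (\gamma_3 + \gamma_1 \gamma_2)^2}\bigr)$. Note that the above formulation yields $\boldsymbol{\eta}_{ij} \sim N(0, V_\eta)$ with
$$V_\eta = \begin{pmatrix} 1 & \dfrac{\gamma_1}{\sqrt{1 + \gamma_1^2}} & \dfrac{\gamma_1 \gamma_2 + \gamma_3}{\sqrt{1 + \gamma_2^2 + (\gamma_3 + \gamma_1 \gamma_2)^2}} \\ \dfrac{\gamma_1}{\sqrt{1 + \gamma_1^2}} & 1 & \dfrac{(1 + \gamma_1^2) \gamma_2 + \gamma_1 \gamma_3}{\sqrt{1 + \gamma_1^2} \sqrt{1 + \gamma_2^2 + (\gamma_3 + \gamma_1 \gamma_2)^2}} \\ \dfrac{\gamma_1 \gamma_2 + \gamma_3}{\sqrt{1 + \gamma_2^2 + (\gamma_3 + \gamma_1 \gamma_2)^2}} & \dfrac{(1 + \gamma_1^2) \gamma_2 + \gamma_1 \gamma_3}{\sqrt{1 + \gamma_1^2} \sqrt{1 + \gamma_2^2 + (\gamma_3 + \gamma_1 \gamma_2)^2}} & 1 \end{pmatrix}. \tag{2.7}$$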
Observe that the diagonal elements of Vη all turn out to be 1. Note that the models (2.4) and (2.6) accommodate a broad variety of intercourse patterns and dependency structures.
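The form of $V_\eta$ in (2.7) can be cross-checked numerically. The sketch below rests on an assumption that is not stated in the text: that the standardized system arises from an unstandardized recursive model $\eta^* = G \eta^* + \xi^*$ with unit-variance errors and path coefficients $\gamma_1$ (first to second latent variable), $\gamma_2$ (second to third), and $\gamma_3$ (first to third), followed by rescaling each component to unit variance. The function names are hypothetical; the example values of $\gamma$ are the posterior means reported in Table 1.

```python
import numpy as np


def v_eta_closed_form(g1, g2, g3):
    """Correlation matrix V_eta written out as in (2.7)."""
    d2 = 1.0 + g1 ** 2
    d3 = 1.0 + g2 ** 2 + (g3 + g1 * g2) ** 2
    r12 = g1 / np.sqrt(d2)
    r13 = (g1 * g2 + g3) / np.sqrt(d3)
    r23 = ((1.0 + g1 ** 2) * g2 + g1 * g3) / (np.sqrt(d2) * np.sqrt(d3))
    return np.array([[1.0, r12, r13],
                     [r12, 1.0, r23],
                     [r13, r23, 1.0]])


def v_eta_from_paths(g1, g2, g3):
    """Same matrix from the assumed unstandardized recursive system
    eta* = G eta* + xi*, xi* ~ N(0, I), rescaled to unit variances."""
    G = np.array([[0.0, 0.0, 0.0],
                  [g1, 0.0, 0.0],
                  [g3, g2, 0.0]])
    B = np.linalg.inv(np.eye(3) - G)  # eta* = B xi*
    S = B @ B.T                       # Var(eta*)
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)         # correlation matrix


gamma = (3.0224, 2.3392, 0.7406)  # posterior means of gamma_1..gamma_3, Table 1
V_closed = v_eta_closed_form(*gamma)
V_paths = v_eta_from_paths(*gamma)
print(np.allclose(V_closed, V_paths))
```

Under this reading, the square-root terms in (2.7) are simply the standard deviations of the unstandardized latent variables, which is why the diagonal of $V_\eta$ equals 1.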
Moreover, we can calculate the probability of intercourse on a specific day $k$ as
$$\Pr(X_{ijk} = 1 \mid \mathbf{U}_{ijk}) = \int_0^\infty \int N(w_{ijk} : \mu_k + \boldsymbol{\alpha}'_k \boldsymbol{\omega}_k \boldsymbol{\eta}_{ij} + \boldsymbol{\phi}' \mathbf{U}_{ijk}, 1)\, N(\boldsymbol{\eta}_{ij} : 0, V_\eta)\, d\boldsymbol{\eta}_{ij}\, dw_{ijk}. \tag{2.8}$$
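For intuition about (2.8), consider the special case $p_k = 1$ in which day $k$ loads on a single latent variable with variance $v$. Since $w_{ijk}$ is then a sum of independent normal terms, integrating out the latent variable gives $w_{ijk} \sim N(m, 1 + \alpha_k^2 v)$ with $m = \mu_k + \boldsymbol{\phi}' \mathbf{U}_{ijk}$, so the double integral reduces to $\Phi\bigl(m / \sqrt{1 + \alpha_k^2 v}\bigr)$. The sketch below compares this closed form with a direct Monte Carlo evaluation. The function names and the numerical inputs are hypothetical, and this is not the authors' computational approach.

```python
import math
import random


def std_normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def marginal_intercourse_prob(mean, alpha, v=1.0):
    """Closed form of (2.8) when w = mean + alpha * eta + delta,
    eta ~ N(0, v) and delta ~ N(0, 1) independent."""
    return std_normal_cdf(mean / math.sqrt(1.0 + alpha ** 2 * v))


def monte_carlo_prob(mean, alpha, v=1.0, n=200_000, seed=0):
    """Simulate the latent variable w and count how often w > 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        eta = rng.gauss(0.0, math.sqrt(v))
        w = mean + alpha * eta + rng.gauss(0.0, 1.0)
        hits += w > 0
    return hits / n


# Hypothetical linear predictor and loading for one day.
closed = marginal_intercourse_prob(mean=-1.0, alpha=0.7)
simulated = monte_carlo_prob(mean=-1.0, alpha=0.7)
print(abs(closed - simulated) < 0.01)
```

The closed form makes explicit that a larger loading shrinks the marginal probability toward 0.5 for a fixed linear predictor, because the latent variable adds variance to $w_{ijk}$.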
3. ANALYSIS OF THE NEW YORK ANGLER PROSPECTIVE COHORT STUDY
We illustrate our proposed method by analyzing the NYSACPPS (Buck Louis and others, 2009). Specifically, a prospective cohort design was used to recruit women aged 20 to 34 years who resided in the 16 contiguous counties surrounding Lakes Erie and Ontario. The study cohort comprised 113 women who reported planning pregnancies within the next 6 months; 14 women were pregnant at baseline and were excluded. Eighty-three (84%) women returned daily diary information about menstruation, sexual intercourse, home pregnancy test results, and covariates believed to impact female fecundity. Given the absence of a biomarker of ovulation for the study, the Ogino–Knaus method was used for estimating the likely date of ovulation by counting back 14 days from the end of the cycle (Knaus, 1929; Ogino, 1930). The fertile window was defined by including 8 days before and 3 days after the estimated day of ovulation. In our analysis, we will focus on the following factors: intercourse patterns during the fertile window; conception; and covariates, namely, female age (years) upon enrollment, parity (yes/no), number of daily cigarettes smoked, and alcoholic beverages consumed. A detailed description of the characteristics of the cohort at enrollment by pregnancy status is provided in the supplementary material available at Biostatistics online. To help the numerical stability in the implementation of the MCMC sampling algorithm, we standardized all covariates.
3.1 Intercourse pattern analysis
We begin by presenting our analysis of the data corresponding to the intercourse pattern of couples. We fitted the model (2.3) where the day-specific covariates of interest are $\mathbf{U}_{ijk}$ = (Smoke (Yes/No), Alcohol (Yes/No), Parity (Yes/No), Age)'. Recall that we have split the fertile window into 3 parts, FW1: days −8 through −5 prior to ovulation; FW2: days −4 through +1 (days around ovulation); and FW3: days +2, +3 post ovulation. The path connections considered are all forward in time, as it is not reasonable to assume that the dependence structure would have a backward-in-time influence. Here we take the dimension $p_k$ of $\boldsymbol{\alpha}_k$ to be 1 and $\Gamma$ is a $3 \times 3$ matrix. The parameters of interest in the model (2.3) are $\boldsymbol{\theta} = (\boldsymbol{\mu}, \boldsymbol{\alpha}, \boldsymbol{\phi}, \boldsymbol{\gamma}, \sigma_\alpha^2)'$, where $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \gamma_3)'$ is a 3-dimensional vector of the parameters in (2.6). Note that $\gamma_1$ and $\gamma_3$ give a measure of the influence of FW1 on FW2 and FW3, respectively. Similarly, $\gamma_2$ gives the influence of FW2 on FW3. The parameter $\mu_k$ denotes the overall main effect for intercourse on day $k$ and the parameter $\boldsymbol{\phi}$ indicates the influence of the covariates $\mathbf{U}_{ijk}$ associated with it. Positive values of a component of $\boldsymbol{\phi}$ indicate a positive association of intercourse with the corresponding covariate, and negative values indicate a negative association. We assume that $(\boldsymbol{\mu}, \boldsymbol{\alpha}, \boldsymbol{\phi})$ are independent a priori and $(\boldsymbol{\gamma}, \sigma_\alpha^2)$ are independent of $(\boldsymbol{\mu}, \boldsymbol{\alpha}, \boldsymbol{\phi})$ a priori. Furthermore, we assume that $\mu_k \sim N(0, \sigma_\mu^2)$, $\boldsymbol{\phi} \sim N(0, \sigma_\phi^2 I)$, $\gamma_l \sim N(0, \sigma_\gamma^2)$, $\boldsymbol{\alpha}_k \sim N(0, \sigma_\alpha^2 I)$, and $\pi(\sigma_\alpha^2) = \pi(\sigma_\alpha^2 \mid a_1, b_1) \propto (\sigma_\alpha^2)^{-(a_1 + 1)} \exp(-b_1 \sigma_\alpha^2)$, where $a_1$ and $b_1$ are two prespecified hyperparameters. We choose $\sigma_\mu^2 = 1000$ for $\pi(\mu_k)$, $\sigma_\phi^2 = 1000$ for $\pi(\boldsymbol{\phi})$, $\sigma_\gamma^2 = 10$ for $\pi(\gamma_l)$, and $a_1 = 1$ and $b_1 = 0.001$ for $\pi(\sigma_\alpha^2)$. In Table 1, we report the posterior means, the posterior standard deviations (SD), and the 95% highest posterior density (HPD) intervals of the model parameters under the intercourse model.
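Since the results are summarized by 95% HPD intervals, it may help to see how such an interval can be obtained from posterior draws. The sketch below uses a common sample-based approach: among all intervals spanned by order statistics that contain about 95% of the draws, take the shortest. It assumes a unimodal posterior, the function name `hpd_interval` is hypothetical, and the gamma draws merely stand in for MCMC output; it is not the authors' FORTRAN implementation.

```python
import numpy as np


def hpd_interval(samples, cred=0.95):
    """Shortest interval between order statistics that spans
    floor(cred * n) gaps of the sorted draws (unimodal case)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    m = int(np.floor(cred * n))
    if m < 1 or m >= n:
        raise ValueError("need 1 <= floor(cred * n) < number of samples")
    widths = x[m:] - x[:n - m]  # width of each candidate interval
    j = int(np.argmin(widths))  # index of the shortest candidate
    return x[j], x[j + m]


# Stand-in for posterior draws: a right-skewed gamma sample.
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=20_000)

lo, hi = hpd_interval(draws)
eq_lo, eq_hi = np.percentile(draws, [2.5, 97.5])
print(hi - lo < eq_hi - eq_lo)
```

For a skewed posterior, the HPD interval is shorter than the equal-tailed interval with the same coverage, which is one reason to prefer it for variance-type parameters and for probabilities close to 0.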
Table 1. Posterior estimates under proposed intercourse models for New York State Angler Pregnancy Data

| Variable | Estimate | SD | 95% HPD interval |
|---|---|---|---|
| μ−8 | −1.0109 | 0.1050 | (−1.2258, −0.8165) |
| μ−7 | −0.8288 | 0.0760 | (−0.9711, −0.6767) |
| μ−6 | −0.8054 | 0.0765 | (−0.9556, −0.6574) |
| μ−5 | −0.7375 | 0.0726 | (−0.8775, −0.5942) |
| μ−4 | −0.5468 | 0.0705 | (−0.6837, −0.4063) |
| μ−3 | −0.5511 | 0.0692 | (−0.6892, −0.4186) |
| μ−2 | −0.5728 | 0.0668 | (−0.7035, −0.4434) |
| μ−1 | −0.5566 | 0.0721 | (−0.6989, −0.4183) |
| μ0 | −0.5933 | 0.0730 | (−0.7342, −0.4479) |
| μ1 | −0.5063 | 0.0649 | (−0.6300, −0.3762) |
| μ2 | −0.6200 | 0.0715 | (−0.7566, −0.4770) |
| μ3 | −0.6741 | 0.0794 | (−0.8326, −0.5217) |
| φ1: Smoke | 0.0938 | 0.0257 | (0.0426, 0.1435) |
| φ2: Alcohol | 0.0628 | 0.0209 | (0.0213, 0.1032) |
| φ3: Parity | −0.0028 | 0.0283 | (−0.0578, 0.0536) |
| φ4: Age | −0.0559 | 0.0275 | (−0.1070, −0.0006) |
| α−8,1 | 0.6815 | 0.1469 | (0.4098, 0.9776) |
| α−7,1 | 0.3554 | 0.1120 | (0.1460, 0.5814) |
| α−6,1 | 0.3863 | 0.1086 | (0.1825, 0.6053) |
| α−5,1 | 0.3462 | 0.1082 | (0.1328, 0.5552) |
| α−4,2 | 0.4372 | 0.1077 | (0.2214, 0.6449) |
| α−3,2 | 0.3561 | 0.1070 | (0.1473, 0.5636) |
| α−2,2 | 0.2448 | 0.0948 | (0.0580, 0.4307) |
| α−1,2 | 0.4652 | 0.1118 | (0.2511, 0.6948) |
| α0,2 | 0.4606 | 0.1095 | (0.2557, 0.6842) |
| α1,2 | 0.2310 | 0.0946 | (0.0551, 0.4265) |
| α2,3 | 0.3837 | 0.1087 | (0.1763, 0.5973) |
| α3,3 | 0.5163 | 0.1251 | (0.2846, 0.7658) |
| γ1 | 3.0224 | 1.2167 | (1.0704, 5.4342) |
| γ2 | 2.3392 | 1.3717 | (0.0229, 5.0733) |
| γ3 | 0.7406 | 2.0028 | (−3.4541, 4.5386) |
| σα² | 0.1920 | 0.0923 | (0.0604, 0.3710) |

Table 2. Day-specific overall intercourse probabilities

| Day | Estimate | SD | 95% HPD interval |
|---|---|---|---|
| −8 | 0.20475 | 0.01900 | (0.16874, 0.24280) |
| −7 | 0.22067 | 0.01958 | (0.18383, 0.25963) |
| −6 | 0.22926 | 0.01973 | (0.19176, 0.26832) |
| −5 | 0.24596 | 0.02026 | (0.20680, 0.28629) |
| −4 | 0.31032 | 0.02162 | (0.26927, 0.35336) |
| −3 | 0.30420 | 0.02171 | (0.26266, 0.34682) |
| −2 | 0.29147 | 0.02159 | (0.24906, 0.33345) |
| −1 | 0.30911 | 0.02165 | (0.26694, 0.35150) |
| 0 | 0.29731 | 0.02163 | (0.25753, 0.34191) |
| 1 | 0.31316 | 0.02187 | (0.27143, 0.35691) |
| 2 | 0.28393 | 0.02141 | (0.24214, 0.32497) |
| 3 | 0.27718 | 0.02138 | (0.23617, 0.31931) |

Observe that the
parameters associated with the covariates smoke and alcohol, with the corresponding 95% HPD intervals of (0.0426, 0.1435) and (0.0213, 0.1032), indicate an association of increased intercourse activity with both smoking and alcohol consumption. However, we do not see a clear association of intercourse activity with parity or age (the 95% HPD interval for age only narrowly excludes 0), which may be due to the fact that every couple in the cohort is intending to become pregnant. Furthermore, to investigate the SEM, note that the 95% HPD intervals for γ1 and γ2 exclude 0, indicating that these parameters are significantly different from 0, whereas the interval for γ3 includes 0. This indicates that the intercourse pattern on days −8 through −5 (FW1) does influence the intercourse acts on days −4 through +1 (FW2), but does not significantly influence the intercourse activity on days +2, +3 (FW3). Similarly, intercourse acts on days −4 through +1 (FW2) influence intercourse activity on days +2, +3 (FW3). In Table 2, we present the posterior means, the posterior SD, and the 95% HPD intervals for the day-specific intercourse probabilities.
Fig. 2. Day-specific intercourse probability plots.
In Figure 2(a), we present the plot of posterior mean day-specific probability of intercourse comparing the effect of smoking, and in Figure 2(b), we present the plot of posterior mean day-specific probability of intercourse comparing the effect of alcohol consumption (Yes/No). Observe that the day-specific probability of intercourse increases as the days progress from FW1 to FW2 and then decreases from FW2 to FW3, supporting the findings of Wilcox and others (2004) that there is an increase in intercourse in the fertile window (their definition of the fertile window corresponds to our FW2).
3.2 Conception model
We fit the model (2.2) jointly with the model for intercourse (2.3) based on the following covariates: cigarette, alcohol, parity, and age. We also used $\mathbf{Z}_{ijk} = (1(k = -8), 1(k = -7), \ldots, 1(k = 3), \text{cigarette}, \text{alcohol}, \text{parity}, \text{age})'$ for the conception model. Furthermore, we constrained the $\beta$'s corresponding to $k = -8, \ldots, -5$ and $k = 2, 3$ to be smaller than the $\beta_k$, $k = -4, \ldots, 1$, but left them otherwise unconstrained within FW1, FW2, and FW3. This is reasonable since the day-specific conception probabilities should reduce as one gets further away from the day of ovulation. Here the parameters of interest are $\boldsymbol{\beta}$, $\sigma_b$, and $\sigma^2_{b*}$, assumed independent a priori. We assume $\boldsymbol{\beta} \sim N(0, \sigma_\beta^2 I)$, $\sigma_b \sim N(0, \sigma^2_{b*})$, and $\pi(\sigma^2_{b*}) = \pi(\sigma^2_{b*} \mid a_0, b_0) \propto (\sigma^2_{b*})^{-(a_0 + 1)} \exp(-b_0 \sigma^2_{b*})$, where $a_0$ and $b_0$ are two prespecified hyperparameters. We choose $\sigma_\beta^2 = 1000$ for $\pi(\boldsymbol{\beta})$, and $a_0 = 1$ and $b_0 = 0.001$ for $\pi(\sigma^2_{b*})$. In Table 3, we present the posterior means, the posterior SD, and the 95% HPD intervals of the model parameters under the conception model. Note that smoking has a significant negative effect on day-specific conception probabilities, while parity has a significant positive effect. However, alcohol consumption and age were not significant. The posterior means, the posterior SD, and the 95% HPD intervals of the day-specific conception probabilities are presented in Table 4. In Figure 3(a), we present the plot of posterior mean day-specific conception probability comparing the effect of smoking, and in Figure 3(b), we present the plot of posterior mean day-specific conception probability comparing the effect of parity.
Table 3. Posterior estimates under proposed conception models for New York State Angler Pregnancy Data

| Variable | Estimate | SD | 95% HPD interval |
|---|---|---|---|
| Smoke | −0.5376 | 0.2352 | (−1.0081, −0.0885) |
| Alcohol | −0.2982 | 0.4324 | (−1.1619, 0.4512) |
| Parity | 0.8094 | 0.2894 | (0.2691, 1.4004) |
| Age | 0.3599 | 0.2234 | (−0.0907, 0.7912) |
| β−8 | −7.8514 | 2.1547 | (−12.1945, −3.3510) |
| β−7 | −6.5851 | 1.5663 | (−9.4338, −3.6214) |
| β−6 | −5.6370 | 1.2758 | (−8.2139, −3.3881) |
| β−5 | −4.9760 | 1.0962 | (−7.1841, −2.9482) |
| β−4 | −3.3593 | 1.1588 | (−5.6670, −1.1267) |
| β−3 | −3.1821 | 1.0600 | (−5.1985, −1.0779) |
| β−2 | −2.9635 | 1.1254 | (−5.1081, −0.5505) |
| β−1 | −3.2792 | 1.1043 | (−5.4620, −1.1141) |
| β0 | −3.3667 | 1.0776 | (−5.4288, −1.3445) |
| β1 | −3.2139 | 1.0141 | (−5.2967, −1.2940) |
| β2 | −5.4665 | 1.3961 | (−8.4758, −3.0529) |
| β3 | −7.0515 | 2.0007 | (−10.9714, −3.3612) |
| σb² | 0.0115 | 0.0786 | (−0.1022, 0.1120) |
| σb*² | 0.0094 | 0.2178 | (0.0001, 0.0158) |
Table 4. Day-specific overall conception probabilities

| Day | Estimate | SD | 95% HPD interval |
|---|---|---|---|
| −8 | 0.00729 | 0.01326 | (0.00000, 0.03427) |
| −7 | 0.01394 | 0.01874 | (0.00006, 0.05277) |
| −6 | 0.02386 | 0.02290 | (0.00052, 0.06989) |
| −5 | 0.03636 | 0.02660 | (0.00164, 0.08807) |
| −4 | 0.11451 | 0.07737 | (0.00543, 0.27863) |
| −3 | 0.12535 | 0.07573 | (0.00942, 0.28233) |
| −2 | 0.14441 | 0.09289 | (0.00592, 0.33133) |
| −1 | 0.11977 | 0.07812 | (0.00658, 0.27798) |
| 0 | 0.11270 | 0.07040 | (0.00693, 0.24911) |
| 1 | 0.12051 | 0.07074 | (0.00901, 0.25935) |
| 2 | 0.02812 | 0.02397 | (0.00014, 0.07617) |
| 3 | 0.01229 | 0.01715 | (0.00001, 0.04944) |
Fig. 3. Day-specific conception probability plots.
Finally, we assessed the fit of the proposed models based on the deviance information criterion (DIC) as defined by Huang and others (2005). The DIC for the intercourse model with SEM (5981.82) is smaller than that for the model without SEM (6211.43), indicating an improved fit for our proposed modeling of intercourse with the structural part of the SEM. Furthermore, we also compared the DIC values (not presented here) for intercourse models with SEM based on various different splits of the fertile window into 2, 3, and 4 components and found that splitting the fertile window as illustrated in Figure 1 resulted in the smallest DIC. We see a similar pattern with DIC values for the conception model based on the intercourse model with SEM as compared to the intercourse model without SEM. In all the computations presented in this section, we used 20 000 iterations based on every 5th iteration out of 100 000 Gibbs samples to compute all posterior estimates, including posterior mean, posterior SD,
95% HPD intervals and DIC values, using a burn-in of 2000 iterations. The convergence of the Gibbs sampler was checked using several diagnostic procedures as recommended by Cowles and Carlin (1996). All HPD intervals were computed using a Monte Carlo method developed by Chen and Shao (1999). The computer codes were written in FORTRAN 95 using IMSL subroutines with double-precision accuracy.
4. DISCUSSION
We have proposed a joint model for the intercourse pattern and probability of conception incorporating the dependency of intercourse pattern, which may vary daily, via a structural equation modeling approach. The proposed marginal model for the probability of intercourse is quite general, while the proposed model for the conception probability accounts for varied day-specific intercourse probabilities and provides a marginal model for assessing the probability of conception without conditioning on intercourse pattern. Last, the day-specific conception probabilities are allowed to be influenced by multiple covariates via a very flexible generalized mixed-effects model. Our findings reflect that the probabilities of intercourse vary daily and generally increase in the days preceding ovulation as previously reported (Wilcox and others, 2004), indicative of a possible libido effect. However, other possibilities, such as couples having more frequent intercourse around the time of ovulation because they want to conceive or, conversely, intercourse activity inducing ovulation, cannot be ruled out. Furthermore, our analysis indicates that the intercourse patterns are significantly influenced by certain covariates like smoking and alcohol consumption, which also influence the probability of conception. Overall, our analysis of intercourse patterns indicates the need for measuring hormonal levels on a regular basis to better understand their relation with intercourse behaviors and eventual fecundability.
Results from our probability of conception model indicate that smoking reduces the probability of conception, while parity increases it. However, our day-specific probabilities appear to be lower than those found in other studies. They range from about 0.11 to 0.14 on days −4 through +1 and are more spread out across days than those reported elsewhere in the literature. This may be attributable to the fact that the ovulation marker used in this study is not very precise. In future work, we intend to use the approach of Dunson and others (2001) to account for the measurement error of the ovulation marker in this joint modeling approach.
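To make the role of the day-specific probabilities concrete, the following sketch shows how they could combine into a per-cycle conception probability. It assumes the classic Barrett and Marshall (1969) form, 1 − ∏(1 − p_k)^{x_k}, where p_k is the day-specific probability and x_k indicates intercourse on day k. The second function marginalizes over intercourse under the simplifying assumption that intercourse is independent across days, which is not the dependence structure of the proposed SEM. All numerical values are hypothetical, chosen only to fall in the 0.11 to 0.14 range quoted above, and are not the fitted estimates.

```python
def cycle_conception_prob(day_probs, intercourse):
    """Barrett-Marshall form: 1 - prod_k (1 - p_k) ** x_k for one cycle."""
    no_conception = 1.0
    for p, x in zip(day_probs, intercourse):
        if x:  # only days with intercourse contribute
            no_conception *= 1.0 - p
    return 1.0 - no_conception


def marginal_conception_prob(day_probs, intercourse_probs):
    """Average over intercourse patterns, ASSUMING independent days.

    Under independence, E[(1 - p_k) ** X_k] = 1 - q_k * p_k, where q_k is the
    probability of intercourse on day k. This is an illustrative
    simplification, not the dependence structure of the paper's model.
    """
    no_conception = 1.0
    for p, q in zip(day_probs, intercourse_probs):
        no_conception *= 1.0 - q * p
    return 1.0 - no_conception


if __name__ == "__main__":
    # Hypothetical values for days -4, -3, -2, -1, 0, +1 relative to ovulation.
    day_probs = [0.11, 0.12, 0.14, 0.14, 0.13, 0.11]
    pattern = [0, 1, 0, 1, 0, 0]                      # intercourse on days -3 and -1
    intercourse_probs = [0.25, 0.30, 0.35, 0.35, 0.30, 0.25]

    print("Given pattern:", round(cycle_conception_prob(day_probs, pattern), 4))
    print("Marginal     :", round(marginal_conception_prob(day_probs, intercourse_probs), 4))
```

The first function conditions on an observed intercourse pattern, whereas the second illustrates what assessing conception without conditioning on the intercourse pattern means in the simplest possible case.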
SUPPLEMENTARY MATERIAL
Supplementary material is available at http://biostatistics.oxfordjournals.org.
ACKNOWLEDGMENTS

We are grateful to the referees and the editor for constructive suggestions that have significantly improved this paper. Conflict of Interest: None declared.
FUNDING

Intramural research program of National Institutes of Health; Eunice Kennedy Shriver National Institute of Child Health and Human Development.
REFERENCES

ALBERT, J. H. AND CHIB, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association 88, 669–679.
ANSARI, A., JEDIDI, K. AND JAGPAL, S. (2000). A hierarchical Bayesian methodology for treating heterogeneity in structural equation models. Marketing Science 19, 328–347.
BARRETT, J. C. AND MARSHALL, J. (1969). The risk of conception on different days of the menstrual cycle. Population Studies 23, 455–461.
BIGELOW, J. L., DUNSON, D. B., STANFORD, J. B., ECOCHARD, R., GNOTH, C. AND COLOMBO, B. (2004). Mucus observations in the fertile window: a better predictor of conception than timing of intercourse. Human Reproduction 19, 889–892.
BUCK LOUIS, G. M., DMOCHOWSKI, J., LYNCH, C., KOSTYNIAK, P., MCGUINNESS, B. M. AND VENA, J. E. (2009). Polychlorinated biphenyl serum concentrations, lifestyle and time-to-pregnancy. Human Reproduction 24, 451–458.
CHEN, M.-H. AND SHAO, Q. M. (1999). Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics 8, 69–92.
COWLES, C. AND CARLIN, B. P. (1996). Markov chain Monte Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association 91, 883–904.
DAS, S., CHEN, M.-H., KIM, S. AND WARREN, N. (2008). Bayesian structural equations model for multilevel data with missing responses and missing covariates. Bayesian Analysis 3, 197–224.
DUNSON, D. B. (2003). Incorporating heterogeneous intercourse records into time to pregnancy models. Mathematical Population Studies 10, 127–143.
DUNSON, D. B. AND PERREAULT, S. D. (2001). Factor analytic models of clustered multivariate data with informative censoring. Biometrics 57, 302–308.
DUNSON, D. B. AND STANFORD, J. B. (2005). Bayesian inferences on predictors of conception probabilities. Biometrics 61, 126–133.
DUNSON, D. B. AND WEINBERG, C. R. (2000). Accounting for unreported and missing intercourse in human fertility studies. Statistics in Medicine 19, 665–679.
DUNSON, D. B., WEINBERG, C. R., BAIRD, D. D., KESNER, J. S. AND WILCOX, A. J. (2001). Assessing human fertility using several markers of ovulation. Statistics in Medicine 20, 965–978.
DUNSON, D. B. AND ZHOU, H. B. (1997). A Bayesian model for fecundability and sterility. Journal of the American Statistical Association 95, 1054–1062.
HUANG, L., CHEN, M.-H. AND IBRAHIM, J. G. (2005). Bayesian analysis for generalized linear models with nonignorably missing covariates. Biometrics 61, 767–780.
KEULERS, M. J., HAMILTON, C. J. C. M., FRANX, A., EVERS, J. L. H. AND BOTS, R. S. G. M. (2007). The length of the fertile window is associated with the chance of spontaneously conceiving an ongoing pregnancy in subfertile couples. Human Reproduction 22, 1652–1656.
KNAUS, H. (1929). Eine neue Methode zur Bestimmung des Ovulationstermines. Zentralblatt für Gynäkologie 53, 2193.
OGINO, K. (1930). Ovulationstermin und Konzeptionstermin. Zentralblatt für Gynäkologie 54, 464–479.
SCHEIKE, T. H. AND JENSEN, T. K. (1997). A discrete survival model with random effects: an application to time to pregnancy. Biometrics 53, 318–329.
SCHWARTZ, D., MACDONALD, P. D. M. AND HEUCHEL, V. (1980). Fecundability, coital frequency and the viability of ova. Population Studies 34, 397–400.
STANFORD, J. B. AND DUNSON, D. B. (2007). Effects of sexual intercourse patterns in time to pregnancy studies. American Journal of Epidemiology 165, 1088–1095.
VAN DYK, D. A. AND MENG, X.-L. (2001). The art of data augmentation. Journal of Computational and Graphical Statistics 10, 1–50.
WEINBERG, C. R. AND GLADEN, B. C. (1986). The beta-geometric distribution applied to comparative fecundability studies. Biometrics 42, 547–560.
WEINBERG, C. R., GLADEN, B. C. AND WILCOX, A. J. (1994). Models relating the timing of intercourse to the probability of conception and the sex of the baby. Biometrics 50, 358–367.
WILCOX, A. J., BAIRD, D. D., DUNSON, D. B., MCCONNAUGHEY, D. R., KESNER, J. S. AND WEINBERG, C. R. (2004). On the frequency of intercourse around ovulation: evidence for biological influences. Human Reproduction 19, 1539–1543.
ZHOU, H. B. AND WEINBERG, C. R. (1996). Modelling conception as an aggregated Bernoulli outcome with latent variables via the EM algorithm. Biometrics 52, 945–954.

[Received January 6, 2010; revised January 6, 2010; accepted for publication January 7, 2010]