Gaussian copula regression using R

University of Warwick

August 17, 2011

Gaussian copula regression using R Guido Masarotto University of Padua

Cristiano Varin Ca’ Foscari University, Venice

1/ 21

Framework Gaussian copula marginal regression models

● different types of responses: continuous, discrete, categorical ● several forms of possible dependence: � time series

� longitudinal/panel studies � spatial data � ...

● likelihood analysis

● alternative to generalized estimating equations ● R package gcmr

2/ 21

Model specification 1) response Yi related to covariates vector x i and error �i by

where

Yi = F −1 {Φ(�i )�x i ; λ},

� F (yi �x i ; λ) is c.d.f. of Yi given x i

i = 1, . . . , n,

� Φ(z ) is c.d.f. of N(0, 1)

2) multivariate normal errors

� = (�1 , . . . , �n )T ∼ MVN(0, Ω),

correlation matrix Ω parametrized so to reflect various types of dependence (time series, clustered data, spatial data, etc.)

3/ 21

Gaussian copula marginal regression ● copulas conveniently separate marginals (regression model) from the dependence component ● possible marginals: GLMs, skew-normal models, Weibull regression, beta regression, zero-inflated models, . . .

● accommodate various forms of dependence through suitable choice of the copula correlation matrix:

� longitudinal data: ARMA(p,q), exchangeable, unstructured � time series: ARMA(p,q)

� spatial data: Matérn correlation matrix

● Gaussian copula attractive because inherits several well-known properties of multivariate normal ● special case: multivariate probit regression

4/ 21

Likelihood analysis ● continuous case: likelihood (nicely) in closed-form

● noncontinuous case: likelihood is awkward n-dimensional normal integral � hard to handle if n is not (very) small

� generalization of the multivariate probit likelihood � many methods for fitting multivariate probit

� gold standard (?): Geweke-Hajivassiliou-Keane simulator � efficient and relatively easy to program � implemented in gcmr

5/ 21

Package gcmr

6/ 21

Package gcmr > library(gcmr) > args(gcmr) function(formula, data, subset, offset, contrasts = NULL, marginal, cormat, start, fixed, options = gcmr.options()) nonstandard arguments:

● marginal marginal model

● cormat Gaussian copula correlation matrix ● start optional starting values

● fixed parameters fixed to a priori values

● options various options, such as fix the pseudorandom seed, number of Monte Carlo replications, choice of the optimizator (default is nlminb), . . . 7/ 21

Package gcmr: model specification marginal class marginal models gs.marg(link=linear) bn.marg(link=logit) ps.marg(link=log) nb.marg(link=log) sn.marg(link=linear)

Gaussian binomial Poisson negative binomial skew-normal Azzalini (2005)

cormat class correlation matrices ind.cormat() arma.cormat(p,q) cluster.cormat(id, type) matern.cormat(D, k)

working independence ARMA(p,q) cluster/longitudinal type={AR(1), MA(1), exch, unstr} Matérn spatial correlation D distance, k smoothness parameter

user can specify more marginals and correlation matrices 8/ 21

Scottish lip cancer ● incidence of male lip cancer in Scotland during 1975-1980 ● response: observed cases Yi in each county of Scotland (n = 56), also available expected cases Ei

● question: excess of cases associated with proportion of population employed in agriculture, fishing or forestry (AFF)? 0.24

1200

0.21 0.19

1100

0.13

0.16

1000 900

0.11 0.08

800

0.027

700 600

0

100

200

300

400

500

standardized morbidity ratio Y/ E

0

0

600

0.71

1.4

700

2.1

800

2.9

900

3.6

4.3

1000

5

1100

5.7

1200

6.4

● need model for spatially correlated counts

0

100

200

300

400

500

AFF 9/ 21

Scottish lip cancer

● standard non-spatial analysis of these data:

� observed cases Yi negative binomial with mean

µi = Ei exp(β0 + β1 AFF + β2 latitude)

� scale parameter κ

● complement this independence model with spatial variability: � Gaussian copula with Matérn correlation matrix

10/ 21

Scottish lip cancer > > > + > + + + + >

library(xtable); library(gcmr) data(scotland) D.scot for (i in 1:6) { + qqnorm(residuals(m)) + qqline(residuals(m)) + } ●

●

●

●

1

●●●● ●● ● ●● ●●● ●● ●

● ●

1

2

●

−2

−1

2

●

0

1

2

2

●

●

●

● ●●●● ●● ●●●● ●●● ● ●●● ●● ●● ●●●●

●

0

1

● ●● ● ●●● ●●● ●●● ●●● ●●● ● ● ●●● ●● ●● ● ● ●● ●● ●●● ● ●●● ●●● ● ●●

−1

−1

●

● ●

●

●● ●

●

●

−2

●

●

−3 −3

−1

−2

●

0

1

● ●●● ● ● ●● ●●● ●● ● ●● ●●●●●

●

●

● ●

−2

●

−2

●

●

Sample Quantiles

●

●● ●●● ●●● ●● ●

Theoretical Quantiles

2

2 1 0

●●

●● ●● ●●●● ●● ●●● ● ●● ●● ●● ●●● ● ● ●●

−1

1


●

●● ● ●● ● ●●● ●● ●●● ●●●●● ●

0

Sample Quantiles

0

●

●●● ●● ● ●● ●● ●● ●●●

●

−1

● ●

−2

−2

−2

●● ●● ●●● ●● ●● ●●● ●● ●●● ●●●● ●● ● ●● ●● ● ●● ●● ●●● ● ●●

●

●


Sample Quantiles

● ●●● ●

●

●

−2

−2

●

●

●

●

●

●

0

0

●

●

Sample Quantiles

●● ●● ●●● ●● ●●

−1

Sample Quantiles

0

● ●●● ●●

−1

● ●

1

●

● ●●●● ●●● ● ●● ●● ●● ●● ● ●●● ●● ●● ●● ● ●●●

−1

Sample Quantiles

1

●●● ●●

−1

0

1


2

●

−2

−1

0

1


2

−2

−1

0

1

2


15/ 21

Ohio children wheeze data ● package geepack: the dataset is a subset of the six-city study, a longitudinal study of the health effects of air pollution. ● 537 children, four annual measurements each ● variables: resp wheeze status (1=yes, 0=no) id children id age children’ age smoke indicator of maternal smoking ● standard analysis: logistic GEEs ● regressors: age, smoke and interaction

16/ 21

> > > + + + + > + + +

library(geepack) data(ohio) m data(polio) > fit xtable(cbind(coef = coef(fit), + se.hessian = se(fit, "hessian"), + se.sandwich = se(fit, "sandwich"), + se.hac = se(fit, "hac"))) (Intercept) trend cos(2πt�12) sin(2πt�12) cos(2πt�6) sin(2πt�6) κ ar1 ar2 ma1

coef 0.21 -4.29 -0.12 -0.50 0.19 -0.40 0.57 -0.51 0.31 0.68

se.hessian 0.12 2.30 0.15 0.16 0.13 0.13 0.17 0.23 0.09 0.24

se.sandwich 0.12 2.32 0.15 0.16 0.13 0.13 0.17 0.26 0.09 0.27

se.hac 0.13 2.56 0.15 0.19 0.13 0.13 0.17 0.27 0.09 0.28 19/ 21

Polio time series

1.0 0.8 0.6 0.4

2 1 0 −1 −2 −3

●

0.2

● ● ● ●● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ●

0.0

3

> qqnorm(residuals(fit)) > qqline(residuals(fit)) > acf(residuals(fit))

●

−2

−1

0

1

2

0

5

10

15

20

20/ 21

Conclusions ● Gaussian copula marginal regression: flexible framework for modelling dependence ● can be used to extend many regression models for independent data already available in other R packages

● future (!): more models, in particular zero-inflated responses, ordinal and multinomial responses ● future (?): pairwise likelihood for high-dimensional problems

● future (??): other methods for maximum simulated likelihood (MCMC?), MCEM

● some theory and computational details can be found in Masarotto, G. and Varin, C. (2011). Gaussian copula marginal regression (preprint available upon request) ● future theoretical work: robustness to copula misspecification

21/ 21

Gaussian copula regression using R

Gaussian copula regression using R

Suggest Documents

Probabilistic Gaussian Copula Regression model for ...

Gaussian copula marginal regression - Project Euclid

Gaussian Copula Regression Application - m-hikari

Deep Gaussian Processes for Regression using Approximate ...

Direct Gaussian Process Quantile Regression using ... - arXiv

Regression Analysis Using R

Copula Gaussian Graphical Models - University of Washington

Local Gaussian Regression

Gaussian Copula Variational Autoencoders for Mixed Data

Semiparametric Copula Quantile Regression for Complete or ...

Getting Started in Linear Regression using R

Gaussian Cox process regression

Heteroscedastic Gaussian Process Regression - CiteSeerX

Stock Market Prediction using Twin Gaussian Process Regression

Using Gaussian Process Regression for Efficient Motion Planning in ...

Nonstationary Gaussian Process Regression using Point Estimates of ...

State Estimation using Gaussian Process Regression for ... - SMARTech

Bayesian Regression and Classification Using Mixtures of Gaussian ...

Gaussian copula distributions for mixed data, with ...

Copula Gaussian Graphical Models with Hidden Variables - MIRLab

Multi-task Sparse Structure Learning with Gaussian Copula Models

On the Correlation Structure of Gaussian Copula Models for ...

Pair-copula constructions for non-Gaussian Bayesian ... - mediaTUM

Logistic Regression - CRAN-R