Replicate-Based Variance Estimation in a SAS® Macro

Julia L. Bienias, Rush-Presbyterian-St. Luke's Medical Center, Chicago, IL

ABSTRACT

I describe a program to compute design-based variances (i.e., variances adjusted for the sample design) for linear regression models, logistic regression models, and mixed models for a 2-PSUs-per-stratum complex sampling design. The program allows the user to choose balanced repeated replication or jackknife repeated replication for computing the variances. Although there are commercial packages on the market now that will compute variances using these and other methods (e.g., SUDAAN®, WesVar®), I wanted to create a program that was fully integrated into SAS, so that the user could take full advantage of the SAS System. The program is a macro that allows the user to specify the input data set, the full-sample weight, the replicates (an auxiliary program can be used to create the replicates), the independent and dependent variables, the variance estimation method, and other key variables. The program uses the SAS procedures LOGISTIC, REG, GLM, and MIXED, and takes advantage of the new Output Delivery System. In addition to customized output, the program creates output data sets that can be used for graphing, diagnostics, etc. In my applications it has run reasonably fast.

INTRODUCTION and BACKGROUND

More and more research organizations are using complex sampling designs to identify a sample of persons or other units of analysis. A “complex” design is defined as one in which the units selected for study are chosen with unequal probabilities of selection. (This is in contrast to a “typical” study sample in which the units are chosen with simple random sampling.) Government surveys have long used complex designs. An example of such a design is the oft-copied household survey design used by the Bureau of the Census in the household surveys they conduct for themselves and for other government agencies (e.g., in the Current Population Survey, the National Crime Victimization Survey, the National Health Interview Survey). The basic design is based on “drilling down” through nested levels of geography: First counties (or their equivalents) are selected, then smaller geographic units within counties, down to blocks of households and households themselves (U. S. Bureau of the Census, 2000).

As a consequence of the growing use of complex sampling designs, particularly in research settings in which the data are used for modeling, there is a growing need for software that produces proper estimates under the conditions of unequal probabilities of selection. As mentioned earlier, there are commercially available packages, such as SUDAAN and WesVar (see Morganstein & Brick, 1996), that meet this need. In addition, SAS' new SURVEYREG procedure uses the method of Taylor series expansion to produce design-adjusted estimates for linear regression models (see SAS Institute Inc., 2000).

In this paper I describe a SAS macro program that will produce estimates and standard errors that properly account for the sampling design for a two-PSUs-per-stratum complex design, one of the most common, for three models: linear regression, logistic regression, and random coefficient mixed models. The design is described in more detail in the next section, followed by a summary of the program.

DESIGN AND VARIANCE ESTIMATION APPROACH

Types of Estimates

There are two general types of estimates that are desired by survey researchers and survey data users. The first is simple estimates, such as estimated population means and totals and estimates of change over time in these quantities. The second is parameter estimates from models. In models (e.g., linear regression models, logistic regression models), one is interested in getting correct estimates of the association between some set of predictor variables and an outcome variable(s). The program described in this paper produces estimates of regression coefficients and variances for linear regression, logistic regression (with binary or polychotomous outcomes), and random coefficient mixed models.

Variance Estimation Paradigm

There is some debate about under what circumstances the sampling design should be taken into account when estimating quantities of interest. For example, if one were interested in estimating the parameter of a model associating a set of predictor variables and an outcome variable, if the model is correct (i.e., true), the parameters can be estimated properly from unweighted data without any special adjustments made for the design (e.g., see Särndal, Swensson, & Wretman, 1992). This type of approach is generally referred to as “model-based” and there are many variations on it.

Alternatively, one can take a “design-based” approach, which is the one I am taking in this paper and in the program. This is the same approach used for estimation and variance estimation in the large government surveys, as well as by many researchers. Under this paradigm, estimates that are properly adjusted for the sample design will be unbiased for the true parameters in the sampling frame. Further, they will be unbiased for the target population to the extent that the frame is an unbiased representation of the target population with respect to the quantities that are being estimated.

Under this paradigm, it is important that the estimates account for the unequal probabilities of selection and that the variance estimation procedure reflect the design. For point estimates, this means incorporating the sample weights. Loosely speaking, the weight given to a particular unit is the number of units that unit's data “represent” in the target population. In practice, these weights might be simply the inverse of the probabilities of selection, or they might also incorporate non-response, post-stratification, or other adjustments.
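
As a rough illustration of how sample weights enter a point estimate, the following Python sketch computes base weights as inverse selection probabilities and uses them in a weighted mean. It is not part of the SAS macro: the function names (base_weights, weighted_mean), the selection probabilities, and the data values are all hypothetical, and no non-response or post-stratification adjustment is applied.

```python
def base_weights(selection_probs):
    """Base weight of each unit: the inverse of its probability of selection."""
    return [1.0 / p for p in selection_probs]


def weighted_mean(values, weights):
    """Weighted estimate of a population mean: sum(w * y) / sum(w)."""
    total_weight = sum(weights)
    return sum(w * y for w, y in zip(weights, values)) / total_weight


# Hypothetical sample of four units with unequal selection probabilities.
probs = [0.5, 0.25, 0.1, 0.1]
y = [10.0, 20.0, 30.0, 40.0]

w = base_weights(probs)  # each unit "represents" 2, 4, 10, and 10 units

print("weights:", w)
print("weighted mean:", weighted_mean(y, w))   # about 30.77
print("unweighted mean:", sum(y) / len(y))     # 25.0
```

In this made-up sample the units with small selection probabilities have larger outcomes, so ignoring the weights understates the mean relative to the weighted estimate.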

The Two-PSUs-per-Stratum Design

The first stage of selection in any complex design is referred to as selecting the “primary sampling units,” or PSUs. PSUs are selected either with certainty or with some probability ...
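
The jackknife option for a two-PSUs-per-stratum design can be illustrated with a small sketch. The Python code below is not the %repl_var macro. It assumes the common textbook form of the method: one replicate per stratum, formed by setting the weight of one PSU in that stratum to zero and doubling the weight of the other, with the variance taken as the sum of squared deviations of the replicate estimates from the full-sample estimate. The field names stratum, psu, and repwt, the choice to always drop PSU 1, the data, and the use of a weighted mean in place of a regression coefficient are assumptions made for illustration; fullwt follows the weight variable name shown in the sample output.

```python
import math


def weighted_mean(rows, weight_key):
    """Weighted mean of y using the weight stored under weight_key."""
    total_w = sum(r[weight_key] for r in rows)
    return sum(r[weight_key] * r["y"] for r in rows) / total_w


def jk2_replicate_weights(rows, full_weight="fullwt"):
    """Add one replicate weight per stratum to each row (in place).

    For the replicate belonging to stratum h, PSU 1 of stratum h gets
    weight 0, PSU 2 of stratum h gets twice its full-sample weight, and
    all other strata keep their full-sample weights.
    """
    names = []
    for h in sorted({r["stratum"] for r in rows}):
        name = f"repwt{h}"
        names.append(name)
        for r in rows:
            if r["stratum"] != h:
                r[name] = r[full_weight]
            elif r["psu"] == 1:
                r[name] = 0.0
            else:
                r[name] = 2.0 * r[full_weight]
    return names


def jk2_variance(rows, estimator, rep_names, full_weight="fullwt"):
    """Full-sample estimate and its jackknife (JK2) variance estimate."""
    theta = estimator(rows, full_weight)
    var = sum((estimator(rows, name) - theta) ** 2 for name in rep_names)
    return theta, var


# Hypothetical data: two strata, two PSUs per stratum, one unit per PSU.
rows = [
    {"stratum": 1, "psu": 1, "fullwt": 1.0, "y": 10.0},
    {"stratum": 1, "psu": 2, "fullwt": 1.0, "y": 14.0},
    {"stratum": 2, "psu": 1, "fullwt": 1.0, "y": 20.0},
    {"stratum": 2, "psu": 2, "fullwt": 1.0, "y": 24.0},
]

rep_names = jk2_replicate_weights(rows)
theta, var = jk2_variance(rows, weighted_mean, rep_names)
print("estimate:", theta)
print("JK2 variance:", var)
print("JK2 standard error:", math.sqrt(var))
```

The same pattern applies to model parameters: the model is refit once with each replicate weight, and the spread of the replicate estimates around the full-sample estimate replaces the model-based standard error.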

Figure 2. Sample output from %repl_var for proc=mixed with method=jk2.

Random Coefficients Model for Change in YVAR
Subset:
Variance Estimation based on Jackknife Repeated Replication
Full Sample Fit -- *** Ignore Fixed Effect Standard Errors ***

The Mixed Procedure

Model Information

  Data Set                     WORK._NEWDAT
  Dependent Variable           yvar
  Weight Variable              fullwt
  Covariance Structure         Unstructured
  Subject Effect               id
  Estimation Method            REML
  Residual Variance Method     Profile
  Fixed Effects SE Method      Model-Based
  Degrees of Freedom Method    Between-Within

Dimensions

  Covariance Parameters           4
  Columns in X                    3
  Columns in Z Per Subject        2
  Subjects                      483
  Max Obs Per Subject             3
  Observations Used            1437
  Observations Not Used           0
  Total Observations           1437

Iteration History

  Iteration   Evaluations   -2 Res Log Like    Criterion
      0            1         3898.31593541
      1            4         3118.62194555     .
      2            1         2961.99269268     0.47087927
      3            1         2861.62713561     0.33901490
      4            1         2813.16623392     0.14424044
      5            1         2797.76212210     0.02560024
      6            1         2795.45757519     0.00093815
      7            1         2795.38064882     0.00000144
      8            1         2795.38053324     0.00000000

  Convergence criteria met.

Estimated G Matrix

  Row   Effect      ID Variable   Col1      Col2
   1    Intercept        1        0.7181    0.03729
   2    LAG              1        0.03729   0.01778

Covariance Parameter Estimates

  Cov Parm   Subject   Ratio    Estimate   Standard Error   Z Value   Pr Z
  UN(1,1)    id        0.2603   0.7181     0.05341          13.45
  UN(2,1)
  UN(2,2)
  Residual
