Adding Statistical Functionality to the DATA Step with ...

Adding Statistical Functionality to the DATA Step with PROC FCMP
Stacey Christian and Jacques Rioux
SAS Institute Inc., Cary, NC

Paper 326-2010

Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Introduction/Motivation
• Ever want to call a SAS procedure from the DATA step?
• Ever want to encapsulate a complicated analytical algorithm in a reusable function?
• This talk demonstrates how to add statistical functionality to the DATA step by defining FCMP function wrappers.


Overview
• RUN_MACRO function in FCMP
• Recursive Technique
• Iterative Technique/The Simulation
• Meta Programming with FCMP


RUN_MACRO Function in FCMP
• Executes a predefined SAS macro
• Syntax: rc = run_macro('macro_name', var_1, var_2, …);
    • rc: return code
    • macro_name: name of the SAS macro to run
    • var_N: variables to pass to/from the macro


See Macro Run

    /* Create a macro called subtract_macro */
    %macro subtract_macro;
       %let difference = %sysevalf(&a - &b);
    %mend subtract_macro;

    /* Use subtract_macro within a function */
    proc fcmp outlib = sasuser.ds.functions;
       function subtract(a, b);
          rc = run_macro('subtract_macro', a, b, difference);
          if rc eq 0 then return(difference);
          else return(.);
       endsub;

       /* test the call */
       a = 5.3;
       b = 0.7;
       diff = subtract(a, b);
       put diff=;
    run;

    diff=4.6

See Macro Run in DATA Step

    options cmplib = (sasuser.ds);
    data _null_;
       a = 5.3;
       b = 0.7;
       diff = subtract(a, b);
       put diff=;
    run;

diff=4.6


Recursive Technique: Segmenting Time Series Data
• "Segmenting Time Series: A Survey and Novel Approach," Keogh, Eamonn, et al.
• Reduce extremely large time series data sets
• Piecewise linear approximations
• Top-down recursive algorithm


Top Down Algorithm

    SegmentTopDown ( currentSegment ) {
       error = run_linear_approximation( currentSegment );
       /* leftSegment and rightSegment are the two halves of currentSegment */
       leftError = run_linear_approximation( leftSegment );
       rightError = run_linear_approximation( rightSegment );
       combinedError = leftError + rightError;
       if (combinedError < error) then {
          call SegmentTopDown( leftSegment );
          call SegmentTopDown( rightSegment );
       }
       else {
          keep_segment( currentSegment );
       }
    }


Top Down Subroutine

    subroutine segment_topdown(data $, segdata $, var $, start, end, threshold);
       /* fit one line to the whole segment, then to each half */
       error = linear_approximation(data, var, start, end);
       mid = start + floor((end - start) / 2);
       left_error = linear_approximation(data, var, start, mid);
       right_error = linear_approximation(data, var, mid + 1, end);
       /* relative reduction in error gained by splitting */
       improvement = (error - (left_error + right_error)) / error;
       if (improvement > threshold) then do;
          call segment_topdown(data, segdata, var, start, mid, threshold);
          call segment_topdown(data, segdata, var, mid + 1, end, threshold);
       end;
       else do;
          call append_segment(segdata, start, end, error);
       end;
    endsub;
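The recursion above can also be traced outside SAS. The following is a minimal Python sketch, not the authors' code: it replaces the PROC REG call with a hand-written least-squares line fit, and the helper name `sse_of_line_fit`, the minimum-length guard, and the sample series are assumptions made for illustration only.

```python
def sse_of_line_fit(values, start, end):
    """Sum of squared errors of a least-squares line through values[start..end] (inclusive)."""
    ys = values[start:end + 1]
    n = len(ys)
    xs = range(n)  # plays the role of the _TREND_ counter
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def segment_topdown(values, start, end, threshold, segments):
    """Split [start, end] recursively while splitting improves the fit enough."""
    error = sse_of_line_fit(values, start, end)
    # Assumed guard (not in the slides): stop on very short or already exact segments.
    if end - start < 3 or error <= 1e-12:
        segments.append((start, end, error))
        return
    mid = start + (end - start) // 2
    left_error = sse_of_line_fit(values, start, mid)
    right_error = sse_of_line_fit(values, mid + 1, end)
    improvement = (error - (left_error + right_error)) / error
    if improvement > threshold:
        segment_topdown(values, start, mid, threshold, segments)
        segment_topdown(values, mid + 1, end, threshold, segments)
    else:
        segments.append((start, end, error))


# Hypothetical series: rises linearly, then falls linearly.
series = [0, 1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1, 0]
segments = []
segment_topdown(series, 0, len(series) - 1, 0.2, segments)
print(segments)
```

A single line fits this series poorly, while each half is exactly linear, so the sketch keeps the two halves as separate segments.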


Linear Approximation Function

    function linear_approximation(ds_in $, var $, first_obs, last_obs);
       rc = run_macro('linear_approximation_macro', ds_in, first_obs, last_obs, var, error);
       return(error);
    endsub;


Linear Approximation Macro

    %macro linear_approximation_macro;
       /* subset the requested observations and add a trend counter */
       data _TEMP_;
          set &ds_in(firstobs=&first_obs obs=&last_obs);
          retain _TREND_ 0;
          _TREND_ = _TREND_ + 1;
       run;
       /* regress the variable on the trend and save the error sum of squares */
       proc reg data=_TEMP_ outest=_EST_ noprint;
          model &var = _TREND_ / sse;
       run;
       quit;
       proc sql noprint;
          select _SSE_ into :ERROR from _est_;
       quit;
    %mend linear_approximation_macro;


Recursive Technique: Results

    data _NULL_;
       call segment_topdown("sasuser.snp", "work.segds_20", "close", 1, 15116, 0.2);
       call segment_topdown("sasuser.snp", "work.segds_15", "close", 1, 15116, 0.15);
    run;


Recursive Technique: Graphic Results

42 Piecewise Linear Segments

Recursive Technique: Graphic Results

113 Piecewise Linear Segments

Iterative Technique
• "Minimum Quadratic Distance Estimation for the Proportional Hazards Regression Model with Grouped Data," Jacques Rioux and Andrew Luong
• Survival models/proportional hazards model
• PROC PHREG (maximum likelihood) versus minimum distance methods
• Iteratively reweighted least squares algorithm


Iteratively Reweighted Least Squares Algorithm

    initialize_weights( weights );
    params1 = run_regression( weights );
    maxRelativeDifference = infinity;
    while (maxRelativeDifference > criteria) {
       update_weights( weights );
       params2 = run_regression( weights );
       maxRelativeDifference = max( abs(params2 - params1) / abs(params1) );
       params1 = params2;
    }
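To make the loop structure concrete, here is a small Python sketch of iteratively reweighted least squares. It is only an illustration under stated assumptions: the weight update shown (inverse absolute residuals, which approximates a least-absolute-deviations fit) is a generic stand-in and not the weighting used for the proportional hazards model, the model is a one-parameter regression without an intercept, and the data values are hypothetical.

```python
def weighted_slope(xs, ys, weights):
    """Weighted least-squares slope for the no-intercept model y = b * x."""
    num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    den = sum(w * x * x for w, x in zip(weights, xs))
    return num / den


def fit_irls(xs, ys, criterion=1e-4, max_iter=100):
    """Repeat weighted regressions until the estimate stops changing."""
    weights = [1.0] * len(xs)                  # initialize_weights
    params1 = weighted_slope(xs, ys, weights)  # first regression
    for _ in range(max_iter):
        # update_weights (assumed rule): downweight large residuals.
        weights = [1.0 / max(abs(y - params1 * x), 1e-6) for x, y in zip(xs, ys)]
        params2 = weighted_slope(xs, ys, weights)
        max_relative_difference = abs(params2 - params1) / max(abs(params1), 1e-12)
        params1 = params2
        if max_relative_difference <= criterion:
            break
    return params1


# Hypothetical data: y = 2x except for one outlier in the last observation.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 30]
ols_slope = weighted_slope(xs, ys, [1.0] * len(xs))
irls_slope = fit_irls(xs, ys)
print(ols_slope, irls_slope)
```

With equal weights the outlier pulls the slope well above 2; the reweighted iterations move the estimate back toward 2, and the loop stops once successive estimates differ by less than the criterion in relative terms.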


Iterative Technique: DATA Step code

    subroutine fit_ph_model(indata $, parmData $, depVars $, weightVars $, indepVars $);
       array params1[3];
       array params2[3];
       call prepare_phdata(indata, "_prepdata_");
       call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params1);

       maxRelativeDifference = 1;
       do while (maxRelativeDifference > 0.0001);
          call update_weights("_prepdata_", weightVars, parmData);
          call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params2);
          maxRelativeDifference = calc_max_relative_diff(params1, params2);
          /* carry the latest estimates forward, as in the algorithm outline */
          do i = 1 to dim(params1);
             params1[i] = params2[i];
          end;
       end;
    endsub;


Run_Regression Subroutine

    subroutine run_regression(data $, dependent $, independent $, weight $, parmData $, parmArray[*]);
       outargs parmArray;
       array tmpArray[1] _temporary_;
       /* run the regression, then read the estimates back into an array */
       rc = RUN_MACRO('run_regression_macro', data, parmData, dependent, independent, weight);
       rc = read_array(parmData, tmpArray);
       do i = 1 to dim(parmArray);
          parmArray[i] = tmpArray[1,i];
       end;
    endsub;


Run_Regression Macro

    %macro run_regression_macro;
       proc reg data=&data outest=&parmData NOPRINT;
          model &dependent = &independent / noint;
          weight &weight;
       quit;
       /* keep only the parameter estimates */
       data &parmData;
          set &parmData;
          keep &independent;
       run;
    %mend run_regression_macro;


The True Glory of Reusable Functions: The Simulation
• We now have a "fitting routine" for the proportional hazards model (fit_ph_model)
• Create a function to generate PH data (called generate_ph_data)
• Create a function to append fits to a results data set (called append_ph_data)


The Simulation Study

    proc fcmp;
       do i = 1 to 1000;
          call simulate_ph_data("work.simdata");
          call fit_ph_model("work.simdata", "work.params", "log_log_Pij", "Weight", "x1 x2 x3");
          call append_data("work.simresults", "work.params");
       end;
    run;
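The simulate, fit, append pattern of the loop above can be sketched in a few lines of Python. This is an illustration only: a simple no-intercept linear model stands in for the proportional hazards fit, and the function names, true slope, sample size, noise level, and seed are all hypothetical choices rather than values from the study.

```python
import random
import statistics


def simulate_data(rng, true_slope=0.1, n=50, noise_sd=0.1):
    """Generate one hypothetical data set from y = true_slope * x + noise."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model (stand-in fitting routine)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def run_simulation(replications=1000, seed=326):
    """Simulate, fit, and collect the estimates, then summarize them."""
    rng = random.Random(seed)
    results = []
    for _ in range(replications):
        xs, ys = simulate_data(rng)
        results.append(fit_slope(xs, ys))  # plays the role of the append step
    return statistics.mean(results), statistics.stdev(results)


mean_estimate, stdev_estimate = run_simulation()
print(mean_estimate, stdev_estimate)
```

The two summary numbers correspond to the Mean and StDev columns that a simulation study reports for each coefficient, here for a single slope.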


Simulation Results

Coefficient   Real Value   Mean       StDev
X1            0.1          0.102454   0.036917
X2            0.3          0.307029   0.050375
X3            0.2          0.205464   0.017793


Simulation Graphs


Meta Programming
• Create your own scoring function dynamically from a fitted model

    subroutine create_score(data $, dependent $, independent $, scoreFunc $, library $);
       paramds = "work.params";
       rc = RUN_MACRO('run_regression_macro', data, paramds, dependent, independent);
       rc = RUN_MACRO('create_score_func_macro', paramds, independent, scoreFunc, library);
    endsub;


Score Function Macro

    %macro create_score_func_macro;
       /* one row per coefficient: _NAME_ holds the variable, col1 the estimate */
       proc transpose data=&paramds out=&paramds._t;
          var &independent;
       run;
       /* build "var * estimate + ..." and the argument list as macro variables */
       proc sql noprint;
          select trim(_NAME_) || " * " || strip(put(col1, BEST12.))
             into :theScore separated by " + "
             from &paramds._t;
          select trim(_NAME_)
             into :theArgs separated by " , "
             from &paramds._t;
       quit;
       data _NULL_;
          set &paramds;
          call symputX("Intercept", intercept);
       run;


Score Function Macro - continued

       /* write the generated scoring function into the requested library */
       proc fcmp outlib=&library..score;
          function &scoreFunc(&theArgs);
             return(&Intercept + &theScore);
          endsub;
       quit;
    %mend create_score_func_macro;
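The same code-generation idea, turning fitted coefficients into the source text of a new function and then compiling it, can be sketched in Python. This is an analogy rather than a translation of the macro: `create_score_function` is a hypothetical helper, and the intercept and coefficient values below are made up for illustration, not estimates from any fitted model.

```python
def create_score_function(name, intercept, coefficients):
    """Generate and compile a scoring function from fitted coefficients.

    coefficients maps variable names to estimates; its order sets the argument order.
    """
    if not name.isidentifier() or not coefficients:
        raise ValueError("need a valid function name and at least one coefficient")
    for var in coefficients:
        if not var.isidentifier():
            raise ValueError(f"invalid variable name: {var!r}")
    # Equivalent of the theArgs and theScore macro variables.
    args = ", ".join(coefficients)
    terms = " + ".join(f"{var} * {coef!r}" for var, coef in coefficients.items())
    source = f"def {name}({args}):\n    return {intercept!r} + {terms}\n"
    namespace = {}
    exec(source, namespace)  # compile the generated source text
    return namespace[name], source


# Hypothetical estimates, for illustration only.
pred_lwage, generated_source = create_score_function(
    "pred_lwage", -0.4, {"educ": 0.1, "exper": 0.02}
)
print(generated_source)
print(pred_lwage(15, 5))
```

As in the macro, the model is fitted once and the resulting function carries the estimates as constants, so scoring a new observation needs only the input values.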


Run Create Score Function

    data _NULL_;
       call create_score("work.mroz", "lwage", "educ exper age kidslt6 kidsge6", "PredLWage_Full", "sasuser.score");
       call create_score("work.mroz", "lwage", "educ exper age", "PredLWage_NoKids", "sasuser.score");
    run;

    data _NULL_;
       educ = 15;
       exper = 5;
       age = 30;
       kidslt6 = 2;
       kidsge6 = 1;
       PredWage_Full = exp(PredLWage_Full(educ, exper, age, kidslt6, kidsge6));
       put PredWage_Full=;
       PredWage_NoKids = exp(PredLWage_NoKids(educ, exper, age));
       put PredWage_NoKids=;
    run;

    PredWage_Full=3.4199679212
    PredWage_NoKids=3.787216653

Conclusions
• Users can encapsulate preexisting analytical procedures as building blocks for even larger, more complex statistical analysis methods.
• PROC FCMP provides the vehicle for writing reusable, independent program units (functions and subroutines).
• These units can be written and tested independently.


Where to find more information
• http://support.sas.com/saspresents
• Paper in PDF form
• Zip file containing all source code

