SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
Adding Statistical Functionality to the DATA Step with PROC FCMP
Stacey Christian and Jacques Rioux
SAS Institute Inc., Cary, NC
Paper 326-2010
Copyright © 2010, SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Introduction/Motivation
Ever want to call a SAS procedure from the DATA step? Ever want to encapsulate a complicated analytical algorithm in a reusable function? This talk demonstrates how to add statistical functionality to the DATA step through the definition of FCMP function wrappers.
Overview
• RUN_MACRO function in FCMP
• Recursive Technique
• Iterative Technique/The Simulation
• Meta Programming with FCMP
RUN_MACRO Function in FCMP
Executes a predefined SAS macro.
Syntax: rc = run_macro('macro_name', var_1, var_2, …);
• rc: return code
• macro_name: name of the SAS macro to run
• var_N: variables to pass to/from the macro
See Macro Run
/* Create a macro called subtract_macro */
%macro subtract_macro;
   %let difference = %sysevalf(&a - &b);
%mend subtract_macro;

/* Use subtract_macro within a function */
proc fcmp outlib = sasuser.ds.functions;
   function subtract(a, b);
      /* a and b are exposed to the macro as &a and &b; */
      /* &difference is copied back into difference when the macro ends */
      rc = run_macro('subtract_macro', a, b, difference);
      if rc eq 0 then return(difference);
      else return(.);
   endsub;

   /* test the call */
   a = 5.3;
   b = 0.7;
   diff = subtract(a, b);
   put diff=;
run;

diff=4.6
See Macro Run in DATA Step
options cmplib = (sasuser.ds);
data _null_;
   a = 5.3;
   b = 0.7;
   diff = subtract(a, b);
   put diff=;
run;

diff=4.6
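The same pattern extends to character arguments. The sketch below is illustrative only: count_obs and count_obs_macro are hypothetical names that do not appear in the talk. It relies on the behavior described in the SAS documentation for RUN_MACRO, where character values reach the macro wrapped in quotation marks and are unwrapped with DEQUOTE.

```sas
/* Hypothetical macro: counts the rows of the data set named in &ds */
%macro count_obs_macro;
   /* character values arrive quoted; strip the quotation marks */
   %let ds = %sysfunc(dequote(&ds));
   proc sql noprint;
      select count(*) into :nobs from &ds;
   quit;
%mend count_obs_macro;

proc fcmp outlib = sasuser.ds.functions;
   function count_obs(ds $);
      nobs = .;
      /* &nobs is copied back into nobs when the macro ends */
      rc = run_macro('count_obs_macro', ds, nobs);
      if rc eq 0 then return(nobs);
      else return(.);
   endsub;
run;

options cmplib = (sasuser.ds);
data _null_;
   n = count_obs("sashelp.class");
   put n=;
run;
```

Returning a missing value when rc is nonzero mirrors the subtract example, so callers can detect a failed macro run.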
Recursive Technique: Segmenting Time Series Data
"Segmenting Time Series: A Survey and Novel Approach", Keogh, Eamonn, et al.
• reduce extremely large time series data sets
• piecewise linear approximations
• top-down recursive algorithm
Top Down Algorithm
SegmentTopDown ( currentSegment ) {
   error = run_linear_approximation( currentSegment );
   /* leftSegment and rightSegment are the two halves of currentSegment */
   leftError = run_linear_approximation( leftSegment );
   rightError = run_linear_approximation( rightSegment );
   combinedError = leftError + rightError;
   if (combinedError < error) then {
      call SegmentTopDown( leftSegment );
      call SegmentTopDown( rightSegment );
   } else {
      keep_segment( currentSegment );
   }
}
Top Down Subroutine
subroutine segment_topdown(data $, segdata $, var $, start, end, threshold);
   /* error of a single line fitted to the whole segment */
   error = linear_approximation(data, var, start, end);
   /* split at the midpoint and fit each half separately */
   mid = start + floor((end - start) / 2);
   left_error = linear_approximation(data, var, start, mid);
   right_error = linear_approximation(data, var, mid + 1, end);
   /* relative reduction in error gained by splitting */
   improvement = (error - (left_error + right_error)) / error;
   if (improvement > threshold) then do;
      call segment_topdown(data, segdata, var, start, mid, threshold);
      call segment_topdown(data, segdata, var, mid + 1, end, threshold);
   end;
   else do;
      call append_segment(segdata, start, end, error);
   end;
endsub;
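The slides do not show append_segment. One way it could be written with the same RUN_MACRO wrapper pattern is sketched below; the macro name append_segment_macro, the work data set _SEG_, and the column names first_obs, last_obs, and sse are all assumptions made for illustration.

```sas
/* Hypothetical macro: appends one (first_obs, last_obs, sse) row to &segdata */
%macro append_segment_macro;
   /* character values arrive quoted; strip the quotation marks */
   %let segdata = %sysfunc(dequote(&segdata));
   data _SEG_;
      first_obs = &first_obs;
      last_obs  = &last_obs;
      sse       = &sse;
   run;
   /* PROC APPEND creates the BASE= data set if it does not exist yet */
   proc append base=&segdata data=_SEG_;
   run;
%mend append_segment_macro;

proc fcmp outlib = sasuser.ds.functions;
   subroutine append_segment(segdata $, first_obs, last_obs, sse);
      rc = run_macro('append_segment_macro', segdata, first_obs, last_obs, sse);
   endsub;
run;
```

Because PROC APPEND adds to an existing data set, the output data set would need to be deleted between runs to avoid mixing segments from different thresholds.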
Linear Approximation Function
function linear_approximation(ds_in $, var $, first_obs, last_obs);
   /* the macro fits the regression and returns its SSE in &error */
   rc = run_macro('linear_approximation_macro', ds_in, first_obs, last_obs, var, error);
   return(error);
endsub;
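Linear Approximation Macro
%macro linear_approximation_macro;
   /* RUN_MACRO passes character values in quotation marks; remove them */
   %let ds_in = %sysfunc(dequote(&ds_in));
   %let var = %sysfunc(dequote(&var));
   /* copy the segment and number its observations 1, 2, ... */
   data _TEMP_;
      set &ds_in(firstobs=&first_obs obs=&last_obs);
      retain _TREND_ 0;
      _TREND_ = _TREND_ + 1;
   run;
   /* fit a straight line; the SSE option writes _SSE_ to the OUTEST= data set */
   proc reg data=_TEMP_ outest=_EST_ noprint;
      model &var = _TREND_ / sse;
   run;
   quit;
   /* hand the error back to the calling function */
   proc sql noprint;
      select _SSE_ into :ERROR from _EST_;
   quit;
%mend linear_approximation_macro;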
Recursive Technique: Results
data _NULL_;
   call segment_topdown("sasuser.snp", "work.segds_20", "close", 1, 15116, 0.2);
   call segment_topdown("sasuser.snp", "work.segds_15", "close", 1, 15116, 0.15);
run;
Recursive Technique: Graphic Results
[Figure: 42 piecewise linear segments]
Recursive Technique: Graphic Results
[Figure: 113 piecewise linear segments]
Iterative Technique
• "Minimum Quadratic Distance Estimation for the Proportional Hazards Regression Model with Grouped Data", Jacques Rioux and Andrew Luong
• Survival models/proportional hazards model
• PROC PHREG (maximum likelihood) versus minimum distance methods
• Iteratively reweighted least squares algorithm
Iteratively Reweighted Least Squares Algorithm
initialize_weights( weights );
params1 = run_regression( weights );
maxRelativeDifference = infinity;
while (maxRelativeDifference > criteria) {
   update_weights( weights );
   params2 = run_regression( weights );
   maxRelativeDifference = max( abs( (params2 - params1) / params1 ) );
   params1 = params2;
}
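In symbols, the loop above can be sketched as follows. The responses y_i, covariate vectors x_i, weight function w_i, and tolerance ε are generic placeholders assumed for illustration; the slides do not give the specific weights used in the paper, and the relative-change stopping rule is one common reading of "maxRelativeDifference".

```latex
\hat{\beta}^{(k+1)} = \arg\min_{\beta} \sum_{i} w_i\!\left(\hat{\beta}^{(k)}\right)\left(y_i - x_i^{\top}\beta\right)^2,
\qquad
\text{stop when } \max_{j} \frac{\left|\hat{\beta}^{(k+1)}_j - \hat{\beta}^{(k)}_j\right|}{\left|\hat{\beta}^{(k)}_j\right|} \le \varepsilon .
```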
Iterative Technique: DATA Step code
subroutine fit_ph_model(indata $, parmData $, depVars $, weightVars $, indepVars $);
   array params1[3];
   array params2[3];
   call prepare_phdata(indata, "_prepdata_");
   call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params1);
   maxRelativeDifference = 1;
   do while (maxRelativeDifference > 0.0001);
      call update_weights("_prepdata_", weightVars, parmData);
      call run_regression("_prepdata_", depVars, indepVars, weightVars, parmData, params2);
      maxRelativeDifference = calc_max_relative_diff(params1, params2);
      /* carry the latest estimates into the next comparison */
      do i = 1 to dim(params1);
         params1[i] = params2[i];
      end;
   end;
endsub;
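The helper calc_max_relative_diff is called above but not shown in the slides. A minimal sketch of what such a function might look like follows; the argument names prev and curr and the small floor on the denominator are assumptions added for illustration, not taken from the paper.

```sas
proc fcmp outlib = sasuser.ds.functions;
   /* Hypothetical helper: largest relative change between two parameter vectors */
   function calc_max_relative_diff(prev[*], curr[*]);
      maxdiff = 0;
      do i = 1 to dim(prev);
         /* floor the denominator to avoid dividing by zero (an assumption) */
         denom = max(abs(prev[i]), 1e-12);
         maxdiff = max(maxdiff, abs(curr[i] - prev[i]) / denom);
      end;
      return(maxdiff);
   endsub;

   /* quick check with made-up values */
   array p1[3] (0.10 0.30 0.20);
   array p2[3] (0.11 0.30 0.19);
   d = calc_max_relative_diff(p1, p2);
   put d=;
run;
```

With these made-up inputs the largest relative change comes from the first element, which moves by about 10 percent.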
Run_Regression Subroutine
subroutine run_regression(data $, dependent $, independent $, weight $, parmData $, parmArray[*]);
   outargs parmArray;
   array tmpArray[1] _temporary_;
   /* fit the weighted regression; estimates are written to the parmData data set */
   rc = RUN_MACRO('run_regression_macro', data, parmData, dependent, independent, weight);
   /* load the one-row estimates data set into tmpArray */
   rc = read_array(parmData, tmpArray);
   do i = 1 to dim(parmArray);
      parmArray[i] = tmpArray[1, i];
   end;
endsub;
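Run_Regression Macro
%macro run_regression_macro;
   /* RUN_MACRO passes character values in quotation marks; remove them */
   %let data = %sysfunc(dequote(&data));
   %let parmData = %sysfunc(dequote(&parmData));
   %let dependent = %sysfunc(dequote(&dependent));
   %let independent = %sysfunc(dequote(&independent));
   %let weight = %sysfunc(dequote(&weight));
   proc reg data=&data outest=&parmData noprint;
      model &dependent = &independent / noint;
      weight &weight;
   run;
   quit;
   /* keep only the coefficient columns */
   data &parmData;
      set &parmData;
      keep &independent;
   run;
%mend run_regression_macro;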
The True Glory of Reusable Functions: The Simulation
• We now have a "fitting routine" for the proportional hazards model (fit_ph_model)
• Create a subroutine to generate PH data (called simulate_ph_data)
• Create a subroutine to append fits to a results data set (called append_data)
The Simulation Study
proc fcmp;
   do i = 1 to 1000;
      call simulate_ph_data("work.simdata");
      call fit_ph_model("work.simdata", "work.params", "log_log_Pij", "Weight", "x1 x2 x3");
      call append_data("work.simresults", "work.params");
   end;
run;
Simulation Results
Coefficient   Real Value   Mean       StDev
X1            0.1          0.102454   0.036917
X2            0.3          0.307029   0.050375
X3            0.2          0.205464   0.017793
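The Mean and StDev columns are the kind of summary that PROC MEANS produces. A minimal sketch, assuming work.simresults (the data set named in the simulation loop) holds one row per replication with the fitted coefficients in columns x1, x2, and x3; that layout is an assumption, since the slides do not show append_data.

```sas
/* Summarize the 1000 fitted coefficient vectors */
proc means data=work.simresults n mean std maxdec=6;
   var x1 x2 x3;
run;
```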
Simulation Graphs
Meta Programming
Create your own scoring function dynamically from a fitted model
subroutine create_score(data $, dependent $, independent $, scoreFunc $, library $);
   paramds = "work.params";
   rc = RUN_MACRO('run_regression_macro', data, paramds, dependent, independent);
   rc = RUN_MACRO('create_score_func_macro', paramds, independent, scoreFunc, library);
endsub;
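Score Function Macro
%macro create_score_func_macro;
   /* RUN_MACRO passes character values in quotation marks; remove them */
   %let paramds = %sysfunc(dequote(&paramds));
   %let independent = %sysfunc(dequote(&independent));
   %let scoreFunc = %sysfunc(dequote(&scoreFunc));
   %let library = %sysfunc(dequote(&library));
   /* one row per coefficient: _NAME_ = variable name, COL1 = estimate */
   proc transpose data=&paramds out=&paramds._t;
      var &independent;
   run;
   proc sql noprint;
      /* build the sum of terms "variable * coefficient" */
      select trim(_NAME_) || " * " || strip(put(col1, BEST12.))
         into :theScore separated by " + "
         from &paramds._t;
      /* build the comma-separated argument list */
      select trim(_NAME_)
         into :theArgs separated by " , "
         from &paramds._t;
   quit;
   data _NULL_;
      set &paramds;
      call symputX("Intercept", intercept);
   run;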
Score Function Macro - continued
   proc fcmp outlib=&library..score;
      function &scoreFunc(&theArgs);
         return(&Intercept + &theScore);
      endsub;
   quit;
%mend create_score_func_macro;
Run Create Score Function
data _NULL_;
   call create_score("work.mroz", "lwage", "educ exper age kidslt6 kidsge6", "PredLWage_Full", "sasuser.score");
   call create_score("work.mroz", "lwage", "educ exper age", "PredLWage_NoKids", "sasuser.score");
run;

data _NULL_;
   educ = 15;
   exper = 5;
   age = 30;
   kidslt6 = 2;
   kidsge6 = 1;
   PredWage_Full = exp(PredLWage_Full(educ, exper, age, kidslt6, kidsge6));
   put PredWage_Full=;
   PredWage_NoKids = exp(PredLWage_NoKids(educ, exper, age));
   put PredWage_NoKids=;
run;

PredWage_Full=3.4199679212
PredWage_NoKids=3.787216653
Conclusions
• Users can encapsulate preexisting analytical procedures as building blocks for larger, more complex statistical analysis methods.
• PROC FCMP provides the vehicle to write reusable, independent program units (functions and subroutines).
• These units can be written and tested independently.
Where to find more information
http://support.sas.com/saspresents
• Paper in PDF form
• Zip file containing all source code