The Piecewise Regression Model as a Response Modeling Tool
Eugene Brusilovskiy
University of Pennsylvania, Philadelphia, PA
Abstract
The general problem in response modeling is to identify a response curve and estimate the diminishing returns effect. We review different approaches to response modeling in SAS® (OLS segmented regression, robust regression, neural nets, and nonparametric regression), with emphasis on their caveats. In this paper, we formulate a new problem statement of response modeling as a concave piecewise approximation of a response curve, and we use artificial data to illustrate the approach. Since the accuracy of the solution depends on the signal/(signal + noise) ratio, we can obtain the exact solution when the ratio is close to one or when the knots are known. For situations in which the data are significantly contaminated and/or the knots are unknown, we suggest a heuristic approach with three steps. First, we run a dummy regression with PROC REG to estimate the parameters of the dummy variables. Then, we test for a structural break in the series of estimated parameters from the dummy regression, using the Chow test in PROC AUTOREG or PROC MODEL. If a change point is identified, it is treated as a knot, or as a first approximation of the knot coordinates, in the piecewise regression. Finally, we develop a piecewise concave regression with PROC NLP.
Introduction
Promotion response modeling (PRM) is a necessary decision support tool in today's highly competitive market. As the consequences (cost, for instance) of wrong decisions grow, so does the role of promotion response modeling. PRM is industry-specific: for example, response modeling in the credit card industry is essentially different from that in the pharmaceutical industry. In this paper, we concentrate on the latter, where a response curve is used for evaluating the effectiveness of a promotion campaign, allocating the sales force, developing the optimal marketing mix, etc. Each of these problems may require its own definition of the response curve. In general, however, a promotion response curve is the result of PRM and can be defined as a mathematical construct (depending on the nature of its application) that relates a promotional effort to its response. In the pharmaceutical industry, the response could be the number of new prescriptions, and the promotion effort could be doctor detailing, controlling for all other promotion efforts.

As stated above, an adequate definition of the response curve is very important. Real promotion campaign data are very noisy, and the relationship between promotional efforts and the responses to them is very weak. Moreover, it is necessary to take into account the diminishing returns and the monotonically increasing nature of the response curve. In other words, we assume that the higher the promotion effort, the higher the response, up to some point where the over-promotion effect may kick in¹. Nonparametric regression would not be helpful here, because the resulting response curve will, as a rule, have multiple maxima and minima. Therefore, we formulate the problem of response modeling as a problem of nonlinear optimization with linear and nonlinear constraints. Specifically, we have to find a concave piecewise linear approximation of the relationship between the response and the promotion efforts. Concavity and monotonicity are necessary to reflect the diminishing returns in the resulting response curve. The piecewise linearity condition has to hold for the sake of simplicity of the next steps of decision support, which require optimization.
¹ We know that a product is "over-promoted" when the marginal response becomes negative.

Problem Statement
To make things easier, we will consider only three pieces in the concave piecewise linear approximation of the response curve (see Graph 1). Many authors have considered the problem of piecewise linear approximation (see, for example, (2) and (3)). In this paper, we impose a concavity restriction that is not present in those works, and we want to solve the problem in SAS®. Let Y be the response, X the promotion effort, and S1 and S2 the first and second knots, respectively. Then, based on promotional data, we need to find the set of unknown parameters B0, B1, B2, B3, S1, S2 that minimizes the objective function (the sum of squared residuals):
$$
Y = \begin{cases}
B_0 + B_1 X, & X \le S_1 \\
B_0 + B_1 S_1 + B_2 (X - S_1), & S_1 < X \le S_2 \\
B_0 + B_1 S_1 + B_2 (S_2 - S_1) + B_3 (X - S_2), & X > S_2
\end{cases} \tag{1}
$$
where B1 > 0, B2 > 0, B1 - B2 > 0, B2 - B3 > 0, S1 > 0, and S2 - S1 > 0. This formulation allows B3 to be negative; in that case, the response goes down as the promotion effort goes up, which is the over-promotion effect mentioned above. If we believe that there is no over-promotion effect, we have to impose the additional restriction that B3 be nonnegative. By its definition, the response curve Y is continuous, but not differentiable at the knots.

Mathematically, the problem of finding the response curve Y, as formulated above, is one of nonlinear programming with a continuous but non-differentiable objective function and with linear and nonlinear constraints. This type of problem can be solved by the SAS/OR® Non-Linear Programming procedure (PROC NLP). PROC NLP offers 11 different optimization algorithms; however, we can use only one of them, namely the Nelder-Mead Simplex Method (NMSIMP). It is the only algorithm that requires neither first- nor second-order derivatives and allows boundary constraints, linear constraints, and nonlinear constraints (p. 429, (1)).
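Written out explicitly, the estimation problem restates the formulation above as the following nonlinear program (assuming n observations (X_i, Y_i)):

$$
\min_{B_0, B_1, B_2, B_3, S_1, S_2} \; \sum_{i=1}^{n} \bigl( Y_i - \hat{Y}(X_i) \bigr)^2
$$

where \(\hat{Y}\) is the piecewise function in (1), subject to the constraints listed above.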
Caveats and the Solution
The formulation of the problem of finding the concave piecewise linear response curve may seem very simple; however, this is an illusion. Even when the data do not contain any noise, the estimation of the concave piecewise linear response curve does not always lead to a precise solution. Consider the following example, where we assume that the response curve consists of three linear pieces.
$$
Y = \begin{cases}
A_0 + A_1 X, & X \le T_1 \\
A_0 + A_1 T_1 + A_2 (X - T_1), & T_1 < X \le T_2 \\
A_0 + A_1 T_1 + A_2 (T_2 - T_1) + A_3 (X - T_2), & X > T_2
\end{cases} \tag{2}
$$
where A0 = 0, A1 = 4, A2 = 1, A3 = -1, T1 = 8, T2 = 16. Substituting, we get:
$$
Y = \begin{cases}
4X, & X \le 8 \\
4 \times 8 + (X - 8) = 24 + X, & 8 < X \le 16 \\
4 \times 8 + (16 - 8) - (X - 16) = 56 - X, & X > 16
\end{cases} \tag{3}
$$
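As a quick check that the three pieces of (3) join continuously at the knots:

$$
X = 8:\ \ 4 \times 8 = 32 = 24 + 8, \qquad X = 16:\ \ 24 + 16 = 40 = 56 - 16.
$$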
Our goal is to use PROC NLP (see the code below) to find estimates B0, B1, B2, B3, S1, S2 of the parameters A0, A1, A2, A3, T1, T2, based only on the data for X and Y generated from model (2) by the program in Appendix 1. Note that the generating program is called with A2 = 1.5 and A3 = 0.1 rather than the values used in illustration (3); since A3 = 0.1 > 0, we are considering the situation without the over-promotion effect. At first, the PROC NLP code did not include any initial values for the parameters. In this situation, PROC NLP automatically assigns random initial values, and the results of the parameter estimation vary with those initial values. If the randomly assigned initial values are not close to the actual values, PROC NLP is not able to find the exact solution. The code below
PROC NLP DATA=DATA_NO_NOISE OUTEST=STATS2 TECH=NMSIMP MAXFUNC=500000;
   PARAMETERS B0, B1, B2, B3, S1, S2;
   * PARAMETERS B0, B1, B2, B3, S1=5, S2=9;  /* version with initial knot values */
   MIN F;
   /* F is the squared residual of the piecewise model (1); the IF/ELSE
      block and the start of the LINCON statement are garbled in the
      source listing and are reconstructed here from equation (1) */
   IF X <= S1 THEN F = (Y - (B0 + B1*X))**2;
   ELSE IF X <= S2 THEN F = (Y - (B0 + B1*S1 + B2*(X - S1)))**2;
   ELSE F = (Y - (B0 + B1*S1 + B2*(S2 - S1) + B3*(X - S2)))**2;
   LINCON S2 > S1+3, B1 > 0, B2 > 0, B1 - B2 > 0, B2 - B3 > 0;
   NLINCON B3 >= 0;   /**** NO OVERPROMOTION EFFECT ****/
RUN;
produced very different results in each of ten runs. The best solution was:

PROC NLP: Nonlinear Minimization
Optimization Results: Parameter Estimates

                                 Gradient of
 N  Parameter    Estimate     Objective Function
 1  B0          -0.169762          0.165593
 2  B1           4.102762          0.136660
 3  B2           3.562570          0.221306
 4  B3           0.604655          0.315723
 5  S1           4.954981         -0.287138
 6  S2           9.880196          0.058462

Value of Objective Function = 40.468431421
where the initial values B0 = 0.9880090207, B1 = 0.9232514267, B2 = 0.4936586565, B3 = 0.3940990266, S1 = 0.8213401534 and S2 = 0.4876402093. When we put plausible initial values for knots S1 = 5 and S2 = 9, proc NLP immediately found almost the exact solution: PROC NLP: Nonlinear Minimization Optimization Results Parameter Estimates N Parameter 1 2 3 4 5 6
B0 B1 B2 B3 S1 S2
Estimate -0.044082 4.009843 1.512494 0.105701 7.972679 15.926853
Gradient Objective Function -0.120676 -0.148904 -0.105357 -0.052936 -0.136654 -0.305662
Value of Objective Function = 0.0141117445
Because the objective function potentially has many local minima and is non-differentiable at a number of points, this strong dependence of the accuracy of the solution on the initial values is the general problem of nonlinear optimization. The situation becomes more complex when PROC NLP tries to estimate a response curve without an over-promotion effect from data that do contain one; even when there is no noise in the data, this is still a very complicated problem.
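Since the best of ten random restarts above was still far from the truth, a natural brute-force remedy is a multistart strategy: rerun NMSIMP from many random initial values and keep the run with the smallest final objective. The macro below is a minimal sketch of this idea, not code from the paper; the seed arithmetic and the ranges for the random knot guesses are arbitrary choices.

%MACRO MULTISTART(NRUNS=10);
   %DO R=1 %TO &NRUNS;
      /* draw a random starting point for the knots (ranges are arbitrary) */
      DATA _NULL_;
         S1 = CEIL(20*RANUNI(1000+&R));
         S2 = S1 + 3 + CEIL(5*RANUNI(2000+&R));
         CALL SYMPUT('S1INIT', LEFT(PUT(S1, BEST.)));
         CALL SYMPUT('S2INIT', LEFT(PUT(S2, BEST.)));
      RUN;
      PROC NLP DATA=DATA_NO_NOISE OUTEST=EST&R TECH=NMSIMP MAXFUNC=500000;
         PARAMETERS B0=0, B1=1, B2=1, B3=0, S1=&S1INIT, S2=&S2INIT;
         MIN F;
         IF X <= S1 THEN F = (Y - (B0 + B1*X))**2;
         ELSE IF X <= S2 THEN F = (Y - (B0 + B1*S1 + B2*(X - S1)))**2;
         ELSE F = (Y - (B0 + B1*S1 + B2*(S2 - S1) + B3*(X - S2)))**2;
         LINCON S2 > S1 + 3, B1 > 0, B2 > 0, B1 - B2 > 0, B2 - B3 > 0;
      RUN;
   %END;
%MEND MULTISTART;

The best run can then be chosen by comparing the final objective values reported in the output of the &NRUNS runs; the OUTEST= data sets EST1 through EST&NRUNS keep the corresponding parameter estimates.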
In real response data, the Signal/(Signal + Noise) ratio is very small, and thus the problem frequently becomes intractable. Noisy data (see Graph 2) were generated by the code in Appendix 2, with the same parameters as in the no-noise example above: A0 = 0, A1 = 4, A2 = 1.5, A3 = 0.1, T1 = 8, T2 = 16. Since the real parameters are unknown in practice, it is very difficult to evaluate the results, but for noisy data the dependence of the solution on the initial values is even stronger. To overcome this problem, we offer a three-step heuristic approach.

In the first step, we run a dummy regression (PROC REG) to estimate the parameters of dummy variables for X = 1 to X = max[X]. These regression parameters are treated as a time series, with the number of the dummy variable used in place of the time index. (The code for the dummy regression is in Appendix 3, and the graph of the dummy regression parameter estimates as a series is in Graph 3.) Second, if we have some expert knowledge about the number of knots and their locations, we apply the Chow test (PROC AUTOREG) to test the hypothesis about the breakpoints, i.e., the knots; a sketch is given below. The last step is to estimate the parameters of the piecewise concave response curve from the series data with PROC NLP, setting the breakpoints from the second step as the initial values. Although the problem of assigning the initial values remains, the optimization problem becomes significantly simpler.

In our example, we do not know the optimal number of segments: it could be one, two, or three. Thus, we run PROC NLP three times, with zero, one, and two knots, and compare the values of the objective function in the three cases. The final response curve consists of the number of segments with the smallest objective function; comparing the values of the objective function, the optimal number of segments in our example is 2.
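As an illustration of the second step, here is a minimal sketch; the data set BETAS and the variables T (the dummy number) and BETA (the estimated dummy coefficient) are assumed names, not from the paper. The CHOW= option of PROC AUTOREG requests Chow tests for structural breaks at the listed observation numbers, here the candidate knots 8 and 16.

/* Step 2 sketch: Chow test on the series of dummy-regression estimates.
   BETAS, T, and BETA are assumed names; CHOW=(8 16) tests for breaks
   at observations 8 and 16, the candidate knot locations. */
PROC AUTOREG DATA=BETAS;
   MODEL BETA = T / CHOW=(8 16);
RUN;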
Summary
We formulate a new problem statement of response modeling as a concave piecewise approximation of a response curve, and we use artificial data to illustrate the approach. Since the accuracy of the solution depends on the Signal/(Signal + Noise) ratio, we can obtain the exact solution when the ratio is close to one or when the knots are known. A three-step heuristic approach is suggested for situations in which the data are significantly contaminated and/or the knots are unknown.
References
1. SAS Institute Inc. Chapter 5: The NLP Procedure. SAS/OR® User's Guide: Mathematical Programming, Version 8. Cary, NC: SAS Institute Inc., 1999, pp. 369-511.
2. Lerman, P. M. Fitting Segmented Regression Models by Grid Search. Applied Statistics, Vol. 29, No. 1 (1980), pp. 77-84.
3. McGee, Victor E., and Willard T. Carleton. Piecewise Regression. Journal of the American Statistical Association, Vol. 65, No. 331 (Sep. 1970), pp. 1109-1124.
APPENDIX 1
Macro for generation of a piecewise linear response curve (no noise). The source listing is garbled and truncated after the start of the two-piece branch; the remaining branches below are a reconstruction from equation (2).

%MACRO DDATA_NO_NOISE(A0=, A1=, A2=, A3=, S1=, S2=);
DATA DATA_NO_NOISE;
%DO XX=1 %TO 25;
   /**** ONLY ONE PIECE ****/
   %IF &S1=0 %THEN %DO;
      Y=%SYSEVALF(&A0 + &A1*&XX); X=&XX; OUTPUT;
   %END;
   /*** TWO PIECES ***/
   %ELSE %IF &S2=0 %THEN %DO;
      %IF &XX <= &S1 %THEN %DO; Y=%SYSEVALF(&A0 + &A1*&XX); %END;
      %ELSE %DO; Y=%SYSEVALF(&A0 + &A1*&S1 + &A2*(&XX-&S1)); %END;
      X=&XX; OUTPUT;
   %END;
   /*** THREE PIECES (reconstructed from equation (2)) ***/
   %ELSE %DO;
      %IF &XX <= &S1 %THEN %DO; Y=%SYSEVALF(&A0 + &A1*&XX); %END;
      %ELSE %IF &XX <= &S2 %THEN %DO; Y=%SYSEVALF(&A0 + &A1*&S1 + &A2*(&XX-&S1)); %END;
      %ELSE %DO; Y=%SYSEVALF(&A0 + &A1*&S1 + &A2*(&S2-&S1) + &A3*(&XX-&S2)); %END;
      X=&XX; OUTPUT;
   %END;
%END;
RUN;
%MEND DDATA_NO_NOISE;