Piecewise Convex Function Estimation: Representations, Duality and Model Selection

Kurt S. Riedel
Courant Institute of Mathematical Sciences
New York University, New York, New York 10012-1185

Abstract

We consider spline estimates which preserve prescribed piecewise convex properties of the unknown function. A robust version of the penalized likelihood is given and shown to correspond to a variable halfwidth kernel smoother, where the halfwidth adaptively decreases in regions of rapid change of the unknown function. When the convexity change points are prescribed, we derive representation results and smoothness properties of the estimates. A dual formulation is given which reduces the estimate to a finite dimensional convex optimization in the dual space. When the number of regions is unknown, the model selection problem is to determine the number of convexity change points. We propose a piecewise convex information criterion which imposes a logarithmic penalty on each new inflection point. This criterion is suitable for adaptive knot placement algorithms.

1 Introduction

A common problem in nonparametric function estimation is that the estimate often has artificial wiggles that the original function does not possess. In practice, these extra inflection points have a very negative impact on the utility and credibility of the estimate. In this article, we examine function estimation which preserves the geometric shape of the unknown function, $f(t)$. In other words, the number and location of the change points of convexity of the estimate, $\hat{f}(t)$, should approximate those of $f(t)$. We say that $f(t)$ has $K$ change points of $\ell$-convexity with change points $x_1 \le x_2 \le \cdots \le x_K$ if $(-1)^{k-1} f^{(\ell)}(t) \ge 0$ for $x_k \le t \le x_{k+1}$. For $\ell = 0$, $f(t)$ is nonnegative, and for $\ell = 1$, the function is nondecreasing. In regions where the constraint of $\ell$-convexity is active, $f^{(\ell)}(t) = 0$ and $f(t)$ is a polynomial of degree $\ell - 1$. For 1-convexity, $f(t)$ is constant in the active constraint regions, and for 2-convexity, the function is linear. Our subjective belief is that most people prefer smoothly varying functions such as quadratic or cubic polynomials even in the active constraint regions. Thus, piecewise 3-convexity or 4-convexity are also reasonable hypotheses.

When the change points are prescribed, convex analysis can be employed to derive representation theorems, smoothness properties and duality results. Our work extends that of Refs. [?, ?, ?, ?, ?, ?, ?] to an important class of robust functionals. To motivate robust nonparametric estimation, we show that robust functionals correspond to variable halfwidth data-adaptive kernels, i.e. the effective halfwidth decreases in regions where the unknown function changes rapidly. The more difficult problem of determining the number and location of the $\ell$-convexity breakpoints is the focus of [?, ?, ?]. When the number of change points is known, we prove the existence of a minimum of the penalized likelihood. When the number of change points is unknown, we propose a model selection criterion which imposes a logarithmic penalty on each inferred change point. This new "piecewise convex information criterion" is suitable for adaptive regression splines.

Sections 2 and 3 contain functional analysis preliminaries. Lemma 3.1 gives a characterization of negative polar cones in the spirit of [?]. Representation and duality theorems for constrained smoothing splines have been developed in [?, ?, ?] for the case of prescribed convexity change points. In Sections 4 and 6, we generalize these results to the case of nonquadratic penalty functions. In Section 5, we show how robust penalty functions correspond to data-adaptive variable halfwidth kernel smoothers. In Section 7, we consider estimating the change point locations by minimizing the penalized least squares fit. In Section 8, we consider adaptive knot placement and propose a model selection criterion which penalizes extra inflection points.

2 Functional Analysis Preliminaries

We consider an unknown function, $f$, in the Sobolev space $W_{m,p}[0,1]$ with $m \ge \ell$ and $1 < p < \infty$, where

$W_{m,p} \equiv \{ f \mid f^{(m)} \in L_p[0,1] \text{ and } f, f', \ldots, f^{(m-1)} \text{ absolutely continuous} \} . \qquad (2.1)$

For $f \in W_{m,p}$, we have the representation

$f(t) = \sum_{j=0}^{m-1} a_j P_j(t) + \int_0^t \frac{(t-s)^{m-1}}{(m-1)!}\, f^{(m)}(s)\, ds , \qquad (2.2)$

where $P_j(t) \equiv t^j / j!$. Equation (2.2) decomposes $W_{m,p}$ into a direct sum of the space of polynomials of degree $m-1$ plus the set of functions whose first $m-1$ derivatives vanish at $t = 0$, which we denote by $W^0_{m,p}$ [?]. We define the seminorm $\|f\|^p_{j,p} \equiv \int_0^1 |f^{(j)}(t)|^p\, dt$. We endow $W_{m,p}$ with the norm:

$|\|f\||^p_{m,p} = \sum_{j=0}^{m-1} |f^{(j)}(t=0)|^p + \|f\|^p_{m,p} . \qquad (2.3)$

The dual space of $W_{m,p}$ is isomorphic to the direct sum of $P_{m-1}$ and $W^0_{m,q}$, with $q = p/(p-1)$, and the duality pairing of $f \in W_{m,p}$ and $g \in W_{m,q}$ is

$\langle\langle g, f \rangle\rangle = \sum_{j=0}^{m-1} b_j a_j + \int_0^1 f^{(m)}(t)\, g^{(m)}(t)\, dt , \qquad (2.4)$

where $a_j \equiv f^{(j)}(0)$ and $b_j \equiv g^{(j)}(0)$. We denote the duality pairing by $\langle\langle \cdot, \cdot \rangle\rangle$ and the $L_2$ inner product by $\langle \cdot, \cdot \rangle$. In [?], $W_{m,2}$ is given a reproducing kernel, where for each $t$, $f(t) = \langle\langle R_t, f \rangle\rangle$. The same reproducing kernel structure carries over to $1 < p < \infty$ with

$R_t(s) = \sum_{j=0}^{m-1} P_j(t)\, P_j(s) + \int_0^{\min\{t,s\}} \frac{(t-u)^{m-1} (s-u)^{m-1}}{[(m-1)!]^2}\, du . \qquad (2.5)$

A linear operator $L_i \in W^*_{m,p}$ has representations $L_i f = \langle\langle b_i(s), f(s) \rangle\rangle$ and $L_i f = \langle h_i(s), f(s) \rangle$, where $b_i(s) \equiv L_i R(\cdot, s)$ and $h_i(s) \equiv L_i \delta(s - \cdot)$. Here $L_i R(\cdot, s)$ denotes $L_i$ acting on the first entry of $R$. In the standard case, where $L_i f \equiv f(t_i)$, $b_i(s) = R(t_i, s)$ and $h_i(s) = \delta(s - t_i)$.
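As a concrete check of the reproducing kernel formula (2.5), the following short Python sketch evaluates $R_t(s)$ by direct quadrature. The function name and the $m = 1$ sanity check (where (2.5) reduces to $R_t(s) = 1 + \min(t,s)$) are our own illustrative additions, not part of the original development.

from math import factorial
from scipy.integrate import quad

def reproducing_kernel(t, s, m):
    # Polynomial part: sum of P_j(t) P_j(s) with P_j(u) = u^j / j!
    poly = sum((t**j / factorial(j)) * (s**j / factorial(j)) for j in range(m))
    # Integral part of Eq. (2.5), taken over [0, min(t, s)]
    integral, _ = quad(
        lambda u: (t - u)**(m - 1) * (s - u)**(m - 1) / factorial(m - 1)**2,
        0.0, min(t, s))
    return poly + integral

# Sanity check: for m = 1, Eq. (2.5) reduces to R_t(s) = 1 + min(t, s).
assert abs(reproducing_kernel(0.3, 0.7, 1) - 1.3) < 1e-10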

3 Convex Cone Constraints

In this and the next section, we assume that the change points $\{x_1, \ldots, x_K\}$ of $\ell$-convexity are given and that the unknown function is in the Sobolev space $W_{m,p}[0,1]$. Given change points $\{x_1, x_2, \ldots, x_K\}$, we define the closed convex cone

$V^{K,\ell}_{m,p}[x_1, \ldots, x_K] = \{ f \in W_{m,p} \mid (-1)^{k-1} f^{(\ell)}(t) \ge 0 \text{ for } x_{k-1} \le t \le x_k \} , \qquad (3.1)$

where $x_0 \equiv 0$ and $x_{K+1} \equiv 1$. Let $\vec{x}$ denote the row vector $(x_1, x_2, \ldots, x_K)$. Throughout this article, we require $\ell \le m$. By the Sobolev embedding theorem, $f^{(\ell)}(t)$ is continuous for $\ell < m$. For $\ell = m$, we require the convexity constraint in (3.1) to hold almost everywhere. We define the class of functions with at most $K$ change points as

$V^{K,\ell}_{m,p} \equiv \bigcup_{x_1 \le x_2 \le \cdots \le x_K} \{ V^{K,\ell}_{m,p}[x_1, \ldots, x_K] \cup (-V^{K,\ell}_{m,p}[x_1, \ldots, x_K]) \} . \qquad (3.2)$

By allowing $x_{k_0} = x_{k_0+1}$, we have embedded $V^{K,\ell}_{m,p}$ into $V^{K+1,\ell}_{m,p}$. $V^{K,\ell}_{m,p}$ is the union of convex cones, and is closed but not convex. Similar piecewise $\ell$-convex classes are defined in [?] for the case $\ell = m+1$ with a supremum norm on the Hölder constant of $f^{(\ell-2)}$. For Theorem 4.1, we need the following results from convex analysis.

Definition [1]. Let $C$ be a closed convex cone in $W_{m,p}$; the negative polar, $C^-$, of $C$ is $C^- \equiv \{ g \in W^*_{m,p} \mid \langle\langle g, f \rangle\rangle \le 0 \ \forall f \in C \}$.

For $V^{K,\ell}_{m,p}[\vec{x}]$, we are able to give a more explicit characterization of the negative polar. Our result is restricted to $\ell \le m$, while Deutsch et al. [2] consider the more difficult case of $m = 0$ with $\ell > m$.

Lemma 3.1. The negative polar of $V^{K,\ell}_{m,p}[\vec{x}]$ is $V^{K,\ell}_{m,p}[\vec{x}]^- = $ the closure in $W^*_{m,p}$ of $\{ g \in C^{2m-\ell}[0,1] \mid (-1)^{m-\ell} \chi_x(t)\, g^{(2m-\ell)}(t) \le 0 ; \ g^{(j)}(0) = 0, \ j = 0, \ldots, \ell-1 ; \ g^{(m+j)}(1) = 0 \text{ and } g^{(m-j-1)}(0) - (-1)^j g^{(m+j)}(0) = 0, \ j = 0, \ldots, m-\ell-2 ; \ (-1)^{m-\ell} \chi_x(1)\, g^{(2m-\ell-1)}(1) \le 0 ; \ \chi_x(0) [ g^{(\ell)}(0) + (-1)^{m-\ell} g^{(2m-\ell-1)}(0) ] \le 0 \}$, where $\chi_x(t)$ is defined by $\chi_x(t) = (-1)^{k-1}$ for $x_k < t < x_{k+1}$ and $\chi_x(x_k) = 0$, $k = 1, \ldots, K$.

Proof. Integration by parts yields $\int_0^1 f^{(m)}(t) g^{(m)}(t)\, dt = (-1)^{m-\ell} \int_0^1 f^{(\ell)}(t) g^{(2m-\ell)}(t)\, dt + \sum_{j=0}^{m-\ell-1} (-1)^j f^{(m-j-1)}(t) g^{(m+j)}(t) |_0^1$ for $g \in C^{2m-\ell}[0,1]$. We now find test functions $\tilde{f} \in V^{K,\ell}_{m,p}[\vec{x}]$ which require each term separately to be nonpositive. For $t \ne x_k$, we choose $\tilde{f}^{(\ell)}(s) = (s - t + h)_+^{m-\ell+1} (t - s + h)_+^{m-\ell+1} \chi_x(s)$, where $h$ is a small localization parameter and $s_+ \equiv s$ for $s > 0$ and zero otherwise. The boundary conditions at $t = 1$ are proved inductively with the sequence of test functions $\tilde{f}_h(s) = (s - 1 + h)_+^m \chi_x(s)$ as $h \to 0$. □

Corollary 3.2. The negative polar of $V^{K,m}_{m,p}[\vec{x}]$ is $V^{K,m}_{m,p}[\vec{x}]^- = -V^{K,m}_{m,p}[\vec{x}] \cap W^0_{m,p}$.

Corollary 3.3. The negative polar of $V^{K,m-1}_{m,p}[\vec{x}]$ is the closure in $W^*_{m,p}$ of $\{ g \in C^{m+1}[0,1] \mid g^{(m+1)}(t)\, \chi_x(t) \ge 0 ; \ \chi_x(1)\, g^{(m)}(1) \ge 0 ; \ \chi_x(0) [ g^{(m-1)}(0) - g^{(m)}(0) ] \le 0 \}$.

Corollary 3.4. The negative polar of $V^{K,m-2}_{m,p}[\vec{x}]$ is $V^{K,m-2}_{m,p}[\vec{x}]^- = $ the closure in $W^*_{m,p}$ of $\{ g \in C^{m+2}[0,1] \mid g^{(m+2)}(t)\, \chi_x(t) \le 0 ; \ g^{(m)}(1) = 0 ; \ \chi_x(1)\, g^{(m+1)}(1) \le 0 ; \ g^{(m-1)}(0) - g^{(m)}(0) = 0 ; \ \chi_x(0) [ g^{(m-2)}(0) + g^{(m+1)}(0) ] \le 0 \}$.

The negative polar is useful in evaluating the normal cone of $V^{K,\ell}_{m,p}[\vec{x}]$:

Lemma 3.5 ([?], p. 171). Let $C$ be a closed convex cone in $W$; the normal cone of $C$ in $W^*$ at $f$, $N_C(f)$, satisfies $N_C(f) = C^- \cap \{f\}^\perp$, where $C^-$ is the negative polar of $C$.
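A quick numerical spot check of Corollary 3.2 for $m = \ell = 2$ and one change point: take $f \in V$ with $f''$ following the sign pattern $\chi_x$, and $g \in -V \cap W^0_{m,p}$ (so $g(0) = g'(0) = 0$ and $g''$ has sign $-\chi_x$); the pairing (2.4) then reduces to the integral term and should be nonpositive. The particular choices of $f''$ and $g''$ below are arbitrary illustrative assumptions.

import numpy as np

t = np.linspace(0.0, 1.0, 2001)
dt = t[1] - t[0]
chi = np.where(t < 0.5, 1.0, -1.0)      # sign pattern chi_x for K = 1, x1 = 0.5
f2 = chi * (1.0 + np.sin(6.0 * t)**2)   # f'' has sign chi_x, so f lies in V
g2 = -chi * np.exp(-t)                  # g'' has sign -chi_x; g(0) = g'(0) = 0
# With b_0 = b_1 = 0, the pairing (2.4) is just the integral of f'' g''.
pairing = dt * np.sum(f2 * g2)
assert pairing <= 0.0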


4 Robust splines: Representations and Smoothness

In this section, we generalize representation and smoothness results to a large class of robust functionals. These robust functionals are advantageous because they downweight outliers and adaptively adjust the effective smoothing width. We are given $N$ measurements of the unknown function, $f(t)$:

$y_i = L_i f + \epsilon_i = \langle h_i, f \rangle + \epsilon_i = \langle\langle b_i, f \rangle\rangle + \epsilon_i , \qquad (4.1)$

where the $L_i$ are bounded linear operators on $W_{m,p}$, and the $\epsilon_i$ are uncorrelated random variables with variance $\sigma_i^2 > 0$. A robustified estimate of $f(t)$ given the measurements $\{y_i\}$ is $\hat{f} \equiv \arg\min VP[f \in V^{K,\ell}_{m,p}[x_1, \ldots, x_K]]$:

$VP[f] \equiv \frac{\lambda}{p} \int |f^{(m)}(s)|^p\, ds + \sum_{i=1}^N \rho_i( \langle h_i, f \rangle - y_i ) , \qquad (4.2)$

where the $\rho_i$ are strictly convex, continuous functions. The standard case is $p = 2$ and $\rho_i(y_i - \langle h_i, f \rangle) = |y_i - f(t_i)|^2 / (N \sigma_i^2)$. For an excellent discussion of the advantages of robustness in function estimation, see Mächler [6]. Theorem 4.1 is proven in [?] and Theorem 6.1 is proven in [?] for the case $p = 2$ and $\rho(y) = y^2$. For the unconstrained case of Theorem 4.1, see [?]. Equation (4.5) and the corresponding smoothness results appear in [?] for the case $\ell = 1$, $p = 2$ and $L_i = \delta(t - t_i)$. The set $\{h_i, i = 1, \ldots, N\}$ separates polynomials of degree $m-1$ if $\langle h_i, \sum_{k=0}^{m-1} c_k t^k \rangle = 0$ for all $i$ implies $c_k \equiv 0$.
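Before stating the representation theorem, we illustrate the functional (4.2) with a discretized Python sketch: $f$ is sampled on a uniform grid, $f^{(m)}$ is replaced by a finite difference, $\rho$ is a Huber-type function (convex, though the $\rho_i$ above are required to be strictly convex), and the $\ell$-convexity constraint is imposed with no change points. The grid size, $\lambda$, and the Huber scale $\delta$ are illustrative assumptions.

import numpy as np
from scipy.optimize import minimize

n, m, ell, p, lam, delta = 60, 2, 2, 1.5, 1e-3, 1.0
t = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(0)
y = np.abs(t - 0.5) + 0.05 * rng.standard_normal(n)   # convex truth plus noise

def diff_op(k):
    # k-th order finite difference matrix, shape (n - k, n)
    D = np.eye(n)
    for _ in range(k):
        D = np.diff(D, axis=0) / (t[1] - t[0])
    return D

Dm, Dl = diff_op(m), diff_op(ell)

def huber(r):
    # Convex penalty: quadratic near zero, linear in the tails
    return np.where(np.abs(r) <= delta, 0.5 * r**2, delta * (np.abs(r) - 0.5 * delta))

def VP(f):
    # Discrete analog of Eq. (4.2)
    return lam / p * np.sum(np.abs(Dm @ f)**p) + np.sum(huber(f - y))

# One convexity region (K = 0 change points): f^(ell) >= 0 everywhere.
cons = [{"type": "ineq", "fun": lambda f: Dl @ f}]
fhat = minimize(VP, y.copy(), constraints=cons, method="SLSQP").x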

Theorem 4.1. Let $\{h_i\}$ separate polynomials of degree $m-1$; then the minimization problem (4.2) has a unique solution in $V^{K,\ell}_{m,p}[\vec{x}]$, and the minimizing function satisfies the differential equation:

$(-1)^m \lambda\, d^m [\, |\hat{f}^{(m)}|^{p-2} \hat{f}^{(m)}(t)\, ] + \sum_{i=1}^N \rho_i'( \langle h_i, \hat{f} \rangle - y_i )\, h_i(t) = 0 , \qquad (4.3)$

in those regions where $|\hat{f}^{(\ell)}| > 0$, for $1 < p < \infty$ and $\ell \le m$.

Proof. The functional (4.2) is strictly convex, lower semicontinuous and coercive, so by Theorem 2.1.2 of Ekeland and Temam [3], it has a unique minimum, $\hat{f}$, on any closed convex set. From the generalized calculus of convex analysis, the solution satisfies

$0 \in (-1)^m \lambda\, d^m [\, |\hat{f}^{(m)}|^{p-2} \hat{f}^{(m)}(t)\, ] + \sum_{i=1}^N \rho_i'( \langle h_i, \hat{f} \rangle - y_i )\, h_i(t) + N_V(\hat{f}) , \qquad (4.4)$

where $N_V(f)$ is the normal cone of $V^{K,\ell}_{m,p}[\vec{x}]$ at $f$ [?, p. 189]. The normal cone is characterized by Lemmas 3.1 and 3.5. From [?], each element of $N_V(f)$ is the limit of a discrete sum $\sum_t a_t\, \delta^{(\ell)}(\cdot - t)$, where the $t$'s are in the active constraint region. Integrating (4.4) yields

$\lambda\, |\hat{f}^{(m)}|^{p-2} \hat{f}^{(m)}(t) = \sum_{i=1}^N \rho_i'( \langle h_i, \hat{f} \rangle - y_i )\, \frac{\langle h_i(s), (s-t)_+^{m-1} \rangle}{(m-1)!} + \int \frac{(s-t)_+^{m-\ell-1}}{(m-\ell-1)!}\, d\mu(s) , \qquad (4.5)$

where $d\mu$ corresponds to a particular element of $N_V(f)$. □

For $h_i(s) = \delta(s - t_i)$ and $\ell = m$, Theorem 4.1 can be derived as a consequence of the corresponding result for constrained interpolation [?].

Corollary 4.2. If the $\{h_i\}$ are in $W^*_{\ell,1}$, then the minimizing function of (4.2) is in $C^{2m-\ell-2}[0,1]$.

Proof. Since $(s-t)_+^{m-\ell-1}$ is $m-\ell-2$ times differentiable, the $d\mu$ term on the right hand side of (4.5) is $m-\ell-2$ times differentiable. By hypothesis, $h_i \in W^*_{\ell,1}$, and thus $\int h_i(s) (s-t)_+^{m-1}\, ds$ is in $C^{m-\ell-2}$. Integrating (4.5) yields $\hat{f} \in C^{2m-\ell-2}$. □

5 Equivalent adaptive halfwidth of robust splines

Replacing the standard spline likelihood functional ($p = 2$ and $\rho(y) = y^2/2$) in (4.2) with a more robust version has several well-known advantages. First, outliers are less influential when $\rho(y)$ downweights large values of the residual error. Second, for $1 \le p < 2$, the set of candidate functions is increased, and the solution may have sharper local variation than in the $p = 2$ case. We now describe a third important advantage: the effective halfwidth adapts to the unknown function. In [?], it is shown that as the number of measurements increases, the spline estimate converges to a local kernel smoother estimate (provided the measurement times, $\{t_i\}$, are nearly regularly spaced). For technical details, see [?] and Appendix B of [?]. Convergence proofs are available only for $p = 2$. The resulting effective kernel halfwidth, $h_{\text{eff}}$, scales as $h_{\text{eff}} \sim [\lambda / F'(t)]^{1/2m}$, where $F(t)$ is the limiting distribution of measurement points. For $1 < p < 2$, no theory exists on the effective halfwidth of a robust spline estimate. We assume that $\hat{f}$ converges to $f$ in $W_{m,p}$ under hypotheses similar to those used for the $p = 2$ case in [?] and Appendix B of [?]. These conditions relate to the discrepancy of the measurement times, $\{t_i\}$, the smoothness of $F(t)$, and the scaling of the smoothing parameter with $N$. The appropriate modifications for $p \ne 2$ are unknown.

We can make a heuristic two-scale analysis of (4.3) in the continuum. We assume that in the continuum limit, the estimate satisfies the following equation to zeroth order:

$(-1)^m \lambda\, d^m [\, |\hat{f}^{(m)}|^{p-2} \hat{f}^{(m)}(t)\, ] + \hat{f}(t) = y(t) , \qquad (5.1)$

where $y(t) = f(t) + \sigma Z(t)$, with $Z(t)$ being a white noise process. Away from the $m$-convexity change points, we linearize (5.1) about $\hat{f}^{(m)}(t) \approx f^{(m)}(t)$. Let $\tilde{f}(t)$ be the linearized variable for (5.1): $\tilde{f}^{(m)}(t) \equiv \hat{f}^{(m)}(t) - f^{(m)}(t)$, where $\tilde{f}(t)$ satisfies

$(-1)^m (p-1) \lambda\, d^m [\, |f^{(m)}|^{p-2} \tilde{f}^{(m)}(t)\, ] + \tilde{f}(t) = \sigma Z(t) . \qquad (5.2)$

When $|f^{(m)}(t)|^{p-2}$ is small but nonzero, the homogeneous equation may be solved using the Wentzel-Kramers-Brillouin expansion. The resulting Green's function for $\tilde{f}$ may be recast as a kernel smoother with an effective halfwidth:

$h_{\text{eff}}(t) \simeq \left[ \frac{\lambda}{F'(t)}\, |f^{(m)}(t)|^{p-2} \right]^{1/2m} . \qquad (5.3)$

For $1 \le p \le 2$, the effective halfwidth of the robustified functional automatically decreases in regions of large $|f^{(m)}(t)|$, just like a variable halfwidth smoother. We caution that this result has not been rigorously derived. For the equivalent kernel, the bias error scales as $f^{(m)}(t) h^m$ while the variance is proportional to $1/Nh$. The halfwidth that minimizes the mean square error scales as $h_{\text{MSE}} \sim [\, N |f^{(m)}|^2 \,]^{-1/(2m+1)}$. The two halfwidths agree at $p = 2/(2m+1)$, but $p < 1$ is ill-conditioned. We recognize that this derivation is very loose, but we conjecture that a rigorous multiple scale analysis may prove (5.3). Our purpose is only to motivate the connection between robust splines and adaptive halfwidth kernel smoothers.
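The two halfwidth scalings can be compared numerically. The sketch below evaluates (5.3) and the MSE-optimal scaling for a hypothetical curvature profile $|f^{(m)}(t)|$; the constants $\lambda$, $N$, and the Gaussian curvature bump are illustrative assumptions.

import numpy as np

m, p, N, lam = 2, 1.2, 1000, 1e-4
t = np.linspace(0.01, 0.99, 99)
curv = 1.0 + 10.0 * np.exp(-((t - 0.5) / 0.05)**2)   # hypothetical |f^(m)(t)|
Fprime = np.ones_like(t)                             # uniform design density

h_eff = (lam / Fprime * curv**(p - 2))**(1.0 / (2 * m))   # Eq. (5.3)
h_mse = (N * curv**2)**(-1.0 / (2 * m + 1))               # MSE-optimal scaling

# Both shrink where |f^(m)| is large, but at different rates: the exponents
# are (p - 2)/(2m) = -0.2 and -2/(2m + 1) = -0.4 for these parameters.
assert h_eff.argmin() == h_mse.argmin() == curv.argmax()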

6 Constrained smoothing splines and duality

In (4.2), the intervals on which $f^{(\ell)}(t)$ vanishes are unknown and need to be found as part of the optimization. Using the differential characterization (4.3) loses the convexity properties of the underlying functional. For this reason, extremizing the dual functional is now preferred.

Theorem 6.1 (Convex Duality). The dual variational problem of Theorem 4.1 is: minimize over $\alpha \in \mathbb{R}^N$

$VP^*[\alpha; \vec{x}] \equiv \frac{\lambda^{1-q}}{q} \int | [P^-_x B\alpha]^{(m)}(s) |^q\, ds + \sum_{i=1}^N \rho_i^*(\alpha_i) - \alpha_i y_i , \qquad (6.1)$

where $\rho_i^*$ is the Fenchel/Legendre transform of $\rho_i$, and $B\alpha(t) \equiv \sum_{i=1}^N \alpha_i b_i(t)$ with $b_i(t) = L_i R(\cdot, t)$. The dual projection $P^-_x$ is defined as

$\int | [P^-_x g]^{(m)}(s) |^q\, ds \equiv \inf_{\tilde{g} \in V^-} \int_0^1 | g^{(m)}(s) - \tilde{g}^{(m)}(s) |^q\, ds , \qquad (6.2)$

subject to the constraints $g^{(j)}(0) = \tilde{g}^{(j)}(0)$, $0 \le j < m$. The dual problem is strictly convex, and its minimum is the negative of the infimum of (4.2). When the $\{h_i\}$ are linearly independent, the minimum satisfies the differential conditions:

$\alpha_i = \rho_i'( \langle h_i, \hat{f} \rangle - y_i ) \quad \text{and} \quad \langle h_i, \hat{f} \rangle - y_i = \rho_i^{*\prime}(\alpha_i) , \qquad i = 1, \ldots, N . \qquad (6.3)$

Proof. Let $\chi_V$ be the indicator function of $V^{K,\ell}_{m,p}[\vec{x}]$ and define

$U(f) = \frac{\lambda}{p} \int_0^1 |f^{(m)}(s)|^p\, ds + \chi_V(f) . \qquad (6.4)$

We claim that the Legendre transform of $U(f)$ is (6.1). Note that $\chi_V^*(g) = \chi_{V^-}(g)$, the indicator function of the dual cone $V^-$. The Legendre transform of the first term in (6.4) is

$U_1^*(g) = \frac{\lambda^{1-q}}{q} \int_0^1 |g^{(m)}(s)|^q\, ds \quad \text{for } g \in W^0_{m,q} , \ \text{and } \infty \text{ otherwise}. \qquad (6.5)$

Our claim follows from $[U_1 + U_2]^*(g) = \inf_{g'} \{ U_1^*(g - g') + U_2^*(g') \}$. The remainder of the theorem, including the differential conditions (6.3), follows from the general duality theorem of Aubin and Ekeland [1, p. 221]. □

An alternative formulation of the duality result for quadratic smoothing problems is given in [?]. For both theories, the case $\ell < m$ is difficult to evaluate in practice because the minimization in (6.2) can only rarely be reduced to an explicit finite dimensional problem. Only a few partial results are known when $\ell < m$ [?, ?, ?]. For the case $\ell = m$, the minimization over the dual cone can be done explicitly and yields the following simplification:

Corollary 6.2. For $\ell = m$, the dual projection $P^-_x$ is a local operator with $[P^-_x B\alpha]^{(m)}(t) = \sum_{i=1}^N \alpha_i b_i^{(m)}(t)$ if $\sum_i \alpha_i b_i^{(m)}(t)\, \chi_x(t) \le 0$, and zero otherwise. Thus the minimization of (6.1) is finite dimensional.
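For $\ell = m$, Corollary 6.2 thus reduces the dual problem to an ordinary finite dimensional convex program in $\alpha$. The Python sketch below assembles (6.1) for the standard quadratic case $\rho_i(r) = r^2/2$ (so $\rho_i^*(\alpha) = \alpha^2/2$), point evaluations $L_i f = f(t_i)$, $m = 2$, and a single convexity region; the grid, data, and smoothing parameter are illustrative assumptions.

import numpy as np
from math import factorial
from scipy.optimize import minimize

m, q, lam = 2, 2.0, 1e-3
ti = np.linspace(0.05, 0.95, 12)                 # measurement points t_i
rng = np.random.default_rng(1)
y = (ti - 0.4)**2 + 0.01 * rng.standard_normal(ti.size)
s = np.linspace(0.0, 1.0, 400)
chi = np.ones_like(s)                            # chi_x = +1: one region, K = 0

# From Eq. (2.5), b_i(t) = R(t_i, t) gives b_i^(m)(s) = (t_i - s)_+^(m-1)/(m-1)!.
Bm = np.clip(ti[:, None] - s[None, :], 0.0, None)**(m - 1) / factorial(m - 1)

def dual(alpha):
    g = alpha @ Bm                               # [B alpha]^(m) on the grid
    g = np.where(chi * g <= 0.0, g, 0.0)         # Corollary 6.2: local projection
    smooth = lam**(1 - q) / q * np.sum(np.abs(g)**q) * (s[1] - s[0])
    return smooth + np.sum(0.5 * alpha**2 - alpha * y)   # rho_i*(a) = a^2/2

alpha_hat = minimize(dual, np.zeros_like(ti)).x  # finite dimensional in alpha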

7 Change point estimation

When the number of change points is fixed, but the locations are unknown, we can estimate them by minimizing the functional in (4.2) with respect to the change point locations. We now show that there exists a set of minimizing change points.

Theorem 7.1. For each $K$ with $\ell = m$, there exist change points $\{x_k, k = 1, \ldots, K\}$ which minimize the variational problem (4.2).

Proof. We use the dual variational problem (6.1) and maximize over $\vec{x} \in [0,1]^K$ after minimizing over $\alpha \in \mathbb{R}^N$. We need only consider $\alpha$ in a compact region, by the coercivity of $\sum_{i=1}^N \rho_i^*(\alpha_i) - \alpha_i y_i$. For $\ell = m$, explicit construction of the functional (6.1) shows that it is jointly continuous in $\alpha$ and $\vec{x}$. Since (6.1) is convex in $\alpha$, Theorem 7.1 follows from the min-max theorem [?, p. 296]. □

We conjecture that Theorem 7.1 is true for $\ell < m$, but we lack a proof that Eq. (6.1) is continuous with respect to $\vec{x}$ for $\ell < m$. The change point locations need not be unique. The proof requires $\le$ instead of $<$ in the ordering $x_k \le x_{k+1}$ to make the change point space compact. When $x_k = x_{k+1}$, the number of effective change points is less than $K$.

In [7], Mammen considers the case where $K$ is known but the locations are unknown. The function is estimated using simple least squares on a class of functions roughly analogous to $V^{K+1,m}_{m,1}$. Mammen proves that this estimate achieves the optimal convergence rate of $N^{-2m/(2m+1)}$ for the mean integrated square error. Unfortunately, Mammen's estimator is not unique and often results in aesthetically unappealing fits.

For both formulations, finding the optimal change point locations is computationally intensive. For each candidate set of change points, the likelihood function needs to be minimized subject to the piecewise convexity constraints. One advantage of our formulation is that for each candidate value of $\vec{x}$, the programming problem is strictly convex in the dual. This strict convexity is lost if one uses a penalty functional with $p = \infty$ as in [?] or $p = 1$, corresponding to a total variation norm. If the total variation norm is used and an absolute value penalty function is employed ($\rho(y) = |y|$), the programming problem reduces to constrained linear programming.
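As an illustration of Theorem 7.1 (and of the computational cost just discussed), one can scan a grid of candidate change points, solve a constrained fit for each, and keep the minimizer. The brute-force sketch below does this for $K = 1$ and $\ell = m = 2$ in the least squares case ($p = 2$, $\rho(r) = r^2$); it is an illustrative simplification, not this paper's algorithm.

import numpy as np
from scipy.optimize import minimize

n, lam = 40, 1e-4
t = np.linspace(0.0, 1.0, n)
rng = np.random.default_rng(2)
truth = np.where(t < 0.6, (t - 0.6)**2, -(t - 0.6)**2)   # convex then concave
y = truth + 0.01 * rng.standard_normal(n)
D2 = np.diff(np.eye(n), 2, axis=0) / (t[1] - t[0])**2    # second difference

def fit(x1):
    # Constrained least squares fit for one candidate change point x1
    chi = np.where(t[1:-1] < x1, 1.0, -1.0)              # sign pattern chi_x
    obj = lambda f: lam * np.sum((D2 @ f)**2) + np.sum((f - y)**2)
    cons = [{"type": "ineq", "fun": lambda f: chi * (D2 @ f)}]
    res = minimize(obj, y.copy(), constraints=cons, method="SLSQP")
    return res.fun, res.x

best_x1 = min(np.linspace(0.2, 0.8, 13), key=lambda x1: fit(x1)[0])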

8 Piecewise Convex Model Selection

An alternative to smoothing splines is adaptive regression splines [4]. At each step, a new knot is added to the fit. The number and locations of the new knots are chosen by minimizing a Loss of Fit (LoF) function, $d_N(\hat{f}, \{y_i\})$. Let $\hat{\sigma}^2$ be a measure of the average residual error: $\hat{\sigma}^2(\hat{f}, \{y_i\}) = \frac{1}{N} \sum_{i=1}^N [y_i - \hat{f}(t_i)]^2 / \sigma_i^2$, or its $L_1$ analog. Typical discrepancy functions are

$d^F_N(\hat{f}, \{y_i\}) = \hat{\sigma}^2 / [\, 1 - (\gamma_1 p + m)/N \,] \quad \text{and} \quad d^B_N(\hat{f}, \{y_i\}) = \hat{\sigma}^2 [\, 1 + (\gamma_2 p + m) \ln(N)/N \,] , \qquad (8.1)$

where $p$ is the number of knots in the fit and $m$ is the spline degree. Friedman proposed $d^F_N$ with a default value of $\gamma_1 = 3$, while $d^B_N$ is the Bayesian/Schwartz information criterion when $\gamma_2 = 1$. For a nested family of models, $\gamma_2 = 1$ is appropriate, while $\gamma_2 = 2$ corresponds to a nonnested family with $2\binom{N}{K}$ candidate models at the $K$th level [2]. In very specialized

settings in regression theory and time series, it has been shown that LoF functions like $d^F_N$ are asymptotically efficient, while those like $d^B_N$ are asymptotically consistent. In other words, using $d^F_N$-like criteria will asymptotically minimize the expected error at the cost of not always yielding the correct model. In contrast, the Bayesian criteria will asymptotically yield the correct model at the cost of a larger expected error. Our goal is to consistently select the number of convexity change points and efficiently estimate the model subject to the change point restrictions. Therefore, we propose the following new LoF function:

$\text{PCIC} = \hat{\sigma}^2(\hat{g}, \{y_i\}) \left[ \frac{1 + \gamma_2 K \ln(N)/N}{1 - (\gamma_1 p + m)/N} \right] , \qquad (8.2)$

where $K$ is the number of convexity change points and $p$ is the number of knots. PCIC stands for Piecewise Convex Information Criterion. In selecting the positions of the $K$ change points, there are essentially $2\binom{N}{K}$ possible combinations of change point locations if we categorize the change points by the nearest measurement location. Thus, our default values are $\gamma_1 = 3$ and $\gamma_2 = 2$. We motivate PCIC as follows: to add a change point requires an improvement in the residual square error of $O(\gamma_2 \ln(N))$, which corresponds to an asymptotically consistent estimate. If an additional knot does not increase the number of change points, it will be added if the residual error decreases by $\gamma_1 \hat{\sigma}^2$. Presently, PCIC is purely a heuristic principle. We conjecture that it consistently selects the number of change points and is asymptotically efficient within the class of methods which are asymptotically consistent with regard to convexity change points.
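A minimal sketch of the PCIC rule (8.2) as it would be used during adaptive knot placement; the helper name and the numeric inputs are hypothetical, with the default values $\gamma_1 = 3$ and $\gamma_2 = 2$ proposed above.

import numpy as np

def pcic(sigma2_hat, n_knots, K, m, N, gamma1=3.0, gamma2=2.0):
    # Eq. (8.2): sigma2_hat is the average residual error, n_knots is p,
    # K is the number of inferred convexity change points, m the spline degree.
    numer = 1.0 + gamma2 * K * np.log(N) / N
    denom = 1.0 - (gamma1 * n_knots + m) / N
    return sigma2_hat * numer / denom

# A knot that adds an inflection point is accepted only if the residual
# error drops enough to pay the ln(N) change point penalty:
print(pcic(0.010, 6, 2, 3, 200) < pcic(0.011, 5, 1, 3, 200))   # True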

9 Discussion

We have considered robust smoothing splines under piecewise convex constraints. We generalized the standard representation and smoothness results to nonlinear splines using convex analysis. When the same derivative is both constrained and penalized ($\ell = m$), the dual problem is finite dimensional. We have sketched a derivation of the effective halfwidth of a robust spline. By robustifying the functional, the effective halfwidth (5.3) for the equivalent kernel smoother scales as $|f^{(m)}(t)|^{(p-2)/2m}$. The halfwidth that minimizes the mean square error scales as $h_{\text{MSE}} \sim [\, N |f^{(m)}|^2 \,]^{-1/(2m+1)}$. Thus robust splines adjust the halfwidth, but not as much as the asymptotically optimal local halfwidth would.

When the number of convexity change points is known, their locations may be estimated by minimizing the penalized likelihood. For $\ell = m$, we have existence, but not necessarily

uniqueness. When the number of change points is unknown, we add an additional penalty term proportional to $\ln(N)$ times the number of change points. This "piecewise convex information criterion" is suitable for adaptive knot placement in regression splines. We conjecture that this estimator is asymptotically consistent in the number of change points.

In [11], we propose an alternative approach. Step 1: estimate small nonoverlapping regions where the change points probably occur. Step 2: perform a constrained fit with $\hat{f}^{(\ell+1)}$ constrained to have a single sign in each of these small regions. For the pilot estimator approach, we are able to prove asymptotically optimal convergence rates for the mean integrated square error. Furthermore, the pilot estimate greatly reduces the numerical work by determining the regions where the convexity constraints are likely to be active. In the future, we plan to compare the pilot estimator approach with adaptive knot placement under the piecewise convex information criterion.

Acknowledgments: Work funded by U.S. Dept. of Energy Grant DE-FG02-86ER53223.

References

[1] J.-P. Aubin and I. Ekeland, "Applied Nonlinear Analysis", John Wiley, New York, 1984.

[2] F. Deutsch, V. A. Ubhaya, and Y. Xu, Dual cones, constrained n-convex $L_p$ approximation and perfect splines, J. Approx. Th. 80 (1995), 180-203.

[3] I. Ekeland and R. Temam, "Convex Analysis and Variational Problems", North Holland, Amsterdam, 1976.

[4] J. Friedman, Multivariate adaptive regression splines, Ann. Stat. 19 (1991), 1-141.

[5] W. Li, D. Naik, and J. Swetits, A data smoothing technique for piecewise convex/concave curves, to appear in SIAM J. Sci. Comp.

[6] M. Mächler, Variational solution of penalized likelihood problems and smooth curve estimation, to appear in Ann. Stat.

[7] E. Mammen, Nonparametric regression under qualitative smoothness assumptions, Ann. Stat. 19 (1991), 741-759.

[8] C. A. Micchelli, P. W. Smith, J. Swetits, and J. Ward, Constrained $L_p$ approximation, Construct. Approx. 1 (1985), 93-102.

[9] C. A. Micchelli and F. Utreras, Smoothing and interpolation in a convex set of Hilbert space, SIAM J. Sci. Stat. Comp. 9 (1988), 728-746.

[10] C. A. Micchelli and F. Utreras, Smoothing and interpolation in a convex set of a Hilbert space: II, the semi-norm case, Math. Model. & Num. Anal. 25 (1991), 425-440.

[11] K. S. Riedel, Piecewise convex function estimation: pilot estimators, submitted for publication.

[12] B. W. Silverman, Spline smoothing: the equivalent variable kernel method, Ann. Stat. 12 (1984), 898-916.

[13] F. Utreras, Smoothing noisy data under monotonicity constraints: existence, characterization and convergence rates, Numerische Math. 47 (1985), 611-625.

[14] M. Villalobos and G. Wahba, Inequality-constrained multivariate smoothing splines with application to the estimation of posterior probabilities, J. Amer. Stat. Assoc. 82 (1987), 239-248.

[15] G. Wahba, "Spline Models for Observational Data", SIAM, Philadelphia, PA, 1990.
