The Weierstrass Approximation Theorem and Large Deviations Henryk Gzyl; José Luis Palacios The American Mathematical Monthly, Vol. 104, No. 7. (Aug. - Sep., 1997), pp. 650-653. Stable URL: http://links.jstor.org/sici?sici=0002-9890%28199708%2F09%29104%3A7%3C650%3ATWATAL%3E2.0.CO%3B2-Q The American Mathematical Monthly is currently published by Mathematical Association of America.
The Weierstrass Approximation Theorem and Large Deviations
Henryk Gzyl and José Luis Palacios
Bernstein's proof (1912) of the Weierstrass approximation theorem, which states that the set of real polynomials over $[0, 1]$ is dense in the space of all continuous real functions on $[0, 1]$, is a classic application of probability theory to real analysis that finds its way into many textbooks ([1] and [2]) and journals [3]. All that is invoked in Bernstein's proof (at least as presented in [1] and [3]) is Chebyshev's inequality, and if the argument is applied to a function satisfying a Lipschitz condition, the rate of convergence of the Bernstein polynomials to the function can be shown to be at least of order $1/n^{1/3}$. If instead of Chebyshev's inequality we use another probabilistic tool very much in vogue nowadays, the theory of large deviations, we can prove that the rate of convergence is at least of order $\ln^{1/2} n / n^{1/2}$. All the material used here concerning large deviations is elementary and can be found in [1].

Let $f$ be a real function on $[0, 1]$ that satisfies a Lipschitz condition, i.e., there is a constant $C$ such that for all $x, y \in [0, 1]$
$$|f(x) - f(y)| \le C|x - y|.$$
Then we have the following:
Theorem 1 (Weierstrass approximation theorem). For $f$ satisfying a Lipschitz condition, there is a sequence of polynomials $p_n(x)$, where the degree of $p_n(x)$ is $n$, and a constant $K$, which depends on $f$, such that
$$\|p_n - f\| \le K\,\frac{\ln^{1/2} n}{n^{1/2}}.$$
Here $\|\cdot\|$ denotes the sup norm. Since the function $f$ is Lipschitz, it is uniformly continuous and bounded by a constant, say $M$, so that $\|f\| \le M$. In order to prove the theorem we need the following lemma, taken almost verbatim from [1] (Corollary A.7) and included for completeness:

Lemma 1. For a binomial random variable $B(n, x)$ with $n$ independent trials and probability of success $x$ for each of them, and $a > 0$ arbitrary, we have
$$P(|B(n, x) - nx| > a) \le 2e^{-2a^2/n}.$$
Proof. Let $X_1, \ldots, X_n$ be independent and identically distributed random variables with
$$P(X_i = 1 - x) = x, \qquad P(X_i = -x) = 1 - x,$$
and let $X = X_1 + \cdots + X_n$. Clearly $X$ has distribution $B(n, x) - nx$. By symmetry (the same argument applies to $-X$, with $x$ replaced by $1 - x$), it is enough to prove the one-sided inequality
$$P(X > a) \le e^{-2a^2/n}.$$
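Step 1. For all reals $\alpha$, $\beta$ with $|\alpha| \le 1$, we have
$$\cosh(\beta) + \alpha \sinh(\beta) \le e^{\beta^2/2 + \alpha\beta}. \tag{2}$$

Proof. This is immediate if $\alpha = 1$ or $\alpha = -1$ or $|\beta| \ge 100$. If (2) were false, the function
$$f(\alpha, \beta) = \cosh(\beta) + \alpha \sinh(\beta) - e^{\beta^2/2 + \alpha\beta}$$
would assume a positive global maximum in the interior of the rectangle
$$R = \{(\alpha, \beta) : |\alpha| \le 1,\ |\beta| \le 100\}.$$
Setting partial derivatives equal to 0, we find
$$\sinh(\beta) + \alpha \cosh(\beta) = (\alpha + \beta)\, e^{\beta^2/2 + \alpha\beta}, \qquad \sinh(\beta) = \beta\, e^{\beta^2/2 + \alpha\beta},$$
and thus $\tanh \beta = \beta$, which implies $\beta = 0$. But $f(\alpha, 0) = 0$ for all $\alpha$, a contradiction.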
Step 2. For all $\theta \in [0, 1]$ and all $\lambda$,
$$\theta e^{\lambda(1 - \theta)} + (1 - \theta) e^{-\lambda\theta} \le e^{\lambda^2/8}. \tag{3}$$

Proof. Setting $\theta = (1 + \alpha)/2$ and $\lambda = 2\beta$, (3) reduces to (2).
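Inequalities (2) and (3) can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: the function names, the grid resolution, and the cutoff `beta_max = 5.0` are illustrative assumptions, and a finite grid can only illustrate the inequalities, not prove them.

```python
import math


def step1_gap(alpha: float, beta: float) -> float:
    """Right side minus left side of inequality (2); expected to be >= 0."""
    lhs = math.cosh(beta) + alpha * math.sinh(beta)
    rhs = math.exp(beta * beta / 2 + alpha * beta)
    return rhs - lhs


def step2_gap(theta: float, lam: float) -> float:
    """Right side minus left side of inequality (3); expected to be >= 0."""
    lhs = theta * math.exp(lam * (1 - theta)) + (1 - theta) * math.exp(-lam * theta)
    rhs = math.exp(lam * lam / 8)
    return rhs - lhs


def min_gaps(steps: int = 40, beta_max: float = 5.0) -> tuple[float, float]:
    """Smallest gap of (2) and of (3) over a finite grid (illustrative ranges)."""
    units = [i / steps for i in range(steps + 1)]      # points in [0, 1]
    alphas = [2 * u - 1 for u in units]                # points in [-1, 1]
    betas = [beta_max * (2 * u - 1) for u in units]    # points in [-beta_max, beta_max]
    gap2 = min(step1_gap(a, b) for a in alphas for b in betas)
    # lambda = 2 * beta mirrors the substitution used in the proof of Step 2.
    gap3 = min(step2_gap(t, 2 * b) for t in units for b in betas)
    return gap2, gap3


if __name__ == "__main__":
    gap2, gap3 = min_gaps()
    print(f"smallest gap in (2) on the grid: {gap2:.3e}")
    print(f"smallest gap in (3) on the grid: {gap3:.3e}")
```

Both inequalities hold with equality at $\beta = 0$ (respectively $\lambda = 0$), so the smallest gaps reported on the grid are zero up to floating-point rounding.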
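Step 3. Let, for the moment, $\lambda > 0$ be arbitrary and let $E[\cdot]$ denote "expected value." Then
$$E[e^{\lambda X_i}] = x e^{\lambda(1 - x)} + (1 - x) e^{-\lambda x} \le e^{\lambda^2/8}$$
by Step 2. Thus
$$E[e^{\lambda X}] = \prod_{i=1}^{n} E[e^{\lambda X_i}] \le e^{\lambda^2 n/8}.$$
Applying Markov's inequality,
$$P(X > a) = P(e^{\lambda X} > e^{\lambda a}) \le \frac{E[e^{\lambda X}]}{e^{\lambda a}} \le e^{\lambda^2 n/8 - \lambda a}.$$
We set $\lambda = 4a/n$ to optimize the inequality: $P(X > a) \le e^{-2a^2/n}$, as claimed.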
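As a sanity check on the lemma, the exact two-sided binomial tail can be compared with the bound $2e^{-2a^2/n}$. The sketch below is illustrative only: the helper names `two_sided_tail` and `lemma_bound` and the sample values of $n$, $x$, and $a$ are assumptions, not taken from the paper.

```python
import math


def two_sided_tail(n: int, x: float, a: float) -> float:
    """Exact P(|B(n, x) - n*x| > a), summed from the binomial mass function."""
    return sum(
        math.comb(n, i) * x**i * (1 - x) ** (n - i)
        for i in range(n + 1)
        if abs(i - n * x) > a
    )


def lemma_bound(n: int, a: float) -> float:
    """The large deviation bound 2 * exp(-2 a^2 / n) from Lemma 1."""
    return 2 * math.exp(-2 * a * a / n)


if __name__ == "__main__":
    # Illustrative parameter choices; any n >= 1, 0 <= x <= 1, a > 0 would do.
    for n in (10, 50):
        for x in (0.1, 0.5, 0.9):
            for a in (1.0, 3.0, 5.0):
                tail = two_sided_tail(n, x, a)
                bound = lemma_bound(n, a)
                print(f"n={n:3d} x={x:.1f} a={a:.1f} tail={tail:.6f} bound={bound:.6f}")
```

The bound does not depend on $x$, which is what makes it usable uniformly over $[0, 1]$ in the proof of the theorem.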
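Proof of the theorem. Define the Bernstein polynomials
$$p_n(x) = \sum_{i=0}^{n} f\!\left(\frac{i}{n}\right) \binom{n}{i} x^i (1 - x)^{n-i}.$$
Then, since $\sum_{i=0}^{n} \binom{n}{i} x^i (1 - x)^{n-i} = 1$, and because of the lemma, we have, for every $a > 0$,
$$|p_n(x) - f(x)| \le \sum_{i=0}^{n} \left| f\!\left(\frac{i}{n}\right) - f(x) \right| \binom{n}{i} x^i (1 - x)^{n-i} \le \frac{aC}{n} + 2M e^{-2a^2/n}. \tag{4}$$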
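Now we optimize the parameter $a$ in terms of $n$. We consider the function
$$F(a) = \frac{aC}{n} + 2M e^{-2a^2/n},$$
set $F'(a) = 0$, and get the exponential equation
$$a\, e^{-2a^2/n} = \frac{C}{8M}.$$
While a precise solution for this exponential equation is unavailable, we are led to the asymptotic solution
$$a = \frac{1}{2}\, n^{1/2} \ln^{1/2} n.$$
Replacing now in (4) $a = \frac{1}{2} n^{1/2} \ln^{1/2} n$, we obtain
$$|p_n(x) - f(x)| \le \frac{C}{2}\, \frac{\ln^{1/2} n}{n^{1/2}} + \frac{2M}{n^{1/2}},$$
so, since $\ln n \ge 1$ for $n \ge 3$, $K = C/2 + 2M$ works for $n \ge 3$.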
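The statement of the theorem can be illustrated numerically. The sketch below evaluates the Bernstein polynomials of one Lipschitz function and compares the largest error on a finite grid with $K \ln^{1/2} n / n^{1/2}$, $K = C/2 + 2M$. The test function $|x - 1/2|$ (for which $C = 1$ and $M = 1/2$), the grid size, and all function names are illustrative assumptions; the grid maximum only approximates the sup norm from below.

```python
import math


def bernstein(f, n: int, x: float) -> float:
    """Value at x of the degree-n Bernstein polynomial of f on [0, 1]."""
    return sum(
        f(i / n) * math.comb(n, i) * x**i * (1 - x) ** (n - i)
        for i in range(n + 1)
    )


def sup_error(f, n: int, grid: int = 200) -> float:
    """Largest |p_n(x) - f(x)| over an equally spaced grid on [0, 1]."""
    return max(abs(bernstein(f, n, j / grid) - f(j / grid)) for j in range(grid + 1))


def theorem_bound(n: int, C: float, M: float) -> float:
    """K * sqrt(ln n / n) with K = C/2 + 2M, as in the theorem (n >= 3)."""
    return (C / 2 + 2 * M) * math.sqrt(math.log(n) / n)


def abs_shift(x: float) -> float:
    """Hypothetical test function: Lipschitz with C = 1, bounded by M = 1/2."""
    return abs(x - 0.5)


if __name__ == "__main__":
    for n in (3, 10, 50, 100):
        err = sup_error(abs_shift, n)
        bound = theorem_bound(n, C=1.0, M=0.5)
        print(f"n={n:4d} grid sup error={err:.5f} bound={bound:.5f}")
```

For this particular function the observed error is well below the bound, which is consistent with the theorem giving a guaranteed rate for every Lipschitz function rather than a sharp estimate for each one.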
The classic probabilistic proof of the Weierstrass approximation theorem, when applied to a Lipschitz function, yields instead of equation (4) an expression $G(a)$, equation (6), whose first summand, $aC/n$, follows from the Lipschitz condition and whose second summand, of order $n/a^2$, is due to Chebyshev's inequality. Optimizing $G(a)$ (a much easier task than optimizing $F(a)$ above) yields $a \sim n^{2/3}$, and therefore inserting $a = n^{2/3}$ into (6) (this is the choice of $a$ in [1], by the way) yields a bound of order $1/n^{1/3}$, a weaker result, which justifies the effort of using the large deviation inequality.

How does our rate of convergence compare with the ones found in standard textbooks on approximation theory? In [4], for instance, it is mentioned that if $f(x) = x^2$ and $p_n(x)$ is the Bernstein polynomial for this function, then $\|p_n - f\| = 1/(4n)$. In general, this rate cannot be expected for all functions, and it is an exercise in [4] to prove that if $f$ is twice continuously differentiable, then the error of the approximation satisfies the bound
$$\|p_n - f\| \le \frac{\|f''\|}{8n}.$$
This rate is better than ours, but the assumption that $f$ be twice continuously differentiable is much more restrictive than our Lipschitz condition.

REFERENCES
1. N. Alon and J. Spencer, The Probabilistic Method, Wiley, New York, 1992.
2. K. L. Chung, A Course in Probability Theory, 2nd ed., Academic Press, New York, 1974.
3. K. M. Levasseur, A probabilistic proof of the Weierstrass approximation theorem, this MONTHLY 91 (1984) 249-250.
4. M. J. D. Powell, Approximation Theory and Methods, Cambridge University Press, Cambridge, 1981.
CESMa Universidad Simón Bolívar Apartado 89,000 Caracas, Venezuela jopala@cesma.usb.ve
Facultad de Ciencias Universidad Central de Venezuela Caracas, Venezuela
From the MONTHLY, Volume 4, 1897: A Brief Introduction to the Infinitesimal Calculus. Designed Especially to Aid in Reading Mathematical Economics and Statics. By Irving Fisher, Ph.D., Assistant Professor of Political Science in Yale University, Co-author of Phillips and Fisher's Elements of Geometry. 12mo. Cloth. 84 pages. Price, 75 cents. New York and London: The Macmillan Co. This little work on the Calculus will be received with joy by a great army of students, teachers, and professors, who have lacked the time and courage to attack some of the more exhaustive works on the subject yet felt the need of a knowledge of the Calculus in order to enable them to read with intelligence the highest authorities on economic as well as other subjects. Dr. Fisher has prepared this little work with a special view of the needs of this class of students. Any one with a clear mind can very easily read and understand every sentence in this book. There is no metaphysical speculation nor obscure statements made in establishing its first principles. pp. 261-262
65.
Proposed by GEORGE LILLEY, Ph.D., LL.D., Portland, Oregon.
A string is wound spirally 100 times around a cone 100 feet high and 2 feet in diameter at the base. Through what distance will a duck swim in unwinding the string keeping it taut at all times, the cone standing on its base and at right angles to the surface of the water? p. 62
66. Proposed by J. K. ELLWOOD, A.M., Principal of Colfax School, Pittsburgh, Pennsylvania Around the top of a conical frustum-base 5 feet, top 1 foot, altitude 100 feet-is wound a rope 100 feet long and 1 inch thick. It is unwound by a hawk flying in one plane. How far does Mr. Hawk fly? p. 62