NORMALIZED CONVERGENCE OF RANDOM VARIABLES

UDC 519.21

V. I. Norkin

The properties of a new type of convergence of sequences of random variables are considered. Normalized convergence occupies an intermediate position between convergence in the mean and convergence in probability. It is relevant for studying the rate of convergence of stochastic iterative optimization algorithms and statistical methods of stochastic programming.

The main types of convergence used in probability theory are almost sure convergence, convergence in the mean, convergence in probability, and convergence in distribution. This paper introduces a new type of convergence, called normalized convergence, which occupies an intermediate position between convergence in the mean and convergence in probability. Normalized convergence is preserved under Hölder transformations of random variables. It is also preserved under addition, multiplication, division, and Cartesian product of normalized-convergent sequences of random variables. Every limit theorem of probability theory implies normalized convergence of the corresponding random variables. A Cauchy-type test exists for normalized convergence. The concept of normalized convergence arose in connection with studies of the rate of convergence of statistical methods of stochastic programming (see [1-4]). Particular cases of normalized convergence have been considered in [5-8] for the description of the rate of convergence of stochastic iterative optimization algorithms.

DEFINITION OF NORMALIZED CONVERGENCE

Definition 1. Let $(\Omega, \Sigma, P)$ be a probability space and $X$ a separable metric space with the distance function $\rho(\cdot, \cdot)$. We say that the sequence of random variables $\xi_n: \Omega \to X$, $n = 1, 2, \ldots$, is normalized-convergent to the random variable $\xi: \Omega \to X$ with rate (sequence of rates) $1/\nu_n$ and distribution $F(t)$ ($\xi_n \xrightarrow{n} \xi$) if there exist a numerical sequence $\nu_n \to +\infty$, $0 < \nu_n \le +\infty$, and a distribution function $F: R^1 \to [0, 1]$ such that

$$\liminf_{n \to +\infty} P\{\nu_n \rho(\xi_n, \xi) < t\} \ge F(t) \quad \forall t \in R^1, \tag{1}$$

where lim inf is the lower limit of a numerical sequence. The separability of $X$ is needed only to ensure the measurability of $\rho(\xi_n, \xi)$ (see [9]). If in the definition of normalized convergence $F(t) \equiv 1$ starting with some $t = T$, then normalized convergence of $\xi_n$ to $\xi$ is equivalent to the assertion

$$\lim_{n \to \infty} P\{\nu_n \rho(\xi_n, \xi) < T\} = 1. \tag{2}$$

Indeed, setting $t = T$ in (1), we obtain (2); conversely, defining

$$F(t) = \begin{cases} 1, & t > T, \\ 0, & t \le T, \end{cases}$$

we obtain (1) from (2). If in (1) for any $\varepsilon > 0$ and all $n \ge N(\varepsilon)$ we have $P\{\nu_n \rho(\xi_n, \xi) < t\} \ge F(t) - \varepsilon$ $\forall t \in R^1$, then we have uniform normalized convergence of $\xi_n$ to $\xi$ ($\xi_n \xrightarrow{u.n.} \xi$). An equivalent definition of uniform normalized convergence is provided by the following theorem.

THEOREM 1 [3]. $\xi_n \xrightarrow{u.n.} \xi$ with rate $1/\nu_n$ and distribution $F(t)$ if and only if there exists a univariate distribution function $\Phi(t)$ such that for all $n$ and $t \in R^1$ we have $P\{\nu_n \rho(\xi_n, \xi) < t\} \ge \Phi(t)$.

Let us elucidate the geometrical meaning of uniform normalized convergence. In the linear space $Z$ of the random variables $\xi$ defined on the probability space $(\Omega, \Sigma, P)$ with values in a Banach space $X$, consider a system of neighborhoods of zero of the form $U_F = \{\xi : P\{\|\xi\| < t\} \ge F(t)\ \forall t \in R^1\}$, where $F(t)$ is any univariate distribution function. This system of neighborhoods forms a base of some topology $\tau$ in $Z$. Fix some neighborhood of zero $U_F \in \tau$ and consider the subspace $(Z, \{\lambda U_F\}_{\lambda > 0})$ of the original linear topological space. Recall that a sequence of elements of a linear topological space converges to zero if any neighborhood of zero eventually contains all the points of this sequence.

THEOREM 2 [3]. Uniform normalized convergence of $\xi_n$ to zero (with rate $1/\nu_n$ and distribution $F(t)$) is equivalent to convergence of $\nu_n \xi_n$ to zero in the subspace $(Z, \{\lambda U_F\}_{\lambda > 0})$ of the linear topological space of random variables $(Z, \tau)$.
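As noted in the introduction, every limit theorem yields normalized convergence; the central limit theorem gives perhaps the simplest instance of Definition 1. The sketch below is an illustration only (the choice of Exp(1) summands, the sample sizes, and the thresholds are all assumptions, not from the paper): it takes $\xi_n$ to be the mean of $n$ i.i.d. Exp(1) variables, $\xi = 1$, $\nu_n = \sqrt{n}$, and $F(t) = P\{|N(0,1)| < t\}$, and estimates the left-hand side of (1) by Monte Carlo.

```python
# Monte Carlo probe of Definition 1 (illustration, not from the paper).
# xi_n = mean of n i.i.d. Exp(1) variables, xi = 1, rho(x, y) = |x - y|.
# By the CLT, (1) holds with nu_n = sqrt(n) and F(t) = P{|N(0,1)| < t}.
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def lhs(n, t, trials=20_000):
    """Estimate P{ sqrt(n) * |mean_n - 1| < t } by simulation."""
    means = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
    return np.mean(sqrt(n) * np.abs(means - 1.0) < t)

def F(t):
    """Limiting distribution F(t) = P{|N(0,1)| < t} = erf(t / sqrt(2))."""
    return erf(t / sqrt(2.0)) if t > 0 else 0.0

for t in (0.5, 1.0, 2.0):
    for n in (25, 400):
        print(f"t={t:3.1f}  n={n:4d}  P~{lhs(n, t):.3f}  F(t)={F(t):.3f}")
```

As $n$ grows, the empirical probabilities settle near $F(t)$, so the lower limit in (1) is attained with equality in this example.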

RELATIONSHIP WITH OTHER TYPES OF CONVERGENCE

Convergence in the mean implies normalized convergence.

THEOREM 3 [1]. If $\lim_{n \to \infty} M\rho(\xi_n, \xi) = 0$, then (1) is satisfied with

$$\nu_n = 1/M\rho(\xi_n, \xi), \qquad F(t) = \begin{cases} 1 - 1/t, & t > 1, \\ 0, & t \le 1, \end{cases}$$

where $M$ is the expectation symbol.

Proof. For $\nu_n = 1/M\rho(\xi_n, \xi)$ we have $\lim_{n \to \infty} \nu_n = +\infty$ and $\nu_n M\rho(\xi_n, \xi) \le 1$, with the convention $+\infty \cdot 0 = 0$. By the Chebyshev inequality, we have $P\{\nu_n \rho(\xi_n, \xi) \ge t\} \le \nu_n M\rho(\xi_n, \xi)/t \le 1/t$ for $\nu_n < +\infty$. If $\nu_n = +\infty$, then $M\rho(\xi_n, \xi) = 0$ and $P\{\nu_n \rho(\xi_n, \xi) \ge t > 0\} = P\{\rho(\xi_n, \xi) > 0\} = 0$. Therefore, in either case $P\{\nu_n \rho(\xi_n, \xi) \ge t\} \le 1/t$ for $t > 0$ and

$$P\{\nu_n \rho(\xi_n, \xi) < t\} \ge F(t) = \begin{cases} 1 - 1/t, & t > 1, \\ 0, & t \le 1. \end{cases}$$

Q.E.D.
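The proof's Chebyshev bound is easy to check numerically. The sketch below is an illustration, not code from the paper; drawing $\rho(\xi_n, \xi)$ from an exponential law with mean $1/n$ is an arbitrary assumption that makes $\nu_n = 1/M\rho(\xi_n, \xi) = n$ explicit.

```python
# Numerical check of Theorem 3 (illustration; the distance law is an assumption).
# Take rho(xi_n, xi) ~ Exp with mean 1/n, so M rho = 1/n and nu_n = n.
# Here P{nu_n * rho < t} = 1 - exp(-t), which dominates F(t) = 1 - 1/t.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 1000, 200_000
rho = rng.exponential(1.0 / n, size=trials)   # simulated distances rho(xi_n, xi)
nu = n                                        # nu_n = 1 / (M rho)

for t in (1.5, 2.0, 5.0, 10.0):
    p = np.mean(nu * rho < t)                 # empirical P{ nu_n rho < t }
    print(f"t={t:5.1f}  P={p:.4f}  >=  F(t)={1.0 - 1.0 / t:.4f}")
```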

This proves normalized convergence of the programmed-step method. For the method with a constant step $\rho_n = h$ [7], $N(\varepsilon, \delta) = C_1/[h(C_2\varepsilon\delta - h)] + 1$, where $C_1$, $C_2$ are some positive constants. For

e" =

t

h + ( k _ l)h

7-'

where k > 1, t > 1 are arbitrary numerical parameters, the corresponding index is N(t', ~') = k. Substituting these t', ~', and N(e', fi') in (3), we obtain

P

2C~ h+C,/((k--1)h)

}

1

The relationship between normalized convergence and the classical types of convergence can be summarized by the chain of implications

$$\rho(\xi_n, \xi) \xrightarrow{\text{a.s.}} 0 \;\Rightarrow\; \rho(\xi_n, \xi) \xrightarrow{P} 0 \;\Rightarrow\; \xi_n \xrightarrow{n} \xi \;\Rightarrow\; \nu_n^{\varepsilon}(\xi_n - \xi) \xrightarrow{P} 0 \;\Rightarrow\; \nu_n^{\varepsilon}(\xi_n - \xi) \xrightarrow{D} 0,$$

where $\varepsilon$ is an arbitrary number, $0 \le \varepsilon < 1$. Here the first and the last implications are well-known [9], the second implication follows from Theorem 5, and the third implication follows from Theorem 4.
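The third implication can be watched directly: $\nu_n^{\varepsilon}\rho(\xi_n, \xi) = \nu_n^{\varepsilon - 1}\,\nu_n\rho(\xi_n, \xi)$, and the factor $\nu_n^{\varepsilon - 1} \to 0$ kills the stochastically bounded term $\nu_n\rho(\xi_n, \xi)$. A minimal sketch (illustration only; the rate $\nu_n = \sqrt{n}$ and the normal distance law are assumptions, not from the paper):

```python
# Illustration: normalized convergence forces nu_n^eps * rho -> 0 in probability
# for 0 <= eps < 1, because nu_n^eps * rho = nu_n^(eps - 1) * (nu_n * rho)
# and nu_n * rho stays stochastically bounded.  All choices here are assumptions.
import numpy as np

rng = np.random.default_rng(2)
eps, delta, trials = 0.5, 0.1, 100_000

for n in (10, 1000, 100_000):
    nu = np.sqrt(n)                                 # rate nu_n = sqrt(n)
    rho = np.abs(rng.standard_normal(trials)) / nu  # so nu_n * rho ~ |N(0,1)|
    p = np.mean(nu**eps * rho >= delta)             # P{ nu_n^eps * rho >= delta }
    print(f"n={n:7d}  P(nu^eps * rho >= {delta}) = {p:.4f}")
```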

CALCULUS OF NORMALIZED-CONVERGENT SEQUENCES OF RANDOM VARIABLES

Normalized convergence is preserved under Hölder transformations of random variables.

THEOREM 7. Let $(\Omega, \Sigma, P)$ be a probability space, $X$ and $Y$ separable metric spaces with distance functions $\rho_X(\cdot, \cdot)$ and $\rho_Y(\cdot, \cdot)$, respectively. Assume that the sequence of random variables $\xi_n: \Omega \to X$ is normalized-convergent to the random variable $\xi: \Omega \to G \subset X$ with rate $1/\nu_n$ and distribution $F(t)$. Suppose that the mapping $y: X \to Y$ is Hölder in some $\varepsilon$-neighborhood of the set $G$; more precisely, $\rho_Y(y(x_1), y(x_2)) \le L_\varepsilon \rho_X^\alpha(x_1, x_2)$, $\alpha > 0$, $L_\varepsilon > 0$, for all $x_2 \in G$ and all $x_1$ from the $\varepsilon$-neighborhood of $G$. Then the sequence $y(\xi_n)$ is normalized-convergent to $y(\xi)$ with rate $1/\nu_n^\alpha$ and distribution $F((t/L)^{1/\alpha})$, where $L = \lim_{\varepsilon \to +0} \inf_{\varepsilon' \ge \varepsilon} L_{\varepsilon'}$.

Proof. Let

$$\rho(x, G) = \inf_{z \in G} \rho_X(x, z), \qquad \bar{L}_\varepsilon = \inf_{\varepsilon' \ge \varepsilon} L_{\varepsilon'}.$$

The mapping $y(x)$ is Hölder in the $\varepsilon$-neighborhood of $G$ in the above sense with the constant $\bar{L}_\varepsilon$. We have the inequalities

$$P\{\nu_n^\alpha \rho_Y(y(\xi_n), y(\xi)) < t\} \ge P\{\bar{L}_\varepsilon (\nu_n \rho_X(\xi_n, \xi))^\alpha < t,\ \rho(\xi_n, G) \le \varepsilon\} \ge P\{\nu_n \rho_X(\xi_n, \xi) < (t/\bar{L}_\varepsilon)^{1/\alpha}\} - P\{\rho(\xi_n, G) > \varepsilon\}.$$

Since $\nu_n \to +\infty$, the last probability tends to zero; passing to the lower limit as $n \to \infty$ and then letting $\varepsilon \to +0$, we obtain $\liminf_{n \to \infty} P\{\nu_n^\alpha \rho_Y(y(\xi_n), y(\xi)) < t\} \ge F((t/L)^{1/\alpha})$. Q.E.D.
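Theorem 7 can also be observed numerically. In the sketch below (illustration only; the choices $\xi = 0$, $\xi_n = |Z|/n$ with $Z \sim N(0,1)$, and $y(x) = \sqrt{x}$ are assumptions), $y$ is Hölder with $\alpha = 1/2$ and $L = 1$, since $|\sqrt{a} - \sqrt{b}| \le \sqrt{|a - b|}$; the theorem then predicts normalized convergence of $y(\xi_n)$ with rate $1/\sqrt{n}$ and distribution $F(t^2)$, where $F(t) = P\{|Z| < t\}$.

```python
# Illustration of Theorem 7 (all distributional choices are assumptions).
# xi = 0, xi_n = |Z|/n with Z ~ N(0,1):  xi_n ->n 0 with rate 1/n and
# F(t) = P{|Z| < t}.  y(x) = sqrt(x) is Hoelder with alpha = 1/2, L = 1,
# so Theorem 7 predicts y(xi_n) ->n 0 with rate 1/n^(1/2), distribution F(t^2).
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(3)
n, trials = 10_000, 200_000
rho_x = np.abs(rng.standard_normal(trials)) / n   # rho_X(xi_n, xi) = xi_n
rho_y = np.sqrt(rho_x)                            # rho_Y(y(xi_n), y(xi))

F = lambda t: erf(t / sqrt(2.0)) if t > 0 else 0.0
for t in (0.5, 1.0, 1.5):
    p = np.mean(sqrt(n) * rho_y < t)              # nu_n^alpha = n^(1/2)
    print(f"t={t:3.1f}  P={p:.4f}  F(t^2)={F(t * t):.4f}")
```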

THEOREM 10. If $(\xi_n, \eta_n) \xrightarrow{n} (\xi, \eta)$ with rate $1/\nu_n$ and distribution $F(t)$, then $\xi_n \xrightarrow{n} \xi$ and $\eta_n \xrightarrow{n} \eta$ with the same rate and the same distribution.

Proof. We have the inequality $P\{\nu_n \rho_\xi(\xi_n, \xi) < t\} \ge P\{\nu_n(\rho_\xi(\xi_n, \xi) + \rho_\eta(\eta_n, \eta)) < t\}$. Therefore

$$\liminf_{n \to \infty} P\{\nu_n \rho_\xi(\xi_n, \xi) < t\} \ge F(t).$$

Q.E.D.

THEOREM 11. Assume that in the separable metric spaces $X_\xi$ and $X_\eta$ we have $\xi_n \xrightarrow{n} \xi$ and $\eta_n \xrightarrow{n} \eta$ with rates $1/\nu_n$, $1/\mu_n$ and distributions $F_\xi(t)$, $F_\eta(t)$, respectively. Then in the metric space $X = X_\xi \times X_\eta$ with the distance $\rho(\cdot, \cdot) = \rho_\xi(\cdot, \cdot) + \rho_\eta(\cdot, \cdot)$ we have $(\xi_n, \eta_n) \xrightarrow{n} (\xi, \eta)$ with rate $1/\lambda_n = 1/\min(\nu_n^\varepsilon, \mu_n^\varepsilon)$, $0 < \varepsilon < 1$, and distribution $F(t) = \max(F_\xi(t), F_\eta(t))$.

Proof. Let $x_n = \rho_\xi(\xi_n, \xi)$ and $y_n = \rho_\eta(\eta_n, \eta)$. By assumption, $x_n \xrightarrow{n} 0$ and $y_n \xrightarrow{n} 0$ with rates $1/\nu_n$, $1/\mu_n$ and distributions $F_\xi(t)$, $F_\eta(t)$, respectively. By Theorem 9, $x_n + y_n \xrightarrow{n} 0$ with rate $1/\lambda_n$ and distribution $F(t)$. Q.E.D.

The following propositions show that linear combinations, products, and quotients of normalized-convergent sequences of random variables are normalized-convergent sequences.

THEOREM 12. If in the Banach space $X$ we have $\xi_n \xrightarrow{n} \xi$ and $\eta_n \xrightarrow{n} \eta$ with rates $1/\nu_n$, $1/\mu_n$ and distributions $F_\xi$, $F_\eta$, respectively, then $\alpha\xi_n + \beta\eta_n \xrightarrow{n} \alpha\xi + \beta\eta$ with rate $1/\lambda_n = 1/\min(\nu_n^\varepsilon, \mu_n^\varepsilon)$, $0 < \varepsilon < 1$, and distribution $F(t) = \max(F_\xi(t/L), F_\eta(t/L))$, where $L = \max(|\alpha|, |\beta|)$, $\alpha$ and $\beta$ are numbers.


Proof. Consider the mapping $(x, y) \to \alpha x + \beta y$. It is Lipschitz in $X \times X$ with the constant $L = \max(|\alpha|, |\beta|)$, i.e., for any $(x_1, y_1) \in X \times X$ and $(x_2, y_2) \in X \times X$ we have

$$\|\alpha x_1 + \beta y_1 - \alpha x_2 - \beta y_2\| \le L(\|x_1 - x_2\| + \|y_1 - y_2\|).$$

By Theorem 11, $(\xi_n, \eta_n) \xrightarrow{n} (\xi, \eta)$ with rate $1/\lambda_n$ and distribution $F'(t) = \max(F_\xi(t), F_\eta(t))$. Now, since by Theorem 7 normalized convergence is preserved under Lipschitz mappings,

$$\liminf_{n \to \infty} P\{\lambda_n \|\alpha\xi_n + \beta\eta_n - \alpha\xi - \beta\eta\| < t\} \ge F'(t/L).$$

Q.E.D.

THEOREM 13. Assume that the sequences of numerical random variables $\xi_n \xrightarrow{n} \xi$ and $\eta_n \xrightarrow{n} \eta$ with rates $1/\nu_n$, $1/\mu_n$ and distributions $F_\xi$, $F_\eta$, respectively. Suppose that $\xi$ and $\eta$ are a.s. bounded, i.e., $|\xi| \le L$ and $|\eta| \le L$ a.s. Then $\xi_n \eta_n \xrightarrow{n} \xi\eta$ with rate $1/\lambda_n = 1/\min(\nu_n^\varepsilon, \mu_n^\varepsilon)$, $0 < \varepsilon < 1$, and distribution $F(t) = \max(F_\xi(t/L), F_\eta(t/L))$.

Proof. By Theorem 11, the pair $(\xi_n, \eta_n) \xrightarrow{n} (\xi, \eta)$ with rate $1/\lambda_n$ and distribution $F'(t) = \max(F_\xi(t), F_\eta(t))$. Consider the function $f(x, y) = xy$. It is Lipschitz in the domain $G_\delta = \{(x, y) \in R^2 : |x| \le L + \delta,\ |y| \le L + \delta\}$ with the Lipschitz constant $L + \delta$, i.e., for any $(x_1, y_1) \in G_\delta$ and $(x_2, y_2) \in G_\delta$ we have

$$|x_1 y_1 - x_2 y_2| \le (L + \delta)(|x_1 - x_2| + |y_1 - y_2|).$$

Since $(\xi, \eta)$ takes values in the set $G = \{(x, y) : |x| \le L,\ |y| \le L\}$ a.s., the claim follows from Theorem 7 (with $\alpha = 1$ and $L_\delta = L + \delta$), letting $\delta \to +0$. Q.E.D.
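These closure rules lend themselves to the same kind of numerical sanity check. The sketch below (illustration only; the uniform limits, the rates $\nu_n = \mu_n = n$, and $\varepsilon = 1/2$ are assumptions, not from the paper) probes Theorem 13 for the product of two normalized-convergent sequences with a.s. bounded limits.

```python
# Sanity check of Theorem 13 (illustration; all choices are assumptions).
# xi_n = xi + Z1/n, eta_n = eta + Z2/n with xi, eta ~ U[0,1], so |xi| <= 1,
# |eta| <= 1 a.s. (L = 1), nu_n = mu_n = n, F_xi(t) = F_eta(t) = P{|N(0,1)| < t}.
# With eps = 1/2, lambda_n = min(nu_n, mu_n)^eps = sqrt(n).
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(4)
n, trials, L = 1000, 200_000, 1.0
lam = sqrt(n)
xi, eta = rng.random(trials), rng.random(trials)
xi_n = xi + rng.standard_normal(trials) / n
eta_n = eta + rng.standard_normal(trials) / n

F = lambda t: erf(t / sqrt(2.0)) if t > 0 else 0.0
for t in (0.5, 1.0, 2.0):
    p = np.mean(lam * np.abs(xi_n * eta_n - xi * eta) < t)
    print(f"t={t:3.1f}  P={p:.4f}  >=  F(t/L)={F(t / L):.4f}")
```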

Passing to the limit, we get

$$\liminf_{n \to \infty} P\{\nu_n \rho(\xi_n, \xi) < t\} \ge \liminf_{n \to \infty} P\{\nu_n \rho(\xi_n, \xi_{n + p_n}) < t - \varepsilon\} \ge F(t - \varepsilon).$$

Since $\varepsilon$ is arbitrary and $F(t)$ is left-continuous, we finally have

$$\liminf_{n \to \infty} P\{\nu_n \rho(\xi_n, \xi) < t\} \ge F(t).$$

Q.E.D.

REFERENCES

1. Yu. M. Ermoliev and V. I. Norkin, Normalized Convergence in Stochastic Optimization, Preprint WP-89-091, IIASA, Laxenburg, Austria (1989).
2. V. I. Norkin, Stability of Stochastic Optimization Models and Statistical Methods of Stochastic Programming [in Russian], Preprint 89-53, Inst. Kibern. im. V. M. Glushkova AN UkrSSR, Kiev (1989).
3. Yu. M. Ermol'ev and V. I. Norkin, "Normalized convergence of random variables and its application," Kibernetika, No. 6, 89-93 (1990).
4. V. I. Norkin, "On conditions and rate of convergence of the empirical mean method in mathematical statistics and stochastic programming," Kibernetika, No. 2, 107-120 (1992).
5. V. G. Karmanov, Mathematical Programming [in Russian], Nauka, Moscow (1975).
6. B. T. Polyak, "Convergence and rate of convergence of iterative stochastic algorithms. I. General case," Avtomat. Telemekh., No. 12, 83-94 (1976).
7. M. V. Mikhalevich, "Rate of convergence bounds for a method of searching for the best element," Zh. Vychisl. Mat. Mat. Fiz., 26, No. 7, 994-1005 (1986).
8. A. V. Tsibakov, "Accuracy estimates for the empirical risk method," Prob. Peredachi Inform., 17, No. 1, 50-61 (1981).
9. P. Billingsley, Convergence of Probability Measures, Wiley, New York (1968).
10. R. Ranga Rao, "Relations between weak and uniform convergence of measures with applications," Ann. Math. Stat., 33, 659-680 (1962).