Estimating the Optimal Margins of Embeddings in Euclidean Half Spaces

Jürgen Forster, Niels Schmitt, Hans Ulrich Simon
Lehrstuhl Mathematik & Informatik, Fakultät für Mathematik,
Ruhr-Universität Bochum, 44780 Bochum, Germany
{forster,nschmitt,simon}@[email protected]

NeuroCOLT2 Technical Report Series NC2-TR-2001-094 April, 2000

Produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150. For more information see the NeuroCOLT website http://www.neurocolt.com or email [email protected]


Abstract

Concept classes can canonically be represented by matrices with entries $1$ and $-1$. We use the singular value decomposition of this matrix to determine the optimal margins of embeddings of the concept classes of singletons and of half intervals in homogeneous Euclidean half spaces. For these concept classes the singular value decomposition can be used to construct optimal embeddings, and also to prove the corresponding best possible upper bounds on the margin. We show that the optimal margin for embedding $n$ singletons is $\frac{n}{3n-4}$ and that the optimal margin for half intervals over $\{1,\ldots,n\}$ is $\frac{\pi}{2\ln n} + \Theta\bigl(\frac{1}{(\ln n)^2}\bigr)$. For the upper bounds on the margins we generalize a bound given in [6]. We also discuss the example of concept classes of monomials to point out limitations of our approach.


1 Introduction

Recently there has been a lot of interest in maximal margin classifiers. Learning algorithms that calculate the hyperplane with the largest margin on a sample and use this hyperplane to classify new instances have shown excellent empirical performance (see [5]). Often the instances are mapped (implicitly, when a kernel function is used) to some possibly high dimensional space before the hyperplane with maximal margin is calculated. If the norms of the instances are bounded and a hyperplane with large margin can be found, a bound on the VC-dimension can be applied (Vapnik [11]; [5], Theorem 4.16). A small VC-dimension means that a concept class can be learned with a small sample (Vapnik and Chervonenkis [12]; Blumer et al. [4]; [7], Theorem 3.3).

The success of maximal margin classifiers raises the question which concept classes can be embedded in half spaces with a large margin. For every concept class there is a trivial embedding into half spaces. Ben-David, Eiron and Simon [2] show that most concept classes cannot be embedded with a margin that is much larger than the trivial margin. They use counting arguments that do not give upper bounds on the margins of particular concept classes. Vapnik [11] also showed an upper bound on the margin in terms of the VC-dimension. A stronger result was shown by Forster [6]: A concept class $\mathcal{C}$ over an instance space $X$ can be represented by the matrix $M \in \{-1,1\}^{X \times \mathcal{C}}$ for which the entry $M_{x,c}$ is $1$ if $x \in c$ and $-1$ otherwise. He showed that a concept class $(X, \mathcal{C})$ can only be embedded in homogeneous half spaces with margin at most $\|M\| / \sqrt{|X|\,|\mathcal{C}|}$, where $\|M\|$ denotes the operator norm of $M$. It is not hard to see that for every matrix $M \in \{-1,1\}^{X \times Y}$:

\[ \max\bigl(\sqrt{|X|}, \sqrt{|Y|}\bigr) \;\le\; \|M\| \;\le\; \sqrt{|X|\,|Y|} \,. \]

(See for example Krause [8], Lemma 1.1.) The equality $\|M\| = \sqrt{|Y|}$ holds if and only if the rows of $M$ are orthogonal, $\|M\| = \sqrt{|X|}$ holds if and only if the columns of $M$ are orthogonal, and $\|M\| = \sqrt{|X|\,|Y|}$ holds if and only if $\mathrm{rank}(M) = 1$. The Hadamard matrices $H_n \in \mathbb{R}^{2^n \times 2^n}$ are examples of matrices with orthogonal rows and columns. They are recursively defined by

\[ H_0 = (1) \,, \qquad H_{n+1} = \begin{pmatrix} H_n & H_n \\ H_n & -H_n \end{pmatrix} . \]
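These facts are easy to check numerically. Below is a minimal sketch (assuming numpy is available; the helper name `hadamard` is ours, not from the paper) that builds $H_n$ from the recursion and confirms that its rows are orthogonal and that $\|H_n\| = \sqrt{2^n}$, so the lower end of the inequality above is attained.

```python
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Build the 2^n x 2^n Hadamard matrix H_n by the recursion above."""
    H = np.array([[1.0]])
    for _ in range(n):
        H = np.block([[H, H], [H, -H]])
    return H

H = hadamard(3)
print(np.allclose(H @ H.T, 8 * np.eye(8)))  # rows are orthogonal
print(np.linalg.norm(H, 2), np.sqrt(8))     # operator norm equals sqrt(|X|)
```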

From the upper bound $\|M\| / \sqrt{|X|\,|Y|}$ on the margin it follows easily that for the concept classes for which the matrix $M$ has orthogonal rows or orthogonal columns the trivial embedding has the optimal margin.

In this paper we give a straightforward generalization of Forster's result to the case where the entries of the matrix $M$ can be arbitrary real numbers. For finite sets $X$, $Y$ we say that a matrix $M \in \mathbb{R}^{X \times Y}$ can be realized by an arrangement of homogeneous half spaces with margin $\gamma$ if there are vectors $u_x$, $v_y$ for $x \in X$, $y \in Y$ that lie in the unit ball of some $\mathbb{R}^k$ (where $k$ can be arbitrarily large) such that $M_{xy}$ and $\langle u_x, v_y \rangle$ have the same sign and $|\langle u_x, v_y \rangle| \ge \gamma$ for all $x \in X$, $y \in Y$. If we interpret $v_y$ as the normal vector of a homogeneous half space, then $\langle u_x, v_y \rangle > 0$ means that the vector $u_x$ lies in the interior of this half space. The sign of $M_{xy}$ determines whether $u_x$ must lie in this half space or not. The requirement $|\langle u_x, v_y \rangle| \ge \gamma$ means that the point $u_x$ has distance at least $\gamma$ from the boundary of the half space. Analogously we can interpret the vectors $u_x$ as normal vectors of half spaces and the vectors $v_y$ as points. It is crucial that we require the vectors to lie in a unit ball (or that they are bounded) because otherwise we could improve the margin by simply stretching all vectors.
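This definition translates directly into code. The following is a minimal sketch (numpy assumed; `realization_margin` is a hypothetical helper, not from the paper) in which the rows of $B$ play the role of the vectors $u_x$ and the rows of $C$ the role of the vectors $v_y$:

```python
import numpy as np

def realization_margin(M: np.ndarray, B: np.ndarray, C: np.ndarray) -> float:
    """Margin of the realization u_x = B[x], v_y = C[y] of the sign pattern M.

    Returns 0.0 if some vector leaves the unit ball or some sign is wrong.
    """
    if max(np.linalg.norm(B, axis=1).max(), np.linalg.norm(C, axis=1).max()) > 1 + 1e-9:
        return 0.0                        # all vectors must lie in the unit ball
    G = B @ C.T                           # G[x, y] = <u_x, v_y>
    if np.any(np.sign(G) != np.sign(M)):
        return 0.0                        # some sign condition is violated
    return float(np.abs(G).min())         # gamma = min |<u_x, v_y>|
```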


Note that it is not really a restriction to assume that the half spaces are homogeneous: There is a standard way to transform an embedding with inhomogeneous half spaces into an embedding with homogeneous half spaces that has at least half the old margin.

For every matrix $M \in \mathbb{R}^{X \times Y}$ there exists an optimal embedding: We can assume without loss of generality that the vectors of the embedding lie in the unit ball of $\mathbb{R}^X$. (The linear span of the vectors $u_x$ has dimension at most $|X|$, and we can project the vectors $v_y$ onto this span without changing the scalar products $\langle u_x, v_y \rangle$.) The margin of the embedding is continuous in the vectors $u_x$, $v_y$, and the unit ball of $\mathbb{R}^X$ is compact. Thus the maximal margin is attained.

We introduce a new tool from functional analysis, the singular value decomposition, to estimate the optimal margins of concept classes. The singular value decomposition can not only be used to construct embeddings, but also to show upper bounds on the margins of all embeddings. We show that for two types of concept classes, namely singletons and half intervals, our techniques can be used to calculate the best possible margins exactly. However, we also show that these techniques fail for the concept classes of monomials.

In Section 2 we fix some notation for the rest of the paper. In Section 3 we show that a matrix $M \in \mathbb{R}^{X \times Y}$ can be embedded in homogeneous half spaces with margin at most $\sqrt{|X|}\,\|M\| \,/\, \sqrt{\sum_{y \in Y} \bigl( \sum_{x \in X} |M_{xy}| \bigr)^2}$. In Section 4 we show how the singular value decomposition can be used to apply the upper bound from Section 3. In Sections 5 and 6 we use the results from the previous sections to calculate the optimal margins for the concept classes of singletons and of half intervals. In Section 7 we discuss the concept class of monomials.

2 Notation

For a finite set $X$, $\mathbb{R}^X$ is the vector space of real functions ("vectors") on $X$. The Euclidean norm of a vector $a \in \mathbb{R}^X$ is $\|a\|_2 := \sqrt{\sum_{x \in X} a_x^2}$, the supremum norm is $\|a\|_\infty := \max_{x \in X} |a_x|$. As usual we write $\mathbb{R}^n = \mathbb{R}^{\{1,\ldots,n\}}$. The vectors $x \in \mathbb{R}^n$ are column vectors. For two finite sets $X$, $Y$ we write $\mathbb{R}^{X \times Y}$ for the set of real matrices with rows indexed by the elements of $X$ and columns indexed by the elements of $Y$. The transpose of a matrix $A \in \mathbb{R}^{X \times Y}$ is denoted by $A^\top \in \mathbb{R}^{Y \times X}$. For $x \in X$ we define $e_x \in \mathbb{R}^X$ to be the "canonical" vector satisfying

\[ (e_x)_y = \begin{cases} 1, & x = y, \\ 0, & x \ne y, \end{cases} \]

for $y \in X$. For a complex vector $x \in \mathbb{C}^n$, $\bar{x}$ is the complex conjugate and $x^*$ the complex conjugate transpose of $x$. $I_X$ is the identity matrix. A nonsingular matrix $A \in \mathbb{R}^{X \times X}$ is called orthogonal if $A^{-1} = A^\top$. The operator norm of a matrix $A \in \mathbb{R}^{X \times Y}$ is

\[ \|A\| = \sup_{x \in \mathbb{R}^Y,\, \|x\|_2 \le 1} \|Ax\|_2 = \max_{x \in \mathbb{R}^Y,\, \|x\|_2 = 1} \|Ax\|_2 \,. \]

The supremum is attained because $\|Ax\|_2$ is a continuous function of $x$ and the unit ball $\{x \in \mathbb{R}^Y \mid \|x\|_2 \le 1\}$ is compact. For any matrix $A \in \mathbb{R}^{X \times Y}$ we have that $\|A\|^2 = \|A^\top A\| = \|A A^\top\|$. The signum function $\mathrm{sign} : \mathbb{R} \to \mathbb{R}$ is given by

\[ \mathrm{sign}(x) = \begin{cases} 1, & x > 0, \\ 0, & x = 0, \\ -1, & x < 0. \end{cases} \]

3 An Upper Bound on the Margin

Theorem 1. Let $X$, $Y$ be finite sets. If the matrix $M \in \mathbb{R}^{X \times Y}$ can be realized by an arrangement of homogeneous half spaces with margin $\gamma$, then

\[ \gamma \;\le\; \frac{\sqrt{|X|}\,\|M\|}{\sqrt{\sum_{y \in Y} \bigl( \sum_{x \in X} |M_{xy}| \bigr)^2}} \,. \]

Proof. Let $u_x$, $v_y$ for $x \in X$, $y \in Y$ be vectors in the unit ball that realize $M$ with margin $\gamma$. Since $M_{xy}$ and $\langle u_x, v_y \rangle$ have the same sign and $|\langle u_x, v_y \rangle| \ge \gamma$, we have $M_{xy} \langle u_x, v_y \rangle \ge \gamma |M_{xy}|$, and thus, for every $y \in Y$,

\[ \gamma \sum_{x \in X} |M_{xy}| \;\le\; \sum_{x \in X} M_{xy} \langle u_x, v_y \rangle \;=\; \Bigl\langle \sum_{x \in X} M_{xy} u_x ,\; v_y \Bigr\rangle \;\le\; \Bigl\| \sum_{x \in X} M_{xy} u_x \Bigr\| \tag{1} \]

because $\|v_y\| \le 1$. From $\|MM^\top\| = \|M\|^2$ it follows that there are an orthonormal basis $d_1, \ldots, d_{|X|}$ of $\mathbb{R}^X$ consisting of eigenvectors of $MM^\top$ and eigenvalues $0 \le \lambda_1, \ldots, \lambda_{|X|} \le \|MM^\top\| = \|M\|^2$ of $MM^\top$ such that $MM^\top = \sum_{i=1}^{|X|} \lambda_i d_i d_i^\top$. The upper bound on $\gamma$ follows from



\begin{align*}
\gamma^2 \sum_{y \in Y} \Bigl( \sum_{x \in X} |M_{xy}| \Bigr)^2
&\overset{(1)}{\le} \sum_{y \in Y} \Bigl\| \sum_{x \in X} M_{xy} u_x \Bigr\|^2
= \sum_{y \in Y} \sum_{x, \tilde{x} \in X} M_{xy} M_{\tilde{x}y} \langle u_x, u_{\tilde{x}} \rangle \\
&= \sum_{x, \tilde{x} \in X} (MM^\top)_{x\tilde{x}} \langle u_x, u_{\tilde{x}} \rangle
= \sum_{x, \tilde{x} \in X} \sum_{i=1}^{|X|} \lambda_i (d_i d_i^\top)_{x\tilde{x}} \langle u_x, u_{\tilde{x}} \rangle \\
&= \sum_{i=1}^{|X|} \lambda_i \sum_{x, \tilde{x} \in X} \bigl\langle e_x^\top d_i \, u_x ,\; d_i^\top e_{\tilde{x}} \, u_{\tilde{x}} \bigr\rangle
= \sum_{i=1}^{|X|} \lambda_i \Bigl\| \sum_{x \in X} \langle e_x, d_i \rangle u_x \Bigr\|^2 \\
&\le \|M\|^2 \sum_{i=1}^{|X|} \Bigl\| \sum_{x \in X} \langle e_x, d_i \rangle u_x \Bigr\|^2
= \|M\|^2 \sum_{x, \tilde{x} \in X} \sum_{i=1}^{|X|} e_x^\top d_i d_i^\top e_{\tilde{x}} \, \langle u_x, u_{\tilde{x}} \rangle \\
&= \|M\|^2 \sum_{x, \tilde{x} \in X} e_x^\top e_{\tilde{x}} \, \langle u_x, u_{\tilde{x}} \rangle
= \|M\|^2 \sum_{x \in X} \|u_x\|^2 \;\le\; \|M\|^2 \, |X| \,.
\end{align*}

We used the fact that $\sum_{i=1}^{|X|} d_i d_i^\top$ is the unit matrix $I_X \in \mathbb{R}^{X \times X}$. This holds because $\bigl( \sum_{i=1}^{|X|} d_i d_i^\top \bigr) x = \sum_{i=1}^{|X|} \langle x, d_i \rangle d_i = x$ for all $x \in \mathbb{R}^X$. $\square$

If we apply Theorem 1 to a matrix $M \in \{-1,1\}^{X \times Y}$ with entries $-1$ and $1$ we get the upper bound $\|M\| / \sqrt{|X|\,|Y|}$ from Forster [6], Theorem 3. For an arbitrary matrix $M \in \mathbb{R}^{X \times Y}$ we also get the upper bound

\[ \frac{\sqrt{|Y|}\,\|M\|}{\sqrt{\sum_{x \in X} \bigl( \sum_{y \in Y} |M_{xy}| \bigr)^2}} \]

on the margin if we apply Theorem 1 to $M^\top$.
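Numerically, the bound of Theorem 1 and its transposed variant might be evaluated as in the following sketch (numpy assumed; the function names are ours). For a sign matrix the first expression collapses to $\|M\|/\sqrt{|X|\,|Y|}$:

```python
import numpy as np

def theorem1_bound(M: np.ndarray) -> float:
    """sqrt(|X|) ||M|| / sqrt(sum_y (sum_x |M_xy|)^2), rows indexed by X."""
    col_sums = np.abs(M).sum(axis=0)      # sum_x |M_xy| for every y
    return np.sqrt(M.shape[0]) * np.linalg.norm(M, 2) / np.linalg.norm(col_sums)

def margin_upper_bound(M: np.ndarray) -> float:
    """The better of the bound for M and the bound for its transpose."""
    return min(theorem1_bound(M), theorem1_bound(M.T))

M = np.where(np.eye(4) > 0, 1.0, -1.0)    # SINGLETONS_4 (see Section 5)
print(margin_upper_bound(M))              # 0.5, which is tight for n = 4
```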


4 The Singular Value Decomposition

The problem of embedding a matrix $M \in \mathbb{R}^{X \times Y}$ in Euclidean half spaces with a large margin can be stated as follows: We are looking for two matrices $B$, $C$ with rows of norm $1$ such that the signs of the entries of $BC^\top$ are equal to the signs of the entries of $M$, and such that the smallest absolute value of the entries of $BC^\top$ is as large as possible. One possibility of writing $M$ as a product of matrices is the singular value decomposition of $M$: Let $r$ be the rank of $M$. Then there always exist matrices $U \in \mathbb{R}^{X \times r}$ and $V \in \mathbb{R}^{Y \times r}$ with orthonormal columns and nonnegative numbers $s_1, \ldots, s_r$, called the singular values of $M$, such that $M = U \,\mathrm{diag}(s_1, \ldots, s_r)\, V^\top$. Obviously the matrices $B = U \,\mathrm{diag}(\sqrt{s_1}, \ldots, \sqrt{s_r})$, $C = V \,\mathrm{diag}(\sqrt{s_1}, \ldots, \sqrt{s_r})$ satisfy $M = BC^\top$. If we normalize the rows of $B$ and $C$ we get an embedding of $M$. Surprisingly, we can show that for the concept classes of singletons and of half intervals this embedding has the best possible margin. For both of these concept classes we can use the singular value decomposition to show the optimal upper bound on the margin: We can simply apply Theorem 1 to the matrix $UV^\top$. Trying this matrix can be a good idea because it is orthogonal, which means that all its singular values are the same; they are "optimally balanced". Of course we have to check that the entries of $UV^\top$ have correct signs, since this is not true for all matrices $M$.

Theorem 2. Let $M \in \{-1,1\}^{X \times Y}$ be a matrix with singular value decomposition $U \,\mathrm{diag}(s_1, \ldots, s_r)\, V^\top$. Let the vectors $\hat{u}$, $\hat{v}$ contain the squared Euclidean norms of the rows of the matrices $U \,\mathrm{diag}(\sqrt{s_1}, \ldots, \sqrt{s_r})$ and $V \,\mathrm{diag}(\sqrt{s_1}, \ldots, \sqrt{s_r})$, i.e.

\[ \hat{u} := \Bigl( \sum_{j=1}^r s_j U_{xj}^2 \Bigr)_{x \in X} \in \mathbb{R}^X \,, \qquad \hat{v} := \Bigl( \sum_{j=1}^r s_j V_{yj}^2 \Bigr)_{y \in Y} \in \mathbb{R}^Y \,. \]

Then the embedding

\[ u_x := \frac{1}{\sqrt{\hat{u}_x}} \bigl( \sqrt{s_j}\, U_{xj} \bigr)_{j=1,\ldots,r} \in \mathbb{R}^r \,, \quad x \in X \,, \qquad
v_y := \frac{1}{\sqrt{\hat{v}_y}} \bigl( \sqrt{s_j}\, V_{yj} \bigr)_{j=1,\ldots,r} \in \mathbb{R}^r \,, \quad y \in Y \,, \]

of the matrix $M$ has margin

\[ \frac{1}{\sqrt{\|\hat{u}\|_\infty \|\hat{v}\|_\infty}} \;\le\; \frac{\sqrt{|X|\,|Y|}}{\sum_{j=1}^r s_j} \,. \tag{2} \]

If the entries of $M$ and of $UV^\top$ have the same signs then every embedding of $M$ in homogeneous Euclidean half spaces has margin at most

\[ \min\Bigl( \frac{\sqrt{|X|}}{\|\hat{v}\|_2} \,, \frac{\sqrt{|Y|}}{\|\hat{u}\|_2} \Bigr) \;\le\; \frac{\sqrt{|X|\,|Y|}}{\sum_{j=1}^r s_j} \,. \tag{3} \]

If additionally all norms of the rows of $U \,\mathrm{diag}(\sqrt{s_1}, \ldots, \sqrt{s_r})$ and all norms of the rows of $V \,\mathrm{diag}(\sqrt{s_1}, \ldots, \sqrt{s_r})$ are equal, then the optimal margin for embedding $M$ is

\[ \frac{\sqrt{|X|\,|Y|}}{\sum_{j=1}^r s_j} \,. \]


Proof. Obviously $\|u_x\| = 1 = \|v_y\|$ holds, and from $\langle u_x, v_y \rangle = \frac{M_{xy}}{\sqrt{\hat{u}_x \hat{v}_y}}$ it follows that the margin of the embedding is $1 / \sqrt{\|\hat{u}\|_\infty \|\hat{v}\|_\infty}$. The upper bound $\sqrt{|X|} / \|\hat{v}\|_2$ on the margin follows if we apply Theorem 1 to the matrix $UV^\top$, because $\|UV^\top\| \le \|U\| \|V\| = 1$ and because of

\[ \sum_{x \in X} \bigl| (UV^\top)_{xy} \bigr| = \sum_{x \in X} M_{xy} (UV^\top)_{xy} = \sum_{x \in X} \Bigl( \sum_{j=1}^r s_j U_{xj} V_{yj} \Bigr) \Bigl( \sum_{k=1}^r U_{xk} V_{yk} \Bigr) = \sum_{j=1}^r s_j V_{yj}^2 = \hat{v}_y \,. \]

The upper bound $\sqrt{|Y|} / \|\hat{u}\|_2$ follows analogously if we apply Theorem 1 to $V U^\top$. Both the sum of the components of $\hat{u}$ and the sum of those of $\hat{v}$ are equal to $\sum_{j=1}^r s_j$. From this it follows that

\[ \|\hat{u}\|_2 \ge \frac{\sum_{j=1}^r s_j}{\sqrt{|X|}} \,, \quad \|\hat{v}\|_2 \ge \frac{\sum_{j=1}^r s_j}{\sqrt{|Y|}} \,, \quad \|\hat{u}\|_\infty \ge \frac{\sum_{j=1}^r s_j}{|X|} \,, \quad \|\hat{v}\|_\infty \ge \frac{\sum_{j=1}^r s_j}{|Y|} \,, \]

and this implies the inequalities in (2) and (3). If all the components of $\hat{u}$ are equal and all the components of $\hat{v}$ are equal, then equality holds in (2) and (3). $\square$

With Theorem 2 it is easy to calculate the optimal margins of matrices that have orthogonal rows or orthogonal columns. It was already observed by Forster [6] that the trivial embedding is optimal in this case. In the following two sections we show that Theorem 2 can also be used to construct optimal embeddings for the concept classes of singletons and of half intervals.

5 The Optimal Margins of Singleton Concept Classes

Given a parameter $n \in \mathbb{N}$, we look at the concept classes of singletons. They have the matrices

\[ \mathrm{SINGLETONS}_n = \begin{pmatrix} 1 & -1 & \cdots & -1 \\ -1 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & -1 \\ -1 & \cdots & -1 & 1 \end{pmatrix} \in \{-1,1\}^{n \times n} \,. \]

It is obvious that this matrix can be embedded with constant margin: We can get an embedding in inhomogeneous half spaces if we choose the points to be the canonical unit vectors of $\mathbb{R}^n$ and choose the half spaces that have the canonical unit vectors as normal vectors and thresholds $\frac{1}{2}$. This leads to a margin of $\frac{1}{2}$. Ben-David, Eiron and Simon [3] observed that the optimal margin that can be achieved with inhomogeneous half spaces is $\frac{1}{2} + \frac{1}{2n}$. We show that Theorem 2 can be used to calculate the optimal margin for embeddings with homogeneous half spaces.

The matrix $M = \mathrm{SINGLETONS}_n$ is symmetric and has the eigenvalue $2$ with eigenspace $\mathrm{null}(M - 2I_n) = \{x \in \mathbb{R}^n \mid \sum_{k=1}^n x_k = 0\}$ and the eigenvalue $2 - n$ with eigenvector $(1, \ldots, 1)^\top$. The eigenvectors

\[ a_j = \frac{1}{\sqrt{j + j^2}} (\underbrace{1, \ldots, 1}_{j \text{ times}}, -j, 0, \ldots, 0)^\top \,, \quad j = 1, \ldots, n-1 \,, \qquad a_n = \frac{1}{\sqrt{n}} (1, \ldots, 1)^\top \,, \]

are an orthonormal basis of $\mathbb{R}^n$. From this it follows that a singular value decomposition of $M$ is given by

\[ M = \underbrace{\bigl( a_1 \; \cdots \; a_n \bigr)}_{U} \; \underbrace{\mathrm{diag}(2, \ldots, 2, n-2)}_{\mathrm{diag}(s_1, \ldots, s_n)} \; \bigl( \underbrace{a_1 \; \cdots \; a_{n-1} \;\; {-a_n}}_{V} \bigr)^\top \]


(to check this we can apply the above matrices to the vectors $a_1, \ldots, a_n$). Obviously the entries of

\[ UV^\top = \frac{1}{2} M + \Bigl( \frac{n}{2} - 2 \Bigr) a_n a_n^\top = \begin{pmatrix} 1 - \frac{2}{n} & -\frac{2}{n} & \cdots & -\frac{2}{n} \\ -\frac{2}{n} & \ddots & \ddots & \vdots \\ \vdots & \ddots & \ddots & -\frac{2}{n} \\ -\frac{2}{n} & \cdots & -\frac{2}{n} & 1 - \frac{2}{n} \end{pmatrix} \]

have the same signs as the corresponding entries of $\mathrm{SINGLETONS}_n$, and we can apply Theorem 2. From

\[ \sum_{j=1}^r s_j U_{kj}^2 = \sum_{j=1}^r s_j V_{kj}^2 = 2 - \frac{2}{k} + 2 \underbrace{\sum_{j=k}^{n-1} \frac{1}{j(j+1)}}_{= \frac{1}{k} - \frac{1}{n}} + 1 - \frac{2}{n} = 3 - \frac{4}{n} = \frac{3n-4}{n} \]

for $k = 1, \ldots, n$ it follows that

Theorem 3. The maximal margin of a realization of the matrix $\mathrm{SINGLETONS}_n$ with homogeneous Euclidean half spaces is

\[ \frac{n}{3n-4} = \frac{1}{3} + \frac{4}{9n-12} = \frac{1}{3} + \Theta\Bigl( \frac{1}{n} \Bigr) \,. \]
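Theorem 3 is easy to confirm numerically with the hypothetical `svd_embedding` sketched in Section 4 (take $n \ge 3$ so that the signs of $UV^\top$ are correct):

```python
import numpy as np

for n in [4, 8, 16, 32]:
    M = np.where(np.eye(n) > 0, 1.0, -1.0)       # SINGLETONS_n
    _, _, margin, signs_ok = svd_embedding(M)
    print(n, signs_ok, margin, n / (3 * n - 4))  # achieved margin equals n/(3n-4)
```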

6 The Optimal Margins of Half Interval Concept Classes

For a parameter $n \in \mathbb{N}$, the concept class of half intervals has the following matrix:

\[ \mathrm{HALF\text{-}INTERVALS}_n = \begin{pmatrix} 1 & -1 & \cdots & -1 \\ \vdots & \ddots & \ddots & \vdots \\ \vdots & & \ddots & -1 \\ 1 & \cdots & \cdots & 1 \end{pmatrix} \in \{-1,1\}^{n \times n} \,, \]

i.e. the entry in row $j$ and column $k$ is $1$ if $k \le j$ and $-1$ otherwise. As observed by Ben-David [1], we can use Novikoff's Theorem [10] to get an upper bound on the margins of embeddings of this matrix in half spaces: If we have an embedding with margin $\gamma$, then it follows from Novikoff's Theorem that the concept class can be learned with at most $\gamma^{-2}$ EQ-queries. The learning complexity of $\mathrm{HALF\text{-}INTERVALS}_n$ with arbitrary EQ-queries is $\lfloor \log_2 n \rfloor$ (see Maass and Turán [9], Proposition 4.2). This shows that $\gamma^{-2} \ge \lfloor \log_2 n \rfloor$, or equivalently $\gamma \le \frac{1}{\sqrt{\lfloor \log_2 n \rfloor}}$. We can show a much stronger result: With Theorem 2 we can give an exact formula for the optimal margin.

In the following we consider only the case that $n$ is even, but the case of odd $n$ is very similar. We start by calculating the complex eigenvalues and eigenspaces of $M = \mathrm{HALF\text{-}INTERVALS}_n$. Let $\zeta$ be a complex root of $-1$, i.e. $\zeta \in \mathbb{C}$, $\zeta^n = -1$. Then the vector $x_\zeta := (\zeta^{k-1})_{k=1,\ldots,n} \in \mathbb{C}^n$ is an eigenvector of $M$ for the eigenvalue $\frac{2}{1 - \bar{\zeta}}$:

\[ M x_\zeta = \Bigl( \sum_{j=1}^k \zeta^{j-1} - \sum_{j=k+1}^n \zeta^{j-1} \Bigr)_{k=1,\ldots,n} = \Bigl( \frac{1 - \zeta^k}{1 - \zeta} - \frac{\zeta^k + 1}{1 - \zeta} \Bigr)_{k=1,\ldots,n} = \Bigl( \frac{-2 \zeta^k}{1 - \zeta} \Bigr)_{k=1,\ldots,n} = \frac{2}{1 - \bar{\zeta}} \, x_\zeta \,. \]


This means that we can write $M$ as

\[ M = \sum_{\zeta \in \mathbb{C} :\, \zeta^n = -1} \frac{2}{n (1 - \bar{\zeta})} \, x_\zeta x_\zeta^* \]

(note that the eigenvectors $x_\zeta$ are orthogonal and that $\|x_\zeta\|^2 = n$). From this we can see that we can write $M = UDV^\top$, where $U, D, V \in \mathbb{R}^{n \times n}$ are given by

\[ U = \Bigl( \sqrt{\tfrac{2}{n}} \sin \tfrac{(2l-1)j\pi}{n} \,,\; \sqrt{\tfrac{2}{n}} \cos \tfrac{(2l-1)j\pi}{n} \Bigr)_{\substack{j = 1,\ldots,n \\ l = 1,\ldots,n/2}} \,, \]
\[ D = \mathrm{diag}\Bigl( \bigl( \sin \tfrac{(2l-1)\pi}{2n} \bigr)^{-1} \,,\; \bigl( \sin \tfrac{(2l-1)\pi}{2n} \bigr)^{-1} \Bigr)_{l = 1,\ldots,n/2} \,, \]
\[ V = \Bigl( \sqrt{\tfrac{2}{n}} \cos \tfrac{(2l-1)(2k-1)\pi}{2n} \,,\; -\sqrt{\tfrac{2}{n}} \sin \tfrac{(2l-1)(2k-1)\pi}{2n} \Bigr)_{\substack{k = 1,\ldots,n \\ l = 1,\ldots,n/2}} \,. \]

Omitted calculations can be found in the appendix. It is not hard to check that the rows of $U$, $V$ are orthonormal, i.e. $UDV^\top$ is a singular value decomposition of $M$. The entries of $UV^\top$ have the same signs as those of $\mathrm{HALF\text{-}INTERVALS}_n$, because for all $j, k \in \{1, \ldots, n\}$

\[ (UV^\top)_{jk} = \frac{1}{n} \Bigl( \sin \frac{(2j - 2k + 1)\pi}{2n} \Bigr)^{-1} \]

is positive if and only if $j \ge k$. Now we can apply Theorem 2, and because the sums

\[ \sum_{l=1}^n D_{ll} U_{jl}^2 = \sum_{l=1}^n D_{ll} V_{jl}^2 = \frac{1}{n} \sum_{l=1}^n D_{ll} \]

are equal for all $j$ (the above equalities follow immediately from the special structure of the matrices $U$, $D$, $V$, using $\sin^2 + \cos^2 = 1$) we get

Theorem 4. The maximal margin of a realization of the concept class of half intervals with matrix $\mathrm{HALF\text{-}INTERVALS}_n$ in Euclidean homogeneous half spaces is

\[ \Bigl( \frac{1}{n} \sum_{l=1}^{n} \Bigl( \sin \frac{(2l-1)\pi}{2n} \Bigr)^{-1} \Bigr)^{-1} = \frac{\pi}{2 \ln n} + \Theta\Bigl( \frac{1}{(\ln n)^2} \Bigr) \,. \]
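Here, too, the result can be confirmed numerically with the hypothetical `svd_embedding` from Section 4; the achieved margin matches the exact expression of Theorem 4 and approaches $\pi/(2 \ln n)$ rather slowly:

```python
import numpy as np

for n in [8, 32, 128, 512]:
    M = np.where(np.tril(np.ones((n, n))) > 0, 1.0, -1.0)  # HALF-INTERVALS_n
    _, _, margin, signs_ok = svd_embedding(M)
    l = np.arange(1, n + 1)
    exact = 1.0 / np.mean(1.0 / np.sin((2 * l - 1) * np.pi / (2 * n)))
    print(n, signs_ok, margin, exact, np.pi / (2 * np.log(n)))
```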

7 Monomials

For the concept class of monomials over $n$ variables $x_1, \ldots, x_n$, the instances are assignments to the boolean variables $x_1, \ldots, x_n$ and the concepts are conjunctions of the literals $x_1, \neg x_1, \ldots, x_n, \neg x_n$. The matrix $\mathrm{MONOMIALS}_n = M_n$ of this concept class is recursively given by

\[ M_0 = (1) \,, \qquad M_n = \begin{pmatrix} M_{n-1} & -1 \\ M_{n-1} & M_{n-1} \\ -1 & M_{n-1} \end{pmatrix} \in \{-1,1\}^{3^n \times 2^n} \,, \]

where $-1$ stands for an all-$(-1)$ block. For $n = 0$ we have only the empty monomial, which is always true. For $n \ge 1$ the first $2^{n-1}$ columns of $M_n$ correspond to the assignments of the variables for which $x_n$ is true and the last $2^{n-1}$ columns correspond to the assignments for which $x_n$ is false. The first $3^{n-1}$ rows correspond to the monomials $m$ containing the literal


$x_n$, the next $3^{n-1}$ rows to the monomials containing neither $x_n$ nor $\neg x_n$, and the last rows to the monomials containing $\neg x_n$.

There is an embedding of $\mathrm{MONOMIALS}_n$ in inhomogeneous half spaces with margin $1/n$: We map each monomial $m$ to a half space given by a normal vector $u_m$ and threshold $t_m$. The $j$-th component of $u_m \in \mathbb{R}^n$ is $1$ if $m$ contains the positive literal $x_j$, $-1$ if $m$ contains $\neg x_j$, and $0$ otherwise. If $l_m$ is the number of literals contained in $m$, we define the threshold $t_m$ as $l_m - 1 \in \mathbb{R}$. For each assignment $a$ of the variables $x_1, \ldots, x_n$ we define a vector $v_a \in \mathbb{R}^n$ by $(v_a)_j = 1$ if $a$ assigns true to $x_j$ and $(v_a)_j = -1$ otherwise. Given a monomial $m$, the scalar product $\langle u_m, v_a \rangle$ attains its maximal value $l_m = t_m + 1$ for the assignments $a$ that fulfill $m$. For all other assignments $a$ we have $\langle u_m, v_a \rangle \le l_m - 2 = t_m - 1$. This means that $\langle u_m, v_a \rangle > t_m$ if and only if $a$ fulfills $m$, and $|\langle u_m, v_a \rangle - t_m| \ge 1$. After dividing the vectors by $\sqrt{n}$ and dividing the thresholds by $n$ we have an embedding with margin $1/n$.

The best margin of an embedding of $\mathrm{MONOMIALS}_n$ with homogeneous half spaces is at most $1/\sqrt{n}$ because the matrix $\mathrm{MONOMIALS}_n$ has $n$ orthogonal rows: We consider only the monomials that consist of a single positive literal. The corresponding rows of $\mathrm{MONOMIALS}_n$ are orthogonal because two distinct literals differ on exactly half of all of the assignments to the variables $x_1, \ldots, x_n$. (An embedding of $M_n$ restricts to an embedding of this $n \times 2^n$ submatrix, whose operator norm is $\sqrt{2^n}$, so Theorem 1 gives margin at most $\sqrt{2^n} / \sqrt{n \, 2^n} = 1/\sqrt{n}$.)

We want to argue now that the embedding of Theorem 2 does not give good margins for the concept classes of monomials. For this we calculate the sum of the singular values of $M_n$. Consider the matrix $A_n = \frac{1}{2} (M_n + 1_{3^n \times 2^n}) \in \{0,1\}^{3^n \times 2^n}$ which results from $M_n$ if we replace the $-1$ entries of $M_n$ by zeros. For this matrix it holds that

\[ A_0^\top A_0 = (1) \,, \qquad A_n^\top A_n = \begin{pmatrix} 2 A_{n-1}^\top A_{n-1} & A_{n-1}^\top A_{n-1} \\ A_{n-1}^\top A_{n-1} & 2 A_{n-1}^\top A_{n-1} \end{pmatrix} \in \mathbb{R}^{2^n \times 2^n} \,. \]

For $n = 0$, obviously $x_1^{(0)} = (1)$ is an eigenvector of $A_0^\top A_0$ for the eigenvalue $\lambda_1^{(0)} = 1$. It is easy to see that if $x_j^{(n-1)}$ is an eigenvector of $A_{n-1}^\top A_{n-1}$ for the eigenvalue $\lambda_j^{(n-1)}$, then

\[ x_j^{(n)} = \begin{pmatrix} x_j^{(n-1)} \\ -x_j^{(n-1)} \end{pmatrix} \,, \qquad x_{2^{n-1}+j}^{(n)} = \begin{pmatrix} x_j^{(n-1)} \\ x_j^{(n-1)} \end{pmatrix} \]

are linearly independent eigenvectors of $A_n^\top A_n$ for the eigenvalues $\lambda_j^{(n)} = \lambda_j^{(n-1)}$ and $\lambda_{2^{n-1}+j}^{(n)} = 3 \lambda_j^{(n-1)}$. Because each column of $A_n$ contains exactly $2^n$ ones, it follows that

\[ M_n^\top M_n = (2 A_n - 1_{3^n \times 2^n})^\top (2 A_n - 1_{3^n \times 2^n}) = 4 A_n^\top A_n - 2 A_n^\top 1_{3^n \times 2^n} - 2 \cdot 1_{2^n \times 3^n} A_n + 1_{2^n \times 3^n} 1_{3^n \times 2^n} = 4 A_n^\top A_n + (3^n - 4 \cdot 2^n) \cdot 1_{2^n \times 2^n} \,. \]

By construction it follows inductively that each vector $x_j^{(n)}$ for $1 \le j < 2^n$ contains as many $1$s as $-1$s, i.e. $1_{2^n \times 2^n} x_j^{(n)} = 0$. Thus $x_j^{(n)}$ is an eigenvector of $M_n^\top M_n$ for the eigenvalue $4 \lambda_j^{(n)}$ for $1 \le j < 2^n$. The vector $x_{2^n}^{(n)}$ contains only $1$s, thus $1_{2^n \times 2^n} x_{2^n}^{(n)} = 2^n x_{2^n}^{(n)}$. Because of $\lambda_{2^n}^{(n)} = 3^n$, this vector is an eigenvector of $M_n^\top M_n$ for the eigenvalue

\[ 4 \cdot 3^n + 2^n (3^n - 4 \cdot 2^n) = 4 \cdot 3^n + 6^n - 4^{n+1} \,. \]


The singular values of $M_n$ are the square roots of the eigenvalues of $M_n^\top M_n$. Thus each singular value $s_j^{(n-1)}$ of $M_{n-1}$, except for the case $j = 2^{n-1}$, produces two singular values $s_j^{(n)} = s_j^{(n-1)}$ and $s_{2^{n-1}+j}^{(n)} = \sqrt{3} \, s_j^{(n-1)}$. Because of $s_1^{(0)} = 2$ the sum of the singular values is

\[ \sum_{j=1}^{2^n} s_j^{(n)} = 2 \bigl( (1 + \sqrt{3})^n - \sqrt{3^n} \bigr) + \bigl( 4 \cdot 3^n + 6^n - 4^{n+1} \bigr)^{1/2} \,. \]

It follows that the margin of the embedding of Theorem 2 is at most

\[ \frac{(3^n 2^n)^{1/2}}{\sum_{j=1}^{2^n} s_j^{(n)}} = O\Bigl( \Bigl( \frac{\sqrt{6}}{1 + \sqrt{3}} \Bigr)^{n} \Bigr) \,. \]

Because of $\sqrt{6} / (1 + \sqrt{3}) \approx 0.8966 < 1$ this means that the margin of the embedding obtained with the singular value decomposition method for $\mathrm{MONOMIALS}_n$ is exponentially small in $n$. Because we have already seen that there is an embedding of $\mathrm{MONOMIALS}_n$ with margin $1/n$, this means that the upper bound on the margin of Theorem 2 does not hold for all matrices. In particular, for large $n$ the entries of $UV^\top$ for a singular value decomposition of $\mathrm{MONOMIALS}_n$ do not have the correct signs.
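The exponential decay is visible already for small $n$. The sketch below implements the recursion for $M_n$ and evaluates it with the hypothetical `svd_embedding` from Section 4; the printed flag shows where the sign condition on $UV^\top$ fails:

```python
import numpy as np

def monomials(n: int) -> np.ndarray:
    """The 3^n x 2^n matrix MONOMIALS_n, built by the recursion above."""
    M = np.array([[1.0]])
    for _ in range(n):
        J = -np.ones_like(M)
        M = np.block([[M, J], [M, M], [J, M]])
    return M

for n in range(1, 7):
    _, _, margin, signs_ok = svd_embedding(monomials(n))
    print(n, signs_ok, margin, 1.0 / n)  # the SVD margin decays exponentially, 1/n does not
```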

Acknowledgments

Many thanks to Shai Ben-David for interesting discussions. Jürgen Forster was supported by the Deutsche Forschungsgemeinschaft grant SI 498/4-1. This research was also supported by a grant from the G.I.F., the German-Israeli Foundation for Scientific Research and Development.

References

[1] Ben-David, S. (2000). Personal communication.

[2] Ben-David, S., Eiron, N., & Simon, H. U. (2000). An asymptotic study of embeddings in Euclidean half spaces. Unpublished manuscript.

[3] Ben-David, S., Eiron, N., & Simon, H. U. (2000). Unpublished manuscript.

[4] Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36, 929-965.

[5] Cristianini, N., & Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge, United Kingdom: Cambridge University Press.

[6] Forster, J. (2001). A linear lower bound on the unbounded error probabilistic communication complexity. In Proceedings of the Sixteenth Annual IEEE Conference on Computational Complexity.

[7] Kearns, M. J., & Vazirani, U. V. (1994). An Introduction to Computational Learning Theory. Cambridge, Massachusetts: MIT Press.

[8] Krause, M. (1996). Geometric arguments yield better bounds for threshold circuits and distributed computing. Theoretical Computer Science, 156, 99-117.


[9] Maass, W., & Turán, G. (1992). Lower bound methods and separation results for on-line learning models. Machine Learning, 9, 107-145.

[10] Novikoff, A. B. (1962). On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata (Vol. 12, pp. 615-622). Polytechnic Institute of Brooklyn.

[11] Vapnik, V. (1998). Statistical Learning Theory. New York: John Wiley & Sons, Inc.

[12] Vapnik, V. N., & Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16, 264-280.


Appendix: Omitted Calculations from Section 6

The entries of the matrix $M = \mathrm{HALF\text{-}INTERVALS}_n$ can be written as

\begin{align*}
M_{jk} &= \frac{2}{n} \sum_{\zeta \in \mathbb{C} :\, \zeta^n = -1} \frac{\zeta^{j-1} \zeta^{1-k}}{1 - \bar{\zeta}}
= \frac{2}{n} \sum_{\zeta \in \mathbb{C} :\, \zeta^n = -1} \frac{\zeta^{j-k+1}}{\zeta - 1} \\
&\overset{\zeta = e^{i(2l-1)\pi/n}}{=} \frac{2}{n} \sum_{l=1}^{n} \frac{e^{i(2l-1)(j-k+1)\pi/n}}{e^{i(2l-1)\pi/n} - 1} \\
&= \frac{2}{n} \sum_{l=1}^{n/2} \Bigl( \frac{e^{i(2l-1)(j-k+1)\pi/n}}{e^{i(2l-1)\pi/n} - 1} + \frac{e^{-i(2l-1)(j-k+1)\pi/n}}{e^{-i(2l-1)\pi/n} - 1} \Bigr) \\
&= \frac{2}{n} \sum_{l=1}^{n/2} \frac{e^{i(2l-1)(2j-2k+1)\pi/(2n)} - e^{-i(2l-1)(2j-2k+1)\pi/(2n)}}{e^{i(2l-1)\pi/(2n)} - e^{-i(2l-1)\pi/(2n)}} \\
&= \frac{2}{n} \sum_{l=1}^{n/2} \frac{\sin \frac{(2l-1)(2j-2k+1)\pi}{2n}}{\sin \frac{(2l-1)\pi}{2n}} \\
&= \frac{2}{n} \sum_{l=1}^{n/2} \Bigl( \sin \frac{(2l-1)\pi}{2n} \Bigr)^{-1} \Bigl( \sin \frac{(2l-1)j\pi}{n} \cos \frac{(2l-1)(2k-1)\pi}{2n} - \cos \frac{(2l-1)j\pi}{n} \sin \frac{(2l-1)(2k-1)\pi}{2n} \Bigr) \,.
\end{align*}

To calculate the entries of the matrix $UV^\top$ let $j, k \in \{1, \ldots, n\}$ be given. For shortness let $\alpha := \frac{(2j-2k+1)\pi}{2n}$. Then

\begin{align*}
(UV^\top)_{jk} &= \frac{2}{n} \sum_{l=1}^{n/2} \Bigl( \sin \frac{(2l-1)j\pi}{n} \cos \frac{(2l-1)(2k-1)\pi}{2n} - \cos \frac{(2l-1)j\pi}{n} \sin \frac{(2l-1)(2k-1)\pi}{2n} \Bigr) \\
&= \frac{2}{n} \sum_{l=1}^{n/2} \sin \bigl( (2l-1)\alpha \bigr)
= \frac{2}{n} \operatorname{Im} \Bigl( e^{i\alpha} \sum_{l=0}^{n/2-1} e^{2i\alpha l} \Bigr) \\
&= \frac{2}{n} \operatorname{Im} \Bigl( \underbrace{\frac{e^{i\alpha}}{1 - e^{2i\alpha}}}_{= \frac{1}{e^{-i\alpha} - e^{i\alpha}} = \frac{i}{2 \sin \alpha}} \bigl( 1 - \underbrace{e^{in\alpha}}_{= \pm i} \bigr) \Bigr)
= \frac{1}{n \sin \alpha} \,.
\end{align*}

Now we give an approximation of the optimal margin. We use that $\sin \frac{\pi}{2n} \ge \frac{1}{n}$ for all positive integers $n$. (This follows from the concavity of the sine function on


$[0, \pi/2]$.)

\begin{align*}
\frac{2}{n} \sum_{l=1}^{n/2} \Bigl( \sin \frac{(2l-1)\pi}{2n} \Bigr)^{-1}
&\le \frac{2}{n} \underbrace{\Bigl( \sin \frac{\pi}{2n} \Bigr)^{-1}}_{\le n} + \frac{2}{n} \int_1^{n/2} \Bigl( \sin \frac{(2x-1)\pi}{2n} \Bigr)^{-1} dx \\
&\overset{y = \frac{(2x-1)\pi}{2n}}{\le} 2 + \frac{2}{\pi} \int_{\frac{\pi}{2n}}^{\frac{(n-1)\pi}{2n}} \frac{dy}{\sin y}
= 2 + \frac{2}{\pi} \Bigl[ \ln \frac{\sin y}{1 + \cos y} \Bigr]_{\frac{\pi}{2n}}^{\frac{(n-1)\pi}{2n}} \\
&= 2 + \frac{2}{\pi} \Bigl( \underbrace{\ln \frac{\cos \frac{\pi}{2n}}{1 + \sin \frac{\pi}{2n}}}_{\le 0} - \ln \underbrace{\frac{\sin \frac{\pi}{2n}}{1 + \cos \frac{\pi}{2n}}}_{\ge \frac{1}{2n}} \Bigr)
\;\le\; 2 + \frac{2}{\pi} \ln (2n) \,.
\end{align*}

Thus the optimal margin is at least $\bigl( \frac{2}{\pi} \ln(2n) + 2 \bigr)^{-1}$.

\begin{align*}
\sum_{l=1}^{n/2} \Bigl( \sin \frac{(2l-1)\pi}{2n} \Bigr)^{-1}
&\ge \int_1^{(n+1)/2} \Bigl( \sin \frac{(2x-1)\pi}{2n} \Bigr)^{-1} dx
\overset{y = \frac{(2x-1)\pi}{2n}}{=} \frac{n}{\pi} \int_{\frac{\pi}{2n}}^{\frac{\pi}{2}} \frac{dy}{\sin y} \\
&= \frac{n}{\pi} \Bigl[ \ln \frac{\sin y}{1 + \cos y} \Bigr]_{\frac{\pi}{2n}}^{\frac{\pi}{2}}
= \frac{n}{\pi} \ln \frac{1 + \cos \frac{\pi}{2n}}{\sin \frac{\pi}{2n}}
\;\ge\; \frac{n}{\pi} \ln \frac{2n}{\pi} \,,
\end{align*}

because $\sin \frac{\pi}{2n} \le \frac{\pi}{2n}$ and $1 + \cos \frac{\pi}{2n} \ge 1$. Thus the optimal margin is at most $\bigl( \frac{2}{\pi} \ln \frac{2n}{\pi} \bigr)^{-1}$.
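The matrices $U$, $D$, $V$ and the closed form for $(UV^\top)_{jk}$ can be confirmed numerically. A sketch following the definitions from Section 6 (all names are ours; $n$ must be even):

```python
import numpy as np

n = 10
j = np.arange(1, n + 1)[:, None]          # row index of U (and of M)
k = np.arange(1, n + 1)[:, None]          # row index of V
l = np.arange(1, n // 2 + 1)[None, :]     # index of the column pairs

U = np.hstack([np.sqrt(2 / n) * np.sin((2 * l - 1) * j * np.pi / n),
               np.sqrt(2 / n) * np.cos((2 * l - 1) * j * np.pi / n)])
V = np.hstack([np.sqrt(2 / n) * np.cos((2 * l - 1) * (2 * k - 1) * np.pi / (2 * n)),
               -np.sqrt(2 / n) * np.sin((2 * l - 1) * (2 * k - 1) * np.pi / (2 * n))])
d = 1.0 / np.sin((2 * np.arange(1, n // 2 + 1) - 1) * np.pi / (2 * n))
D = np.diag(np.concatenate([d, d]))       # every singular value appears twice

M = np.where(np.tril(np.ones((n, n))) > 0, 1.0, -1.0)
print(np.allclose(U @ D @ V.T, M))        # U D V^T reproduces HALF-INTERVALS_n
alpha = (2 * (j - k.T) + 1) * np.pi / (2 * n)
print(np.allclose(U @ V.T, 1.0 / (n * np.sin(alpha))))  # the closed form above
```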