arXiv:cs/0301016v1 [cs.CC] 16 Jan 2003
Lower Bounds on the Bounded Coefficient Complexity of Bilinear Maps

Peter Bürgisser, Universität Paderborn
Martin Lotz, Universität Paderborn

We prove lower bounds of order n log n for both the problem to multiply polynomials of degree n and the problem to divide polynomials with remainder, in the model of bounded coefficient arithmetic circuits over the complex numbers. These lower bounds are optimal up to order of magnitude. The proof uses a recent idea of R. Raz [Proc. 34th STOC 2002] proposed for matrix multiplication. It reduces the linear problem of multiplying a random circulant matrix with a vector to the bilinear problem of cyclic convolution. We treat the arising linear problem by extending J. Morgenstern's bound [J. ACM 20, pp. 305-306, 1973] in a unitarily invariant way. This establishes a new lower bound on the bounded coefficient complexity of linear forms in terms of the singular values of the corresponding matrix. In addition, we extend these lower bounds for linear and bilinear maps to a model of circuits that allows a restricted number of unbounded scalar multiplications.

Categories and Subject Descriptors: F.1.1 [Computation by Abstract Devices]: Models of Computation; F.2.1 [Analysis of Algorithms and Problem Complexity]: Numerical Algorithms and Problems; I.1.2 [Symbolic and Algebraic Manipulation]: Algorithms

General Terms: Algorithms, Theory

Additional Key Words and Phrases: algebraic complexity, bilinear circuits, lower bounds, singular values
1. INTRODUCTION

Finding lower bounds on the complexity of polynomial functions over the complex numbers is one of the fundamental problems of algebraic complexity theory. It becomes more tractable if we restrict the model of computation to arithmetic circuits in which multiplication with scalars is restricted to constants of bounded absolute value. This model was introduced in seminal work by [Morgenstern 1973; 1975], where it was proved that the complexity of multiplying a vector with a given square matrix A is bounded from below by the logarithm of the absolute value of the determinant of A. As a consequence, Morgenstern derived the lower bound (1/2) n log n for computing the Discrete Fourier Transform. [Valiant 1976; 1977] analyzed the problem of proving nonlinear lower bounds on the complexity of the Discrete Fourier Transform and related linear problems in the unrestricted model of arithmetic circuits.

Author's address: Faculty of Computer Science, Electrical Engineering, and Mathematics, University of Paderborn, 33095 Paderborn, Germany. Email: {pbuerg,lotzm}@math.uni-paderborn.de.
Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use is granted provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists requires prior specific permission and/or a fee.
© 20YY ACM 0004-5411/20YY/0100-0001 $5.00

However, despite many attempts,
Journal of the ACM, Vol. V, No. N, Month 20YY, Pages 1–0??.
this problem is still open today.

To motivate the bounded coefficient model (b.c. for short), we note that many algorithms for arithmetic problems, like the Fast Fourier Transform and the fast algorithms based on it, use only small constants. [Chazelle 1998] advocated the b.c. model as a natural model of computation by arguing that the finite representation of numbers is essentially equivalent to bounded coefficients. [Chazelle 1998] refined Morgenstern's bound by proving a lower bound on the b.c. linear complexity of a matrix A in terms of the singular values of A. His applications are nonlinear lower bounds for range searching problems. Several papers [Nisan and Wigderson 1995; Lokam 1995; Pudlák 1998] provided size-depth trade-offs for b.c. arithmetic circuits. The concept of matrix rigidity, originally introduced in [Valiant 1977], plays a vital role here. A geometric variant of this concept (euclidean metric instead of Hamming metric) is closely related to the singular value decomposition of a matrix and turns out to be an important tool, as worked out in [Lokam 1995]. [Raz 2002] recently proved a nonlinear lower bound on the complexity of matrix multiplication in the b.c. model. To our knowledge, this paper and [Nisan and Wigderson 1995] are the only ones which deal with the complexity of bilinear maps in the b.c. model of computation.

The main result of this paper (Theorem 4.1) is a nonlinear lower bound of order n log n for computing the cyclic convolution of two given vectors in the b.c. model. This bound is optimal up to a constant factor. The proof is based on ideas in [Raz 2002] to establish a lower bound on the complexity of a bilinear map (x, y) ↦ ϕ(x, y) in terms of the complexity of the linear maps y ↦ ϕ(a, y) obtained by fixing the first input to a (Lemma 2.4). However, the linear circuit for the computation of y ↦ ϕ(a, y) resulting from a hypothetical b.c. circuit for ϕ has to be transformed into a small one with bounded coefficients.
This can be achieved with a geometric rigidity argument by choosing a vector a at random according to the standard normal distribution in a suitable linear subspace of C^m (Lemma 4.2). In the case of matrix multiplication, [Raz 2002] proceeded by applying a geometric rigidity bound to the resulting linear problem via the Hoffman-Wielandt inequality. This approach does not yield good enough bounds in our situation, where we have to estimate the complexity of structured random matrices; in the case of the convolution these are circulant matrices. Instead, we treat the arising linear problem by extending Morgenstern's bound in a new way. We define the r-mean square volume of a complex matrix A, which turns out to be the square root of the r-th elementary symmetric function in the squares of the singular values of A. An important property of this quantity is that it is invariant under multiplication with unitary matrices from the left or the right. We prove that the logarithm of the r-mean square volume provides a lower bound on the b.c. complexity of the matrix A (Proposition 3.1). This implies that the logarithm of the product of the largest r singular values is a lower bound on the b.c. complexity.

We also study an extension of the bounded coefficient model of computation obtained by allowing a limited number of help gates corresponding to scalar multiplications with unbounded constants. We show that our proof technique is robust in the sense that it still allows us to prove n log n lower bounds if the number of help gates is restricted to (1 − ε)n for fixed ε > 0. This is achieved by an extension of the mean
square volume bound (Proposition 6.1), which is related to the spectral lemma in [Chazelle 1998]. The proof is based on some matrix perturbation arguments.

From the lower bound for the cyclic convolution, we obtain nonlinear lower bounds for polynomial multiplication, inversion of power series, and polynomial division with remainder by noting that the well-known reductions between these problems [Bürgisser et al. 1997] preserve the b.c. property. These lower bounds are again optimal up to order of magnitude.

1.1 Organization of the paper

In Section 2, we introduce the model of computation and discuss known facts about singular values and matrix rigidity. We also introduce some notation and present auxiliary results related to (complex) Gaussian random vectors. In Section 3 we first recall previously known lower bounds for b.c. linear circuits. Then we introduce the mean square volume of a matrix and prove an extension of Morgenstern's bound in terms of this quantity. Section 4 contains the statement and proof of our main theorem, the lower bound on cyclic convolution. In Section 5, we derive lower bounds for polynomial multiplication, inversion of power series, and division with remainder. Finally, in Section 6 we show that our results can be extended to the case where a limited number of unbounded scalar multiplications (help gates) is allowed.

2. PRELIMINARIES

We start this section by giving a short introduction to the model of computation.

2.1 The model of computation

We base our arguments on the model of algebraic straight-line programs over C, which are often called arithmetic circuits in the literature. For details on this model we refer to Chapter 4 of [Bürgisser et al. 1997]. By a result in [Strassen 1973b], we may exclude divisions without loss of generality.

Definition 2.1. A straight-line program Γ expecting inputs of length n is a sequence (Γ_1, . . .
, Γ_r) of instructions Γ_s = (ω_s; i_s, j_s) with ω_s ∈ {∗, +, −}, or Γ_s = (ω_s; i_s) with ω_s ∈ C, where the integers i_s, j_s satisfy −n < i_s, j_s < s. A sequence of polynomials b_{−n+1}, . . . , b_r is called the result sequence of Γ on the input variables a_1, . . . , a_n if b_s = a_{n+s} for −n < s ≤ 0, and for 1 ≤ s ≤ r, b_s = b_{i_s} ω_s b_{j_s} if Γ_s = (ω_s; i_s, j_s) and b_s = ω_s b_{i_s} if Γ_s = (ω_s; i_s). Γ is said to compute a set of polynomials F on input a_1, . . . , a_n if the elements of F are among those of the result sequence of Γ on that input. The size S(Γ) of Γ is the number r of its instructions.

In the sequel we refer to such straight-line programs briefly as circuits. A circuit in which scalar multiplication is restricted to scalars of absolute value at most 2 is called a bounded coefficient circuit (b.c. circuit for short). Of course, the bound of 2 could be replaced by any other fixed bound. Any circuit can be transformed into a b.c. circuit by replacing a multiplication with a scalar λ by at most log |λ| additions and a multiplication with a scalar of absolute value at most 2. Unless otherwise stated, log always refers to logarithms to base 2. We now introduce restricted notions of circuits, designed for computing linear and bilinear maps.
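As an illustration, Definition 2.1 can be turned into a small interpreter. The following Python sketch (the instruction encoding and helper names are ours, not from the paper) evaluates a straight-line program on concrete inputs and checks the b.c. restriction on scalar multiplications:

```python
def evaluate(program, inputs):
    """Evaluate a straight-line program (Definition 2.1) on concrete inputs.

    `program` is a list of instructions: ('+', i, j), ('-', i, j), ('*', i, j)
    for arithmetic, or ('scale', c, i) for multiplication with the scalar c.
    Indices address earlier results: -n+1..0 address the n inputs via
    b_s = a_{n+s}, while s >= 1 addresses the result of instruction s.
    """
    n = len(inputs)
    results = {s: inputs[n + s - 1] for s in range(-n + 1, 1)}  # b_s = a_{n+s}
    for s, instr in enumerate(program, start=1):
        op = instr[0]
        if op == 'scale':
            _, c, i = instr
            results[s] = c * results[i]
        else:
            _, i, j = instr
            a, b = results[i], results[j]
            results[s] = a + b if op == '+' else a - b if op == '-' else a * b
    return results

def is_bounded_coefficient(program, bound=2.0):
    """Check the b.c. restriction: every scalar has absolute value at most 2."""
    return all(abs(instr[1]) <= bound for instr in program if instr[0] == 'scale')

# Compute (x + y) * x on inputs (x, y) = (3, 4); indices -1 and 0 are x and y.
prog = [('+', -1, 0), ('*', 1, -1)]
assert evaluate(prog, [3, 4])[2] == 21
assert is_bounded_coefficient(prog)
```

The size S(Γ) of the program is simply `len(prog)`.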
Definition 2.2. A circuit Γ = (Γ_1, . . . , Γ_r) expecting inputs X_1, . . . , X_n is called a linear circuit if ω_s ∈ {+, −} for every instruction Γ_s = (ω_s; i_s, j_s), or ω_s ∈ C if the instruction is of the form (ω_s; i_s). A circuit on inputs X_1, . . . , X_m, Y_1, . . . , Y_n is called a bilinear circuit if its sequence of instructions can be partitioned as Γ = (Γ^(1), Γ^(2), Γ^(3), Γ^(4)), where

(1) Γ^(1) is a linear circuit with the X_i as inputs,
(2) Γ^(2) is a linear circuit with the Y_j as inputs,
(3) each instruction from Γ^(3) has the form (∗; i, j), with Γ_i ∈ Γ^(1) and Γ_j ∈ Γ^(2),
(4) Γ^(4) is a linear circuit with the previously computed results of Γ^(3) as inputs.
In other words, Γ^(1) and Γ^(2) compute linear functions f_1, . . . , f_k in the X_i and g_1, . . . , g_ℓ in the Y_j. Γ^(3) then multiplies the f_i with the g_j, and Γ^(4) computes linear combinations of the products f_i g_j. It is clear that linear circuits compute linear maps and that bilinear circuits compute bilinear maps. On the other hand, it can be shown that any linear (bilinear) map can be computed by a linear (bilinear) circuit such that the size increases at most by a constant factor (cf. [Bürgisser et al. 1997, Theorem 13.1, Proposition 14.1]). This remains true when considering bounded coefficient circuits, as can easily be checked. From now on, we will only be concerned with bounded coefficient circuits.

Definition 2.3. By the b.c. complexity C(ϕ) of a bilinear map ϕ : C^m × C^n → C^p we understand the size of a smallest b.c. bilinear circuit computing ϕ. By the b.c. complexity C(ϕ_A) of a linear map ϕ_A : C^n → C^m (or of the corresponding matrix A ∈ C^{m×n}), we understand the size of a smallest b.c. linear circuit computing ϕ_A. By abuse of notation, we also write C(F) for the smallest size of a b.c. circuit computing a set F of polynomials from the variables. (There is no serious danger of confusion arising from this, since these complexity notions differ at most by a constant factor.)

Let ϕ : C^m × C^n → C^p be a bilinear map described by ϕ_k(X, Y) = Σ_{i,j} a_{ijk} X_i Y_j. Assuming |a_{ijk}| ≤ 2, it is clear that C(ϕ) ≤ 3mnp. Therefore, if f_1, . . . , f_k are the linear maps computed on the first set of inputs by an optimal b.c. bilinear circuit for ϕ, we have k ≤ S(Γ) ≤ 3mnp. The complexity of a bilinear map ϕ can be related to the complexity of the associated linear map ϕ(a, −), where a ∈ C^m. We have taken the idea behind the following lemma from [Raz 2002].

Lemma 2.4. Let ϕ : C^m × C^n → C^p be a bilinear map and Γ be a b.c. bilinear circuit computing ϕ. If f_1, . . . , f_k are the linear maps computed by the circuit on the first set of inputs, then for all a ∈ C^m:

C(ϕ(a, −)) ≤ S(Γ) + p log(max_j |f_j(a)|).
Proof. Let a ∈ C^m be chosen and set γ = max_j |f_j(a)|. Transform the circuit Γ into a linear circuit Γ′ by the following steps:

(1) replace the first argument x of the input by a,
(2) replace each multiplication by f_i(a) with a multiplication by 2γ^{−1} f_i(a),
(3) multiply each output by γ/2, simulating this with at most log(γ/2) additions and one multiplication with a scalar of absolute value at most 2.

This yields a b.c. linear circuit computing the map ϕ(a, −) : C^n → C^p. Since there are p outputs, the size increases by at most p log γ. □

2.2 Singular values and matrix rigidity

The Singular Value Decomposition (SVD) is one of the most important matrix decompositions in numerical analysis. Lately, it has also come to play a prominent role in proving lower bounds for linear circuits [Chazelle 1998; Lokam 1995; Raz 2002]. In this section, we present some basic facts about singular values and show how they relate to notions of matrix rigidity. For a more detailed account of the SVD, we refer to [Golub and Van Loan 1996]. We also find [Courant and Hilbert 1931, Chapt. 1, Sect. 4] a useful reference.

The singular values σ_1 ≥ . . . ≥ σ_{min{m,n}} of A ∈ C^{m×n} can be defined as the square roots of the eigenvalues of the hermitian matrix AA^∗. Alternatively, they can be characterized as follows:

σ_{r+1} = min{ ||A − B||_2 | B ∈ C^{m×n}, rk(B) ≤ r },

where || · ||_2 denotes the matrix 2-norm. An important consequence is the Courant–Fischer min-max theorem, stating

σ_{r+1} = min_{codim V = r} max_{x ∈ V \ {0}} ||Ax||_2 / ||x||_2.
This description implies the following useful fact from matrix perturbation theory:

σ_{r+h}(A) ≤ σ_r(A + E)    (1)
if the matrix E has rank at most h.

More generally, for any metric d on C^{m×n} (or R^{m×n}) and 1 ≤ r ≤ min{m, n}, we can define the r-rigidity of a matrix A to be the distance of A to the set of all matrices of rank at most r with respect to this metric:

rig_{d,r}(A) = min{ d(A, B) | B ∈ C^{m×n}, rk(B) ≤ r }.

Using the Hamming metric, we obtain the usual matrix rigidity as introduced in [Valiant 1977]. On the other hand, using the metric induced by the (1,2)-norm ||A||_{1,2} := max_{||x||_1 = 1} ||Ax||_2, we obtain the following geometric notion of rigidity, as introduced in [Raz 2002]:

rig_r(A) = min_{dim V = r} max_{1≤i≤n} dist(a_i, V).

Here, the a_i are the column vectors of A ∈ C^{m×n} and dist denotes the usual euclidean distance. Notions of rigidity can be related to one another the same way the underlying norms can. In particular, we have the following relationship between the geometric rigidity and the singular values:

(1/√n) σ_{r+1}(A) ≤ rig_r(A) ≤ σ_{r+1}(A).
The proofs of these inequalities are based on well known inequalities for matrix norms. For instance, if B is a matrix of rank at most r with columns b_i, we have

||A − B||²_{1,2} = max_i ||a_i − b_i||²_2 ≥ (1/n) Σ_{i=1}^n ||a_i − b_i||²_2 ≥ (1/n) ||A − B||²_2 ≥ (1/n) σ²_{r+1},

which shows the left inequality.
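Both inequalities are easy to check numerically. The following numpy sketch (our construction; the candidate subspace is the span of the top r left singular vectors) verifies that this subspace certifies rig_r(A) ≤ σ_{r+1}(A), while the witness value remains above the lower bound σ_{r+1}(A)/√n:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 7, 5, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A)
P = U[:, :r] @ U[:, :r].T                   # projector onto top-r left singular subspace
dists = np.linalg.norm(A - P @ A, axis=0)   # distance of each column a_i to that subspace
witness = dists.max()                       # certifies rig_r(A) <= witness

assert witness <= s[r] + 1e-9               # upper bound: rig_r(A) <= sigma_{r+1}(A)
assert s[r] / np.sqrt(n) <= witness + 1e-9  # consistent with the lower bound
```

The upper bound holds because the distance of each column to this subspace is at most the spectral norm of A minus its best rank-r approximation, which is exactly σ_{r+1}(A).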
2.3 Complex Gaussian vectors

A random vector X = (X_1, . . . , X_n) in R^n is called standard Gaussian iff its components X_i are i.i.d. standard normal distributed. It is clear that an orthogonal transformation of such a random vector is again standard Gaussian. Throughout this paper, we will be working with random vectors Z assuming values in C^n. However, by identifying C^n with R^{2n}, we can think of Z as a 2n-dimensional real random vector. In particular, it makes sense to say that such a Z is (standard) Gaussian in C^n.

Let U be an r-dimensional linear subspace of C^n. We say that a random vector Z with values in U is standard Gaussian in U iff for some orthonormal basis b_1, . . . , b_r of U we have Z = Σ_j ζ_j b_j, where the random vector (ζ_j) of the components is standard Gaussian in C^r. It is easy to see that this description does not depend on the choice of the orthonormal basis. In fact, the transformation of a standard Gaussian vector by a unitary matrix is again standard Gaussian, since a unitary transformation C^r → C^r induces an orthogonal transformation R^{2r} → R^{2r}. The easy proof of the following lemma is left to the reader.

Lemma 2.5. Let (Z_1, . . . , Z_n) be standard Gaussian in C^n. Consider a complex linear combination S = f_1 Z_1 + . . . + f_n Z_n with f = (f_1, . . . , f_n) ∈ C^n. Then the real and imaginary parts of S are independent and normal distributed, each with mean 0 and variance ||f||². Moreover, T := |S|²/(2||f||²) is exponentially distributed with parameter 1. That is, the density function is e^{−t} for t ≥ 0, and the mean and the variance of T are both equal to 1.

2.4 Two useful inequalities

Let X, Y be i.i.d. standard normal random variables and set γ := 1 − E[log X²] and θ := E[log²(X² + Y²)]. Evaluating the corresponding integrals yields

γ = −(1/√π) ∫₀^∞ t^{−1/2} e^{−t} log t dt ≈ 2.83,
θ = (1/2) ∫₀^∞ e^{−t/2} log² t dt ≈ 3.45.

Lemma 2.6. Let Z be a centered Gaussian variable with complex values.
Then

0 ≤ log E[|Z|²] − E[log |Z|²] ≤ γ,    Var(log |Z|²) ≤ θ.

Proof. By a principal axis transformation, we may assume that Z = λ_1 X + iλ_2 Y with independent standard normal X, Y. The difference ∆ := log E[|Z|²] − E[log |Z|²] is nonnegative, since log is concave (Jensen's inequality). By linearity of the mean, ∆ as well as Var(log |Z|²) are invariant under multiplication of Z with
scalars. We may therefore w.l.o.g. assume that 1 = λ_1 ≥ λ_2. From this we see that

log E[|Z|²] = log E[X² + λ_2² Y²] ≤ log E[X² + Y²] = 1,
E[log |Z|²] = E[log(X² + λ_2² Y²)] ≥ E[log X²] = 1 − γ,

which implies the first claim. The estimates

Var(log |Z|²) ≤ E[log² |Z|²] ≤ E[log²(X² + Y²)] = θ

prove the second claim.
□
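The numerical values of γ and θ can be cross-checked against closed forms: for standard normal X one has E[ln X²] = −(γ_E + ln 2), and X² + Y² = 2E with E exponentially distributed, so that E[ln E] = −γ_E and Var(ln E) = π²/6, where γ_E is the Euler–Mascheroni constant. Converting to base-2 logarithms gives the constants above. A short numpy sketch with a Monte Carlo cross-check:

```python
import numpy as np

# Closed forms (standard facts about ln X^2 and ln Exp(1); log below is base 2):
eg, ln2 = np.euler_gamma, np.log(2)
gamma = 1 + (eg + ln2) / ln2                          # = 1 - E[log X^2]
theta = ((ln2 - eg) ** 2 + np.pi ** 2 / 6) / ln2**2   # = E[log^2 (X^2 + Y^2)]
print(round(gamma, 2), round(theta, 2))               # 2.83 3.45

# Monte Carlo cross-check with a fixed seed.
rng = np.random.default_rng(0)
X, Y = rng.standard_normal((2, 1_000_000))
assert abs(gamma - (1 - np.mean(np.log2(X**2)))) < 0.05
assert abs(theta - np.mean(np.log2(X**2 + Y**2) ** 2)) < 0.05
```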
3. THE MEAN SQUARE VOLUME BOUND

Morgenstern's bound [Morgenstern 1973] states that C(A) ≥ log |det(A)| for a square matrix A; see also [Bürgisser et al. 1997, Chapter 13] for details. We are going to study several generalizations of this bound.

Let A ∈ C^{m×n} be a matrix. For an r-subset I ⊆ [m] := {1, . . . , m} let A_I denote the submatrix of A consisting of the rows indexed by I. The Gramian determinant det A_I A_I^∗ can be interpreted as the square of the volume of the parallelepiped spanned by the rows of A_I (A^∗ denotes the conjugate transpose of A). [Raz 2002] defined the r-volume of A by

vol_r(A) := max_{|I|=r} (det A_I A_I^∗)^{1/2}

and observed that the proof of Morgenstern's bound extends to the following r-volume bound:

C(A) ≥ log vol_r(A).    (2)

Moreover, [Raz 2002] related this quantity to the geometric rigidity as follows: vol_r(A) ≥ (rig_r(A))^r, which implies the rigidity bound

C(A) ≥ r log rig_r(A).    (3)
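Morgenstern's original bound is easy to evaluate for the DFT matrix: |det DFT_n| = n^{n/2}, so C(DFT_n) ≥ (n/2) log n, the bound quoted in the introduction. A quick numpy sketch:

```python
import numpy as np

n = 16
jk = np.outer(np.arange(n), np.arange(n))
F = np.exp(2j * np.pi * jk / n)              # DFT_n = (omega^{jk}) with omega = e^{2*pi*i/n}

# Morgenstern: C(A) >= log |det A|; for DFT_n, |det| = n^{n/2},
# which gives the (1/2) n log n lower bound for the Discrete Fourier Transform.
log_det = np.log2(np.abs(np.linalg.det(F)))
assert abs(log_det - 0.5 * n * np.log2(n)) < 1e-6
```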
For our purposes it will be important to work with a variant of the r-volume that is completely invariant under unitary transformations. Instead of taking the maximum of the volumes (det A_I A_I^∗)^{1/2}, we use the sum of the squares. We define the r-mean square volume msv_r(A) of A ∈ C^{m×n} by

msv_r(A) := ( Σ_{|I|=r} det A_I A_I^∗ )^{1/2} = ( Σ_{|I|=|J|=r} |det A_{I,J}|² )^{1/2}.

Hereby, A_{I,J} denotes the r × r submatrix consisting of the rows indexed by I and the columns indexed by J. The second equality is a consequence of the Binet–Cauchy formula det A_I A_I^∗ = Σ_{|J|=r} |det A_{I,J}|², see [Bellman 1997, Chapter 4]. The choice of the L²-norm instead of the maximum norm results in the following inequality:

vol_r(A) ≤ msv_r(A) ≤ √(m choose r) · vol_r(A).    (4)
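For small matrices the definition can be checked by brute force over all r × r minors. The following sketch (helper names are ours) also confirms that msv_r(A)² equals the r-th elementary symmetric function of the squared singular values, the identity derived in equation (5):

```python
import numpy as np
from itertools import combinations

def msv(A, r):
    """r-mean square volume: square root of the sum of |det A_{I,J}|^2
    over all r x r submatrices (rows I, columns J)."""
    m, n = A.shape
    total = sum(abs(np.linalg.det(A[np.ix_(I, J)])) ** 2
                for I in combinations(range(m), r)
                for J in combinations(range(n), r))
    return np.sqrt(total)

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
r = 2

# msv_r(A)^2 = e_r(sigma_1^2, ..., sigma_p^2), the r-th elementary symmetric function.
s2 = np.linalg.svd(A, compute_uv=False) ** 2
e_r = sum(np.prod([s2[i] for i in I]) for I in combinations(range(len(s2)), r))
assert abs(msv(A, r) ** 2 - e_r) < 1e-8
# ... and in particular msv_r(A)^2 >= sigma_1^2 * ... * sigma_r^2.
assert msv(A, r) ** 2 >= np.prod(sorted(s2)[::-1][:r]) - 1e-8
```

Unitary invariance follows immediately, since the singular values are unitarily invariant.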
The mean square volume has the following nice properties, which are all easy to verify:

msv_r(A) = msv_r(A^∗),  msv_r(λA) = |λ|^r msv_r(A),  msv_r(A) = msv_r(UAV),

where λ ∈ C and U and V are unitary matrices of the correct format. Note also that msv_n(A) = |det A| for A ∈ C^{n×n}. The unitary invariance allows us to express the mean square volume of A in terms of the singular values σ_1 ≥ . . . ≥ σ_p of A, p := min{m, n}. It is well known [Golub and Van Loan 1996] that there are unitary matrices U ∈ C^{m×m} and V ∈ C^{n×n} such that U^∗AV = diag(σ_1, . . . , σ_p). Hence we obtain

msv²_r(A) = msv²_r(diag(σ_1, . . . , σ_p)) = Σ_{|I|=r} Π_{i∈I} σ_i² ≥ σ_1² σ_2² · · · σ_r²,    (5)

where I runs over all r-subsets of [p]. Hence, the square of the r-mean square volume of a matrix is the r-th elementary symmetric polynomial in the squares of its singular values. Combining the r-volume bound (2) with (4), we obtain the following mean square volume bound.

Proposition 3.1. For a matrix A ∈ C^{m×n} and r ∈ N with 1 ≤ r ≤ min{m, n} we have

C(A) ≥ log msv_r(A) − m/2.    (6)

Remark 3.2. The r-volume can be seen as the (1,2)-norm of the map Λ^r A induced by A between the exterior algebras Λ^r C^n and Λ^r C^m (see e.g. [Lang 1984] for background on multilinear algebra). Similarly, the mean square volume can be interpreted as the Frobenius norm of Λ^r A. The unitary invariance of the mean square volume also follows from the fact that Λ^r is equivariant with respect to unitary transformations and that the Frobenius norm is invariant under such.

4. A LOWER BOUND ON CYCLIC CONVOLUTION

In this section we use the mean square volume bound (6) to prove a lower bound on the bilinear map of cyclic convolution.

Let f = Σ_{i=0}^{n−1} a_i x^i and g = Σ_{i=0}^{n−1} b_i x^i be polynomials in C[X]. The cyclic convolution of f and g is the polynomial h = Σ_{i=0}^{n−1} c_i x^i given by the product of f and g in the quotient ring C[X]/(X^n − 1). More explicitly:

c_k = Σ_{i+j ≡ k mod n} a_i b_j,  0 ≤ k < n.
Cyclic convolution is a bilinear map on the coefficients. For a fixed polynomial with coefficient vector a = (a_0, . . . , a_{n−1}), this map turns into a linear transformation with the circulant matrix

          ( a_0      a_1      . . .  a_{n−1} )
          ( a_{n−1}  a_0      . . .  a_{n−2} )
Circ(a) = ( . . .    . . .    . . .  . . .   )
          ( a_1      a_2      . . .  a_0     )
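The spectral facts used below are easy to verify numerically: the DFT diagonalizes circulant matrices, so the singular values of Circ(a) are the absolute values of the components of DFT_n a, and the DFT turns cyclic convolution into componentwise multiplication. A numpy sketch (helper names are ours):

```python
import numpy as np

def circulant(a):
    """Circ(a) as displayed above: row k is a cyclically shifted right by k."""
    return np.array([np.roll(a, k) for k in range(len(a))])

def cyclic_convolution(a, b):
    """c_k = sum of a_i * b_j over i + j = k (mod n)."""
    n = len(a)
    c = np.zeros(n, dtype=complex)
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c

rng = np.random.default_rng(0)
n = 8
a, b = rng.standard_normal((2, n))

# Singular values of Circ(a) = absolute values of the DFT of a.
sv = np.linalg.svd(circulant(a), compute_uv=False)
assert np.allclose(np.sort(sv), np.sort(np.abs(np.fft.fft(a))))

# Convolution theorem: DFT(conv(a, b)) = DFT(a) * DFT(b) componentwise.
assert np.allclose(np.fft.fft(cyclic_convolution(a, b)),
                   np.fft.fft(a) * np.fft.fft(b))
```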
Let DFT_n = (ω^{jk})_{0≤j,k<n}, with ω a primitive n-th root of unity, denote the matrix of the Discrete Fourier Transform. It is well known that the DFT diagonalizes circulant matrices; the vector of eigenvalues of Circ(a) is

λ = DFT_n a.    (7)

Theorem 4.1. The bounded coefficient complexity of the cyclic convolution conv_n satisfies

C(conv_n) ≥ (1/12) n log n − O(n log log n).

Lemma 4.2. Let f_1, . . . , f_k be linear forms on C^n, 1 ≤ r < n, and R = rig_{n−r}(f_1, . . . , f_k). If a is standard Gaussian in a suitable linear subspace U ⊆ C^n of dimension r, then with probability greater than 1/2,

max_{1≤i≤k} |f_i(a)| ≤ √(2 ln(4k)) R.

Lemma 4.3. Let a be standard Gaussian in a linear subspace U ⊆ C^n of dimension r. Then, with probability at least 1/2,

C(Circ(a)) ≥ (1/2) r log n − cn,

where c = (1/2)(2 + γ + √(2θ)) ≈ 3.73, and γ, θ are the constants introduced in Section 2.4. We postpone the proof of this lemma and proceed with the proof of the main theorem.

Proof. (of Theorem 4.1) Let Γ be a b.c. bilinear circuit for conv_n which computes the linear forms f_1, . . . , f_k on the first set of inputs. Fix 1 ≤ r < n, to be specified later, and set R = rig_{n−r}(f_1, . . . , f_k). By Lemma 4.2 and Lemma 4.3 there exists an a ∈ C^n such that the following conditions hold:

(1) max_{1≤i≤k} |f_i(a)| ≤ √(2 ln(4k)) R,
(2) C(Circ(a)) ≥ (1/2) r log n − cn.

By Lemma 2.4 and the fact that k ≤ 3n³, we get

S(Γ) + n log(√(2 ln(12n³)) R) ≥ C(Circ(a)).    (8)
On the other hand, the rigidity bound (3) implies the following upper bound on R in terms of S(Γ):

S(Γ) ≥ C(f_1, . . . , f_k) ≥ (n − r) log R.

By combining this with (8) and using the second condition above, we obtain

(1 + n/(n − r)) S(Γ) ≥ (r/2) log n − O(n log log n).

Setting ε = r/n yields

S(Γ) ≥ (ε(1 − ε)/(2(2 − ε))) n log n − O(n log log n).
A simple calculation shows that the coefficient of the n log n term attains its maximum ≈ 0.086 for ε ≈ 0.58. Choosing ε = 1/2 for simplicity of exposition finishes the proof. □

Before going into the proof of Lemma 4.3, we provide a lemma bounding the deviations of products of correlated normal random variables.

Lemma 4.4. Let Z = (Z_1, . . . , Z_r) be a centered Gaussian vector in C^r. Define the complex covariance matrix of Z by Σ_r := (E(Z_j Z̄_k))_{j,k} and put δ := 2^{−(γ+√(2θ))} ≈ 0.02. Then we have E(|Z_1|² · · · |Z_r|²) ≥ det Σ_r and

P[ |Z_1|² · · · |Z_r|² ≥ δ^r det Σ_r ] > 1/2.
Proof. To prove the bound on the expectation, decompose Z_r = ξ + η into a component ξ in the span of Z_1, . . . , Z_{r−1} plus a component η orthogonal to this span in the Hilbert space of square integrable random variables with respect to the inner product defined by the joint probability density of Z. Therefore |Z_r|² = |ξ|² + ξη̄ + ξ̄η + |η|², hence by independence

E(|Z_1|² · · · |Z_{r−1}|² |Z_r|²) = E(|Z_1|² · · · |Z_{r−1}|² |ξ|²) + E(|Z_1|² · · · |Z_{r−1}|²) E(|η|²) ≥ E(|Z_1|² · · · |Z_{r−1}|²) E(|η|²).

By interpreting the Gramian determinant det Σ_r as the square volume of the parallelepiped spanned by the random vectors Z_1, . . . , Z_r in the Hilbert space, we obtain det Σ_r = det Σ_{r−1} E(|η|²). The desired bound on the expectation E(|Z_1|² · · · |Z_r|²) ≥ det Σ_r thus follows by induction on r. Noting that E(|Z_r|²) ≥ E(|η|²), we also conclude from the above equation that

E(|Z_1|²) · · · E(|Z_r|²) ≥ det Σ_r.    (9)

In order to prove the probability estimate for the random product |Z_1|² · · · |Z_r|², we first transform the product into a sum by taking logarithms. For every ε > 0, Chebychev's inequality yields the bound

P[ |(1/r) Σ_{j=1}^r (log |Z_j|² − E[log |Z_j|²])| ≥ ε ] ≤ Var(Σ_{j=1}^r log |Z_j|²) / (ε² r²).    (10)
For the variance, we have by Lemma 2.6

Var(Σ_{j=1}^r log |Z_j|²) = Σ_{j,k} Cov(log |Z_j|², log |Z_k|²) ≤ Σ_{j,k} √(Var(log |Z_j|²) Var(log |Z_k|²)) ≤ r² θ.

Setting ε = √(2θ) in this estimate and exponentiating in (10), we obtain

P[ |Z_1|² · · · |Z_r|² ≤ 2^{−εr + Σ_{j=1}^r E[log |Z_j|²]} ] ≤ 1/2.    (11)
By combining the bound (9) with Lemma 2.6, we get

log det Σ_r ≤ Σ_{i=1}^r log E[|Z_i|²] ≤ γr + Σ_{i=1}^r E[log |Z_i|²].

Hence we conclude from (11) that

P[ |Z_1|² · · · |Z_r|² ≤ 2^{−(ε+γ)r} det Σ_r ] ≤ 1/2,

from which the lemma follows. □
Proof. (of Lemma 4.3) By equation (7) we have λ = DFT_n a, and the singular values of the circulant Circ(a) are given by the absolute values of the components of λ. Setting α = n^{−1/2} λ = n^{−1/2} DFT_n a, we obtain for the r-mean square volume by (5)

msv²_r(Circ(a)) = n^r Σ_{|I|=r} Π_{i∈I} |α_i|².    (12)
Now let a be a standard Gaussian vector in the subspace U of dimension r. Let W be the image of U under the unitary transformation n^{−1/2} DFT_n. As a unitary transformation of a, α is standard Gaussian in the subspace W (cf. Section 2.3). This means that there is an orthonormal basis b_1, . . . , b_r of W such that

α = β_1 b_1 + · · · + β_r b_r,
where (β_i) is standard Gaussian in C^r. Let B ∈ C^{n×r} denote the matrix with columns b_1, . . . , b_r and let B_I be the submatrix of B consisting of the rows indexed by I, for I ⊆ [n] with |I| = r. Setting α_I = (α_i)_{i∈I}, we have α_I = B_I β. The complex covariance matrix of α_I is given by Σ := E[α_I α_I^∗] = B_I B_I^∗, hence det Σ = |det B_I|².
We remark that |det B_I|² can be interpreted as the volume contraction ratio of the projection C^n → C^I, α ↦ α_I, restricted to W. For later purposes we also note that E(|α_i|²) = Σ_j |B_{ij}|² ≤ 1. By the Binet–Cauchy formula and the orthonormality of the basis (b_i) we get

Σ_{|I|=r} |det B_I|² = det((⟨b_i, b_j⟩)_{1≤i,j≤r}) = 1.
Therefore, we can choose an index set I such that

|det B_I|² ≥ (n choose r)^{−1} ≥ 2^{−n}.

By applying Lemma 4.4 to the random vector α_I and using (12), we get that with probability at least 1/2,

msv²_r(Circ(a)) ≥ n^r δ^r det Σ ≥ n^r δ^r 2^{−n},    (13)
where δ = 2^{−(γ+√(2θ))}. The mean square volume bound (6) implies that

C(Circ(a)) ≥ log msv_r(Circ(a)) − n/2 ≥ (1/2) r log n − (1/2)(2 + log δ^{−1}) n,

with probability at least 1/2. This proves the lemma. □
5. MULTIPLICATION AND DIVISION OF POLYNOMIALS

By reducing cyclic convolution to several other important computational problems, we are going to derive lower bounds of order n log n for these problems. These bounds are optimal up to a constant factor; however, we did not attempt to optimize these factors.

5.1 Polynomial multiplication

Let f = Σ_{i=0}^{n−1} a_i x^i and g = Σ_{i=0}^{n−1} b_i x^i be polynomials in C[X] and fg = Σ_{i=0}^{2n−2} c_i x^i. Clearly, we can obtain the coefficients of the cyclic convolution of f and g by adding c_{k+n} to c_k for 0 ≤ k < n. This observation and Theorem 4.1 immediately imply the following corollary.

Corollary 5.1. The bounded coefficient complexity of the multiplication of polynomials of degree less than n is at least (1/12) n log n − O(n log log n).

5.2 Division with remainder

We will first derive a lower bound for the inversion of power series mod X^{n+1} and then use this to get a lower bound for the division of polynomials. Let C[[X]] denote the ring of formal power series in the variable X. We study the problem of computing the first n coefficients b_1, . . . , b_n of the inverse

f^{−1} = 1 + Σ_{k=1}^∞ b_k X^k

in C[[X]] of the polynomial f = 1 − Σ_{i=1}^n a_i X^i, given by the coefficients a_i. We remark that the b_k are polynomials in the a_i, which are recursively given by

b_0 := 1,  b_k = Σ_{i=0}^{k−1} a_{k−i} b_i.
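The recursion is easy to validate numerically: multiplying the computed truncated inverse back by f must give 1 modulo X^{n+1}. A short numpy sketch (coefficient conventions are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
a = rng.standard_normal(n + 1)
a[0] = 0.0                                     # a_0 is unused; f = 1 - sum_{i=1}^n a_i X^i

# Recursion from the text: b_0 = 1, b_k = sum_{i=0}^{k-1} a_{k-i} b_i.
b = np.zeros(n + 1)
b[0] = 1.0
for k in range(1, n + 1):
    b[k] = sum(a[k - i] * b[i] for i in range(k))

# Check f * f^{-1} = 1 mod X^{n+1}: f has ascending coefficients (1, -a_1, ..., -a_n).
f = np.concatenate(([1.0], -a[1:]))
prod = np.polynomial.polynomial.polymul(f, b)[:n + 1]
assert np.allclose(prod, np.eye(n + 1)[0])     # (1, 0, ..., 0)
```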
Note that the problem of inverting power series is not bilinear. [Sieveking 1972] and [Kung 1974] designed a b.c. circuit of size O(n log n) solving this problem. We now prove a corresponding lower bound on the b.c. complexity of this problem by reducing polynomial multiplication to the inversion of power series.

Theorem 5.2. The map assigning to a_1, . . . , a_n the first n coefficients b_1, . . . , b_n of the inverse of f = 1 − Σ_{i=1}^n a_i X^i in the ring of formal power series has bounded coefficient complexity greater than (1/324) n log n − O(n log log n).

Proof. Put g = Σ_{i=1}^n a_i X^i. The equation

1 + Σ_{k=1}^∞ b_k X^k = 1/(1 − g) = Σ_{k=0}^∞ g^k
shows that g² is the homogeneous quadratic part of Σ_{k=1}^∞ b_k X^k in the variables a_i. Let Γ be an optimal b.c. circuit computing b_1, . . . , b_n. According to the proof in [Bürgisser et al. 1997, Theorem 7.1], there is a b.c. circuit of size at most 9 S(Γ) computing the homogeneous quadratic parts of the b_1, . . . , b_n with respect to the variables a_i. This leads to a b.c. circuit of size at most 9 S(Γ) computing the coefficients of the squared polynomial g². Now let m := ⌊n/3⌋, and assume that g = g_1 + X^{2m} g_2 with g_1, g_2 of degree smaller than m. Then

$$g^2 \;=\; g_1^2 + 2\,g_1 g_2\, X^{2m} + g_2^2\, X^{4m}.$$

By the assumption on the degrees there are no "carries", and we can therefore find the coefficients of the product polynomial g_1 g_2 among the middle terms of g². Thus we obtain a b.c. circuit for the multiplication of polynomials of degree m − 1. The theorem now follows from Corollary 5.1. □

We now show how to reduce the inversion of power series to the problem of dividing polynomials with remainder. The reduction in the proof of the following corollary is from [Strassen 1973a]; see also [Bürgisser et al. 1997, Section 2.5].

Corollary 5.3. Let f, g be polynomials with n = deg f ≥ m = deg g and g monic. Let q be the quotient and r the remainder of f divided by g, so that f = qg + r and deg r < deg g. The map assigning to the coefficients of f and g the coefficients of the quotient q and the remainder r has bounded coefficient complexity at least (1/324) n log n − O(n log log n).

Proof. Dividing f = X^{2n} by g = Σ_{i=0}^n a_i X^{n−i}, where a_0 = 1, we obtain

$$X^{2n} \;=\; \Big(\sum_{i=0}^{n} q_i X^i\Big)\Big(\sum_{i=0}^{n} a_i X^{n-i}\Big) \;+\; \sum_{i=0}^{n-1} r_i X^i.$$
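The identity established in this proof is easy to sanity-check numerically. The following sketch is not part of the paper: poly_divmod and series_inverse are ad-hoc helper names, and the coefficient vector a is an arbitrary example. It divides X^{2n} by g = Σ a_i X^{n−i} and compares the reversed quotient with the truncated inverse of Σ a_i X^i.

```python
import numpy as np

def poly_divmod(f, g):
    # Quotient and remainder of f by a monic g; polynomials are lists of
    # coefficients in increasing powers of X.
    f = list(f)
    dq = len(f) - len(g)
    q = [0.0] * (dq + 1)
    for i in range(dq, -1, -1):
        q[i] = f[i + len(g) - 1]        # leading coefficient of g is 1
        for j, c in enumerate(g):
            f[i + j] -= q[i] * c
    return q, f[:len(g) - 1]

def series_inverse(a, k):
    # First k coefficients of 1/A(X) for a power series A with A(0) = 1,
    # via the recursion b_i = -sum_{j>=1} a_j b_{i-j}.
    b = [0.0] * k
    b[0] = 1.0
    for i in range(1, k):
        b[i] = -sum(a[j] * b[i - j] for j in range(1, min(i, len(a) - 1) + 1))
    return b

n = 4
a = [1.0, 0.5, -0.25, 0.125, 0.0625]    # a_0 = 1, the rest arbitrary
g = a[::-1]                             # g = sum_i a_i X^(n-i), monic
f = [0.0] * (2 * n) + [1.0]             # f = X^(2n)
q, r = poly_divmod(f, g)
inv = series_inverse(a, n + 1)          # inverse of sum_i a_i X^i mod X^(n+1)
# q_i is the coefficient of X^(n-i) in the inverse, so inv reversed equals q
```

The helper series_inverse uses the naive quadratic recursion; the Sieveking–Kung circuit cited above achieves O(n log n) via Newton iteration.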
By substituting 1/X for X in the above equation and multiplying by X^{2n}, we get

$$1 \;=\; \Big(\sum_{i=0}^{n} q_i X^{n-i}\Big)\Big(\sum_{i=0}^{n} a_i X^{i}\Big) \;+\; \sum_{i=0}^{n-1} r_i X^{2n-i}.$$

Since the remainder term is now a multiple of X^{n+1}, we get

$$\Big(\sum_{i=0}^{n} a_i X^i\Big)^{-1} \;\equiv\; \sum_{i=0}^{n} q_i X^{n-i} \pmod{X^{n+1}}.$$
From this we see that the coefficients of the quotient are precisely the coefficients of the inverse of Σ_{i=0}^n a_i X^i modulo X^{n+1} in the ring of formal power series, and the proof is finished. □

6. UNBOUNDED SCALAR MULTIPLICATIONS

We extend our model of computation by allowing some instructions corresponding to scalar multiplications with constants of absolute value greater than two, briefly
called help gates in the sequel. If at most h help gates are allowed, we denote the corresponding bounded coefficient complexity by the symbol C_h. We are going to show that our proof technique is robust in the sense that it still yields n log n lower bounds if the number of help gates is restricted to (1 − ǫ)n for fixed ǫ > 0.

6.1 Extension of the mean square volume bound

As a first step we extend the mean square volume bounds (5) and (6) to deal with help gates.

Proposition 6.1. Assume A ∈ C^{m×n} has the singular values σ_1 ≥ . . . ≥ σ_p, where p := min{m, n}. For all integers s, h with 1 ≤ s ≤ p − h we have

$$C_h(A) \;\ge\; \sum_{i=h+1}^{h+s} \log \sigma_i - \frac{m}{2} + h \;\ge\; s \log \sigma_{h+s} - \frac{m}{2} + h.$$
Proof. Let Γ be a b.c. circuit with at most h help gates which computes the linear map corresponding to A. Without loss of generality, we may assume that Γ has exactly h help gates. Let g_i, i ∈ I, be the linear forms computed at the help gates of Γ. We transform the circuit Γ into a b.c. circuit Γ′ by replacing each help gate with a multiplication by zero. This new circuit is a b.c. circuit of size S(Γ′) = S(Γ) − h, computing a linear map corresponding to a matrix B ∈ C^{m×n}. The linear maps corresponding to A and B coincide on the orthogonal complement of span{g_i | i ∈ I} in C^n, therefore B = A + E for a matrix E of rank at most h. From the perturbation inequality (1) we obtain that σ_i(B) ≥ σ_{i+h}(A) for i ≤ p − h. By (5) this implies for s ≤ p − h that

$$\mathrm{msv}_s^2(B) \;\ge\; \sigma_1^2(B)\cdots\sigma_s^2(B) \;\ge\; \sigma_{h+1}^2(A)\cdots\sigma_{h+s}^2(A).$$

Applying the mean square volume bound to Γ′ and adding h for the help gates yields the claimed inequality. □

6.2 Maxima of Gaussian random vectors

Lemma 6.3. Let X = (X_1, . . . , X_n) be a centered Gaussian random vector with E(X_i²) ≤ 1 for all i. Then for every ǫ > 0,

$$\lim_{n\to\infty} \mathbb{P}\Big[\max_i X_i \;\ge\; \sqrt{2\ln n} + \epsilon\Big] \;=\; 0. \qquad (14)$$

The analogous statement holds for the maximum of the absolute values of the components of a centered complex Gaussian vector Z = (Z_1, . . . , Z_n) with E(|Z_i|²) ≤ 1, with the threshold √(2 ln n) + ǫ replaced by √2 (√(2 ln 2n) + ǫ).

Proof. For this we may assume that the components of X are uncorrelated. In fact, Slepian's inequality (see [Ledoux and Talagrand 1991]) implies that for centered Gaussian vectors X = (X_1, . . . , X_n) and Y = (Y_1, . . . , Y_n) we have

$$\mathbb{P}\Big[\max_i X_i \le u\Big] \;\le\; \mathbb{P}\Big[\max_i Y_i \le u\Big]$$

provided E(X_i²) = E(Y_i²) and E(X_i X_j) ≤ E(Y_i Y_j) for all i, j. We may also assume that all the X_i have variance 1, since the distribution function

$$F_\sigma(u) \;:=\; \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{u} \exp\Big(-\frac{t^2}{2\sigma^2}\Big)\,dt$$

of a centered normal random variable with variance σ² ≤ 1 satisfies F_1(u) ≤ F_σ(u) for all u ≥ 0. Hence, if X is a Gaussian vector with uncorrelated components X_i of variance σ_i² ≤ 1, we have

$$F_1(u)^n \;\le\; \prod_{i=1}^{n} F_{\sigma_i}(u) \;=\; \mathbb{P}\Big[\max_i X_i \le u\Big].$$

In the case where X_1, . . . , X_n are independent and standard normally distributed, we have according to [Cramér 1946] that

$$E\Big(\max_i X_i\Big) = \sqrt{2\ln n} + o(1), \qquad \mathrm{Var}\Big(\max_i X_i\Big) = \frac{\pi^2}{12\ln n}\,(1 + o(1)), \qquad n \to \infty,$$

and Claim (14) follows from Chebyshev's inequality. The second assertion follows from the first one applied to the Gaussian vector W with values in R^{2n} given by the real and imaginary parts of the Z_i (in some order). Note that max_{1≤i≤n} |Z_i| ≤ √2 max_{1≤j≤2n} |W_j|. □
6.3 Cyclic convolution and help gates

Our goal is to prove the following extension of Theorem 4.1.

Theorem 6.4. For fixed 0 < ǫ ≤ 1, the bounded coefficient complexity of the n-dimensional cyclic convolution conv_n with at most (1 − ǫ)n help gates is Ω(n log n).

The proof follows the same line of argumentation as in Section 4. We first state and prove an extension of Lemma 4.3.

Lemma 6.5. Let U ⊆ C^n be a subspace of dimension r and h ∈ N with h < r. For a standard Gaussian vector a in U, we have

$$\mathbb{P}\Big[\,C_h(\mathrm{Circ}(a)) \;\ge\; \frac{1}{2}(r-h)\log n - n(c + \log\log n)\Big] \;>\; \frac{1}{2}$$

for some constant c > 0.

Proof. As in the proof of Lemma 4.3, we assume that the random vector α = n^{−1/2} DFT_n a is standard Gaussian with values in some r-dimensional subspace W. Recall that the √n |α_i| are the singular values of Circ(a). We denote by |α^(1)| ≥ . . . ≥ |α^(n)| the components of α in order of decreasing absolute value; in particular, |α^(1)| = max_i |α^(i)|. Proposition 6.1 implies that

$$C_h(\mathrm{Circ}(a)) \;\ge\; \sum_{i=h+1}^{r} \log\big(\sqrt{n}\,|\alpha^{(i)}|\big) - \frac{n}{2} + h \;=\; \frac{1}{2}(r-h)\log n + \log \prod_{i=h+1}^{r} |\alpha^{(i)}| - \frac{n}{2} + h.$$
In the proof of Lemma 4.3 (13) we showed that msv_r²(Circ(a)) ≥ n^r δ^r 2^{−n} with probability at least 1/2. In the same way, one can show that with probability at least 3/4 we have msv_r²(Circ(a)) ≥ n^r c_1^n for some fixed constant c_1 > 0. From the estimate

$$\sum_{|I|=r} \prod_{i\in I} |\alpha_i|^2 \;\le\; 2^n \prod_{i=1}^{r} |\alpha^{(i)}|^2$$

we thus obtain that Π_{i=1}^r |α^(i)|² ≥ (c_1/2)^n with probability at least 3/4. By applying Lemma 6.3 to the centered Gaussian random variable α, we obtain that with probability at least 3/4

$$\max_i |\alpha^{(i)}|^2 \;=\; |\alpha^{(1)}|^2 \;\le\; c_2 \log n$$

for some fixed constant c_2 > 0. (Recall that E(|α^(i)|²) ≤ 1.) Altogether, we obtain that with probability at least 1/2 we have

$$\prod_{i=h+1}^{r} |\alpha^{(i)}|^2 \;\ge\; \frac{\prod_{i=1}^{r} |\alpha^{(i)}|^2}{|\alpha^{(1)}|^{2h}} \;\ge\; \Big(\frac{c_1}{2\,c_2 \log n}\Big)^{n}.$$
This completes the proof of the lemma. □
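The fact used at the start of this proof — that the singular values of Circ(a) are the √n |α_i|, i.e. the absolute values of the DFT of a — can be checked directly with numpy. This sketch is not from the paper and uses the convention Circ(a)_{ij} = a_{(i−j) mod n}:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
a = rng.standard_normal(n)
# circulant matrix Circ(a)[i, j] = a[(i - j) mod n]
C = np.array([[a[(i - j) % n] for j in range(n)] for i in range(n)])
sv = np.linalg.svd(C, compute_uv=False)       # singular values, descending
dft = np.fft.fft(a)                           # eigenvalues of Circ(a)
dft_mag = np.sort(np.abs(dft))[::-1]          # = sqrt(n) * |alpha_i|, sorted
# since Circ(a) is normal, sv and dft_mag coincide
```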
Proof (of Theorem 6.4). Let Γ be a b.c. bilinear circuit computing conv_n using at most h ≤ (1 − ǫ)n help gates, 0 < ǫ ≤ 1. Referring to the partition of instructions in Definition 2.2, we assume that Γ^(1) uses h_1 help gates, and that Γ^(2), Γ^(3), Γ^(4) use a total of h_2 help gates; thus h_1 + h_2 = h. Let f_1, . . . , f_k denote the linear forms computed by Γ^(1). Assume h_2 < r < n − h_1 and set R := rig_{n−r}(f_1, . . . , f_k). By Lemma 4.2 and Lemma 6.5 there exists an a ∈ C^n such that the following conditions hold:

(1) max_{1≤i≤k} log |f_i(a)| ≤ log(2 √(ln(4k)) R) ≤ log R + O(log log n),
(2) C_{h_2}(Circ(a)) ≥ (1/2)(r − h_2) log n − O(n log log n).

On the other hand, by Proposition 6.1 and using σ_{n−r}(f_1, . . . , f_k) ≥ R, we get

$$S(\Gamma) \;\ge\; C_{h_1}(f_1,\ldots,f_k) \;\ge\; (n - r - h_1)\log R - \frac{k}{2}.$$

The proof of Lemma 2.4 shows that

$$S(\Gamma) + n \max_{1\le i\le k} \log|f_i(a)| \;\ge\; C_{h_2}(\mathrm{Circ}(a)).$$

By combining all this we obtain

$$\Big(1 + \frac{n}{n-r-h_1}\Big)\, S(\Gamma) + \frac{nk}{2(n-r-h_1)} + O(n\log\log n) \;\ge\; \frac{1}{2}(r - h_2)\log n.$$

We now set r := ⌊(h_2 + n − h_1)/2⌋. Then r + h_1 ≤ (1 − ǫ/2)n and r − h_2 ≥ (ǫ/2)n − 1. By plugging this into the above inequality we obtain

$$\frac{\epsilon+2}{\epsilon}\, S(\Gamma) + \frac{k}{\epsilon} + O(n\log\log n) \;\ge\; \frac{\epsilon}{4}\, n\log n.$$

Let κ := ǫ²/8. If k ≤ κ n log n + n, then S(Γ) ≥ (ǫ²/(8(ǫ + 2))) n log n − O(n log log n). On the other hand, if k > κ n log n + n, then trivially

$$S(\Gamma) \;\ge\; C_{h_1}(f_1,\ldots,f_k) \;\ge\; k - n \;\ge\; \kappa\, n\log n.$$

This completes the proof of the theorem. □
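The singular value perturbation inequality (1), used in the proof of Proposition 6.1 in the form σ_i(A + E) ≥ σ_{i+h}(A) for rank(E) ≤ h, can also be sanity-checked numerically. This is a sketch with arbitrary dimensions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, h = 6, 9, 2
A = rng.standard_normal((m, n))
# a random perturbation of rank at most h
E = rng.standard_normal((m, h)) @ rng.standard_normal((h, n))
sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(A + E, compute_uv=False)
p = min(m, n)
# Weyl's inequality: a rank-h perturbation shifts the singular values
# down by at most h positions
ok = all(sB[i] >= sA[i + h] - 1e-10 for i in range(p - h))
```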
ACKNOWLEDGMENTS
We are grateful to Joachim von zur Gathen for bringing the paper [Raz 2002] to our attention. We thank Satyanarayana Lokam for suggesting to extend our lower bounds involving help gates from the linear to the bilinear case. We also thank Tom Schmitz and Mario Wschebor for useful discussions about probability. This work has been supported by the Forschungspreis 2002 der Universität Paderborn and by the Paderborn Institute for Scientific Computation (PaSCo).

REFERENCES

Bellman, R. 1997. Introduction to Matrix Analysis. SIAM, Philadelphia, PA.
Bürgisser, P., Clausen, M., and Shokrollahi, M. 1997. Algebraic Complexity Theory. Grundlehren der mathematischen Wissenschaften, vol. 315. Springer Verlag.
Chazelle, B. 1998. A spectral approach to lower bounds with applications to geometric searching. SIAM Journal on Computing 27(2), 545–556.
Courant, R. and Hilbert, D. 1931. Methoden der mathematischen Physik. I, Second ed. Springer-Verlag, Berlin.
Cramér, H. 1946. Mathematical Methods of Statistics. Princeton Mathematical Series, vol. 9. Princeton University Press.
Golub, G. H. and Van Loan, C. 1996. Matrix Computations. The Johns Hopkins University Press, Baltimore.
Kung, H. 1974. On computing reciprocals of power series. Num. Math. 22, 341–348.
Lang, S. 1984. Algebra, Second ed. Addison-Wesley.
Ledoux, M. and Talagrand, M. 1991. Probability in Banach Spaces. Ergebnisse der Mathematik und ihrer Grenzgebiete, 3. Folge, vol. 23. Springer Verlag.
Lokam, S. 1995. Spectral methods for matrix rigidity with applications to size-depth tradeoffs and communication complexity. In Proc. 36th FOCS. 6–15.
Morgenstern, J. 1973. Note on a lower bound of the linear complexity of the fast Fourier transform. J. ACM 20, 305–306.
Morgenstern, J. 1975. The linear complexity of computation. J. ACM 22, 184–194.
Nisan, N. and Wigderson, A. 1995. On the complexity of bilinear forms. In Proc. 27th STOC. 723–732.
Pudlák, P. 1998. A note on the use of determinant for proving lower bounds on the size of linear circuits. ECCC Report 42.
Raz, R. 2002. On the complexity of matrix product. In Proc. 34th STOC. 144–151. Also available as ECCC Report 12, 2002.
Sieveking, M. 1972. An algorithm for division of power series. Computing 10, 153–156.
Strassen, V. 1973a. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Num. Math. 20, 238–251.
Strassen, V. 1973b. Vermeidung von Divisionen. Crelles J. Reine Angew. Math. 264, 184–202.
Valiant, L. 1976. Graph theoretic properties in computational complexity. J. Comp. Syst. Sci. 13, 278–285.
Valiant, L. 1977. Graph theoretic arguments in low-level complexity. Number 53 in LNCS. Springer Verlag, 162–176.
Journal of the ACM, Vol. V, No. N, Month 20YY.