Interpolation of Depth-3 Arithmetic Circuits with Two Multiplication Gates

Amir Shpilka†

June 19, 2007

Abstract

In this paper we consider the problem of constructing a small arithmetic circuit for a polynomial for which we have oracle access. Our focus is on n-variate polynomials, over a finite field F, that have depth-3 arithmetic circuits with two multiplication gates of degree d. We obtain the following results:

1. Multilinear case: When the circuit is multilinear (multiplication gates compute multilinear polynomials) we give an algorithm that outputs, with probability 1 − o(1), all the depth-3 circuits with two multiplication gates computing the same polynomial. The running time of the algorithm is poly(n, |F|).

2. General case: When the circuit is not multilinear we give a quasi-polynomial (in n, d, |F|) time algorithm that outputs, with probability 1 − o(1), a succinct representation of the polynomial. In particular, if the depth-3 circuit for the polynomial is not of small depth-3 rank (namely, after removing the g.c.d. of the two multiplication gates, the remaining linear functions span a not too small linear space) then we output the depth-3 circuit itself. In case the rank is small we output a depth-3 circuit with a quasi-polynomial number of multiplication gates.

Our proof technique is new and relies on the factorization algorithm for multivariate black-box polynomials, on lower bounds on the length of linear locally decodable codes with 2 queries, and on a theorem regarding the structure of identically zero depth-3 circuits with four multiplication gates.



∗ Preliminary version appeared in [Shp07].
† Faculty of Computer Science, Technion, Haifa 32000, Israel. Email: [email protected]. This research was supported by the Israel Science Foundation (grant number 439/06).

Contents

1 Introduction
  1.1 Computational learning theory
  1.2 Interpolation of arithmetic circuits
  1.3 Some definitions and statement of our results
  1.4 Our techniques
  1.5 Organization
2 Preliminaries
  2.1 Locally decodable codes
  2.2 Zero depth-3 circuits
3 Multilinear circuits
  3.1 Multilinear circuits: Low degree case
  3.2 Case D ≤ 3
  3.3 Multilinear circuits: High degree case
  3.4 Proof of Theorem 1
4 General Circuits
  4.1 Interpolation for the low rank case
    4.1.1 Step 1: Interpolating on a low dimensional random subspace
    4.1.2 Step 2: Lifting from V to F^n
  4.2 Interpolation for the high rank case
    4.2.1 Step 1: Interpolating on a low dimensional subspace I
    4.2.2 Step 2: Interpolating on a low dimensional subspace II
    4.2.3 Analysis of Algorithm 6
    4.2.4 Step 3: Lifting
  4.3 Completing the proof of Theorem 2
5 Discussion
A Brute force interpolation

1 Introduction

In this work we consider the problem of constructing a small arithmetic circuit for a polynomial for which we have oracle access. That is, there is a black box holding a polynomial f and we would like to find a small arithmetic circuit that computes f. We are allowed to pick inputs (adaptively) and query the black box for the value of f on those inputs. The focus of this work is on n-variate polynomials that have small depth-3 arithmetic circuits, over some finite field F. We consider the simplest such circuits, those with only two multiplication gates (also known as ΣΠΣ(2) circuits).

We obtain the following results. Let f be a polynomial, for which we have oracle access, that is computed by a ΣΠΣ(2) circuit. When f is computed by a multilinear ΣΠΣ(2) circuit we give a polynomial time algorithm that outputs, with high probability, all the multilinear ΣΠΣ(2) circuits that compute f. When f does not have a multilinear ΣΠΣ(2) circuit, we output in quasi-polynomial time, w.h.p., a short description of f (depending on a technical condition we either output a ΣΠΣ(2) circuit for it, or a depth-3 circuit of quasi-polynomial size).

The problem of reconstructing a small arithmetic circuit for a polynomial using queries is a basic problem in algebraic complexity and is closely related to problems in learning theory. We now give some background that explains why studying depth-3 circuits is the next natural step given our current state of knowledge.

1.1 Computational learning theory

The question of whether we can compute a small description for a boolean function, for which we have oracle access, is a fundamental problem in learning theory. The problem, also known as the exact learning problem using membership queries, has attracted a lot of research, and both positive and negative results were proved. On the negative side it was shown that if a class of boolean circuits C contains trapdoor functions or pseudo-random functions then there are no efficient learning algorithms for it [OGM86, KV94, Kha95]. In particular, there are no efficient interpolation algorithms for the class TC^0_4 (the class of depth-4 threshold circuits), under a widely believed cryptographic assumption [KL01, NR04]. Moreover, in [RR97] it was proved that if we consider a class of circuits C that can efficiently compute pseudo-random functions then it is hard to determine whether a function given by its truth table can be computed efficiently by a circuit from C. In other words, even if the algorithm is given the whole truth table as input, it cannot determine, in exponential time (i.e., in time polynomial in the size of the truth table), whether f has a polynomial size circuit in C or not.

On the positive side, there are many works showing that in some restricted models of computation, e.g. when f has a small circuit from a restricted class of circuits, exact learning from membership queries is possible (e.g. [SS96, FBV96, BBTV97, BBB+00]). However, no exact learning algorithms are known for the class of bounded depth boolean circuits. Moreover, even if we allow the algorithm to run in exponential time and have access to the truth table of the function, it is still not known how to compute a small bounded depth circuit for it. To conclude, exact learning is known only for very restricted classes of circuits, and we cannot hope to learn the class of depth-4 threshold circuits if certain cryptographic assumptions hold.

1.2 Interpolation of arithmetic circuits

As mentioned above, we consider the algebraic analog of the exact learning problem. Let A be a class of arithmetic circuits over a field F. We are given oracle access to a polynomial f(x_1, . . . , x_n) ∈ F[x_1, . . . , x_n] that can be computed by an arithmetic circuit from A. We are allowed to ask for the value of the polynomial at points of our choice¹ and we would like to output a succinct representation for it. Ideally we would like to output an arithmetic circuit from A that computes the polynomial. This problem is also known as the polynomial interpolation problem.

Unlike the exact learning scenario, there are no results that show the impossibility of interpolating arithmetic circuits. This is mainly due to the fact that no reasonable notion of pseudo-random polynomials is known in the algebraic domain. However, it is widely believed that, analogously to the exact learning case, it is impossible to efficiently interpolate arithmetic circuits of a certain constant depth. The reason for this belief is that depth-3 arithmetic circuits can compute the arithmetic analog of threshold functions (see e.g. [SW01]), and as efficient exact learning of TC^0_4 is believed to be impossible, we also expect efficient interpolation of bounded depth arithmetic circuits to be impossible. It is natural to ask, then, what is the maximal depth for which efficient interpolation is possible.

Similarly to the exact learning version, many efforts were invested in trying to interpolate restricted classes of arithmetic circuits. In particular, the class of depth-2 arithmetic circuits² received a great deal of attention and several interpolation algorithms were devised for it [BOT88, GKS94, KS96, Man95, SS96, KS01]. Many works also focused on circuits that can be represented by small multiplicity automata [FBV96, BBTV97, BBB+00, KS06] and on the class of read-once arithmetic formulae [HH91, BHH95, BC98, BB98]. One unifying feature of all these classes is that they all compute polynomials whose partial derivatives span a low dimensional space (see e.g. [KS06]). In contrast, it is easy to give an example of a multilinear ΣΠΣ(2) circuit that computes a polynomial whose partial derivatives span a high dimensional space. Thus, known techniques cannot give efficient algorithms for interpolating polynomials computed by multilinear ΣΠΣ(2) circuits. This highlights the gap in our understanding of depth-2 circuits and depth-3 circuits (even those with only two multiplication gates).

Thus, current techniques are incapable of interpolating depth-3 circuits, even those with two multiplication gates, and it is believed that above some constant depth efficient interpolation is impossible. In this work we introduce new techniques that enable us to give interpolation algorithms for the class of depth-3 circuits with two multiplication gates. Before presenting our results we need to give several definitions.

¹ In the case of finite fields we may ask for the value over an algebraic extension field of F.
² Polynomials computed by small depth-2 circuits are also known as sparse polynomials, i.e. polynomials with a small number of monomials.
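The partial-derivative measure mentioned above can be made concrete on small examples. The following sketch (our own helper, written with sympy; it is not one of the paper's algorithms) computes the dimension of the space spanned by a polynomial together with all of its partial derivatives, and shows that a single multiplication gate of a multilinear circuit already has a span of dimension exponential in the number of variables:

```python
import sympy as sp

def partial_derivative_dim(f, variables):
    """Dimension of the span of f and all its partial derivatives
    (of every order), viewed as vectors of monomial coefficients."""
    derivs = {sp.Poly(f, *variables)}
    frontier = set(derivs)
    while frontier:
        new = set()
        for poly in frontier:
            for x in variables:
                d = poly.diff(x)
                if not d.is_zero and d not in derivs:
                    new.add(d)
        derivs |= new
        frontier = new
    monomials = sorted({m for poly in derivs for m in poly.monoms()})
    rows = [[poly.as_dict().get(m, 0) for m in monomials] for poly in derivs]
    return sp.Matrix(rows).rank()

x1, x2, x3, x4 = sp.symbols('x1:5')
# A sparse polynomial with few monomials has a small span...
assert partial_derivative_dim(x1*x2 + x3*x4, (x1, x2, x3, x4)) == 6
# ...while one multiplication gate of a multilinear circuit has a span
# of dimension 2^n: every derivative is a distinct subproduct.
assert partial_derivative_dim((1 + x1)*(1 + x2)*(1 + x3)*(1 + x4),
                              (x1, x2, x3, x4)) == 16
```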

1.3 Some definitions and statement of our results

Let f be a polynomial computed by a ΣΠΣ(2) circuit. Then f has the following form:

    f(x̄) = ∏_{i=1}^{d_1} L_i^{(1)}(x̄) + ∏_{i=1}^{d_2} L_i^{(2)}(x̄),    (1)

where the L_i^{(j)}-s are linear functions in the variables x̄ = (x_1, . . . , x_n), over the field F:

    L_i^{(j)}(x̄) = ∑_{k=1}^{n} α_{i,j,k} x_k + α_{i,j,0}

for α_{i,j,k} ∈ F. Let M_1 and M_2 be the multiplication gates of the circuit. That is,

    M_1 = ∏_{i=1}^{d_1} L_i^{(1)}(x̄)   and   M_2 = ∏_{i=1}^{d_2} L_i^{(2)}(x̄).

For a ΣΠΣ(2) circuit C we denote with deg(C) the maximal degree of its multiplication gates. For example, if C is given by Equation (1) then deg(C) = max(d_1, d_2). Let gcd(C) = g.c.d.(M_1, M_2) be the greatest common divisor of the multiplication gates. It is clear that we can write gcd(C) = ∏_{i=1}^{k} L_i(x̄), for some set of linear functions. Following the notations of [DS06] we define the simplification of C, sim(C), to be the circuit sim(C) = C / gcd(C). From the definition of sim(C) it is clear that there exist subsets³ I_1 ⊆ [d_1] and I_2 ⊆ [d_2], with |I_1| = d_1 − k and |I_2| = d_2 − k, such that

    sim(C) = ∏_{i∈I_1} L_i^{(1)}(x̄) + ∏_{i∈I_2} L_i^{(2)}(x̄).    (2)

In order to state our results we introduce the notion of the rank of a ΣΠΣ(2) circuit, which we denote with rank(C), and the notion of the depth-3 rank of f, which we denote with rank(f). Given a ΣΠΣ(2) arithmetic circuit C, let sim(C) be as in Equation (2). We define

    rank(C) = dim span{ L_i^{(1)}, L_j^{(2)} : i ∈ I_1, j ∈ I_2 }.

In other words, the rank of C is defined to be the dimension of the space spanned by the linear functions in sim(C). Let rank(f) be the minimum, over all ΣΠΣ(2) circuits C that compute f, of rank(C). The motivation for this definition will become clearer in the analysis of our algorithm.

We now state our results. The first result deals with the case of multilinear ΣΠΣ(2) circuits. A multilinear circuit is a circuit in which every multiplication gate computes a multilinear polynomial.

Theorem 1 Let f be a multilinear polynomial in n variables that is computed by a degree d multilinear ΣΠΣ(2) circuit, over a field F. Then there is a randomized interpolation algorithm that, given black box access to f and the parameters d and n, runs in poly(n, |F|) time and with probability 1 − o(1) outputs all the ΣΠΣ(2) circuits computing f. When |F| < n^5 the algorithm is allowed to make queries to f from a polynomial size algebraic extension field of F.

Our second result deals with general ΣΠΣ(2) circuits.

Theorem 2 Let f be an n-variate polynomial computed by a ΣΠΣ(2) circuit of degree d, over a field F. Then there is a randomized interpolation algorithm that, given black box access to f and the parameters d and n, runs in quasi-polynomial time (in n, d, |F|) and has the following properties:

• If rank(f) = Ω(log²(d)), then with probability 1 − o(1) the algorithm outputs the (unique) ΣΠΣ(2) circuit for f.

• If rank(f) = O(log²(d)), then the algorithm outputs, with probability 1 − o(1), a polynomial Q(y_1, . . . , y_k) and k linear functions L_1, . . . , L_k such that Q(L_1, . . . , L_k) = f and k ≤ rank(f).

When |F| < max(d^5, n^5) the algorithm is allowed to make queries to f from a polynomial size extension field of F. We note that in the case that rank(f) = O(log²(d)) the polynomial Q(L_1, . . . , L_k) can be easily represented as a ΣΠΣ circuit with quasi-polynomially many multiplication gates (as Q has at most a quasi-polynomial number of monomials).

³ As usual, [k] stands for the set {1, . . . , k}.
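To make the definitions concrete, here is a minimal sketch (our own data layout and field choice, not the paper's notation) of a ΣΠΣ(2) circuit over a prime field F_p, together with the rank computation from the definition above: rank(C) is the rank of the matrix whose rows are the linear forms remaining after the g.c.d. is removed.

```python
p = 101  # illustrative prime field (our choice of size)

def eval_affine(L, x):
    """L = (a_1, ..., a_n, a_0) encodes the affine form sum a_k*x_k + a_0."""
    return (sum(a * v for a, v in zip(L[:-1], x)) + L[-1]) % p

def eval_circuit(gate1, gate2, x):
    """Evaluate the SigmaPiSigma(2) circuit M1 + M2 at the point x."""
    m1 = m2 = 1
    for L in gate1:
        m1 = m1 * eval_affine(L, x) % p
    for L in gate2:
        m2 = m2 * eval_affine(L, x) % p
    return (m1 + m2) % p

def rank_mod_p(rows):
    """Rank of a matrix over F_p by Gaussian elimination."""
    rows = [list(r) for r in rows]
    rank = col = 0
    ncols = len(rows[0]) if rows else 0
    while rank < len(rows) and col < ncols:
        piv = next((r for r in range(rank, len(rows)) if rows[r][col] % p), None)
        if piv is None:
            col += 1
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)
        rows[rank] = [a * inv % p for a in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] % p:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
        col += 1
    return rank

# C = x1*x2 + x2*x3 over F_101: gcd(C) = x2, sim(C) = x1 + x3, rank(C) = 2.
g1 = [(1, 0, 0, 0), (0, 1, 0, 0)]         # M1 = x1 * x2
g2 = [(0, 1, 0, 0), (0, 0, 1, 0)]         # M2 = x2 * x3
sim_forms = [(1, 0, 0, 0), (0, 0, 1, 0)]  # the forms remaining after the gcd
assert eval_circuit(g1, g2, (2, 3, 5)) == (2 * 3 + 3 * 5) % p
assert rank_mod_p(sim_forms) == 2
```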

1.4 Our techniques

Our algorithms are based on the following scheme. We first restrict the inputs to the unknown polynomial to a low dimensional random subspace of F^n (although in the multilinear case the subspace is not completely random, the intuition is the same). We then interpolate the polynomial on this subspace. The next step is to lift the representation that we found to the whole space. While this is the general scheme, it has different realizations in the multilinear case and the general case, and even within each case we have to deal differently with the case of high rank and the case of low rank. However, the same problems lie at the core of the different cases:

1. Let V_1, . . . , V_k be subspaces of co-dimension 1 inside a linear space V. Given circuits C_1, . . . , C_k such that C_i computes f|_{V_i} (the restriction of f to V_i), how can we construct from them a single circuit C for f|_V?

2. Given linear spaces V ⊆ U such that V is of co-dimension 1 in U, and a circuit C computing f|_V, how many circuits C' that compute f|_U and whose restriction C'|_V is equal to C are there?

The first question arises when we interpolate the restriction of f to a random subspace V. Our algorithm first interpolates the restrictions of one of the multiplication gates to co-dimension 1 subspaces of V and then combines the different results to get a representation of that gate over V (then, using the factoring algorithms of [Kal85, KT90, Kal95], we are able to interpolate f|_V). To deal with the problem we consider linear functions in the different C_i-s that look similar to each other and try to glue them together to get a new function. This process may fail if for some linear function in, say, C_1 there are many other linear functions in C_1 that are at Hamming distance 1 from it. In such a situation it is hard for us to tell what is the "true" image of that linear function in the other C_i-s. On the other hand, if this is indeed the case then the linear functions in C_1 generate a locally decodable code, and using the results of [GKST06, DS06] on the length of such codes, we can prove that such an anomaly cannot happen. Therefore we can always find a linear function to learn, until eventually we find the whole multiplication gate.

The motivation for the second question is that when we lift the representation that we found on V to U there may be many different circuits that are possible lifts, and we somehow have to pick the right one to continue the lifting process with. To deal with this problem we note that if a polynomial has two different lifts then the difference of the lifts is the zero polynomial. By a result of [DS06] regarding the structure of identically zero depth-3 circuits, we get that the different lifts must be of small rank. This enables us to solve the problem in the high rank case (as in this case the lift is unique). The low rank case is indeed more problematic, and this is why we need to output a circuit with quasi-polynomially many multiplication gates when f has low rank (although in the multilinear case we manage to overcome this difficulty by proving that the total number of possible lifts is polynomial).
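The first step of the scheme, restricting a black-box polynomial to a random low dimensional subspace, can be sketched as follows (the helper names and the field size are our own; the real algorithm also controls the dimension s and works over a sufficiently large field):

```python
import random

p = 101  # illustrative prime field (our choice)

def random_affine_restriction(f, n, s, rng=None):
    """Black box for f restricted to the random s-dimensional affine
    subspace v0 + span(v1, ..., vs) of F_p^n."""
    rng = rng or random.Random(0)
    v0 = [rng.randrange(p) for _ in range(n)]
    basis = [[rng.randrange(p) for _ in range(n)] for _ in range(s)]
    def g(*t):
        # Map subspace coordinates (t_1, ..., t_s) to a point of F_p^n.
        point = [(v0[k] + sum(t[i] * basis[i][k] for i in range(s))) % p
                 for k in range(n)]
        return f(point)
    return g

# A toy oracle we pretend is unknown: f = x1*x2 + x3 over F_101.
f = lambda x: (x[0] * x[1] + x[2]) % p
g = random_affine_restriction(f, n=3, s=2)
val = g(7, 9)  # the restriction is queried exactly like the original oracle
```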

1.5 Organization

The paper is organized as follows. In Section 2 we give some definitions, discuss locally decodable codes and state the result that we will need. Then we describe the results of [DS06] regarding identically zero depth-3 circuits. The proof for the multilinear case is given in Section 3. In Section 4 we give the proof for the general case.

2 Preliminaries

For a natural number n we denote [n] = {1, . . . , n}. For a set S ⊂ [n] we denote with S̄ the complement of S. Let F be a field. We denote with F^n the n-dimensional vector space over F. We shall use the notation x̄ = (x_1, . . . , x_n) to denote the vector of n indeterminates. For v ∈ F^n we denote with wt(v) the weight of v, i.e. the number of non-zero coordinates of v. For two non-zero linear functions L_1, L_2 we will write L_1 ∼ L_2 whenever L_1 and L_2 are linearly dependent.

Let V = V_0 + v_0 ⊆ F^n be an affine subspace, where v_0 ∈ F^n and V_0 ⊆ F^n is a linear subspace. Let L(x̄) be a linear function. We denote with L|_V the restriction of L to V. We say that a set of linear functions L_1, . . . , L_k is linearly independent over V if the only linear combination of the L_i-s whose restriction to V is identically zero is the all-zero combination. We now introduce coordinates on the space V. Let v_1, . . . , v_s be a basis of V_0. Let L(x̄) be a linear function, and consider the restriction of L to V with respect to the basis {v_i}. Then L|_V can be written as a function in s variables. In particular, if v = α_1 v_1 + . . . + α_s v_s + v_0 and L_0 denotes the homogeneous part of L, then we define L|_V(α_1, . . . , α_s) = L(v) = L_0(α_1 v_1 + . . . + α_s v_s) + L(v_0).

For a polynomial f we denote with Lin(f) the product of all the linear factors of f (with the right multiplicities). We also define sim(f) = f/Lin(f) to be the simplification of f. Clearly sim(f) does not have any linear factors.
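The coordinate computation above amounts to simple inner products: the α_i coefficient of L|_V is the homogeneous part of L applied to v_i, and the new constant term is L(v_0). A small sketch (helper names are ours), for affine forms over a prime field:

```python
p = 101  # illustrative prime field (our choice)

def restrict_affine(a, c, basis, v0):
    """Restrict L(x) = <a, x> + c to V = span(v1, ..., vs) + v0.
    Returns the coefficients of L|_V in the alpha coordinates: the
    alpha_i coefficient is <a, v_i>, and the constant is L(v0)."""
    dot = lambda u, w: sum(x * y for x, y in zip(u, w)) % p
    return [dot(a, vi) for vi in basis], (dot(a, v0) + c) % p

# L(x) = x1 + 2*x2 + 3 on the line v0 + alpha*v1 in F_101^2.
a, c = [1, 2], 3
basis, v0 = [[4, 5]], [6, 7]
coeffs, const = restrict_affine(a, c, basis, v0)
# Direct check at alpha = 1: L(v0 + v1) must equal coeffs[0] + const.
point = [(x + y) % p for x, y in zip(v0, basis[0])]
assert (point[0] + 2 * point[1] + 3) % p == (coeffs[0] + const) % p
```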

2.1 Locally decodable codes

In this section we define the notion of locally decodable codes and state the result that we need. The reader interested in a more complete background is referred to [KT00, GKST06]. We start with the definition of locally decodable codes; we fix in advance some of the parameters to constants in order to simplify the definition. Let E : F^t → F^n be a linear map. We say that E is a 2-query locally decodable code if there exists a probabilistic oracle machine A such that:

• A makes at most 2 queries (non-adaptively).

• For every x ∈ F^t, for every y ∈ F^n with ∆(y, E(x)) < n/10, and for every i ∈ [t], we have Pr[A^y(i) = x_i] ≥ 2/3, where the probability is taken over the internal coin tosses of A, and ∆(·, ·) is the Hamming distance function.

The following theorem of [DS06] gives a lower bound on the length of such locally decodable codes. We note that [GKST06] were the first to prove similar lower bounds; however, for large fields their result is not optimal.

Theorem 3 (Theorem 1.2 of [DS06]) Let F be a field, and let E : F^t → F^n be a linear 2-query locally decodable code, as in the definition above. Then n ≥ 2^{t/60 − 1}.
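A standard example meeting the definition is the Hadamard code over F_2, whose length 2^t is consistent with the exponential lower bound of Theorem 3. The sketch below (our own illustration, not taken from the paper) shows the 2-query decoder: since ⟨x, a⟩ + ⟨x, a + e_i⟩ = x_i, one random pair of positions recovers x_i whenever neither position is corrupted.

```python
import random

def hadamard_encode(x):
    """Hadamard code over F_2: one coordinate <x, a> for every a in F_2^t
    (length 2^t, matching the exponential lower bound of Theorem 3)."""
    t = len(x)
    return [sum(xi * ((a >> i) & 1) for i, xi in enumerate(x)) % 2
            for a in range(1 << t)]

def decode_bit(y, t, i, rng=random.Random(1)):
    """One run of the 2-query decoder: query a random pair (a, a ^ e_i)."""
    a = rng.randrange(1 << t)
    return (y[a] + y[a ^ (1 << i)]) % 2

x = [1, 0, 1, 1]
y = hadamard_encode(x)
y[5] ^= 1  # corrupt one of the 16 positions
# A majority over all pairs still recovers x_2 after the corruption:
votes = [(y[a] + y[a ^ (1 << 2)]) % 2 for a in range(16)]
assert int(sum(votes) > 8) == x[2]
```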

2.2 Zero depth-3 circuits

In this section we state some results of [DS06] regarding identically zero depth-3 circuits. We start by giving some necessary definitions. A ΣΠΣ(k) circuit (that is, a depth-3 circuit with k multiplication gates) is identically zero if it computes the zero polynomial. Notice that this is a syntactic definition: we are thinking of the circuit as computing a polynomial and not a function over the field. Let C = M_1 + . . . + M_k be an identically zero ΣΠΣ(k) circuit, where the M_i-s are the multiplication gates. We say that C is minimal if there is no ∅ ≠ I ⊊ [k] such that ∑_{i∈I} M_i ≡ 0. C is simple if the g.c.d. of its multiplication gates is 1. The following theorem gives a bound on the degree of multilinear ΣΠΣ(k) circuits that are identically zero.

Theorem 4 (Corollary 6.9 of [DS06]) There exists an integer function D(k) = 2^{O(k²)} such that every simple, minimal, identically zero multilinear ΣΠΣ(k) circuit is of degree d ≤ D(k).

In other words, if C is an identically zero ΣΠΣ(k) multilinear circuit that is simple and minimal, then the degree of C is bounded by a constant depending on k. We will need to use the result only for ΣΠΣ(4) circuits. We state this as a corollary, and introduce the constant D_4 that will play a part in our interpolation algorithms.

Corollary 5 There exists a constant D_4, such that every identically zero multilinear ΣΠΣ(4) circuit that is simple and minimal is of degree ≤ D_4.

The next theorem deals with general ΣΠΣ(k) circuits.

Theorem 6 (Lemma 5.2 of [DS06]) Let k ≥ 3, d ≥ 2 be integers and let C ≡ 0 be a simple and minimal ΣΠΣ(k) circuit of degree d. Then rank(C) ≤ 2^{O(k²)} · log^{k−2}(d).

As before we shall need the following corollary, and the constant R_4.

Corollary 7 There exists a constant R_4, such that every identically zero ΣΠΣ(4) circuit, of degree d, that is simple and minimal is of rank at most R_4 · log²(d).

The following corollary shows how to use Corollaries 5 and 7 to guarantee uniqueness of a ΣΠΣ(2) circuit.

Corollary 8 If f has a (multilinear) ΣΠΣ(2) circuit and rank(f) > R_4 · log²(d) (deg(f) > D_4) then f has a unique (multilinear) ΣΠΣ(2) circuit.
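Although being identically zero is a syntactic property, whether a given ΣΠΣ(k) circuit computes the zero polynomial can be tested with one-sided error by random evaluation (the standard Schwartz-Zippel argument; the sketch below, with our own data layout, is illustrative and not one of the paper's subroutines):

```python
import random

p = 10**9 + 7  # a large prime, so a random point rarely fools the test

def eval_spsk(gates, x):
    """Evaluate a SigmaPiSigma(k) circuit; each gate is a list of
    (coefficient list, constant) affine forms, multiplied together."""
    total = 0
    for gate in gates:
        m = 1
        for coeffs, const in gate:
            m = m * ((sum(a * v for a, v in zip(coeffs, x)) + const) % p) % p
        total = (total + m) % p
    return total

def probably_zero(gates, n, trials=20, rng=None):
    """Schwartz-Zippel: a nonzero polynomial of degree d vanishes at a
    uniformly random point with probability at most d/p, so any nonzero
    value certifies that the circuit is not identically zero."""
    rng = rng or random.Random(0)
    return all(eval_spsk(gates, [rng.randrange(p) for _ in range(n)]) == 0
               for _ in range(trials))

# (x1 + x2)*x1 + (-x1)*x1 + (-x1)*x2 is an identically zero circuit:
zero_c = [[([1, 1], 0), ([1, 0], 0)],
          [([p - 1, 0], 0), ([1, 0], 0)],
          [([p - 1, 0], 0), ([0, 1], 0)]]
assert probably_zero(zero_c, 2)
assert not probably_zero([[([1, 0], 0)]], 2)  # the circuit "x1" is nonzero
```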


Proof  We prove the claim only for general ΣΠΣ(2) circuits; the proof for the multilinear case is identical. Let C_1 = A_1 + A_2 be a ΣΠΣ(2) circuit for f, where A_1, A_2 are its multiplication gates. By our assumption on rank(f) we know that rank(C_1/g.c.d.(A_1, A_2)) > R_4 · log²(d). Assume that C_2 = B_1 + B_2 is another ΣΠΣ(2) circuit for f, where B_1, B_2 are its multiplication gates. Consider the circuit C = A_1 + A_2 − B_1 − B_2; then C is a ΣΠΣ(4) circuit, and C ≡ 0. By the assumption on rank(C_1) we get that rank(C) > R_4 · log²(d). By Corollary 7 (in the case of multilinear circuits we use Corollary 5) this implies that C is either not simple or not minimal. Note that, by the assumption on rank(C_1/g.c.d.(A_1, A_2)), even if we remove the g.c.d. of C the remaining circuit still has rank > R_4 · log²(d). From this we conclude that C is not minimal. As C_1 ≢ 0 we get that A_1 − B_1 ≡ 0 or A_1 − B_2 ≡ 0. □

3 Multilinear circuits

In this section we prove Theorem 1. The proof follows the scheme that was described in Section 1.4: we have two cases, the low rank case and the high rank case (in fact we look at low degree and high degree, but in the multilinear scenario this corresponds to low rank and high rank). In the multilinear scenario low rank means constant rank (where the constant depends on D_4 as defined in Corollary 5), which makes life easier compared to the general case (in the general case low rank can be as large as O(log²(d))). We shall have the following representation of f in mind: f = M_1 + M_2, where M_1 and M_2 are the multiplication gates of a multilinear ΣΠΣ(2) circuit for f, and are given by

    M_1 = ∏_{i=1}^{d} L_i(S_i)   and   M_2 = ∏_{j=1}^{d'} L'_j(S'_j),    (3)

where the L_i-s (L'_j-s) are linear functions and the sets {S_i}_{i∈[d]} ({S'_j}_{j∈[d']}) form a partition of the set of variables (recall that M_1 (M_2) computes a multilinear polynomial). In particular,

    f(x̄) = ∏_{i=1}^{d} L_i(S_i) + ∏_{j=1}^{d'} L'_j(S'_j).    (4)

The first step in our interpolation algorithm is to find all the linear factors of f. By the following theorem, which is an immediate corollary of the results of [Kal85, KT90, Kal95], this can be done efficiently. The theorem requires that the field we are working with is not too small, so from now on we shall assume that |F| ≥ n^5 (we can make this assumption as we are allowed to query f on inputs from an extension field).


Theorem 9 Let d, n be integers and let F be a finite field. Then there is a randomized algorithm A that gets as input black box access to f and the parameters n and d, and outputs, in poly(n, d, log |F|) time, with probability 1 − exp(−n), all the linear factors of f, with their multiplicities.

Let Lin(f) be the product of all the linear factors of f. The following lemma shows that if f cannot be represented as a product of linear functions then every linear function in its factorization also divides the multiplication gates M_1 and M_2 (note that this is not the case for non-multilinear ΣΠΣ(2) circuits).

Lemma 10 Let f be a polynomial that is computable by a ΣΠΣ(2) circuit (as in Equation (4)). Assume that f cannot be represented as a product of linear functions. Then if a linear function L divides f, then L must also divide both multiplication gates, in any multilinear ΣΠΣ(2) circuit for f.

Proof  Consider a multilinear ΣΠΣ(2) circuit for f. Assume w.l.o.g. that it is given by Equation (4). As we assume that f cannot be represented as a product of linear functions, we can also assume w.l.o.g. that g.c.d.(M_1, M_2) = 1. Assume for a contradiction that L does not divide M_1 (and therefore does not divide M_2). Consider the (affine) subspace on which L = 0. We must have that f|_{L=0} = 0. This implies that M_1|_{L=0} = −M_2|_{L=0}. As M_1 and M_2 are both products of linear functions, we get that they share the same linear factors modulo L. In particular we can assume w.l.o.g. that⁴ d' = d and L_i|_{L=0} ∼ L'_i|_{L=0}. By examination we get that the support of L (that is, the set of non-zero coordinates of L) is contained in the union of the supports of L_i and L'_i, for every 1 ≤ i ≤ d (recall that we assume that g.c.d.(M_1, M_2) = 1). If d > 2 then this is not possible, as the circuit for f is multilinear and the S_i-s are disjoint (and so are the S'_j-s). Thus we get that d = 2 (recall that we assumed that we removed the g.c.d.). However, if d = 2 and L divides f then there is another linear function L' such that f = L · L', in contradiction to the assumption that f is not a product of linear functions. □

Thus, in contrast to the general case (i.e. non-multilinear circuits), every linear factor that we find is also a linear factor of both multiplication gates, unless f itself is a product of linear functions. In the rest of this section we shall assume that f is not a product of linear functions, as this case is easy to handle by factorization. Recall that sim(f) = f/Lin(f). Clearly sim(f) can be represented as a degree D multilinear ΣΠΣ(2) circuit, for some D. Note that as we assume that we are given the degree d of a multilinear ΣΠΣ(2) circuit for f, it is easy to compute D once we have the factorization of f. We shall consider two cases: the case D ≤ D_4 and the case D > D_4 (D_4 is defined in Corollary 5).

⁴ We can also have d = d' ± 1, but the analysis remains the same.
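For explicitly given polynomials, the Lin(f)/sim(f) decomposition can be illustrated with a general purpose computer algebra system. The sketch below uses sympy's multivariate factorization over the rationals; Theorem 9 is a black-box algorithm over finite fields, so this helper (our own) is only an illustration of the decomposition, not of the theorem's algorithm.

```python
import functools
import operator
import sympy as sp

def linear_factors(f, variables):
    """Return the linear factors of f together with their multiplicities."""
    _, factors = sp.factor_list(f, *variables)
    return [(g, m) for g, m in factors
            if sp.Poly(g, *variables).total_degree() == 1]

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1 * (x2 + x3)**2 * (x2*x3 + 1)
lin = linear_factors(f, (x1, x2, x3))
# Lin(f) = x1*(x2 + x3)^2 and sim(f) = f / Lin(f) = x2*x3 + 1.
lin_f = functools.reduce(operator.mul, [g**m for g, m in lin], sp.Integer(1))
assert sp.expand(lin_f - x1 * (x2 + x3)**2) == 0
assert sp.simplify(f / lin_f - (x2*x3 + 1)) == 0
```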

3.1 Multilinear circuits: Low degree case

Algorithm 1 shows how to find all the multilinear ΣΠΣ(2) circuits that compute sim(f), when its degree D is at most D_4. As we are looking for multilinear circuits and we have Lin(f), we shall only consider variables that do not belong to Lin(f). In order not to add further notation we shall assume that the variables appearing in sim(f) are {x_1, . . . , x_n}.

Algorithm 1  Multilinear circuits of low degree

1. Interpolate the polynomial sim(f) to get an explicit representation of it as a sum of monomials.

2. For every S ⊆ [n] of size |S| = 10D_4 find all the degree D simple multilinear circuits in the variables of S.

3. For each such set S and circuit A, verify that A computes sim(f) when the variables in S̄ are set to zero (denote this polynomial with sim(f)(S)).

4. For each such set S and circuit A that correctly computes sim(f)(S) do the following: for i ∉ S set S_i = S ∪ {i}. Repeat the previous steps and find all the multilinear ΣΠΣ(2) circuits A_i that compute sim(f)(S_i) and such that A_i|_{x_i=0} ≡ A (i.e. after substituting x_i = 0 the circuits are identical). If for some i there is no such circuit then move on to the next circuit A for sim(f)(S).

5. For each S and A for which we found {A_i}_{i∈S̄}, combine the different circuits into one multilinear ΣΠΣ(2) circuit, if possible.

6. For each circuit found in the previous step verify that it computes sim(f).

Analysis of Algorithm 1  Recall that we can assume that we have oracle access to sim(f) (as we have oracle access to f and we computed Lin(f)). The first step of the algorithm is a simple interpolation of a multilinear polynomial of degree D in (at most) n variables. This step runs in time polynomial in n^D (we simply query the polynomial on all inputs in {0, 1}^n of weight at most D, and solve a system of linear equations to find the coefficients, remembering to look only at coefficients of monomials of degree at most D). Hence we can assume that we have an explicit representation of sim(f) as a sum of monomials. It is clear that in the second step we get a polynomial number of circuits. Indeed, there are at most n^{10D_4} sets S, and for each of them there are at most |F|^{2D·(10D_4+1)} multilinear ΣΠΣ(2) circuits in its variables (we just count the number of sets of at most 2D linear functions in 10D_4 variables). In the same way we see that computing the different A_i-s also requires polynomial time. The next lemma shows that if D > 3 then for every S_i and circuit A there is at most one multilinear ΣΠΣ(2) circuit A_i with A_i = sim(f)(S_i) and A_i|_{x_i=0} ≡ A.
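The interpolation in step 1 can be sketched directly. For a multilinear polynomial, the coefficient of ∏_{i∈S} x_i equals ∑_{T⊆S} (−1)^{|S|−|T|} f(1_T), where 1_T is the 0/1 indicator vector of T (inclusion-exclusion), so queries of weight at most D suffice. The helper below is our own illustration, over a small prime field:

```python
from itertools import chain, combinations

p = 101  # illustrative field size (our choice)

def subsets(S):
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

def interpolate_multilinear(f, n, D):
    """Return {monomial (tuple of variable indices): coefficient} for a
    multilinear f of degree <= D, querying only 0/1 inputs of weight <= D."""
    def indicator(T):
        return [1 if i in T else 0 for i in range(n)]
    coeffs = {}
    for r in range(D + 1):
        for S in combinations(range(n), r):
            # Inclusion-exclusion over the 0/1 points supported on S.
            c = sum((-1) ** (len(S) - len(T)) * f(indicator(T))
                    for T in subsets(S)) % p
            if c:
                coeffs[S] = c
    return coeffs

# Toy oracle for f = 3*x0*x1 + x2 over F_101:
f = lambda x: (3 * x[0] * x[1] + x[2]) % p
assert interpolate_multilinear(f, 3, 2) == {(0, 1): 3, (2,): 1}
```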

Lemma 11 Let g(x_1, . . . , x_t) be a polynomial of degree D > 3 that does not have any linear factors. Assume that we have a multilinear ΣΠΣ(2) circuit computing g|_{x_t=0}. In other words, we have

    g(x_1, . . . , x_t)|_{x_t=0} = ∏_{i=1}^{D} L_i + ∏_{j=1}^{D'} L'_j.

Then there is at most one multilinear circuit of the form

    ∏_{i=1}^{D} (L_i + α_i x_t) + ∏_{j=1}^{D'} (L'_j + α'_j x_t)

that computes g (note that such a circuit may not exist).

Proof  Assume for a contradiction that there are two different multilinear circuits of that form that compute g. Assume w.l.o.g.⁵ that the first circuit is

    g = C_1 = (L_1 + α x_t) · ∏_{i=2}^{D} L_i + (L'_1 + α' x_t) · ∏_{j=2}^{D'} L'_j.

There are three canonical options for the second circuit:

    C_2 = L_1 · (L_2 + β x_t) · ∏_{i=3}^{D} L_i + L'_1 · (L'_2 + β' x_t) · ∏_{j=3}^{D'} L'_j.    (5)

    C_2 = (L_1 + β x_t) · L_2 · ∏_{i=3}^{D} L_i + L'_1 · (L'_2 + β' x_t) · ∏_{j=3}^{D'} L'_j.    (6)

    C_2 = (L_1 + β x_t) · L_2 · ∏_{i=3}^{D} L_i + (L'_1 + β' x_t) · L'_2 · ∏_{j=3}^{D'} L'_j.    (7)

We shall only analyze the first case (given in Equation (5)), as it is the more interesting one. The analysis of the other cases is similar (but simpler). So let us assume that C_2 is given by Equation (5). As C_1 = C_2 we get, by exchanging sides, that

    x_t · (αL_2 − βL_1) · ∏_{i=3}^{D} L_i = x_t · (β'L'_1 − α'L'_2) · ∏_{j=3}^{D'} L'_j.

Since D > 3 there is 3 ≤ i ≤ D such that L_i does not divide (β'L'_1 − α'L'_2), and so there must be 3 ≤ j ≤ D' such that L_i ∼ L'_j. This implies that L_i divides g. However, we assumed that g does not have linear factors, so we have a contradiction. □

⁵ There are two more cases to consider, but they follow from the same arguments.

The lemma implies that if D > 3 then (as A is simple), from all the circuits Ai such that Ai|xi=0 ≡ A, there is exactly one way of constructing a circuit Â that satisfies Â(Si) ≡ Ai (here Â(Si) means Â with the variables outside Si set to zero). It is also easy to see that Â can be constructed in polynomial time (it is easy to find which linear functions in the different Ai-s should be combined together to form a new linear function: those that have the same coefficients for the variables in S). Thus, we get at most a polynomial number of circuits such that at least one of them computes sim(f). As we have an explicit representation of sim(f), we can easily verify which of the circuits actually compute it, and output all of them. We now handle the cases of D = 3 and D = 2 (note that Lemma 11 says nothing for such D's, and indeed the claim is not true in this case).

3.2 Case D ≤ 3

We first analyze the case D = 3. The only possible difference from the case D > 3 is when there is some index i such that there are two (or more) different ways of extending the circuit A to a circuit Ai. From the proof of Lemma 11 we get that w.l.o.g. A has the following form

A = L1 · L2 · L3 + L'1 · L'2 · L'3,

and the two different lifts are

Ai = (L1 + α2·xi) · L2 · L3 + (L'1 − β2·xi) · L'2 · L'3

and

A'i = L1 · (L2 − α1·xi) · L3 + L'1 · (L'2 + β1·xi) · L'3,

for constants α1, α2, β1 and β2. In particular we must have that L3 = c · (β1·L'1 + c2·β2·L'2) and L'3 = c · (α1·L1 + α2·L2), for some c ≠ 0. Using the same reasoning it is easy to show that for any variable j ∉ Si there is at most one circuit Ai,j(Si ∪ {j}) satisfying Ai,j = g(Si ∪ {j}) and Ai,j|xj=0 ≡ Ai. Thus, if we have such an index i with such Ai and A'i then, as in the case of D > 3, we get that there are at most two circuits Â and Â' that for every j ∉ Si satisfy

Â(Si ∪ {j}) ≡ Ai,j and Â'(Si ∪ {j}) ≡ A'i,j.

Therefore we get, again, that there are at most polynomially many circuits, that can be computed in polynomial time, and that contain all the possible representations of sim(f). Hence we can find in polynomial time all the different multilinear ΣΠΣ(2) circuits for f. For the case D = 2 it follows from similar arguments that (using the same notations) there are at most two circuits Â and Â' such that for any j ∉ Si they satisfy

Â(Si ∪ {j}) ≡ Ai,j and Â'(Si ∪ {j}) ≡ A'i,j.

So again there are at most polynomially many possible circuits.

3.3 Multilinear circuits: High degree case

Algorithm 2 shows how to interpolate the circuit in case that deg(sim(f)) > D4.

Algorithm 2 Multilinear circuits of high degree
∀S ⊆ [n] such that |S| = 2D4 do the following:
1. ∀i ∉ S pick a random assignment to xi from F. Denote the final assignment with ρ. Let sim(f)|ρ be the polynomial sim(f) after substituting the partial assignment ρ.
2. If sim(f)|ρ has a non-trivial linear factor then move to the next choice of a set S.
3. For every simple multilinear ΣΠΣ(2) circuit in the variables {xi}i∈S, over F, of degree greater than D4 (and at most 2D4, of course), check whether the circuit computes sim(f)|ρ. If no circuit computing sim(f)|ρ is found then move to the next S.
4. For every i ≠ j such that i, j ∉ S, set Si,j = S ∪ {i, j}. Find a simple multilinear ΣΠΣ(2) circuit that computes sim(f)|ρi,j, where ρi,j is the assignment that equals ρ on all the coordinates outside Si,j (namely, we “forget” the assignments to xi and xj).
5. Glue the different circuits for the Si,j-s to a single multilinear ΣΠΣ(2) circuit for sim(f). If no circuit is found then output “fail”.

Analysis of Algorithm 2 Assume w.l.o.g. that we picked S = [2D4], and assigned the value ρi to xi for 2D4 < i ≤ n. We say that the assignment ρ is good if no new linear factors were introduced (note that the degree of f|ρ may be smaller though). That is, we assume that Lin(sim(f)|ρ) = 1.

Lemma 12 The probability that ρ is not good is smaller than (n + 1)^2/|F|.


Proof We first bound the probability that M1 or M2 were set to zero. This may only happen if a linear function from one of them was set to zero, and this happens with probability 1/|F|. As there are 2D linear functions we get an upper bound of 2D/|F| on the probability (recall that we only work with the variables that do not belong to Lin(f)). To bound the probability that a new linear factor was introduced, we note that this is possible only if there is a linear function L1 dividing M1 and a linear function L2 dividing M2 such that L1|ρ ∼ L2|ρ, and they were not set to constants. As there is only one coefficient c for which L1|ρ = c · L2|ρ, we see that the probability that this happened is equal to the probability that the part of L1 − c·L2 that depends on the variables outside S was set to zero. As before, this happens with probability 1/|F|. As there are at most D^2 such pairs we get an upper bound of D^2/|F| on the probability. Thus, the overall probability is bounded by (D^2 + 2D)/|F| < (D + 1)^2/|F| ≤ (n + 1)^2/|F|.

We assume now that we have a set S and a good assignment ρ (we can verify whether ρ is good by finding the linear factors of f|ρ). Now we have to go over all possible simple multilinear ΣΠΣ(2) circuits for sim(f)|ρ of degree greater than D4 (and at most 2D4). Note that as there are only |F|^{2D4+1} linear functions in the variables {xi}i∈[2D4], there are polynomially many such circuits (|F|^{2D4·(2D4+1)} is an obvious upper bound). Among all those circuits we shall find, by going over all 0,1 assignments to {xi}i∈[2D4], a circuit that computes sim(f)|ρ. The next lemma shows that if such a circuit exists then it is the unique circuit for sim(f)|ρ. The proof follows from the definition of D4 and Theorem 5.

Lemma 13 For a set S and a good assignment ρ, if deg(sim(f)|ρ) > D4 then there is a unique simple multilinear ΣΠΣ(2) circuit that computes sim(f)|ρ. Moreover, this circuit is (M1/Lin(f))|ρ + (M2/Lin(f))|ρ.
Proof By our assumption we have that deg(sim(f)|ρ) > D4. Assume for a contradiction that there is a multilinear ΣΠΣ(2) circuit C = A1 + A2 such that C = sim(f)|ρ and C ≠ (M1/Lin(f))|ρ + (M2/Lin(f))|ρ. Then the circuit

C̃ = (M1/Lin(f))|ρ + (M2/Lin(f))|ρ − A1 − A2

is identically zero, of degree > D4. Notice that as ρ is a good assignment, C̃ is simple. By Corollary 5 we are assured that it is not minimal, and so we must have that A1 = (M1/Lin(f))|ρ or A1 = (M2/Lin(f))|ρ, which is a contradiction.

Lemma 13 assures us that there is at most one multilinear ΣΠΣ(2) circuit computing sim(f)|ρ. Similarly, we get that the algorithm finds, for every Si,j, the unique multilinear ΣΠΣ(2) circuit for sim(f)|ρi,j. Note that from the assumption that ρ is good it follows that (M1/Lin(f))|ρ and (M2/Lin(f))|ρ do not share a common linear factor. In particular, for every Si,j it is easy to find which of its multiplication gates is M1 and which is M2 (that is, which of them corresponds to the first multiplication gate of the multilinear ΣΠΣ(2) circuit for sim(f)|ρ). In addition, for every two variables xi and xj we know whether there is a linear function in M1 that contains both of them or not, and if there is such a function then we know the ratio between their coefficients. Using this information it is easy to find M1

and M2 and hence sim(f). We notice that by the uniqueness of the circuit for sim(f)|ρ we are assured that the circuit that we output is indeed a circuit for sim(f), and moreover it is the unique multilinear ΣΠΣ(2) circuit for sim(f).

3.4 Proof of Theorem 1

The proof of the theorem is easy given the two algorithms. First we find Lin(f) and compute D = d − deg(Lin(f)). If D ≤ D4 then we run the low degree algorithm. If D > D4 then we run the high degree algorithm. We are assured that if we get an answer then it is correct. The probability of failure, in both algorithms, is polynomially (in n) small (and can easily be decreased, e.g. by repeating the algorithm), and so the theorem follows.

4 General Circuits

In this section we prove Theorem 2. For convenience we repeat it here.

Theorem 2 Let f be an n-variate polynomial computed by a ΣΠΣ(2) circuit of degree d, over a field F. Then there is a randomized interpolation algorithm that, given black-box access to f and the parameters d and n, runs in quasi-polynomial time (in n, d, |F|) and has the following properties:

• If rank(f) = Ω(log^2(d)), then with probability 1 − o(1) the algorithm outputs the (unique) ΣΠΣ(2) circuit for f.

• If rank(f) = O(log^2(d)), then the algorithm outputs, with probability 1 − o(1), a polynomial Q(y1, . . . , yk) and k linear functions L1, . . . , Lk such that Q(L1, . . . , Lk) = f and k ≤ rank(f).

When |F| < max(d^5, n^5) the algorithm is allowed to make queries to f from a polynomial-size extension field of F. From now on we shall assume, w.l.o.g., that the underlying field F is of size greater than max(d^5, n^5). Assume that f is computed by a ΣΠΣ(2) circuit as in Equation (1). To ease the notations we shall assume that d1 = d2 = d. As described in Section 1.4 there are two conceptual steps in the proof. First we restrict the inputs to come from a random subspace V of dimension O(log^2(d)), where d = deg(f). We then learn the restriction of f to V, which we denote with f|V. The second step is to increase the dimension slowly and learn the restriction of f on larger and larger subspaces. While this is the general picture, there is a difference in the way that we handle functions with high rank and functions with low rank. Still, the first step in

both algorithms, for the low rank case and for the high rank case, is the same: we remove the linear factors using the factoring algorithm of Theorem 9. Thus, we can easily find Lin(f), and so from now on we will be interested in interpolating sim(f). We start by describing the algorithm for the low rank case.

4.1 Interpolation for the low rank case

In this section we give an interpolation algorithm for polynomials computed by ΣΠΣ(2) circuits of degree d that have rank at most 10R4 · log^2(d) (recall the definition of R4 from Corollary 7). Let f be such a polynomial and assume that there is a ΣΠΣ(2) circuit for f. Algorithm 3 is an interpolation algorithm for f. We now explain each of its steps and analyze its performance.

Algorithm 3 - General ΣΠΣ(2) circuits of low rank
1. Interpolating on a low dimensional random subspace: Pick a random subspace V of dimension s = 20R4 · log^2(d) + log^2(n). Find k' ≤ 10R4 · log^2(d) linearly independent linear functions {ℓi}i∈[k'] such that sim(f)|V = Q(ℓ1, . . . , ℓk') for some polynomial Q of degree deg(Q) ≤ d, and such that no such representation is possible for any k'' < k'.
2. Lifting from V to F^n: Lift the representation and find linear functions {ℓ'i}i∈[k'], over F^n, such that Q(ℓ'1, . . . , ℓ'k') = sim(f). If no such representation is found then output “fail”.

Analysis of Algorithm 3 From the definition of rank(f) it is clear that sim(f) can be written as a polynomial in at most rank(f) linear functions. The following definition will be useful for us as we interpolate sim(f).

Definition 14 Let h(x1, . . . , xn) be a polynomial. We say that h is a polynomial in exactly k linear functions if there is a polynomial P(y1, . . . , yk) and k linear functions L1, . . . , Lk such that h = P(L1, . . . , Lk), and there is no such representation for h with fewer than k linear functions.

The following lemma gives a sufficient and necessary condition for being a function of k linear functions.

Lemma 15 Let h(x1, . . . , xn) be a polynomial over F. Then h can be written as a function of k linear functions if and only if there is a subspace V ⊆ F^n of dimension k that satisfies the following condition: let V + ui, for 1 ≤ i ≤ |F|^{n−k}, be the different cosets of V in F^n; then for all i and u ∈ V + ui we have that h(u) = h(u − ui).


Note that the lemma does not speak about h being a function of exactly k linear functions. However, it is clear that if there is a subspace V of dimension k that satisfies the condition in the lemma, and there is no subspace V' of dimension < k that satisfies the condition of the lemma, then h is a function of exactly k linear functions.

Proof Assume that h is a function of k linear functions, namely h = P(L1, . . . , Lk). Assume w.l.o.g. that the Li-s are homogeneous. Let V* = {x̄ ∈ F^n : ∀i Li(x̄) = 0}. Let V be such that F^n = V* ⊕ V. Clearly dim(V) = k (if the Li-s are linearly dependent then h is a function of at most k − 1 linear functions). It follows that every u ∈ F^n can be written as u = v*u + vu for some v*u ∈ V* and vu ∈ V, and that for each i, Li(u) = Li(vu). In particular h(u) = h(u − v*u) (note that u sits in the coset v*u + V).

For the other direction, given such a V let L1, . . . , Lk be k linear functions such that L1|V, . . . , Lk|V are linearly independent. As before, let V* be the linear space on which the Li-s vanish. Clearly F^n = V ⊕ V*. It is clear that h|V can be written as Q(L1, . . . , Lk) for some polynomial Q. Given u ∈ F^n write u = v + v* for v ∈ V and v* ∈ V*. We have that

h(u) = h(u − v*) = Q(L1(v), . . . , Lk(v)) = Q(L1(u), . . . , Lk(u)).

Thus h = Q(L1, . . . , Lk).
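The first direction of the proof can be checked concretely on a toy example: for h = P(L1, L2), translations by V* = {x : L1(x) = L2(x) = 0} leave h unchanged, which is exactly the coset condition of Lemma 15. A brute-force sketch over GF(3)^3 (the choice of h, L1 and L2 is arbitrary):

```python
from itertools import product

p, n = 3, 3
L1 = (1, 1, 0)   # L1(x) = x0 + x1
L2 = (0, 1, 2)   # L2(x) = x1 + 2*x2 (independent of L1)

def dot(L, x):
    return sum(a * b for a, b in zip(L, x)) % p

def h(x):
    # h is a polynomial in the two linear forms L1 and L2 only.
    return (dot(L1, x) ** 2 + dot(L1, x) * dot(L2, x) + 1) % p

# V* = common kernel of L1 and L2, found by brute force over GF(3)^3.
Vstar = [v for v in product(range(p), repeat=n)
         if dot(L1, v) == 0 and dot(L2, v) == 0]

# Lemma 15's condition: h(u) = h(u + v) for every u and every v in V*.
invariant = all(
    h(u) == h(tuple((a + b) % p for a, b in zip(u, v)))
    for u in product(range(p), repeat=n)
    for v in Vstar
)
```

Since L1 and L2 are independent, V* has dimension n − 2 = 1, so |V*| = 3, and the invariance holds for all 27 points.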



The following is an easy corollary.

Corollary 16 Let h(x̄) = P(L1, . . . , Lk) be a function of exactly k linear functions. Let V be a subspace of F^n. Then h|V is a function of exactly k linear functions if and only if the restrictions L1|V, . . . , Lk|V are linearly independent.

Recall that sim(f) can be written as a polynomial in at most rank(f) linear functions. For convenience we shall assume that it is a polynomial in exactly k = rank(f) linear functions. Let

sim(f) = P(L1, . . . , Lk)   (8)

where P(y1, . . . , yk) is a polynomial and {Li}_{i=1}^{k} are linear functions. We shall later use this representation.

4.1.1 Step 1: Interpolating on a low dimensional random subspace

In this step we pick a random subspace V ⊆ F^n of dimension s = 20R4 · log^2(d) + log^2(n), and interpolate sim(f) over it. We first show that if sim(f) is a function of exactly k linear functions then w.h.p. its restriction to a random space V of high enough dimension is also a function of exactly k linear functions. To prove it we state a general lemma that will be helpful later.

Lemma 17 Let {ℓi}i∈[t] be a set of linearly independent linear functions over F^n. Let V ⊆ F^n be a random s-dimensional subspace. Then the probability that the set {ℓi|V}i∈[t] is linearly dependent is at most |F|^{t−s}.

Proof Clearly, ℓ1, . . . , ℓt are linearly dependent on V if and only if there is a nonzero linear combination α1·ℓ1 + . . . + αt·ℓt that vanishes on V. Given V let us define L* = {ℓ(x1, . . . , xn) : ℓ|V = 0}. Note that as V is random then so is L*. Thus, ℓ1, . . . , ℓt are linearly dependent on V if and only if there is a nonzero function in the span of the ℓi-s that belongs to L*. In other words, we have to bound the probability that a random subspace of dimension n − s (i.e. L*) intersects a given subspace of dimension ≤ t (i.e. span(ℓ1, . . . , ℓt)) non-trivially. As the probability that a random nonzero vector belongs to a given t-dimensional subspace is (|F|^t − 1)/(|F|^n − 1), we get by the union bound that the probability of an intersection is upper bounded by (|F|^{n−s} − 1) · (|F|^t − 1)/(|F|^n − 1) < |F|^{t−s}.

From Corollary 16 and Lemma 17 we get the following corollary.

Corollary 18 Let h be a function of exactly k linear forms (a linear form is a homogeneous linear function). Let V be a random subspace of dimension ≥ k + log^2(n). Then the probability that h|V is not a function of exactly k linear forms is at most |F|^{−log^2(n)}.

We assume then that sim(f)|V is a function of exactly k linear forms. The following lemma shows that we can easily verify that this is indeed the case.

Lemma 19 Let h be a polynomial that is a function of exactly k linear functions. Let V ⊆ F^n be an s-dimensional subspace such that h|V is a function of k' < k linear functions. Let v1, . . . , vn−s be vectors such that span(V ∪ {vi}i∈[n−s]) = F^n. Denote Vi = span(V ∪ {vi}). Then for some i ∈ [n − s] we have that h|Vi is a function of more than k' linear functions.

Proof The claim is an immediate corollary of Lemma 15.
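The event bounded in Lemma 17 is easy to test explicitly: if V is spanned by the rows of an s × n matrix B, then the coordinate vector of ℓi|V in that basis is B·ci (where ci is the coefficient vector of ℓi), and the restrictions are dependent exactly when these vectors have rank < t. A sketch over GF(5) with hand-picked subspaces, one where the restrictions stay independent and one where they collapse (the examples are hypothetical):

```python
p = 5

def matvec(B, c):
    # Coordinates of the restriction of the form c to span(rows of B).
    return [sum(r * x for r, x in zip(row, c)) % p for row in B]

def rank_mod_p(rows):
    # Gaussian elimination over GF(p).
    rows = [list(r) for r in rows]
    rank, col, m = 0, 0, len(rows[0]) if rows else 0
    while rank < len(rows) and col < m:
        piv = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if piv is None:
            col += 1
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][col], p - 2, p)   # inverse via Fermat
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                f = rows[i][col]
                rows[i] = [(x - f * y) % p for x, y in zip(rows[i], rows[rank])]
        rank += 1
        col += 1
    return rank

def restrictions_independent(B, forms):
    return rank_mod_p([matvec(B, c) for c in forms]) == len(forms)

forms = [(1, 0, 0), (0, 1, 0)]       # x0 and x1: independent over F^3
B_good = [(1, 0, 0), (0, 1, 1)]      # restrictions (1,0), (0,1): independent
B_bad  = [(1, 1, 0), (2, 2, 1)]      # restrictions (1,2), (1,2): dependent
```

Sampling B at random and counting how often `restrictions_independent` fails gives an empirical check of the |F|^{t−s} bound.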



Note that if we find k linear forms that sim(f)|V can be written as a polynomial in, then we can learn it by a brute-force interpolation (for completeness we give a description of this process in Appendix A). Of course we don't have such linear functions, but as k is small, we can go over all choices of k' = O(log^2(d)) linearly independent linear forms (linear forms over V) and for each such set {ℓi}i∈[k'] try to represent sim(f)|V as a polynomial Q(ℓ1, . . . , ℓk'). This may give us many different representations of sim(f)|V, but we keep a representation for which k' is the smallest. Notice that by the assumption made after Corollary 18 we have that k' = k. Thus, after the second step we have a polynomial Q and linear functions {ℓi}i∈[k] such that sim(f)|V = Q(ℓ1, . . . , ℓk). We now move to the second step, in which we show how to lift the representation of sim(f)|V to all of F^n.

4.1.2 Step 2: Lifting from V to F^n

In the second step of Algorithm 3 we look for linear functions {ℓ*i}i∈[k] such that ℓ*i|V = ℓi and sim(f) = Q(ℓ*1, . . . , ℓ*k). The existence of such functions is guaranteed by the following


two lemmas.

Lemma 20 Let h(x̄) be a polynomial in exactly k linear functions. Let P(ℓ'1, . . . , ℓ'k) = h = Q(ℓ1, . . . , ℓk) be two different representations of h. Then span({ℓ'i}i∈[k]) = span({ℓi}i∈[k]).

Proof Assume for a contradiction that, w.l.o.g., ℓ'1, ℓ1, . . . , ℓk are linearly independent. Let W be the co-dimension 1 subspace on which ℓ'1 vanishes. It is clear that ℓ1|W, . . . , ℓk|W are linearly independent, whereas ℓ'1|W, . . . , ℓ'k|W are linearly dependent (as ℓ'1|W = 0). In particular, by Corollary 16 we get that h|W is a function of at most k − 1 linear functions (when considering the representation according to the ℓ'i|W-s), and a function of exactly k linear functions (when considering the representation according to the ℓi|W-s), which is a contradiction.

Lemma 21 Let {Li}i∈[k] be the linear functions in Equation (8) and {ℓi}i∈[k] be the linear functions found in the first step of our algorithm. Assume that {ℓ*i}i∈[k] satisfy ℓ*i|V = ℓi and span({ℓ*j}j∈[k]) = span({Lj}j∈[k]). Then Q(ℓ*1, . . . , ℓ*k) = sim(f).

Proof We shall use the same notations as in Equation (8). By the assumption that span({ℓ*j}j∈[k]) = span({Lj}j∈[k]) we see that there is a polynomial Q'(y1, . . . , yk), of degree deg(Q') = deg(P), such that Q'(ℓ*1, . . . , ℓ*k) = P(L1, . . . , Lk) = sim(f). By considering the restriction to V we get

Q(ℓ1, . . . , ℓk) = sim(f)|V = P(L1, . . . , Lk)|V = Q'(ℓ*1, . . . , ℓ*k)|V = Q'(ℓ*1|V, . . . , ℓ*k|V) = Q'(ℓ1, . . . , ℓk).

As the functions {ℓi}i∈[k] are linearly independent, and the degrees of Q and Q' are smaller than the size of the field, we get that Q ≡ Q'. In other words, Q(y1, . . . , yk) and Q'(y1, . . . , yk) are the same polynomial. In particular Q(ℓ*1, . . . , ℓ*k) = Q'(ℓ*1, . . . , ℓ*k) = sim(f), which is what we wanted to prove.

We shall construct the ℓ*i-s in the following way. We already know what their restriction to V should look like.
To continue we construct n − s linear spaces {Vi}i∈[n−s] of dimension s + 1 such that ∩Vi = V and span(∪Vi) = F^n. After that we find the restrictions of the ℓ*j-s to each Vi (in the same manner as before). We then glue all these different restrictions together to get the ℓ*j-s. Note that as dim(span{ℓ*j}j∈[k]) = k = dim(span{ℓj}j∈[k]), there is a unique, and easy, way of gluing the different restrictions of ℓ*j into one linear function, for each j. We now give the algorithm (Algorithm 4) for lifting from V to F^n.

Algorithm 4 Lifting from V to F^n
1. Let {vi}i∈[n] be a basis of F^n such that {vi}_{i=1}^{s} is a basis of V. For every i ∈ [n − s] let Vi = span(V ∪ {vs+i}). For each Vi find a representation of sim(f)|Vi of the form sim(f)|Vi = Q(ℓ1^(i), . . . , ℓk^(i)), where for each i and j we have that ℓj^(i)|V = ℓj.
2. Find the unique linear functions {ℓ*j}j∈[k] that satisfy ℓ*j|Vi = ℓj^(i), for each i and j.

Analysis of Algorithm 4 It is clear that as the dimension of Vi is roughly the dimension of V, each such interpolation requires quasi-polynomial time. Consider the polynomial and linear functions returned by the algorithm over Vi. We denote with Qi(y1, . . . , yk) the polynomial and with {ℓ̃j^(i)}j∈[k] the linear functions. We have that Qi(ℓ̃1^(i), . . . , ℓ̃k^(i)) = sim(f)|Vi. As V ⊆ Vi and

Qi(ℓ̃1^(i), . . . , ℓ̃k^(i))|V = sim(f)|V = Q(ℓ1, . . . , ℓk),

we get by Lemma 20 that

span({ℓ̃j^(i)|V}j∈[k]) = span({ℓj}j∈[k]) and span({ℓ̃j^(i)}j∈[k]) = span({Lj|Vi}j∈[k]).

Therefore, there is a different set of linear functions {ℓ̂j^(i)}j∈[k] and a polynomial Q̂i such that

Q̂i(ℓ̂1^(i), . . . , ℓ̂k^(i)) = Qi(ℓ̃1^(i), . . . , ℓ̃k^(i)) = sim(f)|Vi,

and for every 1 ≤ j ≤ k

ℓ̂j^(i)|V = ℓj.

In addition we have that

span({ℓ̂j^(i)}j∈[k]) = span({Lj|Vi}j∈[k]).   (9)

As this is the case for every 1 ≤ i ≤ n − s, and the vectors {vi}i∈[n] form a basis of F^n, it follows that for every 1 ≤ j ≤ k there is a unique linear function ℓ*j, over F^n, such that for every Vi

ℓ*j|Vi = ℓ̂j^(i).

Moreover, it is easy to find the ℓ*j-s in polynomial time (think about the representation of ℓ*j according to the basis {vi}: ℓj gives the first s coordinates, and each ℓ̂j^(i) gives the (s+i)'th coordinate). Equation (9) and the definition of the ℓ*j-s imply that

span({ℓ*j}j∈[k]) = span({Lj}j∈[k]).

By Lemma 21 we get that Q(ℓ*1, . . . , ℓ*k) = sim(f).
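Step 2 of Algorithm 4 is mechanical once the restrictions agree on V: in the basis {vi} the first s coordinates of ℓ*j are fixed by ℓj, and each Vi contributes the single (s+i)'th coordinate. A toy sketch of this gluing (the coefficient vectors are hypothetical):

```python
def glue(restrictions, s, n):
    """restrictions[i] (for i = 1..n-s) is the coefficient vector of the
    restriction of ell* to V_i: its first s entries are the coefficients
    on V and its last entry is the coefficient of v_{s+i}.  All
    restrictions must agree on V; the glued ell* takes the common first
    s coordinates plus one extra coordinate from each V_i."""
    assert s + len(restrictions) == n
    base = restrictions[0][:s]
    assert all(r[:s] == base for r in restrictions)  # agreement on V
    return base + [r[s] for r in restrictions]

# ell* = 2*v1 + v2 + 3*v3 + 4*v4 with s = 2 and n = 4:
rests = [[2, 1, 3],   # restriction to V_1: coefficient of v3 is 3
         [2, 1, 4]]   # restriction to V_2: coefficient of v4 is 4
ell_star = glue(rests, s=2, n=4)
```

This is exactly the parenthetical remark above: in the basis {vi} the gluing is just a concatenation of coordinates.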

This completes the interpolation of ΣΠΣ(2) circuits of rank at most 10R4 · log^2(d). We note that as k = O(log^2(d)) and deg(Q) ≤ d, Q(ℓ*1, . . . , ℓ*k) can be represented as a depth-3 circuit with quasi-polynomially many multiplication gates. Next we show how to handle the case of higher rank.

4.2 Interpolation for the high rank case

Algorithm 5 shows how to interpolate sim(f) when rank(f) ≥ 10R4 · log^2(d). In this section we shall use the notation f = M1 + M2, where M1 and M2 are the two multiplication gates in the ΣΠΣ(2) circuit for f. We note that from Corollary 8 and the assumption that rank(f) ≥ 10R4 · log^2(d) we get that f has a unique ΣΠΣ(2) circuit.

Algorithm 5 General circuits of high rank
1. Interpolating on a low dimensional subspace I: Pick a random subspace V ⊆ F^n of dimension s = 20R4 · log^2(d) + log^2(n). Consider the restriction f|V. For every set of t = 100 log(d) linearly independent linear functions {ℓi}i∈[t] over V, check whether for every i ∈ [t] the restriction of f to the (affine) subspace V ∩ {x̄ : ℓi(x̄) = 0} is equal to a product of linear functions (by factoring).
2. Interpolating on a low dimensional subspace II: For each choice of {ℓi}i∈[t] for which factoring was possible, merge (again, only if possible) the different factors into one multiplication gate. For each multiplication gate found, M, check whether f|V − M is a product of linear functions (using the factoring algorithm from Theorem 9). If this is the case then output the representation found for f|V. If no such representation is found then output “fail”.
3. Lifting from V to F^n: Lift the representation found over V to a representation over F^n.

4.2.1 Step 1: Interpolating on a low dimensional subspace I

Recall that gcd(C) = g.c.d.(M1, M2), and that

sim(C) = C/gcd(C) = M1' + M2'   (10)

is a ΣΠΣ(2) circuit. From Corollary 8 and the assumption that rank(f) ≥ 10R4 · log^2(d) we get that sim(C) is the unique ΣΠΣ(2) circuit for f/gcd(C). Let V ⊆ F^n be a random subspace of dimension s = 20R4 · log^2(d) + log^2(n). The following lemma shows that w.h.p. rank(f|V) is high.

Lemma 22 Pr[rank(f|V) ≤ 10R4 · log^2(d)] ≤ |F|^{−Ω(log^2 n)}.

Proof To prove the claim we show that w.h.p. the rank of the circuit M1|V + M2|V is not too small, and then, by Corollary 8, we get that this is actually the unique representation of f|V as a ΣΠΣ(2) circuit.

Lemma 23 Pr[rank((M1|V + M2|V)/g.c.d.(M1|V, M2|V)) ≤ 10R4 · log^2(d)] ≤ |F|^{−Ω(log^2(n))}.

Proof [of Lemma 23] In order to show that the rank of M1|V + M2|V does not decrease by much, we first show that w.h.p. g.c.d.(M1|V, M2|V) = g.c.d.(M1, M2)|V, and then using Lemma 17 we complete the proof.

Lemma 24 Pr[g.c.d.(M1|V, M2|V) ≠ g.c.d.(M1, M2)|V] ≤ d^2(|F| + 1)/|F|^s = |F|^{−Ω(log^2(n))}.

Proof [of Lemma 24] Clearly, g.c.d.(M1|V, M2|V) can be larger than g.c.d.(M1, M2)|V only if there are two linearly independent linear functions L1, L2, such that Li divides Mi, and such that L1|V ∼ L2|V. The probability, over the choice of V, that this happens is at most (|F| + 1)/|F|^s. As there are at most d^2 such pairs of linear functions, the overall probability that the degree of the g.c.d. increases is at most d^2(|F| + 1)/|F|^s.

We continue with the proof of Lemma 23. We know that w.h.p. g.c.d.(M1|V, M2|V) = g.c.d.(M1, M2)|V. Therefore, w.h.p., (M1|V + M2|V)/g.c.d.(M1|V, M2|V) = M1'|V + M2'|V (as defined in Equation (10)). By the assumption on rank(f), we get that the rank of the linear functions in M1' + M2' is at least 10R4 · log^2(d). The result follows by using Lemma 17 with the parameters s = 20R4 · log^2(d) + log^2(n) and t = 10R4 · log^2(d) (although the rank is at least 10R4 · log^2(d), we only consider a subset of the linear functions of dimension exactly 10R4 · log^2(d)). The lemma implies that

Pr[rank(M1'|V + M2'|V) ≤ 10R4 · log^2(d)] ≤ |F|^{−log^2(n)}.

This completes the proof of Lemma 23.

In order to complete the proof of Lemma 22 we have to show that the only representation of f|V as a ΣΠΣ(2) circuit is M1|V + M2|V. By the definition of rank(f) and the conclusion of Lemma 23 this ensures us that rank(f|V) ≥ 10R4 · log^2(d). Indeed, by combining Corollary 8 and Lemma 23 we get that

Pr[rank(f|V) ≤ 10R4 · log^2(d)] ≤ |F|^{−Ω(log^2 n)}.

This completes the proof of Lemma 22.

Now that we have established that (w.h.p.) rank(f|V) is high, we continue with the analysis of our algorithm. We go over all choices of t = 100 log(d) linearly independent linear functions {ℓi}i∈[t]. Having Equation (10) in mind, what we are actually after is a set {ℓi}i∈[t] such that ∏_{i=1}^{t} ℓi divides M1'|V (or M2'|V). As we just proved that rank(f|V) is high, we get that there exists such a choice of linear functions. Note that the number of choices of sets that we have to consider is bounded from above by |F|^{100 log(d)·s}, which is quasi-polynomial in n, d and |F|. Given a choice of linear functions {ℓi}i∈[t] we consider the following (affine) subspaces of V:

V|ℓi=0 = {x̄ ∈ V : ℓi(x̄) = 0}.

For each such subspace we factor the polynomial fi = f|V|ℓi=0. If each of the polynomials {fi} factors into a product of linear functions then we keep {ℓi}i∈[t] for further study. After going over all choices of {ℓi}i∈[t] we move to the next step. Clearly this step takes quasi-polynomial time. From now on we assume that g.c.d.(M1|V, M2|V) = gcd(C)|V (i.e. that no new linear factors were introduced; by Lemma 17 this happens with high probability).

4.2.2 Step 2: Interpolating on a low dimensional subspace II

In this step we find a set {ℓi}i∈[t] that allows us to “glue” the different fi's to a single multiplication gate. This is the main step of the algorithm. Once we manage to do so it will be easy to find a representation for f|V. Before giving the algorithm we need some additional notations. Consider a set {ℓi}i∈[t] that survived the previous step (that is, that was kept for further study). For convenience we shall assume that V and {ℓi}i∈[t] have a particularly nice structure. First we assume that V = {(a1, . . . , as, 0, . . . , 0) : ∀i ai ∈ F}. Hence, every linear function defined on V is of the form

L(v) = ∑_{i=1}^{s} αi·ai + α0,

for the vector v = (a1, . . . , as, 0, . . . , 0). We shall also assume that ℓi(v) = ℓi((a1, . . . , as, 0, . . . , 0)) = ai. In other words, we assume that ℓi is a projection on the i-th coordinate. Notice that as we can apply linear transformations to F^n, this can be assumed without loss of generality (actually, ℓi can be an affine function and so should be of the form ai + νi, for some constant νi; however, this does not change the algorithm and would only add unnecessary complications to the presentation). Thus, by our assumption we get that fi = f|V|ℓi=0 = f|V|xi=0, and so we can write

fi = ∏_{j=1}^{di} (Tj^(i)(x1, . . . , x̂i, . . . , xs))^{ei,j}

where Tj^(i) is a linear function, ei,j is its multiplicity, and (x1, . . . , x̂i, . . . , xs) is the input vector (x1, . . . , xs) without its i-th coordinate. We also denote

Tj^(i) = ∑_{b=1, b≠i}^{s} αi,j,b·xb + αi,j,0.

We would like to merge the different factorizations into one. Before we give a formal algorithm we describe the intuition behind it. Assume that we are considering a set {ℓi}i∈[t] such that ∏_{i=1}^{t} ℓi divides M1'|V. It follows that fi = M2|V|ℓi=0. In particular, there is a way of adding to each linear function Tj^(1) a term of the form α1,j,1·x1 in such a way that

M2|V = ∏_{j=1}^{d1} (Tj^(1) + α1,j,1·x1)^{e1,j}

(we are cheating here: Tj^(1) may have several lifts, up to e1,j different lifts, and so we need to be more careful). We shall now see how to find the appropriate coefficient α1,j,1 for each Tj^(1). Consider the factorization of f1:

f1 = ∏_{j=1}^{d1} (Tj^(1)(x̂1, x2, . . . , xs))^{e1,j}.

As we just said, this gives us information about all the linear functions in M2|V, except their coefficient of x1. We therefore look at the factorization of f2:

f2 = ∏_{j=1}^{d2} (Tj^(2)(x1, x̂2, x3, . . . , xs))^{e2,j},

and search for two linear functions Tj1^(1) and Tj2^(2) such that Tj1^(1)|x2=0 ∼ Tj2^(2)|x1=0. In other words, Tj1^(1) and Tj2^(2) have the same (up to a constant factor) vector of coefficients (except the coefficients of x1 and x2) and the same free term. We would like to combine the two functions into one linear function. Namely, we would like to find a linear function L such that L|x1=0 ∼ Tj1^(1) and L|x2=0 ∼ Tj2^(2). Naturally we would like to add the term α2,j2,1·x1 to Tj1^(1) (after an appropriate normalization). The problem, however, is that when considering Tj1^(1)|x2=0 there may be other indices j1', j2' such that

Tj1'^(1)|x2=0 ∼ Tj1^(1)|x2=0 ∼ Tj2^(2)|x1=0 ∼ Tj2'^(2)|x1=0,

and then we will not know how to combine them, and with what multiplicity.
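In the simplest case the merging described above is direct: if Tj1^(1) (which misses its x1-coefficient) and Tj2^(2) (which misses its x2-coefficient) are proportional after ignoring x1 and x2, they can be scaled and combined into a single L with L|x1=0 ∼ Tj1^(1) and L|x2=0 ∼ Tj2^(2). A sketch over a small prime field, with hypothetical coefficient vectors (index 0 is the free term) and ignoring the multiplicity bookkeeping discussed in the text:

```python
p = 7

def proportionality(T1, T2, skip):
    """Return c with T2[b] = c*T1[b] (mod p) for all coordinates b not in
    skip, or None if no such nonzero constant exists."""
    c = None
    for b, (a1, a2) in enumerate(zip(T1, T2)):
        if b in skip:
            continue
        if a1 == 0 and a2 == 0:
            continue
        if a1 == 0 or a2 == 0:
            return None
        cb = (a2 * pow(a1, p - 2, p)) % p
        if c is None:
            c = cb
        elif c != cb:
            return None
    return c

def merge(T1, T2):
    """T1 has no x1-term, T2 has no x2-term (coefficient vectors are
    (free term, x1, x2, x3, ...)).  Build L with L|x1=0 ~ T1 and
    L|x2=0 ~ T2, or return None if the two factors do not match."""
    c = proportionality(T1, T2, skip={1, 2})
    if c is None:
        return None
    L = [(c * a) % p for a in T1]   # scale T1 to agree with T2 off x1, x2
    L[1] = T2[1]                    # take the x1-coefficient from T2
    return L

# Hypothetical factors: T1 = 3 + 2*x2 + x3 (no x1), T2 = 6 + 5*x1 + 2*x3.
T1 = [3, 0, 2, 1]
T2 = [6, 5, 0, 2]
L = merge(T1, T2)
```

Here c = 2, so L = 2·T1 with the x1-coefficient 5 inserted, i.e. L = 6 + 5x1 + 4x2 + 2x3.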

A solution to this problem is to find an index j and a coordinate i such that there is no other j' for which Tj'^(1) belongs to the span of Tj^(1) and xi. In particular, for those i and j we have that for every j',

Tj'^(1)|xi=0 ≁ Tj^(1)|xi=0.

Once we have such a pair (j, i) we look for a function Tmj^(i) such that

Tj^(1)|xi=0 ∼ Tmj^(i)|x1=0,

and from the previous discussion it is clear that there is only one consistent way of defining a linear function L that satisfies L|x1=0 ∼ Tj^(1) and L|xi=0 ∼ Tmj^(i) (up to multiplication by a constant). Thus, we only have to prove that we can always find such a pair (j, i). To show this we consider, for a contradiction, the case where such a pair does not exist. In such a situation we have that for every pair (j, i) there is an index j' such that Tj'^(1)|xi=0 ∼ Tj^(1)|xi=0. In other words, for every xi and every j there is j' such that xi ∈ span(Tj^(1), Tj'^(1)). But this means that the mapping (x2, . . . , xs) → (T1^(1), . . . , Td1^(1)) is a locally decodable code. Hence, by Theorem 3 we must have that s = O(log d1), which contradicts the assumption that f|V has high rank. We can continue to learn functions this way until no such pair (j, i) exists. Then we use the information about the linear functions that we already learned in order to learn the remaining functions. Again, the idea is to find a function Tj^(1) whose x1-coefficient is still unknown and to find xi such that all the Tj'^(1) that satisfy Tj^(1)|xi=0 ∼ Tj'^(1)|xi=0, except Tj^(1) itself, are already known. We now give the formal algorithm (Algorithm 6) that shows how to glue the different fi-s together.

Analysis of Algorithm 6

We now show that for some set {ℓ_i}_{i∈[t]} the algorithm finds a representation of f|_V, and analyze its running time.

Theorem 25 If the rank of f|_V is larger than t = 100·log(d) (and in our case it is at least 10R_4·log²(d)), then there is a set {ℓ_i}_{i∈[t]} for which Algorithm 6 outputs a ΣΠΣ(2) circuit computing f|_V.

Proof We assume that sim(f) is given by Equation (10). We also assume, w.l.o.g., that M_1'|_V contains t linearly independent functions {ℓ_i}_{i∈[t]}. We shall prove that this is a set for which the algorithm above outputs a ΣΠΣ(2) circuit for f|_V. Denote

    M_2'|_V = ∏_{j=1}^{d} ( L_j(x_1, …, x_s) )^{e_j}.    (11)

Algorithm 6 Gluing the different f_i-s together

1. Set J, L = ∅.

2. While |J| < d_1 (= deg(f_1)) do the following:

   (a) Find a pair of indices (i, j) such that wt(T_j^{(1)}) > 1 and for every j' ≠ j, if T_{j'}^{(1)}|_{x_i=0} ∼ T_j^{(1)}|_{x_i=0} then j' ∈ J. If no such (i, j) exists then go to step 2f.

   (b) Find all the m-s such that T_m^{(i)}|_{x_1=0} ∼ T_j^{(1)}|_{x_i=0}. Let {m_1, …, m_r} be the indices found. Let R_{k_1}, …, R_{k_a} ∈ L be all the linear functions in L that satisfy

       ∀γ ∈ [a]:  R_{k_γ}|_{x_1=x_i=0} ∼ T_j^{(1)}|_{x_i=0}.

   (c) Let p ≠ i be such that α_{1,j,p} ≠ 0 and p is the first such index. Normalize the functions {T_{m_b}^{(i)}}_{b∈[r]} and {R_{k_γ}}_{γ∈[a]} such that their coefficient of x_p is equal to α_{1,j,p}. For simplicity denote by {T_{m_b}^{(i)}}_{b∈[r]}, {R_{k_γ}}_{γ∈[a]} the normalized functions as well.

   (d) For each b ∈ [r] denote

       κ_{j,i,b} = Σ_{R∈L s.t. R|_{x_i=0} ∼ T_{m_b}^{(i)}} μ(R).

       When there is no such R we set κ_{j,i,b} = 0.

   (e) For each triplet (j, i, b) with κ_{j,i,b} < e_{i,m_b} define R_{j,i,b} = T_j^{(1)} + α_{i,m_b,1}·x_1 and set its multiplicity to be μ(R_{j,i,b}) = e_{i,m_b} − κ_{j,i,b}. Add j to J and R_{j,i,b}, with multiplicity μ(R_{j,i,b}), to L (recall that T_{m_b}^{(i)} was normalized).

   (f) Learn all the functions of weight 1 in the factorization of f_1 in a similar fashion.

3. Find α ∈ F such that α · ∏_{R∈L} (R|_{x_1=0})^{μ(R)} = f_1. If no such α exists output "fail".

4. Factor the polynomial f|_V − α · ∏_{R∈L} R^{μ(R)}. If it factors as a product of linear functions ∏_{i∈[d]} T_i^{μ(T_i)} then define C to be the following ΣΠΣ(2) circuit:

       C = α · ∏_{R∈L} R^{μ(R)} + ∏_{i∈[d]} T_i^{μ(T_i)}.

   Otherwise output "fail".

5. Verify that the rank of C is ≥ 10R_4·log²(d). Otherwise output "fail". By running over all inputs from V verify that C = f|_V and output C. If C ≠ f|_V then return "fail".
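The multiplicity bookkeeping of steps 2d-2e can be sketched as follows (our own illustrative Python, not the paper's code; `mu` plays the role of μ, `kappa` of κ_{j,i,b}, and the coefficient-tuple representation of linear functions is an assumption of the sketch):

```python
# Illustrative bookkeeping for steps 2d-2e; not the paper's code.
# The learned lifts L are stored as {coefficient_tuple: multiplicity mu}.
P = 101

def proportional(u, v):
    """u ~ v over F_P: equal up to a nonzero constant factor."""
    lam = None
    for a, b in zip(u, v):
        a, b = a % P, b % P
        if (a == 0) != (b == 0):
            return False
        if a:
            l = b * pow(a, P - 2, P) % P
            if lam is None:
                lam = l
            elif l != lam:
                return False
    return True

def restrict(T, i):
    """T|_{x_i=0}: zero out the coefficient of x_i (index 0 is the free term)."""
    return tuple(0 if k == i else c for k, c in enumerate(T))

def kappa(learned, T_mb, i):
    """kappa_{j,i,b}: total multiplicity already accounted for by lifts in L
    whose restriction to x_i = 0 is ~ T_mb (step 2d); 0 if there are none."""
    return sum(mu for R, mu in learned.items()
               if proportional(restrict(R, i), T_mb))

learned = {(0, 1, 1, 0): 2,   # R = x1 + x2, multiplicity 2
           (0, 2, 2, 1): 1}   # R = 2*x1 + 2*x2 + x3, multiplicity 1
T_mb = (0, 1, 1, 0)           # a factor of f_3 (so its x3-coefficient is 0)
k = kappa(learned, T_mb, i=3)
assert k == 3
# Step 2e then adds a new lift only when k < e_{i,m_b}, with the leftover
# multiplicity e_{i,m_b} - k.
```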


For each L_j we denote^10 L_j^{(i)} = L_j|_{ℓ_i=0} = L_j|_{x_i=0}. With this notation we have^11

    f_i = ∏_j ( L_j^{(i)}(x_1, …, x_s) )^{e_j}.

^10 Recall our assumption on the structure of V and {ℓ_i}.
^11 Notice that for each i there might be L_{j_1} and L_{j_2} such that L_{j_1}|_{x_i=0} ∼ L_{j_2}|_{x_i=0}.

We prove the theorem by looking at the sets J and L and showing that after each execution of steps 2a-2f they satisfy the following properties.

Property 1: For every j ∈ J and for every L_k such that L_k|_{x_1=0} ∼ T_j^{(1)} there is a unique R ∈ L such that R ∼ L_k. For this R we have that μ(R) = e_k.

Property 2: For every R ∈ L there exists a unique j ∈ J and a unique L_k such that R ∼ L_k and R|_{x_1=0} ∼ T_j^{(1)}. For this L_k we have that μ(R) = e_k.

This seems like a complicated assertion, but it matches the intuition for the algorithm that we sketched before. We think of J as the set of indices j for which we already learned T_j^{(1)}, and of L as the collection of linear functions {L_i} that correspond to the T_j^{(1)}-s that we learned.

We start by showing that these properties trivially hold for J = ∅. From the structure of our algorithm it is clear that when J = ∅ then also L = ∅ and there is nothing to prove. So assume that J and L have these properties. We now run the algorithm for our J and L and see how it affects them.

We start by showing that in step 2a we indeed find a function T_j^{(1)} to work with. Recall that we look for a function T_j^{(1)} and a coordinate i such that for every j' ≠ j with T_{j'}^{(1)}|_{x_i=0} ∼ T_j^{(1)}|_{x_i=0}, we have that j' ∈ J. The following claim shows that if t > 60·log(d) + 60 then we can always find such a pair (i, j).

Lemma 26 Let F be a set of ≤ d linear functions in t > 60·log(d) + 60 variables, such that any two linear functions T ≠ T' ∈ F are linearly independent (that is, T ≁ T'). Then there exists a linear function T ∈ F and an index 1 ≤ i ≤ t, such that there is no T ≠ T' ∈ F satisfying T'|_{x_i=0} ∼ T|_{x_i=0}.

The proof is by a reduction to a result on locally decodable codes. We show that if this is not the case then we can construct a 2-query linear locally decodable code over the field F that sends t field elements to ≤ d elements. Because of the lower bounds on the length of such LDCs this yields a contradiction.

Proof [of Lemma 26] Assume for a contradiction that for every T ∈ F and every i ∈ [t], there is T ≠ T' ∈ F such that T'|_{x_i=0} ∼ T|_{x_i=0}. Consider the code (x_1, …, x_t) → (T(x_1, …, x_t))_{T∈F}. We shall prove that this is a 2-query LDC from F^t to F^{|F|}. Consider x_i for some i ∈ [t]. We can partition F into disjoint sets F = ∪_{k=1}^{r} F_k, for some 1 ≤ r < |F|, such that if T ≠ T' ∈ F_k then x_i ∈ span(T, T'). This follows from the assumption that any two functions in F are linearly independent, so if

T'|_{x_i=0} ∼ T|_{x_i=0}, then x_i ∈ span(T, T'). This immediately shows that this is a 2-query LDC that can decode correctly with probability greater than, say, 2/3 from a 1/10 fraction of errors. By Theorem 3 it follows that t ≤ 60·log(|F|) + 60 ≤ 60·log(d) + 60, which is a contradiction. □
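To illustrate the reduction, here is a toy version of the encoding from the proof (our own Python, restricted to homogeneous linear functions for simplicity): each codeword coordinate is the value of one T ∈ F, and a coordinate x_i that lies in the span of two of the functions can be read off with two queries.

```python
# Toy 2-query decoding in the spirit of Lemma 26's proof; not the paper's code.
# F is a set of (homogeneous) linear functions, given as coefficient tuples.
P = 101
F_funcs = [(1, 0, 0), (0, 1, 1), (1, 1, 1), (0, 1, 2)]

def evaluate(T, x):
    return sum(c * a for c, a in zip(T, x)) % P

def encode(x):
    """The code from the proof: x -> (T(x))_{T in F}."""
    return [evaluate(T, x) for T in F_funcs]

# x_3 lies in span((0,1,1), (0,1,2)): indeed x_3 = T_4 - T_2 as functions,
# so two codeword positions suffice to decode it.
x = (5, 7, 9)
word = encode(x)
assert (word[3] - word[1]) % P == x[2]
```

When such a pair exists for every coordinate, the whole of x can be decoded with 2 queries per coordinate, which is exactly the situation the lower bound on 2-query LDCs rules out for small |F|.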

In particular we can always find a linear function T_j^{(1)} with j ∉ J and an index i such that any j' ≠ j with T_{j'}^{(1)}|_{x_i=0} ∼ T_j^{(1)}|_{x_i=0} is in J. Assume that wt(T_j^{(1)}) > 1 (this is not really necessary, but it makes this case easier to handle). We note that the analysis is basically the same for the case wt(T_j^{(1)}) = 1. We now show that Step 2e indeed finds all the L_k-s whose restriction to x_1 = 0 is equal to T_j^{(1)}, and puts them in L with the right multiplicities.

Analysis of Step 2e: Let D_L, D_j ⊆ [d] be defined as

    D_j = { u ∈ [d] : L_u|_{x_1=0} ∼ T_j^{(1)} },
    D_L = { k ∈ [d] : L_k|_{x_1=0} ≁ T_j^{(1)} and L_k|_{x_1=x_i=0} ∼ T_j^{(1)}|_{x_i=0} }.

The intuition for the definition is that D_j is the set of linear functions in M_2'|_V that are mapped to T_j^{(1)} under the restriction x_1 = 0, and D_L is the set of linear functions in M_2'|_V that are not equal to T_j^{(1)}, when we set x_1 = 0, because of the i-th coordinate. From the definition of D_j we get that

    e_{1,j} = Σ_{u∈D_j} e_u.    (12)

We now show that from Property 1 of J and L it follows that if k ∈ D_L then L_k ∈ L (up to multiplication by a constant). Indeed, L_k|_{x_1=0} appears in the factorization of f_1, say T_{j_k}^{(1)} ∼ L_k|_{x_1=0}. By our choice of (i, j), and because T_{j_k}^{(1)}|_{x_i=0} ∼ L_k|_{x_1=x_i=0} ∼ T_j^{(1)}|_{x_i=0}, we get that j_k ∈ J. From Property 1 it follows that there is R ∈ L such that R ∼ L_k and μ(R) = e_k. We also note that by Property 2 there is no u ∈ D_j and R ∈ L such that R ∼ L_u (as then we would have j ∈ J).

We continue with the analysis of Step 2e. Let {T_{m_b}}_{b∈[r]} be as in Step 2b. For each m_b, let

    D_{m_b} = { v ∈ [d] : L_v|_{x_i=0} ∼ T_{m_b}^{(i)} }.

Lemma 27 The following equalities hold:
1. D_{m_b} \ D_L = D_j ∩ D_{m_b}.
2. |D_j ∩ D_{m_b}| = 1.
3. κ_{j,i,b} = Σ_{k∈D_L∩D_{m_b}} e_k.


Proof [of Lemma 27] We start by proving the first equality. Recall that T_{m_b}^{(i)}|_{x_1=0} ∼ T_j^{(1)}|_{x_i=0}. Consider some k ∈ D_{m_b}. We have that (L_k|_{x_i=0})|_{x_1=0} ∼ T_{m_b}^{(i)}|_{x_1=0} ∼ T_j^{(1)}|_{x_i=0}. Therefore, by definition, if k ∉ D_j then k ∈ D_L. In other words, D_{m_b} ⊆ D_L ∪ D_j. As D_j and D_L are disjoint we get that D_{m_b} \ D_L = D_j ∩ D_{m_b}.

We prove the second equality by showing that if u ∈ D_j ∩ D_{m_b} then we can reconstruct L_u from T_j^{(1)} and T_{m_b}^{(i)}, and therefore u is the only element of D_j ∩ D_{m_b}. We now give the formal argument. Let u ∈ D_j ∩ D_{m_b}. It follows that L_u|_{x_1=0} ∼ T_j^{(1)} and L_u|_{x_i=0} ∼ T_{m_b}^{(i)}. As we assume that wt(T_j^{(1)}) > 1, it follows that there is a coordinate p ≠ i such that the coefficients of x_p in T_j^{(1)}, in T_{m_b}^{(i)} and in L_u are nonzero (p is the first nonzero coordinate, different from i, in T_j^{(1)}). We notice that this implies that there is a unique way of reconstructing L_u from T_j^{(1)} and T_{m_b}^{(i)}. Indeed, as we do not care about multiplication by constants, we may assume that the coefficient of x_p in all three functions is 1 (in fact this is what we do in Step 2c). Thus the coefficient of x_1 in L_u must be the same as the coefficient of x_1 in T_{m_b}^{(i)}, and similarly the coefficient of x_i in L_u must be the same as the coefficient of x_i in T_j^{(1)}. The other coefficients are equal in all three functions. Therefore, L_u is determined up to a constant factor, but as the different L_k-s are linearly independent, we get that |D_j ∩ D_{m_b}| = 1.

Finally, to prove the third equality we first note that from Property 1 it follows that if k ∈ D_L ∩ D_{m_b} then there exists a unique R ∈ L such that R ∼ L_k, R|_{x_i=0} ∼ L_k|_{x_i=0} ∼ T_{m_b}^{(i)} and μ(R) = e_k. In particular this implies that

    κ_{j,i,b} = Σ_{R∈L s.t. R|_{x_i=0} ∼ T_{m_b}^{(i)}} μ(R) ≥ Σ_{k∈D_L∩D_{m_b}} e_k.

On the other hand, Property 2 implies that for every R ∈ L there exist L_k and j' ∈ J such that R ∼ L_k and R|_{x_1=0} ∼ T_{j'}^{(1)} (and of course μ(R) = e_k). In particular, if R|_{x_i=0} ∼ T_{m_b}^{(i)} then L_k|_{x_i=0} ∼ T_{m_b}^{(i)}. Thus, k ∈ D_{m_b}. In addition, as j' ∈ J, we get that L_k|_{x_1=0} ≁ T_j^{(1)}. Moreover, we have that T_j^{(1)}|_{x_i=0} ∼ T_{m_b}^{(i)}|_{x_1=0} ∼ L_k|_{x_1=x_i=0}. Thus, k ∈ D_L as well. We conclude that for every R ∈ L such that R|_{x_i=0} ∼ T_{m_b}^{(i)} there exists a unique k ∈ D_L ∩ D_{m_b} with μ(R) = e_k. This implies that

    κ_{j,i,b} = Σ_{R∈L s.t. R|_{x_i=0} ∼ T_{m_b}^{(i)}} μ(R) ≤ Σ_{k∈D_L∩D_{m_b}} e_k,

and so equality holds. □

Intuitively, the equalities in the lemma say that we learned all the L_k's that satisfy L_k|_{x_1=x_i=0} ∼ T_j^{(1)}|_{x_i=0}, except for the unique L_u for which L_u|_{x_1=0} ∼ T_j^{(1)} and L_u|_{x_i=0} ∼ T_{m_b}^{(i)}. Hence, for this u, after the normalization of Step 2c, we have that L_u ∼ T_j^{(1)} + α_{i,m_b,1}·x_1.


Moreover, we get (by an equality similar to the one in Equation (12)) that

    e_u = e_{i,m_b} − Σ_{k≠u s.t. L_k|_{x_i=0} ∼ T_{m_b}^{(i)}} e_k = e_{i,m_b} − Σ_{k∈D_L∩D_{m_b}} e_k = e_{i,m_b} − κ_{j,i,b}.

It follows that when we add j to J we also add L_u to L (or some multiple of it) with the right multiplicity, namely μ(R_{j,i,b}) = e_u. This completes the analysis of Step 2e for wt(T_j^{(1)}) > 1. The proof for the case wt(T_j^{(1)}) = 1 is similar (though much simpler) and so we omit it^12. It is easy to check that Properties 1, 2 hold for the new J and L. It is clear that the remaining steps of the algorithm indeed output a ΣΠΣ(2) circuit for our choice of {ℓ_i}_{i∈[t]} (recall that we assumed, at the beginning of the proof, that they all divide M_1'|_V). This completes the proof of Theorem 25. □

Comment 28 We note that there was actually no reason to output "fail" in Step 5 if the rank is small. However, in such a case uniqueness is not guaranteed and so we have to take the approach of the low-rank case. Thus, for simplicity, we output "fail" in the case that the circuit that we found is of low rank.

Theorem 25 shows that there is some set {ℓ_i}_{i∈[t]} such that the algorithm outputs a ΣΠΣ(2) circuit for f|_V when given this subset^13. As we proved in Lemma 22 that w.h.p. rank(f|_V) ≥ 10R_4·log²(d), it follows by Corollary 8 that all the different ΣΠΣ(2) circuits that we found in this stage are actually the same. That is, no matter with which set {ℓ_i}_{i∈[t]} we started, if the algorithm outputs a ΣΠΣ(2) circuit then it is always the same. We thus have the following corollary.

Corollary 29 If we picked V such that rank(f|_V) > R_4·log²(d) then Algorithm 6 outputs the unique ΣΠΣ(2) circuit for f|_V.

4.2.4  Step 3: lifting

For convenience let us assume w.l.o.g., again, that V = {v = (a_1, …, a_s, 0, …, 0) | ∀i a_i ∈ F}. For 1 ≤ i ≤ n−s let

    W_i = {w = (a_1, …, a_s, 0, …, 0, a_{s+i}, 0, …, 0) | ∀j a_j ∈ F}.

^12 In short, assume, w.l.o.g., that the pair (i, j) is found and that T_j^{(1)} = x_2. It follows that no L_k is of the form a_1x_1 + a_2x_2 + a_ix_i for nonzero a_2 and a_i. In particular, by considering the factorization of f_i, we can recover all the L_k-s of the form a_1x_1 + a_2x_2 for nonzero a_2.
^13 Recall that we assumed w.l.o.g. that the subset is actually {x_1, …, x_t}, which is the reason that the ℓ_i-s do not appear in the proof.


We run steps 1, 2 of Algorithm 5 for every W_i and get, for every 1 ≤ i ≤ n−s, a ΣΠΣ(2) circuit C_i for f|_{W_i} (by Corollary 29 it follows that if rank(f|_V) > R_4·log²(d) then the algorithm also succeeds for every W_i). We now show how to combine the different circuits into one circuit. Assume that our original ΣΠΣ(2) circuit for f has the following form:

    f(x̄) = ∏_{i=1}^{d} L_i^{(1)}(x̄) + ∏_{i=1}^{d} L_i^{(2)}(x̄),

where M_1 = ∏_{i=1}^{d} L_i^{(1)}(x̄) and M_2 = ∏_{i=1}^{d} L_i^{(2)}(x̄). Denote

    L_i^{(k)} = Σ_{j=1}^{n} α_{i,j}^{(k)}·x_j + α_{i,0}^{(k)},

for k = 1, 2. If rank(f|_V) > R_4·log²(d) then by Corollary 29 the circuits that we learned are C_i = M_1|_{W_i} + M_2|_{W_i}. In particular,

    C_i = ∏_{k=1}^{d} L_k^{(1)}(x̄)|_{W_i} + ∏_{k=1}^{d} L_k^{(2)}(x̄)|_{W_i}
        = ∏_{k=1}^{d} ( Σ_{j=1}^{s} α_{k,j}^{(1)}·x_j + α_{k,s+i}^{(1)}·x_{s+i} + α_{k,0}^{(1)} ) + ∏_{k=1}^{d} ( Σ_{j=1}^{s} α_{k,j}^{(2)}·x_j + α_{k,s+i}^{(2)}·x_{s+i} + α_{k,0}^{(2)} ).

Therefore all we have to do is find all the linear functions of the form Σ_{j=1}^{s} α_{k,j}^{(1)}·x_j + α_{k,s+i}^{(1)}·x_{s+i} + α_{k,0}^{(1)}, for i ∈ [n−s], and glue them together. To do so we must be sure that there are no two linearly independent linear functions in f whose restrictions to V are linearly dependent. Indeed, if no such pair exists then it is very easy to combine the different circuits C_i into one circuit C for f: for every linear function L_1 in C_1 that does not divide both its multiplication gates, there is a unique linear function L_i in C_i such that their restrictions to x_{s+1} = x_{s+i} = 0 are linearly dependent (note that by our assumption that g.c.d.(M_1|_V, M_2|_V) = g.c.d.(M_1, M_2)|_V we get that if L_1 divides both multiplication gates in C_1 then there is a unique L_i that divides both multiplication gates in C_i, and the continuation is the same for this case as well). Therefore there is only one way of generating a linear function L such that L|_{x_{s+1}=0} ∼ L_1 and L|_{x_{s+i}=0} ∼ L_i. To see that there are no two linearly independent linear functions in f whose restrictions to V are linearly dependent, we notice that as we picked V to be a random subspace of dimension Ω(log²(d)), Lemma 17 assures us that with probability greater than 1 − exp(−log²(n)) no two linearly independent linear functions from C become linearly dependent when restricted to V. This completes the interpolation of f for the high rank case.
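The merging of the circuits C_i can be illustrated with a small Python sketch (our own code; it assumes the linear functions were already matched across the C_i-s and consistently normalized, which is exactly what the uniqueness of the restrictions guarantees):

```python
# Toy sketch of the lifting step of Section 4.2.4; not the paper's code.
# Each W_i reveals a linear function's coefficients on x_1..x_s together with
# the single coefficient on x_{s+i}; merging the views recovers the function.
s, n = 2, 5
L_true = [3, 1, 4, 6, 0, 2]   # [c0, c1, ..., c5]; coordinates s+1..n are lifted

def view_on_Wi(L, i):
    """L|_{W_i}: keep the free term, x_1..x_s and x_{s+i}; zero the rest."""
    keep = set(range(s + 1)) | {s + i}
    return [c if k in keep else 0 for k, c in enumerate(L)]

views = {i: view_on_Wi(L_true, i) for i in range(1, n - s + 1)}

# All views agree on coordinates 0..s; view i contributes the coefficient
# of x_{s+i}, so there is exactly one consistent way to glue them.
glued = list(views[1])
for i in range(2, n - s + 1):
    glued[s + i] = views[i][s + i]
assert glued == L_true
```

The real algorithm must additionally match each function of C_1 to its counterpart in C_i up to a scalar; this toy sidesteps that by working with the exact coefficient vectors.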

4.3  Completing the proof of Theorem 2

We note that if we know rank(f) then the algorithms described in Sections 4.1, 4.2 give the required result. As we don't know what the rank is, we run the high rank algorithm for every rank between 10R_4·log²(d) and d. Once the algorithm outputs a ΣΠΣ(2) circuit we halt. Indeed, Step 5 of Algorithm 6 guarantees that we have a ΣΠΣ(2) circuit C' that computes f|_V. As the rank of this circuit is large, we get by uniqueness (Corollary 8) that rank(f|_V) ≥ 10R_4·log²(d). As the same holds for every f|_{W_i} (defined in Section 4.2.4), we get that the final circuit that we compute is indeed a circuit for f. If none of the executions of the high rank algorithm results in a ΣΠΣ(2) circuit then we run the low rank algorithm for each rank between 1 and 10R_4·log²(d). By Lemma 19 we can check whether we have the right rank (by executing the algorithm again for the V_i-s from the lemma). When we pick the correct rank, and V is such that the rank of f|_V is the same as rank(f), we are assured that we compute a correct representation of f.

To analyze the error probability of the algorithm we note that the possible sources of error are the choice of V and the factoring algorithm. Recall that V is "bad" for us with probability 1/p(n, d), where p is some quasi-polynomial function of n and d (e.g. by Corollary 18). In addition, we can repeat the factoring algorithm of Theorem 9 polynomially many times to reduce its error probability. Thus the overall error is quasi-polynomially small (and can be further reduced). This completes the proof of the theorem.

5  Discussion

In this work we showed how to interpolate ΣΠΣ(2) circuits. Our algorithm is randomized, so a natural question is to find a deterministic algorithm for the task. We note that by the recent work of [KS07a] we can save on randomness and choose the space V (the first space that we restrict our circuit to) in a deterministic way. However, the factoring algorithm of Theorem 9 requires randomness, and it is not clear how to get rid of it. Another natural question is to extend our algorithm to handle general ΣΠΣ circuits. This seems out of reach at the moment; however, a recent result of [KS07b] shows how to interpolate ΣΠΣ(k) circuits for every constant k.

Acknowledgements I would like to thank Amir Yehudayoff and Nader Bshouty for many helpful discussions, Erich Kaltofen and Joachim von zur Gathen for answering my questions regarding factorization algorithms, and Avi Wigderson and the anonymous reviewers for helpful comments. Thanks to Oded Goldreich and Ran Raz for their encouragement. The seed for this work was planted in the BIRS workshop "Recent Advances in Computational Complexity". I thank the organizers for inviting me and BIRS for hosting the workshop.


References

[BB98] D. Bshouty and N. H. Bshouty. On interpolating arithmetic read-once formulas with exponentiation. J. of Computer and System Sciences, 56(1):112-124, 1998.

[BBB+00] A. Beimel, F. Bergadano, N. H. Bshouty, E. Kushilevitz, and S. Varricchio. Learning functions represented as multiplicity automata. J. ACM, 47(3):506-530, 2000.

[BBTV97] F. Bergadano, N. H. Bshouty, C. Tamon, and S. Varricchio. On learning branching programs and small depth circuits. In Proceedings of the 3rd European Conference on Computational Learning Theory, volume 1208 of LNAI, pages 150-161, 1997.

[BC98] N. H. Bshouty and R. Cleve. Interpolating arithmetic read-once formulas in parallel. SIAM J. on Computing, 27(2):401-413, 1998.

[BHH95] N. H. Bshouty, T. R. Hancock, and L. Hellerstein. Learning arithmetic read-once formulas. SIAM J. on Computing, 24(4):706-735, 1995.

[BOT88] M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proceedings of the 20th Annual STOC, pages 301-309, 1988.

[DS06] Z. Dvir and A. Shpilka. Locally decodable codes with 2 queries and polynomial identity testing for depth 3 circuits. SIAM J. on Computing, 36(5):1404-1434, 2006.

[FBV96] F. Bergadano, N. H. Bshouty, and S. Varricchio. Learning multivariate polynomials from substitution and equivalence queries. ECCC, 3(8), 1996.

[GKS94] D. Grigoriev, M. Karpinski, and M. F. Singer. Computational complexity of sparse rational interpolation. SIAM J. on Computing, 23(1):1-11, 1994.

[GKST06] O. Goldreich, H. J. Karloff, L. J. Schulman, and L. Trevisan. Lower bounds for linear locally decodable codes and private information retrieval. Computational Complexity, 15(3):263-296, 2006.

[HH91] T. R. Hancock and L. Hellerstein. Learning read-once formulas over fields and extended bases. In Proceedings of the 4th Annual Conference on Computational Learning Theory, pages 326-336, 1991.

[Kal85] E. Kaltofen. Polynomial-time reductions from multivariate to bi- and univariate integral polynomial factorization. SIAM J. on Computing, 14(2):469-489, 1985.

[Kal95] E. Kaltofen. Effective Noether irreducibility forms and applications. J. of Computer and System Sciences, 50(2):274-295, 1995.

[Kha95] M. Kharitonov. Cryptographic lower bounds for learnability of boolean functions on the uniform distribution. J. of Computer and System Sciences, 50(3):600-610, 1995.

[KL01] M. Krause and S. Lucks. Pseudorandom functions in TC0 and cryptographic limitations to proving lower bounds. Computational Complexity, 10(4):297-313, 2001.

[KS96] M. Karpinski and I. Shparlinski. On some approximation problems concerning sparse polynomials over finite fields. Theoretical Computer Science, 157(2):259-266, 1996.

[KS01] A. Klivans and D. Spielman. Randomness efficient identity testing of multivariate polynomials. In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, pages 216-223, 2001.

[KS06] A. Klivans and A. Shpilka. Learning restricted models of arithmetic circuits. Theory of Computing, 2(10):185-206, 2006.

[KS07a] Z. Karnin and A. Shpilka. Deterministic black box polynomial identity testing of depth-3 arithmetic circuits with bounded top fan-in. Manuscript, 2007.

[KS07b] Z. Karnin and A. Shpilka. Interpolating depth-3 arithmetic circuits with bounded top fan-in. Work in progress, 2007.

[KT90] E. Kaltofen and B. M. Trager. Computing with polynomials given by black boxes for their evaluations: Greatest common divisors, factorization, separation of numerators and denominators. J. of Symbolic Computation, 9(3):301-320, 1990.

[KT00] J. Katz and L. Trevisan. On the efficiency of local decoding procedures for error-correcting codes. In Proceedings of the 32nd Annual ACM Symposium on Theory of Computing, pages 80-86. ACM Press, 2000.

[KV94] M. J. Kearns and L. G. Valiant. Cryptographic limitations on learning boolean formulae and finite automata. J. ACM, 41(1):67-95, 1994.

[Man95] Y. Mansour. Randomized interpolation and approximation of sparse polynomials. SIAM J. on Computing, 24(2):357-368, 1995.

[NR04] M. Naor and O. Reingold. Number-theoretic constructions of efficient pseudorandom functions. J. ACM, 51(2):231-262, 2004.

[OGM86] O. Goldreich, S. Goldwasser, and S. Micali. How to construct random functions. J. ACM, 33(4):792-807, 1986.

[RR97] A. A. Razborov and S. Rudich. Natural proofs. J. of Computer and System Sciences, 55(1):24-35, 1997.

[Shp07] A. Shpilka. Interpolation of depth-3 arithmetic circuits with two multiplication gates. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing, pages 284-293, 2007.

[SS96] R. E. Schapire and L. M. Sellie. Learning sparse multivariate polynomials over a field with queries and counterexamples. J. of Computer and System Sciences, 52(2):201-213, 1996.

[SW01] A. Shpilka and A. Wigderson. Depth-3 arithmetic circuits over fields of characteristic zero. Computational Complexity, 10(1):1-27, 2001.

A  Brute force interpolation

In this section we show how to interpolate a polynomial that can be written as a polynomial in a few linear functions. In fact we need to consider a slightly more complicated scenario, described in the following lemma.

Lemma 30 Let^14 h(x_1, …, x_s) = Lin(h)·Q(ℓ_1, …, ℓ_k) be a polynomial, where Q is a polynomial and {ℓ_i}_{i∈[k]} are linear functions. Let d = deg(h) < |F|. Then there is a deterministic algorithm that, given oracle access to h and Lin(h), d and the set of linear functions {ℓ_i}_{i∈[k]}, finds Q. The running time of the algorithm is poly(|F|^s, d^k).

Proof Consider a generic degree d polynomial Q' in the ℓ_i-s. Such a polynomial has < (d+1)^k monomials of the form ∏_{i=1}^{k} ℓ_i^{e_i}, with Σ_{i=1}^{k} e_i ≤ d. Thus we have a number of unknown coefficients which we can find by a simple interpolation procedure, which we now describe. Let ℓ_{k+1}, …, ℓ_s be linear functions such that^15 ℓ_1, …, ℓ_s form a basis of the space of linear functions over F^s. Let

    Q'(y_1, …, y_k) = Σ_{ē=(e_1,…,e_k) : Σ e_i ≤ d} c_ē · ∏_i y_i^{e_i}.    (13)

Next we represent Lin(h) as a polynomial in the ℓ_i-s. Let P_Lin(y_1, …, y_s) be a polynomial satisfying

    Lin(h) = P_Lin(ℓ_1, …, ℓ_s) = Σ_M α_M · M(ℓ_1, …, ℓ_s),    (14)

where the sum is over all monomials M of degree at most d. Note that as we have oracle access to Lin(h), it is easy to find the α_M-s (by the usual interpolation process). We get that h can be written as

    h = P_Lin(ℓ_1, …, ℓ_s) · Q'(ℓ_1, …, ℓ_k) = Σ_M γ_M · M(ℓ_1, …, ℓ_s),    (15)

where each coefficient γ_M is a linear function in the c_ē-s. By querying h on all the points of F^s and using the standard interpolation method we can easily find all the coefficients γ_M, and from them we can try to learn the coefficients c_ē by solving a system of linear equations. We note that once we know the γ_M-s there is a unique solution for the c_ē-s (in case such a solution exists). The reason is that if two different solutions existed then they would give rise to two polynomials P_Lin·Q_1 and P_Lin·Q_2 that are equal over F^s. However, the degree of each of these polynomials is at most d, which is smaller than the size of the field we are working with, and so the two polynomials must be the same. In particular we have that Q' = Q. □

^14 In Section 4.1.1 we needed to interpolate sim(f)|_V for an s-dimensional space V, so we assume that h is a polynomial in s variables.
^15 We assume w.l.o.g. that ℓ_1, …, ℓ_k are linearly independent.
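The proof's procedure can be sketched in a few lines of Python (our own toy code with s = 2 and k = 1; the oracles `h`, `lin` and the hidden Q are made up for the example): the unknown coefficients c_ē are recovered by solving the resulting linear system with Gaussian elimination over F_p.

```python
# A minimal sketch of Lemma 30's interpolation; not the paper's code.
# Recover the coefficients c_e of Q from evaluations of h over F_p.
P, d = 7, 2
lin  = lambda x: (x[0] + 2 * x[1] + 1) % P    # plays the role of Lin(h)
ell1 = lambda x: (3 * x[0] + x[1]) % P        # the known linear function ell_1
Q_true = [2, 0, 5]                            # hidden Q(y) = 2 + 5*y^2
h = lambda x: lin(x) * sum(c * pow(ell1(x), e, P)
                           for e, c in enumerate(Q_true)) % P

# One equation per point a of F_p^2:  lin(a) * sum_e c_e * ell1(a)^e = h(a).
pts = [(a, b) for a in range(P) for b in range(P)]
rows = [[lin(a) * pow(ell1(a), e, P) % P for e in range(d + 1)] + [h(a)]
        for a in pts]

def solve_mod_p(rows, ncols):
    """Gaussian elimination over F_P on an augmented matrix; the system here
    has full column rank, so the unique solution sits in the pivot rows."""
    r = 0
    for c in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        inv = pow(rows[r][c], P - 2, P)
        rows[r] = [v * inv % P for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                f = rows[i][c]
                rows[i] = [(v - f * w) % P for v, w in zip(rows[i], rows[r])]
        r += 1
    return [rows[i][-1] for i in range(ncols)]

assert solve_mod_p(rows, d + 1) == Q_true
```

Uniqueness of the solution is exactly the argument at the end of the proof: the columns correspond to polynomials of degree smaller than |F|, so a nontrivial linear combination cannot vanish on all of F^s.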

